Bulk Protein and Oil Prediction in Soybeans Using Transmission Raman Spectroscopy: A Comparison of Approaches to Optimize Accuracy

Rajveer Singh, Tomasz P. Wrobel, Prabuddha Mukherjee, Mark Gryka, Matthew Kole, Sandra Harrison, Rohit Bhargava

Research output: Contribution to journal › Article › peer-review


Rapid measurements of protein and oil content are important for a variety of uses, from sorting soybeans at the point of harvest to providing feedback during soybean meal production. In this study, our goal is to develop a simple protocol that permits rapid and robust quantitative prediction of soybean constituents using transmission Raman spectroscopy (TRS). To develop this approach, we systematically varied several elements of the measurement process to provide a diverse test bed. First, we used an in-house-built benchtop TRS instrument so that suitable optical configurations could be rapidly deployed and evaluated for collecting data from individual soybean grains. Second, we used three soybean varieties with relatively low (33.97%), medium (36.98%), and high (41.23%) protein contents to test the development process. Third, samples from each variety were prepared as whole beans and with three different sample treatments (i.e., ground bean, whole meal, and ground meal). In each case, we modeled the data using partial least squares (PLS) regression and assessed spectral metric-based multiple linear regression (metric-MLR) approaches to build robust prediction models. For both bulk protein and oil and for all treatment types, the metric-MLR models showed lower root mean square errors of prediction (RMSEPs), and hence better prediction, than the corresponding classical PLS regression models. Comparing sample preparation approaches, lower RMSEPs were observed for the ground meal treatment; metric-MLR modeling with ground meal treatment was therefore considered the optimal protocol for bulk protein and oil prediction in soybean, with RMSEP values of 1.15 ± 0.04 (R² = 0.87) and 0.80 ± 0.02 (R² = 0.87) for bulk protein and oil, respectively. These predictions were nearly two- to threefold better (i.e., lower RMSEPs) than the corresponding near-infrared (NIR) spectroscopy measurements, which serve as secondary gold standards in the grain industry.
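The abstract does not spell out which spectral metrics enter the metric-MLR models. The sketch below assumes, purely for illustration, that each metric is a band-area ratio over hypothetical windows (amide I near 1655 cm⁻¹ for protein, ester C=O near 1745 cm⁻¹ for oil, both normalized to the CH2 deformation band near 1445 cm⁻¹), fits an ordinary least-squares MLR, and scores it with RMSEP and R² on held-out samples. The ± spread is computed here over repeated random calibration/validation splits; that is an assumption, not necessarily how the reported uncertainties were obtained. All function names and band windows are hypothetical, and the demo uses synthetic spectra.

```python
import numpy as np

# Hypothetical band windows (cm^-1); illustrative choices, not the paper's metrics.
AMIDE_I = (1630.0, 1680.0)   # protein-associated amide I region
ESTER_CO = (1730.0, 1760.0)  # lipid ester C=O stretch region
CH2_DEF = (1420.0, 1470.0)   # CH2 deformation, used here as an internal reference


def band_area(shifts, spectrum, lo, hi):
    """Trapezoidal integral of the spectrum between lo and hi (cm^-1), inclusive."""
    mask = (shifts >= lo) & (shifts <= hi)
    x, y = shifts[mask], spectrum[mask]
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def spectral_metrics(shifts, spectra):
    """One row per spectrum: [amide I / CH2, ester C=O / CH2] band-area ratios."""
    rows = []
    for s in spectra:
        ref = band_area(shifts, s, *CH2_DEF)
        rows.append([band_area(shifts, s, *AMIDE_I) / ref,
                     band_area(shifts, s, *ESTER_CO) / ref])
    return np.array(rows)


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + np.asarray(X) @ coef[1:]


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on a held-out set."""
    d = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def repeated_split_rmsep(X, y, n_repeats=20, val_fraction=0.3, seed=0):
    """Mean and sample SD of RMSEP over random calibration/validation splits."""
    rng = np.random.default_rng(seed)
    n_val = max(1, int(round(val_fraction * len(y))))
    scores = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        val, cal = idx[:n_val], idx[n_val:]
        coef = fit_mlr(X[cal], y[cal])
        scores.append(rmsep(y[val], predict_mlr(coef, X[val])))
    return float(np.mean(scores)), float(np.std(scores, ddof=1))


if __name__ == "__main__":
    # Synthetic demo: Gaussian bands whose heights track protein and oil content.
    rng = np.random.default_rng(1)
    shifts = np.arange(400.0, 1801.0, 1.0)

    def gauss(center, width):
        return np.exp(-0.5 * ((shifts - center) / width) ** 2)

    protein = rng.uniform(33.0, 42.0, 60)  # synthetic reference values (%)
    oil = rng.uniform(17.0, 22.0, 60)
    spectra = np.array([p * gauss(1655, 12) + o * gauss(1745, 8) + 50 * gauss(1445, 10)
                        + rng.normal(0, 0.1, shifts.size) for p, o in zip(protein, oil)])
    X = spectral_metrics(shifts, spectra)
    for name, y in (("protein", protein), ("oil", oil)):
        mean, sd = repeated_split_rmsep(X, y)
        idx = rng.permutation(len(y))
        val, cal = idx[:18], idx[18:]
        pred = predict_mlr(fit_mlr(X[cal], y[cal]), X[val])
        print(f"{name}: RMSEP = {mean:.3f} ± {sd:.3f}, "
              f"single-split R² = {r_squared(y[val], pred):.2f}")
```

A full-spectrum PLS baseline for comparison could be fit with, for example, scikit-learn's `PLSRegression` and scored with the same `rmsep` function on the same splits, so that the two model families are compared on identical validation sets.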
For content prediction in whole soybeans, incorporating the physical attributes of individual grains into the metric-MLR approach improved bulk protein prediction by up to 22% and bulk oil prediction by a relatively mild margin (up to ∼5%). The combination of the metric-MLR modeling approach (which is rare in the field of grain analysis) with these sample treatments resulted in improved prediction models, and using the physical attributes of individual grains is suggested as a novel means of improving prediction accuracy.
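The abstract does not name the physical attributes used. The sketch below assumes a single hypothetical per-grain attribute (mass in mg) appended as an extra regressor alongside a spectral metric, and compares validation RMSEP with and without it. It also shows the two ways relative gains are expressed in the abstract: percent RMSEP reduction (as in "up to 22%") and the ratio of a reference method's RMSEP to the new one (as in "two- to threefold" relative to NIR). Function names are hypothetical, the data are synthetic, and no NIR values are assumed.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + np.asarray(X) @ coef[1:]


def rmsep(y_true, y_pred):
    d = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))


def percent_improvement(rmsep_base, rmsep_new):
    """Relative RMSEP reduction in percent (positive means the new model is better)."""
    return 100.0 * (rmsep_base - rmsep_new) / rmsep_base


def fold_improvement(rmsep_reference, rmsep_new):
    """How many times lower the new RMSEP is than a reference method's RMSEP."""
    return rmsep_reference / rmsep_new


def compare_with_physical(metrics, physical, y, cal, val):
    """Validation RMSEP using spectral metrics alone vs. metrics plus physical attributes."""
    X_base = np.column_stack([metrics])
    X_aug = np.column_stack([metrics, physical])
    base = rmsep(y[val], predict_mlr(fit_mlr(X_base[cal], y[cal]), X_base[val]))
    aug = rmsep(y[val], predict_mlr(fit_mlr(X_aug[cal], y[cal]), X_aug[val]))
    return base, aug


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 80
    metric = rng.uniform(0.8, 1.2, n)       # synthetic spectral metric
    mass_mg = rng.uniform(120.0, 200.0, n)  # hypothetical per-grain mass
    protein = 15 + 25 * metric - 0.03 * mass_mg + rng.normal(0, 0.5, n)
    idx = rng.permutation(n)
    val, cal = idx[:24], idx[24:]
    base, aug = compare_with_physical(metric, mass_mg, protein, cal, val)
    print(f"RMSEP metrics only: {base:.2f}; with grain mass: {aug:.2f}; "
          f"improvement: {percent_improvement(base, aug):.1f}%")
```

Because the extra attribute simply enters the same linear model as another column, any gain it provides shows up directly as a lower validation RMSEP; how large that gain is depends entirely on how strongly the attribute relates to the residual error of the spectral-only model.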

Original language: English (US)
Pages (from-to): 687-697
Number of pages: 11
Journal: Applied Spectroscopy
Issue number: 6
State: Published - Jun 1 2019


Keywords

  • MLR
  • NIR spectroscopy
  • PLS regression
  • Soybean
  • multiple linear regression
  • near-infrared spectroscopy
  • transmission Raman spectroscopy

ASJC Scopus subject areas

  • Instrumentation
  • Spectroscopy
