Simultaneous estimation of amylose, resistant, and digestible starch in pea flour by visible and near-infrared reflectance spectroscopy

ABSTRACT Due to its health benefits, resistant starch (RS) has received increasing attention from the public, and there is a need to develop methods to measure the amylose and RS concentrations in pea (Pisum sativum L.) flour. The aim of this study was to develop a visible and near-infrared reflectance (vis–NIR) model for the simultaneous determination of amylose, RS, and digestible starch (DS) in pea flour. A total of 123 dry pea samples, comprising different varieties grown in different environments, were collected and ground to flour, and their vis–NIR spectra were scanned. The amylose, RS, and DS contents of the pea flours were also measured by an enzymatic colorimetric assay. The spectral data were calibrated against the enzymatic colorimetric assay values. Results showed that amylose, RS, and DS in the pea flours can be simultaneously estimated using the vis–NIR spectra. Instead of using the full spectrum (300–2300 nm), we found that the most efficient wavebands lay in the visible region between 370 and 560 nm and in the NIR region between 1600 and 1800 nm. Using stepwise regression with backward elimination, multiple linear regression (MLR) models were developed from the most efficient wavelengths. The MLR models had coefficients of determination (R²) of 0.95, 0.76, 0.80, and 0.88 for amylose, RS, DS, and total starch, respectively. The correlation coefficients between the model-estimated and enzymatic colorimetric assay values were 0.97, 0.80, 0.85, and 0.93 for amylose, RS, DS, and total starch, respectively.


Introduction
Pea (Pisum sativum) is one of the major legumes in the world used for food and feedstuff. In the Northern Great Plains (NGP) of the United States, pea production has increased rapidly during the last decade; Montana alone planted 247,000 ha of pea in 2016. [1] A great portion of the dry peas produced in the NGP is exported to Asian countries to be fractionated into protein, starch, and fiber. With 46% starch, 20% (w/w) protein, and 20% fiber contents, [2] dry pea is not considered a major starch source compared to rice (Oryza sativa L.), wheat (Triticum aestivum L.), corn (Zea mays L.), and potato (Solanum tuberosum L.). Nevertheless, pea starch is widely processed into noodles in food industries, [3] and pea starch is considered the second best material (after mung bean) among all grain legumes for processing starch noodles. [4] Pea starch, like starch from other crops, consists of amylose and amylopectin. Based on its digestibility, pea starch can be classified into digestible starch (DS) and resistant starch (RS). RS is the starch that is resistant to digestion in the small intestine, [5] and it can be fermented in the colon and large intestine to produce short-chain fatty acids, [6][7][8][9] which may lead to various health benefits such as improving the insulin response, increasing satiety, [10][11][12] and reducing the risk for colon cancer, obesity, diabetes, and inflammatory bowel disease. [13][14][15][16][17] Research has shown that higher amylose content is correlated with elevated levels of RS; higher RS contents were found in maize and barley that had higher amylose contents. [18][19][20] High amylose, or RS, can result in reduced digestibility and low glucose release, [21] which contributes to low glycemic indices. The content of amylose or RS in the grains depends on the growth conditions and timing of harvest of the crops, and it varies considerably between plant species and varieties.
Variable RS contents have been reported among different genotypes of rice and wheat. [22,23] Strydhorst et al. [24] found that winter pea (cv. Windham) had lower total starch content but higher RS and protein contents than spring pea cultivars. Wrinkled peas have a very high amylose content of 63-70% compared to around 31% in the normal pea varieties. [25,26] Tao et al. [27] analyzed nine common pea varieties grown in different environments in Montana and found that RS contents varied from 0.5% to 5.3% and that RS content was influenced by both genetics and growing environment.
Because of the benefits of RS to human health, there has been an increasing interest in knowing the content of amylose or RS in our daily diets. Grain industries are also interested in segregating commodities with different amylose and RS contents for different food processing purposes. Various methods have been developed for the determination of amylose or RS contents in grain, [28][29][30][31][32] but those laboratory methods are complicated, requiring sample preparation and chemical manipulations that make them laborious, costly, and time-consuming. There is a need to develop a rapid method to screen amylose and RS in grains and flours.
The visible and near-infrared reflectance spectroscopy (vis-NIRS) technique may provide an alternative rapid, nondestructive, and cost-effective method for starch measurement. Several researchers have developed NIRS methods to determine amylose and total starch in wheat, rice, and bean, [33][34][35][36] but we have not found NIRS or vis-NIRS models developed to determine amylose and RS in peas. Furthermore, it is not clear whether a model can be developed to determine amylose, RS, and DS simultaneously. The objective of this study was to explore the potential of using vis-NIRS models to predict the starch composition of pea flours.

Materials
A total of 123 pea seed samples consisting of different varieties grown in different environments were collected from Montana. These were expected to have a wide range of total starch and RS, as well as amylose, contents. Pea seeds were ground into flour by passing through a 1-mm mesh screen using an Udy Cyclone Sample Mill (Udy Corporation, Fort Collins, CO).

Enzymatic colorimetric assay
Petroleum ether was used to remove fatty acids from pea flours. After removing the fatty acids, amylose was determined according to the colorimetric method of Williams et al. [37] using potato amylose as a standard. RS and DS were analyzed using the Megazyme resistant starch assay kit with glucose solution (1.0 mg/ml) as a standard. Total starch was calculated as the sum of RS and DS. Each pea flour sample was analyzed in triplicate.

Vis-NIR measurement
The vis and NIR reflectance of the pea flours was measured with a miniature fiber-optic spectrometer (StellarNet Inc., USA), which measures spectral reflectance between 200 and 1080 nm in the BLACK-Comet Series (C-SR) and between 900 and 2300 nm in the RED-Wave NIR InGaAs Series. An RS50 white reflectance standard was used to optimize the instrument prior to taking the reflectance measurements. Samples were scanned over the spectral ranges of 300-900 and 900-2300 nm separately. After scanning, the two wavelength ranges for each sample were combined into one spectrum (Fig. 1).
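Combining the two scans into one spectrum can be sketched as follows. This is a minimal illustration, not the authors' procedure: the text does not say how the shared 900 nm point was handled, so the `split_nm` hand-over rule, the function name, and the reflectance values are all assumptions.

```python
def merge_spectra(vis, nir, split_nm=900):
    """Join a visible-range scan and an NIR-range scan into one spectrum.

    vis, nir: lists of (wavelength_nm, reflectance_percent) pairs.
    split_nm: assumed hand-over wavelength; readings at or below it are taken
              from the visible scan, readings above it from the NIR scan.
    """
    low = [(w, r) for w, r in vis if w <= split_nm]
    high = [(w, r) for w, r in nir if w > split_nm]
    return sorted(low + high)

# Made-up readings around the overlap point
vis_scan = [(898, 41.2), (899, 41.3), (900, 41.5)]
nir_scan = [(900, 41.9), (901, 42.0), (902, 42.1)]
combined = merge_spectra(vis_scan, nir_scan)
print(combined)
```

With this rule the duplicated 900 nm reading is kept only once, from the visible scan.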

Data analysis
To enhance the signal-to-noise ratio and to reduce baseline and background shifts, the spectral data were preprocessed with the Unscrambler 9.7 software (CAMO Software AS, Oslo, Norway) using different mathematical transformation methods, including vector normalization, multiplicative scatter correction (MSC), Savitzky-Golay first derivative (S.G. 1st derivative), and standard normal variate (SNV).
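Two of these transformations, SNV and the Savitzky-Golay first derivative, can be sketched in a few lines. The study used the Unscrambler software and does not report the window width or polynomial order, so the 5-point quadratic window below is an assumption, as are the function names and the sample data.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]

def sg_first_derivative(spectrum, step_nm=1.0):
    """Savitzky-Golay first derivative with an assumed 5-point quadratic window.

    The weights (-2, -1, 0, 1, 2) / 10 are the standard coefficients for this
    window; the two points at each end of the spectrum are dropped.
    """
    weights = (-2, -1, 0, 1, 2)
    out = []
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out.append(sum(w * x for w, x in zip(weights, window)) / (10 * step_nm))
    return out

reflectance = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]  # made-up linear ramp
scaled = snv(reflectance)
slope = sg_first_derivative(reflectance)
print(slope)  # [2.0, 2.0, 2.0]: the ramp rises 2 units per nm
```

A derivative removes constant baseline offsets between samples, which is one reason it is commonly tried for reflectance spectra of powders.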
After preprocessing the data, partial least squares (PLS) regression was used to establish the relation between the spectral data and the enzymatic colorimetric assay values of the starch components. The accuracy of the estimation models was tested by leave-one-out cross validation. The coefficient of determination (R²) and the root mean square error (RMSE) were calculated and used as the criteria for the goodness of fit of the models. A good PLS model should have a high value of R² and a low value of RMSE. [38] Based on these criteria, the PLS models that used different data preprocessing methods were compared.
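The two criteria and the leave-one-out procedure can be illustrated with a short sketch. To keep it self-contained, a one-predictor least-squares line stands in for the PLS regression used in the study; the function names and the data are made up for illustration.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    return sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured))

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1 - ss_res / ss_tot

def loo_predictions(x, y):
    """Leave-one-out: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return preds

# Made-up spectral signal and amylose values (%)
signal = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
amylose = [5.1, 9.8, 15.3, 19.9, 25.2, 29.7]
cv_pred = loo_predictions(signal, amylose)
print(rmse(amylose, cv_pred), r_squared(amylose, cv_pred))
```

Because each sample is predicted by a model that never saw it, the cross-validated RMSE is normally somewhat larger than the RMSE of the fit to all samples.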
Since the PLS model with the full spectrum (300-2300 nm) is difficult to use, we tried to build a multiple linear regression (MLR) model with a reduced set of the most effective wavelengths. First, the data set was divided into calibration and validation subsets. To ensure that the data points selected for the calibration and validation subsets covered the whole data range, cluster analysis was employed to explore the data distribution using the SPSS 19.0 (IBM, USA) software. [39,40] Dendrogram cluster graphs were plotted from all data points, and then the calibration and validation subsets were randomly selected from each cluster at a proportion of 3:2. Finally, a total of 86 samples were selected for the calibration subset and 37 samples for the validation subset.
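The within-cluster random selection can be sketched as below. The cluster labels would come from a separate cluster analysis (SPSS in the study); here they are hypothetical, and the function name, the `calibration_fraction` parameter, and the rounding rule are assumptions rather than the authors' exact procedure.

```python
import random
from collections import defaultdict

def stratified_split(cluster_labels, calibration_fraction=0.6, seed=0):
    """Randomly assign samples to calibration or validation within each cluster.

    cluster_labels: one label per sample, from a prior cluster analysis.
    calibration_fraction: 0.6 corresponds to a 3:2 calibration:validation ratio.
    Returns two sorted lists of sample indices.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        by_cluster[label].append(index)

    calibration, validation = [], []
    for members in by_cluster.values():
        rng.shuffle(members)
        cut = round(len(members) * calibration_fraction)
        calibration += members[:cut]
        validation += members[cut:]
    return sorted(calibration), sorted(validation)

labels = ["A"] * 10 + ["B"] * 5 + ["C"] * 5  # hypothetical cluster memberships
cal, val = stratified_split(labels)
print(len(cal), len(val))  # 12 8
```

Splitting inside each cluster, instead of across the whole data set, keeps both subsets spread over the full range of the data.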
A correlation analysis was conducted between the enzymatic colorimetric assay values of the starch components (i.e., amylose, RS, DS, and total starch) and the spectral values at each wavelength in the range of 300-2300 nm at 1.0 nm intervals. Based on the values of the correlation coefficient, the highly correlated wavebands were determined and selected. After selection of the highly correlated wavebands, MLR models were built for amylose, RS, DS, and total starch. A stepwise regression with backward elimination procedure was employed using the SPSS 19.0 software (IBM Corp., Armonk, NY), and only the most correlated wavelengths were retained in the MLR models. Coefficients of determination (R²) were used to estimate the amount of explained variance in each model and to provide a measure of goodness of fit. The accuracy of each calibration model was subsequently validated using the validation subset of data, based on the correlation coefficient of the model-estimated values versus the laboratory-measured values.
Figure 1. Visible and near-infrared reflectance spectra from scanning pea flour samples using a miniature fiber-optic spectrometer (StellarNet Inc., USA). The X-axis is wavelength (nm) and the Y-axis is reflectance (%).
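The wavelength-by-wavelength correlation screening in this section can be sketched as follows. The cut-off on the squared correlation (`min_r2`), the function names, and the data are illustrative assumptions; the sketch keeps the sign of r so that negatively correlated wavelengths remain identifiable.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlated_wavelengths(spectra, wavelengths, reference, min_r2=0.5):
    """Return {wavelength: r} for wavelengths whose r**2 exceeds min_r2.

    spectra: one list of reflectance values per sample, aligned with wavelengths.
    reference: laboratory value for each sample.
    min_r2: assumed selection threshold on the squared correlation.
    """
    selected = {}
    for j, wl in enumerate(wavelengths):
        column = [sample[j] for sample in spectra]
        r = pearson_r(column, reference)
        if r * r > min_r2:
            selected[wl] = r
    return selected

# Made-up reflectance (%) at three wavelengths for four samples
wavelengths = [450, 1000, 1650]
spectra = [[30.0, 50.0, 62.0], [34.0, 49.0, 58.0], [39.0, 51.0, 55.0], [45.0, 50.0, 50.0]]
amylose = [8.0, 12.0, 17.0, 23.0]
selected = correlated_wavelengths(spectra, wavelengths, amylose)
print(selected)
```

In this toy data the 450 nm and 1650 nm columns pass the threshold (positively and negatively correlated, respectively), while 1000 nm is discarded.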

Results and discussion
Statistical description of the data sets for amylose, RS, DS, and total starch
The range, mean, standard deviation, and coefficient of variation (CV) of the data sets are presented in Table 1 for amylose, RS, DS, and total starch. The contents of amylose and RS displayed large variations, with CVs of 51.94% and 50.03%, respectively. The total starch and DS data sets showed smaller variations (CV = 19.55% and 20.14%, respectively). According to Brunet et al., [41] an accurate model calibration can be achieved if a data set covers a wide range. The data range for the pea starch components in this study was considered suitable for developing the spectral calibrations, and more accurate estimation models were expected for amylose and total starch because of their wider data ranges.
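For reference, the CV is the standard deviation expressed as a percentage of the mean. A minimal sketch with made-up values follows; it assumes the sample standard deviation, since the text does not state whether the sample or population form was used.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100

rs_percent = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up RS contents (%)
cv = coefficient_of_variation(rs_percent)
print(round(cv, 2))  # 52.7
```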

Comparison of spectral data preprocessing procedures
Table 2 presents the PLS regression models for estimation of amylose and RS. Compared with the model built on the unprocessed spectral data, the models built on data preprocessed by normalization, MSC, SNV, and S.G. 1st derivative showed different degrees of reduced RMSE and improved R². We found that the S.G. 1st derivative procedure was the most suitable procedure for both amylose and RS spectral data. This is based on the RMSEP, R²p, RMSECV, and R²cv calculated for each model. The best data preprocessing procedure must result in the smallest root mean square error of cross validation (RMSECV), the smallest root mean square error of prediction (RMSEP), and the highest coefficients of determination (R²cv and R²p). Although the SNV procedure resulted in the lowest RMSEP and RMSECV and the highest R²cv and R²p for amylose, it did not result in a significant reduction in RMSEP and RMSECV or improvement in R²cv and R²p for RS (Table 2). The S.G. 1st derivative procedure gave a better estimation of both amylose and RS. Therefore, this method was selected to preprocess the spectral data in this study.

Determination of efficient wavebands and construction of MLR models
Since there are many redundant wavebands when modeling with the whole spectrum (300-2300 nm with 1 nm intervals), [42] we conducted correlation analyses to determine the highly correlated wavebands. Table 3 displays the highly correlated wavebands and the values of their correlation coefficients for amylose, RS, DS, and total starch. The results in Table 3 indicate that all the starch components had strong correlations with some visible spectral wavebands (r² > 0.5) and were negatively correlated with certain NIR spectral wavebands. The strong correlations with visible spectral wavebands were likely due to the color of the pea flours, as the pea samples in this study varied in color. Amylose appeared to have the highest coefficient of correlation (r² up to 0.91) with the selected wavebands, followed by total starch (r² up to 0.816). The efficient wavebands for amylose and RS lay in the spectral regions of 384-534, 1066-1093, 1243-1426, and 1600-1654 nm, while the efficient wavebands for total starch were 390-540, 1348-1420, 1618-1641, and 1762-1837 nm. The highly correlated wavebands for DS were similar to those for total starch.
After determination of the efficient wavebands, a stepwise MLR with the backward elimination method was employed to build the linear regression models for amylose, RS, DS, and total starch (Table 4). The results in Table 4 indicate that all models fitted well, explaining 95.2%, 75.6%, 80.3%, and 88.1% of the variation in the calibration data sets for amylose, RS, DS, and total starch, respectively, with corresponding adjusted R² values of 0.952, 0.756, 0.803, and 0.881. Compared to the full PLS models that used the whole spectrum (300-2300 nm), the reduced MLR models used only 8 to 13 wavelengths, and only the most significant wavebands were retained (Table 4).
Table 2 note. RMSEP and R²p: the root mean square error of prediction and the corresponding coefficient of determination; RMSECV and R²cv: the root mean square error of cross-validation and the corresponding coefficient of determination.
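The backward elimination idea behind the MLR models can be sketched as follows: fit the model with all candidate wavelengths, drop the weakest predictor, and refit until every remaining predictor is clearly significant. This is a simplified illustration, not the SPSS procedure used in the study. The removal rule (|t| below an assumed threshold `t_min` of 2.0), the function names, and the simulated data are all assumptions.

```python
import random
from math import sqrt

def solve_inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_t_values(columns, y):
    """Fit y = b0 + sum(b_k * x_k); return |t| for each predictor (intercept excluded)."""
    n, p = len(y), len(columns) + 1
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = solve_inverse(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(p)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # residual variance
    return [abs(beta[k]) / sqrt(sigma2 * inv[k][k]) for k in range(1, p)]

def backward_eliminate(predictors, y, t_min=2.0):
    """Drop the weakest predictor until every remaining |t| is at least t_min."""
    kept = dict(predictors)
    while kept:
        names = list(kept)
        t = ols_t_values([kept[k] for k in names], y)
        weakest = min(range(len(names)), key=lambda k: t[k])
        if t[weakest] >= t_min:
            break
        del kept[names[weakest]]
    return list(kept)

# Simulated data: amylose depends on reflectance at 450 and 1650 nm only
rng = random.Random(1)
n = 40
bands = {wl: [rng.uniform(20, 60) for _ in range(n)] for wl in (450, 1300, 1650)}
amylose = [0.5 * a - 0.3 * c + rng.gauss(0, 0.5) for a, c in zip(bands[450], bands[1650])]
kept = backward_eliminate(bands, amylose)
print(kept)
```

In the simulated data the 1300 nm column carries no information about the response, so it is the candidate most likely to be removed, while 450 and 1650 nm are retained.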

Validation of MLR regression models
To validate the estimation ability of the MLR models built with the most significant wavelengths, we input the reflectance values in the validation data set (n = 37) to estimate the concentrations of amylose, RS, DS, and total starch. Then, the model-estimated values were compared with the measured values. Scatter plots and the correlation coefficient R² values between the estimated and measured values are shown in Fig. 2. All MLR models performed very well in estimating the starch components; the R² values ranged from 0.80 to 0.97, indicating a very good fit. The vis-NIR spectral models performed best in estimating amylose and total starch (R² = 0.97 and 0.93, respectively), followed by RS and DS (R² = 0.80 and 0.85, respectively).

Correlations among amylose, RS, DS, and total starch
Results from the correlation analyses showed strong correlations (P < 0.001) among the starch components (Table 5), except between DS and RS. Amylose was significantly correlated with total starch, RS, and DS (P < 0.001). The correlation between pea amylose and RS in this study was in agreement with previous studies on maize. [20] The lack of a significant correlation between RS and DS was likely due to the small percentage of RS (0.9-11.7%) compared to DS (25-86%). Total starch was more strongly associated with DS than with RS, because the variation in RS was relatively small (Table 1).

Conclusion
This study demonstrates that vis-NIR reflectance spectroscopy can be used to estimate the starch components in dry pea flours. We successfully developed MLR models to simultaneously estimate amylose, RS, DS, and total starch. The MLR models used the 8-13 most correlated wavelengths, which makes them simpler and easier to use than the PLS regression models that used the full spectrum (300-2300 nm). The MLR models had coefficients of determination (R²) from 0.76 to 0.95 and showed good correlations between the model-estimated and laboratory-measured values, with correlation coefficients from 0.80 to 0.97.