Near Infrared Spectroscopy technology for prediction of chemical composition of natural fresh pastures

ABSTRACT This study evaluates the potential of Fourier-Transform Near Infrared Spectroscopy (FT-NIRS) to estimate the chemical composition of fresh natural pastures of Tuscany without previous drying and grinding. The chemical composition of herbage samples was determined by conventional wet chemistry. FT-NIRS calibration and cross-validation models were developed by applying spectral pre-treatments and two statistical models: partial least squares regression and principal component regression. The results were evaluated in terms of coefficient of determination (R2), root mean square error (RMSE) and residual prediction deviation (RPD). Partial least squares models obtained an R2 in calibration greater than 0.95 for dry matter and crude protein, intermediate values (>0.75) for the fibre fraction and lower values (<0.75) for ash and crude fat. Principal component regression gave lower results than partial least squares, although dry matter and acid detergent fibre obtained relatively high R2 in calibration (0.876 and 0.863, respectively). Cross-validation gave both lower R2 and higher errors than calibration. Despite the wide variability of the data set, the results suggest that coupling FT-NIRS with partial least squares analysis allows some chemical parameters of natural pastures to be estimated, while the use of principal component regression models needs further evaluation.


Introduction
Reliable assessment of the nutritive value of feed is a prerequisite for a qualitative and quantitative approach in animal nutrition. Since its introduction, Near Infrared Reflectance Spectroscopy (NIRS) has been recognized as a tool for determining the chemical composition of dried forages (Norris et al. 1976). Today, NIRS is increasingly used in many research areas as a flexible technology to predict quantitative and qualitative parameters of animal production (Prieto et al. 2009) and of other products of agricultural origin, as well as of various industrial, pharmaceutical and bioenergetic materials (Roberts et al. 2003).
Unlike chemical analysis, NIRS is a non-destructive method that requires a small amount of sample and produces little chemical waste (Park et al. 1998; Stuth et al. 2003); furthermore, this technology can evaluate multiple constituents (Roberts et al. 2003). The method is based on the absorption of radiation in the near-infrared region of the electromagnetic spectrum by certain molecular groups, particularly those involving hydrogen bonds (Deaville and Flinn 2000), which are the primary constituents of the organic compounds of plant and animal tissues (Foley et al. 1998). The spectrum obtained may act as a 'fingerprint' of the sample (Woodcock et al. 2008) and can be linked with its secondary characteristics. A further advancement of NIRS technology is Fourier Transformation (FT-NIRS), which improves signal-to-noise ratio, spectral resolution and wavenumber accuracy and reduces scan time (Shiroma and Rodriguez-Saona 2009; Dvořáček et al. 2012).
In all cases, NIRS is an indirect method, and chemometric analysis is necessary to relate spectral results to samples of known composition (Shenk et al. 1992). A multivariate model, often involving a large number of regression procedures, is constructed by developing a regression equation between spectral absorbance and the characteristic of interest obtained by traditional laboratory analyses (Westerhaus 1991, 1993; Deaville and Flinn 2000; Roberts et al. 2003).
Nowadays, statistical learning techniques are numerous, ranging from the most restrictive, such as regression models, to more flexible approaches (e.g. ANN, SVM, Random Forest and LS-SNV), which usually require a long and laborious tuning phase and in which the association between predictors and responses is difficult to establish. Among chemometric methods, the most commonly used in NIRS analysis is partial least squares (PLS), a multivariate regression method widely adopted for its simplicity, speed and good performance. This statistical model extracts the information in the NIRS spectra and the reference data and compresses it into a set of new independent latent variables (LVs) (Adams 1995; Kovalenko et al. 2006). Nevertheless, other methods, such as principal component regression (PCR), can also be an effective data mining technique: PCR reduces the number of variables by extracting information from the NIR spectra and compressing it into a few principal components (PCs) (Park et al. 2003).
Among the various factors influencing the prediction capacity of NIRS, important roles are played by the molecular vibration mechanisms underlying the spectrum, the mathematical and statistical procedures performed, and the presentation and preparation conditions of the sample (Prieto et al. 2009).
In animal nutrition, NIRS has been successfully used to predict the chemical composition and digestibility of dried and milled hay, silage and feedstuffs, as reported by Roberts et al. (2003), but, to our knowledge, only a few publications have studied the capacity of NIRS to predict parameters of pastures or complex botanical resources (Danieli et al. 2004; Andrés et al. 2005; Lobos et al. 2013; Parrini et al. 2017). Even fewer studies address the use of fresh samples and the effect of sample preparation. Alomar et al. (2009) considered fresh herbage from pastures of Southern Chile, while Reddersen et al. (2013) studied the effect of sampling conditions (standing sward, silage, hay/chopping and milling) on the determination of nitrogen, ash and NDF on an organic matter basis (NDFom) in fresh grassland biomass. Furthermore, some studies aimed at predicting botanical composition have developed models to discriminate between grass and one or more leguminous species, but few authors have considered strategies for grass mixtures (Cougnon et al. 2013).
Evaluating the nutritional characteristics of the fresh product by FT-NIRS may allow a further reduction in the cost and time of analysis and, in animal nutrition, could be particularly useful for mixed natural pastures whose composition changes over time. Nevertheless, because of its natural water content, fresh herbage cannot be ground in a traditional mill, and this can affect the performance of NIRS calibration.
The aim of our research was the evaluation of FT-NIRS as an easy and fast method for the assessment of the chemical composition of natural pasture herbage, considering fresh herbage with a high water content, direct sample scans (large particle size) and an 'open population' of herbage with mixed species and applying two statistical models that use PLS regression and PCR.

Forage samples set
This study was carried out on 100 real samples collected from March to November in 2013-2014, spanning the full vegetative period of the natural and naturalized pastures. The samples were harvested in hilly and mountainous areas of Tuscany where pastures were grazed or had the potential to be grazed. As described in Parrini et al. (2017), each sample was obtained from an area of 1 m² representative of the pasture. Each sample was composed of a different number of species, and the proportion of each species in the collected samples was highly variable. The predominant herbage species were as follows: Avena fatua L, Capsella bursa pastoris L, Dactylis glomerata L, Festuca ovina L, Festuca pratensis H, Holcus lanatus L, Lolium perenne L, Poa pratensis L, Poa annua L, Trifolium pratense L, Trifolium repens L, Ranunculus bulbosus L, Taraxacum officinale GH Weber ex Wiggers.

Sample preparation, FT-NIR spectral acquisition and chemical analysis
Each fresh sample was cut to 2-4 cm with hand shears and mixed by hand. For each sample, three randomly subsampled aliquots were scanned in absorbance mode in a cup spinner using an FT-NIRS Antaris II instrument (Thermo Scientific). For each aliquot, the spectral measurement was obtained from 32 scans performed at a wavenumber resolution of 4 cm⁻¹ over the range of 4000-9999 cm⁻¹ and corrected against the background spectrum of the room environment, which was acquired routinely. The average spectrum of the three measurements (Figure 1) was used as the final spectrum of each sample to assess the prediction potential of FT-NIRS.
After spectra collection, each sample was dried in a forced-air oven at 60°C to constant weight, then ground in a mill (Brabender OHG, Duisburg) to pass a 1 mm screen and analysed for the main chemical components.
The chemical analysis was performed according to AOAC (2012) protocol: dry matter (DM) content using the 934.01 method, crude protein (CP) by the 976.05 method, ash via the 942.05 procedure, ether extract (EE) using the 2003.05 method, acid detergent fibre inclusive of residual ash (ADF), and Lignin (sa) using the 973.18 method. Neutral detergent fibre, inclusive of residual ash (NDF), has been determined according to the procedure described by Van Soest et al. (1991).

NIRS calibration
Calibration and validation models were obtained by correlating FT-NIRS pre-processed spectral data with the results of wet chemistry. Mathematical pre-treatment of spectra and outlier evaluation were performed using the chemometric software Result-TQ Analyst 8.6.12 (Thermo Fisher Scientific 2011). Multiplicative scatter correction (MSC) was applied to all spectra in order to eliminate optical interference (Martens et al. 1983), as well as physical effects on the spectra such as particle size and surface blaze (Maleki et al. 2007). Moreover, outlier spectra were identified considering a confidence limit of 5% and removed when necessary.
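As an illustration of the MSC step, the following minimal Python sketch regresses each spectrum on a reference spectrum and removes the fitted offset and slope. It is not the TQ Analyst implementation: the function name msc, the choice of the mean spectrum as reference and the toy absorbance values are assumptions made for the example.

```python
def msc(spectra):
    """Multiplicative scatter correction (sketch).

    Each spectrum s is modelled as s = intercept + slope * reference,
    where the reference is assumed here to be the mean spectrum of the
    set; the corrected spectrum is (s - intercept) / slope.
    """
    n = len(spectra)
    m = len(spectra[0])
    # Reference spectrum: mean absorbance at each wavenumber.
    ref = [sum(s[j] for s in spectra) / n for j in range(m)]
    ref_mean = sum(ref) / m
    sxx = sum((r - ref_mean) ** 2 for r in ref)

    corrected = []
    for s in spectra:
        s_mean = sum(s) / m
        # Ordinary least squares fit of the spectrum on the reference.
        sxy = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s))
        slope = sxy / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


# Toy spectra (invented values): the same shape with different
# additive and multiplicative scatter effects.
base = [0.1, 0.4, 0.9, 0.3]
spectra = [base,
           [2 * v + 0.5 for v in base],
           [0.5 * v - 0.1 for v in base]]
corrected = msc(spectra)
```

In this toy case the three spectra differ only by an offset and a scale factor, so after correction they all coincide with the mean spectrum; real spectra would keep their chemical differences.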
Finally, some mathematical spectra pre-treatments were used in order to optimize the extraction of useful information from the spectra. Each calibration model was optimized applying the first-order derivative, data normalization and correction for constant error, choosing the combination that provided the best result in terms of coefficient of determination.
Chemometrics were applied using two different linear methods of data analysis, PLS regression and PCR, both developed using the software TQ Analyst 8.6.12 (Thermo Fisher Scientific 2011). For each chemical constituent, an individual model was developed, and the number of PLS factors or PCs retained was the one giving the lowest error in cross-validation, based on the PRESS (predicted residual error sum of squares). The two approaches used the same set of samples and mathematical pre-treatments.
To test the robustness of the models, both PLS regression and PCR were fully cross-validated using the 'leave-one-out' method, in which a single sample is removed, the model is rebuilt without it and the removed sample is then predicted; this is repeated for every sample.
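The selection of the number of factors by the lowest PRESS under leave-one-out cross-validation can be sketched as follows. This is a minimal illustration, not the TQ Analyst procedure: it assumes a single-response PLS fitted with the NIPALS algorithm, and the function names (pls1_fit, pls1_predict, loo_press) and the small data set are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_factors):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    factors = []
    for _ in range(n_factors):
        # Weight vector: direction of maximum covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                         # scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(f, t) / tt
        # Deflate X and y before extracting the next factor.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, p, q))
    return x_mean, y_mean, factors


def pls1_predict(model, x):
    x_mean, y_mean, factors = model
    e = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in factors:
        t = dot(e, w)
        y_hat += q * t
        e = [v - t * pj for v, pj in zip(e, p)]
    return y_hat


def loo_press(X, y, n_factors):
    """PRESS: sum of squared errors of the left-out predictions."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    return press


# Invented, noise-free data: y depends linearly on three predictors.
X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0], [1.0, 3.0, 1.0],
     [3.0, 2.0, 2.0], [2.0, 0.0, 3.0], [0.0, 2.0, 3.0]]
y = [2 * a - b + 0.5 * c + 3 for a, b, c in X]
press_by_factors = {k: loo_press(X, y, k) for k in (1, 2, 3)}
best_n_factors = min(press_by_factors, key=press_by_factors.get)
```

With real spectra the PRESS curve typically flattens or rises after a few factors, and the retained number is the one at the minimum; the same loop applies to PCR by replacing the fitting function.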
The best calibration equation between chemical reference values and FT-NIRS data was selected on the basis of the highest coefficient of determination and the smallest root mean square error (RMSE) in calibration (R²c and RMSEC) and in cross-validation (R²cv and RMSECV), respectively. RMSE, which gives information about the adjustment of the model to the calibration data, was calculated from the number of samples, the results of the reference analysis and the values estimated by the NIR model.
Goodness and accuracy of models were tested using the residual prediction deviation (RPD) calculated, according to Williams and Sobering (1995), as the ratio of the standard deviation of reference values to the root mean square error in calibration (RMSEC) and in cross-validation (RMSECV).
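The three statistics used to judge the models can be written compactly as in the sketch below. The exact conventions of the software are not stated in the text, so the following are assumptions: RMSE divides the squared errors by the number of samples, the standard deviation is the sample standard deviation (n - 1), and R² is computed as one minus the ratio of residual to total sum of squares. The reference and predicted values are invented for the example.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def r_squared(reference, predicted):
    """Coefficient of determination as 1 - SS_residual / SS_total."""
    mean = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


def rpd(reference, predicted):
    """Residual prediction deviation: SD of reference values / RMSE."""
    return sample_sd(reference) / rmse(reference, predicted)


# Invented reference (wet chemistry) and NIR-predicted values.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0]

print(rmse(reference, predicted))       # 1.0
print(r_squared(reference, predicted))  # 0.875
print(rpd(reference, predicted))        # about 3.16
```

Applied to calibration predictions these functions give R²c, RMSEC and RPDc; applied to the leave-one-out predictions they give R²cv, RMSECV and RPDcv.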

Results
The range, mean and standard deviation of the chemical constituents (DM, CP, Ash, EE, NDF, ADF and Lignin-sa) determined by traditional analysis in the samples used for calibration and cross-validation are summarized in Table 1. The values showed a high variability in the chemical parameters of the forage sampled, which reflected not only the large number of herbaceous species among pastures but also tissue ageing due to differences in the phenological stage of the samples.
A preliminary chemometric analysis restricted to the main wavenumber regions associated with organic compounds (O-H and C-H overtones, stretching and their combinations) did not show a remarkable difference with respect to the analysis that considered the full range of the NIRS spectra. Hence, this study only reports the results obtained using the full NIRS spectra.
The results of the best calibrations and the corresponding mathematical treatments, obtained by FT-NIRS analysis for each parameter using PLS regression, are shown in Table 2. In the set of fresh samples analysed, the best models were obtained for DM and CP, intermediate calibrations were obtained for the fibrous fraction (NDF, ADF, Lignin-sa), while the lowest results were shown by ash and crude fat. Excluding these last two parameters, the R²c values in calibration were always higher than 0.78; in the cross-validation model, R²cv values were higher than 0.85 and RMSECV ranged between 0.2 and 4. The relation between RMSECV, R²cv and the number of PLS factors for DM, CP and NDF is represented in Figure 2. The two statistics showed a decreasing or increasing trend up to 4-5 PLS factors, with no substantial changes after this point. This trend was shown by all parameters considered.
RPD for PLS models in calibration showed values higher than 3 for dry matter and crude protein, followed by ADF, while values around 2 were obtained for the other parameters, excluding lipids. In cross-validation, RPDcv values were always lower, with results between 2.2 and 2.7 for CP, ADF and DM, between 1.5 and 1.8 for NDF, Lignin (sa) and ash, and only 1.2 for crude fat. The total variance explained by the PCs used in the models is reported in Table 2, ranging from 94.4% to 99.9%. Furthermore, for all the parameters considered, the fraction of the variance explained by each factor is reported in supplementary Table 1. A limited number of PCs (from 3 to 7) is needed to fully explain the variability of the trait.
The performance of PCR is given in Table 3. This chemometric analysis shows lower results with respect to PLS models; nevertheless, DM and ADF obtained relatively high R²c in calibration (0.876 and 0.863, respectively). Cross-validation presented lower R²cv compared to PLS, except for ash and crude fat. RMSEC of PCR models in calibration is similar to PLS regressions, except for DM and CP with error values higher than 3, and a similar trend was observed in cross-validation, always with higher errors for DM and CP. Using PCR, RPD results were always lower than PLS performance, indicating that this model can contain less clear information of parameters considered.
Table abbreviations: R²c, coefficient of determination in calibration; PC, principal component; RMSEC, root mean square error of calibration; RPDc, residual prediction deviation of calibration; R²cv, coefficient of determination in cross-validation; RMSECV, root mean square error of cross-validation; RPDcv, residual prediction deviation of cross-validation; Math tr., math treatment: 1. multiplicative scatter correction, 2. first derivative, 3. data normalization, 4. correction for constant error.
Figure 2. Relation between RMSECV and R²cv, and PLS factors. RMSECV: root mean square error in cross-validation; R²cv: coefficient of determination in cross-validation. Series are shown for dry matter (DM), crude protein (CP) and neutral detergent fibre (NDF).

Discussion
The high variability observed in the chemical composition of the samples is desirable, both to develop NIRS calibration models with a wide representation of the population and to assess the applicability of the models to future predictions on pasture. According to Fekadu et al. (2010), the best performance in calibration equations corresponded to those traits for which the range of variability in the data set was wider, indicating that successful estimation using NIRS depends on the variability of the constituents under investigation. Nevertheless, the number, and consequently the size, of samples from mixed natural pastures can provide less precise NIRS information than the calibration of a classical narrow and closed population (Vance et al. 2016).
Pearson correlation coefficients between data and wavelength were higher using PLS than PCR for almost all the parameters considered. PLS also required a smaller number of components, probably because PLS performs a supervised dimensionality reduction through a joint covariance structure between the response and the explanatory variables (Abdi 2010). PCR, on the contrary, is a two-step approach. First, principal component analysis (PCA) is applied to the matrix of explanatory variables X for dimensionality reduction and to resolve potential multicollinearity problems in X; at this step, PCA simply describes the variability in X. Then, PCs are selected as independent variables in a regression predicting the response variable. The question is which PCs to select and in which order. For instance, it has been shown that although the first PCs might capture a significant proportion of the variance in X, as predictors in a PCR model they might be less important than the last PCs (Hadi and Ling 1998). Thus, PCR might be underpowered compared to other regression models, e.g. PLS. The effect of various methods for extracting and selecting PCs in PCR for predicting genomic breeding values in cattle has been investigated by Dadousis et al. (2014).
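The difference between the unsupervised first step of PCR and the supervised direction used by PLS can be seen in a deliberately extreme toy case. In the sketch below, which uses invented data and hypothetical function names, the first predictor carries almost all the variance of X but is unrelated to y, while the second has little variance and determines y entirely. The leading PC is found by power iteration, a simple stand-in for a full PCA.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def center_columns(X):
    cols = [center(list(col)) for col in zip(*X)]
    return [list(row) for row in zip(*cols)]


def first_pc(Xc, iterations=200):
    """Leading principal direction of centred X (power iteration on X'X)."""
    v = [1.0] * len(Xc[0])
    for _ in range(iterations):
        scores = [dot(row, v) for row in Xc]
        v = [dot(col, scores) for col in zip(*Xc)]
        norm = dot(v, v) ** 0.5
        v = [c / norm for c in v]
    return v


def pls_weight(Xc, yc):
    """First PLS weight vector: normalised covariance of X with y."""
    w = [dot(col, yc) for col in zip(*Xc)]
    norm = dot(w, w) ** 0.5
    return [c / norm for c in w]


def one_component_r2(Xc, yc, direction):
    """R2 of regressing centred y on the scores along one direction."""
    scores = [dot(row, direction) for row in Xc]
    slope = dot(scores, yc) / dot(scores, scores)
    ss_res = sum((y - slope * t) ** 2 for y, t in zip(yc, scores))
    return 1.0 - ss_res / dot(yc, yc)


# Invented data: column 0 has large variance but no link with y;
# column 1 has small variance and y = 2 * column 1 + 5.
X = [[-10.0, -1.0], [10.0, -1.0], [-10.0, 1.0], [10.0, 1.0]]
y = [3.0, 3.0, 7.0, 7.0]

Xc, yc = center_columns(X), center(y)
r2_pcr = one_component_r2(Xc, yc, first_pc(Xc))        # first PC only
r2_pls = one_component_r2(Xc, yc, pls_weight(Xc, yc))  # first PLS factor only
```

Here the one-component PCR explains essentially none of the variation in y, whereas the first PLS factor explains all of it. Real spectra are far less clear-cut, but the example shows why PLS may need fewer components than PCR.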
According to Lobos et al. (2013), PLS models seem to have a greater capacity to estimate the chemical composition of mixed herbage, although some authors (Danieli et al. 2004) suggested that other chemometric regressions, such as Multiple Linear Regression (MLR), fit chemical data better.
FT-NIRS prediction capacity using fresh samples was lower than that reported for pre-dried samples (Danieli et al. 2004; Andrés et al. 2005). Applying NIRS to dried and ground herbage of Tuscany, Parrini et al. (2017) obtained values of R²c, R²cv and RPD always higher than those reported in the present study on fresh matter. This result could be linked to the presence of water and to the large particle size resulting from sample preparation. In fact, the spectrum and the reliability of NIRS prediction can be complicated by a multitude of factors (Prieto et al. 2009). In particular, the high water content of fresh samples could cause non-linear responses due to the strong absorption signals in the NIR spectra (Reeves 1994; Williams 2001). Furthermore, fresh samples cut with hand shears and mixed by hand were less homogeneous than dried and ground samples; this lower homogeneity could influence the calibration process and reduce the accuracy of estimation (Prieto et al. 2009; Reddersen et al. 2013).
Dry matter estimation, connected to the strong -OH absorption in NIRS spectra, showed a coefficient of determination higher than 0.87 in every model considered. PLS showed greater accuracy and lower errors than PCR both in calibration (1.77 vs 3.25) and in cross-validation (2.7 vs 3.8). Alomar et al. (2009), on samples of Chilean pastures, obtained better cross-validation models on cut fresh samples using a reflectance technique than using an interactance reflectance mode. Nevertheless, the range of moisture values in our samples was larger than that of the samples analysed by Alomar et al. (2009), in which dry matter content was in the range of 92.10-359.80 g/kg.
Crude protein estimation, linked to N-H absorption (Roberts et al. 2003), showed coefficients of determination of 0.95 and 0.88 in calibration and cross-validation, respectively, using PLS models. Our RMSE of calibration was higher than those reported by Fekadu et al. (2010), Andrés et al. (2005) and Parrini et al. (2017), who analysed dried herbage from pastures located in Ethiopia (0.92), Spain (1.02) and Italy (1.21), but lower than the RMSE reported by Danieli et al. (2004) (Italy; 4.91). In comparison with the R²c shown here, Alomar et al. (2009), using Modified PLS on fresh herbage, obtained a higher value (+0.05) employing cross-validation and reflectance mode but a lower value using cross-validation and interactance mode. Estimation of CP using the PCR method was always less accurate, with R²c and R²cv of 0.73 and 0.64 in calibration and cross-validation, respectively, and RMSE values higher than those obtained with PLS.
Acid detergent fibre showed the best coefficient of determination among the fibrous fractions irrespective of the regression model used, in line with other research studies. Alomar et al. (2009), working on fresh herbage samples, also showed better results for ADF than for NDF, with R²cv values of 0.90 and 0.80 using a reflectance mode and 0.66 and 0.63 with an interactance reflectance mode, respectively. Reddersen et al. (2013), working on fresh standing sward and using distance field spectroscopy, obtained lower results for NDFom than ours. After all, NDF is a component that represents many constituents of the grass cell (structural carbohydrates and Lignin (sa)), and there is no direct connection between the NIRS spectrum and the constituent.
Lignin (sa), not considered in other studies on fresh pasture herbage, showed the lowest R² among the fibrous fractions. Nevertheless, on dried samples of mixed pasture, Danieli et al. (2004), Andrés et al. (2005) and Fekadu et al. (2010) reported lower or similar results for this constituent and attributed the unsatisfactory results to the negative influence of the chemical methods used as reference. In particular, Roberts et al. (2003) attributed modest calibration results and high errors to the Lignin (sa) procedure. Nevertheless, considering that Parrini et al. (2017) obtained higher results for acid detergent Lignin (sa) on dried samples, the lower accuracy in our work could be attributed to the interference of water in the samples.
Ash and, even more so, lipids cannot be confidently predicted by NIRS with this sample scanning procedure. Ash showed R²c and R²cv values below the average of the other parameters both in calibration and in cross-validation. The limited results for ash can be related to the absence of energy absorption in the near-infrared region by inorganic substances such as minerals. Nevertheless, in some cases this estimation has been possible, probably owing to the correlation of ash with organic compounds and with water, which absorb at a large number of wavelengths and can therefore give significant information to the prediction models.
Furthermore, even if we do not have results for single mineral components, considering that potassium is the major component of grass ash and that it is present almost entirely as aqueous ions, our ash results could be spuriously associated with the water content, which also absorbs in this specific spectral region.
In any case, PLS regression performed slightly better than PCR. Our cross-validation results were similar to those of Reddersen et al. (2013) on fresh standing swards, but lower than those obtained on dried samples by Parrini et al. (2017).
The calibration of lipids should be possible owing to the characteristic aliphatic -CH absorption. In this study, the R²c was lower than 0.6, but the errors were not high (0.25-0.28). PLS models obtained results similar to PCR in calibration but better performance in cross-validation. Nevertheless, lipid prediction in grass samples is considered uncommon owing to the low tissue concentration and the narrow ranges in forage plants (Roberts et al. 2003; Stuth et al. 2003). Roberts et al. (2003) reported that the calibration of lipids in different samples of dried forage always showed lower results, whereas Stuth et al. (2003) reported mixed results in forage measurements due to the low variance linked to small concentrations. Furthermore, as reported by Restaino et al. (2009), a narrow range in composition and an estimation error that is large relative to the variability in composition do not allow stable NIRS calibrations to be obtained.
RPD is the ratio between the standard deviation of the reference data and the RMSE, and it is often used as a measure of fit to determine the applicability of a model. In the literature, there are many classifications of RPD values, depending on NIRS methods and substrates. Williams (2001) suggests a value of RPD > 2.4 for a good model, while Williams and Sobering (1995) recommend a value of 3 or more. Our PLS regression models showed RPD values larger than 3 for DM, CP and ADF in calibration, demonstrating that the calibration models fitted the reference data well. In cross-validation, RPD results might be considered adequate for DM and ADF and intermediate for CP, NDF and Lignin (sa), whereas the estimation of the other parameters is not suitable for practical use. Overall, our analysis supports the use of FT-NIRS with PLS for the prediction of some chemical components of fresh herbage, whereas the models for the other constituents need further study. These differences suggest that the calibration models might be sensitive to the range of samples used; thus, a specific range of reference values and different landscapes and sample preparations should be evaluated in order to verify the robustness of the NIRS models.
The variance explained by the PCs used in the models suggests that the first two PCs explain 98% of the variability, except for NDF and protein. Further studies could assess the use of models built on a limited number of components that nevertheless represent the full variability.
On the contrary, PCR, with its always lower RPD values, cannot be confidently used for the estimation of fresh herbage components of natural pasture. According to Lovett et al. (2005), these results suggest that the regression model employed has a fundamental role in the accuracy of NIRS calibration.

Conclusion
The results obtained in this work showed the potential of FT-NIRS technology to estimate the chemical composition of botanically complex material such as fresh herbage of natural pasture, particularly for DM, CP and ADF contents. However, RPD values in cross-validation suggest that only some parameters are barely suitable for practical application, while a differentiation between low and high values may be possible for other components, and results are not acceptable for lipids. The use of FT-NIRS on samples of fresh grasses performed better with PLS than with PCR; however, the calibration results leave room for further evaluation that should consider a higher number of samples and different pasture areas.
The power of NIR spectroscopy is certainly complicated by the inherent increase in variability associated with natural pasture: calibration and validation become more challenging when environmental variation is high. However, this variation is itself the attribute of interest, and in an agrosystem that includes plants and animals, these factors and their interactions do not allow sample collection to be standardized. Finally, the implementation of FT-NIRS on diversified grassland populations could bring many advantages in animal nutrition, even more so if fresh samples are considered.