Prediction of soil carbon levels in calcareous soils of Iran by mid-infrared reflectance spectroscopy

ABSTRACT The objective of this study was to assess the predictive performance of midDRIFTS-PLSR models in quantifying total carbon (TC), total organic carbon (TOC), total inorganic carbon (TIC), total nitrogen (TN), hot-water-extractable carbon and nitrogen (CHWE, NHWE), pH, and the clay, silt, and sand contents of soils. A total of 68 soil samples were collected across an agroecological region in southwest Iran and analyzed in the laboratory by midDRIFTS. midDRIFTS-PLSR calibration models were developed for each soil property and externally validated on an independent subset of samples. The calibration and validation models allowed sufficient prediction of TC, TIC, and TOC, with residual prediction deviations (RPD) ≥3 and R² values >0.9. The precise, rapid, and inexpensive prediction of carbon fractions such as TC, TIC, and TOC confirmed that midDRIFTS analysis is a high-throughput and cost-effective technique for monitoring soil carbon at the regional scale.


Introduction
Soil provides a large carbon pool that responds quickly to environmental changes and global warming [1,2]. Soil organic carbon (SOC) has several functions and provides ecosystem services through its crucial role in carbon cycling and greenhouse gas exchanges between the soil and atmosphere. It is also widely acknowledged to be one of the most significant factors affecting soil fertility and plant growth [3]. Therefore, SOC monitoring is given a high priority in sustainable management strategies and environmental assessments [4].
Assessing soil quality and selecting suitable indicators of soil quality remain challenging [5,6]. Because soil characteristics vary spatially and temporally, soil monitoring is essential for the early diagnosis of changes in soil quality [7].
Several studies have proposed programs for monitoring soil quality [8,9]. Most of the proposed monitoring schemes use a data set of important physical, biological, and chemical soil properties [10]. Soil characteristics are selected from this data set and normalized to obtain a global soil quality index for a particular soil property or specific ecosystem [8,9]. However, monitoring soil quality indicators is expensive and time-consuming, because detecting changes in soil quality [7] requires long-term sampling (five to ten years) and many soil analyses within soil monitoring networks [11].
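As a rough illustration of how such a normalized index can be built, the sketch below applies linear min–max scoring ("more is better" or "less is better") to each indicator and averages the scores into an additive index. This is one common scheme, assumed here for illustration only; the cited programs [8,9] may use other scoring functions and weights, and the indicator names and values are hypothetical.

```python
import numpy as np


def linear_score(values, higher_is_better=True):
    """Min-max scale one indicator to 0..1 across samples (linear scoring)."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.ones_like(v)  # no variation: every sample gets the full score
    scaled = (v - v.min()) / span
    return scaled if higher_is_better else 1.0 - scaled


def additive_sqi(indicators, directions):
    """Average the normalized indicator scores into one index per sample."""
    scores = [linear_score(indicators[name], directions[name]) for name in indicators]
    return np.mean(scores, axis=0)


# Hypothetical indicator values for four samples (illustrative, not study data)
indicators = {
    "toc_percent": [4.3, 1.4, 1.6, 2.5],   # assumed "more is better"
    "bulk_density": [1.1, 1.5, 1.4, 1.3],  # assumed "less is better"
}
directions = {"toc_percent": True, "bulk_density": False}
sqi = additive_sqi(indicators, directions)
print(np.round(sqi, 2))
```

Weighted sums or non-linear (sigmoid) scoring curves are common alternatives; the equal-weight average is used here only because it is the simplest to read.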
Proximal soil sensing (PSS) is a modified sampling design that benefits from the use of cheaper laboratory measurements and portable field spectrometers. Spectroscopic methods, such as portable X-ray fluorescence (PXRF) devices and visible-infrared (Vis-IR) spectroscopy, are used in PSS and can provide rapid measurements of soil contaminants [12,13].
Many studies have shown that both the visible–near-infrared (Vis-NIR; 400–700 nm visible and 700–2500 nm near-infrared) and the mid-infrared (MIR; 2500–25,000 nm) ranges are suitable for estimating soil properties such as total carbon (TC), SOC, total nitrogen (TN), and pH [14,15,16]. However, differentiation of the carbon and nitrogen pools in soils by infrared spectroscopy has not yet been achieved [17,18].
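Spectral ranges are quoted here in nanometres but elsewhere in wavenumbers (cm⁻¹). Because 1 cm = 10⁷ nm, the conversion is wavenumber (cm⁻¹) = 10⁷ / wavelength (nm). The small helpers below (hypothetical names) make the conversion explicit.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1e7 / wavelength_nm


def wavenumber_to_nm(wavenumber_cm):
    """Convert a wavenumber in cm^-1 back to a wavelength in nm."""
    return 1e7 / wavenumber_cm


# Boundaries of the Vis, NIR and MIR ranges quoted in the text
for nm in (400, 700, 2500, 25000):
    print(f"{nm} nm = {nm_to_wavenumber(nm):.0f} cm^-1")
```

With this conversion, the MIR range of 2500–25,000 nm corresponds to 4000–400 cm⁻¹, which contains the 4000–600 cm⁻¹ range scanned in this study.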
Various studies that have estimated soil properties have shown mid-infrared spectroscopy (MIRS) to be more useful than near-infrared spectroscopy (NIRS). For example, Reeves et al. [19] found that MIRS generally achieved better results than NIRS for different soil carbon pools. McCarty et al. [20] and McCarty and Reeves [21] reported that TC, SOC, TN, and pH predictions using vis-NIRS and NIRS were not as good as those obtained using MIRS. Michel et al. [22] reported that MIRS could be a powerful method for estimating carbon and nitrogen in artificial soil mixtures. Vohland et al. [23] predicted soil properties such as TOC, TN, microbial biomass carbon (Cmic), hot-water-extractable carbon (CHWE), and soil pH by mid-infrared diffuse reflectance infrared Fourier transform spectroscopy (midDRIFTS) in a study of 60 samples using cross-validation models. However, some studies have found that vis-NIRS was better than MIRS for estimating exchangeable aluminum and potassium [12].
Although infrared-based estimations of soil properties may not be as precise as classical laboratory analyses, they are useful for many research goals and applications [16]. midDRIFTS coupled with partial least squares regression (PLSR) is a technique used to predict a variety of soil properties [12,22]. Despite the successful application of infrared spectroscopic analysis for predicting a range of different soil properties, most studies that have used this technique have concentrated on soils from temperate and tropical regions. Few studies have used spectral data to quantify soil properties at different scales and under different land uses in arid and semiarid regions, such as Iran.
Soils in Iran are predominantly calcareous and contain a diverse range of carbonate minerals. In calcareous soils, carbonate absorption interferes with the spectral bands of soil organic matter (SOM) functional groups, which is an important issue for spectral analysis. Several absorbance bands in the MIR range are influenced by soil carbonate, including those at 2686–2460, 1850–1784, 1567–1295, 889–867, 734–719, and 719–708 cm⁻¹ [24,25]. Overlap between organic matter and carbonate bands may therefore reduce model accuracy and prevent the use of particular spectral peaks for analysis [26], which can be problematic when monitoring very heterogeneous soils. Carbonate interference must thus be taken into account when calcareous soils are studied. The main objective of our study was to verify the ability of midDRIFTS-PLSR models to predict soil carbon fractions in calcareous soils from southwest Iran.
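To check how much of a spectrum these carbonate-influenced bands occupy, or whether the regions selected by a model overlap them, one can build a boolean mask over the wavenumber axis. The sketch below encodes the bands listed above; the 4 cm⁻¹ grid spacing and the function name are assumptions for illustration, and the text does not state that these bands were excluded from the models.

```python
import numpy as np

# Carbonate-influenced MIR bands (cm^-1) as listed in the text [24,25]
CARBONATE_BANDS = [(2686, 2460), (1850, 1784), (1567, 1295),
                   (889, 867), (734, 719), (719, 708)]


def carbonate_mask(wavenumbers, bands=CARBONATE_BANDS):
    """Return True where a wavenumber lies inside any carbonate-influenced band."""
    wn = np.asarray(wavenumbers, dtype=float)
    mask = np.zeros(wn.shape, dtype=bool)
    for high, low in bands:
        mask |= (wn <= high) & (wn >= low)
    return mask


# Example grid: 4000-600 cm^-1 at an assumed 4 cm^-1 spacing
wn = np.arange(4000, 599, -4)
mask = carbonate_mask(wn)
print(f"{mask.mean():.1%} of points fall in carbonate-influenced bands")
```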

Study area and soil sampling
Soil samples were collected at 20 locations in Lorestan Province in southwest Iran. All soils in Lorestan Province are calcareous, with pH values above seven. The area is located in a geomorphological basin between 1000 and 2000 m above sea level. The climate is mild, with a mean annual precipitation of 350 mm and a mean annual temperature of 17.7 °C. Sampling locations were chosen to represent the variability within the study region. Soil samples were taken from three land uses: forest, range, and cropland. Cereals, predominantly wheat, were cultivated in the croplands.
All selected forest and grassland areas had a south-facing aspect with slopes <30°. For each land use, we established study plots and divided each plot into four subplots. Four replicate points were chosen in each subplot. At each sampling point, five subsamples were taken and mixed to form a composite sample; in the last step, the four replicate samples were mixed into one sample to reduce the number of samples. A total of 68 samples were taken from the surface horizon (0–30 cm) in 2016. The samples were stored in plastic bags and kept cool. They were oven-dried at 32 °C for 48 h, crushed, and sieved to <2 mm using a mechanical sieve. A 10-g subsample was further ball-milled for the midDRIFTS analyses described below.

Conventional laboratory analysis of soil samples
Soil samples were analyzed to determine their carbon and nitrogen fractions, texture, and pH. Soil texture was determined by the Bouyoucos hydrometer method. Soil pH was measured in 0.01 M CaCl2 at a soil:solution ratio of 1:2.5 (m:v) [27]. Hot-water-extractable carbon and nitrogen (CHWE and NHWE, respectively) were extracted according to Schulz and Körschens [28] and analyzed with a Multi N/C analyzer (Analytik Jena, Jena, Germany). TC and TN were measured by dry combustion [29] with a Vario-EL III elemental analyzer (Elementar, Hanau, Germany).

Determination of Total Organic Carbon (TOC) and Total Inorganic Carbon (TIC) by combustion and gas analysis
All soil samples used in this study contained both calcite and dolomite. Therefore, the VDLUFA-Verlag soil analysis methods [29,30] were used for organic carbon and carbonate measurements.
Ball-milled soil samples were kept at 32 °C overnight before analysis. Briefly, approximately 10 g of each soil sample was weighed into a ceramic crucible and heated in a furnace at 500 °C for 6 h to remove organic carbon. TIC was then measured in the ignited samples by dry combustion [29] with the Vario-EL III elemental analyzer (Elementar), and TOC was calculated as the difference between soil TC and TIC, both measured by dry combustion [29].
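The difference calculation for TOC is simple but worth making explicit, because any error in TIC propagates directly into TOC. The sketch assumes both values are mass percentages of dry soil. The optional conversion of TIC to a calcite (CaCO3) equivalent uses the molar-mass ratio 100.09/12.011 and is an illustrative addition, not a step reported in the text; because these soils also contain dolomite, it expresses all inorganic carbon as if it were calcite. Function names are hypothetical.

```python
CACO3_PER_C = 100.09 / 12.011  # molar mass ratio CaCO3 / C


def toc_by_difference(tc_percent, tic_percent):
    """TOC = TC - TIC, all values in % of dry soil mass."""
    if tic_percent > tc_percent:
        raise ValueError("TIC exceeds TC; check the combustion measurements")
    return tc_percent - tic_percent


def tic_to_caco3_equivalent(tic_percent):
    """Express TIC as calcite (CaCO3) equivalent, ignoring dolomite stoichiometry."""
    return tic_percent * CACO3_PER_C


# Hypothetical sample: TC = 5.0 %, TIC = 3.2 %
toc = toc_by_difference(5.0, 3.2)
print(f"TOC = {toc:.2f} %, CaCO3 equivalent = {tic_to_caco3_equivalent(3.2):.1f} %")
```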

MidDRIFTS analysis of soil samples
The midDRIFTS analysis was performed according to Demyan et al. [31] and Mirzaeitalarposhti et al. [16]. Ball-milled soil samples were kept at 32 °C overnight before analysis. A subsample of approximately 200 mg was scanned using a Tensor-27 mid-infrared spectrometer (Bruker Optik GmbH, Ettlingen, Germany) equipped with a potassium bromide (KBr) beam splitter and a liquid-nitrogen-cooled mid-band mercury-cadmium-telluride (MCT) detector. A Praying Mantis diffuse reflectance chamber (Harrick Scientific Products Inc., New York, NY, USA) was installed in the spectrometer. To reduce moisture and water interference, the chamber was purged with dry air from a compressor (Jun-Air International, Nørresundby, Denmark) at a flow rate of 200 L h⁻¹. Each subsample was scanned in triplicate in the mid-infrared range (4000–600 cm⁻¹). Each spectrum was obtained by co-adding 16 individual scans at a resolution of 4 cm⁻¹ and was recorded in absorbance units. Before analysis, the three replicate spectra were averaged into a single spectrum for PLSR analysis. Dardenne et al. [32] reported that, for a small number of samples, as in this study, cross-validation gives an over-optimistic estimate of actual predictive performance; an independent validation subset was therefore used.
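Averaging the triplicate spectra is an element-wise mean across replicates. For random, uncorrelated noise, averaging N spectra reduces the noise standard deviation by a factor of √N, so co-adding 16 scans reduces it roughly fourfold. The sketch below shows only the averaging step; the function name is hypothetical and the replicate data are synthetic.

```python
import numpy as np


def average_replicates(replicate_spectra):
    """Average replicate absorbance spectra (rows = replicates) into one spectrum."""
    arr = np.asarray(replicate_spectra, dtype=float)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array: replicates x wavenumbers")
    return arr.mean(axis=0)


# Synthetic triplicate spectra: one underlying curve plus random noise
rng = np.random.default_rng(0)
true_spectrum = np.sin(np.linspace(0, 3, 50))
replicates = true_spectrum + rng.normal(0, 0.02, size=(3, 50))
mean_spectrum = average_replicates(replicates)

# Residual noise of a single replicate vs. the averaged spectrum
print(np.std(replicates[0] - true_spectrum), np.std(mean_spectrum - true_spectrum))
```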
Using stratified random selection, 45 samples were assigned to model calibration (calibration subset), and the remaining 23 samples were assigned to independent model validation (validation subset). For model development, the spectral edges were removed (retaining 3900–700 cm⁻¹) and the CO2 region (2400–2300 cm⁻¹) was excluded to reduce noise during model optimization.
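The spectral trimming and the calibration/validation split can be sketched as follows. The text does not state which variable the stratification was based on; the sketch assumes stratification on the measured value of the target property (sort the samples, cut them into as many strata as there are validation samples, and draw one validation sample per stratum at random). Function names, the seed, and the synthetic target values are hypothetical.

```python
import numpy as np


def model_region_mask(wavenumbers, low=700.0, high=3900.0, co2=(2300.0, 2400.0)):
    """True for wavenumbers kept for modelling: inside low..high and outside the CO2 band."""
    wn = np.asarray(wavenumbers, dtype=float)
    in_range = (wn >= low) & (wn <= high)
    in_co2 = (wn >= co2[0]) & (wn <= co2[1])
    return in_range & ~in_co2


def stratified_split(y, n_val, seed=0):
    """Sort samples by y, cut them into n_val strata, draw one validation sample per stratum."""
    y = np.asarray(y, dtype=float)
    strata = np.array_split(np.argsort(y), n_val)
    rng = np.random.default_rng(seed)
    val_idx = np.array([rng.choice(stratum) for stratum in strata])
    cal_idx = np.setdiff1d(np.arange(len(y)), val_idx)
    return cal_idx, val_idx


# Synthetic target values for 68 samples (not study data)
y = np.random.default_rng(42).lognormal(mean=0.5, sigma=0.6, size=68)
cal_idx, val_idx = stratified_split(y, n_val=23)
print(len(cal_idx), len(val_idx))
```

Stratifying on the target spreads the validation samples over its whole range, so the independent test is not dominated by low or high values.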
Before optimization, the spectrum, excluding the CO2 region, was divided into ten equal portions of 177 cm⁻¹. The PLSR models were optimized with a successive projection algorithm, which tested different combinations of the predefined frequency regions and data preprocessing methods. The algorithm selected the most informative and relevant frequency regions to develop precise calibration models [33]. Preprocessing transformations were applied to improve the accuracy of the PLSR models by reducing noise and compensating for baseline offsets in the raw spectra. The following preprocessing methods were tested: constant offset elimination (COE), straight line subtraction (SLS), vector normalization (SNV), multiplicative scatter correction (MSC), first derivative (FD), second derivative (SD), FD+SLS, SD+SNV, and FD+MSC [16]. midDRIFTS-PLSR model calibration and validation were performed for each soil property with the OPUS software package, version 7.0 (Bruker Optik GmbH, Ettlingen, Germany).
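The preprocessing options listed above can be sketched in NumPy as follows. These are textbook versions written for illustration, not the OPUS implementations. Derivatives in spectroscopy software are usually computed with Savitzky–Golay smoothing, whereas `first_derivative` here uses plain finite differences (`np.gradient`). The vector normalization shown (mean-centre, then divide by the root of the sum of squares) differs from the usual SNV only by a constant factor. The `spa` function is a minimal sketch of the classic successive projections algorithm for picking weakly collinear spectral variables; the text does not describe how it was combined with the ten predefined regions, so it illustrates the idea only. All function names are hypothetical.

```python
import numpy as np


def constant_offset_elimination(S):
    """COE: shift each spectrum (row) so its minimum absorbance is zero."""
    return S - S.min(axis=1, keepdims=True)


def straight_line_subtraction(S, wn):
    """SLS: fit and subtract a least-squares straight line from each spectrum."""
    out = np.empty_like(S, dtype=float)
    for i, s in enumerate(S):
        slope, intercept = np.polyfit(wn, s, 1)
        out[i] = s - (slope * wn + intercept)
    return out


def vector_normalization(S):
    """Mean-centre each spectrum, then divide by the root of its sum of squares."""
    centered = S - S.mean(axis=1, keepdims=True)
    return centered / np.sqrt((centered ** 2).sum(axis=1, keepdims=True))


def msc(S, reference=None):
    """MSC: regress each spectrum on a reference (default: mean) and remove offset and slope."""
    ref = S.mean(axis=0) if reference is None else reference
    out = np.empty_like(S, dtype=float)
    for i, s in enumerate(S):
        b, a = np.polyfit(ref, s, 1)  # s ~ a + b * ref
        out[i] = (s - a) / b
    return out


def first_derivative(S, wn):
    """Finite-difference first derivative along the wavenumber axis."""
    return np.gradient(S, wn, axis=1)


def spa(X, start, n_select):
    """Successive projections algorithm: pick minimally collinear columns of X."""
    Xp = np.array(X, dtype=float)
    selected = [start]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]].copy()
        # Project every column onto the subspace orthogonal to the last pick
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never re-select a column
        selected.append(int(np.argmax(norms)))
    return selected
```

Combined options such as FD+MSC are simply compositions, e.g. `msc(first_derivative(S, wn))`.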

Data analysis and assessment of model accuracies
Descriptive statistics and multiple regression analyses were computed with SPSS (version 14) [34]. A regression analysis in SigmaPlot was used to test for correlations between the predicted and measured values of each property of interest. The accuracy of the calibrated and independently validated midDRIFTS-PLSR models was evaluated using the residual prediction deviation (RPD), the coefficient of determination (R²), and the root mean square error (RMSE). In the current study, models with RPDs greater than 5 were considered "excellent", models with RPDs between 3 and 5 "acceptable", models with RPDs between 1.4 and 3 "moderately successful" [35], and models with RPDs less than 1.4 "unsuccessful".
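The accuracy statistics and the RPD classes above can be written compactly. The sketch assumes RPD = standard deviation of the measured values (sample SD, ddof = 1) divided by RMSE, and R² = 1 − SS_res/SS_tot. If R² was instead taken from the regression of predicted on measured values, it is the squared Pearson correlation, which can differ when predictions are biased. Because the stated ranges share their boundaries, a value of exactly 3 or 1.4 is assigned here to the higher adjacent class, an arbitrary choice. Function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


def rpd(y_true, y_pred):
    """Residual prediction deviation: SD of measured values / RMSE."""
    return float(np.std(np.asarray(y_true, float), ddof=1) / rmse(y_true, y_pred))


def rpd_class(value):
    """Map an RPD value to the classes used in this study."""
    if value > 5:
        return "excellent"
    if value >= 3:
        return "acceptable"
    if value >= 1.4:
        return "moderately successful"
    return "unsuccessful"
```

With these thresholds, the validation RPD range of 0.9–3.52 reported in this study spans the "unsuccessful" to "acceptable" classes.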

Soil properties and correlation analysis
As shown in Table 1, the measured soil properties were higher (p < 0.01) under forest land than under the two other land uses, except for pH, TIC, and silt content. For example, CHWE was 1168 vs. 493 and 725 mg kg⁻¹, and TOC was 4.31% vs. 1.4% and 1.64%, respectively. For soil texture, the mean sand content was higher in range land (56.93%) than in forest land (50.74%). The coefficient of variation (CV) showed that 6 of the 10 soil properties were more variable in forest land than in range land and agricultural land (Table 1). Of all the soil properties, TN varied the most, with a CV of approximately 104% in forest land. Table 2 shows the correlations among soil properties across all three land uses. TOC and TN were strongly correlated, and TC, TIC, and TOC were positively correlated with each other (p < 0.001). As an important soil quality indicator, TOC was positively correlated not only with carbon pools, i.e., TC and CHWE (R² = 0.4 and 0.55, respectively; p < 0.001), but also with nitrogen pools, i.e., TN and NHWE (R² = 0.9 and 0.47, respectively). Clay content was not significantly correlated with most soil properties; significant positive correlations with clay content were found only for silt and pH.
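For reference, the CV is the sample standard deviation divided by the mean, expressed as a percentage. A CV above 100%, as for TN in forest land, means the standard deviation exceeds the mean, which for a non-negative property usually points to a few very high values. The function name below is hypothetical, and the sample SD (ddof = 1) is an assumption about how the statistic was computed.

```python
import numpy as np


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100."""
    v = np.asarray(values, dtype=float)
    return float(np.std(v, ddof=1) / v.mean() * 100)


print(coefficient_of_variation([2, 4, 6]))
```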

Spectral features of soil
Figure 1 shows the baseline-corrected, averaged spectra of all samples from each land use. The spectra had similar shapes but different peak heights, reflecting the differences in texture and physicochemical properties among the soils (Table 1).
Several peaks throughout the spectrum were assigned to different minerals and organic functional groups [14]. In the 2100–600 cm⁻¹ region, several peaks were attributed to mineral molecular vibrations (e.g., quartz O-Si-O) and several to organic compounds [36]. The most prominent spectral feature occurred in the 3700–3500 cm⁻¹ region and was mainly due to hydroxyl stretching vibrations of clay minerals. Dixon and Weed [37] reported that O-H stretching in the 3690–3620 cm⁻¹ region and Si-O stretching near 1100–1000 cm⁻¹ may be related to clay minerals. According to Stevenson [38], small peaks between 2960 and 2860 cm⁻¹ can be attributed to aliphatic groups; this band was more evident in the forest land spectra, possibly because of the higher SOC concentrations. All soil samples in this study were collected from calcareous soils. Nguyen et al. [39] found that carbonate peaks occurred mainly in the 2600–2500 cm⁻¹ region and that quartz produced peaks at 1980 and 1870 cm⁻¹, with higher absorbance in agricultural land. A prominent peak from 2000 to 1500 cm⁻¹ appeared in soils from all three land uses and was related to C=C and C=O double bonds [40]. The peak near 1160 cm⁻¹ was attributed to C-O bonds of poly-alcoholic groups in easily decomposable organic compounds and was highest in agricultural land [36].

Calibration of midDRIFTS-PLSR models
Table 3 and Figure 2 show the calibration results for the soil properties. Excellent calibration models were achieved for TC, TIC, and TOC, with R² values ≥0.96 and RPD values ≥3. Moderately successful calibration models (RPD <3) were obtained for CHWE (R² = 0.77, RPD = 2.09) and for NHWE, pH, and the clay, sand, and silt contents (Table 3). The number of PLSR factors used for calibration ranged from 2 for pH to 9 for TIC, TN, and silt content; for each property, the optimal number of factors was the one giving the smallest RMSECV and the largest R². Each soil property had specific absorption bands corresponding to particular frequency ranges, and the models for the soil carbon fractions (TC, TIC, TOC, and CHWE) used wider frequency ranges than those for the other parameters (Table 3).
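The factor-selection rule described above (choose the number of PLSR factors that minimizes the cross-validation RMSE) can be sketched with a small NumPy implementation of PLS1 via the NIPALS algorithm. This is not the OPUS implementation: the 5-fold cross-validation scheme, the function names, and the synthetic data are assumptions for illustration (OPUS may use leave-one-out cross-validation, for example), and R² could be tracked alongside RMSECV in the same loop.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS; returns (x_mean, y_mean, coefficients)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        qa = (yr @ t) / tt              # y loading
        Xr = Xr - np.outer(t, p)        # deflate X
        yr = yr - qa * t                # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (np.asarray(X, float) - x_mean) @ coef + y_mean


def rmsecv_curve(X, y, max_components, n_folds=5, seed=0):
    """RMSE of cross-validated predictions for 1..max_components factors."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    curve = []
    for a in range(1, max_components + 1):
        pred = np.empty_like(y)
        for test_idx in folds:
            train = np.setdiff1d(np.arange(len(y)), test_idx)
            model = pls1_fit(X[train], y[train], a)
            pred[test_idx] = pls1_predict(model, X[test_idx])
        curve.append(np.sqrt(np.mean((y - pred) ** 2)))
    return np.array(curve)


# Synthetic example: 45 "calibration spectra" built from 3 latent factors (not study data)
rng = np.random.default_rng(1)
scores = rng.normal(size=(45, 3))
loadings = rng.normal(size=(3, 200))
X = scores @ loadings + rng.normal(0, 0.05, size=(45, 200))
y = scores @ np.array([1.0, -0.5, 0.3]) + rng.normal(0, 0.05, size=45)
curve = rmsecv_curve(X, y, max_components=8)
best_n = int(np.argmin(curve)) + 1
print(np.round(curve, 3), best_n)
```

In practice, analysts often prefer the smallest number of factors whose RMSECV is close to the minimum, to limit overfitting with small calibration sets such as the 45 samples used here.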
Model validation results are summarized in Table 4 and Figure 2. Prediction accuracy was low for most soil properties, with RPDs ranging from only 0.9 to 3.52. R² and RPD values were lower for the sand, silt, and clay contents than for the other parameters (Table 4). Predictions were clearly weaker when the models were applied to the independent validation subset than in calibration. The results showed acceptable predictions for TIC, whereas predictions for the other soil properties ranged from unsuccessful to moderately successful.

Application of the model
The application of midDRIFTS in this study confirmed its ability to predict physicochemical and biological soil properties, with varying degrees of accuracy [41,42]. However, several factors may have affected model performance, including the number of samples, the similarity or diversity of the data sets, soil texture, and the range of the property of interest [43]. A midDRIFTS spectrum contains complex information on soil organic and inorganic compounds, and the vibrations of molecular bonds in different organic and inorganic compounds may overlap and affect model performance. Although PLSR models use the most informative spectral bands to reduce complexity and increase accuracy, spectral overlap and interference may still prevent successful model calibration. Ultimately, the predictive ability of midDRIFTS depends on the calibration models that relate spectral data to soil properties.
The results of this study showed that the calibrated midDRIFTS-PLSR models were suitable for predicting some soil properties. Despite careful calibration, the independent validation of the models for SOC, pH, and sand, silt, and clay contents gave prediction accuracies classified as unsuccessful. Less accurate predictions were obtained for CHWE, NHWE, and TN, whereas predictions of TC, TOC, and TIC were good. The inaccurate prediction of pH was likely due to its indirect spectral response: pH has no specific absorption band and can only be predicted through its correlation with spectrally active soil constituents.
Cobo et al. [44] studied soils with various textures from three villages in Zimbabwe. The soils had sand contents ranging from 25% to 75%, and an RPD value of 6.8 was found for very sandy soils. … (Tables 3 and 4) … [45]. A sufficiently heterogeneous sample set is needed to ensure a suitable calibration [46]. Therefore, the inaccurate predictions of the sand, clay, and silt contents could be related to differences in particle size distribution. Inaccurate results were also obtained for soil CHWE and NHWE, in agreement with observations in other studies [23,45]. Because of its close relationship with SOM quality, microbiological carbon had been expected to be well predicted [48]. However, the relatively low levels of microbiological carbon in the soil samples (less than 5% of TOC) were poorly represented in the spectral measurements, and their peaks were overlaid by peaks of soil minerals and other organic materials [45,47].
As shown in Table 2, both CHWE and NHWE were significantly and positively correlated with TOC. Differences in the range of variability, soil type, site, and number of samples may explain the differences in model performance [49,50]. Calibrating a reliable PLSR prediction model requires a certain degree of heterogeneity [46]. midDRIFTS has been used successfully to predict carbon levels in several soils [16,51,52]. For example, in the midDRIFTS-PLSR model obtained for TOC, several bands were assigned to organic matter, carbonate, and other soil constituents such as iron oxides and clay minerals. The model may therefore rely on bands of organic compounds or of other soil constituents such as iron oxides and clay [53]. Because soils are extremely diverse and variable, developing midDRIFTS-PLSR models that are broadly useful for predicting soil properties remains challenging.
Good predictive performance of PLSR models requires sufficient heterogeneity within the study region; however, heterogeneity beyond a certain level may have adverse effects.

Conclusions
This study examined the combination of midDRIFTS with PLSR, and the results confirmed that the technique could predict TC, TOC, and TIC in soils under three different land uses spanning a range of soil property values. The midDRIFTS-PLSR models developed in this study gave good predictions of soil carbon content and were able to predict soil carbon fractions in 'unknown' samples from the study region from their midDRIFTS spectra alone, without further laboratory measurements. Thus, even with a limited sample size (<100), as in this study, midDRIFTS-PLSR models can provide accurate carbon data. The heterogeneity of the calibration data has a large effect on calibration and prediction accuracy. Developing reliable midDRIFTS-PLSR models for soil properties in other regions will require large, adequately heterogeneous sample sets.
In conclusion, our results showed that midDRIFTS is appropriate for accurately predicting soil carbon levels and has strong potential for spectrum-based predictions. midDRIFTS-PLSR models can serve as a high-throughput approach for future spectrum-based predictions in other regions, but large numbers of soil samples are needed to establish the models because soil conditions in many areas are unknown.