Crop classification using spectral indices derived from Sentinel-2A imagery

ABSTRACT Optical remote sensing is one of the most attractive options for generating crop cover maps because it enables computation of vegetation indices, which are useful for assessing the condition of vegetation. The Sentinel-2A Multispectral Instrument (MSI), which is a multispectral sensor with 13 bands covering the visible, near infrared and short-wave infrared (SWIR) wavelength regions, offers a vast number of vegetation indices. Spectral indices, which are combinations of spectral measurements at different wavelengths, have been used in the previous studies and they sometimes contributed to improve classification accuracies. In this study, 91 published spectral indices were calculated from the MSI data. Additionally, classification algorithms are essential for generating accurate maps and the random forests classifier is one of which possesses the five hyperparameters were applied. The improvements in classification accuracies were confirmed achieving an overall accuracy of 93.1% based on the reflectance at 4 bands and 8 spectral indices.


Introduction
Crop cover maps with accurate spatiotemporal information are required, since agricultural fields are developed and managed through a variety of social actions or policies, which can hugely impact biogeochemical and hydrologic cycles, climate, ecosystem functions, the economy and human health (Wardlow & Egbert, 2008). In order to carry out social policies as well as estimating the amount and type of crops harvested in a certain area, some local governments apply manual labour to document field properties, such as crop type and location (Sonobe, 2019). However, labour is expensive and it is necessary to develop more efficient techniques. Quite a few studies have focused on the production of thematic maps using remote sensing data (Avci & Sunar, 2015;Foody, 2002) and they reported that remote sensing provides effective support to management of agricultural fields by providing spatial and temporal information for vegetation monitoring over large areas (Gao et al., 2017). Therefore, mapping crops based on remote sensing data can be an alternative to the routine collection of annual agricultural surveys.
The Sentinel-2A satellite was launched on 23 June 2015 and collects multispectral data including 13 bands covering the visible and SWIR wavelength regions, and includes three atmospheric bands (Band 1, Band 9 and Band 10). Furthermore, Sentinel-2B was launched on 7 March 2017 and provides great opportunities for monitoring agricultural fields. Even excluding the three atmospheric bands, various spectral indices can be extracted. Therefore, in this research we calculated 91 spectral indices from Sentinel-2A data and their importance in crop type identification was evaluated.
Geographic object-based image analysis (GEOBIA) has been paid most attention now and that's useful to clarify the borders of fields. However, very fine resolutions of less than 1 m are required to make good use of remote sensing data in GEOBIA (Baker, Warner, Conley, & Mcneil, 2013). Besides, (Hartfield, Marsh, Kirk, & Carriere, 2013) reported the classification accuracy based on pixel-based classification using classification and regression tree (CART) was superior to that based on object-oriented classifiers using Landsat data. The random forests (RF) is an improved algorithm based on an ensemble learning technique that builds multiple CART (Breiman, 2001). Indeed, RF has been extremely successful as a general-purpose classification and regression method (Biau & Scornet, 2016). RF fits many classification trees to training datasets, and then combines the predictions from all the trees for a final decision. Therefore, RF was applied in this study. Furthermore, it is easy to evaluate the variable importance (VIMP), which is calculated when a classification model is developed and shows how much worse the prediction would be without that variable when RF is applied (Ishwaran, 2007). The original RF has two hyperparameters: the number of trees (ntree) and the number of variables used to split the nodes (mtry). However, tuning of the hyperparameters related to nodes sometimes improves classification accuracies, therefore, RF as modified by Ishwaran (Ishwaran & Kogalur, 2007;Ishwaran, Kogalur, Blackstone, & Lauer, 2008) was applied in this study.
Although the grid search has widely been used for tuning hyperparameters of machine learning algorithms, it could be a poor choice for configuring algorithms for new data sets (Bergstra & Bengio, 2012). On the other hand, Bayesian optimization is useful to obtain effective combinations of hyperparameters and has been applied in previous studies (Sonobe et al., 2017b), therefore, Bayesian optimization was applied in this study. The main objectives of this study were (1) to evaluate the potential of metrics extracted from Sentinel-2A data for crop classification and (2) to identify which spectral indices are effective for identification of crop types.

Study area
The study area was in Hokkaido, Japan (142°55 ′ 12 ′′ to 143°05 ′ 51 ′′ E, 42°52 ′ 48 ′′ to 43°0 2 ′ 42 ′′ N) at an elevation between 50 and 230 m. The location has warm summers and cold winters, with an average annual temperature of 6°C and an annual precipitation of 920 mm. A total of 4719 crop fields were in the study area, and the field size ranged from 0.1 to 17.5 ha with a mean of 2.6 ha, a skewness of 2.2 and a standard deviation of 1.7 ha. The dominant crops were beans (981 fields, Vigna angularis and Glycine max), beetroots (569 fields, Beta vulgaris), grass (640 fields, Phleum pratense and Dactylis glomerata), maize (317 fields, Zea mays), potatoes (783 fields, Solanum tuberosum) and winter wheat (1429 fields, Triticum aestivum). Rice is cultivated mainly in the central parts in Hokkaido (i.e. Ishikari, Sorachi and Kamikawa). In the northern and eastern parts of Hokkaido (i.e. Kitami, Abashiri and Tokachi) rice cultivation could not be established because of very cool climate. The crop species chosen in this study have mainly been cultivated in Tokachi ( Figure 1).
The 2016 crop map was provided by Tokachi Nosai (located in Obihiro City, Hokkaido. Nosai provides agricultural insurance, which helps stabilize farmers suffering from damage caused by natural disasters and contributes to the growth of Japanese agriculture). The 2016 crop map was received as a polygon shape file that included attribute data, such as crop types from April to August. In 2016, seeding of winter wheat was conducted after September, therefore the crop cover did not change between the date recorded and August 2016.

Satellite data
Sentinel-2A is designed to measure reflectance of blue, green, red and near-infrared-1 bands at 10 m resolution; red edge 1-3, near-infrared-2, and short-wave infrared 1 and 2 at 20 m; and 3 atmospheric bands (Band 1, Band 9 and Band 10) at 60 m. However, the three atmospheric bands were not used during this study because they are mainly dedicated to atmospheric corrections and cloud screening (Drusch et al., 2012). In previous studies, the satellite data showed great potential for agricultural purposes including crop and grass chlorophyll and nitrogen content (Clevers & Gitelson, 2013), vegetation state in grasslands and savannah (Hill, 2013), and potato crop yield (Al-Gaadi et al., 2016).
The seven observations were conducted by Sentinel-2A from May to September 2016 for the whole site. However, only one image date (11 August) during the growing season was used due to better quality for classification purposes; the other images were covered with cloud. On 11 August, the mean temperature was 20.7°C, the duration of sunshine was 8.2 h, the atmospheric pressure was 999.6 hPa and the relative humidity was 66%.
In this study, the Level 1C data acquired on 11 August 2016 were downloaded from EarthExplorer (https://earthexplorer.usgs.gov/). All bands are converted to 10 m resolution using Sentinel-2 Toolbox version 5.0.4. Then, average reflectance values of each band were calculated for the fields using field polygons to compensate for spatial variability and to avoid problems related to uncertainty in georeferencing. Beside them, 91 published spectral indices were evaluated for their potential for crop classification ( Table 1). The wavelength difference (D), such as Green Difference Vegetation Index (GDVI) (Tucker, 1979a), and the normalized differences (ND), such as Green Normalized Difference Vegetation Index (GNDVI) (Gitelson et al., 1996), Normalized difference near infrared/shortwave infrared normalized Burn Ratio (NBR) (Key & Benson, 2006) and Normalized Difference Infrared Index (NDII) (Zarco-Tejada et al., 2001), are the most common forms of spectral indices. Gitelson et al. (2009Gitelson et al. ( , 2001a proposed inverse reflectance differences, such as Anthocyanin Reflectance Index (ARI) or Carotenoid Reflectance Index (CRI). However, most indices are based on reflectance in more than three bands. Some indices, such as Soil Adjusted The γ is a weighting function that depends on aerosol type. In this study, a value of 1 for γ.

Classification algorithm
We used a stratified random sampling approach (Foody, 2009) to select the fields used for developing (50%), hyperparameter tuning (25%) and evaluation (25%) of the classification models to prevent overfitting of classification models (Hastie, Tibshirani, & Friedman, 2009). Crop classifications were conducted using the following three different datasets; Case 1: reflectance data, Case 2: spectral indices data and Case 3: a combination of reflectance and spectral indices in order to evaluate if adding vegetation indices improved the classification accuracy. RF has demonstrated its ability to yield accurate land cover maps (Rodriguez-Galiano, Chica-Olmo, Abarca-Hernandez, Atkinson, & Jeganathan, 2012; Sonobe, Tani, Wang, Kobayashi, & Shimamura, 2014). Furthermore, RF carries out classifications with a shorter computing time; although the number and quality of variables has a great influence on computational time (Pelletier, Valero, Inglada, Champion, & Dedieu, 2016). In RF, multiple decision trees are generated based on random bootstrapped samples from the training data (Breiman, 2001) and nodes are split using the best split variable from a group of randomly selected variables (Liaw & Wiener, 2002). Finally, the output is determined by a majority vote from the trees. The original RF has two hyperparameters including the number of trees (ntree) and the number of variables used to split the nodes (mtry). However, the best split for a node can increase classification accuracy (Ishwaran et al., 2008;Ishwaran & Kogalur, 2007;Sonobe et al., 2017a). Thus, three hyperparameters related to nodes were also considered: the minimum number of unique cases in a terminal node (nodesize), the maximum depth of tree growth (nodedepth) and the number of random splits (nsplit).
Although the grid search is one of the most common strategies to optimize the hyperparameters of machine learning algorithms (Puertas, Brenning, & Meza, 2013), it could be a poor choice for configuring algorithms for new data sets (Bergstra & Bengio, 2012). Therefore, Tuning these hyperparameters was conducted using the Gaussian process, Bayesian optimization, which is superior to the grid search strategy for configuring algorithms from new data sets (Bergstra & Bengio, 2012). In Bayesian optimization, the black-box function (such as machine learning algorithms), is sampled from a Gaussian process and its posterior distribution is maintained. To select the hyperparameters of the next experiment, the expected improvement was used in this study.

Accuracy assessment
The kappa statistic (Cohen, 1960) had been used as an accuracy measure, however, it has fundamental conceptual flaws, such as being undefined even for simple cases, or having no useful interpretation (Pontius & Millones, 2011). Therefore. The classification accuracies were assessed using quantity disagreement (QD) and allocation disagreement (AD), which provide an effective summary of the cross-tabulation matrix (Pontius & Millones, 2011). Each cell of the confusion matrix was calculated using the following equation; where P ij denotes the proportion of fields in class i according to the classification results, and class j according to the reference data (i.e. the polygon shape file provided by Tokachi Nosai). W i is the fields classified as class i, n ij is the number of fields classified as class i according to the classification results, and class j according to the reference data. n i+ is the row totals. AD and QD were calculated as follows; where N c is the number of classes, p i+ and p +i denote the row and column totals, AD i is the allocation disagreement for class i and QD i is the quantity disagreement for class i. QD is defined as the difference between the correct data and the classified data based on a mismatch of class proportions, while AD is the difference between the classified data and the correct data due to incorrect spatial allocations of fields in the classification. The sum of QD and AD indicates total disagreement (Pontius & Millones, 2011). Overall accuracy (OA), producer's accuracy (PA) and user's accuracy (UA), which are commonly used for accuracy assessments, were also used.
where N is the number of fields, R i and C i denote the total number of classes i in the reference data and the total number of the classification results, and p ii represents the number of correctly classified fields. We analysed whether there were significant differences between the two classification results based on McNemar's test (Mcnemar, 1947). In this test, a chi-square value greater than 3.84 indicates a significant difference between the two classification results at a 95% significance level. The data-based sensitivity analysis (DSA), which is a simple method performs a pure black box use of the fitted models by querying the fitted models with sensitivity samples and recording their responses, were used to evaluate the sensitivity of the classification models. Although DSA is similar to a computationally efficient one-dimensional sensitivity analysis (Kewley, Embrechts, & Breneman, 2000), this method uses several training samples instead of a baseline vector (Cortez & Embrechts, 2013).

Reflectance obtained from Sentinel-2A
The mean reflectance acquired from Sentinel-2A on 11 August is shown in Figure 2. For wheat, the reflectance at Bands 6-8a was lower than for other crops, while the reflectance at Bands 2-5 was higher. In this period, wheat was in the late grain-filling stage, but it had not yet been harvested. Wheat had very different reflectance features the other five living crops. On the other hand, the date corresponded to the late growing season of beetroot and grass and the early ripening period of beans; therefore, the reflectance features of vegetation that are characterized by low reflectance in the visible domain and high reflectance in the infra-red were quite clear in these crops. Regarding potatoes, a chemical treatment to kill potato vines, which facilitates harvest, was used in the study area. As the result, a relatively high reflectance at Bands 2-5 was confirmed in the potatoes as the features of senile plants. For maize, the only C 4 plant, the reflectance at Bands 2-5 was the lowest and at Bands 6-8a was similar to that of potato.
The results of accuracy assessments and McNemar's test are given in Tables 2 and 3, respectively. The best OA was confirmed for Case 2 and the worst was confirmed for Case 1 and their differences in classification were significant (p < 0.05), even though the statistical differences between these 3 cases are minor. Although Case 1 showed the highest PA of beans, misclassification sometimes occurred with grass or potato. On the other hand, misclassifications between beans and grass were reduced using spectral indices (Cases 2 and 3). However, a few beetroot fields were misclassified as grass or potato for Cases 2 and 3, while all fields classified as beetroot were correct. Although Case 2 had the worst UA of wheat, the use of spectral indices improved classification accuracies for other crops.

Importance for identifying each crop types
The importance of each index for identifying each crop type was evaluated using the value of VIMP, which indicates degradation if that variable was not used for classification and we can reveal which variable is effective. Figure 4 shows VIMP values of contribution of Case 3 to the classification model. SIPI, which contributed to the identification of maize and wheat, was the most important variable occupying 18.8%. PVR, which contributed to the identification of beans, beetroot, grass and wheat, was second and occupied 14.6%. VARIgreen, which contributed to the identification of beans and wheat, was third; although, it was relatively correlated with PVR ( Figure 3).

Discussion
The classification results were different from each other (p < 0.05 based on McNemar's test) and Case 2 was superior to Case 1 even though the statistical differences between  X these 3 cases are minor. It was assumed that the use of spectral indices would improve classification accuracy; however, the classification accuracy of Case 3, which was the integration of reflectance and spectral indices, was inferior to Case 2. Figure 3 shows the clustering of the correlation relationship among reflectance and spectral indices. In RF, there is the hypothesis that each variable is independent of the response variable as well as from all other predictors. Clearly, there are large sets of correlated variables (e.g. strong correlations were confirmed between Bands 5 and Hue), which may cause the negative effects (Strobl, Boulesteix, Kneib, Augustin, & Zeileis, 2008). Next, DSA was applied to clarify which variables contributed to the high accuracy. The importance of identifying each crop type or improving classification accuracies based on  AD + QD or OA were ranked based on the results of DSA when Case 3 was applied. The classification contribution of different spectral indices and bands were evaluated by adding the matrices of inferior rank one by one ( Figure 5). Using only 12 metrics including PVR (normalized difference between Band 3 and Band 4), Band 11, VARIgreen, SIPI, GARI, mNDVI, Band 12, REIP, EPIChlb, Band 2, Band 4 and NDII, an OA of 93.1% (AD + QD = 6.89) was achieved and at least 6 metrics were required to achieve an OA over 90%. Figure 6 shows the crop classification map generated when the 12 metrics were used. Especially, using the top 2 matrices, PAs and UAs more than 0.5 with an overall accuracy of 68.4% were achieved. On 11 August, wheat was in the late grain-filling stage and the difference between Band 3 and Band 4 was negative in wheat fields while those over other fields were positive, so identifying wheat could be carried out using only PVR. Besides that, in potato fields, the difference between Band 3 and Band 4 was smaller than those of beans, beet, grass and maize fields because of discolouring by a chemical treatment to kill potato vines ( Figure 2). Furthermore, the differences in photosynthetic capacities among crop species and the reflectance at Band 11 is sensitive to non-structural carbohydrates of the leaf (Asner & Martin, 2015). It has a positive correlation with leaf nitrogen content (Bartlett, Ollinger, Hollinger, Wicklein, & Richardson, 2011) and it could be useful to estimate the contents of photosynthetic pigments, water, nitrogen, cellulose, lignin, phenols, and leaf mass per area (Asner et al., 2011).
Adding VARIgreen, which is similar to PVR but Band 2 was deducted in its denominator, was useful for distinguishing beans and beetroot fields. Although the reflectance at Band 4 was almost the same between beans and beetroot, the difference between Band 3 and Band 2 was positive for beetroot and negative for beans. VARIgreen and GARI could emphasize the differences in these spectral features. While the reflectances at Bands 2 and 4 were almost similar among beetroot, grass and maize, the reflectance at Band 8 was markedly different. After identifying beans using the three metrics, SIPI and GARI contributed to improvement of identifying grasslands. However, the differences at Bands 3 and 4, which are considered in GARI, were obscure in Figure 2. This could be the reason that GARI was inferior to SIPI in this study. mNDVI helped to improve the PAs of beans, beetroot and maize, and the UA of grassland by emphasizing the difference in Band 8 using Bands 2 and 4 like VARIgreen and an OA over 90% was achieved. However, high PA or UA of potato was not achieved without Band 4 and NDII. Band 4 of potato was smaller than wheat but larger than the other 4 crops, while Band 8 of potato was larger than wheat but smaller than the other 4 crops. These relationships helped with identification of potato.
The cost sensitive learning was used to deal with the imbalanced data classification problem following (Chen, Liaw, & Breiman, 2004) and penalties were imposed on misclassifying the minority class since RF tends to be biased towards the majority class. When the 12 metrics were used, UAs of beetroot and potato were slightly improved (98.3% to 100.0% for beetroot and 86.3% to 87.9% for potato) compared with the results when this strategy was not applied, however, other indicators were not improved and thus it revealed that the influences on the imbalanced data classification problem were so weak in this study.

Conclusions
In this study, only one image from Sentinel-2A MSI was used to generate a classification map. In addition to the original reflectance, 91 published spectral indices were evaluated for their crop classification performance. The use of spectral indices improved the classification accuracy (achieving an overall accuracy of 93.0%), but the integrated use of reflectance and spectral indices enhanced the negative effects related to large sets of correlated variables and decreased the accuracy (OA = 92.4%).
Next, the importance of identifying each crop type or improving classification accuracies were evaluated based on the results of DSA and the classification contribution of different spectral indices and bands were evaluated by adding the matrices of inferior rank one by one. As the results, the classification based on the reflectance including Bands 2, 4, 11 and 12, and 8 spectral indices including PVR, VARIgreen, SIPI, GARI, mNDVI, REIP, EPIChlb and NDII was useful for identification of each crop and achieved an OA of 93.1%.

Disclosure statement
No potential conflict of interest was reported by the authors.

Notes on contributors
Nobuyuki Kobayashi received the M.S. degree from Hokkaido University, Japan. He is currently with Smart Link Hokkaido, Japan. His research mainly focuses on smart agriculture.
Hiroshi Tani received the Ph.D. degree in Agriculture from Hokkaido University, Japan. He is an Associate Professor of the Research Faculty of Agriculture, Hokkaido University. His research interests include agricultural meteorology and remote sensing of environment.
Xiufeng Wang received the B.S. degree from Northeast Agricultural University, Harbin, China, in 1982 and the Ph.D. degree from Hokkaido University, Sapporo, Japan, in 1994. She is an Associate Professor with the Research Faculty of Agriculture, Hokkaido University. Her main research interests are in the study of the analysis methods and its applications about satellite data for agricultural meteorology.
Rei Sonobe received the Ph.D. degree in Agriculture from Hokkaido University, Japan. He is currently an Assistant Professor and a Faculty Member with the Faculty of Agriculture, Shizuoka University, Shizuoka, Japan. His research interests include remote sensing applications for agriculture and machine learning.