Methodological evaluation of vegetation indexes in land use and land cover (LULC) classification

ABSTRACT Vegetation indices are intended to emphasize the spectral behavior of vegetation relative to soil and other land-surface targets. The objective of this study was to evaluate the vegetation cover types present in the municipality of Campo Belo do Sul, Brazil, using five vegetation indices derived from satellite images. To this end, the Normalized Difference Vegetation Index (NDVI), Soil-Adjusted Vegetation Index (SAVI), Leaf Area Index (LAI), Enhanced Vegetation Index (EVI), and Normalized Difference Water Index (NDWI) were calculated in Quantum GIS. The generated maps allowed the different vegetation cover classes to be detected. The results indicated that no single vegetation index best represents all the classes evaluated in the study; however, NDVI, EVI, and SAVI fitted well in the majority of the thematic classes.


Introduction
In large countries such as Brazil, earth observation through satellite systems based on medium-resolution multispectral scanners is one of the most efficient and economical ways to obtain relevant information about terrestrial natural resources and vegetation conditions (Mallmann, Prado, & Pereira Filho, 2015). Remote sensing generates spectral reflectance data that provide a rapid means of monitoring and managing natural resources. In addition, visual and digital image processing makes it possible to extract biophysical information from the vegetation cover, which is considered crucial to elucidate processes related to forest distribution, human activities, and biodiversity conservation, as well as socioeconomic processes (Mancino, Nolè, Ripullone, & Ferrara, 2014).
Combining the reflectance of surface targets at two or more wavelengths, especially in the visible and infrared regions, yields dimensionless radiometric measures called vegetation indices (VIs). The purpose of a VI is to highlight a particular vegetation characteristic, such as leaf area index (LAI), percentage of green cover, chlorophyll content, green biomass, or absorbed photosynthetically active radiation (Jensen, 2009). VIs are often used in ecological research, ecosystem modeling, the estimation of vegetation biophysical parameters, and land-surface monitoring (Robinson et al., 2017). They are mathematical models, used in studies since the 1960s, that were developed to evaluate, monitor, and analyze vegetation cover by relating spectral signatures to parameters measurable in the field, both quantitatively and qualitatively (Mallmann et al., 2015).
Several VIs exist, each with its specific applicability, and they are used across the representative phytophysiognomies of the world's biomes to map and monitor vegetation on the land surface. Vegetation indices provide information from the spectral response of targets and thus help diagnose biophysical parameters such as leaf area index, biomass, percentage of land cover, photosynthetic activity, and productivity (Ponzoni, Shimabukuro, & Kuplich, 2012). According to Xue and Su (2017), these indices are simple and effective algorithms for evaluating the vigor and dynamics of terrestrial vegetation. The indices differ in their sensitivity to targets, since this relationship is influenced by factors inherent to each target.
This study aimed to evaluate the efficiency of different vegetation indices (NDVI, SAVI, LAI, EVI, and NDWI) in the classification of land use and occupation, in order to identify the index that best represents the current cover and most closely resembles the classification performed by the maximum likelihood (MaxVer) algorithm.

Materials and method
The study area is located in the municipality of Campo Belo do Sul, in the southwest region of the state of Santa Catarina, Brazil (Figure 1), at 27°53'57" south latitude and 50°45'39" west longitude. The municipality covers approximately 1,027.65 km² and has 7,483 inhabitants, a population density of 7.28 inhab./km², with urban and rural populations of similar size (IBGE, 2011).
According to the Köppen classification (1948), the climate of the Santa Catarina Plateau region is a transition between Cfa (humid mesothermal, with no defined dry season, hot summers, and rare frost in winter) and Cfb (humid mesothermal, with no defined dry season, cool summers, and severe, frequent frosts in winter), with temperatures ranging from 13°C to 25°C and rainfall distributed throughout the year, averaging 1841 mm annually (Oliveira, Bertol, Barbosa, Campos, & Mecabô Junior, 2015). The soils are shallow and stony, with low fertility, and poorly suited to annual crops, although the undulating and gently undulating areas are more suitable for cultivation. Latosols and tropohumults predominate in the less steep areas, and lithosols and neosols in the more rugged areas (Embrapa, 2004).
Rural activity is the main socioeconomic activity of the study area; most properties comprise natural pastures, temporary or permanent crops, and planted forests. According to Silva (2015), the municipality of Campo Belo do Sul is known for an economy based on agrosilvopastoral systems, with emphasis on forestry at the Gateados forest farm, which has the largest reforested area in the south of the country.
The image processed in this work was acquired by the Landsat 8 Operational Land Imager (OLI) sensor on 11 March 2017, orbit/point 221/79, and made available by the USGS. Its spatial resolution is 30 m for the visible, near-infrared, and mid-infrared bands and 15 m for the panchromatic band. The 16-day temporal resolution enables continuous monitoring projects (Explorer, 2016). The cartographic data were processed in Quantum GIS 2.18.6 with the Semi-Automatic Classification Plugin (SCP), which allowed the Landsat 8 image to be processed.
For the image chart and the land cover analysis, a pseudo-color composition was used, stacking bands 4-5-6 (RGB). Digital numbers were converted to surface reflectance with the SCP pre-processing options, which use the DOS1 method to correct atmospheric effects. Atmospheric correction is applied to eliminate imperfections that may degrade the information (Maranhão, Pereira, Costa, & Anjos, 2017), so that physical surface reflectance values are obtained without atmospheric interference. For the vector data, the cartographic base of the municipal boundary, obtained from the Brazilian Geological Service (CPRM, 2017), was used.
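The conversion from digital numbers to reflectance with a dark-object correction can be sketched as follows. This is a simplified illustration under stated assumptions, not the SCP implementation: it applies the standard Landsat 8 rescaling to top-of-atmosphere reflectance and then subtracts a dark-object offset directly in reflectance space, whereas DOS1 proper estimates a path radiance. The function names, the sample digital numbers, and the sun elevation are hypothetical.

```python
import numpy as np

# Usual Landsat 8 OLI reflectance rescaling factors; in practice they should be
# read from the scene's MTL metadata file rather than hard-coded.
REFLECTANCE_MULT = 2.0e-5
REFLECTANCE_ADD = -0.1


def toa_reflectance(dn, sun_elevation_deg):
    """Convert digital numbers to top-of-atmosphere reflectance."""
    dn = np.asarray(dn, dtype=float)
    rho = REFLECTANCE_MULT * dn + REFLECTANCE_ADD
    # Correct for the sun angle at acquisition time.
    return rho / np.sin(np.radians(sun_elevation_deg))


def dark_object_subtraction(rho_toa, dark_reflectance=0.01):
    """Simplified DOS: shift the band so its darkest pixel has 1% reflectance."""
    rho_toa = np.asarray(rho_toa, dtype=float)
    # Assumption: the darkest pixel should reflect about 1%; any excess is haze.
    haze = max(float(np.nanmin(rho_toa)) - dark_reflectance, 0.0)
    return np.clip(rho_toa - haze, 0.0, 1.0)


# Hypothetical 2x2 patch of digital numbers and a hypothetical sun elevation.
dn_patch = np.array([[7000, 9000], [12000, 15000]])
rho_surface = dark_object_subtraction(toa_reflectance(dn_patch, sun_elevation_deg=50.0))
```

Each band is corrected independently, so the function would be called once per band before any index is computed.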
The vegetation indices were selected from those usually applied in studies of this nature (Xue & Su, 2017): the Normalized Difference Vegetation Index (NDVI), Soil-Adjusted Vegetation Index (SAVI), Leaf Area Index (LAI), Enhanced Vegetation Index (EVI), and Normalized Difference Water Index (NDWI) (Table 1).
Using the Semi-Automatic Classification Plugin (SCP) in Quantum GIS 2.18.6, the algebraic operations that define the indices listed above were applied to the bands converted to apparent reflectance values; the images classified by each index were then generated and compared.
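The band algebra behind the five indices can be sketched in a few lines. Table 1 is not reproduced in this text, so the formulations below are the commonly published ones and are an assumption about what the study used: NDWI in the near-infrared/mid-infrared form, SAVI with a soil factor L = 0.5, EVI with the usual coefficients, and LAI from an empirical SAVI-based relation. The function names and reflectance values are hypothetical; for Landsat 8 OLI the inputs correspond to bands 2 (blue), 4 (red), 5 (near-infrared), and 6 (mid-infrared).

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def vegetation_indices(blue, red, nir, swir1, soil_factor=0.5):
    """Compute NDVI, SAVI, EVI, NDWI, and LAI from reflectance arrays."""
    blue, red, nir, swir1 = (
        np.asarray(band, dtype=float) for band in (blue, red, nir, swir1)
    )
    ndvi = normalized_difference(nir, red)
    ndwi = normalized_difference(nir, swir1)
    savi = (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    # Assumed empirical LAI-SAVI relation; negative results are set to zero.
    ratio = np.clip((0.69 - savi) / 0.59, 1e-6, None)
    lai = np.maximum(-np.log(ratio) / 0.91, 0.0)
    return {"NDVI": ndvi, "SAVI": savi, "EVI": evi, "NDWI": ndwi, "LAI": lai}


# Hypothetical reflectances: a vegetated pixel followed by a water pixel.
indices = vegetation_indices(
    blue=np.array([0.03, 0.06]),
    red=np.array([0.05, 0.04]),
    nir=np.array([0.40, 0.02]),
    swir1=np.array([0.20, 0.01]),
)
```

In this example the vegetated pixel has an NDVI of about 0.78, while the water pixel has a negative NDVI and an LAI of zero.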
Land use and cover were mapped by supervised classification with the maximum likelihood classifier (MaxVer), which, according to Meneses and Sano (2012), weights the distances between the means of the digital levels using statistical parameters. The following thematic classes were proposed for the study area: water bodies (water surfaces, dams, rivers, lagoons, artificial lakes); native forests (areas occupied by different native forest formations, including permanent preservation areas); planted forests (established monocultures of Eucalyptus and Pinus plantations); built areas (urban areas, established rural areas, roads, and other constructions and infrastructure); agricultural areas (cultivated areas with undefined crop types); areas under fallow (newly harvested farming or forestry areas and areas prepared for the next planting); and native fields (areas without forest whose characteristic vegetation is natural pasture; planted pastures were also included).
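The maximum likelihood rule can be illustrated with a minimal Gaussian classifier. This is a sketch assuming equal prior probabilities and a multivariate normal distribution per class; it is not the SCP implementation, and the class names, band values, and function names are hypothetical.

```python
import numpy as np


def train_maxlike(samples_by_class):
    """Estimate the mean vector and covariance matrix of each class."""
    params = {}
    for label, samples in samples_by_class.items():
        x = np.asarray(samples, dtype=float)
        cov = np.cov(x, rowvar=False)
        # Store the inverse covariance and log-determinant for the discriminant.
        params[label] = (x.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return params


def classify_maxlike(pixels, params):
    """Assign each pixel to the class with the highest Gaussian log-likelihood."""
    pixels = np.asarray(pixels, dtype=float)
    labels = list(params)
    scores = np.empty((pixels.shape[0], len(labels)))
    for j, label in enumerate(labels):
        mean, inv_cov, log_det = params[label]
        diff = pixels - mean
        # Squared Mahalanobis distance of every pixel to the class mean.
        mahalanobis = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        scores[:, j] = -0.5 * (log_det + mahalanobis)
    return [labels[i] for i in scores.argmax(axis=1)]


# Hypothetical training samples with two bands (red, near-infrared).
rng = np.random.default_rng(42)
training = {
    "water": rng.normal([0.05, 0.02], 0.01, size=(50, 2)),
    "forest": rng.normal([0.04, 0.40], 0.02, size=(50, 2)),
}
params = train_maxlike(training)
predicted = classify_maxlike([[0.05, 0.03], [0.04, 0.38]], params)
```

A pixel is therefore assigned by its statistical distance to each class distribution, not by its plain Euclidean distance to the class mean.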
After the land use and cover map was obtained, the accuracy of the classification was estimated through statistical analyses that relate the occurrences of each class to reference points, generating a contingency matrix (error matrix). The reference points were collected at random and labeled by visual interpretation. The contingency matrix validates the supervised classification by providing the overall accuracy of the mapping and the Kappa coefficient (Cohen, 1960), which quantifies the agreement between classification and reference data: the closer the value is to one, the more accurate the classification, and the closer it is to zero, the more doubtful its reliability (Silva, 2011).
The error matrix also yielded the producer's accuracy (related to the omission error and calculated from the reference data, it indicates the probability that a reference sample, the field truth, is correctly classified) and the user's accuracy (related to the commission error and calculated from the classification data, it indicates the probability that a classified pixel actually represents that category in the field) (Furtado, 2013). With this information it was possible to evaluate whether the classification was effective.
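The two summary measures can be computed directly from the contingency matrix, as in the sketch below. The matrix values are hypothetical and are not the study's data; the function name is illustrative.

```python
import numpy as np


def overall_accuracy_and_kappa(matrix):
    """Return overall accuracy and Cohen's Kappa for a square error matrix."""
    m = np.asarray(matrix, dtype=float)
    total = m.sum()
    # Observed agreement: share of samples on the main diagonal.
    observed = np.trace(m) / total
    # Agreement expected by chance, from the row and column totals.
    expected = (m.sum(axis=0) * m.sum(axis=1)).sum() / total**2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class error matrix (rows: classified, columns: reference).
example_matrix = [[45, 5], [10, 40]]
accuracy, kappa = overall_accuracy_and_kappa(example_matrix)
```

For this hypothetical matrix the overall accuracy is 0.85 and Kappa is 0.70; Kappa is lower because it discounts the agreement expected by chance.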
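Producer's and user's accuracy follow from the column and row totals of the same matrix. The sketch below assumes that rows hold the classified data and columns hold the reference data; the values and the function name are hypothetical.

```python
import numpy as np


def producer_and_user_accuracy(matrix):
    """Return per-class producer's and user's accuracy from an error matrix.

    Assumes rows are classified data and columns are reference data.
    """
    m = np.asarray(matrix, dtype=float)
    correct = np.diag(m)
    # Producer's accuracy: correct samples over reference (column) totals.
    producer = correct / m.sum(axis=0)
    # User's accuracy: correct samples over classified (row) totals.
    user = correct / m.sum(axis=1)
    return producer, user


# Hypothetical two-class error matrix.
producer, user = producer_and_user_accuracy([[45, 5], [10, 40]])
omission_error = 1.0 - producer
commission_error = 1.0 - user
```

The omission and commission errors are the complements of the producer's and user's accuracy, respectively.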
One hundred random samples were collected from each land use and cover class, and the classification and index values were then extracted with the Point Sample Tools. The data were compiled and statistically analyzed in R 3.3.1 with the Rcmdr package, and the contribution of each vegetation index to the identification of the proposed thematic classes was then evaluated.
Correlation analysis was performed between the vegetation indices and the land use and occupation classes to determine the degree of linear dependence between them. Five independent variables, the original indices, were analyzed. Variables were selected with the forward method, which makes it possible to examine the contribution of each independent index to the regression model, and the indices that contributed most to the identification of the land use and occupation classes were retained. The models were evaluated by the adjusted coefficient of determination (R²adj), the standard error of the estimate (Syx), and the coefficient of variation (CV%); the goodness of fit of the models selected for each land use and occupation class was determined by the distribution of the residuals (Alba et al., 2017) and by the sum of the statistical scores proposed by Thiersch (1997). For each statistic, the index with the best result received the lowest weight (one), and the remaining indices received values up to N; the lowest sum of scores indicates the best equation.
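The sampling step can be approximated outside the GIS as a stratified random draw from the classified raster. The sketch below is a stand-in for the plugin workflow, not a reproduction of it; the array sizes, class codes, and function name are hypothetical.

```python
import numpy as np


def sample_points_per_class(class_map, index_layers, n_per_class=100, seed=0):
    """Draw random pixels from each class and read the index values there."""
    rng = np.random.default_rng(seed)
    class_map = np.asarray(class_map)
    records = []
    for label in np.unique(class_map):
        rows, cols = np.nonzero(class_map == label)
        # Sample without replacement; small classes yield fewer points.
        n = min(n_per_class, rows.size)
        chosen = rng.choice(rows.size, size=n, replace=False)
        for r, c in zip(rows[chosen], cols[chosen]):
            record = {"class": int(label), "row": int(r), "col": int(c)}
            for name, layer in index_layers.items():
                record[name] = float(layer[r, c])
            records.append(record)
    return records


# Hypothetical 20x20 classified map with two classes and one index layer.
class_map = np.ones((20, 20), dtype=int)
class_map[:, 10:] = 2
layers = {"NDVI": np.random.default_rng(1).random((20, 20))}
records = sample_points_per_class(class_map, layers)
```

The resulting list of records has one row per sample point, which is the tabular form needed for the correlation and regression analyses.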
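The three fit statistics and the rank-sum scoring can be sketched as follows. The study ran this analysis in R; the Python version below is only an illustration, with ordinary least squares, a hypothetical response variable, and hypothetical model names. The forward selection step itself is not reproduced.

```python
import numpy as np


def fit_statistics(x, y):
    """Fit y on x by least squares and return R2 adjusted, Syx, and CV%."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    y = np.asarray(y, dtype=float)
    n, p = x.shape
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    syx = float(np.sqrt(ss_res / (n - p - 1)))
    return {"r2_adj": r2_adj, "syx": syx, "cv": 100.0 * syx / float(y.mean())}


def rank_sum_scores(stats_by_model):
    """Rank models on each statistic (1 = best) and sum the ranks."""
    names = list(stats_by_model)
    totals = dict.fromkeys(names, 0)
    # Higher R2 adjusted is better; lower Syx and CV% are better.
    for key, higher_is_better in (("r2_adj", True), ("syx", False), ("cv", False)):
        ordered = sorted(names, key=lambda m: stats_by_model[m][key], reverse=higher_is_better)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return totals


# Hypothetical data: y depends on x_good and is unrelated to x_poor.
rng = np.random.default_rng(0)
x_good, x_poor = rng.random(100), rng.random(100)
y = 2.0 + 3.0 * x_good + rng.normal(0.0, 0.05, 100)
stats = {"model_a": fit_statistics(x_good, y), "model_b": fit_statistics(x_poor, y)}
totals = rank_sum_scores(stats)
best_model = min(totals, key=totals.get)
```

The model with the lowest total is selected, which matches the rule that the lowest sum of scores indicates the best equation.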

Results
The land use and cover mapping identified seven thematic classes in the study area (Figure 2). Visual analysis showed a predominance of native forests in the southern region, associated with watercourses. The quantitative analysis of the mapping (Table 2) gives the area occupied by each thematic class in the original classification. Native forests, native fields, and established monocultures account for the highest percentages and occupy the largest areas in the municipality. Built areas and watercourses, on the other hand, are less expressive in Campo Belo do Sul.
The Kappa index, calculated from the contingency matrix of the thematic map and the reference points, was 0.88. The classification was therefore considered very good according to the evaluation criteria of Galparsoro and Fernández (1999); the overall accuracy was approximately 92% (Table 3).
The vegetation indices resulting from the map algebra are shown in Figure 3, which illustrates how the classification varies with each index when compared with the initial classification by the MaxVer algorithm (Figure 2).
The classification of the land cover varies with each index, as corroborated by the different class intervals obtained for each vegetation index tested in the study (Table 4).
The Pearson correlation analysis of the vegetation indices within each use class is shown in Figure 4, in which perfectly correlated variables have a value of one. For the different land use and cover classes, the vegetation indices were highly correlated with each other, except in the native forests class, where most variables were weakly correlated and only LAI and SAVI were highly correlated.
For the native forest, the correlation coefficient between NDVI and NDWI was 0.68, one of the highest for this class. EVI and SAVI were strongly correlated in the native field, planted forest, agriculture, and urbanization classes, showing the highest correlation coefficients. However, a high correlation does not mean that both indices are adequate for a class, as can be seen in Table 5.
From the linear regression models and the scores for each land use class (Table 5), it can be inferred that the most adequate indices were NDVI and EVI for native forests; SAVI, followed by EVI and NDVI, for planted forests; SAVI and EVI for agricultural areas and also for native fields; NDWI and NDVI for areas under fallow; NDWI for built areas; and NDVI and SAVI for waterbodies. No single index therefore excelled in all classes.
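A correlogram of this kind rests on a Pearson correlation matrix computed separately for the samples of each class. A minimal sketch with hypothetical index values for a single class follows; the function name is illustrative.

```python
import numpy as np


def correlation_matrix(samples, names):
    """Return the Pearson correlation matrix of the named index columns."""
    data = np.column_stack([np.asarray(samples[name], dtype=float) for name in names])
    return np.corrcoef(data, rowvar=False)


# Hypothetical index values sampled at four points of one class.
index_names = ["NDVI", "SAVI", "NDWI"]
class_samples = {
    "NDVI": [0.2, 0.4, 0.6, 0.8],
    "SAVI": [0.14, 0.28, 0.42, 0.56],  # exactly proportional to NDVI here
    "NDWI": [0.5, 0.1, 0.4, 0.2],
}
corr = correlation_matrix(class_samples, index_names)
```

In this example NDVI and SAVI are perfectly correlated (coefficient of one) because one is an exact multiple of the other, whereas NDWI is only weakly related to both.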

Discussion
The initial analysis of land use and cover in the municipality of Campo Belo do Sul shows that the native formations of the region, represented by native forests and native fields, occupy approximately 70% of the municipality. Consistently, the classes related to anthropic action are less expressive, which suggests that environmental degradation is not widespread in the municipality. Native fields cover 34% of the municipality, a percentage second only to native forests, which can be explained by the extensive livestock farming practiced in this territory, for which the native fields (a natural pastoral ecosystem) provide support (Kaibara, 2014). Approximately 50% of the municipal territory is covered by shrub and tree vegetation, adding native forests (38%) and established monocultures (12%) of Eucalyptus and Pinus. However, 8% of the municipality has no vegetation cover, considering areas under fallow and built areas; the areas under fallow have exposed soil as a result of the intense soil use for agriculture and livestock production in the region. Agricultural areas (7%) and watercourses (1%) were also identified but are less expressive.
The variation in accuracy among the classes can be associated with common remote sensing limitations, among them the difficulty of distinguishing targets with partially similar spectral responses, such as native and planted forests, native fields, agricultural areas, and areas under fallow. Another possible confounding factor in the automatic classification is the relief: the sun-earth-sensor geometry prevents targets on steep slopes from fully reflecting the electromagnetic radiation toward the sensor, so that only shadows are detected. This factor interferes with the spectral response of targets located on slopes and, consequently, with the quality of the automatic classification (Caixeta, Edmundo, Rodrigues, Moreira, & Medeiros, 2012).
Understanding the forms of land use and occupation, for modeling purposes and over large areas with a good level of detail, can be considerably improved with geoprocessing tools. In addition to reducing fieldwork, extracting information from medium-resolution digital images, such as those of the Landsat 8 satellite, allows larger areas to be studied in less time.
When the land use classification by MaxVer (initial classification) was compared with the vegetation indices, variations in the classification of the municipality were observed. Differences among the vegetation indices are also expected, since each is based on a different algorithm, with its own specificities and level of detail.
The correlogram within each thematic class showed that some indices are highly correlated with each other for most classes. Macedo, Sousa, Gonçalves, Silva, and Rodrigues (2017), who developed allometric functions to estimate total and commercial volume using a vegetation index as a dependent variable, inferred that the high correlations among vegetation indices are explained mainly by their composition, since they essentially use the red and infrared spectral bands. Such behavior is therefore expected in the present study, since some indices are computed from others.
The regression analysis showed that the indices behave differently in each thematic class, some being more adequate than others for determining the different cover types.
NDVI was the best index for determining the native forests class and was also suitable for waterbodies (best fit), areas under fallow (second-best fit), and planted forests (third-best fit). NDVI is a relevant index for areas of medium to high vegetation density, since it is less susceptible to soil and atmospheric effects. According to Ponzoni et al. (2012), this index has been used to detect the effects of seasonality, the phenological stage of vegetation, the duration of the growth period, the green peak, physiological changes in leaves, senescence periods, and other important vegetation-related conditions. However, it is not suitable for areas with low vegetation cover (Karimi et al., 2018). The index ranges from −1 to +1 and is based on the high reflectance of healthy plants in the infrared band and their low reflectance in the red band of the electromagnetic spectrum (Hesketh, Sánchez-Azofeifa, & Azofeifa, 2014); healthy forest cover therefore usually has higher NDVI values. It is the most commonly used vegetation index and minimizes topographic effects by producing a linear measurement scale on which values closer to 1 indicate denser vegetation cover and values near 0 indicate the absence of vegetation, that is, non-vegetated surfaces (Rosendo, 2005). This explains the good results of this index for waterbodies and areas under fallow, since it covers a wide range of class intervals.
NDWI was the best-fit index for areas under fallow and built areas, and also for waterbodies (after NDVI). This index is a modification of NDVI that highlights water features and minimizes the remaining targets. NDWI is highly correlated with the water content of the vegetation cover, which makes it possible to measure biomass changes and to evaluate vegetation water stress through mathematical operations on the near-infrared and mid-infrared bands (Jensen, 2009).
Authors have nonetheless used NDWI for numerous purposes. Investigating phenological metrics in dry and dormancy periods in semi-arid pastures, Ding, Liu, Huan, Li, and Zou (2017) found results suggesting that phenological studies using NDWI can expand the understanding of land-surface phenology; they also suggest that combining this index with climatic variability could contribute to the study of ecosystem processes in semi-arid pastures. Ahmed and Akter (2017) studied changes in land use and cover following regular flooding in coastal areas of Bangladesh and noted that both NDWI and NDVI are prominent in identifying vegetation and water cover, considering their individual restrictions.
With another approach, Choung and Jo (2016) used NDWI to monitor changes in water resources in South Korea using Landsat multitemporal images; their results showed significant differences in waterbody sizes throughout the study years. Sarp and Ozcelik (2018) also used NDWI to assess changes in water resources in southwestern Turkey, and the results effectively showed the detection of change in the water surface between specified time intervals. On the other hand, aiming to determine indicators for assessing the vulnerability of forests and fires in Indonesia, Nurdiana and Risdiyanto (2015) used NDVI and NDWI as key parameters for the identification of fire outbreaks, where both indices were able to provide the highest contribution to the vulnerability level.
Studies around the world have sought to justify the use of this index, alone or combined with others, to identify dry areas. Gu, Brown, Verdin, and Wardlow (2007) evaluated satellite-derived indices (NDVI and NDWI) for monitoring vegetation drought through soil moisture observations in the United States. They found a strong relationship between the vegetation indices and the heterogeneity of land cover, soil type, and humidity; in areas of homogeneous forest cover, both indices were sensitive to changes in soil moisture and strongly related to vegetation drought conditions, suggesting that both are appropriate for monitoring water stress in vegetation. Szabó, Gácsi, and Balázs (2016) investigated the ranges of three spectral indices by land cover type and assessed their efficiency in discriminating land cover classes. In line with the multiple functions that these authors suggest for NDWI, the results of this study support the idea that it can be used to evaluate various land cover types.
The use of the indices above revealed the need for an index that accounts for the soil response, which, depending on the percentage of cover, can dominate the vegetation response. To mitigate this soil effect, SAVI (Huete, 1988) was created. It is based on the principle that the vegetation curve tends to approach the soil curve at low vegetation densities, passes through a mixture of spectral responses at intermediate densities, and shows almost no soil influence at high vegetation densities (Sousa & Ponzoni, 1998).
In this study, SAVI was the best-fit index for planted forests, agricultural areas, native fields, and waterbodies. According to Huete (1988), SAVI adjusts for different soil conditions that may exert considerable influence on the canopy signal: where soil brightness varies considerably because of differences in moisture, roughness, shade, or organic matter, the soil induces changes in vegetation index values. Because this index reduces soil effects, its good fit across the different cover types in this study probably occurred precisely because these areas have soils with favored reflectance.
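The soil adjustment can be made explicit with the usual form of the index. Table 1 is not reproduced in this text, so the expression below is the commonly published SAVI formulation and the value L = 0.5 is an assumption about the setting used; the reflectance values in the worked example are hypothetical.

```latex
\mathrm{SAVI} = \frac{(1 + L)\,(\rho_{\mathrm{NIR}} - \rho_{\mathrm{R}})}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{R}} + L},
\qquad
L = 0 \;\Rightarrow\; \mathrm{SAVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{R}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{R}}} = \mathrm{NDVI}

% Worked example with \rho_{\mathrm{NIR}} = 0.40, \rho_{\mathrm{R}} = 0.05, L = 0.5:
\mathrm{NDVI} = \frac{0.35}{0.45} \approx 0.78,
\qquad
\mathrm{SAVI} = \frac{1.5 \times 0.35}{0.95} \approx 0.55
```

The soil factor L enters the denominator as a constant, which dampens the influence of the background where total reflectance is low; with L = 0 the index reduces to NDVI.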
Corroborating the results of the present study, Liaqata et al. (2017) found that SAVI was the index that best fitted estimated agricultural production in irrigated areas in Pakistan, and González-Dugo and Mateos (2008) observed that SAVI also excelled in irrigated cotton and beet crops in southern Spain. For planted forests, Alba et al. (2017) showed that SAVI had the best correlation with volume estimates for a Pinus elliottii forest. In the studies by Cassol (2013), Maciel (2002), and Bernardes (1998), SAVI also stood out as one of the spectral variables most closely related to forest biomass in a Mixed Ombrophilous Forest, which has a high population density. These authors reinforce that soil brightness influences the spectral response even in closed canopies with high individual density.
EVI was the second-best index for the four land uses. It is the index most sensitive to canopy structure (Huete et al., 2002), although it is more influenced by the solar zenith angle under certain conditions (Galvão et al., 2011). It is more sensitive in regions with high biomass production and reduces atmospheric and soil influences (Jiang, Huete, Didan, & Miura, 2008). Risso et al. (2012) evaluated the performance of EVI and NDVI in discriminating soybean areas from sugarcane, pasture, forest, and Cerrado areas and found that EVI excelled in the image classification, quite possibly because of its greater sensitivity to higher biomass values.

Conclusion
No single vegetation index best represents all the classes evaluated in the study; however, NDVI, EVI, and SAVI fitted well in most of the thematic classes. The choice of the best vegetation index for land use and occupation classification depends particularly on the predominant land use.
In general, the spectral mixture of the information and the number of thematic classes made it difficult to define a single best vegetation index; depending on the objective and the predominant type of use, more adequate indices can be indicated.

Disclosure statement
No potential conflict of interest was reported by the authors.