Preliminary comparative assessment of various spectral indices for built-up land derived from Landsat-8 OLI and Sentinel-2A MSI imageries

ABSTRACT Urbanization in China has been rapid over the past three decades, causing substantial replacement of the natural landscape by built-up land. In this paper, we compare Sentinel-2A MSI (S2A) and Landsat-8 OLI (L8) data in the retrieval of five built-up indices, namely the Urban Index (UI), the Normalized Difference Built-up Index (NDBI), the Index-based Built-up Index (IBI) and two visible-based indices, VgNIR-BI and VrNIR-BI. All water-masked built-up index maps were classified into built-up and non-built-up land using Otsu's method. In parallel, the support vector machine (SVM) algorithm was employed to classify each of the two images into three classes. The accuracy assessment results show that all built-up indices had higher Overall Accuracy for S2A (up to 98.14% for VrNIR-BI) and L8 (up to 98.42% for VrNIR-BI) imageries compared to SVM. The percentage differences demonstrate that L8 estimates a larger built-up area than S2A, by 1.48% to 8.45% via the built-up indices and by 13.40% via the SVM. Cross-checking with the Statistical Yearbook indicates that S2A is superior to L8 in built-up land mapping capability, especially when built-up indices are used. The differences caused by spatial resolution and spectral response functions should be taken into consideration in synergistic scientific applications.


Introduction
The world is urbanizing rapidly, and the urbanization process has replaced a substantial amount of natural landscape with built-up land (H. Q. Xu, Huang, & Zhang, 2013). The accelerating trend is evident in the increasing density and spatial sprawl of urbanized areas, which drive the change from natural landscape to impervious surfaces (Sun, Chen, Jia, Yao, & Wang, 2016). The spatial distribution and temporal dynamics of built-up land play a significant role in ecosystem services and global environmental change. Therefore, there is an urgent need to monitor and assess the spatial distribution and sprawl pattern of urban areas precisely (Grimm et al., 2008; Masek, Lindsay, & Goward, 2000). Since the late 1990s, the number of remote-sensing satellites has greatly increased. For urbanization studies, it is essential to derive land-use/cover maps from remote-sensing imageries (Antrop, 2007; Radoux et al., 2016). In recent years, Landsat and ASTER imageries have often been used for mapping land use/cover (Coulter et al., 2016; De Rose, Oguchi, Morishima, & Collado, 2011; Garcia et al., 2008; Satir & Erdogan, 2016; Schmugge, Kustas, & Humes, 1998). Meanwhile, some studies have focused on modelling urban sprawl and its environmental impact (Kurucu & Chiristina, 2008; Nole, Murgante, Calamita, Lanorte, & Lasaponara, 2015), and on simulating the heat island phenomenon (Kato & Yamaguchi, 2005; Liu & Zhang, 2011). In general, there are three key indicators to characterize the urban ecosystem: biodiversity (Cohen, Baudoin, Palibrk, Persyn, & Rhein, 2012; Savard, Clergeau, & Mennechez, 2000), the density of vegetation (Jung, Kardevan, & Tokei, 2005; Quattrochi & Ridd, 1998) and impervious surface (Shahtahmassebi et al., 2016; Weng, 2012). The increase in impervious surface in urban areas has led to the degradation of the environment (Carlson & Arthur, 2000) and the depletion of natural resources (Kaufmann et al., 2007).
In order to rapidly and accurately map built-up land from satellite imageries, previous studies have put forward various spectral built-up indices. These include the Urban Index (UI) (Kawamura, Jayamana, & Tsujiko, 1996), the Normalized Difference Built-up Index (NDBI) (Zha, Gao, & Ni, 2003), the Index-based Built-up Index (IBI) (H. Xu, 2008), VrNIR-BI and VgNIR-BI (Estoque & Murayama, 2015), and the Normalized Difference Impervious Surface Index (NDISI) (H. Q. Xu, 2010). Despite the continuous development of built-up spectral indices, there is still a lack of comprehensive comparisons across imageries acquired by different sensors, especially Sentinel-2A MSI imagery. Hence, a preliminary comparative assessment of various spectral built-up indices for mapping built-up land with the Landsat-8 OLI and Sentinel-2A MSI sensors is of considerable value.

Study area and data resource
Xuzhou city (33°43′-34°58′N, 116°22′-118°40′E) is located in the northwest of Jiangsu Province, China, adjacent to Shandong, Henan, and Anhui provinces (Figure 1). It consists mainly of plains, with small hills and mountains in the central and northeastern regions. For the purpose of this study, the central area of the city, comprising three districts, was selected as the study area; it covers 43,945.3 ha. The Old Yellow River flows across the city generally from west to southeast, and the Yunlong Lake is located in the south of the region. The typical land-use types in the study area are built-up land, water, forest, grassland and cropland.
Landsat-8 OLI and Sentinel-2 MSI sensors were launched in 2013 and 2015, respectively. Alongside the high-resolution imagery provided by commercial satellite companies, the new generation of Landsat-8 and Sentinel-2 satellites provides moderately high-resolution imagery free of charge. The two sensors have numerous characteristics in common but also some dissimilarities. For instance, they differ not only in spatial resolution but also in spectral resolution. A comparison of the multispectral bands of the two sensors is shown in Table 1, and the spectral response functions of the two sensors are shown in Figure 2.

Table 1. Band name, spectral range (nm) and spatial resolution (m) of the corresponding Sentinel-2A MSI and Landsat-8 OLI bands.

To conduct a comparative study on the differences between the sensors, we acquired an image pair with similar solar azimuth and elevation angles. In this study, we used a near-simultaneous Sentinel-2A MSI image and Landsat-8 OLI image (path 122, row 36) acquired on 28 August 2016 02:55:42 UTC (solar elevation angle 56.91633907°, solar azimuth angle mean 146.682425442976°) and 2 September 2016 02:49:07 UTC (solar elevation angle 56.91631739°, solar azimuth angle mean 139.25155778°), respectively, concurrent with the field campaign (same year and season). The S2 MSI data were downloaded from the Sentinels Scientific Data Hub (https://scihub.copernicus.eu/) as a Level-1C product with 13 bands, i.e. geometrically corrected Top-of-Atmosphere reflectance. We downloaded the L1T (Standard Terrain Correction) Landsat-8 OLI image with 11 bands from the United States Geological Survey (USGS) Earth Explorer website (http://earthexplorer.usgs.gov). Figure 3 illustrates the spectral profiles of different land-use types in the study area.
The typical land-use types in the study area show a similar surface reflectance pattern in the two images (Figure 3), despite the differences in the wavelength ranges (Table 1) and spectral response functions (Figure 2) of the two sensors.

Data pre-processing
It should be noted that atmospheric conditions and aerosols vary in space and time and have significant impacts on remote-sensing images. Radiometric calibration and atmospheric correction are therefore required before deriving spectral indices such as built-up, water and vegetation indices. In this study, we used the surface reflectance of each Landsat-8 OLI band produced by LaSRC (Landsat Surface Reflectance Code) (https://espa.cr.usgs.gov/), which was developed for Landsat applications and explicitly models the sensor's spectral response functions (Vermote, Justice, Claverie, & Franch, 2016; Zhang et al., 2018). The surface reflectance was stored as signed 16-bit integers scaled by 10,000.
The Sentinel-2 MSI data are distributed as Level-1C, which are ortho-image TOA reflectance products (Radoux et al., 2016). To derive the MSI land products at Level-2A (bottom-of-atmosphere reflectance), the Sentinel toolbox (S2TBX) provides the Sen2Cor processor (Version 2.3.1, released 13 February 2017) for atmospheric correction (Martins et al., 2017). As a module of the Sen2Cor algorithm, an operational atmospheric correction is applied to the MSI spectral bands to retrieve atmospheric parameters from the image itself: cirrus correction uses a channel at 1375 nm, water vapor retrieval is based on the B8A and B9 bands (865 and 945 nm), and AOD is also retrieved (Muller-Wilm, Louis, Richter, Gascon, & Niezette, 2013). The algorithm thus implements a semi-empirical approach that combines image-derived atmospheric properties with precomputed look-up tables (LUTs) from the libRadtran radiative transfer model. An advantage of this image-based approach is that it can be applied in regions without climatological information.

Spectral built-up indices
In the 1990s, the UI (Equation (1)) was proposed as a built-up index to generate information on, and evaluate the situation and sprawl pattern of, built-up areas from remote-sensing data (Kawamura et al., 1996). The UI is a normalized difference of the NIR and SWIR 2 bands, which makes full use of the inverse relationship between brightness in the NIR and SWIR bands in built-up areas. In recent years, the NDBI has commonly been used to map or detect changes of built-up land; it exploits the unique spectral response of built-up land relative to other land types (Zha et al., 2003). In contrast to the UI, the NDBI employs the SWIR 1 band (Equation (2)). However, because drier vegetation has higher reflectance in the SWIR band, it is difficult to distinguish built-up land from drier vegetation with the NDBI (Cibula, Zetka, & Rickman, 1992). In 2008, Xu proposed the IBI to rapidly extract built-up area from remote-sensing data (H. Xu, 2008). Unlike conventional indices, this index is constructed from other indices derived from the original image bands (Equation (3)). The IBI, which utilizes the visible, NIR and SWIR bands, can overcome the limitation of the NDBI. Two visible-based indices that are combined with the NIR band are also employed in this study, namely VrNIR-BI and VgNIR-BI, which utilize the red and green bands, respectively (Estoque & Murayama, 2015) (Equations (4) and (5)). In Equations (1)-(5), ρ_Green, ρ_Red, ρ_NIR, ρ_SWIR1 and ρ_SWIR2 are the planetary reflectance of the green, red, NIR, SWIR1 and SWIR2 bands of the Landsat OLI and Sentinel-2A MSI sensors, respectively.
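The five indices described above can be sketched for a single pixel as follows. This is a minimal illustration that assumes the commonly cited definitions from the referenced literature (Kawamura et al., 1996; Zha et al., 2003; H. Xu, 2008; Estoque & Murayama, 2015), with the IBI written in its expanded band form; the function names and the sample reflectance values are hypothetical and are not taken from this study.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0

def ui(nir, swir2):
    # Urban Index: (SWIR2 - NIR) / (SWIR2 + NIR)
    return normalized_difference(swir2, nir)

def ndbi(nir, swir1):
    # Normalized Difference Built-up Index: (SWIR1 - NIR) / (SWIR1 + NIR)
    return normalized_difference(swir1, nir)

def ibi(green, red, nir, swir1):
    # Index-based Built-up Index in band form (assumes positive reflectance,
    # so none of the three band sums below is zero).
    built = 2.0 * swir1 / (swir1 + nir)
    other = nir / (nir + red) + green / (green + swir1)
    return normalized_difference(built, other)

def vrnir_bi(red, nir):
    # (Red - NIR) / (Red + NIR), i.e. the negative of the NDVI
    return normalized_difference(red, nir)

def vgnir_bi(green, nir):
    # (Green - NIR) / (Green + NIR)
    return normalized_difference(green, nir)

# Hypothetical surface reflectance of one built-up pixel (illustration only).
pixel = {"green": 0.12, "red": 0.14, "nir": 0.20, "swir1": 0.26, "swir2": 0.24}
print("UI      ", round(ui(pixel["nir"], pixel["swir2"]), 4))
print("NDBI    ", round(ndbi(pixel["nir"], pixel["swir1"]), 4))
print("IBI     ", round(ibi(pixel["green"], pixel["red"], pixel["nir"], pixel["swir1"]), 4))
print("VrNIR-BI", round(vrnir_bi(pixel["red"], pixel["nir"]), 4))
print("VgNIR-BI", round(vgnir_bi(pixel["green"], pixel["nir"]), 4))
```

For whole images the same expressions would normally be applied band-wise to arrays rather than pixel by pixel; the scalar form is used here only to keep the sketch dependency-free.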

Optimal threshold
Binarization is the key step in extracting built-up land from the various built-up spectral indices, and the threshold plays an important role in this process. Otsu's (1979) method is a common image segmentation method and one of the most accurate and widely used (Sahoo, Soltani, Wong, & Chen, 1988). It automatically obtains the optimal threshold from the histogram of the grayscale image. The basic principle of this method is to find the threshold that maximizes the between-class variance $\sigma_B^2(t)$ (Mizushima & Lu, 2013), which can be expressed as Equation (8):

$$\sigma_B^2(t) = \frac{\left[M_G\,P(t) - M(t)\right]^2}{P(t)\left[1 - P(t)\right]} \qquad (8)$$

where $M_G$ is the average intensity of the entire image, $M(t)$ is the cumulative mean up to level $t$ ($t \in [a, b]$), and $P(t)$ is the cumulative sum of probability assigned to object (background). The value $t^*$ is the optimal threshold, which maximizes $\sigma_B^2(t)$ (Equation (9)):

$$t^* = \arg\max_{a \le t \le b} \sigma_B^2(t) \qquad (9)$$

A large number of studies have suggested that Otsu's method is an effective solution for the extraction of water, built-up and other land-use types from various index maps derived from remote-sensing data (Du et al., 2012, 2014; Li et al., 2013). In this paper, we developed a Python program based on Otsu's method to obtain the optimal thresholds (Figure 6).
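The threshold search described above can be sketched in a few lines of Python. This is not the authors' program, which is not reproduced in the paper; it is a minimal, dependency-free illustration that assumes the index values are binned into a 256-bin histogram, and the function name, bin count and sample values are all hypothetical.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold (in data units) that maximizes the between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # constant image: no meaningful threshold
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        hist[idx] += 1
    total = len(values)
    probs = [h / total for h in hist]
    global_mean = sum(i * p for i, p in enumerate(probs))  # M_G, in bin units

    best_t, best_var = 0, -1.0
    cum_p = 0.0     # P(t): cumulative probability up to bin t
    cum_mean = 0.0  # M(t): cumulative mean up to bin t
    for t in range(bins - 1):
        cum_p += probs[t]
        cum_mean += t * probs[t]
        if cum_p <= 0.0 or cum_p >= 1.0:
            continue  # one class is empty, variance undefined
        var_between = (global_mean * cum_p - cum_mean) ** 2 / (cum_p * (1.0 - cum_p))
        if var_between > best_var:
            best_var, best_t = var_between, t
    # Report the upper edge of the selected bin in the original data units.
    return lo + (best_t + 1) * width

# Hypothetical index values with two clearly separated groups (illustration only).
index_values = [-0.40] * 50 + [-0.35] * 50 + [0.10] * 50 + [0.15] * 50
threshold = otsu_threshold(index_values)
built_up_mask = [v > threshold for v in index_values]
print(threshold, sum(built_up_mask))
```

When several thresholds give the same between-class variance, as with a wide empty valley, this sketch keeps the first one; other implementations may pick the midpoint of the valley instead.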

Comparison and results
In this study, we used three comparative approaches that are based on band, spectral index and classification algorithm.

Corresponding bands comparison
The comparison of corresponding bands was limited to the bands used for calculating the built-up indices or in the classification algorithm for the two sensors, namely the blue, green, red, NIR, SWIR 1 and SWIR 2 bands. Sentinel-2A data have two NIR bands: band 8 and band 8a. Band 8 (NIR) has a wider spectral range than band 8a (narrow NIR) and is intended for producing outputs at 10 m resolution (Table 1). Therefore, we omitted band 8a from the comparison and used only band 8. A comparison of atmospherically corrected Landsat-8 and Sentinel-2A data of the study area showed that the SWIR 1 and SWIR 2 mean reflectances are in close agreement (mean difference −2.97% to 5.29%), but in the visible to NIR bands the Landsat-8 reflectance was higher than the Sentinel-2A reflectance (mean difference 17.34% to 80.63%) (Figure 4). Furthermore, we conducted a regression analysis of corresponding bands and obtained the coefficients and correlation for each band pair using a least squares fit (Figure 5).
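A band-pair comparison of this kind can be illustrated with an ordinary least squares fit and a mean relative difference. The sketch below is an assumption-laden example: the paper does not state which sensor was treated as the independent variable or how the mean difference was normalized, so the choice of reference, the function names and the sample reflectance values are hypothetical.

```python
def ols_fit(x, y):
    """Fit y = slope * x + intercept by least squares; also return R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

def mean_percent_difference(reference, other):
    """Difference of band means, as a percentage of the reference mean (assumed definition)."""
    mean_ref = sum(reference) / len(reference)
    mean_other = sum(other) / len(other)
    return 100.0 * (mean_other - mean_ref) / mean_ref

# Hypothetical co-located reflectance samples for one band pair (illustration only).
s2a_band = [0.10, 0.20, 0.30, 0.40]
l8_band = [0.05 + 1.1 * v for v in s2a_band]
slope, intercept, r2 = ols_fit(s2a_band, l8_band)
print(slope, intercept, r2, mean_percent_difference(s2a_band, l8_band))
```

In practice the two images would first be resampled to a common grid so that each sample pair refers to the same ground location.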

Comparison based on built-up index
In this paper, three land-use classes were mapped, which are built-up, non-built-up and water. There is a two-step process, including extracting water bodies and classifying built-up area based on built-up indices. The non-built-up class includes forest, grassland, and cropland. The built-up area includes the residential area, parking area, road and other impervious surfaces.
The water bodies were first extracted from the two water index maps. The two most commonly used water indices are the NDWI (McFeeters, 1996) and the MNDWI (H. Q. Xu, 2006). In this study area, the results indicated that the MNDWI is far superior to the NDWI for producing a binary water map from Landsat-8 with Otsu's method, while for the Sentinel-2A image the opposite is true. Thus, in this study, the MNDWI (Equation (7)) was used for Landsat-8 and the NDWI (Equation (6)) for Sentinel-2A. Otsu's method, mentioned in Section 2.4, was used to calculate the optimal thresholds and produce binary water maps, one for the Landsat-8 data and one for the Sentinel-2A data (Figure 6). The water bodies include lakes, reservoirs, rivers, etc. The Landsat-8 and Sentinel-2A imageries were masked by the two binary water maps. Subsequently, the built-up index maps were derived from each water-masked image (Equations (1)-(5)). Ten optimal thresholds were calculated from the water-masked built-up index maps using Otsu's method (five for the Landsat-8 data and five for the Sentinel-2A data). The histograms and optimal thresholds of the water-masked built-up index maps are shown in Figure 6. Thus, 10 binary maps were produced, each containing built-up land and non-built-up land. Finally, each of the 10 binary maps was merged with the corresponding extracted water class, yielding 10 three-class land-cover maps (Figures 7 and 8).
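The two-step procedure above (water masking, then built-up thresholding, then merging) can be sketched per pixel as follows. The water index formulas follow the standard definitions of McFeeters (1996) and H. Q. Xu (2006); the class codes, function names, threshold values and sample pixels are hypothetical, and in the actual workflow both thresholds would come from Otsu's method, with the built-up threshold computed only over non-water pixels.

```python
WATER, BUILT_UP, NON_BUILT_UP = 0, 1, 2  # hypothetical class codes

def ndwi(green, nir):
    # NDWI: (Green - NIR) / (Green + NIR)
    return (green - nir) / (green + nir)

def mndwi(green, swir1):
    # MNDWI: (Green - SWIR1) / (Green + SWIR1)
    return (green - swir1) / (green + swir1)

def classify_pixel(water_index, built_index, water_threshold, built_threshold):
    """Step 1: mask water. Step 2: split the remaining pixels by the built-up index."""
    if water_index > water_threshold:
        return WATER
    if built_index > built_threshold:
        return BUILT_UP
    return NON_BUILT_UP

# Hypothetical pixels: (green, red, nir) reflectance, illustration only.
pixels = [
    (0.06, 0.04, 0.02),  # water-like: NIR darker than green
    (0.12, 0.14, 0.20),  # built-up-like: red close to NIR
    (0.08, 0.05, 0.40),  # vegetation-like: NIR much brighter than red
]
water_t, built_t = 0.0, -0.4  # placeholder thresholds, not values from this study
labels = []
for green, red, nir in pixels:
    vrnir = (red - nir) / (red + nir)  # VrNIR-BI used here as the built-up index
    labels.append(classify_pixel(ndwi(green, nir), vrnir, water_t, built_t))
print(labels)
```

Masking water first matters because water pixels would otherwise distort the histogram from which the built-up threshold is derived.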

Comparison based on classification algorithm
The support vector machine (SVM) is a supervised learning model, with associated learning algorithms, that is used for classification and regression analysis and makes no assumption about the underlying data distribution (Pal & Mather, 2005). In this study, the SVM classification algorithm was applied to classify the two remote-sensing data sets. To compare the capability of the two sensors in mapping built-up area, three major classes were classified: built-up area, non-built-up area and water bodies. More than 200 training samples were used for each class. Each training sample was carefully selected from a homogeneous area to improve the accuracy of classification. Furthermore, to ensure that the results of this method are comparable to the classification based on the built-up indices, no ancillary data and no post-processing were used in the SVM classification (Figures 7 and 8).

Quantitative accuracy assessment
To verify the robustness of the five spectral built-up indices and classification based on SVM algorithm, quantitative assessments of the accuracy were implemented. During quantitative assessments, we calculated their respective omission error (OE), commission error (CE), overall accuracy (OA) and overall kappa coefficients (KCs) of classification utilizing built-up indices and SVM on the basis of the confusion matrices (Congalton, 1991;Foody, 2002).
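The accuracy measures named above all follow from the confusion matrix. The sketch below shows one conventional way to compute them, assuming rows hold the reference classes and columns hold the classified classes; the function name and the example matrix are hypothetical and do not reproduce Table 2.

```python
def accuracy_metrics(matrix):
    """matrix[i][j] = number of samples of reference class i classified as class j.

    Assumes every class has at least one reference and one classified sample.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    ref_totals = [sum(row) for row in matrix]                           # row sums
    map_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # column sums
    correct = [matrix[i][i] for i in range(n)]

    overall_accuracy = sum(correct) / total
    # Chance agreement expected from the marginal totals
    expected = sum(ref_totals[i] * map_totals[i] for i in range(n)) / (total * total)
    kappa = (overall_accuracy - expected) / (1.0 - expected)
    # Omission error: reference samples of a class that were missed
    omission = [1.0 - correct[i] / ref_totals[i] for i in range(n)]
    # Commission error: samples assigned to a class that belong elsewhere
    commission = [1.0 - correct[i] / map_totals[i] for i in range(n)]
    return overall_accuracy, kappa, omission, commission

# Hypothetical 3-class matrix (water, built-up, non-built-up), 150 samples per class.
example = [
    [150, 0, 0],
    [2, 145, 3],
    [0, 5, 145],
]
oa, kc, oe, ce = accuracy_metrics(example)
print(round(oa, 4), round(kc, 4))
print([round(v, 4) for v in oe], [round(v, 4) for v in ce])
```

A class whose commission error exceeds its omission error is over-mapped, which is the pattern the text later uses to argue that an index over-estimates built-up area.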
A total of more than 150 reference samples for each class were used to assess the accuracy of all the classification results. Each reference sample was checked against the pan-sharpened GF-1 PMS image (2 m resolution). All the index-derived and SVM-derived land-cover maps were then compared with the reference samples to obtain the confusion matrices and the respective OEs, CEs, OAs, and Kappa coefficients (Table 2). Water bodies derived using the MNDWI/NDWI and Otsu's optimal threshold method had a CE and OE of 0.00% and 3.67% for the Landsat-8 image (Table 2, Figure 7), and 0.02% and 2.06% for the Sentinel-2A image (Table 2, Figure 8), respectively. For the built-up land, the NDBI had the highest CE for both imageries (4.62% and 3.25%). It also had the highest OE for the Sentinel-2A image (6.28%), while the UI had the highest OE for the Landsat-8 image (2.42%), and NDBI with 7.76% OE for Sentinel-2A. The CEs of the built-up area derived from the indices were higher than their respective OEs for Landsat-8, while the CEs were much lower than their respective OEs for Sentinel-2A. This indicates that the built-up area derived from Landsat-8 was over-estimated by these indices in the study area, while the opposite is true for Sentinel-2A (Table 2, Figures 7 and 8). The results also show that these indices achieved high OAs for both Sentinel-2A (up to 98.14% for VrNIR-BI) and Landsat-8 (up to 98.42% for VrNIR-BI) imageries. However, the built-up area extracted by the built-up indices differs considerably between the Landsat-8 image and the Sentinel-2A image. The percentage differences show that Landsat-8 estimates a larger built-up area than Sentinel-2A by 1.48% to 8.45% (Table 2).
Table 2 also indicates that, in the SVM classification scheme, Landsat-8 estimates a larger built-up area than Sentinel-2A by 13.40%, although both Landsat-8 and Sentinel-2A achieve high OAs (97.67% and 99.53%, respectively). According to the Statistical Yearbook of Xuzhou City, the built-up area was approximately 225 km² at the end of 2015. This indicates that the Sentinel-2A image is superior to the Landsat-8 image in built-up land mapping capability, especially when built-up indices are used.

Discussion
The results show that the performance of the two sensors in mapping is not identical (Table 2). Spectral confusion is one of the reasons why the classification scheme failed to better distinguish built-up land from the other land types, especially from drier vegetation. Generally, compared with the Landsat-8 sensor, Sentinel-2A estimates built-up area more accurately under the same circumstances. We believe that this results from the differences between the two sensors in the relative spectral response function (Figure 2), as well as in spatial resolution (Table 2).
All the built-up indices derived from Landsat-8 and Sentinel-2A employ the NIR band (Equations (1)-(5)). It can be seen from Figure 3 that built-up land and water bodies have lower reflectance in this band than cropland/grassland and forest. The main rationale is that built-up land has higher reflectance in the MIR than in the NIR. Three of the built-up indices, namely the UI, NDBI and IBI, are calculated from the SWIR (SWIR 1/SWIR 2) and NIR bands (plus the red and green bands for the IBI). This enabled the three built-up indices to distinguish built-up land well from vegetation types, supporting more accurate classification results.
By contrast, the VrNIR-BI and VgNIR-BI combine the NIR channel with the red and green bands, respectively (Equations (4)-(5)). We can see from Figure 3 that in the green and red bands, built-up land appears brighter than other land types such as crop/grass, forest and water. Otsu's optimal thresholds for these two indices were lower than the thresholds of the other indices, as the built-up area has higher reflectance in the NIR band than in the two visible bands (Figure 6). It is worth mentioning that the VrNIR-BI, as the inverse of the NDVI, classified built-up land more accurately from Landsat-8 than from Sentinel-2A data.
Otsu's method performs relatively well for grayscale images whose histograms are bimodal or multimodal with deep, sharp valleys between the peaks. It contributes to the automation and objectivity of optimal threshold extraction in the grayscale image binarization process (Chen, Zhao, Li, & Yin, 2006; Masek et al., 2000; Thapa & Murayama, 2011). Almost all histograms of the index maps calculated in this study have a bimodal distribution, especially those of the VrNIR-BI and VgNIR-BI (Figure 6).
Additionally, the more precise area estimation of built-up land from Sentinel-2A could also benefit from the higher spatial resolution of the sensor. Of the Sentinel-2A bands utilized in the built-up indices and SVM classification, four (blue, green, red, and NIR) have a resolution of 10 m, and the SWIR bands have a resolution of 20 m (resampled to 10 m when computing the built-up indices). For Landsat-8, in contrast, all the bands used in this study have a base resolution of 30 m. A higher spatial resolution provides more detailed information.
For land-cover mapping derived from remote-sensing data, accuracy is defined as the consistency between the validated reference points or pixels and the pixels on the corresponding classified land-cover map. In theory, it would be desirable to compare each pixel in the classified land-cover map with the reference data or ground truth information (Myint, Gober, Brazel, Grossman-Clarke, & Weng, 2011). It is, however, unrealistic to collect reference data for the whole study area, and doing so would defeat the purpose of land-use classification from remote-sensing data. Previous studies advised that at least 50-100 samples for each class should be selected to assess the accuracy of classification (Congalton, 1991). Taking into account the balance between practical constraints and statistical rigor, no fewer than 150 reference samples for each class were gathered to assess the accuracy of the classification results derived from the various indices and the SVM method in the 43,945.3 ha study area.

Conclusions
Sentinel-2A MSI is the super-spectral instrument of the European Space Agency (ESA), with 13 spectral bands and a spatial resolution ranging from 10 m × 10 m to 60 m × 60 m; it provides additional data continuity for monitoring the global land surface alongside the Landsat and SPOT missions. Like other remote sensors, Sentinel-2A MSI is a valuable source of data for global change studies. Using the Sentinel-2A data would benefit Landsat data continuity and help fill data gaps in future scientific applications. To the best of the authors' knowledge, this research is the first comparative study on built-up land derived from various built-up indices between Landsat-8 and Sentinel-2A satellite imageries. As a contribution to the field of remote sensing, we have compared and analyzed the performance of five spectral built-up indices in built-up land mapping from Landsat-8 and Sentinel-2A imageries. The capabilities of Landsat-8 and Sentinel-2A in built-up land mapping were also assessed. Overall, Sentinel-2A performs similarly to Landsat-8 in built-up land mapping in this study area, yet there are some differences, which may have been caused by the differences in the spectral response functions and spatial resolution between the two sensors. When deriving built-up land from Landsat-8 OLI and Sentinel-2A MSI data for a synergistic scientific application, the differences between the two sensors have to be taken into consideration. In a previous related study (Zhang et al., 2018), TOA reflectance, surface reflectance, NBAR and derived NDVI were compared, and the differences between the two sensors were quantified by OLS transformation functions.
In urban remote sensing, a large number of studies have shown that it is more effective to detect built-up land by means of built-up indices that utilize the SWIR 1/SWIR 2 bands, in which built-up land has higher reflectance, allowing it to be clearly distinguished from other land types. In this paper, the VgNIR-BI and VrNIR-BI, which combine the green and red bands with the NIR band, were used to separate built-up land from vegetation types, especially dry vegetation types. The results reveal that these two built-up indices are more robust and accurate than the other indices (Table 2). There is no doubt that regions with different land-cover types have a direct impact on land-use classification results. A limitation of this study is that only a single study area was used. It is necessary to further study and compare the performance of built-up land extraction between the two sensors in different regions, especially those with complex land-cover types. Overall, the findings of this study should also raise awareness of the differences between Landsat-8 OLI and Sentinel-2A MSI, as well as sensors with similar spectral and spatial characteristics, such as SPOT, ETM+, and ASTER. This would be helpful for the seamless integration of historical imageries and for building long-term time series for dynamic monitoring in synergistic scientific applications based on more than one remote sensor.

Disclosure statement
No potential conflict of interest was reported by the authors.