Fusion of Sentinel-1A and Sentinel-2A data for land cover mapping: a case study in the lower Magdalena region, Colombia

ABSTRACT Land cover–land use (LCLU) classification tasks can take advantage of the fusion of radar and optical remote sensing data, leading generally to increase mapping accuracy. Here we propose a methodological approach to fuse information from the new European Space Agency Sentinel-1 and Sentinel-2 imagery for accurate land cover mapping of a portion of the Lower Magdalena region, Colombia. Data pre-processing was carried out using the European Space Agency’s Sentinel Application Platform and the SEN2COR toolboxes. LCLU classification was performed following an object-based and spectral classification approach, exploiting also vegetation indices. A comparison of classification performance using three commonly used classification algorithms was performed. The radar and visible-near infrared integrated dataset classified with a Support Vector Machine algorithm produce the most accurate LCLU map, showing an overall classification accuracy of 88.75%, and a Kappa coefficient of 0.86. The proposed mapping approach has the main advantages of combining the all-weather capability of the radar sensor, spectrally rich information in the visible-near infrared spectrum, with the short revisit period of both satellites. The mapping results represent an important step toward future tasks of aboveground biomass and carbon estimation in the region.

Radar and optical remote sensing data deliver complementary information, hence land cover classification tasks can take advantage of the fusion of both data types leading generally to increase mapping accuracy (Bagan, Kinoshita, & Yamagata, 2012;Herold & Haack, 2002). Erasmi and Twele (2009) used Landsat ETM+ and Envisat ASAR data for land cover mapping in Central Sulawesi, Indonesia. The authors found that SAR-based texture information combined with VNIR optical data led to an improved classification of vegetation. ALOS PALSAR and phenological information from the MODIS sensor were used by Qin et al. (2016) to map forests in Monsoon Asia using decision tree algorithms. L-band SAR helped in reducing the limitations of frequent cloud coverage and improved separability of evergreen shrubs and crops from forests. Gessner et al. (2015) mapped West African land cover by integrating optical MODIS, Envisat ASAR and TandemX/TerraSAR-X radar data using random forest classification (Breiman, 2001) combined with a an unsupervised classification scheme exploiting texture measures and backscattering amplitudes. The classification approach led to an overall accuracy of 80%. In general terms, the main advantage of the integration of satellite and optical radar data for classification lies in the fact that by exploiting different physical principles they provide both multi-band spectral information and structural land cover properties to be synergistically combined (see Joshi et al., 2016 for a comprehensive review of optical and radar data fusion for LCLU mapping).
With the development of the recent European Space Agency Sentinel-1, -2, -3 satellite constellation new opportunities open for Earth Observation. The constellation provide high operational ability, long-term continuity, superior calibration of sensors and a variety of sensing methods and products for the scientific community (Malenovský et al., 2012). Also, Sentinel data distribution is supported by the key advantage of a full free and open access policy for the majority of the products (Belward & Skøien, 2015). In this research we explored the integrated use of the recently launched Sentinel-1A and Sentinel-2A satellites for synergistic land cover mapping exploiting radar and optical data, for a case study in Colombia. The Sentinel-1 sensors provide C-band SAR images in single/dual polarization for a variety of acquisition modes (see Torres et al., 2012 for detailed characteristics of the Sentinel-1 mission). Its combination of high spatial (5 × 20 m in the Interferometric Wide Swath mode), large coverage (up to 400 km) and improved temporal resolution are expected to provide new exciting possibilities for accurate land cover mapping (Wagner et al., 2012). Frequent revisit time is a major advantage over previous radar missions, especially for the mapping and analysis of phenological dynamics in vegetation and agricultural land covers, together with the dual polarization capability and rapid product delivery (Torres et al., 2012).
The Sentinel-2 Multi-spectral Instrument (MSI) sensors provide radiometrically and geometrically superior multi-spectral high spatial resolution images over the global surface, at high revisit time (5 days at the Equator with two satellites in orbit) and a wide field of view covering 290 km with 13 bands in the optical NIR, SWIR parts of the electromagnetic spectrum (Drusch et al., 2012). Major advantages of Sentinel-2 with respect to previous satellite derived products (e.g. Landsat series) are represented by richer spatial and spectral content together with improved operational capabilities (Pesaresi et al., 2016). For vegetation mapping Sentinel-2 is especially relevant for the presence of two new bands in the red edge spectrum, at 705 and 740 nm (Delegido, Verrelst, Alonso, & Moreno, 2011). In other words, Sentinel-2 ′ s unique combination of characteristics represents an unprecedented potential for land cover characterization and mapping at regional and global scales (ESA, 2012;Spoto et al., 2012).
The main objective of this investigation is to derive a methodological framework to integrate Sentinel-1 and Sentinel-2 imagery for accurate land cover mapping by integrating the advantages of radar and optical imagery, using object-based and spectral classification techniques. The study was motivated by the need for both cost-effective and accurate land cover map production in a cloudy region (Colombian Andes), as part of an environmental characterization of the nationally funded project 'Strategies for natural resources valuation and ownership as a climate change adaptation mechanism in the Lower Magdalena region, Colombia'.

Study area
The study area covers approximately 2095 km 2 , in the Lower Magdalena region within the Cundinamarca district in central Colombia ( Figure 1). Elevation ranges from 90 m a.s.l. of the north-western fluvial plains till 2570 m of the south-eastern mountain range. Yearly average temperature is 23.8°C, with averages varying from 27.1°C maximum of the low altitude plain till 16.9°C minimum of the upper areas. Average precipitations are 1973 mm/year, std. dev. 359 mm (Karger et al., 2016). In the study area the warm western fluvial areas contiguous to the Magdalena River are dominated by managed grasslands and dry tropical forests. Moving eastwards the landscape becomes hilly and mountainous, reaching the Andean Cordillera Oriental, being characterized by secondary forests and degraded tropical mountain forests. At the east of the region, secondary vegetation and agricultural mosaics are the dominant land cover.
The study area includes five municipalities (Guaduas, Caparrapi, Quebradanegra, Utica and Puerto Salgar), whose economic activity is mainly based on livestock, and farming (especially sugar cane, coffee, bananas). Agricultural activities and cattle raising have produced marked landscape fragmentation in the region, a common characteristic of a large part of the Colombian Andes (Etter, McAlpine, & Possingham, 2008). Field observations have revealed that land degradation is also quite common, due principally to illegal forest logging activities and fires lighted to open fields and to induce green re-sprouting to feed cattle. The study area was selected for its importance in the context of the regional economy and for the presence of rapidly developing unregulated land cover conversion dynamics; such land changes also make the territory especially vulnerable to climate change-induced natural hazards. The heterogeneous and complex environment is associated with cloudy conditions that require the use of an all-weather remote sensing technology, but additionally the use of optical-IR spectrally rich imagery in order to achieve satisfactory classification and mapping accuracy through the diverse socio-ecological gradients present.

Satellite imagery pre-processing
Two Sentinel 1A images (Level 1 product) and two Sentinel-2A images (Level 1C) covering the entire study area were downloaded from the ESA's Sentinel Scientific Hub (ESA, 2016a). SAR data pre-processing of the Level-1 SLC image was performed using the SNAP Sentinel Application Platform toolbox (ESA, 2016b) standard processing methods (calibration, thermal noise removal, TOPSAR deburst, mosaicking), as well as resampling to 10 × 10 m 2 spatial resolution and multilook parametrization for both VV and VH polarization images. Speckle effects, characteristic of radar images that influences the radiometric information, were compensated by the use of a Refined Lee low-pass filter with a 5 × 5 kernel, which averages the images while preserving feature edges (Wang, Ge, & Li, 2012). A Ground Range Detection (GRD) level product was produced from the Single Look Complex (SLC) level 1 image.
The ESA SEN2COR processor for was used for atmospheric and terrain correction of the Sentinel-2A Top-Of-Atmosphere Level 1C image (ESA, 2016c) to obtain a Level-2A BOA reflectance image. A bilinear interpolation-based re-sampling to 10 × 10 m 2 spatial resolution was performed to bands 5, 6, 7 and 8a to achieve the same resolution of bands 2, 3, 4 and 8 and the re-sampled Sentinel-1A radar images. Both optical and radar images were terrain corrected using a 30 m resolution Shuttle Radar Topography Mission (SRTM) digital elevation model (Jarvis, Reuter, Nelson, & Guevara, 2008) and re-projected to UTM Zone 18, WGS84.

Land cover mapping
Ground surface texture, strongly influencing the radar image backscatter (Herold et al., 2006), represents valuable information for classification tasks, and has been previously exploited in radar-based land cover mapping (e.g. Chand & Badarinath, 2007;Haack & Bechdol, 2000;Herold, Haack, & Solomon, 2004). Radarderived texture measures were thus calculated using the SNAP grey-level co-occurring matrix (GLCM) module (texture variance and texture contrast, based on Haralick, Shanmugat, & Dinstein, 1973) for both VV and VH polarization, parameterized with a 5 × 5 window size and 32 probabilistic quantization levels. As a result, we obtained four different texture images of the Sentinel-1A data.
Vegetation indices are parameters sensitive to photosynthetically active radiation and are commonly calculated from the spectral reflectance of two or more bands (Bannari, Morin, Huete, & Bonn, 1995). Importantly, they have been demonstrated to be effective variables for vegetation discrimination (Karlson, Ostwald, Reese, Roméo Bazié, & Tankoano, 2016;Vaglio Laurin et al., 2016). Four vegetation indices were calculated for the Sentinel-2A reflectance image: the Normalized Difference Vegetation Index (NDVI; Rouse, Haas, Schell, Deering, & Harlan, 1974), the Sentinel-2 Red-Edge Position index (S2REP; Frampton, Dash, Watmough, & Milton, 2013), the Green Normalized Difference Vegetation Index (GNDVI; Gitelson & Merzlyak, 1998), and the Modified Soil Adjusted Vegetation Index (MSAVI; Qi, Chehbouni, Huete, & Kerr, 1994). NDVI is a commonly used vegetation index in remote sensing, and it has been used extensively in land cover mapping (e.g. Rodrigues, Marcal, Furlan, Ballester, & Cunha, 2013;Usman, Liedl, Shahid, & Abbas, 2015). Previous studies reported that land use classification can also generally benefit from the incorporation of information from the red-edge part of the electromagnetic spectrum (e.g. Recio, Helmholz, & Muller, 2011;Schuster, Förster, & Kleinschmit, 2012). The S2REP red-edge index is based on linear interpolation (following the method of Guyot & Baret, 1988) by exploiting Sentinel-2A bands 5 and 6, both positioned on the red edge slope, at 705 and 740 nm respectively. The GNDVI exploits the green visible spectrum (540-570 nm), and apart from being more sensitive to chlorophyll concentration than NDVI, it is a good senescence and leaf hydric stress indicator (Gitelson & Merzlyak, 1998), a situation often occurring in the drier periods within the western plains of the study area. The MSAVI index was chosen to minimize the effect of bare soil, particularly present in the western fluvial plains, and additionally because of its minor susceptibility to atmospheric effects (Qi et al., 1994;Saadat et al., 2011). Indices formulations and corresponding Sentinel-2A MSI bands are as follows: where L = 1−2 * s * (NIR−RED)*(NIR−s*RED)/(NIR+RED), where s is the slope of the soil line from a plot in the RED-NIR space.
A set of land cover classes of interest were defined, namely: forests, secondary vegetation/shrubs, cropland, water, pastureland and built (comprising urban and paved roads). The land use-land cover categories were selected using common categories described in many studies, and functional to the project mapping task, that is, the identification of a small number of dominant LCLUs for which, in a second stage of the project, aboveground biomass will be estimated. Due to the high heterogeneity present within some land cover classes, especially cropland, we opted for a classification approach that also exploits land textural properties. An object-based classification (segmentation) was performed using eCognition® v.9.01 (eCognition, 2016) by a user-supervised parameterization. Three steps represent the main operations performed: (i) image segmentation, (ii) generation of an image object hierarchy and (iii) classification (Blaschke, Burnett, & Pekkarinen, 2005). In this approach 'objects' are defined exploiting topologic (neighborhood, context) and geometric (form, size) information. The identification of homogeneous land patches was carried out through a supervised iterative process by tuning parameters of scale, shape and compactness. Optimum segmentation was identified with scale = 20, shape = 0.8 and compactness 0.2. We then trained the objectbased classifier by assigning a land cover class (labeling) to a set of 130 locations identified in Google Earth and a set of 30 field points (measured using a global positioning system receiver) collected in situ in Spring 2016. In order to evaluate the performance of different classification methods, we tested three different classification algorithms: Random Forest (Breiman, 2001), Support Vector Machine (SVM) (Burges, 1998) and k-Nearest Neighbors (k-NN) (Altman, 1992) assigning the same weight to each layer of the 20bands layer stack image. Additionally, to test the influence of the integrated radar-VNIR datasets, we applied the same classification algorithms to two layer stacks composed respectively by (i) the Sentinel 1A radar layers, and (ii) the Sentinel 2A VNIR layers. The estimation of the classification accuracy was performed based on a dataset composed by 160 field validation points. Thematic classification accuracy indices (overall, user, producer accuracy and Kappa coefficient) were then calculated. Figure 2 outlines the pre-processing and processing steps followed in this study.
The use of the radar stack layers and the VNIR stack layers separately did not produce acceptable results with any classifier. In contrast, the classification results showed that the SVM-classification algorithm applied to the integrated layer stack composed of both Sentinel radar and VNIR data outperformed the other configurations of classifiers and layer stacks (accuracy statistics in Table 1), thus we discuss here detailed results only from this classification approach.
Visual inspection of the object-oriented classification versus a natural color composite (Sentinel-2A) shows close-to-reality classified land cover patches. The integrated datasets SVM-classification map shows large extents of grasslands at the northern and western flat parts of the study area, a territory that is largely used for livestock (see Main Map). The eastern part of the study area is more heterogeneous, and characterized by a mosaic of croplands, shrubs and forests. Here the topography is more rugged thus less suitable to develop large fields for cattle raising. The dominant land cover in the study area is pastureland (860 km 2 ), followed by forest (591 km 2 ) and secondary  Figure 2. Pre-processing and processing chain performed in this study (integrated radar-VNIR dataset). vegetation/shrubs (476 km 2 ). Following standard accuracy calculation procedures (Congalton & Green, 2009), we estimated an overall accuracy of 88.75%, and a Kappa coefficient of 0.86. Omission and commission classification errors were found mainly in crops and secondary vegetation/shrubs, likely due to the high spectral heterogeneity present in both classes generated by the large variety of different cultivars and of vegetation types/growth stages, respectively. Nevertheless, no user and producer accuracy estimates were lower than 80% per class. A detailed confusion matrix of the integrated datasets SVM-classified map, with overall, user and producer class accuracies, is reported in Table 2.

Conclusion
The proposed LCLU classification approach using Sentinel-1A and Sentinel-2A data has two main advantages. First, the pre-processing chain supported by sensor-specific toolboxes developed by ESA represents a reliable and fast approach for the preparation of ready-to-process imagery. Second, the integration of texture and spectral information for an object-based classification produced satisfactory to very satisfactory accuracy results for all the LCLU classes considered. The inclusion of vegetation indices added important information for the discrimination between forests, secondary vegetation and crops. Although we are aware that a certain degree of redundancy by adding four vegetation indices is inherently present, we believe vegetation in heterogeneous environmental conditions, as in the study area, can be better detected using a set of different spectral indices. In this context the presence of red edge bands in the Sentinel-2 MSI imagery represents a significant spectral enrichment with respect to other commonly used sensors in LCLU mapping exercises.
A main advantage of using combined Sentinel-1 and -2 for LULC classification is the availability of data resulting from the all-weather capability of the active sensor, together with the short revisit period of both satellites (6 and 5 days at the equator with two operating satellites for the Sentinel-1 and -2 family, respectively). This is a major benefit in areas, such as the Andes, where the frequent and dense cloud cover represents a major problem in land cover mapping tasks.
A limitation of the classification approach used in this study is the supervised procedure used for the tuning operations of the parameters defining optimal object definition in eCognition, which limits the development of straightforward automatic processing tasks. On the other hand, it does represent a further step toward high accuracy classifications by taking into account more precisely site-specific land characteristics.
Further research will involve the analysis of the spectral discrimination potential of the Sentinel-1 and -2 sensors for the mapping of the diverse types of cultivars present in the region. Multi-temporal images can also offer a further area of research towards better discriminating and classifying temporally dynamic LULC classes, such as crops (Schmidt, Pringle, Devadas, Denham, & Tindall, 2016).
Also, the LCLU mapping results, with the outcome of few, but accurately mapped, LCLU categories, represents an important step towards to the calculation of aboveground biomass and carbon content estimation for each LCLU category in the study area. This will be undertaken during a second project phase.

Software
Data pre-processing was performed with ESA Sentinel Application Platform toolbox (SNAP) for the Sentinel-1A data, and ESA SEN2COR processor for the Sentinel-2A imagery. Object-based classification was performed using the image segmentation suite eCognition v9.01. Map layout was carried out with ESRI ArcGIS 10.4.1.

Acknowledgements
We thank Marta Lucia Guardiola and Mauricio Valencia for insightful discussions on the study area.

Disclosure statement
No potential conflict of interest was reported by the authors.