Urban land-use classification by combining high-resolution optical and long-wave infrared images

Abstract Multi-sensor, multi-resolution source images consisting of optical and long-wave infrared (LWIR) images are analyzed separately and then combined for urban mapping in this study. The methodology is built on a two-level classification framework. In the first level, the contributions of the two data sources to urban mapping are examined extensively through four types of classification, i.e. spectral-based, spectral-spatial, joint, and multiple feature classification. In the second level, an object-based approach is applied to refine the boundaries. The specificity of the proposed framework lies not only in the combination of two different images, but also in the exploration of the LWIR image as complementary spectral information for urban mapping. To verify the effectiveness of the presented classification framework and to confirm the complementary role of LWIR data in urban mapping, experimental results are evaluated on the grss_dfc_2014 data-set.


Introduction
In recent decades, remote sensing images have been provided with increasingly fine resolution in both the spectral and spatial domains. Improvements in spatial resolution are especially prominent for optical sensors and support more detailed and accurate mapping. Data from these sensors enable advanced applications such as urban mapping, precision agriculture, environmental monitoring, and military applications. However, high spatial resolution also challenges image analysis, since it can lead to high interclass spectral confusion (Huang and Zhang 2009). Consequently, purely spectral-based methods are not appropriate for very high-resolution (VHR) images.
To overcome this inadequacy, various attempts have been made to improve VHR classification. One of the most widely used strategies is to extract spatial features that provide discriminative information, such as gray-level co-occurrence matrix (GLCM) textures (Puissant, Hirsch, and Weber 2005; Pacifici, Chini, and Emery 2009), shape information (Zhang et al. 2006), object-based approaches (Huang, Zhang, and Li 2008), morphological profiles (Tuia et al. 2009), and Markov random fields (Li, Bioucas-Dias, and Plaza 2012). Meanwhile, the emergence of new sensors and advanced processing techniques has made the use of multi-source data increasingly feasible. A different approach to compensating for the spectral deficiency of VHR imagery is therefore to integrate additional data sources, such as LIDAR (Huang, Zhang, and Gong 2011) and SAR data (Waske and Benediktsson 2007). Since the information provided by a single sensor can be incomplete, inconsistent, or imprecise, multi-sensor information fusion is generally a better approach for remote sensing classification than single-source classification.
The objective of this study is to examine the utility of long-wave infrared (LWIR) information as a complementary spectral source for VHR classification in urban mapping. Several studies have reported the use of thermal bands for land cover mapping. Lu and Weng (2005) converted the thermal infrared image into a surface temperature map and incorporated it into the classification process. Gao et al. (2006) used thermal infrared (TIR) bands to distinguish earth objects that show spectral confusion in the visible and near-infrared bands; their experiments also revealed that TIR bands contain useful information for distinguishing different types of rock. Bischof, Schneider, and Pinz (1992) stacked thermal data as a spectral feature into a neural network for multi-spectral classification; their experiments showed that temperature information helps land cover identification. Segl et al. (2003) treated thermal data as spectral features in the classification, and their results showed that the thermal data played a key role in improving the identification of non-vegetated urban surfaces. Keuchel et al. (2003) applied a temperature threshold to the thermal data to separate cool clouds from the warmer ground surface; the preprocessed thermal band was then combined with the spectral bands for classification. Their results showed that the thermal information indirectly provided helpful information for land cover classification.

To exploit the utility of LWIR information in urban mapping, a novel two-level analytical approach is proposed in this paper. In the first level, the optical image is down-sampled to fit the size of the LWIR image; four classification strategies are then applied to investigate the contributions of the optical and LWIR images to urban mapping. In the second level, an object-based approach is adopted to project the low-resolution classification maps back to the original size. The first-level experiments are conducted on the 1 m optical image and the 1 m LWIR image, and the second-level experiments on the 0.2 m optical image.
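The down-sampling step described above can be sketched as follows. This is a minimal illustration on a synthetic tile; the array sizes and the use of `scipy.ndimage.zoom` are our assumptions for illustration, not details taken from the original study.

```python
# Sketch of the level-1 preprocessing step: bilinear down-sampling of the
# 0.2 m optical image to the 1 m grid of the LWIR image (a factor of 1/5).
import numpy as np
from scipy.ndimage import zoom

optical = np.random.rand(500, 500, 3)   # hypothetical 0.2 m RGB tile
factor = 0.2 / 1.0                      # 0.2 m -> 1 m target resolution

# order=1 selects bilinear interpolation; the spectral axis is left untouched.
optical_1m = zoom(optical, (factor, factor, 1), order=1)
print(optical_1m.shape)                 # (100, 100, 3)
```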

The classification framework
Unlike the traditional optical image, which records the sunlight reflectance of the earth's surface, the LWIR image responds to the varying temperature and emissivity of the ground. In our classification framework (Figure 1), in order to investigate the utility of the optical and LWIR images and their combined application in urban mapping, experiments are carried out at the following two levels:
(1) First, bilinear down-sampling is applied to the optical image as a preprocessing procedure. The contributions of the two data sources to the classification are then discussed extensively through four types of classification: (a) spectral-based classification for both the optical and LWIR images; (b) spectral-spatial classification for both images, in which spatial features such as GLCM texture, extended morphological profiles (EMP), extended morphological attribute profiles (EAP), differential morphological profiles (DMP), and 3D discrete wavelet transform (DWT) texture are compared; (c) joint classification of the optical and LWIR images; and (d) multiple feature classification, in which feature stacking and decision fusion are applied to exploit the various spatial features jointly.
(2) Second, the preferable classification maps from the first level are projected onto the high-resolution boundaries generated by multi-resolution segmentation.

Figure 1. The overall classification framework.

Spectral-based classification
Optical and LWIR images are used as two individual spectral data sources to investigate their respective validity for classification; thus, only spectral features are input to the classifier. Applying classification directly to the LWIR image is rarely attempted, which makes its performance particularly interesting. The support vector machine (SVM) classifier implemented in ENVI is adopted for all classifications in our study, with penalty coefficient C = 100, an RBF (radial basis function) kernel, and kernel width σ = 1/n, where n is the dimension of the input features.
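As a sketch of this spectral-based step, the following uses scikit-learn's `SVC` as a stand-in for the ENVI implementation (the data are synthetic, and the stand-in library is our assumption); `gamma='auto'` reproduces the σ = 1/n band-width setting, since it equals 1/n_features.

```python
# Pixel-wise spectral classification with an RBF-kernel SVM, mirroring the
# reported parameter choices (C = 100, RBF kernel, gamma = 1/n).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_bands = 84                                  # e.g. the LWIR band count
X_train = rng.normal(size=(400, n_bands))     # spectra of labelled pixels
y_train = rng.integers(0, 7, size=400)        # 7 urban land-cover classes

# gamma='auto' is exactly 1 / n_features, matching the kernel-width setting.
clf = SVC(C=100, kernel='rbf', gamma='auto').fit(X_train, y_train)
labels = clf.predict(rng.normal(size=(10, n_bands)))
print(labels.shape)                           # (10,)
```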

Spectral-spatial classification
For the spectral-spatial classification, spatial features are exploited as complementary information to the spectral feature space. Improvements in mapping accuracy can be expected when shape, texture, and spatial coherence are integrated with the spectral features. Spectral-spatial classifications are performed separately for the two data sources, each with its corresponding spatial features. Notably, this is, to our knowledge, the first time spectral-spatial classification has been applied to LWIR imagery. For both data sources, the 3D DWT features are calculated from the original spectral bands, while the other four spatial features are built from the first principal component (PC1) of the original spectral bands.
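The PC1 base-image step can be sketched as below; the cube dimensions are illustrative assumptions, and PC1 is computed directly via the SVD of the centered pixel-by-band matrix.

```python
# Sketch of the base-image step: the morphology-based spatial features are
# computed on the first principal component (PC1) of the spectral bands.
import numpy as np

h, w, n_bands = 100, 100, 84
cube = np.random.rand(h, w, n_bands)          # synthetic image cube

X = cube.reshape(-1, n_bands)
X = X - X.mean(axis=0)                        # center the bands
# PC1 is the projection onto the leading right singular vector.
_, _, vt = np.linalg.svd(X, full_matrices=False)
pc1 = (X @ vt[0]).reshape(h, w)
print(pc1.shape)                              # (100, 100)
```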

Opt-LWIR joint classification
The Opt-LWIR joint classification is achieved by stacking the optical image (1 m), the spatial features, and the three most significant principal components (PCs) of the LWIR image into the classifier. The corresponding spatial features of the two data sources are used separately for comparison. This process aims to examine the practicability of the combined optical-LWIR imagery for urban mapping, and to verify the respective effectiveness of the spatial features generated from each data source.
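The joint feature stack can be sketched as follows, with scikit-learn's `PCA` standing in for the PC extraction (shapes and library choice are illustrative assumptions):

```python
# Sketch of the Opt-LWIR stack: per-pixel optical bands plus the three most
# significant principal components of the LWIR cube.
import numpy as np
from sklearn.decomposition import PCA

h, w = 200, 200
optical = np.random.rand(h, w, 3)             # 1 m RGB after down-sampling
lwir = np.random.rand(h, w, 84)               # 84-band LWIR cube

# PCA on the flattened pixel-by-band matrix keeps the 3 leading components.
pcs = PCA(n_components=3).fit_transform(lwir.reshape(-1, 84))
joint = np.concatenate([optical.reshape(-1, 3), pcs], axis=1)
print(joint.shape)                            # (40000, 6)
```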

Multiple feature classification
Since many spatial features are involved in the spectral-spatial and joint classification processes, it is interesting to explore whether their combined application further improves urban mapping accuracy. Stacking the features is the conventional way to integrate them; unfortunately, feature stacking leads to high dimensionality and is not always effective. A promising alternative is to fuse the classification outputs generated with the various spatial features according to decision rules; the rules used in this paper are majority voting, posterior probability, and uncertainty. In our experiments, stacking combines all the spatial features with the optical image and the first three principal components (PCs) of the LWIR image, whereas decision fusion considers all the classification outputs of the spectral-spatial and joint classifications.
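The simplest of the decision rules above, majority voting, can be sketched as follows (a minimal illustration on toy label maps, not the study's implementation):

```python
# Majority-voting decision fusion: each classifier contributes one label map,
# and the fused label at a pixel is the most frequent label across classifiers.
import numpy as np

def majority_vote(label_maps):
    """label_maps: (n_classifiers, n_pixels) integer label array."""
    stack = np.asarray(label_maps)
    n_classes = stack.max() + 1
    # Count votes per class at every pixel, then take the winner.
    votes = np.stack([(stack == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)

maps = [[0, 1, 2, 2], [0, 1, 1, 2], [1, 1, 2, 2]]
print(majority_vote(maps))   # [0 1 2 2]
```

Ties here resolve to the lowest class index via `argmax`; a real fusion scheme could break ties with the posterior-probability or uncertainty rules mentioned above.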

Object-based approach
For practical application, the low-resolution classification maps generated in the first level must be recovered to the original size. Up-sampling the low-resolution maps directly to the original size leads to severely blurred edges. The object-based approach provides a practical remedy. In this study, a segmentation algorithm implemented in the commercial software eCognition® is adopted to acquire boundaries at different scales by dividing the image into a series of non-overlapping objects. Afterward, for the up-sampled classification map, the class label of each object is re-assigned by majority voting over all classification labels within that object.
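The per-object relabeling step can be sketched as below; the flat-array representation of the up-sampled map and segmentation is an assumption made for brevity.

```python
# Object-based refinement sketch: every segment (object) from the
# multi-resolution segmentation is relabelled with the majority class of
# the up-sampled classification pixels it covers.
import numpy as np

def relabel_by_object(upsampled_labels, segments):
    """Both inputs are flat integer arrays of equal length."""
    out = np.empty_like(upsampled_labels)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        values, counts = np.unique(upsampled_labels[mask], return_counts=True)
        out[mask] = values[counts.argmax()]   # majority label in this object
    return out

labels   = np.array([0, 0, 1, 1, 1, 2])
segments = np.array([0, 0, 0, 1, 1, 1])
print(relabel_by_object(labels, segments))   # [0 0 0 1 1 1]
```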

GLCM
GLCM-based textural feature extraction has proved to be among the most effective statistical texture measures for land cover classification (Pacifici, Chini, and Emery 2009). In our study, two commonly used statistical measures are computed from the co-occurrence matrix, where (i, j) are the coordinates in the co-occurrence matrix space, p(i, j) is the co-occurrence matrix value at (i, j), and N is the dimension of the co-occurrence matrix. To obtain multi-scale textural features, four window sizes are used in our experiments: 3 × 3, 5 × 5, 7 × 7, and 9 × 9. Afterward, to suppress the directionality of the GLCM, the extracted textural features of each window size are averaged over four directions.

EMP
The opening and closing operators by reconstruction, defined on dilation and erosion by reconstruction, were first proposed for the analysis of panchromatic images by Pesaresi and Benediktsson (2001). They have proven to be effective tools for extracting spatial features from images and are widely applied in remote sensing image analysis (Benediktsson, Palmason, and Sveinsson 2005).
Let γ_SE(I) and φ_SE(I) be the morphological opening and closing with a structuring element (SE) for an image I, respectively. MPs are defined by a series of SEs with increasing sizes:

MP(I) = {φ_λn(I), ..., φ_λ1(I), I, γ_λ1(I), ..., γ_λn(I)},

where λ is the radius of a disk-shaped SE. EMPs have been proposed for morphological feature extraction from hyperspectral imagery and can be written as (Liao et al. 2012):

EMP(f) = {MP(f(1)), MP(f(2)), ..., MP(f(n))},

where f comprises a set of n base images, with f(1) the first and f(n) the nth base image of f. Similarly, to obtain multi-scale features, the EMPs in this study are calculated with disk-shaped SEs of radius SE = [1, 3, 5, 7], i.e. four openings and four closings.

DMP
DMPs (Pesaresi and Benediktsson 2001), another set of opening- and closing-based morphological features, record the differences of the morphological profile values at adjacent scales. They have proved to be a state-of-the-art spatial feature extraction approach and are widely applied in urban mapping (Huang and Zhang 2009) and automatic information extraction (Jin and Davis 2005). They can be expressed within the MP framework defined above as

DMP_λ(I) = |MP_λ(I) − MP_(λ−1)(I)|.

The signal recorded in the DMPs gives information about the size and the type of the structures in the image. As with EMPs, DMPs can be built on the base images of a hyperspectral image to avoid a high-dimensional feature space. In our experiments, the SE setting for the DMP is the same as that of the EMP.

EAP
The third morphological profile involved in this paper is the attribute profile (AP). APs provide an adaptive analysis by applying a series of attribute thickening and thinning operators to the connected components of an image according to various criteria (Dalla Mura et al. 2010). They have proved to be effective spatial analysis tools for enhancing the structures present in an image and are widely applied in urban mapping (Dalla Mura et al. 2011; Huang et al. 2014). APs can be expressed within the MP framework defined above by replacing the opening and closing operators with a series of morphological attribute operators. Denoting by γ^T(I) and φ^T(I) the attribute thinning and thickening operators for a criterion T_λ, the APs can be written as

AP(I) = {φ^T_λn(I), ..., φ^T_λ1(I), I, γ^T_λ1(I), ..., γ^T_λn(I)}.

The criteria considered in this paper are the area of the regions and the standard deviation. Similarly, for multi/hyperspectral imagery, the EAPs can be represented by

EAP(f) = {AP(f(1)), AP(f(2)), ..., AP(f(n))}.

In this study, the parameters of the morphological attribute profiles were defined according to Dalla Mura et al. (2010): (1) the area of the regions, λ_a = [100, 500, 1000, 5000]; (2) the standard deviation of the gray-level values of the pixels in the regions, λ_s = [20, 30, 40, 50].

3D DWT
Wavelet-transform-based feature extraction first performs an over-complete wavelet decomposition of a square local area around each pixel. Different statistical measures of each sub-image are then calculated and assigned to the components of the feature vector of the central pixel in the area (Fukuda and Hirosawa 1999). Owing to its ability to examine a signal at different resolutions and desired scales, it has been found to be a promising tool for image analysis in both the spatial and frequency domains (Khare et al. 2013). In our research, a recently proposed object-based 3D discrete wavelet transform texture (Guo, Huang, and Zhang 2014) was adopted, which considers the local image (object) patch as a cube and decomposes it into a set of spectral-spatial components. The 3D DWT feature is subsequently obtained by measuring the energy of the wavelet coefficients, providing a representation of the image information in both the spectral and spatial domains. Accordingly, for multi/hyperspectral imagery, the 3D DWT is constructed by a tensor product and can be written as

W = (L^x ⊕ H^x) ⊗ (L^y ⊕ H^y) ⊗ (L^z ⊕ H^z),

where ⊕ and ⊗ are the space direct sum and tensor product, respectively; L and H denote the low-pass and high-pass filters; the superscripts x and y denote the spatial coordinates of an image; and z is the spectral axis. Afterward, the energy statistic is used to characterize the texture property:

E = (1 / (B × B × N)) Σ_{(i,j,k)∈W} P(i, j, k)²,

where W is a B × B × N local cube; B and N are the dimensions in the spatial and spectral domains, respectively; and P(i, j, k) is the wavelet coefficient in the cube centered at pixel (i, j, k). In our experiments, the so-called "local images" are the non-overlapping objects generated by multi-resolution segmentation in the eCognition® software, with scale parameter 100, shape 0.1, and compactness 0.5.

Data-set
Experiments are carried out on the grss_dfc_2014 data-sets provided for the 2014 Data Fusion Contest by Telops Inc. (Canada). The grss_dfc_2014 data consist of two data-sets acquired at different spectral ranges and spatial resolutions from the same airborne platform: a coarse-resolution LWIR hyperspectral data-set and a fine-resolution visible data-set, covering an urban area near Thetford Mines in Québec, Canada. The thermal hyperspectral image, acquired with an airborne LWIR hyperspectral imager (Hyper-Cam), contains 84 spectral bands in the 7.8-11.5 μm range at a spatial resolution of approximately 1 m. The visible image, collected by a digital color camera, contains uncalibrated RGB data at a spatial resolution of 0.2 m. The two data-sets were collected simultaneously on 21 May 2013, between 22:27:36 and 23:46:01 UTC. A training map was provided along with the data-sets, with a spatial resolution equal to that of the airborne color data-set. The numbers of training and testing samples in our experiments are listed in Table 1; the training samples (200 for each class) used in the classification are generated randomly from the training set.

Spectral-based classification
The SVM classification results for the spectral-based classification are listed in Table 2. The global accuracies indicate that the optical image can roughly discriminate the main classes of the image, while the LWIR image has difficulty recognizing them. Regarding the class-specific accuracies of the optical image, some classes, such as road and vegetation, have rather low accuracies due to their spectral similarity with other classes in high-resolution images. For the LWIR image, the classification of road achieves a markedly good accuracy of 90.9%, which can be attributed to the unique thermal radiation characteristics of the road class in the LWIR image.

Spectral-spatial classification
The SVM classification results for the spectral-spatial classification are listed in Table 3, and the improvements in AA over the spectral-based classification are shown in Figure 2. As shown in Figure 2, the integration of spatial features allows significant improvements for both the optical and LWIR images. For the optical image, the textural features (GLCM and 3D DWT) outperformed the other spatial features according to the average accuracies. For the LWIR image, all the spatial features showed comparable capabilities for improving the classification.
A closer analysis of the class-specific accuracies reveals excellent class accuracies. In particular, in Table 3, class road reaches an accuracy of 96.5% using EAP, trees 92.7% using EMP, red roof 86.5% and gray roof 93.6% using GLCM, concrete roof 87.1% using EMP, and bare soil 89.0% using EAP. Accordingly, it can be concluded that for each information class there are specific spatial features that are helpful for its identification.

Figure 2. Percentage of improvement in AA obtained by the spectral-spatial classification methods, compared to the raw spectral-based method.

Opt-LWIR joint classification
The SVM classification results for the joint classification are shown in Tables 4 and 5, with the corresponding spatial features calculated from the first PC of the optical and the LWIR image, respectively. "Spectral" denotes the spectral-only joint classification of the two data sources, while "GLCM", "EAP", "EMP", "DMP", and "3D DWT" denote the corresponding spatial features involved in the joint classification. Table 4 shows that integrating the LWIR information leads to better global results than the optical spectral and spectral-spatial classifications, and the improvements are more obvious when the spatial features are generated from the first PC of the optical image. Regarding the class-specific accuracies, trees, red roof, concrete roof, and vegetation achieve their highest accuracies so far. Table 5 shows that when the spatial features are calculated from the LWIR information, the overall performance of the joint classification is not as satisfactory as the former. An exception is the DMP features calculated from the LWIR information, which improve on the optical spectral classification; surprisingly, in this case, concrete roof achieves its highest accuracy. To sum up, spatial features based on the optical information are more effective than those based on the LWIR information for discriminating the majority of classes.

Multiple feature classification
The classification accuracies for the multiple feature classification are listed in Table 6. The global accuracies show that decision fusion is superior to feature stacking for implementing multiple feature classification. It is also notable that multiple feature classification yields an average increment of 2% in AA compared with the best results of the spectral-spatial classification. Interestingly, the classification of stacked features gives trees and vegetation excellent recognition accuracies; vegetation in particular attains the best result among all the classifications in our study.

Object-based classification
Accuracies for the object-based classification are listed in Table 7, and the preferable classification maps are displayed in Figure 3. As expected, the effectiveness of our classification framework is demonstrated by far more accurate class recognition. The GLCM textures prove to be the most helpful spatial features for the classification of this data-set, with an AA closely matching that of the multiple feature classification. However, the good accuracies of trees and vegetation generated by feature stacking are weakened by the object-based classification. Thus, the object-based approach is better adapted to classes with regular shapes, such as road and roofs.

Conclusions
In this paper, we addressed the challenge of using VHR data in urban mapping and proposed a combined-imagery framework based on optical and LWIR images, in which spectral, spatial, and multi-sensor information are taken into account simultaneously. Experiments are conducted with a two-level classification framework using the 0.2 m spatial-resolution optical image and the 1 m resolution LWIR image.
Level 1: Classifications at low resolution. The optical image is first down-sampled to fit the size of the LWIR image. To examine the contributions of the optical and LWIR images, four types of classification are applied to the two data sources, and the performance of the framework is assessed in terms of both global and class-specific accuracies. The main observations from this level can be summarized as follows: (1) Spectral-based classification. Neither data source performs well in terms of global accuracy, but the LWIR image shows potential for better identification of the road class. (2) Spectral-spatial classification. Spatial features such as GLCM texture, EAP, EMP, DMP, and 3D DWT texture are compared for the optical and LWIR images, respectively. Both data sources achieve better classification accuracies when spatial features are considered; in particular, textural features are more effective than the MPs for improving the optical accuracy. Furthermore, for each class there exist specific spatial features best suited to its recognition. (3) Joint classification. To overcome the drawbacks of a single data source, the optical and LWIR images are classified together with their respective spatial features. Joint classification greatly improves the optical classification (some classes reach their best accuracies), and, in terms of global accuracy, spatial features calculated from the optical information prove more effective than those from the LWIR information; the LWIR-based spatial features, however, are suitable for identifying some specific classes, such as concrete roof in this data-set. (4) Multiple feature classification. To exploit the various spatial features jointly, multiple feature classification is implemented, and the results show that considerably higher accuracies are obtained. Decision fusion is superior to feature stacking in terms of global accuracy, but stacking all the spatial features yields excellent accuracies for some specific classes, such as trees and vegetation in our experiments.
Level 2: Projecting the classification maps to the high resolution. To recover the original size of the classification map, the object-based approach is implemented to retain edge information, with the multi-resolution segmentation map serving as the boundary reference. Experiments show good results for the object-based up-sampling method in terms of global accuracy; however, the accuracies also indicate that classes with regular shapes are better adapted to the object-based approach.
The proposed urban mapping framework demonstrates that LWIR data provide effective complementary information for land cover tasks. Thermal remote sensing is potentially a powerful tool for examining land cover and is particularly suited to recognizing specific classes. This is essential for global-change research, where current land cover must be monitored accurately in order to better determine potential future change. Thermal data are thus an increasingly important component of remote sensing research, and their utility still deserves extensive further analysis.