A decision-based multi-sensor classification system using thermal hyperspectral and visible data in urban area

ABSTRACT Multi-sensor data fusion has become increasingly popular for classification applications. Fusing multisource remote-sensing data provides more information about the same observed site and therefore a better comprehension of the scene. In this field, the combination of very high-resolution data collected by a digital color camera with new coarse-resolution hyperspectral data in the long-wave infrared range for urban land-cover classification has attracted considerable attention and has become a research hot spot in the image analysis and data fusion research community. In this paper, a decision-based multi-sensor classification system is proposed to fully exploit the advantages of both sensors and attain enhanced land-cover classification results. In this context, spectral, textural and spatial features are extracted for the proposed multilevel classification. Then, a land-cover separability preprocessing is employed to identify how the proposed method can best utilize the sensor advantages. Next, a support vector machine is applied to classify road classes by using thermal hyperspectral image data; plants, roofs and bare soils are classified by the joint use of both sensors via Dempster–Shafer classifier fusion. Finally, an object-based post-processing is employed to improve the classification results. Experiments carried out on the dataset of the 2014 IEEE GRSS data fusion contest demonstrate the potential of the joint utilization of the sensors: the proposed methodology refines the classification outcomes when evaluated against single-sensor data. Moreover, the obtained classification accuracy is competitive with the results reported for the 2014 IEEE GRSS data fusion contest.

With recent technological advances in remote-sensing systems, the fusion of very high-resolution data collected by a digital color camera with new coarse-resolution hyperspectral data in the long-wave infrared (LWIR) range for urban land-cover classification has attracted considerable attention and has become a research hot spot in the image analysis and data fusion research community (Liao et al., 2015; Lu et al., 2015; Eslami & Mohammadzadeh, 2015). In this context, thermal infrared hyperspectral (TIR HS) data are extremely challenging remotely sensed data with considerable potential for target recognition and material classification irrespective of illumination conditions (Liao et al., 2015); these data are promising in many circumstances, e.g. for remote-sensing and non-destructive technologies that provide an exhaustive discrimination of similar ground entities (Lu et al., 2015; Wang, Wu, Nerry, Li, & Li, 2011). Nevertheless, low energy, low SNR, high inter-band correlation, spectral variation and ambiguous object boundaries are the most challenging problems, and they can seriously affect the classification efficiency (S. Li et al., 2011; Rodríguez-Galiano, Ghimire, Pardo-Igúzquiza, Chica-Olmo, & Congalton, 2012; Miliaresis, 2014). Visible image data, on the other hand, provide detailed spatial features and clarity, but their weakly distinctive spectral features make it impossible to distinguish spectrally homogeneous objects. The physical background of the LWIR HS data is the basic spectral absorption features of silicate minerals, which are the main constituents of the terrestrial surface and of man-made construction objects. The silicon-oxygen (Si-O) bonds of the silicate minerals do not exhibit spectral features in the visible-to-shortwave infrared region of the spectrum, whereas the stretching vibrations of the Si-O bonds produce strong features at LWIR wavelengths.
Man-made objects additionally emit infrared radiation that is polarized to a greater extent than that of naturally occurring background materials (e.g. trees, soil and vegetation), because their surfaces are relatively smooth compared to most natural surfaces. In this context, the emissivity alone is a sufficient parameter if the surface irregularities are large compared to the wavelength of the emitted radiation. However, if the surface irregularities are small compared to the emission wavelength, the surface may be more specular and an observable induced polarization occurs in the emitted thermal radiation. These basic principles can be employed for the development of spectral-based urban classification/un-mixing of man-made objects by LWIR HS data (Liao et al., 2015).
In this paper, a decision-based multi-sensor classification system is proposed to completely use the advantages of both sensors to attain enhanced land-cover classification results. In this context, spectral, textural and spatial (STS) features are extracted for the proposed multilevel classification. Then, a land-cover separability preprocessing is employed to identify how the proposed method can fully utilize the sensor advantages. Next, a support vector machine (SVM) is applied to classify road classes by using TIR HS image data; plants, roofs and bare soils are classified by the joint use of sensors via Dempster-Shafer (D-S) classifier fusion. Finally, an object-based post-processing (OBPP) is employed to improve the classification results.
The rest of the paper is organized as follows: After a literature review, the concept of the proposed decision-based multi-sensor fusion system is presented, followed by experiments, obtained results, discussion and conclusion.

Literature review
The Image Analysis and Data Fusion Technical Committee of the Geoscience and Remote Sensing Society (GRSS), an international network of scientists active in multi-temporal, multisource, multi-resolution and multimodal remote-sensing image analysis, released two airborne datasets collected at various spectral and spatial modalities within a short temporal interval. The aim was to address two open problems of the image analysis and data fusion research community, namely handling multiple-source and multiple-resolution data, in two parallel tracks of equal standing. The classification contest concentrated on classification performance at the finest spatial resolution with recent kinds of sensors, while the paper contest called for new ideas on multi-resolution data processing and on the analysis of the new TIR HS imagery. In this context, the classification contest's winning manuscript focused on maximizing the land-cover mapping accuracy for the particular dataset. In that study, both datasets were resampled to 0.5-m spatial resolution. A principal component analysis (PCA) was employed on the TIR HS image data to reduce redundancy and computation time for image classification. Then, textural features, a vegetation index and a morphological building index were extracted to successively identify the required classes using a binary SVM (Huang & Zhang, 2012b). Finally, the obtained pixel-based land-cover classification map was refined by majority voting (MV), adaptive mean shift segmentation and multiple semantic rules. The paper contest's winning manuscript, on the other hand, focused on a novel development for mutually benefiting from both datasets. In this respect, morphological features were extracted from the visible imagery; the visible image was also used as part of a guided filtering scheme to increase the spatial separability of the LWIR image in the PCA domain.
Then, the extracted features and enhanced LWIR data were integrated by using a graph-based method. Finally, the feature combination was used to generate the final land-cover classification map with an SVM classifier. As the last point, the classification contest's winner modified the primary land-cover classification map using impressive production-like points and obtained maximal classification accuracy, while the paper contest winner's focus turned into the novel development with less emphasis on the land-cover classification map's geometrical precision (Liao et al., 2015).
After the contest, the data remain publicly available for further experimental analysis. Lu et al. (2015) proposed a synergetic decision-based classification method to estimate a thematic classification map for the mentioned datasets. First, a set of preprocessing steps was carried out on both datasets. Next, a semi-supervised local discriminant analysis was applied to identify discriminative descriptors for an SVM classifier. A combination of texture and spectral features was used for visible image classification. Finally, the outcomes of both SVM classifiers were fused to estimate the thematic classification map. The obtained results of the study confirmed an improvement of the proposed synergetic decision-based classification method over the standard classifiers and over any single sensor's classification outcome. Eslami and Mohammadzadeh (2015) proposed an integration method to classify urban objects for the 2014 IEEE GRSS data fusion contest datasets. First, the atmospheric effects of the TIR HS image were removed by the in-scene atmospheric compensation described in Winter (2004). Then, the sequential parametric projection pursuit dimension reduction (DR) operator was used to obtain multispectral TIR image data at 20-cm spatial resolution. An SVM classifier was employed to classify the integrated visible and multispectral TIR images. Finally, an object rule-based post-processing was applied to generate the final classification map. The obtained results of this study showed the advantages of the proposed method over the standard classifiers and over any single sensor's classification outcome. Li et al. (2015) proposed another fusion approach for the mentioned datasets to attain an enhanced urban land-cover classification map. In this study, the proposed method was composed of data preprocessing, road extraction and classification of the remaining classes. In the preprocessing step, the TIR HS data were denoised by using a low-rank matrix recovery.
In parallel, visible data gaps were predicted by utilizing the mapping relationship between the visible and LWIR HS data in a supervised fashion. In the road extraction step, a linear SVM classifier was employed to classify road pixels by the use of TIR HS imagery. A mean shift algorithm was applied to segment the visible image; further, an MV approach was utilized to achieve object-oriented fusion results guided by the clusters of the segmentation map. Finally, a morphological dilation operation was conducted to refine the road pixels extracted from the TIR HS image. In the remaining class classification step, the fine spatial resolution visible data were classified using a linear SVM classifier. After the classification, the segmentation map was utilized in an object-based decision fusion step to conduct the post-classification process. The obtained results of this study demonstrated the advantages of the proposed method.
In the above-described papers, different ways were proposed to combine the 2014 IEEE GRSS data fusion contest datasets; they offered notable innovations in terms of methodological novelty or urban land-cover mapping applications. The significant challenge was the design of a combined classification architecture that is a trade-off between accuracy enhancement and the reliability of the land-cover classification solution, complexity reduction and processing efficiency optimization. In addition, the architecture was strongly constrained by the demand to integrate data from multiple sources and at multiple resolutions. The interpretation of this new dataset combination remains quite challenging and is therefore still a focus of research activities. The most important challenges that should be considered in the integration architecture are as follows (J. Li et al., 2015): (a) the low energy and low SNR of the TIR HS image can hinder the extraction of distinctive descriptors; (b) the excessive inter-band correlation of the TIR HS image indicates considerable descriptor redundancy and a very time-consuming image classification process (S. Li et al., 2011); (c) identical land-cover objects collected by LWIR HS at different times can yield different descriptors (Miliaresis, 2014); (d) the ambiguous object boundaries of TIR HS data can severely affect the classification accuracy at the highest spatial resolution (Rodríguez-Galiano et al., 2012) and (e) visible data interpretation can also be difficult due to severe interclass variations (Liao et al., 2015).
In order to tackle the abovementioned challenges, a decision-based multi-sensor classification system is proposed to completely use the advantages of both sensors to attain enhanced urban land-cover classification results.

Proposed method
The proposed methodology is a combined architecture that can be regarded as a trade-off between accuracy enhancement and the reliability of the land-cover classification solution, complexity reduction and processing efficiency optimization (Figure 1); it comprises preprocessing, feature extraction, multilevel SVM classification (MLSC) and OBPP, as described in the following sections.

Preprocessing
The first step of the multilevel classification process is to analyze the land-cover spectral separability of the training dataset, as it can be used as a foundation to enhance the classification accuracy. In this context, the Jeffries-Matusita distance (JMD) and the transformed divergence (TD) indices, the most widely used discriminability evaluation indices, are estimated to identify how the proposed method can fully utilize the advantages of the 2014 IEEE GRSS data fusion contest datasets (Table 1). From Table 1, it can first be observed that the road pixels can be easily classified by utilizing the TIR HS imagery due to the strong separability of the road class from the other land-cover classes. Second, the plant (tree/vegetation), roof (red/gray/concrete roofs) and bare soil classes are discriminated by the fusion of both visible and TIR HS datasets. For the TIR HS image data, the internal plant classes (tree and vegetation) show weak separability, and a similar observation can be made for the internal roof classes (red, gray and concrete roofs), as marked in Table 1; therefore, tree and vegetation pixels are separated within the plant pixels by utilizing the visible image data, and the same operation is employed to classify red, gray and concrete roofs within the roof pixels. In summary, the proposed multilevel classification framework contains the following operations: (a) classification of road pixels by utilizing the TIR HS imagery; (b) discrimination of plant, roof and bare soil pixels by the fusion of both visible and TIR HS imagery and (c) separation of the internal plant and roof classes by utilizing the visible imagery. Table 2 illustrates the procedure of utilizing the 2014 IEEE GRSS data fusion contest datasets to obtain maximum classification accuracy based on the spectral separability analysis.
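The separability analysis above can be illustrated with a minimal sketch. It assumes Gaussian class models and the common 0 to 2 scaling of both indices; the function names, the synthetic three-feature samples and the class labels are hypothetical and are not taken from the paper, which computed the indices with dedicated software.

```python
import numpy as np

def class_stats(samples):
    """Mean vector and covariance matrix of an (n_samples, n_features) array."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=0), np.cov(samples, rowvar=False)

def jeffries_matusita(m1, c1, m2, c2):
    """JM distance in [0, 2] between two Gaussian class models."""
    d = m1 - m2
    c = 0.5 * (c1 + c2)
    # Bhattacharyya distance; slogdet avoids overflow in the determinants.
    _, ld = np.linalg.slogdet(c)
    _, ld1 = np.linalg.slogdet(c1)
    _, ld2 = np.linalg.slogdet(c2)
    b = 0.125 * d @ np.linalg.solve(c, d) + 0.5 * (ld - 0.5 * (ld1 + ld2))
    return 2.0 * (1.0 - np.exp(-b))

def transformed_divergence(m1, c1, m2, c2):
    """Transformed divergence in [0, 2] between two Gaussian class models."""
    d = (m1 - m2)[:, None]
    i1, i2 = np.linalg.inv(c1), np.linalg.inv(c2)
    div = 0.5 * np.trace((c1 - c2) @ (i2 - i1)) \
        + 0.5 * np.trace((i1 + i2) @ (d @ d.T))
    return 2.0 * (1.0 - np.exp(-div / 8.0))

# Synthetic stand-ins for the training samples of two classes (assumption).
rng = np.random.default_rng(0)
road = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(200, 3))
roof = rng.normal(loc=[6.0, 6.0, 6.0], scale=1.0, size=(200, 3))

road_stats, roof_stats = class_stats(road), class_stats(roof)
jm_far = jeffries_matusita(*road_stats, *roof_stats)
td_far = transformed_divergence(*road_stats, *roof_stats)
jm_same = jeffries_matusita(*road_stats, *road_stats)
td_same = transformed_divergence(*road_stats, *road_stats)
print(jm_far, td_far, jm_same, td_same)
```

Values close to 2 indicate class pairs that a single sensor can separate, while values close to 0 flag pairs (such as the internal plant or roof classes in the thermal data) that need additional information.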

Feature extraction
The next phase of the classification procedure for the multi-sensor data consists of extracting appropriate descriptors. The extracted features should comprise distinct descriptors to separate several objects (Table 3). Hyperspectral remote-sensing images consist of extremely narrow spectral bands, which results in high inter-band correlation and time-consuming image analysis operations; an excessive number of descriptors can lead to the curse of dimensionality, also called the Hughes phenomenon, when standard classifiers are used (Bigdeli et al., 2013; S. Li et al., 2011). In this context, DR is used to transform the data volume into a reduced-dimensionality form with distinct descriptor information to overcome this phenomenon (Hasanlou, Samadzadegan, & Homayouni, 2015). In this regard, PCA, the most widely known linear technique for data volume reduction, is applied to reduce the high dimensionality of the TIR HS imagery, and the first five principal components (PCs) are extracted as spectral features of the TIR HS data. Furthermore, the gray value of the image and its spatial distribution in a local window can be used as spectral and textural descriptors of the visible data (Haralick, Shanmugam, & Dinstein, 1973). After extraction of the above features, a multilevel classification framework is applied by considering the above-described land-cover spectral separability analysis.
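As an illustration of the dimension reduction step, the following sketch projects a hyperspectral cube onto its first five principal components via an eigendecomposition of the band covariance matrix. The function name and the synthetic 84-band cube are assumptions made for the example; the paper does not specify its PCA implementation.

```python
import numpy as np

def first_principal_components(cube, n_components=5):
    """Project a (rows, cols, bands) cube onto its leading principal components."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(float)
    pixels -= pixels.mean(axis=0)                 # center each band
    cov = np.cov(pixels, rowvar=False)            # (bands, bands) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scores = pixels @ eigvecs[:, order]
    explained = eigvals[order] / eigvals.sum()    # variance ratio per component
    return scores.reshape(rows, cols, n_components), explained

# Synthetic stand-in for a highly correlated 84-band thermal cube (assumption):
# three latent signals mixed into 84 bands plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 30, 3))
mixing = rng.normal(size=(3, 84))
cube = latent @ mixing + 0.01 * rng.normal(size=(20, 30, 84))

pcs, explained = first_principal_components(cube)
print(pcs.shape, explained.round(4))
```

Because the synthetic bands are strongly correlated, almost all of the variance concentrates in the first few components, which mirrors the motivation for keeping only five PCs of the thermal data.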

Multi-level SVM classification
In the proposed multilevel classification strategy, a progressive, multi-step classification model is applied to the extracted features as a trade-off between accuracy enhancement and land-cover classification reliability, complexity reduction and processing efficiency optimization; the steps of this progressive process are described in the following.
In the first step, the road pixels are classified by using the first five PCs, owing to the strong separability of the road class from the other land-cover classes as described in the preprocessing step. Among the various supervised classifiers, the SVM is a promising and well-documented methodology because of its simple use, high efficiency and ability to handle different problems (Abe, 2010; Lu et al., 2015). An SVM maximizes the margin between the predefined classes by estimating an optimal separating hyperplane. Kernel functions map nonlinearly separable data into a higher dimensional space in which a linear decision function is applied. In this paper, the best-known SVM kernel function, the radial basis function, is used to estimate the inner products among any sample pairs in the feature space. Since the SVM parameters, namely the regularization parameter (C), which defines a trade-off between the training error and the minimization of model complexity, and the kernel bandwidth parameter (γ), strongly affect the classification efficiency, a grid search is employed to automatically estimate the optimum regularization and kernel parameters (Chang & Lin, 2011).
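A grid search over C and γ for an RBF-kernel SVM can be sketched as follows. This is only an illustration using scikit-learn (whose `SVC` is built on LIBSVM); the synthetic five-feature samples, the candidate parameter values and the five-fold cross-validation are assumptions, not the settings used in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for five PC features of road / non-road pixels (assumption).
rng = np.random.default_rng(0)
road = rng.normal(loc=2.0, scale=0.5, size=(60, 5))
other = rng.normal(loc=-2.0, scale=0.5, size=(60, 5))
X = np.vstack([road, other])
y = np.array([1] * 60 + [0] * 60)   # 1 = road, 0 = other land covers

# Hypothetical candidate values; real grids are usually logarithmic and wider.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}

# Every (C, gamma) pair is scored by cross-validated accuracy.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
road_mask = search.predict(X) == 1   # the refitted best model labels the pixels
```

The same pattern applies to the later steps that reuse grid search, with the feature matrix and labels swapped for the plant or roof subsets.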
In the next step, plants, roofs and bare soils are discriminated by fusing the information of both data sources. In this paper, a multi-SVM system is applied to classify the features of each data source, and a multiclassifier system is then used at the decision level to integrate the SVM classification results. Decision-level fusion is commonly defined as the procedure of combining single-source results into an ensemble classifier that performs better than any of the single classifiers that form the ensemble (Kuncheva, 2004). In this context, the proposed multiple SVM system uses one SVM classifier for the features of each data source, tuned to the information of that source, whereas standard methods employ a single SVM classifier on the stacked features of all data sources, which cannot be tuned to each source individually. Among the various decision-level fusion techniques, D-S theory can represent and combine uncertain data, as it integrates objective evidence for a hypothesis with the framework's expectation of the significance of that evidence to the hypothesis (Lu et al., 2015). The following steps are carried out to combine L individual classifiers with the D-S algorithm (Kuncheva, 2004; Rogova, 1994):
• The "proximity" Φ is calculated between DT_j^i, the ith row of the decision template DT_j of class j, and the classifier's output D_i(x) for the input x as
$$\Phi_{j,i}(x) = \frac{\left(1 + \lVert DT_j^i - D_i(x) \rVert^2\right)^{-1}}{\sum_{k=1}^{c} \left(1 + \lVert DT_k^i - D_i(x) \rVert^2\right)^{-1}},$$
where D_i(x) is the ith row of the decision profile DP(x).
• The following belief degrees are determined for each class, j = 1, ..., c, and for each classifier, i = 1, ..., L, as
$$b_j(D_i(x)) = \frac{\Phi_{j,i}(x) \prod_{k \neq j} \left(1 - \Phi_{k,i}(x)\right)}{1 - \Phi_{j,i}(x) \left[1 - \prod_{k \neq j} \left(1 - \Phi_{k,i}(x)\right)\right]}.$$
• The final degree of support is estimated as
$$\mu_j(x) = K \prod_{i=1}^{L} b_j(D_i(x)), \quad j = 1, \ldots, c,$$
where K is a normalizing constant.
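The three D-S steps listed above can be sketched in a few lines. The sketch follows the decision-template formulation of the D-S combiner in Kuncheva (2004); the function names, the two-classifier/three-class toy profiles and the way the templates are built are assumptions made for illustration only.

```python
import numpy as np

def decision_templates(profiles, labels, n_classes):
    """DT_j: mean decision profile (L x c) over the training samples of class j."""
    return np.stack([profiles[labels == j].mean(axis=0) for j in range(n_classes)])

def ds_combine(dp, templates):
    """Combine one decision profile dp (L x c) into c normalized degrees of support."""
    n_classes = templates.shape[0]
    # Step 1: proximity between row i of each template and classifier output D_i(x).
    dist2 = ((templates - dp[None, :, :]) ** 2).sum(axis=2)      # shape (c, L)
    inv = 1.0 / (1.0 + dist2)
    phi = inv / inv.sum(axis=0, keepdims=True)                   # shape (c, L)
    # Step 2: belief degree for each class j and each classifier i.
    belief = np.empty_like(phi)
    for j in range(n_classes):
        others = np.prod(1.0 - np.delete(phi, j, axis=0), axis=0)
        belief[j] = phi[j] * others / (1.0 - phi[j] * (1.0 - others))
    # Step 3: product over classifiers; K is chosen so the supports sum to one.
    support = belief.prod(axis=1)
    return support / support.sum()

# Toy setup (assumption): L = 2 classifiers, c = 3 classes, soft one-hot outputs.
base = 0.1 * np.ones((3, 3)) + 0.7 * np.eye(3)
labels = np.array([0, 0, 1, 1, 2, 2])
train_profiles = np.stack([np.tile(base[j], (2, 1)) for j in labels])
templates = decision_templates(train_profiles, labels, n_classes=3)

dp = np.array([[0.7, 0.2, 0.1],     # output of classifier 1 for a new pixel
               [0.6, 0.3, 0.1]])    # output of classifier 2 for the same pixel
support = ds_combine(dp, templates)
print(support, support.argmax())
```

In the setting of this paper, the two rows of the decision profile would hold the class probabilities (or scores) of the visible-data SVM and the thermal-data SVM for the same pixel.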
In the final step, land-cover classes are separated by using the visible data source features as described in the preprocessing step. In this context, an SVM classifier is applied to classify tree versus vegetation from the extracted plant pixels. The same procedure is used to classify red, gray and concrete roofs from the extracted roof pixels. Meanwhile, grid search is employed before the mentioned procedure to automatically estimate the optimum regularization and kernel SVM parameters.

Object-based post-processing
Standard pixel-based classification techniques can produce several outlier pixels (wrongly classified pixels) and a lack of spatial coherency caused by excessive heterogeneity. Image segmentation is used to eliminate the salt-and-pepper noise commonly created by "the same object but different spectrum" or "the same spectrum but different objects". It makes use of the mentioned STS descriptors to split an image into spatially contiguous, disjoint and non-overlapping homogeneous zones (Lu et al., 2015). In this paper, a multi-resolution segmentation technique is applied to segment the data into regions. The multi-resolution segmentation algorithm starts with single pixels as image objects and repeatedly merges pairs of image objects into larger ones. The merging decision is based on a local homogeneity criterion that defines the similarity between adjacent image objects (Baatz & Schäpe, 2000). After segmentation, MV is employed on each of the segmented regions to decide the final output labels (Kuncheva, 2004).
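The majority voting step can be sketched as follows, assuming that a segment map is already available (here a hand-made two-segment map stands in for the output of the multi-resolution segmentation). The function name and the toy arrays are hypothetical.

```python
import numpy as np

def majority_vote(label_map, segment_map):
    """Relabel every pixel with the most frequent class inside its segment."""
    refined = np.empty_like(label_map)
    for seg_id in np.unique(segment_map):
        mask = segment_map == seg_id
        counts = np.bincount(label_map[mask])   # votes per class in this segment
        refined[mask] = counts.argmax()         # ties go to the lowest class id
    return refined

# Toy pixel-based result with two salt-and-pepper errors (assumption).
labels = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1],
                   [0, 0, 1, 0],
                   [0, 0, 1, 1]])
# Toy segmentation: left half is segment 0, right half is segment 1.
segments = np.array([[0, 0, 1, 1]] * 4)

refined = majority_vote(labels, segments)
print(refined)
```

Each isolated error is overruled by the dominant class of its segment, which is the mechanism that removes salt-and-pepper noise from the pixel-based map.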
In the final step, the spatial relationship between classes is investigated and the following semantic rules are employed to refine the land-cover classification results (Table 4).

Experiments and results
The proposed decision-based multi-sensor fusion system is evaluated using the released 2014 IEEE GRSS data fusion contest datasets, comprising two airborne datasets collected at two spectral and spatial modalities within a short temporal interval: (a) TIR HS imagery with approximately 1-m resolution; (b) visible data with approximately 0.1-m resolution that were spatially down-sampled to 0.2-m resolution to optimize the resolution ratio (Figure 2). The contest datasets were acquired by two different fixed-wing aircraft at approximately 800-m flight height on 21 May 2013 by Telops Inc., Québec, Canada. Furthermore, the flights were performed over an urban area a short distance away from Thetford Mines in Québec, Canada, containing commercial and residential structures, gardens and roads.
Table 4. Semantic rules employed to refine the land-cover classification results.
• If length/width ≤ Tr and relative border of Roof class = 1, then merge it as Roof class.
• For Roof object, if length/width ≥ Tr and existence of Road class = 1, then merge it as Road class.
• For Roof object, if size ≤ Tr and size ≥ Tr, then merge it with the neighboring object that shares the most of its boundary.

The TIR HS image was collected by utilizing the recent airborne LWIR HS imager "Hyper-Cam", which is a Fourier transform spectrometer (FTS) comprising 84 narrow spectral bands in the 7.8-11.5-µm wavelength range. The visible data were composed of un-calibrated digital data at the highest spatial resolution with sparse ground coverage over the same area as the LWIR HS imagery. The visible data were georeferenced to be aligned with the LWIR HS image's coordinate system (Figure 2). The provided multi-sensor dataset presents some challenging problems which can seriously affect the classification efficiency, including (a) low energy, low SNR, high inter-band correlation, spectral variation and ambiguous object boundaries in TIR HS imagery; (b) spatial gaps in visible data.
• As shown in Figure 2(c), the training samples exhibit obvious spatial correlation, which reveals significant spatial redundancy and can cause over-fitting issues.
• As can be seen in Figure 2(d), the ambiguous boundaries of land objects can affect thermal image interpretation at a fine spatial resolution.
• The existing spatial gaps between the sequentially acquired visible data strips can decrease the efficiency of the spatial descriptors (Figure 2(e)). Furthermore, there are several outlier pixels and a lack of spatial coherency caused by excessive heterogeneity due to "the same object but different spectrum" or "the same spectrum but different objects".
• Figure 3(a) shows the mean thermal spectra of each class, where the horizontal axis indicates the thermal band number and the vertical axis represents the radiant energy. It can be seen that the maximum vertical value is less than 0.12, which illustrates the extremely bounded radiant energy. The excessive inter-band correlation indicates considerable descriptor redundancy in the thermal data.
• As shown in Figure 3(b), there is a radiant energy discrepancy across the flight direction between the sequentially acquired thermal data strips due to environmental changes, while the visible data remain relatively stable as shown in Figure 2(e).
In the first step of the presented methodology, the JMD/TD indices were estimated using ENVI software to analyze the land-cover spectral separability of the training dataset (Table 1). Following that, the STS features (Table 3) were extracted from both datasets to determine the feature space (Figure 4).
After determination of the abovementioned feature space, the proposed progressive processing model was applied by considering the land-cover spectral separability analysis. Then, the multi-resolution segmentation algorithm was performed using eCognition software with the values 25, 0.3 and 0.5 for the scale, compactness and shape parameters, respectively. After that, MV was employed on each of the segmented regions to decide the final output labels. Finally, the spatial relationship among land covers (Table 4) was considered to refine the classification results. In this section, the efficiency of the proposed method is assessed via a single experiment. The following objectives of the methodology are considered:
• the effectiveness of STS features to enhance the classification accuracy,
• the effectiveness of the multilevel classification method to utilize the pros of multi-sensor data,
• the effectiveness of the knowledge-based system to tackle the common challenges of traditional pixel-based classification methods, and
• the comparison of the obtained results with the methods evaluated in the 2014 IEEE GRSS data fusion contest.
First, the effect of the extracted visible STS features is investigated through the visible data classification results (Table 5 and Figure 5). It can be seen that the classification performance coefficients remain unchanged (OA/kappa: 0.82/0.75), while the accuracies of most classes increase; the STS features yield much better accuracies for the plant (tree/vegetation: 0.85/0.82) and gray roof (0.75) classes via the generated distinguishing descriptors. Table 6 presents the confusion matrix of the experiment using the extracted visible STS features to enhance land-cover classification.
After this step, the effect of the extracted LWIR STS features is investigated through the TIR HS data classification results (Table 5 and Figure 5). In summary, the PCA-LWIR classification method improves the classification coefficients (OA/kappa: 0.69/0.53) via inter-band correlation reduction. Also, it can be observed that the accuracies of all classes increase. Table 7 displays the confusion matrix of the test utilizing the PCA-LWIR features.
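For reference, the two performance coefficients reported throughout this section can be derived from a confusion matrix as sketched below, using the standard definitions of overall accuracy (OA) and Cohen's kappa. The function name and the 2 × 2 matrix are hypothetical and do not reproduce any table of this paper.

```python
import numpy as np

def overall_accuracy_and_kappa(confusion):
    """OA and Cohen's kappa from a square confusion matrix of pixel counts."""
    confusion = np.asarray(confusion, dtype=float)
    n = confusion.sum()
    po = np.trace(confusion) / n                                  # observed agreement (OA)
    pe = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / n ** 2  # chance agreement
    return po, (po - pe) / (1.0 - pe)

# Hypothetical two-class confusion matrix (rows: reference, columns: predicted).
oa, kappa = overall_accuracy_and_kappa([[45, 5],
                                        [10, 40]])
print(round(oa, 2), round(kappa, 2))   # 0.85 0.7
```

Kappa discounts the agreement expected by chance, which is why it is always lower than OA for an imperfect classifier and why the two coefficients can move by different amounts between experiments.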
Second, the effect of the MLSC method is investigated through the D-S combination of the STS-based classification results (Table 5 and Figure 5). In terms of classification accuracy, both strategies yield satisfactory accuracies when compared with the individual classification results. In more detail, the overall results demonstrate that the proposed progressive processing model outperforms the STS-based data fusion method; the MLSC strategy achieves the best classification performance coefficients (OA/kappa: 0.91/0.87), improving on the D-S fusion strategy by up to 4% for OA and 6% for the kappa coefficient. The proposed MLSC method improves the classification accuracies of the road (0.93) and gray roof (0.90) classes by considering the land-cover spectral separability analysis (Table 1). Table 8 displays the confusion matrix of the experiment using the MLSC method.
Third, the effect of the knowledge-based system is examined to investigate how the proposed strategy tackles the common challenges of traditional pixel-based classification methods (Table 5 and Figure 5). The MV strategy leads to an even better classification accuracy (OA/kappa: 0.93/0.89), an improvement over the MLSC strategy of up to 2% for the classification performance coefficients. Furthermore, it can be seen that the accuracies of all classes are enhanced via MV on the multi-resolution segmented regions. In addition, the semantic rules further improve the classification accuracy (OA/kappa: 0.96/0.93), enhancing the MV strategy by up to 4% for the kappa coefficient. Here, the accuracies of all classes are enhanced by considering the spatial relationship between the extracted classes. Table 9 shows the confusion matrix of the experiment using the OBPP.
The obtained results confirm that the proposed decision-based multi-sensor classification system exhibits a superior performance compared to the conventional classification methods or any individual classification result. Figure 6 illustrates the obtained classification maps of the 2014 IEEE GRSS data fusion contest datasets.
Fourth, the obtained results are compared with the top 10 papers presented in the 2014 IEEE GRSS data fusion contest; the proposed classification method ranks higher than the majority of the participating teams. This comparison is performed under the same conditions, i.e. with the same training and testing datasets prepared by Telops Inc. (Figure 7).

Conclusion
This paper presents a decision-based multi-sensor classification system for LWIR HS and visible images to produce a classification map at the spatial resolution of the visible data. In the proposed method, a combined architecture is used as a trade-off between accuracy enhancement and the reliability of the land-cover classification solution, complexity reduction and processing efficiency optimization.
In this context, STS features are extracted for the proposed multilevel classification. Then, a land-cover separability preprocessing is employed to identify how the proposed method can fully utilize the advantages of both sensors. Next, an SVM is applied to classify road classes by using thermal hyperspectral image data; plants, roofs and bare soils are classified by the joint use of both sensors via D-S classifier fusion. Finally, an OBPP is employed to improve the classification results. The proposed method is evaluated with respect to the potential of STS features to enhance the classification accuracy, the effectiveness of the multilevel classification methodology to utilize the pros of multi-sensor data, the effectiveness of the knowledge-based system to tackle the common challenges of traditional pixel-based classification methods and the comparison of the obtained results to the methods presented in the 2014 IEEE GRSS data fusion contest. In conclusion, the decision-based multi-sensor fusion system yields higher classification performance coefficients than single-source images and is shown to be a promising method compared with the top 10 techniques evaluated in the 2014 IEEE GRSS data fusion contest. Furthermore, the land-cover classification map shows a superior objective result and turns out to be more reliable with respect to human perception. Future studies will focus on context-aware decision-level fusion.