Unsupervised hyperspectral band selection by combination of unmixing and sequential clustering techniques

ABSTRACT Selecting the decisive spectral bands is a key issue in unsupervised hyperspectral band selection techniques. These methods are among the most popular ways to reduce the dimensionality of the original data. The main objectives of such selection processes are a compact data representation that does not compromise the physical information, and an optimized separation between different materials. In this work, a hyperspectral band selection approach is proposed based on linear spectral unmixing and sequential clustering techniques. The use of these two specific techniques constitutes the main novelty of this investigation. The proposed approach operates in successive steps. It starts by extracting the material spectra contained in the considered data using an unmixing method. Then, the variance of the extracted spectra samples is calculated at each wavelength, which results in a variances vector. This vector is segmented into a fixed number of clusters using a sequential clustering strategy. Finally, only one spectral band is selected for each segment. This band corresponds to the wavelength at which the maximum variance value is obtained. Experiments on three real hyperspectral data sets demonstrate the superiority of the proposed approach in comparison with four methods from the literature.


Introduction
Earth observation based on remote sensing hyperspectral imagery has been widely used in recent years for better understanding and managing our natural resources and our environment. Hyperspectral imaging sensors simultaneously acquire spectral bands with wavelengths ranging from the visible spectrum to the infrared region (Landgrebe, 2002). Obtained data are useful to several remote sensing applications, such as image classification and target detection. Indeed, these data increase the opportunity for better identification of spectrally unique materials (Chang, 2007).
The high number of spectral bands can result in excessive computational complexity for hyperspectral data analysis techniques. More precisely, the detailed spectra increase the dimensionality of the data, which gives rise to the "Hughes phenomenon" in the classification process (Hughes, 1968; Kim & Landgrebe, 1991). This phenomenon significantly impacts the performance of supervised classifiers (Jimenez & Landgrebe, 1998). Thus, reducing the dimensionality of these data is a simple way to address such problems.
Feature extraction and feature selection are two ways to perform dimensionality reduction. Feature extraction methods (Imani & Ghassemian, 2014; Marpu et al., 2012; Mojaradi, Moghaddam, Zoej, & Duin, 2009; Yang, Yu, & Kuo, 2010) transform the spectral bands of a hyperspectral image into a low-dimensional feature space through projection techniques. Although these projections preserve the maximum information of the original data, the obtained features lack physical interpretability. In contrast, feature selection (also referred to as band selection) methods (Das, Ghosh, & Ghosh, 2014; Feng, Itoh, Parente, & Duarte, 2017; Guo, Damper, Gunn, & Nelson, 2008; Martinez-Uso, Pla, Sotoca, & Garcia-Sevilla, 2007; Yang, Liu, Bruzzone, Guan, & Du, 2013; Zhan, Hu, Xing, & Yu, 2017; Zhu, Huang, Li, Tang, & Liang, 2017) choose some representative spectral bands that preserve the physical information of the original data. These band selection methods are, in some situations, preferable to feature extraction techniques, especially when some materials need to be recognized using spectral libraries, which contain physical information about these materials. Also, band selection methods can be very useful for designing new sensors through an initial choice of decisive spectral bands, which allow maximum discrimination between different materials.
Depending on the availability of labeled reference data, band selection can be performed in a supervised or an unsupervised way. Supervised methods (Backer, Kempeneers, Debruyn, & Scheunders, 2005; Bruzzone & Persello, 2009; Shi, Shen, & Liu, 2003; Yang, Du, Su, & Sheng, 2011) require the collection of reliable labeled samples, which is a difficult and tedious task. A supervised mutual-information-based band selection approach is proposed in Guo, Gunn, Damper, & Nelson (2006). Also, an approach that uses a spectral separability index is reported in Yin, Wang, & Zhao (2010). In Yang et al. (2011), a supervised band selection method is proposed that uses known spectra without examining the original spectral bands of the considered data.
Supervised band selection methods can generally achieve better selection than unsupervised ones when judged by performance criteria such as the overall accuracy of the classification obtained with the selected spectral bands. However, unsupervised approaches (Amri, Benabadji, & Karoui, 2017; Chang, Du, Sun, & Althouse, 1999; Du & Yang, 2008; Rashwan & Dobigeon, 2017; Xu, Shi, & Pan, 2017; Zhang, Ma, & Gong, 2017) may be more suitable in practical scenarios since they do not require any labeled data to perform the band selection process. Many of these unsupervised methods are based on a clustering strategy. For example, a hierarchical-clustering-based band selection approach is introduced in Martinez-Uso et al. (2007), which consists in grouping spectral bands to minimize the intra-cluster variance while maximizing the inter-cluster variance. Also, in Cariou, Chehdi, & Moan (2011), an original approach that splits the initial spectral bands into disjoint clusters is reported. Finally, an unsupervised approach based on band clustering by split-and-merge stages is introduced in Rashwan & Dobigeon (2017).
In this paper, a new unsupervised hyperspectral band selection approach is proposed, wherein linear spectral unmixing and sequential clustering techniques are combined. The designed approach operates in two principal stages. The first one consists of extracting, from the considered data, pure material spectra (also called endmembers), while the second stage aims at sequentially clustering the variances vector obtained from the extracted spectra. This makes it possible to directly take into consideration the nature of the classes contained in an imaged scene. The clustering strategy used results in sequentially clustered wavelengths. From each cluster, only one decisive spectral band is selected in an optimal sense described in the next section.
It should be noted here that the proposed method can be seen as an approach for identifying spectral regions that allow the separation of different materials present in a given remote sensing image. For example, in Feilhauera et al. (2013), a data-driven approach is reported that considers intercorrelations between neighboring wavelengths in a correlogram of field spectrometer data. Local minima of the correlation values are chosen as boundaries of the desired spectral regions. Other approaches exist in the same field, and interested readers can refer to Feilhauera et al. (2013) and to the references cited therein.
The remainder of this paper is structured as follows. In Section 2, the proposed approach is presented. Section 3 reports test results with three real hyperspectral data sets. In that section, the obtained results are compared with those obtained by four methods from the literature. Finally, Section 4 concludes this paper.

Proposed hyperspectral band selection approach
As stated in Section 1, the proposed hyperspectral band selection approach combines unmixing and sequential clustering techniques. The proposed method operates in different successive stages:

The unmixing stage
Since the selected bands must discriminate between endmembers that are ideally characterized by their spectra, the first stage consists of extracting these spectra by an unsupervised Linear Spectral Unmixing (LSU) technique (Bioucas-Dias et al., 2012). Indeed, principally due to the coarse spatial resolution of hyperspectral sensors, mixed pixels may occur in the acquired images. Such pixels prevent direct recognition of endmembers without advanced processing to unmix their spectra. LSU is one of the most widely used techniques for processing such data (Bioucas-Dias et al., 2012). It aims at linearly unmixing these data into a set of endmember spectra and a collection of associated abundance fractions. It is worth recalling here that, in the mathematical data model used in LSU techniques, each observed nonnegative radiance/reflectance pixel spectrum is supposed to be a linear mixture of the nonnegative radiance/reflectance endmember spectra contained in that pixel. As a result, the entire observed nonnegative radiance/reflectance hyperspectral image $X \in \mathbb{R}_{+}^{N \times P}$ can be modeled, in matrix form, as (Bioucas-Dias et al., 2012)

$$X = AS \qquad (1)$$

where each row of the matrix $X$ represents one spectral band of the considered hyperspectral image. The $P$ pixels of this image are restructured as a one-dimensional array. $N$ represents the number of spectral bands of that image. Each column of the matrix $A \in \mathbb{R}_{+}^{N \times L}$ represents one endmember spectrum and each row of the matrix $S \in \mathbb{R}_{+}^{L \times P}$ represents all nonnegative abundance fractions, in all pixels, of one endmember. These coefficients must satisfy the well-known abundance sum-to-one constraint (Bioucas-Dias et al., 2012). $L$ represents the number of endmembers.
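As an illustration of the linear mixing model described above, the following minimal sketch builds a synthetic image from random endmember spectra and abundances. The sizes, the random spectra and the Dirichlet sampling of abundances are assumptions made only for this toy example; they are not taken from the data sets used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, P = 50, 3, 200  # bands, endmembers, pixels (toy sizes, assumed)

# Hypothetical nonnegative endmember spectra: one column per endmember.
A = rng.uniform(0.0, 1.0, size=(N, L))

# Nonnegative abundances; Dirichlet draws make each pixel's fractions sum to one.
S = rng.dirichlet(np.ones(L), size=P).T  # shape (L, P)

# Linear mixing model: each column of X is one mixed pixel spectrum.
X = A @ S  # shape (N, P)
```

Because every column of S is nonnegative and sums to one, each mixed pixel spectrum is a convex combination of the endmember spectra, so at every band it lies between the smallest and the largest endmember value.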
In this work, the well-known Vertex Component Analysis (VCA) (Nascimento & Bioucas-Dias, 2005) technique is used to linearly extract desired endmember spectra. This fast technique requires, as input, the number L of endmembers. Here, this number is automatically estimated by using the well-known Hyperspectral Signal subspace identification by minimum error (HySime) (Bioucas-Dias & Nascimento, 2008) technique.

The variance measures stage
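At each wavelength $\lambda_i$ ($i = 1, \ldots, N$), the variance $\mathrm{var}_{\lambda_i}$ of the extracted spectra samples $(a_{i1}, \ldots, a_{iL})$ is calculated as follows

$$\mathrm{var}_{\lambda_i} = \frac{1}{L} \sum_{j=1}^{L} \left( a_{ij} - m_i \right)^2 \qquad (2)$$

where $m_i$ is the mean of the above spectra sample values.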
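The per-wavelength variance can be sketched in a few lines of NumPy. The function name `band_variances` and the small matrix below are hypothetical. The normalisation (division by L rather than L − 1) is an assumption here; it does not change which band has the largest variance.

```python
import numpy as np

def band_variances(A, ddof=0):
    """Variance across the L endmember samples at each of the N wavelengths.

    A has one row per wavelength and one column per endmember.
    ddof=0 divides by L (assumed); ddof=1 would divide by L - 1.
    """
    A = np.asarray(A, dtype=float)
    m = A.mean(axis=1, keepdims=True)  # mean m_i of each row
    return ((A - m) ** 2).sum(axis=1) / (A.shape[1] - ddof)

# Toy endmember matrix: 3 wavelengths (rows), 3 endmembers (columns).
A = np.array([[0.2, 0.2, 0.2],   # identical samples: no discrimination
              [0.1, 0.5, 0.9],   # widely spread samples
              [0.3, 0.4, 0.5]])  # mildly spread samples
v = band_variances(A)
```

In this toy case the second wavelength has the largest variance, which matches the intuition that it separates the three endmembers best.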
Then, the variances vector $v$ is obtained and defined as

$$v = \left[ \mathrm{var}_{\lambda_1}, \mathrm{var}_{\lambda_2}, \ldots, \mathrm{var}_{\lambda_N} \right] \qquad (3)$$

The potential spectral bands may be those that correspond to wavelengths where the obtained variances reach maximum values. Indeed, in this case the differences between the samples of the endmember spectra are significant, and thus a better discrimination between endmembers is possible for classification purposes. However, this strategy is not sufficient, because it can result in the selection of very close spectral bands, that is, a highly correlated subset of spectral bands, which is likely to degrade the classification results.
As a consequence, the first step of the proposed band selection approach may be coupled with a clustering technique applied to the computed variances vector $v$. This option is not promising either for choosing significant spectral bands, because standard clustering techniques can produce disjoint segments that belong to the same cluster. Thus, choosing one spectral band per cluster, where the variance is maximum, can lead to the neglect of significant spectral bands in certain wavelength ranges of the considered electromagnetic spectrum. In this case, it is possible to consider each segment as a separate cluster and to select a spectral band for each one. However, in this situation the number of selected spectral bands may be higher than the number previously fixed.
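The first limitation discussed above, namely that keeping only the largest variances tends to return neighboring bands, can be seen on a toy variance profile. The profile below (one strong peak and one weaker peak) is entirely hypothetical and serves only as an illustration.

```python
import numpy as np

wavelengths = np.arange(40)

# Hypothetical variance profile: a strong peak near band 10, a weaker one near band 30.
v = (np.exp(-((wavelengths - 10) / 4.0) ** 2)
     + 0.4 * np.exp(-((wavelengths - 30) / 4.0) ** 2))

K = 4
# Naive strategy: keep the K bands with the largest variances.
top_k = np.sort(np.argsort(v)[-K:])
```

With this profile, all four retained bands lie within two bands of the main peak, and the region around band 30 is ignored entirely, even though it also carries discriminative information.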

The sequential clustering stage
The computed variances vector $v$ is segmented into a predefined number $K$ of clusters with a sequential clustering algorithm (Trahanias & Skordalakis, 1989). This allows the selection of significant spectral bands, i.e. bands where the differences between the samples of the extracted endmember spectra are significant, at wavelengths spread across the considered electromagnetic spectrum. In the proposed spectral band selection approach, the Warped K-Means (WKM) (Leiva & Vidal, 2013) sequential clustering algorithm is used to segment the computed variances vector $v$. WKM is a K-means-based clustering algorithm that imposes a hard sequential constraint, which is implicitly embedded, via the considered wavelengths, in the computed variances vector $v$. Therefore, this algorithm leads to the segmented variances vector $v_{seg}$ defined as

$$v_{seg} = \left[ \left[ \mathrm{var}_{\lambda_1} \ldots \mathrm{var}_{\lambda_{k_1}} \right], \left[ \mathrm{var}_{\lambda_{k_1 + 1}} \ldots \mathrm{var}_{\lambda_{k_2}} \right], \ldots, \left[ \ldots \mathrm{var}_{\lambda_N} \right] \right] \qquad (4)$$

where $k_1, k_2, \ldots$ are integers constrained by $1 < k_1 < k_2 < \ldots < N$.
It should be noted here that, since each entry of the variances vector corresponds to one wavelength, this clustering strategy induces the same sequential cluster structure on the wavelengths.
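The idea of segmenting a vector into K contiguous clusters can be sketched as follows. This is not the WKM algorithm: as a stand-in, the sketch uses exact dynamic programming to find the contiguous segmentation that minimises the within-segment squared error, which is the same kind of criterion under the same sequential constraint. The function name `sequential_segments` and the toy vector are hypothetical.

```python
import numpy as np

def sequential_segments(v, K):
    """Split v into K contiguous segments minimising within-segment squared error.

    Returns the K-1 boundary indices (0-based, end-exclusive): segment t covers
    v[b[t-1]:b[t]], with b[0] = 0 and b[K] = len(v) implied.
    Stand-in for WKM (assumption): exact dynamic programming, O(K * n^2).
    """
    v = np.asarray(v, dtype=float)
    n = len(v)
    if not 1 <= K <= n:
        raise ValueError("K must satisfy 1 <= K <= len(v)")
    c1 = np.concatenate(([0.0], np.cumsum(v)))       # prefix sums
    c2 = np.concatenate(([0.0], np.cumsum(v ** 2)))  # prefix sums of squares

    def sse(i, j):
        # Squared error of v[i:j] around its own mean.
        s = c1[j] - c1[i]
        return (c2[j] - c2[i]) - s * s / (j - i)

    cost = np.full((K + 1, n + 1), np.inf)
    back = np.zeros((K + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # i is the start of the k-th segment
                c = cost[k - 1, i] + sse(i, j)
                if c < cost[k, j]:
                    cost[k, j], back[k, j] = c, i
    # Walk back from the end to recover the segment starts.
    bounds, j = [], n
    for k in range(K, 1, -1):
        j = back[k, j]
        bounds.append(int(j))
    return bounds[::-1]

v = [0.1, 0.1, 0.1, 0.9, 0.9, 0.5, 0.5, 0.5]
bounds = sequential_segments(v, K=3)
```

On this toy vector the three flat runs are recovered as three segments. Unlike standard K-means, a cluster can never be split into disjoint pieces, so exactly K segments are obtained.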

The selection stage
For each obtained segment $seg_t$ ($t = 1, \ldots, K$), only one spectral band is selected. This band, denoted $s_{\lambda_t}$, corresponds to the wavelength at which the variance $\mathrm{var}_{\lambda_i}$, among those belonging to the considered segment $seg_t$, reaches its maximum value.
The whole proposed hyperspectral band selection approach is described by the following pseudo-code (Table 1).
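The selection rule can be sketched as an arg-max taken inside each segment. The function name `select_bands`, the toy variances and the segment boundaries below are hypothetical; the boundaries are assumed to be given as 0-based, end-exclusive indices produced by some sequential clustering step.

```python
import numpy as np

def select_bands(v, bounds):
    """Return one band index per segment: the position of the segment's maximum variance.

    bounds holds the K-1 interior boundaries (0-based, end-exclusive), so the
    segments are v[0:bounds[0]], v[bounds[0]:bounds[1]], ..., v[bounds[-1]:].
    """
    v = np.asarray(v, dtype=float)
    edges = [0, *bounds, len(v)]
    return [lo + int(np.argmax(v[lo:hi])) for lo, hi in zip(edges[:-1], edges[1:])]

v = [0.1, 0.3, 0.2, 0.9, 0.7, 0.4, 0.5, 0.45]  # toy variances vector
bounds = [3, 5]                                  # assumed segmentation into K = 3 segments
selected = select_bands(v, bounds)               # 0-based band indices

# Reducing an image X of shape (N, P) then amounts to keeping rows X[selected, :].
```

Since the segments are contiguous and non-overlapping, the selected indices are automatically sorted and distinct, and their number always equals K.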

Data sets
Here, the performance of the proposed band selection approach is evaluated on three real radiance hyperspectral data sets.
The first hyperspectral image was acquired by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) sensor over the Kennedy Space Center (KSC), Florida, USA. This 512 × 614 pixel image is characterized by 176 spectral bands and a spatial resolution of 18 m. This image is available with labeled ground truth samples including 13 materials that represent the various land cover types that occur in this scene (scrub, willow swamp, cabbage palm/oak hammock, slash pine, oak/broadleaf hammock, hardwood swamp, graminoid/spartina/cattail/salt marsh, mud flats and water). The reader is referred to Ham, Chen, Crawford, & Ghosh (2005) for more details on this hyperspectral image.
The second image was also acquired by the AVIRIS sensor, over the Indian Pines site in north-western Indiana, USA. This 145 × 145 pixel image, with a spatial resolution of 20 m, contains 185 spectral bands. This image is also available with labeled samples including 16 materials, which principally represent different crop classes (corn, corn-min/notill, soybeans-min/notill/clean, oats, wheat) and vegetation. The reader is referred to Jimenez & Landgrebe (1999) for more information on this second image.
The last hyperspectral image was acquired by the spaceborne Hyperion sensor over the Okavango Delta, Botswana. In total, 14 ground truth materials with labeled samples are available for this 256 × 1476 pixel image, which is characterized by 145 spectral bands and a spatial resolution of 30 m. The 14 materials represent, in the considered area, the land-cover types in seasonal swamps, occasional swamps, and drier woodlands. These materials reflect the impact of flooding on vegetation. The reader is referred to Ham et al. (2005) for more details on this last hyperspectral image.
The images used contain some specific classes that can be confused in the classification task. Thus, these data sets allow a good evaluation of the proposed technique.

Design of experiments
In this paper, the proposed unsupervised approach is evaluated by a supervised hyperspectral pixel classification in comparison with four methods from the literature: the supervised Rough set and fuzzy C-means (Rough-FCM) method (Shi et al., 2003), the supervised approach based on Rough Set (Patra, Modi, & Bruzzone, 2015), the unsupervised Ward's linkage strategy using divergence (WaLuDi) (Martinez-Uso et al., 2007), and the unsupervised Ward's linkage strategy using mutual information (WaLuMI) (Martinez-Uso et al., 2007). It should be noted here that these two well-known unsupervised band selection methods are among the most widely used techniques in the literature. Also, the classification using all the spectral bands is considered in the conducted experiments.
The desired number of spectral bands to be selected is not known in advance and certainly varies from one image to another. In the conducted experiments, different numbers of selected spectral bands are considered, ranging from 5 to 40 with a step size of 5.
As stated above, after running the considered band selection algorithms, the supervised multiclass Support Vector Machine (SVM) classifier (Hearst, Dumais, Osuna, Platt, & Scholkopf, 1998) is used to evaluate the effectiveness of the selected spectral bands. This classifier is implemented with the Radial Basis Function (RBF) kernel. The spread of the RBF kernel and the regularization parameter of the SVM classifier are derived, for each subset of selected spectral bands and for each data set, by using a grid search strategy. For each data set, half of the available labeled ground truth samples are randomly selected to train the classifier. The accuracies are evaluated on the remaining samples by using the Overall Accuracy (OA) and Kappa measures. Ten runs are performed to reduce the effect of the random sampling, and the mean results are reported.
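For reference, the two accuracy measures can be computed from a confusion matrix as sketched below. The sketch uses the standard definitions of overall accuracy and Cohen's Kappa; the function name `overall_accuracy_and_kappa` and the toy labels are hypothetical and unrelated to the data sets used here.

```python
import numpy as np

def overall_accuracy_and_kappa(y_true, y_pred):
    """Overall accuracy and Cohen's kappa from true and predicted class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    idx = {c: i for i, c in enumerate(labels.tolist())}
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((len(labels), len(labels)), dtype=float)
    for t, p in zip(y_true.tolist(), y_pred.tolist()):
        cm[idx[t], idx[p]] += 1
    n = cm.sum()
    po = np.trace(cm) / n                                   # observed agreement (OA)
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2   # agreement expected by chance
    kappa = (po - pe) / (1.0 - pe)  # undefined when pe == 1 (a single class everywhere)
    return po, kappa

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
oa, kappa = overall_accuracy_and_kappa(y_true, y_pred)
```

In this toy case five of the six samples are correctly labeled, so the overall accuracy is 5/6, while Kappa is 0.75 because one third of the agreement would be expected by chance. Averaging both measures over several random training/test splits gives the kind of mean values reported in the experiments.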

Results and discussion
Since the proposed approach uses the VCA unmixing technique in its first stage, the endmember spectra extracted from the considered data sets are given in the following figures (Figures 1-3).
The above figures clearly demonstrate that the used data sets contain some specific classes, with similar radiance values in several spectral bands, which can be confused in the classification task.
In the next figures, the obtained variances vectors, computed from the above spectra, are given (Figures 4-6).
These last figures give an initial idea on the spectral bands where the discrimination between the classes, contained in observed scenes, is possible.
Also, as examples, the results of the WKM sequential clustering algorithm applied to the above variances vectors with 10 segments (in order to select 10 decisive spectral bands) are given in the next figures. These figures show the obtained clusters with different colors, and the red stars correspond to the maximum variance of each segment. The indices of these stars are those of the spectral bands to be selected (Figures 7-9).

Table 1. Pseudo-code of the proposed hyperspectral band selection approach.
… (3).
3-Sequential clustering stage
3.1-Sequentially segment the variances vector v into a predefined number K of clusters by using the WKM method.
4-Selection stage
4.1-Select only one spectral band, where the variance is maximum, for each obtained cluster.
Output: Reduced image with K spectral bands (K < N).
The following tables report the mean overall classification accuracies, as well as the mean Kappa accuracies, obtained for the three data sets. Table 2 shows the obtained results for the KSC data set. This table reports that, when selecting only 5 spectral bands, the proposed approach is able to attain 86.56% mean overall accuracy, while the WaLuDi, the WaLuMI, the Rough-FCM and the Rough Set methods, with the same number of selected spectral bands, achieve 82.63%, 80.67%, 81.18% and 85.03% mean overall accuracies respectively. This table also shows that the proposed approach achieves better mean overall accuracy when selecting only 20 spectral bands than when using all the available spectral bands. Also, Table 2 reports that the results improve as the number of selected spectral bands increases. It can also be seen that the proposed approach is able to achieve better results, for the different numbers of selected spectral bands, than the literature methods used, which confirms the efficiency of the proposed band selection approach.
Similarly, Table 3 highlights, for the Indian Pines data set, the same points as the previous table, but with significantly better performance for the proposed approach compared to the literature methods used. In particular, when selecting only 5 spectral bands, the proposed band selection approach is able to achieve 78.08% mean overall accuracy, while the literature methods attain mean overall accuracies between 60.07% and 67.19% with the same number of selected spectral bands. For the same data set, and compared to the use of all spectral bands, the proposed approach is able to achieve better mean overall accuracy by using only 25 spectral bands.
Concerning the results obtained with the Botswana data set, Table 4 shows that, when selecting small numbers of spectral bands, the proposed approach produces higher classification accuracy than the literature techniques. For higher numbers of selected bands, the proposed approach is slightly less effective than the other methods used (in particular the supervised Rough set approach). These results (i.e. for higher numbers of selected bands) can be explained by the type of materials, and consequently their spectra, present in the considered data set. Also, as stated in the introduction section, these results can be explained by the fact that the Rough set method is supervised while the proposed one is unsupervised.
Finally, these tables show that the mean Kappa accuracies, for all three data sets, follow the same trends as the mean overall classification accuracies.
It should be noted here that the proposed approach gives, globally, very satisfactory results despite the presence of the described specific classes in the data sets used. Also, the proposed approach is very easy to implement (since it combines well-known techniques) and it is very fast: for example, its execution time, for all data sets when considering ten selected spectral bands, is less than 3 s on an Intel Core i5 CPU running at 2.50 GHz with 4 GB of memory.

Conclusion
In this paper, a new method was proposed for unsupervised hyperspectral band selection. The designed approach combines linear spectral unmixing and sequential clustering techniques.
The proposed method starts with a linear spectral unmixing technique that extracts material spectra contained in the considered remote sensing hyperspectral data. After that, the variance of extracted spectra samples is calculated at each wavelength resulting in a variances vector. Then, the obtained vector is segmented with a sequential clustering technique. Finally, only one spectral band is selected for each segment. This band corresponds to the wavelength where a maximum of variance is obtained.
The designed approach, along with four other methods from the literature, was applied to three real hyperspectral data sets for comparison purposes. The evaluation of the obtained results was performed by using a supervised hyperspectral pixel classification. Experimental results show that the proposed method yields very satisfactory results. Indeed, for the first two considered data sets, this original approach outperforms the tested methods from the literature for all selection scenarios. For the last considered data set, the proposed method also outperforms the tested literature approaches when selecting small numbers of spectral bands. For higher numbers of selected bands, the proposed approach is slightly less effective than one of the tested methods. The proposed method is easy to implement, and can contribute to the dimensionality reduction of remote sensing hyperspectral data. It can also contribute to the design of future sensors through an initial choice of decisive spectral bands allowing maximum discrimination between different materials.
Future extensions of this work will particularly consist in automatically determining the optimal number of clusters by using clustering validity indices.

Disclosure statement
No potential conflict of interest was reported by the authors.