Decision fusion for hyperspectral image classification based on multiple features and locality-preserving analysis

ABSTRACT A novel fusion-classification system is proposed for hyperspectral image classification. Firstly, spectral derivatives are used to capture salient spectral features for different land-cover classes and a Gabor filter is applied to extract useful spatial features at neighbouring locations. Then, two locality-preserving dimensionality reduction methods are employed to reduce the dimensionality of the data and preserve the local structure of neighbouring samples in the original-image, derivative-feature and Gabor-feature domains. Finally, the classification results from Gaussian-mixture-model classifiers are fused by a decision-fusion approach. We have compared the proposed system to several traditional and state-of-the-art methods on two benchmark classification data sets. In both cases, our system achieved higher accuracy than the current best-performing methods. In particular, for the Indian Pines data set, the proposed DG-locality-preserving non-negative matrix factorization (LPNMF) achieves 7% higher accuracy than support vector machine-Markov random field and LPNMF-G-MRF, and about 4% higher accuracy than Gabor-LPNMF. The proposed method has practical relevance as it shows potential to create smooth classification maps even for complex, spatially heterogeneous land-cover classes.


Introduction
Hyperspectral imagery (HSI) records hundreds of spectral bands for each pixel, each of which contains values that correspond to the reflected light in a defined range of the electromagnetic spectrum (Ye, Prasad, Li, Fowler, & He, 2014). Due to this detailed spectral information, HSI can be used to efficiently distinguish land-cover types (Kang, Li, & Benediktsson, 2014). However, HSI often has a much greater dimensionality than the number of available training samples. The lack of training samples and the high computational burden caused by high-dimensional data processing are inevitable obstacles for designing HSI classifiers (Li, Zhang, & Zhang, 2015). To avoid this "curse of dimensionality" problem, dimensionality reduction is often employed to project the HSI data onto a lower-dimensional feature space. Principal component analysis (PCA) and Fisher's linear discriminant analysis (LDA) are traditional methods used for HSI dimensionality reduction (Martinez & Kak, 2001). PCA aims to find projections with a minimal reconstruction error for the overall information content in the data, whereas LDA seeks to find projections that preserve the information necessary to separate target classes (Prasad & Bruce, 2008a). A popular method that employs PCA-based dimensionality reduction as a preprocessing step for LDA is named SLDA (Li, Prasad, & Fowler, 2013). Due to the influence of various factors, such as the geometric features of material surfaces, the spectral signatures of pixels within the same land-cover class often have complicated distributions. The three dimensionality reduction methods mentioned above all assume that the distributions of spectral signatures within one class of the HSI are Gaussian. However, the distributions of classes in HSI data are often not Gaussian and, in extreme cases, may have complicated multimodal structures.
To address this problem, locality-preserving dimensionality reduction methods have been studied to exploit the complicated statistical structure of HSI data. These methods include, for example, locality-preserving projection (LPP) (He & Niyogi, 2004), locality-preserving non-negative matrix factorization (LPNMF) (Cai, He, Wang, Bao, & Han, 2009) and local Fisher's discriminant analysis (LFDA) (Sugiyama, 2007). LPP is an unsupervised dimensionality reduction method which can preserve the local structure of each class. LPNMF combines the advantages of LPP and non-negative matrix factorization (NMF). The NMF decomposes the data into two non-negative matrices and extracts features for HSI classification, unmixing of HSI and other applications (Li, Prasad, Fowler, & Cui, 2012). LFDA combines the properties of LDA and LPP, exploiting the advantages of each and preserving the neighbourhood relationships within the embedding by employing an "affinity" matrix. It has been argued that LFDA-based dimensionality reduction followed by Gaussian-mixture-model (GMM) (Berge & Solberg, 2006) classifiers (named LFDA-GMM) can capture the underlying statistical structure accurately and obtain higher classification accuracies compared to LDA-GMM.
Although LFDA-GMM was shown to outperform traditional dimensionality reduction methods (i.e. LDA-GMM) for HSI classification, it still suffers from the fact that the classifier exploits only the original spectral signatures. During the last decade, many researchers have worked with other spectral features. In Bao, Chi, and Benediktsson (2013), spectral derivatives were shown to be an effective measure to capture salient features of different land-cover classes. In our previous work, spectral derivatives were used for HSI classification, and two fusion-classification algorithms, called D-LPNMF and D-LFDA, were proposed. In these two algorithms, a locality-preserving dimensionality reduction method (LPNMF or LFDA) is used to reduce the dimensionality of the original image and the derivative features. Following this, GMM classifiers are applied to classify the image, and a decision-fusion approach is used to fuse the classification results. Derivative features were shown to improve the classification performance in this fusion-classification system. However, there is further space for improvement. Since there is a high probability that two adjacent pixels belong to the same HSI class, spatial information can be helpful for creating accurate classification maps. In the aforementioned spectral-feature methods, the spatial information at neighbouring locations is not considered. Recently, several strategies have been proposed for spatial feature extraction. In particular, one popular approach adopts a Bayesian maximum a posteriori formulation partitioned into class-conditional and class-prior probability terms, in which the Markov random field (MRF) (Xu & Li, 2014) is viewed as a post-processing stage for HSI classification. In Tarabalka, Fauvel, Chanussot, and Benediktsson (2010), SVM-MRF was proposed as a popular method for HSI classification.
In this method, the support vector machine (SVM) (Camps-Valls & Bruzzone, 2005) and the MRF model the class-conditional and class-prior probabilities of the classes, respectively. In Li, Prasad, and Fowler (2014), the LFDA-G-MRF and LPNMF-G-MRF strategies were shown to outperform SVM-MRF. In these strategies, LFDA and LPNMF are employed for dimensionality reduction, followed by GMM classifiers to preserve the multimodal structures of HSI, and an MRF model is then applied as post-processing to integrate the classification results and exploit spatial-context information.
Recently, Gabor filters have also been successfully used for spatial feature extraction. In Bau, Sarkar, and Healey (2010), Shen and Jia (2011) and Jia, Zhu, Shen, and Li (2014), three-dimensional Gabor feature extraction approaches were found to represent useful spatial and spectral information. In Huo and Tang (2011), Zhang, Zhang, Tao, and Huang (2012) and Li and Du (2015), two-dimensional Gabor features extracted in PCA-projected subspaces have also been proven to be efficient for HSI classification.
The aforementioned algorithms, such as the MRF-based and the Gabor-based ones, can incorporate spatial-contextual information into the classification process. However, only single features have been considered in these algorithms, which may lead to the loss of useful information and unstable classification accuracy. Specifically, spectral features cannot differentiate between objects made of the same material, while spatial features alone may fail to discriminate between objects that are made of different materials but are similar in shape and size (Liao, Dalla Mura, Chanussot, & Pizurica, 2016). In this work, we develop a new framework for the classification of hyperspectral scenes that pursues the combination of multiple features.
The new contribution of this research is an innovative framework that combines three processing steps for efficient classification of HSI. In the first step, spectral derivatives and Gabor filters are combined to extract the spectral and spatial features of HSI, thereby capturing the salient features of different land-cover classes and the orientation and scale properties of HSI. In the second step, locality-preserving dimensionality reduction is studied to avoid distorting the information contained in multimodal distributions and to preserve the local structure of each class in both the derivative-feature and Gabor-feature domains. Furthermore, GMM classifiers are employed after applying locality-preserving dimensionality reduction to preserve the multimodal distribution of classes in HSI. In the third step, the pipeline outputs of the multiple features are combined effectively by decision fusion in the form of the logarithmic opinion pool (LOGP) (Prasad & Bruce, 2008b). Merging the classification results of the original signatures, the derivative features and the Gabor features into a final decision is an effective way to exploit useful information from different sources. Thus, we expect the proposed framework to significantly increase the classification accuracy and result in smoother classification maps with fewer erroneous outliers, compared to traditional and state-of-the-art algorithms.

Locality-preserving analysis
Dimensionality reduction, which is an essential preprocessing step in high-dimensional data spaces, can decrease the computational cost and may also improve the classification accuracy. Here, locality-preserving analysis is studied for dimensionality reduction while preserving the local structure of each class. LPP, as an unsupervised dimensionality reduction method, seeks to find a linear map, $W$, that preserves the local statistical structure of neighbouring samples from the input data space. We consider hyperspectral image data with training samples $X = \{x_i\}_{i=1}^{n} \in \mathbb{R}^{d \times n}$, where $d$ is the dimensionality of the data (the number of bands) and $n$ is the total number of pixels per band. Class labels are given as $\omega_i \in \{1, 2, \ldots, c\}$, where $c$ is the number of classes. The "affinity" matrix between $x_i$ and $x_j$ is defined as

$$A_{i,j} = \exp\left(-\frac{\|x_i - x_j\|^2}{\sigma_i \sigma_j}\right) \in [0, 1],$$

where $\sigma_i = \|x_i - x_i^{(m)}\|$ denotes the local scaling of the data around $x_i$, and $x_i^{(m)}$ is the $m$th nearest neighbour of $x_i$. $A$ is an $n \times n$ symmetric matrix which measures the closeness of the data samples.
The eigenmap is calculated by computing the eigenvectors and eigenvalues of the following generalized eigenvalue problem:

$$X L X^{T} W = X D X^{T} W \Lambda,$$

where $\Lambda$ is the diagonal eigenvalue matrix, $W$ is the eigenvector matrix, $L = D - A$ is the Laplacian matrix and $D$ is a diagonal matrix with $i$th diagonal element $D_{i,i} = \sum_{j} A_{i,j}$. An example of dimensionality reduction for a two-dimensional multimodal synthetic data set with two classes is shown in Figure 1. To demonstrate the advantages of the locality-preserving approaches, the data set is projected by LPP and by a traditional dimensionality reduction method (i.e. PCA). Although the data set is projected onto a one-dimensional subspace, the example given in Figure 1(b) is shown in two dimensions for demonstration purposes. As seen in Figure 1, LPP not only forces far-apart data pairs of the same class to be close, preserving the within-class local structure, but also obtains good between-class separation, while PCA obviously fails for such a data set.
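To make the procedure above concrete, the following numpy sketch (our own illustration, not the authors' implementation; the function name `lpp` and the neighbour index `m` are chosen here) builds the locally scaled affinity matrix, forms the graph Laplacian and solves the generalized eigenvalue problem for the smallest eigenvalues:

```python
import numpy as np

def lpp(X, r=1, m=5):
    """LPP sketch: X is d x n (bands x samples). Builds the locally scaled
    heat-kernel affinity A, the Laplacian L = D - A, and solves
    X L X^T W = X D X^T W Lambda for the r smallest eigenvalues."""
    d, n = X.shape
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)   # pairwise ||xi - xj||^2
    sigma = np.sqrt(np.sort(sq, axis=1)[:, m])                # distance to m-th neighbour
    A = np.exp(-sq / (sigma[:, None] * sigma[None, :] + 1e-12))
    np.fill_diagonal(A, 0.0)
    D = np.diag(A.sum(axis=1))
    L = D - A
    # generalized eigenproblem via a linear solve (small ridge keeps B invertible)
    B = X @ D @ X.T + 1e-9 * np.eye(d)
    vals, vecs = np.linalg.eig(np.linalg.solve(B, X @ L @ X.T))
    order = np.argsort(vals.real)
    W = vecs[:, order[:r]].real      # d x r projection matrix
    return W, W.T @ X                # projection and r x n embedded data
```

The smallest eigenvalues are kept because the objective penalizes embedding distance between strongly affine (neighbouring) samples.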

LPNMF
As a popular method to process high-dimensional data, the NMF focuses on the analysis of data matrices whose elements are non-negative. The NMF seeks to find two non-negative matrices $U \in \mathbb{R}^{d \times t}$ and $V \in \mathbb{R}^{t \times n}$ whose product approximates the data, $Y = UV \approx X$, by minimizing the following objective function:

$$O = \sum_{i=1}^{d} \sum_{j=1}^{n} \left( X_{i,j} \log\frac{X_{i,j}}{Y_{i,j}} - X_{i,j} + Y_{i,j} \right).$$

Since the objective function is not convex in the $U$ and $V$ variables simultaneously, it is difficult to design an algorithm that finds the global minimum of $O$. Lee and Seung (1999) proposed an iterative algorithm to solve the NMF problem.
Although the NMF has been frequently applied in hyperspectral image processing tasks, it fails to consider the intrinsic geometric structure. As a consequence, LPNMF, which combines the advantages of LPP and NMF, was found to have more discriminating power for high-dimensional data (Cai et al., 2009). Compared to the traditional NMF, the objective function of the LPNMF can be defined as

$$O = \sum_{i=1}^{d} \sum_{j=1}^{n} \left( X_{i,j} \log\frac{X_{i,j}}{Y_{i,j}} - X_{i,j} + Y_{i,j} \right) + \lambda \mathcal{R},$$

where $\lambda$ is the regularization parameter. The first part of Equation (4) is the usual objective function of the NMF, which uses the Kullback-Leibler divergence between $X$ and $Y$. In the second part, $\mathcal{R}$ is used to enforce a geometric locality constraint among points in the reduced-dimensional subspace $V$ ($t < d$), expressed as follows:

$$\mathcal{R} = \frac{1}{2} \sum_{s=1}^{n} \sum_{l=1}^{n} W_{s,l} \sum_{q=1}^{t} \left( V_{q,s} - V_{q,l} \right)^{2}.$$

Since the local geometric structure can be effectively modelled through a nearest-neighbour graph on a scatter of data points, a graph can be considered with $n$ vertices, where each vertex corresponds to a data sample. The edge-weight matrix $W$ can be defined as

$$W_{s,l} = \begin{cases} 1, & \text{if } x_s \in N_p(x_l) \text{ or } x_l \in N_p(x_s), \\ 0, & \text{otherwise}, \end{cases}$$

where $N_p(x_s)$ denotes the set of $p$ nearest neighbours of $x_s$, measured by distances between points in the original space, $X$. This matrix, following the theory of LPP, is used to preserve the intrinsic geometry of the data distribution. Multiplicative update rules (Cai et al., 2009) are then used to minimize the function $O$ and estimate the matrices $U$ and $V$; in these rules, $v_q$ is the $q$th column of $V$, $I$ is the $n \times n$ identity matrix and $L$ is the graph Laplacian of $W$. Due to the limited spatial resolution of many HSI, the spectrum of a pixel may be a mixture of several endmembers. The LPNMF can be used to extract endmember-based features while preserving the intrinsic geometric structure of the HSI.
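A minimal sketch of graph-regularized NMF with multiplicative updates is shown below. For brevity it uses the Frobenius-norm variant of the objective rather than the Kullback-Leibler divergence used in the text, so it illustrates the structure of LPNMF (reconstruction term plus $\lambda\,\mathrm{Tr}(V^T L V)$ locality term) rather than the authors' exact algorithm; `gnmf` and its parameters are our own names.

```python
import numpy as np

def gnmf(X, A, t=2, lam=0.1, n_iter=200, seed=0):
    """Graph-regularized NMF sketch (Frobenius-norm variant of LPNMF).
    X: d x n non-negative data; A: n x n symmetric edge-weight matrix.
    Minimizes ||X - U V^T||_F^2 + lam * Tr(V^T L V), L = D - A,
    with the standard multiplicative updates."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    U = rng.random((d, t)) + 0.1          # non-negative random init
    V = rng.random((n, t)) + 0.1
    Dg = np.diag(A.sum(axis=1))           # degree matrix of the graph
    eps = 1e-12                           # avoids division by zero
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (A @ V)) / (V @ (U.T @ U) + lam * (Dg @ V) + eps)
    return U, V
```

The updates keep both factors non-negative by construction, and the graph term pulls the low-dimensional representations of neighbouring samples together.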

LFDA
As a supervised dimensionality reduction technique, the LFDA, which combines LDA and LPP, has been proposed to handle multimodal, non-Gaussian distributions. The LFDA preserves the neighbourhood relationships within the embedding by employing an "affinity" matrix, as done in the LPP. Here, the heat kernel of the LFDA as described in Equation (1) is employed. In the LFDA, the "local" between-class scatter matrix $S^{(lb)}$ and within-class scatter matrix $S^{(lw)}$ are scaled via the affinity matrix and defined as

$$S^{(lb)} = \frac{1}{2} \sum_{i,j=1}^{n} W^{(lb)}_{i,j} (x_i - x_j)(x_i - x_j)^{T}, \qquad S^{(lw)} = \frac{1}{2} \sum_{i,j=1}^{n} W^{(lw)}_{i,j} (x_i - x_j)(x_i - x_j)^{T},$$

where $W^{(lb)}$ and $W^{(lw)}$ are $n \times n$ matrices with

$$W^{(lb)}_{i,j} = \begin{cases} A_{i,j}\left(\dfrac{1}{n} - \dfrac{1}{n_c}\right), & \omega_i = \omega_j = c, \\ \dfrac{1}{n}, & \omega_i \neq \omega_j, \end{cases} \qquad W^{(lw)}_{i,j} = \begin{cases} \dfrac{A_{i,j}}{n_c}, & \omega_i = \omega_j = c, \\ 0, & \omega_i \neq \omega_j, \end{cases}$$

and $n_c$ is the number of samples in class $c$. The values of the in-class pairs are weighted by the affinity in $S^{(lb)}$ and $S^{(lw)}$, which are therefore only weakly dependent on far-apart, in-class pairs. The solution that maximizes the Fisher ratio within the context of the local scatter matrices is given by the generalized eigenvalue problem

$$S^{(lb)} \Phi = S^{(lw)} \Phi \Lambda,$$

where $\Lambda$ is the diagonal eigenvalue matrix and $\Phi$ is the eigenvector matrix. Maximizing the Fisher ratio as defined with the local scatter matrices, the solution $T_{\mathrm{LFDA}}$ is formed from the leading eigenvectors of this problem. $T_{\mathrm{LFDA}}$ is determined so that data pairs of different classes are far apart and nearby data pairs of the same class are close. Since far-apart data pairs of the same class are not forced to be close, the LFDA can be viewed as a localized LDA. The modified Fisher ratio in the LFDA treats samples of a class within each mode independently, so that it can estimate the dimensionality-reduction projection from these local scatter matrices and obtain good between-class separation while preserving local neighbourhoods. It is hence expected that the LFDA will be superior to the LDA when the distributions of the input data are complex and even multimodal. In conventional approaches, locality-preserving analysis is applied for dimensionality reduction using only the spectral features, which may be suboptimal. In this article, these two advanced locality-preserving dimensionality-reduction methods, LPNMF and LFDA, are utilized to reduce the dimensionality of both the local spatial features and the spectral features.
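The scatter construction above can be sketched directly in numpy. This is a minimal illustration under simplifying assumptions (a single heat-kernel width instead of the per-sample local scaling, and an explicit double loop over sample pairs); the function name `lfda` is ours.

```python
import numpy as np

def lfda(X, y, r=1):
    """LFDA sketch: X is d x n, y holds integer class labels.
    Builds affinity-weighted local scatter matrices and maximizes the
    local Fisher ratio via a generalized eigenproblem."""
    d, n = X.shape
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    A = np.exp(-sq / (np.median(sq) + 1e-12))    # simplified affinity
    Slb = np.zeros((d, d))
    Slw = np.zeros((d, d))
    for i in range(n):
        for j in range(n):
            diff = (X[:, i] - X[:, j])[:, None]
            if y[i] == y[j]:
                nc = int((y == y[i]).sum())
                wlw = A[i, j] / nc                      # within-class weight
                wlb = A[i, j] * (1.0 / n - 1.0 / nc)    # between-class weight
            else:
                wlw = 0.0
                wlb = 1.0 / n
            Slw += 0.5 * wlw * (diff @ diff.T)
            Slb += 0.5 * wlb * (diff @ diff.T)
    # maximize the local Fisher ratio: Slb v = lambda * Slw v
    M = np.linalg.solve(Slw + 1e-9 * np.eye(d), Slb)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:r]].real      # d x r projection matrix
```

On two well-separated classes the leading direction aligns with the class separation, which is the behaviour the text describes.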
The LPNMF and LFDA need to optimize the parameter r, which is the reduced dimensionality of the LPNMF and LFDA projections. In this work, we varied r over a wide range of values and chose the optimal r as the value that maximized the training accuracy for the Indian Pines data set and the Pavia University data set.

Proposed fusion-classification system
Framework

Spectral derivatives have been proven to be an effective measure to capture salient features of different land-cover classes (Bao et al., 2013). In our previous work, D-LFDA and D-LPNMF were studied by fusing original spectral signatures and derivative features. It has also been shown that Gabor filters can effectively capture the orientation and scale of physical structures for HSI spatial feature extraction (Zhang et al., 2012). However, little work has been conducted on the combination of derivative features and Gabor features for HSI classification. In this work, we focus on a fusion-classification framework in which locality-preserving dimensionality reduction followed by GMM classifiers is introduced into both the derivative-feature and Gabor-feature domains. The objective here is to preserve the local structure of each class, obtain the output labels with GMM classifiers and then combine the results using a decision-fusion approach for a final decision. As shown in Figure 2, the proposed fusion-classification framework consists of four steps. First, the derivative features are extracted in the spectral direction and the Gabor features are extracted in the spatial direction. Then, either the LPNMF or LFDA is used to reduce the dimensionalities of the original image, the derivative features and the Gabor features. Afterwards, GMM classifiers are applied to classify the image. Finally, the classification results are merged by the LOGP decision-fusion approach. A detailed description of the proposed algorithms (named DG-LPNMF and DG-LFDA) is given in Strategy 1.
Strategy 1: The proposed DG-LPNMF/LFDA

Input: training data $X = \{x_i\}_{i=1}^{n}$, class labels $\omega_i$, testing samples $y \in \mathbb{R}^{d \times n}$.

1. The spectral derivative is computed on $y$, giving the first-order derivative features via Equation (15).
2. A Gabor filter is applied to $y$, giving the Gabor features via Equation (17).
3. Either the LPNMF or LFDA followed by the GMM classifier is applied to the derivative features, the Gabor features and $y$.
4. The classification results are fused using Equation (23).

Output: class($y$)

Multiple feature analysis

Derivative features
The spectral derivatives, estimated by obtaining the slope information from the reflectance curve over the available wavelengths in the spectrum, can be used to alleviate spectral distortions and to eliminate systematic errors in the original HSI data (Kalluri, Prasad, & Bruce, 2010; Bao et al., 2013). Traditionally, the spectral derivatives are evaluated with different step-lengths and different orders for hyperspectral image classification. The first-order spectral derivative is defined as

$$\frac{\partial y_{i,j,\lambda}}{\partial \lambda}\bigg|_{\lambda_l} = \frac{y_{i,j,\lambda_l} - y_{i,j,\lambda_k}}{\lambda_l - \lambda_k},$$

where $y_{i,j,\lambda_k}$ and $y_{i,j,\lambda_l}$ are the spectral reflectance values corresponding to the wavelengths $\lambda_k$ and $\lambda_l$, respectively, and $k$ and $l$ represent different spectral bands. The difference between $k$ and $l$ is considered to be the step-length. If the step-length $\Delta\lambda = \lambda_l - \lambda_k > 0$, the wavelength windows are fixed.
In this article, we set $\Delta\lambda = 1$.
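The first-order derivative above is a simple finite difference along the spectral axis. A minimal numpy sketch (our own illustration; the function name and cube layout are assumptions) for an HSI cube stored as rows × columns × bands:

```python
import numpy as np

def spectral_derivative(cube, wavelengths, step=1):
    """First-order spectral derivative of an HSI cube (rows x cols x bands).
    Implements (y[l] - y[k]) / (lambda_l - lambda_k) with step-length
    l - k = step; the result has (bands - step) derivative bands."""
    num = cube[:, :, step:] - cube[:, :, :-step]
    den = wavelengths[step:] - wavelengths[:-step]   # broadcasts over pixels
    return num / den
```

Applying it repeatedly yields the higher-order derivatives of the recursive definition in the text.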
Similar to the first-order derivative, the $n$th-order spectral derivative is defined recursively as

$$\frac{\partial^{n} y_{i,j,\lambda}}{\partial \lambda^{n}} = \frac{\partial}{\partial \lambda}\left(\frac{\partial^{n-1} y_{i,j,\lambda}}{\partial \lambda^{n-1}}\right).$$

In our previous work, two fusion-classification algorithms were studied to fully exploit the spectral information of HSI. D-LFDA denotes the fusion-classification algorithm based on LFDA-GMM using original spectral signatures and derivative features, and D-LPNMF denotes the fusion-classification algorithm based on LPNMF-GMM using original spectral signatures and derivative features. In that work, we conducted experiments to study the classification performance of D-LFDA and D-LPNMF when fusing derivatives of different orders. The results demonstrated that the performance of both approaches improves slowly with an increasing number of derivative features and that the growth in classification accuracy diminishes beyond eight derivative features. As the application of Gabor features in this work further increases the complexity of the workflow, we followed a trade-off between classification accuracy and system complexity by employing only the first-order derivative features.

Gabor features
The Gabor filter, which is a sinusoidal function modulated by a Gaussian envelope, can be viewed as an orientation-dependent bandpass filter and can effectively capture the orientation and scale of physical structures for HSI spatial feature extraction. However, HSI contains a wealth of spectral information over a wide range of the spectrum and the wavelength interval is relatively small, which adds to the processing complexity and statistical ill-conditioning in classification tasks. To improve the efficiency, we considered dimensionality reduction before the Gabor feature extraction. Since Gabor features extracted in PCA-projected feature spaces have been proven to deliver very good results (Huo & Tang, 2011; Zhang et al., 2012; Li & Du, 2015), we employed this strategy in our feature-extraction pipeline.
In a two-dimensional $(a, b)$ coordinate system, the Gabor filter, including both a real and an imaginary component, can be represented as

$$g(a, b; \delta, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{a'^{2} + \gamma^{2} b'^{2}}{2\sigma^{2}}\right) \exp\left(i\left(2\pi \frac{a'}{\delta} + \psi\right)\right),$$

with $a' = a\cos\theta + b\sin\theta$ and $b' = -a\sin\theta + b\cos\theta$, where $\delta$ represents the wavelength of the sinusoidal factor, $\psi$ is the phase offset and $\gamma$ is the spatial aspect ratio specifying the ellipticity of the support of the Gabor function (the default value was set to 0.5 in Daugman (1985) and Clausi and Jernigan (2000)). $\psi = 0$ and $\psi = \pi/2$ return the real and imaginary parts of the Gabor filter, respectively (Chen, Li, Su, & Liu, 2014). In Equations (18) and (19), $\theta$ represents the orientation separation angle between the Gabor kernels. The parameter $\sigma$ is the standard deviation of the Gaussian envelope and is determined by $\delta$ and the spatial-frequency bandwidth, $bw$, as

$$\sigma = \frac{\delta}{\pi}\sqrt{\frac{\ln 2}{2}} \cdot \frac{2^{bw} + 1}{2^{bw} - 1}.$$

In our proposed fusion-classification system, all eight orientations ($0, \pi/8, \pi/4, 3\pi/8, \pi/2, 5\pi/8, 3\pi/4, 7\pi/8$) for each band pass are initially considered and the system parameters ($\delta$ and $bw$) are tuned for optimal classification results. For example, Figure 3 shows the overall classification accuracy (OA) of the proposed algorithms for different $\delta$ and $bw$ on the Indian Pines data set. It can be observed that the classification performance of the proposed algorithms is affected by the parameter $\delta$ but insensitive to the parameter $bw$.
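The real part of the Gabor kernel can be generated with a few lines of numpy. This is a sketch under the standard wavelength-bandwidth relation for $\sigma$ quoted above (the function name, the default support of three standard deviations and the parameter defaults are our choices):

```python
import numpy as np

def gabor_kernel(delta, theta, psi=0.0, gamma=0.5, bw=1.0, size=None):
    """Real part of a 2D Gabor kernel: a sinusoid of wavelength delta at
    orientation theta, modulated by a Gaussian envelope with aspect ratio
    gamma; psi = 0 gives the real part, psi = pi/2 the imaginary part."""
    # sigma from the spatial-frequency bandwidth bw (standard relation)
    sigma = delta / np.pi * np.sqrt(np.log(2) / 2) * (2 ** bw + 1) / (2 ** bw - 1)
    if size is None:
        size = int(np.ceil(3 * sigma)) * 2 + 1   # odd support covering 3 sigma
    half = size // 2
    b, a = np.mgrid[-half:half + 1, -half:half + 1]
    a_r = a * np.cos(theta) + b * np.sin(theta)      # rotated coordinates
    b_r = -a * np.sin(theta) + b * np.cos(theta)
    env = np.exp(-(a_r ** 2 + (gamma * b_r) ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * a_r / delta + psi)
```

Convolving each PCA-projected band with kernels at the eight orientations listed above yields the Gabor feature stack.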
GMM

A GMM can be viewed as a combination of two or more Gaussian distributions. In a typical GMM representation, the probability density function for $X = \{x_i\}_{i=1}^{n} \in \mathbb{R}^{d \times n}$ is estimated as the sum of $K$ Gaussian components (the default value of $K$ is 5):

$$p(x) = \sum_{k=1}^{K} \alpha_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),$$

where $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ represents the $k$th Gaussian component of the mixture. The variables $\alpha_k$, $\mu_k$ and $\Sigma_k$ are the mixing weights, mean vector and covariance matrix, respectively, which are collected in the parameter vector $\Theta_k = (\alpha_k, \mu_k, \Sigma_k)$. Once the optimal number of components has been determined, the parameters of the mixture model can be estimated using an iterative optimization strategy, expectation maximization (Dundar & Landgrebe, 2002). In recent years, GMM classifiers have been shown to be well suited to remotely sensed data. An LFDA-GMM strategy has been proposed which achieves higher performance for hyperspectral image classification than the traditional LDA-GMM. We also investigated the LPNMF combined with GMM (named LPNMF-GMM) and observed good classification results under the proposed framework. In this work, we focus on the multimodal distribution of classes in HSI and on preserving the within-class local structure; GMM classifiers are hence a natural fit for this topic.
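A minimal GMM classifier in the spirit described above can be sketched as follows. This is our own illustration with simplifying assumptions (diagonal covariances, a fixed number of EM iterations, equal class priors), not the authors' implementation:

```python
import numpy as np

def fit_gmm(X, K=2, n_iter=50, seed=0):
    """Diagonal-covariance GMM fitted by EM. X: n x d samples.
    Returns (mixing weights, means, variances)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, K, replace=False)]          # init means at data points
    var = np.tile(X.var(axis=0) + 1e-6, (K, 1))
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities from component log-densities
        logp = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = r.sum(axis=0) + 1e-12
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ X ** 2) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var

def gmm_loglik(X, params):
    """Log-likelihood of each sample under a fitted mixture (log-sum-exp)."""
    w, mu, var = params
    logp = (-0.5 * (((X[:, None, :] - mu) ** 2) / var
                    + np.log(2 * np.pi * var)).sum(-1) + np.log(w))
    m = logp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logp - m).sum(axis=1, keepdims=True))).ravel()

def gmm_classify(Xtr, ytr, Xte, K=2):
    """Fit one GMM per class; assign each test sample to the class with
    the highest mixture log-likelihood (equal priors assumed)."""
    classes = np.unique(ytr)
    scores = np.stack([gmm_loglik(Xte, fit_gmm(Xtr[ytr == c], K))
                       for c in classes], axis=1)
    return classes[scores.argmax(axis=1)]
```

With multiple components per class, such a classifier can track the multimodal within-class structure that motivates the locality-preserving projections.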

Decision fusion
In this article, the LPNMF-GMM and LFDA-GMM are employed as the basic classification strategies for the multiple features, such as the derivative features and Gabor features. We propose a fusion-classification system based on the LOGP decision-fusion rule to combine the distinct decisions from different feature spaces into a final decision. The LOGP, which is a popular soft-decision fusion rule (Prasad & Bruce, 2008a), entails the use of posterior probabilities or some class-membership function from every classifier for making the final decision. In the LOGP, the distinct decisions from different classifiers are assumed to be independent, and the output is a weighted product of the different probability distributions.
The global class-membership function of the LOGP can be defined as

$$P(\omega_j \mid x) \propto \prod_{i=1}^{n} p_i(\omega_j \mid x)^{\alpha_i},$$

where $\alpha_i$ is the confidence score of the $i$th classifier, $\omega$ is the class label, $i$ is the classifier index, $n$ is here the number of classifiers, and the class $j$ with the largest membership value is detected in the bank of classifiers.
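The weighted product above is conveniently evaluated in the log domain, where it becomes a weighted sum. A minimal sketch (our own illustration; the function name and the uniform default weights are assumptions):

```python
import numpy as np

def logp_fusion(posteriors, alphas=None):
    """Logarithmic-opinion-pool fusion.
    posteriors: list of (n x c) class-posterior arrays, one per classifier.
    The fused posterior is proportional to prod_i p_i(w_j|x)**alpha_i,
    computed in the log domain for numerical stability."""
    P = np.stack(posteriors)                      # (classifiers, n, c)
    if alphas is None:
        alphas = np.full(len(posteriors), 1.0 / len(posteriors))
    logF = np.tensordot(alphas, np.log(P + 1e-12), axes=1)   # (n, c)
    F = np.exp(logF - logF.max(axis=1, keepdims=True))
    return F / F.sum(axis=1, keepdims=True)       # renormalized fused posterior
```

The final label for each pixel is then the argmax of the fused posterior over classes.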

Data description and experimental set-up
All experiments of the study were conducted using MATLAB. We applied two hyperspectral data sets. The first data set, Indian Pines, was collected by the National Aeronautics and Space Administration's Airborne Visible/Infrared Imaging Spectrometer sensor. The data set represents a vegetation-classification scenario with 145 × 145 pixels, a pixel size of 20 m and 220 spectral bands. The original Indian Pines data set consists of 16 land-cover classes, some of which contain only a small number of samples. Since a probability distribution obtained from such a small number of training samples cannot represent their statistical characteristics well, we selected a subset of eight classes (see Table 1). For this data set, an average of 50 training samples per class and 8624 testing samples (a testing-to-training ratio of approximately 21.6:1) are used in the experiments. The second data set is located in an urban environment and was provided by the University of Pavia. The imagery was collected by the Reflective Optics System Imaging Spectrometer sensor. The data set has 103 spectral bands with 610 × 340 pixels and a pixel size of 1.3 m. The data set has nine classes, with 50 training samples per class and 24,776 testing samples (a testing-to-training ratio of approximately 55:1). For this data set, the number of training and testing samples for each class is summarized in Table 2.

Parameter selection
Existing classification algorithms, such as the SVM-based and dimensionality-reduction-based ones, require the system parameters to be optimized for any HSI classification task. For the SVM-based methods, the kernel parameter, σ, is an important factor that affects the generalization ability in the kernel-induced space. For the LFDA-based and LPNMF-based methods, we need to find an optimal parameter, r, which is the reduced dimensionality of the LFDA and LPNMF projections. We varied the parameter σ from 0.1 to 1.0 and r over a wide range from 5 to 40. The optimal parameters were selected by maximizing the training accuracy.
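The selection procedure is an exhaustive search over the parameter grid. A small sketch, assuming a user-supplied training-accuracy callable (the function name and interface are hypothetical):

```python
import itertools

def select_parameters(train_accuracy, sigmas, rs):
    """Exhaustive grid search: evaluate every (sigma, r) pair with a
    user-supplied training-accuracy function and return the pair that
    maximizes training accuracy, as described in the text."""
    return max(itertools.product(sigmas, rs),
               key=lambda pair: train_accuracy(*pair))
```

For the ranges quoted above, `sigmas` would span 0.1 to 1.0 and `rs` would be `range(5, 41)`.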
The two proposed fusion-classification algorithms are named DG-LFDA and DG-LPNMF. Both employ Gabor filters to extract the spatial information of HSI. For the Gabor-feature-based algorithms, all eight orientations for each band pass are initially considered and the system parameters are tuned for optimal classification results. First, we present the parameters of the Gabor filter with different δ and bw for the Indian Pines data set (see Figure 3). For this data set, the available training and testing samples are both 50 per class. Figure 3(a,b) illustrates the OA of the proposed DG-LFDA and DG-LPNMF versus varying δ and bw. We observe that the classification performance of the two proposed fusion-classification algorithms is affected by the parameter δ but is insensitive to the parameter bw. We selected δ = 18 and bw = 4 as the optimal parameters in the following experiments. Since the Gabor features are extracted in PCA-projected subspaces, the number of principal components (PCs) should be viewed as a parameter of the proposed system. For the LFDA and LPNMF, the r for dimensionality reduction is expected to vary with the experimental data set. Figure 4(a,b) shows the OA of the proposed DG-LFDA and DG-LPNMF versus varying PC and r. For the DG-LFDA, the optimal values of PC and r are 20 and 7, respectively. For the DG-LPNMF, the optimal values of PC and r are 16 and 33, respectively. Similar experiments using the Pavia University data set were performed to find the appropriate parameters. Accordingly, we selected δ = 20 and bw = 5 as the optimal Gabor-filter parameters for the Pavia University data set. For the DG-LFDA, the optimal values of PC and r are 16 and 7, respectively. For the DG-LPNMF, the optimal values of PC and r are 20 and 29, respectively.

Classification results
To quantify the efficacy of the proposed system, we compare the proposed fusion-classification algorithms (DG-LFDA and DG-LPNMF) with SVM (Camps-Valls & Bruzzone, 2005), SVM-MRF (Tarabalka et al., 2010), IFRF (Kang et al., 2014) and several corresponding methods (LDA-GMM, LPP-GMM, LFDA-GMM, LPNMF-GMM, D-LFDA, D-LPNMF, LFDA-G-MRF and LPNMF-G-MRF). Table 3 shows the classification results for the two data sets. Two mathematical criteria are derived from the confusion matrix: the OA and the Kappa (κ) coefficient. To avoid any bias, the experiments are repeated 20 times, and the average classification accuracy and standard deviation are reported in Table 3.
We investigated the performance of three SVM-classifier methods: the traditional SVM, SVM-MRF and IFRF. Since IFRF is also designed to fuse multiple features and was published recently, we added it to the experiments as a comparison. The code of IFRF is available on the web. 1 From Table 3, it can be seen that, although IFRF performs much better than the traditional SVM, the OA and κ of IFRF are unsatisfactory compared with SVM-MRF for the Indian Pines data set. The proposed DG-LFDA and DG-LPNMF consistently outperform SVM and SVM-MRF for both the Indian Pines data set and the Pavia University data set. Compared with IFRF, DG-LPNMF shows comparable performance and DG-LFDA has a slight advantage for the Pavia University data set, while the improvements from both DG-LPNMF and DG-LFDA are substantial for the Indian Pines data set. From the results of the GMM-classifier methods, it can be inferred that LFDA-GMM and LPNMF-GMM perform better than LDA-GMM and LPP-GMM, while D-LFDA and D-LPNMF are superior to LFDA-GMM and LPNMF-GMM, respectively. As spatial-spectral classification algorithms, LFDA-G-MRF and LPNMF-G-MRF achieve much higher OA and κ than the aforementioned spectral classification algorithms, and the performance of LPNMF-G-MRF is excellent for both the Indian Pines data set and the Pavia University data set. Moreover, we investigated the performance of two Gabor-feature-based algorithms (named Gabor-LFDA and Gabor-LPNMF). From the results in Table 3, these two Gabor-feature-based algorithms achieve improved OA and κ compared to the derivative-feature-based algorithms (D-LFDA and D-LPNMF). In particular, for the Indian Pines data set, Gabor-LFDA shows over 19% higher OA than D-LFDA, and Gabor-LPNMF yields about 17% higher OA than D-LPNMF. However, the performance of the proposed DG-LFDA and DG-LPNMF exceeds that of both the traditional and the state-of-the-art algorithms based on locality-preserving dimensionality reduction.
For the Indian Pines data set, the proposed DG-LFDA has an OA that is 25% and 11% higher than those of D-LFDA and LFDA-G-MRF, respectively, and the proposed DG-LPNMF has an OA 20% and 7% higher than those of D-LPNMF and LPNMF-G-MRF. The performance of the proposed DG-LPNMF and DG-LFDA is also better than that of all the aforementioned GMM-classifier methods for the Pavia University data set, as described in Table 3. Figure 5 shows the classification maps resulting from all approaches for the Indian Pines data set with the same labelled samples as listed in Table 1. It can be observed that the four spectral-spatial algorithms (LFDA-G-MRF, LPNMF-G-MRF, DG-LFDA and DG-LPNMF) are superior to the spectral algorithms, tending to produce smoother classification maps with fewer erroneous outliers. MRF-based methods, such as SVM-MRF, LFDA-G-MRF and LPNMF-G-MRF, can smooth the classification maps by capturing spatial-contextual information, compared with SVM, LFDA-GMM and LPNMF-GMM, respectively. However, upon closer inspection of LFDA-G-MRF and LPNMF-G-MRF, the mislabelling of large contiguous regions can be observed for the classes Corn-no till and Corn-min till. Additionally, we observe that D-LFDA and D-LPNMF do not clearly improve the classification maps compared with LFDA-GMM and LPNMF-GMM, while the proposed DG-LFDA and DG-LPNMF clean and smooth the classification maps significantly, owing to the ability of the Gabor features to capture additional spatial information. From this discussion, we may conclude that the proposed system is superior to current MRF-based methods, and that the Gabor features contribute more than the derivative features to the improved classification maps.
Figure 6 illustrates that the classification maps for the Pavia University data set are consistent with these results. Since labelling training samples for hyperspectral image classification is a complicated task, the number of training samples is often insufficient to reliably estimate the classifier models for each class. To match these challenging conditions, we investigate the influence of different training sample sizes. Hence, we report the performance of the traditional methods (LFDA-GMM and LPNMF-GMM), the state-of-the-art methods (IFRF, LFDA-G-MRF and LPNMF-G-MRF) and the proposed methods (DG-LFDA and DG-LPNMF) as measured by the OA, along with 95% confidence intervals. For both the Indian Pines and Pavia University data sets, the OA of these algorithms is studied over an available range of training samples (from 50 to 100 per class). Figure 7 demonstrates that the proposed DG-LFDA and DG-LPNMF achieve much higher OA than all of the corresponding algorithms for the Indian Pines data set. For the Pavia University data set, as shown in Figure 8, DG-LFDA performs consistently better than the other algorithms, while DG-LPNMF performs close to IFRF. When 100 samples per class are used for training, DG-LPNMF approaches a 98% OA for the Indian Pines data set, and DG-LFDA approaches an OA higher than 97% for the Pavia University data set.