Gaussian mixture model and Markov random fields for hyperspectral image classification

ABSTRACT This paper presents a novel method for reliable and efficient spatial-spectral classification of hyperspectral data. The algorithm performs Bayesian labelling by combining the results of a Gaussian mixture model (GMM) classifier with spatial-contextual information extracted by Markov random fields (MRFs). Moreover, a new fuzzy segmentation-based function was defined and incorporated into the spatial energy term to improve the performance of the MRF. To evaluate the proposed algorithm in realistic analysis scenarios, three benchmark hyperspectral datasets, i.e. Indian Pines, Pavia University and Salinas, were used. Experimental results demonstrate that the proposed method considerably improves overall classification accuracy when compared to conventional MRF-based approaches.


Introduction
Over the past two decades, various hyperspectral image (HSI) classification methods have been developed for land cover mapping applications (Plaza et al., 2009). Most of these methods process each pixel independently, without considering the correlations between spatially neighbouring pixels (Camps-Valls & Bruzzone, 2005). For HSIs with fine spatial resolution, it is very likely that two adjacent pixels belong to the same class of objects.
Gaussian mixture model (GMM) classifiers, as a popular classification tool, have gained significant attention in the remote sensing community thanks to their capability of capturing non-Gaussian statistics of multivariate data and their remarkable accuracy in classifying high-dimensional data (Benediktsson & Ghamisi, 2015). However, GMMs do not consider spatial correlations and classify image data based only on their spectral information. Therefore, they do not exploit the relevant information that can be extracted from the spatial domain.
MRFs are a family of probabilistic models for integrating spatial-spectral information into image classification problems. They have recently been applied to exploit the spatial information, in which neighbouring pixels tend to belong to the same class. The underlying assumption in the MRF is that, in HSIs, two adjacent pixels are most likely to have the same class label. In the MRF framework, the maximum a posteriori (MAP) decision rule is usually formulated as the minimization of an appropriate energy function (Solberg, Taxt, & Jain, 1996). MRF-based classifiers are well-known and frequently used approaches in various remote sensing applications. In particular, Li, Prasad, and Fowler (2014) investigated the integration of a GMM method within an MRF framework for accurate HSI classification. In this context, they considered a GMM classifier operating in a feature subspace induced by local Fisher's discriminant analysis (LFDA) or locality-preserving nonnegative matrix factorization (LPNMF). The GMM-based pixel-wise classification results were subsequently refined by an MRF-based technique that integrates the spatial information. Tarabalka, Fauvel, Chanussot, and Benediktsson (2010) developed a Bayesian-like processing scheme in which pixel-wise results derived from SVM-based discriminant functions were combined with MRF models of the contextual information. In another study, it was shown that the MRF model has the advantage of working with various probability distributions (Li, Bioucas-Dias, et al., 2012). In that work, the posterior class densities produced by a subspace multinomial logistic regression (MLR) classifier were integrated with the corresponding spatial contextual information. This information is modelled by an MRF-based multilevel logistic prior, which leads to a combinatorial optimization problem; the resulting MAP estimation problem is solved by a graph cut algorithm (Li, Bioucas-Dias, et al., 2012).
Recently, a novel and rigorous supervised method has been proposed for the classification of multi-resolution images (i.e. a high spatial resolution panchromatic channel and several coarse spatial resolution multispectral channels) using an MRF (Moser, De Giorgi, & Serpico, 2016). The method iteratively exploits a linear mixture model to characterize the relationships between data at different resolutions and uses a graph cut approach to optimize the Markovian energy for the contextual classification.
Another group of spatial-spectral classification approaches is based on segmentation algorithms (Ghamisi, Benediktsson, & Ulfarsson, 2014; Tarabalka et al., 2010). In these algorithms, an HSI is first segmented into homogeneous areas based on its intensity or texture properties. Then, the pixels within the same area can be considered a spatial neighbourhood. Afterwards, majority voting is used to integrate the pixel-wise classification results with the segmentation map obtained by an image segmentation algorithm. For instance, an automatic framework for the classification of HSIs was employed in Ghamisi, Benediktsson, and Ulfarsson (2014); the method was based on combining Hidden MRF segmentation with the Support Vector Machine (SVM) classifier. In another study (Tarabalka, Benediktsson, & Chanussot, 2009), the authors proposed a new spectral-spatial classification scheme for HSIs that combines the results of a pixel-wise SVM classification and a segmentation map obtained by partitional clustering using majority voting. In general, various spatial-spectral classifiers can be combined to obtain high accuracies.
Hyperspectral data are highly correlated in adjacent spectral bands, since the feature vectors are discretizations of continuous spectral curves. Moreover, the impractically large size of the resulting parameter space has hindered widespread adoption of the GMM for HSI (Li et al., 2014). For example, if the HSI data are d-dimensional, then the dimensionality of the parameter space for a C-component GMM, assuming full covariance matrices, is C(1 + d + d(d+1)/2). To deal with this problem, dimensionality reduction techniques have been suggested. It has been shown that first projecting the data into a randomly chosen subspace of low dimension might facilitate the learning of high-dimensional mixtures of Gaussians (Dasgupta, 2000). As a result, the random projection (RP) dimension reduction method has been used to represent the data in a lower-dimensional space. Dasgupta (2000) presented a comprehensive series of experiments intended to illustrate the benefits of this technique precisely. Moreover, LFDA, which has been applied to hyperspectral data (Sugiyama, 2007), is an extension of FDA in which the class distributions are no longer restricted to be unimodal Gaussians. Li, Prasad, Fowler, and Bruce (2012) demonstrated that appropriate LFDA preprocessing ensures that the GMM models learned in the HSI feature space do not have an unreasonably large number of free parameters to be estimated from the training data. LFDA separates samples of different classes in the projection while preserving the intra-class local (possibly multimodal) structure at the same time.
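To make the scale of this parameter space concrete, the count can be evaluated directly. The sketch below counts C − 1 free mixing weights, C·d means and C·d(d+1)/2 covariance entries per the GMM description later in the paper; the band and class counts are only illustrative examples:

```python
def gmm_param_count(C, d):
    """Free parameters of a C-component GMM with full covariances:
    (C - 1) mixing weights, C*d means, C*d*(d+1)/2 covariance entries."""
    return (C - 1) + C * d + C * d * (d + 1) // 2

# A 200-band HSI with 16 classes needs hundreds of thousands of parameters,
# while a 10-dimensional reduced space needs only about a thousand.
print(gmm_param_count(16, 200))  # 324815
print(gmm_param_count(16, 10))   # 1055
```

This is why a reduction from roughly 200 bands to a 10-dimensional LFDA- or RP-induced subspace makes GMM estimation tractable with limited training samples.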
This paper presents a novel GMM- and MRF-based (GMMMRF) method for spectral-spatial classification of HSIs. In the first step, a GMM classifier, operating in a low-dimensional feature space induced by either LFDA or RP, is applied to learn posterior probabilities from the spectral signatures. Then, spatial contextual information, obtained by MRF regularization, is used to refine the classification results of the first step. An essential difference of the proposed method from previous ones consists in applying a segmentation algorithm and introducing a new "fuzzy segmentation-based" function into the spatial energy function involved in the MRF, aiming at improving classification efficiency while performing the spatial regularization. Figure 1 depicts an overview of the proposed classification method. The initial posterior probability that a pixel belongs to a specified class is estimated by applying the GMM classifier to the dimension-reduced data. Taking spatial contextual information into account, the final classification map is obtained using the improved MRF regularization, which benefits from the "fuzzy segmentation-based" function obtained by integrating the mean shift segmentation and classification maps. In the following, the specific parts of the proposed framework are discussed in detail.

GMMMRF classification scheme
As the input, an HSI, I, made up of B two-dimensional arrays is given. This image is actually a set of n pixel vectors X = {x_i ∈ R^B, i = 1, 2, …, n}. Consider L = {l_1, l_2, …, l_n}, with l_i ∈ {1, 2, …, C}, as the set of class labels of the scene, where C is the number of classes. Classification consists in assigning each pixel to one of the C classes of interest. According to the Bayes rule, the posterior probability of the class labels L given the data X is

P(L|X) = P(X|L) P(L) / P(X),   (1)

where P(X) is a constant that does not affect the final result, P(L) can be viewed as the spatial prior and P(X|L) is the likelihood function, which is derived from the data distribution. The L that maximizes P(L|X) in Equation (1) is referred to as the MAP estimate of L.
Taking the negative logarithm of Equation (1) and dropping the constant term turns the MAP estimation into an energy minimization:

L̂ = argmin_L { −ln P(X|L) − ln P(L) }.   (2)

The first and the second terms of this equation are known as the spectral energy and the spatial energy, respectively. Equation (2) is a combinatorial optimization problem involving unary and pairwise terms, which is difficult to compute. Several algorithms, such as iterated conditional modes (Besag, 1986), graph cuts (Boykov & Kolmogorov, 2004), tree-reweighted message passing (Kolmogorov, 2006) and α-expansion graph-cut-based methods, have been proposed in the literature to solve this general optimization problem. In this paper, we used the Metropolis algorithm based on stochastic Simulated Annealing (SA) for computing the MAP estimate of the classification map (Geman & Geman, 1987). With a sufficiently slow annealing schedule, the Metropolis algorithm is guaranteed to converge to a global minimum and can largely overcome the local minimum problem (Li, 2009).

Local Fisher discriminant analysis and random projection
LFDA provides supervised dimensionality reduction designed to handle non-Gaussian distributions (Sugiyama, 2007). LFDA combines the ideas of FDA and locality preserving projection (LPP). FDA is a traditional linear approach for supervised dimensionality reduction, but it tends to give unexpected results if the samples of some classes form several separate clusters (i.e. are multimodal). LPP is an unsupervised dimensionality reduction method that works well with multimodal data due to its locality-preserving characteristic, but it does not take the label information into account. Furthermore, LPP can make samples of different classes overlap if they are close in the original high-dimensional space. To overcome these problems, LFDA defines local within-class (S̃_w) and between-class (S̃_b) scatter matrices as

S̃_w = (1/2) Σ_{i,j} W^w_{ij} (x_i − x_j)(x_i − x_j)^T,

S̃_b = (1/2) Σ_{i,j} W^b_{ij} (x_i − x_j)(x_i − x_j)^T,

where W^w_{ij} = A_{ij}/n_c if x_i and x_j both belong to class c (0 otherwise), W^b_{ij} = A_{ij}(1/n − 1/n_c) for same-class pairs (1/n otherwise), n_c is the number of available training samples of the c-th class, Σ_{c=1}^{C} n_c = n, and A is the affinity matrix. Using S̃_w and S̃_b, the LFDA transformation matrix T_LFDA is defined as

T_LFDA = argmax_T tr[(T^T S̃_w T)^{-1} T^T S̃_b T],

i.e. the generalized eigenvectors of S̃_b associated with the largest generalized eigenvalues with respect to S̃_w. LFDA has been shown to yield better performance than FDA and LPP for dimensionality reduction of non-Gaussian or multimodal data. RP, which has also been applied to hyperspectral data, is a general data reduction method (Dasgupta, 2000). In this method, the original high-dimensional data are projected into a low-dimensional subspace without using the eigendecomposition of matrices; as a result, it runs extremely fast and is stable. The principal idea of RP comes from the Johnson-Lindenstrauss lemma (Frankl & Maehara, 1988): if points in a vector space are projected onto a randomly selected subspace, the distances between these points are approximately preserved (Achlioptas, 2001). Here, the B-dimensional data are projected onto a d-dimensional (d ≪ B) subspace using a random matrix R_{d×B} whose columns have unit length.
The projected data are obtained as

X_RP = R_{d×B} X_{B×n},   (8)

where X_{B×n} in Equation (8) is the original set of B-dimensional observations. The choice of the random matrix R is one of the key points of interest. The elements r_ij of R are often Gaussian distributed, but this need not be the case. Here, we used the sparse matrix introduced by Achlioptas (2001), whose elements are distributed according to Equation (9):

r_ij = √3 × (+1 with probability 1/6; 0 with probability 2/3; −1 with probability 1/6).   (9)
Bingham and Mannila (2001) showed that Achlioptas' theoretical results have practical significance. Moreover, RP is a data-independent and computationally efficient method that approximately preserves pairwise distances between the points of a set without introducing significant distortion. Hence, it is widely used in dimension reduction applications that need to maintain the local structure of the data and require immediate results.
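The RP step with the Achlioptas sparse matrix can be sketched as follows. The `achlioptas_projection` helper and the 1/√d output scaling are illustrative choices for distance preservation, not the paper's exact implementation:

```python
import numpy as np

def achlioptas_projection(X, d, seed=0):
    """Sparse random projection (Achlioptas, 2001): the entries of R are
    sqrt(3) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6}.  X is a (B, n)
    matrix of n spectral vectors; the 1/sqrt(d) scaling keeps squared
    norms and pairwise distances approximately unchanged on average."""
    B, n = X.shape
    rng = np.random.default_rng(seed)
    R = np.sqrt(3.0) * rng.choice([-1.0, 0.0, 1.0], size=(d, B), p=[1/6, 2/3, 1/6])
    return (R @ X) / np.sqrt(d)

# Project 20 hypothetical 100-band spectra down to 50 dimensions.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 20))
Xp = achlioptas_projection(X, 50)
```

Because two-thirds of the entries of R are zero, the projection needs only integer-like multiplications on a third of the data, which is the source of RP's speed advantage over eigendecomposition-based reductions.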
Gaussian mixture model classification

GMM has been successfully used for HSI classification (Bingham & Mannila, 2001). It has also proved beneficial for a variety of classification tasks, such as speech and speaker recognition, clustering, etc. (Hassanpour, Shahbahrami, & Wong, 2008). The multivariate finite mixture model is

P(x) = Σ_{j=1}^{C} α_j Φ(x; μ_j, Σ_j),   (10)

where α_j is the non-negative mixing coefficient of the j-th component, with the sum-to-one constraint α_1 + … + α_C = 1, and Φ(x; μ_j, Σ_j) denotes the normal probability density function with d-dimensional mean vector μ_j and covariance matrix Σ_j. Note that in order to calculate the density estimate at a point x_i, we only need to retain C − 1 mixing coefficients, C × d values for the means and Cd(d+1)/2 values for the components' covariance matrices. For estimating the parameters of the GMM, the Expectation-Maximization (EM) algorithm is used.

Expectation-Maximization algorithm
EM is a procedure for finding a maximum-likelihood estimate when there are missing data (Figueiredo & Jain, 2002; Hasan & Gan, 2009). To use the EM algorithm, the number of terms C in the mixture must be known. This is usually obtained from a priori knowledge of the application, by looking for groups of data using clustering methods, or by other methods of estimating the number of terms, such as the adaptive mixtures approach (Priebe, 1994). In this paper, the number of components of the GMM is already known. Moreover, an initial guess of the means, covariances and mixing coefficients is needed, and the quality of the GMM model is sensitive to this initialization of the parameter space. For this purpose, training samples randomly selected from the overall ground truth at different rates (1%, 5%, 10%, 15% and 20%) are used. After obtaining the initial estimate, the parameters are updated using the equations defined below. The posterior probabilities are given by

τ_{ij} = α_j Φ(x_i; μ_j, Σ_j) / P̂(x_i),   (12)

where P̂(x_i) is the finite mixture estimate at the point x_i. Using the estimated posterior probabilities, the parameters of each component can be updated through Equations (13)-(15):

α_j = (1/n) Σ_{i=1}^{n} τ_{ij},   (13)

μ_j = Σ_{i=1}^{n} τ_{ij} x_i / Σ_{i=1}^{n} τ_{ij},   (14)

Σ_j = Σ_{i=1}^{n} τ_{ij} (x_i − μ_j)(x_i − μ_j)^T / Σ_{i=1}^{n} τ_{ij}.   (15)
The procedure of the EM algorithm for a finite mixture is as follows: (1) determine the number of component densities C; (2) initialize the mixing coefficients, means and covariance matrices; (3) E-step: compute the posterior probabilities of component membership for each sample; (4) M-step: update the mixture parameters using these posteriors; (5) repeat steps (3) and (4) until convergence.
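The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: in particular, the farthest-point initialization of the means is our own simplification, whereas the paper initializes from randomly selected training samples.

```python
import numpy as np

def gauss_pdf(X, m, S):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    Xc = X - m
    Si = np.linalg.inv(S)
    q = np.einsum('ij,jk,ik->i', Xc, Si, Xc)
    return np.exp(-0.5 * q) / np.sqrt((2 * np.pi) ** d * np.linalg.det(S))

def em_gmm(X, C, iters=100, seed=0):
    """Minimal EM sketch for a C-component GMM with full covariances.
    X is an (n, d) array of samples; returns (weights, means, covariances)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Step (2): farthest-point initialization of the means (hedged choice).
    idx = [int(rng.integers(n))]
    for _ in range(1, C):
        dist = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(np.argmax(dist)))
    mu = X[idx].astype(float)
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * C)
    alpha = np.full(C, 1.0 / C)
    for _ in range(iters):
        # Step (3), E-step: responsibilities proportional to alpha_j * Phi.
        tau = np.stack([alpha[j] * gauss_pdf(X, mu[j], cov[j])
                        for j in range(C)], axis=1)
        tau /= tau.sum(axis=1, keepdims=True)
        # Step (4), M-step: re-estimate weights, means and covariances.
        Nj = tau.sum(axis=0)
        alpha = Nj / n
        mu = (tau.T @ X) / Nj[:, None]
        for j in range(C):
            Xc = X - mu[j]
            cov[j] = (tau[:, j, None] * Xc).T @ Xc / Nj[j] + 1e-6 * np.eye(d)
    return alpha, mu, cov

# Two well-separated 2-D clusters: EM should recover means near 0 and 6.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (200, 2)), rng.normal(6, 0.5, (200, 2))])
alpha, mu, cov = em_gmm(X, 2)
```

A fixed iteration cap is used here for brevity; a practical implementation would also monitor the log-likelihood for convergence.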

Mean shift segmentation
Image segmentation is of crucial importance to early vision tasks, in which pixels with similar features are assembled into homogeneous regions. In recent years, researchers have developed many useful image segmentation algorithms (Zuva, Olugbara, Ojo, & Ngwira, 2011). In this paper, the mean shift algorithm is used as a low-level segmentation technique. The mean shift method segments an image into many visually homogeneous areas; based on these areas, more features can be involved, which facilitates the subsequent image interpretation. Mean shift, a simple iterative method that shifts each pixel to the average of the pixels in its neighbourhood, was introduced by Fukunaga and Hostetler (1975) and generalized by Cheng (1995). Since then, this segmentation method has become a widely used approach in the vision community. It is one of many techniques under the heading of "feature space analysis." The mean shift algorithm consists of two main steps: a mean shift filtering of the original image data in the feature space and a subsequent clustering of the filtered data points. The mean shift algorithm is popular because of its simplicity and flexibility. As with all clustering problems, there is no single correct clustering; instead, correctness is usually defined by what seems reasonable given the problem domain and application. Mean shift provides one convenient tuning parameter (the kernel bandwidth) that can easily be adjusted for different applications.
There are many different types of kernels, but the most popular is the Gaussian kernel; as has been repeatedly suggested in the literature, we used the Gaussian kernel in our method. We ran the experiments with different bandwidth values, and the results of the proposed method proved to be relatively insensitive to the bandwidth parameter; the optimal bandwidth lies in the range [4, 8]. A detailed description of this approach can be found in Comaniciu and Meer (2002).
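The filtering step of mean shift can be illustrated on a one-dimensional feature set. This is a toy sketch of the Gaussian-kernel iteration only; a real implementation such as Comaniciu and Meer's operates jointly in the spatial and range domains of the image:

```python
import math

def _shift(m, values, bandwidth):
    """One mean shift update: move m to the Gaussian-weighted mean of values."""
    w = [math.exp(-(m - v) ** 2 / (2 * bandwidth ** 2)) for v in values]
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)

def mean_shift_modes(values, bandwidth, iters=50):
    """Iteratively shift each point toward the kernel-weighted average of
    all points, so points in the same basin converge to a common mode."""
    modes = list(values)
    for _ in range(iters):
        modes = [_shift(m, values, bandwidth) for m in modes]
    return modes

# Two intensity clusters around 1 and 5 collapse onto two modes.
vals = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
modes = mean_shift_modes(vals, bandwidth=0.5)
```

Segment labels then follow by merging points whose converged modes lie closer than the bandwidth, which is how the homogeneous objects used later in the MRF regularization would be formed.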

MRF-based regularization
In the field of machine vision, image analysis is usually defined as a modelling problem. The MRF provides a consistent and convenient modelling framework for entities with contextual or intrinsically linked features. To improve the classification performance achieved by using only spectral information, we integrated the spatial contextual information with the spectral information by using an MRF-based regularization technique. In the MRF, a clique is defined either as a single pixel or as a set of pixels that are mutual neighbours. Here, an eight-neighbourhood clique is assumed (let N_i be the set of neighbours of a given pixel x_i). The most common form of inference over the posterior MRF, in vision and image processing problems, is MAP estimation, which is computed through Equation (2). In this paper, we used an MRF model based on Tarabalka et al. (2010); thus, the MRF computes the MAP estimate via a Metropolis algorithm based on stochastic relaxation and annealing, given the GMM classification map. The proposed method follows the Bayesian approach and aims at minimizing the global energy in an image by the iterative minimization of local spatial and spectral energies. These energies are associated with randomly chosen image sites, i.e. pixels. The local energy of a randomly chosen site associated with a pixel x_i is computed using Equation (16).
E(x_i) = E_spectral(x_i) + E_spatial(x_i),   (16)

where E_spectral(x_i) is the spectral energy and E_spatial(x_i) is the spatial energy computed over the local neighbourhood N_i. The spectral energy is defined as

E_spectral(x_i) = −ln P(x_i | l_i),   (17)

where P(x_i | l_i) is the membership value of the pixel x_i in each of the class fraction outputs. First, we consider the standard spatial energy, which is computed as

E_spatial(x_i) = β Σ_{x_j ∈ N_i} (1 − δ(l_i − l_j)),   (18)

where δ(·) (i.e. δ(0) = 1 and δ(L) = 0 for L ≠ 0) is the unit impulse function and β is the parameter that controls the level of smoothness. This spatial energy is suitable for images with large spatial structures, such as agricultural lands. However, if a small object is present in the image, this function tends to assign it to the class of the adjacent objects.
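The standard spatial energy over an eight-neighbourhood reduces to counting disagreeing neighbours, weighted by β; a minimal sketch:

```python
def spatial_energy(labels, i, j, beta=2.0):
    """Standard spatial energy of pixel (i, j): beta times the number of
    8-neighbours whose class label differs from the pixel's own label."""
    h, w = len(labels), len(labels[0])
    e = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ni, nj = i + di, j + dj
            # Skip the centre pixel itself and out-of-image neighbours.
            if (di or dj) and 0 <= ni < h and 0 <= nj < w:
                e += beta * (labels[ni][nj] != labels[i][j])
    return e
```

For example, on a 3 × 3 label grid where a single pixel carries a minority label, that pixel pays the full disagreement penalty, which is exactly why isolated small objects tend to be smoothed away by this term.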
A reliable segmentation map can improve the spatial energy of the MRF regularization and help to overcome the problem mentioned above. A second expression for the spatial energy, E^S_spatial, is used to mitigate this drawback, preserving small structures and removing noisy effects in the classification map. For this purpose, we use the mean shift segmentation map. The output of this segmentation method is composed of objects, each consisting of several pixels that share the same segment label, i.e. the same characteristics. In order to further improve the MRF regularization, we investigate the benefits of integrating the segmentation map with the MRF model. Hence, we defined a new "fuzzy segmentation-based" function, derived from the segmentation map and calculated as

ε(x_j) = ρ_j / N,   (19)

where N is the number of pixels in the object containing x_j and ρ_j is the number of pixels in that object with the same class label as x_j. To employ the fuzzy segmentation function in the MRF regularization process, first, the object into which a given pixel x_j falls is identified using the output of the mean shift segmentation map. Then, the total number of pixels in the object (N) and the number of pixels sharing the label of the given pixel (ρ_j) are counted, and the fuzzy segmentation coefficient is obtained through Equation (19). The described technique leads to a considerable improvement in classification accuracy because it exploits local neighbourhoods by taking the spatial information into account in the classification process. Figure 2 illustrates the computation of the "fuzzy segmentation-based" function. From here, the following spatial energy function is defined:

E^S_spatial(x_i) = β Σ_{x_j ∈ N_i} (1 − δ(l_i − l_j)) ε(x_j).   (20)

The superscript "S" indicates that the segmentation information is taken into account. In the following, we refer to two different methods, namely GMMMRF and GMMMRF-S, when Equations (18) and (20), respectively, are used for computing the spatial energy.
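Computing the fuzzy segmentation-based coefficient ε = ρ/N from a segmentation map and a classification map can be sketched as follows. This is an illustrative brute-force version; `seg` and `cls` are hypothetical label grids standing in for the mean shift object map and the GMM classification map:

```python
def fuzzy_seg_coeff(seg, cls, i, j):
    """Fuzzy segmentation-based coefficient of pixel (i, j):
    rho / N, where N is the size of the segmentation object containing
    the pixel and rho the number of pixels in that object sharing the
    pixel's class label."""
    obj = seg[i][j]   # mean shift object id of this pixel
    lab = cls[i][j]   # class label of this pixel
    N = rho = 0
    for r in range(len(seg)):
        for c in range(len(seg[0])):
            if seg[r][c] == obj:
                N += 1
                rho += (cls[r][c] == lab)
    return rho / N
```

A production version would precompute per-object label histograms in one pass rather than rescanning the image per pixel, but the arithmetic is the same.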
In each iteration, a pixel x_i is randomly chosen and its local energy E(x_i) is computed by Equation (16). Then a new class label l_i^new is selected for the pixel: the most frequent label in the corresponding segment, excluding the old label. If no different label exists, the new label is assigned randomly. The energy E_new(x_i) under the new label is also computed. If the variation of energy satisfies ΔE = E_new(x_i) − E(x_i) < 0, the new class label is assigned to x_i; otherwise, the new class label is accepted with probability p = exp(−ΔE).
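The acceptance rule of this Metropolis step can be sketched as follows. This is a unit-temperature simplification; the full simulated annealing schedule divides ΔE by a gradually decreasing temperature:

```python
import math
import random

def metropolis_step(e_old, e_new, rng=random):
    """Metropolis acceptance rule: always accept an energy decrease,
    otherwise accept with probability exp(-dE)."""
    dE = e_new - e_old
    if dE < 0:
        return True
    return rng.random() < math.exp(-dE)
```

Accepting some energy-increasing moves is what lets the sampler escape local minima of the global energy, which a greedy rule (accept only when ΔE < 0) cannot do.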

Image analysis
Hyperspectral datasets

The proposed GMMMRF-S and GMMMRF approaches are experimentally evaluated using three real benchmark hyperspectral datasets recorded by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and the Reflective Optics System Imaging Spectrometer (ROSIS).
• University of Pavia: an urban area collected by the ROSIS sensor with a spatial resolution of 1.3 m. The image is 610 × 340 pixels with 103 spectral channels. The reference data contain nine classes of interest.
• Indian Pines: acquired by the AVIRIS sensor with a spatial resolution of 20 m/pixel. The image contains 145 × 145 pixels in 202 bands. The ground truth contains 16 mutually exclusive classes of interest and 10,366 labelled pixels, detailed in Table 1.
• Salinas: this well-known AVIRIS dataset was captured over Salinas Valley, California, and is investigated as the third hyperspectral dataset for the experimental tests. The image has 204 spectral bands and a spatial resolution of 3.7 m. The ground truth contains 16 classes and a total of 54,129 labelled pixels.

Figure 2. The procedure of computing the ε(x_j) coefficient through the "fuzzy segmentation-based" function.

Experimental results
In all experiments, the EM algorithm was iterated 1000 times. As shown in Figure 3, we investigated the performance of the proposed methods for different values of the context weight parameter β. The optimal value of β is β ∈ [1, 2] for GMMMRF and β ∈ [2, 4] for GMMMRF-S. LFDA and RP can be viewed as preprocessing steps for the proposed methods, which implies that the reduced dimensionality is an essential parameter for the LFDA-GMMMRF and RP-GMMMRF classifiers. The first experiment is performed on the Indian Pines dataset. To avoid any bias, all experiments were repeated 10 times, and the averaged classification accuracies are reported. Table 1 summarizes the global (Overall Accuracy (OA), Average Accuracy (AA) and kappa coefficient of agreement) and class-specific classification accuracies for the Indian Pines image. In addition, for each dataset, the standardized McNemar's test was applied separately to the map of each method to assess whether the difference in accuracy between this map and the result of RP-GMMMRF-S is statistically significant:

Z = (f_12 − f_21) / √(f_12 + f_21),   (21)

where f_12 (f_21) indicates the number of test samples that are correctly labelled by RP-GMMMRF-S (by the comparison method) and simultaneously mislabelled by the other. Using the commonly accepted 5% level of significance, the difference between the results of RP-GMMMRF-S and a comparison method is statistically significant if |Z| > 1.96 (Dietterich, 1998). If this condition is met, a positive or negative value of Z indicates that RP-GMMMRF-S or the compared method, respectively, is more accurate on the test set. The performance of the above methods with different numbers of training samples for the Indian Pines, Salinas and Pavia University datasets is illustrated in Figure 4. The training samples are randomly selected from the overall ground truth at different rates (1%, 5%, 10%, 15% and 20%) for the three datasets. The remaining samples at each rate are used as test samples to evaluate the classification accuracies.
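The standardized McNemar statistic used above is a one-line computation; as a sketch:

```python
import math

def mcnemar_z(f12, f21):
    """Standardized McNemar statistic for two classifiers on the same
    test set: f12 / f21 count the samples labelled correctly by only the
    first / only the second method.  |Z| > 1.96 indicates a significant
    difference at the 5% level."""
    return (f12 - f21) / math.sqrt(f12 + f21)

# Hypothetical counts: 10 samples won by method 1, 2 by method 2.
z = mcnemar_z(10, 2)  # ~ 2.31, i.e. significant at the 5% level
```

Note that only the discordant samples (those the two classifiers disagree on) enter the statistic; samples both classifiers get right or wrong carry no information about their relative accuracy.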
As can be seen from Figure 4, the classification accuracies of all methods share a similar trend as the number of training samples increases: the performance of all methods gradually improves with the training set size. The classification methods incorporating MRF models (RP-GMMMRF, RP-GMMMRF-S, LFDA-GMMMRF and LFDA-GMMMRF-S) always outperform the traditional RP-GMM and LFDA-GMM, at every training-sample rate. Furthermore, the RP-GMMMRF-S and LFDA-GMMMRF-S methods, which incorporate the fuzzy segmentation-based function alongside the MRF model, perform better than RP-GMMMRF and LFDA-GMMMRF, respectively. RP-GMMMRF and LFDA-GMMMRF-S share similar trends in terms of overall accuracy; LFDA-GMMMRF-S outperforms RP-GMMMRF for the Pavia University and Indian Pines datasets, but for Salinas RP-GMMMRF performs better than LFDA-GMMMRF-S. RP-GMMMRF-S outperforms all other methods, in all conditions, for all numbers of training and test samples and all three datasets. Figure 5 shows the reference data and the classification maps of LFDA-GMM, RP-GMM, LFDA-GMMMRF, LFDA-GMMMRF-S, RP-GMMMRF and RP-GMMMRF-S for the Indian Pines dataset. As can be seen from Figure 5, the corresponding GMMMRF-S and GMMMRF classification maps are comparable and contain more homogeneous regions than the GMM classification map. Since the considered image contains mixed pixels, the advantage of using the segmentation map in the MRF process is evident here.
As can be seen from Table 1, the spatial-spectral approaches yielded higher classification accuracies than the pixel-wise methods. For instance, the overall accuracies of RP-GMMMRF-S and LFDA-GMMMRF-S improved by almost 5.2% and 7.5%, respectively, compared to RP-GMMMRF and LFDA-GMMMRF. Moreover, the proposed GMMMRF-S technique achieved the best class-specific accuracies for most of the available classes. This improvement in accuracy arises because the significant presence of mixed pixels in all available classes of the Indian Pines image makes the outputs of pixel-wise classification methods noisy. By using the spatial information, and especially the segmentation map in the MRF regularization, the noisy effects and mixed pixels were noticeably reduced. Table 2 contains the global classification accuracies for LFDA-GMM, RP-GMM, RP-GMMMRF-S, RP-GMMMRF, LFDA-GMMMRF-S and LFDA-GMMMRF applied to the Pavia University dataset.
As can be seen from Table 2, using neighbourhood pixel information with the MRF can improve classification accuracies. RP-GMMMRF-S gives the best performance in terms of classification accuracy compared to the other methods: it improved the classification accuracies of RP-GMM and RP-GMMMRF by almost 3.5% and 2.5%, respectively. In the same way, LFDA-GMMMRF-S improved the classification accuracies by 4% and 2.7% in comparison with LFDA-GMM and LFDA-GMMMRF, respectively. It is worth mentioning that when the image consists of small structures, such as trees and building roofs, or of mixed pixels, the inclusion of the segmentation map information in the context-based regularization improves the classification performance. For instance, in the Indian Pines and Pavia University images, the differences between GMMMRF-S and GMMMRF are significant (see Figure 6); overall, the improvements for Indian Pines and Pavia University were about 6.4% and 2.6%, respectively, in terms of overall accuracy. Consequently, the advantage of the segmentation-based GMMMRF-S method for the classification of urban images and scenarios with mixed pixels is confirmed. Table 3 summarizes the global and class-specific classification accuracies for the Salinas dataset. For the Salinas dataset, which represents a vegetation classification scenario and contains large spatial structures, the reference data do not comprise region edges. As a result, the advantages of the GMMMRF-S method over the GMMMRF method are not evident: the differences between GMMMRF-S and GMMMRF are insignificant, with an improvement of only 0.5% in overall accuracy. Therefore, the proposed method is efficient only in cases where there is no large misclassified region in the initial pixel-wise classification map.
The overall results of McNemar's test confirm that the differences between the accuracies of RP-GMMMRF-S and the above methods were statistically significant. For the Indian Pines and Salinas datasets, Z > 8.5 and Z > 10 were obtained, respectively (Tables 1 and 3). In the case of Pavia University, Z > 3 was obtained when comparing with each of these methods except RP-GMMMRF, for which the difference in accuracy from RP-GMMMRF-S was not significant (Table 2).
In order to compare the performance of the proposed method with other recently proposed classification techniques, MLR-MRF (Li, Bioucas-Dias, et al., 2012), SVM-MRF (Tarabalka et al., 2010), the edge-based SVMMRF-E method (Tarabalka et al., 2010) and LPNMF-GMM-MRF (W. Li et al., 2014) were also implemented. Classification accuracies of the proposed method, as well as those of these state-of-the-art techniques, are shown in Table 4 for the Pavia University, Salinas and Indian Pines datasets. All classification techniques are implemented using optimal parameters, and the training samples are randomly selected from the overall ground truth at a 10% rate.
To avoid any bias, the experiments are repeated 10 times, and the average classification accuracy is reported. As can be seen from Table 4, for the Pavia University and Salinas datasets, the proposed RP-GMMMRF-S and LFDA-GMMMRF-S significantly outperform the other classification techniques, and RP-GMMMRF-S gives the highest accuracy. In the case of the Indian Pines dataset, the LPNMF-GMM-MRF and RP-GMMMRF-S methods, with a slight difference from each other, performed better than the other methods. Finally, we compared the computational cost of the classification methods using the same experimental data. Experiments were conducted using MATLAB on a 2.4-GHz processor with 3.6 GB of RAM. The processing times (in seconds) are 455 for RP-GMM, 470 for LFDA-GMM, 461 for RP-GMMMRF and 466 for RP-GMMMRF-S. Therefore, the MRF regularization adds an average of only about 9 s for the experimental data.

Conclusion
In this paper, a novel and efficient method based on the combination of GMM and MRF was proposed for spectral-spatial classification of hyperspectral data. In this method, the GMM was used as a pixel-wise classifier, while the MRF was applied to extract spatial information. One important motivation for considering MRF regularization was the usefulness of exploiting additional information from image objects: this information makes it possible to involve shape characteristics as well as neighbourhood relationships. For that reason, a new fuzzy segmentation-based function was defined and combined with the MRF model. The efficiency of the proposed method was tested both with and without the segmentation map. The results confirm that integrating MRF-extracted contextual information with the GMM pixel-wise classifier can improve accuracies noticeably; this approach is particularly beneficial when large spatial structures exist in the scene, as it helps to mitigate the noisy effects of pixel-wise classification. However, for small structures, when the spatial information from neighbouring pixels is taken into account, these structures might disappear or be merged with more prominent structures. The experimental results showed that the proposed GMMMRF classification strategy outperformed other traditional techniques over a wide range of operating conditions; moreover, GMMMRF-S was superior even to GMMMRF. Further research on this method needs to cover more contextual information and advanced strategies for making the spatial information used in the method more robust.