Review on graph learning for dimensionality reduction of hyperspectral image

Graph learning is an effective way to analyze the intrinsic properties of data. It has been widely used in the fields of dimensionality reduction and classification. In this paper, we focus on graph learning-based dimensionality reduction for hyperspectral images. Firstly, we review the development of graph learning and its application to hyperspectral images. Then, we discuss several representative graph methods, including two manifold learning methods, two sparse graph learning methods, and two hypergraph learning methods. For manifold learning, we analyze neighborhood preserving embedding and locality preserving projections, two classic manifold learning methods that can be transformed into the form of a graph. For sparse graphs, we introduce sparsity preserving graph embedding and sparse graph-based discriminant analysis, which can adaptively reveal the data structure to construct a graph. For hypergraph learning, we review binary hypergraph and discriminant hyper-Laplacian projection, which can represent the high-order relationships of data.


Introduction
The hyperspectral imaging technique captures image data with a large number of consecutive narrow bands spanning the visible-to-infrared spectrum (Zhang, Li, and Du 2019). Owing to the differences in reflectivity of different materials under different parts of the electromagnetic spectrum, this technique is very effective for discriminating the composition of materials and has been widely used in agriculture, forestry, geology, oceanography, meteorology, hydrology, the military, and environmental monitoring (Cao et al. 2019; Liu et al. 2017; Pan et al. 2019; Yakovliev et al. 2019; Chen, Xiao, and Li 2016). In these applications, a fundamental task is to determine the class types in a Hyper-Spectral Image (HSI) (Amini et al. 2018). Owing to the imaging procedure, HSI contains rich spectral and spatial features (Zhang and Du 2012; Natsagdorj et al. 2017). Because the many narrow bands are strongly correlated, HSI contains massive redundant information (Mohanty, Happy, and Routray 2019a). Moreover, the pixels in HSI are mixed because of the imaging conditions. In addition, the hundreds of bands give HSI data a high dimensionality, which causes the Hughes phenomenon for traditional classification methods (Mohanty, Happy, and Routray 2019b): with a limited number of training samples, classification accuracy first improves and then decreases as the dimensionality increases (Hang and Liu 2018; Song et al. 2019; Zhao et al. 2017). Although HSI discriminates similar materials better than traditional images, these problems make it more challenging to process.
To reduce the redundant information and enhance the discriminant power of features, an effective approach is to transform the original features into a new space with lower dimensionality and stronger discriminant power (Deng et al. 2018; Li, Wang, and Cheng 2019; Huang et al. 2017; Frazier, Wang, and Chen 2014). With the transformed features, we can obtain better classification results. Principal Component Analysis (PCA) is a classic feature transformation method, which finds a set of orthogonal axes that maximize the variance along each coordinate (Licciardi and Chanussot 2018; Liu, Singleton, and Arribas-Bel 2019). PCA has been widely used in HSI analysis to obtain the principal component features. Linear Discriminant Analysis (LDA) is a classic supervised method for obtaining strongly discriminant features of HSI (Peng and Luo 2016). LDA utilizes the priori class labels to maximize the between-class variance and minimize the within-class variance, which separates interclass samples and compacts intraclass samples. Based on LDA, Nonparametric Weighted Feature Extraction (NWFE) was developed for dimensionality reduction in the HSI classification task (Kuo, Li, and Yang 2009). NWFE sets a different weight on each sample to calculate weighted means and constructs nonparametric between-class and within-class scatter matrices. To enhance antinoise ability, Maximum Noise Fraction (MNF) was proposed to maximize the signal-to-noise ratio on the basis of the signal variance and the noise variance (Wu et al. 2014). In addition, Maximum Margin Criterion (MMC) was proposed with a marginal criterion to enhance the differences between different classes of HSI (Datta, Ghosh, and Ghosh 2017). These methods are based on the statistical properties of HSI and neglect its intrinsic geometric structures.
To reveal the intrinsic structures of data, manifold learning was designed to discover the geometric properties of data. Three classic methods were developed: Isometric Mapping (ISOMAP) (Tenenbaum, de Silva, and Langford 2000), Locally Linear Embedding (LLE) (Roweis and Saul 2000), and Laplacian Eigenmaps (LE) (Belkin and Niyogi 2003). ISOMAP uses the geodesic distance to represent geometric relationships and preserves the geodesic distances between samples in the low-dimensional space. LLE uses local linear reconstruction to reveal the intrinsic structures of data and maintains the reconstruction relationships in the low-dimensional space. LE applies local neighbor information to construct a Laplacian graph and keeps the structure of this graph unchanged in the low-dimensional space. However, these manifold learning algorithms are nonlinear projections without an explicit projection matrix, so they cannot map out-of-sample data into the low-dimensional space. To address the out-of-sample problem, two linear projection methods were proposed on the basis of LLE and LE, called Neighborhood Preserving Embedding (NPE) (Huang and Huang 2014) and Locality Preserving Projections (LPP) (Zhai et al. 2016). NPE and LPP have the same theoretical bases as LLE and LE, respectively, but they obtain an explicit projection matrix that directly generates the low-dimensional features of out-of-sample data. Building on these manifold learning methods, researchers discovered that HSI possesses a manifold structure and introduced them into the dimensionality reduction of HSI, which can reveal its intrinsic manifold.
On the basis of the above methods, a unified framework was proposed to understand them. This framework can be represented by a graph (Yan et al. 2007). In this framework, PCA, LDA, LLE, LE, NPE, and LPP can be redefined as different graph learning methods; their main differences lie in the construction of the similarity matrix and the constraint matrix. These graph learning methods are very effective at revealing the intrinsic similarity relationships of data, which reflect the homogeneity of data, and they have been widely used to learn the low-dimensional features of HSI. However, HSI contains complex intrinsic structures, and the traditional graph learning methods generally do not represent its structural relationships effectively. To better represent the intrinsic properties of HSI, some advanced methods were developed based on this graph framework in recent years. Marginal Fisher Analysis (MFA) is a newly developed method under the graph framework (Huang et al. 2019). MFA constructs two graphs to represent the intrinsic properties of data: one is a similarity graph to compact the homogeneous intraclass samples, and the other is a penalty graph to separate the heterogeneous interclass samples. Other methods include Local Fisher Discriminant Analysis (LFDA) (Wang, Ruan, and An 2016), Regularized Local Discriminant Embedding (RLDE) (Zhou, Peng, and Chen 2015), and Local Geometric Structure Fisher Analysis (LGSFA) (Luo et al. 2017a), which are all based on the graph learning framework. LFDA combines the ideas of LDA and LPP: it separates the between-class samples as much as possible while maintaining the within-class local information. RLDE is based on MFA and adds two regularized terms, which preserve the data diversity and address the singularity problem with limited training samples.
LGSFA considers the neighbor information of neighboring samples, which benefits the homogeneity of data. These advanced graph learning methods use a fixed number of neighbors to represent the local structures of data. However, a fixed number of neighbors may be unreasonable for representing the local structure of every sample.
To adaptively reveal the intrinsic structures of data, sparse representation was introduced into graph learning. Sparse representation linearly reconstructs a sample with an over-complete dictionary (Peng, Li, and Tang 2019). Most of the reconstruction coefficients are zero and only a few are nonzero, which is why they are termed sparse coefficients. These sparse coefficients can be used to adaptively represent the intrinsic properties of data (Wright et al. 2009). Then, a sparse graph can be constructed based on the sparse relationships of data. The sparse graph is robust to data noise and possesses datum-adaptive structures for each sample. Based on the sparse graph, many sparse graph embedding methods have been proposed to extract the low-dimensional features of data. Sparsity Preserving Projection (SPP) is a classic sparse dimensionality reduction method, which preserves the sparse properties of data in the low-dimensional space (Qiao, Chen, and Tan 2010). In some applications, SPP is also termed Sparsity Preserving Graph Embedding (SPGE) (Ly, Du, and Fowler 2014) or Sparse Neighborhood Preserving Embedding (Sparse NPE) (Cheng et al. 2010). In addition, Sparsity Preserving Analysis (SPA) was proposed to represent the relationships of data (Luo et al. 2017b). SPA utilizes the sparse coefficients to construct a sparse graph and maintains the structure of this sparse graph in a new space. Both sparse methods are unsupervised and do not use any priori information in their dimensionality reduction models.
To enhance the discriminant performance of models, some supervised graph methods have been developed based on sparse coefficients, including Discriminant Sparsity Neighborhood Preserving Embedding (DSNPE) (Lu, Jin, and Zou 2012), Discriminative Learning by Sparse Representation Projections (DLSP) (Zang and Zhang 2011), Sparse Graph-based Discriminant Analysis (SGDA) (Ly, Du, and Fowler 2014), and Sparse Discriminant Embedding (SDE) (Huang and Yang 2015). DSNPE incorporates the sparse graph and MMC: it represents the within-neighboring information by the sparse reconstruction weights of samples from the same class and the between-neighboring information by the sparse reconstruction weights of neighboring samples from different classes. DLSP combines the local interclass geometrical structure and the sparsity property, which improves the local within-class compactness. SGDA uses the class label information and sparse coefficients to construct an interclass reconstruction model and an intraclass reconstruction model, which enhances the discriminant power of the model for HSI. SDE utilizes the sparse reconstruction information and the interclass information to enhance the inter-manifold separability of data. These graph learning methods focus only on binary relationships, in which an edge connects exactly two samples. However, HSI contains complex high-order relationships. Therefore, the traditional graph learning methods cannot effectively reveal the intrinsic structures of HSI.
To represent the complex high-order structures, the hypergraph was introduced into machine learning, and many hypergraph learning methods were developed to better reveal the intrinsic structures of data (Ji et al. 2014). Binary Hypergraph (BH) is a classic method for the feature representation of HSI, which uses k nearest neighbors to construct the hypergraph model (Yuan and Tang 2015). To utilize the class label information, a supervised hypergraph was proposed to improve the compactness of low-dimensional features, termed Discriminant Hyper-Laplacian Projection (DHLP) (Huang et al. 2016). With the spatial information, a new hypergraph learning method was developed to analyze the spatial-spectral structures of HSI, called Hyper-Graph embedding-based Spatial-Spectral joint features (SSHG). However, SSHG merely stacks the spatial-spectral features, which cannot sufficiently extract the intrinsic features. Therefore, the Spatial-Spectral Hyper-Graph Discriminant Analysis (SSHGDA) method was proposed to better reveal the spatial-spectral features and enhance the discriminating power of low-dimensional features (Luo et al. 2019). In addition, class labels are very difficult to obtain in real applications. For this reason, the Semisupervised Hyper-Graph Embedding (SHGE) method was developed to construct a feature learning model with labeled and unlabeled samples.
To trace the development of graph learning methods, we review several classic methods and present some experiments to analyze them in this paper. In Section 2, we introduce two basic theories: graph and hypergraph. Section 3 reviews several graph learning methods. Some experiments are shown in Section 4. Finally, Section 5 provides some conclusions.

Basic theory
In this paper, we represent a pixel of an HSI as a vector $x_i \in \mathbb{R}^{D}$ $(i = 1, 2, \ldots, n)$, where $n$ and $D$ are the numbers of pixels and bands, respectively. $\ell(x_i) \in \{1, 2, \ldots, c\}$ denotes the class label of $x_i$, where $c$ is the number of land-cover classes. An HSI data set is denoted as $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{D \times n}$.
$Y = [y_1, y_2, \ldots, y_n] \in \mathbb{R}^{d \times n}$ represents the low-dimensional features of $X$, where $d$ is the embedding dimension. With a projection matrix $V \in \mathbb{R}^{D \times d}$, the low-dimensional features can be represented as $Y = V^{T} X$.

Graph learning
A graph is used to reflect the relationship between two samples, which can represent some of the statistical or geometrical properties of data. A graph $G$ can be denoted as the undirected graph $G = \{X, E, W\}$, where $X$ denotes the vertices, $E$ the edges, and $W = [w_{ij}]_{i,j=1}^{n}$ the weight matrix of the edges. To construct a graph, neighboring samples are connected by edges and a weight is assigned to each edge. If vertices $i$ and $j$ are similar, an edge is added between them with a corresponding similarity weight.
In the low-dimensional space, we should preserve the structures of the graph. Therefore, the objective function can be defined as
$$\min_{Y} \frac{1}{2}\sum_{i,j} \| y_i - y_j \|^2 w_{ij} = \min_{Y} \operatorname{tr}(Y L Y^{T}), \quad \text{s.t. } \operatorname{tr}(Y H Y^{T}) = h,$$
where $h$ is a constant, $H$ is a constraint matrix to avoid the trivial solution, $L = D - W$ is the graph Laplacian, and $D = \operatorname{diag}(\sum_j w_{1j}, \ldots, \sum_j w_{nj})$ is a diagonal matrix. In general, $H$ can be an identity matrix for scale normalization or the Laplacian matrix of a penalty graph to suppress unwanted similarity.
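As an illustration, the graph construction and the embedding objective can be sketched in a few lines of numpy. The kNN rule, the heat-kernel weights, and all names below are illustrative choices for this sketch, not the settings of any specific paper; the check at the end verifies the standard identity $\sum_{i,j} \|y_i - y_j\|^2 w_{ij} = 2\operatorname{tr}(Y L Y^T)$ for a symmetric weight matrix.

```python
import numpy as np

def knn_graph(X, k=3, t=1.0):
    """Build a symmetric kNN similarity matrix W with heat-kernel weights.
    X: (D, n) data matrix, one sample per column."""
    n = X.shape[1]
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]          # skip the sample itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / t)
    return np.maximum(W, W.T)                      # symmetrize

def graph_laplacian(W):
    """L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))                    # toy "pixels": 5 bands, 8 samples
W = knn_graph(X)
L = graph_laplacian(W)
Y = rng.standard_normal((2, 8))                    # an arbitrary 2-D embedding
lhs = sum(W[i, j] * np.sum((Y[:, i] - Y[:, j]) ** 2)
          for i in range(8) for j in range(8))
rhs = 2.0 * np.trace(Y @ L @ Y.T)                  # pairwise penalty equals 2 tr(Y L Y^T)
```

The sum-versus-trace equivalence is what lets all the methods below be written as trace minimizations over a Laplacian.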

Hypergraph learning
The difference between a hypergraph and a graph is the number of vertices on each edge. In a graph, each edge connects only two vertices, while each hyperedge of a hypergraph can contain more than two vertices. A hypergraph is defined as $G_H = \{X, E_H, W_H\}$, where $X$ is the vertex set, $E_H$ the hyperedge set, and $W_H$ the weights; each hyperedge $e_j \in E_H$ has a weight $w_{e_j}$. $H = [h(v_i, e_j)] \in \mathbb{R}^{|V_H| \times |E_H|}$ is the incidence matrix that reveals the relationship between vertex $v_i$ and hyperedge $e_j$:
$$h(v_i, e_j) = \begin{cases} 1, & v_i \in e_j \\ 0, & \text{otherwise.} \end{cases}$$
For the hypergraph, the degrees of vertex $v_i$ and hyperedge $e_j$ are used to represent its properties: the vertex degree is the sum of the weights of the hyperedges incident to $v_i$, and the hyperedge degree is the number of vertices on $e_j$, i.e.
$$d(v_i) = \sum_{e_j \in E_H} w_{e_j} h(v_i, e_j), \qquad \delta(e_j) = \sum_{v_i \in V_H} h(v_i, e_j).$$
In the low-dimensional space, the structures of the hyperedges should be preserved to reveal the high-order properties of HSI, and the embedding function is
$$\min_{Y} \operatorname{tr}(Y L_H Y^{T}),$$
where $L_H = D_v - H W_H (D_e)^{-1} H^{T}$ is the hyper-Laplacian matrix, $W_H = \operatorname{diag}(w_{e_1}, w_{e_2}, \ldots, w_{e_n})$ is the weight matrix, and $D_v = \operatorname{diag}(d(v_1), \ldots, d(v_n))$ and $D_e = \operatorname{diag}(\delta(e_1), \ldots, \delta(e_n))$ are the diagonal matrices of vertex and hyperedge degrees. Figure 1 explains the difference between a graph and a hypergraph. In Figure 1(a), each edge of the graph has only two vertices (e.g. $v_1$ and $v_2$). In Figure 1(b), the hypergraph contains seven vertices and four hyperedges, and each hyperedge contains more than two vertices (e.g. $v_1$, $v_2$, and $v_4$ belonging to $e_1$).
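The hyper-Laplacian above can be computed directly from the incidence matrix. The tiny hypergraph below (seven vertices, four hyperedges, with $e_1 = \{v_1, v_2, v_4\}$ as in the Figure 1 description) and its edge weights are made up for illustration; the formula $L_H = D_v - H W_H D_e^{-1} H^T$ is the one from the text.

```python
import numpy as np

# Illustrative hypergraph: 7 vertices, 4 hyperedges (0-based indices).
H = np.zeros((7, 4))                      # incidence matrix: h(v_i, e_j) = 1 if v_i in e_j
edges = [[0, 1, 3], [1, 2, 4], [3, 4, 5], [4, 5, 6]]
for j, verts in enumerate(edges):
    H[verts, j] = 1.0

w = np.array([1.0, 0.5, 2.0, 1.5])        # hyperedge weights w_{e_j} (arbitrary here)
W_H = np.diag(w)
d_v = H @ w                               # vertex degree: sum of weights of incident hyperedges
delta_e = H.sum(axis=0)                   # hyperedge degree: number of vertices on the edge
D_v, D_e = np.diag(d_v), np.diag(delta_e)

# Hyper-Laplacian L_H = D_v - H W_H D_e^{-1} H^T
L_H = D_v - H @ W_H @ np.linalg.inv(D_e) @ H.T
```

Like the ordinary graph Laplacian, $L_H$ is symmetric positive semidefinite with zero row sums, which is what makes $\operatorname{tr}(Y L_H Y^T)$ a valid smoothness penalty.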

Representative graph learning methods
In this section, we review several graph learning methods including LPP, NPE, SPGE, SGDA, BH, and DHLP.

Locality preserving projections
LPP is used to learn the intrinsic manifold structures of high-dimensional data. In LPP, a similarity graph is constructed with the K nearest neighbors. Then, it preserves the neighbor structures in the low-dimensional space; that is, the graph structure is maintained in the feature embedding space.
According to the neighbor relationship of each sample, a similarity weight between two samples is defined by the heat-kernel function ($t$ is the kernel parameter):
$$w_{ij} = \begin{cases} \exp(-\| x_i - x_j \|^2 / t), & x_j \in N_K(x_i) \text{ or } x_i \in N_K(x_j) \\ 0, & \text{otherwise,} \end{cases}$$
where $w_{ij}$ is the weight between $x_i$ and $x_j$ and $N_K(\cdot)$ denotes the set of $K$ nearest neighbors. The weight matrix is $W = [w_{ij}]_{i,j}$.
In the low-dimensional space, the neighbor structure of each sample is preserved, and similar samples should be placed as close together as possible. The minimization objective function is defined as
$$\min_{V} \frac{1}{2}\sum_{i,j} \| V^{T} x_i - V^{T} x_j \|^2 w_{ij} = \min_{V} \operatorname{tr}(V^{T} X L X^{T} V), \quad \text{s.t. } V^{T} X D X^{T} V = I,$$
where $L = D - W$ is the graph Laplacian. This optimization is equivalent to the generalized eigenvalue problem
$$X L X^{T} v = \lambda X D X^{T} v.$$
Then, the projection matrix is composed of the eigenvectors corresponding to the $d$ smallest eigenvalues.
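The whole LPP pipeline (graph, Laplacian, generalized eigenproblem) fits in a short numpy/scipy sketch. The neighborhood size, kernel bandwidth, the tiny ridge added for numerical stability, and all names are illustrative assumptions, not part of the original algorithm.

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, W, d):
    """Locality Preserving Projections: solve (X L X^T) v = lam (X D X^T) v
    and keep the eigenvectors of the d smallest eigenvalues.
    X: (D, n) data matrix; W: (n, n) symmetric similarity matrix."""
    D_mat = np.diag(W.sum(axis=1))
    L = D_mat - W                                    # graph Laplacian
    A = X @ L @ X.T
    B = X @ D_mat @ X.T + 1e-8 * np.eye(X.shape[0])  # tiny ridge for stability (an assumption)
    vals, vecs = eigh(A, B)                          # generalized eigenproblem, ascending order
    return vecs[:, :d]                               # projection matrix V

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 30))                     # toy data: 6 bands, 30 pixels
d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
W = np.exp(-d2 / np.median(d2))                      # heat-kernel weights
mask = np.zeros_like(W, dtype=bool)
for i in range(30):
    mask[i, np.argsort(d2[i])[1:6]] = True           # K = 5 nearest neighbors
W = np.where(mask | mask.T, W, 0.0)                  # keep only neighbor edges
V = lpp(X, W, d=2)
Y = V.T @ X                                          # low-dimensional features
```

`scipy.linalg.eigh(A, B)` returns eigenvalues in ascending order, so the first $d$ columns of `vecs` are exactly the eigenvectors LPP keeps.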

Neighborhood preserving embedding
NPE assumes that the data possess a local linear structure. In the low-dimensional space, this local linear structure should be maintained to reveal the intrinsic manifold properties of the data. Like LPP, NPE uses the K nearest neighbors to represent the local information.
According to the local structure, the reconstruction error is used to calculate the reconstruction weights:
$$\min_{W} \sum_{i=1}^{n} \Big\| x_i - \sum_{j} w_{ij} x_j \Big\|^2, \quad \text{s.t. } \sum_{j} w_{ij} = 1,$$
where $w_{ij}$ is the weight between $x_i$ and $x_j$, and $w_{ij} = 0$ unless $x_j$ is one of the $K$ nearest neighbors of $x_i$. In the low-dimensional space, the weights are preserved to extract the embedding features, and the objective function is defined as
$$\min_{V} \operatorname{tr}(V^{T} X M X^{T} V), \quad \text{s.t. } V^{T} X X^{T} V = I,$$
where $M = (I - W)^{T} (I - W)$.
This optimization can be transformed into the generalized eigenvalue problem
$$X M X^{T} v = \lambda X X^{T} v.$$
The projection matrix is composed of the eigenvectors corresponding to the $d$ smallest eigenvalues.

Sparsity preserving graph embedding

SPGE utilizes sparse coefficients to construct a sparse graph that reveals the sparse properties of data, and it preserves the sparse graph structures in the low-dimensional space. SPGE inherits the natural discriminating power of sparse representation to represent the intrinsic relationships of data. A sample $x_i$ can be represented by a dictionary $X_i = [x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n]$, and the representation coefficients are very sparse. The optimization function is denoted as
$$\min_{s_i} \| s_i \|_1, \quad \text{s.t. } \| x_i - X_i s_i \| \le \varepsilon,$$
where $\varepsilon > 0$ is the error tolerance, $s_i = [s_{i,1}, s_{i,2}, \ldots, s_{i,i-1}, s_{i,i+1}, \ldots, s_{i,n}]$ is the vector of sparse coefficients, and $\| s_i \|_1$ denotes the $\ell_1$-norm of $s_i$, which controls the sparsity of $s_i$.
According to the sparse coefficients $s_i$, a graph $G = \{X, W_s\}$ is constructed, where $W_s$ is the similarity matrix reflecting the similarity between two vertices: its $i$-th row holds the sparse coefficients of $x_i$, i.e. $(W_s)_{ij} = s_{i,j}$ for $j \ne i$ and $(W_s)_{ii} = 0$. Once the sparse graph is obtained, a dimensionality reduction function can be constructed to minimize the low-dimensional reconstruction error of the sparse representation:
$$\min_{V} \operatorname{tr}(V^{T} X L_s X^{T} V),$$
where $L_s = (I - W_s)^{T} (I - W_s)$ and $V$ is the transformation matrix.
To avoid degenerate solutions, a constraint $V^{T} X X^{T} V = I$ is added. This optimization problem can be represented as the generalized eigenvalue problem
$$X L_s X^{T} v = \lambda X X^{T} v.$$
With the eigenvectors corresponding to the $d$ smallest eigenvalues, the projection matrix can be obtained as $V = [v_1, v_2, \ldots, v_d]$.
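The sparse-graph construction can be sketched with a simple $\ell_1$ solver. The iterative soft-thresholding (ISTA) routine below is a stand-in for whichever sparse coding solver a given paper uses, and the penalty form $\frac{1}{2}\|b - As\|^2 + \lambda\|s\|_1$ is the Lagrangian cousin of the constrained problem in the text; the regularization value and all names are illustrative.

```python
import numpy as np

def ista_lasso(A, b, lam=0.1, iters=500):
    """Solve min_s 0.5*||b - A s||^2 + lam*||s||_1 by iterative
    soft-thresholding (ISTA) -- a simple stand-in l1 solver."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2           # 1 / Lipschitz constant of the gradient
    s = np.zeros(A.shape[1])
    for _ in range(iters):
        g = s - step * (A.T @ (A @ s - b))           # gradient step on the quadratic term
        s = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft threshold
    return s

def sparse_graph(X, lam=0.1):
    """W_s[i, j] holds the sparse coefficient of x_j when reconstructing
    x_i from the dictionary X_i that excludes x_i itself."""
    n = X.shape[1]
    W_s = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]        # dictionary excludes x_i
        W_s[i, idx] = ista_lasso(X[:, idx], X[:, i], lam)
    return W_s

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 10))                     # toy data: 4 bands, 10 pixels
W_s = sparse_graph(X)
```

Note that $W_s$ is generally asymmetric, which is why SPGE works with $L_s = (I - W_s)^T (I - W_s)$ rather than a plain Laplacian.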

Sparse graph-based discriminant analysis
SPGE is an unsupervised method, which restrains the discriminant power of the feature representation. To make full use of priori information, a supervised method was formulated with the class label information based on SPGE, termed Sparse Graph-based Discriminant Analysis (SGDA). SGDA utilizes the sparse coefficients and the class label information to define a within-class reconstruction and a between-class reconstruction, which improves the intraclass compactness and interclass separability. For a sample $x_m$ from the $z$-th class, a selector function is defined as
$$\delta_z(x_m) = \begin{cases} 1, & \ell(x_m) = z \\ 0, & \text{otherwise.} \end{cases}$$
According to the sparse coefficients, a similarity matrix $W_s$ can be obtained as in SPGE. Then, the within-class reconstruction error is defined as
$$\min_{V} \sum_{i=1}^{n} \Big\| V^{T} x_i - \sum_{\ell(x_j) = \ell(x_i)} (W_s)_{ij} V^{T} x_j \Big\|^2 .$$
If we define an intraclass similarity matrix $W_s^{w}$ with $(W_s^{w})_{ij} = (W_s)_{ij}$ if $\ell(x_i) = \ell(x_j)$ and $0$ otherwise, then the objective can be simplified as
$$\min_{V} \operatorname{tr}\big(V^{T} X (I - W_s^{w})^{T} (I - W_s^{w}) X^{T} V\big).$$
At the same time, for the interclass samples, the between-class reconstruction error is denoted as
$$\max_{V} \sum_{i=1}^{n} \Big\| V^{T} x_i - \sum_{\ell(x_j) \ne \ell(x_i)} (W_s)_{ij} V^{T} x_j \Big\|^2,$$
which can be transformed into
$$\max_{V} \operatorname{tr}\big(V^{T} X (I - W_s^{b})^{T} (I - W_s^{b}) X^{T} V\big),$$
where $(W_s^{b})_{ij} = (W_s)_{ij}$ if $\ell(x_i) \ne \ell(x_j)$ and $0$ otherwise. In the low-dimensional space, an optimization function is constructed to minimize the within-class reconstruction error while maximizing the between-class reconstruction error, i.e.
$$\min_{V} \frac{\operatorname{tr}\big(V^{T} X L_s^{w} X^{T} V\big)}{\operatorname{tr}\big(V^{T} X L_s^{b} X^{T} V\big)},$$
where $L_s^{w} = (I - W_s^{w})^{T} (I - W_s^{w})$ and $L_s^{b} = (I - W_s^{b})^{T} (I - W_s^{b})$. This optimization problem can be represented as the generalized eigenvalue problem
$$X L_s^{w} X^{T} v = \lambda X L_s^{b} X^{T} v.$$
Then, the eigenvectors corresponding to the $d$ smallest eigenvalues compose the projection matrix $V = [v_1, v_2, \ldots, v_d]$.
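The key step that distinguishes SGDA from SPGE is the class-selector masking of the sparse similarity matrix, which can be sketched as below; the function and variable names are illustrative.

```python
import numpy as np

def split_by_class(W_s, labels):
    """Mask a sparse similarity matrix into within-class and between-class
    parts, mirroring the class-selector step described for SGDA."""
    same = labels[:, None] == labels[None, :]   # True where the two samples share a class
    return np.where(same, W_s, 0.0), np.where(~same, W_s, 0.0)

rng = np.random.default_rng(4)
W_s = rng.random((6, 6))                        # toy sparse-coefficient matrix
labels = np.array([0, 0, 1, 1, 2, 2])
W_within, W_between = split_by_class(W_s, labels)
```

The two masked matrices then feed the within-class and between-class Laplacians of the trace-ratio objective.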

Binary hypergraph
BH can reveal the high-order relationships of data, which is more effective for real applications. In general, BH uses the $K$ nearest neighbors to construct the hypergraph. A hyperedge $e_j$ is composed of a centroid vertex $x_j$ and its $K$ nearest neighbors; thus, each hyperedge has $K + 1$ pixels. According to this construction, the incidence matrix $H$ can be represented as
$$h(v_i, e_j) = \begin{cases} 1, & v_i \in e_j \\ 0, & \text{otherwise,} \end{cases}$$
and the hyperedge weight $w_j$ is defined as
$$w_j = \sum_{x_i \in e_j} \exp(-\| x_i - x_j \|^2 / t),$$
where $t$ is a parameter. According to the incidence matrix and the hyperedge weights, the degrees of each vertex and each hyperedge are represented as
$$d_i = \sum_{j} w_j \, h(v_i, e_j), \qquad \delta_j = \sum_{i} h(v_i, e_j).$$
With the weights, the vertex degrees, and the hyperedge degrees, a feature embedding objective function is designed as
$$\min_{V} \operatorname{tr}(V^{T} X L_{BH} X^{T} V),$$
where $L_{BH} = D_v - H W_{BH} (D_e)^{-1} H^{T}$ is the hyper-Laplacian matrix, $W_{BH} = \operatorname{diag}(w_1, w_2, \ldots, w_n)$ is the weight matrix, and $D_v = \operatorname{diag}(d_1, d_2, \ldots, d_n)$ and $D_e = \operatorname{diag}(\delta_1, \delta_2, \ldots, \delta_n)$ are the diagonal matrices of vertex and hyperedge degrees.
To obtain a unique solution, the constraint $\operatorname{tr}(V^{T} X D_v X^{T} V) = 1$ is added. The optimization is transformed into the generalized eigenvalue problem
$$X L_{BH} X^{T} v = \lambda X D_v X^{T} v.$$
The projection matrix is obtained from the eigenvectors corresponding to the $d$ smallest eigenvalues.
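The BH construction (one hyperedge per sample, built from its $K$ nearest neighbors) can be sketched as follows. The heat-kernel weight here follows the definition above and includes the centroid's own term $\exp(0) = 1$; whether the self term is included, and the parameter values, are illustrative assumptions.

```python
import numpy as np

def binary_hypergraph(X, k=3, t=1.0):
    """BH construction: hyperedge e_j = centroid x_j plus its k nearest
    neighbors (k + 1 vertices). Returns the incidence matrix H, hyperedge
    weights w, vertex degrees d_v, and hyperedge degrees delta_e."""
    n = X.shape[1]
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    H = np.zeros((n, n))                       # one hyperedge per sample
    for j in range(n):
        H[np.argsort(d2[j])[:k + 1], j] = 1.0  # centroid (distance 0) + k neighbors
    # w_j = sum of heat-kernel similarities between the centroid and its members
    w = np.array([np.exp(-d2[j][H[:, j] > 0] / t).sum() for j in range(n)])
    d_v = H @ w                                # vertex degrees
    delta_e = H.sum(axis=0)                    # hyperedge degrees, all equal to k + 1
    return H, w, d_v, delta_e

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 12))               # toy data: 4 bands, 12 pixels
H, w, d_v, delta_e = binary_hypergraph(X, k=3)
```

From these quantities, $L_{BH}$ follows exactly as in the hypergraph section.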

Discriminant hyper-Laplacian projection
To enhance the discriminating power of BH, a supervised method was proposed with the class label information. Based on the hypergraph, a normalized hypergraph learning model is constructed as
$$\min_{V} \operatorname{tr}(V^{T} X L_{NH} X^{T} V),$$
where $L_{NH} = I - (D_v)^{-1/2} H W_{BH} (D_e)^{-1} H^{T} (D_v)^{-1/2}$ is the normalized hyper-Laplacian matrix.
To improve the separation between classes, a maximization objective function is defined as
$$\max_{V} \operatorname{tr}(V^{T} U L_u U^{T} V),$$
where $U = [u_1, u_2, \ldots, u_c]$ is the mean matrix and $u_i$ is the mean of the $i$-th class. $L_u = D_u - B$ is the Laplacian matrix of the classes, $D_u = \operatorname{diag}(\sum_j b_{1j}, \sum_j b_{2j}, \ldots, \sum_j b_{cj})$ is a diagonal matrix, and $B = [b_{ij}]_{i,j}$ is the mean similarity weight matrix, where $b_{ij}$ is defined as a similarity weight between the $i$-th and $j$-th class means. Therefore, the objective function of DHLP is represented as
$$\min_{V} \frac{\operatorname{tr}(V^{T} X L_{NH} X^{T} V)}{\operatorname{tr}(V^{T} U L_u U^{T} V)}.$$
This optimization is equivalent to the generalized eigenvalue problem
$$X L_{NH} X^{T} v = \lambda U L_u U^{T} v.$$
According to the eigenvectors corresponding to the $d$ smallest eigenvalues, we obtain the projection matrix $V = [v_1, v_2, \ldots, v_d]$.
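The class-mean Laplacian $L_u = D_u - B$ used by the DHLP penalty term can be sketched as below. The heat-kernel choice for $b_{ij}$ and the zeroed diagonal are assumptions for this sketch (the original definition of $b_{ij}$ is not reproduced in the text), and all names are illustrative.

```python
import numpy as np

def class_mean_laplacian(X, labels, t=1.0):
    """Build the class-mean matrix U and the Laplacian L_u = D_u - B over
    class means, assuming heat-kernel similarities b_ij and no self-edges."""
    classes = np.unique(labels)
    U = np.stack([X[:, labels == c].mean(axis=1) for c in classes], axis=1)
    d2 = ((U[:, :, None] - U[:, None, :]) ** 2).sum(axis=0)  # distances between class means
    B = np.exp(-d2 / t)                    # assumed similarity weights b_ij
    np.fill_diagonal(B, 0.0)               # assumed: no self-similarity edges
    L_u = np.diag(B.sum(axis=1)) - B
    return U, L_u

rng = np.random.default_rng(5)
X = rng.standard_normal((5, 12))           # toy data: 5 bands, 12 pixels
labels = np.repeat([0, 1, 2], 4)           # three classes of four samples
U, L_u = class_mean_laplacian(X, labels)
```

Maximizing $\operatorname{tr}(V^T U L_u U^T V)$ then pushes the projected class means apart, the supervised half of the DHLP trace ratio.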

Conclusions
A hyperspectral image possesses hundreds of narrow bands, which leads to massive redundant information and the Hughes phenomenon for traditional classification methods. To address these challenges, graph learning is an effective technique for representing the intrinsic properties of data, and it can be used to reduce data redundancy and dimensionality. In this paper, we review several graph learning methods for dimensionality reduction of hyperspectral images. The reviewed methods include two classic manifold methods (LPP and NPE, which can be transformed into a graph framework), two sparse graph methods (SPGE and SGDA, which can adaptively reveal the intrinsic relationships of data), and two hypergraph learning methods (BH and DHLP, which can represent the high-order structures of data). This review should help readers better understand graph learning methods and develop new ones. In this paper, we review only several basic algorithms based on the spectral information. In the future, spatial-spectral information can be used to construct spatial-spectral graph learning.