A Novel Hyperspectral Unmixing Method based on Least Squares Twin Support Vector Machines

ABSTRACT In hyperspectral images, the endmembers characterizing one class of ground object may vary due to illumination, weathering, and slight variations of the materials. This phenomenon, called intra-class endmember variability, is one of the important factors affecting unmixing performance. However, intra-class endmember variability is often ignored in unmixing, which reduces unmixing accuracy; handling it is therefore the focus of this work. To address this problem, we propose a novel hyperspectral unmixing method based on Least Squares Twin Support Vector Machines (ULSTWSVM). ULSTWSVM uses multiple training samples (endmembers) to model a pure class, which takes intra-class endmember variability into account in unmixing. At the same time, ULSTWSVM obtains abundances by calculating the distances from the mixed pixels to the classification hyperplanes, which is simple and efficient. ULSTWSVM mainly comprises three steps: (1) obtain the two non-parallel classification hyperplanes by solving two quadratic programming problems (QPPs) in the least squares sense, (2) calculate the distances from the mixed pixels to the classification hyperplanes, and (3) normalize the distances and convert them to abundances. Experimental results on both synthetic and real hyperspectral data show that the proposed method outperforms the methods used for comparison.


Introduction
Hyperspectral remote sensing technology has been widely used in mineral exploration (Oskouei & Babakan, 2016), soil organic carbon mapping (Bayer et al., 2016) and vegetation monitoring (Stagakis et al., 2016). The spatial resolution of hyperspectral images, however, is often low, so a pixel may contain more than one class of ground object; such a pixel is called a mixed pixel. Hyperspectral Unmixing (HU) techniques have been developed to address this problem.
Hyperspectral unmixing consists of extracting the spectral signatures (endmembers) of the pure materials existing in the scene and estimating their proportions (abundances) at each pixel (Uezato et al., 2018). Endmember extraction methods based on geometry have been developed since the 1990s. These methods assume that all the pixels lie in the same geometrical simplex, whose vertices are the corresponding endmembers. The representative algorithms include the pixel purity index (PPI) algorithm (Boardman, 2013), the vertex component analysis (VCA) algorithm (Nascimento & Bioucas-Dias, 2005), the Minimum Volume Simplex Analysis (MVSA) (J. Li & Bioucas-Dias, 2008), and the simplex identification via split augmented Lagrangian (SISAL) algorithm (J.M. Bioucas-Dias, 2009). The methods mentioned above fail to cope with the effect of intra-class endmember variability, for they use a single endmember to represent a pure class.
Statistical unmixing methods provide a natural framework for representing variability in endmembers. The core idea of these algorithms is to deduce a posterior probability density from the prior distributions of endmembers and abundances (Zhou et al., 2018); different Bayesian estimation approaches are then used to estimate the endmembers and abundances. Halimi et al. (2015) introduced a generalization of the normal compositional model by considering Gaussian variability for endmembers as well as an additive Gaussian noise term for modelling errors. Bhatt et al. (2014) addressed a MAP solution with a data-dependent Huber-Markov random field. Li and Yin (2013) advanced an Independent Component Analysis (ICA) based algorithm with variational Bayesian (VB) methods. Statistical unmixing algorithms usually come at a price: their computational complexity is higher, and the statistical distribution defined as a priori knowledge by the user may not conform to the nature of the endmembers.
In recent years, sparse unmixing algorithms have been developed. They assume that the mixed pixels can be represented as a linear combination of multiple endmembers from known and available spectral libraries (Zhang et al., 2018). Unmixing in this way seeks the optimal subset of the spectral library that can best model each mixed pixel in the scene (Iordache et al., 2012). Ma et al. (2017) adopted the ℓ2,1-norm loss function to handle noise and outliers; the resulting sparse unmixing problem can be solved by the alternating direction method of multipliers (ADMM). Chen et al. (2018) took the combination of spatial correlation and global row sparsity into consideration and proposed an unmixing method based on collaborative sparsity and total variation. Zheng et al. (2019) combined spatial smoothness and spectral collaborative sparsity in unmixing. Sparse unmixing algorithms depend heavily on the correctness of the model assumption used. Furthermore, because acquisition conditions differ between the spectral libraries and the mixed pixels, the unmixing result may be poor without accurate atmospheric correction.
Nonnegative Matrix Factorization (NMF) (Lee & Seung, 2000) has developed rapidly in the field of HU since it was proposed. Tong et al. (2017) proposed a novel region-based NMF (R-NMF) algorithm, which aims at enforcing consistent abundances within homogeneous regions while discriminating the contributions of endmembers across regions. Fang et al. (2019) added sparsity and geometric structure constraints to the multilayer NMF structure and proposed a double-constrained multilayer NMF method. Peng and Tan (2020) adopted the Hessian graph regularizer on the abundance matrix to mine the intrinsic geometry of each NMF. The problem with NMF-based unmixing methods is that they are greatly affected by the initial value of the iteration, less robust to noise, and prone to local minima.
Considering intra-class endmember variability, multi-endmember unmixing methods have become a hot research topic. Roberts et al. (1998) proposed Multiple Endmember Spectral Mixture Analysis (MESMA), which allows the types and number of endmembers to vary on a per-pixel basis. All endmember combinations are tested, and the optimal combination is found according to the minimum reconstruction error criterion while the abundances are obtained. MESMA can overcome the influence of endmember variability on unmixing, but it requires a large amount of computation time. Yang et al. (2015) brought forward an unmixing method based on the Relevance Vector Machine (URVM), a sparse model in the Bayesian framework. In this algorithm, RVM learning is transformed into maximum marginal likelihood estimation; the probability of each sample belonging to each class is obtained and then transformed into an abundance. Because of the sparse model, the unmixing time of URVM is short, while its training time is long. At the same time, because there is no obvious correspondence between probability values and abundances, its unmixing error is larger. Wang and Jia (2009) proposed an unmixing method based on the support vector machine (USVM). In USVM, the mixed pixels lie between the boundary hyperplanes, and the abundances are obtained by calculating the distances from the mixed pixels to the boundary hyperplanes. USVM uses distances to calculate abundances, which is simple and efficient. The disadvantages of USVM are as follows: (1) USVM is only applicable to data with a parallel class-boundary distribution; (2) when intra-class endmember variability is high, the accuracy of USVM decreases. Based on USVM, Wang et al. (2013) proposed an unmixing method based on the Least Squares Support Vector Machine (ULSSVM). In ULSSVM, equality constraints replace the inequality constraints of USVM, so the solutions can be obtained directly in closed form. As a result, the computational efficiency of ULSSVM is higher than that of USVM. However, ULSSVM shares the other disadvantages of USVM.
Inspired by USVM and ULSSVM, we propose an unmixing method based on Least Squares Twin Support Vector Machines (ULSTWSVM). In ULSTWSVM, the abundances are obtained by calculating the distances from the mixed pixels to classification hyperplanes. The main difference between ULSTWSVM and USVM/ULSSVM is that ULSTWSVM calculates these distances to the classification hyperplanes, while USVM and ULSSVM calculate them to the boundary hyperplanes. Unlike the boundary hyperplanes, the classification hyperplanes pass through the centroids of the classes, which better accommodates intra-class endmember variability.
The main contributions of this paper are as follows. (1) We extend LSTWSVM to hyperspectral unmixing and propose a novel supervised unmixing method named ULSTWSVM. In ULSTWSVM, each pure class is modelled by a set of training samples (endmembers) instead of a single endmember, taking intra-class endmember variability into account.
(2) In the framework of ULSTWSVM, a distance-based abundance calculation method is proposed. First, the training samples are used to obtain the two classification hyperplanes; then the distances from each mixed pixel to the classification hyperplanes are calculated and normalized to obtain the abundances. ULSTWSVM provides higher unmixing accuracy at a lower computational cost.
(3) The nonlinear kernel function is introduced into ULSTWSVM, and the resulting method is named UNLLSTWSVM (the unmixing method based on nonlinear LSTWSVM). UNLLSTWSVM is valuable for unmixing class data with complicated class boundaries in the original feature space.
The remainder of this paper is organized as follows. Section 2 introduces the basic theory of LSTWSVM. Section 3 presents the ULSTWSVM method. Section 4 reports the experimental results on synthetic data and real hyperspectral images. In Section 5, we further discuss the experimental results. Finally, conclusions are drawn in Section 6.

Basic theory of LSTWSVM
TWSVM is a binary classifier. In contrast with the conventional SVM, which solves a single larger-sized Quadratic Programming Problem (QPP) to obtain a single hyperplane, TWSVM generates two non-parallel hyperplanes by solving two smaller-sized QPPs (Kumar & Gopal, 2009). The geometric interpretation of TWSVM is shown in Figure 1: square data points belong to class +1 and circular data points belong to class -1. H_1 represents the classification hyperplane of class +1, and H_2 represents the classification hyperplane of class -1. A new data point (the red pentagon) is classified as class +1 if it is closer to H_1, or as class -1 if it is closer to H_2.

In the n-dimensional real space R^n, let the matrix A ∈ R^(n_1×n) represent the training-sample set of class +1 and the matrix B ∈ R^(n_2×n) represent the training-sample set of class -1, where n_1 and n_2 are the numbers of training samples of class +1 and class -1, respectively. The two non-parallel hyperplanes of TWSVM are

x^T ω^(1) + b^(1) = 0 and x^T ω^(2) + b^(2) = 0,

where ω^(1), ω^(2) ∈ R^n and b^(1), b^(2) ∈ R. The TWSVM classifier is obtained by solving the following two QPPs:

min_(ω^(1), b^(1), ξ) (1/2)‖Aω^(1) + e_1 b^(1)‖² + c_1 e_2^T ξ, s.t. −(Bω^(1) + e_2 b^(1)) + ξ ≥ e_2, ξ ≥ 0, (1)

min_(ω^(2), b^(2), η) (1/2)‖Bω^(2) + e_2 b^(2)‖² + c_2 e_1^T η, s.t. (Aω^(2) + e_1 b^(2)) + η ≥ e_1, η ≥ 0, (2)

where e_1 = (1, ..., 1)^T ∈ R^(n_1), e_2 = (1, ..., 1)^T ∈ R^(n_2), and c_1, c_2 > 0. To obtain LSTWSVM, we modify formula (1) in the least squares sense, replacing the inequality constraints by equality constraints:

min_(ω^(1), b^(1), ξ) (1/2)‖Aω^(1) + e_1 b^(1)‖² + (c_1/2) ξ^T ξ, s.t. −(Bω^(1) + e_2 b^(1)) + ξ = e_2. (3)

Substituting the equality constraint into the objective function, QPP (3) becomes

min_(ω^(1), b^(1)) (1/2)‖Aω^(1) + e_1 b^(1)‖² + (c_1/2)‖Bω^(1) + e_2 b^(1) + e_2‖². (4)

Setting the gradients of formula (4) with respect to ω^(1) and b^(1) to zero, we get

A^T (Aω^(1) + e_1 b^(1)) + c_1 B^T (Bω^(1) + e_2 b^(1) + e_2) = 0, (5)

e_1^T (Aω^(1) + e_1 b^(1)) + c_1 e_2^T (Bω^(1) + e_2 b^(1) + e_2) = 0. (6)

Arranging formulas (5) and (6) in matrix form and solving for ω^(1) and b^(1), we get

[ω^(1); b^(1)] = −(F^T F + (1/c_1) E^T E)^(−1) F^T e_2,

where E = [A e_1] and F = [B e_2]. In the same way, the least squares solution of QPP (2) is

[ω^(2); b^(2)] = (E^T E + (1/c_2) F^T F)^(−1) E^T e_1.

Once ω^(1), b^(1), ω^(2), b^(2) are obtained, the two non-parallel hyperplanes are determined, and the classification function is

class(x) = arg min_(k=1,2) |x^T ω^(k) + b^(k)| / ‖ω^(k)‖.

Next, we introduce the kernel function into LSTWSVM. The two non-parallel kernel-generated surfaces are

K(x^T, C^T) u^(1) + b^(1) = 0 and K(x^T, C^T) u^(2) + b^(2) = 0,

where C = [A; B] collects all training samples and K is an arbitrary kernel. The original QPPs of TWSVM are modified to the squared (2-norm) form of the slack variables, with the inequality constraints replaced by equality constraints, as shown in formulas (7) and (8):

min_(u^(1), b^(1), ξ) (1/2)‖K(A, C^T) u^(1) + e_1 b^(1)‖² + (c_1/2) ξ^T ξ, s.t. −(K(B, C^T) u^(1) + e_2 b^(1)) + ξ = e_2, (7)

min_(u^(2), b^(2), η) (1/2)‖K(B, C^T) u^(2) + e_2 b^(2)‖² + (c_2/2) η^T η, s.t. (K(A, C^T) u^(2) + e_1 b^(2)) + η = e_1. (8)

Substituting the equality constraints into the objective functions, formulas (7) and (8) become

min_(u^(1), b^(1)) (1/2)‖K(A, C^T) u^(1) + e_1 b^(1)‖² + (c_1/2)‖K(B, C^T) u^(1) + e_2 b^(1) + e_2‖², (9)

min_(u^(2), b^(2)) (1/2)‖K(B, C^T) u^(2) + e_2 b^(2)‖² + (c_2/2)‖e_1 − (K(A, C^T) u^(2) + e_1 b^(2))‖². (10)

The solutions of QPPs (9) and (10) can be acquired as

[u^(1); b^(1)] = −(N^T N + (1/c_1) M^T M)^(−1) N^T e_2, (11)

[u^(2); b^(2)] = (M^T M + (1/c_2) N^T N)^(−1) M^T e_1, (12)

where M = [K(A, C^T) e_1] and N = [K(B, C^T) e_2]. Once u^(1), b^(1), u^(2), b^(2) are obtained, the two hyperplanes are determined, and the classification function is

class(x) = arg min_(k=1,2) |K(x^T, C^T) u^(k) + b^(k)| / √((u^(k))^T K(C, C^T) u^(k)).
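The closed-form training of the linear LSTWSVM described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names are ours, and the regularization parameters are illustrative.

```python
import numpy as np

def fit_lstwsvm(A, B, c1=1.0, c2=1.0):
    """Fit the two non-parallel hyperplanes of linear LSTWSVM.

    A: (n1, n) samples of class +1; B: (n2, n) samples of class -1.
    Returns (w1, b1, w2, b2) for the hyperplanes x.w + b = 0.
    """
    n1, n2 = A.shape[0], B.shape[0]
    E = np.hstack([A, np.ones((n1, 1))])   # E = [A  e_1]
    F = np.hstack([B, np.ones((n2, 1))])   # F = [B  e_2]
    # Closed-form least squares solutions for the two hyperplanes
    z1 = -np.linalg.solve(F.T @ F + (1.0 / c1) * E.T @ E, F.T @ np.ones(n2))
    z2 = np.linalg.solve(E.T @ E + (1.0 / c2) * F.T @ F, E.T @ np.ones(n1))
    return z1[:-1], z1[-1], z2[:-1], z2[-1]

def classify(x, w1, b1, w2, b2):
    """Assign x to the class whose hyperplane is nearer (perpendicular distance)."""
    d1 = abs(x @ w1 + b1) / np.linalg.norm(w1)
    d2 = abs(x @ w2 + b2) / np.linalg.norm(w2)
    return +1 if d1 <= d2 else -1
```

Each hyperplane hugs its own class while being pushed away from the other class, so a point near the class +1 cluster is much closer to H_1 than to H_2.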

Proposed approach
For convenience, we rearrange a hyperspectral data cube with k rows, l columns and n bands into a matrix X with n rows and h columns, where h is the number of pixels in the image and each column vector x_i represents a pixel.
Ignoring the random noise, the linear spectral mixture model can be described as

x_i = Σ_(j=1)^p r_ji m_j, with r_ji ≥ 0 and Σ_(j=1)^p r_ji = 1,

where p is the number of pure classes, m_j is the endmember of class j, and r_ji is the abundance of class j in pixel x_i. When p > 2, the classification problem is no longer binary but becomes a multi-class problem, for which the convenient "one against rest" strategy is used in this study.
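As a concrete illustration, a noise-free mixed pixel under the linear mixing model is just a convex combination of endmember spectra. The numbers below are made up for illustration only:

```python
import numpy as np

# Hypothetical endmember signatures: 5 bands, 3 endmembers (one per column).
M = np.array([[0.1, 0.9, 0.3],
              [0.2, 0.8, 0.4],
              [0.4, 0.6, 0.5],
              [0.7, 0.3, 0.6],
              [0.9, 0.1, 0.7]])
r = np.array([0.5, 0.3, 0.2])   # abundances: non-negative and sum to one
x = M @ r                       # noise-free mixed pixel x_i = sum_j r_ji * m_j
```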
As shown in Figure 2, when SVM is used for unmixing (USVM), the margin between H_j and H_j̄ is recognized as the mixed region. H_j and H_j̄ are parallel and are formed by the pixels on the class boundaries (the support vectors); they are called boundary hyperplanes. The two regions above H_j and below H_j̄ correspond to the pure pixels (training samples). The abundance r_ji of the mixed pixel x_i can be obtained by formula (13):

r_ji = D_j̄(x_i) / (D_j(x_i) + D_j̄(x_i)), (13)

where D_j(x_i) represents the distance from x_i to H_j, and D_j̄(x_i) represents the distance from x_i to H_j̄.
Inspired by USVM, we propose an unmixing method based on LSTWSVM, named ULSTWSVM. As shown in Figure 3, H_j and H_j̄ are non-parallel and pass through the centroids of class j and class j̄, respectively; they are called classification hyperplanes. The abundance r_ji of the mixed pixel x_i is again obtained by formula (13), where D_j(x_i) and D_j̄(x_i) now denote the distances from x_i to the classification hyperplanes H_j and H_j̄. Once r_ji is obtained by formula (13), x_i can be expressed as

x_i = r_ji m_j + (1 − r_ji) m_j̄, (14)

where m_j and m_j̄ are the endmembers of class j and class j̄.
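The distance-to-abundance conversion of formula (13) can be sketched for the linear case as follows; the hyperplane parameters passed in are illustrative placeholders:

```python
import numpy as np

def abundance_pair(x, w_j, b_j, w_jbar, b_jbar):
    """Formula (13): abundance of class j in pixel x from the perpendicular
    distances to the two classification hyperplanes H_j and H_jbar."""
    d_j = abs(x @ w_j + b_j) / np.linalg.norm(w_j)
    d_jbar = abs(x @ w_jbar + b_jbar) / np.linalg.norm(w_jbar)
    return d_jbar / (d_j + d_jbar)   # close to H_j -> abundance close to 1
```

A pixel lying on H_j gets abundance 1 for class j, a pixel on H_j̄ gets 0, and a pixel equidistant from both gets 0.5.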

Multiple endmembers replacing single endmember in ULSTWSVM
Some traditional unmixing methods may give poor results because they ignore intra-class endmember variability. ULSTWSVM overcomes this shortcoming by using multiple pure pixels (training samples) to model a class. However, formula (14) reintroduces the rigid expression of a single endmember, so the single endmember should be replaced by multiple endmembers (training samples). For binary-class unmixing problems, two pure pixels corresponding to the two classes are mixed with known proportions to generate a mixed pixel. In this manner, the two pure pixels and the corresponding fractional abundances related to each mixed pixel are available (Wang et al., 2013), and each mixed pixel x_i has its own specific endmember combination. For multi-class unmixing problems, let the endmember combination corresponding to x_i be such that x_i = Σ_j r_ji m_j^i. According to the "one against rest" strategy, a multi-class unmixing problem is divided into several binary-class unmixing problems. For the kth (1 ≤ k ≤ p) sub-unmixing model, m_k^i is the first corresponding endmember with abundance r_ki, and the other endmember is written as m_k̄^i with abundance 1 − r_ki.

Application of kernel function
We call the function that computes the inner product of two vectors in the implicitly mapped feature space the kernel function. In the linear mixing model, a mixed pixel is a linear combination of endmembers weighted by the corresponding abundance fractions. Therefore, we use the linear kernel, formula (15), in ULSTWSVM:

K(x_1, x_2) = x_1^T x_2. (15)

The resulting method is named ULLSTWSVM (the unmixing algorithm based on linear LSTWSVM).
Considering scattering, refraction, etc., mixed pixels in real hyperspectral imaging contain many nonlinear factors. Therefore, we use the Gaussian kernel, formula (16), in ULSTWSVM:

K(x_1, x_2) = exp(−‖x_1 − x_2‖² / (2σ²)). (16)

The resulting method is named UNLLSTWSVM (the unmixing algorithm based on nonlinear LSTWSVM).
After introducing the kernel function, formula (13) can be rewritten as

r_ji = D_j̄(x_i) / (D_j(x_i) + D_j̄(x_i)), (17)

where the distance from x_i to the kernel-generated classification hyperplane H_j is D_j(x_i) = |K(x_i^T, C^T) u^(j) + b^(j)| / √((u^(j))^T K(C, C^T) u^(j)), and D_j̄(x_i) is defined analogously for H_j̄.

Algorithm summary
We summarize the main steps of ULSTWSVM as follows.

Input: X = [x_1, x_2, ..., x_h] ∈ R^(n×h), where n is the number of bands and h is the number of mixed pixels; each column vector x_i represents a mixed pixel (i = 1, 2, ..., h).

Step 1: Evenly select pixels from each class to form the training-sample set S = [S_1, S_2, ..., S_p]. For class j, let A = S_j ∈ R^(n×m_1), where m_1 is the number of endmembers in A. According to the "one against rest" strategy, let B = [S_1, ..., S_(j−1), S_(j+1), ..., S_p] ∈ R^(n×m_2), which is taken as the training-sample set of class j̄, where m_2 is the number of endmembers in B. Steps 1-5 are repeated for each class j = 1, 2, ..., p (Repeat1).

Step 2: Compute the kernel matrices K(A^T, C^T) and K(B^T, C^T), where C = [A B].

Step 3: Obtain the classification hyperplane H_j of class j by formula (11).

Step 4: Obtain the classification hyperplane H_j̄ of class j̄ by formula (12).
Step 5: For each mixed pixel x_i (Repeat2), apply formula (17) to calculate the abundance r_ji.
Step 6: According to the sum-to-one constraint, the elements in each column of R = [r_ji] are normalized to [0, 1] to get the abundance matrix R.
Next, we study the complexity of ULSTWSVM. Assume that the inner product of two vectors is the basic operation, O(1). The computational cost of Step 2 is O(m_2(m_1 + m_2) + m_1(m_1 + m_2)). According to formulas (11) and (12), Step 3 and Step 4 each perform one matrix inversion; compared with the cost of Step 2, the cost of Steps 3 and 4 is negligible. The computational cost of Repeat2 is O((m_1 + m_2)h). Since (m_1 + m_2) ≪ h, the per-iteration cost of Repeat1 is O((m_1 + m_2)(m_1 + m_2 + h)) = O((m_1 + m_2)h). Reviewing the above process, we conclude that the computational complexity of ULSTWSVM is dominated by O((m_1 + m_2)hp), where (m_1 + m_2) is the total number of training samples.
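Putting the steps above together, a minimal one-against-rest sketch of ULSTWSVM might look like the following. It is a simplification that assumes the linear kernel; the function name and regularization constant are illustrative.

```python
import numpy as np

def ulstwsvm_abundances(X, classes, c=1.0):
    """One-against-rest linear ULSTWSVM sketch.

    X: (h, n) mixed pixels (rows); classes: list of (m_k, n) training-sample
    arrays, one per pure class. Returns an (h, p) abundance matrix whose
    rows are normalized to sum to one (Step 6).
    """
    p = len(classes)
    R = np.zeros((X.shape[0], p))
    for j in range(p):
        A = classes[j]                                           # class j
        B = np.vstack([classes[k] for k in range(p) if k != j])  # the rest
        E = np.hstack([A, np.ones((A.shape[0], 1))])
        F = np.hstack([B, np.ones((B.shape[0], 1))])
        # Closed-form least squares hyperplanes (Steps 3 and 4)
        z1 = -np.linalg.solve(F.T @ F + (1/c) * E.T @ E,
                              F.T @ np.ones(F.shape[0]))
        z2 = np.linalg.solve(E.T @ E + (1/c) * F.T @ F,
                             E.T @ np.ones(E.shape[0]))
        d_j = np.abs(X @ z1[:-1] + z1[-1]) / np.linalg.norm(z1[:-1])
        d_jbar = np.abs(X @ z2[:-1] + z2[-1]) / np.linalg.norm(z2[:-1])
        R[:, j] = d_jbar / (d_j + d_jbar)            # formula (13) / (17)
    return R / R.sum(axis=1, keepdims=True)          # sum-to-one normalization
```

On two well-separated classes, a pixel at a class centroid receives an abundance near one for that class, while a pixel midway between the centroids receives roughly equal abundances.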

Hyperspectral remote sensing datasets
Four hyperspectral data sets were used in the experiments. The first data set is the Indian Pines data set, gathered by the AVIRIS sensor. The hyperspectral image has a spatial size of 145 × 145 pixels and contains 16 classes of ground objects; 200 spectral bands were used in the experiments. The second data set is the U.S. Geological Survey (USGS) spectral library, which contains 498 spectra, each with 224 bands. The third data set is Salinas-A, collected by the 224-band AVIRIS sensor over Salinas Valley, California. Salinas-A comprises 86 × 83 pixels and includes six classes of ground objects; 204 bands were used in the experiments. The last data set is Pavia University, acquired by the ROSIS sensor. An area of 144 × 144 pixels, mainly covering eight classes of ground objects, is used for testing; 103 spectral bands were used in the experiments.

Unmixing performance evaluation method
The performance of spectral unmixing is usually evaluated by the Root Mean Square Error (RMSE). The RMSE of the abundances corresponding to class j is defined as

RMSE_j = √( (1/h) Σ_(i=1)^h (r_ji − r̂_ji)² ),

where r_ji is the real abundance, r̂_ji is the estimated abundance, and h is the total number of pixels. The Mean RMSE over all classes is defined as

Mean RMSE = (1/p) Σ_(j=1)^p RMSE_j,

where p is the number of classes. In addition, we use the reconstruction error (RE) to evaluate the performance of unmixing:

RE = √( (1/(hn)) Σ_(i=1)^h ‖x_i − x̂_i‖² ),

where x_i and x̂_i are the actual and reconstructed spectra of the ith pixel.
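The two evaluation metrics can be computed directly. The array shapes assumed below (abundances as p × h, spectra as n × h) follow the notation of this section:

```python
import numpy as np

def rmse_per_class(R_true, R_est):
    """RMSE_j over h pixels for each of the p classes; inputs are (p, h)."""
    return np.sqrt(np.mean((R_true - R_est) ** 2, axis=1))

def mean_rmse(R_true, R_est):
    """Mean RMSE: average of the per-class RMSEs."""
    return rmse_per_class(R_true, R_est).mean()

def reconstruction_error(X, X_hat):
    """RE over h pixels and n bands; X and X_hat are (n, h)."""
    n, h = X.shape
    return np.sqrt(np.sum((X - X_hat) ** 2) / (h * n))
```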

Experiment 1
The purpose of the first experiment is to compare traditional unmixing algorithms with ULLSTWSVM (the unmixing algorithm based on linear LSTWSVM) using synthetic data. The traditional unmixing algorithms are PPI/FCLS, VCA/FCLS, MVSA/FCLS and SISAL/FCLS; these first adopt PPI, VCA, MVSA or SISAL to extract the endmembers, and then use the Fully Constrained Least Squares (FCLS) method to unmix the mixed pixels. In addition, the NMF algorithm (with initial endmembers extracted by VCA) and the ICA algorithm are included in the comparison.
The abundance maps are shown in Figure 4. The abundance maps of VCA/FCLS and ICA deviate obviously from the real one, which means the performance of VCA/FCLS and ICA is poor. The abundance maps of PPI/FCLS, MVSA/FCLS, SISAL/FCLS and NMF are similar to the real one, so the unmixing results of these algorithms are acceptable. The abundance map of ULLSTWSVM is the closest to the real one, especially in the light-coloured area in the centre of the map; compared with the other algorithms, ULLSTWSVM produces more accurate abundance estimation. The computational time is also listed in Table 1. PPI/FCLS, VCA/FCLS, MVSA/FCLS and SISAL/FCLS each take more than 2 s, at least about 40 times the time required by ULLSTWSVM; these algorithms are more time consuming because they first extract the endmembers and then obtain the abundances. NMF takes 2.4180 s, much longer than the 0.0556 s of ULLSTWSVM, due to the complex iteration of NMF. Apart from ULLSTWSVM, ICA has the shortest computational time, 1.4417 s, but the RMSE of ICA is 0.1657 larger than that of ULLSTWSVM. To sum up, the proposed ULLSTWSVM outperforms the comparison algorithms.

Experiment 2
In this experiment, we use three classes of endmembers to synthesize mixed pixels to evaluate the unmixing performance of ULLSTWSVM. The method of abundance generation follows Miao and Qi (2007): three 25 × 20 abundance matrices are generated by a low-pass averaging filter. 500 pixels are randomly selected from each of the classes Corn-notill, Grass/Trees and Woods in the Indian Pines data set as endmembers, which are used to generate the synthetic pixels.
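A sketch of generating spatially smooth, sum-to-one abundance maps in the spirit of this low-pass filtering procedure is given below. The window size and seed are illustrative, and this is not the exact procedure of Miao and Qi (2007):

```python
import numpy as np

def synth_abundances(p, rows, cols, win=5, seed=0):
    """Generate p spatially smooth abundance maps that are non-negative
    and sum to one at every pixel."""
    rng = np.random.default_rng(seed)
    maps = rng.random((p, rows, cols))          # random fields, one per class
    k = win // 2
    pad = np.pad(maps, ((0, 0), (k, k), (k, k)), mode="edge")
    smooth = np.zeros_like(maps)
    for dr in range(win):                       # moving-average (low-pass) filter
        for dc in range(win):
            smooth += pad[:, dr:dr + rows, dc:dc + cols]
    smooth /= win * win
    return smooth / smooth.sum(axis=0, keepdims=True)   # sum-to-one per pixel
```

Mixing these maps with endmember spectra then yields a synthetic image with a known abundance ground truth.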
In the experiment, we compare Multiple Endmember Spectral Mixture Analysis (MESMA), introduced by Roberts et al. (1998), with our proposed ULLSTWSVM. MESMA is a representative multiple-endmember unmixing algorithm and has been widely used in practice. 15 pixels are evenly selected from the Corn-notill, Grass/Trees and Woods data in Indian Pines (5 pixels for each class) to compose the reference spectral library, and the unmixing results of MESMA are obtained. For a fair comparison, the pixels in the reference spectral library are used as training samples, and the unmixing results of ULLSTWSVM are obtained. The results show that when the number of reference spectra is 15, the RMSEs of ULLSTWSVM for the three classes are obviously smaller than those of MESMA, and the Mean RMSE of ULLSTWSVM is 0.0912 less than that of MESMA; the RE of ULLSTWSVM is 0.005 less than that of MESMA. When the number of reference spectra is 30 or 45, the RMSEs of ULLSTWSVM for Corn-notill and Grass/Trees are smaller than those of MESMA, the RMSE of ULLSTWSVM for Woods is slightly larger than that of MESMA, and the Mean RMSEs of ULLSTWSVM are less than those of MESMA. When the number of reference spectra is 30, the RE of ULLSTWSVM is 0.0015 less than that of MESMA; when the number is 45, the RE of MESMA is 0.0049 less than that of ULLSTWSVM. Table 2 lists the computational time of the two algorithms. MESMA carries a heavy computational load, and its computational time increases rapidly with the number of reference spectra. Compared with MESMA, ULLSTWSVM has a much shorter computational time: when the number of reference spectra is 45, the computational time of ULLSTWSVM is only 1/22,000 of that of MESMA.
In summary, ULLSTWSVM significantly outperforms MESMA. This is because MESMA tests all possible combinations of reference spectra to unmix each mixed pixel: the performance of each combination is evaluated, and the optimal one is chosen. The larger the number of reference spectra, the better the unmixing performance, but the computational load becomes too heavy for practical implementation. ULLSTWSVM takes the reference spectra as training samples to obtain the classification hyperplanes, and the abundances are then obtained by calculating the distances from the mixed pixels to these hyperplanes. Compared with MESMA, ULLSTWSVM offers high accuracy at a low computational burden.

Experiment 3
The purpose of this experiment is to use the USGS spectral library to synthesize mixed pixels to evaluate ULLSTWSVM, URVM (the unmixing algorithm based on the relevance vector machine) and ULLSSVM (the unmixing algorithm based on the linear Least Squares Support Vector Machine). Three classes of spectra, Olivine, Hematite and Muscovite, were chosen from the USGS spectral library; the numbers of within-class spectra were 17, 12 and 13, respectively. Abundances selected from the abundance matrices generated in Experiment 2 and three randomly selected spectra (one from each class) were linearly mixed to generate each mixed pixel, and 500 mixed pixels were synthesized. 6 spectra were evenly selected from each class as training samples. Figure 5 shows the abundance maps of the three classes. Each row represents the abundances of one class: from top to bottom, Olivine, Hematite and Muscovite. Each column stands for a different algorithm: from left to right, the real abundance, URVM, ULLSSVM and ULLSTWSVM. As can be observed, the abundance map of URVM is quite different from the real abundance map, which means the unmixing results of URVM are poor. The abundance map of ULLSSVM has a certain similarity with the real abundance map, but the green region (where the abundances of the three classes are equal, 0.333) is blurred. Compared with the abundance maps of the former two algorithms, the abundance maps of ULLSTWSVM are the closest to the real ones.
The quantitative results of the three algorithms are tabulated in Table 3. We can see clearly that the RMSEs of ULLSTWSVM are smaller than those of URVM and ULLSSVM.

Experiment 4
In this experiment, we evaluated the unmixing performance of the proposed algorithm using the real hyperspectral data Salinas-A. The comparison algorithms include URVM, ULLSSVM and ULLSTWSVM. At the same time, the Gaussian kernel function is introduced into LSSVM (Least Squares Support Vector Machine) and LSTWSVM, and the resulting unmixing algorithms are named UNLLSSVM (the unmixing algorithm based on nonlinear LSSVM) and UNLLSTWSVM (the unmixing algorithm based on nonlinear LSTWSVM), respectively. Three classes of pixels, Lettuce_romaine_5wk, Lettuce_romaine_7wk and Corn_senesced_green_weeds, were selected to be unmixed. 68, 76 and 39 pixels (5% of the total number of each class) were evenly selected from each class as training samples. The results are shown in Figure 6 and Table 4.
Examining the maps, Figure 6(c) is found to be more blurred than the other abundance maps, especially for Lettuce_romaine_5wk and Corn_senesced_green_weeds. In Figure 6(b), there are some white speckles in the black area: some pixels belonging to Corn_senesced_green_weeds are unmixed as Lettuce_romaine_7wk. Figure 6(e,f) are similar and close to the true abundance maps, and their white areas are clearer than those of Figure 6(d). Table 4 reports the unmixing results of the five algorithms. ULLSTWSVM performs better than ULLSSVM: its RMSEs and Mean RMSE are 0.095, 0.0538, 0.1068 and 0.0877 lower than those of ULLSSVM. ULLSTWSVM also outperforms URVM: it obtains smaller RMSEs when unmixing Lettuce_romaine_5wk and Lettuce_romaine_7wk, although URVM obtains the smallest RMSE when unmixing Corn_senesced_green_weeds, and the Mean RMSE of ULLSTWSVM is smaller than that of URVM. Observing the REs, the RE of ULLSTWSVM is smaller than those of ULLSSVM and URVM, and the RE of UNLLSTWSVM is smaller than that of UNLLSSVM.
After introducing the Gaussian kernel function, UNLLSTWSVM and UNLLSSVM greatly improve the unmixing performance. Furthermore, UNLLSTWSVM outperforms the compared methods in terms of Mean RMSE and RE.

Experiment 5
In the last experiment, we evaluate the unmixing performance of the proposed algorithm using the real hyperspectral data Pavia University. The comparison algorithms include URVM, ULLSSVM, ULLSTWSVM, UNLLSSVM and UNLLSTWSVM. In the experiment, we unmix four classes of pixels over Pavia University: Meadows, Metal sheets, Soil and Bitumen. 79, 73, 403 and 79 pixels (10% of the total number of each class) are evenly selected from each class as training samples. The results are shown in Figure 7 and Table 5.
Observing Figure 7, we can find that the abundance maps of URVM are quite different from the reference abundance maps. The white areas in the abundance maps of ULLSSVM are clearer than those in the abundance maps of URVM. The abundance maps of ULLSTWSVM are similar to those of UNLLSSVM.
Among all the abundance maps, the abundance maps of UNLLSTWSVM are the closest to the reference ones. Quantitative evaluation is shown in Table 5 which lists RMSEs, Mean RMSEs and REs of unmixing algorithms. Obviously, URVM results in a lower performance than the other four algorithms. The performance of ULLSTWSVM and UNLLSSVM is similar, and their Mean RMSEs are obviously less than that of ULLSSVM. For endmembers Meadows, Soil and Bitumen, UNLLSTWSVM obtains the minimum RMSEs. In addition, UNLLSTWSVM produces the minimum Mean RMSE and RE, which are 0.1013 and 0.0689, respectively.

Discussion
In Experiment 1, we evaluated ULLSTWSVM and some representative single-endmember unmixing methods (Figure 4 and Table 1). ULLSTWSVM outperforms the compared methods because the single-endmember unmixing methods use a fixed endmember to represent a pure class, ignoring intra-class endmember variability, whereas ULLSTWSVM uses multiple endmembers (training samples) to model a pure class, so the unmixing performance is obviously improved. Next, we discuss data imbalance through Experiment 1. Different numbers of samples are evenly selected from Soybean-min to form class j, and 10 samples are evenly selected from Woods to form class j̄. These two classes of data are then used to perform ULLSTWSVM; the results are listed in Table 6. We can see clearly that when the number ratio is less than 50, there is no significant change in RMSE, which shows that data imbalance has little effect on ULLSTWSVM. When the number ratio is greater than 50, RMSE becomes larger as the ratio increases: as the number of class j samples grows, the within-class endmember variability of class j becomes larger, which in turn leads to an increase in RMSE.
In Experiment 2, we evaluated ULLSTWSVM and MESMA (Table 2). In MESMA, all endmember combinations are attempted, the optimal combination is found according to the minimum reconstruction error criterion, and the abundances are obtained. But "the minimum reconstruction error" is purely mathematical: the selected endmembers and abundances may still be quite different from the true ones. ULLSTWSVM obtains the abundances by calculating the distances from the mixed pixel to the classification hyperplanes, which reflects the contribution of all endmembers (training samples) to the unmixing; it is therefore a more reasonable unmixing model. In addition, it is easy to see that the unmixing results of MESMA are greatly affected by the number of reference spectra. When the number of reference spectra increases from 15 to 30, the Mean RMSE of MESMA decreases significantly; however, further increasing the number of reference spectra brings only limited improvement while the computational time rises sharply. Figure 8 shows the change of the Mean RMSE of ULLSTWSVM with the number of training samples. When the number of training samples is greater than 45, the Mean RMSE curve is stable. This shows that ULLSTWSVM is a small-sample learning method: even when trained with a small number of samples, ULLSTWSVM has strong generalization ability. The results of Experiments 4 and 5 reveal that introducing the Gaussian kernel function improves the unmixing performance on real hyperspectral images (Figure 6, Figure 7, Table 4 and Table 5): UNLLSSVM outperforms ULLSSVM, and UNLLSTWSVM outperforms ULLSTWSVM. The introduction of the nonlinear kernel function mainly addresses two problems: (1) in the original feature space, the endmember classes are linearly inseparable; (2) nonlinear factors, such as scattering and reflection, affect the results of unmixing.

Conclusion
In this paper, to incorporate endmember variability, a novel unmixing method based on Least Squares Twin Support Vector Machines, namely ULSTWSVM, is proposed for hyperspectral remote sensing imagery. Depending on the kernel function used, ULSTWSVM takes two forms: ULLSTWSVM (the unmixing algorithm based on linear Least Squares Twin Support Vector Machines) and UNLLSTWSVM (the unmixing algorithm based on nonlinear Least Squares Twin Support Vector Machines). Five experiments on four hyperspectral data sets were carried out to verify the performance of ULSTWSVM: the first three experiments, on synthetic data, evaluated ULLSTWSVM, and the last two, on real remote sensing data, evaluated UNLLSTWSVM as well as ULLSTWSVM. The experimental results show that ULSTWSVM delivers better performance than the other unmixing algorithms tested.
In future work, we will investigate unmixing uncertainty, which exists in all kinds of unmixing models; ULSTWSVM can serve as a good basis for unmixing uncertainty analysis.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by National Natural Science Foundation of China (61675051).