Optimal segmentation and improved abundance estimation for superpixel-based Hyperspectral Unmixing

ABSTRACT Superpixel-based hyperspectral unmixing (HU) can effectively reduce the influence of spectral variability on unmixing performance. Within the superpixel-based HU framework, this study proposes a segmentation-scale determination method to improve the accuracy of endmembers and a fully constrained least squares method based on a distance strategy (D-FCLS) to improve the efficiency of abundance estimation. The segmentation-scale determination method establishes a division criterion that groups segmented images of similar quality into the same segmentation scale, and the optimal segmentation scale is then selected according to the actual situation of the hyperspectral image. In abundance estimation, the distance strategy is applied to fully constrained least squares (FCLS) by exploiting the spatial relationship between endmembers and the mixed pixel. The proposed methods are evaluated on synthetic and real datasets. Quantitative and qualitative evaluation on all datasets verifies the validity of the segmentation-scale determination method. In terms of abundance estimation, compared with FCLS, D-FCLS improves the efficiency by more than 10.30% on the synthetic dataset and 18.71% on the real dataset. In addition, the proposed abundance estimation method and unsupervised superpixel-based HU method are superior to the other comparison methods.


Introduction
Hyperspectral images (HSIs) contain hundreds of continuous spectral bands and spatial information. HSIs have been widely used in remote-sensing applications such as space exploration (Kouyama et al., 2016), target detection (Nasrabadi, 2014), agricultural production (F. F. Wang et al., 2018), and disease diagnosis (Shahidi et al., 2013). However, due to limited spatial resolution, microscopic material mixing, multiple scattering, and other reasons, many mixed pixels in HSIs seriously affect the further exploration of information (Shaw & Burke, 2003). Hyperspectral unmixing (HU) aims to decompose HSI data into pure spectra of constituent substances (endmembers) and their corresponding proportions (abundances), which can effectively identify different substances at the subpixel level and is beneficial to improving the accuracy of hyperspectral technology (Bioucas-Dias et al., 2012).
In recent years, linear mixed models (LMMs) have attracted much attention in HU due to their clear physical meaning, simplicity, and effectiveness. LMM-based methods are mainly classified into statistical, sparse regression-based, and geometric techniques (Bioucas-Dias et al., 2012; 2019a). Among them, statistical techniques use the statistical properties of HSI to obtain endmembers and abundances (S. Khoshsokhan et al., 2019b; Lee & Seung, 2000; H. C. Li et al., 2021; Sigurdsson et al., 2017). However, statistical techniques come at a price: their computational complexity is higher, and the statistical distribution that the user defines as a priori knowledge may not conform to the nature of the endmembers (L. L. Wang et al., 2021). Sparse regression-based techniques perform semi-supervised HU by finding the optimal subset of endmembers from a spectral library (Iordache et al., 2011; J. J. Li et al., 2019; Sun et al., 2013; Y. Xu, 2020). Because the spectral libraries and the HSIs are acquired under different conditions, the standard spectra in the libraries differ from the actual spectra, which limits the applicability of sparse regression-based techniques.
Geometric techniques can extract endmembers directly from realistic scenes to maintain the consistency of endmembers and HSI data (Guo et al., 2009). Traditional endmember extraction (EE) methods extract endmembers within the entire image, using a single endmember to represent a pure class. EE methods include spectral feature-based methods such as pixel purity index (PPI; Boardman, 1993), vertex component analysis (VCA; Nascimento & Dias, 2005), N-FINDR (Michael, 1999), and iterative error analysis (IEA; Neville, 1999), and spatial-spectral-based methods such as automatic morphology EE (AMEE; Plaza et al., 2002), spatial-spectral EE (SSEE; Rogge et al., 2007), and spatial purity EE (SPEE; Mei et al., 2010). However, traditional EE ignores the spectral variability caused by different light, atmospheric, and topographic conditions, which affects the accuracy of HU (Ben Somers et al., 2011). To solve this problem, many scholars have proposed multiple endmember spectral mixture analysis (MESMA), which uses multiple endmembers to represent a pure class (Cheng et al., 2021; B. Somers et al., 2012; Velez-Reyes & Alkhatib, 2019; M. M. Xu et al., 2015; X. X. Xu et al., 2018). This approach can extract endmembers that occupy a small area or have low relative contrast. Meanwhile, using multiple endmembers to represent the spectra of a substance effectively captures the variability of the spectra.
Superpixel segmentation plays a positive role in MESMA, but the number of segments significantly affects the results of EE. If the number of segments is too small, some endmembers cannot be extracted effectively. Conversely, if the number of segments is too large, abnormal endmembers can be extracted from mixed regions. Currently, object-based image segmentation methods are usually used to obtain the number of segments (Blaschke et al., 2014). Jiarui and Miguel (2017) proposed estimating the optimal number of segments by minimizing the correlation between the smoothed image and the original image. Kavzoglu and Tonbul (2017) determined the optimal number of segments using five metrics: under-segmentation, over-segmentation, potential segmentation error, number-of-segments ratio, and Euclidean distance 2. Alkhatib and Velez-Reyes (2019) proposed evaluating the number of segments by the percentage of homogeneous regions among all regions. However, these techniques focus only on the individual properties of regions and ignore the overall segmentation quality. In addition, in realistic situations the segmentation state changes from under-segmentation to optimal segmentation and finally to over-segmentation as the number of segments increases. The above methods do not consider the connection and difference between different segmentation maps.
Abundance estimation is one of the most critical steps in MESMA and directly affects the accuracy and efficiency of HU. Many researchers have proposed applying various constraints to the abundance to obtain more accurate and reliable results. Commonly used abundance estimation methods include non-negative constrained linear least squares (NCLS; Bro & De Jong, 1997), fully constrained least squares (FCLS; Heinz & Chein, 2001), and simplex projection unmixing (SPU; Heylen et al., 2011). In MESMA, FCLS is often used for abundance estimation. B. Somers et al. (2012) proposed two methods for constructing the endmember matrix in FCLS, using the average spectrum and iterative combination. Velez-Reyes and Alkhatib (2019) suggested constructing the endmember matrix in FCLS from all the extracted endmembers, with the abundance of a class being the sum of the abundances of the endmembers belonging to this class. Although FCLS applies some constraints on the spectral features in the literature mentioned above, it ignores the spatial relationship between endmembers and mixed pixels. At the same time, MESMA changes the representation of a substance from a single endmember to multiple endmembers. This inevitably increases the number of endmembers, which significantly reduces the efficiency of abundance estimation.
Given this, this study proposes a segmentation scale determination method based on segmentation quality evaluation to improve the accuracy of endmembers and a distance strategy-based fully constrained linear least squares (D-FCLS) method to improve the efficiency of abundance estimation in MESMA. The specific contributions of this study are summarized as follows.
(1) The proposed segmentation-scale determination method is used to determine the optimal number of segments. In this study, segmentation quality is used to evaluate the segmented images with different numbers of segments. The segmentation quality is used to explore the connection and difference between different segmented images. The segmentation state of the segmented image is analyzed, and all the segmented images are divided into different segmentation scales. The optimal segmentation scale is determined according to the actual situation of HSI, and thus the optimal number of segments is determined.
(2) The proposed FCLS based on distance strategy is used for abundance estimation in MESMA. In this study, the distance strategy is applied to FCLS using the spatial location relationship between the extracted endmembers and the mixed pixels to reduce the number of endmembers involved in the calculation in FCLS. This approach can improve operational efficiency while ensuring the accuracy of abundance estimation.
The proposed methods aim to improve the accuracy and efficiency of HU in remote sensing applications. Figure 1 shows the flowchart of the unsupervised HU method based on superpixels proposed in this study. First, the HSI is segmented by entropy rate superpixel segmentation, and the proposed segmentation-scale determination method obtains the optimal number of segments to achieve the optimal superpixel segmentation. Second, endmembers are extracted in each region, and the extracted endmembers are clustered into endmember classes. Finally, the abundance of the whole image is calculated using the D-FCLS proposed in this study.
Figure 1. Unsupervised superpixel-based HU. The content in the box is the main work of this study.

Superpixel segmentation
Entropy rate superpixel segmentation
Entropy rate superpixel segmentation (ERS) is one of the most widely used segmentation algorithms in HSI processing (Liu et al., 2011; Tang et al., 2019). ERS uses an objective function that combines the entropy rate of a random walk with a balancing term for superpixel segmentation. The entropy rate favors the formation of compact and homogeneous clusters, while the balancing term encourages clusters with similar sizes.

Segmentation scale determination method
An entropy-based objective evaluation method can be used to evaluate the relative segmentation quality of a segmented image (Zhang et al., 2003). The method combines region entropy and layout entropy to evaluate the relative segmentation quality of the overall segmented image. For HSI, the values of the first principal component obtained by principal component analysis (PCA) are used as the feature values of the pixels in this study (Shlens, 2014). The entropy $H$ of a segmented image $I$ with $k$ segments is defined in equation (1), where $N_j$ is the number of pixels in the $j$th region, $N_I$ is the number of pixels in the image, $X_j$ is the set of feature values of all pixels in the $j$th region, and $p_j(x)$ is the prior probability of $x$.
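The entropy described above can be sketched in a few lines of Python. The sketch assumes the form used by Zhang et al. (2003), that is, the size-weighted region entropy $-\sum_{j}(N_j/N_I)\sum_{x \in X_j} p_j(x)\log p_j(x)$ plus the layout entropy $-\sum_{j}(N_j/N_I)\log(N_j/N_I)$. The function name, the histogram estimate of $p_j(x)$ with `n_bins` bins, and the natural logarithm are illustrative assumptions rather than details given in this study.

```python
import math
from collections import Counter, defaultdict


def segmentation_entropy(labels, values, n_bins=32):
    """Region entropy plus layout entropy of one segmentation map.

    labels: region index of every pixel (flat list).
    values: first-principal-component value of every pixel (flat list).
    n_bins: hypothetical number of histogram bins used to estimate p_j(x).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant input

    # Group the discretized values by region.
    regions = defaultdict(list)
    for label, value in zip(labels, values):
        bin_index = min(int((value - lo) / width), n_bins - 1)
        regions[label].append(bin_index)

    n_total = len(labels)
    region_entropy = 0.0
    layout_entropy = 0.0
    for bins in regions.values():
        n_j = len(bins)
        weight = n_j / n_total
        # Entropy of the value distribution inside region j.
        h_j = -sum((c / n_j) * math.log(c / n_j) for c in Counter(bins).values())
        region_entropy += weight * h_j
        layout_entropy -= weight * math.log(weight)
    return region_entropy + layout_entropy
```

Evaluating this function for every candidate number of segments yields one entropy value per segmentation map, which is the input of the steps that follow.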
With the above method, the relative segmentation quality of different segmented images can be judged by the $H$ corresponding to different $k$. In practice, the segmentation state usually changes from under-segmentation to optimal segmentation and finally to over-segmentation as $k$ increases. Meanwhile, the results of Hui et al. (2003) and Zhang et al. (2008) show that the entropy-based objective evaluation method is biased when evaluating segmented images with a small $k$. Taking both observations into account, this study proposes a segmentation-scale determination method based on the relative segmentation quality evaluation. The specific steps are as follows.
Step 1: Set the minimum number of segments to $n_{\min}$ and the maximum number of segments to $n_{\max}$, and use equation (1) to obtain the $H$ of each $k$ ($n_{\min} \le k \le n_{\max}$).
Step 2: From the above analysis of $H$ and the segmentation state, the mean $\mu$ and variance $\sigma^{2}$ of $H$ are used to set four intervals by the interval convergence judgment method. When $H > \mu + \sigma$ or $H < \mu - \sigma$, the $k$ corresponding to $H$ belongs to the first interval.
Step 3: All $H$ are standardized by the Z-score:
$$H^{*} = \frac{H - \mu}{\sigma} \quad (2)$$
Step 4: By the definition in Step 2, the segmentation-scale nodes are obtained from $H^{*}$ according to equation (3), where $n_1$, $n_2$, and $n_3$ are the nodes between different segmentation scales. From the definition of entropy, a smaller entropy indicates higher consistency within the regions. Therefore, the $k$ corresponding to the smallest $H$ at each segmentation scale is chosen as the optimal number of segments at that scale.
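Steps 3 and 4 can be illustrated with a minimal Python sketch. It standardizes the entropies with the Z-score and then, for each segmentation scale, returns the number of segments with the smallest entropy. The rule that produces the nodes is not reproduced here, so `nodes` is taken as a given input; the function names, the use of the population standard deviation, and the convention that a node closes the scale below it are assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardize(h_values):
    """Z-score standardization of the entropies."""
    mu = mean(h_values)
    sigma = pstdev(h_values)  # assumption: population standard deviation
    return [(h - mu) / sigma for h in h_values]


def best_k_per_scale(ks, h_values, nodes):
    """Return the k with the smallest H inside each segmentation scale.

    ks: candidate numbers of segments in increasing order.
    h_values: entropy H for each k.
    nodes: scale boundaries (n1, n2, n3), taken as given; a scale covers
           the k values up to and including its upper node (assumption).
    """
    bounds = list(nodes) + [ks[-1]]
    best = []
    lower = ks[0] - 1
    for upper in bounds:
        candidates = [(h, k) for k, h in zip(ks, h_values) if lower < k <= upper]
        if candidates:
            best.append(min(candidates)[1])
        lower = upper
    return best
```

For example, with candidate values 5 to 12 and hypothetical nodes (6, 8, 10), the function returns one candidate per scale, from which the optimal scale is then chosen according to the image.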
Unlike other methods for determining the optimal number of segments, this study uses an entropy-based objective evaluation method to evaluate the relative segmentation quality of segmented images. The relationship between segmented images with different k is considered. A segmentation scale division criterion is established to divide segmented images with similar quality into the same segmentation scale. The best segmentation scale is selected according to the actual situation of HSIs, and then the optimal number of segments is determined.

Endmember extraction in the region
Endmembers are usually located in homogeneous regions of HSI, while mixed pixels are usually located in transition or non-homogeneous regions (Zortea & Plaza, 2009). Based on this theory, M. M. Xu et al. (2015) proposed the homogeneity index (HI) to measure the similarity of pixels in the neighborhood. The smaller the HI is, the more likely the pixel spectrum is to be an endmember. The most common spectral similarity indicators are Euclidean distance (ED), spectral angle distance (SAD), spectral correlation measure (SCM), and spectral information divergence (SID; Chein, 2000). Since ED is sensitive to amplitude and SAD is sensitive to shape, this study chooses the combination of ED and SAD as the indicator to judge the spectral similarity.
$$ED(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2} \quad (4)$$
where $x_i$ is the value of the $i$th band of pixel $x$ and $y_i$ is the value of the $i$th band of pixel $y$ adjacent to $x$.
$$SAD(x, y) = \arccos\left(\frac{x^{T} y}{\lVert x \rVert \, \lVert y \rVert}\right) \quad (5)$$
where $x$ is the spectral feature of pixel $x$ and $y$ is the spectral feature of pixel $y$ adjacent to $x$.
In the definition of HI (equation (6)), $x_m$ is the set of neighboring pixels of pixel $x$. This study sets $m = 8$, that is, the comparison range is the 8-neighborhood of each pixel. The spectral features of the pixels with minimum HI in each region are taken as endmembers. The EE based on HI extracts the spectral feature and preserves the spatial information, which provides conditions for optimizing the abundance estimation.
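The two similarity measures that enter the HI can be written as a short Python sketch. The function names are illustrative, and the way ED and SAD are combined into the HI is not reproduced here because that formula is not recoverable from the text. The clamping of the cosine is a numerical safeguard added for this sketch.

```python
import math


def euclidean_distance(x, y):
    """ED between two spectra given as equal-length sequences of band values."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def spectral_angle_distance(x, y):
    """SAD in radians between two spectra."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)
```

Scaling a spectrum by a constant changes its ED to the original but leaves the SAD at approximately zero, which is why the two measures are complementary.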

Endmember class extraction
According to the extracted endmembers, endmembers are divided into different endmember classes using the k-means method (different endmember classes represent different substances; Bioucas-Dias & Nascimento, 2008). The spectral variability can be better maintained by using endmember classes to represent the spectral features of substances (Velez-Reyes & Alkhatib, 2019).
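As an illustration of this clustering step, the following is a minimal k-means (Lloyd's algorithm) sketch in plain Python. The Euclidean distance between spectra and the initialization from the first `n_classes` spectra are assumptions chosen to keep the example short and deterministic; the study does not state these details.

```python
import math


def kmeans(spectra, n_classes, n_iter=100):
    """Cluster endmember spectra into classes with Lloyd's algorithm.

    Initial centers are the first n_classes spectra, a simplification
    used here to keep the sketch deterministic.
    """
    centers = [list(s) for s in spectra[:n_classes]]
    labels = [0] * len(spectra)
    for _ in range(n_iter):
        # Assignment step: nearest center in Euclidean distance.
        new_labels = [
            min(range(n_classes), key=lambda c: math.dist(s, centers[c]))
            for s in spectra
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(n_classes):
            members = [s for s, l in zip(spectra, new_labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers
```

The returned labels group the extracted endmembers into classes, so each substance is represented by several spectra instead of one.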

Fully constrained least squares based on distance strategy
In the superpixel-based HU method, FCLS obtains abundance by minimizing the sum of squared errors between the original and reconstructed spectral features while satisfying the abundance sum-to-one constraint (ASC) and the abundance non-negativity constraint (ANC). The method focuses on the spectral features and ignores the spatial relationship between endmembers and mixed pixels. This study uses a synthetic image to visualize the importance of spatial features in abundance estimation. Figure 2 shows a synthetic HSI of 50 × 50 pixels built from the spectral features of four substances (Grass, Jarosite, Pyrite, Topaz) extracted from the USGS spectral library (Clark et al., 2007). Each substance was randomly distributed and occupied a specific space in the image. Each substance used multiple spectral features to represent the variability of the spectra (Halimi et al., 2016). A 7 × 7 spatial low-pass filter was used to generate the mixed image, with the ASC imposed on each pixel.
Figure 2. False-color image of synthetic HSI. (a) Unmixed image, (b) mixed image, (c) reference spectral features.
ERS, EE, and k-means were used to extract endmember classes of the synthetic HSI. Figure 3 shows the spatial information of endmembers, and different marker symbols represent different classes. FCLS uses equation (7) to estimate the abundance of mixed pixels (red box). Figure 3(a) takes the Pyrite class as an example to show the endmembers involved in the calculation ("▽" marks endmembers of the Pyrite class).
Figure 3. Illustration of abundance estimation. (a) Endmembers used in FCLS, (b) the endmember used in D-FCLS.
$$\min_{s_j} \lVert y_j - A s_j \rVert_2^2 \quad \text{s.t.} \quad s_j \ge 0, \; s_j^{T}\mathbf{1} = 1 \quad (7)$$
where $y_j$ is the spectral feature of pixel $j$, the endmember matrix $A$ contains all the endmembers, $s_j$ is the abundance vector of pixel $j$, $s_j \ge 0$ is the ANC, and $s_j^{T}\mathbf{1} = 1$ is the ASC. As shown in Figure 3, each mixed pixel in the HSI is a mixture of its adjacent pixels. Therefore, selecting the endmembers closer to the mixed pixel for abundance estimation is more realistic, as shown in Figure 3(b). This approach preserves the accuracy of abundance estimation and reduces the number of endmembers involved in the calculation, thereby improving the efficiency of abundance estimation. Based on the above analysis, fully constrained least squares based on a distance strategy (D-FCLS) is proposed in this study, as shown in equation (8).
$$\min_{s_j} \lVert y_j - A_j s_j \rVert_2^2 \quad \text{s.t.} \quad s_j \ge 0, \; s_j^{T}\mathbf{1} = 1 \quad (8)$$
where $A_j$ is a subset of $A$ and the elements of $A_j$ satisfy the distance strategy.
To explore the impact of the distance range on the accuracy and efficiency of abundance estimation, D-FCLS is divided into D-FCLS with a minimum distance strategy (MD-FCLS) and D-FCLS with a top p distance strategy (PD-FCLS). The $A_j$ of MD-FCLS contains, for each class, the endmember nearest to the mixed pixel. This method minimizes the number of endmembers involved in the calculation. The $A_j$ of PD-FCLS contains, for each class, the top p fraction of endmembers nearest to the mixed pixel, where p is the selection ratio of the distance range. In PD-FCLS, the number of endmembers involved in the calculation is reduced to p times the original. The effects of different distance strategies on the accuracy and efficiency of abundance estimation are tested by setting different p.
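The selection of endmembers under the two distance strategies can be sketched as follows. This is only an illustration of how $A_j$ might be assembled, not the FCLS solver itself. The function name, the Euclidean distance between pixel coordinates, and the rounding rule (round up, keep at least one endmember per class) are assumptions that the text does not specify.

```python
import math
from collections import defaultdict


def select_endmembers(pixel_pos, endmembers, p=None):
    """Pick the endmembers that enter A_j for one mixed pixel.

    pixel_pos: (row, col) of the mixed pixel.
    endmembers: list of (class_id, (row, col)) for every extracted endmember.
    p: None selects the single nearest endmember per class (MD-FCLS);
       a ratio in (0, 1] keeps the nearest fraction p per class (PD-FCLS).
    Returns the indices of the selected endmembers in ascending order.
    """
    by_class = defaultdict(list)
    for index, (class_id, pos) in enumerate(endmembers):
        by_class[class_id].append((math.dist(pixel_pos, pos), index))

    selected = []
    for candidates in by_class.values():
        candidates.sort()
        # Assumption: round up and keep at least one endmember per class.
        count = 1 if p is None else max(1, math.ceil(p * len(candidates)))
        selected.extend(index for _, index in candidates[:count])
    return sorted(selected)
```

With `p=1` every endmember is kept, which corresponds to ordinary FCLS; smaller values of `p` shrink the endmember matrix and therefore the size of the constrained least squares problem solved for each pixel.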

Synthetic dataset
Five different types of synthetic HSIs (Rational, Exponential, Matern, Spheric, and Legendre) were generated using five endmembers extracted from the USGS spectral library and the Hyperspectral Image Synthesis Toolbox (Hyperspectral Imagery Synthesis (EIAs) toolbox). Each synthetic HSI consists of 128 × 128 pixels and 431 spectral bands ranging from 350 to 2500 nm. Gaussian noise was added to each synthetic HSI at signal-to-noise ratios (SNRs) of 20 dB, 40 dB, 60 dB, and 80 dB to evaluate the performance of the algorithm at different SNRs.
Figure 4. False-color synthetic HSIs and reference spectral features.

Real dataset
HYDICE Urban is one of the most widely used hyperspectral datasets in HU studies (Zhu, 2017). The scene has 307 × 307 pixels at 2 m resolution and 210 bands between 400 nm and 2500 nm at a spectral resolution of 10 nm. After removing channels 1-4, 76, 87, 101-111, 136-153, and 198-210 due to dense water vapor and atmospheric effects (a standard preprocessing step for HU), 162 channels are retained. The ground truth has three versions, containing 4, 5, and 6 endmembers, respectively. In this study, the version with 5 endmembers was chosen. The five endmembers are asphalt road, grass, tree, roof, and dirt. Figure 5 shows the reference data of HYDICE Urban.
Figure 5. HYDICE Urban. (a) False-color image, (b) reference classification map, (c) reference spectral features.
Pavia University is a scene acquired with the ROSIS sensor during a flight over Pavia, northern Italy. The image contains 610 × 340 pixels with 103 bands from 420 nm to 860 nm and a spectral resolution of 4 nm. The image contains nine endmembers: Asphalt, Meadows, Gravel, Trees, Metal Sheet, Bare Soil, Bitumen, Brick, and Shadow. Since Gravel and Brick have similar spectra, as do Asphalt and Bitumen, this study combines Gravel and Brick into a single information class and Asphalt and Bitumen into another information class. Furthermore, since Meadows and Bare Soil are highly mixed in the image, they are combined into another information class. This process is a common preprocessing step in unsupervised unmixing (Velez-Reyes & Alkhatib, 2019). Figure 6 shows the reference data of Pavia University.
Figure 6. Pavia University. (a) False-color image, (b) reference classification map, (c) reference spectral features.
Jasper Ridge is a commonly used HSI (Zhu, 2017). It is divided into three sub-images, Jasper Ridge#1, Jasper Ridge#2, and Jasper Ridge#3. This study uses Jasper Ridge#2 as the real data to validate the proposed algorithm. Jasper Ridge#2 contains 100 × 100 pixels, and each pixel is recorded in 224 bands ranging from 380 nm to 2500 nm. Due to dense water vapor and atmospheric effects, spectral bands 1-3, 108-112, 154-166, and 220-224 are removed, leaving 198 bands. This sub-image contains four endmembers: road, soil, water, and tree. Figure 7 shows the reference data of Jasper Ridge#2.
Figure 7. Jasper Ridge#2. (a) False-color image, (b) reference classification map, (c) reference spectral features.

Evaluation indicators
This study used SAD to evaluate the segmentation-scale determination method quantitatively. SAD measures the similarity between the reference spectral features and the extracted endmembers, as shown in equation (5). Within the same segmentation scale, SAD is relatively stable. The segmentation scale with a small and stable SAD is optimal.
Overall accuracy (OA) and the Kappa coefficient were used to evaluate the accuracy of abundance estimation. OA is the number of correctly classified pixels divided by the total number of pixels. Kappa is a statistical measure of agreement between the classification and reference maps (Cohen, 1960). Each pixel in the classification map is assigned to the category whose abundance is greater than 0.5. If no category has an abundance greater than 0.5, the pixel is left unassigned (Velez-Reyes & Alkhatib, 2019). In addition, to make the results more realistic and valid, this study used 3 × 3 median filtering to remove anomalous scattered pixels from the classification map. Furthermore, this study used the running time t to evaluate the efficiency of abundance estimation in the same operating environment.
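The classification rule and the two accuracy indicators can be sketched in Python as follows. The function names are illustrative, the 3 × 3 median filtering step is omitted, and counting unassigned pixels (`None`) as misclassified is an assumption of this sketch.

```python
from collections import Counter


def assign_class(class_abundances, threshold=0.5):
    """Return the class whose abundance exceeds the threshold, else None."""
    for class_id, abundance in enumerate(class_abundances):
        if abundance > threshold:
            return class_id
    return None  # unassigned pixel


def overall_accuracy(predicted, reference):
    """Fraction of pixels whose predicted class matches the reference."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def cohen_kappa(predicted, reference):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(reference)
    observed = overall_accuracy(predicted, reference)
    pred_counts = Counter(predicted)
    ref_counts = Counter(reference)
    # Chance agreement from the marginal label frequencies.
    expected = sum(pred_counts[c] * ref_counts[c] for c in ref_counts) / (n * n)
    return (observed - expected) / (1 - expected)
```

Because the abundances of a pixel sum to one, at most one class can exceed 0.5, so the first match found by `assign_class` is the only one.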

Experimental results of segmentation scale determination method
In this study, the proposed segmentation-scale determination method is first evaluated quantitatively. ERS was used to perform superpixel segmentation on the synthetic dataset, with $n_{\min}$ set to 5 and $n_{\max}$ set to 100. The standardized entropy $H^{*}$ and the segmentation-scale nodes were obtained by the method in Section 2.1.2. Figure 8 takes the SNR of 80 dB as an example to show the $H^{*}$ and SAD corresponding to different $k$, where SAD is the average SAD between all endmembers and their reference spectral features. To show the relationship between $H^{*}$ and SAD at each segmentation scale more clearly, the nodes between scales are marked by red vertical lines.
Figure 8. The relationship between $H^{*}$ and SAD. (a) Rational, (b) Exponential, (c) Matern, (d) Spheric, (e) Legendre.
In general, as $k$ increases, the standardized entropy $H^{*}$ decreases and then becomes stable, which indicates that the segmentation quality gradually becomes consistent as $k$ increases. At the first segmentation scale, the significant variation of $H^{*}$ indicates that the segmentation quality is unstable in this interval, consistent with the conclusion of Zhang et al. (2008) that the method has a high bias when $k$ is small. Figures 8(a) and (b) show that SAD decreases as $k$ increases because Rational and Exponential are highly mixed. This indicates that, as $k$ increases, endmembers of small regions can be effectively extracted, and the extracted endmembers gradually become consistent with the reference spectral features. According to Figures 8(c), (d), and (e), the mixing degree of Matern, Spheric, and Legendre is small, and SAD first decreases and then increases as $k$ increases. When $k$ is too small, the image is under-segmented, and some endmembers cannot be effectively extracted, resulting in a larger SAD. As $k$ gradually increases, the number of correctly extracted endmembers increases, and SAD decreases. When $k$ is too large, the image is over-segmented, and the spectral features of mixed pixels are regarded as endmembers, which increases SAD. In summary, different segmentation scales have specific effects on the extracted endmembers, and the optimal segmentation scale can improve their accuracy.
The segmentation-scale determination method was then evaluated qualitatively. Figure 9 shows the extracted endmembers at each segmentation scale with an SNR of 80 dB. The first to fifth rows show the endmembers for Rational, Exponential, Matern, Spheric, and Legendre, respectively, and the number of segments is determined by the minimum $H$ at each of the four segmentation scales. Figure 9(a) shows that the fiberglass_gds374 class was not extracted correctly until the fourth scale, which is caused by the highest degree of mixing in Rational. Figure 9(b-d) shows that Exponential, Matern, and Spheric can extract all endmembers at the third, second, and first scale, respectively. Meanwhile, the number of endmembers increases as $k$ increases, and when $k$ is too large, outliers appear in each class. This phenomenon is due to the over-segmentation caused by an excessive $k$, which causes spectra to be extracted from mixed regions. In Figure 9(e), not all endmembers were extracted at the first and second scales because the substances occupy areas of uneven size. At the third scale, endmembers occupying small areas are extracted. However, over-segmentation occurs in regions occupying larger areas, which increases the number of endmembers.
Figure 9. Extracted endmembers for different scales of the synthetic image. The first row is Rational, and k is 16, 21, 44, 60. The second row is Exponential, and k is 18, 29, 47, 59. The third row is Matern, and k is 13, 31, 41, 61. The fourth row is Spheric, and k is 18, 26, 45, 65. The fifth row is Legendre, and k is 5, 13, 21, 41.
The optimal segmentation scales for different HSIs were obtained from quantitative and qualitative analyses. Meanwhile, k corresponding to the smallest H at the optimal segmentation scale was selected as the optimal number of segments. Table 1 shows the optimal number of segments for each synthetic image at different SNRs.
The comparison of different SNRs shows that the optimal number of segments for low SNR images is larger than that for high SNR images. Furthermore, for the different types of HSIs, the optimal number of segments gradually decreases as the degree of mixing decreases.

Comparative analysis of abundance estimation methods
ERS, EE, and k-means were used to extract endmember classes on the synthetic dataset. MD-FCLS, PD-FCLS, and FCLS were used to estimate abundance. OA, Kappa, and running time t were used to quantitatively evaluate the accuracy and efficiency of the above methods. In the experiments, p in PD-FCLS was set to 0.2, 0.5, 0.8, and 1 (when p was 1, it was FCLS).

Experimental results of segmentation scale determination on the real dataset
The segmentation-scale determination method was evaluated quantitatively on the real dataset. ERS was used to perform superpixel segmentation on Urban, Pavia University, and Jasper Ridge#2, with $n_{\min}$ set to 5 and $n_{\max}$ set to 150. The standardized entropy $H^{*}$ and the segmentation-scale nodes were obtained by the method in Section 2.1.2. The $H^{*}$ and SAD corresponding to different $k$ are shown in Figure 11. SAD is the average SAD between all endmembers and their reference spectral features, and the red vertical lines are the segmentation-scale nodes. In Figure 11(a), SAD first decreases and then increases as $k$ increases. When $k$ is small, endmembers are not extracted effectively, and SAD is large. As $k$ increases, the extracted endmembers gradually become complete and close to the reference spectra. When $k$ is large, endmember extraction is performed in the mixed regions produced by over-segmentation, so mixed spectra are taken as endmembers and SAD increases. In Figures 11(b) and (c), SAD decreases as $k$ increases, which indicates that endmembers can be extracted effectively as $k$ increases. The correspondence between $H^{*}$ and SAD shows that SAD remains relatively stable within each segmentation scale. Therefore, the segmentation scale can be judged by the change of $H^{*}$, and the best segmentation scale can be selected according to the actual situation.
Figure 11. Relationship between $H^{*}$ and SAD on the real dataset.
The segmentation-scale determination method was also evaluated qualitatively.
Figure 12 shows the extracted endmembers for Urban at different scales. The first to fourth rows are the endmembers extracted at the first to fourth scales, respectively, with k equal to 19, 47, 62, and 78, as determined by the minimum H at the four segmentation scales, and the fifth row shows the reference spectra. As seen from Figure 12, the segmentation at the first scale is under-segmented and fails to extract the endmember of the Tree class. All endmembers can be extracted at the second, third, and fourth scales. However, as k increases, abnormal endmembers appear in individual classes. For example, at the third and fourth scales, the Roof and Road classes gain endmembers that differ more from the reference spectra, whereas the Tree class gains more endmembers that are similar to the reference spectrum. A large k is therefore beneficial for extracting endmembers of classes occupying small areas, but it can introduce abnormal endmembers in classes occupying large areas. Choosing a suitable segmentation scale can thus improve the accuracy of endmembers. For the Urban dataset, the second segmentation scale is the best.
Figure 13 shows the extracted endmembers for Pavia University at different scales. The first to fourth rows are the endmembers extracted at the first to fourth scales, respectively, with k equal to 13, 28, 51, and 80, as determined by the minimum H at the four segmentation scales, and the fifth row shows the reference spectra. As can be seen from Figure 13, only three endmember classes were extracted at the first scale. As the segmentation scale increases, the number of extracted endmember classes increases, and all the endmember classes are extracted at the fourth scale. For the Pavia University dataset, the fourth segmentation scale is the best.
Figure 14 shows the extracted endmembers at different scales for Jasper Ridge#2. The first to fourth rows show the endmembers for 21, 32, 92, and 107 segments, respectively, as determined by the minimum H at the four segmentation scales, and the fifth row shows the reference spectra. The comparative analysis of the results indicates that the soil class is not extracted at the first segmentation scale. All endmembers can be extracted at the second, third, and fourth scales. Because Jasper Ridge#2 contains many mixed regions of tree and soil, the endmembers extracted at the second scale include spectra of mixed regions. As k increases, the small pure regions can be completely segmented, and endmembers closer to the soil class are extracted at the third and fourth scales. Therefore, the third scale is the best segmentation scale for Jasper Ridge#2.
Taken together, these results show that the number of segments significantly influences the accuracy of the extracted endmembers. When the number of segments is too small, some endmembers are often not extracted. As the number of segments increases, the endmembers occupying small areas are gradually extracted. However, when the number of segments is too large, over-segmentation tends to occur: spectra from mixed regions are treated as endmembers, which introduces abnormal endmembers into the endmember classes. Therefore, the segmentation process is divided into scales using this study's segmentation-scale determination method, and the optimal segmentation scale is selected to determine the optimal number of segments, which can effectively improve the accuracy of endmembers.
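The selection rule used above (within each segmentation scale, take the k with the minimum H) can be sketched as follows. This is a minimal illustration rather than the implementation used in this study: the function name `best_k_per_scale`, the candidate k grid, the H values, and the assignment of each k to a scale are all hypothetical, and H is treated simply as a per-k quality score to be minimized.

```python
import numpy as np

def best_k_per_scale(ks, H, scale_ids):
    """Return {scale: k with the smallest H among the candidates of that scale}."""
    ks, H, scale_ids = (np.asarray(v) for v in (ks, H, scale_ids))
    best = {}
    for s in np.unique(scale_ids):
        idx = np.flatnonzero(scale_ids == s)      # candidates belonging to scale s
        best[int(s)] = int(ks[idx[np.argmin(H[idx])]])
    return best

# Toy values for illustration only (not taken from the experiments).
ks        = [5, 10, 15, 20, 25, 30, 35, 40]       # candidate numbers of segments
H         = [0.90, 0.70, 0.80, 0.50, 0.60, 0.55, 0.65, 0.60]
scale_ids = [1, 1, 2, 2, 3, 3, 4, 4]              # assumed scale of each candidate
result = best_k_per_scale(ks, H, scale_ids)
```

The choice among the four per-scale candidates is then made by inspecting the extracted endmembers, as done for Figures 12 to 14.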

Experimental results of abundance estimation
On the real dataset, abundance estimation was performed by MD-FCLS, PD-FCLS, and FCLS. OA, Kappa, and running time t were used to evaluate accuracy and operational efficiency. In the experiments, p in PD-FCLS was set to 0.2, 0.5, 0.8, and 1 (when p = 1, PD-FCLS is equivalent to FCLS). Figure 15 compares the OA, Kappa, and running time t of MD-FCLS (Min), PD-FCLS (p = 0.2, p = 0.5, p = 0.8), and FCLS (p = 1) on the real dataset. Figure 15(a) shows that PD-FCLS (p = 0.8) obtains the highest accuracy on Urban, with OA and Kappa reaching 86.86% and 81.99%, which are 0.62% and 0.83% better than FCLS, respectively. The operational efficiency of PD-FCLS (p = 0.8) is 23.15% better than that of FCLS. Figure 15(b) shows that PD-FCLS (p = 0.8) obtains the highest accuracy on Pavia University, with OA and Kappa reaching 84.88% and 76.01%, which are 1.09% and 0.80% better than FCLS, respectively. The operational efficiency of PD-FCLS (p = 0.8) is 18.71% better than that of FCLS. Figure 15(c) shows that PD-FCLS (p = 0.5) obtains the highest accuracy on Jasper Ridge#2, with OA and Kappa reaching 91.72% and 87.82%, which are 0.24% and 0.37% better than FCLS, respectively. The operational efficiency of PD-FCLS (p = 0.5) is 73.18% better than that of FCLS.
The results on the real dataset show that when p is small, fewer endmembers participate in the calculation, which makes the operation more efficient. However, because of the uneven distribution of substances and noise, the accuracy of abundance estimation is reduced. As p increases, the number of participating endmembers gradually increases, the accuracy of abundance estimation increases, and the efficiency gradually decreases. When p is too large, the "abnormal endmembers" in the endmember class (endmembers that are far away from the mixed pixel) reduce the accuracy. Therefore, choosing an appropriate p can improve both the accuracy and the efficiency of abundance estimation. According to the results, PD-FCLS (p = 0.8) is the best abundance estimation method on the real dataset.
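The trade-off controlled by p can be made concrete with a small sketch of the distance strategy. It rests on several assumptions that do not come from this study's implementation: each endmember carries a spatial position `em_xy` (for example, a superpixel centroid), distance is Euclidean in image coordinates, the number of endmembers kept per class is `ceil(p * n)` with at least one, and FCLS is implemented by the common device of solving a non-negative least squares problem on a system augmented with a heavily weighted sum-to-one row. All function and variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def fcls(E, y, delta=1e3):
    """FCLS sketch: NNLS on [E; delta*1^T] a = [y; delta].
    E is (bands, m); y is (bands,). delta weights the sum-to-one constraint."""
    m = E.shape[1]
    A = np.vstack([E, delta * np.ones((1, m))])
    b = np.append(y, delta)
    a, _ = nnls(A, b)
    return a

def select_by_distance(pixel_xy, em_xy, em_class, p):
    """Per class, keep the fraction p of endmembers nearest to the pixel."""
    d = np.linalg.norm(em_xy - pixel_xy, axis=1)
    keep = []
    for c in np.unique(em_class):
        idx = np.flatnonzero(em_class == c)
        n = max(1, int(np.ceil(p * idx.size)))    # assumed rounding rule
        keep.extend(idx[np.argsort(d[idx])[:n]])
    return np.sort(np.array(keep))

def pd_fcls(y, pixel_xy, E, em_xy, em_class, p=0.8):
    """Class-level abundances from FCLS restricted to the selected endmembers."""
    keep = select_by_distance(pixel_xy, em_xy, em_class, p)
    a = fcls(E[:, keep], y)
    return {int(c): float(a[em_class[keep] == c].sum())
            for c in np.unique(em_class)}

# Toy example: 4 endmembers in 2 classes, 5 bands (synthetic numbers).
rng = np.random.default_rng(0)
E = rng.random((5, 4))
em_class = np.array([0, 0, 1, 1])
em_xy = np.array([[0.0, 0.0], [9.0, 9.0], [1.0, 0.0], [8.0, 9.0]])
y = 0.3 * E[:, 0] + 0.7 * E[:, 2]                 # pixel mixed from nearby endmembers
abund = pd_fcls(y, np.array([0.0, 1.0]), E, em_xy, em_class, p=0.5)
```

With p = 1 every endmember is kept and the sketch reduces to plain FCLS; smaller p shrinks the matrix passed to the solver, which is where the running-time saving would come from.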

Comparison with other abundance estimation methods
The method proposed in this study was compared with state-of-the-art abundance estimation methods in terms of accuracy and efficiency (running time per pixel), including SPU (Heylen et al., 2011), MESMA (Roberts et al., 1998), Mean-FCLS (B. Somers et al., 2012), and FCLS (Velez-Reyes & Alkhatib, 2019). This study also applied the distance strategy to SPU (D-SPU) and MESMA (D-MESMA). Based on the optimal number of segments determined in this study, ERS, EE, and endmember class extraction were performed on Urban, Pavia University, and Jasper Ridge#2. Table 2 shows the results of the different abundance estimation methods on Urban. As shown in Table 2, Mean-FCLS has the highest operational efficiency but the lowest accuracy, while MESMA obtains the highest accuracy but its running time is much higher than that of the other methods, which is not conducive to application. These results are consistent with the description of B. Somers et al. (2012). Tables 3 and 4 show the results on Pavia University and Jasper Ridge#2, respectively. Tables 3 and 4 show that Mean-FCLS again has the highest operational efficiency but the lowest estimation accuracy. Furthermore, D-SPU obtains slightly higher accuracy than D-FCLS. However, its operational efficiency is much lower than that of D-FCLS because of the excessive number of endmembers, which is consistent with the conclusion of Heylen et al. (2011). Meanwhile, D-SPU, D-MESMA, and D-FCLS improve both accuracy and operational efficiency compared with the original methods. This result indicates that the distance strategy improves the accuracy and efficiency of abundance estimation. Considering both the accuracy and efficiency of abundance estimation, the D-FCLS proposed in this paper is superior to the other comparison methods.

Comparison with other unsupervised superpixel-based HU methods
The HU method proposed in this study was compared with other unsupervised superpixel-based HU methods on Urban, Pavia University, and Jasper Ridge#2. The segmentation methods include ERS, simple linear iterative clustering (SLIC), and QT. The results are shown in Tables 5, 6, and 7. As can be seen from Tables 5, 6, and 7, ERS obtained higher accuracy than SLIC and QT on Urban and Pavia University, while QT obtained higher accuracy on Jasper Ridge#2. Meanwhile, the HU method proposed in this study (ERS+HI+D-FCLS) obtained higher OA and Kappa than the methods proposed in other studies, indicating that it has better accuracy and stability.
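For reference, the two accuracy measures used in these comparisons can be computed from a confusion matrix as sketched below. This is a generic illustration, not this study's evaluation code: the function name `oa_kappa` is hypothetical, and it assumes each pixel has already been assigned a single class label (for instance, the class with the largest estimated abundance, which is an assumption here).

```python
import numpy as np

def oa_kappa(y_true, y_pred, n_classes):
    """Overall accuracy and Cohen's kappa from integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)            # rows: reference, cols: predicted
    n = cm.sum()
    po = np.trace(cm) / n                         # observed agreement (OA)
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    return po, (po - pe) / (1 - pe)

# Toy labels for illustration only.
oa, kappa = oa_kappa([0, 0, 1, 1], [0, 0, 1, 0], n_classes=2)
```

Kappa discounts the agreement expected by chance, so it is always at most OA and is the more conservative of the two when class sizes are unbalanced.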

Conclusions
In this study, the segmentation-scale determination method and D-FCLS are proposed to improve the accuracy and efficiency of the superpixel-based HU method. The segmentation-scale determination method divides the segmentation process into four segmentation scales according to the relative quality of the images segmented with different numbers of segments. The best segmentation scale is selected according to the type of HSI. In abundance estimation, the distance strategy is applied to FCLS by using the spatial relationship between endmembers and the mixed pixel. Selecting the endmembers involved in the calculation ensures the accuracy of abundance estimation and improves efficiency. The segmentation-scale determination method was tested and validated by quantitative and qualitative evaluations on the synthetic and real datasets, and the optimal segmentation scales were determined. The proposed D-FCLS was compared with FCLS on the synthetic dataset. The results show that both PD-FCLS (p = 0.8) and FCLS obtain high accuracy, and that the operational efficiency of PD-FCLS (p = 0.8) is at least 10.30% higher than that of FCLS. Compared with FCLS on the real dataset, the OA and Kappa of PD-FCLS (p = 0.8) are improved by 0.24% and 0.37%, respectively, and the operational efficiency is improved by at least 18.71%. The results on the two datasets show that PD-FCLS (p = 0.8) is the best abundance estimation method.
The proposed method was compared with other abundance estimation methods and other unsupervised superpixel-based HU methods on the real dataset. Among the abundance estimation methods, D-FCLS obtains the best balance of accuracy and efficiency. The distance strategy was also applied to other methods, and the results show that it can effectively ensure accuracy and improve efficiency. In addition, among the unsupervised superpixel-based HU methods, the HU method in this study obtains higher OA and Kappa. Therefore, the unsupervised superpixel-based HU method proposed in this study has higher accuracy and stability.

Disclosure statement
No potential conflict of interest was reported by the author(s).