Hyperspectral anomaly detection based on spectral–spatial background joint sparse representation

ABSTRACT In recent years, several algorithms based on sparse representation have been proposed to improve detection performance in hyperspectral anomaly detection. Among them, the background joint sparse representation (BJSR) algorithm adaptively selects the most representative background bases for the local region and obtains satisfactory results. However, BJSR mainly considers the spectral characteristics of the hyperspectral image. In this paper, we propose a BJSR-based spectral–spatial method. BJSR is first employed to process the original hyperspectral image in the spectral domain. Then, linear local tangent space alignment (LLTSA) is used to obtain the low-dimensional manifold of the hyperspectral image. Next, spatial BJSR is applied to the low-dimensional manifold obtained by LLTSA. Finally, the proposed algorithm combines spectral BJSR with spatial BJSR to detect the anomaly targets. The experimental results demonstrate that the proposed algorithm achieves better performance than the comparison algorithms.


Introduction
Hyperspectral image (HSI) is rich in spectral information. It can span the visible, near-infrared and mid-infrared portions of the spectrum in many contiguous spectral bands, and the spectral resolution is usually less than 10 nm (Zhao, Du, & Zhang, 2014). Based on their spectral signatures, it is possible to discriminate different objects of interest, and classification and target detection are two major applications of this discriminative capability (Camps-Valls & Bruzzone, 2005; Matteoli, Diani, & Corsini, 2010). Target detection is essentially a binary classification problem, which aims to separate specific target pixels from various backgrounds, and it has attracted increasing attention in recent years because of its many military and civilian applications (Bajorski, 2012). Anomaly detection (AD) is a special case of target detection in which no a priori knowledge about the spectra of the targets of interest is available (Matteoli et al., 2010; Stein et al., 2002). In fact, AD generally has more extensive applications than target detection with a priori knowledge, because it is usually very difficult to obtain the true spectral information of objects of interest.
In the past few decades, many hyperspectral AD algorithms have been proposed in the literature. Among them, the Reed-Xiaoli (RX) algorithm proposed by Reed and Yu (Reed & Yu, 1990) is acknowledged to be a benchmark AD algorithm (Li, Zhang, Zhang, & Ma, 2015). A sliding dual-window version of RX called "local RX (LRX)" (Borghys, Kåsen, Achard, & Perneel, 2012a; 2012b) was then proposed to improve the detection performance for hyperspectral AD. Both RX and LRX are based on the assumption that the background pixels follow a Gaussian distribution. However, this assumption may not be reasonable owing to the complexity of HSI. To overcome the problems of RX, the kernel-RX (KRX) algorithm (Kwon & Nasrabadi, 2005), a well-known kernel-based AD algorithm, was proposed. KRX assumes a Gaussian model in a higher-dimensional feature space, the so-called "Hilbert space" (Kwon & Nasrabadi, 2007). In addition to the aforementioned AD algorithms, some sparse representation-based algorithms (Chen, Nasrabadi, & Tran, 2011; Li & Du, 2015) without any distribution hypothesis for the background have attracted attention recently; they assume that hyperspectral signatures can be compactly represented by only a few nonzero coefficients in a certain dictionary. The algorithm proposed by Chen et al. (2011) is based on the concept that a pixel in HSI lies in a low-dimensional subspace and thus can be represented as a sparse linear combination of the training samples. The sparse representation of a test sample can be recovered by solving an l0-norm minimization problem, and if the solution is sufficiently sparse, the problem can be relaxed to a linear programming problem which can be solved efficiently. Then, the class of the test sample can be determined from the characteristics of the sparse reconstruction vector.
The collaborative-representation-based detector (CRD) (Li & Du, 2015) exploits the concept that each pixel in the background can be approximately represented by its spatial neighborhood, while anomalies cannot. In CRD, the representation is hypothesized to be a linear combination of neighboring pixels, and the collaboration of representation is reinforced by l2-norm minimization of the representation weight vector. Kernel CRD, the kernel version of CRD, has also been proposed and further improves the detection performance. In addition, an HSI AD model using background joint sparse representation (BJSR) has been proposed, and its detection results are satisfactory. BJSR adopts the same sliding dual window as LRX and CRD. The center point of the dual window is the test point, and the pixels between the inner and outer windows are the local neighboring pixels of the test point. In addition, a local search window, also centered at the test pixel, constructs the dictionary. If the test point is an anomaly, the sub-dictionary set that simultaneously represents the local neighboring pixels of the test point cannot represent the test point; otherwise, the test point can be represented by the sub-dictionary set. The detection result of the BJSR algorithm is judged by measuring the length of the matched projection on the orthogonal complementary background subspace that is estimated by the joint sparse representation.
All the aforementioned algorithms are mainly based on the spectral information of HSI. In fact, with the development of remote sensing technology, hyperspectral airborne sensors can also achieve high spatial resolution, and HSI processing algorithms should exploit both the spatial and the spectral features (Plaza et al., 2009; Plaza, Martin, Plaza, Zortea, & Sanchez, 2011). In recent years, some algorithms exploiting both spatial and spectral information have performed well for hyperspectral AD. The local sparsity divergence (LSD) detector (Yuan, Sun, Ji, & Li, 2014) is based on the fact that a target sample cannot be approximately represented by very few background samples from the local surroundings in either the spectral or the spatial domain, and it achieves better detection performance. However, in the spatial domain, it is very time consuming to compute the LSD map at each spectral band separately. The sparsity divergence index based on locally linear embedding (SDI-LLE) (Zhang & Zhao, 2016) employs kernel CRD in both the spectral and the spatial domains; in the spatial domain, kernel CRD operates on the low-dimensional manifold obtained by the locally linear embedding (LLE) dimensionality reduction (DR) algorithm. SDI-LLE obtains better detection results.
Although DR is lossy, it can remove interband spectral redundancy and ever-present noise and optimize detection results through the improved separation between anomaly and background signatures, so DR is usually employed as a preprocessing step for hyperspectral data processing (Ma, Crawford, & Tian, 2010). Among DR algorithms, principal component analysis (PCA) is the most representative one, but, as a linear projection method, it cannot exploit the nonlinear properties that are often intrinsic in hyperspectral data (Bachmann, Ainsworth, & Fusina, 2005). In recent years, more and more nonlinear methods based on manifold learning have been proposed. In 2000, the Isomap (isometric feature mapping) algorithm (Tenenbaum, Silva, & Langford, 2000) and the LLE algorithm (Roweis & Saul, 2000) were proposed in the same issue of Science (published by the American Association for the Advancement of Science). Other manifold learning algorithms then appeared, such as Laplacian eigenmaps (LE) (Belkin & Niyogi, 2003), local tangent space alignment (LTSA) (Zhang & Zha, 2004) and stochastic neighbor embedding (Hinton & Roweis, 2010). Among these nonlinear manifold learning algorithms, LTSA has received wide attention due to its simple geometric intuition, straightforward implementation and global optimization (Wang, 2008). However, these nonlinear manifold learning algorithms are restricted to the training sets and cannot provide explicit maps for new test data points (Zhang, Yang, Zhao, & Ge, 2007). Linearized versions of these manifold learning algorithms have therefore been proposed. For example, neighborhood preserving embedding (He, Cai, Yan, & Zhang, 2005) and locality preserving projections (He, 2005) correspond to the linearization of LLE and LE, respectively. Linear local tangent space alignment (LLTSA) (Zhang et al., 2007) is the linear version of LTSA.
LLTSA uses the tangent space in the neighborhood of a data point to represent the local geometry and then aligns those local tangent spaces in the low-dimensional space, which is linearly mapped from the raw high-dimensional space. However, both LTSA and LLTSA are unsupervised learning methods and cannot deal well with sparse and non-uniformly distributed data sets. In this paper, the alignment matrix in LLTSA is first used as a criterion to determine the anomalous regions, which are removed to obtain reliable background data. The background data are then employed as training data, and the low-dimensional manifold of the reliable background and the linear mapping matrix are obtained by LLTSA. Finally, the low-dimensional manifold of the whole HSI is obtained by the linear mapping.
In this paper, an algorithm based on spectral–spatial background joint sparse representation (SSBJSR) and LLTSA (LLTSA-SSBJSR) is proposed for hyperspectral AD. In the proposed LLTSA-SSBJSR algorithm, the original hyperspectral data and the data after DR using LLTSA are employed in the spectral and spatial domains, respectively, and the detection result combines spectral BJSR with spatial BJSR. The rest of the paper is organized as follows. First, an introduction to LLTSA and SSBJSR is presented, and the formation of the proposed LLTSA-SSBJSR algorithm is described in detail. Next, the experiment data are described, and the results obtained are discussed. The final section draws our conclusion.

Related works and methodology
Brief review of LLTSA

Suppose a data set X = [x_1, x_2, ..., x_N] has N points in R^B. Then, for DR, LLTSA is employed to find a projection matrix A that maps the high-dimensional data set X into a low-dimensional data set Y = A^T X H_N (Y in R^(d×N), d < B), where T means transpose. Given the identity matrix I and an N-dimensional column vector e with all ones, we can represent H_N = I - ee^T/N, where E = ee^T is a matrix with all ones. In LLTSA, the local structure information in the neighborhood of each sample can be represented by the local tangent space. Thus, the local tangent spaces of all samples can be realigned in the global low-dimensional feature space to construct the projection matrix from the high-dimensional input space to the low-dimensional feature space. The implemented steps of LLTSA are as follows (Zhang et al., 2007):

Step 1: Determining the neighborhood.
According to the Euclidean distance between samples, the neighborhood X_i = [x_i1, x_i2, ..., x_ik] of the sample point x_i (i = 1, ..., N) is constructed, where k is the number of nearest neighbors.
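For illustration, this neighborhood-construction step can be sketched in numpy as follows (the column-wise data layout and the toy points are our assumptions, not data from the paper):

```python
import numpy as np

def knn_neighborhoods(X, k):
    """Step 1 of LLTSA: for each column x_i of X (B x N), return the
    indices of its k nearest neighbors (self included) under the
    Euclidean distance."""
    # Pairwise squared Euclidean distances between all columns.
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    # Indices of the k smallest distances per row (the point itself first).
    return np.argsort(d2, axis=1)[:, :k]

# Toy example: 5 points in R^3, k = 2 (each point plus its nearest neighbor).
X = np.array([[0.0, 0.1, 5.0, 5.1, 9.0],
              [0.0, 0.0, 5.0, 5.0, 9.0],
              [0.0, 0.1, 5.0, 5.1, 9.0]])
idx = knn_neighborhoods(X, k=2)
```

Each row of `idx` then indexes the columns of X that form one neighborhood X_i.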
Step 2: Extracting local structure information.

A local transformation matrix Q is used to map the neighborhood X_i to the local low-dimensional tangent space. Thus, the local structure of the neighborhood of sample point x_i can be approximated as:

arg min Σ_(j=1)^(k) || x_ij - (x̄ + Q θ_ij) ||^2   (1)

where the centering matrix H_k = I - ee^T/k, θ_ij is the local low-dimensional representation of the nearest neighbor x_ij and Θ_i = [θ_i1, ..., θ_ik] is the local low-dimensional representation of X_i. The optimal x̄ is given by the mean of all x_ij, and the optimal transformation matrix Q is given by the d eigenvectors of the covariance matrix (X_i H_k)(X_i H_k)^T corresponding to its d largest eigenvalues. Thus, Θ_i can be computed by

Θ_i = Q^T X_i H_k   (2)

Step 3: Constructing the alignment matrix.
After local structure extraction, the obtained local tangent spaces of all samples are realigned in the global low-dimensional feature space to get the global low-dimensional representation Y of X. Denote a selection matrix S given by S = [S_1, ..., S_N], where S_i is a 0-1 selection matrix such that the global low-dimensional representation of X_i is given by Y_i = Y S_i. The objective function of this step can be transformed into a minimization problem as follows:

min Σ_(i=1)^(N) || Y_i H_k - L_i Θ_i ||^2   (3)

In Equation (3), L_i is a global transformation matrix, and the optimal L_i is given by L_i = Y_i H_k Θ_i^+, where Θ_i^+ is the Moore-Penrose pseudoinverse of Θ_i. According to Equation (1), Equation (3) can be transformed into the following form:

min tr(Y Φ Y^T), with Φ = S W W^T S^T, W = diag(W_1, ..., W_N) and W_i = H_k (I - Θ_i^+ Θ_i)   (4)

where Φ is the alignment matrix.

Step 4: Computing the maps.

With Y = A^T X H_N, the minimization problem mentioned above can be converted into solving a generalized eigenvalue problem as follows:

X H_N Φ H_N X^T α = λ X H_N X^T α   (5)

The transformation matrix A can be obtained from the d eigenvectors α_1, α_2, ..., α_d corresponding to the d smallest eigenvalues.
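The four steps above can be sketched compactly in numpy. The SVD-based computation of Q, the small ridge term for numerical stability and the Cholesky-based solution of the generalized eigenproblem are implementation choices of this illustration, not part of the original formulation:

```python
import numpy as np

def lltsa(X, d, k):
    """Minimal LLTSA sketch. X is B x N (one sample per column); returns
    the B x d projection matrix A and the embedding Y = A^T X H_N."""
    B, N = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)            # X H_N (centered data)
    # Step 1: k nearest neighbors of each sample (self included).
    d2 = ((Xc[:, :, None] - Xc[:, None, :]) ** 2).sum(axis=0)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    # Steps 2-3: local tangent coordinates and alignment matrix Phi.
    Phi = np.zeros((N, N))
    Hk = np.eye(k) - np.ones((k, k)) / k              # centering matrix H_k
    for i in range(N):
        Xi = Xc[:, nbrs[i]] @ Hk                      # centered neighborhood
        U, _, _ = np.linalg.svd(Xi, full_matrices=False)
        Theta = U[:, :d].T @ Xi                       # Theta_i = Q^T X_i H_k
        Wi = Hk @ (np.eye(k) - np.linalg.pinv(Theta) @ Theta)
        Phi[np.ix_(nbrs[i], nbrs[i])] += Wi @ Wi.T    # accumulate S_i W_i W_i^T S_i^T
    # Step 4: generalized eigenproblem (X H Phi H X^T) a = lam (X H X^T) a.
    P = Xc @ Phi @ Xc.T
    Bm = Xc @ Xc.T + 1e-8 * np.eye(B)                 # small ridge for stability
    L = np.linalg.cholesky(Bm)
    Li = np.linalg.inv(L)
    lam, V = np.linalg.eigh(Li @ P @ Li.T)            # eigenvalues ascending
    A = Li.T @ V[:, :d]                               # d smallest eigenvalues
    return A, A.T @ Xc

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 40))                      # 40 samples in R^5
A, Y = lltsa(X, d=2, k=8)
```

The returned A plays the role of the linear map applied later to the whole HSI.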

Spectral BJSR
In the spectral domain, BJSR is employed. The center point of the dual window is the test point. The pixels between the inner and outer windows are the local neighboring pixels of the test point. The dictionary is constructed from a local search window centered at the test pixel.
For each test pixel s_c, the implemented steps of BJSR are as follows:

Step 1: Construct the local neighboring pixel set S and the dictionary H.

Step 2: Recover the joint sparse representation of S over H by simultaneous orthogonal matching pursuit (SOMP), and obtain the background sub-dictionary B̂ formed by the selected atoms.

Step 3: Calculate the background orthogonal complementary subspace P_B^⊥ as follows:

P_B^⊥ = I - B̂ (B̂^T B̂)^(-1) B̂^T   (6)

Step 4: Calculate the detected value of s_c according to Equation (7):

d_spec(s_c) = || P_B^⊥ s_c ||_2 / ((1/n) Σ_(i=1)^(n) || P_B^⊥ s_i ||_2)   (7)

where s_i is the ith pixel of the local neighboring pixel set S and n is the number of local neighboring pixels. The spectral BJSR detector is modeled as Equation (7).
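A minimal sketch of the spectral BJSR procedure is given below. The greedy SOMP routine and the ratio-style score (test-pixel residual over the mean neighbor residual) are simplified illustrations of the steps above, not the authors' exact implementation; the toy dictionary and pixels are likewise assumptions:

```python
import numpy as np

def somp(S, H, L):
    """Simultaneous OMP: greedily pick at most L atoms of the dictionary H
    (unit-norm columns) that jointly represent all columns of S."""
    idx, R = [], S.copy()
    for _ in range(L):
        # Atom with the largest total correlation with the joint residual.
        j = int(np.argmax(np.sum(np.abs(H.T @ R), axis=1)))
        if j in idx:
            break
        idx.append(j)
        B = H[:, idx]
        R = S - B @ np.linalg.lstsq(B, S, rcond=None)[0]
    return idx

def spectral_bjsr(s_c, S, H, L):
    """Residual of the test pixel on the orthogonal complement of the
    selected background sub-dictionary, normalized by the mean residual
    of its local neighbors (large values indicate anomalies)."""
    Bhat = H[:, somp(S, H, L)]
    P = np.eye(H.shape[0]) - Bhat @ np.linalg.pinv(Bhat)   # P_B orthogonal
    r_c = np.linalg.norm(P @ s_c)
    r_n = np.mean(np.linalg.norm(P @ S, axis=0))
    return r_c / (r_n + 1e-12)

rng = np.random.default_rng(1)
H = rng.standard_normal((20, 11))
H /= np.linalg.norm(H, axis=0)                  # dictionary with D = 11 atoms
S = H[:, :3] @ rng.random((3, 24))              # neighbors span a 3-atom subspace
background_pixel = H[:, :3] @ rng.random(3)     # representable by the sub-dictionary
anomaly_pixel = rng.standard_normal(20)         # generic direction, not representable
```

On this toy data the anomalous pixel receives a clearly higher score than the background pixel.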

Spatial BJSR
Spatial BJSR is applied band by band. In the jth band, the characteristic of the reconstruction error of the local neighboring pixels, P_j, can be denoted as follows:

P_j = (1/n) Σ_(i=1)^(n) || P_jB^⊥ x_ij ||_2   (8)

where X_j is the local neighboring pixel set of the test point in the jth band, X_j = [x_1j, x_2j, ..., x_nj]; B̂_j is the sub-dictionary of the local neighboring pixels in the jth band; Θ is the set of all the sparse coefficient vectors; and P_jB^⊥ represents the background orthogonal complementary subspace in the jth band. By the use of SOMP, we can obtain Θ and B̂_j, and P_jB^⊥ can then be calculated. The spatial BJSR detector can be modeled as follows:

d_spat(x_c) = (1/d) Σ_(j=1)^(d) || P_jB^⊥ x_cj ||_2 / P_j   (9)

where x_cj is the test point in the jth band, x_ij is the ith local neighboring pixel of the test point in the jth band and d is the number of reduced bands.
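A band-wise sketch of the spatial detector follows; it assumes the per-band sub-dictionaries have already been selected (e.g. by SOMP), and the ratio-style band score averaged over bands is a simplified illustration rather than the authors' exact formula:

```python
import numpy as np

def spatial_bjsr(x_c_bands, neighbors_bands, subdicts):
    """For each reduced band j: project onto the orthogonal complement of
    the background sub-dictionary B_j, compare the test point's residual
    with the mean neighbor residual P_j, and average over bands."""
    d_val = 0.0
    for x_cj, X_j, B_j in zip(x_c_bands, neighbors_bands, subdicts):
        P = np.eye(B_j.shape[0]) - B_j @ np.linalg.pinv(B_j)   # complement of B_j
        r_c = np.linalg.norm(P @ x_cj)
        P_j = np.mean(np.linalg.norm(P @ X_j, axis=0))         # mean neighbor residual
        d_val += r_c / (P_j + 1e-12)
    return d_val / len(subdicts)                               # average over bands

rng = np.random.default_rng(3)
subdicts = [rng.standard_normal((6, 2)) for _ in range(2)]      # per-band B_j (toy)
neighbors = [B @ rng.random((2, 10)) for B in subdicts]         # in-subspace neighbors
bg_pixel = [B @ rng.random(2) for B in subdicts]                # representable
an_pixel = [rng.standard_normal(6) for _ in range(2)]           # not representable
score_bg = spatial_bjsr(bg_pixel, neighbors, subdicts)
score_an = spatial_bjsr(an_pixel, neighbors, subdicts)
```

As in the spectral case, a pixel that the local sub-dictionaries cannot represent receives a much larger score.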

Spectral-spatial BJSR
The joint LSD (Yuan et al., 2014) is based on the fact that a target sample cannot be approximately represented by very few background samples from the local surroundings in either the spectral or the spatial domain. Similar to the joint LSD, SSBJSR is represented as follows:

d_SSBJSR = α d_spec + (1 - α) d_spat   (10)

where d_spec is the spectral BJSR output and d_spat is the spatial BJSR output. Optimized detection results can be obtained by setting different values of the weighting coefficient α.

Formation of LLTSA-SSBJSR
In this section, we introduce the proposed LLTSA-SSBJSR, which includes four steps as follows: (1) Normalize the input hyperspectral data set and calculate d_spec according to Equation (7). (2) Obtain the low-dimensional manifold of the normalized hyperspectral data set using LLTSA. First, anomalous regions are selected via the alignment matrix in LLTSA.
In fact, the alignment matrix in Equation (4) is a sparse matrix whose nonzero elements correspond to the Euclidean distances between the local neighboring pixels and their average. The anomalous regions can be located from these Euclidean distances: if the sum of the Euclidean distances between the local neighboring pixels and their average is larger than the threshold, there may be anomalous pixels in the region covered by those local neighboring pixels.
Then, the regions with possible anomalous pixels are removed, and the remaining regions are employed as training data to construct the background manifold. This process is not for AD itself, but for obtaining reliable background data.
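This screening step can be sketched as follows; the neighborhood size, threshold and toy data are illustrative assumptions:

```python
import numpy as np

def screen_background(X, k, tau):
    """For every pixel, sum the Euclidean distances between its k local
    neighbors and their mean; neighborhoods whose sum exceeds tau are
    flagged as possibly anomalous and excluded from the training set."""
    B, N = X.shape
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    keep = np.ones(N, dtype=bool)
    for i in range(N):
        nb = X[:, nbrs[i]]
        dist_sum = np.linalg.norm(nb - nb.mean(axis=1, keepdims=True), axis=0).sum()
        if dist_sum > tau:
            keep[nbrs[i]] = False          # drop the whole suspicious region
    return X[:, keep], keep

rng = np.random.default_rng(2)
X = 0.05 * rng.standard_normal((4, 50))    # homogeneous background pixels
X[:, 7] += 10.0                            # one implanted anomaly
Xb, keep = screen_background(X, k=5, tau=2.0)
```

The retained columns Xb then serve as the training data on which LLTSA learns the background manifold and the linear map.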
Next, LLTSA is used to obtain the low-dimensional manifold of the background data, and the transformation matrix can also be obtained.
Finally, the low-dimensional manifold of the whole HSI is computed by the linear mapping.
(3) Apply spatial BJSR to the normalized low-dimensional manifold of the whole data set and compute d_spat according to Equation (9). (4) According to Equation (10), SSBJSR is computed to produce the final detection result.

Data description
In this paper, we evaluate the performance of the proposed LLTSA-SSBJSR algorithm using three hyperspectral data sets. The first one is a simulated hyperspectral data set, as shown in Figure 1. The background data set was collected by the ROSIS sensor (0.43-0.86 μm with 102 bands) during a flight campaign over Pavia, Northern Italy. The geometric resolution is 1.3 m. The whole data set consists of 105 × 100 pixels and 102 bands. The anomaly targets are embedded in different backgrounds. The synthesizing method implants the target signal and white noise into the background data set. The hundredth band of the synthetic data (central wavelength 0.85 μm) is shown in Figure 1(a). The anomaly target positions are shown in Figure 1(b).
The data set has six anomaly targets: two have a pixel size of 4 × 3, another two have a pixel size of 2 × 2 and the last two have pixel sizes of 3 × 3 and 2 × 4, respectively. The anomalous pixels are very few in number, accounting for about 0.47% of all pixels. Figure 2 shows the spectral curves of the background and the anomaly targets. From Figure 2, we can see that the spectra of the anomalous pixels present obvious differences from those of the background. Therefore, the synthetic data are suitable for the AD experiments.
The second experimental data set is the real hyperspectral image of San Diego collected by the AVIRIS sensor. The image data range from 400 to 1800 nm and have 126 bands after removing the bands corresponding to the water-absorption regions and low signal-to-noise ratio (SNR). The size of the image is 400 × 400 pixels, and a section with a size of 60 × 60 pixels is used for the experiment. The hundredth band of the experimental data is shown in Figure 3(a). The three planes are regarded as anomaly targets, and their actual distribution is shown in Figure 3(b).
To further demonstrate the superiority of the proposed algorithm, simulated data with a more complex background are employed as the third experimental data set, as shown in Figure 4. The background data were collected by the AVIRIS sensor (0.4-2.5 μm with 224 bands) over Salinas Valley, California, and are characterized by high spatial resolution (3.7-m pixels). The area covered comprises 512 lines by 217 samples. A section consisting of 130 × 100 pixels and 110 bands is employed for the experiment. The anomaly targets are embedded in different backgrounds, and the synthesizing method implants the target signal and white noise into the background data set. The hundredth band of the synthetic data (central wavelength 1.3 μm) is shown in Figure 4(a). The anomaly target positions are shown in Figure 4(b). The data set has eight anomaly targets: four have a pixel size of 2 × 2, another two have a pixel size of 2 × 1 and the last two have pixel sizes of 1 × 2 and 3 × 2, respectively. The anomalous pixels are very few in number, accounting for about 0.22% of all pixels. As shown in Figure 5, the spectral curves of the background and the anomaly targets are different. Therefore, this synthetic data set is suitable for the AD experiments.

Results
In this paper, LRX based on PCA DR (PCA-LRX), LSD based on PCA DR (PCA-LSD) (Yuan et al., 2014), SDI-LLE (Zhang & Zhao, 2016) and BJSR are employed as the comparison algorithms. For a fair comparison, all test algorithms are run with their optimal parameters. Tables 1-4 show the AUC performance of the comparison algorithms with respect to different parameters for the synthetic HSI of Pavia. For PCA-LRX, as shown in Table 1, the optimal AUC value of 0.9806 is obtained when the principal component P is 8 and the optimal dual-window size (w_in, w_out) is (9,15). For PCA-LSD, as shown in Table 2, the optimal AUC value of 0.9532 is obtained when P = 8 and (w_in, w_out) is (5,9). For SDI-LLE, the number of nearest neighbor pixels is set to 30 (Zhang & Zhao, 2016), and the optimal AUC value of 0.9721 is obtained when the reduced band d = 2 and (w_in, w_out) is (5,11), as shown in Table 3. For BJSR, as shown in Table 4, the optimal AUC value of 0.9779 is obtained when the upper bound of the sparsity level L = 6, the dictionary size D = 11 and (w_in, w_out) is (5,7). The optimal parameter selection for LLTSA-SSBJSR will be discussed in the next section; here, only the optimal parameters are given. The optimal AUC value is obtained when L = 6, D and (w_in, w_out) for spectral BJSR are 11 and (5,7), d = 8, D and (w_in, w_out) for spatial BJSR are 17 and (5,13), respectively, and the weighting coefficient a is 0.1.
Tables 5-8 show the AUC performance of the comparison algorithms with respect to different parameters for the real HSI. For PCA-LRX, as shown in Table 5, the optimal AUC value of 0.9983 is obtained when the principal component P is 3 and the optimal dual-window size (w_in, w_out) is (13,27). For PCA-LSD, as shown in Table 6, the optimal AUC value of 0.7796 is obtained when P = 4 and (w_in, w_out) is (11,13). For SDI-LLE (the number of nearest neighbor pixels is set to 30), as shown in Table 7, the optimal AUC value of 0.9945 is obtained when the reduced band d = 2 and (w_in, w_out) is (9,19). For BJSR, as shown in Table 8, the optimal AUC value of 0.7637 is obtained when the upper bound of the sparsity level L = 4, the dictionary size D = 21 and (w_in, w_out) is (15,17). For LLTSA-SSBJSR, the optimal AUC value is obtained when L = 4, d = 4, D and (w_in, w_out) for spectral BJSR are 21 and (15,17), D and (w_in, w_out) for spatial BJSR are 23 and (13,15), respectively, and the weighting coefficient a is 0. Tables 9-12 show the AUC performance of the comparison algorithms with respect to different parameters for the synthetic HSI of Salinas. For PCA-LRX, as shown in Table 9, the optimal AUC value of 0.9986 is obtained when the principal component P is 12 and the optimal dual-window size (w_in, w_out) is (5,9). For PCA-LSD, as shown in Table 10, the optimal AUC value of 0.7830 is obtained when P = 11 and (w_in, w_out) is (5,11). For SDI-LLE (the number of nearest neighbor pixels is set to 30), as shown in Table 11, the optimal AUC value of 0.9774 is obtained when the reduced band d = 8 and (w_in, w_out) is (3,9). For BJSR, as shown in Table 12, the optimal AUC value of 0.8758 is obtained when the upper bound of the sparsity level L = 5, the dictionary size D = 15 and (w_in, w_out) is (3,7).
For LLTSA-SSBJSR, the optimal AUC value is obtained when L = 5, D and (w_in, w_out) for spectral BJSR are 15 and (3,7), d = 6, D and (w_in, w_out) for spatial BJSR are 15 and (3,5), respectively, and the weighting coefficient a is 0.3. Figures 6-8 show the binary images of the detection results, from which it is intuitive that the detection result of LLTSA-SSBJSR is satisfactory and that LLTSA-SSBJSR achieves better results than PCA-LSD and BJSR. According to the anomaly target positions shown in Figures 1(b), 3(b) and 4(b), we can see that the proposed algorithm also performs better than PCA-LRX and SDI-LLE.
The experimental results for all the test algorithms are also provided through receiver operating characteristic (ROC) curves for the first two data sets, as shown in Figure 9(a,b), and the ROC curves of LLTSA-SSBJSR are always above those of the comparison algorithms. We provide the AUC bar graph for the synthetic HSI of Salinas, as shown in Figure 9(c), and the AUC value of LLTSA-SSBJSR is greater than those of the comparison algorithms. In short, the results shown in Figure 9 indicate that the proposed LLTSA-SSBJSR provides better detection performance than the comparison algorithms.
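The AUC values used throughout can be computed from a detection map and a binary ground-truth mask with a rank-based routine such as the following sketch (the toy scores and labels are illustrative):

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC of a detection map against a binary ground truth: the
    probability that a random anomalous pixel scores higher than a
    random background pixel (rank-based, ties counted as 1/2)."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    # Average the ranks over tied scores.
    for v in np.unique(scores):
        m = scores == v
        ranks[m] = ranks[m].mean()
    n1, n0 = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

scores = np.array([0.10, 0.40, 0.35, 0.80])   # detector output for 4 pixels
labels = np.array([0, 0, 1, 1])               # ground truth (pixels 2, 3 anomalous)
auc = roc_auc(scores, labels)                 # one misordered pair -> 0.75
```

An AUC of 1 corresponds to a detector whose every anomalous pixel outscores every background pixel.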
In addition, separability maps (Lou & Zhao, 2015; Zhang, Du, Zhang, & Wang, 2016; Zhang & Zhao, 2016), which evaluate the separability between anomalies and background, are employed to further evaluate the performance of the test algorithms, as shown in Figure 10. The lines at the top and bottom of each column are the normalized extreme values, 0 and 1, and the line in the middle of each box is the mean of the pixels. The position of the boxes reflects the tendency and compactness of the pixel distribution; that is to say, the position reflects the separability of the anomalies and background. For the synthetic HSI of Pavia, as shown in Figure 10(a), we can see that the gap between the anomaly box and the background box for LLTSA-SSBJSR is bigger than those for PCA-LRX and SDI-LLE. In addition, LLTSA-SSBJSR can suppress the background information better than PCA-LSD and BJSR. For the real HSI of the three planes, as can be seen from Figure 10(b), the gap for LLTSA-SSBJSR is smaller than those of PCA-LRX and SDI-LLE, but LLTSA-SSBJSR can still effectively suppress the background information.
For the synthetic HSI of Salinas, as shown in Figure 10(c), we can see that the gap for LLTSA-SSBJSR is much bigger than those of the four comparison algorithms, and LLTSA-SSBJSR can still effectively suppress the background information. To summarize, for the three experimental data sets, the separability between anomalies and background of the proposed LLTSA-SSBJSR is not always the best among all the test algorithms, but the ability of LLTSA-SSBJSR to suppress the background information is better than that of the four comparison algorithms.
From the above analysis, we can conclude that the proposed LLTSA-SSBJSR generally performs better than PCA-LRX, PCA-LSD, SDI-LLE and BJSR for the three experiment data sets.
Furthermore, the corresponding execution times of the test algorithms are listed in Table 13. All experiments were carried out in MATLAB on an Intel Core i7-4770k CPU machine with 16 GB of RAM. The proposed algorithm's computation mainly comprises two parts: the spectral BJSR procedure and the spatial BJSR procedure. Given a data set X in R^(N×B) (N is the total number of samples and B is the number of spectral bands), the computational burden of BJSR (spectral BJSR) is O((nN+B)LBD^2), where n = w_out × w_out - w_in × w_in, L is the upper bound of the sparsity level and D is the dictionary size. The computational burden of spatial BJSR is O((nN+B)BD^2 d). Therefore, the computational complexity of LLTSA-SSBJSR is O((nN+B)BD^2 (d+L)). Compared with SDI-LLE, the execution time of LLTSA-SSBJSR is much lower, as the computational complexity of kernel CRD in SDI-LLE is higher than that of BJSR in LLTSA-SSBJSR. For PCA-LRX, the covariance matrix inverse costs O(N(n^3+d^3)). For PCA-LSD, the number of local windows to be processed is N(d+1), and the sparse coefficient computation has a complexity of O(n^2) (Yuan et al., 2014). As shown in Table 13, the execution time of LLTSA-SSBJSR is acceptable.

Discussion
The proposed LLTSA-SSBJSR algorithm consists of two parts: spectral BJSR and spatial BJSR. In spectral BJSR, there are three main parameters: the dual-window size (w_in, w_out), the dictionary size D and the upper bound of the sparsity level L. Tables 4, 8 and 12 also show the AUC performance of spectral BJSR with respect to different parameters for the three experimental data sets. The optimal parameter selections are the same as in BJSR for the three experimental data sets. From Tables 4, 8 and 12, we can see that all three parameters have obvious influences on the detection results. In spatial BJSR, there are also three main parameters: the dual-window size (w_in, w_out), the dictionary size D and the number of reduced bands d. Tables 14-16 show the AUC performance of spatial BJSR with respect to different parameters for the three experimental data sets. For the synthetic data set of Pavia, as shown in Table 14, the AUC performance is better when d = 8. The AUC value is not sensitive to D. The optimal AUC value of 0.9915 is obtained when (w_in, w_out) is (5,13), (5,15) or (7,13). For the real HSI, as shown in Table 15, the AUC value is not sensitive to D, and the optimal value of 0.9992 is obtained when d = 4 and (w_in, w_out) is (13,15) or (13,17). For the synthetic HSI of Salinas, as shown in Table 16, the optimal AUC value of 0.9996 is obtained when d = 6 and (w_in, w_out) is (3,5), and the value is also insensitive to D. From the above analysis, we can see that d and (w_in, w_out) have obvious influences on the detection results, but the results are not sensitive to D.
The weighting coefficient a in Equation (10) is also an important parameter that has an obvious impact on the detection results. Figure 11 shows the AUC performance of LLTSA-SSBJSR with respect to a for the three experimental data sets, with all other parameters set to their optimal values. For the synthetic data set of Pavia, as shown in Figure 11(a), the optimal AUC value of 0.9915 is obtained when a = 0 or 0.1, and the value then decreases as a increases. When a = 1, the lowest value is reached. For the real HSI, the optimal AUC value is obtained when a = 0, and the AUC value is not sensitive to a when a changes from 0 to 0.2. The value then rapidly decreases as a increases, and the lowest value is reached when a = 1. For the synthetic data set of Salinas, the AUC value is not sensitive to a when a changes from 0 to 0.4, and the optimal AUC value of 0.9997 is obtained when a = 0.1, 0.2 or 0.3. The AUC value then decreases as a increases, and the lowest value is reached when a = 1. From the above analysis, we can see that the optimal detection results are always achieved when a < 0.5; that is to say, spatial BJSR plays the major role in the optimal detection results for the three experimental data sets.

Conclusions
In this paper, a new hyperspectral AD method, called LLTSA-SSBJSR, is proposed. LLTSA-SSBJSR considers the spectral and spatial characteristics and combines the spectral BJSR with the spatial BJSR to improve the detection performance. The proposed algorithm was tested on synthetic and real hyperspectral data. The extensive experimental results show that LLTSA-SSBJSR generally achieves better detection performance than the PCA-LRX, PCA-LSD, SDI-LLE and BJSR algorithms. Parameter selection is also discussed in this paper. LLTSA-SSBJSR still has room for improvement. The parameter selection is mainly determined by experience and repeated experiments, and our future research will focus on how to quickly and effectively determine the optimal parameters.