A method for remote sensing image classification by combining Pixel Neighbourhood Similarity and optimal feature combination

Abstract In the study of remote sensing image classification, feature extraction and selection is an effective method to distinguish different classification targets. Constructing a high-quality spectral-spatial feature and feature combination has been a worthwhile topic for improving classification accuracy. In this context, this study constructed a spectral-spatial feature, namely the Pixel Neighbourhood Similarity (PNS) index. Meanwhile, the PNS index and 19 spectral, textural and terrain features were involved in the Correlation-based Feature Selection (CFS) algorithm for feature selection to generate a feature combination (PNS-CFS). To explore how PNS and PNS-CFS improve the classification accuracy of land types. The results show that: (1) The PNS index exhibited clear boundaries between different land types. The performance quality of PNS was relatively highest compared to other spectral-spatial features, namely the Vector Similarity (VS) index, the Change Vector Intensity (CVI) index and the Correlation (COR) index. (2) The Overall Accuracy (OA) of the PNS-CFS was 94.66% and 93.59% in study areas 1 and 2, respectively. These were 7.48% and 6.02% higher than the original image data (ORI) and 7.27% and 2.39% higher than the single-dimensional feature combination (SIN-CFS). Compared to the feature combinations of VS, CVI, and COR indices (VS-CFS, CVI-COM, COR-COM), PNS-CFS had the relatively highest performance and classification accuracy. The study demonstrated that the PNS index and PNS-CFS have a high potential for image classification.


Introduction
As we all know, remote sensing image classification has been a traditional and important research topic.Currently, most classification studies are based on the original images or feature extraction (Toure et al. 2018;Zhang et al. 2019).In particular, feature extraction can highlight image information and amplify the property differences of objects (Benediktsson et al. 2003;Zhang et al. 2018).For feature extraction, there are many features in the literature (Li et al. 2014;Zhao and Du 2016), such as spectral, texture, shape and terrain features.These features were extracted from the original images based on Principal Components Analysis (PCA), Minimum Noise Fraction (MNF) rotation, and Tasseled Cap Transform (TCT) (Sonnenschein et al. 2011;Ben Brick et al. 2014;Xu et al. 2019).Alternatively, they were extracted by algebraic operations between different spectra.For example, the strong absorption of vegetation in the red band and high reflection in the near-infrared band were used to construct vegetation indices and enhance vegetation information (Chen et al. 2020).In addition, they also were represented by the texture indices in the Grey Level Co-occurrence Matrix (GLCM) (Haralick et al. 1973).The slope and curvature were also available from the DEM to enhance terrain differences (Hua et al. 2021).These features have good references for improving classification accuracy.However, in most cases, poorly classified areas are mainly concentrated at the boundaries between different land types.Due to the complexity and diversity of boundaries, similar or mixed spectral information exists at the boundaries.In this case, using spectral or spatial features alone does not make the boundary classification reliable.Therefore, what features are constructed to improve the differentiation between land types is particularly important.In recent studies, many spectral-spatial features have been proposed (Huang and Zhang 2008;Imani and Ghassemian 2015;Li et al. 2019).Some scholars have tried try to find suitable features by merging or transforming spectral and spatial information to improve classification accuracy.A relatively simple and natural method to is vector stacking, which concatenates different features into a long vector (Zhang et al. 2016).Alternatively, overlay spatial information directly into spectral features.For example, filters such as 3-D wavelets (Qian et al. 2013) or 3-D Gabor wavelets (Zhu et al. 2015) were used to extract spectral-spatial features.However, overlay strategies not only produce the higher-dimensional feature but also treat different features equally, ignoring the specific attributes of multiple features and leading to feature redundancy.As a result, they are difficult to be used effectively for classification.Some dimensionality reduction methods, such as PCA (Prasad and Bruce 2008) and Independent Component Analysis(ICA) (Villa et al. 2011), have been applied to solve this problem.However the problem is that they are difficult to explain the features that played a vital role in the classification.In general, these methods for constructing spectral-spatial features do not better account for the spatial relationship of pixels or the continuity of spectral information.In this context, it is still a challenge to construct a high-quality spatial-spectral feature to improve classification accuracy.
For feature selection, many selection algorithms have been proposed (Du 2007;Sitokonstantinou et al. 2018;Immitzer et al. 2019;Uddin et al. 2021a).Some studies have focused on the importance of features (Saha et al. 2015;Duan et al. 2020).For example, An object-oriented RF-SFS method based on the random forest model was used to calculate the importance of features, and this method used a sequential forward selection algorithm to select the optimal feature subset (Chen et al. 2020).Similarly, Zhu et al. (2019a) combined Gini coefficients with Out-Of-Bag (OOB) error in the random forest to assess the importance of features.In addition, some studies have considered the separation between features in feature selection (Bolourchi et al. 2018;Chen et al. 2021).For example, Zhao et al. (2020) used the Jeffries-Matusita (J-M) distance to determine the separation threshold between features to classify wetland and non-wetland.Alternatively, a feature combination was estimated from both feature-feature and feature-target correlations.Feature combinations were ranked rather than individual features (Freeman et al. 2015;Lee and Kim 2015;Zhou et al. 2016).One of the more typical algorithms is the Correlation-based Feature Selection (CFS) algorithm.The feature combination constructed by this algorithm can well satisfy the high correlation between features and targets and high independence between features.Many studies have proven that the CFS algorithm is an effective method for selecting the optimal feature combination from many features (Wu et al. 2013;Taghizadeh-Mehrjardi et al. 2016;Shukla et al. 2019).
The crucial to improving classification accuracy is to increase the distinction between different land types.From the perspective of feature extraction and selection, it is essential to determine what types of features are constructed and which features are used for classification.Therefore, this study has two motivations：(1) To construct a spectral-spatial feature besides spectral, textural, and terrain features.Use it to enrich the diversity of features, fully reflect the attribute information and improve the spatial distribution differences between land types.(2) To construct a feature combination by feature selection algorithm and to explore the value of the spectral-spatial feature and this combination in classification.In this context, the Pixel Neighbourhood Similarity (PNS) index was constructed in this study.PNS index calculated the spectral values of all channels, ensuring the continuity of the spectral information.Meanwhile, the spatial relationship and distance of pixels were considered in the spatial dimension.It can combine spectral and spatial information simultaneously.For feature selection, to choose a suitable algorithm, this study considered the computational efficiency and characteristics of different algorithms, and finally chose the CFS algorithm, which directly returns a feature combination.To ensure the objectivity and comprehensiveness of the assessment, this study compared different spectral-spatial features and feature combinations from performance quality assessment and classification accuracy assessment.Ultimately exploring the feature combination with the best performance quality and highest classification accuracy.

Study areas
Study area 1 was located in the northern part of Anhui Province, China (Figure 1).It covers 506.25 km 2 .The landscape is mainly plain, interspersed with low hill remnants.The cropland and residential in the study area are distributed regularly, and the water and roads are distributed in strips.In addition, Anhui Province has the greatest differences between the south and the north.The terrain and cultural differences between the north and south of Anhui are obvious due to terrain and river factors.Therefore, to ensure the diversity of the study area, study area 2 was chosen in the south of Anhui Province, with the same area of 506.25 km 2 .The area is adjacent to the Yangtze River and has abundant water resources.The terrain is high in the south and low in the north, mainly mountains, hills, and plains in study area 2.

Remote sensing imagery
The Sentinel-2 data were obtained from the Copernicus Open Access Hub (https://scihub.copernicus.eu/).They cover 13 spectral bands, from visible to short-wave infrared, and with spatial resolutions from 10 m to 60 m.In addition, Sentinel-2 images are the only optical data with three bands on the red-edge range.Therefore, they can effectively monitor vegetation information and are suitable for remote sensing image classification.

Digital elevation model (DEM) data
The DEM was obtained from NASADEM (https://lpdaac.usgs.gov/news/release-nasademdata-products/)with a resolution of 30 m.This data is based on the Shuttle Radar Topography Mission (SRTM) data with extended coverage and improved elevation accuracy.

Field samples
According to a field survey and the USGS (United States Geological Survey) land cover classification system (Luo et al. 2021), study areas 1 and 2 included six land cover types: cropland, grass, forest, water, building, and barren.This study collected samples primarily through a field survey (collection of cropland, grass, and barren).In addition, for distinctive and easily identifiable land types, this study collected these samples through sub-metre images from Google Earth (collection of forest, water and building).All the samples were more than 10 m away from the surrounding boundary to ensure their purity.A total of 521 and 531 samples were collected in study areas 1 and 2 (Figure 1), respectively, and they were randomly divided into training and validation samples in a ratio of 7:3.

Extraction of single-dimensional features
As shown in Table 1, this study extracted the 12 spectral features, four texture features, and three terrain features (Yang et al. 2018;Kupidura 2019;Yang et al. 2020) based on Sentinel-2 data in study areas 1 and 2, respectively.Among the spectral features, the principal components transformed the Sentinel-2 images and selected the first principal component (PCA1, which contained more than 95% of the information) as a spectral feature.These features have the common characteristic that spectral and spatial information are separated.

Extraction of spectral-spatial features
The single-dimensional features may suffer from inadequate information description.This study integrated spectral and spatial information to construct a spectral-spatial feature, namely the Pixel Neighbourhood Similarity (PNS) index.In addition, to demonstrate whether spectral-spatial features can improve classification accuracy, and which spectralspatial feature performs better in classification.This study selected three spectral-spatial features commonly used in current studies for comparison.

The PNS index
According to previous studies (Kruse et al. 1993;South et al. 2004;Wan et al. 2021), the Spectral Angle Mapping(SAM)is the most representative method of projection-based spectral similarity measurement.It determines the similarity by calculating the cosine of the angle between the test spectrum and the reference spectrum.The smaller the angle, the better the match between the test spectrum and the reference spectrum.Suppose A and B are two attribute vectors.The cosine between two vectors can be calculated using the Euclidean dot product formula: where A i , B i represent each component of vectors A and B, respectively.In p-dimensional space, the cosine similarity cos h is given by the dot product and the vector length, with values between À1 and 1.
It is worth emphasising that the SAM can simultaneously use spectral information from all channels and maintain the shape difference of the spectra well.This property is important for the construction of a spectral-spatial feature.However, this method usually investigates the problem of binary spectral similarity metrics.When calculating the similarity between pixels, this metric should be converted to a multivariate metric (Zhang et al. 2022).This study implemented this conversion through neighbourhood operation.As shown in Figure 2, there are two main vital steps: (1) Joint the spectral dimensional information of pixels and the spatial neighbourhood relationships between pixels.(2) Reduce the interference of different or mixed pixels.
For step (1), it is as follows: 1.A 3 Â 3 template window was defined, and the cosine similarity of the centre pixel in the window to the neighbouring pixels was calculated.2. The importance weight of pixels was assigned according to the spatial distance of the pixels in the window.3. The average result of the window operation was assigned to the centre pixel, but the operation was based on the actual pixels in the window when the window was partially empty.
4. The window was slid in parallel, and the PNS values were assigned to all the pixels of the output image, not just the central pixel.
In addition, it is essential to note that the similarity between different land types cannot be ignored entirely.Mixed pixels or different types of pixels provide a weak dedication of similarity values to the centre pixel.This situation leads to blurred boundaries and mixed classification effects.Therefore, identifying and filtering interference pixels are also a vital problem that needs to be resolved when calculating the PNS index.
Therefore, it is necessary to establish a judgment criterion to filter out the interference pixels.In step (2), this study solved this problem based on the similarity threshold between different land types.Here, the field samples were used as a reference to calculate the similarity in the different types of samples and to find the similarity threshold b.However, it is essential to note that the permutation and combination of all samples (m Â n) will produce many values before calculating b.These values need to be sufficiently pure and stable, i.e. ensure that the values are not too high (the similarity values of the different land types are not too high, actually).Therefore, this study regarded the high values as 'outliers' and excluded them based on the similarity judgment of the same type of samples (threshold a).On this basis, the threshold b was calculated.Eventually, when constructing the PNS index, neighbouring pixels smaller than b were regarded as outliers, and their dedication to the centre pixel was reduced to 0.
In such a way, the PNS index with spectral continuity and spatial relationships was obtained.It is calculated as follows： where i is the centre pixel with m neighbouring pixels, d n is the distance of each neighbouring pixel to i, and R d n is the relative distance weight of each neighbouring pixel to i: Here, R is the spatial resolution of the images.P is the number of bands, and x ik and x nk denote the value of the central pixel and its n th neighbouring pixel on the k th band, respectively.The closer the result is to 1, the more similar the pixel is to the neighbouring pixels, and the more likely it is to be the same type.

Contrast indices
These contrast features include the Vector Similarity (VS) index (Song and Yan 2014), the Change Vector Intensity (CVI) index (Bayarjargal et al. 2006), and the Correlation (COR) index calculated based on PCA1 (Qu et al. 2015).
The VS index VS is a detective method for spatial variation based on feature vectors.It normalized the objects involved in change detection into feature vectors (A and B).The modulus and the angle cosine (cos h) of two feature vectors were calculated to obtain the VS index.The index is calculated as follows: where A i and B i represent each component of vectors A and B in the n-dimensional space, respectively.The VS values are from 0 to 1.The closer to 1, the higher the similarity between the two vectors.The VS index and the PNS index are both cosine similarity measures, but the VS index is more complex to compute than the PNS index.
The CVI index CVI is a research method based on a radiometric variety of data.It determines whether the pixel type has changed by calculating the intensity of the spectral change between different pixels.The intensity of the change is expressed as Euclidean distance.The index is more commonly used for detecting spectral changes in multi-temporal.Here, this study applied it to space to detect similarities between pixels.The index is calculated as follows: where i is the centre pixel, j is one of i's neighbouring pixels, p is the number of bands and x is the value of the corresponding band.

The COR index
The Correlation (COR) index in the Grey Level Co-occurrence Matrix (GLCM) is a spatial similarity metric.It reflects the correlation of the matrix elements in the row or column direction.However, the COR index cannot handle multidimensional spectral data simultaneously.This study extracted it based on PCA1.

Feature selection algorithm
The Correlation-based Feature Selection (CFS) algorithm is the classical filtered method based on search strategies (Hall 1998).The core of this method is using a 'Heuristic' search strategy and a correlation assessment function to evaluate the importance of all feature combinations.The 'Heuristic' assumes that a good feature combination contains highly correlated features with the classification targets, but features are not correlated with each other.The 'Heuristic' calculation equation is as follows: where Merit s is the 'Heuristic Merit' of the feature combination S, the larger its value, the better the S; k is the number of features contained in S; r cf indicates the average correlation between features and classification targets; r ff indicates the average correlation between features and features.
The CFS algorithm uses Symmetric Uncertainty (SU) to calculate the correlation of the above equation.
where both X and Y denote sets of discrete variables containing multiple attribute values in the feature; The CFS algorithm first calculated the feature-target and feature-feature correlation matrices from the training subset.Secondly, the best first search method was used to search for features and calculated estimates for all individual features (represented by Merit values).The feature with the largest Merit value was selected into the feature combination S, and the second largest feature was selected.Suppose the Merit of these two features was less than the previous Merit.In this case, the second feature was removed, and so on, until the largest Merit combination was selected.

Performance quality assessment of features and feature combinations
When measuring the performance quality of the PNS, VS, CVI, and COR indices from the perspective of image quality, information content and classification capability, this study chose the three indicators of Information Gain (InfoGain) (Lee C and Lee 2006), Geary's C index (Tong et al. 2006), and ReliefF algorithm (Tan et al. 2019).The InfoGain is used to describe the ability of features to differentiate data samples and the purity of the classified attributes.The higher the value, the greater the dedication of the feature to the classification targets.The Geary's C index is a spatial autocorrelation index.The closer its value is to 0, the more spatially informative and the higher the image quality.The larger the value, the less spatially informative it is.The ReliefF algorithm uses the 'Hypothesis Margin' to evaluate the classification ability of features.The larger its value, the higher the quality of the feature.
Meanwhile, this study examined the performance of feature combinations from four perspectives: Dimensions (the dimensions of the combination), Merit (the Merit value of the combination), NE (the number of evaluations by the CFS algorithm), and Time (the time consumed by the combination in the classification process).

Accuracy assessment of classification
When evaluating the overall classification accuracy of all land types, this study selected the Overall Accuracy (OA) and the Kappa coefficient.When evaluating the extraction accuracy of each land type, the F1 score was chosen.
OA is expressed as: where p i, i represents the total number of correctly classified land type i, n represents the total number of land types, and N is the total number of samples.Kappa is expressed as: where a i is the number of actual samples in each land type, and b i is the number of predicted samples in each land type.
When evaluating the extraction accuracy of each land type, this study classified the pixels into four types: correctly identified as this class, correctly identified as other class, incorrectly identified as this class, and incorrectly identified as other class.They were represented by True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), respectively.The F1 score is the harmonic average of the Precision and Recall.Precision and Recall are defined as: The F1 score is expressed as: 4. Experiments

Determine the similarity thresholds a, b in the PNS index
In the similarity threshold experiment, some different types of samples were more similar.Therefore, according to the flow in Figure 2, these outliers need to be filtered by the similarity threshold a. Due to the values were distributed stably (Figure 3a) and presented a half-normal distribution (Figure 3b) in the same type of samples.In this case, this study set a confidence level of 0.95 by trial and error.From the mean (l) and standard deviation (r), the threshold a was 0.919 and 0.893 in study areas 1 and 2, respectively.After excluding outliers higher than a, the remaining values also showed a half-normal distribution (Figure 3c and 3d).Therefore, the threshold b was still calculated at the confidence level of 0.95 and was 0.505 and 0.514 in study areas 1 and 2, respectively.Ultimately, the neighbourhood pixels with similarity values less than 0.505 and 0.514 were treated as different types from the centre pixel and ignored their values when constructing the PNS index.
The PNS index is shown in Figure 4. Six sub-areas (A-F) in study areas 1 and 2 were randomly selected for image presentation.The values were relatively high within the same land type and low at the boundaries of different land types.For example, the cropland in patches with high and homogeneous values.The building and water with clear and continuous boundaries.The results show that the PNS index can highlight the boundaries of land cover types.

Feature combination schemes
The 19 single-dimensional features in Table 1 were involved in the CFS algorithm, resulting in a single-dimensional combination (SIN-CFS).The four spectral-spatial features (PNS, VS, CVI, and COR) were separately involved in the CFS algorithm with single-dimensional features, resulting in the corresponding feature combinations: PNS-CFS, VS-CFS, CVI-COM, and COR-COM.Among them, the CVI and COR indices were filtered out in the experiment, but as the contrast indices for PNS, this study combined the remaining features in the PNS-CFS except for the PNS index with CVI and COR to form CVI-COM and COR-COM, respectively.The features contained in SIN-CFS, PNS-CFS, VS-CFS, CVI-COM and COR-COM are shown in Table 2. PNS-CFS reduced three dimensions and one dimension compared to SIN-CFS in study areas 1 and 2. This indicates that adding the PNS index to the CFS algorithm can effectively reduce the dimension of the combination.

Feature space analysis
For the spatial analysis of features, this study analysed the importance of features, the separation of features to land types and the correlation between features.When analysing the importance of features, the Merit values of each feature are shown in Table 3.It can be seen that the Merit values were relatively high on the spectral features and relatively low on the terrain and texture features.In addition, the Merit values of the spectral-spatial features were ranked from high to low in study area 1 as PNS > VS > CVI > COR and in study area 2 as PNS > CVI > VS > COR.In comparison, the PNS index performed better.Moreover, this study further explored the separation of features to land types and selected the PNS, VS, CVI, and COR indices and the other five features with high Merit values (NDWI, LSWI, CARI, Mean, and DEM) for analysis.As shown in Figure 5, for NDWI, LSWI, CARI, and Mean, the six land types showed a favourable gradient distribution, and the average of each type was highly variable.In contrast, for the PNS, VS, CVI, and DEM, their values were relatively concentrated and had poor separability for land types.For example, grass, forest, and barren are irregularly distributed in study areas, so they had a high span in DEM.Cropland, water, and building are primarily associated with human activity, so their elevations were low and concentrated in DEM.Likewise, the PNS index reflects the similarity between land types, and the values were almost identical within each land type.It also showed a concentrated distribution of values.Therefore, these features not only reflected their ability to separate land types, but also reflected the distribution characteristics of land types.When exploring the correlation between features, this study compared feature correlation in all features, SIN-CFS, and PNS-CFS, to explore whether feature selection and spectralspatial features reduce feature redundancy.As shown in Figure 6, the feature pairs with high correlation (Pearson correlation coefficient !0.8 or À0.8) of all features, SIN-CFS and PNS-CFS were 18.42%, 13.64% and 8.33% in study area 1 and 10.00%, 6.67% and 2. 78% in study area 2, respectively.The ratios became significantly smaller, and PNS-CFS had the lowest correlation between features.Therefore, feature selection can reduce the correlation between features compared to all features, and adding the PNS Index in feature selection can significantly reduce the correlation compared to SIN-CFS.

Performance quality of features and feature combinations
When measuring the quality of spectral-spatial features (Table 4), the PNS index had the highest InfoGain and ReliefF values in study areas 1 and 2, indicating that the PNS has information and the ability to differentiate land types.However, the VS and COR indices had Geary's C of 0.88 and 0.98 in study areas 1 and 2, respectively, and they performed relatively best.Geary's C of the PNS index was intermediate, which may be related to the fragmentation phenomenon and distribution characteristics of land types.Overall, from the perspective of InfoGain, Geary's C, and ReliefF, it can be concluded that the PNS index has high reliability.
When assessing the performance of the feature combinations (Table 5), the Dimensions, NE, and Time (Time calculated based on the ENVI5.3platform IDL8.5 with 2.6 GHz Intel i7-6700HQ CPU and 8 GB RAM) of the PNS-CFS were relatively low, and the Merit values were relatively highest.By comparison, it can be concluded that the PNS-CFS has the best performance.

Classification results
In order to explore the effects before and after adding the PNS index to the CFS algorithm, this paper compared the classification results of SIN-CFS, PNS-CFS, and the original images (ORI).As shown in Figure 7, a Random Forest algorithm (Breiman 2001) was used for classification, and three sub-areas were selected for presentation.The ORI had relatively poor identification for barren.In contrast, the SIN-CFS and PNS-CFS maintained the integrity of the barren very well.Moreover, the PNS-CFS maintained better continuity for narrow roads than the SIN-CFS.Meanwhile, to compare the classification results of the spectral-spatial features (PNS, VS, CVI, and COR), this study compared their combinations: PNS-CFS, VS-CFS, CVI-COM, and COR-COM (Figure 8).For narrower roads, the PNS-CFS performed best than others.It kept the continuity of roads very well.Likewise, for the finer areas between fields, PNS-CFS extracted them as grass and maintained good line properties.However, VS-CFS and CVI-COM still identified them as cropland.COR-COM had a good identification only for these wider areas.In general, the PNS-CFS better maintained the continuity and integrity of land types.This is mainly related to the fact that the PNS index enhances the differentiation boundaries of the land types.

Accuracy assessment
The classification accuracy of all feature combinations was assessed in Table 6.The OA and Kappa were ranked from high to low in study area 1 as PNS-CFS > VS-CFS > CVI-COM > SIN-CFS > ORI > COR-COM and in study area 2 as PNS-CFS > VS-CFS > CVI-COM > SIN-CFS > COR-COM > ORI.The classification accuracy of PNS-CFS was the highest.The OA of PNS-CFS was 94.66% and 93.59% in study areas 1 and 2, respectively.This was 7.48% and 6.02% higher than ORI, respectively.Likewise, this was 7.27% and 2. 39% higher than SIN-CFS, respectively.Furthermore, when comparing the F1 scores for each land type (Table 7), grass, forest, and building had relatively better F1 scores in PNS-CFS and VS-CFS.Water and barren had the best F1 scores in COR-COM and PNS-CFS, respectively.The F1 scores of the cropland were better in the ORI and VS-CFS in study area 1, and PNS-CFS and VS-CFS were better in study area 2. In general, the F1 scores of all land types were relatively stable and high in PNS-CFS and reached over 82%.This was mainly related to the PNS index can enhance the differences between land types and improve the boundaries of distinction between them.Not only that, Figure 9 shows a confusion matrix (with study area 2 as an example).It was used to deeply analyse the feature combinations to extract information about the land types.Each land type was more or less misclassified as building or cropland in all combinations.For example, ORI, VS- CFS, and CVI-COM misclassified 12%, 10%, and 11% of grass as building, respectively.ORI misclassified 13% of forest as cropland.Likewise, 10% of grass and barren was misclassified as cropland and building by COR-COM.These rates of misclassification were relatively high, at around 10%.It is worth noting that PNS-CFS kept the misclassification rate to less than 6% for each land type.Moreover, PNS-CFS improved the extraction accuracy of all land types compared to ORI and SIN-CFS (the extraction accuracy of waters was relatively flat).It also improved the accurate rate of grass, forest, building and barren compared to VS-CFS, CVI-COM, and COR-COM.Among them, the accurate rate of grass and building increased the most.It increased by 4%-13% and 2%-15%, respectively.Therefore, PNS-CFS was superior to other feature combinations.6. Discussion

Analysis of spectral-spatial features
For the spectral-spatial features, this study constructed the PNS index and three contrast indices (VS, CVI, and COR).Among them, spectral and spatial information were considered simultaneously in the PNS, VS, and CVI indices.The COR index separated the spectral and spatial information when it was constructed.Firstly, it compressed the spectral information into a one-dimensional plane and then extracted the spatial features.However, Zhang et al. (2018) proved the superiority of simultaneously combining spectral and spatial information when constructing spectral-spatial features.Their point also was  demonstrated in this study.This is because the COR index had poor performance quality (Table 4) and classification accuracy (Table 6).
In addition, the PNS and VS indices were based on cosine similarity, unlike the CVI index.The cosine similarity distinguishes the difference from the direction.It reflects relative differences in values and is not sensitive to absolute values.Even in high-dimensional data, the cosine similarity remains between 0 and 1.In contrast, the CVI index was based on Euclidean distance.Euclidean distance reflects the absolute difference in values.This characteristic causes the values to be susceptible to dimensionality, and the range of values is unstable.In classification studies, the spectral values are not uniform and are impossibly distributed in the same range due to the different types of pixels.Therefore, when measuring whether pixels are similar, more attention is paid to the relative differences in pixels, and the PNS and VS indices possess this property.In comparative experiments on performance and accuracy, it is easy to find that the PNS and VS indices performed better.However, the VS index had a relatively high computational complexity.In Table 5, the quality assessment of the VS index was relatively lower than the PNS index.In this study, the PNS index considered the spectral information, spatial relationships, and  distance proximity of the pixels simultaneously.It also contained the judgement criterion of similarity, which can effectively filter out interfering pixels, highlight the boundaries of land types (Figure 6) and amplify the differences between land types.

Analysis of feature combinations
The performance quality and classification accuracy of PNS-CFS was relatively highest through the multiple region experiments and the comparisons.PNS-CFS made the extraction results of land types present a high continuity and integrity (Figures 7 and 8).In study areas 1 and 2, the OA of PNS-CFS was 7.48% and 6.02% higher than ORI and 7. 27%, and 2.39% higher than SIN-CFS, respectively.PNS-CFS significantly improved classification accuracy.Moreover, this study only used the CFS algorithm to feature selection.
It is mainly considered that this algorithm can return the features as a combination, which is more convenient when constructing feature combinations.Although the CFS algorithm achieved good results in feature selection, different algorithms are suitable for different scenarios, and no uniform conclusion has been formed yet.Other algorithms such as Factor Analysis (Kondratyev and Pokrovsky 1979), SeaTH algorithm (Vanniel et al. 2005), Gain Ratio (Jeppson 1994), Optimum Index Factor (Chavez et al. 1982).Alternatively, some improved algorithms based on PCA.Such as Folded-PCA (FPCA), Spectrally-Segmented-FPCA (SSFPCA) and Segmented-FPCA (SFPCA) (Xiuping and Richards 1999; Uddin et al. 2019Uddin et al. , 2021b)).These methods can also be considered in feature selection.In general, the PNS index and the CFS algorithm in this study are only used as a reference.

Performance of the datasets and classification algorithms for extracting land types
The Sentinel-2 can provide multispectral images with a short replay period, high spatial resolution, and global coverage.In addition, hyperspectral images also have been widely used in classification studies (Rastogi et al. 2022).Compared to multispectral data, it has near-continuous spectral information, which significantly improves the detection and identification of land types.However, when extracting features based on hyperspectral data, the data must be reduced in dimension, which inevitably increases the computational effort.For the extraction of spectral-spatial features, both spectral and spatial information need to be considered.Although hyperspectral data has massive spectra, this advantage results in relatively low spatial resolution.In large-scale classification studies, open and timely data often helps strengthen land planning and management decisions.
For the classification algorithm, this study used the random forest algorithm.This algorithm uses a parallel approach to construct decision trees, and each tree is constructed using independent samples from the training set.Limited and unusual samples have less impact on the results.Moreover, The Out-Of-Bag (OOB) error can measure the classification accuracy of the random forest (Zhu et al. 2019b).This study used OOB error to measure the classification effect of different feature combinations (Figure 10).In the figure, the OOB error of PNS-CFS was always the smallest compared to the other combinations.This also proved that random forest had the highest classification accuracy in the PNS-CFS.In addition, the deep neural network (DNN) algorithm has also been widely used for classification problems (Cai et al. 2018).However, in this study, the sample sizes in the hundreds did not satisfy the criteria of DNN, and large area classification also affected the classification efficiency of DNN.In the future, deep learning methods will be used to promote the ability of image classification.

Conclusions
This study proposed a combination of spatial-spectral features (PNS-CFS) to classify images.This combination achieved accurate classification and produced reliable results in different study areas.With the following conclusions: 1.The PNS index had high and homogeneous values in the same land cover type and low values in the boundary areas of different land cover types.This characteristic make objects with obvious boundaries.2. In study areas 1 and 2, the PNS-CFS reduced the high-correlated feature pairs by 10. 09% and 7.22%, respectively, compared to all features without feature selection.
Compared to the single-dimensional feature combination (SIN-CFS), it decreased by 5.31% and 3.89%, respectively.PNS-CFS significantly reduced the correlation between features.When comparing the quality of the features, the InfoGain, Geary's C and ReliefF values of the PNS index were relatively best.The performance of PNS-CFS was also relatively optimal when comparing the performance quality of the feature combinations.3. The overall accuracy (OA) of the PNS-CFS was 94.66% and 93.59% in study areas 1 and 2, respectively.This result is 7.27% and 2.39% higher than the combination without the PNS index (SIN-CFS), and 7.48% and 6.02% higher than the original images (ORI).
These results demonstrated that the PNS index is a superior spectral-spatial feature, and the PNS-CFS is a feature combination with high performance and classification accuracy.When classifying images from the perspective of feature extraction, the construction approach of the PNS index and the PNS-CFS can provide an idea.

Figure 1 .
Figure 1.Study areas and field samples distribution.

Figure 2 .
Figure 2. The calculation process of the PNS index.

Figure 3 .
Figure 3. Distribution of similarity values for the samples.(a) Similarity values and (b) half-normal distribution for the same type of samples in study area 1, and half-normal distribution for the different types of samples in (c) study area 1, and (d) study area 2.

Figure 4 .
Figure 4. Detailed display of the PNS index.

Figure 5 .
Figure 5.The ability of features to separate land types.The centerline of each box in the boxplot is the median, and the edges of the box represent the upper and lower quartiles.

Figure 6 .
Figure 6.Correlation between features.From (a) to (c), represent all features, SIN-CFS, and PNS-CFS in study area 1, respectively, and from (d) to (f), represent all features, SIN-CFS, and PNS-CFS in study area 2.

Figure 9 .
Figure 9. Confusion matrix of classification results for each land type based on different feature combinations.

Table 1 .
List of features.

Table 2 .
List of features of the four combination schemes.

Table 3 .
Merit values of all features in the classification.

Table 4 .
Quality of the PNS, VS, CVI, and COR indices.

Table 5 .
Performance of SIN-CFS, PNS-CFS, VS-CFS, CVI-COM, COR-COM.COM and COR-COM combined features from PNS-CFS other than PNS and were not directly involved in the CFS algorithm.Therefore, they do not contain NE and Merit is an estimation interval.

Table 6 .
The OA and Kappa for feature combinations.

Table 7 .
The F1 score for each land type.