Evaluation of ForestPA for VHR RS image classification using spectral and superpixel-guided morphological profiles

ABSTRACT In very high resolution (VHR) remote sensing (RS) classification tasks, conventional pixel-based contextual information extraction methods, such as morphological profiles (MPs), extended MPs (EMPs) and MPs with partial reconstruction (MPPR), rely on limited numbers, sizes and shapes of structural elements (SEs) and therefore cannot perfectly match all sizes and shapes of the objects in an image. To overcome this limitation, we introduce novel spatial feature extractors, namely, the superpixel-guided morphological profiles (SPMPs), where the superpixels are used as SEs in opening-by-reconstruction and closing-by-reconstruction operations. Moreover, to avoid possible side effects from unusual maximum and minimum values within superpixels, a variant that adopts the mean pixel value of each superpixel (SPMPsM) is also proposed. Additionally, ForestPA, a new decision forest constructed by penalizing the attributes used in previous trees, is introduced and evaluated through a comparative investigation on three VHR multi-/hyperspectral RS image classification tasks. The support vector machine and benchmark ensemble classifiers, including bagging, AdaBoost, MultiBoost, ExtraTrees, Random Forest and Rotation Forest, are adopted. The experimental results confirm the effectiveness and superior performance of the proposed SPMPs and SPMPsM relative to the MPs and MPPR. Moreover, ForestPA outperforms only bagging and, from the perspectives of computational efficiency and classification accuracy, is not suitable for learning from large numbers of samples with high dimensionality.


Introduction
In recent years, airborne and spaceborne multi-/hyperspectral remote sensors have advanced in terms of spectral and spatial resolution, which makes the analysis of small spatial structures possible with unprecedented spatial details. Hyperspectral sensors can provide detailed spectral information with hundreds of spectral wavelengths and increase the ability to accurately discriminate the materials of interest. However, the high dimensionality of hyperspectral images may lead to the Hughes phenomenon, which is related to the curse of dimensionality in classification tasks (Camps-Valls, Tuia, Bruzzone, & Benediktsson, 2014). Additionally, while high resolution (HR) and very high resolution (VHR) data solve the problem of "seeing" structural objects and elements, they do not help in focusing on the extraction procedure (Gamba, Dell'Acqua, Stasolla, Trianni, & Lisini, 2011). Two major challenges, namely, the necessity of spectral dimensionality reduction and the need for specific spectral-spatial classifiers, have been identified by the HR and VHR multi-/hyperspectral remote sensing (RS) image processing community (Fauvel, Tarabalka, Benediktsson, Chanussot, & Tilton, 2013; Plaza et al., 2009).
Among the efforts to address the abovementioned challenges, adding contextual information to reveal relationships and dependencies among image objects has become one of the most important directions for the successful analysis of HR and VHR RS images. Principally motivated by the ability of texture features to provide a quantitative description of image properties, including smoothness, roughness, symmetry and regularity, many texture extraction methods, such as statistical (gray-level co-occurrence matrix, GLCM), geometrical and structural approaches; Markov random field (MRF)- and conditional random field (CRF)-model-based approaches; and signal processing (Gabor filter) approaches, have been examined for urban land-cover mapping (Kasetkasem, Arora, & Varshney, 2005; Ma et al., 2017; Rajadell, García-Sevilla, & Pla, 2013; Zhang, Wang, Gong, & Shi, 2003). However, statistical textures (e.g. GLCM) are typically computed in a moving window of a specified size along a certain direction, thereby imposing a crisp neighborhood that is not adapted to each pixel in the image. Additionally, pixel-based graph models, including MRF and CRF, often suffer from "salt and pepper" noise and cannot capture contextual information about objects. Instead of pixel-by-pixel classification, object-based image analysis (OBIA) and geographic OBIA (GEOBIA) techniques have been widely used to divide images into homogeneous segments and assign semantic labels according to the properties of image segments in HR and VHR image classification tasks. However, due to the complexity and heterogeneity of HR and VHR images, the segmentation process is challenging because it typically relies on parameters that are highly dependent on the image at hand and the specific task (Blaschke et al., 2014; Costa, Foody, & Boyd, 2017; Gu et al., 2017; Ma, Cheng, Li, Liu, & Ma, 2015).
For instance, objects and geographic objects often have their own optimal segmentation scales, even within the same class (Zhao, Du, Wang, & Emery, 2017). Accordingly, multiresolution segmentation (MRS) methods have been proposed to segment HR and VHR images at multiple scales. Specifically, the MRS methods aim to partition HR and VHR images into image objects by minimizing the heterogeneity within objects and maximizing the differences across objects. Although MRS-based approaches can produce multi-scale segments by employing various parameters, they still require prior knowledge about the inherent scales of each geographic class, which depend not only on the spatial and spectral features but also on the semantic class.
As a newer family of contextual information extraction methods, mathematical morphology-based approaches such as morphological profiles (MPs), extended morphological profiles (EMPs), attribute profiles (APs) and morphological profiles with partial reconstruction (MPPR) have demonstrated the benefits of using geometrical information from HR and VHR images in many urban applications (Benediktsson, Palmason, & Sveinsson, 2005; Dalla Mura, Villa, Benediktsson, Chanussot, & Bruzzone, 2011; Liao et al., 2017). However, being connected filters, these approaches have limitations: (1) structural elements (SEs) with user-specified shapes and sizes are highly constrained when modeling the size, shape and homogeneity characteristics of objects; (2) attribute filters (AFs) based on geodesic reconstruction still suffer from the problem of leakage, which is also referred to as over-reconstruction; and (3) most importantly, a sequence of SEs with predefined sizes and shapes cannot perfectly match all sizes and shapes of objects in a given image, particularly since a single SE is applied to the entire image in each operation. To this end, we introduce the superpixel-guided morphological profiles (SPMPs), in which all superpixels are used as SEs in opening by reconstruction (OBR) and closing by reconstruction (CBR) operations. To avoid possible side effects from unusual maximum and minimum values within superpixels, a variant that uses the mean pixel value of each superpixel, namely, the SPMPsM, is further adopted.
Recently, a new decision forest algorithm called ForestPA, which is constructed by penalizing the attributes used in previous trees, was proposed by Adnan & Islam (2017). According to their experiments on 20 well-known UCI ML datasets, ForestPA generates more accurate and more balanced decision forests than bagging, random subspace, RaF or ExtraTrees in terms of classification accuracy, while being comparable to the contending algorithms in terms of complexity. However, the performance of ForestPA in the classification of HR and VHR RS images is unclear, specifically compared with that of bagging, random subspace, RaF, ExtraTrees and RoF. Thus, it is of interest to comparatively investigate the performance of ForestPA in the classification of HR and VHR multi-/hyperspectral images over urban areas, specifically using MPs, EMPs, MPPR and the proposed SPMPs and SPMPsM features.
The main contributions of this paper are as follows: (1) SPMPs and SPMPsM are proposed and comprehensively applied for spatial feature extraction in VHR RS images; (2) effective model parameters are recommended based on different experimental scenarios; and (3) ForestPA is introduced and comparatively investigated in the classification of HR and VHR multi-/hyperspectral images over complex urban areas.

ForestPA
ForestPA is constructed by penalizing attributes that are used in previous trees in a decision forest (Adnan & Islam, 2017). More specifically, considering that an attribute tested at a lower level can influence more logic rules than an attribute tested at a higher level, ForestPA imposes weights in a systematic way such that an attribute tested at a lower level receives a lower weight (i.e. a higher penalty) than an attribute tested at a higher level. Additionally, ForestPA randomly selects the weight of an attribute from the weight range allocated to the attribute's level λ (where λ = 1 denotes the root node); the weight ranges for the various levels are non-overlapping and are separated by a constant ρ, for which a value of 0.001 is recommended (Adnan & Islam, 2017).
During the construction process, ForestPA imposes weights only on those attributes that appear in the latest tree; the weights of the attributes that do not appear in the latest tree (and thus were obtained from a previous tree) are automatically preserved. By retaining previous weights, ForestPA avoids switching among similar trees. However, this strategy may also have the negative effect of excluding attributes that received relatively small weights from all subsequent trees. To address this issue, ForestPA adopts a gradual weight increment for each attribute, calculated from ω_i, the weight of the ith attribute A_i in the previous tree, and η, the tree height, which is equal to the highest level of the tree (Adnan & Islam, 2017). Accordingly, ForestPA can be built by following the four main steps described in Table 1.

Superpixel-guided morphological profiles
Mathematical morphological operators act on the values of the pixels and consider the pixels' neighborhoods, which are determined by an SE with a predefined size and shape, based on the two basic operators of dilation and erosion. The erosion of an image by an SE at any location (x, y) is defined as the minimum value of all the pixels in its SE-defined neighborhood. In contrast, dilation returns the maximum value of the image in the window that is outlined by the SE. In grayscale morphological reconstruction, the morphological OBR of a gray-scale image can be obtained by eroding the input image (f) and using the result as a marker (g), while the CBR can be obtained by complementing the image, obtaining the opening by reconstruction and complementing the result of the procedure (Benediktsson et al., 2005; Dalla Mura et al., 2011; Liao et al., 2017; Rafael & Richard, 2010). As stated earlier, SEs with user-specified shape and size are highly constrained when modeling objects with different sizes, shapes and homogeneity information. Additionally, a sequence of SEs with predefined sizes and shapes cannot perfectly match all the sizes and shapes of objects in an image, specifically with a single SE for the entire image for each operation case. Moreover, according to the definitions of OBR and CBR, if SEs are replaced with perceptually meaningful atomic regions, namely, superpixels, then we can use many SEs with various shapes and sizes to better match the sizes and shapes of objects in an image. More importantly, the operation provides sufficiently many SEs for the entire image for each operation case. In other words, we can obtain spatial features of similar discrimination capability with lower dimensionality.

Table 1. Algorithmic steps of ForestPA.
Inputs: labeled training set X, number of trees T, set of attribute weights W, set of weight increment values S.
Training: for t = 1, ..., T:
  Step 1: generate a new set X_t from X by bootstrap sampling;
  Step 2: build a conventional decision tree DT_t using X_t;
  Step 3: for each attribute A_j ∈ DT_t, update its weight w_j and gradual weight increment σ_j using Equation (1) and Equation (2), respectively;
  Step 4: for each attribute A_j ∉ DT_t with w_j < 1, increase w_j by σ_j;
  add the tree to the forest: ForestPA ← ForestPA ∪ {DT_t}.
Output: ForestPA.
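The four steps in Table 1 can be sketched as follows. This is an illustrative approximation, not the authors' implementation: the exact weight-range and increment formulas (Equations (1) and (2)) could not be recovered here, so the ranges below are hypothetical non-overlapping bands that grow with the level λ, the increment rule (1 − w)/η is an assumption, and penalization is emulated by randomly suppressing low-weight attributes rather than by weighting the split merit inside tree induction.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def feature_levels(tree):
    # shallowest level at which each feature is tested (root = level 1)
    t, levels, stack = tree.tree_, {}, [(0, 1)]
    while stack:
        node, lvl = stack.pop()
        f = t.feature[node]
        if f >= 0:  # internal (split) node
            levels[f] = min(levels.get(f, lvl), lvl)
            stack.append((t.children_left[node], lvl + 1))
            stack.append((t.children_right[node], lvl + 1))
    return levels

def forest_pa_sketch(X, y, n_trees=10, rho=0.001):
    n_feat = X.shape[1]
    w = np.ones(n_feat)       # attribute weights (1 = unpenalized)
    sigma = np.zeros(n_feat)  # gradual weight increments
    forest = []
    for _ in range(n_trees):
        boot = rng.integers(0, len(X), len(X))      # Step 1: bootstrap sample
        # emulate penalization: low-weight attributes are randomly suppressed
        # (the real algorithm weights the split merit inside tree induction)
        keep = rng.random(n_feat) < w
        keep[rng.integers(0, n_feat)] = True        # keep at least one attribute
        tree = DecisionTreeClassifier(random_state=0).fit(
            np.where(keep, X[boot], 0.0), y[boot])  # Step 2: build DT_t
        forest.append((tree, keep))
        levels = feature_levels(tree)
        eta = max(levels.values(), default=1)       # tree height
        for f in range(n_feat):
            if f in levels:   # Step 3: penalize used attributes; hypothetical
                lam = levels[f]  # non-overlapping ranges growing with lambda
                w[f] = rng.uniform(np.exp(-1 / lam) + rho, np.exp(-1 / (lam + 1)))
                sigma[f] = (1.0 - w[f]) / eta       # assumed increment rule
            elif w[f] < 1.0:                        # Step 4: gradual recovery
                w[f] = min(1.0, w[f] + sigma[f])
    return forest

def predict(forest, X):
    votes = np.stack([t.predict(np.where(keep, X, 0.0)) for t, keep in forest])
    return np.array([np.bincount(c.astype(int)).argmax() for c in votes.T])

X, y = load_iris(return_X_y=True)
forest = forest_pa_sketch(X, y)
print(f'training OA: {(predict(forest, X) == y).mean():.3f}')
```

Note how Step 4 only touches attributes with w_j < 1, so an attribute absent from the latest tree gradually recovers toward the unpenalized weight of 1, as described above.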
Again, according to the definitions of OBR and CBR, the superpixel-guided OBR (SPOBR) can be obtained by eroding the input image using the selected superpixels J' = {J'_1, ..., J'_S} as SEs, where S represents the number of superpixels, and using the result as a marker in a geodesic reconstruction-by-dilation phase. Similarly, the superpixel-guided CBR (SPCBR) is obtained by complementing the image, computing the SPOBR using each superpixel J'_i ∈ J' as an SE, and complementing the result. Finally, the SPMPs of an image f are defined as the stack of the SPOBR and SPCBR images. To avoid the possible side effects of unusual minimum or maximum pixel values within superpixels, the SPMPsM can be obtained by using mean pixel values instead. Although the use of MPs can help in creating an image feature set that carries more discriminative information, redundancy is still evident in the feature set, particularly for hyperspectral images. Therefore, feature extraction can be used to find the most important features first, after which the morphological operators are applied. After PCA has been applied to the original feature set, the extended SPMPs and SPMPsM can be obtained by applying the basic principles of SPMPs and SPMPsM described above to the first few (typically three) features.
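A minimal sketch of SPOBR and SPCBR, assuming a grayscale image scaled to [0, 1]. Block-shaped labels stand in for SLIC superpixels here, geodesic reconstruction is implemented as iterative dilation under the mask, and the mean-value variant (used by SPMPsM) clips the marker under the image so that reconstruction by dilation remains valid; all function names are our own.

```python
import numpy as np
from scipy.ndimage import grey_dilation

def reconstruct_by_dilation(marker, mask):
    # geodesic reconstruction: repeatedly dilate the marker, clipped by the mask
    while True:
        nxt = np.minimum(grey_dilation(marker, footprint=np.ones((3, 3))), mask)
        if np.array_equal(nxt, marker):
            return nxt
        marker = nxt

def sp_obr(img, labels, use_mean=False):
    # superpixel-guided OBR: each superpixel acts as an SE, so the marker
    # takes the superpixel's minimum (or mean, for the SPMPsM variant)
    marker = np.empty_like(img)
    for lab in np.unique(labels):
        m = labels == lab
        marker[m] = img[m].mean() if use_mean else img[m].min()
    if use_mean:
        marker = np.minimum(marker, img)  # keep the marker under the mask
    return reconstruct_by_dilation(marker, img)

def sp_cbr(img, labels, use_mean=False):
    # closing by reconstruction via complementation, assuming img in [0, 1]
    return 1.0 - sp_obr(1.0 - img, labels, use_mean)

# demo: an 8x8 block partition stands in for SLIC superpixels
rng = np.random.default_rng(0)
img = rng.random((32, 32))
labels = np.arange(16).reshape(4, 4).repeat(8, axis=0).repeat(8, axis=1)
opened = sp_obr(img, labels)
closed = sp_cbr(img, labels)
```

In a real pipeline the `labels` array would come from a superpixel algorithm such as SLIC; stacking `opened` and `closed` images over several superpixel scales yields the SPMPs described above.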

Datasets and experimental configuration
Datasets
The ROSIS Pavia University hyperspectral image was acquired with the ROSIS optical sensor, which provides 115 bands over the spectral range of 0.43-0.86 μm. The geometric resolution is 1.3 m. The image, which is shown in Figure 1(a), was captured over the Engineering School, University of Pavia, Pavia, Italy, and has pixel dimensions of 610 × 340 with 103 spectral channels (several original bands were noisy and were discarded immediately after the data acquisition). The validation data refer to nine land-cover classes (as shown in Figure 1 and Table 2).

Table 2. Class samples of the ROSIS Pavia University image.
No.  Class      Test    Training  Total
1    Asphalt    6631    548       7179
2    Meadows    18,649  540       19,189
3    Gravel     2099    392       2491
4    Trees      3064    524       3588
5    Metal      1345    265       1610
6    Bare soil  5029    532       5561
7    Bitumen    1330    375       1705
8    Bricks     3682    514       4196
9    Shadows    947     231       1178

The second hyperspectral image was acquired at a spatial resolution of 2.5 m by the NSF-funded Center for Airborne Laser Mapping over the University of Houston campus and the neighboring urban area on 23 June 2012. The image has 349 × 1905 pixels with 144 spectral bands in the spectral range between 380 and 1050 nm. The 15 classes of interest selected by the Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society (GRSS) are reported for both the training and validation sets (Debes et al., 2014). In this work, a subset image with pixel dimensions of 340 × 1350 is obtained by removing the blank areas and the cloud-covered area on the right side of the image, as shown in Figure 2.
The last set of test data was obtained from a multispectral VHR image (Figure 3) that was collected over the city of Zurich (Switzerland) by the QuickBird satellite in August 2002 and is freely available at https://sites.google.com/site/michelevolpiresearch/data/zurichdataset. Originally, the image had a pixel size of 1295 × 1364, was composed of 4 channels (NIR-R-G-B) and was pansharpened to a PAN resolution of approximately 0.62 m GSD. A total of 7 urban and peri-urban classes were manually annotated: roads, buildings, trees, grass, bare soil, railways and swimming pools. The cumulative number of class samples is highly unbalanced, which reflects real-world situations (see Table 4). In the experiments, we evaluate the generalization accuracy in a hold-out setting, that is, by training on a small portion of the samples (fewer than 250 per class) and evaluating the classifier on the remaining left-out samples.

Experimental configuration
To generate MPs and MPPR, we applied a disk-shaped SE with n = 10 openings and closings by conventional and partial reconstruction, with SE sizes ranging from one to ten in increments of one, as recommended by Benediktsson et al. (2005), Dalla Mura et al. (2011) and Liao et al. (2017). Therefore, we obtained datasets with dimensions equal to 70 (i.e. 10 + 3 × 10 × 2), 67 (i.e. 7 + 3 × 10 × 2) and 84 (i.e. 4 + 4 × 10 × 2) for the ROSIS, GRSS-DFC2013 and Zurich QuickBird datasets, respectively. Specifically, only the first three PCA-transformed features were used for extracting MPs and MPPR for ROSIS and GRSS-DFC2013, and the raw four spectral bands were used for Zurich QuickBird.
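The disk-SE profile construction described above can be sketched as follows. This is a hedged sketch, not the authors' implementation: a grayscale image is assumed, scipy's gray-scale morphology stands in for the original operators, reconstruction is implemented as iterative geodesic dilation, the complement is taken against the image maximum, and the function names are our own.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def disk(radius):
    # boolean disk-shaped structuring element of the given radius
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2

def reconstruct_by_dilation(marker, mask):
    # geodesic reconstruction: dilate the marker repeatedly, clipped by the mask
    while True:
        nxt = np.minimum(grey_dilation(marker, footprint=np.ones((3, 3))), mask)
        if np.array_equal(nxt, marker):
            return nxt
        marker = nxt

def morphological_profile(img, radii=range(1, 11)):
    # stack of OBR and CBR images, one pair per SE radius
    top = img.max()
    profile = []
    for r in radii:
        se = disk(r)
        obr = reconstruct_by_dilation(grey_erosion(img, footprint=se), img)
        cbr = top - reconstruct_by_dilation(
            grey_erosion(top - img, footprint=se), top - img)
        profile.extend([obr, cbr])
    return np.stack(profile)

img = np.random.default_rng(0).random((32, 32))
mp = morphological_profile(img, radii=[1, 2])
print(mp.shape)  # (4, 32, 32)
```

With `radii=range(1, 11)` this yields the 10 openings and 10 closings per input feature that produce the dimensionalities reported above (e.g. 3 × 10 × 2 = 60 profile bands for three PCA features).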
To generate superpixels, we apply the simple linear iterative clustering (SLIC) algorithm, which adapts k-means clustering to generate superpixels in an oversegmented manner (Achanta et al., 2012). Notably, the algorithm carries the advantageous properties of striking simplicity, high speed and memory efficiency, free availability and state-of-the-art performance. Moreover, the number of superpixels can range from 100 to 20,000 with step sizes of 10, 20, 30, 40, 50, 100, 200, 250, 300, 500, 750 and 1000 for evaluating the effectiveness and configuring the optimal sets. In all cases, SPMPs are generated at the same dimensionality as MPs and MPPR for a fair comparison. All the datasets used in the experiments are normalized into [−1, 1]. As classifiers, we considered ensembles of C4.5 trees built with bagging (Breiman, 1996), AdaBoost (Rätsch, Onoda, & Müller, 2001) and MultiBoostAB (Webb, 2000), as well as ExtraTrees (Geurts, Ernst, & Wehenkel, 2006; Samat et al., 2018), RaF (Breiman, 2001), RoF (Rodriguez, Kuncheva, & Alonso, 2006), ForestPA (Adnan & Islam, 2017) and the SVM (Cortes & Vapnik, 1995). The parameters of the SVM are tuned with a cross-validation technique. The numbers of DTs in bagging, RaF, RoF, ExtraTrees and ForestPA and the numbers of iterations in AdaBoost and MultiBoostAB are set to 100 by default. The overall accuracy (OA), kappa statistic and CPU running time are used to evaluate the classification performances of these methods using EMP, EMPPR, ESPMP and ESPMPsM features. Finally, ForestPA is evaluated in terms of classification accuracy, computational cost and robustness to the number of DTs in the ensemble.
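The evaluation protocol above (100-member ensembles, [−1, 1] normalization, OA and kappa) can be sketched with scikit-learn equivalents on a synthetic stand-in for the spectral-spatial feature stacks; RoF, MultiBoostAB and ForestPA have no scikit-learn implementation and are omitted here, and scikit-learn's CART trees stand in for C4.5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, RandomForestClassifier)
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for a spectral-spatial feature stack
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X = 2 * (X - X.min(0)) / (X.max(0) - X.min(0)) - 1   # normalize to [-1, 1]
Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=250, stratify=y,
                                      random_state=0)

ensembles = {
    'Bagging': BaggingClassifier(n_estimators=100, random_state=0),
    'AdaBoost': AdaBoostClassifier(n_estimators=100, random_state=0),
    'RaF': RandomForestClassifier(n_estimators=100, random_state=0),
    'ExtraTrees': ExtraTreesClassifier(n_estimators=100, random_state=0),
}
for name, clf in ensembles.items():
    pred = clf.fit(Xtr, ytr).predict(Xte)
    print(f'{name}: OA={accuracy_score(yte, pred):.3f}, '
          f'kappa={cohen_kappa_score(yte, pred):.3f}')
```

In the actual experiments, `X` would be the EMP, EMPPR, ESPMP or ESPMPsM feature stack and the split would follow the fixed training/test sets in Tables 2-4.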

Evaluation of SPMPs
To visually evaluate the contextual information extraction capability of SPMPs, we first present the OBR, the CBR, the OBR and CBR with partial reconstruction (OBPR and CBPR, respectively) and the superpixel-guided OBR and CBR (SPOBR and SPCBR, respectively) at various scales, computed from the lower-left corner of the second component of the ROSIS Pavia University image, in Figure 4.
According to the graphs in the left part of Figure 4, the OBR and OBPR images become brighter as the size of the SE increases, while the boundaries between the different land-cover types become increasingly thin, and some parts ultimately merge with other land-cover classes. Specifically, many large objects that should remain are filtered out, while many small objects that should disappear remain at a high scale after OBPR and CBPR at a specific scale of the area attribute; see the graphs in line 2. In contrast, the boundaries between the different land-cover types remain exactly as in the original, and only the areas within the corresponding boundaries are filtered by the proposed SPOBR. Moreover, the number of superpixels (the scale parameter) affects only the brightness or darkness of the areas within the corresponding boundaries. Similar results can be found in the graphs presented in the right part of Figure 4.
To further evaluate the effectiveness of the proposed feature extraction methods, Figure 5 presents the overall accuracy (OA) values that are obtained using various sets of SEs (from 1 to 10 with a step size of 1) in MPs and MPPR and various numbers of superpixels (from 1000 to 2000 with a step size of 100 for the ROSIS Pavia University and GRSS-DFC2013 images and from 1000 to 10,000 with a step size of 1000 for the Zurich QuickBird image) in SPMPs and SPMPsM to extract spatial features.
According to the graphs in Figure 5, the proposed SPMPs and SPMPsM are effective. Moreover, SPMPsM is generally superior to SPMPs in terms of classification accuracy. Specifically, the best improvements are achieved by AdaBoost using MPs for the ROSIS Pavia University data, by ExtraTrees using SPMPsM for GRSS-DFC2013, and by AdaBoost using SPMPsM for the Zurich QuickBird data.
The number of SEs, their shape and the step-size are the main elements that control the multiscale spatial feature extraction capabilities of conventional MPs and MPPR. Similarly, according to the definition of the proposed SPMPs, the number of superpixels S and the scale step-size are the two critical parameters for SPMPs and SPMPsM. Figure 6 presents the results for the configuration of these parameters using RaF with 200 DTs for the considered datasets.

Table 4. Class samples of the Zurich QuickBird image.
No.  Class      Total    Train  Test
1    Roads      5070     170    4900
2    Buildings  311,833  233    311,600
3    Trees      251,222  222    251,000
4    Grass      120,532  232    120,300
5    Bare soil  16,043   143    15,900
6    Rails      79,153   153    79,000
7    Pools      72,429   229    72,200
SPMPs are generated at the same dimensionality as MPs and MPPR for a fair comparison. In this sense, there are 15 ranges with a step size of 100 but only 2 ranges with a step size of 750 between 100 and 15,000 superpixels for the ROSIS Pavia University and GRSS-DFC2013 images, which is why the lines in Figure 6 are composed of different numbers of dots. Based on the results shown in Figure 6, both the number of superpixels and the scale step-size can affect the performance of SPMPs and SPMPsM in terms of classification accuracy. Using too many or too few superpixels does not yield positive improvements in OA values. For instance, first positive and then negative improvements in OA values are observed in Figure 6(a-c, e) because, for a given image size, too few superpixels results in large individual superpixels that cover multiple different ground objects, while too many superpixels results in small individual superpixels that cannot provide valuable contextual information. These results also explain why SPMPs can outperform SPMPsM in scenarios with too few superpixels, in which the mean pixel values from superpixels covering multiple ground objects can corrupt the spatial discrimination capability of SPMPs (see Figure 6(a)). In contrast, SPMPsM can outperform SPMPs in cases of small individual superpixels, which cannot provide valuable contextual information by themselves, whereas both the spectral and spatial discrimination capabilities can be improved by considering the mean pixel values (see Figure 6(c-f)). The effects of the scale step size are more complex and are mixed with the effects of the number of superpixels, since a larger scale step size ultimately leads to a larger number of superpixels. Additionally, the effects of the scale step size differ across test images with various object and image sizes.
According to the results shown in Figure 6, the optimal ranges for the number of superpixels are 100-1000, 600-1600 and 1000-10,000 for the ROSIS Pavia University, GRSS-DFC2013 and Zurich QuickBird (Figure 6(c, f)) images, respectively. Accordingly, the optimal ranges for the scale step size are 10-100, 10-100 and 10-1000 for the same images.

Evaluation of ForestPA
In this part, we first evaluate the classification performance of ForestPA as a function of its critical parameter: the number of DTs in the ensemble. Benchmark and widely accepted ensemble learning (EL) classifiers, including bagging, AdaBoost, MultiBoostAB, ExtraTrees, RaF and RoF, are considered with the recommended parameters, except that the numbers of iterations for AdaBoost and MultiBoostAB and the numbers of DTs in the other ensembles are set to the same value as the number of DTs in ForestPA for a fair evaluation. Figure 7 presents the results of all considered approaches using different features from all three test images, where the y-axis represents the OA values and the x-axis shows the number of DTs in the ensemble or the number of iterations.
According to the graphs in Figure 7, the highest OA values are achieved by RoF (black curves with five-pointed stars), AdaBoost (red curves with rectangles) or MultiBoostAB (pink curves with upward-pointing triangles) in most cases, whereas ForestPA (cyan curves with downward-pointing triangles) reaches OA values that are better only than those of bagging (blue curves with diamonds), particularly when using the high-dimensional MP, MPPR, SPMP and SPMPsM features. ForestPA reached better OA values than RaF only when using SPMP features from the ROSIS Pavia University image, the first seven principal components from the GRSS-DFC2013 image, and the raw spectral features from the Zurich QuickBird image; see the OA curves shown in cyan for ForestPA and in yellow for RaF in Figure 7(d, f, k). Additionally, increasing the number of DTs beyond 100 does not yield detectable improvements in OA values for ForestPA using any of the considered features, which is similar to the findings for RaF in many works (Belgiu & Drăguţ, 2016; Du et al., 2015; Pal, 2005). The performance of boosting ensembles such as AdaBoost and MultiBoostAB can be limited by overfitting, owing to their focus on hard-to-classify but less numerous samples, particularly for small numbers of training samples with low discrimination capabilities, as shown in Figure 7(k).
In addition to classification accuracy, computational efficiency is considered a key factor when evaluating classifier performance. According to the algorithmic description in the "ForestPA" section, the nested loop operations slow down the classifier. However, its speed relative to the other benchmark EL classifiers is unclear, particularly in the classification of VHR RS images using features of different dimensionality. Thus, Figure 8 shows the CPU running times (in seconds) of bagging, AdaBoost, MultiBoostAB, ExtraTrees, RaF, RoF and ForestPA in the training and classification phases. A total of 100 C4.5 DTs for bagging, RaF, RoF and ForestPA, 100 ERDTs for ExtraTrees and 100 iterations for AdaBoost and MultiBoostAB with C4.5 DTs are used for a computationally fair and unbiased evaluation.
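The timing protocol can be sketched as follows, measuring the training and classification phases separately with a wall-clock timer; synthetic data stands in for the VHR feature stacks, and RaF and ExtraTrees (the two classifiers with scikit-learn implementations) serve as examples.

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# synthetic stand-in for a 70-dimensional VHR feature stack
X, y = make_classification(n_samples=2000, n_features=70, n_informative=20,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

for name, clf in [('RaF', RandomForestClassifier(n_estimators=100, random_state=0)),
                  ('ExtraTrees', ExtraTreesClassifier(n_estimators=100, random_state=0))]:
    t0 = time.perf_counter()
    clf.fit(X, y)                       # training phase
    t_train = time.perf_counter() - t0
    t0 = time.perf_counter()
    clf.predict(X)                      # classification phase
    t_test = time.perf_counter() - t0
    print(f'{name}: train {t_train:.2f} s, classify {t_test:.2f} s')
```

Timing both phases separately matters here because, as reported below, a classifier can be the slowest to train (ForestPA) while remaining competitive at prediction time.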
According to a comparison of the charts in Figure 8, ForestPA is the slowest of the methods in the training phase, particularly with high-dimensional data, as expected, whereas ExtraTrees shows the best training efficiency, followed by RaF. Interestingly, in the classification phase, ForestPA is slower than RaF but faster than the other methods: the best efficiency is shown by RaF, the worst by RoF, and AdaBoost, bagging, MultiBoostAB and ExtraTrees perform similarly. Our results first confirm that ForestPA is better than bagging, which is in accordance with the findings of Adnan & Islam (2017). However, their finding that ForestPA outperforms RaF and ExtraTrees in terms of classification accuracy is arguable, especially when processing high-dimensional data. This could be explained by the fact that the attribute-penalizing strategy may not correctly weight the carried logic rules when the attributes are high-dimensional and highly correlated (as in our case). Consequently, the negative impact of removing the attributes that received relatively small weights in subsequent trees could be severe and limit the gradual weight-increment strategy that ForestPA adopts. Another finding is that ForestPA has slow model-training efficiency due to its nested loop operations but is comparable with the considered classifiers in the classification phase. Summing up, ForestPA may not accommodate large numbers of samples with high dimensionality from the viewpoint of computationally efficient model training, and may not be suitable for handling data with high dimensionality and high inter-band correlation from the high-accuracy classification point of view.

Classification maps
To further compare the adopted approaches for urban land-cover mapping using PCA-transformed features or raw spectral bands and MP, MPPR, SPMP and SPMPsM features, Figure 9 shows the best land-cover maps, with OA values that correspond to the numbers highlighted in boldface and underlined in Tables 5-7. The improvements from SPMPs and SPMPsM are clear. For instance, the highest classification accuracy values are reached by AdaBoost (OA = 97.67%) using SPMPs features for the ROSIS Pavia University image, by ExtraTrees (OA = 94.55%) using SPMPsM features for the GRSS-DFC2013 image, and by AdaBoost (OA = 91.50%) using SPMPsM features for the Zurich QuickBird image. Moreover, if we compare the results from a specific classifier when using different features, all the classifiers uniformly show higher classification accuracy values when using SPMPs than MPs, MPPR and SPMPsM on the ROSIS University image, and higher classification accuracy values when using SPMPsM than MPs, MPPR and SPMPs on the GRSS-DFC2013 and Zurich QuickBird images, thereby confirming the results obtained in the "Evaluation of SPMPs" section. In addition, ForestPA is superior to bagging (+1% to +8% in OA for the ROSIS Pavia University image, +2% to +5% for the GRSS-DFC2013 image, and approximately +1% for the Zurich QuickBird image) using various types of features from the considered test images. ForestPA shows similar results (±1%) on the Zurich QuickBird image, but worse results (−1% to −6%) on the ROSIS Pavia University and GRSS-DFC2013 images, than those of ExtraTrees and RaF in terms of classification accuracy.

Conclusions
In this paper, we have presented the implementation details, analyzed the parameter sensitivity and provided a comprehensive validation of two novel spatial feature extractors, namely, SPMPs and SPMPsM, where the latter uses the mean values of superpixels. Then, we introduced a recently proposed ML approach, namely, ForestPA, which is similar to RaF but is constructed by penalizing the attributes used in previous trees in a decision forest; ForestPA had been reported to be superior to the bagging, random subspace, RaF and ExtraTrees algorithms (Adnan & Islam, 2017). To fully investigate and evaluate the performance of ForestPA, three VHR multispectral and hyperspectral RS images over urban areas were selected.
The results show that the proposed SPMPs and SPMPsM are effective for urban land-cover mapping using VHR multi-/hyperspectral RS images. Regarding the influence of the critical parameters of SPMPs and SPMPsM, both the total number of superpixels and the scale step size are crucial, but the former has less effect than the latter. The optimal number of superpixels depends not only on the size of the image at hand but also on the sizes and shapes of the objects in that image. Thus, the total number of superpixels must be tuned before further implementation. In general, SPMPs are more compatible with superpixels of large size for introducing more spatial information, with the potential shortcoming of including mixed features from multiple objects. In contrast, the SPMPsM provides better results with superpixels of smaller size, whereas both spectral and spatial discrimination capabilities can be improved by considering the mean pixel values.
Comparing the algorithmic details of ForestPA with those of conventional RaF, we find that the former is more complex than the latter due to its procedure for penalizing attributes. This finding was confirmed by all the experiments, not only in the model-training phase but also in the classification phase relative to RaF. Specifically, ForestPA showed the worst computational efficiency in the training phase when using high-dimensional training sets, even worse than that of RoF. In the classification accuracy evaluation, ForestPA outperformed only bagging; it performed worse than ExtraTrees and RaF in most cases and uniformly worse than AdaBoost, RoF and MultiBoostAB on all considered test images. In summary, ForestPA may not be suitable for addressing cases in which the training set is large (e.g. big data scenarios), from both the computational efficiency and classification accuracy perspectives.
In future work, we plan to focus on the self-adaptive selection of the number of superpixels and the scale step size in SPMPs and SPMPsM for moderate- and high-resolution multi-/hyperspectral and (probably) fully polarimetric SAR images with larger spatial coverage to further improve their classification performance. Additionally, the adoption of high-performance computing techniques, such as parallel, cluster and GPU computing, to accelerate ForestPA is worthy of investigation.