Landsat 8Bands’ 1 to 7 spectral vectors plus machine learning to improve land use/cover classification using Google Earth Engine

ABSTRACT This paper explores a spectral vector-based methodology on Landsat 8 bands of the visible wavelengths, that is deep-blue (1) to shortwave infrared (7), to improve the urban land features classification. Using two different ratio models, based on two and three bands’ combinations in the cloud environment of Google Earth Engine, the Uncertainty reducing Spectral Vector (USVr), the Onward Continuous Spectral Vector (OSVc) and the Onward Discontinuous Spectral Vector (OSVd) are proposed as new entries for the land use land cover (LULC) classification. Two different sizes of arrays are built, i.e. 42 vectors and 15 vectors corresponding to the same number of derivative bands and new pixels′ values. A decision tree is built in J.48 and applied to select the most suitable derivative bands for the analysis. Hereafter, the selected ones are stacked and submitted to five machine learning classifiers using a supervised process, namely, Classification and Regression Trees (CART), Random Forest (RF) Gradient Boosting (GBR), Support Vector Machine (SVM) and Minimum Distance (MD). This method was tested in the two cities of Bamenda and Foumban in west-Cameroon highlands, due to their good representativeness of tropical hilly urban areas’ spatial heterogeneity. The results are satisfying for 4/5 classifiers, up to 87% Overall Accuracy, OA, for 0.82 kappa coefficient, KC, in Bamenda, while combining SVM/OSVd. Whereas, in Foumban, the classifiers perform up to 85%OA and 0.78 KC for the combination SVM/USVr. Only the MD classifier has always performed below 80%OA. The process has been found better than performing classifiers directly on the multispectral (MS) image, by providing more possibilities of hidden spectral indices not yet explored, as far as we know, to detect and discriminate between LULC features, plus an accurate extraction of human settlements.


Overview
Global urban growth is a major concern for the international community. This is understandable in regard to the dizzying growth of the world's population, whom more than half live in cities, and will further increase to 66% by 2050 (United Nations 2014). Specifically, urban sprawl considerably affects land covers and terrestrial ecosystems (Zhang et al. 2015;Zhou, Xing, and Ju 2017). As such, numerous planning strategies supported by spatial research have been increasingly developed and improved to master related challenges, so as to prevent, and solve further issues.
Earth Observatory (EO) technologies have been recognized and widely used for targeted or generalized land cover mapping over the decades (Van der Meer et al. 2002;Bhatta, Saraswati, and Bandyopadhyay 2010;Taubenbock et al. 2012;Patino and Duque 2013). One of the most challenging tasks is the mapping of urban areas, due to their land use land cover (LULC) heterogeneity and rapid changes impacting spatial patterns (Parker et al. 2002;Verburg, De Groot, and Veldkamp 2003). Several techniques are employed to overcome the above issues. Image classification is among the most common processes, and consists in grouping pixels into meaningful classes according to different algorithms (Masek, Lindsay, and Goward 2000;Lu and Weng 2007;Zhang et al. 2002;Congalton and Green 2009).
Various land cover classification approaches have been used to map land covers, with two groups, i.e. parametric and non-parametric classifiers. Popular classification algorithms include the k-means, iterative self-organizing data analysis technique, Mahalanobis distance (MahD), maximum likelihood (ML), spectral angle mapper (SAM) parallelepiped (PP), and minimum distance to means (MD). Whereas advanced classification algorithms include artificial neural networks (ANN), support vector machine (SVM), and decision trees (DT) including classification and regression trees (CART), random forest (RF), gradient boosting regression (GBR). Extended review of different classification approaches and their evolution have been made by several authors (Tou and Gonzalez 1977;Richards and Richards 1999;Breiman 2001;Lu and Weng 2007;He et al. 2015;Gómez, White and Wulder, 2016;Faridatul and Wu 2018). Moreover, depending on the purpose of the study and the spatial resolution of the image, the pixelbased or object-based image analysis (OBIA) can be envisaged (Ma et al. 2015;Li et al. 2016;Ma et al. 2017).
Aside from this, spectral metrics have also been proven effective to characterize and estimate different environmental features such as vegetation, water, soil and builtup (Kavzoglu and Colkesen 2009;Zhai et al. 2002;Li and Wan 2015). For human settlements and artificial surface distribution specifically, spectral indices such as the urban index, UI (Kawamura, Jayamanna, and Tsujiko 1996), the normalized difference built-up index, NDBI (Zha, Gao, and Ni 2003), its modified version, MNDBI (He et al. 2010), and the normalized difference impervious surface index, NDISI (Hanqiu 2010) were proven to be efficient. Some inclusive methods combined existing indices and other derivative bands for the same goal (Xu 2006;. Further methods consider the combination of index-based detection with classification algorithms.

Specific background
These two methods particularly caught our attention and helped to shape the present study. On the one hand, the Normalized Difference Spectral Vector (NDSV) envisages the computation of all possible normalized spectral relations between Landsat ETM+ bands (Angiuli and Trianni 2014). Consider two consecutive bands b i and b j , their relation is expressed as a function f , such as: The six bands covering the visible, near, medium and shortwave infrared lead to 15 normalized derivative bands. Their stacking gives 15 new values for each pixel, in an array of elements called Normalized Difference Spectral Vector (NDSV) written such as: (2) This is assumed to minimize the risk of confusion between features, by compensating the lack of one index per land feature, when considered isolated. Three classifiers, SAM, ML and SVM, were performed, in two contrasting environments -desert and rocks (Tripoli in Libya); forest and lagoon (Lagos in Nigeria). The results were satisfying with a global accuracy of 97%. On the other hand, using Landsat OLI, some existing spectral indices are computed, stacked with multispectral images and submitted to a decision tree (DT) built in the J48 model to select the most suitable bands per class (Lee, Acharya, and Lee 2018). Based on the sum of occurrences along the tree, selected bands are submitted to different stacking combinations. The test was conducted in the hilly regions of South Korea and gave satisfying results for 28 combinations out of the 45 that were run, when applying five different supervised classifiers (MahD, ML, MD, PP and SVM).
Considering different types of cities' natural surroundings, with mixed spectral signals, these methods still need some improvements. As such, issues related to low and moderate resolution images (Faridatul and Wu 2018), as well as unplanned urbanization, mixed built-up material or complex biophysical environments (Ngandam Mfondoum et al. 2019, 2021a, 2021b remain challenging. • Goal and scope of the study This paper merges the spectral vector and decision trees (DT) method. We explore the performance of formulating spectral vectors in different ways, to discriminate LULC classes of heterogeneous areas for selected machine learning algorithms. The assumption is that, different ratio functions, not necessarily normalized, can offer more possibilities to assess differences or similarities of reflectance for the same (group of) pixel(s), between two or more sections of the covered electromagnetic spectrum. However, instead of using all the spectral vectors, an automatic choice of only the suitable ones through a DT will ensure a better accuracy of the classification process.

Study sites
Two hilly cities of the Cameroon western highlands, globally aligned south-north, were chosen for the experiment, based on the high heterogeneity of their land cover features (Figure 1a,b). The first site, Bamenda (Figure 1c), is perched on the hillside of Mont Oku with an average altitude of 1,600 m in the North-West administrative region, between latitudes 5°55'-6°1ʹN and longitudes 10°7'-10°14ʹE. The surrounding vegetation is a decreasing highland forest mixed with tea plantations, the soils are brown and dark volcanic, while swampy areas, volcanic lakes and hill shadows are other natural features impacting the environment. The second site, Foumban (Figure 1d), overhangs the Bamun Plateau in the West administrative region, between latitudes 5°42'-5°45ʹN and longitudes 10°51'-10°55ʹE, an average altitude of 1200 m. The dominant vegetation is the sparse bush-savannah, growing on red-brown to dark soils, the terrain is very rough in some areas, shadowing deep valleys where flow a couple of rivers and small streams.
The annual maximum precipitation, up to 3300 mm in Bamenda and 2400 mm in Foumban, combined with the topography, decreasing land cover and anthropogenesis exposing these sites to recurrent deadly rainfall-triggered landslides, emphasized by the unplanned urban sprawl  (Nyambod 2010;Ngandam Mfondoum et al. 2021b). Moreover, human settlements mix built-up (aluminium roofs, cement brick walls, pavement roads) and rural housing (terracotta brick walls, straw roofs). These conditions explain the mixing of spectral signals with LULC features, always challenging during image transformations and classification (Schueler 1994).

Data and working environment
The data used are those freely available online, specifically, Landsat 8 and Sentinel2A products, which have been proven efficient when combined (Xi et al., 2019). Images of Landsat mission 8 from Operational Land Imager (OLI) sensors have been selected to conduct the experimental study (Tables 1 and 2). The top-ofatmosphere Level 1 C products delivered as digital numbers (DN) were selected for the current year 2021 for formal analysis and 2016 for a focus on urban change. The dry season images of January and February were uploaded, because it is a very challenging phenological window, characterized by the very close reflectance of bare soils and built-up on the one hand, and swamps, proper surface water, hill shadows, as well as dense vegetation of valleys on the other (Tables 1 and 2). Further, Sentinel-2A multispectral imaging (MSI) satellite products were used as high-resolution images for test sampling to proceed with accuracy assessment (Tables 1  and 2). For the two images, the seven bands covering the visible, near-infrared (NIR) and shortwave infrared (SWIR) were selected. These images were displayed online in the cloud environment of the Google Earth Engine (GEE) platform, where the processing was implemented using a JavaScript open source simplified coding process. The outputs were only exported towards offline tools, i.e. desktop software, for final layout. ArcGIS Pro 2020 specifically was used for the purpose. Five major reflectance are dominating and consequently, five LULC classes were identified, such as, water, vegetation, soil, built-up and swamp/hill shadows ( Figure 2).

The new spectral vector-based method
The Landsat8-OLI visible to infrared dynamic range spectral vector is considered in several ways. This can be done by computing or transforming each band, among two or more consecutive others, as well as by extrapolating or modelling the potential reflectance of two or more non-consecutive bands throughout the covered wavelength. Several normalized and non-normalized indices have also been computed between two or more spectral bands, with or without a weighing factor (online: USGS Index Database-IDB).
In this study, two main approaches are developed.
• The first one considers a function g that defines the relationship between two bands b i and b j , such as:  The output was set at 10 m in the coding process to keep the highest resolution.
Thoroughly in this case, the absolute value of the difference between two selected bands, divided by each band separately, estimates the relative uncertainty of the reflectance so as to increase the accuracy of the information to be extracted. Therefore, the function g is a two-dimensional component function that belongs to a vector-valued function r, with the aim of creating a maximum derived p values for any p pixel, such as: The sum of different functions corresponds to stacking derivative bands, and gives an array named the 'Uncertainty reducing Spectral Vector' ̎ , (USV r ) and presented as follows: ð Þ 2 6 6 6 6 6 6 6 6 4 3 7 7 7 7 7 7 7 7 5 (5) • The second method establishes the relationship between three selected bands b i ; b j ; b k À � , by a function h, according to two alternatives expressed in Equations 6 and 7: In this case, the structure of ratios explores different perspectives of reflectance for consecutive bands in two ways. For any previous b i band to be related to the two following and consecutive b j and b k bands, function h 1 , establishes the continuous spectral relationship, while the function h 2 envisages a discontinuous spectral relationship, by skipping and reinserting b j . With this processing, the two functions, h 1 and h 2 , maximize the retrievability of reflectance information from three spectral bands at the same time for any p pixel, by integrating as much as possible their covered wavelengths lower limit (ρl i , ρl j , ρl k ), and upper limit (ρu i , ρu j , ρu k ) inside the electromagnetic spectrum. The vector-valued function, r, is based on a couple of h three-dimensional component functions that maximize the derived p values for any p pixel, such as: In the two cases, 15 spectral values are produced for each pixel. After stacking the derivative bands, the array created is named after the method, i.e. the ̎ Onward continuous Spectral Vector ̎ (OSV c ) for h 1 (Equation 9) and the ̎ Onward discontinuous Spectral Vector ̎ (OSV d ) for h 2 (Equation 10). They are presented as follow: ð Þ 2 6 6 6 6 6 6 6 6 6 4 3 7 7 7 7 7 7 7 7 7 5 (9) ð Þ 2 6 6 6 6 6 6 6 6 6 4 3 7 7 7 7 7 7 7 7 7 5 (10)

Selection of suitable derivative bands for land features detection
In each case and based on the multispectral (MS) image, five land cover features were sampled, such as water, vegetation, soil, built-up and hill shadow/ swamp, using the point selector instead of polygon for the pixels' purity purpose and observing two to four different colour tones per class. Their spectral vector values were exported as tables, and the C4.5 algorithm, one of the classifier trees J48 in the WEKA 3.8.5 data mining software was applied, to define the most suitable derivative bands for each land feature detection. This process generates trees as small as possible, on purpose to ease the understanding of the relationship between different levels of selection or hierarchy (Quinlan 1993). The algorithm makes use of ̎ Entropy ̎ , i.e. the measure of uncertainty associated with a random variable, for the construction of smaller trees. Given a set of examples D, the original entropy of the dataset is expressed such as: Where H D ½ � represents the original entropy, C indicates the set of desired class, and P assesses the purity of the said class. By increasing the pruning process, i.e. removing the branches that make use of features having low importance, the data become purer and purer, and the entropy value becomes smaller and smaller. For instance, if we make attribute A i , with v values, the root of the current tree, this will partition D into v subsets D 1 ,D 2 , . . .,D v . Therefore, the expected entropy if A i is used as the current root is expressed such as: Information gained by selecting attribute A i to branch or to partition the data is given by the difference of prior entropy and the entropy of selected branch as follows: It has been used in several studies of land cover classifications, but also for the best image selection (Lee, Acharya, and Lee 2018).

Machine learning classifiers used
Machine learning makes no assumptions about frequency distributions and is becoming increasingly popular for classifying remote sensing data, which rarely have normal distributions. Five different and/or improved ML algorithms were performed to train the spectral vectors.
• Classification and Regression Trees (CART) is based on the framework of hierarchical decision trees (details in Breiman, Friedman and Olshen 1984). The principle uses the Gini Impurity Index to decide the input features, which will provide the best split at each node. • Random Forest (RF) is an advanced computation of known DTs for refined results, by breaking down the classification process into a series of trees through a bagging approach (details in Breiman 2001;He et al. 2015). There are two main userdefined parameters, i.e. the number of decision trees and the number of random variables to be split at each node. We fixed the number of trees at 100 for this study. • Gradient Boosting Regression (GBR) combines a regression tree with a boosting algorithm, in such a way that it fits different types of data distributions (details in Friedman 2002). Boosting improves the performance of a simple base-learner by reweighting observations that were misclassified or had large residual errors in the previous iteration. A pseudocode was proposed to summarize the algorithm (details in Mountrakis, Im, and Ogole 2011). • The Support vector machine (SVM) performs classification tasks by constructing hyperplanes in a multidimensional space that separates linear and nonlinear samples of different class labels (details in Cortes and Vapnik 1995;Otukei and Blaschke 2002;Mountrakis, Im, and Ogole 2011). The principle is to obtain a separating hyperplane or decision boundary by maximizing the distance between these hyperplanes and the classes sample. In this study, 10-fold cross-validation was utilized which overcomes the overfitting problems. The optimized parameters cost, penalty and gamma (C, P, γ) were set with the best results in 100, 0, 0.25, respectively, for both sites. The gamma (γ) value was varied to 0.5, depending on the spectral vector. • The minimum distance (MD) function provides a relationship metric between each element in the dataset. For image classification purposes, the relationship can be based on the spectral distance or the spectral angle. The model with a distance measure that best fits the data with the smallest generalization error can be the appropriate proximity measure for the data (Kotu and Deshpande 2018). The cosine distance was used.

Accuracy assessment
For evaluation, the accuracy assessment measures were computed. The two methods used in this study to compare the performance of the algorithms in improving the land cover/used classification are the overall accuracy and the Kappa index. Several studies have made use of random sample pixels to conduct accuracy assessments (Hamedianfar et al. 2014;Li and Wan 2015;Akar and Güngör 2015). We approach this step based on the principle tested by Goodchild et al. (1994), who suggested using a minimum of 50 random points sample for each class to assess accuracy. Since the subsets extracted per studied area did not contain the same number of pixels for some classes (water and shadow specifically), we opted for the same number of training sample on Landsat 8 image and testing sample on Sentinel2-A image, i.e. a [1:1] ratio. This procedure was repeated for each classification to maintain the independence among the validation data sets (Sothe et al. 2017 Concerning the Kappa coefficient, its equation and computation derive from classification metrics that are, confusion and error matrices amongst classes. It is expressed such as: Where D ii is the number of observations in the row i and column i of the confusion matrix, n is the number of rows in the error matrix, N is the total number of counts in the confusion matrix, x iþ is the marginal total of a rowi, and x þi is the marginal total of the columni. The classification was done on five classes, with each of the four sub-classes according to their reflectance.

Most suitable derivative bands per spectral vector and feature class
Following the spectral-vector computations and stacking, two sets of arrays and derivative bands were obtained, such as 42 for USV r , and 15 for OSV c as well as OSV d (Table 4). This implicates as many new pixel' p values.
However, none of the spectral-vectors were found suitable for detecting all sampled land features. Therefore, at least four individual trees were produced for each tone of land cover feature, and the derivedbands with only information gain for learning were selected for the final stacking. The pruning of each tree resulted in a size of 31 for USV r with 16 leaves, and a size of 15 with 7 leaves for OSV c and OSV d . Depending on the sampled pixel's purity, the band selection results from the targeted land feature along the leaves or from the closest noisy land feature (Figures 3 and 4).   In general, the automatically performed accuracies of bands or instances were less than 50%. For each pruning process of a single sample in Bamenda, the overall accuracy, OA, for USV r is 38% corresponding to 16/42 bands with a kappa coefficient, KC, of 0.36, versus 40.5% OA, i.e. 17/42 bands, and 0.39 KC, in Foumban. For both sites, OSV c and OSV d gave the same OA, 47%, which equals to 7/15 bands, for 0.43 KC.
Further, our proper reading of the trees structure and outputs has helped for the final selection. The criteria were as follows: i) select all bands that are automatically listed at the end of tree (pure lineage); ii) eliminate every band that is at the end of the tree pruning process with a non-targeted land feature; iii) count the confusion band among at least 3 samples of the non-targeted land feature for acceptance in the final listing. Then, for the USV r , 11 and 9 derivative bands were rejected, versus 31 and 33 sustained, respectively in Bamenda and in Foumban (Table 5). While for OSV c and OSV d , 3 bands were rejected and 12 were sustained, in the same order of sites (Table  5). For the selected bands, a plot matrix ( Figure 5) and a histogram of the new pixels' value ( Figure 6) helped to assess their ability to detect and separate objects. As expected at the beginning of the process, the USV r predicts the ability detecting and discriminating the land features with positive values, contrarily to the OSV c and OSV d that oppose positive and negative values to contrast LULC classes.

Performance assessment of the classifiers 3.2.1. Validation and visual patterns
The overall accuracy, OA, and kappa coefficient, KC, vary substantially depending on the spectral vector. The agreement between selected patterns can be assessed using six levels of kappa values, from poor to almost perfect (details in Landis and Koch 1977). Here, the KC values start at 0.61 and more, i.e. at least a substantial agreement. • The performances show that SVM and RF are the best, followed by CART and GBR, whereas MD records the lowest performances. The highest and lowest values are commented on in this section, and the complete data are detailed in subsequent tables.
In Bamenda, SVM has recorded the highest performance and MD the lowest.  (Table 6a).
In Foumban, RF and SVM have recorded the highest performance and MD the lowest. For the USV r , the best classifier is RF with 87.5% OA and 0.81 KC, and the lower performer is MD with 73% OA and 0.63 KC. For the OSV c , SVM is the highest performer with 84% OA and 0.76 KC, whereas the MD records 70% OA and 0.61 KC. For the OSV d , RF best enhances the land Table 5. Rejected bands per spectral vector (see Table 4 for derivative bands' explanation).  features when applying RF, with 84% OA and 0.76 KC, while MD lowly performed with 72.4%OA and 0.64 KC (Table 6a).
• The quality of each SV to extract land features according to each classifier reveals that USV r is generally better than OSV c and OSV d .   (Table 6b).
Accordingly, the layout of the classification maps shows good results when visually compared to the features on the surface reflectance image (Figure 7a, b).

Performance on each land cover class
A thorough analysis of each classifier's performance on individual land feature class also gives variable results. The error matrix has been used to extract the user's and the producer's accuracies, UA/PA, per class. They are expressed as such: UA is the inverse of the error of commission, i.e. UA = 1 − commission error, and PA is the inverse of the error of omission, i.e. PA = 1 − omission error. The UA is more commented in this section, because it reflects the reliability of the classification, as the more relevant measure to the end-user utility in the field.
In Bamenda, globally, water and soil features are at the extremity, as, respectively, the highest and the lowest UA. The general information per SV is synthesized below and detailed in Table 7a: Colour key: Green = first; Cyan = second; Red = third. Color key for UA: Green = best performer; Cyan=second performer with at least 80% OA; Red = lowest performer.
• For the USV r , water is classified at 100% UA with all classifiers, and vegetation is also recorded 100% UA, but only with CART algorithm. On the contrary, the soil class always recorded the lowest values, between 46.4% UA and 68.5% UA, when performing the MD algorithm. The range of PA is between 53.3% for shadow class with MD and 100% for vegetation when applying RF and MD. • For the OSV c , water is also classified at 100% UA with all the algorithms. Whereas the soil class is lowly detected with only 49.4% UA when performing MD. The values of the PA vary between 50% for shadow class with MD algorithm and 100% for the vegetation class with RF classifier. • For the OSV d , water is still classified at 100% UA with all classifiers. The soil class, with 46.6% UA, is the lowest value when performing the MD algorithm. The PA values range between 40% for shadow class with MD and 100% for vegetation class with RF.
In Foumban, water, vegetation, built-up and shadow features switch to the first position with high UA percentages. Whereas the lowest UA is mostly from soil class. Details per SV are presented in Table 7b.
• For the USV r , water is classified at 100% UA with four classifiers, except CART, Shadow recorded 100% UA with RF and SVM algorithms, and the highest value when applying CART are obtained for vegetation with 92.4% UA. At the opposite, the lowest value when using CART is for shadow class, with 45.8% UA while for the four remaining classifiers, it is soil class with values between 26.9% UA when using MD and 63.15% UA when using RF. The range of PA is between 48% for soil class with RF and 97.4% for vegetation when applying CART. • For the OSV c , water is classified at 100% UA when applying GBR, SVM and MD, Shadow recorded 100% UA with SVM, whereas built-up is the best highlighted feature by using CART and RF with, respectively, 94.05% UA and 94.5% UA. The lowest values for the five classifiers are recorded for the soil class and vary between 28.6% UA for MD and 51.6% UA for GBR. The values of the PA vary between 48.8% for built-up class with MD algorithm and 97.3% for the vegetation class with RF classifier. • For the OSV d , shadow is still classified at 100% UA with CART, RF, SVM and MD, water at also 100% with SVM, while built-up is the best highlighted land cover feature by using GBR, with 95.1%UA.

Built-up footprint extraction and urban change assessment
Tables 7a and 7b above showed the same trends between the two study sites. Built-up class occupy the first, second, third or fourth position, but are always better than soil class. The important gap between their UA percentages is an indicator of the SV efficiency in discriminating built-up from other land features. The visual appraisal is expressive in terms of the footprint of the built-up feature, for each SV (Figure 8). Moreover, the proposed method may also enable LULC change studies. Using the built-up as a pilot LULC class, Landsat8 images of January 2016, were submitted to the same process, i.e. SVs analysis and a supervised binarized learning, with built-up and non-built-up classes. The classifiers performed up to 96.9% OA and 0.93 KC in Bamenda, versus 86% OA and 0.72 KC as shown in Table 8. After performing the Canny edge detector on each classified image, changes were visually and statistically found to increase for the two years 2016 and 2021 for all SVs and four classifiers in the two sites, except for MD that is inverted. For the successful ones, Urbanized areas of the two cities are estimated to be 565.5 km 2 (Bamenda) and 145 km 2 (Foumban). Based on the areas covered by the studied subsets   and the built-up areas extracted, the rate of urbanization was computed between the two dates. The formula proposed for the annual urban rate growth (AGR) was used, according to a previous adaptation (Boori et al. 2015), and originally formulated in Equation 16 as follows (Xiao et al. 2006): where nTA nþi is the total land area of the target unit to be calculated at the time point of i þ n; UA nþi and UA i the urban area or built-up area in the target unit at times i þ n and i, respectively; and n is the interval of the calculating period, expressed in years. Satisfying statistics were withdrawn to assess the efficiency of the proposed methodology (Figure 10a).

Discussions: comparative analysis with existing methods
The proposed spectral vectors, USV r , OSV c and OSV d , have been successful in detecting land features with high accuracies. These methods provide a wide range of bands for stacking and as entries for classification, meaning more alternatives for better discrimination of land features. From band selection to the testing of several machine learning  classifiers, this might be robust, for some further studies. The quality-test was conducted for comparison with existing methods followed in three steps. On the one hand, the classifiers were applied to the original MS image of each site. In Bamenda, the RF algorithm is the only classifier that gives an accuracy higher than those of SVs. The accuracies of the other classifiers are less or equal to those of SVs (Table 9). In Foumban, the RF algorithm also recorded the highest statistics for the MS image. Whereas the four other classifiers, performed with the best values on the SVs (Table 9).
Further, the same process was run with the NDSV. In Bamenda, GBR and SVM were higher than the SVs, whereas CART, RF and MD performed better for the proposed SVs (Table 9). In Foumban, the GBR classifier performed with the highest value, but recorded lower values than the other classifiers on the proposed SVs (Table 9).
Furthermore, an image stacking of some popular spectral indices was submitted to the process. Those selected were the Normalized Water Index, NWI (Ding 2009), the Soil Adjusted Vegetation Index, SAVI (Huete 1988), the Normalized Difference Soil Index, NDSI (Deng et al. 2015), the Modified New Built-up Index (Ngandam Mfondoum et al. 2019) and the Shadow Index, ShDI (Faridatul and Wu 2018). After computing and stacking, they were classified for each site using the same learning algorithms. In Bamenda, the best ranking of stacked spectral indices (SSIs) images showed the third performance with GBR and MD. Whereas in Foumban, CART recorded the highest value (Table 9).
Later, we also envisaged deepening the experiment by stacking each proposed SV with MS images for classification. In Bamenda, improvements were noticed on the three SVs for RF, GBR and SVM algorithms, up to +2% OA ( Figure 11). In Foumban, the results of all classifiers were generally poor on USV r and mixed on OSV c ; whereas there were some improvements up to +3.9% on OSV d for GBR and SVM ( Figure 11).
However, the change cover analysis performed between 2016 and 2021 has raised another aspect of the method that needs to be deepened, that is the band selection per date. In this case, the same ones were used for the two dates, although the image sampling was different. A thorough analysis of the band's suitability for each period might be interesting in a whole study dedicated to LULC changes based on SVs.

Conclusions
Satellite images classification has been improved over the years, and image transformations have been widely used to support the weaknesses and shortcomings of the original spectral bands. Spectral indices have specifically been used as independent alternatives or complementary assets to fulfill that goal. However, the normalized difference spectral vector (NDSV) analysis, inspired by Angiuli and Trianni (2014) for the computation of full potential indices in Landsat images, can be completed by considering other options, including ratio centred on one band or spectral bands overlap and interval.
The whole process presented in this paper explores some new spectral vector alternatives. Using relationships based on ratios between the two and three spectral bands of Landsat 8 images, the USV r , OSV c and OSV d have been proposed to increase the chances of computing a larger variety of spectral indices, to ease LULC classification. The application of five machine learning algorithms, i.e. CART, RF, GBR, SVM and MD to assess the quality of the spectral vector gave satisfactory results, with higher accuracies reaching 88.3% OA, for the pair SVM/USV r in Bamenda, and 87.5%OA, for the pair RF/USV r in Foumban. The lowest performer is MD for all the SVs, as its values are all below 80%.
Despite these very good performances, more research need to be done concerning the combination of each SV with the MS image, mixing decreases and increases by up to 3.9% OA, depending on the site. Moreover, as the SV band suitability selection gives different entries for the two sites, it will be interesting, for a LULC change study of the same site, to explore different band selections for each concerned date. These are the main outlooks and research directions that the authors are focusing on for the best land feature discrimination in complex urban areas.