Mapping land use from UAV images using the Rotation Forest algorithm

ABSTRACT The aim of this study is to test the performance of the Rotation Forest (RTF) algorithm in producing up-to-date and accurate land use maps from Unmanned Aerial Vehicle (UAV) images of areas with similar characteristics. The performance of RTF was compared with that of other ensemble methods, namely Random Forest (RF) and Gentle AdaBoost (GAB). The accuracy assessments showed that RTF, with overall accuracies of 84.90% in the urban area and 93.33% in the rural area, outperformed RF by approximately 7% and 4% and GAB by approximately 15% and 11%, respectively. Subsequently, to increase classification accuracy, a majority filter was applied to the classified images, which increased the overall classification accuracy of RTF by up to approximately 2.3% (from 84.90% to 87.21% in the urban area). The classification results were also analysed using the McNemar test. This study demonstrates the success of the RTF algorithm in classifying UAV images for land use mapping.


Introduction
Planners, scientists, resource managers and decision makers commonly use up-to-date land use data in analyses aimed at improving environmental and living conditions (Anderson, Hardy, Roach, & Witmer, 1976; Rozenstein & Karnieli, 2011). Remote-sensing technology is considered a very useful tool for obtaining land use/cover data, which represent the Earth's surface as a map (Foody, 2002; Rozenstein & Karnieli, 2011; Solaimani, Arekhi, Tamartash, & Miryaghobzadeh, 2010; Sonobe, Tani, Wang, Kobayashi, & Shimamura, 2014; Zhang & Zhu, 2011). As the spatial and spectral resolutions of remote-sensing data have increased with advancing technology, these data have also come to be used in other applications, such as urban and environmental applications. High spatial and spectral resolution images are particularly preferred for accurately handling the heterogeneity of urban land cover (Qian, Zhou, Yan, Li, & Han, 2015). Laliberte, Herrick, Rango and Winters (2010) indicated that low-cost Unmanned Aerial Vehicles (UAVs), which acquire high-spatial-resolution images without an onboard pilot, may be an alternative for rapidly generating land use maps of such areas, because they can obtain the most up-to-date land use data for any desired area. Image classification is the most commonly used remote-sensing technique for generating land use maps from high-resolution remote-sensing data (Agarwal, Vailshery, Jaganmohan, & Nagendra, 2013; Aguilar, Saldaña, & Aguilar, 2013; Duro, Franklin, & Dubé, 2012; Foody, 2002; Jebur, Mohd Shafri, Pradhan, & Tehrany, 2014; Laliberte, Browning, & Rango, 2012; Liu & Yang, 2015; Schneider, 2012). Richards and Jia (2006) defined pixel-based image classification as a labelling process that assigns each pixel to a given spectral class using the available spectral data.
A "salt and pepper" effect is generally observed in maps generated with this approach (Sun, Heidt, Gong, & Xu, 2003), and it reduces the accuracy of the thematic map. This effect can be minimized by applying different filters (spatial, temporal, logistic, etc.) to the thematic images obtained from classification, thereby improving the accuracy of the thematic maps (Khatami, Mountrakis, & Stehman, 2016; Lu, Huang, Liu, & Zhang, 2016; Nex et al., 2015; Stow, Shih, & Culter, 2015; Wang et al., 2015; Zhang, Li, & Zhang, 2016). In recent years, ensemble approaches such as Random Forest (RF), Rotation Forest (RTF), Boosting and Bagging have been commonly used to improve on the overall classification accuracy of individual classifiers (Breiman, 2001; Colkesen & Kavzoglu, 2016; Gislason, Benediktsson, & Sveinsson, 2006; Halmy & Gessler, 2015; Miao, Heaton, Zheng, Charlet, & Liu, 2012; Opitz & Maclin, 1999; Wang, 2006). These ensemble approaches are typically built from decision trees and use multiple classifiers rather than a single classifier, so they yield more than one classification result. A pixel is then labelled with the class to which it was most often assigned among these results.
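The plurality vote described above can be sketched in a few lines of Python. This is an illustration only, not the MATLAB code used in the study. The function name `majority_vote` and the example labels are hypothetical. Boosting methods such as AdaBoost weight the members' votes instead of counting them equally.

```python
from collections import Counter


def majority_vote(member_labels):
    """Return the class label predicted most often by the ensemble members.

    member_labels holds one predicted label per member classifier (for
    example, per decision tree) for a single pixel. Counter.most_common
    orders equal counts by first occurrence, so ties go to the label that
    appears first in the list.
    """
    return Counter(member_labels).most_common(1)[0][0]


# Hypothetical predictions of five trees for one pixel.
votes = ["grass", "forest", "grass", "soil", "grass"]
print(majority_vote(votes))
```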
RTF is a relatively recent ensemble method. Rodriguez, Kuncheva and Alonso (2006) described RTF as an algorithm that generates classifiers yielding more accurate results than other learning methods such as RF, AdaBoost and Bagging. The RTF approach encourages diversity by applying a feature-extraction transformation separately for each classifier (Kotsiantis, 2011). The algorithm transforms the data set into a different feature space (e.g. principal component analysis [PCA] or independent component analysis) and grows multiple classification trees on the data set transformed into this space (Liu & Huang, 2008). Each tree is trained on the data set in the transformed feature space (Xia, Du, He, & Chanussot, 2014). Other image classification studies support the claim that the classification logic and application of the RTF approach provide more accurate results than other ensemble methods (Du, Samat, Waske, Liu, & Li, 2015; Gaikwad & Pise, 2014; Kuncheva & Rodriguez, 2007; Lasota, Łuczak, & Trawiński, 2012; Liu & Huang, 2008; Xia, 2016; Xia et al., 2014).
In this study, orthophoto images obtained with high-spatial-resolution UAV imaging were used to produce up-to-date and accurate land use maps. The success of RTF was tested, and its performance in pixel-based classification was compared with that of other commonly used ensemble methods, such as RF and Gentle AdaBoost (GAB). Finally, the statistical significance of the results was examined using the McNemar test.
High-resolution images were collected using a Gatewing X-100 UAV. For the urban study area, 168 images were taken over a 150-ha area on the Karadeniz Technical University campus and processed for geo-referencing in Agisoft Photoscan software. The coordinates of the Ground Control Points (GCPs) used in processing were measured with RTK GPS. The urban UAV images were processed with errors of 0.03 m in the x direction and 0.05 m in the y direction. The resulting urban image has three bands (red, green, blue) and a spatial resolution of 0.05 m. For the rural study area, about 661 ha on the Hıdırnebi Plateau in the Akçaabat district of Trabzon, 741 photos were processed with errors of 0.03 m in both the x and y directions. GCPs measured with the RTK GPS technique were again used for geo-referencing. The rural image also has three bands and a spatial resolution of 0.16 m. Orthophoto images were obtained at the end of this processing.
The principle of RF is to split each node using the best split, according to the GINI index, among a set of variables randomly selected at that node rather than among all variables. These splits are then used to grow the tree (Akar & Güngör, 2015). In RF, Decision Trees (DTs) are trained on bootstrap samples (drawn randomly, with replacement) from the original training data (Breiman, 2001). Two user-defined parameters are used at this stage: the number of variables used at each node to determine the best split (m) and the number of DTs to be generated (N). First, bootstrap samples are formed that contain about two-thirds of the training data set. The remaining one-third, also called out-of-bag data, is used to estimate the error. The trees are then grown from these bootstrap samples without pruning. To grow each tree, the best split at each node is determined from GINI index measurements over m randomly selected variables, and the node is split accordingly (Kim, Im, Ha, Choi, & Ha, 2014). The GINI index measures class homogeneity and can be expressed as follows:

$$\mathrm{Gini}(T) = \sum_{i}\sum_{j \neq i} \left(\frac{f(C_i, T)}{|T|}\right)\left(\frac{f(C_j, T)}{|T|}\right) \qquad (1)$$

where, for a given training data set T, $C_i$ is the class to which a randomly selected pixel belongs and $f(C_i, T)/|T|$ is the probability that a selected sample belongs to class $C_i$ (Pal, 2005). When the GINI index reaches zero, in other words when a single class remains in each leaf node, the splitting process ends (Watts, Powell, Lawrence, & Hilker, 2011). This best-split procedure is repeated at each node for every one of the N trees grown (Liaw & Wiener, 2002).
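The node-splitting and bootstrap steps described above can be illustrated with a short Python sketch. It is not the MATLAB implementation used in the study. The function names are hypothetical, and the Gini index is computed in the equivalent form 1 - sum of p_i squared. Sampling N items with replacement leaves on average about 37% of the samples out-of-bag, which is the origin of the "two-thirds / one-third" description.

```python
import random
from collections import Counter


def gini(labels):
    """Gini index of a node: sum over i != j of p_i * p_j = 1 - sum_i p_i**2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, m, rng):
    """Best (weighted Gini, variable, threshold) among m randomly chosen variables.

    Every observed value of a candidate variable is tried as a threshold
    (an exhaustive search chosen for clarity, not speed).
    """
    best = (float("inf"), None, None)
    for var in rng.sample(range(len(rows[0])), m):
        for t in sorted({r[var] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[var] <= t]
            right = [y for r, y in zip(rows, labels) if r[var] > t]
            if not left or not right:
                continue  # a split must produce two non-empty child nodes
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[0]:
                best = (score, var, t)
    return best


def bootstrap(n, rng):
    """Draw n sample indices with replacement; indices never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)
rows = [(0.1, 0.9, 0.5), (0.2, 0.8, 0.4), (0.8, 0.2, 0.5), (0.9, 0.1, 0.6)]
labels = ["grass", "grass", "soil", "soil"]
print(gini(labels))                            # 0.5 for two equally frequent classes
print(best_split(rows, labels, m=3, rng=rng))  # a split with weighted Gini 0.0
in_bag, oob = bootstrap(1000, rng)
print(len(oob) / 1000)                         # roughly one-third (about 0.37)
```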

Gentle AdaBoost
The AdaBoost algorithm is an ensemble method that performs classification iteratively, assigning higher weights to samples that are difficult to classify (Freund & Schapire, 1996). Like RF, AdaBoost grows many classifiers and combines them by voting. The two methods differ in that AdaBoost grows its classifiers sequentially, each depending on the previous ones, whereas RF grows many independent classifiers in parallel. Over T trials, a series of weighted training sets $S_1, S_2, \ldots, S_T$ is generated and classifiers $C_1, C_2, \ldots, C_T$ are grown. The final classifier $C^*$ is formed by weighted voting (Bauer & Kohavi, 1999). Initially, all samples are given equal weight. In each iteration, the weights of misclassified samples are increased and the weights of correctly classified samples are decreased. Each individual classifier is also assigned a weight that measures its overall accuracy and is a function of the total weight of the correctly classified samples, so more accurate classifiers receive higher weights. These classifier weights are used when classifying new samples. Because the AdaBoost algorithm is sensitive to noise in the training data, Friedman, Hastie and Tibshirani (2000) proposed new boosting algorithms, including GAB, which has been shown to alleviate this problem (Hamdi, Auhmani, Hassani, & Elkharki, 2015). GAB uses least squares regression to determine the weights of the AdaBoost algorithm. At each iteration it fits the weak classifier by weighted least squares regression so as to minimize the exponential loss function of AdaBoost, rather than fitting the data to an estimate of the class probability (Ho, Lim, Tay, & Binh, 2009).
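The GAB procedure described above can be sketched for the two-class case. This is a minimal illustration under stated assumptions, not the MATLAB code used in the study. The weak learners are regression stumps (an assumption), labels are coded -1 and +1, and the function names are hypothetical. The seven-class problem in the study would need a multi-class extension that is not shown here.

```python
import math


def fit_regression_stump(X, y, w):
    """Weighted least-squares regression stump.

    For each variable and threshold, each side predicts the weighted mean of
    y (labels in {-1, +1}); the split with the smallest weighted squared
    error is kept.
    """
    best = None
    for v in range(len(X[0])):
        for t in sorted({x[v] for x in X}):
            sums = {True: [0.0, 0.0], False: [0.0, 0.0]}  # side -> [sum w*y, sum w]
            for xi, yi, wi in zip(X, y, w):
                s = sums[xi[v] <= t]
                s[0] += wi * yi
                s[1] += wi
            pred = {side: (s[0] / s[1] if s[1] > 0 else 0.0) for side, s in sums.items()}
            err = sum(wi * (yi - pred[xi[v] <= t]) ** 2 for xi, yi, wi in zip(X, y, w))
            if best is None or err < best[0]:
                best = (err, v, t, pred[True], pred[False])
    _, v, t, left, right = best
    return lambda x: left if x[v] <= t else right


def gentle_adaboost(X, y, n_rounds):
    """Two-class Gentle AdaBoost with regression stumps; returns a classifier."""
    n = len(X)
    w = [1.0 / n] * n  # all samples start with equal weight
    stumps = []
    for _ in range(n_rounds):
        f = fit_regression_stump(X, y, w)
        stumps.append(f)
        # Multiply each weight by exp(-y * f(x)): misclassified samples
        # (y * f(x) < 0) are up-weighted relative to correctly classified ones.
        w = [wi * math.exp(-yi * f(xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    # Final decision: sign of the additive model F(x) = sum_m f_m(x).
    return lambda x: 1 if sum(f(x) for f in stumps) >= 0 else -1


# Hypothetical one-band example with two classes coded -1 and +1.
X = [(0.1,), (0.2,), (0.3,), (0.7,), (0.8,), (0.9,)]
y = [-1, -1, -1, 1, 1, 1]
clf = gentle_adaboost(X, y, n_rounds=5)
print([clf(x) for x in X])
```

Using least-squares stumps keeps each update bounded (each stump outputs a weighted mean of labels in [-1, 1]). This bounded update is the property usually credited for GAB's robustness to noisy samples.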

Rotation forest (RTF)
The RTF algorithm is an ensemble method proposed by Rodriguez et al. (2006) to encourage both individual accuracy and member diversity within a classifier ensemble (Xia et al., 2014). RTF relies on a linear transformation that projects the data into a new feature space (Liu & Huang, 2008). Like RF, it grows multiple trees for classification. Unlike RF, however, it generates the data set for each tree in a transformed feature space, such as one obtained with PCA, and grows many DTs on the training data expressed in this space. During training, the feature set of the training data is divided into subsets, and feature extraction is performed separately on each subset. Rodriguez et al. (2006) stated that, because of this property, RTF yields better classification accuracy than RF.
Let X denote the training data set, Y the class labels corresponding to this data set and F the feature set. N is the number of samples in the training data set and n is the number of features. The ensemble consists of L decision trees, $D_1, \ldots, D_L$. In the RTF algorithm, K (the number of feature subsets) and L (the number of trees) are the two user-defined parameters. The data set used to grow each tree $D_i$ is determined as follows. First, the feature set F is randomly divided into K subsets of approximately equal size, each containing $M = n/K$ features. Let $F_{ij}$ be the jth feature subset used to train classifier $D_i$, and let $X_{ij}$ be the part of X that contains the features in $F_{ij}$. A new data set is generated from $X_{ij}$ using the bootstrap method, and two-thirds of it is used as training data and one-third as testing data. PCA is applied to this data set: the covariance matrix is computed and the coefficients of its principal components, $a_{ij}^{(1)}, \ldots, a_{ij}^{(M)}$, are obtained. The transformation matrix $R_i$ is assembled from these coefficients as a block-diagonal matrix (Formula 2) (Rodriguez et al., 2006):

$$R_i = \begin{bmatrix}
a_{i1}^{(1)}, \ldots, a_{i1}^{(M)} & [0] & \cdots & [0] \\
[0] & a_{i2}^{(1)}, \ldots, a_{i2}^{(M)} & \cdots & [0] \\
\vdots & \vdots & \ddots & \vdots \\
[0] & [0] & \cdots & a_{iK}^{(1)}, \ldots, a_{iK}^{(M)}
\end{bmatrix} \qquad (2)$$

where [0] denotes a block of zeros.
The columns of $R_i$ are rearranged according to the original order of the features. The resulting transformation matrix is denoted $R_i^a$ and is used to train classifier $D_i$.
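The construction of $R_i$ and $R_i^a$ for one tree can be sketched in Python under a simplifying assumption. Here the features are split into K = n/2 subsets of M = 2 features each, so the PCA of each subset has a closed form and no linear algebra library is needed. This is a hypothetical sketch rather than the study's MATLAB code. The function names are illustrative, and the PCA is fitted on a plain bootstrap sample of the rows, following the description above. Writing each subset's coefficients directly at the original feature positions yields the rearranged matrix $R_i^a$.

```python
import math
import random


def cov2(rows, a, b):
    """Sample variances and covariance of feature columns a and b."""
    n = len(rows)
    ma = sum(r[a] for r in rows) / n
    mb = sum(r[b] for r in rows) / n
    vaa = sum((r[a] - ma) ** 2 for r in rows) / (n - 1)
    vbb = sum((r[b] - mb) ** 2 for r in rows) / (n - 1)
    vab = sum((r[a] - ma) * (r[b] - mb) for r in rows) / (n - 1)
    return vaa, vab, vbb


def pca2(vaa, vab, vbb):
    """Eigenvectors (as columns) of a 2 x 2 covariance matrix.

    The principal axes are a rotation by theta with
    tan(2 * theta) = 2 * vab / (vaa - vbb); the first column is the
    direction of largest variance.
    """
    theta = 0.5 * math.atan2(2 * vab, vaa - vbb)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def rotation_matrix(sample, n_features, rng):
    """Rearranged rotation matrix R_i^a for one tree D_i.

    Assumption for this sketch: n_features is even and the features are
    split at random into K = n_features / 2 subsets of M = 2 features.
    Each subset's PCA coefficients are written at the original feature
    positions, which gives the rearranged matrix directly.
    """
    order = list(range(n_features))
    rng.shuffle(order)
    R = [[0.0] * n_features for _ in range(n_features)]
    for k in range(0, n_features, 2):
        idx = (order[k], order[k + 1])
        V = pca2(*cov2(sample, *idx))
        for r in range(2):
            for c in range(2):
                R[idx[r]][idx[c]] = V[r][c]
    return R


def transform(rows, R):
    """Compute X R_i^a, the data on which tree D_i is trained."""
    n = len(R)
    return [[sum(x[j] * R[j][c] for j in range(n)) for c in range(n)] for x in rows]


rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
for x in X:
    x[1] += 0.8 * x[0]                 # make features 0 and 1 correlated
boot = [rng.choice(X) for _ in X]      # bootstrap sample used to fit the PCA
R = rotation_matrix(boot, 4, rng)
XR = transform(X, R)                   # training data for this tree
```

Because every block is an orthonormal rotation, $R_i^a$ is orthogonal. Repeating the construction with a different random split and bootstrap sample for each tree is what gives the ensemble its diversity.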
The data set transformed for classifier $D_i$ is expressed as $XR_i^a$. In this approach, all classifiers are trained in parallel (Liu & Huang, 2008) and the classification is then carried out. In this step, a trained tree is obtained for each transformed data set, and each tree yields a classification result. A pixel is labelled with the class to which it was most often assigned among these results. More precisely, for a test sample $x$, let $d_{ij}(xR_i^a)$ be the probability, produced by classifier $D_i$, that $x$ belongs to class $w_j$. The confidence for each class $w_j$ is calculated with the average combination method (Formula 3):

$$\mu_j(x) = \frac{1}{L}\sum_{i=1}^{L} d_{ij}\left(xR_i^a\right) \qquad (3)$$

and $x$ is assigned to the class with the largest confidence (Rodriguez et al., 2006).
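The average combination rule in Formula (3) amounts to averaging the class probabilities of the L trees and taking the largest. The Python sketch below assumes each tree's output for one test sample is already available as a list of class probabilities $d_{ij}(xR_i^a)$. The function name and example values are hypothetical.

```python
def average_combination(tree_probs):
    """Average combination rule for one test sample x.

    tree_probs[i][j] is d_ij(x R_i^a), the probability that tree D_i assigns
    to class w_j. Returns the index of the class with the largest average
    confidence mu_j(x) and the list of confidences.
    """
    L = len(tree_probs)
    n_classes = len(tree_probs[0])
    mu = [sum(p[j] for p in tree_probs) / L for j in range(n_classes)]
    best = max(range(n_classes), key=lambda j: mu[j])
    return best, mu


# Hypothetical class probabilities from L = 3 trees over three classes.
probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.4, 0.1]]
label, confidences = average_combination(probs)
print(label, [round(m, 3) for m in confidences])
```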

Classification process
In the study, seven classes were identified for the classification process: forest, grass, road, soil, shadow, building1 and building2. White buildings with roofs made of materials such as steel and concrete were assigned to building1, and red buildings with roofs such as tiles were assigned to building2. Sample areas for each class were then collected from the orthophoto images of the study areas using the Environment for Visualizing Images (ENVI) software. Care was taken to select approximately equal numbers of pixels for each class. For training samples, Mather (2004) defined the minimum number of pixels to be collected as (number of bands) × (number of samples, here 30) × (number of classes). Accordingly, the minimum number of samples for these study areas was determined as 3 × 30 × 7 = 630. Taking this minimum into account, training pixels were selected in ENVI for each class from each image. Approximately 2700 pixels were collected for the urban area and 2500 pixels for the rural area.
Additionally, the Separability Index (SI) of the regions of interest (ROIs) of land classes such as forest, grass, road, soil, building1 and building2 was determined. For this purpose, the Jeffries-Matusita and Transformed Divergence measures were computed in the ENVI 4.7 (ENVI, 2009) software. SI was calculated for all classes, and ROIs with a high SI were used in the classification. Class pairs with values greater than 1.9 were considered to have good separability (Elsharkawy, Elhabiby, & El-Sheimy, 2012). Pairs with values lower than 1.0 were considered very poorly separated (Mei, Manzo, Bassani, Salvatori, & Allegrini, 2014). The Jeffries-Matusita distance ($JM_{ab}$) is computed with Formulas (4) and (5):

$$JM_{ab} = 2\left(1 - e^{-B}\right) \qquad (4)$$

$$B = \frac{1}{8}(\mu_a - \mu_b)^T \left(\frac{C_a + C_b}{2}\right)^{-1} (\mu_a - \mu_b) + \frac{1}{2}\ln\left(\frac{\left|\frac{C_a + C_b}{2}\right|}{\sqrt{|C_a|\,|C_b|}}\right) \qquad (5)$$

where $\mu_a$ and $\mu_b$ are the mean vectors of classes a and b, $C_a$ and $C_b$ are the covariance matrices of classes a and b, and T denotes the transpose (Kumar et al., 2016). Training and test data were then created in the MATLAB software from the sample areas chosen on each image by applying the Random Feature Selection Method. Two-thirds of the samples were used for training and one-third for testing. As a result, 12,980 pixels were selected as training data and 6,490 pixels as testing data for the rural area. For the urban area, about 10,199 pixels were selected as training data and 5,099 pixels as testing data. The user-defined parameters (m and N for RF, and K and L for RTF) were determined by trial and error. These tuned parameters were used to determine the best-performing classification models, and the images were classified with the RTF, RF and GAB methods (Figures 2 and 3). MATLAB code was used in the classification process, and the same training data were used for the RTF, RF and GAB classification approaches.
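Formulas (4) and (5) can be evaluated directly from class means and covariance matrices. The pure-Python sketch below is not ENVI's implementation. The helper `det_and_solve` is a hypothetical Gaussian-elimination routine written only to keep the example self-contained; in practice a linear algebra library would normally be used.

```python
import math


def det_and_solve(A, b):
    """Gaussian elimination with partial pivoting: return (det(A), A^-1 b)."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    det = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return det, x


def jeffries_matusita(mu_a, mu_b, C_a, C_b):
    """JM distance (0 to 2) between two classes, following Formulas (4) and (5)."""
    n = len(mu_a)
    C = [[(C_a[i][j] + C_b[i][j]) / 2 for j in range(n)] for i in range(n)]
    d = [mu_a[i] - mu_b[i] for i in range(n)]
    det_c, sol = det_and_solve(C, d)  # sol = ((C_a + C_b) / 2)^-1 (mu_a - mu_b)
    det_a, _ = det_and_solve(C_a, d)
    det_b, _ = det_and_solve(C_b, d)
    mahalanobis_term = sum(di * si for di, si in zip(d, sol)) / 8
    B = mahalanobis_term + 0.5 * math.log(det_c / math.sqrt(det_a * det_b))
    return 2 * (1 - math.exp(-B))


# Hypothetical three-band classes with identity covariance matrices.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(jeffries_matusita([0, 0, 0], [0, 0, 0], I3, I3))  # identical classes: 0.0
print(jeffries_matusita([0, 0, 0], [5, 5, 5], I3, I3))  # well separated: close to 2
```

Under the thresholds cited above, the second pair (JM above 1.9) would count as well separated.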
After classification, misclassified pixels appeared as noise in the thematic images, which reduced the quality and accuracy of the land use maps to be generated. Filtering methods can be used to minimize these effects and increase accuracy. In this study, a majority filter, which is a spatial filter, was applied to reduce the noise. A majority filter applies the majority rule inside a moving window whose size is defined by the user (Nex et al., 2015). A 3 × 3 majority filter reduced the noise in the classified images (Figures 2 and 3). Finally, land use maps were generated from the thematic images with the highest classification accuracy for each study area (Figure 4).
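A 3 × 3 majority filter can be sketched as follows. This Python version is illustrative and is not the filter implementation used in the study. Two behaviours are assumptions that tools handle differently: border windows are truncated to the pixels inside the image, and ties involving the centre label keep the centre label.

```python
from collections import Counter


def majority_filter(image, size=3):
    """Apply a size x size majority (mode) filter to a 2-D grid of class labels.

    Windows at the image border are truncated to the pixels inside the
    image. If the centre pixel's label ties with the most frequent label,
    the centre label is kept (an assumption; tools differ in tie rules).
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [row[:] for row in image]  # read from image, write to out
    for i in range(rows):
        for j in range(cols):
            window = Counter(
                image[y][x]
                for y in range(max(0, i - half), min(rows, i + half + 1))
                for x in range(max(0, j - half), min(cols, j + half + 1))
            )
            if window[image[i][j]] < max(window.values()):
                out[i][j] = window.most_common(1)[0][0]
    return out


# Hypothetical 4 x 4 classified patch with one isolated "soil" pixel.
patch = [["grass"] * 4 for _ in range(4)]
patch[1][2] = "soil"
filtered = majority_filter(patch)
print(filtered[1][2])
```

Reading from the original grid while writing to a copy keeps each output pixel independent of the order in which pixels are visited.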

Accuracy assessment
Many studies in the literature examine the number of reference pixels required for accuracy analysis. Most investigators have used approaches based on the binomial distribution to calculate the number of required reference pixels. The equations based on this distribution use only the proportion of correctly classified samples. This technique is therefore not sufficient to determine the number of reference pixels required for error matrices, because an error matrix contains both correctly classified and misclassified samples (Congalton & Green, 1999). Congalton and Green (1999) therefore proposed an approach based on the multinomial distribution to calculate the number of reference pixels sufficient to generate an error matrix for accuracy analysis. They gave the following equations for the number of reference pixels required (Equations (6) and (7)):

$$n = \max_i \frac{B\,\Pi_i\,(1 - \Pi_i)}{b_i^2} \qquad (6)$$

$$n = \frac{B}{4b^2} \qquad (7)$$

In Equations (6) and (7), the symbols are defined as follows:
- n is the number of reference pixels.
- B is the upper (α/k) × 100th percentile of the χ² distribution with one degree of freedom.
- k is the number of classes.
- $\Pi_i$ is the proportion of the map area covered by the ith class.
- α is the significance level, so 1 − α is the confidence level.
- $b_i$ is the desired precision for class i.

If the proportions $\Pi_i$ are known, Equation (6) is used. Otherwise, Equation (7) gives the worst case, $\Pi_i = 0.5$. In the study, a total of 735 points were generated on an image for the 7 classes using stratified random sampling based on the area covered by each class. These points were used to assess the accuracy of the classified thematic images. The accuracy of each classification result was assessed using an error matrix, and accuracy percentages were calculated from the error matrix. According to Pontius and Millones (2011), Kappa attempts to compare accuracy to a baseline of randomness.
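Equations (6) and (7) can be evaluated with the Python standard library. A χ² variable with one degree of freedom is the square of a standard normal variable, so B can be obtained from a normal quantile. The settings in the example (α = 0.05, b = 0.05) are hypothetical. This section does not state the values used in the study, so the result should not be read as a reconstruction of the 735 points.

```python
import math
from statistics import NormalDist


def chi2_1df_upper(p):
    """Upper p-quantile of the chi-square distribution with 1 degree of freedom.

    A chi-square(1) variable is Z**2 with Z standard normal, so the
    quantile is z**2 where z is the (1 - p/2) normal quantile.
    """
    return NormalDist().inv_cdf(1 - p / 2) ** 2


def reference_sample_size(k, alpha, b, proportions=None):
    """Reference pixels needed for an error matrix (multinomial approach).

    k: number of classes; alpha: significance level; b: desired precision.
    With known class proportions Pi_i, Equation (6) is evaluated for every
    class and the largest n is kept; otherwise the worst case Pi_i = 0.5
    gives Equation (7).
    """
    B = chi2_1df_upper(alpha / k)
    if proportions is None:
        return math.ceil(B / (4 * b ** 2))
    return math.ceil(max(B * p * (1 - p) / b ** 2 for p in proportions))


# Hypothetical settings: 7 classes, 95% confidence, +/-5% precision.
print(reference_sample_size(k=7, alpha=0.05, b=0.05))
```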
However, randomness is not a reasonable alternative for map construction, so they recommend two other measures: quantity disagreement and allocation disagreement. Following this recommendation, κquantity and κallocation were also calculated for each classified image in the post-classification accuracy assessment. To test the statistical significance of the differences between classification accuracies, McNemar's test, which is based on the χ² distribution, was used. The test uses the following equation (Foody, 2004):

$$\chi^2 = \frac{(f_{12} - f_{21})^2}{f_{12} + f_{21}}$$

where $f_{12}$ is the number of pixels correctly classified by the first classifier but misclassified by the second, and $f_{21}$ is the number of pixels misclassified by the first classifier but correctly classified by the second.
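McNemar's test needs only the two disagreement counts. The sketch below computes the statistic without a continuity correction, as in the equation above. It derives the p-value from the relation between a one-degree-of-freedom χ² variable and the standard normal distribution. The counts in the example are hypothetical.

```python
import math


def mcnemar(f12, f21):
    """McNemar chi-square statistic (1 degree of freedom, no continuity
    correction) and its p-value for two classifiers tested on the same pixels.

    f12: pixels correct for classifier 1 but wrong for classifier 2;
    f21: pixels wrong for classifier 1 but correct for classifier 2.
    """
    if f12 + f21 == 0:
        raise ValueError("the two classifiers never disagree")
    chi2 = (f12 - f21) ** 2 / (f12 + f21)
    # For 1 degree of freedom, P(chi2 > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical disagreement counts between two classifiers.
chi2, p = mcnemar(f12=30, f21=10)
print(chi2, p, chi2 > 3.84)  # 3.84 is the 5% critical value for 1 df
```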

Results and discussion
To evaluate the performance of the RTF classifier relative to the RF and GAB ensemble methods, the error matrices obtained from the accuracy analysis were examined. First, the error matrices calculated for RTF showed that RTF performed better than the other ensemble methods in both study areas. Its overall classification accuracy was 93.33% (κquantity 0.97, κallocation 0.94) in the rural area and 84.90% (κquantity 0.89, κallocation 0.88) in the urban area. The urban area contains more pixels with similar spectral properties than the rural area, which reduced the classification accuracy there. In the error matrices for the rural area, misclassified pixels were observed in classes with similar spectral reflectance values. For example, the road class has spectral characteristics similar to those of the building1 and soil classes: concrete roads were confused with concrete buildings, and soil roads with the soil class. The same was observed for the forest and grass classes. In areas with sparse grass, grass pixels were confused with the soil class. At the class level, the producer's accuracies of RTF were generally close to those of RF. However, RTF represented the forest, grass and building1 classes better than RF by 4%, 8% and 11%, respectively, whereas RF provided better accuracy by 3% and 1% in the road and soil classes, respectively. For the user's accuracy, RTF was more successful in the forest, grass, road and soil classes by 6%, 3%, 8% and 13%, respectively. In the building1 class, RF showed 6% higher accuracy than RTF (Table 1).
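The producer's and user's accuracies compared above can all be read off an error matrix. The sketch below also computes the quantity and allocation disagreements recommended by Pontius and Millones (2011). It assumes rows are classified classes and columns are reference classes, and it uses sample proportions directly. The κquantity and κallocation values reported in the study are related but distinct statistics and are not reproduced here. The example matrix is hypothetical.

```python
def accuracy_measures(matrix):
    """Accuracy measures from a square error matrix of pixel counts.

    Assumption for this sketch: rows are classified (map) classes and
    columns are reference classes; sample proportions are used directly.
    Returns overall accuracy, producer's and user's accuracy per class, and
    the quantity and allocation disagreements of Pontius and Millones (2011).
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    p = [[count / total for count in row] for row in matrix]
    map_tot = [sum(p[g]) for g in range(k)]
    ref_tot = [sum(p[i][g] for i in range(k)) for g in range(k)]
    diag = [p[g][g] for g in range(k)]
    overall = sum(diag)
    producers = [diag[g] / ref_tot[g] if ref_tot[g] else float("nan") for g in range(k)]
    users = [diag[g] / map_tot[g] if map_tot[g] else float("nan") for g in range(k)]
    quantity = 0.5 * sum(abs(map_tot[g] - ref_tot[g]) for g in range(k))
    # Allocation disagreement: 0.5 * sum_g 2 * min(commission_g, omission_g).
    allocation = sum(min(map_tot[g] - diag[g], ref_tot[g] - diag[g]) for g in range(k))
    return overall, producers, users, quantity, allocation


# Hypothetical two-class error matrix (rows: map, columns: reference).
oa, pa, ua, q, a = accuracy_measures([[40, 10], [5, 45]])
print(oa, pa, ua, q, a)
```

The quantity and allocation disagreements add up to the total disagreement, 1 minus the overall accuracy. Producer's accuracy relates to omission errors and user's accuracy to commission errors.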
For the urban area, the error matrices in Table 2 show that the most commonly confused class pairs were forest and grass, road and building1, and soil and building2. As in the rural area, white buildings with concrete and tin roofs were confused with concrete roads, and buildings with brick roofs in the building2 class were confused with the soil class. In terms of producer's accuracy, RTF was 7%, 13%, 30% and 30% more accurate than RF in the forest, grass, road and soil classes, respectively, whereas RF classified the building1 and building2 classes 14% and 4% better, respectively. For the user's accuracy, RTF generally yielded more successful results. As the comparison of classification performances shows, RTF yielded higher accuracy than the other methods in both study areas (Table 3). In addition, a majority filter was applied to the classified images to reduce the effect of noise and increase classification accuracy. Filtering improved the overall classification accuracy in both study areas:
- In the urban area, from 84.90% to 87.21% for RTF, from 77.69% to 78.50% for RF and from 69.52% to 72.52% for GAB.
- In the rural area, from 93.33% to 94.15% for RTF, from 88.98% to 90.07% for RF and from 82.59% to 83.81% for GAB (Table 3).

The filter thus reduced the effect of noise on the images and improved the quality of the generated land use maps.
Finally, the significance of the differences between the accuracies of the classification methods was examined using the McNemar test. As seen in Table 4, the McNemar χ² values for RTF-RF and RTF-GAB were 12.6447 and 53.8407 in the rural

Conclusion
In this study, RTF achieved an overall classification accuracy of 84.90% in the urban area and 93.33% in the rural area for the orthophoto images obtained from UAV imagery. Compared with common ensemble methods such as RF and GAB, RTF provided better overall accuracy. In the urban area it outperformed RF and GAB by approximately 7% and 15%, respectively, and in the rural area by 4% and 11%, respectively. These results show that the RTF classifier yielded higher accuracy than the RF and GAB classifiers. The majority filter applied to reduce noise in the classified thematic images was effective in increasing the quality and accuracy of the land use maps produced. In addition, the significance of the differences in classification accuracy between RTF and the other methods was tested using the McNemar test, and these differences were found to be statistically significant. Consequently, the results of this study, which investigated the use of the RTF algorithm for classifying UAV images, support the use of the RTF method for producing land use maps of rural and urban areas.

$f_{11}$: the number of pixels correctly classified by both classifiers; $f_{22}$: the number of pixels misclassified by both classifiers; $f_{12}$: the number of pixels correctly classified by the first classifier but misclassified by the second; $f_{21}$: the number of pixels misclassified by the first classifier but correctly classified by the second.