Discriminating crops/weeds in an upland rice field from UAV images with the SLIC-RF algorithm

ABSTRACT In this study, we propose a method for discriminating crops/weeds in upland rice fields using a commercial unmanned aerial vehicle (UAV) with a red-green-blue (RGB) camera, combining the simple linear iterative clustering (SLIC) algorithm and the random forest (RF) classifier. In the SLIC-RF algorithm, we evaluated different combinations of input features: three color spaces (RGB, hue-saturation-value [HSV], and CIE-L*a*b*), a canopy height model (CHM), spatial texture (Texture) and four vegetation indices (VIs) (excess green [ExG], excess red [ExR], green-red vegetation index [GRVI] and color index of vegetation extraction [CIVE]). Among the color spaces, the HSV-based SLIC-RF model showed the best performance with the highest out-of-bag (OOB) accuracy (0.904). The classification accuracy was improved by combining HSV with CHM, Texture, ExG, or CIVE. The highest OOB accuracy (0.915) was obtained from the HSV+Texture combination. The greatest errors in the confusion matrix occurred in the classification between crops and weeds, while soil could be classified with very high accuracy. These results suggest that with the SLIC-RF algorithm developed in this study, rice and weeds can be discriminated from consumer-grade UAV images with accuracy acceptable for site-specific weed management (SSWM), even in the early growth stages when rice plants are small.


Introduction
Weed management is critical for agricultural production. Weeds compete with crops for resources (e.g., light, water, and nutrients) and introduce diseases that can cause yield loss (Liu & Bruch, 2020; Machado, 2007; Monaco et al., 1981). Weeds are unevenly distributed as patches across fields (Cardina et al., 1995; Dieleman & Mortensen, 1999; Gerhards et al., 1997; Johnson et al., 1996); however, conventional weed management relies on whole-field management (Ghanizadeh & Harrington, 2019; Llewellyn et al., 2004). SSWM with monitoring data has considerable potential economic and environmental benefits (Shaw, 2005). SSWM includes spraying only weed patches and/or adjusting herbicide applications according to weed distribution information (López-Granados, 2011). Weed cover maps can be used for SSWM to make decisions regarding where the chemicals are needed most, least, or where they should not be used at all (Lan et al., 2010). Thus, timely detection of weed distribution can support cost reductions by facilitating precision application of inputs (Pedersen et al., 2006).
In Southeast Asia, weeding methods are rapidly shifting from manual weeding to herbicide application (Gianessi, 2013). In Laos, the widespread use of herbicide is remarkable not only in commercial farming, such as rubber (Sturgeon, 2013) and maize (Chaovanapoonpho & Somyana, 2018) plantations, but also in upland rice farms managed by subsistence smallholders (Asai et al., 2017). The primary reason for this shift in weeding style, irrespective of its high cost, is to compensate for the labor shortage caused by increasing opportunities for employment in urban areas (Asai et al., 2017; Lestrelin et al., 2005). Therefore, SSWM is a key issue for profitable smallholder farms in Laos. Furthermore, timely weed detection can be expected to become a key technical component in the breeding and development of new varieties. Laos has diverse genotypic resources of upland rice, but these resources have never been utilized for selection and breeding due to the lack of human resources and research infrastructure. High-throughput phenotyping techniques using UAVs have been developed to estimate varietal differences in plant growth with high accuracy (Kawamura et al., 2020). However, manual weeding is inevitable at any UAV flight time, and thus the weed effect must be removed for image analysis. Image-based weed detection, i.e., timely weed detection and rice/weed discrimination, can therefore reduce weeding costs and accelerate the breeding process when applied in combination with high-throughput phenotyping.
Recently, high spatial resolution images acquired by unmanned aerial vehicles (UAVs) have been adopted and have performed suitably for early weed detection in crops (Castaldi et al., 2017; Peña et al., 2015; Pérez-Ortiz et al., 2015). In many cases where weed detection is required, the optimal flight timing is early in the growth season, when weeds and crops are in their seedling growth stage (López-Granados, 2011; Peña et al., 2013). However, in the early season, crop and weed seedlings usually show similar spectral signatures, making crop/weed classification difficult. Another issue is that red-green-blue (RGB) color images vary with the illumination conditions in the field (Kim et al., 2008; Onyango & Marchant, 2001). To solve the color-based crop segmentation problem, there have been many attempts using various methods, including the use of different color spaces and RGB-based vegetation indices (VIs). Two of the most widely considered color space models are HSV (hue, saturation, and value) and CIE-L*a*b* (L* for illumination, a* for values from red to green, and b* for values from blue to yellow), both of which can be obtained by transformation functions from RGB images (Hamuda et al., 2016; A. Wang et al., 2019). The HSV and L*a*b* color spaces have been used in many crop/weed segmentation studies to adapt the color distribution to the outdoor environment (Bai et al., 2013; Hamuda et al., 2017). As another color-based approach, many RGB-based VIs have been proposed, such as excess green (ExG) (Woebbecke et al., 1995), excess red (ExR) (Meyer et al., 1999), the green-red vegetation index (GRVI) (Motohka et al., 2010) and the color index of vegetation extraction (CIVE) (Kataoka et al., 2003). These VIs were developed to emphasize the chromatic characteristics of vegetation (Hamuda et al., 2016). For example, ExG provides a clear contrast between plants and soil because green crops have larger green hue values than the soil background (Woebbecke et al., 1995).
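As a concrete illustration, the four VIs above can be computed from an RGB image as follows. This is an editorial sketch (the function name is ours): ExG, ExR and CIVE are computed on normalized chromatic coordinates (r, g, b), the convention common to the cited papers, while GRVI uses the raw green and red bands.

```python
import numpy as np

def vegetation_indices(rgb):
    """Compute four RGB-based vegetation indices.

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    Returns a dict of (H, W) index arrays.
    """
    # Chromatic coordinates normalize each band by overall brightness.
    total = rgb.sum(axis=2)
    total[total == 0] = 1e-9          # avoid division by zero on black pixels
    r, g, b = (rgb[..., i] / total for i in range(3))
    R, G = rgb[..., 0], rgb[..., 1]   # raw bands for GRVI

    return {
        "ExG": 2 * g - r - b,                                  # excess green (Woebbecke et al., 1995)
        "ExR": 1.4 * r - g,                                    # excess red (Meyer et al., 1999)
        "GRVI": (G - R) / np.where(G + R == 0, 1e-9, G + R),   # green-red vegetation index
        "CIVE": 0.441 * r - 0.811 * g + 0.385 * b + 18.78745,  # color index of vegetation extraction
    }
```

For a pure-green pixel, for example, ExG evaluates to 2 and GRVI to 1, which is the strong vegetation response these indices are designed to emphasize.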
In UAV-based remote sensing data, better classification accuracy can be expected from object-based image analysis (OBIA) than from traditional pixel-based image analysis (Peña et al., 2013). OBIA segments high spatial resolution images into groups of adjacent pixels with homogenous spectral values and then uses these groups, called 'objects', as the basic elements of classification analysis (Blaschke et al., 2014). OBIA techniques can combine spectral, topological, and contextual information from these objects to address difficult classification scenarios (Peña et al., 2013). Therefore, the combination of UAV imagery and OBIA addresses the major challenge of automating early weed detection in early season herbaceous crops (Peña et al., 2013). More recently, the superpixel technique (Ren & Malik, 2003) began to be extensively applied to high spatial resolution image segmentation due to its ability to generate uniform and homogenous regions that preserve most of the useful information (Beaulieu & Goldberg, 1989; Tremeau & Colantoni, 2000). Superpixels are irregular blocks of visually significant pixels that consist of adjacent pixels that are similar in color, texture, and brightness, among other factors. Currently, many superpixel algorithms are available via online sources, and these algorithms can be classified into two types. One type includes algorithms based on gradient ascent methods, such as the mean shift algorithm (Comaniciu & Meer, 2002), the watershed transform algorithm (Haris et al., 1998), and the simple linear iterative clustering (SLIC) algorithm (Achanta et al., 2012). The other type includes algorithms based on graph theory, such as the normalized cuts algorithm (Shi & Malik, 2000) and efficient graph-based image segmentation (Felzenszwalb & Huttenlocher, 2004). Among the algorithms above, SLIC exhibits a good balance between accuracy and computational efficiency.
Therefore, this algorithm has been applied extensively to high spatial resolution remote sensing data (Csillik, 2017; M. Wang et al., 2018).
To date, numerous classification methods have been developed and applied for weed mapping using UAV images. Machine learning algorithms have emerged as accurate and efficient alternatives to conventional parametric algorithms, especially for high-dimensional and complex data (Rodriguez-Galiano et al., 2012). Among the numerous machine learning algorithms available, the random forest (RF) classifier has increasingly attracted the attention of researchers due to its generalized performance and fast operation speed (Belgiu & Drăguţ, 2016; Rodriguez-Galiano et al., 2012). Classification accuracy could be enhanced by combining the RF classifier with OBIA (De Castro et al., 2018). In addition, previous studies have reported that weed detection from UAV images could be improved by adding auxiliary information layers, such as spatial texture and vegetation height estimated from UAV digital surface models (DSMs) (Yuba et al., 2020b; Zisi et al., 2018).
From the previous findings mentioned above, accurate weed detection from UAV images can be expected from an OBIA-RF algorithm combined with auxiliary information, but the recommended input features will differ depending on the target plants, and there is no common set of features yet. Therefore, we evaluated the performance of OBIA-RF algorithms by comparing the input features for the discrimination of crops, weeds and soil background in an upland rice field using commercial-grade UAV images. In OBIA, we used the SLIC superpixel technique to extract the input feature information for each object, which included three color spaces (RGB, HSV and L*a*b*) as the primary input feature and a DSM, spatial texture and four VIs (ExG, ExR, GRVI and CIVE) as auxiliary information. In addition, to demonstrate the applicability of the SLIC-RF algorithm to the detected spatial distribution map of crops and weeds, weed removal experiments were conducted in part of the upland rice field. We developed the SLIC-RF model in the weed removal test area and compared the predicted cover areas of weeds and rice plants.

Experimental field and weed removal experiment
Laos is a country that contains one of the largest genetic resources of upland rice in the world (Appa Rao et al., 2006). This research was conducted in an experimental field at the Rice Research Center (RRC) of the National Agriculture and Forestry Research Institute (NAFRI) (18°8′56.65″N, 102°44′9.78″E) in the central part of Vientiane in Laos (Figure 1). The experimental field is located in a tropical climate ('Aw' in Köppen's climate classification) with a hot, humid summer season. The mean annual temperature is 25.4°C, and the annual precipitation is 1622 mm. The soil type is characterized as clay loam (CL, 0-30 cm) and light clay (LiC, 40-60 cm). The target field was naturally infested with Digitaria ciliaris and Ageratum conyzoides L., and the weed species were mixed across the field.
Within the upland rice field, an experimental plot (27 m × 21 m = 567 m²) was installed to evaluate the timing of weed removal treatments in the rainy season (May to October) of 2019 (Figure 1). The plots, with a size of 7 m × 6 m, were laid out in a randomized complete block design with three replications (R1-R3) (Figure 2). To create variation in weed density for UAV observation, we prepared three weeding treatments (Figure 2(a)). For all treatments, the first weed removal was performed with shallow plowing 5 days before sowing the rice (June 9). The second weeding before the UAV flight was conducted on July 10 for Treatment 1 (T1) and June 25 for Treatment 2 (T2), and no weeding was done for Treatment 3 (T3). UAV observations were performed on July 12. Thus, the time interval from the last weeding to the UAV flight was 1, 17 and 34 days for T1, T2 and T3, respectively. The second weeding was conducted manually, and weeding residues were removed from the plots as thoroughly as possible.
On 13 June 2019, 5-10 rice seeds per hill were sown with a dibbling stick at a spacing of 25 cm × 25 cm. Rice was grown under rainfed upland conditions without supplemental water or fertilizer. Before the experiment was installed, the field had been fallow for 3 years. Field preparation was performed with shallow plowing 5 days prior to sowing. Missing hills that did not germinate were replaced two weeks after sowing.

Overview of the methodology
Here, we present an overview of the research process using a flowchart (Figure 3) that summarizes five steps: (1) acquisition of UAV images, ground control points (GCPs), and ground-truth data; (2) generation of dense point clouds, DSMs, and ortho-mosaiced RGB images with 1-cm spatial resolution; (3) generation of color space images (RGB, HSV and L*a*b*), CHMs, texture images (spatial variance of the G band), and VI images (ExG, ExR, GRVI, and CIVE) from mosaic RGB images; (4) image segmentation and calculation of four statistical features within each segment for input layers (color spaces + DSM + Texture + VIs); and (5) calculation of the RF classifier and generation of rice/weed spatial distribution maps.
The SLIC-RF algorithm used in the present study was developed originally by Yasuda (2018) for classifying shrubs that invaded seminatural grasslands. In the original methods, the input features include topographic openness, which represents the dominance (positive) or enclosure (negative) of a landscape (Yokoyama et al., 2002). We used spatial texture (Zisi et al., 2018), CHM, and VIs instead of openness, and different combinations were tested.

UAV image and GCP acquisition in field
A small consumer UAV, the DJI Phantom 4 (DJI, Shenzhen, China), was used to capture the RGB images after sowing (22 June 2019, DAS = 9) and at the initial growth stage (12 July 2019, DAS = 29). The UAV flights followed an autonomous flight plan using the 'double grid' mission in Pix4Dcapture (https://support.pix4d.com/hc/en-us/articles/115002496206) to ensure substantial overlap (i.e., 90% forward and 70% side), and the flight height was 20 m. The camera angle was set at 80° because a previous study reported that structure from motion (SfM)-based DSMs (or digital elevation models (DEMs)) derived from UAV images showed systematic broad-scale deformations, which are expressed as a central 'doming' (Rosnell & Honkavaara, 2012), and the systematic error could be reduced by collecting oblique imagery (James & Robson, 2014).
As GCPs, we placed five wooden boards (30 × 30 cm) at the four corners and at the center position of the field (Figure 1), and recorded their positions using a differential global positioning system (Geo7X, Trimble, Westminster, CO, USA). Using Trimble Pathfinder Office (Trimble Navigation Ltd., Sunnyvale, CA, USA), postprocessing was performed on the GCP data producing horizontal and vertical resolutions of <30 cm.

Dense point cloud, DSM, and RGB generation
Using commercial SfM software, Metashape Pro version 1.5.1 (Agisoft LLC, St. Petersburg, Russia), 3D dense point clouds, DSMs, and orthomosaic RGB images (1 cm spatial resolution) were constructed from the UAV images based on the geographic coordinates of six GCPs (UTM 53N). The parameter settings applied in Metashape Pro were similar to those applied in a previous study by Yuba et al. (2020a): 'Highest' accuracy was used for 'Camera alignment', and 'Ultra high' quality was used in the 'Build point clouds' process. In the 'Build point clouds' process for generating 3D dense point clouds, 'Mild depth filtering' was used to achieve increased DSM accuracy (Holman et al., 2016). The DSM was generated by interpolating the dense point cloud data. An orthomosaic RGB image was constructed by orthorectification of the original RGB images with the 'Mosaic' blending mode. In building the orthomosaic image, the DSM data were used as the basis for estimating overlapping areas.

Color spaces
Mosaicked RGB images were converted to HSV and L*a*b* color space images using MATLAB version 9.3 (MathWorks, Natick, MA, USA) (Figure 4). The HSV color space represents color as hue (H) on a circular scale, color purity as saturation (S), and brightness as value (V). In HSV images, brightness (V) is thus separated from the color information. The L*a*b* color space includes one luminance layer (L*) and two color layers (a* and b*).
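For readers without MATLAB, an equivalent conversion can be sketched with scikit-image (an assumption of this sketch; the study itself used MATLAB's color-conversion functions):

```python
import numpy as np
from skimage import color

# Pure-red test pixel: hue 0, full saturation, full brightness.
red = np.array([[[1.0, 0.0, 0.0]]])

hsv = color.rgb2hsv(red)   # channels H, S, V, each scaled to [0, 1] by scikit-image
lab = color.rgb2lab(red)   # channels L* (0-100), a*, b*

h, s, v = hsv[0, 0]        # V carries the brightness, separate from H and S
```

The same calls apply unchanged to a full (H, W, 3) orthomosaic array, yielding the H, S, V (or L*, a*, b*) layers used as input features.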

Canopy height model (CHM)
Using DSMs, the CHM can be computed as the difference between the DSM at the ground level and the DSM of the vegetation (Kawamura et al., 2020; Van Iersel et al., 2018). The lower boundary, the DSM at the ground level, can be easily determined by drone flights during either the presowing stage or early in the season before emergence (Bendig et al., 2013; Holman et al., 2016). In the present study, the DSMs at the ground level (DSMt0) and canopy level (DSMt1) were generated from RGB images acquired on 22 June 2019 (before emergence, DAS = 9) and 12 July 2019 (early growth stage, DAS = 29) (Figure 5). The ground sample distance (GSD) and pixel size for DSMt0 and DSMt1 were 0.94 and 0.95 cm pix⁻¹, respectively. To calculate the DSM difference, DSMt1 was resampled to fit the DSMt0 pixels using the nearest-neighbor method based on the 'resample' function in the 'raster' package version 2.9-5 (Hijmans, 2014) in R software version 3.5.1 (R Core Team, 2018).

Figure 2. Sowing, weeding and UAV flight dates in the weed experimental plot (a) and the plot design with an orthomosaic UAV image acquired on 12 July 2019 (b). R1-R3: three replicates; T1-T3: weed removal treatment dates (July 10, June 25, and none); DAW: days after weeding until UAV observation.
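The CHM computation described above, i.e. nearest-neighbor resampling of the canopy-level DSM to the ground-level grid followed by a per-pixel difference, might be sketched in Python as follows (the function name and the clipping of negative differences to zero are assumptions of this sketch, not stated in the study):

```python
import numpy as np
from scipy.ndimage import zoom

def canopy_height_model(dsm_ground, dsm_canopy):
    """CHM = canopy-level DSM minus ground-level DSM, on the ground grid.

    dsm_canopy is resampled to the dsm_ground grid with nearest-neighbor
    interpolation (order=0), mirroring the 'resample' step from the R
    'raster' package used in the study.
    """
    factors = (dsm_ground.shape[0] / dsm_canopy.shape[0],
               dsm_ground.shape[1] / dsm_canopy.shape[1])
    resampled = zoom(dsm_canopy, factors, order=0)  # order=0: nearest neighbor
    chm = resampled - dsm_ground
    return np.clip(chm, 0, None)  # assumption: negative differences treated as zero height
```

In practice the two DSMs would be loaded from the georeferenced rasters exported by Metashape; here the inputs are plain arrays on matching extents.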

Spatial texture
The spatial texture (Texture) information layers were created by applying a local variance filter (7 × 7 pixels) to the RGB image (Zisi et al., 2018). Spatial texture describes visual effects caused by spatial variation in tonal quantity over relatively small areas (Anys & He, 1995). In the present study, a high discrimination ability between rice and weed plants was observed in the G band (Figure 6); therefore, this band was selected as an input feature for image classification.
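A local variance filter of this kind can be implemented with two mean filters, since Var(X) = E[X²] − (E[X])². A minimal sketch (the function name is ours):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(band, size=7):
    """Texture layer: per-pixel variance within a size x size moving window,
    computed as E[X^2] - (E[X])^2 using two uniform (mean) filters."""
    x = band.astype(float)
    mean = uniform_filter(x, size=size)
    mean_sq = uniform_filter(x ** 2, size=size)
    return np.maximum(mean_sq - mean ** 2, 0)  # clamp tiny negative rounding errors

# e.g. texture_g = local_variance(rgb_mosaic[..., 1], size=7)  # green band
```

A uniform region yields zero variance everywhere, while textured vegetation boundaries produce high values, which is what makes this layer useful for crop/weed discrimination.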

SLIC superpixel image segmentation
SLIC superpixel image segmentation was performed using the second version of SLIC (SLICO) with k = 10 in the scikit-image package (van der Walt et al., 2014) of Python. In SLICO, only the parameter k (initial clustering size) needs to be defined, while the compactness parameter m is adaptively refined for each superpixel. SLICO produces regularly shaped superpixels across the scene (Achanta et al., 2012), as shown in Figure 7. From each object in the input layers, four statistical features, including the median (med), standard deviation (std), minimum (min), and maximum (max) values, were extracted. To discriminate the different types of areas in the upland field image, we set three classes: i) rice plants, ii) weed plants, and iii) soil (including soil surface, dead plant materials and others). Spatial data points (rice, weed and soil classes; n = 500 each) were generated in shapefile format using ArcGIS version 10.6 (ESRI, Redlands, CA, USA). The spatial point data set of the input features (layers × 4 statistical features) was applied as the input to RF classification.
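The segmentation and per-object feature extraction step might be sketched as follows, using scikit-image's slic with slic_zero=True for SLICO (the helper name, the dict-of-layers interface, and the n_segments value are assumptions of this sketch; n_segments is the scikit-image analogue of the initial clustering parameter):

```python
import numpy as np
from skimage.segmentation import slic

def segment_features(image, layers, n_segments=100):
    """Segment `image` with SLICO, then extract median/std/min/max of every
    input-feature layer within each superpixel object.

    image:  (H, W, 3) RGB array used for segmentation.
    layers: dict name -> (H, W) array (color channels, CHM, texture, VIs).
    Returns the label image and an (n_objects, 4 * n_layers) feature table.
    """
    labels = slic(image, n_segments=n_segments, slic_zero=True, start_label=0)
    rows = []
    for obj in range(labels.max() + 1):
        mask = labels == obj
        row = []
        for layer in layers.values():
            vals = layer[mask]
            row += [np.median(vals), vals.std(), vals.min(), vals.max()]
        rows.append(row)
    return labels, np.asarray(rows)
```

Each row of the returned table corresponds to one superpixel object and supplies the med/std/min/max statistics used as RF input features.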

RF classification
RF is an ensemble of numerous independent individual classification and regression trees (CARTs) (Breiman, 2001). The final response of RF is obtained from the outputs of all the decision trees involved. A more detailed overview of RF is given in Breiman (2001), and recent advances in the use of RF in remote sensing studies can be found in Belgiu & Drăguţ (2016). We performed RF classification using the scikit-learn package (Pedregosa et al., 2011) in Python with input features from the SLIC segmentation (see Figure 3). Initially, the data set was split into a training data set (80%) for building a model and a test data set (20%) for validating the accuracy of the model. Using the training data set, the RF model built a set of trees, each created from a subset selected through a bagging approach, while the remaining subset, called the out-of-bag (OOB) sample, was used for internal cross-validation. The OOB sample data were used to compute the accuracies and error rates averaged over all predictions (Cutler et al., 2007) and to estimate variable (feature) importance. The final output class was the one with the maximum votes from the number of trees (ntree) used to grow the forest. In the present study, RF classification was performed with ntree = 5000.
In the RF procedure, there are two methods for assessing the importance of each input feature in the model (Cutler et al., 2007; Mellor et al., 2013). One is the mean decrease in accuracy (MDA), which is calculated as the normalized difference between the OOB accuracy of the original observations and the OOB accuracy of randomly permuted features (variables). Another measure is calculated by summing all the decreases in Gini impurity at each tree node split, normalized by the number of trees. We used Gini impurity to assess the importance of input features.
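The RF training, OOB accuracy and Gini importance steps can be illustrated with scikit-learn. The synthetic feature table below merely stands in for the real SLIC object statistics, and 500 trees are used instead of the study's ntree = 5000 to keep the sketch fast:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the object-level feature table (the study's training set had
# n = 1,200 objects with 12 features in the color-space-only case).
X, y = make_classification(n_samples=1200, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)

rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                            random_state=0, n_jobs=-1)
rf.fit(X, y)

oob_accuracy = rf.oob_score_               # internal cross-validation accuracy (= 1 - OOB error)
gini_importance = rf.feature_importances_  # mean decrease in Gini impurity, normalized to sum to 1
```

The `oob_score_` attribute corresponds to the OOB accuracy reported in the study, and `feature_importances_` is the Gini-impurity-based importance plotted in Figures 8 and 9.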

Input feature selection and classification accuracy
In feature selection, OOB accuracy was used in the training data set (n = 1,200) to compare the performance of OBIA-RF classifications using different features. The OOB accuracy was calculated from the OOB error in the internal cross-validation process and is defined as follows:

OOB accuracy = 1 − OOB error (1)

where the OOB error is the fraction of incorrect classifications over the total number of OOB samples. The OOB accuracy can be used to assess how well the RF model performs (Belgiu & Drăguţ, 2016).

Figure 7. SLICO applied to orthomosaic RGB images of the weed experimental plots with a 1 cm spatial resolution and an initial clustering of 10 × 10 pixels (k = 10), and the flow of the input feature extraction process.
In the final classification model with the selected features, the general accuracy and reliability were evaluated using the test data set (n = 300) with the overall accuracy (OA) and F-score. In addition, the classification performance in each category (rice, weed, and soil) was evaluated with the precision, recall, and confusion matrix. The OA, F-score, recall and precision are calculated by the following equations:

OA = number of correctly classified samples / total number of samples (2)

Recall = tp / (tp + fn) (3)

Precision = tp / (tp + fp) (4)

F-score = 2 × Precision × Recall / (Precision + Recall) (5)

where tp, fn and fp are true positive, false negative and false positive, respectively. Recall (i.e., producer's accuracy) represents the proportion of samples classified as a specific class among all samples that truly belong to that class. Precision (i.e., user's accuracy) is the proportion of samples that truly belong to a specific class among all those classified as that class. The F-score evaluates the relation between the data's positive labels and those given by the classifier and is the harmonic mean of precision and recall (Sokolova & Lapalme, 2009). The closer the F-score is to 1, the better the classification performance.
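These metrics can be computed directly with scikit-learn; the toy labels below are illustrative stand-ins only (the study's test set had n = 300 objects):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

classes = ["rice", "weed", "soil"]
# Hypothetical ground truth and predictions: one rice object confused as weed,
# one weed object confused as rice, soil perfectly classified.
y_true = ["rice"] * 5 + ["weed"] * 5 + ["soil"] * 5
y_pred = ["rice"] * 4 + ["weed"] + ["weed"] * 4 + ["rice"] + ["soil"] * 5

oa = accuracy_score(y_true, y_pred)                    # overall accuracy (OA)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=classes)                    # per-class scores
cm = confusion_matrix(y_true, y_pred, labels=classes)  # rows: true, cols: predicted
```

As in the study's confusion matrices, the rice/weed confusion shows up in the off-diagonal cells of the first two rows, while the soil row stays on the diagonal.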

Statistical analysis on weed treatment
The predicted areas covered by rice, weeds and soil from the UAV images using the SLIC-RF algorithm with selected input features were analyzed by one-way analysis of variance (ANOVA) to compare the weeding effects among the treatments (T1, T2 and T3). The treatment means were compared at the 5% level of probability using Tukey's HSD test.
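A minimal sketch of this analysis with SciPy (the cover values below are hypothetical stand-ins for the classified-map outputs; SciPy's tukey_hsd, available from version 1.11, substitutes for the Tukey's HSD test used in the study):

```python
from scipy import stats

# Hypothetical percent weed cover per plot (three replicates per treatment);
# in the study these values come from the SLIC-RF classified maps.
t1 = [7.1, 7.5, 7.4]       # 1 DAW
t2 = [21.0, 19.5, 22.3]    # 17 DAW
t3 = [44.9, 43.2, 45.0]    # 34 DAW

f_stat, p_value = stats.f_oneway(t1, t2, t3)  # one-way ANOVA across treatments

if p_value < 0.05 and hasattr(stats, "tukey_hsd"):
    # Pairwise comparison of treatment means at the 5% level (SciPy >= 1.11).
    tukey = stats.tukey_hsd(t1, t2, t3)
```

With clearly separated group means like these, the ANOVA p-value falls far below 0.05 and the pairwise HSD intervals indicate which treatments differ.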

Color spaces
As the primary input feature in the SLIC-RF model, three color spaces (RGB, HSV and L*a*b*) from the UAV image captured at the initial growth stage (12 July 2019, DAS = 29) were initially compared, as shown in Table 2. The number of input features was 12 (3 layers × 4 statistical features within objects). Based on the OOB accuracy, the highest classification accuracy was obtained with HSV (0.904). The OOB accuracy of L*a*b* (0.886) also showed improvement over that of the original RGB (0.709). These results indicated that HSV would be the best color space to use in the SLIC-RF procedure for discriminating rice/weed plants and soil in the upland rice field.
To assess the contributions of the input features to the classification accuracy, the importance of the input features (Gini impurity) based on RF procedures is illustrated in Figure 8. In the RGB-based SLIC-RF procedure, the three most important features were Rstd, Rmin and Gstd, while the blue color showed limited contribution to the classification. In the HSV-based SLIC-RF procedure, the five most important features were Smax, Vmin, Hmed, Hmax and Sstd. In the L*a*b*-based SLIC-RF procedure, the five most important features were L*min, a*min, a*std, L*std and a*med, while b* showed low importance in the classification.

HSV combined with CHM, texture and VIs
HSV, as the primary input feature, was combined with the CHM, Texture and VIs in the SLIC-RF procedure for discriminating rice/weed plants and soil, and the OOB accuracy was compared (Table 3). When one additional input feature was added (Table 3), the OOB accuracy showed higher values for HSV+CHM (0.908), HSV+Texture (0.915), HSV+ExG (0.909) and HSV+CIVE (0.908) than when only HSV was used in the model (0.904 in Table 2), while lower values were obtained for HSV+ExR (0.899) and HSV+GRVI (0.902). Among the additional input features, the HSV+Texture-based SLIC-RF model had the highest classification accuracy (OOB accuracy = 0.915). Therefore, Texture was used as the second most important input feature, and HSV+Texture was further combined with the four VIs (ExG, ExR, GRVI and CIVE) (Table 3). When the second additional input feature was added, the OOB accuracies decreased in most combinations (0.910-0.912). These results indicated that the second additional feature did not improve the classification accuracy, and HSV+Texture was the best combination of input features for the SLIC-RF classification. Therefore, the HSV+Texture-based SLIC-RF model was used as the final model for further analysis.

Table 2. OOB accuracies of OBIA-RF classifications using three colorimetric spaces (RGB, HSV and L*a*b*).

Features (median, std, max, min) | Number of features | OOB accuracy
RGB | 12 | 0.709
HSV | 12 | 0.904
L*a*b* | 12 | 0.886

Figure 9 presents the importance of the input features (Gini impurity) for all the combinations. Overall, Smax showed the highest contribution in all combinations. Although the order of the contributions changed somewhat, high contributions were also recognized for Vmin, Hmed and Hmax. The results confirmed that the HSV color space was the most important input feature for the SLIC-RF model to discriminate crops, weeds and soils. When HSV was combined with auxiliary layers, CHM and Texture had limited or small contributions even though they yielded better OOB accuracies. Meanwhile, the VIs, especially ExG and CIVE, had relatively high contributions, but they did not dramatically improve the OOB accuracy.

Evaluation of the classification accuracy
On the final SLIC-RF model that used the selected input features (HSV+Texture), the classification accuracy was further evaluated using the test data set by comparison with the results of the model using only HSV (Table 4). The HSV+Texture-based model showed higher classification accuracy according to the OA (0.910) and F-score (0.906) than the HSV-based model (OA = 0.901, F-score = 0.900). Based on the recall and precision for each category, most errors occurred in the classification between crops and weeds. For example, in the HSV-based model, 18 crop objects were misclassified as weeds, and this was slightly improved in the final model with the HSV+Texture combination. Soil could be classified with very high recall (0.990) and precision (1.000) since soil shows a strong color difference from green vegetation. Figure 10 presents the spatial distributions of predicted rice plants and weeds from the SLIC-RF model applied to the HSV+Texture features from UAV images at the early growth stage in the upland rice field (29 DAS), and the mean proportions of percent coverage (%) in each treatment are summarized in Table 5. The weed distribution was not uniform and formed large patches in the T3 plots, which had been left without weeding for 34 days after plowing (34 DAW). The proportions of weed and soil cover areas showed significant differences among the weed removal treatments (p < 0.05; one-way ANOVA). The weed coverage significantly increased from 7.33% (T1, 1 DAW) to 44.35% (T3, 34 DAW) as the period after weeding increased. Accordingly, the soil coverage decreased from 72.08% (T1) to 26.02% (T3), while rice coverage exhibited no difference. These results suggest that rice plants were not affected by weeds at the initial growth stage, but weeds spread into the open space faster than rice without weeding, raising concern that they would affect rice growth at later stages.

Discussion
In this study, we applied the SLIC superpixel approach to UAV images and integrated the RF classifier to improve the classification accuracy when discriminating crops (rice plants), weeds and soils in an upland rice field. Recently, image segmentation has become an important step in UAV image information extraction and target detection (Dong et al., 2017). In addition, the RF algorithm is receiving increased attention in remote sensing research as a highly suitable approach for high-resolution image data classification (Ma et al., 2015). By combining SLIC and RF, further classification improvements could be obtained (Csillik, 2017; Yasuda, 2018; Yuba et al., 2020b). We initially compared the performance of the SLIC-RF model using different input features of three color spaces (RGB, HSV and L*a*b*) and then combined these features with the CHM, Texture and four VIs (ExG, ExR, GRVI, CIVE). Among the color spaces, the HSV-based SLIC-RF model showed the highest classification accuracy (OOB accuracy = 0.904) (Table 2). In addition, improvements were found in the SLIC-RF model for HSV combined with auxiliary features (CHM, Texture, ExG, or CIVE) (Table 3). Overall, the best classification accuracy was obtained in the SLIC-RF model using HSV combined with Texture information (OOB accuracy = 0.915). These results are consistent with previous findings, which indicated that RF is a suitable approach for high-resolution UAV data classification (Ma et al., 2015) and that the combination of SLIC with auxiliary information layers improves classification accuracy (Csillik, 2017). Previous studies investigated the best color spaces for image segmentation, but the recommended features differed depending on the target plants. García-Mateos et al.
(2015) compared 11 color spaces for classifying soil and plants in lettuce (Lactuca sativa) cultures using images from outdoor fields and found that the L*a*b* color space achieved the best performance, with the a* channel giving 99.2% correct classification. Yuba et al. (2020b) compared RGB and HSV color UAV images for discriminating Pennisetum alopecuroides plants in a pasture, and the results showed improved discrimination accuracy with the HSV-based SLIC-RF classifier. In the present study, by assessing the importance of input features (Figures 8 and 9), the HSV feature was considered the most important variable influencing the discrimination of crops, weeds and soils in the early growth stage of upland rice fields. The HSV color space is closely aligned with human color perception (Sobottka & Pitas, 1996) and is robust to illumination variations (Chaves-González et al., 2010). Thus, the HSV color space has been adopted in many studies on the segmentation of green vegetation from the soil background in images taken under outdoor field conditions (Hamuda et al., 2017; Yang et al., 2015). The CHM from UAV images is used as an essential agronomic parameter to assess plant growth in crop fields (Bendig et al., 2013; De Castro et al., 2018; Kawamura et al., 2020; Watanabe et al., 2017). In contrast, the CHM in the present study did not contribute to crop and weed discrimination as we had expected (Figure 9). This finding might be because the plants were short (<50 cm) in the early growth stage, and the CHM exhibited only small differences between crops and weeds. RGB-based VIs have been widely used and have performed very well in the segmentation of plants from the background (Meyer & Neto, 2008). Most VIs in the present study showed a greater contribution in the RF procedure than Texture (Figure 9), but there was no improvement in classification accuracy when they were combined with HSV.
This result might be because the VIs overlapped with HSV in the color information they carry, and HSV alone provided sufficient feature information to segment green vegetation from the soil background.
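To make the pipeline discussed above concrete, the following sketch (an illustration under our own assumptions, not the implementation used in this study) segments an RGB tile into SLIC superpixels, extracts per-superpixel mean HSV plus ExG features, and trains an RF classifier with OOB scoring. The image, labels, segment count, and tree count are placeholders.

```python
import numpy as np
from skimage import color
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestClassifier

def excess_green(rgb):
    """ExG = 2g - r - b on normalized chromatic coordinates."""
    s = rgb.sum(axis=-1) + 1e-9
    r, g, b = rgb[..., 0] / s, rgb[..., 1] / s, rgb[..., 2] / s
    return 2 * g - r - b

def superpixel_features(rgb, n_segments=500):
    """SLIC superpixels with per-segment mean HSV (+ ExG) features."""
    segments = slic(rgb, n_segments=n_segments, compactness=10, start_label=0)
    stack = np.concatenate([color.rgb2hsv(rgb),
                            excess_green(rgb)[..., None]], axis=-1)
    n = segments.max() + 1
    feats = np.array([stack[segments == k].mean(axis=0) for k in range(n)])
    return segments, feats

# Placeholder tile and per-superpixel labels (0 = crop, 1 = weed, 2 = soil)
rgb = np.random.rand(64, 64, 3)
segments, X = superpixel_features(rgb, n_segments=50)
y = np.random.randint(0, 3, size=len(X))
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)  # OOB accuracy, the metric reported in Tables 2-3
```

With real training polygons, `y` would come from the labeled superpixels, and auxiliary layers (CHM, Texture) would simply be appended as extra feature columns.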
Further evaluation was performed on the SLIC-RF model with the selected input features (HSV and HSV+Texture) using the test data set (Table 4). The results confirmed that the HSV+Texture-based SLIC-RF model could discriminate the crops and weeds in upland rice fields with good classification accuracy (OA = 0.910, F-score = 0.906) and outperformed the HSV-based model (OA = 0.901, F-score = 0.900). Based on the confusion matrix, the HSV-based model could almost perfectly discriminate between green vegetation (crops and weeds) and soil due to the large difference in color information. Meanwhile, the greatest error occurred in the crop/weed classification, and this error was slightly reduced by adding Texture information to HSV. These results agree with previous findings that color alone is potentially inadequate for accurately discriminating crops and weeds and that further improvement can be expected by combining auxiliary information (De Castro et al., 2018; Hamuda et al., 2017). Accurate plant segmentation from background soil is important for crop monitoring in any field because mis-segmentation could seriously affect the accuracy of crop/weed detection (Hamuda et al., 2016). Our results confirmed that the HSV+Texture-based SLIC-RF model could accurately distinguish green vegetation (crops and weeds) from the background soil.
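The accuracy measures reported here (OA and a macro-averaged F-score) follow directly from a confusion matrix. A minimal sketch with a hypothetical 3×3 matrix (rows = reference, columns = predicted; the counts are invented for illustration):

```python
import numpy as np

def overall_accuracy(cm):
    """OA = correctly classified samples / all samples."""
    return np.trace(cm) / cm.sum()

def macro_f1(cm):
    """Per-class F1 from precision/recall, averaged over classes."""
    f1s = []
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        precision = tp / cm[:, k].sum()
        recall = tp / cm[k, :].sum()
        f1s.append(2 * precision * recall / (precision + recall))
    return float(np.mean(f1s))

# Hypothetical confusion matrix: crop, weed, soil
cm = np.array([[90,  8,  2],
               [10, 85,  5],
               [ 1,  1, 98]])
print(overall_accuracy(cm))  # 0.91
print(macro_f1(cm))          # ~0.909
```

Note the structure mirrors the pattern described in the text: the off-diagonal mass sits almost entirely in the crop/weed block, while soil is nearly error-free.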
However, further improvements are required in crop/weed classification. One promising solution is to add auxiliary information derived from manually designed rules on the spatial distribution pattern of crops (Peña et al., 2013; Pérez-Ortiz et al., 2016; Pérez-Ruiz et al., 2015). For example, Peña et al. (2013) developed an automatic method for delineating crop rows within a maize field and then discriminated weed seedlings from crop plants based on their relative positions, reporting much improved accuracy in weed detection. Gao et al. (2018) developed a weed detection method that fused pixels and OBIA for the RF classifier, combined with a Hough transform algorithm for maize row detection. That study demonstrated accurate weed mapping with 94.5% accuracy, which also illustrates the benefit of utilizing prior knowledge of a field setup (i.e., crop row detection). Applying a deep learning approach is another option to improve crop/weed classification (Huang et al., 2018). Guirado et al. (2017) reported that convolutional neural networks (CNNs) outperformed the OBIA method in Ziziphus lotus shrub detection using free high-resolution Google Earth™ images. Milioto et al. (2018) developed a CNN model using color indices and classified crops, weeds and soil background with 91% accuracy. Huang et al. (2018) used a fully convolutional network (FCN) and transfer learning methods on UAV RGB images for weed cover mapping in rice fields, achieving 93.5% OA and 88.3% weed recognition accuracy.
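The crop-row idea behind Peña et al. (2013) and Gao et al. (2018) can be illustrated with a standard Hough transform. The toy sketch below (our own example, not the cited authors' code) recovers three synthetic vertical rows from a binary vegetation mask; in a real workflow, vegetation lying far from any detected row line would then be a candidate weed.

```python
import numpy as np
from skimage.transform import hough_line, hough_line_peaks

# Synthetic binary vegetation mask with three vertical crop "rows"
mask = np.zeros((100, 100), dtype=bool)
mask[:, 20] = mask[:, 50] = mask[:, 80] = True

# Straight-line Hough transform: each accumulator peak is a candidate row
angles = np.linspace(-np.pi / 2, np.pi / 2, 360, endpoint=False)
h, theta, d = hough_line(mask, theta=angles)
_, row_angles, row_dists = hough_line_peaks(h, theta, d, num_peaks=3)

print(len(row_dists))  # 3 dominant lines recovered
```

On real imagery the mask would come from the vegetation classification itself, and the dominant row angle is shared across peaks, which makes the detection robust to gaps within rows.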
Using the HSV+Texture-based SLIC-RF model, we mapped the spatial distributions of crops and weeds in the weed experimental plot (Figure 10). The spatial distribution of weeds was not uniform, and weeds occurred in patches between the crop rows. The proportion of weed coverage increased with time after weeding, from 7.33% at 1 DAW to 44.35% at 34 DAW (Table 5). The results clearly indicated that weeds spread into open space much faster than crops. Weed seedling populations are spatially and temporally heterogeneous within agricultural fields and often occur in aggregated patches of different sizes or in stripes along field borders and along the direction of crop rows (Cardina et al., 1995; Dieleman & Mortensen, 1999; Gerhards et al., 1997; Johnson et al., 1996). Timely monitoring of this spatially heterogeneous weed distribution can support farmers in designing and implementing SSWM in the early growth stage, which can reduce costs by facilitating precision application of inputs such as herbicides (Pedersen et al., 2006).
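The weed-cover percentages tracked in Table 5 amount to class fractions of the classified map. A minimal sketch with a hypothetical labeled array (the layout and class codes are placeholders):

```python
import numpy as np

# Hypothetical classified map from the SLIC-RF model:
# 0 = crop, 1 = weed, 2 = soil
classified = np.full((200, 200), 2, dtype=np.uint8)  # soil background
classified[:, 40:60] = 0                             # a crop row
classified[80:120, 100:160] = 1                      # a weed patch

# Percent weed cover over the mapped area
weed_cover = 100.0 * (classified == 1).mean()
print(weed_cover)  # 6.0
```

Repeating this computation on maps acquired at successive dates yields the coverage time series (e.g., 1 DAW vs. 34 DAW) used to quantify how quickly weeds colonize open space.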
Our results confirmed that crops and weeds can be mapped with high classification accuracy from UAV images with the SLIC-RF approach. We used a consumer-grade RGB camera mounted on a small UAV platform (DJI Phantom 4, approximately USD 1,500 for the platform and camera), which is a great advantage for the practical, cost-effective application of weed monitoring in the early growth stage of upland rice fields. However, we note that the results of this research were based on a single UAV flight as a case study in the early growing stage of an upland rice field. At the time of UAV observation, the crops and weeds were small and showed similar green colors. As crops grow over the field season, however, their phenology changes, as does that of the weeds (Xiao et al., 2013). These phenological changes alter the spectral characteristics (color properties) of the crop and weed species (N. Wang et al., 2001). In addition, common crops are grown in many different varieties, each with its own unique phenology and physiology (Fan et al., 2016; Inoue et al., 1998; Lawless et al., 2005; Sakamoto et al., 2006; Viña et al., 2004). For practical application of our proposed method, the effect of crop-weed phenology on classification accuracy should be further investigated at different growth stages and across different fields.

Conclusions
The present study applied an SLIC-RF algorithm to UAV images to discriminate and map crops, weeds and soil at the initial growth stage in an upland rice field. Based on the SLIC-RF model, we compared the following input features: three color spaces (RGB, HSV and L*a*b*), CHM, Texture and four VIs (ExG, ExR, GRVI, CIVE), along with their combinations. The main advantage of this study was the demonstration of a practical, cost-effective application for weed monitoring in the early growth stage of upland rice fields using a commercial UAV and RGB camera. The main conclusions were as follows: (1) With the capability of providing high-resolution and detailed spatial structure information (i.e., the plant canopy structure of crops and weeds), low-cost UAVs integrated with RGB cameras are a promising tool to discriminate and map crops and weeds.
(2) UAV-derived high-resolution HSV color images and spatial variation (Texture) information were significant indicators for discriminating crops, weeds and soil. By combining HSV with Texture information, the optimal classification accuracy was obtained in the SLIC-RF model (OA = 0.910, F-score = 0.906).
(3) The confusion matrix for the test data set showed that (i) misclassifications occurred mainly between the crop and weed classes and were reduced by adding Texture information to HSV, and (ii) soil could be accurately discriminated using HSV alone because of its strong color difference from green vegetation.
(4) Spatial distribution maps of crops and weeds in the upland rice field demonstrated that weeds would spread into open space faster than crops if weeding treatments were not performed.
Our findings illustrate the feasibility of UAV-based remote sensing technology for weed detection using a machine learning approach. Nevertheless, the SLIC-RF approach employed in this study should be further tested at different growth stages, across different fields, and in a variety of crop types.