Individual plant definition and missing plant characterization in vineyards from high-resolution UAV imagery

ABSTRACT In the last few years, high-resolution imaging of vineyards, obtained by unmanned aerial vehicle surveys, has provided new opportunities to obtain valuable information for precision farming applications. While available semi-automatic image processing algorithms are now able to detect parcels and extract vine rows from aerial images, the identification of single plants inside the rows is still an unaddressed problem. This study presents a new methodology for the segmentation of vine rows into virtual shapes, each representing a real plant. From the virtual shapes, an extensive set of features is discussed, extracted and coupled to a statistical classifier, to evaluate its performance in missing plant detection within a vineyard parcel. Passing from continuous images to a discrete set of individual plants results in a crucial simplification of the statistical investigation of the problem.


Introduction
Detection and localization of individual plants from remotely sensed imagery might lead to new opportunities in mechanization and precision farming technologies. This is especially true for precision viticulture. However, the possibility of identifying single plants is constrained by the peculiar structure of modern vineyards, as in most cases vines are planted in rows and their vegetation is organized in continuous, overlapping canopies (see Figure 1).
In viticulture, this organization of the vines, commonly referred to as vine training, is aimed at facilitating canopy management and achieving a correct fruit/vegetation balance. In these systems, despite the high spatial resolution of the sensors currently employed, the resulting information, such as vigour zoning, only accounts for averaged data and neglects the contribution of single vines (Arnó, Martínez Casasnovas, Ribes Dasi, & Rosell, 2009). While row detection techniques have developed greatly in the last few years (Comba, Gay, Primicerio, & Aimonino, 2015; Delenne, Durrieu, Rabatel, & Deshayes, 2010; Puletti, Perria, & Storchi, 2014; Smit, Sithole, & Strever, 2010), a methodology for single plant detection is still not available. The ability to automatically recognize single vines within a training row could remarkably improve the representation of the contribution of each plant to the canopy curtain, making it possible to detect specific plant pathologies in the row and improving the accuracy of vigour zoning (Lee et al., 2010; Naidu, Perry, Pierce, & Mekuria, 2009; Sankaran, Mishra, Ehsani, & Davis, 2010). Single plant representation could also provide complementary data to recent studies of structure-from-motion vineyard canopy modelling (Mathews & Jensen, 2013). For canopies with a discrete spatial distribution, such as the goblet training system, the problem has already been profitably addressed with a point pattern analysis approach (Robbez-Masson & Foltête, 2005). Nevertheless, for the more common training systems in which vines are planted in rows, new robust and reliable techniques need to be developed.
A common problem in trellising systems is the occurrence of missing plants (also referred to as voids) in the regular sequence of vines along the rows. This is usually due to the premature death of a vine and, apart from the consequent reduction in vineyard production, it poses serious problems from a remote sensing point of view. Not only can it be an obstacle to vine row recognition, and thus to the automatic processing of aerial maps (Comba et al., 2015; Rabatel, Delenne, & Deshayes, 2008), but it can also induce errors in the estimation of vigour zones inside vineyards and affect the results of precision agriculture techniques applied to vineyard management.
The simplest approach to identifying missing plants would be to detect areas not covered by vine vegetation along the row. Unfortunately, zenithal aerial photography cannot reveal the actual situation under the top part of the canopy and, where a plant is missing, the neighbouring plants can extend their shoots and foliage to occupy the free space between two vines.
In order to improve canopy representation in the presence of missing plants, a possible approach would be to analyse vegetation extension minima, applying a threshold based on the standard deviation of canopy thickness. Nevertheless, the performance of this methodology is low, because the intrinsic variability of canopy patterns within a crop can be very high and the local averages of canopy thickness may differ significantly from the global average.
It has been shown that vegetative and soil parameters show a positive spatial autocorrelation, as points close to each other have similar values (Arnó, Rosell, Blanco, Ramos, & Martínez-Casasnovas, 2011; Baluja, Diago, Goovaerts, & Tardaguila, 2012; Tardaguila, Baluja, Arpon, Balda, & Oliveira, 2011). This leads to the idea of using local autocorrelation coefficients, such as the local Moran index, or locally weighted regressions to improve missing plant detectability, as proposed for spatial data analysis by Shekhar, Lu, and Zhang (2003) and Anselin, Syabri, and Kho (2006).
This paper addresses the issue of individual plant representation along the vine rows, and explores the feasibility of applying a machine learning process to the characteristics of the extracted plants. An automated algorithm is used for vine row detection, followed by a segmentation procedure to extract single plants from the vine row. From the plant images, an extensive set of features is discussed and extracted. Finally, a multi-logistic model for the detection of missing plants is implemented and validated in a real case-study vineyard, comparing the results with field observations.
The paper is organized as follows. "Materials and methods" presents the proposed approach and the experimental site. The vine row extraction step is only briefly outlined here, since it has been proposed in Comba et al. (2015). The following steps of segmentation, parameter calculation and model construction are instead described in detail, as they constitute a new contribution. "Results and discussion" presents the results of applying the developed method to a specific vineyard in Suvereto (Tuscany, Italy), with the final recognition of missing plants. "Conclusions" concludes the paper and presents future developments of this line of research.

Materials and methods
The vineyard chosen for the experimental testing, Bulichella, is a flat clay-loam field located in Suvereto (Leghorn, Italy) (43°04′ N, 10°41′ E) at 50 m a.s.l. It was planted in 1999 (cv. Sangiovese), is Guyot trained, and has a between-row and within-row spacing of 2.4 and 0.8 m, respectively, with a north-south orientation. The field undergoes periodic soil tillage. A set of 64 aerial images was acquired at noon on 9 July 2015, using an unmanned aerial vehicle (UAV) octocopter (S1000, DJI, Shenzhen, China) able to fly autonomously over a predetermined waypoint course. The camera used to acquire the UAV images, a Coolpix P7700 (Nikon, Shinjuku, Japan) equipped with a 12.2-megapixel CMOS sensor (4000 × 3000 pixels), was fitted with a 6-mm lens, giving a ground spatial resolution of 0.03 m at a flight height of 100 m.
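The stated ground resolution can be cross-checked with the usual pinhole-camera relation between pixel pitch, focal length and flight height. The following is a minimal sketch only: the sensor width of 7.44 mm is an assumption (a nominal figure for a 1/1.7-inch sensor, not given in the text), and the function name is hypothetical.

```python
def ground_sample_distance(sensor_width_m, image_width_px, focal_length_m, height_m):
    """Ground size (in metres) of one pixel for a nadir-pointing pinhole camera."""
    pixel_pitch_m = sensor_width_m / image_width_px  # physical size of one pixel
    return pixel_pitch_m * height_m / focal_length_m


# ASSUMPTION: 7.44 mm sensor width (nominal 1/1.7-inch format), not stated in the text.
# Image width, focal length and flight height are the values reported above.
gsd = ground_sample_distance(7.44e-3, 4000, 6e-3, 100.0)
print(f"ground sample distance: {gsd:.3f} m")
```

Under this assumption the result is close to the 0.03 m reported for the survey.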
A set of sixty 10-cm ground control points was placed in the vineyard and georeferenced using a Leica GS09 dGPS (Leica Geosystems A.G., Heerbrugg, Switzerland) with a 3D resolution of 2 cm. Missing plants in the vineyard appear either as an altogether missing vine or as a failed (dry) reimplant (Figure 2).
The set of acquired images was mosaicked with Autopano Giga 3.5 software (Kolor SARL, Challes-les-Eaux, France) and then georeferenced and orthorectified with QGIS software (Quantum GIS, 2011), integrating the information provided by the set of ground-referenced points. Finally, the position of voids along the trellis system was recorded in situ with dGPS accuracy, for a total of 211 missing plants.
The method for individual vine detection in aerial RGB images (Figure 3(a)), herein introduced, can be organized in two main steps:
• The detection and location of vine rows in a vineyard aerial orthophoto, producing a binary mask in which only pixels representing vine rows are selected, and providing the endpoint coordinates of each row;
• The identification of individual plants along each vine row by means of vine canopy centroids, applying a nearest neighbour geometrical segmentation procedure.
This methodology is then applied to the missing plant detection problem. From each plant, area and shape are extracted and a set of parameters describing the plant's main features is calculated. A detection method based on a generalized linear model (GLM) is then implemented, and its performance is tested by matching estimated missing plant positions with real observations in a case-study vineyard.

Vine row detection
The detection of vine rows in an aerial image of a vineyard is carried out using the method proposed in Comba et al. (2015). Firstly, through a dynamic-window segmentation procedure, a binary image is produced in which clusters of interconnected pixels mainly represent vine rows. Since, at this step of the procedure, a single cluster can still represent a group of adjacent vine rows, each cluster of pixels is projected into the Hough parameter space to determine the set of lines that gives the best fit and, where necessary, to split a single cluster into several sub-clusters. Finally, using a total least squares approximation, the set of straight lines representing the optimal estimation of each vine row alignment is calculated, discarding any remaining cluster of pixels representing non-vine vegetation. The output of this process is a binary mask image (Figure 3(b)), of the same size as the preprocessed one, in which pixels representing vine canopies, well distinguished from all the others considered as background, are collected in clusters $C_i$, $i = 1, \ldots, n$, one for each of the $n$ vine rows. The coordinates of the two endpoints $A_i$ and $B_i$ of the line representing the optimal estimation of the $i$-th vine row are expressed as:
$$A_i = (\lambda_{A_i}, \varphi_{A_i}), \qquad B_i = (\lambda_{B_i}, \varphi_{B_i})$$
where $\lambda_{A_i}, \varphi_{A_i}, \lambda_{B_i}, \varphi_{B_i}$ are the longitude and latitude of the ending points of the $i$-th vine row.
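The final fitting step can be illustrated with a minimal sketch of a total least squares line fit, which minimises orthogonal rather than vertical distances and therefore also behaves well for near-vertical rows. This is one common closed-form formulation (the principal axis of the point scatter), not necessarily the implementation of Comba et al. (2015); the function name and the planar pixel coordinates are assumptions.

```python
import math


def total_least_squares_line(points):
    """Fit a line to (x, y) points by minimising orthogonal distances.

    Returns the centroid the line passes through and the direction angle
    (radians, measured from the x axis) of the principal axis.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Orientation of the major axis of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), theta


# Hypothetical pixels of a north-south row: x nearly constant, y increasing.
centroid, angle = total_least_squares_line([(5.0, 0.0), (5.1, 1.0), (4.9, 2.0), (5.0, 3.0)])
print(centroid, math.degrees(angle))
```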

Identification of individual plants
The calculation of the trunk positions along a vine row can now be performed using the position of the vine rows, provided by the previous step, and the distance between the vines. Owing to the small extent of a typical vineyard with respect to the Earth radius, the great circle distance calculation can be neglected, and thus the distance $d_i$ between the two ending points $A_i$ and $B_i$ of each row can be calculated as the Euclidean distance on an equirectangular projection (Gade, 2010; Snyder, 1997):
$$d_i = R \sqrt{\left[ (\lambda_{B_i} - \lambda_{A_i}) \cos\left( \frac{\varphi_{A_i} + \varphi_{B_i}}{2} \right) \right]^2 + (\varphi_{B_i} - \varphi_{A_i})^2}$$
where $R$ is the mean Earth radius and the angles are expressed in radians.
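As an illustration, the row length on an equirectangular projection can be computed as in the sketch below. It uses the standard form of the approximation, with the longitude difference scaled by the cosine of the mean latitude; the radius value of 6 371 000 m is a commonly used mean radius and is an assumption here, as is the function name.

```python
import math

# ASSUMPTION: commonly used mean Earth radius; the value is not given in the text.
MEAN_EARTH_RADIUS_M = 6_371_000.0


def row_length_m(lon_a, lat_a, lon_b, lat_b):
    """Distance in metres between two endpoints given in decimal degrees,
    using the equirectangular (planar) approximation."""
    lam_a, phi_a, lam_b, phi_b = map(math.radians, (lon_a, lat_a, lon_b, lat_b))
    x = (lam_b - lam_a) * math.cos(0.5 * (phi_a + phi_b))  # east-west component
    y = phi_b - phi_a                                       # north-south component
    return MEAN_EARTH_RADIUS_M * math.hypot(x, y)


# Hypothetical north-south row near the study site: 0.001 degrees of latitude.
print(round(row_length_m(10.68, 43.070, 10.68, 43.071), 1))
```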
Knowing the regular spacing $s$ between the vine trunks, the number of plants $m_i$ in the $i$-th row is the result of the integer division of $d_i$ by $s$:
$$m_i = \left\lfloor \frac{d_i}{s} \right\rfloor$$
Then, the estimated position coordinates $v_{k,i}(\lambda, \varphi)$ of the $k$-th vine along the $i$-th row (counted starting from $A_i$) are obtained by placing the trunks at regular intervals of $s$ along the segment from $A_i$ to $B_i$ (Figure 2). Since, in general, an aerial image of a vineyard shows a continuous distribution of vegetation along a vine row, the coordinates $v_{k,i}$ of the vine trunk positions calculated at this step are theoretical. Indeed, the present calculation is correct only for vineyards with a symmetrical distribution of the vegetative part with respect to the trunk. In the case of asymmetric distributions, as in cordon-trained systems, the vine trunk positions can be easily adjusted to reflect the correct curtain disposition.
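The trunk placement can be sketched as follows. The sketch works in planar coordinates in metres (an assumption; geographic coordinates would first be projected), and the `offset` parameter is hypothetical: the default of half a spacing from $A_i$ is only one possible convention, and it is also the knob that would be adjusted for asymmetric training systems.

```python
import math


def trunk_positions(a, b, spacing, offset=0.5):
    """Place trunks at regular intervals along the segment from a to b.

    a, b    : (x, y) row endpoints in metres (planar coordinates, assumed).
    spacing : regular within-row spacing s in metres.
    offset  : ASSUMPTION, position of the first trunk as a fraction of s.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    # Integer division of row length by spacing; the small tolerance guards
    # against floating-point results such as 4.0 / 0.8 = 4.999...
    m = int(math.floor(length / spacing + 1e-9))
    ux, uy = dx / length, dy / length  # unit vector from a towards b
    return [
        (a[0] + (k + offset) * spacing * ux, a[1] + (k + offset) * spacing * uy)
        for k in range(m)
    ]


# Hypothetical 4.1 m row with the 0.8 m spacing of the study vineyard.
for position in trunk_positions((0.0, 0.0), (0.0, 4.1), 0.8):
    print(position)
```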
The assignment of a portion of the canopy envelope to an individual plant is performed by dividing the cluster of pixels $C_i$ of the $i$-th vine row into a set of sub-clusters, one for each vine trunk previously determined. Every pixel $p_{j,i}$ of cluster $C_i$ is assigned to the canopy of a single plant sub-cluster $w_{k,i}$ on the basis of its Euclidean distance from the vine trunk position $v_{k,i}$, as follows:
$$w_{k,i} = \left\{ p_{j,i} \in C_i : \lVert p_{j,i} - v_{k,i} \rVert \le \lVert p_{j,i} - v_{l,i} \rVert \;\; \forall\, l \ne k \right\}, \qquad j = 1, \ldots, n_{C_i}$$
where $n_{C_i}$ is the number of pixels belonging to $C_i$.
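A minimal sketch of this nearest-neighbour assignment is given below, assuming pixels and trunk positions are expressed in the same planar coordinates. The function name is hypothetical, and ties are resolved in favour of the trunk with the lower index, a choice made here for simplicity.

```python
def assign_pixels(pixels, trunks):
    """Split the pixels of one row into per-plant sub-clusters.

    pixels : list of (x, y) canopy pixels belonging to one row cluster.
    trunks : list of (x, y) estimated trunk positions of the same row.
    Returns a dict mapping trunk index k to the list of pixels nearest to it.
    """
    clusters = {k: [] for k in range(len(trunks))}
    for p in pixels:
        # Squared Euclidean distance is enough to find the nearest trunk.
        nearest = min(
            range(len(trunks)),
            key=lambda k: (p[0] - trunks[k][0]) ** 2 + (p[1] - trunks[k][1]) ** 2,
        )
        clusters[nearest].append(p)
    return clusters


trunks = [(0.0, 0.4), (0.0, 1.2)]
pixels = [(0.1, 0.3), (-0.1, 0.7), (0.0, 1.0), (0.2, 1.5)]
for k, members in assign_pixels(pixels, trunks).items():
    print(k, len(members), members)
```

A trunk whose sub-cluster stays empty is kept in the result with an empty list, so that every estimated plant position still has an entry for the later feature extraction.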

Missing plant detection
One application of the new information made available by single plant identification, as obtained from the proposed extraction procedure, is the missing plant detection task. A first rough approach to the problem of detecting missing plants could be the selection of plants having a canopy described by a small, or even null, number of pixels, adopting a simple threshold $h$ on the cardinality of $w_{k,i}$, that is,
$$\text{missing plant at row } i, \text{ vine } k \iff \mathrm{card}(w_{k,i}) < h \tag{7}$$
This method, however, is a poor discriminating tool, since vine shoots often tend to grow towards and inside the voids left by missing plants, partially filling them (see Figure 4). Moreover, the intrinsic spatial variability of vegetative vigour, typical of vineyards, may produce a huge range of foliage densities and canopy extents within the same vineyard, and vines from low-vigour zones may present fewer curtain pixels than missing plants from high-vigour ones.
Starting from the virtual shapes $w_{k,i}$, a machine learning procedure has been adopted to properly discern between the presence and absence of a plant along a row. The required statistical classifier should be based on a set of canopy descriptors wider than the canopy surface measurement alone. In order to select the proper set of variables, a GLM is adopted, where the dependent (classification) variable is the probability that a sub-cluster of pixels $w_{k,i}$ represents a missing plant. In more detail, a Binary Multivariate-Logistic Regression (BMLR) model (Glonek & McCullagh, 1995) was selected, which uses the logit link function, that is, the logarithm of the odds ratio (Hogland, Billor, & Anderson, 2013).
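The role of the logit link can be made concrete with a short sketch: the linear combination of the descriptors is the log-odds of a plant being missing, and the inverse link maps it back to a probability. The coefficient values and feature vector below are purely illustrative and are not fitted values from this study; the function names are hypothetical.

```python
import math


def missing_probability(features, intercept, coefficients):
    """Inverse logit of the linear predictor: probability that a sub-cluster
    represents a missing plant, given its descriptor values."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Logit link: natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Illustrative numbers only (not estimates from the paper).
p = missing_probability([1.0, 2.0], intercept=-0.5, coefficients=[0.3, -0.2])
print(round(p, 3), round(logit(p), 3))  # logit(p) recovers the linear predictor
```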
The proposed set of potential descriptive variables to be evaluated, in order to improve the robustness of the classifier, is described below and summarized in Table 1 for the sake of clarity.
The first parameter (here named Area) that can be derived from the segmentation procedure is the number of pixels of each detected sub-cluster $w_{k,i}$, potentially representing the projection onto the ground of the canopy area of each individual plant. As noted above, although the area alone cannot discriminate the presence or absence of vines, it is highly probable that a missing plant causes a decrease in canopy area. The next parameters are perimeter P and roughness R. P is the pixel count of the boundary of the single vine's area, while R is a measure of its "compactness", calculated as in Tang and Tian (2008). These two variables describe the shape of the single vine and account for possibly different patterns of canopy cover in the case of shoots and foliage originating from neighbouring plants.
As pointed out in the introduction, plants with similar vigour, and thus similar row width, are not randomly distributed in the vineyard but tend to be grouped in vigour patterns. In order to treat this phenomenon statistically, two other informative parameters are considered. I is the value of the local Moran index (Anselin, 1995), calculated from plant area and position inside the vineyard. Since it measures deviation from local area averages, the analysis of negative outliers in its values gives a measure of spatial association for the area parameter, as suggested in Filzmoser, Ruiz-Gazen, and Thomas-Agnan (2014). L is defined as the residual of a non-parametric locally weighted regression (LOESS) fit of the plant areas along every row, following the methodology described in Cleveland and Devlin (1988). Since a larger residual is associated with an outlier in the sequence of plant areas, its rationale is the same as that of I, this time emphasizing the role of the vine row.
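To illustrate how the local Moran index flags a plant whose area departs from that of its neighbours, the sketch below computes it in its usual form, $I_i = (z_i / m_2) \sum_j w_{ij} z_j$, where $z_i$ is the deviation of the area from the global mean and $m_2$ is the mean squared deviation. The neighbourhood definition is an assumption: binary, row-standardised weights over an explicit neighbour list, which is not necessarily the weighting used in this study.

```python
def local_moran(values, neighbours):
    """Local Moran index for each observation.

    values     : list of plant areas (pixel counts).
    neighbours : neighbours[i] is the list of indices adjacent to plant i
                 (ASSUMPTION: binary weights, row-standardised).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # mean squared deviation
    index = []
    for i, nbrs in enumerate(neighbours):
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        index.append(z[i] * lag / m2)
    return index


# Hypothetical row of five plants; the third has a much smaller canopy area.
areas = [10, 10, 2, 10, 10]
adjacent = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(local_moran(areas, adjacent))
```

In this toy row, the small plant surrounded by larger ones receives a clearly negative value, which is the kind of negative outlier referred to above.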
Finally, the last parameter considered is B, a geometrical parameter measuring the shortest Euclidean distance of the vine trunk from the vineyard boundary. B takes into account potential boundary effects of vine position on plant missingness, since plants near the boundary experience different microclimate conditions from those inside the vineyard (Matese et al., 2014). All statistical procedures presented in this work were carried out in the R environment (R Core Team, 2015) using the R packages Deducer (Fellows, 2012) and ROCR (Sing, Sander, Beerenwinkel, & Lengauer, 2005).
The significance of the variables for the assessment of the absence of a single plant has been investigated by means of the Wald statistic, as defined in Wasserman (2006). The parameters of the BMLR model were estimated by maximum likelihood, yielding a tool able to provide a probabilistic description of the state of the dependent (classification) variable for each single cluster $w_{k,i}$ (Hosmer & Lemeshow, 2004). From the comparison of collected field data with the model estimates, a confusion matrix and the receiver operating characteristic (ROC) plot were built (Fawcett, 2006). Given the probabilistic nature of the recognition, the ROC plot reports sensitivity versus specificity for the possible cut-off values of the classification probability. The model performance was then evaluated by analysing the area under the ROC curve (AUC). ROC plots and AUC scores are suitable tools to evaluate not only the strength of the classifier but also the validity of the parameters used, which is one of the goals of this work. More information on the use of ROC in statistics is available in Mason and Graham (2002).
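The AUC has a direct probabilistic reading: it is the fraction of (missing, present) pairs in which the missing plant receives the higher predicted probability, with ties counted as one half. The sketch below computes it in that pairwise form. It is a plain illustration with hypothetical scores, not the ROCR routine used in this work.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores : predicted probabilities of being a missing plant.
    labels : 1 for a true missing plant, 0 for a present plant.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: one void is ranked below one present plant.
print(auc([0.9, 0.4, 0.3, 0.6, 0.2], [1, 1, 0, 0, 0]))
```

The pairwise form is quadratic in the number of plants, which is acceptable for a parcel of a few thousand vines; rank-based formulas give the same value more efficiently.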

Results and discussion
The proposed image processing workflow has been applied to the mosaicked, georeferenced vineyard sample image (Figure 3(a)), successively defining the binary mask of the vine rows, computing the set of trunk positions and extracting the canopy shape of each individual plant. For every plant, the whole set of parameters to be used in the model was then calculated.
The vine row mask obtained from the RGB vineyard image is shown in Figure 3(b). The algorithm is able to differentiate vines from the inter-row even in the presence of grass; in the present case, owing to the clear separation of soil and vegetation pixels in the original image, a very precise extraction of the vine mask has been obtained. All clusters of pixels representing vine rows are clearly distinguishable, and the masking process confirms the ground evidence that, apart from very rare cases, the vine rows appear as continuous objects, hiding the possible absence of a vine underneath.
The extraction of the individual plants was then carried out. From every row, the number of vine trunks and their coordinates were calculated, for a total count of 2242 vines; with the 211 voids recorded in the field, this corresponds to a missing plant incidence of 9.4%.
The feasibility of determining voids by adopting a simple threshold $h$ (Equation (7)) on the vine canopy area has been tested for completeness. Setting $h$ to the value that yields a number of missing plants equal to the quantity measured in the field (211), only 44.9% of the void positions were correctly recognized, showing that the measurement of plant canopy area alone cannot reliably discriminate plant missingness.
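This baseline test can be sketched as follows: the threshold is chosen so that the number of flagged plants matches the known number of voids, and the fraction of true voids among the flagged positions is then measured. Function names and data are hypothetical, and the sketch assumes the requested count is smaller than the number of plants; with tied areas, fewer plants than requested may be flagged.

```python
def threshold_for_count(areas, n_missing):
    """Threshold h such that 'area < h' flags the n_missing smallest areas.

    ASSUMPTION: n_missing < len(areas).
    """
    return sorted(areas)[n_missing]


def recognised_fraction(areas, is_void, h):
    """Fraction of true voids whose area falls below the threshold h."""
    flagged_voids = sum(1 for a, void in zip(areas, is_void) if void and a < h)
    return flagged_voids / sum(is_void)


# Hypothetical row: the second plant is a void partly filled by its neighbours.
areas = [120, 60, 90, 0, 40, 110]
is_void = [False, True, False, True, False, False]
h = threshold_for_count(areas, n_missing=2)
print(h, recognised_fraction(areas, is_void, h))
```

In this toy row the filled void keeps a larger area than a weak but living plant, so only half of the voids are recognised, mirroring the failure mode described above.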
A stepwise selection inside the GLM has been used to evaluate the significance of each selected parameter in discriminating between plant presence and absence along the vine rows. The significant parameters are presented in Table 2. As expected, area has a strong influence on the predictive power of the model, and the roughness parameter R is also clearly significant. What is interesting, in this case-study vineyard, is the relative significance of I and L. Since Moran's I is sensitive to local spatial effects and L measures these effects along the row, the stronger influence of L on the predictive power of the model suggests that the causes of plant missingness propagate along the rows (as in the case of treatments and common vineyard management) rather than arising from spatial effects (e.g. pedological qualities). Of course, while in this case study some parameters do not show a significant influence in the model, they could be important in other applications, and only a deeper study of other real cases and a model optimization step could prove their real importance.
From the possible sets of parameters to be included in the BMLR, the two that maximize the likelihood function are groups A1 (Area, R, L) and A3 (Area, R, I). The ROC curve in Figure 5 summarizes classifier performance over a range of trade-offs between true positive (TP) and false positive (FP) error rates. The AUC gives a measure of the recognition power of the model. For model A1, it can be interpreted to mean that a randomly selected plant from the missing = 1 group has a test value larger than that of a randomly chosen individual from the missing = 0 group in 95% of cases. Table 3 presents the confusion matrices for models A1 and A3, for plant and void TPs and FPs. Owing to the probabilistic nature of the models, different threshold cut-offs produce a variation in the TP/FP ratios for both plants and voids. Lower thresholds discriminate the voids better but produce a consequent increase in plant FPs (voids are correctly recognized, but more plants are considered voids), while higher thresholds recognize plant presence better but fail to detect voids correctly.
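The effect of the cut-off can be illustrated with a small sketch that counts the confusion matrix entries at a given probability threshold. The probabilities and labels are hypothetical, and "positive" here means "classified as void".

```python
def confusion_counts(probabilities, is_void, cutoff):
    """Confusion matrix entries for void detection at a probability cut-off."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, void in zip(probabilities, is_void):
        predicted_void = p >= cutoff
        if predicted_void and void:
            counts["TP"] += 1   # void correctly recognised
        elif predicted_void and not void:
            counts["FP"] += 1   # present plant classified as void
        elif void:
            counts["FN"] += 1   # void missed
        else:
            counts["TN"] += 1   # present plant correctly recognised
    return counts


probabilities = [0.9, 0.6, 0.4, 0.35, 0.1]
is_void = [True, False, True, False, False]
for cutoff in (0.5, 0.3):
    print(cutoff, confusion_counts(probabilities, is_void, cutoff))
```

In this toy example, lowering the cut-off from 0.5 to 0.3 recovers the second void but also classifies one more present plant as a void.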

Conclusions
The work presented in this paper has shown how useful information for vineyard modelling can be profitably extracted from standard RGB images obtained by UAV imaging, by developing proper image processing algorithms. In particular, the specific application of missing plant detection has been discussed, but the proposed method can be adapted to address other classes of problems, such as the detection of plant pathologies in the rows and high-precision vigour zoning.
The main feature of the proposed method is the delineation of a set of virtual shapes, through the definition of exclusive pixel clusters within the vineyard aerial image, which can be assumed to represent the vegetation canopy of each individual vine plant. A set of descriptors, derived from the characteristics of each plant shape, can be used for the detection of missing plants and, in general, for the desired recognition model. Indeed, passing from continuous images to a discrete set of individual plants results in a crucial simplification of the statistical investigation of the problem. It has to be pointed out that some constraints and criticalities still remain, even when adopting the proposed method. For instance, this method can be applied only to cases in which the spacing between plants and between rows is constant; this is not so critical, because in almost all cases this layout is the most convenient for vineyard installation. In particular, this is generally true in modern vineyards, where planting is performed automatically and mechanically, but an a priori check of this regularity should be made in any case before applying the segmentation method. The choice of the specific descriptor (or combination of descriptors) to be used in this operation is a crucial point. Since the aim of this work was to explore the viability of applying statistical modelling to a set of automated object recognition processes, all possible information from the extracted objects was considered, showing that this course of reasoning is a viable one. Of course, the robustness and reliability of the method will be substantially improved by increasing the available data, for example by integrating multispectral imaging and the calculation of NDVI maps, which would permit the extraction of the vigour and health status of single plants.
(Table 2 footnote. Significance codes: 0 "***", 0.001 "**", 0.01 "*", 0.05 ".")
In the specific case of the detection of missing plants, the method described in this paper brings out several complex factors that can prove useful in the management of vineyards. For instance, one could discriminate between situations in which missingness is uniformly (isotropically) distributed in the vineyard and cases in which it follows definite patterns (e.g. its occurrence seems to propagate along the rows, or it presents "hot spots"). Of course, this information can be used to infer the causes of the missingness.
With the possibility of detecting and removing missing plants in high-resolution aerial maps, this approach will in the future make it possible to solve some of the problems of correctly representing vigour zones inside vineyards. This is especially true in situations where the incidence of missing plants may alter all the results that, like vigour maps, rely heavily on averages and interpolations. Moreover, individual plant representation will open the way to a new course in precision viticulture, with the introduction of plant-specific applications.