Estimating tree heights with images from an unmanned aerial vehicle

ABSTRACT Unmanned aerial vehicles (UAVs) have been used in a variety of fields in the last decade. In forestry, they have been used to estimate tree heights and crowns with different sensors. This approach, with a consumer-grade onboard camera, is becoming popular because it is cheaper and faster than traditional photogrammetric methods and UAV-based light detection and ranging systems (UAV-LiDAR). In this study, UAV-based imagery reconstruction, processing, and local maximum filter methods are used to obtain individual tree heights from a coniferous urban forest. A low-cost onboard camera and a UAV with a 96-cm wingspan made it possible to acquire high resolution aerial images (6.41 cm average ground sampling distance), ortho-images, digital elevation models, and point clouds in one flight. The canopy height model, obtained by subtracting the digital terrain model from the digital surface model, was filtered locally based on the pixel-based window size using the provided algorithm. For accuracy assessment, ground-based tree height measurements were made. The estimated and measured heights had a correlation coefficient of 0.94 and a root-mean-square error of 28 cm. This study highlights the accuracy of the method, which compares favourably to more expensive methods.


Introduction
Calculating the canopy and individual tree heights of a forest with remote sensing is highly accurate and can be done with less time and cost than traditional approaches. Airborne light detection and ranging (LiDAR) is the most commonly used system for deriving metrics from a forest area. Several studies on the use of airborne LiDAR platforms in forest areas have shown accurate results (Breindenbach et al. 2010; Gleason & Im 2012), as have studies with unmanned aerial vehicle LiDAR (UAV-LiDAR) platforms (Wallace et al. 2014a; Wallace et al. 2014b) and even with spaceborne LiDAR platforms (Selkowitz et al. 2012). However, short flight sessions and the high cost of these surveys with experienced personnel prevent continuous studies (Wallace et al. 2012; Zarco-Tejada et al. 2014). There have also been studies with satellite images (Gougeon & Lecke 2006; Takahashi et al. 2012). Depending on forest structure and the spatial resolution of the satellite images, the results are less precise but useful for large areas.
In recent decades, UAVs equipped with consumer-grade cameras have provided the most convenient approaches for inventory, monitoring, and modelling applications (Selkowitz et al. 2012; Zarco-Tejada et al. 2014). Lightweight UAV platforms (<2 kg) can fly longer than airborne LiDAR and UAV-LiDAR platforms, which helps reduce survey costs. St-Onge et al. (2015) compared airborne LiDAR and UAV photogrammetric clouds. Lisein et al. (2013) compared a hybrid photo/LiDAR canopy height model (CHM) construction to LiDAR and forest inventory data. They claimed that UAV flights with short time intervals can refresh LiDAR CHMs in order to create canopy height series. Lisein et al. (2015) reported classification of deciduous trees with UAV imagery with high classification accuracy. Zhang et al. (2016) showed that long-term ecological monitoring can be effectively done by UAV systems with high resolution data. Puliti et al. (2015) reported how small forest inventories can be created by UAV systems and provided highly accurate results.
CONTACT Anıl Can Birdal anilcanbirdal@gmail.com
In order to estimate individual tree metrics, high resolution digital elevation models (DEMs) and photogrammetric point clouds must be generated to create CHMs. With photogrammetric point clouds, virtual tree models can be generated. Classification of these point clouds based on their geometric characteristics can prove useful in avoiding detection errors and in interpolating the terrain beneath forest structures. However, generating only a few points from the ground surface in dense forest may be problematic when interpolating the terrain (Wallace et al. 2014b). Therefore, forest structure types become significant when it comes to detecting tree crowns or real tree heights (Vauhkonen et al. 2010).
Miniaturized UAV payloads with consumer-grade cameras, global positioning system devices, and embedded computer systems provide images with geometrical deformations and are generally of poorer quality than traditional metric systems used on airborne platforms (Berni et al. 2009; Zarco-Tejada et al. 2014). Several methodologies like structure-from-motion (SfM) and multi-view stereo must be applied to correct for these issues (Strecha et al. 2008; Küng et al. 2011; Remondino et al. 2011; James & Robson 2012; Vallet et al. 2012; Fritz et al. 2013; Zarco-Tejada et al. 2014). Traditionally, airborne photogrammetric acquisition of images has been used to obtain canopy heights (Waser et al. 2008; Wallerman et al. 2012) with on-board digital mapping cameras. These surveys produce promising data, but with high time and cost requirements.
In this study, we focus on a low-cost UAV system to derive canopy height and individual tree height and crown data. A consumer-grade RGB camera on a lightweight UAV (<0.70 kg) was used to generate ortho-images, which were then used to construct a digital surface model (DSM) and digital terrain model (DTM) of the study area. Subtracting the DTM from the DSM gave a model of real heights above the ground, known as the CHM, which was filtered with the local maximum filter algorithm to obtain individual crown points and tree heights. Afterwards, tree heights were measured in the field with a laser distance metre and compared with the estimated heights.

UAV workflow
Forestry work with UAV platforms started in the 2000s with model planes and helicopters (Hongoh et al. 2001; Hunt et al. 2005; Fritz et al. 2013). Recently they have been used with consumer-grade cameras and low-cost systems in order to make surveys more efficient. A workflow of how to generate individual tree heights and locations is shown in Figure 1.
In the 'Ground Exploratory' stage, which took about half a day, a suitable take-off and landing area was located to reduce potential damage to the UAV platform. Also, types of trees were identified to gather more information about their growing classes and foliage structure. In the 'Flight Planning' stage, preliminary parameters of the UAV flight, which are explained below, were evaluated based on the size of the test area, wind velocity, atmospheric conditions, etc. During the 'Flight Session' stage, only 28 minutes were spent flying the UAV; ground control point (GCP) measurements took approximately 1 hour, so the preliminary field survey took less than 1.5 hours in total.
The 'Ground Measurements' stage was the most tiring and time-consuming part of this study. In total, 53 trees were measured. Trees were selected for measurement based on how clearly they could be identified in nadir imagery. A whole day was spent on this stage with only one person working, including measuring and recording in the field with pen and paper and entering data into a computer. In the stage 'Generating Ortho-Images, DSM and Point Clouds,' a semi-automated process was conducted. The only manual part was marking the GCPs. This stage took only half a day with a high-end computer; a slower computer would increase the time.
In the stage 'Creating Canopy Height Model,' point cloud data was reconstructed to obtain above ground level (AGL) heights of the test area. Then, a raster was created based on AGL point cloud data, which is called CHM. Ground training data were used in a significant manner for this stage. 'Local Maximum Filtering' was part of the filtering process of the CHM. This stage only took about ten minutes, but more data would take longer. The stage 'Validation with Ground Measurements' was the second-most time consuming part of the study, as it involved matching ground-measured and algorithm-obtained tree heights, not only by locations but also with pen and paper. Matched heights were then compared statistically. This stage took approximately one day.
In total, less than three days were spent to obtain individual tree height of the test area with the validation process. This amount of time would vary with the size of the area, the processing speed of the computer, and the experience of the personnel.

UAV platform
A lightweight UAV platform (eBee), which is developed by senseFly, a Parrot company, was used throughout this study. The eBee is a fixed-wing UAV that weighs less than 0.70 kg, including the camera, and has a wingspan of 96 cm. Its cruising speed ranges from 40 to 90 km/h, which makes it suitable for mapping up to 12 km² (1200 ha) with a maximum flight time of 50 minutes. The camera was a Canon IXUS 127 HS with a 4608 × 3456 pixel detector that captured images at f/2.7 and 1/2000 s.

Study area and flight session
The study area was a manmade forest called the Urban Forest of Eskişehir City, Turkey (Figure 2). The forest covers approximately 15 ha, of which we studied roughly 1 ha. The main reason behind using a small test area was to work on a seamless ortho-image. Mini UAVs do not have good wind resistance, so some aerial images were not taken in smooth conditions. Therefore, the test area was selected based on the seamless parts of the ortho-image. The day and time of the flight session were selected based on low wind speed over the study area. Mostly black and Scots pines, which are coniferous trees, were planted in the area around 1960 (Figure 3). The majority of these trees were not mature trees but trees in a shrub layer. According to Bersier and Meyer (1994), the shrub layer is considered lower than 5 m in height. During the 28-minute flight, 133 images were taken from 150 m above ground level (AGL) with a 6.41 cm ground sampling distance (GSD). Aerial images were stored on a memory card in the camera. Communication with the ground control unit was via a 2.4 GHz radio link and a Universal Serial Bus (USB) computer connection. A hand launch system was used at the beginning of the flight. High overlapping parameters were used between each image: 80% forward overlap and 70% side overlap. Six three-dimensional (3D) GCPs were used, which were obtained with a Global Navigation Satellite System (GNSS) with the Real Time Kinematic (RTK) technique. Point clouds, ortho-images, and DSMs were produced from the images.
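The flight parameters above imply a rough image footprint and exposure spacing, which can be sketched as follows. This is only an illustration: it treats the reported average GSD as constant over the image, assumes nadir images over flat terrain, and assumes the short image side points along the flight direction. The function names are hypothetical and are not part of the flight planning software used in the study.

```python
# Sketch: ground footprint and exposure spacing from GSD and overlap.
# Values marked "from the text" are reported in the paper; everything
# else (function names, image orientation) is an illustrative assumption.

SENSOR_PX = (4608, 3456)   # image width and height in pixels (from the text)
GSD_M = 0.0641             # average ground sampling distance in metres (from the text)
FORWARD_OVERLAP = 0.80     # from the text
SIDE_OVERLAP = 0.70        # from the text


def footprint_m(sensor_px, gsd_m):
    """Ground footprint (width, height) in metres of one nadir image."""
    return sensor_px[0] * gsd_m, sensor_px[1] * gsd_m


def spacing_m(footprint_len_m, overlap):
    """Distance between neighbouring exposures (or flight lines) for a given overlap."""
    return footprint_len_m * (1.0 - overlap)


width_m, height_m = footprint_m(SENSOR_PX, GSD_M)
# Assumption: the short image side is aligned with the flight direction.
base_m = spacing_m(height_m, FORWARD_OVERLAP)   # distance between exposures
line_m = spacing_m(width_m, SIDE_OVERLAP)       # distance between flight lines

print(f"footprint: {width_m:.1f} m x {height_m:.1f} m")
print(f"exposure spacing: {base_m:.1f} m, flight line spacing: {line_m:.1f} m")
```

Under these assumptions each image covers roughly 295 m by 222 m on the ground, so most of the 1 ha test area appears in many overlapping images.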

Field measurements for tree height validation
In total, 53 trees were measured in the field and compared to the heights estimated from imagery. Measurements were taken with a laser distance metre (±1 mm) (Figure 4) and the trees' locations were recorded with a GNSS. A laser distance metre was used because of its low cost and working speed. Although there are more accurate devices like laser scanners or total stations, the purpose of this study is to demonstrate the low cost of this methodology, which makes a laser distance metre more suitable. Devices like laser scanners and total stations may prove difficult to use due to their size and the need to find a suitable place to set up in a dense forest area. A tripod (used at a maximum of 1 m above ground) (URL1) with a precision measuring apparatus was used with the laser distance metre in order to get the best results. Though it is very challenging to get the precise height values of the trees, the expected error of the laser light (±1 mm) proved effective in detecting the highest tree tops. Another approach is reported by Zarco-Tejada et al. (2014). They used a GNSS device in RTK mode and placed the antenna at the tree top in order to produce the height values. In their study area, tree heights ranged from 1.16 to 4.38 m, which made it easier to use a GNSS device. In our study area, some trees were taller than 5 m, which would have made it difficult to use a GNSS device. Ground measurements were made the day after the flight. Measurements were made from the bottom, where the tree meets the ground, to the tree top, which rises above other branches. In dense forest areas, a ladder set against another tree was used to spot the crown of a tree if it could not be seen from the ground. Then the laser distance metre was placed on the ground with a clear line of sight to the tree top before measuring. Collected heights were recorded with pen and paper.
Because the GNSS receiver did not work in areas with dense foliage, trees selected for validation were near clearings, which could also be more easily measured. The GNSS recordings were not the precise position of each tree, as the algorithm for estimating positions could not match the locations of tree trunks, but they were easily paired with the correct tree in the processing stage.

Generation of ortho-images, DSM, and point clouds
Image processing started with geotagging flight information and camera parameters for each image. Geotagging adds geographical locations to each image. These metadata are written to the exchangeable image file format (EXIF) header, which contains the coordinates and parameters of the camera. Ground control information based on GCPs was created as a text file. This text file contained the names and coordinates of each GCP. All images were checked for the presence of GCPs, which were then marked and matched with the related pixel coordinates. Six GCPs were used with a mean error of 0.041 m. In a fully-automated process, all 133 images were calibrated. A total of 1,752,447 keypoints were used for the bundle block adjustment with 587,230 3D points. The mean reprojection error of the adjustment was 0.3 pixels, or approximately 2 cm. Postflight Terra 3D, powered by Pix4D (Pix4D 2014), which is developed by the Swiss Federal Institute of Technology, was used in the fully automated process. This software is based on automatically finding thousands of common points between images. Each characteristic point found in an image is called a keypoint. When two keypoints on two different images are found to be the same, they are matched keypoints. Each group of correctly matched keypoints generates one 3D point. The point cloud is a set of 3D points that is used to reconstruct the model. The position and colour information is stored for each point in the point cloud.
The resulting DSM and point cloud data are shown in Figure 5. For detailed information about creating point clouds from aerial imagery with SfM, see Schönberger et al. (2014). Throughout the study, ArcGIS (ESRI 2011) was used for visualization purposes.

Creating the canopy height model
A canopy usually refers to the upper layer of a forest that is formed by tree crowns. The CHM used in this study is the above ground height model of the forest. In order to obtain the AGL heights of the trees, we used the point cloud data to interpolate the terrain beneath the forest structure. First, point cloud data were classified as ground points or non-ground points. Points considered to be early returns were classified as non-ground points (which can be defined as forest, particularly when studying forest areas with no human-made structures), while last returns were classified as ground points. A triangulated irregular network (TIN) was created based on these ground points. The most important factor in interpolating the terrain is how many points can be gathered under the foliage. Fewer and less accurate points would reduce the accuracy of the interpolated terrain. Large overlapping areas of the images help create more accurate and denser point clouds. A major limitation of this study is that the approach is only applicable in very open canopies. A more mature and denser natural forest area with a closed canopy would have fewer points and lower data resolution. The average density of point cloud data for this study was roughly 40 points per m³. LAStools software (Rapidlosso 2014), developed by Rapidlosso GmbH, was used in this process. There were no trees over 8 m in height in the test area based on the ground exploratory work, so a threshold of 8 m maximum height was used to eliminate noise in the data such as birds. The equations presented in Section 3.2 are not dependent on this threshold. Based on the TIN, each point's height was calculated (Figure 6).
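The height normalisation described above can be sketched in a few lines. This is not the LAStools implementation: it assumes the ground TIN is already available as a list of triangles (a hypothetical input format), interpolates the terrain elevation linearly inside the triangle that contains each point, and applies the 8 m threshold from the text.

```python
# Sketch: above-ground-level (AGL) heights from a ground TIN.
# Assumption: the TIN is given as triangles of three (x, y, z) ground
# vertices. Building the TIN itself is outside this sketch.

def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p in triangle a, b, c (None if degenerate)."""
    x, y = p
    x1, y1 = a[0], a[1]
    x2, y2 = b[0], b[1]
    x3, y3 = c[0], c[1]
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        return None
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def ground_z(x, y, triangles):
    """Terrain elevation at (x, y) by linear interpolation, or None outside the TIN."""
    for a, b, c in triangles:
        w = barycentric((x, y), a, b, c)
        if w is not None and min(w) >= -1e-9:
            return w[0] * a[2] + w[1] * b[2] + w[2] * c[2]
    return None


def normalize_heights(points, triangles, max_height=8.0):
    """Return (x, y, AGL height) for points inside the TIN and below max_height."""
    result = []
    for x, y, z in points:
        g = ground_z(x, y, triangles)
        if g is None:
            continue  # no terrain information under this point
        h = max(z - g, 0.0)
        if h <= max_height:  # drop noise such as birds
            result.append((x, y, h))
    return result


# Hypothetical example: a terrain rising 1 m per metre in x, two triangles.
tin = [
    ((0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 100.0)),
    ((10.0, 0.0, 110.0), (10.0, 10.0, 110.0), (0.0, 10.0, 100.0)),
]
cloud = [(2.0, 2.0, 105.5), (5.0, 5.0, 130.0), (8.0, 8.0, 109.0)]
agl = normalize_heights(cloud, tin)
```

In this toy example the second point lies 25 m above the interpolated terrain and is discarded by the threshold, while the other two points receive AGL heights of about 3.5 m and 1.0 m.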
The point cloud data from the AGL heights needed to be gridded into a raster in order to be filtered by the local maximum filter. In order to do this, we chose a step size based on the size of the trees, which would fill in a desired number of pixels. A step size of 0.3 m was appropriate for the study area and a 300 × 400 pixel raster was created. To eliminate empty pixels, points classified as first returns were replaced with a circle of a predefined radius. The largest height from the points inside the pixel was used in the gridding process, therefore only one height value was embedded within each pixel. Thus, the CHM was ready for field validation (Figure 7). Detailed information about how to generate error-free CHMs can be found in Khosravipour et al. (2014).
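The gridding step can be illustrated with a minimal sketch that keeps the largest AGL height per cell. The function name and raster layout are assumptions made for illustration (row 0 is placed at the minimum y coordinate, unlike many raster formats), and the replacement of points by small circles to fill empty pixels is left out.

```python
import math

# Sketch: grid AGL points into a CHM raster, keeping the highest value
# per cell. Empty cells stay None.

def grid_max(points, x_min, y_min, n_cols, n_rows, step=0.3):
    """Return a list-of-rows raster holding the maximum height in each cell."""
    chm = [[None] * n_cols for _ in range(n_rows)]
    for x, y, h in points:
        col = math.floor((x - x_min) / step)
        row = math.floor((y - y_min) / step)
        if 0 <= col < n_cols and 0 <= row < n_rows:
            if chm[row][col] is None or h > chm[row][col]:
                chm[row][col] = h
    return chm


# Hypothetical example with three points and the 0.3 m step from the text.
example_points = [(0.10, 0.10, 2.0), (0.20, 0.25, 3.5), (0.40, 0.10, 1.2)]
example_chm = grid_max(example_points, x_min=0.0, y_min=0.0, n_cols=2, n_rows=2)
```

The first two points fall into the same cell, so only the larger height (3.5 m) is stored there.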

Local maximum filter
The CHM was based on the highest peak of the trees in each pixel. Local maximum filters were based on the window size set by the user. This filter moves the pre-defined window over the CHM and then compares the centre cell's value with the surrounding pixels within a circular window in order to define the centre pixel as a maximum (Popescu et al. 2003; Kini & Popescu 2004; Popescu & Wynne 2004; McGaughey 2014). The algorithm uses the CHM to identify local maximums and produces a text file based result. The result can easily be imported and visualized in a Geographical Information Systems software program. Generally, the moving window is specified as 3 × 3, 5 × 5, etc. depending on the pixel size of the CHM (Niemann et al. 1998; Pinz 1998; Popescu et al. 2003). In this study, a 3 × 3 window was used, roughly 1 m², in which the algorithm searched for a maximum (based on the previously-defined 0.3 m step size). The variable sized circular window was based on the maximum height of the centre pixel within the window size defined by the user:

Deciduous: $\text{crown width (m)} = 3.09632 + 0.00895\,\mathit{ht}^{2}$ (1)

Pines: $\text{crown width (m)} = 3.75105 - 0.17919\,\mathit{ht} + 0.01241\,\mathit{ht}^{2}$ (2)

Combined: $\text{crown width (m)} = 2.51503 + 0.00901\,\mathit{ht}^{2}$ (3)

Figure 6. AGL height of the test area.
Equations 1-3 are taken from Kini & Popescu (2004) for deciduous, pines, and combined tree types, respectively. In these equations, ht represents the height of the centre pixel. This algorithm is calculated based on stand composition equations (Kini & Popescu 2004). Based on the ground height measurements, users should select their own window size in order to get the best results. In our study, Equation 3 was selected for variable window size calculation based on the ground surveys. The radius obtained from this equation was used to draw a circle, whose centre is the centre pixel of the algorithm's pre-defined window as it is defined as a local maximum. Within this circle, all the pixels' values are compared to the centre pixel in order to define it as the local maximum. During the process, FUSION/LDV (FUSION/LDV 2014) software was used. Figure 8 shows the resulting raster with the point features as individual trees and AGL heights.
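A variable-window local maximum filter of this kind can be sketched as below. This is an illustration and not the FUSION/LDV implementation: it assumes the search radius is half the crown width given by Equation 3, treats empty cells (None) as missing, ignores cells at or below a hypothetical minimum height, and keeps both cells when two neighbours have exactly the same height.

```python
# Sketch: local maximum filter with a height-dependent circular window.
# Assumptions: radius = crown width / 2; ties are not resolved.

def crown_width_m(ht):
    """Crown width in metres for the combined stand type (Equation 3)."""
    return 2.51503 + 0.00901 * ht ** 2


def local_maxima(chm, cell=0.3, min_height=0.0):
    """Return (row, col, height) for cells that are the maximum within their window."""
    n_rows, n_cols = len(chm), len(chm[0])
    tops = []
    for r in range(n_rows):
        for c in range(n_cols):
            h = chm[r][c]
            if h is None or h <= min_height:
                continue
            radius = crown_width_m(h) / 2.0
            reach = int(radius // cell)  # window half-size in cells
            is_max = True
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    if (dr * cell) ** 2 + (dc * cell) ** 2 > radius ** 2:
                        continue  # outside the circular window
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        v = chm[rr][cc]
                        if v is not None and v > h:
                            is_max = False
            if is_max:
                tops.append((r, c, h))
    return tops


# Hypothetical 20 x 20 CHM: two tree tops and one crown shoulder.
demo_chm = [[0.0] * 20 for _ in range(20)]
demo_chm[5][5] = 5.0     # tree top
demo_chm[5][6] = 4.0     # shoulder of the same crown, should be rejected
demo_chm[15][15] = 3.0   # second tree top
demo_tops = local_maxima(demo_chm)
```

For a 5 m tree, Equation 3 gives a crown width of about 2.74 m, so under the stated assumption the filter compares the centre cell with neighbours up to roughly 1.37 m away.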

Validation of estimated and measured tree heights
To validate the method, we compared two different methods of measuring tree height. The first method used a laser distance metre and the second used the algorithm. In total, 53 ground-measured heights were taken. Tree heights in the test area ranged from 1.20 m to 7.10 m.
Paired t-tests were conducted with the population means from the two methods' data. The purpose of this statistical analysis was to determine whether the mean difference between the two paired samples differed from zero (Daniel & Terrell 1995). Two hypotheses were evaluated:
H0: At a 95% confidence level, there is no statistically significant difference between ground-measured and algorithm-estimated heights ($\mu_1 - \mu_2 = 0$).
H1: At a 95% confidence level, there is a statistically significant difference between ground-measured and algorithm-estimated heights ($\mu_1 - \mu_2 \neq 0$).
The following equation was used for the paired samples t-test:

$t = \dfrac{\bar{d}}{\sqrt{s^{2}/n}}$

where $\bar{d}$ is the mean difference between the two samples, $s^{2}$ is the sample variance of the differences, $n$ is the sample size, and $t$ is the paired-sample test statistic with $n-1$ degrees of freedom. For this study's data, t = 1.166, which is lower than the t-table value of 2.006. For this reason, the null hypothesis H0 cannot be rejected. The correlation coefficient of the two data-sets was approximately 0.94 (Figure 9) and the root-mean-square error (RMSE) was 28 cm.
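The validation statistics can be computed with a short sketch like the one below. The heights in the example are invented for illustration and are not the study's measurements; the functions follow the standard definitions of the paired t statistic (mean difference divided by the square root of the variance of the differences over n), the RMSE, and the Pearson correlation coefficient.

```python
import math
from statistics import mean, variance

# Sketch: paired t statistic, RMSE and Pearson correlation for two
# equally long lists of heights. Example values are hypothetical.

def paired_t(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return mean(d) / math.sqrt(variance(d) / n), n - 1


def rmse(a, b):
    """Root-mean-square error between paired samples."""
    return math.sqrt(mean([(x - y) ** 2 for x, y in zip(a, b)]))


def pearson_r(a, b):
    """Pearson correlation coefficient of paired samples."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


ground_m = [2.0, 3.0, 4.0, 5.0]       # hypothetical field measurements
estimated_m = [2.1, 2.9, 4.2, 5.2]    # hypothetical algorithm estimates

t_value, dof = paired_t(ground_m, estimated_m)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
print(f"RMSE = {rmse(ground_m, estimated_m):.3f} m")
print(f"r = {pearson_r(ground_m, estimated_m):.3f}")
```

The absolute value of t is then compared with the two-tailed critical value for n - 1 degrees of freedom; if it is smaller, the null hypothesis of no mean difference cannot be rejected.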

Discussion
The objective of this study was to evaluate the effectiveness of UAVs in identifying tree crowns and measuring their heights for urban forest inventories. Only one previous study made quantitative validations of tree heights using UAVs with consumer-grade cameras (Zarco-Tejada et al. 2014). Zarco-Tejada et al. (2014) studied an orchard where canopy density was not as complex as in this study. Also, their ground measurement process, which obtained tree height values with a GNSS device in RTK mode, would have proven more difficult in a denser forest area like the one in this study. Their trees ranged from 1.16 m to 4.38 m, which made it easier to use a GNSS device for ground measurement. In our study area, trees ranged from roughly 1.20 m to 7.10 m, so it was difficult to use a GNSS device to measure tree heights. Hence, the present study and its algorithm for estimating AGL heights and positions should prove useful for forestry applications such as plant breeding, agronomy, plant quantification, etc. Specifically, individual tree heights could help with growth and age classification, firewood amount prediction, and probably biomass calculations. This study's main advantages are being cheaper and faster than other methods such as LiDAR, UAV-LiDAR, spaceborne LiDAR, satellite systems, and traditional photogrammetric methods. UAV-LiDAR methods, which are widely used in forestry and agricultural applications, do produce more valuable results, but due to the heavier payload, gathering the base data is time consuming. The consumer-grade camera used in this study was much cheaper than a UAV-LiDAR sensor. With spaceborne systems, there is always the issue of clouds, and generally the resolution is not optimal for individual tree height estimations. Traditional aerial photogrammetry is costly for small study areas like the one presented here. Also, permissions and authorizations, flight planning, and the need for experienced personnel make this method costly and time consuming.
Given this study's accuracy, this approach should be useful in low-height flight sessions in order to get the most out of aerial photographs. The quality of the sensor may solve the problem of low-height flying by enabling higher flights, thus allowing surveys to cover more area. Even so, raising the number of low-height flights performed by small UAV platforms would still make this method preferable to other methods. Also, GCPs are required to obtain highly accurate DEMs, which eventually produce more accurate tree positions and heights. Local tree morphologies would affect the performance of this methodology. It is strongly recommended that parameters be based on ground training data. In our test area, detection of tree tops was easy compared to denser forests. Coniferous trees located were more easily identified and tagged. This approach would cause problems in areas where the tree tops could not be identified because of overlapping trees, which generally occurs in natural forest areas. Clearings between trees are very helpful when it comes to interpolating the terrain beneath the forest structures. Not only clearings between trees, but also clearings within tree foliage could enable obtaining more terrain points closer to the stem base. Only in very open canopy structures can this produce highly accurate results. Adjusting the interpolation process based on local parameters and obtaining more terrain points under the foliage could increase accuracy.
Ground measurements were not homogeneously distributed over the study area. This is because tree foliage generally did not allow the GNSS system to work properly. While recording the heights with pen and paper, locations of trees were marked on a paper map of the ortho-image that was created before ground measurements. But this method might not work in other areas because, in a nadir perspective, locating trees on a paper map would be challenging. Hence, ground measurements were based on trees that were separated from others and could easily be identified in UAV imagery. The locations of these trees were not the stem base locations, but they could be matched with the associated estimated height point in order to do validation. If this method were used in another open canopy, users should try to get as many ground measurements as possible to validate the results.
This study shows that in manmade forests, tree position and height detection is possible through point clouds generated by image matching, thus enhancing management decisions.

Conclusions
In this study, we used a UAV and a consumer-grade camera to obtain individual tree heights in a forested area. Compared to other approaches, this method produces accurate results, has a low cost, does not require trained specialists to use the UAV or the camera system, and takes little time. In a 15 ha forest, we performed one flight session over 1 km² for 28 minutes and gathered 133 aerial images with 6.41 cm GSD. The aerial images were the basis for a CHM that was filtered with a local maximum filter algorithm. The estimated tree heights from the algorithm were validated by field measurements, with an RMSE of 28 cm. Future work should focus on different types of trees and forests where the density of the forest will present the greatest challenge. With consumer-grade infrared cameras, classification of the trees should also be possible, which would provide useful data for forest inventories. Also, smaller cameras and positioning systems would increase accuracy.
This approach could prove useful when it comes to preparing inventories for very open canopy structured forest areas, which are generally manmade, and also monitoring them at set intervals. The highly cost effective, flexible, and mobile UAV technology, in addition to fully automated photogrammetric processing, can be deployed for operational use.