Performance, effectiveness and computational efficiency of powerline extraction methods for quantifying ecosystem structure from light detection and ranging

ABSTRACT National and regional data products of the ecosystem structure derived from airborne laser scanning (ALS) surveys with Light Detection And Ranging (LiDAR) technology are essential for ecology, biodiversity, and ecosystem monitoring. However, noise such as powerlines often remains, hindering the accurate measurement of 3D ecosystem structures from LiDAR. Currently, there is a lack of studies assessing powerline noise removal in the context of generating data products of ecosystem structures from ALS point clouds. Here, we assessed the (1) performance and accuracy, (2) effectiveness, and (3) time efficiency and execution time of three powerline extraction methods (i.e. two point-based methods based on deep learning and eigenvalue decomposition, respectively, and one hybrid method) for removing powerline noise when quantifying 3D ecosystem structures in landscapes with varying canopy heights and vegetation openness. Twenty-five LiDAR metrics representing three key dimensions of the ecosystem structure (i.e. vegetation height, cover, and vertical variability) across 10 study areas in the Netherlands were used for our assessment. The deep learning method had the best performance and showed the highest accuracy of powerline removal across various landscape types (average F1 score = 96%), closely followed by the hybrid method (average F1 score = 95%). In contrast, the accuracy of the eigenvalue decomposition method was lower (average F1 score = 82%) and depended on landscape context and vegetation composition (e.g. the F1 score decreased from 96% to 63% when the average canopy height increased across landscapes). Powerline noise removal had the highest effectiveness (i.e. generating LiDAR metrics closest to those derived from manually labeled ground truth data) for LiDAR metrics capturing height and cover of low- and high-vegetation layers. Time efficiency (processed points per second) was highest for the eigenvalue decomposition method, yet the hybrid method reduced the execution time by > 50% compared to the deep learning method (ranging from 20% to 89% in study areas with different landscape composition). Based on our findings, we recommend the hybrid method for upscaling powerline removal on multi-terabyte ALS datasets to a regional or national extent because of its high accuracy and computational efficiency. Remaining misclassifications in LiDAR metrics could be further minimized by improving the training dataset for deep learning models (e.g. including various shapes of transmission towers from different datasets). Our findings provide novel insights into the performance of different powerline extraction methods, how their effectiveness varies for improving vegetation metrics and mapping the 3D ecosystem structure from LiDAR, and their computational efficiency for upscaling powerline removal in multi-terabyte ALS datasets to a national extent.


Introduction
Timely and accurate monitoring of ecosystem structures is increasingly needed to support ecological research, biodiversity policy, and habitat management (Eitel et al. 2016; Potapov et al. 2021; Wulder et al. 2012). Light Detection And Ranging (LiDAR), an active remote sensing technique, allows us to map vegetation structures from 3D point clouds in very high detail (Asner et al. 2014; LaRue et al. 2020). An increasing number of countries have incorporated airborne laser scanning (ALS) campaigns into their national monitoring programs, providing massive amounts of 3D point clouds at regional or national extents (Kissling et al. 2022; Moudrý et al. 2023; Valbuena et al. 2020). Processing and extracting information from these 3D point clouds allows users not only to map terrain properties, aboveground biomass, and forest carbon at high resolution (Huang et al. 2019; Tang et al. 2021) but also the 3D structure of ecosystems (Almeida et al. 2021; Assmann et al. 2022; Kissling et al. 2023). However, pre-classification attributes of 3D point clouds delivered by data providers often do not unambiguously differentiate vegetation from other objects, which can lead to biases and errors in derived data products of the ecosystem structure (Kissling et al. 2023). Minimizing such biases and errors when generating data products is therefore crucial for accurately quantifying ecosystem structure.
LiDAR point clouds typically come with a pre-classification that defines for each individual point to which class it belongs (i.e. which object has reflected the laser pulse), for instance, by using the standard point classes from the American Society for Photogrammetry & Remote Sensing (ASPRS 2019). However, given the complexity of object classes and structures, the pre-classification information usually only contains a limited number of classes, e.g. "ground," "building," "water," and "unclassified." Although this is sufficient for mapping terrain and buildings (i.e. the intended purpose of many national ALS flight campaigns), such classes may not have enough thematic information for ecological applications. For instance, misclassifications of ground and non-ground points can have a strong effect on quantifying vegetation structure and terrain properties (Deibe, Amor, and Doallo 2020; Simpson, Smith, and Wooster 2017). Similarly, the accuracy of vegetation mapping can be influenced by confusing vegetation with other elevated objects such as building roofs or powerlines (Morsy, Shaker, and El-Rabbany 2022). LiDAR-derived vegetation metrics (e.g. maximum or 95th percentile of vegetation height) can also be biased if classes such as the "unclassified" class (or even the pre-defined "vegetation" classes) contain points from human objects, such as ships on water and fences around buildings (Kissling et al. 2023). A typical example of human objects is powerlines, which can cause erroneously high vegetation height values in LiDAR metrics capturing the canopy layer of ecosystems (Guo et al. 2021; Roussel, Achim, and Auty 2021). Removing such noise is crucial for the accurate quantification of ecosystem structure from LiDAR point clouds.
A range of methods for powerline extraction from airborne LiDAR data exist. They can be divided into 2D grid-based or 3D point-based methods (Awrangjeb 2019; Sohn, Jwa, and Kim 2012). Two-dimensional grid-based methods first generate 2D pixels from 3D point clouds by calculating geometric features (e.g. eigenvalues, normalized height, or intensity) within a defined neighborhood. Given the linear characteristics of powerlines, the derived 2D images can then be used as input for pattern recognition algorithms (e.g. Hough transform or Radon transform) that allow powerline classification (Wang, Peethambaran, and Chen 2018; Yang and Kang 2018; Zhu and Hyyppä 2014). Some studies also apply machine learning algorithms (e.g. Random Forest and JointBoost) to classify powerlines using 2D featured images derived from 3D point clouds (e.g. Guo et al. 2015; Kim and Sohn 2013). A difficulty of the 2D grid-based methods is to separate objects (e.g. vegetation and building) that simultaneously occur with powerlines in the same grid cell. In contrast, 3D point-based methods identify powerlines by detecting elongated linear objects from the 3D point cloud directly, with the advantage that objects which overlap with powerlines (e.g. trees and shrubs) can be labeled separately and subsequently decomposed, e.g. by using their eigenvalues (Roussel et al. 2020). More recently, deep learning algorithms have been explored for 3D point cloud classification, demonstrating an impressive classification performance for LiDAR point clouds and enabling feature extraction and class labeling for each single point using neural networks (Qi et al. 2017; Wen et al. 2021; Zhao et al. 2021). However, this usually comes with long execution times, e.g. for labeling each individual point in large datasets (Li, Kahler, and Pfeifer 2021). There are also hybrid methods that integrate both 2D grid-based and 3D point-based methods to identify powerlines, aiming at decreasing execution times while retaining high classification accuracy (Awrangjeb 2019; Wang, Peethambaran, and Chen 2018). While several studies have focused on improving extraction accuracy (Jung et al. 2020; Matikainen et al. 2016), it remains mostly unexplored how different powerline extraction methods perform in the context of improving LiDAR-derived vegetation metrics.
Three main aspects are important for evaluating powerline noise removal in the context of generating national or regional data products of ecosystem structure from ALS point clouds. First, the performance and accuracy of different extraction methods might depend on the context of neighboring vegetation, on powerline characteristics, or on the presence of other objects in the landscape (Jung et al. 2020; Peng et al. 2019). Second, powerline extraction methods might show differences in their effectiveness, i.e. how well LiDAR metrics of the ecosystem structure are improved after powerline noise removal. This effectiveness could depend on the surrounding vegetation or on which aspects of the ecosystem structure are quantified (e.g. vegetation height, canopy cover, or vegetation density in different strata). Third, the computational efficiency of each method might differ. Grid-based methods can have short execution times as they are based on computationally efficient algorithms for 2D images (Awrangjeb 2019). In contrast, deep learning algorithms (e.g. Li et al. 2018; Qi et al. 2017; Thomas et al. 2019) often suffer from a high computational demand (i.e. a high time and memory consumption due to calculating features for each point from neighboring points), especially when applied to large-scale ALS datasets (Jung et al. 2020; Liu et al. 2023; Weinmann et al. 2015). Hybrid methods combine both 2D grid-based and 3D point-based methods and thus can benefit from the advantages of both methods (e.g. lower number of processed points and faster execution times). To our knowledge, there are currently no studies that provide a comprehensive evaluation of all three aspects of powerline extraction methods (i.e. performance, effectiveness, and computational efficiency) in the context of generating data products of ecosystem structure from ALS point clouds.
Here, we aim to evaluate the performance, effectiveness, and computational efficiency of three powerline extraction methods (i.e. deep learning, eigenvalue decomposition, and a hybrid method) for powerline noise removal from 25 LiDAR metrics representing vegetation height, cover, and vertical variability (Figure 1). We tested three hypotheses: (H1) deep learning shows the best accuracy compared to other methods and performs well in landscapes with different characteristics; (H2) the effectiveness of powerline extraction methods is highest for LiDAR metrics capturing vegetation height and cover, especially in landscapes with low-stature vegetation (because powerline points can then strongly bias vegetation height and cover values); and (H3) the computational efficiency of the deep learning method is low due to the high number of processed points and its execution time for labeling and calculating neighborhood features for each point, but the hybrid method (which combines 2D grid-based and 3D point-based methods) can substantially reduce the execution time due to the lower number of points being processed. Our work contributes to understanding how different powerline extraction methods perform in removing biases and noise in LiDAR-derived vegetation metrics and what their potential is for upscaling, i.e. generating data products of the ecosystem structure from multi-terabyte point clouds at a national extent.

Study areas
We selected 10 study areas (500 m × 500 m per area) in the Netherlands (Figure 2(a)) for assessing powerline noise removal. We chose those areas to represent (1) powerlines in each scene, (2) sites that are spread across the country (Figure 2(a)), (3) landscapes with different canopy heights and vegetation openness (Figure 2(b)), and (4) variation in landcover types and other vegetation and powerline characteristics (Table 1). The total number of LiDAR points per area varied between 2.7 and 5.7 million (Table 1). Powerlines were present in each study area, partially overlapping with vegetation (Figure 2(a)), and the number of powerline pixels (i.e. powerlines being present at 10 × 10 m resolution) varied from 237 to 698 across the 10 study areas (Table 1). Powerline points were not separated from vegetation points in the pre-classification of the raw LiDAR point clouds, resulting in abnormal vegetation height estimates (e.g. Z values >50 m, Table 1).

LiDAR data preparation
We used the pre-processed raw point clouds from airborne LiDAR as input into our workflow (Figure 1). These were collected during the third national ALS campaign of the Netherlands (AHN3, Actueel Hoogtebestand Nederland). The campaign was conducted between 2014 and 2019 in the leaf-off season covering the whole Netherlands (~34,000 km²). Raw point clouds are available from the AHN-viewer (https://www.ahn.nl/ahn-viewer). The average point density of AHN3 is 10-16 points/m² with an overall vertical accuracy of 5 cm. Information stored for each point contains X, Y, Z, intensity, return number, number of returns, classification, scan angle rank, source ID, and GPS timestamp. The country-wide AHN3 data were preprocessed and delivered by "Rijkswaterstaat" (the Dutch Ministry of Infrastructure and Water Management), providing the pre-classification code for each point: unclassified (1), ground (2), building (6), water (9), and others (26). In the raw point cloud, points belonging to powerlines and vegetation are all defined as unclassified (1). We created digital terrain models (DTMs) at 1 m
Table 1 (note). *Size (ha) represents 500 m × 500 m study areas. The range of raw Z values, the number of total points, and the point density were derived from raw LiDAR point clouds (AHN3). The range of normalized Z values, the height range of powerlines, the height range of vegetation, and the number of powerline points were calculated from manually labeled point clouds (ground truth). The mean values of two LiDAR metrics (95th percentile of vegetation height and pulse penetration ratio, respectively) and the number of powerline pixels were calculated from pixels at 10 × 10 m resolution (derived from manually labeled point clouds). Landcover types were interpreted through visual inspection of the point clouds.
We manually labeled every point in the raw point clouds of the 10 study areas (approximately 45 million points in total) into six categories: vegetation (1), ground (2), buildings (6), water (9), powerline (14), and others (26) (e.g. bridges, cars). This was done using the ArcGIS Pro interactive editing tool for LAS classification (see https://pro.arcgis.com/en/pro-app/latest/help/data/las-dataset/interactive-las-classcode-editing.htm). The manually labeled point clouds were then used as ground truth (Figure 1), especially to (1) characterize the powerlines and vegetation in each study area (section 2.1), (2) evaluate the performance of powerline removal methods at the point level (section 2.4), and (3) generate LiDAR-derived vegetation metrics at 10 m resolution for assessing the effectiveness of the powerline extraction methods (section 2.5).

Methods for powerline extraction
We tested two point-based methods and one hybrid method for extracting powerlines. The first point-based method (deep learning method) introduces a convolutional neural network (CNN) approach to feature learning from well-labeled point clouds and classifies the target point clouds into different classes (e.g. powerlines, vegetation, and buildings). The second point-based method (eigenvalue method) uses eigenvalue decomposition of point clouds and labels the linearly distributed points as powerlines. The third method (hybrid method) was developed here and employs a 10-m resolution GeoTIFF layer of vegetation height to subset potential candidate powerline points from the original point clouds and then uses the trained CNN algorithm to classify the candidate powerline points into defined classes.

Point-based method using deep learning
The PointCNN proposed by Li et al. (2018) is a deep learning generalization of CNNs which shares the design of hierarchical convolution on 2D CNNs (grid-based) and generalizes it to 3D point clouds. The core χ-Conv operator implemented in PointCNN is recursively applied to aggregate information from neighboring points into a subset of representative points assigning featured information (Li et al. 2018). While the output features are defined by the local (relative) coordinates of neighboring points and their associated features, they are also weighted and permuted by the χ-transformation which is jointly learned across all neighborhoods. In this way, PointCNN is also capable of handling point clouds with or without additional features in a robust and uniform fashion (Li et al. 2018).
We employed the Dayton Annotated LiDAR Earth Scan (DALES) dataset (from Canada) to train the PointCNN model and applied the trained model to the AHN3 dataset (from the Netherlands) for prediction (Figure 1). This approach provided a robust validation for testing spatial transferability (from Canada to the Netherlands) using a point cloud dataset for prediction that is independent of the training dataset and has different characteristics (e.g. different point density and other sensor type). The DALES dataset that we used for training is a large annotated point cloud dataset, which serves as an important benchmark for evaluating deep learning algorithms applied to 3D point clouds (Varney, Asari, and Graehling 2020). It consists of more than 500 million hand-labeled points obtained from ALS across multiple landscape types in British Columbia, Canada, and is currently the most extensive annotated aerial point cloud dataset that is publicly available (Varney, Asari, and Graehling 2020). The DALES dataset is delivered with separate training and testing tiles (40 tiles in total), split roughly 70/30 (i.e. 29 tiles for training and 11 tiles for testing). The training and testing scenes have a similar distribution across all labeled categories and come with a minimum and an average point density of 20 points/m² and 50 points/m², respectively. There are eight classification categories in the dataset: ground (1), vegetation (2), cars (3), trucks (4), powerlines (5), fences (6), poles (7), and buildings (8).
We first trained the PointCNN model using 29 selected DALES training tiles and validated the model using the remaining 11 DALES testing tiles. During training, we set the number of epochs to 20, and selected the model with the highest accuracy for prediction (F1 score of 88% for classifying the powerline category based on an evaluation with the DALES testing tiles). We then used the trained PointCNN model to predict the class of each point from the 10 study areas (AHN3 dataset) into the above-mentioned eight classification categories. We chose to train the model with the most common attributes of the point cloud (i.e. X, Y, and Z values) to prevent introducing unstandardized attributes (e.g. unnormalized intensity or RGB values) into the training and prediction. We performed PointCNN model training and prediction using the deep learning framework and arcgis.learn module within a Jupyter environment. General information on the PointCNN architecture and its implementation is also available from GitHub (https://github.com/yangyanli/PointCNN).

Point-based method using eigenvalue decomposition
The eigenvalue method (Figure 1) identifies clusters of points that are linearly distributed based on the three eigenvalues of each individual point using the k-nearest neighbor method (Limberger and Oliveira 2015). Linear objects (e.g. powerlines) can then be segmented based on the criterion that one principal value is significantly greater than the other two principal values,

$$\lambda_3 > th \cdot \lambda_2 \quad \text{and} \quad \lambda_3 > th \cdot \lambda_1$$

where $\lambda_1$, $\lambda_2$, $\lambda_3$ indicate the eigenvalues calculated for each point in ascending order and $th$ is a given threshold value (Limberger and Oliveira 2015). We performed the eigenvalue calculation using the function segment_shapes() provided in the lidR package (Roussel et al. 2020). Given the point density of AHN3 (average 10-16 points/m²) and the distribution pattern of powerline points, we set the threshold value (i.e. parameter th) varying from 4 to 20, and the number of neighboring points involved in the calculation (i.e. parameter k) varying from 6 to 40. Each combination of the parameters was tested, and the combination with the highest powerline detection rate was eventually selected as the final setting for each study area. A new attribute ("powerline") was then added to each point, indicating the segmentation result (powerline points = 1, others = 0).
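The linearity criterion described above can be illustrated with a minimal NumPy sketch. This is not the lidR implementation used in the study: the function name `linear_point_mask`, the brute-force neighbor search, and the default values of `k` and `th` are assumptions chosen for illustration, and the approach only scales to small point sets.

```python
import numpy as np

def linear_point_mask(xyz, k=8, th=10.0):
    """Flag points whose k-nearest-neighbor covariance has one dominant eigenvalue.

    Hypothetical helper for illustration; `k` and `th` mirror the parameters
    named in the text, but the defaults here are arbitrary.
    """
    xyz = np.asarray(xyz, dtype=float)
    # Brute-force pairwise squared distances (only suitable for small clouds).
    d2 = np.sum((xyz[:, None, :] - xyz[None, :, :]) ** 2, axis=2)
    # Indices of the k nearest neighbors of each point (the point itself included).
    nn = np.argsort(d2, axis=1)[:, :k]
    mask = np.zeros(len(xyz), dtype=bool)
    for i, idx in enumerate(nn):
        cov = np.cov(xyz[idx].T)                     # 3 x 3 covariance matrix
        lam1, lam2, lam3 = np.linalg.eigvalsh(cov)   # eigenvalues, ascending
        # Linear neighborhood: largest eigenvalue dominates the other two.
        mask[i] = (lam3 > th * lam2) and (lam3 > th * lam1)
    return mask

# Synthetic example: a straight wire at 20 m height and a compact 3D block of points.
wire = np.column_stack([np.linspace(0, 50, 60), np.zeros(60), np.full(60, 20.0)])
g = np.arange(4.0)
block = np.array([[x, y, z] for x in g for y in g for z in g]) + np.array([100.0, 100.0, 0.0])
powerline = linear_point_mask(np.vstack([wire, block]), k=8, th=10.0)
```

In this synthetic case the wire points satisfy the criterion and the block points do not; real point clouds require the per-area tuning of `th` and `k` described above.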

Hybrid method
The hybrid method combined a 2D raster layer with the PointCNN model (Figure 1). In this method, a subset of the raw point clouds (candidate points, selected by intersecting the point cloud with a 2D grid height layer) was first extracted and then used as input into the trained PointCNN model for powerline extraction. This should increase the computational efficiency while keeping extraction accuracy comparable to the deep learning method. We used the LiDAR metric 95th percentile of vegetation height (Hp95) derived from the raw AHN3 point clouds at a resolution of 10 m to generate a 2D mask for segmenting the candidate points. We chose Hp95 = 10 m as the height threshold for generating a binary mask because European regulations require at least 7-10 m of clearance beneath powerlines (Zhu and Hyyppä 2014). This suggests that high voltage powerlines should be >10 m high. This binary mask was then used to extract the subset of points for each study area. We then employed the trained PointCNN model (see section 2.3.1) to classify the candidate points into the same eight categories as classified in the DALES dataset. Similar to the deep learning method, we only used the X, Y, and Z values as input attributes for the prediction.
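The candidate selection step can be sketched as follows. This is a simplified illustration rather than the workflow used in the study: the function name `candidate_points` is hypothetical, the Z values are assumed to be normalized heights of the points that enter the Hp95 calculation, and all points in a masked cell are assumed to be kept as candidates.

```python
import numpy as np

def candidate_points(xyz, cell=10.0, height_threshold=10.0):
    """Boolean mask of points in grid cells whose Hp95 exceeds the threshold.

    Hypothetical helper; assumes xyz[:, 2] holds normalized height in metres.
    """
    xyz = np.asarray(xyz, dtype=float)
    # Assign each point to a square grid cell with side length `cell` (metres).
    col = np.floor((xyz[:, 0] - xyz[:, 0].min()) / cell).astype(int)
    row = np.floor((xyz[:, 1] - xyz[:, 1].min()) / cell).astype(int)
    cell_id = row * (col.max() + 1) + col
    keep = np.zeros(len(xyz), dtype=bool)
    for cid in np.unique(cell_id):
        in_cell = cell_id == cid
        hp95 = np.percentile(xyz[in_cell, 2], 95)   # 95th percentile of height
        if hp95 > height_threshold:                  # binary mask: Hp95 > 10 m
            keep[in_cell] = True
    return keep

# Two synthetic cells: one with 10% of its points at 25 m (e.g. a wire), one with low points only.
cell_a = np.column_stack([np.full(100, 5.0), np.full(100, 5.0),
                          np.r_[np.full(90, 1.0), np.full(10, 25.0)]])
cell_b = np.column_stack([np.full(100, 15.0), np.full(100, 5.0), np.full(100, 1.0)])
keep = candidate_points(np.vstack([cell_a, cell_b]))
```

Only the points flagged by `keep` would then be passed to the trained classifier, which is what reduces the number of processed points.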

Performance of powerline extraction methods
To test H1, we evaluated the performance of the three powerline extraction methods by calculating four accuracy measures (i.e. recall, precision, quality, and F1 score). Recall (Re) is a measure of completeness or quantity of correctly extracted powerline points. Precision (Pr) measures the exactness or quality of correctly extracted powerline points. Quality (Qu) gives an overall evaluation of the completeness and exactness of powerline extraction, and the F1 score gives the harmonic mean of Re and Pr (Yang and Kang 2018). These measures were calculated as

$$Re = \frac{TP}{TP + FN}, \qquad Pr = \frac{TP}{TP + FP}, \qquad Qu = \frac{TP}{TP + FP + FN}, \qquad F1 = \frac{2 \cdot Re \cdot Pr}{Re + Pr}$$

where TP is the number of powerline points that are correctly identified as powerlines, FN is the number of powerline points that are misidentified as others, and FP is the number of other points that are misidentified as powerlines. The manually labeled points were used as ground truth to assess correct (true) or incorrect (false) classifications. Each accuracy measure was calculated for each study area (n = 10) and compared with the average canopy height of each study area. Note that the accuracy calculations were done without the transmission towers (i.e. powerline points only) because (1) the eigenvalue method is only designed to identify linear objects (i.e. powerlines) and (2) the shape of transmission towers differs between the training dataset (DALES from Canada) and the prediction dataset (AHN3 from the Netherlands) for the PointCNN model.
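As a small worked example, the four accuracy measures can be computed from point counts as sketched below. The function name `accuracy_measures` and the counts in the usage line are hypothetical and do not come from the study.

```python
def accuracy_measures(tp, fn, fp):
    """Recall, precision, quality, and F1 score from point counts.

    tp: powerline points correctly identified as powerlines
    fn: powerline points misidentified as others
    fp: other points misidentified as powerlines
    """
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    quality = tp / (tp + fp + fn)
    f1 = 2 * recall * precision / (recall + precision)  # harmonic mean of Re and Pr
    return {"recall": recall, "precision": precision, "quality": quality, "f1": f1}

# Hypothetical counts for one study area.
scores = accuracy_measures(tp=90, fn=10, fp=5)
```

Because quality penalizes both error types in a single ratio, it is never higher than recall or precision for the same counts.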

Effectiveness of powerline removal on LiDAR metrics
To test H2, we analyzed the effectiveness of powerline removal for 25 LiDAR metrics representing different aspects of vegetation height (7 metrics), vegetation cover (11 metrics), and vertical variability of vegetation (7 metrics) (see details in Appendix Table A1). For each metric, we compared their values after applying the three powerline extraction methods with the values derived from ground truth (manually labeled point clouds) and raw point clouds (with powerline noise) (Figure 1). All metrics were calculated at a 10 m resolution for each study area using a recently developed high-throughput workflow "Laserfarm" for generating geospatial data products of ecosystem structure from ALS point clouds (Kissling et al. 2022).
Within each study area, we assessed only the pixels with powerlines, i.e. those 10-m resolution grid cells in which powerlines occurred (n = 3434 across all study areas).
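One simple way to express such a comparison numerically is sketched below. The mean absolute deviation from the ground-truth values is an assumption made here for illustration and is not necessarily the statistic used in the study; the function name and the example values are hypothetical.

```python
import numpy as np

def mean_abs_deviation(metric, metric_truth):
    """Mean absolute deviation of per-pixel metric values from ground-truth values."""
    metric = np.asarray(metric, dtype=float)
    metric_truth = np.asarray(metric_truth, dtype=float)
    return float(np.mean(np.abs(metric - metric_truth)))

# Hypothetical Hp95 values (m) for three powerline pixels.
truth = [5.0, 6.0, 7.0]        # from manually labeled point clouds
raw = [30.0, 6.0, 28.0]        # from raw point clouds (with powerline noise)
cleaned = [5.0, 6.0, 9.0]      # after applying a powerline extraction method

dev_raw = mean_abs_deviation(raw, truth)
dev_cleaned = mean_abs_deviation(cleaned, truth)
```

A smaller deviation after removal than before removal would indicate that the metric moved closer to the ground truth for those pixels.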

Computational efficiency of powerline extraction methods
To test H3, we evaluated the computational efficiency of the three powerline extraction methods (Figure 1). We estimated execution times for each study area using the Jupyter Notebook environment (for the deep learning and hybrid methods) and the R environment (for the eigenvalue method). This was done on a DELL XPS laptop with a 2.40 GHz Intel Core i9 processor and 32 GB RAM and included loading the data, running the model/method, predicting the results, and exporting the results. We further calculated the number of processed points for each method at each study area and the time efficiency (points/sec) of each method by dividing the number of processed points by the execution time.
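The two efficiency quantities can be written as a short sketch. The function names are hypothetical, and the relative reduction is assumed to be expressed as a percentage of a reference execution time; the example numbers are illustrative only.

```python
def time_efficiency(n_points, seconds):
    """Processed points per second."""
    return n_points / seconds

def execution_time_reduction(t_reference, t_method):
    """Relative reduction (%) of execution time compared to a reference method."""
    return 100.0 * (t_reference - t_method) / t_reference

# Illustrative values: 4.5 million points processed in 13 minutes,
# and a second run that takes half as long as the reference.
eff = time_efficiency(4_500_000, 13 * 60)
reduction = execution_time_reduction(780.0, 390.0)
```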

Performance of powerline extraction methods
In line with our hypothesis H1, the deep learning method generally performed best in terms of accuracy (Figure 3(a-d)). It showed a high recall (94.52% ± 4.13%), precision (98.61% ± 0.72%), quality (93.27% ± 4.10%), and F1 score (96.47% ± 2.26%) across all 10 study areas (Figure 3(a-d), Appendix Table B1). It also removed on average about 95% of all powerline points across the study areas (94.52% ± 4.12%, see Appendix Figure B1). The hybrid method also showed a very good performance, similar to the deep learning method (Figure 3(a-d)), with an equally high precision (98.93% ± 0.50%) and F1 score (95.10% ± 3.61%) and a slightly lower recall (91.76% ± 6.30%) and quality (90.84% ± 6.12%) (see Appendix Table B1). The hybrid method also removed a high proportion of powerline points across the study areas (91.76% ± 6.30%, see Appendix Figure B1). The eigenvalue method generally performed worse than the other two methods (Figure 3(a-d)), with an overall lower recall (86.58% ± 6.73%), precision (83.35% ± 16.80%), quality (71.41% ± 14.40%), and F1 score (82.48% ± 10.07%) (see Appendix Table B1). It also showed a lower proportion of removed powerline points than the other two methods (86.58% ± 6.73%), except in areas E and G where it removed more points (see Appendix Figure B1). Moreover, the eigenvalue method showed a distinct decrease in precision, quality, and F1 score in landscapes with tall canopies (areas F-J) compared to areas with low vegetation (areas A-E) (Figure 3(g)). In contrast, the deep learning and hybrid methods retained high accuracies in all study areas (Figure 3(e,f)), supporting the expectation of a good performance in areas with different landscape characteristics (H1). The only exception was study area E, where the eigenvalue method outperformed the other two methods in recall, quality, and F1 score (Figure 3(e-g)).

Effectiveness of powerline removal on LiDAR metrics
Most of the 25 LiDAR metrics were improved after removing powerline noise (Figure 4). Especially for

Computational efficiency of powerline extraction methods
In line with hypothesis H3, the computational efficiency of the deep learning method was low because of the large number of points to be processed (Figure 5). With larger data volumes and a higher number of processed points, the execution time of both the deep learning and hybrid methods strongly increased (Appendix Table B2). However, compared to the deep learning method, the hybrid method reduced the total number of processed points by almost 50% (Appendix Table B2). This resulted in a substantial reduction of the execution time (Figure 5(b)), supporting our initial expectation (H3). Compared to the deep learning method, the execution time reduction of the hybrid method ranged from 11% (in area G, densely vegetated landscape with tall canopy height) to 80% (in area C, open landscape with low canopy height) (Appendix Table B2).

Discussion
We evaluated the performance, effectiveness, and computational efficiency of three powerline removal methods for improving 25 LiDAR metrics of ecosystem structure derived from ALS point clouds. The deep learning and hybrid methods (based on the PointCNN model) provided a consistently high accuracy of powerline noise removal across study areas with different canopy heights and landscape openness, whereas the eigenvalue method had a poorer performance, but a higher time efficiency. Powerline removal was most effective in LiDAR metrics representing vegetation height as well as vegetation cover in low and upper vegetation layers. We further showed that the hybrid method can greatly reduce the execution time compared to the deep learning method, making it a computationally efficient and accurate method for upscaling powerline removal to multi-terabyte ALS datasets at a national extent.
Deep learning is rapidly transforming the fields of Earth science (Reichstein et al. 2019) and ecology (Borowiec et al. 2022), and our results confirm the high potential of deep learning applications for point cloud classification (Bello et al. 2020; Guo et al. 2021; Wen et al. 2021). By removing ~95% of the powerline points from the raw point clouds, the applied deep learning method (i.e. PointCNN) was highly successful, with an average accuracy of ~96% (across the four accuracy measures, i.e. recall, precision, quality, and F1 score). The impressive performance of the PointCNN was even achieved using an independent test by training the model with a dataset from North America (i.e. Canada) and applying it to a dataset from Europe (i.e. the Netherlands), which has different characteristics (e.g. lower point density and other sensor types). The high accuracy of the PointCNN model is similar to the accuracy of other deep learning methods, e.g. the KPConv model proposed by Thomas et al. (2019) and methods tested with UAV-based LiDAR datasets (Chen et al. 2022). Due to the linear and narrow attributes of powerlines, relatively high point densities are usually needed to achieve good results in powerline extraction (Matikainen et al. 2016). In our study, we obtained an average F1 score of 96.5% with a point density of 10-16 points/m² (AHN3 dataset). Lower point densities (e.g. 4-7 points/m²) can result in a reduced performance of deep learning methods for 3D point classification, e.g. an average F1 score of 61.5% when the PointCNN method is applied to the ISPRS benchmark dataset (Wen et al. 2021). On the other hand, deep learning methods applied to point clouds with higher point densities (e.g. >20 points/m²) typically show good performance of powerline extraction, e.g. an average F1 score of 97.1% when the DCPLD-Net method is applied to four datasets varying in point density from 22 to 120 points/m² (Chen, Lin, and Liao 2022). We therefore expect that deep learning methods for powerline extraction show good performance with ALS datasets that have point densities of 10-20 points/m² or above.
Compared to deep learning, the hybrid method also showed a high overall accuracy (average 94%), but a remarkable decrease in execution time. Other applications of hybrid methods also demonstrate a similar performance, such as the one proposed by Zhu and Hyyppä (2014), which successfully classified 93% of powerline points from forested areas in Finland. This encourages the use of hybrid methods because they combine high accuracy with a reduced execution time. In contrast, the eigenvalue method generally performed worse than the deep learning and the hybrid methods, except in relatively open landscapes with low vegetation (e.g. our study areas A-C). While eigenvalue methods can successfully remove a large number of powerline points in certain situations (e.g. McLaughlin 2006), they also require additional parameter adjustments for optimization in different settings (e.g. different landscapes where powerlines occur or different characteristics of powerlines). This impairs their generalizability (Chen et al. 2022) for accurately detecting powerlines in different landscapes (e.g. Jwa and Sohn 2012; Nardinocchi, Balsi, and Esposito 2020) and hence limits the application of eigenvalue methods for upscaling to large areas with heterogeneous landscapes.
Our results show that the effectiveness of powerline removal depends on which properties of the ecosystem structure are captured (i.e. vegetation height, cover, or vertical variability) and in which stratum (e.g. low, middle, or upper layer). For instance, vegetation height metrics improved more strongly than other metrics after removing the abnormally large Z values from the powerlines. Metrics capturing the density of low vegetation (e.g. below 1 m) and upper vegetation (e.g. canopy cover above 20 m) also showed strong improvements after powerline removal, especially when compared to other vegetation cover metrics such as vegetation density of the middle layer (e.g. 3-4 m and 4-5 m). Airborne laser scanning often has difficulties in capturing low vegetation when canopies are dense, suggesting that low strata with few vegetation points (e.g. below 1 m) are more prone to misclassifications (Assmann et al. 2022). When powerline points (usually above 10 m) are identified and removed from vegetation points in the upper vegetation layer, the calculation of vegetation cover below 1 m or above 20 m can be greatly improved. In contrast, the effect of powerline removal on vegetation cover within 4-5 m and on the pulse penetration ratio was almost negligible. Overall, our results can provide guidance on which LiDAR metrics of the 3D ecosystem structure might be most biased if powerlines are present.
Upscaling powerline extraction to a national, multiterabyte ALS dataset requires time-efficient and highly accurate methods. In the Netherlands, there are approximately 24,500 km of overhead high-voltage powerlines (110-380 kV; grid map available at: https://data.rivm.nl/apps/netkaart/). Among the three tested methods, the hybrid method was the most promising candidate for upscaling because it is highly accurate across different landscapes and simultaneously reduces the computation time compared to the deep learning method. Note that the application stage of the deep learning method is already much more efficient than its training stage: training takes on average 14 h per epoch (around 100 h in total for training on 350 million points), while in the application stage the model predicts (on average) 4.5 million points in 13 min (see Appendix Table B2). Considering the total number of points in the Dutch AHN3 data (~700 billion points) covering the whole Netherlands (~34,000 km²) (Kissling et al. 2023), we estimate that the hybrid method would need to process ~30 billion points when considering only the candidate powerline points after subsetting (i.e. applying a 10-m resolution binary mask with the 95th percentile of vegetation height >10 m). The estimated CPU time for the hybrid method (~63 days) is about 5% of the time for the deep learning method applied to the whole country (~1373 days), and only slightly more than that of the eigenvalue method (~56 days). These estimated execution times are based on a single process on a local machine. When upscaling the process to a high-performance computing (HPC) or cloud environment, the execution time could be substantially shortened by parallelization and distributed processing on multi-core CPU or multi-node GPU clusters (Kissling et al. 2022). For instance, the actual execution wall-time can be reduced to ~3 days when using a cluster of 10 virtual machines with two cores each.
Some small biases and misclassifications will remain, independent of the applied powerline extraction method (Figure 6). The eigenvalue method failed to identify powerline points with a discontinuous distribution, i.e. when large gaps between neighboring powerline points occurred (Figure 6(a)). The deep learning and hybrid methods showed similar misclassifications, e.g. in areas where powerlines crossed water (Figure 6(b)). This probably stems from the input training data (DALES dataset), which has no powerlines above water (Varney, Asari, and Graehling 2020), resulting in misclassifications in the prediction. A general challenge for the classification was the lack of identification of transmission towers (Otcenasova, Hoger, and Altus 2014) (Figure 6). Only very few transmission towers are included in the training dataset (DALES dataset), and the shape of transmission towers differs between Canada and the Netherlands (see Appendix Figure C1). A future solution could be to collect more training samples that capture various shapes of transmission towers, which could then improve the capability of the PointCNN model to identify them.
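The upscaling estimates given above for the national AHN3 dataset can be approximated with simple arithmetic. The sketch below is a back-of-envelope calculation, not the authors' procedure. It assumes a constant throughput derived from the rounded figures in the text (4.5 million points in 13 min), ideal linear scaling across workers, and that the ~3-day wall-time refers to the hybrid method; the function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60

# Throughput from the rounded figures in the text: 4.5 million points in 13 min.
POINTS_PER_MINUTE = 4.5e6 / 13


def cpu_days(n_points: float, points_per_minute: float = POINTS_PER_MINUTE) -> float:
    """Single-process CPU time in days, assuming a constant throughput."""
    return n_points / points_per_minute / MINUTES_PER_DAY


def wall_days(cpu_time_days: float, n_workers: int) -> float:
    """Wall-time in days, assuming ideal linear scaling (no overhead)."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return cpu_time_days / n_workers


deep_learning = cpu_days(700e9)  # whole AHN3 point cloud
hybrid = cpu_days(30e9)          # candidate points after subsetting

print(f"deep learning: ~{deep_learning:.0f} CPU days")  # text reports ~1373
print(f"hybrid:        ~{hybrid:.0f} CPU days")         # text reports ~63

# 10 virtual machines with two cores each, using the ~63 days from the text.
print(f"hybrid wall-time: ~{wall_days(63, 10 * 2):.1f} days")
```

With these rounded inputs, the sketch gives roughly 1400 and 60 CPU days, close to the reported ~1373 and ~63 days; the small differences presumably arise because the reported values rely on the unrounded figures in Appendix Table B2. In practice, scheduling and input/output overhead would make the wall-time somewhat longer than the ideal value.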

Conclusion
Country-wide airborne LiDAR data provide great opportunities for generating high-quality metrics of ecosystem structure across large spatial extents. However, powerlines can introduce biases and noise into LiDAR metrics of vegetation height, cover, and vertical variability. We show that deep learning models in combination with grid-based approaches (i.e. hybrid methods) can provide high accuracy and simultaneously reduce execution times compared to the deep learning method alone. This makes hybrid methods a computationally efficient and accurate approach for upscaling powerline removal to a national extent. Although the eigenvalue method generally performed worse than the deep learning and hybrid methods, it can still achieve high accuracy in relatively open landscapes with low vegetation. Powerline removal methods can largely remove abnormal Z values, especially for LiDAR metrics that capture vegetation height and vegetation cover in low and upper vegetation layers. Developing upscaling solutions in high-performance computing or cloud environments, together with additional training data, will be a crucial next step for generating high-quality metrics of ecosystem structure at regional or national extents.

Appendix B
Table B1. Performance of the three powerline extraction methods (deep learning, hybrid, eigenvalue decomposition) in each study area (A-J). Four accuracy measures are provided, namely the recall (Re), precision (Pr), quality (Qu), and F1 score. The mean (Mean) and standard deviation (SD) are calculated across the ten study areas (A-J).
Table B2. Computational efficiency of three powerline extraction methods (deep learning, hybrid, and eigenvalue) in 10 study areas (A-J). Summarized are the number of processed points, the execution times, and the time efficiency for each method*. The deep learning and eigenvalue methods have the same number of processed points because both take the whole point clouds as input, while the hybrid method only uses the candidate points (i.e. subsets of the whole point clouds).

Figure 1. Workflow for evaluating the performance, effectiveness, and computational efficiency of powerline noise removal from 25 LiDAR metrics capturing vegetation height, vegetation cover, and vertical variability of vegetation. Note that the PointCNN model (applied in the deep learning and hybrid method) uses independent datasets for training and prediction. The accuracy of the three powerline extraction methods is tested with manually labeled ground truth data (performance evaluation).

Figure 2. Locations and characteristics of the 10 study areas (A-J) in the Netherlands. (a) Spatial distribution and LiDAR point cloud visualization (colored by normalized Z values) of each study area. (b) Canopy height and landscape openness of each study area characterized by two 10-m resolution LiDAR metrics (95th percentile of vegetation height and pulse penetration ratio, respectively). These metrics were generated from manually labeled point clouds (see detailed description in section 2.2).

Figure 3. Performance evaluation of powerline extraction methods. Accuracy of three methods (i.e. deep learning, hybrid, and eigenvalue method) as quantified by (a) recall, (b) precision, (c) quality, and (d) F1 score. All four accuracy measures are also shown together along a gradient of canopy height, separately for the (e) deep learning, (f) hybrid, and (g) eigenvalue method. Dots with different colors indicate the 10 different study areas (A-J). Different line types indicate the different accuracy measures (recall, precision, quality, and F1 score).

Figure 4. Effectiveness of powerline removal on 25 LiDAR metrics of ecosystem structure. Metrics after powerline removal (using the deep learning, hybrid, and eigenvalue method, respectively) are compared to metrics generated from ground truth (i.e. manually labeled point clouds) and metrics derived from the raw point clouds (i.e. with powerline noise). The metrics represent vegetation height (7 metrics), vegetation cover (11 metrics), and vertical variability of vegetation (7 metrics). See Appendix Table A1 for metric abbreviations. Box-and-whisker plots show the values of each metric calculated for pixels with powerlines (n = 3434) across the 10 study areas. Boxes show the median and interquartile range, with whiskers (stippled lines) extending to 1.5 times the interquartile range and dots beyond.
(a)), the long execution time (Figure 5(b)), and its low time efficiency (Figure 5(c)). In contrast, the eigenvalue method was the most time-efficient method (Figure 5(c)), with an execution time more than 20 times shorter than that of the deep learning method (Figure 5(b)).

Figure 5. Computational efficiency of three powerline extraction methods (i.e. deep learning, hybrid, and eigenvalue method) explained by (a) number of processed points, (b) execution time, and (c) time efficiency. Boxes show the median and interquartile range, with whiskers (stippled lines) extending to 1.5 times the interquartile range and dots beyond. The mean and standard deviation are given next to each boxplot.

Figure 6. Examples of misclassifications from powerline extraction methods in comparison with ground truth. (a) Examples of misclassifications in study area D when powerlines are close to tree crowns (deep learning and hybrid method) or when gaps between powerline points are relatively large (eigenvalue method). (b) Examples of misclassifications in study area E for powerlines above water (deep learning and hybrid method, but not eigenvalue method). Note that transmission towers were generally not correctly classified. For visualization purposes, the eight categories classified by the deep learning and hybrid methods were grouped into four classes: ground, vegetation, powerlines, water and others (including buildings, etc.). For the eigenvalue method, the result only contained two classes: powerline and others.

Figure B1. Violin plot of the proportion of removed powerline points across 10 study areas using three powerline extraction methods (deep learning, hybrid, and eigenvalue method). The proportion of remaining powerline points from each tested method was calculated relative to the total amount of manually labeled powerline points (ground truth).

Table 1. Detailed characteristics of 10 study areas (A-J) in the Netherlands.