Windthrow damage detection in Nordic forests by 3D reconstruction of very high-resolution stereo optical satellite imagery

ABSTRACT We tested whether windthrow damage to Nordic conifer forest stands could be reliably detected as canopy height decrease between a pre-storm LiDAR (Light Detection and Ranging) digital surface model (DSM) and a photogrammetric DSM derived from a post-storm WorldView-3 stereo pair. The post-storm ground reference data consisted of field and unmanned aerial vehicle (UAV) observations of windthrow combined with no-damage areas collected by visual interpretation of the available very high resolution (VHR) satellite imagery. We trained and tested a thresholding model using canopy height change as the sole predictor. We undertook a two-step accuracy assessment by (1) running k-fold cross-validation on the ground reference dataset and examining the effect of the potential imperfections in the ground reference data, and (2) conducting rigorous accuracy assessment of the classified map of the study area using an extended set of VHR imagery. The thresholding model produced accurate windthrow maps in dense, productive forest stands with a sensitivity of 96%, specificity of 71%, and Matthews correlation coefficient (MCC) over 0.7. However, in sparse and high elevation stands, the classification accuracy was poor. Despite certain collection challenges during the winter months in the Nordic region, we consider VHR stereo satellite imagery to be a viable source of forest canopy height information and sufficiently accurate to map windthrow disturbance in forest stands of high to moderate density.


Wind as a forest disturbance agent
Wind was a major natural disturbance agent in European forests in 1950-2000, responsible for 53% of the damage in terms of wood volume, followed by fire, bark beetles, and snow (Schelhaas, Nabuurs, and Schuck 2003).A similar figure was reported in Patacca et al. (2023) for 1950-2019, where the average damage caused by wind was found to be 23 million m 3 y −1 , peaking at 48 and 38 million m 3 y −1 in the 1990s and 2000s, respectively.Climate change may make wind damage a more frequent occurrence in European forests in the future, even though Patacca et al. (2023) found only a weak trend in wind disturbance in the past 70 years.
The effect of climate change on windiness in Europe is, however, uncertain.First, extreme winds in Northern Europe are associated with either extratropical cyclones during the winter months or thunderstorms during the summer.In the former case, changes in low-pressure system intensity, frequency, or cyclone tracks caused by a warming climate may affect future wind conditions, while in the latter, a warming climate may bring changes in low-level humidity, which in turn, may have consequences for the frequency and intensity of thunderstorms, being convective weather systems driven by atmospheric instability.Secondly, indications exist that tropical cyclones will more often transform into extratropical cyclones and reach Northern Europe and that increased low-level humidity may give more favourable conditions for thunderstorms and an increase in severe wind gusts during the summer season.An increased risk of wind damage associated with a warming climate can hence not be ruled out.However, Gregow et al. (2020) pointed out that there is a considerable divergence between studies of future storms in Europe, and consequently the future trends in windiness are uncertain.In a review of past and future changes in wind over Northern Europe, they found that studies of trends in wind speed may give slightly different results depending on methods.
Another effect of a warmer climate is the weakening of root anchorage due to wet and unfrozen soils during the winter (Kamimura et al. 2012).One example is the Gudrun storm that caused uprooting of 75 million m 3 of forest in Sweden in January 2005 after an abnormal mid-winter thaw combined with heavy rain (Valinger, Kempe, and Fridman 2014).Windstorms produce a range of negative effects in forests beyond reduced timber quality and value, including increased harvesting costs, disruptions to timber supply chains, secondary biotic forest damage, such as subsequent bark beetle outbreaks (Blennow and Persson 2013;Hanewinkel and Peyron 2013;Komonen, Schroeder, and Weslien 2011;Økland and Berryman 2004;Schwarzbauer and Rauch 2013).
On 19 November 2021, a low-pressure system formed outside the west coast of central Norway and, being capped by a jet stream and warmer air aloft above the Scandinavian Mountains, caused westerly flow and gravity waves over southern Norway resulting in catastrophic downslope windstorms on the lee side of the mountains, with measured wind gust speeds exceeding 25 m s −1 at elevations below 600 m above sea level (m.a.s.l.).Wind gust speeds in the hardest affected areas corresponded to a return period of over 25 years (Skattør et al. 2021).The extent of wind damage to forest was estimated at 2.4 to 2.6 million m 3 , mainly in the form of uprooting (Skogbrand Forsikringsselskap Gjensidig 2022b).
Catastrophic windstorm events and the risk of increased windiness in a warming climate indicate the need for accurate windthrow damage maps within a reasonably short time after a windstorm event to quantify the scope of damage.

Forest mapping using very high-resolution optical satellite data
The forest disturbance mapping need can be served by multiple remote sensing technologies and platforms, ranging from near-field (e.g., unmanned aerial vehicles (UAVs) equipped with an optical camera or a LiDAR (Light Detection and Ranging) scanner) to airborne (aerial photography or LiDAR) to spaceborne (passive optical and active microwave satellite sensors).Optical remote sensing platforms cover the entire gradient of ground surface area captured and ground sampling distances (GSD, or spatial resolution)from low resolution satellite instruments with a GSD of >300 m, such as MODIS with a swath width of 2330 km, to UAV cameras with a GSD of <5 cm and a footprint of only several metres.In the context of satellite remote sensing, the designation 'very high resolution' (VHR) conventionally refers to sensors with a GSD of <4 m and a typical swath width of 12-20 km.
The origins of VHR optical satellite remote sensing date back to the early 1960s when the Keyhole (KH) 4A and 4B series of reconnaissance satellites were put into operation by the US government (Dowman et al. 2022).In 1999, with the launch of the commercial satellite Ikonos, VHR satellite imagery with a GSD of 0.8 m, four multispectral bands, and stereo capability became available to the research community, including for the monitoring of forest resources (Neigh et al. 2014).Subsequent developments in the VHR sensor technology brought imagery with further improved GSD down to 0.3 m.
Monoscopic, i.e., collected from a single viewpoint, VHR satellite imagery is a wellstudied source of data on forest stand attributes and forest disturbances, either for a single point in time or bitemporal for change detection applications.For instance, Fassnacht et al. (2017) used multispectral and panchromatic WorldView-2 imagery to identify tree species composition and estimate tree density and discussed the role of VHR satellite imagery in forest management; Francini et al. (2020) proposed a method for nearreal time detection of forest disturbances, and Dalponte et al. (2020) developed a mapping workflow for forest windthrow in northern Italy, both using the PlanetScope imagery; in Schwarz et al. (2003), manual interpretation and supervized classification and segmentation of Ikonos imagery was used for the same purpose in Switzerland.Kislov and Korznikov (2020) and Kislov et al. (2021) applied a convolutional neural network (CNN) to Pléiades-1A/B and WorldView-3 images to identify windthrow areas in dense conifer forests, while Wagner et al. (2019) tested a similar approach on WorldView-3 images to map forest types in the Brazilian Atlantic rainforest.In Brandt et al. (2020), VHR satellite imagery was used to train a CNN to detect previously undocumented individual trees in the non-forest areas of Africa.In Mugabowindekwe et al. (2023), aboveground carbon stocks were estimated on a nation-wide scale in Rwanda and neighbouring countries using SkySat imagery.Shamsoddini, Trinder, and Turner(2013) used World View-2 multispectral bands to estimate stand attributes, such as mean height, mean diameter, standing volume, basal area, and stem count, in a pine plantation in Australia.In Immitzer et al. (2016), regional wall-to-wall mapping of growing stock was undertaken by leveraging WorldView-2-derived spectral and height information in combination and separately.Fassnacht et al. (2017) noted that despite the reasonably good accuracy and the advantages of stereoscopic VHR satellite imagery (such as affordable price, high availability and short lead time, limited need for corrections, straightforward processing workflows) for photogrammetric canopy height reconstruction, conflicting opinions prevailed in the expert community and a limited number of studies were available.

Forest height measurement and change detection by VHR satellite photogrammetry
A typical photogrammetric workflow for canopy height reconstruction consists in combining a photogrammetric VHR satellite digital surface model (DSM) with a preexisting digital terrain model (DTM), typically airborne LiDAR, to produce a normalized DSM (nDSM), representing the per-pixel elevation difference between the DSM and the DTM, which in forested areas is identical to a canopy height model (CHM).This workflow resulting in a 'hybrid ' CHM was proposed in St-Onge, Hu, and Vega (2008).By applying the hybrid approach, CHM improved to an RMSE of 4 m in St-Onge, Hu, and Vega (2008), 3 m in Neigh et al. (2014) (both using Ikonos), 4 m in Piermattei et al. (2019) using Pléiades-1, 2.3 m in Goldbergs (2021) using GeoEye-1, 1.27 m on individual tree level in St-Onge and Grandin (2019) using WorldView-3, and a normalized median absolute deviation (NMAD) of 2.6 m in Ullah et al. (2020) using WV-2.In a number of studies, photogrammetric CHM metrics (mean, maximum, and height percentiles) were additionally regressed on a LiDAR reference to estimate forest height metrics with an RMSE of 1.4-2 m, e.g., in Pearse et al. (2018), Persson (2016), Persson and Perko (2016), Ullah et al. (2020), andYu et al. (2015).
This study illustrates the photogrammetric applications of the stereo imagery collected by the WorldView-3 (WV-3) satellite operated by Maxar Technologies Inc. WV-3 was launched in 2014 and provides a panchromatic resolution of 0.31 m at nadir.The revisit frequency is 4.5 days at <20° off-nadir or daily at a GSD of 1 m.The WV-3 instrument is a pushbroom scanner rigidly attached to the satellite bus; pointing at the target and collection of stereo imagery is achieved through the spacecraft's agile design by rotating the entire satellite bus.In addition to the panchromatic band, the WV-3 sensor has two multispectral arrays (MS1: Red, Green, Blue, Near-Infrared 1; MS2: Coastal Blue, Yellow, Red Edge, Near-Infrared 2) with a GSD of 1.24 m, a shortwave infrared (SWIR) detector array for the eight SWIR bands (GSD 3.7 m), and a separate 12-band CAVIS (Clouds, Aerosols, Vapours, Ice and Snow) instrument (GSD 30 m), thus referred to as a 'super-spectral' sensor.Swath width is 13.1 km at nadir, making it possible to collect up to 7500 km 2 of mono and 3000 km 2 of stereo imagery in a single collection scenario (Maxar Technologies 2020a).WV-3 has a reported absolute geolocation accuracy (circular error 90%) of <3.8 m for the unprojected panchromatic band without the use of ground control (Bresnahan, Powers, and Vazquez 2016).
The proposed workflow applies to any VHR optical satellite with the stereo collection capability.The objective of this study was to evaluate whether windthrow damage to Nordic conifer forest stands can be reliably detected as canopy height decrease between a pre-storm -typically LiDAR -nDSM and a photogrammetric nDSM derived from VHR stereo imagery collected shortly after the windstorm.

Study area and windthrow damage observations
Study area is the valley of Hedalen in Sør-Aurdal municipality in south-eastern Norway covering 105 km 2 (Figure 1).It lies in the boreal forest zone at elevations between 290 and 1130 m.a.s.l. and is a flat valley bottom flanked by steep mountain slopes in the west and undulating hilly terrain with a sparse forest cover in the east.The prevailing tree species are Norway spruce (Picea abies, 74% of the forested area), Scots pine (Pinus sylvestris, 19%), and birch (Betula pubescens and B. pendula, 8%).The study area was severely affected by a downslope windstorm caused by mountain waves on 19 November 2021 resulting in extensive forest windthrow damage.
We used a ground reference dataset containing observations of windthrow damage collected during the winter of 2021/2022.The ground reference was a combination of visual interpretation of drone orthomosaics and field observations.Windthrow damage was mapped as vector polygons by the Norwegian forestry insurance company Skogbrand with the goal to identify areas eligible for insurance compensation where the eligibility criteria were overturning and breakage due to strong wind in at least 25% of the pre-storm tree count, excluding patches smaller than 0.2 ha and stands with fewer than 200 stems/ha (Skogbrand Forsikringsselskap Gjensidig 2022a).Since salvage harvesting had been going on for 4 months by the time the satellite stereo images were collected (Section 2.2), we excluded from our analysis 102 salvaged stands (of 853 in total) visually identified in the orthorectified satellite images based on the presence of fresh logging residue and ruts (Figure 1).One-third of the damage polygons were reported as partially damaged, i.e., with less than 50% of the pre-storm tree count felled by wind.Due to the observed inconsistencies in the application of the damage level criterion in the field data, we chose to merge the two damage classes together.Candidate no-damage areas were identified by combining the areas representing forest estates covered by a wind damage insurance, a forest mask derived from the Norwegian forest resource map SR16 (NIBIO 2022), and the damage polygons, assuming forest areas covered by insurance and not reported as damaged to be free of windthrow damage.No-damage areas smaller than 1 ha were excluded to eliminate artefacts.The resulting 751 damage polygons and the candidate 182 no-damage areas were rasterized on a 16 by 16 m grid aligned with the SR16 map grid as 41,000 damage cells and 168,000 candidate no-damage cells.From the latter, we selected 42,000 (25%) no-damage cells through visual interpretation of the orthorectified VHR satellite images acquired in March 2022 (Section 2.2) by excluding cells containing clear signs of wind damage, cells obviously misclassified as forest, cells that could not be reliably classified due to shadows, and cells representing very sparse woodland and juvenile forest stands.We made sure that the selected no-damage cells contained not only dense forest stands, but also sparser, thinned and higher-elevation forest with narrow crowns, expected to present a challenge for 3D reconstruction of VHR satellite stereo (Goldbergs et al. 2019;Loghin, Otepka-Schremmer, and Pfeifer 2020;Piermattei et al. 2019).The number of no-damage cells was chosen to achieve a prevalence value θ close to 0.5 for the entire dataset.The reference dataset covered 29% of the forest area.Examples of damage and no-damage cells are shown in Figure 1.
We believe that the criteria applied to map windthrow damage in combination with the no-damage cell selection procedure introduced an element of imperfection (false negatives, i.e., wind-damaged patches not recorded as such) into the reference data, which is thus considered imperfect ground reference combining reduced sensitivity (i.e., ability to discriminate against false negatives) with perfect specificity (i.e., ability to discriminate against false positives) (Foody 2010; Yerushalmy 1947).

VHR satellite imagery
For 3D reconstruction and windthrow damage detection, VHR optical imagery covering the study area was collected by the WV-3 satellite as an along-track stereo pair (one forwardand one backward-looking image) on 6 March 2022.The stereo pair was collected as a combination of one panchromatic (PAN) and eight multispectral (MS) bands and delivered as a View-Ready Standard Stereo (OR2A) product projected onto the WGS-84 ellipsoid with a constant base elevation, calculated as the footprint's average terrain elevation, and georeferenced using WGS84 UTM Zone 32N (Maxar Technologies 2020c).The OR2A product had a spatial resolution of 0.3 m and 1.2 m for the PAN and MS bands, respectively, and a real dynamic range of 11 bits (stored as 16 bits).Each of the images included rational polynomial coefficients (RPCs) describing the sensor camera model (Grodecki and Dial 2003).Detailed specifications of the stereo pair are given in Table 1.
For rigorous accuracy assessment of the classified windthrow damage map of the entire study area, we used an additional VHR product, collected by the GeoEye-1 (GE-1) satellite shortly after the windstorm on 25 November 2021 (Table 1).The collection was a combination of one PAN and four MS (Blue, Green, Red, Near-Infrared) bands, delivered as a System-Ready Basic 1B (L1B) product including RPCs, i.e., 'raw' imagery radiometrically and sensor-corrected, but not projected on a plane and thus having a variable pixel resolution (Maxar Technologies 2020b).

Ground control points, check points and LiDAR digital elevation models (DEMs)
We collected ground control points (GCPs) and independent check points (ICPs) to improve geolocation accuracy and validate the 3D model.WV-3 imagery is reported by the satellite operator Maxar Technologies to possess a horizontal geolocation accuracy of <3.5 m circular error at the 90th percentile (CE90) without the use of GCPs (Maxar Technologies 2020c).This claimed accuracy is in line with the typical geolocation accuracies reported in the literature (e.g., 2.8-2.9 m CE90 for PAN images in Bresnahan, Powers, and Vazquez (2016)).To achieve a root mean square geolocation error of less than 1 pixel (<0.3 m), we collected 33 GCPs and 16 ICPs in an aerial orthomosaic with a spatial resolution of 0.2 m made available by the Norwegian Mapping Authority (Norwegian Mapping Authority and Geovekst 2022).3D coordinates of the GCPs and ICPs were measured in ETRS89 UTM 32N with orthometric heights using the NN2000 vertical datum.
We used LiDAR-based DSM and DTM from the Norwegian National Digital Elevation Model as the pre-event reference.The LiDAR DSM of the study area had a resolution of 1.08 m and was a combination of three LiDAR surveys flown in 2016-2017 with a point density of 2 to 5 points/m 2 (Norwegian Mapping Authority 2022); the DTM had a resolution of 1 m.

Methods
We implemented the following workflow to detect windthrow damage and conduct rigorous map accuracy assessment: (1) Pre-processing: pan-sharpening of the WV-3 and GE-1 imagery, followed by (1) stereo model generation and refinement of the WV-3 stereo, including accuracy assessment, and (2) orthorectification of the GE-1 imagery.
(2) DSM generation by dense image matching of the WV-3 stereo.(3) Assessment of the geolocation and vertical accuracy of the reconstructed DSM on stable ground surfaces.( 4) Training and cross-validating a classifier model to predict windthrow damage within the spatial extent of the ground reference dataset.(5) Accuracy assessment of the windthrow damage predictions made for the ground reference dataset using robust performance metrics.(6) Rigorous accuracy assessment using same robust metrics of a windthrow damage map produced by applying the best-performing model to the entire study area.

Pre-processing of the satellite imagery and bundle adjustment of the stereo pair
We used ENVI ver.5.6.2 to pre-process the satellite imagery: we pan-sharpened the eight MS bands of the WV-3 stereo pair using the nearest neighbour diffusion-based pansharpening algorithm (Sun, Chen, and Messinger 2014) and prepared two band stacks (one per WV-3 image) composed of three pan-sharpened bands each -NIR1, Green, Coastal Blue -for the subsequent 3D reconstruction.The three bands were selected to maximize visual contrast and quality.We pan-sharpened the GE-1 images by applying two algorithms as implemented in ENVI -the Gram-Schmidt algorithm (Laben and Brower 2000) and the HSV algorithm -to the Red, Green, Blue (RGB) and Near-Infrared, Red, Green (VNIR) band combinations and orthorectified the resulting images using DTM without GCPs (relative orthorectification).The resulting images had a resolution of 0.84 m due to a high off-nadir acquisition angle of 42°.
For bundle adjustment of the stereo pair and 3D reconstruction we used the digital photogrammetry software Agisoft Metashape Pro (ver.1.8.3).The GCPs and ICPs were first placed in the two WV-3 band stacks to assess geolocation accuracy using the provided RPC model.In the second step, the RPC model was refined to achieve sub-pixel accuracy and another geolocation accuracy assessment was made based on the ICPs alone.

DSM generation by dense image matching of satellite stereo
Using the two stacks of three pan-sharpened MS bands (Section 3.1), we built two photogrammetric 3D point clouds with two different point densities by using two downscaling factors of 4 and 16 representing the size of the kernel window applied to downsample the original images.Both point clouds had gaps in locations where the dense image matching algorithm (Hirschmüller 2008) failed to match 3D points in object space.As the gaps occurred in different locations depending on the downscaling factor, we merged the two point clouds to fill in the gaps, similarly to the approach taken in Straub et al. (2013), and achieve a point spacing of approximately 0.5 m.We filtered the merged point cloud for low and high noise in ArcGIS Pro ver.3.0 by applying a minimum and maximum threshold of, respectively, −4 m and 25 m relative to the DTM based on the 90 th percentile of 22.5 m of the dominant tree height in the area of interest (NIBIO 2022) and rasterized it with a spatial resolution of 0.49 m, the highest possible resolution for the merged point cloud.

Accuracy assessment of DSM
We assessed the vertical error Δ h on a subsample of 360,000 ground points (86,500 m 2 ) representing snow-free paved road surfaces extracted from the photogrammetric DSM and the reference LiDAR DTM.The effect of road edges and potential misalignment was reduced by buffering to 1.5 m from the centreline.We tested the resulting error distribution for normality using histograms and Q-Q plots and chose to use the four robust statistical metrics suggested by Höhle and Höhle (2009) as less sensitive to non-normal distribution and outliers: the median, the normalized median absolute deviation (NMAD), and the 68.3% and 95% quantiles of the absolute error, i.e., Δ h j j.The NMAD was calculated as follows: where Δ j is the individual errors j ¼ 1; . . .; n, M Δ is the median of the errors, and M j is the median absolute deviation.The NMAD was chosen as a distribution-free estimator of the scale of distribution, converging with the standard deviation when the distribution is normal, and the 68.3% quantile was chosen to represent the absolute error interval within one standard deviation from the mean, assuming underlying normal distribution (Höhle and Höhle 2009).The uncertainty of the four robust estimators was estimated by finding 95% confidence intervals by bootstrapping with 1000 samples with replacement.All statistical analysis was conducted in the open-source statistical software R (R Core Team 2022).

Windthrow damage detection
We normalized the photogrammetric and reference DSMs by subtracting the DTM to obtain canopy heights (nDSMs or CHMs) before and after the windstorm, and derived canopy height change between 2017 and 2022 as the difference between the two with same resolution as the WV-3 DSM.We then aggregated canopy height change on the SR16 grid by calculating the height change mean for each SR16 cell.As we expected forest crown closure to affect the performance of 3D canopy reconstruction (Goldbergs et al. 2019;Loghin, Otepka-Schremmer, and Pfeifer 2020;Piermattei et al. 2019), we selected basal area, available from the SR16 map, to represent this effect.In a Nordic boreal forest setting, this variable provides an objective basis for stratifying the forest area into crown closure classes and is commonly available to forest owners.
As the accuracy metric used to optimize the classifier, we chose the Matthews correlation coefficient (MCC, see Equation 4below) (Matthews 1975) -a special case of the phi coefficient, similar to the Pearson correlation coefficient as applied to a matrix of two binary variables (Guilford 1954) -considering its properties of (1) being less sensitive to imbalanced datasets than, e.g., overall accuracy, Cohen's kappa, and F1 score; and (2) taking into account both true negatives and true positives and thus combining sensitivity and specificity into a single performance score (Chicco and Jurman 2020).MCC has a valid range of [−1, 1], where values above zero indicate performance better than a random classifier.
To produce a binary windthrow map, we trained a thresholding classifier model using canopy height change per SR16 cell as the only input.The threshold value was optimized to maximize the MCC of the resulting two-class confusion matrix (Baldi et al. 2000).For threshold optimization we used the R package cutpointr (Thiele and Hirschfeld 2021).The model was trained and validated by K-fold cross-validation with K = 10.Threshold values were averaged over the ten folds and applied as a binary classification rule to the entire reference dataset.
A stratified version of the model was additionally trained, where we subdivided the reference dataset into three strata according to the basal area (BA) value: low BA (<15 m 2 / ha, n = 17,873, θ = 0.33), moderate BA (15-30 m 2 /ha, n = 39,798, θ = 0.49), and high BA (≥30 m 2 /ha, n = 26,199, θ = 0.6).The threshold was optimized for each of the strata separately, and the stratified model's performance was compared to that achieved for the given stratum with the non-stratified threshold.

Classification accuracy assessment
Two-by-two confusion matrices (Figure 2) were built for the classifier and its stratified version, and classification accuracy was reported using three accuracy measures: sensitivity S 1 , specificity S 2 , and the Matthews correlation coefficient MCC (Chicco and Jurman 2020).These measures were chosen considering their predictable behaviour in response to the effects of error in the ground reference data at different prevalence levels θ of the damage class (Fielding and Bell 1997;Foody 2010).Additionally, we calculated the value of the area under the receiver operating characteristic (ROC) curve (AUC) for each of the models.The ground reference data was considered to be imperfect (Section 2.1): we arbitrarily set its sensitivity S R 1 to 0.9 and specificity S R 2 to 1.The assumed imperfection is based on the consideration that windthrown patches of forest either under 0.2 ha (Section 2.1) or poorly visible in the VHR imagery were likely to be registered as nodamage areas (error of omission, i.e., less than perfect sensitivity S R 1 ).The value of S R 1 was chosen arbitrarily to illustrate the direction and magnitude of the effect of imperfect ground reference on sensitivity and specificity at the apparent prevalence level θ of 0.5.Errors in the ground reference and in the classifier predictions were assumed to be conditionally independent (Foody 2010).
In addition to the perceived values of g MCC, sensitivity e S 1 and specificity e S 2 , we calculated the true values of sensitivity and specificity adjusted for the assumed error (S 1 , S 2 ,) (Chicco and Jurman 2020;Foody 2011;Staquet et al. 1981) as follows: where a; b; c; d; e; f ; g; h; n refer to the cell values and totals as shown in Figure 2.

Map accuracy assessment
We produced a classified windthrow map of the study area by applying the trained classifier to a forest mask derived from the SR16 map.For rigorous map accuracy assessment as described in Olofsson et al. (2014), we chose visual interpretation of an additional set of random points using a combination of the GE-1 and WV-3 imagery (Section 2.2).We randomly selected 400 cells from within the forest mask and visually classified these as either damage or no-damage.If a cell was found to be a mixed cell or a non-forest cell or was hard to interpret because of image quality, it was discarded.After filtering, 283 valid cells (160 damage and 123 no-damage) were used to produce two-by-two confusion matrices, and map accuracy assessment was carried out as described in Section 3.5 using perceived sensitivity f S M 1 , specificity f S M 2 , and g MCC M .We did not correct the accuracy measures for potential error in the ground reference because the magnitude of such error would be difficult to estimate.The map accuracy assessment dataset had a prevalence θ of 0.57.

Geolocation accuracy of the 3D model
The 3D model based on the RPCs alone had a geolocation error (reported as RMSE) of 0.52 m horizontally and 0.14 m vertically when measured using the 33 GCPs and of 0.88 m and 1.12 m, respectively, when measured using the 16 ICPs.After optimization using GCPs, the error was reduced to 0.15 m horizontally and 0.03 m vertically on the GCPs, 0.44 m and 0.46 m on the ICPs (Table 2).Geolocation accuracy was thus <1 image pixel if measured on GCPs and slightly above 1 pixel on ICPs.This is in line with the geolocation accuracy values reported for WorldView-2 by Aguilar, Saldaña, and Aguilar (2014), Poli et al. (2015), Hobi and Ginzler (2012) and for WorldView-3 by St-Onge and Grandin (2019).

DSM accuracy
The photogrammetric DSM was found to have a vertical accuracy on the same order of magnitude as the spatial resolution of the WV-3 stereo pair and the geolocation accuracy of the 3D model.
We found that the median height error (systematic shift) of the photogrammetric DSM when compared to the reference DSM on paved road surfaces was −44 cm with multiple strongly positive outliers caused by parts of tree crowns located directly above the road surfaces (Table 3).The mean error (−40 cm) deviated from the median (−44 cm), consistent with a moderate positive skewness of 7.3.The distribution-independent estimator NMAD (49 cm) was found to be narrower than the standard deviation (56 cm), indicating a peaked distribution.
Error Δ h distribution of the photogrammetric DSM presented as a histogram and a normal Q-Q plot in Figure 3 is non-normal with a strong peak and a long right-hand tail indicating a greater share of severe positive outliers than negative ones.This finding is supported by the NMAD (49 cm) being narrower than the 68.3% quantile (61 cm).Therefore, we consider the four robust metrics -median error, NMAD, 68.3% and 95% quantiles -to be more appropriate accuracy measures.

Windthrow damage classification accuracy
The classifier model demonstrated a reasonable level of accuracy, with a preference for specificity S 2 vs. sensitivity S 1 .As follows from Equations 5 and 6, the perceived sensitivity e S 1 was unaffected by the less than perfect sensitivity of the ground reference; at the same time, perceived specificity e S 2 was substantially underestimated compared to the true value of S 2 (0.785 vs. 0.841).Figure 4 presents a classified windthrow map, including examples of correctly classified and misclassified cells.We found that the imperfect sensitivity S R 1 of the ground reference resulted in systematic underestimation of the perceived specificity e S 2 in all strata when compared to the true specificity S 2 .The underestimation effect was most pronounced in the moderate and high BA classes, where S 2 (0.74 and 0.856) was, respectively, 0.046 and 0.127 higher than e S 2 (Table 5(b)).
Figure 5(a,b) shows density plots of the damage and no-damage classes grouped by BA stratum and the respective ROC curves of the stratified thresholding classifier.The low BA stratum has a ROC curve close to that of a random classifier (AUC 0.597, see Table 5), explained by the almost identical density distributions of the damage and no-damage classes distinguished solely by the slim right-hand tail of positive canopy height change values in the nodamage class.The shape of the ROC curves (and accordingly, the AUC values of 0.764 and 0.867 in the higher BA strata, see Table 5) improved with increasing BA values, supported by the better separation of the damage and no-damage classes in the moderate and high BA strata.The no-damage class distributions had peaks in the negative height change region (Figure 5(a)), more prominent in the lower BA strata, -these resulted from a combination of the error of omission S R 1 ¼ 0:9 À � inthe ground reference (Section 3.5) and dense image matching errors (Section 3.2). Figure 5(a) indicates that, based on the relative shape of thedensity plots for the no-damage class, the high BA stratum was less prone to imperfect S R 1 and reduced S 2 compared to the moderate BA stratum.

Classified map accuracy
Table 6 shows map-level and stratum-level perceived accuracy measures resulting from rigorous map accuracy assessment of the non-stratified thresholding classifier.On both levels, there was a slight improvement over the respective values reported in Tables 4 and  5(b) except for the low BA stratum.On the map level, rigorous map accuracy assessment showed a slightly higher g MCC M of 0.505 (Table 6(a)), compared to 0.465 in Table 4.We found the classified windthrow damage map to have a higher sensitivity f S M 1 (0.775) than specificity f S M 2 (0.732), in contrast to our earlier finding of higher specificity e S 2 (Table 4).On the stratum level, map accuracy assessment revealed a negative g MCC M of −0.152 in the low BA stratum (Table 6(b)), caused by the model's zero sensitivity f S M 1 .In the moderate and high BA strata, we found g MCC M to be higher than the respective g MCC values in Table 5 (b) − 0.46 vs. 0.384 (moderate BA) and 0.718 vs. 0.64 (high BA).Similarly to the map level, the classified windthrow damage map was consistently more sensitive than specific in denser forest stands, i.e., better able to discriminate against false negatives than false positives.

Discussion
This study demonstrated that windthrow damage can be detected as a decrease in forest canopy height from a pre-storm to a post-storm DSM obtained by 3D reconstruction of WV-3 stereo imagery.The utility of the method is limited to moderate-to-high density productive boreal conifer forest stands, where accurate windthrow maps could be produced even under suboptimal imagery collection conditions, such as sun elevation lower than 25° and presence of snow.

Mapping windthrow with VHR stereo optical satellite imagery
Major windstorms caused by extratropical cyclones forming in the North Atlantic tend to hit the Nordic countries during the late autumn and winter months (Feser et al. 2015;Gregow, Laaksonen, and Alper 2017), posing operational challenges for the collection of optical satellite imagery at higher latitudes due to a combination of persistent cloud cover, low sun elevation, poor lighting conditions, and snow cover.Specifications of the collected GE-1 imagery in Table 1 offer an illustration of such challenges: the GE-1 imagery was acquired during a short cloud-free window within 1 week after the windstorm and had both a high off-nadir angle of 42° and a low sun elevation of slightly above 8°, i.e., conditions considered unfavourable for stereophotogrammetric reconstruction (Piermattei et al. 2018;Qin 2019).Once the location of windthrow damage is known, the spatial extent of the area to be covered by 3D reconstruction-based windthrow mapping does not appear to be a practical limitation as a single stereo collection scenario by, e.g., WV-3 can have a footprint of up to 2900 km 2 (Maxar Technologies 2020a) and can be combined with collections by other VHR satellites in the same or different constellation.
During the winter months, 3D reconstruction of optical VHR imagery can be combined with bitemporal change detection or time-series analysis using lower-resolution optical satellite imagery collected continuously (e.g., by Sentinel-2 or the PlanetScope constellation) and synthetic aperture radar (SAR) imagery products collected by active spaceborne sensors irrespective of cloud cover and lighting conditions (e.g., TerraSAR-X/TanDEM-X and the Capella Space and ICEYE constellations).Such alternative methods using lowerresolution optical satellite imagery are reported to have classification accuracies close to those achieved in this study (Chehata et al. 2014;Dalponte et al. 2020), but may involve long waiting time until imagery of satisfactory quality is collected over an entire region of interest (Vaglio Laurin et al. 2020).A combination of mutually complementary methods offering different trade-offs regarding the levels of detail, accuracy and acquisition cadence appears thus to be the optimal solution for large-scale windthrow mapping (Schwarz et al. 2003).
In this study, we chose forest canopy height change between two timepoints as the main input variable to the windthrow classification models, rather than the spectral information stored in WV-3's eight MS bands (Maxar Technologies 2020a) or the spectral indices using those, such as NDVI (Tucker 1979).The rationale behind this choice was to make the proposed windthrow detection workflow insensitive to such effects on the forest canopy spectral properties as species composition and phenological variation, presence of snow, tree crown shadows and lighting conditions, and to make the classification model as generalizable as possible.
Earlier work indicates that forest canopy height can be measured by digital stereophotogrammetry with an error that is one order of magnitude smaller than mean canopy height in a mature boreal conifer forest (Goldbergs 2021;Goldbergs et al. 2019;Loghin, Otepka-Schremmer, and Pfeifer 2020;Montesano et al. 2017;Persson and Perko 2016;Piermattei et al. 2018Piermattei et al. , 2019;;St-Onge and Grandin 2019;St-Onge, Hu, and Vega 2008), implying the possibility of reliably detecting both uprooting and stem breakage, both causing a major reduction in canopy height exceeding the measurement error.

Windthrow detection accuracy
To the best of our knowledge, few studies exist where 3D reconstruction of VHR satellite stereo imagery is employed to detect and map forest windthrow.One of these is Tian et al. (2017) where a post-storm WV-2 photogrammetric DSM was compared to a pre-storm LiDAR DSM, however the WV-2 imagery was collected 5 years after the windstorm, making their findings less relevant to the operational post-event mapping context.
We consider more realistic the presented scenario where satellite imagery is collected shortly after a windstorm when few alternative VHR sources are available.In that context, the ground reference data can both be expected to be scarce and incomplete and contain error, such as horizontal shift due to improper georeferencing, misclassified transitional cases (Foody 2010), bias introduced by applying arbitrary classification criteria or by generalization techniques used, e.g., morphological operators (Tian et al. 2017).However, the magnitude of error is hard to estimate and can range from 15% (Foody 2010) to 60% (Thompson et al. 2007).It is therefore important to characterize the oftenpredictable effect on the classification accuracy estimates of imperfections in the ground reference considering that the change class prevalence may vary (Foody 2010(Foody , 2011)).Finally, we consider it relevant to not only check whether forest stand attributes, such as BA, can improve windthrow detection performance, but also examine how sensitive the classifier is to forest conditions other than fully stocked productive stands.
Loghin, Otepka-Schremmer, and Pfeifer (2020) reported that WV-3 photogrammetric CHMs cannot reliably estimate tree height in conifers with narrow crowns <2.5 m (<8 image pixels) in diameter, resulting in a reconstructed tree height of <50% of the actual height.However, in conifers with crowns >5 m (>16 pixels), >90% of the actual tree height was reconstructed.This is consistent with the findings that dense image matching of VHR stereo is challenging in open-canopy forests (Goldbergs et al. 2019) when the sun elevation angle is above 25° (Montesano et al. 2017).In our study, the clear effect of BA as a proxy for crown closure on the classification performance is evident from Table 6 with a decreasing BA in boreal thinned or high-elevation forest stands, crown diameter tends also to decrease, and the forest canopy becomes discontinuous, causing a drop in the tree crown detection rate and a false positive classification outcome.While crown closure is not a forest stand metric commonly available as part of national forest inventories, BA is widely available and tends to correlate with the forest stand's development stage and thus crown closure, becoming a useful sensitivity metric in 3D reconstruction of a forest canopy in a setting similar to the one described in this study.
Figure 5(a) demonstrates that in the low BA stratum, the damage and no-damage classes have an almost identical canopy height change distribution -apart from a more pronounced right-hand tail in the no-damage class, both classes have their peaks at −2 m.This explains the very different e S 1 and e S 2 values in the stratified and non-stratified models: the former was optimized for the low BA stratum by choosing a sensitive threshold of -0.65 m, while in the latter one, the low BA stratum is a minority class and the single threshold is optimized for the majority class, i.e., the moderate and high BA strata, giving a much less sensitive threshold of -3.56 m and a higher e S 2 at the expense of a major increase in false negatives.
The most plausible explanation for the nearly identical canopy height change distribution of the damage and no-damage classes in the low BA stratum is a combined effect of the low tree crown detection rate and the imperfect ground reference (reduced sensitivity S R 1 ).The assumed imperfections in the ground reference also affect the moderate and high BA strata, but to a much lower extent Figure 5(a): the no-damage class in the moderate BA stratum exhibits a stronger positive tail and a less pronounced peak in the negative region, and the high BA stratum has a well-expressed bimodal distribution dominated by positive values.Shapes of the ROC curves in Figure 5(b) illustrate the improvement in the model's ability to discriminate with increasing BA.
Another potential misclassification factor, acting irrespective of the BA, are the smooth edges (e.g., between forest and non-forest) characteristic of photogrammetric satellite DSMs and caused by a combination of a lower GSD as compared to a LiDAR DSM and pixels representing tree crown sides that failed to be reliably matched in the 3D reconstruction process.This edge effect might increase the fraction of false negatives by extending the forest canopy beyond its actual boundary, especially in case of partial wind damage where intact trees are interspersed with uprooted ones.We believe that in this study, the edge effect had only a minor impact on the windthrow classification accuracy because of the severity of the wind damage; however, it is to be taken into consideration when applying the proposed workflow to less severe wind damage events.
This study examines a case of severe windthrow concentrated in a small study area with a prevalence θ of 0.5 across the three BA strata (from 0.33 in the low BA to 0.6 in the high BA stratum), which may not apply to windstorms less severe than the 19 November 2021 event and resulting in a diffuse spatial pattern with a lower prevalence.
Rigorous map accuracy assessment (Table 6) was generally consistent with the findings on model accuracy, confirming that a single threshold makes the model insensitive to windthrow in sparse forest.The choice of whether to stratify the threshold should be governed by the classified map user's preferences and cost functions associated with false negatives vs. false positives and the expected 3D reconstruction performance (Piermattei et al. 2019).
We chose a grid-based approach for this study since the auxiliary data, such as the SR16-derived forest mask and forest attributes, followed that format.Aggregating canopy height change over a 16 × 16 m cell simplifies the windthrow detection workflow, simultaneously making the classifier more prone to false positives.Alternative workflows would involve applying morphological filters to a canopy height change raster (Honkavaara, Litkey, and Nurminen 2013) or undertaking pre-and post-damage tree crown segmentation to detect canopy height change on a single-tree level (Gomes and Maillard 2016;Skurikhin, McDowell, and Middleton 2016;Tong et al. 2021;Wagner et al. 2018).

Conclusion
VHR satellite stereo imagery is a viable source of forest canopy height information sufficiently accurate to map forest disturbances such as windthrow that can be combined with bitemporal change detection and time-series analysis methods for region-scale mapping of wind damage.Using the proposed photogrammetric DSM reconstruction workflow and a simple thresholding model requiring no inputs other than canopy height change, accurate windthrow maps can be produced in moderate-to-high density productive forest stands.One limitation of the proposed workflow is that it is less reliable in sparse and high-elevation forest stands.Another limitation is the dependence on the availability of a relatively recent pre-event DSM and of pre-event data on BA or other measure of crown closure.

Figure 1 .
Figure 1.Orthorectified pan-sharpened false-colour infrared WorldView-3 image of the study area (left) with examples of damage and no-damage SR16 cells and salvaged areas (right).Coordinates are given in ETRS89 UTM 32N.(Satellite imagery © 2023 Maxar Technologies).

Figure 2 .
Figure2.Two-by-two (binary) confusion matrix where each observation is placed in one of the four cells based on the relationship between the predicted and reference value.
) reports the classification accuracy measures for the thresholding classifier model stratified by BA (Section 3.5) with the threshold value optimized per stratum.For comparison, classification accuracy is also presented in Table 5(b) for the original nonstratified thresholding model with a breakdown into the BA strata.Stratifying the classification threshold failed to materially improve the classification accuracy as measured by g MCC and at the same time, introduced a strong bias towards sensitivity S 1 , especially in the low and moderate BA strata.The stratified threshold value differed greatly between the strata -from close to zero (−0.65 m) in the low BA stratum to a strongly negative value of −3.62 m (close to the non-stratified value of −3.56 m) in the high BA stratum.

Figure 3 .
Figure 3. Histograms (a) and normal Q-Q plot (b) of the photogrammetric DSM error distribution.For readability, the histogram (bin width 10 cm) is also presented for three separate intervals with different scaling of the vertical axis: −4 m to −2 m, −2 m to +1 m, and +1 m to +16 m.

Figure 4 .
Figure 4. Classified windthrow map of the study area on the SR16 grid using the non-stratified thresholding classifier (a).Examples of correctly classified and misclassified cells, including respective nDSM profiles extracted from the reference and photogrammetric nDSMs (b) -(e).Axes in (b) -(e) are in m, blue dotted guidelines in indicate ground surface where the photogrammetric nDSM has negative elevations.Coordinates are given in ETRS89 UTM 32N.(Satellite imagery © 2023 Maxar Technologies).

Figure 5 .
Figure 5. Density plots of the canopy height change value in the no-damage and damage classes stratified by BA (a); ROC curves for the three BA strata (b); and sensitivity vs. specificity plot of the thresholding classifier, stratified by BA (c).Vertical lines in (a) and (c) and points on the curves in (b) show optimal threshold values in m by BA stratum.

Table 1 .
Specifications of the VHR satellite images: WV-3 stereo pair and two GE-1 strips.

Table 3 .
Accuracy measures, including robust, distribution-independent ones, for the photogrammetric DSM.

Table 2 .
Geolocation accuracy of the reconstructed 3D model before and after GCP optimization, RMSE in m.

Table 4 .
Accuracy measures (perceived values, estimates of real values assuming S R 1 ¼ 0:9, S R 2 ¼ 1 are given in brackets, where applicable) of the thresholding classifier using canopy height change as the input.

Table 5 .
Stratum-level accuracy measures (perceived values, estimates of real values assuming S R 1 ¼ 0:9 are given in brackets, where applicable) of the stratified canopy height change-based classifier (a), compared to a breakdown of the original non-stratified model by BA class (b).AUC values are unaffected by stratification and thus identical in (a) and (b).

Table 6 .
Map-level (a)and stratum-level (b) rigorous map accuracy assessment (perceived values) on an extended set of imagery (n = 283) of the classified windthrow damage map (Figure4).