Estimation of tropical forest aboveground biomass in Nepal using multiple remotely sensed data and deep learning

ABSTRACT This study assessed the prediction accuracy of the forest aboveground biomass (AGB) model using remotely sensed data sources (i.e. airborne laser scanning (ALS), RapidEye, Landsat), and the combination of ALS with RapidEye/Landsat using parametric weighted least squares (WLS) regression. We also analysed the AGB model using random forests, extremely randomized trees, and deep learning stacked autoencoder (SAE) network from nonparametric statistics to compare the performance with WLS regression. We also compared the widths of the 95% confidence intervals for estimates of the mean AGB per unit area using the model-based estimator. The study site in the Terai Arc Landscape, Nepal, comprised 14 protected areas extending from the southern part of Nepal to India and encompassed mosaics of continuous dense forest and tall grassland. The ALS data provided the largest prediction accuracy (0.30–0.35 relative root mean squared error (rRMSE)), whereas RapidEye and Landsat had smaller prediction accuracies (0.48‒0.54 and 0.47‒0.55 rRMSE, respectively) for the estimation of AGB. The combined use of ALS and RapidEye predictors in the AGB model reduced the rRMSE and narrowed the confidence interval compared with ALS alone, but the improvements were minor. The SAE prediction technique provided the largest prediction accuracy, with inputs of combined ALS and RapidEye predictors that yielded an R2 of 0.80, an rRMSE of 0.30, and a confidence interval of 176‒184 compared to other tested prediction techniques. The SAE prediction technique can become more powerful than other tested prediction techniques if properly adjusted and tuned for accurate forest AGB mapping applications. To our knowledge, this is the first study assessing the performance of the SAE in AGB modelling with a range of hyper-parameter values.


Introduction
Forest aboveground biomass (AGB) estimation is pivotal in several forestry-related activities, such as silviculture, environmental management, biodiversity conservation, and restoration CONTACT Parvez Rana parvez.rana@luke.fiNatural Resources Institute Finland (Luke), Paavo Havaksen tie 3, Oulu 90570, Finland (Araza et al. 2022;Duncanson et al. 2021).Forest AGB estimation is also directly linked to carbon sequestration and carbon storage and plays a key role in climate change mitigation.The importance of the accurate estimation of forest AGB has been highlighted in international activities such as reducing emissions from deforestation and forest degradation in developing countries (REDD+) (Latifi et al. 2015), and the challenges to the successful implementation of these initiatives have been shown to arise from the need for reporting forest AGB stocks (Naesset et al. 2016).Advancements in remote sensing (RS) technology have enabled more accurate and precise estimation of AGB stocks (Araza et al. 2022;Duncanson et al. 2021).Among all carbon pools, AGB is the most dynamic and variable, quickly reflecting management-related changes (Duncanson et al. 2021;Houghton, Hall, and Goetz 2009).
Enhanced tropical forest AGB estimates could provide crucial information for global carbon accounting since the degradation and deforestation of tropical forests account for 10% of the total anthropogenic carbon emissions (IPCC, MassonDelmotte et al. 2021).Also, tropical forests store approximately 55% of the total carbon (Pan et al. 2011;Urbazaev et al. 2018) and contribute to 70% of the total global forest carbon sink (Urbazaev et al. 2018).The Intergovernmental Panel on Climate Change (IPCC) encouraged countries to carry out inventories on greenhouse gas, and REDD+ reporting should provide an accurate estimate of forest AGB (Hou et al. 2013;Rana 2016;Xu et al. 2018;Xu, Hou, and Tokola 2012).In turn, this will contribute to global efforts to combat climate change and preserve the vital role that tropical forests play in maintaining the Earth's carbon balance.
Forest AGB has traditionally been estimated using field inventory methods (such as national forest inventories (NFIs)), which are expensive and restricted in time and space.A combination of RS data and field-based surveys can offer a practical and cost-efficient alternative for forest AGB estimation (e.g.Urbazaev et al. 2018).In the recent decade, a number of RS-based studies have been conducted at local (Hou, Xu, and Tokola 2011), national (Rodríguez-Veiga et al. 2016), continental (Baccini et al. 2008), and intercontinental scales (Avitabile et al. 2016).Among the RS sources, three-dimensional airborne laser scanning (ALS) data have frequently been reported to achieve a larger prediction accuracy across various geographical units (e.g. at the national level) for reducing the cost of large-scale forest AGB inventories (see Naesset et al. 2016).The ALS technology is now acknowledged to be the state-of-the-art preference for the RS-based assessment of AGB in extremely dense tropical forests (Chan, Fung, and Wong 2021).The accuracy and precision of ALS-based AGB estimation have varied from model to model and from area to area (Chan, Fung, and Wong 2021;Xu et al. 2018), although the ALS prediction technique had a larger estimation error for tropical forests compared to other biomes, such as boreal forests (Rana et al. 2014).The coarse-resolution RS data (e.g.Landsat) may over-or underestimate AGB stock in dense mixed-species tropical forests, whereas fineresolution RS data (e.g.ALS) can overcome this problem by providing an accurate and precise estimate of AGB stock (Chan, Fung, and Wong 2021).
AGB estimation using RapidEye data has been reported infrequently in the literature (but see Rana et al. 2014), although it has been claimed that a 5-metre pixel size with an additional red-edge band (0.69-0.73 µm) is potentially effective in tropical AGB estimation over time and space (Naesset et al. 2016).The RapidEye and Landsat optical data provide only limited information on forest structure in the vertical direction, and they do not provide cloud-free datasets on a large scale (Rana 2016).The use of ALS data overcame the cloud problem, although the availability of temporally and radiometrically corrected datasets at a large scale could be an issue (Abbas et al. 2020;Avitabile et al. 2012).
One approach to reducing the AGB estimation error could be the combined use of ALS and optical data, such as RapidEye and Landsat.This approach is reasonable because wall-to-wall AGB estimation using multiple sources of RS data, e.g. at the landscape level, have to meet even greater accuracy standards for AGB estimates over time and space as compared with single-source AGB estimates (Naesset et al. 2016).On the other hand, a number of issues need to be resolved, including the spatio-temporal variation and the unique profiles of individual sensors, i.e. differences in spectral and radiometric resolution depending on the data (Rana 2016).The integration of various remote sensing data sources, like ALS, RapidEye, and Landsat, holds promise for improving AGB estimation accuracy in tropical forests.By addressing the challenges and limitations of each data source, this approach could provide a more precise and cost-effective method for monitoring forest AGB and carbon stocks over time and space, ultimately contributing to better global carbon accounting.
Machine learning algorithms have become widespread as a substitute for the parametric approach (e.g.Ghosh and Behera 2021;Hudak et al. 2008;Niemi et al. 2015).Such algorithms (e.g.random forests (RF), Breiman (2001) can retain the variance structure of the predictions or estimates with respect to the field data (Dong et al. 2020;Verrelst et al. 2012) and effectively handle diverse environmental characteristics, achieving a reasonable prediction accuracy (Niemi et al. 2015;Zhang et al. 2019).These machine learning algorithms do not make assumptions about distributional characteristics for either auxiliary predictors (e.g.ALS metrics) or the predictors of interest (AGB) in the machine learning algorithms (e.g.Bousquet, Luxburg, and Ratsch 2004;Cao et al. 2018;Verrelst et al. 2012).However, they may produce a significant mean difference between observed and predicted values (LeMay and Temesgen 2005;Zhang et al. 2020).
Deep learning, a subset of machine learning techniques, excels at uncovering complex, nonlinear patterns in forest AGB (Zhang et al. 2019).It has also demonstrated the potential for classifying tree species, forest type, and land-use change detection (Nezami et al. 2020).Recently, the stacked autoencoder (SAE) network, a deep learning algorithm, has been employed for forest AGB estimation (Zhang et al. 2019) and image classification (Zhang, Ma, and Zhang 2016).The superior performance of the deep learning algorithm compared to the traditional parametric regression, other machine learning algorithms, e.g.RF and the extremely randomized trees (ERT) for AGB estimation, have been mentioned in some recent studies (Ayrey and Daniel 2018;Zhang et al. 2019).The combined use of ALS and RapidEye/Landsat predictors with the deep learning algorithms has the potential to reduce the estimation error compared to the other machine learning algorithms (Cao et al. 2018).
A plausible candidate model with a greater prediction accuracy may be deceptive if it is used to estimate population parameters or confidence intervals on data different from the ones used to fit the model (Hou et al. 2017).Confidence intervals for estimates of mean AGB per unit area depict the evaluation of the degree of accuracy of these products in geographical areas outside the calibration area (Baccini et al. 2012) or carbon policy development at various resolutions (Herold et al. 2019).Therefore, model comparisons of population estimation properties or confidence intervals are always preferred over-relying on prediction accuracy (Hou et al. 2017).In this paper, we compared the widths of the 95% confidence intervals for estimates of the mean AGB per unit area using the model-based estimator.
Our study assessed the prediction accuracy of the AGB model from RS data sources (i.e.ALS, RapidEye, Landsat) and the combination of ALS with RapidEye/Landsat using parametric multiple linear regression (weighted least squares, WLS).The study also analysed the AGB prediction techniques (ALS, RapidEye, Landsat, ALS+RapidEye, ALS+Landsat) using RF, ERT, and SAE from nonparametric statistics to compare the performance with WLS regression.The overall goal of this study was to improve the accuracy of AGB estimates using ALS and satellite imagery.The specific objectives were two-fold, namely 1) to select the optimum predictor variables among ALS, RapidEye, Landsat, and combinations of ALS and optical predictors; and 2) to select the optimum models to use with predictor variables to improve AGB estimates.To our knowledge, this is the first study to assess the performance of the SAE prediction technique in AGB modelling with a range of hyper-parameter values.

Study area and experimental data
The study area is located in the Terai Arc Landscape (TAL) in southern Nepal (27.14° −29.08°N and 80.15°−85.49°E), as shown in Figure 1.The TAL area is a chain of 14 protected areas located in the southern part of Nepal, close to the boundary with India.It covers 23,000 km 2 and consists of a mosaic of patches of continuous dense forest.The region encompasses a flat plateau with an elevation range between 60 and 300 metres.The climate ranges from tropical to subtropical, with most tropical areas in the east and drier areas in the west.Typical precipitation ranges between 600 millimetres in the west and 1,300 millimetres in the east, with winter rain occurring in the west.A total of 22% of the country's land area belongs to this zone (Rana 2016).Several ecosystem services are provided by the region, including those related to production, ecology, and protection, as well as sustainable livelihoods.The major tree species are Sal (Shorea robusta), Chir Pine (Pinus roxburghii), Marking Nut (Semecarpus anacardium), Axle Wood (Anogeissus latifolia), Schima (Schima wallichii), Karmal (Dillenia pentagyna), Java plum (Syzygium cumini).Other trees include Indian Gooseberry (Phyllanthus emblica), Indian Laurel (Terminalia tomentosa), and Malabar Plum (Syzygium jambos).
The field sample data and RS data were compiled from previous studies in the same area (see Rana 2016;Rana et al. 2017).For more information on the datasets, see Rana (2016) and Rana et al. (2017).A total of 960 sample plots were initially designed, however, 632 sample plots (radius 12.5 m) were surveyed from March to May 2011 (Table 1).Systematic cluster sampling was used to place the sample plots within rectangular blocks.Each rectangular block (a size 5 km × 10 km) contained six clusters, with eight plots in each cluster.It is worth noting that there was none or a small spatial correlation between sample plots within the same cluster in the study area (Rana 2016;Tokola and Shrestha 1999), but observations and measurements might be correlated.A total of 328 plots were unmeasured or partially measured due to GPS positioning errors and inaccessibility.As a handheld GPS was difficult to use in mountainous Nepal with inaccessible conditions, such as steep slopes and dense forests, the positioning errors were included in the unmeasured or partially measured sample plots.The average AGB values were 302 tons/hectare for partly measured sample plots and 188 tons/hectare for fully measured sample plots.The species and diameter at breast height (DBH) of all trees with DBH ≥5 cm were assessed in each plot.Every 5 th tallied tree was measured in the plot and used to calibrate height models for estimating the heights of the remaining tally trees.The AGB values (tons/hectare) were estimated from tree-level attributes (e.g.height, DBH, species, wood density factor) using standard models and methods described in detail in Rana (2016).
The ALS data were acquired between 14 March and 2 April 2011, using a Leica ALS50-II scanner.The ALS flight was operated at an altitude of 2,200 m above ground level, which yielded nominal ALS echoes with a density of 0.8/m 2 .The ALS points were classified as ground and non-ground points following the methodology of Axelsson (2000).A 1 m digital terrain model (DTM) was created from the ground points, and the DTM was used to  (1999).Calibration was primarily concerned with the effects of sun sensor angle and atmospheric conditions (Tokola, Löfman, and Erkkilä 1999).

Predictors derived from the RS data
Predictors were extracted from the ALS data using the area-based approach (Cao et al. 2018;Latifi et al. 2015;Naesset 2002).A total of 30 ALS predictors related to the height and density of the ALS pulse returns were calculated from the first and last echoes.The above predictors have been used in other recent studies (e.g.Cao et al. 2018;Latifi et al. 2015;Rana 2016).First echoes comprised of echo categories 'first of many', and 'only' and last echoes comprised of 'last of many' and 'only'.The first and last echoes contained most of the information about the upper and lower canopy (Hou, Xu, and Tokola 2011;Naesset et al. 2016;Rana 2016).The use of a threshold value for ALS predictor extraction was based on previous studies (see Kauranne et al. 2017;Rana et al. 2017).The ALS predictors were extracted using R and Python programming and were as follows: • Height percentiles of first echoes at 10, 20, 30, . . . 100 th .
• Mean height of first echoes over a threshold of 5 m.
• Standard deviation of first echo height.
• Proportion of first echoes less than 5 m to all first echoes.
• Proportion of last echoes less than 5 m to all last echoes.
• Proportion of last echoes with a height less than 1.5 m + i × 3 m for i = 0. ..7 and total number of last echoes.• Logarithm of the first echo height less than 5 m to all first echo heights.
• Average height of the highest three echoes from the first echoes.
A total of 21 predictors from RapidEye data were extracted.The predictors were the means of the five spectral bands as well as a near-infrared-based normalized difference vegetation index (NIR NDVI) and a red-edge NDVI (Xie et al. 2018).In addition, NIR-based NDVI for individual sample plots was used to extract 14 textural predictors, namely angular second moment, contrast, correlation, sum of squares or variance, inverse difference moment, sum average, sum variance, entropy, sum entropy, difference variance, difference entropy, information measures of correlation, second information measures of correlation, and maximal correlation coefficient (Haralick, Dinstein, and Shanmugam 1973).We used the haralick function in the mahotas library in Python programming to extract the textural predictors.Textural features from NIR-based NDVI helped to describe the spatial distribution of biomass in the region of interest (Hou, Xu, and Tokola 2011;Rana 2016).In addition to calculating the mean values of the six Landsat TM bands, we also derived two vegetation indices, i.e.NDVI and atmospherically resistant vegetation index (ARVI, Kaufman and Tanre 1994).In total, eight Landsat predictors were used in building the AGB model.If the sample plot polygon boundary was covered in more than one raster pixel, the average value of raster pixels was used.

Development of the AGB model
The term 'AGB model' here refers to the combination of the mathematical expression representing the relationship between field-measured AGB and RS predictors.In the AGB model, RS predictors were compiled using a hybrid approach.First, a correlation filter (findCorrelation function, R Core Team 2020) was used to ensure that the multicollinearity among the predictors was minimized.A correlation filter refined the list of candidate predictors using a pair-wise correlation among the predictors.If two predictors had a correlation value greater than 0.8, the findCorrelation function excluded the predictor with the largest absolute mean correlation.An exhaustive search of the predictors using an efficient branch-and-bound algorithm was also performed.The regsubsets function in leaps packages was used, which applied the Bayesian Information Criterion (BIC) for selecting a subset of predictors (R Core Team 2020).In addition to the above steps, the maximum R 2 (coefficient of determination) improvement approach was used to select the RS predictors to be included in the models (Latifi et al. 2015;Naesset et al. 2016).
The AGB model was developed using WLS regression which was easy to understand and interpret the correlation between AGB and predictors derived from ALS/RapidEye/Landsat. The weights were constructed in such a way that the sample plot measurements with smaller variances were given greater weight.However, nonparametric statistics, i.e.RF, ERT, and SAE prediction techniques, were also used individually in building the AGB model.The AGB models from these two families (i.e.parametric and nonparametric statistics) may behave quite differently (Hou, Xu, and Tokola 2011;Latifi and Koch 2012;Zhang et al. 2019), and it was crucial to assess how the estimation error varies from parametric WLS regression to deep learning SAE.
The RF makes final decisions using ensemble decision trees based on a majority vote (Breiman 2001).The parameters used in the RF were 501 decision trees (ntree), three predictors at each split (mtry), and 100 times iterations.We validated these parameter choices through a series of cross-validation experiments, in which we systematically varied the parameters and evaluated the performance of the resulting models.The chosen configuration delivered the best balance of predictive accuracy and model complexity in these experiments.The ERT is similar in manner to RF but reduces the variance and the mean difference by increasing randomization and using the whole original sample rather than bootstrap replicas (Geurts, Ernst, and Wehenkel 2006).While splitting a tree node, the ERT randomly chooses both attributes and cut-points (Geurts, Ernst, and Wehenkel 2006).As with the RF, ntry = 501 and mtry = 3 were used (Table 2).
The SAE, a deep learning algorithm, was used in AGB model development.It automatically extracts features through multiple layers of nonlinear processing to make them flexible in modelling the complex relationships with AGB compared to the classical prediction techniques (i.e.WLS) (Shao, Zhang, and Wang 2017).The SAE uses a hierarchical approach to build a deep neural network to extract deep features of data; it has an encoder and a decoder.An encoder converts the RS predictors into digital signals, whereas a decoder reconstructs the RS predictors from the digital signals.After each autoencoder is trained and after discarding the decoding layers, an SAE is formed using the encoder to all SAEs in a predictor-by-predictor manner (i.e.stacked).Subsequently, a deep neural network uses these trained SAEs to implement the AGB prediction work.Here, the grid search function of the R program was used to find the optimum value for different parameters of SAE (Table 2).The data were normalized using minimum and maximum values derived from training data to allow the SAE prediction technique to reconstruct the data effectively.Two convolutional layers with 264 and 128 nodes, respectively, were used.Each layer's output was standardized and passed through an activation function based on the Hyperbolic tangent function (Tanh).An activation function indicates the contribution of the input from the previous layer to the final output.A dropout layer with a rate of 0.2 was used to prevent overfitting and enhance the generalization ability of the network.The dropout rate of 0.2 means 20% of the hidden features were dropped randomly.The initial learning rate of 0.01 with a momentum of 0.5, a rate decay of 0.0006, L1 (Lasso) regularization of 0.0001, and L2 (Ridge) regularization of 0.00001, was used to optimize the SAE network parameters.The SAE used the Huber loss function (Equations 1, L HUBER ) to optimize the learning curve of the AGB model.As a result of the prediction of the final layer, the loss was calculated using the difference between the predicted and observed AGB value: where P and O indicate the predicted and observed values for the training example of j, respectively.The SAE was trained over 60 epochs, using a batch size of 10.Nonetheless, to prevent model overfitting, an early stop was used with a patience value of five.This means that if accuracy was not improved (i.e.root mean square error, RMSE) after five iterations, the process would end.The R packages randomForest function from random forests, extraTrees function from extra trees, and h2o.deeplearning function from H2O were used to implement the RF, ERT, and SAE, respectively (R Core Team 2020).A total of 20 AGB models were developed using WLS, RF, ERT, and SAE from the (i) ALS, (ii) RapidEye, (iii) Landsat, (iv) ALS + RapidEye, and (v) ALS + Landsat, respectively.We presented the width of the confidence intervals (95%) for estimates of the mean AGB per unit area.We used the model-based estimator given in Equation (2), variance (Equation 3), and bootstrapping resampling (random) to report the width of the confidence intervals for estimates of the mean AGB per unit area.During the bootstrapping resampling procedure, we used bootstrapping residuals and cluster bootstrapping (Efron and Tibshirani 1994;Field and Welsh 2007).With this approach, we focused on resampling the residual errors and draws a random cluster from N (0, b K�K ) and adding them back to the prediction techniques to form a resample or bootstrap cluster resample.During the cluster sampling, we used single-stage cluster sampling, which consists of first randomly selecting clusters and then selecting all plots within clusters.With this approach, we retained the spatial structure of the sample data and mimicked the original sampling design.We resampled the original calibration size 632 plots 10,000 times with replacements.Rather than the number of resampling, the most important issue is whether the estimates remained stable.For this study, 10000 repetitions were sufficient because the estimates changed little after 8,000 repetitions.
where μ is the mean AGB , and μ� k is the bootstrap estimate.We used 10,000 (k = 1, . . .., K) iterations for bootstrapping resampling.The variable importance function in respective prediction techniques (i.e.WLS, RF, ERT, and SAE) was used to identify the relative importance of various predictors of ALS/ RapidEye/Landsat for each AGB model.Each model's performance was evaluated using RMSE (Equation 4), relative RMSE (rRMSE, Equation 4), MD (mean difference), and R 2 .Using 632 plots, 10-fold cross-validation was employed and repeated 10 times, and the average values were reported in the results (Cao et al. 2018;Chan, Fung, and Wong 2021).

RMSE ¼
ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi P n i¼1 where y i represents the observed value for a specific sample plot and b y i represents the estimated value for the same sample plot i, � y represents the mean value of the observed sample plots, and n signifies the total count of sample plots.

Relationship between AGB and RS predictors
ALS (n = 5, Figure 2), RapidEye (n = 3, Figure 3), and Landsat (n = 5, Figure 4) predictors were statistically significantly correlated with AGB (Spearman's correlation coefficient, p < 0.05).The importance assigned to different predictors varied among estimation methods (Figure 5).As a result, ERT indicated that some predictors were more important compared to others.A noteworthy finding was that the importance assigned by SAE to each predictor was homogenous (relative importance values of the predictors were similar).In the ALS prediction technique, height values at the 40 th percentile of the first echoes had a larger importance value compared to other predictors.In the combined approaches (ALS + RapidEye, ALS +  Landsat), ALS predictors had more importance compared to RapidEye/Landsat predictors (Figure 5).

AGB model development using WLS
The ALS prediction technique consisted of five predictors related to tree height and density (Table 3).Though both the first and last echoes of ALS data were used to form models, the first echoes had a more correlation with AGB compared to the last echoes (Table 3, Figure 2).It would not have been accurate to form AGB models using only the first echoes as they were mainly derived from the upper canopy (Naesset et al. 2016).For the RapidEye prediction technique, the mean value of the red band and the sum of averages accounted for much more of the variability in AGB than the sum of entropy (Table 3, Figure 3).The model statistics showed that NIR and ARVI explained more of the variability in AGB than the other Landsat predictors (Table 3, Figure 4).Four predictors were used for the combined use of ALS and RapidEye, although three predictors from ALS had explained most of the variability in AGB (Table 3, Figure 2).In the second combined approach to the ALS and Landsat data in the AGB modelling, four predictors were used.Similarly, most of the predictors were derived from ALS data with a significant relationship (Spearman's correlation coefficient, p < 0.05) with AGB (Table 3, Figure 2).
In the comparison of individual sensor results, the ALS prediction technique yielded the smallest estimation error, followed by RapidEye and Landsat (Table 4, Figure 6).Correspondingly, the rRMSE of the ALS prediction technique was 0.35, followed by RapidEye's 0.54 and Landsat's 0.55.The ALS prediction technique data had the smallest AIC value compared to the other tested prediction techniques (Table 4).The best performing was put in bold.
In the combined approach to the ALS predictors with RapidEye/Landsat predictors, the use of ALS and RapidEye predictors in the AGB model reduced the rRMSE by 0.007 compared with ALS alone but the improvements were minor (Cao et al. 2018;Hou, Xu, and Tokola 2011).The combined use of ALS and RapidEye predictors in the AGB model provided narrower confidence intervals compared to the other tested AGB models.

AGB model development using RF, ERT, and SAE
The same predictors (Table 3) were used for building the AGB model applying RF, ERT, and SAE prediction techniques to compare the performance with WLS regression.The hyperparameter values in the SAE prediction technique affected the AGB model.The performances of few of the important hyperparameters, such as epoch, activation function, and the number of nodes in the hidden layer, are presented in Figure 7.The learning curves of the SAE were stable with the 60 epochs (Figure 7), and the Tanh activation function provided a smaller RMSE than Rectifier (Figure 7).Two hidden layers, with 264 and 128 per layer, provided the optimum result for the AGB model (Figure 7).
The SAE prediction technique provided a smaller rRMSE, followed by the ERT and the RF prediction techniques (Table 5).The RF and ERT prediction techniques, in most cases, produced similar prediction accuracies (i.e.R 2 and rRMSE values) in AGB estimation (Figures 8 and 9).Furthermore, the prediction accuracy was not necessarily increased by adding variables from RapidEye and Landsat, i.e. the prediction techniques with input of ALS predictors, in most cases, achieved smaller estimation errors than those with ALS +RapidEye and/or ALS+Landsat.The SAE prediction technique with ALS and RapidEye predictors provided the largest R 2 of 0.80 and the lowest rRMSE of 0.30 (Figure 10).Similarly, the combined use of ALS and RapidEye predictors in the SAE prediction technique provided narrower confidence intervals compared to the other tested prediction techniques.

Discussion
This study considered the prediction accuracy of individual sensor results and the combination of ALS, RapidEye, and Landsat datasets for the estimation of AGB in a dense tropical forest in Nepal.As hypothesized, the ALS prediction technique provided smaller study showed the performance of the SAE prediction technique with a range of hyperparameter values, which was our novelty compared to an earlier study by Zhang et al. (2019).The advantage of the SAE prediction technique is the automated learning of features.The SAE compresses the input feature (predictors) into a smaller-dimensional feature space and then estimates AGB from this representation.Deep learning algorithms provided smaller estimation errors compared to the nonlinear regression for forest attributes inventory (e.g.Ercanli 2020; Muukkonen and Heiskanen 2005;Ogana and Ercanli 2022;Ozcelik et al. 2013).Furthermore, the ERT prediction technique provided a slightly smaller RMSE than RF (e.g.Barrett et al. 2014).However, some studies, e.g.Latifi and Koch (2012), reported a smaller estimation error of RF over ERT in estimating forest AGB.
Several hyper-parameters influenced the performance of SAE prediction techniques, such as the number of nodes in the hidden layers, epochs, the activation function (result shown), input dropout ratio, and regularization parameters (result not shown) (Ghosh and Behera 2021;Lv et al. 2021;Ogana and Ercanli 2022).Interestingly, the Tanh activation function provided a smaller RMSE than the Rectifier activation function, although it is the default option for many SAE prediction techniques.A probable explanation could be that with Tanh activation, the nodes can learn more complex structures of AGB distribution in the study area (Bengio 2012;Heaton 2008;LeCun et al. 2012).The SAE prediction technique can become more powerful than other tested prediction techniques if properly adjusted and tuned for application in precision forest AGB mapping (Shao, Zhang, and Wang 2017;Zhang et al. 2019).
The ALS prediction technique slightly underestimated certain attributes, especially for larger AGB values, where the residual error increased to a marginal extent.This underestimation may occur in plots with large field-measured values that contain large trees with a small canopy density (Rana et al. 2017), as these plots require a fairly large density of ALS echoes (0.8 pulse/m 2 in the present case).Despite low-density ALS data (e.g. 1 pulse/m 2 ) being able to reliably estimate typical forest structure, including AGB, at the plot level (Jakubowski, Guo, and Kelly 2013), coverage-related metrics are more sensitive to changes in ALS pulse density.One possible contributing factor in our study cases could be the removal of tree foliage for feeding domestic animals such as cattle or goats, leading to reduced foliage in many trees (Panthi 2013).Another factor could be that ALS echo penetration as far as to the ground was small in areas with dense understory vegetations and complex mountainous forests, as in the area studied here, so some ALS echoes are only reflected from the upper canopy, resulting in the underestimation of tree heights.Similar issues have been observed in boreal forests and tropical forests (Chan, Fung, and Wong 2021;Hou, Xu, and Tokola 2011;Tegel 2011).
Both the RapidEye and Landsat prediction techniques involved a larger degree of error and underestimated the AGB in relation to the field measurements.The reasons for this could be that (i) data saturation occurred for both RapidEye and Landsat when the AGB density reached 100-150 tons/ha (Tegel 2011), (ii) the seasonal presence or absence of leaves and the fact that RapidEye and Landsat recorded small NDVI values, and (iii) the RapidEye/Landsat sensor records radiance coming mostly from the canopy surface.Fundamentally, although the canopy allows light penetration, a proportion of the incident energy is absorbed.In dense forests with thick foliage, only a portion of this incident energy reaches the ground and reflects to the sensor.Thus, the sensor primarily records energy scattered by the canopy part, leaving variation in AGB beneath unrecorded (Hou, Xu, and Tokola 2011) and (iv), the area concerned here is diverse in its forest characteristics, e.g.there are 253 individual tree species, tree stem diameters (6-100 cm) and, stand densities (20-2218 stems/ha) as well as a great diversity in growth stages and phenology which could affect the spectral signatures of the forest trees.Consequently, an AGB model based on RapidEye/Landsat predictors could not be transferred directly to this area for the purposes of AGB mapping (Lu et al. 2016).RapidEye and Landsat sensors can, however, provide archival data for spatio-temporal or time-series analyses of biomass stock change since the availability of extensive databases make national forest biomass mapping entirely feasible (Abbas et al. 2020).
The integration explanatory ALS predictors with RapidEye or Landsat predictors resulted in minor reductions in RMSE for the WLS, RF, ERT, and SAE prediction techniques, indicating that little improvement was achieved with these combined models (Cao et al. 2018;Hou, Xu, and Tokola 2011).One possible reason for this is that the spectral information gained from the 5-m pixels of RapidEye and the 30-m pixels of Landsat did not provide any additional information, which might have been needed to provide separate information on the woody and leafy canopy elements in largely dense tropical forests (Hou, Xu, and Tokola 2011).One possible solution could be using multispectral ALS data (e.g. from Titan), which might help provide the above additional information for assessing AGB distribution in dense mixedspecies tropical forests in Nepal (Dalponte et al. 2018).
As technology advances (e.g.multispectral ALS, ICESat, GEDI) and new methods (e.g.deep learning) are developed, the study results could be useful for developing carbon estimation methodologies as well as automating forest AGB monitoring and enabling sustainable forest resource management.This study used monospectral ALS data (wavelength 1,064 nm); however, multispectral ALS (wavelength 532-1,550 nm) data offer a diversity of spectral information which can help to estimate AGB accurately (Dalponte et al. 2018).Multispectral ALS systems offer more radiometric features because different wavelengths can be used (based on return intensity distribution) as compared to monospectral ALS, which can provide key attributes on the structure and branching patterns of tree foliage, foliage density leaf clumping, and leaf size (Korpela et al. 2010;Shi et al. 2018).

Conclusions
Three main conclusions were drawn from this study.Firstly, the ALS data provided a larger prediction accuracy compared to the RapidEye and Landsat data and could explain the variation of AGB in large, complex tropical forests in Nepal.Secondly, the results of this study confirmed the findings reported in the literature that using RapidEye/Landsat predictors in addition to ALS predictors has little beneficial effect on increasing accuracy or precision.Finally, the SAE prediction technique is promising for estimating AGB, although it has previously rarely been used in estimating forest parameters.The results of this study can be used to further explore deep learning algorithms for forest inventory mapping and can be applied to the development of carbon accounting methodologies and inventories.

Figure 1 .
Figure 1.Map of the Terai Arc Landscape (TAL) area in Nepal and airborne laser scanning blocks.

Figure 2 .
Figure 2. Correlation between ALS predictors, i.e., (a) L30_04: height values at the 40 th percentile of first echoes, (b) L30_11: height values at the 10 th percentile of last echoes, (c) L30_15: height values at the 50 th percentile of last echoes, (d) L30_16: height values at the 60 th percentile of last echoes, (e) L30_29: density values of the first echoes (between 1 and 5 m), and AGB: aboveground biomass.r: Spearman's correlation coefficient with a significance level (or p-value) of the correlation.

Figure 5 .
Figure 5. Relative variable importance value used to train weighted least squares (WLS), random forests (RF), extremely randomized trees (ERT), and stacked autoencoder (SAE) prediction techniques for ALS (a), RapidEye (b), Landsat (c), ALS+RapidEye (d), and ALS+Landsat (e).Variable importance values represent the contribution of each variable in the prediction of forest aboveground biomass (AGB).L30_04: height values at the 40 th percentile of first echoes, L30_11: height values at the 10 th percentile of last echoes, L30_15: height values at the 50 th percentile of last echoes, L30_16: height values at the 60 th percentile of last echoes, L30_29: density values of the first echoes (between 1 and 5 m).RE5pcPS_b3: RapidEye red band mean value, HR06: RapidEye sum average, HR08: RapidEye sum entropy, LS_bd1: Landsat blue band mean value, LS_bd3: Landsat red band mean value, LS_bd4: Landsat near-infrared band mean value, LS_bd5: Landsat short-wave infrared band mean value, LS_ARVI: Landsat atmospherically resistant vegetation index.

Figure 6 .
Figure 6.Field-observed versus model-predicted (fitted) AGB using WLS regression.The regression line is presented in blue colour.rRMSE -relative root mean square error, MD − mean difference, and R 2 -Adjusted coefficient of determination.

Figure 7 .
Figure 7. Learning curves (a), activation function (b) and nodes in hidden layers (c) of the SAE prediction technique based on the five sets of predictor variables, i.e.ALS, RapidEye, Landsat, ALS +RapidEye, and ALS+Landsat.Solid lines indicate the mean RMSE values, whereas the bounds of the shaded area indicate the minimum and maximum RMSE value per epoch.Each SAE prediction technique was cross-validated 10 times.

Figure 9 .
Figure 9. Field-observed versus model-predicted (fitted) AGB using extremely randomized trees.The regression line is presented in blue colour.rRMSE -relative root mean square error, MD − mean difference, and R 2 -Adjusted coefficient of determination.

Figure 10 .
Figure 10.Field-observed versus model-predicted (fitted) AGB using stacked autoencoder.The regression line is presented in blue colour.rRMSE -relative root mean square error, MD − mean difference, and R 2 -Adjusted coefficient of determination.

Table 1 .
Forest attributes in the sample plots.SD − Standard deviation, DBH − Diameter at breast height and AGB −Aboveground biomass of all trees per hectare.normalize the heights (z values) of the ALS points.A total of 12 RapidEye satellite image tiles, with a spatial resolution of 5 m consisting of five spectral bands, were acquired between 15 and 25 March, 2011.Radiometric and geometrical corrections were performed on the RapidEye images according to the RapidEye manual (RapidEye 2020).Four Landsat 5 TM satellite images were acquired in January 2010 and February 2011 and corrected radiometrically and geometrically.Multiple linear regression analyses were performed to calibrate Landsat images, as demonstrated byTokola, Löfman, and Erkkilä

Table 2 .
Hyperparameter range and optimal value for each machine learning algorithm.