A comparative analysis of modeling approaches and canopy height-based data sources for mapping forest growing stock volume in a northern subtropical ecosystem of China

ABSTRACT Lidar has been regarded as the most accurate data source for forest-growing stock volume (FGSV) estimation, but inconsistent acquisition dates of lidar data with field survey often result in poor FGSV estimation accuracy. Spaceborne stereo imagery is captured at regular intervals, providing new opportunities for mapping and updating FGSV spatial distributions. Digital Surface Model derived from spaceborne stereo imagery and Digital Terrain Model (DTM) derived from airborne lidar can be used together to produce a canopy height model (CHM) (LS-CHM), which can then be used to predict FGSV spatial distributions, but this methodology has yet to be explored. Our research attempts to compare the performance of LS-CHM and lidar-CHM (L-CHM) for FGSV modeling and to explore the advantages of using the hierarchical Bayesian approach (HBA) over traditional linear regression and random forest modeling approaches when sample size is small. Considering different forest types and topographical conditions, as well as the number of sample plots for each forest type, HBA is used to develop the FGSV estimation model, and the results are compared with those from linear regression and random forest approaches. The research results in a northern subtropical forest ecosystem indicate that overall, L-CHM provides better predictions than LS-CHM using the same modeling approaches, and L-CHM is especially valuable when FGSV is small or large, but when FGSV falls within 100–200 m3/ha, LS-CHM–based variables produce better modeling accuracy than L-CHM–based variables using linear regression or HBA. The HBA based on stratification of both forest type and slope aspect provides the best FGSV estimation, using either L-CHM or LS-CHM, and solves the modeling problem due to limited sample sizes for forest types. Our research provides new insights to using the combination of satellite stereo images and lidar-derived DTM for mapping and updating FGSV in a large area.


Introduction
Forest growing stock volume (FGSV) is an important parameter for measuring quantity and quality of forest resources, as well as for sustainable forest development and management at national or regional scale (Lee and Phua 2010;Yu et al. 2014;Ataee et al. 2019). Traditional forest inventory approaches are based on sample plot measurements, which provide aggregated statistics of forest area and FGSV at different administrative levels, but rarely offer explicit spatial details of forest attributes (Kangas et al. 2018;Chirici et al. 2020). Because fieldwork for collection of FGSV is time-consuming and labor intensive, national forest inventory is usually conducted every 5 or 10 years, making the inventory data out of date for supporting practical forest management. Remote sensing technologies with the advantages of wide coverage and frequent data acquisition are widely used to estimate forest attributes, including FGSV at different scales Dos Reis et al. 2019).
Among many kinds of remotely sensed data, Landsat series are the primary sources for FGSV estimation due to their continuous data acquisitions and appropriate spatial and spectral resolutions (Barrett et al. 2016;Babcock et al. 2018). Recently, relatively high spatial resolution data, including Sentinel-2, ZiYuan-3 (ZY-3), and Gaofen-1 (GF-1), have been increasingly used for FGSV mapping (Li, Xie et al. 2019a;Schumacher et al. 2019). However, optical sensors capture land surface reflectance that provides only horizontal forest canopy features, and have a data saturation problem in dense forest sites (Lu, Batistella, and Moran 2005;Zhao et al. 2016). That is, when FGSV reaches a certain level, the optical sensor is no longer sensitive to FGSV change, resulting in underestimation of FGSV for dense forests that have high FGSV values (Nink et al. 2015;Zhao et al. 2016;Dos Reis et al. 2019;Chirici et al. 2020;Li et al. 2021).
The saturation values vary for sensors, attributing to their spatial and spectral resolutions. For example, Li et al. (2021) examined saturation values of four types of optical satellite data for coniferous plantations in subtropical regions of China, and found that FGSV estimation saturation values of Sentinel-2, GaoFen-2, ZiYuan-3, and Landsat-8 were 434 m 3 /ha, 409 m 3 /ha, 380 m 3 /ha, and 400 m 3 /ha, respectively. Another problem with optical sensors is that data collection is greatly affected by weather conditions, causing a scarcity of high-quality data in cloud-prone areas such as subtropical and tropical regions Long et al. 2019).
Microwave radar sensors have all-weather imaging capabilities, providing the potential for extraction of subtropical and tropical forest attributes. A number of studies have reported using synthetic aperture radar (SAR) data or a combination of SAR and optical data to map FGSV (Chowdhury, Thiel, and Schmullius 2014;Long et al. 2019;Zhang et al. 2019). However, radar data also suffer saturation problem Guo and Zhang 2019). Although InSAR and Pol-InSAR can alleviate the saturation problem to a certain extent and improve FGSV estimation accuracy (Xie et al. 2017;Managhebi, Maghsoudi, and Valadan Zoej 2020), the limited capability in distinguishing vegetation types (Li et al. 2012) makes it difficult to estimate FGSV of specific tree species or forest types using SAR data alone. In addition, the radar backscatter coefficient is greatly affected by topography, which constrains its application for accurate retrieval of FGSV in mountain areas (Sinha et al. 2015).
Lidar has the capability of capturing three dimensional (3D) features of forest stands and becomes an important data source for FGSV estimation (Nord-Larsen and Schumacher 2012; Montaghi et al. 2013;Nilsson et al. 2017;Dong and Chen 2018;Kangas et al. 2018). The canopy height model (CHM), the difference between a digital surface model (DSM) and a digital terrain model (DTM) generated from the airborne lidar, is often used to calculate forest structure metrics, which include canopy height percentiles and other structure features reflecting the variation of canopy height within a unit (e.g. plot size). These metrics are employed as predictors linking forest plot attributes for developing prediction models with parametric (linear or nonlinear regression) or nonparametric algorithms (random forest [RF], k-nearest neighbor [kNN], artificial neural network [ANN], support vector machine [SVM]) (Feng et al. 2017;Packalen et al. 2019;Jiang et al. 2020). For example, in Southeast Asian forests, Coomes et al. (2017) used lidar data to map forest carbon density and obtained relative root-meansquared error (RMSEr) of 13% when top canopy height was used as a single predictor. Leite et al. (2020) compared five algorithms (ANN, RF, SVM, linear, and Gompertz models) for estimating stand volume of Eucalyptus plantations, and found that ANN offered the best estimate for stand volume with RMSEr of 14.4%. However, previous studies also indicated that for individual tree species, regression models can often provide better estimation performance than machinelearning algorithms, considering the time consumed in optimizing the parameters and the overfitting problem in the algorithms, especially when the number of sample plots are small (Feng et al. 2017;Jiang et al. 2020) Although lidar exhibited some advantages over optical and radar data in modeling forest attributes, the large data volume, expensive data collection, and complicated data processing restrict its applications for mapping and updating forest attributes in a large area (Stereńczak et al. 2020). The stereo images, which have the ability to reconstruct 3D forest structures using image-matching techniques, became an alternative because of substantial cost saving compared to lidar (Immitzer et al. 2016). However, stereo images cannot penetrate forest canopy; they mainly capture top canopy features, from which only DSM can be constructed. If high-quality DTM (for example, lidarderived DTM) is available, the combination of DSM from stereo and existing DTM is able to compute forest CHM, which then is used for FGSV or biomass estimation. Previous research showed that high spatial resolution stereo images, especially airborne stereo images, can offer accurate estimates of forest attributes when combined with existing precise high spatial resolution DTM Kangas et al. 2018;Goodbody, Coops, and White 2019;Iglhaut et al. 2019;Ullah et al. 2020). However, operational use of aerial stereo imagery is limited to experimental studies due to its small area coverage (Fassnacht et al. 2017).
Spaceborne stereo imagery has advantages over aerial ones in areal coverage (Hobi and Ginzler 2012;Straub et al. 2013;Ullah et al. 2020) and can be acquired regularly at global scale. Many optical sensors, including WorldView, IKONOS, SPOT, Cartosat, Pléiades, ALOS PRISM, and ZY-3, provide stereo imagery, offering great opportunities for developing DSM data (Straub et al. 2013;Lagomasino et al. 2015;Yu et al. 2015;Vastaranta et al. 2013;Hosseini et al. 2019;Li, Xie et al. 2019a;Ullah et al. 2020). The combined use of a stereo-derived DSM and lidar-derived DTM enable us to calculate a forest CHM (hereafter LS-CHM), which is further used to estimate and update forest attributes at lower cost and in a timely manner. Furthermore, the repetitive availability of spaceborne stereo imagery solves the inconsistent date problem between lidar acquisition and field measurement. Many studies demonstrated the effectiveness of using LS-CHM in estimation of forest height (Beguet et al. 2014;Persson and Perko 2016), FGSV (Straub et al. 2013;Immitzer et al. 2016;Persson 2016), and forest aboveground biomass (AGB) Maack et al. 2015) in temperate and boreal forests.
ZiYuan-3 (meaning Resource-3, ZY-3) satellite is China's first civilian high-resolution stereoscopic Earth Observation Program. To date, three satellites (ZY-3-01/02/03) were launched in space. Both ZY-3-01/02 carried two instruments -a three-line-array (TLA) camera and a multispectral imager. The TLA camera consists of three high-resolution panchromatic cameras (0.50-0.80 µm): one directly orients to the nadir, and the other two have offsets of 22° forward and backward in-flight directions. These threeangle views enable the cameras to simultaneously collect stereoscopic imagery over the same locations. For ZY-3-01, the forward/backward cameras have a resolution of 3.5 m, and the nadir camera has a resolution of 2.1 m. ZY-3-02 improved the resolution of forward/backward cameras to 2.6 m. The multispectral imager consists of four wavelength bands (blue: 0.45-0.52 µm; green: 0.52-0.59 µm; red: 0.63-0.69 µm; and near-infrared: 0.77-0.89 µm) with a ground resolution of 5.8 m. With the high spatial resolution and stereo 3D capability, ZY-3 data were widely used in land-cover mapping Li, Tang et al. 2019b;Huang et al. 2020), DSM generation Liu et al. 2021), urban building height extraction (Liu et al. 2017), road extraction (Zhou et al. 2022), and applications in forestry such as mapping of forest types/tree species and retrievals of forest height, biomass, and growing stock volume (Ni et al. 2015;Li G. et al. 2019;Xie et al. 2019;Li et al. 2021;Wang, Zhang, and Guo 2021). Previous research indicates that combination of height-related variables with spectral data from ZY-3 considerably improved biomass estimation accuracy of Larch plantation in northern China (Li et al. 2019a). However, this approach is not suitable for subtropical regions because of complex forest structures and evergreen species where DSM is not an analog of DTM.
To accurately estimate FGSV, it is important to select proper modeling algorithms for constructing FGSV estimation models. In general, two broad categories of algorithms -parametric and nonparametricare used for FGSV mapping . Linear regression is a commonly used parametric algorithm, but cannot effectively handle the complex nonlinear relationship between FGSV and remote sensingderived variables (García-Gutiérrez et al. 2015). The machine-learning methods, including kNN, ANN, SVM, and RF, can solve the problems of nonlinear and high data dimensionality; however, the performances of these methods vary, depending on the data used, the geographical conditions of the study area, and the sample size (Gao et al. 2018). Another problem of machine-learning algorithms is overfitting, especially when sample plots for modeling are limited or the sample plots are not representative of the forest types in the study area Jiang et al. 2020). The hierarchical Bayesian approach (HBA) requires fewer samples, can produce more accurate estimates than single models (Qian et al. 2015), and can effectively overcome the overfitting problem (Wang and Blei 2018). Previous research has shown that HBA has advantages in estimating biomass or FGSV in complex forest structures over machinelearning methods (Ver Planck et al. 2018;Wang et al. 2019), but it is not clear how different data sources and stratification conditions affect HBA modeling performance.
Since lidar data are not always available for a given study area and the inconsistent dates between lidar and field surveys make developing FGSV estimation models difficult, proper use of spaceborne stereo imagery may offer new opportunities for FGSV modeling if precise DTM data are available. Previous research has not fully examined the potential of the combined DTM and spaceborne stereo imagery in retrieval of forest attributes. Therefore, our research attempts to compare the performances of the combined data with Lidar data alone using different modeling methods and examine the potentials of the combined data for FGSV estimation for different forest types in a subtropical forest ecosystem. Another objective is to explore the advantages of using HBA over using linear regression and random forest, especially when sample size is small. This research can provide new insights for the applications of stereo imagery for mapping and updating FGSV spatial distribution in a timely manner.

Study area
The study area is located in Jinzhai County, in western Anhui Province, with Henan Province to the northwest and Hubei Province to the southwest ( Figure 1). This county covers 3814 km 2 , extending approximately 80 km both from north to south and from east to west. This region has a northern subtropical monsoon climate, characterized by warm winters and humid summers, with annual average temperature of 13.2°C-15.6°C and annual average precipitation of approximately 1500 mm. The topography in this region is undulating with an average elevation of 500 m and an average gradient of 21%. The lowest elevation point of 60 m is located in the northeast, and the highest elevation point of 1729 m (Tiantangzhai Mountain) is in the southwest (Figure 1c). The climatic and topographical differences cause big variations in forest phenology between the north and the south of Jinzhai County.
The county has a forest cover rate of 75% (http:// www.ahjinzhai.gov.cn/zjjz/jbxq/index.html, accessed 14 October 2021). The typical biome is subtropical evergreen broadleaf forest. However, the big differences in temperature, precipitation, and radiation lead to obvious vertical biome zonality: from low to high elevation, the forests change from evergreen broadleaf to mixed evergreen and deciduous broadleaf to deciduous broadleaf. The main tree species of Masson pine, Chinese fir, and broadleaf species (such as sawtooth oak, chestnut, poplar, sweet gum) are interlaced across the area (Figure 1d) (Fu 2018).

Methods
The datasets used in this study include ZY-3 stereo images, airborne lidar, field measurements of sample plots, and the forest map of Anhui Province (Table 1). Figure 2 illustrates the framework for mapping FGSV, consisting of the following major steps: (1) data collection and preprocessing, including collection of field inventory data for calculation of FGSV at sample plot level, and collection of ZY-3 multispectral and stereo images; (2) extraction of DTM and DSM from airborne lidar, and extraction of DSM from stereo images; (3) calculation of canopy height-related variables from airborne lidar-derived CHM (L-CHM) and from LS-CHM and selection of predictive variables using stepwise linear regression (SLR), RF, and the least absolute shrinkage and selection operator (lasso) methods, respectively; (4) development of FGSV estimation models using linear regression, RF, and HBA with various data scenarios; (5) evaluation of modeling results using RMSE, RMSEr, and R 2 .

Field data collection and processing
Fieldwork was conducted in September-October 2019. A total of 71 samples with a plot size of 25.82 m by 25.82 m (=1 mu) were collected within the forestland in Jinzhai County. Traditionally, 1 mu was used for field surveys in China. Our sample plots were allocated over the study area according to forest types and distribution, age group, and workload. During the fieldwork, the coordinates of the four corners of each plot were recorded using Phantom P40 Unistrong Beidou No. 3 High Precision Positioning Board for later coregistration with satellite images. Using advanced swan anti-jamming technology, the P40 instrument provides high precision positioning (horizontal position error within 0.5 m) (http://unistrong.com/ Product/ProShow.aspx?proid=P40). For each sample plot, topographic factors (e.g. slope, aspect) and forest stand characteristics (dominant tree species, canopy closure, age groups) were recorded, and diameter at breast height (DBH) and tree height for individual trees were measured. The volume of an individual tree was calculated using a species-specific volume table of Anhui Province, and FGSV at plot level in m 3 /mu (1 mu = 666.7 m 2 ) were obtained by aggregating individual tree volumes within the sample plot. The FGSV in m 3 /mu was converted to cubic meter per hectare (m 3 /ha). Table 2 describes the statistical characteristics of these collected sample plots according to forest types and slopes. Table 3 summarizes the number of sample plots according to slope aspects and slope ranges.

Collection and processing of remotely sensed data
The airborne lidar cloud point data with point density of 2 points/m 2 were acquired in June 2019 using RIEGL-VQ-1560i lidar scanning system. The Lidar360 software package was used to process the cloud point data. The cloud points were initially labeled as ground and nonground. DTM and DSM at 2 m cell size were generated from ground points and first-return points, respectively, using a triangulated irregular network (TIN) interpolation technique. L-CHM was obtained by subtracting DTM from DSM, and the abnormal values such as negative values and values greater than 50 m were examined and refined by median filtering.
ZY-3 stereo imagery was acquired on 9 April 2020. The stereo imagery consists of three images from different view angles. Considering the large shadow problem in the forward-view image, the backwardview and nadir-view images were used to produce DSM at 2 m cell size using the Geomatica PCI software package. The major steps include (1) conducting relative orientation to create a surface 3D model; (2) conducting absolute orientation to fix the geometric location; (3) creating tie points linking two images; (4) developing an epi-polar image and extracting corresponding DSM. The detailed procedure of DSM extraction from ZY-3 can be found in Xie et al. (2019). The LS-CHM was obtained by subtracting lidar-based DTM from stereo-based DSM, then smoothed using maximum filtering of a window size of 3 by 3 pixels for producing a 2-m resolution CHM image. The abnormal values (greater than 50 m or negative values) were replaced by median value, the same approach as used in L-CHM.

Update of forest distribution
The forest map of Jinzhai County in vector format was extracted from the forestland "One Map" dataset of Anhui Province in 2019. The "One Map" was developed by visual interpretation and digitization of high spatial resolution satellite images such as SPOT 5 by Institute of East China Inventory and Planning, National Forestry and Grassland Administration. The results were then verified with a field survey. Each polygon on the "One Map" was labeled with detailed information, including forest location, land-cover type or forest type, area, growing stock volume, as well as conservation level (Xu et al. 2015). By superimposing the "One Map" on a ZY-3 color composite and the derived L-CHM map, boundaries of land-cover types, particularly forestlands, were manually modified, and a new forestland map was created. Finally, the landcover types were grouped into four categories: broadleaf, Chinese fir, Masson pine, and others, as shown in Figure 1(d).

Selection of explanatory variables
Correlations between FGSV and the derived variables are not always significant, and the potential explanatory variables may be highly correlated with each other. Therefore, it is necessary to screen out the unnecessary variables to improve modeling efficiency. In this study, SLR, RF, and lasso were used for variable selection. SLR starts with only the intercept and adds the strongest variable to the model, then continues to add the most significant variables one at a time to test their significance (Draper and Smith 1981). The process is repeated until no significant variables can be added to and no insignificant variables can be dropped from the model. This reduces multicollinearity and ensures the selected variables are optimal for developing the estimation model. Here, the F-statistical test (F test level of 0.1 and significance level of 0.05) is used to decide whether one variable should be included or not.
RF is a supervised algorithm based on the ensemble learning method and multiple decision trees; it can be used for both classification and regression (Breiman 2001). It randomly draws N sample sets from the training set using the bootstrap sampling method and builds N decision tree models to form an ensemble. Each of the N decision models is used to predict the value of y for a new data point, and the average of all predicted y values is the final prediction result for that data point (Breiman 2001;Belgiu and Dragu 2016). Performance of the RF model is measured internally during the implementation of its function, and it ranks the variables by their importance (Cutler, Cutler, and Stevens 2012). In this study, randomForest in the R package was used for variable selection and FGSV modeling. Three parameters -the number of decision trees or bootstrap samples (ntree), the number of variables randomly sampled as predictors for each split (mtry), and minimum leaf size (nleaf) -must be optimized in the randomForest algorithm. In general, the parameter nleaf directly used the default value of 5. For the parameters mtry and ntree, randomForest was run iteratively starting with values of 100 and p/10 (p is the number of predictor variables) in increments of 100 and 1 until they reached 1000 and p, respectively. The out-of-bag error was recorded at each iteration. Their optimal values were determined by the minimum value of out-of-bag error for the given input dataset. Variable selections were based on the variable importance rankings, which were obtained by comparing the variable<apos;>s contribution to mean square error change (Cutler, Cutler, and Stevens 2012). However, the identified variables may have high intercorrelations. Thus, a correlation analysis was used to determine the removal of variables having high correlation coefficients (e.g. >0.9) but less importance in the ranking list. In this way, the smallest number of predictive variables were selected for establishing the RF models.
Lasso is a regression method performing both regularization and variable selection to improve prediction accuracy and enhance model interpretability. The regularization is implemented by placing a penalty equal to the absolute value of the magnitude of coefficients and shrinks the coefficient estimates of less important variables toward zero. The tuning parameter λ that represents the amount of shrinkage controls the strength of the penalty (Tibshirani 1996). When λ is small, lasso is essentially the least squares regression. The larger λ is, the more coefficients are shrunk to zero, and the simpler the resulting model. With a sufficiently large λ, the coefficients of variables that have little influence on the response variable are forced to zero, and data dimensionality is reduced (Tibshirani 1996). This is particularly useful for variable selection within high dimensionality and high correlation datasets. In this study, glmnet in the R package was used for lasso variable selection (Friedman, Hastie, and Tibshirani 2010). The cv. glmne function in glmnet performs crossvalidation and returns a sequence of λ and the corresponding cross-validation error. By plotting the cross-validation error curve with upper and lower standard deviations against the λ sequences, two special values can be identified. One is lambda.min, which gives the minimum mean crossvalidation error; the other is lambda.1se, which gives the most regularized model under which the cross-validated error is within one standard error of the minimum (2010). Here, we were interested in lambda.1se, with which the minimum number of variables were identified. We specified the 10-fold cross-validation and mean square error as the cross-validation error criteria, searched the optimal λ values, and determined the optimized variables for different input datasets.

Development of FGSV estimation models
Three modeling algorithms -linear regression, RF, and HBA -were used in this study. Linear regression assumes that linear relationships exist between explanatory variables and a response variable, specifically L-CHM -or LS-CHM-derived variables and FGSV here. It can be expressed as where y i is FGSV of the sample plot i, β 0 is a constant, x1, x2, . . ., xp are the selected variables from L-CHMor LS-CHM-derived metrics, β 1 , β 2 , . . ., β p are the regression coefficients associated with the corresponding variables, p is the number of explanatory variables, and ɛ i is the error term. Due to its simplicity and easy interpretation, linear regression is widely applied for FGSV or biomass estimation modeling . However, in practice, the relationships between explanatory variables and the response variable are not always linear, especially when explanatory variables are derived from multisource data (Gao et al. 2018).
RF, with its flexibility and ability to handle multisource data, has been widely used for classification and prediction (Belgiu and Drăguţ 2016;Lu et al. 2016). A brief description of RF theory and optimization of parameters used in RF modeling is given in section 3.2.2. Depending on each data scenario (the selected variables for each dataset: LS-CHM and L-CHM) under stratification (different forest types) and nonstratification (all forest types as one population) conditions, the optimal RF model for each case was used separately to estimate FGSV of the entire study area.
HBA is a statistical model written in a hierarchical form that estimates the parameters of the posterior distribution using the Bayesian method. It combines prior knowledge about model parameters and data evidence, and is well suitable for multilevel analysis (Arhonditsis et al. 2006). The HBA can be expressed as where y i represents FGSV of sample i, the subscript j represents group j, ɑ j is intercept, x1, x2, . . ., xp are the selected predictors, and β1, β2, . . ., βp are coefficients of associated predictors. We assume that ɑ j had a normal distribution and were independent from each other, β1 j ,β2 j , . . . βp j had multivariate normal distribution (MVN), and P represents the covariance between coefficients. HBA was implemented within R packages brms using Stan program language, and its detailed procedure can be found in Bürkner (2017). For HBA, variables selected using SLR, RF, and lasso were used for input predictor variables. In this study, forest types (e.g. broadleaf, Masson pine, and Chinese fir) and slope aspects (e.g. sunny, semi-sunny, semishady, and shady) were used as grouping factors, depending on whether single-stratum (forest type) or double-strata (both forest type and slope aspect) HBA frameworks were designed. We assume there were similar linear relationships between FGSV and each predictor for different forest types and slope aspects but different values of coefficients (a, β1, β2, . . ., βp) for each predictor, implying that the importance of a predictor varies among forest types and slope aspects.

FGSV modeling and evaluation
Different scenarios were designed based on two datasets (L-CHM and LS-CHM), three variable selection approaches (SLR, RF, and lasso), three modeling algorithms (linear regression, RF, and HBA), and two group factors (single-stratum [forest type] and double strata [forest type and slope aspect]) (Table 4). Previous research indicated that stratification based on forest types and/or slope aspects was an effective way to improve modeling performance (Zhao et al. 2016;Gao et al. 2018;Jiang et al. 2020), but it requires more modeling samples for each stratum. Considering the limited number of sample plots in this research, linear regression and RF modeling approaches are based on stratification of three forest types, while HBA modeling can be based on double strata because of its advantages over other modeling approaches for fewer samples. Considering which variable selection method should be used for the three modeling algorithms, linear regression and RF models use the variables identified by their own as described in section 3.2.2. For HBA, the best method is unclear. Therefore, all three variable selection methods (i.e. SLR, RF, and lasso) were applied to both single-stratum and double-strata HBA. A total of 28 scenarios were designed for FGSV modeling, as summarized in Table 4.
Model fitness was assessed using the modeling coefficient of determination (R 2 fit ) which was calculated based on observation values and predicted values of the training sample plots. To evaluate the prediction performance of models, R 2 , RMSE, and RMSEr were calculated based on observation values and predicted values of the test sample plots Feng et al. 2017). The scatterplots illustrating the relationships between FGSV estimates and observation data were also constructed for visual examination of the model performance.
Because of the limited samples for some forest types, Leave-One-Out Cross-Validation was conducted, and cross-validated coefficient of determination (R 2 cv ), RMSE, and RMSEr were computed. We also used the R2R index (Valbuena et al. 2017) to indicate the model overfitting problem under different scenarios. The equations of R 2 fit , R 2 cv , RMSE, RMSEr, and R2R can be expressed as RMSE ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi P n i Vp cv i À Vo where R 2 fit is the modeling coefficient of determination (or goodness of model fit); Vp fit j is the predictive value of sample plot j from the fitting model; Vo j is the observation value of sample plot j used in model development; R 2 cv is the cross-validated coefficient of determination; Vp cv i is the predictive value of sample plot i during cross-validation; Vo i is the observation value of sample plot i; Vo is the mean of observation values of the sample plots; Vp is the mean of predictive values of the sample plots; i is the ith sample plot for cross-validation process; and n is the total number of sample plots. Actually, R 2 fit and R 2 cv are the squares of correlation coefficients between predictive values and observation values of the sample plots used during model fitting and crossvalidation, respectively (Book and Young 2006).

Analyses of selected variables and modeling effects
The selected variables using different approaches (Table 5) indicate that the variables selected using SLR and lasso methods are similar for both datasets. H mean was frequently chosen in the LS-CHM scenarios, and more canopy height percentile variables were chosen in the L-CHM scenarios, showing the importance of canopy height in terms of its relationship with FGSV. For the same forest types but different datasets, the selected variables varied, revealing different performances of the datasets in reflecting forest stand structures. The variables selected using RF were much different from those using SLR and lasso; they include more additional forest stand structure variables such as variance, standard deviation, skewness, and kurtosis, indicating that these variables may not have linear relationships with FGSV.
The comparison of goodness of fit (R 2 fit ) for different FGSV modeling scenarios (Table 6) indicates that R 2 fit values from the developed models based on L-CHM are always higher than those based on LS-CHM using the same variable selection method and modeling approach. In terms of modeling algorithms, developing models for each forest type independently using linear regression has much higher R 2 fit values than developing a single model for all forest types as a population, showing the importance of stratification of forest types for FGSV modeling. RF constantly generated higher R 2 fit than linear regression for both datasets, and for both stratification and nonstratification schemes. For the HBA approach, double-strata HBA produced better results than single-stratum HBA, especially for LS-CHM, where the R 2 fit increased by 0.26 on average. In respect to variable selection methods, R 2 fit values were similar with the same dataset, using either single-stratum HBA or double-strata HBA, indicating the limited effects of variable selection methods on HBA performance. Among all scenarios, RF and the double-strata HBA produced higher R 2 fit values than linear regression and single-stratum HBA, especially for LS-CHM.
Previous research showed the overfitting problem in the machine learning approaches, and the R2R results in Table 6 also confirmed that RF had a more severe overfitting problem than linear regression and HBA. Linear regression had smaller R 2 fit values than RF, based on either LS-CHM or L-CHM data scenarios, but the R2R values showed that RF exhibited a severe overfitting problem for every single forest type and all forest types together, and the overfitting problem got worse for LS-CHM. The variables selected by RF included highorder central moments such as skewness and kurtosis, resulting in large model volatility. Another reason is that RF is prone to recruit more variables than linear regression, making the ratio of sample size to predictive variables too small, causing RF model overfitting.

Comparison of modeling results based on overall performance
The R 2 cv , RMSE, and RMSEr values for developed models based on different scenarios (Table 7) indicate that the FGSV models using the L-CHM dataset have better performance than using the LS-CHM dataset, except for the linear regression model with nonstratification scenarios. Overall, the double-strata HBAs have the best prediction performance compared to single-stratum HBAs and nonstratification models, producing the highest R 2 cv , the smallest RMSE, and the smallest RMSEr, but the improvement varies for two datasets: greater improvement in LS-CHM than in L-CHM. These results are consistent with model-fitting R 2 fit trends, indicating that HBA has little overfitting problem. The worst scenarios are the RF model and HBA with variables selected by the RF method. This contradicts the results of the model-fitting results. For instance, R 2 cv from RF stratification scenarios are 0.37 and 0.67 for LS-CHM and L-CHM, respectively, which are much lower than the respective model-fitting R 2 fit values of 0.89 and 0.94. The performances of single-stratum HBAs fall within the ones between stratification and nonstratification linear regression models for both datasets. Regarding the predictor selection for HBA, the two datasets have different responses. For LS-CHM, linear regression is the best, and RF is the worst for both single-stratum and double-strata HBAs, while L-CHM shows no big difference among variable selection methods lasso, RF, and SLR. Among all scenarios, the double-strata HBA with predictors selected using lasso based on the L-CHM dataset produced the best prediction, while RF with nonstratification scenarios based on the LS-CHM dataset was the worst case. Through analyses of model prediction performances among different scenarios, the best scenarios were selected from each modeling algorithm corresponding to each dataset and applied to the entire forestland for producing an FGSV distribution map. Specifically, linear regression and RF with a stratification scheme for both datasets, singlestratum HBAs with SLR variable selection for both datasets, double-strata HBA with SLR variable selection for LS-CHM, and double-strata HBAs with lasso variable selection for L-CHM were chosen. The FGSV distributions in partial study areas, which were estimated using models selected above, indicate that FGSV estimates have similar spatial patterns (Figure 3), except with RF models. Overall, the L-CHM-based estimation results have more pixels within the FGSV less than 100 m 3 /ha than the LS-CHM-based estimates. The RF model produced narrower FGSV estimation ranges than other models used in this research, especially the HBA models.
The scatterplots showing the relationships between FGSV estimates and sample plot data ( Figure 4) indicate that the double-strata HBA performed the best for both datasets (a4, b4), while RF performed the worst (a2, b2). The overestimation and underestimation problems were obvious for all prediction results, regardless of the datasets and modeling approaches used, a similar conclusion as in previous research (e.g. Feng et al. 2017;Gao et al. 2018). This situation became worse for the LS-CHM-based prediction using the RF approach, where the range of prediction values fell within 50-250 m 3 /ha (a2). Double-strata HBA considerably improved the prediction of large FGSV, especially for FGSV greater than 250 m 3 /ha (a4, b4). In terms of datasets, the L-CHM-based prediction provided higher accuracy and more stable results than LS-CHM; however, for the medium ranges of FGSV (100-200 m 3 /ha), LS-CHM offered better prediction than L-CHM when double-strata HBA was used (a4, b4).

Comparison of modeling results based on forest types, slope aspects, and FGSV ranges
The overall assessment of modeling results based on datasets and modeling algorithms cannot provide detailed information on how different forest types, slope aspects, and FGSV ranges affected the FGSV estimation under different scenarios. The RMSE and RMSEr results from different scenarios according to forest types (Table 8) showed that for all forest types, L-CHM-based prediction is better than LS-CHM, regardless of which algorithms were used. Among four modeling algorithms, RF resulted in the highest RMSE and RMSEr for all forest types for both datasets. Double-strata HBA largely increased the estimation accuracy of Chinese fir for both L-CHM and LS-CHM, especially for LS-CHM where RMSEr decreased by 19.5% compared to linear regression. The best models for different forest types depended on the datasets used: with LS-CHM data, linear regression performed the best for broadleaf forest and pine forests, while double-strata HBA was the best for Chinese Fir; however, with L-CHM data, the doublestrata HBA produced the best estimation for broadleaf and Chinese fir, while there was no big difference among modeling approaches for pine forest.
The results in Table 7 indicate that double-strata HBA improved the overall performance, especially for LS-CHM, which was attributed to the improved FGSV estimation of Chinese fir (Table 8) by introducing the slope aspect factor. Therefore, it is necessary to analyze the impact of slope aspect on FGSV estimation. In terms of aspects, the RMSE and RMSEr values from different datasets and algorithms (Table 9) indicate Notes: HBA represents hierarchical Bayesian approach, L-CHM and LS-CHM represent the lidar-based canopy height model and the canopy height model derived from combined stereo-derived digital surface model and Lidar-derived digital terrain model, respectively; SLR and RF represent stepwise linear regression and random forest; lasso represents least absolute shrinkage and selection operator; R2R is the ratio of modeling coefficient of determination (R 2 fit ) to the cross-validated coefficient of determination (R 2 cv ); RMSE and RMSEr represent root mean square error and relative root mean square error.
that stratification of aspects markedly increased the accuracy of LS-CHM-based FGSV estimation, especially for sunny and shady slopes, where RMSEr decreased about 10% compared to linear regression; however, for L-CHM-based FGSV estimation, the improvement was limited and was worse for sunny and semi-sunny slopes than linear regression. This is probably because the quality of optical ZY-3 stereo images is considerably affected by terrain and viewing angles, but airborne lidar is not.  The RMSE and RMSEr according to different modeling algorithms in terms of FGSV ranges (Table 10) indicate that FGSV estimation has the lowest RMSE or RMSEr when FGSV values fall in medium range (100-200 m 3 /ha), the highest RMSE when FGSV is greater than 250 m 3 /ha, and the highest RMSEr when FGSV is less than 50 m 3 /ha, regardless of modeling approaches and datasets used. As shown in Figure 4, overestimation for low FGSV and underestimation for high FGSV are obvious. For most of FGSV ranges, L-CHM produced higher estimation accuracy than LS-CHM when using the same modeling approach, but at medium FGSV ranges (100-200 m 3 / ha), LS-CHM produced better estimation results than L-CHM. Overall, double-strata HBA provided better estimation results than others at each FGSV range, especially at the medium to high FGSV levels (>150 m 3 /ha) for LS-CHM. For example, RMSEr values decreased to 20.2% using double-strata HBA compared to linear regression, RF, and single-stratum HBA, implying that double-strata HBA is suitable to estimate medium and high FGSV. However, at low FGSV level (<100 m 3 /ha) based on L-CHM, linear regression resulted in the lowest RMSE and RMSEr, implying that linear regression is good in estimating low FGSV. These results indicate the value of using LS-CHM when FGSV falls within the range of 100-200 m 3 / ha and demonstrates the important role of slope aspect as a stratification layer, especially when FGSV is greater than 250 m 3 /ha.

Selection of suitable datasets
It is well known that airborne lidar is the most reliable data for accurately depicting 3D forest stand structure, but considering the cost for lidar data collection, airborne lidar data have not been extensively applied to real applications, especially in large areas. Spaceborne stereo imagery has the advantages over airborne lidar data in collection of DSM data in coverage and temporal scale, providing new opportunities to model FGSV. The key to using spaceborne stereo imagery is to accurately extract DSM, in addition to having precise DTM. Previous research showed that the ZY-3-derived DSM data have a positional error of 3 m and a horizontal error of 4 m (Pan et al. 2013;Ni et al. 2015;Liu et al. 2019). We compared the ZY-3-based DSMs and lidar DSMs at sample plot level, and found an average difference of 5.23 m between these two DSMs. The viewing geometry may change the abundance and locations of shadows on different slope aspects, resulting in diverse degrees of texture richness in the images  of different views, thus causing high discrepancies in canopy height between sunny and shady aspects during the imaging process for DSM generation. Such discrepancies vary depending on the aspects, with sunny slope being the smallest at 3.47 m, shady slope being the largest at 7.89 m, and semi-sunny/semi-shady being between them. Because of the different impacts of slope aspects on DSM retrieval accuracy, incorporation of slope aspect into HBA did result in much improved estimation, and RMSEr decreased from 42.9% using single-stratum HBA to 32.8% using doublestrata HBA (see Table 7). On the other hand, Lidar is an active data acquisition system that is not affected by terrain, thus slope aspect has little impact on FGSV modeling. As shown in our research, incorporation of aspect into HBA did not improve FGSV estimation. This research indicates that the LS-CHM-based model provides even better estimation accuracy than the L-CHM-based model when FGSV falls within 100-200 m 3 /ha. Forest stands with FGSV falling within this range usually are characterized by medium canopy closure. The possible reason is that LS-CHM reduces the impacts of canopy gaps within a forest site, while L-CHM captures forest gaps and some understory vegetation such as shrubs, resulting in lower values of variables such as mean. On the other hand, when FGSV is small, the L-CHM-based model performs much better than LS-CHM-based models. This may be due to the fact that understory and soils in forests with relatively small density have considerable impacts on the spectral features and the stereo-derived DSM is extracted from the spectral bands with different view angles, while L-CHM can capture the differences in forest height and background of soils and shrubs. When FGSV is greater than 250 m 3 /ha, the data saturation of spectral bands with different view angles cannot effectively represent the FGSV changes, thus cannot accurately extract DSM, while lidar can still capture the subtle differences within forest stand structures. Therefore, this research shows that when FGSV is small (e.g. <50 m 3 / ha) or high (e.g. >250 m 3 /ha), L-CHM has the advantage over LS-CHM in predicting better FGSV estimation, while the performance is inverse when FGSV falls within the medium range of 100-200 m 3 /ha.
Satellite optical stereo images offer great opportunities for retrieval of the forest height features. However, stereo images alone are rarely able to effectively produce forest height features. Only with a valid and precise DTM can accurate forest canopy height be obtained. In this study, FGSV prediction based on LS-CHM produced reasonable accuracy with RMSE of 46.4 m 3 /ha and RMSEr of 32.8% for the best scenario. The accuracy is lower than that based on the lidar data but is comparable with previous research such as Rahlf et al. (2014) and Neigh et al. (2014), who used stereo aerial photographs and WorldView-2 stereo images for FGSV estimation of boreal coniferous forests. Thus, our study offers valuable information for the generation and regular updates of FGSV as long as precise DTM is available.
The spatial resolution of stereo imagery greatly impacts the accuracy of retrieved DSM and forest canopy height. Previous research indicated that aerial stereo photographs and very high spatial resolution WorldView-2 (resolution <0.5 m) are suitable for estimation and monitoring of forest attributes, while cartosat-1, SPOT-6, QuickBird, and ZY-3 (resolution <4) provide relatively poor quality of DSM (St-Onge et al. 2008;Straub et al. 2013;Tian et al. 2017;Ullah et al. 2020). With the advances in space technology, the spatial resolution of satellite stereo imagery has steadily improved, and more satellites capable of capturing stereo images with submeter resolution have been launched (e.g. WorldView-3, 4, and Gaofen-7), providing the research community rich data sources for forest monitoring.

Selection of modeling approaches and use of stratification strategies
Previous studies have indicated that the number and representativeness of sample plots are important factors influencing model performance (Feng et al. 2017;Gao et al. 2018). This is especially true when the sample size is small relative to predictive variables, resulting in high prediction errors in RF-based models, as confirmed in this research. Linear regression assumes the linear relationships between response variable and predictive variables, and requires the residuals to follow normal distribution. Chinese fir with fewer sample plots and unbalanced FGSV distribution violated this assumption, leading to poor estimation. HBA within the Bayesian framework is able to derive an entire probability distribution over the value space for each parameter of interest or explicitly incorporate prior knowledge about parameters into the model. Thus HBA is suitable for complex multilevel models involving nonlinear relationships between variables and abnormal distribution at multiple levels (Gelman, Hwang, and Vehtari 2013). Our study verified such advantages of HBA, wherein double-strata HBA, especially, exhibited powerful prediction for Chinese fir, which has limited sample plots and unbalanced FGSV distribution. In estimation of forest attributes using remotely sensed data, stratification based on forest types often can achieve better results than nonstratification (Zhao et al. 2016;Gao et al. 2018;Jiang et al. 2020). In this study, three forest types, including Chinese fir, Masson pine, and broadleaf, were used as a stratification factor and were implemented into three FGSV modeling algorithms; however, they demonstrated different behaviors, depending on the data used. First, the model developed independently using linear regression through stratification by forest types performed better than the nonstratification model, while single-stratum HBA<apos;>s performance lay between them and double-strata HBA performed the best. Generally, FGSV has a close relationship with forest height, and such relationships vary among forest types due to their own canopy structures. Building species-specific models recognizes the potential difference in the FGSV and heightderived variables, resulting in more accurate estimates than a single model using combined sample datasets. Building a single model based on the combination of sample plots from all forest types leads to great uncertainty (large RMSEr), while developing models separately using reduced sample sizes leads to high variability in estimated mean FGSV across the study area. The single-stratum hierarchical model is the compromise of the two approaches, which not only provides the overall FGSV pattern, but also retains forest-specific features. Due to the shrinkage effect in the Bayesian model, the estimates are always closer to the overall central tendency than the stratification condition (group means) (Qian 2016); thus, using a shrinkage estimator can improve the overall accuracy of the assessment.
Theoretically, finer stratification and more accurate estimates for specific groups require larger sample sizes for developing rigorous models for each group. However, conducting many field observations is not always practicable due to the time and labor required for forest inventories. In our study, the limited number of sample plots did not allow us to develop models separately at finer levels, although we have considered other impacting factors such as slope aspect. For example, grouping sample plots based on forest types and slope aspects and developing linear regression or RF models for individual groups are unreasonable due to the small sample size (total of 71 sample plots). The HBA is not limited by the sample size. Although the Bayesian model groups the samples according to the specified criteria, each group is still controled by the overall samples, ensuring the stability of the grouping, even if there are only few samples in a group. However, careful consideration should be taken regarding the number of levels and choosing stratification factors. Generally speaking, whether the HBA is beneficial is determined by two factors: the ratio between-group variance to sample size and the ratio within-group variance to sample size for each group. When the sample size is large, or within-group variance is small, or the between-group variance is large, HBA estimates are similar to group means, suggesting the group mean is a reliable estimate for that group; thus, HBA is not necessary. In this study, when using only forest type as stratification, HBA estimates are poorer than individual group estimates. However, after introducing slope aspect as another stratification factor into HBA, the number of groups increased from 3 to 12, which greatly improved HBA estimates, especially for the LS-CHM dataset. Our research shows the advantages of using HBA over other machine learning algorithms when a limited number of sample plots for different forest types are available.

Design of an optimal modeling procedure for FGSV estimation
Implementing remote sensing-based FGSV estimation is actually a comprehensive procedure that needs to carefully design each processing stage. The major stages include collection and organization of sample plots and remote sensing and ancillary data; extraction and selection of variables that are used for FGSV modeling; development of FGSV estimation models using linear regression or machine learning algorithms; and application of the developed models and evaluation of prediction results ). Uncertainty analysis is also an important part of FGSV modeling (Chen, Vaglio Laurin, and Valentini 2015;Xu et al. 2018) but has not obtained sufficient attention. The major reason may be the difficulty in collecting a sufficient number of sample plots and the complexity of factors influencing model performance. In reality, uncertainty analysis is especially valuable for identifying major factors that induce errors, thus proper measures must be taken to optimize the modeling procedure . To date, no universal procedure is available for FGSV modeling, considering the difference in physical conditions of the study areas and specific data sources used. This research provides new insights for accurate FGSV mapping using HBA through a combination of satellite stereo imagery and precise DTM data for extraction of modeling variables and incorporation of forest types and terrain conditions as stratification layers. More research is needed to explore the FGSV modeling procedure that can be transferred to different study areas, at least for the same forest types based on variables related to canopy height.

Conclusions
This study evaluated the potentials of ZY-3 stereo and airborne lidar data for FGSV estimation using different stratification schemes and modeling approaches. Overall, the double-strata HBA provided the best estimation results for both datasets, while RF was the worst. Considering a specific forest type, for example, linear regression models were the best for broadleaf and Masson pine. The double-strata HBA demonstrated advantages in handling small sample size and unbalanced data distribution through improving prediction of Chinese fir forest. This study also showed that the parametric models were more appropriate for individual forest FGSV estimation than nonparametric models when the sample size was insufficient. Comparative analysis of the FGSV modeling results based on three variable selection methods indicated that the stepwise regression or lasso method was more effective than RF for selection of predictive variables for the Bayesian model, and incorporation of slope aspect into the Bayesian model considerably improved FGSV modeling effects.
The L-CHM offered better FGSV estimation than LS-CHM when the same modeling approach was used, especially for estimating very low (<50 m 3 /ha) and very high (>250 m 3 /ha) FGSV values. However, when the FGSV falls within 100-200 m 3 /ha, LS-CHM provided better estimation performance than L-CHM, indicating the potential valuable for FGSV estimation, especially if more accurate DSM from higher-resolution stereo images is available to improve the modeling performance for relatively low or high FGSV ranges. Although similar spatial patterns of FGSV estimates over the study area were achieved using different modeling approaches, there are considerable discrepancies in some areas due to the low accuracy of DSM from ZY-3 stereo image matching. Therefore, obtaining accurate DSM is required when using stereo images for FGSV modeling, which may be achieved by increasing spatial resolution of stereo images and improving DSM retrieval algorithms.
The advantages of using HBA are that (1) it can provide better estimates when sample size is insufficient for some forest types where traditional parametric approaches cannot be used to develop estimation models, (2) it can reduce the overfitting problem in machine learning approaches such as RF, (3) it can use multiple data sources such as forest types, slope aspects, and different regions as stratification layers. In most cases, collection of a large number of sample plots for different forest types is a challenge considering the intense labor and expense required; thus, HBA is recommended for FGSV estimation when a detailed and accurate forest distribution map, high-resolution DEM data, and other ancillary data are available.