Mapping Lorey’s height over Hyrcanian forests of Iran using synergy of ICESat/GLAS and optical images

ABSTRACT Lorey’s height, representative of mean height in uneven-aged forest stands, is a valuable parameter for forest ecosystem management. While in situ measures provide the most precise information, remote-sensing techniques may provide less expensive but denser and more operational alternative of Lorey’s height estimation over highly mountainous areas. This research aims first to evaluate the performances of two nonparametric data mining methods, random forest (RF) and artificial neural network (ANN), for estimation of Lorey’s height using ice, cloud and land elevation satellite/geoscience laser altimeter system (ICESat/GLAS) in Hyrcanian forests of Iran and then to provide Lorey’s height map using a synergy of ICESat/GLAS and optical images (TM and SPOT). RF and ANN GLAS height models were developed using waveform deterministic metrics, principal components (PCs) from principal component analysis (PCA) and terrain index (TI) extracted from a digital elevation model (DEM). The best result was obtained using an ANN combining first three PCs of PCA and waveform extent ʺWextʺ (RMSE = 3.4 m, RMSE% = 12.4). In order to map Lorey’s height, GLAS-estimated heights were regressed against indices derived from optical images and also topographic information. The best model (RF regression with RMSE = 5.5 m and = 0.59) was applied on the entire study area, and a wall-to-wall height map was generated. This map showed relatively good compatibility with in situ measurements collected in part of the study area.


Introduction
Measuring biophysical parameters of forests is required for forest ecosystem management. Tree height has a primary and fundamental importance among all other parameters. Information about vertical structure of forest is important for ecosystem health assessment, site fertility, biomass and carbon cycle measurement and monitoring, and also biodiversity conservation (Cairns, Barker, Shea, & Haggerty, 1995;Namiranian, 2006).
Although in situ measures provide the most precise information of forest height, they are laborious, costly, time consuming and even impractical in some area. Many forest stands, all over the world, are located in inaccessible or remote areas. This highlights the importance of remotely sensed techniques in global estimation of forest biophysical parameters. Development in altimetry technology, especially lidar (light detection and ranging), facilitates three-dimensional measurement of objects on the ground. Many studies have been performed using airborne lidar to estimate different forest properties like height, volume and biomass (Andersen, Reutebuch, & McGaughey, 2006;Chen & Hay, 2011;Drake et al., 2002;El Hajj et al., 2017;Khorrami, Darvishsefat, Tabari Kochaksaraei, & Shataee Jouybari, 2014;Latifi, Nothdurft, & Koch, 2010;Lindberg & Hollaus, 2012;Takahashi et al., 2010). The critical point about airborne lidar is that it is expensive and also the capacity to collect annual data over whole countries does not currently exist. Applying spaceborne lidar for estimation of forest biophysical parameters over large extent area has been investigated since ice, cloud and land elevation satellite (ICESat) was launched into the space in 2003. The geoscience laser altimeter system (GLAS) onboard ICESat operated for a total of 18 missions during its operational years (2003)(2004)(2005)(2006)(2007)(2008)(2009). It provides a full waveform of illuminated objects in nominally 70-m-diameter footprints on the surface.
In general, over flat areas, the distance between signal start and ground peak of the waveform is considered as canopy height . The waveform extent (distance between signal start and signal end) is broadened over sloped terrain, and identification of canopy top and ground peak can be difficult due to mixed returns from vegetation and ground surfaces (Chen, 2010;Lefsky et al., 2005). Thus, slope is an important issue in applying GLAS data to estimate forest height over the mountainous area that challenges this research. The current study was carried out in part of Hyrcanian forests of Iran with complex structure vertically and horizontally, a mixture of even and uneven aged stands, diverse broadleaf species and severe topography. To deal with the slope effect most recent researchers have used parametric analysis like multiple regressions (MLR) to retrieve forest canopy height (Chen, 2010;Duncanson, Niemann, & Wulder, 2010;Lefsky et al., 2005;Lefsky, Keller, Pang, De Camargo, & Hunter, 2007;Rosette, North, & Suárez, 2008) based only on metrics extracted from GLAS waveforms or in combination with topographic information extracted from digital elevation models (DEMs). A few studies used nonparametric analysis in order to estimate forest canopy height or above ground biomass (AGB) from GLAS data. Fayad et al. (2014) andRajab Pourrahmati et al. (2016) applied both MLR and random forest (RF) on metrics extracted from waveforms to estimate some forest biophysical parameters. An artificial neural network (ANN) method was also employed by Fu, Guoqing, and Zhifeng (2009) and Nelson et al. (2009) to predict AGB based on numerous metrics extracted from GLAS waveforms.
While lidar is a promising technique for forest structure measurement, it typically does not provide wall-to-wall coverage over large (regional to national scale) areas. Although a number of researchers have used GLAS data in combination with other sources of remote-sensing data to derive canopy height, they developed products that are global assessments of canopy height at a coarse scale (Fayad et al., 2016;Lefsky, 2010;Peterson & Nelson, 2014;Simard, Naiara, Fisher, & Baccini, 2011;Ting et al., 2015).
In this research, firstly, a comparative assessment of the two nonparametric methods of RF and ANN was done for estimation of Lorey's height (defined as the mean height of trees weighted by their basal area) using GLAS data in complex condition of Hyrcanian forests. ANN and RF are two machine-learning techniques, which are currently valuable tools for ecological modeling, and are especially useful in analyzing large data sets and identifying nonlinear relationships (Drew, Wiersma, & Huettmann, 2011). For both, the data conditioning is important since prediction beyond the range of training data is not possible. On one hand, RF is known as the most efficient algorithm for classification (Fernández-Delgado, Cernadas, Barro, & Amorim, 2014) and it handles both categorical and continuous data as well as missing values. It also benefits from great advantages of simplicity, ability of determining feature importance and lack of necessity for cross validation or a separate test set to get an unbiased estimate of the classification/regression error (Breiman, 2001;Liu, Wang, Wang, & Li, 2013). On the other hand, the advent of deep learning, a subset of machine learning in artificial intelligence with complex networks capable of unsupervised learning from unstructured data, in the late 2000s sparked renewed interest in neural nets. ANN is known to deal with very highly nonlinear and complex data, even on small data sets without a prior knowledge about the relationships between input and output variables. It has been shown that a neural network can approximate any function (Zhang, 2000), and many advanced applications currently make use of neural networks (Kumar & Thakur, 2012). However both methods have their own strengths and weaknesses, there are few studies in different fields of environmental issues (Olaya-Marín, Martínez-Capel, & Vezza, 2013;Were, Bui, Dick, & Singh, 2015) comparing the predictive performance of these two common machine-learning algorithms for a regression issue (i.e. when a continuous variable is predicted). It is worth noting that there is no single best algorithm for solving all or even one particular problem. Keeping this in mind, it is interesting to compare the performance of these two powerful techniques for predicting Lorey's height. It is important to highlight that such a comparison for prediction of forest biophysical parameters has not yet been presented in a literature.
Following the importance of Lorey's height measures, the synergistic use of spaceborne lidar (ICESat/GLAS) and optical data (SPOT-5 and Landsat5-TM) was considered to meet the need for a geospatial layer of Lorey's height. In spite of the most studies, in this research, production of a spatial map at a local to regional scale was subject to address requirements of local users.

Study area
This research was performed in the mountainous deciduous forests of Nowshahr, a subset of Hyrcanian forests, with severe topography (slopes ranging from flat to greater than 80%), located between 36.26̊to 36.68̊N latitudes and 51.32̊to 51.94̊E longitudes in the north of Iran ( Figure 1). The study area is dominated by oriental beech (Fagus orientalis), European hornbeam (Carpinus betulus), chestnut-leaved oak (Quercus castanifolia) and Persian ironwood (Parotia persica) at mid-elevations, and oriental hornbeam (Carpinus orientalis), and Persian oak (Quercus macranthera) at high-elevations with elevation range of 100-2200 m.

Data
Remote-sensing data The GLAS laser, transmitting 40 pulses per second to the surface, was illuminating the ground and objects in footprints with almost 70 m in diameter spaced at 170 m along track. Laser-reflected energy by all intercepting objects in each footprint was recorded by a telescope, resulting in a waveform representing the vertical profile of laser-illuminated surfaces. Data from all GLAS missions dated from 2003 to 2009 over the study area were employed. GLA01 and GLA14 products among 15 products, produced by NSIDC, were used to extract required information. GLA14 provides surface elevations for land. It also includes the laser footprint geolocation and reflectance, as well as geodetic, instrument and atmospheric corrections for range measurements. GLA01 includes the transmitted and received waveforms (Rosette et al., 2008;Wang et al., 2011).
A 10-m local DEM provided from 1:25,000 topographic maps was used to extract topographic information. Due to severe topography in the study area and its effect on GLAS waveform characteristics, a terrain index (TI) defined as elevation range within a moving window size of 7 × 7 on DEM was calculated and used in Lorey's height regressions to mitigate slope effects. Furthermore, slope, aspect and elevation classification maps were derived from the DEM to assess their contribution in the generation of a spatially continuous height map for the study area.
Two sources of optical data were used in this study: (1) Calibrated data (already corrected for atmospheric effects and orthorectified) of Landsat-5 TM available from the EarthExplorer website of the USGS (http://earthexplorer. In situ measurements Field measurement was done in three phases: September 2013, May 2014 and August 2016. In order to develop and validate Lorey's height models based on GLAS data, in situ data were collected over 60 circular plots (70 m in diameter) in September 2013 and May 2014 at the location of GLAS footprints (L3I and L3K missions acquired on October 2007 and 2008, respectively) that were navigated using GPS (Garmin Colorado 300). Since the study area covers climax community comprised of slow-growing deciduous species, the time interval between in situ measurement and lidar acquisition was neglected. Furthermore, spaceborne lidar is less sensitive to little changes in forest in comparison with very high spatial resolution data like airborne lidar. Diameter at breast height (DBH) of all trees greater than 7.5 cm, and heights of 11 trees were measured in each plot. To obtain the height of all trees in each plot (required for Lorey's height calculation), a variety of nonlinear models relating DBH to height, recommended in different studies, were developed and validated for four species as (1) F. orientalis, (2) C. betulus, (3) Q. castanifolia, (4) Alnus subcordata, and two groups of species which are similar in terms of shape and height: Group1 includes Tilia begonifolia, Acer velutinum, Acer cappadocicum, Sorbus torminalis and Fraxinus excelsior, and Group2 includes Q. macranthera, C. orientalis, P. persica and Diospyros lotus (refer to Rajab Pourrahmati et al., 2016). In the next step, Lorey's height was calculated using following equation 1: Where H Lorey , BA i , DBH i and H i are Lorey's height (m), basal area (cm 2 ), diameter at breast height (cm) and height (m) of tree i, respectively, and n is total number of trees in each plot. Third mission of field measurement was performed in August 2016 to validate the Lorey's height map produced from a synergy of GLAS, optical images (Landsat-TM and SPOT-5) and topographic information. Data were collected in 32 circle plots (each 0.1 ha) dispersed over part of the study area, randomly. Figure 2 shows location and distribution of all field measurements (92 plots) over the study area.

Icesat GLAS data analysis
In order to provide useful GLAS data over forested areas, some preprocessing was performed which led to eliminating cloud-contaminated, noisy and saturated waveforms (Rajab Pourrahmati et al., 2016). Two sets of information were extracted from GLAS waveforms to be used for estimation of Lorey's height: (1) Metrics representing vertical distance between different positions of waveform and ground peak which are called "waveform metrics" hereinafter such as waveform extent (W ext ) defined as the vertical distance between signal start and end, lead and trail-edge extent (H lead and H trail ) as the vertical distance from ground peak to signal start and signal end, respectively (Hilbert & Schmullius, 2012) and quartile energy heights (H 25 , H 50 , H 75 and H 100 ) measured from the ground peak (Nelson et al., 2009). These metrics rely on identification of ground peak which is prone to error over sloped terrain because of the slope broadening effect and mixed returns from ground and vegetation (Chen, 2010;Fayad et al., 2014;Lefsky et al., 2005;Pang, Lefsky, Sun, Miller, & Li, 2008;Rajab Pourrahmati et al., 2016). Figure 3 shows a waveform with some extracted deterministic metrics. (2) Principal components (PCs) produced by principal component analysis (PCA) on waveforms. PCA reduces the dimensionality of the data set by computing a new set of variables (PCs) Figure 2. Location of in situ plots over a hillshade of the study area. Yellow points correspond to the plots located at the center of GLAS footprints to develop and validate GLASbased height models. Red points show plots measured accidently for validation of final height map. which are uncorrelated and ordered such that the kth PC has the kth largest variance among all PCs (Yeung & Ruzzo, 2001). The idea of applying PCA was to avoid the need for identifying ground peak, unlike for waveform metrics explained above. Individual signal intensities of waveform samples (bins) were the data source for PCA analysis. Since equal length of waveforms should be used for the analysis, the length of longest W ext (equal to 400 bins) was considered as basis and all other waveforms were essentially referenced to this one. So, start from signal start, which can be at different location in each waveform, 400 bins were apart from each waveform while adding zero to upper bins beyond the signal end up to 400 bins. Since the number of observations (60 waveforms) is less than the number of bins (400), a subsample (41 bins) was created through a systematic random sampling (picking one bin every 10th bin) and used for PCA analysis instead of full samples. The metrics extracted from GLAS waveforms and their derivatives, used in this research, are listed in Table 1.

Extraction of vegetation and texture indices from optical data
As mentioned above, optical images were used to generate Lorey's height map. Several studies have shown relationship between forest structure and vegetation indices. Freitas, Mello, and Cruz (2005) evaluated relationships between forest structure (frequency of multiple-stemmed trees, density of trees, mean and range of tree diameter, mean and range of tree height and average of basal area) and vegetation indices including normalized difference vegetation index (NDVI) and moisture vegetation index (MVI) extracted from Landsat7-ETM+ images in Atlantic rainforest fragments in southeastern Brazil. They showed that MVI outperformed in dense humid forests, whereas NDVI is a good indicator of green biomass in deciduous and dry forests. They observed a weaker saturation effect and a higher sensitivity to MVI rather than NDVI over dense canopies in the Atlantic rainforest. Pascual, García-Abril, Cohen, and Martín-Fernández (2010) have reported high correlation between NDVI and MVI extracted from Landsat-ETM+ and mean and median lidar-derived heights (R > 0.6) in pine forests of the Fuenfria Valley in central Spain.
Nichol and Md.L.R (2011) estimated forest biomass based on simple ratio vegetation index (RVI) derived from AVNIR-2 and SPOT-5 with R 2 of 0.59 and 0.39, respectively. They observed a significant improvement in biomass estimation with an R 2 of 0.739 obtained from the combined use of RVI of both sensors.
In this study three vegetation indices, NDVI, MVI and RVI, were extracted from Landsat-5 TM and SPOT-5 multispectral bands using model maker in the ERDAS software (equations 2 to 4): Several other indices including minimum, maximum and mean values of NDVIs, MVIs and RVIs (So-called min-ndvi, max-ndvi, mean-ndvi, etc.), and also meansummer and mean-winter values were produced using vegetation indices from different dates. Image texture defined as variation of image tones that are related to the spatial distribution of forest vegetation (Roberts, Tesfamichael, Gebreslasie, Van Aardt, & Ahmed, 2007) has also proved to be capable of identifying different aspects of forest stand structure (Attarchi & Gloaguen, 2014;Kayitakire, Hamel, & Defourny, 2006;Nichol & Md.L.R, 2011;Trinder, Shamsoddini, & Turner, 2013). In this research eight statistical maps including "mean", "variance", "homogeneity", "contrast", "dissimilarity", "second moment", "entropy" and "correlation" were derived from gray level co-occurrence matrix (GLCM) texture analysis on mean NDVI image in ENVI. The GLCM characterizes the texture of an image by calculating how often pixel pairs with specific values and in a specified spatial relationship occur in an image and then statistical measures are extracted from this matrix.

RF regression to estimate Lorey's height from GLAS data
RFs, as an ensemble learning method developed by Breiman in 2001, operate by constructing a multitude of regression trees (Breiman, 1994). Each tree in the forest is made of a random subset of observations with replacement and also a random of explanatory variables. Principal components produced from PCA analysis ln: natural logarithm (the logarithm to the base e = 2.718), power n = 0.5, 1,. . ., 3, i = ith component produced from PCA Two important parameters in RFs are the number of trees in the forest and the number of variables in the random subset at each node of the tree. For regression application, prediction output of the forest is based on the average prediction of individual regression trees (Breiman, 2001;Liaw & Wiener, 2002). Through random sampling of observations, about one-third of them are not used for any individual tree and are called out of the bag, "OOB", for that tree. The accuracy of an RF's prediction can be estimated from these OOB data (Breiman, 2001;Grömping, 2009).
In this research, numerous RF regressions were developed based on different combinations of predictors (waveform metrics and PCs) using the "RandomForest" package in R provided by Liaw and Wiener (2014). In situ Lorey's heights collected at the location of GLAS footprints in 2013 and 2014 (60 plots) were used as response variable. Two important parameters in RF are mtry (number of random variables used in each tree) and ntree (number of trees used in the forest). We built RF with default mtry (square root of total number of all predictors) and different ntree values noting that more consistent results of RF could be achieved by increasing the number of ntree (Liaw & Wiener, 2002). The relative importance of the different metrics used in RF for the Lorey's height estimation was also analyzed. Variable importance is calculated by determining how much worse the OOB predictions would be if the data for that variable are randomly permuted (Liaw & Wiener, 2002). To validate and compare the developed models, statistical criteria including adjusted coefficient of determination (R 2 a ) as an indicator of the fit quality and root mean square error (RMSE) as a measure of accuracy were calculated based on a fivefold cross validation. Mean absolute error (MAE) and mean absolute percentage error (MAPE) were also calculated as a measure of dispersion and expression of accuracy in percentage, respectively.

ANN to estimate Lorey's height from GLAS data
ANNs are able to perform nonlinear modeling without a prior knowledge about the relationships between input and output variables. It is also a non-parametric and black-box model. Thus ANNs are more general and flexible modeling tool for forecasting.
A variety of neural network structures have been developed. Multilayer perceptron (MLP) is the most popular type of neural network and belongs to a general class of structures called "feedforward". It is composed of several layers of neurons (nodes); the first layer (input layer) for distributing the data into the network, the last one (output layer) for extraction result of the network, and remaining layers between input and output are called hidden layers. There are complete connections between neurons in successive layers. Each neuron, except input layer neurons, is obtained by computing weighted sum of previous layer neurons and applying an activation function. MLP utilizes a learning technique called backpropagation (BP) for training the network. This kind of ANN is based on supervised learning. The idea of BP algorithm is that output of neural network is evaluated against desired output. If results are not satisfactory, weights between layers are modified and process is repeated until error is small enough. The answer that emerges from a neural network's weights can be difficult to understand, and the network's training can take longer than certain other methods of machine learning such as RFs.
Generally, an MLP is characterized by the number of hidden layers, hidden neurons, output neurons and transfer functions. An MLP combined of one input layer, one output layer and one or more hidden layers. In theory a hidden layer with sufficient number of hidden neurons is capable of approximating any continuous function (Kaastra & Boyd, 1996;Zhang, Patuw, & Hu, 1998). The number of neurons in input layer equals to the number of variables, and in output layer depends on the application (usually one output neuron for regressions). There is no formula for setting an optimum number of hidden neurons. Katz (1992) indicated that optimum number of hidden neurons is between one-third and two or three times the number of input neurons. Bailey and Thompson (1990) suggested that the number of hidden neurons for a three-layer ANN should be 75% of the number of input neurons. The relationship between input and output of a neuron or network is determined by an activation function. Selection of activation function is arbitrary and is usually determined by response variable.
In this study, MLP models utilizing BP algorithm were developed using different combination of metrics (waveform metrics and PCs) as input neurons. The hyperbolic tangent and linear functions were used as activation function, respectively, in hidden and output layers. The networks were built using different number of hidden layers and their comprising neurons to find the optimum size of them. The in situ Lorey's heights measured in 60 plots were used as response values. The developed models were evaluated using fivefold cross validation statistics including R 2 a and RMSE.

Production of Lorey's height map
Since GLAS data do not provide spatially continues coverage of the study area, optical images and topographic data were employed to produce a wall-to-wall height map. Following steps were carried out to this end: (1) Generalizing GLAS height model to all GLAS data: As described in subsections 3.3 and 3.4, models estimating Lorey's height were built and validated using GLAS data and in situ measurements in 60 plots (collected on 2013 and 2014).
The best model, called GLAS height model hereinafter, was used to obtain Lorey's height all over the study area at the location of GLAS footprints (450 footprints from 2003 to 2009). (2) Developing new height model using the GLAS heights of all 450 footprints in the study area as reference and indices extracted from optical images and topographic information as predictors: In order to align all indices extracted from optical images with DEM-extracted variables spatially, they were resampled to 10 m resolution. Since the average size of GLAS footprints is 70 m in diameter, mean value of all indices in a 7 by 7 window at the center of GLAS footprints were calculated (mode value for categorical variables). Multiple linear regressions (MLR) and RF were used to develop height models which is called second height model hereinafter. For developing MLR models, the most correlated indices after testing the effect of each covariate on the Lorey's height, separately, were entered in a stepwise regression to find the best subset of variables. Selection of indices for RF models was based on both stepwise regression and important degree of indices. The main advantage of RF is its incorporation of continuous or qualitative predictors without making assumptions about their statistical distribution or covariance structure (Breiman, 2001). (3) Selection of best second height model: Statistical criteria were calculated based on a fivefold cross validation to validate, compare and select the best model. (4) Applying the best second height model on the study area and producing Lorey's height map: The best regression selected in the previous step was run on the entire study area and a continuous wall-to-wall height map was generated. The resulting map was validated using 32 in situ measurements over part of study area collected in August 2016.
Regression-kriging was also used to investigate the possibility of improvement in the height estimates by taking into account the spatial correlation between heights. Fayad et al. (2016) observed an improvement of about 2 m (RMSE decreased from 6.5 to 4.2 m) by applying kriging-regression on canopy height map provided for eucalyptus forests of French Guiana from combination use of GLAS and MODIS data. Regression-kriging involves spatially interpolating the residuals (the difference between the predicted and actual values) from a nonspatial model using kriging, and adding the results to the prediction obtained from the nonspatial model (Goovaerts, 1997). To do so, semivariogram analysis was applied to the regression residuals to quantify the spatial structure of canopy height. This method has been widely used to analyze spatial structures in ecology (Eldeiry & Garcia, 2010;Ge, Thomasson, Sui, & Wooten, 2011;Robertson, 1987). The fitted semivariogram was used in the kriging of the Lorey's height residuals and then defining the regression-kriging model. As its name indicates, regression-kriging consists of a regression part (m S 0 ð Þ) and a kriging part (z S 0 ð Þ): Whereẑ s 0 ð Þ is height value using regression-kriging method,m S 0 ð Þ is the fitted trend, z S 0 ð Þ is the kriged residual, λ i are the kriging weights determined by the spatial dependence structure of the residual and z(S i ) is the residual at location S i .

GLAS-based height using RF regressions
The five best RF models developed based on GLAS waveform metrics as predictors and in situ measurements as reference (60 plots measured in 2013 and 2014) were presented in Table 2. Generally, the four first models produced approximately the same result. Model 1 with four variables (metrics) including Ln (H 50 ), TI 1.5 (TI raised to the power of 1.5), W ext 2.5 and Ln(W ext ) produced an RMSE of 5.4 m. Statistic of MAPE which shows 26.9% of predictions of this model were off. All models presented in this table include TI or TI 1.5 , which indicates the importance of this variable in estimation of Lorey's height over sloped area. The prediction error of model 5 built using metrics W ext , TI, H trail and H lead is greater than the first four models. In fact models containing H trail or H lead showed less performance in comparison with the others. It could be because of uncertainties in extraction of H trail and H lead from waveforms broadened by terrain slope (Lefsky et al., 2007). Figure 4 shows estimated Lorey's height using models 1-5 versus in situ Lorey's height.
The RMSE of RF model based on PCs selected by stepwise regression (26 PCs) was high (RMSE = 8.5 m, R 2 a = 0.15). Since first three PCs explained 77.5% of variance in the data, RF models were developed based on these PCs. Table 3 shows statistics of three RF models based on PCs and Figure 5 demonstrates their result versus in situ measurement. The best result was obtained using model 3 which combined metrics W ext and TI with PCs ( Figure 5(c)). This model performed similar to RF models based on waveform metrics (Table 2). It should be noted that although W ext is a waveform metric, models including both PCs and W ext were still called PCs-based models, because W ext is independent of ground peak identification and easy to extract unlike other waveform metrics. The interesting point is that the PC 3 has greater importance degree than two first PCs in all models. It indicates more correlation between Lorey's height and PC 3 rather than PC 1 and PC 2 . In other words, even though the first PC has the largest variance of the data, it may include less useful information related to Lorey's height.
As it is seen in the Figures 4 and 5, RF regressions were not able to estimate forest height accurately in sparse stands of short trees or dense stands of tall trees.

GLAS-based height using ANN regressions
In order to access the optimal structure of ANN, numerous networks with different hidden layers and neurons and various iteration rates were assessed. As it is seen in Table 4, three-layer networks (one hidden layer) performed well in predicting forest height. Model 2 with 3 inputs (W ext 2.5 , Ln(W ext ), TI), 3 hidden neurons and 10 iteration estimated Lorey's height with an RMSE and R 2 a of 5.0 m and 0.72, respectively. The estimated Lorey's height using this model is illustrated versus in situ Lorey's height in Figure 6.
ANNs developed based on PCs performed slightly better than those based on waveform metrics. A simple neural network employing only three first PCs of PCA produced an RMSE and R 2 a of 4.7 m and 0.76, respectively (model 4, Table 4). Adding W ext as input variable in model 5 (Table 4) improved the result significantly (RMSE = 3.4 m and R 2 a = 0.87) (Figure 7). This model's predictions (12.3%) are off from true measurements. Adding TI did not improve the result comparatively to the model 5.
As it is seen in Figures 6 and 7, absolute error of predictions where there are short and tall trees (<10 m and >30 m) is lower in PCs-based ANN model rather than ANN model based on waveform metrics.

Production of canopy height map
Given a suitable result of GLAS-based Lorey's height model, production of a wall-to-wall canopy height map from the synergy of lidar, optical data and topographic data was taken under consideration. Among all RF and MLR regressions developed using GLAS heights as reference and indices extracted from optical images and DEM, the best result was obtained using an RF model combining TI, vegetation indices including min-ndvi, min-summer-ndvi, min-rvi, and RVI related to dates June 2010 and April 2015, and also texture indices including "mean" and "homogeneity" extracted from optical images. This model  produced an RMSE and R 2 a of 5.5 m and 0.59, respectively (Figure 8).
The fitted RF model was used to produce a wall-to-wall Lorey's height map which is observed in Figure 9. In total, based on general knowledge about the forest site and visual interpretation, the resulted map seems logical and reliable at least in large-scale studies. Comparison of H Lorey extracted from Lorey's height map with true H Lorey values at the location of 32 in situ plots collected in 2016 (dispersed randomly) produced an RMSE and R 2 of 4.3 m and 0.50, respectively. Figure 10 shows estimated H Lorey extracted from Lorey's height map versus in situ H Lorey .
The regression-kriging method was also used to produce canopy height map considering spatial correlation between canopy heights. The spatial structure of canopy height was analyzed using the semivariogram of height residuals, and an exponential model was fitted on them ( Figure 11).
However the residuals of RF Lorey's height model did not exhibit a strong correlation structure (refer to Figure 11), the regression-kriging method was under consideration to investigate the probability of improving the generated height map. Thus, based on information derived from the semivariogram, kriged layer of height residuals was created and added to the height map produced using RF regression (Figure 9). Comparison of the resulted map with in situ data (32 plots measured in 2016) showed no improvement in terms of height accuracy Table 4. Properties of some ANN models for estimation of Lorey's height based on waveform metrics and PCs of PCA and the resulted statistics (60 in situ samples collected in 2013 and 2014 were used as reference).    (RMSE = 4.3 m and R 2 = 0.54 after kriging, while it was 4.3 m and 0.50, respectively, before kriging). Thus, kriging of height residuals was performed using kriging weights calculated based on information derived from height residual's semivariogram. The kriged layer was added to the height map produced using RF regression. Comparison of the resulted map with in situ data (32 plots measured in 2016) showed no improvement in terms of height accuracy (RMSE = 4.3 m and R 2 = 0.54 after kriging, while it was 4.3 m and 0.50, respectively, before kriging).

Discussion
In this research, two nonparametric methods, RF and ANN, were tested and compared for estimation of Lorey's height based on two data sets extracted form GLAS data: waveform metrics and PCs. While waveform metrics are subject to uncertainty because of their reliance on identification of ground peak which is challenging over steep areas, PCA analysis was applied on waveform signal intensities and the produced PCs are less error-prone.
RF models developed using waveform metrics showed approximately similar performance to models based on PCs. Fayad et al. (2014) achieved the same result relating usage of these two data sets as covariates for developing GLAS height MLR and RF models. While previous studies overestimated GLAS forest canopy heights in different height classes over sloped terrain (Chen, 2010;Hilbert & Schmullius, 2012), the RF models developed in this study overestimate Lorey's height in the location of plots covered by sparse stands  with short trees (<10 m). This result is consistent with the Nelson's findings (Nelson, (2010) about weakness of GLAS data in measuring heights accurately in sparse boreal forests with maximum heights about 12 m. On the other hand, underestimation is also seen in some plots with tall trees (≥33 m) over steep area (between 40% and 50% except for one plot with 20% slope) of the current research. This is consistent with findings of Hayashi, Weiskittel and Sader (2014) especially in unmanaged units with largest variation between underestimation and overestimation. Rajab Pourrahmati et al. (2016) also observed over-and underestimation of GLAS Lorey's heights, respectively, in short sparse and tall dense stands over steep terrain, using an MLR.
Concerning ANN models, the best result was obtained using four input neurons including three first PCs of PCA and waveform extent "W ext ". This model estimated Lorey's height with an accuracy of 3.4 m over steep terrain. Sun, Ranson, Masek, Fu and Wang (2007) proposed an ANN model to predict maximum height using GLAS waveform metrics on flat terrain. The model estimated tree height with an RMSE of 3.4 m, and it shows overestimation for short trees (< 10 m). They recommended that similar results could be obtained in areas with steep slope. Parmar (2012) also developed an ANN model based on waveform metrics derived from GLAS that estimated height with an RMSE of 2.1 m on moderate slope terrain. The tree height values in Parmar's study ranges from 3.7 to 19 m, and the model tends to overestimate heights shorter than 14 m. The result obtained from the current study showed better performance over the results of Sun et al. (2007) and Parmar (2012) for short trees.
Comparing the result of RF and ANN models of the current study, generally, ANNs showed better performance. Three highlights of the best ANN model (RMSE = 3.4 m) are: (1) lack of uncertainties in input variables, PCs (unlike most waveform metrics which are error prone); (2) achieving higher accuracy rather than other models while using no auxiliary data (DEM); and (3) in contrast to RF models and findings of other studies, this model was able to estimate Lorey's height properly even in short sparse and tall dense stands over severe topography. However, RF is a strong prediction method in very large database and is able to take into account any predictor type; it is sensitive to the sample distribution in the training data set. According to Horning (2010), RFs cannot predict beyond the range in the training data. It is extremely important that the training data include samples that cover the entire range of response data values. Given that the data set used in this study is not very big, it may happen that training set which includes two-third of entire data in RF (Breiman, 2001;Grömping, 2009) is not representative of whole range of response values (heights). If so, it could be reasonable to obtain weaker result using RF algorithm in comparison with ANN.
In order to provide a wall-to-wall height map, GLAS-derived heights (using the best ANN model), spectral and textural indices extracted from optical images and topographic information were used to build height models using MLR and RF regressions. The best result was obtained from an RF model with an RMSE of 5.5 m and adjusted R 2 of 0.59. The mixed use of spectral and textural features generally showed better explanation of canopy height than either spectral or textural which is consistent with Maillard (2006) and Su, Sheng, Du, Ch and Liu (2015) findings for image classification. The resulting model was used to provide a height map at a spatial resolution of 30 m for the entire study area. Relatively good compatibility was observed between generated map and field measurements (RMSE = 4.3 m and R 2 = 0.50). It is worth noting that only 32 plots located in a small part of the study site were used for validation which corresponds to 3.2 ha of the mapped area (15,000 ha). More sample representatives of the entire study area are required to confirm reliability of the outcomes. The resulting map showed higher accuracy rather than height map produced by Peterson and Nelson (2014) that combined GLAS data with moderate resolution optical data of Landsat 7 ETM+ to provide a canopy height map over Alaska forests. They used a regression tree approach using GLAS heights and composite bands of Landsat 7 ETM+ data, DEM and derived slope and aspect. The overall accuracy and kappa coefficient were 0.54 and 0.128, respectively, for comparison between field observations and the mapped values. They confirmed that the accuracy of the generated map is poor at the local scale, and it may have good representation at the global scale.
The attempt for improving the precision of canopy height map using regression-kriging was unsuccessful in contrast with the result achieved by Fayad et al. (2016) that reported an improvement of about 2 m in terms of RMSE for forest canopy height map in relatively homogeneous flat forests. In our study, the exponential model fitted on the height residual semivariogram ( Figure 11) did not show strong spatial correlation. The model fitted on the semivariogram was quite flat that could be a consequence of high nugget effect which itself would be resulted from error in observation data (here GLAS-based heights) or low density of data set. Fayad et al. (2016) showed for lidar data sets with flight lines spacing below the range of the spatial autocorrelation of the height residuals, the precision of the canopy height estimates was at its highest. The heterogeneity of the study area highlights necessity of having a dense data set which is a limitation in the current research.
In total, there are several limitations in production of height map for the study area. The slope correction attempted here was parameterized using only 60 field plots that do not represent all slope conditions properly, especially in case of steepest slopes (greeter than 60%). The GLAS-based heights were obtained using local GLAS height models developed for a small part of study site which will lead to height discrepancy especially in heterogeneous forests. To ensure that as many GLAS footprints as possible were included in the analysis, the data from all GLAS laser campaigns from September 2003 on were included and processed using the same algorithms. These data were collected over a more than 5-year period, and therefore do not reflect a static moment in time. Lastly, the validation of the height maps is incomplete and is limited by a lack of field observations for many of the forested lands within study site.
Several studies have used GLAS data to derive canopy height over large regions, but have combined them with 250 m resolution MODIS data rather than 10-30 m resolution SPOT-5 and Landsat-TM data. These studies developed products that are global assessments of canopy height at a coarse scale (Fayad et al., 2016;Lefsky, 2010;Simard et al., 2011;Ting et al., 2015). In this study, we demonstrated GLAS data are also useful in mapping at finer resolution, although subject to the limitations identified herein. The resulting map provides a good understanding of the distribution of forest mean height across the study area in a short time and at the lowest cost. Indeed a larger database is needed to understand and quantify the various sources of error underlying the lack of complete correspondence between the field observations and the mapped height values, and to provide more accurate map over heterogeneous mountain forests for local management purposes. Still, we confirm usability of dedicated approach and data for providing height map with desirable precision at regional to global scale.
Referring imperfect orbital sampling pattern of the GLAS, we limited study area to a local scale to minimize uncertainties caused by insufficient well-distributed data over a vast area. Future space lidar multibeam profiler, ICESat-2/ATLAS, scheduled for launch in 2018 with increased orbital sampling density will mitigate this sampling problem. The use of applied methodology can be extended to the upcoming lidar instruments, ATLS and GEDI (Global Ecosystem Dynamics Investigation, scheduled for launch in 2019), and allows change detection and monitoring investigations.

Conclusion
This research aimed first to assess performance of two machine-learning algorithm (RF and ANN) in estimation of forest mean Lorey's height using GLAS data, and second to provide Lorey's height map using the synergy of GLAS and optical data.
Development of both nonparametric regression techniques of RF and ANN allowed us to estimate Lorey's height with an accuracy of about 5 m. Since the accuracy of estimated height is strongly influenced by topography factors, TI had a significant importance in most prediction models. However, we developed a neural network model relied only on GLAS-derived metrics (PCs and waveform extent "W ext ") that estimates Lorey's height with an accuracy of 3.4 m.
We also obtained a reasonable result combining spectral and textural information of SPOT-5 and Landsat TM images and topographic properties (TI) to create mean Lorey's height map using RF algorithm. A regression-kriging method was used to try to improve the precision of the obtained Lorey's height map. Kriged values of height residuals (the difference between reference GLAS heights and estimated heights by second height model) were added to the canopy height estimates obtained from RF regression. We did not observe an improvement in the precision of height map likely due to high distance between reference points and lack of spatial correlation between heights.
Although more research needs to be done focusing on approaches and auxiliary data increasing accuracy of structural information derived from spaceborne lidar in mountain forests, this study highlights the use of spaceborne lidar of GLAS as an efficient option for forest measurement even in regions with severe topography and heterogeneity in horizontal and vertical vegetation structure.