Generalized height-diameter models for five pine species in Southern Mexico

Abstract
Generalized height-diameter at breast height (D) models are essential for estimating the timber stocks of a forest stand, for generating the base information needed to develop forest growth models, and as basic inputs in the development of forest management plans. Generalized models were developed to estimate total height (TH) from D and stand variables for five Pinus species in managed forests of Ixtlán de Juárez, Oaxaca, Mexico. The data came from a timber forest inventory in which n = 1041 sampling plots of 1000 m² each were established under a stratified-systematic sampling design. The species, selected according to their relative abundance, were Pinus patula, Pinus oaxacana, Pinus ayacahuite, Pinus teocote and Pinus leiophylla. Five nonlinear equations were fitted with regression techniques to predict the TH of trees under several silvicultural regimes and forest management conditions. The goodness-of-fit criteria were the adjusted coefficient of determination (R²adj), the root mean square error (RMSE) and the absolute average bias of prediction (Ē). The predictive capacity of the equations was also assessed graphically. For these species, D and the stand variables (quadratic mean diameter, dominant diameter and dominant height) explained between 75 and 83% of the variability of the TH data. The predictor variables required to apply the generalized models demand little sampling effort and are derived from conventional forest inventory data, which reduces field-work costs and time.


Introduction
The allometric relationship between total height and diameter at breast height (TH/D) of trees is important to foresters because it provides the information needed to predict total or merchantable tree volume accurately by means of tree volume tables or percentage volume tables (Soares and Tomé 2002; Trincado and Leal 2006). Growth and yield projection systems require models that estimate the TH-D relationship over the projection period (Sharma and Parton 2007; Santiago-García et al. 2017), both for scheduling silvicultural treatments and for quantitatively estimating volume production by industrial use (Castedo-Dorado et al. 2005). These models are also used for biomass estimation, and in even-aged stands it is usual to include dominant height and age to estimate stand productivity through the site index (Lappi 1997; Juárez-de Galíndez et al. 2007).
In forest inventories, measuring the height of every tree is expensive, and height is also the variable measured with the greatest error (Zambrano et al. 2001); therefore, height measurement is frequently restricted to a subsample of trees, while diameter is usually measured on all trees in the sample (Barrio et al. 2004). Because TH and D are related, it is preferable, more practical and more economical to take a sample of trees, measure their TH and D as accurately as possible, fit these measurements to regression models, and then estimate the TH of the unmeasured trees (Diéguez-Aranda et al. 2005).
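The fit-then-predict workflow described above can be sketched in Python. This is a minimal illustration under stated assumptions: the local model TH = 1.3 + exp(a + b/D), the function names and the plot data are hypothetical choices, not one of the generalized equations fitted in this study, and the sketch deliberately ignores stand variables.

```python
import numpy as np

BREAST_HEIGHT_M = 1.3  # D is measured at 1.3 m, so predicted TH should exceed 1.3 m

def fit_local_hd(d_cm, th_m):
    """Fit the illustrative (hypothetical) local model TH = 1.3 + exp(a + b / D).

    The model is linearised as log(TH - 1.3) = a + b * (1 / D) and fitted by
    ordinary least squares; every sampled TH must be greater than 1.3 m.
    """
    d = np.asarray(d_cm, dtype=float)
    th = np.asarray(th_m, dtype=float)
    # np.polyfit with deg=1 returns [slope, intercept]
    b, a = np.polyfit(1.0 / d, np.log(th - BREAST_HEIGHT_M), 1)
    return a, b

def predict_th(d_cm, a, b):
    """Predict total height (m) for trees whose height was not measured."""
    d = np.asarray(d_cm, dtype=float)
    return BREAST_HEIGHT_M + np.exp(a + b / d)

# Hypothetical plot: D measured on all trees, TH only on a subsample
sample_d = [12.0, 18.0, 25.0, 34.0, 45.0]
sample_th = [9.1, 13.0, 16.4, 19.2, 21.3]
a, b = fit_local_hd(sample_d, sample_th)
unmeasured_d = [15.0, 30.0, 40.0]
print(np.round(predict_th(unmeasured_d, a, b), 1))
```

A negative b makes height increase with diameter toward an asymptote of 1.3 + exp(a), which is why this linearisable form is a common textbook choice for a quick local fit.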
Numerous linear and nonlinear equations have been used to fit height-diameter models (Huang et al. 1992; Sánchez-González et al. 2007). However, the heterogeneity of environmental conditions within the forest and the different silvicultural regimes applied to the stands mean that a single TH/D equation does not fit all stands or forest conditions well (Schröder and Álvarez 2001). To increase the efficiency of the equations and minimize fitting and prediction errors at the stand level, generalized models are used, in which the height of each tree is predicted from D together with one or more stand variables (dominant height (H0), quadratic mean diameter (Dq) and dominant diameter (D0)). These variables capture basic characteristics shared by all the local height regressions, each of which represents one stand (Sonmez 2009; Gómez-García et al. 2015).
Generalized height-diameter models can predict the height of any tree precisely in response to different management regimes, site qualities and stand densities (Temesgen et al. 2007; Crecente et al. 2010). The stand variables are chosen so that collecting them does not add significant costs to data collection (Diéguez-Aranda et al. 2005).
Studies developing generalized TH/D models for managed natural forests are scarce, because most have been developed for commercial forest plantations (Castedo-Dorado et al. 2006; Krisnawati et al. 2010; Milena et al. 2013). In Mexico, a few studies have developed equations describing the allometric relationship between TH and D, including Vargas-Larreta et al. (2009, 2016) in forests at El Salto, Durango; Corral-Rivas et al. (2014) in northwestern Durango; and Hernández et al. (2015) in eastern Hidalgo.
The aim of this study was to develop generalized height-diameter models to estimate total height from diameter at breast height and stand variables for five Pinus species growing in managed ecosystems in Ixtlán de Juárez, Oaxaca, Mexico. The hypothesis was that, by incorporating stand variables, generalized height-diameter models accurately describe the variability of total height in managed stands.

Study area and species selection
The study was carried out in a communal native forest of 7355 ha managed for timber production at Ixtlán de Juárez, Oaxaca, Mexico. The area lies between 17°18′16″ and 17°30′00″ N and 96°31′38″ and 96°22′00″ W, with altitudes ranging from 2200 to 3100 m. The region is located in the physiographic province "Mountain System of the North of Oaxaca." The climate is C(m)(w″)b(i′)g, temperate humid with summer rains. Mean annual temperature ranges from 10 to 26 °C in different areas, and annual rainfall varies between 800 and 1200 mm. The predominant vegetation type in the study area is pine-oak forest (STF 2015; Santiago-García et al. 2017).

Tree inventory and variables selection
The data set used for the analysis comes from 1041 sampling plots (1000 m² each) of a forest inventory carried out in 135 substands grouped into 18 stands, with a net inventoried area of 4538 ha. The overall sampling intensity was 2.3%, and the mean sampling intensity per substand was 4.53%. A stratified-systematic sampling design was used, with substands as strata because they are stand divisions with similar vegetation characteristics. The plots were distributed systematically and in proportion to substand size. In substands larger than 10 ha (70% of substands), plots were 224 m apart, whereas in substands of ≤10 ha (30% of substands) the spacing between plots was 100-200 m. The plots were located and established using ArcGIS-ArcMap 10.1®, Google Earth® and topographic charts of the area. At each plot, the following control data were recorded: plot number, UTM coordinates, altitude (m), aspect (N, S, E, W) and slope (%), using a Garmin® GPS and a Haglöf Sweden® clinometer.
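The 2.3% overall sampling intensity reported above follows from the plot count, plot size and inventoried area, and the arithmetic is easy to check in Python. The second function is a hypothetical extra: it assumes plots on a square grid (the text gives only the 224-m spacing, not the grid shape) to show the intensity that spacing would imply within a substand.

```python
M2_PER_HA = 10_000

def sampling_intensity_pct(n_plots, plot_area_m2, inventoried_area_ha):
    """Share (%) of the inventoried area that was actually measured in plots."""
    sampled_ha = n_plots * plot_area_m2 / M2_PER_HA
    return 100.0 * sampled_ha / inventoried_area_ha

def grid_intensity_pct(spacing_m, plot_area_m2):
    """Intensity of a square grid: one plot per spacing x spacing cell (assumed layout)."""
    return 100.0 * plot_area_m2 / spacing_m ** 2

print(f"Overall: {sampling_intensity_pct(1041, 1000, 4538):.2f} %")
print(f"224-m square grid: {grid_intensity_pct(224, 1000):.2f} %")
```

The first line gives about 2.29% (104.1 ha measured out of 4538 ha), which rounds to the reported 2.3%; under the square-grid assumption, a 224-m spacing with 1000-m² plots corresponds to roughly 2%.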
At each sampling plot, total height (TH, m) was measured on a random subsample of approximately 25% of the trees, so the number of height observations differs among species. TH was measured with a Haglöf Sweden® clinometer, and the diameter at breast height (D, cm) of all trees within the plot was measured with a Haglöf Sweden® diameter tape. The following stand variables were also estimated: quadratic mean diameter (Dq, cm; $D_q = \sqrt{40000 \cdot BA / (\pi \cdot NT)}$, where BA = basal area (m² ha⁻¹) and NT = number of trees per hectare), dominant diameter (D0, cm) and dominant height (H0, m). The last two were computed in proportion to plot size, following the definition of dominant height as the mean of the 100 tallest or thickest trees per hectare (Assmann 1970; Alder 1980).
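The plot-level stand variables described above can be sketched as follows. This is a minimal illustration, assuming BA in m² ha⁻¹ and NT in trees per hectare for Dq, and interpreting "proportionally" as taking 100 trees per hectare scaled to plot size (10 trees in a 0.1-ha plot); the function names and the example plot are hypothetical.

```python
import math

def quadratic_mean_diameter(d_cm, plot_area_ha):
    """Dq (cm) = sqrt(40000 * BA / (pi * NT)), with BA in m²/ha and NT in trees/ha."""
    # Basal area of one tree in m²: pi/4 * (D/100)**2 = pi * (D/200)**2
    ba_ha = sum(math.pi * (d / 200.0) ** 2 for d in d_cm) / plot_area_ha
    nt_ha = len(d_cm) / plot_area_ha
    return math.sqrt(40000.0 * ba_ha / (math.pi * nt_ha))

def dominant_mean(values, plot_area_ha, trees_per_ha=100):
    """Mean of the largest values for the 100-per-hectare dominant trees,
    scaled to plot size (10 trees in a 0.1-ha plot). If the plot has fewer
    trees than that, all of them are averaged."""
    n_dom = max(1, round(trees_per_ha * plot_area_ha))
    top = sorted(values, reverse=True)[:n_dom]
    return sum(top) / len(top)

# Hypothetical 0.1-ha (1000 m²) plot
d_plot = [8.5, 12.0, 15.3, 18.7, 22.4, 25.0, 27.8, 31.2, 34.5, 38.0, 41.6, 45.3]
print(round(quadratic_mean_diameter(d_plot, 0.1), 1))  # Dq
print(round(dominant_mean(d_plot, 0.1), 1))            # D0: mean of the 10 thickest trees
```

H0 would be computed with the same `dominant_mean` helper applied to tree heights. Note that Dq reduces to the square root of the mean of D², so it weights larger trees more heavily than the arithmetic mean diameter.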
The generalized height-diameter models were fitted with all the data obtained in the forest inventory. To obtain precise and unbiased estimates of individual tree heights under different growing conditions, the stand variables quadratic mean diameter (Dq, cm), dominant diameter (D0, cm) and dominant height (H0, m) had to be included as predictors in addition to D (López-Sánchez et al. 2003).
The generalized height-diameter models have the general functional form

$$TH = f(D, D_0, D_q, H_0, b_i)$$

where TH is total tree height (m); D is diameter at breast height (cm); D0 is dominant diameter (cm); Dq is quadratic mean diameter (cm); H0 is dominant height (m); and b_i is the vector of regression parameters.
For the fitting process, five generalized TH-D equations were selected (Table 1); these have been used in previous studies to describe this relationship in conifer species (Sharma and Parton 2007; Vargas-Larreta et al. 2009; Hernández et al. 2015).
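To show how a function of the form TH = f(D, D0, Dq, H0, b_i) makes one equation adapt to different stands, here is a small Python sketch. The specific expression below is a hypothetical illustration chosen for its simple behaviour; it is not one of the five equations (GM1-GM5) fitted in this study, and the parameter values are invented.

```python
import math

BREAST_HEIGHT_M = 1.3

def generalized_th(d, d0, dq, h0, b1, b2):
    """Hypothetical generalized H-D function (illustration only, not GM1-GM5).

    TH = 1.3 + (H0 - 1.3) * exp(b1 * (1 - D0/D) + b2 * Dq * (1/D0 - 1/D))

    Both exponent terms vanish when D == D0, so a tree of dominant diameter is
    predicted to reach the dominant height; Dq shifts the curve with stand density.
    """
    exponent = b1 * (1.0 - d0 / d) + b2 * dq * (1.0 / d0 - 1.0 / d)
    return BREAST_HEIGHT_M + (h0 - BREAST_HEIGHT_M) * math.exp(exponent)

# Same tree diameter in two hypothetical stands that differ only in dominant height
print(round(generalized_th(d=25, d0=35, dq=22, h0=24, b1=0.6, b2=0.02), 2))
print(round(generalized_th(d=25, d0=35, dq=22, h0=28, b1=0.6, b2=0.02), 2))
```

With positive b1 and b2, predicted height increases with D, and the same 25-cm tree is predicted taller in the stand with the higher H0. This is the stand-level adaptation that a single local TH/D curve cannot provide.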

Data analysis
Because the equations are nonlinear, parameters were estimated with PROC MODEL in the SAS 9.3® statistical package, using the full-information maximum likelihood (FIML) method. Goodness of fit was assessed with the adjusted coefficient of determination (R²adj), which expresses the variability explained by the model while accounting for the number of parameters; the root mean square error (RMSE), which measures the precision of the estimates in units of the dependent variable; and the absolute average bias (Ē), which indicates the deviation of the model with respect to the observed values. These indicators (Table 2) were used to compare the goodness of fit of the proposed models (Diéguez-Aranda et al. 2003).
Heteroscedasticity of the residuals is common when fitting allometric models: the residual variance increases as the value of the independent variable (x) increases. To correct this type of heteroscedasticity, a power function of x weighting the residual variance was included in the TH models (Álvarez-González et al. 2007):

$$\operatorname{Var}(\varepsilon_i) = \sigma^2 x_i^{k}$$

The exponent k was determined with the method of Harvey (1976), which uses the residuals of the unweighted fit ($\hat{e}_i$) as the dependent variable in the power variance model $\hat{e}_i^2 = \sigma^2 x_i^{k}$; taking logarithms gives the linear form $\log(\hat{e}_i^2) = a + k \times \log(x_i)$, with $a = \log \sigma^2$. The independent variables ($x_i$) tested were D, D0 and H0 and their combinations ($D^2 H_0$, $D^{0.5} H_0$, $H_0^2 D$, $D_0^2 H_0$), and the resulting weighting factor ($1/x_i^{k}$) was included in the fit to achieve homogeneity of the error variance.
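The three goodness-of-fit indicators can be sketched in plain Python. Because the equations of Table 2 are not reproduced here, the formulas are assumptions based on common definitions in this literature: R²adj = 1 - [(n - 1)/(n - p)]·SSE/SST, RMSE = √[SSE/(n - p)] with p fitted parameters, and Ē taken as the absolute value of the mean residual. Some authors divide by n instead of n - p in the RMSE.

```python
import math
from typing import Dict, Sequence

def goodness_of_fit(obs: Sequence[float], pred: Sequence[float], p: int) -> Dict[str, float]:
    """R2adj, RMSE and absolute average bias for a model with p parameters (assumed formulas)."""
    n = len(obs)
    resid = [o - e for o, e in zip(obs, pred)]           # observed minus predicted
    sse = sum(r * r for r in resid)                      # residual sum of squares
    mean_obs = sum(obs) / n
    sst = sum((o - mean_obs) ** 2 for o in obs)          # total sum of squares
    r2_adj = 1.0 - (n - 1) / (n - p) * sse / sst
    rmse = math.sqrt(sse / (n - p))
    e_bar = abs(sum(resid)) / n                          # absolute average bias
    return {"R2adj": r2_adj, "RMSE": rmse, "E": e_bar}

# Hypothetical observed and predicted heights (m) for a 2-parameter model
stats = goodness_of_fit([10, 12, 15, 20], [11, 12, 14, 19], p=2)
print({name: round(value, 3) for name, value in stats.items()})
```

Penalising by n - p in both R²adj and RMSE keeps models with more parameters from looking better simply because they have more freedom to fit the data.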
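Harvey's procedure for estimating k amounts to an ordinary least-squares regression of the log squared residuals on log x. The Python sketch below assumes the residuals come from a previous unweighted fit and that x is one of the candidate variables (for example D²H0); the function names are hypothetical, and the weights are simply the inverse of the fitted variance function.

```python
import numpy as np

def harvey_exponent(resid, x):
    """Estimate k in Var(e_i) = sigma**2 * x_i**k following Harvey's (1976) approach.

    Regress log(e_i**2) on log(x_i) by OLS; the slope is k and the intercept
    a estimates log(sigma**2). Residuals must be non-zero and x must be positive.
    """
    e = np.asarray(resid, dtype=float)
    x = np.asarray(x, dtype=float)
    k, a = np.polyfit(np.log(x), np.log(e ** 2), 1)  # [slope, intercept]
    return k, a

def variance_weights(x, k):
    """Weights 1 / x_i**k, the inverse of the assumed variance function."""
    return 1.0 / np.asarray(x, dtype=float) ** k

# Hypothetical residuals whose spread grows with x
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
e = np.sqrt(0.5 * x ** 1.5) * np.array([1, -1, 1, -1, 1])
k, a = harvey_exponent(e, x)
print(round(k, 3), np.round(e ** 2 * variance_weights(x, k), 3))
```

After weighting, the squared residuals are constant (0.5 in this constructed example), which is exactly the homogeneity of variance that the weighting factor is meant to achieve. With real inventory residuals the fit is not exact, and each candidate x is compared on how well it removes the trend.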

Results and discussion
The dataset covered most of the existing diameter and height classes. A total of 4899 individual tree heights were used in this study (Table 3); Table 3 also gives the minimum, maximum and mean values, standard deviations and coefficients of variation of the stand variables. Based on the parameter estimates of each fitted equation, their goodness-of-fit statistics and significance levels, total height was predicted precisely using diameter at breast height (D) and the stand variables quadratic mean diameter (Dq), dominant diameter (D0) and dominant height (H0) as predictors (Table 4). The stand variables allowed the equations to adapt to a wide range of stand conditions, since they are directly related to site productivity and to the level of competition within the stand (Misir 2010).
The most widely used goodness-of-fit statistic for a regression model is the adjusted coefficient of determination (R²adj). The models for the five pine species are considered adequate because they explain at least 75% of the variation observed in the dependent variable (Table 4). In addition, the low values of the other goodness-of-fit statistics, RMSE (≤4.3 m) and Ē (0.045 m), show that the deviation and average bias incurred when predicting total height are small (Table 4).
The GM3 model (Table 1) proposed by Sharma and Parton (2007) was the most appropriate for modeling the total height of the pine species in this study, giving the best results for three species: P. ayacahuite, which showed the highest R²adj (0.83), followed by P. oaxacana (0.82) and P. patula (0.75) (Table 4). These values are lower than the 0.87 reported by Vargas-Larreta et al. (2009) for pine species in northern Mexico, but agree with the R²adj = 0.84 obtained for Pinus kesiya in South Africa (Missanjo et al. 2013). For P. leiophylla, the GM4 equation proposed by Wang and Tang (2002) gave the best accuracy and precision (0.83), while the GM5 equation proposed by Nilson (1999) showed the most satisfactory results for P. teocote (0.80) (Table 4). These results contrast with those of Hernández et al. (2015), who used the modified Nilson model (Nilson 1999) in forests of Hidalgo and obtained values higher than 0.93.

Table 2. Statistical indicators to assess the goodness of fit of generalized height-diameter models: adjusted coefficient of determination (R²adj), root mean square error (RMSE) and absolute average bias (Ē).

Table 1 legend: TH: total tree height (m); D: diameter at breast height (cm); D0: dominant diameter (cm); Dq: quadratic mean diameter (cm); H0: dominant height (m); exp: exponential function; b_i: parameters to estimate.
All five equations tested in this study explained a high percentage of the total variation in total height. Vargas-Larreta et al. (2009) reported values higher than 0.79 for the genus Pinus with the same equations, while Corral-Rivas et al. (2014) obtained similar values (0.82) with this type of equation for a group of Pinus species. In contrast, Trincado and Leal (2006) and Milena et al. (2013), working in plantations of Pinus radiata and Eucalyptus tereticornis, respectively, obtained values higher than 0.90, and Hernández et al. (2015) reported R²adj values higher than 0.93.
The high precision of the five equations is due in part to the inclusion of dominant height as an independent variable. López-Sánchez et al. (2003) found that dominant height is the necessary basis for acceptable predictions of individual tree height. Moreover, in most growth modeling studies both tree and stand development are linked to dominant height (Borders 1989; Calama and Montero 2004), because it indicates site quality in terms of stand growth and yield (Eerikäinen 2003). Including the quadratic mean diameter as an explanatory variable accounts for the level of competition in the stand, since Dq is closely related to the number of trees per hectare. Stand density is the main factor affecting the TH/D relationship, particularly for trees growing in managed stands (Huang and Titus 1994).
The equations explained between 75 and 83% of the tree height variability for the five Pinus species. These results agree with Canga et al. (2007), who state that D, dominant height, quadratic mean diameter and dominant diameter are the main variables explaining the observed variability of total tree height. The high variability in tree height can be attributed to the forest management practices applied, since the forests of Ixtlán de Juárez have been managed for timber production for the last six decades (Castellanos-Bolaños et al. 2008). Silvicultural practices and stand structure therefore have a significant influence on the modeling of TH/D relationships (Saunders and Wagner 2008).
Correcting heteroscedasticity in the selected models yielded low average bias values (Table 4), and the residuals showed a homogeneous distribution (Figure 1). For two pine species, the Breusch-Pagan test did not meet the decision rule (Pr ≥ 0.05); that is, the hypothesis of homoscedastic residuals was rejected (Table 4). However, the graphs show that the statistical fit and the results for these species are reliable, since the average bias is minimized and R²adj is maximized (Trincado and Leal 2006). In addition, the predicted values follow the trend of the observed data (Figure 1).
For the absolute average bias, the lowest value was obtained with the GM3 model for P. oaxacana (0.002) and the highest with the GM5 equation for P. teocote (0.162) (Table 4). With the root mean square error taken as a measure of probable error, no major differences were observed between the models; values ranged between 3.4 and 4.3 m.
The best equations for each species were selected on the basis of desirable mathematical properties: a sigmoidal curve shape, enough parameters to provide flexibility without compromising parsimony, and parameters that can be interpreted biologically in a reasonable way (Fekedulegn et al. 1999; Peng et al. 2001). Figure 1 illustrates the predictive capacity of the equations for the five Pinus species.
Applying the generalized equations developed here only requires estimating the stand variables, which involves little sampling effort in terms of time and cost, without loss of precision. This type of tool can therefore reduce the cost and time of collecting forest measurements in stands (Gómez-García et al. 2013). Species-specific equations are important because each species has particular growth habits. In addition, these equations facilitate the quantification of existing timber resources.
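The Breusch-Pagan decision rule mentioned above can be illustrated with a small Python sketch. This shows the studentized (Koenker) variant with a single regressor, which is an assumption: the exact variant and regressors used for Table 4 are not stated in the text. The statistic is LM = n·R² from regressing the squared residuals on x; under homoscedasticity it follows a chi-square distribution with 1 degree of freedom, whose survival function is erfc(√(LM/2)).

```python
import math
import numpy as np

def breusch_pagan_1d(resid, x):
    """Koenker's studentized Breusch-Pagan test with one regressor.

    Returns (LM, p_value); p_value >= 0.05 means homoscedasticity is not rejected.
    """
    e2 = np.asarray(resid, dtype=float) ** 2
    x = np.asarray(x, dtype=float)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    if ss_tot == 0.0:              # constant squared residuals: no evidence of heteroscedasticity
        return 0.0, 1.0
    slope, intercept = np.polyfit(x, e2, 1)
    ss_res = np.sum((e2 - (intercept + slope * x)) ** 2)
    lm = max(0.0, len(e2) * (1.0 - ss_res / ss_tot))  # clamp tiny negative rounding errors
    p_value = math.erfc(math.sqrt(lm / 2.0))          # chi-square (1 df) upper tail
    return lm, p_value

# Hypothetical residuals whose magnitude grows with x
x = np.arange(1, 51, dtype=float)
lm, p = breusch_pagan_1d(x * (-1.0) ** x, x)
print(round(lm, 1), p < 0.05)  # p < 0.05: the decision rule Pr >= 0.05 is not met
```

A p-value below 0.05 means the residual variance still depends on x, which is the situation reported for two of the species.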

Conclusions
The fitted generalized height-diameter models are reliable for predicting individual tree heights of the species analyzed. They explained between 75 and 83% of the variability in the observed heights of the five Pinus species in forests managed for timber production in Ixtlán de Juárez, Oaxaca, Mexico. The predictor variables needed to use the models are diameter at breast height, dominant diameter, quadratic mean diameter and dominant height, which require little sampling effort. These variables are recorded in most forest inventories and can be projected into the future with growth equations. The equations developed here can be applied in forest inventories and as basic inputs to growth and yield models when preparing forest management plans.
These silvicultural tools can significantly reduce the time and costs invested in fieldwork.