Above-ground biomass estimation for Quercus rotundifolia using vegetation indices derived from high spatial resolution satellite images

ABSTRACT The estimation of vegetation parameters, such as above-ground biomass, with high accuracy using remote sensing data, represents a promising approach. The present study develops models to estimate and map above-ground biomass of Mediterranean Quercus rotundifolia stands using one QuickBird satellite image in pan-sharpened mode, with four multispectral bands (blue, green, red and near infrared) and a spatial resolution of 0.70 m. The satellite image was orthorectified, geometrically and radiometrically corrected. Object-oriented classification methods and multi-resolution segmentation were used to derive a vegetation mask per forest species. Data from forest inventory (24 plots) and vegetation indices (NDVI, EVI, SR and SAVI) derived from high spatial resolution satellite images were used for an area of 133 km2, in southern Portugal. The statistical analysis included correlation, variance analysis and linear regression. The linear regression models fitted included the arithmetic mean and the median values of the vegetation indices per inventory plot as explanatory variables. The overall results of the fitted models show a trend of better performance for those with the median value of the vegetation index as the explanatory variable. The best fitted model (R2 = 75.3) was associated with the Simple Ratio (SR) median value as an explanatory variable. A Quercus rotundifolia above-ground biomass map was produced.

For the Mediterranean region, Gómez et al. (2012) explored the potential of QuickBird images in pine forest structure characterisation to assess wood volume and biomass, as quadratic mean diameter, basal area and number of trees per unit area, using the Classification and Regression Trees (CART) method; Viana et al. (2012) analysed different approaches to estimating crown biomass of Pinus pinaster stands and shrubland above-ground biomass with a remote sensing predictor in order to obtain large-scale forest biomass mapping. However, above-ground biomass estimation for holm oak (Quercus rotundifolia) using high spatial resolution remote sensing data and vegetation indices was not found in the literature.
Quercus rotundifolia is one of the most important forest trees in the dry areas of the Iberian Peninsula. This species is native to the Mediterranean basin and distributed from Portugal to Syria (Forey, 1996;Valdés, Talavera, & Fernández-Galiano, 1987). According to the national forest inventory, in Portugal holm oak stands account for approximately 413,000 ha, corresponding to 13.0% of the Portuguese forest area with 27.4% being located in Alentejo (IFN5, 2010). Holm oak usually occurs in low-density stands in a silvopastoral system, called montado, with open heterogeneous canopies, where the main product is the fruit, associated with extensive grazing. However, due to the wood's high calorific value, the timber is particularly important for firewood and charcoal, being an alternative source of energy.
The objective of this study is to develop models to estimate above-ground biomass at local and regional scale, using high spatial resolution satellite images, for Quercus rotundifolia pure stands for a region of southern Portugal and which could also be used in other regions for the same species, with similar climate and stand characteristics. The main goal was to develop an above-ground biomass empirical model for Quercus rotundifolia, dependent on vegetation indices derived from high spatial resolution satellite images (QuickBird, with pixel size of 0.70 m), and map biomass. In literature, the biomass models most frequently use the mean of a vegetation index as an explanatory variable. Nonetheless, if the vegetation index does not have a normal distribution, the arithmetic mean, can result in biased estimations of above-ground biomass. An alternative is to use the median values. In this study both the mean and the median vegetation index values were used to fit the functions and the performance of the models was compared.

Study area
The study area, covering approximately 133 km 2 , is located in southern Portugal (central coordinate 8º 4ʹ 53.98ʹ'W and 38º 51ʹ 16.12ʹ'N). This region is characterised by a Mediterranean climate and the terrain is marked by plains, with low altimetry variation (mean elevation of approximately 200 m). The forest is composed mainly of evergreen oaks, namely, holm oak (Quercus rotundifolia) and cork oak (Quercus suber), with small patches of Pinus pinaster, Pinus pinea and Eucalyptus globulus in both pure and mixed stands.
The pre-processing methods applied to the image were the geometric and radiometric correction. The geometric correction was done based on ground control points collected with a Global Navigation Satellite System (GNSS) and geodetic vertices identified both on the ground and in the image, using ENVI version 4.8 (ENVI 2009). The Root Mean Square Error (RMSE) was 0.49 m. The image was orthorectified using Rational Polynomial Coefficients (RPCs) based on digital terrain elevation from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER).
A vegetation mask per forest species was obtained based on satellite image, where the trees crowns were delimited and identified by specie. For this purpose, the object-oriented classification image methods were used. The image classification was based on objects instead of pixels, contrary to the traditional image classification (Frohn, Autrey, Lane, & Reif, 2011). Three main steps were followed all executed in the eCognition Developer software v. 8.0.1 (Definiens Imaging, 2010) ( Figure 1), (i) contrast split segmentation algorithm, the variables considered were the QuickBird bands and the Normalized Difference Vegetation Index, NDVI (Rouse, Haas, Schell, & Deering, 1973), calculated based on the difference between NIR and RED bands, isolating the tree crowns from the other land use and vegetation types; (ii) multiresolution segmentation method, was used to isolate and delimit the tree crowns or clusters of tree crowns within the resulting objects from the previous step. For this purpose, scale compactness and shape parameters, with thresholds chosen empirically from parameters such as tone or spatial pattern; (iii) the objects representing tree crowns and tree crowns clusters were classified per forest species using the nearest neighbour classification method (for more details on methodology see Sousa, Mesquita, Gonçalves, and Marques da Silva (2010).
The study area was divided into a 45.5 m x 45.5 m (2,070.25 m 2 ) grid, equivalent to 65 by 65 image pixels with 0.70 m of spatial resolution. The vegetation mask per species was used to determine composition per grid as well as crown cover in percentage, using a GIS software ArcGIS Desktop v.10.4 (Esri, 2010). When crown cover of one species was equal to or larger than 75%, the grid was considered pure. Pure grids of Quercus rotundifolia were divided into two strata: (i) 10% to 30% crown cover; and (ii) crown cover higher than 30%. Forest inventory plots, each corresponding to one grid, were selected by random stratified sampling, by proportional allocation.

Forest inventory
The forest inventory data set, collected in 2011, is composed of 24 pure plots, of which 15 are monospecies and 9 are multispecies of Quercus rotundifolia and Quercus suber. All individuals with a diameter at breast height of 6 cm or greater belong to the main stand. For these trees, dendrometric parameters were measured, namely diameter at breast height, total height and crown radii in 4 directions (North, South, East and West) (Avery & Burkhart, 1994) and their geographical location was recorded by GNSS. Aboveground biomass per tree (Eq. 4) was estimated using the allometric functions equations (1) -(3) of Paulo and Tomé (2006), where ww, wb and wc are respectively the biomass of wood (in kg); bark (in kg); and crown (in kg); d is the tree diameter at breast height (in cm) and AGB is the aboveground biomass (in kg). Above-ground biomass per plot was calculated as the sum of all trees biomass per area unit (t/ha).

Vegetation indices
Vegetation absorbs energy selectively. Due to the photosynthetic pigments the leaves absorb predominantly in the RED region of the electromagnetic spectrum, while due to their internal structure they reflect in the NIR. The vegetation indices use the combination of these bands to quantify and maximize the contrast of reflectance in these two spectrum regions and relate it to the plants' properties (Berra et al., 2012;Mutanga & Skidmore, 2004;Ponzoni & Shimabukuro, 2007;Van Der Meer et al., 2016). Four vegetation indices were calculated based on the individual bands of a QuickBird image, using Spatial Analyst by ArcGIS (Esri, 2010), namely (Table 1): the Normalized Difference Vegetation Index (NDVI); the Enhanced Vegetation Index (EVI); the Simple Ratio (SR); and the Soil-Adjusted Vegetation Index (SAVI). All combine information from two or more spectral bands to enhance vegetation signal while minimising soil, atmospheric and solar irradiance effects (Jackson & Huete, 1991).
NDVI is the most extensively used vegetation index, based on the fact that vegetation is highly reflective in the NIR region and strongly absorbing in the RED (Rouse et al., 1973). EVI was developed to optimize the vegetation signal with improved sensitivity in high biomass areas and improved vegetation monitoring by reducing the impact of canopy background noise and atmospheric influences (Huete, Justice, & Van Leuwen, 1996;Huete, Liu, Batchily, & Leeuwen, 1997). SR is calculated by dividing the NIR and RED band (Jordan, 1969) and intends capture the contrast between the RED and NIR bands for vegetated pixels. SAVI was developed by Huete (1988) especially for areas with sparse vegetation cover, to reduce the soil background confounding effects. The L values vary from 0 to 1, for areas of 100% and 0% crown cover or no vegetation, respectively. According to the same author, L = 0.5 is reasonable for a wide range of soil types. In this study the crown cover presents some variation, so the primarily recommended L-value was used (L = 0.5).   (Jordan, 1969) Soil Adjusted Vegetation Index Huete, 1988) Statistical analysis and above-ground biomass estimation Statistical analysis included correlation analysis, variance analysis and linear regression. The first was carried out with the Spearman correlation test, as data had a non-normal distribution assessed with the Shapiro Wilk normality test (Shapiro, Wilk, & Chen, 1968). The variance analysis between the mean and median values of the vegetation indices was done with a non-parametric test, the Wilcoxon test, as variables did not meet normality assumptions (Sheskin, 2007). For the latter the ordinary least squares linear regression (Eq. 5, where β O is the constant, β 1 the slope and VI the vegetation index) was used, as suggested by several authors (eg Austin, Mackey, & Van Niel, 2003;Drake et al., 2002a;Drake, Dubayah, Knox, Clark, & Blair, 2002b;Popescu, Wynne, & Nelson, 2003;Rauste & HäMe, 1994). The vegetation index value per grid was calculated as the arithmetic mean and the median of all pixels within each grid. The statistical properties of the models allow the selection of functions that potentially fit the data better and/or present better predictive ability (Burkhart & Tomé, 2012;Pretzsch, 2009). These were evaluated using the sum of squares of the residuals (SQR), the coefficient of determination (R 2 ) and the adjusted coefficient of determination (R 2 aj ). Complementary PRESS (Eq. 6, where y are the observed values, ŷ are the estimated values, i is the number of the observation, i = 1. . .n) and APRESS (Eq. 7) and their average values (PRESSm and APRESSm) statistics were used. The first is the sum of the square of the PRESS residuals and the second the sum of the absolute value of PRESS residuals. According to Clutter, Fortson, Pienaar, Briester, and Bailey (1983) and Myers (1986) these criteria can be used as a true validation test, considering the best model to be the one that has the lowest PRESS and APRESS values. The models were ranked according to the aforementioned criteria and the best model was the one with the lowest values. The heteroscedasticity associated with the error term of the models was evaluated graphically, plotting the studentised residuals against the estimated values. The normality of the studentised residuals was carried out using the normal Quantile -Quantile plots (QQ plots) and with the Shapiro Wilk normality test, for a probability level of 0.001 (Montgomery & Peck, 1982;Myers, 1986). The statistical analysis was implemented in R statistical software (R Development Core Team, 2012).
APRESS ¼ X n i¼1 y i Àŷ i;À1 The main steps of the methodology applied in this study are presented in the workflow diagram presented in Figure 1.
Results and discussion Figure 2 shows the result of the multi-resolution segmentation and object-oriented classification process for a small area of the image, used for Quercus rotundifolia pure stands. The agreement between the classification and ground truth obtained by the Kappa statistic (Congalton, Oderwald, & Mead, 1983;Stehman, 1996) was 78% and the global precision was 89%. The kappa statistic level of agreement is indicative that the methodology used to obtain the vegetation mask, for tree crown delimitation, has a good performance. Two reasons can be identified for the results attained. The first is the contrast, or the different spectral response, between the tree crowns and other land uses, especially those of the understory vegetation. The acquisition date of the images was August, which in the Mediterranean region corresponds to a warm and dry season and the end of the phenological cycle of the vegetation of the understorey. Thus, while the tree crowns of Quercus rotundifolia are photosynthetically active, the other vegetation has no photosynthetic activity, which enhances the contrast of the reflective response of the trees and the understorey vegetation, resulting in good tree crown delimitation accuracy. The second reason is related to the tree species identification. Vaz et al. (2011) analysed these two forest species, concluding that Quercus suber has higher photosynthetic potential than Quercus rotundifolia, which results in higher values of reflectance for Quercus suber than for Quercus rotundifolia. The spectral response is sufficiently different between these two species allowing their differentiation with accuracy. Two other aspects that give some insight into these results are the density and the spatial arrangement of the trees. Most Quercus rotundifolia stands are open that is have low number of trees per area unit and as a consequence many trees are isolated, consequently the crowns do not touch each other, and frequently are at a considerable distance from their neighbouring trees, which enables a better delimitation of the tree crowns and species identification. However, and in spite of many trees being isolated, the spatial arrangement is irregular, which means that some trees have a cluster spatial arrangement, although less frequently. In these clusters the separation of the individual tree crowns and species identification is more difficult and sometimes not possible. Nevertheless, as the number of clusters per area unit is rather small it is possible to obtain a good performance with the methodology used. Table 2 shows the descriptive statistics for the data considered in this study. The multispecies plots have a number of Quercus suber individuals of 5 treesha −1 in 6 plots, 10 treesha −1 in one and 14 treesha −1 in two, corresponding to an average of about 15% of the stems. The stem diameters vary between 8.7 cm and 84.7 cm and the crown diameters between 2.0 m and 13.3 m. Crown cover (CC) varies between 13.7% and 67.6%, covering the two previously defined strata.
The vegetation indices are relatively different from each other, with SR and EVI having the smallest and the greatest coefficient of variation (CV), respectively. The coefficient of variation for the median values per plot and per vegetation index is larger when compared with the mean values ( Table 2).
The Spearman correlation analysis, for a significance level of 95%, showed strong positive correlations between above-ground biomass (AGB) and vegetation indices for their mean and median values per plot, being larger for NDVI and SR (Table 3), corresponding to the vegetation indices with the lower coefficient of variation (Table 2).
In accordance with our results, positive correlations were found by Pinheiro, Durigan, and Adami (2009) for NDVI (0.481), SR (0.443) and SAVI (0.477), though weaker. The difference can be explained by the different development of shrub layers in both stands; in our study the shrub layer had low density while in that of Pinheiro et al. (2009) it had high density.
Contrary to our results, in the work of Watzlawick, Kirchner, and Sanquetta (2009) on Araucaria angustifolia stands and of Watzlawick et al. (2006) on Pinus taeda stands, both with very high density, found negative correlations between biomass and NDVI (-0.740 and -0.800, respectively) and SR (-0.710 and -0.750, respectively). According to several authors (Ponzoni & Shimabukuro, 2007; Wang,  Table 2. Above-ground biomass (AGB), Crown cover (CC), Number of trees per hectare (N), Basal area (G) and vegetation indices descriptive statistics (where min is the minimum value, max the maximum value, med the arithmetic mean, SD the standard deviation and CV(%) the coefficient of variation).  Brandão, & Teixeira, 2013), an increase in crown cover does not increase NDVI due to its quick saturation, because the index stabilizes, even though the density of the canopy increases, turning it insensible to biomass increase (Jensen, 2000). The strong correlation coefficients between vegetation indices and above-ground biomass seem to be associated with the time of the year that the image is taken. For Mediterranean countries, in summer (dry season), the understorey has a very light colour, due to the presence of dry vegetation and soil, which maximizes the contrast between the Quercus rotundifolia trees and the dry herbaceous background. The image was acquired in august, when the understorey had mainly dry vegetation and soil, resulting in high contrast and also very high correlation coefficients. Accordingly, better results for above-ground biomass estimation using Landsat ETM+ images were obtained in the dry season as opposed to other seasons of the year by Nguyen et al. (2015). The same authors stress that the satellite image data, which is related to the vegetation phenological cycle and season of the year, is a relevant factor in these types of studies, as evaluating the vegetation properties using remote sensing data with a good level of accuracy depends on the contrast in the spectral signatures of the different types of vegetation. Thenkabail et al. (2004) and Carreiras et al. (2006) also refer to this type of phenomena in their studies, for very high and medium spatial resolution images and aerial photos, respectively.
The statistical properties of the models can be seen in Table 4. The results of this study demonstrate that the above-ground biomass models with a vegetation index as an explanatory variable have a good performance, thus confirming that sensitivity can be maximised in the evaluation of the photosynthetically active vegetation (Günlü, Ercanli, Sönmez, & Baskent, 2014). This is related to the accuracy of the delimitation of the tree crowns and species identification, which are dependent on the type of the stand and the contrast of the spectral signature reflectance of the different land uses present, as previously mentioned. The overall analysis of the fitted models reveals that those with the median values of the vegetation indices as an explanatory variable have a better performance than the corresponding models with the mean as an independent variable. This might be related to the non-normality of the values of the vegetation indices. The median, being the midpoint of a frequency distribution of observed values, is not skewed by very large or very small values, thus it is a robust statistic when data does not have normal distribution.
The model with the best performance has the median of SR ( f SRÞ as explanatory variable, the simpler vegetation index. When analysing PRESS and APRESS statistics it can be seen that M1 has the lowest values. The best function with mean values per plot is M5, with NDVI as independent variable. The models with EVI and SAVI as independent variables have worse statistical properties than those with SR and NDVI, for both the mean and median values per grid. This fact indicates the good relationship between the simplest vegetation indices and certain vegetation parameters, in this case with the aboveground biomass. Analysis of the studentised residuals of M1 and M6 (Figure 3) and M2 and M5 (Figure 4) does not show systematic variations, the graphics of normal probability approach a straight line and the normality of the residuals is not met by the Shapiro-Wilk test. In literature few references were found for biomass estimation as a function of vegetation indices using high spatial resolution satellite images. The results of this study are consistent with those of several authors, though with higher coefficients of determination. Thenkabail et al. (2004) reported that the best model for biomass estimation in an oil palm plantation in Africa was a NDVI function with R 2 of 72 %. Clerici et al. (2016) stated that the best model for above-ground biomass estimation for periurban Andean secondary forest was a linear function with the mean value of SR as an independent variable,  with R 2 of 48%. Similarly, other authors (Watzlawick et al., 2009(Watzlawick et al., , 2006 reported that simple linear models with SAVI and NDVI as independent variables have the best predictive abilities for above-ground biomass estimates, with an adjusted coefficient of determination of 57% and 53%, respectively. Furthermore, analogous results were found in several studies based on vegetation indices derived from medium and coarse spatial resolution satellite images (Araújo, 1999;Dong et al., 2003;Häme et al., 1997;Maciel et al., 2009).
An example of the application of model M1 for above-ground biomass mapping is presented in Figure 5 for a small area. It was obtained calculating above-ground biomass per grid. It has the advantage of enabling above-ground biomass spatial distribution analysis for the whole study area.
This methodology, which uses high spatial resolution satellite images, makes an important contribution towards forest planning, management and silviculture in decision-making as it allows the production of biomass maps (model applied over a grid) through simple procedures, and is less timeconsuming than other methodologies, for example forest inventory and spatial analysis. It is also important to forest inventory design as a more cost-efficient sampling can be achieved at local and regional scale (McRoberts, Tomppo, & Naesset, 2010).

Conclusions
The rapid advancement of remote sensing technology expands the choice of the imagery sources. The increase in the availability of high spatial resolution images allows vegetation studies with a high level of accuracy at different scales.
This study implemented a comprehensive methodological framework for above-ground biomass prediction using high spatial resolution images. The results show that images taken in summer in Mediterranean countries can be used to estimate Quercus rotundifolia stands as a function of vegetation indices. The acquisition image date can be a limitation, because for the Mediterranean region high spatial resolution images are not available with high frequency, but similar data may be available from different satellites.
The model with the best predictive ability for estimating above-ground biomass (M1) has the median of SR as an explanatory variable. When comparing the models, those with the median of the vegetation indices showed better accuracy than those with the mean as an independent variable. These results can be explained, at least in part, by the non-normal distribution of the vegetation indices values.
The approach developed can be used in Quercus rotundifolia stands with similar stand structure to estimate above-ground biomass. Nonetheless, the swath of these images has a width of 16.5 km at nadir, which is adjusted to the local scale. To evaluate large areas several images have to be considered and worked through.

Acknowledgments
The authors are thankful to the: (a) Program of scholarship Ciências sem Fronteiras that allowed the development of this study at Évora University, Portugal; (b) Programa Operativo de Cooperação Transfronteiriço Espanha -Portugal (POCTEP), which funded the project Altercexa -Medidas de Adaptación y Mitigación del Cambio Climático a Través del Impulso de las Energías Alternativas en Centro, Alentejo y Extremadura, within which this study was developed.
(Refª 0317_Altercexa_I_4_E and 0406_ALTERCEXA_II_4_E); (c) FEDER Funds through the Operational Program for Competitiveness Factors -COMPETE and National Funds through FCT -Foundation for Science and Technology, under the Strategic Project UID/AGR/00115/ 2013, which funded this study; (d) TrustEE -innovative market based Trust for Energy Efficiency investments in industry (Project ID: H2020 -696140); (e) to the forest producers for allowing plot installation and measuring and to the field team, Carla Coelho, David Gomes and Pedro Antunes, for their work in data colleting.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
The authors are thankful to the: (a) Program of scholarship Ciências sem Fronteiras that allowed the development of this study at Évora University, Portugal; (b) Programa Operativo de Cooperação Transfronteiriço Espanha -Portugal (POCTEP), which funded the project Altercexa -Medidas de Adaptación y Mitigación del Cambio Climático a Través del Impulso de las Energías Alternativas en Centro, Alentejo y Extremadura, within which this study was developed. (Refª 0317_Altercexa_I_4_E and 0406_ALTERCEXA_II_4_E); (c) FEDER Funds through the Operational Program for Competitiveness Factors -COMPETE and National Funds through FCT -Foundation for Science and Technology, under the Strategic Project UID/AGR/00115/2013, which funded this study; (d) TrustEEinnovative market based Trust for Energy Efficiency investments in industry (Project ID: H2020 -696140). It has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 696140. The work reflects only the author's view and the Agency and the Commission are not responsible for any use that may be made of the information it contains.