Identifying non-thrive trees and predicting wood density from resistograph using temporal convolution network

Abstract Deep learning approaches have been adopted in forestry research, including tree classification and inventory prediction. In this study, we propose an application of a deep learning approach, the Temporal Convolution Network, to sequences of radial resistograph profiles to identify non-thrive trees and to predict wood density. Non-destructive resistance drilling measurements on the South and West orientations of 274 trees in a 41-year-old Douglas-fir stand in Marion County, Oregon, USA were used as input series. Non-thrive trees were defined based on changes in their social status since establishment. Wood density was derived by X-ray densitometry from cores obtained with increment borers. Data were split for cross-validation. Optimal models were fine-tuned with training and validation datasets, then run on test datasets to obtain model evaluation metrics. Results confirmed that applying the Temporal Convolution Network to resistograph profiles enables non-thrive tree identification, with an area under the Receiver Operating Characteristic curve of 0.823. The Temporal Convolution Network for wood density prediction showed a slight improvement in accuracy (RMSE = 18.22) compared to the traditional linear (RMSE = 20.15) and non-linear (RMSE = 20.33) regression methods. We suggest that machine learning algorithms can be a promising methodology for the analysis of sequential data from non-destructive devices.


Introduction
In stand development, inter-tree competition for sunlight is asymmetric (Weiner 1990). Dominant trees, or trees of high social class, are seldom limited by light availability and produce more diameter increment than trees of low social class. Thus, cambial growth is intrinsically determined by tree diameter (Zeide 1993). Some trees, however, put on less growth increment than trees of the same or smaller size, and as a result become slender and fall in social rank. This deviation in stem growth reflects unfavorable local growing conditions and should be considered in the silvicultural treatment process. However, simple identification tools are lacking to date. Unlike suppressed trees, these trees of declining social rank cannot be easily distinguished by diameter or height.
Resistance drill measurement has been widely applied to inspect tree decay in forest inventory (e.g., Rinn et al. 1996; Wang and Allison 2008). Recent research found that resistance profile amplitude correlated with wood basic density, also known as wood density (WD) or wood specific gravity (Rinn et al. 1996; Park et al. 2006; Bouffier et al. 2008; da Silva Oliveira et al. 2017). WD is an important wood characteristic that affects the performance of wood products and is often used as a criterion for timber grading. This rapid and inexpensive resistance measurement is potentially a good alternative to the traditional measurement of WD by X-ray densitometry (Gao et al. 2017). There are a few studies on Douglas-fir (Pseudotsuga menziesii (Mirb.)) WD and resistance drilling. Chantre and Rozenberg (1997) reported a correlation between resistograph amplitude profiles and WD in 25-year-old Douglas-fir and suggested that resistance drilling could be an effective tool for estimating WD for the whole trunk. El-Kassaby et al. (2011) found that resistograph amplitude can represent WD in a genetic control study among 20 unrelated coastal Douglas-fir full-sib families. Nevertheless, Todoroki et al. (2021) argued that resistograph amplitude was insufficient to predict WD due to high prediction error, based on their study of 60-75-year-old Douglas-fir trees at six sites in coastal Western North America.
Traditionally, researchers use summary statistics as features, although some parts of the data may not contribute significantly to model performance. Feature selection and pattern recognition in machine learning models were introduced to identify the features that are critical or influential for the target response variables. Deep learning is a new branch of machine learning, developed by stacking layers of artificial neurons to learn the complex non-linear relations in features and datasets (Schmidhuber 2015). The Temporal Convolutional Network (TCN), proposed by Bai et al. (2018), is a variation of Convolutional Neural Networks for sequence modeling whose structure and associated hyperparameters were derived from optimized functions and verified with experimental data. Causal convolution, dilated convolution, and residual blocks were introduced to extract long-term series information and to overcome the challenges of time coherence in the conventional convolutional neural network. Many applications with sequence input found that TCN models yielded high accuracy, including Satellite Image Time Series classification (Pelletier et al. 2019) and El Niño-Southern Oscillation prediction (Yan et al. 2020).
In this study, we use the TCN approach to extract patterns from resistograph amplitude sequences for both classification and regression tasks. Resistograph profiles and WD data of 274 trees were obtained from a 41-year-old Douglas-fir stand located in Marion County, Oregon, USA. The trees of declining social rank, hereafter "non-thrive" trees, were identified from periodic measurements taken every 2-4 years since establishment as part of a silvicultural treatment study by the Stand Management Cooperative at the School of Environmental and Forest Sciences, University of Washington. Our hypothesis is that the TCN approach can distinguish the sequences of resistance drilling amplitude of non-thrive trees. Considering that the usefulness of resistance drilling for predicting Douglas-fir WD is unclear with traditional statistical analysis methods, this study aims to evaluate the performance of resistance drilling in predicting WD with a deep learning method, TCN, which is anticipated to filter features and capture non-linear dependency. Therefore, we also hypothesize that the TCN approach performs better than traditional regression analysis in predicting WD from resistograph amplitude profiles.

Study area
This Stand Management Cooperative study site, the Silver Creek Mainline site, was planted in the winter of 1977 with Douglas-fir seedlings at 1360 stems per hectare using 2-0 planting stock, making the total age of the stand 3 years after the 1977 growing season. The stand is located in Marion County, Oregon (44°52′27″ N, 122°33′58″ W), at an average elevation of 671 m, facing roughly West with a slope of 10%. The soil type is Baumgard silt loam, and the Douglas-fir stand adjacent to the site when it was planted exhibited a Site Index of 36.6 m at 50-year breast height age. The study plots were installed after the 1989 growing season, during the dormant season, at a total stand age of 15 years.
Nine 4,000 m² plots were chosen for this study, each undergoing a unique silvicultural pathway, ranging from no action to pre-commercial thinning, in combination with further thinning based on Curtis' relative density (RD) (Curtis 1982) and fertilization (Table 1). Thirty trees were chosen from each plot using a stratified random sampling scheme, where diameter at breast height (DBH) defined the strata. In each plot, the DBH of all trees was ranked into percentiles and divided into five strata. Six trees were randomly selected in each of the second and fourth quintile strata. Trees at percentiles 10, 50, and 90 were preselected as benchmarks. Therefore, only five trees were randomly selected in each of the first, third, and last quintiles. Resistance drilling profiles and increment cores from increment borers were collected from the South and West sides of the selected standing trees.

Data
Tree diameters were periodically measured by the field crew. Competition indices were represented by the basal area of trees larger than the subject tree (BAL, m²/ha) and the plot basal area (BA, m²/ha). As a measure of the social rank of a tree within the stand, we used the BAL percentile (BALpct), which is BAL as a proportion of plot BA, expressed in percent (Equation 1).
Big or dominant trees have low BALpct. Each tree was classified as a normal or a non-thrive tree based on the change in its BALpct over time. We define non-thrive trees as trees whose BALpct in late stand development is more than 25 percentage points higher than their BALpct at establishment. There were 31 non-thrive trees out of a total of 274 trees in nine plots (Figure 1). Note that one plot did not have a non-thrive tree due to low competition. This was the plot that was pre-commercially thinned to one-fourth of the original density at establishment, without fertilization.
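The definitions of BALpct and of a non-thrive tree given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of DBH in centimetres, and the strict "larger DBH" comparison for BAL are our own choices and are not taken from the study's code.

```python
import math


def bal_pct(dbh_cm, plot_area_ha):
    """BALpct per tree: basal area of larger trees as a percentage of plot basal area.

    Assumes DBH in cm and that "larger" means strictly larger DBH (hypothetical choice).
    """
    # Basal area of each tree in m^2/ha: pi * (DBH / 200)^2 gives m^2 for DBH in cm.
    ba = [math.pi * (d / 200.0) ** 2 / plot_area_ha for d in dbh_cm]
    plot_ba = sum(ba)
    result = []
    for d in dbh_cm:
        # BAL: summed basal area of all trees larger than the subject tree.
        bal = sum(b for b, other in zip(ba, dbh_cm) if other > d)
        result.append(100.0 * bal / plot_ba)
    return result


def is_non_thrive(bal_pct_establishment, bal_pct_late, threshold=25.0):
    """True when BALpct rose by more than `threshold` percentage points."""
    return (bal_pct_late - bal_pct_establishment) > threshold
```

For example, `is_non_thrive(30.0, 60.0)` is true because the tree's BALpct rose by 30 percentage points, whereas a rise from 30 to 50 would leave the tree classified as normal.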
Table 1. Silvicultural treatment by plot.
Plot [number not recovered]: Only minimal thinning from RD 55 to 35 once
Plot 5: No action
Plot 8: Precommercial thinning to half original stems, no further thinning
Plot 10: Applying fertilization every 4 years and repeated thinning from RD 55 to 35, and RD 55 to 40
Plot 11: Precommercial thinning to half original stems, applying fertilization every 4 years and minimal thinning from RD 55 to 35 once
Plot 12: Precommercial thinning to one-quarter original stems, and applying fertilization every 4 years

A total of 548 resistograph profiles, from two orientations at breast height on 274 trees, were obtained. However, only 480 cores from 264 trees could be processed by X-ray densitometry for ring density (kg/m³), ring width, and latewood percent. The WD of each core was calculated as the average ring density weighted by ring width (Equation 2). The arithmetic mean of core WD was 433 kg/m³ with a standard deviation of 24 kg/m³.
Examples of resistograph profiles are illustrated in Figure 2, which shows four profiles from two trees in the same plot with the same diameter; one tree (tree no. 672) is a non-thrive tree, while the other (tree no. 614) is a normal tree.
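As a hedged illustration of the ring-width-weighted average described for core WD, the following Python sketch computes the weighted mean of ring densities. The function name and the example numbers are hypothetical and are not study data.

```python
def core_wood_density(ring_density, ring_width):
    """Ring-width-weighted mean of ring density (kg/m^3) for one core."""
    if len(ring_density) != len(ring_width) or not ring_density:
        raise ValueError("ring_density and ring_width must be non-empty and equal in length")
    total_width = sum(ring_width)
    # Each ring contributes in proportion to its width.
    weighted_sum = sum(d * w for d, w in zip(ring_density, ring_width))
    return weighted_sum / total_width
```

With two illustrative rings of density 400 and 500 kg/m³ and widths 3 and 1 mm, the wider ring dominates and the core value is 425 kg/m³ rather than the unweighted mean of 450 kg/m³.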

Analyses
Resistograph profiles derived from core drilling samples at breast height vary in length (Figure 3). In this study, we truncated resistograph profiles at 200 mm from the bark, which gave the best metrics compared to other core lengths. The first 25.2 mm of each resistograph profile, closest to the bark, was also trimmed to avoid measurement or calibration errors.
Measured resistograph amplitudes, at a resolution of 0.1 mm from 25.2 to 200 mm, were fed as sequential input to the TCN algorithm. Causal convolutions, in which the output at time t is convolved only with elements from time t and earlier in the previous layer, differentiate TCN architectures from other convolution models. The TCN model illustrated in Figure 4 has a dilation rate of 4 and a kernel size of 3. Dilation skips values between the inputs of the convolutional operation, so that with dilation d the inputs are taken d steps apart, and kernel size controls the area covered by each convolution. Other hyperparameters are the number of filters and the dropout rate, which are linked to predictive power and overfitting control, respectively. Analysis was done in Keras (Chollet 2015) with TensorFlow as a backend (Abadi et al. 2016), and the keras-tcn library was used for the TCN implementation.
TCN classification with class weights for unbalanced classes and a sigmoid activation function was applied to non-thrive tree classification. TCN regression with a rectified linear unit activation function was applied to predict WD. Given the small sample size, we assigned a larger portion of the data to training to ensure model accuracy. Therefore, data were randomly split into three sets, training, validation (used in the training process), and testing, at a ratio of 75:15:10. Hyperparameters (the number of filters, kernel size, dilation size, and dropout rate) were fine-tuned based on evaluation metrics in the training dataset to obtain the best-performing models. These models were then applied to the test dataset to obtain the evaluation metrics. The flow chart of the analysis process is shown in Figure 5.
Metrics for non-thrive tree classification included (1) recall, also known as hit rate or true positive rate, which is the ability of the classifier to find all the positive samples, computed as the ratio of correctly predicted values in the target class (true positives) to all target class values in the dataset (true positives + false negatives); and (2) the Receiver Operating Characteristic (ROC) curve, which is a plot of hit rate against false alarm rate at different thresholds. Metrics for WD prediction were mean squared error for model selection in the training dataset and root mean squared error (RMSE) in the test dataset.
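The trimming step can be sketched as a simple slice. The sketch assumes, as our own convention, that sample i of a profile lies at depth i × 0.1 mm from the bark surface and that both end points (25.2 mm and 200 mm) are kept; the function name is hypothetical.

```python
def trim_profile(amplitudes, start_mm=25.2, end_mm=200.0, step_mm=0.1):
    """Keep the samples between start_mm and end_mm (inclusive) from the bark.

    Assumes amplitudes[i] was recorded at depth i * step_mm (hypothetical convention).
    """
    # round() guards against floating-point error, e.g. 25.2 / 0.1 = 251.99999999999997.
    start = round(start_mm / step_mm)
    end = round(end_mm / step_mm)
    if len(amplitudes) <= end:
        raise ValueError("profile is shorter than the requested end depth")
    return amplitudes[start:end + 1]
```

Under these assumptions every trimmed profile has the same length of 1749 samples, which is convenient because the network then receives input sequences of uniform length.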
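To make the idea of a causal, dilated convolution concrete, here is a minimal single-channel sketch in plain Python. It is not the keras-tcn implementation; the function name and the zero-padding of positions before the start of the sequence are assumptions made for illustration.

```python
def causal_dilated_conv(x, kernel, dilation=1):
    """y[t] = sum_j kernel[j] * x[t - j * dilation], using only current and past inputs.

    Positions before the start of the sequence are treated as zero (assumed padding).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation  # look back j * dilation steps, never forward
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y
```

With a kernel of size 3 and a dilation of 2, the output at position t combines the inputs at t, t - 2, and t - 4, so changing a later input never alters an earlier output. Stacking such layers with growing dilation is what lets a TCN cover long stretches of a profile with few layers.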
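The random 75:15:10 split and the class weighting for unbalanced classes can be sketched as follows. The weighting rule shown (total samples divided by the number of classes times the class count) is a common "balanced" heuristic and is an assumption here, since the text does not state which weights were used; the function names and the seed are likewise hypothetical.

```python
import random


def split_indices(n, ratios=(0.75, 0.15, 0.10), seed=0):
    """Randomly split range(n) into training, validation, and test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    # Whatever remains after training and validation goes to the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def balanced_class_weights(labels):
    """Weight each class by n / (n_classes * class_count) (assumed heuristic)."""
    classes = sorted(set(labels))
    n = len(labels)
    return {c: n / (len(classes) * labels.count(c)) for c in classes}
```

With 243 normal and 31 non-thrive labels, this heuristic gives the non-thrive class a weight of about 4.4 and the normal class about 0.56, so that errors on the rare class count more during training.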
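The classification metrics described above can be computed directly from labels and predicted scores. The sketch below is illustrative only: the function names are our own, a score at or above the threshold is assumed to mean "non-thrive" (label 1), and the area under the ROC curve is computed through its equivalent rank interpretation rather than by integrating the curve.

```python
def recall_precision(y_true, scores, threshold):
    """Recall and precision for the positive class (label 1) at a given threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because the AUC depends only on how positives rank relative to negatives, it does not change with the chosen threshold, which is why it complements recall when classes are unbalanced.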

Results and discussion
The optimal configuration of TCN for the non-thrive tree classification model had one TCN layer with 64 filters, a kernel size of 8, a dropout rate of 0.5, dilations of 64, and one dense layer. This model was evaluated on the test dataset at different classification thresholds (Figure 6). At the selected threshold of 0.708, the result showed a high recall of 0.8 with a precision of 0.31. To screen for non-thrive trees, that is, to control false negatives, a model with high recall is preferable to one with high precision. In other words, we would rather have normal trees marked as non-thrive trees (false positives) than misidentify non-thrive trees as normal trees (false negatives). The accuracy metric, the ability to correctly predict the class at the selected threshold, is 0.81. However, the accuracy metric is not suitable here because of the imbalanced data and unequal class importance. Another important metric, the ROC curve, displays the true positive rate against the false positive rate for each threshold, as shown in Figure 7. The derived metric, the area under the ROC curve (AUC), is an aggregate measure of performance across all possible classification thresholds. In general, an AUC of 0.5 suggests no ability to discriminate, 0.7-0.8 is considered acceptable, 0.8-0.9 is considered excellent, and more than 0.9 is considered outstanding (Hosmer et al. 2013). The model in this study has an AUC of 0.823, indicating that it is capable of distinguishing non-thrive trees from normal trees.
Trees with drifting social status have been found in thinned stands (e.g., Pretzsch 2021). While social climber trees gained more photosynthate from crown release, non-thrive trees lost their competitive access to light and other resources. The majority of non-thrive trees in this study were intermediate trees at establishment whose size development lagged by harvest. The rest were codominant trees that lost their privileged status over time. Developing this prognostic model to detect non-thrive trees provides information for regulating trees in stand management.
For WD prediction, the linear regression model on average profile amplitude fitted to the training dataset is shown in Equation (3). Non-linear regression was also evaluated as an alternative model and is shown in Equation (4), with RMSE = 24.23 kg/m³ and adjusted R-squared = 0.07.
The optimal configuration of TCN for the WD prediction model had one TCN layer with 32 filters, a kernel size of 6, a dropout rate of 0.5, and dilations of 32. This TCN model was run on the test dataset and obtained an RMSE of 18.22 kg/m³, which was lower than the RMSE of the linear and non-linear regression models on the test dataset, at 20.15 and 20.33 kg/m³, respectively. Figure 8 illustrates that WD values predicted by TCN are generally closer to the actual values than those from linear and non-linear regression, as TCN allows for non-linearity and feature representation in the model. However, deep learning algorithms are often seen as a black box, as they report results with high accuracy without explaining how they make the prediction. On the other hand, regression models of average resistograph amplitude showed a positive correlation with WD. The relationship can be modeled as linear or as a concave non-linear curve with an asymptote. The R-squared values from these regressions were marginal, which supports the recent study by Todoroki et al. (2021) reporting that average amplitude alone could not give a precise WD prediction in Douglas-fir. The inclusion of other non-destructive variables could be useful. For example, Iliadis et al. (2013) and Demertzis et al. (2017) found that the inclusion of acoustic velocity produced high predictive power for WD with machine learning models. The integration of a deep learning algorithm improves WD prediction and represents a good avenue for future research. Further efforts with larger datasets or with other variables are needed to develop reliable Douglas-fir WD prediction from resistograph profiles.
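The baseline comparison rests on two simple computations: an ordinary least-squares line of WD on average profile amplitude, and the RMSE of predictions on held-out data. The sketch below shows both in plain Python. The function names are hypothetical, and the fitted coefficients of Equations (3) and (4) are not reproduced here.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the response (kg/m^3 for WD)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

In use, the line would be fitted on the training cores only and `rmse` evaluated on the test cores, so that the regression and TCN figures are compared on the same unseen data.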

Conclusion
This study confirms the first hypothesis that the proposed deep learning method, TCN, can identify non-thrive trees from sequences of resistance drilling amplitude with satisfactory predictive power. Our comparative analysis also supports the second hypothesis that TCN is superior to traditional linear and non-linear regressions for predicting WD. However, all WD prediction models yielded large prediction errors. We recommend incorporating other explanatory variables alongside resistance drilling amplitude to improve model accuracy.
In summary, TCN applied to sequences of resistograph amplitude can represent the important features through its own autonomous learning. The results are encouraging and provide insights into the link between deep learning methods, non-destructive technology, and tree prognosis, which can serve as a reference for future research involving other species, especially those with fast growth and those in which destructive evaluation techniques cause a loss of genetic value.