Estimating leaf mercury content in Phragmites australis based on leaf hyperspectral reflectance

ABSTRACT Introduction: High mercury (Hg) concentrations affect the chlorophyll in leaves, thereby modifying leaf spectra. Hyperspectra is a promising technique for the rapid, nondestructive evaluation of leaf Hg content. In this study, we investigated Hg contents and reflective hyperspectra of reed leaves (Phragmites communis) in a gold mining (Jilin province, China). Spectral parameters sensitive to Hg content were identified through basic spectral transformations, continuous wavelet transformation (CWT), and spectral indices techniques. Leaf Hg inversion models were developed using stepwise multiple linear regression, partial least squares regression, and random forest algorithms.Outcomes: The results indicated that: 1) leaf Hg content decreased with increasing distance from the mine: Jiapigou (JPG) > Erdaocha (EDC) > Laojingchang (LJC) > Erdaogou (EDG) > Lingqian (LQ) > Weishahe (WSH). 2) Hg–sensitive wavelengths were primarily in the visible region; CWT increased the correlation between hyperspectral data and leaf Hg content, and improved the regression and accuracy of inversion; 3) the continuum removal–CWT–stepwise multiple linear regression was better for estimating low leaf Hg content; while the differential spectral index–partial least squares regression was better for estimating high leaf Hg content.Conclusion: These hyperspectral inversion methods could be used for rapid, nondestructive monitoring of wetland plants.


Introduction
Mercury (Hg) is a persistent pollutant that is highly toxic and volatile, and readily accumulates in organisms (Angot et al. 2018). It is easily absorbed by plants and undergoes biomagnification that usually occurs in aquatic systems along the food chain, eventually posing a risk to humans . Globally, fossil fuel combustion, waste incineration, and mineral processing have elevated the atmospheric Hg concentration 5-fold compared with the level before the industrial revolution (Dunagan, Gilmore, and Varekamp 2007;Selin 2018). It was estimated that Asia contributed 49% of the global atmospheric Hg, and China emitted 583 t Hg and accounted for 26% of the total global emission UNEP, 2019). Therefore, it is critical to monitor environmental Hg contamination in these regions.
Wetlands are an active reservoir of Hg as their abundant water and organic matter provides an environment for Hg accumulation due to atmospheric deposition and runoff (Bachand et al. 2014). Hg can be absorbed by plants and then enters the biosphere (Herndon and Whiteside 2017). Due to the persistent impact and toxicity of Hg and its ability to enter the food chain, Hg accumulation in plants can be further amplified (Liu et al. 2010). Consequently, Hg contamination has become a global public health concern (Herndon and Whiteside 2017). Given the environmental and health impacts of Hg and the roles of plants in the biogeochemical and ecological cycling of this element (Liu et al. 2010), it is critical to monitor Hg levels in plants (Sysalová et al. 2017). Traditionally, monitoring of heavy metals in plants has been conducted by field sampling and laboratory analyses. This method is time-consuming and costly for intensive plant sampling (Liu et al. 2010), and inconvenient for large-scale monitoring . Therefore, it is necessary to develop techniques for rapid monitoring of plant Hg contamination.
Hyperspectroscopy is a technique that is suitable for fast, cost-effective, nondestructive, and large-scale monitoring of plant contamination (Dunagan, Gilmore, and Varekamp 2007;Wang, Gao, and Zha 2018). Integrating spectral and imaging information, this technique can also be used to evaluate the anisotropic distribution of spectral characteristics of plant leaves . In previous studies on the use of hyperspectra to estimate plant Hg, screening analyses revealed specific reflectance data that were strongly correlated with the severity of Hg contamination (Dunagan, Gilmore, and Varekamp 2007;Xu 2009;Qiao et al. 2018). For example, Dunagan, Gilmore, and Varekamp (2007) reported that Spinacia oleracea grown in Hg-contaminated soils modified the leaf spectra, with spectral features at bands of 430, 448, 471, and 550 nm, which correlated with the leaf Hg concentration. Xu (2009) found that Zea mays farmed in Hg-contaminated soils had decreased reflectance in the range of 690-740 nm. However, these studies were predominantly focused on agricultural plants such as vegetables and other commercial crops. Few studies have investigated the hyperspectral inversion of Hg concentration in wetland plants, which have important ecological value in production, regulating climate change, purifying water, and providing an ecosystem for many other organisms. It has not been determined whether hyperspectral inversion is a suitable technique for monitoring Hg contamination in wild wetland plants.
In this study, we investigated leaf Hg content in wild reed (Phragmites communis) and its relationship with hyperspectral features. Multiple techniques (spectral transformations, continuous wavelet transformation [CWT], and spectral indices) were used to identify Hg-sensitive parameters from the spectra. Models for the hyperspectral inversion of leaf Hg concentration were then developed with stepwise multiple regression (SMR), partial least squares regression (PLSR), and random forest (RF) algorithms. Our aim was to develop a model for accurate hyperspectral inversion of leaf Hg content in wild reeds that will support the fast, nondestructive, and large-scale monitoring of Hg contamination in wetland plants.

Study area
The study was conducted at the Jiapigou gold mine (127°16ʹ-127°39ʹ E, 42°52ʹ-42°56ʹ N; southeast Huadian, Jilin province, China) (Figure 1), located in the upper reach of the Songhua River. The mine is located in a region that hosts >30 gold mining plants. Since 1949, the region has maintained a gold production rate of >1 ton per year (Zhang, Wang, and Wang et al. 2012). Currently, the Jiapigou gold mine is the largest mining plant in the region, and excavates~2000 tons of ore annually. It started gold extraction using the amalgamation process in 1821 (Yang, Wang, and Cai 2014). Before 2008, the process used~20 kg Hg per year, and 50%-60% leached into the local soil, water, or atmosphere via point and non-point sources (Wang and Zhu 2000;Wang, Zhu, and Sheng et al. 2005). Between 2006 and 2008, the mine replaced the amalgamation process with a new all-slime cyanidation process (Li, Wang, and Wang 2013); however, the decades of Hg emission have resulted in severe pollution, with soil Hg concentrations reaching 2.2 mg·kg −1 , much higher than the standard natural background soil Hg concentration of 0.15 mg·kg −1 .
Six sampling sites were selected near the mine along the Shanma River (a tributary of the Songhua River) (Figure 1): Jiapigou (JPG), Erdaocha (EDC), Laojingchang (LJC), Erdaogou (EDG), Weishahe (WSH), and Lingqian (LQ) (listed from up-to downstream). Of these, LQ is an uncontaminated site without mining operations; JPG is a mine with active mining; EDC is a large tailing pond; and LJC and EDG are confluences of rivers. At LJC, the Shanma River joins an uncontaminated river from Wudaogou (an un-mined area without Hg contamination). At EDG, the Weisha River merges with an uncontaminated river from LQ.

Data collection
Between June and September 2016, five sampling points (each 100 × 100 cm), which were covered by dense reeds and were within 5 km of the river, were established at each of the six sampling sites. Hyperspectra were recorded using a FieldSpec 4 spectroradiometer (350-2500 nm; Analytical Spectral Device, Boulder, CO, USA) in the field at each sampling point. To minimize external interference, intact and undamaged leaves were selected and fully unfolded, then cleaned with water and measured with a leaf clip. Care was taken to avoid the midrib, and whiteboard referencing was conducted every 20 min. The leaves were tested during 10:00-14:00 in bright daylight, with a 5 cm distance between the probe and the middle of the leaf blade to ensure that the leaf occupied the full view of the probe. Three locations were measured on each leaf and the results were averaged. For model development, data from the leaf samples at all sampling points were pooled to ensure sufficient sample sizes. We used the Kennard-Stone (K-S) algorithm to calculate the Euclidean distances in Hg level between all leaf pairs (Yu et al. 2016), and then screened out 41 samples with large differences, which were used for model development (i.e., calibration set). The remaining 27 samples were used for model verification (i.e., validation set) ( Table 1). After measurement, the reed leaf samples were transported to the laboratory for analysis, rinsed with water, and dried. The Hg content was measured by cold vapor atomic fluorescence spectroscopy (CVAFS).

Methods
Hyperspectral parameters sensitive to leaf Hg content were identified from spectra by basic spectral transformations, CWT, and spectral indices techniques.

Basic spectral transformations
The basic spectral transformations were first derivative transformations (DR), continuum removal (CR), logarithmic transformation (LR), and normalized transformation (NR) (Equations 1-4), which effectively improved the contrast of different hyperspectral absorption characteristics. Original hyperspectral reflectance was abbreviated to Ro, Ro processed by DR was abbreviated to DR, and so forth.

Continuous wavelet transformation
CWT reveals subtle variations that are difficult to discern from the original spectral data. In addition, CWT decomposes hyperspectral data continuously giving wavelet coefficients that match well with the original spectral data and enable extraction of fine signals from raw data . In this study, we used CWT to decompose the hyperspectral data into two series of wavelet coefficients under different scales (Equations 5 and 6) (Cheng et al. 2010).
where f(t) represents hyperspectral reflectance (350-2500 nm) measured from a leaf; t is the wavelength interval (350-25,000 nm); ψ a, b (t) is the basic function; a is the scaling factor; and b is the shift factor. Coefficients better reveal the characteristic features in hyperspectra (shape and wavelength; Yu et al. 2016). The coefficients comprise two series: i (i = 1, 2, . . ., m) and j (j = 1, 2, . . ., n), expressed as an m × n matrix. Series i represents the decomposition scale, and j the bands. By correlating the coefficients with leaf Hg data, the optimum decomposition scale can be determined. In this study, the Mexican hat wavelet that performs well for hyperspectral decomposition (Cheng et al. 2010;Zhang et al. 2014) was used as the basic function, and Ro, DR, CR, LR, and NR were processed by CWT under 10 scales (2 1 , 2 2 , 2 3 . . . 2 10 ).

Spectral indices
Constructing spectral indices involves changing simple Ro data into more complex bands by combining information at multiple bands. This serves various purposes, such as increasing loading information, enlarging the differences between bands (Liu, Gong, and Zhao 2014), maximizing reflective information, and minimizing external interferences (Yang et al. 2009). In this study, the differential spectral index (DSI), ratio spectral index (RSI), and normalized spectral index (NDSI) (Equations 7-9), which are the most commonly used, were chosen to establish models for the identification of hyperspectral characteristics of reed leaves and Hg-sensitive wavelengths. To reduce data redundancy, Ro data (-350-2500 nm) were resampled (interval: 5 nm) before spectral index calculations, giving 430 bands. Then, DSI, RSI, and NDSI were calculated from all possible combinations of the 430 bands.

Model development and verification
Stepwise multiple linear regression (SMLR), partial least squares regression (PLSR) and random forest (RF) algorithms, which are the most commonly used in hyperspectral inversion, were employed to develop models for leaf Hg inversion from hyperspectral data. The SMLR method selects the independent variable for the regression equation step-by-step according to the importance of the independent variable. The PLSR method establishes a linear regression model by projecting the predicted variables and the observable variables into a new space. The RF method uses bootstrap resampling and develops decision trees. The predictive power of an inversion model is quantified by its coefficient of determination (R 2 ), and its accuracy is indicated by the root mean square error (RMSE; He 2015). An R 2 approaching 1 and an RMSE approaching 0 indicate excellent regression. R 2 > 0.25 and R 2 > 0.64 were regarded as accep-

Reed leaf Hg contents
The average reed leaf Hg content at the six sampling sites is shown in Table 2. The leaf Hg content decreased with increasing distance from JPG. The leaf Hg content was higher at JPG and EDC than at LJC, EDG, WSH, and LQ.

Identification of Hg-sensitive hyperspectral parameters
Identification based on basic transformations Correlation analyses were performed between leaf Hg content and hyperspectral data ( Figure 2). For Ro, significant negative correlations with leaf Hg content were found for the intervals 350-362, 516-647 (visible region), 696-1300 (near-infrared), 1301-1411, and 1507-1872 nm (mid-infrared) (p < 0.05; absolute value of maximum correlation coefficient (R): 0.521, 726 nm; Figure 2a). For DR, the R fluctuated sharply with wavelength and many Hg-sensitive wavelength intervals were scattered within the spectral range; the maximum R was 0.628 (389 nm; Figure 2b). For CR, a few Hg-sensitive intervals were found, including significant positive correlations at 361-464, 673-682 (visible), 1265-1278 (near-infrared), 2126-2216, and 2250-2290 nm (mid-infrared) and a significant negative correlation at 1293-1301 nm (p < 0.05; maximum R: 0.502, 405 nm; Figure 2c). For LR, the R curve resembled an inverted form of Ro, exhibiting similar Hg-sensitive intervals but increased R values (maximum R: 0.536, 726 nm; Figure 2d). For NR, the R curve was somewhat similar to that of Ro, but had lower R values and fewer Hg-sensitive intervals, with a significant positive association at 376-451 nm (visible) and a significant negative association at 1359-1374 nm (mid-infrared) (p < 0.05; maximum R: 0.441, 406 nm; Figure 2e). Therefore, DR and LR transformations improved the correlations between hyperspectral data and leaf Hg content. The Hg-sensitive intervals identified were predominantly in the visible range after data transformation.
Identification based on continuous wavelet transformation Hyperspectral data were decomposed by CWT and the resulting wavelet coefficients were analyzed for correlations with leaf Hg content. The R values were improved under all decomposition scales after CWT. For Ro-CWT, the maximum R was 0.651; for DR-CWT, it was 0.644; for CR-CWT, 0.662; for LR-CWT, 0.661; and for NR-CWT, 0.661 ( Figure 3). Examination of the R 2 values revealed that Ro-CWT correlated well with leaf Hg content (R 2 > 0.25) at seven intervals (350-495, 529-596, 598-628, 637-691, 736-742, 1251-1253 (Figure 4e). Therefore, there were good correlations between transformed spectral data in the visible range and leaf Hg content, and decompositions on small scales generally improved the correlations after CWT.

Identification based on spectral indices
Spectral indices (i.e., DSI, NDSI and RSI) sensitive to leaf Hg content were calculated for all possible band combinations ( Figure 5). The three spectral indices showed similar correlations with leaf Hg content. However, the bands with high correlations between DSI and leaf Hg content had significantly higher correlation coefficients than those between leaf Hg content and NDSI or RSI (Figure 5a). These regions were scattered on the equipotential diagram, covering all spectral ranges.  Table 3 summarizes all Hg-sensitive spectral indices with R > 0.6. The Hg-sensitive band combinations included visible/near-infrared, visible/mid-infrared, and visible-visible pairs. The maximum R value was obtained for the 350/520 nm combination.

Leaf Hg inversion
Inversion based on spectral transformations For modeling, leaf Hg content was used as the independent variable; for Ro, DR, CR, LR, and NR, Hg-sensitive bands with statistically significant correlations with leaf Hg content (R > 0.5, p < 0.01) were used as dependent variables; and for Ro-CWT, DR-CWT, CR-CWT, LR-CWT and NR-CWT, Hg-sensitive wavelet coefficients (R > 0.5, p < 0.01) were used as dependent variables. Inversion models were developed by SMLR, PLSR, and RF (Table 4).
The SMLR models developed from DR and LR-CWT were overfitted; NR had no wavelength with R > 0.5 ( Figure 2e). These data were not used for modeling.
Further analyses indicated that the models developed from Ro, DR, CR, and LR generally exhibited unsatisfactory fits. DR-PLSR yielded R 2 > 0.64 and showed the highest R 2 allowing leaf Hg inversion. For CWT-processed data, calibration models developed by SMLR and PLSR exhibited acceptable fits, whereas those developed by RF did not. Validation sets developed by PLSR and RF offered better fits than those developed by SMLR. Of all the models developed, eight exhibited better fits for both the calibration and validation sets: CR-CWT-SMLR, NR-CWT-SMLR, Ro-CWT-PLSR, DR-CWT-PLSR, CR-CWT-PLSR, LR-CWT-PLSR, NR-CWT-PLSR, and NR-CWT-RF.   Compared with the models developed from data that did not undergo CWT, those developed using CWT-processed data had improved fits and accuracies in both calibration (R 2 increased by 0.0386-0.2239; RMSE increased by 0.0001-0.0121) and validation (R 2 increased by 0.0018-0.5454; RMSE increased by 0.0013-0.0076). Of all the models examined, CR-CWT-SMLR exhibited the high prediction accuracy and the low RMSE (Table 4), indicating that it was the best model for leaf Hg inversion.

Leaf Hg inversion based on spectral indices
Inversion models were developed by SMLR, PLSR, and RF using leaf Hg content as the dependent variable, and spectral indices with R > 0.5 (i.e., DSI, NDSI and RSI) as the independent variable (Table 5). The three spectral indices were ranked on the basis of their regressions as follows: SMLR > PLSR > RF. Of the models shown in Table 5, only DSI-SMLR and DSI-PLSR showed a high R 2 on both the calibration and validation sets. Unlike the DSI-SMLR algorithm, the DSI-PLSR algorithm can overcome multicollinearity. Therefore, it was selected for subsequent leaf Hg inversion.

Verification of inversion models
The CR-CWT-SMLR and DSI-PLSR models were verified for leaf samples from all sites ( Table 6). The CR-CWT-SMLR model performed satisfactorily (R 2 > 0.8) for all sites except JPG, and the sites were ranked, from highest R 2 value to lowest, as follows: LQ > WSH > EDG > LJC > EDC. The sites were ranked in almost the opposite order on the basis of R 2 values for the DSI-PLSR model: JPG > EDC > LJC > WSH > EDG > LQ.

Identification of Hg-sensitive parameters and development of leaf Hg inversion model
High Hg concentration can decrease the chlorophyll content (Kupper, Kupper, and Spiller 1996), destroy the cellular structure, and change the permeability of cell membranes in the plant leaves (Yuan and Ding 2016), thereby causing reflective refractive discontinuity and Table 3. Summary of spectral indices with a correlation coefficient >0.6.

Spectral indices Band 1 (nm)
Band 2 (nm)  DSI  350  520-530  375  1655  NDSI  350  520-530  370  1585, 1890-1900, 2205-2210  RSI  350  520-530  370 1580-1585, 1885-1900, 2200-2215 DSI, differential spectral index; NDSI, normalized spectral index; and RSI, ratio spectral index.   modifying leaf spectra information, such as the red edge shifting toward the low-band direction, the green peak showing increased reflectance, and the red valley becoming shallower by increasing or decreasing the reflectivity (Maruthi Sridhar et al. 2007;Chi, Guo, and Chen 2017;Li 2018). In this study, the Hg-sensitive wavelengths identified from the data processed by basic spectral transformations and CWT were mainly in the visible region, consistent with the findings of another study on the correlations between hyperspectral reflectance of reed leaves and the heavy metals lead and copper (Zhou 2014). Wavelet coefficient screening and inversion modeling showed that after CWT, the correlation between reflectance and leaf Hg content increased substantially (Figure 3), and the R 2 and RMSE of prediction also improved (Table 4). Similar results were obtained in our earlier study on hyperspectral Hg inversion of reed leaves grown under controlled Hg exposure . This improvement may be attributable to the characteristics of wavelet transformations that decompose hyperspectral data in both time and frequency domains. It is thus possible to screen optimum signals on different scales to predict physiobiochemical characteristics of plants (Song et al. 2006(Song et al. , 2008. As compared with discrete wavelet transformation, CWT decomposes hyperspectral data continuously giving wavelet coefficients that match well with the original spectral data and enable extraction of fine signals from raw data. Therefore, compared with discrete wavelet transformation, CWT allows for clearer characterization of spectral signals (Fang, Le, and Liang 2015;Liang et al. 2015). In addition, CWT reveals subtle variations that are difficult to discern from the original spectral data, thus allowing the discovery of hidden information and improving the accuracy of inversion compared with analyses that do not include CWT (Yu et al. 2016). Due to these advantages, CWT is an important technique for the hyperspectral determination of Hg content in reed leaves. In this study, analyses of leaf Hg inversion based on hyperspectral transforms and spectral indices showed that CR-CWT-SMLR and DSI-PLSR performed better than RF. In our earlier study on Hg inversion for reed leaves grown under controlled Hg exposure, an RF model performed better than PLSR and SMR models . This difference may be attributed to the relatively small dataset of the current field study, which failed to fully exploit the advantages of the RF algorithm. Instead, commonly used empirical or semi-empirical modeling (particularly PLSR, which overcomes multicollinearity) algorithms performed better in this study.
Model verification with measured leaf Hg content data from the sampling sites showed that the CR-CWT-SMLR model ranked the sites, from highest R 2 value to lowest, as follows: LQ > WSH > EDG > LJC > EDC > JPG, whereas the ranking based on the R 2 values of the DSI-PLSR model showed almost the opposite trend (Table 6). This difference might have resulted from the geographic distribution of the sampling sites. Of all the sites studied, JPG was nearest the gold mine and thus, reeds at this site had the highest leaf Hg content. The other sites were a distance from the mine in ascending order of EDC, LJC, EDG, WSH, and LQ. Accordingly, the reed leaf Hg contents at these sites generally decreased sequentially ( Table 2). The R 2 of leaf Hg content predicted using the CR-CWT-SMLR model decreased with increasing Hg content; therefore, this model was stable and reliable at low leaf Hg content. In comparison, DSI-PLSR was reliable for sites with high leaf Hg content, but was less reliable for predicting low leaf Hg content.

Limitations and implications
Due to the limitations in sampling conditions and workload, we did not consider the effects of other factors, such as, climatic factors, on the performance of spectral indices to estimate the Hg content of reed leaves. Further studies should be carried out to evaluate variations in spectral indices of reed leaves related to differences in moisture, season, and/or other co-existing factors. In addition, the R 2 of the validation sets of the accepted leaf Hg inversion models were much higher than those in the calibration sets (Tables 4 and 5), and this may be attributed to the small number of sampling sites and small difference between the validation samples. These findings also indicate that the robustness of the models requires verification with more samples. However, in this study, leaf sample data from all the sampling points were used by pooling and filtering. We will conduct supplementary experiments in the future to collect samples for verification of robustness. Finally, further research should focus on technical methods to monitor Hg in reeds and other plants such as grasses and forest species in polluted areas, in order to better verify the method and expand the scope of its application.

Conclusion
Reed leaves were collected from a gold mining region to determine the relationship between leaf Hg content and the characteristics of reflective hyperspectra. Spectral transformation techniques were used to identify Hg-sensitive data. Based on the models developed by SMLR, PLSR, and RF algorithms, we investigated the correlations between the identified data and leaf Hg content, and inversed the leaf Hg content. The main results were as follows: (1) Wavelengths sensitive to leaf Hg content were primarily in the visible region. Hyperspectral data processed by CWT showed stronger correlations with leaf Hg content than the original hyperspectral data.
The Hg-sensitive wavelengths identified based on spectral indices showed a strong correlation in the 350/ 520 nm combination. (2) Inversion models developed based on CWTprocessed data improved regression and accuracy both for the calibration and validation sets. Of all the models, the CR-CWT-SMLR model resulted in the best inversion (R 2 ) for estimating low leaf Hg content, while the DSI-PLSR model performed best for estimating high leaf Hg contents.

Disclosure statement
No potential conflict of interest was reported by the authors.