A new spectral index for the quantitative identification of yellow rust using fungal spore information

ABSTRACT Yellow rust (Puccinia striiformis f. sp. tritici) is a frequently occurring fungal disease of winter wheat (Triticum aestivum L.). During yellow rust infestation, fungal spores appear on the surface of the leaves as yellow and narrow stripes parallel to the leaf veins. We analyzed the effect of the fungal spores on the spectra of the diseased leaves to find a band sensitive to yellow rust and established a new vegetation index called the yellow rust spore index (YRSI). The estimation accuracy and stability were evaluated using two years of leaf spectral data, and the results were compared with eight indices commonly used for yellow rust detection. The YRSI ranked first for estimating the disease ratio for the 2017 spectral data (R² = 0.710, RMSE = 0.097) and, with R² = 0.587 and RMSE = 0.120, outperformed the published indices in the validation using the 2002 spectral data. The random forest (RF), k-nearest neighbor (KNN), and support vector machine (SVM) algorithms were used to test the discrimination ability of the YRSI and the eight commonly used indices using a mixed dataset of yellow-rust-infested, healthy, and aphid-infested wheat spectral data. The YRSI provided the best performance.


Introduction
Winter wheat (Triticum aestivum L.) is an essential food crop worldwide. The yield and economic benefits have increased as a result of agricultural intensification. However, an increase in the frequency of occurrence of pests and diseases (P&D) is adversely affecting wheat yield and quality, resulting in economic losses and posing a threat to food security. Therefore, effective and timely monitoring of wheat P&D is essential. Many scholars have used hyperspectral remote sensing for investigating P&D of winter wheat. Huang, Lamb, Niu, Zhang, Liu, and Wang (2007) used the photochemical reflectance index (PRI) to quantify yellow rust levels with high accuracy. Bauriegel, Giebel, Geyer, Schmidt, and Herppich (2011) acquired hyperspectral data of wheat under laboratory conditions and used principal component analysis to distinguish the spectra of diseased and healthy ear tissues in the wavelength ranges of 500-533 nm, 560-675 nm, 682-733 nm, and 927-931 nm. Morel, Jay, Feret, Adel, Bendoula, Carreel, and Gorretta (2018) analyzed the effects of fungal diseases on leaf biochemical and biological parameters using the PROCOSINE radiative transfer model and close-range hyperspectral imaging. Huang, Lu, Ye, Kong, Mortimer, and Shi (2018) distinguished powdery mildew, stripe rust, and nitrogen-water stress by continuous wavelet analysis of canopy hyperspectral data. Marín, Hoyos-Carvajal, and Botero Fernandez (2019) used hyperspectral reflectance spectroscopy and found that the 510-520 nm, 650-670 nm, and 700-750 nm wavelengths could differentiate Fusarium oxysporum stress and water stress in Solanum lycopersicum plants. Unlike traditional methods that depend on manual and visual assessment, remote sensing allows for effective monitoring of P&D severity and areas of occurrence using long-distance and large-area monitoring. 
Hyperspectral remote sensing provides rich spectral information and analysis results relatively easily and quickly and has a significant potential for accurate detection and differentiation of P&D.
Wheat yellow rust is a global disease that results in significant economic losses; therefore, in-depth research on this disease is urgently needed. Yellow rust often occurs in cold and humid regions. However, due to global climate change and global trade, yellow rust has recently shown adaptation to higher temperatures and has expanded significantly, which makes novel and effective detection methods all the more necessary. Several authors have used hyperspectral remote sensing for studying yellow rust. For example, Jiang, Chen, Gong, and Li (2007) found that the derivative spectral index outperformed a standard spectral index because it reduces the additive constants and minimizes soil background effects. In the study of Liu, Gu, Wang, Wang, and Ma (2015), canopy spectral data of wheat inoculated with three different concentrations of urediniospores were collected, and a model was established using the discriminant partial least-squares method to distinguish leaves with and without yellow rust infection. Whetton, Hassall, Waine, and Mouazen (2018) used the standard deviation of the wavelength range from 500 to 650 nm and the squared difference between 650 nm and 700 nm to discriminate healthy, yellow-rust-infected and fusarium-head-blight-infected wheat and barley. Zhang, Yuan, Pu, Loraamm, Yang, and Wang (2014) compared continuous wavelet and conventional spectral features for detecting yellow rust disease at the leaf level. Zhang, Pu, Huang, Lin, Luo, and Wang (2012) used normalized canopy-scale hyperspectral reflectance data to differentiate yellow rust from nutrient stress in wheat. Guo, Huang, Ye, Dong, Ma, Ren, and Ruan (2020) used Headwall hyperspectral images of diseased wheat leaves and extracted spectral and texture features to establish a wheat yellow rust identification model. In previous studies, spectral information of wheat infected by yellow rust at the leaf and canopy scale was used to establish detection models.
At the leaf scale, spectral information contains biophysical and biochemical information, and fungal spore information. At the canopy scale, the information includes leaf-scale information and structural information. The spectral information of the fungal spores on the leaf surface of wheat is a unique spectral feature that distinguishes yellow rust from other biotic and abiotic stresses, providing quantitative data on the leaf disease severity.
In the aforementioned studies, the unique fungal spore information was not separated from the other leaf spectral information. In contrast, this paper extends existing studies and focuses on the fungal spore information of yellow rust. The objectives of this paper were to (i) develop a new index (yellow rust spore index, YRSI) that uses the band sensitive to the fungal spores present on the diseased leaves; (ii) compare the YRSI with eight commonly used indices for yellow rust detection using two years of leaf-scale spectral data to evaluate the estimation accuracy and stability of the YRSI; (iii) evaluate the ability of the YRSI to distinguish leaves infected by yellow rust from healthy and aphid-infested leaves. The goal of this paper is to demonstrate that the proposed index is well suited for yellow rust identification and differentiation, facilitating crop protection.

Experimental sites and experimental design
Experiment 1 was conducted in 2017 at the Scientific Research and Experiment Base of the Chinese Academy of Agricultural Science, Langfang, Hebei Province, China (39°30′40″N, 116°36′20″E). The wheat variety in the study area was 'Mingxian 169', and the wheat in the study area was inoculated with yellow rust. The spore concentration was 9 mg/100 ml. The area of each experimental plot was 3 m × 7 m. Forty-two replicates of the yellow rust-inoculated plots were used to maximize the information on the yellow rust severity. Normal field management (200 kg/ha nitrogen and 450 m³/ha water per plot) was implemented in each plot. The spectral data collected in this research were acquired from Zadoks growth stage 31 to Zadoks growth stage 71 in 2017 (Zadoks, 1974).
Experiment 2 was designed to obtain spectral data of winter wheat infected by yellow rust and aphids and was conducted at the Xiaotangshan National Precision Agriculture Demonstration Research Base, Changping District, Beijing (40°10.6′N, 116°26.3′E). Two varieties of winter wheat, '98-100' and 'Jingdong 8', were selected due to their susceptibility to yellow rust. According to the National Plant Protection Standard (NPPS), the wheat was inoculated with yellow rust by spore inoculation. The 'Jingdong 8' wheat variety is also susceptible to aphids. The wheat was inoculated with aphids during the growing season. The spectral data collected in this research were acquired at Zadoks growth stage 71 in 2002.

Spectral measurements and processing
Leaf spectra were obtained using the ASD FieldSpec spectrometer (Analytical Spectral Devices, Inc., Boulder, CO, USA) coupled with a Li-Cor 1800-12 integration sphere (Li-Cor, Inc., Lincoln, NE, USA). The spectrometer was configured with a spectral range of 350-2500 nm and a field of view of 25°; the spectral resolution was 3 nm for the 350-1000 nm region and 10 nm for the 1000-2500 nm region. A focused beam was used for illumination (Huang et al., 2014).

Disease severity determination
The disease ratio (DR) was used in this study for measuring the disease severity. It is defined as the percentage of the leaf area covered by disease pustules. Two commonly used methods of interpretation were used here.
The first method is visual interpretation, which is most commonly used. After completing the spectral measurements of the leaf, each leaf was photographed and interpreted by a specialist (all samples were interpreted by the same person to minimize the error). All sampled leaves were inspected according to the National Rules for the Investigation and Forecasting of Crop Diseases (GB/T 15795-1995). The leaf was interpreted in certain intervals to reduce the error of estimating the percentage of infected spots.
The second method consists of using a LiDE 300 Canon scanner (Canon (China) Ltd., Beijing, China) to scan the disease-damaged leaves and obtain the DR value by image processing. An increasing number of studies have used image processing methods rather than visual interpretation for yellow rust detection due to improvements in image processing technology (Mi, Zhang, Su, Han, & Su, 2020). This method provides more accurate results than the first method.
The second method was used in Experiment 1, and the results are described in Section 3.1. The first method was used in Experiment 2; the results are presented in Section 3.2.
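The image-based DR definition above can be illustrated with a minimal sketch. It assumes that an upstream segmentation step (not described here) has already produced two binary masks of the scanned leaf; the function name `disease_ratio` and the toy masks are hypothetical and are not taken from the study.

```python
def disease_ratio(leaf_mask, pustule_mask):
    """Fraction of leaf pixels flagged as pustules.

    Both masks are lists of rows of 0/1 values with identical shape
    (assumed output of a hypothetical segmentation step).
    """
    leaf_pixels = 0
    diseased_pixels = 0
    for leaf_row, pustule_row in zip(leaf_mask, pustule_mask):
        for is_leaf, is_pustule in zip(leaf_row, pustule_row):
            if is_leaf:
                leaf_pixels += 1
                # Pustule pixels outside the leaf area are ignored.
                if is_pustule:
                    diseased_pixels += 1
    if leaf_pixels == 0:
        raise ValueError("leaf mask contains no leaf pixels")
    return diseased_pixels / leaf_pixels


# Toy 3 x 4 masks, made up for illustration only.
leaf = [[0, 1, 1, 1],
        [0, 1, 1, 1],
        [0, 1, 1, 0]]
pustules = [[0, 0, 1, 0],
            [0, 1, 1, 0],
            [1, 0, 0, 0]]
dr = disease_ratio(leaf, pustules)  # 3 diseased pixels out of 8 leaf pixels
```

Restricting the count to leaf pixels keeps background noise in the scan from inflating the ratio.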

Establishment of the yellow rust spore index
Yellow rust is a fungal disease. The pathogen spores attach themselves to the leaves and penetrate deep into the leaf tissue. The fungal spores occur as stripes on the leaf surface and affect the leaf's pigment and water content (Figure 1(b)), adversely affecting the photosynthesis of the leaf. The spectrum of the diseased leaf contains spectral information on the spores and the leaf. As shown in Figure 2(a), the reflectance spectrum of a healthy leaf has a peak in the green bands, a valley in the red bands, and high reflectance in the near-infrared bands. The reflectance spectrum of the fungal spores (Bohnenkamp, Kuska, Mahlein, & Behmann, 2019) is very low below 522 nm, followed by increasing reflectance.
As the area and thickness of the yellow rust spores covering the leaf increase, the spectral reflectance of the leaf changes. In this research, a linear spectral mixing method was used to simulate the spectral response of the spores on the diseased leaves. Subsequently, a spectral response analysis was used to find the sensitive bands to construct the index. Based on the study by Bohnenkamp, Kuska, Mahlein, and Behmann (2019) and our measured data, the proportion of the spore spectra in the mixed spectrum was determined as approximately 0.2 when the DR was equal to 0.8 in this study. Therefore, the proportion of the spectra of yellow rust spores (YR spores) was used as a single variable, and the spore spectra were linearly mixed with the healthy leaf spectra. The proportion of spore spectra was set to 0-0.2, and the interval was set to 0.02. A total of 10 spectra were generated to determine the influence of the spore spectra on the leaf spectra, as shown in Figure 2(b). A sensitivity analysis of the simulation data was conducted using Equation (1), where p is the proportion of spore spectra (p ranges from 0 to 0.2); l denotes the step size; the term subscripted p + l is the spectrum at p + l; z is the number of steps (z = 10); and i indicates the current number of parameter variations.
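The linear mixing step described above can be sketched as follows. This is a minimal illustration, assuming the mixed spectrum is (1 - p) times the healthy leaf spectrum plus p times the spore spectrum at each band; the function names and the four-band reflectance values are made up for illustration and are not measured data. The sensitivity analysis of Equation (1) is not implemented here.

```python
def mix_spectra(leaf, spore, p):
    """Linear mixture (1 - p) * leaf + p * spore, band by band."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if len(leaf) != len(spore):
        raise ValueError("spectra must cover the same bands")
    return [(1.0 - p) * r_leaf + p * r_spore
            for r_leaf, r_spore in zip(leaf, spore)]


def simulate(leaf, spore, p_max=0.2, step=0.02):
    """Return (p, mixed spectrum) pairs for p = step, 2 * step, ..., p_max."""
    n_steps = round(p_max / step)
    return [(i * step, mix_spectra(leaf, spore, i * step))
            for i in range(1, n_steps + 1)]


# Hypothetical reflectances at four bands (452, 550, 682, 800 nm).
healthy_leaf = [0.05, 0.15, 0.05, 0.45]
spores = [0.03, 0.20, 0.45, 0.50]

simulated = simulate(healthy_leaf, spores)
```

With the default arguments the sketch yields ten mixed spectra, one per step of 0.02 up to a spore proportion of 0.2.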
The results of the sensitivity analysis are shown in Figure 2(c). The peak of the sensitivity analysis curve of the fungal spores was located in the red band at 682 nm. This band is the most sensitive to changes in the proportion of spore spectra and can also be used to distinguish yellow rust from other crop P&D.
We established a normalized vegetation index using the 682 nm band and a second band with low sensitivity. However, all bands in the range of 400-500 nm showed low sensitivity, so the sensitivity analysis alone could not single out one band. Therefore, a correlation analysis was performed using each band of the leaf spectrum and DR to obtain |r| values. The lowest |r| value was 0.001 at 452 nm; thus, this band, which had relatively low values in both the sensitivity analysis and the correlation analysis, was used as the second band. Finally, we obtained the YRSI as follows:
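A minimal sketch of such a two-band index is given here. It assumes the conventional normalized-difference form, (R682 - R452) / (R682 + R452); the band order and the helper names are assumptions made for illustration, and the reflectance values are invented.

```python
def normalized_difference(r_a, r_b):
    """Generic normalized difference (r_a - r_b) / (r_a + r_b)."""
    denominator = r_a + r_b
    if denominator == 0:
        raise ValueError("sum of reflectances is zero")
    return (r_a - r_b) / denominator


def yrsi(spectrum):
    """Assumed YRSI form using the 682 nm and 452 nm bands.

    spectrum: dict mapping wavelength in nm to reflectance.
    The order (682 first, 452 second) is an assumption.
    """
    return normalized_difference(spectrum[682], spectrum[452])


# Invented reflectances for illustration only.
healthy = {452: 0.05, 682: 0.05}
diseased = {452: 0.05, 682: 0.15}

yrsi_healthy = yrsi(healthy)
yrsi_diseased = yrsi(diseased)
```

Under this assumed form, a rise in red reflectance at 682 nm relative to the insensitive 452 nm band raises the index value.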

Spectral indices used for comparison
We selected eight published vegetation indices commonly used for yellow rust detection to compare with the proposed index for the quantitative identification of yellow rust. The vegetation indices used in this research are shown in Table 1.

Discrimination methods
The random forest (RF), k-nearest neighbor (KNN), and support vector machine (SVM) algorithms were used as classifiers to distinguish wheat infected by yellow rust, aphid-infested wheat, and healthy wheat. The overall accuracy (OA) (Ma et al., 2019) and F1 score (F-S) (Ma et al., 2019; Su et al., 2020) were calculated to evaluate the indices' ability to discriminate the three classes. Each of the three classifiers has its own advantages. RF is a state-of-the-art algorithm with high accuracy, robustness, and low computational complexity (Su et al., 2018). KNN is a classical algorithm suitable for solving multi-classification problems; it is straightforward and has high accuracy and high tolerance to outliers and noise (Nugrahaeni & Mutijarsa, 2016; Krithika & Selvarani, 2017; Moldagulova & Sulaiman, 2018; Shah, Patel, Sanghvi, & Shah, 2020). The SVM has high generalization ability and is suitable for a small training sample size (Yin & Hou, 2016). A comparison of the three classifiers was conducted to provide an objective assessment of the differentiation ability of the indices.
(1) Random forest. As a popular and highly flexible machine learning algorithm, RF was first proposed by Breiman (2001) and has superior classification performance. As the name suggests, RF is an advanced algorithm that randomly assembles a forest consisting of unrelated decision trees. Each tree votes on the class assigned to the given sample, with the most frequent answer winning the vote (Sun & Schulz, 2015). RF is a Bagging (short for Bootstrap AGgregation) ensemble learning method. In past studies, RF has achieved good classification performance, showing good tolerance to noise and outliers, and it is not prone to overfitting (Prasad, Iverson, & Liaw, 2006; Puissant, Rougier, & Stumpf, 2014; Tatsumi, Yamashiki, Torres, & Taipe, 2015; Ye, Huang, Huang, Cui, Jin, Guo, & Jin, 2020). In this research, the number of decision trees was 30.
(2) K-nearest neighbor. The KNN algorithm is a non-parametric statistical method for classification and regression (Wu, Feng, Zhang, & Yang, 2015) and has a wide application range (Nugrahaeni & Mutijarsa, 2016; Krithika & Selvarani, 2017; Moldagulova & Sulaiman, 2018; Shah, Patel, Sanghvi, & Shah, 2020). This instance-based learning algorithm is very simple, popular, and efficient for pattern recognition (Nacerfarajzadeh, Gangpan, Zhaohuiwu, & Minyao, 2011). In the algorithm, the input contains the k closest training samples in the feature space, and in KNN classification, the output is a class membership (Foulds & Frank, 2010). The category with the most occurrences among the k neighbors is the class of the object. In the classifier, k is always a positive integer, usually a small one. In this study, k is 1. In this case, the object's category is directly assigned by the nearest neighbor. One of the key factors determining the classification performance of the KNN algorithm is the distance metric. The Euclidean distance is the most commonly used distance function for the KNN and is used in this study.
(3) Support vector machine. The SVM was developed by Vapnik and Vladimir (2013) and is a supervised, nonparametric machine learning technique. The fundamental principle of the SVM is structural risk minimization (SRM). As described by Vapnik and Vladimir (2013), the risk of the learning machine is constrained by the sum of the empirical risk estimated by the training sample and the confidence interval. The objective of SRM is to maintain a fixed empirical risk and minimize the confidence interval or maximize the margin between the separating hyperplane and the closest data point (Son, Park, Lee, Kim, Han, & Kim, 2019). A linear SVM was used in this research.
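Two of the decision rules described above are simple enough to sketch directly: the nearest-neighbor assignment used when k is 1, and the majority vote by which a forest of trees settles on a class. The sketch below is illustrative only; it assumes a single index value per sample as the feature, so the Euclidean distance reduces to an absolute difference. The function names, index values, and labels are hypothetical.

```python
from collections import Counter


def nearest_neighbor_predict(train_x, train_y, x):
    """1-NN: return the label of the training sample closest to x.

    With one feature per sample, the Euclidean distance is |a - b|.
    """
    if not train_x or len(train_x) != len(train_y):
        raise ValueError("training data must be non-empty and aligned")
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]


def majority_vote(labels):
    """Return the most frequent label, as in the voting step of RF."""
    return Counter(labels).most_common(1)[0][0]


# Invented index values for healthy (HE), yellow rust (YR), aphid (AH).
train_x = [0.02, 0.05, 0.40, 0.45, 0.20]
train_y = ["HE", "HE", "YR", "YR", "AH"]

label_a = nearest_neighbor_predict(train_x, train_y, 0.38)
label_b = nearest_neighbor_predict(train_x, train_y, 0.18)
forest_label = majority_vote(["YR", "HE", "YR"])
```

With k set to 1 there is no vote among neighbors, which makes the classifier sensitive to individual noisy training samples; the majority vote in RF averages out such effects across trees.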

Disease ratio assessment with spectral indices
The 2017 spectral dataset with 84 samples was used to compare the results of the YRSI and the other eight vegetation indices for estimating the severity of yellow rust. The YRSI showed excellent linearity and the best performance for the relationship between the vegetation index and the DR (R² = 0.710, RMSE = 0.097) (Figure 3). The PSRI provided the second-best result, with an R² value of 0.701 (RMSE = 0.098). The PSRI is designed to detect vegetation senescence and is a good indicator of the pigment changes produced by yellow rust. The SIPI and MTCI provided similar results; these indices are typically used to determine the chlorophyll content of wheat infected by yellow rust. The SIPI ranked third (R² = 0.679, RMSE = 0.102), and the MTCI ranked fourth (R² = 0.656, RMSE = 0.105). The PhRI (R² = 0.628, RMSE = 0.110), MSR (R² = 0.621, RMSE = 0.111), and NDVI (R² = 0.605, RMSE = 0.113) ranked fifth, sixth, and seventh, respectively. Although these three indices ranked relatively low, their R² values were greater than 0.6, indicating their ability to detect yellow rust severity. The PhRI is related to light use efficiency and uses the 550 nm and 531 nm bands. It can provide information on the mixed reflectance of the pigments and the coverage of the yellow rust spores. Since the NDVI and MSR use the same bands but different configurations, they exhibited similar performance. The ARI is used to assess the anthocyanin content of vegetation. It has also been used for monitoring yellow rust since the anthocyanin content is altered when the vegetation is under external stress. However, the results of the ARI were not satisfactory in this study (R² = 0.074, RMSE = 0.173). The NRI showed the lowest performance among all indices (R² = 0.001, RMSE = 0.179). The scatter plot showed strong dispersion, indicating that this index is not suitable for quantitative inversion models of yellow rust severity.
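The R² and RMSE values quoted above summarize how well an index explains the DR. A minimal sketch of how such statistics can be computed is shown here, assuming an ordinary least-squares line fitted between index value and DR; the fitting procedure is an assumption, and the function names and toy data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Toy data lying exactly on a line, for illustration only.
index_values = [0.0, 0.1, 0.2, 0.3]
measured_dr = [0.0, 0.2, 0.4, 0.6]

slope, intercept = fit_line(index_values, measured_dr)
predicted_dr = [slope * v + intercept for v in index_values]
r2, rmse = r2_and_rmse(measured_dr, predicted_dr)
```

Because the DR is a fraction between 0 and 1, an RMSE of about 0.1 corresponds to an error of roughly ten percentage points of leaf area.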

Cross-validation of the disease ratio obtained from the spectral indices
The 2002 leaf dataset, which contained 178 samples, was used for validation to determine the generalizability and robustness of the YRSI and the published indices. The DR was obtained by visual interpretation and expert knowledge, as mentioned in Subsection 2.2.2. The results of the cross-validation are shown in Figure 4.
The proposed YRSI exhibited the best performance in the cross-validation, similar to the result in the previous subsection. The YRSI had the highest correlation with the DR (R² = 0.587, RMSE = 0.120), and the points of the predicted versus the measured DR were the closest to the 1:1 line (the slope was 0.574) among all indices. The other indices performed inconsistently for the two years of data. The YRSI provides excellent results since it uses the most sensitive 682 nm band to detect the fungal spores, and the normalized form removes the interference of other vegetation information. This index has a strong quantitative relationship with the DR.
The PSRI and SIPI provided high precision in the 2017 data but low precision in the 2002 validation data. The SIPI dropped five places in the ranking compared to the previous results and only obtained an R² of 0.452 (RMSE = 0.130). The PSRI ranked lower for the 2002 data (R² = 0.517, RMSE = 0.125) than the 2017 data. This result indicates that these indices are not sufficiently robust to guarantee good accuracy for different years and varieties. The MSR ranked the highest among the published indices in 2002 (R² = 0.572, RMSE = 0.122) and provided better performance than that in the 2017 data. The MTCI ranked third in the 2002 validation data, with an R² of 0.572 and an RMSE of 0.123. It obtained stable and relatively satisfactory results among the published indices in both years of the leaf data. In contrast to the relatively low ranking in the 2017 data, the NDVI showed better performance for the 2002 validation data (R² = 0.545, RMSE = 0.123). The performance of the ARI improved compared with the 2017 data (R² = 0.544, RMSE = 0.123). However, the results of the two years showed that the ARI is not a satisfactory index for the quantitative detection of yellow rust. The PhRI ranked relatively low in both years of analysis but showed some ability to quantify the severity of yellow rust. Although it ranked last, the NRI performed better on the 2002 data than on the 2017 data.

Discrimination test and yellow rust identification
The spectral dataset of the healthy, yellow-rust-infected, and aphid-infested wheat was collected in Experiment 2. There were 145 healthy samples, 132 yellow-rust-infected samples, and 56 aphid-infested samples. The nine indices were evaluated for the differentiation of the three classes using the RF, KNN, and SVM classifiers. The parameter settings for the three classifiers were the same as described in Section 2.5. Each index was used as the sole input feature for the classifiers. Five-fold cross-validation was used for accuracy evaluation. The OA and the F-S of healthy wheat (HE), wheat infected by yellow rust (YR), and wheat infested by aphids (AH) were used to evaluate the indices. The OA and F-S results show the ability of the indices to discriminate the three classes.
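The evaluation protocol above can be sketched in a few lines. The example below is a hedged illustration, assuming the samples are shuffled beforehand and split into five contiguous folds without stratification (the splitting strategy is not stated in the text); the function names and the toy labels are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def overall_accuracy(y_true, y_pred):
    """Share of samples whose predicted class equals the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, label):
    """F1 score of one class: harmonic mean of precision and recall."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 145 + 132 + 56 = 333 samples split into five folds.
folds = k_fold_indices(145 + 132 + 56, k=5)

# Toy reference and predicted labels, for illustration only.
y_true = ["HE", "HE", "YR", "YR", "AH", "AH"]
y_pred = ["HE", "YR", "YR", "YR", "AH", "HE"]
oa = overall_accuracy(y_true, y_pred)
f1_yr = f1_score(y_true, y_pred, "YR")
```

Reporting the F-S per class alongside the OA matters here because the AH class is much smaller than the other two, so a high OA can hide poor performance on aphid-infested samples.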
Among all indices, the use of the YRSI resulted in the highest classification accuracy for detecting yellow rust with all three classifiers (Table 2). The OA was 85.6% for the RF classifier, 84.4% for the KNN, and the highest accuracy of 87.4% for the SVM. The YRSI's F-S showed good performance in all three classes for the different classifiers. For the YRSI, the F-S of YR exceeded 90% for all three classifiers, and the F-S of the HE and AH classes was the highest among all indices. Therefore, the YRSI had the strongest differentiation ability.
The OAs of the PhRI, SIPI, MSR, and PSRI exceeded 70% for all classifiers; however, the performance of the F-S was not very stable. The PhRI, SIPI, and PSRI provided relatively high F-S for the SVM in different classes; however, the F-S of AH was low for the other classifiers. The MSR yielded a low F-S for AH for all classifiers. The PhRI demonstrated relatively good differentiation ability, which was attributed to the following reasons (Gamon, Penuelas, and Field, 1992): first, it reflects the crop's light use efficiency; second, the 550 nm and 531 nm bands are sensitive to chlorophyll and carotenoids, respectively. Therefore, the PhRI can distinguish the difference between healthy and stressed leaves. The SIPI's performance ranked second after the PhRI. As a chlorophyll index, it can distinguish healthy leaves from those affected by P&D, and it uses the 680 nm and 445 nm bands. Its band composition is similar to that of the YRSI, facilitating the identification of yellow rust (Penuelas, Baret, and Filella, 1995). The accuracies of the NDVI and MTCI exceeded 60% for all classifiers; however, the F-S varied significantly in the different classes, with poor performance in the AH class. Su et al. (2018) also used the RF classifier for yellow rust discrimination; the red and NIR bands showed strong discrimination ability, and using the NDVI provided good results. However, the study by Su et al. (2018) was conducted at the canopy scale and image level, whereas our study was performed at the leaf scale. Thus, the results may not be comparable. The performance of the ARI and NRI was poor, and these indices were not suitable for differentiating the three classes of interest. The confusion matrices of the classification results for the three classifiers and the YRSI data are shown in Table 3. The user's accuracy for yellow rust was greater than 90% for all three classifiers (93.18% for RF, 93.18% for KNN, 96.69% for SVM).
The producer's accuracy was 93.18% for the RF, 93.18% for the KNN, and 88.64% for the SVM. The results indicated that the use of the YRSI provided high accuracies for distinguishing yellow rust from healthy leaves and aphid-infested leaves and can be used for yellow rust identification in wheat.
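The user's and producer's accuracies quoted above, together with the kappa coefficient that the study also reports, all derive from a confusion matrix. The sketch below shows the standard definitions, assuming rows hold the reference classes and columns the predicted classes; the matrix values are invented for illustration and are not those of Table 3.

```python
def accuracy_summary(cm, class_index):
    """Producer's accuracy, user's accuracy, and Cohen's kappa.

    cm[i][j] counts samples of reference class i predicted as class j
    (row = reference, column = prediction; this layout is an assumption).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]

    # Producer's accuracy: correct / all reference samples of the class.
    producers = cm[class_index][class_index] / row_sums[class_index]
    # User's accuracy: correct / all samples predicted as the class.
    users = cm[class_index][class_index] / col_sums[class_index]

    observed = sum(cm[i][i] for i in range(n)) / total
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return producers, users, kappa


# Invented 3-class matrix in the order HE, YR, AH.
cm = [[8, 1, 1],
      [1, 9, 0],
      [1, 0, 4]]
producers_yr, users_yr, kappa = accuracy_summary(cm, class_index=1)
```

A high user's accuracy means few false alarms for yellow rust, while a high producer's accuracy means few infected leaves are missed; the two can diverge, as the SVM figures above show.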

Conclusion
In this study, we identified the most suitable hyperspectral band for detecting wheat yellow rust and developed the new index YRSI by analyzing the effects of the fungal spores on the overall leaf spectral response. A comparison with eight commonly used indices for yellow rust detection demonstrated that the YRSI exhibited the best performance for describing the relationship with the DR in the 2017 spectral data (R² = 0.710, RMSE = 0.097). The YRSI also showed the best performance in the cross-validation, with the highest R² value and lowest RMSE (R² = 0.587, RMSE = 0.120) in the 2002 spectral data. The YRSI proved accurate and robust for the quantitative estimation of the DR. Classification was performed using the nine indices with the RF, KNN, and SVM classifiers to differentiate three classes (yellow-rust-infested, aphid-infested, and healthy leaves). The use of the YRSI provided the best performance: the OAs were 85.6%, 84.4%, and 87.4%, and the kappa coefficients were 0.751, 0.756, and 0.797 for the RF, KNN, and SVM, respectively. The difference in the classification accuracy between the different classifiers was not significant. The user's accuracies for identifying yellow rust with the three classifiers were higher than 90%, and the producer's accuracies were around 90%. The use of the YRSI provided superior results for identifying yellow-rust-infected individuals, and the YRSI had the best performance for a mixed dataset.
In the present research, the influence of the fungal spores was considered in the index construction. The search for the optimal band was based on the linear spectral mixing of the spore spectra and healthy leaf spectra. The influence of the spores on the leaf spectra requires more in-depth investigation in subsequent studies. Moreover, only the leaf-scale analysis was conducted in this study. The applicability of the proposed index has to be verified at the canopy scale in follow-up studies. In addition, the YRSI is composed of red and blue bands and is suitable for the analysis of UAV and satellite images. The use of the proposed index in remote sensing image analysis is of great significance because the index enables fast and accurate detection and mapping of yellow rust. Finally, the YRSI performed well for identifying yellow rust and for quantitatively estimating disease severity in this study. This research provides guidance for the accurate and dynamic identification of yellow rust in the field and fine-scale application of pesticides.