Detection of the moldy status of stored maize kernels using hyperspectral imaging and deep learning algorithms

ABSTRACT It is important to identify the moldy status of stored maize caused by fungal infection at an early stage. Hyperspectral imaging (HSI) combined with sparse auto-encoder (SAE) and convolutional neural network (CNN) algorithms was used to classify the moldy grades of maize kernels. HSI data were acquired in the 400–1000 nm range, and four grades, from healthy to heavy mildew, were distinguished according to the measured fungal spore counts of the maize. Deep spectral features were represented by the SAE, and image features were extracted by the CNN. K-nearest neighbors (KNN), support vector machine (SVM), and partial least squares discriminant analysis (PLS-DA) classifiers were combined with the spectral and image features to build models identifying the different moldy grades of maize kernels. The comparison showed that fusing the SAE and CNN features with the SVM classifier (the SAE-CNN-SVM model) gave the best identification, with correct recognition rates of 99.47% and 98.94% for the training and testing sets, respectively, and sensitivity and specificity values of 0.95–1. The moldy grades were visualized on the maize images pixel-wise and kernel-wise. HSI with the SAE-CNN-SVM model therefore showed good recognition ability for the early detection of moldy maize kernels and could provide technical support for developing online detection of moldy maize kernels during storage.


Introduction
Maize is one of the most widely planted and high-yielding grain crops in the world, and it is also an important raw material for the feed, chemical, food, and other processing industries. [1] However, because of its large embryo, high water content, and rich nutrients, maize is easily infected by fungi, mainly Fusarium, Aspergillus, and Penicillium, under high temperature and high humidity from postharvest handling to processing. Infection makes the kernels moldy and leads to a decline or loss of their edible and feeding quality. [2,3] More importantly, if fungal infection is not detected early, the fungi can go on to produce mycotoxins that threaten human and animal health. [4,5] Therefore, early detection of moldy kernels is an important task in grain quality inspection, as it helps control and reduce grain losses.
The number of fungal spores is an important index for evaluating the early mildew status and grade of maize during storage and processing. [6] In China, the detection of fungal spores, total plate count, and mycotoxins is essential for evaluating grain quality and safety. [7] However, the related detection methods are mainly laboratory tests, such as spore counting, enzyme-linked immunosorbent assay, polymerase chain reaction, and high-performance liquid chromatography. [8,9] These methods offer reliable accuracy and stability, but they are time-consuming,

Preparation of moldy maize kernels
A total of 285 maize samples (500 g of kernels each) were collected from bulk grain in several grain depots in China, packed in airtight bags, and first stored in an artificial climate chamber at low temperature (≤4°C) until sample separation. Each sample was randomly divided into two parts: one for HSI scanning (30 kernels in each HSI image) and the other for counting fungal spores, which were used to determine the natural moldy grade of the maize according to the Chinese grain industry standard for the inspection of grain and oil storage: fungal examination and enumeration of fungal spores. [29] Under this standard, the maize samples were divided into four moldy grades by fungal spore count (count/g): grade1, health (value < 1.0 × 10⁵); grade2, mild mildew (1.0 × 10⁵ < value < 9.9 × 10⁵); grade3, moderate mildew (1.0 × 10⁶ < value < 9.9 × 10⁶); and grade4, heavy mildew (value > 1.0 × 10⁷). After grading, two-thirds of the samples (n = 190) were randomly selected as the training set and the remaining one-third (n = 95) were used as the testing set, ensuring that the two sets were independent.
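The grading rule and the random 2:1 split described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the standard's procedure: it assumes that each grade's lower bound is inclusive and places the grade boundaries at exactly 1.0 × 10⁵, 1.0 × 10⁶, and 1.0 × 10⁷ count/g (the standard's strict inequalities leave boundary values unassigned). The function names `moldy_grade` and `split_samples` are hypothetical.

```python
import numpy as np

# Lower bounds (count/g) of grade2-grade4. Assumption: bounds are inclusive,
# so a count of exactly 1.0e5 is treated as grade2.
GRADE_BOUNDS = [1.0e5, 1.0e6, 1.0e7]

def moldy_grade(spores_per_gram):
    """Map fungal spore counts (count/g) to grades 1 (health) to 4 (heavy mildew)."""
    counts = np.asarray(spores_per_gram, dtype=float)
    # searchsorted(side="right") counts how many bounds are <= each value.
    return np.searchsorted(GRADE_BOUNDS, counts, side="right") + 1

def split_samples(n_samples, n_train, seed=0):
    """Randomly split sample indices into disjoint training and testing sets."""
    order = np.random.default_rng(seed).permutation(n_samples)
    return np.sort(order[:n_train]), np.sort(order[n_train:])

train_idx, test_idx = split_samples(285, 190)
print(len(train_idx), len(test_idx))                 # 190 95
print(moldy_grade([5.0e4, 3.0e5, 2.0e6, 4.0e7]))     # [1 2 3 4]
```

Because the split is purely random, the grade proportions of the two sets can differ slightly; a stratified split would be an alternative design choice, not the one described in the text.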

Detection methods of fungal spores
The fungal spores of maize were measured by the above standard method. Briefly, 10 g of maize kernels and 30 mL of deionized water were placed in a 50 mL test tube. The tube was stoppered and shaken vigorously by hand for 1 min (approximately 120–150 shakes). After 5 min, the suspension was filtered through a 300-mesh filter cloth and the filtrate was collected for the next step. A hemocytometer with 25 middle squares was used to count the fungal spores (the number of spores in each middle square was recorded). The filtrate was dropped onto the hemocytometer plate under a 20 mm × 20 mm coverslip using a dropper pipette. After 30 s, the fungal spores on the plate were observed and counted under a microscope (eyepiece: 10×; objective lens: 10×–60×). Eq. (1) was used when spores were recorded in five middle squares (<10), and Eq. (2) when spores were recorded in all 25 squares. Two parallel experiments were performed for each sample, and the higher value was recorded.
where $X_1$ is the number of fungal spores per gram of maize kernels (count/g) and $A$ is the total number of fungal spores counted in the 5 or 25 squares.
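A minimal sketch of the generic hemocytometer arithmetic behind a spore-count conversion of this kind, under assumptions that are not taken from the standard: each of the 25 middle squares is assumed to cover a volume of 4 × 10⁻⁶ mL (0.2 mm × 0.2 mm × 0.1 mm, i.e. a 1 mm × 1 mm × 0.1 mm counting area for all 25), and the spore suspension is assumed to be the 30 mL of water added to 10 g of kernels. The function `spores_per_gram` is hypothetical and should not be read as Eq. (1) or Eq. (2) of the standard.

```python
MIDDLE_SQUARE_ML = 4.0e-6   # assumed volume over one middle square (0.2 x 0.2 x 0.1 mm)
WATER_ML = 30.0             # deionized water added (from the protocol above)
KERNEL_G = 10.0             # kernel mass (from the protocol above)

def spores_per_gram(total_count, n_squares):
    """Hypothetical conversion of a hemocytometer count A to spores per gram.

    total_count: spores counted over n_squares middle squares (A in the text).
    n_squares: 5 or 25, depending on which squares were counted.
    """
    spores_per_ml = total_count / (n_squares * MIDDLE_SQUARE_ML)
    return spores_per_ml * WATER_ML / KERNEL_G

# Two parallel counts for one sample; the text records the higher value.
replicates = [spores_per_gram(12, 5), spores_per_gram(9, 5)]
print(max(replicates))   # about 1.8e6 count/g, which falls in grade3
```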

Hyperspectral imaging system
The HSI system used in this study is shown in Figure 1. The whole system is enclosed in a black box to avoid external light interference, and its major components are: a transmission stage driven by a stepping motor (EZHR17EN; AllMotion Inc.); a stable halogen light source (21 V/150 W) with a two-branch linear light guide (3900-ER; Illumination Technologies Inc.); an imaging spectrograph (ImSpector VNIRV10E) with a spectral range of 325–1100 nm; an EMCCD camera (Andor Luca EMCCD DL-604 M; Andor Technology Ltd.) with 1608 × 1208 pixels and an image size of 6.15 mm × 14.2 mm; and a computer (ACER N16Q1, Intel Core i7-6500U @ 2.5 GHz, 12 GB RAM) running the acquisition software. Bands with a low signal-to-noise ratio at the edges of the spectral range were discarded because of the low quantum efficiency and dark current of the CCD detector in those regions.
Only the 400–1000 nm range (824 bands) was used for further processing. The acquired raw image $R_{raw}$ was calibrated into reflectance $R_{cor}$ according to

$$R_{cor} = \frac{R_{raw} - R_{dark}}{R_{white} - R_{dark}}$$

where $R_{white}$ is the white reference image obtained from a standard white Teflon tile (~100% reflectance) and $R_{dark}$ is the dark reference image acquired by covering the camera lens completely with its black cap (~0% reflectance). For each image, 30 maize kernels (endosperm side up) were placed on black cardboard. The distance between the lens and the samples was 280 mm, the camera exposure time was 9 ms, and the stepping motor moved at 1.2 mm/s, so it took about 90 s to acquire one complete HSI image.
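The reflectance calibration (raw minus dark, divided by white minus dark) is a per-pixel, per-band ratio, so it vectorizes directly in NumPy. The sketch below is illustrative only: it uses synthetic data, assumes the white and dark references are single push-broom lines that broadcast over all image rows, and uses hypothetical band centres; `calibrate_reflectance` and `crop_bands` are illustrative names, not part of any acquisition software.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark, eps=1e-6):
    """(raw - dark) / (white - dark), computed per pixel and band."""
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    return (raw - dark) / np.maximum(white - dark, eps)   # eps avoids division by zero

def crop_bands(cube, wavelengths, lo=400.0, hi=1000.0):
    """Keep only bands within [lo, hi] nm, dropping the noisy spectral edges."""
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return cube[..., keep], wavelengths[keep]

# Synthetic scan: (rows, cols, bands); references are single lines broadcast over rows.
rng = np.random.default_rng(0)
wl = np.linspace(325.0, 1100.0, 1000)                 # hypothetical band centres (nm)
dark = np.full((1, 50, wl.size), 100.0)
white = np.full((1, 50, wl.size), 4000.0)
raw = dark + 0.4 * (white - dark) + rng.normal(0.0, 1.0, (20, 50, wl.size))
refl, wl_kept = crop_bands(calibrate_reflectance(raw, white, dark), wl)
print(refl.shape, round(float(refl.mean()), 3))
```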

Identification of the region of interest and spectral data extraction
The spectral extraction process for a maize kernel is shown in Figure 2. The single-band image at 713 nm, where reflectance intensity is high, was selected from the HSI data to form a binary image, which was used as a mask to remove the background. To separate the embryo and endosperm regions of the kernel, the HSI data were transformed by PCA, and the PC score images were used to generate a scatter plot. By comparing the scatter plot with the original image, the embryo region was identified as the red cluster and the endosperm region as the blue cluster, so the two regions could be effectively separated. [30] On this basis, the regions of interest (ROIs) were selected at the embryo of each kernel, because this part of the kernel is the most readily consumed by microorganisms. The average spectrum of each ROI was used to build models with the traditional feature extraction methods, and the spectra of all pixels in the ROIs formed a large data set for training the deep learning algorithms.
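The masking, PCA, and ROI-averaging steps can be outlined as follows. This is a sketch under stated assumptions: the 0.2 reflectance threshold on the 713 nm band is hypothetical, and the manual selection of the embryo cluster on the PC scatter plot is replaced by a simple threshold on the first PC score, which is purely an illustrative stand-in. PCA signs are arbitrary, so on real data the side of the threshold that corresponds to the embryo would have to be checked.

```python
import numpy as np

def kernel_mask(cube, wavelengths, band_nm=713.0, threshold=0.2):
    """Binary mask from the band nearest band_nm; threshold is a hypothetical value."""
    idx = int(np.argmin(np.abs(wavelengths - band_nm)))
    return cube[..., idx] > threshold

def pca_scores(spectra, n_components=3):
    """PCA scores of (n_pixels, n_bands) spectra via SVD of mean-centred data."""
    centred = spectra - spectra.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def roi_mean_spectrum(cube, wavelengths, pc=0, pc_threshold=0.0):
    """Mask the background, split kernel pixels on one PC score, average the ROI."""
    pixels = cube[kernel_mask(cube, wavelengths)]        # (n_pixels, n_bands)
    scores = pca_scores(pixels)
    # Stand-in for the manual cluster selection on the PC scatter plot.
    roi = pixels[scores[:, pc] > pc_threshold]
    return roi.mean(axis=0), roi

# Synthetic scene: dark background and one kernel with an "embryo" strip.
wl = np.linspace(400.0, 1000.0, 50)
cube = np.full((20, 20, wl.size), 0.05)
endosperm = np.linspace(0.40, 0.60, wl.size)
embryo = np.linspace(0.30, 0.70, wl.size)
cube[5:15, 5:15] = endosperm
cube[5:8, 5:15] = embryo
mean_spec, roi_pixels = roi_mean_spectrum(cube, wl)
```

The averaged ROI spectrum corresponds to the input of the traditional models, while `roi_pixels` corresponds to the pixel-level data set used for deep learning.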

Deep network models of SAE and CNN
The SAE [31] network mainly consists of encoding and decoding layers; it is similar to a multilayer neural network and requires the output to reconstruct the input data as closely as possible. The structure of the SAE is shown in Figure 3. For a given input sample set $\{x_1, x_2, x_3, \ldots, x_N\}$, where $N$ is the number of samples, $x_i$ is encoded into the hidden layer $h_i$ as

$$h_i = f_c(x_i) = \sigma(w_1 x_i + b_1)$$

Then, $h_i$ is decoded to reconstruct the input as the output $z_i$:

$$z_i = f_{c'}(h_i) = \sigma(w_2 h_i + b_2)$$

where $f_c$ denotes the activation function (the sigmoid function $\sigma$ is used in this study), $c = \{w_1, b_1\}$ and $c' = \{w_2, b_2\}$ are the network parameters of the encoding and decoding layers, respectively, $w$ denotes the weight matrices, and $b$ the bias terms. The cost function of the SAE can be written as

$$J(c, c') = \frac{1}{2N}\sum_{i=1}^{N}\lVert z_i - x_i\rVert^2 + \frac{\lambda}{2}\left(\lVert w_1\rVert^2 + \lVert w_2\rVert^2\right) + \beta\sum_{j}\mathrm{KL}(\rho \,\Vert\, \rho'_j), \qquad \mathrm{KL}(\rho \,\Vert\, \rho'_j) = \rho\log\frac{\rho}{\rho'_j} + (1-\rho)\log\frac{1-\rho}{1-\rho'_j}$$

where $\mathrm{KL}(\rho \Vert \rho'_j)$ is the Kullback–Leibler relative entropy, $\rho$ is the sparsity coefficient, $\rho'_j$ is the average activation of hidden neuron $j$, $\lambda$ is the weight decay coefficient, and $\beta$ is the sparsity penalty coefficient. In Figure 3, gray neurons represent inactive units (activations close to zero) and red neurons represent active units. The limited-memory quasi-Newton method (L-BFGS) was used to find the optimal network parameters ($c = \{w, b\}$); gradients were computed with the backpropagation algorithm, and the weights were updated by batch gradient descent during iteration. An SAE with two hidden layers was constructed in this study to represent the abstract deep features of the input spectra. A classification layer (e.g., a softmax layer) is added at the end of the SAE, and the parameters ($c = \{w, b\}$) of every layer are fine-tuned with the labeled training data, guided by the cost function. Finally, each input spectrum $x_i$ is represented by the sparse features $h(x_i)$ output by the hidden layer.
These sparse features $h(x_i)$ were then used as the SAE spectral features expressing the moldy status of maize kernels in the subsequent modeling.
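A minimal NumPy/SciPy sketch of one sparse auto-encoder layer follows. It computes the sparse auto-encoder cost (reconstruction error, weight decay, and KL sparsity penalty) together with its backpropagated gradient, and trains the layer with SciPy's L-BFGS-B optimizer. Assumptions: λ = 1e-4 and β = 3 are placeholder values not given in the text; the two hidden layers are trained greedily one after the other; the softmax fine-tuning step is omitted; and the toy sizes (60 → 20 → 5 on random data) stand in for the 824-band spectra and the 100-15 network.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def unpack(theta, n_in, n_hid):
    """Split the flat parameter vector into encoder (w1, b1) and decoder (w2, b2)."""
    s1 = n_hid * n_in
    w1 = theta[:s1].reshape(n_hid, n_in)
    b1 = theta[s1:s1 + n_hid]
    w2 = theta[s1 + n_hid:2 * s1 + n_hid].reshape(n_in, n_hid)
    b2 = theta[2 * s1 + n_hid:]
    return w1, b1, w2, b2

def sae_cost(theta, X, n_hid, lam, beta, rho):
    """Cost and gradient of one sparse auto-encoder layer; X is (N, n_in) in [0, 1]."""
    N, n_in = X.shape
    w1, b1, w2, b2 = unpack(theta, n_in, n_hid)
    H = sigmoid(X @ w1.T + b1)                          # encoder output h_i
    Z = sigmoid(H @ w2.T + b2)                          # reconstruction z_i
    rho_hat = np.clip(H.mean(axis=0), 1e-8, 1 - 1e-8)   # mean activation; clip guards log
    kl = np.sum(rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    cost = (0.5 / N) * np.sum((Z - X) ** 2) \
        + 0.5 * lam * (np.sum(w1 ** 2) + np.sum(w2 ** 2)) + beta * kl
    # Backpropagation through decoder, sparsity penalty, and encoder.
    d_out = (Z - X) * Z * (1 - Z) / N
    d_kl = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / N
    d_hid = (d_out @ w2 + d_kl) * H * (1 - H)
    grad = np.concatenate([(d_hid.T @ X + lam * w1).ravel(), d_hid.sum(axis=0),
                           (d_out.T @ H + lam * w2).ravel(), d_out.sum(axis=0)])
    return cost, grad

def train_sae_layer(X, n_hid, lam=1e-4, beta=3.0, rho=0.1, maxiter=250, seed=0):
    """Train one layer with L-BFGS-B; returns encoder weights and the final cost."""
    n_in = X.shape[1]
    r = np.sqrt(6.0 / (n_in + n_hid + 1))
    rng = np.random.default_rng(seed)
    theta0 = np.concatenate([rng.uniform(-r, r, n_hid * n_in), np.zeros(n_hid),
                             rng.uniform(-r, r, n_in * n_hid), np.zeros(n_in)])
    res = minimize(sae_cost, theta0, args=(X, n_hid, lam, beta, rho), jac=True,
                   method="L-BFGS-B", options={"maxiter": maxiter})
    w1, b1, _, _ = unpack(res.x, n_in, n_hid)
    return w1, b1, res.fun

def encode(X, w1, b1):
    return sigmoid(X @ w1.T + b1)

# Greedy stacking of two hidden layers on placeholder "spectra" scaled to [0, 1].
rng = np.random.default_rng(1)
spectra = rng.uniform(0.1, 0.9, (300, 60))
w1, b1, cost1 = train_sae_layer(spectra, 20)
h1 = encode(spectra, w1, b1)
w2, b2, cost2 = train_sae_layer(h1, 5)
features = encode(h1, w2, b2)       # (300, 5) sparse features h(x_i)
```

Returning the analytic gradient (`jac=True`) lets L-BFGS-B avoid finite-difference gradients, which would be far too slow for layers with hundreds of input bands.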
CNN is a kind of multilayer perceptron, similar to an artificial neural network, that is often used to analyze visual images. [32,33] Inspired by the classical and successful VGG16, LeNet-5, and AlexNet, the CNN architecture used in this study is illustrated in Figure 4. The CNN consisted of nine layers: an input layer, four convolutional layers, one max-pooling layer, two fully connected layers, and an output layer. The input data (Q(x) × Q(y) × 256) were the PCA-processed ROI images of maize, where Q(x) × Q(y) is the image size and 256 is the number of spectral bands. Each convolutional layer (conv1–conv4) used 3 × 3 kernels with 64, 128, 256, and 512 channels, respectively, a stride of 1, and no padding. A 2 × 2 max-pooling layer followed conv2 to condense the image features obtained by the convolutional layers. Two fully connected layers (FC1 and FC2), each of size 1 × 1 × 256, followed conv4 and expressed the image features through their weights. The output layer used the softmax function as the classifier. After each convolution, batch normalization was applied to reduce overfitting and improve the generalization ability of the network, followed by a nonlinear ReLU transformation.
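The layer sizes of this architecture can be traced with plain arithmetic: a 3 × 3 convolution with stride 1 and no padding shrinks each spatial side by 2, and 2 × 2 max pooling halves it. The sketch assumes a hypothetical 32 × 32 input patch (the text does not fix Q(x) × Q(y)), flattens the conv4 output before FC1, counts weights and biases only (batch-normalization parameters are excluded), and uses four output classes for the four moldy grades. ReLU and batch normalization do not change tensor shapes.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution (floor division, as in common frameworks)."""
    return (size + 2 * padding - kernel) // stride + 1

def trace_cnn(q, in_channels=256, n_classes=4):
    """Return per-layer output shapes and the weight+bias count (batch norm excluded)."""
    size, ch, params, layers = q, in_channels, 0, []
    for name, out_ch in [("conv1", 64), ("conv2", 128), ("conv3", 256), ("conv4", 512)]:
        size = conv_out(size)
        params += 3 * 3 * ch * out_ch + out_ch        # 3x3 kernels plus biases
        ch = out_ch
        layers.append((name, size, size, ch))
        if name == "conv2":                           # 2x2 max pooling after conv2
            size //= 2
            layers.append(("maxpool", size, size, ch))
    n_in = size * size * ch                           # flattened conv4 output
    for name, n_out in [("fc1", 256), ("fc2", 256), ("output", n_classes)]:
        params += n_in * n_out + n_out
        layers.append((name, 1, 1, n_out))
        n_in = n_out
    return layers, params

layers, params = trace_cnn(32)      # hypothetical 32 x 32 ROI patch
for layer in layers:
    print(layer)
print(params)                        # 14870980
```

Under these assumptions FC1 holds about 13.1 million of the roughly 14.9 million parameters, so the input patch size largely determines the model size.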
Cross-entropy was used as the cost function of the CNN model, and gradient descent was used to optimize the network parameters:

$$L = -\frac{1}{N}\sum_{i=1}^{N} y_i^{tru}\log y_i^{pre}$$

where $N$ is the number of samples, $y_i^{pre}$ is the predicted value, and $y_i^{tru}$ is the true value. During CNN training, the optimal model was selected according to the trend of the cost during iterative training with different learning rates and attenuation factors. The trained CNN was then used to extract deep abstract image features expressing the moldy status of maize kernels.
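For the four moldy grades, the true and predicted values can be treated as one-hot vectors and softmax probabilities, with the product summed over grades for each sample. The NumPy sketch below shows this categorical cross-entropy; the clipping constant is an assumption added for numerical safety, and the learning-rate attenuation schedule is not modeled because the text does not specify how it was applied.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(y_true_onehot, y_pred_prob, eps=1e-12):
    """Mean categorical cross-entropy over N samples (sum over grades, mean over samples)."""
    p = np.clip(y_pred_prob, eps, 1.0)
    return -np.mean(np.sum(y_true_onehot * np.log(p), axis=1))

labels = np.array([0, 1, 2, 3])             # grade1-grade4 coded 0-3
onehot = np.eye(4)[labels]
print(cross_entropy(onehot, softmax(np.zeros((4, 4)))))   # uniform guess: ln 4 ≈ 1.386
```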

Spectral and image feature extraction algorithms
To further verify how well the SAE and CNN models mine and express features of maize with different moldy status, traditional feature extraction methods were used for comparison. The successive projections algorithm (SPA) [34,35] was used to select optimal wavelengths (18 spectral features were selected from the original 824 bands for modeling), and the gray level co-occurrence matrix (GLCM) was used to extract texture features from the HSI images. We compared the recognition performance of classification models built on spectral features extracted by SAE and SPA, and on image features extracted by CNN and GLCM.
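The two traditional extractors can be sketched compactly in NumPy. The SPA part shows a single projection chain: starting from band `k0`, each step projects all bands onto the subspace orthogonal to the last selected band and picks the band with the largest remaining norm. Full SPA typically builds such chains from many starting bands and keeps the subset with the lowest validation error, which is omitted here. For GLCM, the text does not state which offsets, gray levels, or statistics were used, so the horizontal offset, 8 gray levels, and the three statistics (contrast, angular second moment, homogeneity) are assumptions. The function names are hypothetical; scikit-image's `graycomatrix` and `graycoprops` provide a tested GLCM implementation.

```python
import numpy as np

def spa_select(X, k0, n_select):
    """One SPA projection chain: X is (n_samples, n_bands); returns band indices."""
    Xp = np.asarray(X, dtype=float).copy()
    selected = [k0]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]].copy()
        # Project every band onto the subspace orthogonal to the last selected band.
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0                    # never reselect a band
        selected.append(int(np.argmax(norms)))
    return selected

def glcm_features(img, levels=8, dx=1, dy=0):
    """Symmetric, normalised GLCM for a non-negative offset (dy, dx) plus three statistics."""
    img = np.asarray(img, dtype=float)
    span = img.max() - img.min()
    if span == 0:
        q = np.zeros(img.shape, dtype=int)
    else:
        q = np.minimum(((img - img.min()) / span * levels).astype(int), levels - 1)
    h, w = q.shape
    a, b = q[:h - dy, :w - dx], q[dy:, dx:]       # pixel pairs at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (a.ravel(), b.ravel()), 1)
    P = P + P.T
    P /= P.sum()
    i, j = np.indices(P.shape)
    return {"contrast": np.sum(P * (i - j) ** 2),
            "asm": np.sum(P ** 2),
            "homogeneity": np.sum(P / (1.0 + (i - j) ** 2))}

rng = np.random.default_rng(0)
spectra = rng.random((60, 100))          # placeholder: 60 samples x 100 bands
bands = spa_select(spectra, k0=10, n_select=18)
patch = rng.random((32, 32))             # placeholder single-band ROI image
texture = glcm_features(patch, levels=8)
```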

Classifiers
KNN is a common supervised learning classifier. [36] When predicting a new sample, KNN assigns it to the class that is most common among its k nearest neighbors. KNN is simple, fast, and relatively insensitive to outliers. SVM is a classifier that can handle nonlinear problems. [37] Its learning strategy is margin maximization, which can be formulated as a convex quadratic programming problem and is equivalent to minimizing a regularized loss function. SVM has good robustness and generalization ability and can effectively avoid the "curse of dimensionality." PLS-DA is a statistical method that combines the advantages of principal component analysis, canonical correlation analysis, and multiple linear regression analysis. [38] It builds a linear regression model by projecting the predictor variables and the observed response variables into a new space. These three classifiers were combined with the spectral and image features to establish classification models for identifying the moldy grades of maize kernels.

Model evaluation
In this study, the correct recognition rate (CRR) was used as the index for evaluating model performance:

$$CRR = \frac{N_c}{N_T} \times 100\%$$

where $N_c$ is the number of correctly classified maize kernels in a data set and $N_T$ is the total number of maize kernels in that set. Sensitivity and specificity were used as additional evaluation indices [39]:

$$\text{Sensitivity} = \frac{T_p}{T_p + F_n}, \qquad \text{Specificity} = \frac{T_n}{T_n + F_p}$$

where $T_p$ is the number of true positives, $F_n$ false negatives, $T_n$ true negatives, and $F_p$ false positives. A model is considered to have strong recognition ability when these indices are close to one.
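For a four-grade problem, sensitivity and specificity are usually computed per grade in a one-vs-rest manner; the text does not state this explicitly, so it is an assumption here. The NumPy sketch below derives CRR and the per-grade indices from a confusion matrix, and the example reproduces a single grade4 kernel misclassified as grade3.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)   # rows: true grade, cols: predicted
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def evaluate(y_true, y_pred, n_classes=4):
    """CRR (%) plus one-vs-rest sensitivity and specificity for every grade."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = confusion(y_true, y_pred, n_classes)
    crr = 100.0 * np.trace(cm) / cm.sum()      # N_c / N_T x 100%
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = cm.sum() - tp - fn - fp
    # Note: a grade absent from y_true would make its sensitivity 0/0 (nan).
    return crr, tp / (tp + fn), tn / (tn + fp)

# Grades coded 0-3; one grade4 kernel (code 3) predicted as grade3 (code 2).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 2, 2, 2, 3]
crr, sens, spec = evaluate(y_true, y_pred)
print(crr, sens, spec)
```

Here the CRR is 87.5%, the grade4 sensitivity drops to 0.5, and the grade3 specificity drops to 5/6, while all other values stay at 1.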

Reference measurement of fungal spores
To facilitate statistics and modeling, all measured fungal spore counts were converted to a logarithmic scale (lg count/g): grade1 (value < 5 lg count/g), grade2 (5 < value < 6 lg count/g), grade3 (6 < value < 7 lg count/g), and grade4 (value > 7 lg count/g). The scatter plot of the measured values in the training set is shown in Figure 5. As the number of fungal spores increased, the moldy status of the maize kernels changed from grade1 (health) to grade4 (heavy mildew). In total, 72 samples were classified as grade1, with values of 4.0–5.0 lg count/g, and 66 samples as grade2 (5.1–5.9 lg count/g). For these first two grades, fungal growth was relatively slow and the nutrients in the kernels had not yet been extensively consumed. [40] Furthermore, 75 samples were classified as grade3 (6.0–6.9 lg count/g) and 72 samples as grade4 (7.0–8.7 lg count/g), indicating that the fungal spores had multiplied and the kernels had become seriously mildewed. The statistics of fungal spores for all maize samples in the training and testing sets are given in Table 1. The mean and standard deviation were 6.36 and 1.43 lg count/g in the training set and 6.58 and 1.89 lg count/g in the testing set, respectively. Moreover, the range of the testing set (4.26–8.64 lg count/g) lay within that of the training set (4.10–8.89 lg count/g). Therefore, the division into training and testing sets was appropriate for building a robust model.
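The Table 1 style summary and the range check can be sketched as below. The counts are placeholders, not measured data, and the use of the sample standard deviation (ddof = 1) is an assumption because the text does not say which estimator was used. The function names are hypothetical.

```python
import numpy as np

def lg_summary(counts_per_gram):
    """Summarise spore counts on the lg(count/g) scale."""
    lg = np.log10(np.asarray(counts_per_gram, dtype=float))
    # Assumption: sample standard deviation (ddof=1).
    return {"mean": lg.mean(), "sd": lg.std(ddof=1), "min": lg.min(), "max": lg.max()}

def range_within(inner, outer):
    """True if the range of `inner` lies inside the range of `outer`."""
    return outer["min"] <= inner["min"] and inner["max"] <= outer["max"]

train_stats = lg_summary([1.3e4, 2.0e5, 3.1e6, 7.7e8])   # placeholder counts
test_stats = lg_summary([1.8e4, 4.0e6, 4.4e8])
print(range_within(test_stats, train_stats))              # True
```

Checking that the testing range lies inside the training range guards against evaluating the model on spore levels it never saw during training.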

Spectral analysis
As shown in Figure 6, the raw reflectance spectra of all maize samples with different moldy grades followed a similar trend over 400–1000 nm, with smooth curves and no obvious absorption peaks. The reflectance of the kernels decreased as the fungal spore count and storage time increased; that is, mildewed kernels absorbed more light. This might be related to biochemical changes in the kernels during storage, such as declines in water, protein, starch, and other nutrients, loss of surface luster, and darkening of color. [41] Furthermore, reflectance was low in the 400–600 nm range, which might be related to light absorption by pigments in maize, such as chlorophyll and carotenoids. [42] The feature at approximately 500 nm might be related to color changes of the maize kernels. [43] A relatively weak peak at 960 nm might be linked to water in the kernels. Further analysis is needed to determine whether these characteristic wavelengths can be used to build classification models that express the characteristics of kernels of different moldy grades.

Training results of SAE and CNN
The SAE was built using the raw spectra of the pixels in the ROIs of maize kernels. Two hidden layers were set, with 120 or 100 neurons in layer1 and 30 or 15 neurons in layer2, and sparsity constraints ρ of 0.1 and 0.3. SAE networks were trained with these parameter combinations, and the optimal network was chosen according to the final cost value. The number of iterations was set to 250 for each network, and the cost function converged stably after 55–60 iterations for all networks.
The training results of the SAE networks are shown in Figure 7. The cost functions of the SAEs with one hidden layer (layer1:120 or layer1:100) converged to higher values than those of the SAEs with two hidden layers (layer2:120-30 or layer2:100-15). With ρ = 0.1, the cost values for layer1:120 and layer1:100 stabilized at 0.75 and 0.63, respectively, and fell to 0.45 and 0.31 for layer2:120-30 and layer2:100-15, respectively (Figure 7(a)). A similar trend was observed with ρ = 0.3 (Figure 7(b)), where the better-performing layer2:120-30 and layer2:100-15 networks converged to 0.41 and 0.43, respectively. Comparing the cost values, the best SAE architecture was layer2:100-15 with ρ = 0.1. Therefore, the 15 output features of hidden layer2 of this network (layer2:100-15) were used as the abstract spectral features of the original input spectra for modeling.
The optimal CNN was selected according to the trend of the cost during iterative training (Figure 8). The initial learning rates were set to 0.0001, 0.005, 0.0005, and 0.001, the two attenuation factors were 0.75 and 0.85, and the numbers of epochs were 380, 402, 382, and 450, respectively. With an attenuation factor of 0.75 (Figure 8(a)), the cost fluctuated markedly early in training for all learning rates. Convergence with a learning rate of 0.0001 was clearly inferior to the others, stabilizing at 0.33, whereas a learning rate of 0.001 gave good convergence, reaching 0.21. With an attenuation factor of 0.85 (Figure 8(b)), convergence was satisfactory overall, and the cost values converged to 0.19–0.21. Therefore, a learning rate of 0.001 and an attenuation factor of 0.85 were chosen as the best parameters for the CNN model.

Identification results based on spectral features from SAE and SPA
Table 2 shows the identification results for the different mildew grades of maize kernels based on spectral features, using the deep SAE model and the traditional KNN, SVM, and PLS-DA models (the latter using only the spectral features selected by SPA). For the SAE model, the overall CRRs of the training and testing sets were 96.31% and 93.68%, respectively, a strong performance. Grade1 and grade4 (only one sample was misclassified in both sets) were recognized better than grade2 and grade3. The KNN model (CRR of 88.42%) performed close to the SVM model (CRR of 88.94%) in the training set, and their CRRs decreased to 83.15% and 81.05% in the testing set, respectively. Both models were weaker than the SAE model: their testing-set CRRs were 10.53 and 12.63 percentage points lower than that of SAE, respectively. The PLS-DA model (testing-set CRR of only 74.73%) performed worse than the KNN and SVM models and well below the SAE model. In other words, PLS-DA is not suitable for identifying the moldy grades of maize kernels because of its weak stability. Therefore, on spectral data, the SAE model outperformed the others, suggesting significant potential for identifying the mildew status of maize kernels. The deep spectral features mined by SAE expressed the mildew status of maize kernels better than the shallow features from the traditional method.

Identification results based on image features from CNN and GLCM
Table 3 shows the identification results for the different mildew grades of maize kernels based on image features, using the deep CNN model and the traditional KNN, SVM, and PLS-DA models (the latter using only the image features extracted by GLCM). As with the spectral features, the deep CNN model performed best, with CRRs of 95.78% and 94.73% for the training and testing sets, respectively. The overall CRRs of the CNN and SAE models were similar, but CNN recognized the grade2 and grade3 samples slightly better than SAE. The overall CRRs of SVM were slightly higher than those of KNN in both the training set (91.05% vs 90.52%) and the testing set (85.26% vs 84.21%), but they were 4.73 and 9.47 percentage points lower than the corresponding CRRs of the CNN model. The PLS-DA model was clearly inferior to the other models, with overall CRRs of only 84.73% and 75.78% for the training and testing sets, respectively, which were 11.05 and 18.95 percentage points lower than those of the CNN model. Therefore, the CNN model with image features was clearly superior to the other models for identifying the mildew status of maize kernels, and the deep image features represented the different mildew kernels better than brightness, color, and texture features.

Identification results based on fusion features from SAE and CNN
The above analysis shows that spectral or image features alone, combined with deep models (SAE and CNN), can identify the degree of kernel mildew, and that these models outperform the traditional feature learning methods. To further improve the accuracy and stability of the model, the abstract spectral features from SAE and the deep image features from CNN were fused to build recognition models with the KNN, SVM, and PLS-DA classifiers (named SAE-CNN-KNN, SAE-CNN-SVM, and SAE-CNN-PLS-DA, respectively). The results are shown in Table 4.
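The text does not specify how the SAE and CNN features were fused. A common feature-level choice, assumed here, is to standardize each feature block with training-set statistics and concatenate the blocks before classification. The sketch uses random placeholder arrays: 15 SAE features per the selected layer2:100-15 network and, as a further assumption, 256 CNN features taken from a fully connected layer. The function names are hypothetical.

```python
import numpy as np

def zscore(train, test):
    """Standardise with training-set statistics only, so no test information leaks."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)
    return (train - mu) / sd, (test - mu) / sd

def fuse(spec_tr, spec_te, img_tr, img_te):
    """Feature-level fusion by concatenating standardised SAE and CNN feature blocks."""
    s_tr, s_te = zscore(spec_tr, spec_te)
    i_tr, i_te = zscore(img_tr, img_te)
    return np.hstack([s_tr, i_tr]), np.hstack([s_te, i_te])

rng = np.random.default_rng(0)
sae_tr, sae_te = rng.random((190, 15)), rng.random((95, 15))      # placeholder SAE features
cnn_tr, cnn_te = rng.random((190, 256)), rng.random((95, 256))    # placeholder CNN features
fused_tr, fused_te = fuse(sae_tr, sae_te, cnn_tr, cnn_te)
print(fused_tr.shape, fused_te.shape)   # (190, 271) (95, 271)
```

Standardizing each block keeps the 256 image features from dominating the 15 spectral features purely through scale, which matters for distance-based classifiers such as KNN and for RBF-kernel SVMs.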
The models based on fusion features improved significantly on the single-feature models. The best model, SAE-CNN-SVM, achieved CRRs of 99.47% and 98.94% in the training and testing sets, respectively: one grade4 sample was misclassified as grade3 in the training set, and likewise one in the testing set. The CRRs of the SAE-CNN-KNN model in the training set (98.94%) and testing set (96.84%) were slightly lower than those of the SAE-CNN-SVM model. In the training set, only one grade2 sample and one grade3 sample were misclassified, each as the other grade. In the testing set, three of the 95 samples were misclassified, and the CRR for grade2 was 100%. The SAE-CNN-PLS-DA model was clearly inferior to the SAE-CNN-SVM and SAE-CNN-KNN models, but its training-set CRR (93.68%) was 10.53 and 8.95 percentage points higher than those of PLS-DA with spectral features (83.15%) and PLS-DA with image features (84.73%), respectively, and the corresponding increases in the testing set were 15.79 and 14.74 percentage points.
In addition, on the testing set, the CRR of the SAE-CNN-SVM model was 5.26 and 17.89 percentage points higher than those of the spectral-feature SAE model (from 93.68% to 98.94%) and SVM model (from 81.05% to 98.94%), respectively. Compared with the image-feature models on the testing set, the increases were 4.21 and 13.68 percentage points over CNN (from 94.73% to 98.94%) and SVM (from 85.26% to 98.94%), respectively. The deep model based on fusion features therefore effectively improved the recognition of mildewed maize kernels. This also suggests that maize mildew results from combined changes in internal components and external attributes.
Based on the above analysis, the SAE-CNN-SVM and SAE-CNN-KNN models were further evaluated by sensitivity and specificity (Table 5). For the SAE-CNN-SVM model, the sensitivity and specificity were all 1 for grade1 and grade2 in both the training and testing sets. The lowest value was the grade4 sensitivity of 0.95 in the testing set, and the remaining values were 0.98–0.99. The SAE-CNN-KNN model gave acceptable results by comparison. In the training set, its sensitivity and specificity were all 1 for grade1 and grade4 and 0.97–0.99 for the other grades. In the testing set, the values decreased slightly to 0.95–1, and only the specificities for grade1 and grade4 were 1. The sensitivity and specificity results therefore further confirm that the SAE-CNN-SVM model performs better overall than the SAE-CNN-KNN model.

Visualization of the identification results
Figure 9 visualizes the identification results for the different moldy grades pixel-wise and kernel-wise (object-wise). The grades are marked in different colors: green for grade1, blue for grade2, yellow for grade3, and red for grade4. In the original images, grade1 and grade4 kernels differed markedly, whereas grade2 and grade3 showed no obvious difference, and these characteristics were also apparent in the visualization images. Pixel-wise, the kernels of grade1 and grade4 were mainly green and red, respectively, which was a satisfactory result. Grade2 and grade3 were dominated by blue and yellow, respectively, but some pixels were misjudged as other grades and mixed colors appeared in the kernels: green was mixed with blue in grade2, and blue with yellow in grade3. This indicates that the changes in moldy status on the kernel surface caused by fungal infection were not uniform during storage. Kernel-wise, the recognition results were displayed clearly in different colors. Only one grade2 kernel was misclassified as grade1, giving a CRR of 96.67% (29/30 × 100%), and two grade3 kernels were misclassified as grade2, giving a CRR of 93.33% (28/30 × 100%). For the image of mixed kernels, the pixel-wise results were less satisfactory: many pixels were misclassified into other grades, and mixed colors were distributed over the kernel surfaces. By comparison, the kernel-wise result was acceptable, with only one grade3 kernel (yellow) misclassified as grade4 (red). It is therefore feasible to visualize the moldy grades of maize kernels with the best classification model, which is difficult to achieve by chemical methods or naked-eye observation.
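The text does not state how kernel-wise labels were obtained. One simple option, assumed here for illustration, is a majority vote over the pixel-wise predictions inside each kernel mask, followed by painting each kernel in its grade color (green, blue, yellow, and red for grade1 to grade4, as in the text). The kernel label map and pixel predictions below are toy inputs, and the function names are hypothetical.

```python
import numpy as np

# Display colours from the text: green, blue, yellow, red for grade1-grade4 (RGB, 0-255).
GRADE_RGB = np.array([[0, 255, 0], [0, 0, 255], [255, 255, 0], [255, 0, 0]], dtype=np.uint8)

def kernel_wise(pixel_grades, kernel_ids, n_grades=4):
    """Give every pixel of a kernel the majority grade of that kernel's pixels.

    pixel_grades: (rows, cols) predicted grades 0..n_grades-1 per pixel.
    kernel_ids: (rows, cols) labels, 0 for background and 1..K for kernels.
    """
    out = np.full(pixel_grades.shape, -1)
    for k in np.unique(kernel_ids[kernel_ids > 0]):
        inside = kernel_ids == k
        votes = np.bincount(pixel_grades[inside], minlength=n_grades)
        out[inside] = int(np.argmax(votes))
    return out

def colourise(grades):
    """RGB image: grade colours on kernels, black background (grade -1)."""
    rgb = np.zeros(grades.shape + (3,), dtype=np.uint8)
    mask = grades >= 0
    rgb[mask] = GRADE_RGB[grades[mask]]
    return rgb

# Two toy kernels: mostly grade2 (code 1) on the left, mostly grade4 (code 3) on the right.
ids = np.zeros((4, 6), dtype=int)
ids[:, :3], ids[:, 3:] = 1, 2
pix = np.array([[1, 1, 0, 3, 3, 3], [1, 1, 1, 3, 2, 3],
                [1, 0, 1, 3, 3, 3], [1, 1, 1, 2, 3, 3]])
kw = kernel_wise(pix, ids)
img = colourise(kw)
```

Voting smooths out isolated misclassified pixels, which is consistent with the kernel-wise maps looking cleaner than the pixel-wise ones, although the true procedure used in the study may differ.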

Conclusion
In this study, the classification of the moldy grades of fungus-contaminated maize kernels was investigated using HSI technology combined with the deep learning algorithms SAE and CNN. The maize samples were classified into four grades (health, mild mildew, moderate mildew, and heavy mildew) using the measured fungal spore counts as the reference. Deep spectral features were represented by SAE and optimal wavelengths were selected by SPA; deep image features were extracted by CNN and texture features by GLCM. Based on the spectral, image, and fused features, KNN, SVM, and PLS-DA classifiers were used to build recognition models for the different moldy grades of kernels. Among them, the SAE-CNN-SVM model, which combined the fused SAE and CNN features with the SVM classifier, gave the best classification performance, with CRRs of 99.47% and 98.94% for the training and testing sets, respectively, and sensitivity and specificity values of 0.95–1. The visualization displayed the moldy grades of maize kernels intuitively, which is difficult to achieve by chemical methods or naked-eye observation. The deep model based on fusion features thus recognized mildewed maize kernels more effectively than the traditional feature learning methods. This research is expected to provide guidance for developing equipment for online batch detection of moldy maize kernels in the field, and a new approach to data dimensionality reduction and feature mining from HSI data.