Assessment and Prediction of Free Fatty Acids Changes in Maize Based on Effective Accumulated Temperature in Large Granaries

ABSTRACT In large granaries in China, maize may be stored for 3–5 years. Free fatty acid (FFA) content is commonly used as a sensitive indicator of grain quality changes during storage. For regular quality monitoring, samples of the stored grain are taken manually and tested in the laboratory every month. This procedure consumes substantial labor and time, yet it cannot provide real-time information. Temperature is a main factor influencing the quality of stored grain. In this study, the effective accumulated temperature (EAT) was found to be strongly correlated with FFA. Two machine learning (ML) methods, classification and regression trees (CART) and model trees (MT), were used to estimate FFA from EAT based on data collected from 11 large granaries in northeastern China. Both models performed best when the minimum number of samples for segmentation was set to 20. The models were evaluated by mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²). The MAE, RMSE, and R² were 1.296, 1.761, and 0.759 for CART and 1.247, 1.821, and 0.741 for MT, respectively. As CART had the lower RMSE and the larger R², it performed better. The model provides a method to estimate changes in FFA using the existing temperature monitoring system in large granaries, as a way of monitoring stored maize quality in real time.


Introduction
Maize (Zea mays L.), one of the most important crops in the world, is widely grown for food, forage, energy, and industrial materials. [1,2] In 2019, the output of maize reached 260.78 million tons in China, while the outputs of paddy and wheat were 296.93 and 237.28 million tons, respectively. [3] Maize has been the crop with the largest output in China for years. [4] After harvest, storage is an inevitable step before grain is eaten or processed. During storage, losses in quality and quantity may occur because of insufficient control measures, which may allow the growth and reproduction of insects, mites, mold, and fungi. [5,6] Regular quality monitoring is indispensable to ensure grain storage security.
FFA is commonly regarded as a sensitive indicator of grain quality changes during storage. [7] Under unfavorable storage conditions, partial hydrolysis of the glycerides may occur in grain, which increases FFA. [8] An increase in FFA indicates quality deterioration, which can contribute to deterioration of taste and smell, loss of seed vigor, and reduced processing properties. [9,10] The American Association of Cereal Chemists (AACC) provides four methods for FFA determination: a general method, a rapid method for small grain, a rapid method for corn, and a colorimetric method. [11] All of them operate through a titrimetric procedure, which is precise but time-consuming. Besides the traditional gas chromatography method, [12] other determination methods have also been explored. Near-infrared spectroscopy [13] and gas chromatography with a mass spectrometry detector [14] have been used to analyze FFA in soybean and berry seed, respectively. Hui Jiang et al. used a homemade olfactory visualization sensor to quantitatively determine the FFA of stored rice. [15] When a large number of grain samples must be tested, these methods are time-consuming and laborious. Therefore, a rapid and effective measurement of FFA is of great significance for grain stored on a large scale.
Many grain storage studies have been conducted at smallholder or laboratory scale. [16][17][18] The conditions of grain bulks stored in large-scale granaries differ considerably from those in small-scale tests, including composition (impurities, damaged grain), temperature gradient, management, and storage period. Temperature gradients in stored grain are generated by seasonal changes in ambient air temperature and solar radiation. [19] Temperature and moisture content (MC) are two key factors that determine grain quality during storage. [20] Modern commercial grain storage enterprises have built multifunctional systems to ensure grain storage security, including at least a temperature monitoring system, ventilation, cooling machines, and inner circulation fumigation. Measuring grain temperature is the main method used by the grain industry to monitor storage conditions because it is easy and inexpensive to achieve with thermocouples. [21] Relative humidity (RH) sensors are not widely used in grain bulks by the grain storage industry. A mass of grain temperature data has been obtained during grain storage. However, the data are commonly used only to monitor storage conditions or for real-time control. Li, X.J. et al. built a smart cooling-aeration system based on temperature data, which could reduce the energy consumed in cooling bulk paddy stored in concrete silos. [22] To date, less than 1% of the data generated in the Internet of Things is used, mostly for alarms or real-time control, although more could be used for optimization and prediction. [23] How to use these data to predict grain quality changes, and thereby ensure grain storage security, has so far been missing.
To explore the relation between temperature data and grain quality changes, machine learning (ML) methods can be used because of their effectiveness in dealing with big data. In recent years, ML methods have been applied in grain storage to tasks such as grain moisture determination, [24] dry matter loss prediction, [25] classification of grain storage inventory modes, [26] prediction of grain damage in smallholder storage, [27] and prediction of insect populations. [28] ML could be an important tool for improving grain storage security.
The classification and regression trees (CART) algorithm and the model trees (MT) algorithm both belong to tree methods, which are widely used in data mining because they can handle multiple variables and are easy to improve. [29,30] Compared with other ML methods, CART and its variants are more acceptable to the public because their results are explainable. In this paper, CART and MT models are proposed to predict the FFA value of stored maize based on temperature data collected from 11 large granaries in northeastern China.

Study area
The maize granaries studied are located in Jilin Province and Heilongjiang Province in northeastern China. According to the National Bureau of Statistics of the P.R.C., the total maize output of China in 2019 was 260.78 million tons, including 30.45 million tons from Jilin Province and 39.40 million tons from Heilongjiang Province; i.e., the two provinces accounted for 26.8% of the total. The experimental data were collected from 11 maize storage granaries, 3 in Heilongjiang Province and 8 in Jilin Province.

Temperature monitoring system of the granary
The granaries are designed and built according to the national standard of the P.R.C., GB 50320, 2014 Code for Design of Grain Storehouses. The length of a granary is generally 36-60 m and the width is 36-40 m. The height of the grain bulk should not be below 6 m, and the distance between the grain bulk surface and the horizontal members of the roof should not be less than 1.8 m.
The deployment of the temperature monitoring system in the granary is regulated by the national standard LS/T 1203, 2002 Measurement and Control System for Condition of Stored-grain. The thermometric cables are arranged as follows: the spacing between rows and columns in the horizontal direction shall not be greater than 5 m, the spacing in the vertical direction shall not be greater than 3 m, and the cables shall be about 0.3-0.5 m away from the grain surface, the granary bottom, and the granary wall. Figure 1 depicts the basic working principle of the temperature monitoring system. The thermometric cables with sensors are deployed in the storehouse, and all temperature data are collected at the integrated control box and then transmitted through a wireless device to the workstation. The data displayed and stored at the workstation are used as the basic information for stored grain management.

Maize quality assessment
The maize samples were taken from the granaries between March 2017 and October 2020. Sampling followed the national standard GB 5491, 1985 Inspection of Grain and Oilseeds Methods for Sampling and Sample Reduction. The sampling quantity depends on the height and the area of the grain bulk. A grain bulk higher than 6 m is divided into 4 or 5 layers. Each layer is divided into small areas of not more than 50 m² each. Samples are taken at the four corners and the center point of each area.
The MC of the maize samples was determined by the oven-drying method (ASABE, 2017). [31] According to the national standard GB/T 29890, 2013 Technical criterion for grain and oil-seeds storage, the two provinces studied in this paper belong to the low-temperature and high-humidity grain storage area. The highest MC allowed for stored maize in this area is 14%, and the difference between the maximum and minimum moisture values should not exceed 1%. Before storage in the granary, the quality of the corn must be inspected truck by truck, and the corn must be stored by class in strict accordance with the inspection quality standards.
The FFA value of the maize samples was measured according to the national standard GB/T 20570, 2015 Guidelines for Evaluation of Maize Storage Character. At room temperature, the fatty acids in the maize were extracted with anhydrous ethanol and then titrated with potassium hydroxide standard solution, after which the FFA value was calculated. The standard stipulates that each sample be determined twice by the same inspector. The average of the two values was used as the result, and the difference between the two values should not be greater than 2 mg KOH/100 g.

Effective accumulated temperature
Temperature is a key factor for grain storage security. It has a great influence on the emergence and growth of mites and fungi in a grain bulk, and on the metabolism and quality changes of the grain. Here, the effective accumulated temperature (EAT), a form of temperature calculation, is introduced as follows:

$$A_{EAT} = \sum_{d=1}^{n} \left( T_d - T_0 \right) \tag{1}$$

where $A_{EAT}$ is the EAT of the grain (°C*d), $d$ is the storage time in days, $n$ is the total number of days (d), $T_0$ is the threshold temperature (°C), and $T_d$ is the average temperature of the grain bulk on day $d$ (°C). Every living thing responds to changing temperature. To relate such responses to temperature on a quantitative basis, EAT has been used for seed germination, [32] insect generation, [33] crop growth, [34] etc. The rise of FFA in grain is related to temperature changes, so it is reasonable to build a relation between FFA and EAT. When the temperature is below 0°C, the metabolism of microorganisms stays at a low level. Therefore, the threshold temperature $T_0$ is set to 0°C.
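As a rough illustration of the EAT calculation, the following Python sketch sums the daily excess of the bulk mean temperature over the threshold. The function name, the sample temperatures, and the `clamp` option are assumptions made for illustration. In particular, whether days below the threshold contribute zero or a negative amount is not stated here, so both behaviors are shown.

```python
def effective_accumulated_temperature(daily_mean_temps, threshold=0.0, clamp=True):
    """Return the accumulated temperature above `threshold` in degC*d.

    daily_mean_temps: one bulk-average temperature per storage day (degC).
    clamp: if True (an assumption, the common degree-day convention),
           days colder than the threshold contribute zero instead of
           a negative amount.
    """
    total = 0.0
    for t_d in daily_mean_temps:
        excess = t_d - threshold
        if clamp and excess < 0:
            excess = 0.0
        total += excess
    return total


# Hypothetical four-day record of bulk mean temperatures (degC).
daily = [-2.0, 1.5, 4.0, 10.5]
eat_clamped = effective_accumulated_temperature(daily)             # cold day ignored
eat_plain = effective_accumulated_temperature(daily, clamp=False)  # plain sum
print(eat_clamped, eat_plain)
```

With the threshold at 0°C, the two variants differ only by the contribution of the sub-zero day.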

CART and MT
CART: The gist of tree algorithms is the extraction of meaningful subgroups characterized by homogeneous covariate values and a common outcome. [35] The traditional CART algorithm uses Shannon entropy to split a set into two subsets. In this study, total variance was used in place of Shannon entropy. The CART is described as follows:

$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\} \tag{2}$$

where $D$ is a standard data set of $N$ samples for regression problems, $x \in \mathbb{R}^d$ is a $d$-dimensional feature vector, and $y \in \mathbb{R}$ is a continuous random variable. If each attribute is viewed as a coordinate axis, all $d$ attributes constitute a $d$-dimensional feature space, and each $d$-dimensional feature vector $x$ corresponds to a data point in that space. The objective of the CART is to divide the feature space into several subspaces, each with a fixed output value, i.e., all input values in the same subspace have the same output value. First, an attribute $a$ is chosen and all of its possible values are traversed to find the optimal partition point of the attribute, denoted $v^*$, according to formula (3):

$$v^* = \arg\min_{v} \left[ \sum_{x_i \in R_1(a,v)} (y_i - c_1)^2 + \sum_{x_i \in R_2(a,v)} (y_i - c_2)^2 \right] \tag{3}$$

where $v^*$ is the optimal partition point of the attribute, $R_1(a,v) = \{x \mid x \in D^{a \le v}\}$, $R_2(a,v) = \{x \mid x \in D^{a > v}\}$, $y_i$ is the output value of data sample $x_i$, and $c_1$ and $c_2$ are the average output values of $y_i$ over the data sets $R_1(a,v)$ and $R_2(a,v)$, respectively. $c_1$ and $c_2$ are calculated as follows:

$$c_1 = \frac{1}{|R_1(a,v)|} \sum_{x_i \in R_1(a,v)} y_i, \qquad c_2 = \frac{1}{|R_2(a,v)|} \sum_{x_i \in R_2(a,v)} y_i \tag{4}$$

Second, all attributes are traversed to find the optimal partition attribute, and the feature space is divided into two subspaces at the optimal partition point of that attribute. These steps are repeated for each subspace until the stop condition is met. In this way, a CART regression tree is generated.
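The search for the optimal partition point described above can be sketched in a few lines of Python. This is a minimal illustration for a single attribute (such as EAT), not the code used in the study; the function names and the toy data are hypothetical, and the convention that values less than or equal to the candidate go to the first subset is an assumption.

```python
def sum_squared_error(values):
    """Total squared deviation of `values` from their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Try every observed x as a partition point and keep the one that
    minimizes the summed squared error of the two resulting subsets."""
    best_v, best_err = None, float("inf")
    # The largest x is skipped: splitting there would leave one side empty.
    for v in sorted(set(xs))[:-1]:
        low = [y for x, y in zip(xs, ys) if x <= v]
        high = [y for x, y in zip(xs, ys) if x > v]
        err = sum_squared_error(low) + sum_squared_error(high)
        if err < best_err:
            best_v, best_err = v, err
    return best_v, best_err


# Toy data with an obvious jump between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
split_value, split_error = best_split(xs, ys)
print(split_value, split_error)
```

Minimizing the summed squared error of the two subsets is equivalent to minimizing their total variance weighted by subset size, which is why the split lands at the jump in the toy data.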
Assume that the feature space is finally divided into the subspaces $R_1, R_2, \ldots, R_M$. The model formula of the CART regression tree can be expressed as

$$f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m) \tag{5}$$

where $I(\cdot)$ is the indicator function and, as before, $c_m$ is the mean of the output values $y_i$ corresponding to the samples $x_i$ in the set $R_m$. Intuitively, for a given sample $x_i$, the formula first determines which subspace the sample belongs to and then takes the output value of that subspace as the predicted value of the sample.

MT: The main difference between CART and MT is the form of the leaf node. The leaf node of the MT is a piecewise linear function, whereas that of the CART is a constant. To obtain the piecewise linear functions of the MT, some changes need to be made to the CART, e.g., the allowable descent error and the minimum number of samples for segmentation need to be set. The piecewise linear function of each leaf node is obtained as follows:

$$y = Xw \tag{6}$$

where $X$ is the matrix of the input data and $w$ is the regression coefficient vector. For given input data, the regression coefficient vector is the value that minimizes the error between the calculated output and the real output value. The error is described as:

$$\sum_{i} \left( y_i - x_i^{T} w \right)^2 \tag{7}$$

where $y_i$ is the real output value, $x_i$ is the input data, and $w$ is the regression coefficient vector. In matrix form, function (7) can also be written as:

$$(y - Xw)^{T} (y - Xw) \tag{8}$$

Taking the derivative with respect to $w$ gives:

$$-2 X^{T} (y - Xw) \tag{9}$$

Setting $X^{T}(y - Xw) = 0$ yields the regression coefficient vector:

$$w = (X^{T} X)^{-1} X^{T} y \tag{10}$$

where $w$ is the regression coefficient vector, $X$ is the matrix of input data, and $y$ is the output value corresponding to the input data.

Performance Metrics: The performance of a model is commonly evaluated by measures of the deviation between the calculated and actual values. The measures used in this study are the mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²). They are defined as follows:

$$MAE = \frac{1}{N} \sum_{t=1}^{N} \left| y_p(t) - y(t) \right| \tag{11}$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( y_p(t) - y(t) \right)^2} \tag{12}$$

$$R^2 = 1 - \frac{\sum_{t=1}^{N} \left( y(t) - y_p(t) \right)^2}{\sum_{t=1}^{N} \left( y(t) - \bar{y} \right)^2} \tag{13}$$

where $N$ is the total number of FFA values to be estimated, $y_p(t)$ and $y(t)$ are the calculated value and the true value of the $t$-th sample, respectively, and $\bar{y}$ is the mean of all true values.
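For readers who want to reproduce the three performance measures, a minimal Python sketch is given below. It uses the standard definitions of MAE, RMSE, and the coefficient of determination; the function names and the four-sample example are hypothetical and are not data from this study.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between predictions and true values."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared deviation."""
    mse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and predicted FFA changes.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 4.0, 5.0]
print(mean_absolute_error(y_true, y_pred))
print(root_mean_square_error(y_true, y_pred))
print(r_squared(y_true, y_pred))
```

Because RMSE squares each deviation before averaging, it penalizes a few large errors more heavily than MAE does, which is one way two models can rank differently on the two measures.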

Results
In this section, the correlation between EAT and FFA was analyzed with the Pearson simple correlation coefficient using BONC DSS Statistics (Version 1.0, BONC, 2018). The EAT of stored maize in commercial storehouses was used to build the CART and MT models. The MAE, RMSE, and R² were applied as the criteria to evaluate their performance. The models were built and evaluated in PyCharm (Community Edition 2020.3, JetBrains Corp, 2020) combined with Anaconda (Individual Edition 5.3.1, Anaconda, Inc., 2016). Figure 2 shows the changes in FFA and EAT of maize stored in a granary located in Changchun City, Jilin Province. The storage period covered by the data ran from Apr., 2018 to Sep., 2019. From the 2nd to the 7th month, the EAT rose markedly from 160 to 2338.6 (°C*d), and the FFA value rose from 35.4 to 39.5 (mg KOH/100 g) during the period from the 4th to the 6th month. Similar FFA values of stored maize have been obtained by other researchers. Gras, P.W. et al. studied the difference between maize stored in unsealed bags and in sealed bags. After 9 months of storage, the FFA values of maize at the bottom, middle, and top were 46.6, 56.4, and 43.3 mg KOH/100 g in unsealed bags and 18.6, 42.8, and 25.6 mg KOH/100 g in sealed bags, respectively. [7] However, from the 8th to the 13th month, the EAT changed little and the FFA value also changed little. This means that changes in EAT, i.e., in temperature, have a significant influence on FFA during storage.

Correlation analysis
The correlation between the FFA value and the EAT of stored maize was analyzed with the Pearson simple correlation coefficient. Table 1 shows that EAT and FFA are significantly correlated at the 0.01 level (2-tailed), with a correlation coefficient of 0.963. Therefore, it is reasonable to build a regression function between the EAT and the FFA value.
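The Pearson coefficient used in this analysis can be computed directly from its definition. The sketch below is only an illustration in Python (the analysis itself was carried out in a statistics package); the function name and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical EAT values (degC*d) and FFA changes (mg KOH/100 g).
eat = [100.0, 500.0, 1200.0, 2300.0, 2400.0]
ffa_change = [0.2, 1.0, 2.1, 4.0, 4.3]
print(round(pearson_r(eat, ffa_change), 3))
```

A value close to 1 indicates that FFA rises almost linearly with EAT, which is the situation that motivates fitting a regression between the two.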

Results of CART and MT
Temperature data of the stored maize are collected through the temperature monitoring system shown in Figure 1, and data may occasionally be missing. The Pauta criterion was applied to eliminate abnormal data, and linear interpolation was used to fill in the missing data; the resulting data were then used to calculate the EAT value. Eventually, 298 sets of data were obtained. To avoid the sample representativeness bias caused by systematic differences between the training set and the test set, [36] 80% of the data were randomly selected as the training set and the remaining 20% were used as the test set. The FFA value used in the models was the measured value minus the initial value, in order to eliminate differences in the initial value. In effect, the models estimated the changes in the FFA value of stored maize.
The allowable descent error and the minimum number of samples for segmentation (MNSS) are the two major parameters that determine leaf creation in the tree models. The allowable descent error was set to 1; tests on the two models showed that changing it had little influence. The MNSS setting needed to be tested using the correlation coefficient, i.e., the Pearson product-moment correlation coefficient. The relation between the correlation coefficient and the MNSS value is shown in Figure 3. As the MNSS value rises, the correlation coefficient of the two models first rises, reaches its maximum when MNSS is 20, and then descends. Thus, MNSS was set to 20, at which the two models had the optimal performance.
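The preprocessing steps named above (Pauta criterion, linear interpolation, random 80/20 split) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the Pauta criterion is taken as the usual three-standard-deviation rule, missing readings are represented by `None`, and all function names, the seed, and the sample readings are hypothetical.

```python
import random
import statistics


def pauta_filter(values, k=3.0):
    """Mark readings farther than k sample standard deviations from the
    mean as missing (None). Existing None entries are left untouched."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    std = statistics.stdev(present)
    return [None if v is not None and abs(v - mean) > k * std else v
            for v in values]


def interpolate_gaps(values):
    """Fill interior None gaps linearly between the nearest known
    neighbors. Leading or trailing gaps are left as None."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for lo, hi in zip(known, known[1:]):
        step = (filled[hi] - filled[lo]) / (hi - lo)
        for i in range(lo + 1, hi):
            filled[i] = filled[lo] + step * (i - lo)
    return filled


def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sensor series with one spike that the 3-sigma rule removes.
readings = [10.0] * 10 + [100.0] + [10.0] * 9
cleaned = interpolate_gaps(pauta_filter(readings))
train, test = split_train_test(range(298))
print(cleaned[10], len(train), len(test))
```

Note that the three-sigma rule only becomes effective with enough readings: in very short series a single outlier inflates the standard deviation so much that it can never exceed three deviations from the mean.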
How the MNSS value influences the performance of the two models is shown in Figures 4 and 5. MNSS is the minimum number of samples for segmentation, i.e., a leaf node can be created in the tree models only when the number of samples reaches the MNSS value. This is a way of pruning the tree models in order to minimize the influence of abnormal or irrelevant values. It can be seen that the higher the MNSS value, the fewer functions are created in the model and the simpler the model is. According to Figure 3, the correlation coefficient of the two models was highest when MNSS was 20 and descended below and above this value. This can be interpreted as follows: if the MNSS value is small, abnormal values might be involved in creating the model; if the MNSS value is large, some helpful values are neglected. In either case, the performance of the model suffers. Figures 6 and 7 show the outcomes of the two models, respectively, with MNSS set to 20 in both. First, all possible values were traversed to find the optimal partition point, which was calculated according to equation (3). In the CART model, the first separation point value of EAT was 5195.487; all values above the separation point were stored in the left leaf and the remaining values were stored in the right leaf. The separation was repeated in the two leaves until a stop condition was met, i.e., the number of samples fell below the MNSS of 20 or the allowable descent error criterion was no longer satisfied.

Figure 2. Changes of FFA value and EAT of maize. The left and right y-axes represent the FFA value and the EAT, respectively, and each y-axis title is colored to match its line.
For example, in the second layer in Figure 6, the node was EAT = 7557.585 and could not be separated further; the FFA value corresponding to EAT values above 7557.785 was 14.189, stored in the left leaf node, and the FFA value corresponding to EAT values below 7557.785 but above 5195.487 was 9.908, stored in the right leaf node. The finished CART separation procedure is shown in Figure 6. The main difference in outcomes between the CART model and the MT model was the leaf node: the leaf node of the CART model was a number, whereas that of the MT model was a function.
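To make the difference between the two leaf types concrete, the following Python sketch grows a small single-attribute tree whose leaves are either constants (as in CART) or fitted lines (as in MT). It is an illustration under assumptions, not the implementation used in this study: the names `tol_err` and `min_samples` stand in for the allowable descent error and MNSS, each child is required to hold at least `min_samples` points, values above the split go to the left branch as in the description above, and the synthetic data are hypothetical.

```python
def mean(vals):
    return sum(vals) / len(vals)


def fit_line(pts):
    """Least-squares (intercept, slope) for one attribute."""
    mx, my = mean([x for x, _ in pts]), mean([y for _, y in pts])
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in pts) / sxx
    return my - slope * mx, slope


def err_const(pts):
    """Squared error around the mean output (constant leaf)."""
    m = mean([y for _, y in pts])
    return sum((y - m) ** 2 for _, y in pts)


def err_line(pts):
    """Squared error around the fitted line (linear leaf)."""
    b, w = fit_line(pts)
    return sum((y - (b + w * x)) ** 2 for x, y in pts)


def build_tree(pts, linear_leaf=False, tol_err=1.0, min_samples=20):
    err = err_line if linear_leaf else err_const
    best = None
    for v in sorted({x for x, _ in pts}):
        above = [p for p in pts if p[0] > v]
        below = [p for p in pts if p[0] <= v]
        if len(above) < min_samples or len(below) < min_samples:
            continue  # a child would be too small
        e = err(above) + err(below)
        if best is None or e < best[0]:
            best = (e, v, above, below)
    # Stop when no valid split exists or the error drop is too small.
    if best is None or err(pts) - best[0] < tol_err:
        return fit_line(pts) if linear_leaf else mean([y for _, y in pts])
    _, v, above, below = best
    return {"split": v,
            "left": build_tree(above, linear_leaf, tol_err, min_samples),
            "right": build_tree(below, linear_leaf, tol_err, min_samples)}


def predict(tree, x):
    while isinstance(tree, dict):
        tree = tree["left"] if x > tree["split"] else tree["right"]
    if isinstance(tree, tuple):  # linear leaf: (intercept, slope)
        return tree[0] + tree[1] * x
    return tree                  # constant leaf


# Synthetic step data for the constant-leaf tree, kinked line for the model tree.
step = [(x, 0.0 if x < 20 else 10.0) for x in range(40)]
kink = [(x, 2.0 * x if x < 20 else 100.0 - x) for x in range(40)]
cart, mt = build_tree(step), build_tree(kink, linear_leaf=True)
print(predict(cart, 25), predict(mt, 30))
```

On the step data a constant leaf per segment is already exact, while on the kinked data only the linear leaves can follow the slope inside each segment, which is the practical meaning of the leaf-node difference noted above.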

Comparison results
The test set (20% of the data) was used to estimate the performance of the two models. The criteria, MAE, RMSE, and R², were computed for each model. The smaller the MAE and RMSE and the larger the R², the better the performance of the model. As shown in Table 2, CART had a lower RMSE and a higher R² than MT, but MT had a smaller MAE. The R² of the two models was 0.759 and 0.741, respectively; the two values were very close. In general, CART performed better than MT. However, MT usually performs better than CART because a linear function normally gives a smaller estimation error than a constant. The result here might be because the data used for building the models are loosely distributed, as suggested by the correlation coefficients of the two models, which were 0.871 and 0.861, respectively, when MNSS was set to 20. In conclusion, the performances of CART and MT were close, and CART was slightly better.

Discussion
Maize is usually stored for 3-5 years in large granaries in China according to the national grain security policy. Grain quality declines over storage time and is affected by the interactions of physical and biological factors during storage. [37] The change in the FFA content of grain is generally viewed as a key indicator of grain quality deterioration during storage, [12] and it is directly related to the storage temperature. Temperature monitoring and control is an efficient approach to grain quality management and is widely used in large granaries. [22] During the winter, cool ambient air is blown into the granary by fans. [38] As shown in Figure 8, the cool air passes through the input tube and the ventilation cage to enter the grain, and the grain is cooled down. The opening rate of the ventilation cage is usually 25-35%. In summer, the temperature of the grain in the top layer rises quickly, while the bottom of the grain bulk remains at a low temperature. Axial fans move air from the bottom and the interior of the bulk to the top through the ventilation cages and vertical tubes to counter the rising temperature, as shown in Figure 8. This temperature control method is usually called inner air circulation (IAC).
The thermal diffusivity of bulk grain is very low. According to ASABE (2008), it is 10.22 × 10⁻⁸ m²/s for corn, 10.99 × 10⁻⁸ m²/s for rice, and 11.5 × 10⁻⁸ m²/s for hard wheat. [39] As the ambient temperature rises, the temperature at the top and at sites close to the wall rises faster than at the bottom and in the interior of the grain bulk, so a temperature gradient forms in the bulk. Taking one granary as an example, the temperature gradient changes over time as shown in Figure 9. From Mar., 2017 to Aug., 2017, the temperature in the grain bulk rose: the proportions of the bulk below 0°C and at 0-15°C became smaller, and the high-temperature proportions, at 15-25°C and above 25°C, became larger. From Sept. to Feb. of the next year, the temperature of the grain bulk went down.
The temperature gradient makes IAC possible, but it increases the uncertainty of grain quality. In large granaries, thermometric cables are widely used to monitor temperature. [40] For example, 60 thermometric cables containing 420 sensors in total are normally placed in a granary of 42 m × 24 m storing a grain bulk 6 m high. However, the sensor system still cannot reflect every temperature condition in the granary because of its large scale and the irregular self-heating points caused by complex biological conditions. [41,42] Likewise, each round of sampling for grain quality testing covers little of the granary because the sampling locations are sparse. Therefore, more reliable monitoring of the grain bulk needs more sensors and more sampling. However, sampling, testing, and installing sensors consume considerable labor and materials, which increases the cost of storage management.
Mathematical models relating FFA to storage time have been built in prior research. [43,44] But those models lack universality and may become infeasible once the assumed conditions change. In this study, ML methods, which are able to deal with big data, were used to develop a universal model for predicting grain quality changes, i.e., changes in FFA. The models revealed the relationship between EAT and FFA based on data collected from 11 large granaries in two provinces. The correlation coefficients of the models were 0.871 and 0.861, respectively. Tree methods can be useful when the analytic goal is to identify strata with the same covariate values and homogeneous multiple outcomes. [35,45] In this study, the performance of the models demonstrated their predictive ability and showed that real-time grain quality information could be obtained from the existing temperature monitoring system, which would help reduce both quality and economic losses.

Conclusion
The CART and MT models were built using EAT to estimate changes in the FFA of stored maize, a key indicator of grain quality, based on data collected from large granaries located in Heilongjiang and Jilin Provinces of China. The correlation coefficient between EAT and FFA was 0.963. With MNSS set to 20, the two models had the optimal performance, with correlation coefficients of 0.871 and 0.861, respectively. The outcomes of both models were shown in tree form. The MAE, RMSE, and R² were used to assess the performance of CART and MT; the results were 1.296, 1.761, and 0.759 for CART and 1.247, 1.821, and 0.741 for MT, respectively. As CART had the lower RMSE and the larger R², CART performed better. In large commercial granaries, temperatures generally ranged from −10°C to 30°C over one year. Temperature changes result in a temperature gradient between grain layers, which leads to differences in grain quality. It is difficult to obtain a thorough picture of grain quality by sampling. The model provides a method to obtain FFA as a real-time grain quality indicator based on the existing temperature monitoring system. It would benefit grain storage management by providing quality information with less labor. In further study, as humidity sensors, CO2 sensors, and other sensors come into use in large granaries, more factors such as humidity and CO2 will be taken into consideration to improve the models. In this study, the data were collected from granaries in northeastern China. Substantial data need to be collected and analyzed before the models can be applied to more areas.

Abbreviations
The