Geographical traceability of Boletaceae mushrooms using data fusion of FT-IR, UV, and ICP-AES combined with SVM

ABSTRACT Geographical traceability is important for consumer protection and quality control of edible mushrooms. In this work, Fourier transform infrared (FT-IR) spectroscopy, ultraviolet (UV) spectroscopy, and inductively coupled plasma-atomic emission spectrometry (ICP-AES), in combination with multivariate statistical analysis, were used to trace 312 mushroom samples from eight geographical origins. First, the FT-IR spectra, UV spectra, and contents of 14 elements of the 312 samples were analyzed separately. The principal components of the three datasets were then extracted by principal components analysis for data fusion. Finally, classification models were established on the basis of the UV, FT-IR, element, and fusion datasets combined with a support vector machine (SVM). Compared with the individual techniques, multi-technique data fusion clearly improved the classification performance of the SVM models for geographical origin traceability. In particular, the prediction accuracy of the SVM model built on the fusion of all three instruments was 99.04%, higher than that of any single technique and of the fusion of the two spectroscopic techniques. This result indicates that a data fusion strategy combined with SVM can provide a strong synergistic effect for geographical origin traceability of Boletaceae mushrooms: in this study, the more information was fused, the better the model performed. This method may be applied to the quality control and evaluation of analogous foods.


Introduction
Wild-grown edible mushrooms are a popular delicacy in Turkey, Poland, China, and other countries. [1][2][3] Recently, the consumption of wild edible mushrooms has increased rapidly, owing not only to their unique taste and texture but also to their chemical and nutritional properties. Edible mushrooms are generally considered a valuable health food because they are low in calories and fat and rich in vitamins, proteins, dietary fiber, and minerals. [4,5] Besides their nutritional value, edible mushrooms have potential medicinal value, such as enhancing immunity, lowering blood lipids, reducing cholesterol, and antineoplastic activity, which is attributed to terpenoids, antioxidants, and other active substances. [6,7] Many habitat factors, such as rainfall, air temperature, and altitude, may alter the nutritional components and the concentration of heavy metals. As reported by Lu et al., [8] the contents of ganoderic acids A and B, polysaccharides, and triterpenoids in Ganoderma lucidum varied between regions. Furthermore, the contents of heavy metal elements such as Cd, Ag, and Hg differ significantly among edible mushrooms from different geographical regions. [9,10] In addition, some minerals, such as K, P, Mg, Ca, Na, Cu, and Zn, play important roles in biological systems and act as cofactors of a wide range of enzymes. [11] However, excessive intake of trace minerals (such as Cr and Ni) can have toxic effects. [12] Hence, the geographical origin of wild-grown edible mushrooms matters for food quality, and an appropriate technology for tracing the geographical origin of mushrooms would support the stable development of the edible fungi market.
Nowadays, many analytical techniques are used to study the differences between, and assess the quality of, foods collected from different geographical origins. For example, Fourier transform infrared (FT-IR) spectroscopy has been used for differential identification of Wolfiporia extensa from different places, [13] Raman spectroscopy for milk powder characterization, [14] cold-vapor atomic absorption spectroscopy (CV-AAS) for assessing mushrooms collected from the Yunnan highlands and the eastern Tibetan plateau, [15] ultraviolet (UV) spectroscopy for discriminating whisky brands and counterfeits, [16] and inductively coupled plasma-atomic emission spectrometry (ICP-AES) for assessing Boletus tomentipes from different sites. [17] However, a single technique has limitations that can prevent it from providing a reliable result. For instance, FT-IR cannot confirm specific chemical components and their contents, [18] and Raman spectroscopy is easily affected by intrinsic fluorescence and is unsuitable for analyzing complex compounds at low analyte concentrations. [19] Compared with individual techniques, data fusion can provide a more complete interpretation of the data. The main aim of data fusion is to improve models by enhancing the synergy between the fused techniques and using complementary inputs. [20] Fusing complementary technologies has become one of the most reliable, effective, and powerful systematic approaches to determining food quality. In the past few years, data fusion strategies combined with multivariate statistical analysis have been widely applied in the quality evaluation of edible oils, beverages, and other foods. For example, Dankowska [21] used data fusion of fluorescence and UV spectroscopies to detect cocoa butter adulterated with cocoa butter equivalents. Pizarro et al. [22] obtained a synergistic effect in classifying extra virgin olive oils from different sites in Spain by fusing visible fingerprints and physical-chemical parameters. As reported by Wang et al., [23] an approach based on data fusion of near infrared and ultraviolet-visible (UV-vis) spectroscopies, with the aid of chemometric algorithms, successfully discriminated five varieties of green tea. These results showed that data fusion can reveal correlations between multiple kinds of variables, indicating that fusing complementary techniques is a powerful tool for food identification and quality evaluation.
In China, Yunnan Province covers a vast area and has abundant edible mushroom resources. Two-fifths of the edible mushroom species in the world (880 species) have been identified in Yunnan Province. [24] However, there are few reports on the evaluation of edible mushrooms from different geographical origins. The main goal of this work was to establish an effective method for geographical origin traceability of Boletaceae mushrooms from eight geographical origins in Yunnan Province, China. FT-IR, UV, and ICP-AES were used jointly and individually; the classification results were then compared and the optimal analytical method was selected. This work can provide a reference for further quality management of the mushroom market.

Samples preparation
The study included 312 samples of Boletaceae mushrooms. Fruiting bodies were collected from eight geographical origins in Yunnan Province, China, in August and September (Fig. 1). Detailed information on the Boletaceae mushrooms is shown in Table 1. The samples were thoroughly cleaned with a soft brush and dried in a drying oven at 55°C for 48 h until they reached a constant weight. All samples were then crushed in the laboratory, and the powder was sifted through an 80-mesh nylon sieve. Finally, the powder was stored in zip lock bags and kept dry at room temperature until analysis.

Instruments and reagents
An FT-IR spectrometer (PerkinElmer, Norwalk, CT, USA) equipped with a DTGS detector was used to obtain FT-IR spectra over the wavenumber range 4000-400 cm⁻¹, with 16 scans per sample at a resolution of 4 cm⁻¹. A UV-vis spectrophotometer (Shimadzu, Tokyo, Japan) was used to obtain UV spectra from 200 to 600 nm with a 1-nm slit width. It was equipped with a sample quartz cuvette and a blank quartz cuvette. All mineral elements in the samples were determined using an ICPE-9000 (Shimadzu, Japan). A pulverizer (FW-100), purchased from Tianjin Huaxin Instrument Factory in China, was used to powder the samples of Boletaceae mushrooms. A tablet press (YP-2), purchased from Shanghai Shanyue Instrument Inc. in China, was used to press the sample tablets, and an electronic balance (XS125A) was used for accurate weighing.
Nitric acid solution (65%) was of guaranteed reagent grade. KBr (purchased from Tianjin Fengchuan Fine Chemical Research Institute, China), chloroform (purchased from Yunnan Yangling Industrial Development Zone Shandian Medicine Co. Ltd., China), and hydrogen peroxide (30%) were of analytical reagent grade.

Data acquisition
To obtain the FT-IR spectral data, 1.0 ± 0.2 mg of sample powder and 100.0 ± 2.0 mg of KBr powder were milled and blended uniformly in an agate mortar. The mixture was then pressed into a tablet at a pressure of 10 MPa. The FT-IR spectrometer was preheated for 30 min before the mushroom spectra were recorded. Before the sample tablets were measured, a pure dried KBr tablet was scanned so that the contributions of atmospheric carbon dioxide and water could be subtracted. The experiment was carried out at 25°C and a relative humidity of 30%. Each sample was measured three times, and the average spectrum was used for further analysis. For UV spectral acquisition, 0.1 g of each sample powder was mixed with 10 mL of chloroform. The mixture was ultrasonically extracted (55 kHz) in a water bath at about 30°C for 30 min, and the filtrate was stored in test tubes for the next step. After the UV spectrometer had been preheated for 0.5 h, the background was scanned first using the chloroform solvent. Each sample was measured three times, and the average was taken as the final result.
To determine the concentrations of mineral elements, 0.3 g of dried sample powder was accurately weighed and placed in a high-pressure digestion vessel with 6 mL HNO3 (65%), 3 mL H2O2 (30%), and 1 mL ultrapure water. The mixture was then digested in an Ethos One (Milestone, Italy) closed microwave system for 2 h. Finally, the digest was filtered, diluted to 25 mL with ultrapure water, and prepared for instrumental analysis. The tea standard reference material GBW07605 (tea leaves, produced by the Institute of Geophysical and Geochemical Exploration in Beijing, China) was used to verify the precision and accuracy of the analytical method. The discrepancies between determined and certified contents were all below 10%, and the recovery rate was between 91% and 106%. In this experiment, all glassware was first moistened with 60% HNO3 and then rinsed with deionized water to prevent contamination.
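The dilution and recovery figures above imply a simple mass balance, which the following minimal sketch makes explicit. It assumes the instrument reports the digest concentration in mg/L and that no blank correction is needed. The function names and the example reading of 120 mg/L are hypothetical and are not taken from the study.

```python
def solution_to_dry_mass(conc_mg_per_l, volume_ml=25.0, sample_mass_g=0.3):
    """Convert a digest concentration (mg/L) to mg of analyte per kg of dry powder."""
    analyte_mg = conc_mg_per_l * volume_ml / 1000.0  # mg of analyte in the diluted digest
    return analyte_mg / (sample_mass_g / 1000.0)     # divide by sample mass in kg


def recovery_percent(measured, certified):
    """Recovery of a reference material, as a percentage of the certified value."""
    return 100.0 * measured / certified


# Hypothetical reading: 120 mg/L in the 25 mL digest of a 0.3 g sample.
print(solution_to_dry_mass(120.0))
# Hypothetical reference check: 9.1 measured against a certified value of 10.0.
print(recovery_percent(9.1, 10.0))
```

With the default volume and mass used in this work, a digest reading of 120 mg/L corresponds to roughly 10,000 mg kg⁻¹ in the dried powder, which is the order of magnitude reported for K.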

Data analysis
As far as we know, spectroscopic techniques collect not only useful chemical information but also interfering information caused by scatter, background disturbance, solvent perturbation, and so on. Therefore, several preprocessing methods, such as second derivative (SD) and multiplicative scatter correction (MSC), were used to optimize the datasets. SD is a common method for resolving overlapping peaks and removing baseline shifts. [25] MSC is used to separate the informative absorbance of the analyte from the scattering signal in the spectral data. [26] After the best preprocessing method had been selected, the optimized FT-IR and UV datasets were used for further analysis.
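The two preprocessing steps can be illustrated with a minimal sketch. It assumes that MSC uses the mean spectrum as its reference and that the second derivative is a plain central difference without smoothing. The software used in this work may apply a smoothed derivative instead, so the functions below are illustrative rather than a reproduction of the authors' processing.

```python
from statistics import mean


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference)."""
    n_points = len(spectra[0])
    # Reference spectrum: point-wise mean over all samples.
    ref = [mean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = mean(ref)
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of s = intercept + slope * ref.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_var
        intercept = s_mean - slope * ref_mean
        # Remove the additive and multiplicative scatter terms.
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


def second_derivative(spectrum, step=1.0):
    """Central-difference second derivative; the two end points are dropped."""
    return [
        (spectrum[i - 1] - 2.0 * spectrum[i] + spectrum[i + 1]) / step ** 2
        for i in range(1, len(spectrum) - 1)
    ]
```

Spectra that differ only by an offset and a scaling factor collapse onto the same curve after `msc`, which is the behaviour the correction is meant to produce.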
The goal of data fusion is to find out whether the existing data contain any potential information that could be useful for classification. To find the optimal method for evaluating Boletaceae mushrooms from eight geographical origins, a mid-level data fusion strategy was applied. Mid-level data fusion (also called feature-level data fusion) selects, from each technique, feature factors that represent the main information of the samples and combines them into a new data matrix. [27] Extracting the feature factors from each instrument is therefore a key step. The most common approaches for reducing dimensionality and selecting feature factors are principal components analysis (PCA) and partial least squares discriminant analysis. [28] The goal is to screen the optimal feature factors that represent the main sample information for identifying Boletaceae mushrooms from different geographical origins. Because PCA is easy to use and is usually applied before any more complex classification, [29] it was used to extract principal components from the signals of each instrument independently, and the selected principal components were then fused.
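Mid-level fusion as described here amounts to running PCA on each block separately and joining the retained scores side by side. The sketch below assumes autoscaled data and uses NumPy's SVD. The function names are hypothetical, and the original analysis was carried out in MATLAB, so this is an illustration of the idea rather than the authors' code.

```python
import numpy as np


def pca_scores(X, n_components):
    """Return the leading PCA scores of X (samples x variables) after autoscaling."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - X.mean(axis=0)) / std
    # SVD of the autoscaled matrix; U * S gives the scores on each component.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def mid_level_fusion(blocks, n_components):
    """Concatenate the leading PCA scores of each data block column-wise."""
    return np.hstack([pca_scores(X, k) for X, k in zip(blocks, n_components)])
```

Called with three blocks and component counts of 18, 7, and 5, `mid_level_fusion` would return a matrix with 30 columns per sample, which is the kind of fused matrix passed on to the classifier.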
To establish the classification model for identifying Boletaceae mushrooms of different origins, 208 samples (two-thirds of all samples) were selected as the training set and the other 104 samples as the test set using the classic Kennard-Stone selection algorithm [30] for the FT-IR, UV, ICP-AES, and fusion datasets. Based on these datasets, the support vector machine (SVM), an increasingly popular chemometric method, was applied to establish discrimination models. As a supervised pattern recognition method, SVM can handle nonlinear cases even with small datasets. [31] SVM has been widely used in food analysis for classification and regression. For example, Bougrini et al. [32] presented an effective method to detect adulteration in argan oil using an electronic nose and an electronic tongue combined with SVM, and Devos et al. [33] found that spectroscopic technology combined with SVM can readily discriminate olive oils from different regions. In this work, SVM was used to establish classification models based on the individual instruments and on data fusion for discriminating Boletaceae mushrooms from eight geographical origins. OMNIC (Version 8.2, Thermo Fisher Scientific Inc., USA) and UV Probe (Version 2.34, Shimadzu International Trade Co., Ltd.) were used to analyze the FT-IR and UV spectra, respectively. PCA and SVM were performed using MATLAB (version R2014a, MathWorks, USA).
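The Kennard-Stone split can be sketched as follows, assuming Euclidean distance and a plain list-of-lists data layout. It starts from the two most distant samples and repeatedly adds the sample that is farthest from everything already selected. This is a compact illustration for small datasets, not the implementation used in the study.

```python
from math import dist


def kennard_stone(X, n_select):
    """Return indices of n_select (>= 2) samples chosen by the Kennard-Stone algorithm."""
    n = len(X)
    # Start with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist(X[p[0]], X[p[1]]),
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    # Distance from each remaining sample to its nearest selected sample.
    nearest = {k: min(dist(X[k], X[i0]), dist(X[k], X[j0])) for k in remaining}
    while len(selected) < n_select:
        # Add the sample whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda k: nearest[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            nearest[k] = min(nearest[k], dist(X[k], X[nxt]))
    return selected
```

For a split like the one described here, the selected indices would form the training set (208 samples) and the unselected samples the test set (104 samples).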

Results and discussion
Chemical assignments of absorption peaks in FT-IR and UV spectra
All 312 raw FT-IR spectra of Boletaceae mushrooms from the different collection sites are shown in Fig. 2a. The spectra share several characteristic absorption peaks, which have been reported in previous studies. [34][35][36] The band around 3294 cm⁻¹ is mainly attributed to the stretching vibration of O-H, which may arise from water, triterpenes, polysaccharides, and sterols. The peak at 2930 cm⁻¹ is the -CH2 vibration of lipoids, and the peak at 1403 cm⁻¹ is attributed to CH2=CH-CH3 of triterpene compounds. The peak at around 1647 cm⁻¹ is attributed to C-N stretching and C=O stretching vibrations, indicating the presence of protein components. Moreover, the C-C stretching at around 1079 and 1025 cm⁻¹ probably indicates structures in chitin, a major structural polysaccharide in mushrooms. Furthermore, the region of 900-400 cm⁻¹ is mainly assigned to β-D-glucan, the pyranose form of glucose, triterpenes, and so on.
A total of 312 UV spectra of mushroom samples were collected. Because of serious noise in the region of 200-235 nm and the absence of characteristic absorption peaks at 400-600 nm, the main peaks are concentrated in the region of 235-400 nm, which provides the main information on the samples. As can be seen from the raw UV spectra in Fig. 2b, distinct characteristic absorption peaks appear at 275, 285, and 295 nm. These peaks are attributed to polysaccharides and proteins, compounds that are closely related to the antioxidant properties of mushrooms. [37,38] In general, visual inspection reveals no significant spectral differences among the Boletaceae mushrooms from different geographical origins, which indicates that the samples contain similar chemical components.
To reduce interfering information, SD and MSC were applied to preprocess the FT-IR and UV spectral data, respectively. Figures 2a and 2b show the raw FT-IR and UV spectra, and Figs. 2c and 2d show the preprocessed FT-IR and UV spectra, respectively. Compared with the raw spectra, the preprocessed spectra show clearly improved spectral resolution and markedly attenuated noise. Hence, the preprocessed spectral data were used for identification and evaluation.

Elemental analysis
Table 2 presents the element contents in the fruiting bodies of Boletaceae mushrooms. The mushrooms can be considered rich in P, K, Na, Ca, and Mg, as well as in trace elements such as Cu and Zn. Furthermore, the concentrations of K, P, Mg, and Ca are higher than those of Zn and Cu, in agreement with Kalač. [39] In particular, the concentration of K was in the range of 9897.08-12,919.64 mg kg⁻¹, which is in accordance with previous studies. [40] P, an essential element for humans, takes part in almost all physiological chemical reactions, such as forming bones, teeth, and nucleic acids, maintaining metabolic balance, and regulating acid-base balance. [41] The concentration of P was in the range of 4324.43-5907.53 mg kg⁻¹, which can meet human dietary needs. The concentration of Zn was in the range of 92.13-122.14 mg kg⁻¹, within the range reported for edible wild-grown mushrooms in a previous study (45.00-188.00 mg kg⁻¹). [42] Furthermore, there are significant differences in element concentrations among the different origins. As can be seen in Fig. 3, the Cr contents differed significantly among the eight locations, with minimum and maximum values of 24.81 mg kg⁻¹ in HongHe and 197.70 mg kg⁻¹ in QuJing, respectively. As far as we know, mushrooms have a very effective mechanism for accumulating heavy metals from the environment. Hence, the high concentration of Cr suggests that some Boletaceae mushrooms may have grown in polluted sites.

Number of principal components
PCA is a common statistical method that is widely used for dimensionality reduction and pattern recognition. [43] In this paper, PCA was used to extract the feature variables for identifying Boletaceae mushrooms from different origins. The eigenvalues and cumulative contribution rates of the first several principal components were used to evaluate how well they represent the sample information. According to the Kaiser criterion, [44] principal components with eigenvalues greater than one provide more effective information for discrimination. Hence, the first 18 principal components of FT-IR, the first 7 principal components of UV, and the first 5 principal components of the element dataset were chosen. Moreover, the higher the cumulative contribution rate, the more information is represented. The results of PCA of the FT-IR, UV, and element datasets are shown in Figs. 4a-4c, respectively. As can be seen in Fig. 4, the cumulative contribution of the first 18 principal components of FT-IR is 96.85%, that of the first 7 principal components of UV is 98.90%, and that of the first 5 principal components of the element dataset is 74.31%. Therefore, these principal components can represent the main information of the Boletaceae mushroom samples and can be used for mid-level data fusion in further analysis.
Table 2. Mineral concentrations determined in mushrooms from eight origins (mean ± SD, mg kg⁻¹). Values in bold for Ca, Cr, P, Na, and K indicate elements present at high content in the mushrooms.
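The selection rule can be written out in a few lines. The sketch assumes the eigenvalues come from PCA on autoscaled data, so that an eigenvalue of one corresponds to the variance of a single original variable. The eigenvalues in the example are hypothetical and are not the values behind Fig. 4.

```python
def kaiser_selection(eigenvalues):
    """Count components with eigenvalue > 1 and return their cumulative contribution (%)."""
    total = sum(eigenvalues)
    kept = [ev for ev in eigenvalues if ev > 1.0]
    cumulative = 100.0 * sum(kept) / total
    return len(kept), cumulative


# Hypothetical eigenvalues of a four-variable dataset.
n_kept, cumulative = kaiser_selection([2.5, 1.5, 0.6, 0.4])
print(n_kept, cumulative)
```

In this toy case two components pass the threshold and together account for 80% of the total variance. The same calculation applied to the real eigenvalues would give the component counts and cumulative contributions quoted for each dataset.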

Geographical origins traceability of Boletaceae mushrooms
In the present study, FT-IR, UV, and ICP-AES combined with multivariate statistical analysis were applied for geographical origin traceability, individually and jointly. SVM is an excellent analytical method for discriminant analysis of complex data. Four quantities, the kernel function parameter (g), the penalty parameter (c), the accuracy of sevenfold cross validation, and the accuracy of the test set, were used to evaluate the performance of the discrimination models. These quantities reflect the reliability of a classification model. For example, g is closely related to the classification accuracy, and c penalizes the error term. Hence, the lower c is, the more robust the discriminant model is. The accuracies of cross validation (sevenfold) and of the test set represent the stability and predictive ability of the classification models, respectively. The SVM models of Boletaceae mushrooms from different origins based on the various data matrices are shown in Fig. 5. Figures 5a-5e show the classification models based on FT-IR, UV, ICP-AES, data fusion of FT-IR and UV, and data fusion of FT-IR, UV, and ICP-AES, respectively. As can be seen, the classification models based on data fusion (Figs. 5d and 5e) have higher test set accuracy than those based on a single instrument (Figs. 5a-5c). In particular, the SVM model based on data fusion of the three instruments has only one classification error. Moreover, the result of the FT-IR classification model is similar to that of the UV model, whereas the ICP-AES model has the most classification errors. For a full comparison, all parameters of the classification models are given in Table 3. The largest value of the penalty parameter is that of FT-IR (2352.5342), followed by single UV (147.033), data fusion of FT-IR and UV (5.278), single ICP-AES (3.0314), and data fusion of three instruments (3.0314). This result suggests that the single FT-IR and UV models have large errors, which makes these classification models unreliable. Among the single-instrument models, the test set accuracy of ICP-AES (76.92%) is the lowest, and that of FT-IR (85.58%) is approximately equal to that of UV (86.54%). Furthermore, the cross-validation accuracy of FT-IR is 77.40%, which is higher than that of the other single instruments. This suggests that FT-IR is a rapid and reliable method for identification. As reported by Li et al., [13] FT-IR spectroscopy combined with multivariate data analysis has been applied to identify W. extensa sclerotia samples from different regions. In addition, the penalty parameter c of the FT-IR and UV data fusion is larger than that of ICP-AES, but its cross-validation and test set accuracies are both higher than those of ICP-AES. Overall, the classification models based on data fusion perform better than those based on a single instrument. This result is consistent with our previous study, [45] in which data fusion, as a simple and reliable technology, performed well for geographical origin traceability. Meanwhile, the cross-validation and test set accuracies of the three-instrument data fusion are 84.62% and 99.04%, respectively, and the penalty parameter (c) is 3.0314. Compared with data fusion of two instruments, the classification model based on data fusion of three instruments has a higher prediction accuracy. This indicates that, in this study, the more information was fused, the better the model performed. Moreover, we infer that differences between species may make geographical origin traceability of Boletaceae more difficult, which may explain why the prediction accuracy of the classification model was below 100%.
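The reported test set accuracies can be related back to sample counts with a short calculation. It assumes each percentage was computed over the 104 test samples. The helper name is hypothetical, and the counts it returns are inferred from the percentages, since only the single error of the three-instrument model is stated in the text.

```python
def implied_errors(accuracy_pct, n_samples):
    """Number of misclassified samples implied by a reported accuracy."""
    n_correct = round(accuracy_pct / 100.0 * n_samples)
    return n_samples - n_correct


# Test set accuracies quoted in the text, each assumed to refer to 104 samples.
reported = {
    "FT-IR": 85.58,
    "UV": 86.54,
    "ICP-AES": 76.92,
    "FT-IR + UV + ICP-AES": 99.04,
}
for name, accuracy in reported.items():
    print(name, implied_errors(accuracy, 104))
```

Under this assumption the three-instrument fusion model corresponds to 103 of 104 test samples classified correctly, which agrees with the single classification error noted for Fig. 5e.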

Conclusion
In this paper, FT-IR, UV, and ICP-AES were applied for geographical origin traceability, individually and jointly. PCA was used to reduce dimensionality and extract the feature variables, and SVM was applied to establish discriminant models for geographical origin traceability of Boletaceae mushrooms. The classification performance of each single technique and of the data fusion strategies was then compared. Data fusion combined with SVM gave classification models that performed better than those of any single technique. In particular, data fusion of the three instruments proved to be the best discriminant strategy, with an accuracy of 84.62% in cross validation, an accuracy of 99.04% on the test set, and a penalty parameter of 3.0314. This indicates that, for these data, the more information was fused, the better the model performed. In conclusion, this work demonstrated that a data fusion strategy drawing on multiple sources, combined with SVM, could improve the performance of models for geographical origin traceability. Therefore, data fusion combined with multivariate statistical analysis may provide a reliable method for food traceability and quality control.