Nondestructive prediction of physicochemical properties of kimchi sauce with artificial and convolutional neural networks

ABSTRACT This study presents a comparison of prediction performances by an artificial neural network (ANN), well-known deep convolutional neural network (D-CNN) models, and four proposed shallow convolutional neural network (S-CNN) models to forecast three key physicochemical properties (PCPs): salinity, °Brix, and moisture content of kimchi sauce (KS). The S-CNN models effectively minimized underfitting issues found in D-CNN models, predicting PCPs with a low error rate even with small image datasets. Furthermore, the ANN model using color values also allowed for competitive predictions. We used two nondestructive prediction strategies: (i) using color values with ANNs for immediate application in small-scale enterprises and (ii) using photographs as input for S-CNN models, allowing for faster and more accurate quality prediction. These results highlight the potential for image-based quality prediction in food science, possibly enhancing the efficiency and accuracy of real-time quality control. Future enhancements could incorporate additional data sources for improved predictive performance.


Introduction
Kimchi is a popular traditional Korean food manufactured by mixing sauces and various vegetables, which are subsequently fermented by natural lactic acid bacteria during long-term storage. Kimchi sauce (KS) features a variety of flavors, including sweet, sour, salty, and spicy, which are imparted by specific components, such as radish, red pepper powder, salt, sugar, salted seafood, and water. [1,2] In recent years, the demand for various types of KS has expanded considerably in South Korea due to customer interest in producing kimchi at home based on personal preferences. Consequently, the commercial manufacturing of KS has increased, and various products have been manufactured. [1,2] Manufacturing products of consistent quality and maintaining product integrity are both crucial for increasing consumer trust and brand image in the food-processing industry. [3] The higher the quality and consistency of food, the greater the consumer's inclination to repurchase. [4] Furthermore, because KS accounts for 20% of kimchi products, the quality of KS could affect the physicochemical properties (PCPs) of kimchi or its fermentation, consequently affecting the final kimchi product's quality. [5] Numerous factors influence the quality of KS, including salinity, °Brix, moisture content, and color. These properties are important for customer acceptance because they can be related to saltiness, sweetness, sourness, viscosity, and appearance in sensory evaluations. PCPs and sensory attributes have been found to be related in a variety of foods. [6] Therefore, quality control of PCPs is critical in the manufacturing process of KS.
Although conventional PCP measurement methods have high precision and reliability, they necessitate multiple preparatory stages and are destructive and time-consuming. [7] Therefore, nondestructive, non-targeted fingerprinting approaches (e.g., spectroscopic and imaging techniques) may be better suited for food quality control, producing quick and cost-effective results. As a result, vision-based rapid real-time predictions are becoming more appealing alternatives. [8,11,12] The successful application of ANNs has been demonstrated in predicting the moisture content of mangoes [9] and shelled pistachios, [10] determining the moisture content of paddy based on color values, [11] and quantifying the contents of volatile oil, total alkylamides, and impurities in Gongjiao. [12] Convolutional neural networks (CNNs), [13] a pivotal advancement in deep learning, are particularly adept at handling high-dimensional vector inputs. Renowned for their exceptional performance, models such as DenseNet, ResNet18, and ResNet50 have led to significant breakthroughs. [14,15] These models and their derivatives have found widespread application in food science for tasks such as classification and feature analysis, particularly when analyzing massive data. [16,17] However, deep CNN (D-CNN) models can struggle when used to predict properties from food images using small datasets. They may risk overfitting (learning noise from the data), which could impair their general applicability. [18] Simultaneously, these deep models are also prone to underfitting, especially with smaller datasets, limiting their ability to capture the full complexity of the data. [19] Such challenges become acute in real-world food factories, especially small to medium enterprises, due to restrictions in implementing costly and data-intensive equipment and in the volume of training data. Therefore, there is an urgent need for research into shallow CNN (S-CNN) models, which could mitigate these issues and enhance prediction processes.
Accordingly, we sought to compare the prediction performance of an ANN model, three well-known D-CNN models (DenseNet, ResNet18, and ResNet50), and four proposed S-CNN models. The aim was to develop a predictive model for forecasting the three key PCPs (salinity, °Brix, and moisture content) of KS. In doing so, we aspire to present a reasonable CNN model for small food image datasets. Our proposed approach holds practical significance, particularly in these contexts, as it not only reduces the burden of handling extensive, real-time data but also improves prediction accuracy even with minimal data. Consequently, this approach could decrease operational and processing costs by making the process more efficient and accurate.

Physicochemical analysis of KS samples
Twenty different types of KSs were acquired from a local market (Gwangju, South Korea). Each KS sample was homogenized using a blender (HR1372, 700 W, Philips, Amsterdam, Netherlands). The moisture content was analyzed using an infrared moisture sensor (MB45, Ohaus, Parsippany, NJ, USA). Thereafter, the mixture was filtered using Whatman No. 2 filter paper (Whatman, Maidstone, England), and the filtered solution was used for further analysis. °Brix was assessed using a refractometer (PAL-1, Atago Co., Tokyo, Japan), and the salinity was determined using a salometer (PAL-ES2, Atago Co.). The brightness (L*), redness (a*), and yellowness (b*) values were measured using a colorimeter (CR-400, Konica Minolta Inc., Osaka, Japan). Chromaticity was measured at the center of the KS on the plate before photography. At least three replicates were performed for each experiment.

Image acquisition and processing
The image acquisition system consisted of (a) four fluorescent lamp light sources (8 W), (b) one digital camera (NEX5-R, Sony Corp., Thailand) that captured photographs with a resolution of 3344 × 2244 pixels, and (c) a frame for placing the lamps and camera within a black curtain to block external light sources. The camera was placed 17 cm vertically over the KS located at the center of the viewing field. KS photos were acquired in RGB format and saved as JPEG files (3344 × 2244 pixels). A custom white balance was set using a white card (White Balance Card, 21.59 × 27.94 cm, X-rite) placed under the same lighting conditions, ensuring consistent color balance across the dataset.
A set of images from various samples was obtained, and random, non-overlapping 73 × 73-pixel sub-images were extracted within each sample area. This yielded a dataset of 100 input images per sample, totaling 2000 images. The extraction method ensured that no area within the original images was sampled more than once, providing diverse and representative input data for the CNN models.
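The extraction step above can be sketched in a few lines. The paper does not specify its exact sampling scheme, so this minimal example guarantees non-overlap by drawing patches from a regular grid of disjoint 73 × 73 tiles; the function name `extract_patches` and the grid strategy are illustrative assumptions.

```python
import numpy as np

def extract_patches(image, patch=73, n_patches=100, seed=0):
    """Randomly pick non-overlapping patch x patch sub-images.

    Sampling from a regular grid of disjoint tiles is one simple way to
    guarantee that no area is used more than once.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    rows, cols = h // patch, w // patch          # grid of disjoint tiles
    if rows * cols < n_patches:
        raise ValueError("image too small for the requested patch count")
    idx = rng.choice(rows * cols, size=n_patches, replace=False)
    out = []
    for k in idx:
        r, c = (k // cols) * patch, (k % cols) * patch
        out.append(image[r:r + patch, c:c + patch])
    return np.stack(out)

# a 2244 x 3344 RGB photo contains a 30 x 45 grid of disjoint 73-pixel tiles
img = np.zeros((2244, 3344, 3), dtype=np.uint8)
patches = extract_patches(img, n_patches=100)
print(patches.shape)  # (100, 73, 73, 3)
```

Because the tiles are disjoint by construction, any 100 of them form a valid non-overlapping sample of the image.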

ANN
The ANN was proposed to predict the three PCPs (salinity, °Brix, and moisture content) using measured color values as independent variables. An ANN is composed of an input layer, several hidden layers, and an output layer. Training is a crucial stage for an ANN: input data are fed into the network together with the desired output data, and the weights are adjusted so that the network attempts to produce the desired outputs. Training was terminated when the error ceased to decrease with increasing epochs (early stopping with patience set to 5), and the network saved the weights of the epoch with the lowest error value to build the model, which was subsequently used to predict the PCPs on the test set.
Hidden layers are crucial in an ANN and are responsible for extracting and learning complex data patterns at different abstraction levels. [12] Selecting the number of hidden layers is critical and typically involves a trial-and-error procedure to achieve the desired performance. [20] In addition, the activation function, which introduces non-linearity to the model, is another major factor affecting an ANN's performance, as it enables learning from complex, non-linear relationships. [21] The Rectified Linear Unit (ReLU), ELU, Tanh, and Leaky ReLU have been reported to exhibit strong performance in regression problems. [21] For preliminary testing, the number of hidden layers tested ranged from 1 to 5. The number of neurons started at 8 in the last hidden layer and doubled with each preceding layer; for example, a model with 4 hidden layers would follow a configuration of 64-32-16-8 neurons, ending with a single output neuron. Moreover, the four activation functions mentioned earlier (ReLU, ELU, Tanh, and Leaky ReLU) were tested for each hidden-layer configuration. Based on these preliminary experiments, four hidden layers and the ReLU function (Eq. (1)), which provided optimal performance (Figure S1), were used for further analysis.
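As a rough illustration of this layout, the sketch below runs a forward pass through the 64-32-16-8-1 configuration with the ReLU activation of Eq. (1). The weights here are random placeholders rather than trained values (in the study they are learned by backpropagation with early stopping), and the helper names are our own.

```python
import numpy as np

def relu(x):
    """Eq. (1): ReLU(x) = max(0, x)."""
    return np.maximum(0.0, x)

def ann_forward(x, weights, biases):
    """Forward pass: four ReLU hidden layers, then one linear output neuron."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    W, b = weights[-1], biases[-1]      # linear output for regression
    return a @ W + b

rng = np.random.default_rng(0)
sizes = [3, 64, 32, 16, 8, 1]           # L*, a*, b* in; one PCP out
weights = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

y = ann_forward(np.array([[50.0, 20.0, 30.0]]), weights, biases)
print(y.shape)  # (1, 1): one predicted PCP value per input row
```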
The detailed structure of the six-layer deep feedforward network used in our study is depicted in Figure 1(a), and the model parameters are listed in Table S1.

CNNs
CNNs have distinct advantages over traditional ANNs in certain applications, particularly when dealing with image data. CNNs are specifically designed to automatically and adaptively learn spatial hierarchies of features, making them uniquely effective for tasks such as image classification, object detection, and even semantic segmentation. [13,14] In the initial stages of this study, well-established D-CNN architectures (DenseNet, ResNet18, and ResNet50) were used to perform a comprehensive analysis of the KS image dataset (Figure S3). [14,15] Because the present problem was regression-based, necessary adjustments were made to the original models; this primarily entailed changing each model's final-layer activation function to linear, reflecting the continuous output typical of regression tasks. Subsequently, a custom eight-layer S-CNN architecture was designed and divided into two parts: a feature extractor and a regression head (Figure 1(b)). The feature extractor included convolutional layers, max-pooling layers, and a residual connection. Pooling, a critical CNN feature, helps reduce the dimensions of the activation map, thus boosting prediction stability, quickening computations, and lowering overfitting risks. Tests were carried out by adjusting the number and position of the max-pooling layers to optimize performance. As detailed in Table 1, four specific S-CNN models (CMCM, MCCC, CCCM, and CCCC), each with a unique sequence of convolution and max-pooling layers, were trained and evaluated. After preliminary tests with various activation functions (Figure S2), the ReLU function (Eq. (1)) was selected for its superior performance and used in both network segments. Final predictions were produced through a fully connected layer matching the number of predicted outputs. Training stopped when the error no longer decreased with additional epochs (early stopping with patience set to 5). The weights from the epoch with the lowest error were then used to build the model, which predicted the PCPs on the test set. Consequently, three D-CNN and four S-CNN models were developed using images to predict the three PCPs, and the most accurate model for estimating the PCPs of KS was identified upon comparison.
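The four layer orderings in Table 1 can be compared by tracing the spatial size of the activation map through each sequence. The sketch below assumes the Table 1 hyperparameters (5 × 5 kernels with "same" padding and stride 2, 2 × 2 max pooling) applied to the 73-pixel sub-images; it is shape arithmetic for intuition, not the training code.

```python
def conv_out(h, stride=2):
    """Output size of a 'same'-padded strided convolution: ceil(h / stride)."""
    return -(-h // stride)

def pool_out(h, pool=2):
    """Output size of non-overlapping max pooling: floor(h / pool)."""
    return h // pool

def feature_map_sizes(order, h=73):
    """Trace the spatial size through a layer order such as 'CMCM'."""
    sizes = [h]
    for layer in order:
        h = conv_out(h) if layer == "C" else pool_out(h)
        sizes.append(h)
    return sizes

for order in ("CMCM", "MCCC", "CCCM", "CCCC"):
    print(order, feature_map_sizes(order))
# e.g. CMCM shrinks 73 -> 37 -> 18 -> 9 -> 4
```

Alternating convolution and pooling (CMCM) halves the map at every step while still interleaving feature extraction, which is consistent with the paper's finding that this ordering trains most stably.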

Different input data used for the ANN and CNNs
ANNs and CNNs have distinct advantages and disadvantages, as well as different applications. An ANN accepts a 1D vector as input and generates output via hidden-layer vectors that are fully connected to that input. The ANN takes longer to calculate weights because it fully connects input and output neurons. CNNs, by contrast, exploit the structure of images across different layers and consume less time because they lead to sparse connections between input and output neurons. [22] In an ANN, for example, a massive number of weights must be computed for even a single neuron when an RGB image is used as input; in such instances, CNNs may be more appropriate. Conversely, CNNs cannot meaningfully extract spatial features from color-value data, and ANNs may be more suitable when using only color values (L*, a*, and b*), as only three input weights per neuron are computed. Consequently, different types of input data were used for the ANN and the CNNs, for different applications, as follows: (1) the ANN used directly measured L*, a*, and b* values as input, characteristics that effectively condense the KS image; this can be used in manufacturing processes where photography is challenging. (2) The CNNs used images as input because of their excellent feature-extraction capabilities; this is useful in manufacturing environments in which photography is available.
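The connectivity argument above can be made concrete with a quick weight count for a 73 × 73 RGB input (illustrative arithmetic, not taken from the paper):

```python
# A fully connected layer flattens the image, so EVERY neuron needs one
# weight per input value; a convolutional layer shares one small kernel
# across all spatial positions.
h, w, c = 73, 73, 3
dense_weights_per_neuron = h * w * c      # 73 * 73 * 3 = 15,987 weights
conv_kernel_weights = 5 * 5 * c           # one 5x5 RGB kernel = 75 weights
print(dense_weights_per_neuron, conv_kernel_weights)  # 15987 75

# With only L*, a*, b* as input, a fully connected neuron needs just 3 weights,
# which is why the ANN remains practical for color-value input.
lab_weights_per_neuron = 3
```

The roughly 200-fold difference per neuron is the "sparse connections" advantage the text refers to.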

Model training and evaluation
The 2000-image dataset was split into training (50%), validation (20%), and test (30%) sets. For each variable, the independent and dependent variables were standardized using a min-max scaler. [23] Each network was trained and tested three times on a dataset, and the average value was used to fairly evaluate all models. The mean squared error (MSE) was used as a loss function to check for errors or deviations during the learning process. MSE is often used as a loss function in deep-learning model training due to its effective penalization of larger errors, which is crucial for scenarios where large deviations from the actual values are undesirable. [24] After inputting the training and test data into the trained model, the data were re-scaled from the min-max scale. Thereafter, the mean absolute percentage error (MAPE), the root mean squared error (RMSE), and R-squared (R²) were used to calculate the difference between the actual and predicted values of all models. MAPE was used due to its clear representation of the average percentage deviation between the model's predictions and actual values. This metric is widely favored in various fields, including food science, for prediction accuracy due to its scale independence and interpretability. [25,26] RMSE was employed to estimate the standard deviation of prediction errors or residuals, enabling an understanding of data concentration around the best-fit line. [24] R² is a statistical metric indicating how closely a model's predictions align with the actual data; an R² value of 1 signifies a perfect fit. [25] MSE, MAPE, RMSE, and R² are given by Eqs. (2), (3), (4), and (5), respectively:

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²   (2)
MAPE = (100/n) Σᵢ |(yᵢ − ŷᵢ) / yᵢ|   (3)
RMSE = √[(1/n) Σᵢ (yᵢ − ŷᵢ)²]   (4)
R² = 1 − [Σᵢ (yᵢ − ŷᵢ)²] / [Σᵢ (yᵢ − ȳ)²]   (5)

where yᵢ is the experimental result, ŷᵢ is the predicted result, ȳ is the mean of the experimental results, n is the number of data points, and the sums run over i = 1, …, n.
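Assuming the standard definitions of these metrics, Eqs. (2)–(5) can be implemented and checked on a toy salinity-like example as follows (the sample values are invented for illustration):

```python
import numpy as np

def mse(y, y_hat):
    """Eq. (2): mean squared error."""
    return np.mean((y - y_hat) ** 2)

def mape(y, y_hat):
    """Eq. (3): mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    """Eq. (4): root mean squared error."""
    return np.sqrt(mse(y, y_hat))

def r2(y, y_hat):
    """Eq. (5): coefficient of determination; 1 means a perfect fit."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

# invented measured vs. predicted salinity values (% w/w), for illustration
y = np.array([2.65, 3.10, 4.35])
y_hat = np.array([2.70, 3.00, 4.20])
print(round(mape(y, y_hat), 2), round(rmse(y, y_hat), 3), round(r2(y, y_hat), 4))
```

Note that MAPE is undefined when any yᵢ is zero; this is not an issue here, since salinity, °Brix, and moisture content are strictly positive.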
Permutation importance analysis was applied to the ANN model after training. This involved permuting each feature (L*, a*, and b*) within the test set and subsequently assessing the impact of this permutation on prediction quality. [27] If the permutation results in a significant increase in model error, the permuted feature is important and receives a high score. Conversely, if the error change following permutation is negligible, the feature is considered less critical and receives a lower score. All deep-learning models and data processing were run on a workstation with an Intel(R) Core(TM) i3-6100 (3.70 GHz) CPU and 8 GB of memory. Python 3.6 (Python Software Foundation, Wilmington, DE, USA) was used as the software tool, and all deep-learning models were built using the TensorFlow and Keras frameworks. The function permutation_importance from sklearn.inspection was used to carry out this analysis and compute permutation importance scores for each predicted value, namely salinity, °Brix, and moisture content.
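The idea behind the analysis can be sketched in a framework-agnostic way: shuffle one feature column at a time and measure how much the error grows. This toy version is our own simplification of what sklearn.inspection.permutation_importance computes, with an invented linear "model" standing in for the trained ANN; it correctly recovers the dominant feature.

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Score each feature by how much shuffling it degrades predictions.

    `predict` is any fitted model's prediction function; a large increase
    in error after shuffling a column means that feature is important.
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # destroy feature j only
            drops.append(metric(y, predict(Xp)) - baseline)
        scores[j] = np.mean(drops)
    return scores

# toy check: the target depends only on column 1 (an a*-like stand-in)
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = 2.0 * X[:, 1]
mse = lambda y_true, y_pred: np.mean((y_true - y_pred) ** 2)
scores = permutation_importance(lambda X_: 2.0 * X_[:, 1], X, y, mse)
print(np.argmax(scores))  # 1: only the informative column matters
```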

Statistical analysis
Statistical analysis and visualization of experimental data were performed using Microsoft Excel (Microsoft, Redmond, WA, USA) and SPSS ver. 19.0 (SPSS Inc., Chicago, IL, USA), including analysis of variance and Duncan's multiple-range test to assess significant differences at the 95% confidence level (p < .05).

KS data exploration and visualization
Three PCP results were acquired: salinity between 2.65 and 4.35% (w/w), °Brix from 12.50 to 20.33, and moisture content between 69.54 and 82.51% (w/w) (Figure 2(a)). The PCPs showed significant (p < .05) diversity across the various KSs, which can be interpreted as the variability managed in the actual manufacturing of KS. Figure 2(b) provides a comprehensive understanding of the relationship between color and the other PCPs using the coefficient of correlation. The L* and b* values had strong positive (moisture content) and negative (salinity and °Brix) correlations. According to previous research, highly significant correlations were observed between the L* and b* values, solid and moisture content, and viscosity and solid content of kimchi seasoning. [1] Nevertheless, the a* value may also carry significant information for predicting the PCPs. Red pepper powder, which accounts for about 15% of KS, can have a significant impact on the a* value. [28] This influence can subsequently lead to variations in the PCP values, depending on the amount of red pepper powder used. To understand and model the relationship between the a* value and the PCPs, which is anticipated to be non-linear, the application of deep-learning models was necessary.
Meanwhile, deep learning typically relies on data, with the quantity and quality of data directly influencing the model's performance. [29] As only the PCP values of the sauces listed in Figure 2(a) were obtained in this experiment, the deep-learning models' predictions are valid only within the maximum and minimum values of each PCP. When a model encountered values outside this PCP range, it produced unexpected outcomes, which is a common feature of neural networks. This behavior was expected, as neural networks have a known limitation in their ability to extrapolate beyond the range of the data used for training. [30] In the food industry, for example, the relationship between KS and its PCPs can vary depending on the product or processing line. To apply this method to detecting low-quality products, the deep-learning model must be trained by deliberately creating multiple low-quality products that fall outside the proper PCP range. In addition, inputting large amounts of similar, representative data during the modeling stage can generate more accurate prediction results. Improving the richness and representativeness of the training data can enhance both the accuracy and robustness of the model.

Deep-learning model training
The performance of the ANN and CNN models on training and validation data is shown in Table 2. Following model training, all S-CNN models outperformed the ANN in prediction across the salinity, °Brix, and moisture content datasets. Among the S-CNN models, the CMCM demonstrated the best predictive performance on these data. As shown in Figure 4, the CNN model was well-trained without underfitting the training data. The ANN model exhibited a relatively higher error rate than the S-CNN models; however, it can still be deemed acceptable, considering the smaller input vector on which the model was trained. Furthermore, the ANN had a lower error rate than the D-CNN models, further validating its efficiency.

Performance of prediction model
Each trained prediction model was tested to evaluate its performance. When all models were exposed to previously unseen test data, the error values increased compared to the prediction errors for the training data (Table 3). Multiple studies related to food quality prediction have reported similar findings. Golpour et al. [11] used an ANN to forecast the moisture content of paddy based on color values; their optimized ANN had an MSE of 0.00105 during training and a mean absolute error of 0.031 during testing, an increase of 0.02995 over the training phase. Furthermore, Balbay et al. [10] found that using an ANN with the Levenberg–Marquardt learning rule to predict the water content of shelled pistachios in a fixed-bed dryer system resulted in higher RMSE values during testing (2.23, 2.18, 1.66, and 1.76) than during training (0.55, 0.37, 0.53, and 0.92, respectively). For the ANN model developed in this study, it was the a* value that revealed the greatest permutation importance (Figure 3). This outcome implies that the ANN model adeptly encapsulated the non-linear relationships between a* and the other PCPs. Interestingly, redness (the a* value) serves as a crucial quality assessment factor for consumers, with red pepper powder significantly contributing to this factor. The S-CNN models exhibited better performance with lower error rates than the D-CNN models (Table 3). This could be attributed to underfitting observed in the D-CNN models, where the predicted values remained constant and did not vary with the input (Figure S3). In particular, when trained on relatively small datasets, these deeper models may fail to capture the full complexity of the data, leading to subpar performance on unseen data. [19] Among the S-CNN models, the CMCM exhibited the lowest average MAPE (6.85%) and RMSE (1.43), calculated over the three PCPs of KS. Max pooling takes the largest value from each activation-map submatrix and produces a new matrix from it. This keeps the number of learnable features limited while maintaining the key aspects of every image, thus helping to resolve underfitting. [31] As illustrated in Figure 5, the CMCM demonstrates a wide distribution of predictions for the test data without underfitting, indicating good flexibility in the model. The recent tendency in CNN development has been to reduce max pooling and extend the convolution layers capable of feature extraction while adding a layer that prevents underfitting, such as a dropout or batch normalization layer. [32] However, when we used dropout and batch normalization and increased the number of convolution layers to five or more, a model with severe underfitting was obtained (data not shown). Consequently, dropout and batch normalization, previously considered solutions to the underfitting of D-CNN models in small-dataset image classification, [33] do not appear to solve the underfitting in PCP regression prediction using input images. Hence, for the purpose of food quality control, the CMCM model, which avoids underfitting, could be considered for predicting quality using input images.
Meanwhile, the prediction results of the S-CNN models were better than those of the ANN (Table 3). This improvement can be attributed to the S-CNN models using image vectors as input, whereas the ANN relied on measured color values. The S-CNN models, processing images whose features are interdependent with the PCPs, could extract richer information and thus better predict the outputs. The ANN, by contrast, with its input data constrained to color values, had somewhat limited PCP feature extraction. Despite these contrasts, the ANN's ability to effectively encapsulate non-linear relationships between a* and the other PCPs shows its potential for real-world industrial applications.
Despite a substantial amount of research on food product quality using computer vision algorithms, there has been little commercial use. [33] The most challenging barriers to the widespread adoption of these technologies include high costs, a lack of flexibility in the industrial environment, and a lack of technical skills. [3] In industrial applications, these techniques must be inexpensive, suitable in size, and efficient. The vision algorithms presented in this study, which use food images from a camera and S-CNN models to estimate PCPs, offer a relatively inexpensive method for food quality analysis that could be widely applied in the food industry. By increasing food manufacturing efficiency and consistency while simultaneously reducing operational costs, these technologies offer an alternative to expensive machinery replacements. They can be used to adapt current equipment to a new method of working.
[34] According to Kaymak-Ertekin and Gedik, [35] a model with an MAPE within 10% is deemed acceptable, considering the MAPE corresponds to the relative percentage deviations they described. Our CMCM model came close to this standard, with a slightly higher MAPE of 10.18% for salinity, demonstrating that our predictions remain solid despite the inherent complexity of predicting salinity. Moreover, the °Brix and moisture content models excelled with MAPEs of 8.08% and 2.30%, respectively, comfortably falling within the 10% threshold. This suggests that these models could be valuable for industrial quality control, potentially refining management system accuracy. Although our models have shown notable results, further improvements can be made to optimize their performance. One potential approach, as suggested by Romano et al., [36] is to combine additional data sources with images, for example, using a laser for moisture-content prediction in bell peppers. For high industrial utilization, it is important that the data can be easily acquired during the processing stage, such as with lasers or sensors. By integrating diverse and readily obtainable data sources, it may be possible to overcome the limitations in prediction performance for the PCPs of KS.

Conclusion
The S-CNN models minimized the underfitting phenomenon observed with the D-CNN models when dealing with small food image datasets, thus enabling prediction of the PCPs of KS with a low error rate. Additionally, the ANN model, which was established using color values as input, also made it possible to predict the PCPs with a low error rate that was competitive with the S-CNN models. The two strategies for nondestructive prediction presented in this study are as follows: (i) the PCPs can be predicted in difficult-to-photograph conditions by measuring the color values and using the ANN model; this can be applied immediately without new equipment investment, benefiting small-scale KS production enterprises. (ii) If photographs can be captured and used as input for a CNN, faster and more accurate quality prediction than with the ANN is possible. Thus, various deep-learning models for inexpensive food quality analysis using food images can be developed via the vision algorithms provided in this study. Furthermore, because the software is free and easy to use, deep learning can be effectively applied for real-time quality control of PCPs in other food products. Our findings highlight that image-based quality prediction for the PCPs of KS is particularly promising. To further enhance this prediction method, additional data sources, such as lasers or sensors, could be incorporated alongside images.

Figure 1 .
Figure 1. A schematic representation of the (a) artificial neural network (ANN) and (b) convolutional neural network (CNN) models used in this study. Blue layers are convolutional, green layers are max-pooling, and yellow layers are dense. The output layer has one neuron with the identity activation function.

Figure 3 .
Figure 3. Impact of feature permutation on prediction quality of the proposed artificial neural network (ANN) model.

Figure 4 .
Figure 4. Comparison of actual and predicted physicochemical properties of kimchi sauce using the proposed convolutional neural network (CNN) model (CMCM) on the training dataset. CMCM: CNN model with a feature extractor comprising a sequence of convolutional (C) and max pooling (M) layers. MAPE (%): Mean absolute percentage error. RMSE: Root mean squared error. R²: R-squared. Diagonal line: perfect predictions.

Table 1 .
Structure and parameters of various shallow convolutional neural network (S-CNN) models. Inputs: input shape = (730, 730, 3), batch size = 14, optimizer = Adam (learning rate = 0.0001). Conv2D: filter = (3, 3), kernel size = (5, 5), padding = "same", strides = (2, 2), activation = "relu". MaxPooling2D: pool size = (2, 2). CMCM, MCCC, CCCM, and CCCC: CNN models with feature extractors comprising sequences of convolutional (C) and max pooling (M) layers in varying arrangements.

Table 2 .
Model performance of the ANN and CNNs on training data. MAPE (%): Mean absolute percentage error. RMSE: Root mean squared error. CNN: Convolutional neural network. ANN: Artificial neural network. CMCM, MCCC, CCCM, and CCCC: CNN models with feature extractors comprising sequences of convolutional (C) and max pooling (M) layers in varying arrangements. DenseNet: Densely Connected Convolutional Network. ResNet18 and ResNet50: 18-layer and 50-layer Residual Neural Networks, respectively. PCPs: Physicochemical properties.

Table 3 .
Model performance of the ANN and CNNs on test data. Bold: the best performance. MAPE (%): Mean absolute percentage error. RMSE: Root mean squared error. CNN: Convolutional neural network. ANN: Artificial neural network. CMCM, MCCC, CCCM, and CCCC: CNN models with feature extractors comprising sequences of convolutional (C) and max pooling (M) layers in varying arrangements. DenseNet: Densely Connected Convolutional Network. ResNet18 and ResNet50: 18-layer and 50-layer Residual Neural Networks, respectively. PCPs: Physicochemical properties.