Assessment of integrated freshness index of different varieties of eggs using the visible and near-infrared spectroscopy

ABSTRACT This study aimed to determine the integrated freshness index (IFI) of eggs using Vis-NIR spectroscopy and optimized support vector regression, offering a first look at egg freshness from the standpoint of the biochemical basis of quality change. Vis-NIR transmission spectra of brown-shell and pink-shell egg samples were analyzed between 500 nm and 900 nm. Standard normal variate (SNV) transformation was used to normalize the spectral data, and the Shuffled Frog Leaping Algorithm (SFLA) and Competitive Adaptive Reweighted Sampling (CARS) were used to select the optimal wavelengths. The quantitative analysis model of IFI was developed using support vector regression (SVR) optimized by Grid Search (GS), Genetic Algorithm (GA), and Particle Swarm Optimization (PSO). A comparative analysis showed that the GA-SVR model based on 63 wavelengths screened by the SFLA best predicted IFI, with a calibration set coefficient of determination ($R_c^2$) of 0.900, root mean square error of calibration (RMSEC) of 0.005, a prediction set coefficient of determination ($R_p^2$) of 0.816, root mean square error of prediction (RMSEP) of 0.012, and relative predictive deviation (RPD) of 2.077. The results demonstrate that the model can nondestructively detect the IFI of two distinct egg varieties simultaneously, suggesting broader applicability and enhanced model reliability.


Introduction
Eggs are one of the highest-quality sources of protein in the diet, but they undergo internal changes during storage that depend on many factors, such as feed, temperature, humidity, transport and storage conditions, and time. [1] As storage duration lengthens, egg quality decreases, which has a significant impact on nutritional and commercial value. As egg production expands and becomes more industrialized, the enhancement of egg quality has become a top priority, and the testing of egg quality cannot be neglected.
Eggs are rich in various nutrients and are one of the most important food sources for providing nutrition to humans. The nutrient content of eggs from different breeds, breeding methods, and individual poultry varies considerably. [2] The chemical changes during natural egg quality decline are also a complex process. The relative mass fraction of S-ovalbumin in egg whites during storage increased from 5% in newly laid eggs to 81% after 6 months of refrigeration. [3] The results of our previous study revealed a high correlation between S-ovalbumin and storage time and common indicators of egg freshness, [4] and the changes in S-ovalbumin content of brown-shell eggs and pink-shell eggs during storage were significantly and strongly correlated. In other words, under the same storage conditions, the changes in S-ovalbumin content were only weakly influenced by egg variety. [5] In addition, ovalbumin is irreversibly converted to S-ovalbumin during storage and the rate of conversion is only influenced by pH and temperature, so S-ovalbumin was considered as a reference indicator to evaluate the freshness of commercial eggs. [6] Existing egg quality evaluation relies mostly on egg weight, albumin height, protein content, eggshell strength, Haugh unit, egg shape index, eggshell color, air chamber height, and egg-specific gravity. [7,8] Standard freshness indicators such as Haugh unit and yolk index may only reflect a portion of the freshness quality of eggs and rarely integrate the biochemical basis of quality changes, so the freshness of eggs is not comprehensively and profoundly expressed. If various egg species can be tested and graded based on their primary protein content together with multiple traditional freshness quality indicators, the assessment of egg quality will be more complete. Moreover, this approach to grading not only increases the productivity of companies but also enriches the market to suit the needs of various consumer groups.
It is difficult to evaluate the freshness of eggs with the naked eye. In particular, when the interior protein composition changes, destructive testing is required to determine freshness accurately. [9] Currently, biochemical methods are used domestically and internationally to detect egg protein content, albumin height, and yolk diameter. These methods have long detection cycles and complex procedures, and they are destructive tests that fall far short of modern detection standards. Domestic and international nondestructive testing methods for egg freshness include primarily electronic nose technology, [10] dielectric properties, [11] machine vision technology, [12] and spectroscopy technology, [13][14][15] while visible/near-infrared technology is favored by many egg industry researchers due to its rapidity, nondestructiveness, and topicality. Dong used Vis-NIR transmission spectroscopy to distinguish between unfertilized and fertilized duck eggs before hatching. [16] Matthias utilized Vis-NIR point spectroscopy for in-egg sex identification and found that the technique was noninvasive, quick, and accurate for sex identification of brown eggs from chickens with sex-specific plumage color. [17] Kemps used Vis-NIR transmission spectroscopy and low-resolution proton nuclear magnetic resonance spectroscopy to measure the freshness of individual egg proteins. [18] Li et al. were able to detect the total egg protein content by combining Vis-NIR spectroscopy with Kjeldahl nitrogen analysis. [19]
Each predictive model based on spectroscopy has a specific fitness range, meaning that, in practice, each model can only produce optimal results for a specific range of samples, instrument conditions, and testing conditions. Numerous studies are currently based on the freshness index of a specific egg species. Few studies have been conducted on the nondestructive evaluation of the quality of multiple species' eggs at the same time. When utilizing an established model to predict new samples, the fitness of the model must be assessed. Previous research revealed that the mean spectra of brown-shell eggs and pink-shell eggs were not significantly different in the near-infrared range, but were significantly different in the visible range. The global update algorithm enabled the optimized model to simultaneously detect the HU, yolk index, and S-ovalbumin content of the two egg varieties. In addition, few researchers have investigated the specific protein content of different egg varieties using Vis-NIR spectroscopy in terms of the nature of biochemical changes in eggs, and there are no studies on indicators for the comprehensive assessment of egg freshness in conjunction with the specific protein content of eggs.
Accordingly, the concept of the Integrated Freshness Index (IFI) of eggs is introduced for the first time in this work, and the purpose of this study is to evaluate the integrated freshness of two egg varieties. The research focuses on the following: (1) Determining the weights of albumin height, yolk diameter, yolk height, egg weight, and S-ovalbumin content, and developing a formula for calculating IFI. (2) Clarifying the optimal pretreatment strategy for IFI based on Vis-NIR spectra and identifying the wavelengths of interest. (3) Establishing a support vector regression (SVR) model based on parameter optimization and, after a comparative analysis, determining the model with the best prediction performance to achieve quantitative IFI prediction.

Sample preparation
The trial samples consisted of 270 eggs of two breeds, collected from a chicken farm on the day they were laid. All of the eggs were flawless and devoid of cracks. There were 135 eggs with brown shells and 135 eggs with pink shells. All eggs were kept in an incubator at a constant temperature of 22°C and a relative humidity of 65%. At 5-day intervals, i.e. on days 1, 6, 11, 16, 21, 26, 31, 36, and 41, 15 brown-shell eggs and 15 pink-shell eggs were randomly selected and numbered sequentially. Due to test errors or loose yolk, 16 brown-shell eggs and 10 pink-shell eggs failed the test for yolk height or yolk diameter, leaving 244 good samples.

Experimental methods
Vis-NIR spectroscopy system and spectral acquisition: The visible/near-infrared spectroscopy system was custom-built around a USB2000+ visible/near-infrared fiber optic spectrometer (Ocean Optics, USA). The system concept is depicted in Figure 1. The visible/near-infrared spectrum of each egg was collected experimentally with the following acquisition parameters. The integration time was set to 60 ms to avoid distortion of the acquired spectral data. The number of scans to average was set to 3 so that accurate spectra could be obtained in as little time as possible. The smoothing width was set to 5 to suppress spectral noise without losing the detailed features of the spectral data. The spectral band range was the original range of the instrument, i.e. 349 nm to 1000 nm. For each experiment, randomly selected eggs were positioned one by one with their long axis parallel to the ground to prevent any light leakage during the transmission spectrum acquisition.
Egg freshness indicators: After collecting the spectral data, each egg was weighed with an electronic balance ($W_i$), and then broken and placed on a glass plate. The height of the thick albumen, i.e. the part of the egg white with significant thickness, the yolk diameter, and the yolk height were each measured at three different locations with a digital vernier caliper, and the average values were taken as the final albumin height ($HA_i$), yolk diameter ($D_i$), and yolk height ($HY_i$), respectively.
S-ovalbumin content: The following steps were taken to measure the S-ovalbumin content in broken eggs. Each egg's egg white was separated and placed in one of fifteen 100-mL beakers (numbered 1 to 12). At 4°C, each was agitated using a magnetic stirrer. Then, 25 mL of phosphate buffer at pH 7.5 was added to each of the 15 100-mL beakers containing (5 ± 0.05) g of egg white. The solution was agitated for five minutes using a magnetic stirrer. For each egg, 5 mL of the combined solution was pipetted into each of two 20 mL sealed glass test tubes labeled $i_A$ and $i_B$, where $i$ represents the egg number. The test tube $i_A$ was immersed in a 75°C water bath for 30 minutes before being allowed to cool. Then, 5 mL of precipitator was added to the solution in each test tube ($i_A$ and $i_B$), and the contents were emptied into centrifuge tubes. A further 5 mL of precipitator was added to each test tube to rinse its inner wall, and the rinse was transferred to the corresponding centrifuge tube. After 10 minutes, the mixture was centrifuged for 5 minutes at 12000 rpm. Next, 2 mL of the supernatant from the top of each centrifuge tube was pipetted into a 10 mL centrifuge tube, and 4 mL of biuret solution was added. The mixture was allowed to rest at room temperature for approximately 30 minutes. The absorbance of each sample at 540 nm ($OD_{heated}$, $OD_{unheated}$) was determined using a UV spectrophotometer. The blank control sample consisted of 2 mL of deionized water and 4 mL of biuret solution. S-ovalbumin content was determined using formula (1). [4]
$$\text{S-ovalbumin content}\ (\%) = \frac{OD_{heated}}{OD_{unheated}} \times 100\% \qquad (1)$$
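As a small illustration of formula (1), the ratio of the two absorbance readings can be computed as below. This is only a sketch: the function name and the example readings are hypothetical and are not taken from the study.

```python
def s_ovalbumin_percent(od_heated: float, od_unheated: float) -> float:
    """S-ovalbumin content (%) from absorbance at 540 nm, following formula (1).

    od_heated   -- absorbance of the heated aliquot (tube i_A)
    od_unheated -- absorbance of the unheated aliquot (tube i_B)
    """
    if od_unheated <= 0:
        raise ValueError("unheated absorbance must be positive")
    return od_heated / od_unheated * 100.0


# Hypothetical readings, for illustration only.
example = s_ovalbumin_percent(od_heated=0.25, od_unheated=0.5)
```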

Data processing
Calculation of index weights using principal component analysis: Principal component analysis is a mathematical transformation technique. It changes a given collection of correlated variables into a new set of uncorrelated variables by a linear transformation, and then arranges these new variables in decreasing order of variance. [20] In the transformation, the total variance of the variables is held constant. The first new variable has the largest variance and is referred to as the first principal component; the second has the second-largest variance, is uncorrelated with the first, and is referred to as the second principal component. By extension, i variables yield i principal components. The number of evaluation indicators is then reduced based on the cumulative contribution of the principal components, followed by the determination of loading coefficients and comprehensive indicators based on the single indicators and principal components. Because the indicators are measured on different scales, the data must be normalized before principal component analysis is performed. The data were normalized using the range (polar difference) method, and equations (2) and (3) were used to normalize positive and negative indicators, respectively.
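Equations (2) and (3) are not reproduced in this text. As an assumption, a standard form of range normalization that matches the symbol definitions given next would be the following, where the subscripts max and min (not in the original) denote the largest and smallest observed value of the indicator:

```latex
% Assumed standard range normalization; not the authors' verbatim equations.
A_i = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \quad \text{(positive indicator)}
\qquad
A_i = \frac{Y_{\max} - Y_i}{Y_{\max} - Y_{\min}} \quad \text{(negative indicator)}
```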
Here, $A_i$ represents the normalized metric value, $X_i$ represents the positive metric value before normalization, and $Y_i$ is the negative metric value prior to normalization.
Partitioning of samples and spectral preprocessing: Using the sample set partitioning based on joint X-Y distance (SPXY) algorithm, all samples are divided into a calibration set and a validation set in a 2:1 ratio. The SPXY method can effectively cover the multidimensional vector space, and the spectral data and indicator values are simultaneously considered when calculating the sample spacing. [21] This ensures that the samples in the two types of variable spaces have the same weights, thereby enhancing the predictive ability of the model.
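The SPXY idea described above can be sketched in a few lines. This is a minimal pure-Python illustration, assuming the usual formulation in which the spectral (X) and reference-value (y) distances are each divided by their maximum and then summed, followed by a Kennard-Stone style max-min selection. The function names are illustrative and this is not the authors' implementation.

```python
import math


def _distance_matrix(points):
    """Pairwise Euclidean distances between equal-length vectors."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d


def spxy_split(X, y, n_cal):
    """Return (calibration_indices, validation_indices); needs 2 <= n_cal <= len(X)."""
    n = len(X)
    dx = _distance_matrix(X)
    dy = _distance_matrix([[v] for v in y])
    max_dx = max(max(row) for row in dx) or 1.0
    max_dy = max(max(row) for row in dy) or 1.0
    # Joint distance: X and y contribute with equal weight after scaling.
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)]
         for i in range(n)]
    # Seed the calibration set with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: d[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]
    # Repeatedly add the sample whose nearest selected neighbour is farthest.
    while len(selected) < n_cal:
        best = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining
```

For a 2:1 split, `n_cal` would be set to roughly two-thirds of the sample count.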
Because spectral data can be affected by instrument noise and random errors, which directly impact the accuracy and precision of the modeling, appropriate spectral preprocessing procedures are required to enhance the predictive performance and robustness of the model. In this experiment, the spectra are preprocessed by first derivative (FD) processing and standard normal variate (SNV) transformation. FD reflects the rate of change of the spectrum, while the core principle of SNV is to standardize the raw spectral data. [22]
Screening of feature wavelengths: The Competitive Adaptive Reweighted Sampling (CARS) algorithm begins with the elimination of uninformative variables, and the PLS regression coefficients are used to select the optimal subset of feature variables representing spectral differences, chosen at the position of the lowest root-mean-square error of cross-validation (RMSECV). [23] The Shuffled Frog Leaping Algorithm (SFLA) is an algorithm similar to reversible jump Markov Chain Monte Carlo (RJMCMC). [24] It proceeds iteratively, calculating the probability of each variable being selected in each iteration, with the probability increasing with the importance of the variable, and favoring the variables with the highest probability as the feature variables. The SFLA method combines a particle swarm algorithm with strong global search performance and a memetic algorithm with strong local search capability to produce a robust optimization capability.
Model development and assessment: Support Vector Regression (SVR) is an effective machine learning-based multivariate methodology. The Support Vector Machine (SVM) algorithm is, in summary, a method that uses kernel functions to map samples that are not linearly separable in a low-dimensional feature space into a higher-dimensional feature space, where they become linearly separable. The SVR method is a regression algorithm based on the idea of SVM. Given a data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, $y_i \in \mathbb{R}$, it learns the objective regression function $f(x) = w^T \varphi(x) + b$, where $\varphi(x)$ is a nonlinear mapping function. [25] Using the learnt function $f(x)$ as the regression center, an interval band of width $2\varepsilon$ is formed, and if a test sample falls inside this band, it is deemed to be properly predicted. In this study, the Radial Basis Function (RBF) kernel is used to build the model, and its two crucial parameters, the penalty parameter (c) and the kernel function parameter (g), are optimized using Grid Search (GS), Genetic Algorithm (GA), and Particle Swarm Optimization (PSO) to construct GS-SVR, GA-SVR, and PSO-SVR models, respectively.
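Two building blocks of the SVR description above can be made concrete: the RBF kernel controlled by g, and the band around f(x) inside which a prediction incurs no penalty. The sketch below uses the common textbook forms, which are an assumption since the paper does not write them out; the band half-width is called `epsilon` here, so the full band has width 2 × `epsilon`.

```python
import math


def rbf_kernel(x1, x2, g):
    """RBF kernel exp(-g * ||x1 - x2||^2); g is the kernel function parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-g * squared_distance)


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the band around the prediction, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

A sample whose measured value lies within `epsilon` of the prediction contributes no loss. The penalty parameter c then weights the losses of samples outside the band against the flatness of the fitted function.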
The prediction performance of the established model was evaluated by comparing the coefficient of determination ($R^2$) and root mean square error (RMSE), where $R_c^2$ is the coefficient of determination of the calibration set, RMSEC is the root mean square error of the calibration set, $R_p^2$ is the coefficient of determination of the validation set, and RMSEP is the root mean square error of the validation set. The prediction ability of the model was quantified using the relative predictive deviation (RPD). The applicable formulas for each evaluation index are as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{RMSEC\ (RMSEP)} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$
$$\mathrm{RPD} = \frac{SD}{\mathrm{RMSEP}}$$
where $y_i$ and $\hat{y}_i$ are the measured and predicted IFI values of the i-th sample in the calibration set or validation set, respectively, $\bar{y}$ is the mean IFI value of all samples in the calibration set or validation set, SD is the standard deviation of IFI for the samples in the validation set, and n is the total number of samples in the calibration set or validation set. The greater the coefficient of determination and the smaller the root mean square error, the greater the stability of the model, the better the fit, and the more effective the model's prediction performance. When $R^2$ is larger than 0.80, the predictive capacity of the model is strong. When RPD is less than 1.5, the model cannot predict the sample; when RPD is between 1.5 and 2.0, it can make a rough prediction; and when RPD is greater than 2.0, it can make an excellent prediction. [26]
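The evaluation indices can be computed directly from measured and predicted values. The sketch below assumes the standard definitions of $R^2$, RMSE, and RPD, with SD taken as the sample standard deviation of the measured values; the function names are illustrative.

```python
import math
import statistics


def rmse(measured, predicted):
    """Root mean square error (RMSEC or RMSEP, depending on the set passed in)."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rpd(measured, predicted):
    """Relative predictive deviation: SD of the measured values over RMSE."""
    return statistics.stdev(measured) / rmse(measured, predicted)


def rpd_category(value):
    """Interpretation thresholds quoted in the text (1.5 and 2.0)."""
    if value < 1.5:
        return "cannot predict"
    if value <= 2.0:
        return "rough prediction"
    return "excellent prediction"
```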

Establishment of the egg integrated freshness index
As storage time extends, the freshness of eggs decreases. In this process, the heights of the egg white and yolk gradually decrease, whereas the yolk diameter and S-ovalbumin content gradually increase. [7] Egg weight also gradually decreases with longer storage, i.e., the heavier the egg, the fresher the egg. Egg weight ($W_i$), albumin height ($HA_i$), and yolk height ($HY_i$) were therefore taken as positive indicators, and yolk diameter ($D_i$) and S-ovalbumin content ($S\text{-}ova_i$) as negative indicators. Tables 1 and 2 were obtained by following the principal component analysis procedure. Table 1 shows the total variance explained, and Table 2 shows the component matrix after rotation. Based on Tables 1 and 2, the weights of each index were estimated as shown in Table 3.
On the basis of the indicator weights and their positive and negative directionality, formula (7) was developed for computing the egg integrated freshness index (IFI). As shown in Figure 2, the IFI of both brown-shell and pink-shell eggs dropped as storage duration increased. IFI was inversely related to storage duration for both varieties, as demonstrated by Pearson's correlation coefficient (R = −0.985, P < .01 for brown-shell eggs; R = −0.987, P < .01 for pink-shell eggs). In addition, the correlation coefficient between the IFI of the mixed samples and storage time was −0.921 (P < .01). Thus, IFI can be utilized as a new freshness index for a comprehensive evaluation of egg freshness.
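The Pearson correlation used above to relate IFI to storage time can be computed as in the following sketch. The storage days are the sampling days from the experiment; the IFI values in the usage lines are hypothetical placeholders, since the measured values are not listed in this text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


storage_days = [1, 6, 11, 16, 21, 26, 31, 36, 41]
# Hypothetical, steadily declining IFI values for illustration only.
hypothetical_ifi = [0.30, 0.29, 0.27, 0.26, 0.25, 0.23, 0.22, 0.21, 0.20]
r = pearson_r(storage_days, hypothetical_ifi)  # strongly negative for a declining index
```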

Partitioning of sample sets and spectral preprocessing
Our earlier research demonstrated that while there was some variation in the typical spectra of brown and pink eggs, the variation was slight and the values of each freshness index were highly associated across varieties. In addition, our previous model fitness study revealed that the models for freshness indicators of different egg varieties could not successfully predict each other. However, the globally optimized model was able to predict indicators such as the S-ovalbumin content of both egg varieties more accurately, and it was more applicable and reliable. Therefore, the two varieties can be merged directly to create a generic IFI nondestructive evaluation (NDE) model. Using the SPXY approach, two-thirds of the samples were selected for modeling and the remaining samples were used for prediction. The calibration set contained 163 samples and the validation set contained 81 samples. Table 4 displays the statistics of the sample sets divided using the SPXY algorithm.
Figure 3 shows the raw spectra of brown-shell eggs and pink-shell eggs. The 500 nm to 900 nm band range is chosen for this experiment because the original spectra have high noise at both ends. If not removed, this noise would interfere significantly with the subsequent extraction of spectral features and modeling. As listed in Table 5, the original spectra were preprocessed with first derivative (FD) and SNV, and GS-SVR, GA-SVR, and PSO-SVR models were developed to examine the modeling effects of the different preprocessing approaches.
The results indicate that the GA-SVR model constructed from the full spectrum after SNV preprocessing has a satisfactory predictive accuracy. $R^2$ of the calibration set is 0.901, $R^2$ of the validation set is 0.796, and RPD is 2.077, indicating that the model's IFI predictions are excellent at this stage. The full spectrum was therefore preprocessed by SNV before further data processing.
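The two preprocessing options compared above can be sketched as follows. This assumes the usual SNV definition (each spectrum is centered on its own mean and scaled by its own sample standard deviation) and uses a plain finite difference for FD. The paper does not state how the derivative was computed, so the difference scheme is an assumption.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own statistics."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (n - 1)
    return [(value - mean) / sd for value in spectrum]


def first_derivative(spectrum, step=1.0):
    """Simple forward difference; the result is one point shorter than the input."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]
```

SNV is applied row by row, so each egg's spectrum is corrected independently of the others.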

Characteristic wavelength filtering
Because the high dimensionality of the full spectrum slows convergence of the SVR model, and the full spectrum contains variables that are irrelevant to IFI, the CARS and SFLA algorithms are each used to extract useful variables and eliminate irrelevant ones. The aim is to find the combination of variables that can represent different levels of IFI, thereby simplifying the model and improving the prediction accuracy.
Feature wavelength screening by CARS algorithm: In the CARS algorithm, the number of Monte Carlo samples is set to 1000, and 5-fold cross-validation is used to obtain the final optimal feature wavelengths after multiple iterations (Figure 4).
Figure 4(a) shows the number of selected variables decaying exponentially as the number of sampling runs increases. Figure 4(b) shows the trend of the RMSECV obtained by cross-validation. The RMSECV first decreases and then increases with the number of sampling runs: the initial decrease indicates that uninformative variables are being removed from the spectral data, whereas the later increase indicates that informative variables are being removed. As marked by the vertical line in Figure 4(c), the RMSECV reaches its minimum at sampling run 29. After repeated parameter testing, the 19 wavelengths retained at the minimum RMSECV were selected as the final optimal combination for predicting egg IFI.
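The exponential decay of the retained variable count can be illustrated with the exponentially decreasing function commonly used in CARS. The sketch assumes its usual form, in which all variables are kept in the first run and two in the last; the run count passed in is a free parameter here, not a value confirmed by the study. The adaptive reweighted sampling and PLS steps of CARS are not shown.

```python
import math


def cars_retained_counts(n_variables, n_runs):
    """Variables kept after each sampling run under r_i = a * exp(-k * i).

    a and k are chosen so that r_1 = 1 (all variables kept) and
    r_N = 2 / n_variables (two variables kept). Requires n_runs >= 2.
    """
    a = (n_variables / 2) ** (1 / (n_runs - 1))
    k = math.log(n_variables / 2) / (n_runs - 1)
    return [round(n_variables * a * math.exp(-k * i))
            for i in range(1, n_runs + 1)]


# 401 spectral variables as in this work; 50 runs is an illustrative choice.
counts = cars_retained_counts(n_variables=401, n_runs=50)
```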
Feature wavelength screening by SFLA algorithm: In this study, the number of SFLA runs was set to 10,000, the maximum number of potential variables was set to 5, and the initial number of sampled variables was also set to 5. The selection probability serves as the evaluation index for variable screening. The horizontal axis in Figure 5 gives the index of each spectral variable, and the vertical axis gives its probability of being selected. The higher the peak, the more likely the variable is to be selected. The selection probabilities of the 401 variables were ranked, and with a cutoff value of 0.1, the 63 variables lying above the dashed line in the figure were retained. Of the 63 bands screened, 30 were located in the visible spectral region, primarily those associated with eggshell color characteristics; 33 were located in the near-infrared spectral region, the vast majority of which were identical to those reported in the literature for the HU, yolk index, and albumin content.
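The final thresholding step, keeping the variables whose selection probability exceeds the cutoff, reduces to a simple filter. In the sketch below the function name is illustrative, and the wavelength axis assumes the 401 variables are evenly spaced at 1 nm from 500 nm to 900 nm, which is an inference rather than something stated in the text.

```python
def select_by_probability(probabilities, wavelengths, cutoff=0.1):
    """Return the wavelengths whose selection probability is above the cutoff."""
    if len(probabilities) != len(wavelengths):
        raise ValueError("one probability is required per wavelength")
    return [w for p, w in zip(probabilities, wavelengths) if p > cutoff]


# Assumed wavelength axis: 401 points, 500 nm to 900 nm inclusive.
wavelength_axis = list(range(500, 901))
```

With the probabilities produced by the SFLA runs passed in alongside `wavelength_axis`, the returned list corresponds to the variables above the dashed line in Figure 5.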

Model construction and assessment
Table 6 displays the prediction performance of the GS-SVR, GA-SVR, and PSO-SVR models developed using the characteristic wavelengths screened by the CARS and SFLA algorithms, respectively. Judged by the model evaluation criteria, the GA-SVR model based on the 63 characteristic wavelengths screened by the SFLA algorithm predicts better than the other models. Its coefficients of determination for the calibration set and validation set are 0.900 and 0.816, respectively, both greater than those of the other models, and its RPD is 2.009. GA-SVR also shows excellent prediction capacity compared to the other models in both the preprocessing modeling stage and the feature wavelength-based modeling stage. Overall, the GA-SVR model based on SFLA can therefore be used to forecast IFI. Figure 6 depicts the correlation between the IFI measurements and the model's predicted values.

Conclusion
This study introduced the Integrated Freshness Index, which gives a comprehensive view of egg freshness from the standpoint of the biochemical basis of quality changes during storage. IFI declined progressively with storage time for both brown-shell and pink-shell eggs, and IFI was strongly correlated with storage duration (R = −0.985, P < .01 for brown-shell eggs; R = −0.987, P < .01 for pink-shell eggs). Importantly, a model based on Vis-NIR spectra for the nondestructive and quick detection of IFI in brown-shell and pink-shell eggs was developed. The raw full spectrum from 500 nm to 900 nm was utilized, and comparative analysis indicated that, after SNV preprocessing and feature wavelength screening by the SFLA algorithm, a GA-SVR model using the 63 screened wavelengths predicted IFI most accurately. $R_c^2$ is 0.900 and RMSEC is 0.005 for the calibration set, whereas $R_p^2$ is 0.816, RMSEP is 0.012, and RPD is 2.077 for the validation set. The model can simultaneously perform nondestructive detection of the IFI of two distinct egg varieties, which makes it more widely applicable and dependable. Therefore, the Vis-NIR spectral detection system may be utilized to gather objective IFI data nondestructively and rapidly, allowing for grading and online inspection of egg quality. This enables eggs to be managed more fully according to their quality.

Figure 2. Changing trend of IFI with storage time.

Figure 3. Raw spectra of brown-shell eggs and pink-shell eggs.

Figure 6. Correlation between measured and predicted values of IFI in the calibration set and validation set.

Table 2. Component matrix after rotation.

Table 3. Egg indicator weights based on principal component analysis.

Table 4. Descriptive statistical analysis of the IFI of the sample set.

Table 5. Full-spectrum model prediction performance of IFI with different pre-processing methods.

Table 6. Evaluation results of the prediction model based on the characteristic wavelength.