Use of LC-Orbitrap MS and FT-NIRS with multivariate analysis to determine geographic origin of Boston butt pork

ABSTRACT To prevent fraudulent practices, LC-Orbitrap MS and FT-NIRS combined with multivariate analysis were used to distinguish between 53 Korean and foreign Boston butt samples; 40 were used to establish the calibration model and 13 were used as an external validation set. Twenty metabolites were identified as good indicators of geographic origin. Both LC-Orbitrap MS with a CDA model based on the 20 metabolites and FT-NIRS with PLS achieved 100% efficiency in identifying Korean and foreign samples, and the overall predictive rates were 94.9% for LC-Orbitrap MS and 100% for FT-NIRS. Thus, the combined use of LC-Orbitrap MS and FT-NIRS is proposed to reliably discriminate the geographic origins of pork samples.


Introduction
Pork is an excellent source of protein and polyunsaturated fatty acids (PUFA), which are essential in human diets. [1] According to the 2019 meat consumption data, [2] 31.2 kg of pork is consumed annually per person in Korea, ranking Korea first in the world in terms of pork consumption. Boston butt is a highly preferred pork cut among Koreans, who mostly consume grilled or roasted pork rather than processed meat. However, Boston butt is not a high-yield cut, resulting in an imbalance between supply and demand in South Korea and consequent imports from the US, Spain, Canada, Chile, Denmark, Mexico, the Netherlands, and Germany. [3] Ninety percent of the imported Boston butt is frozen, [3] and its sale price is two- to three-fold lower than that of domestic chilled Boston butt. However, Korean consumers prefer local Boston butt to imported frozen meat owing to its freshness. Unfortunately, this difference in sale prices has resulted in fraudulent practices, wherein cheaper imported frozen Boston butt is thawed and sold as the more expensive Korean chilled Boston butt.
Scientific technologies are being developed to protect consumer rights and the earnings of agricultural producers and to prevent fraudulent practices. To identify the geographic origin of pork, methods have been developed based on mineral element signatures [4][5][6] and on combinations of multiple elements and stable isotopes. [7,8] To date, only a few studies have used organic compound signatures to discriminate the geographic origins of pork. Pork is a good source of nutrients, such as protein, fat, fatty acids, and vitamins (thiamine, riboflavin, niacin, pantothenic acid, and pyridoxine). [9] The levels of these components depend on the location where the pigs were raised, [10,11] the production system, [8][12][13][14] and the feed provided. [15][16][17][18][19]

… (3), Germany (3), Ireland (3), Mexico (3), Spain (3), and the United States of America (3). The provinces and cities of origin of the Korean and foreign samples are listed in Table S1. No information was available about the locations and rearing practices for the foreign samples.

Sample treatment
Boston butt samples (2000-2500 g each) were obtained, and visible fat was removed using a ceramic knife to yield 300 g of lean meat. The lean meat was spread thinly in a polyethylene bag and frozen at −40°C for 8 h. The samples were then lyophilized for 24 h and ground using a grinding mill (A11 model, IKA, Königswinter, Germany) to obtain homogeneous powders (particle size < 20 mesh). The average moisture content of the powders was below 5.0%, and the powders were stored at −40°C until further analysis.

Reagents and solvents
Ultrapure water (18.2 MΩ·cm) was obtained using a Milli-Q water purification system (Milli-Q Advantage A10, Millipore, Bedford, MA, USA). LC-MS-grade formic acid and acetonitrile were procured from Merck (Darmstadt, Hesse, Germany). Caffeic acid was purchased from Sigma-Aldrich (St. Louis, MO, USA).

QuEChERS extraction
Pork samples (2 g) were placed in 50-mL polytetrafluoroethylene (PTFE) tubes, and 10 mL of ultrapure water (18.2 MΩ·cm) was added. Two ceramic homogenizers (Agilent Technologies, Wilmington, DE, USA) were added, and the tube was shaken manually for 1 min. Thereafter, 10 mL of acetonitrile was added as the extraction solvent, and the mixture was vortexed for 1 min at 3000 rpm (Shake master BMS A-20 TP, Tokyo, Japan). QuEChERS salts (4 g of magnesium sulfate, 1 g of sodium chloride, 1 g of sodium citrate tribasic dihydrate, and 0.5 g of sodium citrate dibasic sesquihydrate) were then added. The mixture was shaken for 1 min at 3000 rpm (Shake master BMS A-20 TP, Tokyo, Japan) and centrifuged (5 min, 20 400 × g, 4°C) in a centrifuge (Mx 307, TOMY, Tokyo, Japan). Next, 500 µL of the supernatant was transferred to a 2-mL tube, and 300 µL of water, 150 µL of acetonitrile, and 50 µL of a 20 µg/mL caffeic acid solution in methanol (internal standard) were added. The mixture was vortexed for 20 s and centrifuged at 20 400 × g for 5 min at 4°C. The solution was then filtered through a 0.22-µm PTFE syringe filter (Phenomenex, Torrance, CA, USA), and 2 µL of the filtrate was injected into the LC-Orbitrap system.
The QuEChERS method is based on extraction with acetonitrile and partitioning by salt addition. [28] Unlike methods [26,27] that first require extraction of the polar and nonpolar substances in pork meat and employ cold water or ice for layer separation and vacuum evaporation followed by reconstitution in the mobile phase, the QuEChERS method requires no such preparation.

Orbitrap MS conditions
To acquire MS data, full-scan/data-dependent MS/MS (dd-MS²) acquisition was operated in positive ion mode using electrospray ionization (ESI). The following HESI-II source parameters were used: spray voltage, 3.5 kV; sheath, auxiliary, and sweep gas flow rates, 35, 10, and 0 arb, respectively; capillary temperature, 300°C; S-lens radio frequency (RF) level, 50; and heater temperature, 320°C. Full-scan spectra were collected over an m/z range of 100-1500 at a resolution of 70,000, with an automatic gain control (AGC) target of 1 × 10⁶ and a maximum injection time of 100 ms. The dd-MS² spectra were collected over an m/z range of 100-1500 at a resolution of 17,500, with an AGC target of 1 × 10⁶, a maximum injection time of 100 ms, an isolation window of 0.5 m/z, a loop count of three (top three peaks), a dynamic exclusion of 6.0 s, and normalized collision energies of 10, 30, and 60.

Data processing and identification
Raw data from the full MS and dd-MS² scans were processed using Compound Discoverer 2.1 (Thermo Fisher Scientific) for spectral property filtering and peak alignment; feature extraction used the default settings. The exact mass and MS/MS spectrum of each feature were then matched against online mass databases, such as ChemSpider (www.chemspider.com) and mzCloud (www.mzcloud.org), for identification. An internal standard was used to evaluate instrumental bias across sample runs. A blank sample was processed and used to remove the background signal from the samples.

Multivariate data analysis
The identified data were filtered using the conditions described in Section 3.1, and fold-change (FC) analysis, t-tests, and volcano plot construction were carried out using Compound Discoverer 2.1. The selected mass features were exported as a data matrix of compound names, retention times, and non-normalized peak areas for further analysis. Autoscaling (mean-centering followed by division by the standard deviation of each variable) was applied to normalize the data before multivariate analysis. Orthogonal partial least squares-discriminant analysis (OPLS-DA) was then carried out using SIMCA 17.0 (Umetrics, Umeå, Sweden). After OPLS-DA, the most significant variables for geographical discrimination were selected using the Variable Importance in the Projection (VIP), the S-plot, and canonical discriminant analysis (CDA) in UNISTAT (version 6.5, Unistat Ltd., London, UK). CDA in UNISTAT was then used to rebuild the classification model from the selected variables. The normalized (autoscaled) areas of the organic components were used as independent variables. The response variable was set to Korean or foreign for the samples in the calibration set, whereas the unknown Boston butt samples in the validation set were coded as "*". To evaluate model performance, sensitivity, selectivity, and efficiency were calculated from the numbers of True Korean (TK; Korean samples correctly classified as Korean), False Korean (FK; foreign samples incorrectly classified as Korean), True Foreign (TF; foreign samples correctly classified as foreign), and False Foreign (FF; Korean samples incorrectly classified as foreign) results.
The Korean and foreign sample prediction rates of the constructed model are presented as the percentages of correct classification of Korean and foreign samples, respectively. Pearson's correlation analysis was performed using SPSS 16.0 (SPSS Inc., Chicago, IL, USA).
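The text does not give the formulas behind sensitivity, selectivity, and efficiency. The sketch below assumes the common definitions sensitivity = TK/(TK + FF) and selectivity = TF/(TF + FK), and takes efficiency as the geometric mean of the two, a convention used in some chemometrics work; UNISTAT's exact definitions may differ, and the function name is hypothetical. The usage line plugs in the pooled external-validation counts reported in the CDA results.

```python
from math import sqrt


def classification_metrics(tk: int, fk: int, tf: int, ff: int) -> dict:
    """Two-class metrics with Korean treated as the positive class.

    tk: Korean samples classified as Korean (True Korean)
    fk: foreign samples classified as Korean (False Korean)
    tf: foreign samples classified as foreign (True Foreign)
    ff: Korean samples classified as foreign (False Foreign)
    """
    sensitivity = tk / (tk + ff)  # share of Korean samples recognized as Korean
    selectivity = tf / (tf + fk)  # share of foreign samples recognized as foreign
    # Assumption: efficiency as the geometric mean of sensitivity and selectivity
    efficiency = sqrt(sensitivity * selectivity)
    overall_rate = (tk + tf) / (tk + fk + tf + ff)  # overall prediction rate
    return {
        "sensitivity": sensitivity,
        "selectivity": selectivity,
        "efficiency": efficiency,
        "overall_rate": overall_rate,
    }


# Pooled validation counts over CS1-3: 51 of 54 Korean, 60 of 63 foreign correct
m = classification_metrics(tk=51, fk=3, tf=60, ff=3)
for name, value in m.items():
    print(f"{name}: {100 * value:.1f}%")
# sensitivity 94.4%, selectivity 95.2%, overall_rate 94.9%
```

Here sensitivity and selectivity coincide with the Korean and foreign prediction rates, and the overall rate is the pooled share of correct calls.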

NIRS analysis
Briefly, 15 g of freeze-dried pork powder was placed in a sample cup. The spectrum of each sample was collected three times in reflectance mode using an FT-NIRS spectrometer (Bruker Optik GmbH, Ettlingen, Germany) with an integrating sphere, over the range 11,500-3,950 cm⁻¹ at a resolution of 16 cm⁻¹ with 32 scans. All data analyses, including spectral preprocessing, classification model construction, test validation, and PLS-based analysis, were carried out using the Bruker OPUS 7.0 software on Windows 7 (Microsoft, Redmond, WA, USA).
The following preprocessing methods were applied during classification model development: standard normal variate, multiplicative scatter correction, first derivative, and second derivative. The precision of the FT-NIRS classification models was evaluated using the coefficient of determination (R²), root mean square error of estimation (RMSEE), root mean square error of prediction (RMSEP), and residual prediction deviation (RPD). R² gives the percentage of the variance in the reference values that is reproduced by the prediction. RMSEE quantifies the error of the calibration model, and RMSEP quantifies the prediction error for the test samples. RPD, a qualitative measure for assessing the validation results, is the ratio of the standard deviation of the reference values to the bias-corrected standard error of prediction of the validation set; it is calculated to avoid an over-optimistic interpretation of the results. The OPUS program automatically selects the optimal classification model with the lowest possible rank (number of latent variables), RMSEE, and RMSEP, and the highest R² and RPD.
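The four statistics can be written out explicitly. The NumPy sketch below follows definitions commonly documented for OPUS (RMSEE corrected for the model rank, RMSEP as a plain root mean square, RPD as the reference standard deviation divided by the bias-corrected standard error of prediction); treat these as assumptions rather than the exact OPUS code. Function names are illustrative, and the usage values are made up.

```python
import numpy as np


def r2_percent(y_ref, y_pred):
    """Percentage of the reference variance reproduced by the prediction."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    sse = np.sum((y_ref - y_pred) ** 2)
    sst = np.sum((y_ref - y_ref.mean()) ** 2)
    return float(100.0 * (1.0 - sse / sst))


def rmsee(y_ref, y_fit, rank):
    """Calibration error, with M - rank - 1 degrees of freedom (assumed form)."""
    d = np.asarray(y_fit, float) - np.asarray(y_ref, float)
    return float(np.sqrt(np.sum(d ** 2) / (d.size - rank - 1)))


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction for an independent test set."""
    d = np.asarray(y_pred, float) - np.asarray(y_ref, float)
    return float(np.sqrt(np.mean(d ** 2)))


def rpd(y_ref, y_pred):
    """SD of reference values / bias-corrected standard error of prediction."""
    y_ref = np.asarray(y_ref, float)
    d = np.asarray(y_pred, float) - y_ref
    sep = np.sqrt(np.sum((d - d.mean()) ** 2) / (d.size - 1))
    return float(y_ref.std(ddof=1) / sep)


# Made-up predictions for class values 100 (Korean) and 1 (foreign)
y_ref = [100, 100, 1, 1]
y_pred = [90, 110, 11, -9]
print(r2_percent(y_ref, y_pred), rmsee(y_ref, y_pred, rank=1),
      rmsep(y_ref, y_pred), rpd(y_ref, y_pred))
```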

Boston butt sample peaks obtained using LC-Orbitrap
In total, 2,872 peaks were detected in the pork samples using LC-Orbitrap MS. To obtain sufficient peak features and reliable statistical results, the following filtering conditions were applied in Compound Discoverer 2.1: (1) the annotation from mzCloud and ChemSpider had a "full match" status for the predicted compounds; (2) more than one MS/MS fragment was available for the compound feature; and (3) significance in volcano plot analysis using Student's t-test. In total, 1,004 metabolites showed an adjusted p-value of <0.05 and met the FC threshold of 1.2. Figure 1 shows the volcano plot of −log10 adjusted p-values versus log2 fold changes (foreign/Korean samples).
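A minimal sketch of the volcano-plot filter, assuming NumPy and SciPy: a Student's t-test per feature, a Benjamini-Hochberg adjustment (the text says only "adjusted p-value", so the correction method is an assumption), and a symmetric FC threshold of 1.2 on the foreign/Korean ratio of mean peak areas. Compound Discoverer's own computation may differ; the data are synthetic.

```python
import numpy as np
from scipy import stats


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    p = np.asarray(p_values, float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def volcano_filter(foreign, korean, p_cut=0.05, fc_cut=1.2):
    """foreign, korean: (observations x features) arrays of positive peak areas.

    Returns log2 FC (foreign/Korean), -log10 adjusted p, and a keep mask.
    """
    fc = foreign.mean(axis=0) / korean.mean(axis=0)
    p = stats.ttest_ind(foreign, korean, axis=0).pvalue  # Student's t-test
    p_adj = bh_adjust(p)
    log2_fc = np.log2(fc)
    keep = (p_adj < p_cut) & (np.abs(log2_fc) >= np.log2(fc_cut))
    # Guard against log10(0) if a p-value underflows
    neg_log10_p = -np.log10(np.maximum(p_adj, np.finfo(float).tiny))
    return log2_fc, neg_log10_p, keep


# Synthetic example: feature 0 is 1.5-fold higher in the foreign group
rng = np.random.default_rng(0)
korean = rng.lognormal(mean=10, sigma=0.3, size=(72, 5))
foreign = rng.lognormal(mean=10, sigma=0.3, size=(87, 5))
foreign[:, 0] *= 1.5
log2_fc, neg_log10_p, keep = volcano_filter(foreign, korean)
print(keep)
```

Using |log2 FC| keeps features that are higher in either group, which matches the later report of Korean-side markers with FC values below 1.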
Among these metabolites, 138 duplicates were removed based on the following criteria: high CV% of peak areas within each group, a high number of filled gaps, and a low identification match percentage. This analysis yielded 47 putative metabolites in the foreign group and five in the Korean group that could potentially discriminate between foreign and Korean domestic Boston butt samples.
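The duplicate-removal step can be pictured as keeping one feature per compound name. The sketch below is hypothetical: the Feature fields and the priority order (lowest CV%, then fewest filled gaps, then highest match score) are assumptions, since the text lists the criteria without saying how they were combined, and the input values are made up.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str            # annotated compound name
    max_group_cv: float  # highest within-group CV% of the peak area
    gap_fills: int       # number of filled gaps across samples
    match_score: float   # identification match percentage


def rank_key(f: Feature):
    # Hypothetical priority: low CV%, then few gap fills, then high match score
    return (f.max_group_cv, f.gap_fills, -f.match_score)


def deduplicate(features):
    """Keep the best-ranked feature per compound name; drop the others."""
    best = {}
    for f in features:
        if f.name not in best or rank_key(f) < rank_key(best[f.name]):
            best[f.name] = f
    return list(best.values())


features = [
    Feature("choline", 12.0, 0, 95.0),
    Feature("choline", 35.0, 4, 88.0),
    Feature("hexanoylcarnitine", 18.0, 1, 91.0),
]
kept = deduplicate(features)
print([(f.name, f.max_group_cv) for f in kept])
```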

Metabolite selection via OPLS-DA and CDA
OPLS-DA has been used with LC-HRMS data to discriminate the geographical origins of soybeans, [29] guarana seeds, [30] durum wheat, [31] and oranges. [32] Figure S1 shows the OPLS-DA results based on the 52 metabolites described in Section 3.1 for 24 Korean samples (72 observations) and 29 foreign samples (87 observations). Except for one Korean sample, the OPLS-DA models discriminated between the Korean (blue circles) and foreign (red circles) samples. VIP scores of the OPLS-DA models were generated to identify which of the 52 metabolites contributed most to the discrimination between Korean and foreign pork samples; metabolites with a VIP value greater than 1 make an above-average contribution to the OPLS-DA classification. Twenty significantly differential metabolites were selected based on VIP > 1 and P < 0.05. Of these, seven were excluded because their chemical classes were unknown: … (1.09), and 1,11-diamino-3,6,9-trioxaundecane (1.02). In addition, to improve discrimination between the two Boston butt origins, seven additional variables were selected via CDA. Variables were added and removed iteratively to find those best able to distinguish between the Korean and foreign groups based on their standardized coefficients; larger standardized coefficients indicate variables with greater discriminating ability.
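SIMCA computes VIP for OPLS-DA internally. To illustrate the idea only, the NumPy sketch below computes VIP for an ordinary single-response PLS model (PLS1 via NIPALS) with a 0/1 class vector; it is not SIMCA's OPLS-DA implementation, and its values will differ. Because the mean of the squared VIP scores equals 1, VIP > 1 marks an above-average contribution. The data are synthetic.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Single-response PLS (NIPALS) on column-centred data.

    Returns X-weights W (features x components), scores T, and y-loadings q.
    """
    X = np.asarray(X, float) - np.mean(X, axis=0)
    y = np.asarray(y, float) - np.mean(y)
    n_obs, n_feat = X.shape
    W = np.zeros((n_feat, n_components))
    T = np.zeros((n_obs, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        p_load = X.T @ t / tt
        q[a] = y @ t / tt
        X = X - np.outer(t, p_load)  # deflate X
        y = y - q[a] * t             # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a (w_ja/||w_a||)^2 / sum_a SS_a)."""
    n_feat = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)  # y-variance explained per component
    w_norm2 = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(n_feat * (w_norm2 @ ss) / ss.sum())


# Synthetic autoscaled data: only feature 0 carries class information
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
y = np.repeat([0.0, 1.0], 30)  # 0 = Korean, 1 = foreign (dummy coding)
X[:, 0] += 2.0 * y
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
W, T, q = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
print(np.round(vip, 2))
```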
Finally, 20 metabolites (13 from OPLS-DA and 7 from CDA) were selected (Table 1). The Boston butt samples were also well classified using these 20 metabolites (Figure 2a), indicating that the selected metabolites contained sufficient discriminative information and provided discriminant ability similar to that of the 52 metabolites mentioned above (Figure S1). The foreign samples from ten nations were more widely scattered than the Korean samples, reflecting the geographical diversity of samples collected over a wider area. The R²X, R²Y, and Q² values were 0.902, 0.791, and 0.773, respectively, indicating a good fit and high predictive ability with a low likelihood of overfitting (Figure 2a). The closer R²Y and Q² are to 1, the more stable and reproducible the OPLS-DA model, and R²Y and Q² values >0.5 indicate acceptable predictive ability. [33,34] To verify the statistical significance of the model, a permutation test (n = 200) was performed with 7-fold cross-validation as internal validation. In the permutation test, the intercepts of the R² and Q² regression lines represent the degree of fit to the data and the predictive ability of the model, respectively. A model without overfitting and with high predictive value should have an R² intercept of no more than 0.3 and a Q² intercept of no more than 0.05. [35] The R² and Q² intercepts were 0.042 and −0.201, respectively, indicating good repeatability and predictability of the model (Figure 2b).
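The permutation-test intercepts can be sketched as follows: refit the model on permuted class labels, record R² and Q², and extrapolate straight-line fits of these statistics against the correlation between the original and permuted labels to zero correlation. This is a general sketch in NumPy; SIMCA's exact fitting procedure may differ, and `fit_stats` is a hypothetical callable that would refit OPLS-DA with 7-fold cross-validation.

```python
import numpy as np


def permutation_intercepts(y, fit_stats, n_perm=200, seed=0):
    """Intercepts of the R2 and Q2 lines in a permutation test (general sketch).

    fit_stats(y_perm) -> (r2, q2) is a hypothetical callable that refits the
    model on the (permuted) class vector and returns its R2Y and Q2.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)
    corr, r2s, q2s = [], [], []
    for k in range(n_perm + 1):
        y_k = y if k == 0 else rng.permutation(y)  # k == 0 is the real model
        r2, q2 = fit_stats(y_k)
        corr.append(abs(np.corrcoef(y, y_k)[0, 1]))
        r2s.append(r2)
        q2s.append(q2)
    # Straight-line fits against |corr(original y, permuted y)|;
    # each intercept is the value extrapolated to zero correlation
    r2_intercept = np.polyfit(corr, r2s, 1)[1]
    q2_intercept = np.polyfit(corr, q2s, 1)[1]
    return r2_intercept, q2_intercept


def passes_permutation_check(r2_intercept, q2_intercept):
    """Criteria quoted in the text: R2 intercept <= 0.3, Q2 intercept <= 0.05."""
    return r2_intercept <= 0.3 and q2_intercept <= 0.05


# Toy stand-in for a model refit: stats lie exactly on known lines
y = np.array([0.0] * 24 + [1.0] * 29)


def toy_fit_stats(y_perm):
    c = abs(np.corrcoef(y, y_perm)[0, 1])
    return 0.5 * c + 0.1, 0.8 * c - 0.3


r2i, q2i = permutation_intercepts(y, toy_fit_stats, n_perm=50)
print(r2i, q2i, passes_permutation_check(r2i, q2i))
```

Because the toy statistics lie exactly on straight lines, the fitted intercepts recover 0.1 and −0.3; with a real model the permuted points scatter and the intercepts summarize how much fit arises by chance.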

Discrimination of the geographical origins of Boston butt samples by CDA
The OPLS-DA models separated the Korean and foreign groups. However, external validation with an independent sample set excluded from the classification model was still necessary, because OPLS-DA often yields over-optimistic values of the predictive ability statistic (Q²). [36] To evaluate the predictive ability more accurately, another supervised method, CDA, was adopted.
The goal of CDA is to achieve better discrimination between sample origins by maximizing between-group variation relative to within-group variation. [37] CDA has previously been applied as a classification model to discriminate the geographical origins of pork based on mineral elements measured by inductively coupled plasma spectrometry, [5] and it has also been used with great success to verify the authenticity of honey using ¹H nuclear magnetic resonance spectroscopy. [34] For external validation, 159 observations (53 Boston butt samples and their analytical replicates) were randomly assigned to a calibration set (CS) comprising 3/4 of the observations (Korean, 54; foreign, 66) and an external validation set comprising the remaining 1/4 (Korean, 18; foreign, 21). The CS was used to construct a classification model with the normalized areas of the 20 selected metabolites as explanatory variables, and the validation set was used to validate the established model. This process was repeated three times to compare the predictive power of the three resulting models, CS1-3. The CDA results showed that the overall sensitivity, selectivity, and efficiency were 100% for CS1, CS2, and CS3. The statistical parameters of the CDA models are shown in Table 2.
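The abstract's split of 40 calibration and 13 validation samples, together with the observation counts above, suggests that all replicates of a sample were kept on the same side of the split. Assuming that, the sketch below draws a stratified, sample-level random split in plain Python; the function name and sample IDs are hypothetical.

```python
import random


def split_by_sample(sample_ids, labels, calib_fraction=0.75, seed=0):
    """Assign whole samples (all replicates) to calibration or validation,
    stratified by origin label. Returns two sets of sample IDs."""
    rng = random.Random(seed)
    by_label = {}
    for sid, lab in zip(sample_ids, labels):
        by_label.setdefault(lab, set()).add(sid)
    calib, valid = set(), set()
    for ids in by_label.values():
        ids = sorted(ids)  # sort first so the shuffle is reproducible
        rng.shuffle(ids)
        n_cal = round(calib_fraction * len(ids))
        calib.update(ids[:n_cal])
        valid.update(ids[n_cal:])
    return calib, valid


# 24 Korean and 29 foreign samples, each measured in triplicate
samples = [f"K{i:02d}" for i in range(24)] + [f"F{i:02d}" for i in range(29)]
sample_ids = [s for s in samples for _ in range(3)]
labels = ["Korean" if s.startswith("K") else "foreign" for s in sample_ids]
calib, valid = split_by_sample(sample_ids, labels, seed=1)


def count(ids, origin):
    return sum(1 for s, lab in zip(sample_ids, labels) if s in ids and lab == origin)


print(count(calib, "Korean"), count(calib, "foreign"),
      count(valid, "Korean"), count(valid, "foreign"))  # 54 66 18 21
```

Splitting at the sample level keeps replicate measurements of one piece of meat from appearing in both sets, which would otherwise inflate validation results.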
The canonical discriminant function is defined as the linear combination of variables that best separates the two groups. The function values at the group centroids for CS1, CS2, and CS3 were −2.3775, −2.6023, and −2.2165, respectively, for Korean samples and 1.9452, 2.1291, and 1.7346, respectively, for foreign samples. The canonical correlation is interpreted as an index of the overall fit of the model, because its square is the proportion of the total variance in the discriminant scores that is attributable to differences between the groups. The canonical correlations for CS1, CS2, and CS3 were 0.9081, 0.9216, and 0.8923, respectively; a value near 1 indicates good separation between the groups. The distances between the centroids of the two groups for CS1, CS2, and CS3 were 4.3227, 4.7314, and 3.9511, respectively; a distance between centroids >2 indicates significant separation. Across the three CDA models, 60 of 63 foreign validation observations were correctly classified as foreign (average predictive rate, 95.2%), and 51 of 54 Korean validation observations were correctly assigned to the Korean group (average predictive rate, 94.4%; Table 2). The overall prediction rate was 94.9% (111 of 117 Korean and foreign observations). Thus, the CDA models appear to have good identification and predictive power.
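For two groups, the canonical discriminant function reduces to Fisher's discriminant direction. The NumPy sketch below scales the scores so that their pooled within-group variance is 1 and centres them on the grand mean, the usual convention under which the group-size-weighted centroids sum to zero (as the reported CS1 centroids do: 54 × −2.3775 + 66 × 1.9452 ≈ 0). Classifying to the nearest centroid with equal priors is an assumption; UNISTAT's rule may differ. Function names and data are illustrative.

```python
import numpy as np


def fit_two_group_cda(X, labels):
    """Two-group canonical discriminant function (Fisher direction).

    Scores are centred on the grand mean and scaled so that the pooled
    within-group variance of the scores is 1.
    """
    X = np.asarray(X, float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    if groups.size != 2:
        raise ValueError("this sketch handles exactly two groups")
    means = {g: X[labels == g].mean(axis=0) for g in groups}
    # Pooled within-group scatter matrix
    Sw = sum((X[labels == g] - means[g]).T @ (X[labels == g] - means[g]) for g in groups)
    w = np.linalg.solve(Sw, means[groups[1]] - means[groups[0]])
    dof = X.shape[0] - 2
    w /= np.sqrt(w @ Sw @ w / dof)  # pooled within-group variance of scores = 1
    grand_mean = X.mean(axis=0)
    scores = (X - grand_mean) @ w
    centroids = {g: scores[labels == g].mean() for g in groups}
    ss_between = sum((labels == g).sum() * centroids[g] ** 2 for g in groups)
    eigenvalue = ss_between / dof  # within-group sum of squares equals dof here
    canonical_corr = np.sqrt(eigenvalue / (1.0 + eigenvalue))
    return w, grand_mean, centroids, canonical_corr


def classify(X_new, w, grand_mean, centroids):
    """Assign each observation to the nearest centroid (assumed rule)."""
    s = (np.atleast_2d(np.asarray(X_new, float)) - grand_mean) @ w
    groups = list(centroids)
    c = np.array([centroids[g] for g in groups])
    idx = np.argmin(np.abs(s[:, None] - c[None, :]), axis=1)
    return [groups[i] for i in idx]


# Synthetic calibration set with the CS group sizes (54 Korean, 66 foreign)
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(54, 4)), rng.normal(2.0, 1.0, size=(66, 4))])
labels = np.array(["Korean"] * 54 + ["foreign"] * 66)
w, gm, centroids, rc = fit_two_group_cda(X, labels)
pred = classify(X, w, gm, centroids)
print({str(g): round(float(c), 4) for g, c in centroids.items()}, round(float(rc), 4))
```

The distance between centroids is simply the difference of the two centroid values, and the canonical correlation equals the absolute correlation between the scores and a 0/1 group indicator.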

Metabolites specific to Korean domestic and foreign samples
Nine metabolites, namely icosa-5,8,11,14-tetraenoic acid (AA), docosapentaenoic acid (DPA), 8Z,11Z,14Z-eicosatrienoic acid, eicosapentaenoic acid (EPA), alpha-linolenic acid (LNA), docosahexaenoic acid (DHA), α-eleostearic acid, all-cis-4,7,10,13,16-DPA (osbond acid), and 9-oxo-10(E),12(E)-octadecadienoic acid (9-OxoODE), are unsaturated fatty acids and accounted for 45% of the 20 metabolites finally selected. As shown in the S-plot (Figure 2c), DPA, AA, 8Z,11Z,14Z-eicosatrienoic acid, and EPA were located at the outer end of the plot and could be considered key markers of the foreign group. Interestingly, although EPA showed the highest FC among the 20 metabolites (Table 1), its VIP score ranked only seventh (Table S2). Pearson's correlation analysis revealed significant positive correlations among most of the unsaturated fatty acids (Figure 3). Previous studies [38,39] have reported that the fatty acid composition of pork closely reflects the diet. Pigs fed soybean oil have a higher PUFA content than pigs fed animal fat or palm oil. [40] In three-way crossbred (Landrace × Yorkshire × Duroc) pigs, LNA and DHA levels increased more with soybean oil supplementation than with beef tallow, and also increased when the diet was switched from tallow to soybean oil. [41] Diets enriched in plant oils (such as sunflower, linseed, and rapeseed oils) result in increased PUFA levels in pork. [42,43] Among the 20 selected metabolites, choline showed the highest VIP value (1.47; Table S2) and was the most distant metabolite in the S-plot of the OPLS-DA model (Figure 2c); thus, it had the greatest influence on identifying the geographic origin of pork. Choline is an essential nutrient that helps maintain cell structure, transports lipids into and out of cells, and serves as a precursor for the synthesis of acetylcholine and phospholipids. [44] Choline content was positively correlated with AA (r = 0.893, P < .0001), 8Z,11Z,14Z-eicosatrienoic acid (r = 0.888, P < .0001), DPA (r = 0.880, P < .0001), platelet-activating factor (PAF; r = 0.873, P < .0001), and EPA (r = 0.844, P < .0001; Figure 3).
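The S-plot places each variable by its covariance, p[1], and its correlation, p(corr)[1], with the predictive OPLS-DA scores; variables far out on both axes, such as choline here, are the strongest markers. The NumPy sketch below computes both coordinates from a given score vector t. Obtaining t from an actual OPLS-DA fit (done in SIMCA in this study) is outside the sketch, and the data are synthetic.

```python
import numpy as np


def s_plot_coordinates(X, t):
    """S-plot coordinates for each variable given predictive scores t.

    p1:     covariance between t and each variable
    p_corr: correlation between t and each variable
    """
    X = np.asarray(X, float)
    t = np.asarray(t, float)
    Xc = X - X.mean(axis=0)
    tc = t - t.mean()
    n = X.shape[0]
    p1 = Xc.T @ tc / (n - 1)
    p_corr = p1 / (Xc.std(axis=0, ddof=1) * tc.std(ddof=1))
    return p1, p_corr


# Synthetic example: variable 0 tracks the scores, variable 1 is noise
rng = np.random.default_rng(3)
t = rng.normal(size=50)
X = np.column_stack([2 * t + rng.normal(scale=0.5, size=50), rng.normal(size=50)])
p1, p_corr = s_plot_coordinates(X, t)
print(np.round(p1, 3), np.round(p_corr, 3))
```

Covariance reflects a variable's magnitude of contribution, whereas correlation reflects its reliability, which is why the two axes are read together.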
Hexanoylcarnitine (VIP = 1.24, FC = 0.2, p < .05) and N-Boc-L-valine (VIP = 1.35, FC = 0.1, p < .05) were located in the lower left-hand quadrant of the S-plot (Figure 2c); both contributed mainly to the identification of Korean Boston butt. Hexanoylcarnitine is a medium-chain acylcarnitine [47] and an intermediate metabolite of fatty acid β-oxidation. [48] In contrast, the content of palmitoylcarnitine, a long-chain acylcarnitine, [49] was 4-fold higher in foreign samples than in Korean Boston butt samples. However, palmitoylcarnitine had a VIP value of 0.48, suggesting that it had little influence on discriminating between Korean and foreign Boston butt (data not shown).
Short- and medium-chain acylcarnitines have been reported to be metabolized very differently from long-chain acylcarnitines in cardiovascular diseases. [47] The different types of acylcarnitine found in the Korean and foreign groups may be associated with different fatty acid oxidation pathways. Korean Boston butt had significantly higher levels of hexanoylcarnitine than foreign Boston butt. N-Boc-L-valine was positively correlated with hexanoylcarnitine (r = 0.744, P < .0001; Figure 3), but very little is known about N-Boc-L-valine, and it should be investigated in the future. The pork distributed in the Korean market comes mainly from three-way crossbred pigs produced from Landrace × Yorkshire dams and Duroc sires, a cross that combines the merits of the three breeds. [10] Such three-way crossbred pigs are used for commercial pork production not only in Korea but also in other parts of the world, and industrial livestock farming practices cannot be easily changed. Unsaturated fatty acid concentrations in meat have been reported to depend more on feeding than on genotype. [50] Accordingly, differences in the metabolic components of pork may be mainly influenced by pig diets, [51][52][53][54] including the different feed ingredients and feed additives used in different regions.

NIRS
To compare the discriminatory performance of the two analytical methods (LC-Orbitrap MS and FT-NIRS), the same calibration and validation sets used for CDA were used for FT-NIRS. The three classification models, CS1-3, were developed using the calibration sets (Korean, 54; foreign, 66) and validated using the remaining observations (Korean, 18; foreign, 21) as independent test sets. The spectral regions used in the classification model, 9405.3-7499.1 cm⁻¹ and 5027.3-4594.4 cm⁻¹, were selected by the automatic optimization function of the OPUS 7.0 software (Figure 4a). The 5027.3-4594.4 cm⁻¹ region is associated with combination bands of secondary amides (-CONH-), primary amides (-CONH2), and primary amines (-NH2), which may reflect the protein content of pork. Absorption at 9405.3-7499.1 cm⁻¹ is related to the second overtones of methyl (-CH3), methylene (-CH2), alkene (C=C), and -CH groups, which mainly reflect the lipid content of pork.
To eliminate the effects of baseline drift, noise, and light scattering and to obtain a reliable and accurate model, the second derivative was selected as the optimal spectral preprocessing method (Figure 4b). For PLS, the reference (class) value was set to 100 for Korean domestic samples and 1 for foreign samples.
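The preprocessing options compared in this study can be sketched with NumPy and SciPy. The second derivative is computed here with a Savitzky-Golay filter, one common way to obtain it; the window of 17 points and polynomial order of 2 are assumptions, not settings taken from the study, and OPUS applies its own implementation.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    s = np.asarray(spectra, float)
    return (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, ddof=1, keepdims=True)


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    s = np.asarray(spectra, float)
    ref = s.mean(axis=0) if reference is None else np.asarray(reference, float)
    corrected = np.empty_like(s)
    for i, row in enumerate(s):
        slope, offset = np.polyfit(ref, row, 1)  # row ≈ offset + slope * ref
        corrected[i] = (row - offset) / slope
    return corrected


def second_derivative(spectra, window=17, polyorder=2):
    """Savitzky-Golay second derivative along each spectrum (window size assumed)."""
    return savgol_filter(np.asarray(spectra, float), window_length=window,
                         polyorder=polyorder, deriv=2, axis=1)


# Two made-up spectra that differ only by additive and multiplicative scatter
ref = np.sin(np.linspace(0, 3, 50)) + 2.0
raw = np.vstack([1.0 + 2.0 * ref, -0.5 + 0.7 * ref])
print(np.allclose(msc(raw, reference=ref), ref))
```

The second derivative removes constant offsets and linear baseline slopes entirely, which is one reason it suits spectra affected by baseline drift and scattering.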
In the PLS calibration, a predicted value of 50 was used as the threshold for classifying samples as Korean or foreign: a Boston butt sample with a predicted value below 50 was assigned to the foreign group, whereas a value equal to or greater than 50 was assigned to the Korean group (Figure 5). When the NIRS models developed from CS1-3 (Figure 5a-c) were applied to the corresponding validation sets, which had not been included in calibration, a prediction rate of 100% was obtained (Figure 5d-f) with the cutoff value of 50.
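The class-coding scheme (100 for Korean, 1 for foreign, cutoff 50) can be illustrated with a plain PLS1 regression. The NumPy sketch below uses NIPALS PLS1 in place of the OPUS implementation, and the "spectra" are synthetic random vectors with a hypothetical class-related band; it shows the thresholding logic, not the study's model.

```python
import numpy as np

KOREAN_VALUE, FOREIGN_VALUE, THRESHOLD = 100.0, 1.0, 50.0


def pls1_fit(X, y, n_components):
    """PLS1 (NIPALS) regression; returns a predict(X_new) function."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xd, yd = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)
        t = Xd @ w
        tt = t @ t
        p = Xd.T @ t / tt
        qa = yd @ t / tt
        Xd = Xd - np.outer(t, p)  # deflate X
        yd = yd - qa * t          # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector B = W (P'W)^-1 q
    return lambda X_new: (np.asarray(X_new, float) - x_mean) @ coef + y_mean


def assign_origin(predicted):
    """Predicted value >= 50 -> Korean; < 50 -> foreign."""
    return np.where(np.asarray(predicted) >= THRESHOLD, "Korean", "foreign")


rng = np.random.default_rng(4)


def simulate(n_korean, n_foreign):
    X = rng.normal(size=(n_korean + n_foreign, 30))
    X[:n_korean, :5] += 3.0  # hypothetical class-related band
    y = np.r_[np.full(n_korean, KOREAN_VALUE), np.full(n_foreign, FOREIGN_VALUE)]
    return X, y


X_cal, y_cal = simulate(54, 66)
X_val, y_val = simulate(18, 21)
predict = pls1_fit(X_cal, y_cal, n_components=3)
origin = assign_origin(predict(X_val))
truth = np.where(y_val == KOREAN_VALUE, "Korean", "foreign")
print(f"validation prediction rate: {np.mean(origin == truth):.1%}")
```

The cutoff of 50 sits roughly midway between the two class values, so a sample is assigned to whichever class its prediction lies closer to.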
The model generated by combining FT-NIRS and PLS showed good discriminatory performance for both the calibration and validation sets across the three repeated experiments. For calibration, the models for CS1, CS2, and CS3 showed R² values of 87.88%-94.25%, RMSEE values of 13-19.7, and RPD values of 2.6-3.93 (Figure 5a-c). In test validation, the FT-NIRS models for CS1, CS2, and CS3 gave R² values of 84.12%-90.36%, RMSEP values of 15.3-19.7, and RPD values of 2.6-3.24 (Figure 5d-f). A previous study [55] reported that an R² between 0.81 and 0.90 is good and that an R² greater than 0.90 reveals a strong linear correlation. For RPD values between 2.5 and 3, predictions can be classified as good, and an RPD > 3 indicates high predictive performance.

Comparison between FT-NIRS and LC-Orbitrap
The classification results from FT-NIRS and LC-Orbitrap MS were compared to minimize discrimination errors and to increase confidence in the judgment.
For FT-NIRS with CS1, three observations (one sample) from Belgium, marked in red, were correctly assigned to the foreign group, and their Mahalanobis distance was within the corresponding limit (0.35). However, these observations were flagged as outliers because of their large F-probability (F-Prob) value. The OPLS-DA model based on the 20 metabolites from LC-Orbitrap MS gave similar findings: the same three observations fell outside the 95% confidence ellipse (Figure 2a, dotted square box), although they correctly belonged to the foreign class. Further, CDA showed that the classification efficiency of CS1 was 100% (Table 2). A previous study [56] reported that samples fall outside the confidence ellipse when their compositional patterns differ specifically from those of the other samples, rather than because they are outliers. Because similar results were obtained with two different instruments, LC-Orbitrap MS and FT-NIRS, we suggest that the three red-marked observations reflect distinct metabolite profiles rather than outliers.
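OPUS reports its own normalized Mahalanobis distance and limit (0.35 here), and SIMCA draws its confidence ellipse from Hotelling's T²; neither formula is given in the text. As a generic illustration only, the NumPy sketch below computes the plain Mahalanobis distance of new observations from the centre of a set of calibration scores; it does not reproduce OPUS's scaling, and the function name is illustrative.

```python
import numpy as np


def mahalanobis_distances(reference_scores, new_scores):
    """Mahalanobis distance of each new observation from the centre of
    the reference (calibration) scores, using their sample covariance."""
    ref = np.asarray(reference_scores, float)
    new = np.atleast_2d(np.asarray(new_scores, float))
    centre = ref.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(ref, rowvar=False))
    diff = new - centre
    # Quadratic form diff_i^T * cov_inv * diff_i for every row i
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


# Made-up two-dimensional calibration scores centred on the origin
ref = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
print(mahalanobis_distances(ref, [[0.0, 0.0], [1.0, 0.0], [3.0, 3.0]]))
```

Unlike a plain Euclidean distance, this measure accounts for the spread and correlation of the calibration scores, so an observation can be distant along a low-variance direction even when it looks close on the plot.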
Both CDA of the LC-Orbitrap MS data and PLS of the FT-NIRS data showed a perfect efficiency of 100%. In the external validation, the overall predictive rates were 100.0% for FT-NIRS and 94.9% for LC-Orbitrap MS.

Conclusion
Herein, the organic components of pork samples were analyzed using LC-Orbitrap MS and FT-NIRS to characterize their geographic origins. In total, 20 metabolites were selected to discriminate between Korean and foreign pork samples via LC-Orbitrap MS combined with volcano plot analysis, OPLS-DA, and CDA. The concentrations of unsaturated fatty acids, choline, and PAF were higher in the foreign pork samples (obtained from ten nations) than in the Korean pork samples, whereas the concentrations of hexanoylcarnitine and N-Boc-L-valine were higher in the Korean domestic pork samples. Because FT-NIRS with chemometrics is faster and less laborious than LC-Orbitrap MS, it may be suitable for distinguishing the geographic origins of large numbers of pork samples. In contrast, LC-Orbitrap MS, despite its higher cost and need for technical expertise, provided a better understanding of the nutritional components that contribute to distinguishing Korean from non-Korean Boston butt. FT-NIRS and LC-Orbitrap MS yielded consistent results, indicating consistency in their origin judgments. We conclude that LC-Orbitrap MS and FT-NIRS can act as complementary strategies for determining the geographical origins of Boston butt. Future studies should create a pork butt database curating information on samples of various geographic origins, evaluate the consistency of the present findings, update the classification models to maintain their accuracy, and establish FT-NIRS and LC-Orbitrap MS for routine analysis to discriminate the geographic origins of pork butt.