Integrated proteomics and metabolomics analysis of transgenic and gene-stacked maize line seeds

ABSTRACT Unintended effects of genetically modified (GM) crops may pose safety issues. Omics techniques provide researchers with useful tools to assess such unintended effects. Proteomics and metabolomics analyses were performed for three GM maize varieties, 2A-7, CC-2, and 2A-7×CC-2 stacked transgenic maize, and the corresponding non-GM parent Zheng58. Proteomics revealed 120, 271 and 135 maize differentially expressed proteins (DEPs) in the 2A-7/Zheng58, CC-2/Zheng58 and 2A-7×CC-2/Zheng58 comparisons, respectively. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis showed that most DEPs participated in metabolic pathways and the biosynthesis of secondary metabolite. Metabolomics revealed 179, 135 and 131 differentially accumulated metabolites (DAMs) in the 2A-7/Zheng58, CC-2/Zheng58 and 2A-7×CC-2/Zheng58 comparisons, respectively. Based on KEGG enrichment analysis, most DAMs are involved in the biosynthesis of secondary metabolite and metabolic pathways. According to integrated proteomics and metabolomics analysis, the introduction of exogenous EPSPS did not affect the expression levels of six other enzymes or the abundance of seven metabolites involved in the shikimic acid pathway in CC-2 and 2A-7×CC-2 seeds. Six co-DEPs annotated by integrated proteomics and metabolomics pathway analysis were further analyzed by qRT-PCR. This study successfully employed integrated proteomic and metabolomic technology to assess unintended changes in maize varieties. The results suggest that GM and gene stacking do not cause significantly unintended effects.


Introduction
The commercialization of genetically modified (GM) crops began in 1996. 1 From 1996 to 2018, land planted with GM crops increased from 1.7 to 191.7 million hectares. 2 Stacked GM crops, which combine two or more traits, have multiple benefits and satisfy the need for planting diversity. The planting area of stacked GM crops accounts for approximately 40% of the global GM crop production area. 2 Although the commercialization of GM crops has many advantages, such as considerable economic benefits and reduced chemical pesticide pollution, more attention is being given to their food and environmental safety. 3,4 Genetic modification might lead to not only the insertion of exogenous DNA fragments but also the rearrangement or deletion of some endogenous genes, 5 therefore interfering with certain biochemical pathways and possibly producing new allergens or toxins. Thus, safety assessments of GM crops must be comprehensively carried out. [6][7][8][9] Different from previous directional detection methods, such as PCR and ELISA, the newly developed nondirectional detection methods, such as omics (e.g., genomics, transcriptomics, proteomics, metabolomics), developed in recent years enable carrying out comprehensive safety assessment and provide detailed insight into any unintended changes in the GM crops studied. 10,11 Although genomics and transcriptomics provide high coverage of gene sequence and expression data, which are distant from affecting the levels of nutrients, antinutrients and other factors that contribute to food and feed quality and safety, proteomics and metabolomics are closer to such endpoint phenotypes. 12 The use of isobaric tags with relative and absolute quantitation (iTRAQ)-based proteomics is a high-throughput method with high accuracy and repeatability that has been widely used in the safety evaluation of GM crops such as rice, potatoes and soybeans. [13][14][15][16][17][18] Our previous study also showed that this method can successfully assess unintended changes in GM maize. 19 In addition to proteins, primary metabolism profoundly influences crop growth, development and reproduction. 20,21 Metabolites such as carbohydrates, amino acids and organic acids accumulate in seeds, largely affecting crop quality traits. 22 Recent advances in metabolite profiling technology make it possible to comprehensively compare the differences in metabolites of crops affected by the growth environment, 12 genetic engineering and conventional cross-breeding. [23][24][25] In this study, proteomics and metabolomics were used to identify the unintended changes in seeds of the GM maize lines 2A-7 and CC-2 and the stacked line 2A-7×CC-2. Based on the proteomics and metabolomics data, the changes in protein expression and metabolites were evaluated between three GM maize lines, 2A-7, CC-2, and 2A-7×CC-2, and their non-GM parent, Zheng58.

Plant Materials
Seeds of the GM maize lines 2A-7 (carrying cry1Ab and cry2Ab genes), CC-2 (carrying maroACC gene) and the stacked line 2A-7×CC-2 (containing cry1Ab, cry2Ab and maroACC genes), as well as their non-GM parent, Zheng58, were studied. The GM maize line and non-GM parent seeds were provided by the State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University (CAU). All maize seeds collected were obtained from the same natural growth environment and stored at −80°C. Full and uniform maize seeds were selected as experimental materials.

DNA Extraction
Maize seed genomic DNA was extracted using the Quickly Genome DNA Extraction Kit (Tiangen, Beijing, China).

PCR-based Detection of Transgenic Maize
Event-specific PCR was used to detect specific events in GM maize lines. The sequences of the event-specific primers used and the sizes of the amplified DNA fragments are listed in Supplementary Table S1. PCR consisted of denaturing at 95°C for 1 min and 30 cycles of denaturing at 95°C for 15 s followed by annealing at 58°C and extension at 68°C for 30 s.

Protein Preparation
Three biological replicates of seeds of the different maize lines were used for protein profiling in this study. Maize seed grains of each line were ground in liquid N 2 . Total proteins were extracted with lysis buffer. The protein concentration was determined by the Bradford method. 26

Trypsin Digestion and iTRAQ Labeling
The extracted proteins (100 µg) were digested with 4 µg of trypsin overnight at 37°C. Protein reduction, blocking of cysteine residues, and digestion were performed according to the iTRAQ manufacturer's protocol (SCIEX, Framingham, MA, USA). The digested peptides were labeled with individual iTRAQ reagents, following the standard iTRAQ protocol for the 8-plex kit. The tags used were 113 Da for 2A-7, 114 Da for CC-2, 115 Da for 2A-7× CC-2, and 116 Da for Zheng58. The labeled samples were mixed in equal amounts and lyophilized.

LC and MS/MS (Liquid Chromatography and Tandem Mass Spectrometry) Analysis
The peptide mixture was redissolved in solution A (98% ddH 2 O, 2% acetonitrile and 0.1% formic acid) and then fractionated by a TripleTOF 5600 Plus instrument (SCIEX, Framingham, MA, USA). Then, 100 μg of the mixture was desalted and fractionated using a Durashell-C18 reversed-phase column. Next, solution B (98% acetonitrile, 2% ddH 2 O and 0.1% formic acid) was added. After separation, the fractions were resuspended in 20 μL of solution C (0.1% formic acid and 2% methanol in water), separated by an Eksigent nanoLC instrument (SCIEX, Framingham MA, USA) and analyzed by on-line electrospray tandem mass spectrometry.

Analysis of Proteomic Data
The original files generated by the Q-Exactive system were analyzed using ProteinPilotTM V4.5 (Thermo Fisher Scientific, Waltham, MA, USA), and protein identification was performed using the Mascot search engine (Matrix Science, London, UK; version 2.3.02) against the UniProt Zea mays (maize) database supplemented with 3 foreign proteins, namely, Cry2Ab, Cry1Ab and EPSPS (5-enolpyruvulshikimate-3-phosphate synthase enzyme, maroACC gene coding product). All the identified proteins were matched with at least one unique peptide at ≥ 95% confidence. 14,27 Proteins that had a fold change ≥ 2 or ≤ 0.5 and P value ≤ 0.05 15,28,29 among the comparison groups were considered DEPs. The principal components (PCs) analysis was performed by the statistics function prcomp within R (www.r-project.org). A heatmap of the identified proteins was generated by hierarchical clustering. Functional classification of the DEPs was performed by Gene Ontology (GO) annotation and enrichment using the Gene Ontology database (http://www.geneontology.org/ ). KEGG pathway annotation and enrichment of the DEPs were carried out using the KEGG database (http://www.genome.jp/kegg/).

Metabolite Preparation
Six biological replicates of seeds of the different maize lines were used for metabolite profiling in this study. Maize seed grains of each line were ground in liquid N 2 . The total metabolites were extracted with 70% aqueous methanol from 100 mg of seed powder overnight at 4°C. Following centrifugation at 10,000 g for 10 min, the extracts were absorbed by a CNWBOND Carbon-GCB SPE Cartridge (ANPEL, Shanghai, China) and filtered before UPLC (ultra performance liquid chromatography)-MS/MS analysis.

UPLC Conditions
The extracted metabolites were analyzed using a UPLC-ESI-MS/MS system (UPLC, Shim-pack UFLC SHIMADZU CBM30A system; MS, Applied Biosystems 4500 Q TRAP). The analytical conditions were as follows: UPLC column, Agilent SB-C18 (1.8 µm, 2.1 mm×100 mm). The mobile phase consisted of solvent A (98% ddH 2 O, 0.1% formic acid) and solvent B (acetonitrile). Sample measurements were performed with a gradient program that employed the starting conditions of 95% A and 5% B. Within 9 min, a linear gradient to 5% A and 95% B was programmed, and a composition of 5% A and 95% B was maintained for 1 min. Subsequently, a composition of 95% A and 5.0% B was adjusted within 1.10 min and kept for 2.9 min. The column oven was set to 40°C, and the injection volume was 4 μL. The effluent was alternatively connected to an electron spray ionization-triple quadrupole-linear ion trap (ESI-QTRAP)-MS.

ESI-QTRAP-MS/MS
The ESI source operation parameters were as follows: ion source, turbo spray; source temperature 550°C; ion spray voltage (IS), 5500 V (positive ion mode)/-4500 V (negative ion mode); the ion source gas I (GSI), gas II (GSII), and curtain gas (CUR) were set at 50, 60, and 30.0 psi, respectively; and the collision gas (CAD) was high. Instrument tuning and mass calibration were performed with 10 and 100 μmol/L polypropylene glycol solutions in QQQ (triple quadrupole) and LIT (linear ion trap) modes, respectively. QQQ scans were acquired as MRM (multiple reaction monitoring) experiments with the collision gas (nitrogen) set to 5 psi. The DP (declustering potential) and CE (collision energy) of individual MRM transitions were determined with further DP and CE optimization. A specific set of MRM transitions was monitored for each period according to the metabolites eluted within this period.

Metabolite Data Analysis
Before data analysis, quality control (QC) analysis was conducted to confirm the reliability of the data. A QC sample was prepared by the mixture of sample extracts and inserted into every two samples to monitor the changes over repeated analyses. Orthogonal partial least squares-discriminant analysis (OPLS-DA) was used to maximize the metabolome differences by removing the irrelevant differences between the GM and non-GM parent samples. On the basis of the OPLS-DA results, the metabolites of different maize samples were preliminarily screened from the variable importance in projection (VIP) values of the obtained multivariate OPLS-DA model. The P-value and fold change values of the univariate analysis were combined to further screen differential metabolites. Metabolites with VIP ≥ 1 and fold change ≥ 2 or fold change ≤ 0.5 were considered differential metabolites for group discrimination. 30 A heatmap based on the hierarchical cluster analysis method was generated in R software (www.r-project.org). KEGG pathway annotation of the DAMs was carried out using the KEGG database (http://www.genome.jp/kegg/ ). 31,32

qRT-PCR (Quantitative Real-time PCR)
A total of approximately 1.0 g of seeds ground in liquid N 2 per maize line was used for total RNA extraction with Ambion PureLink plant RNA reagent (Invitrogen, CA, USA). RNA integrity was analyzed by agarose gel electrophoresis and reverse transcribed with M-MLV reverse transcriptase (Invitrogen, CA, USA). The sequences of genespecific primers used for quantitative real-time PCR (qPCR) are listed in Supplementary Table S2. qRT-PCR was performed using the SYBR Green qRT-PCR Kit (Bio-Rad, CA, USA) with three biological replicates. All reactions were conducted on a CFX96 real-time PCR system (Bio-Rad). The PCR conditions consisted of denaturing at 95°C for 1 min and 40 cycles of denaturing at 95°C for 15 s, followed by annealing and extension at 60°C for 30 s. The qRT-PCR data were analyzed using the 2 −ΔΔCT relative quantification method. 33 Expression of GAPDH (glyceraldehyde-3-phosphate dehydrogenase gene) was measured as an internal control.

ELISA (Enzyme-Linked Immunosorbent Assay)
Maize seeds were ground with liquid N 2 . Proteins were extracted with lysis buffer. The contents of Cry1Ab, Cry2Ab and EPSPS were assessed using ELISA kits (Agdia, IN, USA).

Confirmation of GM Maize Lines
Seeds were used to study the proteomic and metabolomic differences between 3 GM maize lines (2A-7, CC-2 and 2A-7×CC-2) and their non-GM parent, Zheng58. Event-specific PCR was carried out for the identification of transformants. The target DNA fragment was only obtained from GM maize lines (Supplementary Figure S1) and then sequenced for further verification.

Protein Profiling of Maize Seeds
In total, 3600, 3508 and 3419 proteins were identified from three independent iTRAQ replicates, for a total of 6874 proteins (Supplementary Table S3). Among the identified proteins, 2569 simultaneously appeared in three replicates. The principal components (PCs) analysis obtained PC1(56.04%) and PC2(23.97%) two main components. Under the influence of the inter-batch effect, the repeated samples did not cluster together well, but the samples in the batch tended to gather (Fig. 1a). Cluster analysis of the identified proteins showed that the protein expression patterns of 2A-7 and 2A-7×CC-2 shared the highest similarity among the 4 studied maize lines. The protein expression patterns of 2A-7×CC-2 and Zheng58 shared higher similarity than those of 2A-7 and CC-2 (Fig. 1b).

Identification of Maize DEPs and co-DEPs in Seeds
Proteins with a fold change greater than 2.0-fold or lower than 0.5-fold (P value ≤ 0.05) were identified as DEPs. The numbers of DEPs in the different comparison groups are summarized in Table 1. There were 120 maize DEPs identified in the 2A-7/Zheng58 comparison group, including 57 upregulated proteins and 63 downregulated proteins (Supplementary Table S4). Two hundred seventy-one maize DEPs were identified by comparison of CC-2 with Zheng58, 48 of which were upregulated and 223 of which were downregulated (Supplementary Table S5). There were 135 maize DEPs identified in the 2A-7×CC-2/ Zheng58 comparison group, including 58 upregulated proteins and 77 downregulated proteins (Supplementary Table S6). Among these DEPs, 29 were simultaneously identified in the three comparison groups and named co-DEPs (Fig. 1c). Except for co-DEP Q0QW12, the regulatory trend of the other 28 co-DEPs was consistent among the three comparison groups (Fig. 1d).

Analysis of the Maize DEPs Identified
Gene Ontology (GO) annotation and enrichment were performed using the Gene Ontology database to reveal the functions of the identified DEPs, including molecular functions, biological processes, and cellular components. The DEPs of 2A-7/ Zheng58 were classified into 34 functional groups  (Supplementary Figure S2A). The DEPs in the biological processes category are mainly involved in the response to temperature stimulus and defense response (Supplementary Figure S2B). The CC-2/ Zheng58 DEPs were annotated into 43 functional groups (Supplementary Figure S3A). In the biological processes category, the DEPs participate in the response to stress and oxidation reduction (Supplementary Figure S3B). The DEPs for 2A-7×CC-2/Zheng58 were sorted into 35 functional groups (Supplementary Figure S4A), and the biological processes category of the DEPs are mainly involved in oxidation reduction and homeostatic processes (Supplementary Figure S4B). KEGG pathway annotation and enrichment analysis of the identified DEPs was carried out with the KEGG pathway database. The DEPs identified in the 2A-7/Zheng58 comparison group are mainly involved in metabolic pathways (ko01100) and biosynthesis of secondary metabolites (ko01110), followed by starch and sucrose metabolism (ko00500) and protein processing in the endoplasmic reticulum (ko04141) (Fig. 2a). The DEPs obtained from CC-2/Zheng58 and 2A-7×CC-2/Zheng58 comparisons mainly participate in metabolic pathways (ko01100) and biosynthesis of secondary metabolites (ko01110), followed by microbial metabolism in diverse environments (ko01120) (Fig. 2b and 2c).

Metabolite Profiling of Maize Seeds
There were 616 metabolites successfully detected in maize seeds by widely targeted metabolomics (Supplementary Table S7). The metabolites detected are diverse, and they were classified into 32 classes, including amino acids and derivatives, phenolic acids, organic acids, nucleotides and derivatives, sugar alcohols, etc. Cluster analysis of the identified metabolites showed that the metabolite profiles of 2A-7×CC-2 and CC-2 share the highest similarity among the 4 studied maize lines. The metabolite profiles of 2A-7×CC-2 and 2A-7 share higher similarity than those of 2A-7 and Zheng58 (Fig. 3a).

Identification of DAMs and co-DAMs in Maize Seeds
Metabolites with changes in abundance greater than 2.0-fold or lower than 0.5-fold (VIP ≥ 1) were identified as DAMs. The numbers of DAMs in the different comparison groups are summarized in Table 2. There were 179 DAMs identified in the 2A-7/Zheng58 comparison group, including 164 increased and 15 decreased metabolites (Supplementary Table S8). In total, 135 DAMs were identified by comparison of CC-2 with Zheng58, 109 of which were increased and 26 decreased (Supplementary Table S9). There were 131 DAMs identified in the 2A-7×CC-2/Zheng58 comparison group, including 106 increased and 25 decreased metabolites (Supplementary Table S10). Among these DAMs, 79 were identified in three comparison groups at the same time and named co-DAMs (Fig. 3b). The regulatory trend of these co-DAMs is illustrated in Fig. 3c.

Analysis of the Identified DAMs
KEGG pathway enrichment analysis of the identified DAMs was carried out with the KEGG pathway database. The DAMs identified in the 2A-7/ Zheng58 comparison group are mainly involved in the biosynthesis of secondary metabolites (ko01110), followed by flavonoid biosynthesis (ko00941) and arginine and proline metabolism (ko00330) (Fig. 4a). The DAMs obtained from the CC-2/Zheng58 comparison mainly participate in metabolic pathways (ko01100) and biosynthesis of secondary metabolites (ko01110), followed by pyrimidine metabolism (ko00240) (Fig. 4b). The DAMs detected from the 2A-7×CC-2/Zheng58 comparison group is primarily associated with metabolic pathways (ko01100) and biosynthesis of secondary metabolites (ko01110), followed by flavonoid biosynthesis (ko00941) (Fig. 4c).  (Table 3). Further data analysis showed involvement in KEGG pathways for 6 co-DEPs and 30 co-DAMs (Tables 4 and Tables 5). The introduction of exogenous EPSPS did not affect the expression levels of six other enzymes, DAHP synthase, 3-dehydroquinate synthase, 3-dehydroquinate dehydratase, shikimate dehydrogenase, shikimate kinase and chorismate synthase, or cause the abundance of seven metabolites, 3-dehydroquinate, 3-dehydroshikimate, shikimate, shikimate-    Figure  S5 and S6).

qRT-PCR Analysis of the co-DEPs Involved in the KEGG Pathway
Six co-DEPs involved in the KEGG pathways, aldose reductase, sorbitol dehydrogenase, peroxidase, phenylalanine ammonia-lyase, glutathione reductase and cell wall invertase, were selected for qRT-PCR analysis to detect their transcriptional levels. Aldose reductase was downregulated in the seeds of GM maize line CC-2 but not significantly changed in the seeds of the other two GM maize lines 2A-7 and 2A-7×CC-2. The other five co-DEPs were downregulated in the seeds of all three GM maize lines (Fig. 5).

Exogenous Protein Detection by iTRAQ and ELISA in Seeds of GM Maize Lines
Three exogenous proteins, Cry1Ab, Cry2Ab and EPSPS, were detected as DEPs. ELISA data showed that the content of Cry1Ab was 1.81 µg/g and 2.02 µg/g and the content of Cry2A was 0.31 µg/g and 0.31 µg/g in GM maize seeds 2A-7 and 2A-7×CC-2, respectively. The content of EPSPS was 20.00 µg/g and 28.70 µg/g in GM maize seeds CC-2 and 2A-7× CC-2, respectively (Table 6).

Discussion
Safety assessment of GM crops is performed to demonstrate the substantive equivalence of the GM crops and their non-GM parents. In the past, a large number of safety evaluations based on oimcs analysis of GM crops have either applied nonconsumpted tissues [34][35][36][37] or focused more on the effects of environmental factors. [38][39][40][41] Maize seeds, as edible plant parts for food and feed with abundant protein and metabolite types and quantities, must undergo safety evaluation. Integrated proteomics and metabolomics analyses can identify proteins and metabolites involved in the same pathways, which can better reveal the biological processes of crops after genetic modification. In this study, quantitative proteomics and widely targeted metabolomics analyses were performed for three GM maize lines, 2A-7 and CC-2, and the stacked transgenic line 2A-7×CC-2, and their corresponding non-GM parent, Zheng58, grown in the same environment. There were 120, 271 and 135 maize DEPs detected in the 2A-7/Zheng58, CC-2/ Zheng58 and 2A-7×CC-2/Zheng58 comparisons, respectively. In total, 179, 135 and 131 DAMs were    42,43 In these GM maize lines, the bt genes cry1Ab and cry2Ab are exogenous to plants, and the insecticidal proteins Cry1Ab and Cry2Ab are not involved in known metabolic activity in plants. Herbicide tolerance is conferred by introducing the maroACC gene, which encodes the EPSPS enzyme related to the shikimate pathway. The introduction of the herbicide resistance gene maroACC resulted in an increase in the number of DEPs in CC-2 and 2A-7×CC-2 maize seeds. The maroACC gene encodes EPSPS, a 3-phosphoshikimate 1-carboxyvinyltransferase and the key enzyme in the shikimic acid pathway, which is a metabolic pathway for the biosynthesis of aromatic amino acids in microorganisms and plants. [44][45][46] Nevertheless, integrated proteomics and metabolomics analyses showed that the introduction of exogenous EPSPS did not affect the expression levels of six other enzymes or the abundance of seven metabolites involved in the shikimic acid pathway in seeds of the GM maize lines CC-2 and 2A-7×CC-2 (Supplementary Figure S5 and S6). Insertion of the maroACC gene may interfere with other biological pathways than the shikimic acid pathway.
Integrated proteomics and metabolomics analyses identified 6 co-DEPs annotated in the KEGG pathways, aldose reductase, sorbitol dehydrogenase, peroxidase, phenylalanine ammonia-lyase, glutathione reductase and cell wall invertase, which were selected for qRT-PCR to assess gene expression. Aldose reductase showed similar downregulated patterns at the protein and transcription levels in the CC-2/Zheng58 comparison group but showed no change at the transcription level in the other two comparison groups. The other five co-DEPs showed opposite regulation patterns at the translation and transcription levels. These results are consistent with previously reported results for GM crops, 28,47 which might be due to the expression levels of proteins depending on the synergy of multiple biological processes, such as transcription, posttranscriptional modification, translation and posttranslational modification. Several endoplasmic reticulum-associated proteins, such as heat shock proteins Hsp40, Hsp70, sHSP and protein disulfide-isomerase (PDIs), were identified as DEPs. These proteins work as molecular chaperones or catalyze the rearrangement of -S-S-bonds in proteins to prevent protein denaturation, restore protein conformation and biological activity, and degrade misfolded proteins, which might be the mechanism underlying the self-protection stimulated by the expression of exogenous genes in GM maize seeds.

Conclusions
Integrated iTRAQ quantitative proteomics and widely targeted metabolomics were used to evaluate changes in the protein and metabolite profiles of maize seeds caused by both transgenic modifications and gene stacking. Pathway enrichment showed that these DEPs and DAMs mainly participate in metabolic pathways and biosynthesis of secondary metabolites. Except for EPSPS, shikimate pathway-related proteins and metabolites were not identified as DEPs and DAMs, respectively. Gene stacking did not affect protein and metabolite profiles more significantly.

Conflicts Of Interest
The authors declare that there are no conflicts of interest. Error bars represent the standard deviation (SD) among the three replicates. The asterisks represent significant differences compared with A3525, as indicated by the t-test (* p < .05, ** p < .01 and *** p < .001).