Assessment of the potential allergenicity of genetically-engineered food crops.

Abstract An extensive safety assessment process exists for genetically-engineered (GE) crops. The assessment includes an evaluation of the introduced protein as well as the crop containing the protein with the goal of demonstrating the GE crop is “as-safe-as” non-GE crops in the food supply. One of the evaluations for GE crops is to assess the expressed protein for allergenic potential. Currently, no single factor is recognized as a predictor for protein allergenicity. Therefore, a weight-of-the-evidence approach, which accounts for a variety of factors and approaches for an overall assessment of allergenic potential, is conducted. This assessment includes an evaluation of the history of exposure and safety of the gene(s) source; protein structure (e.g. amino acid sequence identity to human allergens); stability of the protein to pepsin digestion in vitro; heat stability of the protein; glycosylation status; and when appropriate, specific IgE binding studies with sera from relevant clinically allergic subjects. Since GE crops were first commercialized over 20 years ago, there is no proof that the introduced novel protein(s) in any commercialized GE food crop has caused food allergy.


Introduction
Genetically-engineered (GE) crops are developed using modern biotechnology techniques where precise methods are used to introduce the desirable traits into a plant. The intended change in a new GE crop is the desired phenotype brought about by the introduced transgene. In contrast, with traditional plant breeding, genes from two parents are mixed in many different combinations, often bringing undesirable traits (Ladics et al. 2015). The most widely grown GE crops contain genes for targeted insect protection, herbicide tolerance, or both. Plant expression of Bacillus thuringiensis (Bt) crystal (Cry) insecticidal proteins has been the primary way to impart insect resistance in GE crops. Bt is a common bacterium present in soils, on grains, and in environmental habitats including water (Martin and Travers 1989). The crystal proteins that confer insecticidal properties to Bt sprays are very specific to a subset of immature insects and have been widely used in GE crops to confer insect protection. Bt biopesticides have been adopted for use in commercial agriculture, forestry, and mosquito control (OECD 2007). As of 2011, more than 100 microbial Bt products have been registered to provide effective control of insect pests (USEPA 2011). To date, no harmful or adverse effects have been demonstrated after occupational exposure to Bt products, and no adverse effects have been reported in the consumer population exposed to these products in the form of spray residues on conventional or organic crops (WHO 1999).
Before GE crops are commercialized, they are evaluated for their overall safety from an agronomic, environmental, performance, and equivalence perspective, and for the safety of the newly expressed or novel protein (Delaney et al. 2018). One of the evaluations for GE crops is to assess the expressed protein for allergenic potential, as there is some concern that introduction of a novel protein into the food supply could increase the risk of food allergy in susceptible individuals. The process by which allergy assessment has been conducted since the 1990s has involved guidance from several expert scientific bodies, including the US FDA (1992), FAO/WHO (1996, 2000, 2001), and Codex (2003, 2009). As will be discussed in further detail in this review, several endpoints have been proposed over the years in various decision-tree and weight-of-evidence approaches. At the core of these recommended approaches, several endpoints have been consistently utilized and include in silico (bioinformatics) and in vitro (protein digestion) approaches. Others, such as animal models and targeted sera screening, were identified as needing further development and validation before being implemented. To date, the state-of-the-science and validation regarding animal models and targeted sera screening have not progressed to the stage where such endpoints are included in regulatory guidance documents for predicting protein allergenicity potential. Over the last five years, however, one area that has received much attention is in silico bioinformatics analysis. Bioinformatics research has led to advances in predicting the allergenicity of novel proteins, as is described in further detail below. The safety assessment of GE crops is based on the concept of substantial equivalence. It is centered on the principle that existing non-GE products used as foods can serve as the basis for comparison focusing on composition, toxicology, allergenicity, and nutritional data.
Foods from GE crops undergo many scientific studies to demonstrate that they are substantially equivalent to those from non-GE crops. Proteins expressed by GE crops also undergo an extensive assessment to demonstrate that they are not allergenic or toxic. Before marketing GE crops, such products are required to undergo an evaluation of the potential allergenic and toxicological [outside the scope of this review; for review please see Delaney (2015)] activity of the protein(s) that are produced from the introduced genes. The objective of the allergenic potential evaluation of proteins is twofold: (1) protect allergic consumers from exposure to known allergenic or cross-reactive proteins that may trigger an adverse reaction in those already allergic to such proteins, and (2) protect individuals from risks of allergic sensitization associated with the introduction of genes encoding proteins that may become food allergens.
The potential allergy risk to consumers from GE crops can be placed into one of three categories (Figure 1). The first category that represents a potential risk to the allergic consumer is the transfer of a known allergen or cross-reacting allergen into a food crop. The second potential risk category involves the expression of novel proteins that may become allergens de novo (i.e. a new allergen). The last, and least likely, category of concern is the potential for enhancing the allergenicity of a GE crop (e.g. soybean) by increasing the expression of endogenous allergens. Over the last 22 years, several guidance documents have been written to provide recommendations for assessing the potential allergenicity of transgenic proteins (Metcalfe et al. 1996; FAO/WHO 2001; Codex 2003, 2009). Currently, there is no single, definitive test for determining the allergenic potential of novel proteins. As a result, a "weight-of-the-evidence" (WOE) approach has been recommended by Codex (Codex 2003; Ladics 2008) (Figure 2). Currently, most global agencies responsible for the regulation of GE crops use the WOE approach. The recommended evaluations include consideration of the source of the introduced protein (i.e. whether the gene source for the new protein is known to induce allergy), similarity of the introduced protein to known allergens (in silico amino acid sequence identity comparisons to known human allergens), physicochemical properties (e.g. susceptibility to acid and enzymatic digestion in vitro, heat stability, and glycosylation status), and protein abundance in the crop. The abundance of the introduced protein, however, is currently given little weight by global regulators regarding protein allergenicity, even though such proteins introduced into agricultural biotechnology products are currently expressed at very low levels (i.e. in the ppm to ppb range). When appropriate (i.e. a positive amino acid sequence match to a known allergen is observed or the transgenic gene is derived from a known allergenic source), specific IgE binding studies are also considered. These studies require the use of well-characterized sera from individuals known to be clinically allergic to the identified source. Codex also recognized that certain methods previously recommended (e.g. animal models) were not validated but may prove useful in the future in assessing the allergenic potential of transgenic proteins "as scientific knowledge and technology evolves" (Codex 2003, 2009).

Evaluating potential allergenicity risk
To assess the first category of risk to the allergic consumer, the potential transfer of a known allergen or cross-reacting allergen into a food crop, the history of safe use (HOSU) of the gene source and recipient is examined and in silico (i.e. bioinformatics) procedures are utilized. The second potential risk category, involving the expression of novel proteins that may become allergens de novo, is assessed by evaluating the physical/chemical properties (e.g. resistance of the protein to pepsin digestion in vitro; heat stability of the protein) and abundance of the protein in the crop. The last category of risk, the potential for enhancing the allergenicity of a GE crop (e.g. soybean) versus its non-GE counterpart by increasing the expression of endogenous allergenic proteins, is evaluated using several different analytical tools (e.g. specific IgE binding studies, ELISA, or mass spectrometry) (Figure 3).
Evaluating the potential transfer of a known allergen or cross-reacting allergen into a food crop
History of safe use (HOSU): Evaluation of the gene source
The scope of the HOSU evaluation of the gene source and the gene recipient includes determining whether the gene source(s) for the GE crop is a common cause of allergy. If this is the case, additional tests are likely to be required. For example, peanuts and certain tree nuts (almond, hazelnut, walnut, and pecan) are considered common causes of food allergy, while birch or grass pollen and house dust mite are causes of respiratory allergy, and latex is a cause of contact allergy. If a gene is transferred from one of the commonly allergenic sources, specific serum IgE binding studies will be required which utilize sera from subjects allergic to the identified allergenic source of the gene. It is highly unlikely, however, that genes obtained from one of the commonly allergenic sources would ever be incorporated into agricultural biotechnology crops and commercialized.
Figure 1. Categories of potential health risks relative to protein allergenicity and agricultural biotechnology.
[Figure: elements of the weight-of-the-evidence approach]
• Source of the gene
• Amino acid sequence identity comparison with allergens
• In vitro digestive fate study
• Stability to heat and processing
• Glycosylation analysis
• Evaluation of the alteration of endogenous allergen levels (case-by-case basis)
• Allergen specific IgE sera screening (case-by-case basis)

In silico bioinformatic tools
Bioinformatics is the comparative analysis of protein sequences intended to evaluate structural and functional relationships. Bioinformatics has several core principles: (1) protein structure is determined by amino acid sequence; (2) similar amino acid sequences have similar structure; and (3) similar sequence and structure imply a common ancestor gene and related function.
Most of the major food, dermal, and respiratory allergens have been identified and cloned. Subsequently, the protein sequences for these allergens have been incorporated into several publicly available databases. Such databases differ in their content, level of descriptive information, data (biological or molecular data, update date, number of sequences), the degree of curation, and the presence of informatics applications for comparing the query novel sequence to public annotated sequences (Gendel and Jenkins 2006). The AllergenOnline database (Allergenonline.org) is a peer-reviewed database containing food, inhalation, dermal, and injection (e.g. venom, saliva of biting insects) allergens that has been maintained at the University of Nebraska, Lincoln since 2004 (Goodman et al. 2016). The database is peer-reviewed by clinical and research allergists from around the world and updated once per year. The inclusion of protein allergens is based on available data in the public literature. The sequences of the proteins with published proof of IgE binding using sera from clinically allergic subjects have been included in the AllergenOnline database. Another peer-reviewed allergen database is the Comprehensive Protein Allergen Resource (COMPARE) allergen database, sponsored by the Health and Environmental Sciences Institute (HESI) and initiated in 2017 (comparedatabase.org).
As a result, novel proteins can be routinely screened for amino acid sequence similarity to known human allergens using bioinformatic tools early in the product development pipeline. Amino acid sequences sharing a high degree of identity often share immunologic similarity (Aalberse 2000; Goodman et al. 2008b). Knowledge of the protein structures responsible for inducing sensitization, however, is still lacking even for known allergens (Ladics et al. 2014a; McClain et al. 2014; Poulsen et al. 2014). Current bioinformatic analyses involve two recommended criteria: a search for continuous, identical stretches of eight or more amino acids, and an identity search using the FASTA (Pearson and Lipman 1988) or BLAST (Altschul et al. 1997) local alignment algorithms to search amino acid sequences of known allergens contained in databases (e.g. the AllergenOnline database) for alignments of 80 amino acids or greater possessing a sequence identity ≥35% (FAO/WHO 2001; Codex 2003; Cressman and Ladics 2009; Mirsky et al. 2013). The criterion of a ≥35% identity match over 80 or more amino acids was established based on data indicating protein cross-reactivity occurring between Bet v 1 and vegetable proteins at approximately 40% protein identity (Scheurer et al. 1999).

Bioinformatic criteria for IgE-mediated allergy
As recommended by FAO/WHO (2001), IgE cross-reactivity between a novel protein and a known allergen is considered a possibility when there is ≥35% identity over a segment of 80 or greater amino acids. However, for cross-reactivity to occur Aalberse (2000) has reported that a high degree of similarity is needed (i.e. >50-60%) over significant spans of the novel protein and allergen. Goodman et al. (2008) further noted that the risk of cross-reactivity exists among proteins with >70% identity. Radauer and Breiteneder (2006) reviewed sequence identities among allergenic and non-allergenic homologs of pollen allergens and reported that the prerequisite for allergenic cross-reactivity between proteins was a sequence identity of at least 50% across the length of the protein.
Eight contiguous identical amino acid matches between a novel protein and a known allergen(s) are required by some regulators to identify sequences that may represent "theoretical" linear IgE binding epitopes. The original FAO/WHO (2001) recommended criterion, specifying a continuous identity of ≥6 amino acids, has been discredited as producing too many false positives (Hileman et al. 2002; Stadler and Stadler 2003; Thomas et al. 2005; Ladics et al. 2006; Silvanovich et al. 2006; Goodman 2008). Furthermore, various other publications (Goodman 2008; Herman et al. 2009; EFSA 2011; Goodman and Tetteh 2011; Ladics et al. 2011; Harper et al. 2012; Young et al. 2012; Mirsky et al. 2013) have indicated that the standard search for a sequence of eight or more contiguous identical amino acids between the query protein and a known allergen also has minimal value in predicting potential protein allergenicity.
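The contiguous-match search described above can be sketched in a few lines. This is a minimal illustration only, assuming plain one-letter amino acid strings; the function name `shared_kmers` and both toy sequences are invented for this example and do not come from any allergen database or regulatory tool.

```python
def shared_kmers(query: str, allergen: str, k: int = 8) -> set:
    """Return every length-k peptide that occurs in both sequences."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Index all k-mers of the allergen once, then test each query k-mer.
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return {
        query[i:i + k]
        for i in range(len(query) - k + 1)
        if query[i:i + k] in allergen_kmers
    }


# Toy sequences invented for illustration (not real proteins).
query = "MKTLLVAGHWQERTYPLK"
allergen = "AAGHWQERTYPGG"

hits_8 = shared_kmers(query, allergen, k=8)  # eight-residue criterion
hits_6 = shared_kmers(query, allergen, k=6)  # original six-residue criterion
```

Lowering k from 8 to 6 can only keep or increase the number of shared peptides for a given pair of sequences, which is consistent with the six-residue criterion producing more matches, and hence more false positives, than the eight-residue one.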
The second in silico criterion employs the FASTA local alignment algorithm (Pearson and Lipman 1988) to identify regions between a query sequence and an allergen sequence displaying identity ≥35% over an 80 or greater amino acid "window" (Codex 2003, 2009; EFSA 2011). The purpose is to identify proteins with sufficient similarity to known human allergens to infer potential cross-reactivity. The resulting scores generated from comparisons between the query and all dataset proteins are then used to establish a linear relationship between alignment score and protein length. The score of an alignment with respect to the calculated distribution of all scores gives an indication of whether the alignment is meaningful; this is reflected in the expectation value (E value, a measure of the potential random occurrence of aligned sequences used to evaluate the significance of an observed alignment) assigned to the alignment. A small E value (e.g. 10⁻⁷) indicates a potential biologically relevant similarity in the context of potential allergenic cross-reactivity; large E values (e.g. >1.0) represent random alignments that do not possess biologically relevant similarity (Pearson 2000; Silvanovich et al. 2009; Mirsky et al. 2013). The smaller an E value for an alignment, the more likely the comparison between the two proteins reflects a true structural similarity, although the E value can be influenced by protein length and database size (Baxevanis and Ouellette 1998).
When FAO/WHO published recommendations in 2001 for the determination of potential cross-reactivity to known allergens, they included a "sliding window" approach for performing the searches. The procedure includes a step to "prepare a complete set of 80-amino acid length sequences derived from the expressed protein" (FAO/WHO 2001) followed by subsequent comparison of each 80-amino acid sequence to an allergen database. The potential for cross-reactivity was to be considered when there was "more than 35% identity in the amino acid sequence of the expressed protein using a window of 80 amino acids and a suitable gap penalty" (FAO/WHO 2001). Above-threshold matches (≥35% over an 80-amino acid window) were automatically identified for closer scrutiny. The positive predictive value of the ≥35% identity over an 80-amino acid sequence using a "sliding window" algorithm (FAO/WHO 2001) was assessed (Cressman and Ladics 2009). False positive and negative rates were evaluated using the "sliding window" of 80 amino acids versus conventional (i.e. full-length sequence) FASTA analyses. Data indicated that a conventional FASTA analysis across the whole protein sequence produced fewer false positives and equivalent false negative rates compared to the 80-amino acid "sliding window" FASTA search. Silvanovich et al. (2009) further noted that using the current sliding FASTA window criterion of a ≥35% identity match over 80 amino acids, the same rate of false positives could be achieved using either randomly selected protein sequences or the same sequences after being subjected to 1000 rounds of sequence shuffling. In many cases, the sliding window analysis resulted in identity matches to a variety of proteins from different families with diverse functions, again supporting the large degree of false positives observed with such analysis.
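The window-preparation step quoted from FAO/WHO (2001) can be sketched as follows. This is only an illustration of how the set of 80-amino acid sequences is derived; the function name, the step size of one residue, and the handling of proteins shorter than the window (returned whole) are assumptions, and the subsequent FASTA comparison of each window against an allergen database is not reproduced here.

```python
from typing import Iterator, Tuple


def sliding_windows(protein: str, width: int = 80) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, window) for every full-width window, step 1.

    Assumption: a protein no longer than the window is yielded whole.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if len(protein) <= width:
        yield 0, protein
        return
    for start in range(len(protein) - width + 1):
        yield start, protein[start:start + width]


# Toy 300-residue sequence invented for illustration (not a real protein).
toy_protein = "ACDEFGHIKLMNPQRSTVWY" * 15
windows = list(sliding_windows(toy_protein))
```

A protein of length n gives n - 80 + 1 overlapping windows (221 for the 300-residue toy sequence), so a single query generates many separate database comparisons. This multiplication of comparisons is one plausible reason the sliding window search returns more chance matches than a single full-length alignment.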
Nevertheless, the positive results obtained with the conventional FASTA analysis still exceeded what would be predicted based on the expected percentage of real or true allergens in the clinic. This finding is likely due to the use of the currently recommended, very conservative, threshold criterion of 35%. When Ladics et al. (2007) raised the threshold to 50% when evaluating corn seed protein sequences, the number of positive findings decreased by half using the conventional FASTA analysis. By imposing a defined threshold of ≥35% sequence identity over an alignment length ≥80 amino acids, the default local alignment search criteria are constrained (Silvanovich et al. 2006; Ladics et al. 2007; Cressman and Ladics 2009). This constraint neglects many of the FASTA features that help to define relevant homologies between sequences, features incorporated into the algorithms themselves (e.g. E value). To date, global regulators do not take into account the E value and thus the significance of an observed alignment. As a result, if an alignment with 35.5% identity is observed between two protein sequences with a very large E value (e.g. 2.0), the alignment would still likely be considered significant by regulators.
Rather than using a specific percent identity cutoff value, an evaluation using an E value threshold has been proposed. For instance, Silvanovich et al. (2009) applied E value threshold methodology to 7695 corn protein sequences and determined a FASTA E value cutoff of 4.7 × 10⁻⁷. This cutoff was 100% effective at identifying known allergens, but was sufficiently conservative as to have a 95% false positive rate such that no potential positive matches would be overlooked. Mirsky et al. (2013) conducted a comprehensive large-scale in silico evaluation of the various bioinformatic assessment criteria, including searches for: (1) alignments between a query protein and an allergen having ≥35% amino acid identity over a length ≥80 amino acids; (2) any identical sequence (of some minimum length) found in both a query protein and an allergen; and (3) any alignment between a query protein and an allergen with an E value below some threshold. The most effective criterion reported by Mirsky et al. (2013) flags a query protein as potentially allergenic if there is either (a) an alignment between it and an allergen having ≥35% amino acid sequence identity over an alignment length ≥80 or (b) some identical sequence of 13 or more amino acids found both in it and an allergen, and (c) there is an alignment between it and an allergen with an E value below a threshold of 10⁻⁴. These data suggest that a combination of amino acid alignments and E values should be employed when evaluating the potential allergenic cross-reactivity between two proteins.
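The combined criterion attributed to Mirsky et al. (2013) has the logical form "(a or b) and c", which the sketch below makes explicit. The class and function names are hypothetical, the input values are assumed to come from an alignment tool that is not shown, and the E value test is assumed here to be a strict "below 10⁻⁴" comparison.

```python
from dataclasses import dataclass


@dataclass
class AlignmentSummary:
    """Summary of one query-versus-allergen comparison (hypothetical type)."""
    percent_identity: float      # identity over the aligned region, in percent
    alignment_length: int        # aligned length in amino acids
    longest_identical_run: int   # longest contiguous identical stretch
    e_value: float               # expectation value reported for the alignment


def identity_only_flag(hit: AlignmentSummary) -> bool:
    """Percent identity criterion alone: >=35% identity over >=80 residues."""
    return hit.percent_identity >= 35.0 and hit.alignment_length >= 80


def combined_flag(hit: AlignmentSummary, e_threshold: float = 1e-4) -> bool:
    """(a or b) and c; the strict '<' on the E value is an assumption."""
    a = identity_only_flag(hit)
    b = hit.longest_identical_run >= 13
    c = hit.e_value < e_threshold
    return (a or b) and c


# Illustrative values: 35.5% identity over 80 residues with an E value of 2.0.
weak_hit = AlignmentSummary(35.5, 80, 5, 2.0)
# Invented values for a comparison that satisfies both (a) and (c).
strong_hit = AlignmentSummary(62.0, 140, 15, 1e-30)
```

With these inputs, the percent identity rule alone flags `weak_hit`, while the combined rule does not because the E value of 2.0 indicates a chance alignment; `strong_hit` is flagged by both.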
Allergens are found in only a small subset of all known protein families (Radauer and Breiteneder 2006;Radauer et al. 2008). Furthermore, protein families that contain allergens also include numerous non-allergenic proteins. There are currently no known unique motifs that identify a protein as an allergen; however, further understanding of structural attributes and conformational epitopes may prove valuable for assessing allergenic potential. Although conformational epitopes are common in inhaled allergens, food allergens may contain them as well if the allergen is not completely cleaved in the digestive tract and digestion-resistant fragments are absorbed. New approaches based on the protein conformational structure when proven predictive may be useful for refining the WOE approach in the future. A structural database of allergenic proteins (SDAP) is available online (http://fermi.utmb.edu/SDAP/). SDAP is a web server that provides rapid, cross-referenced access to the sequences, structures and IgE epitopes of allergenic proteins (Oezguen et al. 2008;Ivanciuc, Garcia, et al. 2009;Ivanciuc, Midoro-Horiuti et al. 2009).
The SDAP core is a series of scripts that process user queries, interrogate the database, perform various computations related to protein allergenic determinants and prepare the output internet pages. The database component of SDAP contains information about the allergen name, source, sequence, structure, IgE epitopes and literature references, as well as easy links to the major protein databases (PDB, SWISS PROT/TrEMBL, PIR-ALN, NCBI Taxonomy Browser) and relevant literature. The computational component in SDAP uses an algorithm based on conserved properties of amino acid side chains to identify regions of known allergens. Such bioinformatic tools may have the potential to rapidly determine a potential cross-reactivity between proteins and to screen novel proteins for the presence of IgE epitopes they may share with known allergens. Nonetheless, major limitations to these methods suggest they are not yet predictive. Particularly, there are relatively few allergens for which the crystal structure is known. There are ≈715 allergens in the official database for the systematic allergen nomenclature that is approved by WHO and the International Union of Immunological Societies (WHO/IUIS) Allergen Nomenclature Subcommittee (www.allergen.org; WHO/IUIS). However, the 3D structure has been solved for only ~75 allergens, i.e. ~10% of the allergens in the database, of which only ~24 are food allergens. Further, the structure of a protein may be more related to the intrinsic function of the protein and less to the form in which the protein may sensitize a person or elicit an allergenic response. Therefore, additional data are required to assess the utility of protein structure in predicting protein cross-reactivity.
In summary, bioinformatics techniques based on linear sequence comparisons could be improved by using additional tools, such as E value thresholds. Further, to increase the power of the bioinformatics analyses, the determination of the degree of identity between proteins and known allergens by using a structural database and appropriate comparison scripts may prove useful in the future.

Bioinformatic criteria for celiac disease
Codex (2003) does require an evaluation for proteins derived from wheat or wheat relatives (e.g. barley, rye, and possibly oats) regarding their potential to induce celiac disease (CD); however, Codex did not provide guidance on how to conduct the evaluation. Several proteins have been identified in the scientific literature as potential CD-inducing proteins (Stepniak et al. 2005, 2008; Camarca et al. 2009; Dørum et al. 2010; Mitea et al. 2010). A database of peptides from wheat, barley, and rye that cause T-cell stimulation and intestinal epithelial pathology (www.allergenonline.org/celiachome.shtml) has been developed. The database is part of the www.AllergenOnline.org database. Currently, the CD database includes 1013 peptides with published evidence of T-cell reactivity using cells from CD patients in the context of MHC Class II DQ2.5 or DQ8, or toxic effects in intestinal epithelial cells, or pathology in intestinal villi from those with CD. AllergenOnline suggests that novel proteins be searched for exact matches to the peptides in its database. In addition, proteins may be identified as potentially stimulating CD by using FASTA analysis and employing a proposed criterion of >45% identity over alignments of at least 100 amino acids and having an E value of <1 × 10⁻¹⁵. Importantly, genes taken from plants outside of the Pooideae subfamily of grasses represent little risk of causing CD. If a significant match is found, the protein should be further tested using cell-based assays or possibly food challenges in celiac patients to ensure minimal risk to the CD population.
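The two-part CD screen described above (exact peptide matching, then an alignment-based check) can be sketched as below. This is an illustrative outline under stated assumptions: the function names are hypothetical, the peptides and protein are invented placeholders rather than entries from the AllergenOnline celiac database, and the alignment statistics are assumed to come from a separate FASTA run.

```python
from typing import Iterable, List


def exact_peptide_matches(protein: str, peptides: Iterable[str]) -> List[str]:
    """Return the database peptides that occur verbatim in the protein."""
    return [pep for pep in peptides if pep and pep in protein]


def meets_cd_alignment_criterion(percent_identity: float,
                                 alignment_length: int,
                                 e_value: float) -> bool:
    """Proposed criterion: >45% identity, >=100 residues, E value < 1e-15."""
    return (percent_identity > 45.0
            and alignment_length >= 100
            and e_value < 1e-15)


# Placeholder data invented for illustration only.
toy_peptides = ["PQLPYQ", "WWCCHH"]
toy_protein = "MKKPQLPYQAAA"

peptide_hits = exact_peptide_matches(toy_protein, toy_peptides)
alignment_hit = meets_cd_alignment_criterion(52.0, 120, 3e-20)
```

Either a verbatim peptide hit or an alignment meeting all three thresholds would mark the protein for the follow-up testing mentioned in the text; how the two results are weighed against each other is not specified in the source and is left to the assessor.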
Evaluating the potential for de novo sensitization: abundance of protein in the food crop, stability to digestion in vitro, heat stability, and glycosylation status

Stability to digestion in vitro and abundance of protein in the food crop
To evaluate the second potential allergy risk associated with agricultural biotechnology, the creation of food allergens de novo, biochemical and physical properties of the protein are evaluated as well as its abundance in the GE crop. The biochemical and physical endpoints include stability to pepsin and trypsin digestion in vitro, glycosylation status, and stability to heating (i.e. processing effects). Many food allergens share certain properties such as stability as defined using denaturants (such as heat) and biochemical measures of stability, such as resistance to pepsinolysis (Breiteneder and Mills 2005). Indeed, resistance to digestion is a property common to some, but not all, dietary proteins thought to sensitize via the human gastrointestinal tract (GIT) (Mills et al. 2004). To sensitize an individual via the GIT, an allergen must have properties which preserve its structure from degradation (such as resistance to low pH, bile salts, and proteolysis), thus allowing enough allergen to survive in a sufficiently intact form to be taken up by the gut and sensitize the mucosal immune system (Taylor and Hefle 2001; Mills et al. 2004). Investigations into the role of digestion in allergenicity of proteins have been hindered by a lack of common approaches and protocols for modeling gastrointestinal digestion in vitro.
The first published application of an in vitro pepsin digestion assay to address the question of food allergen stability was by . Subsequently, there have been several studies repeating the pepsin digestion assay for a variety of proteins (Buchanan et al. 1997; Kenna and Evans 2000; Okunuki et al. 2002; Fu 2002; Herman et al. 2007; Mandalari et al. 2009). Several variations in simulated gastric fluid (SGF) assay parameters have been reported and include differences in the pH of the assay, the purity of the pepsin, the pepsin to target protein ratio, the target protein purity, and finally, the method of detection. A standardized protocol for evaluating the in vitro pepsin resistance of proteins was established in the context of an international inter-laboratory study (Thomas et al. 2004). Although a correlation between resistance to pepsin digestion and allergenic potential has been proposed, the correlation is low (Ofori-Anti et al. 2008). There are examples of proteins in food that are not digestible and do not elicit food allergy. Likewise, it cannot be concluded that allergenic food proteins are necessarily more resistant to digestion. Data with kiwi suggest that unstable allergens may be protected from pepsin digestion by components of the food matrix (Polovic et al. 2007). Therefore, measurement of protein digestibility should not be regarded as a stand-alone endpoint for the safety assessment of novel proteins.
Instead, a WOE approach as described above should be utilized, as no single factor has been recognized as predictive of protein allergenicity (Ladics 2008). Novel proteins used in GE crops (e.g. Bt proteins in corn or CP4 EPSPS in herbicide-tolerant soybeans) are in general rapidly digested by pepsin in less than one minute (Herman et al. 2003; Herouet et al. 2005; Herman et al. 2006). Investigations into the utility of a more physiologically based pepsin digestion assay are currently underway. For example, the International Life Sciences Institute's Health and Environmental Sciences Institute has evaluated a combined gastric and duodenal phase in vitro digestion assay based on the paper of Mandalari et al. (2009) using more physiologically based parameters and various allergenic and non-allergenic proteins (Akkerdaas et al. 2018). These investigators reported that sub-optimal pH, low pepsin-to-protein ratio, and sequential pepsin and pancreatin digestion protocols do not improve the predictive value in distinguishing allergens from non-allergens.
There is also no consensus on the importance of protein abundance in allergenicity assessment, particularly from a regulatory point of view, although clearly, the abundance of a number of major allergenic proteins in plants used for foods is greater than 1% of the protein in the food fraction. Importantly, novel proteins in GE crops are expressed at very low levels, in the ppm or ppb range. For example, the Cry1Ab expression level in the MON 810 GE corn event was detected to be 0.83 ± 0.15 ppm in the grain (Szekacs et al. 2010). Similarly, Cry1F expression levels in Herculex 1® GE corn were between 71 and 115 ppm (Mendelsohn et al. 2003). Mean expression levels of Cry1Ac in DBT418 corn kernels ranged from 36 to 42.8 ng/g dry weight (FSANZ 20013 safety assessment of DBT418 corn). According to the WHO/GEMS database, the consumption rate of corn by humans is 4.98 g/kg/day; therefore 42.8 ng/g Cry1Ac corresponds to an intake of around 213 ng of Cry1Ac per kg body weight per day.
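The intake estimate at the end of the paragraph above is a single multiplication, reproduced here as a small worked example using only the figures quoted in the text. The function and variable names are illustrative, and the calculation assumes the expression level and the consumption rate refer to the same mass basis.

```python
def daily_intake_ng_per_kg(expression_ng_per_g: float,
                           consumption_g_per_kg_day: float) -> float:
    """Protein intake in ng per kg body weight per day.

    (ng protein / g corn) * (g corn / kg body weight / day)
    """
    return expression_ng_per_g * consumption_g_per_kg_day


def ppm_to_ng_per_g(ppm: float) -> float:
    """1 ppm by mass equals 1 microgram per gram, i.e. 1000 ng per gram."""
    return ppm * 1000.0


# Figures quoted in the text: upper mean Cry1Ac level and corn consumption.
cry1ac_intake = daily_intake_ng_per_kg(42.8, 4.98)  # about 213 ng/kg/day
```

The unit helper is included because the text reports some expression levels in ppm and others in ng/g; converting to a common unit first avoids a thousand-fold error when comparing events.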

Heat stability
The heat stability of some protein allergens is considered an important feature for the retention or increase in the allergenicity of some foods after cooking or processing (Breiteneder and Mills 2008). For most proteins, the function is linked to their native folded conformation (Berg et al. 2002). Therefore, loss of protein function strongly correlates with loss of native structure. Currently, where possible, a protein function assay (e.g. enzyme activity) is the method of choice to assess thermal stability in the context of an allergenicity risk assessment, since a protein's function is a relevant biophysical feature of the protein (Indian Ministry of Science and Technology 2008). Some regulatory authorities also require the inclusion of an immunodetection assay using polyclonal IgG antibodies generated in animals as a prospective surrogate for IgE binding assessments. However, animal IgG binding is not considered an appropriate substitute when assessing allergenicity of novel proteins, as it is an indication of immunogenicity rather than allergenicity (Davis et al. 2001). Furthermore, antibody binding does not necessarily correlate with loss of protein function, so assessing these two different endpoints together is not additive to the WOE approach used for identifying potential allergenicity; in fact, the two endpoints can contradict one another. This is illustrated by the safely consumed phosphinothricin acetyltransferase (PAT) protein, which is inactivated at 40-45 °C (15 min) or 60 °C (10 min) but is clearly detectable even after heat treatment at 100 °C. These data show that immunoreactivity is still detectable even if the PAT protein loses its enzymatic activity. In addition, the same recognition of the heat-treated and the native proteins by anti-PAT antibodies indicates that the conformational changes associated with denaturation do not affect the epitope accessibility by IgG.
Therefore, a correlation between the loss of functional activity upon heating or immunodetection and the allergenic status of a protein has not been consistently demonstrated. As noted by Privalle et al. (2011), there is a distinct difference between the maintenance of allergenicity in cooked food (i.e. the ability to elicit IgE binding in vivo) and the retention of endogenous protein function (enzymatic or biological activity).
Heat-mediated unfolding may cause a loss of function, which could coincide with (1) a change in immunological status, such as a loss of conformational IgE binding sites (e.g. where sensitization occurred to native protein); (2) no immunological effect, because only linear epitopes are relevant; or (3) the exposure of previously hidden allergenic epitopes (Mills et al. 2009; Sathe and Sharma 2009). These immunological impacts, however, are not known to correlate with or be caused by a loss of protein function itself. Heat treatment has been shown to eliminate the allergic potential of some allergens such as patatin protein in potato (Koppelman et al. 2002), the hazelnut Cor a 1.04 allergen (Pastorello et al. 2002; Skamstrup Hansen et al. 2003) and chitinases present in fruits (Sánchez-Monge et al. 2000). These are primarily incomplete allergens, which cannot sensitize but can elicit an allergic reaction after sensitization to a cross-reactive protein. Kiwifruit has several allergens, including cross-reactive lipid transfer protein and chitinase, and most appear sensitive to heat (Fiocchi et al. 2004). The production of soybean meal, a process which involves heat treatment, dramatically changes the profile of the extractable proteins and their immunological properties (Franck et al. 2002). However, the allergenicity of the heat-treated soybean meal is not significantly altered (Besler et al. 2001; Franck et al. 2002). Neoallergen (i.e. new or hidden allergen) formation may be part of the reason why some individuals can tolerate a raw food or raw food ingredient but will react to the same food when it is processed.
Neoallergens have been identified from pecans, wheat flour, roasted peanuts, lentil, almond, cashew nut and walnut, soybean, shrimp, scallop, tuna, egg, apple, plum, milk, and potato (Vojdani 2009). In summary, measured loss of function and changes in protein conformation have no consistent association with changes in the clinical allergenicity of protein allergens: structural changes to proteins can have no effect on allergenicity, may increase allergenicity, or may reduce allergenicity (Mills et al. 2009; Paschke 2009). For a novel food protein, there is no way to predict which might occur. Since the thermal stability of novel food proteins does not consistently correlate with allergenic risk, it does not provide any safety information as part of the allergy risk assessment of transgenic crops. Heat stability results with novel food proteins have no known predictive value in the allergenicity risk assessment. In a limited approach to supporting the broader safety assessment of novel proteins, thermal deactivation may be relevant to the toxicological risk assessment for cooked or processed food if a protein has some known adverse toxicological effect associated with its function (Hammond and Jez 2011; Delaney 2015).

Glycosylation status
Several protein allergens are glycosylated, implying the possibility that the glycosyl groups may contribute to their allergenicity (Huby et al. 2000; Breiteneder and Mills 2005). Oligosaccharides are naturally added to many proteins during or after their synthesis in eukaryotic cells. Glycosylation involves the covalent attachment of oligosaccharides, most commonly to asparagine (N-linked) or serine/threonine (O-linked) amino acids. Glycosylation can influence the physical properties of a protein such as stability, hydrophobicity, solubility, and electrical charge, which may then affect its antigenic and allergenic potential. For example, antigen-presenting cells (APC) have been shown to have enhanced uptake of glycosylated proteins compared to their non-glycosylated counterparts (Sallusto et al. 1995). This enhanced uptake may be due to the presence of specific sugar receptors on the surface of APC (Condaminet et al. 1998). In addition, it has been reported that the receptor-mediated uptake of proteins by APC produced a quantitative increase in the antigenicity of proteins (Tan et al. 1997; Agnes et al. 1998). These data indicated that APC effectively process glycosylated proteins and subsequently mediate an enhanced immune response. Garrido-Arandia et al. (2014) investigated the role of N-glycosylation in kiwi allergy. These investigators reported that the sugar moiety induced the activation of APC, thus potentially playing a role in sensitization. However, Garrido-Arandia et al. (2014) also indicated that it was the kiwi protein fraction, and not the sugar moiety, that was responsible for the allergic reactions. Up to 30% of allergic patients generate specific anti-glycan IgE (Altmann 2016). Although antibody-binding glycoproteins are common in foods, insect venoms, and pollen, cross-reactive carbohydrate determinants do not appear to cause clinical symptoms in patients and should be rated as false positives (Mari 2002; Ebo et al. 2004; Altmann 2007, 2016; Yokoi et al. 2017). Furthermore, there are many glycosylated proteins that are not allergenic. Therefore, it is important not to consider glycosylation by itself, but rather in the context of the overall WOE data regarding the allergenic potential of a novel protein as described above.

Third category of potential allergenic risk: Assessment of endogenous allergens
The question of whether the transformation of a gene(s) might increase the levels of endogenous allergens in an allergenic crop (e.g. soybean) has been raised. The measurement of endogenous allergens, however, represents the least likely potential risk associated with GE crops. It is mentioned as part of the "compositional analysis of key components" in GE crops including nutrients, anti-nutrients, and toxicants in the Codex Alimentarius document (Codex 2009). Additionally, a requirement from the European Union (EU) Commission to measure endogenous allergen levels in soybean as part of the compositional assessment of GE crops has been implemented (European Commission Implementing Regulation No. 503/2013). This requirement has generated much discussion on the relevance of such data in the risk assessment for GE crops (Doerrer et al. 2010; Herman and Ladics 2011; Fernandez et al. 2013; Panda et al. 2013; Ladics et al. 2014b). Several analytical tools, such as quantitative mass spectrometry (Lee et al. 2010; Stevenson et al. 2010; Houston et al. 2011; Stevenson et al. 2012) and enzyme-linked immunosorbent assays (ELISAs) (You et al. 2008; Ma et al. 2010; Liu et al. 2012), have been employed to measure endogenous soybean allergens. These methods are used to quantitatively measure protein expression levels. Two-dimensional (2-D) gel electrophoresis has also been utilized (Rouquié et al. 2010).
There are several reasons, however, why the measurement of endogenous allergens in GE crops does not add to their risk assessment. First, allergic individuals will attempt to avoid offending foods, whether GE or non-GE. Since soybean-containing products are labeled under labeling regulations, including those in the EU, this risk is manageable by allergic patients. Therefore, the level of allergen(s) is irrelevant when the food is already known to be allergenic and is regulated as such. Secondly, the measurement of endogenous allergens in non-GE crops demonstrates a wide range of natural variability in seed concentrations due to differences in the genetics of commercial varieties (Houston et al. 2011) and the interactions of those varieties with the environment (e.g. temperature, moisture, nutrients, plant pathogens, insect loads) (Sancho et al. 2006; Goodman et al. 2008; Doerrer et al. 2010; Stevenson et al. 2012; McClain et al. 2018). It has been reported that the insertion of a small number of genes through transgenesis has less impact on crop composition than traditional breeding methods (Herman et al. 2009; Parrott et al. 2010; Herman and Ladics 2011; Ricroch et al. 2011; Herman and Price 2013; Ladics et al. 2015). In addition, several studies have compared IgE binding to non-GE and GE soybean varieties and found no significant differences between them (Sten et al. 2004; Hoff et al. 2007).
Lastly, data are lacking regarding thresholds for individual allergenic proteins for either the sensitization or elicitation phase of an allergic reaction, which prevents the interpretation of any differences in levels between the GE and non-GE crop (Taylor et al. 2002; Taylor and Hourihane 2008; Chassy 2010; Crevel 2010). In addition, there is no quantitative association between exposure to a given amount of a protein allergen(s) and the risk of sensitization and/or a clinically relevant reaction (Panda et al. 2013). For the above reasons, it is not clear what the measurement of endogenous allergens adds to the risk assessment of GE crops (Rouquié et al. 2010). Importantly, conventional breeding tactics, such as chemical and radiation mutagenesis, can also alter existing endogenous levels of allergenic proteins.

Endpoints requiring further evaluation/validation
Animal models to evaluate potential protein allergenicity
There has been some research with animal models that show promise for evaluating mechanisms of allergy and immunotherapy (Kulis et al. 2012) and for preliminary ranking of allergenic sources (Sun et al. 2013; Ahrens et al. 2014). Animal models are frequently used for confirming the hypo-allergenicity of foods (e.g. milk formulas for infants). Researchers have also investigated the use of animal models to predict the allergenic potential of novel proteins in food. The use of animal models for predicting food allergy gained further attention through the FAO/WHO recommendations (FAO/WHO 2001), which called for testing each novel protein in two animal models based on two different sensitization routes, even though the FAO/WHO publication recognized that there were no validated animal models available for predicting food allergenicity. Several investigators have worked on developing models for predicting or ranking the potential allergenicity of food proteins in several different species, as reviewed in Ladics et al. (2010). Models have been proposed using rats (Knippels et al. 1998, 1999a,b, 2000; Ladics et al. 2003), mice (Herouet-Guicheney et al. 2009; Aldemir et al. 2009), dogs (Buchanan and Frick 2002; Teuber et al. 2002) or swine (Helm et al. 2002). Most of these models are based on the assessment of induced antibody (i.e. IgE or IgG) responses and the frequency of responders in the test groups. Currently, however, none of the animal models has been tested with a wide range of allergens and putative non-allergens, and there is a lack of data on the reproducibility and predictive value (sensitivity and specificity) of any of the models. Importantly, several relevant questions remain concerning the use of animal models for predicting potential protein allergenicity: (1) What is the most appropriate endpoint or design for an animal model? (2) What constitutes the measurement of a positive allergic response? (3) What is the most appropriate form of protein to test (i.e. isolated pure protein vs. protein in the food matrix)?
Different species have been tested (e.g. rat, mouse, dog, swine) and clear strain differences have been observed that significantly impact results. Other design variables include the potential use of various adjuvants, the dosing regimen (i.e. number and timing of antigen doses), and the route(s) of exposure for sensitization and challenge. For example, the oral (PO) or intragastric (IG) route may appear to be the most relevant for testing food proteins; however, the complication of oral tolerance by prior exposure to the protein and uniformity of dosing must be overcome (e.g. via adjuvant co-administration). Yet, while adjuvant co-administration will increase the sensitivity of detection of IgE antibody responses to proteins, particularly by the IG route, there is concern that this may result in some loss of specificity. On the other hand, an intraperitoneal (IP) injection may represent the most direct assessment of the allergic potential of a novel protein, and it has been demonstrated that IP injections may overcome the tolerance that may occur if the protein is administered orally (Dearman et al. 2003), because the APC and lymphatic routes involved differ between IP and IG exposure. Additionally, sensitization by routes other than IP or IG (e.g. dermal, subcutaneous, or inhalation) may need to be considered. For example, Nelde et al. (2001) reported that dermal application of ovalbumin (OVA) in BALB/c mice induced antigen-specific IgE production more efficiently than the IP route, although anaphylactic symptoms could be induced in all mice independently of the route of antigen application. Hsieh et al. (2003) reported successful induction of clinically active IgE to OVA (based on oral challenge) by 4-week epicutaneous applications of 0.1 mg antigen on the shaved backs of BALB/c mice. Administration of food extracts by transdermal application to mice has also been investigated (Navuluri et al. 2006; Birmingham et al. 2007).
Given that IgE is the most common marker of allergic sensitization and is mechanistically essential for inducing the most common type of allergic reactions in humans (Bruijnzeel-Koomen et al. 1995), it is an appropriate endpoint to measure. Other endpoints, such as a Th2 cell response, cytokines, and APC levels, are also being investigated and may prove to be important endpoints to evaluate in future in vivo assessments of food allergy (Johnston et al. 2014). Which IgE test(s) are relevant, however, is less obvious. For example, in vivo measurement of protein-specific bioactive IgE by passive cutaneous anaphylaxis (PCA) or active cutaneous anaphylaxis (ACA) reactions, or active systemic anaphylaxis (ASA), and in vitro measurements of serum levels of antigen-specific IgE (i.e. antibody titers by enzyme-linked immunosorbent assay [ELISA]) are not always equivalent. It is also known that the abundance of antigen-specific IgE, as measured by in vitro antigen binding, is not absolutely correlated with symptoms of food allergy (McClain and Bannon 2006). Moreover, the criteria that define negative or positive allergic responses in an animal model, such as the level of antibody titer, the number of positive responders in the PCA assay, and/or the frequency and severity of clinical signs of allergy, could impact the concordance of an animal model (i.e. its ability to correctly identify both positives and negatives). It has been reported that the validity of any animal model should be based on the demonstration of a rank order of potency for several allergens, comparable to what is known regarding their prevalence and severity of responses in humans (Osterballe et al. 2005; Rona et al. 2007; Bjorksten et al. 2008).
Importantly, regarding the potency of food allergens/allergenic foods in humans, information is available only on their severity in challenge reactions, as little is known about the sensitizing potential of these food items in humans (Taylor and Hefle 2001; Crevel et al. 2008; McClain et al. 2014). Given the questions raised above, it is not surprising that currently no animal model(s) has been extensively evaluated and validated with pure proteins or unprocessed or processed foods, and none is included in current regulatory guidance on predicting potential protein allergenicity. Animal models, however, are important for investigating the etiology of food allergy as well as evaluating advances in immunotherapy techniques to induce desensitization and, ultimately, tolerance to food allergic reactions (Oyoshi et al. 2014; Larsen and Bogh 2018).
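The terms sensitivity, specificity, and concordance used above for judging an animal model can be made concrete with a small sketch. It assumes a hypothetical validation exercise in which a set of known allergens and putative non-allergens is tested and each protein receives a positive or negative call; the class name and the counts in the usage example are invented for illustration and do not come from any published study.

```python
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    """Divide two counts, refusing an empty denominator."""
    if denominator == 0:
        raise ValueError("no proteins in this category were tested")
    return numerator / denominator


@dataclass(frozen=True)
class ValidationCounts:
    """Outcome counts from testing reference proteins in one model."""
    true_positives: int   # known allergens called positive
    false_negatives: int  # known allergens called negative
    true_negatives: int   # putative non-allergens called negative
    false_positives: int  # putative non-allergens called positive

    def sensitivity(self) -> float:
        """Fraction of known allergens correctly identified."""
        return _ratio(self.true_positives,
                      self.true_positives + self.false_negatives)

    def specificity(self) -> float:
        """Fraction of putative non-allergens correctly identified."""
        return _ratio(self.true_negatives,
                      self.true_negatives + self.false_positives)

    def concordance(self) -> float:
        """Fraction of all tested proteins classified correctly."""
        total = (self.true_positives + self.false_negatives
                 + self.true_negatives + self.false_positives)
        return _ratio(self.true_positives + self.true_negatives, total)


if __name__ == "__main__":
    # Hypothetical counts: 8 allergens and 6 non-allergens tested.
    counts = ValidationCounts(true_positives=6, false_negatives=2,
                              true_negatives=4, false_positives=2)
    print(f"sensitivity = {counts.sensitivity():.2f}")
    print(f"specificity = {counts.specificity():.2f}")
    print(f"concordance = {counts.concordance():.2f}")
```

Changing the positivity criterion (for example, the antibody titer cut-off) shifts proteins between these four categories, which is why the definition of a positive response directly affects the apparent concordance of a model.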

Summary
There is an extensive safety assessment process in place that evaluates GE crops for potential allergenicity issues and employs a weight-of-the-evidence approach. In fact, GE crops have undergone more testing than any other food in history (Cockburn 2002) to evaluate the potential to: (1) transfer a known allergen or a likely cross-reactive protein; (2) create an allergen de novo; or (3) increase the levels of endogenous allergens in an already allergenic crop (e.g. soybean). Since GE crops were first commercialized over 20 years ago, there has been no evidence that the introduced novel protein(s) in any approved and commercialized GE crop has caused food allergy. Importantly, GE crops with the potential to cause allergy can be identified early in product development and their introduction stopped before commercialization.

Disclosure statement
The author is employed by the DuPont Co., a company that develops and commercializes agricultural biotechnology products. The author alone is responsible for the content of this manuscript.