Cumulative acquisition of pathogenicity islands has shaped virulence potential and contributed to the emergence of LEE-negative Shiga toxin-producing Escherichia coli strains

ABSTRACT Shiga toxin-producing Escherichia coli (STEC) are foodborne pathogens causing severe gastroenteritis, which may lead to hemolytic uremic syndrome. The Locus of Enterocyte Effacement (LEE), a Pathogenicity Island (PAI), is a major determinant of intestinal epithelium attachment of a group of STEC strains; however, the virulence repertoire of STEC strains lacking LEE, has not been fully characterized. The incidence of LEE-negative STEC strains has increased in several countries, highlighting the relevance of their study. In order to gain insights into the basis for the emergence of LEE-negative STEC strains, we performed a large-scale genomic analysis of 367 strains isolated worldwide from humans, animals, food and the environment. We identified uncharacterized genomic islands, including two PAIs and one Integrative Conjugative Element. Additionally, the Locus of Adhesion and Autoaggregation (LAA) was the most prevalent PAI among LEE-negative strains and we found that it contributes to colonization of the mice intestine. Our comprehensive and rigorous comparative genomic and phylogenetic analyses suggest that the accumulative acquisition of PAIs has played an important, but currently unappreciated role, in the evolution of virulence in these strains. This study provides new knowledge on the pathogenicity of LEE-negative STEC strains and identifies molecular markers for their epidemiological surveillance.


Introduction
Shiga toxin-producing Escherichia coli (STEC) are a group of food-borne pathogens that cause diarrhea and severe diseases, such as Hemorrhagic Colitis (HC) and Hemolytic Uremic Syndrome (HUS), primarily in children under 5 years of age [1]. STEC causes outbreaks and sporadic cases of gastroenteritis affecting both public health and the food industry [2]. Shiga toxins (Stx) are the primary STEC virulence factors responsible for pathologies such as HC and HUS [3]. There are two variants of Stx, Stx1 and Stx2, which include subtypes (Stx1a,c,d and Stx2a-g) defined by nucleotide differences, biological activity and serological reactivity [4]. STEC strains producing Stx1a, Stx2a, and/or Stx2d are associated with the most severe cases [5].
The Locus of Enterocyte Effacement (LEE) Pathogenicity Island (PAI) harbors genes that mediate the adhesion phenotype of a group of STEC strains (LEE-positive) that are associated with HUS and are clinically relevant. Specifically, LEE mediated adherence leads to the loss of intestinal microvilli, resulting in severe diarrhea [6]. However, STEC strains lacking LEE (LEE-negative STEC) have also been isolated from severe disease cases, suggesting the existence of additional virulence factors that also favor the pathogenicity of these bacteria. A study examining a small number of STEC LEE-negative strains demonstrated that this is a heterogeneous group, harboring diverse arrays of virulence factors [7]. Genetic elements such as the PAIs harbor several of these virulence factors. Three PAIs have been reported as exclusively present in LEE-negative STEC strains, namely: the Locus of Proteolysis Activity (LPA) [8], the Subtilase-Encoding Pathogenicity Island (SE-PAI) [9] and the Locus of Adhesion and Autoaggregation (LAA) [10]. However, the pathogenicity mechanisms mediated by these PAIs have not been thoroughly investigated. Recently, LEE-negative STEC strains belonging to serogroups O91, O113, O128, O146 and O174 have been increasingly identified from clinical cases in Europe, Argentina and South Korea [11][12][13][14][15]. Unfortunately, limited surveillance of LEEnegative STEC strains has prevented an accurate assessment of their global spread and impact on public health [16].
A useful strategy for studying the evolution of these pathogens involves combining comparative genomics with epidemiological data [17,18]. Moreover, associations between genetic data and phenotypic behavior improve these analyses and may contribute to reaching new insights into the features that have favored the emergence of these strains in the clinical setting, as well as delineate the virulence potential of specific strains [19]. While studies using these approaches have increased our understanding of LEE-positive STEC strains [17,19,20], the number of studies involving LEE-negative STEC strains is limited [7]. Therefore, we performed a large-scale genomic analysis of 367 LEE-negative STEC strains isolated globally from several sources, including humans, food, animals and the environment. As a result, we were able to identify uncharacterized Genomic Islands (GIs), including two novel PAIs and one Integrative and Conjugative Element (ICE). In addition, we found that LAA was the most prevalent PAI among LEE-negative STEC strains, suggesting that it plays an important role. This work also provides experimental evidence supporting the participation of LAA in the intestinal colonization of a mouse model of STEC infection. Phylogenetic analyses and genomic comparisons suggest that evolutionary events, in which individual genes or groups of genes are acquired, could explain the increased incidence of these strains. This study is a step forward in our knowledge of the evolution of genomes and pathogenic mechanisms of LEE-negative STEC, with important implications for future studies on their epidemiology and surveillance.

Bacterial strains, plasmids and primers
The STEC strains used in this study are summarized in Table S1. Spontaneously derived streptomycin resistant (Str r ) mutants of STEC strains O113:H21 E045-00 and O91:H21 V07-4-4, were used in this study. Strains were grown in static or agitated LB at 37°C. Culture media was supplemented as needed with ampicillin (100 µg/ ml), kanamycin (50 µg/ml) or streptomycin (50 µg/ ml). Plasmids and primers used are summarized in Table S9.

Genome sequencing
The 35 STEC strains sequenced in this study were isolated from Argentina (n = 14) [21], Chile (n = 16) [22] and Spain (n = 5). Genomic DNA was extracted using the Wizard genomic DNA purification kit (Promega Corp., USA) and sequenced at MicrobesNG (University of Birmingham, UK) using Illumina MiSeq or HiSeq 2500 technology platforms with 2×250-bp paired-end reads. Draft genomes were provided after trimming low quality-ends and assembling reads with SPAdes 3.10 [23]. Assembly statistics were obtained with Quast v4.6.3 [24]. Contigs shorter that 200 nt were removed and sequences were deposited at Gen-Bank under Bioproject PRJNA448751. Nucleotide accession codes are included in Table S1.

Publicly available genome sequences
A total of 332 genome sequences of LEE-negative STEC strains were downloaded from GenBank on 15 January 2018. Sequences management and BLAST searches were performed using Geneious software (v11.0.5; Biomatters Ltd). The presence of stx genes and the absence of the eae gene were confirmed using BLASTn. Genomes accession numbers are listed in Table S1.

Identification of PAIs and ICEs
Nucleotide sequences of LAA, SE-PAI, LPA and the High-Pathogenicity Island (HPI) were downloaded from GenBank, using accession numbers AFDQ01000026.1 (positions:385984-472336), JQ994271, AJ278144.1 and AL031866 (positions: 78113-114560), respectively. These sequences were used to perform BLASTn searches against each genome. By default, absence was defined as coverage and/or identity <70%. Since LAA is composed of four modules [10], a strain was considered positive for this PAI when at least three of its modules were identified. Identification and characterization of genomic regions with features of PAIs and/or ICEs were performed using REPuter [28], ISfinder [29] and tRNAscan-SE [30]. Open reading frames (ORFs) and the G+C content were determined using Geneious software. Contigs assembly (mapping) against a reference sequence was performed in Geneious software. The genomic regions corresponding to new PAIs and ICEs were extracted from the original genome sequences available in GenBank and then annotated using the RAST server [31] (Files S1-3). These genomic regions were searched in the Islander [32] and PAIDB [33] databases but none of the homologous sequences exceeded 20% of consultation coverage. The association network (based on a presence/ absence matrix) showing the co-occurrence or mutually exclusive patterns in the distribution of PAIs was drawn using igraph package [34] in R [35]. For the presence/absence matrix, absence was defined as coverage and/or identity <70%. Comparisons between ICE sequences were performed using progres-siveMauve [36].

Construction of isogenic mutants
The LAA PAI was inactivated in the STEC E045-00 and V07-4-4 genomes by allelic replacement, as previously described [37]. Both strains have the complete set of LAA's genes compared to the prototype LAA B2F1 and the deletion comprised approximately 86 kb. For this, the E045-00 and V07-4-4 strains carrying plasmid pKD46, encoding the lambda-derived Red recombination system, were transformed with a PCR fragment containing the kanamycin resistance (Kan) gene flanked by 49 and 53 nucleotides identical to the 3 ′ends of the pheV-tRNA and the LAA PAI (Table S9). The kanamycin-resistant clones were analyzed by PCR to verify the allelic exchange at the right locus using an internal primer to the Kan gene (Kan_rev) and another flanking LAA (LAA_conf_for).

Two-dimensional polyacrylamide gel electrophoresis
Outer membrane protein (OMP) extracts were obtained and analyzed by 2D-PAGE as described [22]. Selected spot proteins were cut from the polyacrylamide gels and identified by MALDI-TOF/TOF at the Mass Spectrometry Core at the University of Texas Medical Branch (Galveston, Texas).

Bacterial adhesion and biofilm formation assays
Bacterial adhesion to Caco-2 and HT-29 epithelial cells and biofilm formation were measured as previously described [10].

Mouse infection studies
All animal experiments were carried out at the Universidad de Concepción, following protocols approved by the Faculty of Biological Sciences Bioethics Committee. Male BALB/c mice (6-8 weeks old; from Instituto de Salud Pública, Santiago, Chile) were randomly distributed into experimental groups. Mice were kept in conventional animal facilities and received water and food ad libitum. The STEC infection mouse model was carried out as previously described [38], with minor modifications. Briefly, mice were given water ad libitum containing streptomycin (5 g/l) 24-48 h prior to inoculation and for the duration of the experiment. Bacterial strains were grown in agitated LB supplemented with streptomycin (50 µg/ml) at 37°C for 18 h. Cultures were centrifuged, the pellets were washed in sterile PBS and resuspended in a 20% sucrose (w/v) and 10% NaHCO 3 (w/v) solution in sterile water to ∼10 10 colony forming units (cfu)/ml. Bacterial concentration was confirmed by serial dilution and cfu counts in LB agar plates.
Mice were then starved of food and water overnight (12 h). The next morning, each animal was orally infected by pipette feeding with 100 µl of a bacterial suspension containing 10 9 cfu/ml. Following challenge, animals were housed individually, and food and water supplemented with streptomycin (5 g/l) was reintroduced and provided ad libitum. Intestinal colonization by the challenging bacteria was determined by bacterial shedding in feces over time. Feces were collected daily, weighed, homogenized, suspended in 1 ml PBS and, after serial dilutions in PBS, plated on MacConkey agar plates supplemented with streptomycin (50 µg/ml) to determine the cfu. For the survival experiments mice were handled and infected as described for the colonization experiments' however, the cfu in feces over time was not recorded.

Phylogeny, phylogroups, molecular typing and Bayesian analysis of population structure (BAPS)
A parsimony phylogenetic tree based on core single nucleotide polymorphisms (SNPs) of 367 complete or draft genomic sequences of LEE-negative STEC, and E. coli K-12 MG1655, was built with kSNP3.1 [39], using a 20-core Dell Poweredge R730 server. Briefly, SNPs flanked by stretches of six nucleotides on each side were identified using k-mers of 13-nucleotides, screened in all the genomes, counted and tabulated into a matrix, in order to build a phylogenetic tree with the parsimony method. A total of 3,956 SNPs were considered for the final tree, which represents the consensus of 100 bootstrap replicates. The tree was further processed with the Interactive Tree of Life tool [40]. LEE-negative STEC population structure was delineated with RhierBAPS [41] using the 3,956 core SNPs. For this, a 3 depth levels and a maximum clustering size of 70 (default = number of isolates/5; 367/5 = 73,4) were specified. Phylogroup assignment was conducted in silico using the ClermonTyping Tool [42]. STs of each strain were determined according to the Achtman scheme [43] using MLST 1.8 tool [44].

Statistical analysis
Differences in the distribution of virulence genes between isolation sources were analyzed using Fisher's Exact test (two-tailed). Pairwise association (co-occurrence or mutual exclusivity) between PAIs, ICEs and toxins was performed in contingency tables by Odds ratios. The statistical significance of these associations was determined using Pearson's chi-squared test or Fisher's exact test (when frequencies were less than 5). When any of the cell values of the contingency table was zero, 0.5 was added to all cells (Haldane correction) to avoid errors in the statistical test. Differences in bacterial colonization and survival rates between groups of mice were analyzed using Mann-Whitney U Test and log-rank test, respectively. Differences between the genome sizes were analyzed using Kruskal-Wallis test followed by Dunn's multiple comparison test. For all statistical tests, a P value of <0.05 was considered significant.
The presence/absence of stx genes and other virulence genes harbored by these strains was determined using VirulenceFinder tool [26] and the BLASTn algorithm (Table S1 and S3). The frequency of detection for each gene is presented in Table 1. Among the strains (n = 367), 266 (72.5%) contained only one stx gene and 101 (27.5%) had more than one (Table S4). The most frequent combinations of stx genes were stx 1a + stx 2a (32/ 367; 8.7%) and stx 1c + stx 2b (30/367; 8.2%). The stx 2b gene was found in a statistically greater proportion of human strains compared to animal strains. In contrast, stx 1d , stx 2a and stx 2e were found in a greater proportion of animal strains compared to human strains. Additionally, the virulence genes senB, eibG, ag43-I, tia and sigA were significantly more frequent in human strains, whereas sta1, saa, hra1, lpfA, lesP and espP were significantly more often found in animal strains.
A previous study reported the presence of genes encoding bacteriocins among LEE-negative STEC strains isolated from humans [11]. Bacteriocins are antimicrobial peptides produced by bacteria that are active against closely related strains [48]. E. coli produces two types of bacteriocins, classified by their molecular weight as colicins (25-80 kDa) and microcins (<10 kDa) [48]. Notably, we found that the mch cluster involved in the synthesis of microcins was more frequently detected in strains isolated from humans compared to strains isolated from other sources (Table 1). Whether the production of microcins improves the fitness of these bacteria during human infection remains to be investigated. Altogether, these data show that LEE-negative STEC strains contain several genes that potentially provide phenotypes associated with adhesion, colonization and toxicity. While different combinations of some of these genes may contribute to virulence during human infection, others may be necessary for persistence in animal reservoirs.

Identification of novel Pathogenicity Islands and Integrative and Conjugative Elements
PAIs contribute to the evolution and virulence of pathogenic E. coli strains [49]. Therefore, we aimed to determine the distribution of the SE-PAI, LPA and LAA islands. We also searched for the HPI of Yersinia pestis, which is widely present among many Enterobacteriaceae, including E. coli [50]. Several of the virulence genes highlighted in this study have been previously mapped to known PAIs, such as tia and subA in SE-PAI; iha and espI in LPA; and hes, iha, lesP and ag43-I in LAA. Interestingly, BLASTn searches using the complete nucleotide sequences of the PAIs showed that tia is also present in other regions of the chromosome of strains that lack SE-PAI. Further analysis on the genomic context of tia revealed two previously uncharacterized genomic regions which contained this gene and that exhibit features commonly found in PAIs (Figure 1).
One of these PAIs, found in contig 13 of the draft genome of STEC O174:H21 MOD1-EC1633 (GenBank accession: NJVC01000013), is a 48-kb genomic region located next to the selC-tRNA gene (Figure 1(a)). Among other characteristics it includes: (i) an integrase-encoding gene (ORF1) localized downstream of the selC-tRNA gene; (ii) it is flanked by 23-bp imperfect direct repeats (DR), corresponding to the 3 ′ end of the selC-tRNA gene; (iii) a lower GC content than average for the MOD1-EC1633 chromosome; (iv) several insertion sequences (IS); and (v) various virulence-associated genes, such as tia (ORF4), which has been involved in invasion of LEE-negative STEC strains [51], as well as a group of genes (ORFs 14-21) encoding a member of the contact-dependent growth inhibition (CDI) system [52]. Specifically, ORF20 encodes a protein sharing 89.3% similarity with the prototypic CdiA protein (Gen-Bank protein access: AAZ57198.1) of E. coli EC93 [53], while ORF21 encodes a CdiB homolog, a β-barrel protein which exports CdiA through the outer membrane. Since this genomic region contained the tia gene related to bacterial invasion and encoded a CDI system, it was named Locus of Invasion and Contactdependent Growth Inhibition (LIC).
The second PAI was identified in contig 57 of STEC O22:H8 MOD1-EC3763 draft genome (GenBank access: NJSE01000057.1); nevertheless, this contig is only 13-kb and not thought to contain the whole PAI sequence. This genomic region was not found in other LEE-negative STEC strains within a single contig, but an additional search in GenBank identified a sequence with 99.9% identity in contig 3 (214-kb) of the draft genome of E. coli FCH1 (GenBank accession: AYKD01000003.1). Further analyses revealed that the 13-kb fragment is part of a 71-kb PAI inserted in the pheV-tRNA gene of the FCH1 chromosome (Figure 1  (b)). The characteristics of each ORF in this region are shown in Table S6. Subsequently, the contigs of the MOD1-EC3763 strain were mapped against the Whole Genome Sequence (WGS) of strain FCH1 (see Methods), allowing us to identify 99.6% of the PAI (70,923 bp with 99.8% identity) within 7 ordered and concatenated contigs of the draft genome of strain MOD1-EC3763 ( Figure S2). The presence of the PAI structure was also determined in other STEC strains (see below). Interestingly, the FCH1 strain is a human isolate lacking stx genes [54], indicating that this PAI is not restricted to STEC. The characteristics identified in this genomic region include: (i) it is inserted / integrated next to the pheV-tRNA gene; (ii) an integrase-encoding gene (ORF1) located downstream of the pheV-tRNA gene; (iii) flanking 23-bp imperfect DR corresponding to the 3 ′ end of the pheV-tRNA gene; (iv) IS elements; and (v) several virulence genes, including tia (ORF3); ORF14 encoding a protein sharing 88.1% and 86.5% similarity with the prototypic CdiA protein and the CdiA homolog harbored by the LIC PAI, respectively; the vpe (ORFs 34-37) and deoK (ORFs 39-42) operons, which are also present in GIs of other E.coli pathotypes and promote intestinal colonization by pathogenic E. coli strains in murine infection models [55,56]; and ORF54, which encodes a protein sharing 87.6% and 79.8% similarity with Cah (GenBank access: AAG55356.1) and Ag43 (GenBank access: AUG16753.1), which participate in adhesion, autoaggregation and biofilm formation [27]. Moreover, we identified a group of genes (ORFs 17-33) sharing 91% identity with the OI#138 island (GenBank access: AE005571) of STEC O157:H7. The OI#138 has not been previously characterized, but its predicted function is the biosynthesis of fatty acids and polyketides [57]. Since this group of genes is related to adhesion and colonization, this genomic region was named Locus of Adhesion and Colonization (LAC).
The sigA gene was almost exclusively present in a group of clinical LEE-negative STEC strains (Table 1,  Table S1). This gene was originally reported in the she PAI of Shigella flexneri 2a [58] and later in two PAIs harbored by the STEC/EAEC O104:H4 strain that caused the 2011 HUS outbreak in Europe [59]. Considering these observations, we sought to determine whether this gene is harbored in a mobile genetic element (MGE). In silico analysis of the genomic context of sigA in the draft genome of STEC O117:H7 str.
FHI72 (GenBank assembly accession: GCA_000939255.1) showed that an integrase-encoding gene and a pheV-tRNA are located 20,168 and 20,442 bp upstream of the sigA start codon, respectively (Figure 2(a)). In addition, a 52-bp perfect DR corresponding to the 3 ′ end of the pheV-tRNA was also identified 66,066 bp downstream of the sigA start codon. These DR sequences flank an 86,482-bp genomic region inserted in the pheV-tRNA gene, which has a similar G+C content (51%) as the FHI72 chromosome. A total of 103 ORFs were identified, including genes involved in DNA conjugation and type IV pilus biogenesis (Table S7), features that are exhibited by integrative and conjugative elements (ICEs) [60]. Seven ICEs had been identified in E. coli according to the ICEberg database [61]. Consequently, based on the nomenclature proposed by Burrus et al. [62], we named this genomic region ICEEc8 and found that it is present in strains belonging to the serotypes O91: H14 (7/29; 24%) and O117:H7 (5/5; 100%). In addition to sigA (ORF23), another virulence gene carried by ICEEc8 of O117:H7 strains is iha (ORF52), which is absent in ICEEc8 harbored by O91:H14 strains that also has an inverted region of about 40-kb (Figure 2  (b,c)).
Additionally, we found patterns of association (cooccurrence and mutual exclusivity) among PAIs, the ICEEc8 and some toxin genes ( Figure 3). For instance, LAA, LIC and LAC, SE-PAI and LPA, as well as LPA and HPI, have a high co-occurrence probability (p < 0.005, p < 0.0005 and p < 0.05, respectively) ( Figure 3 (b)). In contrast, LAA is mutually excluding with SE-PAI, LPA and HPI (p < 0.0005). Notably, LAA was  Table S7. (b) Alignment between ICEs shown in (a) and (c). Alignment was performed using progressiveMauve [36]. Colored segments represent homologous regions. Non-colored areas represent unaligned sequences that may be genome-specific. Inverted regions are identified by boxes below the central line.
co-occurrent with stx 1a , stx 2a , stx 2c and stx 2d , which encode to the Stx subtypes responsible for the most severe cases of STEC infection [5]. Whether these associations provide selective advantages to particular strains or are linked to the genetic background of each strain is not clear.

Functional analysis of the LAA PAI
The high prevalence of LAA among LEE-negative STEC strains suggests that it may play an important biological role, but its contribution to the pathogenicity remains uncharacterized. Therefore, we generated a LAA deletion mutant of STEC O113:H21 E045-00 to address its role in different functional in vitro and in vivo assays.
The OMP profiles of E045-00 and its isogenic mutant ΔLAA, grown in static or agitated LB cultures at 37°C, were analyzed by 2D-PAGE (Figure 4a). Differentially synthesized OMP proteins were identified by mass spectrometry. The LAA-encoded Ag43 protein was synthesized during agitated growth, but not in static cultures. In contrast, the flagellar protein (FliC-H21) was present in samples from static cultures, but it was undetectable by 2D-PAGE in agitated culture samples. This opposite regulatory effect on the expression of FliC and Ag43 was previously reported [63]. Interestingly, in the absence of LAA, FliC-H21 was detected in similar quantities in extracts obtained from agitated and static cultures, suggesting the presence of a regulatory crosstalk between LAA and the flagellar genes that deserves future analysis. As expected, the OMP profiles of the ΔLAA mutant did not show spots corresponding to the NmpC and Ag43 proteins (both encoded in LAA). A single spot corresponded to the Tia and LAA-encoded Hes proteins. Both proteins belong to the Heat resistant agglutinin family (Hra), and share 65% identity in their amino acid sequence [10], with nearly identical isoelectric points and molecular masses. The E045-00 strain contains two copies of the tia gene, localized in the LIC and LAC PAIs (Table  S1). In the wt and ΔLAA strains, Tia was more abundant in the static culture samples; it remains to be determined if both tia genes are up-regulated under static growth. PAIs were identified (co-occurrence) is shown next to the links (dotted lines). The figure was prepared using the package igraph [34] in R [35]. (b) Pairwise association plot for PAIs, ICEs and toxins. Red squares represent negative associations (mutual exclusivity); Blue squares represent positively associations (co-occurrence). The color scale represents the magnitude of the association determined by Pearson's Chi-square test or Fisheŕs exact test (when frequencies were less than 5). The figure was prepared using the package corrplot [76] in R [35].
The OmpW protein which has been associated with protection against phagocytosis [64], was more abundant in the agitated culture. Based on these results, the subsequent functional experiments were performed by growing bacteria in agitated LB.
Next, we evaluated whether the deletion of LAA affected the adhesion and biofilm formation of E045-00. In vitro assays indicated that E045-00 and its isogenic mutant ΔLAA have similar levels of adhesion to Caco-2 and HT-29 cells after 30 min or 1 h of incubation. Furthermore, the biofilm formation of E045-00 at 24, 48 and 72 h were low and unaffected by the deletion of LAA (not shown).
Then the role of LAA in intestinal colonization was evaluated in a streptomycin-treated mouse infection model. Colonization levels of E045-00 and the ΔLAA mutant were significantly different at 2, 3, 6, 8, and 9 days post-infection, suggesting the involvement of LAA in this phenotype; however, no significant difference were observed at days 10-12 (Figure 4(b)). It is worth mentioning that E045-00 harbors the LAC PAI, where the vpe and deoK operons, which promote colonization of the murine intestine [55,56], are located. At this point, we can't rule out the possibility that genes contained within LAC or other PAIs may mask the lack of LAA.
To further evaluate the functionality of LAA in a different genetic background, we used the STEC O91: H21 V07-4-4 strain, from which a ΔLAA mutant was generated. As observed in E045-00, the deletion of LAA did not affect the adhesion to human epithelial cells and biofilm formation of V07-4-4 (not shown). The V07-4-4 produces Stx2d, the Stx subtype with the highest toxicity in the murine model [65]. Therefore, groups of streptomycin-treated mice were infected with the V07-4-4 or its ΔLAA mutant and survival of the mice was recorded. Mice infected with either V07-4-4 or its ΔLAA mutant showed signs of disease (lethargy and reduced food and water consumption) from day 2-8 after challenge. Two out of 10 (20%) and four out of 10 (40%) mice challenged with V07-4-4 or the ΔLAA mutant, respectively, survived until the end of the experiment (day 13). Nevertheless, a non-significant difference between the experimental groups was observed (p = 0.1621) in the survival curves (Figure 4(c)).

Phylogenetic and population structure analyses and molecular typing
We investigated the evolutionary history of the strains through phylogenetic analysis based on single nucleotide polymorphisms (SNPs) found in every genome (core SNPs). The maximum parsimony tree obtained is shown in Figure 5. Next, the population structure was assessed by a Bayesian clustering method (see methods), which identified seven sequence clusters (SCs; primary clusters) that were further subdivided  7) were stained with Coomassie blue G-250, and selected spots (proteins) were identified by MALDI-TOF/TOF mass spectrometry. The scale bars on the left indicate molecular weights in kDa. (b) Colonization of streptomycin-treated mice orally inoculated with 10 9 cfu of E045-00 SR or E045-00ΔLAA SR . Fecal pellets were collected daily, weighed, homogenized, and plated on MacConkey agar containing streptomycin. Data points are cfu/g of fecal sample collected from each mouse on the indicated day. Lines represent means. Differences in colonization levels for each day were analyzed using the Mann-Whitney U Test (*p < 0.05 **p < 0.005). (c) Survival rate of groups of ten streptomycin-treated mice orally inoculated with STEC O91:H21 V07-4-4 SR strain or its isogenic mutant ΔLAA SR . The difference in the survival rate between both groups was analyzed using the log-rank test. Statistical significance level was defined as p < 0.05. into 31 lineages (L1 to L31; BAPS level 2). Most of the lineages were monophyletic and included strains belonging to a specific serotype with some exceptions, as lineage 13, which included several serotypes and ancestors localized in different branches across the tree. This analysis also allowed the identification of several sub-lineages (BAPS level 3; see below). Phylogroups and Sequence Types (STs) were also determined. A total of one hundred and six different STs were identified, all of which have been previously described (Table S1). In general, the topology of the tree was consistent with the distribution of phylogroups, serotypes and STs, but not serogroups, which in some cases (e.g. O91, O113, O174) were separated into distant branches (Figures 5 and 6). A previous study has also shown the polyphyletic nature of STEC serogroups [10].
It has been reported that among LEE-positive STEC strains from different lineages may differ in their virulence potential [20,66,67]. Moreover, it has been suggested that hypervirulent lineages containing different arrays of virulence genes correlate with the incidence and severity of diseases associated with STEC [68,69]. Furthermore, the cumulative acquisition of PAIs in LEE-positive STEC strains has been correlated with an increased ability to cause severe disease in humans, suggesting an additive or synergistic effect [17]. Therefore, we hypothesize that genome plasticity Figure 5. Phylogenetic relationships and population structure of LEE-negative STEC strains analyzed in this study. The maximum parsimony phylogenetic tree (midpoint rooted) is based on 3,956 core SNPs identified in 367 complete or draft genomic sequences of LEE-negative STEC and E. coli K-12 MG1655. The tree was built using kSNP3.1 [39] and further processed with the Interactive Tree of Life tool [77]. Sequence clusters (SC; SC1 to SC7) are indicated in the outer colored ring, which are further divided into 31 lineages (inner ring). Clinically important serotypes are shown. In general, lineages contain strains belonging to a specific serotype. In the lineage 16, branches corresponding to O113:H21 serotype are delineate with a dotted line. may also play an important role in the appearance of LEE-negative STEC lineages with an increased virulence potential. Thus, we attempted to determine whether there are lineages with greater virulence potential among LEE-negative STEC, and, if so, their geographical distribution. With this aim, we mapped the distribution of virulence determinants and epidemiological data onto the maximum parsimony tree. For better visualization and understanding, we focused this analysis on clinically relevant serotypes (O91:H14, O91:H21, O113:H21, O128:H8, O146:H21 and O174: H21) ( Figure 6). Notably, we found concordance between the phylogeny and the virulence repertories, and, in some cases, with the geographical distribution.
The lineage 17 was associated with eleven strains, all belonging to the O128:H8 serotype, and separated into two sub-lineages ( Figure 6(b)). In both sub-lineages, 17-1(5/11) and 17-2 (6/11), the majority of strains were from Europe with one North American strain present in each group. Strains from sub-lineage 17-1 harbor SE-PAI (3/5) and LPA (5/5); most of the strains from sub-lineage 17-2 harbor these PAIs, but they also include HPI (6/6). With the exception of one strain (in 10 of 11) the mhc cluster was present. Other virulence genes were differentially distributed, stx 1c was present in all 17-2 strains and in two 17-1 sub-lineages. Moreover, stx 2b was associated with 5/6 strains in 17-2 and 4/5 strains in 17-1 (Table S1). Regarding ehxA and subA both were present in almost all strains of the lineage 17, with only one strain not harboring subA and two strains not harboring ehxA.

Genome size analysis
Bacteria show substantial variations in genome size, a feature that may be linked to their lifestyle. For instance, in some bacterial pathogens there is a tendency for a narrow host range and increased pathogenicity to be associated with a reduction in genome size (reductive genome evolution) [70]. Therefore, to gain further insights into the evolution of LEE-negative STEC virulence, we examined variation in genome sizes among these strains and other E. coli pathotypes (Table S10) Figure S3a, File S4). Furthermore, an interesting finding was that genomes from human-associated LEE-negative STEC strains were significantly (p = 0.0052) larger (average 5.21 Mb, range 4. 75-5.74 Mb) compared to those isolated from bovines (average 5.11 Mb, range 4.85-5.42 Mb), with genomes larger than 5.2 Mb almost exclusively corresponding to human isolates ( Figure S3b, File S5). These observations might reflect a higher degree of adaptability in LEE-negative STEC strains that are both capable of infecting humans and colonizing their animal reservoir.

Discussion
Using a combination of in silico analyses, taking advantage of the increasing data provided by massive sequencing, and experimental assays, this study provides new information regarding the evolution of virulence of LEE-negative STEC strains.
Stx is the main virulence factor of STEC; specifically, Stx1a, Stx2a and Stx2d subtypes are associated with severe disease [5]; nevertheless, none of these subtypes was prevalent in human strains compared to animal isolates (Table 1). Therefore, identification of the Stx subtype alone does not seem sufficient to predict the virulence potential of these strains. Previous studies agree with this finding [17].
While the varying configurations of virulence genes and genome sizes among LEE-negative STEC strains make it difficult to assess the virulence potential of a given strain, we found that genes such as hes, one of the most prevalent in LEE-negative STEC [10], may complement existing molecular risk evaluation schemes [71].
Additionally, we report the identification of three new MGEs that are distributed in different LEE-negative STEC strains. Among these, LIC and LAC PAIs contain the tia gene (Figure 1), an enterotoxigenic E. coli virulence factor [72] that has been previously associated with the presence of SE-PAI [9]. Thus, the report of two new PAIs also containing this gene is indicative that it has spread to and within STEC strains, rendering another example of a horizontal transfer event within E. coli pathotypes [73]. Other genes located in these PAIs, such as those encoding CDI systems, could provide competitive advantages against other bacteria. During the review of the present manuscript, Saile et al. [74] reported the identification of several genetic elements with characteristics of PAIs in a number of LEE-negative STEC strains. Among the genetic sequences reported in that study are LIC and partially LAC.
On the other hand, the identification of ICEEc8 containing sigA is an important finding (Figure 3), as SigA participates in infant rabbit intestinal colonization by the STEC/EAEC O104:H4 strain that caused the 2011 HUS outbreak in Europe [59]. It is noteworthy that a recent study reported the identification of the sigA gene in 36 STEC O117:H7 strains isolated in the United Kingdom [75], suggesting that they harbor ICEEc8. Thus, it is important for future studies to address the functional characterization of this novel MGE. This is the first large-scale study showing the accumulation and diverse distribution of GIs among LEE-negative STEC strains. Thus, together with plasmids and bacteriophages, PAIs and ICEs may play important roles in the evolution of the virulence of these pathogens. Including the results of this report, six PAIs and one ICE have been shown to be harbored by LEE-negative STEC; however, it is likely that there are more MGEs yet to be identified ( Figure S4).
It is worth highlighting that LAA was the most prevalent PAI among the strains (Table 1). In addition, our results suggest that LAA contributes to the intestinal colonization in streptomycin-treated mice, but its participation in virulence could not be statistically demonstrated, although a mild attenuation of the strain could be observed (Figure 4(c)). Thus, we consider that LAA's functionality deserves a more detailed investigation. Future studies, using reduced inoculums or determining the LD50, might complement these results and provide further information on the role of LAA in virulence in this model. In addition, the association of LAA with toxin genes that cause severe disease, such as stx 1a , stx 2a , stx 2d and cdtB (Figure 3(b)) could be useful in STEC surveillance.
Perhaps even more important is the epidemiologic and genomic evidence supporting the notion that there are LEE-negative STEC subpopulations that are globally distributed. Specifically, phylogenetic analyses revealed that serotypes O91:H14, O91:H21, O113:H21, O128:H8, O146:H21 and O174:H21 have sub-lineages with higher virulence potential, principally related to the Stx subtype and the number of PAIs they harbor ( Figure 6).
On the other hand, we identified that there is considerable variability in the genome size of LEE-negative STEC strains, as much as 1 Mb, and a tendency for larger genomes among human strains compared to those isolated from bovines. In other words, LEE-negative STEC strains harbor the repertoires of genes that allow them to live in the bovine reservoir while large genomes could reflect the gain of genes, PAIs, ICEs, plasmids and bacteriophages by which they are able to infect humans. This hypothesis could be examined in future studies with a more significant number of fully-sequenced (closed) genomes and epidemiological data.
In conclusion, the results described here support our hypothesis as to the role of PAIs play in the emergence of LEE-negative STEC strains. Therefore, this study further adds to our knowledge on the potential pathogenic mechanisms of LEE-negative STEC and lays down a base line for their epidemiological surveillance.