Complete genome analysis of Vibrio mimicus strain SCCF01, a highly virulent isolate from the freshwater catfish

ABSTRACT Vibrio mimicus is a foodborne pathogen that is widely distributed in aquatic environments and is often involved in diseases of aquatic animals. In recent years, V. mimicus has emerged as a pathogen of some species of Siluriformes. Strain SCCF01 was isolated from yellow catfish (Pelteobagrus fulvidraco). In this study, we performed a genomic analysis of V. mimicus strain SCCF01 to identify its genetic features and evolutionary relationships. Gene function and classification were obtained by functional annotation, and a circular map of the SCCF01 genome was drawn with Circos v0.64. Virulence genes (adhesion, flagellar system, exotoxins, secretion systems, etc.) were identified by virulence gene annotation. Genome element prediction showed that most of the mobile elements were located on chromosome I. Chromosome I of the SCCF01 genome therefore appears to have more plasticity than chromosome II and may have been enlarged as a result. The genomic collinearity between strain SCCF01 and other V. mimicus strains was analyzed by linear pairwise comparison, but this analysis could not resolve their evolutionary relationship. Gene family analysis predicted that the evolutionary direction of strain SCCF01 was: clinical strain → environmental strain → SCCF01 strain. Phylogenetic analysis showed that strain SCCF01 was more closely related to environmental strains. On the basis of the gene family and phylogenetic analyses, we speculate that strain SCCF01 has probably diverged from environmental strains.


Introduction
Vibrio mimicus (V. mimicus), which is closely related to Vibrio cholerae, was initially considered an atypical V. cholerae [1]. It is a widely distributed aquatic bacterium that can cause disease in humans and mass mortality in aquatic animals. As a foodborne pathogen [2], it can cause gastroenteritis, diarrhea, and food poisoning [3][4][5]. V. mimicus infection is also common in aquaculture, where it affects shrimp, crab, and fish [6][7][8][9][10][11]. In recent years, V. mimicus has emerged as a pathogen of some species of Siluriformes. Outbreaks are characterized by a short disease course and a high mortality rate, which lead to substantial economic losses on Siluriformes farms [9,10]. Since 2011, large-scale outbreaks of V. mimicus infection have occurred continuously on Siluriformes farms in China. V. mimicus strain SCCF01, a highly virulent strain isolated from yellow catfish (Pelteobagrus fulvidraco) in China, causes almost 100% mortality in yellow catfish [9,12].
At present, three complete and eight draft genome sequences of V. mimicus are available in the GenBank genome database: five strains were isolated from humans, five from the environment, and only one, SCCF01, from fish (Supplementary Table S1). The genome of strain SCCF01 is the only complete V. mimicus genome sequence from an infected aquatic animal. Previous studies have shown that natural infection with strain SCCF01 causes high mortality in fish [9] and that the strain is also highly virulent in artificial infection experiments [12,13]. However, the genetic features and evolutionary strategies of V. mimicus from fish remain unknown.

Materials and methods
Sources of strain and genome sequences
V. mimicus strain SCCF01 was isolated from diseased yellow catfish (Pelteobagrus fulvidraco) at a commercial aquaculture site in Southwest China [9]. Challenge experiments showed that bath immersion with strain SCCF01 (10⁶ CFU·mL⁻¹) caused 100% mortality in yellow catfish. The whole-genome sequence of strain SCCF01 was obtained by single-molecule real-time (SMRT) sequencing on the PacBio RS II platform [12]. The complete genome sequence of SCCF01 has been deposited in GenBank under the accession numbers CP016383 (chromosome I) and CP016384 (chromosome II). The other genome sequences used for comparative genome analysis were downloaded from the National Center for Biotechnology Information (NCBI) (Supplementary Table S1).

Comparative genome analysis
Collinearity analysis: Global collinearity was identified by genome-wide sequence comparison using MUMmer v3.23 (http://mummer.sourceforge.net/) [22]. LASTZ v1.03.54 (http://www.bx.psu.edu) [23] was used for local sequence comparison to determine the detailed collinearity (Translocation/Trans, Inversion/Inv, and Trans+Inv) between two sequences.

Gene family analysis: Pairwise alignments of the genomes were performed with BLAST [24], and untrustworthy alignments were filtered out. A gene family clustering table was then constructed from the alignment similarities using Hcluster-sg v0.5.1 [25].

Phylogenetic analysis: Two phylogenetic trees (one for the species Vibrio mimicus and one for the genus Vibrio) were constructed based on locally collinear block searching in the sample and reference strains, as previously reported [26]. The alignments output by HomBlocks were used as input to RAxML [27] to construct the phylogenetic trees under the GTR+I+G model with 1,000 bootstrap replicates. The phylogenetic trees were displayed and customized using Evolview (http://www.evolgenius.info/evolview/) [28].
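The gene family step above (filter pairwise alignments, then cluster by similarity) can be illustrated with a deliberately simplified sketch. It is not the Hcluster-sg algorithm: it uses plain single-linkage grouping with a union-find structure as a stand-in, and the gene identifiers, identity threshold, and e-value threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical BLAST hits: (query, subject, percent identity, e-value).
HITS = [
    ("SCCF01_0001", "ATCC33654_0010", 98.2, 1e-120),
    ("ATCC33654_0010", "ATCC33655_0007", 97.5, 1e-118),
    ("SCCF01_0002", "ATCC33655_0300", 35.0, 1e-3),  # weak hit, filtered out
    ("SCCF01_0003", "ATCC33654_0450", 91.0, 1e-80),
]


def cluster_families(hits, min_identity=50.0, max_evalue=1e-5):
    """Group genes linked by trusted hits (single linkage, assumed thresholds)."""
    parent = {}

    def find(gene):
        # Return the representative of the gene's group, compressing the path.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for query, subject, identity, evalue in hits:
        # Register both genes so that genes with only weak hits stay singletons.
        root_q, root_s = find(query), find(subject)
        if identity >= min_identity and evalue <= max_evalue:
            parent[root_q] = root_s

    families = defaultdict(set)
    for gene in parent:
        families[find(gene)].add(gene)
    return sorted((sorted(members) for members in families.values()),
                  key=lambda members: members[0])


for members in cluster_families(HITS):
    print(members)
```

In this toy input the weak hit is discarded, so its two genes remain unclustered singletons, while the two trusted hits that share a gene are merged into one three-member family.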

Results and discussion
General genomic features of V. mimicus strain SCCF01
The genome of V. mimicus strain SCCF01 was sequenced using PacBio RS II with the P6-C4 Reagent Kit, which yielded 35,089 polymerase reads with a read N50 of 12,135 bp; the read characteristics are summarized in Table 1. After filtering, all reads were assembled into two circular chromosomes of 3,213,040 bp (chromosome I) and 1,272,975 bp (chromosome II), with G+C contents of 46.61% and 45.88%, respectively (Supplementary Table S2). A total of 4,160 genes (4,018 CDSs and 140 RNA genes) were predicted in the SCCF01 genome by PGAAP (Table 1). The SCCF01 genome encodes 18 rRNAs and 69 tRNAs. The predicted ORFs were further classified into COG functional groups (Figure 1), which are summarized in Supplementary Table S3.
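As a small worked example, the two chromosome lengths and G+C contents reported above can be combined into a total genome size and a length-weighted genome-wide G+C content. The derived figures are computed here from the reported values and are not stated in the original text.

```python
# Values reported in the text for strain SCCF01.
CHROMOSOMES = {
    "chromosome I": {"length_bp": 3_213_040, "gc_percent": 46.61},
    "chromosome II": {"length_bp": 1_272_975, "gc_percent": 45.88},
}


def genome_summary(chromosomes):
    """Return total length (bp) and length-weighted G+C content (%)."""
    total = sum(c["length_bp"] for c in chromosomes.values())
    weighted_gc = sum(
        c["length_bp"] * c["gc_percent"] for c in chromosomes.values()
    ) / total
    return total, weighted_gc


total_bp, gc = genome_summary(CHROMOSOMES)
print(f"total genome size: {total_bp:,} bp")
print(f"length-weighted G+C content: {gc:.2f}%")
```

With the reported values this gives a genome of 4,486,015 bp with a G+C content of about 46.40%.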

Virulence factors
Based on the VFDB database [18], we scanned the V. mimicus SCCF01 genome for virulence-related features. In total, we identified 107 putative orthologs involved in the production of any of the above virulence factors (summarized in Table 2). This dataset could support the development of gene-attenuated vaccines and serve as a basis for future studies of the interactions between V. mimicus strain SCCF01 and disease. We also compared the virulence genes of V. mimicus SCCF01 with those of a clinical strain (ATCC33655) and an environmental strain (ATCC33654). The three genomes shared 539 virulence genes in the core genome, and the strain-specific virulence genes numbered 74 (SCCF01), 73 (ATCC33655), and 52 (ATCC33654) (Supplementary Figure 2). The characteristics of the common and specific virulence genes are summarized in Supplementary
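The split into core and strain-specific virulence genes described in this section can be sketched with set operations. The gene labels below are placeholders rather than real annotations, and the sketch assumes that genes from different strains have already been mapped to shared ortholog labels.

```python
# Placeholder ortholog labels per strain (hypothetical, for illustration only).
TOY_GENE_SETS = {
    "SCCF01": {"vf_a", "vf_b", "vf_shared", "vf_x"},
    "ATCC33655": {"vf_a", "vf_b", "vf_c"},
    "ATCC33654": {"vf_a", "vf_b", "vf_shared"},
}


def partition_genes(gene_sets):
    """Return (core genes, {strain: genes found in that strain only})."""
    core = set.intersection(*gene_sets.values())
    specific = {}
    for strain, genes in gene_sets.items():
        # Union of the genes of every other strain.
        others = set().union(
            *(g for s, g in gene_sets.items() if s != strain)
        )
        specific[strain] = genes - others
    return core, specific


core, specific = partition_genes(TOY_GENE_SETS)
print("core:", sorted(core))
for strain, genes in specific.items():
    print(strain, "specific:", sorted(genes))
```

Genes shared by exactly two of the three strains (here `vf_shared`) fall into neither category, which is why the core and specific counts of a three-way comparison need not add up to each strain's total.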

Genome element prediction
Mobile elements including repetitive elements, genomic islands, and phages within genomes have driven bacterial horizontal gene transfer and evolution [29][30][31].
Repetitive elements (also known as repeated sequences) are DNA sequences present in multiple copies that do not have transcriptional activity. According to their structure, repetitive elements can be divided into interspersed repeat sequences (IRS) and tandem repeat sequences (TRS). Repetitive elements within genomes play an important role in the evolutionary process [32]. The SCCF01 genome was screened with the online program RepeatMasker, and the results (Table 3) showed that the total IRS percentage was 21.79% in chromosome I and 13.75% in chromosome II, whereas the total TRS percentage was 7.30% in chromosome I and 8.30% in chromosome II. Thus, the proportion of IRS was greater in chromosome I, while the proportion of TRS was greater in chromosome II. IRS are derived from transposable elements (TEs), which are largely responsible for horizontal gene transfer [33], and TRS can exhibit high mutation rates [34]. We therefore speculate that chromosome I contributes mainly to structural variation and chromosome II mainly to single-nucleotide changes in the evolution of strain SCCF01.
Genomic islands (GIs) are large genomic regions that mediate horizontal gene transfer in bacteria [31]. Sixteen GIs were predicted in the genome of strain SCCF01 by IslandViewer 4, and their locations are shown in Figure 2. Interestingly, 15 of the predicted GIs were located in chromosome I, whereas only one was located in chromosome II. This result indicates that chromosome I was more likely than chromosome II to acquire genes via horizontal gene transfer.
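Because chromosome I is about 2.5 times longer than chromosome II, the raw genomic island counts are easier to compare after normalizing by chromosome length. The following minimal sketch uses the island counts and chromosome lengths reported in this paper; the per-megabase density is a derived quantity that does not appear in the original analysis.

```python
# Chromosome lengths and predicted GI counts reported for strain SCCF01.
CHROMOSOME_LENGTH_BP = {"chromosome I": 3_213_040, "chromosome II": 1_272_975}
PREDICTED_GIS = {"chromosome I": 15, "chromosome II": 1}


def islands_per_mb(counts, lengths):
    """Predicted genomic islands per megabase for each chromosome."""
    return {name: counts[name] / (lengths[name] / 1e6) for name in counts}


density = islands_per_mb(PREDICTED_GIS, CHROMOSOME_LENGTH_BP)
for name, value in density.items():
    print(f"{name}: {value:.2f} predicted GIs per Mb")
```

This gives roughly 4.67 predicted GIs per Mb for chromosome I against 0.79 for chromosome II, so the enrichment on chromosome I is not explained by its larger size alone.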
Prophage sequences of 11 V. mimicus strains were predicted by PHAST, and their features are listed in Supplementary Table S4. Prophage prediction showed that five strains (SCCF01, ATCC33654, ATCC33655, SX-4, and VM573) contained intact prophage sequences, whereas the other strains contained only incomplete or questionable prophage sequences. Strain SCCF01 uniquely harbored two integrated prophages in its large chromosome (chromosome I), and their CDSs shared the greatest identity with Vibrio phage 12B12 [GenBank: NC_021070.1] (Figure 3). No prophage sequences were detected in chromosome II of strain SCCF01. Many recent studies have shown that prophages can mediate horizontal gene transfer [35,36]. The prophage prediction therefore indicates that chromosome I is capable of prophage-mediated horizontal gene transfer.
In general, the majority of mobile elements (repetitive elements, genomic islands, and prophages) were detected in chromosome I. These mobile elements are closely linked to horizontal gene transfer, which contributes to gene acquisition [33,35,36,37]. Therefore, chromosome I of the SCCF01 genome appears to have more plasticity than chromosome II and may have been enlarged as a result.

Collinearity analysis
MUMmer and LASTZ were applied for a genome-wide collinearity analysis between the SCCF01 genome and the standard strains (ATCC33654 and ATCC33655). The analysis suggested that the evolution of the SCCF01 genome structure is marked by interchromosomal rearrangements (Figure 4). Structural variation (translocation, inversion, deletion, insertion, and complex indel) is shown in Supplementary Figure 1. Although the genome-wide collinearity and the detailed structural variation can be visualized in this way, they are not sufficient to confirm evolutionary relationships.
Note: the upper and lower axes are the standard strain genomes, and the middle axis is the SCCF01 genome; the sense strand is shown as a yellow box and the antisense strand as a blue line; the color depth in the box shows the similarity of the alignment, and full-fit indicates 100% similarity.

Gene family analysis
A gene family is a set of several similar genes formed by duplication of a single original gene. Gene family statistics were obtained by clustering orthologous groups based on the protein sequences of SCCF01, ATCC33654, ATCC33655, VM223, and MB-451. All of the counts (number of genes, genes in families, unclustered genes, number of families, and unique families) were higher for strain SCCF01 than for the other strains (Table 4). In addition, the same counts were higher for the environmental strains (ATCC33654 and VM223) than for the clinical strains (ATCC33655 and MB-451). Gene gain and loss events might have occurred during the evolution of the genus Vibrio, and gene gain events were more frequent than gene loss events in the evolution of V. mimicus and V. cholerae [38]. Therefore, gene family analysis predicted that the evolutionary direction of strain SCCF01 was clinical strain → environmental strain → SCCF01 strain.
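The per-strain statistics compared above can be derived from a gene family clustering table along the following lines. This is a sketch under assumed definitions, which the paper does not spell out: a family is a cluster with at least two genes, an unclustered gene is a singleton cluster, and a unique family contains genes from one strain only. The toy clusters are hypothetical.

```python
# Hypothetical clustering table: each cluster is a list of (strain, gene_id).
TOY_CLUSTERS = [
    [("SCCF01", "g1"), ("ATCC33654", "g1"), ("ATCC33655", "g1")],
    [("SCCF01", "g2"), ("SCCF01", "g3")],   # paralogs found in SCCF01 only
    [("SCCF01", "g4")],                     # singleton, i.e. unclustered
    [("ATCC33654", "g2"), ("ATCC33655", "g2")],
]


def family_stats(clusters, strain):
    """Count genes, families, and unique families for one strain."""
    stats = {"genes": 0, "genes_in_families": 0, "unclustered": 0,
             "families": 0, "unique_families": 0}
    for members in clusters:
        own = [gene for s, gene in members if s == strain]
        if not own:
            continue
        stats["genes"] += len(own)
        if len(members) == 1:
            stats["unclustered"] += 1
            continue
        stats["genes_in_families"] += len(own)
        stats["families"] += 1
        if len(own) == len(members):
            # Every member of this family comes from the same strain.
            stats["unique_families"] += 1
    return stats


for name in ("SCCF01", "ATCC33654", "ATCC33655"):
    print(name, family_stats(TOY_CLUSTERS, name))
```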

Phylogenetic analysis
To determine the phylogenetic position of strain SCCF01, genome trees were constructed for the species V. mimicus (Figure 5) and for the genus Vibrio (Supplementary Figure 3) based on locally collinear block searching. The V. mimicus tree (Figure 5) showed that the isolates were roughly divided into two clusters: clinical V. mimicus (ATCC33655, SX-4, and MB-451) and environmental V. mimicus (CAIM1882, CAIM1883, ATCC33654, SCCF01, and VM223). The evolutionary relationships inferred from this tree suggest that SCCF01 is more closely related to the environmental isolates. The Vibrio genus tree (Supplementary Figure 3) placed strain SCCF01 within the V. mimicus cluster, which further confirms at the genome level that strain SCCF01 is V. mimicus.

Conclusions
First, genome analysis of V. mimicus strain SCCF01 revealed its basic genomic features. Information on virulence genes (adhesion, flagellar system, exotoxins, and secretion systems) was obtained by virulence gene annotation and will be useful for the development of gene-attenuated vaccines and for studies of the pathogenesis of this pathogen. Second, chromosome I of the SCCF01 genome appears to have more plasticity than chromosome II and may have been enlarged as a result. Finally, on the basis of the gene family and phylogenetic analyses, we speculate that strain SCCF01 has probably diverged from environmental strains.