Genomic epidemiology and heterogeneity of Providencia and their blaNDM-1-carrying plasmids

ABSTRACT Providencia as an opportunistic pathogen can cause serious infection, and moreover the emergence of multi-drug-resistant Providencia strains poses a potentially life-threatening risk to public health. However, a comprehensive genomic study to reveal the population structure and dissemination of Providencia is still lacking. In this study, we conducted a genomic epidemiology analysis on the 580 global sequenced Providencia isolates, including 257 ones sequenced in this study (42 ones were fully sequenced). We established a genome sequence-based species classification scheme for Providencia, redefining the conventional 11 Providencia species into seven genocomplexes that were further divided into 18 genospecies, providing an extensively updated reference for Providencia species discrimination based on the largest Providencia genome dataset to date. We then dissected the profile of antimicrobial resistance genes and the prevalence of multi-drug-resistant Providencia strains among these genocomplexes/genospecies, disclosing the presence of diverse and abundant antimicrobial resistance genes and high resistance ratios against multiple classes of drugs in Providencia. We further dissected the genetic basis for the spread of blaNDM-1 in Providencia. blaNDM-1 genes were mainly carried by five incompatible (Inc) groups of plasmids: IncC, IncW, IncpPROV114-NR, IncpCHS4.1-3, and IncpPrY2001, and the last three were newly designated in this study. By tracking the spread of blaNDM-1-carrying plasmids, IncC, IncpPROV114-NR, IncpCHS4.1-3, and IncpPrY2001 plasmids were found to be highly involved in parallel horizontal transfer or vertical clonal expansion of blaNDM-1 among Providencia. Overall, our study provided a comprehensive genomic view of species differentiation, antimicrobial resistance prevalence, and plasmid-mediated blaNDM-1 dissemination in Providencia.

Historically, the identification of certain Providencia species primarily relied on distinct biochemical reactions.These species are characterized by a positive reaction to phenylalanine deaminase and a negative reaction to lysine, ornithine decarboxylase, and arginine dihydrolase.Notably, the production of acid from D-mannose serves as an indicative trait.Within this context, P. rettgeri and P. stuartii are typically differentiated by their respective abilities to produce acid from trehalose and D-arabitol, with P. rettgeri uniquely producing acid from erythritol.However, for several conventional species, specific biochemical reactions are absent [14].At present, advancements in sequencing technology have promoted the use of 16S rDNA for identifying Providencia isolates, achieved through PCR amplification.However, this method has limitations, with high sequence similarities often resulting in misclassifications.Thus, a large-scale genomic investigation is important for the precise identification and classification of Providencia species, ensuring the accurate differentiation and understanding of each species within the context of their genomic profiles.
Providencia species are known to exhibit multi-drug resistance (MDR) due to their intrinsic resistance to penicillins and the first-generation cephalosporins [14], aminoglycosides [15], tetracyclines [16], and polymyxin [17].Moreover, extensive use and frequent misuse of antibiotics have led to the emergence of extended-spectrum β-lactamaseor even carbapenemase-producing Providencia strains, exacerbating the MDR problem in Providencia [18,19].It has been revealed that the frequent emergence of carbapenemase-producing Providencia strains is mainly due to the spread of bla NDM-1 gene [2,20].IncC plasmids that carry bla NDM-1 have been widely detected in P. stuartii from distrust Romanian [21] and Italy hospitals [22], and several additional Inc (incompatible) groups of bla NDM-1 -harbouring plasmids have been identified in Providencia [18,[23][24][25][26], indicating diverse types of plasmids contribute to the spread of bla NDM-1 in Providencia.However, a genomic view of the dissemination of bla NDM-1 -carrying plasmids among a large collection of Providencia strains is lacking.
In this study, we performed genome sequencing of 257 Chinese Providencia isolates (42 of them obtained complete genome sequences).Together with the 323 Providencia genomes available in GenBank, we used a total collection of the 580 global Providencia sequenced isolated for further genomic epidemiology analyses, aiming to establish a genome sequencebased classification scheme, disclose the prevalence of antimicrobial resistance and dissect the plasmidmediated bla NDM-1 dissemination in Providencia.

Bacterial isolates and identification
From 2010 to 2018, 22.798 bacteria were collected in over 200 hospitals in China by our group, most of which were identified as Klebsiella (31.31%),Escherichia coli (21.34%),Pseudomonas (14.51%),Acinetobacter (10.13%).Among these 22,798 isolates, 303 isolates were initially identified as Providencia using Vitek or mass spectrometry in hospitals.After eliminating 6 unsuccessfully cultured isolates and 27 ones without the Providencia 16S rDNA gene, 270 isolates were obtained.Bacterial genomic DNAs were then extracted using a Qiagen UltraClean Microbial DNA Isolation Kit, followed by sequencing by Illumina platform.13 isolates with poorly sequencing quality were subsequently excluded.257 Providencia isolates were finally enrolled in this study between 2010 and 2018, whose species were further identified by ANI analyzing their whole genome information (Figure S1).

Bacterial phenotypic resistance assay
The bacterial antimicrobial susceptibility was tested using BioMérieux VITEK 2 and interpreted as per the 2020 Clinical and Laboratory Standards Institute (CLSI) guidelines [27].The activity of class A/B/D carbapenemases in bacterial cell extracts was detected by a modified CarbaNP test [28].

Genomic DNA sequencing, sequence assembly, and annotation
All the 257 Chinese Providencia isolates were subjected to draft-genome sequencing using a pairedend library with an average insert size of 350 bp (ranging from 150 bp to 600 bp) on a HiSeq sequencer (Illumina, CA, USA).In addition, 42 (Table S1) of them were subjected to complete genome sequencing with a sheared DNA library with an average size of 15 kb (ranging from 10 kb to 20 kb) on a PacBio RSII sequencer (Pacific Biosciences, CA, USA).The quality control analysis of sequencing data was conducted using NanoPack [29] and FastQC (https:// www.bioinformatics.babraham.ac.uk/projects/fastqc/ ).Sequence assembly and annotation were performed as previously described [30][31][32].

Phylogenomic analysis and average nucleotide identity (ANI) analysis
The pairwise ANI values of the 580 global Providencia genomes were calculated using FastANI [33], and the ANI heatmap was generated by the python seaborn package.The Providencia genome sequences were aligned to the complete chromosome sequence (Gen-Bank accession number CP029736) of P. rettgeri AR_0082, and the core single nucleotide polymorphisms (SNPs) were identified by Mummer v3.2 (https://mummer.sourceforge.net/).All the SNPs in the repetitive DNA regions were identified and filtered by RepeatMasker (http://www.repeatmasker.org/).Based on the final core SNPs, the maximum-likelihood phylogenetic tree was constructed using RAxML [34] under the GTR model with a bootstrap iteration of 1000 and visualized using iTOL (https:// itol.embl.de).

Plasmid analysis
All the fully sequenced bla NDM-1 -carrying plasmids from GenBank (last accessed Dec 20, 2021) together with those determined in this study were used as the reference to assemble and align the draft sequences of the rest bla NDM-1 -carrying plasmids in the 580 global Providencia genomes using BLAST [35] and custom Perl scripts.The five Inc groups and core backbone rep (replication) and par (partition) genes were determined for all the bla NDM-1 -carrying plasmids in the 580 global Providencia genomes.To achieve high accuracy, the assembled draft plasmid sequences met the following three criteria [36,37]: the bla NDM-1 -embedded contigs had 100% query coverage and ≥99% identity with corresponding reference plasmids; the bla NDM-1 -carrying contigs and the repcarrying contigs of the same plasmid had similar sequencing depths; each draft plasmid sequence had ≥70% query coverage and ≥94% identity with the corresponding reference plasmids.The alignment rings of the five Inc groups were visualized by BRIG [38].

Statistical analysis
Statistical analyses were preformed using R package v.3.271(http://www.r-project.org).Chi-squared test or Fisher's exact test was used to determine the significant relationships between categorical variables.Wilcoxon test, a non-parametric test for the comparison of two groups, was adopted to determine the significant differences for the medians of two independent samples.

Data availability
The complete chromosome and plasmid sequences of the fully sequenced 42 isolates (including 33 bla NDM-1positive isolates and three bla VIM-4 -positive isolates) were submitted to GenBank with the accession numbers listed in Table S1.The 257-genome sequencing data have been uploaded to the SRA database under BioProject PRJNA1013098.The assembled genome sequences of the isolates were submitted to GenBank under BioProject PRJNA828374.

Genome sequence-based classification of Providencia into diversified genocomplexes and genospecies
The 257 Chinese Providencia isolates collected in this study (Figure 1 and Table S1) were determined with draft-genome sequences, among which 42 ones (including 33 bla NDM-1 -positive ones) were further determined with complete genome sequences.Together with the 323 Providencia genomes available in GenBank as of February 1st, 2022, a total collection of 580 global Providencia genomes from 32 countries were applied for further analysis, covering all the 11 conventional Providencia species (Figure 2).These 580 isolates include 328 P. rettgeri isolates, 137 P. stuartii isolates, 73 P. alcalifaciens isolates, 16 P. rustigianii isolates, eight P. heimbachae isolates, four P. vermicola isolates, three P. huaxiensis isolates, one P. sneebi isolate, one P. burhodogranaeriae isolate, one P. thailandensis isolate, and one P. friedericiana isolate.There were seven additional Providencia isolates that could not be classified into any of these conventional species due to their low genetic similarity to these species.
Compared to the Providencia isolates from other countries, the Chinese isolates presented significantly higher resistance gene ratios in C01.S01 (P-value = 0.0041) and C01.S02 (P-value = 0.0015) (Figure S3), indicating different resistance-gene prevalence profiles between China and other countries [22,39,40].Providencia strains harboured abundant antimicrobial (a) A maximum-likelihood phylogenetic tree based on the 13,499 core SNPs from the 580 genomes.Providencia AR_0082 was used as the reference chromosome for SNP calling.The confidence value in the major branch nodes were over 95%.The conventional classification was marked on the circle of the phylogenetic tree, and the typical reference isolates of the 11 conventional Providencia species was marked in asterisks with different colours.Seven primary phylogroups (i.e.genocomplexes) with significant phylogenetic distance were distinguished in different colours.(b) A heatmap of pairwise ANI of the 580 isolates.The pairwise ANI values of the 580 isolates range from 79.73% to 99.99%.The phylogenetic trees were used as the references and placed on the upper and right sides of the ANI heatmap.The order of isolates on the heatmap was corresponding to the isolate position on the phylogenetic tree.The genocomplexes and genospecies were cross-determined according to the topology of the phylogenetic tree and the ANI heatmap.resistance genes and might serve as reservoirs for resistance genes, exacerbating the spread of resistance genes among different bacteria.Two carbapenemase genes bla NDM-1 (33/257, 12.84%) and bla IMP (3/257, 1.17%) were identified in our 257 Chinese Providencia isolates, and their class B carbapenemase activities were further validated by CarbaNP and meropenem drug susceptibility tests (Table S1).In addition, three carbapenemase genes bla NDM-1 (82/580, 14.13%), bla IMP (38/580, 6.55%), and bla VIM (7/580, 1.21%) were identified in the 580 global Providencia genomes (Figure S2).Carbapenemase genes were detected in only C01 from China, while they were detected in C01 and C05 from other countries (Figure S3) [18,21,26,41].Overall, carbapenemase genes were detected at relatively low frequency and bla NDM-1 represented the major disseminated carbapenemase gene in Providencia.
By analysing the distribution of the same Inc group plasmids with the identical local bla NDM-1 genetic environment on the phylogenetic tree, we observed that IncC plasmids with Tn125-3, Inc pPROV114-NR plasmids with Tn125-3, Inc pPrY2001 plasmids with Tn125, and Inc pPrY2001 plasmids with Tn125-3 were contained in distinct lineages of C01.S01, C01.S02, and C01.S05, presenting the evidence of parallel horizontal transfer of various Tn125 derivatives into existing bla NDM-1 -Figure 6. Dissemination of bla NDM-1 -carrying plasmids in genospecies C01.S01, C01.S02, and C01.S05.A recombination-free core SNP-based phylogenetic tree was constructed using a collection of 116 Providencia isolates, which harboured IncC, Inc pCHS4.1-3, Inc pPrY2001 , Inc pPROV114-NR , and IncW plasmids (either carrying bla NDM-1 or not), from the three dominant genospecies C01.S01, C01.S02, and C01.S05.All the 116 genomes were aligned to the complete chromosome sequence of P. rettgeri AR_0082, and the 168,349 core SNPs were identified by Mummer v3.2.ClonalFrameML was used to infer and then remove recombination, finally identifying 28,582 recombination-free core SNPs.The PROV023 isolate from C02.S03 served as the outgroup to determine the evolutionary relationships/distances for C01.S01, C01.S02, and C01.S05.The confidence value in the major branch nodes were over 95%.The isolates fully sequenced in this study were marked with black asterisks.The first two columns, from left to right, showed the isolated hospitals and the isolated dates.The plasmid-carrying isolates were marked with the blue block or the red block (corresponding to presence of bla NDM-1 or not respectively) in the remaining five columns.The local bla NDM-1 genetic environments were shown on the left side and the homologous regions were marked in shadow.The lines were drawn between the plasmids and the local bla NDM-1 genetic environments, representing which genetic environment was located in each indicated plasmid.

Discussion
The conventional species classification measure based on 16S rDNA sequences causes the frequent misclassification due to high similarity of 16S rDNA sequences among different Providencia species [4,11,47,48].Although additional marker sequences such as housekeeping genes have been used to perform Providencia taxonomic assignment [47,49], the big problem is the insufficient genetic information for resolving the genetically diverse Providencia species.It is difficult to exactly assign a Providencia strain into the existing species even based on its genome sequence due to the comparatively longer distance to the conventional Providencia species [47].In this study, a genome sequence-based species classification scheme for Providencia based on a collection of 580 global Providencia genomes was established to redefine the 11 conventional Providencia species into seven primary genocomplexes and 18 secondary genospecies.Compared to previous genome studies of Providencia [13,47], this study enrolled the largest Providencia genome dataset to establish the species classification scheme, which provided an extensively updated reference for Providencia species identification and the improved accuracy of genome sequence-driven species discrimination.The combined use of phylogenetic distances and pairwise ANI values across inner-and inter-species would provide a paradigm strategy for microbial species classification.
As shown in this study, Providencia isolates especially those from the three prevalent genospecies C01.S02, C01.S05, and C05.S01 carried diverse and abundant antimicrobial resistance genes and displayed high resistance ratios against multiple classes of drugs.The rapid development of MDR in Providencia might be primarily due to the extensive use and abuse of antibiotics [39,50].It has been reported that there is a correlation between the increased colistin consumption and the increasing prevalence of drug resistant Providencia infections [39], and that the increased usage of colistin might have exerted significant pressure on the selection of Providencia in Romania [50].Providencia might serve as the reservoir of resistance genes, enabling the transfer of resistance genes within Providencia and to other bacteria [51,52].
Our study identified a ratio of 14.14% (82/580) of bla NDM-1 -positive Providencia isolates, indicating an emergence of challenge for clinical therapeutics caused by carbapenem resistance [18,21,53,54].We further analysed the genetic basis of dissemination of bla NDM-1 in Providencia.On the one hand, we identified five Inc groups of bla NDM-1 -carrying plasmids namely IncC [21], IncW [41,55], Inc pCHS4.1-3, Inc pPROV114-NR , and Inc pPrY2001 , among which the last three were newly designated.IncC plasmids are mostly prevalent among Providencia, which is consistent with the previous study [21,22].All these five Inc groups of bla NDM-1 -carrying plasmids contain the complete conjugation regions, which would accelerate the spread of bla NDM-1 within Providencia and to other bacteria [21,37,40,56,57].On the other hand, we observed the parallel horizontal transfer of local bla NDM genetic environments (manifesting as intact Tn125 and its various truncated derivatives) into the existing bla NDM -empty plasmids belonging to different Inc groups.In addition, our previous studies have reported the capture of local bla NDM genetic environments by the accessory genetic elements embedded within Providencia chromosomes [42,45].The exogenous integration of intact Tn125 is mostly likely due to transposon jumps, while that of its truncated derivatives should be driven by homologous recombination [58].
In summary, we conducted a genomic epidemiology analysis on the 580 global sequenced Providencia isolates, including 257 ones sequenced in this study.Firstly, a genome sequence-based species classification scheme for Providencia was established, and the conventional 11 Providencia species were redefined into seven primary genocomplexes and 18 secondary genospecies, providing an extensively updated reference for Providencia species discrimination based on the largest Providencia genome dataset to date.Secondly, the prevalence of drug susceptibility and antimicrobial resistance genes were screened among all the genocomplexes/genospecies, disclosing the presence of diverse and abundant antimicrobial resistance genes and high resistance ratios against multiple classes of drugs in Providencia.Finally, the distribution of bla NDM-1 -carrying plasmids and their local bla NDM-1 genetic environments was tracked among the three prevalent genospecies, revealing horizontal transfer or vertical clonal expansion of bla NDM-1 in Providencia driven by various Inc groups of plasmids.Data

Figure 1 .
Figure 1.Spatial-temporal distribution of our 257 Chinese Providencia isolates.(a) Distribution of the 257 isolates among 12 provinces of China.(b) Distribution of the 257 isolates among different years, provinces, and hospitals.The 37 hospitals (designated H1 to H36) were assigned to the 12 provinces with different colours.

Figure 2 .
Figure 2. Spatial-temporal distribution of the 580 global Providencia isolates.(a) Spatial distribution of the 580 global Providencia isolates.Different colours represent the number of isolates.The pie chart shows the percentages of our 257 Chinese Providencia isolates and additional 22 Chinese isolates from GenBank.(b) Distribution of the 580 global Providencia isolates in different years and countries.The heatmap displays the number of isolates in each year and country.The total numbers of isolates collected in a year or in a country were shown in the bar charts on the right or top of the figure, respectively.The countries located on the same continent were marked in the same colour at the bottom of the figure.

Figure 3 .
Figure 3. Genome sequence-based classification of the 580 global Providencia isolates.(a)A maximum-likelihood phylogenetic tree based on the 13,499 core SNPs from the 580 genomes.Providencia AR_0082 was used as the reference chromosome for SNP calling.The confidence value in the major branch nodes were over 95%.The conventional classification was marked on the circle of the phylogenetic tree, and the typical reference isolates of the 11 conventional Providencia species was marked in asterisks with different colours.Seven primary phylogroups (i.e.genocomplexes) with significant phylogenetic distance were distinguished in different colours.(b) A heatmap of pairwise ANI of the 580 isolates.The pairwise ANI values of the 580 isolates range from 79.73% to 99.99%.The phylogenetic trees were used as the references and placed on the upper and right sides of the ANI heatmap.The order of isolates on the heatmap was corresponding to the isolate position on the phylogenetic tree.The genocomplexes and genospecies were cross-determined according to the topology of the phylogenetic tree and the ANI heatmap.

Figure 4 .
Figure 4. Prevalence of the 580 global Providencia isolates in different genospecies.(a) The prevalence of isolates in the major geographic regions.The links on the circle represented the composition of the isolates according to geographic region or genospecies.The edge size on the circle corresponds to the number of isolates.(b) The percentage of isolates collected from different geographic regions in each genospecies.The numbers on the y-axis represent the percentage values of the isolates from different geographic regions.(c) The percentage of isolates from different genospecies in each geographic region.The numbers on the xaxis represent the percentage values.

Figure 5 .
Figure 5. Prevalence of antimicrobial resistance genes in the 580 global Providencia isolates.(a) Shown was the rectangular format of the core SNP-based phylogenetic tree in Figure 3.The antimicrobial resistance genes harboured in the isolates were marked with blue blocks in the columns.(b) Bar charts showed the detection ratios of antimicrobial resistance genes in different genospecies.The corresponding relationships between drugs and resistance genes were shown.

Table 1 .
Designation of genocomplexes and genospecies in Providencia.
*Designated in this study.#Collectedand fully sequenced in this study.$ Complete genome sequence.