Genetic diversification of persistent Mycobacterium abscessus within cystic fibrosis patients

ABSTRACT Mycobacterium (M.) abscessus infections in Cystic Fibrosis (CF) patients cause a deterioration of lung function. Treatment of these multidrug-resistant pathogens is associated with severe side-effects, while frequently unsuccessful. Insight on M. abscessus genomic evolvement during chronic lung infection would be beneficial for improving treatment strategies. A longitudinal study enrolling 42 CF patients was performed at a CF center in Berlin, Germany, to elaborate phylogeny and genomic diversification of in-patient M. abscessus. Eleven of the 42 CF patients were infected with M. abscessus. Five of these 11 patients were infected with global human-transmissible M. abscessus cluster strains. Phylogenetic analysis of 88 genomes from isolates of the 11 patients excluded occurrence of M. abscessus transmission among members of the study group. Genome sequencing and variant analysis of 30 isolates from 11 serial respiratory samples collected over 4.5 years from a chronically infected patient demonstrated accumulation of gene mutations. In total, 53 genes exhibiting non-synonymous variations were identified. Enrichment analysis emphasized genes involved in synthesis of glycopeptidolipids, genes from the embABC (arabinosyltransferase) operon, betA (glucose-methanol-choline oxidoreductase) and choD (cholesterol oxidase). Genetic diversity evolved in a variety of virulence- and resistance-associated genes. The strategy of M. abscessus populations in chronic lung infection is not clonal expansion of dominant variants, but to sustain simultaneously a wide range of genetic variants facilitating adaptation of the population to changing living conditions in the lung. Genomic diversification during chronic infection requires increased attention when new control strategies against M. abscessus infections are explored.


Introduction
More than 30 species of nontuberculous mycobacteria (NTM) are known to cause infections in humans [1], mostly in persons with immunodeficiency or underlying diseases such as Cystic Fibrosis (CF) [2]. NTM prevalence in CF populations is increasing [3], which can in part be attributed to medical progress resulting in extended life expectancy of CF patients. Particularly, infections with Mycobacterium abscessus (MABS), also named Mycobacteroides abscessus, pose a threat to CF patients as they frequently lead to lung function decrease [4] and pose a risk in lung transplantation [5]. The environment has generally been assumed to be the source of NTM infection.
However, recent findings show that the majority of MABS in CF patients belong to three phylogenetic clusters [6] suggesting human-to-human transmission as an additional potential infection route. MABS is characterized by its extreme resistance toward antibiotics [7], which necessitates long-lasting combination therapy. Despite such protracted therapy causing severe side effects, low culture conversion rates of typically 40-50% have been reported [8]. Because of the serious side effects of the antibiotic therapies, the balance between their harm and benefit for patients must be carefully weighed. Although information on genomic diversity of MABS is now available [6], prognostic genomic markers for disease progression have not yet been identified. A study by Shaw et al. (2019) has described the genetic diversity of MABS isolated from different body sites of two CF patients shortly (up to 59 days) after lung transplantation [9]. Very recently, Bryant et al. (2021) investigated MABS evolution and proposed a model for pathogenic evolution of MABS [10]. Apart from these studies, knowledge on in-patient evolution of MABS during chronicity is still scarce. Using respiratory samples from CF patients from a German CF treatment center, the focus of the present study lies on characterizing the population structure of MABS in CF patients and analyzing genetic variation of MABS evolving within patients during long-term chronic infection.

Isolation and identification of NTM from Cystic Fibrosis patients
The 42 patients involved in the study were recruited from the CF Center at the Charité-Universitätsmedizin Berlin in Germany from 2013 to 2018. Within this period, the center treated 16 CF patients with NTM-PD. Patient characteristics are summarized in Supplementary Table S1. Respiratory samples (sputum, BAL) were taken in the CF Center, if a patient had an unclear decline in pulmonary function tests that did not respond to bacteria-targeted antibiotics or during the annual checkup. NTM were isolated from these samples by the Robert Koch Institute. Permission for the study was obtained from the ethics committee of the Charité -Universitätsmedizin Berlin (EA2/093/12). Written consent had been obtained from all patients.
Isolation of NTM from sputum or BAL using Nalc/ NaOH was performed as described in [11]. NTM colonies isolated from respiratory samples were purified at least twice by spreading single colonies on agar plates. NTM species were determined by PCR (DreamTaq DNA Polymerase, Thermo Fisher Scientific) and sequencing (ABI BigDyeTM 3.1, Thermo Fisher Scientific) of 16S rDNA and/or ITS. Primers used are listed in Supplementary Table S2.

Determination of MIC
Minimal inhibitory concentrations (MIC) were determined using the Sensititre TM plates (TREK Diagnostic Systems, ThermoFisher Scientific) according to the instructions of the provider. Interpretation of MIC values followed the CLSI guidelines [12], M62.

Whole genome sequencing
For Illumina sequencing, paired-end DNA libraries were constructed using the Nextera XT DNA kit (Illumina, San Diego, CA, USA) according to the man-ufacturer´s protocol. The pooled library was prepared as recommended by the Illumina HiSeq v3 reagent preparation guide and loaded onto a cartridge (V3 chemistry) generating a 300 bp paired-end output. MinION one-dimensional (1D) libraries were constructed, using the SQK-RBK004 kit (Nanopore technologies, Oxford, UK), and loaded according to the manufacturer's instructions onto an R9. 4 [6]. For quality control of NGS data, the in-house pipeline QCumber ( v 2 . 1 . 1 ) ( h t t p s : / / g i t l a b . c o m / RKIBioinformaticsPipelines/QCumber) was used. QCumber employs the software tool Trimmomatic [13], which was used for Illumina adapter removal.
All draft genomes were annotated using Prokka [14]. The determination of the maximum common genome (MCG) alignment was done by identifying the genes present in all genomes [15]. Coding sequences were clustered based on the parameters sequence similarity (min. 70%) and coverage (min. 90%) and defined the 2,085 genes that were present in each genome while fulfilling the threshold parameters as MCG. Next the allelic variants of these genes were extracted from all genomes by a BLAST-based approach, aligned individually for each gene and then concatenated, which resulted in an alignment of 2.089 Mbp. This alignment was used to calculate a maximum likelihood-based phylogeny with RAxML v.8.2.10 with 100 bootstraps under the assumption of the gtr-gamma DNA substitution model [16]. ClonalFrameML v1.11 [17] was used to correct for recombination events. The phylogenetic tree was visualized together with the distribution of accessory genes using phandango [18].
Interpretation of SNP distances with respect to probability of patient-to-patient transmission was done according to the proposition from Bryant et al., who had proposed less than 20 SNPs to indicate probable recent transmission of MABS, 20-38 SNPs possible recent transmission and more than 38 SNPs no recent transmission [6]. Further confirmation was obtained by comparing the accessory genomes of the isolates.
For gene variation analysis, the genome sequences of 30 MABSa isolates from one chronically infected patient collected in the years 2013 to 2017 were used to extract non-synonymous small nucleotide variants (nsSNV). The MinION sequence data together with Illumina data from an isolate originating from the first sample from this patient were used as a reference to identify nsSNVs in all other isolates compared to this genome sequence. To this end the MinION fast5 output files were demultiplexed with Deepbinner (v.0.2.0) [19]. Basecalling and barcode trimming was performed using Guppy (v.3.1.5) (Community.nanoporetech.com). The read quality was checked using pycoQC (v.2.3.1.2) [20], followed by (1) denovo assembly of the initial sample, (2) reordering contigs against MABSa ATCC 19977 reference sequence, (3) mapping of the remaining samples from this patient against the final assembly, (4) gene annotation, (5) multi sample variant calling using the initial sample as starting point, (6) SNV and Indel filtering and (7) variant annotation. The assembly was performed using Unicycler (v0.4.7) [21] and reordered with progressiveMauve (v2.4.0) [22] against the MABSa reference sequence. Afterward the remaining samples of the patient were mapped against the assembly using BWA (v0.7.15-r1140). Genes were annotated using prokka (v1.13.3). A multi sample variant calling was performed using GATK (v.4.1.2.0). SNVs and Indels were filtered according to GATKs best practice recommendations with few alterations. For SNVs the following filter was set: QD<2.0 ||

Predominance of MABS among isolated NTM
NTM were isolated from 16 of the 42 patients (Supplementary Table S1). MABS was the most frequently isolated NTM species (11 patients) followed by M. avium hominissuis (MAH) (six patients), M. intracellulare (three patients) and M. chimaera (one patient). Two patients were co-infected by MABS and MAH, one patient by MABS and M. intracellulare, one patient by MAH and M. intracellulare and one patient by MABS, MAH and M. intracellulare. 88 isolates from the 11 patients infected with MABS were further investigated. Seven of these patients were infected with the subspecies MABSa, three with the subspecies MABSm and one patient with the subspecies MABSb. One patient had a double infection with MABSa and MABSb. 43 isolates had a rough, 43 a smooth colony morphology and two were mixed. An overview on isolates used for whole genome sequencing including subspecies, colony morphology, isolation year and accession numbers is presented in Supplementary Table S3.

Absence of patient-to-patient transmission of MABS
Illumina NGS data were used for phylogenetic analysis of all 88 isolates from 34 respiratory samples of 11 MABSinfected patients (Supplementary Table S3). Figure 1 shows a Maximum Likelihood Tree corrected for recombination events based on 2085 core genome genes identified in the genomes of the 88 isolates and the reference strains.
The principle tree structure was confirmed by a Maximum Likelihood Tree based on the pan genome (Supplementary Figure S1).
The phylogenetic tree shows the presence of wellseparated MABS clusters each belonging to one patient (A, B, C, H, K, S) as well as the presence of highly similar isolates belonging to different patients (E, F and D, G, O). One representative of each global MABS cluster [6] was included in the phylogenetic tree. While none of the patient isolates formed a cluster with the MABSm global cluster strain BIR 948, isolates from patients E and F clustered with the global MABSa cluster strain RVI 21, whereas isolates D, G and O clustered with global cluster strain BIR 1034. Within MABSa strains up to 32 SNPs between different isolates belonging to one patient were observed in the core genome. Isolates belonging to different MABSa strains within the same global cluster exhibited SNP distances of at least 91 SNPs (cluster D/G/O/BIR 1034) or at least 78 SNPs (cluster E/F/RVI 21), which according to a proposition from Bryant et al. [6] argues against transmission of MABS among patients of the  treatment center. Absence of patient-to-patient transmission was further confirmed by considerable differences in the accessory genomes ( Figure 1).

Genetic diversity evolving during persistent MABSa infection
The longest tracking time and highest sample numbers were obtained for patient C. We chose isolates from this patient for analysis of in-patient diversity of MABS. The first respiratory sample obtained from this patient was taken in the month after diagnosis of MABS infection, followed by samples after five, 10 Figure 1 and Supplementary Figure S1). The Maximum Likelihood Tree in Figure 1 shows that these isolates form a cluster separated from the isolates from the other patients. One of the isolates obtained from the first sample (isolate 09-13-3) was additionally sequenced by MinION technology to complete the genome and provide a baseline reference for further analysis. Supplementary Table S4 summarizes the statistics for genome sequence assembly of the completed genome from isolate 09-13-3. The genome from isolate 09-13-3 comprised a chromosome of a size of 4.949.160 bp and a plasmid of 24.978 bp. This plasmid (pMabs-09-13, plasmid map in Supplementary Figure S2 A) exhibits an identity of 99.86% (with a coverage of 100%) to plasmids pGD42-1 (accession CP065281.1), pGD69A-1 (accession CP065270.1), and pGD69B-1 (accession CP065267.1) described by Dedrick et al. [24]. Supplementary Figure S2 B shows an alignment of plasmids pMabs-09-13 and pGD42-1.
SNV calling using the MinION genome sequence from 09-13-3 as baseline was applied to identify gene variations occurring in this patient over the 4.5-years monitoring period. Only non-synonymous (ns) SNVs and larger deletions as well as mutations in the rRNA genes were further considered. Fifty-three genes exhibiting non-synonymous mutations with respect to isolate 09-13-3 were identified. These genes and the affected isolates are listed in Table 2.
A visualization of the frequency and chronology of chromosomal gene mutations occurring in 29 isolates compared to the reference 09-13-3 is provided in Figure 2. The number of genes with non-synonymous mutations was higher from 18 months after MABS diagnosis compared to the period before. Occurrence of dominating clones exhibiting and sustaining specific combinations of mutations was not observed during the observation time period. However, a clear trend was observed with respect to the loss of the plasmid pMabs-09-13 over time. The strain 09-13-3 originally contained a plasmid of a size of 24.978 bp (Supplementary Figure S2). This plasmid was lost from the MABS strain in the course of chronic infection. Specifically, all isolates contained the plasmid until ten months after MABS diagnosis. However, it remained absent in all isolates sampled at and after 38 months.
Network analysis with the set of mutated genes by STRING (https://string-db.org/) identified ten functionally enriched PFAM protein domains (Table 3).
Functionally enriched genes comprised a cluster involved in cell wall synthesis and resistance toward ethambutol (embA and embC) and a cluster involved in synthesis of glycopeptidolipids (GPL) (eryA, mps2 and mps1). Also, MAB_4690c belonging to a second GPL-like gene cluster from MABS [50] was among the functionally enriched genes as well as MAB_2255 and MAB_2256, which also encode a probable non-ribosomal peptide synthetase and a probable polyketide synthase. Furthermore, betA (probable glucose-methanol-choline oxidoreductase) and choD (putative cholesterol oxidase) were among enriched genes.
The mutations in the GPL synthesis genes [53] can explain the rough morphotype of all rough isolates. Confirmation of the impact of mutations in GPL synthesis genes on GLP composition was obtained by thin layer chromatography (TLC) with GPL extracted from a smooth and four rough isolates exhibiting different types of mutations (Supplementary Figure S3).
Comparison of MICs from ten rough and ten smooth paired isolates originating from the same respiratory samples showed different median MICs for Amikacin, Cefoxitin, Imipenem and Linezolid. Interestingly, rough colonies showed higher median MIC values to three of these antibiotics (Amikacin, Cefoxitin and Imipenem), while they exhibited a lower median MIC to one of them (Linezolid) (Figure 3). Additional information on MICs of MABS isolated from patient C is provided in Supplementary Table S5.

Diversification in virulence-and resistance associated genes during chronic lung infection
Out of the 53 genes that had developed nonsynonymous gene variations, at least 17 genes or their homologs in other mycobacteria were published to be involved in mycobacterial virulence (Table 2). These genes or their homologs exert an impact (i) on survival in the host or host cells [MAB_0033c (pknB), Table 1 Chronology of M. abscessus isolation from respiratory samples from patient C, clinical disease parameters, and antibiotic treatment.  Zn acquisition [42] (Continued ) Virulence [51] (Continued )     Table 2. Thirteen of the 53 MABS genes or their homologs in other mycobacterial species are known to be related to antibiotic resistance (Table 2). It has been shown that genes or homologs in other mycobacteria to MAB_0173, MAB_0186c (embA), and 0189c (embC) are related to Ethambutol resistance. Clofazimine resistance was associated to genes MAB_0146c (crp), MAB_2299c, and MAB_4099c (mps1). MAB_2299c had additionally been associated with Bedaquiline resistance. Mycobacterial porin genes such as MAB_1080 were related to resistance toward Fluoroquinolone, Chloramphenicol and Clarithromycin. Clarithromycin resistance may additionally be influenced by gene MAB_1881c and rrl. The mutation that was identified in the rrl gene (position 2270, A/C) is known to confer acquired Clarithromycin resistance.
PknB from mycobacteria was shown to impact resistance toward ß-lactams and Rifampin. References are provided in Table 2.

Discussion
MABS is a highly problematic multi-drug resistant pathogen. Despite protracted combination therapy accompanied by severe side effects, only low conversion rates of typically 40-50% are reported [8] calling for more personalized treatment and also consideration of the mycobacterial population dynamics. Accordingly, the present study focused on exploring the strategy of chronic MABS to adapt to the lung environment in CF patients.
MABS predominated NTM infections in our study group, followed by MAH. The MABS subspecies distribution with 63.6% of subspecies abscessus, 27.3% of subspecies massiliense and 9% of subspecies bolletii was highly similar to the distribution found in the global study from Bryant et al [6].
Five of the eleven MABS-infected patients carried strains belonging to one of three global human transmissible clusters, that have been reported to be more virulent and Figure 3. Comparison of MICs from smooth and rough isolates from patient C. Ten smooth and ten rough M. abscessus isolates from nine serial respiratory samples were tested using the Sensititre system (TREK diagnostics system, ThermoFisher Scientific). Out of the 13 antibiotics available in the sensititre panel three (Cefoxitin, Imipenem and Linezolid) showed statistically significant differences in MIC between the smooth and rough isolates from this patient (Mann Whitney Test, *: P < 0.05, **: P < 0.01, ***: P < 0.001). Bars indicate the median values. resistant and at the same time more frequently associated with chronic disease compared to sporadic strains [6]. Nevertheless, the results from the SNP analysis argued against transmission of MABS between patients in the study group. This was additionally supported by comparison of the accessory genomes of the isolates. Comparison of accessory genomes for discrimination of closely related MABS isolates was also proposed by Davidson [54] and Doyle and colleagues [55]. There is meanwhile a greater number of publications (e.g [55,56,57]) reporting the presence of global cluster strains in patient cohorts without epidemiologic evidence for transmission among patients. The absence of patient-to-patient transmission endorses infection control measures in the CF center in Berlin which involve among others spatial and/or temporal separation of patients with NTM in respiratory tract and wearing a face mask during the entire clinical stay [58].
Comparative genome analysis of 30 isolates from 11 serial samples from a chronically infected patient identified 53 genes with non-synonymous variations. Although colony picking did not allow representation of the whole MABS population in the samples, it enabled us to identify mutations and their temporal appearance during chronic infection. Additional to chromosomal mutations, the plasmid from this strain present in the beginning was lost during persistent infection. This may be explained by the fitness cost of plasmid maintenance in stress conditions present in the human host and is in good accordance to a study by Shoulah et al. [59], who found more plasmidderived genes in environmental compared to clinical isolates from MAH. Associations between molecular changes ( Table 2) and antibiotic treatment as well as clinical disease progression (Table 1) were not evident, which could be due to the multifactorial character of CF disease and the fact that the collection of sequenced isolates represents only a portion of derivatives present in the lung of the patient. Also, by sequencing of isolates from different body sites, Shaw and colleagues (2019) [9] had shown that sputum did not completely reflect in-patient diversity of MABS and also did not represent the overall antibiotic resistance profiles.
Isolates with mutations in these 53 genes are extremely valuable to study virulence and resistance mechanisms of persistent MABS, which are currently insufficiently investigated. Frequent genetic changes included those leading to GPL deficiency and rough colony morphotype. These mutations were associated with increased MICs for Amikacin, Cefoxitin and Imipenem and decreased MIC for Linezolid. In contrast to the present study comparing MICs of isogenic isolates, previous studies on strains isolated from different patients let to controversial outcomes [60,61]. Our study emphasizes the need to give more attention to the impact of morphotypes on drug resistance when searching for new anti-mycobacterial drugs.
Of the 53 genes exhibiting genetic diversity, at least 23 genes or their homologs in other mycobacteria have been assigned to be virulence-and/or resistanceassociated. Genetic diversity evolved in genes related to resistance to Ethambutol, Rifampin, Clofazimine, Bedaquiline, Fluoroquinolone, Chloramphenicol, Imipenem, Cefoxitin and Clarithromycin. Two isolates had acquired the A to C mutation at position 2270 (MABS numbering) in the rrl gene, a mutation known to confer acquired Clarithromycin resistance [62]. Mutations in rrl and furB were also found to exhibit inpatient diversity by Shaw et al. (2019) [9]. Our study very well complements their results. Shaw et al. took samples during a relatively short-time period and in the specific situation after lung transplantation. The big advantage of their study was the sampling of different body sites, enabling a more complete picture of inpatient diversity. On the other side, our study although limited to respiratory isolates, is prior when considering the long sampling period of 4.5 years that very well reflects mutation events owing to a chronic course of disease. Further, the study by Bryant et al. (2021) [10], confirms the relevance of genes exhibiting diversity as identified in the present study. Out of the 53 genes identified, six genes (rrl, GPL synthesis genes, ubiA, crp/fnr, tetR, and ideR) were found by Bryant et al. to exhibit in-patient diversity. Three of these (ubiA, crp/ fnr, and ideR) were shown to be involved in intracellular survival [10].
Interestingly, our results demonstrate that in-patient evolution did not bring forth fewer dominating subpopulations but rather fostered the coexistence of diverse mutant sub-populations.
We aimed to estimate the mutation rate of isolates from patient C using tip dating. However, when implementing two statistical tests, we failed at detecting a temporal signal in our sequences, precluding any meaningful rate assessment (data not shown).
The upsurge of genetic diversity possibly enables the population to adapt to changing living conditions as illustrated by the following example. MAB ecology combines the ability to survive both in the environment and in human airways, which offer disparate access to biometals such as zinc and iron. Such metals are, on the one hand, essential cofactors of enzymes and structural components of regulatory proteins. On the other hand, however, excessive concentrations thereof can be toxic. Therefore, a stringent regulatory system for homeostasis is required. The sputum from CF patients displays highly enriched metal concentrations of zinc and iron [63], which may favor genetic diversity within genes involved in zinc and iron homeostasis during chronic infection. Zinc uptake in mycobacteria is regulated by zur-smtB. Among the genes exhibiting gene diversity in persistent infection, zur is a zinc-binding repressor controlling genes involved in zinc uptake [37]. The porin MspD was shown to be induced by zinc starvation or zur deletion [37]. Moreover, MAB_1080, which was annotated as the porin protein MspD, was also among the genes exhibiting diversity upon chronic infection. Mutations were also identified in ideR, a gene that regulates transcription in response to iron levels [64], as well as in tcrX, which was shown to be up-regulated when M. tuberculosis was grown under iron-limited conditions [32]. Furthermore, Msp porins of rapid growing mycobacteria promote growth in nutrient-limited conditions by enhancing diffusion of small hydrophilic molecules into the cells. At the same time, however, they limit intracellular survival by increasing vulnerability to killing by reactive nitrogen [38,65]. Therefore, the mutation in MAB_1080 may additionally impact the survival of MABS in macrophages.
ChoD, a cholesterol oxidase, is needed in M. tuberculosis survival in macrophages [49]. A mutant deficient in the gamma-glutamyl-transpeptidase (GgtB) from M. tuberculosis, which is homologous to MAB_2788, was shown to be resistant to the toxic effects of Glutathione/ S-nitrosoglutathione and therefore better survived in macrophages [47]. Macrophages are not the only host cells for mycobacteria, some of which are also able to replicate in Type II alveolar epithelial cells (AECs). The tryptophan synthesis gene trpC from M. tuberculosis is strongly up-regulated during growth in AECs [46] and also the homologue from this gene was mutated in two of the MABS isolates.
MptA, EmbC and MmpL family proteins are involved in lipid/glycolipid synthesis. MptA is a mannosyltransferase necessary for synthesis of the mannan backbone from Lipomannan. EmbC from M. tuberculosis catalyses arabinosylation of Lipomannan to form Lipoarabinomannan (LAM), which is involved in immune response by interacting with TLR2 and mannose-receptor [29,43,66]. Interestingly, a longitudinal analysis by Kreutzfeld [67] of 6 MABS bolletii isolates from a CF patient collected over 11 years also identified mutations in the embABC operon during chronic infection.
In conclusion, our data indicate that the survival strategy of MABS in the CF lung is not toward the clonal expansion of few dominant variants but the preservation of heterogeneous subpopulations allowing adaptation to changing lung conditions. Similar future longitudinal studies involving other MABS strains and patients and including samples from different body sites [9] will further explore the range of variation of MABS in-patient evolution during chronic lung infection, which in future will facilitate the development of efficient treatment options.