The high prevalence of Clostridioides difficile among nursing home elders associates with a dysbiotic microbiome

ABSTRACT Clostridioides difficile disproportionately affects the elderly living in nursing homes (NHs). Our objective was to explore the prevalence of C. difficile in NH elders over time and to determine whether the microbiome or other clinical factors are associated with C. difficile colonization. We collected serial stool samples from NH residents. C. difficile prevalence was determined by quantitative polymerase chain reaction detection of the toxin genes tcdA and tcdB; microbiome composition was determined by shotgun metagenomic sequencing. We used mixed-effect random forest modeling to determine bacterial taxa whose abundance is associated with C. difficile prevalence while controlling for clinical covariates including demographics, medications, and past medical history. We enrolled 167 NH elders who contributed 506 stool samples. Of the 123 elders providing multiple samples, 30 (24.4%) yielded multiple samples in which C. difficile was detected; of all 167 elders, 78 (46.7%) had at least one C. difficile positive sample. Elders with C. difficile positive samples were characterized by increased abundances of pathogenic or inflammatory-associated bacterial taxa and by lower abundances of taxa with anti-inflammatory or symbiotic properties. Proton pump inhibitor (PPI) use was associated with lower prevalence of C. difficile (Odds Ratio 0.46; 95%CI, 0.22–0.99), and the abundance of bacterial species with known beneficial effects was higher in PPI users and markedly lower in elders with high C. difficile prevalence. C. difficile is prevalent among NH elders and a dysbiotic gut microbiome associates with C. difficile colonization status. Manipulating the gut microbiome may prove to be a key strategy in the reduction of C. difficile in the NH.


Introduction
Clostridioides difficile infection (CDI), the leading cause of gastroenterologic hospitalizations and associated deaths, 1 remains at a historically high level with hospital stays from the disease tripling over the past decade. 2 The elderly are disproportionately affected 3,4 with the rate of CDI being several fold higher in individuals 65 years of age and older 5 and an increased risk of 2% for each additional year after 65 years. 6 Not only are elders at increased risk of acquiring CDI but they also have higher rates of recurrence, complications, and death. 7 Elders living in nursing homes (NHs) are now the predominant group suffering from CDI. 3,8 On average, 40% to 50% of new CDI cases come from elders living in nursing homes. 9,10 Most NHs in the US have structured infection control and prevention programs; 11 however, environmental measures to control CDI, such as enforcing hand hygiene, contact precautions, and decontamination procedures, are employed only after CDI is identified. These measures have not been able to stem CDI concerns in NHs. 12 microbiome composition has higher proportions of the phylum Bacteroidetes and decreased abundances of other bacteria beneficial to human health, at the genus level. 16,17 Dysbiosis is a term describing a microbial imbalance or maladaptation on or inside the body and can be defined as either the loss or gain of bacteria which promote health or disease. 18,19 The microbiome of nursing home elders forms a dysbiotic pattern with increasing age, frailty, and malnutrition scores. 20 These microbial dysbiosis patterns have associations with different disease states; connecting these NH dysbiotic patterns to C. difficile colonization and attempting to correct them may serve as a means to prevent disease spread. Another target for CDI prevention is reducing the prevalence of C. difficile in the environment. The prevalence of C. difficile in stool is the highest among those living in nursing homes with 20% to 50% of residents affected, compared to 1.6% in the general community and 9.5% in the outpatient setting. 14,21 Higher prevalence of asymptomatic colonization with C. difficile is a well-documented source of new CDI cases with spread of the bacteria to vulnerable individuals; however, approaches to managing colonization as a means to prevent CDI are lacking. [22][23][24] Currently, there is a lack of knowledge concerning the risk factors contributing to C. difficile colonization in NH elders. A better understanding of how clinical factors and microbiome composition contribute to C. difficile colonization in NH elders would provide a novel tool for preventing CDI. Accordingly, we set out to follow longitudinally a cohort of elders from multiple NH facilities to investigate: 1) the patterns of C. difficile colonization; 2) the associations of C. difficile colonization with medication exposures and other clinical variables; and 3) the associations of C. difficile colonization with characteristics of the elder gut microbiome. Our findings contribute to the understanding of microbiome composition as a potential target to reduce CDI burden in the elderly.

Characteristics of the study subjects
Over a 3-year period, we enrolled and followed 167 NH elders from 5 different facilities collecting monthly samples, totaling 506 clinical samples.
Forty-four (26.35%) of the residents enrolled provided only a single stool sample for the following reasons: 23 (52.3%) elders were on a floor where multiple sample collection did not occur, 12 (27.3%) residents chose to withdraw from the study, 5 (11.4%) died after the first sample collection, 3 (6.8%) had their nurse stop collection due to social issues and, 1 (2.3%) moved out of the facility. Of the remaining, we had an average of 3.7 samples collected per elder. No residents included in this analysis were exposed to antimicrobials nor were hospitalized during the study period. The average participant age was 85.2 years (SD 9.1) of which 18.1% were male, 40 (23.9%) were hospitalized in the past year, and 31 (18.7%) had an antimicrobial exposure in the 6-months preceding enrollment. Both frailty and malnutrition were prevalent with an average clinical frailty score of 6.5 (s = 1.0) or halfway between moderate and severely frail, and a malnutrition indicator categorical score of 2.1 (s = 0.7). Of note, no residents experienced diarrheal symptoms nor were diagnosed with CDI during the course of this study.
A majority of NH elders have detectable C. difficile with one quarter of elders colonized over multiple timepoints
Of the 506 samples collected, 122 (24.1%) were positive for C. difficile. Over the course of the study, 78 (46.7%) elders had at least one time point in which C. difficile was detected in their stool. Across all five sites, the prevalence of C. difficile from the first elder stool sample taken was 24.0%. For elders with serial sampling, 52 (9.7%) elders had C. difficile positive samples following a previous C. difficile negative sample; 57 (10.6%) elders had C. difficile negative samples following a previous C. difficile positive sample. On average there was just over 1 month between sampling timepoints (48.8 days, SD 38.9). For elders who went from undetectable to detectable (U > D), there were on average 51.1 days between samples, compared with 40.0 days for elders who went from detectable to undetectable (D > U). Among elders with multiple positive samples, the average time difference between positive samples was 42.6 days (SD 25.0). There was no significant difference between the five sites in the percentage of serial samples that changed C. difficile status. Site 4, however, had a higher prevalence of C. difficile (p = .048), as determined from first samples collected per elder (Figure 1).
Of the 123 elders with multiple samples, there were 41 (33.3%) with a single positive sample and 30 (24.4%) with multiple samples positive for C. difficile. The number and percentage of residents colonized by C. difficile did not vary significantly by nursing home facility or by the floor/wing in which the elder lived. We did not observe any significant differences in resident demographics or clinical scores (both frailty and malnutrition) between colonized and non-colonized residents (Table 1). The only medication significantly associated with C. difficile colonization (single and multiple) was proton pump inhibitors (Table 1). Remarkably, elders taking a PPI daily had 63% lower odds of being colonized with C. difficile (OR 0.37, 95%CI 0.17–0.82). No other medication showed any significant association, and the only medical condition associated with colonization was a history of a cerebrovascular accident (CVA). Residents colonized with C. difficile had a non-statistically significant higher percentage of an antibiotic exposure within the preceding 6 months.
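For readers less familiar with how an odds ratio relates to a "percent lower odds" statement, the sketch below computes an odds ratio and a Wald 95% confidence interval from a 2 × 2 table. It is an illustration under stated assumptions: the counts are hypothetical (the Table 1 counts are not reproduced here), and the text does not say whether the reported interval was obtained this way or from a regression model.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed and colonized       b: exposed and not colonized
    c: unexposed and colonized     d: unexposed and not colonized
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only (not the study's Table 1).
or_value, ci_low, ci_high = odds_ratio_wald(a=10, b=20, c=30, d=20)

# An OR below 1 translates to (1 - OR) * 100 percent lower odds.
percent_lower_odds = (1 - or_value) * 100
```

With an odds ratio of 0.37, the same arithmetic gives the 63% lower odds quoted above; an interval whose upper bound stays below 1 is what marks the association as statistically significant at the 5% level.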

Dysbiotic microbiome profile associates with Clostridioides difficile colonization
Figure 1. Mean percentage of initial samples testing positive for C. difficile (prevalence) by nursing home facility site and subsequent samples that changed from undetectable to detectable (U > D) and detectable to undetectable (D > U). These data represent the percentage from the initial and all subsequent patient samples where the following sample changed status. Error bars depict 95% confidence intervals.
Using longitudinal shotgun metagenomic sequencing reads from our NH elders, profiled for microbial species abundances with Metaphlan2 25 in combination with demographic, clinical, and medication data, we sought to investigate which microbiome and clinical features associate with higher rates of C. difficile colonization in the NH environment. We identified a total of 789 species and, after applying a 10% prevalence cutoff, retained 218, which we used for the remainder of the analysis. We started by applying unsupervised learning methods, such as correspondence analysis and t-Distributed Stochastic Neighbor Embedding (t-SNE), and, as expected, found that interindividual variability accounted for the majority of the information in the data (Supplemental Figure 1). Therefore, in order to identify microbiome features that are associated with C. difficile colonization, while also accounting for patient-specific effects, we chose to apply mixed-effect random-forest regression. 26,27 This modeling approach has significant advantages compared to traditional multi-linear regression techniques, as it is agnostic to model structure (i.e., non-parametric regression), it does not need to meet common assumptions underlying classical regression techniques, and it is able to intrinsically perform ranked feature selection.
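The 10% prevalence cutoff mentioned above can be sketched as a simple filter over a species-by-sample abundance table. This is a minimal illustration, assuming that a species counts as present in a sample when its relative abundance is above zero and that the cutoff is inclusive; neither detail is stated in the text, and the function and species names are hypothetical.

```python
def filter_by_prevalence(abundance, min_prevalence=0.10):
    """Keep species detected in at least `min_prevalence` of samples.

    abundance: dict mapping species name -> list of relative abundances,
    one value per sample (all lists share the same sample order).
    """
    kept = {}
    for species, values in abundance.items():
        # Prevalence = fraction of samples with non-zero abundance (assumption).
        prevalence = sum(1 for v in values if v > 0) / len(values)
        if prevalence >= min_prevalence:
            kept[species] = values
    return kept

# Toy table with 20 samples per species (illustrative values only).
toy_abundance = {
    "species_common": [1.0] * 10 + [0.0] * 10,      # present in 50% of samples
    "species_borderline": [0.1] * 2 + [0.0] * 18,   # present in 10% of samples
    "species_rare": [0.2] + [0.0] * 19,             # present in 5% of samples
}
filtered = filter_by_prevalence(toy_abundance)
```

Filtering of this kind removes species seen in too few samples to model reliably, which is how the 789 detected species were reduced to the 218 carried forward.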
We modeled the relative abundance of every species as a function of patient's sex, age, PPI (yes/no), antibiotic in the last 6 months (yes/no), GABA Analogs (yes/no), Benzodiazepines (yes/no), Valproic Acid (yes/no), Thiazide diuretics (yes/no), Antiplatelet Medications (yes/no), Oral Medications for Diabetes (yes/no), and C. difficile prevalence (Never, One Time Point, Multiple Time Points, as 0, 1, 2). Because PPI was found to associate with lower C. difficile colonization (Table 1), we hypothesized that this could be microbiome-mediated and hence included in the modeling a C. difficile-PPI interaction term. Finally, the patient identifier was used as a random effect. The final model that we trained on the data was therefore of the form $X_{s,i} = f(\mathrm{CC}) + 1 \mid \mathrm{ID}$, where $X_{s,i}$ is the relative abundance of species $s$ in sample $i$, $f$ is a general non-linear function (the random forest) applied to the Clinical Covariates (CC) as fixed effects, and $1 \mid \mathrm{ID}$ corresponds to the random effect that accounts for multiple samples from the same patient. The advantage of this approach is that it allows training using multiple samples from the same patients without needing to average them out. We used permutation importance analysis to determine significance of covariates in predicting microbe abundances. 28 From the modeling, we identified 55 different bacterial species whose abundances were significantly associated with C. difficile colonization. We categorically stratified C. difficile colonization for
Figure 2. Results from mixed-effect random forest modeling identify bacteria significantly associated (BH adjusted <0.05) with C. difficile colonization in NH elders. (a) Barplots showing the average ± standard deviation of the abundance of microbes significantly associated with C. difficile colonization in patients with never, once, or multiple samples colonized by C. difficile. The average was calculated from each patient's average across multiple microbiome samples. Averaging across patients is for visualization purposes only, as the mixed-effect random forest modeling explicitly considered each sample from every patient. (b) Heatmap displaying the log fold change between the average abundance in patients once or multi-colonized over those never colonized by C. difficile. The heatmap shows that bacteria with known symbiotic or health-associated properties (including SCFA-producers from the Clostridium clusters IV and XIVa) are reduced at higher C. difficile colonization. Conversely, bacteria associated with gastrointestinal dysbiotic conditions are enriched in elders with higher C. difficile prevalence. Red indicates a fold change increase with C. difficile colonization status, while blue indicates a decrease. Species bolded are OTUs that overlap with
There are several patterns that emerge from this analysis. First, bacterial taxa that are beneficial to human health are enriched in elders without C. difficile. For example, multiple butyrate-producing organisms in the Eubacterium genus 29 along with taxa such as Odoribacter laneus, Alistipes shahii, and Parabacteroides distasonis 30-32 have higher relative abundance in elders with no C. difficile and progressively lower relative abundance in the Once and Multiple colonization groups. We confirmed that butyrate production capacity was reduced at higher prevalence of C. difficile colonization by repeating the same modeling analysis, this time predicting the abundance of metabolic pathways inferred by running humann2 33 on the metagenomic data. Among the 211 pathways significantly associated with C. difficile prevalence (p-value BH adjusted <0.05; Supplementary Figure S2), we found that both pyruvate fermentation to butyrate and the superpathway of Clostridium acetobutylicum acidogenic fermentation were reduced in elders colonized Once or Multiple times with C. difficile compared to None.
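The "BH adjusted" p-values used as the significance threshold throughout this section refer to the Benjamini-Hochberg false discovery rate procedure. The sketch below shows the standard step-up adjustment in plain Python; it is a generic illustration with made-up p-values, not the authors' code, and the function name is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Made-up p-values for four hypothetical features.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```

In this toy case only the first feature stays below 0.05 after adjustment, even though three raw p-values were below 0.05, which illustrates why the adjustment matters when hundreds of species or pathways are tested at once.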
Second, both inflammatory-type and pathobiont species (bacteria that cause or promote disease only when specific genetic or environmental changes have occurred) 34 were found to be at higher relative abundance among elders with C. difficile. Pathogens such as multiple Shigella species, Escherichia coli, and Bacteroides fragilis, [35][36][37] all increased in abundance, progressing from Never to Multiple positive samples. Interestingly, this also coincided with higher abundance of the superpathway of lipopolysaccharide (LPS) biosynthesis in Once and Multiple individuals (Supplementary Figure S2), which was recently shown to be mostly contributed by Bacteroidales and Enterobacteriales microbes, 38 and with an increase in the E. coli-derived superpathways of fatty acids and unsaturated fatty acids biosynthesis (Supplementary Figure S2). Inflammation-associated bacteria Bilophila wadsworthia, Bacteroides dorei, and Firmicutes bacterium, [39][40][41] also showed increased abundances in elders with multiple C. difficile positive samples. Lastly, taxa that are either resistant to or grow with exposure to bile exhibited significantly higher abundances among elders with multiple C. difficile positive samples. These include Alistipes putredinis, Anaerofustis stercorihominis, as well as Bacteroides and Bifidobacterium species. 30,36,42,43 Hierarchical clustering of the fold change in abundance between the Once and Multiple positive sample groups relative to the Never positive group indicates a reduction of symbiotic (mutually beneficial host-microbiome relationship) or health-associated taxa and an increase in inflammation-associated taxa with known ability to grow in the presence of primary bile acids, with Multiple C. difficile colonization (Figure 2b).
Sixty-six individuals were characterized by having samples both negative and positive for C. difficile presence. We therefore asked whether the microbiota profile changes in patients where there is a change in C. difficile status, both in terms of species abundances and overall diversity. We compared Shannon Diversity for samples negative and positive for C. difficile using mixed-effect modeling with patient ID as a random effect. We found no difference in diversity between samples positive and negative for C. difficile presence (Supplemental Figure 3A). We applied Mixed-Effect Random Forest Classification to predict C. difficile sample positivity as a function of microbial abundances. The model identified 11 species significantly associated with C. difficile sample positivity, and notably identified an increase in Akkermansia muciniphila and in B. vulgatus in C. difficile positive samples (Supplemental Figure 3B). Interestingly, higher A. muciniphila abundance has also been found in CDI elders compared to control groups. 44 As we determined that PPI is negatively associated with C. difficile prevalence in our bivariate analysis (Table 1), we hypothesized that this protection could be gut microbiota dependent. We therefore investigated the microbiota species that were characterized by having a significant (p-value BH adjusted <0.05) C. difficile-PPI interaction coefficient from the mixed-effect random forest modeling. Among these species, we looked for a subset displaying a greater abundance in PPI-treated elders without C. difficile compared to PPI-untreated elders and whose abundance decreased with increasing C. difficile prevalence. According to these criteria, we identified a set of nine species (Figure 3).
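The Shannon Diversity index compared between C. difficile positive and negative samples can be computed per sample from species abundances. The following is a minimal sketch, assuming the natural logarithm (the text does not state the base) and abundances that are non-negative; the function name and toy profiles are illustrative.

```python
import math

def shannon_diversity(abundances):
    """Shannon index H = -sum(p * ln(p)) over species with non-zero abundance."""
    total = sum(abundances)
    # Normalize to proportions so the input may be counts or relative abundances.
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)

# Toy profiles for illustration only.
even_sample = [25, 25, 25, 25]        # four equally abundant species
dominated_sample = [97, 1, 1, 1]      # one species dominates
single_species = [100, 0, 0, 0]

h_even = shannon_diversity(even_sample)
h_dominated = shannon_diversity(dominated_sample)
h_single = shannon_diversity(single_species)
```

An evenly distributed community reaches the maximum value ln(S) for S species, while a community dominated by one species scores close to zero. The per-sample values would then enter a mixed-effect model with patient ID as the random effect, as described above.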

Discussion
Nursing home elders demonstrated high C. difficile prevalence in our study with over half having at least one C. difficile positive sample and 1 in 4 elders having multiple positive samples. This colonization pattern suggests, and we further hypothesize, that a subset of elders in the NH may serve as a source of transmission to other individuals. In an attempt to better understand these individuals, we looked for clinical and microbiome associations. Among the clinical factors, we found that daily proton pump inhibitor exposure was associated with 63% lower odds of C. difficile colonization (i.e., more non-colonized residents were daily PPI users). Reduced abundances of symbiotic or butyrate-producing organisms and higher abundances of pathogenic, inflammatory, or bile acid growing bacterial species were associated with C. difficile colonization.

Clostridioides difficile colonization is common among NH elders
The prevalence of C. difficile colonization observed in our study is consistent with prior reports that have demonstrated a range of colonization in the NH environment with prevalence rates as low as 20% to over 50%. 14,21 We have expanded upon prior cross-sectional reports by following elders longitudinally. We demonstrate not only a high prevalence but also patterns of colonization where a subset of elders has multiple C. difficile samples over time. Elders with multiple positive samples may represent a group of long-term colonized elders who are more likely to serve as reservoirs of C. difficile relative to those with only one positive sample, or transiently colonized elders.
There are few studies of C. difficile colonization in the elderly, and fewer still involving elders living in nursing homes. The systematic review noted above did not reveal any association with PPI and C. difficile in the elderly. 45 Among older hospitalized adults, treatment upon admission with PPIs has not been shown to be associated with C. difficile colonization. 46 The use of PPI was also shown not to be associated with C. difficile colonization in a previous study with a lower number of subjects (68 long-term care elders), although a non-statistically significant higher percentage of colonized residents were on a PPI. 21 Our findings suggest that changes in the intestinal microbiome with acid-reducing medication use may be associated with a less favorable environment for C. difficile colonization.

A dysbiotic gut microbiome composition associates with Clostridioides difficile colonization
Elders in our study with Once and Multiple types of C. difficile colonization demonstrated a pattern of decreased abundances of symbiotic and/or butyrate-producing organisms and a consequent reduction in pathways leading to butyrate production. These elders were found to display increased abundances of pathogenic, inflammatory, or bile acid growing species, along with increased capacity for the production of known inflammatory molecules such as LPS.
Butyrate is a short-chain fatty acid that is known to contribute to the maintenance of the gut barrier functions and has both immunomodulatory and anti-inflammatory properties. 29 Thus, higher abundances of organisms that produce butyrate are considered to be beneficial. Among patients with CDI, butyrate-producing species have been shown to be depleted, 47 and butyrate protects against CDI in mouse models by reducing intestinal inflammation and increasing epithelial tight junctions. 48 Lack of butyrate-producing species has been linked to many other intestinal disorders such as Crohn's Disease and colorectal cancer. 29,49 A lack of butyrate-producing organisms in NH elders may contribute to a favorable environment for C. difficile colonization.
Multiple pathogenic bacterial species exhibit increased relative abundance in C. difficile colonized elders. These include B. fragilis, the most commonly isolated anaerobic pathogen, 36 Eggerthella lenta, a significant human pathogen that is often associated with serious gastric pathology, 50 E. coli, and multiple Shigella species, which have been reported to be present during CDI in elderly NH patients. 51,52 Inflammation-associated species were also present in higher abundances, especially among elders with multiple C. difficile positive samples. These include Proteobacteria bacterium, a commonly found species in inflammatory disorders, 53 Anaerostipes caccae, which is positively associated with CDI in Irritable Bowel Disease, 54 Bacteroides vulgatus, a species that has recently been identified as influencing neuroinflammatory signaling, 55 and Ruminococcus gnavus, which has been associated with a dysbiotic microbiota. 56 Taken together, higher abundances of pathogenic and inflammatory bacteria associate with C. difficile colonization.
Bile acids, detergent-like biological substances synthesized in the liver from cholesterol, play an important role in the physiology of intestinal bacteria and influence their functionality. 43 C. difficile is dependent upon accessing and modifying endogenous bile salts. 57 We observed increased abundances of species that grow when exposed to increasing concentrations of bile acids among elders with multiple C. difficile positive samples. Several members of the Bacteroides genus had higher abundances among C. difficile colonized elders. In general, Bacteroides is a bile resistant genus while Bacteroides fragilis, specifically, is known to play a key role in the enterohepatic circulation of bile acids. 36 We also found Bifidobacterium breve following the same trend as another species that survives in bile. 43 Interestingly, Bifidobacterium longum, which has one of the lowest survival rates in bile among the Bifidobacterium genus, 58 was found at lower abundances among C. difficile colonized elders. Other species following this trend of growth in bile and higher abundances in C. difficile colonized elders include Alistipes putredinis, 59 Anaerofustis stercorihominis, 42 Bilophila wadsworthia, 60 and the pathogens Shigella and Escherichia coli. 61 It is thought that decreases in secondary bile acids may provide a favorable environment in which C. difficile can grow and colonize. 62

Proton pump inhibitor use associates with protection against C. difficile colonization via commensal gut bacteria
Proton pump inhibitors have been known for quite some time to be associated with an increased risk of CDI. 6 A recent large systematic review and meta-analysis including 56 studies involving 366,683 patients demonstrated a twofold increased risk of CDI with PPI use. 45 However, whether this is causation or association is still up for debate given that the exact mechanism of action is unknown. 63,64 Proton pump inhibitors are one type of medication that changes both overall diversity and the relative abundances of specific bacterial taxa. 65 Besides diversity, differences in bacterial species composition in the intestines between PPI users and non-users are consistently associated with changes toward a dysbiotic gut microbiome and are in line with known changes in the microbiome that predispose individuals to CDI. 66,67 With regard to colonization, PPIs have been shown not to promote C. difficile colonization in murine models, 68 and a more recent prospective observational study among older hospitalized adults demonstrated no relationship between PPI use and C. difficile colonization. 46 In another nursing home investigation, the use of PPI had a nonsignificant inverse relationship with C. difficile colonization, although in a much smaller cohort. 21 Here we report an association between daily PPI use and lower risk of C. difficile colonization among nursing home elders, and we propose that a possible mechanism lies in the differences in microbiome composition between users and non-users of PPIs. Among non-colonized elders taking a PPI, we noticed enrichment in gut bacteria such as Eubacterium species and Faecalibacterium prausnitzii. Eubacterium spp. are important butyrate-producing bacteria that contribute to the maintenance of the gut barrier functions and have both immunomodulatory and anti-inflammatory properties. 29 Faecalibacterium prausnitzii, one of the most abundant and important butyrate-producing commensal bacteria of the human gut microbiota, 69,70 also shows enrichment in non-colonized PPI users. The subset of species displaying a greater abundance in PPI-treated elders without C. difficile compared to PPI-untreated elders, and whose abundance decreased with increasing C. difficile prevalence, included five species (C. symbiosum, D. longicatena, E. eligens, E. rectale and F. prausnitzii) that belong to the Clostridiales order.
These species have also been associated with protection against inflammatory and infectious conditions in both mice and humans, including C. difficile infection. [71][72][73] Vincent et al. (2013) found that Eubacteria and Faecalibacterium are depleted in CDI patients. 73 It additionally reported a decrease in the family Bacteroidaceae, which is consistent with our analysis finding B. uniformis enriched in PPI with no CD and reduced with higher prevalence of C. difficile, an association noted by others. [74][75][76] Our findings suggest that changes in the intestinal microbiome among elders exposed to a PPI may be associated with a less favorable environment for C. difficile colonization.

Strengths and limitations
This study had several notable strengths and limitations. One limitation of this study is that it did not have four stool samples from each resident. This may have led to misclassification of the secondary outcome of multi-colonization in these residents. This is the largest longitudinal cohort of nursing home elders reporting microbiome composition. It is also the largest study to survey NH residents for C. difficile colonization. That being said this study is still limited in the number of residents enrolled. A more robust cohort would help us to take a much deeper look at the multiple levels of data and to better explore other classes of medications used less frequently by NH elders. There are potential confounding variables, specifically classes of medications the residents were taking (such as corticosteroids and immunosuppressants) that were not evaluated in this cohort due to the small number of residents on these drugs. Following up this investigation with a cohort including larger numbers of residents from more facilities would strengthen the findings and further explore the dysbiosis associations with medication exposure in the elderly and further address how these dysbiotic patterns are associated with C. difficile colonization.

Conclusions
In conclusion, C. difficile colonization is common among NH elders with a large portion of these colonized residents harboring this pathogen over the course of months. C. difficile colonization state was associated with an inverse relationship to PPI medication use and key healthy bacterial species are present in elders using a PPI who do not demonstrate colonization. Finally, we found that the abundances of several key intestinal bacterial species were associated with C. difficile colonization and thus describe a dysbiotic environment in which C. difficile can take residence and thrive. This microbiome is described as having lower abundances of symbiotic or butyrate-producing organisms with increased abundances of organisms that utilize bile acids or are considered proinflammatory or pathogenic bacterial species. Further work is needed to see if promoting an elder gut microbiome composition to one that resists colonization could affect the high rates of C. difficile colonization seen within the nursing home environment, thus providing a novel approach at preventing this devastating disease.

Study setting and population
This prospective cohort study was approved by the institutional review board at the University of Massachusetts Medical School. The cohort comprises NH residents ≥65 years of age who lived in one of five NH facilities in central Massachusetts. We approached residents who had been living in the facility for ≥1 month and did not have any diarrheal illness or antimicrobial exposure within the preceding 4 weeks. Our trained staff used a standardized Capacity for Informed Consent Instrument 77 that combines capacity assessment questions with observation. If the resident was deemed unable to provide consent, we contacted the healthcare proxy to obtain informed consent. Residents were enrolled for a minimum of 4 months. All residents across the four facilities followed similar low-fiber diets, and at each nursing facility all the elders were fed the same meals, which were typical of a nursing home diet. No patients suffered from dysphagia or had a feeding tube.

Data collection
We conducted baseline and end-of-study medical record abstraction for factors associated with key study outcomes. These factors included age, nutritional status, comorbidities, all medications, and frailty. 16 Prior history of hospitalizations within the past year and antibiotic exposures in the past six months were collected from the medical record. Both daily and as-needed medications were obtained from the facility's medical record. Polypharmacy was defined using the most commonly reported definition of five or more daily medications. 78 Polypharmacy has been shown to represent a determinant of gut microbiota composition independent of specific drug classes that have detrimental clinical consequences. 79 We also obtained age, sex, race, and length of NH stay from the medical record. Clinical scores were obtained from elders during baseline interviews and corroborated by family or facility staff. We used the Charlson Comorbidity Index (CCI) to categorize patients' medical comorbidities. 80,81 The CCI is a method of predicting mortality by classifying or weighting comorbid medical conditions and has been widely used to measure the burden of medical diseases. 82 Frailty was categorized according to the validated and widely utilized Canadian Study of Health and Aging's (CSHA) 7-point Clinical Frailty Scale (CFS). 83 This has been previously validated in demonstrating signatures of frailty in the gut microbiota. 20,84,85 This frailty scale goes from very fit (1) to very severely frail (8). We did not enroll terminally ill patients into this study. We assessed nutritional status using the Mini Nutritional Assessment (MNA) tool. [86][87][88] The MNA is a rapid assessment of nutritional status used routinely in elderly NH residents. 88 Residents were categorized as normal (1), at risk (2), or malnourished (3) based on the MNA survey administered to the residents by trained research staff, or to the nurse caring for the resident if the resident was mentally impaired.
All residents enrolled were monitored during their involvement in the study for any changes to their care or for new medication or hospitalization exposures.
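As an illustration of how the abstracted covariates could be encoded for analysis, the following minimal sketch derives the polypharmacy flag (five or more daily medications) and the MNA category label. The `ResidentRecord` class and its field names are hypothetical and are not taken from the study's code.

```python
from dataclasses import dataclass

# Threshold from the text: five or more daily medications.
POLYPHARMACY_THRESHOLD = 5

# MNA categories as coded in the text: 1 = normal, 2 = at risk, 3 = malnourished.
MNA_LABELS = {1: "normal", 2: "at risk", 3: "malnourished"}


@dataclass
class ResidentRecord:
    """Hypothetical per-resident record; field names are illustrative only."""
    resident_id: str
    daily_medications: list[str]
    as_needed_medications: list[str]
    mna_category: int  # 1, 2, or 3

    @property
    def polypharmacy(self) -> bool:
        # Only daily medications count toward the polypharmacy definition.
        return len(self.daily_medications) >= POLYPHARMACY_THRESHOLD

    @property
    def nutritional_status(self) -> str:
        return MNA_LABELS[self.mna_category]
```

Under this sketch, as-needed medications are stored but do not contribute to the polypharmacy flag, which follows the "daily medications" wording of the definition.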

Sample collection, sequence processing, and analysis
DNA was extracted from samples using the PowerMag™ Soil DNA Isolation Kit on an epMotion 5075 TMX liquid handling workstation according to the manufacturer's protocols (MO BIO Laboratories, #27100-4-EP). Sequencing libraries were constructed using the Nextera XT DNA Library Prep Kit (Illumina, Inc., #FC-131-1096) and sequenced on a NextSeq 500 Sequencing System as 2 × 150 bp paired-end reads. Shotgun metagenomic reads were first trimmed and quality filtered to remove sequencing adapters and host contamination using Trimmomatic 89 and Bowtie2, 90 respectively, as part of the KneadData pipeline v0.6.1 (https://bitbucket.org/biobakery/kneaddata) using human genome hg19 (GRCh37, GCA_000001405.1). Reads were then profiled for microbial taxonomic abundances using MetaPhlAn2 91 and for metabolic pathway abundances using HUMAnN2 33 (as in our previous work). 20,27 Sequencing statistics are provided in Table 2.
All samples were tested for C. difficile toxin genes to determine C. difficile colonization. This was done using real-time polymerase-chain reaction with the AdvanSure RT-PCR kit (LG Life Science) for the simultaneous detection of the tcdA and tcdB genes. The primer pairs used were NK9-NK11 for the repetitive domain of the tcdA gene and NK104-NK105 for the tcdB gene. This method is based on TaqMan technology and has been reported to have 100% sensitivity and 98.3% specificity. 92 We used the SLAN RTPCR detection system (LG Life Science) according to the manufacturer's instructions. 92 A sample was categorized as positive for C. difficile only if both the tcdA and tcdB genes were detected. Estimating C. difficile colonization by qPCR rather than directly from shotgun metagenomic data has been done several times previously by us and others, 93,94 and it provides better resolution for detecting this bacterium, which is usually found at very low abundance in the GI tract (<1%) and could go undetected by metagenomic sequencing.

Definition of outcome of C. difficile colonization
We defined "C. difficile colonization" as the detection of C. difficile toxin genes A or B using real-time polymerase-chain reaction in the absence of any diarrheal symptoms in the 4 weeks before and after sample collection. We further divided C. difficile colonization into two categories: elders with only one sample testing positive (one-time) and elders with multiple samples testing positive (multiple). No elders in this study had diarrhea at any point.
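The outcome definition can be sketched as a small classification routine. This is an illustration only: it assumes the per-sample rule given in the sample-processing section (a sample counts as positive when both tcdA and tcdB are detected) and encodes each elder as 0 (never positive), 1 (one-time), or 2 (multiple). The function names and the input tuple layout are hypothetical.

```python
from collections import defaultdict


def sample_is_positive(tcdA_detected: bool, tcdB_detected: bool) -> bool:
    # Assumed per-sample rule: both toxin genes must be detected.
    return tcdA_detected and tcdB_detected


def colonization_levels(samples):
    """Map each resident to an ordinal colonization level.

    samples: iterable of (resident_id, tcdA_detected, tcdB_detected) tuples,
             one tuple per stool sample.
    Returns a dict resident_id -> 0 (none), 1 (one-time), or 2 (multiple).
    """
    positives = defaultdict(int)
    for resident_id, tcdA, tcdB in samples:
        # Adding 0 still registers residents who never test positive.
        positives[resident_id] += int(sample_is_positive(tcdA, tcdB))
    # Two or more positive samples collapse into the "multiple" level.
    return {rid: min(count, 2) for rid, count in positives.items()}
```

Because no elder in the study had diarrhea, the symptom criterion is not modeled here; a dataset with symptomatic residents would need an additional filter before this step.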

Statistical and computational analysis
Multivariable Logistic Regression Modeling. We first used multivariable logistic regression analysis to test whether clinical variables alone were associated with C. difficile colonization. For the multivariable model, we included any covariate with p < .20 in an unadjusted bivariate analysis. We ran two models, the first with the outcome of colonization at any time point and the second with the outcome of colonization at multiple time points. We included all acid-reducing medications rather than proton pump inhibitors alone in the model, given that both were significantly associated with the outcome.
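The covariate screening step can be illustrated for binary covariates as follows. This is a sketch under stated assumptions: the text does not specify which bivariate test was used, so a Pearson chi-square test on a 2 × 2 table (without continuity correction) stands in for it, and the unadjusted odds ratio uses a Wald 95% confidence interval. All function names are hypothetical.

```python
import math


def chi2_p_value_2x2(a, b, c, d):
    """Pearson chi-square p-value (1 df, no continuity correction).

    Table layout: a = exposed & colonized,   b = exposed & not colonized,
                  c = unexposed & colonized, d = unexposed & not colonized.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("degenerate table: a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (all cells > 0)."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def screen_covariates(tables, threshold=0.20):
    """Keep covariates whose unadjusted p-value is below the threshold.

    tables: dict mapping covariate name -> (a, b, c, d) counts.
    """
    return [name for name, cells in tables.items()
            if chi2_p_value_2x2(*cells) < threshold]
```

Covariates retained by `screen_covariates` would then enter the multivariable logistic regression; continuous covariates such as age would need a different unadjusted test and are not covered by this sketch.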
Descriptive Microbiome Analysis. To determine similarity among microbiome samples from the NH elders and to associate microbiome features with C. difficile colonization level, we started by performing traditional unsupervised ordination analyses, including Principal Component Analysis and t-Distributed Stochastic Neighbor Embedding. We used functions within the R packages phyloseq, vegan, and Rtsne to perform the descriptive microbiome analysis (see available code at https://gitlab.com/vanni-bucci/2020_cdad_nh_paper).
Machine Learning Analysis. As most of the signal from the unsupervised analysis was accounted for by inter-individual variability, we then ran supervised machine learning models that account for the repeated-sampling nature of our study design. Specifically, we ran mixed-effect random forest modeling to predict the abundance of every detected microbiota species as a function of clinical covariates and level of C. difficile colonization in each individual (see results). We used the Mixed-Effect-Random-Forest (MERF) routine from the R package LongituRF 95 to fit the model $X_{s,i} = f(\mathrm{CC}) + (1 \mid \mathrm{ID})$, where (1) $X_{s,i}$ is the relative abundance of species $s$ in sample $i$, (2) $f$ is a general non-linear function (the random forest) applied to the Clinical Covariates (CC) as fixed effects (which include C. difficile prevalence as an ordinal variable 0, 1, 2), and (3) $(1 \mid \mathrm{ID})$ is the random effect that accounts for multiple samples from the same patient. We used permutation importance analysis to assess the significance of the clinical predictors in associating with each microbiota species abundance.
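The idea behind permutation importance can be sketched in a model-agnostic way: shuffle one predictor, re-evaluate the fitted model, and record how much the prediction error increases. The sketch below is not the LongituRF implementation and does not reproduce the study's significance procedure; the `predict` callable stands in for any fitted model, and all names are hypothetical.

```python
import random


def permutation_importance(predict, rows, targets, column, n_repeats=20, seed=0):
    """Mean increase in mean squared error after shuffling one predictor.

    predict: callable taking a dict of predictor values, returning a number.
    rows:    list of dicts (one per sample) holding predictor values.
    targets: list of observed responses, aligned with rows.
    column:  name of the predictor to permute.
    """
    rng = random.Random(seed)

    def mse(candidate_rows):
        return sum((predict(r) - y) ** 2
                   for r, y in zip(candidate_rows, targets)) / len(targets)

    baseline = mse(rows)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        # Replace only the permuted column; other predictors stay in place.
        permuted = [{**r, column: value} for r, value in zip(rows, shuffled)]
        increases.append(mse(permuted) - baseline)
    return sum(increases) / n_repeats
```

A predictor that the model does not use yields an importance of zero, whereas shuffling a predictor the model relies on increases the error. With repeated samples per patient, shuffling across all rows ignores the within-patient structure, so this sketch should be read as a conceptual illustration only.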
To perform the within-patient analysis (e.g., differentiating positive and negative samples for elders who switch from negative to positive and vice versa), we first calculated Shannon diversity and performed linear mixed-effect modeling as $\mathrm{Diversity} = \mathrm{SamplePositivity} + (1 \mid \mathrm{ID})$. To classify samples as C. difficile positive vs. negative, we applied Mixed-Effect Random Forest Classification. Permutation importance analysis was used to determine species significantly associated with sample positivity.
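For reference, Shannon diversity for a single sample can be computed from its species abundances as $H = -\sum_s p_s \ln p_s$, where $p_s$ is the relative abundance of species $s$. The sketch below assumes the natural logarithm, which the text does not specify; the function name is hypothetical.

```python
import math


def shannon_diversity(abundances):
    """Shannon index H = -sum(p * ln p) over species with non-zero abundance.

    abundances: non-negative species abundances for one sample; they are
                renormalized to proportions, so counts or relative
                abundances both work.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample has no non-zero abundances")
    proportions = (a / total for a in abundances if a > 0)
    return -sum(p * math.log(p) for p in proportions)
```

A sample dominated by a single species has a diversity of zero, and a sample with $k$ equally abundant species reaches the maximum value $\ln k$.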

Data availability
Sequences have been deposited in the NCBI SRA under accession PRJNA529586. Processed data, metadata, and code to reproduce the analyses in the paper are available at https://gitlab.com/vanni-bucci/2020_cdad_nh_paper.

Disclosure statement
No potential conflict of interest was reported by the authors.