Fecal microbiota and bile acids in IBD patients undergoing screening for colorectal cancer

ABSTRACT Due to the potential role of the gut microbiota and bile acids in the pathogenesis of both inflammatory bowel disease (IBD) and sporadic colorectal cancer, we aimed to determine whether these factors were associated with colorectal cancer in IBD patients. 215 IBD patients and 51 non-IBD control subjects were enrolled from 10 French IBD centers between September 2011 and July 2018. Fecal samples were processed for bacterial 16S rRNA gene sequencing and bile acid profiling. Demographic, clinical, endoscopic, and histological outcomes were recorded. Characteristics of IBD patients included: median age: 41.6 (IQR 22); disease duration 13.2 (13.1); 47% female; 21.9% primary sclerosing cholangitis; 109 patients with Crohn’s disease (CD); 106 patients with ulcerative colitis (UC). The prevalence of cancer was 2.8% (6/215: 1 CD; 5 UC), high-grade dysplasia 3.7% (8/215) and low-grade dysplasia 7.9% (17/215). Lachnospira was decreased in IBD patients with cancer, while Agathobacter was decreased and Escherichia-Shigella increased in UC patients with any neoplasia. Bile acids were not associated with cancer or neoplasia. Unsupervised clustering identified three gut microbiota clusters in IBD patients associated with bile acid composition and clinical features, including a higher risk of neoplasia in UC in two clusters when compared to the third (relative risk (RR) 4.07 (95% CI 1.6–10.3, P < .01) and 3.56 (95% CI 1.4–9.2, P < .01)). In this multicentre observational study, a limited number of taxa were associated with neoplasia and exploratory microbiota clusters co-associated with clinical features, including neoplasia risk in UC. Given the very small number of cancers, the robustness of these findings will require assessment and validation in future studies.


Introduction
Inflammatory bowel diseases (IBD), including Crohn's disease (CD) and ulcerative colitis (UC), are chronic, relapsing inflammatory conditions of the gastrointestinal tract. Among the many potential factors contributing to morbidity and premature mortality in IBD, the increased risk of developing colorectal cancer is one of the most serious, with IBD patients who have long-standing colonic involvement at risk of developing colitis-associated cancer in addition to sporadic colorectal cancer (CRC). 1 Colitis-associated cancer and sporadic CRC have both overlapping and distinct features at the molecular level, as well as unique clinical risk factors, notably those related to the duration, extent, and severity of colonic inflammation and the co-existence of primary sclerosing cholangitis (PSC). 1,2 Recently, there has been interest in the role of the gut microbiota in the pathogenesis of sporadic CRC, particularly the role of Fusobacterium species, 3,4 and microbiota-based screening tests have been proposed. 5,6 The gut microbiota has also been strongly implicated in the pathogenesis of IBD, 7,8 and microbiota-based treatments, such as fecal microbiota transplantation, are being investigated. [9][10][11][12][13] Furthermore, the gut microbiota has been shown to be important in murine models of colitis-associated cancer. 14,15 Despite these associations, it remains to be determined whether the gut microbiota contributes to the development of colitis-associated cancer in human IBD. The potential for gut microbiota-based screening as a noninvasive alternative to colonoscopy in IBD has recently been proposed. 16 Metabolites produced or transformed by the gut microbiota, especially bile acid metabolites, have also been implicated in the pathogenesis of IBD 17,18 and of sporadic CRC. 19 Secondary bile acids are produced by the actions of certain colonic bacteria on primary bile acids.
Deconjugation of taurine and glycine from primary bile acids is performed by a broad range of species encoding bile salt hydrolases. Taxa expressing the bai (bile acid-inducible) operon convert primary bile acids to secondary bile acids, and this is performed by a narrow set of Clostridium species (Clostridium scindens, Clostridium hylemonae and Clostridium hiranonis). 20 The secondary bile acid deoxycholic acid (DCA) has in particular been associated with sporadic CRC, 19 while low levels of secondary bile acids have been reported in IBD. 17 To address whether the gut microbiota and bile acid metabolites were associated with colitis-associated neoplasia (cancer and its precursor lesions, low-grade dysplasia (LGD) and high-grade dysplasia (HGD)), we conducted a multicentre observational study (ClinicalTrials.gov Identifier: NCT02726243) of both the bacterial fecal microbiota by 16S rRNA gene amplicon sequencing and bile acid metabolites in IBD patients undergoing surveillance colonoscopy in France, using non-IBD patients undergoing screening colonoscopy as controls. The aims of this study were to investigate the link between the gut microbiota, intestinal inflammation, colorectal cancer, bile acids, and primary sclerosing cholangitis (PSC). To address these aims, we used both direct comparisons between groups of interest and exploratory, unsupervised Dirichlet Multinomial Mixtures (DMM) to identify microbiome clusters within the data.

Study population
A total of 270 patients were recruited into this study from 10 centers in France (Figure 1, Figure 2a and Table S1). Of these, 268 fecal samples were available for sequencing and two samples failed sequencing, leaving 266 patients in the final analysis (106 UC (including undetermined colitis (n = 2)), 109 CD and 51 non-IBD controls (Figure 1, Table S1)). Clinical characteristics are provided in Table 1. Neoplasia was divided into five categories: no neoplasia, sporadic adenomas, low-grade dysplasia, high-grade dysplasia, and colorectal cancer. Two patients had high-grade dysplasia in an adenoma and were included in the high-grade dysplasia category. Patients with more than one finding were classified based on the highest level of neoplasia they exhibited, according to cancer > high-grade dysplasia > low-grade dysplasia > sporadic adenoma > no neoplasia (Table S2).

Cohort characteristics and medication use
Clinical and disease characteristics of IBD patients are presented in Table 2. Oral steroid use, oral and rectal 5-aminosalicylate (5-ASA) use, and tacrolimus use were significantly more common in UC, while the anti-tumor necrosis factor (anti-TNF) medications infliximab and adalimumab were more common in CD. Importantly, IBD patients were significantly younger than non-IBD controls, while non-IBD controls had a significantly higher BMI (Table 1, Figure 2b), limiting direct comparisons between these populations. Disease duration and age across the different categories of neoplasia are presented in Figures 2(c,d), respectively. Neoplasia distribution by colorectal location, disease extent, and PSC is presented in Figures 2(e-g), respectively.

Significant confounders limited gut microbiota comparisons between IBD patients and non-IBD controls
The median number of final sequences per sample was 23823.5 (range 3616-58437). In total there were 6051 ASVs, with 2401 remaining after initial filtering. We first evaluated the study groups to determine if there were differences between non-IBD controls and the different subtypes of IBD. Alpha diversity was reduced in IBD compared to controls overall (non-IBD controls median Shannon index 3.74 (IQR 0.63), CD 3.44 (IQR 0.86), UC 3.41 (IQR 0.61); p = .012, Figure 3a). Beta-diversity analysis suggested differences between control, CD, and UC groups (Figure 3b), although using our consensus approach to differential abundance, only three genera (Ruminococcus gnavus group, Lachnoclostridium and Flavonifractor) were found to be increased in CD versus non-IBD controls (Figure 3c, Figure S1). However, the majority of patients in this study were in clinical remission (175/215, 81.4%) and would be expected to have a less altered microbiota.
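The alpha- and beta-diversity measures reported here can be written out explicitly. The following is a minimal Python sketch, not the authors' R/phyloseq code. It assumes the natural-log form of the Shannon index and Bray-Curtis dissimilarity computed on total-sum-scaled (TSS) proportions; the function names are illustrative only.

```python
import math


def tss_normalize(counts):
    """Total-sum scaling: convert raw counts to proportions that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with p_i > 0.

    Assumption: natural logarithm.
    """
    return -sum(p * math.log(p) for p in tss_normalize(counts) if p > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples on TSS proportions.

    For proportions that each sum to 1, the usual formula
    sum(|a_i - b_i|) / sum(a_i + b_i) simplifies to 1 - sum(min(a_i, b_i)).
    """
    pa = tss_normalize(counts_a)
    pb = tss_normalize(counts_b)
    return 1.0 - sum(min(x, y) for x, y in zip(pa, pb))
```

A perfectly even community of four taxa gives a Shannon index of ln(4), and two samples that share no taxa give a Bray-Curtis dissimilarity of 1.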

Lachnospira is decreased in IBD patients with colorectal cancer
We next looked at neoplasia and compared alpha diversity between patients without neoplasia (N0) and patients with any grade of neoplasia (Nx: sporadic adenoma, low-grade dysplasia, high-grade dysplasia, and cancer) (Figure 4a) and found no significant difference between patients with neoplasia and patients without neoplasia in any of the three cohorts. Despite the low number of IBD-associated cancers, we compared the microbiota between IBD patients without any neoplasia (n = 177) and IBD patients with cancer (n = 6), as this was a primary objective of the study. Pairwise PERMANOVA did not identify a significant difference in community composition (Figure 4b), while differential abundance testing identified only one genus, Lachnospira, which was increased in IBD patients without neoplasia compared to IBD patients with cancer (Figure 4c, Figure S2a). Plotting the relative abundance of Lachnospira across the different categories of neoplasia confirmed a trend of decreasing Lachnospira with more advanced neoplasia in IBD (Figure 4d).

Escherichia-Shigella is increased and Agathobacter decreased in UC patients with neoplasia
We next looked at the combined category of any neoplasia (Nx) versus no neoplasia (N0). No genera were differentially abundant between these two categories for all IBD patients; however, when stratified by subtype, differences were observed in beta diversity in UC patients alone (Figure 5a). Differential abundance testing identified an increase in Escherichia-Shigella in UC patients with neoplasia and an increase in Agathobacter (formerly Eubacterium rectale) in UC patients without neoplasia (Figure S2b, Figure 5(b,c)). Although Lachnospira was not significantly increased in N0 UC patients according to DESeq2, it was with the other two methods (Figure S2b and S2c).

Dirichlet multinomial mixtures identify 3 microbiota clusters
Given the exploratory nature of this study and the large number of potentially important co-variates that influence both the microbiota and neoplasia risk, we employed a common unsupervised approach to identifying different community types in microbiome studies, Dirichlet Multinomial Mixtures (DMM). The aim of this approach was to better understand the relationship between different microbiota community types and various clinical variables, including neoplasia. Using the Laplace approximation, we determined that the optimum number of clusters was 3 (Figure S3a, with parameters in Figure S3b). Due to inherent variability between runs, we re-fit the model 20 times to ensure convergence (Figure S3c). The top 40 genera that contribute to the cluster assignment are presented in Figure 6a as a row-normalized heatmap of the mean difference in abundance between clusters, descending in order of magnitude. Cluster 1 (C.1) was associated particularly with the beneficial butyrate-producing bacteria Agathobacter and Roseburia, C.2 with Oscillospiraceae (UCG-002 and UCG-005), Alistipes and Christensenellaceae R-7 group, and C.3 with a number of pathobionts in IBD, including Streptococcus, Ruminococcus gnavus group, and Escherichia-Shigella (Figure 6a).

Figure 3. Gut microbiota characteristics of the cohorts in this study
a. Alpha diversity measures (Shannon diversity index) by study group. b. PCoA plots of proportion-normalised Bray-Curtis divergence, coloured by group with pairwise PERMANOVA significance stars with false-discovery rate applied. c. Differential genera increased in CD versus controls plotted according to their DESeq2 log2-fold change. Genera were only deemed differential when they were detected by all 3 differential abundance metrics employed in this study. CD, Crohn's disease; PCoA, principal coordinate analysis; UC, ulcerative colitis. P-values: * <0.05; ** <0.01; *** <0.001; **** <0.0001.

Figure 4. Gut microbiota associated with neoplasia and cancer in IBD
a. Alpha diversity compared between patients with and without neoplasia, stratified by study group. b. PCoA plots of proportion-normalised Bray-Curtis divergence for IBD patients, coloured by category of neoplasia. Pairwise PERMANOVA for patients with any neoplasia and patients with cancer. c. Differential genera increased in IBD patients without cancer versus cancer plotted according to their DESeq2 log2-fold change. Genera were only deemed differential when they were detected by all 3 differential abundance metrics employed in this study. d. Proportion-normalised Lachnospira abundance plotted for each category of neoplasia. CD, Crohn's disease; IBD, inflammatory bowel disease; PCoA, principal coordinate analysis; UC, ulcerative colitis. P-values: * <0.05; ** <0.01; *** <0.001; **** <0.0001.

Microbiota clusters are associated with clinical characteristics in IBD patients
To visually explore the relationship between the clusters, taxa, and main clinical variables, we used the seqPCoA function to construct a triplot at the genus level (Figure 6b), including the top 10 taxa. Variables included presence of PSC, disease phenotype (Montreal classification), clinical and endoscopic activity (both binary), age, disease duration, body-mass index (BMI), presence of neoplasia stratified by subtype, and use of anti-TNF medication, immunomodulators, and 5-ASAs. At a q-value threshold of 0.2, 7 taxa and the variables cluster, neoplasia, PSC, and endoscopic activity were selected. This plot identifies an association between C.1, Agathobacter, and UC patients without neoplasia. In contrast, C.3 was associated with Ruminococcus gnavus and Escherichia-Shigella, PSC and activity on endoscopy. Neoplasia in UC patients was associated with both C.2 and C.3.
To further evaluate the association of DMM clusters and categorical clinical variables, we performed Correspondence Analysis (CA) (Figure 6c) in IBD patients. The variables selected were neoplasia by subtype, PSC, clinical and endoscopic activity, and disease phenotype. This suggested that C.3 was associated with PSC, clinical and endoscopic activity and CD patients with ileal involvement, while C.1 was associated with UC patients without neoplasia, low endoscopic activity and E1 or E2 disease. Taken together, these findings suggest that C.3 is a traditionally 'dysbiotic' cluster and is associated with clinical features such as disease activity, ileal CD and PSC, while C.1 is associated with disease remission, UC patients with more limited disease and UC patients without neoplasia.
Given these observations, we compared the incidence of neoplasia across the three clusters (Figure 6d). No association between neoplasia and microbiome cluster was identified in CD patients (Figure 6d

Clinical and technical confounders in the dataset
Consistent with the results of the exploratory triplot and Correspondence analysis (Figures 6(b,c), respectively), 50% of UC patients in C.3 had PSC. In contrast, only 23.4% of patients in C.1 and 24.1% of patients in C.2 had PSC (Chi-squared test P-value = 0.009). As C.3 was associated with other risk factors for neoplasia, such as disease activity and PSC, these may act as confounders. Associations of other important clinical variables with cluster assignment are presented in Table S3.
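The comparison of PSC prevalence across the three clusters is a chi-squared test of independence on a 3 × 2 table (cluster by PSC status). A minimal Python sketch of that calculation follows. It is not the authors' code, the function names are illustrative, and the counts in the usage note are hypothetical placeholders rather than study data. With three clusters and a binary variable the test has 2 degrees of freedom, for which the chi-squared upper-tail probability reduces to exp(-x/2).

```python
import math


def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts, e.g. one row per cluster
    with columns [PSC, no PSC].
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("table has an empty row or column")
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def chi_squared_p_value_df2(stat):
    """Upper-tail p-value for a chi-squared statistic with exactly 2 df."""
    return math.exp(-stat / 2.0)


# Hypothetical counts for illustration only: rows are clusters C.1 to C.3,
# columns are [PSC, no PSC].
hypothetical_table = [[10, 30], [8, 22], [15, 15]]
stat, df = chi_squared_statistic(hypothetical_table)
if df == 2:
    p_value = chi_squared_p_value_df2(stat)
```

For tables with other dimensions the closed form no longer applies and a general chi-squared distribution function would be needed.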
As described in the methods, a subset of samples (15 out of 215) were fresh-frozen rather than being stored in RNAlater®. While these accounted for a small proportion of IBD patients overall (6.97%), they accounted for a large proportion (15/38 (39.5%)) of samples with neoplasia in IBD patients. We compared fresh-frozen neoplasia samples (n = 15) with neoplasia samples stored in RNAlater® (Figure S4). There was no significant difference between the groups by PERMANOVA (Figure S4a) or in cluster assignment (Figure S4b), although there was a trend toward increased membership of cluster C.3 in the fresh-frozen samples. Other potential confounders, namely antibiotic use within the past 3 months (Figure S4c) and sequencing run (Figure S4d), did not appear to have strong effects on composition.

Bile acids are not associated with cancer or neoplasia but altered between community clusters
Targeted bile acid metabolomics quantified 27 bile acids, as well as a number of ancillary measurements (Table S4). PCA demonstrated no clear difference between IBD patients with neoplasia and patients without neoplasia, and this was confirmed by PERMANOVA testing (p = .11, Figure 7a, Figure S5a). There were no differentially detected bile acids between IBD patients with cancer and those without, nor any difference between IBD patients with and without neoplasia.
Interestingly, the DMM clusters identified from the microbiota analysis were associated with bile acid composition in IBD patients (PERMANOVA p-value = 0.001, Figure 7b). C.2 and C.3 samples appeared to be at either end of a spectrum with C.1 samples intermediate. This effect appears to be primarily driven by the ratio of primary to secondary bile acids (Figure 7c, Figure S5b). When looking at bile acids across all available patients, the proportion of primary bile acids was highest in C.3 and lowest in C.2, while secondary bile acids were highest in C.2 (Figure 7d). C.2 was also associated with the lowest proportion of conjugated bile acids (Figure 7e, Figure S5c and S5d). When correlating genera with the five main bile acids (CA, CDCA, DCA, LCA, and UDCA), we observed that bacteria that positively correlated with secondary bile acids were most abundant in C.2, while bacteria negatively correlated with secondary bile acids and positively correlated with primary bile acids were most abundant in C.3 (Figure 7f). Due to the limitations of 16S rRNA gene sequencing, we were not able to identify species with known 7α-dehydroxylating capabilities, such as Clostridium scindens. While a number of bacteria were correlated positively with secondary bile acids, these are not known to have 7α-dehydroxylating capabilities, although bai genes have been reported in uncultured metagenome-assembled genomes closely related to Oscillospiraceae, 21 two members of which (Oscillospiraceae UCG-002 and UCG-005) correlated positively with DCA and LCA in the current study (Figure 7f). Additionally, Ruminococcus gnavus correlated positively with UDCA, consistent with previous reports. 22 Overall, these data demonstrate that C.3 patients have reduced bile acid deconjugation and primary-to-secondary conversion rates, while C.2 patients have the highest rates. These findings were similar when each study group (control, CD and UC) was analyzed independently (Figure S6).
However, CD patients with ileal involvement had a higher proportion of primary bile acids (cholic acid and chenodeoxycholic acid) and a lower proportion of the secondary bile acid lithocholic acid (Figure S7).

Discussion
Colorectal cancer contributes importantly to the morbidity and premature mortality associated with IBD 1 and evidence from pre-clinical studies suggests that the microbiota may play a role in tumourigenesis in the setting of inflammation. 15,[23][24][25] The invasiveness and healthcare costs associated with endoscopic screening mean that risk-based systems, already in place for stratifying surveillance intervals, could be improved by noninvasive biomarkers.
The current study is an exploratory evaluation focusing on the relationships between the gut microbiota, intestinal inflammation, colorectal cancer, bile acid profiles and PSC. This cohort was highly heterogeneous in terms of disease subtype, duration, severity, and treatment, with a small number of IBD-associated cancers (n = 6), limiting our ability to perform direct comparisons between groups of interest while controlling for confounding variables. Stratification by disease subtype (CD and UC) was performed due to the known differences between these conditions, while neoplasia categories were combined, given the continuous spectrum from low-grade dysplasia to high-grade dysplasia to cancer, to increase the power to detect differences. While we have reported a small number of tentative associations by direct comparisons between groups with a consensus differential abundance approach, these are vulnerable to numerous potential confounders for which we were unable to fully control.
To perform a more exploratory approach in this heterogeneous cohort, we additionally employed unsupervised microbiome clustering, with the aim of identifying microbiome clusters that were both microbially meaningful in an IBD context and which could be co-associated with clinical variables relevant to colitis-associated cancer (CAC). Clusters C.1 and C.3 were associated with numerous low-risk and high-risk clinical features, respectively, emphasizing that the gut microbiome may simply be a marker of risk factors for neoplasia, rather than having any direct causal association with its development.
Although the total number of colitis-associated cancers was low (n = 6), differential abundance analysis detected a reduction in Lachnospira. This genus is a member of Lachnospiraceae and has not been previously associated with CAC. Our differential abundance approach aimed to avoid some of the pitfalls of relying on one method 26 but, as a result, required a relatively high adjusted p-value threshold (0.2) and may also be affected by the highly skewed numbers in each group in this analysis. Given the reduction in Lachnospira in high-grade dysplasia as well and the exploratory nature of the study, we have reported this finding; however, it should be interpreted with caution and will require validation. Similarly, in UC patients, Escherichia-Shigella was increased in neoplasia, while Agathobacter was decreased. Enterobacteriaceae have been shown to be increased in the mucosa of CAC 27 and are commonly associated with IBD and inflammation. Escherichia-Shigella was also prominently associated with C.3, the most dysbiotic cluster associated with the highest prevalence of neoplasia in UC. Targeted editing of the gut microbiota to reduce E. coli and commensal Enterobacteriaceae has also been shown to reduce colonic tumors in the azoxymethane (AOM)/dextran sodium sulfate (DSS) model of colitis-associated cancer. 24 Inflammation appears to independently promote expression of tumor-promoting genes in the pks island in E. coli. 25 Fusobacterium was not associated with risk of neoplasia in UC patients in this study, consistent with previous results. 27 Interestingly, while C.3 was the most high-risk cluster for neoplasia in UC patients, this cluster was also associated with ileal disease in CD. In particular, of 15 patients with isolated ileal disease, 9 patients were in C.3 and only 1 in C.1, consistent with this patient group being characteristically associated with dysbiosis.
However, this group had almost no neoplasia (one sporadic adenoma), suggesting that C.3 may represent different things depending on disease subtype. In CD, it may be enriched for patients with ileal CD (not a risk factor for neoplasia), while in UC patients, it may be associated with PSC and disease activity (risk factors for neoplasia).
In contrast, Agathobacter (formerly Eubacterium rectale 28) is a butyrate-producing bacterium that has been shown to be reduced in UC 29,30 and has been associated with improved response to anti-TNF medications in pediatric IBD. 31 Butyrate is an important short chain fatty acid produced by fermentation of dietary fiber and has immune-modulating, anti-inflammatory and anti-neoplastic effects in the colon. 32 This genus was strongly associated with C.1. The closely related genus Roseburia was also highest in C.1 (Figure 6a), suggesting that these members of the Lachnospiraceae may be important for the maintenance of gut health in patients with long-standing ulcerative colitis. DMM models, used as an exploratory technique here, have been used to identify different risk groups for development of atopy in a new birth cohort of infants 33 and have also been applied to identify microbiota and metabolomic groups in pediatric IBD patients and their relatives. 34 The mucolytic bacterium R. gnavus, closely associated with the high-risk cluster C.3, has already been linked to IBD in multiple studies. 8,35,36 Interestingly, R. gnavus may correlate with active inflammation, 36 and histological inflammation is strongly associated with colorectal cancer risk in IBD. 37 While alterations were less clear in C.2, Alistipes, which has been associated with sporadic colorectal cancer 38 as well as mediating colitis-associated cancer risk in Lipocalin-2-deficient/Il10-deficient mice, 39 was enriched in this cluster (Figure 6a).
In terms of bile acid profiles, C.3 demonstrated marked bile acid dysmetabolism, including increased primary:secondary bile acid ratios, decreased deconjugation and increased sulfation (Figure 6 and Figure S5 and S6). While these findings may be due to a reduction in functional capability of the microbiota, it is also possible that patients in C.3 had a shorter transit time, although we do not have data to support this. C.2, which had the highest relative proportion of secondary bile acids, also had a high relative risk of neoplasia compared to C.1. It is interesting to note that while conversion of primary bile acids to secondary bile acids is a hallmark of a functioning colonic microbiota, secondary bile acids have been associated with an increased risk of colorectal cancer. 19 C.1, the lowest risk cluster, had higher levels of primary bile acids and lower levels of LCA, across the whole cohort (Figure 7d) and in UC and CD patients alone (Figure S6). We note that the differences in bile acid composition across the clusters may be due to other factors, such as diet, and may modify the gut microbiota, thus having an indirect effect. However, we did not identify any association between neoplastic risk and bile acid profiles. We hypothesize that other metabolites which were not assessed in this study, such as short chain fatty acids and tryptophan metabolites, may also be involved, although this will require future studies to investigate.
This study has limitations. Due to its multicentre nature, there may be variability between institutions and providers in terms of approach to surveillance, dysplasia detection, and histopathological assessment, although all centers were reference centers for IBD. Also, a small number of participants would fall outside current surveillance guidelines (15 CD patients with L1 disease and 7 UC patients with E1 disease), while there was a high proportion of patients in the cohort in remission, which may alter microbiota composition and fecal bile acids. Control subjects were also significantly older and had a significantly increased BMI compared to IBD patients (Table 1).
Importantly, there was some heterogeneity in terms of storage, as some samples were fresh-frozen as opposed to being stored in RNAlater®, which may be a source of bias. In addition, 16S rRNA gene sequencing for this study was performed over two separate sequencing runs, one 250 bp paired-end and the other 300 bp paired-end sequencing. To account for this, we performed separate error-learning steps with the dada2 algorithm and ensured identical trimming lengths.
A number of potentially important predictor variables were not available. These include family history of colorectal cancer, previous dysplasia, and post-inflammatory polyps, although the utility of the latter has recently been brought into question. 40 Finally, missing data prevented us from providing a full analysis of the endoscopic disease activity scores, as these were incomplete, absent or incorrectly applied in 10 IBD patients. There was also some missing data in relation to the clinical conditions described in Table 1 (Spondyloarthropathies, Appendicectomy, Psoriasis).

Conclusion
In this multicentre study of the fecal microbiota and bile acid profiles of IBD patients undergoing colorectal cancer surveillance in France, we identified a small number of taxa and high- and low-risk community clusters associated with neoplasia in UC. These microbiota changes were closely associated with other high-risk features, such as inflammation and PSC, and whether they are markers of high-risk disease or have a causal link to neoplasia will require mechanistic studies. These findings will also require future validation in large, prospective cohort studies.

Ethical approval
Approval for human studies was obtained from the local ethics committee (Comité de Protection des Personnes Ile-de-France IV, IRB 00003835 Dyscolic study; registration number 2014/10NICB). Patients or the public were not involved in the study design.

Inclusion and exclusion criteria
For inclusion, patients had to be 18 years of age or older, have the capacity to give informed consent, and either have IBD or be a non-IBD control undergoing a scheduled screening colonoscopy. Diagnoses had to be confirmed in one of the participating services according to ECCO consensus, and patients had to be followed up in one of the participating services. Exclusion criteria were trusteeship, guardianship or safeguard of justice, inability to speak French or to answer questions, history of colonic resection, an ostomy at the time of colonoscopy, and current treatment with radiotherapy or chemotherapy. Patients who had received antibiotics within the past 3 months were included, although this was initially a temporary exclusion criterion.

Recruitment
Adult IBD patients undergoing surveillance colonoscopy were recruited in 10 French IBD centers (Figure 2a, Table S1) and provided informed consent. Non-IBD adult patients undergoing screening colonoscopy for CRC were identified in routine clinical practice. Demographic and clinical details were recorded. Clinical activity was also assessed using the partial Mayo score (for UC) and the Harvey-Bradshaw index (for CD).

Colonoscopy
Colonoscopy was performed according to local protocols at each institution. Endoscopy reports and histological outcomes were provided.

Fecal sample collection
Patients who consented to participate provided a single fecal sample prior to colonoscopy and bowel preparation, which was stored in RNAlater®. A subset of IBD samples (15 out of 215) were fresh-frozen rather than being stored in RNAlater®. While these accounted for a small proportion of IBD patients overall (6.97%), they accounted for a large proportion (15/38 (39.5%)) of samples with neoplasia in IBD patients. Fecal samples were then transferred to the central receiving laboratory at the Centre de Recherche Saint-Antoine (CRSA), Paris.

Microbial DNA extraction
DNA was extracted from fecal samples as previously described, 41 with microbial lysis performed by both mechanical and chemical methods. Briefly, mechanical lysis was performed with glass beads. Following isopropanol precipitation of nucleic acids for 10 min at room temperature, samples were incubated on ice for 15 min and then centrifuged for 30 min at 20 000 g and 4°C. The resulting pellets were suspended in phosphate buffer (450 μL) and potassium acetate (50 μL). Following RNase treatment and DNA precipitation, recovery of nucleic acids was performed via centrifugation at 20 000 g and 4°C for 30 min. The DNA pellet was suspended in 80 μL of Tris-EDTA (TE) buffer.

16S rRNA gene amplicon sequencing
Amplicon sequencing of the V3-V4 region of the 16S ribosomal RNA gene was employed for microbiota analysis. The primers used for this analysis were 16S sense 5′-TACGGRAGGCAGCAG-3′ and anti-sense 5′-CTACCNGGGTATCTAAT-3′. This was performed using an optimized and standardized 16S amplicon library preparation protocol (Metabiote, GenoScreen, Lille, France). 16S DNA PCR was performed with 5 ng of genomic DNA with bar-coded primers (Metabiote MiSeq Primers) according to the manufacturer's protocol (Metabiote) at a final concentration of 0.2 μmol/L, with an annealing temperature of 50°C for 30 cycles. PCR products were purified with the Agencourt AMPure XP-PCR purification system (Beckman Coulter, Brea, CA, USA) and quantified according to the manufacturer's protocol, with samples multiplexed at equal concentrations. An Illumina MiSeq platform (Illumina, San Diego, CA, USA) was used for sequencing, and this was performed over two separate sequencing runs at GenoScreen: a 250 bp paired-end sequencing protocol and a 300 bp paired-end sequencing protocol. Raw paired-end sequencing reads were subjected to the following initial procedures at GenoScreen: (1) quality filtering with the PRINSEQ-lite PERL script, 42 truncating bases from the 3′ end with a quality <30 (based on the Phred algorithm) and (2) using CutAdapt to remove primers, with no mismatches allowed in the primer sequences. 43 Only sequences with perfectly matching forward and reverse primers were retained for further analysis.
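The two pre-processing steps above (3′ quality truncation and exact primer removal) can be illustrated in a few lines. The following Python sketch is an illustration under stated assumptions, not the PRINSEQ-lite or CutAdapt implementation. It assumes Phred+33 quality encoding, treats the degenerate primer bases R and N according to the IUPAC nucleotide code, and assumes the primer sits at the very start of the read; the function names are hypothetical.

```python
# IUPAC nucleotide codes: each primer symbol maps to the bases it may match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text (5' to 3').
FORWARD_PRIMER = "TACGGRAGGCAGCAG"
REVERSE_PRIMER = "CTACCNGGGTATCTAAT"


def trim_3prime_quality(seq, qual, min_q=30, offset=33):
    """Drop bases from the 3' end while their Phred score is below min_q.

    Assumption: `qual` is a Phred+33 encoded quality string.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def starts_with_primer(read, primer):
    """True if the read begins with the primer, allowing no mismatches.

    Degenerate primer positions match any base permitted by the IUPAC code.
    """
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read, primer))


def strip_primer(read, primer):
    """Return the read with the leading primer removed, or None if absent."""
    if not starts_with_primer(read, primer):
        return None
    return read[len(primer):]
```

Under this sketch a read pair would be kept only if `strip_primer` returns a sequence for both the forward read (with `FORWARD_PRIMER`) and the reverse read (with `REVERSE_PRIMER`).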
Data were then imported into the R statistical environment for subsequent analysis (R version 3.6.3, 46 phyloseq package (version 1.28.0), 47 incorporating the Bioconductor workflow 48 ). 16S rRNA gene sequence data are deposited in the Sequence Read Archive (accession number PRJNA720094). Bile acid metabolomics data and selected metadata are available from the corresponding author on request (harry.sokol@gmail.com).

Targeted bile acid metabolomics
Bile acid metabolomics were performed as previously described. 17

Statistical and microbiome analysis
Continuous data are presented as median and interquartile range (IQR). Between-group differences were assessed for continuous data using the Wilcoxon rank sum test for two groups and the Kruskal-Wallis test for >2 groups, and for categorical data using the Chi-squared test or the Fisher exact test. P-values are represented with stars according to the following convention: * <0.05; ** <0.01; *** <0.001; **** <0.0001. P-values plotted by ggpubr are uncorrected.
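The star convention can be written as a small lookup, sketched here for illustration. The function name is hypothetical, and the "ns" label for non-significant values is an assumption, since the text does not state how values of 0.05 or above are displayed.

```python
def p_to_stars(p):
    """Map a p-value to the star notation described in the text."""
    if p < 0.0001:
        return "****"
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # assumption: label for p >= 0.05 is not given in the text


labels = [p_to_stars(p) for p in (0.2, 0.03, 0.004, 0.0002, 0.00001)]
```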
Filtering of microbiota data was performed initially to remove ASVs that were not assigned to a Phylum or which were present in only a single individual. Microbiota alpha diversity was estimated using the Shannon diversity index. Total-sum scaled (TSS) normalized data were generated for beta diversity assessment using the Bray-Curtis index. PERMANOVA was performed using the adonis function with 999 permutations (vegan package, 49 version 2.5-6). Differential abundance between groups was tested at the genus level in genera present in at least 10% of individuals using an ensemble or consensus approach, as suggested by a recent publication. 26 This included a standard Wilcoxon rank sum test on TSS normalized data, an approach using centered-log ratio (clr) transformed data as implemented in the ALDEx2 package, 50 and a negative binomial count method implemented in the DESeq2 package (version 1.24.0). 51 A false discovery rate threshold of 0.2 was applied and only taxa which were detected concordantly by all three approaches were included and plotted based on their DESeq2 'log2foldchange'. Plotting was performed using ggplot2 (version 3.2.1) 52 and ggpubr (version 0.2.3). 53 Triplots were created using the 'seqPCoA' function from the seqgroup package (https://github.com/hallucigenia-sparsa/seqgroup/), which wraps PCoA, PERMANOVA and envfit functions from the vegan package.
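For readers less familiar with these measures, the following minimal sketch shows TSS normalization, the Shannon index and the Bray-Curtis dissimilarity on toy data. It is not the phyloseq or vegan code used in the study; the function names and count vectors are hypothetical, and the natural logarithm is assumed for the Shannon index.

```python
import math


def tss(counts):
    """Total-sum scaling: convert raw counts to proportions summing to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero abundance."""
    return -sum(p * math.log(p) for p in tss(counts) if p > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    return numerator / (sum(a) + sum(b))


# Hypothetical genus-level counts for two samples (same genus order).
sample_a = [50, 30, 20, 0]
sample_b = [10, 10, 40, 40]

alpha_a = shannon(sample_a)
beta_ab = bray_curtis(tss(sample_a), tss(sample_b))
```

Because both vectors are TSS normalized before comparison, the denominator of the Bray-Curtis index equals 2 and differences in sequencing depth between samples do not contribute to the dissimilarity.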
Dirichlet multinomial mixture (DMM) clustering of bacterial genera was performed with the DirichletMultinomial package in R (version 1.26.0), 54,55 using the number of clusters that minimized the Laplace approximation. Due to variability between runs, this procedure was re-run 20 times to ensure a convergent result.
For bile acid metabolites, TSS normalized data were compared between groups. For principal component analysis (PCA), bile acid concentrations were normalized and transformed in a similar manner to that used in the MetaboAnalystR package (version 3.0.3), 56 replacing undetected values with one fifth of the minimum value per feature, followed by log2 transformation and mean centering. ALDEx2 was used to test for differentially abundant bile acid metabolites.
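The pre-PCA transformation can be sketched as follows. This is an illustration of the three steps named in the text, not the MetaboAnalystR code; the function name and the toy concentrations are hypothetical, and undetected values are assumed to be encoded as None or 0.

```python
import math


def transform_feature(values):
    """Impute, log2-transform and mean-center one bile acid across samples.

    values: concentrations for a single feature; None or 0 marks undetected.
    """
    detected = [v for v in values if v is not None and v > 0]
    if not detected:
        raise ValueError("feature was not detected in any sample")
    fill = min(detected) / 5  # one fifth of the minimum detected value
    logged = [
        math.log2(v if (v is not None and v > 0) else fill) for v in values
    ]
    mean = sum(logged) / len(logged)
    return [x - mean for x in logged]


# Hypothetical concentrations of one bile acid in three samples.
centered = transform_feature([10.0, None, 40.0])
```

In this toy example the undetected sample is imputed as 2.0 (one fifth of 10.0), so after the log2 transformation it sits log2(5) units below the lowest detected value.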
The relative risk of dysplasia between low-risk and high-risk clusters was calculated with 95% confidence intervals 57 and statistical significance tests were performed using the Fisher exact test. For correlation between bacterial genera and bile acid metabolites (cholic acid (CA), chenodeoxycholic acid (CDCA), deoxycholic acid (DCA), lithocholic acid (LCA) and ursodeoxycholic acid (UDCA)), zero values were imputed using the 'cmult' function from the zCompositions package, 58 followed by the clr transformation from the compositions package. 59 A Spearman rank correlation was performed using the 'rcorr' function from the Hmisc package and only values with an FDR-corrected p-value of <0.2 and an absolute correlation coefficient of >0.4 were included. Genera were assigned to the DMM cluster for which they had the highest mean abundance and the resulting heatmap was plotted using the pheatmap package.
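A relative risk with a 95% confidence interval can be computed as in the minimal sketch below. The log-based (Katz) interval is an assumption, since the text cites a reference for the interval without naming the method; the function name and the counts are hypothetical and are not the study data.

```python
import math


def relative_risk(events_high, n_high, events_low, n_low, z=1.96):
    """Relative risk of the high-risk vs low-risk group with a log-based CI.

    Assumes at least one event in each group (the log method is undefined
    when either event count is zero).
    """
    rr = (events_high / n_high) / (events_low / n_low)
    se_log_rr = math.sqrt(
        1 / events_high - 1 / n_high + 1 / events_low - 1 / n_low
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 12 of 40 patients with neoplasia in a high-risk
# cluster versus 4 of 40 in the low-risk cluster.
rr, lower, upper = relative_risk(12, 40, 4, 40)
```

With few events per group, as in this study, the interval is wide and an exact test such as the Fisher exact test is the more appropriate basis for the significance statement, which is consistent with the approach described above.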

Contributorship statement
HS designed the study. CO and NR performed DNA sample extraction. J-MR, DL, PM, XT, MA, XR, GM, XD, JK, LB, PS, LRB and HS recruited patients to the study. DR, AnLam, EG performed bile acid analysis. HS and AL analyzed the data. All authors contributed to discussion and writing of the manuscript.

Disclosure statement
No potential conflict of interest was reported by the author(s).