Formation of secondary allo-bile acids by novel enzymes from gut Firmicutes

ABSTRACT The gut microbiome of vertebrates is capable of numerous biotransformations of bile acids, which are responsible for intestinal lipid digestion and function as key nutrient-signaling molecules. The human liver produces bile acids from cholesterol predominantly in the A/B-cis orientation in which the sterol rings are “kinked”, as well as small quantities of A/B-trans oriented “flat” stereoisomers known as “primary allo-bile acids”. While the complex multi-step bile acid 7α-dehydroxylation pathway has been well-studied for conversion of “kinked” primary bile acids such as cholic acid (CA) and chenodeoxycholic acid (CDCA) to deoxycholic acid (DCA) and lithocholic acid (LCA), respectively, the enzymatic basis for the formation of “flat” stereoisomers allo-deoxycholic acid (allo-DCA) and allo-lithocholic acid (allo-LCA) by Firmicutes has remained unsolved for three decades. Here, we present a novel mechanism by which Firmicutes generate the “flat” bile acids allo-DCA and allo-LCA. The BaiA1 was shown to catalyze the final reduction from 3-oxo-allo-DCA to allo-DCA and 3-oxo-allo-LCA to allo-LCA. Phylogenetic and metagenomic analyses of human stool samples indicate that BaiP and BaiJ are encoded only in Firmicutes and differ from membrane-associated bile acid 5α-reductases recently reported in Bacteroidetes that indirectly generate allo-LCA from 3-oxo-Δ4-LCA. We further map the distribution of baiP and baiJ among Firmicutes in human metagenomes, demonstrating an increased abundance of the two genes in colorectal cancer (CRC) patients relative to healthy individuals.


Introduction
Bile acid synthesis in the liver represents a major route for removal of cholesterol from the body, and bile acids function as an emulsifying agent for the digestion of lipid-soluble dietary components in the aqueous lumen of the small bowel. 1 In humans, the liver synthesizes two abundant primary bile acids, cholic acid (CA; 3ɑ-,7ɑ-,12ɑ-trihydroxy-5β-cholan-24-oic acid) and chenodeoxycholic acid (CDCA; 3ɑ-,7ɑ-dihydroxy-5β-cholan-24-oic acid), from cholesterol. Before active secretion from the liver, bile acids are conjugated to either taurine or glycine at the C-24 carboxyl group. 1 When bile acids reach the terminal ileum, they are actively transported across the epithelium into portal blood and returned to the liver in a process known as enterohepatic circulation (EHC). Daily, several hundred milligrams of bile acids escape EHC and enter the large intestine. Colonic bacteria are capable of carrying out numerous biotransformations of primary bile acids to diverse secondary bile acids in the large intestine. The composition of intestinal and fecal bile acids in germ-free animals reflects the biliary composition. [2][3][4][5] Meanwhile, in conventional animals with a normal gut microbiota, fecal bile acid composition is diversified from only a few primary bile acids synthesized by the host to an estimated ~400 secondary bile acid products. 6,7 Bacterial modifications to bile acids provide a form of interdomain communication given that, beyond being mere lipid-digesting detergents, bile acids are important nutrient-signaling molecules. 8 Indeed, microbial metabolism of bile acids is widely recognized to contribute to numerous human disorders including, but not limited to, cancers of the liver 9,10 and colon, 11 obesity, type 2 diabetes, nonalcoholic fatty liver disease (NAFLD), 12,13 cholesterol gallstone disease, 14,15 Alzheimer's disease, 16,17 and cardiovascular disease. 18
A myriad of microbial bile acid biotransformations occur in the large intestine and include two key transformations. First, the conjugated bile acids are hydrolyzed to unconjugated bile acids and glycine or taurine by bile salt hydrolase (BSH). 19 Second, the unconjugated primary bile acids CA and CDCA are converted via 7ɑ-dehydroxylation to deoxycholic acid (DCA; 3ɑ-,12ɑ-dihydroxy-5β-cholan-24-oic acid) and lithocholic acid (LCA; 3ɑ-hydroxy-5β-cholan-24-oic acid), respectively. 20 BSH (EC 3.5.1.24) enzymes are widely distributed among predominant microbial phyla within the domains Bacteria and Archaea inhabiting the human GI tract and catalyze the substrate-limiting deconjugation of bile acid amides. 19 The resulting major secondary bile acids routinely measured in human fecal samples are unconjugated derivatives of DCA and LCA. 20 A bile acid inducible (bai) regulon encoding enzymes involved in the conversion of CA to DCA (Figure 1), and CDCA and ursodeoxycholic acid (UDCA; 3ɑ-,7β-dihydroxy-5β-cholan-24-oic acid) to LCA has been elucidated over the past three decades in strains of Lachnoclostridium scindens (formerly Clostridium scindens), Peptacetobacter hiranonis (formerly Clostridium hiranonis), and Lachnoclostridium hylemonae (formerly Clostridium hylemonae). 20 Discovery and characterization of bai genes have allowed recent studies to extend the species distribution of 7-dehydroxylating bacteria into new families within the Firmicutes through bioinformatics-based searches of metagenomic sequence databases. 21,22 Similarly, comparison of the distribution of bai genes between fecal metagenomes obtained from healthy and disease cohorts has also enabled the association of the abundance of bai genes with risk for adenomatous polyps 23 or colorectal cancer. 24 This agrees with bile acid metabolomic studies that demonstrate increased fecal and serum DCA and LCA derivatives in subjects at high risk for CRC. [25][26][27][28][29][30] Conversely, lower abundance of bai genes is associated with bile acid dysbiosis characterized by increased fecal conjugated primary bile acids in inflammatory bowel diseases. 31,32
There are additional bai genes yet to be accounted for in strains of L. scindens that result in the formation of stereoisomers of DCA and LCA known as "secondary allo-bile acids". In 1991, Hylemon et al. 33 reported that allo-deoxycholic acid (allo-DCA; 3ɑ-,12ɑ-dihydroxy-5ɑ-cholan-24-oic acid) formation is a CA-inducible side-product of bile acid 7-dehydroxylation by L. scindens. During the conversion of cholesterol to the primary bile acids CA and CDCA, the liver enzyme Δ 4 -3-ketosteroid-5β-reductase (3-oxo-Δ 4 -steroid-5β-reductase; AKR1D1) saturates the Δ 4 -bond, generating steroid A/B rings in the cis-orientation, which appear "kinked" (Figure 1). When CA is transported into bacteria expressing bai genes, the first oxidative steps of bile acid 7-dehydroxylation, catalyzed by BaiA and BaiCD, "reset" A/B ring stereochemistry through formation of the 3-keto-Δ 4 structure. 20 This is followed by the rate-limiting 7ɑ-dehydration (BaiE). 34 The BaiCD was shown to then re-establish stereochemistry by catalyzing the conversion of 3-oxo-Δ 4 -DCA (12ɑ-hydroxy-3-oxo-5β-chol-4-en-24-oic acid) to 3-oxo-DCA (12ɑ-hydroxy-3-oxo-5β-cholan-24-oic acid), which is further reduced by BaiA1 and BaiA2 to DCA. 35 The current model of bile acid 7ɑ-dehydroxylation suggests that another enzyme, currently unknown, acts on 3-oxo-Δ 4 -DCA to form the alternative stereoisomer, 3-oxo-allo-DCA (12ɑ-hydroxy-3-oxo-5ɑ-cholan-24-oic acid), which is reduced by another unknown reductase to allo-DCA. Secondary allo-bile acids have a "flat" shape owing to hydrogenation that results in an A/B-trans orientation (Figure 1). While few studies have reported measurement of allo-DCA and allo-LCA (3ɑ-hydroxy-5ɑ-cholan-24-oic acid), two studies have shown these bile acids are enriched in the feces of patients with CRC. 36,37 Derivatives of allo-LCA are also reported to be enriched in Japanese centenarians, 38 although there is a paucity of measurement of secondary allo-bile acids across populations and disease states. Thus, determining the gene(s) encoding reductases in L. scindens and other gut microbes responsible for the formation of allo-DCA and allo-LCA is of biomedical importance.
We recently reported genome-wide transcriptome profiling of L. scindens ATCC 35704 in the presence of CA and DCA and identified a potential candidate bile acid-inducible 3-oxo-Δ 4 -5ɑ-reductase. 39 Here, we confirm that this candidate bile acid-inducible gene encodes a novel bile acid 3-oxo-Δ 4 -5ɑ-reductase responsible for secondary allo-bile acids formation. We have named this gene in L. scindens ATCC 35704 the baiP gene. We previously reported identification of the baiJ gene as part of a polycistronic operon in L. scindens VPI 12708 and L. hylemonae DSM 15053, whose function remained unknown. 40 Our current study reports that the baiJ gene also encodes a bile acid 3-oxo-Δ 4 -5ɑ-reductase. The baiP and baiJ genes are distributed solely among the Firmicutes. Identification of these bai genes may provide the ability to predict and potentiate the formation of alternative forms of secondary bile acids whose ring structures are "flat" rather than the "kinked" form produced by the host. Indeed, we developed Hidden Markov Models (HMMs) of bai proteins and determined the distribution of baiP and baiJ in human metagenomes, demonstrating increased abundance in colorectal cancer (CRC) patients relative to healthy individuals.
Figure 1 (legend, partial). BaiB, Bile acid CoA ligase; BaiA, 3α-hydroxysteroid dehydrogenase; BaiCD, 3-dehydro-Δ 4 -7α-oxidoreductase; BaiE, 7α-dehydratase; BaiF, CoA transferase; BaiH, 3-dehydro-Δ 4 -7β-oxidoreductase. The enzymes involved in the sequential reduction of 3-oxo-Δ 4 -DCA and allo-DCA are currently unknown.

Phylogenetic analysis of BaiP followed by functional assay reveals the baiJ gene also encodes bile acid 5ɑ-reductase
Having provided experimental evidence that baiP encodes an enzyme with bile acid 5ɑ-reductase activity, we wanted to determine the phylogeny of the BaiP from L. scindens ATCC 35704. A subtree of the >1,400 sequences representing close relatives of the BaiP from L. scindens ATCC 35704 was generated (Figure 3a). The proteins most closely related to BaiP from L. scindens ATCC 35704 in the "BaiP Cluster" were from Lachnoclostridium strains MSK.5.24, GGCC_0168, and Lachnospiraceae bacterium 5_1_57FAA. Additional FAD-dependent oxidoreductase BaiP candidates from a penguin isolate, Proteocatella sphenisci DSM 23131 (76% sequence identity), and P. hiranonis 15,44 (72% sequence identity) were also identified at high bootstrap values (90-100%). Previous work established bai genes in P. hiranonis, 45 although the present data provide the first indication that P. hiranonis has the potential to form secondary allo-bile acids (Figure 3a, 3b). P. sphenisci has also been reported to encode the bai polycistronic operon, 21,22 and our demonstration that P. sphenisci harbors baiP indicates that secondary allo-bile acids may constitute part of the bile acid metabolome of penguin guano (Figure 3b).
A second closest FAD-dependent oxidoreductase cluster (~45% ID) to BaiP from L. scindens ATCC 35704 was composed of the previously named BaiJ proteins from L. scindens VPI 12708, L. hylemonae DSM 15053, and P. hiranonis DSM13275, as well as Dorea sp. D27, and an unclassified Clostridium sp. ("BaiJ Cluster"). Prior work established a novel bai operon in which the baiJ gene is adjacent to the baiK gene on a polycistronic operon in L. scindens VPI 12708 and L. hylemonae DSM 15053. 40 Evidence was also presented that L. scindens VPI 12708 and L. hylemonae DSM 15053 formed allo-DCA. 46 It was then reported that the BaiK is a paralog of BaiF in L. scindens VPI 12708, and both proteins catalyze bile acid coenzyme A transferase from the end-product secondary bile acids, DCA~SCoA and allo-DCA~SCoA, to primary bile acids including CA, CDCA, allo-CA, and UDCA. 40 The baiJ gene has been shown previously to be enriched in the gut microbiome in mouse models of liver cancer and CRC, 9,24 diseases reported to be enriched in secondary allo-bile acids in the biliary pool in the few studies that have measured them. 47 Taken together, the close phylogenetic clustering of BaiJ with BaiP indicates that the baiJ gene may also encode a bile acid 5ɑ-reductase isoform (Figure 3a, 3b). 21,44,45 To test this hypothesis, we cloned and overexpressed the baiJ gene from L. scindens VPI 12708 (accession number: ACF20978) in E. coli BL21(DE3) (Figure 3c

BaiP and BaiA1 catalyze consecutive final reductive steps in the formation of allo-DCA and allo-LCA
Having established that BaiP converts 3-oxo-Δ 4 -LCA to 3-oxo-allo-LCA, we next sought to identify an enzyme from L. scindens ATCC 35704 catalyzing the final reductive step from 3-oxo-allo-LCA to allo-LCA. There is compelling evidence that BaiA1 and BaiA2 enzymes catalyze the first oxidative and last reductive steps in the pathway. 35,48,49 This comes from substrate-specificity and kinetic analyses of BaiA1 and BaiA2 showing that 3-oxo-DCA and 3-oxo-LCA are substrates 48 and from the observation that BaiA is sufficient for the final reductive step yielding DCA. 35 Prior work established that the baiA genes encode bile acid 3ɑ-hydroxysteroid dehydrogenases (3ɑ-HSDHs) that catalyze the first oxidation step, formation of 3-oxo-7ɑ-hydroxy-5β-bile acids, and the final reductive step generating 7-deoxy-3ɑ-hydroxy-5β-bile acids. 49 A previous bioinformatics study hypothesized, based on gene context and annotation, that CLOSCI_00522, a gene directly downstream from baiN (CLOSCI_00523), encodes a predicted NAD(FAD)-utilizing dehydrogenase involved in the final reductive step 31 (Figure S1). This gene was named "baiO". 31 An organism may encode several proteins from different lineages that have similar catalytic activity. Indeed, the BaiN 50 is predicted to catalyze similar sequential reactions to BaiH and BaiCD. 35 We therefore tested the hypothesis that the previously annotated baiO encodes a bile acid 3-oxo-Δ 4 -reductase and/or a bile acid 3ɑ-HSDH. We cloned the baiO in pETduet and verified the expression after His-tag purification and SDS-PAGE (Figure S1a, S1b). Analysis of bile acid products after 24 h incubation of E. coli expressing BaiO enzyme in a resting cell assay with either 3-oxo-LCA, 3-oxo-DCA (Figure S1c, S1d), 3-oxo-Δ 4 -LCA, or 3-oxo-Δ 4 -DCA (Figure S1e, S1f) did not yield a detectable product by LC/MS. While this does not disprove that CLOSCI_00522 is involved in bile acid metabolism, we were not able to confirm its function.

The distribution of baiP and baiJ genes in public human metagenome datasets
Having shown that BaiP clusters with the previously identified BaiJ from L. hylemonae DSM 15053, the next objective was to determine the presence of bai genes involved in bile acid 7-dehydroxylation among bacterial genomes from human stool samples. We utilized reference sequences of BaiP and BaiJ as well as BaiE and BaiCD (Figure 5a) to generate HMMs in order to search public human metagenomic databases. We expected that the occurrence of BaiE and BaiCD, which are cotranscribed on the multi-gene bai operon, would coincide with the relative abundances of BaiP and BaiJ. As expected, genes for BaiE and BaiCD as well as BaiP and BaiJ were observed to have similar relative frequency (1% and 0.9% of total metagenome assembled genomes (MAGs), respectively). All genes were largely represented by unclassified Firmicutes and Lachnospiraceae (Figure 5a). Representative genera were analyzed to identify candidates that possess multiple genes of the bai operon; this revealed that unclassified Firmicutes, unclassified Lachnospiraceae, and Flavonifractor harbored all four genes analyzed. This pathway analysis also revealed the novel finding that Flavonifractor and Pseudoflavonifractor harbor genes for bile acid 7-dehydroxylation. Intriguingly, while bai genes represented approximately 1% of total MAGs, genes were detected in approximately one third of subjects (BaiCD 35%, BaiE 35%, BaiJ 30%, and BaiP 28%). An analysis of differences in gene presence among healthy subjects and those with adenoma and carcinoma revealed that the genes had the greatest abundance in patients with carcinoma, and that the genes baiCD, baiE, and baiJ were significantly associated with carcinoma (Figure 5b, Table S4, S5).

Discussion
The results of the current study add to a growing literature demonstrating that the colonic microbes are capable of "resetting" stereochemistry of sterols undergoing enterohepatic circulation through expression of 5ɑ-reductase and 5β-reductase enzymes. So far, two mechanisms have been identified: (1) A direct mechanism whereby bacteria encoding the multi-step bile acid 7ɑ-dehydroxylation pathway convert primary bile acids to either secondary bile acids via BaiCD/BaiN or, as shown herein, secondary allo-bile acids via BaiP/BaiJ activities; and (2) an indirect mechanism in which certain species of Bacteroidetes convert 5β-secondary bile acids DCA and LCA to 3-oxo-Δ 4 -intermediates, followed by reduction to secondary allo-bile acids. 38 The current work is thus a significant advance toward determining the enzymatic basis for the formation of secondary allo-bile acids by the gut microbiome (Figure 6).
Bile acid intermediates in the 7ɑ-dehydroxylation pathway have been determined previously. Björkhem et al. 51  C] CA to volunteers followed by analysis of tritium loss after extraction from duodenal aspirates confirmed that 3-oxo-Δ 4 -bile acid intermediates were formed during conversion of CA to DCA. 33 Subsequent work incubating [24-14 C] CA with cell extracts of L. scindens VPI 12708 revealed a multi-enzyme pathway necessary to convert CA to DCA (and CDCA to LCA). 52 Hylemon and Björkhem (1991) isolated nine [24-14 C] CA intermediates after incubation with cell-free extracts of CA-induced whole cells of L. scindens VPI 12708, providing the biochemical framework to search for enzymes involved in bile acid 7ɑ-dehydroxylation. 33 Subsequent work determined that bile acid 7ɑ-dehydroxylation proceeds by two oxidation steps yielding a 7ɑ-hydroxy-3-oxo-Δ 4 -intermediate, the substrate for the rate-limiting enzyme, bile acid 7ɑ-dehydratase (BaiE). 34,41,53 Removal of the C7-hydroxyl yields a 7-deoxy-3-oxo-Δ 4,6 -intermediate, which is then reduced by flavoproteins BaiN 50 or BaiH 35 to a 7-deoxy-3-oxo-Δ 4 -intermediate. The BaiCD and BaiA isoforms then convert 7-deoxy-3-oxo-Δ 4 -intermediates to DCA or LCA. 35,53 One of the bile acid-inducible [24-14 C] CA metabolites identified was [24-14 C] allo-DCA, indicating that L. scindens possesses an enzyme with bile acid 5ɑ-reductase activity distinct from BaiCD (bile acid 5β-reductase). 33 The current results establish conclusively that the baiP and baiJ genes encode bile acid 5ɑ-reductases in different strains of L. scindens and related Firmicutes that catalyze the formation of allo-DCA and allo-LCA.
Previous work also demonstrated that BaiA1 and BaiA2 catalyze both the initial oxidation and final reduction in the formation of DCA and LCA. 35,48 However, a recent report named a gene (CLOSCI_00522) adjacent to baiN, the "baiO" that encodes a predicted 61 kDa flavin-dependent dehydrogenase proposed to catalyze the final reductive step in the pathway. 31 We tested both BaiA1 and BaiO for reduction of allo-DCA and allo-LCA. While the function of CLOSCI_00522 in bile acid metabolism remains unclear, our results have extended the functional role of the BaiA1. We determined for the first time that this enzyme converts 3-oxo-allo-DCA and 3-oxo-allo-LCA to allo-DCA and allo-LCA, respectively.
The functional role of the previously reported baiJKL operon in L. scindens VPI 12708 and L. hylemonae DSM 15053 has also been extended by the current study. 40 Ridlon and Hylemon (2012) reported that BaiK and BaiF catalyze bile acid~CoA transferase from secondary bile acids, including allodeoxycholyl~SCoA, to primary bile acids. 40 The baiJ gene was annotated as "flavin-dependent fumarate reductase" and "3-ketosteroid-Δ 1 -dehydrogenase", and is co-expressed with baiKL under the control of the conserved bai promoter. 40 We previously observed bile acid induction of baiJKL genes by RT-PCR 40 and RNA-Seq 54 in L. hylemonae DSM 15053. Also, the baiJ gene was reported to be enriched in the gut microbiome in mouse models of liver cancer and CRC. 9,24 Fecal secondary allo-bile acids have also been reported to be enriched in GI cancers. 47 Phylogenetic analysis of BaiP from L. scindens ATCC 35704 revealed two clusters harboring Firmicutes encoding the bai pathway, many of which, such as P. hiranonis, L. hylemonae, and strains of L. scindens, are known to convert CA and CDCA to DCA and LCA, respectively. These clusters are also represented by taxa such as Dorea sp. D27, P. sphenisci, and Oscillospiraceae MAGs whose genome sequences contain bai operons. 21,22 Clusters with more distant homologs of BaiP are also worth examining in future studies for novel bile acid 3-oxo-Δ 4 -reductases. Mining human metagenomic datasets for "core" Bai proteins (BaiCD, BaiE) as well as BaiP and BaiJ sequences confirmed that these enzymes are only encoded in Firmicutes. Roughly a third of healthy, adenoma, and carcinoma subjects had detectable BaiE enzymes representing ~1% of MAGs. A combination of low abundance bile acid 7-dehydroxylating Firmicutes and stringency of the HMM search likely explains the low representation of subjects with detectable Bai enzymes. Intriguingly, and in line with previous reports, 24 Bai enzymes are enriched in CRC subjects relative to healthy subjects.
There is a paucity of studies on secondary allo-bile acids, and the literature that exists is conflicting as to whether to regard these hydrophobic "flat" bile acids as beneficial, disease promoting, or contextually important. [36][37][38]47 Recent work measured the secondary allo-bile acid iso-allo-LCA in fecal samples at an average concentration of ~20 μM, and showed that low micromolar levels, such as those achieved in our resting cell assays, inhibit the growth of gram-positive pathogens including Clostridioides difficile 38 (Figure 6). There is growing interest in the immune mechanisms of action of secondary bile acid derivatives and isomers in the colon. Secondary bile acid derivatives, including 3-oxo-DCA, 3-oxo-LCA, iso-DCA (3β,12ɑ-dihydroxy-5β-cholan-24-oic acid), iso-LCA (3β-hydroxy-5β-cholan-24-oic acid), and certain secondary allo-bile acids (e.g. iso-allo-LCA: 3β-hydroxy-5ɑ-cholan-24-oic acid), regulate the balance of regulatory T cells (Treg) and pro-inflammatory TH17 cells by promoting expansion of Tregs. [55][56][57] The current work is thus an important contribution in a rapidly evolving area of the role of diverse bile acid metabolites generated by the gut microbiome on mechanisms underlying host health and disease.

Cloning of bai operon genes from L. scindens strains
The strains/plasmids, primers, and synthetic DNA sequences used in this study are listed in Table S1, S2, and S3, respectively. First, the baiP gene encoding FAD-dependent oxidoreductase and the baiA1 gene encoding 3α-HSDH from L. scindens ATCC 35704, the baiJ gene encoding FAD-dependent oxidoreductase from L. scindens VPI 12708, and baiO encoding a predicted 61 kDa flavin-dependent dehydrogenase were codon-optimized for E. coli and synthesized using the gBlocks service from Integrated DNA Technologies (IDT, IA, USA). To construct a BaiP, BaiJ, BaiO or BaiA1 expression plasmid (pBaiP, pBaiJ, pBaiO or pBaiA1), a DNA fragment (vector fraction) was amplified from the pETduet plasmid using a primer pair of V1-F and V1-R, V1-F and V1-R, V1-F and V1-R, or V2-F and V2-R, respectively. Another DNA fragment (insert fraction) was amplified from the synthetic oligomers of BaiP, BaiJ, BaiO or BaiA1 using a primer pair of BaiP-F and BaiP-R, BaiJ-F and BaiJ-R, BaiO-F and BaiO-R or BaiA1-F and BaiA1-R, respectively. For each construct, the two PCR products were ligated together by in vitro homologous recombination using a Gibson assembly cloning kit (NEB, Boston, MA, USA). For construction of a BaiP and BaiA1 co-expression plasmid (pBaiP-A1), a DNA fragment (vector fraction) was amplified from the pBaiP plasmid using a pair of the primers V2-F and V2-R, and another DNA fragment (insert fraction) was amplified from the synthetic oligomer of BaiA1 using a pair of the primers BaiA1-F and BaiA1-R. The two PCR products were ligated together by the Gibson assembly cloning kit (NEB). Each recombinant plasmid (Table S1) was transformed into chemically competent E. coli Top10 cells via the heat-shock method, plated, and grown overnight at 37°C on lysogeny broth (LB) agar plates supplemented with appropriate antibiotics (Ampicillin: 100 μg/ml). A single colony from each transformation was inoculated into LB medium (5 ml) containing the corresponding antibiotic. 
The cells were subsequently centrifuged (3,220 × g, 10 min, 4°C) and plasmids were extracted from the cell pellets using the QIAprep Spin Miniprep kit (Qiagen, CA, USA). The sequences of the inserts were confirmed by Sanger sequencing (ACGT Inc, Wheeling, IL, USA).

Heterologous expression and purification of Bai enzymes in E. coli
For protein expression, each extracted recombinant plasmid was transformed into E. coli BL21(DE3) cells by electroporation, and the cells were cultured overnight at 37°C on LB agar plates supplemented with appropriate antibiotics. Selected colonies were inoculated into 10 mL of LB medium containing the corresponding antibiotic and grown at 37°C for 6 h with vigorous aeration. The precultures were added to fresh LB medium (1 L), supplemented with appropriate antibiotics, and aerated at 37°C until reaching an OD 600 (optical density of a sample measured at a wavelength of 600 nm) of 0.3. IPTG was added to each culture at a final concentration of 0.1 mM to induce expression, and the temperature was decreased to 16°C. Following 16 h of culturing, cells were pelleted by centrifugation (4000 × g, 30 min, 4°C) and resuspended in 30 ml of binding buffer (20 mM Tris-HCl, 300 mM NaCl, 10 mM 2-mercaptoethanol, pH 7.9). The cell suspension was lysed with an ultrasonicator (Fisher Scientific) and the cell debris was separated by centrifugation (20,000 × g, 40 min, 4°C).
The recombinant protein in the soluble fraction was then purified using TALON Metal Affinity Resin (Clontech Laboratories, CA, USA) per manufacturer's protocol. The recombinant protein was eluted using an elution buffer composed of 20 mM Tris-HCl, 300 mM NaCl, 10 mM 2-mercaptoethanol, and 250 mM imidazole at pH 7.9. The resulting purified protein was analyzed using sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE).

Whole cell bile acid conversion assay
E. coli BL21(DE3) strains harboring the constructed plasmids were cultured aerobically at 25°C in LB medium (10 mL) supplemented with appropriate antibiotics, and expression of the corresponding proteins was induced with IPTG at 25°C. Following 16 h of culturing, the strains were pelleted by centrifugation (3,220 × g, 10 min) and washed twice with anaerobic PBS solution. The washed E. coli strains were inoculated along with 50 μM bile acid substrates (3-oxo-Δ 4 -LCA, 3-oxo-Δ 4 -DCA, or 3-oxo-allo-LCA) into 10 mL of PBS and incubated anaerobically at room temperature for 12 h. The whole cell reaction cultures were centrifuged at 3,220 × g for 10 min to remove bacterial cells, and the pH of the supernatant was adjusted to 3.0 by adding 25 μL of 2 N HCl. Bile acid metabolites were extracted by vortexing with two volumes of ethyl acetate for 1 to 2 min. The organic layer was recovered and evaporated under nitrogen gas. The products were dissolved in 200 μL methanol and analyzed by liquid chromatography-mass spectrometry (LC-MS).

Liquid chromatography-mass spectrometry
LC-MS analysis for all samples was performed using a Waters Acquity UPLC system coupled to a Waters SYNAPT G2-Si ESI mass spectrometer (Milford, MA, USA). To analyze the bile acid substrates and products of the whole cell bioconversion assays with the E. coli strains expressing BaiP, BaiJ, or BaiP-A1 enzymes (3-oxo-Δ 4 -LCA, 3-oxo-Δ 4 -DCA, 3-oxo-LCA, 3-oxo-allo-LCA, 3-oxo-DCA, LCA, allo-LCA, DCA, and allo-DCA), LC was performed with a Waters Acquity UPLC HSS T3 C18 column (1.8 μm particle size, 2.1 mm × 100 mm) at a column temperature of 40°C. The injection volume was 0.2 μL. Mobile phase A was a mixture of acetonitrile and methanol (50/50, v/v), and B was 10 mM ammonium acetate. The mobile phase composition was 75% of mobile phase A and 25% of mobile phase B, run in isocratic mode. The flow rate of the mobile phase was 0.5 mL/min. MS was carried out in negative ion mode with a desolvation temperature of 400°C and desolvation gas flow of 800 L/hr. The capillary voltage was 2,000 V. Source temperature was 120°C, and the cone voltage was 30 V. Chromatograms and mass spectrometry data were analyzed using Waters MassLynx software. Analytes were identified according to their mass and retention time. For quantification of 3-oxo-allo-LCA produced by the E. coli BL21(DE3) strains expressing BaiP/BaiJ, a standard curve was obtained, and then 3-oxo-allo-LCA was quantified based on the standard curve (Figure S5). The limit of detection (LOD) for 3-oxo-Δ 4 -LCA, 3-oxo-allo-LCA, and allo-LCA was 0.1 μmol/L.
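The standard-curve quantification described above can be illustrated with a minimal sketch. It assumes a linear, unweighted least-squares calibration of peak area against concentration; the fitting procedure actually used is not stated in the text, and the function names and numerical values below are hypothetical. Only the 0.1 μmol/L LOD is taken from the text.

```python
# Minimal sketch: linear standard curve for LC-MS quantification.
# Assumption: peak area responds linearly to concentration (unweighted fit).
# All standards and areas below are hypothetical illustration values.

def fit_standard_curve(concs_um, peak_areas):
    """Ordinary least-squares fit of peak area = slope * conc + intercept."""
    n = len(concs_um)
    mean_x = sum(concs_um) / n
    mean_y = sum(peak_areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concs_um)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concs_um, peak_areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def quantify(peak_area, slope, intercept, lod_um=0.1):
    """Convert a peak area to concentration (uM); None if below the LOD."""
    conc = (peak_area - intercept) / slope
    return conc if conc >= lod_um else None

# Hypothetical 3-oxo-allo-LCA standards (uM) and their peak areas.
standards = [0.5, 1.0, 5.0, 10.0, 50.0]
areas = [200.0 * c + 10.0 for c in standards]
slope, intercept = fit_standard_curve(standards, areas)
sample_conc = quantify(2010.0, slope, intercept)
```

Values that fall below the LOD are returned as `None` rather than as a number, so that non-detects are not mistaken for measured concentrations.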
To analyze cortisol and 11β-OHAD as substrates and products of the whole cell bioconversion assay with the E. coli strain expressing BaiP, LC was performed with a Waters Acquity UPLC BEH C18 column (1.7 μm particle size, 2.1 mm × 50 mm) at a column temperature of 40°C. The injection volume was 0.2 μL. Mobile phase A was a mixture of 95% water, 5% acetonitrile, and 0.1% formic acid, and B was a mixture of 95% acetonitrile, 5% water, and 0.1% formic acid. The mobile phase gradient was as follows: 0 min 100% mobile phase A, 0.5 min 100% A, 6.0 min 30% A, 7.0 min 0% A, 8.1 min 100% A, and 10.0 min 100% A. The flow rate of the mobile phase was 0.5 mL/min. MS was carried out in positive ion mode with a desolvation temperature of 450°C and desolvation gas flow of 800 L/hr. The capillary voltage was 3,000 V. Source temperature was 120°C, and the cone voltage was 30 V.
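As an illustration of the gradient program listed above, the following sketch computes the mobile phase composition at an arbitrary time point. It assumes linear ramps between the listed time points, which is a common convention but is not stated in the text; the function name is hypothetical.

```python
# Minimal sketch: percent mobile phase A during the gradient run.
# Assumption: composition changes linearly between the listed breakpoints.

# (time in min, % mobile phase A), taken from the gradient in the text.
GRADIENT = [(0.0, 100.0), (0.5, 100.0), (6.0, 30.0),
            (7.0, 0.0), (8.1, 100.0), (10.0, 100.0)]

def percent_a(t_min):
    """Return % mobile phase A at time t_min by piecewise-linear interpolation."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-10 min gradient program")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)

def percent_b(t_min):
    """Mobile phase B makes up the remainder of the binary mixture."""
    return 100.0 - percent_a(t_min)
```

Under this assumption, the composition midway through the main ramp (3.25 min) would be 65% A and 35% B.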

NMR spectroscopy
To determine the molecular structure of the chemically synthesized 3-oxo-Δ 4 -DCA and allo-DCA at the atomic level, NMR spectroscopy was performed. 1 H-NMR spectra were recorded on a JNM-ECA-500 spectrometer (JEOL Co., Tokyo, Japan) at 500 MHz, with pyridine-D 5 as the solvent. Chemical shifts are given as the δ-value with tetramethylsilane (TMS) as an internal standard. The abbreviations used here are: s, singlet; d, doublet; bs, broad singlet.

Phylogenetic analysis
Sequences for phylogenetic analyses were retrieved from NCBI's NR protein database with BLASTP v. 2.12.0+, 59 using the sequence of HDCHBGLK_03451 as the query, limiting the number of resulting database matches to five thousand, and allowing a maximum alignment E-value of 1E-10. The retrieved alignments showed high sequence conservation: the worst E-value seen in the alignments was about 3E-37.
Figure 6. Direct and indirect formation of secondary allo-bile acids, and their potential consequences. Taurocholic acid is deconjugated, mainly in the large intestine, by diverse gut microbial taxa. Free cholic acid is imported into a few species of Firmicutes that harbor the bai regulon. Direct Pathway: After several oxidative steps, and rate-limiting 7α-dehydration, 3-oxo-Δ 4 -DCA becomes a substrate for BaiCD forming DCA or BaiP/BaiJ forming allo-DCA. Indirect Pathway: DCA is imported into Bacteroidetes strains that express 3α-HSDH and 5β-reductase (5BR) which converts DCA to 3-oxo-Δ 4 -DCA. Expression of 5α-reductase (5AR) and 3β-HSDH sequentially reduce 3-oxo-Δ 4 -DCA to iso-allo-DCA. Alternatively, allo-DCA generated by Firmicutes can be isomerized to iso-allo-DCA by species expressing 3α-HSDH and 3β-HSDH such as Eggerthella lenta. While taurocholic acid is a germination factor for C. difficile, secondary bile acids such as DCA and secondary allo-bile acids are inhibitory toward C. difficile vegetative cells in the GI tract. Secondary bile acids, including DCA and allo-DCA, are associated with increased risk of colorectal cancer (CRC).
Given the high sequence similarities observed in the search step, sequences were clustered with USEARCH v. 11.0.667 60 to remove redundancy from the dataset. The cluster_fast command was used with an identity threshold of at least 95% to cluster sequences. Each cluster was represented in the phylogenetic analysis by one representative, the centroid sequence. The only exception was the sequences in the same cluster as the query sequence used above, in which case all sequences from the cluster were retained. Centroids 25% shorter or longer than the average sequence length calculated for the whole dataset (596 amino acids) were removed from the dataset, thus keeping in the analysis only sequences with at least 446 and at most 744 amino acids in length. The 1,460 protein sequences remaining in the dataset were aligned by MUSCLE v. 3.8.1551 61 and the best-fitting sequence substitution model was identified using ModelTest-NG v. 0.1.7. 62 Phylogenetic tree inference was performed using the maximum likelihood criterion as implemented by RAxML v. 8.2.12, 63 using the WAG sequence substitution model with empirical residue frequencies, gamma-distributed substitution rates, and bootstrap pseudoreplicates (whose number, 250, was determined automatically by the program at run-time). The resulting phylogenetic tree was edited with TreeGraph2 v. 2.15.0-887 64 and Dendroscope v. 3.7.6 65 and further cosmetic adjustments were performed with the Inkscape vector editor (https://inkscape.org/ last accessed on January 20th, 2022).
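The sequence-length filter used in the phylogenetic analysis (removal of sequences 25% shorter or longer than the dataset average) can be sketched as follows. This is an illustration only: the function name is hypothetical, sequences are represented as a plain dictionary, and treating the bounds as inclusive is an assumption, since the exact rounding used to arrive at the stated limits is not described.

```python
# Minimal sketch: keep sequences whose length is within +/-25% of the mean.
# Assumptions: bounds are inclusive; input is a dict of {sequence_id: sequence}.

def filter_by_length(sequences, tolerance=0.25):
    """Drop sequences more than `tolerance` shorter or longer than the mean."""
    mean_len = sum(len(seq) for seq in sequences.values()) / len(sequences)
    lower = mean_len * (1.0 - tolerance)
    upper = mean_len * (1.0 + tolerance)
    return {seq_id: seq for seq_id, seq in sequences.items()
            if lower <= len(seq) <= upper}

# Hypothetical toy dataset: mean length is 100, so the window is 75-125.
toy = {
    "centroid_1": "M" * 100,
    "centroid_2": "M" * 100,
    "centroid_3": "M" * 100,
    "too_short": "M" * 40,
    "too_long": "M" * 160,
}
kept = filter_by_length(toy)
```

In the study, the mean is computed over the whole dataset before filtering, which is what the sketch does; recomputing the mean after each removal would give a different window.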

Bai gene identification in MAG database
A database of publicly available MAGs from five cohorts varying in CRC status was previously annotated for open reading frames and used for this study. 66,67 Custom Hidden Markov Model (HMM) profiles were created for each of the four genes of interest (baiCD, baiE, baiP, and baiJ) by aligning the reference protein sequences from this study together with blastp results with 60% identity to those reference sequences, and then passing each alignment to hmmbuild to create an HMM profile. Initial HMM cutoffs were generated by querying protein sequences from the Human Microbiome Project. 66 To further refine HMM profile cutoffs, blast databases were made of each alignment, and a concatenated file of predicted open reading frames from the 16,936 MAGs described earlier was queried against the alignment databases. The MAG database was searched using the HMM profiles with finalized cutoffs and hmmsearch within HMMER 3.1b2. All custom HMM profiles used for these searches can be found at: https://github.com/escowley/BileAcid_LeeJ.
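To illustrate how per-profile cutoffs might be applied to search results, the sketch below filters hits from the tabular output that hmmsearch writes with its --tblout option (whitespace-delimited; target name in column 1, query name in column 3, full-sequence bit score in column 6). The cutoff values, sequence identifiers, and function name are hypothetical, and whether the authors applied their cutoffs in this way is an assumption.

```python
# Minimal sketch: apply per-profile bit-score cutoffs to hmmsearch --tblout rows.
# Assumptions: cutoffs are full-sequence bit scores; all values are hypothetical.

def filter_hits(tblout_lines, score_cutoffs):
    """Return (target, query, score) for hits meeting their profile's cutoff."""
    hits = []
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, query, score = fields[0], fields[2], float(fields[5])
        # Profiles without a defined cutoff never pass.
        if score >= score_cutoffs.get(query, float("inf")):
            hits.append((target, query, score))
    return hits

# Hypothetical cutoffs and tblout rows for illustration.
cutoffs = {"baiE": 150.0, "baiP": 400.0}
example = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "MAG1_orf12 - baiE - 1e-50 170.2 0.1 1e-50 170.0 0.1 1.0 1 0 0 1 1 1 1 -",
    "MAG2_orf07 - baiE - 1e-20 80.5 0.0 1e-20 80.1 0.0 1.0 1 0 0 1 1 1 1 -",
    "MAG3_orf33 - baiP - 1e-150 510.9 0.2 1e-150 510.7 0.2 1.0 1 0 0 1 1 1 1 -",
]
passing = filter_hits(example, cutoffs)
```

Keeping the cutoffs in a dictionary keyed by profile name lets each of the four profiles carry its own threshold, which matches the text's description of profile-specific cutoffs.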

Summary calculations and statistical analysis for association of Bai genes with disease state from MAG database
Summary calculations of number of gene hits in the MAG database, number of participants with the gene of interest, and disease information were performed in R and can be found in Table S4. Methods for determining associations between Bai genes and disease state were previously described. 66 Briefly, chi squared tests were performed on a dataset of binarized participants that were designated as "presence" if any of their MAGs contained a copy of the gene of interest or "absence" if none of their recovered MAGs contained a copy of the gene of interest. P-values less than 0.05 are designated as significant (Table S5).
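The binarization and chi-squared test described above can be sketched as follows. The authors performed their calculations in R; this Python version is an illustration only. It assumes a 2 × 2 comparison (for example, carcinoma versus healthy) and Pearson's statistic without continuity correction, neither of which is specified in the text, and all identifiers and counts are hypothetical.

```python
# Minimal sketch: participant-level presence/absence and a 2x2 chi-squared test.
# Assumptions: 2x2 design, no continuity correction; counts are hypothetical.
import math

def binarize(mags_by_participant, mags_with_gene):
    """True ("presence") if any of a participant's MAGs carries the gene."""
    return {participant: any(mag in mags_with_gene for mag in mags)
            for participant, mags in mags_by_participant.items()}

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 degree of freedom).

    Table layout:            presence  absence
        group 1 (e.g. CRC)       a        b
        group 2 (e.g. healthy)   c        d
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

status = binarize({"P1": ["m1", "m2"], "P2": ["m3"]}, {"m2"})
chi2, p_value = chi_squared_2x2(30, 20, 15, 35)
significant = p_value < 0.05
```

Binarizing at the participant level means that a subject with several gene-positive MAGs counts once, so the test compares the proportion of carriers between groups rather than the number of gene copies.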