Completion of the gut microbial epi-bile acid pathway

ABSTRACT Bile acids are detergent molecules that solubilize dietary lipids and lipid-soluble vitamins. Humans synthesize bile acids with α-orientation hydroxyl groups which can be biotransformed by gut microbiota to toxic, hydrophobic bile acids, such as deoxycholic acid (DCA). Gut microbiota can also convert hydroxyl groups from the α-orientation through an oxo-intermediate to the β-orientation, resulting in more hydrophilic, less toxic bile acids. This interconversion is catalyzed by regio- (C-3 vs. C-7) and stereospecific (α vs. β) hydroxysteroid dehydrogenases (HSDHs). So far, genes encoding the urso- (7α-HSDH & 7β-HSDH) and iso- (3α-HSDH & 3β-HSDH) bile acid pathways have been described. Recently, multiple human gut clostridia were reported to encode 12α-HSDH, which interconverts DCA and 12-oxolithocholic acid (12-oxoLCA). 12β-HSDH completes the epi-bile acid pathway by converting 12-oxoLCA to the 12β-bile acid denoted epiDCA; however, a gene(s) encoding this enzyme has yet to be identified. We confirmed 12β-HSDH activity in cultures of Clostridium paraputrificum ATCC 25780. From six candidate C. paraputrificum ATCC 25780 oxidoreductase genes, we discovered the first gene (DR024_RS09610) encoding bile acid 12β-HSDH. Phylogenetic analysis revealed unforeseen diversity for 12β-HSDH, leading to validation of two additional bile acid 12β-HSDHs through a synthetic biology approach. By comparison to a previous phylogenetic analysis of 12α-HSDH, we identified the first potential C-12 epimerizing strains: Collinsella tanakaei YIT 12063 and Collinsella stercoris DSM 13279. A Hidden Markov Model search against human gut metagenomes located putative 12β-HSDH genes in about 30% of subjects within the cohorts analyzed, indicating this gene is relevant in the human gut microbiome.


Introduction
The human liver produces all 14 enzymes necessary to convert cholesterol into the dihydroxy bile acid chenodeoxycholic acid (3α,7α-dihydroxy-5βcholan-24-oic acid; CDCA) and the trihydroxy bile acid cholic acid (3α,7α,12α-trihydroxy-5βcholan-24-oic acid; CA). 1 These bile acids are conjugated to taurine or glycine in the liver helping to lower the pK a and maintain solubility, impermeability to cell membranes, and lower the critical micellar concentration, allowing for efficient emulsification of dietary lipids and lipid-soluble vitamins. 2 Bile acids are effective detergents owing to the α-orientation of the hydroxyl groups which produce a hydrophilic-face above the plane of the cyclopentanophenanthrene steroid nucleus, and a hydrophobic-face below the plane of the hydrocarbon rings. 1 Conjugated bile acids emulsify lipids throughout the duodenum, jejunum, and ileum. Once bile acids reach the terminal ileum, high affinity transporters (intestinal bile acid transporter, IBAT) actively transport both conjugated and unconjugated bile acids from the intestinal lumen into ileocytes where they are bound to ileal bile acid binding protein (IBABP) and exported across the basolateral membrane into portal circulation and returned to the liver. 3 This process of recycling bile acids is known as enterohepatic circulation (EHC) and is responsible for recirculating the ~2 g bile acid pool 8-10 times daily. While ~95% efficient, roughly 600-800 mg bile acids escape active transport and enter the large intestine. 2 Anaerobic bacteria adapted to inhabiting the large intestine have evolved enzymes to modify the structure of host bile acids. 2 Conjugated bile acids are hydrolyzed, releasing the amino acids, by bile salt hydrolases (BSH) in diverse gut bacteria representing the major phyla, including Bacteroidetes, Firmicutes, and Actinobacteria, as well as the domain Archaea. 4 In contrast, the unconjugated primary bile acids CA and CDCA are 7α-dehydroxylated by a select few species of gram-positive Firmicutes mostly in the genus Clostridium, forming deoxycholic acid (3α,12αdihydroxy-5β-cholan-24-oic acid; DCA) and lithocholic acid (3α-hydroxy-5β-cholan-24-oic acid; LCA), respectively. 1,5 The secondary bile acids DCA and LCA have increased hydrophobicity relative to their primary counterparts, which is associated with elevated toxicity. 6 DCA and LCA have been causally linked to cancers of the colon, 7 liver, 8 and esophagus. 9 Importantly, gut microbiota can produce less toxic oxo-bile acids and β-hydroxy bile acids as well. 6 Bile acid 3α-, 7α-, and 12α-hydroxyl groups can be reversibly oxidized and epimerized to the βorientation by pyridine nucleotide-dependent hydroxysteroid dehydrogenases (HSDHs) distributed across the major phyla including Firmicutes, Bacteroidetes, Actinobacteria, Proteobacteria, as well as methanogenic archaea. 1,10 HSDH enzymes that recognize bile acids are regio-(C-3 vs. C-7) and stereospecific (α vs. β) for hydroxyl groups decorating the steroid nucleus. Thus, bile acid 12α-HSDH reversibly converts the C-12 position of bile acids from the αorientation, such as on DCA, to 12-oxo bile acids, such as 12-oxolithocholic acid . [11][12][13][14] Bile acid 12β-HSDH completes the epimerization by interconverting 12-oxo bile acids to the 12βconfiguration, forming epi-bile acids. We recently identified and characterized NAD(H)-and NADP(H)-dependent 12α-HSDHs from Eggerthella sp. CAG:298 15 , Clostridium scindens, C. hylemonae, and Peptacetobacter hiranonis (formerly Clostridium hiranonis). 10 In addition to these recently reported 12ɑ-HSDHs, multiple genes encoding enzymes in the urso-(7α-& 7β-HSDH) and iso-(3α-& 3β-HSDH) bile acid pathways have been described to date ( Figure 1). 5 However, a gene encoding 12β-HSDH to complete the epi-bile acid pathway has not yet been reported.
The first indication that gut bacteria may encode 12β-HSDH was suggested by the detection of 12βhydroxy bile acids in human feces. [16][17][18] Edenharder and Schneider (1985) reported 12β- Figure 1. A gene encoding 12β-HSDH completes the gut microbial epi-bile acid pathway. Cholic acid (CA) is converted to the oxointermediate, 7-oxodeoxycholic acid , and further to ursoCA through the urso-bile acid pathway catalyzed by NAD(P)dependent 7α-and 7β-HSDH. The secondary bile acid deoxycholic acid (DCA) is formed through the multi-step 7α-dehydroxylation of CA. DCA is biotransformed to 3-oxoDCA by 3α-HSDH and to isoDCA by 3β-HSDH in the iso-bile acid pathway. DCA can be converted to 12-oxolithocholic acid (12-oxoLCA) by 12α-HSDH and from 12-oxoLCA to epiDCA by 12β-HSDH. Examples of bacteria expressing each HSDH are shown below the reaction followed by corresponding gene annotations. Prior to this study, a gene encoding 12β-HSDH had not been identified. dehydrogenation of bile acids by Clostridium paraputrificum, and epimerization of DCA by coculture with E. lenta and C. paraputrificum. 19 Thereafter, Edenharder and Pfützner (1988) characterized crude NADP(H)-dependent 12β-HSDH activity from C. paraputrificum D 762-06. 20 However, little is known about the potential diversity of gut bacteria capable of forming 12β-hydroxy bile acids that molecular analysis is predicted to yield. Here, we report the identification of a gene encoding NADP(H)-dependent 12β-HSDH from C. paraputrificum ATCC 25,780 and characterization of the recombinant gene products purified after heterologous expression in E. coli from C. paraputrificum. We also identify novel taxa encoding bile acid 12β-HSDH by phylogenetic analysis, confirmed by a synthetic biology approach.

C. paraputrificum ATCC 25780 possesses bile acid 12β-HSDH activity
We first investigated the bile acid metabolizing capability of C. paraputrificum ATCC 25780 because previous studies reported bile acid 12β-HSDH activity in other C. paraputrificum strains, but did not identify the gene(s) responsible. 20 The epi-bile acid pathway of DCA involves the reversible conversion of DCA (3α,12α) to 12-oxoLCA (3α,12-oxo) through the action of 12α-HSDH, and 12-oxoLCA to epiDCA (3α,12β) by 12β-HSDH ( Figure 1). C. paraputrificum ATCC 25780 was incubated with two potential substrates of 12β-HSDH, 12-oxoLCA and epiDCA, along with DCA as a control. In order to contrast the product formed by bile acid 12β-HSDH with that formed by bile acid 12α-HSDH activity, Clostridium scindens ATCC 35704, which is known to express 12α-HSDH, was incubated with the same substrates. When 12-oxoLCA was incubated in cultures of C. paraputrificum ATCC 25780, the primary product eluted at 13.97 min with 391.28 m/z in negative ion mode ( Figure 2). This is consistent with the elution time of epiDCA standard and its 392.57 amu formula weight. With epiDCA as substrate, the culture produced a major peak of 391. 28

Identification of a gene encoding bile acid 12β-HSDH
After bile acid 12β-HSDH activity was confirmed in C. paraputrificum ATCC 25780, its genome was searched for genes encoding proteins annotated as oxidoreductases within the NCBI database. HSDHs are NAD(P)-dependent and often members of the large and diverse SDR (short-chain dehydrogenase/ reductase) family. 21 Five SDR family oxidoreductase proteins and one aldo/keto reductase were identified as 12β-HSDH candidates in the C. paraputrificum ATCC 25780 genome and pursued for further study. These six genes were amplified from genomic DNA of C. paraputrificum ATCC 25780, cloned into the pET-28a(+) vector, and overexpressed in E. coli (Table S1). The N-terminal His 6 -tagged recombinant proteins were purified by metal-affinity chromatography and resolved by SDS-PAGE ( Figure 3a). Two out of the six recombinant proteins (WP_027096909.1, WP_027099631.1) were not soluble and bands at the expected molecular masses were apparent in the membrane fraction by SDS-PAGE. These proteins were not explored further. The other four 12β-HSDH candidates (WP_027099077.1, WP_027098355.1, WP_027097937.1, WP_027098604.1) were soluble and visualized by SDS-PAGE. The four soluble recombinant proteins were then screened for pyridine nucleotide-dependent bile acid 12β-HSDH activity by TLC and spectrophotometric assay. Screening reactions were prepared with 12-oxoLCA and NADPH, or epiDCA and NADP + in pH 7.0 phosphate buffer.
Only WP_027099077.1 exhibited 12β-HSDH activity by TLC and spectrophotometric assay, which was also confirmed by LC-MS (Figure 3b). Reaction products of WP_027099077.1 with 12-oxoLCA and NADPH, epiDCA and NADP + , DCA and NADP + , and no substrate control were subjected to LC-MS. In the presence of purified recombinant WP_027099077.1 and NADPH, 12-oxoLCA was reduced quantitatively (2 hydrogen addition) to a product that eluted at 13.12 min with 391.28 m/z in negative ion mode. This is consistent with the 392.57 amu formula weight and elution time for epiDCA based on the substrate standard. Additionally, epiDCA was oxidized to a product with an elution time of 13.40 min at 389.27 m/z, agreeing with the retention time and formula weight of 390.56 amu for authentic 12-oxoLCA. DCA (392.57 amu) was not converted by WP_027099077.1 as the sole peak observed matched DCA standard at 14.60 min and 391.29 m/z. The interconversion of 12-oxoLCA and epiDCA, but no activity with DCA, indicates stereospecificity for the 12β-hydroxy position. Thus, DR024_RS09610 has been identified as the first gene reported that encodes bile acid 12β-HSDH (WP_027099077.1).
Recombinant C. paraputrificum WP_027099077.1, hereafter referred to as Cp12β-HSDH, had a theoretical subunit molecular mass of 27.4 kDa. The observed subunit molecular mass was 26.4 ± 0.5 kDa by SDS-PAGE, calculated from three independent protein gels. WP_027099077.1 is predicted to be a cytosolic protein that is not membrane-associated by TMHMM v. 2.0. 22

Biochemical characterization of recombinant Cp12β-HSDH
The approximate native molecular mass of Cp12β-HSDH was determined by size-exclusion chromatography. Cp12β-HSDH exhibited an elution volume of 15.04 ± 0.02 mL, corresponding to a 54.67 ± 0.79 kDa molecular mass relative to protein standards (Figure 4a). The size-exclusion data along with the theoretical subunit molecular mass of 27.4 kDa suggests Cp12β-HSDH assembles a homodimeric quaternary structure in solution.
In order to optimize the enzymatic activity of Cp12β-HSDH, the conversion of pyridine nucleotides at 340 nm was measured in buffers between pH 6.0 and 8.0 by spectrophotometric assay ( Figure  4b). The optimum pH for Cp12β-HSDH in the oxidative direction with epiDCA as the substrate and NADP + as co-substrate was pH 7.5. In the reductive direction where 12-oxoLCA was the substrate and NADPH the co-substrate, the optimum pH was 7.0. Michaelis-Menten kinetics were performed at the pH optimum for each direction. In the reductive direction, Cp12β-HSDH displayed a K m value for 12-oxoLCA at 18.76 ± 0.40 µM which was  similar to that of NADPH (Table 1; Figure S1). The K m value in the oxidative direction with epiDCA as substrate was about twice the K m determined for 12-oxoLCA. The K m for NADP + was 36.84 ± 0.55 µM. The V max and k cat were greater in the oxidative than the reductive direction. However, the catalytic efficiency (k cat /K m ) of 12-oxoLCA as substrate was greater than the oxidative direction with epiDCA as substrate.

Phylogenetic analysis of Cp12β-HSDH
The Cp12β-HSDH sequence from C. paraputrificum ATCC 25780 (WP_027099077.1) was used in a BLASTP search against the NCBI non-redundant protein database in order to determine its prevalence across bacteria. A maximum likelihood phylogeny of 5,000 sequences was constructed, revealing that many sequences most similar to Cp12β-HSDH are found in Firmicutes and Actinobacteria ( Figure S2). Within the 5,000-member phylogeny, a subtree (highlighted gray) of the most closely related proteins to Cp12β-HSDH was selected for closer inspection ( Figure 5). Cp12β-HSDH clustered most closely with other C. paraputrificum sequences Values represent the mean ± SD of three or more replicates.   Values represent the mean ± SD of three or more replicates.
The subtree also contains many sequences from Actinobacteria, the genera Collinsella and Olsenella among them. Collinsella species are of interest because C. aerofaciens expresses BSH and various HSDHs recognizing sterols, 25 including bile acid 12α-HSDH. 26 To determine if a member of the Actinobacteria encodes a bile acid 12β-HSDH, a sequence more distantly related to Cp12β-HSDH, Olsenella sp. GAM18 WP_120179297.1, was chosen for gene synthesis and protein overexpression because it had not been shown previously to metabolize bile acids ( Figure 5). 12-oxoLCA, epiDCA and DCA were tested as substrates and conversion was measured by spectrophotometric assay. Recombinant WP_120179297.1 displayed activity with 12-oxoLCA at 128% relative to Cp12β-HSDH, 69% relative activity with epiDCA, and showed no reaction with DCA ( Table 2). These data confirm that the more distantly related WP_120179297.1 has bile acid 12β-HSDH activity.
Within the extended subtree are various Novosphingobium species. These Alphaproteobacteria deserve mention due to their ability to biodegrade aromatic compounds, such as phenanthrene 27 and estrogen. 28 To test if this cluster has bile acid 12β-HSDH activity, WP_007678535.1 from Novosphingobium sp. AP12 was synthesized, cloned, overexpressed, and purified ( Figure 5). The potential 12β-HSDH activity of WP_007678535.1 was screened using 12-oxoLCA, epiDCA, and DCA as substrates. WP_007678535.1 exhibited no activity with these bile acid substrates (Table 2). Because Novosphingobium strains are frequently plantassociated or isolated from aquatic environments, 29 this enzyme may be specific for other substrates.
The genomic context of 12β-HSDH genes from C. paraputrificum ATCC 25780, Eisenbergiella sp. OF01-20, and Olsenella sp. GAM18 was explored ( Figure S3). The three 12β-HSDH genes did not appear to be organized within an operon nor was the genomic context conserved across these organisms.
Two organisms present in the 12β-HSDH subtree, Collinsella tanakaei and Collinsella stercoris ( Figure 5, asterisks), were also found in a previous phylogenetic analysis of putative 12α-HSDHs. 10 Due to strain variation within species, we inspected the sequences further on the NCBI database and determined that the pairs of HSDHs are encoded by the same strain within each species. Collinsella tanakaei YIT 12063 12α-HSDH (WP_009141301.1) and 12β-HSDH (WP_009140706.1) are encoded by the genes HMPREF9452_RS06335 and HMPREF9452_RS03390, respectively. Collinsella stercoris DSM 13279 also contains both putative 12α-HSDH (WP_040360544.1; COLSTE_RS02900) and 12β-HSDH (WP_006720039.1; COLSTE_RS01465). 10 While the paired activity of 12α/12β-HSDH has not been tested in culture, these organisms may be novel epi-bile acid epimerizing strains that convert bile acid 12α-hydroxyl groups to the epi-configuration. To our knowledge, these are the first strains identified with C-12 epimerizing ability.

Hidden Markov Model search of putative 12β-HSDH genes in human gut metagenomes
To understand the distribution of potential 12β-HSDH genes in the human colonic microbiome, a Hidden Markov Model (HMM) search was performed against metagenome assembled genomes (MAGs) from four publicly available cohorts [30][31][32][33] using reference sequences from the 12β-HSDHs characterized in this paper ( Figure 5). Putative 12β-HSDH genes inferred by HMM search were found in ~30% of the subjects (198/666) (Figure 6a). Twenty-two subjects exhibited two different organisms containing the gene. This gene was found in healthy subjects as well as subjects with the following disease states: colorectal cancer, colorectal adenoma, fatty liver, hypertension, and type 2 diabetes.
Two hundred twenty microbial genomes contained putative 12β-HSDH genes among 16,936 total available genomes. Putative 12β-HSDH genes were most often identified in the phylum Firmicutes, which was dominated by genes in Lachnospira eligens (formerly Eubacterium eligens) (Figure 6b). The gene from L. eligens was widespread across subjects in each of the four cohorts. This large proportion of hits from L. eligens may reflect its higher relative abundance allowing it to be assembled better into genomes. Sequences from this organism also appeared multiple times in the 12β-HSDH subtree ( Figure 5). Lachnospira eligens is a pectin degrader capable of promoting antiinflammatory cytokine IL-10 production in vitro 34 and has been proposed as a probiotic for atherosclerosis. 35 The gene was also present in C. paraputrificum along with other unidentified Clostridium sp. and Eubacterium sp. Actinobacteria had few members with the gene, represented by Collinsella intestinalis, Collinsella tanakaei, and Olsenella sp. Phocaeicola coprocola (formerly Bacteroides coprocola) was the only member of phylum Bacteroidetes with the gene.

Phylogenetic analysis of regio-and stereospecific HSDHs
Next, the phylogenetic relationship between Cp12β-HSDH (WP_027099077.1) and other regio-and stereospecific HSDHs was explored. To accomplish this, we updated the HSDH phylogeny presented by Mythen et al. (2018) by including additional bacterial or archaeal HSDH sequences of known or putative function along with representative eukaryotic sequences ( Figure 7; Table S2). 15 The sequences included span the known HSDH functional capacities, with some recognizing bile acids and others recognizing steroids like cortisol or progesterone. Most members of each HSDH class are clustered together, which is apparent by each highlight color encompassing more than one HSDH of the same known function. Furthermore, most bacterial HSDHs grouped separately from their eukaryotic counterparts. Prokaryotic sequences were interspersed among the eukaryotic with some exceptions in grouping by HSDH function. Cp12β-HSDH, the two other confirmed bile acid 12β-HSDHs (WP_118677302.1, WP_120179297.1), and additional similar sequences from across our bile acid 12β-HSDH subtree formed their own cluster. These sequences shared a branch with bacterial bile acid 12α-HSDHs as well as eukaryotic 3β-HSD/Δ 5 →Δ 4 -isomerases. Bile acid 12α-HSDH sequences included various clostridia (EDS06338.1, EEG75500.1, EEA85268.1, ERJ00208.1) 10, 36 and Eggerthella (CDD59475.1). 15 Collinsella aerofaciens (EBA39192.1), which has been reported to express bile acid 12α-HSDH activity, 25 grouped with the known bile acid 12α-HSDHs along with two human gut archaeal sequences from Methanosphaera stadtmanae and Methanobrevibacter smithii.

Discussion
Microbial bile acid HSDHs have been studied since the early 1970s, with much of the original work focusing on 3α-and 7α-HSDHs. 14,49 Thereafter, 3β-, 7β-and 12α-HSDH activity was observed in cultures of various microbiota, 1 including Eggerthella lenta (formerly Eubacterium lentum) which is capable of oxidizing CA and DCA at C-12 and epimerizing bile acids at C-3. 50 In the mid-1980s, C. paraputrificum, C. tertium, and Clostridioides difficile each in binary cultures with E. lenta were shown to epimerize DCA via a 12-oxo -intermediate to epiDCA. 18 Since then, HSDH genes encoding the iso-and urso-bile acid pathways and 12α-HSDH were identified, but not 12β-HSDH. 5 In this work, we identified the first bile acid 12β-HSDH gene, completing the microbial epi-bile acid pathway.
Edenharder & Pfützner (1988) initially characterized NADP(H)-dependent 12β-HSDH from crude extracts of the fecal isolate C. paraputrificum strain D 762-06, with differing results from our findings. 19 Gel filtration analysis of crude extract from C. paraputrificum strain D 762-06 suggested a molecular mass of 126 kDa, whereas our current work with Cp12β-HSDH from ATCC 25780 is estimated at 54.6 KDa by gel filtration chromatography. The strain used in this study, C. paraputrificum ATCC 25780, was also isolated from feces. 51 It is possible that these are the same NADP(H)-dependent enzymes by sequence from two different strains of C. paraputrificum and that the recombinant protein quaternary structure is unstable, resulting in a dimeric form in our study. Alternatively, these bacterial strains may have distinct versions of 12β-HSDH with different amino acid sequences, as we have shown previously for 12α-HSDH from Eggerthella lenta. 35,52 Indeed, the 12β-HSDH from C. paraputrificum strain D 762-06 was reported to be partially membrane associated, whereas hydropathy prediction by TMHMM v. 2.0 found no evidence of transmembrane domains in Cp12β-HSDH. In addition, pH optima for the conversion of 12-oxoLCA between Cp12β-HSDH (7.0) and the native 12β-HSDH (10.0) from strain D 762-06 differed. Oxidation of epiDCA was optimal at pH 7.5 for Cp12β-HSDH, and reported as pH 7.8 for the crude native enzyme from strain D 762-06. 19 Further work will be needed to determine if distinct bile acid 12β-HSDHs are present in C. paraputrificum strains.
Cp12β-HSDH exhibited a dimeric quaternary structure by size-exclusion chromatography under our experimental conditions. Although future crystallization of Cp12β-HSDH would better illustrate its true polymeric state, HSDHs are often either tetrameric 42,53 or dimeric. 54,55 Cp12β-HSDH was more specific for bile acids lacking a position 7-hydroxyl group: epiDCA and 12-oxoLCA, over epiCA and 12-oxoCDCA. Cp12β-HSDH also had lower activity with 3,12-dioxoLCA versus 12-oxoLCA. This indicates that both the 7-hydroxyl and 3-oxo groups hinder the ability of Cp12β-HSDH to convert the substrate. An x-ray crystal structure of Cp12β-HSDH may shed light on why this apparent steric hindrance occurs.
Phylogenetic analysis of Cp12β-HSDH coupled with synthetic biological "sampling" and validation at different points along the branches revealed shared 12β-HSDH function among Eisenbergiella sp. OF01-20 and Olsenella sp. GAM18, lending functional credibility to sequences throughout the subtree ( Figure 5; Table 2). Eisenbergiella sp. OF01-20 was originally sequenced from a human gut microbiota cultivation project (Integrated Microbial Genomes [IMG] Genome ID: 2840324701). Eisenbergiella spp. are often present at relative abundances of less than 0.1% in human fecal samples. 56,57 Olsenella sp. GAM18 was initially isolated from humans (IMG Genome ID: 2841219092). The relative abundance of Olsenella was shown to be about 2% within the gut microbiome of some individuals. 58 Our subtree includes more abundant gut taxa such as Ruminococcus (relative abundance ~5%) 59,60 and Collinsella (relative abundance ~8%), 59 as well. Due to limitations in 16S rDNA sequencing depth, it is difficult to conclude if the species in our subtree are found at relevant levels in the human gut or if 12β-HSDH genes are present. Therefore, we performed a HMM search to assess the relative prevalence of 12β-HSDH genes. About 30% of subjects had putative 12β-HSDH genes, indicating the relevance of this gene in the human gut microbiome. The HMM search revealed that 220 microbial genomes out of 16,936 total contained putative 12β-HSDH genes. While concrete prevalence is difficult to establish, putative 12β-HSDH genes are less widespread than the ubiquitous bile-acid metabolizing gene, bile salt hydrolase, 4 which was present in 2,456/16,936 total genomes in these cohorts. These data expand the limited metagenomic work that has focused on bile acid HSDH genes in the human gut. 61 Two organisms from our 12β-HSDH subtree were also identified in a previous 12α-HSDH phylogeny from Doden et al. (2018). 10  12α-HSDH (WP_040360544.1; COLSTE_RS02900). 10 Although the dual 12α/12β-HSDH activity is untested in culture, we predict these strains are novel C-12 epimerizers. Epimerizing strains have been identified for the C-3 19,41 and C-7 hydroxyl 1,62 positions, however, this is the first indication of bacteria capable of C-12 epimerization.
Indeed, this function joins the vast repertoire of HSDHs already studied in many Firmicutes and Actinobacteria. 1 Bile acid 12α-HSDH activity has been detected in Eggerthella species 35,52 in the phylum Actinobacteria and various clostridia [10][11][12]36 in the phylum Firmicutes. Similarly, 3α-and 3β-HSDH are widespread among Firmicutes, 1,65 and 3α-HSDH has also been reported in Eggerthella species. 13,35,41 7α-and 7β-HSDH were shown in numerous Firmicutes 14,37,65 and the Actinobacteria Collinsella aerofaciens. 24 Along with these bile acid-specific HSDHs, the glucocorticoid 20α-and 20β-HSDHs are evident in both Firmicutes 43,45 and Actinobacteria such as Bifidobacterium adolescentis. 42, Until this study, there were no reports of genes encoding 12β-HSDH and the activity had only been shown in C. paraputrificum, C. tertium and C. difficile. 18 Thus, our phylogenetic analysis revealed hitherto unknown diversity for bile acid 12β-HSDHs within the Firmicutes and Actinobacteria. Bacteroidetes sequences were notably absent within our 12β-HSDH subtree and only one sequence was identified in our HMM search, although Bacteroidetes have been shown to encode multiple other HSDHs. 1,49 Interestingly, C. tertium and C. difficile enzymes were also not present in our phylogenetic analysis even though this activity has been reported for strains of these clostridia, 18 indicating that genes encoding other forms of bile acid 12β-HSDH are present in the gut microbiome.
The distribution pattern of microbial HSDHs is becoming increasingly clear ( Figures 5 & 7), although in many cases the evolutionary pressures on gut microbes for encoding particular regio-and stereospecific HSDH enzymes is not clear. As observed with BSH enzymes, the functional importance of HSDHs may be strain-dependent. In some strains, the mere ability to acquire or dispose of reducing equivalents may be important, and the class of enzyme unimportant. Bile acid hydroxylation patterns affect the binding and activation/inhibition of host nuclear receptors. 66 HSDH enzymes may thus act in interkingdom-signaling, a hypothesis that has recent support based on the effect of oxidized and epimerized bile acids on the function of regulatory T cells. 67,68 The concerted action of pairs of HSDHs result in bile acid products with reduced toxicity for microbes expressing the HSDH(s) or for an important inter-species partner, which was likely a factor in the evolution of these enzymes. Examples of strains of species capable of epimerizing bile acid hydroxyl groups are found in the literature, and the physicochemical properties and reduced toxicity of β-hydroxy bile acids are known, providing hypotheses for physiological function. Clostridium limosum (now Hathewaya limosa) expresses both bile acid-inducible NADP-dependent 7α-and 7β-HSDH capable of converting CDCA to UDCA. 69 CDCA is more hydrophilic and more toxic to bacteria than UDCA. 6,70 Indeed, treatment with UDCA increases the hydrophilicity of the biliary pool, reducing cellular toxicity and improving biliary disorders. 71 Similarly, strains of Eggerthella lenta 15,41 and Ruminococcus gnavus 41 express both NADPH-dependent 3α-and 3β-HSDHs capable of forming 3β-bile acids (iso-bile acids). Iso-bile acids are also more hydrophilic and less toxic to bacteria than the α-hydroxy isomers. 41 At least some strains of R. gnavus also express NADPH-dependent 7β-HSDH, contributing to the epimerization of CDCA to UDCA. 39 It may be speculated that R. gnavus HSDHs function in detoxification of hydrophobic bile acids such as CDCA and DCA; however, further work is needed. Analogous to E. lenta and R. gnavus, C. paraputrificum is another example of a strain encoding multiple HSDHs that favor formation of β-hydroxy bile acids. 19 C. paraputrificum strains encode the iso-bile acid pathway as well as NADPH-dependent 12β-HSDH. 18,19 While little is known about the biological effects of 12β-bile acids (epi-bile acids), the physicochemical properties relative to 12α-hydroxy bile acids should approximate that of iso-and urso-derivatives. 6,41,70 An important question emerging from these observations is whether one particular epimeric product rather than another has important consequences on the fitness of the bacterium generating them, or if the increased hydrophilicity and reduced toxicity are the key factors.
Since the initial detection of epi-bile acids by Eneroth et. al. and Ali et. al., [15][16][17] the measurement of bile acid metabolomes in clinical samples has become commonplace, 72 yet few studies measure or report epi-bile acids. Recently, 12β-hydroxy and 12-oxo-bile acids have been quantified in human feces by Franco et. al. (2019). 12-oxoLCA was the most abundant oxo-bile acid in feces at concentrations of about one half that of DCA in stool. While epiDCA itself was not measured, 3-oxo-12βhydroxy-CDCA was shown at 12 ± 4 µg/g wet feces. 73 Additionally, epiDCA has been reported in biliary bile of angelfish, likely produced from bacterial origin, so the 12β-HSDH gene may be widespread among resident microbiota of diverse vertebrate taxa. 74 A critical limitation to the study of epi-bile acids is the absence of commercially available standards, although there are methods available for chemical synthesis. 75,76 The newly identified bile acid 12β-HSDHs could be employed for the enzymatic production of epi-bile acid standards from oxo-intermediates.
The physiological effects of epi-bile acids are poorly characterized, particularly in the GI tract. Borgstrӧm and colleagues compared infusion of CA, ursoCA, and epiCA on bile flow, lipid secretion, bile acid synthesis, and bile micellar formation. In contrast to ursoCA and CA, epiCA was secreted into bile in an unconjugated form. The 12β-hydroxyl group may hinder the enzyme responsible for conjugation. Additionally, epiCA infusion increased the rate of secretion of newly synthesized bile salts. 77 Another study reported increased 12-oxoLCA levels in rats with high tumor incidence when they were fed a high safflower oil or corn oil diet. 78 While the toxicity of epi-bile acids has not yet been tested relative to the secondary bile acids DCA or LCA, both 12-oxoLCA and epiDCA are less hydrophobic than DCA by LC-MS (Figures 2 & 3). Due to the involvement of DCA in cancers of the liver and colon, 7,8 bile acid 12β-HSDH may be of therapeutic importance in modulating the bile acid pool in favor of epiDCA over toxic DCA. Future studies with animal models will be imperative to determine the effects of epibile acids on host physiology.

Whole cell bile acid conversion assay
C. paraputrificum ATCC 25780 and C. scindens ATCC 35704 were cultivated in anaerobic brain heart infusion (BHI) broth for 24 hrs. Two mL anaerobic BHI was inoculated with 1:10 dilution of either organism along with 50 μM bile acid substrate and incubated at 37°C for 24 hours. The bacterial cultures were centrifuged at 10,000 × g for 5 min to remove bacterial cells and the conditioned medium was adjusted to pH 3.0. Solid phase extraction was used to extract bile acid products from bacterial culture. Waters tC18 vacuum cartridges (3 cc) (Milford, MA, USA) were preconditioned with 6 mL 100% hexanes, 3 mL 100% acetone, 6 mL 100% methanol, and 6 mL water (pH 3.0). The conditioned medium was added to the cartridges and vacuum was applied to pull media through dropwise. Cartridges were washed with 6 mL water (pH 3.0) and 40% methanol. Bile acid products were eluted with 3 mL 100% methanol. Eluates were then evaporated under nitrogen gas and the residues dissolved in 200 μL 100% methanol for LC-MS analysis.

Liquid chromatography-mass spectrometry
LC-MS analysis for all samples was performed using a Waters Acquity UPLC system coupled to a Waters SYNAPT G2-Si ESI mass spectrometer (Milford, MA, USA). LC was performed with a Waters Acquity UPLC HSS T3 C18 column (1.8 μm particle size, 2.1 mm x 100 mm) at a column temperature of 40°C. Samples were injected at 1 μL. Mobile phase A was water and B was acetonitrile. The mobile phase gradient was as follows: 0 min 100% mobile phase A, 0.5 min 100% A, 25 min 0% A, 25.1 min 100% A, 28 min 100% A. The flow rate was 0.5 mL/min. MS was carried out in negative ion mode with a desolvation temperature of 300°C and desolvation gas flow of 700 L/hr. The capillary voltage was 3,000 V. Source temperature was 100° C and cone voltage was 30 V. Chromatographs and mass spectrometry data were analyzed using Waters MassLynx software (Milford, MA, USA).

Isolation of genomic DNA
Genomic DNA was extracted from C. paraputrificum ATCC 25780 using the Fast DNA isolation kit from Mo-Bio (Carlsbad, CA, USA) according to the manufacturer's protocol for polymerase chain reaction and molecular cloning applications.

Heterologous expression of potential 12β-HSDH proteins
The pET-28a(+) and pET-46 Ek/LIC vectors were obtained from Novagen (San Diego, CA, USA). Restriction enzymes were purchased from NEB (Ipswich, MA, USA). Inserts were generated by PCR amplification with cloning primers from Integrative DNA Technologies (Coralville, IA, USA) of C. paraputrificum ATCC 25780 genomic DNA or genes synthesized in E. coli K12 codon usage (IDT, Coralville, IA, USA). Cloning primers and genes created by gene synthesis are listed in Table S1. Inserts were amplified using the Phusion High Fidelity Polymerase (Stratagene, La Jolla, CA, USA) and cloned into pET-28a(+) after insert and vector were double digested with the appropriate restriction endonuclease and treated with DNA Ligase, or annealed into pET-46 Ek/LIC after treatment with T4 DNA Polymerase. Recombinant plasmids were transformed via heat shock method, plated, and grown overnight at 37°C on lysogeny broth (LB) agar plates supplemented with antibiotic (50 µg/mL kanamycin or 100 µg/mL ampicillin). Vectors were either transformed into chemically competent E. coli DH5α cells and grown with kanamycin (pET-28a(+)) or transformed into NovaBlue GigaSingles™ Competent cells and grown with ampicillin (pET-46 Ek/LIC). A single colony from each transformation was inoculated into LB medium (5 mL) containing the corresponding antibiotic and grown to saturation. Recombinant plasmids were extracted from cell pellets using the QIAprep For protein expression, the extracted recombinant plasmids were transformed into E. coli BL-21 CodonPlus (DE3) RIPL chemically competent cells by heat shock method and cultured overnight at 37° C on LB agar plates supplemented with ampicillin or kanamycin (100 µg/ml; 50 µg/mL) and chloramphenicol (50 µg/ml). Selected colonies were inoculated into 10 mL of LB medium supplemented with antibiotics and grown at 37°C for 6 hours with vigorous aeration. The pre-cultures were added to fresh LB medium (1 L), supplemented with antibiotics, and aerated at 37°C until reaching an OD 600nm of 0.3. IPTG was added to each culture at a final concentration of 0.1 mM to induce and the temperature was decreased to 16°C for a 16-hour incubation. Cells were pelleted and resuspended in binding buffer (20 mM Tris-HCl, 300 mM NaCl, 10 mM 2-mercaptoethanol, pH 7.9). The cells were subjected to five passages through an EmulsiFlex C-3 cell homogenizer (Avestin, Ottawa, Canada), and the cell debris was separated by centrifugation.
The recombinant protein in the soluble fraction was then purified using TALON® Metal Affinity Resin (Clontech Laboratories, Mountain View, CA, USA) per the manufacturer's protocol. The recombinant protein was eluted using an elution buffer composed of 20 mM Tris-HCl, 300 mM NaCl, 10 mM 2-mercaptoethanol, and 250 mM imidazole at pH 7.9. The resulting purified protein was analyzed using sodium dodecyl sulfatepolyacrylamide gel electrophoresis (SDS-PAGE). The observed subunit mass for each was calculated by migration distance of purified protein to standard proteins in ImageJ (https://imagej.nih.gov/ij/ docs/faqs.html). TMHMM v. 2.0 was used to predict transmembrane helices. 21

Enzyme Assays
Pure recombinant 12β-HSDH reaction mixtures were made using 50 μM substrate, 150 μM cofactor and 10 nM enzyme in 150 mM NaCl, 50 mM sodium phosphate buffer at the pH optima 7.0 or 7.5.
Reactions were monitored by spectrophotometric assay measuring the oxidation or reduction of NADP(H) aerobically at 340 nm (6,220 M −1 .cm −1 ) continuously for 1.5 min on a NanoDrop 2000c UV-Vis spectrophotometer (Fisher Scientific, Pittsburgh, PA, USA) using a 10 mm quartz cuvette (Starna Cells, Atascadera, CA, USA). Additional reactions were incubated overnight at room temperature and extracted by vortexing with two volumes ethyl acetate twice. The organic layer was recovered and evaporated under nitrogen gas. The products were dissolved in 50 μL methanol and LC-MS was performed as described above or used for thin layer chromatography.
The buffers for investigation of the optimal pH of recombinant 12β-HSDH contained 150 mM NaCl and one of the following buffering agents: 50 mM sodium acetate (pH 6.0), 50 mM sodium phosphate (pH 6.5 to 7.5), and 50 mM Tris-Cl (pH 8.0). Substrate specificity was performed according to the above reaction conditions at the optimal pH.
The reaction mixtures for kinetic analysis were 10 nM enzyme, sodium phosphate buffer (pH 7.0), and 150 µM NADPH for varying concentrations of 12-oxoLCA or 80 µM 12-oxoLCA for varying NADPH concentrations in the reductive direction. The oxidative reaction mixture contained 10 nM enzyme, sodium phosphate buffer (pH 7.5), and 300 µM NADP + when epiDCA concentrations were changed or 100 µM epiDCA when NADP + was varied. Kinetic parameters were estimated with GraphPad Prism (GraphPad Prism, La Jolla, CA, USA) to fit the data using nonlinear regression to the Michaelis-Menten equation.

Thin layer chromatography
Reaction mixtures were made using 50 μM substrate, 150 μM cofactor and 10 nM enzyme in 150 mM NaCl, 50 mM sodium phosphate buffer at pH 7.0. Reactions were incubated overnight at room temperature and extracted by vortexing with two volumes ethyl acetate twice. The organic layer was recovered and evaporated under nitrogen gas. The products were dissolved in 50 μL methanol and spotted on a TLC plate (silica gel IB2-F flexible TLC sheet, 20 × 20 cm, 250 μm analytical layer; J. T. Baker, Avantor Performance Materials, LLC, PA, USA). The steroids were separated with a 70:20:2 toluene-1,4-dioxane-acetic acid mobile phase and visualized using a 10% phosphomolybdic acid in ethanol spray and heating for 15 min at 100°C. 79

Native molecular weight determination
Size-exclusion chromatography was performed using a Superose 6 10/300 GL analytical column (GE Healthcare, Piscataway, NJ, USA) connected to an ÄKTAxpress chromatography system (GE Healthcare, Piscataway, NJ, USA) at 4°C. The column was equilibrated with 50 mM Tris-Cl and 150 mM NaCl at a pH of 7.5. The purified protein was loaded onto the analytical column at a concentration of 10 mg/mL and eluted at a flow rate of 0.3 ml/min. The native molecular mass of 12β-HSDH was determined by comparing its elution volume to that of Gel Filtration Standard proteins (Bio-Rad, Hercules, CA, USA): thyroglobulin, γ-globulin, ovalbumin, myoglobin, vitamin B 12 .

Phylogenetic Analysis
The sequence of the C. paraputrificum 12β-HSDH protein (accession number WP_027099077.1) was used as query for a similarity search against the NCBI non-redundant protein database by BLASTP, 80 with a maximum E-value threshold of 1e-10 and a limit of 5,000 results. Retrieved sequences were aligned with Muscle v. 3.8.1551 81 and analyzed by maximum likelihood with RAxML v. 8.2.11. 82 Selection of the best-fitting amino acid substitution model and number of bootstrap pseudoreplicates were performed automatically, and substitution rate heterogeneity was modeled with gamma distributed rate categories. The resulting phylogenetic tree was formatted by Dendroscope v. 3.5.10 83 and further cosmetic modifications were performed with the vector editor Inkscape, v. 0.92.4 (https://inkscape.org).
For closer analysis of the phylogenetic affiliation of C. paraputrificum ATCC 25780 12β-HSDH, sequences from the well-supported subtree where this sequence is located in the 5,000-sequence tree, plus an outgroup, were reanalyzed for confirming the relative placement of all sequences nearest to Cp12β-HSDH. The methods used were the same as described above for the full tree.
A maximum-likelihood tree of representative HSDH sequences was inferred by selecting sequences from each HSDH subfamily, based on the tree from Mythen et al. (2018), 35 with the addition of eukaryotic, archaeal, and other bacterial sequences deposited in the public databases. Phylogenetic inference methods were the same as described above.

Hidden Markov Model Search
A Hidden Markov Model (HMM) search was performed using a custom HMM profile against a concatenated file of metagenome assembled genomes (MAGs) from four publicly available cohorts. [29][30][31][32] MAGs were filtered for genome completeness, quality, and contamination as described. 84 For generation of the custom 12β-HSDH profile, reference sequences from the 12β-HSDHs characterized in this paper were aligned with MAFFT, manually trimmed, and constructed using hmmscan. 85 The MAG database was searched using HMMSearch version 3.3.0 85 , using an individually identified cutoff of 350.00. Resulting hits were then filtered to remove results less than 70% completeness and closest matched species were recorded. The HMM search file is publicly available at: https:// github.com/AnantharamanLab/doden_et_al_2021.