Functional profiling of bacterial communities in Lake Tuz using 16S rRNA gene sequences

Abstract
The 16S rRNA amplicon sequencing technique is central to studies of microbial communities but does not provide direct evidence of a community’s functional capabilities. This work aimed to assess the structure of the uncultured bacterial communities from two locations in Lake Tuz in Turkey to provide information on their roles in the lake ecosystem. The most abundant phyla in the lake water were Firmicutes (84% for lake sample site 1 [TG1], 70% for lake sample site 2 [TG2]), Fusobacteria (9% for TG1, 22% for TG2) and Proteobacteria (6% for TG1, 7% for TG2). The most abundant genera were Romboutsia (45% for TG1, 35% for TG2), Clostridium sensu stricto 1 (8% for TG1, 8% for TG2), Cetobacterium (9% for TG1, 22% for TG2) and Photobacterium (2% for TG1, 3% for TG2). PICRUSt is a bioinformatics tool that predicts profiles of bacterial protein functions from 16S rRNA marker-gene data for a community of unculturable bacteria. PICRUSt also provides information on whole-community metabolic functions related to adaptation, bioremediation potential and the ability of various groups of microorganisms to survive in highly saline water. The overall results provide an effective strategy for assessing the metabolic capacities of microbes in situ in a high-salt aquatic environment such as Lake Tuz and the potential of these microbes to serve as bioremediation agents. This approach provides useful insights into predictive metagenomics of an unculturable microbial community for which only marker-gene surveys are currently available.


Introduction
Natural hypersaline environments such as inland seas, lakes, salt marshes and marine salterns as well as manmade evaporation ponds are found on all continents. Thalassohaline brine lakes are formed by evaporation of seawater and have a near neutral pH and ionic composition like that of seawater, with NaCl as the main component. In contrast, athalassohaline brine lakes form as a consequence of local geology, and Mg²⁺ and Ca²⁺ are the predominant divalent cations. The defining characteristic of hypersaline habitats is their large diversity of salt-loving microbes. The concentrations of salts in hypersaline habitats vary from above 15% to saturation, and pH ranges from 6 to 11 [1]. Lake Tuz (in Turkish, Tuz Gölü) is a thalassohaline brine lake (salinity, 32.4%) located in the arid central plateau of Turkey, 105 km northeast of Konya and 150 km south/southeast of Ankara (Figure 1). It is the second largest lake in Turkey. To date, little effort has been made to characterize the community of salt-tolerant microbes of Lake Tuz toward the goal of predicting the contribution of individual bacterial species to the ecosystem based on 16S rRNA gene sequences. Several studies have used both in vitro and classical microbe-identification approaches as well as molecular methods to determine the microbial diversity of aquatic environments [2][3][4][5][6]. Our current study complements the work of Mutlu et al. [4], who found that Lake Tuz is dominated by Archaea and Bacteria. However, our findings focus only on bacteria that predominate in Lake Tuz with respect to their molecular features that allow salt tolerance and their potential use for bioremediation of xenobiotics. Although a large breadth of bacterial diversity can be determined by analysing 16S rRNA gene sequences, this does not provide compelling information concerning the contributions made by the various bacterial species to the overall aquatic ecosystem.
Regardless of this shortcoming, 16S rRNA gene amplicon sequencing is more efficient than conventional approaches at characterizing the various aspects of microbial community structure and function. However, the choice of primer pairs targeting a hypervariable region of the 16S rRNA gene may bias the estimates of bacterial abundance toward certain phyla [7]. This can be mitigated by using primer pairs that target the V3-V4 hypervariable regions of the 16S rRNA gene. Such primer pairs have been reported to capture the true range of bacterial phyla and thus are effective for Illumina MiSeq analyses of microbial communities that inhabit complex aquatic environments [8,9]. The advent of innovative molecular methods such as metagenomics using high-throughput Illumina technology and pyrosequencing has enabled the routine, comprehensive characterization of microbial communities using culture-independent methods. These techniques have reduced the cost and time required for sequencing and generate numerous sequences for each sample [10]. Moreover, these new methods can identify the predominant bacteria in a sample and permit the discovery of minor taxa in the bacterial community; importantly, minor taxa may play substantive roles in any given bacterial community or, if not, may be relics of an ancient ecosystem that underwent a shift in environmental conditions [11]. Analysis of the 16S rRNA gene amplicon remains the standard method for culture-independent studies of microbial diversity, and this approach has made it clear that there exist enormous numbers of unculturable (and therefore uncharacterized) microbes whose metabolic capacities, life cycles and relative ecosystem impact remain largely undefined.
Recently, several 16S rRNA gene studies have extended our ability to infer the functional contribution of individual bacterial community members by mapping a subset of abundant 16S rRNA sequences to their nearest sequenced reference genome [12][13][14]. Currently, predicting microbial functions from 16S rRNA gene sequencing data is a common alternative to shotgun metagenomic approaches. The computational tool PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) predicts functional metagenomic content based on the frequency of detected 16S rRNA gene sequences corresponding to genomes in regularly updated, functionally annotated genome databases [15]. Therefore, the clustering of 16S rRNA sequences into operational taxonomic units (OTUs) or amplicon sequence variants enables direct comparison with 16S rRNA gene sequences from reference genomes in a database [12,14]. PICRUSt uses reference phylogenetic trees of 16S rRNA gene amplicons to infer metabolic information in genomes included in databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and thereby allow prediction of metagenomic function.
Herein, we report the physicochemical properties of the lake water and the species composition of the bacterial community at two sites in Lake Tuz and predict the contributions of bacteria to the lake ecosystem using a 16S rRNA amplicon sequencing approach. The bacterial communities from Lake Tuz were analysed directly from each sample without cultivation. The most abundant phyla were Firmicutes, Fusobacteria and Proteobacteria, and we predicted functional genes related to evolutionary adaptability and the potential for bioremediation. Our results will support future research on bioremediation carried out by potential resident microbes that produce previously unknown enzymes, for instance, new types of dehalogenases [16][17][18][19]. We note that detailed information on microbial communities is necessary for judging the biotechnological potential of individual bacterial species.

Sample collection and analysis of physicochemical properties of bacterial species
Water samples were collected from two locations, termed TG1 and TG2, of the hypersaline Lake Tuz, at latitude 38°44′26″N and longitude 33°23′04″E and latitude 38°47′48″N and longitude 33°23′26″E, respectively, during the peak of the dry season in August 2018. Each sample was taken aseptically and transferred to a sterile plastic container, brought to the laboratory within 4 h and immediately stored at 4 °C for further analysis. The 16S rRNA amplicon analysis of all water samples was performed by First Base Laboratories-Apical Scientific (Malaysia). The physicochemical properties of the water samples were measured onsite using calibrated YSI equipment (YSI Inc., Yellow Springs, OH, USA). Aqueous Na⁺, K⁺, Ca²⁺, SO₄²⁻ and Cl⁻ were analysed according to the standard methods described by [20].

DNA extraction
Water samples were filtered through a 0.45-µm-pore-size sterile mixed cellulose ester membrane (Fisher Scientific, Canada). The filters were cut into fragments (1 cm²) using sterilized scissors and were aseptically transferred into PowerBead tubes (MoBio, Carlsbad, CA, USA) for DNA extraction using the PowerSoil isolation kit (MoBio). The concentrated and purified DNA samples were assessed using a NanoDrop 8000 spectrophotometer (Thermo Scientific). The purified DNA was subjected to agarose gel electrophoresis and visualized with ethidium bromide. DNA was stored at −20 °C until further analysis.

PCR amplification, library preparation and Illumina MiSeq sequencing
Bacterial composition was investigated by targeting the V3-V4 hypervariable regions of the 16S rRNA gene. Amplicons were generated using the bacteria-specific oligonucleotide primers 27F (5′-AGAGTTTGATCCTGGCTCAG-3′) and 1492R (3′-GGTTACCTTGTTACGACTT-5′) [21], and amplification was carried out in a 50-µL reaction volume that contained purified genomic DNA (gDNA), 0.3 pmol of each primer, 1 µL DNA Taq polymerase (1 U/µL) (Sigma-Aldrich) and PCR buffer containing dNTPs and pure water (PCR/Amplification kit, Sigma-Aldrich). PCR was performed according to the following conditions: 1 cycle of 94 °C for 2 min (initial denaturation), followed by 20 cycles of 98 °C for 10 s (denaturation), 51 °C for 30 s (annealing) and 68 °C for 1 min (extension). Then, the amplicons were extracted from a 1.7% Tris-acetate-EDTA agarose gel, purified using a DNA Gel Extraction kit (Axygen Bioscience, Union City, USA) and quantified using QuantiFluor ST (Promega, Madison, USA). The eluted products were then used for library preparation using a 16S rRNA metagenomic sequencing library preparation protocol [8]. Finally, purified library products were pooled in equimolar amounts and sequenced (paired-end, 2 × 300 bp) on an Illumina MiSeq platform by Apical Scientific (Malaysia). The quality and quantity of PCR products and libraries were assessed using a TapeStation 4200 system, PicoGreen DNA quantification reagent and spectrophotometry. Neighboring taxa were identified using BLASTn and analysed by pairwise sequence alignment to calculate nucleotide sequence identity using the EzBioCloud server (www.ezbiocloud.net) [22]. Sequences with ≥97% identity were considered to belong to the same OTU.
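The 97% identity criterion can be illustrated with a minimal Python sketch. It assumes that the two sequences have already been aligned and that a gap opposite a base counts as a mismatch; the EzBioCloud server may treat gaps differently, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences carry a gap ('-') are skipped;
    a gap opposite a base is counted as a mismatch (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def same_otu(aligned_a: str, aligned_b: str, threshold: float = 97.0) -> bool:
    """True when the identity meets the OTU threshold (97% in this study)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Two 100-nt sequences that differ at three positions would fall in the same OTU under this rule, whereas four differences would separate them.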

Bioinformatics analyses
The raw reads were processed using the sequence analysis program Trimmomatic (http://www.usadellab.org/cms/?page=trimmomatic) [23]. All raw reads were quality filtered and checked using a minimum quality score of 25 over at least 75% of the sequence read, and low-quality sequences containing >10 consecutive low-quality base pairs, ambiguous bases, errors in barcode sequences or >2 nt mismatches from the primer sequences were discarded. Paired-end reads were merged using USEARCH software (version 11.0.667, http://www.drive5.com/usearch/). All sequences of <150 bp or >600 bp (sequenced on the MiSeq platform) were disregarded. Reads were then aligned with the SILVA 16S rRNA database (Release 132) and inspected for chimeric errors using VSEARCH v2.6.2. Chimeric sequences were identified and removed using UCHIME [24]. Sequences were clustered into OTUs (i.e. ≥97% sequence identity) using UPARSE (version 11.0.667, http://www.drive5.com/uparse/) [25], and the RDP Naïve Bayesian Classifier with the SILVA database (http://www.arb-silva.de) [26] was applied to perform sequence-level taxonomic classification and identify OTUs. Spurious OTUs represented by only one read (singletons) or two reads (doubletons) were detected upon further analysis using QIIME 1.9.1 (Quantitative Insights Into Microbial Ecology). To further explore patterns in alpha diversity, species richness (Chao estimator), species coverage (coverage) and species diversity (Shannon diversity index and Simpson index) were calculated and rarefaction analyses were performed using mothur software at the 97% identity level [27].
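As an illustration of the alpha-diversity indices named above, the following Python sketch computes them from a vector of per-OTU read counts. It is not the mothur implementation; it assumes the bias-corrected Chao1 estimator, the natural-log Shannon index, the Simpson index in its finite-sample form and Good's coverage, and the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def alpha_diversity(counts: Sequence[int]) -> Dict[str, float]:
    """Alpha-diversity indices from per-OTU read counts for one sample."""
    counts = [c for c in counts if c > 0]
    n_total = sum(counts)
    if n_total < 2:
        raise ValueError("need at least two reads")
    s_obs = len(counts)                       # observed OTU richness
    f1 = sum(1 for c in counts if c == 1)     # singletons
    f2 = sum(1 for c in counts if c == 2)     # doubletons
    # Bias-corrected Chao1: S_obs + F1(F1 - 1) / (2(F2 + 1))
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    # Shannon index: H = -sum(p_i * ln p_i)
    shannon = -sum((c / n_total) * math.log(c / n_total) for c in counts)
    # Simpson index: D = sum(n_i(n_i - 1)) / (N(N - 1))
    simpson = sum(c * (c - 1) for c in counts) / (n_total * (n_total - 1))
    # Good's coverage: C = 1 - F1 / N
    coverage = 1 - f1 / n_total
    return {
        "sobs": float(s_obs),
        "chao1": chao1,
        "shannon": shannon,
        "simpson": simpson,
        "coverage": coverage,
    }
```

A sample dominated by one or two OTUs, as reported here for Romboutsia, would give a low Shannon value and a high Simpson value under these definitions.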

Predictive metagenome analysis
The PICRUSt tool was used to predict the metagenome based on 16S rRNA amplicon datasets [15]. The metagenomic functional potential of the overall Lake Tuz microbial community was investigated using 16S rRNA abundance data via PICRUSt v.2.1.3-b (https://github.com/picrust/picrust2/wiki) with default parameters [28]. The 16S rRNA-based metagenome was functionally annotated using KEGG pathway functions (i.e. EC and KO [KEGG orthology] accessions) using hidden state prediction [29]. Based on the abundances of 16S rRNA amplicon sequence variants and their phylogenetic proximity to reference taxa with available genomes, amplicon sequence variants having a nearest sequenced taxon index (NSTI) score of >2 were discarded from subsequent analyses.
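The arithmetic behind this kind of prediction can be sketched in a few lines of Python. This is not the PICRUSt2 code, which also performs phylogenetic placement and hidden state prediction; the sketch assumes those steps have already produced, for each amplicon sequence variant (ASV), a predicted 16S rRNA copy number, predicted gene-family copy numbers and an NSTI score. All identifiers and values below are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def predict_metagenome(
    abundance: Dict[str, float],
    copies_16s: Dict[str, float],
    gene_content: Dict[str, Dict[str, float]],
    nsti: Dict[str, float],
    max_nsti: float = 2.0,
) -> Dict[str, float]:
    """Sum gene-family abundances over ASVs for one sample.

    Each ASV's read count is divided by its predicted 16S copy number,
    then multiplied by its predicted copy number for each gene family.
    ASVs with NSTI above max_nsti are excluded.
    """
    totals: Dict[str, float] = defaultdict(float)
    for asv, reads in abundance.items():
        if nsti[asv] > max_nsti:
            continue  # too distant from any sequenced reference genome
        normalized = reads / copies_16s[asv]
        for gene_family, copies in gene_content[asv].items():
            totals[gene_family] += normalized * copies
    return dict(totals)


# Hypothetical inputs for illustration only (not data from this study).
example = predict_metagenome(
    abundance={"asv1": 100, "asv2": 40, "asv3": 10},
    copies_16s={"asv1": 5, "asv2": 2, "asv3": 1},
    gene_content={
        "asv1": {"KO_A": 1, "KO_B": 2},
        "asv2": {"KO_A": 3},
        "asv3": {"KO_B": 4},
    },
    nsti={"asv1": 0.1, "asv2": 0.5, "asv3": 2.5},
)
```

In this toy input, asv3 is dropped by the NSTI cutoff, so only asv1 and asv2 contribute to the predicted totals.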

Results and discussion
Physicochemical properties of Lake Tuz water samples
Lake Tuz is categorized as thalassohaline, with Na⁺ and Cl⁻ being the major ions followed by SO₄²⁻, and the pH is >7. Our physicochemical analyses confirmed that Na⁺ and Cl⁻ were the dominant ions in the water of Lake Tuz. Table 1 presents the other physicochemical parameters of the water. Overall, the analyses of samples collected from Lake Tuz revealed that the water contains sufficient amounts of important ions to sustain the growth of halophilic bacteria.

Bacterial diversity of Lake Tuz water samples
High-throughput sequencing with an Illumina MiSeq platform was used to explore the composition and structure of the bacterial community inhabiting Lake Tuz. A total of 302,308 reads, representing 133 OTUs, were produced by sequencing the V3-V4 region of 16S rRNA gene-based amplicons from the two sample locations, TG1 and TG2. All reads were ascribed to bacteria; 155,242 reads were assigned to TG1 (67 OTUs) and 147,066 to TG2 (66 OTUs). The single OTU that was unique to TG1 belonged to class Coriobacteria.
The rarefaction curve in Figure 2(a) reached a plateau stage, implying that the sequencing depth was appropriate for a detailed description of the overall bacterial community. The sequence reads were used to examine whether alpha diversity differed between any two water samples and hence whether the bacterial composition differed between the samples (Figure 2(b)). Rarefaction curves and alpha-diversity indices were based on 97% (nucleic acid) sequence identity to provide information about bacterial richness and homogeneity in the lake.
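The idea behind a rarefaction curve can be sketched as repeated random subsampling of reads without replacement, recording the mean number of OTUs observed at each depth. The Python sketch below is illustrative only and is not the mothur procedure used here; the function name, iteration count and seed are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def rarefaction_curve(
    counts: Sequence[int],
    depths: Sequence[int],
    n_iter: int = 10,
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Mean observed OTU richness at each subsampling depth."""
    rng = random.Random(seed)
    # Expand counts into one OTU label per read.
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            raise ValueError("depth exceeds the number of reads")
        richness = [
            len(set(rng.sample(pool, depth))) for _ in range(n_iter)
        ]
        curve.append((depth, sum(richness) / n_iter))
    return curve
```

A curve that flattens as depth increases, as in Figure 2(a), indicates that additional reads would add few new OTUs.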

Bacterial composition
A metagenomic analysis of Lake Tuz water was carried out by amplicon sequencing using next-generation technology to explore the diversity of microbial communities. Amplicon sequences of TG1 and TG2 were categorized into eight distinct phyla, namely Firmicutes, Fusobacteria, Proteobacteria, Acidobacteria, Cyanobacteria, Actinobacteria, Deinococcus-Thermus and Bacteroidetes, and one unclassified phylum (Figure 3). Firmicutes was the major phylum of the bacterial communities, accounting for 84% of all bacterial phyla in TG1 and 70% in TG2. Fusobacteria and Proteobacteria were the second and third most represented phyla among the bacterial communities, accounting for 9% and 6% of TG1 and 22% and 7% of TG2, respectively (Figure 3). The phyla could be subdivided into classes (Figure 4(a)) and genera (Figure 4(b)). Only 11 and 12 classes of bacteria were found in TG1 and TG2, respectively. Based on the average relative abundance, Clostridia (83% for TG1, 68% for TG2), Fusobacteria (9% for TG1, 22% for TG2) and Gammaproteobacteria (6% for TG1, 7% for TG2) were the three major classes of bacteria in the two samples (Figure 4(a)). For TG1 and TG2, 45% and 35%, respectively, of the sequences were members of genus Romboutsia, and 8% of the sequences belonged to the genus Clostridium sensu stricto 1 in each sample. A large proportion of OTUs belonged to unclassified bacteria, which comprised 39.5% of genera for TG1 and 36.7% for TG2. These were followed by genera Cetobacterium and Photobacterium, which accounted for 9% and 1%, respectively, of TG1, and for 22% and 1% of TG2 (Figure 4(b)).
Surprisingly, the dominant domain was Bacteria and not Archaea. A previous fluorescence in situ hybridization analysis revealed that Lake Tuz is dominated by Archaea (58%) and Bacteria (21.7%) [4], and Jacob et al. [30] found that Archaea is a major domain (50%) of the entire Dead Sea microbial community. Both Lake Tuz and the Dead Sea have high salinity (34% salt) and are typical of thalassohaline brine. The present study revealed that Firmicutes (70-84%), Fusobacteria (9-22%) and Proteobacteria (6-7%) are the three major bacterial phyla, whereas Mutlu et al. [4] indicated that Lake Tuz is dominated by members of Bacteroidetes and represented by Haloquadratum spp. and Salinibacter spp. [3]. Our current study also revealed that Bacteroidetes (0.6%) is less dominant in the water samples. Jacob et al. [30] reported that the Dead Sea is dominated by two important phyla, namely Proteobacteria (55.9%) and Firmicutes (41.7%).
We found that Romboutsia (Figure 4(b)) is the most highly represented bacterial genus in Lake Tuz (35-45% of sequences). According to Wang et al. [31], Romboutsia is gram-positive and was commonly found in an alkaline-saline lake of an oilfield in China. Previous studies and the current one have thus reported different bacterial assemblages in hypersaline environments. These differences may be attributed to the choice of sampling locations within the lake or sea and the time of sampling, because microbial diversity changes with continuous fluctuations in the physicochemical properties of the water, and possibly also to the analytical method used [32][33][34]. Because bacterial composition was determined in samples collected only at specific sites, the results of any such study may be subject to sampling bias and may not represent the entire lake, but they can serve as a baseline for future work. Concerning the present study, it is interesting to note that there are still unclassified phyla and that the unclassified bacteria constitute 36.7-39.5% of the total population. The inability to classify a substantial fraction of the bacteria in Lake Tuz may be attributable to limitations on OTU sampling depth. Thus, a more robust OTU sampling depth may address this potential issue.
Prediction of functional profiles using 16S rRNA data
A functional profile was generated using PICRUSt to predict gene families based on bacterial metagenomes by modelling genes from 16S rRNA data derived from the generated OTUs and its reference genome database [15]. To gain insight into the metabolic contribution of bacteria to the lake ecosystem, the prediction tool PICRUSt was used to determine the functional characteristics of the bacterial communities in the lake. The principal metabolic pathways were common to samples from both TG1 and TG2. Tables 2 and 3 present the number of sequence reads of the predicted genes involved in pollutant degradation and adaptation to a high-salt environment, respectively. Bioremediation-related enzymes and other proteins that had evolved to facilitate microbial adaptation to a high-salt environment were found to be highly represented in both TG1 and TG2. The proportions of enzyme types were similar among samples from both sites, although this result must be confirmed by complementary studies involving isolation and characterization of bioremediation-competent bacteria. Genes predicted from Lake Tuz were found to encode haloacid, haloacetate and haloalkane dehalogenases, alkane 1-monooxygenase and other genes related to the metabolism of hydrocarbons, halogenated organic compounds and heavy metals, such as arsenic and mercury, suggesting the presence of naturally occurring halogenated compounds and/or possible contaminants or pollutants in the lake (Table 2). On the other hand, most of the bacterial species in the microbial community of Lake Tuz had genes that were predicted to be involved in the synthesis of halo-adaptation compounds such as ectoine, glycinebetaine, glutamate, trehalose and choline (Table 3).
The unique cellular enzymatic machinery of halophilic microbes allows them to thrive in extreme saline environments, and this is related to the elevated content of acidic amino acids in bacterial proteins, which increases the net negative charge of the protein surface. Halophilic bacteria that can use hydrocarbons as their sole source of both carbon and energy may therefore be valuable bioremediation agents for the treatment of saline effluents and hypersaline waters contaminated with toxic compounds that are resistant to degradation [17,35].

Conclusions
The predicted functional profiles of the dominant bacterial phyla in Lake Tuz highlight the potential importance of these bacteria for degrading pollutants, and thus these groups may be the focus of future studies. Very few reports have been published concerning pollutant degradation in saline/hypersaline environments with respect to microbial diversity and the prediction of metabolic and enzymatic systems using metagenomic amplicon sequencing [5][36][37][38][39][40]. Our findings suggest that the bacteria inhabiting Lake Tuz could have potential applications in biotechnology as well. This study could therefore serve as a foundation for future studies of the bioremediation of pollutants and heavy metals in extreme or polluted environments. A different approach such as shotgun metagenomic sequencing and/or quantitative PCR could be employed to identify pollutant-degrading bacteria at the species level.