Investigation of rumen metagenome in Italian Simmental and Italian Holstein cows using a whole-genome shotgun sequencing technique

Abstract The study aimed at investigating the rumen microbiome composition and functional activity in mid lactating cows of Italian Holstein (IH) and Italian Simmental (IS) breeds. Eight IH and eight IS pluriparous cows with days in milking (DIM) ranging from 90 to 180 were selected and rumen contents were sampled with oesophageal tube. Rumen metagenome was analysed using a whole shotgun sequencing. Data were analysed for taxonomic classification and microbial genes. The relative abundance of Archaea, the Archaea to Bacteria ratio and the Archaea to Eukarya ratio were higher (p < .05) in IS than IH cows. The comparison between IH and IS underlined differences for the abundances of Bacteria, being Bactroidaceae, Bacteroides, Prevotellaceae and Prevotella lower (p < .05) in IS than in IH cows. The IS cows showed higher abundances of Euryarchaeota (p < .05), Methanosphera (p < .01) and Methanothermobacter (p < .05) than IH cows. The annotation of sequences to KEGG revealed that 170 genes were differentially abundant between IS and IH cows and among these, 20% were involved in protein biosynthesis, 8.8% in one-carbon metabolism, as methyl coenzyme M reductase associated protein and of six isoforms of methyl coenzyme M reductase. The present results suggest a genetic link between breed and microbiome, although this interaction can be influenced by several biological factors. Considering that there are still a low number of whole genome shotgun sequencing analysis of rumen communities, these data can provide further information to scientific community.


Introduction
According to different estimates, ruminants contribute from 7.5 to 18% of the total amount of anthropogenic emissions of methane, (FAO 2010;Hristov et al. 2013). Accordingly, the expected increase of demand for food of animal origin in the next years and the related expansion of livestock systems is one of the area where research can contribute more to an environmentally sustainable growth (Pulina et al. 2017).
The relationship of animal physiology, genetics, feeding and nutrition with methane emissions in dairy cows is an area of investigation which has attracted many researches in the last years (Morgavi et al. 2010;Cassandro et al. 2013;Poulsen et al. 2013;Mao et al. 2015;Campanaro et al. 2017;Negussie et al. 2017). A recent study by Roehe et al. (2016) has underlined that breed and genetics of beef cattle affect the amount of rumen methane emissions and that these differences are reflected in a variation of rumen microbial compositions. The Archaea to Bacteria ratio and the abundances of their genes have been demonstrated to be involved in the reported differences, suggesting the implementation of breeding strategies, which can include the composition of microbial community as a supplementary indirect trait of methane emissions (Methagene COST action FA1302 (http:// www.methagene.eu). Also according to Wallace et al. (2014Wallace et al. ( , 2015, the relative abundance of Archaea in the rumen is correlated with methane emissions from individual animals. In this research work, we wanted to investigate whether dairy and dual-purpose lactating cows show different microbiota composition and functional microbiome. The rumen metagenome of Italian Holstein (IH) and Italian Simmental (IS) cows was analysed using whole-genome shotgun sequencing and results were used for functional analysis.

Animals, productive traits and diets
The experimental procedures and the handling of the animals complied with the Italian legislation on animal care (DL n.116, 27/1/1992) and were approved by the ethical committee of the University of Udine. Cows were selected within four herds from commercial farms in the North East part of the Po Valley, Italy. The farm had free stall barns as housing system and the animals were fed ad libitum a total mixed ration (TMR) twice a day, at around 07:00 am and 05:00 pm with free access to water. The local Farm and Breeder Association (Associazione Allevatori del Friuli Venezia Giulia, Codroipo, Italy, www.aafvg.it) provided information on individual milk yield, composition and somatic cell count (SCC) of the last official record and reproductive parameters. The body condition score (BCS) of each cow was recorded by a trained evaluator on a scale from 1 (thin) to 5 (fat) with 0.25 point intervals (Edmonson et al. 1989). Four healthy mid lactating cows were selected with DIM ranging from 90 to 180 (mean 141 ± 32 DIM) for each farm, to have eight Italian Simmental (IS) and Italian Holstein (IH), respectively. A more detailed description about cows and production traits is presented in Table 1. The compositions of the rations with their contents of nutrients are reported in Table 2.

Metagenomic DNA extraction and sequencing from rumen samples
A sample of about 40 g of rumen content of each animal was collected with an oesophageal tube one hour after the morning feeding and was transferred to the laboratory kept under dry ice , where it was maintained at À80 C until DNA extraction. The rumen contents were sampled within a period of eight days. For DNA extraction, whole rumen contents were extracted with the following steps: 1. three minutes centrifugation at 3000 g; 2. washing with a 50 mM Tris-HCl pH 8.3, 200 mM NaCl, 5 mM sodium EDTA, and 0.05% Triton X-100 buffer solution; 3. washing with 50 mM Tris-HCl pH 8.3, 200 mM NaCl, and 5 mM sodium EDTA buffer solution; 4. washing with a 10 mM Tris-HCl pH 8.3and 0.1 mM sodium EDTA. DNA was then extracted with a Fecal DNA MiniPrep kit (Zymo Research; Irvine, CA), following the manufacturer's instructions, including the bead beating step. Pre-amplification DNA concentration in the samples was measured with a Nanodrop-1000 assay (Thermo Scientific, Wilmington, DE). DNA samples were cleaned up through repeated ethanol precipitation using 3 M sodium acetate of pH 5.2. Genomic DNA was diluted to 20 ng/mL on a final volume of 55 mL and using a Covaris S2 station (Covaris, Woburn, MA) the genomic DNA was sheered in AFA microtubes 200 cycles per burst and an intensity of five cycles of 40 seconds in frequency sweeping mode at 4 C. The obtained samples were processed in the CBM laboratory (Cluster BioMedicine, Basovizza Trieste -Italy; https://www.cbm.fvg.it/) according to TruSeq DNA Sample Prep V2 Low Throughput Protocol (Illumina, San Diego CA). Libraries were pooled into eight group of two samples and the final purified products were quantitated, using both quantitative polymerase chain reaction (qPCR) Kapa Library Quantification Kit (Kapa Biosystem, Wilmington MA) and BioAnalyzer High Sensitivity DNA Chip (Agilent Technologies, Waldbronn, Germany). Sequencing was performed using a 200-cycle paired-end run, TruSeq SBS Kit on a HiScanSQ System (Illumina, San Diego CA).  The reads <50 bp long or with an average Q quality score of sequencing <20 were discarded.

Metagenome analysis
Sequencing data in the format of FASTQ files were analysed with the online Meta Genome Rapid Annotation using Subsystem Technology (MG-RAST, Meyer et al. 2008;Wilke et al. 2013) pipeline for ribosomal RNA (rRNA) genes annotation. A dereplication step was performed using a simple k-mer approach to rapidly identify all 20 character prefix identical sequences, in order to remove artificial duplicate reads. The MG-RAST pipeline also provided a screening stage to remove the reads that were near-exact matches to the genomes of a handful of model organisms, including fly, mouse, cow and human. The rRNA reads were detected through an initial BLAT search against a reduced RNA database that efficiently identifies RNA. The reduced database was a 90% identity clustered version of the SILVA database and was used merely to rapidly identify sequences with similarities to rRNA. The rRNA-similar reads were then clustered at 97% identity and the longest sequence was picked as the cluster representative. The rRNA annotation with the MG-RAST pipeline was performed through a BLAT of the longest cluster representative obtained from the 97% identity clustering step, against the M5rna database, integrating SILVA (Pruesse et al. 2007), Greengenes and RDP databases.
For the protein annotation, sequence similarity searches were computed against a protein database derived from the M5NR (Wilke et al. 2012), which is a non-redundant integration of several databases (GenBank, SEED, IMG, UniProt, KEGG and eggNOGs). Accordingly, MG-RAST can supports complementary views into the data with one similarity search, including different functional hierarchies: SEED Subsystems, IMG terms, COG, eggNOGs and ontologies such as GO (Gene Ontology Consortium). Furthermore, a four level hierarchy for the enzyme was implemented, using the KEGG enzyme number (EC: Enzyme Commission number): 1. KEGG level 1first digit of the EC number (EC:X. Ã . Ã . Ã ) 2. KEGG level 2the first two digits of the EC number (EC:X.Y. Ã . Ã ) 3. KEGG level 3the first three digits of the EC number (EC:X:Y:Z:. Ã ) 4. KEGG level 4the entire four digits EC number.
Sequencing data in the format of FASTQ files were analysed with the online MG-RAST pipeline for rRNA) genes annotation (Meyer et al. 2008;Wilke et al. 2013). The MG-RAST 'best hit organism abundance' function (Wilke et al. 2013) was employed against a database, integrating SILVA (Pruesse et al. 2007), Greengenes and RDP databases, with a threshold of 97% sequence similarity. For the annotation of genes encoding proteins, sequence similarity searches were computed against a protein database derived from M5NR database (Wilke et al. 2012), with a 0.97 identity threshold. This M5RN database represents an integration of several databases (GenBank, SEED, IMG, UniProt, KEGG and eggNOGs). A four level hierarchy for the enzyme was implemented, using the KEGG enzyme number: KEGG level 3the first three digits of the EC number (EC:X:Y:Z:. Ã ) 4. KEGG level 4the entire four digits EC number.
Taxonomic and functional abundance data were parsed with Perl scripts (Supplementary Material 1) in order to obtain functions or taxonomic definitions on rows and counts per animal on columns. Every taxonomic rank (Domain, Phylum, Class, Order, Family and Genus) or function level (level 1, 2, 3 and 4) were retained. In the MG-RAST, reads can be mapped to multiple annotations and one annotation to multiple reads (Wilke et al. 2013), due to the structure of microorganism genomes, pipeline and to the characteristic of the database used for annotation. To avoid redundant rows of functions, different rows with the same definition and the same count within animal were merged.
After this step, a curation to avoid redundant rows of function was performed: two different rows with the same definition and the same count per animal were merged.
The datasets generated and/or analysed during the current study are available in the SRA repository: https:// www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP103178.

Statistical analysis
The operational taxonomic units (OUT) picking from the whole-genome shotgun approach used in the study resulted in a high number of Phyla (Supplementary Table S1), many of which were assigned to taxa not belonging to Bacteria domain or to Eukarya resident into the rumen (Kim et al. 2011). Before statistical analysis of metagenome, taxa were checked to remove phyla not included in the rumen census as reported by Kim et al. (2011). A further cleaning of dataset was adopted, to consider only those Phyla that were present at least in half of the cows (more than eight animals) and with relative abundance higher than 0.05%, to reduce uncontrolled variability related to the depth of sequencing more than to biological factors. The applied criteria are arbitrary, but the percentages of the remaining 16 Phyla accounted for 99.37 ± 0.23 of the cleaned sequences and for the 96.95 ± 0.15 of the initial total sequences (Supplementary Table S2).
Data were subjected to analysis of variance with fixed effects of breed and random effect of farm using the linear mixed model of SPSS software (SPSS 1997): where Y was a vector of OTU counts; l was the overall mean; breed was the effect of breed with i equal to 1 for IH and 2 for IS; farm was the random effect of farm distributed as N(0, Ir 2 farm ), with identity matrix I and farm variance r 2 farm and e the residual error. Least square differences test was applied to assess significant differences between means for breed effect. Differences in relative abundances between breeds at each taxon were assessed with the multiple test correction of Bonferroni, with a number of test equal to 76. Differences among means were considered significant for p < .05.

Results
The mean number of reads was 1,15,395 (1,03,476 for IS and 1,27,315 for IH cows) and after the application of cleaning criteria the number was reduced to 97.0 and 98.1%, for IS and IH cows, respectively. A further slight reduction of 0.5% was obtained after filtering for the presence of reads in at least eight cows and the final reads used for the analysis were 99,781 and 1,24,203 for IS and IH, respectively. These mean values did not significantly differ between breeds and farms (Supplementary Table S2).
The relative abundance of Archaea was significantly different between the two breeds (76.90% for IS and 56.64% for IH; p < .05 of total OTUs), with the IS cows  showing higher values than IH cows (Figure 1 and Supplementary Table S3). The higher abundance of archaea was also confirmed from the Archaea to Bacteria ratio and from the Archaea to Eukarya ratio, which were higher (p < .05) in the IS cows. The mean relative abundance of taxa significantly different between breeds is reported in Table 3. Within the Bacteria domain, a significantly higher (p < .05) relative abundance was observed for the phylum Actinobacteria, the genus Bifidobacterium, the family Spiroplasmataceae and the genus Spiroplasma for the IS cows in comparison to IH cows. For the family Bacteroidaceae and the genus Bacteroides and for the family Prevotellaceae, the relative abundance in the IS cows was significantly lower than in the IH cows (p < .05).
Within the Archaea domain, higher relative abundances of phylum Euryarchaeota (p < .05) and genera of Methanosphera (p < .01) and Methanothermobacter (p < .05) were obtained for the IS cows in comparison to IH cows. Within the Archaea domain, the effect of farm was never significant.
The annotation of the sequences provided 1636 features at the KEGG level 4 (Supplementary Table S4). From the analysis of variance, genes significantly different between breeds were 170, 20% involved in protein biosynthesis, 8.8% in one-carbon metabolism and 8.8% in RNA processing and modification.
Thirty four genes of 'Protein biosynthesis' category were significantly more abundant in IS than IH cows: these genes were predominantly related to Eukarya and Archaea rather than Bacteria domain. Within these genes, ribosome large subunit (LSU) and small subunit (SSU) eukaryotic and archaeal (14 and nine genes, respectively) and translation elongation factors eukaryotic and archaeal genes (seven genes; Figure 2 and Supplementary Table S4) were found. Among the genes involved in the one-carbon metabolism, the relative abundances of formylmethanofuran dehydrogenase subunit D, methyl coenzyme M reductase associated protein, six isoforms of methyl coenzyme M reductase (alpha subunit, gamma subunit, I alpha subunit, I beta subunit, I gamma subunit and II gamma subunit, Figure 3, Panel A) were significantly higher (p < .05) for IS cows in comparison to IH cows. Also for three genes involved in the methenyltetrahydromethanopterin metabolism (Figure 3, Panel B), the relative abundances were significantly higher (p < .05) in IS cows.

Rumen microbial community
In the present study, we sampled rumen contents from commercial farms and, although the rations were similar in terms of ingredients and contents of nutrients (Table 2), the different environments of the stable can have exerted some influence on rumen microbial communities. A considerable individual variability has been widely shown in rumen microbial profile (Ross et al. 2012;Sandri et al. 2014), independently from the diet and environment. On the other hand, Ross et al. (2013) after repeated samples from the same cows taken several weeks apart, observed a high repeatability of microbiome composition within the animal. The same authors suggested a good stability of rumen microbial profile over the time, confirming the hypothesis of a substantial host specificity of bacterial community (Weimer et al. 2010). However, the details and the mechanisms which regulate this mutual relationship remain mostly unknown.
The influence of host genetics on the composition of rumen microbial communities can be related to several biological factors, not only the rumen physiology, but also to host-microbiome interplay, which has led to the development of the concept of microbiomegut-brain axis (Mayer et al. 2015). Within the rumen, regulation of saliva production (Appuhamy et al. 2014), size and physical structure of the rumen (Goopy et al. 2014), rate of absorption of volatile fatty acids are factors that can influence the microbial fermentation and the production of methane. The gut microbes and the host is a highly integrated system, with the production of neuropeptides and signalling compounds from both the players that can interact with gut contraction and satiation mechanisms, behaviour,

Ratio between Archaea and Bacteria domains
The average Archaea to bacteria ratio (Figure 1 and Supplementary Table S3) was slightly higher than that reported by Roehe et al. (2016) for growing beef cattle, likely due to the higher dietary fibre of the rations fed to dairy cows during lactation. Fermentation of non-structural carbohydrates produces higher acetate than propionate as end products of bacteria metabolism, a conditions which favours the methanogenesis (Roehe et al. 2016). The higher relative abundances of Archaea and their ratios with Bacteria and Eukarya (p < .05) were found in the less productive dual purpose IS cows. A relevant shift of microbial community composition with an increase of Archaea to the detriment of Bacteria (Roehe et al. 2016) and Eukarya (Popova and Eugene 2011;Carberry et al. 2012) is reported for high fibre in comparison to high concentrate diets.
Significant higher relative abundances of Bacteroidaceae and Bacteroides, Prevotellaceae were found for IH, the experimental groups that corresponds to a higher genetic merit for milk yield. Rooke et al. (2014) in beef cattle reported that the amount of concentrate in the diet had not effect on the relative abundances of Prevotella plus Bacteroides. Instead, these authors showed higher abundances of these  genera in Aberdeen Angus steers in comparison to Limousin steers, concomitantly with the higher feed intakes of the former breed, suggesting a genetic link between dietary intakes and metagenome composition. Individual dry matter intakes were not recorded in our study and it is likely that cows with higher milk yields ate more, influencing the structure of rumen metagenome. Recently, McCann et al. (2014) and Jewell et al. (2015) reported that microbial community of beef cattle was influenced by feed efficiency, evaluated in terms of residual feed intake, suggesting that the genetic component of the host can exert an appreciable effect on rumen microbiome. However, in the research of McCann et al. (2014) the authors observed that the less efficient animals, i.e. those with positive residual feed intake, had higher abundance of Prevotella, whilst Jewell et al. (2015) found that this genus was correlated with both high and low efficiencies for milk production and was considered beneficial to rumen function. Relative abundances of Actinobacteria and Bifidumbacterium in IS were significantly higher and in a recent study the abundance of these Phyla was positively associated with methane production in Holstein cows (Ross et al. 2013).

Functional annotations
The whole genome shut-gun sequencing allows taxonomic classification of microbes and the definition of genes richness and abundance using annotation to remote database. The abundance of genes somehow reflects the presence of microbial taxa, but allows the identification of specific features for the substrate utilisation by the ecosystem into the rumen (Brulc et al. 2009). Microbial communities differing for their internal structure can form networks that have the same metabolic inputs and outputs (Taxis et al. 2015). In the present study, a significant variation of abundance within the archaea and bacteria domains was observed between IS and IH cows and the functional analysis at level 2 of KEGG indicated that the rumen ecosystem of the two breeds displayed unique features (Supplementary Table S4).
The higher abundance of Archaea in the IS metagenome compared to IH metagenome corresponded to the significant enhancement of genes involved in protein biosynthesis (Figure 2). Since a systematic analysis of archaeal ribosomes is still lacking, the functional characterisation of ribosomal proteins is a fundamental step in the understanding how archaeal ribosome can cope with extreme environmental conditions (Marquez et al. 2011). Recently, an archaeal Ribosome LSU L14, named L23 in Eukarya, has been reported as main docking partner of the aIF6, eIF6, and this factor inhibits the initiation of translation in both these domains (Pech and Nierhaus 2012), whilst aIF2 positively regulates the process (Benelli and Londei 2011). Interestingly, the relative abundances of many large and small subunit of ribosomal protein were affected by breeds, together with eukaryotic translation initiation factors. Among these, significantly higher were the abundance of the LSU ribosomal protein L14 and of eukaryotic translation initiation factor 2, alpha and beta subunits (Supplementary Table S4). However, these data can be related to the higher abundances of archaeal taxa in IS cows more than to a specific effects on protein biosynthesis. Studies of the function of these factors are required, to elucidate their role on Archaea functions and their relationship with methane metabolism.
If the higher abundance ( Figure 3) of methyl-coenzyme M reductase genes in IS metagenome in comparison to IH was associated to a variation of methane emission is not possible to state. Recent findings (Campanaro et al. 2017) have reported that some Archaea species, as Methanomethylophilus sp, can have a relevant role in the rumen microbiome, due to a higher abundance within the methylotrophic methanogens. The methyl-coenzyme M reductase enzyme is responsible for the formation of methane and is organised as a hexamer in an alpha2beta2gamma2 arrangement, with molecules of the nickel porphinoid coenzyme F430 embedded between the subunits with two active sites accessible for the substrate methylcoenzyme M (Ermler et al. 1997). In particular, the methyl-coenzyme M reductase catalyses the last step in the methanogenesis (Friedrich 2005) and the abundance of the subunit alpha was associated with methane emissions in dairy cattle (Aguinaga Casanas et al. 2015;Roehe et al. 2016). Of note, in our study the methyl-coenzyme M reductase alpha subunit was significantly higher in IS cows in comparison to IH cows (1.74 þ 0.33 and 0.82 þ 0.10, respectively; p < .05; Supplementary Table S4). The higher abundance of archaeal gene formylmethanofuran dehydrogenase subunit D (fmdD) also suggested a higher methanogenic metabolism of IS rumen microbial community. The enzyme forms a transcription unit with the gene formylmethanofuran dehydrogenase subunit B (Hochheimer et al. 1995). This genes catalyses the reversible reduction of carbon dioxide (CO 2 ) and methanofuran via N-carboxymethanofuran (carbamate) to N-formylmethanofuran, the first and second steps in methanogenesis from CO 2 and the subunit B was associated to methane emission in beef cattle (Roehe et al. 2016).

Conclusions
This study investigated and compared rumen microbial communities between cows of IS and IH breeds housed in commercial farms and the results suggested that the composition of metagenome and the abundance of microbial genes was affected at some extent from the genetic background. The mechanisms behind host and microbial community interplay is the result of several biological factors, which can act as confounding factors leading to biased conclusions. However, these data confirm previous recent findings, which indicate the link between host genetic and rumen microbial composition and functions. Furthermore, considering the still low number of whole-genome shotgun sequences for rumen communities, these data can provide further information to scientific community.