The transcriptome analysis of males musk gland in Moschus berezovskii (Artiodactyla: Moschidae)

Abstract The dwarf musk deer, or Chinese forest musk deer (Moschus berezovskii) is an endangered artiodactyl species native to a narrow region in southern and central China as well as northernmost Vietnam. This species is well known for the habit of male musk deer to secrete musk from the musk gland. Unmated male forest musk deer secrete a greater amount of musk than mated males, which probably allows them to attract a greater number of females. However, transcriptome data for this species are deficient, and the potential transcriptional pathways of musk production remain unknown. In this study, we used RNA-sequencing (RNA-Seq) to perform an in-depth survey of transcriptome data from musk gland and to compare the transcriptome differences between mated- and unmated sexually mature Chinese forest musk deer. There were 13,009 and 12,995 genes were expressed in unmated and mated musk glands tissue. In addition to analyzing these genes, we analyzed the differentially expressed genes in musk gland tissue between mated and unmated Chinese forest musk deer males. One hundred and thirty seven genes were found differentially expressed. We found that some genes in “PPAR signaling pathway”, “PI3K-Akt signaling pathway” and “Thyroid hormone synthesis” pathways may be associated with musk production and the secretion of the musk deer. The annotated genes and differentially expressed genes in this study can be applied in conservation or utilization study of musk deer in the future.


Introduction
Chinese forest musk deer Moschus berezovskii Flerov, 1929 (Artiodactyla: Moschidae), is a small solitary ungulates distributed throughout the forested and mountainous parts of Asia (Feng et al. 1981). Due to historic over-hunting and habitat destruction, their populations have been sharply reduced (Guan et al. 2009). Musk deer are currently listed in Appendix I and Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna (Zhou et al. 2004) and on the Flora and World Conservation Union IUCN Red List (Wang & Harris 2015), and they are considered as a first-class key species of wildlife in China Wild Animal Protection Law (Guan et al. 2009).
Since the 1950s, captive breeding has become one of the most important tools for the conservation of this endangered wild animal. Till date, approximately 8,980 captive musk deer are kept and bred at approximately 30 musk deer farms (Meng et al. 2011;Meng et al. 2012a;Zhao 2016).
Musk is a class of aromatic substances including glandular secretions from animals such as the musk deer and artificial substances with similar odors (Sommer 2004). Not only has musk been used in traditional Chinese medicine for more than 2,000 years, but it is also employed in perfumes in European countries due to its unique, long-lasting scent (He et al. 2014a). Adult male musk deer secrete musk from the musk gland, which is located between their navel and genitals (Green 1986). The production of musk is a complicated physiological process. After musk is secreted from the single layer of columnar epithelial cells that line the musk gland, it is stored in the musk sac. This sac has several keratinized layers, which are mix with the musk after being sloughed. At the same time, the musk is fermented by musk gland microbiota and changed from an unremarkably scented substance into a mature musk with strong fragrance. (Shrestha 1998;Jie et al. 2014a). Musk is likely involved in communication through chemical signals. In general, unmated forest musk deer males secrete a greater amount of musk than mated males, potentially allowing them to attract a greater number of females (Green 1987;Jie et al. 2014b). The extraction of musk from captive animals is the only legal source of musk (Meng et al. 2012b). Our previous study showed that musk production after mating was significantly lower than that of unmated deer, analysis of the chemical composition of the musk showed that unmated males have more muscone and cholesterol (Li et al. 2016).
Although the draft genome sequence of forest musk deer (Moschus berezovskii) was published (Fan et al. 2018). There is limited transcriptome data available currently. Thus, it is difficult to study the molecular mechanisms underlying musk production, including the formation and secretion of musk. Due to its highthroughput, quantitative, cost-efficient and genomeindependent nature, RNA-seq has been shown to be useful technique for the analysis of entire transcriptomes to study the regulatory mechanisms of selected tissues or cells across a wide range of species, especially for animals without an available genomic sequence (Wang et al. 2009;Martin & Wang 2011). In addition to these advantages, RNA-seq technologies are able to identify sequence variations, alternative splicing, exonintron boundaries and simple sequence repeats (SSRs) (Wang et al. 2009;Mutz et al. 2013). The present study was therefore carried out to provide transcriptome data for Chinese forest musk deer and to investigate the potential mechanisms of musk secretion at the genetic level. In this study, using the Illumina HiSeq 2500 platform, reference transcriptome for musk gland of unmated and mated males were constructed and annotated, allowing us to identify potential pathways and candidate genes involved in musk production. As a result, a total of 208,730 unigenes were assembled and identified. Analysis of transcriptome data from the musk gland tissues of mated and unmated sexually mature Chinese forest musk deer males will reveal possible genes and pathways involved in musk production.
Musk deer is well known for the musk secreted by the male, but also seriously threatened by poaching due to the high value of musk. By increasing the production of musk from captive breeding individuals, it may release the threatening factor to some extent. In this study, these data will be helpful in further studies on the genetics and genomics of musk deer, including genome annotation of this species.

Sampling
Chinese forest musk deer were housed in the Chongqing Institute of Medicinal Plant Cultivation (Chongqing, China, altitude: 678 m). The musk deer's housing conditions, feeding regimens and behavioral observations were the same as previous studies (Jie et al. 2014b;Li et al. 2016;Xu et al. 2017). According to their mating behavioral observation, musk gland tissues were collected from two sexually mature male individuals (one mated and the other unmated) after they died from natural disaster (earthquake of April 20th, 2013). The weight of musk obtained from the unmated male was 30.23 g, whilst only 0.45 g musk was obtained from the mated male. The two musk deer werẽ 3.5 years of age. Their physical status before dead was good and no genetic relationship between them. Samples were collected in sterile cryogenic vials, stored in liquid nitrogen, transported to Sichuan Agricultural University, and frozen at −80°C until their RNA profiles could be determined. All procedures in this study were ethical approved by the Institutional Animal Care and Use Committee (IACUC) of Sichuan Agricultural University, under permit number DKY-L20143204. These two musk gland tissue samples were collected specifically for the purposes of this study and the current study itself received review and approval by the IACUC.

RNA extraction, cDNA synthesis, and sequencing
Frozen tissues were thawed, and the total RNA was extracted with TRIzol and purified with the RNeasy Minikit (Qiagen, USA) following the manufacturer's manual. The quality of the RNA was measured using the Nanodrop2000 (Thermo Scientific, USA). All of the samples used in the analysis presented RNA integrity numbers greater than 8.0. mRNA was isolated from the total RNA using oligo (dT)-linked beads (Illumina; San Diego, CA, USA). The SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen, Carlsbad, CA) was employed to generate double-stranded cDNA. The Illumina TruSeq RNA Sample Prep Kit was used to prepare cDNA libraries. All RNA samples were sequenced on the Illumina HiSeq 2500 platform (paired-end), Sangon Biotechnology Co., Ltd at Shanghai.

Transcriptome assembly
The raw reads were filtered using Cutadapt (Martin 2011) and Prinseq (Schmieder & Edwards 2011) software. Low-quality reads showing adaptor contamination, low quality at end, containing fuzzy N bases and low-quality reads with Phred score < 20 were removed. Due to the lack of a reference genome for Moschus berezovskii, we used the Trinity de novo assembler (v20140717) (Grabherr et al. 2011;Haas et al. 2013) to assemble the clean reads into unigenes.

Unigene annotation and open reading frame prediction
The unigenes were used for BLAST search and annotation against an NCBI non-redundant (NR) database using an E-value cut-off of 10 −5 . The unigenes were further aligned to several protein databases, including Swiss-Prot (Magrane & Consortium 2011), nucleotide collection (NT), clusters of orthologous groups for eukaryotic complete genomes (KOG, Tatusov et al. 1997, conserved domain and protein classification (CDD, Marchler-Bauer et al. 2015), Pfam (Finn et al. 2014), translated EMBL (TrEMBL, Camon et al. 2003), gene ontology (GO, Ashburner et al. 2000) and Kyoto encyclopedia of genes and genomes (KEGG, Kanehisa et al. 2004), to retrieve their functional annotations. When the annotation results were inconsistent from different database, then aligned results of NR database were selected, followed by the Swiss-Prot, TrEMBL, CDD, Pfam, GO, KOG and KEGG databases. OrfPredictor tool (Min et al. 2005) was used to identify the open reading frame (ORF) of unigenes without significant hits in all database.

Identification of differentially expressed genes (DEGs)
Processed reads from each sample were mapped to the musk deer reference genome (Fan et al. 2018) using TopHat v2.0.12, mismatch = 2 (Trapnell et al. 2009). The RNA-seq reads were also mapped to transcriptome reference database, and the resulting transcript abundances were quantified using RSEM v1.2.22 (Li & Dewey 2011). The expression levels were determined based on the value of reads counts number of each gene. Differentially expressed genes (DEGs) between these two samples were identified using the DEGSeq v1.12.0 (Wang et al. 2010). Genes with an adjusted q value ≤ 0.005, and |log 2 (fold change)| ≥ 1 were identified as differentially expressed.
Functional enrichment of the DEGs GO functional classifications and KEGG pathway enrichment analysis were performed to identify enriched GO and KEGG pathways. Only GO-BP, GO-MF and KEGG pathway terms with a corrected p value less than 0.05 were considered significant and included to the list.

Validation of the gene expression by qPCR
We selected 4 DEGs to validate the expression profiles of RNA-Seq by relative quantification/real-time PCR (q-PCR). The expression levels of the DEGs were normalized to GAPDH (F: 5ʹ-CAAGTTCAACGG CACAGTCA-3ʹ, R: 5ʹ-TGGTTCACGCCCATC ACAA-3ʹ, product length was 249 bp), primers for the mRNAs were designed using PrimerBLAST (https:// www.ncbi.nlm.nih.gov/tools/primer-blast/). Total cDNA synthesis and qPCR reactions were the same as our previous study (Xu et al. 2018). Each sample was conducted with three technique replicates. The relative expression levels were calculated using the 2 −ΔΔCt method (Livak & Schmittgen 2001), and the data were expressed as mean values.

Availability of supporting data
The datasets supporting this article are available at the National Center for Biotechnology Information (NCBI) accessioned under Bioproject ID: PRJNA289641. The Illumina sequences were deposited in the SRA under accessions SRP060734 and SRP060736.

Transcriptome sequencing and assembly
Using Illumina paired-end sequencing technology, a total of 56,667,752 and 56,811,688 raw reads were obtained from the musk gland of unmated and mated males respectively. After removal of the adaptor fragments, low-quality reads (Phred score < 20), a total of 111,451,986 clean reads consisting of 14 billion nucleotides were obtained (Table I). Among them, 77.83% and 77.25% were mapped to the genome through TopHat v2.0.12 (Table II).
We used the Trinity program (Grabherr et al. 2011) to assemble the high quality reads, producing 208,730 unigenes with an average length of 599 bp from these transcripts (Table III). Of which 56,757 unigenes were longer than 500 bp and 23,891 unigenes were longer than 1,000 bp. The corresponding N50 and N90 values were 840 bp and 251 bp, respectively.
In the NR database search, unigenes were translated into six open reading frames and aligned to known proteins. Within the KOG database, 19,762 (9.47%) unigenes were categorized into 27 KOG clusters. As illustrated in Figure 1, among the 27 KOG categories, signal transduction mechanisms, general function prediction and transcription formed the three largest functional groups, whereas cell wall/membrane/envelope biogenesis, nuclear structure and cell motility formed the three smallest functional groups of known proteins, accounting for only 248 unigenes.
Using Blast2GO (Conesa et al. 2005), 31,039 unigenes involved in 250,056 GO terms were annotated according to the GO database. The majority of the GO terms were assigned to biological processes (114,835, 45.92%), fewer to cellular components (69,474, 27.78%) and the least to molecular functions (65,747, 26.29%). These GO terms were further subdivided into 61 sub-categories (Figure 2). Under the biological process category, the cellular process subcategory was the most abundant. Among the assignments made to the cellular component category, a large proportion of the sequences were involved in the cell and cell parts subcategories. In the molecular function category, the majority of the unigenes were grouped into the catalytic activity, binding and molecular transducer activity subcategories. KEGG classification (Figure 3) indicated that most of the unigenes participated in cellular processes, environmental information processing, genetic information processing, metabolism and organismal systems. The largest unigene-associated subgroup was signal transduction (2762), followed by the immune system (1382), endocrine system (1172), nervous system (859), translation (720) and carbohydrate metabolism (663).

ORF prediction and CDS identification
According to our work flow, a total of 205,896 ORFs, ranging from 30 to 166,471 bp with an average of 282 bp, were predicted from the unigenes. The CDS length distribution in the musk transcriptome is shown in Figure 4. Among these ORFs, the largest proportion ranged from 101-200 bp, followed by 201-300 bp, 1-100 bp and 301-400 bp; 2,848 unigenes contained ORFs longer than 2,000 bp.

General changes of gene expression in the musk gland after mating
To identify the differentially expressed genes (DEGs) between the musk glands of mated and unmated deer, we calculated the FPKM to measure gene expression levels. Based on the applied criteria (q ≤ 0.005 and |log 2 (fold change)| ≥ 1), 137 significantly DEGs related to mating condition, including 13 upregulated and 124 down-regulated genes, were identified ( Figure 5). Among these genes, angiotensinogen (ANTG), acyl-coenzyme A synthetase (ACS), and serum albumin precursor were the top 3 significantly up-regulated genes, which were up regulated by~159, 94 and~80 folds, respectively, while COX2, ATP8 and ND5 were the most markedly down-regulated genes, by at least 100-folds.
To retrieve the biological processes that differed between mated and unmated musk glands, GO analysis was performed to evaluate the functions of the differentially expressed unigenes. There were 32 DEGs  assigned to the three main categories of GO classification. The number of the DEGs involved in the biological process and molecular function categories were 3 and 7 respectively (Table V). Among the top 3 significantly enriched GO terms, the major significant modifications changes were related to "calcium ion binding" (GO:0005509, 17 DEGs), "oxidoreductase activity, acting on peroxide as acceptor" (GO:0016684, 5 DEGs) and "peroxidase activity" (GO:0004601, 4 DEGs).
The KEGG pathway annotations of the DEGs were also obtained. A summary of the total 17 enriched pathways was included in Table V. After mating, the major modifications were related to "PPAR signaling  pathway", "cardiac muscle contraction", "Calcium signaling pathway", "PI3K-Akt signaling pathway" and "Thyroid hormone synthesis".

Validation of the gene expression by qPCR
To validate the sequencing data, we selected 4 DEGs representing different temporal profiles to examine the expression patterns at each mating condition. Two DEGs (ANGT and KCNJB) were decreasing in unmated male, 2 DEGs were increasing (IGHG3 and WFD18) (Table VI). Relative expressions of 4 DEGs detected by qPCR were compared with the reads counts values after normalization of RNA-Seq. The results showed that the expression patterns of 4 DEGs were consistent with the RNA-Seq data (Table  VI), illustrating the reliability of our RNA-Seq data and guaranteed the accuracy of stringent pipeline was used to identify the transcript.

Discussion
With the development of next-generation sequencing technology and assembly methods, RNA-seq has become a cost-effective method for obtaining large amounts of transcriptome data and for efficient probing (Grabherr et al. 2011). In this study, we used the Illumina HiSeq 2500 platform to generate 208,730 unigenes for Chinese forest musk deer. These genes were then annotated and analyzed. It is likely that there are a large number of genes specific to forest musk deer, as only 36.92% of the sequences showed matches in at least one of the nine public databases examined. The total mapped ratio to reference genome of two samples was 77.83% and 77.25% respectively, indicated that the genomic quality of musk deer should be improved in the future.
KEGG pathway analysis revealed that mRNAs associated with "PPAR signaling pathway", "Calcium signaling pathway", "PI3K-Akt signaling pathway" and "Thyroid hormone synthesis" pathways were more abundant in the musk gland of unmated deer than mated deer. Thyroid hormones are key regulators of development, growth, and metabolism. Thus, the activity of thyroid hormone synthesis likely produces more gonadal hormone in the musk glands of unmated deer to attract females. Those results were also in consistent with a previous study by our group demonstrated that the amount of muscone production is greater in unmated males than mated males (Li et al. 2016), and the main chemical component of musk is muscone. On the other hand, both GO and KEGG pathway analyses of the musk glands of unmated and mated males indicated that numerous genes associated with the calcium signaling pathway in the musk gland of unmated deer. Calcium (Ca 2+ ) ions are important for cellular signaling because they enter the cytosol of the cytoplasm and exert allosteric regulatory effects on many enzymes and proteins (Clapham 2007). Thus, the up-regulation of these pathways probably enables cells to produce and secrete greater amounts of enzymes and proteins. Previous study showed that aldosterone and sex steroid hormones activate intracellular calcium signaling in CCD cells via a nongenomic PKC-dependent pathway, which may have important implications for renal transport (Harvey & Higgins 2000). The similar mechanism may exist in musk deer, which may lead to increased sex steroid hormones secretion and musk production in the unmated musk deer. We noted that some of the most up-regulated genes, such as keratins, are important in keratinocyte differentiation (Fuchs & Green 1980). These keratins are probably involved in protein secretion as well as the sloughing of the keratinized layers in the maturation process of musk. The PPAR pathways are involved in the differentiation of keratinocytes (Lee et al. 2006). In the present study, 5 DEGs in the PPAR pathway were detected, indicating high keratinocyte activity in musk gland of unmated musk deer. Many of the identified down-regulated genes are related to mitochondrion function. Cytochrome C oxidase subunit II (COX2) was the most significantly down-regulated gene, and the down-regulation of COX2 may be influenced by two factors: activate the PPAR pathway can inhibit the expression of COX2 (Yang & Frucht 2001); and COX2 plays a role in keratinocyte differentiation (Leong et al. 1996) and possibly regulated by the musk production progress. Compared with mated deer, many immune-related genes, such as angiotensinogen, adipocyte complement-related protein, and serine protease inhibitor and defensin, were shown to be more abundant in unmated deer. We predict that this is likely related to the host defense mechanisms used to eliminate pathogens and boost innate immunity during the production and maturation of musk.
In the present study, we provided and analyzed the transcriptome of musk deer. However, these results are from only two captive deer, so further work is needed to confirm whether these findings can be extended to wild deer and different individuals by expanding the sample size. Only one sample from each mating status was examined in the present study due to the endangered status of the musk deer. As an endangered animal, the artificial breeding of forest musk deer in musk deer farming is one of the key methods for the protection of Chinese forest musk deer as well as for obtaining a musk from a natural source (Xiang et al. 2009;Meng et al. 2012b;He et al. 2014a). Captive population size is still very small, and it is still difficult to manage and musk deer breed on farms because of their solitary habits, territorial behavior and excitable mood (Zhao et al. 2011;He et al. 2014b). Moreover, many factors, such as the threat of disease and unclear temporal patterns of estrous, contribute to the difficulty of increasing populations of captive musk deer. Thus, gathering transcript data from musk deer samples will not only help us to better utilize musk deer genetic resources, but will also provide a valuable opportunity for us to improve the musk production of this valuable endangered species.

Conclusions
In our study, we characterized and evaluated the musk gland transcriptome and quantified the expression profile of mated-in comparison to unmated sexually mature Chinese forest musk deer. The study enriched the molecular database and laid a foundation for the future functional studies of genes involved in musk secretion. Additionally, our findings showed that "PPAR signaling pathway", "Calcium signaling pathway", "PI3K-Akt signaling pathway" and "Thyroid hormone synthesis" pathways were associated with mated male musk deer. These results will be helpful in greatly improving the musk production in Chinese forest musk deer.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by National Natural