Transcriptome analysis identifies key genes involved in anthocyanin biosynthesis in black and purple fruits (Lycium ruthenicum Murr. L)

Abstract Lycium ruthenicum is a characteristic plant resource of Northwest China. The high anthocyanin content of its fruit is the main reason for its high nutritional and economic value, but the key genes regulating anthocyanin biosynthesis in black fruit remain unclear. In this study, high-throughput sequencing was used to compare gene expression between black and purple fruits. The purple fruit samples yielded 45.13 and 51.49 M raw reads, and the black fruit samples yielded 53.85 and 47.38 M raw reads. After assembly and removal of redundant and duplicated sequences, 162,842 transcripts were obtained. Homology comparison of all predicted proteins showed that 36,978 (22.43%) genes in Lycium ruthenicum were homologous to Solanum tuberosum genes, followed by 24,459 (14.84%) genes homologous to Capsicum annuum genes. Overall, 3685 genes were up-regulated and 2837 genes were down-regulated in purple fruit, and the remaining 156,320 genes were not differentially expressed between black and purple fruit. Through homology comparison of anthocyanin biosynthesis and metabolism structural genes in Lycium ruthenicum, 63 homologous genes, 4 MYB transcription factors and 5 bHLH transcription factors were screened. LrAN2p (909 bp) in purple fruit extended 136 bp further upstream of the ‘ATG’ than LrAN2 (774 bp) in black fruit. Three highly expressed copies of the bHLH transcription factor AN1b, of different lengths, were identified in black fruit, but they were almost not expressed in purple fruit. This study possibly identified the major LrAN1b genes regulating the black fruit trait.


Introduction
Lycium ruthenicum Murray is a perennial shrub of the genus Lycium L. in the Solanaceae [1]. It is widely distributed in desert and saline-alkali areas such as Qinghai, Xinjiang and Gansu [2]. It is an important 'dual-purpose drug and food' plant resource in China [3]. The 'Four medical classics' and 'Jingzhu Materia Medica' record that black fruit was mainly used to treat heart fever, heart disease, irregular menstruation, menopause and other symptoms [4]. L. ruthenicum is rich in anthocyanins [5,6], polysaccharides [7,8], polyphenols [9], flavonoids [3] and other natural components. More than a dozen anthocyanins have been found in black fruit, and its anthocyanin content ranks first among all Lycium species [10]. Pharmacological studies show that the antioxidant [10], anti-Alzheimer's disease [1], anti-acute gout [11], anti-inflammatory [12] and other pharmacological effects of L. ruthenicum are related to anthocyanins. Purple L. ruthenicum is found in many field areas of Qinghai Province. Its plant phenotype, leaf shape and fruit shape are consistent with those of black L. ruthenicum; only the fruit color differs, being purple rather than black. Purple L. ruthenicum is sparsely distributed and is preserved only as a Lycium germplasm resource.
Anthocyanins are water-soluble flavonoid secondary metabolites that are widespread in higher plants and give flowers and fruits their red, purple and blue colors [1]. They are mainly located in cell vacuoles and play an irreplaceable role in the formation of plant flower color. Studies have shown that anthocyanin biosynthesis proceeds through the phenylpropanoid pathway [13,14]. Structural genes related to anthocyanin synthesis include chalcone synthase (CHS), phenylalanine ammonia lyase (PAL), chalcone isomerase (CHI), flavonoid 3'-hydroxylase (F3'H), flavanone 3-hydroxylase (F3H) and flavonoid 3',5'-hydroxylase (F3'5'H) [14,15]. At the same time, the transcription factors MYB, bHLH and WD40 regulate anthocyanin biosynthesis by controlling the expression of structural genes, which they do by forming the MYB-bHLH-WD40 (MBW) complex [16,17]. However, allelic variation in these regulatory genes often leads to color differentiation in different plants. Deletion of part of the promoter region of the MYB transcription factor VvMYBA1 in grapes changes the strength of gene function, resulting in differently coloured grape pulp [18]. Similar MYB regulators have been identified in Arabidopsis (MYB113; PAP1; MYB114) [19,20], tomato (SlAN2) [21] and potato (StMTF2). Like MYB transcription factors, bHLH genes are also important in anthocyanin biosynthesis. The difference in anthocyanin formation between the pericarp of purple and white wheat is mainly caused by the specific expression of TaMYC1 and the repetitive sequence in its promoter region [22]. Similarly, in blue-grained wheat, the bHLH transcription factor ThMYC4E acts as a key regulator [23]. Homologues of the maize R and B genes have also been found in Arabidopsis (bHLH113) [20], potato (StbHLH1) [24], petunia (Jaf13) [25] and Ipomoea purpurea (bHLH2) [26]. These genes were found to influence anthocyanin biosynthesis.
Transcriptome sequencing (RNA-seq) uses high-throughput sequencing technology to sequence and analyze all or part of the mRNA, small RNA and non-coding RNA in cells or tissues. It can efficiently, quickly and accurately reveal the relationship between transcriptional regulatory networks and traits. The pathway of anthocyanin synthesis and metabolism is relatively well understood. By analyzing gene expression and structure in transcripts, the key genes of anthocyanin synthesis in different tissues can be identified quickly.

Plant material
The Lycium ruthenicum varieties LMH1 (Figure 1(a)) and LMHp (Figure 1(b)) are widely planted in China, and these two cultivars were chosen as representatives of Lycium ruthenicum for this research. Both materials were preserved at the Northwest Plateau Institute of Biology, Chinese Academy of Sciences.

RNA extraction
Total RNA was extracted using the Tiangen RNAprep Pure Plant Kit. The quality of the extracted RNA was checked on a 1.5% agarose gel, and the concentration was determined with a Nanodrop (Thermo Scientific, Wilmington, DE, USA).

Sequencing and data assembly
The total RNA of Lycium ruthenicum fruit that met the quality requirements was sent to Beijing Nuohe Biotechnology Co., Ltd. for sequencing. The Lycium ruthenicum cDNA libraries were created according to the manufacturer's (Illumina Inc, San Diego, CA, USA) mRNA-seq sample preparation method. The cDNA libraries were sequenced by Nuohe Biotechnology Co., Ltd. (Beijing, China) on an Illumina HiSeq 2000 instrument with a read length of 100 bp and three replicates. The raw sequencing results were filtered by removing adapter sequences, low-quality sequences at both ends and low-complexity sequences, so as to obtain high-quality sequences for assembly. Trinity assembly software was used to assemble the high-quality data into reliable transcripts, which were then clustered by homology and spliced to obtain unigenes [27]. The assembled unigenes were compared against the NCBI protein database using blastx (E value < 0.00001) [28]. The expression levels of unigenes in black fruit and purple fruit were quantified as FPKM values. Chi-square analysis was then performed with IDEG6 software to detect differentially expressed genes (DEGs) between the transcripts of black fruit and purple fruit; unigenes with a false discovery rate (FDR) ≤ 0.001 and |log2ratio| > 1 were considered significantly differentially expressed [29].
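The expression measure and the significance thresholds described above can be sketched in a few lines of Python. This is only a minimal illustration and not the IDEG6 implementation: the function names `fpkm` and `is_deg` are hypothetical, the FPKM formula is the standard definition (fragments per kilobase of transcript per million mapped fragments), and the FDR value is assumed to have been computed beforehand.

```python
def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("length_bp and total_mapped must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_mapped)


def is_deg(log2ratio: float, fdr: float,
           fdr_cutoff: float = 0.001, log2_cutoff: float = 1.0) -> bool:
    """Apply the thresholds stated in the text: FDR <= 0.001 and |log2ratio| > 1."""
    return fdr <= fdr_cutoff and abs(log2ratio) > log2_cutoff
```

Under these thresholds a unigene with a log2ratio of 1.5 and an FDR of 0.0005 would be counted as a DEG, whereas one with a log2ratio of 0.5 would not, whatever its FDR.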

Functional annotation and metabolic pathway analysis
We searched the assembled unigenes against public databases to annotate and classify gene functions, mainly the non-redundant nucleotide database (NT), the non-redundant protein database (NR), the protein sequence database (Swissprot), Clusters of Orthologous Groups for eukaryotic complete genomes (KOG), KEGG (http://www.genome.jp/kegg/), GO (http://www.geneontology.org) and Pfam (http://pfam.xfam.org/). The unigenes that differed significantly between the black fruit and the purple fruit were mapped to the GO and KEGG pathway databases, and GO classification and KEGG pathway analysis of each unigene were performed to search for pathway annotations significantly enriched among the differential genes [28]. Using the KEGG pathway of anthocyanin biosynthesis, we searched for the related structural genes and screened out each homologous transcript in the database. The expression level of each screened transcript in black and purple fruits was evaluated by FPKM value, and its annotation in each protein database was checked against the annotation data.

Bioinformatic analysis
The amino acid sequence alignment was generated using Vector NTI 11 software. The sequence manipulation toolbox (http://www.detaibio.com/sms2/translate.html) from DetaiBio (Nanjing, China) was used to translate coding sequences into amino acids. As previously described [30], we used the InterPro website (http://www.ebi.ac.uk/interpro/) to predict conserved functional domains, and the amino acid sequences encoded by MYBTF and bHLHTF were compared with homologous sequences from other plants in GenBank using MEGA 7.0 software to determine whether MYBTF and bHLHTF belong to the category of transcription factors controlling anthocyanin synthesis.
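The translation step can be illustrated with a short, self-contained Python sketch. It is not the code of the online tool cited above; it simply applies the standard genetic code, and the function name `translate` and the choice to report unrecognised codons as `X` are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA input
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCTAA")` reads the codons ATG, GCC and TAA and returns `"MA"`. Trailing bases that do not complete a codon are ignored.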

Sequence assembly and protein annotation
The second-generation high-throughput sequencing platform Illumina HiSeq 2000 was used to sequence the transcriptomes of purple and black fruits. The purple samples produced 45.13 and 51.49 M raw reads, and the black fruit samples produced 53.85 and 47.38 M raw reads. The sequencing quality control was good: the quality of the clean reads was measured and the data were reliable.

Functional classification of differentially expressed genes
All annotated unigenes were classified into three categories: Biological Process, Cellular Component and Molecular Function (Figure 2(a)). Differential expression of each gene was judged by its log2 fold change value. There were 3685 genes up-regulated and 2837 genes down-regulated in purple fruit, and the remaining 156,320 genes were not differentially expressed between black fruit and purple fruit (Figure 2(b)). KEGG annotation of all differentially expressed genes showed that most of them fell in the Metabolism category, including 167 differentially expressed genes in the Secondary Metabolic Pathway (Figure 2(c)). This further narrowed the scope for screening the key genes responsible for the difference in anthocyanin biosynthesis in purple fruit.
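As a quick consistency check, the three gene counts reported above can be summed in Python. The variable names are illustrative only; the numbers are taken directly from the text.

```python
# Gene counts reported for the purple versus black fruit comparison.
up_in_purple = 3685
down_in_purple = 2837
not_differential = 156320

n_deg = up_in_purple + down_in_purple   # all differentially expressed genes
n_total = n_deg + not_differential      # every gene in the comparison
deg_fraction = n_deg / n_total

print(n_deg, n_total, f"{deg_fraction:.2%}")
```

The total comes to 162,842, which matches the number of assembled transcripts given in the abstract, and the 6522 differentially expressed genes make up about 4% of that total.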

Analysis of anthocyanin biosynthesis pathway
Through homology comparison of anthocyanin biosynthesis and metabolism structural genes in Lycium ruthenicum, 63 homologous genes, 4 MYB transcription factors and 5 bHLH transcription factors were screened (Supplemental Table S2; Figure 3). These structural genes were also expressed in purple fruit, but far more weakly than in black fruit. The four MYB transcription factors screened were transcripts of the same gene, LrAN2. Although LrAN2 is widely expressed in black fruit, it is more highly expressed in purple fruit. LrAN2 was screened separately in the transcriptomes of black fruit and purple fruit. The results showed that LrAN2p (909 bp) in purple fruit extended 136 bp further upstream of the 'ATG' than LrAN2 (774 bp) in black fruit (Figure 4(b)). The phylogenetic tree showed that AN2 was associated with MYB1 and MYB113 of Nicotiana attenuata (Figure 4(a)).
The five bHLH transcription factors were different transcripts of LrAN1b. Two of them, CL8159.Contig6_All and CL8159.Contig7_All, were invalid transcripts, whereas the open reading frames (ORFs) retrieved from the other three transcripts were longer than 1000 bp. The Log2FC values of Contig1, Contig5 and Contig9 between black fruit and purple fruit were −8.103, −12.099 and −7.428, respectively; that is, these three transcripts are highly expressed in black fruit and almost not expressed in purple fruit. CL8159.Contig1_All (2022 bp), CL8159.Contig5_All (1179 bp) and CL8159.Contig9_All (1968 bp) are different-length versions of the same gene. Contig1 and Contig9 differed by 54 bp at the 'TAA', while Contig5 was the shortest, lacking 843 bp at the 'ATG' (Figure 4(d)). The phylogenetic tree showed that LrAN1b was associated with bHLH1, bHLH2 and bHLH3 of Capsicum annuum (Figure 4(c)).
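To make the Log2FC values and the length differences above more concrete, the following Python sketch converts each Log2FC into a linear fold change and recomputes the transcript length differences. It assumes that Log2FC is defined as log2(purple/black), so that a negative value means lower expression in purple fruit; the dictionary and variable names are illustrative.

```python
# Values reported in the text for the three LrAN1b transcripts.
log2fc = {
    "CL8159.Contig1_All": -8.103,
    "CL8159.Contig5_All": -12.099,
    "CL8159.Contig9_All": -7.428,
}
length_bp = {
    "CL8159.Contig1_All": 2022,
    "CL8159.Contig5_All": 1179,
    "CL8159.Contig9_All": 1968,
}

# Assumption: Log2FC = log2(purple / black), so 2 ** (-Log2FC) is the
# fold by which expression in black fruit exceeds that in purple fruit.
fold_higher_in_black = {name: 2 ** (-value) for name, value in log2fc.items()}

# Length differences relative to the longest transcript, Contig1.
diff_contig9 = length_bp["CL8159.Contig1_All"] - length_bp["CL8159.Contig9_All"]
diff_contig5 = length_bp["CL8159.Contig1_All"] - length_bp["CL8159.Contig5_All"]

for name, fold in fold_higher_in_black.items():
    print(f"{name}: about {fold:.0f}-fold higher in black fruit")
print(diff_contig9, diff_contig5)
```

Under this assumption the three transcripts are roughly 275-fold, 4,400-fold and 172-fold more abundant in black fruit, and the length differences of 54 bp and 843 bp agree with the values reported for Figure 4(d).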

Discussion
In this work, transcriptome sequencing was used to compare black and purple fruits of Lycium ruthenicum and to evaluate differences in anthocyanin biosynthesis genes. Comparing the expression levels of structural genes and transcription factors could ultimately help to explain the main molecular reasons for the differences in anthocyanin accumulation between black and purple fruits.
Many transcriptome studies of Lycium ruthenicum have now been reported. When the transcriptomes of black and red fruit (Nq7) were compared, 31 genes related to anthocyanin biosynthesis were screened; the pathway was not activated in red fruit, and the expression levels of all structural genes were very low. In this study, although the expression of structural genes in purple fruit was far weaker than that in black fruit, CHS (CL9678.Contig3_All), F3H (CL2922.Contig1_All), F3'H (CL21824.Contig1_All), F3'5'H (CL7784.Contig7_All), DFR (CL6466.Contig1_All) and ANS (CL12997.Contig4_All) were all highly expressed, and UF3GT (CL18450.Contig1_All), AT1 (Unigene60992_All) and GST (Unigene21020_All), which act during anthocyanin transport, were transcribed in large amounts in purple fruit, which shows that the anthocyanin pathway in purple fruit is also activated.
Zong et al. [30] isolated the key MYB transcription factors LrAN2 and LbAN2 from black (LMH1) and red (Nq7) fruits, respectively. Transgenic overexpression showed that LrAN2 induced anthocyanins in tobacco more strongly than LbAN2. Population analysis of Lycium ruthenicum confirmed that allelic variation of the AN2 gene in the promoter and genome of red and black fruits was closely related to the formation of fruit color. Interestingly, the four AN2 homologous genes screened in this comparison have higher expression levels in purple fruit than in black fruit; in particular, CL1622.Contig2_All has FPKM values of 5768 and 849 in purple fruit and black fruit, respectively. Because LrAN2p is longer in purple fruit, the 136-bp insertion leads to stronger expression of LrAN2p, and the abundant transcription of the AN2 gene confirms that this gene is involved in activating anthocyanin biosynthesis. However, this cannot explain the low anthocyanin content in purple fruit.
The expression levels of bHLHs in red and black fruits were different, but not significantly so. In this study, however, with purple fruit as the reference, the bHLHs differed significantly. Among the five transcripts screened in this analysis, the ORFs of CL8159.Contig6_All and CL8159.Contig7_All were too short to encode an effective bHLH transcription factor, whereas the three different copies CL8159.Contig1_All, CL8159.Contig5_All and CL8159.Contig9_All were abundantly transcribed only in black fruit and almost not expressed in purple fruit, especially CL8159.Contig5_All. Tang B et al. [31,32] found in the phylogenetic tree that these three copies were most closely related to the three copies of AN1b in pepper, and it is the specific expression of these three copies in pepper that determines the difference in anthocyanin accumulation in the pepper epidermis and thus affects pepper color. The interaction between LrAN2 and LrAN1b has been verified by the yeast two-hybrid method, but the key reason why a large amount of anthocyanins cannot be formed in white fruit has not been explained. Therefore, it is very likely that a weaker functional MBW (LrAN2p+LrAN1bp+WD40) complex in purple fruit moderately activates anthocyanin biosynthesis there, while in black fruit MYB and bHLH are both abundantly transcribed, so the anthocyanin biosynthesis pathway is strongly activated. Zong et al. [22] demonstrated that the expression of TaMYB7D in purple and white wheat pericarp did not differ between the two colours of grains, while deletion of the repeat in the upstream promoter of the bHLH transcription factor TaMYC1 prevented TaMYC1w from forming a functional complex with TaMYB7D in white grains. The main reason for the lack of AN1b expression in purple fruit may be that the upstream promoter of CL8159.Contig5_All inhibits its expression in purple fruit.
Subsequent studies will focus on the functional differences of LrAN1p promoters.

Conclusions
This study confirmed that there was obvious anthocyanin accumulation in purple fruits, but the expression of structural genes in the anthocyanin pathway was much weaker than in black fruits. LrAN2p was also highly expressed in purple fruits, while three bHLH transcription factors were highly expressed only in black fruits: CL8159.Contig1_All (2022 bp), CL8159.Contig5_All (1179 bp) and CL8159.Contig9_All (1968 bp). This confirmed that the formation of high-anthocyanin fruit in Lycium ruthenicum requires not only the specific expression of LrAN2 or LrAN2p, but also the coordinated regulation of bHLH transcription factors and, finally, the formation of a highly functional MBW complex that activates high-level anthocyanin biosynthesis in the black fruit of Lycium ruthenicum.

Disclosure statement
The authors have declared that no competing interests exist.

Data availability statement
The transcriptomic data (black and purple fruits) have been uploaded to NCBI (https://www.ncbi.nlm.nih.gov/bioproject/807882), Submission ID: SUB11096071; BioProject ID: PRJNA807882. All data generated or analyzed during this study are included.