Digital gene expression analysis during floral transition in pak choi (Brassica rapa subsp. chinensis)

Digital gene expression profiling technology was used to examine the changes in gene expression during floral transition in pak choi. A total of 1486 differentially expressed genes (DEGs) during floral transition were identified, of which 505 were upregulated and 981 were downregulated. Gene ontology enrichment analysis showed that the proteins encoded by DEGs were mainly located in eight cell regions including the apoplast, plant-type cell wall, chloroplast, etc. They had eight kinds of molecular functions such as transcription factor and oxidoreductase, and involved in 72 biological processes containing jasmonic acid and salicylic acid metabolism, glucose metabolism, the MAPK cascade, etc. Further pathway enrichment analysis showed that four metabolic pathways were significantly enriched by DEGs. DEGs that exhibited at least a 10-fold difference were analyzed in detail. Results showed that flavonoid metabolism and phenylpropanoid biosynthesis (or pathways) were significantly enriched and the corresponding gene number was the highest. Moreover, 15 genes among the highly expressed genes were involved in reproductive development, of which five homologues in Arabidopsis thaliana were directly related to floral transition, which were predicted to have a similar function in pak choi.


Introduction
The transition from vegetative growth to reproductive growth is not only relevant in the perpetuation of species, but also related closely to human life; this transition in higher plants is regulated by endogenous and exogenous signals [1][2][3]. With the completion of whole genome sequencing in the model plants Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa), the molecular mechanisms of flowering were also investigated, and this endeavour has achieved significant progress [4]. New genes were identified and intensively characterized [5][6][7], and new regulation pathways were discovered in succession. Wang et al. [8] revealed that the microRNA156-targeted SQUAMOSA PROMOTER BINDING PROTEIN-LIKE (SPL) genes not only act downstream of FT/ FD, but also define a separate endogenous flowering pathway. Currently, at least six pathways that control flowering have been identified and they are vernalization pathway, photoperiod pathway, autonomous pathway, gibberellin pathway, ambient temperature pathway, and ageing pathway [1,9].
Pak choi (Brassica rapa subsp. chinensis) belongs to the Brassicaceae family, and it is one of the more cultivated and consumed vegetables in China and other Asian countries. Pak choi was recently introduced in North America and Europe. The recent completion of whole genome sequencing and the constant enrichment of information on gene annotation in Chinese cabbage, which is the representative of the Brassica species [10,11], facilitates the gene comparison and screening for expression profile analysis of pak choi and provides an opportunity for genetic improvement and molecular design breeding among Brassica species. Although some studies on flower molecular regulation in Brassica species have been conducted [12][13][14][15][16], information is still lacking compared with the flowering molecular mechanism of the model plant A. thaliana. Therefore, further studies on the flowering mechanism of pak choi are necessary in B. rapa, as well as in breeding within Brassicaceae.
Digital gene expression (DGE) profiling technology is based on high-throughput sequencing technology and it is an effective method for the systematic investigation of the overall differences in gene expression at the transcriptome level. Given its high sensitivity and ability to detect splicing isoforms and somatic mutations, DGE has been widely used in gene expression profiling and pathway studies [17]. In addition, because DGE does not require the genome reference sequence, this technology has also been applied to decode the genomes of several non-model organisms, thereby providing valuable information that elucidates gene functions in these organisms [18,19].
In the present study, DGE profiling was performed using Illumina sequencing technology to examine gene expression profile changes in the shoot apex before and after flower bud differentiation in pak choi. This study aimed to screen the differentially expressed genes (DEGs) and identify the pathways involved during floral transition in pak choi. The results may be used as a basis to improve the flowering molecular mechanism of B. rapa vegetables.

Plant material and cultivation management
Pak choi variety '75 #' is an easy-bolting inbred line provided by the Institute of Vegetables Research, Shanxi Academy of Agricultural Sciences.
Seeds were rinsed and soaked in water at 55 C for 10 min and kept for 2 h at room temperature. The seeds were placed in Petri dishes with double filter paper on the bottom and then kept in a growth chamber at 25 C and 16 h/d light until the radicle emerged through the seed coat. Thereafter, the seeds were placed in the refrigerator at 4 C for 20 d in the dark. The filter paper was kept moist during the entire period to ensure seed germination [20]. After treatment, the seedlings were planted in the 50-hole pot filled with sphagnum peat/ vermiculite substrate mixture (3:1, v/v), and standard agricultural practices were applied throughout the experiment.

Sampling
Flower bud differentiation during plant growth was graded based on B. rapa subsp. Pekinensis [21] and Arabidopsis [22]. On the 16th day after planting, the shoot apex started to differentiate into the flower bud (stage 1). Each apex was accurately determined to be in stage 1 using a stereomicroscope (OLYMPUS SZX16), the apex of six plants were taken and mixed into approximately 0.15 g of sample and were quickly frozen in liquid nitrogen, and then stored in ¡80 C until RNA extraction. Plant apices were labelled as flower bud differentiation stage 1 (stage 1). The sample obtained on 15th day (when the plant had about seven true leaves and when the flower bud did not yet start to differentiate) was set as the control (CK).

Total RNA extraction and gene expression profile sequencing
Total RNA was extracted using QIAGEN RNeasy Plant Mini Kit (QIAGEN, 74903, Germany) with DNaseI according to the suggested protocol. The integrity of total RNA was verified by running the samples on 1% agarose gel. The quality and quantity of the purified RNA were determined by measuring the absorbance at 260/280 nm (A260/A280) using a NANODROP 2000c spectrophotometer (Thermo Scientific, USA). High-throughput sequencing was performed by Biomarker Technologies (Beijing, China). Total RNA (6 mg) was enriched by binding to oligo(dT) magnetic adsorption beads. The mRNA was fragmented by adding fragmentation buffer. The RNA fragments then served as a template for first-strand cDNA synthesis using random hexamers and reverse transcriptase. Second-strand cDNA was synthesized using DNA polymerase I, dNTPs, and RNaseH. Secondstrand DNA was then purified using a QiaQuick PCR extraction kit. The double-stranded cDNA was endrepaired, and a single 'A' base was added to the 3 0 -end of the cDNA fragments. Solexa adapters were then ligated to the ends of the cDNA fragments; cDNA fragments of a suitable length were obtained by agarose gel electrophoresis and amplified by PCR to construct the final cDNA libraries using Illumina HiSeq TM 2500 for Solexa sequencing.

Analysis of gene expression and functional annotation
The raw sequences of the two datasets were evaluated. Raw reads were processed by remove reads containing adaptors and reads with more than 10% unknown bases. Low-quality reads were filtered out that each reads with a quality score of less than 10 was not more than half of the bases, and the clean reads consist of more than 80% base-pair with Q-value 30. The clean reads were aligned with TopHat software [23], and then were mapped with the reference gene sequence (http://brassi cadb.org/brad/), allowing for up to 1 bp mismatch only. The mapped reads were classified as annotated. The false discovery rate (FDR) 0.01 and fold change (FC) 2 were used to determine the threshold of significant DEGs.
Gene function and pathway involved in were analyzed using the gene ontology (GO) (http://www.geneontology. org/) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/) databases, and the genes upregulated and downregulated by at least 10-fold were analyzed in detail.

Quantitative real-time PCR
To validate the DGE results, we assessed two randomly selected genes' relative expression levels by quantitative real-time PCR (qPCR) analysis. For reverse transcription, the concentration of each RNA sample was adjusted to 500 ng/mL. Single-stranded cDNA was synthesized using PrimeScript® RT Reagent Kit for Perfect Real Time (TaKaRa, DRR037A, Japan) following the manufacturer's instructions. Specific primers used for qPCR were designed with the online Primer 3 software, and the pak choi ACTIN gene was chosen as an internal reference gene. qPCR was carried out using SYBR® Premix Ex TaqTM (Tli RNaseH Plus) (TaKaRa, DRR820A, Japan) in a 25-mL reaction volume. The reaction mixture included 20 ng cDNA (2 mL), 8.5 mL ddH 2 O, 12.5 mL SYBR and 1 mL of each of the forward and reverse primers (10 mmol/L). The program was as follows: 30 s at 95 C, followed by 40 PCR cycles of 5 s at 95 C, 15 s at 55 C and 20 s at 72 C. The relative levels of gene expression were calculated using the comparative 2 ¡DCT method.

Data access
The transcriptome sequencing data from this study have been deposited in the NCBI SRA database and are accessible through accession numbers SRP066286 (http:// www.ncbi.nlm.nih.gov/sra).

Sequencing quality evaluation
Read analysis showed that the raw reads were transformed into clean reads after removing the low-quality reads. The proportions of the clean reads in control and stage 1 sample were 99.90% and 99.96%, respectively, indicating that the quality of the cDNA library and sequences was high.
Considering the differences in genetic background and limitation of the annotated information, approximately 85.08% of the clean reads could be mapped to the reference sequences in the control sample (Table 1), while percentage in stage 1 was approximately 84.25%. Most of them (62.58% of control and 63.50% of stage 1) perfectly matched the genome. The mismatches between all annotation reads and the genome were nearly less than 2 bp. In addition, multiple reads mapped to similar genes was low, and most of the annotated genes were only a single read (93.79% of control, 93.83% of stage 1). These results indicated that the homology between pak choi and the reference genome was relatively high, and the selected reference genome was good, which could meet the needs of subsequent analyses.
Gene expression profile analysis before and after flower bud differentiation A total of 41,174 reference genes were selected for DEGs analysis. Reads that could be annotated produced 41,020 effective reference genes, accounting for 99.87% of the total number of reference genes. FDR value 0.001 and FC 2 were used as a threshold to identify significant DEGs. The number of DEGs between the stage 1 group and control group was 1486, of which 505 were upregulated and 981 were downregulated, and the number of downregulated genes was significantly higher than that of the upregulated genes ( Figure 1).
Gene expression fold changes were highly uneven ( Figure 2). A total of 1486 genes showed at least two-fold changes. Fold changes were mainly two-to five-folds,  thereby accounting for 88.1% (74.3%) of the total number of DEGs. Five-to ten-fold changes in DEGs accounted for 9.11% (16.0%). Twelve (81%) DEGs exhibited 10-to 30-fold changes, accounting for 2.38% (8.26%). The DEGs with expression of 30-fold or greater accounted for only 1.08%, of which only 2 were upregulated and 14 were downregulated at stage 1. These results showed that the number of DEGs with significant difference was low, and it significantly decreased with differences increasing during floral transition in pak choi. Thus, floral transition did not affect significantly most genes' expression levels, which suggested that the function of DEGs with minor differences could not be ignored in analysis. Among the terms listed in cellular components, the apoplast (GO: 0048046) was the most enriched term, followed by plant-type cell wall (GO: 0009505) (Figure 3). The apoplast is the major pathway for nutrient transport, storage, and activation. The apoplast and plant-type cell wall do not only play a key role in intercellular communication and interactions, but the apoplast can also determine cell fate during cell differentiation [24]. In addition, the chloroplast thylakoid lumen and thylakoid membrane were shown to be enriched regions, both of which are sites for food production and energy conversion. Cellular localization analysis was also performed on 109 genes that were either upregulated or downregulated by at least 10-fold. These genes corresponded to the chloroplast thylakoid membrane and apoplast, indicating that plants make large amounts of nutrients through energy conversion and transport them in vivo. This result further explained that floral transition by itself is a complex process and requires large amounts of energy and nutrients.

GO functional analysis of DEGs
In molecular function categories, the sequence-specific DNA binding transcription factor activity (GO: 0003700) was the most enriched term. Upon comparative analysis of the upregulated and downregulated genes of at least 10-fold, 14 genes were classified into nine classes of transcription factors. The MYB transcription factors, including Bra037419, Bra017813, and Bra000453, were expressed the most, demonstrating expression changes of at least 20-fold. The MYB transcription factor family, one of the largest families of transcription factors in plants, regulates plant growth and metabolism, such as cell morphology and pattern formation, secondary metabolism, and biological and abiotic stress responses [25]. The MYB transcription factor also played role in regulating anther development and pollen formation [26]. Another enriched items, phenylalanine ammonia-lyase (PAL), participate in polyphenol (e.g. flavonoid and phenylpropanoid) synthesis, whereas, adenylyl-sulfate reductase (glutathione), phosphoadenylyl-sulfate reductase (thioredoxin), adenylyl-sulfate reductase, and peptide-methionine (S)-S-oxide reductase are all categorized as reductant-oxidant enzymes. Thus, plants require high levels of energy during floral transition, and this finding was similar to that for enriched terms for cellular components.
In terms of biological process, the term JA-mediated signalling pathway (GO: 0009867) showed the largest number of annotated genes. High amounts of JA, which plays an important role in the formation of flowers, fruits, and seeds, are found in the reproductive organs of plants; many genes responding to JA exhibit high expression levels in flower and fruit development [27]. The present study found a large number of annotated genes for the JA-mediated signalling pathway, indicating that JA may be involved in reproductive organ development during floral transition in pak choi.
In the present study, the MAPK cascade contained 10 genes with at least 10-fold DEGs. The MAPK cascade is a protein kinase that is also known as protein phosphorylase. The MAPK cascade is activated by various biological and abiotic stresses, and it is also involved in hormone signal transduction and plant growth and development [28]. In this research, the MAPK cascade was found to be involved in floral transition in pak choi.
The term response to sucrose was significantly enriched in the biological process category, and it contained 17 genes with at least 10-fold DEGs. Sugar, which is the main product of photosynthesis, is essential in regulating plant metabolism and development, such as germination, flowering, ageing, and stress reaction. Sucrose significantly promotes flowering, for example, pineapple and rudbeckia stem tips accumulate large amounts of sucrose during floral transition [29,30]. The experimental results showed that sugar plays a key role in floral transition in pak choi.
SA biosynthesis was significantly enriched in the biological process category, and it contained 10 genes with at least 10-fold DEGs. SA is a simple phenolic compound in plants. SA is a floral induction factor during flowering. Low SA concentrations promote flower bud differentiation, whereas high SA concentrations inhibit flower bud differentiation [31]. Treatment with a certain SA concentration suppresses the growth, delays the onset of flowering, reduces the number of flowers, and affects the consistency and corm plumpness of Freesia [32]. Moreover, low SA concentration promotes flower bud differentiation in cucumber [33]. These results suggested that SA may affect the flowering of pak choi. However, the mechanism of the effects of SA on plant flowering remains to be further studied.

KEGG pathway analysis
To identify the biological pathways involved in floral transition, 1486 genes were mapped to the referenced and canonical pathway in KEGG. A total of 88 KEGG pathways were assigned, of which 19 pathways showed significant differences between the control and stage 1 groups (p < 0.05) ( Table 2). However, at a corrected pvalue < 0.05, only four of these pathways were significant, namely, nitrogen metabolism, flavonoid biosynthesis, phenylpropanoid biosynthesis, and tryptophan metabolism.
A total of 109 genes that were either upregulated or downregulated by at least 10-fold were used in pathway enrichment analysis. The highest number of enriched genes was involved in the flavonoid biosynthetic and phenylpropanoid biosynthetic pathways. This result was consistent with DEGs significantly enriched in the GO biological process 'flavonoid biosynthetic process' and GO molecular function 'phenylalanine ammonia-lyase activity.' It suggested that flavonoids and phenylpropanoids played an important role in floral transition in pak choi. Phenylpropanoid metabolism involves a complex series of branched biochemical reactions. This process produces numerous phenolic substances in plants, and its metabolic branches vary among plant species; one of its main branches produces flavonoid compounds, including flavonoid alcohol, anthocyanin, and tannin [34]. Flavonoids are the main pigments in flowers, fruits, and seeds; they also protect plants against UV, prevent pathogenic microorganisms invasion, increase crop yield, adjust auxin transport, promote pollen germination, and are directly related to pollen fertility [35][36][37][38][39].
In this study, nitrogen metabolism was the first significant enrichment pathway, which had two genes with at least 10-fold change in expression. Nitrogen metabolism is one of the basic physiological processes in plants. Plants generally absorb inorganic nitrogen compounds, such as ammonia salt or nitrate. Nitrate reductase and ammonia salts are directly involved in amino acid synthesis and transformation. The amino acid is then used as the main substrate in protein synthesis in the cell, and it becomes an important part of the plant, following protein modification, classification, transport, storage, and other relevant processes. Nitrogen affects the flowering of Boronia and Alive [40,41]. Zhang [42] also reported that nitrogen nutrition can adjust gene expression for the induction of flowering in A. thaliana, resulting in changes in floral induction. This result again showed the importance of nitrogen metabolism in floral transition in pak choi.
In the present study, tryptophan metabolism was the fourth among the KEGG pathways, showing significant enrichment. Therefore, tryptophan metabolism may play an important role in floral transition. It is reported that tryptophan metabolism not only provides tryptophan, but also provides a large number of metabolic precursors for the synthesis of various secondary metabolites, such as IAA, phytoalexin, and alkaloids derived from indole and anthranilic acid. Tryptophan biosynthesis also plays a direct regulatory role on plant growth and development, pathogen defense response, and plant-insect interactions [43]. Tryptophan was recently found to promote flowering in Pharbitis nil [44].
Three genes with at least 10-fold change in expression were found in the diterpenoid biosynthesis pathways. Terpenoids are one of the main floral components, and flower fragrance plays an important role in plant reproduction [45]. Terpenoids do not only render fragrance to flowers, but also perform many other physiological and ecological functions, such as attracting pollinators, regulating plant growth and development, adjusting plant heat resistance and resisting photooxidation stress [46]. This study suggested that terpenoids are synthesized during floral transition in pak choi to prepare for flower formation and protect against bad environmental conditions. DEGs related to reproductive growth during floral transition A total of 15 genes among the highly expressed (FC >10-fold) 109 genes (Table 3) were speculated to be related to reproductive growth. Bra022565 positively regulated flowering compared with the remaining 14 genes, which displayed downregulated expression levels during floral transition. By comparing these 15 genes with the floral genes published in reference genome, only Bra002788 has been known. Bra002788 was involved in the gibberellin pathway, negatively regulated flowering, exhibited downregulated expression during floral transition in pak choi, and performed a function that was consistent with that of the reference genome. A comparative analysis of the remaining 14 genes from A. thaliana showed that Bra022565 was upregulated by 11.77-fold. Its homologue in A. thaliana encodes GA20OX2, which belongs to the GA pathways and catalyzes several steps in the biosynthesis of GA by oxidizing a number of precursors, thereby promoting flowering. In addition, the homologue of Bra034674 in A. thaliana encodes a peroxisomal catalase, participates in photoperiodic response, and was highly expressed in bolts and leaves, and shows a circadian pattern in mRNA expression, in which mRNA levels are high in the early morning. The Bra027530 was speculated to suppress the meristem transition from vegetative to reproductive growth, and its homologue in A. thaliana encodes a member of the NAP subfamily of ABC transporters and is involved in anthocyanin and flavonoid biosynthesis, but without similar functions of switching vegetative growth to reproductive growth. Furthermore, the Bra003998 homologue in A. thaliana encodes a member of the NAC transcription factor gene family and regulates flower development and leaf senescence; this homologue is also expressed in floral primordia and is upregulated by AP3 and PI, its expression is also associated with leaf senescence. The Bra031255 homologue is a member of the zinc finger (B-box type) family protein, and it is involved in the light signalling pathway in A. thaliana and regulates flower development. The Bra004109 homologue in A. thaliana is involved in the formation of pollen exine, and it is associated with anthocyanin and flavonoid biosynthesis. The Bra033350 and Bra026606 homologues in A. thaliana are involved in embryonic development. Bra016250 is presumably related to postembryonic development in pak choi, although its homologue is not functional in A. thaliana. The homologues of Bra036883, Bra033911, Bra001886, and Bra023798 in A. thaliana all encode early light-inducible protein and positively control seed germination. Furthermore, Bra012600 positively regulates development. These findings indicated that the homologues of Bra022565, Bra034674, Bra027530, Bra003998, and Bra031255 in A. thaliana were directly related to floral transition, which predicted that these genes may have similar functions in pak choi and need to be investigated further.

Validation of DEGs by quantitative real-time PCR
To validate the DGE data, gene Bra028599 and Bra029305 were selected for DGE analysis during floral transition; the results are shown in Figure 4. The results of qPCR showed these genes had the concordant direction of expression change with DGE, indicating that the results from DGE are believable.

Conclusions
Using DGE profiling technology, 1486 DEGs were identified during floral transition in pak choi. Among these DEGs, 505 were upregulated, whereas 981 were downregulated. The expression change fold of these genes was mostly 2-5. The proteins encoded by DEGs were mainly located in the apoplast, cell wall, chloroplast thylakoid and other parts, and they were widely involved in biological macromolecular metabolism, such as protein and sugar metabolism, hormone metabolism, secondary metabolism, and energy metabolism. The flavonoid metabolic pathway and phenylpropanoid synthesis pathway were mainly involved during floral transition. Fifteen genes (FC 10-fold) were involved in reproductive growth, among which the homologues of Bra022565, Bra027530, Bra034674, Bra003998, and Bra031255 in A. thaliana were directly related to flowering transition. It suggested that these genes may be directly related to flowering in pak choi. Functional analysis and validation of these genes are necessary so as to lay the foundation for further molecular research on the mechanism of floral transition in pak choi.