De novo transcriptome sequencing of genome analysis provides insights into Solidago canadensis invasive capability via photosynthesis

ABSTRACT Solidago canadensis is one of the most destructive invasive plants in the East of China, yet nothing is known about its photosynthetic ability at the molecular level. In order to examine the mechanism aiding invasion from the photosynthetic and molecular perspective, transcriptome sequencing and de novo assembly of S. canadensis and its congener S. decurrens (native, non-invasive species) were compared in a bioassay experiment. Results showed that S. canadensis has higher biomass and net photosynthetic rates than those of the native plant, S. decurrens (P < 0.05). Based on RNA-seq data, the genes which are closely related to light absorption and electron transfer in S. canadensis were observed to be up-regulated compared with those from S. decurrens, including the process of photosystems I and II, such as PsbP, PsbQ, Psb27, PsbW, PsaG, PetE, and light-harvesting complexes (LHCs). The RT-qPCR verification results of key genes were similar to those from transcriptome. Therefore, our results might partly explain the successful invasion of S. canadensis in China at the level of transcriptome, leading to its enhanced photosynthetic capability.


Introduction
Solidago canadensis (Asteraceae) is a perennial clonal herb, originating from North America. It has been intentionally introduced into New Zealand, Australia, Europe and parts of Asia, and especially in the East of China (Li and Xie 2002;Dong et al. 2006). S. canadensis was first introduced into Jiangsu province, China in 1913 as an ornamental plant (Jin et al. 2004), and in the next 50 years, this species rapidly expanded into all types of land areas in the East of China and also the Central of China (Zhao et al. 2015). Currently, S. canadensis is abundant in many natural habitats, including roadsides, farmlands, woodlands, and city suburbs (Li and Xie 2002), presenting a huge threat to the biodiversity and abundance of the native vegetation.
Compared with its native congeneric species, S. canadensis can be adapted to many extreme environments, such as drought stress, salt stress and high temperature, which enables it to demonstrate high invasion capability (Hu et al. 2007). Previous studies have shown that there were several factors resulting in its successful invasion. Firstly, the potential capability of rapid growth and prolific reproduction (both sexual and asexual) clearly contribute to its successful invasion (Xu et al. 2006;Yang et al. 2007). Secondly, there is a strong allelopathy effect from S. canadensis roots and rhizomes, which can inhibit the germination and seedling growth of native plant species (Zhang et al. 2014). Thirdly, soil microbes, such as arbuscular mycorrhizal fungi, could enhance the invasion of S. canadensis into new habitats, with a cumulative effect of AM fungi on the species over time (Jin et al. 2004). As an abiotic factor, light also could affect the competitive ability, growth, or stress-resistant ability of the invasive plant (Björn et al. 2016). However, there are still several unanswered questions about the light adaptation ability of S. canadensis.
Molecular tools and transcriptome sequences are costeffective methods to support research on environmental responses in plants (Müller et al. 2012), and the RNA-seq method provides a direct and efficient way to compare the complex mechanism of photosynthesis (Xing et al. 2016). Thomas et al. (2016) demonstrated the generation and assembly of RNA sequences to explain the responses to environmental change in green ash (Fraxinus pennsylvanica), and emerald ash borer attack, which is a primary threaten to green ash trees. Yu et al. (2016) provided a solid foundation for identifying taproot thickening-related critical genes (RsSPS1 and RsSuSy1) based on root transcriptome sequencing in Raphanus sativus. Furthermore, in order to explain the invasive mechanism of Ipomoea cairica, an invasive plant in southern China, Björn et al. (2016) sequenced and assembled the de novo transcriptomes from I. cairica and two related species (I. digitata and I. nil), and showed how some genes related to enzyme production (pal, 4cl, cad, chs, and chi) in secondary metabolism played important roles in stress-resistant, growth and allelopathy processes.
Based on the above, this study was carried out to perform transcriptome sequencing and de novo assembly on two species, the invasive plant S. canadensis and the co-occurring native plant Solidago decurrens in order to: (1) compare the transcriptomes relating to photosynthesis to examine their role in invasion ability; (2) provide information for further functional and comparative genomic research about the invasive plant.

Photosynthetic analysis
The photosynthetic parameters of the two species were tested for prior to destructive sample collection. Chlorophyll fluorescence measurements (F v /F m ) were obtained by recording fluorescence induction curves with a Dual-PAM-100 measuring system (Walz GmbH, Effeltrich, Germany). Before determination, the leaves were kept in clip curettes for dark adaptation for 30 min. The fluorescence parameters F 0 (initial fluorescence level) and F m (maximum fluorescence level) were measured and the ratio F v /F m (where F v = F m -F 0 ) was determined according to Rosenqvist & van Kooten's method (2003). Rapid light curve (α value) was also measured with Dual-PAM-100.
To characterize light-induced shifts in carbon acquisition, instantaneous gas exchanges were measured on fully expanded leaves, the photosynthetic rates were determined using the Portable Gas Exchange Fluorescence System GFS-3000 (Walz GmbH, Effeltrich, Germany). Two leaves of each replicate from the same site of both S. canadensis and S. decurrens were selected for sampling. The mean value of two leaves was used as the replicate for statistical analysis. All the three replicates of each plant were determined. All of the measurements were completed from 08:00 to 15:30 in 27 August 2016.
After testing and collection, each plant was washed and collected, including all the shoots and roots, and then oven dried for the measurement of shoots biomass and roots biomass (65°C for 48 h).

RNA extraction
RNA from the leaves of both S. canadensis and S. decurrens was collected. Total RNA was extracted from the tissues using TRIzol® Reagent according to the manufacturer's instructions (Invitrogen, Shanghai, China) and genomic DNA was removed using DNase I (Takara, Dalian, China).

Library preparation and Illumina Hiseq 4000 sequencing
Firstly, the RNA-seq transcriptome library was prepared following TruSeq TM RNA sample preparation Kit from Illumina (San Diego, CA, USA). Messenger RNA was isolated according to polyA selection method by oligo (dT) beads and then fragmented by fragmentation buffer. Secondly, doublestranded cDNA was synthesized using a SuperScript double-stranded cDNA synthesis kit (Invitrogen, CA, USA) with random hexamer primers (Illumina). Then the synthesized cDNA was subjected to end-repair, phosphorylation and 'A' base addition according to Illumina's library construction protocol. Libraries were size selected for cDNA target fragments of 200-300 bp on 2% Low Range Ultra Agarose (LRUA) followed by PCR amplified using Phusion DNA polymerase (NEB, Beijing, China) for 15 PCR cycles. After quantified by TBS380 (YPH-Bio, Beijing, China), pairedend RNA-seq sequencing library was sequenced with the Illumina HiSeq 4000 (2×150 bp read length).

De novo assembly and annotation
The raw paired end reads were trimmed and quality controlled by SeqPrep (https://github.com/jstjohn/SeqPrep) and Sickle (https://github.com/najoshi/sickle) with default parameters. Next, clean data from the samples were used to do de novo assembly with Trinity (http://trinityrnaseq. sourceforge.net/) (Grabherr et al. 2011). All the assembled transcripts were searched against the NCBI protein nonredundant (NR), String, and KEGG databases using BLASTX to identify the proteins that had the highest sequence similarity with the given transcripts to retrieve their function annotations and a typical cut-off e-value less than 1.0×10 −5 was set. BLAST2GO (http://www.blast2go.com/ b2ghome) program was used to get GO annotations of unique assembled transcripts for describing biological processes, molecular functions and cellular components (Conesa et al. 2005). Metabolic pathway analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/) (Goto and Kanehisa 2000).

Detection of homologous genes
OrthoMCL v2.0.2 was used to identify gene family clusters in S. canadensis and S. decurrens (Li et al. 2003). Pairwise sequence similarities between all input protein sequences were calculated by all-by-all BLASTp with an e-value cutoff of 1e-05 and a minimum match length of 50%. To define ortholog cluster structure, a Markov clustering algorithm was applied with an inflation value (-I) of 1.8 (default value in OrthoMCL) (Kim et al. 2017). Putative splice variants were removed from the data set, the longest protein sequences were kept and subsequently filtered for premature stop codons and incompatible sequences.

Differential expression analysis and functional enrichment
To identify DEGs (differential expression genes) between two different samples, the expression level of each transcript was calculated according to the fragments per kilobase of exon per million mapped reads (FRKM) method. RSEM (http://deweylab.biostat.wisc.edu/rsem/) was used to quantify gene abundances (Li and Dewey 2011). The R statistical package edgeR (Empirical analysis of Digital Gene Expression in R, http://www.bioconductor.org/packages/2. 12/bioc/html/edgeR.html) was utilized for differential expression analysis (Robinson et al. 2010). In addition, functional -enrichment analysis including GO and KEGG was performed to identify which DEGs were significantly enriched in GO terms and metabolic pathways at Bonferroni-corrected P-value ≤0.05 compared with the whole transcriptome background. GO functional enrichment and KEGG pathway analysis were carried out by Goatools (https://github.com/tanghaibao/Goatools) and KOBAS (http://kobas.cbi.pku.edu.cn/home.do) (Xie et al. 2011).

Quantitative real-time PCR analysis
Gene-specific RT-qPCR primers were described by Afonina et al. (2007), which are shown in Table S1. The cDNA was synthesized from the two species with total RNA in a 10-μL reaction system using PrimeScript II RTase (TaKaRa Biotechnology, Dalian, China). RT-qPCR was performed in ABI Step-One Real-Time thermal cycler (USA) using power SYBR Green qPCR master mix (ABI, UK) following manufacturer's instructions. PCR amplification was carried out with initial denaturation of 94°C for 10 min, 40 cycles each of 30s at 95°C, 30s at respective annealing temperatures and extension at 72°C for 30 s. All reactions were performed in three replicates. The fold change in expression was calculated and transformed to log2 scale.

Statistical analysis
All the datasets were checked for normality prior to further analysis, and none of them were significantly different from a normal distribution, so no data transformation was required. Means of graphical representation showing chlorophyll fluorescence, fluorescence induction curves, rapid light curve, biomass, the transcriptome and RT-qPCR expression profiles of 14 selected photosynthetic genes were calculated according to the data using SPSS (16.0). Student's t-test was used to compare the means of graphical representation showing chlorophyll fluorescence, fluorescence induction curves, rapid light curve, biomass, the transcriptome and RT-qPCR expression profiles of 14 selected photosynthetic genes. The results were expressed as mean ± SE, and the statistical significance was accepted when P < 0.05.

Photosynthetic and growth data
Based on morphological observations, the invasive species, S. canadensis, showed a potential clear competitive advantage over its congener, S. decurrens (Figure 1(A, B)). In addition, the photosynthetic parameters of the invasive species were significantly higher than those of the native species (P < 0.05), such as chlorophyll fluorescence (F v /F m ), fluorescence induction curves (α value) and rapid light curve (Figure 1 (C-E)). Results of the biomass were the same to photosynthetic data (Figure 1(F)).
To better establish links and assign a high-level function of genes in the genome, KEGG analyses were performed. 49,019 (38.3%) and 51,100 (40.0%) transcripts were successfully mapped to 385 and 390 metabolic pathways, respectively, broadly classified under five categories (A, Metabolism; B, Genetic Information Processing; C, Environmental Information Processing; D, Cellular Processes; E, Organismal Systems) (Fig. S3 for S. canadensis, Fig. S4 for S. decurrens).

Pathway enrichment analysis of DEGs in photosynthesis of S. canadensis and S. decurrens
Based on the maximum number of differentially expressed transcripts, 1089 major genes (286 metabolic pathways) were found to be potentially involved in five major categories: (A) Metabolism, (B) Genetic Information Processing, (C) Environmental Information Processing, (D) Cellular Processes, (E) Organismal Systems. To categorize the biological functions of the DEGs of photosynthesis, a KEGG pathway enrichment analysis was performed. Twelve genes were found to be involved in photosynthesis (ko00195) from energy metabolism of metabolism (Figure 2(A)), and eight genes were found to be involved in photosynthesis-antenna proteins (ko00196) from energy metabolism of metabolism (Figure 2(B)). Our results showed that photosystem I genes (PsaE, PsaG, PsaK), photosystem II genes (PsbP, PsbQ, PsbW, PsbY, Psb27), photosynthethic electron transport genes (PetE, PetF, PetH) and F-ATPase delta to be up-regulated in photosynthesis (ko00195) of S. canadensis (Figure 3). Genes of light-harvesting chorophyll protein complex (LHC) family (Lhca1,Lhca2,Lhca4,Lhcb1,Lhcb3,Lhcb4,Lhcb6,Lhcb7) were also found to be up-regulated in photosynthesis-antenna proteins (ko00196) of S. canadensis (Figure 3).

Validation of RNA-Seq expression data by RT-qPCR
To verify the reliability of RNA-Seq differential expression analyses, quantitative realtime PCR was utilized to explore  20 selected genes between S. canadensis and S. decurrens in photosynthesis ( Figure 4). Finally, 14 selected genes were successfully validated by RT-qPCR. The 14 genes which belonged to photosynthesis (ko00195) and photosynthesisantenna proteins (ko00196) were found to be up-regulated in S. canadensis. Furthermore, the majority of the genes expressed were similar to those found with RNA-Seq analysis.

Discussion
Photosystem II (PSII) catalyzes the initial step of photosynthesis, the light-dependent oxidation of water that yields molecular oxygen and reduced plastoquinone (Hou et al. 2015). It is the essential process in photosynthesis. The photosynthetic data (ratio F v /F m ) is proportional to the potential maximal quantum yield of PSII, and the significant difference between S. canadensis and S. decurrens (Figure 1(C)) might be attributed to the efficiency of PSII that is generally related to photoinhibition (Krause and Somersalo 1989) or alteration in the levels of chlorophyll biosynthetic pathways (Song et al. 2014). Combined with the results of RNASeq (Figures 2  and 3) and RT-PCR (Figure 4), the up-regulated genes are associated with PSII, which would explain the advantage of hypersensitivity to light in S. canadensis. The three genes, PsbP, PsbQ and Psb27, serve to optimize oxygen evolution and efficiency of PSII (Schmidt et al. 2016), and play an essential role in enabling plants to adapt to fluctuating light intensity (Ifuku et al. 2005;Hou et al. 2015). PsbW is located close to the minor antennae of the PSII complex, which would associate with the PSII reaction center and the LHC complex (Plöchinger et al. 2016). The protein, PsbY, has played a supporting role in stabilizing the binding of PsbE and PsbF (Plöchinger et al. 2016). Photosystem I (PSI) is a multiprotein complex consisting of a number of subunits, which catalyzes the electron transfer from plastocyanin or cytochrome c 6 to NADP + through ferredoxin and ferredoxin: NADP + oxidoreductase (Ozawa et al. 2010). In higher plants, PSI and peripheral light-harvesting complex I (LHCI, Lhca1-Lhca5) together form the PSI-LHCI supercomplex and drive oxygenic photosynthesis.
The key genes, PsaG and PsaK, may be integrated after the LHCI is assembled into the PSI core complex (Ozawa et al. 2010). In addition, LHCI and PSI are significantly correlated with F-ATPase and C4 cycle (Xing et al. 2016). Therefore, the complex consisting of LHCI, PSI, and F-ATPase plays an important role in the coordinated expression between cyclic electron transport and the CO 2 -concentrating mechanism (Xing et al. 2016). In this study, several genes were found to be up-regulated in cyclic electron transport, such as PsaG, PetE, PetF, Lhca1, Lhca2, and Lhca4 (Figure 2), and the results of photosynthetic data (Figure 1(D,E)) and qPCR (Figure 4) are in good agreement with the transcription. Therefore, the competitive advantage of S. canadensis may be due to three factors: (i) the utilization of light for electron transport, especially in PsaG, PetE, Lhca1, Lhca2, and Lhca4 (Brown et al. 2005;Ozawa et al. 2010;Xing et al. 2016); (ii) several key genes which associate with oxidative phosphorylation pathway which were observed to be up-regulated, such as F-ATPase (Allen 2003); (iii) the adaptation of changing environment, e.g. the overexpression of PetF could increase the efficiency of reactive oxygen species (ROS) scavenging in chloroplasts to confer heat tolerance (Lin et al. 2013).

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This study was supported by Natural Science Foundation of Shanghai (18ZR1425400), National Natural Science Foundation of China (31800310) and the Research Funds for the Introduction of Talents of Shanghai Science and Technology Museum.