Adaptive evolution of the rbcL gene in the genus Rheum (Polygonaceae)

ABSTRACT Rapid putative radiations of Rheum might have been caused by the recent uplifts of the Qinghai–Tibetan Plateau and the Quaternary climate oscillations. To better understand the molecular adaptation associated with the Rheum radiation, the adaptive evolution of the chloroplast rbcL gene was analysed in this study using the Phylogenetic Analysis by Maximum Likelihood (PAML) package. The results showed that two amino acid residues (75F, 203I) were under positive selection. The spatial analysis indicated that site 75F was located in the β-sheet of the N-terminal loops involved in subunit interactions in the L8S8 molecule, and site 203I was in the α/β-barrel active centre located on the C-terminal domain of the large subunit of Rubisco. These results suggest that potential positive selection in rbcL might have played an important role in the adaptation of Rheum species to the extreme environments in Qinghai–Tibetan Plateau regions, and that different species lineages might have been subjected to different selective pressures.


Introduction
Rheum is one of the largest genera in the family Polygonaceae, with about 60 species primarily distributed in mountainous and desert regions of the Qinghai-Tibetan Plateau and adjacent areas, and a few extending to or occurring in central/western Asia and Europe [1,2]. The distribution and the ancestral area reconstruction analyses implied that rapid radiations of Rheum have taken place, possibly as a result of the extensive uplifts of the Qinghai-Tibetan Plateau [3,4]. Reflecting their adaptation to the newly altered habitats, these plants show marked phenotypic diversification [5]. For example, some species have evolved into dwarf plants with coriaceous basal leaves or drooping bracts. The adaptive advantage of decumbent forms is thought to be the avoidance of damage by strong winds, and the drooping bracts may play an important role in protecting pollen or in protecting the inflorescences from damage by low temperatures and ultraviolet radiation, allowing these forms to be distributed up to the snow line at an altitude of 5000 meters [6]. In some species, the stem leaves are degenerated and the basal leaves are covered with verrucae or indumentum, which reduces transpiration and prevents withering at high ambient temperatures, so that these species can grow in the Gobi Desert [7]. Rheum palaestinum, for example, has broad rigid leaves with a waxy surface and channels cut into them that collect rainwater with enough force for it to penetrate deep into the soil [8]. These changes in morphology and physiology may result from the adaptive evolution of some orthologous genes which code for functional proteins, such as the chloroplast ndhF and rbcL genes, which are related to photosynthesis and photorespiration [9].
In our previous study, three amino acid sites of NDHF (NADH dehydrogenase subunit F) were identified as being under positive selection, and the secondary structure of the NDHF subunit showed that one of these amino acids was located in the α-helix [10]. The phylogenetic analysis also suggested that positive selection in ndhF has likely played a major role in the adaptive evolution of Rheum species [10].
The chloroplast rbcL gene encodes the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), which contains essentially all the catalytic active sites of the enzyme. Environmental factors could act as the selective pressure driving the adaptive evolution of the rbcL gene towards improved CO2 utilization efficiency of Rubisco [11]. Kapralov and Filatov [12] searched for positive selection in rbcL sequences from over 3000 species representing all lineages of green plants and some lineages of other phototrophs, and found that positive selection exists in rbcL of most analysed land plants. The analysis of a large data set of rbcL sequences of C3 and C4 monocots showed that the rbcL gene evolved under positive selection in independent C4 lineages to adapt to the high CO2 environment [13,14]. The selective pressure on the rbcL gene was higher for heterophyllous lineages of Potamogeton than for homophyllous lineages, reflecting adaptation to different environments [15]. Adaptive evolution of the rbcL gene was also found in Brassicaceae [16].
CONTACT Shizhong Mao msz@gxib.cn
To examine the adaptive evolutionary process linked to the rapid radiation of this genus and to further consolidate our previous results, in this study, the sequences of the rbcL gene from 34 species of Rheum and two species of Oxyria in Polygonaceae were retrieved from the National Center for Biotechnology Information (NCBI) and tested for adaptive evolution. The results may provide new molecular evidence for the rapid putative radiations of Rheum triggered by the recent uplifts of the Qinghai-Tibetan Plateau.

Sequences collection and phylogenetic analyses
Sequences of the rbcL gene were downloaded from the NCBI (National Center for Biotechnology Information) database (http://www.ncbi.nlm.nih.gov). The 34 species of Rheum and two species of Oxyria in Polygonaceae used in this study are listed in Table 1. Sequence alignment was conducted using the software CLUSTAL W ver. 1.83 [17] and adjusted manually in BioEdit 5.0.9.1 [18]. Oxyria digyna and Oxyria sinensis were used as the outgroup. The Maximum Parsimony (MP) analysis was conducted using PAUP 4.0b10 [19] with tree-bisection-reconnection branch swapping and the MulTrees option, and the bootstrap values for all nodes were calculated with 1000 replicates.

Adaptive evolution analysis
Based on the MP tree, the analysis of the adaptive evolution of the rbcL gene was implemented in the CODEML program from the PAML package version 4 [20]. The lnL values under the one-ratio model and the free-ratio model were calculated, and the Likelihood Ratio Test (LRT) was conducted to test whether the ω ratio differs among lineages. Site-specific models, which allowed the ω ratio to vary among sites but fixed a single ω ratio in all branches, were used to detect positive selection and to identify positively selected sites. Three pairs of site-specific models (M0 vs. M3, M1 vs. M2 and M7 vs. M8) were compared by LRT.

The rbcL protein modelling
The model of the 3D structure of the rbcL protein for R. palaestinum was searched for in the Protein Data Bank (PDB) database (http://www.rcsb.org/pdb/home/home.do), and the best model found was the structure of a product complex of tobacco ribulose-1,5-bisphosphate carboxylase/oxygenase (PDB ID: 1ej7l). Sequence rearrangement was conducted with the SPDBV 4.01 program according to the 1ej7l structure. Homology modelling was carried out using SWISS-MODEL [23]. According to the 3D structure of 1ej7l, the original configuration was obtained.

Results and discussion
All rbcL sequences across the 34 species were 1257 bp long, with an average GC content of 45.2%. The aligned data set consisted of 1257 characters, of which 31 were variable but parsimony-uninformative and 17 were parsimony-informative. MP analysis yielded nine equally parsimonious trees, and a strict consensus tree of these trees is shown in Figure 1. The topology of the MP tree was similar to the molecular phylogenies published to date [4,24]. Because only the catalytic region of the rbcL gene was taken into account, the clustering relationship was basically consistent with the geographical distribution within Rheum in the MP tree, and all of the Rheum species comprised a well-supported lineage with a sister relationship to Oxyria.
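The alignment statistics reported above (GC content, variable sites and parsimony-informative sites) can be illustrated with a minimal sketch. The function names and the toy alignment below are hypothetical and are not the Rheum data; the sketch assumes the usual definition of a parsimony-informative site, namely a column with at least two character states that each occur in at least two sequences.

```python
from collections import Counter

def gc_content(seq):
    """Return the GC percentage of a nucleotide sequence, ignoring gaps and ambiguity codes."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)

def classify_sites(alignment):
    """Count constant, variable-uninformative and parsimony-informative columns."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    counts = {"constant": 0, "variable_uninformative": 0, "parsimony_informative": 0}
    for column in zip(*alignment):
        # Tally unambiguous nucleotides only; gaps and ambiguity codes are skipped.
        states = Counter(b.upper() for b in column if b.upper() in "ACGT")
        shared = sum(1 for n in states.values() if n >= 2)
        if len(states) <= 1:
            counts["constant"] += 1
        elif shared >= 2:
            counts["parsimony_informative"] += 1
        else:
            counts["variable_uninformative"] += 1
    return counts

# Hypothetical toy alignment (four sequences, six columns), for illustration only.
toy_alignment = ["ATGGCA", "ATGGCA", "ATAGCT", "ACAGCA"]
site_counts = classify_sites(toy_alignment)
mean_gc = sum(gc_content(s) for s in toy_alignment) / len(toy_alignment)
print(site_counts, round(mean_gc, 1))
```

In the toy alignment only the third column is parsimony-informative, because both G and A occur twice; a column in which a single sequence differs is variable but uninformative.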
To analyse the possibility that positive selection acts on rbcL genes, we used the maximum-likelihood codon models from the CODEML program in the PAML4 package. The MP tree was analysed under a one-ratio model and a free-ratio model, and all of the calculations and tests are listed in Tables 2 and 3. Under the one-ratio model, which allowed for only a single ω ratio across all sites of the gene and the same ω ratio for all branches in the phylogenetic tree (Figure 1), the estimated ratio was ω = 0.1733, lower than 1 (Table 2). In the branch-specific analysis, the LRT statistic for the comparison of the one-ratio model vs. the free-ratio model is reported in Table 3.
In site-specific models, models M2, M3 and M8 allowed sites with ω > 1. According to the LRT statistics, the comparisons of M0-M3, M1-M2 and M7-M8 all indicated significant differences (p < 0.05); thus, models M3, M2 and M8 were significantly better than M0, M1 and M7. Under both M2 and M8 models, two sites were under positive selection with ω > 1, and two amino acid residues of Rubisco (75F, 203I) were identified with a Bayesian posterior probability of positive selection larger than 0.95 in one or more cases when analysed by the Bayes Empirical Bayes (BEB) method (Table 2).
The spatial analysis of the Rubisco residues under positive selection indicated that the Rubisco residue (75F) was located in the β-sheet of the Rubisco L subunit (Figure 2), and the amino acid encoded was phenylalanine (F) in R. palaestinum, while tyrosine (Y) was also found in other species of Rheum. The Rubisco residue (203I) was located in the α-helix of the Rubisco L subunit (Figure 2), and the amino acid encoded was isoleucine (I) in R. palaestinum, while leucine (L) was also found in other species of Rheum.
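The likelihood ratio tests behind these model comparisons can be sketched as follows. The LRT statistic is twice the difference in log-likelihood between the alternative and the null model, compared against a chi-square distribution. The log-likelihood values below are hypothetical placeholders, not the values in Table 2, and the degrees of freedom (4 for M0 vs. M3 with three site classes, 2 for M1 vs. M2 and for M7 vs. M8) are the values conventionally used for these comparisons, assumed here rather than taken from the text. The function names are illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, using the closed form valid for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    # sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf_even_df(stat, df)

# Hypothetical lnL values for illustration only: (null lnL, alternative lnL, df).
comparisons = {
    "M0 vs. M3": (-2010.0, -2000.0, 4),
    "M1 vs. M2": (-2003.0, -1998.0, 2),
    "M7 vs. M8": (-2004.0, -1998.5, 2),
}
for name, (lnl_null, lnl_alt, df) in comparisons.items():
    stat, p = likelihood_ratio_test(lnl_null, lnl_alt, df)
    verdict = "alternative model favoured" if p < 0.05 else "null model not rejected"
    print(f"{name}: 2*dlnL = {stat:.2f}, df = {df}, p = {p:.4g} ({verdict})")
```

The closed-form tail probability keeps the sketch free of external dependencies; for odd degrees of freedom, a statistics library would be needed instead.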
In higher plants, the Rubisco complex is a 16-subunit oligomer that consists of eight large subunits (LS) and eight small subunits. The LS (51-58 kDa) of Rubisco has two clearly separated domains (N and C). The N-domain of the LS (residues 1-150) is folded into a central mixed five-stranded β-sheet with two α-helices on one side of the sheet. The C-domain (residues 157-475) was predicted to have an α/β-barrel structure. The N-terminal loops are involved in subunit interactions in the L8S8 molecule [25]. In this study, positive selection was found at codon 75 and codon 203 by the M2 model and the M8 model. The BEB method suggested that the highest probability of positive selection was at sites 75F and 203I (posterior probability > 95%; positions numbered along the partial length of the rbcL subunit) in R. palaestinum (Table 2). The spatial analysis indicated that the Rubisco site (75F) was located in the β-sheet of the N-terminal loops involved in the subunit interactions in the L8S8 molecule (Figure 2), and the Rubisco site (203I) was located in the α/β-barrel structure. In our previous study on the adaptive evolution of the ndhF gene in this genus, we found that the ndhF gene was highly expressed under stress conditions, and three amino acid sites (188H, 465H, 551L) were identified to be under positive selection. Interestingly, the positively selected sites (465H and 551L) were on the loops, while the 188H amino acid site was located in the α-helix [10]. The changes found in the rbcL gene may have resulted in a change in the Rubisco activase and in improvement of the utilization efficiency of CO2 in plants. It is believed that if Rubisco has higher utilization efficiency of CO2, the plant would have higher photosynthetic efficiency and better adaptability to the environment [26,27]. Under strong ultraviolet B (UV-B) radiation, the transcription of rbcL might be restrained. Thus, various environmental stress factors, such as extreme UV radiation and CO2 concentration, might be considered the selective pressure driving the adaptive evolution of the rbcL gene [28].
The distribution and the ancestral area reconstruction analyses suggested that rapid putative radiations of Rheum might have been triggered by the recent uplifts of the Qinghai-Tibetan Plateau and the Quaternary climate oscillations, as previously suggested by Fan et al. [29] for the radiation of Leymus. Geological evidence indicates that at least four different periods of recent extensive uplifting of the Qinghai-Tibetan Plateau have occurred since the early Miocene (i.e. 22, 15-13, 8-7 and 3.5-1.6 Ma) [4,5,30], and new habitats may have been created while old ones became fragmented within each period. The newly altered habitats of Rheum species are varied, ranging from the snow line at an altitude of 5400 meters to the Gobi Desert at an altitude of 700 meters. It could be speculated that, through the adaptive evolution of some genes involved in photosynthesis pathways, e.g. rbcL and ndhF, some species of Rheum could adapt to the extreme habitats.

Conclusions
In this study, positive selection was found at Rubisco sites (75F, 203I) by the M2 (selection) model and the M8 (beta & ω) model. The spatial analysis indicated that the site (75F) was located in the β-sheet of the N-terminal loops involved in subunit interactions in the L8S8 molecule, and the site (203I) was in the α/β-barrel active centre located on the C-terminal domain of the large subunit of Rubisco. The signature of positive selection in the rbcL gene across the Rheum phylogeny suggests that Rubisco might have been involved in adaptation to local environments for Rheum species in Qinghai-Tibetan Plateau habitats. This novel explanation for plant adaptation may yield insights into the species diversity and endemism of alpine plants in Qinghai-Tibetan Plateau regions. However, to better understand the rapid putative radiations of Rheum species in local adaptation, other photosynthesis- and photorespiration-related genes should be included in future work.

Disclosure statement
No potential conflict of interest was reported by the authors.