Genetic diversity of Herpetospermum caudigerum (Ser.) Baill using AFLP and chloroplast microsatellites

Abstract
Herpetospermum caudigerum (Ser.) Baill is an endangered species found in high-altitude regions of Tibet, China. In this work, its genetic diversity and genetic structure were investigated based on nuclear and chloroplast DNA. A total of 426 fragments were scored using 10 amplified fragment length polymorphism (AFLP) primer combinations; of these, 256 fragments (60.7%) were polymorphic and could differentiate the populations. The dendrogram revealed that populations from different altitudes harbour rich genetic diversity (Ht = 0.156, Hs = 0.111, Gst = 0.287 and Nm = 1.618). The averages of the number of alleles (Na), effective number of alleles (Ne), Nei’s genetic diversity (H) and Shannon’s index (I) were 1.352, 1.188, 0.111 and 0.168, respectively. In addition, 5 P03-trnL-trnF-300 haplotypes and 3 P02-trnL-trnF-396 haplotypes were identified among the 4 populations, and 7 haplotypes were identified based on the combined fragments. The non-coding region P03-trnL-trnF-300 was the more polymorphic of the two, with more haplotypes and higher haplotype (gene) diversity and nucleotide diversity. Tajima’s test showed that all Tajima’s D values were statistically significant at P < 0.05, indicating that natural selection has an effect on mutations in these fragments. Our results from H. caudigerum cpDNA indicated a high fixation index (Fst = 0.968) and showed greater genetic differentiation among populations (96.7%, P < 0.01) than the analysis of nDNA (63.72%, P < 0.01). These results could lay the foundation for further understanding and conservation of H. caudigerum germplasm resources.


Introduction
The Tibetan Plateau, the highest plateau on Earth, lies about 4000 m above sea level and has extreme environmental conditions characterized by dryness, low oxygen pressure, low temperature, strong ultraviolet (UV) radiation and violent winds [1]. Despite these extremes, various Tibetan plant species thrive on the plateau owing to highland adaptations. The heterogeneity of habitats in high-altitude areas can contribute to rich genetic biodiversity [2], which is important for sustaining traditional agriculture. It is therefore valuable to collect, describe and thoroughly evaluate this diversity, especially for endangered plant species [3][4][5].
Herpetospermum caudigerum (Ser.) Baill (H. caudigerum Baill) is naturally distributed in high-altitude regions, e.g. Tibet, Yunnan and Sichuan in China, and is best known on the Tibetan Plateau as a traditional medicinal herb [6]. Its dried ripe seeds are used for the treatment of infections, cholecystitis, jaundice, hepatitis and dyspepsia, which is why it has become a focus of research [7,8]. In recent years, however, wild H. caudigerum has been listed as an endangered plant owing to habitat destruction, difficulties with tissue culture, climate change and exploitation [9][10][11][12][13]. Conservation of plant species often requires ex situ cultivation in living collections [14]. Unfortunately, although H. caudigerum is a valuable medicinal plant resource, its genetic diversity on the Tibetan Plateau is still largely unknown.
To date, several DNA-based molecular marker techniques, such as simple sequence repeat (SSR), restriction fragment length polymorphism (RFLP) and single nucleotide polymorphism (SNP) have been utilized to identify the genetic diversity of plant species [15][16][17]. Most molecular markers have the advantage of being non-tissue-specific, relatively abundant, suitable for early rapid assessment, and less susceptible to environmental impacts [18]. In addition, amplified fragment length polymorphism (AFLP) and chloroplast microsatellites are also reliable genetic markers and have been widely utilized for genetic analysis of herbaceous plants due to their highly sensitive, efficient, and reproducible characteristics [19,20]. The combined analysis of nuclear and chloroplast genomes may provide vital information for guiding conservation efforts.
Although some papers have reported on the feasibility of seed germination [21] and on tissue culture techniques [22] for H. caudigerum, the genetic diversity of this species is not clear. The main objective of this study was to assess the genetic diversity and population relationships of H. caudigerum along an altitude gradient using AFLP and chloroplast markers. The results will benefit the conservation and exploitation of H. caudigerum germplasm resources and provide a theoretical basis for further studies of the evolution and phylogeography of the species.

Plant materials and DNA extraction
In this study, we selected four wild H. caudigerum Baill (2n = 2x = 20) populations from different altitudes (2800 m, 3000 m, 3100 m and 3300 m) in the College of Agriculture and Animal Husbandry, Tibet, China (Supplementary Figure S1). A total of 70 individuals were selected randomly for the diversity analysis using the AFLP and chloroplast microsatellite method. The sampling information is summarized in Table 1.
Total genomic DNA was extracted from approximately 0.1 g of fresh leaves using the Plant Genomic DNA Kit (Zoman, Beijing, China) following the instructions supplied with the kit. The concentration and quality of the extracted DNA were checked with a UV-VIS spectrophotometer (UV-1800; Shimadzu, Japan) and assessed by electrophoresis on 0.8% agarose gels. The extracted DNA was then adjusted to 40 ng/µL with autoclaved, deionized water and used for polymerase chain reaction (PCR) amplification.
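The adjustment to the 40 ng/µL working concentration follows the usual C1·V1 = C2·V2 relation. The following is a minimal sketch of that arithmetic; the function name, the 160 ng/µL stock reading and the 100 µL final volume are hypothetical illustrations and are not values from this study.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (stock_ul, water_ul) needed to reach the target concentration.

    Applies C1 * V1 = C2 * V2. All names and inputs are illustrative.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical extract measured at 160 ng/uL, diluted to a 40 ng/uL working stock
stock_ul, water_ul = dilution_volumes(160.0, 40.0, 100.0)
print(f"{stock_ul:.1f} uL DNA + {water_ul:.1f} uL water")
```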

PCR amplification, sequencing and genotyping
The AFLP primers and adapters were chosen according to the method described by Meng et al. [23] (Supplementary Table S1). The reaction protocol was performed as previously described by Costa et al. [24] with minor modifications. The pre-amplification mixture (20 µL) contained 1 µL of ligation mixture, 2.5 mmol/L dNTPs, 10× buffer, 20 mmol/L pre-amplification primer, and 1 U of Taq polymerase. The selective amplification was performed in a 20 µL volume including 5 µL of template, 10× buffer, 2.5 mmol/L dNTPs, 20 mmol/L selective amplification primer, and 1 U of Taq polymerase. The amplified fragments were separated in a 6% polyacrylamide sequencing gel and detected using silver staining as previously described by Li et al. [25].
In addition, trnL-trnF regions of chloroplast DNA were amplified and sequenced as described by Phumichai et al. [26] (Supplementary Table S2). Amplification was performed in a volume of 50 µL containing 4 µL of template, 10 µmol/L primer pairs, 2.5 mmol/L dNTPs, 2.5 mmol/L 10× buffer, and 2.5 U of Taq polymerase. The amplification was performed using the following conditions: 94 °C for 3 min; 35 cycles of 94 °C for 60 s, 53 °C for 45 s, and 72 °C for 105 s; and 72 °C for 10 min. PCR products were run on 2% agarose gels to verify a specific band and determine the size of the amplified products.

Data analysis
AFLP fragments for each primer combination were scored manually as present (1) or absent (0) to build a binary data matrix. To assess the genetic diversity of H. caudigerum Baill., genetic parameters including the number of total bands, the number of polymorphic bands, the polymorphic rate, the number of alleles (Na), the effective number of alleles (Ne), Nei's gene diversity (H) and Shannon's index (I) were calculated [27]. Unweighted pair group method with arithmetic average (UPGMA) and principal coordinates analysis (PCoA) results were produced using NTSYS-pc version 2.10. Analysis of molecular variance (AMOVA 1.55) software was used to calculate the genetic variance among and within populations and the genetic differentiation [28]. The Bayesian program STRUCTURE v.2.3.4 was used to determine the genetic structure among the populations. The analysis was performed with the number of populations (K) set from 2 to 4. The burn-in steps and the number of replicates were 10,000 and 50,000 for each K, respectively. The optimum K was estimated by ΔK according to Structure Harvester [29].
Sequences were viewed and overlapping fragments were assembled using DNAStar. The cpDNA sequences of the trnL-trnF region were trimmed and aligned using MEGA v5.1 and double-checked by eye. Haplotypes and haplotype diversity statistics were calculated using DnaSP 5.0 [30]. The median-joining model implemented in NETWORK 4.6.1.2 was used to investigate the evolutionary relationships between haplotypes [31]. The degree of genetic differentiation was estimated by analysis of molecular variance (AMOVA) in Arlequin v3.5 [32]. The population genetic structure was analysed as described above.
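To make the band-scoring step concrete, the sketch below derives Na, Ne, H and I from a small binary matrix. It is an illustration only, not the software used in this study: it assumes the common treatment of dominant markers in which the null-allele frequency is the square root of the band-absence frequency (Hardy-Weinberg equilibrium), and the matrix itself is hypothetical.

```python
import math


def locus_stats(column):
    """Return (Na, Ne, H, I) for one dominant locus scored 1 (band) or 0 (no band)."""
    n = len(column)
    absent = column.count(0)
    q = math.sqrt(absent / n)  # null-allele frequency, assuming Hardy-Weinberg
    p = 1.0 - q
    na = 2 if 0 < absent < n else 1       # observed number of alleles
    ne = 1.0 / (p * p + q * q)            # effective number of alleles
    h = 1.0 - p * p - q * q               # Nei's gene diversity
    i = -sum(f * math.log(f) for f in (p, q) if f > 0)  # Shannon's index
    return na, ne, h, i


def summarize(matrix):
    """Average the per-locus statistics over all loci (matrix columns)."""
    stats = [locus_stats(list(col)) for col in zip(*matrix)]
    n_loci = len(stats)
    means = [sum(s[k] for s in stats) / n_loci for k in range(4)]
    pct = 100.0 * sum(1 for s in stats if s[0] == 2) / n_loci
    return {"Na": means[0], "Ne": means[1], "H": means[2], "I": means[3],
            "polymorphic_pct": pct}


# Hypothetical data: rows are individuals, columns are AFLP loci
matrix = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
result = summarize(matrix)
print(result)
```

Other estimators of allele frequency for dominant markers exist (for example Bayesian ones), so values produced this way need not match those of a given software package.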

Results and discussion
An understanding of genetic diversity is the basis for the genetic improvement of endangered plant species. In general, molecular markers can provide key information for estimating genetic diversity [33]. AFLP and chloroplast markers have emerged in previous reports as efficient, accurate and highly reproducible methods [34,35]. However, no reports have used both AFLP and cpDNA methods to investigate genetic variation in H. caudigerum. In this study, we investigated the genetic structure of four populations of H. caudigerum from different altitudes based on nDNA (AFLP) and cpDNA (chloroplast DNA).

Genomic AFLP analysis
A total of 426 bands were amplified by 10 AFLP primer pairs in this study (Table 2), with a mean percentage of polymorphic bands of 60.7%. These results indicate extensive allelic diversity within these populations and are similar to those of our previous study on Fragaria ananassa Duch and Aconitum kongboense L in Tibet (58.8% and 64.12%, respectively) [36]. The size of the fragments ranged from 100 to 3500 bp (data not shown). The number of scorable fragments amplified by each primer pair varied from 28 to 57, with an average of 42.6, which is higher than in a previous report on chickpea (Cicer spp.) by Saeed et al. [37]. This is not only because that experiment involved a different species, but also because it detected the reaction products by agarose-gel electrophoresis, whereas we used polyacrylamide-gel electrophoresis. The primer pair E21 + M21 amplified the lowest number of fragments (28) and the primer pair E22 + M33 the highest (57). The average number of polymorphic fragments amplified per primer pair was 25.6. The percentage of polymorphic bands varied from 39 to 77%, with an average of 60.7%, which is higher than in our previous report on Aconitum kongboense L. from Tibet [23]. The differences among these data could be ascribed to the plant materials or the AFLP primer sets.
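The per-primer summary above involves two slightly different averages: the pooled rate (all polymorphic bands over all bands) and the mean of the per-primer percentages. From the totals given here (426 bands and 10 × 25.6 = 256 polymorphic bands) the pooled rate is about 60.1%, while 60.7% is stated as the average across primer pairs. The sketch below shows how the two can differ; the three count pairs are hypothetical and are not the values in Table 2.

```python
def polymorphism_summary(counts):
    """counts: list of (total_bands, polymorphic_bands), one tuple per primer pair."""
    total = sum(t for t, _ in counts)
    poly = sum(p for _, p in counts)
    pooled_pct = 100.0 * poly / total                               # ratio of sums
    mean_pct = sum(100.0 * p / t for t, p in counts) / len(counts)  # mean of ratios
    return total, poly, pooled_pct, mean_pct


# Hypothetical counts for three primer pairs
example = [(28, 21), (57, 24), (40, 30)]
total, poly, pooled_pct, mean_pct = polymorphism_summary(example)
print(total, poly, round(pooled_pct, 1), round(mean_pct, 1))
```

Primer pairs that amplify few bands weigh as much as band-rich pairs in the mean of percentages, which is why it can sit above or below the pooled rate.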
Based on nDNA, the values of Na, Ne, H, I, Ht and Hs were determined in the studied accessions, with average values of 1.352, 1.188, 0.111 and 0.168 for Na, Ne, H and I, respectively, and with Ht = 0.156 and Hs = 0.111. In addition, we compared the genetic variation among and within populations based on all AFLP results (Table 3). The AMOVA results revealed significant genetic variation (63.72%) among populations; the remaining 36.28% existed within populations and was also highly significant. Furthermore, our cpDNA results for H. caudigerum indicated a high fixation index (Fst = 0.968) and higher genetic differentiation among populations (96.7%, P < 0.01) than the nDNA analysis (63.72%, P < 0.01).
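The coefficient of gene differentiation reported in the abstract can be related to Ht and Hs through Nei's standard definition, Gst = (Ht − Hs)/Ht. The short sketch below applies it to the rounded values from the abstract; the function name is illustrative, and the small gap between the result (about 0.288) and the reported 0.287 presumably reflects rounding of the inputs.

```python
def gst(ht, hs):
    """Coefficient of gene differentiation: share of total diversity found among populations."""
    if ht <= 0:
        raise ValueError("Ht must be positive")
    return (ht - hs) / ht


# Rounded values reported in the abstract: Ht = 0.156, Hs = 0.111
value = gst(0.156, 0.111)
print(round(value, 3))
```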
The pairwise geographic and genetic distances between populations from different altitudes were analyzed (Table 4). The genetic distance was highest between the populations from 2800 m and 3000 m (0.096), whereas the genetic distance between the 3000 m and 3100 m populations was 0.017. In addition, the geographic distance was lowest between the 3100 m and 3300 m populations (13.081), which showed the closest relationship. Tajima's D values were statistically significant (P < 0.05), indicating genetic bottleneck events in the local population. However, these results are contrary to those of studies on Trailliaedoxa gracilis [41]. In general, geographical conditions may be the main contributor to the low level of genetic variation in nDNA. For example, to adapt to the extreme environment of the Tibetan Plateau (large temperature differences, dryness and high radiation), H. caudigerum has evolved hard shells and low seed germination rates. The reduced effective population size and geographical isolation make maternally inherited chloroplast markers more likely than nuclear markers to record the effects of population history in current genetic patterns [42][43][44][45][46].
Based on Nei's genetic distance, cluster analysis using the UPGMA method was performed to show the genetic relationships between the populations (Figure 1). The similarity coefficients among all populations ranged from 0.82 to 0.96, with an average value of 0.857. All of the materials were distinctly divided into three major groups. The population at 2800 m was grouped alone in Cluster A (blue colour). The population at 3300 m formed Cluster B (yellow colour). The populations at 3100 m and 3000 m were found in Cluster C (red colour). Overall, the cluster results indicated that all accessions were closely related to each other.
Principal coordinates analysis (PCoA) revealed pronounced genetic variation among the four H. caudigerum populations, in accordance with the clustering pattern produced by STRUCTURE (Supplementary Figure S2). Three principal axes divided all materials into three groups. The 2800 m and 3300 m populations each formed a separate group. The remaining individuals (from 3000 m and 3100 m) clustered together into one group.
The genetic structure of 70 individuals from four populations was analyzed according to differences in allele frequency. In this study, K was tested from 2 to 4. The maximum likelihood values showed that the most appropriate number of populations was four (K = 4). Structure cluster analysis (Figure 2(a)) showed that Cluster A (blue) included the population from 2800 m, Cluster B (yellow) consisted of the population from 3000 m, Cluster C (green) consisted of the population from 3100 m, and the population from 3300 m formed Cluster D (red). The structure graph presented four clear groups, and the individuals within each elevation group closely resembled one another.
Furthermore, the cluster analysis and PCoA revealed that the populations from 3000 m and 3100 m were more closely related to each other, reflecting the geographic distribution patterns of these populations. For instance, Clusters A and B comprised the individuals from 2800 m and 3300 m, respectively, whereas the materials from the intermediate altitudes (3000 m and 3100 m) clustered together.
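The UPGMA step can be illustrated with a small pure-Python sketch, in which the distance between two clusters is the mean of all pairwise distances between their members. Only the 2800 m to 3000 m (0.096) and 3000 m to 3100 m (0.017) distances below come from the text; the other four values are hypothetical placeholders, and this is not the NTSYS-pc implementation.

```python
from itertools import combinations


def upgma(labels, pair_dist):
    """Return UPGMA merges as (cluster_a, cluster_b, mean_distance) tuples.

    pair_dist maps frozenset({a, b}) to the distance between labels a and b.
    """
    def avg(c1, c2):
        # UPGMA distance: mean over all member pairs drawn from the two clusters
        total = sum(pair_dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in labels]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


labels = ["2800", "3000", "3100", "3300"]
pair_dist = {
    frozenset(("2800", "3000")): 0.096,  # reported in the text
    frozenset(("3000", "3100")): 0.017,  # reported in the text
    frozenset(("2800", "3100")): 0.080,  # hypothetical
    frozenset(("2800", "3300")): 0.090,  # hypothetical
    frozenset(("3000", "3300")): 0.050,  # hypothetical
    frozenset(("3100", "3300")): 0.045,  # hypothetical
}
merges = upgma(labels, pair_dist)
for a, b, d in merges:
    print(a, b, round(d, 4))
```

With these inputs the 3000 m and 3100 m populations join first, the 3300 m population joins them next and the 2800 m population joins last, which mirrors the three-group pattern described above.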
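For reference, the ΔK statistic named in the data analysis can be sketched as the absolute second-order difference of the mean log-likelihood across successive K values, divided by the standard deviation of the log-likelihood at K. The code below follows that general form and is not the Structure Harvester source; the replicate log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """lnp_by_k maps K to a list of ln P(D) values from replicate runs.

    Returns {K: delta K} for every K that has both neighbours K - 1 and K + 1.
    """
    out = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # delta K is undefined at the ends of the tested range
        second_diff = abs(mean(lnp_by_k[k + 1])
                          - 2 * mean(lnp_by_k[k])
                          + mean(lnp_by_k[k - 1]))
        out[k] = second_diff / stdev(lnp_by_k[k])
    return out


# Hypothetical ln P(D) values, three replicate runs per K
runs = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4503, -4497],
    3: [-4200, -4204, -4196],
    4: [-3800, -3801, -3799],
    5: [-3790, -3796, -3784],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK needs both neighbouring K values, it cannot be evaluated at the lowest or highest K tested, so a choice at the edge of the range has to rest on the likelihood values themselves.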
The cluster analysis suggests that the genetic diversity of the populations correlates with their eco-geographic origins. These results were similar to the previous reports, and may arise from gene flow, artificial selection or life history [47][48][49].

cpDNA analysis
At present, the population structure of wild H. caudigerum is still poorly known, although studies on similar morphological forms of Herpetospermum spp. have been reported [50], and no reports are available on the structure and level of population variation of H. caudigerum. Chloroplast DNA (cpDNA) analysis revealed that different cpDNA haplotypes were found at the higher altitudes (3000 m, 3100 m and 3300 m), indicating multiple origins of the H. caudigerum representatives from these altitudes. The origin of H. caudigerum has been analyzed by Guan et al. [50]. Indeed, the geographical distribution of H. caudigerum is mainly confined to high altitudes in Tibet, suggesting that low levels of variation occurred at high altitude. Additionally, pollen-mediated gene flow may be strong at high altitude, giving rise to a dependent population.
The genetic diversity statistics are summarized in Table 5. P03-trnL-trnF-300 showed higher values for haplotype diversity (Hd), variance of haplotype diversity (Vh), standard deviation of haplotype diversity (Sh) and total nucleotide diversity (Pi). In addition, the neutrality test statistics (Tajima's D test) were significant at the P < 0.05 level (Table 5).
The analysis of molecular variance (AMOVA) based on cpDNA showed strong genetic differentiation among all accessions (Table 3): 96.79% of the variation was among populations and 3.21% was within populations. The results showed greater genetic differentiation among populations than within populations, and a relatively low level of genetic diversity within the existing germplasm of this plant species. At the same time, the genetic variation among and within accessions was significant based on cpDNA (Fst = 0.96794, P < 0.05).
Based on the cpDNA data from all 70 samples, the Fst values between pairs of populations were statistically significant (Table 7). Fst ranged from 0.17 to 1.00 between populations from different altitudes. The mean Fst between pairs of populations was 0.96 ± 0.36 (P < 0.05). This indicated that H. caudigerum has a narrow genetic background. This low level of genetic diversity may be attributed to technological or geographical reasons. In addition, the Tibetan Plateau experienced interglacial/postglacial periods; accordingly, the current distribution of plant species on the Tibetan Plateau is strongly affected by glaciations and paleo-environments during the Quaternary [51].
The STRUCTURE analysis of the cpDNA dataset revealed that the most likely number of populations was K = 4. Therefore, the details of the STRUCTURE analysis at K = 4 are shown. In Figure 2(b), among the 70 individuals, 17 individuals from 2800 m belong to one cluster (blue), whereas the other individuals from 3000 m, 3100 m and 3300 m were not differentiated from each other and showed a genetic mixture.
Two polymorphic cpDNA loci (P03-trnL-trnF-300 and P02-trnL-trnF-396) were selected, and 28 alleles were produced. From these alleles, 7 haplotypes (H1 to H7) were identified (Supplementary Table S3). Based on the median-joining network, three main groups were formed (Figure 3). Haplotype H2, harbouring 24 individuals from 3000 m and 3300 m, had a high haplotype frequency and could be the dominant haplotype. Haplotype H1 was characteristic of all individuals of the 2800 m population and was not found in any other population. Haplotypes H2, H5 and H6 showed a closer relationship with H7, whereas haplotype H1 was distantly related to the other haplotypes. Thus, the network suggested that the haplotypes descended from the same parent, but that H1 and the other haplotypes diverged in different directions over a long evolutionary process. Therefore, the genetic diversity of the H. caudigerum populations was influenced by altitude. In contrast to the populations from higher altitudes, the population at 2800 m showed a low level of genetic differentiation and probably originated from a common ancestor regardless of their geographical separation.
According to the genetic diversity data, the resources of H. caudigerum should be rationally exploited. On the one hand, the introduction of new germplasm is recommended, as it will contribute to the preservation of genetic diversity, crop improvement and resource utilization. On the other hand, a cultivation technology for H. caudigerum has been introduced and has helped to establish the breeding system, plantations and management regimes.
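Haplotype diversity is commonly computed with Nei's estimator, Hd = n/(n − 1) · (1 − Σ p_i²), where p_i is the frequency of haplotype i among n sequences. The sketch below illustrates that formula only; the three-haplotype counts are hypothetical, while the single-haplotype case corresponds to the 17 individuals from 2800 m that all carry H1.

```python
def haplotype_diversity(counts):
    """Nei's haplotype (gene) diversity from a list of haplotype counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two sequences are needed")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical population with three haplotypes among 20 individuals
hd_mixed = haplotype_diversity([10, 6, 4])
# A population fixed for one haplotype, as with the 17 individuals from 2800 m
hd_fixed = haplotype_diversity([17])
print(round(hd_mixed, 4), hd_fixed)
```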
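In AMOVA, the fixation index is the among-population variance component divided by the total variance, so the percentages above translate directly into Fst. A minimal sketch of that relation follows; the function name is illustrative, and the inputs are the percentages reported in the text.

```python
def fixation_index(var_among, var_within):
    """Fst as the among-population share of the total variance."""
    total = var_among + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_among / total


fst_cp = fixation_index(96.79, 3.21)    # cpDNA percentages of variation
fst_nu = fixation_index(63.72, 36.28)   # nDNA (AFLP) percentages of variation
print(round(fst_cp, 4), round(fst_nu, 4))
```

The same function accepts raw variance components instead of percentages, since only their ratio matters.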

Conclusions
In the present study, we combined AFLP with cpSSRs to analyze the genetic relationships of H. caudigerum populations collected at different altitudes. Altogether, the UPGMA and STRUCTURE analyses indicated that the 70 samples can be clustered into three groups: the 2800 m population, the 3300 m population and a combination of the 3000 m and 3100 m populations. In addition, the cpSSR technique appeared more suitable for assessing the genetic diversity of H. caudigerum because of the simple structure of the chloroplast genome and its satisfactory repeatability and accuracy. More importantly, these results could provide a more scientific basis for further understanding the genetic differentiation of H. caudigerum germplasm and for developing conservation strategies.

Disclosure statement
No potential conflict of interest was reported by the authors.

Authors' contributions
XFL and MFJ conceived and designed the experiments. XFL, LP, JMQ and QH performed the experiments. XFL, LP and SLQ analyzed or interpreted the data for the work. XFL, LP, JMQ and QH wrote the manuscript. XFL, LP, JMQ and SLQ are accountable for questions related to the accuracy or integrity of any part of the work. All authors read and approved the final manuscript.

Data archiving statement
There are no linked research data sets for this submission. The following reason is given: All data generated or analyzed during this study are included in this published article and its Additional files.