The complete chloroplast genome of Abies ernestii Rehder (Pinaceae) and its phylogenetic implications

Abstract Abies ernestii Rehder is endemic to the montane regions of Southwest China. Till now, phylogenetic relationships between A. ernestii and other closely related species remain unclear. In this study, we first characterized the complete chloroplast (cp) genome of A. ernestii. The whole cp genome was 121,841 bp in size, including one hundred and thirteen genes. Results of comparative cp genome revealed that only ycf1 and ycf2 was characterized by a considerable variation. Our phylogenetic analyses supported the monophyly of the genus Abies and revealed a clear separation between A. ernestii and A. chensiensis Tiegh. This study highlights the significance of using cp genomes to examine species boundaries among closely related fir species.


Introduction
Abies ernestii Rehder is distributed across Southwest China and characterized by high economic and ecological importance (Farjon and Rushforth 1989;Farjon 1990). It grows in cold, moist forests of the Hengduan Mountain region at 2600-3800 m elevations, with annual precipitation of 1100-2200 mm (Kuan 1981). Abies ernestii differs from Abies chensiensis in the length of needles and color of cones (green or pale purplish), in addition to its distribution in the north and west of the Sichuan province, extending into extreme north-western Yunnan province (Liu 1971). However, the species delimitation between A. ernestii and A. chensiensis has been a continuous source of debate (Liu 1971;Farjon 2001).
Recently, several studies have been conducted on the species delimitation between A. ernestii and A. chensiensis using morphological or molecular data (Suyama et al. 2000;Xiang et al. 2004Xiang et al. , 2009Xiang et al. , 2015Liepelt et al. 2010;Aguirre-Planter et al. 2012). However, no robust phylogenetic relationships have been reported owing to the lack of high-resolution molecular markers (e.g. complete chloroplast [cp] genomes) Xiang et al. 2018). Therefore, several distinct taxonomic combinations have been proposed: A. ernestii was treated as an independent species, a variety, or a subspecies of A. chensiensis (Liu 1971;Shao and Xiang 2015). In addition, A. chensiensis and A. ernestii were treated as identical in the studies conducted by Handel-Mazzetti (1929) and Dallimore and Jackson (1966). Thus, a more rigorous approach using highresolution molecular markers is required to delimitate the relationships between A. ernestii and A. chensiensis.
In this study, we first determined the cp genome of A. ernestii using next-generation sequencing technology. Thereafter, we compared it with other available genomes to explore their genetic divergence, develop molecular markers, and reconstruct phylogenetic relationships. This study evaluated the significant value of cp genomes to examine the species boundaries among closely related fir species.

Sample collection and DNA extraction
The specimens and leaf materials of A. ernestii were collected by Xianchun Zhang from Wenchuan county, Sichuan province, China (30.55 N,103.03 E). The voucher specimen was deposited in the herbarium of the Institute of Botany, CAS (PE) (http://pe.ibcas.ac.cn/, Qin Ban, herbarium2@ibcas.ac.cn) under the voucher number 5874. The leaf materials were first desiccated in silica gel, and the Ezup plant genomic DNA prep kit was used to extract the total DNA. The relevant cp genome sequence was submitted to the NCBI database (No. MH706707). The associated numbers were PRJNA790664 (BioProject), SRP351570 (SRA), and SAMN24219951 (Bio-Sample), respectively.
Polymerase chain reaction amplification, sequencing, and assembly The extracted DNA was sequenced on the Illumina HiSeq X platform with libraries in 350 bp length. The clean reads were approximately 10.2 million in 150 bp length. We aligned, assembled, and annotated the reads using CLC de novo assembler (v. beta 4.6, CLC Bio, Aarhus, Denmark), GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html), and tRNAscan-SE v1.3.1 (Schattner et al. 2005;Tillich et al. 2017). Few adapt-related reads were recognized in the raw reads. These reads were trimmed N > 10% or Q 5 to ensure highquality data and then assigned to the genome sequence using Velvet (Zerbino and Birney 2008). The cp genome sequence was annotated in tRNAscan-SE and GeSeq (Schattner et al. 2005;Tillich et al. 2017). To match the gene predictions, we checked all the start/stop codons and intron/ exon boundaries in Sequin 15.50 and Geneious 8.0.2 (Kearse et al. 2012;Lohse et al. 2013). Finally, we annotated the sequences by comparing them with the published genomes of A. chensiensis (MH706706 and MH047653).

Repeat sequence detection and comparative analysis of cp genomes
The MISA software was used to check the simple repetitive DNA sequences (SSRs) in the cp genome of A. ernestii. Thereafter, we detected the long repeats using the REPuter website (https://bibiserv.cebitec.unibielefeld.de/reputer/) (Kurtz et al. 2001). The parameters used in the analysis were as follows: the hamming distance was 3, maximum computed repeats were 50 bp, and minimal repeat size was 30 bp. Using the mVista program with Shuffle-LAGAN mode, we compared the whole cp genome of A. ernestii, A. chensiensis (MH706706 and MH047653), and Abies fargesii Franch. (MH706716) (Frazer et al. 2004). The repeat units were set to 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotides, respectively.

Phylogenetic analyses
Phylogenetic analysis was performed using the complete cp genomes. We aligned the sequences containing the hotspot mutation regions and chose the maximum-likelihood (ML) analysis as the method in this study. The analysis was performed using a rapid bootstrap analysis and 1000 rapid bootstrap search steps in the RAxML v.7.8 (Stamatakis 2014). The relevant bootstrap value under each node was obtained from FigTree v.2.2. We collected the complete plastomes of 16 fir species from the NCBI database ( Figure 3, Table S1). Juniperus squamata Buchanan-Hamilton ex D. Don in the family Cupressaceae was used as an outgroup.

Chloroplast genome features of A. ernestii
The complete cp genome of A. ernestii was 121,841 bp in length, and its G and C content was $38.30%. This genome was characterized by a typical quadripartite structure, similar to the cp genomes in Pinaceae. The large single-copy (LSC) region and short single-copy (SSC) region were 67,157 and 54,156 bp, respectively. It contained 113 genes, including peptide-encoding genes (68), tRNA genes (16), open reading frames (6), and rRNA genes (4), respectively ( Figure 1, Table S2). In the LSC region, there were 53 protein-coding genes, 16 tRNA genes, and 3 open reading frames. The SSC region owned all the 4 rRNA. The IR regions were 264 bp and featured trnI-CAU and trnT-GGU. Among the encoded genes, ycf3 and rpl2 had two introns, and 11 genes (atpF, trnV-UAC, trnA-UGC, trnL-UAA, trnK-UUU, trnG-GCC, rps12, petD, rpoC1, petB, and rpl16) had one intron (Table 1). The palindromic inverted repeat, including trnS, psaM, ycf12, and trnG, was located in 52-kb inversion points and 1180 bp in length. Such inverted repeat was identical to the published cp genomes of fir species (Dong et al. 2021). In addition, all ndh genes were lost as other cp genomes of Pinaceae, such as Keteleeria davidiana var. calcarean (W. C. Cheng & L. K. Fu) Silba (Li et al. 2021). As proposed by Blazier et al. (2011), the ndh genes were initially discovered through their homology to complex I in the mitochondrial respiratory electron transport. The lack of ndh genes across various species in Pinaceae indicated its dispensability, suggesting that there might be alternative or supplementary electron transport pathways to achieve this function.

Repeat sequence analysis
The simple sequence repeats (SSRs) have been widely used in phylogenetic and phylogeographic analyses of highly polymorphic genetic materials (Kaur et al. 2015). In this study, we detected 67 SSRs in the cp genome of A. ernestii, and they included 42 mononucleotide repeats, 14 dinucleotide repeats, 2 trinucleotide repeats, 7 tetranucleotide repeats, and 2 pentanucleotides repeats ( Figure S1). Compared with other SSRs, the mononucleotide repeats (62.68%) were the most abundant and they contributed more to the genetic variations. Most mononucleotide repeats were A or T, and three types of dinucleotide SSRs AG/CT/AT were identified. In addition, we detected two types of trinucleotide SSRs (AAT/ATT), Table 1. List of genes encoded in Abies ernestii chloroplast genomes.

Comparative genome analysis
To perform the comparative genome analysis, we used three closely related firs: A. ernestii, A. chensiensis, and A. fargesii (Figure 2). In the phylogenetic analysis, they clustered together as a monophyletic lineage, with an extremely high Figure 2. Comparison of four chloroplast genomes using the mVista alignment program, with Abies chensiensis (MH706706) as a reference. The x-axis means the midpoint of the window, and the y-axis means nucleotide diversity (Pi). Genome regions are color-coded as protein-coding, rRNA coding, tRNA coding, or conserved noncoding sequences.
bootstrap (BSML ¼ 100) (Figure 3). Our results indicated that the IR region was more conservative than the SSC and LSC regions. Almost all the genetic variability was concentrated in the intergenic or noncoding region, corroborating the extremely low resolution among closely related fir species in previous phylogenetic studies ( Figure 2) (Xiang et al. 2004(Xiang et al. , 2009. In the coding regions, only ycf1 and ycf2 were characterized by a considerable variation between A. ernestii and A. chensiensis (Figure 2). ycf1 had been proposed as the most variable coding region that was better than the existing plastid barcodes and could serve as a DNA barcode (Dong et al. 2015). Unfortunately, the cp markers widely used in the Abies phylogenetic studies, for example, matK, rpl16, rps12-rpl20, rps18, trnC-D, trnS-G, and trnT-F, were highly conservative among the three closely related species (Xiang et al. 2018).

Phylogenetic analysis
To infer the phylogenetic status and evolutionary relationship of A. ernestii, we selected 16 published cp genomes of fir species, with Juniperus squamata as an outgroup. Based on the best ML phylogram, 17 Abies species were supported as one monophyletic lineage (BS ML ¼ 100) (Figure 3). Abies balsamea (L.) Mill. was distributed in North America and formed a sister lineage with the firs from East Asia (BS ML ¼ 100).
Among the East Asia firs, Abies beshanzuensis M. H. Wu was nested with others (BS ML ¼ 97). Abies ernestii, A. chensiensis, and A. fargesii clustered together as a monophyletic lineage, with an extremely high bootstrap (BS ML ¼ 100). Our results further indicated that A. ernestii and A. chensiensis were separated with an extremely high bootstrap (BS ML ¼ 100); thus, their current species status was well corroborated. The treatments of A. ernestii as a variety of A. chensiensis, or an identical subspecies of A. chensiensis were rejected (Handel-Mazzetti 1929;Dallimore and Jackson 1966). Such a robust phylogenetic relationship between A. ernestii and A. chensiensis has not been revealed previously (Xiang et al. 2004;Jaramillo-Correa et al. 2008;Semerikova et al. 2011Semerikova et al. , 2018.
This study provides new insights into the species delimitation and phylogenetic relationship in the genus Abies. Our results validated the reliability of using cp genome data to delimitate problematic groups at low taxonomic levels. These data provided important genetic resources for these ecologically important Abies species and further cp genome evolution research.

Ethical approval
All procedures in this study involving plants followed the Plant Use and Collection guidelines of Henan Agriculture University and the Institute of Botany, Chinese Academy of Sciences (IBCAS).

Author contributions
Yi-Zhen Shao, Yun Chen, and Qian Wen contributed to the conception and design of the study, analysis, and interpretation of the data. Yi-Zhen Shao and Zhi-Yuan Shi wrote the first version of the manuscript. Zhao Wang, Wei Wang, Yun Chen, and Qian Wen critically reviewed and modified the article regarding its intellectual content. All authors read, discussed, and approved the final version, and all authors agree to be accountable for all aspects of the work.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The genome sequence data supporting this study's findings are available in GenBank of NCBI at (https://www.ncbi.nlm.nih.gov/) under accession no. MH706707. The associated BioProject, SRA, and Bio-Sample numbers are PRJNA790664, SRP351570, and SAMN24219951, respectively.