Complete chloroplast genomes of wild and cultivated Cryptomeria japonica var. sinensis

Abstract The tree Cryptomeria japonica var. sinensis is native to China and is an important forest species widely used for wood production. Here, we sequenced the complete chloroplast (cp) genomes of six wild and six cultivated accessions of this tree. The 12 cp genomes ranged from 131,379 to 131,528 bp. The GC content was 35.4%, similar to that of other gymnosperm species. The cp genomes lacked typical inverted repeat (IR) regions and encoded 118 genes. Most genes occurred as a single copy and 17 genes contained introns. Two multi-copy genes (trnM-CAU × 3, trnQ-UUG × 2) were identified. In addition, 59–61 simple sequence repeats (SSRs) were identified across the whole cp genomes, and most SSR loci consisted of A or T bases. Phylogenetic analysis indicated that wild and cultivated accessions were not clearly differentiated. Our results will provide useful information for the conservation and utilization of this variety. Supplemental data for this article is available online at https://doi.org/10.1080/13102818.2021.1932592 .


Introduction
The chloroplast (cp) is the most important organelle in green plants as it is the place where photosynthesis and carbon fixation occur. Compared with the nuclear genome, the cp genome is more conserved in terms of gene structure and composition, which is advantageous for the study of taxa at higher taxonomic levels [1,2]. Moreover, the cp genome does not recombine and is uniparentally inherited [3,4], which can elucidate the history and evolution of plant populations [5]. Complete cp genome sequences were first reported for tobacco [6] and liverwort [7] in 1986. In recent years, the rapid progress of next-generation sequencing has enabled us to better understand the molecular and genomic characteristics of cp genomes [8][9][10].
The cp genomes of higher plants are circular molecules ranging in size from 100 to 200 kb [11]. In angiosperms, the cp genome contains two identical inverted repeat sequences (IRA, IRB) that divide the genome into large (LSC) and small single copy (SSC) regions. It is believed that large IRs can help stabilize the cp genome [12]. The relative size of this typical quadripartite structure remains constant; the gene order and organization are highly conserved [13,14].
However, in gymnosperms, the cp genome of most coniferous species lacks the large IRs, which may lead to more gene loss and structural rearrangement [15][16][17].
C. japonica var. sinensis, also called Liushan, is a variety of C. japonica native to southeast China. It is one of the most important plantation species and is widely used for commercial timber. In this study, we sequenced the cp genomes of six wild and six cultivated Liushan trees using a next-generation sequencing platform. We compared the gene content, genome structure, SSR information, and intraspecific variation among the cp genomes of these 12 accessions. Phylogenetic analysis was also performed to understand the position of C. japonica var. sinensis within the Cupressaceae family.

DNA sequencing and genome assembly
Twelve C. japonica var. sinensis accessions were selected: six wild trees from Tianmu Mountain (119°26'08.83''E, 30°20'17.17''N) and six cultivated trees from the Xiapu seed orchard (119°56'13.85''E, 26°51'58.99''N) in China. Fresh leaves of each tree were sampled for total DNA extraction using a modified CTAB protocol [18]. Five micrograms of purified DNA was used to construct short-insert libraries (average 450 bp) according to the Illumina standard protocol. Genome sequencing was performed using Illumina HiSeq high-throughput sequencing technology. The raw data were filtered to obtain high-quality reads by removing adapters and low-quality sequences using the NGS QC Toolkit v2.3.3 [19]. The complete cp genome of C. japonica (Accession: AP010967) was used as a reference sequence for splicing, assembly and annotation with SPAdes 3.9.0 software [20].

Genome analysis and annotation
To assess the levels of genomic variation between wild and cultivated trees, parsimony informative sites and nucleotide diversity were calculated using DnaSP version 6.1 [21]. The step size was set to 200 bp, with a 600-bp window length. The Dual Organellar GenoMe Annotator (DOGMA) [22] was used for genome annotation based on comparisons of homologous genes with other conifer cp genomes. The UGENE ORFs finder tool was used to predict open reading frames (ORFs) in the DNA sequences. The tRNA genes were confirmed by tRNAscan-SE version 1.21 [23] with default settings. The rRNA genes were verified using the RNAmmer 1.2 server [24]. A circular gene map of each cp genome was drawn with the online Organellar Genome Draw program (OGDraw) [25].
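The sliding-window calculation described above (600-bp window, 200-bp step) can be illustrated with a minimal sketch. This is not the DnaSP implementation: it assumes the sequences are already aligned to equal length, treats nucleotide diversity simply as the mean number of pairwise differences per window site, and does not exclude gapped sites from the denominator as a dedicated tool might. The function names `pairwise_diff`, `nucleotide_diversity` and `sliding_pi` are hypothetical.

```python
from itertools import combinations

VALID = set("ACGT")

def pairwise_diff(a, b):
    """Count aligned sites where two sequences differ, skipping gaps and ambiguity codes."""
    return sum(1 for x, y in zip(a, b) if x in VALID and y in VALID and x != y)

def nucleotide_diversity(seqs):
    """Mean pairwise differences per site for equal-length aligned sequences (simplified pi)."""
    length = len(seqs[0])
    if len(seqs) < 2 or length == 0:
        return 0.0
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_diff(a, b) for a, b in pairs)
    return total / (len(pairs) * length)

def sliding_pi(seqs, window=600, step=200):
    """Return (start, end, pi) per window, using 1-based inclusive coordinates."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + 1, start + len(chunk[0]), nucleotide_diversity(chunk)))
    return results
```

Called on a list of aligned sequences, `sliding_pi(seqs)` yields one tuple per window, so that windows with elevated diversity can be located along the genome.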

Phylogenetic analysis
Intraspecific phylogenetic analysis was performed using the 12 C. japonica var. sinensis accessions and C. japonica, based on a data matrix of 82 shared protein-coding genes. Furthermore, an interspecific phylogenetic analysis of 13 Cupressaceae species was performed with 71 common protein-coding genes, using Cunninghamia lanceolata as the outgroup. The phylogenetic trees were constructed by maximum likelihood (ML) in MEGA X [26]. The bootstrap support of each branch was calculated with 1000 replicates. Bootstrap values are shown only for nodes with greater than 50% support.

Chloroplast SSR identification
Simple sequence repeats (SSRs) were detected using the MISA Perl script [27]. The minimum number of repeat units was set to 10 for mononucleotides, 6 for dinucleotides and 5 for tri- to hexanucleotides.
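As an illustration of how these thresholds act on a sequence, the following minimal sketch scans for perfect repeats using regular expressions. It is not the MISA script itself; it is a simplified stand-in that assumes an unambiguous A/C/G/T sequence, reports only perfect (uninterrupted) repeats, and skips motifs that are themselves repetitions of a shorter unit. The names `MIN_REPEATS`, `is_primitive` and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts per motif length, as stated in the text:
# 10 for mono-, 6 for di-, 5 for tri- to hexanucleotides.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples with 1-based coordinates."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start() + 1, m.end(), motif, len(m.group(0)) // unit_len))
                pos = m.end()
            else:
                # Motif such as 'AA' is already covered by a shorter unit; move on.
                pos = m.start() + 1
    return sorted(hits)
```

Counting the returned tuples by motif length gives the distribution of mono- to hexanucleotide repeats, and the coordinates can be intersected with gene annotations to separate intergenic from coding SSRs.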
Among the total genes in the cp genome, 112 were single-copy genes, and two tRNA genes (trnM-CAU × 3, trnQ-UUG × 2) were multi-copy. Introns can regulate the transcription rate of genes and play an important role in gene structure and function [31]. A total of 17 single-copy genes contained introns (Figure 1), including 11 protein-coding genes (rps12, rps16, rpl2, rpl16, rpoC1, petB, petD, atpF, ndhA, ndhB and ycf3) and six tRNA genes (trnA-UGC, trnI-AAU, trnI-AUC, trnK-AAA, trnL-UAA and trnS-CGA). Among the protein-coding genes, rps12 was identified as a trans-spliced gene. The distance between the 5'rps12 and 3'rps12 genes was 38.9 kb. The tRNA genes are among the most important and versatile molecules responsible for maintaining the protein translation machinery [32,33]. C. japonica var. sinensis has a higher number of tRNA Met and tRNA Ser genes, and a lower number of tRNA Gly, tRNA Ile, tRNA Thr and tRNA Val genes (Table 3). The tRNA Met species, including initiator tRNA fMet and elongator tRNA Met, is a major player in giving rise to other tRNAs [34,35]. We found three copies of tRNA Met in the cp genome, which differs from C. japonica, T. distichum and G. pensilis [28][29][30]. Four ycf genes (ycf1 to ycf4) were also identified in the cp genome, but the clpP gene was absent (Figure 1).

Chloroplast genome variation between wild and cultivated accessions
To investigate levels of cp sequence divergence between wild and cultivated trees, the nucleotide variation of the 12 cp genomes was determined. The results showed that wild and cultivated accessions possessed the same level of nucleotide variation (0.00003) (Table 4). We identified 11 and 10 mutation sites in the cp genomes of wild and cultivated trees, respectively. Using the C. japonica cp genome (AP010967) as a reference, we identified a total of 29 and 28 InDels as well as 16 and 14 single-nucleotide polymorphisms (SNPs) (A/T) in wild and cultivated trees, respectively (Table 5). The trnL-ycf1 spacer had the highest number of InDels (10), and the largest InDel (198 bp) was found in ycf1, which is thought to be involved in cellular metabolism or to play a structural role in plastids [36].
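The tallying of SNPs and InDels against a reference can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes a pre-computed pairwise alignment of equal length in which '-' marks a gap, counts each uninterrupted gap run as one InDel event, and ignores ambiguity codes. The function name `count_variants` is hypothetical.

```python
def count_variants(ref_aln, query_aln):
    """Return (snp_count, indel_lengths) for two aligned sequences ('-' = gap)."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    snps = 0
    indel_lengths = []   # one entry per contiguous gap run
    run = 0              # length of the gap run currently being extended
    run_kind = None      # "ins" (gap in reference), "del" (gap in query) or None
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r == "-" and q == "-":
            continue     # column gapped in both sequences carries no information
        kind = "ins" if r == "-" else "del" if q == "-" else None
        if run_kind is not None and kind != run_kind:
            indel_lengths.append(run)   # previous gap run has ended
            run = 0
        run_kind = kind
        if kind is not None:
            run += 1
        elif r != q and r in "ACGT" and q in "ACGT":
            snps += 1
    if run_kind is not None:
        indel_lengths.append(run)       # gap run reaching the alignment end
    return snps, indel_lengths
```

With the returned list, `len(indel_lengths)` gives the number of InDel events and `max(indel_lengths)` the size of the largest one, which is how a figure such as the 198-bp InDel in ycf1 would be read off an alignment.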
Intraspecific phylogenetic analysis indicated that C. japonica var. sinensis and C. japonica trees did not fall into separate clades (Figure 2A). To determine the evolutionary relationships of C. japonica var. sinensis, we included 13 Cupressaceae species in a phylogenetic analysis based on 71 common protein-coding genes. The results showed that wild and cultivated C. japonica var. sinensis trees formed a clade, which was sister to C. japonica with 100% bootstrap support. C. japonica var. sinensis and C. japonica had a close genetic relationship with Thujopsis dolabrata (Figure 2B).

Repeat sequences in chloroplast genome
Chloroplast simple sequence repeats (cpSSRs) are used to investigate levels of genetic diversity [37][38][39]. In total, we detected 59–61 SSRs with a length ≥10 bp. Most of the repeats were located in intergenic regions and only some in protein-coding sequences (Figure 3, Supplemental file 2). This supports previous reports that SSR frequency varies between different regions of the genome [40,41]. Mononucleotide repeats were the most abundant SSRs, whereas no tetranucleotide repeats were found (Figure 3). Almost all SSRs were composed of A or T. These SSRs can serve as useful molecular markers to explore population genetic structure and domestication events.

Conclusions
In this study, we reported 12 C. japonica var. sinensis cp genomes from wild and cultivated trees obtained by de novo sequencing. The structure of the cp genome showed the partial loss of one IR copy, which is a common feature of gymnosperm cp genomes. Sequence comparison showed that the wild and cultivated trees possessed the same level of nucleotide variation. C. japonica var. sinensis had a close genetic relationship with T. dolabrata. Our study will be helpful for conserving this important timber species and for further studies.

Acknowledgment
We are grateful to Dr. Markus Ruhsam and Dr. Berthold Heinze for checking the English and for their valuable suggestions.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by the National Key R&D Program of China (2016YFE0127200).