Complete chloroplast genome of a rare and endangered plant species Osteomeles subrotunda: genomic features and phylogenetic relationships with other Rosaceae plants

Abstract
Osteomeles subrotunda is a rare and endangered plant species with extremely small populations. In this study, we sequenced the complete chloroplast (CP) genome of O. subrotunda, described its structural organization, and performed comparative genomic analyses with other Rosaceae CP genomes. The plastome of O. subrotunda was 159,902 bp in length with 36.6% GC content and contained a pair of inverted repeats (IRs) of 26,367 bp each, which separated a large single-copy (LSC) region of 87,933 bp from a small single-copy (SSC) region of 19,235 bp. The CP genome included 130 genes, of which 85 were protein-coding genes, 37 were transfer RNAs, and eight were ribosomal RNAs. Two genes, rps19 and ycf1, which are located at the borders of IRB/SSC and IRB/LSC, were presumed to be pseudogenes. A total of 61 simple sequence repeats (SSRs) were detected, of which 59 were mono-nucleotide repeats and two were di-nucleotide repeats. The phylogenetic analysis indicated that the 14 Rosaceae species were divided into three groups, among which O. subrotunda grouped with P. rupicola, E. japonica, P. pashia, C. japonica, S. torminalis, and M. florentina, and it was found to be sister to C. japonica. Our newly sequenced CP genome of O. subrotunda will provide essential data for further studies on population genetics and biodiversity.


Introduction
Rosaceae, the rose family, comprises over 100 genera and 3000 species and is a medium-sized flowering plant family with significant economic and scientific importance (Jung and Main 2014). The Rosaceae family contains a considerable number of fruit plants, ornamentals, and herbs, which greatly benefit people all over the world (Wu et al. 2003). Osteomeles is a small genus in the rose family, comprising only about five species, including O. subrotunda, O. anthyllidifolia, and O. schwerinae, which are native to eastern Asia and Polynesia (Hsieh and Chaw 1996). There are very few reports on Osteomeles plants, and most of them focus on O. schwerinae, a traditional medicinal plant widely used in Asian countries. Lee et al. (2010) isolated two flavonol glucosides, hyperoside and quercitrin, from O. schwerinae by semi-preparative high-speed counter-current chromatography. Sohn et al. (2018) reported that O. schwerinae extract inhibited the accumulation of extracellular matrix and the proliferation of rat glomerular mesangial cells. O. subrotunda is distributed only in the Ryukyu Islands in Japan and Renhua in China (Wu et al. 2003). In recent years, new populations were found on several islands in Zhejiang Province, China, and according to our long-term investigation, the total number of plants is fewer than 300. O. subrotunda is now listed as a plant species with extremely small populations as well as a key protected wild plant in Zhejiang Province.
Chloroplasts (CPs) are essential organelles that carry out photosynthesis to produce sugars and oxygen in eukaryotic plant cells; moreover, they synthesize fatty acids, terpenes, and amino acids for multiple functions (Waters and Langdale 2009; Dorrell and Howe 2012; Shi and Theg 2013). Like a mitochondrion, a CP possesses its own genome, which is organized into a single circular chromosome (Martin et al. 2013; Allen 2015). Compared with nuclear genomes, the CP genomes of land plants are more conserved in structural organization, containing a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeat (IR) regions (Daniell et al. 2016). In higher plants, CP genomes range from 70 to 220 kb in size and contain 110-130 genes, including transfer RNA (tRNA), ribosomal RNA (rRNA), and protein-coding genes (Whittall et al. 2010; Dong et al. 2013).
In recent years, the complete CP genomes of several Rosaceae plants have been sequenced and characterized, including Rosa praelucens, Fragaria × ananassa (Cheng et al. 2017), and Hagenia abyssinica (Gichira et al. 2017). However, the CP genome sequence of Osteomeles has remained uncharacterized. In the current study, we report the complete CP genome sequence of O. subrotunda, an endangered Rosaceae plant in China, analyze its characteristics, and perform a comparative analysis against 13 other Rosaceae species. The completely sequenced CP genome of O. subrotunda will provide a valuable resource for population genetics and biodiversity studies in the future.

Plant material and DNA extraction
Fresh leaves of O. subrotunda were collected from Toumen Island (28°41.132′N, 121°46.502′E), Zhejiang Province, China. A voucher specimen coded CHS2017108 was deposited at the Molecular Biology Laboratory in Taizhou University. Approximately 0.5 g of leaves was ground into a fine powder in liquid nitrogen using a sterile pestle and mortar, and genomic DNA was extracted following the cetyltrimethylammonium bromide (CTAB) protocol (Doyle and Doyle 1987).

DNA sequencing and sequence assembly
A DNA library was prepared and sequenced on an Illumina HiSeq X Ten system (Illumina, San Diego, CA, USA). Approximately 3.6 Gb of raw data consisting of 150 bp paired-end reads were obtained and then filtered with NGS QC Toolkit v2.3.3 to obtain clean reads (Patel and Jain 2012). NOVOPlasty was used to assemble the plastome (Dierckxsens et al. 2017).

Simple sequence repeat analysis
The MIcroSAtellite Identification Tool (MISA) was used to identify simple sequence repeats (SSRs) in the complete CP genome of O. subrotunda (Thiel et al. 2003). For SSR detection, nucleotide motifs of 1-6 bp were considered, and the minimum number of repeat units was set as follows: 10 for mono-nucleotides, six for di-nucleotides, and five for tri-, tetra-, penta-, and hexa-nucleotides.
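The thresholds above can be illustrated with a minimal Python sketch. It is not MISA itself and does not reproduce MISA's output format or its handling of compound SSRs; the function names `find_ssrs` and `is_primitive` and the toy test sequence are assumptions made here for illustration only.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text:
# 10 for mono-, 6 for di-, and 5 for tri- to hexa-nucleotide motifs.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """Return True if the motif is not a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    start is a 0-based offset into the sequence. Motifs such as 'AA' or
    'ATAT' are skipped so that a run is reported only under its shortest
    repeating unit.
    """
    seq = sequence.upper()
    hits = []
    for motif_len, n_min in sorted(min_repeats.items()):
        # One captured motif followed by at least (n_min - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, n_min - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue
            repeat_count = (match.end() - match.start()) // motif_len
            hits.append((match.start(), motif, repeat_count))
    return sorted(hits)


# Toy sequence (hypothetical, not from the O. subrotunda plastome):
# an (A)10 run, an (AT)6 run, and a (T)9 run that is below the threshold.
toy = "GC" + "A" * 10 + "GCGC" + "AT" * 6 + "CCG" + "T" * 9
ssrs = find_ssrs(toy)
```

In this toy case only the (A)10 and (AT)6 runs meet the thresholds; the (T)9 run is one unit short of the mono-nucleotide minimum and is not reported.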

Phylogenetic analysis
The complete CP genome sequence of Glycine falcata (NC_021649.1) was retrieved from NCBI and served as an outgroup. The genome sequences were aligned using MAFFT v7.388 (Katoh and Standley 2013). The best-fit DNA substitution model for maximum-likelihood (ML) analysis was selected with jModelTest 2.1.9 under the Akaike information criterion (AIC) (Darriba et al. 2012). An ML tree was constructed with PhyML 3.1 using the best-fit model, GTR + G + I, with 1000 bootstrap replicates (Guindon et al. 2010).
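Model selection under the AIC amounts to choosing the candidate with the smallest value of AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood. The sketch below shows only this comparison step; it is not jModelTest, and the model names, log-likelihoods, and parameter counts are made-up placeholders rather than results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(candidates):
    """Return the name of the candidate with the lowest AIC.

    candidates maps a model name to a tuple of
    (maximized log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical placeholder values for illustration only.
example = {
    "model_A": (-1050.0, 1),
    "model_B": (-1010.0, 5),
    "model_C": (-1000.0, 10),
}
selected = best_model(example)
```

With these placeholder values, the more heavily parameterized model is selected because its gain in log-likelihood outweighs the penalty of 2 per additional parameter.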

Genome sequencing and assembly
Overall, we obtained 11,924,123 clean reads (150 bp in average length) after removing low-quality reads from the raw data, and de novo assembly with NOVOPlasty produced a single circular contig. The complete CP genome of O. subrotunda was 159,902 bp in length and exhibited a typical quadripartite structure, including an LSC region (87,933 bp), an SSC region (19,235 bp), and two copies of the IR (26,367 bp each). The overall GC content of the O. subrotunda CP genome was 36.6%, and it was unevenly distributed across the four regions; GC contents of intergenic regions and introns were lower. The highest GC content was observed in the IRs (42.7%), followed by the LSC region (34.3%) and the SSC region (30.4%) (Figure 1).
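The reported figures are internally consistent, as the short check below illustrates: the region lengths sum to the genome length, and the length-weighted mean of the regional GC contents recovers the overall value. The region lengths and percentages are taken from the text; the helper `gc_content` is an assumption added here to show how GC content would be computed from a sequence string.

```python
# Region lengths in bp and regional GC contents in %, as reported in the text.
LSC, SSC, IR = 87_933, 19_235, 26_367
GC_PERCENT = {"LSC": 34.3, "SSC": 30.4, "IR": 42.7}

# Quadripartite structure: one LSC, one SSC, and two IR copies.
total_length = LSC + SSC + 2 * IR

# Length-weighted mean of the regional GC contents.
weighted_gc = (
    LSC * GC_PERCENT["LSC"] + SSC * GC_PERCENT["SSC"] + 2 * IR * GC_PERCENT["IR"]
) / total_length


def gc_content(seq):
    """Return the GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


print(total_length)           # 159902
print(round(weighted_gc, 1))  # 36.6
```

Because the regional percentages are rounded to one decimal place, the weighted mean agrees with the reported 36.6% only to that precision.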

IR expansion and contraction
Compared with the other 13 Rosaceae CP genome sequences, the IR of O. subrotunda was shorter than those of P. maximowiczii (26,436 bp), S. torminalis (26,416 bp), P. pashia (26,386 bp), and M. florentina (26,381 bp), but longer than those of the remaining CP genomes (25,311-26,326 bp) (Table S1). Structural variations were observed at the LSC/IR/SSC boundaries. In the O. subrotunda CP genome, the IRA/SSC and IRB/SSC borders were located within the upstream regions of the two ycf1 genes; the ycf1 at the IRA/SSC border was a complete gene with a length of 5628 bp, while the other ycf1, at the IRB/SSC border, was 1089 bp in length. The same phenomenon was found for the two rps19 genes. The LSC/IRB border was located within the complete coding region of the rps19 gene, which was 279 bp in length, while the rps19 copy at the LSC/IRA border proved to be a pseudogene with its 3′ region truncated (Figure 2). Expansions or contractions of IR regions were observed among the 14 Rosaceae CP genome sequences. For all species, the IRB/SSC border was in the 3′ region of ycf1 and created a ycf1 pseudogene with a length of 1013 (P. utilis) to 1152 bp (G. rupestre). For the P. rupicola, P. utilis, P. maximowiczii, P. pashia, C. japonica, E. japonica, M. florentina, S. torminalis, and O. subrotunda CP genomes, the LSC/IRB border was located within the coding sequence of rps19, while for D. fruticosa, F. chiloensis, G. rupestre, and R. takesimensis, the LSC/IRB border was located in the intergenic region between rps19 and rpl2. In the R. chinensis var. spontanea CP genome sequence, the LSC/IRB border was located within rpl12. The SSC/IRA borders were located within the ycf1 genes except in the C. japonica CP genome, in which a 3′-truncated ycf1 pseudogene with a length of 2298 bp was produced by the border (Figure 2). The ndhF genes of P. rupicola, P. utilis, P. maximowiczii, P. pashia, C. japonica, E. japonica, M. florentina, and S. torminalis were extended and overlapped with the ycf1 pseudogene.

Figure 1. The chloroplast genome of Osteomeles subrotunda. From the center going outward, the four circles indicate scattered forward and reverse repeats, tandem repeats, microsatellite sequences identified, and gene structure of the plastome.

Table 1. List of genes in the chloroplast genome of Osteomeles subrotunda.
Group of genes | Name of genes | Total number
Large subunit of ribosomal proteins | rpl2 (×2)ᵃ, rpl14, rpl16ᵃ, rpl20, rpl22, rpl23 (×2), rpl32, rpl33, rpl36 | 11
Small subunit of ribosomal proteins | rps2, rps3, rps4, rps7 (×2), rps8, rps11, rps12ᵇ, rps14, rps15, rps16ᵃ, rps18, rps19, ψrps19 |

SSR analysis
A total of 61 SSRs were detected in the plastid genome of O. subrotunda. Of these, 59 were mono-nucleotide repeats and two were di-nucleotide repeats (Table S2). Among the 59 mono-nucleotide repeats, 26 A stretches, one C stretch, and 32 T stretches were identified; no G stretch was found. The two di-nucleotide repeats were both AT stretches, one with six repeat units and the other with seven. The length of the SSRs ranged from 10 to 20 bp, and most of them were located in intergenic or intron regions. Seven genes, ycf1, atpH, trnI-GAU, trnG-GCC, rpoC2, rpoB, and atpB, contained one or two SSRs. Only two SSRs were located in the IR regions, one in IRA and one in IRB.

Phylogenetic analysis
jModelTest was used for the statistical selection of the best model of nucleotide substitution, and the best-fit model was determined to be GTR + G + I. To determine the phylogenetic position of O. subrotunda, we constructed a phylogenetic tree from the whole CP genomes of 14 species in the family Rosaceae, with G. falcata as an outgroup (Figure 3). Seven nodes received 100% bootstrap support, and two nodes had bootstrap values of more than 95%. The phylogenetic tree showed that the 14 Rosaceae species were divided into three groups, in which G. rupestre, R. takesimensis, R. chinensis var. spontanea, F. chiloensis, and D. fruticosa in subfamily Rosoideae comprised one group, and P. utilis and P. maximowiczii in subfamily Prunoideae formed another group. O. subrotunda grouped with P. rupicola, E. japonica, P. pashia, C. japonica, S. torminalis, and M. florentina, species belonging to subfamily Maloideae, and O. subrotunda was found to be sister to C. japonica with a relatively low bootstrap value of 67.

Discussion
Based on our previous long-term investigation, O. subrotunda is a wild plant species with tiny populations. To protect this rare species, it was listed as a key protected plant in Zhejiang Province. To our knowledge, there have been no previous reports on this plant species. The rapid development of modern sequencing platforms in recent years has significantly facilitated CP genome research. Currently, more than 2000 CP genomes have been deposited in GenBank (https://www.ncbi.nlm.nih.gov/genome/browse#!/organelles/). However, there are only 45 Rosaceae CP genomes in GenBank, which mainly include Prunus (13), Fragaria (10), Malus (6), and Rosa (6), while Osteomeles is not on the list. In this study, we sequenced and assembled the CP genome of O. subrotunda. Typically, CP genomes in angiosperms have a conserved quadripartite circular structure, including two copies of the IR separated by an LSC and an SSC (Jansen et al. 2005; Du et al. 2017). Our results indicated that the complete CP genome of O. subrotunda was 159,902 bp in length and had a typical quadripartite structure. The full length of the O. subrotunda CP genome is within the range of those of other Rosaceae plants (Jansen et al. 2011; Wang et al. 2013; Bao et al. 2016). Similar to other known CP genomes in Rosaceae, the CP genome of O. subrotunda was AT-rich, with an AT content lower than that of R. chinensis var. spontanea but higher than that of Sorbus torminalis. SSR markers are tandemly repeated nucleotide sequences consisting of copies of mono-, di-, tri-, or tetranucleotide motifs flanked by unique sequences (McCouch et al. 1997). SSR sequences are distributed throughout nuclear genomes, and they are widely used in genetic diversity analysis, linkage mapping studies, and marker-assisted breeding (Kaur et al. 2015; Almontero and Espino 2016; Tian et al. 2017; Ahmad et al. 2018; Chao et al. 2018).
Likewise, CP genome sequences have been found to contain SSRs, and because of their high polymorphism they have been increasingly used in studies of population genetic structure and evolution (Kikuchi et al. 2013; Wheeler et al. 2014; Fu et al. 2016; Takahashi et al. 2018). The Musa balbisiana CP genome possesses 59 SSRs; although mono-nucleotides are the most common (29 SSRs, accounting for 49.15% of the total), it also includes di-, tri-, tetra-, penta-, and hexa-nucleotides (Shetty et al. 2016). In the strawberry CP genome, 61 SSR loci were detected, of which 38 are mono-nucleotide repeats, 16 are di-nucleotide repeats, three are tri-nucleotide repeats, and four are tetra-nucleotide repeats (Cheng et al. 2017). In the CP genome of O. subrotunda, a total of 61 SSRs were also identified; however, there were only two types of repeats, mono- and di-nucleotide repeats.
Generally, gene content, gene order, and genome structure in CP genomes are highly conserved in flowering plants; however, gene loss, pseudogenization, and IR expansion or contraction are also common during CP genome evolution (Wicke et al. 2011; Daniell et al. 2016). For example, in the plastids of some hemiparasitic plants, ndhC is missing in both Striga hermonthica and S. aspera, while ndhF is missing only in Buchnera americana (Frailey et al. 2018). In three Actinidiaceae plants, Actinidia polygama, A. tetramera, and Clematoclethra lanosa, the clpP gene was absent from the CP genome (Wang et al. 2016). However, in our present study, gene loss was not observed. The expansion and contraction of IRs account for CP genome size variations and gene pseudogenization (Ni et al. 2016; Wang et al. 2016). The IRs underwent both expansion and contraction during evolution in the Rosaceae family, among which the D. fruticosa CP genome showed the shortest IR region (25,311 bp), while P. maximowiczii had the longest IR sequence (26,436 bp). Pseudogenization commonly occurs in parasitic, semi-parasitic, and non-parasitic plant species (Raman and Park 2015; Daniell et al. 2016; Gruzdev et al. 2016; Shin and Lee 2018). In the Dendrotrophe varians CP genome, infA was found to be a pseudogene containing a premature stop codon; moreover, at the LSC-IRA junction, rps19 was truncated at its 3′ end (Shin and Lee 2018). The ndhH gene of Dendrocalamus latiflorus crossed the IRA/SSC boundary, creating an incomplete copy of the ndhH gene (Wu et al. 2015). In our present study, the ycf1 located at the IRB/SSC border was a pseudogene with its 3′ region truncated, and the other ycf1 gene of C. japonica, at the SSC/IRA border, also proved to be a pseudogene.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The data that support the findings of this study are openly available in GenBank of NCBI at https://www.ncbi.nlm.nih.gov/nuccore/MK977586. The associated BioProject, SRA, and BioSample numbers are PRJNA685556, SRR13275017, and SAMN17087660, respectively.