The complete chloroplast genome of Photinia davidsoniae: molecular structures and comparative analysis

Abstract Photinia davidsoniae is a common ornamental tree in the genus Photinia (family Rosaceae). Here, we sequenced and assembled the complete plastome of P. davidsoniae using next-generation DNA sequencing technology and compared it with the plastomes of nine other Photinia species using a range of bioinformatics tools. The ten plastomes ranged in size from 159,230 bp for P. beckii to 160,346 bp for P. davidsoniae. They all had a conserved quadripartite structure comprising a large single-copy (LSC) region, a small single-copy (SSC) region, and a pair of inverted repeat (IR) regions. Each of the plastomes encoded 113 unique genes, including 79 protein-coding genes, four rRNA genes, and 30 tRNA genes. Furthermore, we detected six hypervariable regions (matK-rps16, rpoB-trnC, trnT-psbD, ndhC-trnV, psbE-petL, ndhF-rpl32-trnL), which could be used as potential molecular markers. We constructed two phylogenetic trees, one from complete plastomes and one from concatenated protein sequences, for 25 species from 8 genera of Rosaceae. The tree constructed with complete plastomes had much stronger support. The results placed P. davidsoniae in the upper part of the phylogenetic tree and showed that P. davidsoniae and P. lanuginosa are closely related. In summary, the plastomes of Photinia are conserved overall but carry minor variations, as expected. The results will be valuable for distinguishing species, understanding interspecific diversity, and elucidating the evolutionary processes of Photinia species.


Introduction
The genus Photinia belongs to Maleae (Rosaceae) and comprises approximately 60 species (Robertson et al. 1991; Lu and Spongberg 2003). They are widespread landscape trees that tolerate pruning and air pollution (Mattei et al. 2017; Mori et al. 2018), and many are cultivated for gardening (Zhao et al. 2020). Photinia davidsoniae Rehder & E.H.Wilson (hereafter P. davidsoniae) is an evergreen species that grows in thickets at altitudes of 600-1000 m and is mainly distributed in southern China and Southeast Asia (Lu and Spongberg 2003). Like many other species of Photinia, P. davidsoniae has luxuriant foliage around the trunk, tender purple leaves in early spring, and small white flowers in early summer, and bears red fruits in autumn (Sterling 1965; Aoki et al. 2006; Mattei et al. 2018). Photinia species exhibit similar morphological features, and the species boundaries remain unclear. With the continuing discovery of new Photinia species (Guo et al. 2010; Li et al. 2015), a reliable classification of Photinia is urgently needed.
Chloroplast genomes (hereafter plastomes) have been widely used in plant taxonomy. Compared with morphological identification, plastome sequences can produce more accurate phylogenetic relationships. Recently, the confusion in the taxonomy of Photinia-related species has been largely resolved based on complete plastome sequences (Shi et al. 2019). In particular, phylogenetic analysis of the Photinia-related species supports the recognition of a new genus, Phippsiomeles, and the resurrection of a redefined Stranvaesia in Maleae. Furthermore, Eriobotrya was found to belong to Rhaphiolepis based on plastomes and ribosomal DNA. However, these studies focused only on the phylogenetic relationships among Photinia and its related genera. A comparative analysis of Photinia plastomes has not been conducted extensively.
Here, we sequenced and assembled the complete plastome of P. davidsoniae for the first time and then compared it with the plastomes of nine published Photinia species to explore the interspecific diversity of Photinia plastomes.

Plant material, DNA extraction, and sequencing
We collected fresh leaves of P. davidsoniae from the Central China Medicinal Botanical Garden, EnShi, China (30°10′N, 109°44′E) and froze them at −80 °C. We used the plant genomic DNA kit (Tiangen Biotech, Beijing) to extract the total DNA following the manufacturer's protocol (Zhang, Li, et al. 2019). The DNA library with an insert size of 350 bp was constructed using the library preparation kit (New England Biolabs, USA) and sequenced using the Hiseq 2500 platform (Illumina, USA). We removed low-quality reads, defined as those with over 50% of bases having quality values of Q < 19 or those with over 5% of bases being 'N'. We obtained a total of 49,157,518 reads as clean data for further analysis.
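The read-filtering rule above can be illustrated with a minimal Python sketch. It assumes Phred+33 quality encoding and applies the two stated thresholds to a single read; the function names are hypothetical, and this is not the pipeline actually used in the study.

```python
def phred33_to_scores(quality_string):
    """Convert a FASTQ quality string to integer scores (assumes Phred+33)."""
    return [ord(ch) - 33 for ch in quality_string]


def is_low_quality(seq, quals, q_cutoff=19, max_lowq_frac=0.5, max_n_frac=0.05):
    """Return True if a read should be discarded.

    A read is discarded when more than 50% of its bases have Q < 19
    or when more than 5% of its bases are 'N' (thresholds from the text).
    """
    if not seq:
        return True  # assumption: empty reads are dropped
    low_q = sum(1 for q in quals if q < q_cutoff)
    n_count = seq.upper().count("N")
    return (low_q / len(seq) > max_lowq_frac) or (n_count / len(seq) > max_n_frac)


if __name__ == "__main__":
    read = "ACGTACGTAC"
    print(is_low_quality(read, [30] * 10))            # high-quality read is kept
    print(is_low_quality(read, [10] * 6 + [30] * 4))  # 60% low-quality bases
```

Both comparisons are strict, so a read with exactly 50% low-quality bases is kept under this reading of the rule.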

Genome assembly and annotation
We used NOVOPlasty (v2.7.2) (Dierckxsens et al. 2017) to perform de novo genome assembly from the clean data. Bowtie2 (v2.0.1) (Langmead et al. 2009) was used to verify the assembly by mapping all clean reads to the assembled genome sequences. We used CPGAVAS2 (Shi et al. 2019) to annotate the genome and Apollo (Misra and Harris 2006) to manually correct problematic annotations. The simple sequence repeats (SSRs) were identified using the CPGAVAS2 web server, which calls MISA (Beier et al. 2017); the minimum numbers of repeat units for mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSRs were 10, 5, 4, 3, 3, and 3, respectively. Additionally, tandem repeats were detected with the Tandem Repeats Finder program (v4.07b). REPuter (Kurtz et al. 2001) was used to identify palindromic, forward, reverse, and complementary repeats with a Hamming distance of three and a minimal repeat size of 30 bp.
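To make the SSR thresholds concrete, the following Python sketch scans a sequence for perfect repeats using the minimum unit counts given above (10, 5, 4, 3, 3, and 3 for unit lengths 1 to 6). It is a simplified illustration, not MISA's implementation: the function names are hypothetical, compound SSRs are not merged, and hits from different unit lengths may overlap.

```python
import re

# Minimum number of repeat units per unit length (values from the text).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_compound_unit(unit):
    """True if the unit is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    return re.fullmatch(r"(.+?)\1+", unit) is not None


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start_1based, unit, n_units) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for unit_len, min_count in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        pos = 0
        while pos < len(seq):
            m = pattern.match(seq, pos)
            if m and not is_compound_unit(m.group(1)):
                hits.append((m.start() + 1, m.group(1), len(m.group(0)) // unit_len))
                pos = m.end()  # continue after the repeat just reported
            else:
                pos += 1       # no valid repeat starts here; try the next base
    return sorted(hits)


if __name__ == "__main__":
    demo = "GC" + "A" * 10 + "GC" + "AT" * 5 + "GCC" + "A" * 9 + "C"
    for start, unit, count in find_ssrs(demo):
        print(start, unit, count)
```

Units that are themselves repeats of a shorter motif are skipped so that, for example, a run of A is reported once as a mononucleotide SSR rather than again as "AA" repeats.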

Phylogenetic analysis
The plastome sequences of 24 species in the family Rosaceae, including two outgroup species (Rosa rugosa and Sanguisorba officinalis), were downloaded from GenBank (Supplemental Table S1). The complete plastome sequences and the 75 protein sequences common to the 25 species were each aligned using CLUSTALW2 (v2.0.1) (Thompson et al. 2002). These proteins include

Basic features of the plastomes
The plastomes of Photinia are typical circular DNA molecules with total lengths ranging from 159,230 bp (P. beckii) to 160,346 bp (P. davidsoniae). The overall GC content ranged from 36.42% to 36.66%. These plastomes have a conserved quadripartite structure, comprising a large single-copy (LSC; 87,434-88,302 bp) region, a small single-copy (SSC; 19,217-19,361 bp) region, and a pair of inverted repeat (IR; 26,280-26,436 bp) regions (Table 1). The GC content of the IR regions is higher than that of the SSC and LSC regions in all ten Photinia species.

Genome annotation
The genome structures of the ten plastomes are highly conserved. The plastome of P. davidsoniae, for example, contains 113 unique genes. Among them, 79 are protein-coding genes, four are rRNA genes, and 30 are tRNA genes (Figure 1 and Table 2). The total lengths of the protein-coding genes, rRNA genes, and tRNA genes are 77,832 bp, 9048 bp, and 2739 bp, accounting for 48.54%, 5.64%, and 1.71% of the complete plastome sequence, respectively. Introns play a significant role in selective gene splicing (Plangger et al. 2019). Among the 113 unique genes, two (ycf3 and clpP) contained two introns and 13 contained one intron, including eight protein-coding genes (rps16, atpF, rpoC1, petB, rpl22, rpl2, ndhB, ndhA) and five tRNA genes (trnK-UUU, trnS-CGA, trnL-UAA, trnE-UUC, trnA-UGC) (Table S2). We identified six protein-coding genes, four rRNA genes, and seven tRNA genes duplicated in the IR regions. Three genes, namely rps19, ndhF, and ycf1, were found to span the boundaries between the IR and single-copy regions. Their structures are described in the section on the contraction and expansion of the IR regions.
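As a quick arithmetic check, the gene counts and length fractions reported for P. davidsoniae can be recomputed from the figures in the text. The short Python sketch below uses only those reported numbers; the variable and function names are illustrative.

```python
# Values reported in the text for the P. davidsoniae plastome.
PLASTOME_LENGTH_BP = 160_346
GENE_COUNTS = {"protein-coding": 79, "rRNA": 4, "tRNA": 30}
GENE_LENGTHS_BP = {"protein-coding": 77_832, "rRNA": 9_048, "tRNA": 2_739}


def percent_of_plastome(length_bp, total_bp=PLASTOME_LENGTH_BP):
    """Share of the plastome occupied by a feature class, in percent (2 decimals)."""
    return round(100 * length_bp / total_bp, 2)


if __name__ == "__main__":
    print("unique genes:", sum(GENE_COUNTS.values()))
    for gene_class, length in GENE_LENGTHS_BP.items():
        print(gene_class, percent_of_plastome(length), "%")
```

The three gene classes sum to 113 unique genes, and the recomputed fractions match the reported 48.54%, 5.64%, and 1.71%.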

Repeats analysis
In this study, the number of SSRs per plastome ranged from 95 (P. prionophylla) to 105 (P. taishunensis), and we detected a total of 1001 SSR loci in the ten plastome sequences (Figure 2A). It is worth noting that most of the repeat units in Photinia plastomes were A/T repeats, contributing to the high A/T content of these plastomes. Trinucleotide repeats were rare; we observed only one, in P. prunifolia. We also detected 550 tandem repeats in the ten cp genomes using a similarity cutoff of 90% (Figure 2B). The number of tandem repeats ranged from 46 (P. beckii) to 63 (P. prionophylla), and most tandem repeats were ≤ 20 bp long (data not shown). In addition, 237 forward repeats, 167 palindromic repeats, 69 reverse repeats, and seven complementary repeats were detected. Unlike the other nine species, P. beckii had the largest number of reverse repeats and the smallest number of palindromic repeats (Figure 2B).

Contraction and expansion of the IR regions
As plastomes evolve, the IR regions expand and contract, moving some genes into the IR regions or the single-copy regions (Wang et al. 2018). This contraction and expansion is considered the main reason for the length differences among angiosperm plastomes. We compared the IR and single-copy (SC) boundaries of ten Photinia species and species from five related genera (Figure 3). Three genes, rps19, ndhF, and ycf1, were found to span the borders. Most of the rps19 sequence is in the LSC region, and only a small fragment is in the IRb region. The length of this small fragment varies considerably: it is 2 bp in P. taishunensis and P. glabra and 28 bp in P. beckii, whereas it is over 100 bp in the other species.
The ndhF gene is located at the IRb/SSC border and overlaps with the first copy of the ycf1 gene. The large fragment of ndhF is in the SSC region, and the small fragment is in the IRb region. The exceptions are P. integrifolia and P. lochengensis, whose ndhF genes lie entirely within the SSC region. The ycf1 gene has two copies, which span the IRb/SSC and SSC/IRa junctions. The fragments in the IRa and IRb regions are of similar length, about 1000 bp.
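The fragment lengths on either side of a junction follow directly from the gene coordinates and the junction position. The Python sketch below shows one way to compute them. The function name and all coordinates in the example are hypothetical and chosen only so that 2 bp fall beyond the junction, as reported for rps19 in P. taishunensis and P. glabra; they are not taken from the annotated plastomes.

```python
def split_at_junction(gene_start, gene_end, junction):
    """Split a gene into the bp lying before and after a region junction.

    Coordinates are 1-based and inclusive. `junction` is the coordinate of
    the last base of the upstream region (e.g. the last base of the LSC).
    Returns (bp_upstream, bp_downstream).
    """
    upstream = max(0, min(gene_end, junction) - gene_start + 1)
    downstream = max(0, gene_end - max(gene_start, junction + 1) + 1)
    return upstream, downstream


if __name__ == "__main__":
    # Hypothetical gene at 100..378 with a hypothetical junction after base 376.
    print(split_at_junction(100, 378, 376))  # 277 bp upstream, 2 bp downstream
    # A gene that ends before the junction does not span it.
    print(split_at_junction(100, 200, 300))  # 101 bp upstream, 0 bp downstream
```

A gene spans the junction only when both returned values are greater than zero.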

Phylogenetic analysis using plastomes data
Here, we used two data sets to determine the phylogenetic relationships: the common protein sequences (Figure 6A) and the complete plastome sequences (Figure 6B). Both phylogenetic trees contain two main clades, which are further divided into subclades. Clade I included Pourthiaea and Aronia, and clade II contained four genera: Photinia, Heteromeles, Cotoneaster, and Stranvaesia. Our data showed that P. davidsoniae is most closely related to P. lanuginosa, and that these two species together are most closely related to P. serratifolia. The two phylogenetic trees have similar topologies. However, the branches of the tree constructed with the complete chloroplast genome sequences have higher bootstrap support values. It is possible that the protein sequences are highly conserved and thus do not contain enough informative sites to resolve the relationships among these species.
Table 2. Gene contents of the plastomes of Photinia species.

Discussion
In this study, we sequenced the plastome of P. davidsoniae to understand its phylogenetic relationship with other congeneric species and carried out a detailed comparative analysis of ten Photinia plastomes. The plastomes were found to be highly conserved in several respects. For example, they have identical numbers of protein-coding genes, rRNA genes, and tRNA genes. There is no rearrangement among the plastomes, consistent with what has been described for most other angiosperm genera (Raman et al. 2017). Furthermore, our phylogenetic results are consistent with the earlier investigation (Shi et al. 2019), and they show that whole plastome sequences, used as a super barcode, are more reliable than the protein sequences in phylogenetic and evolutionary studies. Nevertheless, we identified minor variations among these plastomes. Firstly, the numbers of repeats, including SSRs, tandem repeats, and dispersed repeats, differ among species. SSRs exhibited high polymorphism in Photinia species and can provide a large amount of information for molecular markers (Wang et al. 2017). Previous research reported that short dispersed repeats of 30 to 40 bp are essential for promoting plastome rearrangements. Whether such repeats have caused rearrangements in the cp genomes of Photinia species is an interesting question.
Secondly, changes were observed in the IR boundary regions of these plastomes. Although these changes are subtle, two genes, rps19 and ndhF, varied markedly at the LSC/IRb and IRb/SSC junctions. The rps19 gene extended only 2 bp across the LSC/IRb border in two species (P. taishunensis and P. glabra). Similarly, the ndhF gene did not span the IRb/SSC boundary in two species (P. integrifolia and P. lochengensis). However, both genes span the boundaries in most cases. It is not clear whether these boundary changes affect the transcription of these genes.
We learned several lessons from this study. Firstly, for closely related species, the non-coding regions might provide useful information for understanding ongoing evolutionary processes. Secondly, additional information from the nuclear and mitochondrial genomes needs to be incorporated for a comprehensive phylogenetic and evolutionary analysis. Unfortunately, we could not retrieve the raw sequence data for these analyses. In the future, more genome sequencing is needed to explore these issues further.
In summary, the results reported here could provide valuable information for genetic diversity, phylogenetic evolution, and taxonomy studies of the genus Photinia.

Data availability statement
The sample has been deposited in the herbarium of the Institute of Medicinal Plant Development in Beijing, China, with the accession

Figure 5. Hypervariable regions in Photinia species. We used a sliding window to analyze the sequence polymorphism among the plastomes of ten Photinia species. The sliding window has a length of 600 bp and a step size of 200 bp. The X-axis represents the nucleotide position; the Y-axis represents the nucleotide polymorphism of each window.
Figure 6. Phylogenetic relationships of species from Photinia and related genera inferred using the maximum likelihood (ML) method. A. The phylogenetic tree was constructed using the 75 common protein sequences among the 25 cp genomes. B. The phylogenetic tree was constructed using the complete nucleotide sequences of the 25 cp genomes. Two taxa, namely, R. rugosa and S. officinalis, were used as outgroups. Bootstrap values were calculated from 1000 replicates.
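The sliding-window scan described in the Figure 5 caption (window length 600 bp, step size 200 bp) can be sketched in Python as follows. The text does not name the polymorphism statistic or the software used, so this sketch assumes nucleotide diversity computed as the mean pairwise proportion of differing sites, with gap positions ignored; the function names are hypothetical.

```python
from itertools import combinations


def pairwise_diff_fraction(a, b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def nucleotide_diversity(seqs):
    """Mean pairwise difference fraction across all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_diff_fraction(a, b) for a, b in pairs) / len(pairs)


def sliding_window_pi(alignment, window=600, step=200):
    """Return (window_midpoint, diversity) for each window along an alignment."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in alignment]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    toy = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT"]  # toy alignment, not real data
    for midpoint, pi in sliding_window_pi(toy, window=4, step=2):
        print(midpoint, round(pi, 4))
```

Windows whose diversity stands out from the background would be candidates for hypervariable regions such as those listed in the abstract.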