Complete chloroplast genome of a rare and endangered plant species Phalaenopsis zhejiangensis: genomic features and phylogenetic relationship within Orchidaceae

Abstract Phalaenopsis zhejiangensis is a rare and endangered plant species with extremely small populations. The complete chloroplast (cp) genome of P. zhejiangensis was assembled, its structural organization was described and comparative genomic analyses was carried out. The cp genome of P. zhejiangensis is 143,547 bp in length, with a GC content of 37.2%, which includes a pair of inverted repeats (IRs) of 24,464 bp separated by a small single-copy region of 10,764 bp and a large single-copy region of 83,856 bp. The cp genome contains 126 genes, consisting of 80 protein-coding genes, 38 transfer RNAs, and eight ribosomal RNAs. Six protein-coding genes, including ψndhB (two copies), ψndhD, ψndhG, ψndhK, and ψndhI, are identified as pseudogenes. Another six ndh genes, ndhA, ndhC, ndhE, ndhF, ndhH, and ndhJ, are missing from the plastid genome. A total of 41 cp simple sequence repeats (SSRs) were identified, including 40 mono-nucleotides and one di-nucleotides. Phylogenic analysis revealed P. zhejiangensis was nested inside the Phalaenopsis species and sister to P. wilsonii. The assembly and analysis of P. zhejiangensis cp genome will provide essential data for further study of taxonomy and systematics of Orchidaceae.


Introduction
Orchidaceae, a family of monocotyledonous plants, is one of the most successful and highly evolved monocots, comprising 763 genera with approximately 28,000 species (Hadi et al. 2015;Christenhusz and Byng 2016). The genus Phalaenopsis belongs to tribe Vandeae, which is commonly known as moth orchids, including more than 70 species widely distributed across Southeast Asia to northern Australia, with the majority in Indonesia and the Philippines (Ko et al. 2017;Gogoi and Rinya 2020;Lee et al. 2020). There are 12 Phalaenopsis species distributed in China, among which four species are endemics (Wu et al. 2009). Phalaenopsis zhejiangensis is a small epiphytic plant with slightly flattened roots, very short stems, basal leaves, and thinly textured flowers (Wu et al. 2009). P. zhejiangensis was transferred from Doritis zhejiangensis, while the latter was renamed from Nothodoritis zhejiangensis (Yukawa and Kita 2005), which was first found on a trunk of Podocarpus macrophyllus in 1970 at Xitianmu Mountain in Linan County, Zhejiang Province (Tsi 1989;Schuiteman 2012). P. zhejiangensis is a rare and endangered plant species in China, which is narrowly distributed in provinces of Zhejiang Province, Gansu, Anhui, and Shanxi, growing on the bark of trees or rock surfaces, with extremely small populations. As a result of habitat loss, P. zhejiangensis is listed as threatened with extinction. Fortunately, Zeng et al (2011) have established an effective propagation protocol for large-scale propagation of this endangered orchid species. In recent years, several chloroplast (cp) genomes of Phalaenopsis plants, including P. aphrodite, Phalaenopsis 'Tiny Star', P. lowii, P. lobbii, and P. mannii, were sequenced and assembled (Chang et al. 2006;Kim et al. 2016;Wang et al. 2019;Chen et al. 2020;Zhang et al. 2020). However, the cp genome of P. zhejiangensis has not yet been assembled. In our study, the complete cp genome sequence of P. zhejiangensis was assembled to provide new insights into taxonomy and systematics of Orchidaceae.

Plant material and DNA extraction
Fresh leaves were collected from a plant nursery (28 65.762 0 N, 121 46.976 0 E) in Jiaojiang, Zhejiang Province, China. A voucher specimen was deposited at the Molecular Biology Laboratory in Taizhou University (http://www.tzc.edu. cn, Dr. Huijuan Zhang, zhanghj82@126.com) under a collection number of CHS2020029. Fresh leaves were ground into a fine powder in liquid nitrogen with a mortar and pestle, and genomic DNA was extracted following the CTAB-based protocol as described by Doyle and Doyle (1987).

DNA sequencing and sequence assembly
Construction of a genomic DNA library was carried out according to the manufacturer's instructions, and highthroughput sequencing was then performed using the Illumina Hiseq X Ten system (Illumina, San Diego, CA). Approximately, 3.64 Gb raw data of 150 bp paired-end Illumina reads were generated, and 3.63 Gb high-quality clean reads were harvested by using the NGS QC Toolkit v2.3.3 (Patel and Jain 2012). Geneious Prime 2019 (Biomatters, Auckland, New Zealand) was employed to assemble the complete cp genome.

Chloroplast genome annotation
Annotation of the complete cp genome was performed with Geneious Prime 2019 by using P. lobbii as a reference genome (NCBI accession number: MT830847) , and the borders of each gene were manually inspected and corrected where necessary. The circular cp genome map of P. zhejiangensis was drawn using the online program OrganellarGenomeDRAW (OGDRAW, http://ogdraw. mpimp-golm.mpg.de) with default settings (Lohse et al. 2013).

Simple sequence repeat analysis
Perl scripts of MIcroSAtellite Identification Tool (MISA) were used to identify and locate the potential simple sequence repeats (SSRs) loci in complete cp genome sequence of P. zhejiangensis (Thiel et al. 2003). To determine the presence of SSRs, 1-6 bp nucleotide motifs were considered, and the minimum repeat numbers were set as 10 repeat units for mono-nucleotides, six for di-nucleotides, and five for tri-, tetra-, penta-, and hexa-nucleotides.

Phylogenetic analysis
The complete cp genome sequences of 39 Orchidaceae species downloaded from NCBI were used for phylogenetic analysis, which included eight Phalaenopsis plants, six Holcoglossum species, and five Cleisomeria plants. The complete cp genome of Coix lacryma-jobi (FJ261955) was applied as an outgroup. A total of 41 genome sequences were aligned with a multiple sequence alignment program MAFFT v7.388 plugin in Geneious Prime 2019 under default settings (Katoh and Standley 2013). The best-fit evolutionary model of DNA substitution for maximum-likelihood (ML) analysis was determined by using jModelTest 2.1.9 under the Akaike information criterion (AIC) (Darriba et al. 2012). A phylogenetic tree was generated by PhyML 3.1, with 1000 bootstrap replicates (Guindon et al. 2010).

Genome sequencing and assembly
More than 12 million paired-end clean reads were obtained, and a circular cp genome was assembled with Geneious Prime. The complete cp plastome of P. zhejiangensis is 143,547 bp in length which includes a pair of inverted repeats (IRs) of 24,464 bp separated by a large single-copy (LSC) region of 83,855 bp and a small single-copy (SSC) region of 10,764 bp ( Figure 1). The overall guanine-cytosine (GC) content of P. zhejiangensis cp genome is 36.8%, while the corresponding values of LSC, SSC, and IR sequences are 34.0%, 28%, and 43.5%, respectively.

Genome features of Phalaenopsis zhejiangensis
The complete cp plastome of P. zhejiangensis is comprised of 126 genes, including 74 protein-coding genes, 38 tRNA genes, eight rRNA genes, and six pseudogenes. All the six Table 1. List of genes in the chloroplast genome of Phalaenopsis zhejiangensis.

IR expansion and contraction
The IR region of P. zhejiangensis is 24,464 bp in length, while the other eight Phalaenopsis IRs included in our study are longer than P. zhejiangensis, they range from 24,719 bp to 25,846 bp ( Table 2). Comparisons of LSC, IRB, SSC, and IRA junction boundaries among nine Phalaenopsis cp genomes were performed and presented in Figure 2. In Phalaenopsis plastome, LSC/IRB boundary junctions lie within rpl22 gene except P. mannii, which locates in the intergenic region between rpl22 and rps19. Eight rpl12 genes extend 31 bp into IRB regions, while the rpl12 in P. wilsonii extends only one base pair into its IRB region ( Figure 2). The IRB/SSC junctions of P. mannii, P. equestris, and P. lobbi locate within the wycf1 pseudogenes; however, no wycf1 is identified in other six Phalaenopsis cp genomes including P. zhejiangensis. The SSC/ IRA junctions of P. wilsonii, P. mannii, P. equestris, and P. lobbii lie within ycf1 genes, which extend 0 bp, 132 bp, 9 bp, and 108 bp into IRA regions, respectively while the SSC/IRA junctions of P. zhejiangensis, P. lowii, P. japonica, P. aphrodite subsp. formosana, and the hybrid 'Tiny Star' locate between ycf1 and trnN-GUU.

SSR analysis
Totally, P. zhejiangensis cp genome contains 41 SSRs (Table  3). Among the SSRs identified herein, A/T mononucleotide repeats account for the largest proportion of 97.56%. Only one di-nucleotide repeat was observed, and no tri-, tetra-, penta-, or hexa-nucleotides presented in the plastome. Most SSRs are distributed in LSC (70.73%), followed by SSC (24.39%), and the proportion in IRs is less than 5%. Among 41 SSRs, 25 loci lie in intergenic regions, four in introns, and 12 in coding regions or pseudogenes.

Phylogenetic analysis
Statistical selection of the best-fit model of nucleotide substitution was carried out by the jModelTest program, and GTR þ GþI was turned out to be the best-fitted. To evaluate the phylogenetic relationship of P. zhejiangensis within Orchidaceae, a phylogenetic tree using the complete plastome sequences of 41 plant species were generated ( Figure  3). The resulting ML tree had very high bootstrap support for the majority of clades, and the phylogenetic tree was shown to be consistent with the traditional morphology-based taxonomy of Orchidaceae. Nine plants from the genus Phalaenopsis formed a well-supported monophyletic clade with 100% bootstrap value, and P. zhejiangensis was sister to P. wilsonii, with a support value of 97%.

Discussion
The classification of Orchidaceae is a very complex problem for its bewildering diversity and morphological parallelism, and also for botanical neglect (Dressler 1993). Orchidaceae were divided into five subfamilies, including Apostasioideae, Vanilloideae, Cypripedioideae, Orchidoideae and Epidendroideae, and Apostasioideae are monophyletic and sister to other subfamilies (Dixon et al. 2003;Tsai et al. 2013). The subfamily Orchidoideae is comprised of a large group of orchids, including four tribes (Diurideae, Cranichideae, Codonorchideae, and Orchideae) with around 3600 plant species that largely share terrestrial habits (Bateman et al. 2003;Serna-S anchez et al. 2021). The genus Phalaenopsis belongs to tribe Orchideae, which consists of about 62 genera (Inda et al. 2012). The morphology of P. zhejiangensis is similar to both Doritis and P. lowii, with the characteristic feature of pollinia similar to that of Doritis, and its shapes of petals and columns similar to P. lowii (Tsi 1999;Christenson 2001). P. zhejiangensis was initially named N. zhejiangensis based on morphology, and then renamed as D. zhejiangensis (Yukawa and Kita 2005;Schuiteman 2012). Molecular analyses using ITS, trnL intron as well as trnL-F spacer prove that N. zhejiangensis is species nested in genus Phalaenopsis (Deng et al. 2015). Our molecular phylogenetic analyses confirm its systematic position, which indicates that P. zhejiangensis is sister to P. wilsonii, with a high support value. The sizes of previously published Phalaenopsis cp genomes ranged from 144,607 bp (P. lobbii) to 148,964 bp (P. phalaenopsis aphrodite subsp. formosana) (Chang et al. 2006;Kim et al. 2016;Wang et al. 2019;Chen et al. 2020;Zhang et al. 2020). In our present study, the complete plastid genome of P. zhejiangensis is only 143,574 bp in length, which turns out to be the smallest among all known cp genome sequences in genus Phalaenopsis. The cp genomes of most flowering plants range from 120 kb to 160 kb, and the difference in genome size is mainly caused by contraction and expansion of the IR region (Goulding et al. 1996;Ingvarsson et al. 2003;Yao et al. 2020). The IR sizes of published Phalaenopsis plastid genomes are 24,719 (P. lobbii)  to 25,846 bp (P. equestris) (Jheng et al. 2012). In our present study, the length of P. zhejiangensis IR sequence is 24,464 bp, which is smaller than that of P. lobbii.
Plastid gene loss and pseudogenization are common phenomena in parasitic, semi-parasitic, and saprophytic plants (Molina et al. 2014;Li and Zheng 2018;Nie et al. 2019). In semi-parasitic plants of Macrosolen cochinchinensis, M. tricolor, and M. bibracteolatus, the infA and all the ndh genes were lost among the three species, and two genes, ycf1 and rpl2, were found to be pseudogenes (Nie et al. 2019). Gastrodia elata is a saprophytic plant with extremely small cp genome in which many genes related to photosynthesis are missing, and it contains only 28 genes, including 20 protein-coding genes, three rRNAs, and five tRNAs (Park et al. 2020). However, cp gene loss and pseudogenization are rarer in photosynthetic species, because the plastid gene cannot simply be discarded (Magee et al. 2010;Wicke et al. 2011;Daniell et al. 2016). Interestingly, pseudogenization, truncation, and deletion of ndh genes in cp genomes are common in some cp genomes of photosynthetic orchid plants (Kim et al. 2015). Orchids like Apostasia odorata, Sobralia callosa, Paphiopedilum armeniacum, and Phragmipedium longifolium retain the complete set of ndh genes, whereas most of the ndh genes in P. equestris, Dendrobium officinale, and D. catenatum were found to be lost in their plastids (Kim et al. 2015;Lin et al. 2017). In this study, six ndh genes are  pseudogenized, and another six ndh genes are missing from the plastid genome of Phalaenopsis zhejiangensis.