Complete chloroplast genome of Impatiens huangyanensis Jin and Ding 2002: genomic features and phylogenetic relationship within genus Impatiens (Balsaminaceae)

Abstract Impatiens huangyanensis Jin and Ding 2002 is a plant species with very small populations, and it is distributed only in Huangyan, Zhejiang Province, China. In this study, the complete chloroplast genome of I. huangyanensis was assembled from high-throughput Illumina paired-end sequences. Its genomic features were determined, and a comparative genomic analysis of the genus Impatiens was performed. The full-length chloroplast genome of I. huangyanensis was 152,156 bp, with a GC content of 36.8%. The chloroplast genome has a typical quadripartite structure, comprising two copies of inverted repeats (IRs), a small single-copy (SSC) region, and a large single-copy (LSC) region. The lengths of the IR, SSC, and LSC regions were 25,756 bp, 17,662 bp, and 82,982 bp, respectively. The chloroplast genome consisted of 134 genes, including 84 protein-coding genes, 37 transfer RNA genes, eight ribosomal RNA genes, and five pseudogenes. Phylogenetic results indicated that I. huangyanensis shared a clade with I. davidii Franchet 1883, I. macrovexilla Chen 2000, I. fanjingshanica Chen 1999, and I. piufanensis Hook 1908, with a support rate of 100%. Our study provides a basis for further studies on the conservation genetics of I. huangyanensis.


Introduction
The Balsaminaceae is a widely distributed family that includes only two genera, Hydrocera and Impatiens. Hydrocera is a monospecific genus containing only H. triflora, which is distributed in tropical Asia, including Southeast Asia, South India, and China (Shui et al. 2011). Impatiens is a species-rich genus comprising over 1,000 species, mostly annual and perennial herbs with succulent stems (Janssens et al. 2006). China is regarded as the center of origin and diversification of Balsaminaceae, and it has about 250 wild Impatiens species, many of which have long been used as medicinal herbs (Luo et al. 2021). I. huangyanensis is a narrowly distributed Impatiens species that has been found only in mountain areas of Huangyan, Zhejiang Province, where it occurs along roadsides and forest margins in very small populations (Jin and Ding 2002). At present, the chloroplast genome of I. huangyanensis has not been reported, and its phylogenetic position remains unclear. In this study, the chloroplast genome of I. huangyanensis was assembled from high-throughput paired-end sequences, and a phylogenetic tree was generated to reveal its relationship with other Impatiens species.

Plant sampling
Fresh leaves were collected from Foling (28°32′25″N, 121°09′37″E), Huangyan, Zhejiang Province, China (Figure 1). The leaves were taken to the laboratory, washed with running water to remove dirt and dust, and then rinsed with sterile distilled water. A voucher specimen, CHS20200388, was deposited in the Molecular Biology Innovation Laboratory at Taizhou University (Dr. Ming Jiang, jiangming1973@139.com).

DNA isolation, sequencing, assembling, and annotation of the chloroplast genome
The sodium dodecyl sulfate method was used to extract high-quality genomic DNA. The genomic DNA was used to construct a paired-end sequencing library with an average insert size of 350 bp. The library was sequenced on an Illumina HiSeq X Ten platform. Low-quality reads were filtered using NGSQCToolkit v2.3.3 (Patel and Jain 2012). The chloroplast genome was assembled with NOVOPlasty (Dierckxsens et al. 2017) and annotated with Dual Organellar GenoMe Annotator (Wyman et al. 2004). Transfer RNA genes were predicted with tRNAscan-SE 2.0.9 (Chan and Lowe 2019). The whole chloroplast genome map of I. huangyanensis was drawn using CPGView (http://www.1kmpg.cn/cpgview/; Liu et al. 2023). Sliding window analysis of nucleotide diversity was performed using DnaSP 6.0 (Rozas et al. 2017).
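The idea behind a sliding window analysis of nucleotide diversity can be illustrated with a minimal Python sketch. It is not the DnaSP implementation: it simply computes the mean number of pairwise differences per site within each window of an alignment and skips any column that contains a gap or an ambiguous base. The function names, the window length, and the step size are illustrative assumptions, since the settings used in this study are not stated here.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def window_pi(seqs, start, end):
    """Mean pairwise differences per site over alignment columns [start, end)."""
    # Use only columns where every sequence carries an unambiguous base.
    cols = [i for i in range(start, end) if all(s[i] in VALID_BASES for s in seqs)]
    if not cols or len(seqs) < 2:
        return 0.0
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a[i] != b[i] for i in cols) for a, b in pairs)
    return diffs / (len(pairs) * len(cols))


def sliding_pi(seqs, window=600, step=200):
    """Return (start, end, pi) tuples for windows along aligned sequences.

    The window and step defaults are hypothetical, not the study's settings.
    """
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        results.append((start, end, window_pi(seqs, start, end)))
    return results


if __name__ == "__main__":
    toy_alignment = ["AAAA", "AAAT", "AATT"]  # toy data, not real sequences
    for start, end, pi in sliding_pi(toy_alignment, window=2, step=2):
        print(f"columns {start + 1}-{end}: pi = {pi:.4f}")
```

Windows with high values in such a scan mark the more variable regions of the aligned chloroplast genomes, which is the kind of signal a nucleotide diversity plot is meant to show.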

Phylogenetic analysis
To understand its relationship with other Impatiens species, 18 chloroplast genome sequences of Impatiens species were downloaded from GenBank (National Center for Biotechnology Information [NCBI]) to construct a phylogenetic tree. We also downloaded a sequence of H. triflora (L.) Wight et Arn. 1753 (NCBI accession number: NC_037400), which was used as the outgroup. The chloroplast genomes were aligned with the multiple sequence alignment program MAFFT v7.450 (Katoh and Standley 2013). Based on the best-fit model GTR + R, a maximum-likelihood phylogenetic tree was built from the whole chloroplast genome sequences with PhyML 3.1 (Guindon et al. 2010).

Results
In total, 3.27 Gb of clean data were obtained, comprising 10,886,063 reads. The complete chloroplast genome of I. huangyanensis was 152,156 bp in length, with an average depth of 3164.80× (Supplementary Figure S1). Among the Impatiens chloroplast genomes used in this study, the sequence lengths ranged from 151,538 bp (I. fanjingshanica Chen 1999) to 152,928 bp (I. mengtszeana Hooker 1908).
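The reported genome length can be cross-checked against the region lengths given in the abstract, because a quadripartite chloroplast genome consists of one LSC region, one SSC region, and two IR copies. The short Python sketch below performs that arithmetic and also sums the reported gene categories. The variable and function names are illustrative, and the numbers are taken from the abstract.

```python
# Region lengths (bp) and gene counts as reported in the abstract.
LSC_BP = 82_982
SSC_BP = 17_662
IR_BP = 25_756  # length of a single inverted repeat copy
REPORTED_TOTAL_BP = 152_156

GENE_COUNTS = {
    "protein-coding": 84,
    "tRNA": 37,
    "rRNA": 8,
    "pseudogene": 5,
}
REPORTED_GENE_TOTAL = 134


def quadripartite_length(lsc, ssc, ir):
    """Total genome length: LSC + SSC + two copies of the IR."""
    return lsc + ssc + 2 * ir


if __name__ == "__main__":
    total = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
    print(total, total == REPORTED_TOTAL_BP)  # 152156 True
    genes = sum(GENE_COUNTS.values())
    print(genes, genes == REPORTED_GENE_TOTAL)  # 134 True
```

Both totals agree with the reported figures, so the region lengths and gene categories are internally consistent.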
The phylogenetic analysis results showed that the 20 Balsaminaceae species clustered into four major groups. Both I. guizhouensis Chen 1999 and I. pritzelii Hook 1908 shared piufanensis, with a support rate of 100% (Figure 3).

Discussion and conclusions
Impatiens is a species-rich genus of angiosperms, and new species continue to be described (Tiwari 2023). However, the chloroplast genomes of most Impatiens species have not yet been assembled. In this study, we assembled the I. huangyanensis chloroplast genome and revealed its close relationship with I. davidii, I. macrovexilla, and I. piufanensis. Pseudogenization is a common phenomenon in chloroplast genomes. In the present study, five genes were found to be pseudogenized. The pseudogenization of matK has also been observed in Anthoceros formosae Steph. 1916, Campylotropis bonii Schindl. 1916, and some photosynthetic orchid species (Kugita et al. 2003; Barthet et al. 2015; Feng et al. 2022). One copy of ycf1 is located at the SSC/IR boundary, and its 3′ end is truncated. Pseudogenization of ycf1 is common because of the incomplete duplication of the normal copy (Amar 2020).

Figure 1. Impatiens huangyanensis Jin and Ding 2002. (A) Flower lateral view; (B) flower front view; (C) fruits; (D) natural habitat of I. huangyanensis. All the photos were taken by Ming Jiang.

Figure 2. The chloroplast genome of Impatiens huangyanensis. The map contains six tracks. From the center outward, the first track shows the dispersed repeats, which consist of direct and palindromic repeats, connected with red and green arcs. The second track indicates the long tandem repeats as short blue bars. The third track reveals the short tandem repeats or microsatellite sequences as short bars with different colors. The colors, the type of repeat they represent, and the description of the repeat types are as follows: black: c (complex repeat); green: p1 (repeat unit size = 1); yellow: p2 (repeat unit size = 2); purple: p3 (repeat unit size = 3); blue: p4 (repeat unit size = 4); orange: p5 (repeat unit size = 5); red: p6 (repeat unit size = 6). The chloroplast genome contains an LSC region, an SSC region, and two IR regions, and they are shown on the fourth track. The GC content along the genome is shown on the fifth track. Genes are color-coded according to their functional classification. The transcription directions for the inner and outer genes are clockwise and anticlockwise, respectively. The bottom left corner indicates the key for the functional classification of the genes.