Characterizing the complete chloroplast genome of the Impatiens davidii (Balsaminaceae)

Abstract
Impatiens davidii Franch, 1886 is a rare ornamental garden flower of high economic value. In this study, we characterized the chloroplast genome of I. davidii and analyzed its phylogenetic relationship with other Impatiens species. The complete chloroplast genome of I. davidii is 152,214 bp long, with a GC content of 36.9%. It shows a typical quadripartite structure, with a pair of inverted repeats (IRs) of 25,634 bp each separated by a large single copy (LSC) region of 83,128 bp and a small single copy (SSC) region of 17,818 bp. We annotated 125 genes: 85 protein-coding genes, 32 tRNA genes, and 8 rRNA genes. The Bayesian phylogenetic tree strongly supports a close relationship between I. davidii and a group including I. piufanensis and I. alpicola.
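The region lengths and gene counts reported above can be cross-checked with simple arithmetic, since a quadripartite plastome consists of one LSC, one SSC, and two IR copies. The following is a minimal sketch of that check; the function and variable names are illustrative and not part of the original study.

```python
# Region lengths (bp) as reported in the abstract.
LSC = 83_128   # large single copy region
SSC = 17_818   # small single copy region
IR = 25_634    # length of EACH of the two inverted-repeat copies


def quadripartite_length(lsc, ssc, ir):
    """Total length of a plastome with one LSC, one SSC, and two IR copies."""
    return lsc + ssc + 2 * ir


total_length = quadripartite_length(LSC, SSC, IR)

# Gene categories as reported: protein-coding, tRNA, rRNA.
gene_total = 85 + 32 + 8

print(f"Total length: {total_length} bp")
print(f"Annotated genes: {gene_total}")
```

Both values agree with the reported totals of 152,214 bp and 125 genes.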

The genus Impatiens, whose members are valued as ornamental garden flowers, contains more than 1000 species distributed across the Northern Hemisphere and the tropics (Vrchotová et al. 2011). China has rich Impatiens germplasm resources, with over 220 species, a few of which play important roles in gardens (Cheng 1999). Impatiens davidii Franch, 1886 is an endemic species that occurs mainly on Lushan Mountain and in adjoining areas, and its study can support genetic improvement and variety breeding of Impatiens flowers (Cheng 1999). Chloroplast genomes are characterized by highly conserved sequences and structures because of their nonrecombinant, haploid, and uniparentally inherited nature (Birky 2001; Wicke et al. 2011). High-quality chloroplast genomic information helps to reveal genomic variation and contributes to further physiological, molecular, and phylogenetic studies (Zhong et al. 2019). Here, we report the chloroplast genome of I. davidii for the first time and analyze its phylogenetic position in the genus Impatiens. The annotated genomic sequence has been submitted to GenBank under the accession number MZ424444.
Samples of I. davidii were collected from Lushan Mountain (29°35′40.21″N, 115°59′9.6″E), and total genomic DNA was extracted from fresh leaves with the DNAprep Pure Plant Kit (Tiangen Biotech, Beijing, China). An I. davidii specimen was deposited in the Laboratory of Molecular Biology of Jiujiang University (Anpei Zhou, Email: 6090078@jju.edu.cn) under the voucher number ID-AP1. Total DNA was used to prepare libraries that were sequenced on an Illumina NovaSeq 6000, producing approximately 3 Gb of raw data as 150 bp paired-end reads. GetOrganelle software (Jin et al. 2020) was used to assemble the complete chloroplast genome of I. davidii. The resulting scaffolds were adjusted into chloroplast genome sequences using Bandage software (Wick et al. 2015), and the initial annotation was completed using Geneious R8 software (Biomatters Ltd, Auckland, New Zealand). The codons were checked and adjusted by comparison with the reference genome of I. pritzelii.
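Once an assembly such as the one described above is available as a FASTA file, summary statistics like the reported GC content can be recomputed directly from the sequence. The sketch below is one way to do this in plain Python; it is not the authors' procedure, the helper names are hypothetical, and the toy record stands in for the real sequence (which would be downloaded from GenBank).

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {record id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def gc_percent(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Toy input for illustration only; ambiguous bases (N) are ignored.
toy_fasta = ">toy_record example\nATGCGC\nNNAT\n"
records = read_fasta(toy_fasta)
toy_gc = gc_percent(records["toy_record"])
print(f"GC content: {toy_gc:.1f}%")
```

Excluding ambiguous bases from the denominator is a design choice; for a finished plastome with no ambiguity codes it makes no difference to the result.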
The phylogenetic position of I. davidii in the genus Impatiens was examined using fully sequenced chloroplast genomes. Six chloroplast genome sequences of five Impatiens species (I. alpicola, I. glandulifera, I. hawkeri, I. piufanensis, and I. pritzelii) were obtained from GenBank, and Hydrocera triflora, also in the family Balsaminaceae, was used as the outgroup. A total of eight complete chloroplast sequences were aligned using MAFFT v7 software (Katoh and Standley 2013). Based on the Akaike information criterion (AIC) computed with ModelTest 3.7 software (Posada and Crandall 1998), the best-fitting nucleotide substitution model was GTR + I + G. Bayesian inference was employed to reconstruct a phylogenetic tree using MrBayes software (Ronquist and Huelsenbeck 2003). The analysis was run for 1,000,000 generations, and posterior probabilities were estimated using the Markov chain Monte Carlo (MCMC) method. According to the results, I. davidii can be considered sister to a group including I. piufanensis and I. alpicola (Figure 1).
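To illustrate the model-selection step, the sketch below ranks candidate substitution models by AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood. The log-likelihood values are invented for illustration and are not results from this study; the parameter counts cover substitution-model parameters only and omit branch lengths, which are the same for every model on a fixed tree and so do not change the ranking. The second function shows how a GTR + I + G model and a 1,000,000-generation run might be written as a MrBayes command block; any settings not stated in the text are left at the program defaults.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# HYPOTHETICAL log-likelihoods, for illustration only.
# Tuple layout: (ln L, number of free substitution-model parameters).
candidates = {
    "JC": (-1050.0, 0),
    "HKY": (-1020.0, 4),       # 1 ts/tv ratio + 3 base frequencies
    "GTR": (-1012.0, 8),       # 5 exchange rates + 3 base frequencies
    "GTR+I+G": (-1000.0, 10),  # GTR + invariant-site proportion + gamma shape
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best_model = min(scores, key=scores.get)
print(f"Best model by AIC: {best_model} (AIC = {scores[best_model]})")


def mrbayes_block(ngen=1_000_000):
    """Build a MrBayes command block for GTR + I + G with ngen generations."""
    lines = [
        "begin mrbayes;",
        "  lset nst=6 rates=invgamma;",  # nst=6 is GTR; invgamma adds I + G
        f"  mcmc ngen={ngen};",
        "end;",
    ]
    return "\n".join(lines)


print(mrbayes_block())
```

With these toy values the richest model wins because its gain in log-likelihood outweighs the penalty of two per extra parameter; with real data the ranking depends on the alignment.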

Disclosure statement
No potential conflict of interest was reported by the author(s).

Author contributions
Chenhua Fu performed the experiment, analyzed the data, authored drafts of the paper, and approved the final draft. Xiyan Chen analyzed the data, prepared the figure, and approved the final draft. Tongjian Li conceived and designed the experiment and approved the final draft. Anpei Zhou conceived and designed the experiment, reviewed drafts of the paper, and approved the final draft. All authors agree to be accountable for all aspects of the work.

Data availability statement
The genome sequence data that support the findings of this study are openly available in GenBank of NCBI under the accession number MZ424444. The associated BioProject, SRA, and BioSample numbers are PRJNA766029, SRR16043999, and SAMN21600381, respectively.

Figure 1. Bayesian phylogenetic tree based on the complete chloroplast genome sequences. Six chloroplast genome sequences of five Impatiens species were downloaded from GenBank, and Hydrocera triflora was set as the outgroup. The phylogenetic tree was constructed by Bayesian inference with 1,000,000 generations. Posterior probability values are shown at the nodes.