The complete chloroplast genome of Coffea liberica (Gentianales: Rubiaceae)

Abstract Coffee is one of the most popular beverages around the world. As one of the best-known coffee species, Liberian coffee (Coffea liberica Bull ex Hiern 1876) has a high resistance to leaf rust, a devastating disease caused by Hemileia vastatrix. However, there are few reports on the systematic position and phylogenetic relationships of C. liberica at the chloroplast (cp) genome level. We therefore assembled its cp genome. The full length is 154,799 bp with a GC content of 37.48%. We further annotated the cp genome and predicted 85 protein-coding genes together with 8 rRNAs and 37 tRNAs. A large single copy region (LSC), a small single copy region (SSC), an inverted repeat region a (IRa) and an inverted repeat region b (IRb) were identified with lengths of 84,868 bp, 18,121 bp, 25,905 bp and 25,905 bp, respectively. The phylogenetic tree indicates that C. liberica is closely related to C. canephora, which is consistent with a previous result obtained from genotyping-by-sequencing.


Background
Coffee is one of the most popular beverages around the world. The three best-known species used for coffee production are Arabica (Coffea arabica L.), Robusta (C. canephora L. Linden) and Liberian coffee (C. liberica Bull ex Hiern 1876) (Patay et al. 2016). To date, C. arabica has the largest cultivation area for coffee production, but it is threatened by leaf rust, a devastating disease caused by Hemileia vastatrix (Talhinhas et al. 2017). In contrast, high leaf rust resistance has been identified in C. canephora and C. liberica, and this resistance has been successfully used for breeding resistant varieties of C. arabica (Prakash et al. 2004). However, there are few reports on the systematic position and phylogenetic relationships of C. liberica at the chloroplast (cp) genome level. The cp genome can provide reliable evidence of the evolution and origin of plant species, as shown for Solanaceae (Mehmood, Ubaid, Bao, et al. 2020; Mehmood, Ubaid, Shahzadi, et al. 2020). Thus, we sequenced and assembled the cp genome of C. liberica, which will benefit related studies in the future.

Methods and results
Young leaves of C. liberica were cut from a five-year-old tree in the coffee germplasm garden of the Dehong Tropical Agriculture Research Institute of Yunnan in Ruili, China (24.0256 N, 97.8596 E) and used for DNA extraction. The specimen has been preserved in the Herbarium of the Dehong Tropical Agriculture Research Institute of Yunnan (http://www.dtari.org.cn/, Xuehui Bai, 13529520059@163.com) under the voucher number DTARI-cl202101. The fresh leaves were rapidly frozen in liquid nitrogen and ground to powder for total DNA extraction using the CTAB method (Doyle and Doyle 1987). The DNA sample was delivered to Biozeron Biotech (Shanghai, China) for library construction and Illumina sequencing. The DNA was fragmented into 300-500 bp fragments and sequenced as paired-end short reads on the Illumina NovaSeq platform. After sequencing, we deposited a total of 3.81 Gb of raw data in the SRA database under the BioProject accession number PRJNA771824. After filtering, a total of 3.78 Gb of clean data was used to assemble the scaffolds of the cp genome with NOVOPlasty v4.2 (Dierckxsens et al. 2017). The gaps between scaffolds were filled with GapCloser v1.12 to obtain the full cp genome (Luo et al. 2012). The cp genome of C. liberica is 154,799 bp long with a GC content of 37.48% and was deposited in GenBank under the accession number MW970411. We used GeSeq and CPGAVAS2 to annotate the cp genome and predicted 85 protein-coding genes together with 8 rRNAs and 37 tRNAs (Tillich et al. 2017; Shi et al. 2019). We used Geneious v11.0.3 to identify the regional boundaries (Kearse et al. 2012). As a result, a large single copy region (LSC), a small single copy region (SSC), an inverted repeat region a (IRa) and an inverted repeat region b (IRb) were identified with lengths of 84,868 bp, 18,121 bp, 25,905 bp and 25,905 bp, respectively.
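The summary statistics reported above can be cross-checked with a few lines of code. The following is a minimal sketch, not the procedure used in this study: the function names gc_content and regions_match_total are hypothetical, and the sequence passed to gc_content is assumed to be a plain nucleotide string (for example, read from the GenBank record MW970411). The region lengths and total length are taken directly from the text.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Region lengths (bp) and total length (bp) as reported in the text.
REGION_LENGTHS = {"LSC": 84_868, "SSC": 18_121, "IRa": 25_905, "IRb": 25_905}
REPORTED_TOTAL = 154_799


def regions_match_total(lengths: dict, total: int) -> bool:
    """Check that the four regions of the quadripartite structure add up to the genome length."""
    return sum(lengths.values()) == total


if __name__ == "__main__":
    # The four regions should account for every base of the circular genome.
    print("regions sum to total:", regions_match_total(REGION_LENGTHS, REPORTED_TOTAL))
    # Toy sequence for illustration only; it is not part of the C. liberica genome.
    print("GC% of toy sequence:", gc_content("ATGCGC"))
```

With the reported values, the four regions sum to 154,799 bp, which agrees with the stated genome length.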
We selected 46 cp genome sequences to conduct the phylogenetic analysis. These comprise 42 species of Rubiaceae and four outgroup species: Myxopyrum hainanense, Mitreola yangchunensis, Hoya carnosa and Calotropis procera (Amenu et al. 2022). All these cp genomes were aligned using MAFFT v7.0 (Katoh and Standley 2013). The phylogenetic tree was constructed by the Maximum Likelihood method with 1000 bootstrap replicates in MEGA 7.0.26 (Kumar et al. 2016). The resulting tree indicates that C. liberica is closely related to C. canephora (Figure 1), which is consistent with a previous result obtained from genotyping-by-sequencing (Bawin et al. 2021). This study will benefit future chloroplast-related studies in the Coffea genus.
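For readers unfamiliar with bootstrap support, the resampling step behind each replicate can be illustrated as follows. This is a minimal sketch of the general nonparametric bootstrap on an alignment, not the internal implementation of MEGA: the function name bootstrap_alignment and the toy alignment are hypothetical, and tree inference on each replicate is deliberately left out.

```python
import random


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Build one bootstrap replicate by sampling alignment columns with replacement.

    `alignment` maps taxon names to aligned sequences of equal length.
    The replicate has the same taxa and the same number of columns.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw column indices with replacement; the same column may appear several times.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment for illustration only; these are not real chloroplast sequences.
    toy = {
        "taxon_A": "ATGCATGC",
        "taxon_B": "ATGCTTGC",
        "taxon_C": "ATGAATGC",
    }
    rng = random.Random(42)  # fixed seed so the example is reproducible
    replicate = bootstrap_alignment(toy, rng)
    for name, seq in replicate.items():
        print(name, seq)
```

In an analysis with 1000 replicates, a tree would be inferred from each resampled alignment, and the support for a clade would be the percentage of replicate trees in which that clade appears.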

Ethics approval and consent to participate
The study involved only a cultivated crop and no threatened or endangered species. It was exempt from ethical approval and did not require any permissions to carry out.

Authors' contribution
Xuehui Bai, Hongyu Zheng and Xing Huang conceived and designed the experiments. Xuehui Bai and Hongyu Zheng analyzed the data and drafted the manuscript. Jinhong Li, Tieying Guo, Qin Luo and Zhirun Zhang contributed to the species identification and sample preparation. Xing Huang, Weihuai Wu and Kexian Yi revised the manuscript. All authors read and approved the final manuscript.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The genome sequence data that support the findings of this study are openly available in GenBank of NCBI at (https://www.ncbi.nlm.nih.gov/nuccore/) under the accession number MW970411. The accession numbers of BioProject, SRA and BioSample are PRJNA771824, SRX12645655 and SAMN22346234, respectively.