High-throughput sequencing yields a complete mitochondrial genome of Emberiza godlewskii (Aves, Emberizidae)

Abstract Emberiza godlewskii (Taczanowski, 1874) is a passerine bird of eastern Asia that belongs to the genus Emberiza in the bunting family Emberizidae. The complete mitochondrial genome sequence of E. godlewskii helps clarify species delimitation in the E. cia/godlewskii complex. The circular genome (16,839 bp in length) contains 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 control region. The base composition shows that the A + T content (52.87%) is slightly higher than the G + C content (47.13%). A phylogenetic analysis based on published mitochondrial genomes showed that E. godlewskii is closely related to E. cioides. This new mitochondrial genome provides essential molecular data for further study of the E. cia/godlewskii complex.


Introduction
The species boundary between Emberiza godlewskii (Taczanowski, 1874) and E. cia is controversial owing to morphological variation among geographic populations (Vaurie 1956; Mauersberger 1972; Zheng 2005). Our previous research, based on a small number of gene segments, revealed a deep divergence between the southern and northern geographical populations of E. godlewskii. A further study of the E. cia/godlewskii complex, with extensive sampling as well as plumage color and morphological measurements, demonstrated that the southern subspecies populations form a distinct monophyletic group, while the northern E. godlewskii is the sister group of E. cia. In other words, there is a distinct southern clade that should be separated from the original E. godlewskii (Li et al. 2023). Thus, we recommend that the southern E. godlewskii subspecies be recognized as a full species. However, the complete mitochondrial genome of E. godlewskii is still unknown, which limits ecological and evolutionary research on this species. In this study, we obtained the complete mitochondrial genome sequence of a southern E. godlewskii subspecies (E. godlewskii yunnanensis) through high-throughput sequencing to support further study of the E. cia/godlewskii complex.

Ethics statement
The sample collection and experimental procedures in this article passed the ethical review of the Animal and Plant Ethics Committee of Northwest Normal University, and all experiments complied with the guidelines of the committee and the current laws of China.

Sample collection
Genomic DNA was extracted from the muscle tissue of an E. godlewskii individual collected from Jinzhong Mountain Nature Reserve in Guangxi province, China (24.3806°N, 104.5715°E). The specimen was deposited at the Institute of Zoology and Ecology, College of Life Science, Northwest Normal University (https://sky.nwnu.edu.cn; Dr J. Li, lijd14@nwnu.edu.cn) under the voucher number EG2021017.

Mitochondrial genome assembly and annotation
Genomic DNA was isolated using a TIANamp Genomic DNA Kit (Tiangen, Beijing, China) according to the manufacturer's instructions. The complete mitochondrial genome of E. godlewskii was sequenced on an Illumina HiSeq 2500 platform and assembled with MitoZ (Meng et al. 2019). The assembled mitochondrial genome was annotated using the MITOS web server (Bernt et al. 2013) under the vertebrate mitochondrial code.
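Which codons an annotation treats as stops depends on the genetic code table that is selected. For a bird mitogenome the relevant table is NCBI translation table 2 (vertebrate mitochondrial), in which TAA, TAG, AGA and AGG are stop codons. The following is a minimal sketch of how the terminal codon of a coding sequence could be classified under that table. It is not part of MITOS or MitoZ; the function name and the example sequences are hypothetical.

```python
# Stop codons in NCBI translation table 2 (vertebrate mitochondrial code).
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def classify_stop(cds: str) -> str:
    """Classify the 3' end of a coding sequence (hypothetical helper).

    Returns the stop codon if the sequence ends in frame with one,
    'T--' or 'TA-' for an incomplete stop (assumed to be completed to
    TAA by polyadenylation of the transcript), or 'none' otherwise.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return last if last in VERT_MITO_STOPS else "none"
    tail = cds[-remainder:]
    if tail in ("T", "TA"):
        return tail + "-" * (3 - remainder)
    return "none"


# Toy sequences, not taken from the deposited genome.
for toy_cds in ("ATGGCCTAA", "GTGAAAAGG", "ATGGCCT"):
    print(toy_cds, classify_stop(toy_cds))
```

Under an invertebrate mitochondrial table, AGA and AGG would instead be read as serine, so the same check would report no stop codon for sequences that end in these codons.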

Results
The complete mitochondrial genome of E. godlewskii (Figure 1) is circular, with a total length of 16,839 bp. The sequence has been deposited in GenBank under accession number OQ509015. The base composition of the mitochondrial DNA (mtDNA) shows that the A + T content (52.87%) is slightly higher than the G + C content (47.13%). The complete mtDNA sequence contains 13 protein-coding genes (PCGs), 23 tRNA genes, 2 rRNA genes (12S rRNA and 16S rRNA), and a control region. The 13 PCGs encode ND1, ND2, COX1, COX2, ATP8, ATP6, COX3, ND3, ND4L, ND4, ND5, CYTB and ND6. Among the 13 PCGs, eleven use ATG as the start codon, while COX1 starts with GTG and ND3 with ATA. Nine PCGs end with complete (TAA: ND2, COX2, ATP8, ATP6, ND3, ND4L, CYTB; TAG: ND6) or incomplete (T: COX3) stop codons, while COX1 and ND1 terminate with AGG, and ND4 and ND5 with AGA. The 23 tRNA genes range in size from 66 bp (tRNA-Ser) to 75 bp (tRNA-Leu). The 12S and 16S rRNA genes are 976 bp and 1,592 bp long, respectively. The control region is 1,202 bp long and lies between the tRNA-Glu and tRNA-Phe genes (Figure 2).

Figure 1. An adult E. godlewskii. The species is characterized by a grey head with a chestnut lateral stripe; it is widely distributed in East Asia and tends to select bushy and rocky hill slopes that are often near forests, thickets, ravines, and farm fields. The photograph was taken in Jinzhong Mountain Nature Reserve in Guangxi province, China, by Dr X. Bao.

Figure 2. Circular map of the complete mitochondrial genome of E. godlewskii. The complete mtDNA contains 13 PCGs (ND1, ND2, COX1, COX2, ATP8, ATP6, COX3, ND3, ND4L, ND4, ND5, CYTB and ND6), 23 tRNA genes, 2 rRNA genes (12S rRNA and 16S rRNA), and a control region; the genes are encoded on both the plus and minus strands. The control region lies between the tRNA-Glu and tRNA-Phe genes. The outside of the ring represents the plus strand, while the inside represents the minus strand.
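The base composition reported in the Results is simple arithmetic over the nucleotide counts of the assembled sequence. The following is a minimal sketch of that calculation. It assumes the sequence is available as a plain string and ignores ambiguous bases; the function name and the toy sequence are hypothetical and are not taken from the deposited record.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return A+T, G+C, purine (A+G) and pyrimidine (C+T) percentages."""
    # Count only unambiguous bases; N and other IUPAC codes are skipped.
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")

    def pct(*bases: str) -> float:
        return round(100 * sum(counts[b] for b in bases) / total, 2)

    return {
        "A+T": pct("A", "T"),
        "G+C": pct("G", "C"),
        "purine": pct("A", "G"),
        "pyrimidine": pct("C", "T"),
    }


# Toy sequence for illustration only.
print(base_composition("AATTGC"))
```

Reporting A + T and purine content side by side makes it easy to confirm which of the two measures a published percentage refers to.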
To assess the phylogenetic relationships of E. godlewskii, we selected nine other Emberiza species whose mitochondrial genomes have been published and obtained their sequences from GenBank. The concatenated nucleotide sequences of the 13 PCGs from these 10 Emberiza species, including E. godlewskii, were used to construct the phylogenetic tree. We implemented a maximum-likelihood (ML) analysis with the best-fitting model HKY + I + G and 1000 bootstrap replicates. The phylogenetic analysis showed that E. godlewskii is closely related to E. cioides based on the available mitochondrial genome data (Figure 3).
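Concatenating the 13 PCGs amounts to joining the per-gene alignments end to end in a fixed gene order, so that every species contributes one long sequence of equal length. The sketch below illustrates that step only. It is not the authors' pipeline; it assumes each gene has already been aligned, and the function name, taxon labels and toy alignments are hypothetical.

```python
def concatenate_alignments(gene_alignments: dict, gene_order: list):
    """Join per-gene alignments into one supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Returns (supermatrix, partitions), where partitions lists
    (gene, start, end) with 1-based inclusive coordinates.
    """
    taxa = sorted(gene_alignments[gene_order[0]])
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    position = 1
    for gene in gene_order:
        alignment = gene_alignments[gene]
        if sorted(alignment) != taxa:
            raise ValueError(f"{gene}: taxon set differs from the first gene")
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(alignment[taxon])
        partitions.append((gene, position, position + length - 1))
        position += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy alignments for two genes and two taxa (illustrative only).
toy = {
    "ND1": {"E_godlewskii": "ATGC", "E_cioides": "ATGA"},
    "COX1": {"E_godlewskii": "GTGAAA", "E_cioides": "GTGAAG"},
}
matrix, parts = concatenate_alignments(toy, ["ND1", "COX1"])
print(matrix)
print(parts)
```

The partition coordinates are kept because ML programs commonly accept per-gene partitions, although the analysis described here used a single model for the concatenated data.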

Discussion and conclusion
In summary, the complete mitochondrial genome of E. godlewskii is 16,839 bp in length, almost the same size as the mitochondrial genomes reported for other Emberiza species. The phylogenetic analysis based on published mitochondrial genomes demonstrated that E. godlewskii is closely related to E. cioides. However, the phylogenetic position of E. godlewskii once the E. cia mitochondrial genome is added remains unclear. Thus, the mitochondrial genome of E. godlewskii reported in this study, together with future sequencing of the E. cia mitochondrial genome, will provide essential molecular data for further study of the E. cia/godlewskii complex.