Sequencing and analysis of the complete mitochondrial genome of Crocidura tanakae from China and its phylogenetic analysis

Abstract The complete mitogenome sequence of Crocidura tanakae was determined using long PCR. The genome was 16,969 bp in length and contained 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, 1 origin of L-strand replication, and 1 control region. The overall base composition of the heavy strand was A (32.5%), C (22.3%), T (31.9%), and G (13.3%). The base composition clearly showed an A-T skew, which was most obvious in the control region and the protein-coding genes. Phylogenetic analyses of the mitochondrial genomes using MP, ML, NJ, and Bayesian methods yielded identical trees. The five Crocidura species formed a monophyletic group with a high bootstrap value (100%) in all analyses. This study verifies the evolutionary status of C. tanakae within Soricidae at the molecular level. The mitochondrial genome is a significant supplement to the genetic background of C. tanakae.

In this article, the complete mitochondrial genome of Crocidura tanakae was sequenced for the first time on an ABI 3730XL using a primer-walking strategy and long and accurate PCR, with five pairs of long-PCR primers and 14 pairs of sub-PCR primers. A muscle sample was obtained from a female C. tanakae captured in the Bijie region of the Wumeng Mountains, Guizhou Province, China (26°24′22″N, 105°44′04″E). The muscle tissue was preserved in 95% ethanol and stored at −75 °C before use. The specimen and its DNA are stored in the Animal and Plant Herbarium of Mudanjiang Normal University. The voucher number is GZ2019004.
The control region of the C. tanakae mitochondrial genome was located between the tRNA-Pro and tRNA-Phe genes and contained only promoters and regulatory sequences for replication and transcription, with no structural genes. Three domains have been defined in the control region of the large mole mitochondrial genome (Zhang et al. 2009): the extended termination-associated sequence (ETAS) domain, the central conserved domain (CD), and the conserved sequence block (CSB) domain.
The total length of the protein-coding gene sequences was 11,415 bp. Most protein-coding genes initiated with ATG, except for ND2, ND3, and ND5, which began with ATC or ATT. Six protein-coding genes terminated with TAA, whereas the Cyt b gene terminated with AGA. Incomplete stop codons (T-- or TA-) were used in ND1, ND3, COX3, ATP6, and ND4. A strong bias against A at the third codon position was observed in the protein-coding genes. The frequencies of CTA (Leu), ATT (Ile), TTA (Leu), and ATA (Met) were higher than those of other codons. The tRNA genes ranged from 57 to 75 bp in length.
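The start codons, stop signals, and codon frequencies described above can be tallied from annotated coding sequences with a few lines of code. The following is a minimal sketch, not the procedure used in this study: the function names and the toy sequence are hypothetical, the stop-codon set is that of the vertebrate mitochondrial genetic code, and each coding sequence is assumed to end exactly at its annotated stop signal, so that one or two trailing bases indicate an incomplete stop codon (T-- or TA-).

```python
from collections import Counter

# Stop codons of the vertebrate mitochondrial genetic code
# (NCBI translation table 2).
STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}


def split_codons(cds: str) -> list[str]:
    """Split a coding sequence into complete codons, ignoring 1-2 trailing bases."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def start_codon(cds: str) -> str:
    """Return the first three bases of the coding sequence."""
    return cds[:3].upper()


def stop_signal(cds: str):
    """Return the complete stop codon, an incomplete stop (e.g. 'T--'), or None.

    Assumption: the sequence ends at its annotated stop signal, so trailing
    bases that do not fill a codon are reported as an incomplete stop codon.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 1:
        return cds[-1] + "--"
    if remainder == 2:
        return cds[-2:] + "-"
    last = cds[-3:]
    return last if last in STOP_CODONS else None


def codon_usage(cds: str) -> Counter:
    """Count every complete codon (a complete stop codon is included)."""
    return Counter(split_codons(cds))


def third_position_counts(cds: str) -> Counter:
    """Count the bases found at the third position of complete codons."""
    return Counter(codon[2] for codon in split_codons(cds))


# Toy sequence for illustration only; it is not taken from the C. tanakae genome.
toy_cds = "ATTCTATTAATAT"
print(start_codon(toy_cds))            # ATT
print(stop_signal(toy_cds))            # T--
print(codon_usage(toy_cds).most_common())
print(third_position_counts(toy_cds))
```

Summing the `Counter` objects over all 13 protein-coding genes would give genome-wide codon frequencies and third-position base counts comparable to those summarized in the text.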
Most C. tanakae mitochondrial genes were encoded on the H strand, except for the ND6 gene and eight tRNA genes, which were encoded on the L strand. Some intergenic spacers and reading-frame overlaps were found. One of the most typical was between ATP8 and ATP6. The L-strand replication origin (OL) was located within the WANCY region, which contains five tRNA genes (tRNA-Trp, tRNA-Ala, tRNA-Asn, tRNA-Cys, tRNA-Tyr). The OL was 37 bp long and had the potential to fold into a stable stem-loop secondary structure. The total base composition of the C. tanakae mitochondrial genome was A (32.5%), C (22.3%), T (31.9%), and G (13.3%). The base composition clearly showed an A-T skew, which was most obvious in the control region and the protein-coding genes.
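Base composition and skew statistics of this kind can be computed directly from a sequence string. The sketch below is illustrative only. The text does not state which formula was used for skew, so the conventional definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C) are assumed here, and all function names are hypothetical.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(comp: dict) -> float:
    """Combined A + T percentage."""
    return comp["A"] + comp["T"]


def skew(x: float, y: float) -> float:
    """Generic skew (x - y) / (x + y); assumed definition, see the lead-in."""
    return (x - y) / (x + y)


# Percentages reported in the text for the C. tanakae mitogenome.
reported = {"A": 32.5, "C": 22.3, "T": 31.9, "G": 13.3}
print(round(at_content(reported), 1))                # A + T content in percent
print(round(skew(reported["A"], reported["T"]), 4))  # AT skew
print(round(skew(reported["G"], reported["C"]), 4))  # GC skew
```

With the reported percentages, the A + T content is about 64.4%, the AT skew is slightly positive (about 0.009), and the GC skew is negative (about −0.25). Applying `base_composition` to the control region and to the concatenated protein-coding genes separately would allow the regional comparison mentioned above.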
To explore the evolution of insectivores, including Soricidae and Talpidae, and especially that of the genus Crocidura in China, we investigated the molecular phylogenetics of Chinese C. tanakae using the complete mitochondrial genome sequences of 36 species. All sequences generated in this study have been deposited in GenBank (Figure 1).

Disclosure statement
No potential conflict of interest was reported by the authors.