Sequencing and analysis of the complete mitochondrial genome of the taiga shrew (Sorex isodon) from China

Abstract
The complete mitogenome sequence of the taiga shrew (Sorex isodon) was determined using long PCR. The genome was 17,008 bp in length and contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, one origin of L-strand replication, and one control region. The overall base composition of the heavy strand is A (32.5%), C (24.5%), T (28.5%), and G (13.5%). The base composition shows a clear A–T skew, which is most evident in the control region and the protein-coding genes. The extended termination-associated sequence domain, the central conserved domain, and the conserved sequence block domain were identified in the control region of the taiga shrew mitochondrial genome. Phylogenetic analyses of mitochondrial genomes using maximum parsimony (MP), maximum likelihood (ML), neighbor-joining (NJ), and Bayesian methods yielded identical tree topologies. The eight Sorex species formed a monophyletic group with high bootstrap support (100%) in all analyses.

In this paper, the complete mitochondrial genome of the taiga shrew (Sorex isodon) was sequenced for the first time on an ABI 3730XL sequencer, using long and accurate PCR and a primer-walking strategy with five pairs of long PCR primers and 14 pairs of sub-PCR primers. A muscle sample was obtained from a female taiga shrew captured on Phoenix Mountain in the Changbaishan Mountains, Heilongjiang Province, China (44°27′48″N, 128°12′40″E). The muscle tissue was preserved in 95% ethanol and stored at −75 °C until use. The specimen is stored in the Animal and Plant Herbarium of Mudanjiang Normal University under voucher number FH2016104.
The mitochondrial genome is a circular, double-stranded DNA molecule 17,008 bp long, comprising 13 protein-coding genes, two rRNA genes, 22 tRNA genes, one origin of L-strand replication, and one control region. The annotated mitochondrial genome sequence was submitted to GenBank under accession number MG983792. The gene arrangement is consistent with that of other Soricidae species (Nikaido et al. 2001; Fontanillas et al. 2005; Xu et al. 2012, 2013, 2016; Huang et al. 2014, 2016; Jin et al. 2017; Liu, Tian, Jin, Dong, et al. 2017; Liu, Wang, et al. 2017) and most mammals (Meganathan et al. 2012; Yoon et al. 2013; Xu et al. 2012, 2013). The control region of the taiga shrew mitochondrial genome is located between the tRNA-Pro and tRNA-Phe genes and contains promoters and regulatory sequences for replication and transcription but no structural genes. Three domains were defined in the control region (Zhang et al. 2009): the extended termination-associated sequence (ETAS) domain, the central conserved domain (CD), and the conserved sequence block (CSB) domain. Three CSBs were found in the CSB domain, at positions 16,284-16,319, 16,714-16,748, and 16,764-16,802. Only one repetitive sequence (RS) region was found; it was located between CSB1 and CSB2 and was rich in A and C. The repeat pattern of the RS was 5′-TA-(CACGTACGCCTATA)n-CA-3′ (n = 14).
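The CSB coordinates and the RS repeat pattern above lend themselves to a quick programmatic check. The following is a minimal Python sketch, assuming the control-region sequence is available as a plain string; because the sequence itself is not reproduced in this paper, the demonstration uses a synthetic string built from the reported pattern. The names `CSB_POSITIONS`, `interval_length`, and `find_repeat_region` are hypothetical and are not part of the authors' workflow.

```python
import re

# CSB coordinates as reported in the text (1-based, inclusive genome positions).
CSB_POSITIONS = {
    "CSB1": (16284, 16319),
    "CSB2": (16714, 16748),
    "CSB3": (16764, 16802),
}

# Repeat unit and flanks from the reported pattern 5'-TA-(CACGTACGCCTATA)n-CA-3'.
REPEAT_UNIT = "CACGTACGCCTATA"
RS_PATTERN = re.compile(rf"TA((?:{REPEAT_UNIT})+)CA")


def interval_length(start: int, end: int) -> int:
    """Length in bp of an inclusive 1-based interval."""
    return end - start + 1


def find_repeat_region(seq: str):
    """Locate the first TA-(unit)n-CA array in seq.

    Returns (start, end, n) with 0-based, end-exclusive coordinates,
    or None if no such array is present.
    """
    match = RS_PATTERN.search(seq.upper())
    if match is None:
        return None
    copies = len(match.group(1)) // len(REPEAT_UNIT)
    return match.start(), match.end(), copies


if __name__ == "__main__":
    for name, (start, end) in CSB_POSITIONS.items():
        print(name, interval_length(start, end), "bp")
    # Synthetic stand-in for the control region (the real sequence is GenBank MG983792).
    demo = "GG" + "TA" + REPEAT_UNIT * 14 + "CA" + "GG"
    print(find_repeat_region(demo))
```

Under the reported pattern with n = 14, the array spans 2 + 14 × 14 + 2 = 200 bp, and the three CSBs are 36, 35, and 39 bp long. The greedy quantifier captures the longest run of whole units, so partial copies at either end of a real array would not be counted.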
The protein-coding genes had a total length of 11,448 bp. Most protein-coding genes started with ATG; the exceptions were ND2, ND3, and ND5, which started with ATA or ATC. Seven protein-coding genes terminated with TAA, whereas the Cyt b gene terminated with AGG. Incomplete stop codons (T-- or TA-) were used in ND1, ND2, COX3, ND3, and ND4. A strong bias against A at the third codon position was observed in the protein-coding genes. The codons CTA (Leu), ATT (Ile), TTA (Leu), and ATA (Met) were used more frequently than other codons. The tRNA genes ranged from 59 to 75 bp in length. Twenty-one of the 22 tRNAs could be folded into the typical cloverleaf secondary structure; the exception was tRNA-Ser(AGY), which lacked the entire dihydrouridine arm.
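The start/stop codon patterns and codon frequencies above are usually tallied gene by gene. The following is a minimal Python sketch, assuming each protein-coding sequence is supplied as a string in its 5′→3′ coding orientation (so an L-strand gene such as ND6 would first be reverse-complemented). The stop-codon set follows the vertebrate mitochondrial genetic code (NCBI translation table 2); the function names are hypothetical and do not describe the authors' annotation software.

```python
from collections import Counter
from typing import Iterable

# Complete stop codons of the vertebrate mitochondrial code (NCBI translation table 2).
COMPLETE_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def stop_codon_status(cds: str) -> str:
    """Describe how a coding sequence ends.

    Returns the complete stop codon (e.g. "TAA"), an incomplete stop
    written as "T--" or "TA-", or "none" if neither pattern applies.
    """
    cds = cds.upper()
    overhang = len(cds) % 3
    if overhang == 0:
        last = cds[-3:]
        return last if last in COMPLETE_STOPS else "none"
    tail = cds[-overhang:]
    if tail in ("T", "TA"):
        # Incomplete stops are completed to TAA by polyadenylation of the mRNA.
        return tail + "-" * (3 - overhang)
    return "none"


def codon_counts(cds_list: Iterable[str]) -> Counter:
    """Count in-frame codons across genes, ignoring any trailing partial codon."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        full = len(cds) - len(cds) % 3
        for i in range(0, full, 3):
            counts[cds[i:i + 3]] += 1
    return counts


def third_position_counts(cds_list: Iterable[str]) -> Counter:
    """Count bases at the third position of every complete codon."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        full = len(cds) - len(cds) % 3
        for i in range(2, full, 3):
            counts[cds[i]] += 1
    return counts
```

For example, `stop_codon_status("ATGAAATA")` reports `"TA-"`, the incomplete pattern listed above. Feeding all 13 genes to `third_position_counts` is one way to quantify the third-position bias; whether stop codons are excluded first is a choice this sketch leaves to the caller.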
Most of the taiga shrew mitochondrial genes were encoded on the H strand; the exceptions, ND6 and eight tRNA genes, were encoded on the L strand. Several intergenic spacers and overlapping reading frames were found, the most typical being the overlap between ATP8 and ATP6. The L-strand replication origin (OL) was located within the WANCY region, a cluster of five tRNA genes (tRNA-Trp, tRNA-Ala, tRNA-Asn, tRNA-Cys, and tRNA-Tyr). The OL was 35 bp long and had the potential to fold into a stable stem-loop secondary structure. The overall base composition of the taiga shrew mitochondrial genome was A (32.5%), C (24.5%), T (28.5%), and G (13.5%). The base composition showed a clear A–T skew, which was most evident in the control region and the protein-coding genes.
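The base-composition figures above can be summarized as AT content and strand skews. The following is a minimal Python sketch, assuming the commonly used definitions AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C); the paper does not state which measure underlies its "A–T skew" wording, so both AT content and the skew are shown. The function names are hypothetical, and per-region values (control region, protein-coding genes) would require the corresponding sequences from GenBank MG983792.

```python
def base_composition(seq: str) -> dict:
    """Percentage of A, C, G, and T in seq (other symbols count toward length only)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return {base: 100.0 * seq.count(base) / len(seq) for base in "ACGT"}


def at_skew(a: float, t: float) -> float:
    """AT skew = (A - T) / (A + T); positive when A exceeds T."""
    return (a - t) / (a + t)


def gc_skew(g: float, c: float) -> float:
    """GC skew = (G - C) / (G + C); negative when C exceeds G."""
    return (g - c) / (g + c)


# Whole-genome percentages as reported in the text (the four values sum to 99.0%).
REPORTED = {"A": 32.5, "C": 24.5, "T": 28.5, "G": 13.5}

if __name__ == "__main__":
    print(f"AT content: {REPORTED['A'] + REPORTED['T']:.1f}%")
    print(f"AT skew: {at_skew(REPORTED['A'], REPORTED['T']):.3f}")
    print(f"GC skew: {gc_skew(REPORTED['G'], REPORTED['C']):.3f}")
```

With the reported percentages this gives an AT content of 61.0%, an AT skew of about 0.066, and a GC skew of about −0.289, i.e. a slight excess of A over T and a marked excess of C over G on the H strand.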
To explore the evolution of the Insectivora, which includes the Soricidae and Talpidae, and especially the evolution of the genus Sorex in China, we investigated the molecular phylogeny of the Chinese taiga shrew using the complete mitochondrial genome sequences of 30 species. All sequences generated in this study have been deposited in GenBank (Figure 1).
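Of the four tree-building methods mentioned in the abstract, neighbor-joining (NJ) is the simplest to illustrate: it builds a tree from a pairwise distance matrix. The following is a minimal Python sketch of the standard Saitou–Nei NJ algorithm, using uncorrected p-distances for illustration. The distance measure, substitution model, and software actually used in this study are not stated here, so this is not the authors' pipeline; the function names are hypothetical, and the five-taxon matrix in the usage example is a textbook toy case, not data from this study.

```python
from typing import List, Sequence, Tuple


def p_distance(s1: str, s2: str) -> float:
    """Uncorrected p-distance between two aligned sequences, skipping gap columns."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def neighbor_joining(dist: Sequence[Sequence[float]],
                     labels: Sequence[str]) -> Tuple[str, List[tuple]]:
    """Saitou-Nei neighbor joining on a symmetric distance matrix.

    Returns an unrooted Newick string and the joins made, as
    (left, right, left_branch, right_branch) tuples in order.
    """
    if len(labels) < 3 or len(dist) != len(labels):
        raise ValueError("need a square matrix for at least three taxa")
    d = [list(map(float, row)) for row in dist]
    nodes = list(labels)
    joins = []
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from each joined node to the new internal node.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joins.append((nodes[i], nodes[j], li, lj))
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new internal node to every remaining node.
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"]
    # Resolve the last three nodes as a star around one internal node.
    l0 = (d[0][1] + d[0][2] - d[1][2]) / 2
    l1 = d[0][1] - l0
    l2 = d[0][2] - l0
    return f"({nodes[0]}:{l0:g},{nodes[1]}:{l1:g},{nodes[2]}:{l2:g});", joins


if __name__ == "__main__":
    toy = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
           [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    tree, _ = neighbor_joining(toy, ["a", "b", "c", "d", "e"])
    print(tree)
```

On the toy matrix this prints `(d:2,e:1,(c:4,(a:2,b:3):3):2);`. Ties in Q are broken by taking the first pair encountered, so tied matrices can yield different but equally valid topologies. Published mitogenome analyses usually replace raw p-distances with model-corrected distances (for example, Kimura two-parameter).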

Disclosure statement
No potential conflict of interest was reported by the authors.