Sequencing and analysis of the complete mitochondrial genome of the masked shrew (Sorex caecutiens) from China

Abstract The complete mitogenome sequence of the masked shrew (Sorex caecutiens) was determined using long PCR. The genome was 17,096 bp in length and contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, one origin of L-strand replication, and one control region. The overall base composition of the heavy strand was A (32.9%), C (24.5%), T (29.3%), and G (13.3%). The base composition showed a clear A–T skew, which was most obvious in the control region and the protein-coding genes. The extended termination-associated sequence domain, the central conserved domain, and the conserved sequence block domain were identified in the control region of the masked shrew mitochondrial genome. Phylogenetic analyses of mitochondrial genomes using MP, ML, NJ, and Bayesian methods yielded identical phylogenetic trees. The five Sorex species formed a monophyletic group with a high bootstrap value (100%) in all analyses.

In this paper, the complete mitochondrial genome of the masked shrew (Sorex caecutiens) was sequenced for the first time on an ABI 3730XL sequencer using a primer-walking strategy and long and accurate PCR, with five pairs of long-PCR primers and 14 pairs of sub-PCR primers. A muscle sample was obtained from a female masked shrew captured in the Luobei region of the Small Khingan Mountains in Heilongjiang Province, China (45°50′07″N, 132°84′74″E). The muscle tissue was preserved in 95% ethanol and stored at −75 °C before use.
The mitochondrial genome is a circular double-stranded DNA molecule of 17,096 bp comprising 13 protein-coding genes, two rRNA genes, 22 tRNA genes, one origin of L-strand replication, and one control region. The annotated mitochondrial genome sequence was submitted to GenBank under accession number MF374796. The gene arrangement is consistent with that of other Soricidae species (Nikaido et al. 2001; Fontanillas et al. 2005; Huang et al. 2014, 2016; Xu et al. 2016) and most mammals (Meganathan et al. 2012; Xu et al. 2012, 2013; Yoon et al. 2013).
The control region of the masked shrew mitochondrial genome was located between the tRNA-Pro and tRNA-Phe genes and contained only promoters and regulatory sequences for replication and transcription, with no structural genes. Three domains were defined in the control region (Zhang et al. 2009): the extended termination-associated sequence (ETAS) domain, the central conserved domain (CD), and the conserved sequence block (CSB) domain. Three CSBs were found in the CSB domain, located at positions 16,426-16,450, 16,813-16,844, and 16,866-16,891. In addition, a single repetitive sequence region (RS) was found between CSB1 and CSB2; it was rich in A and C. The repeat pattern in the RS was 5′-TA-(CACGTACGCCTATA)n-CG-3′ (n = 16).
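The repeat pattern reported above can be written out and checked with a few lines of Python. This is only an illustrative sketch built from the motif and copy number given in the text; the function names are hypothetical, and the string it builds is a reconstruction of the pattern, not the deposited sequence.

```python
import re

UNIT = "CACGTACGCCTATA"  # 14-bp repeat unit reported in the text
N_COPIES = 16            # copy number reported in the text


def build_repeat_region(unit: str = UNIT, n: int = N_COPIES) -> str:
    """Spell out 5'-TA-(unit)n-CG-3' as a plain string (hypothetical helper)."""
    return "TA" + unit * n + "CG"


def count_tandem_copies(seq: str, unit: str = UNIT) -> int:
    """Return the number of copies in the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


region = build_repeat_region()
copies = count_tandem_copies(region)

# Fraction of A and C within one repeat unit (the text calls the RS "rich in A and C").
ac_fraction = (UNIT.count("A") + UNIT.count("C")) / len(UNIT)

print(f"pattern length: {len(region)} bp")
print(f"tandem copies found: {copies}")
print(f"A+C fraction of the unit: {ac_fraction:.2f}")
```

Under these assumptions the pattern spans 228 bp (2 + 14 × 16 + 2), and A plus C make up 9 of the 14 bases in each unit. The same `count_tandem_copies` function could be applied to the control-region sequence from the GenBank record to confirm the copy number.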
The total length of the protein-coding gene sequences was 11,425 bp. Most protein-coding genes initiated with ATG; the exceptions were ND2, ND3, and ND5, which began with ATA or ATC. Seven protein-coding genes terminated with TAA, whereas the Cyt b gene terminated with AGG. Incomplete stop codons (T-- or TA-) were used in ND1, ND2, COX3, ND3, and ND4. A strong bias against A at the third codon position was observed in the protein-coding genes. The frequencies of CTA (Leu), ATT (Ile), TTA (Leu), and ATA (Met) were higher than those of other codons. The tRNA genes varied in length from 59 to 75 bp. Twenty-one of them could be folded into the typical cloverleaf secondary structure; the exception was tRNA-Ser (AGY), which lacked the complete dihydrouridine arm.
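The start and stop codon categories described above can be sketched as a small classifier. The codon sets follow the vertebrate mitochondrial genetic code (NCBI translation table 2); the function name and the toy sequences are hypothetical and are not taken from the masked shrew genome.

```python
from typing import Tuple

# Vertebrate mitochondrial code (NCBI translation table 2).
START_CODONS = {"ATG", "ATA", "ATT", "ATC", "GTG"}
STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}


def classify_ends(cds: str) -> Tuple[str, str]:
    """Return (start codon, stop type) for a coding sequence.

    Stop type is a complete stop codon, "T--" or "TA-" for an incomplete
    stop, or "none" if no recognisable stop signal is present.
    """
    cds = cds.upper()
    start = cds[:3]
    if start not in START_CODONS:
        raise ValueError(f"unexpected start codon: {start}")

    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        stop = last if last in STOP_CODONS else "none"
    elif remainder == 1 and cds.endswith("T"):
        stop = "T--"  # a single T, assumed to be completed to TAA in the transcript
    elif remainder == 2 and cds.endswith("TA"):
        stop = "TA-"  # TA, assumed to be completed to TAA in the transcript
    else:
        stop = "none"
    return start, stop


# Toy sequences for illustration only (not real gene sequences).
examples = {
    "complete_TAA": "ATGCCATTAGGATAA",
    "incomplete_T": "ATACCATTAGGAT",
    "incomplete_TA": "ATCCCATTAGGATA",
    "complete_AGG": "ATGCCATTAGGAAGG",
}
results = {name: classify_ends(seq) for name, seq in examples.items()}
for name, (start, stop) in results.items():
    print(f"{name}: start={start}, stop={stop}")
```

Incomplete stop codons are identified here purely by sequence length modulo three, which assumes the annotated gene boundaries are correct.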
Most of the masked shrew mitochondrial genes were encoded on the H strand, except for the ND6 gene and eight tRNA genes, which were encoded on the L strand. Several intergenic spacers and reading-frame overlaps were found, the most typical overlap being that between ATP8 and ATP6. The L-strand replication origin (OL) was located within the WANCY region, which contains five tRNA genes (tRNA-Trp, tRNA-Ala, tRNA-Asn, tRNA-Cys, and tRNA-Tyr). The OL was 36 bp long and had the potential to fold into a stable stem-loop secondary structure.
The total base composition of the masked shrew mitochondrial genome was A (32.9%), C (24.5%), T (29.3%), and G (13.3%). The base composition showed a clear A-T skew, which was most obvious in the control region and the protein-coding genes.
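As a worked illustration, the skew can be computed directly from the percentages reported above. The text does not state which formula was used, so this sketch assumes the conventional definitions, AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C); the helper name is hypothetical.

```python
def skew(x: float, y: float) -> float:
    """Compositional skew (x - y) / (x + y); assumed definition, not stated in the text."""
    return (x - y) / (x + y)


# Whole-genome base composition reported in the text (percent).
composition = {"A": 32.9, "C": 24.5, "T": 29.3, "G": 13.3}

at_content = composition["A"] + composition["T"]
at_skew = skew(composition["A"], composition["T"])
gc_skew = skew(composition["G"], composition["C"])

print(f"A+T content: {at_content:.1f}%")
print(f"AT skew: {at_skew:+.3f}")
print(f"GC skew: {gc_skew:+.3f}")
```

With the reported values, A+T content is 62.2%, AT skew is about +0.058, and GC skew is about −0.296, that is, a modest excess of A over T and a marked excess of C over G on the reported strand.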
To explore the evolution of insectivores (Insectivora), which include Soricidae and Talpidae, and especially that of the genus Sorex in China, we investigated the molecular phylogenetics of the Chinese masked shrew using the complete mitochondrial genome sequences of 26 species. All sequences generated in this study have been deposited in GenBank (Figure 1).