Rapid development of novel microsatellite markers from Mauremys reevesii (Testudines: Geoemydidae) using next-generation DNA sequencing technology

Abstract Mauremys reevesii (Gray, 1831), which belongs to the genus Mauremys of Geoemydidae (Testudines), is distributed in China, Japan and Korea. Previous studies have developed several polymorphic microsatellite loci for this species, but most of them were dinucleotide motifs. Here, we developed 15 polynucleotide-repeat microsatellite loci (including di-, tri-, tetra- and pentanucleotide motifs) for M. reevesii through restriction-site associated DNA tag sequencing (RAD-seq). A total of 987 microsatellite loci had flanking sequences suitable for designing primers for polymerase chain reaction (PCR). To verify the identified simple sequence repeats (SSRs), 40 primer pairs were selected for PCR detection. In total, 32 primer sets produced strong PCR products matching their expected sizes, of which 15 proved polymorphic in amplification tests. The number of alleles per locus ranged from 3 to 16. The observed and expected heterozygosity per locus varied from 0.3784 to 1.000 and from 0.3995 to 0.9700, respectively. The microsatellite isolation methodology used in this study is cost-effective and time-saving in comparison with traditional approaches, and the resulting markers can serve as useful tools for population genetics studies and conservation management of M. reevesii.


Introduction
Microsatellites are short tandem DNA repeats (Bu et al. 2011) and have been broadly used to assess population genetic structure, construct genetic linkage maps and so on, owing to their high polymorphism and co-dominance (Bu et al. 2014; Zhang et al. 2016). Traditionally, microsatellite markers have been isolated by two major approaches: one develops them from genomic libraries, using methods such as PIMA and FIASCO; the other uses microsatellite sequences from closely related species (Liu et al. 2012). However, these methods are costly and time-consuming, and de novo development is difficult in species without any genomic information. The emergence of next-generation sequencing technologies has greatly accelerated SSR development because of their ability to generate a large amount of sequence information quickly and economically (Inoue et al. 2013; Rico et al. 2014; Hu et al. 2016).
M. reevesii is widely distributed in China, Japan and Korea. Because of habitat destruction, overhunting and environmental pollution, its wild populations have decreased dramatically (Altherr & Freyer 2000; Spinks et al. 2004). Consequently, it has been categorized as endangered in the Chinese Red List of Threatened Species. Accurate genetic analysis is therefore important for the preservation and management of genetic resources and for the artificial breeding of M. reevesii. Previous studies have addressed the population genetic structure and genetic diversity of the species. For example, the RAPD technique was used to analyze the genetic diversity of M. reevesii at the molecular level (Zhu et al. 2005).
Eight novel polymorphic microsatellite loci developed by FIASCO have been reported for M. reevesii (Ye et al. 2009). The core motif of all eight microsatellites was AC/GT; the loci comprised five dinucleotide and three compound repeats, so the motif types were uniform. In order to isolate more types of microsatellites for analyzing genetic diversity, we report here the development of novel microsatellite primers (including di- to pentanucleotide motifs) for M. reevesii using RAD-seq technology (Castoe et al. 2012; Brandt et al. 2014; Nugraha et al. 2014).

Sample collection and DNA extraction
Procedures involving animals and their care were consistent with NIH guidelines (NIH Pub.No. 85-23, revised 1996) and in accordance with the approval of the Committee of Anhui Normal University under approval number #20130710.
Three sexually mature turtles (M. reevesii) (1♀, 2♂) were collected in Guangdong, Guangxi and Anhui, respectively. Total genomic DNA was extracted from the tail-tip tissue of each turtle by a standard phenol/chloroform procedure following proteinase K digestion. DNA quality was assessed on a 1% agarose gel.

DNA sequencing and microsatellite discovery
Extracted DNA was sent to the Genergy Biotechnology Company and sequenced by restriction-site associated DNA tag sequencing. Genomic DNA was digested with the restriction endonuclease PstI. The sequencing run yielded over 22.5 M reads, with a sequencing depth of 4.04. These reads were assembled into contigs. The contigs were searched for simple sequence repeats (SSRs) with MISA software, and 987 loci with sufficient flanking sequence were selected. We then looked for dinucleotide motifs with at least six repeats in the consensus sequences and tri-, tetra-, penta- and hexanucleotide motifs with at least five repeats, from which 40 loci were chosen for primer design using Oligo 7.0. The primers were designed with the following criteria: (i) GC content 40-60%; (ii) product size 150-350 bp; (iii) primer length 18-25 bp; and (iv) melting temperature 50-60°C with a maximum 2°C difference between paired primers.
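The repeat thresholds described above can be illustrated with a minimal Python sketch. This is not MISA and does not reproduce its output format; it is a simple regular-expression search for perfect repeats, assuming only the thresholds stated in the text (dinucleotides with at least six repeats; tri- to hexanucleotides with at least five). The function name `find_ssrs` and the example contig are hypothetical.

```python
import re

# Minimum repeat counts taken from the text: dinucleotides >= 6,
# tri- to hexanucleotides >= 5. Mononucleotides are not searched here.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Illustrative only: compound and interrupted repeats are not handled.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by >= (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ACAC"), so each tract is reported once.
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            repeat_count = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeat_count))
    return sorted(hits)


# Hypothetical contig: an (AC)8 tract and an (AGC)5 tract with short flanks.
example_contig = "TTGGT" + "AC" * 8 + "GTTCG" + "AGC" * 5 + "GGT"
for start, motif, count in find_ssrs(example_contig):
    print(start, motif, count)
```

In practice, hits would additionally be filtered for sufficient flanking sequence on both sides before primer design, as was done for the 987 loci reported here.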

SSR markers screening
Each primer pair was pre-tested on eight specimens of M. reevesii. Total genomic DNA was extracted from tail-tip tissue by the phenol-chloroform method. PCRs were performed in a total volume of 25 μL containing 1 μL template DNA (30-50 ng/μL), 1 U Taq DNA polymerase (TaKaRa Co., Ltd, Dalian, China), 2.5 μL 10× PCR buffer, 2 μL of 25 mM MgCl2, 2 μL of 25 mM dNTPs and 0.5 μL of each 25 mM primer. PCR cycling was as follows: pre-denaturation at 95°C for 5 min; 33 cycles of 94°C for 50 s, the annealing temperature for 45 s and 72°C for 60 s; and a final extension at 72°C for 10 min. The PCR products were separated on a 1% agarose gel.

PCR amplification and genotyping
An M13 tail (5ʹ-AGGGTTTTCCCAGTCACG-3ʹ or 5ʹ-GAGCGGATAACAATTTCACAC-3ʹ) was added to the 5ʹ end of the forward primer of each locus. The M13 universal primer, with the same sequence as the M13 tail, was labeled with FAM, TAMRA or HEX at its 5ʹ end. Each primer pair was tested on 37 specimens of M. reevesii (7 collected in Anhui, 14 in Guangxi and 16 in Guangdong), with the same PCR reaction system and cycling program as described above.
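The tailing step can be sketched in a few lines of Python to make the orientation explicit: the tail is prepended to the 5ʹ end of the locus-specific forward primer, so the labeled universal primer can prime on the tailed product. The two tail sequences are those given above; the dictionary keys, the function name and the example primer are hypothetical and are not taken from this study.

```python
# The two M13 tail sequences quoted in the text, written 5' to 3'.
# The keys "tail_1" and "tail_2" are hypothetical labels.
M13_TAILS = {
    "tail_1": "AGGGTTTTCCCAGTCACG",
    "tail_2": "GAGCGGATAACAATTTCACAC",
}


def add_m13_tail(forward_primer, tail="tail_1"):
    """Return the forward primer with the chosen M13 tail at its 5' end."""
    return M13_TAILS[tail] + forward_primer.upper()


# Hypothetical locus-specific forward primer (not a primer from this study).
tailed = add_m13_tail("acgtacgtacgtacgtac")
print(tailed, len(tailed))
```

Note that the tail lengthens each amplicon by the length of the tail (18 or 21 bases), which needs to be taken into account when comparing observed fragment sizes with the sizes expected from the contig sequence.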

Microsatellite cross-species amplification
Cross-species amplification of the 15 microsatellite loci was tested in six individuals, two of each species (Mauremys megalocephala collected in Anhui, M. mutica collected in Guangdong and Ocadia sinensis collected in Anhui), using the same amplification conditions described above. The PCR products were visualized on a 1% agarose gel.

Data analysis
PCR products were analyzed on an ABI PRISM 3730 Genetic Analyzer using ROX 350 or LIZ 500 (Applied Biosystems) as the size standard. Genemarker (Applied Biosystems) was used to size the fragments against the size standard. Popgene version 1.32 (Yeh et al. 1997) and Genepop 4.0 (Raymond & Rousset 1995) were used to test for Hardy-Weinberg equilibrium (HWE) and to estimate the number of alleles (N_A), observed heterozygosity (H_O) and expected heterozygosity (H_E) for each locus.
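For readers unfamiliar with these summary statistics, the following minimal Python sketch shows how N_A, H_O and H_E can be computed for a single locus from diploid genotypes. It is not the Popgene or Genepop implementation, which may apply different sample-size corrections; here H_E is computed as 1 minus the sum of squared allele frequencies, and Nei's unbiased estimate (multiplying by 2n/(2n - 1)) is returned alongside it. The function name and the example genotypes are hypothetical, and no HWE test is performed.

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarize one locus from a list of diploid genotypes (allele1, allele2).

    Returns the number of alleles (N_A), observed heterozygosity (H_O),
    expected heterozygosity (H_E = 1 - sum of p_i^2) and Nei's unbiased H_E.
    """
    n = len(genotypes)
    total_alleles = 2 * n
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    frequencies = [count / total_alleles for count in allele_counts.values()]
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    h_exp = 1.0 - sum(p * p for p in frequencies)
    h_exp_unbiased = h_exp * total_alleles / (total_alleles - 1)
    return {
        "N_A": len(allele_counts),
        "H_O": h_obs,
        "H_E": h_exp,
        "H_E_unbiased": h_exp_unbiased,
    }


# Hypothetical fragment sizes (bp) for four individuals at one locus.
example_genotypes = [(150, 150), (150, 154), (154, 158), (150, 158)]
print(locus_summary(example_genotypes))
```

A locus where H_O is much lower than H_E may show a heterozygote deficit, which is one of the patterns an HWE test is designed to detect.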

Results and discussion
RAD-seq yielded 987 contigs containing microsatellite loci, with an average size of 482 bp. Apart from mononucleotide repeats (40.93%), dinucleotide motifs were the most frequent (45.29%), followed by trinucleotide motifs (9.52%). Longer motifs, namely tetra- (1.11%), penta- (0.81%) and hexanucleotide (0.51%) motifs, together with compound motifs (1.82%), were the least frequent microsatellites in our dataset (Table I). In eukaryotes, dinucleotide repeats are the most abundant and tetranucleotide repeats are relatively few, and our results were consistent with previous studies (Carleton et al. 2002). AC (26.17%) and AGC (23.40%) were the most frequent di- and trinucleotide motifs, respectively. Most repeat tracts had a low number of tandem repeats, and few sequences had high numbers (Lü et al. 2013). Among the dinucleotide repeat sequences, repeats with 6-7 copies were the most common (30.60%); among trinucleotide repeat sequences, repeats with 4-5 copies were the most common (8.10%); and among mononucleotide repeat sequences, repeats with 10-11 copies were the most common (29.79%) (Table I).
Among the 15 microsatellites screened, dinucleotide motifs had high repeat numbers, whereas tri- to hexanucleotide motifs had relatively low repeat numbers; overall, the number of repeats decreased as motif length increased. Although dinucleotide microsatellites are more common than longer-motif loci in many species, PCR amplification of dinucleotide repeats has a much higher probability of yielding nonspecific products, owing to slipped-strand mispairing or non-templated nucleotide addition. These artifacts produce spurious alleles and make microsatellite alleles difficult to interpret. When the motif is a trinucleotide or longer, the probability of nonspecific products is significantly reduced (Haberl & Tautz 1999; Fernando et al. 2001). For the same number of repeats, alleles of long-motif microsatellites are easier to distinguish than those of dinucleotide microsatellites in population screening. Long-motif microsatellites were therefore considered to have better application value, and in this study we preferentially chose markers with long repeat motifs.
To avoid re-isolating known loci, the 15 novel microsatellite loci were compared with those reported previously (Ye et al. 2009); no identical sequences or primers were found. The 15 microsatellite loci generated amplification products with 3 to 16 alleles per locus. The observed and expected heterozygosity per locus varied from 0.3784 to 1.000 and from 0.3995 to 0.9700, respectively. Compared with loci reported for other turtle species, our microsatellite loci showed high polymorphism. For example, Fantin et al. screened 17 microsatellite loci from Podocnemis unifilis, with 3 to 11 alleles per locus and observed and expected heterozygosity ranging from 0.208 to 0.950 and from 0.395 to 0.592, respectively (Fantin et al. 2007). Que et al. screened 15 microsatellite loci from Pelodiscus sinensis, with 2 to 7 alleles per locus, observed heterozygosity ranging from 0.03 to 0.98 and expected heterozygosity ranging from 0.05 to 0.81 (Que et al. 2007). This suggests that M. reevesii has a high level of genetic diversity. In addition, only two loci (SLN 02, SLN 08) exhibited significant deviation from Hardy-Weinberg equilibrium (Table II), which may be caused by non-random sampling, the small number of individuals or the existence of null alleles.
The 15 primer pairs produced stable and clear bands in the three species tested (M. megalocephala, M. mutica, O. sinensis). Partial results of the cross-species amplification, visualized by 1% agarose gel electrophoresis, are shown in Figure 1, and the genotyping results are shown in Figure 2. These results indicate that the microsatellite flanking sequences are highly conserved among these turtles, suggesting that the 15 microsatellite loci in this study are conserved within Geoemydidae and could also be applied to genetic diversity research in other species of Geoemydidae (Baggiano et al. 2011).
RAD-seq was previously used to identify SSR markers in eggplant (Barchi et al. 2011); the sequences yielded nearly 2,000 putative SSRs, and primer pairs were designed to amplify 1,155 loci. In this study, we obtained a smaller number of target contigs (987), which we attribute to the low cutting efficiency of the PstI enzyme on turtle DNA. The RAD-seq methodology uses one restriction enzyme and random shearing to generate genomic fragments, which entails high levels of DNA loss and little control over the sequenced fragments, mainly for organisms without a reference genome (Hohenlohe et al. 2010). RAD-seq technology has since advanced: double-digest RAD-seq, which uses two restriction enzymes, has emerged (Bonatelli et al. 2015). It can overcome these shortcomings and increase the sequencing of the same genomic regions across individuals. This will offer a new way for us to screen microsatellites in the future.

Conclusion
In summary, 15 novel polymorphic microsatellite DNA loci have been developed specifically for M. reevesii. These markers may serve as a valuable tool for analyses of population genetics and gene flow, and may provide information on the evolutionary history of the species. This is the first development of polynucleotide-repeat microsatellite markers in M. reevesii using RAD-sequencing technology.

Disclosure statement
No potential conflict of interest was reported by the authors.