First microsatellite markers for Scaligeria lazica Boiss. (Apiaceae) by next-generation sequencing: population structure and genetic diversity analysis

ABSTRACT The Apiaceae family includes a few agronomic and medicinal species, one of which is Scaligeria lazica Boiss. In this study, the genetic diversity of S. lazica was analyzed based on novel simple sequence repeat (SSR) markers using next-generation sequencing (NGS). A total of 15.17G clean Illumina data set was obtained and dinucleotide repeats were the most abundant repeats in S. lazica. Of the tested 150 SSR primer pairs, 139 ones produced amplification and 84 ones were polymorphic. Forty polymorphic SSR loci were used in genetic diversity analysis of 40 S. lazica accessions from four locations. A total of 264 alleles were amplified with an average of 6.6 alleles per locus. The polymorphism information content (PIC) was 0.60, while the observed homozygosity (Ho) and expected heterozygosity (He) values were 0.47 and 0.66, respectively. According to cluster and structure analysis, all accessions were grouped into four different clusters according to their collection sites. The SSR markers developed in this study can be tested for other Scaligeria species due to their high transferability and can be used for genetic studies in genus Scaligeria DC.


Introduction
There is an overwhelming plant biodiversity on earth and efforts have been made to categorize plants based on their size, forms, habitat, structure, anatomy, biochemical and molecular features to interpret the relationships among them [1][2][3][4].
Apiaceae Lindl. (Umbelliferae) includes 300-455 genera with 3000-3750 species and is one of the best known families of flowering plants. Members of this family are of great economic significance due to their culinary use (vegetables, condiments) and for their application in medicine [5,6]. The genus Scaligeria DC. belongs to Apiaceae and is represented by seven species, of which two are endemic in Turkey [7,8]. The naturally growing Scaligeria species in Turkey are as follows: S. napiformis (Willd.ex Spreng) Grande, S. tripartita (Kalen.) Schischkin, S. lazica Boiss. (endemic), S. meifolia (Fenzl) Boiss., S. glaucescens (DC.) Boiss., S. hermonis Post and S. capillifolia Post. (endemic). S. lazica has a strong characteristic smell of anise due to the presence of trans-anethol (1-methoxy-4-(1-propenyl) benzene) in its oil, which is known as 'laz anasonu' in the Black Sea Region of Turkey [8,9]. The economically important trans-anethole-containing oils are used widely in the food and liquor industries [10]. trans-Anethole originates from anise in raki and is also used for flavour in many alcoholic drinks such as Middle Eastern arak, Colombian aguardiente, French spirits absinthe, anisette and pastis, Greek ouzo, mastika in Bulgaria and the former Yugoslav Republic of Macedonia. [11]. In addition, anethole shows estrogenic effects and is useful in providing relief from rheumatism and lower back pain [12,13]. Interestingly, some phytochemical markers of Pimpinella spp. are also detected in S. tripartita and S. lazica species [9,[14][15][16]. Studies on Scaligeria spp. are still very limited in the literature to clarify the taxonomy of the genus. A promising approach to resolving these problems in the taxonomy of Scaligeria species are molecular studies.
Simple sequence repeats (SSRs) are a very powerful technique due to their high rate of polymorphism and repeatability. The potential use of SSR molecular markers in the family Apiaceae has been demonstrated by different studies. Novel SSR markers were developed in Apiaceae and they have been used to study the genetic diversity, phylogenetic relationships and germplasm characteristics. The potential use of SSRs in Apiaceae has been demonstrated by different studies [17][18][19][20][21].
Next-generation sequencing is a very advantageous technology as regards the millions of sequences that can be generated at once with a lower unit cost. It is possible to have thousands of microsatellite loci sequenced very rapidly and cost-effectively in plants [22,23]. SSR markers in plants are used as a prominent genomic resource in the botanical sciences. SSRs are highly repeatable and transferable across closely related species [24][25][26][27]. An important feature of SSR markers is that multiple alleles per locus can be amplified, e.g. 10 alleles in a population. This variability makes SSR markers highly polymorphic and useful for genetic diversity analysis [27,28].
To date, there are no reports on SSR development and genetic diversity studies in Scaligeria lazica. Therefore, in this study, we aimed to develop novel SSR markers by next-generation sequencing and to study the population structure in S. lazica. Due to their high transferability, the markers from this study can also be useful as molecular tools for other species in genus Scaligeria.
DNA extraction was carried out by using the DNeasy mericon® 96 QIAcube HT Kit (QIAGEN Ltd., Manchester, UK) according to the manufacturer's protocol. DNA concentration was measured using a Qubit Fluorometer (Invitrogen, Massachusetts, USA) or estimated by comparing the band intensity with λ DNA of known concentrations after separation by 0.8% agarose gel electrophoresis and ethidium bromide staining. The DNA samples were then diluted with elution buffer (EB) to a concentration of 10 ng/mL for SSR-PCR (polymerase chain reactions) reactions.

SSR loci development by illumina sequencing and primer design
A DNA sample of one S. lazica accession (RCH-5) from the Rize-Camlihemsin location was used for next-  generation sequencing using the Illumina Hi-Seq TM 2500 platform. The library construction and sequencing were performed at the Beijing Genomic Institute, China. About 15.17G clean data were generated after removing adapter and low quality (quality value 5 (E) sequences. SOAPdenovo2 [29] was used for assembly and SSR loci were searched using SSRIT [30] software. The search parameters were set for detection of di-, tri-, tetra-, penta-and hexanucleotide SSR motifs with a minimum of 11, 8, 6, 5 and 4 repeats, respectively. The SSR loci were subjected to primer design using Primer 3 webbased software [31] with the standard parameters, following settings; the primer length was between 18 and 23 nucleotides, with an optimum size of 20 nucleotides. The melting temperatures ranged from 50 to 70 C, with an optimum temperature of 55 C. The optimum GC content was set to 50%, with a minimum of 30% and a maximum of 70%. The predicted PCR products ranged from 100 to 350 bp. In this study, 1982 SSR primer pairs were successfully designed and 150 SSR primer pairs were randomly selected and synthesized.
For the successful amplification of the SSR loci, their optimum annealing temperatures were determined by gradient PCR. Two accessions were used in the gradient PCR reactions. The gradient PCR products were run on 3% agarose gels and visualized after ethidium bromide staining. If a primer pair had amplification, then PCR and capillary electrophoresis were performed to determine its allelic profiles by using eight accessions (two from each location). The sequences of amplified SSR loci were deposited into the National Center for Biotechnology Information (NCBI) database (GenBank accession numbers KX234451-KX234589).

PCR reactions and electrophoresis
SSR-PCR was carried out using a three-primer strategy according to Schuelke [32] with some modifications [33]. PCR reactions were performed in a total volume of 12. Amplification was performed in a Veriti thermal cycler (Applied Biosystems, Massachusetts, USA) in two steps as follows: initial denaturation at 94 C for 3 min, followed by 10 cycles at 94 C for 30 s, 56-60 C for 45 s and 72 C for 60 s. Then, a second step included 30 cycles at 94 C for 30 s, 58 C for 45 s and 72 C for 60 s.
A final extension at 72 C for 10 min was included in the program. When the PCRs were completed, the reactions were subjected to denaturation for capillary electrophoresis in ABI 3130xl genetic analyzer [Applied Biosystems Inc., Foster City, CA, USA (ABI)] using 36 cm capillary array with POP7 as the matrix (ABI). Denaturation of the samples was completed by mixing 0.5 mL of the amplified product, 0.2 mL of the size standard and 9.8 mL of formamide. The fragments were resolved using ABI data collection software 3.0. and SSR fragment analysis was performed with GeneScan Analysis Software 4.0 (ABI).

Data analysis
After determining the alleles of each SSR locus by capillary electrophoresis (Figure 2), the number of alleles (Na), the number of effective alleles (Ne), observed (Ho), expected (He) heterozygosity and the number of private alleles were calculated using GenAlEx V6.5 software [34]. The polymorphism information content (PIC) was also calculated using PowerMarker ver 3.25 software [35]. A dendrogram was obtained using Jaccard coefficient in NTSYSpc v2.21c [36] software by unweighted pair-group method with arithmetic averages (UPGMA).
Population structure and identification of admixed individuals was performed using the model-based software program, STRUCTURE 2.3.4 [37]. In this model, a number of populations (K) are assumed to be present, each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned to populations (clusters), or jointly to more populations if their genotypes indicate that they are admixed. The posterior probabilities were estimated using the Markov Chain Monte Carlo (MCMC) method. The MCMC chains were run with a 50,000 burn-in period, followed by 500,000 iterations, using a model allowing for admixture and correlated allele frequencies. Run of STRUCTURE was performed by setting K from 1 to 5.

Results and discussion
Development of novel SSR loci and genetic diversity analysis in S. lazica A total of 15.17G clean Illumina data were generated to search SSR loci and 1982 SSR primer pairs were successfully designed. A total of 1982 SSRs dinucleotide repeats were the most abundant (70.63%), followed by trinucleotide (11.5%) and hexanucleotide repeats (11.3%), while tetranucleotide (4.3%) and pentanucleotide (2.1%) repeats were the least frequent ( Figure 3). The most abundant dinucleotide and trinucleotide repeat motif types were TC/GA and TTC/GAA, respectively.
Of the tested 150 SSR primer pairs by gradient PCR analysis, 139 primer pairs produced successful amplification. Fifty-five of 139 primer pairs were monomorphic in the screening of the SSRs in the eight accessions. Eightyfour SSR loci (60.4%) generated polymorphic fragments (Table S1 in the Online Supplement) and 40 were used for further analysis to fingerprint 40 Scaligeria accessions. The Genebank accession numbers, the sequences of forward and reverse primers, repeat motifs, annealing temperatures, product sizes, Na, Ne, Ho, He and PIC values of the 40 SSR loci are presented in Table 2.
The average values of population genetic parameters in this study are summarized in Table 3. The highest degree of genetic diversity values occurred in the RCE location, while the TSM location had the lowest values and RCH and TAD locations had similar degrees of genetic diversity. The number of private alleles in the RCH location was quite higher than that in the other three locations. For example, it was four times more than in TAD and 16 times more than in TSM. The polymorphism rate of 264 alleles was the highest (97.3%) in the RCE location, whereas the TSM (91.3%) and RCH (92.4%) locations had the lowest polymorphisms.

Genetic relationships and population structure
The genetic relationships among S. lazica accessions are provided in Figure 4. The clustering of the 40 S. lazica accessions was largely based on the place of collection and geographic origin. The UPGMA dendrogram grouped all accessions into two major clusters: the first cluster contained accessions from the RCH population, whereas the other three populations were in the second cluster. The second cluster had two main subclusters: one subcluster contained all accessions from the RCE location, whereas the second subcluster contained all accessions from Trabzon province (TAD and TSM).
The genetic structure of the Scaligeria accessions used in this study is shown in Figure 5. A model-based clustering method was performed for all 40 accessions using 40 SSR loci. The most probable number of clusters was identified by calculating the Delta K (DK), which is based on the rate of change in the log probability of data between successive K values (K = 1 to K = 5). The peak of the DK graph corresponds to the most probable number of populations in the data set, as previously discussed in Motalebipour et al. [38]. The highest value of DK was found at K = 2 (Figure 6), where all 44 genotypes were divided into two main groups: the RCH accessions and the other (RCE, TSM, TAD) accessions, which is similar to the UPGMA dendrogram ( Figure 4). As the value of K increased to 3, the genotypes in group 2 were divided into two subgroups: the first subgroup contained accessions of RCE and the second subgroup consisted of the other accessions. When K = 4, all accessions were classified into four different clusters according to their collection sites and geographical distance which was supported by another high DK found at K = 4. As the value for K increased to 5, the accessions were again divided into four groups and additionally provided grouping of RCE accessions based on their genetic diversity ( Figures 5 and 6).

Novel SSR markers and genetic diversity in S. lazica
Due to the codominant nature and high polymorphism, SSR markers are reliable tools for genetic diversity and population structure analysis as well as genetic mapping, fingerprinting and parental identification in plants. However, sequence information is needed for primer development for each species; therefore, the development process is expensive and time-consuming. Next-generation sequencing technology provides an opportunity to generate thousands of SSR loci in a very short time at a lower cost than Sanger sequencing. However, testing the loci for polymorphism in the lab is still expensive and time-consuming. However, it is a very powerful technique, once the SSR loci have been developed.
The Apiaceae family includes several agronomic and medicinal vegetable species, one of which is S. lazica in the genus Scaligeria. There are a limited number of SSR markers in the literature for Apiaceae family [21,39,40].  Till now, to the best of our knowledge, there is no study on development of SSR markers in Scaligeria in the literature. Thus, this is the first report that presents polymorphic SSR loci for Scaligeria. In this study, from 15.17G clean Illumina data, 1982 SSR primer pairs were successfully designed and the most frequent repeats were identified to be dinucleotides (70.63%), in agreement with previous reports. In Apiaceae family, Rijal et al. [21] and Cavagnaro et al. [17] reported polymorphic microsatellite markers in Heracleum persicum Desf. ex Fisch and Daucus carota L., respectively, and they found that the most frequent repeats are dinucleotides, as in our study. The most abundant dinucleotide motif in Scaligeria was TC/GA. The GA motif is generally one of the most abundant dinucleotide repeats in plants [41][42][43]. There are several reports on SSR development in Apiaceae: Michalczyk et al. [39] generated 12 SSR loci from Cnidium dubium (Schkuhr) Thell. and obtained an average of 8.  produced 264 polymorphic alleles. The number of generated and characterized polymorphic SSR loci was higher than those in the above-mentioned studies for Apiaceae family [21,[40][41][42][43][44]. The mean PIC value obtained in this study was higher (PIC = 0.60) than those in the previous reports discussed above. The average Ho value for the 40 polymorphic loci in this study was 0.47, which was higher than that for L. schaffneriana subsp. recurva and  D. carota L. ssp. sativus [40,44], while lower than that for C. dubium and H. persicum [21,39]. The average He for the 40 polymorphic loci in this study was 0.66, which was higher than the values reported in the previous studies in Apiaceae.
Among the populations, the Cayeli location in Rize province (RCE) had the highest genetic diversity values and polymorphism rate, while the Sumela location in Trabzon province (TSM) had the lowest levels of genetic diversity ( Table 3). The lowest level of polymorphism and number of private alleles were also in TSM location. The marginal distribution would reduce the opportunity to communicate with other populations and may lead to a low level of genetic diversity.

Cluster analysis and genetic structure of S. lazica
To identify the true optimal number of subsets (K) in STRUCTURE, LnP(D) and DK were chosen, as described in Ren et al. [45]. The K value that provides the maximum likelihood, called LnP(D) in STRUCTURE, is generally considered as the optimal number of subdivisions [37]. Our results from two main genetic sites can be summarized with the maximum of ad hoc measure DK at K = 2 ( Figure 6). All the accessions separated and consistently grouped well in both cluster and structure analysis. The dendrogram and structure analysis at K = 2 divided Scaligeria accessions into two main clusters: the first cluster included all RCH accessions, whereas the second cluster contained the accessions from the other collection sites. The second cluster divided into two subclusters: the first subcluster contained RCE with 10 accessions, whereas the second subcluster contained TSM and TAD which included 10 accessions in each group.
According to both cluster and structure analyses, the accessions from RCH and TAD locations were the most distant group, whereas the closest groups were TSM and TAD. These two locations were also geographically the closest (Table 1 and Figure 1). The high number of private alleles separated the RCH location from the others, which was also supported by cluster and structure analyses at K = 2 (Figures 4 and 5). In the structure analysis at K = 4, all accessions were classified into four different clusters according to their collection sites. The structure analysis demonstrated that the accessions in RCE location were the most diverse ones, which was supported by structure analysis at K = 5. The results presented in this study, therefore, provide information about the genetic diversity levels of S. lazica from different locations in Turkey. Similar diversity can also be observed in the production of trans-anethol by the accessions, which is the most important product of Scaligeria species.

Conclusions
Here we present novel SSR markers for the first time in S. lazica by next-generation sequencing. Dinucleotide repeats were most abundant (70.63%), followed by trinucleotide (11.5%) and hexanucleotide repeats (11.3%) in S. lazica. SSRs are a co-dominant marker system and can be applied not only as a useful molecular tool for the analysis of genetic variation, population structure and phylogenetic relationships of S. lazica accessions collected from different locations, but also may contribute to development of DNA markers linked to the agronomic and medicinal characters in S. lazica. Furthermore, novel SSR markers developed from this study can be also analyzed to test their transferability to other Scaligeria species.

Disclosure statement
No potential conflict of interest was reported by the authors.