Assessment of genetic diversity in Triticum urartu Thumanjan ex Gandilyan accessions using start codon targeted polymorphism (SCoT) and CAAT-box derived polymorphism (CBDP) markers

Abstract
Among the wild relatives of wheat, Triticum urartu Thumanjan ex Gandilyan is of particular interest to wheat breeders, and numerous studies have focused on its use for enriching the genetic variation of cultivated genotypes. In the present study, the genetic diversity and population structure of 85 accessions of T. urartu were investigated using start codon targeted (SCoT) polymorphism and CAAT-box derived polymorphism (CBDP) markers. Nineteen SCoT primers and fifteen CBDP primers amplified 185 and 141 polymorphic fragments, with an average of 9.74 and 9.40 fragments per primer, respectively. The polymorphism information content (PIC) for the SCoT and CBDP primers ranged from 0.41 to 0.50 and 0.40 to 0.49, with the resolving power (Rp) ranging from 21.61 to 3.97 and 13.08 to 28.02, respectively. Neighbour-joining (NJ) based clustering grouped all 85 accessions into two main clusters based on the two sets of markers and the combined data. The genetic relationships inferred from STRUCTURE analysis were consistent with the cluster analysis and principal coordinate analysis (PCoA), indicating that the accessions were grouped into two major clades and that the grouping patterns were not correlated with geographic origin. The results of this study revealed that Iranian T. urartu, especially the Kerend-e-Gharb and Sisakht-Pataveh populations, can be of interest for wheat improvement. Hence, conservation of the germplasm in these regions is recommended.


Introduction
Among the Einkorn wheats, Triticum urartu, which carries the Aᵘ genome, is a diploid and self-pollinated species (2n = 2x = 14) known as one of the main ancestral species of durum and bread wheat [1]. The main geographical origins of T. urartu are in the eastern and central parts of the Fertile Crescent, which include parts of Turkey, Iran, Armenia, Iraq, Azerbaijan, Syria and Lebanon [2]. According to reports, the western areas of Iran are a main centre of distribution of Einkorn wheat together with associated Aegilops species, and the richest wheat gene pool has been found in different parts of this country [3]. Hence, the distribution of this germplasm across different parts of Iran suggests that this area is an ideal source of diversity for discovering useful genes to transfer into modern wheat. Several studies have reported that T. urartu is an ideal source of genes related to seed quality and to tolerance to both biotic (especially leaf rust and powdery mildew) and abiotic (especially drought, salinity and cold) stresses [4][5][6][7][8][9][10][11][12].
Genetic variability among plant germplasm offers opportunities for improving plant characteristics. Molecular markers are among the most efficient tools for investigating genetic diversity and population structure. Over the past few decades, various molecular marker systems have been developed for a wide range of crop plants. Gene-targeted marker systems have now become an important and useful technique in genetic diversity assays. The start codon targeted (SCoT) polymorphism system is a novel molecular marker based on the short conserved region flanking the translation initiation codon (ATG) [13]. CAAT box-derived polymorphism (CBDP) is another novel, promoter-targeted marker, which uses the nucleotide sequence of the CAAT box of plant promoters. The CAAT box region has a specific pattern of nucleotides with a consensus sequence and is located upstream of the start codon of eukaryotic genes [14]. Owing to several advantages, such as repeatability, low cost and high polymorphism, these techniques have been successfully applied in genetic diversity studies of many plant species [14][15][16][17][18][19][20].
Genetic diversity in the wheat germplasm opens up new opportunities for the discovery of useful genes or alleles and for the improvement of varieties with desirable traits, which include both farmer- and breeder-preferred traits such as yield potential, large seed, high seed weight and tolerance to both biotic and abiotic stresses [21]. Previously, the genetic diversity and population structure of T. urartu populations have been studied using a large number of marker systems, including morphological characters [22], isozymes [23] and molecular markers [20,24,25].
Of these, molecular characterization has been demonstrated to be an efficient technique for studying genetic diversity, and it has been used successfully to dissect the genetic architecture and population structure of many plants [26][27][28][29]. However, there are no reports on the genetic diversity and population structure of T. urartu populations based on SCoT and CBDP markers. The main goals of this study were to investigate the genetic diversity and analyze the population structure of 85 accessions of T. urartu collected from different regions of Iran using these gene-targeted markers.

Plant materials
A total of 85 accessions of T. urartu were collected from seven natural habitats located in the western and southwestern regions of Iran (Table 1).

DNA extraction
Total genomic DNA was extracted from young leaves of glasshouse-grown seedlings according to the CTAB protocol [30]. DNA quality was assayed using 0.8% agarose gel electrophoresis.

DNA fingerprinting using SCoT markers
The nineteen SCoT primers used in this study were designed according to Collard and Mackill [13]. All 19 primers amplified scorable polymorphic bands and were selected for further analysis (Table 2). All polymerase chain reaction (PCR) amplifications were done in a 30-µL reaction mixture containing 15 µL of 2X PCR master mix (ready-to-use PCR master mix 2X; Ampliqon), 9.5 µL of double-distilled water, 4 µL of template DNA and 1.5 µL of each primer. The PCR amplifications (BioRad, T-100) were done under the following conditions: initial denaturation at 95 °C for 10 min, followed by 38 cycles of denaturation at 95 °C for 1 min, annealing at 53.7-62.8 °C (varied for each primer) for 50 s and extension at 72 °C for 2 min, and a final extension at 72 °C for 7 min. The PCR products were visualized in a 1.5% agarose gel, stained with SafeView II and finally photographed using a gel documentation system.

DNA fingerprinting using CBDP markers
For CBDP analysis, 15 primers were designed according to Singh et al. [14] (Table 2). All PCR amplifications were carried out in a 20-µL reaction mixture containing 6 µL of double-distilled water, 2 µL of template DNA, 2 µL of each primer and 10 µL of 2X PCR master mix (ready-to-use PCR master mix 2X; Ampliqon). CBDP-PCR amplification was done under the following conditions: initial denaturation at 94 °C for 4 min, followed by 30 cycles of denaturation at 94 °C for 1 min, annealing at 33.6-73 °C (varied for each primer) for 1 min and extension at 72 °C for 2 min. The final extension was 7 min at 72 °C. PCR products were visualized in a 1.5% agarose gel, stained with SafeView II and finally photographed using a gel documentation system.
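As a quick arithmetic check on the two reaction mixtures described above, the following minimal sketch sums the listed per-reaction volumes and scales the shared components for a batch. The `batch_volumes` helper and its `overage` parameter are hypothetical conveniences for illustration and are not part of the reported protocol.

```python
# Per-reaction volumes in microlitres, as listed in the SCoT and CBDP protocols.
SCOT_MIX = {"master mix (2X)": 15.0, "water": 9.5, "template DNA": 4.0, "primer": 1.5}
CBDP_MIX = {"master mix (2X)": 10.0, "water": 6.0, "template DNA": 2.0, "primer": 2.0}


def total_volume(mix):
    """Return the total per-reaction volume (µL) of a reaction mixture."""
    return sum(mix.values())


def batch_volumes(mix, n_reactions, overage=0.10):
    """Scale the shared components for a batch of reactions.

    Template DNA is excluded because it differs per accession.
    The 10% default overage is an assumption, not a value from the study.
    """
    factor = n_reactions * (1 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in mix.items()
        if name != "template DNA"
    }


if __name__ == "__main__":
    print(total_volume(SCOT_MIX))   # expected to match the stated 30 µL
    print(total_volume(CBDP_MIX))   # expected to match the stated 20 µL
    print(batch_volumes(CBDP_MIX, n_reactions=85))
```

The component volumes add up to the stated reaction volumes, which is a useful sanity check before preparing a batch for all 85 accessions.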

Data analysis
The amplified SCoT and CBDP fragments were scored as 1 for the presence and 0 for the absence of a band. Five indices, namely the number of polymorphic bands (NPB), percentage of polymorphic bands (PPB), polymorphism information content (PIC), resolving power (Rp) and marker index (MI), were used to assess the discriminatory power of the primers. To partition the genetic diversity, analysis of molecular variance (AMOVA) was performed using GenAlEx ver. 6.5 software [31]. Several genetic diversity indices, including the percentage of polymorphic loci (PPL), the observed (Na) and effective (Ne) numbers of alleles, Shannon's information index (I) and Nei's gene diversity (H), were estimated. Genetic dissimilarities were estimated based on Jaccard's coefficients [32], and a Fan-dendrogram was rendered by the Neighbour-joining (NJ) method using MEGA ver. 5.1 software [33]. Population structure was analyzed using STRUCTURE software version 2.3.4 [34] to show the Bayesian clustering patterns of the 85 studied accessions. This analysis was performed with a burn-in period of 5000 iterations followed by 50000 Markov Chain Monte Carlo (MCMC) replications. The analysis was run seven times for each K, ranging from 1 to 10. Finally, STRUCTURE HARVESTER, a program available online, was used to calculate ΔK [35].
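The text does not give the formulas behind the primer indices, so the sketch below assumes one widely used set of definitions for dominant markers: PIC = 2f(1 − f) per band, where f is the frequency of band presence; Rp as the sum of band informativeness Ib = 1 − 2|0.5 − f|; and MI = PIC × EMR, where EMR is the number of polymorphic bands multiplied by their proportion. The function names and the toy score matrix are illustrative only and may differ from what the authors actually computed.

```python
def band_frequencies(scores):
    """Frequency of band presence per locus.

    `scores` is a list of rows (accessions), each a list of 0/1 band scores.
    """
    n_accessions = len(scores)
    return [sum(column) / n_accessions for column in zip(*scores)]


def mean_pic(freqs):
    """Mean PIC over bands, assuming PIC = 2f(1 - f) for dominant markers."""
    return sum(2 * f * (1 - f) for f in freqs) / len(freqs)


def resolving_power(freqs):
    """Rp as the sum of band informativeness, Ib = 1 - 2|0.5 - f|."""
    return sum(1 - 2 * abs(0.5 - f) for f in freqs)


def marker_index(freqs):
    """MI = PIC x EMR, with EMR = n_polymorphic x (n_polymorphic / n_total)."""
    n_polymorphic = sum(1 for f in freqs if 0 < f < 1)
    emr = n_polymorphic * (n_polymorphic / len(freqs))
    return mean_pic(freqs) * emr


if __name__ == "__main__":
    # Toy matrix: 4 accessions x 3 bands (made-up data, not from the study).
    toy_scores = [
        [1, 0, 1],
        [1, 1, 0],
        [0, 1, 1],
        [0, 0, 1],
    ]
    freqs = band_frequencies(toy_scores)
    print(freqs, mean_pic(freqs), resolving_power(freqs), marker_index(freqs))
```

Under this definition, PIC for a dominant marker cannot exceed 0.5, which is reached when a band is present in exactly half of the accessions.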

Polymorphism revealed by SCoT and CBDP markers
The nineteen SCoT primers generated a total of 185 scorable bands, all of which were polymorphic (Table 2). The total amplified bands (TAB) ranged from 7 (SCoT-11 [...]

Our results indicated that SCoT markers showed higher TAB, NPB, PIC and MI than CBDP markers. The high values of these indices detected by the tested markers indicate a considerable level of genetic diversity in the Iranian T. urartu germplasm. Three informative parameters, PIC, Rp and MI, provide a benchmark that can help to determine the potential of markers in genetic analyses. In the present study, the average MI and PIC values for SCoT markers were greater than those for CBDP markers, suggesting a good capacity of the SCoT system to reveal polymorphism in the investigated T. urartu accessions. In contrast, the average Rp value of the CBDP markers was higher than that of the SCoT markers, which indicates the high ability of the CBDP markers to reflect the genetic diversity in T. urartu. These results are in agreement with Heikrujam et al. [36], Tiwari et al. [9], Etminan et al. [16,17,37] and Qaderi et al. [19], who reported that SCoT and CBDP markers were more useful for dissecting genetic diversity and structure than other molecular marker techniques.

Population genetic variation
The genetic diversity parameters calculated for each population are summarized in Table 3. Notably, of the seven populations, the highest values of the genetic diversity indices (Ne, I, He and PPL) were observed for the Kerend-e-Gharb, Songhor and Sisakht-Pataveh populations with each marker system. Based on the pooled data, Kerend-e-Gharb stood out as the most diverse population. The higher genetic diversity in this population might be attributed to the effect of different climatic conditions on its allele frequencies [38]. Moreover, this result suggests that this region may be a good source of variability for exploring new alleles and candidate genes for future breeding programmes. Our findings are in accordance with those of a previous study in which higher diversity was observed for T. urartu populations than for other wild wheat species [20]. The highest values of genetic variation parameters were reported for T. urartu and other wild diploid wheats sampled from the western regions of Iran, especially Kermanshah province [27]. The western provinces of Iran, especially Kermanshah, Lorestan and Kurdistan, are located in a part of the Fertile Crescent, and it is generally accepted that the genetic diversity among wheat germplasm there is higher than in other parts of Iran. Hence, according to these results, we believe that the Kerend-e-Gharb region can be an ideal source for the discovery of new gene resources. In agreement with our findings, Mousavifard et al. [25], using an ISSR marker system, revealed a high level of genetic diversity among Iranian wild diploid wheat species. Naghavi et al. [39], using three different marker systems (RAPD, AFLP and SSR), also indicated high diversity among the Einkorn wheat germplasm.
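To make the diversity indices discussed above concrete, the following minimal sketch computes them for two-allele loci from an allele frequency p (with q = 1 − p): effective number of alleles Ne = 1/(p² + q²), Nei's gene diversity 1 − (p² + q²), and Shannon's index I = −(p ln p + q ln q). How p is estimated from dominant band data depends on the software and its assumptions, so the allele frequencies are taken as given here; the function names and example values are hypothetical.

```python
import math


def locus_indices(p):
    """Return (Ne, gene diversity, Shannon's I) for one two-allele locus."""
    q = 1.0 - p
    homozygosity = p * p + q * q
    ne = 1.0 / homozygosity
    gene_diversity = 1.0 - homozygosity
    # Terms with zero frequency contribute nothing to Shannon's index.
    shannon = -sum(x * math.log(x) for x in (p, q) if x > 0)
    return ne, gene_diversity, shannon


def population_indices(allele_freqs):
    """Average the per-locus indices and report the percentage of polymorphic loci."""
    per_locus = [locus_indices(p) for p in allele_freqs]
    n_loci = len(per_locus)
    mean_ne = sum(v[0] for v in per_locus) / n_loci
    mean_h = sum(v[1] for v in per_locus) / n_loci
    mean_i = sum(v[2] for v in per_locus) / n_loci
    ppl = 100.0 * sum(1 for p in allele_freqs if 0 < p < 1) / n_loci
    return {"Ne": mean_ne, "H": mean_h, "I": mean_i, "PPL": ppl}


if __name__ == "__main__":
    # Made-up allele frequencies for three loci in one population.
    print(population_indices([0.5, 0.8, 1.0]))
```

A monomorphic locus (p = 1) gives Ne = 1 and zero diversity, while p = 0.5 gives the maximum values for a two-allele locus (Ne = 2, gene diversity 0.5, I = ln 2).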

Population genetic diversity
Analysis of molecular variance (AMOVA) was performed to dissect the genetic variation in T. urartu populations (Table 4). The AMOVA results showed that the percentage of molecular variance was higher within populations (SCoT = 97%, CBDP = 94%, pooled data = 95%) than among populations (SCoT = 3%, CBDP = 6%, pooled data = 5%), which suggests frequent gene exchange among these different geographical regions [9]. This result was confirmed by the inter-population differentiation (GST) and gene flow (Nm) parameters. The genetic differentiation coefficient (GST)/gene flow (Nm) values for SCoT, CBDP and pooled data were 0.07/6.20, 0.11/4.16 and 0.08/5.44, respectively. Wright [40] demonstrated that if Nm > 1, gene flow can counteract the differentiation among populations caused by genetic drift. In this study, high values of Nm were found with the SCoT, CBDP and pooled data (6.20, 4.16 and 5.44, respectively). This result can be explained by the size of the populations and the degree of distribution between different regions [41]. Similar results have been reported previously for genetic diversity in T. urartu germplasm using SSR markers [42], IRAP and REMAP markers [43] and ISSR [25], which support our findings.
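The relationship between the differentiation coefficient and gene flow can be illustrated with a short sketch. The estimator used in this study is not stated; the code assumes the commonly used approximation Nm = 0.5(1 − GST)/GST, so values computed from rounded GST figures will only approximate reported Nm values. The function names are hypothetical.

```python
def gene_flow(gst):
    """Estimate Nm from GST, assuming Nm = 0.5 * (1 - GST) / GST."""
    if not 0 < gst <= 1:
        raise ValueError("GST must lie in the interval (0, 1]")
    return 0.5 * (1 - gst) / gst


def differentiation(nm):
    """Inverse of gene_flow: the GST implied by a given Nm."""
    if nm < 0:
        raise ValueError("Nm must be non-negative")
    return 0.5 / (nm + 0.5)


if __name__ == "__main__":
    # Illustrative values only: lower differentiation implies higher gene flow.
    for gst in (0.05, 0.10, 0.25, 0.50):
        print(f"GST = {gst:.2f} -> Nm = {gene_flow(gst):.2f}")
```

Under this approximation, Nm exceeds 1 whenever GST is below one third, which is the regime in which gene flow is expected to outweigh drift.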

Genetic distance and grouping relationship
According to the SCoT analysis, the pairwise genetic distances estimated using Jaccard's coefficient ranged from 0.17 to 0.77, with an average value of 0.56, among all 85 accessions of T. urartu. The maximum genetic distance was found between two accessions from the Kerend-e-Gharb (accession No. 23) and Songhor (accession No. 69) populations, whereas the minimum distance was observed between two accessions of Sisakht-Pataveh (accession No. 36) and [...]

To investigate the relationships among the studied accessions, cluster analysis was carried out using the NJ method based on the dissimilarity coefficient matrix. The Fan-dendrograms constructed using the SCoT, CBDP and pooled (SCoT + CBDP) data grouped all accessions into two main clusters (Figure 1(A-C)). Based on the SCoT data, the first cluster (CI) consisted of 2, 8, 4, 4, 6, 4 and 9 accessions from the Bashmagh, Kerend-e-Gharb, Sisakht-Pataveh, Farokhshar, Bisetoon, Songhor and Marivan populations, respectively. The remaining accessions of each population were grouped together in the second cluster (CII) (Figure 1(A)). In the CBDP analysis, cluster I mainly consisted of accessions from Marivan and Bisetoon along with some accessions from the Songhor (4 accessions), Bashmagh (2 accessions), Sisakht-Pataveh (4 accessions), Farokhshar (4 accessions) and Kerend-e-Gharb (6 accessions) populations. More than half of the accessions from the Songhor, Bashmagh, Kerend-e-Gharb, Sisakht-Pataveh and Farokhshar populations were grouped in the second cluster (CII) (Figure 1(B)). The Fan-dendrogram rendered from the pooled data showed a high level of genetic diversity among the studied populations, and each cluster included several accessions of each population (Figure 1(C)).
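The pairwise distances that feed the cluster analysis can be sketched as follows, using the Jaccard dissimilarity d = 1 − a/(a + b + c), where a is the number of bands shared by two accessions and b and c are the bands unique to each. The accession labels and band profiles are made up for illustration, and the NJ tree construction itself (done in MEGA in this study) is not reproduced here.

```python
from itertools import combinations


def jaccard_distance(x, y):
    """Jaccard dissimilarity between two 0/1 band profiles of equal length."""
    shared = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    if either == 0:
        # Two profiles with no bands at all are treated as identical.
        return 0.0
    return 1 - shared / either


def pairwise_distances(profiles):
    """Return {(name_a, name_b): distance} for every pair of accessions."""
    return {
        (a, b): jaccard_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Hypothetical band profiles for three accessions.
    toy_profiles = {
        "acc1": [1, 1, 0, 1, 0],
        "acc2": [1, 0, 0, 1, 1],
        "acc3": [0, 0, 1, 0, 1],
    }
    for pair, dist in pairwise_distances(toy_profiles).items():
        print(pair, round(dist, 2))
```

Shared absences (0/0) are ignored by the Jaccard coefficient, which is one reason it is often preferred for dominant markers where the absence of a band can have several causes.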
Moreover, principal coordinate analysis (PCoA) was done based on the SCoT, CBDP and pooled data. According to the SCoT, CBDP and pooled data, the first two axes accounted for 63.03, 31.87 and 61.21% of the total molecular variation, respectively. Based on the PCoA biplots, all accessions were classified into two distinct groups (GI and GII), and the grouping of genotypes was not in accordance with their origins (Figure 2(A-C); see Table 1 for codes of genotypes). These findings were confirmed by the results of the cluster analysis (Figure 1). Our results showed that the clustering patterns did not agree with the geographical distribution, revealing a high rate of gene flow among populations from different regions. Previously, many studies demonstrated a good efficiency of SCoT and CBDP markers for the classification of different populations. For instance, in some studies, the phylogenetic relationships among different wild wheat species were studied and the results showed that this marker system had a high ability to discriminate species based on their genomic structure [20,44]. Similarly, Etminan et al. [18] reported that the CBDP markers are more efficient in differentiating various species of the Aegilops and Triticum genera.
Although several studies have shown that SCoT and CBDP markers are highly efficient in grouping individuals or species with different genetic backgrounds, our results revealed a low efficiency of these systems in grouping individuals according to their geographical origins.
Population structure refers to all genetic patterns of individuals within a population. In fact, the genetic structure of a natural population is characterized by the number of possible subpopulations within it, the frequencies of different alleles in each subpopulation, and the degree of genetic isolation of the subpopulations [45]. The population structure of the 85 T. urartu accessions was analyzed based on the polymorphism data generated by each marker system and on the pooled data using STRUCTURE analysis for K = 1 to K = 10. In all the analyses, the results obtained by STRUCTURE HARVESTER indicated that the maximum ΔK was reached at K = 2 (Figure 3(A-C)). At this K, all accessions were assigned to two main subpopulations. The inferred population structure for K = 2 indicated that most of the accessions had a membership coefficient (qi) to one of the subpopulations equal to or higher than 0.50 (qi ≥ 0.50). These results further confirmed the PCoA and cluster analysis.
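The choice of K from the ΔK statistic can be sketched as follows. The code assumes the usual definition, ΔK = |L(K+1) − 2L(K) + L(K−1)| / s[L(K)], where L(K) is the mean log-likelihood over replicate runs and s is its standard deviation (the sample standard deviation is used here, which is an assumption). The log-likelihood values are invented for illustration and are not the values obtained in this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    `lnp_by_k` maps each K to a list of ln P(D) values from replicate runs.
    Replicates with identical likelihoods would give a zero standard
    deviation and are not handled in this sketch.
    """
    avg = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in avg and k + 1 in avg:
            second_difference = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            result[k] = second_difference / stdev(lnp_by_k[k])
    return result


if __name__ == "__main__":
    # Made-up replicate log-likelihoods for K = 1..4 (three runs each).
    toy_runs = {
        1: [-1000, -1002, -998],
        2: [-800, -801, -799],
        3: [-780, -785, -775],
        4: [-770, -790, -750],
    }
    dk = delta_k(toy_runs)
    best_k = max(dk, key=dk.get)
    print(dk, "best K:", best_k)
```

Because ΔK needs both neighbouring values, it is undefined for the smallest and largest K tested, so a run over K = 1 to 10 yields ΔK only for K = 2 to 9.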

Conclusions
Our results revealed a high level of genetic diversity in the Iranian T. urartu populations. This was further supported by various statistical analyses, such as AMOVA, Bayesian clustering, PCoA and cluster analysis, which showed high variation within populations. Hence, this study suggests that exploring this highly diverse gene pool may lead to the identification of useful genes for wheat adaptation and improvement. Furthermore, the present study revealed that the SCoT and CBDP systems are powerful tools for assessing the genetic diversity of wild relatives of wheat. We therefore suggest that these DNA-based systems could be used in combination with other molecular markers for genetic analyses such as association mapping studies and the construction of linkage maps.