Genetic diversity and population structure analysis of landrace and improved safflower (Carthamus tinctorius L.) germplasm using arbitrary functional gene-based molecular markers

Abstract The genetic diversity and relationships among 48 landrace and improved safflower (Carthamus tinctorius L.) genotypes were analyzed using three gene-targeted marker systems: start codon-targeted (SCoT) polymorphism, conserved DNA-derived polymorphism (CDDP) and CAAT box-derived polymorphism (CBDP). A total of 30 primers (10 primers from each marker system) detected genetic polymorphism among the safflower genotypes. All three marker types showed a high level of polymorphism, and the CDDP markers produced a higher number of polymorphic bands (74) than SCoT and CBDP. The average PIC values for SCoT, CDDP, and CBDP were 0.39, 0.43, and 0.41, respectively. The marker index (MI) of the CDDP markers was higher than that of the SCoT and CBDP markers. Cluster analysis using SCoT and CDDP markers grouped the safflower genotypes into three distinct groups, whereas the CBDP markers divided the genotypes into two clusters. There were positive correlations between the similarity matrices obtained by each marker type. Cluster analysis and STRUCTURE analysis using the combined data grouped the genotypes into two clusters, generally in agreement with their origins. To our knowledge, this is the first detailed report of using gene-targeted molecular markers for genetic diversity analysis in safflower. Our results showed the efficiency of these markers for genetic diversity analysis in safflower and their potential for genome diversity and germplasm conservation.


Introduction
Safflower (Carthamus tinctorius L.) is one of the most important annual oilseed crops grown throughout the semiarid regions. Safflower is a highly branched, diploid (2n = 2x = 24), thistle-like annual plant that is believed to have originated and been domesticated in the Fertile Crescent region approximately 4500 years ago [1,2]. In recent years, safflower has become an important crop, especially with the increasing interest in its cultivation as a potential biofuel crop [3]. Traditionally, safflower has been grown for its seeds, and used as a food additive for colouring and flavouring foods, as well as in teas. Safflower florets have also been used in medicine for their soothing effect in cases of hysteria, such as that associated with chlorosis, and for reduction of blood cholesterol levels [4]. Crop genetic diversity is important for germplasm conservation and food security. The loss of diversity is reportedly one of the most important environmental concerns outlined by the Food and Agriculture Organization [5]. Wild relative species and landrace genotypes of domesticated crops possess valuable traits for crop breeding, such as pest and disease resistance [5]. Recently, safflower germplasm has been characterized in different studies, including morphological studies [6,7], biochemical analyses [8,9], and studies using different molecular markers such as randomly amplified polymorphic DNA (RAPD) [10-12], amplified fragment length polymorphisms (AFLPs) [13], inter-simple sequence repeats (ISSRs) [14,15], and simple-sequence repeats (SSRs) [16]. Characterization of the genetic diversity in plant species based on gene families can be exploited for better estimates of genetic relationships for germplasm conservation and breeding programmes [17,18].
In recent years, there has been a shift away from random DNA markers (such as RAPD, ISSR, and AFLP) towards gene-targeted markers such as start codon-targeted (SCoT) polymorphism [17], conserved DNA-derived polymorphism (CDDP) [18] and CAAT box-derived polymorphism (CBDP) [19]. The SCoT molecular markers are based on the short conserved regions flanking the ATG translation start codon in the plant genome. These markers can amplify both DNA strands [17,20]. SCoT markers have been applied for genetic diversity analysis in different plant species such as chickpea [20,21], potato [22], tomato [23], peanut [24], and wheat [25,26]. CDDP markers are developed on the basis of short conserved amino acid sequences, with polymerase chain reaction (PCR) primers designed from the corresponding DNA. CDDP markers are designed based on the sequences of gene families that are present in multiple copies in the plant genome. These primers are designed for specific annealing to known functional genes, such as homeobox (KNOX) or auxin-binding protein (ABP1) coding genes, and the generated polymorphic bands are detected in an agarose gel [18,27]. CDDP primers target plant genes that are mostly related to biotic and abiotic stresses. This technique has been shown to be highly polymorphic and efficient, and it has been successfully utilized in rice [17], bittersweet [27], chickpea [28], and wheat [25,26]. CBDP molecular markers exploit the CAAT box region of promoters in plant genes [19]. The CAAT box has a distinct pattern of nucleotides with the consensus sequence GGCCAATCT, is located approximately 80 bp upstream of the start codon of eukaryotic genes, and plays an important role during transcription [19]. These gene-targeted markers (SCoT, CDDP, and CBDP) have longer primers with a high annealing temperature, so their reproducibility has proved to be high compared to dominant arbitrary markers such as RAPD and ISSR.
The development of these markers is based on public biological databases and does not require genome sequences for specific species [28]. These are dominant markers; however, a number of co-dominant markers are also generated during amplification. A literature survey shows that only arbitrary dominant markers (RAPD, ISSR and AFLP) and SSR markers have been used to study genetic diversity in safflower. Therefore, in the present study, an effort was made for the first time to study the genetic diversity of 48 safflower genotypes, including both landrace and cultivated genotypes from different geographical regions, using gene-targeted (SCoT, CDDP and CBDP) markers. We also compared the efficiency of these markers in genetic diversity analysis.

Plant material and DNA extraction
A total of 48 safflower genotypes (Table 1), including 24 landrace accessions from different geographical locations of Iran, 12 Iranian improved genotypes provided by the Iranian Seed and Plant Improvement Institute (SPII) and 12 genotypes from different countries (provided by the Iranian National Gene Bank), were evaluated for genetic diversity using SCoT, CDDP, and CBDP molecular markers. All the genotypes were grown in the greenhouse and total genomic DNA was extracted from a pool of 10 plants of each genotype following a CTAB extraction protocol [29]. DNA concentrations were estimated by both spectrophotometry (260/280) and gel electrophoresis (0.8% agarose gel), and the DNA was used for PCR analysis at a final concentration of 30 ng/µL.

SCoT-PCR analysis
Ten SCoT primers developed by Collard and Mackill [17] were used for genetic diversity analysis in safflower genotypes (Table 2). PCR amplification was performed in 20-µL reactions containing 30 ng of template DNA, 1× PCR buffer, 0.2 mmol/L dNTPs, 0.4 µmol/L of primer and 1.5 U of Taq polymerase (Cinaclon, Iran). The PCR reaction was performed in a TGradient 96 Thermocycler (Whatman Biometra GmbH, Goettingen, Germany) as follows: 95 °C for 4 min, followed by 38 cycles of denaturation at 94 °C for 45 s, annealing at 49 °C for 45 s and extension at 72 °C for 2 min. A final extension step at 72 °C for 10 min followed. The PCR products were separated in 1.3% agarose gels and stained with ethidium bromide.

CDDP-PCR analysis
Ten CDDP primers (Table 2), designed from the protein sequences of well-characterized genes from diverse plant species [18], were used on the 48 safflower genotypes. CDDP primers were synthesized by Sinaclon Company (Tehran, Iran). CDDP-PCR was performed in a total volume of 20 µL containing approximately 100 ng of template DNA, 0.2 mmol/L dNTPs, 0.4 µmol/L of primer, and 1.5 U of Taq polymerase (Cinnagene, Tehran, Iran). The PCR reaction was performed in a TGradient 96 Thermocycler (Whatman Biometra GmbH) as follows: an initial denaturation step at 94 °C for 5 min, followed by 38 cycles of denaturation at 94 °C for 30 s, annealing at the optimal temperature for 40 s and elongation at 72 °C for 90 s, followed by a final elongation step at 72 °C for 10 min. The PCR products were analyzed using 1.5% agarose electrophoresis gels stained with ethidium bromide.

CBDP-PCR analysis
From the primer sequences previously designed [19], a set of 10 CBDP primers (Table 2) that produced clearly separated bands and high polymorphism was selected for genetic diversity analysis of the 48 safflower genotypes. PCR amplifications were performed in a total volume of 20 µL containing 30 ng of template DNA, 0.2 mmol/L dNTPs, 0.4 µmol/L of primer, and 1.5 U of Taq polymerase (Cinnagene). The PCR reaction was performed in a TGradient 96 Thermocycler (Whatman Biometra GmbH) as follows: initial DNA denaturation at 95 °C for 4 min, followed by 3 cycles of 1 min denaturation at 94 °C, 1 min annealing at 36 °C, and 2 min of extension at 72 °C. In the following 32 cycles, the annealing temperature was increased to 50 °C, with a final extension at 72 °C for 7 min. The amplification products were resolved in 1.5% agarose gels stained with ethidium bromide.

Data analysis
The PCR products of the SCoT, CDDP, and CBDP primers were scored visually. Only clear bands were considered for final scoring and data analysis. For each marker, the bands were scored for presence (1) or absence (0) in all 48 accessions. Cluster analysis by the unweighted pair-group method with arithmetic averages (UPGMA), based on the similarity matrices, was performed with the NTSYS-pc 2.1 program package (State University of New York, USA) [30]. The same package was used to perform principal coordinate analysis (PCoA) to visualize the genetic relationships among the 48 individual safflower genotypes. The Mantel statistic was used to compare the similarity matrices as well as the dendrograms produced by the SCoT, CDDP and CBDP techniques. Polymorphic information content (PIC) values were calculated for each SCoT, CDDP, and CBDP primer according to the formula $\mathrm{PIC} = 1 - \sum (P_{ij})^{2}$, where $P_{ij}$ is the frequency of the ith pattern revealed by the jth primer, summed across all patterns revealed by the primers. The marker index (MI) was obtained by multiplying the average PIC by the effective multiplex ratio (EMR). The EMR is the product of the number of polymorphic loci per primer (n) and the fraction of polymorphic fragments [30]. For the analysis of population structure, a Bayesian model-based analysis was performed using STRUCTURE 2.1 software (Pritchard Lab, Stanford University, USA) [31]. This software assumes a model in which there are K populations (clusters) that contribute to the genotype of each individual, and each population is characterized by a set of allele frequencies at each marker locus. A Markov chain Monte Carlo method was used to estimate allele frequencies in each of the K populations and the degree of admixture for each individual plant. The number of clusters was inferred using 10 independent simultaneous runs with 1000 replications, using the admixture model and correlated allele frequencies, with the K value ranging from 1 to 10.
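The PIC, EMR and MI definitions above can be illustrated with a minimal Python sketch. It is not the software used in the study, and it rests on two assumptions that the text does not state explicitly: each band is treated as a dominant locus with two patterns (present and absent), so that its PIC is 1 - (p^2 + q^2), and the primer-level PIC is the mean over polymorphic bands only. The function names and the toy matrix are hypothetical.

```python
def band_pic(p):
    """PIC of one dominant band scored present/absent, with presence frequency p."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def primer_summary(rows):
    """Summarize one primer.

    rows: one list of 0/1 scores per genotype; each column is one band.
    Assumption: PIC is averaged over polymorphic bands only.
    """
    n_genotypes = len(rows)
    n_total = len(rows[0])
    # Frequency of band presence across genotypes, per band
    freqs = [sum(row[j] for row in rows) / n_genotypes for j in range(n_total)]
    # A band is polymorphic if it is neither always present nor always absent
    poly = [p for p in freqs if 0.0 < p < 1.0]
    n_poly = len(poly)
    pic = sum(band_pic(p) for p in poly) / n_poly if n_poly else 0.0
    # EMR = number of polymorphic loci x fraction of polymorphic fragments
    emr = n_poly * (n_poly / n_total)
    return {
        "bands": n_total,
        "polymorphic": n_poly,
        "PIC": pic,
        "EMR": emr,
        "MI": pic * emr,  # marker index = PIC x EMR
    }


# Toy example: 4 genotypes x 3 bands (band 1 is monomorphic)
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
]
result = primer_summary(toy)
print(result)
```

Under this two-pattern assumption the PIC of a band cannot exceed 0.5, which is in line with the per-primer maxima of 0.48 to 0.49 reported in the results. The reported SCoT35 values (PIC 0.48, 11 polymorphic bands, MI 5.28) are likewise consistent with MI = PIC × EMR when every band of that primer is polymorphic.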

Results and discussion
SCoT markers diversity pattern
Ten SCoT primers were screened to study the genetic diversity among safflower accessions; all the primers produced distinct scorable fragments. The amplification profile of safflower genotypes generated using the SCoT12 primer is presented in Figure 1(a). A total of 73 bands were generated, out of which 61 bands were polymorphic (83% polymorphism) across 48 safflower accessions (Table 3). The maximum and minimum number of polymorphic bands were obtained using SCoT35 (11 bands) and SCoT22 (3 bands), respectively. The PIC values for the 10 primers ranged from 0.22 (SCoT1) to 0.48 (SCoT35) with an average of 0.39 per primer. The marker index (MI) of the primers ranged from 0.88 (SCoT1) to 5.28 (SCoT35). Varying levels of polymorphism using SCoT markers have been reported in different plant species. The level of polymorphism observed in safflower genotypes here was relatively higher than that reported in wheat [25], potato [22], and chickpea [28]. The Jaccard's genetic similarity values of the safflower genotypes based on SCoT molecular markers ranged from 0.49 to 0.93. Unweighted pair-group (UPGMA) clustering using the SCoT molecular dataset grouped the safflower genotypes into three major groups (Figure 2). Cluster I comprised 34 accessions and was divided into two sub-groups. The first sub-group contained Iranian landraces, whereas in the second sub-group, Iranian improved cultivars grouped with genotypes from China, Turkey, and Cyprus (Figure 2). Cluster II contained 10 genotypes, all of which were Iranian landraces. In cluster III, two Iranian landraces grouped with two cultivars from Turkey and Pakistan. The results also showed that the SCoT clusters had a relatively direct connection with the origin and the source of the genotypes.
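The Jaccard similarity and UPGMA grouping referred to above can be sketched in a few lines of Python. The study itself used NTSYS-pc; the pure-Python version below is only an illustration of the technique, with hypothetical function names and a made-up four-genotype example. It merges, at each step, the two clusters with the highest average pairwise similarity, which is the UPGMA (average linkage) criterion, and stops when k groups remain.

```python
def jaccard_similarity(a, b):
    """Jaccard coefficient of two 0/1 band profiles: shared bands / bands in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def upgma_groups(profiles, k):
    """Agglomerate genotypes by average linkage (UPGMA) until k groups remain.

    profiles: one 0/1 list per genotype. Returns lists of genotype indices.
    """
    n = len(profiles)
    sim = [[jaccard_similarity(profiles[i], profiles[j]) for j in range(n)]
           for i in range(n)]
    clusters = [[i] for i in range(n)]

    def average_similarity(c1, c2):
        # UPGMA: unweighted mean over all between-cluster genotype pairs
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        a, b = max(pairs,
                   key=lambda ab: average_similarity(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical band profiles for four genotypes
profiles = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
groups = upgma_groups(profiles, k=2)
print(groups)
```

This naive version recomputes cluster averages at every step, which is adequate for 48 genotypes but would be slow for large collections; a dedicated clustering library would be preferable there.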

CDDP markers diversity pattern
When 10 CDDP markers were used for genetic diversity analysis in the 48 safflower genotypes, a total of 89 sharp and scorable bands were generated, out of which 74 bands were polymorphic (83.1% polymorphism) across the 48 safflower genotypes. The amplification profile of the safflower genotypes generated using the ERF2 primer is presented in Figure 1(b). The maximum and minimum number of polymorphic bands were obtained using ABP-1 and WRKY-1 (10 bands) and Knox2 (4 bands), respectively. The PIC values for the 10 CDDP primers ranged from 0.24 (Knox2) to 0.49 (ABP-1) with an average of 0.43 per primer. The MI of the primers ranged from 0.96 (Knox2) to 4.6 (ABP-1 and WRKY-1). A high level of genetic diversity in safflower using ISSR and SSR markers has been reported previously [15,16]. The level of variability found with the CDDP markers in the safflower genotypes is similar to that found in durum wheat [26], bittersweet [27], and chickpea [28]. UPGMA clustering using the CDDP markers grouped the safflower genotypes into three major clusters (Figure 3). Cluster I included four genotypes from Iran, Pakistan and the United States. Cluster II comprised 22 genotypes, most of which were Iranian landraces, together with five improved cultivars originating from Iran. Cluster III comprised 21 genotypes, most of which were Iranian landraces and improved cultivars from Turkey, Cyprus and China. Interestingly, the Iranian landrace genotypes grouped in clusters II and III, and all Iranian improved cultivars also grouped in these clusters. The grouping of the genotypes into clusters was also connected with the geographical origin and the sources of the genotypes.

CBDP markers diversity pattern
The 10 CBDP primers used in this study yielded 65 distinct and bright bands, with sizes ranging from 200 to 2900 bp. The amplification profile of the safflower genotypes generated using the CAAT3 primer is presented in Figure 1(c). The number of bands varied from 5 (CAAT4 and CAAT5) to 9 (CAAT1, CAAT8 and CAAT10), with an average of 6.5 bands per primer. Out of 65 bands, 57 (87.6%) were polymorphic, and the number of polymorphic bands varied from 4 (CAAT4) to 9 (CAAT8), with an average of 5.7 bands per primer. The detected polymorphism per primer ranged from 57.5 to 100% (Table 2). The PIC values for the 10 CBDP primers ranged from 0.19 (CAAT4) to 0.49 (CAAT1) with an average of 0.41 per primer. The MI of the primers ranged from 1.16 (CAAT9) to 4.41 (CAAT1) (Table 2). UPGMA clustering grouped the 48 safflower genotypes into two major clusters. Cluster I comprised nine genotypes originating from the United States, Cyprus, and Turkey. The second cluster contained 39 genotypes, mainly originating from Iran. In this cluster, landrace and Iranian improved genotypes were grouped together in closer sub-clusters (Figure 4).
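The band totals reported for the three marker systems can be cross-checked with simple arithmetic. The short Python sketch below uses only the total and polymorphic band counts given in the text (73/61 for SCoT, 89/74 for CDDP, 65/57 for CBDP); the variable names are illustrative, and the pooled figure is a derived value that the text does not report.

```python
# Band totals as reported in the text: (total bands, polymorphic bands)
band_counts = {
    "SCoT": (73, 61),
    "CDDP": (89, 74),
    "CBDP": (65, 57),
}


def percent_polymorphism(total, polymorphic):
    """Share of polymorphic bands, in percent."""
    return 100.0 * polymorphic / total


# Per-marker percentages, rounded to one decimal place
summary = {
    marker: round(percent_polymorphism(total, poly), 1)
    for marker, (total, poly) in band_counts.items()
}

# Pooled over all 30 primers (derived here, not a figure from the text)
pooled_total = sum(total for total, _ in band_counts.values())
pooled_poly = sum(poly for _, poly in band_counts.values())
pooled_percent = round(percent_polymorphism(pooled_total, pooled_poly), 1)

print(summary)         # {'SCoT': 83.6, 'CDDP': 83.1, 'CBDP': 87.7}
print(pooled_total, pooled_poly, pooled_percent)  # 227 192 84.6
```

The computed values agree with the reported 83%, 83.1% and 87.6% to within rounding; the small differences for SCoT and CBDP suggest that the reported figures were truncated rather than rounded.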

Diversity and population structure analysis by combined data
The general dendrogram (Figure 5), constructed using the combined data of all the molecular markers used in this study (SCoT, CDDP, and CBDP), grouped the genotypes into two major clusters. Cluster I comprised nine genotypes, which mostly originated from the USA and Turkey. Cluster II contained 39 genotypes, most of which originated from Iran. The dendrogram from the CBDP data was most consistent with the general dendrogram. The cophenetic . The Jaccard's similarity matrices generated from the whole marker dataset were used for PCoA. The 48 safflower genotypes were clearly classified into two groups (Figure 6), which was in agreement with the UPGMA clustering above. The first and the second principal axes explained 26.11 and 12.01% of the variation, respectively. The genetic structure of the 48 safflower genotypes was further explored using the Bayesian clustering model implemented in the STRUCTURE software. The results showed the highest peak at K = 2, indicating the presence of two major clusters (Figure 7). These results mean that the 48 genotypes should be divided into two populations (Figure 8). The results obtained from the STRUCTURE analysis are in good agreement with those obtained from the UPGMA clustering and PCoA. In this study, the three gene-targeted molecular markers (SCoT, CDDP, and CBDP) used to study the genetic diversity among landrace and improved safflower genotypes from different sources demonstrated that gene-targeted molecular markers have advantages over dominant random markers (such as ISSR, RAPD, and AFLP), as these markers reveal genetic diversity from the genic regions of the genome and this functional diversity can be used in any species [32,33]. Gene-based molecular markers have not been used for genetic diversity analysis in safflower so far.
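The agreement between similarity matrices from different marker systems was assessed with the Mantel statistic, as described in the data analysis section. A minimal permutation version of that test can be sketched in Python as follows. This is an illustration only, not the NTSYS-pc implementation: the function names, the number of permutations, the one-sided p-value and the example matrix are all assumptions.

```python
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(m1, m2, permutations=999, seed=0):
    """Mantel correlation between two symmetric matrices, with a permutation p-value.

    Rows and columns of m2 are permuted together; the p-value is the share of
    permutations whose correlation is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n = len(m1)
    x = upper_triangle(m1)
    r_obs = pearson(x, upper_triangle(m2))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[m2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical similarity matrix for four genotypes, compared with itself
example = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
r, p = mantel(example, example, permutations=199)
print(r, p)
```

With only four genotypes the permutation p-value has little resolution; the example is meant to show the mechanics, not to mirror the 48-genotype analysis.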
To our knowledge, the present study is the first analysis of the genetic diversity among landrace and improved safflower genotypes collected from different geographical sources using gene-targeted (SCoT, CDDP, and CBDP) molecular markers. In our study, the three types of markers (SCoT, CDDP, and CBDP) showed a high level of polymorphism and were found to be effective in determining the genetic diversity among safflower genotypes. Their efficiency was evident from the high number of polymorphic bands and the PIC and MI values. A high level of diversity in safflower using SSR [34,35], AFLP [36], RAPD [11], and ISSR [37] markers has been reported earlier. In our study, the three marker types showed similar efficiency with regard to the average polymorphism and PIC values. The average PIC value of the CDDP markers (0.43) was higher than those of the SCoT and CBDP markers (0.39 and 0.41, respectively). The average MI of the CDDP markers (3) was also higher than those of the SCoT and CBDP markers (2.08 and 2.64, respectively). The utility of CBDP markers for genetic diversity analysis has been reported in cotton, linseed and Ajwain [19,38], Jojoba [33], and Kalmegh [39], whereas SCoT and CDDP markers have proved effective in the analysis of genetic diversity in diverse plant species [20,22,23,25,26,39-41]. These functional markers are also known as perfect markers, or diagnostic markers. The most important advantages of these types of markers are the direct link between the markers and the alleles of a locus of interest and the prevention of loss of information in marker-assisted selection due to recombination between gene and marker [42]. Gene-targeted molecular markers were used in the present study for several reasons. Firstly, these markers are easy to work with in laboratory conditions, and their reproducibility is higher than that of arbitrary markers such as RAPD markers owing to the use of longer primers [17,18].
We found that there was a reasonable relationship between the diversity patterns obtained by gene-targeted markers and the geographical origin. The high correlations between the genetic distances and the geographical origin of the examined safflower genotypes suggest that natural selection has significantly affected the genome regions that were amplified by SCoT, CDDP, and CBDP markers. On the other hand, if more genotypes from the Fertile Crescent were included in the genetic diversity analysis, the results for geographical relationships and origins might be more reliable. This was particularly evident from the grouping of the genotypes from Iran and Turkey in distinct clusters. Overall, the present study showed high genetic diversity in safflower genotypes, as well as in Iranian landrace accessions. This level of genetic variation can be useful for more systematic germplasm management and utilization in breeding programmes [21].

Conclusions
This study, for the first time, demonstrated the capability of three different gene-targeted molecular markers (SCoT, CDDP, and CBDP) not only to determine the polymorphism rate, but also to be used for management of the genetic diversity in different improved and landrace safflower genotypes as well as in applied breeding programmes, particularly for the development of a core collection. The obtained results provide data for the characterization of safflower genotypes in addition to previous analyses by RAPD, AFLP, and SSR markers. The results showed that SCoT, CDDP, and CBDP molecular markers can be used as reliable techniques for detecting the levels of DNA polymorphism and genetic relationship in safflower.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This study was supported by Islamic Azad University, Sanandaj Branch, Iran under grant number 1703/18.09.93.