Genetic diversity of cultivated pistachio as revealed by microsatellite molecular markers

ABSTRACT Pistacia vera L. is the only cultivated and commercially grown species in its genus. Iran is one of the two major centers of Pistacia diversity and the world's main producer of pistachios. Genetic diversity is crucial for the sustainable use and conservation of genetic resources. To investigate the genetic diversity of pistachio (Pistacia vera), we genotyped 42 cultivars of this species using 20 polymorphic nuclear simple sequence repeat (nSSR) markers. The nSSR markers generated 3–7 alleles per locus (102 in total), with an average of 5.10 per locus. Polymorphism information content (PIC) ranged from 0.36 to 0.86 (average 0.64), observed heterozygosity (Ho) from 0.21 to 0.79 (average 0.44) and expected heterozygosity (He) from 0.11 to 0.39 (average 0.22). Genetic similarity values obtained from Dice's coefficient ranged from 0.08 to 0.93. The UPGMA (unweighted pair group method with arithmetic mean) dendrogram and Bayesian clustering each separated the cultivars into two major clusters of 21 cultivars, and the neighbour-joining method was used to infer their phylogenetic relationships. The results indicate wide genetic variability within the species and can support further research on germplasm conservation and plant breeding.


Introduction
The genus Pistacia belongs to the family Anacardiaceae and consists of 11 species [1]. Pistacia vera L. (2n = 2x = 30) is the commercially grown species of the genus [2] and is native to northern Afghanistan, northeastern Iran and the Central Asian republics [3]. It is widely cultivated, especially in Iran, where it accounts for a major share of the country's non-petroleum exports [4]. Pistachio is one of the oldest nut crops in human history, cultivated since ancient times [1]. Pistachio nuts also have high nutritional value and are commercially important [3]. Two other species occur in Iran, P. atlantica subsp. mutica and P. khinjuk, which are mainly used as rootstocks for P. vera [4]. Iran is the world's largest pistachio producer and has the largest cultivation area of this crop [5], but in recent years its yields have been low compared with those of countries such as the United States and Turkey.
Although morphological, physiological and biochemical methods are established tools for studying genetic diversity in plant species, they have several limitations: insufficient polymorphism, the effect of environmental conditions on character expression, and the time needed for a full description, since several characters must be evaluated over the entire growth period of the plant. For these reasons, morphological, physiological and biochemical methods cannot be the only techniques used to identify tolerant cultivars. The genetic bottlenecks imposed by crop domestication and subsequent improvement programmes mean that modern improved cultivars capture only a small fraction of the total genetic diversity in a species' gene pool [6]. The amount of genetic diversity within a crop species depends on the type of genetic resource in question, i.e. whether it is a wild genetic stock or highly selected material from a breeding programme, as well as on the breeding system and domestication history [7]. Knowledge of genetic diversity can facilitate processes such as transferring genes from wild species by direct gene transfer or vector-based methods without crossing, shortening breeding programmes, and rapid and safe selection of hybrids carrying a foreign gene. Continual advances in crop improvement through plant breeding are driven by the available genetic diversity, so recognizing and measuring this diversity is crucial to breeding programmes. This knowledge also provides the basis for decisions on germplasm collection, preservation and exchange. Fast, reliable, cost-effective and objective methods for identifying and describing plant material are thus important [8].
Molecular marker technology now provides a new way to identify cultivars and evaluate genetic diversity, since DNA markers allow cultivars to be compared directly at the molecular level [9]. Several DNA marker systems have been applied in pistachio studies, including RFLP [8], RAPD [10,11], AFLP [12], SRAP [13] and SSR [14,15]. Microsatellites (SSRs) are tandem repeats of one to six nucleotides that are widespread in both eukaryotic and prokaryotic genomes [16,17]. SSR markers offer several advantages, including speed, simplicity, co-dominant inheritance, high polymorphism and abundant distribution throughout the genome, and they have been widely used for genetic mapping, comparative analysis and QTL analysis in plants [18].
Iran is considered the origin of a number of fruits and nuts because of its highly diverse climatic conditions and geographic regions [19]. Despite the extensive genetic diversity of Iranian pistachio germplasm, few studies have addressed this important crop, and only a few of its cultivars are used commercially, which puts the germplasm of this species in Iran at risk. Identification and registration of Iranian pistachio cultivars are therefore essential for managing and protecting this germplasm, for breeding programmes and for commercial production. In the current study, nuclear SSR (nSSR) markers were used to investigate the extent of diversity among P. vera cultivars collected across Iran.

Materials and methods
A total of 42 pistachio cultivars from the major growing centers of the crop in Iran were used (Table 1). Young leaves were collected from each cultivar from June to July, immediately frozen in liquid nitrogen and ground into powder. Total genomic DNA was extracted from 100–200 mg of leaf material following the CTAB-based procedure of Doyle and Doyle [20]. DNA quality and quantity were estimated visually on 1.00% agarose gels stained with ethidium bromide and with a spectrophotometer.
For nuclear DNA amplification, a total of 20 nSSR primer pairs were used (Table 2). These primer pairs were originally developed for P. vera by Topcu et al. [21], who reported that they could be useful in genetic studies of the genus Pistacia for purposes such as fingerprinting, population genetics and molecular characterization. nSSR PCR was carried out in a thermal cycler (MyCycler, Bio-Rad, Germany) in a total volume of 25 µL containing 20 ng/µL genomic DNA, 1× PCR buffer (10 mmol/L Tris-HCl, pH 8.00; 50 mmol/L KCl), 1.50 mmol/L MgCl2, 200 µmol/L of each dNTP, 10 pmol of each primer and 1 U Taq DNA polymerase (CinnaGen, Iran). Reactions were performed using a touchdown PCR program: initial denaturation for 5 min at 94 °C; 10 cycles of 45 s at 94 °C, 45 s at an annealing temperature starting at 63 °C and decreasing by 0.80 °C per cycle, and 1 min at 72 °C; then 25 cycles of 45 s at 94 °C, 60 s at the annealing temperature of each primer set and 1 min at 72 °C; and a final extension of 7 min at 72 °C. Amplification was confirmed by running 10 µL of PCR product on 2.50% agarose gels. The amplification products were then resolved on 14% non-denaturing polyacrylamide gels and visualized by silver nitrate staining. nSSR alleles were scored systematically against a size ladder.
Polymorphism information content (PIC) was calculated for each primer pair following Powell et al. [22]:
$$\mathrm{PIC} = 1 - \sum_i p_i^2,$$
where $p_i$ is the frequency of the $i$th allele. The marker index (MI) was calculated as
$$\mathrm{MI} = \overline{\mathrm{PIC}} \times n \times \beta,$$
where $\overline{\mathrm{PIC}}$ is the mean PIC value, $n$ is the number of bands and $\beta$ is the proportion of polymorphic bands [22]. For each primer pair, the number of different alleles (Na), number of effective alleles (Ne), number of private alleles (Np), observed heterozygosity (Ho), expected heterozygosity (He) and discrimination power (Dp) were estimated using GenAlEx 6.5 [23].
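The per-locus statistics above can be illustrated with a short Python sketch. It assumes diploid genotypes coded as pairs of allele sizes, uses the PIC expression given above, and applies the standard definitions Ne = 1/Σp_i² and Ho = proportion of heterozygous individuals. The function names and the four-genotype example locus are hypothetical, and the sketch is not a reimplementation of GenAlEx. Note that, in the form written above, PIC coincides with the common expected-heterozygosity expression He = 1 − Σp_i².

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies p_i at one locus from diploid genotypes (allele1, allele2)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pic(freqs):
    """PIC = 1 - sum(p_i^2), the expression given in the text."""
    return 1.0 - sum(p * p for p in freqs.values())


def effective_alleles(freqs):
    """Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs.values())


def observed_heterozygosity(genotypes):
    """Ho = proportion of individuals carrying two different alleles."""
    return sum(1 for a1, a2 in genotypes if a1 != a2) / len(genotypes)


def marker_index(mean_pic, n_bands, prop_polymorphic):
    """MI = mean PIC x n x beta."""
    return mean_pic * n_bands * prop_polymorphic


# Hypothetical locus scored in four cultivars (allele sizes in bp).
locus = [(150, 150), (150, 154), (154, 158), (150, 158)]
freqs = allele_frequencies(locus)
print(f"Na={len(freqs)} PIC={pic(freqs):.3f} "
      f"Ne={effective_alleles(freqs):.3f} Ho={observed_heterozygosity(locus):.3f}")
# Na=3 PIC=0.625 Ne=2.667 Ho=0.750
```

Averaging these per-locus values over all primer pairs gives summary figures of the kind reported in Table 2.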
Genetic similarities based on Dice's coefficient were calculated using the SIMQUAL program of the numerical taxonomy and multivariate analysis system NTSYSpc version 2.10 [24], and the dendrogram was constructed with the SAHN clustering program using the unweighted pair group method with arithmetic mean (UPGMA). The correlation between the nSSR similarity matrix and the cophenetic matrix derived from the dendrogram was computed by the Mantel test [25] using the COPH and MXCOMP programs.
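The similarity and clustering steps described above can be sketched without external libraries. This minimal Python version assumes a binary band-scoring matrix (1 = band present), computes Dice similarity as 2a/(2a + b + c), clusters with the size-weighted averaging rule of UPGMA, and reports the Pearson correlation between the off-diagonal similarity and cophenetic values (the normalized Mantel statistic). It omits the permutation test that a full Mantel test adds, and the band matrix and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def dice_similarity(x, y):
    """Dice similarity 2a / (2a + b + c) between two binary band profiles."""
    a = sum(1 for u, v in zip(x, y) if u and v)          # bands present in both
    mismatches = sum(1 for u, v in zip(x, y) if u != v)  # b + c
    denom = 2 * a + mismatches
    return 2 * a / denom if denom else 1.0


def similarity_matrix(profiles):
    n = len(profiles)
    return [[dice_similarity(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


def upgma_cophenetic(sim):
    """Cluster a similarity matrix with UPGMA; return the cophenetic similarity matrix."""
    n = len(sim)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    s = {(i, j): sim[i][j] for i, j in combinations(range(n), 2)}  # keys: (low id, high id)
    coph = [[1.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        (ci, cj), level = max(s.items(), key=lambda kv: kv[1])  # most similar pair
        for u in clusters[ci]:
            for v in clusters[cj]:
                coph[u][v] = coph[v][u] = level  # members join at this similarity
        ni, nj = len(clusters[ci]), len(clusters[cj])
        merged = clusters.pop(ci) + clusters.pop(cj)
        new_s = {}
        for ck in clusters:
            sik = s[(min(ci, ck), max(ci, ck))]
            sjk = s[(min(cj, ck), max(cj, ck))]
            new_s[(ck, next_id)] = (ni * sik + nj * sjk) / (ni + nj)  # size-weighted mean
        s = {k: v for k, v in s.items() if ci not in k and cj not in k}
        s.update(new_s)
        clusters[next_id] = merged
        next_id += 1
    return coph


def cophenetic_correlation(sim, coph):
    """Pearson r between off-diagonal entries of the two matrices."""
    pairs = list(combinations(range(len(sim)), 2))
    xs = [sim[i][j] for i, j in pairs]
    ys = [coph[i][j] for i, j in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical presence/absence scores for four cultivars at five bands.
bands = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 1],
]
sim = similarity_matrix(bands)
coph = upgma_cophenetic(sim)
print(f"CCC r = {cophenetic_correlation(sim, coph):.3f}")  # CCC r = 0.828
```

Clustering directly on similarities is equivalent to clustering on 1 − S under UPGMA, because the averaging rule is linear; the resulting correlation is the same either way.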
In addition, the polymorphic alleles were analyzed with STRUCTURE, which uses a Markov chain Monte Carlo (MCMC) algorithm, to determine population structure [26,27]. STRUCTURE version 2.3.3 was used to assign the cultivars to subpopulations (K). The probability of the data for each K, Pr(X | K) or L(K), was obtained for K = 1 to K = 10 under the admixture model, which allows individuals to have mixed ancestry from the inferred clusters. Ten runs were performed for each K, each with a burn-in period of 30,000 iterations followed by 70,000 MCMC replicates, giving a total of 100 runs (10 runs for each of K = 1 to 10) executed with the batch-run function. Following Evanno et al. [28], the most likely number of subpopulations was inferred from ΔK, the rate of change in the log probability of the data between successive values of K. Phylogenetic relationships were also inferred with neighbour-joining (NJ) trees constructed in FreeTree (version 0.9.1.50) [29] and viewed in TreeView version 1.6.6 [30].
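The ΔK criterion of Evanno et al. [28] can be sketched in Python as follows. Given replicate ln Pr(X | K) values for consecutive K, ΔK(K) = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)), computed here from the mean L over runs and the sample standard deviation across runs; ΔK is undefined for the smallest and largest K tested, and the K with the largest ΔK is taken as the most likely number of clusters. The log-probability values in the example are invented for illustration and are not data from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K from replicate ln Pr(X|K) values keyed by K.

    delta K(K) = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)),
    defined only for interior K (not the smallest or largest K tested).
    """
    ks = sorted(lnp_by_k)
    assert ks == list(range(ks[0], ks[-1] + 1)), "K values must be consecutive"
    m = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        second_diff = m[k + 1] - 2 * m[k] + m[k - 1]  # L''(K) on run means
        sd = stdev(lnp_by_k[k])                       # spread across replicate runs
        delta[k] = abs(second_diff) / sd if sd > 0 else float("inf")
    return delta


# Invented log probabilities, three runs per K (not data from this study).
runs = {
    1: [-3000, -3002, -2998],
    2: [-2500, -2505, -2495],
    3: [-2450, -2460, -2440],
    4: [-2430, -2450, -2410],
    5: [-2420, -2460, -2380],
}
delta = evanno_delta_k(runs)
best_k = max(delta, key=delta.get)
print(delta, "-> most likely K =", best_k)
# {2: 90.0, 3: 3.0, 4: 0.5} -> most likely K = 2
```

The sharp peak at K = 2 in this toy example mirrors the kind of pattern that ΔK is designed to detect: L(K) rises steeply up to the true K and then plateaus while its variance across runs grows.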

Results and discussion
All 20 nSSR primer pairs gave reproducible amplification products, generating a total of 102 alleles across the studied cultivars (Table 2). The number of different alleles (Na) per primer pair ranged from 3 (CUPVB443a, CUPVD642x and CUPVB583) to 7 (CUPVB577, CUPVB630 and CUPVB462y), with an average of 5.10 alleles per primer pair. Generally, nSSRs with polymorphism information content (PIC) values > 0.50 are considered highly informative markers [31].
Genetic similarity coefficients (GS) were calculated using Dice's coefficient. GS ranged from 0.08 (between Haj-Naseri and Shahpasand) to 0.93 (between Badami-Kaj and Sirizi) (Table 3). The UPGMA method was used for cluster analysis and dendrogram construction. The cophenetic correlation coefficient (CCC) between the nSSR similarity matrix and the cophenetic matrix derived from the UPGMA dendrogram was high (0.92). A dendrogram is considered a very good representation of the data matrix when the CCC is 0.90 or greater [32].
The dendrogram obtained from the nSSR markers by the UPGMA method revealed high genetic variation among the studied cultivars and grouped them into two main clusters (Figure 1). The first main cluster contained 21 cultivars: Khanjari, Momtaz, Shahpasand, Loko, Nogh, Shahpasand-Kaj, Ghazvini, Rezaee, Sirizi, Amiri, Sifaldini, Ghafoori, Hasanzadeh, Harati, Ebrahimabadi, Mosaabadi, Hatami, Yazdi, Badami-Kaj, Badami-Zoodras and Sarakhs. Khanjari and Badami-Zoodras are highly similar in fruit traits (nut and kernel size and colour) and leaf traits. Sirizi and Badami-Kaj, which showed high similarity at the SSR loci, are also similar in morphological traits such as nut shape, kernel colour and kernel size.
Table 3. Genetic similarity matrix among the studied cultivars of P. vera based on nSSR data estimated by Dice's coefficient.
The second cluster consisted of 21 genotypes: Aliabadi, Karimi, Ebrahimi, Shahrebabak, Ohadi-Riz, Akbarabadi, Sefid-Pesteh, Ghazvini-Zodras, Hasanzadeh, Ravar, Badami-Zarand, Fandoghi-Riz, Javadaghaee, Shasti, Lahijani, Mohseni, Beheshtabadi, Nasiri, Haj-Naseri, Rizab and Sirjani. The cultivars in this cluster are likewise similar in fruit traits (nut and kernel size and colour) and leaf traits.
Bayesian clustering analysis was also performed with STRUCTURE. The most likely value of K, chosen by Evanno's ΔK method in STRUCTURE HARVESTER [28], was two, dividing the variation into two main clusters and confirming the groupings observed in the UPGMA dendrogram (the first cluster is shown in red and the second in green) (Figure 2). Phylogenetic analysis using the NJ method showed the relationships within the studied germplasm: Haj-Naseri, Sirjani and Nasiri showed the lowest similarity to their putative ancestors, which may indicate that they are newer genotypes (Figure 3).
Molecular markers are versatile tools for studying genetic diversity and variability among plant species. The major advantage of DNA markers is that they are not affected by environmental conditions [9]. Microsatellites (SSRs) are now widely used to exploit variability among genotypes of many plant species. Selecting a set of highly polymorphic core SSR primers is a crucial step in DNA fingerprinting for cultivar identification. According to the UPOV (International Union for the Protection of New Varieties of Plants, 2010) guidelines [33] on DNA fingerprinting with molecular markers for the protection of plant cultivars, only markers with distinct PCR alleles, high reproducibility and reasonable polymorphism can be selected.
Figure 1. UPGMA dendrogram produced using Dice's coefficient based on nSSR data in the studied cultivars of P. vera.
Figure 2. Bayesian clustering analysis (K = 2) based on nSSR data for the studied P. vera germplasm (42 cultivars) performed using STRUCTURE.
Note: Vertical black lines separate cultivars (for explanation of cultivar codes, see Table 1).
Pistachio has important socio-economic and ecological impacts in the arid and semi-arid agricultural regions of Iran [34]. Iran also hosts wide genetic diversity of Pistacia spp., and more than 300 pistachio genotypes have been collected across the country. Iran therefore possesses valuable germplasm for pistachio improvement and conservation programs, and assessing the genetic diversity and relationships among Iranian pistachio cultivars with discriminative and robust markers is important [34].
In the present work, 42 P. vera cultivars were characterized with 20 nSSR markers. The results confirm the efficiency of microsatellite markers for fingerprinting. The level of detected diversity was high, with average nSSR values of 5.10 for Na per locus, 1.40 for Ne, 0.44 for Ho, 0.64 for PIC and 0.74 for Dp. These values are higher than those reported by Arabnejad et al. [35], who detected an average of 3.69 alleles per primer pair and an average PIC of 0.46 in 20 commercial cultivars of Iranian pistachio; by Baghizadeh et al. [36], who found an average of 2.75 alleles per primer pair and an average of 0.44 in 31 Iranian pistachio cultivars; and by Ahmad et al. [14], who found an average of 3.30 alleles per locus in 17 pistachio cultivars. Kolahi-Zonoozi [37] assessed the genetic diversity of 45 commercial Iranian cultivars using 12 nSSR markers and found that PIC ranged from 0.19 to 0.56 (average 0.33), with mean Ho and He of 0.49 and 0.35, respectively. Mirzaei et al. [38] reported 80.00% polymorphism among 22 Iranian pistachio cultivars and wild pistachio species. Golan-Goldhirsh et al. [11], assessing polymorphism among 28 Mediterranean pistachio accessions, found that 27 selected primers produced 259 bands in total (an average of 9.59 per primer).
Some cultivars grown in different locations share the same name and some morphological similarity, yet the molecular results distinguished them. For instance, Badami-Zarand was differentiated from Badami-Kaj and Badami-Zoodras, and Ghazvini-Zodras differed from Ghazvini. These differences may reflect the intrinsic nature of nSSRs: because the microsatellites were isolated randomly from across the whole genome, the amplified loci are very unlikely to correspond to the DNA regions underlying the shared morphological traits. The results of this study also showed that the studied cultivars have high genetic variation, which can be attributed to the dioecious, cross-pollinated nature of the species [36].

Conclusions
This study aimed to evaluate the genetic diversity of Iranian pistachio in order to aid the conservation of its germplasm. The information obtained on genetic variation between and within populations provides a basis for formulating appropriate conservation strategies. The analysis revealed that Iranian cultivated pistachio germplasm is highly variable, presumably owing to specific local genetic backgrounds, breeding pressure and/or limited exchange of genetic material. The distinctiveness of the Iranian pistachio germplasm revealed by our results supports more intensive characterization, conservation and breeding strategies. The SSR markers used here also proved useful for determining genetic diversity among pistachio cultivars in Iran.