Genetic diversity and population structure of modern Bulgarian and foreign durum wheat based on microsatellite and agronomic data

Abstract The genetic variation and population structure of a panel of 90 durum wheat (Triticum durum desf.) consisting of 62 varieties and breeding lines originating from two agro-ecological zones in Bulgaria (Northern and Southern Bulgaria) and 28 introduced varieties from South–western, Central and Eastern Europe, and the USA were determined by 34 microsatellite markers (SSR). The genetic diversity in the modern durum wheat was 0.5612 with 6.88 alleles per locus. Model-based population STRUCTURE analysis identified two sub-populations (K = 2) separating the South Bulgarian varieties (SP1) from all others (SP2), including South-Western, Central-Eastern European and North Bulgarian ones. Subsequent genetic structure analysis at K = 4 revealed an additional division of each sub-population into two (SP1-2, SP1-4, SP2-1, SP2-3). DAPC analysis and UPGMA dendrogram based on SSR data were in excellent agreement with the Subpopulations defined by STRUCTURE analysis. The Principal Component Analysis (PCA) revealed that among the studied 5 agronomically important traits the correlation between grain yield and plant height was the highest and distinguished 10 varieties and advanced breeding lines from Southern Bulgaria (SP1-2 and SP1-4) as most promising in regard to yield stability. This study showed a good relationship between the genetic and phenotypic population structures according to the division of the genotypes by their agro-geographical origin. It will be useful for both breeders and farmers and could serve as a fundament for durum wheat improvement programs under drought prone environmental conditions.


Introduction
Durum wheat (Triticum turgidum L. var. durum (Desf.), 2n = 4x = 28, AABB genome) is an important cereal crop that accounts for 8% of the wheat planted area worldwide and 5% of the total wheat production [1]. It is used for the production of semolina, pasta, burghul, couscous and desserts unique to durum. More than a half of the durum wheat planted area is in the Mediterranean basin, mainly Italy, Spain, France and Greece, and the West Asian and North African (WANA) countries, where durum wheat has been an important commodity since historical times [2]. Although the Mediterranean region is accounted as the main producer of durum wheat in the world, it still remains the largest importer and consumer of durum wheat [3]. As durum wheat is adapted to low rainfall and semiarid environments and gives valuable production, it has been an object of revived interest in the last few decades [4,5].
However, under the current climate change scenario, the sustainable production of this important crop in the Mediterranean and west and eastern Balkan regions is extremely dependent on seasonal rainfall which is expected to become more and more unpredictable and highly variable, often impacting negatively the yield and quality stability.
The level of genetic diversity in durum wheat is very important for increasing agricultural productivity and decreasing the vulnerability to biotic and abiotic stress factors. The choices of varieties for crosses made by breeders may not be right enough to ensure the appropriate genetic diversity desirable to fulfil these requirements. Often the modern wheat cultivars are characterized by high uniformity at genomic regions encoding traits relevant to adaptation while an extensive diversity could prove detrimental [6]. On the contrary, genetic diversity would remain high in unselected genome regions or in the cases of adaptation to variable environments and resistance to diseases. Wheat production should be increased dramatically in order to secure a reliable and sustainable food source for the continuously increasing world population, which is expected to reach as much as 9.5 billion by the year 2050. The main challenge is to develop wheat genotypes that produce higher yields and can resist and/ or tolerate biotic and abiotic stresses associated with climate changes.
To address the biophysical and socioeconomic constraints of producing high-yielding and disease-resistant wheat with good grain quality, the research efforts aim to increase and stabilize wheat production worldwide [7]. For this purpose, it is necessary to select the best parental genotypes with enough genetic diversity to ensure the perspective hybrid combinations with the best adaptation to the specific climatic conditions. Many authors alert that the allelic richness and the genetic diversity of breeders' varieties of the main commercial crops, including wheat, have decreased as a consequence of the Green Revolution and the limited use and/or abandoning of the local landraces [8][9][10][11][12][13][14][15][16]. This trend also includes the Balkan Peninsula [17,18]. This narrow genetic diversity and the increase in the similarity between the modern varieties in relation to allelic composition are attributed to: the destruction of the native habitats; development of genetically uniform varieties and application of high selection pressure in breeding programs. Another contributing factor is the globalisation of the seed production industry, leading to a reduced choice of elite varieties. At the same time, common high-yielding varieties have been widely adopted in many countries, as a part of the Green Revolution [19].
Knowledge of genetic diversity is essential for understanding the relationships between varieties, their classification, and characterization for underlining strategies and enhancement of the breeding programs [20]. Durum wheat breeding programs heavily rely on molecular and phenotypic data to enhance the genetic diversity for agronomic traits of particular interest, such as productivity, grain quality, biotic-and abiotic stress tolerance, to respond to the challenges of climate changes. However, there are a limited number of morphological markers, and the environment has a strong influence on the expression of quantitative traits [21]. Genetic diversity studies have made huge progress in recent years, from classic detection of morphological traits to molecular variation on DNA level [22]. This has made it possible to select superior genotypes for further crop enhancement programs [23].
The recent advancements in molecular genetics technologies and the development of Next Generation Sequencing (NGS) platforms improved our knowledge of the wheat genome organization and evolution and offered reliable approaches for timely and efficient breeding [21]. Second generation DNA marker technologies established in hexaploid wheat like AFLP (Amplified Fragment Length Polymorphism), SSR (Simple Sequence Repeats), DArT (Diversity Arrays Technology), have been extensively used in various research projects in tetraploid wheat related to characterization of germplasm collections, genetic diversity, comparative and gene mapping studies [1,4,13,20,[24][25][26][27][28]. The availability of the high-density SNP chips (Illumina iSelect 90 K SNP) for wheat and the reference genome of durum wheat (cv. Svevo) have enhanced the construction of genetic and consensus maps with better resolution as well as the studies of marker-trait associations and isolation of genes associated with important QTLs. The high-density SNP chips and the wheat reference genome have made it possible to reveal the genomic patterns of diversity, inferring ancestral relationships between individuals in durum wheat populations [29][30][31].
Among the markers developed for hexaploid wheat, the SSR and SNP are the most useful markers in tetraploid wheat because of sharing the two A and B genomes [29]. They are the most applicable molecular tools due to their co-dominant nature and wide coverage across the genome. Although SNP markers progressively replaced the SSRs in the last decade, the microsatellites are still explored widely in various research projects because of their easy use, relatively low cost, and high degree of polymorphism provided by their multiallelic nature. Many studies have reported inclusion of SSR markers in the analyses related to determination of population structure, genetic diversity and the relationships among wheat genotypes, which is fundamental for the development of appropriate breeding plans [21]. The information obtained from the population structure analysis allows better utilization of the natural diversity for the identification of genes/QTLs of agronomic importance through the application of the recently developed molecular technologies.
In Bulgaria, durum wheat has been grown since ancient times because of the favourable climatic conditions. According to Mac Key 2005 [32] and Feldman 2000 [33], durum wheat spread across the Mediterranean Basin from the Fertile Crescent (10,000 BP) via Turkey (8500 BP), the Balkan Peninsula, Greece and Italy (8000 BP), and from there to North Africa and the Iberian Peninsula (7000 BP). Durum wheat seeds have been found in archaeo-botanical materials dating back to the Early Bronze Age [34]. Many local populations have been preserved in a number of small farms until the beginning of the last century.
The first Bulgarian durum wheat varieties were developed through individual selection from the landraces [35]. The artificial selection was initiated in the period 1950 − 1980, through the application of intraspecific and interspecific hybridization through crosses between lines selected from local populations with improved Bulgarian and foreign varieties from Italy, Portugal, Spain and Morocco. The first Bulgarian variety Apulicum 233 was obtained by intraspecific hybridization of Tr. turgidum var. durum x Tr. turgidum var. turgidum in 1963. This variety is characterized with a high stem and good cold and drought tolerance, and high productivity followed by the development of the low-stemmed durum wheat variety Zagorka in 1980. Since then, a number of new varieties were obtained via induced mutagenesis, intra-varietal hybridization in both wheat breeding centers located in Southern and Northern Bulgaria [36][37][38][39]. Recently, new high-yielding and high-quality varieties have been released. Among them, the varieties Predel (Southern Bulgaria) and Mirella (Northern Bulgaria) combine high content of yellow pigments, strong gluten, good colour and culinary quality of pasta products [40].
The assessment of genetic diversity in Bulgarian durum wheat is of great importance for future breeding programs directed toward increasing the genetic diversity in breeding materials selected for important agronomic traits such as high productivity, grain quality, biotic and abiotic stress tolerance. Characterization of the modern durum wheat germplasm by DNA markers provides a powerful tool for precise germplasm identification and evaluation of genetic diversity, crucial in breeding programs. The molecular and phenotypic characterization of wheat varieties is a steppingstone to preserve the existing genetic diversity. This will facilitate the selection of varieties that are well adapted to the specific agro-climatic and soil conditions. Until now, the application of molecular markers like SSR in Bulgarian durum wheat have been limited to the assessment of genetic diversity in Bulgarian landraces and several modern Bulgarian and foreign cultivars [41] and to providing evidence for alien introgressions in advanced durum wheat lines [42].
Here, we explore the genetic diversity and population structure of a panel of 90 modern durum wheat varieties and elite lines from the working collection of the Institute of Field Crops -Chirpan, Bulgaria, using SSR markers and phenotypic data to reveal the potential of the genotypes for further use in breeding programs for sustainable agricultural production.

Plant material
The plant material consisted of 90 modern durum wheat (Triticum durum Desf.) varieties. The Bulgarian durum wheat is presented by 26 modern varieties and 36 breeding lines developed in the Institute of Field Crops (IFC, former Cotton and Durum Wheat Research Institute), Chirpan and Dobrudzha Agricultural Institute, Northern Bulgaria (Supplemental Table S1). The foreign germplasm introduced in Bulgaria is presented by the varieties from several European countries, including Italy (11), France (1), Austria (4), Hungary (4), Germany (3), Russia (2) and Ukraine (1), and the USA (2).
For some of the subsequent analyses, the accessions were separated into 3 groups according to their geographic origin as follows: Eastern Balkans (Bulgaria); South-Western Europe and the USA (Italy, France and the USA); and Central and Eastern Europe (Germany, Austria, Hungary, Russia and Ukraine).

Phenotypic characteristics
Field experiments were conducted in three crop seasons (2018 − 2021) in the fields of the Institute of Field Crops (IFC, former Cotton and Durum Wheat Research Institute), Chirpan, Bulgaria, located in south-eastern Bulgaria. The region is typical for durum wheat production. Sowing was done up to the middle of November of each year. Field trials were organized in plots of 15 m 2 each using complete randomized design with three replications per genotype. Twenty plants per replication were used for the analysis before harvesting. The following traits were evaluated and measured each cropping season: Days-toheading (HD), Plant height (PH), Thousand kernel weight (TKW), Grain yield (Gy) and Spike length (SPL).

Genotyping with SSR markers
All uniplex PCRs were performed in a 6-µL reaction mixture containing 25 − 30 ng of genomic DNA, 1x MyTaq HS mix (Bioline), 0.05 pmol/uL of each primer and miliQ water.
PCR was performed on a Veriti96 Thermal Cycler (Thermo Fisher Scientific, Wilmington, DE, USA) using the PCR conditions described for each marker type [43,47].
Electrophoresis and visualization of SSR alleles were performed on an ABI3130 DNA analyzer as previously described [21]. Briefly, a standardized multi-pooling procedure was used to prepare SSR products for electrophoresis. After PCR, a 3-fold (for Barc SSRs) or 10-fold (for the remaining SSRs) initial dilution of the PCR products was performed. Additional dilutions (up to 1/25 − 1/75x or 500 − 750x) were done depending of the intensity of the signal after the subsequent mixing of the labelled products, respectively FAM: PET: NED (Barc SSRs) or FAM: PET (Xgwm, Xwmc and Xcfa). Nine mixed pools were used according to the length of the PCR products. Subsequently, the diluted products were mixed with labelled internal standards GeneScan™ 500 LIZ™ dye Size Standard (Thermo Fisher Scientific, Wilmington, DE, USA) and formamide, denatured and electrophorezed on an ABI3130 DNA analyser. SSR allele sizing was performed with the Gene Mapper v4.0 software program ( Thermo Fisher Scientific, Wilmington, DE, USA).

Data analysis
Allele number, frequency of alleles, observed heterozygosity (Ho), expected heterozygosity (He) and Polymorphic Information Content (PIC) were calculated using Power Marker v. 3.25 software [48].
The phylogenetic tree was constructed from the Nei 83 D A distance matrix [49] with the unweighted pair group method with arithmetic mean (UPGMA) as implemented in Power Marker v3.25. The resulting tree was visualized and annotated by using the Evolview v4 webserver [50].
The genetic structure in the studied panel of durum genotypes was analyzed using both a model-based approach in STRUCTURE 2.3 [51] and a discriminant analysis of the principal components (DAPC) [52]. For the STRUCTURE analysis, the number of the hypothetical populations was set from K = 1 to K = 10, and the analysis was run with 100,000 burn-in phases and 1,000,000 MCMC iterations in 10 independent runs in an admixture model with correlated allele frequencies.
The results were further processed using the R package pophelper [53] for determination of the most probable number of populations, according to Evanno's method [54], and visualization of the results. The number of the clusters in DAPC analysis was determined using the k-means method implemented in find.clusters function, and the resulting groups were used to run the DAPC. The analysis of molecular variance (AMOVA) was also done in R version 4.2.0.
We used principal component analysis (PCA) to examine how the genotypes from the populations established by the model-based method are clustered according to their performance in the five quantitative phenotypic traits described above. The PCA was done using the best unbiased linear predictions (BLUPs) across the three experimental seasons. The BLUP values were extracted from the model: where μ is the overall mean, g i is the random effect of the ith genotype, r j is the random effect of the jth replication, s k is the random effect of the kth season, (gs) ik is the random effect of the interaction between ith genotype and kth season and ε ijk is the random error of the model. The model was fitted using lmer function from the lme4 package [55]. All phenotypic data analysis and visualization were performed in R version 4.2.0.
The dendrogram based on 5 quantitative phenotypic data was constructed using an Euclidean distance matrix. The correlation between the genotypic and phenotypic distances among the studied genotypes was analyzed with a mantel test [56] with 10 K permutations using mantel.randtest from the ade4 package [57] in R version 4.2.0.

Genetic diversity analysis
We analyzed the genetic diversity and population structure based on the patterns of molecular diversity in 34 microsatellite loci in our durum wheat germplasm collection. The collection includes 90 accessions: 62 modern Bulgarian varieties and breeding lines and 28 introduced foreign varieties that were phenotypically evaluated during three subsequent crop seasons (2018 − 2021).
The markers were selected to cover all 7 chromosomes of wheat with a minimum of 4 SSR markers per chromosome. The diversity pattern of the 34 SSR loci across the whole set of accessions revealed a total of 234 distinct alleles ranging from 2 in loci Xwmc170-2A and Xgwm165-4AS to 15 in locus Xgwm268-1BL with a mean of 6.88 alleles per locus ( Table 1). The mean PIC was 0.5211 with values ranging from 0.1311 for marker Xgwm631 on chromosome 7AS to 0.8083 for marker Xgwm46 on chromosome 7BS. A few additional markers like Xbarc170, Xgwm389 and Xwmc607 on chromosomes 4 A, 3BS and 7 A, respectively, were also characterized with high PIC values. The gene diversity index (He or GD) ranged from 0.1350 (Xgwm631-7AS) to 0.8223 (Xgwm46-7BS) with a mean of 0.5612. High levels of genetic diversity were also observed in loci Xbarc170-4A (0.8034) and Xgwm389-3BS (0.7964). The average observed heterozygosity (Ho) across 34 loci was 0.2023 with the highest value of 0.8333 in locus Xgwm427-6AL and the lowest in locus Xgwm480-3AL (0.0222). The B-genome showed the highest mean number of alleles (7.418), GD (0.6119) and PIC (0.5742). Тhe Ho of A genome (0.2124) was higher than that observed in the B genome (0.1922) ( Table 1).
Comparative analysis of gene diversity among the germplasm pools from different geographical regions Eastern Balkans (Bulgaria) and Europe (South-Western and Central-Eastern Europe) (Supplemental Table S1, Supplemental Figure S1 and Table 2) showed the highest mean number of alleles in the Eastern Balkan germplasm pool (5.92) followed by the germplasm with origin from Central and Eastern Europe (4.50) and South-Western Europe (3.94). However, the mean GD in Central and Eastern European varieties (0.512) was higher in comparison to the Eastern Balkan (0.481) and South-Western durum wheat (0.437).
The most contributing alleles with frequencies above 60% for the observed genetic differentiation among the Eastern Balkan durum wheat and the remaining germplasm pools from South-western and Central-Eastern Europe were the alleles 305 bp, 208 bp, 134 bp and 126 bp in loci Xbarc17-1AL, Xgwm11-1BL, Xgwm376-3BS and Xgwm95-2AS, respectively, as well as 153 bp in Xgwm408-5BL and 147 bp in Xwmc24-1AS (Supplemental Figure S1).
To partition the genetic diversity among the groups of durum wheat samples, we performed an AMOVA, considering the geographical origins described above as a grouping factor. The results of AMOVA (Table 3, AMOVA) indicated that most of the genetic variation (46.57%) among the 90 varieties and lines structured in 3 populations were distributed between samples within the populations, while the variation between the populations was 21.57%. Finally, the variation within the studied accessions was 31.85% of the total variance in the whole durum wheat population.

Population structure
To analyze the genetic structure in the studied collection, we used a model-based approach implemented in the STRUCTURE software. The analysis identified two distinct peaks: a higher one at K = 2 and a lower one at K = 4 ( Figure 1A). The resulting sub-populations at K = 2 divided the varieties in a pronounced geographical manner ( Figure 1B). The first sub-population (SP1) included 43 varieties and lines, originating from Southern Bulgaria. The second sub-population (SP2), consisting of 34 genotypes, was composed of varieties originating from Northern (6) and Southern Bulgaria (1), while the remaining 25 were from Europe (South-western and Central-Eastern Europe) and 2 from the USA. The STRUCTURE bar graphic also provides information on the level of admixture in the collection. Thirteen of the genotypes (14.44%) were not assigned to any of the described sub-populations and were considered admixed. The admixed genotypes included 12 South Bulgarian varieties and breeding lines, and 1 Austrian variety. At K = 4 ( Figure 1C) the first Sub-population (SP1) was subdivided into 2 Sub-populations (SP1-2 and SP1-4). SP1-2 consisted of 22 South Bulgarian varieties and lines while SP1-4 consisted of 17 South Bulgarian genotypes of which 14 advanced breeding lines developed in the last decades and 3 varieties (Deyche, Trakiets and Kehlibar). The second Sub-population (SP2) was also divided into 2 Sub-populations (SP2-1 and SP2-3). The SP2-1 included 12 varieties of which 2 were from Northern Bulgaria and the remaining 10 were with origin from Hungary (3), Austria (3), Germany (1), Russia (2) and Ukraine (1). SP2-3 consisted of 12 varieties, among which 1 was from Northern Bulgaria, 10 from Italy and 1 from the USA. The admixed genotypes included varieties and advanced breeding lines from breeding centers located in Southern Bulgaria (17) and Northern Bulgaria (3), Italy (1), France (1), Germany (2), Austria (1), Hungary (1) and the USA (1).
The presence of four distinct subpopulations was also confirmed by the DAPC analysis. The resulting DAPC inferred groups were in excellent agreement with those formed at K = 4 in the model-based population structure analysis. The above described sub-populations were completely preserved. In addition, the admixture genotypes not assigned to any of the four sub-populations at K = 4 were assigned to one of the 4 groups formed in the DAPC analysis. Groups 2 and 4 representing the genotypes from the breeding centers in Southern Bulgaria were located separately in quadrants 1 and 4 on the DAPC graph and groups 1 and 3 including varieties mostly from Nor thern Bulgaria and South-western and Central-Eastern Europe were situated in quadrants 2 and 3 (Figure 2 and 3).

Phylogenetic analysis based on molecular data
The phylogenetic tree (Figure 3) delineated the 90 varieties into 2 main clusters in agreement with the observed 2 main sub-populations inferred by the model-based approach at K = 2 in the genetic structure analysis. The first main cluster was composed entirely of Bulgarian varieties and advanced breeding lines from Southern Bulgaria, while the second encompassed all introduced foreign varieties, all 6 Bulgarian varieties from Northern Bulgaria and one advanced breeding line from Southern Bulgaria. The two main clusters were subdivided each into two sub-clusters that were in excellent agreement with the sub-populations inferred from the genetic structure at K = 4 and the 4 groups defined by DAPC analysis (Figures 1 and 2). This further separated the  Bulgarian varieties and lines developed in the breeding centers of Northern and Southern Bulgaria. Except for line D8267, all genotypes from Southern Bulgaria were divided in the two sub-clusters of the first main cluster. Similarly, four of the varieties from Northern Bulgaria were located in the first sub-cluster and two in the second sub-cluster of the second main cluster. The highest similarity among the durum wheat was observed for the set of genotypes from Southern Bulgaria. Fifteen percent of them showed more than 90% similarity to each other and 2 advanced breeding lines (D-8401 and D-8472) were with the same allele configuration at all studied 34 microsatellite loci.

Comparison of genetic and phenotypic structures
To analyse how the genetic structure inferred by the model-based method at K = 4 is represented in the phenotypic performance of the studied genotypes, a PCA biplot analysis was performed with the BLUPs for five agronomically important phenotypic traits (HD, PH, TKW, Gy and SPL) (Figure 4). The first two composite axes explained 63.2% of the phenotypic variation − 39.1% and 24.1%, respectively for PC1 and PC2. Grain yield (Gy) and plant height (PH) most contributed to the first axis and thousand kernel weight (TKW) and days to heading (DH) for the second axis. The four sub-populations were separated in the Cartesian system of the biplot, each dominating one  of the four quadrants. Quadrant 1 was populated predominantly by members of SP1-2, quadrant 2 by members of SP2-3, and quadrants 3 and 4 by members of SP2-1 and SP1-4, respectively. Along the PC1, the genotypes were divided mainly according to their Gy and PH. The genotypes in quadrant 4 had the highest average Gy followed by those located in quadrant 1. Cluster analysis using phenotypic traits resulted in 2 main clusters ( Figure 5). As reported previously for molecular data (SSR), most genotypes were grouped . phylogenetic tree of the collection of 90 modern durum wheat varieties and advanced breeding lines. the tree was built using upgma algorithm based on nei83 D a distance [49] computed from the allele frequencies at 34 SSR loci. the geographic origin of each variety is represented by abbreviations highlighted with different colours. the barplots to the right represent the membership to subpopulations inferred by the model-based approach at K = 2 and K = 4. the two main clusters of the phylogenetic tree are marked by vertical bars.
in clusters according to the classification defined from model-based STRUCTURE analysis. Cluster 1 encompassed most of the Bulgarian genotypes originating from Southern Bulgaria, including several varieties and lines not included in subpopulation SP1 (K = 2), and 15 Bulgarian, German and Austrian accessions not assigned to any sub-population. This cluster was divided into 2 branches, each presented by different sub-populations. The first branch clustered 29 accessions with origin from Southern Bulgaria belonging to sub-populations SP1-2 (8) and SP1-4 (7), 3 varieties from sub-population SP2-1 originating from Russia (Gelios and Alena), Northern Bulgaria (Melina), 1 Italian variety (M.Meridio) from SP2-3 and 10 varieties and advanced lines not assigned to any sub-population of which 2 from SP2 in STRUCTURE. The second branch included genotypes only from sub-populations SP1-2 (13) and SP1-4 (7) and 5 not assigned to any sub-populations, all with origin from Southern Bulgaria. The varieties with the highest yield (mean 5.900 − 6.380 t/ha for the 3-year period) were mostly from SP1-2 and they were grouped close to each other.
Cluster 2 generally corresponded to sub-population SP2 (K = 2) and was also divided into 2 branches. The first branch clustered accessions from sub-population SP2-3 with origin from Italy (4), 3 varieties from SP2-1 (Austria, Hungary and Ukraine). The second branch was split into two sub-branches. The first one included four Italian varieties from SP2-3, one variety Progress from SP1-2 (Southern Bulgaria) and three admixed varieties from Italy, Germany and Hungary. The second sub-branch included genotypes from SP2-1 (6) (Austria, Hungary, Germany and Northern Bulgaria), 3 from SP2-3 (USA, Italy and Northern Bulgaria), 3 advanced breeding lines from SP1-4 (South Bulgaria), 9 admixed genotypes of which 4 from South Bulgaria, 3 from Northern Bulgaria, 1 from the USA and 1 from France.
To analyse the correlation between the two distance matrices estimated using genotypic and phenotypic data, we performed a Mantel test. The results showed a significant albeit low positive correlation between the two matrices (r = 0.28, p < 0.0001).

Discussion
Genetic diversity is vital for the success of crop improvement programs and food security because it underlies the adaptation of crops to environmental changes and ecosystem resilience [21]. It is essential for understanding the relationships between varieties, facilitating their classification and characterization with the aim of defining new breeding strategies and choices of the best parents for crosses in breeding programs in accordance with the specific regional agro-climatic conditions [20]. However, many studies on durum wheat have shown that the recent breeding activities in addition to the well-accepted scientific scenarios for the two-stepped domestication events have led to narrowing of the genetic diversity [30]. It was claimed that durum wheat has lost 84% of the nucleotide diversity present in wild emmer wheat in the process of domestication [58]. The loss of genetic diversity in modern durum wheat varieties limits the possibilities for improvement toward higher productivity and lesser vulnerability to biotic and abiotic stresses [59]. Therefore, the continuous assessment of genetic diversity to identify sources of new genetic variation in the wheat germplasm pool is a key factor in the development of high-performing varieties that have the capacity to adapt to specific agro-ecosystems and overcome the ever-changing climate conditions [21].
In this study, we described the genetic and phenotypic diversity and marker-based population structure of a panel of 90 modern Bulgarian and foreign durum wheat (T. durum Desf.) varieties in order to define the extent of genetic variation in durum wheat cultivated in Bulgaria and to outline further breeding strategies.

Genetic diversity and SSR marker distribution
The allelic diversity depends on the genetic composition of the population and is a measure of the genetic diversity, which is crucial for the population's long-term potential for adaptability and persistence. It is also an important tool to discern the genetic attributes associated with specific traits. This study showed a significant level of genetic variation in the studied panel of 90 modern durum wheat varieties and elite lines currently evaluated in the specific conditions of the south-eastern part of Bulgaria, which is often characterized by heatwaves and droughts during the vegetation period.
A total of 234 alleles were identified by 34 SSR markers with an average of 6.88 alleles per locus. The results are similar to other reports. Soriano et al. [20] reported a higher mean number of alleles (10) for a collection of 172 durum wheat landraces from 21 countries of the Mediterranean Basin and 20 modern cultivars in a study aimed at determining the genetic diversity and structure, and their agronomic performance. A higher number of alleles per locus  The polymorphic information content (PIC) across the tested loci (0.5211) indicates that the selected SSR markers, with the exception of Xwmc170 and Xgwm165, are highly polymorphic and appropriate for genetic diversity studies. At the genome level, the B genome had higher values than the A genome for a number of alleles, GD and PIC, which is consistent with the molecular and cytogenetic studies of other authors for durum wheat [4,41,61,62]. This indicated that the B genome is the most differentiated and divergent from the ancestral donors among the 3 genomes (A, B, D) of wheat.
The observed genetic differentiation among the East Balkan durum wheat and the remaining germplasm pools from South-western and Central-Eastern Europe could be attributed to the regional adaptation, different germplasm sources and breeding methods for improvement of specific traits under the particular environmental conditions as well as market requirements. Our study showed a prevalence of several alleles in the East Balkan durum wheat in several loci some of which are located close to QTLs for disease resistance or encoding important agronomic traits. Such prevalence was found in the following loci: Xwmc24-1AS located near to QTLs for yield related traits like kernel width (KW), TKW and kernel number per spike (KNPS) [63][64][65]; Xgwm408-5BL linked to Vrn-B1 [66], grain protein content [67] and to transpiration rate at early grain filling stage [68]; Xgwm11-1BL contributing to the resistance to stripe rust (yr29 and y26) [69,70] and to heat stress tolerance [71,72] and in Xgwm 376-3BS, associated with metaQTL (MQTL3) for harvest index and culm length expressed under drought [73].
The study showed that 21.57% of the total genetic variance is captured among the 3 groups according to their geographical origin, while the highest diversity was captured among individuals within the groups. Higher genetic variation (37%) among the eco-geographical groups of durum wheat grown in different zones in Tunisia has been reported by Slim et al. [60], while Soriano et al. [20] studying 172 durum wheat landraces from 21 Mediterranean countries and 20 modern cultivars have found only 13% variation among the 5 sub-populations selected by them (West-Mediterranean, modern cultivars, East-Balkan and Turkey, East Mediterranean and western Balkan and Egypt).

Population structure and differentiation of wheat varieties
The investigation of the genetic structure of durum wheat populations with different origin is of high importance for better exploitation of the available genetic diversity. Such studies also contribute to the selection of genotypes with desirable traits for the purpose of broadening the genetic basis of modern varieties in further breeding programs [21]. Here, we used the approach described by Evanno et al. [54] to analyze the output of the STRUCTURE software, and the results indicated that there are most probably two sub-populations in the studied collection of modern durum wheat. The distribution of the varieties and lines across the resulting sub-populations outlined a pronounced geographic pattern, separating almost all South Bulgarian durum wheat from those originating from South-western and Central-Eastern Europe and the remaining 6 Bulgarian varieties from Northern Bulgaria. Only one breeding line from Southern Bulgaria was located in the second main cluster together with the introduced foreign germplasm and the Northern Bulgarian varieties. The analysis of the sub-populations at K = 4, revealed subdivision of the varieties and lines originating from Southern Bulgaria and those from South-western and Central-Eastern Europe together with the varieties from Northern Bulgaria into two additional distinct groups. Further, the DAPC analysis results supported a potential genetic structure consisting of four subpopulations. Geographical patterns in the genetic structure and a subdivision of the modern durum wheat into several sub-populations according to breeding programs were reported by other authors [4,74,75] studying different panels including also European wheat varieties.
The DAPC analysis largely agreed with the accession's geographic origin and the specific climatic and agro-ecological conditions in Northern and Southern Bulgaria, as it has been pointed out by Tsonev et al. [21] for Bulgarian bread wheat varieties. The primary determinant for the group separation is the regional adaptation to the different agro-climatic zones in Northern and Southern Bulgaria determined by the Balkan Mountains, which serve as a natural barrier dividing Bulgaria into two regions with specific climate conditions: cooler and more continental in Northern Bulgaria and almost Mediterranean in Southern Bulgaria. The second main contributor to the observed separation is implementation of different germplasm sources and breeding methods for improvement of specific traits under the particular environmental conditions like tolerance to cold, drought and resistance to diseases. The observed genetic similarity among the breeding lines and varieties developed in Southern Bulgaria is strongly influenced by the breeder's objective, choice of hybridizing germplasm sources and the necessity to obtain/ develop genotypes with improved drought tolerance and semolina quality. However, the repeated use of the same founders in their crossing strategies leads to the development of genotypes with similar allelic configuration. Our study showed that the accessions from Italy, Germany, France, Austria, Russia, Ukraine, Hungary and the USA, together with the 6 Bulgarian varieties from Northern Bulgaria, are grouped in the second main cluster. The specific grouping of varieties from Northern Bulgaria together with the varieties from South-western and Central-Eastern Europe is a consequence of the use of crosses with Ukrainian, French and Spanish varieties. The goals of this breeding program included expanding the area of cultivation of durum wheat in regions with a colder climate and improvement of semolina quality [76]. Close clustering of modern varieties originating from different agro-ecological conditions provides additional evidence supporting the substantial reduction in genetic diversity in the durum wheat breeding programs worldwide pointed out by Kabbaj et al. [75] and other authors [77]. The main reason for this is the high rheological requirements of the pasta industry which push the durum wheat breeders to often use the same set of donor genotypes with high-quality traits to meet the conditions for market competitiveness [75].

Phenotypic diversity and relationship between phenotypic and genotypic data
Classification of the durum wheat genotypes using the UPGMA clustering method based on SSR and phenotypic dissimilarities showed correspondence with the results obtained using the STRUCTURE software.
Both the SSR and the phenotypic dendrograms showed the presence of 2 major clusters, which is consistent with the observed grouping of the genotypes according to their geographical origin with STRUCTURE (K = 2). However, in cases of some admixtured accessions, the dendrograms concurred in locating them in branches corresponding to different sub-populations inferred by STRUCTURE. This was the case of the varieties Duramant (Germany), Auradur (Austria) and Alena (Russia) as well as two advanced breeding lines (M712 and D8278) from Southern Bulgaria in the first Cluster and second cluster of the dendrogram based on phenotypic data.
Some discrepancies were also found in the grouping of a few varieties and breeding lines in the phenotypic dendrogram previously assigned to specific sub-populations by STRUCTURE at K = 4. This is the case with the varieties Gelios (Russia) and Melina (Northern Bulgaria) and MMeridio (Italy) from SP2-1 and SP2-3, which were grouped together with the genotypes from Southern Bulgaria in sub-cluster 1 of Cluster 1 in the phenotypic dendrogram, as well as with the breeding lines D-8382, D-8339 and D-8197 from SP1-4, which fall into cluster 2 together with genotypes from South-western, Central-Eastern Europe and the remaining 5 genotypes from Northern Bulgaria.
A few differences of the grouping of the genotypes in the branches of the phenotypic dendrogram was also observed in comparison to that of UPGMA and STRUCTURE assigning of genotypes but it was not so obvious.
Royo et al. [78] using a collection of 191 durum accessions and mainly semi-dwarf elite materials generally developed using a limited number of founders, found a weak relationship between molecular clustering and phenotypic structure. Similar results have been obtained by Soriano et al. [20] for the sub-population 'modern varieties' in their study including 172 durum wheat landraces and 20 cultivars collected from 21 Mediterranean countries via genotyping with 44 SSR markers and phenotyping for 14 agronomically important traits. They found better similarities between genetic distances and adaptive response in landraces than in modern cultivars.
Our study showed that phenotypic data were suitable for differentiating groups of accessions with highly similar or identical genetic architecture and confirmed the separation of almost all South Bulgarian durum wheat from the North Bulgarian and the remaining European durum wheat. The PCA clearly differentiated the genotypes according to their agronomic performance and showed that the South Bulgarian genotypes from the SP1-2 population were characterized by higher yield in comparison to that from SP1-4 and the remaining European and North Bulgarian durum wheat.
Although the Mantel test showed a low correlation of the distance matrices used to generate the phylogenetic tree and the phenotype dendrogram, it was significant. This could reflect that some SSR loci included in the Nei 83 Da distance calculation are located near QTLs related to the selected phenotypic traits. Additional inclusion of new SSRs and/or SNP will be of importance to better dissect the genetic structure of the selected durum wheat germplasm and will allow the selection of the appropriate germplasm for further association mapping.

Conclusions
This study revealed the genetic, phenotypic and geographic structures of the set of 90 Eastern Balkans and Western, Central and Eastern European modern durum wheat varieties and breeding lines, as well as the relationship among them. Based on the data from 34 SSR markers, the model-based population structure analysis divided the genotypes in two main populations according to their agro-geographical pattern of origin. The additional subdivision of each population was in good agreement with the constructed UPGMA dendrogram and DAPC analysis, showing the presence of 4 sub-populations following the agro-ecological zoning, breeders' germplasm choices and the preference of breeding methods. The substantial agreement between the clustering of the accessions based on the employed data analysis methods indicated a well-defined stratification of the population, driven by the geographical and environmental factors. The observed close relationship among the genotypes in Southern Bulgaria should alert breeders to improve the genetic diversity in the present-day germplasm by introducing into breeding programs durum wheat that is more diverse and well-adapted to drought conditions, including also the available landraces selected in Southern Bulgaria. This will pave the way for future planning of breeding programs for development of more adapted and resilient durum wheat varieties able to withstand climate changes and meeting the conditions for pasta industr y and market competitiveness.

Authors' contributions
Elena G. Todorovska (E.G.T.) acquired funding, conceptualized and designed the experiments, organized the molecular experiments, prepared tables, prepared the original draft, reviewed drafts and approved the final draft.
Nikolai K. Christov (N.K.C.) performed molecular experiments, analyzed data, prepared figures and tables, prepared and reviewed drafts and approved the final draft.
Stefan Tsonev (S.T.) performed molecular experiments, analyzed data, prepared tables and figures, prepared and reviewed drafts of the paper and approved the final draft.
Rangel Dragov (R.D.) performed field experiments and phenotypic evaluation, described and contributed the plant material, reviewed drafts of paper and approved the final draft.
Krasimira Taneva (K.T.) performed field experiments, reviewed drafts of the paper and approved the final draft.
Violeta Bozhanova (V.B.) conceptualized and designed field experiment, reviewed drafts of the paper and approved the final draft.

Data availability statement
All data that support the findings reported in this study are available from the corresponding author upon reasonable request.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the Bulgarian Ministry of Education and Science under the National Research Programme 'Healthy Foods for a Strong Bio-Economy and Quality of Life' approved by DCM # 577/17.08.2018.