Genetic diversity of the Italian thoroughbred horse population

Abstract
For over three centuries, thoroughbred (TB) horses have been selected exclusively for traits enhancing racing performance. Officially, the origins of the TB can be traced back along the male line to the crossbreeding of 50 English royal mares with four Arab oriental stallions in the 1700s. Because the TB population is tightly controlled, possible loss of genetic variability should be carefully evaluated for breed conservation and management programmes. To this aim, FTA® cards of 10,032 TB foals were collected over a 14-year new-born campaign period and genetic variability was evaluated using 16 microsatellite markers. The total number of alleles was 118, with a mean of 7.37 alleles per locus. Polymorphic information content (PIC) values were above the informative threshold (PIC > 0.5) for all microsatellites except HTG4 and HMS2, whereas no significant difference was found between the mean expected (He) and observed (Ho) heterozygosity values (0.674 vs. 0.675, respectively). Hardy-Weinberg proportions showed no statistically significant deviation from equilibrium at the .05 significance level for any locus; the mean inbreeding coefficient was close to zero, suggesting a very low probability of autozygosity. The number of genotypes observed (Ng) was calculated for each microsatellite, and the most frequent genotype was found at the HMS2 locus (LL, frequency 0.557). Parentage testing was also investigated, giving a combined probability of identity (PI) for the 16 loci of 4.1 × 10^-14, while the probability of exclusion (PE) exceeded 99% in all cases. Overall, the data showed a reasonable level of informativeness, which genuinely reflects the narrow genetic structure of the Thoroughbred population.

Highlights
The Thoroughbred population from Italy did not show any significant evidence of close inbreeding; a moderate level of genetic diversity was found at some loci, yet still adequate for parentage and identity verification.
Appropriate management programmes could be put in place to introduce new genetic variability into the Italian Thoroughbred population.


Introduction
Historically, the origins of the thoroughbred (TB) horse are traced back to the 1700s, driven by the enthusiasm of the British aristocracy for horse racing (Cassidy 2002). Officially, the TB breed was formed by crossing a restricted group of selected English native mares (the Royal Mares) with four Arab oriental stallions (Hewitt 2006), from which the whole line of modern racing thoroughbreds is today recognised to descend (Cunningham et al. 2001). It has been estimated that roughly one third of the genes of the current TB population come from the foundation sires and that more than 50% are due to the top 10 horses described by Cunningham (1991); if the ancestors contributing 1% or more are included, the genealogy extends by 21 more horses, raising the total to about 80% of the modern TB gene pool (Cunningham 1991). Since then, TB horses have been artificially selected for centuries solely for the strength and speed traits that enable superior racing performance. As intensive selective breeding spread, the need for a pedigree registry became apparent in the second half of the 1700s. Formal rules of racing were detailed and the first Studbook was then published to keep a record of the competing horses and their pedigrees (Weatherby 1791).
The first genealogical Book of Italian-born TB horses was inaugurated in 1875. Nowadays the Italian Ministry of Agriculture (MIPAAFT) is recognised as the Studbook keeper and the sole authority allowed to register TB horses (MIPAAFT 2018). To maintain the integrity of the book, horse breeders have to provide foal parentage records to the Italian Authority, which validates the declared genealogy and generates the pedigree for registration. Currently the estimated TB horse population in Italy is approximately 91,000, and a limited stallion-to-mare covering ratio has been observed over the last few years, which may negatively impact genetic diversity (MIPAAFT 2018) (Figure 1). Further concerns arise from the limited genetic line of the breed, reinforced by a selective conservation programme that keeps the TB population tightly controlled. Because of the closed population structure, genetic variability should be carefully monitored to gain comprehensive information for breed management and conservation (Cunningham 2001). It is now accepted worldwide that safeguarding the TB breed relies on DNA genotyping tests based on short tandem repeat (STR) microsatellite markers (Binns et al. 1995; Bowling et al. 1997). Because of their co-dominant mode of inheritance and their discriminating power, STRs are an effective tool to estimate genetic diversity, confirm individual identification and define unambiguous parentage in horses (Tozaki et al. 2001; Ling et al. 2011; Dorji et al. 2018). To our knowledge, no population genetic studies evaluating microsatellite informativeness have so far been reported for the Italian Thoroughbred population. To this aim, the genetic variability of the Thoroughbred horse population from Italy was investigated using a total of 16 microsatellite loci. Insight is also provided into the efficacy of the microsatellite panel for parentage and identity verification.

Materials and methods
DNA extraction and microsatellite analysis
The FTA® Card (Flinders Technology Associates, Whatman, United Kingdom) provides a cost-effective method for collecting and processing nucleic acids from a wide variety of matrices (e.g. blood spots) (Whatman Inc. 2003). Purification of a sample spotted onto FTA paper involves washing the filter disc, followed by drying and in situ PCR, as the DNA remains immobilised on the matrix. FTA cards of 10,032 TB foals were collected over a 14-year new-born campaign period (2003-2017) for pedigree verification, as determined by the Italian Authority (MIPAAFT). Genomic DNA was recovered by punching a 1.2 mm FTA disc into a 96-well plate using a BSD 600 Duet automated puncher (Microelectronic Systems, Brendale, Queensland, Australia). To meet high-throughput demands, DNA extraction and genotyping were both performed in a single plate using a liquid handling system (MicroLab Star, Hamilton Robotics, USA) (Tack et al. 2007). DNA fragments were co-amplified in a multiplex PCR using the Equine Genotypes Panel 1.1 Kit (ThermoFisher Scientific, Waltham, Massachusetts, USA) according to the manufacturer's instructions. The kit includes 12 loci recommended by the Equine Genetics and Thoroughbred Parentage Testing Standardization Committee of the ISAG (AHT4, AHT5, HMS2, HMS6, HMS7, HTG4, VHL20, ASB2, HMS3, HTG10, ASB17 and ASB23) and 5 extra markers (HTG6, HTG7, HMS1, CA425 and LEX3, the last not included in this study). Multiplex PCR products were separated by electrophoresis on an ABI PRISM 3130xl Genetic Analyzer (ThermoFisher Scientific, USA), with the GeneScan-500 LIZ size standard used as the size reference (ThermoFisher Scientific, USA).

Statistical analysis
Fragment analysis was performed using the GeneMapper ID v5.0 software (ThermoFisher Scientific, USA) and the allelic variants were called according to the equine international nomenclature (Van de Goor et al. 2010). The total number of alleles (Na) and their frequencies (Af), the effective number of alleles (Ne), the observed and expected heterozygosity (Ho and He, respectively), the inbreeding coefficient (F) and the per-locus probabilities of identity and exclusion (PI and PE) were obtained using the GenAlEx v6.51 package for population genetic analyses (Peakall and Smouse 2012). The software was also used to compute, for each locus, the number of genotypes observed (Ng), the major genotype observed (Mgo) within the TB population and its frequency (Mgf). Cervus 3.0.7 software (Kalinowski et al. 2007) was used to estimate the polymorphic information content (PIC), the null allele frequency (Nullf) and the probability of deviation from Hardy-Weinberg equilibrium (HWE) following sequential Bonferroni correction. The null allele frequency was estimated from the observed and expected genotype frequencies using ten iterations of the algorithm described by Summers and Amos (1997). The probabilities of parentage exclusion when the other parent is known (PE1), when the genotype of one parent is missing (PE2) and when a putative parent pair is excluded (PE3) were calculated following Jamieson and Taylor (1997).
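The per-locus indices listed above can be illustrated with a minimal sketch. It assumes the textbook definitions He = 1 − Σp², Ne = 1/Σp², PIC as defined by Botstein et al. (1980) and F = 1 − Ho/He; the function name and the toy allele frequencies are hypothetical, and the code is not claimed to reproduce the exact GenAlEx or Cervus implementations (for example, no small-sample correction is applied).

```python
from typing import Dict, Mapping


def diversity_indices(freqs: Mapping[str, float], h_obs: float) -> Dict[str, float]:
    """Per-locus indices from allele frequencies and observed heterozygosity.

    Hypothetical helper for illustration; assumes textbook definitions.
    """
    p = list(freqs.values())
    if abs(sum(p) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    sum_p2 = sum(x ** 2 for x in p)
    sum_p4 = sum(x ** 4 for x in p)
    h_exp = 1.0 - sum_p2                    # expected heterozygosity (He)
    n_eff = 1.0 / sum_p2                    # effective number of alleles (Ne)
    # PIC = 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2, where the double sum
    # equals (sum p^2)^2 - sum p^4.
    pic = 1.0 - sum_p2 - (sum_p2 ** 2 - sum_p4)
    # Inbreeding coefficient; undefined for a monomorphic locus.
    f = 1.0 - h_obs / h_exp if h_exp > 0 else float("nan")
    return {"Na": float(len(p)), "Ne": n_eff, "He": h_exp, "PIC": pic, "F": f}


if __name__ == "__main__":
    # Toy frequencies, NOT taken from the study's tables.
    toy_locus = {"K": 0.10, "L": 0.75, "M": 0.15}
    for name, value in diversity_indices(toy_locus, h_obs=0.40).items():
        print(f"{name}: {value:.3f}")
```

A locus dominated by one allele gives Ne well below Na in this sketch, which is the pattern discussed for HMS2.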

Results
A total of 10,032 TB foals were DNA-typed using 16 microsatellite markers and the data are presented in Tables 1-4. According to Barker (1994), the microsatellite loci recommended for genetic diversity evaluation should have more than 4 alleles (Na > 4) and an effective number of alleles per locus greater than 2 (Ne > 2). The total number of alleles (Na) for the 16 microsatellites was 118, with an average of 7.375 alleles per locus, ranging from 5 (HMS1) to 10 (HMS2). The effective number of alleles (Ne) showed a mean value of 3.385, ranging from 1.723 for HMS2 to 5.308 for [...]. Hardy-Weinberg proportions were calculated and no statistically significant deviation from equilibrium was found at the .05 significance level after sequential Bonferroni correction. The allelic frequencies at each microsatellite marker are shown in Table 2. The highest allele frequency was reported for allele L at the HMS2 locus (0.747), while the other alleles were unevenly distributed within the population. The number of genotypes observed (Ng) for each microsatellite varied from 8 (HMS1) to 36 (ASB2), with an average of 20.625. The most frequent genotype was LL at the HMS2 locus, found in more than half of the tested individuals with a frequency of 0.557 (Table 3). The null allele frequency ranged from −0.004 for ASB17 to 0.005 for CA425, with an average of 2 × 10^-4 (Table 3). The combined probability of identity (PI) for the 16 loci was 4.061 × 10^-14 when random mating was assumed, rising to 2.701 × 10^-6 for an expected full-sib population. PI by locus varied from 0.061 for ASB2 to 0.362 for HMS2 (Table 4), with an average of 0.166. The probability of exclusion (PE) for increasing locus combinations exceeded 99.99% when the other parent is known (PE1) and when a putative parent pair is excluded (PE3), whereas 99.51% was obtained when the genotype of one parent is missing (PE2). PEs by locus are listed in Table 4.
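As a quick plausibility check on the HMS2 figures reported above, the short sketch below recomputes two quantities from the published values alone. It assumes the textbook relations He = 1 − 1/Ne and, under Hardy-Weinberg proportions, an expected LL frequency of p²; the variable names are hypothetical and the script is not part of the original analysis.

```python
# Values quoted in the text for the HMS2 locus.
FREQ_L = 0.747      # frequency of allele L
NE_HMS2 = 1.723     # effective number of alleles
OBS_LL = 0.557      # observed frequency of the LL genotype

# Under Hardy-Weinberg proportions a homozygote is expected at p^2.
expected_ll = FREQ_L ** 2

# Ne is defined as 1 / sum(p_i^2), so He = 1 - sum(p_i^2) = 1 - 1 / Ne.
he_from_ne = 1.0 - 1.0 / NE_HMS2

print(f"expected LL frequency under HWE: {expected_ll:.3f}")
print(f"observed LL frequency:           {OBS_LL:.3f}")
print(f"He implied by Ne at HMS2:        {he_from_ne:.3f}")
```

The expected LL frequency (about 0.558) is very close to the observed 0.557, in line with the absence of deviation from Hardy-Weinberg proportions, and the implied He of about 0.42 falls below the 0.6 informativeness threshold.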

Discussion
As a rule, the Italian TB Studbook includes horses foaled in Italy and/or imported, as long as they have been recorded in a Studbook approved by the International Stud Book Committee (ISBC). To be registered, a TB foal must be the product of a live cover; pedigree verification therefore plays a pivotal role in protecting studbook integrity and monitoring genetic variability. A limited number of individuals are used today in breeding programmes, putting the TB population at risk of losing genetic variability. The aim of the present study was therefore to evaluate the genetic diversity of the Italian TB population and to provide comprehensive data for parentage and identity testing using a panel of 16 ISAG microsatellites. A total of 10,032 Italian-born TB foals were genotyped, and a reasonable level of genetic diversity was observed, as indicated by the number of alleles, the observed heterozygosity and PIC. According to Botstein et al. (1980), microsatellite markers are considered highly informative when the expected heterozygosity and PIC values exceed 0.6 and 0.5, respectively. In the present study, 4 of the 16 STR markers did not reach the expected heterozygosity threshold (CA425, HMS2, HTG4 and HTG6) and two of them had PIC values not higher than 0.5 (HMS2 and HTG4), consistent with their lower degree of informativeness. Locus HMS2 was the least polymorphic, with the lowest heterozygosity and PIC and a very high frequency of allele L, the most represented allele in the TB population (overall allele L count = 14,981). Of the 5795 homozygotes observed at this locus, 5582 individuals carried the LL genotype, corresponding to a genotype frequency of 0.557 in the population. The apparent excess of homozygotes and allele abundance were therefore investigated at all loci for possible large allele dropout leading to false genotyping and allele overestimation (Pemberton et al. 1995; Hoffman and Amos 2005; Okello et al. 2005). No evidence of null alleles was detected. The estimated null allele frequency (Nullf) varied slightly from negative to positive values with an average close to zero, suggesting only a very low genotyping error rate (Nullf threshold <0.2) (Dakin and Avise 2004). Failure of scoring due to allelic non-amplification usually shows a characteristic homozygous-homozygous mismatch in known parent-offspring comparisons (Pemberton et al. 1995), hence we were inclined to exclude it. Genotyping errors are also expected to cause deviation from Hardy-Weinberg proportions; according to the data, the observed heterozygosity did not differ significantly from the expected values, in agreement with the pattern shown by the inbreeding coefficient. The mean and per-locus inbreeding values were close to zero, confirming that there was no evidence of homozygote excess in the TB population.
Binns et al. (2012) have suggested that inbreeding in the TB has increased significantly over the last four decades. Few stallions today cover many more mares than previously, and a top stallion can breed up to 200 mares, with detrimental consequences such as an increased frequency of recessive genetic disorders. Our data showed no clear evidence of inbreeding, while indicating a lower genetic diversity of the population due to a smaller gene pool. Over the last decade, the breeding season in Italy has shown a dramatic decrease in the number of foals per year, with a mean stallion-to-mare covering ratio of nearly 1:14, which corresponds to an average of 1 stallion for every 10 new-born foals (Figure 1). Selective breeding based on racing performance traits is widely practised worldwide today; a trait uniquely expressed by an ancestor no longer available for breeding is commonly used as a reference in Thoroughbred line selection.
Such a breeding strategy is a common way to propagate the gene pool of an outstanding ancestor while keeping the inbreeding level low, since distantly related mates give little indication of inbreeding because of their lower coefficient of relationship. However, while inbreeding can ensure that a trait is passed to the offspring, it may lead to negative side effects, since the inbreeding coefficient might not reflect the presence of hidden deleterious recessive alleles (Lacy et al. 1996; Hedrick and Garcia-Dorado 2016). The effective number of alleles (Ne) is the number of equally frequent alleles that would be required to give the same expected heterozygosity (He) as that found in the population. The results of the present work showed a low Ne/Na ratio at some loci. Among the 16 markers, HMS2 showed the highest number of alleles while having the lowest effective number of alleles (10 vs. 1.723). Large discrepancies between Na and Ne at a given locus confirm the presence of low-frequency alleles in the population and the predominance of only a few alleles. A number of studies on genetic diversity in horses have been reported so far. Seo et al. (2016) reported levels of heterozygosity and PIC almost one order of magnitude higher than in our study when the same microsatellite markers were used in Halla horses. Consistent with our study, they also found a high frequency of allele L at the HMS2 locus (0.421) along with a high PIC (0.717). Surprisingly, Cho (2007) reported no evidence of allele L at the HMS2 locus in a small group of Thoroughbred horses (N = 26), while showing slightly lower observed heterozygosity and PIC levels than in our study. However, genetic diversity indices for TB horses from Spain, France, Korea and Bosnia Herzegovina were comparable to ours, with a few exceptions (Lee and Cho 2006; Marletta et al. 2006; Leroy et al. 2009; Rukavian et al. 2016).
These results may be explained by the widespread use across the world of a relatively small number of actively covering individuals, although some countries are much more genetically involved than others. Because the pedigree of a proven winner is still attractive to breeders, temporary or permanent importation/exportation for breeding purposes takes place regularly, spreading very similar variability across TB populations. The results of our study involved only Italian-born foals, without taking into account the diversity of the imported fraction of stallions. The Italian Equine Genetic Database (Unire Veterinari 2015) was therefore queried to retrieve the genetic profiles of the stallions that covered the most in Italy in the last year (threshold >5 foals), in order to assess whether the lower informativeness at some loci could be attributed to a limited number of individuals and traced back to a well-defined geographical area. We found 27 stallions, mainly from Ireland, the USA and Great Britain but also from France and Italy, accounting for over 300 Italian-born TB foals. The allelic profiles were reviewed and allele L at the HMS2 locus was found in all the stallions, consistent with the pedigrees of the observed foals. In particular, 16 of them were homozygous and were responsible for over 200 new-born foals, 6 were heterozygous, and a few stallions lacked information at this locus; for these, the allelic profiles were inferred from the paternal line, which was found to carry at least one copy of allele L. These findings support our data, although they do not reflect the genetic diversity of the Italian TB population sensu stricto; rather, they refer to a larger proportion of the whole TB breed as a consequence of the gene flow occurring among the donor countries.
Because the genetic contribution from countries such as Japan or Australia is poorly represented in the Italian thoroughbred population, these results could be integrated with data from other studbooks by gathering information from the countries where the stallions are breeding most actively.
The present study also aimed to provide comprehensive information on the variability of the STR markers for parentage and identity verification. The likelihood of observing two identical genotypes in two randomly chosen individuals was measured by the probability of identity (PI); the microsatellite panel was also evaluated to estimate the probability of exclusion (PE) in TB horses. According to the recommendations of the International Stud Book Committee (ISBC), the PE in TB horses must be higher than 99.95% when running a parentage test (Tozaki et al. 2001). Overall, we obtained higher values for PE1 and PE3, whereas PE2 was below the ISBC threshold even though 4 markers were added to the 12 required by the ISAG. The power to resolve disputed parentage relies on the informativeness of each marker, which in turn depends on the PIC, He and allele frequency values of the local population. Our study revealed low informativeness at some loci, two of which (HMS2 and HTG4) are part of the ISAG Primary Equine DNA Panel. We therefore suggest that additional markers should be used to obtain a reliable combined PE value when the genotype of one parent is missing. On the other hand, the microsatellite markers proved suitable for individual identification.
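How per-locus PI and PE values combine across a panel can be sketched as follows. The per-locus expressions are the closed forms commonly used for unrelated individuals and full sibs (PI) and those usually attributed to Jamieson and Taylor (1997) for PE1, PE2 and PE3, written here from the general literature; the study does not state them explicitly, so they should be read as an assumption, as should the assumption of independent loci. All function names and the toy frequencies are hypothetical.

```python
from math import prod
from typing import Sequence


def _s(p: Sequence[float], k: int) -> float:
    """Power sum: sum of p_i^k over the alleles of one locus."""
    return sum(x ** k for x in p)


def pi_unrelated(p: Sequence[float]) -> float:
    # sum p_i^4 + sum_{i<j} (2 p_i p_j)^2, rewritten with power sums.
    return 2 * _s(p, 2) ** 2 - _s(p, 4)


def pi_sibs(p: Sequence[float]) -> float:
    # Probability of identity between full sibs.
    return 0.25 + 0.5 * _s(p, 2) + 0.5 * _s(p, 2) ** 2 - 0.25 * _s(p, 4)


def pe1(p: Sequence[float]) -> float:
    # Exclusion of a putative parent when the other parent is known.
    return (1 - 2 * _s(p, 2) + _s(p, 3) + 2 * _s(p, 4) - 3 * _s(p, 5)
            - 2 * _s(p, 2) ** 2 + 3 * _s(p, 2) * _s(p, 3))


def pe2(p: Sequence[float]) -> float:
    # Exclusion of a putative parent when the other parent is missing.
    return 1 - 4 * _s(p, 2) + 2 * _s(p, 2) ** 2 + 4 * _s(p, 3) - 3 * _s(p, 4)


def pe3(p: Sequence[float]) -> float:
    # Exclusion of a putative parent pair.
    return (1 + 4 * _s(p, 4) - 4 * _s(p, 5) - 3 * _s(p, 6)
            - 8 * _s(p, 2) ** 2 + 8 * _s(p, 2) * _s(p, 3) + 2 * _s(p, 3) ** 2)


def combined_pi(per_locus: Sequence[float]) -> float:
    """Combined PI over independent loci: product of per-locus values."""
    return prod(per_locus)


def combined_pe(per_locus: Sequence[float]) -> float:
    """Combined PE over independent loci: 1 - product of (1 - PE_locus)."""
    return 1.0 - prod(1.0 - x for x in per_locus)


if __name__ == "__main__":
    # Toy panel of three loci with made-up allele frequencies.
    panel = [[0.75, 0.15, 0.10], [0.4, 0.3, 0.2, 0.1], [0.5, 0.5]]
    print("combined PI:", combined_pi([pi_unrelated(p) for p in panel]))
    print("combined PE2:", combined_pe([pe2(p) for p in panel]))
```

Because PE2 is the smallest of the three per-locus exclusion probabilities in this formulation, its combined value approaches a fixed threshold more slowly than PE1 or PE3, which is consistent with the suggestion that additional markers are needed when one parental genotype is missing.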

Conclusions
In conclusion, the findings of the current study indicate that the TB breed from Italy is essentially weakly differentiated. The microsatellite panel used in the present study showed that a few alleles are poorly represented within the population. However, a reasonable level of genetic diversity was found, still adequate for pedigree verification. The data showed no clear evidence of close inbreeding; however, the existence of a milder form of inbreeding due to a few remote common ancestors could be presumed from the lower genetic diversity detected. Appropriate management strategies could therefore be put in place to introduce new variability into Thoroughbred populations. Since TB selective breeding looks mostly at speed traits, other traits not apparently related to performance (e.g. vigour) are usually neglected in breeding and often inadvertently bred out. As a consequence, the frequency of recessive alleles could increase covertly and go unnoticed until a genetic disorder appears. Breeding by selecting different aptitudes from various TB strains might instead harmonise a full set of performance-based traits as well as reduce the odds of inheriting recessive deleterious mutations. The foremost management approach is to consult both pedigrees and studbooks before breeding to avoid mating between closely related individuals. The stallion-to-mare covering ratio should be kept as low as possible and the rate of inbreeding monitored to maximise genetic diversity while reducing the risk of loss of variability. Finally, new genetic variability from TB strains that usually contribute less in Italy should be considered while ensuring the protection of breed integrity.

Ethical approval
Samples were collected by authorised veterinarians on behalf of the Italian Ministry of Agriculture (MIPAAFT), which defines and approves the sampling procedures.