Genome-wide diversity of Pagliarola sheep residual population and its conservation implication

Abstract Local breeds represent an underestimated resource in terms not only of their important cultural and economical role in marginal areas, but also because they often own a potential genetic pool well adapted to extreme conditions. This fact is of increasing interest, especially when considering climate global challenges where peculiar and uncommon traits could be advantageous. In this study, we genotyped 24 individuals belonging to the small residual Pagliarola sheep population using the OvineSNP50K array, in order to compare its genomic architecture with other 21 Italian local breeds. Moreover, we performed the fixation index (FST) outlier analysis to identify genes most differentiated between Pagliarola and Merino-derived Italian breeds. All population genetic analyses highlighted that Pagliarola breed represents a distinct genetic unit showing a genetic relationship with Merino-derived breeds underlying the close cultural connection between these breeds related to transhumance. However, Pagliarola breed resulted to be divided into two different sub-populations, named Pagliarola 1 (PAG-1) and Pagliarola 2 (PAG-2). Genetic diversity indices and inbreeding estimated from runs of homozygosity (FROH) indicated that while PAG-1 showed among the highest values, PAG-2 turned out strongly inbred, probably as a consequence of a founder effect. The FST outlier analysis identified the 5 most differentiated single nucleotide polymorphisms, two of which mapping within known genes (MACF1 and GLIS1) reported to be linked to feed efficiency and local adaption. All these information strongly suggest that proper conservation measures should be implemented in order to recover the Pagliarola population and its fundamental local products. HIGHLIGHTS All population genetic analyses highlighted that Pagliarola breed represent a distinct genetic unit, and clusters with the Merino-derived Italian breeds. The breed resulted to be divided into two different sub-populations. For the two Pagliarola sub-populations, the genetic diversity indices suggest a very different genetic makeup. Results from the Bayesian approach identified two know genes (MACF1and GLIS1) related to feed efficiency in livestock species and local adaptation. The information generated in this study strongly suggests that proper conservation measures should be implemented in order to recover the Pagliarola population.


Introduction
During the last century, erosion of livestock genetic resources was observed as the result of a massive replacement of low-productivity local breeds with highly productive ones. These local breeds are often reduced to populations having a small effective size, implying difficulties to manage inbreeding and withinbreed genetic diversity. In recent years, an increasing interest in recovering and preserving local breeds has taken place (e.g. Ciani et al. 2014;Mastrangelo et al. 2018). Local breeds are the result of a combination of natural and artificial selection carried on over centuries which shaped animals to adapt and produce in a variety of different environments (Ajmone-Marsan 2010; Ajmone-Marsan et al. 2014). In fact, local populations often possess valuable traits such as disease resistance, high fertility, good maternal qualities, longevity and adaptation to harsh conditions and poor-quality feed, all desirable qualities for low-input, sustainable agriculture. In addition, local breeds are a reservoir of diversity that may turn out useful to face the upcoming climate change and disease outbreaks and contribute to conserve local culture and traditional products (Ciani et al. 2013).
With a total of 263, Italy is one of the European countries with the highest number of local livestock breeds. Among these 84 are sheep breeds, 17 of which are already extinct (Bittante 2011). Italy has a long history of sheep breeding and, despite a dramatic contraction in numbers, still raise several local sheep breeds that may represent a unique source of genetic diversity (Ciani et al. 2013(Ciani et al. , 2014Mastrangelo et al. 2017).
The Pagliarola is an ancient sheep population farmed in central Italy, mainly in the Abruzzo region (Marchi and Mascheroni 1925;Fabrizio 1928;Bonadonna 1976), together with other small local populations such as Bariscianese, Pomarancina, Perugina, Pecora dell'Amiata, Todina and Vissana. This sheep finds its origins in the Apennines mountain range, and is identified as the ancestral Appenninica sheep population. Even if these small local populations were never selected to improve productive traits such as meat (Marchi and Mascheroni 1925;Bonadonna 1976;Sarti et al. 2002), they were raised for their adaptation to the Apennines environment and for their low nutritional requirements. In particular, Pagliarola was resident in the mountain villages all year round (Renieri et al. 1984). During the severe winter season, the breed was usually supplemented with straw (''paglia'') and this derived their actual name. Pagliarola is a dual-purpose breed; it is reared both for its milk and the resultant local dairy products ('Pecorino di Farindola', 'Pecorino del Matese', and 'Formaggio di Pietracatella') and for its meat ('Agnello del Centro Italia IGP', made from the meat of the lambs of the local sheep breeds of Apennines, including Pagliarola). Ceccobelli et al. (2016) reported that only a few small Pagliarola flocks were still surviving in very hard environmental conditions, thus, an appropriate breeding program in order to recover Pagliarola population would be desirable. More recently, this population has been counted for a total of 25 individuals (21 ewes and 4 rams) in the territory of Barisciano (AQ) and samples of semen from two Pagliarola rams were collected and preserved in order to establish a genetic bank of the population (Anzalone et al. 2018). This is the last estimation of the population consistency.
The availability of sheep genome-wide SNP panels allows providing background information concerning genome structure in local and cosmopolite breeds (e.g. Kijas et al. 2012;Ciani et al. 2014). Genetic diversity is a key measure for the monitoring of genetic parameters that are important for the prevention of genetic erosion, inbreeding and other deleterious processes that may lead to population extinction. In addition, an investigation of genomic variation is an important prerequisite to maintain its integrity and to ensure appropriate conservation strategies in the presence of genetic structure that is not easily identifiable with morphological data. Moreover, considering the challenging of the climate changes, the identification of genetic stock which might harbour potential genes well adapted to extreme environment, should be a topic of increasing importance (Hoffmann 2010;Biscarini et al. 2015).
The aim of this work was to study the genetic diversity and population structure of Pagliarola sheep in the Italian context using data generated from Illumina OvineSNP50K array. Moreover, a genome-scan analytical approach was performed to identify genomic regions harbouring genes conceivably associated to the breed aptitude. This paper highlights how genomic survey is a valuable tool in order to advise possible reintroduction and conservation plans. Indeed, contrariwise to other local sheep for which conservation programs, supported by several authorities (PSRN, ASSONAPA), have a longstanding strategy to manage zootechnical biodiversity (genomic characterisation, semen collection and reconstitution), fewer efforts have been made on small residual local breeds that instead represent an important reservoir for genes adapted to specific environmental conditions.

DNA sampling, genotyping and quality control
Blood samples were collected from 24 individuals of Pagliarola sheep. Genomic DNA was extracted using the Genelute Mammalian Genomic DNA Kit (Sigma). The concentration of extracted DNA was assessed with NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE). All animals were genotyped for 54,241 single nucleotide polymorphisms (SNPs) using the Illumina OvineSNP50K array. The raw data have been merged using the software PLINK (Purcell et al. 2007) with the genotypic data of 21 Italian sheep breeds assembled by the Italian Sheep Consortium (BIOVITA) (Ciani et al. 2014). The genotypes of Merinizzata Italiana breed have been retrieved from the dataset used in Ciani et al. (2015). In addition to these Italian breeds, the Ile de France sheep (IDF) was considered due to its likely cross-breeding with the Pagliarola breed. Genotypic data of IDF breed were available from Rochus et al. (2018). After filtering for minor allele frequency (-maf 0.05), missing genotype call rate (-geno 0.01), linkage disequilibrium (r 2 0.02) and missngness per individual (-mind 0.1), a final dataset consisting of 29,970 SNPs was obtained and one Pagliarola individual has been removed.

Analysis of genetic relationships and population structure among breeds
Genetic structure of breeds has been inspected by using the maximum likelihood clustering approach as implemented in the software ADMIXTURE (Alexander and Lange 2011) with K values ranging from 2 to 24. The results have been plotted using the membercoef.circos function in the R package BITE (Milanesi et al. 2017).
To explore the genetic differentiation among breeds, we performed a multidimensional scaling plot (MDS) based on pairwise identical-by-state (IBS) distances using the software PLINK (-cluster, mds-plot 3) and visualised using the R package ggplot2 (Wickham 2011).
In order to reconstruct genomic relationships among all populations and to investigate the presence of patterns of population splitting and migration events, a maximum likelihood tree has been generated using the software TREEMIX (Pickrell and Pritchard 2012). For this analysis we also used the genotypic data of Soay sheep (Kijas et al. 2012) as an outgroup, and allowing migration edges from 1 to 10. The best predictor for the number of migration edges was selected using the optM function in the R package OptM (Fitak 2018).

Genetic diversity indices
Genetic diversity indices such as observed (Ho) and expected (He) heterozygosity and minor allele frequency (MAF) have been estimated using PLINK, while trends in historical effective population size (Ne) for the 13th generation (presumed to be contemporary Ne), based on linkage disequilibrium (LD), have been estimated using the software SNeP v1.1. (Barbato et al. 2015).
In addition, to have an estimate of the inbreeding within and among breeds, we performed runs of homozygosity (ROH) using PLINK and the following parameters: a sliding window of 50 SNPs across the genome; the minimum number of consecutive SNPs included in an ROH was 50; the minimum length of an ROH was set to 4000 kb; the maximum gap between consecutive homozygous SNPs was 1000 kb; a density of one SNP per 100 kb; and a maximum of one SNP with missing genotype and up to one heterozygous genotype were allowed in an ROH. The inbreeding coefficient (F) on the basis of ROH (F ROH ) for each animal was calculated as follows: where L ROH is the total length of all ROH in the genome of an individual and Laut is the specified length of the autosomal genome covered by SNPs on the chip (2452.06 Mb).

F ST -outlier analysis
Based on the genetic structure results, the F ST -outlier approach as implemented in the BAYESCAN software (Foll and Gaggiotti 2008) was adopted to identify loci involved in the differentiation between the Pagliarola 1 (see Results and discussion section) and the other Merino-derived Italian breeds (GDP, SPV and MER), using the default parameters. To control the number of false positives, significant markers were defined by applying a q-value threshold of 0.05 and using the 0.9999 SNPs of F ST percentile distribution (Moscarelli et al. 2021). Chromosomal coordinates for each SNP were obtained using the sheep reference genome Oar 3.1.

Results and discussion
High-throughput genotyping arrays have greatly facilitated the study of genetic structure in livestock species, but whereas much effort has been devoted to study population structure within dominant commercial breeds, local breeds, which are less widely used, are generally understudied (Beynon et al. 2015). This study provides the first overview of population structure estimates of the Pagliarola sheep from a genomewide perspective.

Analysis of genetic relationships and population structure among breeds
The ADMIXTURE analysis, MDS and TREEMIX were used to visualise and explore the genetic relationships between Pagliarola and the other sheep breeds.
Population structure inferred using the software ADMIXTURE, with K ranging from 2 to 24, showed the lowest CV error at K ¼ 18 (Supplementary Materials, Figure S1). At this K value, the ADMIXTURE components highlighted that all breeds have their own genetic identity, with the exception of Biellese and Bergamasca breeds which appear to be indistinguishable and Comisana and Pinzirita which share a large proportion of genetic components (Mastrangelo et al. 2014) (Figure 1). At K ¼ 2, Ile de France breed first differentiated while Merinizzata showed high level of admixture when compared with all the other Italian native breeds. At K ¼ 3, Alpagota, Biellese and Bergamasca separated, while the Pagliarola breed resulted to be divided into two different sub-populations which are clearly distinguishable at K ¼ 4, and hereinafter named Pagliarola 1 (PAG-1, n ¼ 14) and Pagliarola 2 (PAG-2, n ¼ 9). It should be noted that despite the number of individuals belonging to the two Pagliarola subgroups was quite small, it has been empirically demonstrated that for population structure analyses, the patterns observed using six randomly extracted animals per breed closely mirrors those inferred from 20 to 24 animals per breed (Gaouar et al. 2017).
These two genetic clusters (PAG-1 and PAG-2) match with the two different farms where the samples were taken. Therefore, the genetic structure detected for Pagliarola sheep could be due to the introgression of genes from other breeds (PAG-1) and/or to geographical isolation for a long time with genetic drift  Table 1.
(PAG-2). These results were also in agreement with a previous study conducted on the genetic structure and relationship using microsatellite markers (Ceccobelli et al. 2016).
Component 1 of the MDS plot reveals a clear differentiation of breeds from central Italy (MER, PAG-1, PAG-2, SPV and GDP) including the Ile de France breed which as expected is shown as the most differentiated. On the other hand, component 2 reflects a latitudinal genetic cline with breeds from northern Italy located mostly in the upper portion of the plot, while breeds from southern Italy and Sardinia cluster in the lower quadrant of the plot (Figure 2). This strong phylogeographic pattern of sheep breeds has been highlighted in previous studies at both worldwide (Kijas et al. 2009(Kijas et al. , 2012 and regional level (Ciani et al. 2014;Mastrangelo et al. 2017). Such a geographic structuring has been suggested to be the outcome of a combination of adaptive processes driven by natural selection and historical admixture prompted by human selection. Moreover, the observed phylogeographic pattern mirror the distribution of human genetic diversity (Brisighelli et al. 2012;Anagnostou et al. 2017) underling the close connection between livestock diversity and historical human migrations/barriers as highlighted in other livestock species . For the Pagliarola sheep, these results confirmed the findings based on the ADMIXTURE, where, the population formed two separated and differentiated clusters.
The TREEMIX phylogram shows a clear distribution of clusters according to the geographic origin of breed ( Figure 3). Indeed, we observe a group including all breeds from Central-Eastern Alps and Central Apennines (IPK, APG, BGM, BLL, APP, FBS), a second group including Western Alps (ODL and SBC), a group including Central Apennines breeds (GDP, SPV, MER, PAG-1, PAG-2) and another group including breeds from Central and Southern Italy (LAT, BAG, AMR), Sicily and Apulia (LCC, CMS, PZR, VDB) and Sardinia (SAR, SBK). In this context, Pagliarola clusters with the Merino-derived Italian breeds (Figure 3). This result is not surprising considering the close connection between Pagliarola and some Merino-derived Italian breeds. In fact, up to the 1950s of the last century, the Pagliarola was traditionally reared in the same flocks with Gentile di Puglia, though the management conditions of the two breeds were different (Renieri et al. 1984). This topology remained reasonably constant through iterations. The linear method implemented in the OptM (R Package OptM) function indicated a major changing point in the log likelihood at four  Table 1. migration events (Supplementary Materials, Figure S2). A strong migration event is shown between Soay and the node separating Merinizzata and Ile de France breeds. This migration event is probably linked to the Northern European origins of both Ile de France and Soay breeds. On the other hand, the close relationship of Merinizzata with these Northern European breeds is due to its known admixed origin between local Merino-derived breeds and exotic Merino meat breeds including Ile de France (Morbidini et al. 1995;Sarti and Panella 1999). Another strong migration edge is shown between the base of the crown including Alpagota, Biellese and Bergamasca and the Leccese breeds. This result is in accordance with documented livestock exchange between Southern and Northern regions in order to improve production (Ciani et al. 2014). An additional migration event is highlighted between Biellese and Sambucana breeds. In this context, the use of Biellese to improve meat yield in other autochthonous breeds including Sambucana is well known (Bigi and Zanon 2008). Finally, another migration event is shown between Pagliarola (PAG-1) and the node including Merinizzata and Ile de France breeds indicating some gene exchange between these breeds. Even if Pagliarola has been considered as one of the local populations of the Apennines (Marchi and Mascheroni 1925;Bonadonna 1976) from which Appenninica breed has been selected after crosses with meat selected breeds , neither migration events or drift parameters indicated a genetic relationship between these two breeds.

Genetic diversity indices
Genetic diversity indices, which are key parameters in the genetic management of populations, are reported in Table 1. The results revealed that Pagliarola 1 and Pagliarola 2 tended to display the highest and the The effective population size (N e ) estimates for each breed are also reported in Table 1. The minimum N e value was found for Pagliarola 2 (N e ¼ 20), the maximum value was found for Bergamasca (N e ¼ 130), while Pagliarola 1 showed a small N e of 48. For the two sub-populations of Pagliarola breed, N e was smaller (average 34) than for all other breeds (average 94.5). Consistent with our results, Mastrangelo et al. (2017), reported a contemporary effective population size of 25 in the Barbaresca sheep, which, like Pagliarola, is an endangered population that has experienced a dramatic census contraction.
Individual genomic inbreeding have been evaluated using ROH analysis as reported in Figure 4. The highest mean value of F ROH is observed in Pagliarola 2 (0.133), followed by Leccese (0.068) and Valle del Belice (0.067), while the lowest values are observed in Comisana, Bergamasca and Pagliarola 1 (0.008, 0.014 and 0.024, respectively) ( Table 1). For the Italian sheep breeds, similar low values were previously reported (Mastrangelo et al. 2018), in particular for small and endangered breeds, as Altamurana and Barbaresca. The total number of ROH and the total genomic length for each animal of the two Pagliarola sub-populations are shown in Figure 5. For Pagliarola 1, the majority of the animals clustered near to the origin of coordinates with a total length <100 Mb. For the Pagliarola 2, there were individuals with a larger number of ROH (from 20 to 70) and with the total length of genome in ROH more than 500 Mb. Therefore, Pagliarola 2 showed ROH patterns typically produced by inbreeding or strong population bottlenecks. Generally, in the farming system of local sheep breeds, the farmers use rams born in the flock and the exchange of animals is quite unusual. This leads to an increase of inbreeding within the population due to the uncontrolled mating between genetically related individuals (Mastrangelo et al. 2012). Therefore, in these small populations, such as Pagliarola breed, directional mating and the use of selected rams can be considered as encouraging breeding practices to prevent higher increases of inbreeding in order to preserve genetic diversity.
For the two Pagliarola sub-populations, the genetic diversity indices suggest a very different genetic makeup. Indeed, all genetic diversity indices indicate that Pagliarola 2 sub-population could be the result of a strong genetic drift, as underlined by high level of homozygosity (F ROH ) and a low value of effective population size (N e ) and heterozygosity (Table 1 and Figure 4), confirming the results on the genetic structure above reported. One potential explanation for such a trend is a combination of a founder effect facilitated by a small genetic pool from which the Pagliarola 2 sub-population may have originated. Strong differences in terms of genetic drift are not an exception in livestock and have been repeatedly reported also in other species (Lawson Handley et al. 2007;Ciani et al. 2015;Nicoloso et al. 2015;Lenstra et al. 2017;Senczuk et al. 2020;Moscarelli et al. 2021).
Moreover, this small genetic background could have been even amplified by possible attempts to keep the subpopulation as 'pure' as possible. Unlike Pagliarola 2, Pagliarola 1 showed the highest genetic diversity values when compared with all the other native Italian breeds, likely because of its admixed origin. Such a result is very important to underline that an urgent recovery of this population is needed in order to not lose a marked genetic potential.

F ST -outlier analysis
The genome-wide distribution of pair-wise F ST values was used to identify highly differentiated SNPs between Pagliarola 1 (considered to be the most likely original genetic stock) and the other three Merinoderived Italian breeds (GDP, SPV and MER). Results from the Bayesian approach identified a total of 5 outlier SNPs (Table 2) located on OAR01, OAR10 and OAR15. Among the significant markers, two SNPs on OAR01 mapped within know genes (MACF1and GLIS1).
The MACF1 gene plays crucial role in biological processes related to feed efficiency in livestock species (Reyer et al. 2017;Lam et al. 2018). The GLIS1 is reported as a candidate gene for feed efficiency (Cockrum et al. 2012) and local adaption (Sweet-Jones  Table 1.

Conclusion
This study has reported, for the first time, genetic diversity and population structure estimates from a genome-wide perspective in the Pagliarola sheep. The genetic structure analyses show that Pagliarola breed own a distinct and unique genetic background composed of two different genetic groups with a common origin but characterised by a different genomic make up. Indeed, Pagliarola 2 showed a marked genomic inbreeding, low values of both heterozygosity and effective population size, suggesting that a strong genetic drift occurred in this population. On the contrary, Pagliarola 1 showed low genomic inbreeding values and high values of heterozygosity. These results stress the importance of genetic investigations before specific conservation strategies, including a possible reintroduction of the original genetic pool, can be implemented. Indeed, basing on our outcomes we strongly encourage a reintroduction from the original genetic stock of Pagliarola 1 population. Moreover, concerning the genetic relationships with other local breed from the Apennines, although Pagliarola has been considered as one of the ancestral populations from which Appenninica breed derived, we found a closer relationship with Merino-derived breeds. Such a genetic relatedness might be the outcome of the close cultural connection related to transhumance. Indeed, despite Pagliarola breeds was not an active transhumant breed was part of the same herds during the non-transhumant periods. Finally, the BayeScan results demonstrate the importance to preserve the Pagliarola genome as genetic reservoir of alleles particularly adapted to low-input farming system. The information generated in this study strongly suggests that proper conservation measures should be implemented in order to recover the Pagliarola population and its fundamental local products.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Ethical approval
All procedures involving animal sample collection followed the recommendation of directive 2010/63/EU. Sampling was carried out by trained veterinarians within the frame of vaccination campaigns, hence no permission from the animal research ethics committee was necessary. Veterinarians adhered to standard procedures and relevant national guidelines to ensure appropriate animal care.

Data availability statement
The data that support the findings of this study are available upon request from the corresponding author.