Genome-wide association study for milk production traits in an economically important local dairy sheep breed

Abstract In this study, we conducted a genome-wide association study (GWAS) for five milk production traits in the Valle del Belice sheep. Repeated measurements for milk yield (MY), fat percentage and yield (F% and FY) and protein percentage and yield (P% and PY) on 481 ewes, were available for the analysis. The animals were genotyped using the Illumina Ovine 50k BeadChip. Weighted deregressed breeding values (DEBVw) were used as phenotypes for GWAS analysis. A total of 23 genome-wide significant SNPs were identified: 3 associated with MY, 9 with FY, and 11 with P%. Several SNPs mapped within known candidate genes or previously reported QTL for milk production traits in livestock species. Additional interesting markers were identified on OAR3 for FY and P%. These SNPs supported some previous findings and also added new information useful to understand the genetic mechanisms underlying the milk production and quality traits in dairy sheep. Highlights A total of 23 significant SNPs were detected. Several SNPs mapped within known candidate genes or previously reported QTL for milk production traits. These results could provide information to understand the genetic architecture of milk production traits in dairy sheep.


Introduction
Dairy sheep represent an important resource for the development of the economies of hill and mountain areas, in which other economic activities are difficult to develop (Scintu and Piredda 2007). The Valle del Belice is the main sheep breed reared in Sicily and is used for the production of traditional raw milk cheeses, Pecorino Siciliano and PDO 'Vastedda della Valle del Belice'. Therefore, it appears obvious that milk production traits and their genetic improvement are of great interest in this breed.
Genome-wide association studies (GWAS) represent a powerful tool to identify causal genes associated with important traits in livestock, especially when dealing with complex traits such as milk production traits. GWAS based on high throughput single nucleotide polymorphism (SNP) genotyping technologies opened a broad avenue for exploring candidate genes associated with these traits, making the identification of these genes crucial to understand the genetic architecture of milk production and composition (Goddard and Hayes 2009;Wiener et al. 2011;Liu et al. 2020). GWAS studies in sheep are more recent and less widespread than those of other animal species (Johnston et al. 2011;Zhang et al. 2013;Mastrangelo et al. 2018 ;Sutera, Moscarelli, et al. 2021). In particular, lactation performance has been the focal point of the genetic improvement of dairy sheep over the past decades (Garc ıa-G amez et al. 2012;Di Gerlando et al. 2019;Sutera et al. 2019;Usai et al. 2019;Li et al. 2020;. However, so far, little overall consensus has emerged from these studies. In GWAS of domesticated animals, often, the researchers may have available raw phenotypes of animals and also their estimated breeding values (EBVs) from pedigree-based analyses of historical data. Some animals may have genotypes and EBVs based on information from their relatives, but no direct phenotypes. Although EBVs were used as dependent variables in GWAS (Johnston et al. 2011;Becker et al. 2013), this approach gave a high false positive rate (Ekine et al. 2014). As an alternative, the EBVs can be 'deregressed' (DEBVs) and to account for the heterogeneous variance due to differences in breeding value accuracy of individual, the weighting factor and deregressed weighted estimated breeding values (DEBVw) were also proposed (Garrick et al. 2009).
The aim of this study was to conduct a GWAS to identify loci associated with milk production traits (i.e. milk yield (MY), fat yield (FY), fat percentage (F%), protein yield (PY) and protein percentage (P%)) in the Valle del Belice dairy sheep using the DEBVw as phenotypes.

Materials and methods
Ethics statement SNP genotype data were obtained from a previous study . Therefore, no specific ethical approval was required.

Samples, genotyping data and quality control
Phenotypic data from 481 ewes of Valle del Belice breed were collected by the University of Palermo in four flocks. More details on both animals and phenotypes are provided in a previous study conducted by the same authors .
Animals were genotyped with the OvineSNP50 BeadChip (Illumina Inc, San Diego, CA, USA). The quality control was performed using GenABEL package v3.4 (Aulchenko et al. 2007) in R environment (http:// www.r-project.org). Only SNPs located on autosomes were considered. Animals and markers that fulfilled the following criteria were kept in the analysis: (i) call rate per individuals and per SNPs >95%, (ii) minor allele frequency >2% and (iii) no extreme deviation from Hardy-Weinberg equilibrium (p < .001). A total of 37,228 SNPs and 469 individuals were retained for further analyses. The phenotypic data from 469 ewes included 5,328 test-day records for MY, FY, F%, PY and P%.

Genome-wide association analyses
The EBVs were estimated fitting the same effects reported in , using the pedigree rather than the SNPs to build the relationship matrix. In addition, EBVs were first deregressed (DEBVs) according to Garrick et al. (2009), and then weighted to obtain DEBVw, using ASReml v3.0 (Gilmour et al. 2009), fitting the genomic relationship matrix (GRM).
The association analysis was carried out using GenABEL package (Aulchenko et al. 2007) and DEBVw as phenotype. No polygenic step to account for relatedness among the individuals was needed, as this was already accounted when estimating the DEBVw (i.e. by fitting the GRM). Association was therefore tested using a qtscore function (Aulchenko et al. 2007). After Bonferroni correction, significant thresholds were p < 1.34 Â 10 À6 for genome-wide (p < 0.05) and p < 2.69 Â 10 À5 for suggestive (i.e. one false positive for genome scan) levels, corresponding to -log 10 (P) of 5.87 and 4.57, respectively. Quantile-quantile (Q-Q) plots were used to compare the observed and expected distribution of p-values under the null hypothesis of no association. Population substructure was explored using multidimensional scaling (MDS) in order to verify the genetic homogeneity of the sample before analysis.

Results and discussion
Previous GWAS have been conducted for milk production traits in the Valle del Belice breed, using SNP data and repeated measures , and Copy Number Variants and DEBV ). Here, we report the results from a GWAS for five most commonly evaluated milk production traits using the DEBVw. The choice to use DEBVw as response variable in our analysis was guided by the fact that GenABEL cannot handle multiple records (i.e. it only allows to fit a polygenic effect as random, making it not possible to account for the permanent environmental effects due to the repeated measurements). It is not uncommon to use EBVs as dependent variables in GWAS (Johnston et al. 2011;Becker et al. 2013). However, it was demonstrated that this approach has a high false positive rate (Ekine et al. 2014), whereas the use of DEBVs and DEBVw as dependent variables can improve the power of GWAS (Sell-Kubiak et al. 2015;Sevillano et al. 2015).
A total of 23 genome-wide significant SNPs were detected, with 43 reaching the suggestive significance threshold (by the definition we used), across all five traits (Figure 1 and Supplementary Figure S1 for the Q-Q plots). As SNP rs406975522 is associated with both FY and PY, the total number of distinct identified SNPs was 65. The details of the identified SNPs, including the trait they are associated to, positions on Ovis aries v4.0 genome assembly, p-values and the nearest known genes they laid within, are given in Table 1.
Out of 10 SNPs associated with MY, four were mapped within known genes or previously reported QTL for the milk yield or other milk production traits (Table 1). In particular, SNPs located between 177.26 and 178.47 Mb on OAR2 (rs419432879 and rs429723758, in bold in Table 1), laid into the region containing a QTL (ID ¼ 13992) for milk and lactose yield in sheep (Raadsma et al. 2009). The SNP rs419432879 mapped within the NCKAP5 (Nck-associated protein 5) gene, which has been associated with F% in Holstein cattle (Jiang et al. 2019). The SNP rs160014503 on OAR5 was located in the intronic region of SLC27A1 (Solute Carrier Family 27 member 1) gene, which shows enzymatic activity similar to acyl-CoA synthetase (Hall et al. 2003). In cattle, SLC27A1 was proposed as candidate gene for milk traits (Kulig et al. 2013;Nafikov et al. 2013) and polymorphisms at this gene have potential effects on milk yield trait (Lv et al. 2011). Finally, the SNP rs424648601 (located at 19.38 Mb on OAR17) was within a QTL (ID ¼ 57756) previously identified by Garc ıa-G amez et al. (2013) for MY in the Churra sheep breed, using two different approaches.
Only one SNP was associated with F% and it is located on OAR9, within the EXT1 (Exostosin 1) gene. On the other hand, 30 SNPs (located on 14 different chromosomes) were found associated with FY; some of these markers are associated to this trait in other sheep breeds (e.g. SNP rs403120738 on OAR2 (Garc ıa-  of our SNPs laid into this region. Several SNPs associated with FY mapped within known genes. The SNP rs423430191 on OAR5 was located in the intronic region of MAPK9 (Mitogen-Activated Protein Kinase 9), that is involved in immune system pathway in sheep (Yang et al. 2015) and is one of the key signalling nodes in mammary gland development in humans (Whyte et al. 2009). Zhou et al. (2019) examining the transcriptome and proteome profiles of liver tissue samples from Chinese Holstein cows, identified MAPK9 as candidate gene that regulates milk fat synthesis, transport, and metabolism. The SNP rs407468867 on OAR11 mapped within the RNF157 (Ring Finger Protein 157), a candidate gene involved in milk production traits in sheep (Saridaki et al. 2019 ). The SNP rs424842019 on OAR18 was located within the ADAMTSL3 (A Disintegrin-Like And Metalloprotease Domain With Thrombospondin Type I Motifs-Like) gene. Mateescu and Thonney (2010) reported two QTLs (ID ¼ 14149 and ID ¼ 14150) for MY in the same region.
For the protein traits (i.e. P% and PY), 22 and 3 SNPs were identified, respectively. Only the significant SNP for P% (rs42391986) on OAR2 mapped within a QTL (ID ¼ 57738) for the considered trait (Garc ıa-G amez et al. 2013). The SNP rs424276469 on OAR3 was located within MED27 (Mediator Complex Subunit 27), a candidate gene for milk yield in water Buffalo (Du et al. 2019). Significant SNPs were also found between 41.70 and 46.45 Mb on OAR6. The OAR6 has received wide attention because harbours, among others, the well-known casein clusters, generally accepted as major genes affecting milk production traits (Moioli et al. 2007). A significant marker identified in this study on this chromosome mapped within SEL1L3 gene, which has been reported as candidate gene for milk protein composition and cheese-making properties (Sanchez et al. 2019). Of the three SNPs associated with PY, the SNP rs406975522 on OAR9 (reported as significant also for FY) mapped within MMP16 (Matrix Metallopeptidase 16) gene. This SNP could be an example of cross-phenotype association, by which variants are associated with multiple traits that are often relevant to pleiotropy (Solovieff et al. 2013), although this hypothesis will have to be confirmed by further statistical analysis.
It is worth to highlight that no common regions were identified when comparing these results with those previously obtained in other GWAS on this dataset , in which a different definition of the traits and different methodologies were used. This might be partially due Table 1. SNPs significantly associated with milk production traits at genome-wide (p < 1.34 Â 10 À6 ) and suggestive (p < 2.69 Â 10 À5 ) thresholds. to the different approaches used and to the small sample size, which can be a possible cause of power reduction to detect significant regions. On the other hand, it might also be a confirmation of the complex nature of milk production traits, with many variants with small effect involved. We are therefore hypothesising that using different approaches and definition of the traits could be considered as complementary approaches, as they allow identifying different variants worth exploring.

Conclusion
A total of 23 genome-wide significant SNPs associated with milk production traits were identified in this study, with severalSNPs mapped within known candidate genes or previously reported QTL. These SNPs supported some previous findings and also added new information useful to understand the genetic mechanisms underlying the milk production and quality traits in dairy sheep. As no common regions were identified when comparing these results with those previously obtained in other GWAS on this dataset, we are hypothesising that using different approaches and definition of the traits could be considered as complementary approaches, as they allow identifying different variants.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability statement
The data have been submitted to the repository at: https:// ww.animalgenome.org/repository/pub/UPIT2018.0801/