Genome-wide association study for residual concentrate intake using different approaches in Italian Brown Swiss

Abstract Residual feed intake (RFI) is the most used measure of feed efficiency. However, considering the importance of concentrates in the ration, a new index, the residual concentrate intake (RCI), was here defined. RCI aims to measure the individual efficiency in converting the concentrate into animal products. Brown Swiss young bulls (N = 736) were genotyped at 41,183 loci. Animals were housed in pens equipped with an automatic feeding system able to recognise the animal and record the concentrate intake. The diet consisted of concentrate and hay (ad libitum). The new RCI index was calculated as the residuals of the linear regression of concentrate intake on metabolic live weight and average daily gain. Animals were ranked according to their corrected RCI and divided into low (LRCI) and high phenotypes (HRCI). A low heritability (0.06 ± 0.03) was estimated using only genomics for this new index. Results from multivariate (M-GWAS) and Bayesian (B-GWAS) approaches were combined to identify SNP associated with RCI. The M-GWAS selected 698 SNPs potentially associated, whereas no significant markers were obtained in B-GWAS. Markers in the last approach were ranked according to their posterior inclusion probability and the first 698 were retained. Only SNPs in common between sorted B-GWAS and M-GWAS (N = 11) were considered associated with RCI. A total of 48 candidate genes were retrieved near these SNPs. Most of them were previously reported to be associated with feed efficiency and RFI. The combined use of multivariate and Bayesian techniques allow to identify SNPs associated with the investigated trait. Highlights RCI could be promising to select animals 48 candidate genes were found associated with RCI Multivariate technique allowed to identify significant SNPs


Introduction
Feed costs contribute to up to 60% of production costs in the dairy cattle industry (Connor 2015). Production costs strongly depend on animal efficiency: feed consumption decrease when animals can efficiently convert feed into milk or body gain (Negussie et al. 2019;Pulina et al. 2020). Among different indexes suggested to evaluate feed efficiency in cattle, the most popular is probably the residual feed intake (RFI) (Van Arendonket al. 1991;Manafiazar et al. 2013;Berry and Crowley 2013). It is obtained by subtracting the actual from the expected individual intake required by the animal for its maintenance and production. Being RFI by definition independent from production and body weight, animals with low RFI can consume less feed without reducing the production level. RFI has been investigated in beef, dairy, and dual-purpose cattle (Cantalapiedra-Hijar et al. 2018;Kenny et al. 2018;Romanzin et al. 2021).
In ruminants, feed intake can be roughly separated into forage and concentrate based on grains and feedstuffs rich in energy and protein. According to Purcell et al. (2016), the amount of concentrates offered to dairy cattle has increased in the past decades to solve two main problems: to reduce the extent of negative energy balance (NEB) experienced in early lactation, and to allow cows to achieve their potential milk yield. Concentrates and forages can be offered at the same time in a total mixed ratio or separately (Purcell et al. 2016). In the latter situation, the concentrate is supplied using individual concentrate feeders on the barn or in robot milking systems, while forages are offered ad libitum.
Considering the growing importance of concentrates in the ration composition of ruminants (Purcell et al. 2016), a new index to evaluate feed efficiency, the residual concentrate intake (RCI), is defined in the present research. RCI, similarly to RFI, aims to measure individual efficiency in converting the concentrate into animal products (milk or body gain). Being RCI part of RFI, it should assume the same characteristics of independence from animal-related variables and trait heritability. At present, several countries include RFI in their breeding programs (Pryce et al. 2012;Bolormaa et al. 2013; Connor 2015). However, the cost of measuring RFI (or RCI) represents a strong limitation to population-wide selection programs. Genome-wide association study (GWAS) is a useful tool to understand the genetic basis of a trait and, in consequence, also of RCI. This approach is aimed to identify genomic regions associated with genetic variation in a trait and to select genes that could be associated with it. In the present research, a GWAS for RCI in Italian Brown Swiss cattle was developed by using the multivariate approach (M-GWAS) proposed by Manca et al. (2020).

Materials and methods
Data used in this study were obtained from pre-existing databases and therefore the animal care approval was not needed.

Data
A sample of 1,092 Brown Swiss young bulls was considered in the study. Animals arrived at 5-6 months of age at the ANARB (Italian Association of Brown Swiss, Verona, Italy) genetic centre from different commercial herds. Bulls were housed in a quarantine pen for about one month and they were distributed among pens (no more than six bulls/pen). Each pen was equipped with an automatic feeding system able to recognise the animal and to record the concentrate intake. The diet consisted of concentrate and hay. The concentrate was a commercial pelleted mix formulated using grain meals and agro-industrial byproducts (raw chemical composition: 18.0% of CP, 3.2% of fat, 10.1% of crude fibre, 7.6% of ash, and 0.4% of sodium). Hay was administered ad libitum. Animals remained in pens for three months and the body weight (BW) was recorded monthly. After this period, bulls were moved into single pens for the mount training. From the initial 1,092 young bulls, only 736 animals with at least three BW records were considered for further statistical analysis. The RCI phenotypes were calculated as the residuals of a linear regression model of concentrate intake on metabolic live weight and average daily gain (ADG) (Arthur et al. 2005). These residuals were then adjusted according to the following linear model: where M was the fixed effect of the ith birth month (12), Y was the fixed effect of the jth birth year (from 2002 to 2013), and e was the random residual (for more details see Macciotta et al. 2015). Animals were ranked according to their corrected RCI and divided into low (LRCI) and high phenotypes (HRCI). The two groups contained an equal number of individuals, 368 bulls each (Manca et al. 2020).
All animals were genotyped by using the Illumina BovineSNP50 BeadChip (Illumina Inc., San Diego, CA, USA). SNP with call rate lower than 99% or minor allele frequency lower than 5% were removed. The remaining missing genotypes were replaced with the most frequent allele at that specific locus. At the end of data editing, 41,183 SNP located on 29 autosomes (mapped on the ARS-UCD1.2 bovine map release) were available for analysis.

Heritability estimation
Heritability was estimated using the following mixed linear model: where y is a vector of RCI (i.e. the values adjusted for month and year of birth, see the above equation), l is the overall mean, Z is an incidence matrix relating phenotypes in y to additive genetic effects in u that is a vector of additive animal effects, and e is a vector of random residuals. Since pedigree was not available, the genetic (co)variances structure for u was u $ Nð0, Gr 2 a Þ where r 2 a is the additive genetic variance and G is the genomic relationship matrix (GRM) built according to VanRaden (2008): where M is the matrix of genotypes centred by twice the current allele frequencies (p) estimated for the j-th SNP. Variance components and heritability (h 2 ) were estimated using GIBBS3F90 software (Misztal et al. 2014) using a total of 100,000 iterations with the first 10,000 discarded as burn-in and saving 1 sample every 10. Post means and standard deviations were calculated using POSTGIBBSF90 software (Misztal et al. 2014).

The multivariate and Bayesian GWAS
The method proposed by Manca et al. (2020) was used to develop a multivariate GWAS. In this approach, associated markers were detected by combining the results of two different techniques. In the first, called multivariate GWAS (M-GWAS), data were arranged in a multivariate manner, with animals on the rows and genotypes (coded as 0, 1, and 2) on the columns. This data was submitted to three multivariate techniques: the canonical discriminant analysis (CDA), the discriminant analysis (DA), and the stepwise discriminant analysis (SDA). The algorithm started applying the CDA separately by chromosome. Then the mean and the standard deviation of the absolute value of canonical coefficients (CC) in the 29 canonical functions (CAN, one for each chromosome) were calculated. For each CAN, only markers whose CC's absolute value was greater than the mean plus one standard deviation were retained. The obtained SNPs were then joined, and the SDA was applied to obtain the maximum number of linearly independent markers. The selected SNPs were used as variables to develop a new CDA where both the Mahalanobis' distance between HRCI and LRCI and the Hotelling's ttest were evaluated. Then the DA was applied to ascertain if animals were correctly assigned to the group of origin.
CDA and DA were successively applied in an iterative procedure. At each run, the number of involved markers was reduced by deleting those with lower CC's absolute value but still keeping the minimum number of SNPs able to separate groups. The procedure stopped when Hotelling's test was still highly significant (p-value < .001) and, at the same time, the DA correctly assigned all animals to HRCI and LRCI. The obtained SNPs were the most discriminant markers and, in consequence, they were considered associated with RCI.
In the second technique, called Bayesian GWAS (B-GWAS), the BayesR software (Moser et al. 2015) was used. B-GWAS was carried out using a chain length of 50,000 samples, with the first 20,000 ones being discarded as burn-in and saving every 10th sample. In this approach, the effect of each marker is selected from one out of four possible distributions: Nð0, 0r 2 a Þ, Nð0, 0:0001r 2 a Þ, Nð0, 0:001r 2 a Þ, Nð0, 0:01r 2 a Þ, where r 2 a is the additive genetic variance. The posterior inclusion probability (PIP) for each SNP was calculated as the sum of probability to be included in one of the three non-zero distributions (i.e. 0.0001, 0.001, and 0.01). One SNP was declared significant if its PIP was >0.30 (Pasam et al. 2017). Then SNPs were sorted according to their descending PIP value and the first x-SNPs were selected, where x was the minimum number of markers selected by the M-GWAS. Finally, SNPs simultaneously detected in the M-GWAS and the sorted list from B-GWAS was considered associated markers and submitted to gene discovery.
All genes with at least one significant marker inside or in the 250k upstream and downstream from their boundaries were annotated (Cesarani et al. 2019a;Manca et al. 2020). Moreover, protein-protein interactions (PPI) were carried out using STRING (https:// string-db.org).

Heritability estimation
The estimated h 2 was 0.06 ± 0.03, a value that is lower than the estimates reported in the literature for the RFI. Heritability of 0.16 ± 0.03 for RFI was reported by Lu et al. (2015) in dairy cattle using pedigree information. Recently, also Freetly et al. (2020) estimated larger values for RFI in heifers (0.25 ± 0.11) and cows (0.16 ± 0.10) of crossbreed animals using the REML approach (i.e. with pedigree information). Li et al. (2020) reported estimates of 0.14 with either a pedigree or genomic model applied to Holstein dairy cows. Anyway, it must be pointed out that the range of heritabilities for RFI is very large: from 0.07 to 0.62 (Berry and Crowley 2013). In our study, pedigree was not available and h 2 was estimated using only the G matrix. Heritability estimated using genomics are often lower than those estimated using pedigree (Aldridge et al. 2020;Hidalgo et al. 2020;Cesarani et al. 2021). Moreover, since this study involved Brown Swiss, a selected dairy breed, and selective genotyping strategy (i.e. the genotyped animals were only top bulls) variance components and h 2 estimated using only genomic information are likely to be biased (Cesarani et al. 2019b). This is because the heritability of a trait strongly relies on the covariance among relatives and when only genotypes of the best animals are used to estimate variance components these covariances do not reflect the relationships among animals in the whole population.

Genome-wide association studies
No significant SNPs (i.e. SNP with a PIP higher than 0.30) were found using the B-GWAS. On the contrary, using the M-GWAS, 698 SNPs were selected as the minimum number of markers able to separate the two considered groups (LRCI and HRCI). Thus, SNPs in B-GWAS were decreasingly sorted according to their PIP and the first 698 markers were considered. A total of 11 SNPs was found in common between the two lists of markers (i.e. sorted B-GWAS and M-GWAS), which have not necessarily the best PIP. Only these markers were considered significantly associated with RCI (Table 2). BTAs 1 and 6 harboured two SNPs, everyone, whereas the remaining seven markers were found in BTAs 7, 12, 17, 22, 23, 24, and 27, respectively.

Gene-by-gene description
Candidate genes nearby the associated SNPs are shown in Table 2. A total of 21 and 25 genes were identified using limits of 100 and 205 kb up and downstream the significant SNP position, respectively. Two markers, ARS-BFGL-NGS-104436 on BTA1 and Hapmap42981-BTA-57599 on BTA24, were found inside genes KCNAB1 and MAPRE2, respectively. The Potassium Voltage-Gated Channel Subfamily A Member Regulatory Beta Subunit 1 (KCNAB1) is involved in the regulation of potassium ion transmembrane transport (Majumder et al. 1995). This gene, located on BTA1, was listed among the candidate genes associated with inbreeding in two cattle populations (Brahman and Tropical Composite) by Reverter et al. (2017), who studied the genomic inbreeding depression for climatic adaptation of tropical beef cattle. The same gene was reported to be associated with congenital deafness in Australian Stumpy Tail Cattle dogs (Xu et al. 2021).
The Microtubule-associated protein RP/EB family member 2 (MAPRE2) gene, mapped on BTA24, was found to be associated with the first calving interval in buffaloes (de Araujo Neto et al. 2020). It was also reported among 238 genes differentially expressed in a meta-analysis about molecular signatures of muscle growth and composition retrieved from public transcriptomics data (Bazile et al. 2020), and it has been also associated with carcase and growth traits in chicken (Zhang H et al. 2020).
A cluster of seven genes located on BTA27 (ADRB3, BRF2, PROSC, ERLIN2, GOT1L1, RAB11FIP1, ZNF703) and detected in the present study, was found to be associated with ADG in Nellore cattle (Olivieri et al. 2016). In that study, the genomic region harbouring these genes explained 1.56% of the additive genetic variance of ADG. Hardie et al. (2017) found six genes of this cluster associated with RFI in Holsteins. In the same study, ADGRA2 (BTA6) was associated with RFI, whereas MTHFD2L (BTA27) was related to metabolic body weight, a trait used for RFI calculation (Hardie et al. 2017). The MTHFD2L gene was reported to be associated also with other economically important traits, such as carcase conformation in Charolaise and Limousine (Purfield et al. 2019), subclinical ketosis in Holstein dairy cows (Nayeri et al. 2019), and resistance to clinical mastitis in Nordic Holstein cattle (Cai et al. 2018).
The ERLIN2 gene was reported also among the candidate genes associated with RFI in beef cattle through a gene interaction network (Karisa et al. 2013). Moreover, RAB11FIP1, GOT1L1 and ADRB3 genes were found to be related to fat yield in Canadian Holsteins (Li et al. 2010). Hu et al. (2010), reported that ADRB3 is involved in the mobilisation and utilisation of energy in cattle. In fact, this gene was associated with intramuscular fat content and fatty acid composition in pigs (Xue et al. 2016) and with growth traits (i.e. birth weight, growth rate until weaning, and carcase composition) in Merino sheep (Forrest et al. 2003).
Some other interesting genes already associated in cattle with RFI were flagged as significant in this study. The F13A1 gene, located on BTA23, was reported to be associated with RFI by Zhang F et al. (2020), who performed a GWAS on multiple beef cattle breeds. This gene has been also associated with mastitis resistance in Nordic Holstein (Cai et al. 2018). These authors analysed differentially expressed genes in udders and reported that this gene is involved in complement and coagulation cascades. de Lima et al.
(2016) carried out a GWAS on Nellore cattle and they associated the TIPARP gene with RFI, suggesting its possible role as a regulatory factor. This gene, located on BTA1, was also reported as a candidate gene for valeric acid in a GWAS for methane emission and ruminal volatile fatty acids using Holstein cattle sequence data (Sarghale et al. 2020). The PCDH8 gene <250 kb a Distance from the significant SNP: in ¼ the SNP was inside the gene; 100 kb ¼ the SNP was from 1 to 100 kb upstream or downstream from the gene; 250 kb ¼ the SNP was from 101 to 250 kb upstream or downstream from the gene.
was recently associated with RFI, residual gain, and feed efficiency in French beef cattle (Taussat et al. 2020). Another gene already associated with feed efficiency in beef cattle and highlighted also in the present study was the DTNA gene. This was reported to be associated with residual ADG by Serão et al. (2013), and with RFI by Chen et al. (2011). Interestingly, the HTR4 gene was highlighted by Yao et al. (2013) in a random forest carried out to identify additive and epistatic SNPs associated with RFI in dairy cattle. Other genes, even if not directly related to RFI, were reported to be associated with other meat or growth traits related to this phenotype. The ABCG1, located on BTA1, has been associated with ADG in a GWAS carried out on Hereford cattle (Seabury et al. 2017); this gene has also a role in adiposity and fat mass growth in humans and mice (Frisdal et al. 2015). Another gene already associated with ADG, even if in pigs, and also highlighted in the present study is the OLFM4 gene, located on BTA12. Onteru et al. (2013) associated this gene with ADG in pigs through GWAS, whereas Liu et al. (2018) associated the gene with RFI in broilers by using a differential expression analysis. As far as cattle breeds were concerned, the OLFM4 gene has been also associated with milking speed in Brown Swiss cattle (Kramer et al. 2014) and with coat colour in Vrindavani cattle (Chhotaray et al. 2021). Continuing with the genes already related to growth traits, Lindholm-Perry et al. (2020) recently found the LY86 gene among the genes differentially expressed comparing cohorts of beef steers based on different feed intakes. The ZNF397 gene was found to be significant in gene expression analyses carried out comparing two groups of animals, high-marbling and low-marbling, of Hawoo cattle breed (Lim et al. 2013). Furthermore, its protein is overexpressed in the spermatozoa of high fertile buffalo bulls compared to low fertile bulls (Muhammad Aslam et al. 2019). The TIMP4 gene has been found over-expressed in the doublemuscled foetuses, compared to normal ones (Hocquette et al. 2007), suggesting a possible role in the muscle development and the animals' growth; in another study, its mRNA expression remained constant in Longissimus dorsi muscle associated with different stages of intramuscular adipose tissue development (Roberts et al. 2015). The TAPT1 gene was reported to be associated with carcase weight and with eviscerated weight, even if in Beijing-You chickens (Liu et al. 2013). The NR2C2 gene was reported among the genes downregulated in the low plane compared to the high plane of nutrition in Angus x Simmental beef cows (Mois a et al. 2015). Thus, its role in energy mobilisation, and therefore a connection with growth traits and RFI, could be supposed. The other three genes found in the present study (RIPK4, PRDM15, and C2CD2) were reported to be associated with the development of loin in Charolaise (Doyle et al. 2020). Two of these genes (PRDM15 and C2CD2) have been also associated with milk yield in Portuguese Holstein cattle (Silva et al. 2020). BBS12 gene has been associated with pure meat weight, foreshank weight, and silverside weight in Chinese Simmental beef cattle (Chang et al. 2019). Also, MRPS25 was already associated with meat traits, even if in sheep. In particular, according to Bolormaa et al. (2016), this gene is associated with several meat traits. It influences retail colour and increases myoglobin and wet iron contents in muscle; moreover, it increases meat tenderness, decreases pH level, and it influences other important meat traits, such as eye muscle area and eye muscle depth, glycogen, isocitrate dehydrogenase activity and polyunsaturated fatty acids level (Bolormaa et al. 2016). ZSCAN30 has been associated with first calving interval in buffaloes (de Araujo Neto et al. 2020).
Five genes belonging to the CXC family of chemokines (CXCL1, CXCL2, CXCL3, CXCL5, CXCL8) were flagged as significant genes on BTA 6. This family is involved in the immune response: CXCL1 and CXCL2 have an important role in immune defense because they modulate the leukocyte infiltration (Sharifi et al. 2018), CXCL3 is responsible for the constitutive chemotactic activity of bovine milk for neutrophils (Rainard et al. 2008), whereas CXCL8 plays an important role in a wide range of bovine diseases (Widdison and Coffey 2011). Mukiibi et al. (2019) found CXCL2 among the differently expressed genes between Charolaise steers with high and low ADG, and CXCL3 associated with ADG in Angus, Charolaise, and Kinsella composite populations. Also, the IL21 gene was reported to be associated with immune system response and it has been reported as significant in an FST analysis between selected and conserved Polish Red cattle (Gurgul et al. 2019). Other five genes (BBS12, CETN4, SPATA5, NUDT6, and FGF2), mapped in the regions highlighted in this study and related to the immune system in cattle, have been associated with somatic cell count in seven GWAS on different populations of dairy cows (Chen et al. 2015). FGF2 is involved in embryonic mortality in cattle (Khatib et al. 2008), probably because of its role in the stimulation of the interferon-s which plays a regulating role in the establishment and maintenance of pregnancy in ruminants. (Michael et al. 2006). Another gene associated with reproductive traits found in this study is the UMODL1 gene, which is involved in the regulation of ovarian follicle development (Grigoletto et al. 2018).

Gene enrichment and protein-protein interaction (PPI)
When genes were divided according to their molecular function, five different classes were observed (Table 3). The highlighted classes were binding (23 genes, 47.9% on the total), molecular function regulator (13 genes, 27.1% on the total), catalytic activity (eight genes, 16.7% on the total), molecular transducer activity (three genes), and transporter activity (two genes). The clustering of genes according to the different cellular components is shown in Table 4. Three clusters were obtained: cellular anatomical entity (31 genes), intracellular (20 genes), and protein-containing complex (four genes). When looking at the genes divided according to the biological process they are involved in, 12 different clusters could be observed (Table 5). The three largest classes were cellular process (30 genes, 62.5% on the total), biological regulation (24 genes, 50% on the total), and metabolic process (15 genes, 31.3% on the total). The other highlighted classes grouped 13 or less genes (Table 5).
PPI was investigated for the 48 candidate genes of the present study (Figure 1). The PPI showed more interactions than expected: 55 edges identified (average node degree of 2.34) compared to the nine expected. The p-value of the PPI enrichment was lower than 1e À16 . A big cluster with nine genes (IL21, FGF2, PROM1, CXCL6, GRO1, CXCL8, CXCL3, CXCL2, and OLFM4) already associated with the immune system in the gene-by-gene description (see above) could be observed. A mini-cluster with three genes (ADRB2, ADRB3, and HTR4), previously reported to be associated with RFI in cattle (Yao et al. 2013;Hardie et al. 2017) was also highlighted. Another cluster with six genes (GOT1L1, ZNF703, ERLIN2, PROSC, GPR124, and RAB11FIP1) was observed: some of these genes (GOT1L1, ZNF703, ERLIN2, PROSC, and RAB11FIP1) were already associated with ADG in Nellore and with RFI in Holstein.