Genetic parameters of calving ease in dairy cattle using threshold and linear models

Abstract Genetic parameters of direct and maternal effects for calving ease (CE) in Portuguese dairy cattle were estimated applying a threshold model (TM) and three linear models (LM) using 320,953 CE records. In LM, CE scores were directly used as dependent variables (LM1) or were transformed into values on an underlying continuous liability scale (average and maximum values for LM2 and LM3, respectively). The estimate of heritability for direct effects was lower on linear models (0.04–0.05) than in TM (0.09). Furthermore, direct heritability was higher than maternal heritability, ranging between 0.04–0.09 and 0.01–0.02, respectively. The genetic correlation between direct and maternal effects was negative in all models, ranging between –0.75 and –0.82. We investigated the association between bulls estimated breeding values (EBV) attained by TM and three LM, for 2223 bulls. Results show that all correlations were higher than 0.95 and slightly greater correlations were observed for direct effects than for maternal effects. Particular emphasis was given to individual differences between EBV ranking lists. The average difference between rank positions of TM and the three LM ranged from 64 to 122. Therefore, the selection of breeding animals is influenced by the model’s choice. Even with clear theoretical advantages in analysing categorical data, perhaps due to its more recent implementation, TM has not yet achieved the popularity of LM. The results of this work show that TM ranking list is more in accordance with LM2 than with LM1 or LM3. Therefore, with linear models, CE should be considered as average liability value instead of usual numerical score or maximum liability.


Introduction
Calving is a key event on dairy cattle productive life and successful births are essential to economic success of the farm. Calving affects animal welfare, with dystocia being ranked as one of the most painful conditions of cows (Huxley and Whay 2006;Eaglen and Bijma 2009). Calving difficulty affects herd profitability because of increased labour and veterinary costs, increased calf mortality, and reduced subsequent fertility and survival of the cow (Hickey et al. 2007). Calving problems such as hard pull assistance, caesarean, fetonomy or abortion increase the infections risk. Consequently, calving-related infections also indirectly affect human health as they require increased use of antibiotics, leading potentially to microbial resistance. Besides, animal welfare is compromised by these calving complications and so consumer acceptability of dairy production systems (Mee 2008).
Calving trait phenotypes are affected by two separate components, the calf's contribution (direct effect; e.g. arising from size, hormonal balance, weight) and the dam's contribution (maternal effect; e.g. arising from pelvic measurements, ability to respond to parturition signalling) (Eaglen et al. 2012;Vanderick et al. 2014). Calving ease (CE) measures the presence or absence of calving problems and their intensities. This trait is generally scored on a categorical scale by the farmer, what makes it more sensitive to subjectivity (Dekkers 1994).
In theory, methods used for analysing continuous data are not suitable for categorical data (Gianola and Foulley 1983;Ramirez-Valverde et al. 2001;Hansen et al. 2004). Gianola and Foulley (1983) developed the threshold model (TM) for genetic evaluation of categorical traits. The TM takes into account the asymmetry and extreme incidence of some categories. However, in several studies, preference was given to the use of linear models (Carnier et al. 2000;Eaglen and Bijma 2009). The purpose of this research was to estimate variance components and genetic parameters for CE using TM and three linear models (LM) and to compare breeding values ranking lists achieved from the two methodologies.

Data
A data set consisting of 990,026 CE records was provided by ANABLE (Associac¸ão Nacional para o Melhoramento dos Bovinos Leiteiros). CE data are routinely recorded by farmers and scoring was conducted according to a 6-grade scale: 1 ¼ normal (not assisted), 2 ¼ moderate assistance, 3 ¼ hard pull, 4 ¼ caesarean, 5 ¼ fetonomy, 6 ¼ abortion. Hence, higher scores indicate greater calving difficulty, although the trait usually is referred to as calving ease rather than calving difficulty (Eaglen and Bijma 2009). For this study, the number of classes is reduced to 4. This is done by adding score 5 to 4 and discarding score 6. From the original data set, twin calving's were discarded and both parents must be known. Only records from cows that had calved between 20 and 39 mo., 30 and 59 mo., 43 and 75, 56 and 92 mo., 69 and 108 mo. in the first, second, third, fourth and fifth parities, respectively, were used. Cows outside these ages were relatively few, and those cows may have had an error in their parity record. Farmers CE classification was checked for inconsistencies. A total of 65 herds were excluded because farmers put all scores in the same category. After editing, the data consisted of 320,953 records originating from 1595 herds. Table 1 has the frequencies of each class of calving ease. Pedigree data were extracted from the database used for the official Portuguese genetic evaluations. The final pedigree file included 465,759 animals of three generations.

Linear model
Dependent variable of LM (y) was considered in three approaches. CE scores were directly used as dependent variable (LM1) or are transformed into values on an underlying continuous liability scale (LM2, LM3) ( Figure 1). This transformation does not make CE scores continuous; it merely alters the differences between scores (Eaglen and Bijma 2009). LM1: y equal to CE scores 1, 2, 3 and 4. LM2: Truncated normal distribution properties allows attaining average liability value inside truncation intervals (Foulley 2000). Let Ø x ð Þ and U x ð Þ be the density and cumulative normal distribution with mean 0 and variance 1, respectively. Average liability for two-sided truncation is given by: Average liability for one-sided truncation (lower tail), since Ø a ð Þ ¼ U a ð Þ ¼ 0: And, average liability for one-sided truncation (upper tail), since Ø b ð Þ ¼ 0 and U b ð Þ ¼ 1: E XjX > a ð Þ¼ Ø a ð Þ 1 À UðaÞ CE scores (1, 2, 3 and 4) were converted into the average liability value corresponding to the observed phenotypic category (Table 1). Based on the frequency of each category, the transformed values of categories 1 through 4 were À1.048, 0.537, 2.520 and 3.417, respectively. LM3: CE scores (1, 2, 3 and 4) were converted into the maximum liability value corresponding to the observed phenotypic category. Based on the frequency of each category, the transformed values of categories 1 through 4 were À0.370, 2.229, 3.142 and 3.719 (Table 1). By definition, maximum liability value to CE score 4 is infinite, since U 1 ð Þ¼ 1. Due to calculus convenience, it was assumed U 3:719 ð Þ¼ 0:9999. The following LM was assumed when estimating the variance components and the fixed effects for CE: where y ijklmn ¼ CE (score or transformed) on animal n, m ¼ overall mean, parity i ¼ effect of the i th parity class (1, 2, 3, 4, 5), sex j ¼ effect of the j th sex class (male or female), b 1 ¼ regression coefficient, age k ¼ age of the dam k at calving, ys l ¼ effect l th birth season and birth year of calve (58 levels), hy m ¼ random effect m th herd and birth year of calve (17 110 levels), a n ¼ random direct genetic effect of the n th animal, m n ¼ random maternal genetic effect of the n th animal, and e ijklmn ¼ random residual effect. Only classes with six or more observations were considered on ys and hy effects, and hy was considered as random effect to avoid convergence problems (Misztal et al. 1989;Vanderick et al. 2014). In matrix notation the model can be written as where y is a vector of CE scores or liability values, b is a vector of fixed effects, h is a vector of random hy effects, a is a vector of direct effects, m is a vector of maternal effects and e is a residual vector. X, Z h , Z a and Z m are known incidences matrices linking data with respective effects. The a and m were assumed to follow a multivariate normal distribution with mean 0 and covariance matrix G A, where indicates the Kronecker product of matrices, and A is the relationship matrix. All the others covariances were assumed to be zero and Var(e) ¼ Ir e 2 .
The parameters in the model were estimated by restricted maximum likelihood method using the ASREML software (Gilmour et al. 2009).

Threshold model
In the TM, the observed outcome (y ijklmn ) for CE was assumed to be ordered in four categories. An unknown liability (U ijklmn ) was assumed with three unknown thresholds (t ¼ {t 1 , … ,t 3 }), which categorised the observed outcome: For reasons of identifiability and without loss of generality, it was assumed that t 1 ¼ 0 and t 2 ¼ 1. In addition, it was assumed that the liabilities conditional on all of the effects were independent and normally distributed: Ujb; h; a; m $ N n Xb þ Z h hy þ Z a a þ Z m m; Ir 2 e À Á The model in matrix notation is similar to the previous one, with the vector of liabilities U instead of y. Variance components and genetic parameters as well as solutions for fixed and random effects in threshold animal mixed models were estimated with THRGIBBS1F90 , which is a Fortran90 programme using a Bayesian approach via the Gibbs sampling algorithm . After 10,000 Gibbs samples were discarded as burn-in, 40,000 samples were used to calculate the posterior means and SE for variance components, correlations and heritabilities (Smith 2005).

Other statistics
The Spearman's rank correlation coefficient (r s ) was used to evaluate the association between estimated breeding values (EBV) attained by TM and linear models (LM1, LM2, LM3). In this analysis, it was considered only bulls with progeny higher than 15, born in at least 5 farms. Spearman's rank correlation considers ranked EBV, rather than the EBV itself. Descriptive statistics based on absolute values of the ranking position differences were also applied in order to quantify individual differences in ranking positions.

Results and discussion
Almost 99% of births were classified on scores 1 or 2 and score 2 has the highest frequency (Table 1). Similar CE distribution has been reported by Eaglen and Bijma (2009). However, in other studies, the score with the highest frequency was score 1 (Wiggans et al. 2003;Eaglen et al. 2013). Differences between authors in CE classes' definition can contribute to those discrepancies. Also, data quality depends highly on dairy farmers' own judgement to assign scores for calving ease. It is important to sensitise breeders to correctly classify births. For instead, if a farmer put all scores in the same category, this is an evidence of bad judgement (Vanderick et al. 2014).
Variance components and genetic parameter estimates (phenotypic and residual variances, genetic (co)variances and heritability) of CE with TM and LM models are shown in Table 2. For all models, the additive genetic variance due to direct effects was higher than that due to maternal effects. Furthermore, direct heritability was higher than maternal heritability, ranging between 0.04-0.09 and 0.01-0.02, respectively. Similar outcomes have been reported (Wiggans et al. 2003;Eaglen et al. 2012;Vanderick et al. 2014). The genetic correlation between direct and maternal effects was negative on all models, which is consistent with the literature, but it was superior (in absolute value) to most published results (Hickey et al. 2007;Eaglen and Bijma 2009). A biological reason for this negative genetic association is that female calves of a small size are likely to be born easily, but as adults they may experience more difficulties in giving birth because of the reduced pelvic opening dimensions (Hickey et al. 2007).
The estimate of heritability for direct effects was lower in linear models (0.041-0.046) than in TM (0.086). The same tendency was verified with the heritability for maternal effects but with lower scores. As reported by several studies, higher heritabilities are usually expected with threshold models than linear models (Phocas and Laloe 2003;Vanderick et al. 2014). Accordingly to Vanderick et al. (2014), these heritabilities cannot be directly compared because they were estimated on different scales, on a visible probability scale and on an underlying normal scale for linear and threshold models, respectively. However, since heritability is a variance ratio, the differences in scale are mitigated. Estimated heritabilities obtained in the present study were lower but were within the range of previously published estimates of this trait in dairy cattle, which ranged from 0.03 to 0.17 for direct heritability and from 0.02 to 0.12 for maternal heritability (Weller and Gianola 1989;Steinbock et al. 2003;Wiggans et al. 2003;Lopez de Maturana et al. 2007;Eaglen et al. 2012).
Breeding values of 2223 bulls with progeny higher than 15, born in at least five farms, were considered for the comparison of genetic ranking lists (Table 3). These are the criteria applied in Portugal for publication of genetic evaluations for milk traits. Additionally, with this restriction, only bulls evaluated with higher Table 2. Genetic parameter estimates of calving ease with linear models (LM1, LM2, LM3) and threshold model (TM). LM TM LM1 LM2 LM3 r 2 AD 0.006 ± 0.0004 0.015 ± 0.0011 0.037 ± 0.0026 0.007 ± 0.0007 r ADM -0.002 ± 0.0003 -0.006 ± 0.0007 -0.015 ± 0.0016 -0.003 ± 0.0004 r 2 AM 0.001 ± 0.0002 0.004 ± 0.0006 0.009 ± 0.0013 0.002 ± 0.0003 r 2 hy 0.099 ± 0.0012 0.249 ± 0.0031 0.647 ± 0.0079 0.093 ± 0.0013 r 2 e 0.132 ± 0.0004 0.343 ± 0.0011 0.780 ± 0.0026 0.072 ± 0.0005 t 3 ---1.271 ± 0.0050 r 2 P 0.137 ± 0.0004 0.355 ± 0.0009 0.811 ± 0.0021 0.078 ± 0.0004 q ADM -0.803 ± 0.0380 -0.799 ± 0.0389 -0.819 ± 0.0357 -0.751 ± 0.0330 h 2 D 0.043 ± 0.0031 0.041 ± 0.0030 0.046 ± 0.0032 0.086 ± 0.0091 h 2 M 0.010 ± 0.0016 0.010 ± 0.0015 0.011 ± 0.0016 0.023 ± 0.0037 r 2 AD : genetic variance of direct effects; r ADM : genetic covariance between direct and maternal effects; r 2 AM : genetic variance of maternal effects; r 2 hy : variance of hy effect; r 2 e : residual variance; t 3 : threshold 3; r 2 P : phenotypic variance; q ADM genetic correlation between direct and maternal effects; h 2 D : heritability for direct effects; h 2 M : heritability for maternal effects. accuracy were considered. Table 3 shows r s between TM and linear models (LM1, LM2, LM3), for direct and maternal effects. For each one of these comparisons, mean, median, first and third quartile of the absolute values of the rankings' position differences is also shown. All correlations were higher than 0.95 and slightly greater correlations were observed for direct effects than for maternal effects (Table 3). Vanderick et al. (2014) reported Spearman's rank correlations between sire breeding values from TM and LM of 0.972 and 0.971 for direct and maternal CE breeding values, respectively, indicating a good agreement between ranking of sires attained with both methodologies. Correlations between EBV of TM and LM have been found in other categorical traits such as calf mortality (0.97), stillbirth (0.96 to 0.99) and clinical mastitis (>0.99) (Hagger and Hofer 1990;Heringstad et al. 2003;Hansen et al. 2004). Analyses of individual differences between models in rankings' positions are useful to understand the influence of model choice in bulls selection. Considering only results for direct effects given in Table 3, all the presented statistics (mean, median, first and third quartile) are better for LM2 than for LM1 or LM3. For instance, the average difference between rank positions of TM and LM2 is 64 and 50% of the EBV differ in less than 46 positions (median). Worst scores were found for LM3, since the average difference between rank positions of TM and LM3 is 122 and 50% of the EBV differ in less than 83 positions. Also for maternal effects, results of Table 3 show that TM ranking list is more in accordance with LM2 than with LM1 or LM3.
Bivariate plots of bull rankings between TM and linear models (LM1, LM2, LM3) for direct and maternal genetic effects are shown in Figure 2. This graphical representation sustains that there is a linear relation between EBV ranking lists attained with TM and linear models (LM1, LM2, LM3). Also, plots of direct and maternal genetic effects for LM3 shows higher dot dispersion than plots for linear models LM1 and LM2.
Results showed on Table 3 and Figure 2 allow to conclude that the best and the worst agreement of breeding values ranking lists between TM and linear models are attained with LM2 and LM3, respectively.
The pedigree file used in this work has 465,759 animals (bulls and cows) but comparisons between EBV ranking lists were made considering only bulls with higher accuracy, reproducers or candidate to be reproducers. Therefore, and from an individual point of view, it must be emphasised that 50% of the bulls occupied positions in the ranking lists for TM and LM2 (best agreement) with differences between 20 and 92. Worst results were attained with LM1 and LM3. Consequently, those individual ranking differences will have practical consequences in the reproducers selection.
Previous research has treated CE as categorical data, with threshold models or as a continuous trait with linear models. When linear models are applied, usually, CE scores are directly used as phenotypes (Carnier et al. 2000;Gutierrez et al. 2007;Cervantes et al. 2010). This is possible because CE is conveniently classified on a numerical category scale. To better account for the categorical nature of the CE scores, more recently studies transform the recorded CE scores into the average liability value corresponding to the observed phenotypic category (Eaglen and Bijma 2009;Eaglen et al. 2012). Additionally, this work considers CE classes as the maximum liability value corresponding to the observed phenotypic category (LM3). There are no previous studies considering simultaneously all these approaches. Considering the categorical nature of the CE, from a theoretical point of view, application of a TM is the correct choice (Gianola 1982), whereas, from a practical point of view, LM is a more easily applicable choice (Varona et al. 1999;Ramirez-Valverde et al. 2001;Phocas and Laloe 2003). It should also be taken into account that the use of threshold models significantly increases the computational effort (Gutierrez et al. 2007). Additionally, use of CE scores or transformed CE scores in LM influences genetic evaluation and could have practical impact in reproducers selection Advantages of threshold over linear models have been reported with simulated data (Meijering and Gianola 1985;Hoeschele 1988). However, variable results have been found using field data. Similar performance of threshold and linear models (Weller et al. 1988;Matos et al. 1997) and advantages of linear over threshold models (Carlen et al. 2006) have been reported. Varona et al. (1999), using only data from large herds and the animal model, showed no advantage of univariate threshold over linear models for CE.