Past, present, and future developments in single-step genomic models

Abstract Single-step genomic best linear unbiased predictor (ssGBLUP) is a methodology for estimating breeding values jointly for genotyped and non-genotyped animals. Since its development in the early 2010s, ssGBLUP faced challenges like modelling missing pedigrees, efficiently computing accuracies, ensuring the compatibility between genomic and pedigree information, implementing large-scale genetic evaluations, and using non-genotyped animals for genome-wide association studies, among others. Because of the extensive research and the availability of efficient software packages, those challenges for ssGBLUP were solved. Nowadays, ssGBLUP is the chosen methodology estimating values in almost all livestock populations. This review aims to report the progress of ssGBLUP, outline the current state of the art, and hypothesise about the future of this methodology. Highlights Single-step genomic BLUP is the most popular methodology for genetic evaluations including genotyped and non-genotyped animals. The development of theories and efficient software allows to use single-step for virtually any real dataset. Continuous research in single-step will allow the use of massive amount of data like video recording, omics, among others.


Introduction
Single-step genomic best linear unbiased predictor (ssGBLUP) became the most popular methodology for genetic evaluations, including genotyped and nongenotyped individuals. Since its early development at the beginning of the 2010s, it has been shown to produce accurate estimated breeding values in animals, plants, and humans. This study aimed to review the history of single-step genomic evaluations and provide insights into the future of single-step genetic evaluations.

Preludes of single-step genetic evaluations
According to the classical genetic theory (Fisher 1919), the phenotype of an individual y i ð Þ is explained by an overall population mean l ð Þ , its genetic merit g i ð Þ plus an error term e i ð Þ: Also, its genetic merit is equal to the sum of the additive (u i ), dominance (d i Þ, and epistatic effects (e i Þ (Falconer and Mackay, 1996): Given the, in general, small magnitude of the dominance and epistatic effects, g i % u i , and in consequence: where u i is the breeding value of the i th individual. Therefore, if the additive effects at the quantitative trait loci (QTL) are known, the breeding value of the individual can be accurately obtained from (Fernando and Grossman 1989): where X 0 i is the i th row of the design matrix X for the vector of fixed effects b, z ij is the centred genotype of the i th individual at the j th QTL, and a j is the effect of the j th QTL. However, the genotypes at the QTL are unknown and, therefore, Equation (4) cannot be used for estimation purposes. In the early 2000s, the availability of panels of thousands of single-nucleotide polymorphism (SNP) and the pioneering study from Meuwissen et al. (2001) encouraged using genomic information for genetic evaluations. Given the large number of SNP, it is expected that they are in linkage disequilibrium (LD) with the QTL affecting a specific trait (Meuwissen et al. 2001). Therefore, the breeding values of the animals can be obtained from: where a j is the effect of the j th SNP in the panel. Then, the conceptual Equation (5) leads to two families of equivalent statistical models, known as genomic best linear unbiased predictor (GBLUP) and single nucleotide polymorphism best linear unbiased predictor (SNP-BLUP) (Strand en and Garrick 2009). The structure of the GBLUP models is: where y is the vector of phenotypes, u is the vector of breeding values, e is the vector of errors, W is a design matrix, G is the genomic relationship matrix, and r 2 u is the additive genetic variance. On the other hand, the structure of the SNP-BLUP models' family is: where Z is the centred SNP content matrix, a is the vector of SNP effects, D is the covariance matrix for the marker effects, and r 2 a is the SNP variance. By comparing Equations (6) and (7), the reader can note that both models are equivalent when Gr 2 u ¼ ZDZ 0 r 2 a : For most applications, D ¼  (6) and (7), two families of single-step models exist. Although Equations (6) and (7) are simple, they were not used straightaway because only a small portion of the animals from the livestock populations had available genotypes due to the high genotyping cost. Since most animals were not genotyped, methods for incorporating the genomic information from a small set of animals were necessary for genetic evaluations.
Before the development of single-step methodologies, genomic selection was carried out by the socalled multiple-step procedures (e.g. Boichard et al. 2002;Cole et al. 2009;Harris and Johnson 2010). Besides their differences in details and implementation, most of the multiple-step genetic evaluations involve the following steps: (i) run a pedigree-based genetic evaluation, (ii) obtain pseudo-phenotypes for genotyped animals such as daughter yield deviations (DYD), deregressed proofs (DRP), or adjusted estimated breeding values (EBV), (iii) calculate direct genomic values (DGV) for genotyped animals using a genomic model based on the pseudo-phenotypes obtained in the previous step, and (iv) combine EBV and DGV for genotyped animals using a selection index methodology.
Although its easy implementation, multiple-step genetic evaluations have several drawbacks. These disadvantages include biased or inaccurate predictions for genotyped animals, absence of gain in accuracy for non-genotyped animals, and incompatibility between estimated breeding values for genotyped and nongenotyped animals. We refer the reader to Misztal et al. (2009) and Patry and Ducrocq (2011) for more details on these disadvantages.

First stages
Given the disadvantages of multi-step genetic evaluations, Misztal et al. (2009) proposed to include both genotyped and non-genotyped animals in a single genetic evaluation by replacing the numerator relationship matrix (A) with a new covariance matrix (H) that combines genomic and pedigree relationships. With such a matrix, estimated breeding values for non-genotyped and genotyped animals can be obtained from the following model: However, Legarra et al. (2009) noticed that H was not a proper covariance matrix because it may not be positive semi-definite. Additionally, the derivation by Misztal et al. (2009) did not consider the distribution of breeding values of ungenotyped animals conditioned on breeding values of genotyped animals.
Consequently, Legarra et al. (2009) proposed the currently used single-step covariance matrix H, which has the following structure: where the subscripts 1 and 2 represent the blocks pertaining to the non-genotyped and genotyped animals, respectively. Letting u 1 and u 2 be the vector of breeding values for non-genotyped and genotyped animals, respectively, the structure of H represents the covariance of the conditional distribution of u 1 on u 2 , given that u 2 was updated by marker information. Henderson (1975) used such a structure to model the effects of selection, which can be traced back to Pearson (1903). Since Legarra et al. (2009) correctly defined the joint covariance matrix for non-genotyped and genotyped animals, the MME for a single-step genetic evaluation with the model (8) are: Note that the inverse of H is needed in (10). However, when single-step was developed, its algebraic structure remained unknown. Although Misztal et al. (2009) proposed different computational approaches for altered versions of (10) without requiring H À1 , none of them were as computationally efficient as equations Equation (10).
However, Aguilar et al. (2010) and Christensen and Lund (2010) discovered that the structure of H À1 was simpler than the structure of H: The authors found that the algebraic expression for H À1 is: where, henceforth, the superscript ij of a matrix refers to the ij th block of its inverse. Due to its simple structure, the above matrix allows using Equation (10) for various models for routine genetic evaluations. The genomic relationship matrix is singular when the number of genotyped animals exceeds the number of markers, and in the presence of clones or monozygotic twins. Thus, a small fraction of a positive definite matrix is added to G to ensure its non-singularity in a procedure known as blending (VanRaden 2008). In ssGBLUP, the most common choice for blending is 1 À b ð ÞG þ bA 22 , where 0 b 1: A less common but computationally more efficient alternative is using the identity matrix instead of A 22 : When the choice is A 22 , the underlying genomic model is equivalent to a marker effects model with a residual polygenic effect (RPG) with a covariance matrix equal to A 22 (Christensen and Lund 2010).

The compatibility between genomic and pedigree information
The first applications of ssGBLUP to commercial datasets revealed the presence of inflation in the GEBV for genotyped animals (Forni et al. 2011;Christensen et al. 2012). Also, Aguilar et al. (2010) implicitly recognised it and proposed a scaling factor for reducing the overestimation of GEBV. Vitezica et al. (2011) and  identified that the origin of the problem was in the difference between the genetic base of pedigree and genomic relationship matrices. Since the genotyped animals were not genotyped at random, their base population was different from the pedigree's base population. Consequently, this generated incompatibility between G and A 22 : To overcome this incompatibility, Vitezica et al. (2011) proposed adjusting the genetic base of G to that of A 22 , resulting in: . This is the method proposed by Powell et al. (2010), closely related to the one proposed by Vitezica et al. (2011).  presented two different approaches to ensure the compatibility between the genomic and pedigree relationship matrices. The first follows the ideas from Vitezica et al. (2011) of adjusting G: He proposed to adjust the genetic base of G by using a simple regression of the form: As mentioned by the author, the coefficients can be estimated either by least squares (VanRaden 2008) or by equating the averages of the matrices and their diagonals Gao et al. 2012). It turns out that both the method of Vitezica et al. (2011) and The second approach proposed by Christensen (2012) consisted of adjusting the complete numerator relationship A to the genetic base of G: The author claimed that such an approach depends on two parameters, c and $ s, which can be estimated by maximum likelihood. After being estimated, A is modified such that For generalising the method of Christensen (2012), Legarra et al. (2015) introduced the concept of metafounders. They defined the metafounders as pseudoindividuals that relate pedigree founders and different populations. To use metafounders in single-step, the genomic relationship matrix is constructed with allele frequencies equal to 0.5, and the numerator relationship matrix is modified by including a multidimensional parameter, called C: The matrix H À1 with metafounders is: where the subscript C denotes that a matrix was constructed using metafounders, and the subscript 0.5 denotes that it was calculated with allele frequencies equal to 0.5. In genetic evaluations with metafounders, C reflects the covariance between the allele frequencies in the base populations and can be estimated from pedigree and genomic information (Garcia-Baccino et al. 2017). In the case of a single metafounder, C ¼ c, and the approach of Legarra et al. (2015) is the same as the method of Christensen (2012). As they can relate a priori unrelated base populations, metafounders are especially helpful for multibreed single-step genomic evaluations (Junqueira et al. 2020;Kluska et al. 2021).

Missing pedigrees
Genetic evaluations use unknown parent groups (UPG) for modelling missing pedigrees for classical animal models. UPG can be fitted as covariates in the model equation (Westell et al. 1988) or by modifying the inverse of the covariance matrix for the random effects (Quaas 1988). This second approach is more used than the first one because of its computational efficiency and is the chosen one when modelling missing pedigrees in ssGBLUP. In its introduction, Misztal, Vitezica et al. (2013) pointed out that single-step genomic evaluations had problems when UPG modelled missing pedigrees. They claimed that, at that moment, commercial implementations of ssGBLUP with UPG showed significant ranking changes for selection candidates, poor convergence rate, and biased UPG solutions when fitted as suggested by Quaas (1988). However, these problems were not observed when UPG were included as a separated covariate in the model equation. Thus, Misztal, Vitezica et al. (2013) applied the Quaas-Pollack transformation (Quaas and Pollak 1981) to the model equations with UPG as explicit covariates for deriving single-step with UPG. Holding the model in Equation (8), the MME for ssGBLUP with UPG are: where Q is a matrix assigning animals to UPG,ŝ represents the UPG solutions andû Ã is the BLUP of u Ã ¼ Qs þ u: The distinctive feature of Equation (15) compared to a situation where UPG are included only in A is the multiplication Q 0 2 G À1 À A À1

22
À Á Q 2 located in the third diagonal block of the left-hand side of Equation (15), where Q 2 is the block of Q pertaining to the genotyped animals. Misztal, Vitezica et al. (2013) discussed more specific details and implementation procedures for Equation (15).
Starting from the hypothesis that the information in G is complete, Tsuruta et al. (2019) proposed eliminating the UPG contributions for the genomic relationships and  showed that this is equivalent to modelling first the missing pedigrees and second the inclusion of genomic information. In contrast, the approach of Misztal, Vitezica et al. (2013) is the opposite. In this setting, the multiplication Q Although both approaches give similar results, the second approach is more computationally efficient with many UPG.
Since they are a generalisation of UPG, metafounders can also model missing pedigrees in ssGBLUP. This method has the advantage of modelling missing pedigrees and simultaneously accounting for the incompatibility between genomic and pedigree information.  presented a similar approach (i.e. encapsulated UPG) based on UPG instead of metafounders. However, the encapsulated UPG method does not guarantee the compatibility between A and G per se; thus, tuning methods are still required.

Individual accuracies
The accuracy of an individual's GEBV is defined as the correlation between the estimated and true breeding value under a conceptual repeated sampling. According to the BLUP theory (Henderson 1984, p. 41), the individual accuracies can be obtained as a function of the diagonal elements of the inverse of the coefficient matrix in Equation (10). Approximation methods need to be used when the coefficient matrix is too large to be inverted. Most of the approximation methods have the following steps: (i) calculate accuracies for a model without genomic information, (ii) calculate genomic accuracies, (iii) remove doublecounting of information (i.e. phenotypes of genotyped relatives), and (iv) propagate genomic information to non-genotyped animals. Misztal, Tsuruta et al. (2013) proposed to calculate the amount of information under ssGBLUP models from the diagonal of the matrix: where E is a diagonal matrix of effective number of records or effective record contributions, and a is the ratio between the error and genetic variance. The effective number of records are obtained from genotyped animals' pedigree accuracies. After calculating the diagonal elements of B À1 , the accuracy of the i th genotyped animal is calculated as acc i ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi 1 À aB À1 ii q : Accuracies for non-genotyped animals are then obtained by updating the effective record contributions with weights derived from the calculated accuracy for genotyped animals.
Instead of inverting Equation (16), Liu et al. (2017) proposed calculating genomic accuracies based on an SNP-BLUP model and removing double-counting of information by deriving weights from pedigree accuracies calculated only with genotyped animals. In their study, they also proposed adjustment factors to avoid accuracy overestimation. These factors can be calculated by cross-validation from previous data. Edel et al. (2019) suggested a similar strategy with additional steps to simplify the structure of the matrices to be inverted. The most time-consuming task for these methods is the calculation of genomic accuracies; therefore, different strategies were developed to reduce such time (Ben Zaabza et al. 2020, 2021.

Marker effects in ssGBLUP
By the equivalence between SNP-BLUP and GBLUP, the marker effects from the former can be obtained as a linear function of the GEBV of the latter and vice versa. Keeping the notation from Equations (6) and (7), the BLUP for the marker effectsâ ð Þ can be calculated as (Strand en and Garrick 2009): whereû is the BLUP of the vector of breeding values. In single-step, since the statistical model is conditional on the genomic information, Equation (17) can be used for estimating the marker effects. However, the genomic relationship matrix in ssGBLUP is usually blended and tuned. If tuning is applied before blending, G is overwritten as: Then, Equation (17) is modified as follows: In single-step, the estimates of marker effects have a dual purpose. For genetic evaluations, they can be routinely used for calculating interim or indirect predictions, whereas they can also be used for genomewide association studies (GWAS).
When new genotyped animals without phenotypes or progeny enter the database between genetic evaluations, indirect predictions (IP) estimate their breeding values without performing the complete genetic evaluation; hence, saving computing time. Suppose these genotyped animals have a SNP content matrix Z ip centred with the same allele frequencies as in the original genetic evaluation, the GEBV of the indirectly predicted animals are: where l ip is the mean for genotyped animals.
Including it in Equation (20) is for avoiding biased indirect predictions Pimentel et al. 2019). The indirect predictions calculated in Equation (20) are known as direct genomic values (DGV), which do not consider polygenic effects. Therefore, they are an approximation when the underlying model considers a high proportion of residual polygenic effect. To consider polygenic effects for indirect predictions, Liu et al. (2016) presented the following expression: where A ip, g is the block of the numerator relationship matrix relating the indirectly predicted animals with the animals used for calculating marker effects in Equation (19). A significant drawback of the standard GWAS methods is that only genotyped individuals can be used (i.e. EMAXX from Kang et al. 2008;Kang et al. 2010). Since the number of genotyped animals in livestock populations can be small, GWAS for livestock species suffer from the lack of statistical power and large standard errors associated with QTL detection. Thus, Wang et al. (2012) suggested using single-step for GWAS, developing a method called single-step GWAS (ssGWAS). This allows for including records for nongenotyped animals and for using multiple-trait and complex models. The method works as follows: and p j is the minor allele frequency at the j th SNP. 3. Back-solve the GEBV to SNP effects using Equation (19). 4. Calculate the weight for the j th marker as D jj 1 a 2 j p j 1 À p j ð Þ (Zhang et al. 2010). 5. Normalise D for the genetic variance being constant across iterations. 6. Go to step 2 for ssGWAS1 or to step 3 for ssGWAS2.
The computational difference between ssGWAS1 and ssGWAS2 is that the former requires running ssGBLUP once per iteration, whereas the latter only at the first iteration. The choice depends on the stability of GEBV in consecutive iterations. Since this is an adhoc algorithm, specific methods for calculating p-values for the marker effects as in the standard GWAS were developed (Aguilar et al. 2019). This procedure, known also as ssGWAS, is as follows: 1. Calculate H À1 : 2. Set and solve the MME from Equation (10). 3. Calculate the matrix of prediction error (co) variances for genotyped animals C ð Þ: 4. Back-solve the SNP effects from the estimated breeding values using Equation (19). 5. Calculate the variance of the SNP effects' estimates as where Z i is the column of Z corresponding to the i th marker. 6. Calculate the p-value for the ith SNP as where U Á ð Þ is the cumulative density function of the standard normal distribution.
As stated by Aguilar et al. (2019) the estimates and variances of SNP effects calculated from ssGWAS can be transformed to fixed regression estimates as calculated from EMMAX (Bernal Rubio et al. 2016).

Marker effects from equivalent models
As for the equivalence between GBLUP and SNP-BLUP, two families of equivalent models to ssGBLUP that estimate the marker effects instead of the breeding values of the animals or both effects together exist. Intending to present computational strategies to improve single-step genomic evaluations, Legarra and Ducrocq (2012) proposed several models equivalent to ssGBLUP. Building from their work, two families of marker effects single-step models were developed: the so-called single-step Bayesian Regression (ssBR; Fernando et al. 2014) and single-step SNP-BLUP (ssSNP-BLUP; Liu et al. 2014, Taskinen et al. 2017. Fernando et al. (2014) reviewed the theoretical derivation of ssGBLUP in Legarra et al. (2009) and introduced ssBR as an alternative to avoid the inversion of G and A 22 in ssGBLUP. Recalling that the subscripts 1 and 2 refer to the non-genotyped and genotyped animals, respectively, the general equation for ssBR is: , l g is the expected breeding value for genotyped animals, and ϵ is a vector of imputation errors for non-genotyped animals. The term Jl g is equivalent to the adjustment proposed by Vitezica et al. (2011), differing in that the first considers l g as fixed and the second as random. If multivariate normality is assumed for y, a, ϵ, and e, Equation (22) yields MME of size equal to the sum of the number of fixed effects, markers, and non-genotyped animals plus one. However, Equation (22) is flexible for implementing a Gibbs sampler when a prior distribution different from a normal distribution is assumed for the marker effects (Gianola 2009). MME resulting from Equation (22) usually involve a diagonal covariance matrix; hence, their inversion is trivial. However, the parts of the MME corresponding to the incidence matrices of the model are dense. Therefore, large matrix multiplications or specific solving strategies (Strand en and Garrick 2009) are needed. Regardless of the estimation procedure, the BLUP of the breeding values is: In contrast to ssBR, the ssSNP-BLUP models do not change the incidence matrix for the statistical model. Consequently, the model structure is the same as in Equation (8). However, in ssSNP-BLUP, the vector of breeding values has the marker effects appended to it. Therefore, H À1 from Equation (11) is extended to: The above covariance matrix yields the following MME: As proposed by Liu et al. (2014), the MME in Equation (25) can be separated into two different equations: and However, using Equations (26) and (27) leads to models with convergence problems (M€ antysaari et al. 2020). Clearly, Equations (24)-(27) are not defined if b ¼ 0: Thus, in these settings, ssSNP-BLUP always involves a residual polygenic effect or implicit blending.
Using ssGBLUP, ssBR, or ssSNP-BLUP for obtaining estimates of the marker effects depends on the assumed prior, the statistical model, computational resources, and user preference.

Large-scale genetic evaluations
At the first stages of the implementation of singlestep genomic evaluations, Equation (11) was easy to calculate and plug into the MME. Following Henderson's rules (Henderson 1976;Quaas 1976), A À1 is easy to calculate for any number of animals, and G and A 22 were easy to construct, the latter by Colleau (2002), and invert because of the small number of genotyped animals. As the number of genotyped animals grew, the inversion of both matrices turned unfeasible given the usual computational resources for genetic evaluations.
A logical choice to overcome the inversion of G is to use ssBR or ssSNP-BLUP, which are models that do not involve G À1 : However, they involve complex products of the form X ¼ A 12 A À1 22 Y, where X and Y are matrices or vectors required for estimating the breeding values for non-genotyped animals. Fernando et al. (2014) suggested that such products can be calculated by solving the sparse linear system of equations A 11 X ¼ ÀA 12 Y using preconditioned conjugate gradient (PCG). However, the resulting matrix X, known as the matrix of imputed genotypes, is dense and large to store in memory. For solving this issue in ssBR, Fernando, Cheng, Golden et al. (2016) presented the hybrid model, which directly fits the breeding values for non-genotyped animals plus the marker effects. In ssSNP-BLUP, Taskinen et al. (2017) introduced equivalent models that, instead of storing the matrix of imputed genotypes, impute them 'on-the-fly'. In ssGBLUP, products of the form A À1 22 x, where x is a vector, are needed. Very efficient methods based on the equality A À1 22 ¼ A 22 À A 21 A 11 ð Þ À1 A 12 allow calculating A À1 22 x for many genotyped animals (Masuda et al. 2017;Strand en et al. 2017).
Although ssGBLUP requires G À1 , this method is widely used for genomic evaluations. This is because (25) it can accommodate more complex models than ssBR and the software development is more straightforward because any program for pedigree-based genetic evaluations can be modified to ssGBLUP by replacing A À1 by H À1 : Even though ssSNP-BLUP uses the same incidence matrices as ssGBLUP, memory requirements might be higher depending on the implementation, and convergence problems due to ill-conditioned systems of equations were reported (Vandenplas, Eding et al. 2018, 2019. However, convergence problems can be solved using a second-level preconditioner (Vandenplas et al. 2019).
Mimicking the ideas of Henderson (1976) and Quaas (1988) for computing A À1 ,  introduced the Algorithm for Proven and Young (APY) to compute a sparse representation of G À1 , hereafter denoted as G À1 APY : The idea underlying APY is that genotyped animals can be divided into proven or core animals (c) and young or noncore animals (n). Then, the breeding values of the noncore animals u n ð Þ can be written as a linear combination of the breeding values of the core animals u c ð Þ, plus an error term n ð Þ : cc is the regression coefficient matrix of u n on u c : Assuming that Var n ð Þ ¼ M nn ¼ diag G nn À G nc G À1 cc G cn À Á and taking variances of both sides of Equation (28): Then, the inverse of Equation (29) is: The first diagonal block is a dense square matrix of size equal to the number of core animals, whereas the off-diagonal block is a dense rectangular matrix of size equal to the number of core animals times the number of noncore animals. Lastly, the second diagonal block is a diagonal matrix. If the number of core animals is much less than the number of noncore animals, Equation (30) is sparse, and it can be calculated for many genotyped animals .
At the beginning of APY implementations, the division between proven and young animals was based on phenotypic and progeny information. However, Fragomeni et al. (2015) empirically showed that the definition of proven and young is not critical for the method to work. Thus, the groups were named core and noncore, and the core animals can be chosen as a random sample from the genotyped animals. Pocrnic et al. (2016) related the number of core animals with the number of eigenvalues explaining a certain percentage of the variance of the spectrum of G: Misztal (2016) hypothesised that the number and type of animals selected to be core depend on the number of independent chromosome segments present in the population. Thus, the core animals can be chosen at random if they represent all the independent chromosome segments in the population. However, selecting the type and number of core animals in APY is still a topic of discussion and research (Bradford et al. 2017;Nilforooshan and Lee 2019).  presented a similar approach to APY but based on an orthogonal decomposition of G: As an alternative to APY, M€ antysaari et al. (2017) developed ssGTBLUP, which is equivalent to ssSNP-BLUP when the equations for the marker effects are absorbed into the MME. When G is blended as G ¼ ZZ 0 þ C, where C is usually proportional to the identity matrix or A 22 : Then, G À1 is obtained following the Woodbury formula as: where T ¼ L À1 Z 0 C À1 is a matrix of size equal to the number of markers times the number of genotyped animals, and L is the Cholesky factor of Z 0 C À1 Z þ I: Solving the MME by PCG requires multiplying a vector x from right to left in Equation (31). Therefore, it is not needed to compute G À1 but C À1 x and T 0 Tx: The first product is easy given that C À1 is the identity matrix or A À1 22 , and the matrix T is calculated before solving the MME. For a further reduction in computing time, M€ antysaari et al. (2017) where UKU 0 is the eigendecomposition of Z 0 C À1 Z: Using this definition, the authors suggested applying a low-rank approximation to Equation (31) by eliminating the eigenvectors corresponding to the eigenvalues close to zero, resulting in an approximation to ssGBLUP as in APY. In the same fashion, Ødegård et al. (2018) presented a low-rank approximation of G À1 based on the singular value decomposition of Z:

Variance component estimation
Since the model structure of ssGBLUP is the same as the animal model but replacing A À1 by H À1 , any method for estimating variance components and reliabilities applied to the animal model can be employed in ssGBLUP. We refer the reader to Hofer (1998) for a review of these methods.
When all the animals are genotyped, the variance components obtained from a GBLUP are equal to those calculated without genomic information when the number of markers increases towards infinity and the pedigree information is complete (Cuyabano et al. 2018). Given a finite number of markers, Cuyabano et al. (2018) showed that the heritability calculated with a GBLUP model is likely underestimated because the markers do not capture all the genetic variation. Although it is not proven and a formal proof would be out of the scope of this review, we expect that with blending or a residual polygenic effect, the variance components calculated with a ssGBLUP model should not be biased (Cesarani, Pocrnic et al. 2019). Also, the estimated variance components should match those estimated with an animal model, subject to pedigree completeness and sample size. In practice, estimates may or not match (Hidalgo et al. 2020) depending on the statistical model, sample size, selection, and genotyping strategies. These hypotheses should hold for any variance components estimation method given their convergence.
ssGBLUP applied to livestock genetic evaluations Present and future challenges Currently, ssGBLUP is the method of choice for genomic evaluations in most livestock species, where not all the animals in the evaluations have genotypes. The current state of the art allows implementing most of the models used in animal breeding for a very large number of genotyped animals. This is possible due to the wide availability of efficient software packages (BLUPF90, Misztal, Tsuruta et al. 2014;BOLT, Garrick et al. 2018;DMU, Madsen et al. 2014;Mix99, Lidauer et al. 2015;MixBLUP, Mulder et al. 2012). Applying complex models such as social interaction models or including dominance, epistasis, and genotype by environment interactions is still challenging when the number of genotypes is large (Varona et al. 2018).
For genomic evaluations, we foresee a massive amount of information in the form of omics, images and videos of animals' behaviour, and whole-genome sequence. This information may help define new phenotypes and improve phenotype prediction. However, it is not clear whether this data would help increase the accuracy of ssGBLUP predictions. Nonetheless, the key would be to find a parsimonious deal between incorporating these new sources of information and the simplicity and efficiency of ssGBLUP genomic evaluations. These new data create the need to increase the efficiency and flexibility of ssGBLUP continuously. Thus, ongoing research aims to make ssGBLUP an efficient tool for constantly growing datasets. These research topics include an approximation of theoretical accuracies of estimated breeding values, improving convergence of the solving algorithms, increasing the efficiency for categorical traits analysis, and calculating p-values for large-scale ssGWAS.
The availability of software and various datasets allow testing different models and methodologies based on single-step. Almost all studies draw conclusions based on cross-validation (Thompson 2001; Gianola and Sch€ on 2016; Legarra and Reverter 2018). However, cross-validation results depend highly on the data and the animals selected for the validation set. Thus, cross-validation studies may not consider peculiarities of animal breeding such as population structure, selection, selective genotyping, among others. We believe that there is space for research on different methods for model selection in ssGBLUP. New approaches would complement cross-validation and allow making trustworthy decisions for ssGBLUP research or genomic evaluations.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This study was partially funded by Agriculture and Food Research Initiative Competitive Grant no. 2020-67015-31030 from the US Department of Agriculture's National Institute of Food and Agriculture.