The t linear mixed model: model formulation, identifiability and estimation

Abstract The robustness of the t linear mixed model (tLMM) has been demonstrated and exploited in many applications. Various publications have aimed to prove its superiority over traditional linear mixed models, to extend it to more general settings, and to propose more efficient estimation methods. However, little attention has been paid to the mathematical properties of the model itself and to the evaluation of the proposed estimation methods. In this paper we perform an in-depth analysis of the tLMM, evaluating a direct maximum likelihood estimation method via an intensive simulation study and investigating some identifiability properties. The theoretical findings are illustrated through an application to a dataset collected in a sleep trial.


Introduction
Linear mixed models (LMM) are widely popular for their appealing feature of modeling both the intra- and inter-subject variability with a broad set of possible correlation structures (Laird and Ware 1982). However, they rely on normality assumptions and can be very sensitive to outlying data and heavy-tailed distributions. The multivariate t distribution (see Kotz and Nadarajah (2004) for a comprehensive discussion of its characterizations and properties) has been proposed to make linear regression and linear mixed models robust (Lange, Little, and Taylor 1989; Pinheiro, Liu, and Wu 2001). In particular, the t linear mixed model (tLMM) is a natural extension of the classical LMM in which the response variable is jointly t distributed (Pinheiro, Liu, and Wu 2001). A convenient hierarchical formulation of the tLMM can be derived by mixing the multivariate normal distribution of the response variable with a gamma-distributed latent scaling variable for the covariance matrix (Pinheiro, Liu, and Wu 2001). This characteristic makes the tLMM robust to heavy-tailed distributions of the observed data, and thus more appropriate than the LMM when normality assumptions are violated. Furthermore, the tLMM is a favorable choice over other robust methods because robustness is achieved with only one extra parameter compared to traditional LMMs, and the model structure allows the analysis of random variances.
In fact, many authors have explored applications and extensions of the tLMM, and the related literature is quite broad.
A similar model was proposed to analyze heterogeneous within-subject variances (Chinchilli, Esinhart, and Miller 1995). These authors made no distributional assumptions on the random effects, while imposing a similar hierarchical structure on the residual variances. Chinchilli, Esinhart, and Miller (1995) aimed at studying the within-subject variability of two different laboratory techniques, consequently treated the fixed effects as nuisance parameters, and proposed partial likelihood estimation for a model with only a random intercept and no covariates. An extension of this model was introduced for the analysis of menstrual diary data, in which both the mean structure and the mean of the latent variable were linked to covariates (Lin, Raz, and Harlow 1997). The estimation approach proposed there combined quasi-likelihood, pseudo-likelihood and the method of moments (QL/PL-M) in consecutive steps for the fixed effects, the variance components and the parameters of the distribution of the latent variable.
Early references on tLMMs focused on making linear mixed models robust against outliers, as an alternative to Gaussian mixture models (Welsh and Richardson 1997). These authors studied the effect of different levels of contamination of the response variable without deriving the marginal distributions or discussing algorithmic details of the estimation. Later, Pinheiro, Liu, and Wu (2001) derived the hierarchical structure and the distribution of random effects and residuals underlying the assumption of a multivariate t distribution. This work included the description of various expectation maximization (EM) algorithms for maximum likelihood estimation. The model was later extended by allowing autoregressive residual structures, and estimated by direct maximum likelihood (Lin and Lee 2006), EM-type algorithms (Lin 2008) and Bayesian methods (Lin and Lee 2007). In these publications the authors also discussed the prediction of future values, which arises naturally in the Bayesian formulation. Multivariate extensions of the tLMM have been studied in the frequentist and Bayesian frameworks (Wang and Fan 2011, 2012) and extended by other authors to handle missing values, censored data and heavy tails (Ho and Lin 2010; Matos et al. 2013; Wang 2013a; Wang and Lin 2014, 2015; Wang, Lin, and Lachos 2018; Lin and Wang 2019). Besides allowing robust inference, the tLMM enables the detection of outliers, intended as observations that do not comply with normality (and therefore with the LMM) (Pinheiro, Liu, and Wu 2001; Lin and Lee 2007; Ho and Lin 2010; Wang and Fan 2011, 2012; Matos et al. 2013; Wang and Lin 2014; Wang 2017).
The publications referenced so far focused on applications and extensions of the tLMM, on proposing alternative estimation methods, and on showing the superiority of the tLMM over the traditional LMM. However, the literature lacks investigations of the consistency of the proposed estimation methods and of the relative estimation error in general settings, as authors seldom include simulation studies. The QL/PL-M estimation approach was validated through an intensive simulation study (Lin, Raz, and Harlow 1997), but the underlying t mixed model included only a random intercept and assumed independence of the residuals. Pinheiro, Liu, and Wu (2001) ran a small simulation study focusing on the robustness of the tLMM compared with the LMM in the presence of varying amounts of outlying observations, but reported only the relative efficiency. The ML estimation method has never been thoroughly investigated, although it has often been employed in the applications that accompany these publications (Lin and Lee 2006). Models formulated in their full generality might not be estimable with the data at hand, and fitting is often performed in particular cases, with uncorrelated random intercept and slope (Lin 2008), or allowing only either a random slope or an autoregressive correlation structure (Lin and Lee 2006). Furthermore, the standard error of the estimates in the frequentist approaches is based on asymptotic theory (via the observed/expected Fisher information matrix). Although the authors have recognized that this method might lack accuracy, they have not investigated its reliability in finite-sample settings because of its computational load (Pinheiro, Liu, and Wu 2001; Lin and Lee 2006; Lin 2008). Finally, the common parametrization is not justified and hides possible identifiability issues: all publications on the tLMM make certain choices for the parameters and select parsimonious correlation structures, without giving a mathematical explanation for them. Identifiability of the tLMM has never been addressed, while identifiability of LMMs has recently attracted much attention. Many authors have warned users of possible misinterpretation of the output given by software capable of fitting LMMs (Wang 2013b, 2017; Samani 2014; Janzen et al. 2019; Lavielle and Aarons 2016). In fact, even when overparametrization is avoided, the problem might be non-identifiable, and various automatic procedures might return incorrect estimates without warnings.
In this paper, we address the possible formulations of the tLMM (cf. Sec. 2) and pinpoint choices of the variance-covariance structure that lead to identifiability issues (cf. Sec. 3).
We run a small simulation study to demonstrate the silent risk of converging to the wrong values when the model is not identifiable. Then we choose a popular parametrization of an identifiable model and run an intensive simulation study in Sec. 4. The model we investigate includes various fixed effects, correlated random intercept and slope, and an autoregressive residual correlation structure. We simulate a large number of repeats compared to a relatively small number of subjects, a setting that is realistic in various practical situations but has rarely been formally analyzed (Lin, Raz, and Harlow 1997). The simulations are based on our case study of a sleep trial. We show the accuracy of an extended version of the simple and direct ML estimation method of Lin and Lee (2006) when the fitted model is identifiable and show that, as expected, the estimation bias decreases substantially when the number of subjects increases. Furthermore, we quantify the error committed in evaluating the estimates' standard deviation in finite samples when relying on ML asymptotic theory. We also propose a concise overview of the approaches proposed in the literature, based on the tLMM fit, to identify observations that deviate from normality. Finally, in Sec. 6 we illustrate the tLMM on a case study from a sleep trial, compare its fit to that of the LMM, and identify outlying participants. The final section presents a summary and a discussion of the results.

Model formulation
The t linear mixed effects model is defined by

$$Y_i = X_i f + Z_i u_i + e_i, \qquad i = 1, \dots, n, \qquad (1)$$

where $Y_i$ is the vector of $T_i$ observations for subject $i$, $X_i$ and $Z_i$ are known $T_i \times p$ and $T_i \times q$ matrices corresponding to the vectors $f$ and $u_i$ of the fixed and random effects respectively, and $e_i$ is a vector of $T_i$ residuals (Pinheiro, Liu, and Wu 2001; Lin and Lee 2006; Lin 2008). The random effects are assumed to be independent of the residuals conditionally on the latent variable $c_i$:

$$u_i \mid c_i \sim N_q(0, G/c_i), \qquad e_i \mid c_i \sim N_{T_i}(0, \sigma^2 R_i/c_i), \qquad c_i \sim \text{Gamma}(a/2, a/2).$$

Here $G$ is the covariance matrix for the random effects and $R_i$ is the residual correlation matrix. Under the stated distributional assumptions, $Y_i$ is $t$ distributed with $a$ degrees of freedom, mean $X_i f$ and, for $a > 2$, variance-covariance matrix $\frac{a}{a-2}\,(Z_i G Z_i^T + \sigma^2 R_i)$ (Lin, Raz, and Harlow 1997; Pinheiro, Liu, and Wu 2001).
This parametrization is commonly used in the literature, and the choice is motivated by identifiability issues: relaxing the restrictions on the distribution of the latent variable $c_i$ leads to

$$u_i \mid c_i \sim N_q(0, G/c_i), \qquad e_i \mid c_i \sim N_{T_i}(0, R_i/c_i), \qquad c_i \sim \text{Gamma}(a/2, b/2),$$

with $R_i$ now a covariance matrix. This model is in fact non-identifiable unless $a$ and $b$ have a certain constant ratio (cf. Sec. 3). By taking $\sigma^2 = b/a$, one can then show that the two formulations are equivalent.
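To make the hierarchical formulation concrete, the following R sketch simulates one subject from the model; all parameter values are illustrative, and rmvnorm comes from the mvtnorm package.

```r
# Simulate one subject from the tLMM hierarchy (illustrative values).
library(mvtnorm)

a  <- 5                                   # degrees of freedom
tt <- 0:15                                # observation times (T_i = 16)
Xi <- cbind(1, tt); Zi <- cbind(1, tt)    # design matrices (p = q = 2)
f  <- c(65, 2)                            # fixed effects
G  <- matrix(c(1, 0.3, 0.3, 0.25), 2, 2)  # random-effects covariance
Ri <- 0.5^abs(outer(tt, tt, "-"))         # AR(1) residual correlation
sigma2 <- 4                               # residual variance

ci <- rgamma(1, shape = a / 2, rate = a / 2)        # latent scaling variable
ui <- drop(rmvnorm(1, sigma = G / ci))              # random effects | c_i
ei <- drop(rmvnorm(1, sigma = sigma2 * Ri / ci))    # residuals | c_i
Yi <- drop(Xi %*% f + Zi %*% ui + ei)               # observed response
```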

Model identifiability
If we introduce the notation $\{s_i,\ i = 1, \dots, q\}$ and $\{g_{ij},\ i, j = 1, \dots, q,\ i \neq j\}$ for the parameters of the variance-covariance matrix $G$, and consider here a general covariance matrix $R_i$ with parameters $\{r_{rs},\ r, s = 1, \dots, T_i\}$, the set of parameters involved in the $t$ density is

$$\theta = (f_1, \dots, f_p, s_1, \dots, s_q, g_{12}, \dots, g_{q,q-1}, r_{11}, r_{12}, \dots, r_{T_i T_i}, a, b).$$

To demonstrate identifiability we need to show that $f(Y_i; \theta_1) = f(Y_i; \theta_2)$ for all $Y_i$ implies $\theta_1 = \theta_2$, where the $t$ density depends on $\theta$ through $X_i f$, $\tilde{\Sigma}_i$ and

$$D_i^2 = (Y_i - X_i f)^T\, \tilde{\Sigma}_i^{-1}\, (Y_i - X_i f).$$

First note that if $X_i$ is of full rank, then the coefficients of the fixed effects are uniquely determined. For the remainder, we assume that both design matrices $X_i$ and $Z_i$ are of full rank and that the variance-covariance matrices $G$ and $R_i$ are positive definite. Under these assumptions, the problem reduces to identifiability of the matrix $\tilde{\Sigma}_i = \sqrt{b/a}\, \Sigma_i$. Obviously, identifiability is not met in this general formulation, for two reasons. First, $a$ and $b$ can be reparametrized as $ca$ and $cb$ for some constant $c > 0$, leading to the same matrix $\tilde{\Sigma}_i$ as the original parametrization; thus we need to fix the ratio $b/a = \sigma^2$. Second, we may rescale to $c\sigma$ and $\Sigma_i/c$ and obtain the same matrix $\tilde{\Sigma}_i$ as in the original parametrization. To overcome this identifiability issue we assume that $R_i$ is a correlation matrix, that is, $r_{ss} = 1$ for all $s = 1, \dots, T_i$. In this way the scale $\sqrt{b/a} = \sigma$ represents the residual standard deviation, constant over the dimensions of the multivariate outcome. The set of parameters thus reduces to

$$\theta = (f_1, \dots, f_p, s_1, \dots, s_q, g_{12}, \dots, g_{q,q-1}, \rho_{12}, \dots, \rho_{T_i-1, T_i}, \sigma, a),$$

where we have changed the residual covariances $r_{rs}$ into the correlations $\rho_{rs}$.
Since $\tilde{\Sigma}_i$ is a $T_i \times T_i$ symmetric matrix, it has at most $N = T_i(T_i + 1)/2$ distinct elements. Therefore identifiability requires the number of parameters in the variance-covariance structure not to exceed $N$. Even when this condition is satisfied, there may be combinations of $G$ and $R_i$ that lead to unidentifiability.
For the tLMM with random intercept and slope observed at times $t_1, \dots, t_{T_i}$, the $(r, s)$ element of the scale matrix of $Y_i$ is $\sigma^2\big(s_0^2 + (t_r + t_s)\, g\, s_0 s_1 + t_r t_s\, s_1^2 + \rho_{rs}\big)$, with $\rho_{rr} = 1$ (Eq. (9)). To study identifiability of the matrix in Eq. (9), we consider two sets of parameters $\theta_1$ and $\theta_2$, take $\sigma_2^2 = \sigma_1^2/c$ for some $c > 0$, and consider the system of equations obtained by equating the corresponding elements of the two covariance matrices. From the diagonal elements we get

$$s_{0,2}^2 = c\,(s_{0,1}^2 + 1) - 1, \qquad s_{1,2}^2 = c\, s_{1,1}^2, \qquad g_2\, s_{0,2}\, s_{1,2} = c\, g_1\, s_{0,1}\, s_{1,1}. \qquad (11)$$

Substituting these expressions in the off-diagonal elements, we get

$$(\rho_{rs})_2 = c\,\big((\rho_{rs})_1 - 1\big) + 1, \qquad r, s = 1, \dots, T_i. \qquad (12)$$

An unstructured correlation matrix is thus clearly non-identifiable in the presence of random effects, because all conditions in Eqs. (11) and (12) can be satisfied by a certain choice of $c > 0$. Even if we assume that there are no random effects ($s_{0,1} = s_{1,1} = g_1 = 0$), it is possible to define a model with an alternative variance-covariance structure that is equivalent to that of the original model, in which $R_i$ is compensated by the introduction of random effects. Indeed, if we choose $s_{0,2}^2 = c - 1$ and $s_{1,2}^2 = g_2 = 0$, we create another set of admissible parameters that, together with Eq. (12), satisfies Eq. (9). In this case the residual variance is compensated by a random intercept. Strictly speaking, this is not an identifiability problem in the sense of the definition, because if we start with the parameter set $\rho = (\rho_{1,2}, \dots, \rho_{T_i-1, T_i})$ alone, we are not allowed to increase the parameter set. However, this illustrates how the same variance-covariance structure can be obtained with very different parametrizations.
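The following R sketch illustrates this numerically for a Toeplitz correlation structure with $T_i = 3$ (the setting of Sec. 5.1); all numerical values and variable names are ours, and the scale matrix is written as $\sigma^2(Z_i G Z_i^T + R_i)$ as in Eq. (9). Two parameter sets related through Eqs. (11) and (12) with $c = 2$ yield exactly the same matrix.

```r
# Two parameter sets related through c > 0 (Eqs. 11-12) give the same
# scale matrix under a Toeplitz correlation structure with T_i = 3.
c_const <- 2
tt <- 0:2; Zi <- cbind(1, tt)             # times, random intercept + slope
s01 <- 1; s11 <- 0.5; g1 <- 0.3           # parameter set theta_1
u11 <- 0.90; u21 <- 0.82; sig2_1 <- 1

sig2_2 <- sig2_1 / c_const                # parameter set theta_2 via Eqs. (11)-(12)
s02 <- sqrt(c_const * (s01^2 + 1) - 1)
s12 <- sqrt(c_const) * s11
g2  <- c_const * g1 * s01 * s11 / (s02 * s12)
u12 <- c_const * (u11 - 1) + 1
u22 <- c_const * (u21 - 1) + 1

toep  <- function(u1, u2) matrix(c(1, u1, u2, u1, 1, u1, u2, u1, 1), 3, 3)
Gm    <- function(s0, s1, g) matrix(c(s0^2, g*s0*s1, g*s0*s1, s1^2), 2, 2)
Sigma <- function(sig2, s0, s1, g, u1, u2)
  sig2 * (Zi %*% Gm(s0, s1, g) %*% t(Zi) + toep(u1, u2))

all.equal(Sigma(sig2_1, s01, s11, g1, u11, u21),
          Sigma(sig2_2, s02, s12, g2, u12, u22))   # TRUE
```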
In the following, we explore how various assumptions on $R_i$ affect the conditions in Eq. (12), and thus the identifiability of the variance-covariance matrix. The results we obtain are for the tLMM with random intercept and slope, but the particular case of a tLMM with random intercept only can be obtained by setting $s_1 = 0$ in Eq. (9).

Toeplitz correlation matrix
A Toeplitz correlation matrix assumes a common correlation coefficient for each time lag,

$$R_i = \begin{pmatrix} 1 & \rho & \chi & \cdots \\ \rho & 1 & \rho & \cdots \\ \chi & \rho & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.$$

This implies the additional constraints on the correlations $\rho_{r,s}$ that they depend only on the lag, $\rho_{r,s} = \rho_{r+1,s+1}$. These constraints need to hold both for the parameter set $\theta_1$ and $\theta_2$, together with the conditions in Eqs. (11) and (12). The Toeplitz correlation structure reduces both the number of parameters and the number of constraints, but it does not induce identifiability. For example, in case of four repeats we would have the following conditions on the correlation coefficients $\rho$, $\chi$ and $\psi$:

$$\rho_2 = c(\rho_1 - 1) + 1, \qquad \chi_2 = c(\chi_1 - 1) + 1, \qquad \psi_2 = c(\psi_1 - 1) + 1,$$

and these expressions can clearly be satisfied for a value $c$ other than (but possibly close to) one. Note that the model remains unidentifiable with an increased number of repeats, since each additional repeat adds one parameter. Moreover, reducing the model to a random intercept only does not resolve the identifiability of the parameters $\rho$, $\chi$ and $\psi$ either.

Banded Toeplitz correlation matrix
A banded Toeplitz matrix is a Toeplitz matrix in which the diagonals outside the bandwidth are equal to zero,

$$R_i = \begin{pmatrix} 1 & \rho & \chi & 0 & \cdots & 0 \\ \rho & 1 & \rho & \chi & \cdots & 0 \\ \chi & \rho & 1 & \rho & \cdots & 0 \\ \vdots & & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \rho & 1 \end{pmatrix}.$$

This means that for time lags larger than the bandwidth the correlation is zero, and there is no correlation coefficient to be estimated. Therefore the off-diagonal elements impose additional conditions that lead to identifiability. For example, take the last element in the first row of Eq. (9) and equate the expressions for the two sets of parameters $\theta_1$ and $\theta_2$, assuming that $\rho_{1,T_i} = 0$ due to a bandwidth smaller than $T_i$. By plugging in the expressions for $s_{0,2}$, $s_{1,2}$ and $g_2$ from Eq. (11), we obtain the unique solution $c = 1$, which coincides with identifiability. Thus, the model is identifiable as long as the bandwidth is less than or equal to $T_i - 1$. The same conclusion holds in case the model includes only the random intercept.

AR(1)

In case of AR(1), the correlations are powers of the autoregressive coefficient, $\rho_{rs} = \varphi^{|r-s|}$. Substituting these powers in Eq. (12) for the two parameter sets $\theta_1$ and $\theta_2$ admits only the solution $c = 1$, so the model is identifiable with at least three observations (see the sketch of the argument below). Note that this holds true also if we replace the power by a distance measure over time, as long as the distances differ from each other. The spatial correlation structure

$$R_i = \begin{pmatrix} 1 & \varphi^{d_{12}} & \cdots \\ \varphi^{d_{12}} & 1 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$$

also makes the model identifiable with at least three observations, both in case of random intercept and slope and in case of random intercept only.
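A sketch of the AR(1) argument, using only the lag-1 and lag-2 conditions of Eq. (12), with $\varphi_j$ denoting the coefficient under $\theta_j$:

$$\varphi_2 = c(\varphi_1 - 1) + 1, \qquad \varphi_2^{\,2} = c(\varphi_1^{\,2} - 1) + 1$$

$$\Longrightarrow\; \big(c(\varphi_1 - 1) + 1\big)^2 = c(\varphi_1^{\,2} - 1) + 1 \;\Longrightarrow\; c\,(c - 1)\,(\varphi_1 - 1)^2 = 0.$$

Since $c > 0$ and $\varphi_1 \neq 1$, the only admissible solution is $c = 1$, and the two parameter sets coincide.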

AR(2)
In case of AR(2), the elements of the matrix in Eq. (16) are given by the autocorrelations

$$\rho(1) = \frac{\varphi_1}{1 - \varphi_2}, \qquad \rho(2) = \varphi_2 + \frac{\varphi_1^2}{1 - \varphi_2}, \qquad \rho(k) = \varphi_1\,\rho(k-1) + \varphi_2\,\rho(k-2), \quad k \geq 3.$$

To verify identifiability we thus consider the additional constraints (Eq. 12), which reduce to the three conditions $\rho(k)_2 = c\,(\rho(k)_1 - 1) + 1$ for the lags $k = 1, 2, 3$ (Eq. 20). From the first two equations we can express $\varphi_{1,2}$ and $\varphi_{2,2}$ in terms of $c$ and the parameters in $\theta_1$, and these two expressions plugged into the third equation in Eq. (20) lead to the only admissible solution $c = 1$, $\varphi_{1,2} = \varphi_{1,1}$, $\varphi_{2,2} = \varphi_{2,1}$. Thus already four observations are sufficient to make the tLMM with random intercept and slope and AR(2) correlation structure identifiable. Again, the same conclusion is obtained for the simpler model with random intercept only by setting $s_1 = 0$.

AR(3)

In case of AR(3), the correlations for time lags 1, 2, 3, 4 (and thus the corresponding elements of the matrix in Eq. (16)) follow from the Yule-Walker equations. To make Eq. (12) explicit in this case, we fill in these expressions for the autocorrelations corresponding to the two sets of parameters $\theta_1$ and $\theta_2$. The resulting system of equations admits only the solution of equal parameter sets and $c = 1$. Thus the tLMM with random intercept and slope and AR(3) correlation structure is identifiable with at least five observations. Similarly, the model with random intercept only is identifiable with the same number of observations. Note that increasing the order of the autoregressive structure increases the complexity of the resulting system of equations. We expect that the tLMM with AR(p) correlation structure is identifiable when the number of observations satisfies $T_i > p + 1$, but a formal proof is challenging. For AR(3) we had to resort to Mathematica (Wolfram Research, Inc. 2019) to solve the set of equations.

A direct ML estimation method
In case an identifiable model is selected, maximum likelihood can be used to estimate the parameters. The log-likelihood function, analogous to the one proposed by Lin and Lee (2006), is given by

$$\ell(\theta) = \sum_{i=1}^{n} \left[ \log\Gamma\!\left(\frac{a + T_i}{2}\right) - \log\Gamma\!\left(\frac{a}{2}\right) - \frac{T_i}{2}\log(a\pi) - \frac{1}{2}\log\lvert\tilde{\Sigma}_i\rvert - \frac{a + T_i}{2}\log\!\left(1 + \frac{D_i^2}{a}\right) \right],$$

with $D_i^2$ as in Eq. (8) and $\theta = (f_1, \dots, f_p, s_1, \dots, s_q, g_{12}, \dots, g_{q,q-1}, \rho_{12}, \dots, \rho_{T_i-1,T_i}, \sigma^2, a)$. To maximize the likelihood, we have chosen the nonlinear optimization procedure NLopt (Johnson 2014), through its R (R Core Team 2016) interface nloptr (Ypma 2014), using a non-gradient-based local search method that performs numerical optimization via quadratic approximations of the objective function and allows for bound constraints on the variance-covariance components (Powell 2008).
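As an illustration, the following R sketch implements this log-likelihood for the model used in our simulations (random intercept and slope, AR(1) residual correlation) and maximizes it with the derivative-free, bound-constrained BOBYQA algorithm available through nloptr; the parameter layout and all variable names are ours, not those of Lin and Lee (2006).

```r
# Sketch of the direct ML fit for a tLMM with random intercept and slope and
# AR(1) residual correlation. Y, X, Z, times are lists with one element per
# subject; the parameter layout (f, s0, s1, g, rho, sigma2, a) is illustrative.
library(nloptr)

neg_loglik <- function(par, Y, X, Z, times) {
  p  <- ncol(X[[1]])
  f  <- par[1:p]                                  # fixed effects
  s0 <- par[p + 1]; s1 <- par[p + 2]              # random-effect sds
  g  <- par[p + 3]                                # intercept-slope correlation
  rho <- par[p + 4]; sigma2 <- par[p + 5]         # AR(1) coef., residual variance
  a  <- par[p + 6]                                # degrees of freedom
  G  <- matrix(c(s0^2, g * s0 * s1, g * s0 * s1, s1^2), 2, 2)
  ll <- 0
  for (i in seq_along(Y)) {
    Ti <- length(Y[[i]])
    R  <- rho^abs(outer(times[[i]], times[[i]], "-"))  # AR(1) correlation
    S  <- Z[[i]] %*% G %*% t(Z[[i]]) + sigma2 * R      # scale matrix of Y_i
    r  <- Y[[i]] - X[[i]] %*% f
    D2 <- drop(crossprod(r, solve(S, r)))              # Mahalanobis distance
    ll <- ll + lgamma((a + Ti) / 2) - lgamma(a / 2) -
      (Ti / 2) * log(a * pi) -
      0.5 * as.numeric(determinant(S)$modulus) -       # log|S|
      ((a + Ti) / 2) * log1p(D2 / a)
  }
  -ll                                                  # nloptr minimizes
}

# fit <- nloptr(x0 = start, eval_f = neg_loglik,
#               lb = c(rep(-Inf, 4), 0, 0, -1, -1, 0, 0.1),      # keep the
#               ub = c(rep( Inf, 4), Inf, Inf, 1, 1, Inf, Inf),  # covariance part admissible
#               opts = list(algorithm = "NLOPT_LN_BOBYQA",
#                           xtol_rel = 1e-8, maxeval = 1e5),
#               Y = Y, X = X, Z = Z, times = times)
```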
To investigate the accuracy of this estimation procedure, we select an identifiable tLMM with a parametrization in line with the literature, namely with $G$ an unstructured variance-covariance matrix and $R_{i0}$ an AR(1) correlation matrix. Note that this formulation is equivalent to Eq. (1) with $R_i = \sigma^2 R_{i0}$, where $\sigma^2$ is a positive parameter now multiplying only $R_{i0}$; this way, it has the direct interpretation of the residual variance. The response variable conditionally on the latent variable $c_i$ thus has variance $(Z_i G Z_i^T + \sigma^2 R_{i0})/c_i$.

Estimation of the standard deviation of the estimates
Under certain regularity conditions, the standard deviation of the estimates can be derived from the observed Fisher information matrix $I_{\hat\theta}$ (e.g. Section 5.5 of Van der Vaart 1998) as the square root of the diagonal of $I_{\hat\theta}^{-1}$, where $\hat\theta$ denotes the ML estimator of $\theta$. The Fisher information matrix $I_{\hat\theta}$ is obtained by substituting the ML estimates $\hat\theta$ into the second derivatives of the log-likelihood function with respect to the parameters. This method has been used by Lin and Lee (2006) and Lin (2008), and with the expected information matrix by Pinheiro, Liu, and Wu (2001).
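A minimal sketch of this computation, reusing the neg_loglik function from the previous snippet (since it returns the negative log-likelihood, its Hessian at the optimum is the observed information); theta_hat is an illustrative name for the ML estimates.

```r
# Asymptotic standard errors from the observed information matrix.
library(numDeriv)

I_hat   <- hessian(neg_loglik, theta_hat, Y = Y, X = X, Z = Z, times = times)
se_asym <- sqrt(diag(solve(I_hat)))   # square roots of the diagonal of I^-1
```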
In the simulation study of an identifiable case, we check the accuracy of this approach via a bootstrap analysis (Efron and Tibshirani 1994). We compare the asymptotic standard errors of the estimates with the ones obtained with the bootstrap approach, based on 350 bootstrap samples, on the datasets generated in the first setting (Table 1), and we report the coverage probabilities of these parametric confidence intervals.
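A sketch of the subject-level bootstrap under the same assumptions (theta_hat and the data lists are illustrative names; restarting from the full-data estimates is our choice):

```r
# Nonparametric bootstrap over subjects (B = 350 as in the study).
B <- 350
boot_est <- replicate(B, {
  idx <- sample(seq_along(Y), replace = TRUE)   # resample whole subjects
  nloptr(x0 = theta_hat, eval_f = neg_loglik,
         opts = list(algorithm = "NLOPT_LN_BOBYQA",
                     xtol_rel = 1e-6, maxeval = 1e5),
         Y = Y[idx], X = X[idx], Z = Z[idx], times = times[idx])$solution
})
se_boot <- apply(boot_est, 1, sd)               # bootstrap standard errors
```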

Outlier identification
"An outlier is an observation that is very different to other observations in a set of data" (Upton and Cook 2014). Since traditional LMMs are based on the assumption of normality of the data, they are sensitive to outliers, and maximum likelihood estimates get significantly biased in presence of extreme observations (Pinheiro, Liu, and Wu 2001). Therefore, outliers can directly affect statistical inference when modeling relies on normality assumptions. However, excluding extreme observations prior to the analysis is not a solution, as it may delete important information. The (multivariate) tLMM has been shown to be an appropriate tool to perform robust analyses. In fact, the heavier tails of the t distribution can accommodate the effect of outliers. Pinheiro, Liu, and Wu (2001) show that the effect of the outliers on maximum likelihood estimates is bounded in the tLMM, also for larger deviations of the observations, while it is unbounded in the classical LMM.
tLMMs not only allow proper inference by downweighting the effect of outliers, but also enable outlier detection. Outlier identification based on the fit of the tLMM is widely discussed in the literature, where outliers are always intended with respect to the Gaussian distribution.
The proposed methods use the Mahalanobis distance and the fact that the posterior distribution of $c_i$ is

$$c_i \mid Y_i \sim \text{Gamma}\!\left(\frac{a + T_i}{2}, \frac{a + D_i^2}{2}\right), \qquad (23)$$

with posterior mean

$$\hat{c}_i = \frac{\hat{a} + T_i}{\hat{a} + \hat{D}_i^2},$$

which under normality is equal to one. Here $T_i$ denotes the number of follow-up times for subject $i$, and $\hat{D}_i^2$ is an estimate of the expression in Eq. (8), obtained by replacing the parameter values with their estimates. Thus a subject is detected as outlying (with respect to normality) if the values sampled from the posterior distribution are significantly smaller than 1 (Lin and Lee 2007), or if the 95% prediction interval for $c_i$ does not include 1 (Ho and Lin 2010; Wang and Fan 2012). Lin and Lee (2007) and Wang and Fan (2011, 2012) propose the use of boxplots of posterior samples of $c_i$, drawn from Eq. (23), to find outlying subjects. Wang and Lin (2014) and Wang, Lin, and Lachos (2018) complement the approaches above with a quantitative threshold-based method to distinguish outliers, with threshold $\tilde{c}$ given in Eq. (25), where $B_\alpha$ is the $\alpha$ quantile of the Beta distribution with parameters $a/2$ and $T_i/2$ (Wang and Lin 2014). If the posterior median of the sampled values of $c_i$ is less than $\tilde{c}$, the subject is an outlier. If the upper bound of the $100(1 - \alpha)\%$ posterior interval is smaller than $\tilde{c}$, then the subject is an extreme outlier in the population (Wang, Lin, and Lachos 2018). The same approach has been proposed to detect outliers in a longitudinal data clustering study (Wang 2019), applying formula (25) with cluster-specific degrees of freedom $a_s$. Pinheiro, Liu, and Wu (2001) also introduce a decomposition of the Mahalanobis distance,

$$\hat{D}_i^2 = \hat{D}_{u_i}^2 + \hat{D}_{e_i}^2,$$

into a random-effects and a residual component. This provides useful diagnostic statistics for understanding how the estimates $\hat{u}_i$ and $\hat{e}_i$ affect the individual weights $\hat{c}_i$, and for identifying subjects with outlying observations (Matos et al. 2013). In fact, under the Gaussian model these quantities, scaled by their expected values, should be close to one; when (one of) them is larger than 1, subject $i$ should be treated as a potential outlier at the corresponding level (random effects or residuals).
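A sketch of this diagnostic in R, assuming the estimated degrees of freedom a_hat and the per-subject Mahalanobis distances D2_hat are available from a fit (illustrative names):

```r
# Posterior draws of the latent c_i (Eq. 23), used to flag subjects that
# deviate from normality.
post_ci <- function(a_hat, Ti, D2_i, nsamp = 1000)
  rgamma(nsamp, shape = (a_hat + Ti) / 2, rate = (a_hat + D2_i) / 2)

ci <- post_ci(a_hat, Ti = 16, D2_i = D2_hat[1])
quantile(ci, c(0.025, 0.975))   # flag subject 1 if the interval excludes 1
```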

Simulation settings
In this section we introduce the settings of our simulation study to validate the ML estimation method in an identifiable case, and the settings of a smaller simulation study to show the consequences of fitting a model that is not identifiable. Motivated by the case study (Sec. 6.3), we consider a tLMM including three fixed effects (gender, age and time) and two random effects (intercept and slope),

$$Y_{ij} = f_0 + f_g\,\text{gender}_i + f_a\,\text{age}_i + f_t\, t_{ij} + u_{0i} + u_{1i}\, t_{ij} + e_{ij}.$$

The coefficients of the fixed effects are set to the values $f_0 = 65$, $f_g = 2.5$, $f_a = 0.5$, $f_t = 2$ and the variance of the random intercept to $s_0^2 = 1$ for all simulations. Furthermore, we choose a set of values for each variance-covariance parameter, and by combining their values we define the simulation settings, identified by the labels in Table 1. With the notation introduced in Sec. 2, $n$ is the number of subjects, $s_1$ is the standard deviation of the random slope, and $g$ is the correlation coefficient between random intercept and slope. The remaining parameters are chosen differently for the identifiable and unidentifiable cases and are specified in the following. Each setting is simulated $N_{sim} = 1000$ times to ensure stability.

[Table 1. The first four columns uniquely define a label for each of the simulation settings, which will be used in the following. The number of subjects is set to $n = 50$ in all 64 settings, while the settings marked with a star are also simulated with $n = 100$ subjects in the identifiable case. The parameter $\rho$ is between brackets because it applies only to the identifiable case with AR(1) correlation structure.]

Unidentifiable case - RIS with Toeplitz correlation structure
To show the risks connected to the non-identifiable models described in Sec. 3, we run a small simulation study with a reduced number of repeats $T_i = T = 3$, $n = 50$ subjects and a Toeplitz correlation structure with parameters $u_1 = 0.90$ and $u_2 = 0.82$ (ensuring the correlation matrix is positive definite). We consider the settings with $a = 1.5$, $a = 5$ and $a = 15$ in Table 1, ignoring the last column, since $\rho$ is here replaced by the fixed values of $u_1$ and $u_2$. This reduces to 24 possible settings. The starting points for the optimization procedure are the linear transformations of the true parameters used in the simulation, as given in Sec. 3.1, with $c = 2$.

Identifiable case - RIS with AR(1) correlation structure
For the identifiable case, we simulate $T_i = T = 16$ repeats and choose an AR(1) correlation structure with autoregressive coefficient $\rho$ given in Table 1. The number of subjects is set to $n = 50$ in all 64 cases, while the settings marked with a star are also simulated with $n = 100$ subjects. As starting values we provide the true values used to simulate the data, and their linear transformations in a few cases, for comparison with the unidentifiable case.

Unidentifiable case - RIS with Toeplitz correlation structure
In this section we report the results of the simulation study concerning the unidentifiable case. As visible in Table 2, the ML estimates of the fixed effects are reliable. On the other hand, the estimates of the variance-covariance parameters converge to the starting values (Table 3), so the bias with respect to the true underlying parameters is very large. The effect of non-identifiability is thus limited to the variance-covariance matrix, while the fixed parameters can be properly estimated. However, given that this model is often used to compare the inter- and intra-individual variability between groups (Chinchilli, Esinhart, and Miller 1995; Lin, Raz, and Harlow 1997; Welsh and Richardson 1997), estimating the variance components accurately is of crucial importance.

[Table 2 note. $\hat\theta_{ij}$ denotes the ML estimate of a fixed-effect coefficient in $\theta$ in the $j$th simulated dataset under setting $i$ ($j = 1, \dots, N_{sim}$ and $i = 1, \dots, N_{settings}$, with $N_{settings} = 24$ for $n = 50$). Under the column Means we report $\text{mean}_{ij}(\hat\theta_{ij})$ and between brackets $\text{stdev}_i(\text{mean}_j(\hat\theta_{ij}))$ for the coefficients of the fixed effects. For the same coefficients, we show $\text{mean}_i(\text{stdev}_j(\hat\theta_{ij}))$ and between brackets $\text{stdev}_i(\text{stdev}_j(\hat\theta_{ij}))$ in the column Standard deviations. Here we can consider the absolute bias, as the fixed effects do not vary across settings.]

Identifiable case - RIS with AR(1) correlation structure
In this section we report the results of the simulation study on the identifiable case with autoregressive correlation structure. From the results we can conclude that overall the estimation procedure works well, and it always reaches convergence in a reasonably short amount of time (on average 6 min per simulation). Furthermore, convergence and accuracy are not affected by imposing linear combinations of the true parameters as starting values for the estimation procedure. The fixed effects can be estimated with high accuracy in all the simulated settings, even with a relatively small number of subjects compared with the large number of repeats. In Table 4 we summarize the results with various statistics. We can conclude that the estimates of the fixed effects are on average equal to the true values, with very little variation across the settings. Also the variance-covariance parameters can be estimated with low bias in most of the settings, although some exceptions exist for the parameters $s_0$ and especially $g$ (see Table 5).
Clearly, the increased number of subjects has a positive effect, reducing relative bias and standard deviation, as we may expect from the asymptotic behavior of ML estimates. To investigate in more detail the bias of the parameters $s_0$ and $g$ for $n = 50$, we visualize the relative bias for the individual settings in Figures 1 and 2. The two line types (continuous and dotted) connect the settings with $\rho = 0.25$ and $\rho = 0.75$ respectively, the two symbols distinguish the different values of $g$, and the labels on the horizontal axis correspond to the simulation settings defined in Table 1.
A strong dependence on the autoregressive coefficient $\rho$ is apparent, while $\rho$ itself is always estimated with good accuracy. In particular, higher values of $\rho$ make the estimation of $s_0$ and $g$ more difficult. Moreover, when $\rho = 0.75$ we can observe a difference in performance attributable to $g$: a small $g$ combined with a large $\rho$ causes the most biased estimates of the variance of the random intercept and of its correlation with the random slope. When $\rho = 0.25$, the relative bias is not large.
Furthermore, we performed a bootstrap analysis on the first simulation setting. We chose a bootstrap sample size of B = 350, as this is shown to provide stability of the estimates on the case study (see Supplemental material). In Table 6 we report the average estimate, the standard error as obtained analytically by exploiting the asymptotic properties of ML estimates, the standard error as obtained from the bootstrap analysis (averaging the bootstrap standard deviation over the simulated datasets) and the coverage probabilities of the bootstrap confidence intervals. We observe that the direct derivation from the Fisher information matrix underestimates the standard error of the estimates, since the corresponding values from the bootstrap analysis are overall larger and lead to coverage probabilities close to the nominal ones. Increasing the number of bootstrap samples can probably still improve the coverage for some of these parameters.

[Table 3 note. $\hat\theta_{ij}$ denotes the ML estimate of a variance-covariance component in $\theta$ in the $j$th simulated dataset, $j = 1, \dots, N_{sim}$. The true value $\theta_i^*$ can vary across settings $i$ ($i = 1, \dots, N_{settings}$, with $N_{settings} = 24$), and thus we report the statistics of the relative bias: $\text{mean}_i\big((\text{mean}_j(\hat\theta_{ij}) - \theta_i^*)/\theta_i^*\big) \cdot 100$, and between brackets $\text{stdev}_i\big((\text{mean}_j(\hat\theta_{ij}) - \theta_i^*)/\theta_i^*\big)$.]

[Table 4 note. $\hat\theta_{ij}$ denotes the ML estimate of a fixed-effect coefficient in $\theta$ in the $j$th simulated dataset under setting $i$ ($j = 1, \dots, N_{sim}$ and $i = 1, \dots, N_{settings}$, with $N_{settings} = \{64, 16\}$ for $n = \{50, 100\}$ respectively). Under the column Means we report $\text{mean}_{ij}(\hat\theta_{ij})$ and between brackets $\text{stdev}_i(\text{mean}_j(\hat\theta_{ij}))$; for the same coefficients, we show $\text{mean}_i(\text{stdev}_j(\hat\theta_{ij}))$ and between brackets $\text{stdev}_i(\text{stdev}_j(\hat\theta_{ij}))$ in the column Standard deviations. Here we can consider the absolute values, as the fixed effects do not vary across settings.]

[Table 5 note. $\hat\theta_{ij}$ denotes the ML estimate of a variance-covariance component in $\theta$ in the $j$th simulated dataset, $j = 1, \dots, N_{sim}$. The true value $\theta_i^*$ can vary across settings $i$ ($i = 1, \dots, N_{settings}$, with $N_{settings} = 16$ for both $n = \{50, 100\}$, corresponding to the starred settings in Table 1), and thus we report the statistics of the relative bias: $\text{mean}_i\big((\text{mean}_j(\hat\theta_{ij}) - \theta_i^*)/\theta_i^*\big) \cdot 100$ and between brackets $\text{stdev}_i\big((\text{mean}_j(\hat\theta_{ij}) - \theta_i^*)/\theta_i^*\big)$.]

[Table 6. Results of the bootstrap analysis with 350 samples from each of the 1000 simulated datasets from setting 1. The asymptotic error is the one obtained via the Fisher information matrix and is compared to the one obtained via the bootstrap method. The last column shows the coverage probability as obtained with the bootstrap standard deviation.]

[Figures 1 and 2. Relative bias of $s_0$ and $g$ per setting (labels as in Table 1); the line types distinguish the values of $\rho$ and the symbols denote the value of $g$. Clear dependence on $\rho$, and on $g$ in the cases with $\rho = 0.75$.]

Case study
Understanding self-reported sleep quality and how it links with measurable data is currently of much interest in sleep medicine, since it has been shown to be connected with mental and physical health problems (Åkerstedt et al. 1994; Augner 2011; Rahe et al. 2015). In particular, many references investigated the relationship between heart rate variability and sleep quality, possibly together with other factors such as working conditions and well-being. Although researchers do not agree on the direction of the influence, there is wide agreement on the existence of a strong link between sleep and both autonomic and cardiovascular response (Kageyama et al. 1998; Burton et al. 2010; Michels et al. 2013). Our goal is to assess the influence of physiological values, and in particular of derived heart rate measures, on perceived sleep quality using the data from a study on sleep quality. Most studies apply traditional linear mixed models, but the observed values are often highly variable both within and between subjects, with outlying observations and outlying participants. We propose an alternative approach based on the tLMM to take into account the heterogeneity observed in the data. We estimate the influence of night-averaged heart rate (HR) quantities on perceived sleep quality (PSQ) by fitting the same model we used in the simulations.

The data
The study design involved 50 healthy volunteers, of whom 54% were female, aged from 40 to 65 years, as this was considered the age category with the most stable routine and sleep rhythm. All participants were selected according to well-defined exclusion criteria (diagnosed neurological, cardiovascular, psychiatric, pulmonary or endocrinological disorder; sleep disorder; use of sleep, antidepressant or cardiovascular medication; drug or excessive alcohol consumption; pregnancy; shift work; crossing more than two time zones in the last 2 months). The protocol involved two weeks of home monitoring plus two days in a sleep laboratory. During the first phase, participants were asked to wear an unobtrusive device and to fill in morning and evening questionnaires, in which information on the preceding night and day respectively was collected. In the sleep laboratory, besides the collection of this information, other physiological measurements were taken. For more details on the study setup, refer to Goelema et al. (2017). Due to unreliable physiological measurements for 8 participants, and the missingness of all baseline variables for one participant, we were left with a dataset consisting of 41 participants. The PSQ, chosen as response variable, was assessed through the morning questionnaire via a visual analog scale ranging from 0 to 100. Being a subjective value, PSQ is highly variable within and between participants, with no clear relationship between the level and the variance (see Figure 3). Heterogeneity in this variable was checked via the Levene test (F = 3.05 with df = 40 and p = 6.13e-09).
The heart beats were extracted from the signal measured by the unobtrusive device, and the average heart rate over 30-second windows was computed. For the aim of this study, we considered only the values aggregated over nights, namely the night average (HRmean) and the night standard deviation (HRsd) of the 30-second window averages.

Preprocessing and data imputation
The data collected during the two nights in laboratory settings were excluded, as the change in sleeping conditions might have affected sleep quality, resulting in 14 time points per participant. Furthermore, the rate of missingness was 11.13% in the response variable and 11.74% in the HR mean and standard deviation. Since the proposed estimation method requires complete datasets, we performed multiple imputation (R package mice, Zhang 2016), generating 20 complete datasets using some of the baseline and longitudinal variables.
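A minimal sketch of this step, assuming the longitudinal records sit in a data frame sleep_dat (an illustrative name):

```r
# Multiple imputation of the missing PSQ and HR values (20 completed datasets).
library(mice)

imp <- mice(sleep_dat, m = 20, seed = 2019)
completed <- lapply(1:20, function(k) complete(imp, k))
```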

The fitted model
We fitted a tLMM with random intercept and slope for time as a continuous variable, and included age, sex, mean HR and standard deviation of HR as covariates for the fixed effects, analogous to the model described in Sec. 4.1. We also fitted a linear mixed model with the same fixed and random effects (MIXED procedure of SAS with maximum likelihood estimation, Littell et al. 2007). Note that we used the LMM estimates as starting values for the parameters in the tLMM, together with various initial values for $a$. Both models were fitted on each imputed dataset separately, and the standard error of the estimates was computed via the bootstrap approach (see Sec. 4.2). The final estimates were then derived as the mean of the estimates from the analyses of the imputed datasets, and the final standard error of the estimates (combining within- and between-imputation variability) was derived via Rubin's rule (Rubin 2004). For the LMM we used the MIANALYZE procedure of SAS, and for the tLMM we implemented Rubin's rule in R. After fitting the model, we sampled 1000 values from the posterior distribution of $c_i$ (Eq. 23) and showed their values per subject by means of a boxplot. Analogously to Lin and Lee (2007), we define as outlying the subjects that show extremely small sampled values $\hat{c}_i$. In fact, a small value of the latent variable indicates that the influence of the corresponding subject (an outlier with respect to normality) is downweighted in the estimation of the model parameters. To better understand at which level these subjects are outlying (namely in the $u_i$ or in the $e_i$ term), we decomposed the Mahalanobis distance following Eq. (26), and checked the value of each component divided by its own expected value under the Gaussian model (Pinheiro, Liu, and Wu 2001).
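A minimal sketch of the pooling step we implemented, where est and se are matrices with one row per imputed dataset holding the per-imputation estimates and standard errors (illustrative names):

```r
# Rubin's rule for pooling estimates across m imputed datasets.
pool_rubin <- function(est, se) {
  m    <- nrow(est)
  qbar <- colMeans(est)            # pooled estimates
  W    <- colMeans(se^2)           # within-imputation variance
  B    <- apply(est, 2, var)       # between-imputation variance
  list(estimate = qbar, se = sqrt(W + (1 + 1/m) * B))
}
```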

Results
After a first analysis with the tLMM, the initially included random slope was left out based on the information criteria (average $AIC_{RIS} = 5142.98$ vs. average $AIC_{RI} = 5138.83$), suggesting that the heterogeneity in the slope was not sufficiently large; this was confirmed by the estimate of the variance of the random slope, $s_1 = 0.000198$. We thus reduced the model to a random intercept only. The estimates for the linear mixed model are shown in Table 7, with the corresponding standard errors between brackets.
As for the LMM, also for the tLMM the random slope did not improve the fit, and thus in Table 8 we report the estimates of the model with random intercept only, with their standard deviations obtained via the bootstrap approach. The results shown in the table correspond to the starting value $a_0 = 1.5$. The average perceived sleep quality corrected for the mentioned factors is 84.84 ± 6.63; it is negatively affected by an increased mean heart rate over the night ($f_{HRm} = -0.18 \pm 0.16$), while it is on average higher for females ($f_g = 6.93 \pm 4.41$). At the same time, differences in age seem to play no role in the perception of a good night's sleep, nor does the fixed effect of the night standard deviation of the heart rate. There is a positive effect of time and a significant difference between participants in overall mean. The residuals have a variance of $\sigma^2 = 440.55 \pm 13.28$, with no significant autoregressive structure. Note that both parameters of the gamma distribution are equal to $a = b = 1.51 \pm 0.85$, which justifies the choice of the tLMM over the traditional LMM; this is confirmed by the information criteria (average $AIC_{tLMM} = 5138.83$ against average $AIC_{LMM} = 5210.18$). Furthermore, while the estimates do not show extreme differences, the tLMM substantially reduces the standard deviations compared to the LMM. This might be explained by the fact that the heterogeneity affects the confidence with which we can estimate the parameters more than the estimates themselves, and is another observation in favor of the t model over the traditional one.
Finally, we used the tLMM fit to identify participants that are outlying with respect to normality (as discussed in Sec. 4.3), applying the rule suggested by Lin and Lee (2007): values $\hat{c}_i$ substantially lower than 1 indicate a possibly outlying subject. In Figure 4 we show the values sampled from the posterior distribution for each participant via a boxplot. In total, each boxplot refers to 1000 values sampled from each of the 20 posterior distributions (20 imputed datasets and the corresponding estimated parameters). We chose to pool the values across imputations because there was no large deviation across imputations. Looking at the plot, we conclude that participants 2, 3, 4, 11, 15, 16, 25, 29, 33, 34, 37 and 38 should be regarded as potential outliers for the LMM fit. It is also interesting to understand at which level the participants are outlying, because even in case their overall behavior is in line with the sample, they may still have outlying behavior at the level of the random effects or of the residuals separately. Following the method of Pinheiro, Liu, and Wu (2001), we plotted the values of the two components of the Mahalanobis distance, and the distance itself, divided by their own expected value under normality. Cases in which these values are larger than one can be regarded as outliers.

[Table 7. Estimates of the coefficients of the LMM and corresponding standard errors between brackets. Fitted with the MIXED procedure of SAS, choosing the maximum likelihood estimation method. The estimates and standard errors are pooled with the procedure MIANALYZE. Some parameters carry a tilde in the notation, as they cannot be directly compared with the corresponding parameters in the tLMM (Pinheiro, Liu, and Wu 2001).]
In Figure 5 we show these values for the 20 imputations in our case study: each boxplot collects the values of one of these quantities across the 20 imputed datasets for one single participant. The three plots are for $D^2/T_i$, $D^2_{e_i}/T_i$ and $D^2_{u_i}$ respectively. We can observe that participants 2, 4, 8, 14, 15, 22, 37 and 38 are outlying at the $u_i$ level, while participants 2, 3, 4, 11, 15, 16, 25, 29, 33, 34, 37 and 38 are outlying at the $e_i$ level. It is worth noticing that participants 8, 14 and 22 are flagged as potential outliers when looking at the components of the Mahalanobis distance, while they would not be detected when looking at the total residuals or at the posterior conditional distribution of the $c_i$'s. The analysis of outlying observations clearly demonstrates the potential of the tLMM over the more common LMM approach.

Discussion
In this paper, we presented a thorough analysis of the tLMM, with particular focus on model definition and parameter estimation. For the first part, we provided the mathematical formulation of non-identifiability for tLMMs and gave an explanation for the common choice of equal parameters of the gamma distribution, which is normally not justified. It is worth noticing how different gamma distributions can end up in the same marginal distribution of the response variable $Y_i$; thus, the same $t$ distribution can have different interpretations. Identifiability issues that arise in the variance-covariance structure are shown theoretically by comparing the covariance matrices corresponding to two different sets of parameters, illustrating the procedure under some popular assumptions on the variance-covariance structure of random effects and residuals. The effect of unidentifiability on estimation is illustrated via a small simulation study. As expected, the estimates of the fixed effects are still reliable; their standard deviation is slightly higher, but this might be due to the lower number of settings compared with the identifiable case. The main effect of unidentifiability is reflected in the estimation of the variance-covariance parameters, which is crucial when using the t mixed model for the analysis of variances (Chinchilli, Esinhart, and Miller 1995; Lin, Raz, and Harlow 1997; Welsh and Richardson 1997). Since the method converges also in this case, identifiability issues are difficult to spot a posteriori and should be checked in the phase of model definition and formulation.
Then we investigated the reliability of an existing estimation method in case the model is identifiable. We extended the direct maximum likelihood approach of Lin and Lee (2006) to allow for two correlated random effects and a first-order autoregressive correlation structure. The method is direct, easy to understand, reasonably fast and fully reproducible.
We assessed its performance via an intensive simulation study under the traditional choice of $a = b$, and showed that the fixed effects can be accurately estimated in all the considered settings, also with a relatively small number of subjects and a large number of repeats. This analysis is quite unique, as previous work on the robustness of estimation methods is limited, and even where it exists, it simulates a small number of observations (Pinheiro, Liu, and Wu 2001) or a relatively simple correlation structure (Lin, Raz, and Harlow 1997). The variance-covariance structure is the component that is most difficult to estimate. In fact, we concluded that for some sets of true values the estimation of the variance-covariance parameters is less stable, in particular for $g$ and $s_0$ when the coefficient of autocorrelation $\rho$ is large. This problem is normally circumvented in the literature by employing Bayesian approaches (Lin and Lee 2007) or expectation-maximization algorithms (Lin 2008), or by avoiding the coupling of both random intercept and slope with autoregressive correlation structures. For example, Lin and Lee (2006) explore a random intercept with autoregressive correlation, but simplify the correlation structure when including also a random slope. This choice might be due to their case study, concerning 17 observations on a maximum of 18 participants, which might be very critical for the estimation accuracy. In fact, we showed that, thanks to the asymptotic properties of ML estimation, the accuracy can be improved by increasing the number of subjects in the study, which in panel studies is commonly even larger than in our simulations. Furthermore, our study demonstrates that relying on the asymptotic properties of the ML estimates can in this case lead to underestimation of the actual variability of the estimates. Therefore we proposed a bootstrap approach to estimate the standard error of the estimates.
The analysis of the sleep data led to conclusions that differ slightly from the literature (Kageyama et al. 1998; Michels et al. 2013). This might be due to the HR variables that were taken into account, aggregated at the night level, or to the novel approach accounting for additional heterogeneity. In fact, even choosing higher starting values for the parameters of the gamma distribution (e.g. $a_0 = 10$) led to the same estimate of the hyperparameter, supporting the choice of a model with an underlying heavy-tailed distribution. The choice between these assumptions will require further investigation.
Note that the speed and rate of convergence of the estimation method could be increased by choosing an alternative optimization procedure and by providing the gradient of the likelihood function. Furthermore, more research is needed to overcome the limitation of the current method in handling missing values. The model could also be extended by allowing different heterogeneity factors for the random effects and for the residuals, as already highlighted by Pinheiro, Liu, and Wu (2001). Another extension would involve replacing the gamma distribution with more general exponential probability distributions, and relaxing the hypothesis of conditional independence between random effects and residuals. Finally, future research should focus on deriving specific measures to assess the goodness of fit of the t distribution.

Data availability statement
The data that support the findings of this study are available from Philips Research. Restrictions apply to the availability of these data, which were used under license for this study. The codes are available from the corresponding author on request.

Disclosure statement
No potential conflict of interest was reported by the authors.