Mixture of Regressions with Multivariate Responses for Discovering Subtypes in Alzheimer’s Biomarkers with Detection Limits

There is no gold standard for the diagnosis of Alzheimer’s disease (AD), except for autopsies, which motivates the use of unsupervised learning. A mixture of regressions is an unsupervised method that can simultaneously identify clusters from multiple biomarkers while learning within-cluster demographic effects. Cerebrospinal fluid (CSF) biomarkers for AD have detection limits, which create additional challenges. We apply a mixture of regressions with a multivariate truncated Gaussian distribution (also called a censored multivariate Gaussian mixture of regressions or a mixture of multivariate Tobit regressions) to over 3000 participants from the Emory Goizueta Alzheimer’s Disease Research Center and Emory Healthy Brain Study to examine amyloid-beta peptide 1–42 (Abeta42), total tau protein and phosphorylated tau protein in CSF with known detection limits. We address three gaps in the literature on the mixture of regressions with a truncated multivariate Gaussian distribution: software availability; inference; and clustering accuracy. We discovered three clusters that tend to align with an AD group, a normal control profile, and non-AD pathology. The CSF profiles differed by race, gender, and the genetic marker ApoE4, highlighting the importance of considering demographic factors in unsupervised learning with detection limits. Notably, African American participants in the AD-like group had significantly lower tau burden.


Introduction
A definitive diagnosis of Alzheimer's disease (AD) is only possible from an examination of brain tissue in an autopsy (Dubois et al., 2007).The problem is made worse by the fact that clinical diagnosis using biomarkers have historically been based on studies dominated by people of European ancestry (Blennow et al., 2015).African American individuals are greatly underrepresented in AD biomarker studies and clinical trials (Shin and Doraiswamy, 2016), and CSF biomarker levels differ by race (Garrett et al., 2019).Unsupervised learning was applied to CSF biomarkers to reveal CONTACT B. B. Risk.Email: brisk@emory.eduarXiv:2303.00715v1[stat.AP] 1 Mar 2023 insights into AD, but race and other demographic factors were not considered (Meyer et al., 2010).
There are at least three challenges to analyzing CSF AD biomarker data: 1) multivariate biomarkers have detection limits, resulting in censoring; 2) the disease status is unknown since there is no gold standard; and 3) demographic effects may depend on unknown subtypes.Current statistical software do not simultaneously address these problems (Table 1).Our goals are twofold: 1) cluster participants into groups using an unsupervised multivariate method, since no gold standard is available and current criteria may be limited by factors such as European ancestry, and 2) gain insights into pathophysiology by estimating within-cluster effects of demographic variables (race, gender, the genetic marker ApoE4, age and education).
Our study is motivated by the Emory Goizueta Alzheimer's Disease Research Center and the Emory Healthy Brain Study (hereafter, Emory ADRC/HBS Dataset), which contains three CSF biomarkers (amyloid-beta peptide 1-42 [Abeta42], total tau protein [tTau] and phosphorylated tau protein [pTau]) from lumbar punctures of over 3,000 individuals (Goetz et al., 2019).The dataset contains 16.5% (495) African American participants, which is substantially higher than the Alzheimer's Disease Neuroimaging Initiative (< 5%).An important limitation of the assay is that approximately 15% of the participants in the Emory ADRC/HBS dataset have one of the three biomarker levels defined by the detection limits of the assay.In this paper, we define a censored response variable in the same way as other mixture modeling papers (Jedidi et al., 1993;Lee and Scott, 2012): censoring occurs if the value of the response variable is set equal to the detection limit when the true value is more extreme than the threshold, while the predictors are available for all observations (e.g., participants with Abeta42 over 1,700 have their Abeta42 values set equal to 1,700).This differs from truncation, which typically refers to a restricted sampling of the distribution of the population (e.g., if patients with Abeta42 over 1,700 were not recorded, then the data would be truncated).
Unsupervised learning, such as Gaussian mixture models (GMMs), are popular tools for defining disease subtypes when a gold standard is not available (Collins and Huynh, 2014).Model-based clustering approaches derived from GMMs have advantages over distance-based clustering algorithms such as K-means.GMMs estimate posterior probabilities of group membership for each data point rather than hard clustering.GMMs utilize a statistical model that can account for cor-relations.From a probabilistic perspective, K-means assumes spherically shaped clusters, which can lead to poor results when features are correlated (Coates and Ng, 2012).In our application, CSF biomarkers of AD are highly correlated.A Gaussian mixture of regressions (GMR) model, also called "switching regressions" in econometrics, extends GMMs to datasets with predictors by modeling the mean structure of each group using regression (Goldfeld and Quandt, 1973;Quandt and Ramsey, 1978).These models allow for the effect of predictors to be modified by the latent groups.These models have also been extensively discussed in the machine learning literature, where they are called "mixture of experts" models (Yuksel et al., 2012).
Censored multivariate Gaussian mixtures of regressions (censored GMRs), also known as a mixture of regressions with a truncated multivariate Gaussian distribution or a multivariate mixture of tobit regressions, have been previously considered in the literature.Lee and Scott (2012) derived EM algorithms for fitting multivariate GMMs to censored data.To model predictors, Jedidi et al. (1993) derived an EM algorithm for a mixture of tobit regressions with a univariate censored response, and more recent extensions of tobit regression for a univariate response model errors using a finite mixture of Gaussian and/or non-Gaussian distributions (Hanson and Johnson, 2002;Caudill, 2012;Karlsson and Laitila, 2014;Garay et al., 2017;Zeller et al., 2019).Wang et al. (2019) proposed a mixture of factor analyzers for multivariate data that simultaneously performs clustering and dimension reduction, and Wang et al. (2021) extended it to predictors and censoring.
Our primary contribution is an analysis of the Emory ADRC/HBS dataset using a censored multivariate Gaussian mixture of regressions.We implement an EM algorithm to address the important gap that software does not exist for the censored multivariate Gaussian mixture of regressions.We also address gaps in the current literature regarding the use of Wald tests of significance of the predictors in this context, wherein we approximate the information matrix using the empirical complete data score function.We also conduct a simulation study to address a gap regarding the impact of predictors and censoring on the accuracy of clustering.
The remainder of this paper is organized as follows.In Section 2, we review the multivariate tobit model and describe the extension to censored Gaussian mixtures.We then describe an EM algorithm and method for inference.In Section 3, we conduct simulations to illustrate the advantages of the censored multivariate GMR over methods ignoring or deleting the censored observations, and we also examine the selection of the number of clusters.In Section 4, we conduct an analysis of the Emory ADRC/HBS Dataset.Finally, we discuss findings and future research in Section 5.

Modeling approach and estimation
In this section, we first review the multivariate tobit model and its estimation using an expectationmaximization (EM) algorithm.We then build upon this framework to derive an EM algorithm for the censored multivariate GMR.

Multivariate censored regression (tobit model)
Let y i be a p-dimensional random vector for the ith subject, i = 1, . . ., N , which can be partitioned into two parts, where y ioi and y ici denote the uncensored and censored dimensions of y i .We use a vector of censoring indicators c i to represent whether or not one particular dimension of y i is censored, and its censoring directions are observed through c i = (c i1 , ..., c ij , ..., c ip ) , where denotes the transpose, such that: Though the true values are not observed before they are censored, we can nevertheless further assume y i are generated from the partially unobserved truth, as a latent random vector y * i = (y * i1 , ..., y * ij , ..., y * ip ) with some known lower and upper detection limits: Similarly, we can partition y * i into observed and censored parts: where y * ici are unobserved and censored as y ici .In a multivariate regression model setting, we also assume x i is a d-dimensional vector that represents the observed predictors of the ith subject.
Then, the model can be formed as where β is a d × p coefficient matrix and a primary parameter of interest, i is a p-dimensional vector of random noise and i i.i.d.
Let ψ denote the collection of parameters β and Σ.Let Y be the N × p matrix of stacked observations y i , C be the N × p matrix of stacked censoring directions c i ; and X the N × d matrix of stacked predictor vectors x i .Then ψ can be estimated from maximization of the incomplete data likelihood function: where D(c i ) is a domain in R p , depending on the censored patterns represented by c i , and f y * ic i |yio i is the conditional Gaussian density of the unobserved responses y * ici that experience censoring given the observed responses without censoring y ioi .Typically, (2) is maximized numerically by the Newton-Raphson method (Amemiya, 1973).Here, we outline the EM algorithm similar to (Fair, 1977;Ruud, 1991), which we will extend to Gaussian mixtures in Section 2.2.Let Y * denote the N × p matrix of true observations formed by stacking y * i .Define the complete data likelihood assuming y * i , i = 1, . . ., N , are observed: The EM algorithm steps are derived by maximization of the conditional expectation of this complete data log-likelihood function, with details in the Appendix A.1.

Censored multivariate Gaussian mixture of regressions
Let G denote the number of clusters.Later, we examine the selection of G using information criteria.
Then we can expand the model in (1) to a mixture model: Therefore, y * i |{g, x i } ∼ N (β g x i , Σ g ).Assuming observations are i.i.d., define the incomplete data likelihood function: where ψ g are the vectorized parameters for each component; ω g ∈ (0, 1) are the mixing proportions of each mixture component subject to constraint: G g=1 ω g = 1; and Ψ is the overall parameter vector, such that Ψ = (ω 1 , ..., ω G−1 , ψ 1 , ..., ψ G ). Unlike the previously described regression model, in a mixture model setting, the direct maximization of the above likelihood function is not possible.Thus, we utilize the EM algorithm for parameter estimation.Let z ig denote an indicator variable equal to one if the ith subject is in the gth group and zero otherwise, and let Z denote the N × G matrix formed by stacking [z i1 , . . ., z iG ] .Assuming both Z and Y * are observed, we can write the complete data likelihood function: . (5) The EM algorithm is initialized with values Ψ (0) and then iterates between the E-and M-steps until convergence, as described here.Let Ψ (k) denote the values of the parameters from the previous iteration.Let Z ig denote a random indicator variable equal to one if the ith subject is in the gth group and zero otherwise.Define , where E Ψ (k) denotes the expectation evaluated using Ψ (k) .Let ). Taking the conditional expectation of the complete data log-likelihood function: Then the EM algorithm steps are: • E-step: where y * ici y * ici |y ioi g and y * ici |y ioi g are the first and second moments of truncated conditional Gaussian distribution.These moments are calculated using the R package MomTrunc (Galarza et al., 2021).
• M-step: z ig /N (10) where Y * g is the N × p matrix of stacked conditional expectations y * i g , and Z g = diag( z 1g , ..., z N g ).Additional details are in the Appendix A.2.

Hypothesis testing of within-cluster effects
Unlike some MLE algorithms in which the information matrix is automatically extracted (e.g., Newton-Rhapson updates), the information matrix is not directly calculated in the EM algorithm.
As an alternative, some authors use bootstrapping (McLachlan and Peel, 2000;O'Hagan et al., 2019), which is generally reliable but computationally expensive.Instead, we approximate the observed information matrix using the empirical complete data score function.
Under mild regularity conditions and weak consistency of the MLE that is a global maximizer in the interior of the parameter space where Ψ 0 is the true parameter vector; Θ is the parameter space; s c (Ψ; is the first-order derivative of the individual conditional expectation of the complete date log-likelihood with respect to the parameters of interest.For details, see equation 2.60 in McLachlan and Peel (2000).We then conduct Wald tests of the within-cluster effects.This approach avoids the computation of second-order partial derivatives and is computationally feasible.
McLachlan and Peel (2000) note that the sample size in mixture models has to be large for valid inference.Our data application has N > 3,000.In Section 3, we show the type-1 error rates are in general close to their nominal levels in a setting with N = 1,000.

Simulation design
We examine the censored multivariate GMR and estimators using two simulation scenarios: mild censoring and severe censoring.In both scenarios, unobserved data with N = 1,000 are first generated from a three-cluster mixture of regressions with bivariate responses (Y 1 , Y 2 ) following Gaussian distributions where the means are linear transformations of an intercept β g,0 where g ∈ {1, 2, 3} and three predictors (X 1 , X 2 , X 3 ).The simulation design is detailed in Table 2 and summarized here.The predictors are generated from a mean-zero multivariate Gaussian distribution with the covariance matrix equal to the sample correlation matrix based on three continuous demographic features from the Emory ADRC/HBS (age, education and Montreal cognitive assessment).The true parameters are described in Table 2.
In Scenario I, a lower detection limit equal to 0 is applied to the first response variable, Y 1 , leading to around 4.1% of observations left-censored on Y 1 , while an upper detection limit of 30 is applied to Y 2 leading to around 13.7% of observations right-censored on Y 2 .This leads to censoring levels similar to those found in the real data set (Section 4).In Scenario II, a lower detection limit of 2.5 is applied to Y 1 and upper detection limit of 26.5 to Y 2 leading to around 40.2% of observations left-censored and 37.2% of observations right-censored, respectively.For each scenario, we performed 101 simulations with results from the median error used in figures.Scatter plots of the unobserved truth and the censored data from an example simulation are visualized in Appendix Figure C1.
We compare the censored multivariate GMR to three approaches: a multivariate mixture of regressions ignoring censoring, i.e., treating censored observations in the same manner as uncensored and using all observations (ignore-censor GMR); a multivariate mixture of Gaussians in which censored observations are deleted (delete-censor GMR); and the censored multivariate Gaussian mixture model ignoring predictors (censored GMM) (Lee and Scott, 2012).We report the mean and standard deviation (SD) across simulations of the mixing proportions ω 1 , ω 2 and ω 3 .We report the mean and SD of Frobenius errors for other parameters.We evaluate the overall clustering accuracy using the adjusted Rand index (ARI) comparing the unobserved true labels against our modeled labels, where each observation is assigned to the class that had highest E Ψ (Z ig |y i ).
We also examine the type-1 error rates of the censored multivariate GMR from 500 simulations using the estimates corresponding to parameters in which the true coefficient values are equal to 0.
We also investigate the selection of the number of clusters.We fit the censored multivariate GMR for g = 1, ..., 6.For each of the 101 replicates and each g, the model is randomly initialized 32 times, and the solution with highest likelihood among the set of converged solutions is selected.We then calculate the integrated completed likelihood criterion (ICL) (Biernacki et al., 2000).

Simulation results
In Scenario I, the mixing proportions from the censored multivariate GMR are unbiased with small standard deviation, the mixing proportions from the ignore-censor GMR are similar, while bias increases in the delete-censor GMR with an overestimation of the frequency of group 2 and underestimation in group 3, and the mixing proportions are highly inaccurate in the censored GMM (Table 3).For the regression coefficients, the censored multivariate GMR shows greater accuracy than other approaches.The ignore-censor GMR and delete-censor GMR are particularly inaccurate for group 3, which has regression coefficients leading to greater censoring than groups 1 and 2 (Table 3).For the covariance matrices, the censored multivariate GMR is considerably more accurate than other approaches.
In Scenario II, we observe similar patterns but the benefits of the censored multivariate GMR are even greater (Figure 1, Table 3).The censored multivariate GMR is still able to accurately estimate the mixing proportions, while ignore-censor GMR leads to gross inaccuracies.The censored GMM overestimated the frequency of group 2. Overall, the ARIs in Scenario II are lower than Scenario I, but the ARI from the censored multivariate GMR (0.68) is much higher than the ignore-censor GMR (0.08) and the censored GMM (0.17).Delete-censor GMR can not perform clustering on approximately 575 observations.The censored multivariate GMR still accurately estimates the regression coefficients and covariance matrices, while other approaches become highly inaccurate.
These results highlight the need to model both the censoring and the predictors using the censored multivariate GMR.Even if a study is not interested in the influence of predictors, a mixture of regressions is necessary for accurate clustering.Moreover, the censoring must be modeled, otherwise both clustering and regression coefficient estimates are highly inaccurate.
In Scenarios I and II, all type-1 error rates are near nominal levels, with the highest type-1 error rate equal to 0.056 (Appendix Table B1).
In selecting the number of components in the censored multivariate GMR, ICL selects the correct number of components in both Scenarios I and II (Appendix Figure C2).4).Lower CSF Abeta42 corresponds to higher brain Abeta42 burden, such that we expect Abeta42 to be lower in patients with AD.In contrast, higher CSF tTau and pTau are associated with higher tau in the brain, and thus we expect tTau and pTau to be higher in patients with AD.A common approach is to use the ratio of Abeta42/tTau or Abeta42/pTau, where small values indicate AD pathology (Hampel et al., 2008;Meyer et al., 2010).However, using ratios obscures the censoring and discards information available from the multivariate approach.Since ignoring censoring led to erroneous clustering in our simulations, we believe the multivariate approach with the censored multivariate GMR will better reflect the underlying biology.
The data include age (decades), education level (decades), self-reported race, sex and apolipoprotein gene type.Due to small samples sizes in American Indian or Alaska Native (n = 6), Asian (n = 36), and Native Hawaiian or Other Pacific Islander (n = 7), we created a binary variable for race equal to one if the participant was African American and zero otherwise.Additionally, there were 82 self-report Hispanic or Latino participants, which was not examined due to sample size.
We included four levels: negative for the ε4 allele, heterozygous ApoE4 (ApoE4-1), homozygous ApoE4 (ApoE4-2) and missing data (Table 4).All individuals provided informed consent and all procedures are approved by the Emory University Institutional Review Board.
To select the optimal number of clusters, we calculate the ICL for the number of clusters G = 1, ..., 6.For each G, we estimate an intercept and regression coefficients for the five demographic variables for the three CSF biomarkers, and we estimate the 3 × 3 covariance matrices.The model is randomly initialized 50 times and the solution with highest likelihood among the set of converge solutions is selected.(50/50 iterations converge for G=1 to 4, 45/50 for G=5, and 35/50 for G=6.) The optimal number of clusters identified by ICL is equal to three (Appendix Figure C3).
To gain insight into the meaning of these groups, we visualize their relationship with the three in this group relative to the observed frequencies.
Other notable findings in this group are no association between age and Abeta42, but a positive relationship with tTau and pTau.Compared to other groups, the coefficients of age on tTau and pTau are large, which reflects a faster progressing tau pathology in the AD group.
Group 2: Control-like.In contrast to the AD-like group, CSF Abeta42 are significantly lower in African American participants compared to the group composed of other races in the control-like group.Total tau and pTau are also significantly decreased in African American participants, but the coefficients are smaller than in the AD-like group.This again suggests that AD pathology differs in African American participants, and underscores importance of the mixture of regressions approach.
Additionally, females in the control-like group had significantly higher levels of CSF Abeta42 than males.
Carriers of ApoE4 have greatly reduced CSF Abeta42, but in contrast to the AD-like group, the tau levels are unchanged.Previous studies that have found that ApoE4 is associated with Abeta42 but not tau in cognitively normal aging (Morris et al., 2010).
In contrast to the "AD-like pathology" group, CSF Abeta42 decreases with age (leading to an increase in brain Abeta42) in the control-like group.Since the levels of CSF Abeta42 levels are much lower at baseline in the AD-like group, CSF Abeta42 levels in the control-like group are still higher than the AD-like group at higher ages.Total tau and pTau increase with age, but at a slower rate compared to the AD-like group, which reflects age-progressing tau pathology in the control-like group.
Group 3: Non-AD pathology.Overall, we do not see significant relationships between the predictors and CSF biomarkers in this group (Table 5, Figure 4).This may be due in part to the smaller mixing proportion implying small sample size and imprecise coefficient estimates.

Discussion
We used a censored multivariate Gaussian mixture of regressions with a feasible EM algorithm to examine predictor impacts on subgroups in CSF biomarkers of Alzheimer's Disease.The approach is similar to estimating multivariate regression effects in which all predictors interact with a group variable, but here we also learn the group labels.Our approach simultaneously identifies clusters while allowing the effects of predictors to differ for different clusters for a multivariate outcome with detection limits.In contrast to intensive bootstrap methods, we approximate the asymptotic covariance matrix of the within-cluster effects β g using the empirical complete score function.Our simulations show that this approach adequately controls type-1 error rates in large samples (n ≈ 1,000).In simulations with moderate (comparable to our data application) and severe censoring, we show that ignoring censored records, deleting censored records or ignoring predictors creates substantial inaccuracies.Our approach results in large improvements in both the accuracy of clustering and regression estimates.Our simulations add to the latent class literature by demonstrating that modeling both the censoring and the predictors are important for accurate clustering.
Our analysis of the Emory ADRC/HBS Dataset using the censored multivariate GMR reveals new insights.We identify three clusters that tend to align with an AD-like group, a control-like group and a third group with undefined non-AD pathology.Predictor effects vary across clusters.
African American participants in the AD-like group had less severe tTau and pTau pathology.CSF biomarkers typically use the ratio of Abeta42 to tau, but previous studies may have based such determinations from studies of non-Hispanic Whites (Meyer et al., 2010).Recently, some researchers reported potential racial differences in CSF biomarkers (Morris et al., 2019;Garrett et al., 2019), which aligns with our findings.Additionally, the effects of ApoE4 on CSF biomarker levels differed between the AD-like and control-like groups, females had higher CSF Abeta42 than males in the control-like group, there were no age impacts on Abeta42 in the AD-like group but significant effects in the control-like group, and age impacts on pTau and tTau were greatest in the AD-like group.
We found a higher proportion of patients in the AD-like group than were diagnosed with AD.This is expected since there is a significant number of cognitively normal individuals with asymptomatic AD (Jansen et al., 2022(Jansen et al., , 2015)).Likewise, the vast majority of those clinically diagnosed as "Other" would be expected to fall into a non-AD or control-like multivariate CSF distribution.A benefit of the censored multivariate GMR is that it generates probabilities of membership in each cluster.
Future research can create an interactive tool to allow a clinician to enter a patient's data and obtain probabilities for membership in the AD-like, control-like, and non-AD pathology groups, which can complement existing approaches to diagnosing AD.
There are a number of limitations of our approach.Model selection and interpretation can be challenging with increasing number of response variables, predictors and groups.Penalized approaches may be helpful in higher dimensions (Khalili and Lin, 2013;Xie et al., 2010).Another avenue for future research is to consider an alternative approach that models the probability of latent class membership as a function of covariates using multinomial regression (Jacobs et al., 1991).In our approach, the predictors indirectly impact the posterior probabilities.This results in more interpretable effects, which can deepen our understanding of the impact of demographics, behavior and genetics on biomarkers in complex neurological disorders.2015); Latent GOLD 6.0: Vermunt and Magidson (2021).

Software
Table 2.: Summary of simulation scenarios.
The EM algorithm steps are derived by maximization of the conditional expectation of Q c (ψ; ψ (k) ; y): • E-step, update the conditional expectations based on old parameter values: where are the first and second moments of the multivariate truncated conditional Gaussian distribution subject to the truncation region D(c i ) which can be numerically computed using a quasi-Monte Carlo integration algorithm (Genz and Bretz, 2002;Genz, 2004).
The conditional complete data score function w.r.t the parameters of interest: • Solve the conditional complete data score S c (β; ψ (k) , y) at 0: • Solve the conditional score S c (Σ; ψ (k) , y) at 0 and plug-in β: where Y * g is the N × p matrix of stacked conditional expectations y * i g .The EM algorithm iterates between the E and M steps sequentially till convergence.

A.2. EM algorithm of the censored multivariate GMR (tobit regression)
Similarly, based upon the model definition in Section 2.2, the complete data likelihood assuming both the latent component labels z ig and latent data y * i are observed: (A6) The EM algorithm steps are derived by maximization of the conditional expectation of Q c (Ψ; Ψ (k) , y): • E-step, update the conditional expectations based on old parameter values: are the first and second moments of truncated conditional Gaussian distribution of the gth mixture component subject to the truncation region D(c i ), which can be numerically computed using a quasi-Monte Carlo integration algorithm developed by (Genz and Bretz, 2002;Genz, 2004).

4.
Real data application: Emory ADRC/HBS Dataset The purposes of this analysis are twofold: 1) to identify patient clusters based upon their CSF biomarkers without utilizing possibly incorrect clinical diagnoses; and 2) to evaluate the withincluster effects of the predictors on the CSF biomarkers.The Emory ADRC/HBS Dataset contains 3,004 individuals including 888 individuals with AD, 661 individuals with other cognitive disorders (Other) and 1,455 individuals who are cognitively normal (CN).Diagnosis is based on a combination of clinical history, neuropsychological testing, blood tests, structural neuroimaging and CSF biomarkers.All individuals included in our analyses provided informed consent to participate in research protocols approved by the Emory University Institutional Review Board (EHBS IRB00080300, ADRC IRB00079069, NeuCog IRB00078273, Vascular IRB00084414, CRIN IRB00024959).All CSF samples are assayed on the Roche Elecsys platform to measure levels of amyloid-beta peptide 1-42 (Abeta42), total tau protein (tTau) and phosphorylated tau protein (pTau).Because of the detection limits of the biometric assay, the CSF biomarkers are subject to censoring: 10/3,004 observations of Abeta42 are left censored and 349/3,004 observations are right censored; 31/3,004 observations of tTau are left censored and 4 are right censored; and 110/3,004 observations of pTau are left-censored and 5 are right censored (Figure 2, Table

(
possibly incorrect) clinical diagnoses (Figure 3).Panel A shows a scatterplot of Abeta42 and tTau colored by diagnosis group.Panel D shows the same scatterplot colored by the censored multivariate GMR clusters.Panels B and E show Abeta42 versus pTau for the AD-diagnosis and censored multivariate GMR, respectively, and Panels C and F show tTau versus pTau for AD-diagnosis and the censored multivariate GMR, respectively.

Figure 4 Figure 4 .
Figure 4 letting • denote an expectation conditioning on the observed data y ∈ R N ×p , then the conditional expectation of the complete data log-likelihood function takes the form:Q c (Ψ; Ψ (k) , y) ∝ g β g x i x i β g }.
: Type-1 error rates for within-cluster effects in Scenario I (mild censoring) and Scenario II (severe censoring).

Figure
Figure C1.: Example simulated data set and censoring mechanism.The data prior to censoring is depicted in row 1.In Scenario I (row 2), Y 1 is left-censored at 0 and Y 2 is right-censored at 30.In Scenario II (row 3), Y 1 is left-censored at 2.5 and Y 2 is right-censored at 30.

Figure
Figure C2.: Selection of the optimal G for Scenario I & II on 101 data replicates.A: Selection of the optimal G in Scenario I; B: Selection of the optimal G in Scenario II.

Figure
Figure C3.: Selecting the optimal G on the Emory Cognitive Neurology Biomarker Data.ICL indicates 3 groups.

Table 3 .
: Simulation results in Scenarios I and II from the censored multivariate GMR and three other approaches.Ignore-censor multivariate GMR uses the mixture of Gaussian regressions while treating the censored data in the same manner as uncensored.Delete-censor multivariate GMR uses the mixture of Gaussian regressions but deleting censored observations.Censored multivariate GMM uses the censored mixture of Gaussians without considering the effects of predictors, consequently only ω g , Σ g and the ARI are comparable to other approaches.Reported are the mean (sd) estimate across simulations for ω g , the mean (sd) of Frobenius errors and adjusted Rand Index (ARI) from 101 replicates.The clustering results for the replicate associated with the median error is in Figure1.ARI is not reported for delete-censor multivariate GMR because it can not perform clustering on the censored observations.

Table 4 .
: Patient demographics of the Emory Goizueta Alzheimer's Disease Research Center and the Emory Healthy Brain Study Dataset.