Evaluating Covariate Effects on ESM Measurement Model Changes with Latent Markov Factor Analysis: A Three-Step Approach

ABSTRACT Invariance of the measurement model (MM) between subjects and within subjects over time is a prerequisite for drawing valid inferences when studying dynamics of psychological factors in intensive longitudinal data. To conveniently evaluate this invariance, latent Markov factor analysis (LMFA) was proposed. LMFA combines a latent Markov model with mixture factor analysis: the Markov model captures changes in MMs over time by clustering subjects' observations into a few states, and state-specific factor analyses reveal what the MMs look like. However, to estimate the model, Vogelsmeier, Vermunt, van Roekel, and De Roover (2019) introduced a one-step (full information maximum likelihood; FIML) approach that is counterintuitive for applied researchers and entails cumbersome model selection procedures in the presence of many covariates. In this paper, we simplify the complex LMFA estimation and facilitate the exploration of covariate effects on state memberships by splitting the estimation into three intuitive steps: (1) obtain states with mixture factor analysis while treating repeated measures as independent, (2) assign observations to the states, and (3) use these states in a discrete- or continuous-time latent Markov model that takes classification errors into account. A real-data example demonstrates the empirical value of this approach.


Introduction
New methods such as Experience Sampling Methodology (ESM; Scollon et al., 2003) enable the assessment of psychological constructs or "factors" (e.g., depression) in daily life by repeatedly questioning multiple participants via smartphone apps, for example, nine times a day for one week. Such intensive longitudinal studies (say, with more than 50 measurement occasions) are often conducted to analyze dynamics in factor means. For instance, researchers have investigated how emotional dynamics relate to subjects' mental health (Myin-Germeys et al., 2018) or tailored interventions to subjects' daily experience of negative affect (Van Roekel et al., 2017). For drawing valid inferences about the dynamics, it is crucial that the measurement model (MM) is invariant (i.e., constant) between and within persons over time. The MM indicates which factors are measured and how these factors are measured by the items, which is expressed by means of "factor loadings". In the case of continuous data, the MM is obtained with factor analysis (FA). If the MM is invariant, the factors are conceptually equal across subjects and time-points and therefore comparable. However, the MM might be affected by subject- or time-point-specific response styles or substantive changes in item interpretation. As a result, the MMs might differ between subjects (e.g., the item interpretation might depend on subjects' psychopathology), but the MM might also differ within one subject (e.g., the response style of choosing only the extreme categories might depend on the situational motivation to complete the questionnaire). If non-invariance stays undetected, inferences may be invalid. For example, a mean score change in negative affect might be at least partly due to changes in item interpretations.
To conveniently evaluate (violations of) invariance of intensive longitudinal data for multiple subjects simultaneously, latent Markov factor analysis (LMFA; Vogelsmeier, Vermunt, van Roekel, & De Roover, 2019; Vogelsmeier, Vermunt, Böing-Messing, & De Roover, 2019) was proposed, which combines a discrete- or continuous-time latent Markov model with mixture factor analysis. 1 As will be described in more detail in Section "Latent Markov Factor Analysis", the Markov model clusters subject- and time-point-specific observations according to their underlying MM into dynamic latent MM classes or "states", which implies that subjects can transition between latent states and thus between MMs over time. 2 State-specific factor analyses reveal which MM applies to each state. Observations that belong to the same state are invariant. Observations that belong to different states are non-invariant, for instance, with regard to the number or nature of the underlying factors or the size of factor loadings. Note that some subjects might stay in one state, which implies within-person invariance (i.e., over time). Other subjects may transition (more or less frequently) between different MMs, which implies within-person non-invariance. Moreover, comparing state memberships across subjects provides information about between-person (non-)invariance.
The aim of assessing non-invariance patterns is usually twofold. On the one hand, detecting non-invariance is important for deciding how to proceed with the data analysis. For example, if the invariance violation is strong, one may decide to conduct factor-mean analyses with observations from one state only. If only a few MM parameters differ across states (i.e., "partial invariance" holds; Byrne et al., 1989), one may decide to investigate dynamics in the factor means but let the corresponding MM parameters differ across states. More specifically, if discrete (i.e., abrupt) changes are of interest, one could continue with LMFA by adding factor means to the model and constraining invariant parameters to be equal across states. The state memberships would then (also) capture discrete changes in factor means (this is further explained in Section "Measurement part"). 3 If continuous (i.e., smooth) changes are of interest, researchers could opt for a latent growth model (Muthén, 2002) with state-specific parameters.
On the other hand, researchers would typically like to include explanatory variables (in the following referred to as "covariates") that can possibly explain MM changes so they can learn about these substantively interesting aspects of their data. As an example, when studying adolescents' affective wellbeing in daily life, the situational context (e.g., being with friends versus being with parents) might lead to different MMs in that some items may be more relevant for measuring affect in one over the other context. For instance, "being excited" might be more related to the positive affect factor when being with friends whereas "being content" might be more related to positive affect when being with parents.
Exploring the relations between covariates and state memberships is theoretically possible by adding different (sets of) covariates to the "structural model" (SM). Note that the SM generally refers to the causal relationships between latent variables (and/or exogenous variables) or between latent variables at consecutive measurement occasions. Specifically, in LMFA, the SM refers to the transitions between states and thus between MMs. However, with the currently implemented full information maximum likelihood (FIML) approach, which estimates all parameters (i.e., from the MM and the SM) at the same time, exploring covariate effects is cumbersome. LMFA is an exploratory method, which entails that researchers have to select the best model in terms of the number of latent states and the number of factors within the states. To this end, one needs to estimate a large number of plausible models and compare them with the Bayesian information criterion (Vogelsmeier, Vermunt, van Roekel, & De Roover, 2019) or an alternative model selection criterion. For example, comparing models with 1–3 states and 1–3 factors per state would already result in 19 models that have to be estimated by the researchers. Model selection with covariates is even more cumbersome because the whole model (i.e., the MMs and the SM) would have to be re-estimated for every set of covariates. Especially in exploratory studies, where researchers might want to add or remove covariates until only significant covariates are left, the model selection procedure quickly becomes unfeasible (e.g., if there are only three different sets of covariates for the 19 different model complexities, this already results in 19 × 3 = 57 models).
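The count of 19 candidate models can be verified by enumerating the state/factor combinations, assuming (as is conventional) that models differing only in the ordering of the state-specific factor numbers are equivalent:

```python
from itertools import combinations_with_replacement

max_states, max_factors = 3, 3
models = []
for K in range(1, max_states + 1):
    # Factor numbers per state; state labels are arbitrary, so e.g.
    # (1, 2) and (2, 1) factors describe the same model.
    for factors in combinations_with_replacement(range(1, max_factors + 1), K):
        models.append((K, factors))

print(len(models))  # 3 + 6 + 10 = 19 candidate models
print(len(models) * 3)  # 57 models when each is refit for 3 covariate sets
```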
To avoid the model selection problem with covariates in any latent class analysis (the latent Markov model being a specific variant thereof), researchers sometimes first select the MM (or MMs, if they differ across latent classes) without including the covariates in the SM. Once the choice about the complexity of the MM has been made, researchers include the covariates in the SM and re-estimate the whole model (i.e., only 19 + 3 = 22 models have to be estimated). However, this is problematic because, in the FIML approach, both parts of the model, the MM and the SM, are estimated at the same time, so that specifications of the SM may also influence the MM. Thus, including covariates can redefine the states and even impact the optimal number of states or factors (Bakk et al., 2013; Nylund-Gibson et al., 2014).

1 Note that it is also possible to apply LMFA to a single subject if the number of observations is large enough. For guidelines on the required number of observations, see Vogelsmeier, Vermunt, van Roekel, et al. (2019).
2 Note that classifying observations of different subjects into the same latent states entails that persons are assumed to differ from themselves in the same way as persons differ from other persons.
3 This analysis would be comparable to the factor-mean modeling approach proposed by Bartolucci and Solis-Trapala (2010).
A better strategy that considerably simplifies the estimation is the so-called "three-step" ("3S") approach, which decomposes the estimation into three manageable pieces. More specifically, the steps for a latent Markov model are as follows: step 1: obtain state-specific MMs by conducting mixture factor analysis on the repeated measures data while disregarding the dependency of these observations; step 2: assign the observations to the states (and thus the MMs) based on posterior state probabilities; step 3: pass the state-specific MMs to a latent Markov model in order to estimate the probabilities to transition between the states (the three steps will be elaborated in Section "Three-Step Estimation of Latent Markov Factor Analysis"). Although the MMs are also estimated first without considering the SM with its covariates (step 1), the MMs are kept fixed when adding the covariates to the SM (step 3).
Next to facilitating the inclusion of covariates, the step-wise approach is also more intuitive because it ensures that the states (and thus the formation of the MMs) are free from covariate influences. This is required if latent classes (in our case, latent states) should exclusively capture heterogeneity in the MMs (Bakk et al., 2013; Nylund-Gibson et al., 2014). Moreover, the step-wise approach better corresponds with how researchers prefer to approach their analyses (Bakk et al., 2013; Devlieger et al., 2016; Vermunt, 2010). That is, they rather see the investigation of the SM (i.e., in our case, the transitions between states and the influence of covariates) as a final step that comes after investigating what the MMs look like. Because of the separate steps, the analyses could even be distributed across researchers, such that one researcher carries out the first step to obtain the different underlying MMs. A second researcher could take the results and continue with the analyses of the transitions between the MMs. If everything has to be done in one step, it may quickly become overwhelming. Thus, applied researchers are used to and typically prefer such step-wise approaches and perceive simultaneous "one-step approaches" as counter-intuitive and more difficult to interpret (Vermunt, 2010). Especially for complex analyses such as LMFA, offering step-wise approaches can therefore help to reach applied researchers and motivate them to apply the new method.
When splitting the estimation of latent class models in general, and latent Markov models in particular, into the estimation of the MM(s) and the SM, estimates of the SM would be biased, however. To prevent this bias, the estimation procedure has to take into account the classification error that results from classifying observations into classes or states, because classification is never perfect. To this end, Bolck et al. (2004) proposed the "BCH" method, in which the classification error is used to reweight the data prior to conducting logistic regressions to predict class membership. Moreover, Vermunt (2010) developed an alternative, more flexible, maximum likelihood correction ("ML" method), in which the estimation of the latent class model in the third step explicitly incorporates the classification error. More recently, the ML approach (or an extension thereof) was applied to the 3S estimation of latent Markov models (e.g., Asparouhov & Muthén, 2014; Di Mari et al., 2016; Nylund-Gibson et al., 2014) and proved to be a trustworthy alternative to the one-step FIML approach.
The aims of the current study are (1) to tailor the ML correction method to LMFA (in the following referred to as 3S-LMFA) to provide a more accessible alternative to the FIML estimation (in the following referred to as FIML-LMFA) that is more convenient to use (especially with covariates) and easier to interpret for applied researchers, and (2) to evaluate whether 3S-LMFA approaches the good performance of FIML-LMFA in terms of state and parameter recovery. Note that both FIML-LMFA and 3S-LMFA can be estimated by means of Latent GOLD (LG) syntax (Vermunt & Magidson, 2016), which is also used in the current study.
The remainder of this paper is structured as follows: In Section "Method", we first describe the data structure, provide a motivating example, outline the general LMFA model, and explain the steps of 3S-LMFA. In Section "Simulation Study", by means of a simulation study, we evaluate the performance of 3S-LMFA and compare it to the performance of FIML-LMFA. Section "Application" illustrates the empirical value of 3S-LMFA by means of a real-data application. In Section "Discussion", we discuss limitations of 3S-LMFA and directions for future research.

Method
In the following, we first describe the data structure, the LMFA model, and the FIML estimation before we explain the three steps of the 3S estimation in detail.

Data structure
We assume ESM data with repeated-measures observations (with multiple continuous variables) that are nested within subjects, denoted by $y_{ijt}$ (where $i = 1, \dots, I$ refers to subjects, $j = 1, \dots, J$ to items, and $t = 1, \dots, T$ to time-points). Note that $T$ may differ across subjects, but we omit the index $i$ in $T_i$ for simplicity. The $y_{ijt}$ are collected in the $1 \times J$ vectors $\mathbf{y}_{it} = (y_{i1t}, \dots, y_{iJt})$, which are themselves collected in the $T \times J$ data matrices $\mathbf{Y}_i = (\mathbf{y}_{i1}', \dots, \mathbf{y}_{iT}')'$.
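As a minimal sketch of this data structure (all dimensions and values below are illustrative, not from the ADAPT data), each subject contributes a $T_i \times J$ matrix of repeated measures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: I = 3 subjects, J = 10 items, and a
# subject-specific number of time-points T_i.
I, J = 3, 10
T_i = [70, 85, 90]  # T may differ across subjects

# Y[i] is the T_i x J data matrix of subject i; row t holds the
# 1 x J response vector y_it = (y_i1t, ..., y_iJt), here on a 0-100 scale.
Y = [rng.uniform(0, 100, size=(T, J)) for T in T_i]

assert Y[1].shape == (85, 10)
```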

Motivating example
In order to motivate the use of LMFA in general and the 3S approach in particular, consider the following ESM data collected within the larger ADAPT project (Keijsers et al., 2017). Adolescents repeatedly completed an affect questionnaire (Ebesutani et al., 2012; Keijsers et al., 2019; Watson et al., 1988), where five items indicated positive affect and another five items indicated negative affect (all items are displayed in Table 4). All affect items were measured on a Visual Analog Scale (VAS) from 0 (not at all) to 100 (very much). The visual display of the items in the app can be found in the Online Supplement S.3. Next to the affect questionnaire, adolescents also completed questionnaires to assess time-varying covariates (e.g., participants' current company) at each measurement occasion. Furthermore, before the ESM study, participants completed a baseline questionnaire about time-constant covariates (e.g., on emotion clarity and emotion differentiation capability). A typical next step for substantive or applied researchers would be to investigate changes in positive and negative affect over time. However, if response styles or item interpretation differ across time-points and/or subjects, the MM is not invariant within and between subjects, and conclusions about dynamics in affect may be invalid. LMFA can be used to trace MM differences between subjects and MM changes over time. More specifically, there are two main research questions that can be answered using LMFA:
1. Which MMs underlie which parts of the data and how do the MMs differ?
2. Are the MMs related to time-varying and/or time-constant covariates?
For answering only the first question, FIML-LMFA can be used. However, if researchers also want to answer the second question, the model selection including covariates would be too cumbersome with FIML-LMFA, and 3S-LMFA is indispensable. In Section "Application", we answer both research questions using 3S-LMFA.

Latent Markov factor analysis
LMFA consists of two parts. First, the measurement part concerns the state-specific response variable distributions that, in the case of LMFA, consist of the MMs for the constructs, which are defined by a mixture of factor models. Second, the structural part concerns the discrete latent process that is either defined by a "discrete-time" latent Markov model (Bartolucci et al., 2014; Collins & Lanza, 2010; Zucchini et al., 2016), which assumes equal time-intervals, or by a "continuous-time" latent Markov model (Böckenholt, 2005; Jackson & Sharples, 2002), which allows time-intervals to differ. Additionally, it is possible to include covariates in the SM. Figure 1 depicts the relations between the parameters from the SM and zooms in on the relation between the states from the SM and the state-specific MMs by means of an artificial example. The different parts, including the notation, are described next.

Figure 1. Artificial example of the relations between the structural model parameters (top panel) and a zoomed-in state-specific measurement model (bottom panel) in the full information maximum likelihood LMFA. Note that the state-specific measurement models may differ regarding all parameters, including the number of factors and the values of the loadings ($\lambda_{kjf}$), intercepts ($\mu_{kj}$), and unique errors ($e_{kj}$).

Measurement part
The measurement part shows how the state memberships define the responses. Thereby, it is important to note that the responses at time-point $t$, $\mathbf{y}_{it}$, depend only on the latent state $k$ $(k = 1, \dots, K)$ at that time-point, and the responses are thus independent of the responses at other time-points given that state (the "independence assumption"), which is also illustrated in Figure 1. In LMFA, the factor model depends on the state membership of subject $i$ at time-point $t$ (denoted by $s_{itk} = 1$) as follows:

$$\mathbf{y}_{it} = \boldsymbol{\mu}_k + \boldsymbol{\Lambda}_k \mathbf{f}_{itk} + \mathbf{e}_{itk}. \quad (1)$$

In this equation, $\boldsymbol{\Lambda}_k$ is the state-specific $J \times F_k$ loading matrix (where $F_k$ is the state-specific number of factors); $\mathbf{f}_{itk} \sim MVN(\mathbf{0}, \boldsymbol{\Psi}_k)$ is the subject-specific $F_k \times 1$ vector of factor scores at time-point $t$ (where $\boldsymbol{\Psi}_k$ is the state-specific factor covariance matrix); $\boldsymbol{\mu}_k$ is the state-specific $J \times 1$ intercept vector; and $\mathbf{e}_{itk} \sim MVN(\mathbf{0}, \mathbf{D}_k)$ is the subject-specific $J \times 1$ vector of residuals at time-point $t$ (with $\mathbf{D}_k$ containing the unique variances $d_{kj}$ on the diagonal and zeros on the off-diagonal). Thus, the state-specific response densities, $p(\mathbf{y}_{it} | s_{itk} = 1)$, are defined by state-specific multivariate normal distributions with means $\boldsymbol{\mu}_k$ and covariance matrices $\boldsymbol{\Sigma}_k = \boldsymbol{\Lambda}_k \boldsymbol{\Psi}_k \boldsymbol{\Lambda}_k' + \mathbf{D}_k$. To obtain the state-specific factor models, LMFA employs exploratory factor analysis within the states in order to retain maximal flexibility regarding the differences in MMs that can be traced. In contrast to confirmatory factor analysis, exploratory factor analysis puts no a priori constraints on the factor loadings. However, if desired, confirmatory factor analysis can also be used.
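The state-specific factor model can be made concrete with a small numpy sketch that draws one response vector for a hypothetical state (all parameter values below are illustrative, not estimates):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative state-specific MM with J = 4 items and F_k = 2 factors.
Lambda_k = np.array([[0.8, 0.0],
                     [0.7, 0.0],
                     [0.0, 0.9],
                     [0.0, 0.6]])            # J x F_k loadings
mu_k = np.array([50.0, 55.0, 40.0, 45.0])    # J x 1 intercepts
Psi_k = np.array([[1.0, 0.3],
                  [0.3, 1.0]])               # factor (co)variances; variances fixed at 1
D_k = np.diag([0.36, 0.51, 0.19, 0.64])      # diagonal unique variances

# One draw of y_it given s_itk = 1: y = mu_k + Lambda_k f + e
f = rng.multivariate_normal(np.zeros(2), Psi_k)
e = rng.multivariate_normal(np.zeros(4), D_k)
y = mu_k + Lambda_k @ f + e

# Model-implied covariance: Sigma_k = Lambda_k Psi_k Lambda_k' + D_k
Sigma_k = Lambda_k @ Psi_k @ Lambda_k.T + D_k
assert np.allclose(Sigma_k, Sigma_k.T)
```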
From Eq. (1) we can see that the state-specific MMs may differ with regard to their loadings $\boldsymbol{\Lambda}_k$, intercepts $\boldsymbol{\mu}_k$, unique variances $\mathbf{D}_k$, and factor (co)variances $\boldsymbol{\Psi}_k$, implying that LMFA explores all levels of measurement non-invariance, that is, configural invariance (invariant number of factors and pattern of zero loadings), weak factorial invariance (invariant loading values), strong factorial invariance (invariant intercepts), and strict factorial invariance (invariant unique variances) (for more details see, e.g., Meredith, 1993). For identification purposes, the factor variances are set equal to one in all states, and rotational freedom is dealt with by means of criteria that optimize simple structure and/or the between-state agreement of the factor loadings (e.g., Clarkson & Jennrich, 1988; Kiers, 1997).
It is important to note that restricting the factors to have a mean of zero and a variance of one has the consequence that changes in factor means and variances may be captured as changes in the intercepts and loadings (i.e., if an additional state is selected for such a change). Therefore, when all intercepts that pertain to the same factor are higher or lower in one state compared to the other, it might be a sign that the factor means rather than separate intercepts differ across these states. Similarly, if all loadings of the same factor are likewise larger or smaller (i.e., the scaling is affected), it might be a sign that factor variances rather than the separate loadings differ across states. However, when the number of factors differs across states, it does not make sense to disentangle loading differences from factor-variance differences and, as long as weak invariance is violated, it does not make sense to disentangle intercept differences from factor-mean differences. In contrast, if the loadings and intercepts are (at least partially) invariant 5, one could go ahead with an adjusted LMFA (that is, with equality restrictions on the MM parameters and with state-specific factor variances and means) and capture discrete changes in factor variances and means over time, as was already mentioned in the Introduction.
Furthermore, LMFA currently assumes that factors have no auto- and cross-lagged correlations at consecutive time-points. By means of a dynamic factor analysis, it would be possible to incorporate such autocorrelations, but factor rotation would be more intricate, as auto- and cross-lagged relations have to be rotated toward a priori specified target matrices (Browne, 2001; Zhang, 2006). This would require a priori hypotheses about MM changes that are often not present or incomplete and that are thus undesired in exploratory studies. In addition, one would require more measurement occasions per subject (Ram et al., 2012), which is often unfeasible. In order to investigate whether ignoring autocorrelations in the data would pose problems for LMFA, Vogelsmeier, Vermunt, van Roekel, and De Roover (2019) conducted a simulation study using the FIML estimation and showed that the state and parameter recovery of the MMs were largely unaffected. Note, however, that ignoring dependency in the data leads to an underestimation of standard errors (SEs) of the MM parameters.

5 Note that evaluating higher levels of invariance would require specifying increasingly more restrictive confirmatory factor analysis (CFA) models within the states in step 1 of the three-step estimation. Deciding on which equality constraints to add for the loadings may be based on multigroup factor rotation (MGFR; De Roover, Vermunt, & Ceulemans, 2020). For the intercepts and unique variances, the decision can be based on "score tests" (Oberski, van Kollenburg, & Vermunt, 2013). The tenability of (partial) invariance can be evaluated by comparing models with different restrictions, for example, in terms of their BIC value (Lubke & Muthén, 2005). Note that estimating constrained models is already possible in LG, but the CFA variant of LMFA with equality restrictions has yet to be explained and evaluated in future research.
This is only relevant when using hypothesis tests to trace significant differences in the MMs across states, which is possible by means of Wald tests using De Roover and Vermunt's (2019) recently developed "multigroup factor rotation". 6 One would then have to correct for the dependency in the data (e.g., with the "primary sampling unit" identifier in LG; Vermunt & Magidson, 2016). Otherwise, invariance of parameters would be rejected too easily. However, the hypothesis tests are outside the scope of this paper.
Structural part

A latent Markov model generalizes a latent class model (the statistical method to identify subgroups with a similar set of indicator values) in that subjects can transition between classes in a latent Markov model, while subjects remain in the same classes in a latent class model. The classes in a latent Markov model are therefore referred to as "states". For an extensive description of latent Markov models, see, for example, Bartolucci, Farcomeni, et al. (2015) and Zucchini et al. (2016). In brief, transitions between the states are captured by a latent "Markov chain" defined by the probabilities to start in a state $k$ at time-point $t = 1$ ("initial state probabilities") and the probability of being in a state $k$ at time-point $t > 1$ conditional on the occupied state $l$ $(l = 1, \dots, K)$ at $t - 1$ ("transition probabilities"). Note that, according to the first-order Markov assumption, the probability of being in a certain state $k$ at time-point $t$ depends only on the state at $t - 1$. The initial state probabilities are given by the $K \times 1$ probability vector $\boldsymbol{\pi}$, which contains the elements $\pi_k = p(s_{1k} = 1)$, with $s_{tk}$ referring to the state membership $k$ at time-point $t$ (e.g., if a subject is in state 1 at time-point 1, then $s_{11} = 1$ and $s_{12} = \dots = s_{1K} = 0$). These binary variables are in turn collected in the membership vectors $\mathbf{s}_{it} = (s_{it1}, \dots, s_{itK})'$, for $t = 1, \dots, T$, which are in turn collected in the $K \times T$ state membership matrix $\mathbf{S}_i = (\mathbf{s}_{i1}, \mathbf{s}_{i2}, \dots, \mathbf{s}_{iT})$. The transition probabilities are collected in the $K \times K$ matrix $\mathbf{P}$, which contains the elements $p_{lk} = p(s_{tk} = 1 | s_{t-1,l} = 1)$. Note that the rows indicate the state that a person comes from and the columns the state that the person transitions to. Hence, the diagonal elements represent the probabilities to stay in a state and the off-diagonal elements the probabilities to transition to another state.
Therefore, diagonal values close or equal to 1 indicate stable state memberships and, thus, within-person invariance. The initial state probabilities sum to one, $\sum_{k=1}^{K} \pi_k = 1$, as do the rows of the transition probability matrix, $\sum_{k=1}^{K} p_{lk} = 1$. In the discrete-time latent Markov model, the time-intervals between observations are assumed to be equal. This assumption is often not tenable in empirical data. For instance, the questionnaires in ESM are usually sent out at random moments, and participants may skip certain measurement occasions, which automatically increases the distance between two subsequent observations. To accommodate such data, a continuous-time latent Markov model can be employed, which allows for differing intervals across time-points and subjects by considering the length of time spent in a state, $d$. In the following, we provide a brief summary. The interested reader is referred to Böckenholt (2005) and Jackson and Sharples (2002) for general information about continuous-time latent Markov models and to Vogelsmeier, Vermunt, Böing-Messing, and De Roover (2019) for more specific information on continuous-time LMFA. In brief, transitioning from the origin state $l$ to the destination state $k$ is defined by the "intensities" (or rates) $q_{lk}$ (collected in the $K \times K$ intensity matrix $\mathbf{Q}$) that replace the transition probabilities $p_{lk}$ and can be seen as probabilities to transition between states per very small time unit:

$$q_{lk} = \lim_{\Delta \to 0} \frac{p(s_{t+\Delta, k} = 1 | s_{tl} = 1)}{\Delta}$$

for all $k \neq l$ (thus, for the off-diagonal elements in the intensity matrix $\mathbf{Q}$). The diagonal elements are equal to the negative row sums (i.e., $-\sum_{k \neq l} q_{lk}$; Cox & Miller, 1965). The transition probabilities for any interval of interest can be computed by taking the matrix exponential of $\mathbf{Q} \times d$. Note that larger time-intervals $d$ increase the probability to transition to a different state.
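The relation between intensities and interval-specific transition probabilities can be illustrated with scipy's matrix exponential (the two-state intensity values below are purely illustrative):

```python
import numpy as np
from scipy.linalg import expm

# Illustrative 2-state intensity matrix Q: off-diagonal rates,
# diagonal equal to the negative row sums.
Q = np.array([[-0.10,  0.10],
              [ 0.05, -0.05]])

for d in (0.5, 2.0, 8.0):
    P = expm(Q * d)  # transition probabilities for time-interval d
    assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability vector
    print(d, P[0, 1])  # off-diagonal probability grows with the interval d
```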
In turn, $\mathbf{Q}$ can be obtained by taking the matrix logarithm of $\mathbf{P}$. 7 In the following, we expand the structural part by including $U$ subject- and possibly time-point-specific covariates $z_{itu}$ (collected in the $U \times 1$ vectors $\mathbf{z}_{it}$) such that they affect the initial and transition probabilities. 8 Note that the measurement part is assumed to be not (directly) affected by the covariates, which can also be seen in Figure 1. Also note that the parameters of the structural part are typically modeled using a logit model (for initial and transition probabilities) or a log-linear model (for transition intensities) in order to prevent parameter space restrictions, which is also what LG does. The covariates enter the model through these parameterizations. For the initial state probabilities, we use the parameterization

$$\log \frac{\pi_k}{\pi_1} = \beta_{0k} + \boldsymbol{\beta}_k' \mathbf{z}_{i1},$$

with $k = 2, \dots, K$ and $k = 1$ as the reference category.
The coefficients $\beta_{0k}$ are the initial state intercepts and $\boldsymbol{\beta}_k = (\beta_{k, z_{i11}}, \dots, \beta_{k, z_{i1U}})'$ are the initial state slopes, which quantify the effect of the covariates on the initial state memberships. In discrete-time LMFA, the multinomial logistic model for the transition probabilities is

$$\log \frac{p_{lk}}{p_{ll}} = \gamma_{0lk} + \boldsymbol{\gamma}_{lk}' \mathbf{z}_{it},$$

with $k \neq l$. Thus, the logit is modeled by comparing the probability of transitioning from state $l$ to state $k$ with the probability of staying in state $l$. The coefficients $\gamma_{0lk}$ are the transition intercepts and $\boldsymbol{\gamma}_{lk} = (\gamma_{lk, z_{it1}}, \dots, \gamma_{lk, z_{itU}})'$ are the transition slopes, which quantify the effect of the covariates on transitioning to another state. In continuous-time LMFA, we use a log-linear model for the transition intensities (for $k \neq l$):

$$\log q_{lk} = \gamma_{0lk} + \boldsymbol{\gamma}_{lk}' \mathbf{z}_{it}.$$

Finally, the joint distribution of observations and states, given the covariates, is

$$p(\mathbf{Y}_i, \mathbf{S}_i | \mathbf{Z}_i) = p(\mathbf{s}_{i1} | \mathbf{z}_{i1}) \prod_{t=2}^{T} p^{d_{ti}}(\mathbf{s}_{it} | \mathbf{s}_{i,t-1}, \mathbf{z}_{it}) \prod_{t=1}^{T} p(\mathbf{y}_{it} | \mathbf{s}_{it}). \quad (6)$$

Note that the $d_{ti}$ in $p^{d_{ti}}(\mathbf{s}_{it} | \mathbf{s}_{i,t-1}, \mathbf{z}_{it})$ refers to the transition probabilities' dependency on the subject- and time-point-specific time-interval in continuous-time LMFA. The term reduces to $p(\mathbf{s}_{it} | \mathbf{s}_{i,t-1}, \mathbf{z}_{it})$ in discrete-time LMFA.
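To illustrate how the multinomial logit parameterization maps intercepts, slopes, and a covariate onto a row-stochastic transition matrix, here is a minimal sketch (the coefficient values and the two-state, one-covariate setup are hypothetical):

```python
import numpy as np

def transition_probs(gamma0, gamma, z):
    """Build the K x K transition matrix from the multinomial logit
    log(p_lk / p_ll) = gamma0[l, k] + gamma[l, k] @ z, with 'staying
    in state l' as the reference category (its logit is fixed at 0)."""
    K = gamma0.shape[0]
    P = np.empty((K, K))
    for l in range(K):
        logits = gamma0[l] + gamma[l] @ z
        logits[l] = 0.0  # reference: staying in state l
        P[l] = np.exp(logits) / np.exp(logits).sum()
    return P

# Illustrative values: K = 2 states, U = 1 covariate.
gamma0 = np.array([[0.0, -2.0],
                   [-2.0, 0.0]])   # transition intercepts (diagonal unused)
gamma = np.zeros((2, 2, 1))
gamma[0, 1, 0] = 1.5               # covariate raises the logit of 1 -> 2

for z in (np.array([0.0]), np.array([1.0])):
    P = transition_probs(gamma0, gamma, z)
    assert np.allclose(P.sum(axis=1), 1.0)
    print(z[0], P[0, 1])  # transition probability increases with z
```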

FIML estimation of latent Markov factor analysis (FIML-LMFA)
In order to obtain the maximum likelihood (ML) parameter estimates with the FIML estimation, the following loglikelihood function has to be maximized: with p Y i , S i jZ i ð Þas given in Eq. (6). The ML estimates can be obtained by means of the forward-backward algorithm (Baum et al., 1970), which is an efficient version of the expectation maximization (EM; Dempster et al., 1977) algorithm and is also utilized by LG to find the ML solution. Within the maximization-step, a Fisher algorithm is used to update the state-specific covariance matrices defined by the factor models (Jennrich & Sampson, 1976) and, in case of continuous-time-LMFA, also to update the log-transition intensities. For a summary of the algorithms (including information about the convergence criteria and the utilized multistart procedure) see It is important to note that we assume the number of states (K) and factors per state (F k Þ to be known when estimating the models. However, in real data, the best model in terms of the number of states and factors has to be evaluated. The Bayesian information criterion (BIC) performs well in selecting the best model in FIML-LMFA although the final decision regarding the optimal model should also take interpretability into account (Vogelsmeier, Vermunt, van Roekel, & De Roover, 2019). When also including covariates, every model under comparison (i.e., with all possible combinations of K and F k ) has to be reestimated every time a covariate is added or removed from the model because, using FIML estimation, the best model may change depending on the included covariates. For instance, when researchers want to obtain the best subset of U ¼ 3 covariate candidates, they would have to estimate 2 U ¼ 8 times the number of models that is already under comparison. 
When all models are estimated, one may use the BIC and interpretability to choose the final model. Thus, the model selection quickly becomes overwhelming and even unfeasible when exploring relations between state memberships and many covariates.

Footnote 7: Note that the Q matrix with the particular structure on the off-diagonals follows naturally from taking the matrix logarithm of the P matrix with its restriction $\sum_{k=1}^{K} p_{lk} = 1$.

Footnote 8: Note that the only difference between time-varying and time-constant covariates is that the former may take different values within a subject (i.e., in the dataset, the covariate scores may differ across rows), whereas the latter take the same value within a subject (i.e., in the dataset, the covariate scores are repeated/identical across rows).
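To make this combinatorial burden concrete, a back-of-the-envelope count (an illustrative sketch; the function and its simplifying assumption of equal $F$ across states are hypothetical):

```python
def n_fiml_models(max_K, max_F, n_covariates):
    """Models to estimate when, under FIML, every combination of K states,
    F factors (assumed equal across states here, a simplification), and
    covariate subset has to be refitted."""
    base_models = max_K * max_F            # all (K, F) combinations
    covariate_subsets = 2 ** n_covariates  # all subsets, incl. the empty one
    return base_models * covariate_subsets

# K and F up to 3 with U = 3 covariate candidates: 9 base models become 72 fits.
print(n_fiml_models(3, 3, 3))  # 72
```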

Three-step estimation of latent Markov factor analysis (3S-LMFA)
In contrast to FIML-LMFA, 3S-LMFA decomposes the maximization problem for estimating the MMs and the SM into smaller parts. First, the state-specific MMs are estimated (step 1). Second, the observations are assigned to the MMs (i.e., classified to the states) and "classification errors" are calculated (step 2). Finally, the SM is estimated using the state-assignments while correcting for the classification errors (step 3). In the following, we explain the three steps in detail.
Step 1: Estimation of the state-specific measurement models

The first step, as illustrated in Figure 2, involves estimating the state-specific MMs underlying the data by means of mixture factor analysis (McLachlan & Peel, 2000; McNicholas, 2016). The structural part (including the covariates) can be validly ignored because, in LMFA, the observations at a given time-point $t$, $y_{it}$, are assumed to be conditionally independent of the state at time-point $t-1$, $s_{i(t-1)l} = 1$, and the covariates at time-point $t$, $z_{it}$, given the state membership at time-point $t$, $s_{itk} = 1$ (see Figure 1). For the estimation, all repeated observations are treated as "independent", such that, say, 100 observations for each of 100 subjects result in 10,000 independent observations. The model parameters of interest are the state proportions $p(s_{itk} = 1)$ and the state-specific response probabilities $p(y_{it} \mid s_{itk} = 1)$. The mixture factor analysis model is therefore

$p(y_{it}) = \sum_{k=1}^{K} p(s_{itk} = 1)\, p(y_{it} \mid s_{itk} = 1)$

and the loglikelihood function is

$\log L_1 = \sum_{i=1}^{N} \sum_{t=1}^{T_i} \log \sum_{k=1}^{K} p(s_{itk} = 1)\, p(y_{it} \mid s_{itk} = 1).$

In LG, the posterior state probabilities and the state-specific factor models are estimated with an EM algorithm with Fisher scoring (Lee & Jennrich, 1979) in the maximization step (Footnote 9). As already discussed in the introduction, in this step one also selects the optimal number of states $K$ and factors per state $F_k$ without having to be concerned about the covariates. Although the BIC is also a commonly used model selection criterion for mixture factor analysis (McNicholas, 2016), the CHull method (Ceulemans & Kiers, 2006), which likewise balances model complexity and fit, proved to outperform the BIC in mixture factor analysis, especially when considering the three best models (Bulteel et al., 2013). Based on their results, we suggest using the CHull method, potentially combined with the BIC, to select the three best models and compare them in terms of interpretability.
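For intuition, the step-1 quantities can be sketched in a few lines of numpy: every row of the stacked data is treated as an independent draw from a Gaussian mixture whose state-specific covariance matrices are implied by the factor models, $\Lambda_k \Lambda_k' + \Psi_k$. This is a hypothetical stand-in for the LG estimation (it evaluates the model at given parameters rather than running the EM/Fisher-scoring algorithm), and all names are illustrative:

```python
import numpy as np

def mvn_logpdf(Y, mu, Sigma):
    """Log-density of the rows of Y under N(mu, Sigma)."""
    J = len(mu)
    diff = Y - mu
    L = np.linalg.cholesky(Sigma)
    sol = np.linalg.solve(L, diff.T)
    quad = np.sum(sol ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (J * np.log(2 * np.pi) + logdet + quad)

def step1_posteriors(Y, pis, mus, Lambdas, Psis):
    """Posterior state probabilities p(s_k = 1 | y) and the mixture
    loglikelihood, treating all repeated observations as independent.
    The covariance of state k is Lambda_k Lambda_k' + diag(Psi_k)."""
    logps = np.column_stack([
        np.log(pi_k) + mvn_logpdf(Y, mu_k, L_k @ L_k.T + np.diag(psi_k))
        for pi_k, mu_k, L_k, psi_k in zip(pis, mus, Lambdas, Psis)
    ])
    m = logps.max(axis=1, keepdims=True)            # log-sum-exp trick
    logmix = m[:, 0] + np.log(np.exp(logps - m).sum(axis=1))
    post = np.exp(logps - logmix[:, None])
    return post, logmix.sum()
```

In a full step 1, these posteriors and the loglikelihood would be re-evaluated within every EM iteration while the parameters are updated.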
Step 2: Classification of observations and calculation of the classification error

Once the state-specific MMs have been estimated, in the second step we allocate each observation to one of the $K$ states (see Figure 3). To this end, we create a new variable $w_{it} = (w_{it1}, \ldots, w_{itK})'$ that, similar to $s_{it}$, represents the assignments of the observations to the estimated MMs from step 1. These predicted state memberships are based on the estimated posterior state probabilities $p(s_{itk} = 1 \mid y_{it})$ from step 1, which can be expressed using Bayes' theorem as

$p(s_{itk} = 1 \mid y_{it}) = \frac{p(s_{itk} = 1)\, p(y_{it} \mid s_{itk} = 1)}{\sum_{l=1}^{K} p(s_{itl} = 1)\, p(y_{it} \mid s_{itl} = 1)}.$

Figure 2. Step 1: estimating the measurement model by performing mixture factor analysis. Note that the dependence of the observations is disregarded, which is indicated by the missing arrows between the latent states.

Figure 3. Step 2: assigning states and calculating the classification error.

Footnote 9: Alternatively, one may also use another EM algorithm in the maximization step (e.g., McNicholas, 2016).
Thus, all observations $y_{it}$ belong to each of the $K$ states with a certain probability $p(s_{itk} = 1 \mid y_{it})$. There are two common rules on how to proceed with these posterior state probabilities with regard to the final state assignments. First, "proportional assignment" assigns the states according to the posterior probabilities, such that $p(w_{itk} = 1 \mid y_{it}) = p(s_{itk} = 1 \mid y_{it})$, which leads to a "soft" partitioning. Second, "modal assignment" allocates the weight $p(w_{itk} = 1 \mid y_{it}) = 1$ to the state $k$ with the largest posterior state probability and a zero weight to all other states. Note that we will focus on modal assignment because proportional assignment is unfeasible with a large number of time-points per subject, as it would involve separate weights for all $K^{T_i}$ possible combinations of states in case of classification uncertainty (Di Mari et al., 2016).
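Assuming the posterior state probabilities are available as an $n \times K$ numpy array (one row per observation), the two assignment rules reduce to a few lines (an illustrative sketch, not LG code):

```python
import numpy as np

def modal_assignment(posteriors):
    """One-hot matrix W: each observation gets weight 1 for the state with
    the largest posterior probability and 0 for all other states."""
    n, K = posteriors.shape
    W = np.zeros((n, K))
    W[np.arange(n), posteriors.argmax(axis=1)] = 1.0
    return W

def proportional_assignment(posteriors):
    """Soft partitioning: the posterior probabilities themselves are the weights."""
    return posteriors.copy()
```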
Regardless of the assignment rule, classification error is inherent to any assignment procedure because the largest posterior probability is usually not equal to 1. We have to account for this error because, left uncorrected, it attenuates relationships between variables. On the one hand, this attenuation leads to an underestimation of the relation among the true states $s_{it}$ at two consecutive time-points and, thus, an overestimation of the transition probabilities away from a state (Vermunt et al., 1999). On the other hand, estimating the relationship between the estimated memberships $w_{it}$ and covariates $z_{it}$, instead of using the true states $s_{it}$, causes an underestimation of the covariate effects (Di Mari et al., 2016). Hence, a correction for this attenuation of relationships due to classification error is necessary.
In order to calculate the classification error so that we can account for it in step 3, we have to obtain the probability of a certain state assignment $w_{itm} = 1$ conditional on the true state $s_{itk} = 1$, $p(w_{itm} = 1 \mid s_{itk} = 1)$, for all $k, m = 1, \ldots, K$. These probabilities are collected in the $K \times K$ "classification error probability matrix". They are computed as

$p(w_{itm} = 1 \mid s_{itk} = 1) = \frac{\int p(y_{it})\, p(s_{itk} = 1 \mid y_{it})\, p(w_{itm} = 1 \mid y_{it})\, dy_{it}}{p(s_{itk} = 1)}.$

For the derivation, see Appendix A.1.1. To solve this equation, $p(y_{it})$ can be validly substituted by its empirical distribution (Di Mari et al., 2016; Vermunt, 2010), resulting in

$p(w_{itm} = 1 \mid s_{itk} = 1) = \frac{\frac{1}{\sum_{i} T_i} \sum_{i=1}^{N} \sum_{t=1}^{T_i} p(s_{itk} = 1 \mid y_{it})\, p(w_{itm} = 1 \mid y_{it})}{p(s_{itk} = 1)}.$

Note that another option to solve the integral would be to use Monte Carlo simulation. The larger the probabilities for $m = k$ (corresponding to the diagonal elements of the classification error probability matrix), the better the classification and, thus, the smaller the classification error. Note that the classification error is strongly related to the separation between the states (i.e., how well the latent states are predicted by the observations $Y$; Bakk et al., 2013; Vermunt, 2010). To quantify the separation in any LC analysis, an entropy-based (pseudo) R-squared measure, $R^2_{\text{entropy}}$, is commonly used (Lukočienė et al., 2010; Vermunt & Magidson, 2016; Wedel & Kamakura, 1998). The $R^2_{\text{entropy}}$ value gives the relative improvement in predicting the state membership when using the observations $y_{it}$ compared to predicting the state membership without $y_{it}$. Values range from zero (prediction is no better than chance) to one (perfect prediction). State separation (and hence the classification error) depends on various factors. For example, it increases with a lower number of states, higher factor overdetermination (which is higher in case of fewer factors, more variables, or lower unique variances), and lower between-state similarity (determined by larger differences in the state-specific MMs). The $R^2_{\text{entropy}}$ values for the different settings in our simulation study will be reported in Section "Design and Procedure".
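Given an $n \times K$ array of posterior probabilities and a matching one-hot assignment matrix, the empirical classification error matrix, and one common variant of the entropy-based pseudo R-squared (using the marginal state proportions as the no-information baseline), can be sketched as follows; names are illustrative and this is not the LG implementation:

```python
import numpy as np

def classification_error_matrix(posteriors, W):
    """K x K matrix with entries p(w_m = 1 | s_k = 1): average the product
    of posteriors and assignment weights over all observations (the
    empirical substitute for integrating over p(y)), then divide by the
    estimated state proportions p(s_k = 1)."""
    joint = posteriors.T @ W / len(posteriors)   # p(s_k = 1, w_m = 1)
    p_s = posteriors.mean(axis=0)                # p(s_k = 1)
    return joint / p_s[:, None]

def r2_entropy(posteriors):
    """Entropy-based pseudo R-squared: relative reduction in state
    uncertainty from using y, with the entropy of the marginal state
    proportions as baseline (one common variant of the measure)."""
    eps = 1e-12
    p_s = posteriors.mean(axis=0)
    h_prior = -np.sum(p_s * np.log(p_s + eps))
    h_post = -np.mean(np.sum(posteriors * np.log(posteriors + eps), axis=1))
    return 1.0 - h_post / h_prior
```

By construction every row of the error matrix sums to 1, and crisp posteriors push its diagonal (and the pseudo R-squared) toward 1.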
Step 3: Estimation of the structural model

In the final step, we estimate the SM (i.e., the Markov model with covariates), which is illustrated in Figure 4. The key to correcting for the classification error obtained in step 2 is to express the relationship between the estimated state memberships conditional on the covariates, $p(W_i \mid Z_i)$, and the true state memberships conditional on the covariates, $p(S_i \mid Z_i)$ (Di Mari et al., 2016). Therefore, we consider the joint probability

$p(W_i \mid Z_i) = \sum_{S_i} p(s_{i1} \mid z_{i1}) \prod_{t=2}^{T_i} p(s_{it} \mid s_{i(t-1)}, z_{it}) \prod_{t=1}^{T_i} p(w_{it} \mid s_{it}). \quad (13)$

It can be seen that Eq. (13) resembles the FIML-LMFA model from Eq. (6), marginalized over $S_i$ and with different response probabilities. It is in fact a latent Markov model with the state assignments from $W_i$ as single indicators with $K$ categories replacing the observed item responses $Y_i$. This demonstrates that $Y_i$ is no longer needed in step 3 once we have the classification error probabilities $p(w_{it} \mid s_{it})$. The response probabilities are fixed to the classification error probabilities and thus do not have to be estimated. Hence, the focus of the latent Markov model changes: instead of accounting for unobserved heterogeneity of the MMs (as in FIML-LMFA), the latent Markov model accounts for error in the estimated state assignments $W_i$. In order to estimate the SM, the following loglikelihood function has to be maximized:

$\log L_3 = \sum_{i=1}^{N} \log p(W_i \mid Z_i).$

The estimation, just as in regular FIML-LMFA, is done by means of the forward-backward algorithm (Footnote 11). However, the classification error probabilities are utilized as fixed response probabilities, such that only the (covariate-specific) transition and initial-state probabilities need to be estimated. Note that the state assignments $W_i$ are treated as the manifest (i.e., observed) indicators (that contain error) of the "true" (error-free) latent states $S_i$, which are inferred through the forward-backward algorithm and used to determine the parameters of the SM. Differences between $W_i$ and $S_i$ become less likely for well-separated states with a small classification error.
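The step-3 likelihood for one subject can be illustrated with a small forward algorithm in which the fixed classification error probabilities play the role of the response probabilities. This discrete-time sketch with toy matrices is for intuition only; the continuous-time variant and the covariate-dependent transitions of the full model are omitted, and all names are hypothetical:

```python
import numpy as np

def step3_loglik(w_seq, p_init, P_trans, Q_err):
    """Loglikelihood of one subject's modal state assignments under a
    latent Markov model whose response probabilities are fixed to the
    classification error matrix Q_err (row k: true state, column m:
    assigned state), computed with the scaled forward algorithm."""
    alpha = p_init * Q_err[:, w_seq[0]]          # p(w_1, s_1)
    loglik = 0.0
    for w in w_seq[1:]:
        c = alpha.sum()                          # p(w_t | w_1..t-1) factor
        loglik += np.log(c)
        alpha = (alpha / c) @ P_trans * Q_err[:, w]
    loglik += np.log(alpha.sum())
    return loglik
```

Maximizing this quantity over the initial-state and transition parameters, with `Q_err` held fixed, is what corrects the SM estimates for the classification error.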
Finally, as already discussed in the Introduction, in the third step one evaluates which covariates significantly relate to the transition and/or initial state probabilities. Instead of selecting the best subset of covariates by means of an information criterion, as in the FIML approach, one may start with a model including all covariate candidates or none of them and use Wald (or likelihood ratio) tests to decide which covariates can be removed from or added to the model one by one (e.g., using backward or forward elimination). Note that, as in any statistical model, such data-driven covariate selection procedures have advantages and disadvantages (for a review, see Heinze et al., 2018). When in doubt, one may conduct sensitivity analyses comparing the results from different approaches. When researchers have strong a priori hypotheses about the covariates, a more theory-driven approach may also be considered.
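As a toy illustration of this testing step, a per-parameter two-sided Wald test and a bare-bones backward elimination might look as follows. This is a deliberate simplification: in LMFA a covariate has several slope parameters and would be tested jointly with a chi-square Wald statistic, and the model would be refitted after each removal; names, values, and the cutoff are illustrative.

```python
from statistics import NormalDist

def wald_z_test(estimate, se):
    """Two-sided Wald test for a single parameter: z = estimate / SE,
    p-value from the standard normal distribution."""
    z = estimate / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

def backward_eliminate(effects, alpha=0.05):
    """Toy backward elimination: repeatedly drop the covariate with the
    largest nonsignificant p-value; `effects` maps a covariate name to a
    hypothetical (estimate, se) pair. A real analysis would refit the
    model after every removal, which this sketch omits."""
    kept = dict(effects)
    while kept:
        pvals = {name: wald_z_test(est, se)[1] for name, (est, se) in kept.items()}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] <= alpha:
            break
        del kept[worst]
    return list(kept)
```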

Problem
The aim of the simulation study was to evaluate the performance of 3S-LMFA and to see whether it approaches the performance of FIML-LMFA. The specific target measures were the recovery of the states (i.e., the classification), of the MM parameters, and of the parameters and SEs of the SM consisting of the Markov model with covariate effects. First, parameter and state recovery have previously been shown to be positively influenced by an increasing amount of information (in terms of sample size) and by a higher state-separation (i.e., a better distinction between the states; Bakk et al., 2013; Di Mari et al., 2016; Vermunt, 2010). The more information is available and the more separable the states are, the more accurately mixture factor analysis can estimate the MM parameters in step 1 and the more accurate the estimation of the SM in step 3 will be.
Second, the SEs are possibly slightly underestimated because the error probabilities $p(w_{itm} = 1 \mid s_{itk} = 1)$ are assumed to be known in step 3, although they are actually estimated parameters of the mixture factor analysis in step 1. When the SEs are underestimated, the Wald statistic to test covariate effects may lead to wrong conclusions regarding the statistical significance of covariates. If this underestimation is present, it will likely vanish with a large state separation and amount of information (Di Mari et al., 2016; Vermunt, 2010). In the simulation study, we evaluate whether underestimation is present and from what point on state separation and amount of information are sufficient to obtain trustworthy SE values.

Footnote 11: Note that the third step of 3S-LMFA can be sped up by combining the EM estimation with a Newton-Raphson algorithm, which is extensively described in De Roover, Vermunt, Timmerman, and Ceulemans (2017).
Third, the $R^2_{\text{entropy}}$, and thus the state separation, is higher for FIML-LMFA than for the initial state separation in the first step of 3S-LMFA because the former has additional information from the SM (i.e., the covariates and the states occupied at adjacent time-points), while the latter has information only from the MMs in step 1. Therefore, the recovery of the state memberships is expected to be better for FIML-LMFA. We expect this difference in recovery to decrease when the state memberships are updated in step 3 (i.e., when the SM is also included). However, the degree to which the state-membership recovery in 3S-LMFA approaches the recovery in FIML has to be demonstrated in the simulation study.
Note that the evaluation of the model selection procedures in step 1 (i.e., finding the best number of states, $K$, and factors per state, $F_k$, by means of the BIC and the CHull) and step 3 (i.e., selecting the correct covariates by means of Wald tests, e.g., with backward elimination) is beyond the scope of this paper; these procedures will be used only in the application. As described in Section "Step 1: Estimation of the State-Specific Measurement Models", the BIC and the CHull have already been extensively evaluated for mixture factor analysis. Furthermore, if the simulation study shows that the covariate parameters and their SEs are estimated correctly, we expect that the Wald tests will also correctly identify the significant covariates. However, in Section "Discussion", we will discuss the possibility of inaccurate model selection under a violation of the conditional independence assumption.
We manipulated two key factors: (1) state-separation (including (a) between-state loading differences and (b) intercept differences) and (2) amount of information (including (c) number of subjects and (d) number of participation days per subject). Note that, for selected conditions, we also investigated whether 3S-LMFA might be more affected by ignoring autocorrelation than FIML-LMFA (see Appendix A.2) and whether varying the strength of the covariate effects and the distribution of the covariates across observations or subjects impacted the estimation procedures differently (see Appendix A.3), which was not the case.

Design and procedure
Crossing the manipulated factors resulted in $2 \times 2 \times 4 \times 2 = 32$ conditions (the levels of the manipulated factors are specified below). For the population model, we used an ESM setup, with number of subjects $N$, days $D$, and observations per day $T_{day}$, that is often found in practice (e.g., van Roekel et al., 2019; Van Roekel et al., 2017). Furthermore, we used the unequal time-intervals that are typical for ESM studies and, therefore, employed continuous-time LMFA. The following values were used as constants: number of items $J = 20$, unique variances $e = .2$, number of states $K = 3$, number of factors $F_k = F = 2$, and number of observations per day $T_{day} = 9$. The latter also determined the ESM sampling scheme (comparable to Vogelsmeier, Vermunt, Böing-Messing, et al., 2019): imposing that a sampling day lasts from 9 am to 9 pm, both day and night intervals were on average 12 hours long. The $T_{day} = 9$ measurement occasions during the day lead to intervals of 1.5 hours if the measurement occasions were fixed. To introduce random variation, we let observations deviate from these fixed time-points by means of a uniform distribution with a maximum of plus and minus 30 percent of the fixed 1.5-hour intervals; thus, the deviations were drawn from $\text{Unif}(-0.3 \times 1.5,\, 0.3 \times 1.5)$. For the SM, the initial state parameters were chosen to yield equal probabilities of starting in each state ($\beta_{02} = \beta_{03} = 0$). The transition intercept parameters were specified to be realistic for a short unit interval of 1.5 hours, with high probabilities of staying in a state.
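Returning to the sampling scheme described above, the jittered ESM timestamps can be generated as follows (an illustrative numpy sketch of the described design; function and argument names are hypothetical):

```python
import numpy as np

def esm_timestamps(n_days, t_day=9, start_hour=9.0, interval=1.5, jitter=0.3, rng=None):
    """Timestamps in hours since the start of day 1 for an ESM scheme with
    t_day beeps per day between 9 am and 9 pm: nominal beeps `interval`
    hours apart, each shifted by Unif(-jitter*interval, +jitter*interval)."""
    rng = np.random.default_rng() if rng is None else rng
    days = []
    for d in range(n_days):
        nominal = d * 24.0 + start_hour + interval * np.arange(t_day)
        days.append(nominal + rng.uniform(-jitter * interval, jitter * interval, size=t_day))
    return np.concatenate(days)
```

The resulting within-day gaps vary around 1.5 hours, while the overnight gap is roughly 12 hours, reproducing the unequal intervals that motivate the continuous-time model.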
More specifically, the transition intercepts were chosen such that the probability of staying in the same state over a unit interval was high. The state-separation conditions were: (a) between-state loading differences at two levels (medium vs. low loading differences) and (b) between-state intercept differences at two levels (no vs. low intercept differences). To alter the transition probabilities, we used one time-varying dichotomous covariate ($z_{it1}$), which changed in value after 3 days for $D = 7$ or after 15 days for $D = 30$, and one time-constant dichotomous covariate ($z_{it2} = z_{i2}$) that was randomly assigned to the subjects with equal probabilities. Both covariates took the values $-0.5$ or $0.5$. A higher value of $z_{it1}$ lowered the probabilities of transitioning to and staying in states 1 and 3 while increasing the probabilities of transitioning to and staying in state 2. For instance, this time-varying covariate could represent an intervention that increases the probability to move to and stay in a "healthy state". The corresponding slope parameters were $\gamma_{12, z_{it1}} = \gamma_{32, z_{it1}} = 1$ and $\gamma_{13, z_{it1}} = \gamma_{21, z_{it1}} = \gamma_{23, z_{it1}} = \gamma_{31, z_{it1}} = -0.5$. Furthermore, a higher value of $z_{i2}$ increased the probability of transitioning away from the origin state, leading to a less stable Markov chain. For instance, this covariate could be a trait-like general stability in response behavior that influences all probabilities of transitioning away from the state occupied at the previous time-point. The corresponding slope parameters were $\gamma_{12, z_{it2}} = \gamma_{13, z_{it2}} = \gamma_{21, z_{it2}} = \gamma_{23, z_{it2}} = \gamma_{31, z_{it2}} = \gamma_{32, z_{it2}} = 0.5$. The four resulting transition probability matrices $P_{z_{it1}, z_{it2}}$, one for each combination of the covariate values $\pm .5$, followed from these intercept and slope parameters. Note that the covariate effects appear rather small, but they increase for intervals larger than the unit interval.
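The remark that covariate effects grow with the interval length can be illustrated with the continuous-time machinery: given an intensity matrix $Q$, the transition probabilities over an interval $\delta$ are $P(\delta) = \exp(Q\delta)$. The sketch below uses a hypothetical two-state intensity matrix, not the study's population values, and a simple series-based matrix exponential:

```python
import numpy as np

def expm(A, n_squarings=10, n_terms=20):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor
    series; adequate for the small intensity matrices sketched here."""
    A = A / 2.0 ** n_squarings
    term = np.eye(len(A))
    result = np.eye(len(A))
    for k in range(1, n_terms):
        term = term @ A / k
        result = result + term
    for _ in range(n_squarings):
        result = result @ result
    return result

def transition_probabilities(Q, delta):
    """P(delta) = exp(Q * delta): transition probabilities over an interval
    of length delta implied by intensity matrix Q (rows sum to zero,
    off-diagonal intensities nonnegative)."""
    return expm(Q * delta)
```

For a toy matrix with intensities $q_{12} = 0.2$ and $q_{21} = 0.1$, the off-diagonal entries of $P(2)$ exceed those of $P(1)$, mirroring why modest effects on the unit interval translate into larger effects for longer intervals.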
Regarding the state separation, we used the same conditions as in previous simulation studies evaluating LMFA (Vogelsmeier, Vermunt, Böing-Messing, et al., 2019; Vogelsmeier, Vermunt, van Roekel, & De Roover, 2019). More specifically, we generated data with state-specific MMs as defined in Eq. (1), assuming orthogonal factors (i.e., $f_{itk} \sim MVN(0, I)$). To induce the between-state loading differences, we started from a common base matrix in both states, in which items 1-10 load only on the first factor and items 11-20 only on the second factor:

$\Lambda_{\text{Base}}' = \begin{pmatrix} 1 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & \cdots & 1 \end{pmatrix},$

a binary simple structure that is often found in empirical studies (e.g., consider a typical positive vs. negative affect structure that may also underlie the data in the motivating example described in Section "Motivating Example"). For the medium loading-difference condition, one loading was shifted from the first factor to the second and one from the second to the first (for different items across states). Through this manipulation, the overdetermination of the factors was not affected and thus equal across states. In the example loading matrices $\Lambda_1$ and $\Lambda_2$ of the first two states (Eq. 18), this corresponds to the entries $\lambda_1 = 0$ and $\lambda_2 = 1$. Similarly, for the low between-state loading-difference condition, one cross-loading of $\sqrt{.5}$ was added to the first and second factor (for different items across states), which also lowered the corresponding primary loadings to $\sqrt{.5}$; that is, in the example in Eq. (18), the entries in $\Lambda_1$ and $\Lambda_2$ were $\lambda_1 = \sqrt{.5}$ and $\lambda_2 = \sqrt{.5}$. Finally, row-wise rescaling of the loading matrices led to a sum of squares of $1 - e$ per row. The between-state loading matrix similarity was computed by means of the grand mean, $\varphi_{\text{mean}}$, of Tucker's (1951) congruence coefficient (i.e., $\varphi_{xy} = \frac{x'y}{\sqrt{x'x}\sqrt{y'y}}$, with $x$ and $y$ referring to matrix columns), computed for each pair of corresponding factors (note that $\varphi = 1$ means proportionally identical factors).
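Tucker's congruence coefficient as just defined is straightforward to compute; a small numpy sketch with illustrative names:

```python
import numpy as np

def tucker_congruence(x, y):
    """Tucker's congruence coefficient phi_xy = x'y / (sqrt(x'x) sqrt(y'y));
    phi = 1 indicates proportionally identical factors."""
    return float(x @ y / (np.sqrt(x @ x) * np.sqrt(y @ y)))

def mean_congruence(Lambda_a, Lambda_b):
    """Grand mean of the column-wise congruence of two loading matrices
    with matched (already aligned) factors."""
    return float(np.mean([tucker_congruence(Lambda_a[:, f], Lambda_b[:, f])
                          for f in range(Lambda_a.shape[1])]))
```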
Footnote 13: 1.5 hours pertains to one unit; the other intervals are scaled to this unit interval.

The $\varphi_{\text{mean}}$ across all states and factors was .80 and .94 for the medium and low loading-difference condition, respectively. For the intercepts, we used the following base vector with all 20 values fixed to 5:

$\mu_{\text{Base}} = (5, 5, \ldots, 5)',$
which was used as such in all states for the no-intercept-difference condition. To induce low intercept differences across states, we altered two intercepts to 5.5 (for different items across the states). For example, for the first two states, the vectors were

$\mu_1 = (5.5, 5.5, 5, 5, \ldots, 5)'$ and $\mu_2 = (5, 5, 5.5, 5.5, 5, \ldots, 5)'.$

The combination of the between-state loading-difference and intercept-difference conditions led to four different state-separation conditions. To quantify the general state-separation in both analyses based on the population values, we calculated the four $R^2_{\text{entropy}}$ values for step 1 of 3S-LMFA, where information is only obtained from the MM, and for FIML-LMFA, where information is retrieved from both the MM and the SM including the two covariates (see Footnote 14). For FIML-LMFA, starting from the smallest $R^2_{\text{entropy}}$, the resulting values amounted to .90 for the low loading difference/no intercept difference condition, .94 for the medium loading difference/no intercept difference condition, .96 for the low loading difference/low intercept difference condition, and .97 for the medium loading difference/low intercept difference condition. For the same conditions in the first step of 3S-LMFA, these values were respectively equal to .52, .65, .76, and .82. Thus, as expected, the state-separation is initially lower in 3S-LMFA than in FIML-LMFA, showing the importance for 3S-LMFA of including the information from the SM in step 3 (see Footnote 15). For each condition, we generated 100 datasets in R (R Core Team, 2020) according to the described population models and analyzed them in LG. Note that only one syntax file is required for FIML-LMFA, but two files are necessary for 3S-LMFA: the first runs steps 1 and 2, whereby the posterior state assignments and the classification error probability matrix are saved; these are subsequently loaded in the second syntax file, which is required for step 3.

Results
In the following, we evaluate the performance of 3S-LMFA and compare it to the results of FIML-LMFA based on the replications that converged in both steps of the 3S method as well as in the FIML method. Replications that did not converge were re-estimated once and were excluded if convergence still failed. After re-estimation, 3180 out of 3200 datasets converged in 3S- and FIML-LMFA (all datasets converged in steps 1 and 3 of 3S-LMFA and 3180 in FIML-LMFA). Non-convergence in FIML was almost exclusively present in the smallest amount-of-information condition (i.e., $N = 30$ and $D = 7$) and was caused by reaching the maximum number of EM iterations without convergence. Furthermore, we re-estimated the replications of converged results that showed unrealistically large SEs due to boundary values for any of the estimated initial state and transition parameters (i.e., with an SE > 10, such as 100, 400, or 1000) because including such cases would distort the results. This was the case for only 56 datasets in the third step of 3S-LMFA, where re-estimation did not help. As a result, 3124 datasets were included in the performance analyses reported below (see Footnote 16).

Goodness of state recovery
The recovery of the states was assessed by means of the Rand Index (RI) as well as the Adjusted Rand Index (ARI; Hubert & Arabie, 1985). Both indices evaluate the overlap between two sets of elements while being insensitive to permutations of the element labels (in our case, the state labels). The RI ranges from 0 (no overlap for any of the pairs) to 1 (perfect overlap), and the ARI takes values from around 0 (overlap is no better than chance) to 1 (perfect overlap). As expected, the state recovery was rather poor after the first step of 3S-LMFA because of the low $R^2_{\text{entropy}}$ values (RI = .83, $SD_{RI}$ = .06; ARI = .61, $SD_{ARI}$ = .14). However, the overall recovery in 3S-LMFA was excellent (Steinley, 2004; RI = .94, $SD_{RI}$ = .03; ARI = .87, $SD_{ARI}$ = .06) and almost as high as in FIML-LMFA (RI = .97, $SD_{RI}$ = .01; ARI = .94, $SD_{ARI}$ = .03). Moreover, only the state-separation influenced the state recovery, in that a larger separation increased the recovery (Table 1).

Footnote 14: The population $R^2_{\text{entropy}}$ value for a specific choice of population parameters and number of measurement occasions was obtained using Monte Carlo simulation. For this purpose, we used the "Monte Carlo simulation study" option in LG with one random draw of the time-intervals and covariate patterns and with the parameters fixed to their population values.

Footnote 15: Note that it is always good to check the $R^2_{\text{entropy}}$ after step 1, which is automatically provided in LG, because for a very small state-separation, say with a value much lower than 0.5, it might be better to conduct a FIML-LMFA with additional state-separation information from the SM (including covariates). In that case, the actual differences between the states might be even smaller than the estimated ones, which would lead to an underestimation of the classification error (Vermunt, 2010). However, such low values are unlikely to be found in practice.
Footnote 16: Note that we also investigated whether the solutions converged to local maxima (i.e., whether they had smaller logL values than the global maximum likelihood (ML) solution). Although the latter is unknown, an approximation ("proxy") can be obtained in simulation studies by estimating the models with the population parameters as starting values. When $\log L_{\text{multistart}} < \log L_{\text{proxy}}$, the solution is considered a local maximum. This was no issue in FIML-LMFA and the first step of 3S-LMFA and occurred for only 3 datasets in the third step of 3S-LMFA.
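The RI and ARI used for the state recovery can be computed from the contingency table of two partitions; a self-contained numpy sketch (illustrative, not the implementation used for the study):

```python
import numpy as np
from math import comb

def rand_indices(labels_a, labels_b):
    """Rand Index and Adjusted Rand Index of two partitions, computed from
    their contingency table; both are invariant to label permutations."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    n = len(a)
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=int)
    for i, j in zip(a, b):
        table[i, j] += 1
    pairs_both = sum(comb(int(c), 2) for c in table.ravel())   # same cluster in both
    pairs_a = sum(comb(int(c), 2) for c in table.sum(axis=1))  # same cluster in A
    pairs_b = sum(comb(int(c), 2) for c in table.sum(axis=0))  # same cluster in B
    total = comb(n, 2)
    ri = (total + 2 * pairs_both - pairs_a - pairs_b) / total
    expected = pairs_a * pairs_b / total
    max_index = (pairs_a + pairs_b) / 2
    ari = (pairs_both - expected) / (max_index - expected)
    return ri, ari
```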

Goodness of MM parameter recovery
Goodness of loading recovery. We computed a goodness of state-loading recovery (GOSL) statistic as the average Tucker congruence coefficient between the true and the estimated loading matrices:

$\text{GOSL} = \frac{\sum_{k=1}^{K} \sum_{f=1}^{F_k} \varphi_{\lambda_{kf} \hat{\lambda}_{kf}}}{\sum_{k=1}^{K} F_k},$

where $\lambda_{kf}$ corresponds to the state- and factor-specific loadings. By using Procrustes rotation (Kiers, 1997) to rotate the estimated state-specific loading matrices $\hat{\Lambda}_k$ toward the true ones $\Lambda_k$, we resolved the switching of the factor labels within the states (see Footnote 17). Furthermore, to handle the switching of the state labels, we retained the state permutation that maximized the GOSL value. Overall, the loading recovery was very good in 3S-LMFA (GOSL = 1, SD = 0) and was the same for FIML-LMFA. Note that the loading recovery can be good despite a poor state recovery because the loading matrices are very similar across states.
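The GOSL computation, including the Procrustes rotation toward the true loadings, can be sketched as follows (numpy, illustrative; the paper's rotation was carried out in R, and the search over state permutations is omitted here):

```python
import numpy as np

def procrustes_rotate(Lambda_est, Lambda_true):
    """Orthogonal Procrustes rotation of the estimated loadings toward the
    true ones: minimizes ||Lambda_est @ T - Lambda_true||_F over orthogonal
    T, obtained from the SVD of Lambda_est' Lambda_true."""
    U, _, Vt = np.linalg.svd(Lambda_est.T @ Lambda_true)
    return Lambda_est @ (U @ Vt)

def gosl(Lambdas_true, Lambdas_est):
    """Average Tucker congruence between true and (rotated) estimated
    state-specific loading columns; state label switching is assumed to
    be resolved already."""
    phis = []
    for L_true, L_est in zip(Lambdas_true,Lambdas_est):
        L_rot = procrustes_rotate(L_est, L_true)
        for f in range(L_true.shape[1]):
            x, y = L_true[:, f], L_rot[:, f]
            phis.append(x @ y / np.sqrt((x @ x) * (y @ y)))
    return float(np.mean(phis))
```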
Goodness of intercept recovery and unique variance recovery. To examine the recovery of the intercepts and the unique variances, we calculated the mean absolute difference (MAD) between the true and the estimated parameters. The overall intercept recovery in 3S-LMFA was very good ($\text{MAD}_{\text{int}}$ = 0.02, SD = 0.01) and did not differ from the recovery in FIML-LMFA. The same applied to the unique variance recovery ($\text{MAD}_{\text{unique}}$ = 0.01, SD = 0.00) (see Footnote 18). Moreover, only the amount of information had a marginal effect on the two types of recovery, in that the largest number of subjects ($N = 90$) and a higher number of participation days ($D = 30$) slightly improved the recovery in both analyses (Table 1).

Goodness of SM parameter recovery

Goodness of transition and initial state parameter recovery. To evaluate the recovery of the transition and initial state parameters, we calculated the average bias and the average root-mean-square error (RMSE) for the individual parameters of the four parameter types (i.e., initial state and transition intercept parameters and the two slope parameters for the covariates; Table 2). As can be seen, the bias in 3S-LMFA is generally very small (i.e., between −0.02 and 0.01) and in line with FIML-LMFA. However, the RMSE is generally higher in 3S-LMFA (e.g., RMSE = 0.39 for the first initial state intercept parameter in 3S-LMFA versus RMSE = 0.36 for the same parameter in FIML-LMFA) because the step-wise procedure implies some loss of information. Moreover, Table 3 illustrates the effects of the manipulated factors for the four parameter types, averaged across the individual parameters for the sake of brevity and illustrative purposes. The manipulated factors influenced the bias and the RMSE in 3S-LMFA similarly to FIML-LMFA: a higher amount of information generally decreased the bias, while a larger state-separation only marginally decreased the bias for some of the individual parameters; furthermore, both a higher state-separation and a higher amount of information decreased the RMSE.

Goodness of covariates' SE recovery. To examine the SE recovery, we compared the average estimated SE over all 100 replications within a condition with the SD of the parameter estimates across these replications and calculated the SE/SD ratios for the individual parameters of the four parameter types (Table 2). The ratios are generally slightly lower than 1 in 3S-LMFA, with values ranging from 0.95 to 1.02, indicating that the SEs are slightly underestimated. However, this is similar in FIML-LMFA, with values ranging from 0.97 to 1.02. Moreover, the manipulated factors had no clear impact on the SE recovery in either analysis, as the four parameter types were influenced differently by a higher state-separation and a higher amount of information.

Footnote 17: Note that the rotation was done in R. Although rotation in LG was already possible for known groups, the issue of switching state labels has to be resolved to provide LG with the correct state-specific target matrices before rotation can be applied to unknown groups such as the states.

Footnote 18: The unique variance recovery may be affected by Heywood cases (i.e., improper factor solutions with at least one unique variance being negative or equal to zero, possibly caused by an insufficient amount of information or underdetermined factors; Van Driel, 1978). However, this was not the case in any of the analyses.

Computation time
Exploring the computation time of all replications in the two analyses, we found that, at 178.01 seconds on average, FIML-LMFA took more than twice as long as 3S-LMFA, for which the total computation time was 82.42 seconds (45.37 seconds for step 1 and 37.05 seconds for step 3). It should be noted that we used 25 random start sets with an EM tolerance of 1e-5 in FIML-LMFA and in steps 1 and 3 of 3S-LMFA. However, one start set and a tolerance of 0.01 are probably enough in the third step of 3S-LMFA because local maxima are very unlikely when the measurement part is fixed. Adjusting these values accordingly makes the computation even faster.

Conclusion
In summary, the parameter and SE recovery in 3S-LMFA approached the recovery in FIML-LMFA, making the 3S procedure a promising, fast alternative when the inclusion of covariates is of interest and FIML estimation is hence likely unfeasible. Although a small loss of information could be observed in 3S-LMFA, in terms of higher RMSE values for the parameters of the SM and a slightly worse state recovery, the general parameter recovery in 3S-LMFA was on average as good as in FIML-LMFA, while the estimation was much faster.

Application
To illustrate the empirical value of the 3S-LMFA approach, we applied it to the ESM data introduced in Section "Motivating Example". Note that this application is only meant to illustrate the possibilities of the new methodology; since the hypotheses were not preregistered, we consider these analyses exploratory (see Footnote 19). As previously described, we investigated which MMs underlie which parts of the data and how the MMs differ (step 1), and whether the MMs are related to covariates (step 3). From all covariates offered in the dataset, we included only five that we considered plausible influences on MM differences/changes and of interest for this application. Because emotional experiences may vary depending on situational influences (Dejonckheere, Mestdagh, et al., 2021), and adolescents spend most of their time with parents and friends (Larson, 1983; van Roekel et al., 2015), we chose the following three time-varying covariates for the social context: (1) being alone (nominal), (2) being with a friend (nominal), and (3) being with a parent (nominal). From the baseline measurement, we chose the following two time-constant covariates: (1) emotion clarity deficit, measured with the Emotion Clarity Questionnaire (ECQ; Flynn & Rudolph, 2010) on a Likert scale from 1 (totally disagree) to 5 (totally agree) (e.g., "I often have a hard time understanding how I feel."), and (2) differentiation of emotional experience, assessed via a subscale of the Range and Differentiation of Emotional Experience Scale (RDEES; Kang & Shaver, 2004) on a Likert scale from 1 (totally disagree) to 7 (totally agree) (e.g., "I am aware that each emotion has a completely different meaning."). These baseline questionnaires can be found in the Online Supplements S.1 and S.2 (see Footnote 20). In step 1, we investigate which MMs underlie the data by performing mixture factor analysis, including the model selection procedure.
Given the relatively small number of observations (T_i × I = 1168) and items (J = 10), we only considered models with 1–3 states and 1–3 factors per state. The best-fitting model according to the CHull method was a two-state model with two factors in the first state and one factor in the second state ("[2 1]"). We provide more information about the selection procedure in Appendix A.6. The state separation was very high (entropy R² = 0.98). About 60 percent of the observations were classified into state 1 and 40 percent into state 2. We will inspect the differences between the MMs step by step, starting with (1) the loadings, followed by (2) the intercepts, and (3) the proportions of unique variances,

19. Note that the NA items were generally right-skewed. Since the consequences of violating the normality assumption have yet to be investigated, one should be particularly cautious when drawing substantive conclusions (Vogelsmeier, Vermunt, van Roekel, et al., 2019). This is, however, not a problem for illustrating the purpose of 3S-LMFA.
20. Note that three subjects in the ESM study did not have any baseline measures for an unspecified reason. For such cases, LG automatically imputes the average scale score.
which are all given in Table 4. First, looking at the standardized (and, in state 1, obliquely rotated) loadings, we can see that state 1 is characterized by two independent positive affect (PA) and negative affect (NA) factors that are hardly correlated (r = 0.07), indicating that adolescents in this state differentiate positive and negative emotional experiences. In contrast, the dimensions seem to collapse in state 2, which is characterized by a single ("bipolar") dimension "PA versus NA". Moreover, it is noticeable that the item "miserable" has a high loading in state 2 but not in state 1. Finally, the rather low loadings of the negative emotions in state 2 indicate that their relation to the general score on the latent factor is weaker than for the positive emotions.
Second, the intercepts in state 1 are rather high for the positive emotions and very low for the negative emotions. In state 2, the intercepts for the positive emotions are somewhat lower and the intercepts for the negative emotions somewhat higher. Note that, as explained in Section "Measurement part", intercept differences pertaining to all items that are strongly related to a certain factor are probably due to differences in the factor means.
Third, investigating the proportions of unique variances, it appears that two of the positive emotions, "proud" and "lively", have something unique that cannot be explained by the PA dimension/end of the scale in either of the two states. Comparing them with the other positive emotions, one can imagine that their scores at least partly depend on specific encountered events (e.g., "proud" may be elicited by achievements and "lively" is more likely to occur during high-energy activities). Moreover, "miserable" has a large unique variance in state 1 and therefore, also considering the low loading, is hardly related to the other emotions. It could be that the item is not always suited to assess affect in adolescents, as it is an emotion that is likely triggered by rather extreme situations that might not have been encountered by the adolescents in the ESM study. Finally, the negative emotions in state 2 have higher unique variances than in state 1, indicating that there is less covariance between them and that there is no large covariance with the positive emotions.
There is a theoretical debate about whether positive and negative affect are two independent factors (Watson & Tellegen, 1985) or two bipolar ends of the same factor (Russell, 1980). However, our results suggest that both theoretical perspectives can be true at different points in time within one individual. In the first state, adolescents are capable of differentiating positive and negative emotional experiences ("independent state"). In contrast, the factor structure in the second state may result from adolescents' simplistic representation of either having "positive" or "negative" emotions ("bipolar state"). These findings are in line with recent research suggesting that both theoretical perspectives can be true, depending on person-specific factors (e.g., Dejonckheere et al., 2019) or situation-specific factors (e.g., Dejonckheere, Mestdagh, et al., 2021). Regarding the intercept differences, we conclude that being in a rather good mood is related to the independent state and being in a rather unpleasant mood is related to the bipolar state. This is in line with research indicating that the bipolar state is more common in individuals with depression (Dejonckheere et al., 2018) and in individuals who are stressed (Dejonckheere, Mestdagh, et al., 2021; Zautra et al., 2002).
(Note to Table 4: factor loadings were standardized by dividing them by the state-specific item standard deviations. We considered loadings larger than 0.3 in absolute value to be substantial (e.g., Hair et al., 2014); these loadings are depicted in boldface.)

In order to better understand what triggers the different states, we investigated the influence of the five covariates. First, based on the posterior probabilities of the observations belonging to the state-specific MMs, we obtained the modal state memberships and the classification errors (step 2). Given the high state separation, the classification errors were very small:

p(w_it | s_it) = [ .9968  .0032
                   .0080  .9920 ].

Therefore, correcting for the classification error is hardly necessary here, which generally cannot be foreseen before conducting the step-1 analysis. The modal state assignments were subsequently used as indicators to estimate the Markov model with covariates on the transition probabilities (step 3). 21 By means of stepwise backward selection with the five covariates, we eliminated the least significant covariate at each step until only covariates remained that met the criterion of α < 0.05. The final model contained the two time-constant covariates from the baseline measure and the time-varying covariate being with a parent. Note that, due to the low classification errors, the final state memberships (i.e., 60% in state 1 and 40% in state 2) did not change after step 1. In Table 5, we present the parameters of the SM (including the Wald test statistics). To see the covariate effects more easily, we also present the transition probabilities for a two-hour interval, which was the most frequently encountered interval length in the data. We calculated them for the highest and the lowest score on one covariate, respectively, while setting the values of the other covariates to their average in the sample (averages are given in the notes of Table 5).

Table 5. Logit and log-intensity parameters for the structural model and additional transition probabilities for the lowest and highest possible score on the three covariates for a two-hour interval.

Note that the effect of being with a parent was so small that we do not discuss it further. Regarding the time-constant covariates (emotion clarity deficit and differentiation of emotional experience), we can see that adolescents with a high emotion clarity deficit have a slightly higher probability of staying in or transitioning to the bipolar state (i.e., are more likely to be in that state) than adolescents with a low emotion clarity deficit, who are about equally likely to be in either of the states. Moreover, adolescents with a high differentiation of emotional experiences have a slightly higher probability of staying in or transitioning to the independent state than adolescents with a low differentiation of emotional experiences, who are equally likely to be in either of the states. From the individual transition plots of the adolescents (see Figure 5 for six representative examples), we can clearly see between-person differences (which are apparently partly related to clarity and differentiation of emotions). For instance, some adolescents are mainly in the independent state (row 1) and others mainly in the bipolar state (row 2). However, we can also see some adolescents with frequent transitions between the states (right panel in row 3) and some with transitions after a certain number of completed questionnaires (left panel in row 3). These transitions indicate that there are likely time-varying within-person variables influencing the transitions that we are not aware of. Therefore, in the future, it would be interesting for applied researchers to include time-varying covariates in their ESM studies (e.g., stress; Dejonckheere, Mestdagh, et al., 2021; Zautra et al., 2002) that could potentially influence within-person changes between a bipolar and an independent representation of one's emotional state. To conclude, LMFA indicated that configural invariance was violated across states and that some subjects transitioned between the two states frequently over time while others stayed mainly in one of the two states. Therefore, the questionnaire data are not validly comparable across all subjects and time-points.
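The step-2 computation of modal assignments and the classification error probabilities p(w|s) can be sketched as follows (a hypothetical illustration following Vermunt's (2010) ML approach, assuming the step-1 posteriors are available as a NumPy array; this is not the Latent GOLD implementation):

```python
import numpy as np

def classification_errors(post):
    """post: (N, K) posterior state probabilities from step 1.
    Returns the modal state assignments w and the K x K matrix of
    classification errors p(w | s), with rows indexing the true state s
    and columns indexing the assigned state w."""
    w = post.argmax(axis=1)              # modal state assignments
    K = post.shape[1]
    D = np.zeros((K, K))
    for k in range(K):
        # posterior mass of each true state among observations assigned to k
        D[:, k] = post[w == k].sum(axis=0)
    D /= D.sum(axis=1, keepdims=True)    # normalize rows: sum_w p(w|s) = 1
    return w, D
```

With a state separation as high as in these data, the resulting matrix is close to the identity matrix, so the step-3 correction barely changes the estimates.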

Discussion
In this article, we tailored Vermunt's (2010) maximum likelihood (ML) three-step (3S) procedure to latent Markov factor analysis (LMFA), a method to explore measurement model (MM) changes over time, and showed that the resulting 3S estimation of LMFA (3S-LMFA) is a promising alternative to the original full information maximum likelihood (FIML) estimation of LMFA (FIML-LMFA): 3S-LMFA performs almost as well as FIML-LMFA, is more accessible and intuitive for applied researchers, and facilitates estimation when researchers want to explore the influence of different (sets of) covariates on transitions between MMs.
It is important to note that this article is one of the first to apply a 3S approach with a continuous-time Markov model to intensive longitudinal data, a type of data that is becoming increasingly popular in different fields with diverse data characteristics. On top of that, the flexible stepwise nature of 3S-LMFA makes the method easy to extend. Specifically, it is easy to adjust the method to the data and research questions at hand by exchanging the first step (i.e., the mixture factor analysis), which makes it applicable to a wide range of data. For example, one may consider item response theory models for longitudinal categorical data. If it is not possible to estimate the first step in Latent GOLD (LG), one can also estimate the first step in a different program and only communicate the results to LG to continue with the second and third steps. The same will soon be possible in the open-source program R, as we are working on a package that takes estimated state probabilities from any step-1 model (estimated in R or another program) as input, calculates modal state assignments and the classification errors, and links them to an existing package that can estimate single-indicator (continuous-time) Markov models with fixed response probabilities. Although adaptations of the MMs are also possible in FIML-LMFA, they are much more difficult in practice, since a specific part of the estimation procedure inside the LG software would have to be adapted, which is not possible for applied researchers but only for the software programmer.
A limitation of the current paper is that we did not examine the performance of 3S- and FIML-LMFA under violation of the conditional independence assumption: we assumed the covariates to influence only the parameters in the structural model (SM), that is, the transitions in the Markov model, and not the factors or the observed variables directly. This assumption might be violated (e.g., being with friends might be related to higher positive affect), which might lead to extracting a wrong number of states and inaccurate parameter estimates (Kim et al., 2016; Kim & Wang, 2017; Masyn, 2017; Nylund-Gibson & Masyn, 2016). As in any other mixture model approach with covariates, the problem of model misspecification is inherent to both the FIML and the 3S estimation and should be extensively studied in the specific context of LMFA. With regard to extracting the correct number of states, it can be expected that 3S-LMFA performs better than FIML-LMFA when the effects of the covariates on the latent state memberships are included and direct effects of these covariates (for example, on the response variables) are falsely omitted. In the first analysis step of 3S-LMFA, the MMs are formed while disregarding the covariates; therefore, the covariates do not affect the state enumeration. This is different in FIML-LMFA, where covariates may affect the state enumeration. Specifically, if the local independence assumption is violated, FIML-LMFA would require too many states to counter the local independence violation and achieve a good model fit (Kim & Wang, 2017; Nylund-Gibson & Masyn, 2016). However, inaccurate covariate estimates could occur with both estimation approaches (Asparouhov & Muthén, 2014; Kim et al., 2016; Masyn, 2017).
Therefore, it is important to develop diagnostic tools to detect misspecification (e.g., by means of residual statistics) and to account for it, possibly by including the respective covariates with direct effects on the response variables in step 1 of the analysis and by using covariate-specific classification-error adjustments in step 3 (Vermunt & Magidson, 2021). However, tailoring these methods to LMFA is beyond the scope of this paper.
Another limitation is that the states capture not only dynamics in the MM but also in the factor means. If additional states are selected because of differences in the factor means, it is difficult to identify whether covariates are related to the states only because they explain differences in the MMs or (also) because they explain dynamics in the factor means. In the future, it would be relevant to investigate under which circumstances additional states are selected for factor mean differences and, if necessary, to look into possibilities to lower the impact of factor means on the state formation, for example, by adding random effects to the factor means (which are fixed to zero). This would capture part of the factor mean dynamics and increase the role of other parameter differences in the state formation. Significant covariate effects would then be mainly attributable to differences in the MM parameters.

Article information
Conflict of interest disclosures: Each author signed a form for disclosure of potential conflicts of interest. No authors reported any financial or other conflicts of interest in relation to the work described.
Ethical principles: The authors affirm having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.
Role of the funders/sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
A.2. Additional simulation study: Autocorrelated factor scores

A.2.1. Problem
In order to investigate whether ignoring autocorrelated factor scores is more harmful for the performance of 3S-LMFA than for FIML-LMFA (Vogelsmeier, Vermunt, van Roekel, & De Roover, 2019), we conducted a simulation study with selected conditions from the main simulation study (a–d) and additionally manipulated the autocorrelation (e). More specifically, we kept the state-separation conditions (a) and (b), as they had considerable effects on performance in the main simulation study (Section "Simulation Study"), but fixed the conditions pertaining to the amount of information (c and d) to one level each, as these conditions had only minor effects on performance. For the size of the autocorrelation, we used the coefficients suggested by Cabrieto et al. (2017), which were also used in the simulation study investigating the effect of ignoring autocorrelation in FIML-LMFA (Vogelsmeier, Vermunt, van Roekel, & De Roover, 2019).

A.2.2. Design and procedure
The new conditions were the following; the conditions marked with "*" are now fixed to one value of the manipulated conditions from the main simulation study. This design resulted in 2 × 2 × 1 × 1 × 3 = 12 conditions. The data generation was the same as in the main simulation study (again with 100 replicates). However, instead of using the regular orthogonal factor model shown in Eq. (1), we used an orthogonal dynamic factor model, where the factor scores at time-point t are correlated with the factor scores at t − 1 by the coefficient φ (e):

f_it = φ f_i(t−1) + e_it,   (A10)

where e_it ~ MVN(0, I) is a subject- and time-point-specific F_k × 1 noise vector. The correlated factor scores f_itk were generated by means of a recursive filter (Hamilton, 1994); that is, the first factor scores are set equal to the noise elements e_i1 and the remaining scores are computed as in Eq. (A10). In order to retain the expected variance of 1, we multiplied the resulting factor scores by √(1 − φ²) (De Roover et al., 2014). Note that we computed the average autocorrelation across all datasets belonging to the same condition to see how the manipulation played out; the autocorrelations were −0.02, 0.26, and 0.64.

A.2.3. Results
The RMSE was slightly higher for the transition intercepts. Thus, the autocorrelation appears to be partially captured by the step-3 latent state transitions. However, the effect of the autocorrelation on the parameter estimation is negligible.
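The recursive filter with the variance correction can be sketched as follows (a simplified illustration of the data-generation step under the assumptions above, not the exact simulation code):

```python
import numpy as np

def ar1_factor_scores(T, F, phi, rng):
    """Generate a T x F matrix of factor scores with lag-1 autocorrelation
    phi via a recursive filter, rescaled so the stationary variance is 1."""
    e = rng.standard_normal((T, F))     # noise e_t ~ MVN(0, I)
    f = np.empty((T, F))
    f[0] = e[0]                         # first scores equal the noise
    for t in range(1, T):
        f[t] = phi * f[t - 1] + e[t]    # recursive filter, Eq. (A10)
    # the unscaled process has stationary variance 1 / (1 - phi^2),
    # so multiplying by sqrt(1 - phi^2) restores unit variance
    return np.sqrt(1.0 - phi ** 2) * f
```

For a long series, the sample lag-1 autocorrelation of the generated scores is close to phi and the sample variance is close to 1.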

A.3. Additional simulation study: Covariate effects and distributions

A.3.1. Problem
In order to test whether non-uniform covariate distributions and the strength of the covariate effects influence the performance of 3S- and FIML-LMFA differently, we repeated selected conditions from the main simulation study and additionally manipulated the strength of the covariate effects (e) and the distribution of the covariates (f). More specifically, we selected the conditions that affected performance in the main simulation study the most (Section "Simulation Study"). This implied keeping the state-separation conditions (a and b) while fixing the conditions pertaining to the amount of information (c and d) to one level each.

A.3.2. Design and procedure
The new conditions were the following; the conditions marked with "*" are now fixed to one value from the main simulation study. This design resulted in 2 × 2 × 1 × 1 × 3 × 3 = 36 conditions. We generated the data as in the main simulation study (again with 100 replicates). However, the effects of the time-varying covariate z_it1 and the time-constant covariate z_i2, as well as their distributions across observations and/or subjects, differed depending on factors (e) and (f). First, with regard to the strength of the covariate effects, a higher value for z_it1 still lowered the probabilities of transitioning to and staying in states 1 and 3 and increased the probabilities of transitioning to and staying in state 2, but with slope parameters equal to γ12,z_it1 = γ32,z_it1 = s and γ13,z_it1 = γ21,z_it1 = γ23,z_it1 = γ31,z_it1 = −s. Furthermore, a higher value for z_i2 still increased the probability of transitioning away from the origin state, but with the corresponding slope parameters likewise scaled by s. The parameter s was either 0.25, 0.5, or 1 (see factor e). Next, with regard to the distributions of the covariate scores −0.5 and 0.5, we included conditions with a uniform distribution (i.e., "50/50") as well as a "70/30" and a "30/70" condition. The time-varying covariate z_it1 was assigned such that the score changed from −0.5 to 0.5 after 5 of the 7 days in the 70/30 condition and after 2 days in the 30/70 condition. To obtain exactly a 50/50 condition, the scores changed after 3 days for the first half of the subjects and after 4 days for the other half. For the time-constant covariate z_i2, the scores −0.5 and 0.5 were randomly selected with probabilities equal to the three distribution levels (i.e., 70/30, 50/50, or 30/70).
Note that we included the 70/30 and 30/70 conditions to prevent possible confounding of the results: the covariate scores influence the transition probabilities (i.e., the state memberships become more or less stable), and a higher stability of the state memberships previously showed a positive influence on the recovery of the states in FIML-LMFA (Vogelsmeier, Vermunt, van Roekel, & De Roover, 2019). For instance, a covariate score of −0.5 on both covariates would lead to a slightly more stable transition probability matrix than a covariate score of 0.5 on both covariates (e.g., with an average probability of 96% versus 92% of staying in a state with s = 1 and a one-unit interval). Note, however, that this difference is so small that it might not affect the performance.
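To illustrate how covariate scores translate into transition probabilities, consider a minimal discrete-time multinomial-logit sketch (with hypothetical intercepts and slopes, not the simulation's actual values):

```python
import numpy as np

def transition_row(origin, gamma0, gamma_z, z):
    """One row of a multinomial-logit transition matrix.
    origin: index of the current state (reference category, logit fixed to 0);
    gamma0: (K,) intercepts; gamma_z: (K,) covariate slopes; z: covariate score.
    Returns the (K,) vector of transition probabilities from the origin state."""
    logits = gamma0 + gamma_z * z
    logits[origin] = 0.0                 # reference-category coding
    p = np.exp(logits)
    return p / p.sum()                   # softmax over destination states
```

A positive slope for a destination state raises the probability of transitioning to (or staying in) that state as the covariate score increases, mirroring the role of s above.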

A.3.3. Results
The results can be found in Table A3. The state and MM recovery of 3S- and FIML-LMFA were largely unaffected by the strength of the effect and the distribution of the covariates and will therefore not be discussed further. With regard to the SM, there was only a very small effect on the RMSE, and it was the same for both estimation procedures. First, the RMSE was slightly higher for the strongest covariate effect (i.e., s = 1). This is likely due to the somewhat larger SE values that are inherent to larger logit parameters. Second, the RMSE for the transition intercepts and transition slopes was slightly higher for non-uniform covariate distributions, which is likely caused by the general loss of information when covariate scores are not uniformly distributed.

Step 3 syntax of 3S-LMFA
Note that the step-3 syntax below is only one option to estimate the third step. Instead of calculating the classification error probability matrix manually and inserting it into the syntax ('w = …') to tell LG that the matrix should be used as the fixed response probability matrix, it is also possible to use the "step3" option in LG ('step3 ml modal;'). When using this option, LG automatically calculates the classification error probability matrix from the input file (i.e., the step-1 posterior probabilities and the modal state assignments, here 'classification1.csv') and uses it as the fixed response probability matrix. However, when using the step3 option, LG does not yet provide the user with the final latent state assignments. This is because the classification is often not the primary focus of interest in other three-step analyses, where researchers rather focus on parameter estimates such as covariate effects. Since classification is certainly of interest in LMFA, we suggest using the manual syntax version.

For the model selection, we ran all models five times to see whether the maximum likelihood solutions were indeed global solutions.
We considered the solutions to be global when the absolute differences between the loglikelihood values of the five solutions were all smaller than 0.01. As a result, 11 out of 19 models were passed to the model selection procedure with the CHull method, which was conducted with the R package "multichull" (note that we also performed a sensitivity check by running the CHull test including possible local optima; the selected model was always the same). The CHull can be considered an automated generalized scree test (Bulteel et al., 2013; Ceulemans & Kiers, 2006; Ceulemans & Van Mechelen, 2005). The method identifies the models in a "loglikelihood versus number of parameters" plot that lie on the upper boundary of the convex hull (Cattell, 1966) and selects the optimal model by evaluating the elbow in the scree plot (i.e., the point where the improvement in fit with additional parameters levels off). During the CHull procedure, following the recommendation of Wilderjans et al. (2013), we discarded models whose fit was almost equal to that of a less complex model (i.e., when it fitted less than 1 percent better than the less complex model, which is also the default value in the R package). The model with two states and, respectively, 2 and 1 factors ("[2 1]") was the best (see output in A.6.2). The second-best model was the model with two states and 1 factor in both states ("[1 1]"). From the grouping of points corresponding to the different numbers of states, it can also be seen that the improvement in fit from 1 to 2 states is much larger than that from 2 to 3 states.
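The scree-ratio logic behind the CHull can be sketched as follows (a simplified version that assumes the candidate models are already sorted by the number of parameters and lie on the upper convex hull; the multichull package additionally constructs the hull and applies the 1-percent rule):

```python
def chull_select(models):
    """models: list of (npar, loglik) pairs, pre-sorted by npar and assumed
    to be on the upper convex hull. Returns the interior model with the
    highest scree ratio, i.e., the fit gain per parameter before the model
    divided by the fit gain per parameter after it (the 'elbow')."""
    best, best_ratio = None, -1.0
    for i in range(1, len(models) - 1):
        p0, l0 = models[i - 1]
        p1, l1 = models[i]
        p2, l2 = models[i + 1]
        gain_before = (l1 - l0) / (p1 - p0)   # fit gain per extra parameter
        gain_after = (l2 - l1) / (p2 - p1)
        ratio = gain_before / gain_after if gain_after > 0 else float("inf")
        if ratio > best_ratio:
            best, best_ratio = models[i], ratio
    return best, best_ratio
```

A large ratio marks the point where adding parameters stops paying off, which is how the "[2 1]" model was singled out here.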

A.6.2. Output CHull
Output from the CHull method performed by the R package "multichull" shows the models considered, the models on the upper bound of the convex hull, the selected model [2 1], and the CHull figure plotting the number of free parameters against the loglikelihood value.