How to Perform Three-Step Latent Class Analysis in the Presence of Measurement Non-Invariance or Differential Item Functioning

ABSTRACT The practice of latent class (LC) modeling using a bias-adjusted three-step approach has become widely popular. However, the current three-step approach has one important drawback – its key assumption of conditional independence between external variables and latent class indicators is often violated in practice, such as when a (nominal) covariate represents subgroups showing measurement non-invariance (MNI) or differential item functioning (DIF). In this article, we demonstrate how the current three-step approach should be modified to account for MNI; that is, covariates causing DIF should be included in the step-one model and the step-three classification error adjustment should differ across the values of the DIF covariates. We also propose a model-building strategy that makes the new methodology practically applicable also when it is unknown which of the external variables cause DIF. The new approach, implemented in the program Latent GOLD, is illustrated using a synthetic and a real data example.


Introduction
During the past several years, the practice of latent class (LC) modeling using a stepwise approach called bias-adjusted threestep LC analysis has become widely popular. It originates from the work of Bolck et al. (2004), who proposed a stepwise approach for modeling the association between classes and external variables. This involves: (1) Estimating the LC measurement model parameters, including the determination of the number of latent classes.
(2) Classifying cases into one of the classes based on the model selected in step one (typically, the most likely class). (3) Examining the relationship between the classes and external or auxiliary variables while accounting for classification errors introduced in step two.
The external variables used in step three can be covariates affecting the classes (Vermunt, 2010), distal outcomes affected by the classes , or a combination of these (Bakk et al., 2013;Nylund-Gibson et al., 2019). Also, more complex models can be estimated in the third step, such as latent transition or latent Markov models (Asparouhov & Muthén, 2014;Di Mari et al., 2016). The trend toward this stepwise approach to deal with covariates and distal outcomes is due to its practical advantages, its several theoretical benefits, and its implementation in generally available LC analysis software like Mplus (Muthén & Muthén, 2015) and Latent GOLD (Vermunt & Magidson, 2016). An alternative approach, referred to as one-step LC modeling, involves simultaneous inclusion of class indicators and external variables in a single model (Bandeen-Roche et al., 1997;Dayton & Macready, 1988). For a list of problems in this one-step approach that are resolved by the three-step approach, as well as a description of the main three-step approaches, see Vermunt (2010). See also Nylund-Gibson and Masyn (2016), who present a simulation study that reveals several problems with respect to the selection of the number of classes when covariates are included early on in the modeling process.
However, despite its many improvements over the one-step approach, the current three-step LC analysis approach has one important drawback -its key assumption of conditional independence between external variables and latent class indicators is often violated in practice (Masyn, 2017). More specifically, the derivation of the bias-adjusted three-step method requires the assumption of no direct effects between external variables (covariates) and the response variables used to construct the LC model (indicators). Violations of this assumption include situations where a covariate represents subgroups showing measurement non-invariance (MNI) or item bias (Clogg & Goodman, 1984;Kankaras, Moors, Vermunt, 2010). But, more generally, the LC model with covariates is a multiple indicators multiple causes (MIMIC) model where, as in factor analysis and item response theory models, DIF can be conceptualized as causes having not only indirect effects on the indicators via the latent variable(s) but also having direct effects on the indicators (Lee et al., 2017;Suh & Cho, 2014;Woods, 2009). In the remainder, we use the terms MNI, DIF, and covariates/ causes with direct effects interchangeably.
A simulation study designed by Asparouhov and Muthén (2014) to investigate the consequences of ignoring direct effects of covariates on indicators demonstrated that the three-step approach may yield seriously biased estimates in such a case (see also Janssen et al., 2019). As a possible solution, they presented a somewhat ad-hoc modification of the three-step approach in which the direct effects of interest are included in the measurement part of the step-one model. A similar approach was used by Di Mari and Bakk (2018) as a modification of two-step and three-step latent Markov modeling with direct effects. While these approaches perform better than fully ignoring the direct effects, a certain amount of bias remains, which may still be substantial.
Based on the above, one may conclude three-step LC analysis should not be performed in situations with direct effects, and a one-step model should be used instead. In fact, Janssen et al. (2019) did not even attempt to include the three-step approach in their simulation study in the detection and modeling of direct effects in LC analysis. They state: "With threestep, in contrast, adding DEs [Direct Effects] becomes increasingly difficult and often even impossible." However, as we demonstrate in this article, there turns out to be a rather simple way to modify the three-step approach to allow for direct effects. Specifically, the correct three-step approach with MNI or DIF proceeds as follows: (1) Covariates with direct effects on the indicators are included in the step-one model, whereas other external variables need not be included. The step-one model should contain not only the relevant direct effects but also the effects of the included covariates on the latent classes.
(2) The step-two classifications are obtained in the usual way, but now also account for the fact that these depend on the values of the covariates included in the step-one model. (3) The step-three model of interest contains the DIF covariates used in step one, as well as other covariates and/ or distal outcomes of interest. The key modification here compared to the standard step-three modeling is that the classification error correction matrix, also referred to as D matrix, should be allowed to differ across categories of covariates with direct effects.
One natural application of the modified step-three approach is in situations where it is desired to account for MNI, such as in cross-cultural research. In the step-one analysis, one includes the country as a grouping variable and builds a measurement model that may differ in certain ways across countries (that is, one or more indicators may be affected by DIF across countries). The classification in step two is based on all variables used in the step-one model, including the country. The stepthree model would include country as one of the covariates and account for the fact that classification errors (the D matrices) differ across countries. This method of dealing with direct effects has been implemented in Latent GOLD 6.0, in both the Step3 and the Syntax module (Vermunt & Magidson, 2020). Note that our modified approach differs in two important ways from the modification Asparouhov and Muthén (2014) experimented with and Di Mari and Bakk (2018) used in the context of latent Markov modeling. First, the step-one model contains the effects of the covariates on the classes, which prevents the overestimation of the effects due to the omission of the indirect effects via the classes. Second, the step-three correction for classification errors differs across categories of the external variables with direct effects, which is exactly what is required to properly account for the non-invariant measurement models and avoid biasing the model parameters.
The next section provides the theoretical framework for our modified three-step procedure and explains how to apply the new approach in practice, that is, when we do not know a priori which external variables are causing DIF. Following that, we illustrate the new approach with both synthetic and real data applications. In Appendix A1, we show how to set up the models using the Latent GOLD 6.0 software. We end with a conclusion and discussion section.

Three-step LC analysis with direct effects
In this section, we derive the form of the step-one and stepthree models for three-step LC analysis with direct effects. We consider three different situations where direct effects need to be addressed in somewhat different ways: 1) covariates have direct effects on indicators, 2) covariates have direct effects on both indicators and distal outcomes, and 3) indicators have direct effects on distal outcomes. We also propose a modelbuilding strategy that can be used when it is not known a priori which covariates may induce DIF.

Covariates with direct effects on indicators
The population model of interest is an LC model with two sets of explanatory variables, z 1 and z 2 , where the z 1 have direct effects on some of the response variables or indicators y, whereas the z 2 do not. This situation is depicted in Figure 1. The LC model for Pðyjz 1 ; z 2 Þ, the probability of having a certain pattern of responses conditional on the covariate values, takes on the following form: Here, Pðxjz 1 ; z 2 Þ is the probability of belonging to latent class x conditional on the covariate values, which will usually be modeled using a logistic regression equation. The term Pðyjx; z 1 Þ denotes the probability of a response pattern y given class membership x and covariate values z 1 . Under the typical assumption of local independence between the response variables, we obtain: Here, Pðy j jx; z 1 Þ is the class-specific response probability for the jth indicator, which, because of the direct effects, may also depend on z 1 . For categorical responses, these probabilities will usually be modeled using logistic equations. Following the same logic as a standard three-step LC analysis, since z 2 does not affect y directly, we can ignore z 2 in the estimation of the measurement part of the model. The resulting step-one model for Pðyjz 1 Þ is obtained as follows: ( 3) that is, by collapsing or marginalizing over the z 2 . As can be seen, the term of main interest, Pðyjx; z 1 Þ, is still the same as in the model that includes z 2 , which shows that we can ignore z 2 in the step-one model. It also shows that 1) the covariates z 1 should be part of the step-one model, 2) the specification used for Pðyjx; z 1 Þ should be the same as in the full model, and 3) the z 1 À x associations should be included in the step-one model. Note that the marginal z 1 À x associations in Pðxjz 1 Þ will usually not be the same as the partial z 1 À x association in Pðxjz 1 ; z 2 Þ.
Using the above step-one model, in step two, one can obtain class assignments w based on the posterior class membership probabilities, Pðxjy; z 1 Þ ¼ Pðxjz 1 ÞPðyjx; z 1 Þ=Pðyjz 1 Þ. When using modal assignment, Pðwjy; z 1 Þ equals 1 for the class with the largest Pðxjy; z 1 Þ and 0 for the other classes. With proportional assignment, Pðwjy; z 1 Þ and Pðxjy; z 1 Þ are equal to one another. Irrespective of the class assignment method that is used, the relationship between the assigned class w and the true class x, which is crucial for the step-three adjustment, is obtained as follows: where for Pðyjz 1 Þ we can use either the estimated or the empirical distribution. Note that this formula is the same as the one provided by Bolck et al. (2004) and Vermunt (2010), except for the conditioning on z 1 .
As shown by Bolck et al. (2004), to derive the step-three model for Pðwjz 1 ; z 2 Þ, we first define the structure for the joint distribution Pðx; y; wjz 1 ; z 2 Þ based on the assumed population model and the classification rule, that is, which is subsequently collapsed over x and y to obtain Pðwjz 1 ; z 2 Þ: The resulting step-three model is a standard LC model with a single indicator w with "response" probabilities Pðwjx; z 1 Þ, which were defined above. Bolck et al. (2004) called the matrix with elements PðwjxÞ the D matrix. This matrix makes it possible to correct for the classification errors in the assigned class memberships w. The key result of our derivation for the DIF case is that the D matrix should be allowed to vary across levels of the DIF covariates. Estimation of the step-three model yields the effects of both the z 1 and the z 2 variables on the true latent classes x, which appear in the term Pðxjz 1 ; z 2 Þ. For parameter estimation, one can use either the method proposed by Bolck et al. (2004), which is often referred as the BCH approach or the maximum likelihood approach suggested by Vermunt (2010).

Covariates with direct effects on both indicators and distal outcomes
Suppose the population model of interest is a model in which the z 2 are distal outcomes instead of covariates. As illustrated in Figure 2, the covariates z 1 affect both the class indicators and the distal outcomes directly; that is, they may cause DIF and are at the same time relevant control variables in the model for the distal outcomes. In this case, the underlying population model becomes: Because the z 2 are unrelated to the y conditional on x and z 1 , we can use the same step-one model and step-two classification as described above for the covariate case. The step-three model will have the following form: This shows we should include z 1 in the step-three model, and, as in the covariate case, use classification error probabilities Pðwjx; z 1 Þ which depend on z 1 .
x z 2 z 1 y Figure 2. LC model with covariates z 1 , class indicators y, and distal outcomes z 2 . Covariates z 1 are also control variables in the model for the distal outcomes z 2 . The modified three-step approach is needed when covariates z 1 have direct effects on one or more of the indicators y (see thick dashed line).
In the simpler case in which z 2 is not affected by z 1 directly, the step-one model remains the same. The step3-model can be simplified to: where PðwjxÞ is computed in the usual manner, that is, by aggregating over both y and z 1 .

Indicators with direct effects on distal outcomes
Suppose the population model of interest is a model in which some of the responses y (denoted by y 1 Þ have a direct effect on distal outcomes z (see Figure 3). In this case, the underlying population model becomes: ÞPðy 1 jxÞPðy 2 jxÞPðzjx; y 1 Þ: Because z follows causally after y 1 , despite the direct effect, we can use a standard step-one model that excludes the distal outcomes. The step-three model will have the following form: Thus, in the step-three model, we should again include the y 1 while accounting for its relationship with w and x. This model may also be written as a model conditional on y 1 : which implies using y 1 in a similar way as z 1 was used in the step-three models described above.

A three-step model-building strategy
The three-step approach described above requires the inclusion of the covariates inducing MNI or DIF in the step-one model. However, in practice, we often do not know a priori which of the covariates of interest will have direct effects on the indicators. To overcome this problem, we propose using the following model-building strategy: (1) Construct the step-one LC model: a. Determine the number of classes and other relevant LC model features without the inclusion of covariates. b. For each covariate separately, include it in the model selected in step 1a and determine whether it has direct effects on the indicators. c. Specify the final step-one model with all DIF covariates and the significant direct effects encountered in step 1b.
(3) Estimate the step-three model with all covariates and/or distal outcomes, taking into account that classification errors depend on the DIF covariates.
Step 1a is based on the work by Nylund-Gibson and Masyn, who showed that especially in the presence of DIF, class enumeration should be done using the class indicators only, thus without covariates (or distal outcomes). For step 1b, it is key to have a tool to identify the direct effects, such as Bivariate Residuals reported by the Latent GOLD program (Oberski et al., 2013;Vermunt & Magidson, 2016). When estimating the LC model with DIF from step 1 c, the LC analysis software will allow the user to write the classifications to a file (step 2). While steps 1a, 1b, 1 c, and 2 can be performed with any LC analysis software, step 3 requires an option to indicate which are the DIF covariates. Appendix A1 shows how this is implemented in Latent GOLD version 6.0 (Vermunt & Magidson, 2020).

A demonstration using a constructed data example
To show that the proposed three-step approach with DIF works, we will apply the method to a synthetic data set. That is, we will demonstrate the new method recovers the parameters from a known population perfectly, while the standard approach does not, which confirms that the standard approach yields biased estimates. The population model we used consists of three latent classes, six dichotomous indicators (y 1 -y 6 ), and two covariates (z 1 having two categories, −.5 and .5, and z 2 with five categories, −2, −1, 0, 1, and 2). The specified LC population model yielded Pðyjz 1 ; z 2 Þ, which we multiplied by the number of observations of the covariate pattern concerned. The total "sample" size was set to 1000. The resulting non-integer frequencies were used as frequency weights in the LC analysis of the construct data set.
The values used for the class-specific response probabilities were the same as those used by Vermunt (2010) and Bakk et al. (2013) for the moderate class separation condition; that is, the "success" probably for Class 1 equals .80 on all indicators, for Class 2 .20 on all indicators, and for Class 3 .80 on the first three indicators and .20 on the other three. These "success" probabilities correspond with logit values of 1.3863 and −1.3863, respectively. The direct effects of z 1 on y 1 and y 4 were set to 1, implying a shift of −.5 and .5 in the response logit for the first and second categories. of z 1 , respectively. The intercept, z 1 slope, and z 2 slope parameters in the logistic model for the classes were set to −1, −1, and 1 for Class 2 and to −2, 1, and −1 for Class 3 (Class 1 served as baseline category). The resulting class proportions were .53, .31, and .16. The entropy-based R-squared value was .71 with covariates and .60 without covariates, corresponding with a moderate class separation. Table 1 reports the parameter values of the regression model for the latent classes obtained using different LC analysis approaches, where for the three-step methods modal class assignment and maximum likelihood bias-adjustment was used. As can be seen, both the correctly specified one-step model and the new three-step approach accounting for MNI or DIF reproduce the true parameter values perfectly. In contrast, the one-step model without DIF, the three-step approach without DIF, and the threestep approach with DIF in step one but not in step-three yield severely biased estimates for the z 1 slopes. The intercept and z 2 slope estimates are close to their true values.
In this synthetic example, the absolute bias in the z 1 effects turned out to be about 30% for each of the "incorrect" approaches. However, it should be noted that in practice the amount of bias will depend on factors such as strength and direction of the DIF, number of DIF items, and DIF being uniform or non-uniform. (Janssen et al., 2019;Masyn, 2017). We noticed that even the settings used in the step-three model estimation matter: the amount of bias changed with proportional instead of a modal class assignment or with BCH instead of maximum likelihood adjustment.

An illustration with a real data example
The extended three-step approach for dealing with MNI or DIF and the proposed model-building strategy will be illustrated with a data set used by Hagenaars (1993) in his book on loglinear modeling with latent variables. It is the demo data file "political.sav" which is part of the Latent GOLD software package. It contains five dichotomous indicators measuring what Hagenaars referred to as "System involvement" and "Protest tolerance," as well as the categorical covariates' gender, education (college and less than college), and age (18-34, 35-57, and 58-91). Table 2 reports the fit measures for the estimated LC models in step 1a (without covariates). These indicate that according to the BIC we should select a 3-class model and according to the AIC and the likelihood-ratio goodness-of-fit statistic a 4-class model. The other two models are restricted 4-class models in which the classes are "forced" to represent two dichotomous latent variables. Such models are sometimes referred to as a discrete factor (DFactor) model (Magidson & Vermunt, 2001). These two restricted models are preferred over the unrestricted 3-and 4-class models, where the "confirmatory" variant gives the best balance between fit and parsimony. In the latter model, three indicators load on the "System involvement" factor and two on the "Protest tolerance" factor. This model was also selected as the best model by Hagenaars (1993).
In step 1b, we estimated three models which included one of the covariates in the model selected in step 1a. Table 3 presents the Bivariate Residuals reported by Latent GOLD, which are approximate chi-squared statistics with 1 degree of freedom. Using a cutoff value of 3, we conclude there is evidence for age having direct effects on two of the indicators, while for gender and education we do not need direct effects. In step 1 c, we, therefore, use a model with age as a covariate and the two encountered direct effect included. This model is also used for classification (step 2). Table 4 presents the parameter estimates for the step-three model which accounts for DIF, and, for comparison, also for the standard three-step approach which ignores DIF (these are both based on proportional classification and maximum likelihood bias adjustment, the default setting in Latent GOLD). Rather than estimating a single model for the four classes (the four combinations of the two dichotomous latent variables), we performed separate step-three analyses (binary logit regression analyses) for "System involvement" and "Protest tolerance", The three-step analyses are based on modal classification and maximum likelihood bias adjustment. with "low" used as the reference class. As can be seen, accounting for the DIF induced by age has a huge impact on the estimated age effects on "System involvement" but only a small impact on its effects on "Protest tolerance." The latter can be explained from the fact that the DIF item loading on the second factor (Repression potential) is not very strongly related to this factor, implying it does not have a strong impact on the classification and its errors. Content wise, we can conclude that Male, College, and Age 35-57 or 58-91 are associated with higher "System involvement" and College and Age 18-34 are associated with higher "Protest approval."

Conclusion and discussion
In this paper, we derived the correct modification of the current three-step LC analysis approaches that need to be made to properly adjust for the situation in which one of its basic assumptions is violated. The three situations discussed concerned: covariates with direct effects on indicators, covariates with direct effects on both indicators and distal outcomes, and indicators having direct effects on distal outcomes. For the first situation, we used a synthetic data example to demonstrate the new method works by applying it to a hypothetical population where DIF is present.
In addition, a model-building strategy was presented which can be used to find the external variables causing DIF, thus making the new methodology practically applicable. The approach was illustrated with a real-life application using a data set from Hagenaars (1993), which showed that using the new approach may have a large impact on the step-three parameter estimates. It is important to note that correct modeling of MNI or DIF is not only necessary for a valid step-three LC analysis but it can also be of interest on its own merits. For example, in crossnational comparative research, the step-one LC model will typically include country as a covariate or grouping variable, where direct effects are included to account for country differences in the interpretation of the items. Individuals can then be classified based on the resulting multiple-group LC model which accounts for country differences (Clogg & Goodman, 1984;Kankaras, Moors, Vermunt, 2010;McCutcheon, 1987). However, this does not mean that country needs to be included in a step-three LC analysis. Alternatively, a step-three analysis may involve the prediction of class membership using individual-level predictors only (say age, education, and gender). In such a case, a standard step-three analysis should be performed in which one does not account for MNI across countries. That is, only if the country is part of the step-three model, country DIF should be accounted for in this model.
It is important to note our paper assumes that indicators affected by DIF are to be retained in the step-one LC analysis. An alternative approach to handle the presence of DIF is to remove the DIF items from the measurement model, which is commonly done in the field of item response theory modeling. However, because LC analyses are usually performed with a relatively small number of indicators, one will typically prefer keeping all indicators, which requires modeling the MNI in the right way in the step-one model, and subsequently using the new approach described in this paper in the step-three model.

Limitations
The proposed method works well when the number of covariates or grouping variables causing DIF is small, say one or two. However, if it turns out that a larger number of covariates have direct effects, it may be better to use a one-step LC model. In such a situation, a step-three analysis would require estimation of a large number of different D matrices (one for each DIF covariate combination), likely making the three-step approach less stable. Moreover, if say four out of the five covariates of interest show DIF, the step-one model will already contain four covariates, so why not include the fifth right away rather than proceeding with a three-step approach? Note that also in these types of situations, the first part of the proposed model-building strategy (in which one identifies the DIF covariates) remains very useful. Vermunt (2010) and Bakk et al. (2013) showed that threestep LC analysis may not perform well when class separation is extremely low, that is, with entropy R-squared values below .5. On the other hand, the method was shown to be especially useful in moderate class separation conditions, where it performs very well. Though not investigated here, these results might also be expected to apply with the modified three-step LC model, when DIF is present.
Using a synthetic data set, we showed the new method reproduces the parameters perfectly in a theoretical population. However, we did not investigate bias in parameter and standard error estimates caused by sampling fluctuation. It can, however, be assumed that the results of earlier simulation studies apply here as well, though slightly larger sample sizes may be needed because the step-one model contains more parameters, and D matrices with more elements must be estimated. As shown by Bakk et al. (2014), step-three standard errors may be slightly underestimated, which can, however, be corrected. In Latent GOLD, this requires specifying the step-three model "manually," that is, by providing the estimated D matrix in logit form as well as its estimated covariance matrix.

Extensions
While this paper focused on LC models for categorical indicators, the proposed modified step-three approach can also be applied with other types of mixture models. It can be used in all situations in which covariates to be included in a step-three analysis have direct effects on the response variables. This includes latent profile models (Collier & Leite, 2017), mixtures of normal distributions (Gudicha & Vermunt, 2013), mixture growth models (Diallo & Lu, 2017), mixture regression models (Kim et al., 2016), and LC tree models (Van den Bergh & Vermunt, 2019). In addition, for three-step latent transition and latent Markov models with DIF, the proposed approach is also relevant (Asparouhov & Muthén, 2014;Di Mari et al., 2016). While the focus of this paper was three-step LC analysis, exactly the same solution for dealing with DIF can be used in a two-step LC analysis (Bakk and Kuha, 2018).
Step 1a, 1b, and 1 c, which aim at finding the right step-one model, are also applicable with a two-step model. The step-two model is an LC model containing all covariates and all indicators, but with the measurement model parameters fixed to their step-one estimates. In the case of DIF, the measurement model will also contain the encountered direct effects. Actually, Di Mari and Bakk (2018) proposed an approach that is very similar to this one, with one important difference. They were not aware of the fact that the covariate-class associations should be included in the step-one model, because otherwise the direct effects will be overestimated.
If z2 is a distal outcome (say a continuous variable) and z1 a DIF control variable, the step-three model specification would be as follows: options step3 modal bch; output parameters standarderrors = robust profile probmeans = posterior estimatedvalues; variables independent z1 nomina; dependent z2 continuous; latent Class nominal posterior = (Class#1 Class#2 Class#3) dif = z1; equations Class <-1 + z1; z2 <-1 + z1 + Class; Note that we changed the step3 correction method from ml to bch, which is the preferred method for continuous distal outcomes.