Latent Logistic Interaction Modeling: A Simulation and Empirical Illustration of Type D Personality

ABSTRACT This study focuses on three popular methods to model interactions between two constructs containing measurement error in predicting an observed binary outcome: logistic regression using (1) observed scores, (2) factor scores, and (3) Structural Equation Modeling (SEM). It is still unclear how they compare with respect to bias and precision in the estimated interaction when item scores underlying the interaction constructs are skewed and ordinal. In this article, we investigated this issue using both a Monte Carlo simulation and an empirical illustration of the effect of Type D personality on cardiac events. Our results indicated that the logistic regression using SEM performed best in terms of bias and confidence interval coverage, especially at sample sizes of 500 or larger. Although for most methods bias increased when item scores were skewed and ordinal, SEM produced relatively unbiased interaction effect estimates when items were modeled as ordered categorical.


Introduction
In the medical and behavioral sciences, researchers often investigate the effect of predictor variables on a binary outcome variable. Occasionally, researchers are not only interested in the main effects of single predictors, but also in how multiple predictors influence each other's effects on the outcome variable. When one predictor moderates the effect of another predictor, one speaks of an interaction effect. Although there are several ways to assess the interaction between two variables on a binary outcome measure, researchers typically use a logistic regression analysis. In the logistic regression model, the interaction effect is assessed by multiplying the observed scores of two (or more) constructs involved in the interaction, and including the resulting product variable as a predictor (e.g., see Field, 2010;Tabachnick & Fidell, 2007).
In psychological research, the two interacting constructs are commonly unobserved (latent) and measured with questionnaires containing items measured on an ordinal scale. The scores on these items are typically summed and the resulting sum score is assumed to represent the construct of interest. One disadvantage of using such observed scores in regression analyses is that the presence of measurement error in these scores can either attenuate the regression coefficients or result in spurious effects (Busemeyer & Jones, 1983;Embretson, 1996;Kang & Waller, 2005;MacCallum et al., 2002). We will argue that this is especially true for interaction effects.
A second approach to assess such interactions is structural equation modeling (SEM). This approach takes into account the measurement error in the observed item scores by specifying a measurement model that shows the association between a latent construct and the items used to measure the construct. This measurement model allows for separating the variance in the item scores caused by variation in the latent construct from the variance caused by residual factors (i.e. measurement error). However, SEM often requires a large sample size, especially as models become more complex (Lomax & Schumacker, 2004;Kline, 2010).
A third approach to assessing interaction effects is by using factor analysis to first estimate the factor scores of the constructs, and to subsequently regress the binary outcome measure on the product of these estimated factor scores in a logistic regression (Devlieger & Rosseel, 2017;Lu et al., 2011). The key difference between this factor score interaction approach and the regular SEM approach is that the latter estimates the parameters of the measurement and structural model in a single step, whereas the former estimates those parameters in two separate steps.
Although all three approaches are commonly used by applied researchers, it is still unclear how they perform with respect to bias and precision when estimating the interaction effect on a binary outcome measure when sample size is small and the observed item scores are ordinal and non-normally distributed. In this study, we aim to answer this question based on both a Monte Carlo simulation and an empirical illustration.

Type D personality
The Type D ("distressed") personality construct (Denollet, 2005) serves as a good case study for modeling interaction effects on binary outcome measures. First, some researchers (e.g., Lodder, 2020a;Smith, 2011) argue that the effect of Type D personality is best modeled as an interaction between its two subcomponents negative affectivity (NA) and social inhibition (SI). People with a Type D (distressed) personality tend to both experience negative emotions (i.e. NA) and are inhibited in expressing their emotions and behavior in social situations (i.e. SI). The two subcomponents NA and SI are hypothesized to show a combined synergistic effect that is more than the sum of their parts (Smith, 2011), suggesting the Type D effect is more than the additive NA and SI effects. A second reason why Type D personality serves as a good case study is that the medical outcomes associated with Type D personality are often measured on a binary scale. For instance, a systematic review showed that patients with cardiovascular disease who have a Type D personality show an increased risk on adverse cardiac events (Denollet, Schiffer et al., 2010). These findings were corroborated by a meta-analysis (O'Dell et al., 2011). Another meta-analysis indicated that having a Type D personality imposes an increased mortality risk in people with coronary artery disease .
Most of the studies included in these meta-analyses did not operationalize Type D personality as an interaction between its two subcomponents NA and SI, but classified people as having a Type D personality when they showed a sum score of 10 or higher on both the NA and SI construct. This approach has been criticized (e.g., Smith, 2011) as it involves the dichotomization of continuous variables, a practice known to reduce the power and effect size in statistical analyses and that may even give rise to spurious main-or interaction effects (MacCallum et al., 2002).
To answer this criticism, Denollet et al. (2013) showed that Type D personality, operationalized as the interaction between its two continuous subcomponents NA and SI, significantly predicted the later occurrence of major cardiac events. This study serves as a perfect case study for the present article, as the authors used a logistic regression analysis to assess the interaction between two continuous predictor variables on a binary manifest outcome. Therefore, in the first part of the current study we will reanalyze the data of Denollet et al. (2013) and not only use the original logistic regression analysis, but also a logistic regression on factor scores and a logistic regression using SEM. As it not yet clear how these models perform with respect to the bias and precision of the estimated interaction effect when the item scores are ordinal and non-normally distributed, the second part of this article present the results of a Monte Carlo simulation study, assessing for each of those three interaction models the bias and precision in estimating the interaction effect under various conditions.

Modeling logistic interaction effects
Several methods exist to model an interaction between two continuous variables on a binary manifest outcome variable. These methods all operate within a logistic regression framework, to model the effect of continuous predictor variables on a binary outcome variable. They model the interaction effect between two variables on a binary outcome variable by including in the regression both the main effects of the two variables in the model, as well as their interaction. Let � 1 and � 2 be the latent predictors, respectively, and � 1 � 2 their product representing their interaction. Furthermore, let p x ð Þ ¼ PðCardiac eventjx ¼ � 1 ; � 2 ð ÞÞ be the probability of a cardiac event, given a set of predictors �. The logistic model then equals: In Equation (1), the natural logarithm of the odds of an event (i.e. the log odds or logit) depends on an intercept (β 0 Þ, an effect (β 1 ) of the first predictor (� 1 Þ, an effect (β 2 ) of the second predictor (� 2 ) and, most importantly, the interaction between those two predictors (β 3 ).
We will now discuss three general methods used to model interaction effects when the outcome is both manifest and binary: (1) logistic regression on observed scores; (2) logistic regression using SEM; and (3) logistic regression on factor scores. These methods differ with respect to the theoretical meaning of the predictors � 1 and � 2 and in how they handle the measurement error in the item scores.

Logistic regression on observed scores
The approach most often encountered in introductory statistics textbooks involves modeling the interaction using logistic regression where the predictors are observed scores. According to this method, the terms � 1 and in � 2 Equation (1) can typically be seen as an unweighted sum of all questionnaire item scores measuring a construct. The interaction term is typically constructed by multiplying the mean-centered sum scores of the two constructs constituting the interaction (e.g., see Field, 2010, p. 279;or Tabachnick & Fidell, 2007, p. 442). Using observed scores in a regression analysis tacitly assumes that these scores are a perfectly reliable measure of the construct that is supposed to be measured. By assuming this, researchers ignore the measurement error that is often present in questionnaire scores. Ignoring this measurement error leads to reduced standardized associations in subsequent analyses, a phenomenon known as attenuation bias (Spearman, 1904). Such attenuation bias is especially problematic when modeling interactions between two observed scores containing random measurement error. When multiplying unreliable observed scores to construct the interaction term, the measurement error of the resulting product variable is larger than the sum of the two parts, because the measurement error present in each of the two scores also gets multiplied rather than summed.

Logistic regression using SEM
A second approach to model interaction effects fits a structural equation modeling framework, combining a latent variable measurement model to model errors with a structural model expressing the relations between these latent variables. According to this SEM approach, the terms � 1 and � 2 in Equation (1) can be seen as the unidimensional latent variables representing the construct of interest. Within a SEM framework, for each latent variable (�) a separate measurement model relates the vector of observed item scores (x) to the latent construct based on a vector of factor loadings (Λ x ) and a vector denoting measurement error (δ): Equations (1) and (2) together define the structural equation model, where (1) describes the structural model and (2) the measurement model. All parameters in a SEM model are simultaneously estimated. Note that within a SEM context a binary observed outcome can also be modeled as a latent variable with a single binary indicator by fixing the item's factor loading and residual variance to specific values (Hayduk & Littvay, 2012). However, in the present study we used the observed binary outcomes in our SEM, similar to how the observed binary outcomes are modeled using the sum score and factor score regression methods.
There are different approaches to model interactions within a SEM framework. In its basic form, interaction effects are modeled by means of the product of two latent variables. However, within a latent variable modeling framework, interaction (and other non-linear) effects can also be modeled using various other techniques. First, the indicant product approach (Kenny & Judd, 1984) multiplies the item scores of items loading on the first construct involved in the interaction, with the item scores of all possible combinations of items of the other construct involved in the interaction. These multiplications constitute new items that are modeled to load on a new latent interaction variable. For example, when two latent constructs are each measured with three items, nine different multiplications are possible between the item scores of these two sets of three items. Hence, nine new observed variables are created that will load on the new latent interaction variable.
The structural model then includes both the effects of the two original latent variables as well the new interaction variable, thus replacing the � 1 � 2 by � int in Equation (1). Two main disadvantages of this method are (1) that the number of items loading on the new latent interaction variable can quickly get very large, and (2) that the method additionally requires the specification of a set of complex parameter constraints. To solve these problems, Marsh et al. (2004) proposed an unconstrained approach that no longer needed the complex parameter constraints and that modeled the latent interaction variable based on the same number of items as each latent variable involved in the interaction. In a simulation study, the authors showed that their unconstrained approach performed better than the traditional constrained approach.
Another popular method to model latent interaction effects is the Latent Moderated Structural equations (LMS) approach (Klein & Moosbrugger, 2000). This approach does not require new items loading on a latent interaction variable, but makes use of mixture modeling to express the non-normality of an interaction effect. Due to their multiplicative nature, the product scores representing interaction effects often show nonnormal distributions, even when the latent variables involved in the interaction are themselves normally distributed. LMS models interaction effects by representing the joint distribution of the indicator variables as a mixture of normal distributions.
The method assumes that the indicators of the latent predictor variables show a multivariate normal distribution. Earlier simulation studies showed LMS to perform very well when item scores are normally distributed, yet a violation of this assumption caused LMS to show more bias than for instance, the unconstrained approach (Cham et al., 2012;Marsh et al., 2004).
Some other recently developed latent variable methods make use of mixture modeling to handle non-normally distributed latent variables. Examples include the Nonlinear Structural Equation Mixture Modeling approach (NSEMM; Kelava et al., 2014) and an approach making use of Bayesian finite mixture models (Kelava & Nagengast, 2012). Although both approaches appear to be useful when modeling latent interaction effects, in the current study we decided to use the LMS approach, as it is commonly used by applied researchers, and is the default method implemented in the Mplus software (L. K. Muthén & Muthén, 1998. Moreover, in a previous simulation study (Lodder et al., 2019) we found that LMS outperformed two unconstrained approaches with respect to minimizing bias and maximizing power when estimating the interaction between two continuous latent variables on a continuous outcome variable, even when the latent traits were non-normally distributed.

Logistic regression on factor scores
Although latent variable modeling is often considered the preferred choice when analyzing associations between variables containing measurement error, latent variable approaches do have some disadvantages. First, they often require a large sample size, especially when the models become more complex. Furthermore, because latent variables typically estimate all the parameters in a single step, increasingly complex models can produce unstable parameter estimates (Devlieger et al., 2019). To overcome these problems, researchers may use a two-step approach, called factor score regression (Lu et al., 2011).
This approach is similar to the logistic regression on observed scores, but the observed scores of the predictor variables are replaced with factor scores estimated in a separate latent variable model. Accordingly, the terms and � 2 in Equation (1) can be seen as the estimated factor scores of the constructs of interest. Factor score logistic regression differs from latent variable modeling approach in that it first estimates for each person the factor scores based on the item scores (Step 1), and then uses those estimated factor scores in a logistic regression (Step 2). Factor score regression requires a smaller sample size than structural equation models. In a recent simulation study, Devlieger and Rosseel (2017) showed that factor score regression can be a suitable alternative to structural equation modeling when estimating the effect of continuous latent predictor variables on continuous latent outcome variables.
Within a factor score regression framework there exist several techniques to estimate the factor scores that can be used in a subsequent regression analysis. Devlieger et al. (2016) showed that some of these techniques are inherently biased depending on whether factor scores or observed scores are used for the predictor or outcome variables.
The authors compared four approaches to estimate factor scores: the (1) Regression method (Thurstone, 1935), (2) Bartlett method (Bartlett, 1937), (3) Bias avoiding method (Skrondal & Rabe-Hesketh, 2004), and (4) Bias correcting method (Croon, 2002). Devlieger et al. (2016) analytically showed that the latter two approaches should be unbiased, independent of whether factor scores or observed scores are used for the predictor or outcome variables. Their analysis also indicated that the Bartlett method is expected to only show bias when factor scores are used for the predictor variable(s). The reverse is true for the Regression method, which is expected to only show bias when factor scores are used for the outcome variable(s). The authors confirmed these predictions in a Monte Carlo simulation study where factor scores were used for both the predictor and outcome variables.
In the current study, the outcome variables concern different types of cardiac events that are directly observed. The predictor variables NA and SI could be considered unobserved latent variables. This means that the Bartlett approach is expected to show bias, while the remaining three approaches to estimate factor scores should be unbiased. Of those three approaches we decided to use the Regression approach in our study, because this is both one of the best known approaches and one of the easiest to implement by applied researchers.
Equation (3) shows how this Regression approach estimates the factor scores of a latent variable (A � ) based on the variance of the latent factor var � ð Þ, the vector with factor loadings Λ x ð Þ and the inverse of the model implied covariance matrix of the indicator variables � À 1 x À � . The last step of the factor score regression method is to use these estimated factor scores (and their interaction effect) as predictors in a logistic regression analysis.

Modeling skewed ordinal item scores
In clinical research, questionnaire item scores often show positively skewed distributions (Reise & Waller, 2009), with lower response categories being more often endorsed than higher response categories. Such skewed distributions can be problematic if the statistical method used to analyze the data assumes these data to be normally distributed. Although linear regression assumes the prediction residuals to be normally distributed, logistic regression does not impose such distributional assumptions. Nevertheless, the continuous predictors in a logistic regression are assumed to have a linear relation with the logit of the binary outcome variable (Field, 2010).
In latent variable modeling, an assumption of maximum likelihood (ML) estimation applied to factor analysis is that the observed item scores are continuous and normally distributed (Bollen, 1989, pp. 131-134). Analyzing skewed ordinal item scores as if they are continuous and normally distributed can lead to biased parameter estimates and underestimated standard errors, increasing the chance of false positive conclusions about the significance of these estimates, especially if there are five or fewer response categories (Dolan, 1994;B. Muthén & Kaplan, 1985;Kline, 2010;Rhemtulla, Brosseau-Liard & Savalei, 2012). Researchers could avoid this bias by using modeling techniques that do not assume the item scores to be normally distributed, such as (1) item response models (Rasch, 1960;Birnbaum, 1968;Samejima, 1997) or (2) factor models that estimate the parameters using weighted least squares (WLS) estimation and fit the model to the polychoric correlation matrix rather than the product moment correlation matrix.
Another popular method to handle skewed ordinal item scores is (3) using ML estimation with robust standard errors and a robust statistic, such as the Satorra-Bentler correction (also known as MLR estimation; Satorra & Bentler, 2010). In a previous Monte Carlo simulation study (Lodder et al., 2019), we found that when item scores are ordinal and non-normally distributed, using MLR estimation adequately controls the false positive rate when estimating the interaction between two latent variables on a continuous latent outcome variable, while WLS estimation resulted in an inflated false positive rate. Therefore, in the present study we chose to estimate the parameters of our latent variable models using MLR estimation. In earlier work (Lodder et al., 2019), we also showed that a linear regression on sum scores resulted in negatively biased interaction effects as well as inflated false positive rates. It is not yet clear, however, whether these results also apply to regression models with a binary and manifest outcome, especially because logistic regression, as opposed to linear regression, does not assume the prediction residuals to be normally distributed.
According to an earlier simulation study, latent variable as well as factor score regression methods showed a negative bias in the standard errors of the structural regression coefficients when the item scores were non-normally distributed (Devlieger et al., 2016). However, this simulation study only investigated the main effect of continuous latent predictor on a continuous latent outcome, raising the issue of whether skewed item scores also result in biased standard errors when estimating the interaction effect of two continuous latent variables on a binary and manifest outcome.
The main goal of the present study is to compare several popular methods to model interactions between two continuous latent variables on a binary and manifest outcome variable. It is not yet known how these methods perform in terms of bias, accuracy and precision, especially when the item scores underlying the two continuous constructs are non-normally distributed and measured on an ordinal scale. We aim to shed more light on this issue using both a Monte Carlo simulation as well as an empirical illustration.

Study overview
In Study 1, our empirical illustration, we investigate the effect of Type D personality on the two endpoints major cardiac event (MACE) and myocardial infarction (MI) separately, and after adjusting for theoretically important confounding factors (see Denollet et al., 2013). Figure 1 shows the structural equation model of the relation between Type D Personality (modeled as the interaction between NA and SI) and Cardiac Events, while controlling for the covariates depression, age, gender, MI at baseline, percutaneous coronary intervention (PCI) at baseline, coronary artery bypass grafting (CABG) at baseline, left ventricular ejection fracture (LVEF), and poor exercise tolerance (ET).
As the Type D personality effect is hypothesized to reflect an interaction between its components negative affect and social inhibition (Smith, 2011), we studied four methods to model this interaction effect: (1) logistic regression, modeling the interaction as a multiplication of sum scores, (2) logistic regression, modeling the interaction as a multiplication of factor scores, (3) latent logistic regression, modeling the interaction using the LMS approach in a structural equation model treating the ordinal item scores as continuous, and (4) latent logistic regression using LMS, but modeling the item scores at their appropriate measurement level (ordered categorical).
NA and SI, the two constructs involved in the interaction, are measured with multiple items measured on an ordinal and occasionally positively skewed scale. It is still unclear how these interaction models perform with respect to bias and precision when the observed outcome is binary and the item scores of the exogenous latent variables are ordinal and non-normally distributed. Therefore, in Study 2, we aimed to answer this question by conducting a Monte Carlo simulation investigating to what extent the models used in Study 1 provided accurate and stable parameter estimates, while varying across (1) sample size, (2) the reliability of the NA and SI scales (3) the size of the interaction effect, (4) skewness in the latent NA and SI traits, and (5) skewness in the item scores (while the latent NA and SI traits are normally distributed). This simulation should provide valuable information on the bias and accuracy of these popular methods to study interaction effects on dichotomous outcomes.

Participants
Secondary data were used of a study by Denollet et  disease. The mean age of these patients was 58.7 years (SD = 10.5) and 87% were men. The patients were included from Antwerp University Hospital Belgium between January 1998 and December 2005. The Medical Ethics Committee of the hospital approved of the study protocol (5/ 48/193). Patients filled out the psychological questionnaires at baseline and after 5 years each patient (or family) was contacted to determine the end points (see below). All participants gave informed consent.

Measures
End Points. We investigated two different but related endpoints marking the occurrence of a cardiac event: The major cardiac event (MACE) and cardiac death or myocardial infarction (MI). The endpoints are related in such a way that every MI is a MACE, but not every MACE is an MI. Time-to-event data were not available for these endpoints, explaining our choice for logistic regression rather than cox regression.
Type D Personality. The traits underlying Type D personality (NA and SI) were measured using the DS14 questionnaire (Denollet, 2005). Each trait was measured with a scale consisting of seven questions with five ordinal response categories ranging from "false" (0) to "true" (4). The DS14 has been validated in several populations (Denollet, 2005) and several studies showed a two-factor structure to best fit the data (Nefs, Pouwer & Denollet, 2012;Romppel, Herrmann-Lingen, Vesper & Grande, 2012). In our sample both NA and SI showed a coefficient alpha of .87.

Depression.
Depressive symptoms were measured using the ten-item version of the Beck Depression Inventory (BDI10; Denollet, Martens et al., 2010), with each item having three ordinal response categories ranging from 0 through 2. Because depressive symptoms may confound the relation between Type D personality and cardiac events, we included the BDI10 score as a covariate in our models. The BDI10 has been validated in both the general-and a post-MI population (Denollet, Martens et al., 2010). In our sample, the coefficient alpha of the BDI10 was .83. In our latent variable analyses we also added a measurement model for the BDI10 to adjust the Type D personality effect for the latent depression score.

Model building
For the logistic regression on sum scores method, the interaction variable was constructed by multiplying the meancentered NA and SI sum scores. To account for uncertainty in the factor scores when estimating the standard errors, the R-package boot (version 1.3-24; Canty & Ripley, 2019) was used to bootstrap the standard errors using 1000 bootstrap samples. To identify our latent variable models, we fixed the first factor loading of each latent trait to a value of one. For each of the two cardiac endpoints separately, we fitted four nested models where we regressed the endpoints on the covariates and on Type D personality. In Model 0 we only included the covariates age, sex, depression, ET, LVEF, MI at baseline, CABG at baseline and PCI at baseline. Subsequently, in Model 1 we added the main effects of NA and SI. To make sure the interaction effect between NA and SI would not merely reflect an unmodeled quadratic effect (Lodder, 2020b;MacCallum & Mar, 1995), we added the quadratic effects of NA and SI in Model 2. Finally, in Model 3 we included the interaction between NA and SI. We assessed the interaction between NA and SI according to four different methods: (1) Logistic regression of sum scores using maximum likelihood (ML) estimation, (2) Logistic regression of cardiac events on factor scores using robust maximum likelihood (MLR) estimation to estimate the parameters in the NA and SI measurement models, (3) latent logistic regression using LMS with MLR estimation (treating ordinal items as continuous), and (4) latent logistic regression using LMS with MLR estimation, modeling the ordered categorical items using polychoric threshold parameters. Although earlier research has advocated to use weighted least squares estimation (Flora & Curran, 2004) for ordered categorical SEM, the software used to estimate latent interaction effects according the LMS method only allowed for MLR estimation.

Model fit
For our latent variable models, model fit was assessed by inspecting the Akaike (1974) Information Criterion (AIC; Akaike, 1974), the Bayesian Information Criterion (BIC; Schwarz, 1978), the sample size adjusted BIC (SABIC). When numerical integration is required (e.g., when modeling latent interactions with the LMS method), means, variances, and covariances are not sufficient statistics for model estimation and chi-square and related fit statistics are not available. As a result, for the LMS method the Mplus software does not report the chi-square statistics and related indices (RMSEA, CFI, TLI). We therefore decided to report these fit indices only to assess the fit of the correlated measurement models. With respect to the fit of the interaction model, we compared the fit of the nested models by conducting a difference test based on the log likelihood values of the two compared models. For LMS these log likelihood values are adjusted based on scaling correction factors obtained with MLR estimation. P-values smaller than .05 were considered statistically significant.

Software
We conducted the hierarchical logistic regression analysis in IBM SPSS Statistics 23. We used the Mplus software (Version 8; L. K. Muthén & Muthén, 1998 to analyze our latent interaction models. All other analyses were conducted using the freely available R programming software (Version 3.2.3; R Development Core Team, 2008), including the R-package Lavaan (Version 0.6.1; Rosseel, 2012) to model the factor score logistic regression and the R-package Amelia II (Version 1.7.5; Honaker et al., 2011) to handle any missing data. Based on guidelines by Lodder (2013), we used multiple imputation to impute any missing values in the DS-14 or BDI10 item scores and we imputed 10 datasets. Parameter estimates and fit indices were pooled across imputed datasets. SPSS reported pooled results for the sum score regression models. The pooling of the factor score regression models was done using the R-package semTools (Jorgensen et al., 2019). Lastly, Mplus provided the pooled results for the latent interaction models (Asparouhov & Muthén, 2010). As only a few observations were missing, the test statistics and fit indices were similar across the imputed datasets. Our analysis scripts can be found on the open science framework: https://osf.io/yhvnp/.

Hypotheses
We first expected to reproduce the findings of Denollet et al. (2013), who originally analyzed this dataset using hierarchical logistic regression analysis on the NA and SI sum scores. However, Denollet et al. (2013) did not adjust for quadratic NA and SI effects and ignored the missing values when computing the NA, SI and BDI10 sum scores. We did not expect this to have a major impact on the results. Our second expectation was that the four methods to model the interaction between NA and SI would produce approximately similar results. However, we expected the logistic regression method based on observed sum scores to show smaller effects compared to the latent variable and factor score logistic regressions because it does not take into account the measurement error present in the observed item scores.

Results
Of all Table 1 shows the results of the sum score and factor score methods to model interaction effects for the MACE and MI endpoints, while Table 2 shows those results for the two SEM LMS methods. The latent variables NA and SI both showed adequate factor indeterminacy values (NA = .948; SI = .940). To estimate the factor scores required for the factor score regression, a correlated three factor model (NA, SI, depression) was estimated and factor scores were computed for further analyses. The CFA showed a reasonable yet suboptimal fit to the data (χ 2 (239) = 637.775, p < .001; RMSEA = .056, 95% CI = [.050, .061]; CFI = .907; TLI = .892).

Major cardiac events (MACE)
Logistic Regression on Sum Scores. According to the logistic regression using the NA and SI sum scores, the −2LL difference test indicated that after adjusting for covariates, the model including the interaction between NA and SI on MACE fitted the data better than the models without the interaction term, χ 2 (1) = 6.489, p = .011. The main-and quadratic effects of NA and SI failed to reach significance in all tested models. In Model 3, the interaction between NA and SI on MACE was statistically significant ( ). Although the model including the interaction did not show the best model fit, the estimated interaction effect points in a similar direction as the regular logistic regression effect reported above.

Logistic Regression Using SEM LMS (Categorical).
When modeling the ordered categorical items not as continuous, but at their appropriate ordinal measurement level, both the −2LL difference test as well as the AIC, BIC and SABIC indicated that the model including the latent interaction between NA and SI best fitted the data. The estimated regression coefficients revealed nonsignificant main-and quadratic effects for NA and SI, but a significant interaction (OR = 1.85, 95% CI = [1.11, 3.08]).

Myocardial infarction (MI)
Logistic Regression on Sum Scores. According to the logistic regression using the NA and SI sum scores, the −2 log likelihood difference test indicated that after adjusting for covariates, the model including the interaction between NA and SI on MI fitted the data better than the models without the interaction term, χ 2 (1) = 11.12, p = .

Logistic Regression Using SEM LMS (Continuous).
According to the logistic regression conducted within a SEM framework and using the LMS approach to model the interaction between the latent variables NA and SI, the −2LL difference test indicated that after adjusting for covariates, the model including the interaction between NA and SI on MI fitted the data better than the models without the interaction term, χ 2 (1) = 6.929, p = .008. However, although based on the AIC and SABIC we would choose the model including the interaction term, the BIC preferred the model with covariates only. Inspection of the regression coefficients of the latent interaction models showed non-significant main effects for NA and SI, yet their interaction was significant (OR = 3.706, 95% CI = [1.246, 11.021]). There also turned out to be a significant negative quadratic effect for SI (OR = .359, 95% CI = [0.151, 0.853]), suggesting that higher SI scores are associated with a lower chance on MI and that this effect gets stronger at higher levels of SI. Although the model including the interaction did not show the best model fit according to the BIC and SABIC, the estimated interaction effect points in a similar direction as the regular logistic regression effect reported above.  20, 15.83]), yet the confidence interval was very broad, likely due to the small number of observed MI events.

Synthesis
These results support the hypothesis that Type D personality, operationalized as the interaction between NA and SI, predicts the occurrence of a major cardiac events, and even more strongly predicts the occurrence of a cardiac death or myocardial infarction. Figure 2 shows the factor score distributions of NA, SI & NA x SI, separately for people who did (red curve) or did not (green curve) have a cardiac event. The top row shows the results for the MACE endpoint and the bottom row for the MI endpoint. The top of each plot shows the result of the two sample Kolmogorov-Smirnoff test for the equality of the two distributions. As the main effects of NA and SI were not associated with both MACE and MI, we expect the factor score distributions of people with-and without a cardiac event to be samples from the same population distribution. Conversely, because the interaction between NA and SI significantly predicted the occurrence of both MACE and MI, we expect the factor score distributions of people with-and without a cardiac event to differ. The results of the two sample Kolmogorov-Smirnoff tests confirmed these expectations. The null hypothesis of equal factor score distributions for people with-and without an event could not be rejected for the NA and SI factor scores, but was rejected for the NA x SI interaction factor scores for both MACE (D = .224, p < .001) and MI (D = .266, p = .005).
In general, the four methods to model the interaction effect agreed on the direction and the statistical significance of the interaction effect, but differed in their estimated size of the interaction. As expected, the effects of the sum score logistic regression analyses were smaller than those of the other two methods, likely because this approach did not take into account the measurement error in the item scores. However, the interaction effects estimated by the two latent logistic regression approaches using LMS were substantially larger than the effects estimated by the factor score logistic regression.
An earlier study comparing four factor score regression methods (Devlieger et al., 2016) analytically and numerically showed that the Regression method should not be biased when factor scores are only used for the predictors and the outcome is observed. However, their simulation study involved continuous outcome variables and did not include interaction effects, making it difficult to generalize these findings to the models applied in the current study. In our second study we aimed to shed more light on this issue by conducting a Monte Carlo simulation study, comparing the bias, power and false positives of four methods used to model interaction effects on binary observed outcomes.

Procedure
In our simulation study, we varied five different design parameters: scale (continuous, ordinal), scale reliability (0.60, 0.87), skewness (no skewness, moderate latent skewness, high latent skewness, moderate item skewness, large item skewness), the size of the interaction (0, 0.154, 0.308, 0.616) and sample size (250, 500, 1000). This resulted in a fully-crossed factorial design with 2x2x5x4x3 = 240 conditions. We simulated 500 datasets in each of those 240 conditions and we analyzed each of those datasets with four different methods to model interaction effects.

Design Parameter 1: Scale Measurement
Level. The first design parameter was the scale level of the simulated DS14 items. We either simulated continuous item scores or ordinal item scores with five response categories (0-4 Likert scale).
Design Parameter 2: Scale Reliability. The second design parameter was the reliability of the NA and SI scales. We simulated both scales to either have a reliability observed in Study 1 (0.87), or a substantially lower reliability of 0.60. This lower reliability was achieved by multiplying all estimated Study 1 factor loadings of the NA and SI measurement models with 0.7.

Design Parameter 3: Latent Skewness.
The third design parameter was the amount of skewness in the distribution of the latent traits NA and SI. We used the method of Vale and Maurelli (1983) as implemented in the R-package fungible (version 1.5; Waller, 2016), to simulate a multivariate distribution of NA and SI. We varied across three latent skewness values (0, 2 and 3; with corresponding kurtosis values 0, 7 and 21), while retaining the product moment correlation between NA and SI (Study 1 estimate of .428). Besides generating skewness on the latent level, in some ordinal item scenarios skewness was also generated on the item level while keeping the underlying latent traits normally distributed (see Data simulation paragraph).

Design Parameter 4: Size of Interaction.
The fourth design parameter indicated the strength of the interaction between NA and SI on depression. We based the true size of the interaction on the standardized regression coefficient of the estimated interaction effect in Study 1 according to the LMS approach (β = .308). In our simulation, we allowed the interaction to be either absent (0), half the size of the Study 1 interaction (0.154), the exact size of the Study 1 interaction (0.308), or twice the size of the Study 1 interaction (0.616).

Design Parameter 5: Sample Size.
The fifth design parameter indicated the sample size of the simulated dataset. We varied across small (n = 250), medium (n = 500) and large (n = 1000) sample size conditions. The number of items loading on a construct was not manipulated in this simulation. As this may also affect the power to detect interaction effects, readers are advised to interpret our findings using the number of cases per variable (n/p ratio) rather than the raw sample size. For the sample sizes included in our simulation (250, 500, 1000) the n/ p ratios are approximately 18, 36 and 72.

Data simulation
For each of the 240 conditions, we simulated 500 datasets containing scores on items measuring the constructs NA (7 items) and SI (7 items). We generated data using the parameter estimates (i.e., factor loadings, latent (co)variances, regression coefficients, thresholds and error variances) of the latent interaction model in Study 1. 1 First, we randomly sampled vectors of NA and SI latent trait scores according to the multivariate skew distribution, given the NA and SI (co)variance(s) from Study 1 and given the skewness design parameter. Second, continuous item scores for each individual (i) and for each item (j) measuring the traits (t) NA or SI were obtained as follows: As input we used a matrix with individual NA and SI trait scores (Ξ), the factor loading matrix retrieved from Study 1 (Λ), and residual error matrix (Θ) based on a multivariate normal distribution with a mean vector of zeroes and a diagonal covariance matrix with variances retrieved from the output in Study 1. In line with earlier research (Flora & Curran, 2004), for ordinal scenarios with latent skewness we transformed these continuous item scores into ordinal scores using the symmetric Case 1 thresholds (−1.645, −0.643, 0.643, 1.645) proposed by B. Muthén and Kaplan (1985) . 2 In scenarios where skewness was generated on the item level while keeping the latent NA and SI traits normally distributed, we used the Case 2 (−1.645, −0.643, 0.643, 1.645) and Case 3 (−1.645, −0.643, 0.643, 1.645) thresholds to transform the normally distributed continuous item scores into skewed ordinal item scores.
To simulate the cardiac event scores, we had to take into account that these scores depended on the scores of both the latent NA and SI traits as well as the interaction between NA and SI. Therefore, we used Equation (5) to compute the event probabilities based on a logistic regression model: In Equation (5), ρ Ci denotes the probability of a cardiac event score of individual i. β 0 represents the intercept and the three other β 0 s denote the standardized regression coefficients of the structural logistic regression of MACE on NA (� NAi ), SI (� SIi ) and the interaction between NA and SI (� NAi � � SIi ). Lastly, e denotes the base of natural logarithms.

Statistical methods
After simulating 500 datasets in each of the 240 conditions, we analyzed each dataset according to the same methods used in Study 1: (1) Logistic regression of sum scores using maximum likelihood (ML) estimation, (2) Logistic regression of factor scores based on the Regression method and using MLR to estimate the parameters in the NA and SI measurement models, and (3) Latent variable interaction using LMS with MLR estimation while treating the ordinal items as continuous. In scenarios involving ordinal item scores we also estimated the latent interaction effect using (4) LMS with MLR estimation while modeling the items as ordered categories using polychoric threshold parameters. We implemented all latent interaction models in Mplus and conducted the simulation using the R-package MplusAutomation (Hallquist & Wiley, 2011). The R-script of this simulation study is available at this project's open science framework page: https://osf.io/yhvnp/.

Outcome measures
Our main outcomes were bias, precision and accuracy of the estimated interaction effects. The bias was computed as the 1 We used the parameter estimates resulting from the LMS method, because this is the default method in Mplus to model interaction effects. We focused on the MACE endpoint, because the MI endpoint shows a relatively low proportion of participants with an event compared to the MACE endpoint. For reasons of simplicity, we did not study the effects of covariates in this Monte Carlo simulation. We therefore fitted the Study 1 LMS model without covariates and used those parameter estimates as input for our simulation study.

2
The average estimated skewness of the simulated skewed ordinal item scores was .16 and .20 when the generated latent skewness was equal to 2 and 3, respectively.
difference between the mean of the parameter estimate across 500 replications and the true value; that is, the β values used to generate the data. We used the standard deviation and corresponding 95% variability interval of the parameters estimates across the 500 replications as a measure of precision. We also assessed the mean squared error (MSE) as a measure of accuracy, where MSE is defined as the squared distance between the estimated value of the interaction effect and the true value of the interaction effect, averaged across 500 replications. Additionally, we computed for each condition, a 95% confidence interval coverage rate, as the percentage of replications where the 95% confidence interval of a single estimated interaction effect contained the true value of the interaction effect. Lastly, we determined in each condition the percentage of replication with a significant estimated interaction effects, in order to shed light on the power to detect non-zero interaction effects and the percentage of false positives when the true interaction effect was equal to zero.

Expectations
We expected the sum score method to underestimate the true size of the interaction because this approach does not take into account the measurement error inherent in the item scores and this may attenuate the true association between the latent construct and the manifest binary outcome. Though factor score methods account for measurement error when estimating parameters in the measurement model, the coefficients in the factor score regression may still be contaminated because the sample moments of the factor score deviate from the true moments. For linear factor score regression the biased moments can cancel out, but it is unclear whether this also generalizes to a logistic regression analysis. We therefore investigated this empirically in our simulation. Lastly, we did not expect LMS to suffer from attenuated regression coefficients, because it takes into account the measurement error of the item scores when estimating the interaction between the latent variables. In line with earlier simulation studies (Kelava & Nagengast, 2012;Kelava et al., 2014) we expected LMS to perform well with large sample sizes (500 or higher) and no skewness in the latent traits. We also expected the factor score approach to perform equally to the LMS method, because earlier research showed this to be the case for linear models involving main effects rather than interactions (Devlieger & Rosseel, 2017).

Results
For both the sum score and factor score methods, the convergence rate was 100% in all simulation conditions. Although for the LMS approach most conditions also showed 100% convergence rates, conditions with large skewness sometimes resulted in non-convergence (convergence rates were 98.4% or higher).
No non-positive definite covariance matrices were encountered. We removed the non-converged solutions from further analyses. Tables 3-5 show for all 240 simulation conditions the relative bias in the estimated interaction effect averaged across all 500 replications. Tables 6-8 report for all simulation conditions the percentage of replications in which the 95% confidence interval of a single estimated interaction effect contained the true value of the interaction effect. The online supplement contains six supplemental Tables. Tables S1, S2 and S3 present the bias in the estimated interaction effect in terms of the mean squared error (the squared difference between the true interaction effect and the estimated interaction, averaged across 500 replications). Tables S4, S5, and S6 show the power of detecting the interaction effect (the percentage of significant interactions in case of a non-zero true interaction) and false positive rates (the percentage of significant interaction effects when a true interaction effect was absent). For each method, the average bias in the estimated interaction effect is visualized in Figure 3 (continuous item scores), Figure 4 (ordinal item scores with latent skewness), and Figure 5 (ordinal item scores with skewness generated on the item level). The online supplement contains similar figures for both the two main effects. Each figure contains 9 plots, where the rows represent the different sample size conditions and the columns the different skewness conditions. In each of the 9 plots the x-axis shows the size of the true interaction effect and the y-axis the bias in the estimated standardized regression coefficient. The different methods used to model the interaction effect are visualized with varying colors and shapes of the data points. Filled shapes indicate high reliability conditions, while open shapes represent conditions with poor reliability. Each data point corresponds to the bias of a particular method in the estimated interaction effect, averaged across 500 replications. The error bars represent the 95% confidence interval of the mean estimated bias.

Absolute and relative bias
Inspection of Figures 3, 4 and 5 (absolute bias) and Tables 5-7 (relative bias) reveals some interesting patterns. As expected, the sum score method underestimated the interaction effect in every simulation condition and this effect became more pronounced as the reliability decreased, as the skewness increased, as the size of the true interaction increased, or when item scores were ordinal. Interestingly, a similar pattern was observed for the factor score method, suggesting that this method did not appropriately adjust for measurement error when estimating interaction effects in a generalized linear modeling context.
When item scores were normally distributed, both LMS methods tended to slightly overestimate the interaction effects, especially when the sample size was small (n/p ratio = 18). In general, LMS outperformed the factor score and sum score methods both in terms of average absolute and relative bias in the estimated interaction. As expected, LMS performed especially good at a sample size of 1000 (n/p ratio = 72) and when item scores were normally distributed. As expected, when latent skewness was introduced, both LMS methods underestimated the larger interaction effects, though this effect was still much more apparent for both the sum score and factor score methods. When item scores were ordinal, continuous LMS performed slightly better than categorical LMS when skewness was generated at the latent level. Interestingly, when the latent variables were normally distributed and skewness was introduced at the item level, only the categorical LMS method remained relatively unbiased. It showed similar bias regardless of whether the ordinal item scores were skewed. This pattern was not observed for the LMS method that treated the ordinal item scores as continuous. That method still resulted in underestimated interaction effects when the ordinal item scores were skewed, yet this effect became less pronounced with lower scale reliability.

Confidence interval coverage
All methods showed acceptable coverage probabilities for the smallest interaction effects. The largest interactions resulted in lower coverage for both the sum score and factor score regression methods, especially at larger sample sizes. A possible explanation is that these methods produce biased estimates for the larger interaction effects, and these biased estimates get increasingly narrow confidence intervals due to the larger sample size, resulting in lower coverage rates. Both LMS methods performed best in terms of confidence interval coverage, with acceptable coverage probabilities in almost all simulation conditions, except when skewness was introduced at the latent level, and either the sample size or the true interaction effect was large. When skewness was introduced at the item level rather than at the latent level, the coverage rates of the continuous LMS methods remained suboptimal, while those of categorical LMS were adequate.

Mean squared error
For all methods, as the sample size became larger, the mean squared error became smaller, likely because of less variable estimates. At a sample size of 250 (n/p ratio = 18), LMS showed larger mean squared error than the factor score and sum score methods, likely due to the larger variability in the estimated interaction effects. Lowering the reliability of the questionnaires from .87 to .60 only slightly affected the relative bias and mean squared error for both LMS methods, but it largely affected both the sum score and factor score methods, with lower reliability resulting in more bias, especially for larger interaction effects. When item scores were ordinal and Figure 3. Comparison of three methods to estimate interaction effects between NA and SI on a binary outcome in scenarios with continuous items, varying over the true interaction size (x axis), reliability of the NA and SI scales, amount of latent skewness (columns) and sample size (rows). Each data point shows the mean bias (including 95% confidence interval) in the standardized regression coefficient of the estimated interaction effect.
skewness was generated at the item level, categorical LMS outperformed continuous LMS, though it performed slightly worse when skewness was generated at the latent level.

Power and false positives
As expected, for all methods larger sample sizes resulted in more statistical power to detect true interaction effects. With a sample size of 250 (n/p ratio = 18), all methods were only able to detect the largest interaction effects (0.616) with a power of at least 0.80. At a sample size of 500 (n/p ratio = 36), all methods also became able to detect medium interactions (.308) with sufficient power. Having ordinal item scores resulted in slightly less power for all methods compared to using continuous item scores. As the reliability of the questionnaires decreased, the sum score and factor score methods showed less statistical power to detect the interaction effect, while this only affected both LMS methods to a minor extent. For each method the power decreased as the skewness of the item scores increased, regardless of whether the skewness was generated on the item level or on the latent variable level. An exception was the categorical LMS approach, showing only minor power reductions when skewness was introduced at the item level. All methods showed acceptable false positive rates close to the nominal level of 5% in each of the simulation conditions (ranging between 2.6% and 6.2%).

Synthesis
Inspection of the simulation condition that most resembled the circumstances of our empirical study (500 participants, moderately skewed ordinal item scores (skewness = 2), estimated reliability of 0.87 and an interaction effect of β = .308), suggested that SEM according to both LMS methods performed Figure 4. Comparison of four methods to estimate interaction effects between NA and SI on a binary outcome in scenarios with ordinal items, varying over the true interaction size (x axis), reliability of the NA and SI scales, amount of latent skewness (columns) and sample size (rows). Each data point shows the mean bias (including 95% confidence interval) in the standardized regression coefficient of the estimated interaction effect.
best in terms of power, coverage probability, relative bias and mean squared error in the estimated interaction. Both the sum score method as well as the factor score method showed slightly lower power, lower coverage probabilities and underestimated the true size of the interaction effect. Whether continuous or categorical LMS performed best depends on whether the skewness originated at the latent or item level. If the latent NA and SI traits are skewed, then categorical LMS performed slightly worse than continuous LMS in terms of absolute bias and power, while the reverse was true when the latent traits were normally distributed and the skewness originated at item score level. Given that the latent NA and SI distributions in Figure 2 are not very skewed, we assume the skewed item scores to have originated at the item level. In light of this we consider the categorical LMS approach to perform best in the circumstances of our empirical study and we will therefore base our discussion on these Study 1 results.

Discussion
The goal of this paper was twofold. Our starting point was the empirical question of whether Type D Personality, operationalized as the interaction between its two subcomponents NA and SI, predicts the occurrence of cardiac events in a population of patients suffering from coronary artery disease. We used four different methods to model the interaction effect. Our second goal was to compare the bias, precision and accuracy of these four methods in a Monte Carlo simulation study, in order to shed more light on the inconsistent estimates resulting from the four interaction models used in our empirical study.
As expected, our empirical study showed that Type D personality was associated with the occurrence of major cardiac events, and even more strongly associated with the occurrence of a cardiac death or myocardial infarction. In general, the four methods used to model the interaction agreed Figure 5. Comparison of four methods to estimate interaction effects between NA and SI on a binary outcome in scenarios with ordinal items, varying over the true interaction size (x axis), reliability of the NA and SI scales, amount item skewness (columns) and sample size (rows). Each data point shows the mean bias (including 95% confidence interval) in the standardized regression coefficient of the estimated interaction effect.
on the direction and the statistical significance of the interaction effect, but differed in their estimated size of the interaction. As expected, the effects of the sum score logistic regression analyses were smaller than the effects estimated by the factor score logistic regression and latent logistic regression methods, likely because the sum score approach did not take into account the measurement error in the item scores. Although as expected the estimates resulting from the latent logistic regression were larger than those of the sum score regression, they were unexpectedly also larger than the estimates produced by the factor score logistic regression. This finding motivated our Monte Carlo simulation study.
In our simulation, SEM using LMS to model the interaction effect outperformed all other methods in terms of relative bias when the sample size was large and there was no skewness. This result aligns with our expectations, because LMS assumes that the indicators are multivariate normally distributed (Klein & Moosbrugger, 2000). Indeed, earlier research showed LMS to be biased when the scores of the items loading on the latent exogenous variables are skewed (Cham et al., 2012;Kelava & Nagengast, 2012;Kelava et al., 2014). We replicated this findings by showing that LMS underestimated the larger interaction effects when item scores were skewed due to skewness at the latent variable level. However, when skewness was introduced at the item level rather than at the latent variable level, categorical LMS produced acceptable estimates of the interaction effects at a sample size of 1000. This findings suggests that when sample size is large enough, categorical LMS becomes robust to violations of the assumption of multivariate normally distributed indicators, as long as the underlying latent traits are normally distributed. This robustness does not apply to the continuous LMS method that treats the ordinal item scores as continuous.
As expected, the sum score interaction method in general produced more biased estimates than those of the latent interaction methods. This corroborates earlier research showing that using sum scores may attenuate the estimates of the regression coefficients because using sum scores includes random measurement errors (Busemeyer & Jones, 1983;Embretson, 1996;Kang & Waller, 2005;MacCallum et al., 2002). Because of this finding we recommend researchers not to use the sum score method when analyzing the interaction between two continuous variables on a manifest binary outcome.
Interestingly, the estimates of the factor score interaction more closely resembled those of the sum score method than those of the latent interaction method. Both the factor score method and the SEM LMS account for measurement error when estimating the parameters in the measurement model. However, the factor score regression's two step method may result in contaminated structural regression coefficients if the factor score sample moments are different from the true moments. Though in the context of linear factor score regression this bias cancels out (see for instance, Devlieger & Rosseel, 2017), our simulation shows this is not the case in a generalized linear modeling context (e.g., with a observed binary outcome). Showing this analytically would be an interesting avenue for future research.

Recommendations
We have a number of recommendations to researchers planning to model the interaction between two continuous variables on an observed binary outcome. First, whenever possible use items measured on a continuous scale, as these typically result in less bias than items measured on an ordinal scale. Second, in case of a large sample size (e.g., N ≥ 500 or n/p ratio ≥ 36) consider SEM using LMS to estimate a latent interaction model, as these models are the least biased when sample size is large, especially in the absence of skewness. In that situation researchers could either use categorical or continuous LMS, because in line with earlier research (Dolan, 1994;Rhemtulla, Brosseau-Liard & Savalei, 2012) our simulation suggests that normally distributed ordinal item scores with five categories can be considered continuous. Third, when the estimated reliability of the constructs involved in the interaction is not sufficient, SEM results in less biased estimates than the sum score or factor score regression methods. Fourth, latent skewness introduces negative bias in the estimated interactions for most methods. When the skewed item scores are continuous, LMS is the least biased method, but researchers should take into account that this method may underestimate the larger interaction effects. When the skewed item scores are ordinal, consider using categorical LMS when the underlying latent variables are still normally distributed. Although none of the investigate methods produce unbiased interaction effects when the underlying latent variables are skewed, researchers should consider using continuous LMS in these circumstances, as this method slightly outperforms categorical LMS and the other methods in terms of minimizing bias. However, researchers may also consider modeling the interaction between non-normally distributed latent variables using a Bayesian approach. Although the present study did not focus on this approach, it produced unbiased interaction effects in another simulation study (Kelava & Nagengast, 2012).

Limitations
In our simulation we focused both on items with continuous scales and items with five category ordinal scales. The differences between those scale types was primarily a matter of degree rather than kind. Overall, using ordinal items resulted in less precise and slightly more biased estimates, especially when skewness was high. This is in line with earlier simulation studies showing that when skewness is absent, ordinal items with at least five categories can be treated as continuous items in subsequent analyses (Dolan, 1994;Rhemtulla, Brosseau-Liard & Savalei, 2012). As these studies did not focus on interaction modeling, future research could investigate whether interaction models perform well when the interaction constructs are based on items with a smaller number of ordinal response categories (e.g., 2, 3 and 4). Another limitation of our simulation is that we did not include covariates in the model for reasons of simplicity. Given that covariates are often part of a statistical model in the medical and behavioral sciences, future simulation studies could assess whether the inclusion of covariates affects the performance of methods used to model interactions.
A further limitation of our study is that we only focused on the LMS method to model the latent interaction effect, while there exist many other methods to model latent interactions, such as the product indicator approach (Jöreskog & Yang, 1996;Kenny & Judd, 1984;Marsh et al., 2004), the two stage least squares approach (Bollen & Paxton, 1998), or mixture modeling (Kelava & Nagengast, 2012;Kelava et al., 2014). In earlier work (Lodder et al., 2019) we compared LMS with two different product indicator approaches in a Monte Carlo simulation study and found that LMS was the least biased method when modeling the interaction between two latent variables on a continuous latent outcome variable. Future research could aim at extending the results of the present study to other latent interaction models applied within a logistic regression context.
In our study Type D personality was operationalized as the interaction effect between its continuous subcomponents NA and SI. According to Denollet et al. (2013), the effect of Type D personality on cardiac outcomes in theory implies that it is the combination of having both high NA and high SI that is most detrimental to cardiac health. Smith (2011) interpreted this as saying that the Type D effect is more than the sum of its parts NA and SI, a classic example of synergy. This synergistic effect would imply that it is not the separate NA and SI effects that are essential to Type D personality, but the combined NA and SI effect that is still present above and beyond the sum of their additive effects. Smith (2011) argued that this synergy is modeled statistically by testing an interaction effect while also including the main effects of the variables constituting the interaction. Although in our study we followed Smith (2011) by using a variable-centered approach and modeling Type D personality as an interaction between NA and SI, another commonly used method to operationalize Type D personality is a person-centered approach that classifies people in subgroups based on whether they have crossed a particular cutoff score for both NA and SI (e.g., Denollet et al., 2013Denollet et al., , 2018Hillen, 2017). Person-centered approaches are often useful when the data shows substantial amounts of heterogeneity and the effect of interest is present for some people and not for others. However, classifying people in groups based on cutoff values has been criticized because such dichotomization produces less sensitive statistical tests and may result in spurious findings that are not robust against using other cutoff values and do therefore likely not reflect real differences between the groups (MacCallum et al., 2002; Royston et al., 2006). However, it is not necessary to classify people based on arbitrary cutoff values, because within a latent variable framework it is possible to use the individual item scores to classify people in a set of distinct latent classes and subsequently use class membership to predict the scores on an outcome variable. Therefore, future research could investigate whether such a person-centered approach is more beneficial when studying Type D personality, than the variable-centered interaction model we used in our study.

Conclusions
When seeing the estimates in our empirical study in the light of the results of our simulation study, we can draw several conclusions. First, given the characteristics and outcomes of our empirical study (500 participants, moderately skewed ordinal item scores and an interaction effect of β = .308), the latent interaction model using categorical LMS performed best with respect to minimizing the average bias. It performed slightly better than the continuous LMS method and much better than both the sum score and factor score methods. Those latter two methods produced similar estimates that were both lower than those of the two LMS methods. It is interesting to note that this exact pattern was also found in our empirical study. If we follow the results of the latent interaction model, then we can conclude that Type D personality is a significant predictor of both major cardiac events and an even stronger predictor of cardiac death or myocardial infarction.
In this article we showed that Type D personality is an important risk factor in the occurrence of cardiac events, in line with earlier research on this issue (Denollet et al., 2013;Du et al., 2016;Kupper & Denollet, 2016). We used several statistical interaction models to assess this association, resulting in varying estimates, yet similar conclusions. To the best of our knowledge this study includes the first Monte Carlo simulation comparing the performance of these methods when estimating interaction effects between two continuous variables on an observed binary outcome variable. To our knowledge this is also the first simulation study to show that the mechanism causing the skewed item scores determines what method should be used to model the interaction effects. Although our simulation study was motivated by an issue we encountered in our empirical study, the results are not limited to research on Type D personality. Because our simulation varied over a wide range of design factors, we consider these results to be generalizable to many other research areas involving interactions between continuous latent variables on binary manifest outcomes.

Funding
The research of J.M. Wicherts was funded by the European Research Council (Grant nr: 726361) as part of the programme: H2020-EU.