A Semiparametric Approach for Structural Equation Modeling with Ordinal Data

ABSTRACT There is currently a lack of methods for non-linear structural equation modeling (NSEM) for non-parametric relationships between latent variables when data are ordinal. To this end, a semiparametric approach for flexible NSEMs without parametric forms is developed for ordinal data. An indirect application of a finite mixture of structural equation models (SEMM) is employed for modeling the conditional expected mean of endogenous latent variables. In this context, the latent classes are not to be interpreted as groups of observations belonging to those classes, rather they serve as means to model flexible non-linear functions as locally linear functions which together approximate a globally non-linear function. The proposed method is based on a hybrid of direct maximization and expectation-maximization algorithms. Two simulation studies are performed which show that parameter estimates are associated with low bias and a non-linear functional form is satisfactorily estimated using the proposed approach.


Introduction
In recent decades, structural equation modeling (SEM) has become available to applied researchers, particularly in social sciences, thanks to software packages such as LISREL (Jöreskog & Sörbom, 1993, 1996) and Mplus (Muthén & Muthén, 2020) by modeling the covariance structure assuming normally distributed latent variables.Most SEM approaches have historically focused on linear relationships between latent variables.In this paper, we will refer the exogenous variables as the variables that are not determined by other variables in the model, whereas endogenous variables as the variables determined by other variables in the model.However, it is frequently necessary to incorporate more flexible functions in order to capture the true relationships.See Ajzen and Madden (1986), Ajzen (1991), and Agustin and Singh (2005) for examples of such considerations.When the structural relationship is nonlinear, the latent endogenous variables are not normally distributed.Hence, the parameters cannot be determined only based on the covariance structure.Subsequently, new approaches were developed starting with Kenny and Judd (1984), where products of indicators were used as indicators corresponding to product and interaction terms.This approach was further developed by Jaccard and Wan (1995), Jöreskog and Yang (1996), Kelava and Brandt (2009), and Wall and Amemiya (2001).By forming products of indicators for interaction and quadratic terms, estimation of nonlinear structural models was made possible in (at the time) existing software.The product indicator approaches have been criticized for their ad-hoc nature and arbitrariness in terms of which indicators to include (e.g., Wall (2009) and Harring et al. (2012)).Since then, the methodological development for estimating nonlinear structural equation models (NSEM) has continued by maximum likelihood-based approaches, among which Klein and Moosbrugger (2000) and Klein and Muthén (2007) are the most established methods.
Bayesian approaches of Arminger and Muthén (1998) and Zhu and Lee (1999), and two-stage least square approaches of Bollen (1995Bollen ( , 1996) ) have also been considered.Most of the development for NSEM has been focused on models with continuous indicators, although there are exceptions such as Lee and Zhu (2000), Lee and Song (2003), Rizopoulos and Moustaki (2008), and Muthén (2001).
In all methods mentioned above, the functional form of the structural model is specified.However, it is sometimes desirable to relax this requirement in order to model a more general family of functions.Nonparametric approaches, such as Song et al. (2014), and semiparametric approaches, e.g., Bauer (2005), Pek et al. (2011), Song and Lu (2010), Finch (2015), and Guo et al. (2012) provide the opportunity to estimate models with unspecified functional forms.In the finite mixture of structural equation models (SEMMs), e.g., Bauer (2005) and Pek et al. (2011), latent exogenous variables are modeled as mixtures of multivariate normal distributions.The mixturebased semiparametric approaches have two directions of purposes.In the direct approach, the groups are interpreted as real subpopulations and, hence, group-specific structural parameters are interpreted as parameters associated with the corresponding subpopulation.On the other hand, in the indirect approach, the groups are not to be interpreted as subpopulations in any substantive sense.Instead, they provide means to model nonnormal latent variables as mixtures of normal distributions as in the nonlinear structural equation mixture model (NSEMM) by Kelava et al. (2014).Hence, model estimation, robust to nonnormal exogenous latent variables, is possible in a parametric context.Another application of the indirect approach, as in the SEMM by Bauer (2005) and Pek et al. (2011), is to model a nonlinear relationship as weighted group-specific linear functions.In this approach, both nonlinearity and nonnormality in the latent variables are modeled by the normal mixture, resulting in the flexible model estimation, robust to nonnormal latent exogenous variables.
There is, at present, a lack of non-and semiparametric approaches for latent variable modeling in the presence of ordinal data, although there are a few exceptions, e.g., Song et al. (2013) who extended the penalized Bayesian P-splines approach of Song and Lu (2010) to incorporate mixed data types.It has been demonstrated that treating ordinal data as continuous can produce bias (Finney & DiStefano, 2006;Rhemtulla et al., 2012).In this article, a semiparametric approach, based on the work of Bauer (2005) and Pek et al. (2011) is developed to allow for ordinal indicators based on quasi-maximum likelihood estimation, combining the direct maximization of the approximated loglikelihood of Jin et al. (2020) and the expectation-maximization (EM) algorithm of Dempster et al. (1977).
The article is organized as follows.First, the model is outlined, followed by the estimation procedure.Two simulation studies are reported in the following section.The first study recognizes that accurate, group-specific parameters are obtained, whereas the second study shows that a structural model of quadratic form is satisfactorily estimated using the developed approach.The article concludes with discussion and conclusion.An appendix is provided with technical details.

The semiparametric model
In the SEMM, as proposed by Bauer (2005) and Pek et al. (2011), a weighted sum of linear components is used to model the conditional expected value EðηjξÞ as where ξ is a K ξ -dimensional vector of latent exogenous variables, η is a K η -dimensional vector of latent endogenous variables, M is the number of components in the mixture, τ ðmÞ is a K η -vector of intercepts for component m, Γ ðmÞ is a K η � K ξmatrix representing linear effects of component m, and the conditional probability of component m is where N K ξ ðÞ is the multivariate normal density function, π ðmÞ is the population proportion of component m, μ ðmÞ is the population mean vector of dimension K ξ � 1 corresponding to component m and Φ ðmÞ is the population covariance matrix of ξ corresponding to component m of dimension Only continuous data are considered by Bauer (2005) and Pek et al. (2011).In this work, their approach is extended to work for ordinal data.Define X j to be the j th ordinal indicator of ξ and Y j to be the j th ordinal indicator of η. 1 The number of indicators corresponding to ξ and η is denoted by ρ x and ρ y , respectively, such that x and y are the ρ x and ρ y dimensional realizations of X j , for j ¼ 1; 2; 3 . . .; ρ x and Y j for j ¼ 1; 2; . . .; ρ y , respectively.The measurement model relating the continuous latent variables ξ and η to the ordinal observed variables x and y is then x;j À β T x;j ξ; u j 2 1; . . .; U j � � ; j ¼ 1; 2; . . .; ρ x ; (3) where g ðxÞ j and g ðyÞ j are the link functions, β T x;j is the ρ x -vector of factor loadings of indicator X j , β T y;j is the ρ y -vector of factor loadings of indicator Y j , and U j and V j are the number of categories of corresponding indicator.Here we let α

Maximum likelihood estimation
The joint density function of y, x, η, ξ and m is referred to as f y; x; η; ξ; m ð Þ.We assume that the observed variables y and x are independent, conditioned on the latent variables η and ξ and component m.The distribution of y is determined by η and the distribution of x is determined by ξ from the measurement model (Equations ( 3) and ( 4)).The measurement model does not depend on the group m.The SEMM (1) indicates that the distribution of η is determined by ξ and component m and that the distribution of ξ is determined by the component m.Thus, the joint density function is equivalent to where f yjη ð Þ and f xjξ ð Þ are given by the measurement model (3) and (4), ξjm,N ξ; μ ðmÞ ; Φ ðmÞ � � (6) with the restriction of total probability P M m¼1 π ðmÞ ¼ 1.The conditional distribution of η given ξ and component m is assumed to be In the current setting Ψ is the same for all M components for simplicity.This restriction could easily be lifted.Neither the latent variables ξ i and η i , nor the components m i , are observed, where subscript i refers to observation i.Hence, under the assumption of independent observations, the objective function for maximization is the observed log-likelihood where θ is the vector of unrestricted parameters, n is the number of observations and , i.e., the vector of latent variables of the i th observation.Similarly, y i and x i are 1 The subscripts on X and Y should in principle be different, e.g., j x and j y but for notational convenience, we use the same (j) for X and Y.
vectors of observed indicators of the i th observation, and m i represents the m th i component.

Estimation
The aim is to maximize ,ðθÞ with respect to θ.In situations with missing data (or latent variables), it is common to employ the EM algorithm (Dempster et al., 1977).However, the EM algorithm is known for its slow convergence.Jin et al. (2020) proposed to estimate an extended NSEM by a quasi-Newton method where the exact gradient of the approximated log likelihood is calculated for finding the maximum of ,ðθÞ without using EM, hence, referred to as direct maximization (DM).
For the extended NSEM, we expect DM to be computationally more efficient than EM.Instead of purely relying on the EM algorithm, we propose to use a hybrid of EM and DM in Jin et al. (2020) in order to estimate the SEMM.The developed version of the EM algorithm is expected to be reasonably computationally efficient since there are closed-form solutions to the parameter updates in one EM step.For the purpose of presentation, a sketch of the proposed algorithm is described here.A detailed description of the procedure can be found in the appendix.
Let U ðmaxÞ and V ðmaxÞ be the maximum number of categories of the indicators corresponding to ξ and η, respectively.
Then let α x be a ρ x � U ðmaxÞ À 1 À � -matrix with element α on row j and column u j , α y be a ρ y � V ðmaxÞ À 1 À � -matrix with element α ðv j Þ y;j on row j and column v j .Let β x be a ρ x � K ξmatrix with the row vector β T x;j on row j and β y be a ρ y � K ηmatrix with β T y;j on row j.For convenience, the vector of unconstrained parameters is partitioned as θ , where θ 1 is a vector of unrestricted elements in α y , β y , α x and β x and θ 2 is a vector of unrestricted parameters in Ψ, τ ð1Þ , . .., τ ðMÞ , Γ ð1Þ , Γ ðMÞ , Φ ð1Þ , . .., Φ ðMÞ , μ ð1Þ , . .., μ ðMÞ , π ð1Þ , . .., π ðMÞ .For a given θ 2 , θ 1 only consists of group invariant parameters in the measurement model.We propose to maximize , θ 1 ; θ 2 ð Þ with respect to θ 1 for fixed θ 2 by DM.No closed-form updates of θ 1 are found.Hence, the quasi-Newton's method is used to update θ 1 numerically.For a given θ 1 , θ 2 is updated using the EM algorithm.EM is used for some parameters since the logarithm of the sum over the components m i in (9) generates instability in the gradient of ,ðθÞ.Further, in the proposed version of EM, the conditional maximization has closed-form solutions.
Both the closed-form updates in EM and the , θ 1 ; θ 2 ð Þ to be maximized in DM involve intractable integrals and approximations are required.In DM, the log-likelihood , θ 1 ; θ 2 ð Þ is approximated by the adaptive Gauss-Hermite quadrature (AGHQ, Liu & Pierce, 1994).The reader is directed to the appendix for the expression of an AGHQ approximation.For an unimodal function, the error rate of the AGHQ approxima- Jin & Andersson, 2020) where � b c takes the largest integer that is less than the enclosed value and q is the number of quadrature points per latent variable.
The AGHQ approximation was accurate for the extended NSEM investigated by Jin et al. (2020) for q � 5.
The exact gradient of the approximated log-likelihood is calculated in the quasi-Newton's method, which is similar to the method of Jin et al. (2020).In EM, the integrals in the closed-form solutions are also approximated by the AGHQ and are expected to be accurate for large samples and sufficiently large q.See the appendix for details.A pseudocode of the proposed approach is presented in Algorithm 1 at iteration k. , update θ 2 with approximated closed-form solutions end

Restrictions
In the SEM framework, a common assumption is that In the proposed semiparametric setting, it is more convenient to put restrictions in the measurement model as explained below.It is, however, whenever appropriate, possible to perform a change of variables such that the new variables, The common assumption that E ξ ð Þ ¼ 0, corresponds to K ξ restriction equations for proportions and means of exogenous latent variables.The corresponding maximization equations in the EM step then lack closed-form solutions.In the current work, we define K ξ restrictions on α x instead.Impose the restrictions α ð1Þ x;s ¼ 0 where s is an index corresponding to a row of α x and on the same row in β x there is a restricted element setting the scale of a latent variable.There is one such restriction per latent exogenous variable giving rise to K ξ restrictions.Consider a change of variables for ξ � ¼ ξ À μ ξ , where Then, the measurement model, e.g., corresponding to the indicator x s , is in general.Correspondingly K η number of restrictions for η are also needed.For this reason, the restriction τ ð1Þ ¼ 0 is imposed.Then, the change of variables η Thus, it is possible and straightforward to alternate between the sets of variables η T ; ξ T À � , for which the estimation procedure is designed, and η �T ; ξ �T  À � which has mean zero.
For example, the population latent regression in Simulation Study 2 in the next section is η which is based on the zero-mean parametrization, namely, The linear predictor for the first indicator is α ð1Þ x;1 À β ðxÞ 1 ξ � , where α ð1Þ x;1 �0, when generating the ordinal values.During estimation in our parametrization, we restrict α ð1Þ x;1 ¼ 0 and τ ð1Þ ¼ 0. The linear predictor for the first indicator becomes À β ðxÞ 1 ξ and the latent regression in the first group is Γ ð1Þ ξ.Consequently, neither EðηÞ nor EðξÞ is necessarily zero.Instead, they are μ ξ ¼ P M m¼1 π ðmÞ μ ðmÞ and μ η ¼ P M m¼1 π ðmÞ τ ðmÞ þ Γ ðmÞ μ ðmÞ À � , respectively.We can easily transform the estimates from our parametrization to the zeromean parametrization using ξ � ¼ ξ À μ ξ and η � ¼ η À μ η .In particular, the β and Γ ðmÞ , for all m, remain the same.The measurement models in two parametrizations can be transformed by For the latent regression, our parametrization yields Hence, the conditional mean in the zero-mean parametrization can be obtained by where

Simulations
In order to investigate the performance of the proposed method, two simulation studies are performed.In the first study, data is generated using prespecified, true model parameters, and the accuracy of the parameter estimation is investigated.In the second study, the data is generated using a quadratic function.Hence, no true parameters are known (or even existing).The degree to which the procedure captures the true nonlinear function is investigated, without having access to true parameters.The code is written in R (R Core Team, 2018) and will be provided upon request and will be released as an R package.In the simulations, the estimation stops when the maximum absolute difference in the parameters between two consecutive iterations is less than 5 � 10 À 4 .In each simulation study, 1000 samples are generated and estimated using 5 quadrature points.For evaluation purposes in Simulation Study 1, the relative bias ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi , and standard deviation SD p ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi are calculated for each structural parameter θ p , where R is the number of replications and θp;i is the corresponding estimate at the i th replication, and � θ p is the average estimate of the parameter θ p .
2 A simple structure is not necessary, but in this first study, a simple structure is employed for simplicity.

Simulation results
Two (out of 1000) replications failed to converge for n ¼ 500 and all samples converged for n ¼ 1000.In line with the literature (Flora & Curran, 2004), relative biases larger than 5 are considered to be moderate to substantial.In Table 1 (n ¼ 1000) and Table 2 (n ¼ 500) the population parameter values (θ p ), average estimates ( � θ p ), RB, 10 � SD and 10 � RMSE are reported for the parameters associated with the structural, semiparametric model.The proposed semiparametric method provides low biases for all parameters.The RMSE is almost exclusively constituted by the variance of the estimates.

Simulation design
In the second simulation study, data (n ¼ 1000) were generated using the same quadratic function as Bauer (2005), namely where ξ � is generated from a standard normal distribution, the error term ζ is generated from a normal distribution with mean 0 and variance 0.25.The stars are used because Hence, after estimation, the estimated function is shifted as described before.The parameters of the measurement model are given by α T x ¼ 0 À 0:20 À 0:10 2:97 2:63 2:39 3:85 3:59 3:22 5:24 4:84 4:19

Simulation results
The conditional expected mean E η � jξ � ð Þ is shown as a function of ξ � in Figure 1 when 2 and 3 components are used (left and middle panels), respectively.The bold line represents the population function in equation ( 10), whereas the thin line represents the average estimated conditional mean function.Ninety-five percent of the estimated functions are within the shaded areas.With two components the functional form is captured to some degree, although the performance is substantially better using three components.Although the average result is acceptable, in some instances, the form of the conditional expected mean is jumping peculiarly when one of the proportions is estimated to be close to zero.In practice, it is difficult to know how many components to use.One possibility is to use the Akaike information criterion (AIC) for selecting the number of components.The AIC is defined as where k is the number of estimated parameters and , θ � � is the approximated log likelihood at the mode.In the simulations, 3 components are recommended in approximately 84% of the replications.In the right panel of Figure 1, each replication has been estimated using 2 and 3 components and the model with the lowest AIC has been selected.The average estimated conditional mean function (the thin line) captures the population function well.

Discussion and conclusion
In NSEM it is common to consider terms corresponding to interaction and quadratic effects for modeling nonlinearities in the structural model.It is sometimes appropriate to consider more flexible functions with less, or no a priori assumptions about the functional form.Such models are less common in the literature, particularly models which consider ordinal data.In this article, a semiparametric structural model (Bauer, 2005;Pek et al., 2011) is proposed in the presence of ordinal data.The structural model consists of a weighted sum of locally linear functions, which together approximate a globally nonlinear function.A hybrid of a direct maximization approach (Jin et al., 2020) and an expectation-maximization algorithm (Dempster et al., 1977) is suggested and implemented for maximizing the marginal log-likelihood of the observed ordinal data.Two simulation studies are performed.The first shows that the proposed method provides parameter estimates with low relative bias, whereas the second study illustrates how the method accurately captures a nonlinear structural model in terms of the conditional mean of the latent endogenous variable.The accurate parameter estimates and performance in estimating the conditional mean suggest that the AGHQ approximations are of high quality for q � 5 in models similar to those investigated in this study.Two and three components are used in the second study.Using two components yields a reasonable and stable functional form, whereas three components yield a more accurate functional form on average, with the drawback of occasionally unstable estimates.The number of components might possibly be selected using AIC.Advantages of the proposed method include the capacity of modeling flexible structural models in the presence of ordinal data when the latent exogenous variables are not necessarily normal.In this study, little attention is devoted to the latter point.A suggestion for future studies is to investigate the performance of the proposed method under moderate to severe nonnormality of latent exogenous variables.This is left as a future project.

Measurement Model Estimation
Since the sum in ( 9) does not depend on the parameters in θ 1 , direct maximization, as proposed by Jin et al. (2020) is implemented for θ 1 given the current estimates of θ 2 .The observed likelihood is where L i is the observed likelihood of the i th observation.Let the mode Υi;m be the solution to and the Hessian be evaluated at the mode.Further, let L i;mi L T i;mi ¼ À H i;mi Υi;mi ; θ À � À � À 1 .Applying the AGHQ approximation using equations ( 11) and ( 12) to the i th factor of equation ( 14), gives where Υi;mi ðk 1 ; k 2 ; . . .; k K Þ ¼ ffi ffi ffi 2 p L i;mi z k1;k2;...;kK þ Υi;mi are the translated and dilated latent variables about the mode.And with where , is the log of the m th i term in equation ( 15).θ 1 is updated by solving @, @θ 1 ¼ 0 with respect to θ 1 , using the approximation in equation ( 15).Then equation ( 16) is Acknowledging that the mode Υi;mi depends on θ, the gradient of , i;m θ; Υi;m ðθÞ À � is @, i;mi θ; Υi;mi ðθÞ À � @θ 1 ¼ @, i;mi ðθÞ @θ 1 j Υ¼ Υi;m i þ @ Υi;mi ðθÞ @θ T � �T @, i;mi ðΥÞ @Υ j Υ¼ Υi;m i ; (17) where @, i;mi ðθÞ=@θ 1 is the derivative of , i;mi with respect to θ 1 , treating Υi;mi as constant, whereas @, i;mi ðΥÞ=@Υ is the derivative of , i;mi with respect to the latent variables, treating θ as fixed.The first factor in the second term of ( 17) is, by the implicit function theorem In order to find the solution to equation ( 16) the BFGS approximation to the Newton-Raphson method is implemented.The inverse of H i;mi is approximated and updated by low-rank approximation at each iteration.The updated parameter vector θ 1 is obtained by updating θ 1 with the BFGS algorithm a number of times.

Structural Model Estimation
If the quasi-Newton-Raphson method proposed in the previous section would be used, the sum in equation ( 14) introduces instability for parameters, which depend on m i .All elements in θ 2 depend on m i except for Ψ.Ψ is the same for all m i in this setting only for simplicity.The reason for keeping Ψ in θ 2 is for convenience at later stages in more general settings.The EM algorithm of Dempster et al. (1977) is implemented for updating θ 2 given the current estimates of θ 1 , as follows: Treating Υ i and m i as random variables, let the conditional distribution of Υ i and m i be .The update of θ 2 is the vector that maximizes the expected value of the complete log-likelihood given the conditional distribution in equation ( 18).Throughout this section, θ 1 is fixed to the current estimate.Hence, for simplicity, θ 1 is dropped.Then, the objective function is The integrals in (36) are intractable.For ρ i;m the AGHQ approximation is applied as in the previous section.For the other integrals, the same weights and quadrature points are used.Integrals should still be asymptotically well approximated since the original integrand is multiplied by a polynomial of degree at most 2, as argued by Naylor and Smith (1982).Once the updated estimates are obtained the previous estimates θ

Table 1 .
Parameter and corresponding average estimates using AGHQ approximation with 5 quadrature points per latent dimension and 1000 replications.RB, SD, and RMSE stand for relative bias, standard deviation, and root mean squared error, respectively.The sample size is N ¼ 1000.

Table 2 .
Parameter and corresponding average estimates using AGHQ approximation with 5 quadrature points per latent dimension and 1000 replications.RB, SD, and RMSE stand for relative bias, standard deviation, and root mean squared error, respectively.The sample size is n ¼ 500.