Exploratory Mediation Analysis with Many Potential Mediators

Social and behavioral scientists are increasingly employing technologies such as fMRI, smartphones, and gene sequencing, which yield 'high-dimensional' datasets with more columns than rows. There is increasing interest, but little substantive theory, in the role the variables in these data play in known processes. This necessitates exploratory mediation analysis, for which structural equation modeling is the benchmark method. However, this method cannot perform mediation analysis with more variables than observations. One option is to run a series of univariate mediation models, which incorrectly assumes independence of the mediators. Another option is regularization, but the available implementations may lead to high false positive rates. In this paper, we develop a hybrid approach which uses components of both filter and regularization: the 'Coordinate-wise Mediation Filter'. It performs filtering conditional on the other selected mediators. We show through simulation that it improves performance over existing methods. Finally, we provide an empirical example, showing how our method may be used for epigenetic research.


Introduction
Social and behavioral scientists are increasingly employing technologies such as fMRI, smartphones, and gene sequencing, which yield 'high-dimensional' datasets with more variables than observations. These high-dimensional data are often intended to answer questions such as "which areas of our brain are relevant for pain perception?" (Atlas, Lindquist, Bolger, & Wager, 2014) and "which genes mediate the effect of trauma on stress reactivity?" (Houtepen et al., 2016). These are questions regarding exploratory mediation analysis (EMA). Structural equation modeling (SEM) is the preferred method for mediation analysis with multiple mediators (Preacher & Hayes, 2008;Vanderweele & Vansteelandt, 2014). With this method, it is possible to determine to what extent specific M variables mediate the X → Y effect conditional on the presence of other mediators in the model. However, this method fails when the data is high-dimensional, when the variables under investigation outnumber the samples N . In this situation, the observed covariance matrix is rankdeficient, leading to linear dependence in the observed moments and, for the full mediation model, nonconvergence.
Several alternative methods for EMA have been proposed to deal with this issue. One option mentioned by Preacher & Hayes (2008) is to select relevant mediators from a series of univariate X → M → Y mediation models (e.g. Boca, Sinha, Cross, Moore, & Sampson, 2014;Liu et al., 2013). We call this the "filter" method, following the taxonomy of Guyon & Elisseeff (2003). Its main advantages are that it is simple to explain and run, requiring only P univariate path models. On the other hand, the filter method introduces bias through model misspecification: it takes into account only the marginal relationships of M with X and Y . A pitfall of this is that a variable useless by itself can be useful together with others (Guyon & Elisseeff, 2003). In other words, a certain mediator may be marginally irrelevant, but relevant conditional on another set of mediators.
rather than the desired subset of mediators: the regularization in XMed shrinks small β paths to 0, irrespective of the value of their associated α paths -shrinkage is performed on all paths equally. This leads to inflated false positive rates as reported by Serang et al. (2017) and Jacobucci, Brandmaier, & Kievit (2018). In summary, regularization methods do perform conditional estimation, but they select paths rather than mediators.
In this paper, we propose a hybrid approach to EMA which we call the "Coordinate-wise Mediation Filter" (CMF). This method combines advantages from both the filter and regularization methods: (a) it converges in case of high-dimensional data, (b) it takes into account mediator correlations, leading to conditional selection of mediators, and (c) it selects based on mediation, not paths. CMF performs univariate filtering conditional on the other selected mediators by using an algorithm from regularized regression: cyclical coordinate descent on residuals (Breheny & Huang, 2011;Friedman, Hastie, & Tibshirani, 2009).
The remainder of the article is structured as follows: first, we provide relevant background on exploratory mediation analysis. Then, we outline the Coordinate-wise Mediation Filter as a hybrid method for mediator subset selection. Following this, we show through simulation where each of the discussed methods performs as well as SEM. In addition, we assess the performance of CMF relative to the other available methods in a highdimensional simulation. Lastly, the CMF procedure is illustrated by applying it to the epigenetic process of trauma and stress reactivity.

Exploratory mediation analysis
The fundamental goal of mediation analysis is to determine the process by which a variable X influences another variable Y (MacKinnon, Lockwood, & Williams, 2004). Exploratory mediation analysis (EMA) in particular is used to explore a dataset for potential mediating variables (MacKinnon, 2008). In other words, EMA pertains to determining among multiple potential mediators which subset is most relevant. Through EMA, researchers can build theory and select variables of interest for further research into the process under investigation.
An example application of EMA is the research by Ammerman et al. (2018), who investigated how childhood maltreatment leads to suicidal behaviour. They defined 46 potential mediators, including psychological counseling, closeness to parents, and selfesteem. The authors did not test a fully specified mediation model about the precise relations of each of these variables to childhood maltreatment and suicidal behaviour. Instead, this study was exploratory, identifying which variables were the most relevant targets for future research. Indeed, the authors conclude that the study "highlights factors that may be potential targets for risk assessment and for treatment among adolescents with a history of childhood maltreatment".

Univariate mediation analysis and the filter method
A common framework for univariate mediation analysis is a system of regression equations (Equation (1); MacKinnon et al., 2004). The system is displayed graphically in Figure 2.
In the present paper, we consider only the case where the data from X, M , and Y are continuous and their relations are linear. For nonlinear discrete extensions to mediation analysis, see Hayes & Preacher (2010) and Hayes & Preacher (2014), respectively. For further details, refer to the reviews by MacKinnon, Fairchild, & Fritz (2007) and Preacher (2015). Under the standard assumptions of linear SEM, the parameter estimates of this system may be used to determine whether M is a mediator -a dichotomous decision. There are several ways to make this decision, usually based on a quantity of interest q and a measure of uncertainty (MacKinnon, Lockwood, Hoffman, West, & Sheets, 2002). For example, q may represent the size of the indirect effect through the product of its coefficients q prod = αβ, and uncertainty measures for q prod can be obtained using asymptotic standard error methods (e.g., Olkin & Finn, 1995;Sobel, 1986) or bootstrapping (Preacher & Hayes, 2008).
Combining the quantity of interest q with an uncertainty estimate and a specified alpha level yields a dichotomous decision criterion based on a p-value. We call this a univariate decision function D: a function that maps the data of X, M , and Y to a binary decision of whether M should be considered a mediator (1) or not (0).
Note that any function that follows this specification can be considered a decision function, regardless of complexity. An example of higher complexity decision functions is given by VanderWeele (2015, p. 46), who states exposure-outcome confounding should by default be controlled for when testing for mediation. The decision function encodes the researcher's definition of mediation: a product of coefficients decision function with a p-value cutoff of 0.1 will lead to different results than an exposure-outcome controlled decision function with a stricter cutoff.
This decision function framework thus provides a convenient abstraction, highlighting a key advantage for mediation analysis methods: if the choice of decision functions is flexible, a method is adaptable to the specific needs of a researcher. If researchers want to follow the recommendation of VanderWeele (2015), they can do so by adding an XM interaction term into the decision function.
While these decision functions are univariate, EMA is an inherently multivariate procedure, requiring analysis of multiple indirect effects. To perform EMA, a researcher can apply their chosen decision function to each mediator separately, through P different mediation models as in Figure 2. This "filter method" will result in a subset of relevant mediators. However, the implicit assumption is that the M variables are independent of one another. In other words, the selected subset will not include mediators that are relevant only conditionally on another mediator.

Multivariate mediation analysis and XMed
To make mediation decisions multivariately, Preacher & Hayes (2008) recommend the SEM approach. In this approach, the quantities of interest q 1 , ..., q P and their uncertainty are estimated directly from a multiple mediation model as in Figure 1. A decision can then be made for each individual M p based on its multivariately estimated quantity q p . Unlike the filter method, this approach estimates q p conditional on the other P − 1 quantities, so that marginally irrelevant true mediators may still be detected.
However, the SEM approach is unavailable in the case of high-dimensional data because SEM parameters are estimated from observed covariances. High dimensional data (P > N ) leads to a P ×P observed covariance matrix of at most rank N , meaning a linear dependence exists among elements. If dependent elements are mapped to separate parameters in the SEM model, an infinite number of solutions exist for the same log-likelihood, so there is no maximum likelihood solution. This is the case in the full mediation model. As an alternative intuitive explanation, it is possible to view the M → Y part of the mediation model as a high-dimensional multiple regression, where ordinary least squares (OLS) estimates are unavailable because the covariance matrix cannot be inverted (Hastie et al., 2015).
XMed is an adjustment to the SEM method that not only allows for high-dimensional data, but it also automatically selects a subset of mediators without an explicit decision function. The estimation method for XMed is RegSEM (Jacobucci, Grimm, & McArdle, 2016), which applies regularization to a chosen subset of model parameters in a structural equation model. This shrinkage is determined by the hyperparameter λ along with the penalization function P (·) in the objective function of RegSEM: where · is a vector of parameters.
In XMed specifically, shrinkage is applied to the vectors of α (x → M ) and β (M → y) parameters. Subset selection of the mediators occurs through the chosen regularization method; the penalty function P (·) is the LASSO penalty, the 1 norm of the chosen parameter vector: P (·) = · 1 . Depending on the value of λ, The LASSO penalty shrinks the smallest of the chosen parameters to 0 during estimation. This immediately forms the decision rule: for potential mediator M p , if α p or β p equals 0, then the estimated indirect effect αβ p is 0, thus M p is not considered to be a true mediator.
A well-known algorithm for computing the LASSO solution, which can also be applied in SEM, is coordinate-wise conditioning or coordinate descent: the conditional solution is well-known and easy to find, in SEM the maximum likelihood estimates, and the penalized solution is found by cyclically updating and soft-thresholding the conditional solution for each parameter in turn, until convergence (Hastie et al., 2015).
A sequential combination of the ideas of filtering and regularization was proposed by Zhang et al. (2016) in a three-step approach called HIMA. First, in the screening step the authors marginally filter irrelevant potential mediators based on the M → Y relations. Second, the remaining M → Y paths are estimated with regularization. Lastly, the test step performs the joint significance test as introduced by Baron & Kenny (1986) with Bonferroni correction on the remaining mediators.
The main disadvantage of these methods is that there is a pertinent difference between (a) penalized estimation of the paths and (b) finding mediators. For XMed, a relatively small α p path will be shrunk to 0 before stronger α paths, regardless of the strength of its associated β p path. This holds for HIMA too, since in the selection stage it considers only β paths. Thus, these methods do not target mediators with strong indirect effects αβ, but intermediate variables with strong α or β paths. Even though these methods do work conditionally, they make the implicit assumption that the mediators also have the strongest X → M and M → Y paths, which need not be so.
Rephrasing this in terms of decision functions, the regularization methods exclude variables which have a relatively weak covariance with X or Y . However, this decision criterion only partially captures theoretically plausible mediators: true mediators may exist for which the covariance with X or Y is relatively weak, but the indirect effect αβ is relatively strong. The regularization methods will thus underperform in the presence of "noise" variables which are not mediators, but which strongly covary with either X or Y . We illustrate this in the simulation section.
In conclusion, to perform EMA, (a) the SEM method is optimal but unavailable for high-dimensional data, (b) the filter method is simple and flexible but does not select mediators conditionally, and (c) regularization methods do proper conditioning but are estimating paths rather than selecting mediators.

Coordinate-wise Mediation Filter
We propose a hybrid method, the Coordinate-wise Mediation Filter (CMF), which contains both theory-driven decision functions and conditional estimation of the quantity of interest. Like the filter method, CMF applies a decision function to each of the mediators, but it performs this task conditional on the set of currently selected mediators. The procedure is similar to cyclical coordinate descent, the algorithm underlying regularization procedures in various software implementations -but differs in that mediation rather than separate regression paths are explicitly identified as the target. A key component of this algorithm is the use of residuals to remove dependency among the coordinates (Hastie et al., 2015). CMF generalizes this idea to mediator selection with arbitrary objective functions.
The CMF implementation consists of two components: an inner algorithm, which handles feature selection using the decision function D through coordinate descent, and an outer algorithm, which performs random starts, feature subsampling, and subsequent aggregation. The combined procedure can be characterized as a stochastic coordinate descent algorithm. The following two sections give a detailed outline of the inner and outer algorithm.

Inner algorithm
First, we initialize a vector of length P which contains the current mediator selection in the form of 0 and 1 values -the starting values. A step is then as follows: for each potential mediator M p , create a data matrix M * , which contains all the mediators currently selected, excluding the variable M p under consideration. Then, perform the decision function D on the parts of x and y orthogonal to (conditional on) this matrix. This conditioning is performed through calculating the residuals of x and y with respect to M * : The decision function is thus performed as D(r x , M p , r y ), leading to a binary decision whether mediator p selected, conditional on M * .
The inner algorithm is run continuously, randomly ordering the choice of p in each iteration. It stops either when the mediator selection does not change from one step to the next, or when the prespecified maximum number of iterations is reached. The resulting program, shown in Algorithm 1, is a binary, randomized form of cyclical coordinate descent similar to those in Hastie et al. (2015). The randomization improves stability for very highdimensional data (Nesterov, 2012). Richtárik & Takáč (2014) show that this method attains relatively fast convergence even with a billion variables in a sparse regression situation.

Outer algorithm
The value of the decision vector resulting from the inner algorithm depends to some extent on the starting values, due to the discrete nature of its coordinates. Therefore, the algorithm is embedded in an outer loop that performs multiple random starts. After aggregating the results from the different starts, the decision vector of length P is continuous: each element p in this vector signifies the proportion of times the potential mediator M p was selected by the inner algorithm. These proportions, or empirical selection probabilities, naturally lead to a mediator ranking. This ranking can then again be dichotomized using a cutoff score.
The second essential part in the outer algorithm is feature sampling. With feature sampling, the inner algorithm will loop over only √ P potential mediators at each iteration. This procedure is similar to how random forest decorrelates its trees (Breiman, 2001). Zhang, Zhao, Zhang, & Wei (2017) show in a sparse regression setting that feature sampling improves and stabilizes the performance of feature selection. Furthermore, there are links between feature sampling and shrinkage: for linear regression, considering only √ P variables during training is equivalent to ridge regression on the standardized predictors. This generalizes to more complex methods such as GLM (Wager, Wang, & Liang, 2013). Feature sampling in the CMF algorithm thus takes on the crucial role of regularization.
The entire CMF procedure is implemented in the R package cmfilter, available from https://github.com/vankesteren/cmfilter. An example analysis with specific hyperparameters and cutoff score determination is described in the application section to this paper, with accompanying R code in the supplementary material.
The CMF method addresses the most important issues associated with both filter and regularization methods: it conditions on the other mediators while simultaneously being flexible to the choice of theoretically relevant decision functions. In the next section, we investigate the performance of CMF through simulation.

Simulations
This section is subdivided into two parts. The first part aims to show empirically the theoretical advantages and disadvantages of SEM, filter, XMed, HIMA, and CMF. We simulate specific conditions which are theoretically challenging for some but not all methods. The results from the first section are aimed at generating an understanding of the theoretical background in the present paper.
The second part is aimed at simulating real-world performance in a controlled highdimensional situation. The results from this section indicate to what extent the CMF method outperforms its rival methods in practice, in addition to providing an anchor for the expected absolute level of performance in terms of false positives and true positives in such a situation.
All the simulations were run on R version 3.5.0 (R Core Team, 2018). The full environment used for the simulations is shown in Appendix A.

Theoretical conditions
The goal of this section is to illustrate when each method performs adequately and when it does not. Two situations are of particular interest: (a) suppression through correlation among mediators, and (b) noise in the α and β paths, overshadowing a potential mediator. Filter methods are likely to underperform in terms of power in the first case, as the effect of a mediator is dependent on another and marginally invisible. In the second case, the regularization methods are theorized to under-perform because the α and β paths are regularized independently whereas it is their combination that indicates mediation.
The data was controlled to behave according to the population, i.e., the data was transformed to exhibit the exact correlation matrix implied by the data-generating model. In each simulation, we show the power and false discovery rates of the three methods in 100 simulated datasets of 400-600 observations. The decision function under consideration for the filter, SEM, and CMF methods was the Sobel test (Sobel, 1986), one of the most common tests in the product of coefficients category (MacKinnon et al., 2002). For these tests, any variable with a p-value below .1 was considered to be a mediator. The SEM and filter methods were implemented using the lavaan package (Rosseeel, 2012), and CMF was implemented using the accompanying cmfilter package. For XMed, the regsem package (Jacobucci et al., 2016) was used with cross-validation was to find the optimal penalty parameter, and any variables with nonzero α and β paths were considered mediators. HIMA was run according to its implementation in the R package HIMA (Zhang et al., 2016), again with a p-value of .1. Further details on the data generation and precise simulation conditions can be found in the the R code in the supplementary material.

Suppression
In the first illustration, the effect of the second β path is 0, but conditional on the first mediator this effect is nonzero. Its data-generating model is shown in Figure 3. The power to detect the second mediator thus indicates the robustness of each selection method to a full suppression effect. The results are shown in Table 1, in the form of power to detect each mediator. As expected, the filter method fails to detect M 2 under the marginal suppression in this data, whereas the other methods do detect the suppressed mediator.

Noise in the α paths
The second illustration considers noise in the form of variables related to X. In addition to the single mediator, 15 noise variables were generated; the α path was set to 0. The results are displayed in Table 2 in the form of rates of detection for each potential mediator. The SEM method performs optimally, as do the HIMA and CMF methods. The filter and XMed methods do not perform as well as these, having relatively strong false positive rates and lower power, respectively.

Noise in the β paths
Like the second illustration, the third adds 15 noise variables alongside the true mediator. This time, the noise variables are related to the outcome variable Y . The data-generating mechanism is shown in Figure 5. The results can be found in Table 3. The HIMA method, which in the previous simulations performed as well as the benchmark SEM method, fails to detect the mediator in any of the 100 iterations. The other methods attain a perfect score.

Suppression and noise
The last illustration combines the above simulations into a single data-generating mechanism, where both suppression and noise are present, as shown in Figure 6. The results of this combined simulation, displayed in Table 4 show again that CMF performs at benchmark level. An interesting quantity for the imperfect methods is the positive predictive value (PPV): the probability that a mediator selected by a method truly mediates the effect of X on Y . For filter and XMed methods, the PPV is lowered through either a relatively low true positive rate (power) or a high false positive rate (type-I error).

Interim conclusion
While the considered data-generating mechanisms are very specific, the differences in performance between the methods can be exacerbated and diminished by altering the parameter values while preserving the structure. Overall, CMF is the only method that performs as well as the baseline in all of these data-generating mechanisms. Together, they show that this method is robust to boundary cases where other methods may fail. This is a valuable property of a mediator selection method, because these situations may occur simultaneously, with no way to test them in real-world datasets. In the next part, we explore how well the CMF method performs in high-dimensional circumstances, where the baseline optimal SEM method cannot work.

High-dimensional mediation simulation
In this section, we compare the performance of the available EMA methods in a simplified high-dimensional situation. Due to the wide nature of the dataset (p = 1000), the benchmark default SEM method is unavailable.

Simulation setup
Following one of the high-dimensional simulation conditions of Zhang et al. (2016), the dataset consists of 100 samples and 1000 potential mediators. These mediators are generated in four uncorrelated blocks: one block with true mediators (M ), one with noise variables related to X (A), one noise block covarying with Y (B), and one large "white noise" block without any covariance (I). The general structure can be found in Figure  7. For each of the simulations, this structure was created as a sparse block matrix using the Matrix package (Bates & Maechler, 2017), after which multivariate normal data was generated using the sparseMVN package (Braun, 2018). Specific data generation and simulation R code can be found in the supplementary materials. Figure 7: General covariance structure for the high-dimensional performance simulation. In the white sections of the matrix, there is no covariance. The true mediator block M is related to both X and Y , whereas the correlating noise blocks are related to either X (block A) or Y (block B). The largest block is the identity matrix block I, which generates only unrelated noise variables.
Note that unlike the illustrative simulations, these data favor the filter method: there is no suppression or excessive interdependence of potential mediators. Therefore, the filter method is the benchmark in this simulation. The XMed method was omitted from this simulation because it requires estimation of the full SEM model before regularizing: it would need to be adjusted to work with high-dimensional data.

Results
The results are displayed in Table 5. The CMF method has the highest true positive rate, and a medium false positive rate, leading to a similar positive predictive value (PPV) to the filter method. In other words, the mediators selected by CMF are as likely to be true mediators as those selected by the benchmark filter method. As true positive rates and false positive rates can be adjusted by the choice of alpha level, we conclude that the CMF method also performs at benchmark level in this high-dimensional situation.

Application to epigenetic data
In this section, we show how the CMF method can be used for exploratory mediation analysis in a real-world setting. Aside from the results shown here, the full R syntax is available in the supplementary material. Houtepen et al. (2016) researched which locations in the genome are likely to mediate the relation between childhood trauma and stress reactivity later in life. In order to identify the genomic locations, they measured methylation at CpG sites using array based technology. In a discovery sample, they found a location of interest which they subsequently researched further and related to functional changes in the human prefrontal cortex.
Here, we re-analyze the original discovery sample dataset to investigate whether CMF yields different potentially relevant locations compared to the correlational filter analysis of the original authors.

Dataset and preprocessing
The dataset of the discovery sample was obtained from ArrayExpress, the data repository of the European Bioinformatics Institute: https://www.ebi.ac.uk/arrayexpress/experiments/ E-GEOD-77445. The sample consists of 85 healthy individuals. The X variable is score on a childhood trauma questionnaire and the Y variable is the increase in cortisol after a stress test defined as increase in the area under the curve (iAUC). The 385 884 potential mediators M were taken from the analysis of DNA methylation in the blood, with default preprocessing. From the available respondent characteristics, age and sex were considered to be confounders. For full details of the dataset, see Houtepen et al. (2016).
Before analysis, X, Y , and M were residualized with respect to their intercept, age, and sex. Since the number of M variables was so large, the last preprocessing step was a straightforward univariate filter. For this, the top 1000 potential mediators in terms of their absolute product of correlations with X and Y were retained. For more details, see the preprocessing R code in the supplementary materials.

Analysis and Results
The CMF algorithm was performed using the centered X and Y and the 1000 potential mediators M . The Sobel test with a p-value of 0.1 was used as the decision function D and 10 000 iterations with random starts were run to ensure stability of the results. After inspecting the scree plot of the selection rates, the cutoff for selection was set to 0.075. The resulting selection rates and selected cg locations in the genome are shown in Figure  8. These locations were annotated using the BioConductor package FDb.InfiniumMethylation.hg18 (Triche, 2014) to find the nearest protein-coding gene. The shortened descriptions were summarized from the GeneCards database (Safran et al., 2002). The result is shown in Table 6. Inspecting and comparing these results more closely, two of the locations identified by CMF have been previously associated with development throughout the lifespan: PRRC2A and ARGLU1. These two locations are also in the top 10 of the lists generated by HIMA and filter. In addition, the RPTOR gene has been associated with cell growth and survival -development on a cellular level. Relative to other sites, this last location does not have a strong correlation with either childhood trauma (r = 0.186) or stress reactivity (r = 0.233), but due to its conditional indirect effect it is deemed relevant by both CMF and HIMA. The ZSCAN30 gene has a small marginal correlation with stress reactivity (r = 0.096) which lowers its rank for both the filter and HIMA methods. However, due to its strong correlation with childhood trauma (r = 0.347) and its conditional relevance this site is still high on the list for CMA.
In conclusion, CMA has overlap with other methods but can identify relevant locations that other methods may miss. Further research using replication samples could focus on exploring whether and how methylation at these locations may alter stress reactivity after childhood trauma.

Discussion
Structural equation modeling, the benchmark method for exploratory mediation analysis, is unavailable in the case of high-dimensional data. Several alternative methods exist, but in the current paper we have shown through simulations that these underperform in situations with specific dependence among mediators, noise variables related to either X or Y , or a combination thereof. Taking these situations into account, we have introduced CMF, a hybrid algorithmic method to identify from a set of potential mediators the most likely true mediators.
CMF improves upon the existing methods by combining the estimation method from regularized regression with the theory-based decision functions from classic mediation analysis. It extends EMA with theoretically relevant decision functions to the highdimensional case. As a full package including a software implementation, it is flexible to the choice of decision function, robust in the tested situations, and it scales to multiple processor cores.
Besides its role as a novel method for EMA, CMF contributes several ideas to the statistical literature. It shows that the use of cyclically calculated residuals is applicable beyond regression into the territory of structural equation modeling. In addition, its performance is greatly improved by feature subsampling, which has regularizing effects on the estimated parameters and thus on the mediator selections. CMF is an example of how combining a deterministic algorithm with a stochastic outer component can lead to adequate performance.
One result of the approach taken in this paper is that there is no formal proof of convergence, and the algorithm may take a long time to stabilize. In addition, the complications introduced in the outer loop make determination of the cutoff for selection nontrivial. In general, the algorithm will output a top-N vector of most selected mediators, and potential options for deciding which cutoff to take are visual inspection of the scree plot or a form of parallel analysis (Horn, 1965). In addition, the error rates (type I and type II errors) are not analytically defined and have a complex relation with the alpha level of the base decision function. This could be investigated empirically in the future.
For this work, we only considered direct feature selection on the set of M variables. Another solution is projecting the available features onto a low-dimensional space before or during estimation. Feature selection can then be performed in this space, leading to variable importance upon reprojection to the original space. Examples are PCA, PLS, or the directions of mediation method by (Chén, Crainiceanu, Ogburn, Caffo, & Wager, 2017). However, we chose to exclude these methods because they do not select mediators, but rather linear combinations of all mediators.
Our coordinate-wise mediation filter bears resemblance to a class of metaheuristic algorithms in the SEM literature for specification search (Marcoulides & Falk, 2018). These algorithms perform an exploratory search for the optimal model based on overall model fit, e.g., the BIC objective. CMF could be considered specification search where the objective is not overall model fit but mediation analysis: it is targeted towards determining whether a specific variable is relevant to a process rather than searching for the optimal model. In addition, CMF performs regularization required for high-dimensional data. In the future, other specification search strategies could be implemented for EMA, but they each need to be adjusted to incorporate both a specific mediation objective and regularization.
Future research should focus on embedding mediation analysis theory directly in penalization procedures for these datasets, either in a classical estimation setting (Zhao & Luo, 2016) or using Bayesian estimation with shrinkage priors (Erp, Oberski, & Mulder, 2018). More generally, enriching structural equation models beyond EMA with embedded feature selection mechanisms will enable social and behavioral scientists to develop and test theories on novel, high-dimensional datasets.