Can cognitive inflexibility reduce symptoms of anxiety and depression? Promoting the structural nested mean model in psychotherapy research

Objective To estimate the causal effect of executive functioning on the remission of depression and anxiety symptoms in an observational dataset from a vocational rehabilitation program. It is also an aim to promote a method from the causal inference literature and to illustrate its value in this setting. Method With longitudinal (four-time points over 13 months) data from four independent sites, we compiled a dataset with 390 participants. At each time point, participants were tested on executive function and self-reported symptoms of anxiety and depression. We used g-estimation to evaluate whether objectively tested cognitive flexibility affected depressive/anxious symptoms and tested for moderation. Multiple imputations were used to handle missing data. Results The g-estimation showed a strong causal effect of cognitive inflexibility reducing depression and anxiety and modified by education level. In a counterfactual framework, a hypothetical intervention that could lower cognitive flexibility seemed to cause improvement in mental distress at the subsequent time-point (negative sign) for low education. The less flexibility, the larger improvement. For high education, the same but weaker effect was found, with a change in sign, negative during the intervention and positive during follow-up. Discussion An unexpected and strong effect was found from cognitive inflexibility on symptom improvement. This study demonstrates how to estimate causal psychological effects with standard software in an observational dataset with substantial missing and shows the value of such methods.

. EFs can uniquely predict fundamental aspects of adult life such as complex reasoning, literacy, and education from as early as 2 years of age (Mulder et al., 2017), and are closely linked to general functioning and work life (Huizinga et al., 2018).Recent trials of multidisciplinary rehabilitation indicate an important role of EF capacity in return-to-work, quality of life, and overall functioning (Aasvik et al., 2015;Aasvik et al., 2017;Jacobsen et al., 2016;Johansen et al., 2019), but the "why" and the "how" of this relationship remain unclear.
A viable candidate for illuminating this relationship is understanding the impact from symptoms of anxiety and depression on EFs.A large body of research associates mood disorders with poor EFs (Snyder, 2013;Snyder et al., 2015), and EF capacity in the form of indecisiveness and distractibility has been included in the DSM-5 classification of mood disorders (First et al., 2021).Several influential psychological models echo the DSM by proposing EF deficits as early signs of depression and anxiety (Fresco et al., 2007;Hayes et al., 2006;Nolen-Hoeksema, 2011;Price & Duman, 2020;Teasdale et al., 2002;Wells, 2011;Wells & Matthews, 1996).As such, deficits in EFs are often conceptualized as transdiagnostic risk factors for developing mood disorders (Liu et al., 2021;Snyder et al., 2015).
However, while EFs have been shown to prospectively predict mood disorders (Caspi et al., 2020;Letkiewicz et al., 2014;Mac Giollabhui et al., 2019), and cognitive inflexibility appears particularly salient (Mac-Pherson et al., 2021;Stange et al., 2016;Stange et al., 2017), the relationship is not straightforward (Fu et al., 2021).The nature and role of EF deficits vary in psychopathology, as well as within different subclassifications of mood disorders (Snyder et al., 2015).When comparing patients with depression to healthy controls, systematic reviews and meta-analyses observe deficits across attention, executive and memory measures, with no evidence of specific domains being affected more than others (Rock et al., 2014).To further complicate the picture, EFs can also deteriorate because of mood disorders, and symptom remission has been documented to happen independently of improvements in EFs (Porter et al., 2015;Torrent et al., 2012).
Moreover, there are only a handful of studies looking at the relationship between treatment effects, symptom remission, and EF deficits, and these show conflicting results (Porter et al., 2015).In bipolar patients, poor performance on verbal recall has been associated with symptom severity after cognitive-behavioral group therapy (Sachs et al., 2020).On the other hand, following antidepressant therapy, improvements in depression and anxiety can happen without improvements in EFs (Devanand et al., 2003;Levkovitz et al., 2002).Moreover, such treatment effects are strongly modified by age and years of education (Gudayol-Ferré et al., 2021).In summary, it is not yet established whether EFs deficits drive, predict, or are caused by psychopathology, or if improvements in EFs from therapies depend on age or level of education (Gudayol-Ferré et al., 2021).Although our knowledge of EFs in mood disorders has grown, a major drawback is that the prospective role of EF deficits in mood disorders is poorly understood (Fu et al., 2021).
A formal framework that provides technical notation to conceptualize causation can give insight in this relationship.Such a framework is that of "potential outcomes" or "counterfactuals", employed in a wide variety of disciplines to assess causality.If an outcome would have differed had some exposure or action been other than it was, then the exposure or action can be said to cause the outcome.In statistics, a formal notation for the counterfactual approach was described by Neyman (1923), and later developed and extended to observational studies by Rubin (1974Rubin ( , 1978)).Robins extended it further to include multiple and time-varying exposures (1986), and it was related to a graphical representation of causality by Spirtes et al. (1993) and Pearl (1995Pearl ( , 2009).

An Introduction to Robins' g-methods
Even in the perfectly designed randomized controlled trial (RCT), the treatment comparison can be a biased estimate of the "true" treatment effect due to non-adherence or differential loss to followup.Moreover, conducting an RCT is often not feasible, leaving the causal question unanswered.The overwhelming amount of research questions of a causal nature in the health sciences has inspired the development of methods for causal inference for any study design to come as close as possible to the "true" causal effect, with as few and mild assumptions as possible.In epidemiology, causal questions have been assessed from observational studies as RCTs are hard to come by, and progress in methodology has benefitted other fields.Identification and estimation of causal effects including underlying assumptions, e.g., effect of a time-varying exposure, treatment, or action on an outcome has been studied in detail with a variety of applications (Hernán & Robins, 2020), but might still need justification in psychological research.From here on, the variable of which the causal effect is of interest is called the "action" variable, to represent the specific action itself or a hypothetical intervention that could change the action variable.

Psychotherapy Research 1097
In observational studies of a time-varying action, time-varying confounding is always a potential challenge.This is commonly illustrated in epidemiology with a causal diagram, a so-called causal-directed acyclic graph (causal DAG) (Pearl, 1995).In a causal DAG, with arrows representing direct causal effects, a statistical association between an action variable and an outcome can only be produced by the following three causal structures: (i) a true cause and effect, (ii) common cause (confounding): if the action and the outcome share a common cause that is not adjusted for, and (iii) common effects (selection/collider bias): if the action and the outcome have a common effect, or a descendant of a common effect, that is adjusted for.
If the time-varying confounder is itself affected by earlier actions, conventional methods for confounder control (by stratification/adjustment in a regression) are flawed (Figure 1).With interest in the joint effect of A 0 and A 1 on Y (Daniel et al., 2013), adjusting for L 1 in a standard regression model would block part of the effect of A 0 (through A 1 ) and potentially lead to bias of the effect of A 0 through the unmeasured confounder U for the L 1 − Y relationship (L 1 becomes a collider).
Robins' generalized methods (g-methods) were developed for identification and estimation of such effects.It's a family of methods that include the gformula / g-computation, g-estimation for Structural Nested Models (SNMs) (Robins et al., 1992;Robins, 1994Robins, , 1997;;Vansteelandt & Joffe, 2014;Vansteelandt & Sjolander, 2016), and inverse probability weighting (IPW) for Marginal Structural Models (MSM) (Robins et al., 1999).In the counterfactual framework, a marginal causal effect is defined by the theoretical contrast between the mean if everybody in the population was exposed to the action at both timepoints versus the mean if everybody was not exposed to the action at both timepoints.Assumptions on which the g-methods rely form the link between this hypothetical effect and the observations: 1. "Counterfactual consistency" allows us to equate the observed outcome to the counterfactual outcome that would be observed under the observed action level.This equality should hold, regardless of the nature of the assignment mechanism (e.g.whether it was randomized or observational).If, for example, there are multiple versions of the action (let's say, therapy with different content) which might give rise to different outcomes depending on which version is administered, this renders the counterfactual outcome not well defined, and consistency is violated (VanderWeele & Hernán, 2013).2. "Sequential conditional exchangeability" is the same as "no unmeasured confounding", in the time-varying case.It implies that the counterfactual outcomes are independent of the observed action levels, conditional on past covariates and action levels (in Figure 1 it corresponds to no unmeasured variables that affect A 0 and Y , or A 1 and Y ).From here on, the term "no unmeasured confounding" is used and is meant for every time point.3. "Positivity" is met when there are individuals in all levels of the action variable, within all confounder and prior action levels (not needed for g-estimation).This implies welldefined and positive probabilities for each action level.
Under these assumptions, the hypothetical observational study in Figure 1 equals a sequentially randomized trial in which the action level was randomized at baseline and randomized again at time 1 with probability depending on L 1 (Naimi et al., 2017).Of the three g-methods, IPW estimation for MSMs is by far the most popular method (and newest).It appears much simpler than the two others, and is easy to implement in standard software.However, the less known method of gestimation outperforms IPW in several ways, it's almost always more efficient, more robust for unmeasured confounding, particularly well-suited for a continuous action variable, can accommodate moderation by time-varying covariates and does not require the positivity-assumption (Vansteelandt & Sjolander, 2016).G-computation is even more efficient, but highly computer-intensive with many parametric assumptions (distributional assumptions for confounders etcetera) and can also not handle moderation by time-varying covariates.
Structural nested mean models (SNMMs) solve the problems in the conventional confoundercontrol (stratification/adjustment in regression), by avoiding the adjustment on post-action variables (adjustment for L 1 in Figure 1).This is achieved by modeling the outcome at each time conditional on the action variable and covariate history up to that time; after having removed the effects of subsequent action variables to disentangle the unique contributions of each action at each time (Vansteelandt & Joffe, 2014).The parameters in the SNMMs are estimated with g-estimation and "nested" refers to a conceptualization of longitudinal data as resulting from a nested series of trials (Picciotto & Neophytou, 2016).Starting with the last time-point (with no post-action variables), the action-outcome relation is estimated and adjusted for past actions and covariates.Iteration for each prior time-point after having "removed" the effect of the later action variable on the outcome ends with a counterfactual outcome for everyone being predicted under an "action-free" regime.Combined with the no unmeasured confounding assumption (for every time-point), the gestimation finds the counterfactual outcome (with the action removed) that satisfies the independence assumption between this counterfactual and the observed action variable.This counterfactual is a function of the causal action effect through parameters in the SNMM, and inversion of this function yields the g-estimated parameters.
The limited popularity of g-estimation has largely been related to a lack of implementation in standard software.In 2016, Vansteelandt and Sjolander described how to obtain the g-estimator for a linear SNMM by combining separate regression models for the outcome and the action variable (Vansteelandt & Sjolander, 2016).If either of these models (but not both) are mis-specified, the g-estimator for the causal parameters in the SNMM is still consistent (provided the SNMM is also correct), a property which is called "double-robust."This implies indirectly some protection against unmeasured confounding in the outcome model (one form of misspecification), also shared by the instrumental variable approach (Ertefaie et al., 2017;Joffe & Brensinger, 2003).It can be shown that the g-estimator is equivalent to an instrumental variable (two-stage least squares) estimator (Joffe & Brensinger, 2003;Naimi et al., 2017).Double-robustness means that consistent causal parameters in the SNMM are obtained even if the model for the action variable, or for the outcome (but not both) has some form of misspecification.This property can be exploited in the model fitting, with one model held constant while varying the other.Large variation in the causal parameter estimates indicates that the model which was held constant is wrong.In this way, double-robustness can guide model selection (Wallace et al., 2016).
More applications of g-estimation have been called for (Vansteelandt & Joffe, 2014;Vansteelandt & Sjolander, 2016).In two subfields of epidemiology there has been an increase in applications in recent years, (Picciotto & Neophytou, 2016) which should easily translate to psychotherapy research and other mental health therapies.In pharmacoepidemiology, g-estimation has been applied to adjust for nonadherence and to determine optimal personalized strategies, which has increased interest in the area of dynamic treatment regimens (Tsiatis et al., 2020).A dynamic treatment regime is a sequence of interventions and of interest in all personalized treatment of chronic diseases, including many mental disorders.In occupational epidemiology, gestimation has been used to adjust for the so-called "healthy worker survivor bias," which creates the illusion that an unhealthy exposure is protective, because the workers of poorer health are more likely to reduce their exposure.This is also seen in psychotherapy research and the mental health field, where patients with more severe symptoms tend to illicit more frequent and comprehensive treatment, thereby generating an inverse association between treatment and outcome.G-estimation is appropriate to adjust for such feedback.
Even though the g-estimation can be implemented in standard software, and the regression models for the action variable and for the outcome are flexible (within standard limitations for regression models), there is a lack of flexibility for link functions other than identity or log link (Vansteelandt & Sjolander, 2016).The implementation of log-linear SNMMs for counts and similar models for survival endpoints can be informed by the development of the linear SNMM (Vansteelandt & Sjolander, 2016).Based on semi-parametric theory (Tsiatis, 2020), Robins formulated the g-estimator as the solution of an unbiased estimating equation, without distributional assumptions (consistency under mild regulatory conditions) (Robins, 1994).
Traditionally, causality in epidemiology and psychology has been assessed in structural equation models (SEMs).Even though some of the more recent causal inference methods have developed out of the SEM literature, e.g.causal graphs (DAGs), traditional SEMs have many limitations with respect to causal inference when, e.g. the effect of an action on an outcome, or mediation is of interest (Rijnhart et al., 2021VanderWeele, 2012).SEMs tend to make more assumptions, like linearity in functional form among covariates and multivariate normal distribution.In fact, in a regression model, internal relations among covariates are not modeled, only the relation between the action variable and outcome (covariates play the role of confounders or moderators).This means that bias from unmeasured confounding is more likely in a multivariate SEM, since any common cause of two variables in a causal graph (representing the SEM) also Psychotherapy Research 1099 must be included (VanderWeele, 2012).The usual assumption of uncorrelated errors in an SEM would be violated in Figure 1, and therefore indirectly implies an assumption of no unmeasured confounding for both the relationship between A0 -Y, A 1 -Y and L 1 -Y (De Stavola et al., 2015).
One reason for the popularity of the SEM model in psychology is that it can be extended to account for measurement error, within restrictions of normality and uncorrelated errors.A realistic causal DAG is a general and useful way of depicting different types of measurement errors.It should simultaneously represent biases arising from confounding, selection (collider-bias) and measurement error, and thereby reveal how (if possible) to correct for these (Hernán & Robins, 2020).Sensitivity analysis techniques for direct and indirect effects in causal mediation have been developed, for unmeasured confounding as well as for the presence of measurement error (VanderWeele, 2015).
The main aim of this study is to demonstrate how the structural nested mean model with g-estimation can be used to disentangle causal pathways in an observational longitudinal dataset of mental health variables with some design limitations.The causal effect of a hypothetical intervention that could change cognitive flexibility on remission of depression and anxiety symptoms over a 12-month period is assessed.As EF capacity is strongly related to socio-demographics such as age and education, these were, in addition to coping style, considered as candidate moderators (VanderWeele, 2015).It was also an aim to see whether symptoms of depression and anxiety was associated with differences in baseline EFs or long-term return-to-work for our participants.

Setting
Individuals completing either inpatient or outpatient multidisciplinary occupational rehabilitation programs based on multimodal cognitive behavior therapy were recruited from four clinics, alongside healthy controls.The duration of the rehabilitation programs varied between the clinics from 3 to 12 weeks.

Participants
A total of 390 participants were recruited for this study.Of these 317 were patients who were either on partial or full sick leave, volunteered to take part in the study as well as 73 healthy volunteers who were currently working full time.A total of 187 participated in an inpatient occupational rehabilitation program and 93 in an outpatient program.Seventythree workers in the control group who volunteered to take part were all working full time and had no sick leave during the testing period.They were recruited from the wider community and employees from three rehabilitation clinics and included a wide selection of different blue-and white-collar workers.The two groups were matched for age, sex, and number of days between pre-and posttests.Of participants in rehabilitation, 80% had an ICD-10 diagnosis either in the categories F, mental and behavioral disorders or M, diseases of the musculoskeletal system and connective tissue.Exclusion criteria for the rehabilitation and control group were a history of head injury or having applied for disability pension.The flow of participants is illustrated in Figure 2.

Design
This study had an explorative, non-randomized prepost design.All participants in the study were assessed with cognitive and emotional tests and work and health questionnaires on four time points spanning a 13-month period and are included in the estimation of effects irrespective treatment groups.

Multidisciplinary Return-to-Work (RTW) Rehabilitation Program
The patients were referred to occupational rehabilitation by general practitioners or social security offices.The main aim of rehabilitation was RTW, and the programs lasted between 3 and 12 weeks.The patients were followed up by an interdisciplinary team including at least four of the following professionals: physician, physiotherapist, psychologist, work consultant, coach, nurse/psychiatric nurse, and sports pedagogue.The assessment of work ability, physical fitness, and current work and health situation was carried out to tailor rehabilitation efforts.

Materials Computerized Cognitive and Emotional Tests (Exposure Variables)
Eight validated tests from the Cambridge Neuropsychological Test Automated Battery (CANTAB) were used to assess cognitive and emotional functioning on a touch screen.The order of the tests was fully counterbalanced across participants at each testing session and within each group.All participants were introduced to the touch screen by way of a motor screening task performed before testing both at pre-and post-tests.This screening was performed to familiarize the participants with the touch screen and reduce as much as possible any initial apprehension before testing.A description of computerized cognitive and emotional tests is given in brief below.
Rapid Visual Information Processing, Spatial Working Memory, Spatial Recognition Memory, Stockings of Cambridge (a version of the Tower of London task measuring executive planning), Emotion Recognition Task.All tests were administered on a touch-sensitive computer screen.

Cognitive Flexibility (Primary Action Variable)
Intra-Extra Dimensional Shift (IED) tests the participants' cognitive flexibility and is a computerized version of the Wisconsin Card Sorting Test (Grant & Berg, 1993).It measures a person's ability to acquire and reverse rules and taps into lowerorder cognitive functions such as discriminating between visual stimuli and flexibly maintaining and changing an attentional set (Cambridge Cognition, n.d.).On the screen, four white boxes appear, and a stimulus is shown inside two of them.The stimuli are composed of color-filled shapes and white lines.The participant is asked to select a stimulus at random.Following a selection, the participants get feedback on their answer-correct or incorrect.The task is to discover the rule determining which stimulus is correct.We chose IED total errors adjusted as our action variable.IED total error is a summation of all errors made at different stages.For those that fail a stage and therefore are not given a chance to attempt further stages, an additional sum of 25 is added for each stage not attempted.A lower score represents better performance (higher flexibility).

Depressive and Anxious Symptoms (Response Variable)
The 14-item Hospital Anxiety and Depression Scale (HADS) measures mental distress and is divided into an anxiety and a depression scale, each with 7 items (Zigmond & Snaith, 1983).Twenty-one is the maximum score on each scale.The cut-off for mental distress is usually set at 7 or above on either the anxiety (HAD-A) and/or the depressive scales (HAD-D) (Wu et al., 2021) and this is validated for a Norwegian population (Bjelland et al., 2002).In the current sample, the HADS showed satisfactory internal consistency (Cronbach's α = 0.89) at baseline.TOMCATS is a measure designed to measure the concept of response outcome expectancies as defined in the Cognitive Activation Theory of Stress (CATS) [2,18].In this context, it was proposed as a moderator between the unmeasured effects of stress on executive functioning.Response outcome expectancies are thought to drive, or limit sustained stress activation, thus potentially moderating the adverse effects of cortisol on brain areas involved in EFs, thus acting on depressive symptoms.
The inventory consists of three factors that represent the three response outcome expectancy dimensions of CATS: positive expectancy (one item), no expectancy (two items) and negative expectancy (three items).All items are rated on a four-point scale from "not true at all" to "completely true."For this study, we only used the negative expectancy item that has been associated with depressive symptoms, namely an expectancy of hopelessness (TChope) and helplessness (TChelp).In the current sample, the TOMCATS showed satisfactory internal consistency (Cronbach's α = 0.81) at baseline.

Non-linear Decreasing Magnitude Over Time
In treatment-effect trials, effects often decrease over time.In the present study, moderation from a nonlinear function of time f (t) with a similar shape as the outcome proved a good fit and is given by (similar to the Box-Cox transformation): where l .0 ( , 0) yields a function that decreases less (more) rapidly than the negative log(t), l = 1 yields a linear function of time (in days).

Moderators, Time Constant Education Levels
For this analysis, education was divided into two levels: low = up to high school (<13 years of education) and high = college and university degree (>13 years of education).

Statistical Analyses
Baseline demographics for selected variables were analyzed using means and standard deviations for continuous variables or percent for categorical.A comparison of average percent sick leave over 12 months and EFs on baseline was tested for education using independent sample t-tests.

Causal Effects
The counterfactual approach is used to assess the causality between cognitive flexibility and subsequent depressive/anxiety symptoms.In terms of the effect of an action on an outcome, the counterfactual outcome Y (a) is the potentially unobserved outcome for the hypothetical action level a.In the following, random variables and their realizations are represented by upper-and lower case letters, respectively.
A conditional average causal effect of a hypothetical intervention setting the action variable to a versus 0 (0 can represent action-free or any reference value) among subjects with covariate value l, can be formulated by the linear structural mean model (SMM) (Robins, 1994;Vansteelandt & Joffe, 2014 , where E(.|.) is the conditional population mean, z is a covariate vector possibly depending on l, and c is the vector of causal parameters of interest.
For example, with l representing gender and z = 1, the additive action effect on an outcome from action level a relative to no action, is equal between genders.On the other hand, z = l would describe different action effects between men and women.
With time-varying action variable A t , vector of covariates L t and outcome Y t , observations are made at time t = 1, 2.. with the history up until (and including) t, denoted by A t and L t , for example A 2 = {A 1 , A 2 }.Y s ( a t , 0) is the counterfactual outcome at time s = 1, 2 . .., with s .t, for an action history equal to a t up until time t and zero thereafter.This construct facilitates assessments of a causal effect of a timevarying action on the following outcome as well as on all subsequent ones, formulated by the structural nested mean model (SNMM) (Robins, 1994): The action effect posited in (2) represents the difference in outcome when the action level at time t is set equal to a t and 0 thereafter, relative to when action level at time t and onward is set equal to 0, conditional on action and covariate history.In other words, the contribution to the action effect from a specific timepoint (so-called "blip") when later contributions are "removed."This effect is identifiable and can be estimated from observed data under the usual causal assumptions of consistency and no unmeasured confounding (see Introduction).
The causal effects to be assessed in the present application are the effects of a hypothetical intervention that could change cognitive flexibility (IED t ) at time t, on subsequent depression and anxiety symptoms, by HADD s , HADA s , and HADT s , s .t, and with potential effect-modifiers TChope t and TChelp t .
Different causal hypotheses can easily be assessed with different choices of z st in (2) (Appendix), although lack of power is a limitation with increased model complexity.A simple model with moderation from a time-fixed covariate and a non-linear function of time proved to have a good fit in the present application (with HADD s as outcome), and is given by (3) for = 2, 3, 4 s .t, and with f (t) from (1).
Following Vansteelandt and Sjolander, fitting the SNMM for the causal effect of the time-varying IED t on HADD s , HADA s , and HADT s , s ., can be done in three steps, by combining ordinary regression models for IED t , one model for each time-point (e.g.linear regression) and a longitudinal model for the outcome, including the SNMMterms, fitted and refitted (Appendix) I. Regressing the action variable IED t on its history, baseline, and preceding timevarying covariates and outcome for each, including non-linearities and interactions.
With data in a "wide" format, these models could for example be fitted with the linear model command in the statistical software R (R Core Team, 2014): "lm(IED t . ..)" (OLS regression) for t = 1, 2, 3, with the fitted values (called "propensity scores"), saved and denoted P 1 , P 2 , and P 3 respectively.II.Regressing the observed outcome, e.g., HADD t , on covariates and moderators as well as the SNMM terms with previous action (1, HighEdu, f (t − 1))ied In the outcome model (steps II and III), f (t) was included as one component in l t with l = 0.2, 0.1, 0.4 giving good fit for HADD t , HADA t , and HADT t respectively as a function of days (Figure 3).
The steps above describe how g-estimation can be achieved by the simple combination of standard regression models.To achieve an automatic model selection and identify the "best" model for each bootstrap sample (and thereby precise bootstrap standard errors), the OLS regression in step I, can be replaced by for example an automatic lasso-regression algorithm (as in the present application where the Rpackage glmnet was used) (Friedman et al., 2010).Automatic model selection was also performed in the calculation of the censoring weights (adjusting for potential selection bias from loss to follow-up, see Appendix).
Even though the procedure is flexible (e.g. to include automatic model selection), the implementation requires some experience in statistical programming (see R-code for the present application in the supplementary file).Recent development in software has made g-estimation based on this method available, in the R-package gestools (De Stavola et al., 2015).This package takes as input the data set in long format and changes it to a convenient form, to ensure there exists a data entry for each person at each time point, generates lagged versions of the timevarying covariates, and covariate-histories (set to zero at the start).It performs g-estimation for different types of action variables and for continuous or dichotomous outcomes.
A causal graph (DAG) of the design is shown in Figure 3.

Psychotherapy Research 1103
Incomplete data were of concern, both for the outcomes (HADD t , HADA t , HADT t ) for the action variable (IED t ), and different covariates.Missing values in an outcome was considered as loss-tofollow-up and the person was not allowed to reenter.Potential selection bias from loss-to-followup was adjusted for by inverse probability of censoring weights (Appendix).
To address the issue of incomplete data in covariates, multiple imputation (MI) (Appendix) was performed under the assumption of missing at random (MAR), with the R-package mice (Buuren & Groothuis-Oudshoorn, 2011).The outcomes served as covariates both in the propensity score and outcome, steps 1, 2, and were imputed as missing covariate, but not as missing outcome.Out of 69 variables in the dataset, the covariates with missing values are shown in supplementary table 1.
Efficiency gain from MI was assessed by the fraction of missing information (FMI), approximated by FMI = r/(1 + r) (for a high number of imputations), where r is the relative increase in variance due to the missingness (Madley-Dowd et al., 2019;Schafer, 1999) (Appendix).The FMI is parameter specific and quantifies the loss of information due to missingness, while accounting for the amount of information retained by other variables (Madley-Dowd et al., 2019).A low number of imputations in MI is usually sufficient, for example with an FMI of 20%, 10 imputations correspond to a relative efficiency above 98% (Schafer, 1999).
With respect to standard errors in the g-estimation algorithm, the bootstrap is recommended (Vansteelandt & Sjolander, 2016).To account for uncertainty from incomplete data and to achieve unbiased standard errors from the g-estimation, the bootstrap was performed for each imputed dataset.Ten complete datasets were generated from the imputation algorithm, and for each complete dataset a bootstrap parameter estimate with the standard error was generated (500 resamples), and finally combined according to Rubin's rules (Rubin, 1987).Model selection was carried out for every bootstrap sample, both in the censoring weights and propensity score models, to identify the best model.A large set of covariates was entered for each bootstrap sample, and the automatic lasso regression algorithm returned the best cross-validated model for that sample, in terms of minimum prediction error, while avoiding overfitting.

Results from g-Estimation
Time-course of the mean level of the action variable (IED t ) and outcomes (HADD t , HADA t , and HADT t ), stratified by low-versus high education (HighEdu t ), is shown in Figure 2. The overall feature in both groups was clear improvement during the intervention period (mean length = 28 days), represented by a decrease in IED t (increased flexibility) and depression/anxiety symptoms (HADD t , HADA t , and HADT t ) followed by gradually flatter mean levels during follow-up.
Also, a constant difference between the groups, in favor of the high-education group, was evident.Included in the plot is the function f (t) from Equation (1), used in the outcome model (steps 2 and 3) as one component of l t , representing a good fit of the observed outcomes.
Different causal models (SNMMs) were fitted to assess long-term effects, group differences in shortterm effects (interactions with the action variable or other covariates) and whether TChope t or TChelp t could play a role as moderators for IED t 's influence on mental distress (Appendix).No such significant effects were found, but few repeated measures with limited observation time, and model complexity resulted in a lack of power for one or more of these tests.
The simpler SNMM from Equation ( 3), modeling a constant and short-term low-versus high education effect as well as a time-varying part, showed significant causal parameters for all outcomes.Final combination estimates from the 10 imputed datasets are shown in Table III.The effect of IED t on HADD t had a non-significant intercept (c 0 ), a significant difference between low-and high education ( ĉ1 = 0.043, p = 0.016) and a significant time- Psychotherapy Research 1105 varying part ( ĉ2 = −0.005,p = 0.016).The effect of IED t on HADA t had similar characteristics with a non-significant intercept (c 0 ), a significant difference between low-and high education ( ĉ1 = 0.048, p = 0.003) and a significant timevarying part ( ĉ2 = −0.009,p = 0.0036).The effect of IED t on the total score, HADT t had a significant intercept ( ĉ0 = 0.0038, p = 0.05), a significant difference between low-and high education ( ĉ1 = 0.12, p = 0.0026) and a significant timevarying part ( ĉ2 = −0.006,p = 0.0022).These estimates are inserted in the SNMM from Equation (3) (c 0 + c 1 HighEdu + c 2 f (t)) to give the coefficient for the short-term effect for each time point (Table IV).The causal effect had the same characteristics for all three outcomes.
For low education, the effect was negative overall, strongest during the intervention period and gradually weaker.In other words, a hypothetical intervention that could increase inflexibility seemed to cause improvement in mental distress at the subsequent time-point, the more inflexibility, the larger improvement.For high education, a change in sign was estimated, the effect was negative during the intervention and positive during follow-up (for HADD t and HADT t immediately following the intervention, and for HADA t the sign-change happened in the last period).In this group, a hypothetical intervention that could increase inflexibility seemed to improve mental distress during the intervention period (although weaker than for the low-education group).After the intervention period, the effect changed sign, a hypothetical intervention that could increase inflexibility had a negative influence on mental distress.
Other dimensions of EF were tested similarly, without any significant effects.Dividing the significance level (0.05) by the number of tests to correct for multiple testing (a conservative Bonferroni correction) would still result in significant effects for HADA t and HADD t .
In the imputation, both prior and subsequent measurements for each missing value were allowed in the model.Stability in the effect estimates across the imputations indicated successful imputation.Estimated FMI ′ s were between 1.6 and 4.3%, considerably smaller than the proportions of missing IED t values during follow-up (except at baseline)-0.8%,10.3% and 30.8%, reflecting that other variables succeeded in retaining information for the missing IED t values.
Selection bias from loss-to-follow-up was assessed by comparison of results with and without censoringweights (Appendix).Estimation without weights (without adjusting for loss-to-follow-up) showed an underestimation of the magnitude (in the direction of zero).Relative bias was largest for HADD t , with 8.5%, and 7.1% for c 1 and c 2 (significant coefficients) and smallest for HADA t with 0.2% and 1.1% for c 1 and c 2 , respectively.The difference in bias was not explained by different samples.

Return-to-Work in Low vs. High Education
Testing whether those with low education differed with regard to work, an independent samples t-test indicated no significant difference in the percentage of sick leave post-treatment to 12 months after completing rehabilitation t(175) = 0.2, p = .88.Additional analyses were performed to check for baseline differences in cognitive flexibility and depressive and anxious symptoms in the RTW versus non-RTW groups.Here, depressive symptoms were associated with RTW (p = 0.04), but no substantial differences were detected for anxiety or cognitive flexibility (p = .13).

Discussion
Causal effects of cognitive inflexibility on improvement in symptoms of depression and anxiety were found in this observational dataset.Despite limitations in the data the model proved sufficient to "tease out" strong causal effects.The inflexibility domain of EF had a much stronger signal/noise ratio than the other domains, consistent across all the centers, and maintained significance when corrected for multiple testing.The effect decreased over time and was moderated by education.
In the low-education group, cognitive inflexibility was higher than in the high-education group over the whole period and seemed to have a strong causal effect on subsequent anxiety and depressive symptoms, which was not the case for the high-education group.The effect weakened after the treatment period.A total change in HADD t (depressive symptoms) over the whole observation period for the low-education group was estimated to be around 2.4 points improvement relative to perfect flexibility, which is clinically relevant.The high education group did have a similar, although weak effect during the treatment period, and with a change of sign, indicating that more cognitive inflexibility led to more depressive/ anxious symptoms after treatment.These results might explain why there is no consistent relationship between EF capacity and mood disorders, and why some benefiting from rehabilitation while others do not (Mikkelsen & Rosholm, 2018).
Education as a moderator was found in a recent systematic review and meta-analysis on the effects of antidepressants on executive functioning (Gudayol-Ferré et al., 2021).Improvement in EFs was overall stronger in the higher educated following treatment (Gudayol-Ferré et al., 2021) indicating that education sometimes can be a moderator in this relationship.
Theoretically, the importance and positive benefits of high cognitive flexibility are recognized in leading models of cognitive behavioral therapy (CBT) (Clark & Beck, 2010;Hayes et al., 2011).The more traditional CBT models state that with repeated activation, negative self-schemas can crystalize into a depressive or anxious mode, which then generalizes to more and milder stressful life events.Newer developments such as metacognitive therapy also state that emotional disorders are maintained by the activation of maladaptive thinking styles.Here, the cognitive attentional syndrome is characterized by extended negative thinking in the form of rumination and worry and is associated with increased self-focused attention and maladaptive coping behaviors (Wells, 2011).In both cognitive models, maladaptive information processing is thought to result in a weakening of cognitive control.This could be operationalized as inflexibility or inability to access more adaptive, alternative modes of thinking.What is surprising with our current findings is that such inflexibility seems to be beneficial.
Our results might therefore indicate an important nuance in our understanding of cognitive flexibility and symptoms of anxiety and depression.While it is fairly established that shifting ability is central for supporting flexible and effective self-regulation (Kashdan & Rottenberg, 2010), the ability to shift attention from a task at hand to new aspects of a Psychotherapy Research 1107 situation more critical is highly adaptive, but it also costly in the form of energy expenditure for the brain.An automized cost-benefit analysis of shifting between sets of stimuli in any demanding novel situation has previously been demonstrated (Szalma & Matthews, 2015).This signifies that a given challenge must be significant or perceived as important to motivate shifts in attention.Less flexibility may thus be associated with good coping in less demanding day-to-day situations, only becoming negatively associated with good coping in crises or extended challenging situations.
Following this reasoning, our results could be explained by our participants' perception of their challenges related to sick leave.Challenges caused by sick leave in the Nordic region are often considered manageable for the individual due to available, generous benefits, support and health care systems (Elstad & Vabø, 2008).It could be that the beneficial role of inflexibility appears in our study due to the context of sick leave being less dramatic than in countries with no or little welfare or sick leave benefits.
Another interpretation also worth considering in lieu of our results is our action variable testing cognitive flexibility, IED set shift.In this test, a rule is established by shaping a given participants' behavior with contingencies related to choosing an abstract pattern (correct/wrong).Once a rule is established, it is then altered without warning, and the speed and accuracy of acquiring a new rule, accumulated over several stages, is the outcome from the test.Its design dictates that those who resist changes to rules and are less sensitive to cues from their context, score poorly on this test.But here it is also these participants who improve in their symptomatology the most.The counter-intuitive nature of our data could also, therefore, be viewed considering the behavioral construct of rule-governed behavior.A conceptualization of EFs as a subset of rule-governed behavior characterized by flexibility was given by Hayes et al in the mid-90s (Hayes et al., 1996).
When one compares learning behavior under the control of instructions (i.e., rule-governed behavior) with behaviors shaped by contingencies such as reinforcement by trial and error, participants who are instructed tend to show more insensitivity to negative rewards than contingency-shaped participants (Vaughan, 1989).This entails that when instructed to expect less symptoms from doing a behavior such as running, a less flexible participant would ignore experience of non-relief and defer to the instruction of "less symptoms."Within rule-governed behavior, cognitive inflexibility, therefore, would refer to a tendency to maintain current behavior disregarding experience to a larger degree, which could be adherence to instructed benefits from following "rules" set in a rehabilitation program, such as expecting less symptoms from certain behaviors.As an example of this, a previous randomized controlled trial indicated support for our current finding in showing that low-educated participants benefited more from an intervention in smoking cessation, due to increased adherence to instructive stories (Strecher et al., 2008).
Indeed, the current rehabilitation programs would involve multiple examples of increased behavioral activation under the control of instructions from therapists.Given these conditions, one could envision generalized tracking developing during the stay from instructed functional relationships among events (e.g., drawing out rules based on observation and instructions from therapists) (Villatte et al., 2015).This generalized tracking of a rule in the abstract sense could then improve anxiety and depression through rigidity over time and perceived symptom relief disregarding evidence of the contrary.This is substantiated by the HADS being a patientreported outcome, meaning that one does not know if others in close personal relationships for example would judge this perceived symptom relief in the same way.Thus, this improvement could be strictly subjective, something that is indicated by the lack of substantive associations with return-to-work.Of interest to the current findings is also a prior publication on this multicenter trial showing significant effects from other cognitive functions on return-towork, but not between cognitive flexibility and return-to-work (Johansen et al., 2019).

Methodological Discussion
The structural nested mean model with g-estimation demonstrated efficient estimation in this application.The study of causal reciprocal effects between two variables (so-called cross-lagged effects), dates to early time-series analysis, and was incorporated in the SEM framework (Jöreskog, 1970;Jöreskog & Sörbom, 1979), in the form of the cross-lagged panel model (CLPM), which became especially popular in the behavioral and psychological science research.A recent review of medical journals found 270 papers published between 2009 and 2019 that used CLPM (Usami et al., 2019b).The CLPM represents a different tradition for modeling causal effects than the counterfactual framework in epidemiology but shares the same fundamental "no unmeasured confounding" assumption.The CLPM's sensitivity for this assumption has been criticized, e.g., yielding non-satisfactory adjustment for stable trait factors (Lucas, 2022).Extensions of CLPM and several alternatives with latent variables have been developed (Usami et al., 2019a), but they all have in common that they are severely biased in their causal cross-lagged effect-estimate, when the data-generating process deviates from their strict parametric assumptions (Lüdtke & Robitzsch, 2022).To date, it seems to be a lack of consensus in psychological and behavioral science, of whether to keep (Orth et al., 2021) or to abandon (Lucas, 2022) the CLPM model.
Consequently, the counterfactual framework has gained acceptance in defining the causal effects (bias) in general (Lüdtke & Robitzsch, 2022;Rijnhart et al., 2021;Usami, 2022) and for comparison of methods (Lüdtke & Robitzsch, 2022).In some cases, authors from the SEM literature even recommend causal mediation analysis (counterfactual framework) rather than the traditional mediation models (SEM) (Rijnhart et al., 2021).The strength of the SNMM with g-estimation has been recognized, but unwillingness to depart from the latent variable models makes a comparison between the two traditions difficult.Usami used the SNMM as part of fitting the SEM, to estimate joint effects of time-varying treatments with improved control for stable traits, due to the robustness of functional form and misspecification in the SNMM (Usami, 2022).Despite of these advances in robustness, the SEM component still relies on several parametric (and distributional) assumptions, some of which are untestable.
The assumption of "no unmeasured confounding" is not testable from data, and the reason that causal inference from observational data is hazardous (Hernán & Robins, 2020).In the counterfactual framework, rather than introducing further assumptions on unmeasured factors that each would need separate assessment (and not testable), it seems more appealing to apply the already existing framework for sensitivity analysis (Robins et al., 1999;VanderWeele & Arah, 2011;Klungsøyr et al., 2009).In this framework, one needs a model (a parsimonious function) for the association between the mean of the counterfactual outcome variable and the action, treatment or exposure, within levels of measured confounders.This biasfunction which represents the magnitude of confounding due to unmeasured factors, is not identified from the data, and needs to be considered plausible by subject-matter experts.The sensitivity of the causal effect is described by varying this function (and its parameters).In this way, many fewer sensitivity parameters are needed to be varied, and the "impossible" decision as to whether to view U as univariate or multivariate, continuous or discrete, and its relation to the action variable, treatment or exposure and outcome is done away with (Robins et al., 1999).The SNMM with g-estimation is well suited for sensitivity analysis, because g-estimation does not require no unmeasured confounding for the identification of parameters, it merely requires that the magnitude of unmeasured confounding is known (Robins et al., 1999;Vansteelandt & Joffe, 2014;Yang & Lok, 2018).By letting the bias-function be zero for no unmeasured confounding and varying both parameters in this function and the functional form itself, sensitivity for unmeasured confounding can be assessed (Yang & Lok, 2018).However, Yang and Lok described this in the form of complicated estimating equations (Yang & Lok, 2018), which have not been translated to g-estimation by nested regression models from Vansteelandt & Sjolander, used in this application.
A practical problem is to determine if a certain magnitude of unmeasured confounding (on the scale of the parameters in the bias-function) represents sensitivity or not.Therefore, when the association of interest is not well described, as in the present case, such an undertaking might not be worth the effort.Further, the degree of missing data will necessarily affect the interpretation of the results in all cases.Plausible significant causal effects rely on "no misspecification" in the imputation models (as well as in other model-components), which are strong assumptions.Adding further assumptions, in the form of bias-functions for unmeasured confounding might seem questionable.Nevertheless, the g-estimator is itself robust for unmeasured confounding by allowing for miss specification in one of its components (e.g.unmeasured confounding in the outcome model) (doubly robust).One gets two attempts of success, consistent estimates are achieved if either the propensity score model, or the outcome model, in addition to the SNMM are correctly specified.Even efficiency gain can be achieved by including a "wrong" model.In an RCT with a dichotomous action (or treatment or exposure), where the propensity score model is known (probability for treatment), and consistent effect-estimate would be expected, adding a mis specified outcome model in a doubly-robust estimator usually leads to lower standard error in the effect-estimate (Tsiatis et al., 2020).Insight in one of the two models will, in a doubly robust method, be of extra value.In the present application, confidence in the comprehensive propensity score model (a series of lasso-regressions with a rich set of covariates, both time-constant and time-varying) indicates that the estimated causal effect is not completely a result of unmeasured confounding.

Strengths and Limitations
Some limitations concerning a causal interpretation of the results are: the observational nature of the data, a limited observation period and number of repeated measures, time-varying confounding, and substantial missingness.On the other hand, the multicenter design with four independent sites provided data from many participants with a good geographic spread in the most populous region of Norway.The tests of executive functioning were selected on relevant and important aspects of executive functioning and the HADS is a widely used and suitable response variable.The structural nested mean model with g-estimation is appropriate causal inference in these types of data outperforming several alternative methods.

Conclusion
The results from this study identify an unexpected causal relationship between cognitive inflexibility and improvement in symptoms of anxiety and depression.It challenges the scientific consensus, even though the causal interpretation must be viewed as exploratory, due to limitations in the design.Confirmation of these results from other studies will contribute a nuance in the understanding of which variables to target in therapy and why.With higher acceptance in defining causal effects with counterfactuals within the psychological and behavioral science, and the continuing lack of consensus for the most appropriate model for causal inference, this application is meant to promote the structural nested mean model with g-estimation.It has fewer non-testable assumptions than the traditional latent variable models and does not rely on specialized software.

Figure 1 .
Figure 1.Causal diagram (DAG) representing the relation between a time-varying exposure (A 0 , A 1 ), a confounder (L 1 ) for the relation between A 1 and Y and an unmeasured confounder for the relation between L 1 and Y .

Figure 2 .
Figure 2. Time course of cognitive flexibility (IED t ) and mental distress (depression/anxiety symptoms) by HADD t , HADA t and HADT t (with f (t) from Equation 1 included).
Predicting Y s ( A t , 0) for all t, s with s .t by means of ĉ(0) (removing subsequent effects from the observed outcome, see appendix), denoted H 31 , H 42 , and H 41 .The updated and improved ĉ(1) is found by a second independence GEE with these predictions as outcomes in an extended data set (long format), and the same covariates as above, for example fitted by the R command: "geem ((HADD 2 , HADD 3 , HADD 4 , H 31 , H 42 , H 41 ) . . .)" and again with the SNMM par-HighEdu, f (t))ied t coefficients.Standard errors are estimated by bootstrap.
forming a dataset in "long" format and for example fitting the model with the GEE command in R: "geem((HADD 2 , HADD 3 , HADD 4 ) . . .)" (independence working correlation).The (1, HighEdu, f (t − 1))ied Table I and baseline scores on executive functioning for the whole sample are shown in supplementary table 2. At baseline, those with higher education scored significantly better compared to those with low education on the Stockings of Cambridge, Emotional recognition, and Rapid visual processing, but not on cognitive flexibility (see Table II).Figure 3. Causal graph (DAG) of study design, with time-varying action variable, covariates and outcome.Directed arrows represent possible direct causal effects (red arrows are short-term effects being estimated), in a longitudinal design with 390 adults in an occupational rehabilitation program in Norway.Note.A t : action variable (IED t ), L t : baseline and time-varying covariates, Y t : outcome (HADD t , HADA t , HADT t ), C t : censoring-indicator (loss-to-follow up)surrounding box represents conditioning (the outcome is observed only for those not lost-to-follow up) 1104 H. B. Jacobsen et al.

Table I .
Selected demographics at baseline for all participants.

Table II .
Baseline scores on executive functioning measures divided by high and low education levels.

Table IV .
Time-varying estimated SNMM coefficients from TableIII, for each time-point and level of education, describing short-term causal effects over time on mental distress by depressive symptoms (HADD t + 1 ), anxiety (HADA t + 1 ), and sum (HADT t + 1 ) (separate models) of hypothetical interventions on cognitive flexibility (IED t ).