Cognitive function in clinical burnout: A systematic review and meta-analysis

ABSTRACT Clinical burnout has been associated with impaired cognitive functioning; however, inconsistent findings have been reported regarding the pattern and magnitude of cognitive deficits. The aim of this systematic review and multivariate meta-analysis was to assess cognitive function in clinical burnout as compared to healthy controls and identify the pattern and severity of cognitive dysfunction across cognitive domains. We identified 17 studies encompassing 730 patients with clinical burnout and 649 healthy controls. Clinical burnout was associated with impaired performance in episodic memory (g = −0.36, 95% CI −0.57 to −0.15), short-term and working memory (g = −0.36, 95% CI −0.52 to −0.20), executive function (g = −0.39, 95% CI −0.55 to −0.23), attention and processing speed (g = −0.43, 95% CI −0.57 to −0.29) and fluency (g = −0.53, 95% CI −1.04 to −0.03). There were no differences between patients and controls in crystallized (k = 6 studies) and visuospatial abilities (k = 4). Our findings suggest that clinical burnout is associated with cognitive impairment across multiple cognitive domains. Cognitive dysfunction needs to be considered in the clinical and occupational health management of burnout to optimise rehabilitation and support return-to-work.


Introduction
Exposure to work-related stress has been identified as an important occupational health risk factor, associated with impaired mental health (Harvey et al., 2017) and substantial societal costs (Hassard et al., 2018). One of the most well-known constructs related to prolonged psychosocial stress exposure is burnout, commonly characterised across three dimensions: emotional exhaustion, cynicism and reduced personal accomplishment (Maslach et al., 2001). Burnout refers to an occupational phenomenon rather than a medical diagnosis; however, the growing problem with stress-related mental disorders has led researchers and clinicians alike to increasingly acknowledge the end stage of the burnout process, often referred to as clinical burnout (Grossi et al., 2015; Kleijweg et al., 2013; Schaufeli et al., 2001; van Dam, 2021). In this stage, burnout symptomology is severe enough to cause significant distress and impaired daily functioning and requires professional treatment (Grossi et al., 2015).
Despite the vast negative consequences of burnout, research and policy responses have been hampered by the large variability in how it is defined and assessed (Bayes et al., 2021; Eurofound, 2018) and caution has been raised that clinical burnout is poorly recognised and managed in health care practice (Kakiashvili et al., 2013). Consequently, attempts have been made to formalise diagnostic criteria in order to standardise diagnostic procedures and treatments, including the diagnosis exhaustion disorder, incorporated into the Swedish version of the International Classification of Diseases (ICD-10) (Grossi et al., 2015), as well as using the ICD-10 diagnosis of neurasthenia with the addition that symptoms are work-related (e.g. Roelofs et al., 2005; Schaufeli et al., 2001) and the Diagnostic and Statistical Manual of Mental Disorders (4th ed.; DSM-IV) diagnosis for undifferentiated somatoform disorder with fatigue as the main complaint (e.g. Kleijweg et al., 2013). These diagnoses all share the core symptom of clinically elevated levels of exhaustion and severe fatigue. Additional symptoms associated with clinical burnout include sleep disturbances (Sonnenschein et al., 2007), depressed mood (Glise et al., 2012; Sonnenschein et al., 2007) and somatic symptoms such as gastrointestinal problems, muscular aches and pain (Glise et al., 2014).
Patients with clinical burnout report experiencing cognitive difficulties with significant impact on daily functioning (Eskildsen et al., 2015; Oosterholt et al., 2014; Öhman et al., 2007; Österberg et al., 2009), which can persist several years after treatment despite clinical improvement (Dalgaard et al., 2021; Ellbin et al., 2021; Oosterholt et al., 2016). Corroborating this subjective cognitive deficit, a previous systematic review concluded that burnout is associated with impaired performance on neuropsychological tests, primarily within the domains of executive function, attention and memory (Deligkaris et al., 2014). However, previous findings have been heterogeneous, with evidence of both intact and impaired cognitive performance within the investigated domains. For example, while some studies have reported deficits in non-verbal memory (Sandström et al., 2005), others have not (Eskildsen et al., 2015; Öhman et al., 2007). Similarly, impairments in short-term and working memory have been observed (Jonsdottir et al., 2013; Rydmark et al., 2006; Savic et al., 2018), but intact performance on these tests has also been found (Oosterholt et al., 2014; Öhman et al., 2007; Sandström et al., 2005), and contradictory findings have been reported for other cognitive domains as well. Thus, while cognitive impairment is increasingly acknowledged as an important aspect of clinical burnout, the specific cognitive domains that are affected and the magnitude of impairment are not yet fully understood.
Some of the discrepancies in previous studies may be attributed to methodological issues, such as small sample sizes, differences in study populations and diagnostic procedures, and the use of different cognitive tests to assess and classify study outcomes across cognitive domains. Additionally, it remains unclear whether cognitive deficits in clinical burnout are associated with demographic and clinical variables, such as age, gender and comorbid depression. Previous reviews have narratively synthesised the available literature (Deligkaris et al., 2014; Grossi et al., 2015), but so far there have been no attempts to meta-analytically investigate the existing data and explore the sources of heterogeneity of previous findings.
The current systematic review and meta-analysis sought to synthesise the literature on cognitive performance in clinical burnout as compared to healthy controls. Specifically, our aim was to assess the pattern and severity of cognitive dysfunction across different cognitive domains; and explore the potential moderating effect of clinical, demographic and methodological variables.

Protocol and registration
This review adheres to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines (Liberati et al., 2009). The review protocol was prospectively registered with PROSPERO (CRD42020219276).

Search strategy and study selection
An electronic database search of MEDLINE, EMBASE and PsycINFO was conducted from inception to 3 November 2020. The full search strategy can be obtained from the corresponding author. No restrictions on language or publication type were applied and the electronic search was complemented by hand searching the references of retrieved articles and of previous reviews. One person (HMG, MD, AN or ASN) conducted initial screening of titles and abstracts. Full-text screening of potentially relevant articles was conducted by two independent reviewers (HMG, MD or ASN). Disagreements were resolved by consensus.

Eligibility criteria
To be included in the systematic review, studies had to meet the following inclusion criteria: (1) cross-sectional or longitudinal studies presenting the results on (2) cognitive function (any domain) assessed with at least one standardised neuropsychological test or close equivalent in (3) adults (>18 years) with clinical burnout, stress-related exhaustion disorder or equivalent diagnosis (e.g. stress-related mental disorder with work-related causes), compared to (4) a healthy control group without stress-related illness or other mental or physical disorders with potential impact on cognitive function. Studies were excluded if they focused on non-clinical samples, such as individuals recruited from the community who reported symptoms of burnout but without a diagnosis or need for treatment, or if the cognitive task involved some form of manipulation, such as performing the task under stress. For longitudinal or interventional studies, only baseline data were included.

Data collection and coding
Data were extracted by one reviewer (HMG) and checked against the original publication by another reviewer (MD or ASN). Disagreements were resolved by consensus. For each study, we extracted the mean and standard deviation for each group and outcome. If data could not be extracted from a study, we contacted the authors and requested the missing group-level data. We used the Cattell-Horn-Carroll-Miyake framework (Webb et al., 2018), with some slight modifications, to classify cognitive outcomes into the following domains: crystallized ability; episodic memory; short-term and working memory; executive function; attention and processing speed; visuospatial ability; fluency; and fluid reasoning. Executive function tasks were further specified as shifting, updating or inhibition, and short-term and working memory tasks were classified as short-term memory, low working memory or high working memory. The classification of outcome measures by domain can be obtained from the corresponding author.
Each cognitive task was further categorised based on the following features: (1) stimulus mode: whether the task stimuli were verbal or non-verbal; (2) outcome type: whether the outcome was measured in time (reaction time or time to completion) or performance (e.g. number of correct responses or errors); and (3) type of assessment: whether the task was computerised or assessed through traditional (pencil-and-paper) administration. Episodic memory tasks were further coded as either immediate, delayed or prospective memory. Furthermore, the following variables were coded for the study-level moderator analyses: (1) mean age of the patient group; (2) percent female in the patient group; (3) diagnostic group: classified as exhaustion disorder, undifferentiated somatoform disorder or other diagnosis equivalent to clinical burnout; (4) comorbidity: whether or not the study excluded participants with clinically diagnosed depression; and (5) matching criteria: whether or not the study used one or more criteria (age, gender and/or educational level) for matching of the patient and control group.

Quality assessment
Quality assessment of individual studies was conducted using the eight-item JBI Checklist for analytical cross-sectional studies (Moola et al., 2020). Each item was classified as either Yes (criteria fulfilled), No (criteria not fulfilled) or Partial (criteria partially fulfilled or description unclear). The assessments were conducted independently by two reviewers (HMG and EÅ) and disagreements were resolved by consensus.

Data analysis
Between-group differences in cognitive function were converted to standardised mean difference, calculated as Hedges' g with 95% confidence interval, between the clinical burnout group and the comparison group for each eligible outcome. Hedges' g represents the standardised mean difference between the groups corrected for small sample size bias and can be interpreted similarly to Cohen's d. A negative effect size represented lower performance in the patient group. When a study reported several cognitive measures for the same cognitive domain, all eligible outcomes were included in the analysis and combined to a single effect size. Pooling of outcomes across studies was conducted using multivariate random-effects models with robust variance estimation to account for non-independence of multiple effect sizes within studies (Hedges et al., 2010), using the packages robumeta (Fisher et al., 2017) and clubSandwich (Pustejovsky, 2020) for R. Analyses were performed for overall cognitive function, comprising all cognitive results combined, as well as for each cognitive domain. Domain-specific analyses were contingent on the availability of at least three studies for analysis. The alpha value was set to 0.05. Following established convention, an effect size of ≥ 0.80 was considered large, ≥ 0.50 was considered medium and ≥ 0.20 was considered a small effect. Heterogeneity across studies was quantified using τ² and expressed as a proportion of overall observed variance using the I² statistic (Borenstein et al., 2017; Higgins & Thompson, 2002). Finally, we calculated the prediction interval to assess the dispersion of true effects (Riley et al., 2011).
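To illustrate the effect size calculation described above, the following is a minimal Python sketch of Hedges' g with an approximate 95% confidence interval, together with the small/medium/large thresholds stated in the text. It is not the authors' code (the analyses were run in R); the function names, the large-sample variance approximation, the "below small" label and the example scores are all assumptions made for illustration. The sketch assumes that higher scores mean better performance, so time-based outcomes would need their sign reversed for a negative g to indicate lower performance in the patient group.

```python
import math


def hedges_g(mean_patients, sd_patients, n_patients,
             mean_controls, sd_controls, n_controls):
    """Return (g, ci_low, ci_high) for patients versus controls."""
    df = n_patients + n_controls - 2
    # Pooled standard deviation across the two groups.
    sd_pooled = math.sqrt(
        ((n_patients - 1) * sd_patients ** 2
         + (n_controls - 1) * sd_controls ** 2) / df
    )
    d = (mean_patients - mean_controls) / sd_pooled  # Cohen's d
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    g = j * d
    # Large-sample approximation to the sampling variance of g.
    n_total = n_patients + n_controls
    var_g = j ** 2 * (n_total / (n_patients * n_controls)
                      + d ** 2 / (2 * n_total))
    half_width = 1.96 * math.sqrt(var_g)
    return g, g - half_width, g + half_width


def magnitude(g):
    """Label |g| using the thresholds stated in the text."""
    size = abs(g)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"  # label chosen here; the text defines no name for it


# Hypothetical scores (higher = better), not taken from any included study.
g, ci_low, ci_high = hedges_g(45.0, 10.0, 40, 50.0, 10.0, 40)
print(f"g = {g:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f} ({magnitude(g)})")
```

This single-outcome calculation does not cover the pooling step, which in the review relies on robust variance estimation to handle multiple dependent effect sizes per study.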
We investigated potential moderators of neuropsychological test performance by conducting pre-specified subgroup analyses using robust variance estimation meta-regression models. This was done when I² ≥ 25% and at least three studies were available within subgroups. For each cognitive domain, the following moderators were investigated: stimulus mode (verbal or non-verbal), outcome type (time or performance) and type of assessment (computerised or traditional). For episodic memory, the moderating effect of memory type (immediate, delayed or prospective) was also investigated. Furthermore, the following study-level moderators were investigated for the overall cognitive outcomes: mean age of the patient group, percent female in the patient group, diagnostic group, comorbid depression (included or excluded) and control group matching. Small-study effects were assessed for the overall cognitive outcome by visually inspecting funnel plots of standardised mean difference against standard error and tested formally using a multivariate analogue of the Egger's test (Egger et al., 1997; Sterne et al., 2011).
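The idea behind an Egger-type test can be sketched as a regression of effect sizes on their standard errors, where a slope far from zero suggests funnel plot asymmetry. The Python sketch below is a simplified univariate stand-in, not the multivariate analogue used in the review: it assumes one independent effect size per study, uses inverse-variance weights, and omits the significance test. The function name and the data are hypothetical.

```python
def egger_regression(effects, standard_errors):
    """Weighted least-squares fit of effect = intercept + slope * SE.

    Weights are inverse variances (1 / SE^2). Returns (intercept, slope).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    w_sum = sum(weights)
    se_mean = sum(w * se for w, se in zip(weights, standard_errors)) / w_sum
    g_mean = sum(w * g for w, g in zip(weights, effects)) / w_sum
    sxy = sum(w * (se - se_mean) * (g - g_mean)
              for w, se, g in zip(weights, standard_errors, effects))
    sxx = sum(w * (se - se_mean) ** 2
              for w, se in zip(weights, standard_errors))
    slope = sxy / sxx
    intercept = g_mean - slope * se_mean
    return intercept, slope


# Hypothetical data constructed so that less precise studies show
# more negative effects (an asymmetric funnel).
ses = [0.10, 0.15, 0.20, 0.25, 0.30]
gs = [-0.30 - 1.0 * se for se in ses]
intercept, slope = egger_regression(gs, ses)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

With several effect sizes per study, as in this review, the standard errors of such a regression would need to account for within-study dependence, which is what the robust variance estimation framework provides.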
The results from the quality assessment are available from the corresponding author. The main areas for which criteria were not fully met concerned describing the study subjects and setting in sufficient detail and using appropriate strategies to deal with confounding factors. Moreover, while the majority of the included studies used standard neuropsychological tests to assess the outcome, several did not report whether the cognitive assessment was conducted by a trained assessor. The majority of the included studies used validated instruments to assess burnout levels in both the patient group and the control group and diagnosed clinical burnout based on standardised diagnostic criteria (e.g. ICD-10-SE criteria for exhaustion disorder or DSM-IV criteria for undifferentiated somatoform disorder).

Neurocognitive functioning
Overall cognitive function
Across 17 studies and 176 effect sizes, a small and statistically significant effect size was found for reduced overall cognitive performance in participants with clinical burnout compared to healthy controls (g = −0.34, 95% CI = −0.43 to −0.24, p < .001), with moderate heterogeneity (τ² = 0.08, I² = 59%, prediction interval −0.95 to 0.27). No significant funnel plot asymmetry was detected (β = −1.32, p = .22). The funnel plot and forest plot for overall cognition can be obtained from the corresponding author. There was no evidence for a moderating effect of age (β = 0.013, p = .48) or gender (percent female: β = −0.002, p = .49), nor any significant between-group differences in the overall cognitive effect size depending on the criteria used for diagnosis, inclusion of comorbid depression or control group matching (Table 2).
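As a rough check on how the prediction interval relates to the other reported statistics, the Python sketch below applies the commonly used formula, pooled estimate ± t(k − 2) × √(τ² + SE²), to the values given above. Several steps are assumptions rather than details from the review: the standard error is back-calculated from the reported confidence interval using a normal critical value of 1.96, the degrees of freedom are taken as the number of studies minus two, and the t critical value is hard-coded. Under these assumptions the result lands close to the reported interval of −0.95 to 0.27.

```python
import math

# Values reported in the text for overall cognitive function.
pooled_g = -0.34
ci_low, ci_high = -0.43, -0.24
tau_squared = 0.08
n_studies = 17

# Assumption: recover the standard error from the 95% CI with a normal
# critical value; the review's robust CI may use a different multiplier.
se = (ci_high - ci_low) / (2 * 1.96)

# Assumption: degrees of freedom = k - 2 = 15.
# 2.131 is the 97.5th percentile of Student's t with 15 df.
T_CRIT_DF15 = 2.131

half_width = T_CRIT_DF15 * math.sqrt(tau_squared + se ** 2)
pi_low = pooled_g - half_width
pi_high = pooled_g + half_width
print(f"approximate 95% prediction interval: {pi_low:.2f} to {pi_high:.2f}")
```

The interval spans zero even though the confidence interval does not, which reflects the between-study variance: the average effect is reliably negative, but the true effect in an individual comparable study could plausibly be near zero.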

Domain-specific cognitive function
The results from the analyses of individual cognitive domains are shown in Figure 2. Separate forest plots for each domain are available from the corresponding author. Small and statistically significant effect sizes were found for episodic memory (g = −0.36), short-term and working memory (g = −0.36), executive function (g = −0.39) and attention and processing speed (g = −0.43). A statistically significant medium effect size was observed for fluency (g = −0.53). Within the executive function domain, patients performed significantly worse than healthy controls in shifting (g = −0.41) and inhibition (g = −0.38), but not in updating (g = −0.28, p = .08). For short-term and working memory, a non-significant effect size was found for short-term memory (g = −0.15, p = .36), whereas small and statistically significant effect sizes were seen for low working memory (g = −0.36) and high working memory tasks (g = −0.47). No significant difference between patients with clinical burnout and healthy controls was found for crystallized ability (g = −0.10, p = .14) or visuospatial ability (g = −0.03, p = .44). Only two studies investigated fluid reasoning outcomes and data were therefore not pooled in meta-analyses. A significant impairment in abstract reasoning was found in one study (Nelson et al., 2021), whereas the other study found no difference between patients and controls in abstract and verbal reasoning (Sandström et al., 2005). Domain-specific moderators of neuropsychological test performance were investigated for four domains, in which low to moderate heterogeneity was observed (I² ≥ 25%) and a sufficient number of studies were available for subgroup analyses (Table 3). A significant moderating effect was found for type of assessment within the executive function domain, for which a larger effect size was observed for traditional assessment (g = −0.57) compared to computerised assessment (g = −0.31).
No other statistically significant between-subgroup differences were found, although a similar pattern was observed for attention and processing speed (traditional assessment: g = −0.55; computerised assessment: g = −0.33; test for subgroup difference: p = .11). Moreover, for episodic memory, larger effect sizes were noted for prospective memory and to some extent delayed memory, as compared to immediate memory, but the test for subgroup differences did not reach statistical significance.

Discussion
The results from this systematic review and meta-analysis showed that clinical burnout is associated with broad impairment across several cognitive domains. Across 17 studies encompassing 730 patients and 649 healthy controls, we found that clinical burnout was associated with impairments in episodic memory, short-term and working memory, executive function, attention and processing speed, and fluency, with effect sizes ranging from small to medium. No statistically significant differences between participants with clinical burnout and healthy controls were observed for crystallized ability and visuospatial function. A more detailed investigation of the executive function domain revealed a statistically significant impairment in inhibition and shifting, but not in updating. The domain-specific analysis for updating was, however, characterised by large heterogeneity and the effect size was of similar magnitude (g = −0.28) as several of the other domains that reached statistical significance, which disallows firm conclusions as to whether executive function impairments in clinical burnout are selective or domain-general. For short-term and working memory, impairments increased with higher cognitive load, as no statistically significant difference was found for short-term memory (g = −0.15), whereas the largest effect size (g = −0.47) was found for tasks with high working memory demands.
Effect sizes were largely comparable for tasks using verbal and non-verbal material; when the outcome was measured in time or performance; and when the task was computerised or assessed through traditional mode of administration. Thus, within the investigated domains, impairments were largely equivalent irrespective of the specific task characteristics. However, for executive function, traditional assessment yielded larger effect sizes than computerised assessment, suggesting that standard neuropsychological tests may be more sensitive to use in clinical practice compared to more experimental tasks. Within the episodic memory domain, markedly larger effect sizes were observed for prospective memory along with greater impairment in delayed memory, as compared to immediate memory. Prospective memory tasks involve remembering to execute a delayed intention (McDaniel et al., 1999), such as taking medication or attending a meeting; these tasks are multifactorial by nature and dependent on the integration of episodic memory and executive control processes (Kliegel et al., 2011). Although these results should be interpreted with caution, they nevertheless provide some interesting hypotheses into the pattern of memory impairment in clinical burnout, suggesting that retaining and retrieving information across delayed periods of time may be particularly challenging, especially when doing so is dependent on self-regulated task execution.
The results from this meta-analysis corroborate the problems with memory and concentration frequently reported by patients with clinical burnout (Eskildsen et al., 2015; Oosterholt et al., 2014; Öhman et al., 2007; Österberg et al., 2009). The overall pattern of impairment is indicative of a cognitive control deficit with broad implications across domains, which aligns with suggestions that cognitive impairments in clinical burnout are prefrontal in nature. Structural and functional deviations in the prefrontal cortex, as well as the striatum and the amygdala, are consistent findings from neuroimaging studies in the field (Blix et al., 2013; Golkar et al., 2014; Jovanovic et al., 2011; Sandström et al., 2012; Savic, 2015), which may additionally be related to state-dependent phenomena with potential impact on cognitive performance, such as mental fatigue (Gavelin et al., 2020; Skau et al., 2021). The practical implications of these cognitive deficits have so far been sparsely investigated, but given the importance of higher-order cognitive abilities such as executive function and working memory for several aspects of life (Diamond, 2013), including stress regulation (Williams et al., 2009) and occupational functioning (Knight et al., 2018), it is conceivable that a small impairment in these abilities could have a large impact for the individual. Difficulties with planning, organising and behavioural flexibility could lead to less efficient coping, impede the effectiveness of psychological treatment and hamper return to work. In everyday life, such deficits may manifest as difficulties managing complex tasks, staying focused, learning new things, indecisiveness and impaired job performance (Arnsten & Shanafelt, 2021; van Dam, 2021). Moreover, performing cognitive tasks may require larger investment of effort (Krabbe et al., 2017; Oosterholt et al., 2014), which could add to the state of exhaustion.
Longitudinal studies have indicated that patients with remaining clinical symptoms (Dalgaard et al., 2021) and those who are still on sick-leave at follow-up also tend to perform worse on cognitive tests, but the nature of the association between cognitive impairment and other areas of functioning, as well as clinical disease progression, is not well established.
Despite the different classifications used to diagnose clinical burnout, we found no differences in cognitive effect sizes depending on the criteria used for diagnosis. Of note, slightly larger effect sizes were observed in the "other" category, which mainly included older studies for which criteria for clinical burnout diagnosis were not yet clearly specified. This might be reflective of more heterogeneous patient samples included in these studies, including a higher prevalence of other comorbidities. Given the complex symptomology in clinical burnout, it has been cautioned that failure to control for the confounding effect of comorbid psychopathologies might lead to overestimation of the degree of cognitive impairment (Deligkaris et al., 2014). In particular, there is consistent evidence that depression is associated with cognitive dysfunction (Rock et al., 2014); however, we found no evidence that inclusion of patients with a comorbid diagnosis of depression served as a confounder, at least not at the study level. Notwithstanding, depressive symptoms are common in clinical burnout and the potential influence of depressive symptomology on cognitive function in the patient group remains to be established. Finally, although we observed no moderating effect of age or gender, it should be noted that the included studies were fairly similar in terms of the mean age and percentage of females in the patient groups, which is generally reflective of the patient population (e.g. Glise et al., 2012) but limits the possibility to explore the moderating effect of these variables across this relatively small set of studies.
Impaired cognitive performance is not exclusive to burnout; rather, broad cognitive deficits with small to medium effect sizes have been observed across a wide range of psychological disorders (Abramovitch et al., 2021; Snyder et al., 2015). In this context, it is important to consider that a small between-group effect size does not necessarily translate into clinical detection of cognitive impairment on standardised neuropsychological tests for the individual patient, especially if premorbid cognitive ability is high. In clinical burnout, an important area for future research is to move beyond group averages and explore within-group heterogeneity; this includes using normative data to identify patients performing below average limits and explore whether there are cognitive subgroups that relate differently to clinical and functional outcomes, as well as which neuropsychological tests may be the most sensitive to detect an impairment. Although some such efforts have been made by more recent studies (Bartfai et al., 2021; Ellbin et al., 2018), more knowledge is needed to guide clinical practice. Developing and evaluating more ecologically valid tasks that better capture the cognitive complexities of everyday life and may detect subtle cognitive deficits could also be of value.
Moreover, the fact that the majority of the effect sizes were small contrasts with the high level of everyday cognitive problems reported by patients (Nelson et al., 2021) and further investigation into this discrepancy is warranted. One proposed explanation is that individuals with clinical burnout cope with challenges by perseverance and may be able to compensate for cognitive deficits by investing more effort in task performance; this may, however, lead to increased mental fatigue following cognitive activity (Krabbe et al., 2017; Oosterholt et al., 2014; Skau et al., 2021; van Dam, 2021). Of note, the observed pattern within the short-term and working memory domain, in which effect sizes increased with higher cognitive load, aligns with observations that mental fatigue primarily influences cognitive control processes rather than simpler tasks (van der Linden et al., 2003) and could also speak against low motivation as an explanation for cognitive underperformance. Taken together, further addressing the influence of motivation, effort and fatigue on cognitive performance in burnout is warranted, as is exploring the potential transdiagnostic nature of such factors.
This study has several clinical and practical implications. Firstly, our results highlight the importance of assessing cognitive function as part of the clinical management of burnout. Given the weak relationship between self-reported cognitive difficulties and neuropsychological test performance (Nelson et al., 2021; Österberg et al., 2012), cognitive screening using abbreviated test batteries might be the most suitable method to detect the presence of cognitive impairment and signal when a more detailed neuropsychological assessment is needed. Developing and validating cognitive screening tools specifically for this patient group remains an important area for future research. Secondly, cognitive impairment needs to be considered in the management of clinical burnout, for example when providing information or setting tasks as part of behavioural interventions. Cognitive dysfunction also needs to be considered in the return-to-work process, which might be facilitated by reducing the cognitive demands in the workplace and alternating complex cognitive tasks with simpler ones (van Dam, 2021), as well as increasing a sense of control over the work situation (Arnsten & Shanafelt, 2021). Finally, targeted interventions that could support or improve cognitive function in clinical burnout remain an important area of investigation. Positive effects on cognition have been observed following cognitive training and physical exercise (Gavelin et al., 2018); additional areas to consider are the use of principles from neuropsychological rehabilitation (Wilson, 2008), such as developing compensatory cognitive strategies, as well as reducing the negative impact of mental fatigue, for example by alternating activity and rest (van Dam, 2021), in order to optimise everyday and occupational function.
Some limitations of this study should be addressed. Firstly, the included studies employed a variety of different cognitive measures, which may have contributed to the observed heterogeneity within cognitive domains. Increased harmonisation of outcome measures in the field could therefore be of value to facilitate comparisons of findings across studies. Although we attempted to explore the sources of the observed heterogeneity, the relatively small number of studies reduced the power of the moderator analyses. Moreover, other factors with potential impact on cognition, such as anxiety, pain, sleep, medication use and disease duration, were not considered in the present analysis. Of note, these variables were scarcely reported in primary studies and future studies should attempt to delineate the clinical characteristics of the patient samples in more detail, as well as more clearly describing the control group characteristics. Similarly, depressive comorbidity was inconsistently assessed and reported across studies, which prevented us from performing a more detailed analysis of the moderating effect of depressive symptomology on cognitive performance. Moreover, this review was limited to cross-sectional studies and thus we cannot establish whether cognitive dysfunction is an antecedent or a consequence of clinical burnout. Prospective and longitudinal studies are needed to fully answer this question and explore potential bidirectional relationships, as well as to establish the longitudinal course of cognitive impairment in burnout. Finally, it is clear from this review that a variety of different methods are used to diagnose clinical burnout and while our results were robust to the different diagnostic criteria used across studies, this nevertheless contributes to clinical heterogeneity.
This observation concurs with previous reports on the large variation in how burnout is defined and assessed across different settings, and highlights the need to establish international consensus on diagnostic criteria in order to improve research synthesis efforts in the field as well as clinical management.
In conclusion, the results from this meta-analysis show that clinical burnout is associated with impairment across several cognitive domains, primarily within executive function, working memory, attention and processing speed and episodic memory. Impairments seem to increase with higher cognitive load, and when tasks place greater requirement on attentional and executive control. Proper detection of cognitive impairment in clinical burnout is needed to increase awareness of the current level of functioning, facilitate clinical management and optimise return-to-work for employees on sick-leave due to burnout.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by FORTE [grant number 2020-01111].