Validation of the new brief 6-item version of the Shirom-Melamed Burnout Measure

Abstract
The Shirom-Melamed Burnout Questionnaire/Measure (SMBQ/M) is one of the most commonly used measures of burnout. Using confirmatory factor analyses, the present study aimed to evaluate the model fit, composite reliability, and factorial (i.e., convergent and discriminant) validity of the new brief Swedish version of the scale, labeled SMBM-6. In addition, we used Cronbach’s α as an indicator of the internal consistency of the total scale. The SMBM-6 consists of two subscales: the emotional and physiological exhaustion subscale (three items) and the cognitive weariness subscale (three items). A total of 1251 teachers in Sweden were included in the study. The analyses showed that the Swedish version of the SMBM-6 has an excellent model fit and good convergent validity. Discriminant validity was good for the cognitive weariness subscale but slightly inadequate for the physiological exhaustion subscale. Composite reliability and Cronbach’s α indicated high internal consistency for the subscales and the total scale, respectively. Multi-group invariance tests for age indicated no violation of invariance. These results are consistent with those of the study by Almén and Jansson (2021), in which the SMBM-6 was developed, and of a subsequent psychometric study by Sundström et al. (2022). In conclusion, there is strong support for the Swedish version of the SMBM-6 as a reliable and valid scale for measuring burnout. Testing the scale in languages other than Swedish is warranted.


ABOUT THE AUTHORS
Niclas Almén is a clinical psychologist specializing in Cognitive Behavior Therapy. He has a PhD in Psychology from Mid Sweden University, Östersund, Sweden, and his research focuses on assessment and interventions in the field of stress and recovery. Billy Jansson has a PhD from Stockholm University and is currently working at Mid Sweden University, Östersund, Sweden. His research focuses mainly on anxiety and trauma.

Introduction
Sustained resource depletion as a consequence of stress, manifested as sustained exhaustion and often referred to as burnout (Shirom, 2003), is common in many countries around the world. In Sweden, where the present study was conducted, clinical levels of exhaustion are among the most common reasons for long-term sick leave (Lidwall & Olsson-Bohlin, 2017). Burnout is associated with a wide range of negative responses and events, such as digestive problems, skin problems, headaches (Chakravorty & Singh, 2021), anxiety, depression (Koutsimani et al., 2019), allostatic overload, systemic inflammation, metabolic syndrome, cardiovascular disease, mortality (Bayes et al., 2021), and suicidal ideation (Andela, 2021; Wray & Jarrett, 2019).
It is important to have access to valid and user-friendly measures of burnout, for example, to be able to detect (1) people at risk of developing the syndrome and its associated negative experiences or problems, and (2) workplaces where employees have high levels of burnout. One of the most frequently used measures of burnout is the Shirom-Melamed Burnout Questionnaire/Measure (SMBQ/M; henceforth SMBM; Qiao & Schaufeli, 2011). Because different versions of the SMBM have been inadequately tested psychometrically, Almén and Jansson (2021) recently validated several Swedish versions of the instrument. A four-factor model (SMBM-22), including the factors emotional and physiological exhaustion, cognitive weariness, listlessness, and tension, and a three-factor model (SMBM-18, including physiological exhaustion, cognitive weariness, and listlessness) reached a good model fit after the removal of three items. In addition, two different two-factor models (labeled SMBM-11 and SMBM-12), which solely cover the core dimensions of burnout (emotional, physiological, and cognitive resource depletion) via the physiological exhaustion subscale and the cognitive weariness subscale, reached a good model fit without any modifications. All models showed evidence of good composite reliability and of convergent validity in terms of how closely the items were associated with their factors. The study raised some concerns regarding discriminant validity, namely the unsatisfactory ability of the physiological exhaustion factor to differentiate itself from the overall measure. Additionally, recent studies by Michel et al. (2022) and Sundström et al. (2022) support the construct validity of different versions of the SMBM.
A frequently occurring problem in many research fields, not least when self-reports are requested, is low response rates. Response rates tend to be particularly low in online surveys (Sammut et al., 2021; Wu et al., 2022). One of the largest problems with insufficient response rates is the uncertainty about whether the obtained sample is representative of the population intended to be studied. When Sammut et al. (2021) reviewed the literature on how to counteract the response rate problem, they suggested using short surveys. Accordingly, in their psychometric study, Almén and Jansson (2021) used items from the SMBM-12 to develop and analyze a new brief version of the SMBM: a two-factor model (SMBM-6) consisting of three items from the physiological exhaustion subscale and three items from the cognitive weariness subscale. Items were selected based on face validity. The SMBM-6 demonstrated results very similar to, but slightly better than, those of the two-factor SMBM-12. The model fit was excellent, composite reliability was satisfactory for both factors (.78 for the physiological exhaustion subscale and .93 for the cognitive weariness subscale), and Cronbach's α for the entire scale was .90, indicating excellent internal consistency for the total measure. Convergent validity was good for both subscales, whereas discriminant validity was good for the cognitive weariness subscale but unsatisfactory for the physiological exhaustion subscale (as was the case for all SMBM versions tested). In addition, the SMBM-6 correlated between .95 and .98 with the other tested SMBM scales.
Following this first psychometric study of the SMBM-6 (Almén & Jansson, 2021), Sundström et al. (2022) evaluated the validity of the same measure and likewise found an excellent model fit for the SMBM-6. Convergent validity was inferred from subscale intercorrelations and from correlations ≥ .50 between the overall scale and its subscales, and between the overall scale and several stress- and ill-health-related scales, such as perceived stress, anxiety, and depression. However, some correlations were weaker, the weakest (.28) being between the cognitive weariness subscale and self-rated health. Sundström et al. (2022) did not test the reliability or discriminant validity of the scale, nor did they perform any measurement invariance tests.
While the SMBM-6 seems to be a valid and reliable instrument for measuring burnout, it needs to be further cross-validated before firmer conclusions can be drawn regarding its validity and reliability. The two studies that have investigated the reliability and validity of the SMBM-6 used general population samples; a natural next step is therefore to test whether the scale is reliable and valid in a more specific population, in particular in occupational groups where high levels of burnout are prevalent, such as nurses (Rudman et al., 2020) or teachers (Mijakoski et al., 2022). School teachers represent one of the largest professional groups in Sweden, and there is presently a shortage of qualified teachers in the Swedish labor market (Statistics Sweden, 2019). Burnout, which is associated with stress-related health issues, has emerged as a possible factor contributing to teacher attrition, retirement (Keller et al., 2014), and extended periods of sick leave (Lidwall & Olsson-Bohlin, 2017).
The aim of the present study was to empirically test the fit of the two-factor model of the Shirom-Melamed Burnout Measure (SMBM-6) in a specific population consisting of teachers. This population was chosen to obtain a wide range of responses, capturing scores towards both the lower and the higher ends of the scale. In addition, we aimed to evaluate the convergent and discriminant validity of the instrument by comparing estimates of the average variance extracted and the maximum shared squared variance of the factors. Furthermore, we examined whether the scale showed a similar structure across age groups (i.e., measurement invariance across age groups).

Recruitment
We used multiple platforms and methods to increase diversity and inclusiveness in the sample. Recruitment was conducted via a link to a web survey published on social media (Facebook, Instagram, and LinkedIn). In addition, principals at 39 primary and secondary schools in Sweden were contacted via email. Five principals accepted the invitation and distributed the link to the web survey via email to the teachers at their schools. Acquaintances were also asked to distribute the survey to teachers in their networks. To reach the target population and to enhance the credibility of the survey, the invitation included contact information, information about the study, and information about credentials. No incentives were provided for participation.

Participants
Data collection was terminated after 10 days, at which point 1251 teachers had participated in the study (mean age = 43.87 years, SD = 9.68). Of these, 1141 (91.2%) stated that they were women, 101 (8.1%) stated that they were men, and 9 (0.7%) did not state their gender. All participants worked as teachers. Due to a practical error during data collection, the first 333 participants (26.6%) did not have the opportunity to report what type of teacher they were. Of the remaining teachers, the majority (n = 607; 48.5% of the total sample) were compulsory school teachers (students usually aged 6-16), while 133 (10.6%) worked at an upper secondary school (students usually aged 15-19), 10 (0.8%) at a folk high school (students usually aged 18 or older), and 95 (7.6%) at a preschool (children usually aged 1-6). Teachers at higher levels (i.e., university) or at training schools were not included in the study.
The study was conducted in accordance with the Declaration of Helsinki. Participants were informed about the purpose of the research, and issues concerning confidentiality, anonymity, and their rights were emphasized. Informed consent was obtained from all participants.

The instrument
The SMBM-6 items (see Table 1) were scored on a 7-point scale ranging from 1 (almost never) to 7 (almost always). Scores for each of the two subscales and the total score were computed by averaging the item responses, that is, by dividing the sum of the item scores by the number of items in the scale. Before completing the items, respondents were given the following instruction: "Below are a number of conditions that everyone can experience occasionally. Describe the degree to which you experienced these during the past month".
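To make the averaging rule concrete, the following Python sketch computes subscale and total scores for one respondent. The item labels (`epe1` to `cwe3`) and the function name `score_smbm6` are hypothetical placeholders rather than official identifiers; the actual item wording and the assignment of items to subscales are given in Table 1.

```python
# Hypothetical item labels: the real SMBM-6 wording and item order are in Table 1.
EPE_ITEMS = ("epe1", "epe2", "epe3")  # emotional and physical exhaustion (EPE) subscale
CWE_ITEMS = ("cwe1", "cwe2", "cwe3")  # cognitive weariness (CWE) subscale


def _mean(values: list[int]) -> float:
    """Sum of the item scores divided by the number of items."""
    return sum(values) / len(values)


def score_smbm6(responses: dict[str, int]) -> dict[str, float]:
    """Mean subscale and total scores for one respondent.

    Each response must be an integer from 1 (almost never) to 7 (almost always).
    """
    for item in EPE_ITEMS + CWE_ITEMS:
        if not 1 <= responses[item] <= 7:
            raise ValueError(f"{item} must be between 1 and 7, got {responses[item]}")
    return {
        "epe": _mean([responses[i] for i in EPE_ITEMS]),
        "cwe": _mean([responses[i] for i in CWE_ITEMS]),
        "total": _mean([responses[i] for i in EPE_ITEMS + CWE_ITEMS]),
    }


example = {"epe1": 5, "epe2": 6, "epe3": 4, "cwe1": 2, "cwe2": 3, "cwe3": 1}
print(score_smbm6(example))  # {'epe': 5.0, 'cwe': 2.0, 'total': 3.5}
```

Averaging rather than summing keeps subscale and total scores on the same 1-7 metric as the response scale, so scores from the three-item subscales and the six-item total can be read against the same anchors.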

Analytical approach
First, we used Cronbach's α as an indicator of the internal consistency of the total scale, with α ≥ .70 as the threshold for an acceptable value (Taber, 2018).
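A minimal sketch of the standard Cronbach's α formula, α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₓ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₓ the variance of the total score, is shown below together with the ≥ .70 threshold. The function name and the toy data are illustrative assumptions; statistical packages may differ in details such as the handling of missing responses, which this sketch does not address.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of complete responses.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores),
    using sample variances (n - 1 denominator) throughout.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*rows))  # transpose: one tuple of responses per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data: 3 respondents x 2 items (hypothetical, not study data).
alpha = cronbach_alpha([[1, 2], [2, 2], [3, 3]])
print(round(alpha, 3), "acceptable" if alpha >= 0.70 else "below threshold")  # 0.857 acceptable
```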
Using confirmatory factor analyses with the maximum likelihood estimator, we tested whether the SMBM-6 structure was represented by two correlated first-order factors. Several measures exist for evaluating the overall fit of a model (Hu & Bentler, 1999), and the use of multiple measures to interpret model fit is recommended. Using three to four fit indices provides adequate evidence of model fit; reporting the χ2 value and degrees of freedom, the comparative fit index (CFI) and/or the Tucker-Lewis index (TLI), and the root mean square error of approximation (RMSEA) will usually provide adequate information to evaluate a model (Hair et al., 2019). Additionally, the standardized root mean square residual (SRMR) was used (Shi et al., 2019).
Regarding the χ2 statistic, a statistically significant value means that the model is not supported (i.e., exact fit is rejected). For RMSEA, values below .06 indicate a good model fit, and values below .08 an adequate fit. SRMR values around .08 or lower indicate a good fit to the data. For the CFI and the TLI, values above .90 suggest an acceptable fit, and values above .95 suggest a close fit. See Hu and Bentler (1999) for guidelines on cutoff criteria for fit indices.
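The fit indices and cutoffs above can be illustrated with a short Python sketch that computes RMSEA, CFI, and TLI from the χ2 values of the fitted model and of the baseline (independence) model. It also derives the degrees of freedom of a six-item model with two correlated factors: 21 observed variances and covariances minus 13 free parameters (6 loadings, 6 residual variances, 1 factor correlation, assuming factor variances fixed to 1) gives df = 8. The χ2 values and the SRMR in the usage example are hypothetical, not the results reported in Table 2. The RMSEA formula here uses N − 1; some software uses N instead, and SRMR is taken as an input because computing it requires the residual correlation matrix.

```python
import math


def cfa_degrees_of_freedom(n_items: int, n_factors: int) -> int:
    """df for a simple-structure CFA with correlated factors.

    Assumes each item loads on one factor, factor variances are fixed to 1,
    and loadings, residual variances, and factor correlations are free.
    """
    moments = n_items * (n_items + 1) // 2
    free_params = n_items + n_items + n_factors * (n_factors - 1) // 2
    return moments - free_params


def fit_indices(chi2_m: float, df_m: int, chi2_b: float, df_b: int, n: int) -> dict:
    """RMSEA, CFI, and TLI from model (m) and baseline (b) chi-square values."""
    excess_m = max(chi2_m - df_m, 0.0)       # model misfit beyond its df
    excess_b = max(chi2_b - df_b, excess_m)  # baseline misfit, never below the model's
    rmsea = math.sqrt(excess_m / (df_m * (n - 1)))
    cfi = 1.0 - excess_m / excess_b if excess_b > 0 else 1.0
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)
    return {"rmsea": rmsea, "cfi": cfi, "tli": tli}


def rate_fit(rmsea: float, srmr: float, cfi: float, tli: float) -> dict:
    """Label each index using the cutoffs described in the text."""
    def incremental(value: float) -> str:
        return "close" if value > 0.95 else "acceptable" if value > 0.90 else "poor"

    return {
        "rmsea": "good" if rmsea < 0.06 else "adequate" if rmsea < 0.08 else "poor",
        "srmr": "good" if srmr <= 0.08 else "poor",
        "cfi": incremental(cfi),
        "tli": incremental(tli),
    }


df_model = cfa_degrees_of_freedom(n_items=6, n_factors=2)  # 21 - 13 = 8
df_base = 6 * 7 // 2 - 6                                    # independence model: 15
fit = fit_indices(chi2_m=20.0, df_m=df_model, chi2_b=5000.0, df_b=df_base, n=1251)
print(fit, rate_fit(fit["rmsea"], srmr=0.02, cfi=fit["cfi"], tli=fit["tli"]))
```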
Composite reliability was used as a measure of the internal consistency of the factors, with ≥ .70 as the cutoff for good reliability (Bacon et al., 1995). Discriminant validity is supported when the average variance extracted (AVE) exceeds the maximum shared squared variance (MSV) or the average shared squared variance (ASV). For convergent validity, the AVE had to be greater than .50 and lower than the composite reliability (i.e., the variance explained by the construct should be greater than the measurement error and greater than the cross-loadings). See Hair et al. (2019) for the suggested thresholds of these indices.
Lastly, measurement invariance tests were conducted across age groups: younger, 19-44 years (n = 633; 50.6%) versus older, 45-88 years (n = 618; 49.4%). A sequential strategy was used, testing invariance at increasingly constrained levels. First, the factor structure was specified identically across groups, with all parameters freely estimated in each group, to establish configural invariance (i.e., equivalence in factor structure across groups). Second, a metric (weak) invariance model was fitted in which the factor loadings were constrained to be equal across groups, and its fit was compared with that of the configural (baseline) model; metric invariance holds when the fit of the metric model is not substantially poorer than that of the configural model. Third, a scalar (strong) invariance model was fitted in which factor loadings and item intercepts were constrained to be equal, and its fit was compared with that of the metric model. Finally, a residual (strict) invariance model was fitted in which factor loadings, intercepts, and residual variances were constrained to be equal, and it was compared with the scalar invariance model.
Although a scaled χ2 difference test for nested models can be used to assess invariance between models, it suffers from the same dependency on sample size as the minimum fit function χ2 statistic; consequently, changes in model fit according to the CFI and RMSEA were used instead. As suggested by Chen (2007), a decrease in CFI of .01 or more (ΔCFI ≤ −.01) together with an increase in RMSEA of .015 or more (ΔRMSEA ≥ .015) was used as the criterion indicating a meaningful decrement in fit between models, which is appropriate for sample sizes greater than 300.
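The sequential comparison and the Chen (2007) criterion can be sketched as follows: each model is compared with the previous, less constrained one, and the sequence stops at the first level where the fit worsens. Following the wording in the text, the sketch flags a decrement only when both ΔCFI ≤ −.01 and ΔRMSEA ≥ .015. The function names and the fit values in the usage example are hypothetical, not the values in Table 4, and configural invariance itself is assumed to have been established beforehand from the absolute fit of the baseline model.

```python
LEVELS = ("configural", "metric", "scalar", "residual")


def fit_worsened(less_constrained: dict, more_constrained: dict,
                 cfi_drop: float = 0.01, rmsea_rise: float = 0.015) -> bool:
    """Flag a decrement only if CFI drops by at least cfi_drop AND
    RMSEA rises by at least rmsea_rise (criterion as worded in the text)."""
    delta_cfi = more_constrained["cfi"] - less_constrained["cfi"]
    delta_rmsea = more_constrained["rmsea"] - less_constrained["rmsea"]
    return delta_cfi <= -cfi_drop and delta_rmsea >= rmsea_rise


def supported_levels(fits: dict[str, dict]) -> list[str]:
    """Walk the nested models in order and stop at the first violation.

    Configural invariance is assumed to hold (judged from absolute fit).
    """
    supported = [LEVELS[0]]
    for previous, current in zip(LEVELS, LEVELS[1:]):
        if fit_worsened(fits[previous], fits[current]):
            break
        supported.append(current)
    return supported


# Hypothetical fit values for illustration only.
fits = {
    "configural": {"cfi": 0.995, "rmsea": 0.040},
    "metric": {"cfi": 0.994, "rmsea": 0.038},
    "scalar": {"cfi": 0.992, "rmsea": 0.040},
    "residual": {"cfi": 0.990, "rmsea": 0.041},
}
print(supported_levels(fits))  # ['configural', 'metric', 'scalar', 'residual']
```

Using changes in CFI and RMSEA rather than the χ2 difference test reflects the reasoning in the text: with more than 1200 respondents, even trivial differences between nested models would tend to reach statistical significance.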

Results
The items for each factor and their corresponding factor loadings are presented in Table 1. The two-factor SMBM-6 demonstrated an excellent model fit on all fit indices (Table 2). Cronbach's α for the SMBM-6 was .927, indicating very good reliability. Composite reliability indicated very good reliability for both factors (both substantially above .70), and the indices of convergent validity raised no concerns (for both factors, the average variance extracted was lower than the composite reliability and greater than .50; see Table 3). While discriminant validity was good for the cognitive weariness subscale, the average variance extracted for the physiological exhaustion subscale was lower than the maximum shared squared variance, indicating slightly inadequate discriminant validity for that subscale.
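The reliability and validity checks applied in these results can be sketched from standardized loadings and the factor correlation, using the common formulas CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)) and AVE = Σλ² / k, with MSV taken as the largest squared correlation between a factor and any other factor. The sketch uses the AVE > MSV comparison applied above; because ASV ≤ MSV, passing it also implies AVE > ASV, and with only two factors the two quantities coincide. The loadings and correlation in the usage example are hypothetical values chosen for illustration, not the study's estimates in Tables 1 and 3.

```python
def composite_reliability(loadings: list[float]) -> float:
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1.0 - lam ** 2 for lam in loadings)  # standardized residual variances
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings: list[float]) -> float:
    """AVE = mean of the squared standardized loadings."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def validity_report(loadings: dict[str, list[float]],
                    correlations: dict[tuple[str, str], float]) -> dict:
    """Apply the reliability, convergent, and discriminant criteria per factor."""
    shared = {factor: [] for factor in loadings}
    for (a, b), r in correlations.items():
        shared[a].append(r ** 2)  # squared factor correlation = shared variance
        shared[b].append(r ** 2)
    report = {}
    for factor, lams in loadings.items():
        cr = composite_reliability(lams)
        ave = average_variance_extracted(lams)
        msv = max(shared[factor])
        asv = sum(shared[factor]) / len(shared[factor])
        report[factor] = {
            "cr": cr, "ave": ave, "msv": msv, "asv": asv,
            "reliable": cr >= 0.70,
            "convergent": 0.50 < ave < cr,
            "discriminant": ave > msv,  # implies ave > asv, since asv <= msv
        }
    return report


# Hypothetical standardized loadings and factor correlation (not the study's estimates).
example = validity_report(
    loadings={"EPE": [0.80, 0.80, 0.80], "CWE": [0.90, 0.90, 0.90]},
    correlations={("EPE", "CWE"): 0.85},
)
for name, stats in example.items():
    print(name, {k: round(v, 3) if isinstance(v, float) else v for k, v in stats.items()})
```

In this hypothetical case both factors pass the reliability and convergent checks, but the factor with the weaker loadings has an AVE (.64) below the shared variance (.85² ≈ .72), which is the kind of pattern the text describes as inadequate discriminant validity.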
With respect to invariance across age groups (see Table 4), the results supported configural invariance (indicating a similar factor structure across age groups). There was no substantial decrease in model fit for the metric model, indicating that full metric invariance was achieved (i.e., similar strength of the relations between the items and the constructs across groups). Finally, the change in fit from the scalar to the residual model (constraining item loadings, intercepts, and residual variances to be equal across groups) met the criteria for invariance.

Discussion
The aim of the present study was to further evaluate the new brief, six-item Swedish version of the SMBM in order to draw firmer conclusions regarding the reliability and factorial validity of the scale.
The Cronbach's α value of .93 for the new brief Swedish SMBM-6 is markedly higher than the commonly used threshold (≥ .70) for good reliability (i.e., internal consistency), indicating excellent reliability for the entire scale. Composite reliability indicated excellent reliability for both the physiological exhaustion subscale (.85) and the cognitive weariness subscale (.95). Taken together, the results indicate that the two-factor SMBM-6 has excellent reliability, an excellent model fit, and good convergent validity. Discriminant validity was good for the cognitive weariness subscale and slightly inadequate for the physiological exhaustion subscale (because the indicators for this factor had less unique variance). The multi-group tests of invariance for age showed no substantial decrement in model fit at any level, suggesting that the six-item model obtained from the confirmatory factor analyses worked equally well for the two age groups. The results of the present study confirm the conclusions of previous studies of the SMBM-6 (Almén & Jansson, 2021; Sundström et al., 2022) that the Swedish SMBM-6 is a reliable and valid measure of burnout.
Given the conclusion by Sammut et al. (2021) that short surveys can counteract the common problem of low response rates, the SMBM-6 could improve response rates compared with the longer versions of the SMBM. Another advantage is the possibility of using the SMBM-6 in studies with frequent assessments, for example, diary or intervention studies that analyze change processes. In addition, because clinical levels of burnout can be difficult to treat, researchers recommend investing in preventive interventions (Glise et al., 2020), which may require screening to identify people at risk; such screening is well served by quickly administered measures. Moreover, because stress and burnout are related to many factors, several factors may need to be studied simultaneously, which becomes more feasible with brief scales.
A limitation of the present study was the use of a non-randomized sample, which may limit the generalizability of its findings. However, the present evaluation, together with the two previous evaluations of the Swedish version of the SMBM-6 (Almén & Jansson, 2021; Sundström et al., 2022), suggests that the results can be generalized to adults in general, as the results have pointed in the same direction for random and non-random samples, and for general population samples as well as a specific occupational group (teachers). In line with this, the invariance testing in the present study and in the first study of the SMBM-6 (Almén & Jansson, 2021) indicates that the results hold across age groups. The first study of the SMBM-6 also demonstrated no violations of gender invariance. Another limitation of the present study was the low proportion of men participating, which did not allow us to examine possible violations of gender invariance; measurement invariance across gender should therefore be considered in future validation studies.
There is strong empirical support for the conclusion that the Swedish version of the SMBM-6 is a reliable and valid scale for measuring burnout. The results warrant further research, in particular testing the scale in languages other than Swedish. If the SMBM is to be used for repeated measurements, for example, every week or every day, it is important to test the scale with an alternative instruction (in the three validation studies of the SMBM-6 conducted so far, respondents were asked to base their estimates on the past month). For such use, it may also be important to examine test-retest reliability in addition to the analyses made in the present study.

Table 1. The items included in the SMBM-6 and their factor loadings
Note. The Swedish instruction and items used in the study are presented in brackets. EPE = the Emotional and Physical Exhaustion subscale; CWE = the Cognitive Weariness subscale.

Table 4. Results of the multi-group tests of invariance regarding age
Note. CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; SRMR = Standardized Root Mean Square Residual; RMSEA = Root Mean Square Error of Approximation. Deltas are computed relative to the previous level of measurement invariance.