The optimal short version of the Zarit Burden Interview for dementia caregivers: diagnostic utility and externally validated cutoffs

ABSTRACT Objectives: Using a sample of dementia caregivers, we compared the diagnostic utility of the various short versions of the Zarit Burden Interview (ZBI) with that of the original scale to identify the optimal one. Next, we established externally validated cutoffs for the various ZBI versions using probable depression cases as a reference standard. Methods: Caregivers (N = 394; 236 males; mean age = 56 years) were administered the ZBI and a self-report depression measure. Participants who exceeded the cutoff for the latter were identified as probable depression cases. For each of the ZBI versions, a receiver operating characteristic (ROC) curve was plotted against probable depression cases. The areas under the ROC curves (AUROC) of the short versions were then compared with that of the original using a non-parametric approach. Results: Compared to the original ZBI, the AUROC were similar for the 6-item, 7-item, and two 12-item versions, but significantly worse for the other short variants. The sensitivity and specificity of the cutoffs for all ZBI versions ranged from 77.3% to 85.2% and 60.1% to 79.8%, respectively. Conclusions: The original ZBI had good utility in identifying probable depression in caregivers, while the 6-item variant can be a useful alternative when short versions are preferred.


Introduction
As the prevalence of dementia soars due to the rapidly aging populations in developed countries, an increasing number of people will have to care for a loved one with dementia and undertake the associated burdens. Caregiver burden has previously been defined as the financial, physical and psychological consequences of caring for an adult with a disabling condition (George & Gwyther, 1986). Caring for a person with dementia (PWD) is a long-term commitment and may span up to 20 years after the initial dementia diagnosis (Karlin, Bell, Noah, Martichuski, & Knight, 1999). As a result of such long-term burden, meta-analytic research has documented that caregivers, relative to non-caregivers, are more likely to suffer from depression and physical illnesses, as well as to experience a decrease in self-efficacy and subjective well-being (Pinquart & Sörensen, 2003). Hence, the need to assess caregiving-related burden cannot be overstated. Such assessments help inform the clinician whether interventions are needed and can ultimately improve the lived experience with dementia for both the caregiver and the PWD.
The Zarit Burden Interview (ZBI; Zarit & Zarit, 1987) is one of the most widely used measures of caregiver burden. Since its inception, this 22-item scale has been translated into many languages and used in many countries across a diverse range of caregivers and patient populations; meta-analytic research has suggested that the ZBI is reliable across the diverse contexts in which it has been used (Bachner & O'Rourke, 2007). To facilitate quicker administration, several short versions of the ZBI, ranging from a single item to 18 items, have been developed using various methods. Whitlatch, Zarit, and von Eye (1991) carried out an exploratory factor analysis (EFA) and identified the two factors of personal strain and role strain, consisting of 18 items in total. Arai, Tamiya, and Yano (2003) similarly obtained the two factors of personal and role strain in their EFA; however, their model consisted of eight items. In their EFA, Knight, Fox, and Chou (2000) identified three factors (embarrassment/anger, patient's dependency, and self-criticism) that collectively comprised 14 items. Bédard et al. (2001) created the 4-item and 12-item versions by choosing the four and 12 items, respectively, with the highest item-total correlations within a similar two-factor model of personal and role strain that emerged as the optimal model in their EFA. Hébert, Bravo, and Préville (2000) obtained another 12-item version by carrying out an EFA based on the Whitlatch et al. (1991) two-factor model and selecting the items that constituted the most parsimonious structure of that model. These studies were carried out on dementia caregiver populations. Using data from caregivers of palliative care patients, Gort et al. (2005) produced their short version by having an expert committee select seven items.
Higginson, Gao, Jackson, Murray, and Harding (2010) modified this 7-item version by removing the global item question (item 22) to obtain a 6-item version, which was validated in a mixed sample of caregivers of patients with dementia, cancer, and brain injury. Within the same study, the authors also examined a 1-item (global item question) version. The items included in these different short versions are listed in Table 1.
Nevertheless, the ZBI and its short variants are fraught with a few problems. First, having multiple short versions makes comparison between them difficult, and it is unclear whether these short versions are equivalent to the original in terms of diagnostic utility. It would also be useful to know which of these versions is optimal, that is, the version with the fewest items that still has diagnostic utility comparable to the original. Furthermore, these short variants and the original lack cutoffs that are validated against a clinically significant outcome. For instance, the cutoffs in the original ZBI (Zarit & Zarit, 1987) and in one of the 12-item versions (Bédard et al., 2001) were determined arbitrarily. While Higginson et al. (2010) reported cutoffs for various short versions of the ZBI in their study, they obtained these cutoffs by using the arbitrarily determined cutoff in the original as a reference standard. These cutoffs therefore lack external validity; indeed, O'Rourke and Tuokko (2003) observed that the cutoff for the 12-item variant (Bédard et al., 2001) had low sensitivity in classifying probable depression cases. The lack of validated cutoffs for the ZBI and its short variants makes it difficult to assess the clinical significance of the scores and, more importantly, the need for intervention. Therefore, the current study aimed to address these issues and better inform clinicians on the use of the ZBI and its short versions. We compared the diagnostic utility, via the area under the receiver operating characteristic curve (AUROC), of the various ZBI short versions with that of the original to identify the optimal ZBI short version. Next, we sought to determine externally validated cutoffs for the various ZBI variants.
Given that no specific clinical diagnosis exists to characterize excessive caregiver burden, we followed a previous validation study (O'Rourke & Tuokko, 2003) and opted to use a common consequence of such excessive burden, depression (Alspaugh, Stephens, Townsend, Zarit, & Greene, 1999; Berger et al., 2005), as a reference standard. That is, if a caregiver becomes depressed, it would be reasonable to assume that the burden is excessive.

Participants and procedures
Participants were recruited from the dementia services of two tertiary hospitals in the north-eastern part of Singapore. We used a consecutive sampling method and had a response rate of 87.8%. Participants were included if they were: (1) aged 21 or above; (2) spouses or children of a PWD; and (3) caring for a PWD residing in the community. A total of 394 caregivers for persons at various stages of dementia were recruited. Table 2 presents the demographic characteristics of these caregivers and the persons they were caring for. The PWDs were mostly females (Age mean = 79.5; SD = 8.2) and at the moderate or severe stage of dementia. Their caregivers (Age mean = 53.0; SD = 10.7) were mostly Chinese, married, children of the PWD, and the primary caregiver of the PWD. On average, they had cared for the PWD for 6.8 years (SD = 6.7).
At the respective clinics in the hospitals, these caregivers gave their informed consent before completing, onsite, the self-administered demographic questionnaire, the ZBI, and the Center for Epidemiological Studies Depression Scale (CES-D). This study received ethical approval from the Domain Specific Review Board of the National Healthcare Group, Singapore.

Measures
The ZBI (Zarit & Zarit, 1987) measures the perceived burden of caregivers via 5-point Likert scale items. These items were summed to produce a total score ranging from 0 to 88. According to the original test instructions, a score of 61-88 indicates high burden. The ZBI has demonstrated good reliability and validity for assessing caregiver burden in our local context (Yap, 2010). In addition to the original 22-item total score, we also calculated the total scores for the different short versions of the ZBI by summing their respective items (see Table 1).

Table 1. ZBI items, with the number of short versions that include each item in parentheses.
1. Your relative asks for more help than he/she needs? (1)
2. You don't have enough time for yourself? (7)
3. Stressed between caring and meeting other responsibilities? (6)
4. Embarrassed over behaviors? (3)
5. Angry when around your relative? (4)
6. Your relative affects your relationship with others in a negative way? (7)
7. Afraid of what the future holds for relative? (1)
8. Your relative is dependent on you? (2)
9. Strained when around your relative? (8)
10. Your health has suffered because of your involvement with your relative? (5)
11. You don't have as much privacy as you would like, because of your relative? (4)
12. Your social life has suffered because you are caring for your relative? (5)
13. Uncomfortable about having friends over because of your relative? (4)
14. Your relative seems to expect you to take care of him/her, as if you were the only one he/she could depend on? (2)
15. You don't have enough money to care for your relative, in addition to the rest of your expenses? (0)
16. You will be unable to take care of your relative much longer? (1)
17. You have lost control of your life since your relative's illness? (5)
18. You could just leave the care of your relative to someone else? (4)
19. Uncertain about what to do about relative? (4)
20. You should be doing more for your relative? (3)
21. You could do a better job in caring for your relative? (3)
22. Overall, how burdened do you feel in caring for your relative? (3)

The CES-D is a 20-item, self-report scale that assesses depressive symptoms over the past week (Radloff & Radloff, 1977). Each item is scored on a 4-point Likert scale to reflect the frequency of the described symptom. The total score for this scale ranges from 0 to 60. A cutoff score of 16, as suggested by the original authors, was used in the current study to classify participants as having probable depression. This cutoff has been validated in the local context and has demonstrated high sensitivity (90.9%; Stahl et al., 2008), although its specificity in classifying depression cases was relatively low (67.6%). This level of specificity is similar to the pooled specificity obtained in a meta-analysis (Vilagut, Forero, Barbaglia, & Alonso, 2016). Furthermore, in view of the trade-off between sensitivity and specificity, it may be better to favor sensitivity over specificity, especially since false-negative diagnoses may have more costly consequences than false-positive ones in the clinical context.
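The scoring rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, the "demo_short" item set is a made-up placeholder rather than any published short version, and probable depression is taken to mean a CES-D total at or above 16 (the text's "exceeded the cutoff" could also be read as strictly above).

```python
from typing import Iterable, Sequence

N_ZBI_ITEMS = 22   # each item is rated 0-4, so the full total spans 0-88
CESD_CUTOFF = 16   # cutoff suggested by the CES-D's original authors

# Item numbers per version. Only the 1-item entry follows from the text
# (the global question, item 22); "demo_short" is a hypothetical subset.
VERSIONS = {
    "original_22": list(range(1, N_ZBI_ITEMS + 1)),
    "global_1": [22],
    "demo_short": [2, 9, 22],  # placeholder, NOT a published short version
}

def zbi_score(responses: Sequence[int], items: Iterable[int]) -> int:
    """Sum the ratings of the chosen items (item k is stored at index k-1)."""
    if len(responses) != N_ZBI_ITEMS:
        raise ValueError("expected 22 item responses")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")
    return sum(responses[i - 1] for i in items)

def probable_depression(cesd_total: int) -> bool:
    """Assumption: a CES-D total at or above the cutoff counts as a case."""
    if not 0 <= cesd_total <= 60:
        raise ValueError("CES-D totals range from 0 to 60")
    return cesd_total >= CESD_CUTOFF
```

For a caregiver who rates every item 2, `zbi_score(responses, VERSIONS["original_22"])` gives 44, while the same responses scored on the 1-item version give 2; each short-version total is simply the sum over that version's items.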

Analyses
For each of the studied ZBI variants, we plotted a receiver operating characteristic (ROC) curve against probable depression cases and calculated the AUROC. Next, we assessed whether the AUROC of each ZBI short version differed significantly from that of the original using a nonparametric approach (Delong & Carolina, 1988), which yields confidence intervals and standard errors for the differences in AUROC. Correlations between variables were examined using Pearson correlation coefficients. Statistical significance was set at p < 0.05. Given that multiple comparisons were carried out, we also computed Bonferroni-adjusted p-values. All analyses were performed using Stata version 14.
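The comparison of two correlated AUROCs can be illustrated with a minimal NumPy sketch of a DeLong-type nonparametric test, followed by a Bonferroni adjustment. This is not the Stata procedure used in the study; the function names, the normal-approximation 95% interval, and the handling of a zero standard error are our assumptions.

```python
import math
import numpy as np

def _placements(scores, y):
    """Per-case and per-control placement values for one scale."""
    pos, neg = scores[y], scores[~y]
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)   # 1 if case > control, 0.5 if tied
    return psi.mean(axis=1), psi.mean(axis=0)

def compare_auroc(scores_a, scores_b, y):
    """Compare the AUROCs of two scales measured on the same people."""
    y = np.asarray(y, dtype=bool)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    v10_a, v01_a = _placements(a, y)
    v10_b, v01_b = _placements(b, y)
    auc_a, auc_b = v10_a.mean(), v10_b.mean()
    m, n = y.sum(), (~y).sum()             # needs at least 2 cases, 2 controls
    # 2x2 covariance of the two AUROC estimates (sample covariances, ddof=1)
    s = np.cov(np.vstack([v10_a, v10_b])) / m + np.cov(np.vstack([v01_a, v01_b])) / n
    var = s[0, 0] + s[1, 1] - 2 * s[0, 1]
    se = math.sqrt(max(float(var), 0.0))
    diff = float(auc_a - auc_b)
    if se == 0.0:                          # degenerate case: no test possible
        z, p = float("nan"), float("nan")
    else:
        z = diff / se
        p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return {"auc_a": float(auc_a), "auc_b": float(auc_b), "diff": diff,
            "se": se, "z": z, "p": p,
            "ci95": (diff - 1.96 * se, diff + 1.96 * se)}

def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]
```

In this setting `scores_a` would hold the original 22-item totals, `scores_b` the totals of one short version, and `y` the probable depression indicator; the resulting p-values, one per short version, would then be passed to `bonferroni` together.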

Results
The derived cutoffs for all ZBI versions had acceptable levels of sensitivity and specificity (>70%), except for the 1-item (Higginson et al., 2010) version (see Table 3). At the optimal cutoff of 34, the original ZBI had a sensitivity of 85.2% and a specificity of 74.8%. Detailed results of the sensitivity and specificity statistics for all ZBI variants are reported in the Supplementary materials.
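To make the link between a cutoff and its sensitivity and specificity concrete, the sketch below computes both statistics for a candidate cutoff and picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1). The text does not state which criterion defined the "optimal" cutoff, so Youden's J is an assumption here, as are the function names and the rule that a score at or above the cutoff is flagged.

```python
import numpy as np

def sens_spec(scores, depressed, cutoff):
    """Sensitivity and specificity when scores >= cutoff are flagged."""
    scores = np.asarray(scores, dtype=float)
    dep = np.asarray(depressed, dtype=bool)
    flagged = scores >= cutoff
    sensitivity = (flagged & dep).sum() / dep.sum()
    specificity = (~flagged & ~dep).sum() / (~dep).sum()
    return float(sensitivity), float(specificity)

def best_cutoff(scores, depressed):
    """Cutoff maximizing Youden's J (assumed criterion; lowest cutoff wins ties)."""
    candidates = np.unique(scores)          # sorted observed score values
    def youden(c):
        sensitivity, specificity = sens_spec(scores, depressed, c)
        return sensitivity + specificity - 1
    return float(max(candidates, key=youden))

# Toy data, invented purely for illustration
toy_scores = [10, 20, 30, 35, 40, 50, 60]
toy_depressed = [0, 0, 1, 0, 1, 1, 1]
toy_best = best_cutoff(toy_scores, toy_depressed)
```

On the toy data, lowering the cutoff from 40 to 30 raises sensitivity from 0.75 to 1.0 but lowers specificity from 1.0 to about 0.67, the same trade-off that applies to any lower cutoff on these scales.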

Discussion
The aim of the current study was twofold: first, to identify the optimal ZBI short version, and second, to derive externally validated cutoffs for the different versions of the ZBI. We compared the AUROC between the ZBI short versions and the original. The 6-item version (Higginson et al., 2010) emerged as the optimal short version, having the fewest items while demonstrating diagnostic utility comparable to the original 22-item version. The cutoff for this version also had relatively high specificity and sensitivity in classifying probable depression cases. Interestingly, while this 6-item version had an AUROC comparable to that of the original 22-item version, a few of the longer ZBI versions (Arai et al., 2003; Knight et al., 2000; Whitlatch et al., 1991) had significantly lower AUROC values than the original. One possible explanation is that the extra items beyond the 6-item version may be less relevant in the context of depression; hence, their inclusion is unlikely to increase the scale's diagnostic utility as measured by the accuracy in classifying probable depression cases. Relatedly, in showing that ZBI scores correlate significantly and highly with depression scores, we have also demonstrated the excellent convergent validity of the ZBI and provided evidence confirming similar findings reported previously (Tang et al., 2016).
Next, using probable depression cases as a reference standard, we derived the cutoffs for the full ZBI and its shorter variants. It is interesting to note that the cutoffs reported previously (Bédard et al., 2001; Higginson et al., 2010; Zarit & Zarit, 1987) were lower than our validated cutoffs. Given that lower cutoffs generally correspond to lower specificity, this suggests that the previous cutoffs could suffer from reduced specificity in assessing caregiver burden and consequently result in false positive errors (i.e. identified as excessive burden when it is in fact not). Hence, one should exercise caution in interpreting the results of previous research that used these cutoffs.
These findings have two important implications. First, the externally validated cutoffs for the ZBI afford clinicians greater confidence in identifying caregivers who are experiencing high levels of caregiving-related burden, to the extent that they might be suffering from depression or be at risk of developing it. This will facilitate early intervention for these caregivers and efforts to improve the caregiving experience for both the caregiver and the PWD. Second, because the 6-item, 7-item, and both 12-item versions of the ZBI were shown to be equivalent to the original 22-item version in terms of diagnostic utility, these shorter versions can be used to assess caregiver burden with greater convenience and confidence.
Some limitations of the study are noteworthy. First, the participants were recruited from two hospitals in one region of Singapore; hence, they may not be geographically representative. Second, the participants were recruited from dementia services in tertiary centers and may therefore not be fully representative of those in the community. However, this is less likely to be a problem considering that most PWDs in Singapore receive care from tertiary centers. Third, the caregiver sample included only spouses and children of the PWD; hence, these findings may not apply to other caregivers such as the PWD's children-in-law or paid caregivers. Fourth, Chinese caregivers were over-represented in the sample; as such, the results may only be applicable to the Chinese population. In relation to these generalizability issues, future studies may consider replicating these findings in other populations. Finally, given the cross-sectional nature of this study, it is possible that the depressive symptoms assessed among caregivers were not a consequence of caregiving-related burden but rather the result of other unrelated stressors. As such, future studies may consider verifying the current findings with a longitudinal design.