A clinically feasible short version of the 15-item geriatric depression scale extracted using item response theory in a sample of adults aged 85 years and older

Abstract
Objectives: To extract the items most suitable for a short version of the 15-item Geriatric Depression Scale (GDS-15) in a sample of adults aged ≥ 85 years using item response theory (IRT).
Method: This population-based cross-sectional study included 651 individuals aged ≥ 85 years from the Umeå 85+/GErontological Regional DAtabase (GERDA) study. Participants were either community dwelling (approximately 70%) or resided in institutional care (approximately 30%) in northern Sweden and western Finland in 2000–2002 and 2005–2007. The psychometric properties of GDS-15 items were investigated using an IRT-based approach to find items most closely corresponding to the GDS-15 cut-off value of ≥5 points. Receiver operating characteristic curves were used to compare the performance of the proposed short version with that of previously proposed short GDS versions.
Results: GDS-15 items 3, 8, 12, and 13 best differentiated respondents’ levels of depressive symptoms corresponding to the GDS-15 cut-off value of ≥5, regardless of age or sex, and thus comprise the proposed short version of the scale (GDS-4 GERDA). For the identification of individuals with depression (total GDS-15 score ≥ 5), the GDS-4 GERDA with a cut-off score of ≥2 had 92.9% sensitivity and 85.0% specificity.
Conclusion: The GDS-4 GERDA could be used as an optimized short version of the GDS-15 to screen for depression among adults aged ≥ 85 years.


Introduction
Depression is a leading cause of disability worldwide, and it is ranked first in the burden of disease in middle- and high-income countries (World Health Organization, 2008). Depressive disorders negatively affect well-being (Bergdahl et al., 2005) and daily functioning (Beekman, Copeland, & Prince, 1999), and are among the most common mental disorders in older adults (Fiske, Wetherell, & Gatz, 2009), although they are often underdiagnosed (Stek, Gussekloo, Beekman, Van Tilburg, & Westendorp, 2004) and not properly treated (Bergdahl et al., 2005). The prevalence of depression has been estimated to be 20-25% among people aged ≥ 85 years and 30-50% among those aged ≥ 90 years (Luppa et al., 2012).
Item response theory (IRT) is a psychometric statistical modelling approach (Chiesi et al., 2017) that can be used to compare scale items with one another as a function of the level of a latent trait, that is, a variable that cannot be measured directly (e.g. depression or pain) (Thomas, 2011). By analyzing individuals' responses to scale items, one can estimate the level of the variable of interest (e.g. depressive symptoms) at which a specific item typically starts to score points (Chiesi et al., 2017). IRT analyses are used to calculate item parameters such as difficulty and discrimination. For scales measuring depression, item difficulty represents the level of depression at which half of the population scores points on a particular item, and item discrimination describes the strength of the relationship between the item and the depressive symptoms (Thomas, 2011).
Previous short forms of the 15-item Geriatric Depression Scale (GDS-15) have been extracted using Classical Test Theory, with the exception of Wongpakaran et al. (2019), who used a complex combination of IRT analysis and Confirmatory Factor Analysis. To our knowledge, no one has used item difficulty from IRT analysis to find the items closest to the cut-off of the original GDS-15. Using the items closest to the cut-off would probably suffice to identify individuals with or without depression and render the other items redundant. Further, to our knowledge, no short version of the GDS-15 has been tested among adults aged ≥ 85 years. Thus, the aim of this study was to use IRT analysis to determine which GDS-15 items best differentiate at the level of depressive symptoms corresponding to the original cut-off value of ≥5 points in a sample of adults aged ≥ 85 years, and to investigate whether these items can serve as a feasible, sensitive, and specific shorter version of the GDS for this population.

Setting and data source
The data used in this study were collected as part of the Umeå 85+/GErontological Regional DAtabase (GERDA) study, a population-based cohort study performed by researchers from Umeå University, Sweden, in collaboration with Åbo Akademi, Vaasa University, and Novia University of Applied Sciences, Finland. In 2000-2002, participants from Västerbotten county, northern Sweden, were recruited. Those who participated in 2000-2002 were asked to participate again in 2005-2007. In addition, new participants were recruited from the same area, and the study was also expanded to the Ostrobothnia region of western Finland. Every second person aged 85 years, and all persons aged 90 and ≥95 years, registered in the Swedish National Tax Board and the Finnish Population Register were invited to participate in the GERDA study. Ethical approval was obtained from the regional Ethics Review Board in , and in Finland (nos. 05-87 and 10-54).

Participants
Of 1489 individuals who were eligible for participation during these periods, 524 were excluded due to death before contact or refusal of study participation. For individuals who participated at both timepoints, only data from 2005-2007 (when participants were oldest) were used. The remaining 965 individuals agreed to be interviewed and answered at least 14 of the 15 GDS-15 questions. Of these, 651 were cognitively assessed using the Mini-Mental State Examination (MMSE); no exclusion criteria were used, and these individuals constituted the final sample.

Procedure
Eligible participants were first sent letters with information about the study and later telephoned by a research assistant to obtain informed consent. In cases of concern about individuals' ability to provide informed consent (e.g. due to cognitive capacity), consent was discussed with the individuals' next of kin. Home visits (ca. 2 h) were then conducted, with health check-ups and structured interviews. The interviews were performed by specially trained physicians, physiotherapists, nurses, or medical students. Participation was voluntary and could be terminated at any time. Participants' medical records and information from relatives and caregivers were used to validate information collected during interviews.

Assessments
The GDS-15 has been shown to have high sensitivity and specificity for the detection of depression in older adults when used with the recommended cut-off of ≥5 points to indicate depression (Pocklington, Gilbody, Manea, & McMillan, 2016). Item response options are "yes" and "no"; answers indicating depression were given a score of 1, and overall scores range from 0 to 15. The MMSE was used to assess cognitive status. Total MMSE scores range from 0 to 30, with scores ≤ 17 indicating severe cognitive impairment and scores of 18-23 indicating mild cognitive impairment (Tombaugh & McIntyre, 1992). The GDS-15 has previously been shown to be feasible and valid among older adults with cognitive impairment (Conradsson et al., 2013). The Barthel index was used to assess independence in activities of daily living; scores range from 0 to 20, with higher scores indicating greater independence (Mahoney & Barthel, 1965). Impaired vision was defined as the inability to read 5-mm-high letters from a normal reading distance, with or without glasses. Participants who could not hear normal conversation at a distance of 1 m, with or without a hearing aid, were considered to have impaired hearing. Participants who did not live with partners or other close relatives were considered to be living alone. Living in institutional care was defined by residence in a residential care facility, nursing home, or group home for people with dementia.

Statistical analyses
Differences between participants with total GDS-15 scores < 5 and ≥5 were evaluated using the chi-squared and independent-samples t tests. P values < 0.05 were considered to indicate significance. To identify the items most suitable for a shorter version of the GDS, a two-parameter IRT analysis based on binary logistic regression was performed for each GDS-15 item using equations provided by Lövheim et al. (Lövheim, Gustafsson, Isaksson, Karlsson, & Sandman, 2019). The dependent variable was the item score and the independent variable was the total GDS-15 score. Item discrimination was calculated as the regression coefficient b from that item's model (item b), and item difficulty as the negative of the regression constant divided by that coefficient (−constant b/item b). Item characteristic curves (ICCs) were drawn using the equation given by Lövheim et al. (2019). IRT analyses were also performed for subgroups defined by age, sex, and cognitive function to identify any related difference in item suitability for a shorter GDS version. The rationale for the selection of four items for the short version of the GDS was that this approach would allow for optimal sensitivity and specificity while minimizing administration time for clinical practice. Using the full-scale cut-off score of ≥5 points as the gold standard, receiver operating characteristic (ROC) curves were drawn, and the sensitivity and specificity of the GDS-4 GERDA compared with the GDS-15 were calculated. Cronbach's α was calculated and used to compare the internal consistency of the full scale and of the scale with individual items omitted. Cronbach's α values vary between 0 and 1. Values below 0.7 usually indicate poor internal consistency, and values above 0.9 suggest that items are very similar and that fewer items might lead to similar results (Peacock & Peacock, 2011). Corrected item-to-total correlations were also determined; values below 0.2 would indicate that the item might measure something different from the scale as a whole (Streiner & Norman, 2008).
All statistical analyses were performed using IBM SPSS Statistics (IBM Corp., Armonk, NY, version 26 for Macintosh), and figures were drawn using Microsoft® Excel® (version 16.9 for Macintosh).
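The item-level calculation described above can be illustrated with a minimal sketch. It is not the authors' SPSS procedure, nor the exact equations of Lövheim et al. (2019), which are not reproduced in this text; it assumes an ordinary logistic regression of each item score on the total score, and it assumes that the ICC is the fitted logistic curve. The function names (`fit_logistic`, `item_parameters`, `icc`) and the response data are hypothetical.

```python
import math


def fit_logistic(x, y, max_iter=100, tol=1e-10):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    x: total scale scores; y: 0/1 item scores. Returns (b0, b1).
    Assumes the data are not perfectly separable.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # gradient w.r.t. the constant
            g1 += (yi - p) * xi   # gradient w.r.t. the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system [[h00, h01], [h01, h11]] * step = g
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def item_parameters(total_scores, item_scores):
    """Discrimination = item b; difficulty = -constant b / item b."""
    constant_b, item_b = fit_logistic(total_scores, item_scores)
    return item_b, -constant_b / item_b


def icc(total_score, discrimination, difficulty):
    """Probability of scoring a point on the item at a given total score."""
    return 1.0 / (1.0 + math.exp(-discrimination * (total_score - difficulty)))


# Made-up data (not from the study): 10 respondents at each total score 0..8,
# of whom ENDORSED[t] score a point on the item.
ENDORSED = [0, 1, 2, 3, 5, 7, 8, 9, 10]
totals, item = [], []
for t, k in enumerate(ENDORSED):
    totals += [t] * 10
    item += [1] * k + [0] * (10 - k)

discrimination, difficulty = item_parameters(totals, item)
print(f"discrimination={discrimination:.3f}, difficulty={difficulty:.3f}")
```

In this sketch the difficulty is the total score at which the fitted probability of scoring a point on the item equals 0.5, which is how an item can be matched to a scale cut-off.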

Results
The study sample comprised 651 participants with a mean age of 89.9 ± 4.4 years. Most participants were women (67.3%), lived alone (80.9%), and did not live in institutional care facilities (66.7%). The mean GDS-15 score was 3.7 ± 2.6 (range, 0-14) and the mean MMSE score was 22.3 ± 5.4. Compared with participants with GDS-15 scores < 5, those considered to be depressed were older, used more medications, and were more likely to live in institutional care facilities or alone and to report experiencing loneliness. Depressed participants also had lower cognitive function and functional levels and were more likely to have impaired reading vision, to have experienced the loss of a child, and to have heart disease (Table 1). Excluded individuals did not differ from participants in age, but the excluded sample contained more women than did the included sample (74.2% vs. 67.3%; p < 0.01).

IRT analysis results
ICCs are provided in Figure 1a, and item difficulty and discrimination are summarized in Table 2. The difficulty of the GDS-15 items ranged from very low to very high (−0.519 to 11.061); item 9 ("Do you prefer to stay at home, rather than go out and do new things?") had the lowest difficulty and item 10 ("Do you feel you have more problems with memory than most?") the highest. These two items also had low discrimination values. Items 3 ("Do you feel that your life is empty?"), 8 ("Do you often feel helpless?"), 12 ("Do you feel worthless the way you are now?"), and 13 ("Do you feel full of energy?") corresponded most closely to the GDS-15 cut-off of ≥5 points (Figure 1b), with difficulty values ranging from 3.595 to 6.080. These four items also had satisfactory discrimination values (0.456-0.731). Item difficulty and discrimination values varied only marginally according to age group, sex, and cognitive function (data not shown), except for items 6 ("Are you afraid that something bad is going to happen to you?") and 10 ("Do you feel you have more problems with memory than most?"), which presented large sex-related differences in difficulty (item 6: 15.70 for men, 7.61 for women; item 10: 7.63 for men, 14.24 for women). No differences regarding sex or age group were found in which four items most closely corresponded to the GDS-15 cut-off. For participants with severe cognitive impairment (MMSE score < 18), items 2 ("Have you dropped many of your activities and interests?"), 3 ("Do you feel that your life is empty?"), 8 ("Do you often feel helpless?"), and 12 ("Do you feel worthless the way you are now?") most closely corresponded to the GDS-15 cut-off value; the difficulty value of item 2 (3.74) was marginally closer to the cut-off than that of item 13 (3.69).

Table 1. Characteristics of the participants and comparison with previous studies, and of subgroups defined by whether depression was indicated according to the total GDS-15 score (cut-off of ≥ 5 points indicating depression).

Table 2 (column groups: present study; other versions suggested in previous research). Notes: The dotted line ("……..") between 3.595 and 5.288 in the Difficulty column represents the difficulty level of the cut-off of the original GDS-15, i.e. 5 points. Difficulty represents the level of depressive symptoms at which half of the population scores points on a particular item; the higher the value, the more severe the symptoms. Discrimination describes the strength of the relationship between the item and the depressive symptoms; a higher value discriminates better and is therefore preferred.

Reliability
In this sample, the internal consistency of the full GDS-15 scale was acceptable (Cronbach's α = 0.736). Corrected item-to-total correlations ranged from 0.108 to 0.498 (Table 2), with the lowest values obtained for items 9 and 10 (the items with the lowest and highest difficulty; both < 0.2). Removing item 9 increased α to 0.748, and removing item 10 increased it to 0.739 (Table 2).
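For readers who wish to reproduce reliability statistics of this kind, the following is a minimal sketch using the standard formulas for Cronbach's α, α with one item removed, and the corrected item-to-total correlation (the correlation between an item and the sum of the remaining items). It is not the authors' SPSS output; the function names and the tiny response matrix are hypothetical.

```python
import math
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (all the same length)."""
    k = len(rows[0])
    item_variances = [variance([r[i] for r in rows]) for i in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows, index):
    """Cronbach's alpha of the scale without the item at position `index`."""
    reduced = [r[:index] + r[index + 1:] for r in rows]
    return cronbach_alpha(reduced)


def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def corrected_item_total(rows, index):
    """Correlation between one item and the sum of all other items."""
    item = [r[index] for r in rows]
    rest = [sum(r) - r[index] for r in rows]
    return pearson(item, rest)


# Made-up 0/1 responses: four respondents, three items (not study data).
responses = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0]]
alpha = cronbach_alpha(responses)
alpha_without_third = alpha_if_item_deleted(responses, 2)
r_first = corrected_item_total(responses, 0)
```

An item whose removal raises α, or whose corrected item-to-total correlation is low, contributes little to the internal consistency of the scale.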

ROC analysis results
ROC curves were drawn for the proposed four-item version of the GDS (GDS-4 GERDA) and compared with previously proposed short versions (Figure 2). The GDS-4 GERDA had the greatest area under the ROC curve (0.939), indicating better performance than the other GDS short versions.
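As an illustration of what the area under the ROC curve measures for a short integer score, the sketch below computes it as the probability that a randomly chosen reference-positive respondent has a higher short-form score than a randomly chosen reference-negative respondent, with ties counted as one half. This rank-based value equals the trapezoidal area under the empirical ROC curve. The function name and data are hypothetical, and the authors' own computation was done in SPSS.

```python
def roc_auc(scores, reference_positive):
    """Area under the empirical ROC curve via pairwise comparison.

    scores: short-form scores; reference_positive: True where the
    reference standard (here, the full scale) indicates depression.
    """
    positives = [s for s, r in zip(scores, reference_positive) if r]
    negatives = [s for s, r in zip(scores, reference_positive) if not r]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up short-form scores for six respondents (not study data).
short_scores = [2, 3, 1, 0, 1, 2]
depressed_by_reference = [True, True, True, False, False, False]
auc = roc_auc(short_scores, depressed_by_reference)
```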

Sensitivity and specificity of the GDS-4 GERDA
The sensitivity and specificity of the GDS-4 GERDA for the identification of individuals with depression (total GDS-15 score ≥ 5) are shown in Table 3. With a cut-off value of ≥1, the GDS-4 GERDA had 99.5% sensitivity and 43.7% specificity; with a cut-off value of ≥2, it had 92.9% sensitivity and 85.0% specificity.

Table 3 notes. All values were calculated using the 15-item Geriatric Depression Scale, with a cut-off of ≥5 indicating depression, as the gold standard. Participants (n) is the number of participants in the total sample (n = 651) who would have scored at or above the suggested cut-off value, including true positives and false positives.
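The comparison underlying these figures can be sketched as follows: each respondent is classified once by the full scale (total ≥ 5) and once by the four-item score at a chosen cut-off, and the two classifications are cross-tabulated. The item numbers and the full-scale cut-off come from the text; the helper names and the six made-up respondents are hypothetical, and responses are assumed to be already scored so that 1 marks the answer indicating depression.

```python
SHORT_FORM_ITEMS = (3, 8, 12, 13)  # GDS-15 item numbers in the GDS-4 GERDA
FULL_SCALE_CUTOFF = 5              # GDS-15 total >= 5 indicates depression


def scored_row(endorsed_items, n_items=15):
    """Build a 15-item scored response from the set of items scoring 1."""
    return [1 if i in endorsed_items else 0 for i in range(1, n_items + 1)]


def screening_accuracy(rows, short_cutoff):
    """Cross-tabulate the short form against the full scale."""
    tp = fp = tn = fn = 0
    for row in rows:
        reference = sum(row) >= FULL_SCALE_CUTOFF
        short_score = sum(row[i - 1] for i in SHORT_FORM_ITEMS)
        screened = short_score >= short_cutoff
        if reference and screened:
            tp += 1
        elif reference:
            fn += 1
        elif screened:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "n_screen_positive": tp + fp,  # true positives plus false positives
    }


# Made-up respondents (not study data), given as sets of items scoring 1.
sample = [
    scored_row({1, 2, 3, 8, 12}),
    scored_row({3, 4, 5, 6, 7, 9}),
    scored_row({3, 8}),
    scored_row(set()),
    scored_row({1, 13}),
    scored_row({2, 3, 8, 12, 13, 14}),
]
at_cutoff_1 = screening_accuracy(sample, 1)
at_cutoff_2 = screening_accuracy(sample, 2)
```

Raising the short-form cut-off trades sensitivity for specificity, which is the pattern reported in Table 3.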

Discussion
Based on IRT analysis of the GDS-15 items, we found that items 3, 8, 12, and 13 were most suitable for the GDS-4 GERDA in our sample. The inclusion of item 2 could also be considered for application to populations with severe cognitive impairment. For easy and rapid screening, an instrument should preferably have a high negative predictive value and high sensitivity; a somewhat larger number of false-positive findings is acceptable, given the ability to rule out such cases later in the diagnostic process using more specific methods (Streiner & Norman, 2008). Following this guideline, the GDS-4 GERDA could be used for screening in clinical practice with a cut-off score of ≥1 to indicate depression; this approach, however, resulted in the finding of depression in approximately 70% of our sample. GDS-4 GERDA use with a cut-off score of ≥2 yielded fewer false-positive results, with the detection of depression in approximately 40% of our sample compared with approximately 30% detected with the GDS-15. Clinical practitioners can memorize the four screening items for easier administration and less time consumption, which could lead to more frequent usage among very old adults.
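The screen-positive proportions quoted above follow arithmetically from the reported sensitivity and specificity together with the proportion classified as depressed by the GDS-15. A rough check can be sketched as follows, assuming a prevalence of exactly 0.30 (the text gives only "approximately 30%"); the function name is hypothetical, and the predictive values it returns are illustrative consequences of that assumption rather than results reported by the study.

```python
def screening_summary(sensitivity, specificity, prevalence):
    """Expected proportions in a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return {
        "screen_positive_rate": true_pos + false_pos,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Assumption: prevalence by GDS-15 >= 5 taken as exactly 0.30.
PREVALENCE = 0.30
at_cutoff_1 = screening_summary(0.995, 0.437, PREVALENCE)  # cut-off >= 1
at_cutoff_2 = screening_summary(0.929, 0.850, PREVALENCE)  # cut-off >= 2

for label, result in (("cut-off >= 1", at_cutoff_1), ("cut-off >= 2", at_cutoff_2)):
    print(label, {name: round(value, 3) for name, value in result.items()})
```

Under this assumption, the screen-positive rates come out close to the approximately 70% and 40% stated in the text, which is a useful consistency check on the reported figures.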
Item 9 ("Do you prefer to stay at home, rather than go out and do new things?") had the lowest difficulty value in our sample, suggesting that the oldest adults may prefer not to leave their homes to do new things, regardless of depressive status. This result is in accordance with the finding of Wongpakaran et al. (2019) that item 9 is likely culturally biased and unable to assess depression. In contrast, this item is included in the GDS-4 van Marwijk (van Marwijk et al., 1995) and GDS-5 Hoyl (Hoyl et al., 1999). This discrepancy could be explained by differences in methodology or respondent age and/or cultural background. Item 13 ("Do you feel full of energy?") is included in the GDS-4 GERDA, but not in any previously proposed short version used for comparison in this study. Wongpakaran et al. (2019) found item 13 to have differential functioning among men and women. However, this item is similar to and could be taken to represent the major depression criterion "fatigue or loss of energy nearly every day" appearing in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders: DSM-5 (American Psychiatric Association, 2013). Our result suggests that item 13 is suitable for the detection of depression in the oldest adults, despite the common association of reduced energy levels with old age and health issues other than depression (Birrer & Vemuri, 2004). The greater sex-based variance of difficulty values for items 6 ("Are you afraid that something bad is going to happen to you?") and 10 ("Do you feel you have more problems with memory than most?") than for other items may imply a sex-related difference in the symptomatology of depression in the oldest adults. Item 10 also had the highest difficulty value among items and is not included in any short version used for comparison in this study, whereas item 6 is included in the GDS-4 version proposed by D'Ath (D'Ath et al., 1994).
The GDS-4 GERDA had a slightly larger area under the ROC curve than did the other proposed short versions of the GDS examined in this study, but comparison is difficult because the compared samples differ in age, language, culture, and other characteristics.

Study strengths and limitations
Few studies have evaluated GDS use with adults aged ≥ 85 years, perhaps because this population is difficult to study due to age-related health issues. Adults aged ≥ 85 years have a high prevalence of depression, but also of loneliness, social isolation, negative life events, loss of function, living in institutional care, and comorbidities (Table 1), which increases the importance of testing the psychometric properties of screening tools for depression in this group. Thus, a strength of this study is that the participants were older than those included in previous studies (D'Ath et al., 1994; Hoyl et al., 1999; van Marwijk et al., 1995; Wongpakaran et al., 2019), which, along with the large sample size, supports the feasibility of the GDS-4 GERDA for these oldest adults. Because the GERDA study did not involve exclusion according to health status, cognitive functioning, or living conditions, our sample likely reflects the heterogeneity of the oldest adults in northern Sweden and western Finland during the study period. Thus, the sample could be taken to be representative, a strength for the development of an easy-access screening method. The sample also included participants with a wide range of total GDS-15 scores. Further, we used a novel and systematic approach, using IRT analysis to find GDS-15 items with difficulty close to the scale's cut-off, which appears to be a more logical method than those used in previous attempts.
A limitation of the study is that the evaluation of the GDS-4 GERDA was performed with the GDS-15 serving as the gold standard; the validity of the scale needs further evaluation against clinical depression diagnoses and other scales assessing depressive symptoms. Such analyses were not possible in this study because GERDA participants' clinical depression diagnoses were based partially on GDS-15 scores. The use of the Swedish GDS-15 scale may limit the applicability of the results to the scale in other languages, as wording may differ.

Conclusions and implications
We propose the use of the GDS-4 GERDA, an optimized short version of the GDS-15, for easy and rapid screening for depressive symptoms among adults aged ≥ 85 years. Routine use of this tool could reduce the underdiagnosis of depression in this population.

Disclosure of interest
The authors report no conflict of interest.

Author contribution
All authors made substantial contributions to conception and design, and/or acquisition of data, and/or analysis and interpretation of data; participated in drafting the article or revising it critically for important intellectual content; and gave final approval of the version to be submitted.

Sponsor's role
Funders had no role in trial design, data collection, analysis, or preparation of the manuscript.