Comparison of the classification ratios of four depression rating scales commonly used in Turkey

ABSTRACT Objective: According to the literature, more than 20 depression scales are in use in Turkey. Considering that depression is a popular area of study, it may not seem unusual that so many scales are available. However, having so many instruments raises the question of whether they all measure the construct with the same sensitivity. The purpose of this study is to compare, by cross-validation, four commonly used self-report scales adapted into Turkish: the CES-Depression Scale, the Beck Depression Inventory, the Zung Self-Rating Depression Scale, and the Hospital Anxiety and Depression Scale (depression subscale). Method: The four depression scales were administered to 341 subjects, and a total score was obtained for each subject on each scale. Next, the sample was divided into two groups according to the group mean of each scale's total score. Normative scores and cut-off scores were not considered because the study objective was to compare the scales on a theoretical basis. The below-mean and above-mean groups for each of the four scales were compared by ROC curve analyses. Results: The total score of the Beck Depression Inventory was grouped correctly by the Zung Self-Rating Depression Scale at a ratio of 0.871, by the Hospital Anxiety and Depression Scale (depression subscale) at a ratio of 0.885, and by the CES-Depression Scale at a ratio of 0.874. The total score of the CES-Depression Scale was grouped correctly by the Beck Depression Inventory at a ratio of 0.871, by the Zung Self-Rating Depression Scale at a ratio of 0.869, and by the Hospital Anxiety and Depression Scale (depression subscale) at a ratio of 0.862. The total score of the Zung Self-Rating Depression Scale was grouped correctly by the Hospital Anxiety and Depression Scale (depression subscale) at a ratio of 0.848, by the Beck Depression Inventory at a ratio of 0.872, and by the CES-Depression Scale at a ratio of 0.878. The total score of the Hospital Anxiety and Depression Scale (depression subscale) was grouped correctly by the Zung Self-Rating Depression Scale at a ratio of 0.848, by the Beck Depression Inventory at a ratio of 0.889, and by the CES-Depression Scale at a ratio of 0.887. Conclusion: Overall, the scales cross-validated with ratios ranging from 0.85 to 0.89. The classifying ratios obtained by ROC analysis were similar across the four depression scales.


Introduction
Symptoms of depression have been described broadly and include a wide range of manifestations, from short-term distress to despair, guilt, unwillingness, and low self-esteem. Long-term depression causes fatigue, sleeplessness, chronic pain, and extreme weight loss or gain [1,2]. Depression has been related to socioeconomic status, ethnicity, age, and gender in various studies; females, lower socioeconomic groups, and people aged 25-34 are the most affected [3][4][5][6][7]. The prevalence of depression is reported to be 15% in women and 10% in men [8]. Moreover, this disorder is related to one's livelihood [9]; thus, it is estimated that by 2030 depression will be the most important cause of labor loss [4,10]. Depression is the most common and widespread psychological disorder and occurs in almost every society [4]; however, in underdeveloped and developing countries, its prevalence is difficult to determine because adapted scales are lacking and their reliability and validity have not been established [11].
Tests labeled "Depression Tests" are claimed to measure similar constructs and are designed to provide a total individual score [12]. Santor et al. [13] reported that between 1918 and 2006 more than 280 depression scales were developed and published, with an increased rate of development after 1940. Scales have been developed on different samples and in different contexts, and have been categorized according to different theoretical backgrounds, such as psychoanalytic, cognitive, behavioral, and evolutionary, as well as those related to attachment and interpersonal relationships. Most of these tests have been developed using classical test theory and focus on a "true" score and measurement error.
Standard depression tests are known to be useful when used for screening purposes in mental health research and in general populations. It is also stated that such measuring instruments possess many favorable characteristics, such as good psychometric qualities, convenient number of items, being understandable, free of charge, and easily accessible [14,15].
In an earlier study, Ceyhun [16] reported that 13 depression scales had been used in Turkey: 2 clinician-administered scales, 9 self-report questionnaires, and 2 structured interview scales.
In studies using Turkish versions of depression scales, depression levels have usually been examined in special groups [26], the test itself has been validated [27], or depression has been compared with some other theoretical construct or concept [28,29]. Very few studies go beyond adapting the scales into Turkish or examining their psychometric properties [30][31][32]. Likewise, few studies compare the results of different depression scales head-to-head [33][34][35][36][37][38][39][40].
Considering that depression is a serious medical illness with important societal implications, it may not seem unusual that many scales are available. However, with so many instruments available, it is important to determine whether they provide comparable measurement of the construct and have similar sensitivity. In addition, total depression scores from tests developed under classical test theory cannot be compared with one another in terms of test equivalence [12]. Therefore, the criteria for selecting and using one instrument over another in a sample from Turkey are currently unclear. Given these factors, the purpose of this study was to compare the Turkish versions of four self-report depression scales, namely the Beck Depression Inventory (BDI), the Zung Self-Rating Depression Scale (SDS), the Hospital Anxiety and Depression Scale (HADS), and the CES-Depression Scale (CES-D), in a cross-validation study using receiver operating characteristic (ROC) curves. In this way, we aimed to obtain evidence on whether these instruments measure the same construct and whether one can be substituted for another.

Instruments
The CES-Depression Scale
The scale, which has been widely used and translated into several languages [41], was developed by Radloff [42], drawing on the items of available depression scales, for the purpose of screening depressive symptoms in the general population. It assesses the emotional and somatic symptoms that participants experienced in the previous week [42]. The scale was translated into Turkish by Tatar and Saltukoğlu [17]. It consists of 20 items scored between 0 and 3, so the total score varies between 0 and 60. Although the scale does not have a single defined factor structure, the four-factor structure proposed by Radloff [42] has usually been supported [3,5,43]. However, researchers have also reported different numbers of factors [44]. The scale is one of the most widely used in studies of the treatment and prognosis of depression [45].

The Beck Depression Inventory
This inventory was developed in 1961 by Beck et al. [46] and consists of 21 items scored between 0 and 3 on a four-point rating scale, so the total score varies between 0 and 63. Items of the inventory focus on emotional, behavioral, and somatic symptoms [5]. This test is the most widely used scale for the measurement of depression [1]. The scale was adapted into Turkish in two independent studies [16]. In the present study, the form adapted into Turkish by Hisli [47][48][49] was used.
Studies using the Turkish form of the scale have examined topics such as major depression [50], postpartum depression [36], somatic symptoms [38], comorbidity of impulse control disorder in depressive patients [35], depression in coronary artery disease [26], in essential tremor and Parkinson's disease [51], and in haemodialysis patients [52], the relationship between pain and depression in patients with knee osteoarthritis [53], depression in adolescents three and a half years after the Marmara earthquake [54], depression in mothers of children with food refusal [55], dimensions of alexithymia and the intensity of depression [56], and the relationship between depression and anger in patients with antisocial personality disorder [57].

The Zung Self-Rating Depression Scale
This scale, developed by Zung [58], consists of 20 items scored between 1 and 4 on a four-point rating scale, so the total score varies between 20 and 80. The scale was adapted into Turkish by Ceyhun and Akça [59]. It measures common clinical symptoms, both psychological and somatic; half of the items are worded negatively and half positively. Although many studies have not specified sub-dimensions or a factor structure, the scale has commonly been described by two factors: well-being and depressive symptoms (or positive and negative symptoms) [5].
Studies that used the Turkish form of this scale alongside the Beck Depression Inventory and the Hospital Anxiety and Depression Scale have examined depression levels in mothers of handicapped children [40] and low back pain and its relationship with pain-related disability and depression [60].

The Hospital Anxiety and Depression Scale (depression subscale)
This scale, developed by Zigmond and Snaith [61], was adapted into Turkish by Aydemir et al. [62]. The scale consists of 14 items, of which 7 assess anxiety (the Hospital Anxiety and Depression Scale-A; odd-numbered items) and the remaining 7 assess depression (the Hospital Anxiety and Depression Scale-D; even-numbered items). Items in both subscales are scored between 0 and 3 on a four-point scale, so the total score for each subscale ranges between 0 and 21 [62]. In this study, all 14 items were administered to all participants to maintain the integrity of the test; however, only the items of the depression subscale were assessed.
Other studies using the Turkish version of the Hospital Anxiety and Depression Scale have examined the relationship between depression, alopecia areata, and alexithymia [63], and depression in women in the premenopausal and postmenopausal periods [64], in nurses [65], in patients with diabetes mellitus [66,67], in patients with low back and neck pain [68], and in medical inpatients [69].

Procedure
Subjects were selected by convenience sampling and participated in the study voluntarily. They were asked whether they wanted to participate in a scientific survey and were informed that they could withdraw at any time. One participant did not complete either the CES-Depression Scale or the Zung Self-Rating Depression Scale, one participant did not complete the Beck Depression Inventory, one participant did not complete the Zung Self-Rating Depression Scale, and one participant did not complete the Hospital Anxiety and Depression Scale. Nonetheless, these participants completed and returned all other scales. All scales were administered on an individual basis. It took a participant 10-15 minutes to complete all the forms.

Statistical analysis
First, the total scores for each scale were calculated. From the total scores, the mean score for each scale was determined; these means are shown in Table 1. Next, the sample was divided into two groups according to the group mean of each scale's total score. Normative scores and cut-off points were not considered because the purpose here was to compare the scales on a theoretical basis. The below-mean and above-mean groups for each of the four scales were compared by ROC curves.
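The procedure described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' actual analysis: the function names (`mean_split`, `auc`), the example scores, and the rule that a score exactly equal to the mean counts as "below" are all assumptions made here. The area under the ROC curve is computed in its rank-based (Mann-Whitney) form.

```python
from statistics import mean

def mean_split(scores):
    """Label each total score 1 if it is above the sample mean, else 0.
    Assumption: a score equal to the mean is treated as 'below'."""
    m = mean(scores)
    return [1 if s > m else 0 for s in scores]

def auc(labels, predictor):
    """Area under the ROC curve, computed as the share of
    (above-mean, below-mean) pairs in which the above-mean case has the
    higher predictor score; tied pairs count as one half."""
    pos = [p for lab, p in zip(labels, predictor) if lab == 1]
    neg = [p for lab, p in zip(labels, predictor) if lab == 0]
    if not pos or not neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical total scores of six respondents on two scales
scale_a = [5, 12, 20, 31, 8, 25]
scale_b = [3, 9, 14, 18, 15, 11]

groups_a = mean_split(scale_a)   # grouping derived from scale A
ratio = auc(groups_a, scale_b)   # how well scale B reproduces that grouping
print(groups_a, round(ratio, 3))
```

In this sketch the grouping comes from one scale and the continuous total score of another scale serves as the predictor, which mirrors the cross-validation idea of the study; repeating the call for every ordered pair of scales would give the full set of ratios.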

Results
Cronbach's alpha internal consistency coefficients were calculated and are shown in Table 1. The coefficients were 0.90 for the CES-Depression Scale, 0.91 for the Beck Depression Inventory, 0.84 for the Zung Self-Rating Depression Scale, and 0.85 for the Hospital Anxiety and Depression Scale-D.
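For readers unfamiliar with the coefficient, the following is a small sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response matrix are hypothetical and are not taken from the study data.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, each holding k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*item_scores))                    # one tuple per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses of five people to a three-item scale (0-3 per item)
responses = [
    [0, 1, 1],
    [1, 1, 2],
    [2, 2, 3],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Sample variances are used for both the items and the totals; using population variances throughout would give the same alpha, because the correction factor cancels in the ratio.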
The study group was then divided into two groups, representing participants who scored above and below the group mean. For this procedure, the norms and the cut-off points of the scales were not taken into account because the purpose was to compare the scales on a theoretical basis. Table 2 shows the comparisons of the groups derived from each scale; it gives, for the people above and below the mean on one scale, how they were grouped on each of the other scales. Of the people who were below the mean on the CES-Depression Scale (n = 189), 32 were above the mean on the Beck Depression Inventory, 45 were above the mean on the Zung Self-Rating Depression Scale, and 42 were above the mean on the Hospital Anxiety and Depression Scale-D. The distribution of these values was compared by ROC curves. Here it is sufficient to give the number and percentage distributions of the scales relative to each other.
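As a worked example of the percentage distributions mentioned above, the counts reported for the below-mean CES-Depression Scale group can be turned into agreement percentages. The counts (189, 32, 45, 42) come from the text; the variable names and the decision to express agreement as "percent also below the mean on the other scale" are assumptions made for illustration.

```python
# Counts reported in the text for people below the CES-D mean
n_below_cesd = 189
above_mean_on_other = {"BDI": 32, "Zung SDS": 45, "HADS-D": 42}

# Percentage of the below-mean CES-D group that is also below the mean
# on each of the other scales (i.e. grouped the same way)
agreement_pct = {
    scale: round(100 * (n_below_cesd - n_above) / n_below_cesd, 1)
    for scale, n_above in above_mean_on_other.items()
}

for scale, pct in agreement_pct.items():
    print(f"{scale}: {pct}% grouped the same way as CES-D")
```

These percentages describe only one cell of each cross-tabulation (the below-mean CES-D group); the ROC ratios reported in the study summarize the full distribution of scores instead.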
Next, Pearson's correlation coefficients between the scales were calculated and are shown in Table 3. The correlation coefficients ranged between 0.75 and 0.80. The Zung Self-Rating Depression Scale was involved in both the lowest and the highest correlations: it showed the highest correlation with the CES-Depression Scale and the lowest correlation with the Beck Depression Inventory.
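A minimal sketch of how Pearson's correlation between two sets of total scores could be computed is given below. The helper `pearson_r` and the score lists are hypothetical; only the formula itself is standard.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical total scores of six respondents on two scales
scale_a = [5, 12, 20, 31, 8, 25]
scale_b = [3, 9, 14, 18, 15, 11]
r = pearson_r(scale_a, scale_b)
print(round(r, 2))
```

Because the correlation is unaffected by linear changes of scale, it can be computed directly on total scores even when the scales have different score ranges.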
Finally, scores of each scale below and above the group mean scores with respect to total scale scores were compared with the total scores of the other scales by using the ROC curve.
In Figure 1, results of the ROC curve showed that the CES-Depression Scale correctly classified the Beck Depression Inventory by a ratio of 0.871, the Zung Self-Rating Depression Scale by a ratio of 0.869, and the Hospital Anxiety and Depression Scale-D by a ratio of 0.862.
In Figure 2, results of the ROC curve showed that the Beck Depression Inventory correctly classified the CES-Depression Scale by a ratio of 0.874, the Zung Self-Rating Depression Scale by a ratio of 0.871, and the Hospital Anxiety and Depression Scale-D by a ratio of 0.885.
In Figure 3, results of the ROC curve showed that the Zung Self-Rating Depression Scale correctly classified the CES-Depression Scale by a ratio of 0.878, the Beck Depression Inventory by a ratio of 0.872, and the Hospital Anxiety and Depression Scale-D by a ratio of 0.848.
In Figure 4, results showed that the Hospital Anxiety and Depression Scale-D correctly classified the CES-Depression Scale by a ratio of 0.887, the Beck Depression Inventory by a ratio of 0.889, and the Zung Self-Rating Depression Scale by a ratio of 0.848.
Our results showed that the groups below and above the group mean scores had substantially similar ratios for correctly classifying the other three scales according to the ROC analysis. For the CES-Depression Scale, the average correct classifying ratio of the other three scales is 0.867; for the Beck Depression Inventory, it is 0.877; for the Zung Self-Rating Depression Scale, it is 0.866; and for the Hospital Anxiety and Depression Scale-D, it is 0.872.
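The averages can be re-derived from the three ratios reported for each scale in Figures 1 to 4. The sketch below simply takes the arithmetic mean of the published, rounded ratios; the dictionary layout is an assumption made here. Computed this way, the CES-Depression Scale, Beck Depression Inventory, and Zung values match the text, while the Hospital Anxiety and Depression Scale-D value comes out at about 0.875 rather than 0.872, a small difference that may reflect rounding or unrounded inputs in the original calculation.

```python
from statistics import mean

# Ratios as reported in the text for Figures 1-4
reported_ratios = {
    "CES-D": [0.871, 0.869, 0.862],
    "BDI": [0.874, 0.871, 0.885],
    "Zung SDS": [0.878, 0.872, 0.848],
    "HADS-D": [0.887, 0.889, 0.848],
}

# Arithmetic mean of the three ratios for each scale, rounded to 3 decimals
average_ratio = {
    scale: round(mean(values), 3) for scale, values in reported_ratios.items()
}

for scale, avg in average_ratio.items():
    print(f"{scale}: {avg}")
```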

Discussion
Many scales have been developed for the purpose of measuring and assessing depression. In this study, only four of the most commonly used scales in Turkey were compared. Classification ratios were compared without considering norms or cut-off scores. In our sample, participants completed these depression inventories, and the results indicated that the four scales showed similar Cronbach's alpha coefficients, ranging between 0.84 and 0.91. Likewise, the total scores of the scales were closely related, yielding correlation coefficients that ranged between 0.75 and 0.80.
When we applied the ROC analysis to test whether the groups divided below and above the total score averages correctly classified the other scales, we found that the CES-Depression Scale classified the other scales with a mean value of 0.867, the Beck Depression Inventory with 0.877, the Zung Self-Rating Depression Scale with 0.866, and the Hospital Anxiety and Depression Scale-D with 0.872. These values are very close to each other. Although the Zung Self-Rating Depression Scale's correct classification ratio of the Hospital Anxiety and Depression Scale-D and the Hospital Anxiety and Depression Scale-D's correct classification ratio of the Zung Self-Rating Depression Scale (both 0.848) were smaller than the others, the differences were small when the average values are considered.
ROC analysis is a statistical technique that shows the sensitivity and specificity of a test graphically. This analysis has been reported as an effective technique for evaluating the performance of diagnostic tests [70,71]. In the present study, we evaluated the depression scales using ROC analyses. However, ROC analysis was not used here to evaluate the diagnostic power of the scales in depressed patients, but to compare the classifying ratios of the scales with each other. The results obtained for each scale showed at what ratio a scale reflected the other three scales. In other words, the result obtained for each test is the ratio of overlap with the other three scales in terms of classification. Results showed that the differences between the four depression scales were extremely small. This result suggests that these scales can be used interchangeably. On the other hand, whether a depression scale determines the level of depression in a person or distinguishes between depressive and non-depressive individuals is beyond the scope of this study.
Because of the different numbers of items and different ways of scoring, the total scores of these four scales are not isomorphic. That is, 10 points obtained from the Hospital Anxiety and Depression Scale-D and 10 points obtained from the CES-D do not indicate the same depression level. While the Hospital Anxiety and Depression Scale-D, with 7 items, gives a total score between 0 and 21, the CES-Depression Scale, with 20 items, gives a total score between 0 and 60. For this reason, the average score of each scale, obtained on the same group, was used as its cut-off score. Even if the scale scores were mathematically isomorphic, the test scores alone would not be enough to ensure measurement equivalence in terms of the measured variance of the scale scores, that is, in terms of depression levels. Psychometric techniques exist for determining measurement equivalence. However, the way ROC analysis was used in this study to compare the tests in a sense resolves the problem without resorting to such techniques. In other words, the technique discussed in this study made it possible to compare the tests with each other without ensuring measurement equivalence of their total scores.
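One property that may help explain why total scores with different ranges can be compared in this way is that the area under the ROC curve depends only on the rank order of the predictor scores, not on their units. The sketch below illustrates this with made-up numbers: rescaling a 0-21 score onto a 0-60 range leaves the area unchanged. The function `auc`, the group labels, and the scores are hypothetical and are not taken from the study.

```python
def auc(labels, predictor):
    """Rank-based area under the ROC curve (ties count as one half)."""
    pos = [p for lab, p in zip(labels, predictor) if lab == 1]
    neg = [p for lab, p in zip(labels, predictor) if lab == 0]
    if not pos or not neg:
        raise ValueError("both groups must be non-empty")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical above/below-mean grouping taken from some other scale
groups = [0, 0, 1, 1, 0, 1]

# Hypothetical totals on a 0-21 subscale, and the same totals
# stretched linearly onto a 0-60 range
scores_0_21 = [2, 5, 9, 14, 10, 7]
scores_0_60 = [s * 60 / 21 for s in scores_0_21]

auc_raw = auc(groups, scores_0_21)
auc_rescaled = auc(groups, scores_0_60)
print(round(auc_raw, 3), round(auc_rescaled, 3))
```

This invariance concerns only the score metric; it does not by itself show that two scales measure the same construct, which is the question measurement-equivalence techniques address.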
This study has certain limitations. Although the study group is small, it is possible to conclude that these scales do not produce different classifications and can be used interchangeably in screening studies. Although the focal point of this study was not the length or the number of test items, it is worth pointing out that only the depression subscale of the Hospital Anxiety and Depression Scale was used in this study; it consists of only seven items and is considerably shorter than the other scales. Nevertheless, considering that the correct classifying ratio of this subscale did not differ substantially from those of the other three tests, it can be concluded that the Hospital Anxiety and Depression Scale-D is useful for screening purposes in studies with large samples, as distinct from individual diagnosis.
Another important limitation of this study is that no comparisons were made between clinical and nonclinical samples. The depression scales used in this study are also used as clinical diagnostic tools in addition to their use for research purposes. For this reason, examining these scales in participants representing both clinical and nonclinical groups would be useful. We have previously indicated that the criteria by which depression scales are selected in Turkey are not clear. We therefore selected four tests based on their widespread use in Turkey, and we are unable to provide clear selection criteria for the scales under study. Despite these limitations, each of the scales investigated in this study successfully classified the other three tests when below-average and above-average scale scores were compared. The classifying ratios obtained by ROC analysis were similar across the four depression scales. We conclude that these tests can be used interchangeably in Turkey.

Disclosure statement
No potential conflict of interest was reported by the authors.