Overcoming difficulties in measuring emotional regulation: Assessing and comparing the psychometric properties of the DERS long and short forms

Abstract Difficulties with emotion regulation have been found to be implicated in the development and maintenance of depression and symptoms of low mood, as well as various other significant psychological conditions including mood disorders, anxiety disorders and personality disorders. Thus, it is important to have valid and reliable measures of difficulties with emotional regulation that are easy to administer and interpret. There are presently four available measures for this construct: the Difficulties in Emotion Regulation Scale (DERS), and the three short-form versions, the DERS-16, the DERS-18 and the DERS-SF. There remains inconsistency in the literature about which short-form version of the DERS is best. The present study aimed to extend the literature by examining and comparing the psychometric properties and clinical utility of the well-known self-report measure the DERS, and the three short-form versions, the DERS-16, the DERS-18 and the DERS-SF, in a large convenience sample. A sample of 1049 first-year university students completed an online test battery of self-report questionnaires and a series of questions regarding demographic information. The DERS and the three short-form versions demonstrated good construct validity, good internal consistency, and good discriminative ability. The mean scores and standard deviations of the DERS subscales and DERS short-forms organized by depressive symptom severity are presented. Overall, this paper provides new evidence of the validity and clinical utility of the four versions of the DERS.


DERS-36
Given the considerable and well-established effect of emotion regulation abilities on psychiatric difficulties, it is crucial that researchers and clinicians have effective self-report forms to assess this construct. The most widely used self-report measure of this construct is the Difficulties in Emotion Regulation Scale (DERS; Gratz & Roemer, 2004;Charak et al., 2019;Hallion et al., 2018). Gratz and Roemer's (2004) model conceptualises emotion regulation as an individual's perceived abilities to understand, identify, respond to, accept, and manage their emotions. Gratz and Roemer's model captured six domains of emotion regulation abilities within 36 items: lack of awareness of one's emotions (awareness), lack of clarity about the nature of one's emotions (clarity), lack of acceptance of one's emotions (non-acceptance), lack of access to effective emotion regulation strategies (strategies), lack of ability to engage in goal-directed activities during negative emotions (goals), and lack of ability to manage one's impulses during negative emotions (impulse).
To date, extensive literature has examined the psychometric properties of the DERS, including its factor structure, test-retest reliability, internal consistency, predictive validity, and validity in various clinical populations, including adults diagnosed with an anxiety disorder or an emotional disorder (Bardeen et al., 2012; Fowler et al., 2014; Hallion et al., 2018; Neumann et al., 2010; Osborne et al., 2017; Perez et al., 2012). Although the original DERS had a six-factor structure, subsequent examinations of its factor structure have found mixed support. Several studies demonstrated adequate fit of the six-factor structure (Neumann et al., 2010; Perez et al., 2012). In comparison, several studies found limited reliability or validity for the Awareness subscale in the model (Fowler et al., 2014; Osborne et al., 2017). This poor factor fit has led several studies to remove the Awareness subscale in favour of a five-factor model (Bardeen et al., 2012; Fowler et al., 2014). More recently, several studies have found support for a bi-factor, five-factor model of the DERS, which has been shown to have adequate fit (Bardeen et al., 2012; Cho & Hong, 2013; Hallion et al., 2018). Despite mixed findings regarding the five- or six-factor structure of the DERS, and with the exception of the Awareness subscale, the DERS has consistently been shown to be a valid measure capturing substantial difficulties in emotion regulation in individuals experiencing high levels of psychopathology; that is, high scores on the DERS have been associated with clinically relevant behaviours and symptoms of psychopathology (Dvorak et al., 2014; Hallion et al., 2018; Neumann et al., 2010; Roemer et al., 2009; Sloan et al., 2017; Tolin et al., 2018).

DERS short forms
In addition to the original DERS self-report form, there exist three different short-forms independently derived from the original DERS: the DERS-16 (Bjureberg et al., 2016), DERS-SF (Kaufman et al., 2016), and DERS-18 (Victor & Klonsky, 2016). These three brief measures were developed independently, albeit coincidentally, in 2016. The DERS-16 (Bjureberg et al., 2016) is a 16-item short form comprising the items that demonstrated the highest item-total correlations, with an additional item added to ensure at least two questions per subscale. The DERS-16 excluded the Awareness subscale, due to previous evidence of lack of support (Bardeen et al., 2012). Bjureberg et al. (2016) found the DERS-16 exhibited excellent validity and reliability and correlated significantly and positively with the original DERS-36 in a clinical sample of women receiving emotion regulation group therapy for self-harm in Sweden (n = 96, mean age = 25.37 (6.63) years, 100% female) and also in two American community samples (one targeted recruitment at interested community members who experienced mood or behavioural problems; n = 102, mean age = 24.68 (10.27) years, 63.64% female; the other comprised young women recruited by random sampling techniques; n = 482, mean age = 21.75 (2.23) years, 100% female).
The DERS-18 (Victor & Klonsky, 2016) is an 18-item, six-factor short-form, similarly composed of the strongest items from each subscale, but it retained the Awareness subscale so that the DERS-18 was a brief replica of the original six-factor DERS. Victor and Klonsky (2016) examined the psychometric properties of the DERS-18 in a series of studies in clinical and non-clinical populations of varying ages, including a sample of high-school students aged 13 to 17 years (n = 265, 61.45% female), a sample of adolescents from an inpatient unit (n = 167, mean age = 15.61 (1.42) years, 77.25% female), undergraduate students who engage in non-suicidal self-harm (n = 160, mean age = 23.28 (5.45) years, 68.13% female), and online community participants (n = 163, mean age = 30.49 (10.73) years, 55.83% female; n = 705, mean age = 35.26 (13.19) years, 58.30% female). They found the DERS-18 had excellent reliability and validity, and adequate fit for the six-factor solution. However, results also demonstrated that the Awareness and Goals subscales exhibited weak, non-significant correlations with the other scales. Similarly, the DERS-SF (Kaufman et al., 2016) is an 18-item, six-factor self-report questionnaire. Kaufman et al. (2016), in both clinical and non-clinical samples of American adolescents (n = 84 in-patient adolescents, n = 29 adolescents who had attempted suicide, n = 30 nonsuicidal control adolescents, and n = 131 adolescents in the community) and college students (n = 230, mean age = 24.38 (5.8) years, 63% female; n = 567, mean age = 24.2 (6.21) years, 67% female), found the DERS-SF to have excellent validity and reliability, with good model fit for the six-factor structure.
Since the development of these short-forms, several subsequent studies have sought to replicate and investigate the psychometric properties of these questionnaires (Charak et al., 2019; Hallion et al., 2018; Skutch et al., 2019). Consistently, investigations have found all three brief forms to have acceptable factor fit and to be reliable and valid. However, despite these investigations, there continues to be uncertainty about which brief form is best. Specifically, each study has completed slightly different investigations and found slightly different results, leading to different recommendations for the best brief form of the DERS. For example, Charak et al. (2019) compared the factor structure and measurement invariance of the DERS-36 and the three short versions across two inpatient populations: adolescents and adults. In their investigation, convergent validity and internal consistency were not assessed. Charak et al. (2019) found that a six-factor model had excellent fit for both the DERS-SF and DERS-18, and that the DERS-16 demonstrated excellent fit with a five-factor model. Regarding measurement invariance, only the DERS-SF achieved metric and scalar invariance between adult and adolescent populations. Thus, the authors recommended the DERS-SF as the most acceptable brief form going forward. In comparison, Hallion et al. (2018) examined the factor structure, predictive utility, and internal consistency of the DERS-36 and the three short-forms in a population of treatment-seeking adults who met criteria for one or more DSM-5 emotional disorders. Their results showed that all three short-forms had good fit and internal consistency. Notably, however, unlike in Charak et al. (2019), the Awareness subscale was excluded from all analyses (in the DERS-36, DERS-18 and DERS-SF), and a bi-factor, five-factor model was fitted instead of the original six-factor model.
Their conclusion was that no one brief measure emerged as superior to another; however, it was recommended that the Awareness subscale be excluded when using these measures. As in Charak et al. (2019), convergent validity was not assessed by Hallion et al. (2018). Finally, Skutch et al. (2019) examined the convergent validity and internal reliability of the three DERS short-forms in a large sample of American undergraduate students (n = 1360, mean age = 20.7 (5.15) years, 70.79% female). Skutch et al. (2019) found the validity and reliability to be equivalent across all three brief measures, but concluded that the DERS-SF and DERS-18 may have greater clinical value due to their subscale scores, which showed specific relationships to various forms of psychopathology (e.g., in their study higher BPD symptoms were associated with higher scores on the Strategies subscale). This is in contrast to Hallion et al. (2018) and Moreira et al. (2020), who recommended the exclusion of the Awareness subscale. Thus, given these varying results and recommendations, one of our aims was to gain further clarity on the psychometric properties of these three brief measures, and to determine whether one measure is psychometrically superior. It may be that no one measure is best; however, clarification is important, as a multitude of measures that all assess the same construct can result in confusion for researchers, and might also mean that different research outputs cannot be directly compared due to the use of different measures (Skutch et al., 2019). Therefore, a primary aim of the study reported in this paper is to examine and directly compare the validity and reliability of the three short-form versions of the DERS along with the original 36-item version in a general non-clinical sample.

Difficulties with emotional regulation in depression
Depression is a debilitating disorder that is typified by the experience of enduring negative affect and the absence of positive affect. Difficulties coping with persistent negative emotions are a core concern (Otte et al., 2016), significantly reducing a person's functioning (Kupferberg et al., 2016). Depression is the second highest cause of disease burden in the world (Sutin et al., 2013) and is associated with lower quality of life, poorer social functioning, and reduced vocational functioning (Greenberg et al., 2003). Research has shown that one in five people will experience a depressive episode at some stage in their life (Bromet et al., 2011). Contemporary research has highlighted the significant relationship between difficulties in emotion regulation and the experience of depression. For example, research has demonstrated the unhelpful maintaining role of emotional suppression (a maladaptive emotion regulation strategy) for both negative and positive emotions in depression (Beblo et al., 2012; Werner-Seidler et al., 2013). Research has also shown that depressed individuals hold a biased negative perception of their capacity to manage or cope with the experience of difficult emotions (Liu & Thompson, 2017; Rottenberg, 2017). Berking et al. (2014) conducted a longitudinal study that investigated individuals' emotion regulation skills and depression symptomatology over five years. Results showed that individuals' effective emotion regulation skills negatively predicted subsequent depressive symptom severity, such that more effective skills were associated with less severe symptoms, and deficits in skills were associated with greater symptom severity. Notably, the relationship between depression and regulation skills was found to be unidirectional: experiencing severe depressive symptoms did not predict the subsequent use of poorer emotion regulation skills, whereas the use of poor emotion regulation skills predicted depressive symptomatology.
Overall, Berking and colleagues concluded that emotion regulation skills were a predisposing factor that contributed to the development of depression. Since Berking and colleagues' study, the association between deficits in emotion regulation skills, and the development and maintenance of depressive symptoms has been further established (Aldao et al., 2010;Joormann & Stanton, 2016;Mehu & Scherer, 2015;Rottenberg, 2017;Yoon et al., 2018). Despite a growing body of research identifying an important link between depressive symptomatology and difficulties in emotion regulation, to date, no existing study has examined the DERS, full or short-form version, within a depressed sample. Therefore, an additional aim of the study reported in this paper was to add to the literature by examining the way in which the DERS and the short-forms of the DERS differentiate individuals with varying levels of depressive symptomatology.

Aims & hypotheses
The study outlined in this brief report aimed to further assess the psychometric properties and clinical utility of the Difficulties in Emotion Regulation Scale (DERS-36) and its short-forms (DERS-16, DERS-18 and DERS-SF) in a large convenience sample of undergraduate psychology students with varying levels of self-reported depressive symptoms. Specifically, this study aimed to compare the psychometric properties and clinical utility of the three short-form versions of the DERS to determine which of these is best placed to be adopted for clinical and research use. A further aim of this study was to extend the current literature by assessing the measures' discriminative ability using receiver operating characteristic (ROC) curve analyses with putative subgroups of the sample based on the severity of their self-reported depressive symptoms.
Based on previous findings, we hypothesised that the short forms of the DERS (DERS-16, DERS-18 and DERS-SF) would show evidence of reliability through good internal consistency. We also expected that good convergent validity would be observed between the DERS short forms and related measures of emotion regulation, including the Distress Tolerance Scale (DTS; Simon & Gaher, 2005), which assesses an individual's perceived capacity to tolerate distress, a construct highly related to emotion regulation (Conway et al., 2021), and the Depression Anxiety Stress Scale (DASS-21; Lovibond & Lovibond, 1995), which is considered an indicator of not only depressive symptomatology but also overall emotional distress (Burton & Abbott, 2019; Henry & Crawford, 2005).

Participants
The total sample consisted of 1049 participants (66.1% female, mean age = 19.60 years, SD = 3.88 years, range = 17 to 60 years) who were first-year psychology students at The University of Sydney choosing to participate in the study by completing a test battery of questionnaires online in exchange for course credit. The sample was comprised of participants from two previously published studies (Burton & Abbott, 2018;Burton et al., 2017). In keeping with preceding research, participants were categorised according to their DASS-21 depression scores (Veilleux et al., 2019) to create putative subgroups for the purposes of assessing the DERS discriminative ability. Using DASS-21 severity cut-offs and scoring information resulted in four groups: normal, mild, moderate, and severe to extremely severe depression (Lovibond & Lovibond, 1995). Due to the relatively small group of participants who reported experiencing an extremely severe level of depression, the extremely severe and severe groups were combined (henceforth this group is referred to as severe+ throughout).
Altogether, 575 participants self-reported "normal" levels of depressive symptoms on the DASS-21 depression subscale (60.2% female, mean age = 19.68 years, SD = 4.10 years). A further 145 participants were categorised as experiencing "mild" levels of depression (73.0% female, mean age = 19.39 years, SD = 3.05 years), 197 participants self-reported a "moderate" level of depression (73.6% female, mean age = 19.52 years, SD = 4.15 years), and the self-ratings of 132 participants fell in the "severe+" range (72.7% female, mean age = 19.61 years, SD = 3.33 years). A one-way ANOVA found no significant difference in age across the depression severity groups (normal, mild, moderate, severe+), F(3, 1045) = .25, p = .86. Chi-square tests of independence found a significant relationship between gender and severity of depression, χ²(6, N = 1049) = 29.88, p < .01, indicating a higher proportion of female participants in the clinical subgroups (mild, moderate, severe+) relative to the "normal" subgroup. Comparison of the column proportions showed that each clinical subgroup contained a greater proportion of female participants than the "normal" subgroup (60% female and 40% male), whereas the gender proportions were equivalent across the three clinical subgroups (74-75% female and 25-26% male).
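The severity grouping described above can be sketched in a few lines. This is a minimal illustration rather than the scoring procedure actually used: the function name is hypothetical, and the cut-offs are an assumption based on the conventional DASS depression severity bands (the seven item ratings are summed, doubled, and then banded), since the exact thresholds are not listed in the text.

```python
def dass21_depression_group(item_ratings):
    """Assign a severity label from seven DASS-21 depression item ratings (0-3 each).

    Assumption: the subscale sum is doubled and compared against the
    conventional DASS depression bands. Severe and extremely severe are
    merged into "severe+", mirroring the grouping described in the text.
    """
    if len(item_ratings) != 7 or any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("expected seven ratings, each between 0 and 3")
    doubled = 2 * sum(item_ratings)
    if doubled <= 9:
        return "normal"
    if doubled <= 13:
        return "mild"
    if doubled <= 20:
        return "moderate"
    return "severe+"  # conventional "severe" (21-27) and "extremely severe" (28+) combined
```

Merging the two highest bands trades granularity for group size, which matters when each subgroup is later used in between-group comparisons.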

Measures
Difficulties in Emotion Regulation Scale (DERS; Gratz & Roemer, 2004). The DERS is a 36-item self-report measure that assesses an individual's perceived ability to regulate their emotions. Items are rated on a 5-point Likert scale ranging from almost never (1) to almost always (5). The DERS consists of six subscales: lack of emotional awareness (e.g., "when I'm upset, I acknowledge my emotions"), nonacceptance of emotional responses (e.g., "when I'm upset, I become embarrassed for feeling that way"), impulse control difficulties (e.g., "when I'm upset I feel out of control"), restricted access to emotion regulation strategies (e.g., "when I'm upset I believe that wallowing in it is all I can do"), reduced emotional clarity (e.g., "I have no idea how I am feeling"), and difficulties participating in goal-directed behaviour (e.g., "when I'm upset I have difficulty focusing on other things"). There are three short-form versions of the DERS: the DERS-18, the DERS-SF and the DERS-16. Two of these short-forms contain 18 items and retain the six-factor structure of the original DERS: the DERS-18 (Victor & Klonsky, 2016) and the DERS-SF (Kaufman et al., 2016). These two short forms are modifications of the DERS and vary only in some of the specific items retained in each version, based on the items found to have the highest factor loadings in the development studies (Kaufman et al., 2016; Victor & Klonsky, 2016). Additionally, the DERS-16 (Bjureberg et al., 2016) is a 16-item, five-factor self-report measure of emotion regulation abilities, adapted from the DERS. Unlike its sister scales, the DERS-16 did not retain items from all six of the original subscales; it does not include any items from the original DERS-36's Awareness subscale. Due to its brevity, the DERS-16 provides a total score, and subscales are not calculated when scoring this measure. Across the DERS and DERS short-forms, higher scores indicate poorer ability to regulate emotions. See Table 1 for the DERS-36 items and which of them were retained in the DERS-SF, DERS-18 and DERS-16.
The Distress Tolerance Scale (DTS; Simon & Gaher, 2005). The DTS is a 15-item self-report inventory that measures an individual's perception of their capacity to tolerate distressing emotions. The DTS comprises four subscales: tolerance of negative emotions (e.g., "I can't handle feeling distressed or upset"), regulation of emotions (e.g., "when I do feel distressed or upset, I must do something about it immediately"), appraisal of capability to manage distress (e.g., "I am ashamed of myself when I feel distressed or upset"), and absorption in one's upsetting emotions (e.g., "when I feel distressed or upset, all I can think about is how bad I feel"). Items are rated using a five-point Likert scale (1 = strongly agree to 5 = strongly disagree), with higher scores signifying a greater capacity to tolerate distress. The DTS has previously demonstrated excellent internal consistency, good test-retest reliability, and validity (Simon & Gaher, 2005). Critically, research has demonstrated that the DTS captures a construct related to emotion regulation abilities, and DTS total scores have been found to correlate significantly with the original DERS (Juarascio et al., 2020). In the current study, the DTS total score demonstrated excellent internal consistency across the sample, α = .92, with each of the DTS subscales demonstrating adequate to good internal consistency, ranging from α = .75 to .82.
The Depression Anxiety Stress Scale (DASS-21; Lovibond & Lovibond, 1995). The DASS-21 is a 21-item self-report scale that assesses the number and severity of symptoms of depression (e.g., "I felt that I had nothing to look forward to"), anxiety (e.g., "I felt I was close to panic"), and stress (e.g., "I found it hard to wind down") that occurred during the previous week. Participants rate items using a 4-point Likert scale ranging from 0 "did not apply to me at all" to 3 "applied to me very much, or most of the time". Extensive research has demonstrated that the DASS-21 is a reliable screening measure for depression (Ng et al., 2007). Results on the DASS-21 have been found to correlate significantly with other measures of depression (Lovibond & Lovibond, 1995). Moreover, research has demonstrated that the DASS-21 total score also captures an individual's general psychological distress (Burton & Abbott, 2019; Henry & Crawford, 2005). In the current study, the DASS-21 demonstrated excellent internal consistency across the total sample, α = .93, and good to excellent internal consistency was demonstrated for each of the DASS-21 subscales, ranging from α = .80 to .90.

Procedure
The research was approved by The University of Sydney Human Research Ethics Committee (Project Code: 2014_082). First-year university students provided informed consent, then completed a battery of tests online using Qualtrics Survey Software, which included demographic questions, the DERS, the DASS-21 and the DTS. Specifically, participants completed the original 36-item version of the DERS once. The psychometric properties of the full version, using all 36 items, were compared to the psychometric properties of the short-form versions, which were calculated using only the items belonging to the DERS-SF, DERS-18, and DERS-16; no participant answered any item more than once. Participants were required to complete all items of each measure before moving on to the next measure. The specific instructions presented for each measure can be viewed in Table A of the supplementary files. After the demographic items were completed, participants were presented with the measures (i.e., the DERS, the DTS, the DASS-21) one at a time, and the order of presentation of the scales was randomised to reduce the potential impact of respondent fatigue.
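Deriving short-form scores from a single administration of the full form, as described above, amounts to summing a subset of each participant's item ratings. The following is a minimal sketch of that idea only: the function name, the item numbers, and the reverse-scored item are hypothetical placeholders rather than an actual DERS scoring key, and the 1 to 5 rating range is an assumption consistent with the reported maximum of 80 on 16 items.

```python
def score_items(responses, item_numbers, reverse_items=(), scale_min=1, scale_max=5):
    """Sum the ratings for a subset of items from one participant's full-form responses.

    responses: dict mapping 1-based item number -> rating.
    reverse_items: item numbers that are reverse-scored before summing.
    """
    total = 0
    for number in item_numbers:
        rating = responses[number]
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"item {number}: rating {rating} is outside the scale")
        if number in reverse_items:
            rating = scale_min + scale_max - rating
        total += rating
    return total


# Hypothetical toy example: a 6-item "full form" and a 3-item "short form".
full_form = {1: 2, 2: 5, 3: 1, 4: 4, 5: 3, 6: 2}
SHORT_FORM_ITEMS = [2, 3, 5]  # placeholder item numbers, not a real DERS key
REVERSED = {3}                # placeholder reverse-scored item

full_total = score_items(full_form, full_form.keys(), REVERSED)
short_total = score_items(full_form, SHORT_FORM_ITEMS, REVERSED)
```

Because every short-form total is computed from the same responses, differences between versions reflect item selection alone and not any effect of questionnaire length on responding.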

Analyses
SPSS, version 26 (IBM, New York, NY, USA) was used to assess the validity of the DERS and its three short-form versions. Internal consistency was examined using Cronbach's alpha, and construct and concurrent validity were assessed using Pearson's correlations. Between-group differences were evaluated using one-way analysis of variance (ANOVA). The MedCalc program (version 19.4.1, MedCalc Software, Mariakerke, Belgium) was used to evaluate discriminative ability by conducting ROC curve analyses. ROC curve analyses yield the area under the curve (AUC), which indexes how well a measure discriminates between nonclinical and clinical cases. In addition, four markers of test performance were evaluated using MedCalc: sensitivity (the proportion of clinical cases correctly identified; low sensitivity implies more false negatives), specificity (the proportion of nonclinical cases correctly identified; low specificity implies more false positives), positive predictive value (PPV; the probability that the condition is present when the test result is positive), and negative predictive value (NPV; the probability that the condition is not present when the test result is negative). Because the online data collection used the "required response" function within the Qualtrics survey platform, there were no missing data for the participants included in this study.
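The four markers of test performance follow directly from the counts in a 2 × 2 classification table. The sketch below is illustrative only and does not reproduce MedCalc's output; the function name and the counts in the example are hypothetical, and it assumes every row and column total is non-zero.

```python
def classification_metrics(true_pos, false_pos, true_neg, false_neg):
    """Compute sensitivity, specificity, PPV and NPV from 2x2 table counts.

    Assumes no margin is zero; otherwise a ZeroDivisionError is raised.
    """
    return {
        # proportion of clinical cases the test flags as positive
        "sensitivity": true_pos / (true_pos + false_neg),
        # proportion of nonclinical cases the test flags as negative
        "specificity": true_neg / (true_neg + false_pos),
        # probability the condition is present given a positive result
        "ppv": true_pos / (true_pos + false_pos),
        # probability the condition is absent given a negative result
        "npv": true_neg / (true_neg + false_neg),
    }


# Hypothetical counts, for illustration only.
example = classification_metrics(true_pos=80, false_pos=10, true_neg=90, false_neg=20)
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the condition is in the sample, so values obtained in a student sample may not carry over to a clinical setting.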

Psychometric properties
Total sample DERS-16 scores ranged from the minimum possible score of 16 to the maximum possible score of 80; 7 participants (0.67%) obtained the lowest possible score of 16 (indicating no difficulties with emotional regulation), and 1 participant (0.10%) obtained the highest score of 80 (suggesting very severe difficulties with emotional regulation). DERS-18 scores for the whole sample ranged from 18 to 80, and DERS-SF scores ranged from 18 to 79. For both the DERS-18 and the DERS-SF, 4 participants (0.38%) obtained the lowest possible score of 18 (no difficulties with emotional regulation), and no participants obtained the highest possible score of 90 (very severe difficulties with emotional regulation). All short-form versions of the DERS were very strongly correlated with the original 36-item DERS, r = .94 to .98, and with each other, r = .95 to .99 (Table B in Supplementary materials).
The means and standard deviations of the total scores for the original DERS and the three short-forms (DERS-16, DERS-18 and DERS-SF), and of the subscale scores for the DERS-18 and DERS-SF, are displayed by putative depressive symptomatology subgroup in Table 2.
Internal Consistency: Cronbach's alphas were calculated for the total scores of the DERS and the three short-forms (DERS-16, DERS-18 and DERS-SF) in the full sample, demonstrating good internal consistency, with alphas ranging from .78 to .94 (see Table 3). Cronbach's alphas were also calculated for the DERS-18 and DERS-SF subscales, with alphas ranging from .67 to .90 (see Table 4). As previously discussed, subscale scores are not calculated for the DERS-16, owing to its limited number of items per factor and because it retained only five of the six original subscales (removing Awareness). For this reason, the scale's authors recommend calculating only a total score for interpretation (Bjureberg et al., 2016).
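For readers unfamiliar with the statistic, Cronbach's alpha for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below is a minimal standard-library illustration of that formula, not the SPSS routine used in the study; the function name and toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants, each a list of k item ratings.

    Uses sample variances (n - 1 denominator). Raises ZeroDivisionError if
    every participant has the same total score.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two participants")
    if any(len(row) != k for row in rows):
        raise ValueError("every participant must answer the same number of items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: three participants, two items.
toy_alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

Alpha tends to rise with the number of items, which is one reason short subscales can show lower values than the total scores they contribute to.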

Construct Validity: Convergent validity was examined by evaluating correlations between the DERS total and subscale scores, the DERS short forms (and their subscales), and related constructs, including the DTS and DASS-21 scales. The DERS versions were significantly correlated with the DASS-21, with correlations falling in the moderate range (r = .57 to r = .67; see Table 5). Overall, the DERS and DTS total scores were significantly negatively correlated, with correlations falling in the moderate range (r = −.68 to r = −.72). Apart from the Awareness subscale, the subscales of the DERS-SF and DERS-18 were significantly negatively correlated with the subscales of the DTS (refer to Table C in Supplementary materials).

Group differences
One-way ANOVAs identified significant differences in scores between participants scoring in the normal range and those scoring in the clinical range for depression (mild, moderate, severe or above). Across all measures (DERS-16, DERS-18, DERS-SF and DERS-36), significant differences in total scores were found between groups for the varying levels of self-reported depression. Individuals in the normal subgroup scored significantly lower than participants who fell in the mild range of depression (all ps < .01), individuals reporting mild levels of depression scored significantly lower than individuals reporting moderate levels of depression (all ps < .01), and individuals reporting moderate levels of depression scored significantly lower than individuals reporting severe/extremely severe levels of depression (all ps < .01); refer to Table 2 for the group scores.
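The F statistic behind a one-way ANOVA is the ratio of between-group to within-group mean squares. The sketch below is a hedged standard-library illustration of that computation, not the SPSS procedure used here; the function name and toy data are hypothetical. With four groups and 1049 participants it would yield the same degrees of freedom, (3, 1045), as reported for the age comparison in the Participants section.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores.

    Raises ZeroDivisionError if there is no within-group variation.
    """
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: three small groups with increasing means.
toy_f, toy_df_between, toy_df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A significant omnibus F only indicates that some group means differ; the pairwise statements in the text rest on follow-up comparisons between adjacent severity groups.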

Receiver operating characteristic curve (ROC)
ROC curve analyses were conducted to evaluate the discriminative ability of the DERS. For the ROC curve analyses, two subgroups of participants based on DASS-21 Depression scores were entered: normal (n = 575) and depressed (n = 474; the depressed subsample combines the mild, moderate and severe+ subgroups). The ROC curve analysis indicated that a score of 84 or more on the DERS-36 was the criterion score (Youden Index) that best differentiated clinical (i.e., depressed) from non-clinical (i.e., "normal") participants.
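The Youden Index for a candidate cut-off is J = sensitivity + specificity − 1, and the criterion score is the cut-off that maximises J. The sketch below illustrates that selection under stated assumptions: a score at or above the cut-off counts as a positive result (matching the "84 or more" phrasing), ties are resolved in favour of the lowest cut-off, and the function name and toy scores are hypothetical. It is not MedCalc's implementation.

```python
def youden_cutoff(clinical_scores, nonclinical_scores):
    """Return (cutoff, J, sensitivity, specificity) for the cut-off maximising J.

    A score >= cutoff is treated as a positive (clinical) classification.
    """
    best = None
    for cutoff in sorted(set(clinical_scores) | set(nonclinical_scores)):
        sensitivity = sum(s >= cutoff for s in clinical_scores) / len(clinical_scores)
        specificity = sum(s < cutoff for s in nonclinical_scores) / len(nonclinical_scores)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:  # strict ">" keeps the lowest tied cut-off
            best = (cutoff, j, sensitivity, specificity)
    return best


# Toy scores, for illustration only.
toy_cutoff, toy_j, toy_sens, toy_spec = youden_cutoff([5, 6, 7, 8], [1, 2, 3, 6])
```

Because J weights sensitivity and specificity equally, a cut-off chosen this way may not suit settings where missed cases and false alarms carry different costs.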

Discussion
Given the disparity in the literature regarding the various short-form versions of the DERS and the difficulty of determining which short-form is superior in terms of psychometric validity, reliability and clinical utility, this study aimed to examine the psychometric properties of the DERS and the three short-form versions (the DERS-16, DERS-18 and DERS-SF) in an undergraduate psychology sample. Further, this study sought to extend the literature by assessing the clinical utility of these measures, examining their ability to reliably differentiate between participants who self-report more severe depressive symptoms and participants who self-report "normal" levels of depressive symptoms.
Overall, good psychometric properties were observed for all three short-form versions of the DERS with the DERS-16, DERS-18 and DERS-SF all demonstrating good internal consistency and significant correlations with related measures, the DTS and DASS-21, providing evidence of good convergent validity. Overall, the results of our study support and replicate the findings of Skutch et al. (2019) who found evidence of validity and reliability for all brief versions. Specifically, our hypotheses were supported with our results indicating good internal consistency (α > 0.70) across the whole sample for the total scores of the DERS-36 and also for the total scores of the three short-forms (DERS-16, DERS-18 and DERS-SF). These findings for internal consistency met the Terwee criteria for adequate internal consistency (Terwee et al., 2007). Additionally, the subscales of the DERS-18 and DERS-SF demonstrated adequate internal consistency within the sample.
As anticipated, scores on the DERS-36 and the three short-forms (DERS-16, DERS-18 and DERS-SF) differed significantly depending on the severity of depression, such that scores were significantly higher for individuals who self-reported more severe symptoms of depression. Moreover, significant differences in scores on these measures were also found between self-classified depression groups, whereby individuals who rated themselves as experiencing severe to extremely severe levels of depression reported higher scores on the DERS (i.e., more pronounced difficulties with emotion regulation) than individuals who rated themselves as experiencing mild or moderate levels of depressive symptoms. Overall, this indicated that individuals with high levels of self-reported depressive symptoms perceive themselves to have more severe difficulties in regulating their emotions, supporting models implicating poor emotion regulation skills in the development and maintenance of depression (Berking et al., 2014). Further, the DERS-36 and the three short-forms (DERS-16, DERS-18 and DERS-SF) were significantly correlated with a measure of general psychological distress (DASS-21 total score; Burton & Abbott, 2019; Henry & Crawford, 2005), and significantly negatively correlated with a related measure of distress tolerance, the DTS. This provides further evidence for the convergent validity of the DERS and the short-form measures (DERS-16, DERS-18 and DERS-SF) and reinforces past research, which has found a relationship between poor emotion regulation strategies, higher symptoms of psychological distress, and poor distress tolerance (Aldao et al., 2010; Berking et al., 2014; Hu et al., 2014; Joormann & Stanton, 2016; Mehu & Scherer, 2015; Rottenberg, 2017; Thompson et al., 2010; Yoon et al., 2018).
Finally, the discriminative ability of the DERS measures was assessed by ROC curve analyses with two putative subgroups of participants based on their DASS-21 depressive symptom severity. All versions of the DERS demonstrated significant AUC values, indicating that scores on these measures can be reliably used to discriminate between groups with clinical levels of symptom severity and those without.
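As an illustration of what the AUC from such a ROC analysis represents, the area under the curve equals the probability that a randomly chosen member of the higher-severity group scores above a randomly chosen member of the lower-severity group, with ties counted as one half. The sketch below computes this directly in Python. It is a simplified illustration under stated assumptions, not the procedure used in this study: the function name `auc_from_groups` and the scores are hypothetical, and it does not perform the significance testing reported above.

```python
def auc_from_groups(high_group_scores, low_group_scores):
    """Area under the ROC curve via pairwise comparison of two groups.

    Hypothetical helper for illustration. Counts, over all (high, low) pairs,
    how often the high-severity participant has the larger score; ties
    contribute 0.5. Returns a value between 0 and 1, where 0.5 indicates
    chance-level discrimination.
    """
    if not high_group_scores or not low_group_scores:
        raise ValueError("both groups must contain at least one score")

    favourable = 0.0
    for high in high_group_scores:
        for low in low_group_scores:
            if high > low:
                favourable += 1.0
            elif high == low:
                favourable += 0.5
    return favourable / (len(high_group_scores) * len(low_group_scores))


# Made-up total scores for two putative subgroups, not study data.
clinical_range = [52, 61, 47, 58]
non_clinical_range = [30, 41, 47, 35]
print(f"AUC = {auc_from_groups(clinical_range, non_clinical_range):.3f}")
```

An AUC close to 1 would indicate that scores separate the two subgroups almost perfectly, whereas a value near 0.5 would indicate no better than chance discrimination.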

Limitations and future directions
A major limitation of this study is that it did not utilize a clinical sample. In the current study, depressive symptoms in a large general undergraduate sample were assessed by self-report questionnaire, in a cross-sectional design. Future research should endeavor to assess validity (both convergent and discriminant), reliability, and discriminative ability in a population of clinically diagnosed individuals with major depressive disorder. Using a clinical sample would also greatly strengthen the understanding of the discriminative ability of the DERS short forms, as ROC cut-off scores could be investigated with individuals who have already been established to meet diagnostic criteria, thus allowing stronger conclusions to be drawn about the clinical utility of these measures. In addition, to avoid participants repeating almost identical measures multiple times, participants in this study completed only the original DERS-36. Their item responses for the short-forms were taken from the DERS-36 responses. This was done to reduce the poor response quality associated with respondent fatigue (Galesic & Bosnjak, 2009), and has been the conventional method when assessing all three short-form measures for that reason (Charak et al., 2019; Hallion et al., 2018; Skutch et al., 2019). However, given that research suggests individual responses shift based on the length of the questionnaire (Galesic & Bosnjak, 2009), it is possible that completing the full complement of items influenced responding, and that response differences may emerge if participants answered only the brief version. Thus, future studies should administer the brief versions directly, rather than deriving them from the DERS-36, to investigate these possible response differences. Finally, a further limitation is that, due to the cross-sectional nature of the study and the convenience sample, test-retest reliability and treatment sensitivity within a depressed sample could not be assessed. This should be addressed in future research.

Conclusions
The results of this study provide further evidence for the exemplary psychometric properties and clinical utility of the well-established DERS-36, and extend the literature by reporting on and comparing the psychometric properties and clinical utility of the three competing short-form measures of the DERS-36, namely the DERS-16, the DERS-18 and the DERS-SF, within a large sample.
The findings of this study extend existing research by validating the DERS-16, DERS-18 and the DERS-SF through assessment of their convergent validity with other related measures (DTS), and by assessing the discriminative validity and clinical utility of the DERS and the three short-form measures (DERS-16, DERS-18 and DERS-SF) using putative subgroups based on depressive symptom severity. All versions of the measure performed well and demonstrated validity and reliability, with the DERS-36 (full original measure) and the DERS-16 demonstrating marginally superior internal consistency when compared with the DERS-18 and DERS-SF (likely due to the absence of items from the historically problematic "awareness" subscale). There were no discernible differences between the DERS-18 and the DERS-SF; these two measures contain very similar items and the same subscales, and both provide a suitable brief version of the DERS-36.
In summary, should the reader wish to use a comprehensive measure of difficulties in emotion regulation, the full-scale original DERS-36 provides a well-validated measure of this construct (including a measure of awareness of one's own emotions). However, should the reader be seeking a valid, brief and clinically useful measure of difficulties in emotion regulation, the results of our study indicate that the DERS-16 should be selected.