Psychometric properties of the Trauma and Distress Scale, TADS, in an adult community sample in Finland

Background There is increasing evidence that a history of childhood abuse and neglect is not uncommon among individuals who experience mental disorder and that childhood trauma experiences are associated with adult psychopathology. Although several interview and self-report instruments for retrospective trauma assessment have been developed, many focus on sexual abuse (SexAb) rather than on multiple types of trauma or adversity.
Methods Within the European Prediction of Psychosis Study, the Trauma and Distress Scale (TADS) was developed as a new self-report assessment of multiple types of childhood trauma and distressing experiences. The TADS includes 43 items and, following previous measures including the Childhood Trauma Questionnaire, focuses on five core domains: emotional neglect (EmoNeg), emotional abuse (EmoAb), physical neglect (PhyNeg), physical abuse (PhyAb), and SexAb. This study explores the psychometric properties of the TADS (internal consistency and concurrent validity) in 692 participants drawn from the general population who completed a mailed questionnaire, including the TADS, a depression self-report, and questions on help-seeking for mental health problems. Inter-method reliability was examined in a random sample of 100 responders who were reassessed in telephone interviews.
Results After minor revisions of PhyNeg and PhyAb, internal consistencies were good for TADS totals and the domain raw score sums. Intra-class coefficients for the TADS total score and the five revised core domains were all good to excellent when compared to the interviewed TADS as a gold standard. In the concurrent validity analyses, the total TADS and all its core domains were significantly associated with depression and help-seeking for mental problems as proxy measures for traumatisation. In addition, robust cutoffs for the total TADS and its domains were calculated.
Conclusions Our results suggest that the TADS is a valid, reliable, and clinically useful instrument for assessing retrospectively reported childhood traumatisation.

Within the European Prediction of Psychosis Study (EPOS; Klosterkötter et al., 2005), Patterson et al. (2002) developed a new self-report instrument, the Trauma And Distress Scale (TADS), to enable the assessment of a range of adverse childhood experiences in patients at clinical high risk of psychosis. Items for the TADS were initially selected from a comparison of several scales for the assessment of traumatic, adverse, and distressing childhood events or experiences, including the CTQ (Bernstein et al., 1994) and the Child Abuse & Trauma Scale (Sanders & Becker-Lausen, 1995). Additional items were gathered from a review of common childhood adversity-related issues reported by clinical staff treating individuals in youth and adult mental health services in EPOS project centres. The aim was to agree a checklist of items describing core domains of childhood adversity, and for the scale to be feasible in both self-report and interview formats for working with high-risk clinical samples and additional comparative populations. To ensure adequate content validity and psychometric consistency (Michel, Pace, Edun, Sawhney, & Thomas, 2014; Streiner, 1993), frequency ratings employing a five-point Likert scale focused on the five core domains: emotional neglect (EmoNeg) and emotional abuse (EmoAb), physical neglect by parents/caregivers (PhyNeg), physical abuse (PhyAb), and sexual abuse (SexAb) by non-specified offenders. Other items assess loss events, discrimination, bullying, and guilt, and two items represent a "lie scale." To examine other important psychometric properties of the TADS (Streiner, 1993), we examined (1) inter-method reliability between the self-rated and interviewed TADS and (2) internal consistency of the five TADS trauma domain sub-scales as a measure of reliability.
In addition, employing "level of depression" and "help-seeking for mental health difficulties" as proxy measures for traumatisation in the broadest sense, we examined the concurrent validity of the TADS and developed domain-specific cutoffs, whilst also considering the impact of potentially confounding conditions such as age, gender, and education.

Methods
The ethical committee of the University of Turku and the Turku University Central Hospital approved the study protocol.

Sample
A random, age-stratified sample of 2,080 citizens aged 18 years or more was drawn from the general population of the Varsinais-Suomi Health District of South-West Finland. The general sampling rate was 1/100, and, because of their low proportion in the population, 2/100 for people over 70 years. An extensive questionnaire battery was mailed in spring 2008 and re-mailed to non-responders in summer 2008. In the first round 545 (26.2%) and in the second round 147 (7.1%) subjects responded; thus, one-third (N = 692, 33.3%) of the sample returned the completed questionnaire. The response rate for females (41.5%) was higher than that for males (25.3%; Fisher exact: p < 0.001). Mean age of responders (42.0 ± 16.95 years) was slightly higher than that of non-responders (39.5 ± 16.37 years; p = 0.001).
In addition, a random sample of 100 responders was contacted, and items from the TADS were reassessed in a semi-structured telephone interview. The interviewers, three medical students, were blind to the questionnaire responses from the earlier completed TADS. The time period from return of the completed questionnaire to interview ranged from 2 to 4 weeks.

Assessments
The questionnaire battery included items on participants' socio-demographic background and prior help-seeking for mental health problems from a psychiatric service as well as the TADS. Originally, the TADS was developed in English (Birmingham). Three other EPOS centres (Cologne, Amsterdam, and Turku) translated it into their own native languages. In Turku, the TADS was translated into Finnish by a member of our study group (MH) and back-translated by a professional translator of English. Since the TADS was subsequently available in Finnish and as there was no other existing Finnish scale fit for the purpose of measuring childhood adversity, several research groups began to use it in various populations. Initially, we selected a general population sample for the assessment of the TADS's basic psychometric properties. We also plan to evaluate its properties in clinical samples.
The questionnaire battery also included the depression screening instrument DEPS (Salokangas, Poutanen, & Stengård, 1995), consisting of 10 questions rated on a Likert scale (0 = "not at all," 1 = "to some extent," 2 = "rather much," and 3 = "very much"); their sum indicates the number of depressive symptoms during the past month. In a sample of patients attending primary care (Salokangas et al., 1995), at a cutoff of >8, the DEPS showed a sensitivity of 74% and a specificity of 85% for clinical depression.
Data on previous psychiatric treatment (help-seeking) and DEPS were available from all but three of the 692 subjects.
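The DEPS scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' scoring procedure: the function names are hypothetical, and the cutoff is read here as a sum score above 8.

```python
from typing import Sequence

DEPS_ITEMS = 10     # number of DEPS questions (from the text)
DEPS_MAX_ITEM = 3   # each question is rated 0..3
DEPS_CUTOFF = 8     # assumption: a sum score above 8 is screen-positive


def deps_sum(ratings: Sequence[int]) -> int:
    """Sum the ten DEPS item ratings after basic range checks."""
    if len(ratings) != DEPS_ITEMS:
        raise ValueError("DEPS expects exactly 10 item ratings")
    if any(r < 0 or r > DEPS_MAX_ITEM for r in ratings):
        raise ValueError("each DEPS rating must lie between 0 and 3")
    return sum(ratings)


def deps_positive(ratings: Sequence[int]) -> bool:
    """Return True when the sum score exceeds the cutoff."""
    return deps_sum(ratings) > DEPS_CUTOFF
```

With these assumptions, a respondent rating every question "to some extent" (sum 10) screens positive, whereas a sum of exactly 8 does not.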

Statistical analyses
Data were analysed using the Statistical Package for the Social Sciences (SPSS) v22.0. To calculate the inter-method reliability between self-report and interview, intra-class correlation coefficients (ICC) were calculated for the raw score of each TADS item. In addition, each TADS item was dichotomised [0 = 0 ("never") to 1 ("rarely"), and 1 = 2 ("sometimes") to 4 ("almost always")], reversed for negatively phrased items. Agreement on the presence of adverse childhood experiences across questionnaire and interview was calculated by the overall concordance rate (CR) and additionally by Cohen's kappa (κ) statistic to control for the effect of chance. ICC values of less than 0.40 indicate poor, 0.40–0.59 fair, 0.60–0.74 good, and 0.75–1.0 excellent agreement (Cicchetti, 1994). According to Burn, Pitchard, and Whay (2009), κ ≥ 0.40 and CR ≥ 75% are considered clinically useful. A disadvantage generally associated with the use of κ is its dependence on the prevalence of an event (Byrt, Bishop, & Carlin, 1993); κ tends to decrease when a response/event is rare, even if the CR is high. In the absence of a satisfactory mathematical solution to this problem, we followed the approach for the appraisal of κ suggested by Burn and Weir (2011) and additionally calculated the prevalence index (PI) when information was contradictory, that is, when CR exceeded 75% but κ fell below 0.40. The PI takes values between −1 and 1, and is 0 when both responses are equally probable (i.e., their prevalence is 50%). As the PI approaches |1|, the likelihood of an underestimation of κ increases, and more attention should be paid to CR.
With regard to the five core domains, both raw (range 0–20) and dichotomised (range 0–5) scores of their respective items were summed as a measure of severity of trauma and adversity in each domain, and ICCs were calculated. Following this, the domain severity scores were again dichotomised (0 = 0; 1 = 1–5) as an indicator of persons ("cases") who rated ≥2 ("sometimes") in ≥1 items of the respective domain and, thus, had suffered from some childhood adversity in this respect. To calculate the inter-method reliability of this binary score, CR, κ, and PI were calculated again.
To examine the internal consistency of domains, Cronbach's alphas (α) were calculated for sum scores of both original raw items and dichotomised items of the domain. For the evaluation of α the following rules were applied: >0.90 = excellent, 0.80–0.89 = good, 0.70–0.79 = acceptable, 0.60–0.69 = questionable, and ≤0.59 = poor (George & Mallery, 2003).
Using current depression (DEPS > 8) and help-seeking for mental problems as proxy measures of adverse experiences, we examined the concurrent validity of the TADS by cross-tabulating each of these two proxies with TADS domain "cases," and diagnostic accuracy measures (sensitivity, specificity, positive and negative predictive values, positive and negative likelihood ratios (LRs)) were then calculated for TADS domain "cases." LRs are particularly suited to guiding the estimation of concurrent validity because interpretation guidelines are available for them (Jaeschke, Guyatt, & Sackett, 1994), whereas the other accuracy measures can only be interpreted by less reliable rules-of-thumb (Boyko, 1994; Jaeschke et al., 1994).
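The three agreement statistics for a dichotomised item can be illustrated with a short sketch. The paper does not spell out the formulas, so this assumes the usual 2×2 definitions: CR as the proportion of identical ratings, Cohen's kappa as chance-corrected agreement, and the prevalence index in the form (a − d)/N, where a and d are the counts of joint "present" and joint "absent" ratings. The function name and the example counts are hypothetical.

```python
from typing import Dict, Sequence


def agreement_stats(self_report: Sequence[int],
                    interview: Sequence[int]) -> Dict[str, float]:
    """Concordance rate, Cohen's kappa and prevalence index for binary ratings."""
    if len(self_report) != len(interview) or len(self_report) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    pairs = list(zip(self_report, interview))
    n = len(pairs)
    a = sum(1 for s, i in pairs if s == 1 and i == 1)  # both "present"
    b = sum(1 for s, i in pairs if s == 1 and i == 0)  # self-report only
    c = sum(1 for s, i in pairs if s == 0 and i == 1)  # interview only
    d = sum(1 for s, i in pairs if s == 0 and i == 0)  # both "absent"
    observed = (a + d) / n                              # concordance rate
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    # Kappa is undefined when chance agreement is already perfect.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    prevalence_index = (a - d) / n                      # assumed Byrt-style definition
    return {"CR": observed, "kappa": kappa, "PI": prevalence_index}


# Hypothetical counts for 100 paired ratings: a=20, b=5, c=10, d=65.
self_ratings = [1] * 20 + [1] * 5 + [0] * 10 + [0] * 65
interview_ratings = [1] * 20 + [0] * 5 + [1] * 10 + [0] * 65
stats = agreement_stats(self_ratings, interview_ratings)
```

In this made-up example the CR is 0.85, kappa is 0.625, and the PI is −0.45, so both clinical-utility thresholds would be met and the PI would not need to be consulted.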
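The domain scoring steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' scoring syntax: items are taken to be rated 0 to 4, negatively phrased items are assumed to be reversed as 4 minus the raw rating, and the function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Union

MAX_RATING = 4  # items are rated 0 ("never") to 4 ("almost always")


def dichotomise(rating: int) -> int:
    """Map 0-1 to 0 and 2-4 to 1 ("sometimes" or more often)."""
    return 1 if rating >= 2 else 0


def score_domain(raw_items: Sequence[int],
                 reverse: Optional[Sequence[bool]] = None
                 ) -> Dict[str, Union[int, bool]]:
    """Return raw sum, dichotomised sum and caseness for one domain."""
    flags = list(reverse) if reverse is not None else [False] * len(raw_items)
    if len(flags) != len(raw_items):
        raise ValueError("one reverse flag is needed per item")
    if any(r < 0 or r > MAX_RATING for r in raw_items):
        raise ValueError("ratings must lie between 0 and 4")
    # Assumption: reversal of a negatively phrased item is 4 - raw rating.
    adjusted = [MAX_RATING - r if f else r for r, f in zip(raw_items, flags)]
    dich_sum = sum(dichotomise(r) for r in adjusted)
    return {
        "raw_sum": sum(adjusted),   # 0-20 for a five-item domain
        "dich_sum": dich_sum,       # 0-5 for a five-item domain
        "case": dich_sum >= 1,      # at least one item rated >= 2
    }
```

For example, five ratings of 0, 1, 2, 0, and 4 give a raw sum of 7, a dichotomised sum of 2, and a positive "case" indicator.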
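As a worked illustration of the internal consistency calculation, the sketch below computes Cronbach's alpha from a respondents-by-items matrix using the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of the sum score). The paper used SPSS; this Python version, its function names, and the handling of the 0.90 boundary (treated as "excellent" after rounding to two decimals) are assumptions for illustration only.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's item scores."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(rows[0])
    if any(len(r) != k for r in rows):
        raise ValueError("all respondents must have the same number of items")
    item_columns: List[tuple] = list(zip(*rows))
    item_vars = [variance(col) for col in item_columns]  # sample variances
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("sum scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_label(alpha: float) -> str:
    """Verbal band for alpha, following the rules quoted in the text."""
    a = round(alpha, 2)
    if a >= 0.90:  # assumption: 0.90 itself counts as excellent
        return "excellent"
    if a >= 0.80:
        return "good"
    if a >= 0.70:
        return "acceptable"
    if a >= 0.60:
        return "questionable"
    return "poor"
```

Under these rules, the values of 0.64 and 0.79 reported below for the revised PhyNeg and PhyAb domains fall into the "questionable" and "acceptable" bands, respectively.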
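The diagnostic accuracy measures named above all derive from a 2×2 cross-tabulation of TADS caseness against a proxy measure. The following sketch shows the standard definitions; the class name, function name, and example counts are hypothetical and are not taken from the study's tables.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    lr_positive: float
    lr_negative: float


def _ratio(numerator: float, denominator: float) -> float:
    """Division that yields NaN or infinity instead of raising on zero."""
    if denominator == 0:
        return float("nan") if numerator == 0 else float("inf")
    return numerator / denominator


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> Accuracy:
    """Accuracy measures from true/false positives and negatives."""
    sens = _ratio(tp, tp + fn)   # cases among proxy-positives
    spec = _ratio(tn, tn + fp)   # non-cases among proxy-negatives
    return Accuracy(
        sensitivity=sens,
        specificity=spec,
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        lr_positive=_ratio(sens, 1 - spec),
        lr_negative=_ratio(1 - sens, spec),
    )


# Hypothetical cross-tabulation: 40 TP, 30 FP, 10 FN, 120 TN.
example = diagnostic_accuracy(tp=40, fp=30, fn=10, tn=120)
```

With these made-up counts, sensitivity and specificity are both 0.80, giving a positive LR of about 4 and a negative LR of about 0.25.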

Results
Distribution and frequency of items and core domains
Frequencies of individual items are shown in Table 1 and descriptive statistics for TADS domain scores in Table 2.
Over 70% of the general population subjects reported that they had experienced abuse or neglect at least sometimes (Table 2). Approximately 50% of the sample reported emotional and physical neglect, with the median score for EmoNeg (4) being twice as high as that for PhyNeg (2). Abuse was less frequent, with over 37% reporting EmoAb and 23% PhyAb at a level of "sometimes" or more frequently (Table 2). Experience of SexAb was reported by 5.5% (Table 2), mostly by indicating that they were touched or had to touch someone else in a sexual way in their childhood (item 22: 4.1%, Table 1). The least frequent item, with 1.6% endorsement of "sometimes" or more, was from SexAb (item 25) and referred to being forced as a child to keep SexAb a secret.

Internal consistency of the TADS and its five core domains
Internal consistency of the total TADS score of the five domains was 0.92 for the sum of original raw items and 0.89 for the sum of dichotomised items. Corresponding figures for the total TADS sum score of all 43 items were 0.94 and 0.92. Internal consistencies of the five domains, indicated by Cronbach's α and calculated for original raw items and for dichotomised items, were generally better for original raw items (Table 2). While internal consistency was good for EmoNeg, EmoAb, and SexAb, and acceptable for PhyAb, it was questionable for PhyNeg. When the two items with poor inter-method reliability of raw scores (Table 1) were excluded from PhyNeg (item 2) and PhyAb (item 17), respectively, internal consistency improved to 0.64 and 0.78 for original raw items and 0.60 and 0.72 for dichotomised items, respectively. When item 17 was replaced by item 42 (I was afraid of someone in my family), internal consistency of PhyAb was acceptable, with Cronbach's α = 0.79 for raw items and 0.73 for dichotomised items. Consequently, in further analyses of inter-method reliability and concurrent validity as well as normative data, the revised domains were used, that is, PhyNegR without item 2 and PhyAbR including item 42 instead of item 17.

Inter-method reliability of items and core domains
As illustrated in Fig. 1, the mean scores of self-reported and interview-assessed original TADS items were almost identical. In line with this, inter-method reliability values of items in terms of both raw (ICC) and dichotomised scores (CR and κ) were good to excellent, the only exceptions being items 2, 17, and 36 (Table 1). For item 2 (When I was young, I was often hungry), both the ICC (0.54) for the raw score and κ (0.36) for the dichotomised score were below acceptable levels. In addition, as the PI was −0.090, the CR of 79% cannot be regarded as providing a better estimate of the inter-method reliability of the binary score; consequently, item 2 has to be regarded as having insufficient inter-method reliability. The same must be assumed for item 17 (Adults noticed cuts, bruises or marks from when I was beaten) and for item 36 (I have experienced harassment/persecution from other ethnic groups); both showed excellent CRs but only moderate ICCs and κs, which had to be given priority in light of low PIs (Table 1). With regard to items 23, 31, 34, 38, and 39, good to excellent ICCs of raw scores indicated that these possess better inter-method reliability than their dichotomised versions, for which κs were insufficient at low PIs (Table 1). Compared to the interview, the self-rating of raw scores tended to give an overestimation in the case of items 2, 23, and 39 and an underestimation only for item 38, while no clear tendency could be detected for items 17, 31, 34, and 36 (Fig. 1).
As regards the five revised core domains, ICCs of totals of both raw and dichotomised scores were all good to excellent (Table 2). Furthermore, all five domains appeared to hold some clinical utility for indicating the presence of any respective adversity when compared alongside the gold standard of an interview assessment (Table 2). This did not hold for either of the TADS totals (Table 2), however, for which the presence of any adversity was overestimated by comparison.

Concurrent validity of the TADS and its core domains
To study the criterion validity in terms of the concurrent validity, we used presence of depression (DEPS score > 8) and help-seeking from mental health services, respectively, as proxy measures of traumatisation in terms of a negative impact on mental health. In total, 135 (19.6%) subjects scored > 8 in the DEPS and 187 (27.1%) had sought help from a mental health service at some time in their life. For participants who had affirmed at least some experience of childhood adversity in the TADS domains and totals, depression and help-seeking were significantly more frequent at effect sizes of 0.18–0.33 and 0.14–0.30, respectively (Table 3), thus indicating good criterion validity of the TADS and all of its domains. Consistently, the effects of EmoNeg and EmoAb on the proxy measures "current depression" and "help-seeking" were strongest, while the effect of PhyNeg was weakest (Table 3).
Raimo K. R. Salokangas et al.
Table 1. TADS items: original (i.e., unrevised) score frequencies (in %), proportion of item scores ≥2 in the general population sample (N = 692), and inter-method reliability of self-rating of "≥2" to gold-standard interview assessment (N = 100)

Normative data
With regard to potential confounders of normative data, that is, age, gender, and years of education, differential effects on the TADS domains or total caseness were detected for gender and education, while effects of age were unsystematic and did not allow examination of cutoff markers (Table 4). An effect of gender was detected for EmoAb and SexAb in favour of men, who reported a lower presence of any adversity in these domains, as indicated by standardised residuals (SR) below −1.96, while EmoNeg, PhyNegR, and PhyAbR as well as the TADS domain total were negatively associated with years of education (Table 4). Marital status, likely confounded by other variables and thus not considered separately for normative data, suggested evidence of a greater likelihood of any kind of abuse in separated, divorced, or widowed participants (Table 4).
In general, diagnostic accuracy measures of binary TADS caseness gave comparable figures for both proxy measures (Table 5). As expected, sensitivity for the total of TADS domains and for the total TADS scale was very high, but specificity was low, especially for the total scale. Totals for TADS domains demonstrated high sensitivity but lower specificity for depressiveness. Among the TADS domains, SexAb showed low sensitivity but high specificity for both depressiveness and help-seeking and a moderate positive LR for depressiveness. PhyAbR also showed quite low sensitivity but high specificity. For the other TADS domains, sensitivity and specificity figures were relatively balanced.

Discussion
Within the EPOS project, the TADS was developed as a brief self-report scale of childhood adversity and trauma covering several core domains as well as tapping into other aspects of a broad concept of adversity (Thabrew et al., 2012). Employing a large general population sample, the current study examined major psychometric properties of the TADS and possible normative data, which are often lacking for similar measures (Burgermeister, 2007; Pietrini et al., 2010; Thabrew et al., 2012).

Reliability: internal consistency
With the exception of the PhyNeg sub-scale, which displayed only borderline internal consistency even after revision, all other trauma subscales exhibited acceptable or excellent internal consistency, indicating that the TADS and its subscales reliably assess the target construct of retrospective "childhood trauma."

Inter-method reliability
Inter-method reliability as measured between self-reported and interview-reported trauma scores was sufficiently high for individual items, subscales, and TADS totals, with no indication of a general bias towards either under- or over-reporting. There was, however, some indication of better inter-method reliability for raw score based subscale and TADS totals compared to dichotomised scores (those having any one included item with a frequency of at least "sometimes"). While the ICCs of all raw score sums indicated excellent agreement, κ values of dichotomised domains and totals were poorer and fell below the threshold for clinical utility for totals.
Inter-method reliability was poor overall for three items (2, 17, and 36), two of which had originally been part of the physical neglect and abuse domains, respectively, and negatively affected their internal consistency. These were removed from the respective domains and, in the case of PhyAbR, replaced by an item with excellent inter-method reliability. A further five items (23, 31, 34, 38, and 39) possessed better inter-method reliability for their raw scores compared with dichotomised scores. Finally, EmoNeg appeared to be over-reported by self-report compared to interview; hence, self-reports should be treated with some caution for this scale.

Concurrent validity and normative data
Childhood adversity has frequently been associated with adult mental disorder, particularly depression (e.g., Fryers & Brugha, 2013; Kessler et al., 2010; Lindert et al., 2014), and so the DEPS screen-positive cases and help-seeking for mental problems were used as proxy measures of the construct "traumatisation" in examining the TADS's concurrent validity and generation of norms. Effect sizes indicated small to moderate associations between proxy measures and TADS categories (caseness) that appeared to be quite robust. Because the relationship between childhood trauma and adult mental ill health is complex and significantly mediated by many interacting factors (Fryers & Brugha, 2013; Kessler et al., 2010), the small-to-moderate effect sizes suggest good concurrent validity of the TADS.
Using the same two proxies of clinically significant prior adversity as markers, TADS trauma domains and totals were assessed for their diagnostic relevance. We additionally examined the influence of age, gender, and education to see if we could improve the population fit using different demographic norms. Education particularly seemed to relate to TADS scores, and the inverse association appears supportive of research linking childhood adversities to impaired physical brain development (Bick & Nelson, 2016) as well as the impact on education (Font & Maguire-Jack, 2016) and studies linking education to poly-victimisation (e.g., Barker, Kerr, Dong, Wood & DeBeck, 2015; Horan & Widom, 2015; Min, Farkas, Minnes, & Singer, 2007). Depression and help-seeking status also enabled an approximate comparison of diagnostic accuracy measures for TADS domain and total caseness. As expected, the total TADS (43 items) scale had very low specificity for proxy measures and therefore may not be suitable for detecting early traumatisation. For depressiveness specifically, the total of TADS domains also demonstrated low specificity, but specificity was higher for help-seeking, which is likely to be an indicator of a much wider range of psychiatric symptoms or disorders and thus indicates the instrument's clinical utility. Because of the low reported frequency of sexual abuse events, sensitivity for SexAb remained low, but its high specificity and moderate positive LR for depressiveness support the view that childhood sexual abuse is specifically related to clinical depression in adulthood (Lindert et al., 2014). However, with regard to specificity, positive predictive value (PPV), and LRs in particular, the limited nature of depression and help-seeking as proxy measures of "traumatisation" in a general population sample has to be kept in mind.

Strengths and limitations
In addition to the good psychometric properties of the TADS indicated by the present results, some further strengths as well as limitations should be discussed. While the TADS data presented are from a large adult general population and primary care samples with broad age ranges, the high level of non-responders may limit the representativeness of results and might have biased the reporting of childhood adversity, depression, and help-seeking. Females and young adults were particularly over-represented among subjects. In addition, it must be noted that the Finnish population is demographically very homogeneous (97% spoke the official native language Finnish/Swedish) and the proportion of non-Caucasian people is very low (under 1%). This fact clearly limits generalisation of the results to other countries with more multicultural populations. Altogether 95% of participants affirmed at least one TADS item as having occurred "sometimes" or more often. Most frequent were reports of childhood and family "being perfect" and "the greatest ever," respectively, of doing well at school, having trusted friends, and of having experienced the loss of an important person. None of these items are part of the five core domains, and, consequently, when only 24 domain items were considered, the hit rate reduced considerably. Figures for emotional (51.2%) and physical (49.7%) neglect and for emotional (37.4%) and physical (23.1%) abuse were considerably higher than in some other studies (Barbosa et al., 2014; Christoffersen, Armour, Lasgaard, Andersen, & Elklit, 2013; Kessler et al., 2010; Saed et al., 2013; Schüssler-Fiorenza Rose et al., 2014), while prevalence of SexAb (5.5%) was as frequent as in German (Iffland et al., 2013) and Brazilian population samples (Barbosa et al., 2014) but lower than in the Boston area study (Chiu et al., 2013) and higher than in the WHO study (Kessler et al., 2010).
However, the use of different instruments and definitions of adversity impedes direct comparisons between separate studies (Burgermeister, 2007; Thabrew et al., 2012). For example, in the ACE study (Schüssler-Fiorenza Rose et al., 2014), the emotional abuse category included only one item with a description of adverse events (prevalence: 34%), while the TADS, for which the five emotional abuse items included also milder events, reported 51% prevalence. It is also possible that recent public and media discussions on childhood adverse experiences in Finnish society have increased reporting of milder adverse events or experiences. While the high level of prior adversity reported by this sample may indicate a questionnaire return bias, the rates of reported depression scores according to the DEPS of 20% and that of lifetime help-seeking of 27% are in line with the previous prevalence reports of mild-to-moderate self-reported depression symptoms of 14% in adults of the Finnish community sample (Koivumaa-Honkanen, Kaprio, Honkanen, Viinamäki, & Koskenvuo, 2004) and of help-seeking for mental health problems of 23% in young adults of a Swiss community sample (Schultze-Lutter, Michel, Ruhrmann, & Schimmelmann, 2014). Thus, it is unlikely that a return bias towards more distressed individuals has driven the high reported rate of childhood adversity.
One limitation that is inherent to the construct of childhood adversity and trauma, and consequently relates to all similar studies, is the lack of a "gold standard" measure for the retrospective assessment of the complex construct "traumatisation." Thus, when comparing the concurrent validity of adversity and trauma assessments, much depends on the quality of proxy measures of the construct. Based on consistent reports of a causal link between childhood adversities, traumatisation, and adult mental ill health (e.g., Fryers & Brugha, 2013; Kessler et al., 2010; Lindert et al., 2014), we chose a self-report measure of current depression and a report of lifetime help-seeking for mental disorders that, despite their differing time frames of reference, led to impressively similar results. The proxy measures were limited in that only current depressiveness was assessed, thus potentially excluding any earlier depressive episodes, and in that help-seeking only from psychiatric services was assessed, yet help-seeking can involve other providers such as primary health services, or indeed help might not be sought at all (Kaskeala, Sillanmäki, & Sourander, 2015). Future evaluations of the TADS might usefully employ measures of hypothalamic–pituitary–adrenal (HPA) axis dysregulation as a neurobiological marker and proxy measure of childhood traumatisation. The HPA axis is a major part of the neuroendocrine system that controls reactions to stress, is involved in the neurobiology of many mental disorders (Baumeister, Lightman, & Pariante, 2014), and is permanently modulated by early life stressors (Macrì, Zoratto, & Laviola, 2011).
While exposure to mild or moderate stressors early in life has been shown to enhance HPA regulation and promote a lifelong resilience to stress, early-life exposure to extreme or prolonged stressors can induce a hyper- or hypo-reactive HPA axis and may contribute to lifelong vulnerability to stress (Flinn, Nepomnaschy, Muehlenbein, & Ponzi, 2011; Hinkelmann et al., 2013). Similarly, future studies of the concurrent validity of assessments of childhood trauma could also consider employing cortisol (in particular hair cortisol, which reflects cumulative cortisol levels over long periods of time) as a measure of potential HPA axis dysfunction and thus a neurobiological proxy of traumatisation (Hostinar & Gunnar, 2013).

Conclusions and outlook
In relation to measuring the important role that early-life traumatisation plays in the development of adult mental health problems and disorders, many instruments have been developed based on face validity, with relatively few reporting psychometric properties (Burgermeister, 2007; Pietrini et al., 2010; Thabrew et al., 2012). Regarding the TADS and its five revised sub-scale domains, our results indicate good psychometric properties in terms of internal consistency, content validity, inter-method reliability, and concurrent validity for adults from a Finnish community sample. These findings require replication, and our suggested cutoff markers for clinical significance and traumatisation, respectively, will need validation with independent samples such as clinical populations or in other regions, employing different proxy measures of traumatisation (including neurobiological ones) and prospective studies.
In addition, the test–retest reliability of the TADS and its applicability to younger samples should be reported. As regards the TADS's utility, it seems possible to improve this while retaining good content validity in terms of the five core domains of childhood trauma by employing only 24 of the measure's items.
Overall, the TADS appears to be a useful instrument for the assessment of retrospectively reported childhood adversity and trauma beyond the contextual framework of its original development for the prediction of psychosis in clinical high-risk samples.

Conflict of interest and funding
There is no conflict of interest in the present study for any of the authors. The study was funded by Turku University Central Hospital (EVO funding). Dr. Patterson was partly funded by the National Institute for Health Research (NIHR) through the Collaborations for Leadership in Applied Health Research and Care in the West Midlands (CLAHRC-WM). Professor Salokangas was partly funded by Oy H. Lundbeck Ab, Finland.