Psychometric properties of the International Trauma Questionnaire (ITQ) examined in a Norwegian trauma-exposed clinical sample

ABSTRACT Background The International Trauma Questionnaire (ITQ) is a self-report measure for post-traumatic stress disorder (PTSD) and complex post-traumatic stress disorder (CPTSD), corresponding to the diagnostic criteria in the International Classification of Diseases, 11th Revision (ICD-11). A 12-item version of the ITQ based on samples from English-speaking countries has been presented, and the wider generalizability to other languages needs to be examined. Objective The current study examines the psychometric properties of scores from a longer, preliminary 22-item version of the ITQ and the current reduced 12-item version by means of generalizability theory (G-theory) and confirmatory factor analysis (CFA). Method The 22-item version of the ITQ was translated into Norwegian and administered to patients in two trauma treatment trials (total N = 202). A generalizability study was used to investigate the psychometric properties of scores reflecting CPTSD. G-theory was also used to investigate alternative measurement designs to optimize the sufficient number of items that provide acceptable generalizability and dependability of scores. Model fit to the theoretical factor structure was then examined by CFA, both for the 22-item version and for the 12-item version of the ITQ. Results The two subscales negative self-concept and relational disturbances had acceptable generalizability coefficients. We found substantial measurement error related to affective dysregulation, mainly attributable to affective hyperactivation. A latent factor structure model with two separate affective dysregulation factors: hyperactivation and deactivation, represented the data well in the 22-item version. The proposed confirmatory structure model for the 12-item short form did not converge in the CFA. Conclusion This study supports the applicability of the ITQ in a non-English-speaking country and provides support for the validity of the Norwegian translation. 
Further research is needed to improve the psychometric properties of the affective dysregulation subscale.


Introduction
The International Classification of Diseases, 11th Revision (ICD-11) working group on disorders specifically associated with stress distinguishes complex post-traumatic stress disorder (CPTSD) from post-traumatic stress disorder (PTSD) (World Health Organization, 2019). PTSD and CPTSD share the gate criterion of exposure to one or more potentially traumatic events. PTSD is defined by the presence of re-experiencing symptoms (Re), such as intrusive nightmares and flashbacks of the event, avoidance (Av) of internal and external stimuli associated with the trauma, and a sense of current threat (Th). Alongside these symptoms, persons with CPTSD also suffer from problems with affect dysregulation (AD), negative self-concept (NSC) and disturbances in relationships (DR) related to their trauma. These last three symptom clusters are jointly referred to as disturbances in self-organisation (DSO) (Maercker et al., 2013).
The term 'complex PTSD' was coined by Judith Herman (1992) to fit the varying symptoms and difficulties encountered by survivors of repeated and prolonged interpersonal trauma. Studies have found that both childhood abuse and war captivity in adulthood are linked to an increased risk for CPTSD (Zerach, Shevlin, Cloitre, & Solomon, 2019).
Further research on CPTSD, its prevalence and treatment implications, rests on the availability of valid and reliable diagnostic instruments to assess and differentiate PTSD and CPTSD in various languages and cultural contexts. The International Trauma Questionnaire (ITQ) has been developed to be an accessible and usable self-report measure. The ITQ is available in a number of translations (e.g. Bondjers & Arnberg, 2015; Ho et al., 2019; Kazlauskas, Gegieckaite, Hyland, Zelviene, & Cloitre, 2018; Vallières et al., 2018).
The instrument is constructed to reflect two overarching constructs, PTSD and DSO. Within PTSD and DSO, items are nested in six subordinate symptom clusters. Studies of the ITQ have found the internal reliability of the six ITQ symptom clusters to be acceptable in both clinical (Karatzias et al., 2016) and non-clinical samples (Ben-Ezra et al., 2018; Cloitre et al., 2018; Ho et al., 2019). In factor analytic studies, two structural representations of the ITQ have repeatedly gained support. A correlated six-factor (Re, Av, Th, AD, NSC and DR), first-order model seems to fit the data best in trauma-exposed community and student samples (Ho et al., 2019). In clinical samples, a correlated, second-order model closely corresponding to the ICD-11 diagnostic taxonomy has been found to be superior. In this model, PTSD explains the covariation between Re, Av and Th; and DSO the covariation between AD, NSC and DR (Karatzias et al., 2016). It is worth noting that the differences between the two models are modest in most studies. Ben-Ezra et al. (2018) found support for a third structural model in an Israeli trauma-exposed community sample. In this model, affect dysregulation is split into two separate hyperactivation and deactivation factors, but is otherwise equal to the second-order model above. To our knowledge, this model has not been examined in clinical samples.
Initial studies of the ITQ used versions with six or more items to capture PTSD and 16 items to capture DSO (Karatzias et al., 2016; Kazlauskas et al., 2018). To ensure ease of administration and scoring while preserving the core symptoms of CPTSD, the developers of the ITQ aimed to reduce the number of DSO items. Shevlin et al. (2018) concluded that all the DSO items measured their intended symptom clusters well, and a final 12-item version of the ITQ with six DSO items has been proposed. The development of test versions with fewer items is typically motivated by a reduction of the burden on the respondent, or a test administrator's wish to cover more concepts within a restricted period. Reduction of a test form to a more limited set of items requires several considerations, among them the implications for the influence of measurement error on the test results. Is reliable assessment of person-related differences maintained in a short form? The ITQ is a complex measure, and the number of items needed to assess each symptom cluster reliably may vary. In addition, the items in the ITQ reflect various latent constructs defined by theory. Are the intended construct domains adequately covered with fewer items? Both questions relate to the dependability of the ITQ scores and, to our knowledge, they have not been addressed in prior studies of the ITQ short form.
The implications of item reduction for the practical utility of the ITQ also warrant consideration. Kane (2001) discusses the role of consequences in test validation, and proposes a distinction between the set of inferences leading from test scores to statements about persons, and decisions based on these statements. Although the ITQ primarily has been presented as a diagnostic screening tool, it is conceivable that future practical use could involve other clinical decisions and purposes. These include categorization of persons as in need of treatment intervention or not with reference to a clinical cut-off point, rank ordering of patients by symptom severity (as a basis for prioritizing certain treatment interventions for certain patients) or feedback to individual patients on how their symptoms change during treatment. The validity of such forms of use will rest both on the dependability of the ITQ scores and on the test user's familiarity with the strengths and limitations of the ITQ for different decision-making purposes.
In the present study, we aimed to examine the psychometric properties of scores from both the longer, preliminary 22-item version of the ITQ and the current reduced 12-item version, in a Norwegian clinical sample. Specifically, we examined reliability and validity in terms of generalizability and model fit to the theoretical structural model. Analyses based on generalizability theory (G-theory) and confirmatory factor analysis (CFA) were used.

Participants and procedures
Study participants were patients from two ongoing Norwegian trauma treatment studies. The first is a randomized controlled trial of outpatient stabilizing group treatment (N = 152) and the second is an ongoing randomized controlled trial comparing prolonged exposure, skills training in affective and interpersonal regulation (STAIR) and STAIR + narrative therapy (NT) in an inpatient setting (N = 50). In both studies, a local physician, psychologist or psychiatrist had referred the participants to specialized trauma treatment prior to recruitment. Data were collected at pretreatment assessment in the first study and at treatment start in the second study. Both studies have been approved by the Regional Committees for Medical and Health Research Ethics, Health South-East.
The total sample consisted of 202 patients, with a mean age of 41.5 years (SD = 9.5, range = 24-69 years). Of these, 53.1% were married or living with a partner in a committed relationship, 28.2% were employed full or part time, 7.9% were students and 70.3% received full or partial welfare benefits (e.g. sick leave, disability pension). Exposure to interpersonal trauma in childhood was assessed by the Childhood Trauma Questionnaire in Study 1 (Bernstein & Fink, 1998; Dovran et al., 2013) and the Stressful Life Events Screening Questionnaire in Study 2 (Goodman et al., 1998; Thoresen & Øverlien, 2009), and is reported in Table 1. Almost all patients reported more than one type of trauma (92%). By the ITQ's latest diagnostic algorithm, a minority of the sample (13.4%) met the requirements for PTSD while over half of them (60.4%) met criteria for the CPTSD diagnosis. The remaining patients (26%) had substantial symptoms without reaching full diagnostic criteria for either disorder. Out of the six PTSD and DSO symptom clusters, the mean number of symptom clusters endorsed by this group was 3.9 (SD = 1.2). The estimated diagnostic rates are based on symptom items only.

PTSD and DSO symptoms
Both studies used a Norwegian translation of the preliminary 22-item version of the ITQ (Baekkelund, Sele, & Berg, 2019). The first section of the 22-item ITQ is devoted to three PTSD symptom clusters: re-experiencing of the trauma, avoidance of internal or external trauma reminders, and sense of current threat. These are measured by two items each (Re1 and Re2, Av1 and Av2, and Th1 and Th2). The second section consists of 16 DSO items. DSO is subdivided into three main symptom clusters: affective dysregulation (both hyperactivation and deactivation) (AD1-AD9), negative self-concept (NSC1-NSC4) and disturbances in relationships (DR1-DR3). All items are answered on a five-point Likert scale, ranging from 'Not at all' to 'Extremely'. In the PTSD section, respondents are instructed to report how much they have been bothered by the symptom in the past month. For DSO symptoms, they are asked to report how they typically feel, think about themselves and relate to others.
The current ITQ version has six PTSD items, six DSO items and three functional impairment items related to each symptom category. The six items chosen to represent the three DSO clusters are AD2 and AD6, NSC1 and NSC2, and DR1 and DR2 from the version we used. Functional impairment items were not part of the ITQ version used in this study, and the reported diagnostic rates are based on the 12 symptom criteria alone.
In the present study, the complete 22-item ITQ was used in the first study (both PTSD and DSO items, N = 152). In the second study (N = 50), participants completed the 16 DSO items of the ITQ. The remaining six PTSD items were collected from corresponding items in the PTSD Checklist for DSM-5 (PCL-5) (Weathers et al., 2013). The PCL-5 items are reported on the same five-point Likert scale (with slight semantic differences in anchors), and were used to construct a complete ITQ score. See Table A1 in the Appendix for descriptive statistics and item endorsement rates (i.e. items scored ≥ 2).
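The symptom-only classification used for the diagnostic rates above can be sketched as follows. This is a minimal illustration, not the authors' scoring script. It assumes the published ITQ rule that a cluster is met when at least one of its two items is endorsed, a 0-4 coding of the five-point scale, and hypothetical item keys matching the labels used in this section. Functional impairment is left out, as in this study.

```python
from typing import Dict, Iterable

# Hypothetical item keys; the pairs follow the 12-item form described above.
PTSD_CLUSTERS = {"Re": ("Re1", "Re2"), "Av": ("Av1", "Av2"), "Th": ("Th1", "Th2")}
DSO_CLUSTERS = {"AD": ("AD2", "AD6"), "NSC": ("NSC1", "NSC2"), "DR": ("DR1", "DR2")}
ENDORSED = 2  # assumption: items coded 0-4, endorsed when scored >= 2


def cluster_met(scores: Dict[str, int], items: Iterable[str]) -> bool:
    """A cluster is met when at least one of its items is endorsed."""
    return any(scores[item] >= ENDORSED for item in items)


def classify(scores: Dict[str, int]) -> str:
    """Return 'CPTSD', 'PTSD' or 'none' from symptom items only."""
    ptsd = all(cluster_met(scores, items) for items in PTSD_CLUSTERS.values())
    dso = all(cluster_met(scores, items) for items in DSO_CLUSTERS.values())
    if ptsd and dso:
        return "CPTSD"  # CPTSD takes precedence; the diagnoses are exclusive
    if ptsd:
        return "PTSD"
    return "none"
```

A respondent who endorses all three PTSD clusters but only two DSO clusters is classified as PTSD under this sketch, which mirrors the mutually exclusive PTSD and CPTSD rates reported above.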

Statistical analysis
G-theory is a formal statistical approach suited for investigating the psychometric properties of scores in multi-facet measurement designs, like the ITQ. It is applicable when the aim is to optimize a measure by reducing measurement error and the number of items without narrowing the construct domain (Brennan, 2011). Studies based on G-theory provide estimates of how the dependability of scores changes as the number of items changes in different test formats. This can aid a test designer's decision about the appropriate number of items in a new test format. Thus, G-theory supplements other analytic strategies in short-form development (e.g. item response theory or CFA) more suited to select the specific items to include in a new test format. Two types of studies are conducted within G-theory: generalizability studies (G-studies) and decision studies (D-studies).
A G-study provides information on the different variance components of a test. Both variance related to the intended object of measurement (variance in test scores that is attributable to differences between persons) and various sources of measurement error are estimated. Sources of measurement error can be differences related to items, raters or test occasions.
A D-study uses information from a G-study to design the best possible application of a measurement for a particular purpose (Webb, Shavelson, & Haertel, 2006). A distinct characteristic of G-theory is the distinction made between reliability involving absolute decisions, which is relevant if clinical decisions are based on an individual's score, and relative decisions involving stability in relative standing or rankings of persons (Brennan, 2003; Feldt & Brennan, 1989). This distinction is important and needed in clinical practice because most clinical decisions concern the standing of a given patient with regard to criteria used for determining clinical intervention (absolute decisions). In G-theory, the term 'universe score' refers to the long-run average of observed scores a person would obtain in the broad universe of admissible observations, analogous to 'true score' in classical test theory. Two types of relevant coefficients can be estimated to represent different definitions of measurement error: the 'generalizability coefficient' (G-coefficient) is the ratio of universe score variance to itself plus relative error variance, and the 'index of dependability' is a more conservative estimate of reliability, defined by the ratio of universe score variance to itself plus absolute error variance. G-coefficients > .80 are regarded as acceptable. A total of six facets (including the object of measurement) may be estimated simultaneously in balanced designs. Multivariate G-study and D-study analyses were conducted in mGENOVA (Brennan, 2001).
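For the simplest case of persons crossed with items, the two coefficients can be written out as below. This is a sketch of the univariate p × i case only, not the multivariate design analysed in mGENOVA. The notation is an assumption introduced here: σ²(p) is universe score variance, σ²(i) is item variance, σ²(pi) is the person by item interaction (confounded with residual error), and n_i is the number of items.

```latex
\begin{align}
\sigma^2(\delta) &= \frac{\sigma^2(pi)}{n_i}, &
\sigma^2(\Delta) &= \frac{\sigma^2(i) + \sigma^2(pi)}{n_i}, \\
E\rho^2 &= \frac{\sigma^2(p)}{\sigma^2(p) + \sigma^2(\delta)}, &
\Phi &= \frac{\sigma^2(p)}{\sigma^2(p) + \sigma^2(\Delta)}.
\end{align}
```

Because σ²(Δ) contains the item variance in addition to the interaction term, Φ can never exceed Eρ², which is why the index of dependability is the more conservative of the two.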
First, we examined the 22-item ITQ in a multifacet G-study (p × i design). We treated the DSO symptom clusters (affective dysregulation, negative self-concept and relational disturbances) as three separate fixed facets and the three PTSD symptom clusters (re-experiencing, avoidance and sense of current threat) as one fixed facet. Items within each fixed facet were regarded as randomly selected indicators and treated as a random facet. Three sources of measurement variance (persons, items, and person by items interactions) were estimated separately for the four fixed facets. The person component is the intended object of measurement, and reflects variance related to individual differences. The item component reflects measurement error related to systematic inconsistencies between items in a facet, across persons. G-theory is a random sampling theory and, as such, items are assumed to be randomly sampled from an infinite universe of items that are equivalent representations of a latent construct. The item component represents the degree of violation of this assumption. Item by person interaction is a second source of measurement error. It provides estimates of variation in the rank ordering of individuals based on different items. Acceptable scores are indicated by a combination of a high person component and low error components (item component, and item by person interaction component).
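The three variance components described above can be illustrated for a single facet with the usual ANOVA (expected mean squares) estimators. This is a minimal sketch assuming a complete persons × items score matrix with one observation per cell. It does not reproduce the multivariate mGENOVA analysis, and the function name and data layout are hypothetical.

```python
from typing import Dict, List


def g_study_p_by_i(scores: List[List[float]]) -> Dict[str, float]:
    """Estimate variance components for a crossed persons x items design.

    scores[p][i] is the score of person p on item i (no missing cells).
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[i] for row in scores) / n_p for i in range(n_i)]

    # Mean squares for persons, items and the residual (interaction) term.
    ms_p = n_i * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_i = n_p * sum((m - grand) ** 2 for m in item_means) / (n_i - 1)
    ss_res = sum(
        (scores[p][i] - person_means[p] - item_means[i] + grand) ** 2
        for p in range(n_p)
        for i in range(n_i)
    )
    ms_pi = ss_res / ((n_p - 1) * (n_i - 1))

    return {
        "person": (ms_p - ms_pi) / n_i,  # object of measurement
        "item": (ms_i - ms_pi) / n_p,    # systematic differences between items
        "person_x_item": ms_pi,          # interaction, confounded with residual
    }
```

In small samples these estimators can return negative values for the person or item component; such estimates are conventionally set to zero before any further computation.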
Secondly, we conducted a D-study of the 22-item ITQ to obtain a composite G-coefficient for the test as a whole, and separate G-coefficients for PTSD, affective dysregulation, negative self-concept and relational disturbances. We repeated these analyses to obtain the same information using the items included in the 12-item form proposed by Cloitre et al. (2018).
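How a D-study projects the coefficients for different numbers of items can be sketched as follows, again assuming a single-facet crossed p × i design rather than the multivariate design used here. The variance components in the example are hypothetical values chosen only for illustration; they are not estimates from this study.

```python
from typing import Tuple


def d_study(var_p: float, var_i: float, var_pi: float, n_items: int) -> Tuple[float, float]:
    """Return (G-coefficient, index of dependability) for a p x i design."""
    rel_error = var_pi / n_items            # relative error variance
    abs_error = (var_i + var_pi) / n_items  # absolute error variance
    g_coef = var_p / (var_p + rel_error)
    phi = var_p / (var_p + abs_error)
    return g_coef, phi


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    for n in (2, 3, 4, 5):
        g, phi = d_study(var_p=0.5, var_i=0.1, var_pi=0.5, n_items=n)
        print(f"{n} items: G = {g:.2f}, Phi = {phi:.2f}")
```

Both coefficients rise as items are added because the error terms are divided by the number of items, which is the trade-off between brevity and dependability that a D-study makes explicit.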
Thirdly, we analysed factor structure in the ITQ by comparing two previously proposed models in CFA. Model 1 (Figure A1) closely corresponds to the ICD-11 proposal, with two correlated second-order factors (PTSD and DSO), each with three underlying first-order factors (for PTSD: Re, Av and Th; and for DSO: AD, NSC and DR). In model 2 (Figure A2), affect dysregulation is construed as two separate factors, affective hyperactivation and deactivation, both loading on DSO (Ben-Ezra et al., 2018). The analysis of model 1 was repeated for the 12-item short form developed by Cloitre et al. (2018) to assess factorial stability (Figure A3). See Appendix for a graphical presentation of the models.
We used the mean- and variance-adjusted weighted least squares (WLSMV) estimator for the CFA analyses. WLSMV provides accurate parameter estimates, standard errors and test statistics for ordinal indicators. The amount of missing data was low, with 61 missing data points (1.3%). Standard criteria were used to assess model fit. Comparative fit index (CFI) and Tucker-Lewis index (TLI) values ≥ 0.90 indicated acceptable fit, and values ≥ 0.95 indicated excellent fit; root mean square error of approximation (RMSEA) values ≤ 0.08 indicated acceptable fit and values ≤ 0.05 indicated excellent fit (Hu & Bentler, 1999). WLSMV does not produce the information-based indices needed for comparisons of model fit. Therefore, all models were also fitted using robust maximum likelihood (MLR) to obtain the Bayesian information criterion (BIC). A model is considered to have strong evidence of statistical superiority when its BIC value is 6-10 points lower than that of a competing model (Raftery, 1995). Mplus 8 was used in all CFA analyses (Muthen & Muthen, 2015).
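The fit criteria can be expressed as a small decision rule. This is a sketch using the conventional cutoffs (CFI and TLI ≥ 0.90 or ≥ 0.95; RMSEA ≤ 0.08 or ≤ 0.05) and the 6-point BIC difference. Requiring all three indices to meet a cutoff at once is an assumption made for this illustration, and the function names are hypothetical.

```python
def classify_fit(cfi: float, tli: float, rmsea: float) -> str:
    """Label model fit from CFI, TLI and RMSEA using conventional cutoffs."""
    if cfi >= 0.95 and tli >= 0.95 and rmsea <= 0.05:
        return "excellent"
    if cfi >= 0.90 and tli >= 0.90 and rmsea <= 0.08:
        return "acceptable"
    return "poor"


def strong_bic_support(bic_candidate: float, bic_competitor: float,
                       threshold: float = 6.0) -> bool:
    """True when the candidate's BIC is at least `threshold` points lower."""
    return (bic_competitor - bic_candidate) >= threshold
```

For example, `strong_bic_support(1000.0, 1031.0)` returns `True`, since a 31-point gap exceeds the 6-point threshold.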

G-study
In an initial analysis, we found high estimates of measurement error in the affective dysregulation scale. The estimates for person-related variance and item-related variance were .406 and .467, respectively. This means that the observed scores are more strongly related to measurement error than to the intended object of measurement, person differences. Based on this result, we chose to split the affective dysregulation facet into two forms of emotional problems embedded in the scale, which are termed hyperactivation and deactivation. This allowed for separate estimates of variance related to persons and measurement error in the two scales. For deactivation, we found a high person component and a low item component. Person by item interaction was high, reflecting differences in rank ordering of persons by different items. For the hyperactivation scale, the person component was low and item-related variance was high, indicating a high degree of measurement error for items of this particular subscale.
Negative self-concept showed satisfactory scores with a high person component relative to a low item component. Relational disturbances showed the same desired pattern of high person component and low item component. Error variance related to person by item interaction was also lower than the person component in both facets. In sum, this reflects that variances of scores in negative self-concept and relational disturbances are systematically related to individual differences, with little influence of measurement error. For PTSD, the person component was also higher than the item component, but lower relative to person by item interaction, reflecting differences in rank ordering of individuals from different items in this subscale.
The G-study results for the facets negative self-concept and relational disturbances indicate that item reduction is viable without compromising the dependability of the scores. Fewer items can probably measure affective deactivation adequately, too. Item reduction seems less feasible for affective hyperactivation owing to extensive item- and item by person-related variance. Results from the G-study are reported in Table 2.

D-study
The D-study estimates for the 22-item version and a 12-item short form version of the ITQ are displayed in Table 3.
For the 22-item version of the ITQ, the composite G-coefficient score was excellent (.926). The G-coefficients of the facets displayed more variation. PTSD was marginally lower than desired at .74, hyperactivation was substantially lower than desired at .62, deactivation was acceptable at .79, negative self-concept was excellent at .92 and relational disturbances was acceptable at .84. The 12-item short form version had a lower, but still acceptable, composite G-coefficient (.867). The G-coefficient estimates for the subscales negative self-concept, .85, and relational disturbances, .77, were both acceptable. However, the affective dysregulation scale had a very low G-coefficient value of .45.

CFA
Two confirmatory factor structure models were compared for the 22-item ITQ (see Figures A1 and A2 in the Appendix). Model 1 has a single affect dysregulation factor loading on DSO. Model 2 distinguishes between hyperactivation and deactivation as two separate affect dysregulation factors. Both models broadly correspond to the theoretical structure of CPTSD in ICD-11. Both models 1 and 2 have acceptable goodness-of-fit indices, with chi-squared to degrees of freedom ratios < 3:1, RMSEA levels < .08 and CFI/TLI > 0.95 (Table 4). Comparing BIC and sample-size adjusted BIC values indicates strong evidence of statistical superiority of model 2 to model 1 (Raftery, 1995). This suggests that model 2 should be retained, despite being more complex than model 1.
Standardized factor loadings for model 2 are displayed in Table A2 (Appendix). All first-order loadings on re-experiencing (Re), avoidance (Av), threat (Th), negative self-concept (NSC) and disturbed relationships (DR) were positive, high (> .70) and significant (p < .001). The factor loadings for the two parts of affective dysregulation (AD) differed. Affective deactivation had positive and high loadings for all items (> .70). For affective hyperactivation, one item had a satisfactory factor loading above .70, three items loaded between .50 and .70, and one item (reckless behaviour) had a low loading of .29. The second-order loadings of hyperactivation, deactivation, negative self-concept and relational disturbances on DSO were all high (> .70) and statistically significant. For the PTSD factor, the second-order factor loading for threat was high (.88). Re-experiencing (.68) and avoidance (.61) had lower, but acceptable, loadings on PTSD. The two second-order factors, PTSD and DSO, were highly correlated (r = .81, p < 0.001).
The CFA of model 1 based on the 12 items proposed for the ITQ short form did not converge. We found negative residuals in the AD factor. Two alternative models were tested. Neither a model allowing the hyperactivation item of AD to load on threat in addition to AD, nor a model where AD loaded on both PTSD and DSO resulted in converging models. Cloitre et al. (2018) used dichotomized variables in their study. Recoding our data set to dichotomized variables did not provide converging solutions. Problems with non-converging solutions may result from overparameterization, as each of the six first-order factors rests on only two indicators.
Table note: σ², universe score variance; σ²(δ), relative measurement error; σ²(Δ), total measurement error; Eρ², generalizability coefficient; Φ, index of dependability.

Supplementary analysis
We set out to examine the underlying structure of the 12-item short form. The results from the CFA indicated the need for supplementary data analysis to suggest an alternative short form. New D-studies were performed and compared to find a model with acceptable dependability estimates. An 18-item design with six PTSD items, five hyperactivation items, three deactivation items, and two items each for negative self-concept and relational disturbances was suggested to provide the best balance between brevity and dependability of the scores. The composite G-coefficient for this measurement design was acceptable (.897). The G-coefficients for the separate hyperactivation and deactivation scales were markedly improved compared to the poor G-coefficient (.45) found for the joint affect dysregulation scale in the 12-item design. However, the hyperactivation subscale was unchanged from the 22-item design, and still had a low G-coefficient of .62. The G-coefficient of .73 for the deactivation scale was also somewhat lower than desired.
D-studies give information on the appropriate number of items needed for reliable estimation of each facet, but not on the selection of specific items. To reduce the number of items to an 18-item short form, we used CFA and inspected the standardized factor loadings of the items in the 22-item version (see Table A2 in the Appendix). Items with high loadings (> .70) on their corresponding factor and low cross-loadings to other factors (as indicated by inspection of modification indices) were considered as candidates for a short form. The modest G-coefficient in the D-study indicated that all five hyperactivation items should be retained (AD1-AD5). To represent deactivation, AD6, AD8 and AD9 were selected. AD7 also loaded adequately (> .70) on deactivation, but had high cross-loadings to both NSC and DR. For negative self-concept, both NSC10 and NSC11 had high loadings on the factor and low cross-loadings to other factors. For relational disturbances, both DR14 and DR15 had high factor loadings and low cross-loadings to other factors. While the AD items are changed, the PTSD, NSC and DR items are the same as in the 12-item version.

Discussion
The ITQ provides clinicians and researchers with the first instrument to assess PTSD and CPTSD in line with ICD-11 diagnostic criteria. This study is based on a Norwegian translation of the instrument and adds to an expanding set of data about the psychometric properties of scores from the ITQ.
By means of generalizability theory, this study contributes to a further understanding of the sources of measurement variance and measurement error in the ITQ. Test scores inevitably reflect both an intended object of measurement (in this case, individual differences in CPTSD symptoms) and other unintended variance components with the potential to reduce dependability of the scores. Estimates of these variance components provide important information when the aim is to develop reliable short-form versions.
For PTSD symptoms, we found that both when the three symptom clusters (Re, Av and Th) were estimated as one facet and when they were estimated separately (see Table A3 in the Appendix), person-related variance was high and item-related error was modest. The larger person by item interaction component indicates that the rank ordering of persons may differ with different PTSD items. The overall G-coefficient was acceptable, although in the lower range.
We found high estimates of person-related variance and little measurement error for two of the three DSO facets, namely negative self-concept and relational disturbances. The G-coefficients for these two facets were still acceptable when the number of items was reduced to two, supporting the proposal in the 12-item ITQ.
For the third DSO cluster, affective dysregulation, we found problematic measurement error estimates, with scores reflecting item-related variance and person by item-related variance to a larger extent than person-related variance. The affect dysregulation facet with two items, parallel to the 12-item ITQ, had a low G-coefficient. From a psychometric perspective, this calls for refinement of the facet, possibly by adding items to reduce the influence of measurement error and to target the construct domain more adequately.
This study also suggests that affective dysregulation in the ITQ may be more properly conceived as two different facets than as one facet. Separate analysis of the variance components for hyperactivation and deactivation gave valuable information on the sources of the problematic error estimates. Deactivation had high person-related variance and acceptable error estimates, while hyperactivation had the opposite pattern (low person-related variance and high error estimates). The G-coefficient for hyperactivation (with all five items retained) was clearly below the acceptable level, with a value of .62. Based on these findings, further item reduction on the hyperactivation facet would not be recommended. Regarding the deactivation facet, the G-coefficient estimates were less conclusive. It is debatable whether three items yield sufficient dependability of scores, or if all four items should be retained.
The confirmatory factor analyses of two structural models of the ITQ, model 2 with a split affective dysregulation factor (where hyperactivation and deactivation are seen as separate factors) and model 1 with a merged affective dysregulation factor, contribute to the same picture. A BIC difference above 30 points favours the split model over the merged model. This split model is not among the most frequently studied models, but our finding replicates the findings from a trauma-exposed community sample (Ben-Ezra et al., 2018) in a clinical sample of childhood abuse patients. This strengthens the argument for a differentiated view on affective dysregulation.
Affective dysregulation is a central problem area in CPTSD and theoretically complex. Both undermodulated and overmodulated affect are repeatedly found to be common consequences of trauma (Lanius, Brand, Vermetten, Frewen, & Spiegel, 2012) and the ITQ is intended to cover both forms. Theories of emotion dysregulation, e.g. the 'window of tolerance' model, propose that these forms of dysregulation are closely associated in persons exposed to interpersonal childhood trauma. Siegel (2012) states that repeated exposure to out-of-control emotions in childhood, combined with the lack of effective caregiver regulation, develops into impairments in the ability to self-soothe effectively in adulthood. Consequently, later emotionally challenging experiences may overwhelm the person's regulatory capacity, resulting in frequent states of hyperactivation or deactivation, or oscillation between the two. Thus, different forms of affect dysregulation problems are expected to vary across individuals and over time within individuals. Both our results and other studies based on the ITQ suggest that we do not know the precise relation between hyperactivation and deactivation problems. A study of CPTSD symptom networks found high interrelatedness of symptoms (nodes) in negative self-concept and relational disturbances, but weaker associations within the symptoms of affective dysregulation (Knefel et al., 2019). Previous factor analytic studies also give a mixed picture. Karatzias et al. (2016) found weak factor loadings (< .60) for seven out of nine items on the affect dysregulation scale. Other authors report higher factor loadings (> .70 for six out of nine items), whereas Rocha et al. (2019) propose that affect dysregulation should be split into three different factors. The samples in the above studies vary in the extent and type of traumatic experiences. In our study, the majority of the participants had been exposed to severe, repeated interpersonal childhood trauma. In that respect, the affect dysregulation problems they report could be expected to approach Siegel's (2012) description. Our findings suggest that dominance of either hyperactivation or deactivation symptoms may be a more common clinical presentation than a pattern of frequent shifts between the two. A more thorough understanding of the relation between hyperactivation and deactivation may have important implications for treatment and should be a focus for further studies.
A further important finding from this study is the inability to replicate the structure model for the 12-item short form. While Cloitre et al. (2018) used dichotomized variables (symptoms scored ≥ 2 are regarded as present), we used the full five-point scale. However, this difference did not account for the non-converging CFA models in our study. The theoretical structural model for the 12-item ITQ is complex. With only two indicators reflecting each of the six first-order factors and two correlated second-order factors, there is a risk of overparameterization of the model.
The ITQ is a self-report measure that provides operational definitions of the ICD-11 criteria for PTSD and CPTSD. We found acceptable reliability estimates for two of the three DSO facets in the 12-item short form. Our finding suggests that the affective dysregulation scale needs further refinement. Regarding hyperactivation and deactivation separately is a viable alternative to a merged facet. We found that three items provided dependable estimates of deactivation, while hyperactivation needed five (or preferably more) items.
At this time, the ITQ is the only available self-report measure of CPTSD that corresponds directly with the ICD-11 criteria, and therefore it is valuable in a variety of contexts. The 18-item short form adds to the list of ITQ versions, potentially expanding the applicability of the ITQ to areas beyond diagnostic screening, such as patient feedback on symptom change during therapy, or treatment decisions based on an individual's standing compared to clinical cutoff points. These applications of the ITQ involve interpretation of the test results of individual patients. The validity of such interpretations rests on the use of a measurement design with acceptable dependability estimates for the particular test purpose. This consideration should be taken into account when researchers and clinicians decide on the appropriate format of the ITQ for their use. It remains debatable whether an 18-item version is a substantial reduction of respondent burden compared to using the full 22-item form. Both the 18-item and the 22-item versions provide test users with alternative formats, thus expanding the practical utility of the ITQ.
This study has some limitations. A majority of the sample had long-standing problems and several prior treatment attempts, and may not be representative of a wider trauma-exposed clinical population. Also, the lack of assessment of functional impairment restricts the interpretation of our results, as the extent to which the endorsed problems affect daily life function is unknown. Although a full ITQ score was available from three out of four participants in our sample, six PTSD items were collected from corresponding items in the PCL-5 to create a full ITQ score for the remaining participants. These PCL-5 items are reported on the same scale and are highly similar in wording, but not identical, to the original PTSD items in the ITQ, which may have influenced our findings.