Measuring anxiety after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Anxiety item bank and linkage with GAD-7

Objective To develop a calibrated item bank and computer adaptive test to assess anxiety symptoms in individuals with spinal cord injury (SCI), transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a statistical linkage with the Generalized Anxiety Disorder (GAD)-7, a widely used anxiety measure. Design Grounded-theory based qualitative item development methods; large-scale item calibration field testing; confirmatory factor analysis; graded response model item response theory analyses; statistical linking techniques to transform scores to a PROMIS metric; and linkage with the GAD-7. Setting Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States. Participants Adults with traumatic SCI. Main Outcome Measures Spinal Cord Injury-Quality of Life (SCI-QOL) Anxiety Item Bank Results Seven hundred sixteen individuals with traumatic SCI completed 38 items assessing anxiety, 17 of which were PROMIS items. After 13 items (including 2 PROMIS items) were removed, factor analyses confirmed unidimensionality. Item response theory analyses were used to estimate slopes and thresholds for the final 25 items (15 from PROMIS). The observed Pearson correlation between the SCI-QOL Anxiety and GAD-7 scores was 0.67. Conclusions The SCI-QOL Anxiety item bank demonstrates excellent psychometric properties and is available as a computer adaptive test or short form for research and clinical applications. SCI-QOL Anxiety scores have been transformed to the PROMIS metric and we provide a method to link SCI-QOL Anxiety scores with those of the GAD-7.


Introduction
Generalized anxiety disorder (GAD) was introduced into the Diagnostic and Statistical Manual of Mental Disorders, 3rd Edition 1 in 1980, and reflects the evolution of the definition and diagnosis of anxiety as an entity distinct from panic. 2 Current diagnostic criteria are available in the DSM-5 3 and continue to reflect excessive levels of anxiety and worry and exhibits a high degree of comorbidity with other anxiety and mood disorders. 2 Even in the absence of comorbid mental health disorders, anxiety in its most uncomplicated form is associated with a significant economic and personal burden, including a high use of health care services and reduced productivity. 4 Despite the pervasive and adverse impact of GAD and the greater use of health services, few individuals seek treatment. 2 It is important to note that an elevated level of anxiety is a component of many mental disorders and is not necessarily an indicator of GAD. For traumatic events such as spinal cord injury (SCI), individuals may experience a wide range of anxious symptoms. Some may have little or slightly elevated anxiety while, for others, anxiety may be a symptom of another disorder, such as post-traumatic stress disorder or adjustment disorder. 1 As is the case in the broader mental health literature, little attention has been paid to anxiety in individuals with spinal cord injury (SCI), despite early recognition of anxiety as a sequela of injury. In 1979, Cook 5 lamented the paucity of research addressing anxiety in persons with SCI; this oversight persists nearly 40 years later. The literature on anxiety after SCI is limited by small sample sizes, lack of controls, heterogeneity of samples, limited studies during the acute phase of recovery, diversity of measurement, lack of theory, lack of replication, exclusive focus on point-prevalence, ignoring lifetime prevalence or incidence, and few treatment trials. 6 Early investigations of anxiety following SCI found the prevalence to be within general population ranges. 5,7 Hancock et al. examined anxiety and depression in the first year post-injury. They found the prevalence of anxiety and depression were significantly higher in persons with SCI compared to controls, though mean scores were in the mild range of severity. 8 Craig et al. found no significant improvement in anxiety or depression in the second compared to the first year post-SCI; about 30% of the sample had elevated scores. 9 Bonanno et al. reported subsets of a SCI sample with stable/low, delayed, and improved trajectories of anxiety improvement. 10 In a cross-sectional sample, anxiety was related to level and completeness of injury, pain, motor and sensory loss, bowel dysfunction, and shorter duration between injury and rehabilitation. 6 Screening measures of anxiety are typically embedded in scales that assess both anxiety and depression symptoms. In a recent review of anxiety and depression measures in SCI, 11 no stand-alone measure of anxiety was identified. Rather, instruments such as the Depression Anxiety Stress Scales-21, 12 Hospital Anxiety and Depression Scale, 13 Ilfeld Psychiatric Symptom Inventory, 14 and the Symptoms Checklist-90-Revised Research Subscales 15 are used to assess anxiety. These scales have been criticized for complex loading of somatic items for both depression and anxiety 16 and low specificity for detecting anxiety and depression. 17 The Patient Reported Outcomes Measurement Information System (PROMIS) 18,19 addresses limitations of earlier instruments. PROMIS contains an anxiety item bank 20 which was developed with patient feedback to assure a wide range of items across the spectrum of anxiety from low to high severity symptoms. The anxiety items were calibrated using a gradedresponse item response theory (IRT) model 21 with a large sample drawn from the general population. 20 There are several advantages to IRT-based item banks over classical test theory-derived measures. In contrast to classical test theory methods that favor items of average symptom severity, IRT-developed scales contain items with a wide range of symptom severity. Including a broad range of items results in a bank that has greater reliability and measurement accuracy across a wider range of symptoms than classical test theory-based scores. 22 IRT methods produce unidimensional item banks that can be used with computer adaptive testing (CAT); items can be administered quickly with a small number of items while maintaining precision. Anxiety measures that are sensitive to change and developed in a patient-centered manner are essential to monitoring treatment.
The purpose of this project was to develop an anxiety item bank, estimate model fit, calibrate items, and produce, a CAT and short form. We hypothesized that, with a few additions, the PROMIS Anxiety items would largely be suitable for an SCI population, and that there would be a moderate to strong correlation between the new SCI-QOL Anxiety bank and the GAD-7, 23 a self-report measure of anxiety that is commonly used in SCI research. This manuscript describes the development, calibration, and psychometric characteristics of the SCI-QOL Anxiety item bank. We report scores using PROMIS' general population metric and develop a crosswalk (i.e. statistical linkage) with the GAD-7, a widely used anxiety measure.

Development of an anxiety item pool
We began by identifying candidate items, which included semi-structured interviews and focus groups with patients with SCI and clinicians who specialize in SCI (see Tulsky et al. 24 for a full description). We developed a set of 47 anxiety items based on the data from individual interviews. Then, specific phrases or concepts were drawn from the focus group transcripts and converted into 63 additional "new" anxiety items. For example, a focus group participant with complete tetraplegia talked about feeling "trapped"; like a "prisoner in your own body." From these statements, we drafted the item "I felt trapped in my body". We selected 27 additional items from the Neurology and Quality of Life (Neuro-QOL) measurement system (including 17 verbatim PROMIS items). In cases where newly written (i.e. based on interview or focus group feedback) items were redundant with these existing items, the new items were dropped in favor of the established Neuro-QOL/PROMIS items. A total of 73 unique Anxiety items (including the 27 Neuro-QOL items/17 PROMIS items) proceeded through Expert Item Review (EIR), 25 a method whereby project investigators considered items' relevance and clarity, and suggested revisions and deletions. Based on EIR feedback, we retained 49 items. Team members reviewed and modified these items. We arranged items on a symptom severity hierarchy and subsequently removed 4 redundant items where there was oversaturation in the middle range of the hierarchy. Additionally, during this phase of review, the project investigators decided to develop a separate pool of items related to psychological trauma, 26 and 7 of the more extreme anxiety items were moved to this new set of items. No new items were added at this time.
With the exception of the 27 Neuro-QOL/17 PROMIS items which had undergone cognitive debriefing, 27 we asked individuals with SCI to answer each item, then describe the process they used to generate with their answer, and to identify confusing, unclear, or derogatory wording. No items were modified or deleted based on cognitive interviewing. After this phase, the remaining 38 items were reviewed for translatability (for method, please see Eremenco et al.) 28 and reading level (using the Lexile framework). 29 Slight modifications were made to 2 items after the translatability and cultural review. For example, the item "I felt trapped in my body" was changed to "I felt trapped in my own body," since the phrase "in my body" is an idiomatic expression which would lose clarity upon translation. We wrote items no higher than a fifth grade reading level.

Calibration study and GAD crosswalk participants and data collection procedures
The Institutional Review Board at each collaborating site reviewed and approved this project. We administered 38 anxiety items along with other item pools reflecting different health related quality of life (HRQOL) subdomains to persons with SCI as part of a multisite item calibration study. Participating sites included the Kessler Foundation, University of Michigan, Rehabilitation Institute of Chicago, University of Washington, Craig Hospital and the James J. Peters/Bronx Department of Veterans Affairs (VA) hospital.
Inclusion criteria were 18 years of age and older, ability to read and understand English, and medicallydocumented traumatic SCI. There were no additional exclusion criteria. We stratified the sample by level of injury ( paraplegia vs. tetraplegia), completeness of injury (complete vs. incomplete), and time since injury (<1 year, 1-3 years, and >3 years) to ensure that the sample was heterogeneous. Stratification was achieved by targeted recruitment of each subgroup (e.g. paraplegia, complete, <1 year); we closely monitored the recruitment process and when an enrollment target for a subgroup cell was met, we prioritized recruitment of individuals who fit a different subgroup cell and no further individuals with those specific injury characteristics were enrolled. We confirmed diagnoses by medical record review; neurologic level was documented by the most recent American Spinal Injury Association Impairment Scale (AIS) rating. 30 To meet the sample size requirements of the graded response model, the goal was to recruit a sample of at least 500 individuals with sufficient heterogeneity to ensure at least 5 participants chose each of the 5 responses for each item.
Interviewers administered items using a structured protocol in person or by telephone. Tulsky et al. 31 describe the methods in detail. A subset of the sample also completed the GAD-7.

Reliability samples and data collection procedures
Test-retest reliability was assessed with a separate sample of individuals with SCI at 4 Spinal Cord Injury Model System (SCIMS) centers (University of Michigan, Kessler Institute for Rehabilitation, Rehabilitation Institute of Chicago, and Craig Hospital). Participants were community dwelling individuals with traumatic SCI (>4 months post injury) who completed the SCI-QOL Anxiety CAT at baseline and 1-2 weeks as part of a larger study. All data were collected during a structured interview; the interviewer presented each CAT item as it appeared and entered all responses directly into the Assessment Center SM platform.

Calibration analysis
Data analysis included confirmatory factor analysis (CFA) to confirm construct unidimensionality, use of a graded-response IRT model to calibrate items, and examination of differential item functioning (DIF). For CFA, we defined good model fit as Comparative Fit Index (CFI) >0.90 and root mean square error of approximation (RMSEA)<0.08, and excellent fit as CFI > 0.95 and RMSEA < 0.06. We iteratively identified poorly fitting items by examining item fit to a graded response IRT model, DIF, local item dependence (LID) (residual correlations >|0.20|), and significant loadings on the single factor (values >0.30). We removed poorly fitting items and repeated the analytic steps. Once misfitting and DIF items were removed, we programmed a CAT on the Assessment Center website (www.assessmentcenter.net) and used item parameters and clinical input to select items for a brief, fixed-length form ("short form").

Reliability analysis
We calculated Pearson's r between baseline and 1-2 week assessments to assess test-retest reliability.

Transformation to PROMIS metric
We report SCI-QOL Anxiety scores based on PROMIS normative data in which T-scores of 50 represent the mean of the general population rather than the mean for persons with SCI. 31 We identified anchor items, those common to PROMIS and SCI-QOL, and used the Stocking-Lord method 32 to link the measures. We examined item-response plots and scatter plots of item parameters, estimated transformation constants, and modified the initial item parameters accordingly.
Crosswalk to GAD-7 We created a crosswalk between the SCI-QOL Anxiety item bank and the GAD-7 using PROsetta Stone procedures. 33,34 Results Participant characteristics of samples Table 1 summarizes the demographic and injury characteristics of the calibration sample and GAD-7 subsample.

Preliminary analysis and item removal
Following the first round of analyses on the initial 35item pool, 2 items were deleted and 5 items were set aside as uncalibrated. Two items related to sleep ("I had trouble falling asleep" and "My sleep was restless") were removed. Both of these items exhibited LID, both could be related to a myriad of constructs other than anxiety, and both had been removed from Neuro-QOL during calibration. A subset of 5 items covered aspects of anxiety very specific to SCI (e.g. "I worried about having a bowel or bladder accident", "I was anxious if my wheelchair/walking aid was not nearby") exhibited low item-total correlations (all 5 items) and category inversions (3 items) and as such were removed from the item pool and set aside for possible future use. Repeating analyses of the remaining 31 items, we removed three Neuro-QOL items with slightly modified wording (e.g. "I felt nervous when I am left alone" instead of "I felt nervous when I was left alone) and two items due to low item-total correlations and local item dependency. One item demonstrated DIF when we calibrated the remaining 26 item set, "I felt nervous when I was left alone." After removing these items, Cronbach's α for final set of 25 items was 0.946; item/total correlations ranged from 0.50 to 0.74.

Confirmatory factor analysis
We observed unidimensionality (CFI = 0.953; RMSEA = 0.069) and R 2 values exceeding 0. 35. No item pairs demonstrated local dependence (i.e. residual correlations >|0.20|) and the ratio of the first to second eigenvalue value was 11.4.

IRT parameter estimation and model Fit
Slopes ranged from 1.17 to 2.52; thresholds ranged from -0.78 to 3.50. Precision in the range between −0.5 and 3.0 theta is equivalent to a classical test theory reliability of 0.95. Fig. 1 shows the bank's test information and precision.
We calculated the S-X 2 model fit statistics using the IRTFIT 35

Differential item functioning
We examined DIF using lordif 36 for six characteristics: age (≤49 vs. ≥50), sex (male n = 559 vs. female n =  Criteria for possible DIF were a statistically significant χ 2 test (P < 0.01) and effect sizes using McFadden's pseudo R 2 greater than 0.02, a small but non-negligible effect. While we examined 9 items for DIF based on the χ 2 test, all effect sizes were negligible. Descriptive statistics for the final 25 items are presented in Table 2.

Transformation to PROMIS metric
We used an algebraic formula to transform Anxiety Tscores to PROMIS' general population norms such that higher scores indicate more severe anxiety symptoms. Using Stocking-Lord techniques, we calculated transformation constants, slope and intercept, for the 15 "anchor" items (i.e. those items common to both SCI-QOL and PROMIS) and applied these as a linear transformation of each SCI-QOL parameter. Consequently, SCI-QOL Anxiety scores are directly comparable to PROMIS Anxiety scores, with higher scores indicating more severe symptoms of Anxiety. Transformed slopes range from 1.49 to 2.95 and thresholds ranged from −0.91 to 3.38 (see Table 3). The SCI-QOL calibration sample (n = 716) mean and standard deviation (SD) were 49.69 (9.60) before transformation and are now 50.75 (9.12). With CAT administration, Assessment Center automatically transforms IRT-based SCI-QOL scores to standard scores on the T metric.

Short form item selection
We developed a 9-item short form of the item bank for situations where it is not feasible to administer items via CAT. We selected items with the greatest information across a wide range of symptom severity. Following PROMIS naming conventions, this form is entitled SCI-QOL Anxiety SF9a. We have compared precision and error of the 25-item bank, 9-item short form, variable-length CAT with a minimum of 4 items, and variable-length CAT with a minimum of 8 items SCI-QOL. Anxiety SF9a raw scores can be converted to IRT-based T-scores using Table 4. Note that participants must complete all 9 SF items to receive a score. Table 5 presents the mean, standard deviation, range, and standard error ranges for the bank, CAT and short form; Fig. 2 shows the associated reliability curves.

Reliability
For the community-dwelling reliability sample (n = 245), the default stopping rules for the CAT were used (minimum of 4 items/maximum of 12 items); the CAT administration averaged 6.67 items (SD 2.8); almost two-thirds (64.0%, n = 151) of the sample completed the CAT in 6 items or fewer. Four items were received by 20.3% of the sample, and 15.3% received the maximum number of items (12). The correlation (Pearson's r) between the baseline and 1-2 week retest assessments was 0.80 (n = 245; P < 0.001), and the ICC (2,1) was 0.80 (95% CI, 0.75 to 0.84).

GAD-7 crosswalk
We linked the SCI-QOL Anxiety Scale to the GAD-7 using the same procedure used to link the PROMIS Anxiety Scale to the GAD-7. 34 Fig. 3 displays the relationship between the two measures. The correlation between SCI-QOL Anxiety and GAD-7 was 0.67. Table 6 provides the SCI-QOL Anxiety to GAD-7 conversion; the table allows clinicians and investigators to apply the crosswalk with confidence at a group level because the majority of cases have small differences between the observed and linked mean scores. Fig. 4 shows the marginal reliability at different symptom levels of the SCI-QOL Anxiety, GAD-7, and a score with SCI-QOL and GAD-7 items combined. The SCI-QOL Anxiety items have reliability coefficients above 0.8 for individuals 2 standard deviations below the mean through 4 standard deviations above the mean, much wider range than for the GAD-7. Fig. 5 shows that the test information for the SCI-QOL Anxiety scale is much higher than the GAD-7.
Individual level conversion of GAD-7 to SCI-QOL scores should be interpreted cautiously. We estimated an expected SCI-QOL score and then calculated a discrepancy score by subtracting the observed value from the predicted score. For over 40% of the sample, the predicted and observed scores were within a half of a standard deviation, and for over 70% of the sample the predicted score was within 1 SD. However, a substantial  number of people (n = 131; 28.2%) had discrepancies greater than 1 SD between the observed and predicted scores.

Discussion
The goals of this study were to develop an item bank to measure anxiety in individuals with SCI, evaluate the bank's psychometric properties, and develop a crosswalk between SCI-QOL Anxiety and the GAD-7. We administered these anxiety items to persons with SCI and developed SCI-specific calibrations using a graded-response IRT model. We removed items demonstrating DIF, local dependence, and poor fit. We linked SCI-QOL Anxiety scores using IRT methods to PROMIS' general population metric to facilitate score interpretation. As such, the SCI-QOL Anxiety bank is an optimized version of PROMIS V1.0 for individuals with SCI. Clinicians and researchers may specify a desired level of reliability when using CAT administration. Typically, 5 or 6 items are administered in 1 or 2 minutes. Screening requires less reliability; users can specify stopping rules to reduce respondent burden.  Relationship between SCI-QOL Anxiety and GAD-7 scores. *Individuals reporting no anxiety symptoms (i.e. "never" to all items), and an extremely high amount have been omitted from this graph due to instability of score estimates. When greater precision is desired, users can specify more stringent discontinue rule. Short forms can be administered when internet access is unavailable or respondents have difficulty using a computer. Users can be confident that all administration modes yield reliable scores and that an 8-item CAT demonstrates precision nearly equal to the full bank. Because the GAD-7 is used frequently in biomedical research, we developed a crosswalk to the SCI-QOL Anxiety score. The reliability and test information provided by SCI-QOL Anxiety is superior to the GAD-7 across a much wider range of symptom severity, making SCI-QOL Anxiety preferable, especially when assessing individuals at high and low levels of symptom severity. While the crosswalk is useful at a group level, the correlation of 0.67 between SCI-QOL anxiety and GAD-7 means that score conversions at an individual level are insufficiently precise.
The SCI-QOL Anxiety CAT uses the default criteria used by PROMIS; the minimum number of items to administer is four and the maximum is 12 with a maximum standard error of 0.3. With the default settings, the CAT will administer at least 4 items and discontinue when the standard error of the score estimate is less than 0.3 or 12 items are administered. Users may specify other discontinue criteria to produce more or less precise anxiety estimates. For instance, specifying a minimum of 8 items would result in a lengthier test, but a more reliable score would be obtained.
Scores on the short form are directly comparable to those from CAT or full item bank administration because items are calibrated on the same metric. Short forms may be administered within Assessment Center or may be downloaded for paper administration. Investigators and clinicians can develop custom short forms which can be scored on the same IRT-based metric with the help of a psychometrician.

Study limitations and future directions
The sample was recruited at 5 SCI Model System sites and one VA medical center. While the sample was stratified to ensure heterogeneity in injury level and severity, it may not be representative of all individuals with SCI living in the United States. The correlation of .67 between SCI-QOL Anxiety and GAD-7 limits the crosswalk to group-level applications. Future studies should evaluate the item bank's utility in clinical applications, use the bank to predict development of anxiety disorders,   and evaluate the bank's sensitivity to change. Specifically, it will be important to develop clinically relevant score cut points (or thresholds) and clear scoring interpretation guidelines for clinicians and researchers. In the future, the SCI-QOL Anxiety CAT or SF could be used clinically at a single time point to screen for elevated anxiety, or could be administered repeatedly over time to identify individuals who are exhibiting increasing levels of anxiety over time. Such information could provide clinicians with a useful starting point for discussion and, potentially, intervention.

Conclusions
The SCI-QOL Anxiety item bank is a SCI-tailored version of the PROMIS v1.0 Anxiety item bank. It demonstrates a high level of test information across a wide range of symptom severity. An 8-item CAT and 9-item short form demonstrate high levels of reliability. A crosswalk to the GAD-7 provides group-level score conversion.

Disclaimer statements
Contributors All authors have contributed significantly to the design, analysis and writing of this manuscript. The contents represent original work and have not been published elsewhere. No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the authors or upon any organization with which the authors are associated.