Measuring depression after spinal cord injury: Development and psychometric characteristics of the SCI-QOL Depression item bank and linkage with PHQ-9

Objective: To develop a calibrated spinal cord injury-quality of life (SCI-QOL) item bank, computer adaptive test (CAT), and short form to assess depressive symptoms experienced by individuals with SCI, transform scores to the Patient Reported Outcomes Measurement Information System (PROMIS) metric, and create a crosswalk to the Patient Health Questionnaire (PHQ)-9.
Design: We used grounded-theory based qualitative item development methods, large-scale item calibration field testing, confirmatory factor analysis, item response theory (IRT) analyses, and statistical linking techniques to transform scores to a PROMIS metric and to provide a crosswalk with the PHQ-9.
Setting: Five SCI Model System centers and one Department of Veterans Affairs medical center in the United States.
Participants: Adults with traumatic SCI.
Main Outcome Measures: Spinal Cord Injury – Quality of Life (SCI-QOL) Depression Item Bank.
Results: Individuals with SCI were involved in all phases of SCI-QOL development. A sample of 716 individuals with traumatic SCI completed 35 items assessing depression, 18 of which were PROMIS items. After removing 7 non-PROMIS items, factor analyses confirmed a unidimensional pool of items. We used a graded response IRT model to estimate slopes and thresholds for the 28 retained items. The SCI-QOL Depression measure correlated 0.76 with the PHQ-9.
Conclusions: The SCI-QOL Depression item bank provides a reliable and sensitive measure of depressive symptoms with scores reported in terms of general population norms. We provide a crosswalk to the PHQ-9 to facilitate comparisons between measures. The item bank may be administered as a CAT or as a short form and is suitable for research and clinical applications.


Introduction
Depression is one of the most widely studied psychological experiences of persons with spinal cord injury (SCI) due to its high prevalence as well as the burden of illness and disability associated with this condition. In a recent meta-analysis of major depressive disorder (MDD) rates in SCI, Williams and Murray 1 estimated a point prevalence of 22.2%. This rate is more than three times higher than the one-year prevalence of 6.6% in the general US population. 2 A slightly higher prevalence of depression diagnosis (28%) was found in United States veterans with SCI. 3 Depression is also associated with a host of negative outcomes after SCI including urinary tract infections and pressure ulcers, 4 poorer community mobility and participation, 5,6 greater unemployment, 7 and greater risk of mortality. 8 The high rate of depressive symptoms and associated adverse outcomes is evidence of the need for research on depression assessment and treatment for persons with SCI.
Clinically sensitive depression measures are needed for mood disorder screening. Most depression measures do not use current diagnostic criteria, such as the Diagnostic and Statistical Manual of Mental Disorders criteria. 9 Only a few studies have examined the validity of a depression screening measure compared to a structured diagnostic interview for persons with SCI 10,11 ; in these studies, only one instrument, the Patient Health Questionnaire-9 (PHQ-9), 12,13 has demonstrated adequate diagnostic accuracy for MDD. 14 Using a cutoff of 11 or higher, the PHQ-9 has a sensitivity of 100% and a specificity of 84% compared with the Structured Clinical Interview for DSM-IV 15 diagnosis of MDD during acute SCI rehabilitation. 16 Two systematic reviews on depression after SCI 10,11 highlight the paucity of psychometric information on these measures. These reviews highlight the variability of diagnostic criteria, symptoms, and time periods of assessment 11 ; multidimensionality of commonly-used measures; few validation studies; and poor reporting of psychometric properties. 10 Neither review recommended one depression measure; scale selection should be guided by the purpose of assessment and clinical considerations. Kalpakjian et al. 10 recommended development of symptom clusters and trajectories and the use of contemporary test development methods. Previous work has shown that depressive symptoms can assume various trajectories after SCI and have typically been identified as chronically high, 17 improving, 18,19 worsening, 17,18 or low. 17 A conceptual problem with most measures of depressive symptoms in persons with SCI is the inclusion of both somatic (i.e. neurovegetative) and cognitive-affective symptoms. Neurovegetative symptoms overlap with and are likely confounded by the effects of SCI. 10
Factor analytic studies of depression measures typically find that depression measures, including the PHQ-9, 13,20,21 the Older Adult Health and Mood Questionnaire, 22 the Zung Self Rated Depression Scale, 23 and the Inventory to Diagnose Depression, 24,25 are multidimensional; that is, they actually measure more than one underlying construct. Multidimensionality obscures the interpretation of symptom etiology, severity, and change; a unidimensional measure reduces ambiguity of scores and increases confidence in utilizing scores to inform clinical decision making. 21,26 Several problems limit use of depression measures. First, all measures of depression were developed for use in the general population and then applied to individuals with SCI. Second, most measures have been developed without patient input during their development. Third, all of the commonly used measures were developed using classical test theory methods rather than contemporary, item response theory (IRT) approaches. 10 Consequently, measures like the PHQ-9 have acceptable item functioning, but their psychometric properties are not optimal for SCI populations, 27 especially when the reporting of somatic complaints may be due to physical aspects of SCI and not depression.
The Patient Reported Outcomes Measurement Information System (PROMIS) 28,29 includes a depression item bank, 30 which was developed with patient feedback to represent a wide range of symptom severity and to ensure content validity of the items from a patient perspective. 31 The depression item pool was calibrated using graded response IRT in a large, general population sample (N = 14,839). 30 The item bank contains primarily cognitive-affective items. Pilkonis et al. believe that the exclusion of somatic complaints makes the scale useful in medical populations in which physical symptoms can confound depressive symptom measurement. 30 There are several advantages to IRT-based item banks over measures developed using classical test theory approaches. IRT-developed scales include items that measure symptoms across a wide range of severity. Tests developed with classical test theory methods typically exclude items at the extreme ends of the distribution. Extreme items are dropped due to poor item-total correlations. In the PROMIS Depression bank, for example, the item 'I felt sad' is the least difficult item to endorse, while 'I thought about suicide' is the most difficult. Inclusion of a broad spectrum of items results in an item bank that has greater reliability and measurement precision across a wider range of depressive symptoms than classical test theory-based measures. 32 IRT-based measures allow the use of computer adaptive testing (CAT); items can be administered in a targeted and brief manner while maintaining measurement precision. Administering short form scales with a fixed subsample of items is also facilitated. The ability to monitor change provides critical information on the natural history of depression and the optimal timing of interventions. 
It is therefore essential to use depression severity measures that are sensitive to change and developed in a patient-centered manner if the field of SCI rehabilitation is to make progress toward developing effective treatments for depression. The 'gold standard' measure of depression severity in pharmacologic treatment trials is the Hamilton Depression Rating Scale (HDRS). 33 Unfortunately, the HDRS is multidimensional and has limited sensitivity to change. 34,35 Maier used Rasch analyses to develop a unidimensional measure of depression severity from the HDRS that is more sensitive than the original score. 36 In treatment trials involving SCI patients, depression measures that include somatic items may not detect improvement. 37 Indeed, in a recent trial of venlafaxine XR for MDD in people with SCI, a unidimensional subscale of the HDRS detected improvement in depression while the full HDRS scale did not. 38 The purpose of this report is to describe development of the SCI-QOL Depression item bank and short forms. The item bank was derived largely from the PROMIS scales, but a large SCI sample was used to develop SCI-specific calibrations that ensure items are free from bias and item selection will be optimized for an SCI population. We report the calibration of the SCI-QOL Depression item bank, its psychometric properties, and comparability to the PROMIS Depression item bank. We provide information on how we transformed scores to the PROMIS metric. Because the PHQ-9 is one of the most widely used depression measures with SCI samples, we provide a crosswalk between the SCI-QOL depression item bank and the PHQ-9.

Methods
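Overview of the sampling plan
The study proceeded in three phases: (1) development of an item pool through interviews and focus groups with individuals with SCI and clinicians; (2) field tests to calibrate the item pool and develop an item bank that could be administered via CATs and short forms; and (3) validation with criterion measures at several time points. We describe each sample, the methods, analytic plan, and results.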

Development of a depression item pool
We began by identifying candidate items from our pilot work which included semi-structured interviews and focus groups with patients with SCI and clinicians with SCI experience (see Tulsky et al. 40 for a full description). Comments from individual interviews formed the initial 38 items in the pool, while focus group feedback yielded an additional 68 'new' items. We selected 27 items, verbatim, from the Neuro-QOL measurement system, 18 of which were also verbatim PROMIS items. Four of those items were subsequently deleted from Neuro-QOL but we retained them in our preliminary item pool. Many of the new items created from interviews and focus groups were redundant with the established Neuro-QOL/PROMIS items. In these cases, if the overlap was deemed sufficient, we dropped the new items in favor of those from Neuro-QOL/PROMIS.
Next, the preliminary item set underwent expert item review, 41 a method whereby co-investigators reviewed items for relevance and clarity, and made suggestions for revisions and deletions. We arranged items hierarchically to reflect symptom severity. Team members removed redundant items where there was oversaturation in the middle range of the hierarchy, and suggested new items to fill gaps in content coverage. Throughout the process, whenever a new item was redundant with a Neuro-QOL/PROMIS item, we retained the existing (Neuro-QOL/PROMIS) items.
For all newly written items, we conducted cognitive debriefing interviews 42 with persons with SCI: respondents answered each item, described the process they used to arrive at their answer, and reported whether they perceived anything to be confusing, unclear, or derogatory, or whether they thought any items could be better phrased. For this item pool, we did not need to modify any items based on cognitive interview feedback. We reviewed the 8 remaining 'new' items for translatability 43 ; none of the items required modification. Note that items from Neuro-QOL and/or PROMIS had already undergone this level of review during their parent project, so they were excluded from the cognitive debriefing interviews as well as the translatability review process. A final step was to review the reading level of the item pool using the Lexile framework 44 ; all items were written at or below a 5th grade reading level. The final pool for field testing consisted of 35 items, 23 of which were final Neuro-QOL items (18 of these were also from PROMIS), 4 of which were former Neuro-QOL items, and 8 of which were newly written during the item development phase of this project.

Item calibration and PHQ crosswalk procedures
We recruited 716 subjects as a part of a large-scale, multisite item calibration study from the Kessler Foundation, University of Michigan, Rehabilitation Institute of Chicago, University of Washington, Craig Hospital, and the James J. Peters/Bronx Veterans Administration hospital. Inclusion criteria were age 18 years and older, ability to read and understand English, and medically-documented traumatic SCI. We stratified the sample by level (paraplegia vs. tetraplegia), completeness of injury (complete vs. incomplete), and time since injury (<1 year, 1-3 years, and >3 years) to obtain a heterogeneous sample. Neurologic level was documented by the most recent American Spinal Injury Association Impairment Scale (AIS) rating. 45 Subjects completed the items in a structured interview in person or by telephone. Tulsky et al. describe the methods in detail. 46 A subset of the sample completed the PHQ-9 items during the same testing session.

Reliability sample and data collection procedures
An independent sample of 245 individuals at the University of Michigan, Kessler Institute for Rehabilitation, Rehabilitation Institute of Chicago, and Craig Hospital completed the item banks twice as part of a larger study. 46 Each site's Institutional Review Board reviewed and approved the study protocol. Eligibility criteria were similar to the calibration study: traumatic SCI, 18 years or older, and ability to read, speak, and understand English fluently. We stratified the sample by level and completeness of injury as well as time since injury (≤2 years, >2 years). Participants were community-dwelling and sustained SCI more than 4 months before the assessment.

Item calibration
Item calibration involved confirmation of construct unidimensionality, use of a graded-response IRT model to calibrate item parameters (slopes and thresholds), and examination of differential item functioning (DIF). We used confirmatory factor analyses (CFA) to determine if items conformed to a unidimensional model. We classified model fit as good (CFI > 0.90, RMSEA < 0.08) or excellent (CFI > 0.95, RMSEA < 0.06). We removed items that demonstrated local item dependence (LID; residual correlation > |0.2|), significant (P < 0.05) misfit (S-X² test), 47 or DIF 48 due to sex, age (<50 vs. ≥50), education (some college or less vs. college degree or higher), injury level (paraplegia vs. tetraplegia), severity (complete vs. incomplete), and time post injury (<1 year vs. ≥1 year). We ran the graded response IRT analyses iteratively and removed poorly fitting items. Once we achieved a unidimensional model, we used the IRT parameters to develop CAT algorithms for the item bank. We programmed the CAT in the NIH Assessment Center (http://www.assessmentcenter.net) and selected items for a short form, which can also be downloaded as a PDF from the Assessment Center website. Tulsky et al. 46 describe the detailed methodology and data analysis plan elsewhere in this special issue.
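The fit cutoffs and the LID criterion in this section amount to simple decision rules. The following Python sketch restates them for illustration only; the function names and the 'inadequate' label for models meeting neither cutoff are assumptions and are not part of the study's analysis software.

```python
def classify_fit(cfi: float, rmsea: float) -> str:
    """Label CFA model fit using the CFI and RMSEA cutoffs stated in the text."""
    if cfi > 0.95 and rmsea < 0.06:
        return "excellent"
    if cfi > 0.90 and rmsea < 0.08:
        return "good"
    # Assumed label: the text names only the good and excellent categories.
    return "inadequate"


def flags_local_dependence(residual_correlation: float, cutoff: float = 0.2) -> bool:
    """Return True when a residual correlation exceeds the LID cutoff in absolute value."""
    return abs(residual_correlation) > cutoff
```

Both indices must meet a cutoff for a label to apply, so a model with a high CFI but an RMSEA between 0.06 and 0.08 is classified as good rather than excellent.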

Reliability analysis
To assess test-retest reliability, we calculated Pearson's r and the intraclass correlation coefficient (ICC) with data from the baseline and 1-2 week retest assessments.
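For readers who want to reproduce these statistics on their own two-occasion data, the sketch below computes Pearson's r and an intraclass correlation in plain Python. It assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1); the function names are illustrative, and the code is not the software used in the study.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between baseline (x) and retest (y) scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_2_1(x, y):
    """ICC(2,1) for two occasions, from two-way ANOVA mean squares."""
    n, k = len(x), 2                      # n persons, k = 2 occasions
    rows = list(zip(x, y))
    grand = mean(list(x) + list(y))
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((m - grand) ** 2 for m in (mean(x), mean(y)))
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-person mean square
    msc = ss_cols / (k - 1)               # between-occasion mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Unlike Pearson's r, this ICC is lowered by a systematic shift between occasions, which is why the two statistics are usually reported together.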

Transformation to PROMIS metric
We computed a linear transformation of SCI-QOL Depression item parameters and scores so that scores reference PROMIS' general population metric. Thus, a SCI-QOL Depression score of 50 represents the mean of the general population rather than the mean of the SCI sample. The transformation procedure consisted of 6 steps. 46 First, we used counts of SCI-QOL calibrations and anchor items common to PROMIS and SCI-QOL to determine the linking configuration. We identified IRT parameters for anchor items, then used the Stocking-Lord method 49 to identify the transformation coefficients to link items. For the anchor items, we examined item-response plots and scatter plots of item parameters, estimated transformation constants, and transformed the item parameters accordingly.
Crosswalk to PHQ-9
We created a crosswalk from the SCI-QOL Depression item bank to the PHQ-9 50 so that PHQ-9 raw summed scores have a corresponding SCI-QOL T-score, which allows for direct comparison of the SCI-QOL Depression with PHQ-9 scores. We used the linking methodology and analytic procedures that were developed for the PROsetta Stone project. 51

Participant characteristics of samples
Demographic and injury characteristics of the calibration and PHQ Crosswalk samples are presented in Table 1. Tulsky et al. 46 provide additional details on the focus group and reliability samples.

Preliminary analysis and item removal
Following the first round of analyses on the initial 35-item pool, 5 items were removed. Three of the removed items were Neuro-QOL items that had been included with slightly incorrect wording (e.g. 'I felt lonely even when I am with other people' instead of 'I felt lonely even when I was with other people'). Fortunately, we had also included the correctly worded version of each of these three items, so we removed the incorrectly worded version of each from the pool. The other 2 items were removed for LID and misfit (significant S-X² test), respectively. Analyses were repeated on the 30-item pool, and an additional 2 items were removed due to LID (both items) and DIF for sex (the item NQDEP09, 'I felt like crying'). For the final 28-item set, internal consistency was α = 0.964 and item-total correlations ranged from 0.51 to 0.81. For 26 of the items, over 30% of the sample selected category 1 (Never). No item had sparse data (i.e. <5 responses) in any category. Two items had a category inversion where the average raw score (for all items) for persons selecting category '5' (Always) was lower than the average for persons selecting category '4' (Often).
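As a reminder of how the internal consistency coefficient is obtained, the following sketch computes Cronbach's α from a respondents-by-items matrix using only the Python standard library. The function name and data layout (one list of item scores per respondent) are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of item scores per respondent."""
    k = len(responses[0])  # number of items
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

The coefficient approaches 1 as the items covary more strongly relative to their individual variances, which is consistent with the high value expected for a long, unidimensional item set.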

Dimensionality
We observed a unidimensional model (CFI = 0.968; RMSEA = 0.066). Twenty-six items had R² values greater than 0.40, and 2 items were less than 0.40. We identified no local dependence, defined as residual correlations > |0.20|. The ratio of the first to second eigenvalue was 14.3.

IRT parameter estimation and model fit
Slopes (item discrimination parameters) ranged from 1.43 to 4.36; thresholds (item difficulty parameters) ranged from -0.87 to 2.84.
The measurement precision in the theta range between -0.7 and 2.8 is roughly equivalent to a classical reliability of 0.95 or better.
We examined the S-X² model fit statistics using the IRTFIT macro program. 55 All items had adequate or better model fit statistics (P > 0.05), with marginal reliability equal to 0.950.
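To make the roles of the slope and threshold parameters concrete, the sketch below evaluates category response probabilities under a graded response model. It assumes the logistic form without a scaling constant and ordered thresholds; the parameter values in the usage comment are hypothetical and are not taken from the SCI-QOL calibration tables.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under a graded response model.

    `thresholds` must be in ascending order; K - 1 thresholds give K categories.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative probabilities.
    return [cumulative[i] - cumulative[i + 1] for i in range(len(cumulative) - 1)]


# Hypothetical five-category item (values are placeholders, not study estimates):
# grm_category_probs(theta=0.0, slope=2.0, thresholds=[-0.5, 0.5, 1.5, 2.5])
```

A larger slope makes the category probabilities change more sharply with theta, and higher thresholds shift endorsement of the upper categories toward more severe symptom levels.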

Differential item functioning
We used lordif 48 to examine differential item functioning for age (≤49 vs. ≥50 years), sex (male vs. female), education (some college and lower vs. college degree and above), diagnosis (tetraplegia vs. paraplegia), injury severity (incomplete vs. complete), and time post injury (>1 year vs. <1 year). We flagged 10 items for possible DIF based on χ² tests (P < 0.01) and effect sizes (McFadden's pseudo R²) > 0.02, which is a small but non-negligible effect. On examination of effect sizes, all DIF was negligible and we retained all items. Descriptive statistics for the retained items are presented in Table 2.

Transformation to PROMIS metric
The SCI-specific calibrations are based on the calibration sample. We transformed these SCI-QOL measures to PROMIS' general population norms. We calculated the transformation constants, slope and intercept, for the 18 PROMIS items using Stocking-Lord techniques 49 and applied them to create linear transformations for each SCI-QOL parameter. Thus, SCI-QOL scores are reported as a PROMIS Depression score with higher scores indicating more severe depressive symptoms. Transformed slopes range from 1.39 to 4.23, and thresholds range from -0.677 to 3.143 (Table 3). With CAT administration, the Assessment Center automatically transforms IRT-based scaled scores into T-scores with a mean of 50 and SD of 10 (Table 4).
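The way linking constants act on item parameters and scores can be sketched as follows. This assumes the conventional linear linking form, in which theta is rescaled as A·theta + B, slopes are divided by A, and thresholds are rescaled like theta; the names `A` and `B` and the function names are illustrative, and no constants from the study are reproduced here.

```python
def link_item(slope, thresholds, A, B):
    """Place one item's parameters on the target metric using linking constants A and B."""
    linked_slope = slope / A
    linked_thresholds = [A * b + B for b in thresholds]
    return linked_slope, linked_thresholds


def theta_to_t(theta):
    """Convert a theta on the target metric to a T-score (mean 50, SD 10)."""
    return 50 + 10 * theta
```

Under this form the product slope × (theta − threshold) is unchanged by the transformation, so response probabilities are preserved while scores are re-expressed relative to the reference population.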

Short form selection and mode of administration
We programmed item parameters into the NIH Assessment Center℠ 53 to facilitate CAT administration. Users can modify configurations to maximize reliability or reduce test burden, or select specific items. A short form is also available. SCI-QOL uses PROMIS' default discontinue criteria; the minimum number of items is four and the maximum is 12 with a maximum standard error of 0.3. Thus, the CAT always administers at least 4 items and will discontinue when the standard error of a score estimate drops below 0.3 or 12 items are administered. Users may change the discontinue criteria so that additional items are administered when a more precise assessment is needed. For instance, if the user selects an option that the CAT administers a minimum of 8 items before discontinuing, a lengthier test will be administered, but a more reliable score will be obtained.
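The discontinue criteria described above can be written as a small decision function. This is a sketch of the rule as stated in the text, not the Assessment Center implementation; the function and parameter names are illustrative. The helper that converts a standard error to a reliability additionally assumes a theta metric with unit variance.

```python
def cat_should_stop(n_administered, standard_error,
                    min_items=4, max_items=12, max_se=0.3):
    """Return True when the CAT should discontinue under the stated criteria."""
    if n_administered < min_items:
        return False  # always administer the minimum number of items
    if n_administered >= max_items:
        return True   # never exceed the maximum number of items
    return standard_error < max_se


def reliability_from_se(standard_error):
    """Approximate reliability for a given SE, assuming unit trait variance."""
    return 1 - standard_error ** 2
```

Raising `min_items` to 8, as in the example in the text, forces a longer administration even when the standard error criterion has already been met.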
In situations where it may not be feasible to use a laptop or tablet computer with internet access, users may want to use a short form. We developed a 10-item short form with the goal of including the most informative items across a wide range of depressive symptoms. The items selected for this form, the SCI-QOL Depression short form 10a, are indicated by bold text in Tables 2 and 3. Since items are calibrated on a common metric, short form scores are comparable to those obtained from a CAT or full item bank. Investigators and clinicians can develop custom short forms which can be scored on the same metric. Short forms are scored by summing the item responses and finding the associated T-score in Table 5.
We evaluated measurement precision of the full bank, 10-item short form, and variable-length CAT with the default minimum of 4 items. Table 6 presents the mean, standard deviation, range, and standard error ranges for these administration modes; Fig. 2 presents the associated reliability curves.

Reliability
We used the default stopping rules for the CAT: minimum of 4 and maximum of 12 items with the community sample. Administration averaged 5.93 items (SD 3.1); 75% of the sample completed the CAT within 6 items, and 17.7% received the maximum number of items (12). When comparing SCI-QOL Depression scores at baseline with those from the 1-2 week follow up assessment (n = 245), Pearson's r = 0.80 (P < .001) and ICC (2,1) = 0.80 (95% CI: 0.75 to 0.84).

Crosswalk to PHQ-9
We produced a crosswalk from the SCI-QOL Depression item bank to the PHQ-9 using a linking procedure similar to that used by Gershon et al. 54,55 with data from a general population sample collected as part of the NIH Toolbox. Fig. 3 displays the relationship between SCI-QOL Depression and the PHQ-9 in our sample; the correlation was 0.76. Fig. 4 demonstrates the superior marginal reliability of the SCI-QOL items (or the SCI-QOL + PHQ-9 items) when compared to the PHQ-9 or PHQ-2 (which includes only the first 2 items of the PHQ for a very brief screening). Fig. 5 demonstrates the test information that is conveyed by the measures. Test information indicates the precision of measurement provided by the item bank across different scores; that is, the more information a test has, the more accurately it can determine what level of an underlying trait (in this case, depression) a given participant possesses. Test information has an inverse relationship with error variance; the more information a test has, the smaller the error of measurement. The figure shows the scale information of the SCI-QOL Depression bank, the PHQ-9, and the PHQ-2, as well as a combined score (including all of the SCI-QOL and PHQ-9 items) across the range of depression. The SCI-QOL provides greater scale information than the PHQ-9 or the PHQ-2 across individuals with scores ranging from 2 standard deviations below the mean through 4 standard deviations above the mean. As expected, the combined item bank yields the most information since it uses all component items (SCI-QOL and PHQ). These values have been used to generate the PHQ-9 raw score to SCI-QOL T-Score metric conversion crosswalk table (Table 4). As indicated by the correlation between measures (r = 0.76), scores on the SCI-QOL Depression bank and the PHQ-9 are not strictly interchangeable at the individual level.
For the subsample completing the item bank and PHQ-9, we used the PHQ-9 to estimate an expected SCI-QOL score and then calculated a discrepancy score by subtracting the observed value from the predicted score. The predicted and observed scores were within a half of a standard deviation for over half the sample. However, there was a substantial number of people (n = 106) who had discrepancies greater than 1 SD (this is consistent with the shared variance indicated by the correlation between the measures).
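The discrepancy analysis can be summarized in a few lines of Python. The sketch assumes that scores are expressed as T-scores, so that one standard deviation equals 10 points; the function name and the keys of the returned dictionary are illustrative.

```python
def discrepancy_summary(predicted_t, observed_t, sd=10.0):
    """Summarize predicted-minus-observed discrepancies between linked and observed scores."""
    discrepancies = [p - o for p, o in zip(predicted_t, observed_t)]
    within_half_sd = sum(abs(d) <= 0.5 * sd for d in discrepancies)
    beyond_one_sd = sum(abs(d) > sd for d in discrepancies)
    return {
        "discrepancies": discrepancies,
        "within_half_sd": within_half_sd,  # count within half an SD
        "beyond_one_sd": beyond_one_sd,    # count more than one SD apart
    }
```

Counts of this kind describe agreement at the individual level, which can be poor even when group means from the two measures agree closely.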

Discussion
The goals of this study were to (1) develop a bank of depression-related items for use with individuals with SCI; (2) evaluate the psychometric properties of the item bank; (3) develop a crosswalk from the PHQ-9 to SCI-QOL; and (4) provide information that facilitates clinical and research use of the depression item bank. The SCI-QOL Depression bank is an optimized version of the PROMIS v1.0 Depression item bank for individuals with SCI. Patient and clinician focus group participants confirmed that the PROMIS v1.0 Depression items had content validity in an SCI population. Like the PROMIS Depression bank, the SCI-QOL Depression item bank does not include items related to somatic symptoms which might be confounded with physical medical issues experienced by persons with SCI. We then developed SCI-specific item calibrations by fitting a 2-PL graded-response IRT model to data from a large, heterogeneous sample of individuals with SCI. We removed items exhibiting DIF, poor item fit statistics, or local dependence. This procedure ensures that the SCI-QOL Depression bank is relevant and appropriate for individuals with SCI. We used IRT linking methods to transform the SCI-QOL calibrations to the PROMIS metric, thus allowing use of SCI calibrations that can be directly compared to PROMIS scores.
The SCI-QOL also provides the end user with several administration options depending upon the intended use of the scale. For studies requiring rapid screening, the SCI-QOL could be administered using CAT stopping rules that reduce testing time (e.g. administer only 4 items regardless of the standard error). For studies in which greater precision is required and participant burden is not an issue, the CAT stopping rules could be set to administer more items (e.g. a minimum of 8 items, standard error of 0.30, and a maximum of 12 or more items). When a computer and/or internet connection is not available for testing, a short form could be administered to the participant. Finally, if a special subpopulation is being tested, a customized short form could be developed that only includes items relevant to the subpopulation (e.g. a short form including only the most 'difficult' items, i.e. those that will be endorsed only by individuals with the most severe depressive symptomatology, could be created for use in a study of individuals with SCI and concomitant MDD). While administration of the full item bank would yield the highest reliability, use of the full bank is not recommended given the high reliability of the 10-item fixed-length CAT and the variable-length CAT with a minimum of 8 items. Either of these administrations would very closely approximate the scores obtained when a full bank is administered.
The wide use of the PHQ-9 in SCI research studies led us to link SCI-QOL with the PHQ-9. We co-administered the SCI-QOL with the PHQ-9 allowing us to compare the psychometric properties of the instruments as well as developing a crosswalk between the scales. The reliability of the SCI-QOL item bank is superior to the PHQ-9 (or PHQ-2) over a wider range of depressive functioning suggesting that the SCI-QOL has greater measurement precision and is better able to assess individuals at both tails of the distribution. Additionally, the SCI-QOL provides significantly greater scale information than the PHQ-9 or the PHQ-2 which also indicates that the SCI-QOL score is a more reliable measure across the entire range of depressive symptoms, providing more accuracy and sensitivity across a wider range of depression. The SCI-QOL is able to estimate the score for an individual who is one or two standard deviations below the mean and up to three or four standard deviations above the mean, which is far more precise than either the PHQ-9 or PHQ-2. Collectively, these data suggest that the SCI-QOL has greater measurement precision.
We developed a crosswalk table to enable researchers and clinicians who utilize the PHQ-9 to transition to data collection with the SCI-QOL Depression bank. Utilizing Table 4, the PHQ-9 scores can be transformed to a SCI-QOL Depression T-Score metric allowing direct comparison of SCI-QOL to PHQ-9 scores. The two measures are correlated 0.76. Investigators can apply the crosswalk conversions with some confidence at the group level because the majority of cases have small differences between the observed and linked mean scores. At the same time, investigators and clinicians should exercise caution when applying the crosswalks to individuals because the 95% confidence interval is 1.2 SD, making it difficult to predict any individual case with exact precision. Therefore, investigators should use caution in the inferences drawn when using the crosswalk table to track performance of a specific individual over time. Nevertheless, the transformation is useful in answering sample-level questions.

Study limitations
We recruited the samples from a limited number of SCI Model System facilities and one VA medical center; they may not be representative of all persons with SCI in the United States. Persons who volunteered received a

Conclusions
The SCI-QOL Depression item bank is the first scale of depressive symptoms that has been developed specifically for an SCI population. The SCI-QOL is an optimized version of the PROMIS v1.0 item bank and the scores are, for all practical purposes, PROMIS scores. Moreover, it is linked to the PHQ-9 so that PHQ-9 scores can be transformed to SCI-QOL equivalent scores, allowing researchers to maintain continuity of measurement in an ongoing longitudinal study. The SCI-QOL Depression item bank reflects the constellation of symptoms experienced by persons with SCI. It is not designed to provide a diagnosis of major depressive disorder, but rather to serve as a population-specific indicator of SCI-related depressive symptoms. The Depression item bank contains 28 items; users have the option of using a 10-item short form or CAT. We removed misfitting items systematically, based on psychometric and clinical criteria.
This new measure offers clinicians and researchers a precise, population-relevant, and flexible method to describe symptoms related to depression. The mixed methods approach assures relevance and patient-centered validity, and strengthens our ability to measure this important phenomenon in persons with SCI. Doing so will enhance our ability to identify critical time points of intervention along the SCI rehabilitation and recovery trajectory.

Disclaimer statements
Contributors All authors have contributed significantly to the design, analysis and writing of this manuscript. The contents represent original work and have not been published elsewhere.
Funding This study was supported by National Institutes of Health grant numbers 5R01HD054659 (Eunice Kennedy Shriver National Institute of Child Health and Human Development/National Center on Medical Rehabilitation Research and the National Institute of Neurological Disorders and Stroke) and U01AR057929 (NIH Common Fund/National Institute of Arthritis and Musculoskeletal and Skin Diseases) and by the National Institute on Disability and Rehabilitation Research grant numbers H133N060022, H133N060024, H133N060032, H133N060014, H133N060005, and H133N060027.

Conflicts of interest
No commercial party having a direct financial interest in the results of the research supporting this article has or will confer a benefit upon the authors or upon any organization with which the authors are associated.