The Cleveland Adaptive Psychopathology Inventory: preliminary validity and reliability of a multi-scale personality and psychopathology questionnaire

ABSTRACT OBJECTIVE: The present study describes the development of the Cleveland Adaptive Psychopathology Inventory (CAPI), a brief multi-scale personality and psychopathology questionnaire for the screening of common mental health disorders. METHODS: The 118-item questionnaire consists of 10 clinical scales, a brief scale for the screening of substance abuse, and three scales for the assessment of response bias. A sample of 4000 volunteers with and without self-reported medical or mental health conditions was used to assess the psychometric properties of the open source measure including internal consistency, test–retest reliability, and preliminary validity analyses with diagnostic sensitivity and specificity of self-reported psychiatric diagnosis. RESULTS: Internal consistency of the subscales for the normative sample ranged from .568 to .872, with mean inter-item correlations ranging from .161 to .410. The average test-retest across all of the samples ranged from .706 to .872. Finally, sensitivity and specificity (area under the curve) for the subscales with the dependent variables being self-reported diagnosis ranged from .666 to .899. CONCLUSIONS: The preliminary results suggest that the CAPI is a useful tool for clinicians and researchers interested in screening for comorbid psychopathology in both general and clinical populations.


Introduction
Humm and Wadsworth [1] were the first to develop a multi-scale personality questionnaire for the screening of psychopathology. A decade later, McKinley and Hathaway [2] developed the Minnesota Multiphasic Personality Inventory (MMPI), a 567-item true-false multi-scale personality questionnaire. They adopted Humm and Wadsworth's true-false response format, the indirect questioning method, a large portion of their questions, as well as the use of profiles to assist with the data interpretation. Hathaway and McKinley refined the MMPI scales by employing clinical groups and item-correlational analyses [2][3][4]. Shortly after the MMPI was published, it was established that the MMPI has limited diagnostic sensitivity. Recognizing these limitations, Meehl [5] recommended that clinicians interpret the configural pattern of this inventory rather than examine each of the scales individually. Subsequently, Hathaway and Meehl [6] published an actuarial "atlas" of the personality descriptors. However, the resulting MMPI/MMPI-2 and MMPI-RF profiles continue to have limited diagnostic sensitivity and specificity [7][8][9][10].
Cronbach and Meehl [11] recommended that future test developers use criterion validity only as an initial step when developing new measures and later evaluate their validity by correlating them with more established measures. Loevinger [12] expanded on Cronbach and Meehl's [11] work and suggested that test developers should first create a pool of items and eventually correlate the test scores with prevailing diagnostic criteria and other variables. Next, they should assess the structural validity of the test using statistical procedures such as factor analysis. Finally, they should examine how the scores on the new scale can be generalized to various situations and populations.
Leslie Morey adopted Loevinger's [12] guidelines and developed the Personality Assessment Inventory (PAI) [13]. This 344-item multi-scale questionnaire comprises questions with 4 possible response values (Likert scale). Studies show that the PAI possesses adequate convergent validity with other personality tests, such as the MMPI, but also possesses limited diagnostic validity [14,15].
Tellegen et al. [16] attempted to ameliorate the limitations of the MMPI and MMPI-2 by reanalyzing the normative data of the test using structural validation modelling. They subsequently published the MMPI Restructured Form (MMPI-RF), a multi-scale questionnaire consisting of 338 true-false items from the original MMPI and scales that no longer purport to assess explicit psychiatric entities. Therefore, the MMPI-RF, much like its predecessor, the MMPI, continues to have limited diagnostic validity [8][9][10][11].
In the past few years, a growing number of researchers have shown an interest in using relatively brief multi-scale psychological measures to study large samples (big data). To this end, several measures were developed including the Five Factor Personality Inventory [17], the Mind2Care Inventory [18], and the Basic Interest Markers Questionnaire [19]. Using such measures, researchers have been able to gather large online databases in a manner that is inexpensive and efficient. Moreover, there is compelling evidence that when subjects anonymously complete such measures online, they tend to disclose more about themselves than when they undergo more traditional face-to-face interviews [20,21].
The Cleveland Adaptive Psychopathology Inventory (CAPI) was developed with the goal of creating a relatively brief multi-scale "open source" personality measure whose administration is restricted to professionals. Unlike lengthy personality scales, it is not intended for forensic purposes and, as such, it evaluates only relatively rudimentary social desirability, consistency of responses, and exaggeration bias.
The development of the CAPI followed Loevinger's [12] and Melley et al.'s [22] guidelines. Initially, the authors developed a set of self-report measures for the assessment of various personality and psychiatric conditions based on the DSM-IV-TR and DSM-5 criteria [23,24]. Each scale was developed using well-established non-parametric multi-dimensional methodology and the Facet Theory Approach [25][26][27]. Given the length of many of these questionnaires (between 55 and 81 items), the authors shortened each of the measures to form a brief multi-scale measure. The aims of this study were as follows: (1) describe the development of the CAPI, (2) provide data regarding the reliability of the CAPI scales, and (3) provide data regarding the criterion validity of the CAPI.

Procedure
The initial version of the inventory, the CAPI's Experimental Version 1 (CAPI-EV1), was developed by the authors using Loevinger's [12] rational test construction theory. Figure 1 depicts the flowchart of the CAPI development. Briefly, a large pool of test items was adopted from various existing measures (see Table 1) and was administered to 63 college students [28]. Next, double negatives and poorly written items were removed and the response format was transformed to a 4-point Likert scale (True, Mostly true, Mostly false, and False). The resulting questionnaire, CAPI Experimental Version 2 (CAPI-EV2, 330 items), was then administered to a new sample, and additional items were removed. The resulting final version of the CAPI, a 179-item scale, was then administered to 4000 volunteers who signed up via ResearchMatch.org, a clinical research registry that is funded by the NIH Clinical and Translational Science Award programme and is endorsed by the NIH Clinical Center as a source for the recruitment of volunteers.
Each volunteer from ResearchMatch.org was sent an email with an IRB-approved description of the study (IRB-FY2018-230). Those who volunteered were then sent a second email with a link to the CAPI which was then uploaded to a website that meets HIPAA standards. Of the 8,251 subjects who agreed to participate in the study, 4000 completed the CAPI. All participants signed an electronic consent form which included a detailed description of the study, specified that they would not be paid for their participation, and indicated that they could terminate participation at any time. They were then asked to complete demographic questions, questions regarding their medical or mental health history, and the CAPI. Finally, after the completion of the study, they were asked for feedback regarding the study as well as whether they would be willing to retake the questionnaire. Table 2 describes the final version of the 118-item CAPI including its clinical and validity scales.

Table 1. Sources of items for the CAPI-Experimental Version 1.
Authors                   Scale
Humm and Wadsworth [1]    Humm and Wadsworth Personality Inventory
Poreh et al. [29]         Borderline Personality Questionnaire (BPQ)
Iancu et al. [30]         Positive and Negative Symptoms Questionnaire
Martukovich [31]          Obsessive Compulsive Personality Questionnaire
Chaturvedi et al. [32]    Scale for Assessment of Somatic Symptoms
Escobar et al. [33]       Somatic Symptom Index
Hamilton [34]             Hamilton Rating Scale for Anxiety
Hamilton [35]             Hamilton Depression Rating Scale
Covi et al. [36]          Covi Anxiety Scale
Schalet et al. [37]       Hypomanic Personality Scale
Altman et al. [38]        Altman Self-Rating Mania Scale
First et al. [39]         Structured Clinical Interview for DSM-IV Axis I and SCID-4-PD

Normative sample
The sample comprised 1646 volunteers from 22 states who reported no significant medical or mental health conditions. The final normative sample consisted of 76.1% females, with the age of participants ranging from 18 to 86 years (M = 44.53, SD = 16.079). Table 3 provides additional demographic information regarding the sample. Most of the participants were Caucasian with at least a bachelor's level education.

Clinical sample
The sample comprised 2354 volunteers with a self-reported history of significant mental health problems and/or chronic pain. Females made up 88.2% of the sample and age ranged from 18 to 80 years (M = 42.35, SD = 14.5). The average number of physical conditions reported was significantly higher than in the control group (t = 3.92, df = 3992, p < 0.001) and the average number of self-reported mental health related conditions was 1.92 (SD = .941). A subsample of subjects with a self-reported history of psychiatric disorders (n = 1253) was asked what type of professional diagnosed them and to provide a list of their psychotropic medications. Of these, 67% of the subjects reported that their diagnosis was made by a psychiatrist, 23% reported that it was made by their primary care physician, 1% by a nurse practitioner or psychiatric nurse, 6% by a psychologist, and 0.01% were self-diagnosed. Table 3 provides the demographics of the two samples and shows that the ethnic background of the clinical sample was similar to that of the normative sample. However, the level of education of the clinical group was significantly lower (χ² = 74.76, df = 6, p < 0.0001). Table 4 provides information regarding the frequency of 9 self-reported somatic conditions. Table 5 provides information regarding the prevalence of the various self-reported mental health conditions in the clinical sample that might be pertinent for the validation of the CAPI.

Table 2. Clinical and validity scales of the 118-item CAPI.
Scale                                           Items  Description
Paranoid (PAR)                                         Items assessing feelings of distrust and suspiciousness of others, issues with authority
Schizotypal Personality Traits (SCIZ)           10     Items assessing both positive and negative schizophrenia-like symptoms such as odd beliefs, magical thinking, and flat affect
Borderline Personality Traits (BOR)             11     Items assessing self-harm, feelings of emptiness, abandonment, and impulsivity
Antisocial Personality Traits (ANTI)            10     Items assessing for lack of empathy, Machiavellian traits, childhood conduct disorder, and lack of anxiety
Avoidant Personality Traits (AVD)               10     Assesses for feelings of inadequacy, sensitivity to rejection, feelings of shame, and social inhibition
Obsessive Compulsive Personality Traits (OCPD)  7      Items assessing preoccupation with orderliness and rules, inability to delegate tasks, and perfectionism
Anxiety (ANX)                                   11     Items assessing feelings of feeling tense, nervous, fearful, and having the tendency to worry
Depressive Mood (DEP)                           11     Assesses for sad mood, feelings of hopelessness, irritability, and feelings of guilt
Bipolar (BD)                                    10     Items assessing need for sleep, level of energy, history of pressured speech and overconfidence, and elevated mood
Somatization (SOM)                              10     Items include concern with bodily symptoms as well as doubt and disbelief in current medical care
Substance Abuse (SUB)                           11     The scale is composed of questions regarding the abuse of alcohol or illicit drugs
  Alcohol (ALCH)                                6      Questions regarding excessive use of alcohol
  Illicit Drugs (DRGS)                          5      Questions regarding excessive use of illicit drugs
Naivete (NVA)                                   6      Assesses one's tendency to present themselves in a good light
Inconsistency (CON)                             17*    Examines the consistency of the responder by comparing their responses on reversed items
Infrequency Scale (INF)                         16*    Includes items that less than 1% of the "normal" population endorse
*Items on these scales appear throughout the questionnaire.

Test-retest sample
A subsample (n = 1931) was asked to complete the CAPI a second time at time intervals ranging from 2 to 15 months after the initial completion of the inventory. Of those, 75.3% agreed to retake the CAPI at some later time and 45% of the sample completed it (n = 868). The subjects were broken down into three groups: 2-5 months, 6-12 months, and 12-15 months. The normative sample was broken down into two groups only: 2-5 months and 6-12 months. The age of participants ranged from 18 to 80 years (M = 42.2, SD = 16.39) for the 2-5-month retest sample and from 18 to 84 years (M = 50.3, SD = 17.4) for the 6-12-month retest sample, with 44% and 74% of the subjects being female in the two retest samples, respectively. Table 2 presents the level of education and ethnic composition of each of the groups.

Results
Reliability
Table 6 shows the internal consistency of the CAPI's clinical and validity scales, which ranged from .688 to .874, aside from the CAPI's Naivete scale, which fell below .60. Following Cohen et al. [40], the mean inter-item correlations of the various scales were examined. The scores ranged from .171 to .422, meeting the recommended criteria (between .15 and .85) regarding scales that tap into narrow domains [41]. Table 7 presents the CAPI's aggregated test-retest reliability followed by three time intervals: 2-5 months, 6-12 months, and over a year. The table shows that the reliability coefficients remain relatively stable during the first 6-12 months but in some cases decline after a year.
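
The two internal-consistency statistics reported above can be illustrated with a minimal sketch. It assumes the standard formula for Cronbach's alpha and item responses coded 1 to 4. The function names and the simulated responses are hypothetical and are not taken from the CAPI data or from the authors' analysis software.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the scale total
    return float((k / (k - 1)) * (1.0 - item_variances.sum() / total_variance))


def mean_inter_item_correlation(items):
    """Average of the off-diagonal Pearson correlations between items."""
    items = np.asarray(items, dtype=float)
    corr = np.corrcoef(items, rowvar=False)
    k = corr.shape[0]
    return float(corr[np.triu_indices(k, k=1)].mean())


# Simulated data (assumption): 500 respondents, 10 items sharing one trait.
rng = np.random.default_rng(0)
trait = rng.normal(size=(500, 1))
latent = 0.6 * trait + rng.normal(size=(500, 10))
# Cut the latent scores into four ordered response categories coded 1..4.
responses = np.digitize(latent, bins=[-1.0, 0.0, 1.0]) + 1

print("alpha:", round(cronbach_alpha(responses), 3))
print("mean inter-item r:", round(mean_inter_item_correlation(responses), 3))
```

Reporting both statistics is useful because alpha rises with the number of items, whereas the mean inter-item correlation does not, which makes the latter easier to compare across scales of different lengths.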

Effects of demographic variables
Age correlated significantly with all the clinical scales (r = −.098 to −.283). However, when the CAPI's validity scales were used to control for the effects of social desirability and consistency of response, these correlations were no longer significant (r = −0.03 to −0.08) aside from the AVD and SOM scales (r = .180 and −.247, respectively). The correlation between education and CAPI scales with social desirability and consistency of response serving as covariates did not produce significant findings aside from the ANTI (r = −.117), SCHIZ (r = −.129) and SOM (r = −.084) scales. However, these modest correlations did not reach significance after carrying out a Bonferroni correction for Type I error. Gender differences were examined using multiple t-tests with a Bonferroni correction to control for Type I error. Within the normative sample, none of the clinical or validity scales approached significance.
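
As an illustration of how covariate control can remove a demographic correlation, the following is a minimal sketch. It assumes a residual-based partial correlation and a simple Bonferroni-adjusted alpha; the text does not specify the exact procedure used, so this is not necessarily the authors' method. All variable names and the simulated data are hypothetical.

```python
import numpy as np


def partial_correlation(x, y, covariates):
    """Correlate x and y after regressing both on the covariates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept plus one column per covariate.
    design = np.column_stack([np.ones(len(x)), covariates])
    x_resid = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    y_resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(x_resid, y_resid)[0, 1])


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Simulated data (assumption): a response-style score drives both age and
# a clinical scale score, so they correlate only through that covariate.
rng = np.random.default_rng(1)
n = 2000
response_style = rng.normal(size=n)
age = response_style + rng.normal(size=n)
scale_score = -response_style + rng.normal(size=n)

raw_r = float(np.corrcoef(age, scale_score)[0, 1])
partial_r = partial_correlation(age, scale_score, response_style)
print("raw r:", round(raw_r, 3), "partial r:", round(partial_r, 3))
# Illustrative threshold for 10 tests (one per clinical scale).
print("per-test alpha:", bonferroni_alpha(0.05, 10))
```

In this simulated case the raw correlation is clearly negative while the partial correlation is close to zero, which is the same pattern described above for age and most clinical scales.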

Preliminary validity data
To examine the validity of the CAPI, sensitivity and specificity (ROC) analyses were performed with the CAPI scales serving as independent variables and the self-reported medical or mental-health diagnoses serving as the dependent variables. Given that a large proportion of the subjects had multiple comorbidities, we entered all the CAPI scales into the ROC analyses but reported only the two scales with the highest sensitivity and specificity (area under the curve). Table 8 shows the sensitivity and specificity of the CAPI scales in identifying self-reported mental health conditions. Additional sensitivity and specificity analyses were conducted after the transformation of the associated CAPI scales into dichotomous categories, with subjects scoring above a T of 65 (one and a half standard deviations above the mean) being considered as meeting the criteria for the disorder. The resulting sensitivity, specificity, likelihood ratio, false positive and negative rates, and other related indexes are presented in Table 9.

Table 9. Sensitivity and specificity analysis of the CAPI scales using a cutoff of T > 65.

Partial intercorrelations with bootstrapping, controlling for age, education, and the response bias scale, were computed to assess the relationships between the various CAPI scales. Given the large sample size, confidence intervals were used to determine significance. Table 10 shows the resulting correlation coefficients, with those that are significant after Bonferroni correction being highlighted. The Paranoid scale (PAR) correlated significantly with the Borderline (BOR) and Antisocial Personality Traits (ANTI) scales. The Depressive Mood (DEP) scale correlated significantly with the Borderline Personality Traits (BOR), Avoidant (AVD), and Anxiety (ANX) scales. The Avoidant Personality Traits scale correlated with the Borderline Personality (BOR) and Obsessive-Compulsive Personality Traits (OCPD) scales as well as the Anxiety (ANX) scale. The Avoidant (AVD) and Anxiety (ANX) scales were highly correlated. The Bipolar (BD) and Obsessive-Compulsive Personality Traits (OCPD) scales were highly correlated. Finally, the Bipolar (BD) scale correlated with the Schizotypal (SCHZ) scale, and the Antisocial Personality Traits (ANTI) and Substance (SUB) scales were highly correlated. Separately, we also examined the correlation between the Somatization (SOM) scale and the number of medical problems reported by the subjects. This analysis produced a highly significant correlation (r = .494, p < 0.0001).
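
The area under the curve and the T > 65 classification used in the validity analyses can be sketched as follows. This is a minimal illustration, assuming that T scores are computed from the normative sample's mean and standard deviation (T = 50 + 10z) and that the area under the curve is estimated with the rank-based (Mann-Whitney) formulation. The function names and the simulated scores are hypothetical and do not reproduce the CAPI results.

```python
import numpy as np


def auc(scores, has_diagnosis):
    """Probability that a diagnosed subject outscores a non-diagnosed one."""
    scores = np.asarray(scores, dtype=float)
    has_diagnosis = np.asarray(has_diagnosis, dtype=bool)
    pos = scores[has_diagnosis]
    neg = scores[~has_diagnosis]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def to_t_scores(raw, norm_mean, norm_sd):
    """Convert raw scale scores to T scores (mean 50, SD 10)."""
    return 50.0 + 10.0 * (np.asarray(raw, dtype=float) - norm_mean) / norm_sd


def sensitivity_specificity(t_scores, has_diagnosis, cutoff=65.0):
    """Sensitivity and specificity when T > cutoff counts as a positive."""
    flagged = np.asarray(t_scores, dtype=float) > cutoff
    dx = np.asarray(has_diagnosis, dtype=bool)
    true_pos = np.sum(flagged & dx)
    false_neg = np.sum(~flagged & dx)
    true_neg = np.sum(~flagged & ~dx)
    false_pos = np.sum(flagged & ~dx)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return float(sensitivity), float(specificity)


# Simulated raw scores (assumption): diagnosed subjects score higher.
rng = np.random.default_rng(2)
normative = rng.normal(loc=20.0, scale=5.0, size=400)
clinical = rng.normal(loc=30.0, scale=5.0, size=400)
raw = np.concatenate([normative, clinical])
diagnosed = np.concatenate([np.zeros(400, dtype=bool), np.ones(400, dtype=bool)])

t = to_t_scores(raw, normative.mean(), normative.std(ddof=1))
sens, spec = sensitivity_specificity(t, diagnosed)
print("AUC:", round(auc(raw, diagnosed), 3))
print("sensitivity:", round(sens, 3), "specificity:", round(spec, 3))
```

A positive likelihood ratio can then be derived as sensitivity / (1 − specificity). A fixed T > 65 cutoff favours specificity, so a scale can show a high area under the curve and still have modest sensitivity at that threshold.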

Discussion
The present study describes the development and preliminary psychometric properties of a new and relatively brief multi-scale "open source" screening measure for the assessment of psychopathology. The results of this large-scale study show that the CAPI appears to be a psychometrically sound measure for the screening of such conditions in the adult population. The 10 clinical scales of the CAPI (6 personality trait scales, 3 mood scales, and a somatization scale), together with the scale for the screening of substance abuse and the 3 validity scales, demonstrate adequate internal consistency, mean inter-item correlations, and test-retest reliability, comparable to and at times even exceeding those of more established and lengthier multi-scale personality questionnaires [42][43][44]. It is noteworthy that the lower test-retest scores of some scales after 12 months suggest that the CAPI scores may not represent a person's emotional functioning over extended periods due to intra-individual changes. Another important finding is that the Naivete (NVA) scale showed higher test-retest reliability coefficients than internal consistency coefficients. It is very possible that those who completed the scale the second time were more careful in responding to the CAPI and consequently provided more robust reliability.
As was previously noted, additional analyses show that the CAPI scores are influenced by education in much the same way as other multi-scale measures [45]. However, as in the case of the PAI and MMPI, demographics had limited impact on the CAPI subscale scores after controlling for response style, consistent with previous studies regarding the limited value of developing age and education specific norms [46,47]. In the current sample, of those individuals who reported having been diagnosed with borderline personality disorder, 72.0% reported also being diagnosed with depressive disorder, 56.8% with anxiety, and 8.6% with Bipolar Disorder. This finding is consistent with the literature [48,49], highlighting the multi-dimensional characteristics of many psychiatric disorders, and exemplifying the importance of assessing various dimensions using multi-trait scales [29,50].
Preliminary validity analyses demonstrate that each of the CAPI scales evidences reasonable criterion validity. Namely, the sensitivity and specificity (ROC) analyses of the CAPI's subscales identify self-reported psychiatric diagnoses with a reasonable level of confidence. However, as might be expected, other CAPI scales were associated with co-morbid conditions. For example, the CAPI's DEP scale was sensitive in identifying subjects who reported having a depressive disorder but also Borderline Personality Disorder. Additionally, individuals with BOR also obtained high scores on the PAR, as is expected from the literature on overlapping personality characteristics [51,52]. Unfortunately, the paucity of large clinical studies that examine the sensitivity and specificity of existing multiscale measures makes it difficult to carry out a more comprehensive comparison with existing measures. Regardless, compared to the few studies that have been published (see Table 1), the CAPI evidences equivalent and at times even higher sensitivity and specificity than standalone measures or existing multiscale measures [15,53].
Although the present study employs large normative and clinical samples, some clinicians and researchers might argue that recruitment of subjects on-line (researchmatch.org) to collect the norms and clinical data might be inherently faulty, relative to the more traditional method often used by commercial testing companies of recruiting paid subjects in multiple testing centres. Studies repeatedly show, however, that recruitment of volunteers via the web is a reliable and valid method for conducting psychological research [54,55]. With that said, given the relatively high education of the samples as well as the fact that minorities were notably underrepresented, the authors are working on recruiting additional samples from the general population using non-web recruitment methods to validate the current results.
Another limitation of the current study is the use of self-reported diagnoses by the volunteers in the clinical sample, which may limit the generalizability of the study. The question is raised as to whether a more systematic study using structured clinical interviews such as the Structured Clinical Interview for DSM-5® Personality Disorders [39] would be more suitable for validating the CAPI scales. While such methodology has many advantages, it is likely to limit the size of the clinical sample. Nevertheless, additional studies are under way at several mental health facilities, where diagnoses are made by licensed professionals, to further validate the CAPI. Given that the use of self-reported diagnosis is likely to result in more error bias, particularly because the diagnoses were made by multiple clinicians, one should view the results of the current study as providing a low-end estimation of the sensitivity and specificity of the CAPI scales. However, this limitation should be evaluated while taking into consideration the benefit of recruiting a very large sample, thus reducing some of the potential error. As is shown in Table 1, the average number of subjects in validation studies of past multi-scale measures was 104.7, which is in sharp contrast to the current much larger sample.
As part of the study, correlational analyses between the CAPI scales were conducted. The results of these analyses provide insight into the comorbidity of certain symptoms. They show, for example, that the depression and anxiety scales, which assess two disorders that are well documented to be frequently comorbid, were highly correlated [49,56,57]. They also show that depressive symptoms and borderline personality traits, which both impact mood variability, are highly intercorrelated, supporting the literature on the topic [50,58]. Finally, as might be expected, antisocial traits were highly correlated with a history of substance use.
A question is raised as to whether personality measures could be useful for psychiatrists and primary care physicians [50]. Due to the length of existing measures, most clinicians employ stand-alone scales, and those are rarely used for diagnostic or clinical decision making. In that context, the existing multi-scale measures such as the MMPI or PAI are undesirable as they are too lengthy for a busy primary physician or a psychiatrist to interpret. With that in mind, in the next stage in the development of the CAPI, we hope to collect large clinical data sets with the goal of assisting in both therapy and psychopharmacological decisions. It is believed that only by using big data methodology, such as deep learning, might it be possible to identify which particular "item profiles" on the CAPI correspond with partial hospitalization length of stay and/or treatment adherence. In this context, one might also be able to address the question of psychiatric comorbidity and treatment outcome. For example, we hope to examine whether comorbid personality disorders, particularly with regard to Cluster B personality traits [50], generally do [59,60] or do not impede [56,57] the pharmacological care of depression.
In sum, the present study provides data regarding the reliability and validity of a new multi-scale personality and psychopathology questionnaire. Unlike traditional psychological measures, the authors of this new measure emphasized the utilization of updated diagnostic principles with the goal of having a wide range of practitioners employing this measure in various clinical settings to improve clinical practice. While in its current format the new measure is aimed at providing a description of personality functioning, we hope that soon this new scale could also be used to assist in clinical decision making such as length of stay in psychiatric hospitals, formulating psychopharmacological decisions, and predicting adherence with psychiatric care.

Disclosure statement
No potential conflict of interest was reported by the authors.