The Quantified Behavioural Test Plus (QbTest+) in adult ADHD

Abstract
The Quantified Behavioural Test Plus (QbTest+) is widely used in clinical practice to assess patients with attention-deficit hyperactivity disorder (ADHD). This study mapped QbTest+ performance in a group of adults with ADHD. Does the test signal problems with impulsivity, attention and/or activity? To what extent are patients’ self-reported problems reflected in QbTest+ performance? Does QbTest+ performance predict outcome, as reflected in the patients’ and clinicians’ judgements 4 years later? We recorded the three QbTest+ cardinals (QbActivity, QbImpulsivity and QbInattention) in 67 consecutive ADHD patients diagnosed in adulthood. Among the 54 patients who medicated as usual on the day of testing, 35 (65%) scored above the clinical cut-off (Q-score ≥ 1.25) on at least one of the QbTest+ cardinals. Out of the 13 patients who suspended medication prior to the test, 11 (85%) scored above the clinical cut-off on at least one of the Qb-variables. There were modest associations between QbTest+ cardinals and symptom self-ratings [Brown ADD scale (BADDS); Adult Self-Report Scale (ASRS)]. Forty-one patients completed a second QbTest+ approximately 4 years after the first. Performance was improved on the follow-up test and fewer patients scored in the clinical range (34%). The scores on the QbInattention cardinal at baseline correlated positively with BADDS and ASRS self-ratings at the 4-year follow-up.


Introduction
Attention-deficit hyperactivity disorder (ADHD) is a neuropsychiatric disorder with roots in childhood affecting approximately 3% of the adult population (Fayyad et al., 2007, 2017). Although the validity of the adult ADHD diagnosis is disputed by some (e.g. Moncrieff & Timimi, 2010), it is commonly diagnosed in adult psychiatric care: the prevalence of adult ADHD in nonpsychotic adult psychiatric care has been estimated at 17% in a study of eight European countries, Sweden included (Deberdt et al., 2015). This makes ADHD one of the largest patient groups in Swedish mental health care. Adult ADHD is associated with poorer functioning in everyday life, with substantial negative economic and personal consequences (e.g. Chang et al., 2014; Chung et al., 2019).
The FDA-cleared and CE-marked Quantified Behavioural Test Plus (QbTest+) is widely used in clinical practice to assess patients with (suspected) ADHD. Research has suggested that this continuous performance test, tapping the three core symptoms of ADHD (inattention, impulsivity and hyperactivity), can help diagnose ADHD in children and youth (Vogt & Shameli, 2011; Hall et al., 2016, 2017; Hollis et al., 2018), aid in the search for neurobiological mechanisms (Oades et al., 2010), optimize treatment (Bijlenga et al., 2015; Tallberg et al., 2019) and facilitate communication with patients (Hall et al., 2017). However, other studies have reported problems with its ability to tell ADHD apart from other neurodevelopmental and psychiatric conditions (Hult et al., 2018; Johansson et al., 2021) or even from healthy controls (Brunkhorst-Kanaan et al., 2020). It must be said, though, that the QbTest+ is not meant to be a standalone test but to be used together with other assessment tools (clinical interviews and rating scales), since many mental disorders are associated with cognitive problems involving attention, impulsivity and/or activity (Millan et al., 2012).
This report is a naturalistic study which aimed to map QbTest+ performance in a group of adult patients already diagnosed with ADHD (the diagnostic process itself did not involve the QbTest+). Its naturalistic design meant that we, as is usual in ordinary clinical settings, relied on the test manufacturer's cut-off proposals to judge whether a particular patient's performance deviated from normal or not (see Methods). Based on previous studies, we expected a robust impairment of QbTest+ performance (see Edebol et al., 2013b; Sharma & Singh, 2009). ADHD medication has been shown to affect QbTest+ performance (Bijlenga et al., 2015; Ginsberg et al., 2012; Edebol et al., 2013a). Since some ADHD patients frequently miss taking their prescribed medication, this might influence test results. We therefore investigated whether those patients who missed their medication performed worse than those who took their prescribed medication. Finally, we wished to explore whether QbTest+ performance at baseline predicted future attentional performance, subjective symptom ratings and/or clinicians' reports at a follow-up 4 years later.

Participants
The study sample was derived from the St. Göran Bipolar Project carried out within the Northern Stockholm Mental Health Service, which assesses patients with ADHD (Nylander et al., 2021) and bipolar disorder (Sparding et al., 2015) over several years. The St. Göran Bipolar Project is an interdisciplinary, prospective and naturalistic study. It aims to study the course and outcome of treatment in standard psychiatric health care, and to investigate the pathophysiology of bipolar disorder and ADHD. The project's ultimate aim is to reduce time spent ill and suffering, and to improve everyday functioning and quality of life.
Eighty-two patients diagnosed with ADHD were enrolled from a tertiary outpatient clinic specialized in assessment and treatment of ADHD. Experienced board-certified psychiatrists (E.R. or O.F.) conducted structured anamnestic interviews with all ADHD patients. The interview structure relies upon the clinical assessment instrument Affective Disorder Evaluation, originally from a study of bipolar disorder (Sachs et al., 2003). The Affective Disorder Evaluation includes social anamnesis, medical history and family history, and was in this study complemented with a section covering the DSM-IV diagnostic criteria for ADHD. In addition, the M.I.N.I. International Neuropsychiatric Interview (Sheehan et al., 1998) was used to screen for psychiatric diagnoses other than ADHD and bipolar disorder. The Adult ADHD Self-report Scale [ASRS; Kessler, Adler, Barkley et al. (2005)] and the Brown Attention Deficit Disorder-Scales [BADDS; Rucklidge and Tannock (2002)] were used to assess current ADHD symptoms. Clinicians used the Global Assessment of Functioning (GAF) and Clinical Global Impression-Severity (CGI-S) to rate the patients' functioning level and symptom severity, respectively. All available sources of information, encompassing patient interview, case records and, if available, interviews with next of kin, were utilized to arrive at a best-estimate diagnosis.
The investigations also included a magnetic resonance imaging brain scan at baseline and follow-up. Cerebrospinal fluid was also collected through lumbar puncture to shed light on the pathophysiology of bipolar disorder and ADHD. None of these measures are, however, used in this study. All patients also underwent rigorous testing of cognitive functioning.
The sample consisted of 67 patients (see Table 1) who had ongoing ADHD drug treatment and had completed a QbTest+ at baseline (41 of them also completed a QbTest+ 4 years later).
Out of the 67 patients, 13 (19%) mistakenly omitted their ADHD medication on the day of testing (see the flow-chart in Figure 1). The remaining 54 patients took their medication as prescribed. Forty-one patients completed a second QbTest+ approximately 4 years later. At this time point, 12 patients missed their medication prior to the test, whereas the rest (n = 29) medicated as usual. Hence, between baseline and the 4-year follow-up, 26 dropouts were registered. When comparing these 26 dropouts to the remaining 41 patients who participated in the follow-up, no statistically significant differences were found in their baseline QbTest+ scores or in their scale scores (BADDS, ASRS, GAF and CGI-S). We do not know why the 26 patients chose to leave the study. According to ethical rules in our country, participants are free to drop out from research studies without being asked to state the reason.
The St. Göran project is a naturalistic study and does not affect the management of care. Thus, medication type, doses, discontinuation, stop-start patterns and visits over the course of the 4-year study period were determined based solely on clinical needs and not defined in a study protocol. However, since the mainstay treatment for adult ADHD is central stimulants (Cortese et al., 2018), most patients used catecholaminergic reuptake blockers (e.g. methylphenidate and dexamphetamine) at least periodically, in 60% of the cases along with other psychopharmacological treatments to mitigate comorbid disorders. Antidepressants were the commonest [22 of 67 (33%)]. Sixteen (24%) were prescribed sleeping pills, and 6 (9%) patients were also prescribed anxiolytics. Most patients were prescribed more than one add-on drug to the stimulant at both baseline and follow-up. At follow-up, anxiolytics were the commonest [21 of 44 (48%)]. Eighteen (41%) were prescribed antidepressants at follow-up.

Instruments
The commercially available QbTest+, described in detail by others (Bijlenga et al., 2015; Johansson et al., 2021; Knagenhjelm & Ulberstad, 2010), is a 20-min computerized continuous performance test measuring activity level, impulsivity and inattention. Instructions on how to take the test were given verbally and by means of a standardized film (QbTech AB) presenting the procedure of the test. The patient then performed a 1-min practice to make sure the instructions were understood. The test took place in a room with minimal visual and auditory stimuli, with the test instructor present in the room but discreetly positioned so as not to distract the test-taker.
During testing, the patient sits in front of a computer screen and is presented with 600 stimuli (25% of which are randomly distributed targets), one by one at a pace of one presentation every two seconds, which is consistent with the 20-min test duration. There are four different kinds of stimuli, differentiated by colour and form. The test-taker is instructed to press a hand-held clicker when two consecutive stimuli are identical in colour and form. Other stimulus sequences are non-targets (no clicker press). The test generates a score for each of three ADHD core symptoms. The first cardinal, QbActivity, is recorded by a motion-tracking system and provides a weighted index of movements during the test, in terms of total distance, vividness, and change of position. The second, QbImpulsivity, is a weighted index of various commission errors (responses to non-targets). QbInattention, finally, is a weighted composite of reaction time averages and variability plus omission errors (non-responses to targets). The scores are expressed as z-scores (i.e. in standard deviation units), which are called Q-scores by the test developers. A Q-score ≥ 1.25 on at least one of the parameters indicates that the participant may have ADHD (Knagenhjelm & Ulberstad, 2010). The QbTest+ is not meant to be a standalone tool for diagnosing ADHD. Rather, it is designed to be added to the assessment process along with rating scales and a clinical interview.
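The decision rule described above (a Q-score ≥ 1.25 on at least one cardinal) can be written out as a minimal sketch. The class and function names below are hypothetical and chosen only for illustration; they are not part of the QbTest+ software, which reports Q-scores automatically.

```python
from dataclasses import dataclass

# Manufacturer-suggested clinical cut-off cited in the text (Q-score >= 1.25).
CUTOFF = 1.25


@dataclass
class QbResult:
    """Hypothetical container for the three QbTest+ cardinals (Q-scores)."""
    activity: float
    impulsivity: float
    inattention: float


def cardinals_above_cutoff(result, cutoff=CUTOFF):
    """Return the names of the cardinals whose Q-score is at or above the cut-off."""
    scores = {
        "QbActivity": result.activity,
        "QbImpulsivity": result.impulsivity,
        "QbInattention": result.inattention,
    }
    return [name for name, q in scores.items() if q >= cutoff]


def in_clinical_range(result, cutoff=CUTOFF):
    """True if at least one cardinal reaches the cut-off."""
    return len(cardinals_above_cutoff(result, cutoff)) > 0
```

Under this rule a patient with Q-scores of 0.3, 1.25 and -0.5 counts as scoring in the clinical range, because one cardinal reaches the threshold, whereas a patient with 1.24 on the highest cardinal does not.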

Questionnaires
The Brown ADD scale (BADDS) is a 40-item self-report scale assessing executive functioning. Individual items are rated on a scale from 0 to 3: "never"; "once a week or less"; "twice a week or more" or "almost daily." The patient does not further specify for how long the rated symptom has been present. The items are clustered into five subscales. BADDS is primarily designed to measure the inattentiveness of the ADHD symptomatology. The total score can range from 0 to 120 and the clinical cut-off score of 50 indicates "probable ADHD" (Brown et al., 2011).
The Adult Self-Report Scale (ASRS) has 18 items, which correspond to the 18 DSM criteria of ADHD. Six of the 18 questions were found to be most predictive of ADHD; those questions form part A (the screener version of ASRS) of the symptom checklist. Part B contains the remaining 12 symptoms of the ADHD diagnosis. The responses are given on a five-point Likert scale from "never" to "always" (1-5). The patient is instructed to recall the preceding six months. The ASRS has shown good reliability and validity for evaluation of ADHD in adults (Adler et al., 2006). The six-item screener of ASRS has been shown to outperform the full 18-item checklist in sensitivity, specificity and total classification accuracy (Kessler, Adler, Ames, et al., 2005). In this study, the six-item screener version was used.
GAF ranges from 100 (extremely well-functioning) to 1 (extremely severe impairment). GAF is used to rate overall psychological functioning plus social and occupational functioning, e.g. how well the patient is handling their everyday problems, during the preceding month. The GAF was included as axis V in DSM-IV and has been widely used in routine clinical settings around the world (Monrad-Aas, 2010; Piersma & Boes, 1997; Söderberg et al., 2005).
CGI-S is a 3-item scale measuring symptom severity, global improvement and therapeutic response. In this study, the symptom severity item was included, which summarizes the clinician's global impression of symptom severity. The CGI-S is rated on a 7-point scale, from 1 to 7 ("not ill at all" to "extremely ill"), and refers to the actual time point of the examination. Psychometric evaluations have reported good internal consistency and concurrent validity (Leon et al., 1993), although some problems with validity or test-retest reliability have also been noted (Beneke & Rasmus, 1992).

Statistical analyses
The results in the tables are presented as medians and interquartile ranges. Non-parametric statistical tests were chosen because of their robustness to violations of normality and because their use in psychiatric studies is encouraged (see Urbano Blackford, 2017). The median test was used to test between-groups differences. It assesses whether two independent groups have been drawn from populations with the same median (Siegel, 1956). Within-groups differences were assessed using the Wilcoxon matched-pairs signed-ranks test. Associations between variables were computed as Spearman correlations. SPSS Statistics for Mac version 22.0 (SPSS Inc., Chicago, IL) was used for these calculations. Box plots were made using software developed by Spitzer et al. (2014).
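As an illustration of the rank-based association measure named above, the following is a minimal sketch of a Spearman correlation computed as the Pearson correlation of average ranks. It is not the authors' procedure (the analyses were run in SPSS); the function names are hypothetical and only the Python standard library is assumed.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / (sx * sy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the coefficient is insensitive to monotone transformations of either variable, which is one reason rank-based statistics suit skewed rating-scale data.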
The QbTest+ developer suggests that a Q-score of ≥ 1.25 on at least one of the three parameters should be used as a cut-off in clinical practice (Knagenhjelm & Ulberstad, 2010; see also Hult et al., 2018), which equals performance at or above the 89th percentile (see Ulberstad, 2012c for information about normative data). This suggested cut-off was adopted in this study. Patients' Q-scores, which are automatically provided after the test is completed, are adjusted to the performance of a healthy population separated according to gender and age (age groups in the normative group: 18-19, 20-29 and 30-60 years of age).
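The correspondence between a Q-score of 1.25 and roughly the 89th percentile can be checked with a short sketch. It assumes that Q-scores behave like standard normal z-scores in the normative group, which is an assumption made here for illustration and not a statement about how the manufacturer derives its norms; the function name is hypothetical.

```python
from statistics import NormalDist


def q_to_percentile(q):
    """Percentile of a Q-score, assuming a standard normal reference distribution."""
    return 100 * NormalDist().cdf(q)


# Under the normality assumption, a Q-score of 1.25 lies a little above
# the 89th percentile of the normative distribution.
percentile_at_cutoff = q_to_percentile(1.25)
```

Under the same assumption, roughly one in ten healthy test-takers would be expected to reach the cut-off on any single cardinal.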

Ethics
The Regional Ethics Committee in Stockholm approved this study (2005/554-31/3 and 2011/1700-32), which was conducted in accordance with the latest Helsinki Protocol. All patients and controls consented both orally and in writing to participation in this study.

Results
QbTest+ performance with and without ADHD medication
Figure 2 illustrates the considerable individual variation in QbTest+ performance. Among the 54 patients who medicated as usual on the day of testing at baseline, 35 (65%) scored above the clinical cut-off (Q-score ≥ 1.25) on at least one of the QbTest+ cardinals. Out of the 13 patients who by mistake omitted their ADHD medication on the day of testing, 11 (85%) scored above the clinical cut-off on at least one of the Qb-variables. The difference between the medicated and the temporarily non-medicated patients was not significant by the χ²-test [χ²(1) = 1.91, p = .167]. Neither was the difference in the median Q-scores between patients on and off medication.
Associations between QbTest+ performance and rating scale scores
Table 3 shows the Spearman correlation coefficients between QbTest+ performance on the one hand and rating scale scores in patients medicating at baseline on the other. QbActivity correlated positively with concurrent ASRS ratings and QbInattention with concurrent BADDS ratings. The correlations were in the moderate range. There were also significant correlations between baseline Q-scores (particularly QbInattention) and self-rated BADDS and ASRS scores 4 years later (Table 3).
Rating scale and Q-scores over the course of 4 years
Table 2 shows that patients tested twice on the QbTest+ (N = 41) performed significantly better on the follow-up test than on the baseline test 4 years earlier. Conforming to the general pattern reported in a previous study of ours (Nylander et al., 2021), the ADHD patients also improved over the course of 4 years (Table 2) according to the BADDS (Z = −3.44, p = .001, η² = 0.29), ASRS (Z = −2.20, p = .028, η² = 0.11) and the CGI-S (Z = −3.86, p < .001, η² = 0.37). GAF estimates, by contrast, did not change significantly (Z = −0.91, p = .362).
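The comparison of medicated and non-medicated patients reported above can be retraced from the published counts (35 of 54 versus 11 of 13 above the cut-off). The sketch below computes the Pearson chi-square statistic for a 2 × 2 table without continuity correction; that the uncorrected statistic was used is an assumption, made because it reproduces the reported value. The function name is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: medicated (35 above cut-off, 19 below); non-medicated (11 above, 2 below).
chi2, p = chi2_2x2(35, 19, 11, 2)
```

With these counts the statistic comes out at about 1.91 with a p-value of about .167, matching the values given in the text.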

Discussion
A striking feature in our data is the considerable individual variation in QbTest+ performance (Figure 2). Many patients had Q-scores within the normal range on at least one of the cardinals and the medians were below the clinical range. This variability notwithstanding, 65% of the patients tested while on medication scored above the clinical cut-off proposed by the manufacturer (i.e. a Q-score ≥ 1.25 on at least one of the cardinals). A group of patients omitted by mistake to take their stimulant medication on the day of testing. In this unintended "experimental" group, 85% scored above the QbTest+ clinical threshold. This difference was not statistically significant. Likewise, the median Q-scores of patients tested on or off medication did not differ significantly. However, since many studies report that the QbTest+ can detect beneficial effects of central stimulant treatment in adults (Bijlenga et al., 2015; Edebol et al., 2013a; Ginsberg et al., 2012), it is quite possible that our failure to detect a significant effect of missed medication was simply due to a lack of statistical power, as there were only 13 patients in our drug-free condition.
Table 3. Spearman correlation coefficients in non-medicated patients with ADHD (N = 13) between QbTest+ cardinals and self-reports (BADDS and ASRS) and clinical measures (GAF and CGI-S) at baseline (N = 13 (12 for CGI-S); upper panel) or at the 4-year follow-up (N = 7-8; lower panel).
At baseline, we found significant but modest correlations between Q-scores on the one hand and scores on the BADDS and ASRS self-reports on the other. Similar results have been reported by Bijlenga et al. (2015) in a larger group of patients (see also Lis et al., 2010). More surprisingly, we found that Q-scores at baseline were correlated with symptom self-ratings 4 years later. For instance, the correlations between baseline QbInattention and follow-up BADDS and ASRS scores were in the +0.5 range, and the correlation between baseline QbActivity and follow-up ASRS scores was only slightly lower. Needless to say, this finding needs to be replicated by independent studies before being fully accepted. If indeed confirmed, QbTest+ performance might constitute a useful tool (along with other sources) to predict the likely long-term prognosis of the disorder.
Even though we report quite high correlations between the objective test and the subjective self-reports, the associations explain part of the variance only (10-25% in this case). This disparity may be due to several factors. The two test modes may tap qualitatively different constructs (see Bijlenga et al., 2015) or reflect metacognitive difficulties in rating one's own shortcomings (Alexander & Liljequist, 2016; Butzbach et al., 2021; Du Rietz et al., 2016; Eckerström et al., 2016; Fuermaier et al., 2015). A patient's responses on a symptom self-report might mirror in part accumulated long-term outcomes (e.g. academic, self-esteem and social functioning) rather than the actual symptoms (see Shaw et al., 2012). A final possibility is that self-reports and objective tests capture different, albeit related, cognitive endophenotypic levels, with the assessment goal determining the appropriate measurement method (see Barkley & Fischer, 2011).
At follow-up after 4 years, both the QbTest+ and the self-reports indicated that the patients had improved. This observation is in line with the common observation that ADHD symptoms can abate or change in character with time (Kooij et al., 2019), possibly as a consequence of stimulant treatment (Edebol et al., 2013b; Ginsberg et al., 2012). Many studies show that ADHD medication alleviates core ADHD symptoms in the short term (Kooij et al., 2019). However, its long-term efficacy remains more uncertain (e.g. Cortese et al., 2018).
Strengths of this study include the careful diagnostic procedure, which all patients underwent with experienced psychiatrists at a specialized ADHD unit, as well as the inclusion of various and complementary measures. Another strength is the length of the follow-up: follow-ups over several years are rare. The naturalistic study design also yields high ecological validity.
The naturalistic study design is also a limitation, with no control over medication type, discontinuations, doses and visits over the course of 4 years. Another limitation is the relatively small sample size. Several patients also had other psychopharmacological treatments to mitigate comorbid disorders, such as depression. The role of these comorbid affective disorders (and/or their medications) in causing QbTest+ performance deficits is insufficiently researched, and this situation hampers any firm conclusions. However, according to Mesquita et al. (2016), depressive symptom scores do not correlate with performance on a continuous performance test related to the QbTest+ in adults with ADHD. Nevertheless, an interesting avenue for future research would be to explore the potential contribution of, for instance, depression in causing impairments in QbTest+ performance in patients with ADHD. For example, Elliot et al. (1996) reported that clinically depressed people are overly sensitive to perceived failure during neuropsychological testing: having solved one task incorrectly, depressed patients were more prone than controls to fail the subsequent problem. Hypersensitivity to negative feedback may impair performance in a continuous performance task such as the QbTest+, and should be distinguished from more rash testing approaches that would produce an equally poor result, but for partly different reasons. Unfortunately, our present data lack the resolution needed for these kinds of fine-grained analyses.

Conclusions
There was large individual variability in QbTest+ performance among adults with ADHD, even though a fair share scored above the clinical cut-off while under medication. At the 4-year follow-up, fewer patients scored above the QbTest+ cut-off and their symptom self-ratings had improved. The QbInattention score at baseline correlated significantly with follow-up self-ratings.