A pilot examination of the validity of stylus and finger drawing on visuomotor-mediated tests on ACEmobile

ABSTRACT
Introduction: Cognitive assessments, such as the Addenbrooke’s Cognitive Examination (ACE-III) and Montreal Cognitive Assessment (MoCA), have been modified for administration using tablet computers. While this offers important advantages for practice, it may also threaten test validity. The current study sought to test whether administering visuospatial and writing tests using a tablet (finger or stylus drawing) would demonstrate equivalence to traditional pencil and paper administration on ACEmobile.
Method: This study recruited 26 participants with Alzheimer’s disease and 23 healthy older adults. Most participants had low familiarity with using a tablet computer. Participants completed ACEmobile in its entirety, after which they repeated the infinity loops, cube, and clock drawing and sentence writing tests by drawing with a stylus and their finger onto an iPad. Performance on the drawing and writing tests using a stylus, finger, and pencil was compared.
Results: Statistically significant differences were observed between the finger and pencil administration on the ACEmobile, with participants performing worse on the finger drawing trials. Differences in scores were most apparent on the sentence writing task. In contrast, no statistically significant differences were observed between the pencil and stylus administration.
Discussion: The findings of this pilot study have important implications for clinical neuropsychology and demonstrate that administering ACEmobile drawing tests with finger drawing is invalid. However, due to the small sample size, a lack of counterbalancing, and the narrow range of scores of the dependent variable, we are unable to confidently interpret the validity of stylus drawing. This is an important consideration for future research.

Neuropsychological assessments, such as the Addenbrooke's Cognitive Examination (ACE-III; Hsieh et al., 2013), Mini Mental State Examination (MMSE), Alzheimer's Disease Assessment Scale-Cognition (ADAS-Cog; Rosen et al., 1984) and Montreal Cognitive Assessment (MoCA; Nasreddine et al., 2005) are widely used to inform the differential diagnosis of neurodegenerative syndromes, such as Alzheimer's disease (AD). The theory underpinning cognitive assessment is relatively simple: if we administer the same test in a consistent and standardized format, the results can be compared with normative data and a meaningful clinical interpretation made. For the ACE-III, provided the assessment is administered in a standardized way, scores of 84 and above are often considered to represent "normal cognitive variation," whereas scores below 84 are suggestive of a potential neurodegenerative disease (So et al., 2018).
Although pencil and paper-based cognitive screening tests such as the ACE-III may appear to be straightforward to administer, it is becoming increasingly clear that there are significant issues with test administration and scoring. For example, several studies have reported that scoring errors are extremely common on the ACE-III (Newman et al., 2018; Say & O'Driscoll, 2022). ACEmobile, the tablet-based version of the ACE-III, includes automated scoring and significantly reduces the frequency of errors (Newman et al., 2018). While there are many advantages to incorporating technology into neuropsychological practice (Miller & Barr, 2017), adapting assessments to be administered via an iPad or computer means potentially changing how the stimuli are presented and how the patient interacts with the assessment (Jenkins et al., 2016).
There have been relatively few studies which have investigated the impact that altering the assessment paradigm with technology has on performance. Wallace et al. (2019) found that while scores on the electronic version of MoCA (eMoCA) were equivalent to the pencil and paper MoCA, older adults performed worse on the visuospatial measures, which involved an interaction with the tablet. This finding, however, was not replicated by Berg and colleagues (2018). Picard et al. (2014) reported that children's drawing on an iPad with their finger was worse than on pencil and paper tasks. They argued that finger drawing requires more proximal motor skills, whereas pen drawing requires distal motor processing. Kirkorian et al. (2020) elaborated on this and found that for adults, pencil and paper drawing was associated with the highest degree of accuracy, followed by stylus drawing, with finger drawing being noticeably worse. Lastly, a study by Gerth et al. (2016) compared handwriting performance on paper and tablets when using a pen. They concluded that the lack of friction on the tablet resulted in a faster writing speed, which required higher levels of motor control and was therefore more demanding. This research provides a basis for questioning whether tablet-based assessments of visuospatial functioning are valid as part of a dementia assessment.
CONTACT Donnchadh Murphy donnchadh.murphy@plymouth.ac.uk Faculty of Health, University of Plymouth, Plymouth, UK
This article has been corrected with minor changes. These changes do not impact the academic content of the article.
Overall, due to the clinical utility of measures of drawing (Jacobs et al., 2012), they are an essential component of cognitive screening assessments for dementia, and it is therefore important that any alterations to the testing paradigm are carefully validated. As such, it is important to evaluate the psychometric characteristics of these measures when they are administered directly on a tablet, to inform the development of ACEmobile. This study hypothesized that altering the assessment paradigm, i.e., replacing an over-learned motor skill (pencil drawing) with a novel task (stylus or finger drawing on a tablet), would increase the difficulty of the task and result in significantly worse performance on visuospatial and writing tests in ACEmobile.

Methods
This study recruited participants with a diagnosis of Alzheimer's disease (AD) and healthy controls. Participants with AD were primarily recruited through local memory clinics, disease registers (e.g., Join Dementia Research (JDR)), and specialist health clinics. A convenience retrospective sampling approach was used. This study did not independently diagnose AD and instead recruited participants who had received an AD diagnosis through a National Health Service (NHS) memory clinic that adhered to ICD-9 criteria. The healthy control group was also recruited through dementia registers (JDR) and through word of mouth from participants.
Potential participants who expressed an interest in the study were sent a participant information sheet and contacted a week later to determine if they were interested in participating. People who remained interested in participating were screened against the inclusion criteria, i.e., a clinical diagnosis of AD made through a memory clinic, aged 60 years and older with English as a first language, with no sensory issues that would prevent participation, and without any confounding neurological conditions which affect cognition, e.g., stroke. Participants were also screened on the day of assessment to ensure they had not taken cognition-affecting medication prior to the appointment (i.e., benzodiazepines, sedatives, or alcohol). Only participants with the mental capacity to provide informed consent were recruited to this study. All participation was voluntary, and participants gave their informed written consent to participate.
All research appointments were conducted in a distraction-free and standardized clinical environment or in the participants' homes. Participants were initially asked for demographic information and completed the NART as a measure of estimated premorbid intelligence. After this, they completed a short orientation task on the iPad to familiarize them with how it works. Following this, the full clinical record form (CRF) was administered, to standardize the delivery of the assessment. Participants completed the ACE-III first, followed by the drawing tasks on the app using a stylus and their finger. The order of the stylus and finger administration was counterbalanced. A paired digital touch-sensitive stylus was used. The assessments were double scored, and a double data entry system was used to reduce the potential for human error.

ACEmobile
ACEmobile is a freely available computerized version of the ACE-III and was developed by the research team in this study (Newman et al., 2018). The ACE-III is a widely used and empirically supported screening measure for dementia that was specifically designed for discriminating between amnesic and non-amnesic dementias (Hsieh et al., 2013). The ACE-III takes approximately 15 minutes to complete and comprises subtests of attention, memory, fluency, language, and visuospatial ability. These subtests can be combined to provide a composite score. The ACE-III includes four drawing or writing tasks: infinity loops, cube drawing, clock drawing, and sentence writing. For this study, participants were instructed to complete these tasks under three conditions: pencil and paper, using a stylus to draw onto an iPad, and using a finger to draw onto an iPad. The instructions were the same in each condition. A composite score was created for each condition by summing scores on each of the drawing and writing tasks on ACEmobile.
A power calculation was completed to inform the minimum sample size for this study. The minimum sample size required was calculated using an effect size estimation in the moderate to large range (), on the basis that small between-condition differences in performance may be clinically insignificant. A power calculation for a two-tailed Wilcoxon signed-rank test on the composite ACEmobile drawing scores (α = 0.05, 1 − β = 0.9) indicated that a minimum sample of 47 was required. The power calculation was completed using G*Power software.
The dependent variables were participants' composite ACEmobile drawing scores (0-10), which were calculated by summing their scores on infinity loops (0-1), cube drawing (0-2), clock drawing (0-5), and sentence writing (0-2). The independent variables were the method of administration (pencil and paper, using the iPad with a stylus, or using the iPad with a finger) and diagnosis (AD and healthy controls). A lower score indicated a worse performance on each of the ACE-III subtests. Demographics and summary statistics for each of the ACE-III subtests were presented as means and standard deviations or medians and interquartile ranges. Diagnostic tests were performed to determine whether the iPad-based test alters the overall cutoff of ACEmobile.
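The composite scoring described above can be sketched as follows. This is a minimal illustration and not the ACEmobile scoring code: the dictionary keys, the function name `composite_score`, and the example scores are hypothetical, and only the subtest ranges (0-1, 0-2, 0-5, 0-2) are taken from the text.

```python
# Maximum score for each drawing/writing subtest, as stated in the text.
# The key names are hypothetical labels chosen for this sketch.
SUBTEST_MAX = {
    "infinity_loops": 1,
    "cube": 2,
    "clock": 5,
    "sentence_writing": 2,
}


def composite_score(scores):
    """Sum the four subtest scores into a 0-10 composite, validating ranges."""
    total = 0
    for name, max_score in SUBTEST_MAX.items():
        value = scores[name]
        if not 0 <= value <= max_score:
            raise ValueError(f"{name} must be between 0 and {max_score}, got {value}")
        total += value
    return total


# Hypothetical participant scored under the pencil condition.
pencil = {"infinity_loops": 1, "cube": 2, "clock": 4, "sentence_writing": 2}
print(composite_score(pencil))  # 9
```

The same function would be applied once per condition (pencil, stylus, finger) to give the three composites that are compared in the analyses.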

Data analysis plan
Non-parametric analyses were used throughout due to the small size of the sample, the anticipated skewed distribution of scores on ACEmobile, and the small range of scores on the measures. The representativeness of the AD and healthy control samples was compared using a series of Mann-Whitney and Fisher's exact tests. Composite "pencil," "finger," and "stylus" variables were created by summing the scores on infinity loops, cube drawing, clock drawing, and sentence writing.
Spearman's rank correlations were used to calculate the parallel form reliability between pencil, finger, and stylus composite scores, and Wilcoxon signed-rank tests were used to test whether scores on the pencil, finger, and stylus composites differed significantly. A moderate to large magnitude effect size would be required to produce a clinically significant effect. All analyses were conducted in Stata.
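As an illustration of the parallel-form correlation described above, the following minimal Python sketch computes Spearman's rho as the Pearson correlation of tie-averaged ranks. It is not the authors' Stata analysis; the function names and the composite scores are invented for the example.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical composite scores (0-10) for six participants.
pencil = [9, 8, 10, 7, 9, 6]
stylus = [9, 7, 10, 8, 8, 6]
rho = spearman_rho(pencil, stylus)
print(f"rho = {rho:.2f}")
```

Ranking before correlating is what makes the statistic suitable for the narrow, skewed score range described in the analysis plan.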

Results
In total, 51 participants were recruited to the study: 25 healthy controls and 26 participants with a diagnosis of AD. Two healthy controls were excluded from the analysis: one due to poor effort (based both on clinical observation and on their ACEmobile score of 78) and the second due to a previous stroke, not initially reported, which had resulted in some dominant hand weakness. As such, the final sample consisted of 23 healthy controls.
In the AD group, one patient had macular degeneration but believed this did not affect their eyesight, which was verified by their carer, and therefore they were included in the study. One patient experienced severe fatigue during the assessment, which resulted in the assessment being discontinued early. No participants reported neuromotor symptoms, although a tremor was observed during the assessment in one patient with AD. Two healthy controls had a diagnosis of osteoarthritis and experienced pain when using a pencil. Additionally, one healthy control had a diagnosis of dyslexia. There were incidents in which the app crashed (n = 1) or the stylus was unavailable or not working (n = 3), and as such, there are missing data for the stylus and finger administration.
The two groups were compared with respect to age, gender, handedness, and estimated premorbid intelligence (see Table 1). Of note, the AD group were significantly older and reported having less experience using an iPad prior to this study, although the latter difference was not statistically significant. As expected, the control group scored significantly higher than the AD group on ACEmobile, with the median and interquartile scores falling above the cutoff of 84 in the control group.
Based on the CBI-R, the difficulties most frequently observed in the AD group were with memory (M = 2.31, SD = 0.68), followed by occasional low motivation (M = 1.43, SD = 0.99). Given the median ACEmobile score (73.5) and the relatively preserved self-care (M = 0.24, SD = 0.63) and everyday skills (M = 1.29, SD = 1.16), as indicated on the CBI-R, the sample is most reflective of mild AD.

Validity of stylus administered tests
A Spearman's rank correlation was computed to measure the relationship between the stylus and pencil composite scores. There was a moderate positive correlation between performance using pencil and stylus (see Table 2). This provides some evidence of parallel form reliability, although a significant proportion of this relationship was explained by measurement error. No statistically significant differences were observed between composite scores on the pencil and stylus administered tests (see Table 2). An effect size was computed and was observed to be in the small range (r = 0.15). This analysis was repeated in patients with AD only (n = 20), and no statistically significant differences were observed (Z = −1.39, p = 0.16), although this analysis is underpowered. However, on clock drawing, more participants with AD performed below the cutoff score recommended by Charernboon (2017), suggesting that stylus administration may increase the novelty or demands of the task (see Supplementary Information). Additional subtest-level analyses and descriptive information are provided in Supplementary Information.

Validity of finger drawing on visuospatial subtests
A Spearman's rank correlation was performed to measure the relationship between finger and pencil composite scores. There was a moderate positive correlation between pencil and finger, which provides some evidence of parallel form validity (see Table 2). However, there were statistically significant differences between scores on the pencil and finger administered tests, which suggests that they are not comparable. The sum of negative ranks (353) was noticeably larger than the sum of positive ranks (112), which means that participants performed worse on the finger administered tests. A moderate effect size was observed between the two conditions (r = 0.305). On a subtest level (see Table 3), while more than half of participants had matching scores on pencil and finger administered visuospatial subtests, a higher percentage performed worse on the finger tasks, which suggests it may be a more challenging task. This difficulty was even more pronounced on the sentence writing task, on which 38% of participants performed worse with the finger.
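To illustrate the signed-rank comparison reported above, the following minimal Python sketch computes the sums of positive and negative ranks, a normal-approximation Z, and an effect size r = |Z|/√N. It is not the authors' Stata analysis. The scores are invented, zero differences are dropped before ranking, no continuity correction is applied, and taking N as the number of non-zero pairs is an assumption, since the text does not state which formula was used for r.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, z, r) for paired samples using x - y differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    # Tie correction: subtract (t^3 - t) / 48 for each group of t tied values.
    for value in set(abs_diffs):
        t = abs_diffs.count(value)
        var -= (t ** 3 - t) / 48
    z = (w_pos - mean) / math.sqrt(var)
    r = abs(z) / math.sqrt(n)  # assumed convention: N = non-zero pairs
    return w_pos, w_neg, z, r


# Hypothetical composite scores (0-10) for eight participants.
pencil = [9, 8, 10, 7, 9, 6, 8, 10]
finger = [8, 8, 9, 5, 9, 7, 6, 9]

# Differences are finger - pencil, so negative ranks mean worse finger scores.
w_pos, w_neg, z, r = wilcoxon_signed_rank(finger, pencil)
print(f"W+ = {w_pos}, W- = {w_neg}, z = {z:.2f}, r = {r:.2f}")
```

Statistical packages differ in how they handle zero differences and ties, so a sketch like this may not reproduce software output exactly.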
To understand this difference in composite scores further, a "Finger Loss" variable was created by subtracting the finger score from the pencil score. Scores had a seven-point range (−2 to 5) and a wide standard deviation (1.6) around the mean (M = 0.64), suggesting a relatively wide distribution of differences. To consider whether finger loss was related to any of the other variables collected in this study, scores were correlated with "Tablet Experience," estimated IQ (based on NART scores), and cognition (ACEmobile total score). "Finger Loss" was not significantly associated with tablet experience (r = 0.01; p = 0.48), premorbid IQ (r = −0.15; p = 0.16), or cognitive ability (r = −0.03; p = 0.43).
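The "Finger Loss" difference score can be sketched in a few lines of Python. The scores below are invented for illustration and the variable names are hypothetical; only the definition (pencil composite minus finger composite) comes from the text.

```python
import statistics

# Hypothetical composite scores (0-10) for eight participants.
pencil = [9, 8, 10, 7, 9, 6, 8, 10]
finger = [8, 8, 9, 5, 9, 7, 6, 9]

# Positive values mean the participant scored lower with the finger.
finger_loss = [p - f for p, f in zip(pencil, finger)]

summary = {
    "min": min(finger_loss),
    "max": max(finger_loss),
    "mean": statistics.mean(finger_loss),
    "sd": statistics.stdev(finger_loss),  # sample standard deviation
}
print(finger_loss)  # [1, 0, 1, 2, 0, -1, 2, 1]
print(summary["mean"])  # 0.75
```

A per-participant difference of this kind can then be correlated with other variables, such as tablet experience, in the same way as the composite scores.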

Discussion
Standardized cognitive assessment, such as ACEmobile, is predicated on the assumption that the assessment is administered using a consistent approach, so that an individual's score can be reliably attributed to their neuropsychological functioning. The current study sought to test the validity of administering visuomotor-mediated tests from the ACEmobile using an iPad.
Participants were asked to complete the infinity loop, 3D wire cube, clock drawing task, and the sentence writing task using the standard instructions (pencil and paper) and again on the iPad, using both the stylus and their finger to draw. This study found that administering visuomotor subtests of ACEmobile using the stylus resulted in comparable scores to pencil administration, and no statistically significant differences were observed between the two methods of administration. Although the sample size used in this study was small, which increases the risk of type II statistical error, the effect size between the two conditions was small. However, due to the methodological limitations of this study, it is not possible to conclude that stylus and pencil administration are equivalent, and further well-controlled research will be required to determine this.
In terms of the validity of finger administration of ACEmobile, overall scores differed significantly when compared with pencil and paper administration. Participants performed comparatively worse across all finger drawing and writing tasks, which suggests that the added novelty of this task increases the cognitive load and results in a more challenging test. This difference was most apparent on the sentence writing task. Writing using a pencil is an overlearned skill, i.e., it involves automatic processing (Hulstijn & van Galen, 1988), which means that subjects can focus their cognitive processing entirely on the task (writing two sentences). In contrast, finger writing involves a different set of motor skills (Picard et al., 2014), and this likely means that participants were required to split their conscious processing between the mechanics of finger writing and the task at hand, thus increasing the cognitive load of the task. Overall, administering ACEmobile using finger drawing was not equivalent to pencil and paper, and resulted in significant differences in how visuospatial scores on the ACEmobile can be interpreted.
To the authors' knowledge, this is only the third neuropsychological study to evaluate the impact of mode of administration on visuomotor-mediated cognitive tests, after the studies by Wallace et al. (2019) and Berg et al. (2018) on the validity of the eMoCA. This study supports the findings of previous research, which indicated that drawing performance using a finger on a tablet is significantly worse than drawing completed using pencil and paper (Kirkorian et al., 2020; Picard et al., 2014). Whereas finger drawing on a tablet may involve a fundamentally different motor process compared to pencil and paper (Picard et al., 2014), holding a stylus in a similar way to a pencil means that this task likely relies on similar motor processes.

Methodological limitations
It is important to interpret the results of this study with respect to the methodological limitations. Firstly, this study had a relatively small sample, which was further reduced due to participant exclusions, the app crashing, and the unavailability of a stylus. Although the sample size of this study was larger than the two previous eMoCA evaluations (Berg et al., 2018; Wallace et al., 2019), it was not sufficiently powered to detect differences of small magnitude. Furthermore, due to the narrow range of scores (i.e., 1-10), non-parametric analyses were used, which are less sensitive at detecting statistical significance. As such, this study cannot rule out that stylus administration may have had a subtle influence on performance.
The order of test administration was counterbalanced for stylus and finger administration; however, it was not counterbalanced for pencil administration, and all participants completed the pencil and paper task prior to the tablet tasks. This decision was taken so that we could assess AD patients' ability to engage in formal assessment before introducing additional novelty. However, this approach introduces potential bias into the study and raises the issue of practice effects. As such, it is important to take a critical perspective on how to interpret the findings: while this study can provide evidence of invalidity (e.g., finger drawing), it cannot provide evidence of equivalence.
A final limitation of this study was that data were missing for 19% of patients in the stylus condition. The primary reason for this was the stylus being unavailable or not working at the time of the assessment. Although this reduced the sample size, which is a limitation, it perhaps reflects a broader issue associated with human error and relying on a stylus. In discussion among the authors of this paper, losing styli was not uncommon in their clinical practice, and this should be considered.

Recommendations for future research
The results of this study provide tentative evidence that finger drawing and writing are not equivalent to pencil and paper administration of ACEmobile. However, the issue of whether stylus administration is an equivalent approach to test administration still requires investigation. Future research should include a broader range of normally distributed neuropsychological tests of increased difficulty, e.g., the Rey Complex Figure Test (Rey, 1941), or tests which involve speed as a primary outcome, e.g., the Trail Making Test (Reitan, 1958), to examine the validity of stylus administration. Additionally, this research should counterbalance the order of test administration (pencil versus stylus) and be powered sufficiently to detect subtle differences.

Recommendations for practice
Given the growing influence of technology in neuropsychological practice (Miller & Barr, 2017) and the fact that widely popular pencil and paper tests are increasingly being developed for computer or tablet administration (e.g., Q-Interactive), the results of this study have important implications for neuropsychological practice. The primary finding of this study is that finger-based administration of cognitive screening tests is not an equivalent approach to pencil and paper administration of ACEmobile subtests, and we hypothesize that it increases the cognitive load of the assessment. Although this small pilot study only focused on ACEmobile, in the total absence of any better-quality research which suggests otherwise, we recommend that no paper and pencil-based neuropsychological tests are administered via finger drawing. However, there are many tests which have been developed exclusively for tablet administration which ask patients to complete drawing tasks using either their fingers or a stylus, e.g., the Oxford Cognitive Screen Plus (Demeyere et al., 2021). The current study found that finger administration, although not equivalent to pencil administration, was effective at classifying participants (AD vs healthy controls) and achieved fair classification accuracy (AUC = 0.72). Therefore, provided tests are either validated for finger administration or the normative data were collected based on finger administration, this could still be a valid approach in other contexts.
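For readers less familiar with the AUC statistic quoted above, it can be read as the probability that a randomly chosen healthy control scores higher than a randomly chosen participant with AD, with ties counted as one half. The following Python sketch shows that pairwise definition on invented scores; the function name and data are hypothetical and the result is unrelated to the study's reported value.

```python
def auc(controls, patients):
    """Probability that a random control outscores a random patient.

    Ties between a control and a patient count as half a win.
    """
    wins = 0.0
    for c in controls:
        for p in patients:
            if c > p:
                wins += 1.0
            elif c == p:
                wins += 0.5
    return wins / (len(controls) * len(patients))


# Hypothetical finger-condition composite scores (0-10).
controls = [10, 9, 9, 8, 7]
patients = [9, 7, 6, 5, 8]
print(auc(controls, patients))  # 0.8
```

An AUC of 0.5 corresponds to chance-level separation of the groups and 1.0 to perfect separation.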

Conclusion
Overall, this pilot study sought to test whether there were significant differences in scores between visuomotor-mediated tests from the ACEmobile which were administered using a stylus, finger, or pencil. Based on the methodological limitations of this pilot study, i.e., sufficiently powered only to detect differences of moderate magnitude, and the potential for practice effects, the results of this study cannot rule out subtle differences between administration approaches. Furthermore, the results of this study cannot provide evidence of equivalence between approaches. As such, although there were no statistically significant differences between the pencil and stylus composite scores, further research is required to provide evidence of parallel validity. In contrast, this study found that finger administration is an invalid approach for administering ACEmobile subtests. Both patients and healthy controls performed significantly worse on this task, despite already having practice on the paper and pencil task. As such, administering ACEmobile using finger drawing increases the difficulty of the test, and should not be substituted for pencil administration.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Table 1. Demographic information of the AD and control groups.
Note: Age, FSIQ: t-test. Gender, dominant hand, degree: chi-squared. Tablet experience, ACE-III total: Mann-Whitney U.

Table 2. Statistical comparison between methods of test administration.

Table 3. Performance of the stylus and finger against pencil, n (%).