Optimising verbal fluency analysis in neurological patients with dysarthria: examples from Parkinson’s disease and hereditary ataxia

ABSTRACT Background Verbal fluency tests (VFTs) are widely used to assess cognitive-linguistic performance in neurological diseases. However, the influence of dysarthria on performance in tests requiring oral responses is unclear in ataxia and Parkinson’s disease. Objectives To determine the impact of dysarthria on VFT performance and evaluate the validity and reliability of alternative methods for analyzing VFT data. Method Trained raters evaluated dysarthria using VFT recordings in people with ataxia (N = 61) or Parkinson’s disease (PD; N = 69). Total Correct Items scores and qualitative parameters (intrusions, ambiguous verbalizations, perseverations, and interjections) were compared across semantic, phonemic, and alternating fluency tasks. Disease severity was considered as a covariate in the regression model. Results VFT dysarthria ratings correlated with the benchmark (ground truth) dysarthria scores derived from a monologue. Ambiguous responses resulting from unclear speech impeded the rater’s ability to determine if a response was correct. Regression analysis indicated that more severe dysarthria ratings predicted diminished scores in all three tasks (semantic fluency, phonemic fluency and alternating fluency) in the ataxia group. The contribution of disease severity to semantic, phonemic and alternating fluency was reduced substantially in the ataxia group after accounting for dysarthria severity in the model in both groups. Conclusions Dysarthria severity can be estimated based on speech samples derived from VFT. Dysarthria can lead to lower total correct items and is associated with more ambiguous verbalizations in VFT. Dysarthria severity should be considered when interpreting VFT performance in common movement disorders.


Introduction
Oral-motor or speech deficits (e.g., dysarthria) are common in neurodegenerative diseases (Brendel et al., 2013; Kim et al., 2011; Magee et al., 2019). For cognitive tests that require oral responses, dysarthria can unduly influence performance. Participant responses may be delayed or unintelligible because of a motor speech impairment. Verbal fluency tests (VFTs) are widely used in neurological populations to measure cognitive and linguistic abilities, including semantic searching and executive functions (Barbosa et al., 2017; Hoche et al., 2018). Participants are asked to verbally recall words starting with the same letter or from specific semantic categories in one minute (Benton & Hamsher, 1976).
Performance on VFT is typically rated quantitatively via the number of total correct items (Benton et al., 1983) or qualitatively by characterizing responses as clustering, switching, intrusions, or perseverations (Galaverna et al., 2016; Troyer et al., 1997). These features offer insight into participants' cognitive abilities in multiple disease groups with adequate validity and reliability, including dementia and multiple sclerosis (De Araujo et al., 2011; Delis et al., 2004; Henry & Beatty, 2006; Jaimes-Bautista et al., 2020). Data from VFT in disease groups where motor speech impairment impacts speech rate or clarity may not be accurate representations of verbal ability or executive function.
Dysarthria can manifest as reduced speech rate, imprecise articulation, abnormal nasal resonance, poor voice quality, and poor pitch and loudness control. Motor impairments may also impact patients' cognitive capacity to perform lexical searching and quick oral responses through cognitive-motor interference, adversely affecting VFT performance (Camicioli et al., 1998). An investigation into the competing demands of motor and cognitive performance is needed.
VFTs are used in neurodegenerative diseases like Parkinson's disease (PD) and hereditary ataxias to assess verbal fluency despite the potential impact of dysarthria on performance. PD progression leads to changes in speech, voice, cognition and language function (Magee et al., 2019). The majority of patients with PD present with hypokinetic dysarthria characterized by soft, harsh voice quality, imprecise articulation and diminished intelligibility (Duffy, 2019; Ho et al., 1998; Sapir et al., 2010). A review study by Sapir (2014) concluded that these symptoms might be accounted for by bradykinesia and hypokinesia secondary to dopaminergic depletion. The link between dysarthria and VFT performance is not well described, with earlier work appearing to treat VFT as a cognitive task without regard for the effect motor impairment has on verbal response rates (see Table A1).
To overcome these limitations, we analyzed VFT data in two groups where motor speech impairment is common, with the aim of disentangling the influence of dysarthria on VFT performance. To achieve this, we 1) investigated the feasibility of determining dysarthria severity from speech samples collected during VFT; and 2) examined whether dysarthric participants with PD and ataxia present with different VFT outcomes in the context of their dysarthria. It is predicted that trained raters can produce dysarthria ratings with adequate validity and reliability based on VFT performance compared to typical tasks (e.g., monologue); that more severe dysarthria predicts poorer VFT performance in the ataxia and PD groups; and that, when controlling for disease severity, participants with more severe dysarthria demonstrate more diminished VFT scores and more erroneous items compared to those with milder or no dysarthria in the ataxia and PD groups, respectively.

Participants
One hundred and twenty-nine participants (61 with ataxia, 68 with PD) were recruited for the study (Table 1). Participants with a genetically confirmed or idiopathic ataxia were recruited from the Department of Neurology, University Hospital Tübingen, Germany and Alfred Health, Melbourne, Australia (Vogel et al., 2019). Participants with PD were drawn from a previously published study on PD-related cognitive impairments (Pourzinal et al., 2020). For that original study on mild cognitive impairment in PD (PD-MCI), participants who met the UK Brain Bank criteria for idiopathic PD were recruited from movement disorders outpatient neurology clinics in Queensland, Australia; de-identified data from our existing database were shared for the present study (Gibb & Lees, 1988; Pourzinal et al., 2020). Dysarthria status was not an inclusion criterion. Individuals with a co-morbid neurological condition (e.g., dementia) were excluded.

Disease severity
The Part III scores of the Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS; Goetz et al., 2008) were used to determine the disease severity in the PD group. The MDS-UPDRS is a rating scale utilized to evaluate the severity of motor and non-motor PD symptoms. It consists of four parts: I) non-motor experiences of daily living; II) motor experiences of daily living; III) motor examination; IV) motor complications. The assessment was conducted by Movement Disorder Society certified assessors. The Hoehn and Yahr scores of participants with PD were also collected. The Scale for Assessment and Rating of Ataxia (SARA) reflected the ataxia group's disease severity: zero indicates no impairments and 40 indicates the most severe ataxia (Schmitz-Hübsch, 2010). It consists of eight items, including gait, stance, sitting, speech disturbance, finger chase, nose-finger test, fast alternating hand movement and heel-shin slide (Schmitz-Hübsch, 2010). The SARA was conducted by neurologists involved in the primary study (Vogel et al., 2019).

Verbal fluency tests
All participants completed semantic, phonemic and alternating verbal fluency tasks (Delis et al., 2001). Participants recalled as many words as they could starting with the letter "f" (phonemic) and as many animals as they could (semantic; Delis et al., 2001; Hoche et al., 2018). The alternating tasks differed between clinical groups: vegetable and occupation for the ataxia group, and fruit and furniture for the PD group. The time limit for each task was 60 seconds. Both disease groups completed all VFTs in English.
Participants with PD were prompted if they failed to make a response within any 15-second interval. They were also prompted if they generated three consecutive words that did not start with the designated letter. This prompt was only provided once per trial. Participants with ataxia received answers from the assessor if they had questions but did not receive prompts or clarifying questions during test administration.

Dysarthria ratings
Two groups of trained raters, each containing two raters, analyzed speech from the VFT tasks. Speech was rated on a 0-4 Likert scale (0 = normal, 1 = subclinical, 2 = mild, 3 = moderate, 4 = severe) across five integral sub-domains of speech production, including articulation, voice quality, resonance, intelligibility, and naturalness (see Table A2; Vogel et al., 2019). Articulation, voice quality, and resonance reflected speech production accuracy, vocal fold functioning, and velopharyngeal functioning, respectively (Vogel et al., 2019). Intelligibility and naturalness measured the extent to which the speaker can be understood and perceived as natural by listeners, respectively. The average scores between raters were established for all sub-domains (i.e., one participant had five scores for five sub-domains). The highest sub-domain score was selected as the overall dysarthria severity score for each participant because the most severe sub-domain impairment is likely to determine the overall perceived speech quality (see Table A3 for the rater training procedures). As a comparator to VFT-derived dysarthria scores, participants in the ataxia group were instructed to produce an unprepared monologue for one to two minutes. These monologue samples were rated using the same criteria above by trained raters distinct from those who rated dysarthria from VFT. Intelligibility and naturalness were collected from the monologue samples as overarching summative estimates of participants' speech.
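The scoring rule described above (average the two raters within each sub-domain, then take the highest averaged sub-domain as the overall severity) can be illustrated with a minimal sketch. The function name, dictionary keys and example ratings are hypothetical and are not taken from the study data.

```python
from statistics import mean

# The five sub-domains named in the text; the key spellings are illustrative.
SUBDOMAINS = ("articulation", "voice_quality", "resonance",
              "intelligibility", "naturalness")

def overall_dysarthria_severity(rater_a, rater_b):
    """Average two raters' 0-4 scores per sub-domain, then return the
    per-sub-domain averages and the highest average (overall severity)."""
    for ratings in (rater_a, rater_b):
        for domain in SUBDOMAINS:
            if not 0 <= ratings[domain] <= 4:
                raise ValueError(f"{domain} score must lie on the 0-4 scale")
    averaged = {d: mean((rater_a[d], rater_b[d])) for d in SUBDOMAINS}
    return averaged, max(averaged.values())

# Hypothetical ratings for one participant (not study data).
rater_a = {"articulation": 3, "voice_quality": 2, "resonance": 0,
           "intelligibility": 2, "naturalness": 3}
rater_b = {"articulation": 2, "voice_quality": 2, "resonance": 1,
           "intelligibility": 2, "naturalness": 3}

averaged, overall = overall_dysarthria_severity(rater_a, rater_b)
print(averaged)
print("overall severity:", overall)
```

Taking the maximum rather than the mean of the sub-domains follows the rationale given in the text: the most severe sub-domain is assumed to dominate perceived speech quality.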

Total correct items
The Total Correct Items scores in each fluency task were counted. Words with different derivational suffixes (e.g., "functional" and "functioning") were counted as a single correct item, and proper nouns were discounted (Barbosa et al., 2017; Galaverna et al., 2016; Rodrigues et al., 2015). Homophones were counted if the subject clarified their meanings, and words starting with the /f/ sound rather than the letter "f" were discounted.

Qualitative analysis: Error patterns and fillers
Intrusions, ambiguous verbalizations, perseverations, and interjections were scored across tasks (see Table A2). Intrusions were words that violated instructions. Ambiguous verbalizations were unrecognized words. Perseverations were repeated correct items or intrusions. Interjections were meaningless vocalizations used by participants.
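As an illustration only, the scoring rules for the phonemic task can be sketched as a small tally over a rater's transcript. Several simplifications are assumptions of this sketch and not part of the study protocol: unintelligible items are marked with the hypothetical token "[unclear]", fillers are matched against a short hypothetical list, proper nouns are detected by capitalization, and derivational variants are supplied by hand through a hypothetical `base_forms` mapping.

```python
from collections import Counter

INTERJECTIONS = {"um", "uh", "er", "hmm"}  # assumed filler list
UNCLEAR = "[unclear]"                      # assumed mark for unintelligible items

def score_phonemic(responses, letter="f", base_forms=None):
    """Tally Total Correct Items and qualitative categories for one trial."""
    base_forms = base_forms or {}
    counts = Counter()
    seen_words, seen_bases = set(), set()
    for raw in responses:
        token = raw.strip()
        if not token:
            continue
        lowered = token.lower()
        if lowered in INTERJECTIONS:
            counts["interjections"] += 1
        elif token == UNCLEAR:
            counts["ambiguous"] += 1
        elif lowered in seen_words:
            counts["perseverations"] += 1  # repeated correct item or intrusion
        else:
            seen_words.add(lowered)
            if token[0].isupper() or not lowered.startswith(letter):
                counts["intrusions"] += 1  # proper noun or wrong initial letter
            else:
                base = base_forms.get(lowered, lowered)
                if base not in seen_bases:  # derivational variants count once
                    seen_bases.add(base)
                    counts["total_correct"] += 1
    return counts

# Hypothetical transcript (not study data).
transcript = ["fish", "um", "functional", "functioning", "Frank",
              "phone", "[unclear]", "fish", "fog"]
variants = {"functional": "function", "functioning": "function"}
example_counts = score_phonemic(transcript, base_forms=variants)
print(dict(example_counts))
```

In this sketch an unintelligible item can never contribute to the correct total, which is the mechanism by which ambiguous verbalizations may lower Total Correct Items.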

Statistical analysis
The relationship between dysarthria ratings based on VFT and dysarthria ratings based on the monologue task (criterion) was examined via Spearman's correlation in the ataxia group. Hierarchical linear regressions were used to determine whether the overall dysarthria severity score contributed to differences in Total Correct Items scores among participants with different disease severity, separately in the PD and ataxia groups. Education was set as a covariate. Predictors were entered in a pre-arranged order: disease severity, followed by years of education, followed by dysarthria severity. Age was not included as a covariate to reduce multicollinearity bias because it was correlated with both disease severity and dysarthria.
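For readers unfamiliar with the criterion-validity check, Spearman's ρ is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a plain-Python illustration under that definition; the function names and the example ratings are hypothetical, and the study itself used commercial statistics packages.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical intelligibility ratings (0-4) from VFT and monologue samples.
vft_ratings = [0, 1, 1, 2, 3, 3, 4, 2]
monologue_ratings = [0, 1, 2, 2, 3, 4, 4, 1]
print(spearman_rho(vft_ratings, monologue_ratings))
```

Ordinal Likert ratings produce many ties, which is why the average-rank form is used here instead of the shortcut formula based on squared rank differences.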
In addition, because speech impairment is included in both the SARA and MDS-UPDRS Part III scales, a supplemental hierarchical linear regression investigated the relationship between Total Correct Items and dysarthria among different disease severity levels, with speech items omitted from the disease severity measures.
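The logic of a hierarchical regression (enter predictor blocks in a fixed order and inspect how much variance each new block explains beyond the earlier ones) can be sketched as follows. This is a minimal ordinary least squares illustration in plain Python, assuming the entry order stated in the text; the function names and all numbers are hypothetical, and it reports only R² and its change, not the Betas or significance tests reported in Table 2.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes a is non-singular (predictors are not perfectly collinear)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def r_squared(predictors, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictors."""
    n = len(y)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def hierarchical_r2(blocks, y):
    """blocks: ordered (name, values) pairs. Returns (name, R^2, delta R^2)."""
    results, entered, previous = [], [], 0.0
    for name, values in blocks:
        entered.append(values)
        r2 = r_squared(entered, y)
        results.append((name, r2, r2 - previous))
        previous = r2
    return results

# Hypothetical data for eight participants (not study data).
severity = [5, 10, 12, 15, 18, 20, 25, 30]
education = [16, 12, 14, 10, 15, 11, 13, 9]
dysarthria = [0, 1, 1, 2, 2, 3, 3, 4]
total_correct = [22, 18, 19, 14, 15, 10, 11, 6]

steps = hierarchical_r2([("disease severity", severity),
                         ("years of education", education),
                         ("dysarthria severity", dysarthria)], total_correct)
for name, r2, delta in steps:
    print(f"{name}: R2 = {r2:.3f}, delta R2 = {delta:.3f}")
```

For the supplemental analysis, the same sketch would apply with the speech item subtracted from the disease severity score before it is entered.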
The average of each qualitative outcome measure across three VFT types was calculated (i.e., intrusions, ambiguous verbalizations, perseverations, and interjections). One-way ANOVA with post-hoc Tukey tests was used to compare each parameter across dysarthria severity levels.
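The omnibus comparison described above can be illustrated with a short sketch that computes the one-way ANOVA F statistic, its degrees of freedom, and η² (between-group sum of squares divided by total sum of squares). The function name and the example values are hypothetical; the post-hoc Tukey comparisons are not reproduced here because they require the studentized range distribution.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of groups."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared

# Hypothetical mean ambiguous verbalizations per participant, grouped by
# dysarthria severity level (not study data).
by_severity = [
    [0.0, 0.3, 0.0, 0.3],   # no dysarthria
    [0.3, 0.7, 0.3, 0.0],   # mild
    [0.7, 1.0, 0.3, 0.7],   # moderate
    [2.3, 3.0, 1.7, 2.7],   # severe
]
f_stat, df_b, df_w, eta2 = one_way_anova(by_severity)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, eta squared = {eta2:.3f}")
```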
The intra-class correlation was calculated to estimate interrater reliability. Intra-class correlation with a two-way random model estimated the absolute agreement of dysarthria ratings by two raters across articulation, voice quality, resonance, intelligibility and naturalness (Koo & Li, 2016). The intra-class correlation of resonance in the PD group was replaced by the agreement rate due to the lack of intra-population variation. Statistical analyses were performed using Minitab 19 (Minitab LLC, Center County, Pennsylvania) and Statistical Package for the Social Sciences (25.0; IBM Corp., Armonk, NY).
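A two-way random-effects, absolute-agreement intra-class correlation can be computed from the mean squares of a participants-by-raters table. The sketch below assumes the single-measures form, ICC = (MSR - MSE) / (MSR + (k - 1)MSE + (k/n)(MSC - MSE)), where MSR, MSC and MSE are the mean squares for participants, raters and error; the text does not state whether single or average measures were used, so that choice is an assumption. It also shows why a percentage agreement is the fallback when ratings have no variability. Function names and example ratings are hypothetical.

```python
def icc_absolute_agreement(ratings):
    """Single-measures, two-way random, absolute-agreement ICC.
    ratings: one row per participant, one column per rater.
    Returns None when the ratings have no variability (ICC undefined)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    denominator = msr + (k - 1) * mse + k * (msc - mse) / n
    if denominator == 0:
        return None
    return (msr - mse) / denominator

def percent_agreement(ratings):
    """Percentage of participants given identical scores by all raters."""
    same = sum(1 for row in ratings if len(set(row)) == 1)
    return 100 * same / len(ratings)

# Hypothetical articulation ratings by two raters (not study data).
articulation = [[0, 0], [1, 2], [2, 2], [3, 3], [4, 3], [1, 1]]
print(icc_absolute_agreement(articulation))

# Ratings with no variability: the ICC is undefined, so report agreement.
resonance = [[0, 0], [0, 0], [0, 0], [0, 0]]
print(icc_absolute_agreement(resonance), percent_agreement(resonance))
```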

Dysarthria and total correct items
Table 1 shows the median and range of dysarthria ratings and the mean of the Total Correct Items score in semantic, phonemic, and alternating fluency from two disease groups. Although this is not the main purpose of this study, higher dysarthria severity (see Table 1, p = 0.001**) and a wider dysarthria rating range (see Table 1) were observed in the ataxia group compared to the PD group. The ataxia group and the PD group employed different task instructions in VFTs except for semantic fluency. In semantic fluency, the two groups did not exhibit statistically significant differences (see Table 1; p = 0.437).

Correlation between dysarthria ratings based on VFT and monologue
Dysarthria ratings based on VFT were positively correlated with dysarthria ratings based on the monologue task in two summative speech quality measures, namely intelligibility and naturalness (see Figure 1). The Spearman's ρ between ratings based on VFT by two trained raters and ratings based on monologue by independent raters was 0.715 (p < 0.001, Figure 1A) and 0.727 (p < 0.001, Figure 1B) for intelligibility and naturalness, respectively, indicating strong positive correlations.

Ataxia group
Table 2 summarizes the regression analyses. All Betas reported in Table 2 were obtained from the final regression model after all independent variables were entered. Dysarthria ratings significantly predicted decreased Total Correct Items scores in all three tasks.
When the speech item was omitted from the SARA score, the regression analysis provided similar results. Dysarthria significantly predicted decreased Total Correct Items scores in all three tasks, and the SARA score without the speech item did not predict Total Correct Items in alternating fluency (Table A4).
In the PD group, when the speech item was omitted from the MDS-UPDRS Part III score, the regression analysis provided similar results. The years of education significantly predicted increased Total Correct Items in phonemic fluency (Table A4). Neither the MDS-UPDRS Part III score (without the speech item) nor dysarthria predicted diminished Total Correct Items in any verbal fluency task.

Impact of dysarthria on qualitative measures: Error and filler analysis
Figure 2 shows the results of one-way ANOVA and post-hoc comparisons of intrusions, ambiguous verbalizations, perseverations, and interjections. In the ataxia group, there was a significant main effect of dysarthria on ambiguous verbalizations (F(4, 60) = 25.00, p < 0.001, η² = 0.641; see Figure 2A). Post-hoc comparisons revealed more ambiguous verbalizations in participants with severe dysarthria than those with no dysarthria (adjusted p < 0.001), mild (adjusted p < 0.001) or moderate severity (adjusted p < 0.001). In the PD group, there was no significant effect of dysarthria on any of the four qualitative measures.

Interrater reliability
The raters exhibited high-to-excellent (0.692-0.938) levels of interrater reliability (Table 3) on articulation, voice quality, intelligibility and naturalness and fair reliability (0.692) on resonance in the ataxia group.

Discussion
The influence of disease severity on verbal fluency performance decreased in ataxia and PD groups when dysarthria severity was considered. Increased dysarthria severity was associated with worse performance on VFT, largely independent of disease severity. In the ataxia group, dysarthria predicted poorer performance in semantic, phonemic and alternating fluency. Participants with comorbid ataxia and dysarthria presented with increased ambiguous verbalizations, supporting our hypothesis that more severe dysarthria is associated with increased erroneous items and diminished correct items. A secondary aim of the study was to establish criterion validity and interrater reliability in measuring dysarthria severity in VFT. Our data suggest that intelligibility and naturalness ratings derived from VFTs can be accurately acquired when compared to traditional tasks used for rating speech (i.e., connected speech).

Effect of dysarthria on total correct items and qualitative measures
Dysarthria impacted VFT performance, supporting earlier work in the ataxia population (Feng et al., 2014; Hoche et al., 2018; Lopes et al., 2013). The largest impact was observed across all three VFTs in the ataxia group, with no impact in the PD group. Disease severity appeared to have a small effect on VFT performance when dysarthria was considered in the ataxia group. In contrast, in the PD group, performance on all three fluency tasks was not driven by either disease severity or dysarthria, and performance on the phonemic fluency task was associated with education levels. Severe dysarthria in ataxia participants was associated with more ambiguous verbalizations and diminished Total Correct Items, as items became unintelligible and were considered erroneous in scoring. An increase in ambiguous verbalizations might be linked to a decrease in Total Correct Items. This was not the case for the PD group, which presented with less severe speech disorder overall. Together, these data suggest that cognitive capacity may be underestimated if judged on the basis of VFT in speakers with dysarthria.

Task types and ambiguous verbalizations
Performance on VFT appeared to vary by disease group. In the ataxia group, dysarthria had a significant impact on all VFT tasks. An explanation for this impact might be attributed to cognitive-motor interference (CMI), in which comorbid cognitive and motor disturbances can induce difficulties in tasks requiring simultaneous cognitive and motor responses (Barbosa et al., 2017). As a result, a VFT that is already affected by participants' specific cognitive impairments might be disproportionately vulnerable to oral motor impairments. Studies on spinocerebellar ataxia type 1 and Friedreich's ataxia have delineated greater impairment in phonemic and alternating fluency compared to semantic fluency due to frontal executive dysfunction (Bürk et al., 2003; Nóbrega et al., 2007). Studies on spinocerebellar ataxia type 6 and multiple system atrophy of the cerebellar type (MSA-C) have reported impaired semantic and phonemic fluency (Bürk et al., 2006; Suenaga et al., 2008). These findings implied that the execution of these tasks might be more vulnerable in populations with dysarthria and related CMI secondary to cerebellar conditions. A unique finding from the current study is the impact of dysarthria on semantic fluency, despite previous work suggesting semantic fluency remains intact in people with ataxia (Nóbrega et al., 2007). Our data suggest that dysarthria might be a driver of poor performance on semantic fluency tasks in this group, as disease severity did not appear to influence outcomes when dysarthria was considered.

† Percentage agreement between raters = 94.2%; ICC could not be calculated because of a lack of variability in the ratings, that is, most subjects were rated zero in terms of resonance.

VFT dysarthria ratings: Criterion validity and reliability
Trained raters appear to provide equivalent ratings for dysarthria from VFT and connected speech tasks alike. While no identical studies are available for comparison, it appears that dysarthria ratings can reliably be attributed to speakers when raters can identify dysarthric characteristics in single-word utterances (e.g., distorted articulation, hyper- and hyponasality). VFT speech samples appear sufficient to yield summative ratings of intelligibility. However, it is important to note that raters may not have adequate information to perceive speech characteristics such as monoloudness, monopitch, breath support, and resonance, as well as functional capacity, from single-word samples.

Implications and limitations
These results built on existing evidence of CMI in neurological populations with dysarthria and contributed to a clearer understanding of how CMI could impact patients' performance in tasks involving verbal responses. Most methodologies do not consider dysarthria as an influencing factor on VFT (e.g., Henry & Crawford, 2004; Lopes et al., 2013; Rodrigues et al., 2015; Tamura et al., 2018). Thus, to optimize assessment, when participants present with cognitive impairments and dysarthria, it is prudent for assessors to incorporate non-verbal cognitive measurements and/or account for the impact of dysarthria on performance (e.g., incorporate dysarthria ratings into covariate analysis). Some key limitations of our protocol relate to disease severity distribution and methods for dysarthria rating validation. It was not clear whether the established criterion validity of VFT dysarthria ratings in the ataxia group could be generalized to the PD group, as comparisons were only made for ataxia. Our cohort did not include individuals with advanced PD or severe dysarthria, and the ataxia group included a cohort of mixed ataxia types, rather than a specific genotype. The VFT instructions were slightly different in the two primary studies, which did not allow for direct comparisons between groups.

Conclusions
Dysarthria appears to influence verbal fluency performance in ataxia and Parkinson's disease. When dysarthria severity was included as a covariate in predictive models of VFT performance, the impact of disease severity was reduced. We also provide evidence that dysarthria severity can be reliably derived from VFT.

Table 1. Demographic and clinical characteristics of participants.

Table 2. Hierarchical linear regression analysis: the effect of dysarthria on total correct items.