A comparative study of automatic vowel articulation index and auditory-perceptual assessments of speech intelligibility in Parkinson’s disease

Purpose : The purpose of this study was to analyse the relationship between automatic vowel articulation index (aVAI) and direct magnitude estimation (DME) among speakers with Parkinson’s disease (PD) and healthy controls. We further analysed the potential of aVAI to serve as an objective measure of speech impairment in the clinical setting. Method : Speech samples from native Finnish speakers were utilised. Expert raters utilised DME to scale the intelligibility of speech samples. aVAI scores for PD speakers and healthy control speakers were analysed in relationship to DME speech intelligibility ratings and, among PD speakers, disease stage utilising nonparametric statistical analysis. Result : Mean DME intelligibility ratings were lower among PD speakers compared to healthy controls. Mean aVAI scores were nearly the same between speaker groups. DME intelligibility ratings and aVAI were strongly correlated within the PD speaker group. aVAI and DME intelligibility ratings were moderately correlated with disease stage as measured by the Hoehn and Yahr scale. Conclusion : aVAI was observed to be a promising tool for analysing vowel articulation in PD speakers. Further research is warranted on the application of aVAI as an objective measure of severity of speech impairment in the clinical setting, with varying patient populations and speech samples.


Parkinson's disease
Parkinson's disease (PD) is a progressive neurodegenerative disease classified by the loss of dopamine within structures in the basal ganglia (V eronneau-Veilleux et al., 2021).Among the development of other symptoms, approximately 90% of individuals diagnosed with PD experience some change in their communication abilities, commonly diagnosed as hypokinetic dysarthria (Logemann et al., 1978;Miller et al., 2007;Spielman et al., 2011;Ramig et al., 2018).Hypokinetic dysarthria is a motor speech disorder characterised primarily by imprecise articulation, monoloudness, reduced loudness, monopitch, as well as a harsh and breathy voice quality (Schalling et al., 2017;Ramig et al., 2018).
As PD progresses, articulation precision deteriorates, for the range of movement of the articulators becomes more limited (Skodda et al., 2012).This change in articulation, also referred to as undershoot, results in decreased speech intelligibility (Whitfield & Goberman, 2014).These changes can also be observed in acoustic measures.For example, Walsh and Smith (2012) observed shallower second formant slopes in PD speakers compared to control speakers, whereas Rusz et al. (2013) identified that a single formant measurement, the second formant (F2) of the vowel /u/ (F2u), distinguishes healthy speakers from speakers with PD in sentence repetition speaking tasks.
Additionally, symptomatic differences between males and females with PD have been observed.Georgiev et al. (2017) state that females tend to present with tremor-dominate PD while males tend to present with more rigidity.Variability in fundamental frequency (F0) was found to be reduced in all PD speakers when compared to controls; F0 was found to be elevated in male PD speakers (Skodda et al., 2011a).

Acoustic measures and speech intelligibility
The relationship between acoustic measures that examine vowel acoustics and speech intelligibility among speakers with motor speech disorders has been deeply explored.The primary acoustic measures utilised in research when analysing speech from individuals with PD are vowel space area (VSA), vowel articulation index (VAI), and its inverse the formant centralisation ratio (FCR).Lansford and Liss detailed the ability of vowel acoustics to distinguish dysarthric speech from healthy speech, identify different forms of dysarthria (Lansford & Liss, 2014b), and the influence of acoustic differences on listener perception (Lansford & Liss, 2014a).Vowel space metrics that reflected vowel centralisation were shown to have significant relationships with intelligibility (Lansford & Liss, 2014a).Though the FCR was found to correlate to intelligibility (Lansford & Liss, 2014a), it was previously found to have poor sensitivity in identifying dysarthric speakers (Lansford & Liss, 2014b).
A study by Feenaughty et al. (2014) showed varying, and at most a moderate, correlation between the selected acoustic features (mean sound pressure level [dB], articulatory rate [syllables/second], F0 range, and F2 interquartile range [F2 IQR; Hz]) and speech intelligibility amongst some of the participants with PD.Through analysis of the vowel space features FCR, VAI, and F2 ratio, Kim et al. (2021) studied the relationship between the vowel space variables and speech intelligibility in individuals diagnosed with motor speech disorders.Results showed at most a moderate correlation between studied vowel space variables and speech intelligibility.A systematic review of the relationship between spectral acoustic features at the phoneme level and speech intelligibility in healthy speakers further acknowledged the complex interplay between these two variables (Pomm ee et al., 2021).
Though there is an established relationship between acoustic features and speech intelligibility, no one measure has proven consistently reliable in predicting speech intelligibility.When analysed in parallel, the above-mentioned studies reveal the multidimensional relationship between acoustic features, their role in perceived speech intelligibility, and the need for further research and development in this area.VAI was selected for use in the current study, for this acoustic measure has consistently differentiated between dysarthric speakers and healthy controls and was shown to be more sensitive in detecting dysarthric speech when compared to VSA (Sapir et al., 2011;Skodda et al., 2012).

Automated acoustic measures
Speech intelligibility in the clinical setting Acoustic measures are not consistently utilised in the clinical setting to measure and analyse speech intelligibility.Speech intelligibility ratings can be obtained through standardised measures and informal measures, both of which involve perceptual ratings completed by trained clinicians.Gurevich and Scamihorn (2017) explored the implementation of intelligibility measures among speech-language pathologists (SLPs) providing services to individuals with dysarthria.Of the SLPs that participated, 65% reported that they have access to at least one formal intelligibility measure in their clinical work.As a general trend though, participants reported a preference for the usefulness, efficiency, and simplicity of informal measures.This illustrates the need for an easy, efficient, and objective measure for speech intelligibility that can be readily implemented in the clinical setting.
Acoustic measures provide parameters through which articulation can be analysed.However, manually calculating these measures, including the annotation of speech samples, is subject to human bias, is laborious, and is tedious.VSA and VAI have been automated by Sandoval et al. (2013) and Liu et al. (2021), respectively.These vowel space acoustic measures have been of specific interest to the study of intelligibility in dysarthric speech (Rusz et al., 2013).Sandoval et al. (2013) proposed a novel method to automate the VSA calculation in an effort to develop an objective assessment to support clinical practice.This proposed method is fully automatic and considers all vowels contained in the speech sample.Typically, in the analysis of the American English dialect, VSA is computed from formant frequency measurements of either three (/a, i, u/) or four (/a, i, u, ae/) corner vowels.The incorporation of all 12 vowels enhances the accuracy of the represented vowel space.Additionally, this novel method allows for VSA to be estimated from any length or type of speech sample given that sample contains a range of vowels.The novel automatic VSA method proposed was found to strongly correlate to the control method (Sandoval et al., 2013).Since its publication, several studies have employed the automatic VSA algorithm in the analysis of speech samples (Schwedt et al., 2019;Utianski et al., 2019).

Automatic vowel space area
Automatic vowel articulation index VAI analyses the relationship between the first formant (F1) and second formant of the corner vowels /a/, /u/, and /i/ (Sapir et al., 2011).It is represented by the following equation: In healthy American English speakers, VAI score is close to 1.In speakers with hypokinetic dysarthria, the numerator can be expected to decrease and the denominator increase due to formant centralisation, resulting in lower VAI scores (Sapir et al., 2011).Compared to unnormalised vowel space measures, VAI has shown to be more sensitive in detecting mild cases of hypokinetic dysarthria (Skodda et al., 2011b).
Software developed by Liu et al. (2021) automates the VAI computation (automatic VAI ¼ aVAI).Figure 1 provides an outline of the software.While the novel software can compute aVAI from any speech clip, it was initially tested using a standardised reading passage.Speakers included healthy controls and individuals with PD.PRAAT (Boersma, 2001) was utilised to measure F1 and F2 values for the whole speech clip.A strong linear correlation between aVAI and manual VAI was found (r ¼ .89,p < .00001)when using mean F1 and F2 values for VAI computation.It should be noted that when compared to manual VAI, aVAI scores tended to be lower.
This novel software was found to be a reliable method for the automatic computation of VAI under the research conditions (Liu et al., 2021).If found to be efficacious outside of the initial research setting, this software has the potential to allow clinicians to easily administer aVAI in the clinical setting as a measure to analyse disease progression and track rehabilitation progress, as the use of the tool does not require manual labour besides recording the speech sample and entering the recording into the aVAI software (link to software: https://github.com/SPEECHCOG/autoVAI).

Direct magnitude estimation
Direct magnitude estimation (DME) ratings have been used in multiple studies when analysing speech from speakers with PD (Ma et al., 2015;Tjaden & Wilding, 2011).Specifically, these studies have employed DME in speech intelligibility analysis.DME provides an approach for subjective assessment of perceptual qualities of speech (Schiavetti, 1992;Weismer & Laures, 2002).DME can be completed with a modulus or without a modulus, termed modulus-free scaling.When a modulus, also known as a standard, is employed, a reference speech sample is selected to which all other samples are then scaled against.The level of intelligibility represented by the standard has been shown to influence ratings given by listeners.Given this, the select standard often represents "midrange" intelligibility (Weismer & Laures, 2002).During modulus-free scaling, raters independently scale each sample based on their internal measure of speech intelligibility.Ratings are then converted to a common scale for analysis (Tjaden & Wilding, 2004).However, it has been suggested that raters are less comfortable conducting modulus-free scaling due to the lack of structure within this method (Tjaden & Wilding, 2004;Weismer & Laures, 2002).

Study objectives
This study expands on work published by Liu et al. (2021).In this initial study, novel aVAI software was introduced.The development of the software was detailed and aVAI was piloted with three different language cohorts.aVAI was not explored in relationship to the field of speech-language pathology.
The focus of this study is to explore the possibility of aVAI to serve as an objective measure of speech impairment and to explore the practicality of implementing this measure in the clinical setting.A subset of the PD Finnish speaker data and control speaker data from the primary study was utilised in this study.Unlike the primary study, we further observe the relationship between two measures of speech impairment analysis: DME intelligibility ratings (perceptual measure) and aVAI (acoustic measure) among speakers with PD and healthy controls, disease stage, and between sexes.Information is presented in a manner designed to be accessible to the SLP community.
We addressed the following research questions.(1) Are DME intelligibility ratings consistent within and between raters?(2) Is dysarthric speech reflected as lower DME intelligibility ratings and aVAI in individuals with PD, compared to healthy controls and between sexes?(3) Is there a correlation between DME intelligibility ratings and aVAI measures within each speaker group and between sexes?(4) What is the relationship between the two measures and PD disease stage?
We hypothesise (H) that: H1: DME intelligibility ratings will be consistent within and between raters.H2: Ratings of speech impairment will be lower in speakers with PD and among female speakers.H3: A correlation will be found between DME intelligibility ratings and aVAI scores within each speaker group and male and female speakers.H4: A correlation will not be found between the two measures of speech impairment and PD disease stage.

Method
Speech corpora PD speakers were selected from the Parkinson's Disease Speech corpus of Tampere University (PDSTU) and control speakers were selected from the Healthy Adults Speech corpus of Tampere University (HASTU).Data for both speech corpora were collected from patient's health records.PDSTU includes speech data from Finnish and Swedish speakers diagnosed with PD.Within PDSTU, preand post-intervention data for voice therapy are included.Exclusion criteria for PD speakers included deep brain stimulation (DBS), dementia, and a diagnosis of communication disorders that predate the diagnosis of PD.Speakers from PDSTU had a verified diagnosis of PD; demographic speaker data are provided in Appendix 1. HASTU includes speech data from healthy Finnish speaking adults who had not been diagnosed with acquired or developmental disorders that impact speech, language, or cognitive skills (e.g.dementia, developmental language disorder, stuttering); demographic speaker data are provided in Appendix 2. All speakers, from both PDSTU and HASTU, completed the following speaking tasks: word and sentence repetitions, normal and emotional reading tasks, spontaneous speech tasks, and diadochokinetic tasks (/pK, tK, kK/).All speech recordings for PD speakers were made approximately 2-3 hours after individuals took their PD medication.All speakers provided answers about their hearing abilities via a questionnaire.All control speakers and 12 PD speakers self-reported hearing within functional limits.The three PD speakers who reported hearing not within functional limits utilise a device, and with use of device reported hearing within functional limits.Additionally, speaker background information and various self-rated measures for voice and speech are included in the datasets.

Selection of speakers and speech samples
From PDSTU, pre-intervention data from all Finnish speakers were selected for the analyses (n ¼ 15) to allow for examination of speech prior to the influence of speech therapy intervention.In the selection of control speakers from HASTU (n ¼ 15), a balanced ratio of female to male speakers was maintained and the age range of speakers was selected to be within the range of PD speakers.Speaker demographic information is presented in Table 1.
From both datasets, we selected the normal reading of a passage from a story called Pohjantuuli ja aurinko (The Northwind and the Sun), passage text is included in Appendix 3.This passage is commonly used in Finland in both the clinical and research settings (Kankare et al., 2020).The recording of this passage was done in a quiet room with a headset microphone that was kept 4 cm from the corner of the speaker's mouth, at a 45 degree angle.Recordings were made at a sampling rate of 44.1 kHz in WAV (Waveform Audio File) format, through PRAAT software (Boersma & Weenink, 6.0.37, 2018) and using Focusrite audio interface with a close-talking microphone.

Selection of covariate
In addition to the basic demographic information of speakers, the Hoehn & Yahr Rating Scale for Parkinson's Disease (H&Y scale) scores were selected as a covariate in the analyses, please see Appendix 1 (Rabey & Korczyn, 1995).The H&Y scale was developed in 1967 as a measure to assess the progression of disease stage in PD (Rabey & Korczyn, 1995).In PDSTU the modified H&Y scale was administered by a neurologist when speakers were in the on state.This scale consists of seven stages related to motor disability associated with PD severity: (a) Stage 1, unilateral involvement only; (b) Stage 1.5, unilateral and axial involvement; (c) Stage 2, bilateral or midline involvement without impairment of balance; (d) Stage 2.5, mild bilateral disease with recovery on pull test; (e) Stage 3, mild to moderate disability, bilateral involvement with some postural instability, patient able to live independently; (f) Stage 4, severe disability, patient able to walk independently; and (g) Stage 5, patient confined to bed or wheelchair unless aided (Hoehn & Yahr, 1967).

Expert raters
Professional SLP experts were recruited as raters for DME through social media.Criteria for expert raters included a minimum of 2 years clinical experience, in addition to clinical experience providing speech and language services to the adult neurogenic population.Four expert raters initially volunteered to participate in the study.One rater was unable to attend the listening session.The remaining three expert raters participated in this study (Rusz et al., 2013).Raters had an average of 23.34 years of clinical experience working with the adult neurogenic population.Hearing screenings were not completed for raters, however no rater reported hearing loss.All raters provided written, informed consent.

Automatic vowel articulation index
For all speech samples, aVAI were automatically calculated using software designed by Liu et al. (2021).aVAI was calculated from the entirety of the reading passage Pohjantuuli ja aurinko.The software was used with default settings using the medians of frame-level formant estimates for VAI calculations.
This system utilises an open-sourced, universal phoneme recogniser, Allosaurs, to identify speech frames with qualities resembling the corner vowels (Li et al., 2020).In the current study, in which Finnish speech was analysed, the following are the represented corner vowels: [a, a:, i, i:, u, u:].All speech frames identified as corner vowels are included in the analysis.F1 and F2 frequency ranges are determined for each corner vowel based on the automatically selected candidate speech frames.VAI is automatically calculated from these mean formant estimations from all selected frames.

Direct magnitude estimation
Speech intelligibility ratings were made by expert raters using the DME scaling procedure with a standard (Tjaden & Wilding, 2011;Weismer et al., 2001;Weismer & Laures, 2002;Yunusova et al., 2005).DME estimations were made on a paper form.One DME rating was given for each speech sample; speech samples consisted of three random phrases from a reading passage.
The standard stimulus was selected by two SLPs (authors NP and RC) from the HASTU dataset and represented midrange intelligibility from the given sample set (Weismer & Laures, 2002).The standard was assigned a value of 100.If raters found a speech sample to be half as intelligible than the standard, they were to give it a rating of 50.If they found a speech sample to be twice as intelligible, they were to give it a rating of 200.For the purposes of this study, raters were provided the following definition of intelligibility: "Intelligibility generally refers to the degree to which a spoken utterance is understood by a listener, and it is presumed to be dependent on the integrity of acoustic components that reflect articulation, prosody, voice intensity and quality, and resonance " (Yunusova et al., 2005(Yunusova et al., , p. 1294)).
Given the reading passage, Pohjantuuli ja aurinko, is commonly used in the clinical and research setting, speech samples for each participant were composed of three randomly selected phrases from the reading passage to reduce familiarisation (Kankare et al., 2020).In Microsoft Excel, version 2102, phrases were randomly assigned to each participant using the RAND function.Two of the selected phrases contained at least one production of an annotated corner vowel, the remaining phrase contained two or three productions.
In PRAAT, the assigned phrases were then cut from the participant's reading of the passage.To equalise the voice samples, the recordings were calibrated to obtain the true sound pressure level of the speech samples.The audio samples were edited using MAGIXV R Sound Forge Pro 10.0; any possible background noises were removed.For each participant, a single audio file was created containing the three random phrases.A two second silent pause was inserted between the selected phrases within each speech sample.

Listening sessions
Listening sessions were held in a quiet room.A total of three sessions were held, one rater participated in each session.Listening sessions lasted approximately 2 hours.Raters used Sennheiser HD598 headphones and could independently adjust the volume to a comfortable level (Ma et al., 2021).Speech samples were presented in a random order through Microsoft PowerPoint presentation, version 2102, on a computer.Each slide contained the sample number the rater was listening to or the word standard, and the Microsoft PowerPoint audio icon.One speech sample was presented per slide, allowing raters to self-pace their progression, and a new sample played automatically upon advancement to the next slide.To reduce the effect of familiarisation, samples for each speaker were presented only once for each expert.Ten percent of the speech samples (n ¼ 3) were presented twice to allow for reliability testing, but the experts were not explicitly informed of the repetitions (Tjaden & Wilding, 2011).

Statistical analysis
Statistical analysis was conducted on IBM SPSS Statistics 26.Correlations were tested with Spearman's correlation test.Group comparisons were tested with the Mann-Whitney U Test. Interrater reliability was tested with Kendall's Tau-b (p < 0.01; Tjaden & Wilding, 2011, 2004;Weismer & Laures, 2002).Intra-rater reliability was tested with intraclass correlation coefficient (ICC) with two-way mixed-model selections (Koo & Li, 2016) using the repeated DME samples (Turner et al., 1995).Spearman's correlation test was used to analyse the relationship between aVAI and DME intelligibility ratings.The relationship between the two ratings was analysed within total population of participants as well as within the two speaker groups: PD speakers and control speakers.

Inter-rater and intra-rater reliability
The first research question asked: Are DME ratings consistent within and between raters?The results support our initial hypothesis; DME intelligibility ratings are consistent within and between raters.We found that Kendall's Tau-b scores for inter-rater agreement rates reached s b ¼ 0.828 between Raters 1 and 2, s b ¼ 0.645 between Raters 1 and 3, and s b ¼ 0.889 between Raters 2 and 3. Results indicate modest agreement between raters.All coefficients were found to be significant at the 0.05 level.Average measure ICC ratings were ICC ¼ 0.743 for Rater 1, ICC ¼ 0.917 for Rater 2, and ICC ¼ 0.968 for Rater 3, indicating high intra-rater reliability.

aVAI and DME intelligibility ratings in PD speakers and controls
The second research question asked: Is dysarthric speech reflected as lower DME intelligibility ratings and aVAI in individuals with PD compared to healthy controls?The results do not support our initial hypothesis; DME intelligibility ratings and aVAI scores were lower in speakers with PD.Overall, the PD speaker group had a lower DME mean compared to the control speaker group (Table 2).This trend was reflected when comparing DME averages of PD females to control females and PD males to control males.
However, the PD speaker group was found to have almost the same aVAI mean compared to the control speaker group; information presented in Table 3.This trend is further reflected when comparing aVAI averages of PD females to control females and PD males to control males.aVAI scores of PD and control speakers did not differ based on sex or age.

Relationship between DME intelligibility ratings and aVAI
The third research question asked: Is there a correlation between DME intelligibility ratings and aVAI measures?Our initial hypothesis, that there is a correlation between DME intelligibility ratings and aVAI scores, is not fully supported by the data.aVAI and DME were found to be moderately correlated within the total speaker population, r s (28) ¼ 0.424, p ¼ 0.05; a strong correlation was found in the PD speaker group, r s (13) ¼ 0.752, p ¼ 0.01.aVAI and DME were not found to be significantly correlated within the control speaker group, r s (13) ¼ 0.404, p ¼ 0.136.DME intelligibility ratings from each rater are presented in Table 4. aVAI scores and mean DME intelligibility ratings are presented in Table 5.

Disease stage and intelligibility ratings
The fourth research question asked: What is the relationship between the two measures and PD disease stage?The results support our initial hypothesis.There is not a correlation between DME intelligibility ratings and aVAI scores, and PD disease stage.Mean DME intelligibility ratings and H&Y scale scores were not significantly correlated among the entire PD speaker group, r s (13) ¼ À0.134, p ¼ 0.635, nor among female PD speakers, r s (7) ¼ À0.475, p ¼ 0.197, or male PD speakers, r s (4) ¼ À0.304, p ¼ 0.304.aVAI and H&Y scale scores were not significantly correlated among the entire PD speaker group, r s (13) ¼ À0.152, p ¼ 0.588, nor among female PD speakers, r s (7) ¼ À0.037, p ¼ 0.926, or male PD speakers, r s (4) ¼ À0.338, p ¼ 0.512.Table 6 illustrates changes in DME intelligibility ratings and aVAI scores as H&Y scores progress among PD speakers.

Discussion
The purpose of this study was to observe the relationship between DME intelligibility ratings (perceptual) and aVAI (acoustic measure) in two speaker groups.aVAI scores for PD speakers and healthy control speakers were examined in relationship to perceptual speech intelligibility ratings and PD disease stage, as quantified by gross motor symptoms.Current results encourage future research on aVAI to develop the ability of this tool for analysis of vowel articulation and severity of speech intelligibility in PD speakers.
In analysis of DME intelligibility ratings, high intra-rater reliability and modest inter-rater reliability was found among expert raters.All raters had professional experience working with the adult neurogenic population as an SLP.The use of SLPs to analyse and rate speech intelligibility produced by speakers with PD is supported by Carvalho et al. (2021) who showed that compared to naıve listeners, individuals with experience listening to dysarthric speech had an increased understanding of dysarthric speech.In the future it would be interesting to compare intelligibility ratings done by naıve listeners to aVAI.
Mean DME intelligibility ratings were found to be lower among PD speakers, which is consistent with previous studies (Ma et al., 2015).Mean aVAI scores were found nearly the same between both speaker groups.Given that VAI has previously been superior in distinguishing PD and healthy-speaker groups when compared to other measures (Rusz et al., 2013;Skodda et al., 2011b), the similarity in aVAI scores suggests decreased precision with the automated process and not the VAI equation.
Qualities of speech samples, including speaking rate and vocal quality, impact rater's perception of speech intelligibility (Chiu & Neel, 2020;Dagenais et al., 2006;Ma et al., 2021).Unlike perceptual analysis, aVAI merely considers two formant measurements from three vowels.It is interesting that aVAI values were nearly equal between the two speaker groups.This could possibly be due to the small number of speakers, lack of articulatory change in PD speakers, relatively early disease stage of PD speakers, or it is possible aVAI reflects vowel articulation precision and not severity of dysarthria.Second to the germinal stage of this software, it is still unclear how aVAI should be interpreted and what factors (e.g.dialect, speech rate, coarticulation) are most influential.Given the analysis is based on formant frequency and not text recognition, corner vowels produced by speakers with severe dysarthria may not meet the articulatory and formant frequency targets and therefore not be included in the aVAI analysis.Conversely, additional vowels may be included if the formant features resemble those of corner vowels.Future studies might more closely examine the rank of speaker's aVAI scores in relation to diagnosis or additional measures of disease stage.aVAI was found to be strongly correlated to DME intelligibility ratings within the PD speaker group.This finding illustrates the potential of aVAI to serve as a valid and reliable measure of severity of speech impairment.However, before this measure can be reliably applied in the clinical setting, further research on aVAI under various research conditions is required.A moderate correlation was observed within the total speaker population and within the healthy-control speaker group; no significant correlation was observed.In analysis of the relationship between vowel space variables and speech intelligibility, Kim et al. (2021) also observed modest correlation.Further, Feenaughty et al. 
(2014) observed modest correlation between analysed acoustic features and speech intelligibility.
There is a clear need for software like aVAI in the clinical setting.Results reflect the complexity between acoustic features and speech intelligibility.Before aVAI can be reliably utilised, there is need for further research in which aVAI is applied in analysing different types of speech samples and speech from various speaker populations.
Within the entire PD speaker group, aVAI and DME intelligibility ratings were not significantly correlated to disease stage as measured by the H&Y scale.Though both aVAI and DME provide us with information about an individual's speech intelligibility and articulation, results support previous findings suggesting that changes in speech intelligibility are independent from disease stage (Miller et al., 2007;Skodda et al., 2011b).The H&Y scale provides a measure for the progression of PD, and primarily analyses an individual's gross motor abilities.The scale does not explicitly consider changes in voice or speech intelligibility.Additional measures that do, in part, account for these changes, including the Movement Disorder Society-Unified Parkinson's Disease Rating Scale (MDS-UPDRS; Goetz et al., 2007) or Voice Handicap Index (VHI; Jacobson et al., 1997), should be considered in the future.
Additionally, differences between male and female speakers were examined.There are known speech changes associated with healthy ageing, for example fundamental frequency has been observed to increase in healthy older males and decrease in healthy older females (Rojas et al., 2020).Patterns observed in the results between male and female speaker groups were consistent with observations made when comparing the overall PD speaker group with controls.aVAI has the potential to be implemented in clinical practice as an objective measure of severity of speech impairment, with the ease and time expediency offered by informal measures.Gurevich and Scamihorn (2017) found the implementation of informal speech intelligibility measures more common than formal measures among SLPs working with patients with dysarthria.Though aVAI does not need any manual work besides recording of the speech signal (e.g.no need for data annotation), it does require access to equipment (e.g.microphone, computer) and knowledge of how to accurately implement and run the software.
Several attempts have been made to develop technology that automates the computation of acoustic measures. Sandoval et al. (2013) proposed a novel method for automating the VSA computation, which was found valid under the initial research settings and has since been used in published speech analyses. Liu et al. (2021) pursued the development of the novel aVAI software, a method that was also found to be efficacious in the initial research setting. Automated acoustic measures have the potential to serve as reliable clinical tools to be utilised alongside clinical judgement and perceptual observations. Dependent on future research and development, these measures could provide clinicians with precise, objective measures that aid in the diagnosis, analysis, and tracking of various speech and voice disorders.
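To illustrate the kind of computation these tools automate, the following sketch shows how a classical vowel articulation index (after Roy et al., 2009) and a triangular vowel space area can be derived from mean first and second formant frequencies of the corner vowels /a/, /i/, and /u/. This is not the published aVAI or VSA implementation, and the formant values shown are hypothetical placeholders; in practice the formants would be extracted automatically from the recorded speech signal.

```python
def vai(f1_a, f2_a, f1_i, f2_i, f1_u, f2_u):
    """Vowel articulation index from mean corner-vowel formants (Hz).

    Higher values indicate a more expanded (less centralised) vowel space.
    """
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)

def vsa(f1_a, f2_a, f1_i, f2_i, f1_u, f2_u):
    """Triangular vowel space area (Hz^2) via the shoelace formula,
    treating each corner vowel as a point (F1, F2)."""
    return abs(f1_i * (f2_a - f2_u)
               + f1_a * (f2_u - f2_i)
               + f1_u * (f2_i - f2_a)) / 2.0

# Hypothetical mean formant values (Hz) for one speaker
print(round(vai(f1_a=750, f2_a=1300, f1_i=300, f2_i=2300, f1_u=350, f2_u=800), 3))
print(vsa(f1_a=750, f2_a=1300, f1_i=300, f2_i=2300, f1_u=350, f2_u=800))
```

Centralised articulation, as in hypokinetic dysarthria, shifts the corner-vowel formants toward the middle of the F1-F2 plane, which lowers both indices; the VAI is often preferred because the ratio form makes it less sensitive to inter-speaker variability than the raw area.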
There were several limitations in this study. Firstly, a relatively small number of expert raters participated, although similar numbers have been used in previous studies (Rusz et al., 2013). The number of analysed speech samples was also relatively small, though similar sample sizes have been employed in previous work (Rusz et al., 2013; Strinzel et al., 2017). The number of repeated speech samples included in the intra-rater reliability analysis was likewise small; in future work, given the small sample size, it would be beneficial to increase the percentage of samples included in this analysis. The speech sample selected from PDSTU and HASTU, from which aVAI was calculated, was a reading passage recorded in a quiet, controlled setting and is not necessarily representative of natural speech. However, it is unclear at this point what the optimal speech sample for aVAI analysis would be. Finally, the speech sample analysed by the expert raters for each speaker (three random phrases) was markedly shorter than the sample analysed by aVAI (the entire reading passage). The influence of speech sample length on aVAI is an important focus for future research, given that the analysis includes phonemes that share formant characteristics with the corner vowels.
Further development of aVAI is needed before it can be reliably utilised in the clinical setting. It would be beneficial for future research to focus on testing the applicability of the software with different types of speech samples (e.g. spontaneous or narrative speech), in different environments, and with different populations of speakers (e.g. speakers with cerebral palsy or amyotrophic lateral sclerosis [ALS]). Additionally, it would be beneficial to examine aVAI's ability to capture the spectrum of speech impairment by comparing speakers at different disease stages. Future studies may also compare aVAI analyses of speech samples differing in length and examine the influence of dialects on aVAI. In conclusion, the assessment of speech disorders should comprise a battery of evaluation measures. aVAI has the potential to serve as one such measure in assessing disease progression, speech intelligibility, and the efficacy of speech therapy.

Disclosure statement
No potential conflict of interest was reported by the authors.

Figure 1.
Basic outline of automatic vowel articulation index (aVAI) software processing pipeline. Based on a figure from Liu et al. (2021).

Table I.
Speaker demographics. * PDSTU = Parkinson's Disease Speech corpus of Tampere University. ** HASTU = Healthy Adults Speech corpus of Tampere University.

Table II .
Direct magnitude estimation intelligibility ratings summary table.

Table III .
Automatic vowel articulation index (aVAI) summary table.

Table IV.
Parkinson's disease speaker group: Changes in direct magnitude estimation (DME) and automatic vowel articulation index (aVAI) as Hoehn and Yahr (H&Y) score advances. * F = female, M = male. ** DME = mean DME value from three SLP ratings.