Relationships between behavioural and self-report measures in speech recognition in noise

Abstract Objective Using data from the n200-study, we aimed to investigate the relationship between behavioural (the Swedish HINT and Hagerman speech-in-noise tests) and self-report (Speech, Spatial and Qualities of Hearing Questionnaire (SSQ)) measures of listening under adverse conditions. Design The Swedish HINT was masked with a speech-shaped noise (SSN), the Hagerman was masked with a SSN and a four-talker babble, and the subscales from the SSQ were used as a self-report measure. The HINT and Hagerman were administered through an experimental hearing aid. Study sample This study included 191 hearing aid users with hearing loss (mean PTA4 = 37.6, SD = 10.8) and 195 normally hearing adults (mean PTA4 = 10.0, SD = 6.0). Results The present study found correlations between behavioural measures of speech-in-noise and self-report scores of the SSQ in normally hearing individuals, but not in hearing aid users. Conclusion The present study may help identify relationships between clinically used behavioural measures, and a self-report measure of speech recognition. The results from the present study suggest that use of a self-report measure as a complement to behavioural speech in noise tests might help to further our understanding of how self-report, and behavioural results can be generalised to everyday functioning.


Introduction
According to the World Health Organisation (WHO), more than 5% of the world's population has a hearing impairment (HI) (WHO 2015). Age-related decline in hearing thresholds leads to difficulties in perceiving sound and/or speech (Humes et al. 2012). As age-related difficulties in speech perception are often exacerbated by hearing impairment, it is important to know how interfering speech affects listening, in order to better understand the difficulties some individuals experience in different listening situations (Tun et al. 2002; Rajan and Cainer 2008; Heinrich and Schneider 2011).
Listening to speech in adverse listening conditions, and its relation to age-related declines in hearing, has been widely investigated over the last decades (e.g. Humes and Dubno 2010; Heinrich and Schneider 2011; Humes et al. 2013). However, difficulties with speech recognition in noise are not only associated with declines in hearing thresholds (as measured with pure-tone audiometry) but also involve cognitive factors such as working memory and executive functions (Hällgren et al. 2005; Ng Ning Hoi et al. 2013; Rönnberg et al. 2013). The putative link between hearing loss and cognitive factors has been investigated using both behavioural and self-report measures of speech recognition in noise (see e.g. Schneider et al. 2010).
However, understanding the link is arguably complicated by the variety of methods used to measure hearing and listening, both in the clinic and for research purposes.
Behavioural audiological measures are the standard for assessing hearing thresholds (Taylor 2007), and speech recognition in noise and in quiet in the clinic. Closed-set speech recognition in noise tests are available in a variety of languages, including the Dantale test in Danish (Lunner et al. 2012) and the French Matrix test (Jansen et al. 2012). In the current study, we used the Swedish Hagerman sentences (Hagerman 1982). They were included as they are frequently used in audiology clinics in Sweden as a measure of speech recognition in noise. These consist of five-word sentences in a closed-set structure with high redundancy but little to no semantic context (e.g. "Britta has five black rings"). In the current study, we also used the Hearing In Noise Test (HINT; Hällgren et al. 2006). This test is presumably more ecologically valid, using open-set material with everyday sentences (e.g. "The red boots were too small") both in quiet and with background noise such as speech-shaped noise (SSN). Thus, the sentences used in the two tests differ in both structure and the degree of contextual information available.
However, behavioural measures of speech recognition in noise do not fully reflect the complexity of everyday listening situations. Many adults complain about difficulties in listening to speech in noisy environments, despite clinically normal audiograms (CHABA 1988). The individuals' view of how noise and other talkers affect various listening environments is considered increasingly important to better capture how behavioural and self-report measures benefit the individual when considering hearing rehabilitation (McRackan et al. 2018).
A popular self-report measure of hearing is the SSQ (Speech, Spatial and Qualities of Hearing Questionnaire) (Gatehouse and Noble 2004). It was developed to cover a rich set of communication domains: speech recognition in various settings, in quiet and in noise (summarised in a speech subscale); spatial hearing, including movement, perception, and discrimination (summarised in a spatial subscale); and qualities of hearing such as signal discrimination, clarity and naturalness, and ease of listening (summarised in a quality subscale) (Gatehouse and Noble 2004). The SSQ is available in a variety of versions, ranging from the SSQ5 (Demeester et al. 2012), SSQ12 (Noble et al. 2013), SSQ15 (Kiessling et al. 2011), and 15iSSQ (Moulin et al. 2019) to the complete SSQ (Gatehouse and Noble 2004).
Research on behavioural measures in relation to self-report measures of speech recognition in noise is sparse, and many of the studies (e.g. Banh et al. 2012; Olsen et al. 2012) incorporate only pure-tone audiometry, or a single measure of speech recognition in noise, as a behavioural measure. Some studies (e.g. Banh et al. 2012; Olsen et al. 2012) report no significant correlations between pure-tone hearing thresholds and self-report scores on the SSQ, both in normally hearing individuals (Banh et al. 2012) and in hearing impaired individuals with asymmetrical hearing loss (Olsen et al. 2012). Akeroyd et al. (2014) reported weak to moderate negative correlations (ranging between −.20 and −.37) between the SSQ and better-ear hearing in a large sample of adults with hearing impairment divided into groups of unaided, unilaterally aided and bilaterally aided hearing. The results showed the highest correlation between better-ear hearing and the speech subscale in the unaided condition; however, there was little difference between the groups across the three clear factors in their study. When dividing the participants into groups based on hearing asymmetry (mainly symmetric; partly asymmetric; highly asymmetric), they found no substantial differences in factor loadings on the SSQ between the groups (Akeroyd et al. 2014). Von Gablenz et al. (2018) report a relationship between pure-tone average at 500, 1000, 2000, and 4000 Hz (PTA4) and the different subscales in the SSQ, where the strongest correlation was found for the speech subscale (r = −.33). In their study, however, the most influential factor on scores in the speech subscale in the SSQ was self-reported hearing difficulties, as reported in a set of general questions regarding health issues (wording of the question: "Which of the following apply to you? Please check all that apply: Hearing difficulty, poor eyesight, high blood pressure, back problems, none of these.").
correlation between the French Matrix test when measured dichotically in adults with asymmetrical hearing loss, and scores on the SSQ. However, no other significant correlations between the speech recognition measures and the SSQ were found in their study (Vannson et al. 2015). Heinrich et al. (2019) found a weak correlation (r = .29) between a measure of speech recognition in noise and the speech subscale in the SSQ. Further research (Heo et al. 2013; Capretta and Moberly 2016; Ramakers et al. 2017) in hearing impaired (HI) persons after cochlear implantation (many of them using contralateral hearing aids) reports weak to moderate correlations between speech tests in noise and the SSQ as an outcome measure of speech recognition. The strongest relationships were found for speech localisation and the spatial subscale in the SSQ. For the speech subscale, the relationships with words or sentences in noise were either absent, weak, or moderate (Heo et al. 2013; Capretta and Moberly 2016; Ramakers et al. 2017). Demeester et al. (2012) compared SSQ responses in young (18-25 year old) normally hearing, older (55-65 year old) normally hearing, and older (55-65 year old) HI individuals. They found that older adults, compared to young adults, with normal audiometric hearing thresholds reported significant disability, i.e. low scores on the SSQ, especially for speech recognition in noise conditions, despite minimal disability. Demeester et al. (2012) argued, based on previous research, that disability measures such as the SSQ rarely show strong correlations with impairment measures such as pure-tone hearing thresholds, but that impairment measures and disability measures provide complementary information about an individual's hearing. Overall, the studies mentioned show that the link between behavioural and self-report measures is ambiguous.
The majority of the studies with large sample sizes referenced in the current study have not investigated the relationship between speech recognition in noise and self-report measures in the SSQ (for an overview of the articles referenced in the current study, see Table 1). The current study uses a relatively large sample and incorporates both behavioural measures, including speech recognition in noise, and self-report measures (Table 2).

The current study
The findings, to date, between behavioural speech recognition in noise measures and self-report measures show mixed results. To help resolve the issue, the aim of the current study was to investigate the relationship between behavioural measures of two different speech recognition in noise tests and self-report scores from the subscales in the SSQ in two groups differing in air-conduction pure-tone hearing thresholds. The speech recognition in noise tests administered had different characteristics and were masked with either speech shaped noise or a four-talker babble (see method description for more details on the materials).
The specific questions of interest are: To what extent do behavioural measures of speech recognition in noise and subjective scores on the subscales in the SSQ relate to each other? How does the relationship between speech recognition in noise and scores on the subscales in the SSQ differ between the NH and HI groups?

Methods
The data in the present study are a selection of available variables from the longitudinal N200 study (Rönnberg et al. 2016).
The following variables were selected from the database, according to a standardised Data Transfer Agreement form: air-conduction pure-tone audiometry, the HINT sentences masked with a SSN, the Hagerman speech recognition in noise test with two different background noises (SSN and four-talker babble), and self-report data from the SSQ.

Participants
In total, 433 individuals participated in at least one session of three in the longitudinal study N200 (Rönnberg et al. 2016). However, we only included participants who had completed the SSQ and the speech recognition in noise tests. The hearing aid users in the current study were recruited via the hearing clinic and two clinical audiologists performed the tests. Participants were deemed by the audiologists to have sensorineural hearing loss ranging from mild to severe based on air-conduction thresholds from 125 Hz to 8000 Hz. Participants with greater than 10 dB HL (hearing level) difference between air-conduction and bone-conduction measurements at two consecutive frequencies were not included in the study, to avoid participants with conductive hearing loss. The hearing aid users had used their hearing aids for an average of 6.7 years (SD = 6.6) (Rönnberg et al. 2016). The normally hearing adults had pure-tone hearing thresholds better than 20 dB HL at 500, 1000, and 2000 Hz in the better ear, and no worse than 40 dB HL at the same frequencies in the worse ear. This resulted in 191 hearing aid users (M age = 60.8 years) with hearing loss (mean PTA4 = 37.6 dB HL, SD = 10.8 dB) and 195 adults (M age = 61.6 years) without pure-tone audiometric hearing loss (mean PTA4 = 10.0 dB HL, SD = 6.0 dB) (PTA4 = 500, 1000, 2000, and 4000 Hz).

Measurements
The SSQ was administered in paper form and chosen as a measure of subjective hearing ability in various environments where the three subscales reflect different aspects of listening in adverse and non-adverse listening conditions. The speech materials were the Hagerman sentences (Hagerman 1982)

Procedure
All participants were tested in a CA Tegnér sound-attenuated booth (T-room model). The test procedure started by measuring air-conducted pure-tone hearing thresholds, followed by the HINT and the Hagerman speech recognition in noise test. The auditory tests (the HINT sentences and the Hagerman test in the current study) were generated by an ECHO Audiofire 8 external PC soundcard at a sampling rate of 22,050 Hz and transmitted to an experimental behind-the-ear hearing aid (Oticon Epoq XV) (see Rönnberg et al. 2016 and Supplemental material for more details). The pure-tone audiogram for each participating individual was entered into the experimental hearing aid, and the speech recognition in noise tests were administered with the individually prescribed linear amplification algorithm (Rönnberg et al. 2016). Practice trials were presented in the speech recognition in noise tests for the participants to get acquainted with the test procedure. Prior to testing, when given instructions regarding the test procedure, the hearing aid users were encouraged to use their own hearing aids. The SSQ was completed by the participants prior to testing.

HINT
The Swedish HINT sentences (Hällgren et al. 2006) are 250 sentences divided into 25 phonemically balanced lists, with 20 sentences in each list. The sentences are 3-7 words long (e.g. "The big black horse froze" and "Grandfather waxed the car"). In this study, the target speech was presented at a level of 65 dB SPL (sound pressure level), and the noise level was varied in an adaptive procedure in steps of 2 dB to estimate the speech reception threshold (SRT) for 50% correct whole sentences (Larsby et al. 2012). One list (20 sentences) was used as a practice trial for the participant to get acquainted with the test procedure. After the practice trial, 20 sentences were presented in a SSN and mean SRTs were calculated based on the signal-to-noise ratios (SNRs) obtained from the last 15 sentences.
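The scoring described above can be illustrated with a minimal sketch. The text states the 2 dB step and the averaging over the last 15 sentences; the simple one-up/one-down rule (lower the SNR after a correct sentence, raise it after an incorrect one), the starting SNR, and all function names are assumptions made for illustration and are not taken from the study's software.

```python
from statistics import mean

STEP_DB = 2.0   # step size stated in the text
N_SCORED = 15   # the SRT is the mean SNR over the last 15 sentences


def next_snr(snr_db, sentence_correct):
    """Assumed one-up/one-down rule: harder after a correct whole sentence,
    easier after an incorrect one."""
    return snr_db - STEP_DB if sentence_correct else snr_db + STEP_DB


def run_track(start_snr_db, responses):
    """Return the SNR (dB) at which each sentence was presented."""
    snrs = []
    snr = start_snr_db
    for correct in responses:
        snrs.append(snr)
        snr = next_snr(snr, correct)
    return snrs


def srt_from_track(snrs):
    """Mean SNR over the last N_SCORED presentations."""
    if len(snrs) < N_SCORED:
        raise ValueError("need at least 15 presentations")
    return mean(snrs[-N_SCORED:])


# Hypothetical response pattern for one 20-sentence list.
responses = [True] * 5 + [False, True] * 7 + [False]
snrs = run_track(0.0, responses)
srt = srt_from_track(snrs)
```

Under this assumed rule the track oscillates around the SNR at which whole sentences are repeated correctly about half of the time, which is why averaging the later presentations gives an estimate of the 50% SRT.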

Hagerman sentences
In this study, the Hagerman sentences and noises were presented at levels targeting SRTs for either 50% or 80% word recognition in an interleaved method (Brand 2000), where target speech was presented at a level of 65 dB SPL. The procedure was adaptive, meaning that the SNR was adjusted after each sentence depending on the number of correctly recognised words: two correctly recognised words resulted in no change in dB SNR; one or zero correctly recognised words resulted in +1 and +2 dB SNR, respectively; and three, four and five correctly recognised words resulted in −1, −2, and −3 dB SNR, respectively. Each participant was presented with one list (10 sentences in each list) as a practice trial. According to Hagerman and Kinnefors (1995), one training list before each test session reduces any training effects. After the practice trial, the participants were presented with three lists in each background masker to estimate the SRTs. The mean SRT of the sentences at 50% and 80% was used as a measure of speech recognition in noise.

Speech, spatial and qualities of hearing scale (SSQ)
The Speech, Spatial and Qualities of hearing scale is a questionnaire developed by Gatehouse and Noble (2004). The original questionnaire comprises 50 items divided into three main subscales: speech, spatial, and qualities. There are also additional items specifically aimed at assessing aided listening (Gatehouse and Noble 2004). The speech subscale contains 14 items concerning listening to speech in difficult listening situations; the spatial subscale contains 17 items concerning spatial location of sounds; the qualities subscale contains 19 items concerning the quality of hearing various sounds and speech. The respondent rates how well a listening task can be performed on a scale ranging from 0 to 10, where 0 corresponds to "not at all" and 10 to "perfectly."
The instructions for the hearing aid users were to rate the various listening situations based on subjective hearing while using hearing aids.
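Returning to the adaptive rule described for the Hagerman sentences, the step sizes can be written as a small lookup table. This is only a sketch of the rule as stated in the text for a single track; the interleaving of the 50% and 80% tracks (Brand 2000) is not modelled, and the function names and starting SNR are hypothetical.

```python
# Change in dB SNR after a five-word sentence, indexed by the number of
# correctly recognised words, as stated in the text.
STEP_BY_WORDS_CORRECT = {0: 2, 1: 1, 2: 0, 3: -1, 4: -2, 5: -3}


def update_snr(snr_db, words_correct):
    """Return the SNR for the next sentence given the score on this one."""
    if words_correct not in STEP_BY_WORDS_CORRECT:
        raise ValueError("words_correct must be an integer from 0 to 5")
    return snr_db + STEP_BY_WORDS_CORRECT[words_correct]


def hagerman_track(start_snr_db, scores):
    """Return the SNR (dB) at which each sentence was presented."""
    presented = []
    snr = start_snr_db
    for words_correct in scores:
        presented.append(snr)
        snr = update_snr(snr, words_correct)
    return presented


# Hypothetical scores for six consecutive sentences.
track = hagerman_track(0.0, [5, 5, 3, 2, 1, 0])
```

Scoring per word rather than per sentence lets the step size scale with how far the listener is from the point of no change (two words correct), so the track moves quickly at first and settles as scores approach that point.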

Data analysis
Calculations show that the sample size in the current study provides at least 90% power to detect a small effect size (i.e. r = .24). We followed the definition by Gaeta and Brydges (2020), as the typical Cohen's (Cohen 1988) guidelines tend to underestimate the effect sizes in speech, language, and hearing research. All data were pre-processed and analysed within the R statistical programming environment (R Core Team 2019). Pearson product-moment correlations were performed between the speech recognition in noise tests (HINT and the Hagerman) and the mean score of each item of each subscale in the SSQ. As previous research has found an effect of age on SSQ scores (e.g. Banh et al. 2012), partial Pearson product-moment correlations were carried out between the speech recognition in noise tests and the subscales, controlling for age. Data visualisation was carried out using the ggplot2 package (Wickham 2016) and partial correlations were carried out with the ppcor package (Kim 2015). In a final step, we also explored the relationship between each item of the speech subscale and the speech recognition in noise tests, also using the Pearson product-moment correlation coefficient. The p-values from all statistical tests reported were adjusted for multiple comparisons using the Holm-Bonferroni correction (Holm 1979).
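The three computations named in this section (power for a correlation, a partial correlation controlling for one covariate, and the Holm-Bonferroni adjustment) can be sketched as follows. The analyses in the study were run in R with the ppcor package; the Python functions below are an illustrative stand-in with hypothetical names, not the authors' code. The power function uses the Fisher z approximation, which is an assumption about how such a calculation could be done, not a description of the calculation actually used.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y controlling for z (e.g. age)."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


def power_for_r(r, n, alpha=0.05):
    """Approximate two-sided power to detect correlation r with n cases,
    using the Fisher z transformation (an approximation)."""
    nd = NormalDist()
    shift = math.atanh(r) * math.sqrt(n - 3)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# With the smaller group size (n = 191) and r = .24 the approximation
# gives a power slightly above .90.
power_hearing_aid_group = power_for_r(0.24, 191)
```

Under this approximation, the group size of 191 is consistent with the statement that the sample has at least 90% power to detect r = .24.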

Speech recognition in noise and the SSQ
The correlations between the HINT, the Hagerman masked with SSN or a four-talker babble, and the subscales in the SSQ are presented in Table 3. The correlations for the normally hearing group show that self-report scores in the Speech and Quality subscales (SSQ) decrease with lower performance in the speech recognition in noise tests (i.e. lower scores mean more perceived difficulties). For the hearing aid users, SNRs did not correlate with self-report scores in the subscales on the SSQ. To follow up the significant correlations between the speech recognition in noise tests and SSQ subscales for the normally hearing group, we performed partial correlations controlling for age. Concerning the normally hearing group, the partial correlation showed significant negative correlations between the HINT and the speech subscale, r(194) = −.23, p < .001, and between the HINT and the Quality subscale, r(194) = −.16, p = .03. See Figures 2 and 3 for the relationship between the HINT and the speech subscale, and the relationship between the HINT and the Quality subscale, respectively. However, the correlation between the Hagerman masked with a four-talker babble and the Quality subscale was rendered non-significant when controlling for age, r(195) = −.13, p = .08. Partial correlations for the non-significant correlation analyses are reported in Supplemental material 1.
Results also showed that PTA4 correlated with the speech recognition in noise tests in both normally hearing and hearing aid users, but not with the subscales or the total score of the SSQ (Table 3).

Correlations between speech recognition in noise and items in the speech subscale
In a final exploratory analysis, we examined the correlation between each item of the speech subscale of the SSQ and the SNRs in the speech recognition in noise tests (Table 4). For the normally hearing group, items #3, #4, #7, and #12 correlated negatively, with a small effect size, with performance in the HINT, indicating more self-rated difficulties with lower performance in the speech recognition in noise test. Once again, for the hearing aid users, no significant correlations were found between any of the items in the speech subscale and the speech recognition in noise tests.

Discussion
The present study investigated the relationship between behavioural measures of speech recognition in noise and a self-report measure of listening to speech in adverse listening conditions, using a large sample of normally hearing adults (N = 195) and adults using hearing aids listening through an experimental hearing aid (N = 191). The results showed (1) significant negative correlations between behavioural measures and self-report scores on the speech subscale, and the quality subscale, of the SSQ for normally hearing adults; (2) no significant correlations between the self-report scores on the SSQ and behavioural speech recognition in noise performance for the sample of older adults using hearing aids; and (3) no significant correlations between the SSQ and PTA4 in either hearing aid users or normally hearing adults. The results reflect a discrepancy between self-report ratings of listening difficulties in various adverse listening situations and performance on behavioural speech recognition in noise tests, suggesting that behavioural measures may not generalise very well to everyday listening situations for individuals using hearing aids.
The speech subscale in the SSQ is meant to evaluate various listening situations in realistic conversational settings. The results of the present study did not show any significant correlations between the SSQ and the Hagerman test masked with four-talker babble, nor for the SSN after controlling for age. The structure of the Hagerman test (Hagerman 1982), being a low-context material with low ecological validity, may not reflect everyday listening situations, resulting in no significant correlations between the SSQ and the speech recognition in noise test.
For the HINT sentences, the result significantly correlated with the speech subscale, although weakly. The speech material contains semantic context and is ecologically more valid, and the speech subscale concerns recognising speech in adverse, everyday listening situations, which may contribute to the significant correlations. The correlations remained significant for the normally hearing group between the behavioural measures and the self-report measures in the speech and quality subscales, even when controlling for age. However, it is important to note that this correlation was found statistically significant for the normally hearing group, and not for the hearing aid users, where similar results were reported by Demeester et al. (2012) and von Gablenz et al. (2018). Demeester et al. (2012) used PTA4 as a behavioural measure and did not investigate behavioural speech recognition in noise, contrary to the present study. The present study did not find correlations between self-report scores of the SSQ and PTA4 for either hearing aid users or normally hearing individuals, while Demeester et al. (2012) found significant correlations in their population. As previously discussed by e.g. Demeester et al. (2012), it may be that the hearing aid users were experienced hearing aid users and hence have adapted to their hearing status, thereby reporting fewer hearing difficulties in the SSQ. A further explanation could be ascribed to the experimental hearing aid. The experimental hearing aid used a linear amplification algorithm based on every participant's pure-tone audiogram, ensuring that test results would not be confounded by differences in hearing aids and signal processing strategies between participants. However, it is possible that the results in the present study may have had a different outcome if the participants had used their own hearing aids.
The results in the present study suggest that there may be an important relationship between behavioural measures of speech recognition in noise, using a speech material with higher ecological validity, and the difficulties an individual perceives subjectively in adverse listening conditions, regardless of age. Previous research by e.g. Capretta and Moberly (2016), Heo et al. (2013), and Ramakers et al. (2017) mainly found weak or moderate correlations between self-report measures of the SSQ, and behavioural speech recognition in noise measures in hearing impaired individuals with cochlear implants, using contralateral hearing aids, whereas no correlations were found for the hearing aid users in the present study.
The present study showed significant correlations between the speech subscale of the SSQ and the HINT sentences in the older normally hearing participants. Heinrich et al. (2019) found a weak correlation between a behavioural speech recognition in noise measure and the speech subscale of the SSQ in older adults not using hearing aids. Banh et al. (2012) found a negative correlation between speech recognition in noise and the speech subscale in the SSQ in a young normally hearing group. Their results also indicated that only a few questions in the speech subscale were driving the negative correlation, which is in line with the results in the current study. Only a few items in the speech subscale, all involving multiple talkers and/or adverse listening conditions, correlated with the HINT sentences, showing that the speech subscale in its entirety may not reflect difficulties listening to speech in adverse situations. Banh et al. (2012) reported that older adults reported the most difficulties on items involving divided attention or multiple background conversations. The items which correlated with speech recognition in noise performance in the current study also reflect listening situations of divided attention or multiple talkers. Although not investigated in the present study, this indicates that not only auditory processing factors but also cognitive factors (previously investigated by e.g. Ng Ning Hoi et al. 2013) may be involved when individuals rate listening in various real-life listening situations. However, compared with the Banh et al. (2012) study, the present study used a larger sample of older adults but did not include a group of younger adults. Although the results are not directly comparable, the older adults with normal hearing in the current study show a pattern similar to the results from Banh et al. (2012), suggesting that the results may still vary depending on age, sample size, and hearing status.
The significant correlations in this study were negative, showing that with lower performance in the speech recognition in noise tests, the ratings on the speech subscale decreased. As previously suggested by e.g. Banh et al. (2012), Richard (2016), and McRackan et al. (2018), the results in this study also suggest that an individual's subjective view is of importance and that the SSQ may be an important subjective complementary tool alongside behavioural measures used in the clinic to assess the need for, and extent of, possible hearing rehabilitation. The lack of significant correlations between the self-report scores in the SSQ and the behavioural measures for the hearing aid users suggests that it is important to consider subjective experiences of a hearing impairment to help illuminate some of the difficulties experienced by the individual.

Conclusions
The present study aimed to investigate the relationship between behavioural measures of two different speech recognition in noise tests and self-report scores from the subscales in the SSQ in two groups differing in air-conduction pure-tone hearing thresholds. We further aimed to investigate correlations between speech recognition in noise and items in the speech subscale. The results in the present study showed that the speech and quality subscale ratings in the normally hearing adults significantly correlated with performance in the HINT, but not the Hagerman sentences. For the adult hearing aid users (listening through the experimental hearing aid), no significant correlations were found between behavioural and self-report measures. The results further showed that a few items in the speech subscale significantly correlated with the HINT. The items correlating with the HINT all concerned listening to multiple talkers and/or situations with adverse surrounding listening conditions. In conclusion, this study showed that behavioural and self-report measures related to each other in older normally hearing adults, but a relationship between these measures in hearing aid users listening through the experimental hearing aid could not be found, hence illuminating the ambiguous relationship found between behavioural and self-report measures.
In light of the results of the present study, it would be of interest, and of importance, to evaluate and validate which specific questions in the SSQ subscales relate to speech recognition in noise performance in individuals with varying pure-tone hearing thresholds (aided and unaided). It may also be of importance to investigate if cognitive factors affect which questions in the SSQ may be of importance for speech recognition depending on hearing status.

Disclosure statement
No potential conflict of interest was reported by the authors.