Financial reward has differential effects on behavioural and self-report measures of listening effort

Abstract
Objectives: To investigate the effects of listening demands and motivation on listening effort (LE) in a novel speech recognition task.
Design: We manipulated listening demands and motivation using vocoded speech and financial reward, respectively, and measured task performance (correct response rate) and indices of LE (response times (RTs), subjective ratings of LE and likelihood of giving up). Effects of inter-individual differences in cognitive skills and personality on task performance and LE were also assessed within the context of the Cognitive Energetics Theory (CET).
Study sample: Twenty-four participants with normal hearing (age range: 19–33 years, 6 male).
Results: High listening demands decreased the correct response rate and increased RTs, self-rated LE and self-rated likelihood of giving up. High financial reward increased subjective LE ratings only. Mixed-effects modelling showed small fixed effects of competitiveness on LE measured using RTs. Small fixed effects were also found for cognitive skills (lexical decision RTs and backwards digit span) on LE measured using RTs and on the correct response rate, respectively.
Conclusions: The effects of listening demands on LE in the speech recognition task aligned with CET, whereas predictions regarding the influence of motivation, cognitive skills and personality were only partially supported.


Introduction
Listening effort (LE) has been defined as 'the mental exertion required to attend to, and understand, an auditory message' (McGarrigle et al. 2014, p. 434). A number of subjective (e.g. the NASA Task Load Index (NASA-TLX; Hart and Staveland 1988)), behavioural (e.g. RTs) and physiological (e.g. cardiac reactivity) measures of LE have been proposed (see McGarrigle et al. 2014 for a review). The Framework for Understanding Effortful Listening (FUEL) defines LE as the 'deliberate allocation of mental resources to overcome obstacles in goal pursuit when carrying out a [listening] task' (Pichora-Fuller et al. 2016, p. 10S). According to this definition, the allocation of cognitive resources is not an automatic response to increased listening demands but occurs only when a listener is motivated to achieve a particular goal.
The conceptual understanding of motivation outlined by FUEL builds upon Motivational Intensity Theory (MIT; Brehm and Self 1989), which posits that effort expenditure is influenced by both motivation and task demands. Most importantly, MIT predicts an interaction between listening demands and motivation. For relatively easy tasks, resource conservation limits the influence of motivation, such that the amount of effort expended never exceeds that required, regardless of motivation level. In contrast, for relatively hard tasks, effort is mobilised in line with motivation: the greater the importance of success, the more effort is exerted. The predicted interaction is therefore driven by a greater influence of motivation at higher demands, as seen in some previous LE studies (e.g. Kahneman and Beatty 1966; Richter 2016; Mirkovic et al. 2019; Zhang et al. 2019).
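To make the predicted interaction concrete, the sketch below encodes a deliberately simplified reading of MIT in arbitrary units. It is our illustration, not a model fitted in this study: effort tracks task demand as long as the demand does not exceed potential motivation (the most effort that success is worth to the listener), and the listener disengages otherwise. The function names and all numeric values are hypothetical.

```python
def mit_effort(demand: float, potential_motivation: float) -> float:
    """Toy MIT rule (arbitrary units): invest what the task requires, if success is worth it."""
    if demand <= potential_motivation:
        return demand  # resource conservation: never more than the task requires
    return 0.0  # demand exceeds what success is worth: disengage


def motivation_effect(demand: float, low: float, high: float) -> float:
    """Effort difference between high and low potential motivation at one demand level."""
    return mit_effort(demand, high) - mit_effort(demand, low)


# Hypothetical values: easy task = 2, hard task = 6; low motivation = 4, high motivation = 8.
for demand in (2.0, 6.0):
    print(demand, mit_effort(demand, 4.0), mit_effort(demand, 8.0),
          motivation_effect(demand, 4.0, 8.0))
```

With these values, motivation makes no difference to effort at the easy level but a large difference at the hard level, which is the demand × motivation interaction described above.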
Nevertheless, there are inconsistencies both within and between studies that do not fit with MIT predictions based solely on task demands and motivation. When multiple LE measures are used within the same study, they do not always show consistent effects of listening demands and motivation; for example, a tone discrimination task produced changes in cardiac reactivity but no effects on performance accuracy or response times (RTs; Richter 2016). These differential effects may be due to the multi-dimensionality of LE, with different outcome measures tapping into interrelated aspects of the LE construct (McMahon et al. 2016; Strauss and Francis 2017; Hughes et al. 2018; Strand et al. 2018; Alhanbali et al. 2019; Herrmann and Johnsrude 2020). Moreover, additional factors, such as the way in which motivation is operationalised, may moderate the relationship between task difficulty and motivation (Picou and Ricketts 2014; cf. Koelewijn et al. 2018; Richter 2016). For instance, no significant effect on RTs was found when participants were motivated with financial reward (Richter 2016), whereas 'threat of evaluation' decreased RTs in an auditory oddball task (Carrillo de la Peña and Cadeveira 2000). Evaluative threat may increase arousal, a factor that has been shown to result in faster RTs (Hackley and Valle-Inclán 1998); in other task designs, however, increased LE may be reflected in slower responses due to greater cognitive processing (Pisoni and Tash 1974; Marslen-Wilson and Tyler 1980). The slowing of RTs with increased listening demands has been interpreted as reflecting increased LE (Houben, van Doorn-Bierman, and Dreschler 2013). However, RTs are not a 'process pure measure' (Pichora-Fuller et al. 2016, p. 19S); changes in RT may therefore also reflect other aspects of cognition, such as memory.
CONTACT Peter J. Carolan peter.carolan@postgrad.manchester.ac.uk The University of Manchester, A4.01 Ellen Wilkinson Building, Oxford Road, Manchester M13 9PL, UK
Personality traits, such as need for closure, are not considered by FUEL or MIT but may well influence motivation and effort expenditure. Need for closure refers to the strength of an individual's preference for clear, ordered and stable knowledge over confusion, uncertainty and ambiguity (Kruglanski and Webster 1996; Roets and van Hiel 2008; Viola et al. 2015). Participants with high need for closure, measured using the Need for Closure Scale (Kruglanski and Webster 1996), tend to choose less effortful means to achieve closure (see Kruglanski (2004) for an overview), but may exert greater effort when only effortful means to achieve closure are available (Kruglanski, Peri, and Zakai 1991; Kruglanski, Webster, and Klem 1993; Klein and Webster 2000; Richter, Baeriswyl, and Roets 2012; Sankaran, Szumowska, and Kossowska 2017). Another potentially important personality trait is achievement motivation, which refers to an individual's desire to achieve success and accomplish challenging goals (Capa, Audiffren, and Ragot 2008). The strength of this motive relative to an individual's desire to avoid failure determines resultant achievement motivation (McClelland et al. 1953). Individuals high in resultant achievement motivation may exert more effort than those low in resultant achievement motivation (Beh 1990; Capa, Audiffren, and Ragot 2008; Hinsz and Jundt 2005; Humphreys and Revelle 1984).
A model from the wider field of effort research, the Cognitive Energetics Theory (CET; Kruglanski et al. 2012), accommodates many of the aspects of motivation discussed above and may thus be particularly well suited to explaining current inconsistencies in LE research. Although it also builds upon MIT, CET incorporates personality factors and individual differences in resource capacity into a theory of effort and performance. CET therefore offers a more comprehensive model of motivation and effort than FUEL (Pichora-Fuller et al. 2016). In addition, CET is intended to apply to 'all instances of goal-directed thinking' (Kruglanski et al. 2012, p. 3), whereas FUEL posits that LE involves the 'deliberate' allocation of resources. Thus, CET is applicable to subjective effort associated with goal pursuit (which may be measured using self-rated outcomes), objective effort (which may be indexed using behavioural and physiological outcomes) and performance accuracy.
CET describes the underlying decision-making process behind effort investment in terms of opposing forces: a 'driving force' towards exerting effort (which depends upon goal importance, i.e. how motivated a person is to succeed in the task, and resource availability) and a 'restraining force' towards restricting effort (which depends upon task demands but also an individual's tendency to conserve resources). The balance between these forces is assumed to govern how much effort is exerted. CET makes a distinction between the maximum energy an individual is willing to mobilise to achieve a specific goal and the actual energy used, which depends upon several factors, including individual differences in the tendency towards resource conservation.

Application of CET to a speech recognition task
In CET, the individual's assessment of the importance of goal achievement and the size of their resource pool determine the magnitude of the 'potential' driving force (Kruglanski et al. 2012). Within a speech recognition task, we propose that the resource pool relates to two types of cognitive skills that have received particular scrutiny in the context of speech perception: working memory resources (Rönnberg 2003; Rönnberg et al. 2013; Rönnberg, Holmer, and Rudner 2019) and linguistic skills, operationalised as lexical decision-making ability (Kaandorp et al. 2016).
In CET, actual effort expenditure is limited by a restraining force consisting of three additive components: (a) task demands, (b) alternative goals, which compete with the target activity for resources, and (c) resource conservation (Kruglanski et al. 2012). Resource conservation is high in individuals with a high need for closure (Kruglanski and Webster 1996; Roets and van Hiel 2008; Viola et al. 2015) and, based on previous studies (e.g. Beh 1990; Hinsz and Jundt 2005; Capa, Audiffren, and Ragot 2008), may be low in individuals whose motivational style is more focussed on achieving success (i.e. high achievement motivation). Hence, considering these personality traits may help to account for differences between LE studies.
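The balance of forces described in this section can be summarised in a toy calculation. The restraining force is the sum of the three additive components (a) to (c) above. The potential driving force is taken here as the product of goal importance and resource pool; this combination rule is our assumption, since the text states only that both factors determine it. We also assume that goal pursuit proceeds only when the driving force at least matches the restraining force, and we read any surplus as room for the more effective, resource-heavy strategies discussed in the predictions that follow. All names and values are hypothetical.

```python
from typing import NamedTuple


class CetOutcome(NamedTuple):
    pursued: bool   # True if the driving force at least matches the restraining force
    surplus: float  # spare driving force beyond what the restraining force requires


def potential_driving_force(goal_importance: float, resource_pool: float) -> float:
    # Assumed combination rule (not specified in the text): importance scales resources.
    return goal_importance * resource_pool


def restraining_force(task_demands: float, alternative_goals: float,
                      resource_conservation: float) -> float:
    # The three additive components (a) to (c) named in the text.
    return task_demands + alternative_goals + resource_conservation


def cet_outcome(goal_importance: float, resource_pool: float, task_demands: float,
                alternative_goals: float, resource_conservation: float) -> CetOutcome:
    driving = potential_driving_force(goal_importance, resource_pool)
    restraining = restraining_force(task_demands, alternative_goals, resource_conservation)
    return CetOutcome(pursued=driving >= restraining,
                      surplus=max(0.0, driving - restraining))


# Hypothetical values: importance 2, pool 3, demands 2, alternative goals 1.
print(cet_outcome(2, 3, 2, 1, 1))  # low resource conservation
print(cet_outcome(2, 3, 2, 1, 4))  # high resource conservation (e.g. high need for closure)
```

In this toy version, raising resource conservation from 1 to 4 removes the surplus and tips the balance against goal pursuit, which mirrors the moderating role hypothesised for need for closure.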
Another advantage of using CET over FUEL is that it makes quantifiable predictions about the likely effects of manipulating task demands and motivation on effort and performance in terms of driving and restraining forces. A strong driving force, for example, is expected to permit the use of more effective means of goal attainment and lead to better performance, albeit at a cost in terms of effort. A strong restraining force, on the other hand, restricts the use of these resource-heavy strategies and hence results in poorer performance. In the present study, our main aim was to investigate how listening demands and motivation (operationalised as financial reward) regulate LE in a speech recognition task, in the context of CET. Our secondary aims were to investigate the possible moderating influence of (i) resource conservation (operationalised as need for closure and individual differences in achievement motivation) and (ii) resource pool capacity (operationalised as working memory span and lexical decision-making ability) on the relationship between motivation and listening demands. We chose different types of outcome measures to reflect the multidimensionality of LE and to test whether CET predictions apply to the subjective outcome measures (self-reported ratings of LE and of the likelihood of 'giving up', or avoidance, as described in Picou and Ricketts (2014)) and the objective outcome measures (correct response rate, RT) used in the present study. We made the following predictions. First, we predicted a main effect of listening demands, with higher demands resulting in lower correct response rates, longer RTs and higher subjective ratings of LE and likelihood of 'giving up'. Second, we predicted a main effect of reward: high-value reward was expected to motivate participants more than low-value reward, resulting in higher correct response rates, longer RTs, higher self-rated LE and lower self-rated likelihood of giving up.
Interactions between listening demands and reward were predicted for the correct response rate, RTs and subjective ratings of LE and likelihood of giving up, driven by a greater motivational influence of financial reward under higher listening demands. In addition, we hypothesised that differences in resource pool capacity (measured by working memory span and lexical decision-making ability) and in the resource-conservation component of the restraining force (need for closure and individual differences in achievement motivation) would predict the correct response rate and subjective and behavioural measures of LE. We expected measures of resource conservation and of resource pool capacity to interact with listening demands and reward for behavioural and self-report LE outcomes.

Participants
To be eligible, participants needed to be between 18 and 35 years old, with normal hearing (NH), normal or corrected-to-normal vision and no previous neurological issues or speech problems. Twenty-four (18 female) native English-speaking adults with NH participated in the study, ranging from 19 to 33 years of age (median = 23). This sample size is sufficient to achieve 80% power (1 − β = .80, α = .05) for a medium effect size (f = .25) in a 2 × 2 repeated-measures factorial design (Faul et al. 2009) for each of the four main outcome measures (see Procedures and Data Analysis). Before taking part, participants were informed that the purpose of the study was to understand whether a person's motivation to complete a listening task changes the amount of effort they use. Participants were compensated for their time with a £15 honorarium and were informed that they could earn additional performance-based rewards by answering the questions correctly, in order to incentivise maximal effort throughout the task (see Speech Recognition Task section below). The study was reviewed and approved by the University of Manchester Research Ethics Committee (approval number: 2019-6493-10583) and pre-registered with the Open Science Framework (https://osf.io/6x7pd?view_only=d91bf9c111124fc2ab1fc6c52893182f).

Hearing screening
Each participant was screened using otoscopy, tympanometry and pure-tone audiometry to ensure they met the eligibility criteria. All participants had bilateral NH (≤20 dB HL at 250, 500, 1000, 2000, 4000 and 8000 Hz; British Society of Audiology (BSA) 2017) and reported no recent ear infections or surgery, previous neurological issues or speech problems.

Materials
Speech recognition task: Stimuli
A speech recognition task using degraded sentences was chosen because such tasks are effective in eliciting LE that can be measured using RTs (e.g. Gatehouse and Gordon 1990; Pals et al. 2015). Ninety Harvard IEEE sentences (Rothauser et al. 1969), spoken by a male speaker, were used as the speech materials. Speech intelligibility was modified using vocoding, an effective way to manipulate the intelligibility of speech in a controlled manner (Drullman, Festen, and Plomp 1994; Shannon et al. 1995). Vocoding has been shown to affect subjective and objective measures of LE (McMahon et al. 2016; Winn 2016).
Vocoded stimuli were created using a custom algorithm in MATLAB (R2018a; The MathWorks). Speech stimuli were processed using a 2-band (high listening demands) or a 3-band (moderate listening demands) tone vocoder, with the vocoder bands logarithmically spaced between 80 and 8000 Hz. Two and three bands were chosen on the basis of pilot testing with the speech recognition task described below, which yielded mean correct response rates of around 80% in the moderate listening demands condition and around 50% in the high listening demands condition. The carrier frequencies were 225, 1047 and 4861 Hz for the 3-band vocoder and 440 and 4440 Hz for the 2-band vocoder. The temporal envelope of the output of each band was extracted by half-wave rectification and smoothing (low-pass filter with a cut-off frequency of 300 Hz) and used to modulate a sinusoidal carrier with a frequency equal to the centre frequency of the vocoder band. The modulated carriers from all bands were then summed to produce tone-vocoded sentences.
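The band layout and envelope step can be illustrated with a short sketch. Assuming that band edges are spaced logarithmically between 80 and 8000 Hz and that each carrier sits at the arithmetic midpoint of its band (our inference, not a stated rule), the 3-band edges are about 80, 371, 1724 and 8000 Hz, and truncating the midpoints reproduces the reported 225, 1047 and 4861 Hz carriers. The same rule gives 440 and 4400 Hz for the 2-band vocoder, so the reported upper carrier of 4440 Hz does not follow from it. The envelope smoother below is a first-order low-pass filter; the filter type and order are assumptions, band-pass analysis filtering is omitted, and none of this is the authors' MATLAB algorithm.

```python
import math


def log_band_edges(n_bands: int, f_lo: float = 80.0, f_hi: float = 8000.0) -> list:
    """Band edges (Hz) spaced logarithmically between f_lo and f_hi."""
    ratio = f_hi / f_lo
    return [f_lo * ratio ** (k / n_bands) for k in range(n_bands + 1)]


def carrier_frequencies(n_bands: int, f_lo: float = 80.0, f_hi: float = 8000.0) -> list:
    """Assumed rule: each carrier sits at the arithmetic midpoint of its band (Hz)."""
    edges = log_band_edges(n_bands, f_lo, f_hi)
    return [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]


def tone_vocode_band(band_signal: list, fs: float, carrier_hz: float,
                     cutoff_hz: float = 300.0) -> tuple:
    """Half-wave rectify one band, smooth it with a one-pole low-pass filter,
    and use the resulting envelope to modulate a sine carrier."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    envelope, y = [], 0.0
    for x in band_signal:
        y += alpha * (max(x, 0.0) - y)  # rectify, then first-order smoothing
        envelope.append(y)
    output = [e * math.sin(2.0 * math.pi * carrier_hz * n / fs)
              for n, e in enumerate(envelope)]
    return envelope, output


print([round(c, 1) for c in carrier_frequencies(3)])
print([round(c, 1) for c in carrier_frequencies(2)])
```

A complete vocoder would apply `tone_vocode_band` to each band-pass-filtered band, using that band's carrier, and sum the outputs, as described in the text.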
Speech recognition task: Procedure
As shown in Figure 1, the sentence was presented twice in each trial of the speech recognition task. In the moderate listening demands condition, the first presentation ('cue') of the sentence was vocoded to produce a moderate degree of intelligibility (3-band vocoder) and was followed by a second presentation ('target') vocoded for low speech intelligibility (2-band vocoder). In the high listening demands condition, the sentence was always presented (both 'cue' and 'target') at a low intelligibility level (2-band vocoder). Thus, the second presentation ('target') of the speech sentence was always generated with a 2-band vocoder and was therefore physically identical in the high and moderate listening demands conditions; only the 'cue' sentence differed in its physical properties and perceived intelligibility. Our approach dissociates the perceptual effects of changes in speech intelligibility and LE from the acoustic differences that can be used to vary listening demands. This is achieved by manipulating the perceived intelligibility of identical speech stimuli through prior exposure: vocoded speech that is initially relatively unintelligible can become more intelligible after participants are exposed to an intelligible version of the same speech stimulus (e.g. Davis et al. 2005; Millman, Johnson, and Prendergast 2015). Using acoustically identical target stimuli to manipulate listening demands and LE could be particularly advantageous when interpreting changes in objective (physiological) measures of LE.
Figure 1. Depiction of a typical 'high' reward trial. In 'low' reward trials, the pre-trial screen informed participants that the reward was £0.25 rather than £2.50.
To future-proof the design for potential physiological testing, an assessment method for speech intelligibility was chosen that minimised movement-related noise caused by overt verbal responses. A test word was selected randomly from the beginning, middle or end of the sentence to ensure that participants had to listen to the entire sentence. Of the 80 test words, 27 were selected from the beginning, 27 from the middle and 26 from the end of the sentence. Participants were asked to select, using a mouse, which word they had heard in the preceding sentence from a 6-word visual grid containing the test word and five foils (see Figure 1). The mouse cursor returned to the middle of the screen when the visual grid was displayed. The location of the test word varied randomly within the grid, with an equal chance of the test word appearing in any of the six positions. All five foils were phonologically or semantically related either to the test word or to other foils. For instance, for the sentence 'The loss of a second ship was hard to take' and the test word 'take', the foils were: phonological foils related to 'take' ('talk', 'tale'); a semantic foil for 'take' ('accept'); and phonological or semantic foils for other foils ('except', 'tell'). An online rhyming dictionary, rhymezone.com, was used to select phonologically and semantically related foils (similarity rating >90). The six options were presented immediately after the speech presentation to minimise memory requirements.
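A minimal sketch of how the response options could be randomised, assuming Python's `random` module and the example sentence above; the function names are hypothetical, and the study's presentation software is not described here. The target is shuffled together with its five foils, so each of the six grid positions is equally likely to hold it, and the 27/27/26 split of test-word positions is shuffled across the 80 test sentences.

```python
import random
from collections import Counter


def make_response_grid(target: str, foils: list, rng: random.Random) -> list:
    """Return the target and its five foils in random order (six grid slots)."""
    if len(foils) != 5:
        raise ValueError("expected exactly five foils")
    grid = [target, *foils]
    rng.shuffle(grid)  # every slot is equally likely to hold the target
    return grid


def assign_test_word_positions(rng: random.Random) -> list:
    """Shuffle 27 beginning, 27 middle and 26 end test-word positions over 80 sentences."""
    positions = ["beginning"] * 27 + ["middle"] * 27 + ["end"] * 26
    rng.shuffle(positions)
    return positions


rng = random.Random(2019)  # fixed seed for a reproducible illustration
print(make_response_grid("take", ["talk", "tale", "accept", "except", "tell"], rng))
print(Counter(assign_test_word_positions(rng)))
```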

Correct response rate and RTs
Participants were asked to respond as accurately and quickly as possible. The percentage of correct responses (correct response rate) and the mean response speed (RTs) were measured. Mean RTs were computed including incorrect trials to avoid data loss (Houben, van Doorn-Bierman, and Dreschler 2013). Mean RTs for incorrect trials were 3.1 s longer than those for correct trials (t(748) = 16.48, p < .001), but excluding incorrect trials from the analysis did not change the overall pattern of results.

Subjective ratings of LE and likelihood of giving up
After each trial in the speech recognition task (see Figure 1), the monitor displayed two consecutive questions to gauge subjective LE and the likelihood of giving up: 'How hard did you work to understand what was said?' and 'How likely would you be to give up or just stop trying?' We refer to these measures as self-rated 'work' and 'giving up', respectively. The wording used to elicit these self-report ratings was almost identical to that used by Picou and Ricketts (2014, 2018) and is based on questions from the Speech, Spatial and Qualities of Hearing Scale (Gatehouse and Noble 2004). Participants provided subjective ratings, using a mouse, on a visual scale from 0 ('not at all') to 100 ('very').

NASA task load index
In the NASA Task Load Index (NASA-TLX; Hart and Staveland 1988), participants rate how mentally, physically and temporally demanding they found a recently completed task. They also rate their perceived performance level, how much effort they used and how frustrating they found the task. Rating scales run from 1 ('very low') to 20 ('very high'), except for self-rated performance, for which the scale runs from 1 ('failure') to 20 ('perfect'). Participants completed the NASA-TLX immediately after completing all trials. These ratings were collected to gain an overall picture of effort levels and perceptions of the task to aid the interpretation of other analyses.
Visual search task
A target word was displayed visually and participants were instructed to select it in the 6-word grid as quickly as possible. The mean visual search RT was calculated from 20 trials. Because the items in the 6-word response grid used for the speech recognition task were not equally spaced (i.e. selecting the outer items required a slightly greater mouse movement), each participant's mean visual search RT was used to account for physical differences in the spacing of the items in the grid.

Covariate measures
Motivational personality traits
Achievement motivation was measured using the Personal Mastery and Competitive Excellence subscales of the Motivational Trait Questionnaire (Heggestad and Kanfer 2000). Both subscales index achievement-orientated traits: individuals scoring high in personal mastery strive to maximise their performance even on challenging tasks, whilst individuals high in competitive excellence strive to achieve a level of success above that of their peers. The Personal Mastery subscale has 16 items, including 'I set goals as a way to improve my performance'. The Competitive Excellence subscale has 13 items, including 'Even in non-competitive situations, I find ways to compete with others'. Statements were rated from 1 (very untrue of me) to 6 (very true of me).
Need for closure was measured using the Need for Closure Scale (Roets and van Hiel 2011, updated from the original version by Webster and Kruglanski 1994), which indexes a person's closed-mindedness, dislike of uncertainty and preference for order and predictability (Roets et al. 2015). The scale has 15 items, for example 'I don't like situations that are uncertain'. Participants rated these statements from 1 (strongly disagree) to 6 (strongly agree).

Cognitive tests
All cognitive tests were carried out using Inquisit 5 (Millisecond Software 2015). Working memory was assessed using an auditory version of the backwards digit span test. Participants were presented with a series of digits and asked to recall them in reverse order. Responses were recorded using a computer keyboard. Participants received two practice trials prior to the main assessment. Participants were initially presented with a 2-digit sequence. Subsequently, the sequence length was adjusted based on performance. Correct recall increased the length of the sequence by 1; failing to recall the sequence correctly after two attempts reduced the sequence length by 1. The backwards digit span was defined as the maximal sequence length of correctly recalled digits after 14 trials.
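The adaptive rule can be expressed as a simple staircase. The sketch below assumes that each attempt counts as one of the 14 trials, that the second attempt at a failed length immediately follows the first, and that the length never drops below two digits. These details, the simulated responder and all names are hypothetical rather than taken from the Inquisit script.

```python
from typing import Callable


def backwards_digit_span(recalls_correctly: Callable[[int], bool], n_trials: int = 14,
                         start_length: int = 2, min_length: int = 2) -> int:
    """Longest correctly recalled length on an up-1 / down-1-after-two-misses staircase."""
    length, misses, span = start_length, 0, 0
    for _ in range(n_trials):
        if recalls_correctly(length):
            span = max(span, length)
            length += 1  # correct recall: next sequence is one digit longer
            misses = 0
        else:
            misses += 1
            if misses == 2:  # second failed attempt at this length
                length = max(min_length, length - 1)
                misses = 0
    return span


# Hypothetical responder who can reverse sequences of up to five digits.
print(backwards_digit_span(lambda n: n <= 5))  # prints 5
```

With this responder, the staircase climbs to six digits within four trials and then oscillates between five and six, so the recorded span is five.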
Linguistic ability was assessed using a lexical decision-making task. Participants were presented with 4- or 5-letter strings and had to indicate as quickly and accurately as possible whether each string was a word or a non-word. Responses were recorded via a computer keyboard to yield the lexical decision RT. The task consisted of a practice block of 6 trials (3 non-words and 3 words in random order), followed by 52 test trials (26 words and 26 non-words presented in random order). In each trial, a fixation cross was presented for 700 ms, followed by the stimulus for 250 ms and then a blank screen. The mean RT was calculated for correct trials only.

Procedures
All tasks were completed in a single testing session lasting around 1 hour. For the speech recognition task, participants were seated in a sound-attenuated booth facing a computer monitor and given task instructions. During each trial, vocoded sentences were presented diotically at a fixed level of 65 dB(A) via loudspeakers at ±45° azimuth.
After a practice block of 10 trials, 80 test trials were presented in 8 blocks of 10 trials each. There were four high-reward (£2.50) and four low-reward (£0.25) blocks, presented in random order. No explanation was given as to why some trials were worth more than others. Before each block, participants were informed that they would receive a financial bonus for answering 6 or more items correctly over the next block of 10 trials. Each block consisted of five trials with moderate listening demands and five trials with high listening demands, presented in random order. Feedback on performance and the associated reward was withheld until the end of the experiment, to dissociate the effects of financial reward from mood-related changes in effort, which may occur when participants are given trial-by-trial feedback (Carver 2006; Koelewijn et al. 2018).
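The block structure and bonus rule can be sketched as follows, using integer pence to avoid floating-point rounding. Randomisation via Python's `random` module, the data layout and the function names are assumptions for illustration; the text specifies only the block counts, the reward amounts and the threshold of 6 correct out of 10.

```python
import random

REWARD_PENCE = {"high": 250, "low": 25}  # £2.50 and £0.25 per block
BONUS_THRESHOLD = 6  # correct answers needed out of 10 in a block


def make_schedule(rng: random.Random) -> list:
    """Eight blocks (4 high, 4 low reward) in random order, each with 5 moderate and 5 high-demand trials."""
    rewards = ["high"] * 4 + ["low"] * 4
    rng.shuffle(rewards)
    schedule = []
    for reward in rewards:
        demands = ["moderate"] * 5 + ["high"] * 5
        rng.shuffle(demands)
        schedule.append({"reward": reward, "demands": demands})
    return schedule


def total_bonus_pence(schedule: list, n_correct_per_block: list) -> int:
    """Sum the block bonuses earned, computed once at the end of the session."""
    if len(schedule) != len(n_correct_per_block):
        raise ValueError("need one score per block")
    return sum(REWARD_PENCE[block["reward"]]
               for block, n_correct in zip(schedule, n_correct_per_block)
               if n_correct >= BONUS_THRESHOLD)


schedule = make_schedule(random.Random(1))
print([block["reward"] for block in schedule])
print(total_bonus_pence(schedule, [10] * 8))  # every block passed: 4 x 250 + 4 x 25 pence
```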
After the speech-recognition task, participants completed the visual search task. Following this, participants were asked to complete the NASA-TLX (Hart and Staveland 1988) and the personality questionnaires. Finally, participants were asked to complete the backwards digit span and the lexical decision-making tasks.

Data analyses
Prior to statistical analysis, correct response rates, self-rated 'work' and self-rated 'giving up' were converted to rationalised arcsine units (RAU; Studebaker 1985). To remove outliers from the RT data, RTs more than three standard deviations from each participant's mean were removed (Picou, Charles, and Ricketts 2017). A log10 transformation was then applied to the RTs to meet the normality assumption of parametric statistics. For each dependent variable (correct response rate, RT, self-rated 'work', self-rated 'giving up'), a repeated-measures analysis of variance (ANOVA) with two within-subject factors, listening demands (moderate/high) and financial reward (low/high), was conducted.
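The two transformations can be sketched as below. The RAU function follows Studebaker's (1985) formula for X correct out of N, θ = arcsin√(X/(N+1)) + arcsin√((X+1)/(N+1)) with RAU = (146/π)θ − 23. This section does not state how N was chosen for the 0–100 ratings. The RT filter assumes a single pass using the sample standard deviation, which the text does not specify. Function names are hypothetical, and the study's own analyses were run in R.

```python
import math
import statistics


def rau(n_correct: float, n_items: int) -> float:
    """Studebaker (1985) rationalised arcsine transform of X correct out of N."""
    theta = (math.asin(math.sqrt(n_correct / (n_items + 1)))
             + math.asin(math.sqrt((n_correct + 1) / (n_items + 1))))
    return (146.0 / math.pi) * theta - 23.0


def clean_log_rts(rts_s: list, n_sd: float = 3.0) -> list:
    """Drop one participant's RTs more than n_sd sample SDs from their mean, then take log10."""
    mean = statistics.mean(rts_s)
    sd = statistics.stdev(rts_s)
    kept = [rt for rt in rts_s if abs(rt - mean) <= n_sd * sd]
    return [math.log10(rt) for rt in kept]


print(round(rau(50, 100), 6))  # half correct maps to 50 RAU
```

The transform is symmetric about 50 RAU (X and N − X map to values summing to 100) and extends slightly beyond 0 and 100 at the extremes, which is what stabilises the variance near floor and ceiling.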
Linear mixed modelling was carried out to investigate whether cognitive skills and personality traits predicted outcomes from the speech recognition task. Statistical analyses were run in R version 3.5.1 (R Core Team 2018), using RStudio 1.1.453 and the nlme package (Pinheiro et al. 2020). For each outcome measure from the speech recognition task (correct response rate, RTs, self-rated 'work' and self-rated 'giving up'), exploratory mixed models were fitted. Eight fixed-effect predictors were included: listening demands, financial reward, the listening demands × financial reward interaction, backwards digit span, mean lexical decision RT, mean need for closure score, and total scores on the Personal Mastery and Competitive Excellence subscales of the Motivational Trait Questionnaire. Participants were included as a random effect in all mixed models. We used a backwards stepwise procedure (Pinheiro and Bates 2000) to prune the initial model such that higher-level interaction terms remained only if they improved the model fit. For each significant cognitive and personality main effect in the exploratory models, we conducted further analyses to investigate whether these predictors interacted with financial reward or listening demands. The full model included main effects of listening demands, reward and the cognitive or personality predictor under investigation, plus all first- and second-level interaction effects, and was subsequently pruned in the manner described above.
Figure 2 shows the results of the speech recognition task and the associated measures of LE. The correct response rates (% correct) for the speech recognition task are shown in Figure 2(a).
A repeated-measures ANOVA with two factors (moderate/high listening demands and low/high financial reward) showed a significant effect of listening demands (F(1,23) = 53.76, MSE = 146.49, p < .001, ηp² = .70) on the correct response rate, with a higher mean correct response rate in the moderate compared with the high listening demands condition collapsed across reward condition (moderate: mean = 69.6%, SEM = .020; high: mean = 50.2%, SEM = .031). There was no significant effect of financial reward on the correct response rate (F(1,23) = .296, MSE = 115.49, p = .592, ηp² = .013) and no significant interaction between listening demands and financial reward (F(1,23) = .015, MSE = 85.55, p = .902, ηp² = .001). Figure 2(b) shows mean RTs (log10(s)) for the speech recognition task. A repeated-measures ANOVA conducted on the RTs showed a significant effect of listening demands (F(1,23) = 18.02, MSE = .01, p < .001, ηp² = .44) with a slower mean RT in the high listening demands condition, compared with the moderate listening demands condition collapsed across reward condition (high: mean = .788 log10(s), SEM = .023; moderate: mean = .714 log10(s), SEM = .024). There was no significant effect of financial reward on RTs (F(1,23) = 1.83, MSE = .01, p = .190, ηp² = .074) and the interaction between listening demands and financial reward was non-significant (F(1,23)

Cognitive and personality measures
Group means, standard deviations and ranges for motivational traits and cognitive abilities are shown in Table 1. Based on means/ranges in previous studies (e.g. Viola et al. 2015), all participants would be classified as low in need for closure. Means and standard deviations for the Motivational Trait Questionnaire subscales were similar to those reported by Hinsz and Jundt (2005). Means and standard deviations for backwards digit span were very similar to those recorded for young NH participants in Woods et al. (2011). Means and standard deviations for lexical decision RTs were similar to those of the young NH participants in Strand et al. (2018). Pearson's correlation coefficients between motivational traits (need for closure, competitive excellence and personal mastery) and cognitive skills (backwards digit span and lexical decision RT) were non-significant and small (all r < .2 except competitive excellence and need for closure, where r = .44; data not shown). Table 2 shows exploratory mixed models for each outcome measure, which included listening demands, reward, the demand × reward interaction, backwards digit span, lexical decision RT and totals for the Competitive Excellence and Personal Mastery subsections of the Motivational Trait Questionnaire and the Need for Closure Scale as fixed effects. For correct response rate, alongside a significant effect of listening demands (F(1, 1893) = 81.23, p < .001, ηp² = .04), we found a significant fixed effect of backwards digit span (F(1, 18) = 7.87, p = .01, ηp² = .01). The best-fitting model for the correct response rate consisted of listening demands, reward and backwards digit span, of which listening demands (F(1, 1894) = 81.27, p < .001, ηp² = .02) and backwards digit span (F(1, 22) = 6.62, p = .02, ηp² = .01) were individually significant fixed effects. Hence, backwards digit span did not interact with either listening demands or financial reward to affect the correct response rate.

Discussion
This study evaluated the relationship between listening demands and motivation in a speech recognition task in the context of a multi-factorial model from the wider field of effort research, i.e. CET (Kruglanski et al. 2012). We manipulated motivation by varying financial reward, and listening demands by varying the degree of degradation of the vocoded speech presented to listeners. We measured the effects of these manipulations on four main outcomes (correct response rate, RTs, self-rated work and self-rated giving up). We also considered the modulating effects of personality factors and cognitive skills. The manipulations and co-varying factors, as well as the resulting hypotheses, reflect predictions made by CET.
The prediction of a main effect of listening demands was supported. Varying prior knowledge of tone-vocoded speech was found to be an effective way of manipulating listening demands: high listening demands led to significantly decreased correct response rates, increased RTs and increased self-rated "work" and "giving up". These findings are consistent with CET (Kruglanski et al. 2012) and are also in line with FUEL (Pichora-Fuller et al. 2016).
We also found the predicted main effect of reward, consistent with CET, which stipulates that financial reward increases motivation, resulting in a stronger driving force and greater mobilisation of effort to counteract the restraining force. It is important to note, however, that according to the mixed model (Table 2), the effects of financial reward were limited to self-rated work and giving up and did not extend to increased correct response rates or changes in RTs. It is possible these results did not reflect LE but instead were due to demand characteristics of the experiment, that is, participants realising that greater effort was expected in high reward trials.
Although we cannot rule out demand effects, we propose two alternative interpretations of why financial reward affected only self-rated and not behavioural outcomes. First, LE may be a multi-dimensional concept (McMahon et al. 2016; Strauss and Francis 2017; Hughes et al. 2018; Strand et al. 2018; Alhanbali et al. 2019). On this view, some measures (e.g. self-report) show an effect of LE while others (e.g. behavioural outcomes) do not. In a similar vein, other studies have shown physiological effects in response to increased LE under conditions of high financial reward, but no significant behavioural effects (e.g. Richter 2016; Koelewijn et al. 2018). Self-rated LE measures may also be more sensitive to the effects of motivation than behavioural measures (Pichora-Fuller et al. 2016; Herrmann and Johnsrude 2020). This might explain why RTs did not appear to be sensitive to the financial reward manipulation used in this study. Second, RTs are sensitive to how motivation is operationalised. Weis et al. (2013) found differential effects on RTs in an auditory discrimination task depending on how the financial motivator was presented to participants. In one version it was a reward (starting from zero and gaining money for correct responses); in the other it was a punishment (starting from the maximum and losing money for incorrect responses).
The interaction between listening demands and financial reward was non-significant, contrary to the predictions of CET, FUEL and MIT. Richter, Gendolla, and Wright (2016) suggest a number of extensions to MIT (Brehm and Self 1989) that may limit the greater influence of motivation at higher demands. Some of these may explain the lack of interactive effects in this study. These extensions include fatigue level, mood, and participants' perceptions of their ability to succeed at a task. In the present study, participants scored well above chance (≈50% correct) in the high listening demands condition. Despite this, some participants may have perceived the task as too difficult, in which case offering a greater reward would have had little impact on effort investment. This may also account for the very high levels of frustration recorded on the NASA-TLX (Figure 3). Coupled with differences in task design, this may also explain why our results conflict with those of Zhang et al. (2019). They found an interaction between demands and reward on performance in NH listeners at high correct response rates (≈70–85%).
We predicted that two sets of differences would affect the correct response rate and subjective and behavioural measures of LE in our speech recognition task. The first was differences in resource pool capacity (measured by backwards digit span and lexical decision RTs). The second was differences in the resource conservation aspect of the restraining force (i.e. need for closure and inter-individual differences in achievement motivation). A significant main effect of backwards digit span was found for the correct response rate. This result suggests that participants with greater resources performed better, consistent with the driving force component of CET. This finding also supports the ELU model (Rönnberg 2003; Rönnberg et al. 2013; Rönnberg, Holmer, and Rudner 2019). According to this model, working memory resources are recruited during effortful listening to resolve mismatches between the input and representations stored in long-term memory. Note, however, that individual differences in backwards digit span did not predict behavioural (RT) or self-rated LE, a result that we discuss further in the limitations section (see below). In contrast, lexical decision RT predicted only RTs from the speech recognition task and not correct response rate. Moreover, the direction of this association did not follow CET predictions. Specifically, CET predicts that greater resources (here assumed to be reflected in lexical decision-making ability) would strengthen the driving force. Yet in the present study, greater LE appeared to be exerted by participants who were slower at lexical decision-making. Our results are also inconsistent with previous suggestions that lexical decision RT is related to the correct response rate in a speech task (Lyxell and Rönnberg 1992; Larsby, Hällgren, and Lyxell 2008; Rönnberg et al. 2008; Kaandorp et al. 2016). We found a significant predictive relationship between RTs in the speech recognition task and the lexical decision task. This may simply reflect the fact that both tasks measure information processing speed.
Personality traits (need for closure, competitive excellence and personal mastery) were included in the multi-level models because these traits may influence the tendency to conserve resources as part of the restraining aspect of CET. CET specifically identifies need for closure as a trait that affects resource conservation. We also expected that individuals with higher Motivational Trait Questionnaire scores (indicating greater achievement motivation) would show a stronger interaction between listening demands and motivation. This is because such individuals may be less conservative in the allocation of their resources (Beh 1990; Capa, Audiffren, and Ragot 2008; Hinsz and Jundt 2005). RT was the only outcome measure for which the exploratory modelling showed a significant effect of competitive excellence: more competitive individuals tended to have longer RTs, suggestive of greater LE exertion. However, this main effect did not interact with motivation or listening demands and did not moderate the interaction between these two factors.
Based on CET, we would have expected a significant inverse relationship between participants' need for closure and effort exertion. The null effect for need for closure therefore goes against CET predictions regarding resource conservation. Moreover, contrary to CET-based expectations, need for closure did not interact with motivation or listening demands, nor did it moderate the interaction between these two factors. However, this null effect may be due to a lack of variability in need for closure within our sample: based on previous research (e.g. Viola et al. 2015), all our participants would be classified as having low need for closure. Previous research finding a significant effect of need for closure on effort investment (e.g. Richter, Baeriswyl, and Roets 2012) screened a large number of participants. It then tested only those who scored in the upper and lower quartiles, i.e. an 'extreme group' approach, which increases statistical power (Cohen 1998). We, on the other hand, measured need for closure as a continuous covariate. The effect of personality on LE outcomes in a speech recognition task appears to be small. An extreme group approach and/or a larger sample may therefore have revealed an effect of personality in line with CET.

Limitations of the present study
We did not find a consistent effect of financial reward on the main outcome measures in the present study. By contrast, the research that informed CET predictions includes experiments in which motivation in a listening task was manipulated using financial reward (e.g. Bijleveld, Custers, and Aarts 2009). However, the effectiveness of extrinsic rewards for manipulating motivation has been questioned. Previous meta-analyses have concluded that offering tangible rewards undermines a person's intrinsic motivation, that is, their desire to engage with interesting tasks to the best of their ability (Rummel and Feinberg 1988; Wiersma 1992; Tang and Hall 1995; Deci, Koestner, and Ryan 1999). A performance-contingent reward may erode a person's perceived autonomy and competence at a task if they attribute their performance to the reward rather than to their own interest (Lepper, Greene, and Nisbett 1973; Deci and Ryan 1985). It is, therefore, possible that financial reward may have demotivated some participants in the present study.
More granular aspects of study design may also affect the effectiveness of the motivating variable. We offered a high (£2.50) versus low (£0.25) reward for achieving a correct response rate of ≥60% per 10 trials. Other studies used a much higher reward threshold, e.g. 90% (Richter 2016), which may have encouraged greater effort. On the other hand, setting the threshold too high may discourage effort investment if the goal is perceived as impossible (Brehm and Self 1989). In addition, participants could feasibly exert high effort in every trial, regardless of reward condition, to maximise the overall amount of reward they received. Introducing the need to allocate resources strategically (e.g. Zhang et al. 2019) may result in a more effective manipulation of motivation.
The lack of significant interactions between listening demands and financial reward, and the null effects for the resource pool and resource conservation covariates on our main outcome measures, may be explained by a lack of statistical power and/or a lack of variability in our resource pool measures. Our sample size reflects the number of participants needed to identify main effects on the main outcome measures. The current results will inform appropriate sample sizes in future studies to investigate the interactions predicted by CET/MIT. Such future research can then clarify whether the current null effects were due to methodological limitations of the present study (e.g. lack of extreme groups for personality traits, the possibility that participants perceived the listening task to be too hard), or whether CET is not appropriate for predictions in this particular context.

Conclusions
The present study shows that manipulating prior knowledge of vocoded speech is a feasible way of varying speech intelligibility and associated measures of LE (RTs, self-ratings) in young NH listeners. The effects of offering financial reward on LE were more complex. With increased financial reward, subjective ratings of 'work' and 'giving up' changed, but this was not mirrored by increased correct response rates or by greater LE investment as measured by RTs. We found only partial support for CET predictions, which are intended to apply to 'all instances of goal-directed thinking' (Kruglanski et al. 2012, p. 3). It is unclear whether this is due to the limitations of financial reward as a means of manipulating motivation or to the multi-dimensional nature of LE (McMahon et al. 2016; Pichora-Fuller et al. 2016; Strauss and Francis 2017; Hughes et al. 2018; Strand et al. 2018; Alhanbali et al. 2019). The results of the exploratory analyses presented here suggest that the influence of personality and cognitive skills on effortful listening is small, as is their interaction with listening demands and motivation.

Note
1. The Akaike Information Criterion (AIC) was used to estimate the fit of each model. The AIC of a model containing a particular interaction term was compared with that of a model excluding this term, with all other terms kept identical. The pruned model with the lowest AIC was then compared with the unpruned model at this level. If the AIC of the pruned model was lower, the pruned model was carried forward to the next stage of the analysis and set as the new base model for pruning. If the AIC of the pruned model was higher than that of the unpruned model, indicating a worse fit, an ANOVA was conducted to compare the two model fits. If the pruned model was not significantly worse, it was carried forward as the new base model for pruning. Following this procedure, interaction terms were progressively eliminated until only one remained. The final model was established by using an ANOVA to compare this model with a model consisting of only the three main effects. Maximum likelihood (ML) estimation was used for the stepwise comparisons. Once the final model had been established, fixed effects were estimated using restricted maximum likelihood (REML). This modelling approach is similar to that of Heinrich (2017), Knight and Heinrich (2018) and Heinrich, Ferguson, and Mattys (2019).
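The decision logic of this pruning procedure can be sketched as follows. This is a minimal, hypothetical Python illustration, not the analysis code used in the study, and it does not fit mixed models itself. It assumes a user-supplied function `fit` that returns the ML log-likelihood and the number of estimated parameters for a model containing all main effects plus a given set of interaction terms. AIC is computed as 2k - 2 ln L. The note does not specify several details, so the following are assumptions:
- The ANOVA model comparison is treated as a likelihood-ratio (chi-square) test, as reported, for example, by `anova()` for nested `lme4` models in R.
- The significance level is α = .05.
- Pruning stops if removing a term significantly worsens fit.
- Marginality is not enforced.

```python
import math
from typing import Callable, FrozenSet, Iterable, Tuple

# Hypothetical interface: given a set of interaction terms, return the ML
# log-likelihood and number of estimated parameters of a model containing
# all main effects plus those interaction terms.
FitFn = Callable[[FrozenSet[str]], Tuple[float, int]]


def aic(loglik: float, k: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * loglik


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if x <= 0 or df <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x^((2i-1)/2) / (1*3*...*(2i-1))
    total, term = 0.0, math.sqrt(x)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


def lrt_p(ll_small: float, k_small: int, ll_big: float, k_big: int) -> float:
    """Likelihood-ratio test p-value for nested models (small nested in big)."""
    stat = max(0.0, 2 * (ll_big - ll_small))
    return chi2_sf(stat, k_big - k_small)


def prune(interactions: Iterable[str], fit: FitFn, alpha: float = 0.05) -> FrozenSet[str]:
    """Return the interaction terms retained in the final model."""
    base = frozenset(interactions)
    while len(base) > 1:
        ll_b, k_b = fit(base)
        # Drop each interaction in turn; keep the candidate with the lowest AIC
        candidates = [base - {term} for term in sorted(base)]
        best = min(candidates, key=lambda m: aic(*fit(m)))
        ll_p, k_p = fit(best)
        # Accept if AIC improves, or if the pruned model is not significantly worse
        if aic(ll_p, k_p) < aic(ll_b, k_b) or lrt_p(ll_p, k_p, ll_b, k_b) >= alpha:
            base = best
        else:
            break  # assumption: stop if pruning significantly worsens fit
    # Final step: compare the remaining model with the main-effects-only model
    ll_b, k_b = fit(base)
    ll_m, k_m = fit(frozenset())
    return frozenset() if lrt_p(ll_m, k_m, ll_b, k_b) >= alpha else base
```

In practice, `fit` would wrap a mixed-model fit estimated with ML, so that log-likelihoods of models with different fixed effects are comparable. The fixed effects of the final model would then be re-estimated with REML, as described in the note.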

Disclosure statement
No potential conflict of interest was reported by the author(s).