Conversational expectations get revised as response latencies unfold

ABSTRACT The present study extends neuro-imaging into conversation through studying dialogue comprehension. Conversation entails rapid responses, with negative semiotics for delay. We explored how expectations about the valence of the forthcoming response develop during the silence before the response and whether negative responses have mainly cognitive or social-emotional consequences. EEG-participants listened to questions from a spontaneous spoken corpus, cross-spliced with short/long gaps and “yes”/“no” responses. Preceding contexts biased listeners to expect the eventual response, which was hypothesised to translate to expectations for a shorter or longer gap. “No” responses showed a trend towards an early positivity, suggesting socio-emotional consequences. Within the long gap, expecting a “yes” response led to an earlier negativity, as well as a trend towards stronger theta-oscillations, after 300 milliseconds. This suggests that listeners anticipate/predict “yes” responses to come earlier than “no” responses, showing strong sensitivities to timing, which presumably promote hastening the pace of verbal interaction.


Introduction
The central ecological niche for language, the one in which it evolved, in which it is acquired, and where it is most used, is verbal interaction or conversation. But experimental, and especially neurocognitive approaches to natural language use of this sort are in their infancy (see Bögels & Levinson, 2017 for review). In earlier work (summarised in Levinson, 2016) we have shown that language processing in this niche must in fact interleave comprehension and production processes. This is because, whereas such language use is characterised by short turns (around 2 s long on average) with very short gaps (modally 200 ms) between speakers, latencies in language production are of the order of 600 ms or more (Levinson & Torreira, 2015). Consequently, language processing in this environment has to be predictive and expectation driven. Using EEG there are two ways to show this. One is to look at production processes before the utterance begins. Thus one can show that, where the context makes it possible, response preparation starts well before the prior speaker has finished his or her turn (Bögels, Casillas, & Levinson, 2018;Bögels, Magyari, & Levinson, 2015). The other is to use an overhearer paradigm, where participants listen to conversational snippets (see, e.g. Gisladottir, Bögels, & Levinson, 2018;Gisladottir, Chwilla, & Levinson, 2015), which is the method used in the study we report here. Note that such an overhearer paradigm does not directly investigate actual participants in a conversation, which can be assumed to be the most natural form of language use, although overhearing also occurs regularly in daily life, such as in multi-party conversations or "eavesdropping" on a conversation. Still, the current study goes beyond most earlier research in terms of ecological validity since it uses auditory materials, which were taken from a spontaneous corpus of spoken telephone conversations (see below). 
Moreover, we believe it is likely that any effects found in an overhearer paradigm would be generalisable, or even expected to be enhanced for actual participants in a dialogue, who are presumably more invested in the outcomes.
An interesting question is why, despite the cognitive demands, conversation has this rapid pace. There are a range of possible answers, including the loss of a chance to speak, some even related to ethology or phylogenetics (see Levinson, 2016). But one thing that impels speakers is that overlong pauses between turns seem to have semiotic significance, especially when the first turn requires a response, as in the case of questions, offers, requests, and so on.
For example, when you invite someone to a party, a silence of, say, one second might indicate that the response will not be what you hoped for. Or consider how the caller C in this telephone call interprets a 1.86 s pause as a negative answer to his own question, which he then voices himself (Levinson, 1995):
C: So um (0.2) I was wondering would you be in your office (0.62) on Monday (0.42) by any chance?
(1.86)
C: Probably not
To avoid such imputations, responses have to be timely. The present study aims to investigate the interaction between expectations of response type and their timing. As implied in the above examples, responses to questions can be more, or less, cooperative or desirable. Conversation-analytic work has described a conversational system based on two-turn sequences of initiating and responding actions, such as questions and answers, which form a large part of day-to-day conversations (Schegloff, 2007). Within this system, responses can appear in different forms of which some are "preferred" and others are "dispreferred" (Levinson, 1983). This distinction refers respectively to unmarked responses that comply with, or go along with the initiating action on the one hand, versus those that block or reject the initiating action on the other. For example, preferred answers to proposals usually entail an acceptance of the request, whereas rejections are usually dispreferred (see e.g. Kendrick & Torreira, 2015 for real-life examples of preferred and dispreferred responses). These two types of responses generally differ in form (Pomerantz & Heritage, 2013); dispreferred responses often appear marked in some way relative to preferred ones, for example by including hesitations or particles, by being longer and more complex, by including accounts that "explain away" the dispreferred response, and most importantly for the present purposes, by being uttered after a delay. The latter observation was made in conversation analytic work, both qualitative and quantitative (Heritage, 1984; Kendrick & Torreira, 2015; Pomerantz, 1984; Stivers & Robinson, 2006). Specifically, Kendrick and Torreira (2015) showed that preferred responses to certain types of questions were generally more frequent in a corpus of English conversation and especially when they were preceded by a normal short gap. 
However, after a delay of about 700 milliseconds, dispreferred responses became more frequent. Offline experiments (Roberts, Francis, & Morgan, 2006;Roberts, Margutti, & Takano, 2011) also showed that listeners of recorded (but enacted) telephone conversations judged responders as "less willing" to comply with a request when the gap before their positive response was longer. An interesting possible implication of these corpus and offline findings is that listeners in a conversation could make use of gap lengths on-line: the longer the gap lasts, the higher the chances become that the answer will be dispreferred. Bögels, Kendrick, and Levinson (2015) first investigated the question whether listeners can indeed make on-line use of this information, using an EEG paradigm. They presented participants with the same kinds of initiating actions investigated in the corpus study described above (Kendrick & Torreira, 2015), namely requests, invitations, proposals, and offers. Importantly, these initiating actions were taken from a corpus of spoken Dutch, ensuring their ecological validity. These initiating actions were paired with two types of responses: preferred responses and dispreferred responses. For the purposes of that study, preferred responses were conveniently operationalised as "yes" and dispreferred responses as "no" (naturally, there are further alternatives and elaborations to be found in natural conversation). In some cases, "no" can serve as a preferred response, as in response to the following example: "You never had any regrets, have you?" (Kendrick & Holler, 2017, Table 1; see also Heritage, 2010). However, the initiating actions taken from the corpus were all of such form that they preferred a positive ("yes") response.
Crucially, the timing of these responses was manipulated; they either occurred after a normal, short gap of 300 milliseconds or after a long gap of 1000 milliseconds. Two interesting ERP results emerged. First, a larger N400 was found for "no" than for "yes" responses, but only after the short gap. Thus, listeners expected preferred rather than dispreferred responses after short gaps, but this expectancy difference disappeared after long gaps, indicating that the long gap changed the expectation of the response. Second, a larger anterior late positivity for "no" responses (roughly between 500 and 800 ms) was found irrespective of the gap length. The authors related this effect to the possible social disaffiliativeness of a plain "no" answer without an account (e.g. an explanation for the refusal or rejection), which might be perceived as rude by participants. 1 However, it was unclear whether the positivity reflected mainly cognitive consequences of the plain "no", such as searching for an account, or whether it reflected more socio-emotional processes. Thus, the study described above showed that longer gaps affected listeners' expectations for preferred vs. dispreferred responses, with longer gaps making an upcoming dispreferred response subjectively more likely. However, it did not investigate the processing of expectations during the gap itself.
The first aim of the present study is to investigate listeners' expectations during the gap. To this end, we measured the brain's response to lengthening gaps within the silence itself, while contexts were presented to participants right before the auditory sequences. These contexts aimed to bias participants to expect the response that they would eventually hear (preferred or dispreferred; see Materials section and Figure 1 below). By thus creating an a priori expectation for a certain upcoming response, we can see whether this in turn leads to an expectation for a shorter (in the case of preferred responses) or longer gap (in the case of dispreferred responses). 2 We have no clear hypotheses about the specific neuronal instantiations of this interaction, given that it would be measured within the gap (silence) and most previous EEG research measured EEG responses to auditory or visual stimuli. For that reason, we analysed the EEG measurements during the long gap using both ERP and time-frequency analyses, which have been shown to reveal complementary aspects of processing (e.g. Hagoort, Hald, Bastiaansen, & Petersson, 2004).
A second aim of the present study is to shed more light on the late positivity found before (Bögels, Kendrick, et al., 2015). This was interpreted as related to the fact that the "no" response was presented without an account. In the present study, "no" responses are always preceded by a biasing context in which an account is given in advance for the dispreferred response (see Tables 1 and 2 for examples). Thus, if the late anterior positivity was only a reflection of the fact that the participants did not know the account for the dispreferred answer, it should disappear in the present study because an explanation is present in the prior contexts we have provided. In contrast, if the effect is at least partly also a reflection of the social or emotional consequences of a plain "no" (e.g. it being perceived as rude), some effect should still be found for "no" relative to "yes" responses. Note that the N400 effect found for dispreferred relative to preferred responses after a short gap is not hypothesised to be replicated in the present study, since the biasing context creates a strong expectation for a "no" response in that condition which would presumably override the expectation based on the general preference for "yes".
Figure 1. Examples of the biasing context presented before the mini-dialogues; biased towards a preferred response (panel A) and towards a dispreferred response (panel B). The speaker icon indicates which speaker starts the target sequence (see, e.g. Table 1). Note that the original text was in Dutch (see Table 1); the English translation was inserted in the figure for readability.

Participants
The experiment was approved by the Ethics Committee Social Sciences of the Radboud University Nijmegen. Thirty-six participants (8 males) from the database of the MPI for Psycholinguistics took part in the experiment. Four were excluded from the final analysis, one due to experimenter error, one due to excessive artefacts, and two due to a break-down of ocular electrodes. The 32 remaining participants (7 males) were 22.6 years old on average, right handed, and native speakers of Dutch without reading or hearing problems. They were paid 10 euros per hour for their participation.

Materials
The auditory materials were taken from an earlier study (Bögels, Kendrick, et al., 2015) and consisted of mini-dialogues containing an initiating and a responding action. The initiating actions were requests, offers, proposals, and invitations (see Couper-Kuhlen, 2014) taken from the telephone conversations of the Corpus of Spoken Dutch (CGN; Oostdijk, 2000). These actions impressionistically sounded intonationally and pragmatically complete (Ford & Thompson, 1996), required a conditionally relevant response (Schegloff, 2007) and were not biased towards a negative response (Heritage, 2010). See Tables 1 and 2 for examples. The responding actions consisted of 30 ja ("yes") and 30 nee ("no") tokens also taken from the CGN, but they were never the original responses to any of the initiating actions used. Each of the 120 initiating actions could appear in four different conditions: first followed by either 300 ms or 1000 ms of background noise from the same recording, then followed by either a ja or a nee response token. The initiating action and the response were always presented in different audio channels. Ja and nee tokens from speakers of the same gender were paired up and each pair was coupled to four initiating actions (in different conditions). As a result, each participant heard every response token only twice (see Design).

Table 1. Example of one item (request) in Dutch, with English translations, including the text in the context pictures (see Figure 1) preceding the question and an example of a comprehension statement that followed in 20% of trials. Underlined text was presented in thinking balloons.
Request:
A: heb je volgende week nog een uh een moment om ons te ontvangen? "Do you have a uh a moment next week to receive us?"
Gap: 300 ms / 1000 ms
Response: ja "yes" / nee "no"
Comprehension statement: De spreker wil graag vandaag nog bij de ander op bezoek. "The speaker would like to visit the other person today."

Table 2.
A: "And then, eh, shall I cook for you?"
Proposal
B: I really worked for long today.
B: It was a waste to sit inside the whole day, it is such a lovely summer day today.
B: I am quite tired from it, so I will go to bed soon because I have to wake up early tomorrow.
A: "Yes, by the way, would you still like to go for an evening walk or something?"
Note: Underlined text was presented in thinking balloons, the rest of the context was presented in text balloons. The question (and response) was always presented auditorily.
For the purpose of the present experiment, contexts in the form of pictures were also created to be presented before the auditory mini-dialogues (see Figure 1 for diagrammatic examples and Table 2 for three more examples presented with text only). The pictures contained stylised cartoon images of the speakers in the conversation (different images for male and female speakers), speaking balloons, and "thinking balloons", indicating what the two dialogue partners said and thought, respectively, just before they would utter the mini-dialogue. A loudspeaker icon placed next to one of the speaker's images (see Figure 1, bottom) indicated which of them would start speaking first in the auditory fragment. For each initiating action, two different contexts were created in such a way that the first context biased participants to expect a preferred ("yes") answer to the auditorily presented initiating action (i.e. "preferred context", see example in Figure 1, panel A) and the second context biased participants to expect a dispreferred ("no") answer to this action (i.e. "dispreferred context", see example in Figure 1, panel B). To ensure that the contexts indeed fulfilled these criteria, we performed a web-based pre-test in which 43 participants saw the context images, followed by the initiating action played auditorily when they pressed a button. Two lists were created (21-22 participants per list), both containing all initiating actions with half of them preceded by preferred and the other half preceded by dispreferred contexts. This was reversed in the second list. Participants were asked to indicate on a Likert scale from 1 to 7 how positive they thought the response to the initiating action would be. Responses with response times below 200 ms or above 20 s were removed (0.27% of the data). Preferred contexts were clearly judged to lead to a more positive response (M = 6.23) than dispreferred contexts (M = 2.06, t = 123.02, p < .001). 
Only three dispreferred contexts had an average score above 3 and two preferred contexts had an average score below 5. These five contexts were subsequently changed to provide a stronger bias before using them in the EEG experiment.
For the practice block, ten initiating actions (suggestions and requests for information) that could receive a "yes" or "no" response were taken from the earlier study as well (Bögels, Kendrick, et al., 2015). Half were followed by the original ja or nee response from the corpus and half were followed randomly by between 0 and 1000 ms of background noise and a cross-spliced response. For each of the 10 practice items a biasing context (picture) was created that biased participants to expect the response they would hear.

Design
The two factors gap duration (300, 1000 ms) and response type ("yes", "no") were fully crossed to create four conditions. Contexts always matched the response; thus, preferred contexts were always followed by "yes" responses and dispreferred contexts by "no" responses. Four lists were created, all administered to one fourth of the participants. Each list contained all 120 items once, in the same order, 30 in each condition. The conditions were rotated over the items in the four lists using a Latin Square design. The 120 items were divided into three blocks of 40 items, with pauses in between. Each block contained 10 items of each condition in a semi-random order, with the following restrictions. The same response token extracted from a particular recording was always separated by at least four items and occurred only twice in each list. Initiating actions coming from the same telephone conversation were separated by at least three items. The same condition appeared maximally twice in a row.
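The rotation of conditions over items can be illustrated with a minimal Python sketch. It assumes a simple modular rotation, (item index + list index) mod 4, which is one common way to build a Latin square; the rotation actually used in the experiment is not specified here, and the names `CONDITIONS` and `latin_square_lists` are hypothetical. The sketch covers only the assignment of conditions to items, not the semi-random ordering restrictions.

```python
from collections import Counter

# Condition labels follow the Design section: gap duration x response type.
CONDITIONS = [
    ("300 ms", "yes"),
    ("300 ms", "no"),
    ("1000 ms", "yes"),
    ("1000 ms", "no"),
]
N_ITEMS = 120
N_LISTS = len(CONDITIONS)


def latin_square_lists(n_items=N_ITEMS, conditions=CONDITIONS):
    """Return one {item index: condition} mapping per list.

    Assumption: conditions are rotated with a modular shift,
    (item + list) mod 4. The paper does not state the exact rotation.
    """
    n = len(conditions)
    return [
        {item: conditions[(item + lst) % n] for item in range(n_items)}
        for lst in range(n)
    ]


lists = latin_square_lists()

# Every list holds 30 items per condition ...
for assignment in lists:
    counts = Counter(assignment.values())
    assert all(counts[c] == N_ITEMS // N_LISTS for c in CONDITIONS)

# ... and every item appears in each condition exactly once across lists.
for item in range(N_ITEMS):
    assert {assignment[item] for assignment in lists} == set(CONDITIONS)
```

Because contexts always matched the response, the context type for a trial follows directly from the response label in each condition tuple.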

Procedure
After having given written informed consent and receiving EEG preparation, participants sat down in a sound proof booth in front of a computer screen. They read the instructions on the screen and could ask questions afterwards. They were instructed that they would hear fragments from a corpus of telephone calls. They were told that fragments had been selected in which the speakers were making plans and in which the response was "yes" or "no". The context they would see before each auditory fragment was also introduced and explained to them. See Appendix A for an English translation of the exact (Dutch) written instructions given to participants.
In each trial, participants first saw the context picture on the screen which they could view at their own convenience. They then pressed a button to continue. A fixation cross appeared and after 1000 ms the fragment played. One second after the end of the fragment, the fixation cross disappeared. For 20% of the items (and 50% of the practice items) this was followed by a written statement. See Table 1 for an example. Participants indicated whether they thought the statement was true (left button) or false (right button). On average, only 1.8 (out of 25) statements were responded to incorrectly (range: 0-6), indicating that participants paid attention during the experiment. Finally, a blinking sign was presented on the screen for 2000 ms. Participants were asked to blink during the blinking sign, but to try not to move, blink, or move their eyes while the fixation cross was on the screen.
The experiment started with a practice block of 10 items, after which participants could ask questions and received feedback about their blinking. The practice and the three experimental blocks together lasted about 50 min. At the end of the experiment, participants filled out a short questionnaire on the computer. In this questionnaire, none of the participants reported noticing anything with regard to the timing of responses. After this, they received a debriefing about the way the stimuli were created and the purpose of the experiment. The experiment, including EEG set-up, lasted about two hours in total.

Apparatus
EEG was recorded from 61 active Ag/AgCl electrodes using an actiCap (e.g. Bögels, Magyari, et al., 2015). Of these, 59 electrodes were mounted in the cap with equidistant electrode montage referenced to the left mastoid. Two separate electrodes were placed at the left and the right mastoid outside of the cap. Blinks were monitored through a separate electrode placed below the left eye and one of the 59 electrodes in the cap. Horizontal eye movements were monitored through two separate electrodes placed at each outer canthus. The ground electrode was placed on the forehead. Electrode impedance was kept below 10 kΩ. EEG and EOG recordings were amplified through BrainAmp DC amplifiers. EEG signals were filtered online with a band-pass filter between 0.016 and 100 Hz. The recording was digitised online with a sampling frequency of 500 Hz and stored for offline analysis.

Data analysis
Pre-processing and statistical analysis of EEG data were conducted using Fieldtrip (Oostenveld, Fries, Maris, & Schoffelen, 2011). First, epochs were extracted from the EEG from 500 ms before the offset of the initiating action until 1000 ms after response onset. For purposes of artefact rejection, these epochs were filtered with a low pass filter of 35 Hz, detrended, and baselined at the last 200 ms before the gap (i.e. during the last 200 ms of the initiating action). Epochs containing eye artefacts or other artefacts that exceeded about +/−100 μV (visual inspection) were discarded. For the participants who entered the analysis, an average of 27-28 out of 30 trials (range: 19-30) remained for all four conditions. Two different epochs were extracted for the ERP and time-frequency analyses, one time-locked at gap onset and one time-locked at response onset. For ERP analyses epochs were low-pass filtered at 35 Hz and baselined at 0-200 ms relative to gap onset and at −200-0 ms relative to response onset, respectively. Trials of the same condition were averaged per participant. For time-frequency representations, no filtering or baselining was performed, but a linear trend was removed from the data before the analysis. The power of each frequency between 4 and 30 Hz (with steps of 1 Hz) was calculated on the extracted epochs of individual trials using a Hanning taper (Grandke, 1983) with a window of 500 ms for each frequency. For illustration purposes, relative differences were calculated between conditions, dividing the absolute power difference between conditions by the sum of the power in both conditions (see Figure 4).
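The time-frequency step and the relative difference measure can be sketched in plain Python. This is an illustration under stated assumptions, not the Fieldtrip implementation: power is estimated for a single 500 ms window by multiplying the samples with a Hann taper and projecting them onto one frequency, the relative difference is read as (A − B) / (A + B) on raw power, and the function names and the synthetic 6 Hz signal are hypothetical.

```python
import cmath
import math

FS = 500        # sampling rate in Hz, as reported in the Apparatus section
WINDOW_S = 0.5  # 500 ms analysis window


def hanning(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def tapered_power(samples, freq_hz, fs=FS):
    """Power at freq_hz for one window of samples after Hann tapering."""
    taper = hanning(len(samples))
    coeff = sum(
        x * w * cmath.exp(-2j * math.pi * freq_hz * i / fs)
        for i, (x, w) in enumerate(zip(samples, taper))
    )
    return abs(coeff) ** 2


def relative_difference(power_a, power_b):
    """(A - B) / (A + B); lies between -1 and 1 for non-negative power."""
    return (power_a - power_b) / (power_a + power_b)


# Synthetic example: a 6 Hz oscillation at full and at half amplitude.
n = int(FS * WINDOW_S)
strong = [math.sin(2 * math.pi * 6 * i / FS) for i in range(n)]
weak = [0.5 * x for x in strong]

# Power per frequency from 4 to 30 Hz in 1 Hz steps, as in the analysis.
spectrum = {f: tapered_power(strong, f) for f in range(4, 31)}

# Power scales with amplitude squared, so this is 0.6 up to rounding.
rel = relative_difference(tapered_power(strong, 6), tapered_power(weak, 6))
```

Normalising by the summed power keeps the measure bounded, which makes differences comparable across frequencies with very different absolute power.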
To test for statistically significant differences between conditions, we used the cluster-based approach implemented in the Fieldtrip toolbox (Maris & Oostenveld, 2007). This robust method reduces the multiple-comparisons problem and controls family-wise error across subjects in time and space. To examine differences between experimental conditions, paired t-tests are performed for each time-point, channel, and frequency (for time-frequency analyses) with a threshold of .05. All time, channel, and frequency (for time-frequency analyses) points with p-values below the threshold are selected and clustered. Clusters in time, space, and frequency are identified on the basis of proximity of the points (neighbours) in all dimensions of the cluster. Cluster statistics are calculated by taking the sum of t-values in every cluster. To obtain a p-value for each cluster, a Monte Carlo method is used to estimate the permutation distribution of the largest cluster statistic. The permutation distribution is created by 1000 random permutations of the samples of the two conditions. At each randomisation, clusters are identified and the largest sum of t-values of the clusters enters the permutation distribution. The proportion of maximum cluster statistics of the permutation distribution that is larger than the observed one is the p-value. The threshold was fixed to p = .05. The time-locking point at response onset was analysed with ERPs only, for comparison to the earlier results (Bögels, Kendrick, et al., 2015). We used an approach similar to that of the earlier study, first testing for interactions between gap duration and response type. Within the cluster-based approach, this was done by calculating the mean difference between the "yes" and "no" responses for each participant within the 300 and 1000 ms gap conditions. Then the cluster-based approach was used on these difference scores with the within-subject factor gap duration in a time window between 0 and 1000 ms. 
In the case of an interaction, separate analyses were performed to compare "yes" and "no" responses within the 300 and 1000 ms gap conditions. In addition, the short and long gap conditions were collapsed to look for main effects of response type. For the time-locking point at gap onset, we compared the two conditions containing long gaps using both ERP and time-frequency analyses. We analysed a window of interest between 0 and 1000 ms to explore any effects that might occur within the long gap. Furthermore, we analysed a window between 300 and 500 ms, because we expected any effects to occur soon after participants could first notice that this is a long gap. This reasoning was based on the fact that inter-turn gaps around 200-300 ms are most frequent in conversation (e.g. Heldner & Edlund, 2010) so longer gaps could be considered "long". More specifically, in the present experiment, gaps were always either 300 or 1000 ms long, so participants might implicitly learn that gaps longer than 300 ms are always long gaps. Since we did not expect main effects of context (our design was not built to find those, see footnote 2), but were interested in the moment at which information about the context would be integrated with the length of the gap, the most relevant time window is the one just after listeners realise that the gap is long (i.e. between 300 and 500 ms after long gap onset).
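The logic of the cluster-based permutation test can be sketched in plain Python for the simplest case. The sketch makes several simplifying assumptions that differ from the Fieldtrip analysis: it clusters over time only (one channel, no frequency dimension), it takes the largest absolute cluster sum instead of handling positive and negative clusters separately, and the critical t-value is passed in by hand. The function names and the toy data are hypothetical. For paired data, permuting condition labels within a participant amounts to flipping the sign of that participant's difference.

```python
import math
import random


def paired_t(diffs):
    """t-value of per-participant condition differences at one time point."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        return 0.0  # degenerate case (no variance): treated as no effect here
    return mean / math.sqrt(var / n)


def max_cluster_stat(diff_data, t_crit):
    """Largest absolute sum of t-values over runs of adjacent time points
    that exceed the threshold with the same sign.

    diff_data[p][t] is the condition difference of participant p at time t.
    """
    n_time = len(diff_data[0])
    t_vals = [paired_t([row[t] for row in diff_data]) for t in range(n_time)]
    best, run, sign = 0.0, 0.0, 0
    for t in t_vals:
        s = (t > t_crit) - (t < -t_crit)  # +1, -1, or 0
        if s != 0 and s == sign:
            run += t             # extend the current cluster
        elif s != 0:
            run, sign = t, s     # start a new cluster
        else:
            run, sign = 0.0, 0   # below threshold: close the cluster
        best = max(best, abs(run))
    return best


def cluster_permutation_p(diff_data, t_crit, n_perm=1000, seed=0):
    """Return the observed statistic and its Monte Carlo p-value."""
    rng = random.Random(seed)
    observed = max_cluster_stat(diff_data, t_crit)
    n_larger = 0
    for _ in range(n_perm):
        flipped = []
        for row in diff_data:
            sign = rng.choice((-1.0, 1.0))  # swap condition labels or not
            flipped.append([sign * v for v in row])
        if max_cluster_stat(flipped, t_crit) > observed:
            n_larger += 1
    return observed, n_larger / n_perm


# Toy data (entirely synthetic): 20 participants, 40 time points,
# with a built-in condition difference at time points 10-19.
toy_rng = random.Random(1)
toy = [
    [toy_rng.gauss(0.0, 1.0) + (1.5 if 10 <= t < 20 else 0.0) for t in range(40)]
    for _ in range(20)
]
# 2.093 is the two-sided .05 critical t-value for 19 degrees of freedom.
observed, p_value = cluster_permutation_p(toy, t_crit=2.093, n_perm=500)
```

Under the same assumptions, an interaction between gap duration and response type could be tested by first computing each participant's "yes" minus "no" difference within each gap condition and then passing the difference between those two difference scores as `diff_data`.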

Results and discussion
Response onset
Figure 2, panels A and B, show grand average waveforms time-locked to response onset for "yes" and "no" responses (preceded by preferred and dispreferred contexts, respectively) after a short (300 ms, panel A) and after a long gap (1000 ms, panel B). Two differences seem apparent: an early centroparietal negative effect and a somewhat later, mostly anterior positive effect for "no" relative to "yes" responses. Since an interaction analysis for 0-1000 ms on the difference values between "yes" and "no" responses showed no interactions between gap length and response type (p > .4), we collapsed the 300 and 1000 ms gap conditions to look at the main effect of response type. This analysis yielded one cluster reflecting the early negative effect for "no" vs. "yes" responses with a centroparietal distribution (117-322 ms, p = .003; see topographical plot in Figure 2, panel C) and a later marginally significant positive effect for "no" vs. "yes" responses with a mostly anterior distribution (312-456 ms, p = .075; see topographical plot in Figure 2, panel D). See Figure 2, panel E for difference waves between "no" and "yes" responses collapsed over the two gap lengths.
Given its early onset, the first, negative effect for "no" relative to "yes" responses could be explained by relatively low-level differences between these lexical items, for example in frequency (ja ("yes") is much more frequent than nee ("no") in the spoken corpus used to extract the materials; Oostdijk, 2000). Therefore, we should be cautious in interpreting this effect and we can only speculate about its cause here. Early effects of frequency starting around 150 ms have been reported (Hauk & Pulvermüller, 2004; Sereno, Rayner, & Posner, 1998), where low-frequency words also led to a more negative waveform at posterior sites. However, in these earlier studies, the negative effects for low-frequency words were mostly found on negative peaks, whereas in the present study, the effect seems largest on a positive peak. An alternative explanation might be that "no" responses lead to a small early residual N400 effect. The early onset might not be too surprising given that the earlier study using the same materials but without biasing contexts (Bögels, Kendrick, et al., 2015) also found a very early starting N400 to unexpected "no" responses (after 300 ms gaps, with a non-significant suggestion in the same direction after 1000 ms gaps, see Figure 2 of that paper). Although in the current experiment, both responses should be expected based on the biasing contexts, it is possible that the general preference for "yes" responses adds to the contextual bias, still leading to a small N400 effect for "no" relative to "yes" responses. This N400-effect does not persist to the standard N400 window, which might be due to an overlap with the following positivity (see below).
Figure 2. Grand average waveforms time-locked to response onset after a 300 ms gap (panel A) and a 1000 ms gap (panel B) for "yes" responses (black dotted line) and "no" responses (red line). A representative subset of 15 electrodes is shown, the locations of which are indicated on the small head at the middle right of each panel. Panels C and D show distribution plots of differences in grand average waveforms between "no" and "yes" responses for the two windows in which a (marginally) significant effect was found. Electrodes that are significant (in panel C) or marginally significant (in panel D) in at least 50% of the time window are highlighted in white. Panel E shows difference waves for "no" minus "yes" responses for two representative electrodes, the locations of which are indicated on the small head.
The trend towards a somewhat later anterior positivity for "no" responses is, although found in a similar time window, opposite in polarity to the N400-effect found in the earlier study (Bögels, Kendrick, et al., 2015). Given that this positivity was only a trend, we have to interpret it with caution here. However, assuming that it reflects a relevant difference and would be replicated in future studies, it might be interpreted in two different ways: (1) as a new effect, not present in the earlier study (Bögels, Kendrick, et al., 2015) or (2) as an earlier instantiation of the late positivity found in that study. Under the first interpretation, the late positivity found earlier (Bögels, Kendrick, et al., 2015) disappeared in the present study, presumably because an account was now given in the context. In that case, that late positivity (Bögels, Kendrick, et al., 2015) might be explained purely by the search for an account. The early positivity found in the present study might then still (perhaps like the earlier negativity) be due to a low-level difference between "yes" and "no", for example in frequency. 
In the 300-500 ms window where the positivity is found, studies on word frequency have found conflicting results. Some found a larger P300 for frequent than infrequent words (Polich & Donchin, 1988) or a larger N400 for infrequent than frequent words (Rugg, 1990;Van Petten & Kutas, 1990), which are incompatible with the present results, given their polarity. Hauk and Pulvermüller (2004) instead found a larger positivity for infrequent than frequent words between 320 and 360 ms, but that effect had a centroparietal peak, unlike the present anterior positive effect. Alternatively, the positivity might be interpreted as a general P300-like effect, reflecting some "surprise" at encountering a (plain) "no" response, even if a dispreferred response can be expected based on the context. Future research should try to disentangle the effects of "yes" versus "no" and preferred versus dispreferred responses by including questions that prefer a "no" answer.
The second interpretation, namely that the frontal positivity found here is an earlier instantiation of the late positivity found in the earlier study (Bögels, Kendrick, et al., 2015), might be corroborated by a similar frontal distribution of the two effects. Interestingly, ERP studies on comprehension of morally unacceptable statements or behaviour have also found fronto-central positivities, with onsets ranging between 320 and 500 ms (Leuthold, Kunkel, Mackenzie, & Filik, 2015;Van Berkum, Holleman, Nieuwland, Otten, & Murre, 2009) and emotional stimuli have been found to elicit fronto-central positive shifts starting around 300 ms (for a review, see Fischler & Bradley, 2006). The fact that the positivity in the present study occurs relatively early (and earlier than in Bögels, Kendrick, et al., 2015) might be due to a strong expectedness of the response given the 100% predictive context, which enables participants to know in advance which response is forthcoming. This interpretation entails that a frontal positivity for "no" responses cannot be fully explained by the effects of having to come up with an account for a dispreferred response, but that socio-emotional consequences of the rudeness of a plain "no" may also play a role here.
Figure 3, panel A, presents grand average ERP waveforms time-locked to gap onset for the two 1000 ms gap conditions. The black dashed line represents the condition preceded by a preferred context and the red line represents the condition preceded by a dispreferred context. Both conditions show a positive deflection with a maximum around 300 ms. Furthermore, the dispreferred context condition appears to go a bit more positive and stay positive longer than the preferred context condition (see Figure 3, panel C for difference waves between dispreferred and preferred contexts at a representative electrode).
A cluster analysis between 0 and 1000 ms showed no significant effects (ps > .12), but the same analysis between 300 and 500 ms showed a significant effect between 404 and 500 ms (p = .033), with the dispreferred context going more positive than the preferred context (see also the topographical plot in Figure 3 panel B). These two conditions consist of exactly the same auditory input up to and including the gap (same initiating action plus a silent gap). A difference between them within the gap therefore indicates that the context, which was presented a few seconds earlier, leads to expectations that interact with the gap duration. This suggests that the expectation for a preferred or dispreferred response leads listeners to interpret the gap differently.
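For readers unfamiliar with this type of statistic, the following is a minimal sketch of one common form of cluster-based permutation test for a paired comparison at a single electrode. It is not the analysis pipeline used in this study, whose settings are not described in this section. The data are synthetic, and the number of participants (20), the sampling grid (51 samples), the effect size, the threshold and all function names are assumptions made purely for illustration.

```python
import math
import random


def paired_t(diffs):
    """t-value of per-participant condition differences at one time sample."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        return 0.0  # simplification: treat a degenerate sample as "no effect"
    return mean / math.sqrt(var / n)


def max_cluster_mass(diff_matrix, t_threshold):
    """Largest |summed t| over runs of adjacent supra-threshold samples of one sign."""
    best, running, prev_sign = 0.0, 0.0, 0
    for s in range(len(diff_matrix[0])):
        t = paired_t([row[s] for row in diff_matrix])
        sign = (t > t_threshold) - (t < -t_threshold)
        if sign != 0 and sign == prev_sign:
            running += t          # extend the current cluster
        elif sign != 0:
            running = t           # start a new cluster
        else:
            running = 0.0         # sample is below threshold
        prev_sign = sign
        best = max(best, abs(running))
    return best


def cluster_permutation_p(diff_matrix, t_threshold=2.093, n_perm=500, seed=0):
    """Monte Carlo p-value obtained by randomly sign-flipping each participant."""
    # 2.093 is the two-tailed .05 critical t for 19 degrees of freedom
    # (20 hypothetical participants).
    rng = random.Random(seed)
    observed = max_cluster_mass(diff_matrix, t_threshold)
    count = 0
    for _ in range(n_perm):
        flipped = []
        for row in diff_matrix:
            flip = rng.choice((-1.0, 1.0))
            flipped.append([flip * v for v in row])
        if max_cluster_mass(flipped, t_threshold) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


def simulate_differences(n_participants=20, n_samples=51, effect_start=26,
                         effect_size=1.0, seed=1):
    """Synthetic dispreferred-minus-preferred differences (hypothetical values)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) + (effect_size if s >= effect_start else 0.0)
             for s in range(n_samples)]
            for _ in range(n_participants)]


if __name__ == "__main__":
    mass, p_value = cluster_permutation_p(simulate_differences())
    print("max cluster mass:", mass, "Monte Carlo p:", p_value)
```

Because the test statistic is the largest cluster in the whole analysis window, a wider window (such as 0 to 1000 ms) gives chance clusters more opportunities to form under permutation, so an effect that is reliable in a narrower window may not reach significance in a wider one.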

Gap onset
The specific effect found here, the greater positive shift in the dispreferred condition, might be interpreted with regard to the anticipation of an upcoming stimulus. The ERPs in Figure 3 go towards negative from about 400 ms in both conditions, which might indicate that listeners anticipate the upcoming response they will hear. The contingent negative variation (CNV; Walter, Cooper, Aldridge, McCallum, & Winter, 1964), related to the "Bereitschaftspotential" (Kornhuber & Deecke, 1965), is a slow negative-going potential occurring when participants anticipate an upcoming motor response or stimulus (for a review, see Kononowicz & Penney, 2016). Such negative shifts, with variable distributions, have been related both to motor preparation (if a response has to be performed) and to anticipation of a meaningful stimulus (Van Boxtel & Böcker, 2004;Van Boxtel & Brunia, 1994). In the latter case, effects have also been termed stimulus preceding negativity (SPN). The negative shift visible in Figure 3 in several electrodes in both conditions could be interpreted as such an SPN. The fact that the dispreferred context (red line) leads to a larger positivity could be interpreted in terms of an SPN that starts later, leading to a net positivity. A later SPN would indicate that listeners are not anticipating a response as quickly after a dispreferred context biasing them to expect a "no" response as after a preferred context biasing them to expect a "yes" response. In other words, listeners strongly expect a "yes" response soon after the first 300 ms of the gap have ended, whereas their expectation for a "no" response builds up only later on in the gap. Note that we are not interpreting the effect we find here as a violation of an expectation for a short gap after a preferred context. At first sight, one might have expected such a violation response because of the mismatch between the long gap and an expected "yes" response. Indeed, observational studies (e.g. Kendrick & Torreira, 2015) suggest the existence of some kind of threshold around 600 ms, after which dispreferred responses become more frequent than preferred ones. However, the effect we find occurs much earlier than 600 ms into the gap. Moreover, one would expect that the relative proportions of preferred vs. dispreferred responses (and consequently the expectations of participants) do not reverse abruptly, but rather change more gradually as the gap unfolds, making one time-locked mismatch response quite unlikely.
Figure 3. Panel A shows grand average waveforms time-locked to gap onset after a 1000 ms gap for gaps preceded by preferred contexts (black dotted line) and gaps preceded by dispreferred contexts (red line). A representative subset of 15 electrodes is shown, the locations of which are indicated on the head at the middle right of the panel. Panel B shows a distribution plot of differences in grand average waveforms between "no" and "yes" responses for the window in which a significant effect was found. Electrodes that are significant in at least 50% of the time window are highlighted in white. Panel C shows difference waves for "no" minus "yes" responses for a representative electrode, the location of which is indicated on the small head.
A time-frequency analysis between 4 and 30 Hz and between 0 and 1000 ms showed a marginally significant increase in theta power for preferred contexts relative to dispreferred contexts within the 1000 ms gap (p = .09, see Figure 4). The effect started around 200 ms in some electrodes and lasted up to around 800 ms, with a distribution that was maximal in central areas. This effect can be viewed as a trend for a relative increase in theta power during long gaps when a preferred rather than a dispreferred response is expected. Given the weakness of this effect, it has to be replicated by future studies before we can be more certain about its relevance. Nevertheless, we offer some speculative interpretations here.
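As an illustration of what a relative difference in theta power means, the sketch below estimates power at a single theta frequency with a complex Morlet wavelet and compares two synthetic signals in the 300 to 500 ms window. This is a generic textbook approach and not necessarily the time-frequency method used in this study. The sampling rate (250 Hz), the frequency (6 Hz), the number of wavelet cycles (3), the signal amplitudes, the normalisation of the relative difference by the dispreferred condition, and all function names are assumptions chosen for the example.

```python
import cmath
import math


def morlet_power(signal, sfreq, freq, n_cycles=3.0):
    """Power over time at one frequency, using a complex Morlet wavelet."""
    sigma_t = n_cycles / (2.0 * math.pi * freq)   # s.d. of the Gaussian envelope (s)
    half = int(round(3.0 * sigma_t * sfreq))      # wavelet spans +/- 3 s.d.
    wavelet = []
    for k in range(-half, half + 1):
        t = k / sfreq
        envelope = math.exp(-t * t / (2.0 * sigma_t ** 2))
        wavelet.append(envelope * cmath.exp(2j * math.pi * freq * t))
    norm = sum(abs(w) for w in wavelet)
    power = []
    for i in range(len(signal)):
        acc = 0j
        for j, w in enumerate(wavelet):
            idx = i + j - half
            if 0 <= idx < len(signal):            # zero-padding at the edges
                acc += signal[idx] * w.conjugate()
        power.append(abs(acc / norm) ** 2)
    return power


def make_signal(theta_amplitude, sfreq=250.0, duration=1.0):
    """Synthetic 'gap' signal: a 6 Hz component plus a fixed 20 Hz component."""
    n = int(round(sfreq * duration))
    return [theta_amplitude * math.sin(2.0 * math.pi * 6.0 * i / sfreq)
            + 0.5 * math.sin(2.0 * math.pi * 20.0 * i / sfreq)
            for i in range(n)]


def mean_power_in_window(power, sfreq, t_start, t_end):
    """Average power between t_start and t_end (seconds after gap onset)."""
    start = int(round(t_start * sfreq))
    end = int(round(t_end * sfreq))
    window = power[start:end]
    return sum(window) / len(window)


def relative_theta_difference(sfreq=250.0):
    """(preferred - dispreferred) / dispreferred for hypothetical signals."""
    preferred = make_signal(1.5, sfreq)      # hypothetical: stronger theta
    dispreferred = make_signal(1.0, sfreq)   # hypothetical: weaker theta
    p_pref = mean_power_in_window(morlet_power(preferred, sfreq, 6.0), sfreq, 0.3, 0.5)
    p_dis = mean_power_in_window(morlet_power(dispreferred, sfreq, 6.0), sfreq, 0.3, 0.5)
    return (p_pref - p_dis) / p_dis


if __name__ == "__main__":
    print("relative theta power difference:", relative_theta_difference())
```

A positive value indicates more theta power in the first (here, "preferred") signal. With a wavelet of this length, each power estimate integrates roughly half a second of signal, so onset and offset times of theta effects are necessarily approximate.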
Theta has been related to language processing, with for example power increases when participants are processing open class words, possibly due to retrieving lexico-semantic information of words (e.g. Bastiaansen, Linden, Keurs, Dijkstra, & Hagoort, 2005). Stronger theta oscillations have also been found in response to semantic violations (Hagoort et al., 2004). Although no language input is present within the gap in either condition, as mentioned in the introduction, observations of conversational behaviour suggest that response timing itself has semiotic significance. Thus the ongoing silence itself might also be viewed as meaningful information that has to be processed and integrated with the context. If listeners expect a "yes" response based on the context, the semiotics of the silence they hear (after some time) might be more difficult to integrate with their current expectations and might thus be interpreted as a kind of "semantic, or at least semiotic, violation". Note that oscillatory changes might be more suited to detect a "violation" effect here than ERPs because we assume that the violation would not be time-locked to one specific position but would build up over time. Alternatively, the theta effect might be related to predictive processing of the upcoming speech during the gap. An MEG study (Dikker & Pylkkänen, 2013) compared written words that were rendered predictable by a preceding picture of the object described by that word, with words rendered unpredictable by a preceding picture of a whole class of objects. They found increased theta power in occipital areas immediately before predictable relative to unpredictable words. Another MEG study (Bastiaansen, Magyari, & Hagoort, 2010) found a linear increase in theta power across syntactically structured sentences (see also, e.g. Bastiaansen, Van Berkum, & Hagoort, 2002). Although it was interpreted differently, such a theta increase potentially corresponds to an increase in contextual constraint, allowing for stronger prediction of the next word. Assuming this can be generalised to the auditory modality (although probably in different brain areas), a similar process could be going on in the present study. Listeners presumably predict a certain response (either "yes" or "no") in both conditions, based on the context. However, predictions about the timing of these responses might differ. If listeners predict the "yes" response to appear earlier, theta power might increase at an earlier point within the long gap (i.e. just before the word is predicted) than in the case of a predicted "no" response, which might not be predicted until after a longer gap. Thus, in this latter interpretation, a theta enhancement within the gap after preferred contexts suggests that prediction of a preferred upcoming response starts earlier and/or is stronger than prediction of a dispreferred upcoming response.
Figure 4. Time-frequency results in the 1000 ms gap. Panel A shows relative differences of grand average time-frequency representations after a preferred context relative to a dispreferred context, time-locked to gap onset (or question offset). The solid colours indicate marginally significant effects (in theta). A representative subset of 15 electrodes is shown (for locations, see Figure 3). Panel B shows a topographical plot of the theta effect (5-9 Hz) between 300 and 500 ms. Electrodes with marginally significant effects in at least 30% of the time window are highlighted in white.

Conclusion
The present experiment looked at listeners' expectations of the timing of different types of responses. We biased listeners' expectations for preferred or dispreferred responses using a contextual manipulation. The late positivity found by Bögels, Kendrick, et al. (2015) for plain "no" responses was not fully replicated when a reason for a rejection was given beforehand, suggesting that at least part of this earlier finding was caused by searching for an account for the dispreferred response. On the other hand, we did find a trend towards an earlier positivity for plain "no" responses, even when the account was clear, so we cannot exclude socio-emotional effects of the rudeness of a plain "no".
Furthermore, we showed that biasing contexts appear to affect listeners' expectations for the timing of preferred and dispreferred responses. That is, listeners appeared to expect upcoming responses earlier in the long gap when they were preferred than when they were dispreferred. Expected dispreferred responses were associated with a later stimulus preceding negativity as well as a trend towards a weaker theta effect, which suggests delayed prediction of incoming speech material.
The effects found in the present study appeared very subtle. This is not very surprising if one considers the following. In the experiment, two main sources informed listeners' expectations about the response, namely context and gap length. Of these two, context can generally be expected to be a stronger cue (although that has not been tested to our knowledge), because it is more explicit and has fewer alternative explanations (e.g. a longer pause could also be due to the responder being distracted). Moreover, context was 100% reliable, whereas gap length was not informative at all within the experiment. Since all the reported effects hinge crucially on a role of gap length (in interaction with, or on top of, contextual cues) in anticipating responses, it can be expected that the effects are modest. Other types of paradigms are needed to show the relative importance of different cues on their own and in different combinations. Finding any effects at all shows that gap length is relevant to listeners, even under these non-ideal circumstances. Furthermore, the participants in this experiment were overhearers of a conversation. It can be expected that effects will be larger for actual participants in a conversation, because they are presumably more motivated to anticipate the upcoming response, which is much more relevant for them personally. Still, the subtlety of the effects, together with the exploratory nature of the present study, makes it imperative that these results are replicated in future research (preferably in interactive situations as well).
Thus, importantly, our study suggests that the processing of conversation involves the generation of expectations about upcoming responses and the timeliness with which they will be delivered. Even during an unfolding gap, expectations appear to be revised in line with the lengthening silence, demonstrating the dynamic nature of comprehension. It thus throws further light on how language is processed in its "home" ecological niche, conversation.

Notes
1. Since the late positive effect manifested as a main effect between "no" and "yes" responses, two alternative explanations for the effect were discussed in the paper, but deemed unlikely (Bögels, Kendrick, et al., 2015, pp. 11-12). First, a base frequency effect ("yes" is more frequent than "no" in spoken corpora) seems unlikely because frequency effects in earlier studies had an opposite direction (e.g. Polich & Donchin, 1988) or were found to occur earlier and at more posterior sites (e.g. Sereno et al., 1998). Second, effects related to semantic difficulties of negation have been shown to largely disappear when pragmatically licensed (Nieuwland & Kuperberg, 2008).
2. Note that this study was not designed to investigate the brain's response to the realization that the upcoming response would be preferred versus dispreferred, as anticipated from the context. Our stimuli (partly consisting of natural questions) presumably differ quite a bit in when exactly participants would be able to start anticipating the response. In some cases, the context alone will provide enough information to anticipate the type of response; in other cases, part or all of the question is also needed (see Tables 1 and 2 for examples of different contexts and questions). Thus, we do not expect to see an effect of the anticipation of a certain type of response per se. Rather, we are interested in a potential interaction between the context information and the length of the gap, which we might observe within long gaps.