Anticipating predictability: an ERP investigation of expectation-managing discourse markers in dialogue comprehension

ABSTRACT In two ERP experiments, we investigated how the Dutch discourse markers eigenlijk “actually”, signalling expectation disconfirmation, and inderdaad “indeed”, signalling expectation confirmation, affect incremental dialogue comprehension. We investigated their effects on the processing of subsequent (un)predictable words, and on the quality of word representations in memory. Participants read dialogues with (un)predictable endings that followed a discourse marker (eigenlijk in Experiment 1, inderdaad in Experiment 2) or a control adverb. We found no strong evidence that discourse markers modulated online predictability effects elicited by subsequently read words. However, words following eigenlijk elicited an enhanced posterior post-N400 positivity compared with words following an adverb regardless of their predictability, potentially reflecting increased processing costs associated with pragmatically driven discourse updating. No effects of inderdaad were found on online processing, but inderdaad seemed to influence memory for (un)predictable dialogue endings. These findings nuance our understanding of how pragmatic markers affect incremental language comprehension.


Introduction
Language comprehenders seem to use the available context to actively predict upcoming linguistic information (for reviews, see e.g. Federmeier, 2007; Kamide, 2008; Kuperberg & Jaeger, 2016; Kutas, DeLong, & Smith, 2011; Pickering & Garrod, 2013; Van Petten & Luka, 2012). Predictability effects in language processing have been particularly well-established in event-related brain potentials (ERPs), which can provide functionally specific measures of the cognitive and neural processes involved in incremental dialogue comprehension. However, it remains unclear how the broad range of informative lexical, syntactic and pragmatic cues in the context is used to generate and revise ongoing predictions. Against this background, the current study examined how previously established ERP effects of predictability are modulated by two specific pragmatic cues: Dutch eigenlijk (≈ "actually, in fact") and inderdaad (≈ "indeed").
Inderdaad and eigenlijk belong to the broad class of discourse markers (also referred to as pragmatic markers, pragmatic particles, discourse particles, or discourse connectives), which are linguistic elements that encode a relation between the sentence in which they occur and the surrounding discourse situation (e.g. Aijmer, 2002; Fischer, 2006; Fraser, 1999; Schiffrin, 1988). By using inderdaad and eigenlijk, speakers can demonstrate sensitivity to their addressee's likely expectations, as illustrated in the following constructed Dutch dialogue: (1) A: Je hebt vast genoten van  In the answers in (1), inderdaad and eigenlijk respond to the expectation that can be inferred from A's suggestive question: they mark either alignment (inderdaad) or misalignment (eigenlijk) between what B says and what B thinks A expects to hear.
Theoretically, discourse markers are assumed to manage the course of the conversation: they "function as instructions from the speaker to the hearer on how to integrate the host unit into a coherent mental representation of the discourse" (Mosegaard-Hansen, 1998, p. 358; see also Aijmer & Simon-Vandenbergen, 2004; Blakemore, 2002; Fox Tree, 2010; Schourup, 1999). Whereas theoretical research on discourse markers has extensively investigated under which conditions they are used by the speaker, surprisingly little is known about how they are used by the comprehender. Although there are studies on temporal and causal discourse connectives (e.g. Canestrelli, Mak, & Sanders, 2013; Nieuwland, 2015; Xiang & Kuperberg, 2015) and focus particles (Gerwien & Rudka, in press; Kim, Gunlogson, Tanenhaus, & Runner, 2015), these have not been investigated in conversational contexts, which is the typical environment of eigenlijk and inderdaad. Effects of disfluencies (uh/uhm) and repairs (oh, I mean) on language processing have been studied in interactive contexts (e.g. Fox Tree, 2001; Fox Tree & Schrock, 1999), but their assumed conversation-managing function (Clark & Fox Tree, 2002) remains controversial (e.g. Finlayson & Corley, 2012; Schegloff, 2010).
The present study empirically investigated the theoretically assumed function of eigenlijk and inderdaad by examining their effects on online language comprehension. We investigated to what extent comprehenders use the pragmatic information encoded in eigenlijk and inderdaad to guide their expectations about likely dialogue continuations during reading. This allowed us to refine theoretical claims about the facilitating role of discourse markers for the comprehender, which are almost exclusively based on language production and hence remain underspecified regarding the affected comprehension processes.

Electrophysiological effects of word predictability
We assessed multiple electrophysiological signatures of predictability-related processing to address whether, and at which processing stages, the presence of eigenlijk or inderdaad affected processing of subsequent (un)predictable words. One focus was the N400, a negativity that peaks over centro-parietal sites around 400 ms after the onset of a potentially meaningful stimulus. Although some disagreements remain regarding this component's exact interpretation in terms of retrieval or integration (Brouwer, Crocker, Venhuizen, & Hoeks, 2017; Brown & Hagoort, 1993; Kutas & Federmeier, 2000; Nieuwland et al., in press; van Berkum, 2009), there is broad consensus that the N400 reflects semantic processing. Its amplitude is strongly negatively correlated with a word's cloze probability in a sentence (Kutas & Hillyard, 1980, 1984), that is, the proportion of participants who complete a truncated version of the sentence with that word in an offline task. N400 amplitude is also reduced by predictability based on extra-sentential information, such as the wider discourse context (e.g. Federmeier & Kutas, 1999; Nieuwland & van Berkum, 2006; Otten & van Berkum, 2008), specific pragmatic expressions (e.g. negation, Nieuwland & Kuperberg, 2008; scalar statements, Nieuwland, Ditman, & Kuperberg, 2010; counterfactuals, Nieuwland & Martin, 2012; connectives, Xiang & Kuperberg, 2015), general world knowledge (e.g. Hagoort, Hald, Bastiaansen, & Petersson, 2004), and voice-based pragmatic inferences about speaker characteristics (van Berkum, van den Brink, Tesink, Kos, & Hagoort, 2008). In addition, N400 amplitude is sensitive to long-term memory structure, in that it is attenuated in response to words that are unpredictable, if they are semantically related to predictable words (Federmeier & Kutas, 1999).

The present study
In the present study, participants read dialogues (see Table 1) for comprehension while their EEG was recorded. The dialogues ended in contextually predictable or less predictable words, and the presence of the discourse markers eigenlijk (Experiment 1) or inderdaad (Experiment 2) was manipulated. Dialogues containing a discourse marker were compared with the same dialogues in which the discourse marker was replaced by a control adverb (e.g. gisteren "yesterday"; see Methods). We evaluated the above-mentioned ERP components to address whether, and at which processing stages, the presence of inderdaad and eigenlijk affected processing of subsequent (un)predictable words.
In control dialogues with an adverb, we expected reduced N400 amplitudes in response to predictable relative to unpredictable dialogue continuations, in line with prior research. We hypothesised that, if discourse markers affect semantic processing of subsequently read input, then having encountered inderdaad (as opposed to an adverb) should result in a further reduction of N400 amplitude elicited by predictable words, and possibly increased N400 amplitude in response to unpredictable words. Conversely, we expected that encountering the warning signal eigenlijk (relative to the control adverb) would yield reduced N400 amplitude in response to unpredictable words, and possibly enhanced N400 amplitude elicited by predictable words. Furthermore, if pragmatic expectation-managing cues impact later processing stages, we predicted eigenlijk- and inderdaad-based modulations of post-N400 positivities. In line with previous studies, we expected a frontally distributed post-N400 positivity in response to unpredictable but plausible words, relative to predictable words, in control dialogues without a discourse marker. This predictability effect was hypothesised to be enhanced in response to words following a confirmatory cue (inderdaad), and attenuated or reversed for words following an adversative cue (eigenlijk). In addition, the pragmatic anomaly created by presenting less predictable words after inderdaad (normally associated with expectation confirmation), or predictable words after eigenlijk (normally marking unexpectedness), could lead to integration difficulty and hence elicit a posterior post-N400 positivity relative to pragmatically congruent conditions (e.g. DeLong et al., 2014).
Finally, after reading the dialogues, participants performed a recognition memory task, which allowed us to explore whether the presence of expectation-managing discourse markers affected the quality of the representations of predictable and unpredictable words in memory. Previous research has investigated how word predictability affects memory, but without manipulating discourse markers. Some studies have shown better memory for predictable words than for less predictable words (e.g. Miller & Selfridge, 1950; Riggs, Wingfield, & Tun, 1993), perhaps due to a better integrated memory trace. Other studies have shown poorer memory for predictable words than for unpredictable words (e.g. Cairns, Cowart, & Jablon, 1981; Corley, MacGregor, & Donaldson, 2007; Federmeier et al., 2007; O'Brien & Myers, 1985; Perry & Wingfield, 1994), possibly because more predictable input is processed less thoroughly (e.g. Rommers & Federmeier, 2018a; van Berkum, 2010). We hypothesised that potential memory differences between predictable and unpredictable words would be enhanced for words following a pragmatic prediction confirmation cue (inderdaad), and reduced for words following a pragmatic prediction disconfirmation cue (eigenlijk).
Notes to Table 1: Dutch word order was maintained in the translations of the target sentences. The critical manipulations are underlined (control adverb/discourse marker) or in boldface (critical word; CW). "Plain Predictability" refers to the predictability of the CW in the adverb conditions (i.e. predictability in the absence of a discourse marker); "Pragmatic Coherence" indicates the pragmatic fit of the (un)predictable CW when following a discourse marker.

Participants
Each experiment was conducted with 40 participants ( Data from three participants in Experiment 1 and six participants in Experiment 2 that exhibited excessive artifacts such as blinks, drifts, eye movements or excessive muscle activity (>30% of trials affected) were excluded from further analyses. Two additional participants were excluded from Experiment 1 because of poor performance in the memory task (negative discriminability values, likely due to a misinterpretation of the response scale or a lack of attention during the reading experiment). Hence, for Experiment 1, 35 participants were included in the ERP analyses and 38 in the behavioural memory analyses; in Experiment 2, ERP analyses included 34 participants and behavioural memory analyses were performed on all 40 participants.

Materials
Experimental items consisted of 144 Dutch written conversations in easily imaginable situations. Each item consisted of an introductory context sentence, followed by a question-answer pair. The combination of the context sentence and the question was created to evoke a specific lexical-semantic prediction. Answers occurred in four conditions that combined two factors: a) the answers contained a critical word that was either predictable or unpredictable on the basis of the prior context, and b) the critical words were either preceded by an adverb or adverbial phrase (e.g. gisteren, "yesterday", graag, "happily"; adverbs varied across items and were included to keep sentence structure comparable across conditions) or by an expectation-managing discourse marker (eigenlijk in Experiment 1, inderdaad in Experiment 2). An example of an experimental item in all conditions is presented in Table 1.
Predictability of the critical word was determined on the basis of a web-based cloze test, in which participants read 180 experimental conversations in three conditions (adverb vs. eigenlijk vs. inderdaad; 60 items per condition, counterbalanced across three lists; 20 participants per list), and were asked to finish the truncated answer. We selected 72 lemmas with the highest cloze probability in the Adverb-condition as predictable critical words. The 72 unpredictable critical words were selected from the completions provided in the Eigenlijk-condition to ensure their semantic plausibility; these words had a lower cloze probability in the baseline condition. Note that the denotation of critical words as predictable or unpredictable follows from their cloze probabilities in the Adverb-conditions (referred to as Plain Predictability in Table 1), rather than from their cloze values in the discourse marker conditions. For example, a critical word with high cloze probability in the Inderdaad-condition but low cloze probability in the Eigenlijk-condition will have high Plain Predictability in both cases. Mean cloze probabilities of the critical words in the 144 selected items in each condition are provided in Table 2.
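As a minimal illustration of how a cloze probability is derived from such a norming test, the sketch below computes the proportion of completions that match a target word. The function name `cloze_probability` and the example responses are hypothetical, and the case-insensitive exact match is a simplifying assumption (the selection described above operated on lemmas).

```python
from collections import Counter


def cloze_probability(completions, target):
    """Proportion of completions equal to `target`.

    Assumption: completions are compared as lower-cased, stripped strings;
    a real norming study would first map responses to lemmas.
    """
    if not completions:
        raise ValueError("need at least one completion")
    counts = Counter(c.strip().lower() for c in completions)
    return counts[target.strip().lower()] / len(completions)


# Hypothetical completions from 20 participants for one truncated answer.
responses = ["groen"] * 14 + ["grijs"] * 4 + ["blauw"] * 2
p_groen = cloze_probability(responses, "groen")  # 14 of 20 completions
p_grijs = cloze_probability(responses, "grijs")  # 4 of 20 completions
```

Computing the same proportion separately for the adverb, eigenlijk and inderdaad versions of an item would give the per-condition cloze values that the norming study compares.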
A logistic mixed-effects regression analysis of cloze probabilities of the selected critical words showed a Predictability by Discourse Marker cross-over interaction (comparing models with vs. without the interaction effect: χ²(2) = 2616, p < 0.001). Follow-up analyses of predictable and unpredictable critical words separately showed that, relative to the Adverb-condition, cloze probabilities of plain-predictable critical words were higher in the Inderdaad-condition (β = 0.32, SE = 0.08, p < 0.001) and lower in the Eigenlijk-condition (β = −2.46, SE = 0.15, p < 0.001); conversely, cloze probabilities of plain-unpredictable critical words were lower in the Inderdaad-condition (β = −6.05, SE = 1.58, p < 0.001) and higher in the Eigenlijk-condition (β = 2.47, SE = 0.18, p < 0.001) when compared with the Adverb-condition. Predictable and unpredictable critical words were comparable in terms of length (M = 7.0 vs. 6.7 characters, respectively) and frequency (M = 67 vs. 85 per million words in CELEX; Baayen, Piepenbrock, & Gulikers, 1995). Dialogues in each condition were divided across four counterbalanced lists such that participants would see each experimental item in only one condition. Seventy-two filler items were added to each list. Filler items had the same structure as the experimental conversations (context sentence followed by a question-answer pair) and contained an adverb or a discourse marker (eigenlijk in Experiment 1, inderdaad in Experiment 2), but all contexts were weakly constraining. In sum, the resulting lists each consisted of 216 conversations, half of which contained an expectation-managing discourse marker in the answer. Lists were pseudo-randomized individually for each participant.
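The counterbalancing described above can be sketched as a Latin-square rotation of the four conditions over the 144 items. This is only one way to satisfy the stated constraint (each item appears once per list, in a different condition on each list); the rotation scheme, the condition labels and the function name `build_lists` are assumptions rather than the procedure actually used.

```python
# Hypothetical labels for the 2 x 2 design (Plain Predictability x Discourse Marker).
CONDITIONS = [
    ("predictable", "adverb"),
    ("predictable", "marker"),
    ("unpredictable", "adverb"),
    ("unpredictable", "marker"),
]


def build_lists(n_items=144, n_lists=4):
    """Return one {item_index: condition} mapping per list.

    Conditions are rotated over items so that every list contains every
    item exactly once and every item occurs in a different condition on
    each list (assumed Latin-square scheme).
    """
    lists = []
    for list_idx in range(n_lists):
        assignment = {
            item: CONDITIONS[(item + list_idx) % len(CONDITIONS)]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


lists = build_lists()
```

With 144 items and four conditions, each list then holds 36 items per condition, which matches the per-condition trial counts reported later in the preprocessing section.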
For the memory test, a subset of 96 critical words were selected under the constraint that they appeared only once in the experimental conversations (24 per condition). Each list was supplemented with 48 words that occurred in the filler conversations, and 48 new (unseen) words similar in length and frequency to the old (seen) words. This resulted in 192 words per list; the order of items on each list was pseudo-randomized for each participant.

Procedure
Participants were tested individually in a dimly lit soundproof booth, seated at a viewing distance of approximately 100 cm from a computer screen. For the reading phase, participants were instructed to silently and attentively read the conversations for comprehension, and to avoid blinks, muscle movements and eye movements.
Stimuli were presented in a black Lucida Console font, 26-point size, on a white background. Each trial started with a centred fixation cross which remained on the screen for 1000 ms, followed by a 500 ms blank screen. Next, the context sentence was presented in full, followed by the question, which was also presented in full. Participants read both sentences at their own pace, and pressed a button to continue. After the question, the first part of the answer (e.g. "Diane says") appeared and remained on the screen for 800 ms. The presentation time of the subsequent words in the answer sentence was variable to mimic relatively natural reading (e.g. Nieuwland & van Berkum, 2006). Word duration was computed as (number of letters * 30) + 190 ms, with a maximum of 400 ms. Adverbs/discourse markers and critical words had a fixed duration of 400 ms and the inter-stimulus interval was fixed at 150 ms. The final word of the sentence appeared with a period and was presented for 800 ms, after which the next trial started automatically. Participants started with 4 practice items, and then completed 6 blocks of 36 experimental items, separated by self-timed breaks. This part of the experiment took 45-60 min.
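The word-duration rule above can be written out as a short function. The sketch below is illustrative: the function name `word_duration_ms`, the `fixed` flag and the example words are assumptions, and word length is taken to be the number of characters in the string.

```python
ISI_MS = 150           # fixed inter-stimulus interval
MAX_DURATION_MS = 400  # cap on variable durations; also the fixed duration


def word_duration_ms(word, fixed=False):
    """Presentation duration of one word in the answer sentence, in ms.

    `fixed=True` stands for adverbs/discourse markers and critical words,
    which were always shown for 400 ms. Otherwise the duration is
    (number of letters * 30) + 190 ms, capped at 400 ms.
    """
    if fixed:
        return MAX_DURATION_MS
    return min(len(word) * 30 + 190, MAX_DURATION_MS)


short_word = word_duration_ms("de")                  # 2 letters: 250 ms
long_word = word_duration_ms("gisteren")             # 8 letters: capped at 400 ms
critical = word_duration_ms("groen", fixed=True)     # fixed 400 ms
onset_to_onset = word_duration_ms("hond") + ISI_MS   # 310 ms + 150 ms
```

Under this rule the cap is reached at seven letters (7 * 30 + 190 = 400), so only shorter words receive a variable duration.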
After the reading experiment, participants took a 30 s math test to clear their verbal short-term memory, after which they started the word recognition test. Participants were asked to judge whether they recognised words from the reading experiment, indicating how confident they were of their response (zeker nieuw "sure new"; misschien nieuw "maybe new"; misschien oud "maybe old"; zeker oud "sure old"). Each target word (black Lucida Console font, 26-point size) was presented in the centre of the screen. After 1500 ms, the four answer options appeared below the target word, matching in colour and linear order with four buttons on the button box. Participants were instructed to wait for the answer options to appear on the screen before pressing a button. It took participants about 15 min to complete the memory test; the full experimental session took on average 2 h.

EEG recording and preprocessing
The EEG was recorded from 31 active cap-mounted Ag/AgCl electrodes (actiCAP, Brain Products GmbH), referenced online to the left mastoid. Blinks and eye movements (EOG) were measured via four electrodes located at the outer canthi of both eyes and above and below the left eye. Electrode impedances were kept below 20 kΩ. Signals were amplified using BrainAmp DC amplifiers with a band-pass filter between 0.01 and 150 Hz and digitised at a sampling frequency of 500 Hz. The EEG signal was re-referenced to the mean of the left and right mastoids and bipolar EOG derivations were created. The continuous EEG was filtered with a 0.1 Hz high-pass filter (two-pass Butterworth with a 12 dB/oct roll-off) and segmented into epochs encompassing the signal from −200 ms until 1000 ms relative to the onset of the critical word. A 200 ms pre-stimulus baseline was subtracted. Segments containing blinks, drifts, eye movements, or excessive muscle activity were removed in a semi-automatic fashion using participant-specific thresholds. In Experiment 1, a total of 9% of the trials was removed, with similar trial numbers remaining across conditions: Predictable Adverb 33 ± 3 (mean ± SD), Predictable Eigenlijk 33 ± 3, Unpredictable Adverb 33 ± 2, Unpredictable Eigenlijk 33 ± 2. In Experiment 2, the overall trial loss was 8%; similar trial numbers across conditions remained: Predictable Adverb 33 ± 3, Predictable Inderdaad 33 ± 2, Unpredictable Adverb 33 ± 3, Unpredictable Inderdaad 33 ± 3.
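To make the segmentation and baseline step concrete, the following sketch cuts one epoch from a single-channel signal and subtracts the mean of the 200 ms pre-stimulus interval. It is a simplified, pure-Python illustration and not the software pipeline used in the study; the function name `extract_epoch`, the half-open sample window and the toy signal are assumptions.

```python
SFREQ_HZ = 500  # sampling frequency reported above


def extract_epoch(signal, onset_idx, tmin=-0.2, tmax=1.0, sfreq=SFREQ_HZ):
    """Cut one epoch around `onset_idx` and baseline-correct it.

    The epoch spans samples [onset + tmin*sfreq, onset + tmax*sfreq)
    (half-open window, an assumption). The mean of the pre-stimulus
    samples is subtracted from every sample in the epoch.
    """
    start = onset_idx + round(tmin * sfreq)
    stop = onset_idx + round(tmax * sfreq)
    if start < 0 or stop > len(signal):
        raise ValueError("epoch window exceeds the recording")
    epoch = signal[start:stop]
    n_baseline = onset_idx - start  # 100 samples for 200 ms at 500 Hz
    baseline = sum(epoch[:n_baseline]) / n_baseline
    return [value - baseline for value in epoch]


# Toy single-channel signal: a constant offset of 5 that steps to 7
# at the (hypothetical) critical-word onset, sample 1000.
toy_signal = [5.0] * 1000 + [7.0] * 1000
epoch = extract_epoch(toy_signal, onset_idx=1000)
```

After correction the pre-stimulus samples of the toy epoch average to zero, so the post-onset step of 2 units is expressed relative to baseline rather than to the arbitrary offset.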

Data analysis
In line with most of the literature, ANOVAs are reported, supplemented with confidence intervals and effect sizes (Cohen's d z for within-subject designs). ANOVAs included the factors Plain Predictability (predictable, unpredictable), DM (adverb, eigenlijk/inderdaad) and their interaction as within-subject variables. In order to investigate the topographic distribution of the post-N400 positivity effects, Hemisphere was included as an additional variable.
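For readers less familiar with the within-subject effect size reported here, the sketch below computes Cohen's d z as the mean of the paired differences divided by their standard deviation, together with the corresponding paired t value. The function names and the example amplitudes are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_differences(cond_a, cond_b):
    """Per-participant differences between two within-subject conditions."""
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    return [a - b for a, b in zip(cond_a, cond_b)]


def cohens_dz(cond_a, cond_b):
    """Cohen's d z: mean paired difference / SD of the paired differences."""
    diffs = paired_differences(cond_a, cond_b)
    return mean(diffs) / stdev(diffs)


def paired_t(cond_a, cond_b):
    """Paired t statistic, which equals d z * sqrt(n)."""
    n = len(paired_differences(cond_a, cond_b))
    return cohens_dz(cond_a, cond_b) * sqrt(n)


# Hypothetical mean amplitudes (microvolts) for four participants.
unpredictable = [1.0, 2.0, 3.0, 4.0]
predictable = [0.0, 0.0, 2.0, 2.0]
dz = cohens_dz(unpredictable, predictable)
t_value = paired_t(unpredictable, predictable)
```

For a within-subject factor with two levels, the ANOVA F (1, n − 1) equals the square of this paired t value, which gives a quick way to relate a reported F to d z.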
EEG data acquired during the memory test were not further analyzed; after data preprocessing too few trials were left per condition. Behavioural memory performance was analyzed by computing a standard signal detection-theoretic index of discriminability (d a ) per condition per participant. Discriminability was analyzed by means of repeated-measures ANOVAs including Predictability, Discourse Marker, and their interaction.

Experiment 1 (eigenlijk)
Event-related potentials
Figure 1 shows the ERPs time-locked to the presentation of the critical words. After a visual P1, N1 and P2, an N400 was elicited, followed by a late positive-going wave. Figure 2 presents the scalp topographies of the difference waves between conditions in the N400 and post-N400 positivity time window.
With respect to later processing stages, we hypothesised a frontally distributed post-N400 positivity effect for unpredictable but plausible dialogue continuations (e.g. DeLong et al., 2014). Surprisingly, analyses of mean amplitudes in the post-N400 time window revealed no evidence for a difference in anterior post-N400 positivity between predictable and unpredictable words, F (1, 34) = 1.567, p = 0.219. Neither did we find evidence for a difference between words following eigenlijk and words following an adverb, F (1, 34) = 0.901, p = 0.349, nor for a Predictability x DM interaction effect, F (1, 34) = 0.330, p = 0.570. Note, however, that the waveforms suggested considerable component overlap between N400 and post-N400 effects, as observed elsewhere previously (e.g. Hagoort, 2003; Hoeks & Brouwer, 2014; Kutas, Van Petten, & Kluender, 2006). It is possible that the earlier N400 difference between predictable and unpredictable words obscured any later Predictability effects on post-N400 positivities. Indeed, post-hoc analyses taking N400 amplitude differences into account did suggest effects in the expected direction; these are reported in the Appendix.
With respect to the posterior post-N400 positivity, we expected to find increased amplitudes for pragmatically anomalous dialogue continuations, that is, when eigenlijk (signalling unexpectedness) was followed by a contextually predictable word. Analyses of mean amplitudes in the post-N400 time window provided no evidence for such a DM by Predictability interaction, F (1, 34) = 0.080, p = 0.779. We did find more positive-going waveforms by 0.67 μV (95% CI [0.37, 0.96], d z = 0.54) in response to words following eigenlijk compared with words following an adverb, F (1, 34) = 11.747, p = 0.002. This DM effect tended to interact with Hemisphere, F (1, 34) = 3.682, p = 0.063; pairwise t tests between mean amplitudes revealed a larger effect of eigenlijk over the right hemisphere, t (34) = 3.88, p < 0.001, than over the left hemisphere, t (34) = 2.48, p = 0.018. These results suggest that the effect of eigenlijk on the post-N400 posterior positivity elicited by subsequent words may be independent of their predictability.
Taken together, the ERP findings from Experiment 1 corroborate earlier findings that a word's plain predictability modulates N400 amplitude, but we found no evidence that eigenlijk attenuated or reversed this N400 effect. Standard analyses provided no evidence for anterior post-N400 positivity modulations, but revealed that the presence of eigenlijk affected incremental processing of subsequent words, such that words following eigenlijk elicited an enhanced posterior positivity compared with words following an adverb.

Behavioural memory performance
The percentage of critical words that had been correctly recognised from the reading experiment (73%) was larger by 57% (95% CI [53.0, 61.4], d z = 5.78) than the percentage of false alarms to unseen words (16%). This difference yielded an average d a of 1.64. The fact that participants readily distinguished between seen and unseen words suggests that they had been paying attention during the reading experiment.
For words that appeared in the experimental dialogues, discriminability (d a = 1.75) was not influenced by Predictability, Discourse Marker, or their interaction (all F < 0.007). We hence found no evidence that plain predictability or the presence of eigenlijk in the reading experiment affected participants' word recognition memory.
Figure 1. Grand-average ERPs time-locked to critical words at 9 scalp electrode sites (Experiment 1). Critical words were plain-predictable (green lines) or plain-unpredictable (red lines), and followed an adverb (solid lines) or eigenlijk (dashed lines). Negative is plotted up in all ERP figures.

Experiment 2 (inderdaad)
Event-related potentials
The ERPs time-locked to the presentation of the critical words are presented in Figure 3; Figure 4 shows the scalp topographies of the difference waves between conditions in the N400 and post-N400 positivity time window.
As in Experiment 1 and many previous studies, the N400 elicited by predictable words was smaller compared with the N400 elicited by unpredictable words, by 0.62 μV (95% CI [0.29, 0.97], d z = 0.29), F (1,33) = 5.914, p = 0.021. Contrary to our hypothesis that inderdaad would increase this Predictability effect, we found no evidence for a Predictability x Discourse Marker interaction, F (1,33) = 0.022, p = 0.884. Neither did we find evidence for a difference in N400 amplitude between words following inderdaad compared to words following an adverb, F (1,33) = 0.239, p = 0.628.
Corroborating evidence from previous studies, the anterior post-N400 positivity was more positive-going by 0.60 μV (95% CI [0.15, 1.18], d z = 0.36) for unpredictable words when compared with predictable words, F (1,33) = 4.359, p = 0.045. Although most previous studies observed a left frontal maximum (e.g. DeLong et al., 2014), the scalp topography in Figure 4 suggested that the Predictability effect in the current study was right-lateralized. This was confirmed by a Predictability by Hemisphere interaction, F (1,33) = 6.320, p = 0.017; pairwise t-tests revealed a difference between predictable and unpredictable words over the right hemisphere, t (33) = 2.77, p = 0.009, but not over the left hemisphere, t (33) = 1.18, p = 0.245. Our hypothesis that inderdaad would modulate this frontal positivity effect was not confirmed: we found no evidence for a Predictability x Discourse Marker interaction, F (1,33) = 0.678, p = 0.416, nor for a main effect of Discourse Marker, F (1,33) = 0.325, p = 0.573.
In sum, the ERP findings from Experiment 2 corroborate earlier findings that a word's predictability modulates the N400 and post-N400 anterior positivity, but we found no evidence that the pragmatic confirmation encoded in inderdaad enhanced predictability effects on subsequent words online, nor that the presence of inderdaad otherwise affected incremental processing of subsequent words when compared with adverbs.

Behavioural memory performance
The percentage of words that had been correctly recognised (73%) was larger by 54% (95% CI [49.9, 58.1], d z = 5.57) than the percentage of false alarms to unseen words (19%). This difference was present in all participants and yielded an average d a of 1.52. We found no evidence for a difference in memory performance between predictable (1.70) and unpredictable (1.63) words, F (1,39) = 1.551, p = 0.220, but there was an effect of DM indicating better seen/new discriminability for seen words in the presence of inderdaad (1.73) than in the presence of an adverb (1.61) (difference 0.12, 95% CI [−0.04, 0.28], d z = 0.26), F (1,39) = 5.81, p = 0.021. There was no evidence for a Discourse Marker by Predictability interaction effect, F (1, 39) = 0.048, p = 0.828. Thus, results from Experiment 2 suggest that processing words in the presence of inderdaad as opposed to an adverb had positive downstream consequences for their accessibility in memory.

Between-experiments comparison
To explore differences between eigenlijk and inderdaad more directly, we combined the data from both experiments. This revealed that Predictability effects on the N400 in response to critical words were overall attenuated in the inderdaad-experiment (0.63 μV) compared with the eigenlijk-experiment (1.48 μV); Predictability by Experiment interaction, F (1, 67) = 5.70, p = 0.020 (difference 0.85 μV, 95% CI [0.14, 1.57], d = 0.57). Note that this was even the case for dialogues without a discourse marker, although these dialogues were identical across experiments (difference 1.23 μV, 95% CI [0.13, 1.89], d s = 0.54), F (1, 67) = 4.95, p = 0.030. This suggests that the regular presence of expectation-managing discourse markers in an experimental context affected semantic processing of subsequent input at a more global, experiment-wide level. There was no Discourse Marker by Experiment interaction or a three-way Predictability by Discourse Marker by Experiment interaction (all F < 0.95).
With respect to later processing stages, combined analyses confirmed a similar lack of evidence for effects of discourse markers on the anterior post-N400 positivity for inderdaad and eigenlijk, as there were no interactions with Experiment (all F < 1.15). As for the posterior post-N400 positivity, recall that Experiment 1 showed that words following eigenlijk elicited a positivity relative to words following a control adverb, and no effects were found for inderdaad in Experiment 2. A combined analysis confirmed a Discourse Marker by Experiment interaction, F (1,67) = 4.68, p = 0.034, suggesting that only eigenlijk affected incremental processing of subsequent input. There was no evidence for a three-way interaction, F (1,67) = 0.07, p = 0.783.
Finally, regarding recognition memory, a combined analysis of d a scores across experiments showed a tendency for a Discourse Marker by Experiment interaction, F (1,76) = 3.30, p = 0.073, consistent with the observation that only the presence of inderdaad affected recognition memory for subsequently processed words. Follow-up independent t-tests suggested that memory for words that had followed a discourse marker did not differ between experiments, t (67) = 0.20, p = 0.839 (difference 0.02, 95% CI [−0.28, 0.35], d s = 0.03); rather, recognition memory for words occurring in dialogues with an adverb was worse in the inderdaad-experiment than in the eigenlijk-experiment, t (67) = 1.76, p = 0.080 (difference 0.14, 95% CI [−0.04, 0.60], d s = 0.28), even though materials in these conditions were identical. This again suggests that the regular presence of expectation-managing discourse markers in the experimental context affected processing more globally. There was no evidence for a three-way interaction, F (1,76) = 0.02, p = 0.900.

Discussion
The present study set out to empirically test and further specify theoretical assumptions about the conversation-managing function of discourse markers by examining their effects on the comprehender. In two experiments, we investigated to what extent the pragmatic information encoded in two expectation-managing discourse markers (eigenlijk, marking upcoming prediction disconfirmation, and inderdaad, marking upcoming prediction confirmation) affected processing of subsequent (un)predictable input. We hypothesised that, upon encountering a discourse marker (relative to a control adverb in the baseline conditions), comprehenders would adjust their initial expectations about likely dialogue continuations during reading. As such, the presence of inderdaad was hypothesised to increase, and the presence of eigenlijk to reduce or alter predictability effects, as measured by modulations of N400 and post-N400 positivity amplitudes elicited by subsequently read words.
The results from control dialogues with an adverb corroborated earlier findings that a word's predictability modulates both N400 and post-N400 anterior positivity amplitudes. The topographic distribution of predictability effects on the post-N400 anterior positivity in our study did differ from previous findings: whereas most previous studies reported a left frontal maximum (e.g. DeLong et al., 2014), the predictability effect in Experiment 2 was right-lateralized. We are hesitant to derive strong conclusions from scalp topographical details, but speculatively relate the right-hemispheric bias to the pragmatic predictability manipulation in our experiments: previous studies have reported right-biased late positivities in relation to jokes (Coulson & Kutas, 2001; Coulson & Lovett, 2004) and indirect requests (Coulson & Lovett, 2010), which similarly have a pragmatic basis.
The results provided no evidence for eigenlijk- or inderdaad-based modulations of predictability effects on the N400, despite the fact that both discourse markers affected predictability in offline cloze probabilities, which are known to be predictive of N400 amplitude (e.g. Kutas & Hillyard, 1984). Apparently, inderdaad did not cause comprehenders to strengthen the likely prediction, nor did eigenlijk cause comprehenders to discard the likely prediction or change it to the unexpected word to an extent measurable in the N400. To the extent that initial predictions were not discarded, this is consistent with previous studies reporting lingering predictions or interpretations (e.g. Christianson, Hollingworth, Halliwell, & Ferreira, 2001; Corley, 2010; Lowder & Ferreira, 2016; Rommers & Federmeier, 2018b). Another possible explanation for the observed lack of modulation of the N400 by discourse markers is that the unpredictable words were selected from completions of dialogues containing eigenlijk. This ensured that the stimuli were relatively natural, but as a side effect the unpredictable words were often semantically related to the predictable words: pairwise semantic similarity (LSA) values obtained on the basis of their English translations (following Chwilla & Kolk, 2002) ranged from 0.05 to 0.81 (M = 0.25, SD = 0.21). This semantic similarity between predictable and unpredictable words may have reduced N400 amplitude (e.g. Federmeier & Kutas, 1999). For instance, when eigenlijk marked the unexpectedness of a colour (e.g. green), but the alternative was also a colour (e.g. grey), any switch from an original to a revised prediction may have had only minor consequences at the level of the N400.
A final possibility is that refining or revising initial predictions may involve computations that require more time than participants had between encountering the discourse marker and reading the critical word (for evidence that contextual facilitation takes time, see Camblin, Ledoux, Boudewyn, Gordon, & Swaab, 2007; Chow, Lau, Wang, & Phillips, 2018; Wlotko & Federmeier, 2015). Future studies could investigate whether discourse markers can modulate N400 predictability effects if more time is available for prediction. This could be done by using slower stimulus presentation or, in an ecologically more valid way, by presenting discourse markers in sentence-initial position (e.g. eigenlijk hebben we alleen maar regen gehad "[actually] have we nothing but rain had").
Regarding the post-N400 anterior positivity, we found no evidence for inderdaad- or eigenlijk-based modulations of predictability effects, hence not supporting the hypotheses that the presence of inderdaad increases or the presence of eigenlijk reduces processing costs associated with prediction disconfirmations. One possible reason is component overlap with the preceding N400 effect, a possibility explored in the Appendix. The possibility that comprehenders did not use the discourse markers to adjust their expectations about subsequent words seems inconsistent with recent findings from a visual world eye-tracking study (van Bergen & Bosker, 2018), which investigated effects of inderdaad and eigenlijk on the processing of (un)predictable dialogue continuations by measuring fixations on images presented on a screen. Participants read a constraining context sentence (e.g. Tineke just got back from her holiday on Ibiza, and she is very tanned), after which they listened to a question (The weather must have been great there?) and an incomplete answer that contained either a control adverb or a discourse marker (We hebben daar / inderdaad / eigenlijk alleen maar … gehad "we have there / indeed / actually had nothing but … ", where sun would be expected). Participants completed the dialogues by clicking on one of four visually presented referents. Relative to a control adverb, encountering inderdaad led to increased fixations on likely discourse referents (i.e. sun), whereas encountering eigenlijk led to increased visual attention to contextually less likely discourse referents (i.e. rain), suggesting that participants immediately integrated the pragmatic cues to modulate their predictions about dialogue continuations. However, the visual world paradigm measures potential sentence interpretations using a limited set of visual objects (for discussion, see Dahan & Tanenhaus, 2004; Henderson & Ferreira, 2004; Huettig, Rommers, & Meyer, 2011).
The current ERP study did not constrain potential dialogue interpretations visually, which suggests that effects of eigenlijk and inderdaad on online processing may disappear if more sentence interpretations and continuations are possible.
In interpreting the lack of predictability effects of discourse markers in the present study, we should consider that discourse markers are notoriously polyfunctional: their interpretation depends on the specific characteristics of the discourse (see e.g. Brinton, 1996; Fischer, 2006; Jucker & Ziv, 1998; Maschler & Schiffrin, 2015). Eigenlijk and inderdaad may for instance mark the assumed (un)expectedness of a name (e.g. De koningin van Nederland heet eigenlijk Máxima "the queen of the Netherlands is actually called Máxima"), an event (e.g. Na het plassen was ik inderdaad mijn handen "after peeing I indeed wash my hands") or a speech act (e.g. Hoe heet je eigenlijk? "what is your name actually?", marking the social unexpectedness of asking this question; van Bergen, van Gijn, Hogeweg, & Lestrade, 2011). Given that in principle, inderdaad and eigenlijk can signal (un)expectedness of any aspect of the discourse, comprehenders perhaps sometimes adjusted a different expectation, or needed more time to work out which likely expectation the speaker intended to modulate when using the discourse marker.
Experiment 1 did reveal an enhanced posterior post-N400 positivity in response to words following eigenlijk, although this effect was not restricted to plain-predictable and hence pragmatically anomalous words as we hypothesised. An important possibility is that the enhanced posterior positivity elicited by words following eigenlijk reflects the processing costs associated with pragmatically driven discourse updating or integration (e.g. Brouwer et al., 2012; Schumacher, 2013). After having encountered a warning signal for upcoming unexpectedness, the comprehender needs to integrate any subsequent input in relation to this adversative cue: after all, any next word may express the crucial unexpected information. It is thus possible that the mere presence of eigenlijk required additional processing of any dialogue continuation, irrespective of the (pragmatic) predictability of the subsequent information. This proposal would be compatible with the eye-tracking findings reported in van Bergen and Bosker (2018), who found that the presence of eigenlijk slowed down responses, regardless of the preferred dialogue completion; it would also corroborate the idea that arriving at a pragmatically more complex discourse interpretation is costly (see also Kurumada, Brown, Bibyk, Pontillo, & Tanenhaus, 2014).
Findings from Experiment 2 suggested that, in contrast with eigenlijk, the presence of inderdaad did not affect online processing of subsequent input. We speculate that this distinction between eigenlijk and inderdaad reflects a difference in cue informativity: a warning signal for unexpectedness (eigenlijk) is arguably a more informative cue to the comprehension system than an advance confirmation of a likely expectation (inderdaad). Again considering the polyfunctionality of discourse markers, our findings suggest that inderdaad may not typically be used to facilitate rapid online processing of subsequent input. Another theoretically recognised function of discourse markers is to manage interpersonal relations: speakers use discourse markers to express acknowledgement of, and attention to, their addressee's social identity or "face" (e.g. Brown & Levinson, 1987; Traugott, 2010). Although this will need further research, we speculate that inderdaad more likely serves such a socio-pragmatic goal in conversational interaction: speakers may use inderdaad to signal interpersonal agreement, with the aim of establishing social coherence.
Results from the between-experiment comparison suggested that the presence of discourse markers in the experimental context affected processing of dialogues that did not contain a discourse marker: for dialogues containing an adverb (which were identical across experiments), comprehenders who had regularly encountered inderdaad showed reduced Predictability effects on, and worse memory for, subsequently read words relative to comprehenders who had regularly encountered eigenlijk. We speculate that the occurrence of the discourse markers in the experiments modulated the overall utility of prediction. Although the proportion of (plain) predictable and unpredictable dialogue endings was the same across experiments, recall that Plain Predictability directly corresponded with Pragmatic Coherence in the case of inderdaad, but not in the case of eigenlijk. Consequently, in Experiment 1 (eigenlijk) three out of four experimental conditions contained some form of "prediction disconfirmation" (in terms of Plain Predictability, Pragmatic Coherence, or both), whereas in Experiment 2 (inderdaad), only two out of four conditions did. As a result, the overall probability of predictive success in Experiment 2 (inderdaad) was higher than in Experiment 1 (eigenlijk). Comprehenders may have been sensitive to these experiment-wide statistics, and adapted their processing accordingly: a lower likelihood of gaining new information in the inderdaad-experiment may have encouraged the comprehension system to operate in a top-down "verification mode", at the expense of thoroughly processing the bottom-up input (e.g. Rommers & Federmeier, 2018a; van Berkum, 2010; for research showing that comprehenders adapt their processing to the statistics of the experimental environment, see, e.g. Bradlow & Bent, 2008; Delaney-Busch, Morgan, Lau, & Kuperberg, 2017; Fine, Jaeger, Farmer, & Qian, 2013; Kaschak & Glenberg, 2004; Norris, McQueen, & Cutler, 2003; but see Harrington Stack, James, & Watson, 2018).
Although we believe these findings enhance our understanding of the potential role of discourse markers in online language processing, future research could avoid such effects by manipulating discourse marker type within-subjects; this would also allow for a more direct comparison between eigenlijk and inderdaad.
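The condition counts behind this argument (three out of four conditions with some form of prediction disconfirmation in the eigenlijk-experiment, two out of four in the inderdaad-experiment) can be illustrated with a small sketch. The function names are hypothetical, and the sketch assumes that in dialogues with a control adverb, as with inderdaad, Pragmatic Coherence simply follows Plain Predictability, whereas with eigenlijk a predictable ending is pragmatically incoherent.

```python
from itertools import product


def has_disconfirmation(marker: str, predictable: bool) -> bool:
    """Return True if a condition involves any prediction disconfirmation."""
    plain_disconfirmed = not predictable
    if marker == "eigenlijk":
        # eigenlijk announces unexpectedness, so a predictable ending
        # is the pragmatically incoherent one.
        pragmatically_coherent = not predictable
    else:
        # Assumption: for inderdaad and the control adverb, pragmatic
        # coherence coincides with plain predictability.
        pragmatically_coherent = predictable
    return plain_disconfirmed or not pragmatically_coherent


def disconfirmation_share(marker: str) -> tuple[int, int]:
    """Count disconfirming conditions in a 2 (marker/adverb) x 2 design."""
    conditions = list(product([marker, "adverb"], [True, False]))
    n_disconfirming = sum(has_disconfirmation(m, p) for m, p in conditions)
    return n_disconfirming, len(conditions)


print(disconfirmation_share("eigenlijk"))  # (3, 4)
print(disconfirmation_share("inderdaad"))  # (2, 4)
```

Under these assumptions, the only condition that differs between the experiments is the discourse marker followed by a predictable ending: it counts as a disconfirmation after eigenlijk, but not after inderdaad.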
Finally, findings from the memory test did not provide further evidence that a word's predictability during processing affected later recognition memory. This is perhaps not surprising, given that in our design, unpredictable words were often semantically related to the predictable words and were likely easy to integrate with the context. Interestingly, however, Experiment 2 did show better memory for words if they followed inderdaad than if they followed an adverb during the reading phase; we found no such effect for eigenlijk. Although this finding is in need of replication, we speculate that inderdaad might have provided positive reinforcement during incremental language processing: despite less thorough processing in the inderdaad-experiment overall, the confirmation expressed in inderdaad may have encouraged integration of subsequent information with existing knowledge (or schemata), which is known to improve later memory for that information (e.g. Brewer & Treyens, 1981; van Kesteren, Ruiter, Fernández, & Henson, 2012).
In sum, the current study makes a novel contribution to the literature on predictive language processing by investigating how the presence of pragmatic expectation-managing cues affects incremental dialogue comprehension. We hypothesised that comprehenders would use expectation-managing discourse markers to adjust their expectations about likely dialogue continuations, which in turn would modulate processing of subsequently read (un)predictable words. However, the findings provided no evidence that discourse markers modulated predictability effects elicited by subsequent words at any processing stage. We attributed this to their polyfunctionality, although the lack of evidence for effects of eigenlijk on post-N400 positivities was complicated by possible component overlap. We did find that, unlike inderdaad, the presence of eigenlijk yielded an increased late positivity, and tentatively linked this to integration of subsequent words. This difference between confirmative and adversative cues was explained in terms of informativity to the comprehension system. The presence of inderdaad seemed to have consequences for memory, perhaps because it encouraged integration of subsequently presented input with previously stored knowledge. In addition, our findings raise the possibility that the presence of pragmatic expectation-managing discourse markers modulated the likelihood of predictive success in an experimental context, which in turn affected comprehenders' overall processing behaviour. Taken together, our findings provide a more nuanced understanding of the theoretically assumed function of expectation-managing discourse markers in conversational interaction.

Notes
1. Shortly after Marte's birthday, her friend Annemarie pays her a visit.
Annemarie asks: which gifts did you get on your birthday?
Marte says: I have then/eigenlijk/indeed from my friends this bag received.
2. For the first 17 participants in Experiment 1, data acquisition did not include the electrodes for scalp sites Fp1/2, as these were used for measuring blinks and eye-movements. For these participants the average amplitude was calculated over the 8 remaining electrodes; analyses excluding Fp1 and Fp2 for all participants in both experiments yielded similar results.