Co-registration of eye movements and neuroimaging for studying contextual predictions in natural reading

ABSTRACT Sixteen years ago, Sereno and Rayner (2003. Measuring word recognition in reading: eye movements and event-related potentials. Trends in Cognitive Sciences, 7(11), 489–493) illustrated how “by means of review and comparison” eye movement (EM) and event-related potential (ERP) studies may advance our understanding of visual word recognition. Attempts to simultaneously record EMs and ERPs soon followed. Recently, this co-registration approach has also been transferred to fMRI and oscillatory EEG. With experimental settings close to natural reading, co-registration enables us to directly integrate insights from EM and neuroimaging studies. This should extend current experimental paradigms by moving the field towards studying sentence-level processing including effects of context and parafoveal preview. This article will introduce the basic principles and applications of co-registration and selectively review how this approach may shed light on one of the most controversially discussed issues in reading research, contextual predictions in online language processing.


Introduction
Sixteen years ago, Sereno and Rayner (2003) pointed out how "by means of review and comparison" eye movements (EMs) and event-related potentials (ERPs) may advance our understanding of the "what", "when" and "how" of visual word recognition in natural reading. Both methods contribute valuable insights into psycholinguistic processing in reading. EM studies provide sensitive measures of the temporal and spatial progression of oculomotor control during reading, which are indicative of processing effort at different psycholinguistic levels as well as of attention allocation (Rayner, Sereno, Morris, Schmauder, & Clifton, 1989). ERPs, in contrast, provide reliable indications of the time-course of neural activity associated with cognitive mechanisms but lack information on the corresponding EM behaviour. Although findings from ERP and EM studies are therefore complementary, their estimates of the timing of processing are somewhat incongruous. That is, the time-course of visual word recognition reported in traditional ERP studies exceeds normal fixation durations during natural reading, impeding its integration into the processing timeline derived from EM studies. Presumably, this divergence is due to different experimental protocols: EM studies allow participants to read whole sentences or paragraphs at their own pace, whereas ERP studies prevent normal reading behaviour by imposing rapid serial visual presentation (RSVP) of isolated words. Undoubtedly, ERPs have proven to be a valuable method for studying visual word recognition (see Kutas & Federmeier, 2011 for a review). However, breaking behaviour and experience down into a sequence of externally triggered events comes with limitations, which call into question the suitability of RSVP for studying online visual language processing.
RSVP imposes three sorts of limitations: (1) Events are presented at a fixed pace, usually with presentation durations of 500-1000 ms per word, which is a multiple of the duration of a typical eye fixation during natural reading (∼200-250 ms; Kliegl, Dambacher, Dimigen, & Sommer, 2014; Rayner, 1998, 2009). This might alter the time-course of visual word recognition, for example by artificially prolonging it. As a consequence, reading-related brain regions might be recruited beyond the level intrinsically necessary for visual word recognition during natural reading and thus show increased activation compared with naturalistic settings (Schuster, Hawelka, Richlan, Ludersdorfer, & Hutzler, 2015).
Investigating visual language processing therefore requires a method that can not only track the rate at which reading proceeds, but also determine the informational role of EM behaviour and the corresponding neural correlates of visual word recognition. By providing an experimental setting close to natural reading, co-registration of EMs and neuroimaging meets these requirements. Montefusco-Siegmund, & Maldonado, 2010), or, in the case of reading, a particular word during self-paced reading of multi-word stimuli, such as word lists, sentences or paragraphs (see Figure 1).
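The key procedural difference from RSVP is that the EEG is cut into epochs time-locked to fixation onsets rather than to externally triggered word onsets. The following is a minimal sketch under stated assumptions: fixation onsets have already been mapped onto the EEG clock, the EEG is a single channel stored as a list, and only the first fixation during first-pass reading is kept per word. The names `Fixation`, `first_pass_first_fixations` and `epoch` are hypothetical and not taken from any toolbox.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    onset_ms: float    # fixation onset, already mapped onto the EEG clock (assumed)
    duration_ms: float
    word_index: int    # position of the fixated word in the sentence (0-based)


def first_pass_first_fixations(fixations):
    """Map each word to its first fixation during first-pass reading.

    Fixations must be in chronological order. A fixation counts only if it lands
    on a word to the right of every word fixated before, so refixations,
    regressions and words that were skipped in first pass are excluded.
    """
    selected = {}
    rightmost = -1
    for fix in fixations:
        if fix.word_index > rightmost:
            selected[fix.word_index] = fix
            rightmost = fix.word_index
    return selected


def epoch(eeg, srate_hz, onset_ms, tmin_ms=-100.0, tmax_ms=600.0):
    """Cut one single-channel epoch from tmin_ms to tmax_ms around a fixation onset.

    Returns None if the window extends beyond the recording.
    """
    start = round((onset_ms + tmin_ms) * srate_hz / 1000)
    stop = round((onset_ms + tmax_ms) * srate_hz / 1000)
    if start < 0 or stop > len(eeg):
        return None
    return eeg[start:stop]
```

The first-pass criterion is one simple choice among several; analyses of skipping or regressions would select fixations differently.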
Although FRPs are a reliable and valid measure of the timing of cognitive processes, the low spatial resolution of EEG limits their suitability for localising brain regions associated with particular aspects of language processing (Jobard, Crivello, & Tzourio-Mazoyer, 2003; Price, 2012; Taylor, Rastle, & Davis, 2013). Consequently, co-registration has also been transferred to functional magnetic resonance imaging (fMRI), termed fixation-related fMRI. Here, similar to FRPs, fixation onset during online reading, rather than an external trigger, serves as the marker for modelling haemodynamic brain responses (Marsman et al., 2012; Richlan et al., 2014). The combined recording of EMs and fMRI has been used to investigate the neural underpinnings of natural EM behaviour during reading (Choi, Desai, & Henderson, 2014; Henderson, Choi, Luke, & Desai, 2015; Henderson, Choi, Luke, & Schmidt, 2018) and the generalisability of effects observed during isolated visual word recognition to natural reading (Bonhage, Mueller, Friederici, & Fiebach, 2015; Desai, Choi, Lai, & Henderson, 2016; Henderson, Choi, Lowder, & Ferreira, 2016; Schuster, Hawelka, Hutzler, Kronbichler, & Richlan, 2016). Together, these findings provide a first proof of concept that fixation-related fMRI may pave the way not only for identifying brain areas engaged in natural reading in a spatially sensitive manner, but also for furthering our understanding of whether reading-related activation patterns within these regions can be attributed to specific representational levels (e.g. phonology, orthography, semantics; Carreiras, Armstrong, Perea, & Frost, 2014).
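To illustrate how fixation onsets can serve as markers for modelling haemodynamic responses, the sketch below treats each fixation as an instantaneous event, convolves it with a double-gamma haemodynamic response function and sums the overlapping responses. The HRF shape parameters are modelled on SPM's canonical defaults and are an assumption here; real analyses would use a toolbox's HRF, temporal oversampling and additional regressors. Function names are hypothetical.

```python
import math


def canonical_hrf(t_s):
    """Double-gamma haemodynamic response at time t_s (seconds) after an event.

    Assumed SPM-style shape: a gamma response (shape 6, scale 1 s) minus an
    undershoot gamma (shape 16, scale 1 s) weighted by 1/6.
    """
    if t_s <= 0:
        return 0.0
    response = t_s ** 5 * math.exp(-t_s) / math.gamma(6)
    undershoot = t_s ** 15 * math.exp(-t_s) / math.gamma(16)
    return response - undershoot / 6.0


def fixation_regressor(onsets_s, n_scans, tr_s):
    """Predicted BOLD time course for fixation onsets, sampled once per scan.

    Each fixation onset is treated as an instantaneous event; overlapping
    responses are simply summed, i.e. linear additivity is assumed.
    """
    return [sum(canonical_hrf(scan * tr_s - onset) for onset in onsets_s)
            for scan in range(n_scans)]
```

For a single fixation at time 0 and a repetition time of 1 s, this regressor peaks about 5 s later and dips below zero during the undershoot, which is why rapidly succeeding fixations produce strongly overlapping predicted responses.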
A rather new approach is fixation-related oscillatory EEG, which follows the same methodological principles as the fixation-related approaches described above but analyses neuronal oscillations. Neuronal oscillations are assumed to reflect rhythmic alternations between high and low levels of cortical excitability, caused by fluctuating synaptic inputs and the consequent firing rates of neuronal assemblies, within various frequency bands and at multiple spatial scales. Moreover, they are thought to be distinctive with regard to their perceptual and cognitive functions (Bassett, Meyer-Lindenberg, Achard, Duke, & Bullmore, 2006; Bressler, 1995; Buzsáki & Draguhn, 2004; Cohen, 2017; Hipp, Engel, & Siegel, 2011; Klimesch, Sauseng, & Hanslmayr, 2007). So far, fixation-related oscillations have been used to investigate online semantic and syntactic sentence processing (Metzner et al., 2015; Vignali, Himmelstoss, Hawelka, Richlan, & Hutzler, 2016) as well as attention allocation during word-list reading (Kornrumpf, Dimigen, & Sommer, 2017). Although findings on fixation-related oscillations are still scarce, we will argue that this approach holds great promise for investigating the neural mechanisms underlying natural reading (see Section "New perspectives: the role of neuronal oscillations during natural reading").
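Oscillatory analyses replace the averaged voltage with time-resolved power in a frequency band. A common way to obtain it is convolution with a complex Morlet wavelet; the pure-Python sketch below computes power at one frequency for a single-channel signal, zero-padding at the edges. The function name and the default of 7 cycles are illustrative assumptions, not the settings of the studies cited above; a real analysis would use an established toolbox, many frequencies and baseline normalisation.

```python
import cmath
import math


def morlet_power(signal, srate_hz, freq_hz, n_cycles=7):
    """Time-resolved power of `signal` at one frequency (complex Morlet wavelet).

    The wavelet is a complex sinusoid at freq_hz under a Gaussian envelope whose
    width is set by n_cycles; it is truncated at +/-3 standard deviations.
    """
    sigma_t = n_cycles / (2 * math.pi * freq_hz)       # envelope SD in seconds
    half = int(round(3 * sigma_t * srate_hz))           # half-length in samples
    wavelet = []
    for k in range(-half, half + 1):
        t = k / srate_hz
        envelope = math.exp(-t * t / (2 * sigma_t ** 2))
        wavelet.append(envelope * cmath.exp(2j * math.pi * freq_hz * t))
    norm = sum(abs(w) for w in wavelet)                 # equals the envelope sum
    power = []
    for n in range(len(signal)):
        acc = 0j
        for k, w in enumerate(wavelet):
            idx = n + k - half
            if 0 <= idx < len(signal):                  # zero-padding at the edges
                acc += signal[idx] * w.conjugate()
        power.append(abs(acc / norm) ** 2)
    return power
```

For a unit-amplitude 10 Hz sine sampled at 250 Hz, the 10 Hz power in the middle of the signal is close to 0.25, while the 20 Hz power is orders of magnitude smaller. More cycles sharpen frequency resolution at the cost of temporal resolution, which matters when fixations last only about 200-250 ms.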

Challenges
Co-registration of EMs and neuroimaging comes with technical and methodological challenges, particularly in the case of EEG. These aspects are beyond the scope of this article and will therefore only be outlined briefly. The interested reader may consult the work referenced below for an in-depth discussion: (1) Trigger synchronisation: The foremost requirement for co-registration is proper synchronisation of the recording devices, that is, the eye tracking system and the EEG or fMRI. This synchronisation has to be guaranteed throughout the whole recording (preferably by continuous triggering), since even minimal drifts in the clocks of the devices can add up to substantial deviations (in the order of milliseconds). One way to ensure that triggers sent to the EM and EEG data streams are synchronous is to use split trigger pulses. Potential time delays of triggers can be assessed with a photosensitive diode that measures the timing of stimulus presentation, the detection of EMs by the eye tracker and the arrival of the triggers at the EEG (or fMRI) system (e.g. Richlan et al., 2013).
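Why continuous triggering helps can be shown with a small sketch. Assuming that both devices log the same trigger pulses with millisecond timestamps, a least-squares line through the matched trigger times maps eye-tracker time onto EEG time: the slope absorbs relative clock drift and the intercept the constant offset. This mirrors the general practice of regressing shared trigger times, but the function names are hypothetical.

```python
def fit_clock_mapping(et_times_ms, eeg_times_ms):
    """Fit eeg_time = slope * et_time + intercept by least squares over shared triggers.

    A slope different from 1 captures relative clock drift; the intercept captures
    the constant offset between the two recordings.
    """
    n = len(et_times_ms)
    if n != len(eeg_times_ms) or n < 2:
        raise ValueError("need at least two matched trigger pairs")
    mean_x = sum(et_times_ms) / n
    mean_y = sum(eeg_times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in et_times_ms)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(et_times_ms, eeg_times_ms))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def to_eeg_time(et_time_ms, slope, intercept):
    """Map an eye-tracker timestamp (e.g. a fixation onset) onto the EEG clock."""
    return slope * et_time_ms + intercept


def max_residual_ms(et_times_ms, eeg_times_ms, slope, intercept):
    """Largest misfit of the shared triggers; large values flag lost or mismatched triggers."""
    return max(abs(to_eeg_time(x, slope, intercept) - y)
               for x, y in zip(et_times_ms, eeg_times_ms))
```

With a hypothetical drift of 20 parts per million, the two clocks diverge by 3.6 ms over three minutes of recording (180,000 ms × 0.00002), which a single synchronisation pulse at the start would not correct.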
(2) Correction of ocular artefacts: More difficult is the proper correction of saccadic EMs that precede the time window of analysis and cause non-neural artefacts, such as those arising from the rotation of the corneo-retinal dipole or from the oculomotor muscles. Additionally, neural artefacts including presaccadic potentials related to motor preparation or perceptual suppression
Figure 1. In RSVP, individual words are presented one-by-one at the centre of a screen, usually preceded by a fixation cross and separated by a blank screen (for illustrative purposes only shown for the first word). Analysis of the event-related signal proceeds time-locked to the externally triggered onset of a particular word. By contrast, in the fixation-related approach words are presented simultaneously, typically in a single line. Participants read at their own pace while their EMs are recorded. This approach allows participants (i) to endogenously allocate their attention and execute saccades, (ii) to parafoveally pre-process the upcoming word(s), (iii) to skip words and (iv) to reinspect previously encountered words by means of regressive saccades. The EEG signal is then analysed time-locked, commonly, to the first fixation on a word.
(3) Deconvolution of overlapping signals: Probably most challenging are overlapping brain responses caused by rapidly succeeding fixations. During reading, fixations typically last 200-250 ms on average and are thus much shorter than the time windows commonly used for EEG analysis. For example, by the time the N400 arises, the reader, more often than not, is no longer fixating the word of interest. As a consequence, components evoked by the subsequent word coincide with the ongoing processing of the previous word, resulting in overlapping potentials. The issue of temporal overlap can, at least to some extent, be circumvented by keeping overlapping components constant, for example by using identical sentence frames counterbalanced across participants. Such an approach, however, is not suitable for all research questions. Recently, Ehinger and Dimigen (2018) offered a solution to this issue by means of deconvolution, which overcomes limitations of previous approaches (such as the ADJAR method; Woldorff, 1993). In event-related fMRI analysis, temporal overlap is an inherent challenge due to the slow dynamics of the BOLD signal, but it can be handled successfully because the BOLD signal is (fairly) linearly additive (Dale & Buckner, 1997).
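The general idea behind regression-based deconvolution, the framework Ehinger and Dimigen (2018) build on, can be illustrated with a toy example: every event type receives one regressor per post-event lag (a time-expanded design matrix), and ordinary least squares then estimates the response to each event type as if it had occurred in isolation. This is a pure-Python sketch on a single channel, not the API of their toolbox; noise, covariates, artefact handling and multiple channels are omitted, and all names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient (overlap not identifiable)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def deconvolve(eeg, onsets_by_type, n_lags):
    """Estimate one n_lags-sample response per event type from overlapping data.

    eeg: single-channel samples; onsets_by_type: {event type: [onset samples]}.
    Builds a time-expanded design matrix (one column per event type and lag) and
    solves the ordinary least-squares normal equations.
    """
    types = sorted(onsets_by_type)
    n_samples, n_cols = len(eeg), len(types) * n_lags
    x = [[0.0] * n_cols for _ in range(n_samples)]
    for ti, typ in enumerate(types):
        for onset in onsets_by_type[typ]:
            for lag in range(n_lags):
                if onset + lag < n_samples:
                    x[onset + lag][ti * n_lags + lag] = 1.0
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(n_cols)] for i in range(n_cols)]
    xty = [sum(row[i] * y for row, y in zip(x, eeg)) for i in range(n_cols)]
    beta = solve_linear_system(xtx, xty)
    return {typ: beta[ti * n_lags:(ti + 1) * n_lags] for ti, typ in enumerate(types)}
```

A plain average of epochs time-locked to one fixation type would absorb part of the response to the following fixation. The regression separates the two only because the intervals between successive fixations vary; with perfectly constant intervals, the design matrix becomes rank deficient and the function raises an error.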
Despite these challenges, in our view the foremost advantages of co-registration for investigating visual word recognition are as follows: (1) Participants can process words at their own pace. Therefore, the inherent temporal dynamics of visual word recognition during reading are maintained (e.g. Dimigen et al., 2011). (2) Co-registration allows us to assess effects of EM control such as word skipping (Schuster et al., 2016) or regressive EMs (Metzner et al., 2017) and thus holds the potential to contribute to the further development of models of EM control during reading. (3) Co-registration allows the reader to parafoveally pre-process the upcoming word(s), providing insights into the nature of the information extracted from parafoveal vision (e.g. Dimigen et al., 2012). (4) EMs can serve as an externally observable indicator of the participant's engagement, rendering explicit tasks (e.g. lexical decision), which may alter brain responses, unnecessary (e.g. Reilly, 2014; Schuster et al., 2015).
In the following, we will sketch existing and potential research endeavours that could be further advanced by means of co-registration. We particularly focus on contextual predictions in sentence comprehension and on efforts to determine the time-course of top-down and bottom-up mechanisms: (i) disentangling context-based integration from pre-activation and (ii) investigating additive and interactive effects of supposedly bottom-up and top-down determinants of visual word recognition. We will compare findings from ERPs and FRPs, but will also incorporate evidence from EM studies, especially on the role of parafoveal pre-processing in visual word recognition. Finally, we will elaborate on new perspectives in reading research, with a special emphasis on fixation-related oscillatory EEG and neural network dynamics.
The "when and how" of contextual predictions in visual word recognition
Natural reading proceeds at rates of up to 350 words per minute, indicating a fast and dynamic orchestration of perceptual, attentional and cognitive processing. Although there is broad consensus about which processes are engaged in reading, little is known about how information is organised to enable the ongoing transition from perception to comprehension. An important debate in this respect is whether word recognition starts with bottom-up lexical processing of a word, followed by post-lexical extraction of its meaning, or rather with top-down modulated lexical processing based on pre-activated word meaning or contextual predictions. 1 Critically, the question thus is not if, but when and how context affects visual word recognition; more precisely, at which stage of processing (lexical vs. post-lexical) bottom-up and top-down mechanisms become effective and in what way, if at all, they interact with each other.
In EM research, the effect of sentential context on word recognition is typically assessed by measuring the impact of word predictability 2, a factor known to affect the speed of visual word recognition (for reviews see Rayner, 1998, 2009; Staub, 2015). Word predictability is thought to serve as a proxy of top-down expectations (e.g. Kliegl et al., 2014), as evidenced by shorter fixation durations and higher skipping probabilities for contextually predictable compared to unpredictable words (e.g. Balota, Pollatsek, & Rayner, 1985; Ehrlich & Rayner, 1981; Hawelka, Schuster, Gagl, & Hutzler, 2015; Kliegl, Grabner, Rolfs, & Engbert, 2004; Kliegl, Nuthmann, & Engbert, 2006; Rayner, Binder, Ashby, & Pollatsek, 2001; Rayner & Well, 1996). A question that has long been controversial, in both EM and neuroimaging research on reading, is whether facilitating effects of predictability are due to rapid context-based integration of encountered words ("integration view") or to context-based pre-activation of words and/or word features before they are encountered ("prediction view"). At the centre of the debate is one of the most well-documented event-related components in visual word recognition: the N400, a negative-going deflection in the time-locked EEG signal (depending on the electrode site relative to the recording reference) with a peak amplitude at around 400 ms post-stimulus (Kutas & Federmeier, 2011).
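In many of the studies cited here, word predictability is operationalised as cloze probability: the proportion of participants in a separate norming task who complete the sentence frame with the target word. The sketch below computes it, together with a logit transform that is often used in regression analyses. The clipping of 0 and 1 to 1/(2N) and 1 - 1/(2N) is an assumed convention to keep the logit finite, and the example responses are invented.

```python
import math
from collections import Counter


def cloze_probability(completions, target):
    """Proportion of cloze-task completions that match the target word (case-insensitive)."""
    counts = Counter(word.strip().lower() for word in completions)
    return counts[target.strip().lower()] / len(completions)


def logit_predictability(p, n_responses):
    """Logit-transformed cloze probability.

    Assumed convention: p = 0 and p = 1 are clipped to 1/(2N) and 1 - 1/(2N),
    where N is the number of cloze responses, so that the logit stays finite.
    """
    eps = 1 / (2 * n_responses)
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))
```

For instance, if 18 of 20 hypothetical respondents completed "The day was breezy so the boy went outside to fly a ..." with "kite", its cloze probability would be 0.9 and its logit ln(9).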
The semantic N400: context-based integration or pre-activation?
From the perspective of the "integration view", visual word recognition is primarily driven by bottom-up lexical processing, which initiates post-lexical retrieval of semantic representations and their integration into the prior context. The "prediction view", in contrast, assumes that visual word recognition operates via top-down pre-activation of word meaning, "gating" bottom-up lexical access 3 (DeLong, Troyer, & Kutas, 2014; Kutas, DeLong, & Smith, 2011; Kutas & Federmeier, 2011; Staub, 2015; Van Petten & Luka, 2012). There is extensive evidence that the N400 serves as an index of context-dependent processing demands and is modulated by various experimental manipulations including semantic congruency, plausibility, semantic relatedness and cloze probability. However, despite decades of research since the first report of this component by Kutas and Hillyard (1980), there is still no consensus on whether reduced N400 amplitudes in response to contextually predictable target words reflect the ease with which a word is integrated into the prior context or the (probabilistic) pre-activation of plausible continuations (for an overview see Table 1; for reviews see Kuperberg, 2016; Kuperberg, Kreher, & Ditman, 2010; Kutas & Federmeier, 2000; Kutas & Federmeier, 2011; Lau, Phillips, & Poeppel, 2008; Swaab, Ledoux, Camblin, & Boudewyn, 2012). The debate has been broadened further by work questioning a clear distinction between the two theoretical stances (Nieuwland et al., 2018a) and arguing in favour of a "multiple-process" account (e.g. Baggio, 2012; Baggio & Hagoort, 2011).
Investigating pre-target intervals has the potential to contribute to this discussion. By presenting sentences like "The day was breezy so the boy went outside to fly a/an kite/airplane", DeLong et al. (2005) not only replicated previously reported N400 effects in response to unpredicted nouns ("airplane" in the above example), but also demonstrated that this effect was already present on the preceding article ("an" in the above example). According to the authors, this can only reasonably be interpreted in terms of probabilistic pre-activation of the phonological word form of the upcoming noun, since the article itself does not impose differences in integration difficulty. Similar findings have been reported by Martin et al. (2013) in native compared to non-native readers. Having said that, attempts to replicate these findings have so far not been successful (Ito, Martin, & Nieuwland, 2017a; Nieuwland et al., 2018b), which, according to Nieuwland and colleagues, might not only question a "strong prediction view", but also fuel a more general discussion on the significance of predictions for language comprehension (Huettig, 2015; Huettig & Mani, 2016). While this debate is yet to be concluded (DeLong, Urbach, & Kutas, 2017; Ito, Martin, & Nieuwland, 2017b; Yan, Kuperberg, & Jaeger, 2017), investigating alterations of brain signals as a function of the "expectedness" of potential upcoming words within the pre-target time interval remains a promising approach to address the issue of word pre-activation and has therefore also been the subject of FRP studies.
Table 1. Overview of studies investigating contextual effects on the N400 (columns: Authors, Manipulation, Integration, Pre-activation, Multi-process; first entry: Kutas and Hillyard (1984), cloze).

Evidence from FRPs
An FRP study explicitly addressing context-based integration versus pre-activation was conducted by Kretzschmar et al. (2009). The authors used antonym constructions (e.g. "The opposite of black is … ") ending either with the predicted antonym ("white"), an unpredicted but semantically related word ("yellow") or an unpredicted and semantically unrelated word ("nice"). In line with previous findings, Kretzschmar and colleagues found larger N400 amplitudes and prolonged first fixation durations for unpredicted target words ("yellow", "nice" > "white"). Interestingly, analysis based on the last fixation prior to the target word also revealed an N400 effect, yet only in the semantically unrelated condition ("nice" > "white", "yellow"). While, according to the authors, the target-elicited N400 effect can be explained by both the integration and the prediction view, the "pre-target" effect indicates (broad) lexical pre-activation of the expected antonym and of semantically related words. Such lexical pre-activation, however, differs from that reported by DeLong et al. (2005) and Martin et al. (2013) in that it was not induced by a mismatch between the pre-target word and pre-activated features of the expected continuation, but by a mismatch between pre-activated features and the parafoveally pre-processed continuation (for similar findings see Metzner et al. (2015), who directly compared ERPs and FRPs using a world knowledge paradigm). Critically, however, the onset of the purportedly parafoveally induced N400 effect (∼250 ms) reported by Kretzschmar et al. (2009) exceeded the average duration of the last fixation in the pre-target region (∼186 ms) and thus must have coincided with fixations on the subsequent target words.
Hence, it is possible that the effect was not induced by parafoveal pre-processing but was actually elicited upon fixating the target word, overlapping with the preceding "pre-target" FRP (see section "Challenges - Deconvolution of overlapping signals"). Indeed, visual inspection reveals a close similarity between the gradation of the "parafoveal" N400 effect and an early component elicited by the target word. However, this effect could still be driven, or at least influenced, by word pre-activation and/or parafoveal pre-processing. To illustrate how both processes may contribute to this effect, consider that top-down pre-activation of the upcoming word may not interact with bottom-up visual information from the parafoveal word as long as the reader fixates the preceding word. Put differently, there may be no parafoveal-on-foveal influence, since the two streams of information converge only upon fixating a word, but then this may happen almost instantaneously (possibly as early as 80 ms post-fixation; Dimigen et al., 2012).
Furthermore, there is some evidence for a dissociation between (first) fixation durations and fixation-related N400 amplitudes, as shown by a corpus analysis conducted by Dimigen et al. (2011). Based on 144 sentences comprising words with varying cloze probability (Potsdam Sentence Corpus; Kliegl et al., 2004), the authors observed shorter first fixation and gaze durations for high- than for low-predictable words, as well as robust N400 predictability effects peaking at around 384 ms after fixation onset. However, by the time the N400 reached its peak, the initial fixation had already ended in 96% of the cases. The authors further noted that N400 amplitudes were more closely related to gaze duration than to first fixation duration. Yet, given a mean gaze duration of 278 ms, they questioned whether this behavioural effect was indeed driven by the same neural generators as the N400 effect. Still, this finding seems to accord with the notion that the N400 exceeds the time-course available for lexical processing, which has an upper bound of 200-250 ms during natural reading, and therefore rather reflects post-lexical processing (Rayner, 1998; Sereno & Rayner, 2003; Sereno, Rayner, & Posner, 1998). In general, one might argue that this finding is in line with the "integration view", that is, sequential processing of bottom-up lexical and top-down post-lexical information, with the latter being reflected in the N400.
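The argument above rests on comparing EM measures with component latencies. The sketch below shows, under simplifying assumptions, how first fixation duration and gaze duration can be derived from a chronological fixation sequence and how one can check what share of first fixations had already ended at a given post-fixation latency, such as the 384 ms N400 peak reported above. The data structures and function names are hypothetical, and the durations in any example are invented.

```python
def first_pass_durations(fixations, word_index):
    """Return (first fixation duration, gaze duration) for one word, or None.

    `fixations` is a chronological list of (word_index, duration_ms) tuples.
    Gaze duration sums all consecutive fixations on the word from its first
    first-pass fixation until the eyes move to another word. None is returned
    if the word was never fixated or was first reached by a regression.
    """
    for i, (w, dur) in enumerate(fixations):
        if w != word_index:
            continue
        if any(prev_w > word_index for prev_w, _ in fixations[:i]):
            return None  # first reached from the right: not a first-pass fixation
        gaze = dur
        for next_w, next_dur in fixations[i + 1:]:
            if next_w != word_index:
                break
            gaze += next_dur
        return dur, gaze
    return None


def share_ended_before(first_fixation_durations, latency_ms):
    """Share of first fixations that had already ended at latency_ms after fixation onset."""
    ended = sum(1 for d in first_fixation_durations if d < latency_ms)
    return ended / len(first_fixation_durations)
```

Applied to a corpus, a share close to 1 at the N400 peak latency means that the component is largely measured while the eyes are already on the next word, which is the situation described for the Potsdam Sentence Corpus.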
Having said this, predictability effects have also been observed earlier than the presumed onset of post-lexical processing. For instance, an ERP study by Dambacher, Rolfs, Göllner, Kliegl, and Jacobs (2009) revealed a difference between predictable and unpredictable target words as early as around 90 ms post-stimulus. In line with Dambacher et al. (2009), Lewis, Schoffelen, Hoffmann, Bastiaansen, and Schriefers (2017) reported effects of semantically coherent compared to incoherent discourses already between 80 and 200 ms relative to word onset. Importantly, these findings seem to correspond to EM research, which estimates the earliest effect of word predictability on fixation duration at approximately 140 ms. If this holds true, however, predictability effects must temporally coincide with effects of word frequency 4, arguably a bottom-up determinant of lexical access (Inhoff, 1984; Kliegl et al., 2004; but see Chen, Davis, Pulvermüller, & Hauk, 2015; Strijkers, Bertrand, & Grainger, 2015). As with word predictability, EM studies have consistently demonstrated facilitating effects of word frequency on visual word recognition, as indicated by shorter fixation durations and higher skipping probabilities for frequent than for infrequent words (Henderson & Ferreira, 1993; Hyönä & Olson, 1995; Inhoff & Rayner, 1986; Kliegl et al., 2004, 2006; Rayner & Duffy, 1986; Rayner, Sereno, & Raney, 1996; Schilling, Rayner, & Chumbley, 1998; Slattery, Pollatsek, & Rayner, 2007). Critically, Reingold, Reichle, Glaholt, and Sheridan (2012) estimated that word frequency exerts its influence on fixation duration as early as 145 ms post-fixation, that is, within the same time window as word predictability. Likewise, several ERP studies reported frequency effects emerging within the first 200 ms post-stimulus (e.g. Hauk, Davis, Ford, Pulvermüller, & Marslen-Wilson, 2006; Hauk & Pulvermüller, 2004; Reichle, Tokowicz, Liu, & Perfetti, 2011; Sereno et al., 1998; but see Laszlo & Federmeier, 2014 for a critical discussion). It must be noted that there are also reports of frequency effects within the N400 time window, which, however, mainly seem to arise from interactions with other factors including word repetition and length (e.g. King & Kutas, 1998; Rugg, 1990). Of particular interest in this respect are word position effects, showing a gradual decrease in N400 amplitudes with increasing word position (Van Petten, 1993; Van Petten & Kutas, 1990, 1991). Critically, when accounting for word predictability and the interaction between word predictability and frequency, Dambacher et al. (2006) demonstrated that the effect of word position on the N400 is absorbed, which, according to the authors, substantiates the notion that word position can be considered a proxy of contextual constraint (but see Schuster, Hawelka, Himmelstoss, Richlan, & Hutzler [in press] for dissociable effects of word position and predictability). More importantly, this study revealed an interaction between word predictability and frequency, indicating that when context is given, the impact of lexical frequency on the N400 seems to be attenuated.
Taken together, findings on the processing stage at which contextual predictions become effective are still inconclusive. Admittedly, most neuroimaging studies have so far reported late effects of word predictability (i.e. within the N400 time window), whereas word frequency seems to exert its effect comparatively early (i.e. within the P200 time window), suggesting a primacy of lexical processing. Critically, however, there is also some evidence for early context-dependent ERP effects. Moreover, EM studies have demonstrated that both word predictability and frequency become effective within the time frame of lexical access (e.g. Sereno & Rayner, 2003), while FRP studies indicate that effects of word predictability on fixation duration do not align with those on N400 amplitudes. An alternative approach that may inform the debate on contextual influences on word recognition originates from EM research: investigating whether early effects of word frequency and predictability interact or are merely additive (e.g. Hand, Miellet, O'Donnell, & Sereno, 2010). Applying this reasoning to F/ERPs consequently requires a focus on early rather than late components.
Before the N400: additive or interactive effects?
It has been proposed in the EM literature that an additive effect of word frequency and predictability would point to a primacy of bottom-up processing, with contextual effects emerging post-lexically, whereas an interaction of these variables would indicate an early effect of context on lexical processing (Hand et al., 2010; Sereno, Hand, Shahid, Yao, & O'Donnell, 2018). Interestingly, EM studies investigating word frequency and predictability conjointly have mainly reported additive effects; that is, both variables contribute to fixation duration and skipping probability, yet independently of each other (e.g. Altarriba, Kroll, Sholl, & Rayner, 1996; Ashby, Rayner, & Clifton, 2005; Fitzsimmons & Drieghe, 2013; Kliegl et al., 2004, 2006; Lavigne, Vitu, & d'Ydewalle, 2000; Miellet, Sparrow, & Sereno, 2007; Rayner, Ashby, Pollatsek, & Reichle, 2004; Rayner et al., 2001; see Staub, 2015 for a review). A recent study by Sereno et al. (2018), however, puts these findings into perspective. In contrast to previous studies, Sereno and colleagues manipulated not only the predictability and frequency of target words, but also the availability of their parafoveal preview (valid vs. invalid) by means of the boundary paradigm (see Figure 2). Importantly, while both preview conditions yielded main effects of frequency and predictability on fixation times, an interaction was observed only in the valid preview condition, arising from a diminished frequency effect for high-predictable compared to medium-predictable and unpredictable words. This finding provides evidence for an impact of parafoveally pre-processed contextual information on early lexical processing during natural reading.
Critically, one might argue that a statistical interaction of word frequency and predictability in behavioural data, such as fixation durations, does not necessarily imply simultaneous processing, since fixation duration may only reflect the "endpoint" of processing stages that could themselves still be sequential. Conclusions about the timing of these stages, however, could be drawn from neural measures such as ERPs.
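The additive-versus-interactive logic can be made concrete for a 2 × 2 design: compute the frequency effect (low- minus high-frequency mean) separately at each predictability level and take their difference. A value of zero corresponds to purely additive effects; a non-zero value means that the frequency effect depends on predictability, as in the diminished frequency effect for predictable words described above. This is a descriptive sketch on hypothetical cell means; in practice such an interaction is tested inferentially, for example with a mixed-effects model including a frequency × predictability term.

```python
def frequency_effect(means, predictability):
    """Frequency effect (low- minus high-frequency mean) at one predictability level.

    `means` maps (predictability level, frequency level) to a mean fixation
    duration in ms, with frequency levels "high" and "low".
    """
    return means[(predictability, "low")] - means[(predictability, "high")]


def interaction_contrast(means, level_a, level_b):
    """Difference between the frequency effects at two predictability levels.

    0 means the two factors combine additively in this 2 x 2 subset; a
    non-zero value means the frequency effect depends on predictability.
    """
    return frequency_effect(means, level_a) - frequency_effect(means, level_b)
```

With invented means of 240/270 ms (unpredictable, high/low frequency) and 220/250 ms (predictable), the contrast is 0; lowering the predictable low-frequency cell to 230 ms yields a contrast of 20 ms, that is, a frequency effect attenuated by context.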
In contrast to the majority of EM studies, there is some evidence for early interactive effects in ERPs (Lee, Liu, & Tsai, 2012; Sereno, Brewer, & O'Donnell, 2003; Penolazzi, Hauk, & Pulvermüller, 2007). Of particular interest in this respect is a finding indicating that the emergence of interactive effects in RSVP studies is modulated by the stimulus-onset asynchrony (SOA), that is, the time between the onsets of successive stimuli (Harley, 2014). Here, Dambacher et al. (2012) demonstrated that early interactive effects emerge only with short SOAs (280 ms), suggesting context-based enhancement of early lexical processing when RSVP approximates the rate at which natural reading proceeds. In general, this finding corroborates the notion that imposing a predefined time window for information processing by means of RSVP might bias the engagement of cognitive processes (see also Brothers, Swaab, & Traxler, 2015; Wlotko & Federmeier, 2015) and therefore alter the time-course of visual word recognition. Having said that, even with short SOAs, as Dambacher et al. (2012) have also pointed out, the question remains open whether inferences based on RSVP findings can be transferred to natural reading.
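To relate SOAs to reading rates, the arithmetic is simply 60,000 ms divided by the SOA. The two helper functions below are a trivial illustration with hypothetical names; they ignore skipping and refixations, so they give only a rough correspondence to natural reading.

```python
def words_per_minute(soa_ms):
    """Presentation rate implied by a fixed stimulus-onset asynchrony (SOA) in ms."""
    return 60_000 / soa_ms


def soa_for_rate(wpm):
    """SOA in ms needed to present words at a given rate in words per minute."""
    return 60_000 / wpm
```

Under this conversion, the short SOA of 280 ms corresponds to about 214 words per minute, whereas the presentation durations of 500-1000 ms typical of RSVP imply at most 120 words per minute. A rate of 350 words per minute would require an SOA of roughly 171 ms.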

Evidence from FRPs
To our knowledge, only one FRP study has explicitly investigated potential interactions between word frequency and predictability (Kretzschmar, Schlesewsky, & Staub, 2015). It presented sentences that were either highly or weakly constraining towards target words of either high or low frequency. Analysis of the target words revealed not only reduced N400 but also enhanced P200 amplitudes for predictable compared to unpredictable words. However, neither early nor late components showed an effect of word frequency or an interaction of frequency and predictability. At the pre-target word, FRPs showed no impact of the predictability or frequency of the upcoming target word (see Degno et al., 2018 for a similar finding). Behaviourally, fixation durations on the target words were influenced by frequency and predictability in an additive fashion. Analysis of the pre-target words, however, revealed an interactive effect, with longer fixation durations in the unpredictable, high-frequency condition. Critically, as the authors also pointed out, the words preceding the target were not matched across conditions for word length and frequency. The observed interaction might therefore have resulted from differences in the extent of parafoveal pre-processing due to varying saccade launch sites (see Note 5). This factor has recently been shown to give rise to interactive effects in EMs (Hand et al., 2010; but see Slattery, Staub, & Rayner, 2012). Kretzschmar and colleagues nevertheless concluded that, when context is available, the N400, unlike EMs, seems to be insensitive to word frequency. Word frequency might thus not carry additional information important for verifying top-down predictions.

In ERP studies using the RSVP-with-flankers paradigm, participants are instructed to maintain central fixation to avoid EM artefacts. The fixated word is flanked by, for example, the (previous and) next word of the sentence. After a constant duration, the fixated word is replaced by the parafoveal word, which in turn is replaced by the next word of the sentence, and so on. In EM studies, parafoveal pre-processing is typically investigated with the boundary paradigm (Rayner, 1975). Here, the preview of a parafoveal target word is experimentally manipulated until the reader's gaze crosses an invisible boundary placed before the target. Often, the manipulation consists of masking the target word, e.g. with a string of "X"s or different letters of equal length (see also Hutzler et al., 2013; Kliegl, Hohenstein, Yan, & McDonald, 2013; Marx, Hawelka, Schuster, & Hutzler, 2015 for cautionary notes on using parafoveal masks in the boundary paradigm). A recent study comparing the flanker and boundary paradigms revealed that the preview effect is substantially larger in the boundary paradigm (Kornrumpf et al., 2016). This indicates that passive reading (i.e. without saccades) in the flanker paradigm does not approximate natural (i.e. active) reading with saccades. Recent FRP studies on parafoveal pre-processing during natural reading that used the boundary paradigm reported early effects of valid previews over occipitotemporal electrodes (e.g. Dimigen et al., 2012).
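The gaze-contingent logic of the boundary paradigm described above can be illustrated with a small offline simulation. In a real experiment the eye tracker triggers the display change online, during the saccade that crosses the boundary. In this sketch, gaze positions are hypothetical character offsets, the mask is a length-matched string of x's, and the function names (`mask`, `boundary_x`, `run_trial`) are illustrative assumptions rather than any eye-tracking toolbox API.

```python
def mask(word):
    """Length-matched mask (a string of x's), one preview manipulation named above."""
    return "x" * len(word)


def boundary_x(words, target_index):
    """Character offset of the invisible boundary: right after the pre-target word."""
    return len(" ".join(words[:target_index]))


def run_trial(words, target_index, preview, gaze_samples):
    """Yield the sentence as displayed at each gaze sample (x in character units).

    The preview fills the target slot until gaze first crosses the boundary.
    From then on the target is shown, and the change is never reversed,
    even if the reader later regresses to the left of the boundary.
    """
    boundary = boundary_x(words, target_index)
    crossed = False
    for gaze in gaze_samples:
        crossed = crossed or gaze > boundary
        shown = list(words)
        if not crossed:
            shown[target_index] = preview
        yield " ".join(shown)


words = "the old man read the paper".split()
for frame in run_trial(words, 5, mask(words[5]), [2, 10, 18, 23, 19]):
    print(frame)  # three masked frames, then the target from the crossing onwards
```

A valid-preview condition would simply pass the target word itself as `preview`. The preview effect is then the difference in target fixation durations between valid and masked conditions.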
To conclude, evidence in favour of early interactive effects in visual word recognition is scarce, particularly in EM studies. Still, there are initial findings of an early interplay between top-down and bottom-up determinants of visual word recognition: in ERPs when natural reading rates are approximated, and in EM behaviour when parafoveal preview is taken into account (Sereno et al., 2018). These findings clearly need further investigation. Furthermore, reliably assessing whether potential early interactive effects in neural responses correspond to those in EM behaviour requires aligning both measures. Co-registration of EMs and EEG may achieve a convergence of these findings. However, to date only one FRP study has explicitly addressed interactive effects between word predictability and frequency, and it did not control for parafoveal pre-processing, which limits its interpretation. FRP studies that focus on early interactive effects while considering parafoveal preview as a potentially modulating factor might therefore be a promising future endeavour. Such studies could further specify the time-course of visual word recognition during natural reading and, in turn, contribute to the discussion on the significance of contextual predictions for online language processing.
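The distinction between additive and interactive effects can be made concrete with the arithmetic of a 2 × 2 design (frequency × predictability). The sketch below assumes hypothetical cell means of fixation durations in milliseconds. It computes the two main effects and the interaction contrast, which is zero when the predictability effect is the same for high- and low-frequency words. It only shows the contrast arithmetic. Inferential tests in the literature typically rely on (generalised) linear mixed models.

```python
def effects_2x2(means):
    """Main effects and interaction contrast from hypothetical 2 x 2 cell means (ms).

    `means` maps (frequency, predictability) -> mean fixation duration.
    """
    hf_p = means[("high", "predictable")]
    hf_u = means[("high", "unpredictable")]
    lf_p = means[("low", "predictable")]
    lf_u = means[("low", "unpredictable")]
    frequency_effect = ((lf_p + lf_u) - (hf_p + hf_u)) / 2      # low minus high frequency
    predictability_effect = ((hf_u + lf_u) - (hf_p + lf_p)) / 2  # unpredictable minus predictable
    # Difference of the predictability effects at the two frequency levels:
    interaction = (lf_u - lf_p) - (hf_u - hf_p)
    return frequency_effect, predictability_effect, interaction


additive = {("high", "predictable"): 200, ("high", "unpredictable"): 220,
            ("low", "predictable"): 230, ("low", "unpredictable"): 250}
interactive = dict(additive)
interactive[("low", "unpredictable")] = 280  # larger predictability effect for low-frequency words

print(effects_2x2(additive))     # (30.0, 20.0, 0)
print(effects_2x2(interactive))  # (45.0, 35.0, 30)
```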
New perspectives: the role of neuronal oscillations during natural reading
In the previous sections we have sketched how FRPs may contribute to the issue of contextual predictions during natural reading by investigating the time-course of top-down and bottom-up processing inferred from evoked components. However, referring to these mechanisms also implies that reading engages an orchestrated interaction between brain regions acting at different levels of input processing (e.g. Carreiras et al., 2014) and, consequently, at different hierarchically organised cortical levels. In this respect, top-down and bottom-up processing can also be described as feedback and feedforward information transmission (but see Rauss & Pourtois, 2013 for an overview of alternative definitions). This transmission takes place between higher cortical levels, involved in syntactic and semantic information processing, and lower cortical levels, associated with visual-orthographic and lexical information processing. The question thus arises as to when and how information is integrated and transmitted within and between these levels. Investigating neuronal oscillations holds great promise for addressing this question. These oscillations are commonly inferred from standard time-frequency analysis (Cohen, 2017; but see Haller et al., 2018) and are assumed to be indicative of intra- and interareal communication within cortical networks (e.g. Bressler & Richter, 2015; Engel, Fries, & Singer, 2001; Hipp et al., 2011; Varela, Lachaux, Rodriguez, & Martinerie, 2001; von Stein, Chiang, & König, 2000). Findings on language processing vary widely depending on the experimental manipulation and methodological approach (for reviews see Bastiaansen & Hagoort, 2006; Lewis, Wang, & Bastiaansen, 2015; Meyer, 2018). Even so, investigating predictive processing during reading by means of oscillatory brain activity is a growing research area.
Of particular interest in this respect is a recent theoretical framework for sentence-level language comprehension. It attempts to link predictive coding theories with oscillatory network dynamics that gate hierarchical information transmission in the language network.

Predictive coding in neuronal oscillations during reading
In brief, predictive coding posits that the brain continuously performs context-sensitive perceptual inference to optimise the precision of sensory predictions and, as a result, reduce uncertainty about upcoming events (Clark, 2013; Friston, 2005, 2009, 2010; Rao & Ballard, 1999). The theory prescribes a hierarchically organised cortical architecture with reciprocal, yet functionally asymmetric, backward (i.e. top-down) and forward (i.e. bottom-up) connections between higher and lower cortical levels. Predictions are thought to be generated at higher levels and to descend to the next lower level. There they are compared with sensory input, yielding a so-called prediction error (PE), that is, the difference between expected and actual incoming information. The error signal then propagates up the hierarchy to the next higher level. There it adjusts the current prediction by updating the generative model and its inferred most likely causes. This revision of the predictive model is ongoing and aims at minimising PEs at all levels of the hierarchy. Importantly, in some models of predictive coding, the impact of PEs on model updating varies as a function of precision weighting, that is, of how strongly predictions and incoming evidence are weighted according to their estimated reliability (Adams, Stephan, Brown, Frith, & Friston, 2013; Friston, 2005; Mathys et al., 2014; Rao & Ballard, 1999). Within predictive coding, beta oscillations are assumed to convey top-down predictions, whereas gamma oscillations are thought to be involved in the bottom-up propagation of PEs (e.g. Arnal & Giraud, 2012; Bastos et al., 2012; Friston, Bastos, Pinotsis, & Litvak, 2015).
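A minimal numerical sketch may help make the idea of precision-weighted PEs concrete. It rests on several simplifying assumptions: a single higher-level belief `mu` whose top-down prediction of the input is simply `mu` (an identity generative mapping), Gaussian errors, and gradient descent on the sum of precision-weighted squared PEs. The function name and all numbers are hypothetical. Under these assumptions the belief settles at the precision-weighted average of prior prediction and input, so more precise evidence pulls it further from the prediction.

```python
def infer_cause(x, prior_mean, pi_sensory, pi_prior, lr=0.05, steps=500):
    """Settle a belief `mu` by descending precision-weighted squared PEs.

    Minimises 0.5*pi_sensory*(x - mu)**2 + 0.5*pi_prior*(mu - prior_mean)**2.
    """
    mu = prior_mean  # start from the top-down prediction
    for _ in range(steps):
        pe_sensory = x - mu          # bottom-up PE: input minus predicted input
        pe_prior = mu - prior_mean   # deviation of the belief from the higher-level prior
        # Each PE is scaled by its precision before it updates the belief.
        mu += lr * (pi_sensory * pe_sensory - pi_prior * pe_prior)
    return mu


# The same surprising input (x = 1) against a prediction of 0:
print(round(infer_cause(1.0, 0.0, pi_sensory=4.0, pi_prior=1.0), 3))  # 0.8
print(round(infer_cause(1.0, 0.0, pi_sensory=1.0, pi_prior=4.0), 3))  # 0.2
```

The fixed point is (pi_sensory * x + pi_prior * prior_mean) / (pi_sensory + pi_prior). With reliable input the belief moves most of the way towards the input. With a confident prior the same PE changes the belief only slightly.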
This scheme can be combined with two current proposals. The first holds that beta synchronisation underlies the formation of large-scale distributed networks (NeuroCognitive Network, NCN; Bressler & Richter, 2015; Engel & Fries, 2010). The second holds that gamma synchronisation reflects the matching of predicted and actual linguistic input during language processing. On this basis, the framework hypothesises that, within the language network, an increase in lower beta power signals the maintenance of the current network configuration when sentence-level meaning is constructed successfully, together with the resulting top-down transmission of predictions. By contrast, failed construction of sentence-level meaning, which necessitates a change of the network configuration, would be reflected in a beta power decrease. An increase in middle gamma power, on the other hand, is assumed to indicate successful "matching" of predicted and actual linguistic input (no such change would be expected in case of a mismatch). Bottom-up PEs would be reflected in an increase in high gamma power. Empirical findings on these hypotheses, however, are still scarce and stem mainly from RSVP paradigms (see Meyer, 2018 for a recent review).
There is some evidence for the hypothesised function of beta synchronisation in the construction of sentence-level meaning (Lewis, Schoffelen, Schriefers, & Bastiaansen, 2016; Lewis et al., 2017). Corresponding findings for top-down predictions (rather than just "meaning") are, to our knowledge, still lacking. In fact, some recent studies have revealed somewhat contradictory findings. They suggest instead that predictive processing during reading is reflected in power suppression within the alpha and beta range and, critically, already before word onset, that is, within the pre-stimulus interval. To illustrate, Rommers, Dickson, Norton, Wlotko, and Federmeier (2017) demonstrated that, as sentence reading proceeded towards the critical word, pre-stimulus alpha power (8-12 Hz) over occipital regions decreased when that word was highly constrained by the sentence. This indicates enhanced preparedness to process the anticipated input. Similar findings were reported by Wang, Hagoort, and Jensen (2018a), who showed a decrease in pre-stimulus alpha (8-12 Hz) and beta power (16-20 Hz) within a widespread network. This network encompassed left inferior frontal and posterior temporal regions, including the Visual Word Form Area (VWFA; Cohen et al., 2002). Notably, the VWFA has been linked to visuo-orthographic processing and is assumed to encode whole-word recognition units (e.g. Kronbichler et al., 2004). Thus, in line with Rommers et al. (2017), the finding reported by Wang et al. (2018a) might indicate anticipatory engagement of language-related areas in general. Beyond that, it might also reflect pre-activation of abstract visuo-orthographic word templates in particular (see also Willems, Frank, Nijhof, Hagoort, & van den Bosch, 2016).
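Band-limited "power" such as alpha (8-12 Hz) or beta (16-20 Hz) is obtained from a spectral or time-frequency decomposition of the EEG/MEG signal. The sketch below is deliberately simplified. It assumes synthetic 1-s epochs sampled at 250 Hz, uses a plain discrete Fourier transform instead of the wavelet or multitaper analyses typical of the cited studies, and models the "high-constraint" condition as having an assumed weaker 10 Hz component. It shows how band power can be computed and how a condition difference is expressed in decibels. All names and amplitudes are hypothetical.

```python
import cmath
import math


def band_power(signal, fs, f_lo, f_hi):
    """One-sided power (mean-square units) in [f_lo, f_hi] Hz via a plain DFT."""
    n = len(signal)
    total = 0.0
    for k in range(1, n // 2):  # skip the DC and Nyquist bins
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(signal))
            total += 2 * abs(coeff) ** 2 / n ** 2  # factor 2 folds in negative frequencies
    return total


def synthetic_epoch(alpha_amp, fs=250, dur=1.0):
    """Hypothetical epoch: a 10 Hz 'alpha' component plus a fixed 18 Hz 'beta' component."""
    n = int(fs * dur)
    return [alpha_amp * math.sin(2 * math.pi * 10 * t / fs)
            + math.sin(2 * math.pi * 18 * t / fs) for t in range(n)]


def db_change(p_condition, p_reference):
    """Power change relative to a reference condition, in decibels."""
    return 10 * math.log10(p_condition / p_reference)


fs = 250
low_constraint = synthetic_epoch(alpha_amp=1.0, fs=fs)
high_constraint = synthetic_epoch(alpha_amp=0.5, fs=fs)  # assumed alpha suppression
alpha_low = band_power(low_constraint, fs, 8, 12)
alpha_high = band_power(high_constraint, fs, 8, 12)
print(round(alpha_low, 3), round(alpha_high, 3))   # 0.5 0.125
print(round(db_change(alpha_high, alpha_low), 2))  # -6.02
```

Halving the amplitude of the 10 Hz component quarters its power, which appears as roughly a 6 dB decrease. The 18 Hz component does not leak into the 8-12 Hz band here only because each epoch contains an integer number of cycles. Real recordings require tapering, many trials, and a principled choice of baseline.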
With regard to the hypothesised functions of gamma oscillations, evidence is mixed. For example, Wang et al. (2018a) indeed observed an increase in gamma power (60-90 Hz) over left temporal and frontal regions in response to semantically incongruent compared to congruent sentence-final words. However, this effect was accompanied by a decrease in alpha power (8-12 Hz) over left temporal and visual regions. Moreover, only the alpha power decreases differed depending on sentential constraint. This suggests that the gamma increase might reflect semantic unification and retrieval effort rather than PEs. Furthermore, within highly constraining sentences, pre-stimulus alpha desynchronisation in temporal regions was negatively correlated with gamma power at stimulus onset in prefrontal regions. This points to a predictive network that utilises gamma for processing "correct predictions" rather than purely for error processing (see also Vidal et al., 2012; but see Penolazzi, Angrilli, & Job, 2009). A follow-up study further substantiated this notion, indicating that gamma reflects the successful matching of predicted and actual input (Wang, Hagoort, & Jensen, 2018b). By contrast, Rommers et al. (2017) observed an increase in frontal theta power (4-7 Hz) for unexpected words in highly constraining sentences. Unlike the interpretation offered by Wang et al. (2018a), this effect was taken to reflect an increased requirement for cognitive control when predictions fail (see also Molinaro, Barraza, & Carreiras, 2013).

Evidence from fixation-related oscillatory EEG
The first study to investigate effects of predictability on fixation-related oscillatory dynamics, although not in the context of predictive coding, was conducted by Metzner et al. (2015). The authors re-examined previous RSVP findings that had demonstrated not only robust N400 effects but also theta and gamma synchronisation in response to world knowledge violations (Hagoort et al., 2004). Interestingly, Metzner and colleagues successfully replicated the N400 predictability effect in both ERPs and FRPs. An increase in theta power, however, was found only in the RSVP setting. During natural reading, by contrast, the authors observed delta synchronisation and upper-alpha desynchronisation.
Vignali et al. (2016) explicitly evaluated Lewis and Bastiaansen's framework. Participants read syntactically well-formed and syntactically ill-formed sentences (sentences whose words were randomly shuffled), each including either a semantically congruent or an incongruent target word. In line with the assumptions of this framework, Vignali and colleagues found a desynchronisation in lower beta (13-18 Hz) at the target word only for semantically incongruent words. This may indicate that predicted words need re-updating, i.e. that a beta oscillation carrying an incorrect prediction requires disintegration. Adding to the gamma debate, the authors also reported an increase in gamma power (31-55 Hz) over the course of the sentence only for syntactically well-formed sentences. They further found higher theta power (4-7 Hz) in well-formed compared to ill-formed sentences. In contrast to Metzner et al. (2015), this study therefore not only yielded results qualitatively comparable to the RSVP literature, but also revealed sentence-level modulations in gamma power. This pattern has also been observed with intracranial recordings and interpreted as reflecting the construction of linguistic meaning (Fedorenko et al., 2016).

Directed brain-connectivity to test for contextual predictions during natural reading
Only a few studies have addressed contextual predictions and oscillatory network dynamics during (natural) reading, so it would be premature to draw firm conclusions about their significance for linguistic processing. Still, the studies reviewed above demonstrate the potential of investigating oscillatory activity to identify how information is organised within the reading network. They thereby broaden our understanding of the "when and how" of contextual predictions in visual word recognition. It must be emphasised, however, that the methods applied so far might only be suitable for testing undirected network processing. This is because they are indicative of functional (i.e. correlative) but not effective (i.e. causal) connectivity (see Friston, 2011 for a review). Furthermore, as described above, contextual effects during natural reading seem to be modulated by the availability of parafoveal information. It is therefore plausible that oscillatory network dynamics are likewise sensitive to the predictability of upcoming information and its fit with parafoveal information.
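A small simulation can show why a correlative measure is undirected. It assumes two hypothetical signals in which `x` evolves on its own and drives `y` with a one-sample lag. Zero-lag correlation returns the same value whichever way round it is computed. A lag-1 Granger-style comparison of residual variances, a much simpler technique than DCM swapped in here purely for illustration, is clearly asymmetric. Granger-type measures are still usually classed as directed *functional* connectivity. DCM instead infers effective connectivity by fitting a biophysically motivated generative model, which this sketch does not attempt. All names and simulation parameters are assumptions.

```python
import math
import random


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pearson(a, b):
    """Zero-lag correlation: symmetric in its arguments, so it carries no direction."""
    a, b = _center(a), _center(b)
    return _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))


def _rss(target, predictors):
    """Residual sum of squares of an OLS fit with one or two centred predictors."""
    t = _center(target)
    ps = [_center(p) for p in predictors]
    if len(ps) == 1:
        (p,) = ps
        return _dot(t, t) - _dot(p, t) ** 2 / _dot(p, p)
    p, q = ps
    spp, sqq, spq = _dot(p, p), _dot(q, q), _dot(p, q)
    spt, sqt = _dot(p, t), _dot(q, t)
    det = spp * sqq - spq ** 2
    b_p = (sqq * spt - spq * sqt) / det  # Cramer's rule on the 2x2 normal equations
    b_q = (spp * sqt - spq * spt) / det
    return _dot(t, t) - (b_p * spt + b_q * sqt)


def granger_index(source, target):
    """Lag-1 index: log ratio of the residual variance of target[t] predicted from
    its own past alone vs. from its own past plus the source's past."""
    t, own_past, src_past = target[1:], target[:-1], source[:-1]
    return math.log(_rss(t, [own_past]) / _rss(t, [own_past, src_past]))


# Hypothetical simulation: x evolves on its own, y is driven by x's previous sample.
rng = random.Random(0)
x, y = [0.0], [0.0]
for _ in range(2999):
    x_prev, y_prev = x[-1], y[-1]
    x.append(0.9 * x_prev + rng.gauss(0, 1))
    y.append(0.5 * y_prev + 0.8 * x_prev + rng.gauss(0, 0.5))

print(math.isclose(pearson(x, y), pearson(y, x)))  # True: no direction
print(granger_index(x, y) > granger_index(y, x))   # True: x -> y dominates
```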
Employing, for example, dynamic causal modelling (DCM; Chen, Kiebel, & Friston, 2008; Friston, Harrison, & Penny, 2003; Kiebel, Garrido, & Friston, 2007; Moran, Pinotsis, & Friston, 2013; Stephan et al., 2010) on co-registered EM and EEG data would allow us to test for effects of contextual predictions on directed hierarchical information transmission within the reading network in an ecologically valid setting (as suggested, for example, by the framework discussed above). We note that DCM has already been applied successfully to ERPs in the field of visual word recognition (e.g. Woodhead et al., 2014; Yvert, Perrone-Bertolotti, Baciu, & David, 2012). It has also been applied to EM behaviour in the context of Bayesian inference (e.g. Adams, Aponte, Marshall, & Friston, 2015; Adams, Bauer, Pinotsis, & Friston, 2016). Furthermore, DCM has provided fundamental insights into the oscillatory dynamics underlying hierarchical processing as proposed by predictive coding theories of perception (e.g. Bastos et al., 2015). However, some of these earlier predictions, for example regarding the functional role of gamma oscillations, likely need revising in light of the oscillatory findings described here. Overall, the DCM approach offers the potential to investigate how top-down and bottom-up processing converge when a specific word is encountered (e.g. pre-target versus target). It can also capture how the network dynamics underlying the transition from perception to comprehension evolve over time (e.g. from the beginning to the end of a sentence), while the temporal and spatial aspects of EMs during natural reading are preserved. This is crucial for the question of whether and how we build up forward inferences during reading. It consequently bears on whether contextual predictions are mainly effective in prediction-encouraging tasks (Huettig & Mani, 2016) or are rather an inherent brain mechanism facilitating information processing during natural reading.

Notes
1. We would like to point out that the discussion on predictive processing is nuanced and a comprehensive description would be beyond the scope of the article. For an in-depth discussion, we refer the reader to Kuperberg and Jaeger (2016).
2. Word predictability is traditionally defined as the proportion (p) of readers in an independent norming sample who predict a particular word from the preceding sentence context in a so-called incremental cloze task (Taylor, 1953). The use of predictability norms obtained this way is not without critique. In particular, the circular mapping of behaviour to behaviour has been criticised as constituting a subjective rather than objective measure (Hofmann & Jacobs, 2014, p. 100). Moreover, the effortful collection of such norms has motivated attempts to approximate word predictability by means of corpus-based transitional probabilities (e.g. McDonald & Shillcock, 2003). However, thorough investigations of the relation between effects of transitional probability and word predictability revealed that variance in eye movement parameters is better explained by predictability norms (Frisson, Rayner, & Pickering, 2005).
3. Lexical access is defined as the activation of a particular entry in the putative mental lexicon (Inhoff, 1984).
4. Word frequency is defined as the mean prevalence of a word in printed texts (Rayner, 1998, 2009). Please note that in this article we are primarily interested in the emergence of predictability effects in relation to the effect of word frequency. For a detailed analysis of the time-course of other linguistic variables (e.g. word length, orthographic similarity, semantic coherence), we refer the reader to Hauk et al. (2006).
5. Saccade launch site is the takeoff point of a saccade. Saccade launch site distance is the distance between the takeoff point and the beginning of the aimed-at word (Heller & Müller, 1983).
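The operational definitions in Notes 2 and 5 can be illustrated with a short sketch, under stated assumptions. Cloze responses are normalised only for case and surrounding whitespace. Transitional probability is taken as the simple maximum-likelihood bigram estimate P(word | previous word). Positions are in character units. The responses, corpus, and function names are hypothetical.

```python
from collections import Counter


def cloze_probability(responses, target):
    """Proportion of norming participants whose continuation matches `target`."""
    normalised = [r.strip().lower() for r in responses]
    return normalised.count(target.lower()) / len(normalised)


def transitional_probability(tokens, prev, word):
    """Corpus estimate of P(word | prev) from bigram counts (maximum likelihood)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    prev_count = Counter(tokens[:-1])[prev]  # occurrences of `prev` that have a successor
    return bigrams[(prev, word)] / prev_count if prev_count else 0.0


def launch_site_distance(takeoff_x, word_start_x):
    """Distance (here in characters) from the saccade takeoff point to the start of the aimed-at word."""
    return word_start_x - takeoff_x


responses = ["paper", "book", "paper", "Paper ", "newspaper"]
corpus = "the man read the paper and the man slept".split()
print(cloze_probability(responses, "paper"))              # 0.6
print(transitional_probability(corpus, "the", "man"))     # 0.6666666666666666
print(launch_site_distance(takeoff_x=12, word_start_x=15))  # 3
```

Note that "newspaper" does not count as a match here. How partial or synonymous cloze responses are scored is itself a methodological choice.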