Early EEG correlates of word frequency and contextual predictability in reading

ABSTRACT Previous research into written language comprehension has been equivocal as to whether word frequency and contextual predictability effects share an early time course of processing. Target word frequency (low, high) and its predictability from prior context (low, high) were manipulated across two-sentence passages. Context sentences were presented in full, followed by word-by-word presentation (300 ms SOA) of target sentences. ERPs were analysed across left-to-right and anterior-to-posterior regions of interest within intervals from 50 to 550 ms post-stimulus. The onset of significant predictability effects (50–80 ms) preceded that of frequency (P1, 80–120 ms), while both main effects were generally sustained through the N400 (350–550 ms). Critically, the frequency-predictability interaction became significant in the P1 and was sustained through the N400, although the specific configuration of effects differed across components. The pattern of findings supports an early, chronometric locus of contextual predictability in recognising words during reading.


Introduction
Two fundamental variables affecting how fast a word is recognised during reading are its frequency of occurrence and its predictability from the prior context (Rayner & Sereno, 1994). The relative timing of perceptual and knowledge-based processing is important for determining the underlying neurocircuitry of word recognition. While word frequency effects are thought to reflect early lexical processing (Sereno & Rayner, 2000, 2003), the temporal locus of contextual effects is less certain, both theoretically and empirically. Modular models posit that context can only operate post-lexically (Fodor, 1983), while interactive models maintain that context can directly affect lexical processing (McClelland, 1987). The relative timing of bottom-up, perceptual processes and top-down, contextual processes has been used as support for these alternative models of word recognition (for a discussion, see Dambacher et al., 2012; Sternberg, 1969). Demonstrating an earlier (e.g. Dambacher, Rolfs, Göllner, Kliegl, & Jacobs, 2009) or later (e.g. Murray & Forster, 2004) time course of predictability effects would provide support for interactive or modular accounts, respectively. Specifically, an early interaction of frequency and predictability would provide strong evidence that lexical access is guided by top-down processes (e.g. Dambacher et al., 2012).
The present study utilised an electrophysiological approach to provide a temporally refined examination of the combined effects of frequency and predictability. It is therefore worthwhile first to describe relevant previous ERP findings in more detail. In studies presenting words in isolation, robust word frequency effects have been reported in the N1 time range (∼130-200 ms post-stimulus), with low-frequency (LF) words eliciting a greater amplitude than high-frequency (HF) words over lateral posterior electrodes (e.g. Hauk & Pulvermüller, 2004; Scott, O'Donnell, Leuthold, & Sereno, 2009; Sereno, Rayner, & Posner, 1998). However, there is also evidence for a frequency effect occurring even earlier, in the P1 (100-120 ms), with increasing ERP amplitudes over left-posterior electrodes associated with decreasing word frequency (Hauk, Davis, Ford, Pulvermüller, & Marslen-Wilson, 2006). In time intervals later than 300 ms, LF words have been found to elicit more negative-going amplitudes than HF words (Hauk & Pulvermüller, 2004; Rugg, 1990; but see Polich & Donchin, 1988, for a reverse effect in P300 amplitude).
Crucially, when word frequency effects are studied in the ERP waveform at the sentence or discourse level, the predictability of the forthcoming critical word is an important factor that influences the processing of linguistic input. Predictability effects have been found in the ERP waveform as early as 50-90 ms after critical word onset during discourse comprehension (Dambacher et al., 2009). In the 200-300 ms time range, a larger ERP positivity (P200) over anterior midline sites has been found when incoming linguistic information was unexpected or affectively salient (Leuthold, Kunkel, Mackenzie, & Filik, 2015), in line with the view that the P200 component indicates enhanced attentional processing of such visual input (e.g. Hillyard & Münte, 1984). From approximately 200 ms onward, a negative-going deflection typically develops in the ERP waveform over centroparietal midline sites, peaking at about 400 ms after critical word onset. This N400 component is taken to reflect lexical-semantic processing: words that are unexpected or a poor fit with context at the sentence or discourse level elicit a larger N400 than those that are expected or a good fit (e.g. DeLong, Urbach, & Kutas, 2005; Kutas & Hillyard, 1984; van Berkum, Hagoort, & Brown, 1999; for a review, see Kutas & Federmeier, 2011).
Of key interest for present purposes are six ERP studies that have investigated the combined effects of word frequency and predictability. We present the studies in chronological order. In the first study, van Petten and Kutas (1990) examined word frequency effects dependent on the position of the critical word within the sentence. Sentences were presented word-by-word, with each word displayed centrally for 200 ms using a 900 ms stimulus onset asynchrony (SOA). They did not directly manipulate predictability, but instead assumed that words become more predictable the later they appear in sentences. They showed a larger N400 (350-500 ms) for LF than HF words occurring early in the sentence but not at intermediate or sentence-final positions. They suggested that this attenuation of frequency effects indicated that contextual constraint strongly influences lexical processing.
In the second study,  presented HF, LF, and ambiguous words as sentence-final targets within neutral or biasing contexts, using 32 items per condition. Their ambiguous words, which had a strongly dominant sense and a much weaker subordinate sense, were functionally equivalent to HF words in neutral contexts and to LF words in (subordinate-meaning) biasing contexts. Individual words from sentences were presented centrally for 225 ms with a 450 ms SOA (N.B. the sentence-final target was presented for 495 ms). In addition to a word frequency effect, they found an interaction of word type and context in the N1 (132-192 ms), with stronger frequency effects in neutral than biasing contexts. They suggested that their findings supported an early, interactive account of lexical processing.
In the third study, Dambacher et al. (2006) examined word frequency and word position effects within sentences together with word predictability effects. Their targets comprised all open-class words from the 144-sentence Potsdam Sentence Corpus (PSC; Kliegl et al., 2004), with varying frequencies and predictabilities. Sentences were presented word-by-word, with each word displayed centrally for 250 ms with a 700 ms SOA. They found an ERP positivity (P2) between 140 and 200 ms over anterior electrodes that increased with decreasing word frequency and was independent from predictability. In the N400 (300-500 ms), amplitude increased with decreasing predictability. More importantly, in line with the findings of van Petten and Kutas (1990), the frequency effect in the N400 amplitude (i.e. a larger N400 for LF than HF words) decreased with increasing predictability. They suggested this result indicated that LF words can acquire greater benefit from contextual information.
In the fourth study, Penolazzi, Hauk, and Pulvermüller (2007) manipulated word frequency (LF, HF), predictability (low, high), and word length (four or six letters, on average) within single sentences, using 35 items per condition. Sentences were presented word-by-word, with each word presented centrally for 300 ms, using a 700 ms SOA. Between 110 and 130 ms after critical word onset, frequency and predictability each interacted with word length, presumably indicating early lexical access. In the subsequent 170-190 ms time range, low predictability conditions elicited more negative-going ERP amplitudes than high predictability conditions over midline-parietal electrodes. In the same time interval, LF words elicited more negative amplitudes than HF words over midline-central and -posterior electrodes, but only when word length was short. Lastly, N400 amplitude (250-450 ms), as anticipated, was larger for low than high predictability conditions over posterior sites. Despite these interactions with word length, and in contrast to the prior studies, frequency and predictability did not interact with each other. Penolazzi et al. concluded that lexical access and semantic context integration are distinct systems but that both are influenced by lower-level orthographic and phonological processing.
In the fifth study, Dambacher et al. (2012) examined frequency (LF, HF) and predictability (low, high) effects on target words presented in the second sentence of two-sentence passages, with 36 items per condition. Passages were presented word-by-word, with each word displayed centrally for 250 ms. Word-to-word SOAs were manipulated across three separate experiments, using SOAs of 700, 490, and 280 ms. At the 700 and 490 ms SOAs, Dambacher et al. (2012) found significant main effects of both frequency and predictability in the P2 (240-300 ms) and significant effects of predictability in the N400 (300-500 ms), but no evidence of an interaction in any component (N.B. there was a marginal effect of frequency at the 490 ms SOA in the N1, from 140 to 210 ms). At the 280 ms SOA, using slightly different time windows for analysis, a different pattern emerged. Significant frequency effects were found in the N1 (190-260 ms) and P2 (300-360 ms) components, significant predictability effects were found in the P2 (300-360 ms) and N400 (300-500 ms) components, and a significant interaction was found in the early N1 (135-155 ms) component. Dambacher et al. (2012) suggested that presentation rate affects word recognition processes. Specifically, they proposed that their fastest SOA of 280 ms (i.e. the one most representative of a normal reading rate) allows for immediate top-down processing to constrain word identification. In a related study, Dambacher et al. (2009) presented the same materials using a 280 ms SOA and reported an earlier significant effect of predictability from 50 to 90 ms post-stimulus.
Finally, in the sixth study, Kretzschmar, Schlesewsky, and Staub (2015) investigated frequency (LF, HF) and predictability (low, high) effects on target words presented within single sentences during normal reading. Although they used 40 items per condition, target words were repeated across levels of predictability. Eye movements and fixation-related potentials (FRPs) were simultaneously recorded. Earlier, Dimigen, Sommer, Hohlfeld, Jacobs, and Kliegl (2011) also implemented a co-registration paradigm. However, similar to Dambacher et al. (2006), Dimigen et al.'s materials comprised all open-class words from the PSC, with words categorised as having low, medium, or high predictability. In their FRP data, Dimigen et al. found that significant predictability effects (low vs. high) were limited to the N400 (300-500 ms) component. Kretzschmar et al. used consecutive analysis windows of 50 ms from 150 to 700 ms after fixation onset. They found significant sustained effects of predictability from 150 to 650 ms. However, the frequency effect only emerged in the 500-550 ms window and the frequency-predictability interaction was restricted to the 300-350 ms time window. Kretzschmar et al. suggested that the interaction between top-down and bottom-up processes is relatively delayed, with the N400 demonstrating particular sensitivity to such interplay.
Taken together, the pattern of frequency-predictability effects is quite mixed, not only across electrophysiological studies, but also in RT and eye movement reading studies (for a review, see Hand et al., 2010). However, there are paradigm-specific constraints as well as aspects of the stimulus materials, themselves, which may have influenced results and, hence, limit their generalisability. For example, measures having longer latencies (e.g. RTs occurring in the range of 500-800 ms; later-onset components of the ERP such as the N400) are less likely to solely reflect the immediate, automatic stages of word recognition and are more susceptible to influences from conscious, strategic processing (for a discussion, see Gold et al., 2006). Similarly, the use of word presentation rates in ERP studies (often 500-1000 ms/word) that are several times slower than the normal reading rate of ∼200-300 ms/word also encourages non-automatic processing. In order to appropriately differentiate modular and interactive accounts, the use of measures sensitive to the early stages of lexical access is necessary.
In terms of materials, most studies to date have used fairly brief contexts, limited to a handful of words preceding the target within a single sentence. For example, the contexts of Penolazzi et al. (2007) comprised five words and often employed semantic associates or set phrases (e.g. electrical power, shook with fear; targets underlined). The use of condensed contexts increases the likelihood of employing semantic primes in order to make a target more predictable. While semantic associates facilitate target identification when in close proximity to the target, such "context" is thought to originate at the lexical level and not from a higher discourse or message level (Forster, 1979). As such, intralexical priming effects cannot differentiate modular from interactive accounts of processing. Moreover, the use of longer texts preceding target words may allow context effects to develop more fully (Hand et al., 2010). Another potential difficulty with materials in many ERP (as well as RT) studies is having the target word as the last word of the sentence (see Dimigen et al., 2011). For example, sentence-final words typically elicit more positive-going ERPs (Friedman, Simson, Ritter, & Rapin, 1975; Hagoort, 2003). Another aspect of ERP studies is that they require a relatively high number of items per condition. Most frequency-predictability studies (see above) have used 30-40 items per condition.
Finally, a variety of approaches have been adopted when creating materials in 2 (Frequency: LF, HF) × 2 (Predictability: low, high) designs, each with its own strengths and limitations. In Dambacher et al. (2012), materials comprised two-sentence passages, with an initial context sentence and a subsequent sentence containing an LF or HF target. For target sentences, an LF or HF word was identified in each of the 144 sentences of the PSC. An alternative target word of equal length but of opposing frequency was then selected for each sentence frame. Two preceding context sentences were then constructed, one for the LF and one for the HF target sentence, each designed to increase the predictability of its upcoming target (i.e. the high predictability conditions). For low predictability conditions, these initial context sentences were switched. Thus, for each of the 144 sentences of the PSC, four different passages were created: high predictability (HP) and low predictability (LP) versions containing either LF or HF targets. This method ensures that the local content of the target sentence is identical between LF and HF conditions. However, having the initial HP context sentence perform a dual role, that is, also act as the LP context for the alternative target, sometimes introduced anomalies. In the examples presented in Table 1 (upper section), it would not be appropriate to characterise targets in LP versions as simply having a lower level of predictability; in these cases, the initial contexts make the targets semantically awkward, if not anomalous. In Kretzschmar et al. (2015), LF and HF targets (40 of each) appeared in single sentences whose preceding contexts were constructed to be either LP or HP. Individualised contexts were created for each LF or HF target word, as seen in the examples in Table 1 (lower section).
The practice of using uni-functional contexts has the advantage that target words are genuinely of higher or lower predictability. In particular, in LP contexts, target words do not stand out as semantic misfits. The downside of using such contexts is that the local sentence content (i.e. the words immediately preceding and following the target) is different across frequency conditions.
The current experiment addressed these issues. LF and HF target words were presented in two-sentence passages, with the first sentence establishing a context and the target embedded within the second sentence. For each target, two different context sentences were constructed: one that was neutral, making the target LP, and one that was biasing toward the target, making the target HP. The second, target sentence was identical across context conditions. Care was taken to ensure that the pre-target region of the second sentence was relatively neutral and did not contain, for example, intralexical primes of the subsequent target. Two sets of materials were prepared. Each set comprised 62 items from each of the four possible conditions (LF-LP, LF-HP, HF-LP, and HF-HP). Each set of materials was presented to a distinct participant group, such that targets that were LP in one set were HP in the other and vice versa. In this way, all participants were presented with all targets, but each in only one of its contexts, to avoid repetition of target sentences. The electroencephalogram (EEG) was recorded while participants read the passages. The first, context sentence was presented in its entirety. This was followed by a word-by-word presentation of the second, target sentence. To more closely approximate the speed of normal reading, a presentation rate of 300 ms/word was used. With these methodological controls in place, the pattern of our results should clarify the degree to which contextual factors can influence lexical processing. While Dambacher et al. (2012) examined the frequency-predictability interaction using lengthy pre-target contexts and a presentation rate akin to normal reading, their practice of mixing and matching contexts and targets was potentially problematic (i.e. possible target anomalies). Kretzschmar et al. (2015) avoided this by using individualised contexts per target. However, their pre-target contexts were relatively short, the local sentence content differed across conditions, and target words were repeated across predictability conditions.

Table 1. Example materials from Dambacher et al. (2012) and Kretzschmar et al. (2015).

Upper section: Dambacher et al. (2012) example experimental materials
LF-LP (a) The man on the picture fiddled around with models of Columbus' fleet. In his right hand he held a sceptre of considerable length.
(b) Johannes heard a huge plane approaching from some distance. He gazed intensely into the tunnel and listened carefully.
LF-HP (a) The man on the picture wore a golden crown and sat stately on a throne. In his right hand he held a sceptre of considerable length.
(b) Before walking through the mountain, Johannes wanted to make sure that no train was approaching. He gazed intensely into the tunnel and listened carefully.
HF-LP (a) The man on the picture wore a golden crown and sat stately on a throne. In his right hand he held a ship of considerable length.
(b) Before walking through the mountain, Johannes wanted to make sure that no train was approaching. He gazed intensely into the sky and listened carefully.
HF-HP (a) The man on the picture fiddled around with models of Columbus' fleet. In his right hand he held a ship of considerable length.
(b) Johannes heard a huge plane approaching from some distance. He gazed intensely into the sky and listened carefully.

Lower section: Kretzschmar et al. (2015) example experimental materials
LF-LP I want to go to graduate school so I can help people with anorexia recover from their illness.
LF-HP The extremely skinny model looked like she suffered from anorexia and a lack of sleep.
HF-LP Yesterday I noticed that we passed by a church on our way to the apartment.
HF-HP On Sunday morning, the nun went to pray at the church and then went for a walk.

Note: LF = low frequency; HF = high frequency; LP = low predictability; HP = high predictability.
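The two-set counterbalancing described above (each target LP in one set and HP in the other, 62 items per cell per set) can be sketched as follows. This is a minimal illustration under a simplifying assumption: contexts are assigned by alternating LP and HP within each frequency class, whereas the actual sets were matched on target length, frequency, and predictability, which this sketch does not attempt. The function name `build_sets` is hypothetical.

```python
from collections import Counter


def build_sets(items):
    """Give each (item_id, freq) an LP or HP context in Set A and the opposite
    context in Set B, so no participant sees the same target sentence twice.

    Hypothetical assignment rule: alternate LP/HP within each frequency class.
    """
    seen = Counter()  # number of items encountered so far in each frequency class
    set_a, set_b = [], []
    for item_id, freq in items:
        pred_a = "LP" if seen[freq] % 2 == 0 else "HP"
        pred_b = "HP" if pred_a == "LP" else "LP"
        seen[freq] += 1
        set_a.append((item_id, freq, pred_a))
        set_b.append((item_id, freq, pred_b))
    return set_a, set_b


# 124 LF and 124 HF targets (248 in total)
items = [(i, "LF") for i in range(124)] + [(i, "HF") for i in range(124, 248)]
set_a, set_b = build_sets(items)
print(Counter((freq, pred) for _, freq, pred in set_a))
```

Each set then contains every target exactly once, with 62 items in each of the four Frequency × Predictability cells.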
We have built upon these investigations by employing the following methodology: (1) a lengthier, separate discourse context; (2) bespoke LP and HP context sentences that preceded an identical target sentence; (3) a target word embedded within the body of a sentence (not sentence-final); (4) a rapid, reading-like presentation rate; and (5) a high number of experimental items per condition. First, contextual information (LP or HP) was provided via an entire initial sentence, while the second, target sentence remained relatively neutral in its pre-target region (see, e.g. Sereno et al., 2018). When materials comprise single-line sentences (sometimes using sentence-final targets), biasing contextual information tends to take the form of semantic associates appearing only a word or two away from the target, fostering intralexical priming rather than higher-level discourse processes. Second, the LP context sentences were specifically created to be relatively neutral with respect to target words in target sentences. In other studies, HP contexts for LF and HF targets were switched in order to create corresponding LP contexts (e.g. Dambacher et al., 2012; Rayner et al., 2004). This results in target words that are often semantically anomalous in context, not simply of lower predictability. As such, findings interpreted as (facilitatory) predictability effects may additionally include (inhibitory) anomaly effects. Third, target words were embedded within target sentences and were not sentence-final. Moreover, the same (second) target sentence was used in LP and HP conditions (only the first, context sentence differed). Thus, target position was not confounded with predictability but was kept identical across context conditions. Fourth, words in the target sentence were presented at a rapid, reading-like rate (300 ms SOA). The slower presentation rates typically used (e.g. SOAs of 500-1000 ms) disrupt the normal flow of reading and encourage strategic processing. Finally, we used 62 items per condition, compared with the 30-40 items per condition used in most previous frequency-predictability studies.

Participants
Thirty native English-speaking members of the University of Glasgow community (21 female; mean age 22) took part in the experiment. All had normal or corrected-to-normal vision, had not been diagnosed with any reading disorder, were right-handed, and were either paid £10 or given course credit for their participation. The study conformed to British Psychological Society ethical guidelines and protocols. Data from an additional 8 participants were collected but were excluded from the analyses due to excessive alpha activity (N = 3), substantial drifts in EEG activity at multiple electrodes (N = 3), or having fewer than 50% of trials per condition remaining after artifact correction and rejection (N = 2).

Design and materials
A 2 (Frequency: LF, HF) × 2 (Predictability: LP, HP) design was used, with 62 target items in each of the four conditions. Target word length was limited to a range of 4-8 letters. Word frequencies were obtained from the British National Corpus (BNC), a corpus of 90 million written word tokens (Davies, 2004). Each LF or HF target appeared in the second (target) sentence of two-sentence passages. The average position of LF and HF targets was word 5.86 and 5.74 of the sentence, respectively. The first (context) sentence provided either a relatively neutral or a semantically biasing context for each target, producing LP and HP conditions, respectively. The average lengths of first sentences were matched across conditions and were as follows: 61, 60, 61, and 59 characters for LF-LP, LF-HP, HF-LP, and HF-HP conditions, respectively. The level of target predictability was determined by a Cloze probability task administered to two groups of 20 participants (none of whom participated in the main experiment). These participants were given each item up to but not including the target word and were asked to generate the next word in the sentence. Responses were scored as "1" if correct (i.e. the unseen target word) and "0" for all other guesses. Each participant group was presented with only one of the two context sentences for any target word; both groups were presented with equal numbers of neutral and biasing contexts. Target word specifications of length, frequency, and Cloze values are presented in Table 2. All target words are listed in the Appendix (see Supplemental data). Example sets of materials for LF and HF targets in LP and HP conditions are presented in Table 3.
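The 1/0 scoring described above amounts to taking the mean score across respondents as an item's Cloze probability. The sketch below is minimal and illustrative. It assumes that a response counts as correct when it matches the target after lowercasing and trimming whitespace. The source does not state how spelling variants or inflections were handled, and the function names are hypothetical.

```python
from statistics import mean


def score_response(response, target):
    """Score 1 if the completion is the target word, else 0.

    Assumption: case and surrounding whitespace are ignored.
    """
    return int(response.strip().lower() == target.strip().lower())


def cloze_probability(responses, target):
    """Cloze probability: proportion of respondents who produced the target."""
    return mean(score_response(r, target) for r in responses)


# Hypothetical completions for one item (not data from the study)
print(cloze_probability(["stain", " Stain", "mess", "mark"], "stain"))
```

Condition-level Cloze values of the kind reported in Table 2 would then be averages of these per-item proportions.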
We conducted a small post-hoc rating study to examine the plausibility of targets within their contexts, for both the current set of materials and those of Dambacher et al. (2012). For both sets of materials, two lists were created, counterbalanced such that targets appeared only once per list, in either their LP or HP context. Plausibility ratings were obtained from typically developed adult native speakers of English (N = 10) and of German (N = 10). Participants were asked to rate the plausibility of target words on a scale from 1 (highly implausible) to 7 (highly plausible), with 4 indicating neutrality (neither plausible nor implausible). We compared lower- and higher-predictability conditions across the two material sets. The mean plausibility rating of LP targets was 4.73 (SD = 1.91) for our materials and 3.63 (SD = 2.23) for Dambacher et al.'s materials. A Mann-Whitney U test revealed that the LP materials in the current study were rated as significantly more plausible than those employed by Dambacher et al. [U = 188458.5, z = −10.064, p < .001]. For HP targets, the mean plausibility rating was 6.30 (SD = 0.97) for our materials and 5.39 (SD = 1.68) for Dambacher et al.'s materials. As before, the analysis demonstrated that our HP materials were rated as significantly more plausible than those of Dambacher et al. [U = 36889.0 …].

For the EEG experiment, two sets of materials were created. Only one version of each passage (LP or HP) was included in each set. Each set had a similar profile in terms of the length, frequency, and predictability characteristics of its target words. With a total of 248 target words, each of the two sets of materials comprised equal numbers of LF-LP, LF-HP, HF-LP, and HF-HP items (N = 62).
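The plausibility comparison above relies on the Mann-Whitney U test. The sketch below shows how U and its normal-approximation z can be computed from average ranks. It includes a tie correction, which matters for 1-7 ratings with many tied values. The source does not say whether a tie correction or a continuity correction was applied. In practice, a statistics package such as SciPy's `scipy.stats.mannwhitneyu` would be used instead. The function names here are illustrative.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, z): U = min(U1, U2); z from the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, z
```

Because U is taken as the smaller of U1 and U2, z is non-positive. This matches the sign convention of the negative z reported above.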

Apparatus
Participants were tested individually in an electrically shielded booth with low-level ambient light. Experimental Run Time System (ERTS) software was used to control stimulus presentation (Dutta, 1995). Visual stimuli were presented on a 21-inch Samsung SyncMaster 1100MB monitor with a resolution of 1280 × 960 pixels and a refresh rate of 60 Hz. Participants were seated at a viewing distance of approximately 65 cm from the monitor, maintained throughout the experiment by means of a chin rest. Stimuli were presented centrally in Helvetica font (14-point for the first sentence; 16-point for the word-by-word presentation of the second sentence) in white letters on a black background.
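The display geometry above (21-inch screen, 1280 × 960 pixels, 65 cm viewing distance) can be converted into visual angle with the standard formula θ = 2·atan(size / (2·distance)). The sketch below makes two assumptions: pixels are square, and they fill the nominal 21-inch diagonal. A CRT's viewable area is usually somewhat smaller, so the per-pixel estimate is only approximate. Letter heights in centimetres are not reported, so the sketch stops at a per-pixel conversion.

```python
import math


def pixel_pitch_cm(diagonal_in, res_x, res_y):
    """Approximate pixel size in cm.

    Assumes square pixels spanning the nominal screen diagonal.
    """
    return diagonal_in * 2.54 / math.hypot(res_x, res_y)


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


pitch = pixel_pitch_cm(21, 1280, 960)
print(f"{pitch:.4f} cm/pixel, {visual_angle_deg(pitch, 65):.4f} deg/pixel at 65 cm")
```

Multiplying the per-pixel angle by a word's rendered width in pixels would give an approximate horizontal extent for each word.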

Procedure
Participants were first informed about the nature of electrophysiological recording and were given specific task instructions. They were told that they would be reading several two-sentence passages of text, that each passage was like a very short story, and that they should read normally for comprehension. They were instructed that the first sentence would be presented in full and that the second would be presented word-by-word. They were asked to maintain fixation at the centre of the screen during the word-by-word presentations. Following electrode application, participants sat facing the computer screen with their heads stabilised via a chin rest. Each trial began with a central red fixation cross presented for 500 ms, signalling the start of a new passage, which was then replaced by a white fixation cross for another 500 ms. The first, context sentence was then presented on the screen for a minimum duration of 1500 ms. When participants had finished reading the sentence, they pressed the spacebar on the computer keyboard, which initiated the word-by-word presentation of the second, target sentence. Once the spacebar was pressed, a blank screen was presented for 500 ms, followed by a central white fixation cross for 500 ms. Each word of the second sentence was then displayed centrally for 267 ms, with a 33 ms blank interval between successive words (i.e. a 300 ms SOA). After the final word of the second sentence, the screen remained blank for 1000 ms before the next trial began.

Table 3. Example materials.

Condition / Passages comprising context and target sentences
LF-LP (a) Alison normally steamed her food but today she was in a hurry. She added the onion and peppers into the oil in the pan.
(b) Jill shuddered as the rain battered against her doors and windows. In the morning, she noticed an enormous stain on the carpet.
LF-HP (a) Alison's eyes were watering as she chopped the vegetables. She added the onion and peppers into the oil in the pan.
(b) Jill's friends were drinking red wine all night in her flat. In the morning, she noticed an enormous stain on the carpet.
HF-LP (a) Johnny enjoyed his first day at primary school. There was one particular story he liked about a tiger.
(b) The child ran home with something he had taken from the garden. It was a small stone which had come from the gravel path.
HF-HP (a) Johnny liked his father to read to him before bedtime. There was one particular story he liked about a tiger.
(b) I could feel something in my shoe which dug into my heel. It was a small stone which had come from the gravel path.

Note: Target words are underlined. LF = low frequency; HF = high frequency; LP = low predictability; HP = high predictability.
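The word-by-word timing in the Procedure (267 ms per word plus a 33 ms blank) maps neatly onto a 60 Hz display. It corresponds to 16 and 2 video frames, which together give exactly the 300 ms SOA. The sketch below shows that arithmetic. It assumes durations are realised as whole frames, which is typical for CRT presentation but is not stated in the source. The helper name `frames_for` is hypothetical.

```python
def frames_for(duration_ms, refresh_hz=60.0):
    """Nearest whole number of video frames for a requested duration, and the
    duration those frames actually take."""
    frame_ms = 1000.0 / refresh_hz
    n_frames = round(duration_ms / frame_ms)
    return n_frames, n_frames * frame_ms


word_frames, word_ms = frames_for(267)    # word on screen
blank_frames, blank_ms = frames_for(33)   # inter-word blank
print(word_frames, blank_frames, round(word_ms + blank_ms, 3))  # 16 2 300.0
```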
Participants were first presented with a practice block of two trials to become accustomed to the procedure and presentation of materials. All 248 experimental trials were then presented in a different random order for each participant, divided into 8 blocks of 31 trials with self-paced rest periods. The entire experiment including participant preparation lasted approximately 1.5 h.
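A per-participant random order split into 8 blocks of 31 trials, as described above, could be generated along the following lines. This is a minimal sketch: the source does not say whether any constraints (e.g. limits on consecutive trials from one condition) were imposed, and none are applied here. The function name and seed are hypothetical.

```python
import random


def make_blocks(trial_ids, n_blocks=8, seed=None):
    """Shuffle trials into a fresh random order and cut it into equal-sized blocks."""
    order = list(trial_ids)
    random.Random(seed).shuffle(order)
    size, remainder = divmod(len(order), n_blocks)
    if remainder:
        raise ValueError("trials do not divide evenly into blocks")
    return [order[i * size:(i + 1) * size] for i in range(n_blocks)]


# One participant's session: 248 trials in 8 blocks of 31
blocks = make_blocks(range(248), n_blocks=8, seed=17)
```

Using a different seed per participant gives each participant a different random order, as in the experiment.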
After removing epochs containing extreme values in single electrodes (e.g. amplifier blocking, values > ±1000 µV in any electrode) and trials containing values exceeding ±75 µV in multiple adjacent electrodes unrelated to eye movements, z-scored variance measures were calculated for all electrodes. Noisy EEG electrodes (|z| > 3) were removed if their activity was uncorrelated with EOG activity, and this "cleaned" EEG data set was subjected to a spatial independent component analysis (ICA) based on the infomax algorithm (Bell & Sejnowski, 1995). ICA components representing ocular activity (blinks and horizontal eye movements) were automatically identified using z-scored measures of the absolute correlation between each ICA component and the recorded vEOG and hEOG activity, respectively, and were confirmed by visual inspection. Then, previously removed noisy channels were interpolated in the ICA-cleaned EEG data set using the average EEG activity of adjacent uncontaminated channels within a specified distance (4 cm, ∼3-4 neighbours per electrode). This ensured a full electrode array for each participant. Following artifact rejection and correction, an average of 58.8 trials (out of 62) per condition remained (range: 49-61; median = 60).
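The two z-score criteria above can be sketched on toy data. The first flags channels with outlying variance (|z| > 3) that are not EOG-related. The second flags ICA components whose absolute correlation with the EOG is an upper outlier. This sketch does not run ICA; the actual pipeline used infomax ICA in MATLAB toolboxes. Two parameters are assumptions rather than values from the source: the EOG-correlation cut-off for "uncorrelated" channels (`max_abs_r`) and the z threshold for ocular components. All function names are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscores(values):
    """Standardise values against their own mean and population SD."""
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def noisy_channels(channels, eog, z_thresh=3.0, max_abs_r=0.3):
    """Channels with outlying variance (|z| > z_thresh) that are not correlated
    with the EOG (|r| < max_abs_r, a hypothetical cut-off)."""
    names = list(channels)
    z = zscores([statistics.pvariance(channels[n]) for n in names])
    return [n for n, zn in zip(names, z)
            if abs(zn) > z_thresh and abs(pearson_r(channels[n], eog)) < max_abs_r]


def ocular_components(components, eog, z_thresh=3.0):
    """Indices of ICA components whose |r| with the EOG is an upper outlier."""
    abs_r = [abs(pearson_r(c, eog)) for c in components]
    return [i for i, zi in enumerate(zscores(abs_r)) if zi > z_thresh]
```

In the study, the ocular criterion was applied separately with vEOG (blinks) and hEOG (horizontal eye movements), and flagged components were confirmed by visual inspection.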

Data analysis
All EEG/ERP analyses were performed using available MATLAB toolboxes (EEGLAB: Delorme & Makeig, 2004; FieldTrip: Oostenveld, Fries, Maris, & Schoffelen, 2011) and custom MATLAB scripts. The analysis epoch started 1000 ms before the onset of the critical word and lasted until 1500 ms after it, giving a total epoch duration of 2500 ms. Off-line, all EEG channels were re-referenced to the average of the mastoids. For artifact-free trials, the signal at each electrode site was averaged separately for each experimental condition, time-locked to the onset of the critical word, low-pass filtered (30 Hz, 36 dB/oct), and baseline-corrected relative to the 200 ms preceding the onset of the critical word. In line with ERP analysis procedures established in previous research (e.g. Dambacher et al., 2009; Hauk & Pulvermüller, 2004; Scott et al., 2009), mean amplitudes of specific ERP deflections were measured in the following time intervals: 50-80 ms, 80-120 ms (P1), 160-200 ms (N1), 200-300 ms (N2), and 350-550 ms (N400). In addition, relative to a 200 ms baseline interval preceding the pre-critical word (−500 to −300 ms), we determined the mean ERP amplitude during the 200 ms interval immediately before the onset of the critical word (−200 to 0 ms).
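The averaging, baseline correction, and window-mean steps above can be illustrated for a single electrode and condition. This is a minimal sketch under stated assumptions. The sampling rate is not given in this section, so `fs_hz` is a placeholder. Windows are treated as half-open intervals [start, end). Re-referencing and the 30 Hz low-pass filter are omitted. The function names are hypothetical and are not EEGLAB or FieldTrip functions.

```python
ERP_WINDOWS_MS = {
    "50-80": (50, 80),
    "P1": (80, 120),
    "N1": (160, 200),
    "N2": (200, 300),
    "N400": (350, 550),
}


def window_mean(signal, start_ms, end_ms, epoch_start_ms, fs_hz):
    """Mean of the samples in [start_ms, end_ms) relative to critical-word onset."""
    i0 = round((start_ms - epoch_start_ms) * fs_hz / 1000)
    i1 = round((end_ms - epoch_start_ms) * fs_hz / 1000)
    segment = signal[i0:i1]
    return sum(segment) / len(segment)


def erp_window_means(trials, epoch_start_ms=-1000, fs_hz=500, baseline_ms=(-200, 0)):
    """Average trials into an ERP, subtract the baseline mean, and return the
    mean amplitude in each analysis window."""
    erp = [sum(samples) / len(trials) for samples in zip(*trials)]
    base = window_mean(erp, baseline_ms[0], baseline_ms[1], epoch_start_ms, fs_hz)
    erp = [v - base for v in erp]
    return {name: window_mean(erp, a, b, epoch_start_ms, fs_hz)
            for name, (a, b) in ERP_WINDOWS_MS.items()}
```

The pre-critical-word measure follows the same pattern. It would use `baseline_ms=(-500, -300)` and a single window from −200 to 0 ms, computed with `window_mean` on the corrected ERP.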

Frequency
Significant word frequency effects were demonstrated in the P1, N2, and N400 components. In the P1, LF words, in comparison to HF words, demonstrated enhanced negativity in anterior ROIs and enhanced positivity in posterior ROIs. In the N2, LF words showed enhanced positivity over HF words in midline and anterior ROIs. In the N400, LF words demonstrated enhanced negativity in anterior ROIs and enhanced positivity in posterior ROIs.

Predictability
Significant contextual predictability effects occurred in all time windows except the P1. In the 50–80 ms window, LP words were more positive-going than HP words over left- and midline-anterior ROIs, but more negative-going than HP words over the left-posterior ROI. In the N1, LP words were more negative-going than HP words over midline- and right-posterior ROIs. In the N2, LP words were more positive-going than HP words in the left-anterior ROI, but more negative-going in the midline-posterior ROI. In the N400, predictability effects were widespread; the most pronounced differences emerged in the left-anterior ROI, where LP words were more positive-going than HP words, and in midline-central and midline-posterior ROIs, where LP words were more negative-going.

Frequency × Predictability
Frequency and predictability interacted in all but the earliest time window. In the P1, frequency effects emerged selectively for HP words, in anterior and central ROIs (enhanced negativity to LF words) as well as in posterior ROIs (enhanced positivity to LF words). There were no frequency effects for LP words, and no predictability effects for either LF or HF words. In the N1, frequency effects occurred for both LP and HP words. For LP words, LF words were more negative-going over left-hemispheric ROIs and more positive-going over right-hemispheric ROIs. For HP words, in contrast, LF words were more positive-going over left-hemispheric ROIs. N1 predictability effects occurred only for HF words, with greater positivity to LP than HP words over left-hemispheric ROIs and greater negativity to LP words over right-hemispheric ROIs. In the N2, frequency effects emerged selectively for LP words, in left-hemispheric ROIs (greater negativity to LF than HF words) and in midline and right-hemispheric ROIs (greater positivity to LF than HF words). N2 predictability effects emerged selectively for HF words, in left-hemispheric ROIs (more positive-going for LP than HP words) and in midline ROIs (more negative-going for LP than HP words). In the N400, for HP words, frequency effects occurred in right-hemispheric and anterior ROIs (greater negativity to LF than HF words) and in posterior ROIs (greater positivity to LF than HF words). LP words showed a frequency effect only in posterior ROIs (greater positivity to LF words). N400 predictability effects were widespread for both LF and HF words, with greater positivity to LP than HP words in left-hemispheric ROIs and greater negativity to LP than HP words in midline ROIs and in right-hemispheric ROIs (HF words only).
Predictability effects were also reliable for both LF and HF words across anterior, central, and posterior ROIs, with greater positivity to LP than HP words over anterior ROIs, but greater negativity to LP than HP words over central and posterior ROIs.
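The follow-up comparisons above decompose the 2 × 2 design into simple effects. The hypothetical helper below makes that arithmetic explicit for a single ROI and time window: it returns the frequency effect (LF − HF) at each predictability level, the predictability effect (LP − HP) at each frequency level, and the interaction contrast, i.e. the difference between the two frequency effects. It computes difference scores only; the statistical tests reported above are not reproduced, and the amplitudes in the example are invented for illustration, not data from this study.

```python
def simple_effects(cell_means):
    """Difference scores for a 2 (Frequency) x 2 (Predictability) design.

    cell_means maps (frequency, predictability) -> mean amplitude in µV,
    with frequency in {"LF", "HF"} and predictability in {"LP", "HP"}.
    """
    m = cell_means
    # Frequency effect (LF - HF) separately within each predictability level
    freq = {p: m[("LF", p)] - m[("HF", p)] for p in ("LP", "HP")}
    # Predictability effect (LP - HP) separately within each frequency level
    pred = {f: m[(f, "LP")] - m[(f, "HP")] for f in ("LF", "HF")}
    # Interaction contrast: a difference of differences. It equals
    # pred["HF"] - pred["LF"], so both decompositions give the same value.
    interaction = freq["HP"] - freq["LP"]
    return freq, pred, interaction


# Invented amplitudes (µV) for one ROI, chosen so that a frequency
# difference appears only for HP words.
example = {("LF", "HP"): 2.0, ("HF", "HP"): 0.5,
           ("LF", "LP"): 1.0, ("HF", "LP"): 1.0}
freq, pred, interaction = simple_effects(example)
print(freq)         # {'LP': 0.0, 'HP': 1.5}
print(pred)         # {'LF': -1.0, 'HF': 0.5}
print(interaction)  # 1.5
```

Because the interaction contrast can be read either as "the frequency effect depends on predictability" or as "the predictability effect depends on frequency", the two sets of follow-up tests reported above are complementary views of the same interaction.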

Discussion
Our study examined the relative timing of word frequency and contextual predictability effects in reading using electrophysiological recordings. Our approach is unique in that we implemented several methodological procedures to ensure a systematic investigation of such effects. Past ERP studies that have examined word frequency and predictability effects have typically used more than one of the following: relatively short contexts; biasing contexts containing semantic primes that are proximal to the target (e.g. Kretzschmar et al., 2015; Penolazzi et al., 2007); potentially anomalous contexts to represent "low predictability" conditions (e.g. Dambacher et al., 2012); sentence-final targets; and/or a slow presentation rate. Such procedures may not promote the discourse processing associated with normal reading. Our methodology addressed these concerns in several ways. We employed a 2 (Frequency: LF, HF) × 2 (Predictability: LP, HP) design with a high number of items per condition (N = 62). Contexts were relatively long and independent of the target sentence. As neutral and biasing contexts were tailor-made for each target word, LP targets were not anomalous. Target words were not sentence-final but were embedded within the discourse, and target position was not confounded with predictability. Finally, a rapid, reading-like rate of 300 ms/word was used for target sentence presentation.
ERPs to target words were analysed in five time windows: 50–80 ms, 80–120 ms (P1), 160–200 ms (N1), 200–300 ms (N2), and 350–550 ms (N400). The data show a complex pattern of results. We first summarise our results in each successive time window, relating our findings to those of previous studies.
The earliest time window (50–80 ms) revealed a predictability effect, with a more positive-going waveform to LP than HP words over left- and midline-anterior ROIs and a more negative-going waveform to LP than HP words over the left-posterior ROI. This effect cannot be attributed to ERP differences arising in the 200 ms interval preceding the critical word. To our knowledge, Dambacher et al. (2009) is the only other study to have reported such an early (50–90 ms) predictability effect. However, the topography and direction of their effect differed, with the LP condition more negative than HP at right- and midline-anterior sites, but more positive than HP at left-posterior sites. Interestingly, in an auditory sentence processing study, van Berkum, Brown, Zwitserlood, Kooijman, and Hagoort (2005) reported a larger anterior as well as right-posterior positivity to a prediction-inconsistent than to a prediction-consistent word between 50 and 250 ms. Aside from procedural differences, we are unable to offer an explanation for these mixed patterns of early ERP predictability effects.
In the P1 (80–120 ms), word frequency effects emerged, with greater positivity to LF than HF words at posterior sites and reversed polarity at anterior sites. This accords with Hauk et al. (2006), who found a larger P1 to LF than HF words over left posterior sites. This frequency effect was modulated by predictability, with follow-up tests showing that it was limited to HP words: LF-HP words showed greater positivity than HF-HP words over posterior ROIs, but greater negativity over central and anterior ROIs. This is a novel finding demonstrating early interactive processing. In a similar time window (110–130 ms), Penolazzi et al. (2007) found that frequency and predictability each interacted with word length; however, these factors did not interact with each other. Their study may have had insufficient power, given its smaller sample compared with the current study (17 vs. 30 participants, respectively).
In the N1 (160-200 ms), the frequency-predictability interaction again emerged, but with a different pattern of effects. There were frequency effects for both HP and LP conditions. Over left-hemispheric ROIs, ERPs to LF words were more positive than HF words in the HP condition, but were more negative than HF words in the LP condition. In the LP condition, LF words were additionally more positive than HF words in right-hemispheric ROIs.  also found a frequency-predictability interaction in a similar time window (132-192 ms). They found that although ERPs were more negative to LF than HF words in both LP and HP conditions, the effect was of smaller magnitude in the HP condition. Dambacher et al. (2012), using a 280 ms SOA, also showed a frequency-predictability interaction in an early N1 (135-155 ms) and a significant effect of frequency in a later N1 (190-260 ms). In the early N1, frequency effects for HP words were more negative over posterior channels and more positive over anterior ones.
In the N2 (200–300 ms), there was a frequency effect over midline and anterior ROIs, with greater positivity to LF than HF words. As in the N1, the frequency effect was modulated by predictability in interaction with hemisphere. However, only the LP condition showed frequency effects: LF words elicited a more negative-going ERP than HF words over left-hemispheric ROIs, but a more positive-going ERP over midline- and right-hemispheric ROIs.
In the N400 (350–550 ms), there was a frequency effect that emerged as a larger negative-going amplitude to LF than HF words over anterior ROIs; this frequency effect was reversed in polarity over posterior sites, possibly indicating a P300-like effect (cf. Polich & Donchin, 1988). Crucially, this effect was again modulated by predictability: it was present over anterior ROIs only for the HP condition, whereas both HP and LP conditions showed a reliable frequency effect over posterior ROIs. In addition, there was a frequency effect over left-hemispheric ROIs limited to the HP condition. Previous studies of the N400 have shown that frequency effects are present only for LP but not for HP conditions (Dambacher et al., 2006; van Petten & Kutas, 1990), that frequency and predictability produce additive effects (Penolazzi et al., 2007), or that only predictability effects, and not frequency effects, occur (Dambacher et al., 2012; Kretzschmar et al., 2015).
Finally, there was also a predictability effect over posterior-midline ROIs in the three successive time windows from 160 to 550 ms, characterised by a more negative-going ERP for LP than HP conditions. Prior studies have also demonstrated significant predictability effects with greater negativity for LP than HP conditions and a similar topography: Dambacher et al. (2012), using a 280 ms SOA, found a predictability effect from 300 to 500 ms post-stimulus, and Kretzschmar et al. (2015) showed sustained effects of predictability in successive 50 ms time windows from 150 to 650 ms after fixation onset.
Taken together, our findings are generally consistent with other studies that have examined word frequency and contextual predictability effects in reading. There are, nonetheless, certain discrepancies in the topography and/or polarity of effects. It is possible that our procedure, in particular the rapid presentation rate, could have given rise to some of these differences. For example, if contextual benefits, broadly construed, were already expressed on the pre-target word, then the later components elicited by the pre-target word would overlap in time with the earlier components elicited by the target word. If this were the case, it would suggest an even earlier onset of predictive processing, one that may additionally be sensitive to the frequency of the upcoming word. Dimigen et al. (2011), however, noted that component overlap is a problem only if the amount of overlap differs systematically between conditions. Target sentences in our LP and HP conditions were identical in content and word presentation rate.
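The overlap argument can be made concrete with a toy superposition model. The sketch below assumes, as a simplification, that the recorded signal is the linear sum of the target-word response and the still-unfolding response to the pre-target word; the waveforms are invented and the function names are hypothetical. When the pre-target tail is identical across conditions, it cancels exactly in the LP minus HP difference wave; only a condition-specific tail biases the difference, and the bias equals the difference between the tails.

```python
def observed_erp(target_response, pretarget_tail):
    """Recorded ERP under linear superposition: target-word response plus the
    still-unfolding response to the preceding word."""
    return [x + y for x, y in zip(target_response, pretarget_tail)]


def difference_wave(a, b):
    """Sample-wise difference a - b (e.g. LP minus HP)."""
    return [x - y for x, y in zip(a, b)]


def overlap_bias(target_lp, target_hp, tail_lp, tail_hp):
    """Distortion of the observed LP - HP difference relative to the
    difference between the target-word responses alone."""
    true_diff = difference_wave(target_lp, target_hp)
    observed = difference_wave(observed_erp(target_lp, tail_lp),
                               observed_erp(target_hp, tail_hp))
    return [o - t for o, t in zip(observed, true_diff)]


# Toy waveforms in µV (integers keep the arithmetic exact); invented for illustration.
target_lp = [0, 1, 3, 2]
target_hp = [0, 1, 2, 1]
tail = [5, 4, 3, 2]
print(overlap_bias(target_lp, target_hp, tail, tail))          # [0, 0, 0, 0]
print(overlap_bias(target_lp, target_hp, tail, [5, 3, 3, 1]))  # [0, 1, 0, 1]
```

Under this linear assumption, identical target sentences presented at the same rate in both conditions leave the difference wave uncontaminated, whereas a predictability effect already present on the pre-target word would enter the target-word difference directly.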
Perhaps the most compelling aspect of our findings is the confirmation of an early time course of effects in an experimental situation akin to reading within a natural discourse context. The onsets of the main effects of word frequency and contextual predictability replicated those reported by Hauk et al. (2006) in the P1 and by Dambacher et al. (2009) in an interval preceding the P1, respectively. We additionally demonstrated an interaction of frequency and predictability with a P1 onset that was sustained through the N400. These electrophysiological results can be used to constrain computational models of written language processing (for a review, see Barber & Kutas, 2007). There is growing evidence for rapid neural sensitivity and response to expectation in visual perception, in particular via an early top-down influence from orbitofrontal cortex (e.g. Trapp & Bar, 2015). The early time frame and topography of our effects suggest that context can influence the activation of low-level visual features, a component central to models of language processing (Grainger & Holcomb, 2009; Price & Devlin, 2011). Our findings of an early and robust frequency-predictability interaction provide strong evidence for interactive processing during lexical access in reading.

Notes
1. Our mean HP Cloze value was .61, which is in line with the HP conditions of eye movement reading studies that have examined frequency-predictability effects (see, e.g., Sereno et al., 2018). Moreover, the 95% confidence intervals (…) However, certain methodological issues should be considered. In Dambacher et al.'s Cloze procedure, participants were allowed three guesses of the upcoming target word (typically, only one is given), functionally increasing the likelihood that the target would be guessed. In Kretzschmar et al., context was not provided in a separate sentence, but was limited to the first few words of the sentence. As such, although the mean Cloze values for HP conditions were relatively high, there was an increased risk of context operating via intralexical priming rather than higher-level discourse processes (e.g., in Table 1, the words Sunday, nun, and pray closely precede the target church).
2. We performed analyses on the two alternative sets of materials to determine whether they differed in target length, frequency, or predictability, in the number of words preceding the target, or in the number of characters in the first context sentence. No differences between the two sets of materials were found on any of these dimensions (all Fs < 1).

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the Economic and Social Research Council [grant number RES-062-23-1900].