The role of word frequency and morpho-orthography in agreement processing

ABSTRACT Agreement attraction in comprehension (when an ungrammatical verb is read quickly if preceded by a feature-matching local noun) is well described by a cue-based retrieval framework. This suggests a role for lexical retrieval in attraction. To examine this, we manipulated two probabilistic factors known to affect lexical retrieval: local noun word frequency and morpho-orthography (agreement morphology realised with or without -s endings) in a self-paced reading study. Noun number and word frequency affected noun and verb region reading times, with higher-frequency words not eliciting attraction. Morpho-orthography impacted verb processing but not attraction: atypical plurals led to slower verb reading times regardless of verb number. Exploratory individual difference analyses further underscore the importance of lexical retrieval dynamics in sentence processing. These results provide evidence that agreement operates via a cue-based retrieval mechanism over lexical representations that vary in their strength and association to number features.


Introduction
Number is expressed in English on nouns, verbs, pronouns, and determiners. Expressing number on these words makes it clear which nouns control which verbs, allowing a comprehender to easily establish syntactic dependencies in sentences. The typical pattern in English is that plural nouns receive an -s ending, such that the plural of skirt is skirts; however, some nouns mismatch this pattern. For example, some words, like cactus or dress, match the typical plural pattern (-s) when singular and some words, like men and cacti, match the typical singular pattern (no -s) when plural. These morphological patterns further co-vary with word frequency, such that atypical plurals like men are frequent, while atypical plurals like cacti are infrequent. Both factors could lead to trouble with agreement, such that probabilistic co-occurrences between nouns and verbs and between noun spelling and verb inflection might cause infrequent, atypically spelled singular nouns like cactus to be misread as plural. In the current study, we investigate how morpho-orthography and word frequency impact number agreement processing. This investigation highlights the lexical properties that influence the processing of subject-verb dependencies and in turn sheds light on the underlying mechanisms of sentence processing and the representations these mechanisms operate over.

Cue-based retrieval mechanisms in agreement processing
In order to process agreement, a reader must retrieve and maintain lexical items, making a guess as to what noun will be the subject of a downstream verb from partial, continuously changing information. Though this processing is typically successful, mishaps occur. One common mishap is agreement attraction, where properties of an intervening but grammatically irrelevant noun influence readers' processing speed and acceptability judgments, as well as speakers' production difficulty. An example of this appears in sentences 1 and 2. When sentences contain an ungrammatical verb (were) that matches in number with a local (nonsubject) noun (cabinets, 1a), there is reduced processing difficulty compared to sentences where the local noun does not match the ungrammatical verb (1b; Pearlmutter, Garnsey, & Bock, 1999; Wagers, Lau, & Phillips, 2009). This means that ungrammatical verbs paired with a feature-matching local noun elicit a processing pattern more similar to a comparable grammatical sentence (2a vs. 2b).
(1a) *The key to the cabinets were rusty.
(1b) *The key to the cabinet were rusty.
(2a) The key to the cabinets was rusty.
(2b) The key to the cabinet was rusty.
The cue-based retrieval framework provides a set of mechanisms to account for agreement attraction in ungrammatical sentences (Lewis & Vasishth, 2005; McElree, Foraker, & Dyer, 2003; Van Dyke & McElree, 2011; for reviews, see Lewis, Vasishth, & Van Dyke, 2006; and Van Dyke & Johns, 2012; for meta-analysis, see Jäger, Engelmann, & Vasishth, 2017). The premise is that retrieval cues allow the reader to establish syntactic dependencies between words in sentences; if retrieval cues are weak, processing is more difficult and interference is more likely to occur during retrieval (e.g. Van Dyke, Johns, & Kukona, 2014).
For agreement, this means that encountering an unpredicted ungrammatical verb triggers a memory search for a noun that is a plausible controller for the verb. When a feature-matching non-subject noun is retrieved instead of the subject, a sentence can appear to be grammatical when it is not, eliciting a reduced penalty for ungrammatical verbs when an attractor is present (e.g. Dillon, Mishler, Sloggett, & Phillips, 2013; Lago, Shalom, Sigman, Lau, & Phillips, 2015; Martin, Nieuwland, & Carreiras, 2012; Tanner, Nicol, & Brehm, 2014; Wagers et al., 2009). This means that retrieval dynamics elegantly capture much of the data on how grammatical number affects number agreement.

Probabilistic factors in agreement
However, there is also clear evidence for probabilistic factors distinct from grammatical number impacting number agreement production. This has led to the development of probabilistic agreement models, providing an alternate theoretical framework accounting for attraction (e.g. Haskell, Thornton, & MacDonald, 2010; Mirković & MacDonald, 2013; Smith, Franck, & Tabor, 2018). Probabilistic factors known to impact agreement production include syntactic properties of the language, such as the degree of inflectional morphology (Foote & Bock, 2012; Franck, Vigliocco, Antón-Méndez, Collina, & Frauenfelder, 2008) and the reliability of inflectional morphology (Mirković & MacDonald, 2013). They also include conceptual properties, such as the likelihood of a number cue matching the number concept (e.g. Brehm & Bock, 2013; Eberhard, 1999; Haskell & MacDonald, 2003; Haskell et al., 2010; Humphreys & Bock, 2005; Mirković & MacDonald, 2013; Smith et al., 2018; Vigliocco, Butterworth, & Garrett, 1996), and morphophonological properties, such as the transparency of number marking in spoken forms (Haskell & MacDonald, 2003; Lorimor, Jackson, Spalek, & van Hell, 2016). These probabilistic cues tend to have small effects in isolation, often only reaching significance in combination with each other: in particular, effects of morphophonology and number marking are shown in combination with conceptual number.
Other types of interference also occur in processing. Interference can arise during encoding (Villata, Tabor, & Franck, 2018), such that more recent elements can overwrite older ones when they have similar semantic or morphological features. Encoding interference is not captured in classic cue-based retrieval models. However, as Villata et al. (2018) demonstrate, encoding interference and other similarity effects arise naturally as a consequence of lexically rooted, probabilistic structure assembly (SOSP) models, providing strong evidence that probabilistic factors are likely to be an essential piece of agreement comprehension.
While probabilistic models have to date mainly gained traction in agreement production research, we also note that what is produced has consequences for what is observed. This means that when a factor impacts agreement production, it also increases the likelihood of observing an error in comprehension. As such, a goal of the current study is to assess whether there is evidence for probabilistic factors in agreement comprehension, establishing a link at the lexical level between agreement in production and comprehension. This situates the current work within larger theoretical frameworks integrating language production and comprehension (e.g. Dell & Chang, 2014; Pickering & Garrod, 2013).

Linking cue-based retrieval and probabilistic agreement: the role of lexical retrieval
The larger goal of the current study is to assess the underlying mechanisms and representations behind agreement processing. There is a potential mechanistic link between cue-based retrieval and the mechanisms by which probabilistic factors affect language processing, as noted by Lorimor and colleagues (Lorimor et al., 2016) for agreement production and by Van Dyke and colleagues (Van Dyke et al., 2014; Van Dyke & Johns, 2012) for comprehension. The critical dimension is lexical retrieval itself.
As reviewed in the first section, lexical retrieval is the primary driver of agreement attraction in a cue-based retrieval framework. Ungrammatical verbs trigger the need to retrieve nouns from earlier in the sentence that have a matching plural feature. It is also the case that lexical retrieval is affected by the same sorts of probabilistic factors that impact agreement production. Work on single-word production has carefully examined the factors that impact lexical retrieval, which include number morphology (e.g. Baayen, Dijkstra, & Schreuder, 1997) and semantics (e.g. Abdel Rahman & Melinger, 2009; Damian, Vigliocco, & Levelt, 2001; see Strijkers & Costa, 2011, for a recent review of findings from multiple paradigms). Since these factors also influence agreement, the implication is that probabilistic variations in retrieval might provide coverage for a variety of findings within agreement, linking cue-based retrieval with probabilistic models of agreement. Investigating the role of lexical properties in agreement therefore extends and links these theoretical models of psycholinguistics.
We operationalised lexical retrieval difficulty in the current study with two probabilistic factors. The first factor was word frequency, manipulated by the frequency of inflected nouns: Infrequent words tend to be more difficult to retrieve (e.g. Bates et al., 2003; Jescheniak & Levelt, 1994). The second factor was morpho-orthography, as manipulated based upon the presence of -s endings crossed with plural morphology: Infrequently observed spelling patterns make lexical retrieval more difficult (e.g. Andrews, 1997; Coltheart, Davelaar, Jonasson, & Besner, 1977; Grainger, O'Regan, Jacobs, & Segui, 1989; Snodgrass & Mintzer, 1993).
For agreement, there is also a probabilistic relationship between spelling and number inflection, such that words that end in -s are typically plural, with the -s affix serving as the regular plural marker (skirts, boys, yuccas; but cf. dress, cactus), and such that words with no -s ending are typically singular (skirt, boy, yucca; but cf. men, cacti). Therefore, the presence of a misleading -s on local nouns could have an impact on attraction. Because it appears frequently as an independent morpheme, readers may aim to decompose the -s affix even when this is not licensed (e.g. as outlined in Rastle & Davis, 2008), leading to attraction effects. Similarly, the -s ending itself may have a direct but probabilistic impact on agreement processing (see, e.g. reduced attraction production for irregular -s absent plurals but only when paired with other cues; Haskell & MacDonald, 2003).
Morphology and frequency also tend to co-vary in English, such that irregular plurals are often extremely frequent (men) or extremely infrequent (cacti). This may have a strong impact on agreement, with interactive effects such that frequency effects are stronger for irregular words (e.g. Allen, Badecker, & Osterhout, 2003) and lower-frequency words are more likely to be stored in a decomposed fashion (e.g. Alegre & Gordon, 1999; Baayen et al., 1997). This means that morpho-orthography and word frequency are ideal to manipulate in tandem.
We assess the dynamics of lexical retrieval in agreement by examining processing effects separately on regions that relate to noun and verb processing. This is a way in which the present study differs from many previous studies of agreement attraction, which tend to focus on processing costs on the verb region only. This analysis approach allows us to observe whether there are differences in how lexical retrieval related factors affect processing of nouns and copular verbs. Processing both nouns and verbs is likely to require lexical retrieval, but the representations that retrieval operates upon could differ. Evidence for this claim comes from the fact that different factors affect reading times in the noun and verb regions of sentences (see Wagers et al., 2009). Consistent with increased lexical retrieval difficulty, plural nouns tend to elicit slower reading times as a main effect at the noun and following word. However, the effect disappears at later regions, such that reading times are slower for only ungrammatical verbs that are preceded by a local singular noun. This shows a dissociation between what affects noun processing (number marking) and verb processing (the presence of a plural in memory once retrieval has been triggered); the question raised in the current work is whether the same set of probabilistic factors affect both sentence regions. Answering this question shows whether lexical retrieval affects noun and verb processing equivalently, showcasing the way memory retrieval supports sentence processing.

Current study
In the present study, we examine whether word frequency and morpho-orthography, two factors that can hinder lexical retrieval, also impact subject-verb number agreement and the preceding processing of a local noun. We manipulated the frequency, morpho-orthography, and number of local nouns and the grammaticality of the following verb embedded in attraction-inducing sentences like "The landscaper who planted the cactus already was/*were anticipating the dry summer" (critical local noun and verb underlined, see Methods for more details) to test whether probabilistic retrieval cues modulate attraction. Local nouns were varied in their morpho-orthography (e.g. man/men; dress/dresses; cactus/cacti) and compared with frequency-matched typical morphology count nouns to isolate the role of frequency from the role of morpho-orthography in comprehension.
Predicted differences in the sentence's verb region can dissociate the relative roles of cue-based retrieval and probabilistic factors in agreement. A hypothesis derived from a strict, non-probabilistic cue-based retrieval framework is that only number will impact agreement. The cue-based retrieval framework suggests that agreement attraction occurs due to the retrieval of a plural noun (or plural feature), irrespective of how number was instantiated. As such, we predict that ungrammatical plural verbs preceded by a local singular noun will elicit longer reading times than those preceded by a local plural noun. This is the standard attraction pattern.
In contrast, if a reader uses any and all cues for number agreement, as suggested by a strong probabilistic model of agreement, an alternate hypothesis is that noun morpho-orthography will also lead to differences in verb reading time such that -s endings will increase attraction, with less attraction observed for men and cacti type items than matched controls. Such a pattern would suggest that probabilistic cues such as -s endings can lead to the perception of grammaticality. In the case that the quality of lexical representations leads to probabilistic agreement difficulty (as suggested by Van Dyke et al., 2014; Van Dyke & Johns, 2012), a third prediction is that infrequent nouns should lead to increased attraction, such that the most attraction is observed for the least frequent items. Such a pattern would suggest that the strength of noun representations impacts agreement.
In contrast to the varied hypotheses for the verb region, the noun region predictions are relatively simple: processing time in the noun region is likely to be driven by retrieval difficulty.¹ It is predicted that frequency, morpho-orthography, and number will all impact reading times on the local noun, given their established link to lexical retrieval. Previous work has shown that plural nouns require more processing time than singular nouns (see Wagers et al., 2009). We predict that this will also be the case for infrequent nouns, which may have a more weakly specified lexical representation, and nouns with misleading morpho-orthography (-s ending in singular form, like dress and cactus; absence of -s ending in plural, like men or cacti); both properties affect lexical retrieval, and as such, are predicted to affect noun reading times.

Participants
One hundred twenty-five participants were recruited from the University of Illinois at Urbana-Champaign community for $7 compensation or course credit. Data from three participants were excluded (one due to technical issues, one who refused to put away her phone during the experiment, and one who was an older adult).

Design
All participants completed five computerised tasks in the same order. Participants first performed a self-paced sentence reading task. This was followed by four individual-difference measures that were used in exploratory analyses that focus on individual differences in lexical representations and retrieval ability (a vocabulary test, a reading span test, a Stroop test, and verb generation, see Appendix A for details). The entire experimental session took approximately 1 hour to complete; individual difference measures were omitted for time as necessary to keep the session under an hour.
The main experiment had 48 critical stimuli like, "The landscaper who planted the cactus already was/*were anticipating the dry summer" that varied in the frequency, morpho-orthography (presence of a word-final -s), and grammatical number of local nouns embedded in subject-extracted relative clauses, paired with verbs varying in grammaticality (see Examples 1-3 in Table 1). All items contained an adverb before the verb, serving as a spillover region for the noun. This allows better dissociation of reading time effects based upon noun and verb processing (e.g. Wagers et al., 2009).
To manipulate local noun morpho-orthography, we varied whether items ended in -s in their singular and plural forms. These patterns also co-varied with word frequency. As can be seen in Table 1, items came in three atypical orthographic patterns: high frequency with no -s in either form (man/men), medium frequency with -s for both forms (dress/dresses), and low-frequency with an -s in the singular but not plural (cactus/cacti).
To separate the role of word frequency from morpho-orthography, each orthographically atypical item was paired with a fully regular, orthographically typical noun (control, see Table 1), matched across relevant properties as detailed in the Material Creation section below. Local noun orthographic type (atypical vs. control) was fully crossed with verb grammaticality (was (correct) vs. were (error)) and local noun number (plural vs. singular) for a total of eight versions of each item. These were distributed into eight lists such that one version of each item was presented to each participant and such that each list contained an equal number of items of each form.
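The distribution of eight item versions across eight lists can be illustrated with a standard Latin-square rotation. The sketch below is only an illustration under stated assumptions: the rotation rule, the function name `build_lists`, and the zero-based item numbering are hypothetical and are not taken from the study's materials, and the sketch does not attempt to balance conditions within frequency classes.

```python
from collections import Counter
from itertools import product

# The eight versions of an item: noun type x verb grammaticality x noun number.
CONDITIONS = list(product(("atypical", "control"),
                          ("grammatical", "ungrammatical"),
                          ("singular", "plural")))


def build_lists(n_items=48, conditions=CONDITIONS):
    """Rotate conditions across items so each list contains every item once."""
    n_lists = len(conditions)
    lists = {k: [] for k in range(n_lists)}
    for item in range(n_items):
        for k in range(n_lists):
            # Hypothetical rotation rule: shift the condition index by one per list.
            lists[k].append((item, conditions[(item + k) % n_lists]))
    return lists


lists = build_lists()
# Number of items per condition in the first list.
per_condition = Counter(condition for _, condition in lists[0])
```

With 48 items and eight conditions, this rotation gives each list six items per condition, and each item appears in a different version in each list.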
The 48 critical sentences were mixed with 60 fillers for a total of 108 sentences. Fillers included a mixture of plural and singular nouns with regular spelling contained in structures varying in difficulty. These included items containing noun phrases with prepositional phrase modifiers, items with subject- and object-extracted relative clauses containing copulas and lexical verbs, and locally ambiguous sentences containing DO- and SC-biased verbs. Most fillers were grammatical (52), with eight ungrammatical fillers. Items were pseudorandomized within each list using the program Mix (van Casteren & Davis, 2006) to prevent any within-condition repeats on back-to-back trials. A list of critical items can be found in Appendix B.

Material creation
Critical items were created with a search in the MRC Psycholinguistic database (Coltheart, 1981) for items that ended in -s in the singular form and/or did not end in -s in the plural form. We then extracted log frequencies for these items from the Corpus of Contemporary American English (COCA, Davies, 2008). Stimulus development was done in the spring of 2014; frequencies reflect numbers taken from the corpus at that time.
As noted above, the three orthographic classes of items varied inherently in word frequency, leading to creation of a control condition. Control items were matched for suitability in the same sentence context based upon the N-gram log frequency in COCA for the association of each head noun with the control and critical local noun (occurrence of local noun within nine words after head noun) and the N-gram log frequency for the association of the relative clause verb and local noun (occurrence of local noun within five words after the RC verb). In addition to matching suitability in context, we aimed to match the frequency of control versus atypical items, as done by minimising the difference in log frequency for each version of an item (see Table 2). We also statistically controlled for the word frequency of both singular and plural local nouns by including log local noun token frequency as a covariate in analyses. See Appendix B for frequency measurements by item.

Procedure
Participants read sentences in a non-cumulative, self-paced moving-window display (Just, Carpenter, & Woolley, 1982) presented using Paradigm software (Perception Research Systems, 2007). Sentences appeared on a single line with dashes replacing letters and punctuation but with spaces preserved between words. With each mouse click, a single word was revealed at a time, with all other words masked; participants were not allowed to revisit words of the sentence. Following the final word, a yes/no comprehension question appeared on a new screen. This gauged comprehension of information unrelated to the subject-verb agreement or local noun manipulation (e.g. "Did the summer tend to be dry?" for Sentence 3 in Table 1). Across the items in a list, odds of correct yes/no responses and the side of the screen on which yes or no appeared (left or right) were counterbalanced. Word-by-word reading times and comprehension question accuracy were recorded.
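The non-cumulative moving-window display described above can be sketched as a simple masking function. This is a minimal illustration rather than the Paradigm implementation; the function name `moving_window` and the zero-based `position` argument are assumptions made for the example.

```python
def moving_window(sentence, position):
    """Return the display string with only the word at `position` visible."""
    words = sentence.split(" ")
    shown = []
    for i, word in enumerate(words):
        # Each character of a masked word (letters and punctuation) becomes a dash;
        # spaces between words are preserved by the join below.
        shown.append(word if i == position else "-" * len(word))
    return " ".join(shown)


# One display per mouse click, revealing each word in turn.
displays = [moving_window("The key was rusty.", i) for i in range(4)]
```

For example, `moving_window("The key was rusty.", 1)` yields `--- key --- ------`.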

Data analysis
Self-paced reading performance was indexed by the amount of time elapsed before clicking the mouse to progress to the next word in the sentence. We excluded critical trials with incorrect responses to comprehension questions, which resulted in a loss of 8.97% of trials (range by participant: 4.76-25.68% incorrect trials). Of the correct response trials, we then removed word-by-word reading times that were less than 200 ms or greater than 3000 ms (2.53% of words, resulting in a loss of 0.49% of data) and word-by-word reading times more than three standard deviations above participants' mean reading times, resulting in a loss of an additional 2.53% of the data.² It is likely that these slow reading times reflect trials where a word was unknown to the participant or where the response time was slowed for reasons outside the process of interest.

Table 1 (Examples 2-3)
Example | Sentence | Local noun type | Frequency
2a | The celebrity who promoted the dress/dresses seldom was/were seen without a big entourage. | Critical | Medium
2b | The celebrity who promoted the skirt/skirts seldom was/were seen without a big entourage. | Control | Medium
3a | The landscaper who planted the cactus/cacti already was/were anticipating the dry summer. | Critical | Low
3b | The landscaper who planted the yucca/yuccas already was/were anticipating the dry summer. | Control | Low

Data analysis used Bayesian linear mixed-effects models in R (version 3.3.3, R Core Team, 2017) using the package brms (version 1.10, Bürkner, 2017). Bayesian analyses are concerned with the likely magnitude of effects rather than statistical significance, making them well-suited to quantifying evidence for various frameworks (see e.g. Nicenboim, Vasishth, Engelmann, & Suckow, 2018, for an application of Bayesian modelling to sentence processing; see Sorensen, Hohenstein, & Vasishth, 2016, for a tutorial). Here, our interest is in quantifying the size of parameters and the uncertainty around them. The size of reported betas reflects estimated effect sizes; betas with larger absolute values reflect larger effects. In the main text, we report the parameters for which the 95% Credible Intervals do not contain zero, which is analogous to the frequentist null hypothesis significance test: the parameter has a non-zero effect with high certainty. We also report any parameters for which the point estimate for the beta is about twice the size of its error, as this also provides evidence for an effect: the estimated effect is large compared to the uncertainty around it. We also report the posterior probability of these weak effects, indicating the proportion of samples with a value equal to or more extreme than the beta estimate.
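The reading-time trimming steps described at the start of this section can be sketched as follows. This is an illustration under assumptions, not the authors' analysis script: the function name `trim_reading_times` is hypothetical, and the sketch assumes that each participant's mean and standard deviation are computed on raw millisecond values that survived the absolute cutoffs, which the text does not specify.

```python
from statistics import mean, stdev


def trim_reading_times(rts_by_participant, low=200, high=3000, n_sd=3):
    """Apply absolute cutoffs, then a per-participant ceiling at mean + n_sd * SD."""
    kept = {}
    for participant, rts in rts_by_participant.items():
        # Step 1: drop reading times below 200 ms or above 3000 ms.
        in_range = [rt for rt in rts if low <= rt <= high]
        if len(in_range) < 2:
            kept[participant] = in_range
            continue
        # Step 2 (assumption): the ceiling uses the values remaining after step 1.
        ceiling = mean(in_range) + n_sd * stdev(in_range)
        kept[participant] = [rt for rt in in_range if rt <= ceiling]
    return kept


# Made-up reading times (ms) for one hypothetical participant.
example = {"p1": [150, 3500] + [300] * 19 + [2500]}
trimmed = trim_reading_times(example)
```

In this made-up example, the 150 ms and 3500 ms values fall to the absolute cutoffs and the 2500 ms value exceeds the participant's three-standard-deviation ceiling, leaving the nineteen 300 ms observations.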
All models had four chains and each chain had 12,000 or 14,000 iterations (listed in model output tables), with the first half representing a burn-in period. All models had weak normal priors with an SD of two, with the distribution centred at six for the intercept and centred at zero for all other parameters. This represents a weak expectation for an intercept that reflects the average reading time across the experiment, and a widely spread, weak expectation of null effects for all other parameters such that a wide range of effects, including minimal and extreme ones, would be consistent with the prior. All models were run until the R̂ value for each estimated parameter was 1.00, indicating full convergence.
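The reporting criteria applied to model parameters (a 95% credible interval excluding zero, an estimate that is large relative to its error, and the share of the posterior on one side of zero) can be illustrated on a vector of posterior draws. The sketch below is not brms output handling: the function name `summarise`, the simple index-based quantiles, and the choice to compute the posterior probability as the proportion of draws sharing the sign of the estimate (following how such values are described in the results) are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def summarise(draws):
    """Summarise posterior draws for one coefficient."""
    ordered = sorted(draws)
    n = len(ordered)
    estimate, error = mean(ordered), stdev(ordered)
    # Approximate 2.5% and 97.5% quantiles by index (no interpolation).
    lower = ordered[int(0.025 * n)]
    upper = ordered[int(0.975 * n) - 1]
    # Proportion of draws on the same side of zero as the point estimate.
    same_sign = sum((d > 0) == (estimate > 0) for d in ordered) / n
    return {
        "estimate": estimate,
        "error": error,
        "interval": (lower, upper),
        "interval_excludes_zero": lower > 0 or upper < 0,
        "estimate_to_error": abs(estimate) / error,
        "p_same_sign": same_sign,
    }


# Simulated draws for a hypothetical coefficient; not data from the study.
rng = random.Random(0)
simulated = [rng.gauss(0.07, 0.04) for _ in range(4000)]
summary = summarise(simulated)
```

A parameter would count as reliable when `interval_excludes_zero` is true, and as weaker evidence when `estimate_to_error` is close to two and `p_same_sign` is high.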
Log-transformed reading times were predicted in four regions. Reading times at the local noun and the following word (local noun spillover region) were used to assess the effects of predictors on retrieving and processing the local noun; reading times at the main verb and following word (main verb spillover region) were used to assess the effects of predictors on retrieving and processing the verb's controller.
We included two continuous predictors in our models. The first accounted for the frequency of local nouns: the log-transformed Local Noun Frequency (for plural and singular tokens separately) was centred and entered as a covariate in all models. In addition, the length of the word in the region of interest (Word Length, centred) was entered as a covariate in each model. Both factors were entered as main effects only.³
Categorical predictors included Local Noun Number (Singular, Plural), Local Noun Class (Low frequency = cactus/cacti/yucca/yuccas; Medium frequency = dress/dresses/skirt/skirts; High frequency = man/men/boy/boys), Local Noun Morpho-orthography (Control, Atypical), and Grammaticality of the verb (Grammatical, Ungrammatical). Contrasts for the two-level variables (Local Noun Number, Local Noun Morpho-orthography, and Grammaticality) were always −.5 and .5. Local Noun Class was coded with a simple effects contrast-coding scheme, which tests pairwise comparisons between levels while preserving the interpretation of main effects, such that the intercept corresponds to the mean of cell means. The first contrast compared the two Noun Classes containing irregular plurals, with the following coefficients assigned to each level (Low: cactus/cacti/yucca/yuccas = −1/3 versus High: man/men/boy/boys = 2/3; Medium: dress/dresses/skirt/skirts = −1/3), and the second compared the two Noun Classes ending in -s in the singular, with the following coefficients assigned to each level (Low: cactus/cacti/yucca/yuccas = −1/3 versus Medium: dress/dresses/skirt/skirts = 2/3; High: man/men/boy/boys = −1/3). Effects of Local Noun Class without taking into account Local Noun Morpho-orthography reflect properties of binned word frequency, with comparisons between low and high frequency nouns in the first contrast and comparisons between low and medium frequency nouns in the second. In interaction with Local Noun Morpho-orthography, the Local Noun Class contrasts reflect the effect of type frequency for each of the morpho-orthographic patterns. The first Local Noun Class contrast represents the effect of irregularly-marked nouns versus typical controls, and the second represents the effect of the local -s for atypical nouns versus typical controls. In addition to these contrasts, we also created 95% credible intervals for relevant pairwise comparisons using the packages emmeans, version 1.3.3 (Lenth, 2019) and tidybayes, version 1.0.4 (Kay, 2019).
The random effect structure for the noun-region models included random intercepts for Participants and Items, random slopes of Local Noun Number, Local Noun Class, Local Noun Morpho-orthography, Local Noun Frequency, and Word Length by Participants, and random slopes of Local Noun Number and Local Noun Morpho-orthography by Items. The random effect structure for the verb-region models contained all of the same predictors, as well as random slopes of Grammaticality by Participants and Items.
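The properties of the Local Noun Class coding described in this section (an intercept equal to the mean of cell means, and coefficients equal to pairwise differences from the Low class) can be checked with a short worked example. The sketch below uses exact fractions; the cell means are made-up log reading times and the function names are hypothetical.

```python
from fractions import Fraction as F

# Contrast coefficients per level: (first contrast, second contrast).
CODING = {
    "low":    (F(-1, 3), F(-1, 3)),
    "high":   (F(2, 3),  F(-1, 3)),
    "medium": (F(-1, 3), F(2, 3)),
}


def cell_mean(level, intercept, b_high_vs_low, b_medium_vs_low):
    """Cell mean implied by the coding for a given set of coefficients."""
    c1, c2 = CODING[level]
    return intercept + c1 * b_high_vs_low + c2 * b_medium_vs_low


def coefficients(means):
    """Coefficients implied by the coding for a given set of cell means."""
    intercept = sum(means.values()) / len(means)
    return (intercept,
            means["high"] - means["low"],
            means["medium"] - means["low"])


# Made-up cell means on the log scale, for illustration only.
means = {"low": F(61, 10), "high": F(59, 10), "medium": F(6)}
coefs = coefficients(means)
round_trip = {level: cell_mean(level, *coefs) for level in means}
```

Here `coefs` is the mean of the three cell means followed by the High minus Low and Medium minus Low differences, and `round_trip` reproduces the original cell means, which confirms that each contrast coefficient is a pairwise difference while the intercept remains the grand mean.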

Local noun
See Figure 1 for log-transformed reading times by Word Length and Local Noun Frequency; see Figure 2(a) for mean log-transformed reading times by Local Noun Number, Local Noun Class, and Local Noun Morpho-orthography. Outputs of Bayesian mixed effect models appear in Table 3. Word Length (β = 0.024, SE = 0.003, 95% credible interval = [0.018, 0.030]) and log Local Noun Frequency (β = −0.010, SE = 0.003, 95% credible interval = [−0.016, −0.004]) both reliably impacted reading time, such that longer and infrequent words were read more slowly.

Local noun spillover
See Figure 1 for log-transformed reading times by Word Length and Local Noun Frequency; see Figure 2(b) for mean log-transformed reading times by Local Noun Number, Local Noun Class, and Local Noun Morpho-orthography. Outputs of Bayesian mixed effect models appear in Table 3. Again, Word Length impacted reading time (β = 0.017, SE = 0.004, 95% credible interval = [0.008, 0.025]), such that longer words were read more slowly, as did log Local Noun Frequency (β = −0.021, SE = 0.004, 95% credible interval = [−0.028, −0.014]) such that the word following an infrequent noun was also read more slowly. There was also evidence for an interaction between Local Noun Number and Local Noun Morpho-orthography (β = 0.039, SE = 0.018, 95% credible interval = [0.004, 0.073]), such that the word following atypical plural nouns (cacti/dresses/men) was read more slowly than would be expected from either marginal effect (95% credible interval for pairwise difference between control and atypical plurals = [...]).

Verb
[...] Table 4. In this region, there was evidence for an interaction between Local Noun Number and Local Noun Morpho-orthography (β = 0.047, SE = 0.016, 95% credible interval = [0.016, 0.078]), such that verbs following atypical plural nouns (cacti/dresses/men) were read more slowly than their typical counterparts compared to what would be predicted from the marginal effects alone (95% credible interval for pairwise difference between control and atypical plural nouns = [−0.059, −0.011]). There was also moderate evidence for an interaction between Verb Grammaticality, Noun Number, and the first Local Noun Class contrast (β = 0.070, SE = 0.040, 95% credible interval = [−0.008, 0.147]); note that while the 95% credible interval contains zero, the point estimate of the beta is high relative to the error around it, and 96% of the posterior distribution around the estimated effect is greater than zero. The pattern was that grammatical verbs following singular high-frequency nouns (man/boy) were read more slowly than grammatical verbs following plural or singular low-frequency nouns (cactus/cacti/yucca/yuccas, see Figure 3).

Verb spillover
See Figure 1 for log-transformed reading times by Word Length and Local Noun Frequency; see Figure 3(a) for mean log-transformed reading times by Grammaticality, Local Noun Number, Local Noun Class, and Local Noun Morpho-orthography. Outputs of Bayesian mixed effect models appear in Table 4. Word Length impacted reading time (β = 0.016, SE = 0.003, 95% credible interval = [0.012, 0.021]), such that longer words were read more slowly.
There was also evidence for a main effect of Verb Grammaticality (β = 0.076, SE = 0.009, 95% credible interval = [0.058, 0.094]), such that the spillover region after ungrammatical verbs was read more slowly. This was supported by moderate evidence for an interaction between Verb Grammaticality and Noun Number (β = −0.031, SE = 0.016, 95% credible interval = [−0.062, 0.001]); while the 95% credible interval contains zero, the point estimate of the beta is high relative to the error around it, and 97% of the posterior distribution around the estimated effect is below zero. The pattern was the typical "illusion of grammaticality", such that the spillover region after ungrammatical plural verbs following local singular nouns was read more slowly than those following plural nouns (95% credible interval for pairwise difference = [−0.050, −0.004]).
Verb Grammaticality and Local Noun Number also interacted with the first Local Noun Class contrast, showing modulation of attraction by word frequency (β = 0.092, SE = 0.040, 95% credible interval = [0.013, 0.170]). The pattern was such that the highest-frequency words elicited the least attraction, with no reliable verb spillover reading time difference between the singular (man/boy) and plural (men/boys) ungrammatical conditions (point estimate of singular-plural difference = 0.026; 95% credible interval for pairwise difference = [−0.020, 0.072]). While the second Local Noun Class contrast did not interact reliably with Verb Grammaticality and Local Noun Number, pairwise comparisons also indicate that the attraction effect observed in the aggregate was carried by the least frequent items (ungrammatical verbs following cactus/yucca, versus cacti/yuccas; point estimate of singular-plural difference = −0.063, 95% credible interval = [−0.109, −0.017]). In contrast, although the point estimate suggests attraction in the middle-frequency items (point estimate of singular-plural difference = −0.032), the 95% credible interval for the pairwise difference contains zero (95% CrI = [−0.077, 0.013]).
Finally, we also observed interactions between Morpho-orthography and Frequency class, showing the specific effect of atypical local noun plural marking on reading time in the verb spillover region. Importantly, these interactions did not involve Verb Grammaticality, indicating that they do not reflect attraction. We observed a two-way interaction between Morpho-orthography and the second Local Noun Class contrast (β = 0.049, SE = 0.022, 95% credible interval = [0.005, 0.092]) such that atypical nouns of the "dress/dresses" type led to faster verb spillover reading times, relative to typically spelled, frequency-matched items in the same sentence frames, than the marginal effects would predict, with no reliable pairwise difference between control and atypical nouns in the "dress/dresses" condition (95% credible interval = [−0.052, 0.009]). We also observed a three-way interaction between Morpho-orthography, Noun Number, and the first Local Noun Class contrast (β = 0.090, SE = 0.041, 95% credible interval = [0.009, 0.168]) such that singular atypical nouns of the "cactus" type led to verb spillover reading times as fast as those for regular items such as "yucca" (95% credible interval for difference = [−0.046, 0.035]), whereas the other atypical nouns in the contrast (man/men/cacti) led to faster verb spillover reading times than their frequency-matched regular counterparts (see Figure 3).

Exploration of individual differences in agreement processing
To further examine the mechanisms and representations supporting agreement, we now turn to exploratory analyses of individual differences. Cue-based retrieval suggests that processing agreement requires activating and deploying information about lexical items. Other research suggests that the underlying cognitive mechanisms of lexical processing, memory, and executive function may differ between individuals. Previous work on similar processing phenomena (Van Dyke et al., 2014) showed vocabulary size to be an important predictor of retrieval ability. Working memory and executive control have also been linked to agreement processing, and these have also been critically linked to other domains of sentence processing (Badre & Wagner, 2007; Christianson, Williams, Zacks, & Ferreira, 2006; Daneman & Carpenter, 1980; Hussey et al., 2017; Hussey & Novick, 2012; Hussey, Ward, Christianson, & Kramer, 2015; January, Trueswell, & Thompson-Schill, 2009; King & Just, 1991; MacDonald, Just, & Carpenter, 1992; Novick, Trueswell, & Thompson-Schill, 2005; Snyder & Munakata, 2008; Swets, Desmet, Hambrick, & Ferreira, 2007; Vuong & Martin, 2014; Ye & Zhou, 2009). Given the plausible role for these cognitive mechanisms in agreement processing, and given the large sample of participants used in this experiment, we decided to use any time remaining in the 1-hour experimental session after the self-paced reading study to collect four continuous individual difference measures (Vocabulary Size, Reading Span, Verb Selection Cost, and Stroop Cost; see Appendix A for methods). We hypothesised that these factors might contribute to agreement processing or reading times on irregular nouns. We predicted that high vocabulary, high working memory, low verb selection cost, and low Stroop cost might lead to faster reading times overall and might especially do so when number cues conflict. 
We also included the number of trials each participant got wrong as a predictor for reading time in the successfully-answered trials, providing an index of individual reading ability and/or attention to the reading task.
To each of the four baseline models described in the previous sections, we added each of the five centred continuous predictors in turn, in one model as a main effect only and in another as a main effect plus all of its interactions with other predictors. We used the leave-one-out information criterion (LOOIC), calculated using the brms function loo(model1, model2), to assess whether adding individual difference variables improved model fit. This function estimates the predictive accuracy of a model under leave-one-out cross-validation, that is, how well the model predicts each observation when that observation is held out; as with other information criteria, the smaller value indicates the preferred model. We report the difference between the LOOIC for the original model and the LOOIC for the model with added predictors; a negative value indicates that the original model is preferred, while a positive value indicates that the model containing the individual difference factor is preferred.
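The sign convention for the reported LOOIC difference can be made concrete with a short sketch. It assumes that each model supplies one leave-one-out log predictive density per observation; the toy values and function names below are invented for illustration and are not taken from the fitted models. The standard error shown is one common estimate based on the pointwise differences, included here as an assumption about how uncertainty might be summarised.

```python
import math
from statistics import variance


def looic(pointwise_elpd):
    """LOOIC on the deviance scale: -2 times the summed pointwise
    leave-one-out log predictive densities (smaller is better)."""
    return -2.0 * sum(pointwise_elpd)


def looic_difference(baseline_elpd, augmented_elpd):
    """Return (baseline LOOIC - augmented LOOIC, standard error).

    A positive difference favours the augmented model, matching the
    convention described in the text.
    """
    if len(baseline_elpd) != len(augmented_elpd):
        raise ValueError("models must be scored on the same observations")
    # Pointwise contributions to the LOOIC difference.
    diffs = [-2.0 * (b - a) for b, a in zip(baseline_elpd, augmented_elpd)]
    n = len(diffs)
    se = math.sqrt(n * variance(diffs))
    return sum(diffs), se


# Hypothetical pointwise values for four observations.
baseline = [-1.2, -0.9, -1.5, -1.1]
augmented = [-1.0, -0.8, -1.3, -1.1]
difference, se = looic_difference(baseline, augmented)
print(difference, se)
```

In this toy case the augmented model predicts three of the four held-out observations better, so the difference is positive and the augmented model is preferred, although the difference should be read against its standard error.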
In the noun region (see Table 4), there was weak evidence for Stroop Cost impacting reading time. Adding Stroop Cost as a main effect only to this model tended to improve model fit (β = 0.547, SE = 0.283, 95% credible interval = [−0.017, 1.094]), such that participants who suffered more interference in the Stroop task also had slower reading times on local nouns, which tentatively suggests that cognitive control affects retrieval ability. In the noun spillover region (see Table 4), the number of excluded trials also had an impact on reading time (β = −0.012, SE = 0.010, 95% credible interval = [−0.031, 0.007]), where individuals with more excluded trials tended to have faster reading times; this is likely because the excluded trials tended to be the ones with the slowest reading times, so excluding them made the condition averages smaller. There was no evidence for any of the other factors affecting reading time in the noun or noun spillover regions.
In the verb region, Vocabulary Size improved model fit as a main effect only (β = −0.222, SE = 0.146, 95% credible interval = [−0.508, 0.066]) such that individuals with larger vocabulary scores tended to have faster verb reading times. This is consistent with the findings of Van Dyke et al. (2014), supporting their argument that a large vocabulary contains lexical items with strong representations and that this influences retrieval in sentence processing, but this has no impact on attraction per se. There was no evidence for any of the other factors affecting reading time in the verb or verb spillover regions.

Discussion
The present study examined the role of local noun frequency and morpho-orthography in agreement processing. Broadly speaking, we found that frequency impacts both noun retrieval and agreement attraction, with the most attraction observed for low-frequency local nouns. In contrast, morpho-orthography impacts noun retrieval and creates local verb processing difficulty that is distinct from attraction. This supports cue-based retrieval models as well as the role of probabilistic factors in retrieval, demonstrating that both sets of mechanisms contribute to agreement comprehension. It also clearly shows that while many factors impact lexical retrieval, it is number that impacts number agreement. This in turn implies that it is number features that are retrieved to resolve agreement, not the full representation of the controlling noun. We unpack each finding and its contribution to cue-based retrieval and probabilistic understandings of agreement below.
In the local noun region and following spillover word, we showed that individuals tended to slow down for low-frequency local nouns, consistent with the slower retrieval of low-frequency nouns shown in single-word tasks (e.g. Baayen et al., 1997; Bates et al., 2003; Jescheniak & Levelt, 1994). Noun number also impacted noun reading times, with individuals slowing down on words following atypical plural nouns. This is broadly consistent with effects of number morphology on lexical retrieval (e.g. Andrews, 1997; Coltheart et al., 1977; Grainger et al., 1989; Snodgrass & Mintzer, 1993). Together, these findings indicate the importance of interference in processing nouns embedded in sentences, which is fully consistent with the difficulty of reading nouns of these types in isolation. Nouns that are harder to retrieve due to their low frequency ("yuccas") or atypical number specification ("cacti") are generally harder to process, which affects reading times in sentences.
At the verb spillover region, we showed evidence for the typical attraction effect, such that ungrammatical verbs following local plural nouns elicited faster processing than those following local singular nouns. This "illusion of grammaticality" is the core agreement attraction pattern, and it replicates previous work on the comprehension of subject-verb agreement (e.g. Dillon et al., 2013; Lago et al., 2015; Martin et al., 2012; Tanner et al., 2014; Wagers et al., 2009). In this region, noun morpho-orthography did not impact attraction, consistent with work in production (Bock & Eberhard, 1993; Haskell & MacDonald, 2003). The lack of an effect of morpho-orthography at the verb processing regions combined with the effect observed at the noun regions is of critical importance for several reasons.
First, the literature on single-word processing suggests that readers may aim to decompose the -s affix even when this is not licensed (e.g. Rastle & Davis, 2008), especially for low-frequency irregular nouns (e.g. Alegre & Gordon, 1999; Baayen et al., 1997). Our results show that while this might occur at the local noun, it has no consequences for attraction. The implication is that unlicensed ("pseudo") morphological decomposition effects are transient during sentence processing, and the correct decomposition of words dominates number agreement.
Second, given that only number features matter for verb processing, the implication is that a full representation of the local noun is not retrieved at the verb region; only the relevant number feature is. Number features on nouns conveyed with any type of morphology are the main driver of agreement attraction. If, as suggested by Wagers et al. (2009), attraction occurs when an ungrammatical verb triggers re-analysis, this suggests that it is plural features that are drawn from memory and create the observed illusion of grammaticality. This provides an important boundary on cue-based retrieval in comprehension: features of words, not words themselves, are retrieved.
Importantly, attraction was also qualified by local noun frequency: the most attraction occurred for low-frequency nouns (cacti/yuccas), and there was no evidence for attraction with high-frequency nouns regardless of morpho-orthography (men/boys). This provides further support that difficulty in lexical retrieval is the main driver of attraction, which is consistent with the current view of attraction as operating under a cue-based retrieval framework. It also suggests that attraction does not always occur as a last-resort strategy: instead, fast processing might induce a shallow or underspecified parse of the verb (e.g. as in good enough processing, see Christianson et al., 2006, among others). This suggests that not all local nouns trigger re-analysis: easy-to-retrieve local nouns may induce the reader to proceed without a full representation of the sentence structure. This finding is highly novel, and serves to integrate the "illusion of grammaticality" with the pattern that not all readers process structure veridically: both mis-retrieval and processing without full structure-building allow readers to extract meaning from sentences containing errors.
Though it did not impact attraction, morpho-orthography did lead to processing difficulty in the verb and verb spillover regions. First, at the verb region, atypical plural nouns led to processing difficulty regardless of verb grammaticality. Taken in combination with an observed interaction between verb grammaticality, noun number, and noun class, this shows that atypical plural nouns of the "cacti" and "dresses" type tended to elicit slower verb reading times, and atypical plural nouns of the "cacti" type tended to elicit slower verb spillover reading times. This suggests that the atypical morpho-orthography on these nouns has downstream consequences for processing such that later verbs require more processing time regardless of number. Such a pattern fits with a probabilistic agreement model (e.g. Haskell et al., 2010; Mirković & MacDonald, 2013; Villata et al., 2018). Nouns with atypical morpho-orthography may have an association to both plural and singular number features due to their actual number marking and their pseudo-morphological pattern (e.g. Alegre & Gordon, 1999; Baayen et al., 1997; Rastle & Davis, 2008), leading to later agreement trouble. The critical implication is that while plural number features drive agreement attraction, the activation of these plural number features may in part be probabilistically determined. This means that any successful model of agreement needs to allow for graded representations of number features.
Individual differences in sentence processing are small but consistent
There were few effects of individual differences on sentence reading times, which suggests that as a whole, while reading long, complex sentences requires some effort and may be perceived as difficult, it does not tend to max out individuals' cognitive abilities. This is consistent with the wider literature on individual differences in sentence processing, which shows that many underlying abilities co-vary, and many underlying abilities relate to variations in processing. The implication is that there are multiple mechanisms that allow us to overcome cognitive difficulty when comprehending language.
We did find evidence for two effects on reading times within the sentence that are consistent with earlier work. Mirroring the finding that executive control impacts the rate of inhibition of local nouns (e.g. Vandierendonck et al., 2017; Veenstra et al., 2018), we showed that Stroop cost impacted reading times on the local noun region. This suggests the importance of resolving interference when processing local nouns, and is broadly consistent with new findings of encoding interference during noun processing (e.g. Villata et al., 2018).
In addition, consistent with Van Dyke and colleagues (Van Dyke et al., 2014; Van Dyke & Johns, 2012), we showed that vocabulary size impacted verb reading times. The implication is that individuals with larger vocabularies tend to have higher-quality lexical representations, and these lead to improved processing. Individuals with large vocabularies tend to be faster readers overall (e.g. Mainz, Shao, Brysbaert, & Meyer, 2017). The current work suggests that large vocabularies may lead to relative speed in processing particularly difficult regions of sentences, such as encountering a verb with an unclear controller.
Reconciling probabilistic representations with cue-based retrieval
In the introduction, we noted some probabilistic factors that affect agreement production, making some errors more likely to be observed by a comprehender. The present data show the important role of word frequency, a probabilistic factor, on agreement attraction, consistent with the experience-based account of agreement production by MacDonald and colleagues (Haskell et al., 2010; Mirković & MacDonald, 2013). Meanwhile, morpho-orthography, another probabilistic factor, had minimal impact, mirroring the weak effects of morpho-orthography and phonology on agreement in the domain of language production (e.g. Bock & Eberhard, 1993; Haskell & MacDonald, 2003). Combined with the work reviewed in the introduction, the implication is that lexical properties affect production and comprehension similarly: lexical retrieval is an important part of agreement. This provides evidence that the representations and mechanisms operating at the word (or lemma) level are shared across sentence comprehension and production, as in current integrated frameworks (e.g. Dell & Chang, 2014; Pickering & Garrod, 2013).
To reconcile the role of probabilistic factors with the cue-based retrieval framework, we might view our results in light of a feature-based view of memory, such as MINERVA (e.g. Hintzman, 1984), where representational strength impacts the ease of memory retrieval. Previous literature (e.g. Wagers et al., 2009) has suggested that an ungrammatical plural verb cues the retrieval of a plural noun controller; these data suggest that when processing grammatical and ungrammatical verbs, nouns with an unclear feature specification may be hard to retrieve as agreement controllers, and when processing ungrammatical verbs, high-frequency local nouns may be resistant to mis-retrieval.
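To make the idea of representational strength in a MINERVA-style memory more tangible, the sketch below follows the general scheme of Hintzman's MINERVA 2: each encountered token leaves its own trace, a probe activates every trace in proportion to the cube of its similarity, and the summed activation (the echo intensity) indexes how strongly memory responds. The feature vectors, token counts, and function names are hypothetical choices made for illustration, encoding is assumed to be perfect, and the sketch is not a model fitted to the present data.

```python
def similarity(probe, trace):
    """Dot product of probe and trace, normalised by the number of
    features that are non-zero in either vector."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def echo_intensity(probe, memory):
    """Summed activation across all traces; activation is similarity cubed."""
    return sum(similarity(probe, trace) ** 3 for trace in memory)


# Hypothetical feature vectors (+1 / -1), chosen to be orthogonal so that
# the two nouns do not activate each other's traces.
BOYS = [1, 1, -1, 1, -1, 1]
YUCCAS = [-1, 1, 1, -1, -1, 1]

# Assumed token counts: a frequent noun leaves many traces, a rare one few.
memory = [BOYS] * 20 + [YUCCAS] * 2

print(echo_intensity(BOYS, memory))    # response to the frequent noun
print(echo_intensity(YUCCAS, memory))  # response to the infrequent noun
```

Because every stored token contributes its own trace, the frequent noun yields a far stronger echo than the infrequent one. This is one way to express the claim that representations vary in quality with the number of tokens encountered; how such strength translates into resistance to mis-retrieval at the verb is a further modelling question that the sketch does not address.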
As a whole, this means that the data support a cue-based retrieval framework where words have lexical representations that vary in quality based upon how many tokens have been encountered. This might be modelled by making the structural or lexical representations that cue-based retrieval operates upon a gradient (see e.g. Brehm & Goldrick, 2017, for a gradient model of structure), or by appealing to an SOSP framework (e.g. Smith et al., 2018; Villata et al., 2018). Importantly, the core finding is that lexical properties such as word frequency and morpho-orthography impact the representation of words, while plural features exist as either a separately represented or more strongly represented aspect of each word. This means that probabilistic factors impact agreement and can lead to a general slowing as words are read. However, what is most critical for processing number agreement is simply number.

Conclusion
The present work shows how agreement relies upon cue-based mechanisms and probabilistic, graded representations. While number features drive the fortunes and misfortunes of number agreement, lexical properties that affect the robustness of the noun's representation can protect from attraction, making lower-frequency words like cacti or yuccas better attractors than high-frequency words like men or boys. This means that future accounts of number agreement processing need to account for the quality of noun lexical representations as well as the mechanisms operating on them.