Both Semantic Diversity and Frequency Influence Children’s Sentence Reading

ABSTRACT Semantic diversity – a metric that captures variations in previous contextual experience with a word – influences children’s lexical decision and reading aloud. We investigated the effects of semantic diversity and frequency on children’s reading of words embedded in sentences, while eye movements were recorded. If semantic diversity and frequency reflect different aspects of experience that influence reading in different ways, they should show independent effects and perhaps even different processing signatures during reading. Forty-nine 9-year-olds read sentences containing high/low frequency and high/low diversity words, manipulated orthogonally. We observed main effects of both variables, with high frequency and high semantic diversity words being read more easily. These results show that variations in the amount and nature of contextual experience influence how easily words are processed during reading.

the local lexical environment of a word via its co-occurrence with other words in an adjacent window across a large corpus. Termed contextual distinctiveness, this was a better predictor of lexical decision latencies than frequency. To capture the content of contexts more fully, Hoffman, Ralph, and Rogers (2013) used latent semantic analysis to derive semantic diversity (for a related approach, see Jones, Johns, & Recchia, 2012). A word's semantic diversity value corresponds to the mean distance in multi-dimensional space between all the contexts it appears in across a corpus. An example of a low diversity word is spinach: this tends to occur in a limited range of contexts, all relating to food (Hoffman & Woollams, 2015). In contrast, a high diversity word like chance appears in a range of different contexts. Put simply, spinach provides a reasonable clue as to the content of a context whereas chance does not. Contextual knowledge builds over time as each encounter a person has with a word adds to their database of information about that word. Given greater contextual variation, words high in semantic diversity are likely to map to a range of multiple or nuanced meanings, based on the notion that variation in the meaning of a word is an emergent property of variation in the context in which it is used. Adults find high semantic diversity words easier to identify (e.g., Hoffman et al., 2013; Hoffman & Woollams, 2015; Jones et al., 2012, 2017), as do children (Hsiao & Nation, 2018). This facilitative effect cannot be explained by other important predictors such as frequency, document count and age-of-acquisition. Instead, it shows that variation in the content of the contexts a word has been experienced in previously influences the ease with which it is subsequently identified. As a result, models of lexical processing need to consider learning and the environment in which a word is learned.
Accordingly, the Semantic Distinctiveness Model (Jones et al., 2017) assumes that each exposure to a word provides an opportunity for its lexical representation to be updated. If a word appears in a different context, new information is encoded as it is updated. Over time, words occurring in varying contexts with varying contents become more context-independent than words appearing in more similar contexts, allowing them to be identified more easily.
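To make the definition of semantic diversity given above concrete, the following minimal sketch computes the mean pairwise distance between the contexts in which a word occurs. It is an illustration only: the three-dimensional context vectors are invented, cosine distance is assumed as the distance measure, and the function names are hypothetical. The published measure is derived from latent semantic analysis over a large corpus and involves further steps that are not reproduced here.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_pairwise_distance(context_vectors):
    """Average the distance over every pair of contexts containing the word."""
    pairs = list(combinations(context_vectors, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Hypothetical context vectors (assumed, not taken from any corpus).
# "spinach": contexts cluster together (all about food).
spinach_contexts = [(0.9, 0.1, 0.0), (0.8, 0.2, 0.1), (0.95, 0.05, 0.0)]
# "chance": contexts point in very different directions.
chance_contexts = [(0.9, 0.1, 0.0), (0.1, 0.9, 0.1), (0.0, 0.2, 0.9)]

spinach_diversity = mean_pairwise_distance(spinach_contexts)
chance_diversity = mean_pairwise_distance(chance_contexts)
```

Under these toy vectors the word whose contexts are spread more widely receives the larger value, which is the sense in which a high diversity word is less informative about the content of its context.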
We used an eye movement paradigm to chart the time course of semantic diversity effects during silent sentence reading. By monitoring the pattern of fixations and saccades as people read, the eye movement record can show whether a particular variable influences "early" (word identification) or "later" (meaning integration) stages of processing (Liversedge, Paterson, & Pickering, 1998). Building on a wealth of evidence documenting skilled reading (Rayner, 2009), it is clear that different aspects of eye movement behavior reflect different cognitive processes in children's reading too (e.g., Blythe & Joseph, 2011). Thus, monitoring eye movements as children read words that vary in semantic diversity should inform when and where in processing semantic diversity exerts its influence.
Frequency effects are seen early in processing in children's reading with shorter fixation durations on high frequency words than low frequency words (Blythe et al., 2009; Joseph et al., 2013). Semantic diversity has not been investigated directly using eye movements. However, Plummer, Perea, and Rayner (2014) monitored eye movements as adults read sentences containing words varying in contextual diversity (instantiated as document count) and frequency. High document count words were fixated for less time, even when frequency was controlled, and document count subsumed any effect of frequency. They also reported a post-hoc analysis to explore the effect of semantic diversity on a subset of items. In contrast to document count, semantic diversity was not associated with early processing measures. However, high semantic diversity was associated with slower processing on later measures (go-past and total time), consistent with variations in semantic diversity influencing word-to-text integration, rather than word identification. These preliminary findings support the view that semantic diversity is distinct from both frequency and document count in terms of its processing signature.
In this experiment, we examined the time course of frequency and semantic diversity in children. We embedded words that varied orthogonally on the two constructs into the same neutral sentences. This allowed us to investigate their effects on word processing in natural reading, while controlling for any effects of sentential context. Consistent with previous work, we predicted that frequency effects would be seen early in processing. Our prediction for semantic diversity was less clear: If it influences word identification, it should affect early processing; alternatively, if it reflects discourse-level processing, its influence should emerge later in processing.

Participants
Fifty-three Year 5 children, recruited from two primary schools in Oxford, took part in this experiment. Children's legal guardians gave their informed consent and children gave their assent prior to their inclusion in the study. The Test of Word Reading Efficiency (Torgesen, Rashotte, & Wagner, 1999) was used to exclude any child with below average reading (standard score <85). Three children were excluded on this basis, and one for problems with calibration. The final sample comprised 49 children (M = 9.5 years, SD = 0.5) with normal or corrected-to-normal vision and no history of reading difficulties, and average-to-good reading (M = 115, SD = 13; range = 90-141). This experiment was approved by Oxford University's Research Ethics Committee and conforms to the recognized standards of the Declaration of Helsinki.

Materials
We selected 160 words (4-9 letters, M = 6.7, SD = 1.1) based on their frequency (Zipf; log10 of frequency per billion words) and semantic diversity values, taken from the Oxford Children's Corpus (see Hsiao & Nation, 2018). Diversity and frequency were manipulated orthogonally, with 40 words in each condition. In addition, forty sentence frames were constructed, each with a slot for a target word. As shown in Table 1, the same sentence frame was used to present four different words, one from each condition (see Supplementary Materials). Target word length was matched across each quartet, allowing for a maximum of one letter difference.
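As a worked illustration of the Zipf scale used for frequency (log10 of frequency per billion words), the short sketch below converts a raw corpus count into a Zipf value. The function name, the counts and the corpus size are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from the Oxford Children's Corpus.

```python
import math


def zipf_value(count, corpus_size_words):
    """Convert a raw count into log10(frequency per billion words)."""
    per_billion = count / corpus_size_words * 1e9
    return math.log10(per_billion)


# Hypothetical example: a word seen 1,000 times in a 10-million-word corpus
# occurs 100,000 times per billion words.
common_word = zipf_value(1_000, 10_000_000)

# A word seen 10 times in the same corpus occurs 1,000 times per billion words.
rare_word = zipf_value(10, 10_000_000)
```

Because the scale is logarithmic, each step of one Zipf unit corresponds to a tenfold change in how often a word occurs.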
Table 1. Frequency and semantic diversity values for target words in each condition. Also shown is an example sentence and performance (% correct) on the comprehension question by condition.
Note. Embedded target words are shown in italics (not in the experiment). See supplementary materials for a full list of sentences and target words. T-test comparisons revealed statistically significant differences in frequency between high and low frequency target words (t = 20.15, p < .001) and in semantic diversity for the high and low semantic diversity target words (t = 13.49, p < .001). There was no difference in frequency for the two sets of high frequency words (t = 0.64, p = .53) or the two low frequency sets (t = 0.11, p = .91). Similarly, the two high semantic diversity sets were equivalent in diversity (t = 1.50, p = .14), as were the two low semantic diversity sets (t = 0.88, p = .38). Note that frequency is calculated using Zipf values (the log10 of frequency per billion words).
To confirm that children of this age know the target words, 61 9-10 year-olds (different from the main experiment) participated in a screening check. We created four counterbalanced lists with a target word in one of the four sentence frames. Approximately 15 children per list rated each sentence on a scale of 1 (easy-to-understand) to 3 (difficult-to-understand). The sentences were considered easy (M = 1.1, range = 1-2) with no difference across conditions (p > .1). To check the predictability of the sentences, we presented the 40 sentence frames up to the point of the target word and asked 14 additional 9-10 year-olds to write down the word that should come next, emphasizing that the word did not need to complete the sentence. The sentence frames did not predict the target word and there was no difference across conditions (ts < 0.16). The four sets of 40 sentences were counterbalanced across four lists using a Latin Square design. Each list included 40 experimental sentences (10 per condition) and two practice sentences. Each sentence occupied one line on the screen (averaging 12 words; range: 8-15) and the target word appeared in the middle of the sentence. To promote reading for meaning, yes/no comprehension questions (not referring to the target) appeared after 50% of sentences, balanced across conditions.
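The Latin Square counterbalancing described above can be sketched as follows. This is one simple rotation scheme that satisfies the stated constraints (40 frames, four conditions, four lists, 10 sentences per condition in each list); it is an assumption for illustration and not necessarily the assignment used in the experiment. The condition labels and function name are hypothetical.

```python
from collections import Counter

# Hypothetical labels: frequency (HF/LF) crossed with semantic diversity (HD/LD).
CONDITIONS = ["HF-HD", "HF-LD", "LF-HD", "LF-LD"]
N_FRAMES = 40


def build_lists(n_frames=N_FRAMES, conditions=CONDITIONS):
    """Rotate conditions across frames so each list sees each frame once."""
    n_conditions = len(conditions)
    lists = []
    for list_index in range(n_conditions):
        # Frame k in list j receives condition (k + j) mod 4.
        assignment = {
            frame: conditions[(frame + list_index) % n_conditions]
            for frame in range(n_frames)
        }
        lists.append(assignment)
    return lists


lists = build_lists()
per_list_counts = [Counter(assignment.values()) for assignment in lists]
```

With this rotation, every participant reads each sentence frame exactly once, and across the four lists every frame appears with all four of its target words.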
Given the demands of finding words that fitted the semantic diversity × frequency manipulation while controlling for length, and permitted each word in a quartet to fit into the same sentential context without changing predictability, we did not attempt to match on other lexical properties. Semantic diversity is only moderately correlated with other key predictors of word-level reading (Hsiao & Nation, 2018) and both sets of words were equivalent on a range of properties, including imageability, number of senses and concreteness (Table 2), but not age of acquisition. We discuss this point in the General Discussion.

Apparatus and procedure
Eye movements were recorded using an SR Research Eyelink 1000 tracker (SR Research Ltd., Ontario, Canada; spatial resolution 0.05°, sampling rate 1,000 Hz). The sentences were presented on a 15" Dell monitor set at a refresh rate of 60 Hz with a 1,024 × 768 resolution and viewing distance of 60 cm. We used a lowercase black font (Courier New, size 12) on a gray background; three characters subtended 1° of visual angle. Reading was binocular but only the right eye was recorded. Forehead and chin rests minimized head movements. To calibrate, children looked at three horizontal fixation points. Calibration accuracy did not exceed 0.25. Children were instructed to read sentences silently. After each sentence, they pressed a button on a games controller to terminate the trial and to respond to the comprehension questions. Following two practice trials, the experimental sentences were presented randomly. The experiment lasted less than 30 minutes.
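For readers unfamiliar with visual angle, the following short calculation shows what the reported geometry implies for on-screen character size. It uses only the two values given above (three characters per 1° at a viewing distance of 60 cm) together with the standard relation size = 2 × distance × tan(angle / 2); the function and variable names are hypothetical.

```python
import math


def extent_cm(angle_deg, distance_cm):
    """Physical extent on the screen subtending a given visual angle."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# One degree of visual angle at the reported 60 cm viewing distance.
one_degree_cm = extent_cm(1.0, 60.0)

# Three characters subtended one degree, so each character is a third of that.
character_width_cm = one_degree_cm / 3
```

On these figures one degree spans roughly 1.05 cm of screen, so each character is about 0.35 cm wide.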

Results
Accuracy on the comprehension questions was high, with all children scoring at least 75% (M = 89%, SD = 5%). This indicates that children were reading for comprehension and there was no difference across conditions (Table 1).
Pre-analysis, the "clean" function in DataViewer (SR Research) trimmed the eye movement data. Fixations shorter than 80 ms and longer than 1200 ms were deleted. Fixations shorter than 80 ms which were located within one character space of the next or previous fixation were merged into that nearby fixation. The data were log-transformed to reduce skew (Baayen, Davidson, & Bates, 2008). We analyzed the data using linear mixed effects (lme) modeling with the lmer function from the lme4 package (Bates, Maechler, & Bolker, 2012) within R (R Core Team, 2013). For each dependent variable, we built an lme model with frequency and semantic diversity as categorical variables and specified as fixed factors, using the "contr.sdif" (MASS package) function. Contrasts were specified as 0.5/-0.5 for low/high frequency and low/high semantic diversity, such that the intercept corresponded to the grand mean and the fixed effects corresponded to the main effect of the fixed factors. Both participants and items were specified as random factors. A full random structure for subjects and items was specified to avoid being anti-conservative (Barr, Levy, Scheepers, & Tily, 2013). Significance values and standard errors reflect both participant and item variability (Baayen et al., 2008) and following convention, t > 2 was considered significant.
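The consequence of the 0.5/-0.5 contrast coding can be checked with a small worked example. The sketch below fits the fixed-effects part of a 2 × 2 design to four invented cell means; it is not the mixed-effects analysis itself (there are no random effects, and the values are hypothetical), but it shows why, in a balanced design, the intercept equals the grand mean and each coefficient equals the low-minus-high difference for that factor. Because the coded columns are mutually orthogonal in a balanced design, each coefficient can be computed separately.

```python
# Coding follows the text: low = +0.5, high = -0.5 for both factors.
CODE = {"low": 0.5, "high": -0.5}


def fixed_effects(cell_means):
    """Least-squares coefficients for a balanced 2 x 2 design.

    cell_means maps (frequency_level, diversity_level) to a mean.
    """
    rows = [(CODE[f], CODE[d], mean) for (f, d), mean in cell_means.items()]
    intercept = sum(mean for _, _, mean in rows) / len(rows)
    # Orthogonal columns: coefficient = (x . y) / (x . x) for each column.
    frequency = sum(f * m for f, _, m in rows) / sum(f * f for f, _, _ in rows)
    diversity = sum(d * m for _, d, m in rows) / sum(d * d for _, d, _ in rows)
    interaction = (
        sum(f * d * m for f, d, m in rows)
        / sum((f * d) ** 2 for f, d, _ in rows)
    )
    return intercept, frequency, diversity, interaction


# Hypothetical cell means (arbitrary units), keyed by (frequency, diversity).
cells = {
    ("low", "low"): 300.0,
    ("low", "high"): 280.0,
    ("high", "low"): 260.0,
    ("high", "high"): 240.0,
}

intercept, frequency, diversity, interaction = fixed_effects(cells)
```

Here the intercept is the grand mean of the four cells, and the frequency and diversity coefficients are the main-effect differences, which is the interpretation stated in the text.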
Our dependent variables included "early" measures: first fixation duration (initial first-pass fixation on a word) and gaze duration (sum of all consecutive first-pass fixations on a word before leaving it) and "late" measures on the target word: go past time (time from the initial fixation on a word until the eyes move onward) and total time (sum of the duration of all the fixations on a target word, including regressions). Raw data and analysis code are available via the Open Science Framework (https://osf.io/j9f72/) and as Supplementary Materials. Figure 1 shows descriptive data. As summarized in Table 3, there was a main effect of frequency for all the dependent measures except first fixation duration, with low frequency words receiving longer fixations and slower reading times than high frequency words. There was also a main effect of semantic diversity with longer fixations and slower reading times for low diversity words compared to high diversity words. As for frequency, there was no effect on first fixation duration, but robust effects in gaze duration and go past time; the effect was marginal for total time (see Note 1). There was no interaction between frequency and semantic diversity on any measure (ts < .3).
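The four dependent measures defined above can be illustrated with a minimal sketch that derives them from a sequence of fixations. The data format (a list of word-index and duration pairs in temporal order), the function name and the example values are assumptions made for illustration; the handling of skipped words (first-pass measures left undefined if a later word is fixated before the target) is a common convention and is also assumed rather than taken from the analysis code.

```python
def reading_measures(fixations, target):
    """Compute first fixation, gaze, go-past and total time for one word.

    fixations: list of (word_index, duration_ms) tuples in temporal order.
    target: index of the target word in the sentence.
    """
    total = sum(dur for word, dur in fixations if word == target)

    # Locate the first first-pass fixation on the target.
    first = None
    for i, (word, _) in enumerate(fixations):
        if word > target:
            break  # a later word was fixated first: target skipped
        if word == target:
            first = i
            break
    if first is None:
        return {"first_fixation": None, "gaze": None,
                "go_past": None, "total": total}

    # Gaze duration: consecutive fixations on the target before leaving it.
    gaze = 0
    i = first
    while i < len(fixations) and fixations[i][0] == target:
        gaze += fixations[i][1]
        i += 1

    # Go-past time: everything from first entry until the eyes move onward,
    # including fixations on earlier words after a regression.
    go_past = 0
    i = first
    while i < len(fixations) and fixations[i][0] <= target:
        go_past += fixations[i][1]
        i += 1

    return {"first_fixation": fixations[first][1], "gaze": gaze,
            "go_past": go_past, "total": total}


# Hypothetical trial: two fixations on word 2, a regression to word 1,
# a refixation of word 2, then onward to word 3.
trial = [(1, 200), (2, 250), (2, 150), (1, 180), (2, 220), (3, 210)]
measures = reading_measures(trial, target=2)
```

In this invented trial the regression to the previous word adds to go-past time but not to gaze duration, and the later refixation adds to total time, which shows how the "early" and "late" measures can diverge for the same word.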

General discussion
This experiment examined the influence of frequency and semantic diversity on children's reading of target words embedded in sentences. We used an orthogonal design to manipulate the two variables and kept the sentence context constant across conditions. To our knowledge, this is the first eye-movement study to investigate the effects of frequency and semantic diversity during silent sentence reading.
As predicted, children processed higher frequency words more easily than lower frequency words, replicating previous findings (Blythe et al., 2009; Joseph et al., 2013). We also found that frequency remains a reliable predictor of children's reading behavior, even when semantic diversity is controlled. Turning to semantic diversity, high diversity words were easier to process than low diversity words, even when frequency was controlled. These findings extend those of Hsiao and Nation (2018) to naturalistic reading and reveal that semantic diversity's processing signature is similar to (but separate from) the effect of frequency. This suggests that the contextual nature of previous experience with a word influences item-level differences in the ease of word identification.
Before reflecting on our findings further, it is important to note that any experiment that manipulates lexical variables in an orthogonal design is always at risk of attributing an effect to a particular variable when in fact the effect should be attributed to some other factor, correlated with the variable of interest but not controlled in the experiment. In our experiment, alongside the key manipulation that required items to vary in semantic diversity but not frequency (and vice versa), each quartet of targets needed to fit equally comfortably within the same sentential context and be matched for length, a key variable to control when using eye movement measures. Given these constraints, we did not attempt to match across conditions for the myriad of other variables that influence reading. Nevertheless, we need to consider whether the semantic diversity effect observed in our data can be "explained" by something else. Comparing the high vs. low semantic diversity words, there was no difference in document count, or in "classic" semantic variables including imageability, concreteness and number of senses (all ps > .27). There was however a significant difference in age of acquisition, with high diversity words being earlier acquired than low diversity words (6.17 vs. 7.21 years; note there was a comparable difference in age of acquisition for the high vs. low frequency words: 6.07 vs. 7.29 years). This is not too surprising. There is a modest correlation between the two variables (r = −.40, Hsiao & Nation, 2018) and contextual variability has a close relationship with the order in which words are acquired in infancy, with early acquired words having more semantic connections than those acquired later (Hills, Maouene, Riordan, & Smith, 2010). Hsiao and Nation (2018) used a continuous design to investigate the effect of semantic diversity and age of acquisition on lexical decision and reading aloud. They found that both variables influenced children's reading.
This offers reassurance that the two variables are not one and the same, but clearly more research is needed to unpack their relationship. Ultimately, to fully understand semantic diversity requires a different type of experimental design, one that systematically brings about variation in semantic diversity as words are learned and examines the direct consequence of this for subsequent reading behavior (see Johns, Dye, & Jones, 2016; Joseph & Nation, 2018; Pagán & Nation, 2019; Rosa, Tapia, & Perea, 2017). In the meantime, however, our experiment is clear in showing that semantic diversity influences children's eye movements while reading in a way that cannot be readily explained by frequency, document count, word length, imageability, concreteness and number of senses. These facilitative effects of high semantic diversity contrast with Plummer et al.'s (2014) findings. In a post-hoc supplemental analysis on a subset of words, they observed an effect of semantic diversity on later processing measures only (go past time and total time), suggesting that it might influence word-to-text integration rather than word identification. Moreover, reading times slowed as diversity increased, the opposite pattern of results to those observed here. As Plummer et al. acknowledged, their finding must be interpreted with caution as semantic diversity was not the focus of their experiment: their items were not selected or controlled with semantic diversity in mind. Another important difference concerns the sentential contexts in which words appeared in different conditions. While Plummer et al. built a different sentence frame for each word in each condition, we used the same sentence frame across the four conditions. This provided us with excellent stimulus control. Plummer et al. tested undergraduate students whereas our participants were children, but further investigation is needed before speculating on potential developmental differences.
Our findings indicate that the nature of previous experience with words, not just the amount of experience, shapes the development of lexical representations. Words high in semantic diversity map to a range of multiple or nuanced meanings, based on the notion that variation in the meaning of a word is an emergent property of variation in the context in which it is used. Accordingly, words that are ambiguous or polysemous tend to be higher in semantic diversity (see Hoffman et al., 2013; Hsiao & Nation, 2018). Importantly however, semantic diversity captures shades of meaning based on contextual usage in a way that is continuous and graded: even words deemed to be unambiguous show variation in semantic diversity (for further discussion, see Hoffman et al., 2013). Relevant to our findings is the notion that one consequence of this greater contextual variation is that a word's representation becomes more context independent over time, meaning that when a high diversity word is experienced in a new context (or out of context), it is easier to identify, even when frequency is matched (Hsiao & Nation, 2018). This fits with the Semantic Distinctiveness Model (Jones et al., 2017) in that contextual variation during experience provides more opportunity for updating, leading to easier identification on subsequent encounters, and with theoretical accounts of the polysemy processing advantage (e.g., Rodd, Gaskell, & Marslen-Wilson, 2004). In line with this, we found shorter fixation durations on high semantic diversity words.
It is also important to consider how semantic diversity influences processes involved in text comprehension and meaning integration. Arguably, what brings a processing advantage for word identification (i.e., context independence) might lead to processing costs when context dependence is desired (i.e., when building a coherent text representation). In our experiment however, semantic diversity behaved similarly in earlier and later processing, with an advantage for words high in diversity throughout. Notably, our sentences were neutral and unpredictable: it is likely that different levels of contextual constraint serve to moderate whether a word's semantic diversity eases or intensifies processing effort, as revealed by later measures tapping word-to-text integration. Future work is needed to investigate how the semantic diversity of individual words impacts on processing when context and predictability are allowed to vary more freely (see Note 2).
In conclusion, this experiment extends previous studies by teasing apart the time-course of the effects of frequency and semantic diversity on lexical processing during silent sentence reading. We found that high semantic diversity, like high frequency, was associated with easier processing when reading words in neutral sentences. Our findings call for theoretical accounts of reading that accommodate both learning and contextual variation during experience.
Notes
1. An lme model on the probability of making regressions to the target word showed no differences across conditions (ps > 0.4), indicating that total time was not influenced by the percentage of regressions.
2. Raw data files for the entire sentence are posted to the Open Science Framework (https://osf.io/j9f72/).