Quantity and Diversity of Preliteracy Language Exposure Both Affect Literacy Development: Evidence from a Computational Model of Reading

ABSTRACT Diversity of vocabulary knowledge and quantity of language exposure prior to literacy are key predictors of reading development. However, diversity and quantity of exposure are difficult to distinguish in behavioural studies, and so the causal relations with literacy are not well known. We tested these relations by training a connectionist triangle model of reading that learned to map between semantic; phonological; and, later, orthographic forms of words. The model first learned to map between phonology and semantics, where we manipulated the quantity and diversity of this preliterate language experience. Then the model learned to read. Both diversity and quantity of exposure had unique effects on reading performance, with larger effects for written word comprehension than for reading fluency. The results further showed that quantity of preliteracy language exposure was beneficial only when this was to a varied vocabulary and could be an impediment when exposed to a limited vocabulary.

Quantity of exposure is likely to result in greater quality of representations for those words experienced and so may contribute independently of, or interact with, lexical diversity. Quantity of exposure has been assumed to result in greater fidelity of representation of meaning and pronunciation of words (Perfetti, 2007), which reflects vocabulary depth, which has been operationalised in terms of ability to define words and produce synonyms (Ouellette, 2006). Diversity of exposure, on the other hand, can result in greater breadth of vocabulary, measured in either word recognition (Ouellette, 2006) or word production (Rowe, 2012). This distinction between vocabulary depth and breadth was measured in a study of oral language skills in Grade 4 children by Ouellette (2006). He found that concurrent measures of both vocabulary size and depth were independent predictors of reading accuracy and reading comprehension scores (see also Ouellette & Beers, 2010). Tannenbaum, Torgesen, and Wagner (2006) found a similar effect in Grade 3 readers. Jones and Rowland (2017) recently developed a computational model of vocabulary acquisition to explore how quantity and diversity of exposure relate to acquisition of the child's oral vocabulary. The model's ability to acquire additional words was improved by both lexical diversity and quantity of input, but quantity was more important early and diversity was more important later for oral vocabulary learning, consistent with behavioural findings (Rowe, 2012). However, the effects of diversity and quantity of exposure on literacy development have not yet been demonstrated, except in concurrent studies of oral vocabulary and literacy skills (Ouellette, 2006).
The model of reading that we explore in this article is based on the triangle model of reading (Harm & Seidenberg, 2004; Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1989), which comprises phonological, semantic, and orthographic representations of words, with interconnections that are trained during the course of language and reading development (Figure 1). A key feature of the model's performance is that it incrementally learns relations between each of the representations as a consequence of experience with the language. The triangle model has been successful in simulating a wide range of key behaviours in proficient readers (Chang, Furber, & Welbourne, 2012; Harm & Seidenberg, 1999; Plaut et al., 1996; Seidenberg & McClelland, 1989), and processes involved in reading development (Monaghan, Chang, Welbourne, & Brysbaert, 2017; Monaghan & Ellis, 2010), as well as extensions to nonalphabetic orthographic systems (Chang, Welbourne, & Lee, 2016; Yang, McCandliss, Shu, & Zevin, 2009).
The triangle model is consistent with key aspects of the simple view of reading (SVR; Gough & Tunmer, 1986), as it includes mappings and representations that reflect oral language skills, reading fluency, and reading comprehension. Reading fluency (or decoding skills) in the SVR is operationalised in the triangle model as mapping from orthography to phonology, written word comprehension as mapping from orthography to semantics, and oral language skills as mapping between phonology and semantics. However, the triangle model is less constrained than the SVR in that connections between all representations are present in the triangle model. The role of pathways within the triangle model is thus not architecturally constrained but is instead a matter of degree of engagement, which is determined by the difficulty of the mappings to be acquired.
For investigating preliteracy language development, it is vital that the triangle model be exposed to oral language prior to literacy onset, such that preliteracy language experience can then be assessed for its impact on reading development. In this oral language experience, the model learns to map from words' sounds to meanings, as well as learning to produce words' sounds from meanings. Implementing these preliteracy language skills in a model, and then testing the literacy development of the same model, enables us to test the direct relation between preliteracy language skills and literacy development in a theoretical framework of reading. Furthermore, the language experience of the model can be controlled to determine the contributions to literacy development of both the variety and the quantity of preliteracy language experience, where, behaviourally, it is often difficult to distinguish their separate contributions due to the high correlation between variation in vocabulary and quantity of exposure (Rowe, 2012).
In this article, we addressed four main research questions. First, in line with behavioural studies, we predicted that both variety of exposure and quantity would contribute to literacy development (Jones & Rowland, 2017; Ouellette, 2006; Rowe, 2012). This would be due to the greater fidelity of phonological and meaning representations of words consequent on quantity and diversity of exposure, which should support acquisition of mappings from orthography onto phonology and meaning.
The second research question related to how quantity and diversity of preliterate language exposure might interact and how the pattern might change across reading development. The effects of diversity and quantity could be additive. Alternatively, diversity and quantity could affect each other. For instance, greater diversity may mitigate constraints that derive from limited exposure due to broader training on phonotactic probabilities of the vocabulary (e.g., Storkel, 2001), or limited exposure to a diverse vocabulary might result in poorer learning of all words due to fewer opportunities to acquire clear phonological or meaning representations of each word (Perfetti, 2007), and thus impair reading acquisition. Regarding the pattern across reading development, exposure could be more important early in literacy, with diversity becoming increasingly important, akin to oral vocabulary development (Rowe, 2012). Alternatively, diversity might be more important than exposure, consistent with processes involved in later oral vocabulary development (Jones & Rowland, 2017).
The third research question related to the differential contribution of exposure and diversity of oral language experience to written word comprehension and word reading fluency. In line with Ouellette's (2006) behavioural study, we predicted that exposure and variation would both be more important for development of written word comprehension than reading fluency. This is a consequence of the type of mappings to be learned between representations. In English, the mapping between meaning representations and written forms is an almost entirely arbitrary relation (Monaghan, Shillcock, Christiansen, & Kirby, 2014), but with some exceptions relating to morphology (Seidenberg & Gonnerman, 2000) and historical orthographic properties that have preserved distinctions of meaning (Aronoff, Berg, & Heyer, 2016). Acquiring arbitrary mappings is computationally extremely expensive, and learning such associations is therefore slow. However, for generating spoken forms from written forms, the mapping is quasi-regular in English and can be acquired with fewer resources and greater speed (Plaut et al., 1996). Thus, for the easier quasi-regular mapping task involved in reading fluency, generalisations can be constructed relatively quickly, and from a smaller vocabulary, than that required to produce meaning representations from written forms, as in written word comprehension.
The final research question determined the alignment of the triangle model of reading with the SVR, by quantifying the role of decoding skills (mappings from orthography to phonology) and the role of oral vocabulary (mappings from phonology to semantics) in written word comprehension. We tested the extent to which the triangle model was effective in simulating the division of labour predicted by the SVR that reading comprehension would be served by both oral vocabulary and decoding skills (Adlof et al., 2006; Curtis, 1980; Gough & Tunmer, 1986; Nation & Snowling, 2004; Ouellette & Beers, 2010; Ricketts et al., 2007; Storch & Whitehurst, 2002; Tomblin & Zhang, 2006). The SVR and the triangle model differ somewhat in their conceptions of the directionality of mappings between phonology and semantics. The SVR focuses on mappings from phonology to semantics, whereas the triangle model contends that semantics to phonology may also be involved for reading fluency. Thus we also tested the extent to which oral language and written word comprehension affected reading fluency, by investigating the contribution of indirect mappings from orthography to phonology, via semantics.

A computational model of preliteracy effects on literacy development
The computational model was an implementation of the triangle model (Harm & Seidenberg, 2004) in English. Previously, this model has been applied mostly to simulate reading behaviours in proficient readers; however, it has not investigated the influence of oral language skills on literacy development. Here we systematically controlled and varied the model's preliteracy training to determine the effect on later literacy development while inheriting the explanatory strength of the triangle model approach in accounting for reading phenomena.

Architecture
The architecture of the model is shown in Figure 1. The model consisted of three key processing layers (orthographic, phonological, and semantic) and five intervening layers to form interconnections between the processing layers. Attractor layers, which contained 50 units, were connected to and from the phonological and semantic layers. These attractor layers helped the model develop stable and high-fidelity phonological and semantic representations of words where partial or noisy degraded activation patterns can move toward familiar representations (Harm & Seidenberg, 2004). In addition, there were four context units connecting to the semantic layer through a set of 10 hidden units. These units enabled the model to disambiguate homophones (e.g., hear, here) by using broad information about the context in which the word occurred. One context unit was active for each homophone, with the context unit assigned to each word meaning selected at random at the beginning of training. In this way, each context unit was almost equally active across the training corpus. For nonhomophones, none of the context units were active.
The semantic layer was connected to the phonological layer through a set of 300 hidden units, and the phonological layer was connected back to the semantic layer through another set of 300 hidden units. These hidden units provided resources for the model to learn the mappings between representations. The orthographic layer was connected to both the phonological and semantic layers through different sets of 500 hidden units. All units in one layer were connected with all units in the next layer. For all of the hidden layers in the model, the numbers of units were selected through pilot testing as the minimum required for reliable accurate mappings to be acquired.
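The layer sizes and pathways described above can be summarised in a small data structure. The following is only an illustrative sketch: the names `LAYERS`, `PATHWAYS`, and `n_weights` are hypothetical, the orthographic, phonological, and semantic layer sizes are derived from the slot-based codes given in the Representations section (14 × 26, 8 × 25, and 2,446 units), and bias weights and the attractor connections are ignored.

```python
# Layer sizes (units). Derived sizes follow the slot-based codes in the text.
LAYERS = {
    "orthography": 14 * 26,   # 14 letter slots x 26 letters
    "phonology": 8 * 25,      # 8 phoneme slots x 25 features
    "semantics": 2446,        # Wordnet-derived semantic features
    "context": 4,
    "phon_attractor": 50,
    "sem_attractor": 50,
}

# (source layer, target layer, number of hidden units in between)
PATHWAYS = [
    ("semantics", "phonology", 300),
    ("phonology", "semantics", 300),
    ("orthography", "phonology", 500),
    ("orthography", "semantics", 500),
    ("context", "semantics", 10),
]

def n_weights(src, dst, hidden):
    """Weights in a fully connected src -> hidden -> dst pathway (no biases)."""
    return LAYERS[src] * hidden + hidden * LAYERS[dst]

weights_per_pathway = {(s, d): n_weights(s, d, h) for s, d, h in PATHWAYS}
```

Counting weights in this way makes concrete why the orthography-to-semantics pathway is the most resource-hungry mapping in the architecture: it pairs the largest hidden layer with the largest output layer.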

Representations
The representations of orthography, phonology, and semantics were similar to those used by Harm and Seidenberg (2004). The training corpus comprised all 6,229 monosyllabic words in English for which semantic (from Wordnet; Miller, 1990) and phonological (from CELEX; Baayen, Piepenbrock, & van Rijn, 1993) representations were available. This corpus was identical to that used in Harm and Seidenberg (2004) but also included all inflected forms of words, some of which were originally omitted. Frequency, derived from the Wall Street Journal corpus (Marcus, Santorini, & Marcinkiewicz, 1993), was log-compressed prior to training of the model.
For orthography, each word was represented by 14 letter slots, permitting all words in the corpus to be represented. Each slot comprised 26 units, one for each of the 26 letters of the alphabet. Words were positioned with their first vowel aligned on the fifth slot. For words having two adjacent vowels, the second vowel was placed on the sixth slot. Consonants preceding or following the vowel were positioned in adjacent slots to the two vowel slots. Further vowels that were nonadjacent to the first vowel also occurred in adjacent slots after the first two vowel slots. This maximised the model's ability to detect similarities between pronunciation of letter combinations by reducing the problem of dispersion (Plaut et al., 1996).
For phonology, each word was represented by eight phoneme slots, allowing all words in the corpus to be represented. Pronunciation of each word was positioned with the vowel at the fourth phoneme slot. The first three slots were for onset consonants and the last four slots were for coda consonants, enabling the probabilities of mappings between particular letters and phonemes to be detected. Each phoneme was encoded by a binary vector of 25 phonological features (including, e.g., voice, nasal, labial, palatal, round, etc.), taken from Chomsky and Halle's (1968) phoneme feature matrix and exactly the same as in Harm and Seidenberg (2004).
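The vowel-centred orthographic code can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the vowel set is taken to be a, e, i, o, u (the text does not say how y is treated), and words without any of these vowels are not handled.

```python
N_SLOTS, N_LETTERS = 14, 26
VOWELS = set("aeiou")   # assumption: treatment of 'y' is not specified
FIRST_VOWEL_SLOT = 4    # the fifth slot, counted from zero

def slot_positions(word):
    """Return (slot index, letter) pairs for a lower-case word."""
    first = next(i for i, ch in enumerate(word) if ch in VOWELS)
    two_vowels = first + 1 < len(word) and word[first + 1] in VOWELS
    placed = []
    # Onset consonants fill the slots immediately before the first vowel.
    for offset, ch in enumerate(word[:first]):
        placed.append((FIRST_VOWEL_SLOT - first + offset, ch))
    placed.append((FIRST_VOWEL_SLOT, word[first]))
    if two_vowels:
        placed.append((FIRST_VOWEL_SLOT + 1, word[first + 1]))
    # All remaining letters follow the two reserved vowel slots.
    rest = word[first + (2 if two_vowels else 1):]
    for offset, ch in enumerate(rest):
        placed.append((FIRST_VOWEL_SLOT + 2 + offset, ch))
    return placed

def encode_orthography(word):
    """Flat binary vector of 14 x 26 units with one active unit per letter."""
    vec = [0] * (N_SLOTS * N_LETTERS)
    for slot, ch in slot_positions(word):
        vec[slot * N_LETTERS + (ord(ch) - ord("a"))] = 1
    return vec

beat_slots = slot_positions("beat")
make_slots = slot_positions("make")
```

In this sketch the sixth slot stays empty for a word such as "make", so that the letters following the vowel occupy the same slots as they do in "beat"; this alignment is what reduces dispersion of letter-sound correspondences across slots.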
The semantic representation for each word derived from Wordnet (Miller, 1990) comprised 2,446 semantic features, in accordance with those used in Harm and Seidenberg (2004). The presence of semantic features was encoded as 1, and the absence of semantic features was encoded as 0. For example, a dog has legs but cannot fly, so the leg feature for dog is 1 and the fly feature for dog is 0. Comprehension in the model relates to reproduction of the semantic features of a word; we therefore refer to the model's performance as written word comprehension, to distinguish the task from text comprehension.

Training procedure
The training process had two phases. In preliteracy training, the model learned the mappings between phonology and semantics, mimicking the language skills that children have developed before learning to read. In reading training, the model learned mappings from orthography to phonology and to semantics.
To investigate the effect of exposure and diversity in preliterate language experience on reading performance, the model was trained with six vocabulary sizes in the preliteracy training: 1,000, 2,000, 3,000, 4,000, 5,000, and 6,000 words. The set of words in each vocabulary size was selected from the whole training corpus (i.e., 6,229 words) based on frequency, such that the most frequent 1,000 words in the language composed the 1,000 vocabulary size condition, the most frequent 2,000 words for the 2,000 word vocabulary condition, and so on. This simulated the relation between frequency of words and the likelihood of their occurrence in language exposure (Kuperman & van Dyke, 2013).
In preliteracy training, the model was trained on both a speaking task (mappings from semantic to phonological representations) and a hearing task (mappings from phonological to semantic representations). The model also learned to develop stable phonological representations (mappings from phonological to phonological representations) via the phonological attractor units, and stable semantic representations (mappings from semantic to semantic representations) via the semantic attractors. The model learned to produce representations over several time steps. For both the speaking and hearing tasks, the input pattern of each word was presented constantly for eight time steps, and in the last two time steps, the model was required to reproduce the target pattern of the word. For both the phonological and semantic attractors, the input pattern of each word was presented constantly for six time steps. For Time Steps 7 and 8, the model had to reproduce the target pattern of the word. The input from the context units was provided only for the hearing task.
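The frequency-based construction of the vocabulary conditions can be illustrated with a short sketch. The function name and the toy frequency counts are hypothetical, and the alphabetical tie-break is an assumption, since the text does not say how words of equal frequency were ordered.

```python
def top_n_vocabulary(frequencies, n):
    """Return the n most frequent words (ties broken alphabetically)."""
    ranked = sorted(frequencies, key=lambda w: (-frequencies[w], w))
    return ranked[:n]

# Toy counts for illustration only; the model used 6,229 monosyllabic words.
toy_freq = {"the": 5000, "dog": 120, "cat": 110, "yacht": 3, "quay": 1}

# Two toy "vocabulary size" conditions, analogous to 1,000 vs. 2,000 words.
vocab_conditions = {n: top_n_vocabulary(toy_freq, n) for n in (2, 4)}
```

One consequence of selecting by frequency rank is that the conditions are nested: every word in a smaller vocabulary also appears in each larger one, so larger vocabularies add progressively rarer words.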
Following Harm and Seidenberg (2004), the four training tasks were interleaved, with 40% of trials for the speaking task, 40% of trials for the hearing task, 10% of trials for the phonological attractor training, and the remaining 10% for the semantic attractor training. These ratios were selected to ensure that all tasks were learned effectively. Which word was presented to the model was determined by sampling according to the words' log-frequencies.
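The interleaving of tasks and the frequency-weighted choice of words amount to two independent random draws per trial. The sketch below is illustrative only: the names are hypothetical, the toy counts are invented, and log(1 + f) is assumed as the compression function because the text states only that frequencies were log-compressed.

```python
import math
import random

TASK_PROBS = {
    "speaking": 0.4,
    "hearing": 0.4,
    "phon_attractor": 0.1,
    "sem_attractor": 0.1,
}

def sample_trial(frequencies, rng):
    """Draw one (task, word) pair for a single preliteracy training trial."""
    task = rng.choices(list(TASK_PROBS), weights=list(TASK_PROBS.values()))[0]
    words = list(frequencies)
    # Assumption: log(1 + f) compression of raw frequency counts.
    weights = [math.log(1 + frequencies[w]) for w in words]
    word = rng.choices(words, weights=weights)[0]
    return task, word

rng = random.Random(0)  # fixed seed so the sketch is reproducible
toy_freq = {"the": 5000, "dog": 120, "quay": 1}
trials = [sample_trial(toy_freq, rng) for _ in range(10000)]
speaking_share = sum(task == "speaking" for task, _ in trials) / len(trials)
```

Log compression keeps very frequent words from dominating training: with the toy counts above, the most frequent word is sampled roughly twelve times as often as the rarest one rather than 5,000 times as often.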
The model learned by adjusting weight connections between units based on the back-propagation through time algorithm (Pearlmutter, 1989, 1995; Plaut et al., 1996). The weight connections were incrementally adjusted to reduce the error between the actual and target representations. A typical learning rate of 0.05 was used to ensure that changes to weights were made gradually, preventing the model being unduly affected by individual learning trials. The difference between the actual and target representation for each word was measured in terms of the divergence between these representations (cross-entropy; Plaut et al., 1996, Equation 4). The model was trained on the oral language skills with varying amounts of exposure, either sampling words 400,000 times from the vocabulary, or 800,000, 1.2 million, 1.6 million, or 2 million times.
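As a rough illustration of the error measure and the size of each weight change, the sketch below assumes the standard cross-entropy form for units with binary targets; the exact definition used in the model is the one cited from Plaut et al. (1996). The function names are hypothetical, and the update shown is a plain gradient step rather than the full back-propagation through time procedure.

```python
import math

def cross_entropy(targets, activations, eps=1e-12):
    """Summed binary cross-entropy between target and actual activations."""
    total = 0.0
    for t, a in zip(targets, activations):
        a = min(max(a, eps), 1 - eps)  # clamp to avoid log(0)
        total -= t * math.log(a) + (1 - t) * math.log(1 - a)
    return total

def gradient_step(weight, gradient, learning_rate=0.05):
    """Move one weight a small step against its error gradient."""
    return weight - learning_rate * gradient

near_target = cross_entropy([1, 0, 1], [0.9, 0.1, 0.9])
far_from_target = cross_entropy([1, 0, 1], [0.6, 0.4, 0.6])
```

With a learning rate of 0.05, a single trial changes each weight by only one twentieth of its gradient, which is why no individual word presentation can dominate what the model learns.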
After preliteracy training, the model was trained on the literacy tasks, learning the mappings from orthography to semantics and to phonology. The same literacy training procedure was applied to each of the 30 preliteracy simulations of the model (6 vocabulary conditions × 5 exposure conditions). The orthographic representation of a word along with the context layer representation was presented constantly for 12 time steps. For Time Steps 7-12, the model was required to produce the phonological and semantic representations for that word. All the other training parameters remained the same as in the preliteracy training.
Four versions of each model, with different randomised starting parameters and different random sampling from the training vocabularies, were run to ensure that these random parameters did not adversely affect the simulations.
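The full set of simulations follows from crossing the vocabulary and exposure conditions and repeating each cell four times. A minimal sketch, with hypothetical variable names:

```python
from itertools import product

VOCAB_SIZES = [1000, 2000, 3000, 4000, 5000, 6000]
EXPOSURES = [400_000, 800_000, 1_200_000, 1_600_000, 2_000_000]
N_RUNS = 4  # versions per condition, each with its own random seed

# 6 vocabulary sizes x 5 exposure amounts = 30 preliteracy conditions.
conditions = list(product(VOCAB_SIZES, EXPOSURES))

# Each condition is run four times, giving one entry per trained model.
simulations = [(vocab, exposure, run)
               for vocab, exposure in conditions
               for run in range(1, N_RUNS + 1)]
```

Because the design is fully crossed, vocabulary size and amount of exposure are uncorrelated across simulations, which is what allows their separate contributions to be estimated.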

Testing procedure
After preliteracy training, the model was tested on the speaking and hearing tasks. For the speaking task, the semantic representation of each word was presented and the activation of units in the phonological layer at the end of the eight time steps was recorded. Error score was measured by the sum of the squared differences between the activation of each input unit and its target activation, and accuracy was computed by measuring for each phoneme slot the closest phoneme to the model's actual production, and determining whether they were the same for all phoneme slots. The error score and the accuracy are closely related, but error score provides a more nuanced measure of how close the model's production is to the target representation. Thus, if the model produced an incorrect phoneme, the error score would be high. However, if the model produced phonological representations that were closer to the target phoneme in each position but individual phonological features were less accurately represented, then the error score could still be higher than a phonological representation where all phonological features were accurately reproduced.
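The difference between the two measures for the speaking task can be made concrete with a small sketch. The three-feature phoneme codes below are invented for illustration (the model used 25 features per phoneme), the function names are hypothetical, and squared Euclidean distance is assumed as the measure of closeness between an output slot and each phoneme.

```python
def sum_squared_error(output, target):
    """Sum of squared differences between actual and target activations."""
    return sum((o - t) ** 2 for o, t in zip(output, target))

def nearest_phoneme(slot_output, phoneme_features):
    """Phoneme whose feature vector is closest to the slot's activations."""
    return min(phoneme_features,
               key=lambda p: sum_squared_error(slot_output, phoneme_features[p]))

def word_correct(slot_outputs, target_phonemes, phoneme_features):
    """A word is correct only if every slot's nearest phoneme is the target."""
    return all(nearest_phoneme(out, phoneme_features) == tgt
               for out, tgt in zip(slot_outputs, target_phonemes))

# Toy feature codes: hypothetical, three features instead of 25.
TOY_PHONEMES = {"p": [0, 0, 1], "b": [1, 0, 1], "m": [1, 1, 1]}

crisp = [[0.05, 0.0, 0.95]]  # close to the target features of "p"
fuzzy = [[0.4, 0.1, 0.7]]    # still nearest to "p", but less precise

crisp_error = sum_squared_error(crisp[0], TOY_PHONEMES["p"])
fuzzy_error = sum_squared_error(fuzzy[0], TOY_PHONEMES["p"])
```

Both outputs count as accurate because "p" is the nearest phoneme in each case, yet the second carries a much larger error score, which is the sense in which error is the more sensitive measure.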
For the hearing task, the phonological representation of each word was presented and the activation of units in the semantic layer at the end of the eight time steps was recorded. Error score was measured by the sum of squared differences over the semantic layer. Accuracy was measured by computing the Euclidean distance between the model's actual semantic representation and the semantic representation of each word in the training corpus. If the smallest distance was to the target representation, then the model was judged to be correct. Again, error scores provide a more sensitive measure than accuracy, as two words could be accurately represented in semantics (in terms of being closer to the target set of meaning features) but diverge in terms of how close individual meaning features are to their target activation.
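Accuracy on the hearing task is a nearest-neighbour decision over the whole corpus. The sketch below uses an invented four-feature lexicon (the model used 2,446 semantic features) and hypothetical function names.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length activation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hearing_correct(output, target_word, lexicon):
    """Correct if the target's semantic vector is the closest one in the lexicon."""
    nearest = min(lexicon, key=lambda w: euclidean(output, lexicon[w]))
    return nearest == target_word

# Toy semantic features: [has_legs, can_fly, barks, has_fins] (hypothetical).
TOY_LEXICON = {
    "dog":  [1, 0, 1, 0],
    "bird": [1, 1, 0, 0],
    "fish": [0, 0, 0, 1],
}

model_output = [0.8, 0.2, 0.6, 0.1]  # a noisy approximation of "dog"
is_correct = hearing_correct(model_output, "dog", TOY_LEXICON)
```

Because correctness depends on which word is nearest rather than on how near it is, an output can be scored as accurate while individual features remain some way from their target values.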
At the end of reading training, the model's reading performance was tested on all words in the corpus, by presenting the orthographic representation of a word and measuring error score and accuracy for both semantic and phonological output at Time Step 12 in the same way as for the preliteracy training phase.

Preliteracy training performance
We measured the model's ability to acquire the oral vocabularies with different amounts of exposure. Figure 2 shows the preliteracy performance of the model for the speaking task, mapping from semantics to phonology, and the hearing task, mapping from phonology to semantics, across training up to 2 million word exposures for the six vocabulary sizes. By 2 million words, accuracy scores were greater than 88% of the vocabulary for both tasks. Both exposure and vocabulary size had an overall positive influence on vocabulary size in the model. Figure 2 illustrates the percentage correct of the set of words that the model is exposed to. Thus, the model trained on a diverse vocabulary of 6,000 words has a larger vocabulary than the model trained on 1,000 words if its proportion correct exceeds one sixth that of the 1,000-word model. Note that the literacy models with different exposure conditions were trained at points that preceded the end of the 2 million words training.
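The comparison of absolute vocabulary sizes across conditions is simple arithmetic on proportion correct. In the sketch below, the function names are hypothetical and the proportions are illustrative values rather than results read from Figure 2; 0.88 is used only because accuracy exceeded 88% in all conditions by 2 million exposures.

```python
def words_known(vocab_size, proportion_correct):
    """Absolute number of words produced or understood correctly."""
    return vocab_size * proportion_correct

def break_even_proportion(small_size, small_proportion, large_size):
    """Proportion correct at which the larger vocabulary matches the smaller."""
    return small_size * small_proportion / large_size

# Even a perfect 1,000-word model knows fewer words than a 6,000-word
# model that is correct on 88% of its training vocabulary.
small_model = words_known(1000, 1.0)
large_model = words_known(6000, 0.88)
threshold = break_even_proportion(1000, 1.0, 6000)
```

The threshold here is one sixth of the smaller model's proportion correct, matching the comparison stated above.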

Exploring relations between preliteracy language exposure and reading development
The model's performance was measured every 100,000 reading trials from 100,000, up to 1 million exposures as shown in Figure 3a for reading fluency and Figure 3b for written word comprehension. To investigate how vocabulary size and amount of exposure affected the model's accuracy at different reading times for both reading fluency and written word comprehension, we conducted generalised linear mixed-effect models on each of these measures. Simulation run (one to four) and word item were random factors, and vocabulary size (1,000, . . ., 6,000), amount of preliteracy language exposure (400,000, . . ., 2 million), and reading time (100,000, . . ., 1 million) were fixed factors. Reading time was log-transformed prior to the analyses (Figures 3a and 3b demonstrate that performance across reading experience was not linear). All the variables were scaled because the range of each variable was very different.
Figure 2. The pretraining performance of the model on the hearing task (phonology to semantics) and speaking task (semantics to phonology) with six vocabulary sizes (1,000 to 6,000). Note. K = thousand; M = million.
For reading fluency (orthography-to-phonology mappings), both amount of preliteracy exposure and vocabulary size were significant predictors (β = −0.05, p < .001, and β = 0.25, p < .001, respectively). Log reading time also made a significant contribution (β = 1.45, p < .001). Thus, amount of exposure, vocabulary size, and log reading time all had significant effects on literacy outcomes in the model. There was a significant two-way interaction between exposure and vocabulary size (Figure 4; β = 0.06, p < .001). The interaction graph is plotted on the basis of predictions of the generalised linear mixed-effects models, measured in predicted probabilities for accurate reading.
As can be seen in Figure 4, when vocabulary sizes were greater than 3,000, literacy acquisition of the model was not affected by amount of exposure, whereas for smaller vocabulary sizes performance decreased with amount of exposure: for combined performance from 1,000 to 3,000 vocabulary size, exposure was significant (β = −0.09, p < .001), but for combined performance from 4,000 to 6,000 vocabulary size, exposure was not significant (β = −0.002, p = .66).
In addition, the three-way interaction between exposure, vocabulary size, and log reading time also reached significance (β = 0.008, p = .014). Further analyses at different training times (Figure 5) showed that at the early reading time of 100,000, both the effects of exposure (β = −0.041, p < .001) and of vocabulary size (β = 0.28, p < .001) were significant. The interaction between vocabulary size and exposure was also significant (β = 0.047, p < .001). At the later reading time of 1 million, by contrast, both exposure (β = −0.079, p < .05) and vocabulary size (β = 0.174, p < .001) were significant predictors, but the interaction was not (p = .58). These results indicated that vocabulary size had a positive and stronger influence on reading fluency at early compared to later reading time, whereas exposure had a negative influence and the effect increased with reading training.
In addition, the three-way interaction between exposure, vocabulary, and log reading time also made a significant contribution (β = −0.01, p < .001). Figure 7 shows the interaction patterns at different training times. At reading time 100,000, both exposure (β = 0.31, p < .001) and vocabulary (β = 1.72, p < .001) were significant predictors, and the interaction was also significant (β = 0.51, p < .001). At reading time 1 million, both exposure (β = −0.21, p < .001) and vocabulary (β = 0.48, p < .001) were significant predictors, and the interaction was also significant (β = 0.18, p < .001). The results showed that exposure had a positive effect in early reading training but a negative effect in later reading training. Vocabulary size, on the other hand, had a positive effect at both early and later reading times, although the effect became smaller. For the interaction between vocabulary size and exposure, the beta values (0.51 vs. 0.18) were much larger in early reading than in later reading, suggesting that the effects persisted through reading development, though more reading experience led performance to converge.
To test whether the effects of vocabulary size, exposure, and log reading time were different for written word comprehension and reading fluency, we included reading task as a fixed effect in a combined analysis. The results showed that the interaction between task, exposure, and vocabulary size was significant (β = 0.19, p < .001). The four-way interaction between task, log reading time, exposure, and vocabulary size was also significant (β = −0.09, p < .001). These results confirmed our hypothesis that there are stronger effects of vocabulary size and exposure for written word comprehension compared to reading fluency and that the effects of oral language on written word comprehension are sustained to a greater extent through reading development than for reading fluency in the model.

Effects of oral language and reading fluency on written word comprehension
The SVR predicts contributions to reading comprehension from both oral language and reading fluency. To determine the extent to which these effects are observed in the triangle model of reading, we repeated the linear mixed-effects models with written word comprehension accuracy as the dependent variable, and oral language (resulting from preliteracy oral language exposure and diversity) as predictors, but we also added reading fluency as a predictor. We found that, as demonstrated in latent variable models of behavioural data on reading comprehension (e.g., Adlof et al., 2006), oral language indexed by exposure (β = −0.02, p < .001) and vocabulary size (β = 0.86, p < .001) contributed significantly to written word comprehension in the model, and reading fluency was also related (β = 0.81, p < .001); thus, the model's performance was consistent with the SVR in predicting written word comprehension.

Effects of oral language and written word comprehension on reading fluency
To test the possible contribution of both oral language skills and written word comprehension in affecting reading fluency, we conducted linear mixed-effects models on reading fluency with oral language (exposure and vocabulary size) and written word comprehension as predictors. Written word comprehension (β = 1.19, p < .001) predicted significant variance in reading fluency in addition to the oral language measures of vocabulary size (β = 0.17, p < .001) and exposure (β = −0.04, p < .001), indicating that both oral language skills and written word comprehension affect the model's reading fluency, and that effects do not run only from fluency to comprehension as constrained by the SVR.

Discussion
In behavioural studies of preliteracy language influences on learning to read, distinguishing individual predictors and determining their causal relations are a challenge. However, theoretical proposals for the effect of oral language on learning to read can be tested for their adequacy in computational modelling of reading. We here implemented the triangle model of reading (Harm & Seidenberg, 2004) but crucially investigated the model's learning, both prior to literacy onset, as well as during reading acquisition.
In relating the triangle model to the SVR, the simulation results demonstrated that both oral language and reading fluency contributed to written word comprehension, consistent with the SVR and with behavioural studies of reading development (Adlof et al., 2006; Curtis, 1980; Gough & Tunmer, 1986; Nation & Snowling, 2004; Ouellette & Beers, 2010; Ricketts et al., 2007; Storch & Whitehurst, 2002; Tomblin & Zhang, 2006). The contributions of (at least) two skills in predicting reading development in the model are shown to emerge from the computational requirements of the task to learn mappings between orthographic, phonological, and semantic representations. In addition, the triangle model also demonstrated that there were effects on reading fluency of written word comprehension as well as the measures of oral language skills. These results are consistent with the behavioural findings of semantic influences on reading fluency (Nation & Snowling, 2004; Ouellette, 2006; Ricketts et al., 2007; Share, 1995) and highlight the importance of bidirectional influences between reading fluency and reading comprehension.
A further influence on the reading system in the triangle model is the direct mapping between orthography and semantics, which becomes increasingly important as reading acquisition develops (Nation, 2009; Nation & Snowling, 2004; Taylor, Duff, Woollams, Monaghan, & Ricketts, 2015).
Regarding the relative contributions of oral language to reading fluency and written word comprehension, the computational modelling demonstrates that oral language has an impact on reading fluency only in early reading development, whereas the differential effects of exposure and diversity remain, though somewhat reduced, for written word comprehension. According to Storch and Whitehurst's (2002) data, in early literacy development, oral language directly influences reading accuracy, whereas this direct effect is not observed in Grade 3 readers, whose reading accuracy is instead primarily influenced by reading accuracy in previous years. In contrast, oral language continues to influence performance for reading comprehension by Grade 3, and a growing distinction between reading accuracy and reading comprehension appears to be observed as children's literacy develops (Adlof et al., 2006; Foorman et al., 2015; Pentimonti, O'Connell, Justice, & Cain, 2015; Tomblin & Zhang, 2006), with the latter influenced more by oral language skills.
The computational model also enabled us to distinguish between the contributions of exposure and diversity of preliteracy language experience to the later development of reading. The modelling results showed that both vocabulary size and amount of exposure had unique effects on reading performance, for both written word comprehension and reading fluency. As predicted based on behavioural results (Ouellette, 2006) and the computational properties of the mappings to be learned (Taylor et al., 2015), the effect of preliteracy oral language was substantially greater for written word comprehension than for reading fluency. Acquiring the mapping between orthography and phonology, which underlies reading fluency, is easier than learning the mapping from orthography to semantics, and so the latter mapping is likely to be mediated to a greater degree by the preliteracy oral language system, via mappings from phonology to semantics (Harm & Seidenberg, 2004). Furthermore, there was a larger effect on reading from vocabulary diversity than from exposure. This suggests that variation in language exposure, rather than quantity of language exposure, ought to be the primary message for preliteracy language exposure: drives to enhance children's range of language experience, such as shared reading (Cameron-Faulkner & Noble, 2013), may promote later development of reading skills better than drives to increase sheer quantity of exposure.
We thus showed that quantity and diversity of language exposure relate not only to vocabulary acquisition (Jones & Rowland, 2017) but also to learning to read. Quantity of exposure appears to contribute more positively early than later in reading development (although overall it has a negative influence on reading fluency). Similarly, lexical diversity has a larger influence early in reading development. This is partially consistent with the work of Jones and Rowland (2017), who showed that exposure is more important early in vocabulary learning and lexical diversity is more important later. Note, however, that the effects of vocabulary size and exposure were not additive in terms of the model's performance. The significant interaction between vocabulary size and amount of exposure suggests that the link between vocabulary knowledge and literacy was modulated by quantity of exposure to vocabulary, which was not always useful, particularly if increased exposure was drawn from a limited vocabulary.
So why is increased exposure harmful to later development of reading skills if drawn from a limited vocabulary? Within the model, this can be explained in terms of plasticity of the reading system. With more exposure, the model is able to represent the experienced vocabulary with a higher degree of fidelity (Perfetti, 2007) but becomes less flexible in incorporating new information (Monaghan & Ellis, 2010). So when the model is trained on a small vocabulary, its representation of that small vocabulary is highly accurate, but the model is then less able to expand to the vocabulary it experiences while learning to read. The newly experienced words are then less effectively integrated into the oral vocabulary processing within the model, and greater reliance must be placed on the direct orthography to phonology and orthography to semantics routes. The simulation results further showed that this interaction pattern started from early literacy training and continued over the time course of learning to read, suggesting that extended reading experience does not completely mitigate the differences. The implication of this finding is that when children have limited oral vocabulary, it is more important to increase the diversity rather than the quantity of their oral vocabulary, consistent with the observations of Rowe (2012) that breadth of oral vocabulary acquisition is ideally accomplished by promoting an increased vocabulary range after a core vocabulary has been acquired.
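One mechanism often proposed for such loss of plasticity in connectionist networks is unit saturation: as training drives a unit's net input towards large magnitudes, the slope of its activation function shrinks, and so does the size of any later weight update. The sketch below illustrates only this general property. It assumes logistic (sigmoid) units, which is an assumption about the architecture rather than something stated in this section, and the function names are hypothetical. It is not the simulation code and does not establish that saturation is the dominant cause of the effect reported here.

```python
import math


def sigmoid(net_input: float) -> float:
    """Logistic activation function (assumed unit type, for illustration)."""
    return 1.0 / (1.0 + math.exp(-net_input))


def sigmoid_slope(net_input: float) -> float:
    """Derivative of the logistic function; gradient-based updates scale with it."""
    activation = sigmoid(net_input)
    return activation * (1.0 - activation)


# A unit near zero net input is highly plastic; a unit pushed towards an
# extreme by extended training changes far less per training step.
for net_input in (0.0, 2.0, 6.0):
    print(f"net input {net_input:>4}: slope {sigmoid_slope(net_input):.4f}")
```

Under this reading, prolonged exposure to a small vocabulary would push many units towards the flat ends of the function, leaving smaller gradients available when new words arrive during literacy training.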
However, there are some limitations to the modelling study. Word reading in the model is characterised by exposure to monosyllabic words. Although the majority of words that children start to learn are monosyllabic, the average number of syllables in words increases constantly throughout the school years (Zeno, Ivens, Hillard, & Duvvuri, 1995). The skills that children learn for monosyllabic words cannot apply in exactly the same way to polysyllabic words (Toste, Williams, & Capin, 2017) due to their morphological complexity. Future work could develop a model of reading that has a fully representative vocabulary. This would also allow for the exploration of how morphological and syntactic structures of words might affect learning to read (Tomblin & Zhang, 2006), as polymorphemic words are more likely to be polysyllabic.
Another consideration is that reading is operationalised in the model as reading only single words. Tomblin and Zhang (2006) showed that grammar and vocabulary become distinct components of reading comprehension with literacy development, and Pentimonti et al. (2015) showed that discourse comprehension also fragments from other comprehension skills with development of reading. In our current modelling framework, we have included context units that relate to the semantic representations of individual words and included properties of the semantics that relate to grammatical distinctions. Clearly, implementing a richer context and examining the model's performance for sequences of words rather than isolated words would be required to simulate this greater richness of literacy development.
A further limitation of the model is that once the reading tasks were introduced, further experience of oral vocabulary ended, so that we could isolate the role of early language exposure in reading development. But children's oral vocabulary continues to develop while they learn to read, and the structure of language skills may well then change as a consequence. Later-acquired oral vocabulary may therefore influence reading performance differently, and this would be an interesting topic for further investigation.
How the evident division of labour in the model with regard to reading development extends to other languages would further define the interactions between oral language skills and literacy across cultures. The extent to which a combination of decoding and oral language skills is involved in written word comprehension is likely to vary according to the ease with which orthography is decoded to phonology. In very regular alphabetic languages, such as Italian (Pagliuca & Monaghan, 2010), the role of both decoding and oral language in comprehension is likely to be greater than in languages where acquiring orthography to phonology mappings is as arbitrary as acquiring direct orthography to semantics mappings, such as Chinese (Yang et al., 2009).
In conclusion, we have shown that theoretical models of relations between oral vocabulary skills and learning to read can be implemented in a computational model of reading, enabling a test of the explanatory adequacy of hypotheses about the causal relations between different language skills. We have further shown that such models can distinguish different aspects of preliteracy language experience (vocabulary size and amount of exposure) and determine their independent and combined influences on later development of learning to read. The model demonstrates that such relations are not straightforward and that under some circumstances, increasing quantity of language experience without ensuring vocabulary breadth may be detrimental to later development of reading skills.

Notes
1. For instance, the word strengths was represented as _ s t r e _ n g t h s _ _ _, great was represented as _ _ g r e a t _ _ _ _ _ _ _, and tide was represented as _ _ _ t i _ d e _ _ _ _ _ _.
2. Many vowels in English words are represented by two adjacent vowels (as in great). Without two orthographic slots reserved for vowels, the model would learn less effectively the mapping between these two orthographic vowels and the phonological vowel.
3. For instance, the word strengths was represented as s t r E n g T s and great was _ g r eI t _ _ _.
4. Monaghan, Chang, Welbourne, and Brysbaert (2017) demonstrated that restricting training to the most frequent 1,000 words affected reading performance in the same way as randomly selecting 1,000 words across the frequency range, so that the particular characteristics of the higher frequency vocabulary were unlikely to be driving performance.
5. Note that the attractor training requires an identity mapping to be formed, which is computationally substantially easier than mapping between phonology and semantics, which is largely an arbitrary relation.
6. Altogether there were 120 simulation runs of the model with varied quantity and diversity of preliteracy language experience. As the four random versions of each model resulted in little variation in performance, we determined that additional simulation runs would not alter the patterns of results observed.
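The vowel-centred orthographic coding illustrated in Note 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the template of 4 onset slots, 2 vowel slots, and 8 coda slots is inferred solely from the three examples in Note 1, and the function name `to_slots` and the treatment of only a, e, i, o, and u as vowel letters are assumptions made for the sketch.

```python
import re

# Assumed template, inferred only from the three examples in Note 1:
# onset letters are right-aligned, vowel and coda letters are left-aligned.
ONSET_SLOTS, VOWEL_SLOTS, CODA_SLOTS = 4, 2, 8


def to_slots(word: str) -> str:
    """Align a monosyllabic word to a 14-slot, vowel-centred template."""
    # Onset = letters before the first vowel letter; vowel = first run of
    # one or two vowel letters; coda = everything after that.
    match = re.fullmatch(r"([^aeiou]*)([aeiou]{1,2})(.*)", word.lower())
    if match is None:
        raise ValueError(f"no vowel letter found in {word!r}")
    onset, vowel, coda = match.groups()
    if len(onset) > ONSET_SLOTS or len(coda) > CODA_SLOTS:
        raise ValueError(f"{word!r} does not fit the assumed template")
    slots = (
        onset.rjust(ONSET_SLOTS, "_")
        + vowel.ljust(VOWEL_SLOTS, "_")
        + coda.ljust(CODA_SLOTS, "_")
    )
    return " ".join(slots)


for example in ("strengths", "great", "tide"):
    print(to_slots(example))
```

Reserving two vowel slots keeps digraphs such as the "ea" in great in a fixed position, which is the motivation given in Note 2. The sketch ignores cases such as "y" acting as a vowel or "qu" onsets, which a full coding scheme would need to handle.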

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This research was supported by ESRC grant RES-000-22-4049. All authors contributed in a significant way to the manuscript. The authors have no conflicts of interest to declare.