Developmental Relations Between Reading Comprehension and Reading Strategies

ABSTRACT We examined the developmental relations between knowledge of reading strategies and reading comprehension in a longitudinal study of 312 Dutch children from the beginning of fourth grade to the end of fifth grade. Measures for reading comprehension, reading strategies, reading fluency, vocabulary, and working memory were administered. A structural equation model was constructed to estimate the unique relations between reading strategies and reading comprehension, while controlling for reading fluency, vocabulary, and working memory. The results showed that there was a unique effect of reading strategies on reading comprehension, and also of reading comprehension on reading strategies.

current study was restricted to the knowledge about how and when to use reading strategies, that is, the metacognitive knowledge of reading comprehension. The measure that we used was found to be related to reading comprehension in a cross-sectional study after reading fluency, vocabulary, and working memory were controlled (Muijselaar & de Jong, 2015), and appeared to be sensitive to the effects of a strategy intervention (Brand-Gruwel et al., 1998; Droop et al., 2016).
The general aim of this study was to investigate the developmental relations between reading comprehension and reading strategies. Because it is commonly known that reading fluency, vocabulary, and working memory are important predictors of reading comprehension (e.g., Daneman & Merikle, 1996; Hoover & Gough, 1990; Pressley, 2002), we controlled for these variables in our analyses. Consequently, we tested whether there is a unique effect of reading strategies measured at the beginning of Grade 4 on reading comprehension measured at the end of Grade 5, as well as whether there is a unique effect of reading comprehension measured at the beginning of Grade 4 on reading strategies measured at the end of Grade 5. The study was conducted in a group of fourth-grade children, because most fourth graders are able to read fluently and the development of metacognition and reading comprehension education has just started (Veenman et al., 2006). Importantly, Grade 4 is regarded as a critical point at which children's learning to read changes to reading to learn (McMaster et al., 2014).

Participants
The sample comprised 312 fourth graders (M = 9 years 7 months, SD = 5.79; 157 boys, 155 girls). These children came from 13 classes of 12 elementary schools in the Netherlands. Of these children, 95% were born in the Netherlands, and 96% spoke Dutch at home. The other 4% of the children spoke Dutch as a second language. All children were in regular elementary schools; none of the children received special education services.
In the Netherlands, children receive reading comprehension instruction from the end of Grade 2 or the beginning of Grade 3 onward. Teachers spend 1 to 2 hr on reading comprehension each week. Children are taught reading strategies such as predicting, questioning, and summarizing. In addition, they learn to pay attention to important characteristics of texts, such as the title, headings, and connectives and linking words.

Reading comprehension
To measure reading comprehension, texts of the Progress in International Reading Literacy Study (PIRLS; Mullis, Martin, Gonzalez, & Kennedy, 2003) and two tests of the AK-Reading Comprehension test (Aarnoutse & Kapinga, 2006) were chosen. The PIRLS contains several reading comprehension texts. Those texts are narrative or expository and have a number of corresponding questions. From these texts, three texts were used for this study. At the beginning of Grade 4, one narrative text ("Enemy Pie") and one expository text ("The Mystery of the Giant Tooth") were used. At the end of Grade 5, another narrative text ("Little Lump of Clay: An Unbelievable Night") and the same expository text were administered. Each text was between 814 and 920 words and contained 13 to 16 accompanying questions. Within these texts, four levels of comprehension were examined: (a) focus on and retrieve explicitly stated information; (b) make straightforward inferences; (c) interpret and integrate ideas and information; and (d) examine and evaluate content, language, and textual elements. Each text contained multiple-choice and open-ended questions. The multiple-choice questions contained four options, from which children were required to select the correct one. For the open-ended questions or constructed response questions, children were asked to write down their answer to the question. Children received 1 point for each correctly answered multiple-choice question and 1 to 3 points for the open-ended questions. Children's answers on the open-ended questions were scored by trained research assistants based on the scoring manual.¹ Before the start of the test, the children were provided with explanations on how to answer the different questions. The children were required to read the text silently and to finish all questions while the texts were available throughout the entire assessment. Most children finished the test within 40 min.
Cronbach's alphas were .79 ("Enemy Pie") and .73 ("The Mystery of the Giant Tooth") at the beginning of fourth grade, and .71 ("Little Lump of Clay: An Unbelievable Night") and .78 ("The Mystery of the Giant Tooth") at the end of fifth grade.
The AK-Reading Comprehension test is a series of Dutch standardized tests to measure reading comprehension in first to sixth graders. The AK-Reading Comprehension test 456, for fourth, fifth, and sixth graders, was used in the beginning of fourth grade in the current study. The test contained a booklet consisting of seven short texts (122-288 words), 44 questions, and an answer sheet. The AK-Reading Comprehension test 56, for fifth and sixth graders, was administered at the end of Grade 5. This test also consisted of a booklet with seven short texts (164-281 words) but had 40 questions. The AK-Reading Comprehension tests consisted of both narrative and expository texts, and the questions had a true/false format or a multiple-choice format with four answer options. Before the start of the test, one example text was given. The texts were available during answering of the items. Most children finished the test within 50 min. Cronbach's alpha of the AK-Reading Comprehension test 456 was .77 and the alpha of the AK-Reading Comprehension test 56 was .83.
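The internal-consistency coefficients reported for these tests can be illustrated with a minimal sketch of Cronbach's alpha, computed as k / (k − 1) × (1 − Σ item variances / variance of the total score). The function name `cronbach_alpha` and the small score matrix are hypothetical and serve only as an illustration; they are not the study's data or the authors' software.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows (one row of item scores per child)."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance (n - 1 denominator) of each item across children.
    item_vars = [variance([row[i] for row in item_scores]) for i in range(n_items)]
    # Sample variance of the total test score across children.
    total_var = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: five children, four dichotomously scored items.
demo_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
demo_alpha = cronbach_alpha(demo_scores)
print(round(demo_alpha, 2))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why short subscales tend to yield lower values than a full test.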

Knowledge of reading strategies
Knowledge of reading strategies was measured with the Reading Comprehension Questionnaire (Gruwel & Aarnoutse, 1995). The questionnaire items focused on what to do before, during, and after reading a text (e.g., "What can I do best before reading?"); questions about strategies that can be used to monitor the reading process (e.g., "How can I check whether I comprehend the text?"); and questions that examined strategies that can be used when a child does not comprehend the meaning of a word or (a part of) the text (e.g., "If I do not understand a sentence, the best thing I can do is . . ."). The questionnaire contained 30 questions. All questions had four response alternatives, for example, "If I do not understand a sentence, the best thing I can do is . . . a) Look at the picture; b) Count the number of sentences I read; c) Read the sentence again, slowly; d) Copy the sentence." Before the start of the questionnaire, three example questions were given. The score was the number of questions answered correctly. For the analyses, the scores of the subscales were used. In the current study, Cronbach's alpha of the entire questionnaire was .82 in Grade 4 and .89 in Grade 5. Cronbach's alphas of the subscales for Grade 4 ranged from .46 to .56 and for Grade 5 from .54 to .68.

Word and pseudoword reading fluency
Reading fluency was measured with a word reading task and a pseudoword reading task. To measure word reading fluency, the Eén-minuut-test was used (Brus & Voeten, 1979). This is a standard test to measure word reading achievement in Dutch education. The test consists of a list of 116 words of increasing difficulty. The words had one to five syllables. Children were required to read aloud as many words correctly as possible within 1 min from the list of words. The score was the number of words read correctly within 1 min. According to the manual, the mean parallel-test reliability is .90 (van den Bos, Lutje Spelberg, Scheepstra, & de Vries, 1994).
Pseudoword reading fluency was measured with the Klepel (van den Bos et al., 1994), a standard test to measure pseudoword reading achievement in Dutch education. The test consists of a list of 116 pseudowords of increasing difficulty. The pseudowords had one to five syllables. Children were asked to read as many pseudowords correctly as possible within 2 min. The score was the number of pseudowords read correctly within 2 min. According to the manual, the reliability coefficient (r_tt) of the whole test in fourth graders was .89 (van den Bos et al., 1994).
¹ Children's answers on the open-ended questions were scored by trained research assistants based on the scoring manual. The 18 open-ended PIRLS questions in Grade 4 were scored by four assistants. First, they scored the tests of 10 children. Most of the items had a sufficient interrater reliability coefficient (ICC > .80). For three items with low ICCs (.50, .58, and .71), agreement was reached by discussion. Then, all other questions were scored. The first and second author scored all 16 open-ended questions in Grade 5. First, the tests of 10 children were scored. Of these questions, only three had a low ICC (.36, .76, and .78). For these questions, agreement was reached by discussion. Then, all other questions were scored.

Vocabulary
Vocabulary was assessed with the Peabody Picture Vocabulary Test (Dunn & Dunn, 1997) and the Language Proficiency Test (Verhoeven & Vermeer, 1986). For the Peabody Picture Vocabulary Test, a Dutch version was used (Schlichting, 2005). Usually, this test is administered individually, but for this study Sets 8-13 were administered in class. In total, the test used in the current study consisted of 72 items. Each item contained four pictures representing the answer alternatives. A test assistant read out words, and children were instructed to underline the picture corresponding with that word. Two practice items were given in class before the start of the test. Administration in class took approximately 30 min. The score was the number of questions answered correctly. Within this sample, Cronbach's alpha was .66.
On the Language Proficiency Test, children were required to choose the best synonym for a word. The words were represented within sentences. The task consisted of 50 items with four answer possibilities each. Almost all children finished the test within 30 min. The score was the number of answers correct. Cronbach's alpha was .82 in this study.

Working memory
For working memory, both short-term memory tasks and working memory tasks were used. A word span task and a digit span task were used to measure verbal short-term memory. On the word span task, children were asked to recall series of monosyllabic words. The series consisted of two to nine nouns that were randomly chosen from a selection of nine nouns. The nouns were selected from a list of words that were commonly known by 6-year-old children (Schaerlaekens, Kohnstamm, & Lejaegere, 1999). For each number of nouns, or list length, there were three series. A test assistant read out the series with a speed of one word per second. Children were asked to recall the words in the same order as presented. When children failed on all three series of the same list length, the task was stopped. Two practice items were given, consisting of two words each. The score of the word span task was the number of series recalled correctly. Because this task was stopped if a child failed on three items of the same list length, the reliability could not be calculated directly. Because it may be assumed that the items that were not administered would have been answered incorrectly, these missing items were recoded as incorrect. Based on this assumption, Cronbach's alpha was .56. On the digit span task, children had to recall series of two to nine digits. There were three series per list length. The series were constructed by randomly selecting digits from 1 to 9. The series were read by a test assistant with a speed of one digit per second. At the end of each series, children were required to recall the digits in the same order. The task was stopped when a child failed on all three series of the same list length. There were two practice items with a list length of two digits. The score was the number of correctly recalled series. Based on the same assumption as for the word span task, the reliability was .74.
Verbal working memory was assessed with a listening span and a reading span task. On the listening span task, children were required to listen to series of sentences. After each sentence, they had to judge whether the sentence was correct or incorrect and remember the last word. After all sentences were done, the last words of all sentences had to be recalled in the order of presentation. The sentences contained three to seven monosyllabic words. The words to be recalled were selected from a list of words that were commonly known by 6-year-old children (Schaerlaekens et al., 1999). The series had a list length of two to five sentences, with four series per list length. The sentences were read out by a test assistant, and the child was asked to recall the last words in the same order. If a child recalled all four items of the same list length incorrectly, the task was stopped. Two example items with a list length of one and two items, respectively, were given before the start of the test. The score was the number of items recalled correctly. Cronbach's alpha was .63, given that the missing scores were coded as incorrect.
For the reading span task, children were presented with a booklet with series of sentences, and they were required to read these sentences aloud. After each sentence, the children were required to judge whether the sentence was correct. After each sentence, the test assistant read out a word. At the end of each series, the children were required to recall all words the test assistant had read out in the correct order. The words to be recalled were monosyllabic words and commonly known by 6-year-old children (Schaerlaekens et al., 1999). The sentences contained three to seven words. The series had a list length of two to five sentences, and there were four items per list length. The task was stopped when a child failed on all four items of the same list length. Before the start of the test, two example items with list lengths of one and two items were provided. The score was the number of items recalled correctly. After recoding the missing scores into incorrect, Cronbach's alpha was .69.
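The discontinuation and scoring rules described for the span tasks can be sketched as follows. This is an illustration under stated assumptions, not the authors' scoring procedure: the function name `score_span_task`, the input layout (a dictionary from list length to pass/fail outcomes), and the example child are all hypothetical. The sketch returns both the task score and an item vector in which series that were not administered after the stop are recoded as incorrect, which is the assumption the text uses to compute Cronbach's alpha.

```python
def score_span_task(results, series_per_length=3):
    """Score a span task that stops after a whole list length is failed.

    results: dict mapping list length -> list of booleans (True = series
             recalled correctly), with `series_per_length` entries each.
    Returns (score, item_vector). Series at list lengths after the stop
    are treated as not administered and coded 0 (incorrect).
    """
    score = 0
    item_vector = []
    stopped = False
    for length in sorted(results):
        outcomes = results[length]
        if len(outcomes) != series_per_length:
            raise ValueError(f"list length {length}: expected {series_per_length} series")
        if stopped:
            # Not administered: recode as incorrect for the reliability analysis.
            item_vector.extend([0] * series_per_length)
            continue
        item_vector.extend(int(ok) for ok in outcomes)
        score += sum(outcomes)
        if not any(outcomes):
            # All series of this list length failed: the task is stopped.
            stopped = True
    return score, item_vector


# Hypothetical child on a shortened word span task (list lengths 2 to 5).
demo_score, demo_items = score_span_task({
    2: [True, True, True],
    3: [True, False, True],
    4: [False, False, False],
    5: [True, True, True],   # ignored: the task stopped at list length 4
})
print(demo_score)  # 5
```

Passing `series_per_length=4` gives the rule described for the listening span and reading span tasks.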

Procedure
When the children were in the beginning of fourth grade, reading comprehension, knowledge of reading strategies, and the two vocabulary tests were administered in class in two sessions. Working memory and reading fluency were examined in two individual sessions. At the end of fifth grade, there were also two sessions in class to measure reading comprehension and knowledge of reading strategies. All tests were administered by trained research assistants.

Analyses
To examine the developmental relations between reading comprehension and reading strategies, first, measurement models for Grades 4 and 5 were specified (see Figures 1 and 2). Next, a full structural equation model was tested. The structural model contained latent factors for reading comprehension, reading strategies, reading fluency, vocabulary, and working memory measured at the beginning of Grade 4 and for reading comprehension and reading strategies measured at the end of Grade 5 (see Figure 3). Covariances between the latent factors in Grade 4, and between the disturbances of the latent factors in Grade 5 were allowed. The autoregressive effects of reading comprehension and reading strategies, as well as the cross-lagged paths between reading comprehension and reading strategies, were tested. Mplus Version 7.11 (Muthén & Muthén, 2012) was used to conduct the factor and path model analyses to test for a reciprocal relation between reading comprehension and reading strategies. Full information maximum likelihood estimation was used to obtain unbiased parameter estimates given missing data. The chi-square goodness-of-fit test statistic, the root mean square error of approximation (RMSEA) and its corresponding confidence interval, and the comparative fit index (CFI) were used to evaluate overall model fit (Kline, 2011). A nonsignificant chi-square indicated good model fit, and a significant chi-square was taken as poor fit (Hayduk, 1996). A model with an RMSEA below .05 had a good approximate fit to the data, RMSEA values between .05 and .08 indicated satisfactory approximate fit, and values greater than .10 were considered as poor approximate fit (Browne & Cudeck, 1993). A CFI greater than .95 was taken as good incremental model fit, and a CFI above .90 was considered acceptable (Hu & Bentler, 1999). The differences in model fit between two nested models were tested with the chi-square difference test (Kline, 2011).
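The fit criteria listed above can be written down as a small lookup, shown here as a sketch. The function names are hypothetical, and the handling of boundary values (exactly .05, .08, .90, or .95) is an assumption, since the text does not say on which side they fall. RMSEA values between .08 and .10 are not labelled in the text and are therefore returned as unclassified rather than guessed.

```python
def classify_rmsea(rmsea):
    """Label approximate fit using the RMSEA cutoffs quoted in the text."""
    if rmsea < 0.05:
        return "good"
    if rmsea <= 0.08:
        return "satisfactory"
    if rmsea > 0.10:
        return "poor"
    # The text gives no label for values above .08 up to .10.
    return "unclassified"


def classify_cfi(cfi):
    """Label incremental fit using the CFI cutoffs quoted in the text."""
    if cfi > 0.95:
        return "good"
    if cfi > 0.90:
        return "acceptable"
    return "below acceptable"


# Values of the kind reported for a well-fitting model.
print(classify_rmsea(0.035), classify_cfi(0.97))  # good good
```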

Data screening and descriptive statistics
Before running the analyses, data were screened for outliers. Scores that were 3 standard deviations above or below the mean were coded as missing. There were eight outliers in Grade 4 and 17 in Grade 5. For the measurement occasion at the beginning of Grade 4, less than 1% of the data was missing. For the measures that were administered at the end of Grade 5, 11% of the data was missing because some children had moved or were ill.² The means and standard deviations of the different measures for reading comprehension, reading strategies, reading fluency, vocabulary, and working memory for the two measurement occasions are presented in Table 1. Skewness and kurtosis values were within acceptable ranges (skewness = -1.10 to .54; kurtosis = -.80 to 1.04; Kline, 2011). The correlations among these measures are also displayed in Table 1. The correlations between reading comprehension and reading fluency measures were moderate. Reading comprehension and vocabulary were moderately to highly correlated. There were low to moderate correlations between reading comprehension and working memory. The correlations between reading comprehension and knowledge of reading strategies were moderate to strong.
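The outlier rule described above can be sketched for a single measure as follows. The function name `screen_outliers` and the example scores are hypothetical. Two details are assumptions because the text does not specify them: the sample standard deviation (n − 1 denominator) is used, and screening is done in a single pass rather than repeated after removal.

```python
from statistics import mean, stdev


def screen_outliers(scores, k=3.0):
    """Recode scores more than k SDs from the mean as missing (None)."""
    observed = [s for s in scores if s is not None]
    m = mean(observed)
    sd = stdev(observed)  # sample SD; an assumption, see the note above
    lower, upper = m - k * sd, m + k * sd
    return [
        None if s is None or s < lower or s > upper else s
        for s in scores
    ]


# Hypothetical scores for one test: nineteen typical values and one extreme.
demo_raw = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10,
            10, 11, 9, 10, 12, 8, 10, 11, 9, 100]
demo_screened = screen_outliers(demo_raw)
print(demo_screened.count(None))  # 1
```

Values recoded as missing in this way can then be handled by full information maximum likelihood estimation together with the other missing data.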

Preliminary analyses
First, we tested whether the mean levels of reading strategies and reading comprehension increased over time. The only measures that were administered in both grades were the Reading Strategies Questionnaire and the PIRLS test "The Mystery of the Giant Tooth." The mean score of knowledge of reading strategies in the children in Grade 5 was significantly higher than the average score of the children in Grade 4, t(277) = 13.92, p < .001, d = 0.70. Also for the PIRLS test, the average score in Grade 5 was higher than the average score in Grade 4, t(272) = 12.63, p < .001, d = 0.67. Next, we established the fit of the measurement models of the observed variables with the corresponding latent factors in Grades 4 and 5. The measurement models in Grades 4 and 5 were improved by adding two and one covariance, respectively.³ The final models had a good fit to the data and are presented in Figures 1 and 2. These measurement models were used in further analyses.

Structural equation modeling
To test the developmental relations between reading comprehension and reading strategies, we used a structural equation model with autoregressive and cross-lagged paths between reading strategies and reading comprehension. This model had a good fit to the data, χ²(282) = 389.37, p < .001, […] 97. Because this model did not differ from the model without these additional paths, Δχ²(6) = 1.85, p > .05, the more parsimonious model was chosen. Next, it was tested whether the unstandardized cross-lagged paths could be constrained to be equal. The fit of the starting model and the model with equal cross-lagged paths did not differ, Δχ²(1) = 0.15, p > .05. Thus the more parsimonious model, with equal cross-lagged paths, was chosen as the final model (see Figure 3). This model fit the data well, χ²(283) = 389.52, p < .001, RMSEA = .035, 90% CI [.026, .043], CFI = .97. The path analyses revealed that there was a unique significant effect of knowledge of reading strategies on reading comprehension, and of reading comprehension on knowledge of reading strategies, when the contributions of the autoregressors, reading fluency, vocabulary, and working memory were taken into account. The standardized factor loadings of the tests on their corresponding factor in Grades 4 and 5 are displayed in Table 2. In Table 3, the factor intercorrelations in Grade 4 are presented. There were moderate correlations of reading comprehension with reading fluency and working memory. The correlation between reading comprehension and vocabulary was high. The standardized path estimates of the structural equation model are given in Table 4. The effect of reading strategies on reading comprehension may be somewhat lower due to the high stability of reading comprehension (.80). The high stability may partly be caused by the administration of the same reading comprehension test in both grades.
Therefore, a model with the AK-Reading Comprehension tests only was also estimated, as these tests were not the same in both grades. In this model, the AK-Reading Comprehension tests were modeled as a single indicator of the corresponding latent variable.⁴ This alternative model also revealed that there is both an effect of reading strategies in Grade 4 on reading comprehension in Grade 5 and of reading comprehension in Grade 4 on reading strategies in Grade 5 (see Table 5 for the path estimates of this alternative model).
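The nested-model comparisons reported above rest on the chi-square difference test, which can be reproduced from the reported fit statistics. The sketch below is not the authors' Mplus analysis. It assumes ordinary (unscaled) maximum likelihood chi-square values, so that the difference between two nested models is itself chi-square distributed with degrees of freedom equal to the difference in degrees of freedom. The function names are hypothetical, and the p-value is computed with a standard-library recurrence for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Closed-form starting values for df = 1 (odd) or df = 2 (even).
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(h)), 1
    else:
        q, nu = math.exp(-h), 2
    # Recurrence: sf(df + 2) = sf(df) + h**(df/2) * exp(-h) / gamma(df/2 + 1).
    while nu < df:
        q += h ** (nu / 2.0) * math.exp(-h) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Return (delta chi-square, delta df, p) for two nested models."""
    delta_chi2 = chi2_restricted - chi2_free
    delta_df = df_restricted - df_free
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Reported values: equal cross-lagged paths (df = 283) vs. starting model (df = 282).
d_chi2, d_df, p_equal = chi2_difference_test(389.52, 283, 389.37, 282)
print(round(d_chi2, 2), d_df, p_equal > 0.05)

# Reported difference for the six additional paths.
p_extra_paths = chi2_sf(1.85, 6)
print(p_extra_paths > 0.05)
```

Both comparisons give p-values well above .05, in line with the choice of the more parsimonious model in each case.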
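When a latent variable has a single indicator, the indicator's unreliability can be built into the model by fixing its residual variance at (1 − r_xx) × s², where r_xx is the test's reliability and s² its observed variance; this is the rule the article's footnote on this model gives. A minimal sketch follows. The function name is hypothetical, the reliabilities are the Cronbach's alphas reported for the two AK tests, and the observed variance of 25.0 is a made-up value used only for illustration, because the actual variances are reported in a table that is not reproduced here.

```python
def single_indicator_residual_variance(reliability, observed_variance):
    """Residual variance to fix for a single indicator: (1 - r_xx) * s^2."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if observed_variance < 0:
        raise ValueError("variance cannot be negative")
    return (1.0 - reliability) * observed_variance


hypothetical_variance = 25.0  # illustrative only, not a value from the study

# Reported alphas: .77 (AK test 456, Grade 4) and .83 (AK test 56, Grade 5).
resid_grade4 = single_indicator_residual_variance(0.77, hypothetical_variance)
resid_grade5 = single_indicator_residual_variance(0.83, hypothetical_variance)
print(round(resid_grade4, 2), round(resid_grade5, 2))
```

The lower the reliability, the larger the share of observed variance that is treated as measurement error rather than as variance of the latent variable.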

Discussion
We examined the developmental relations between reading comprehension and knowledge of reading strategies from the beginning of fourth grade through the end of fifth grade. Knowledge of reading strategies in Grade 4 appeared to uniquely affect the level of reading comprehension in Grade 5, when taking into account reading fluency, vocabulary, working memory, and the autoregressive effects of reading comprehension and reading strategies. In addition, reading comprehension in Grade 4 had a unique effect on knowledge of reading strategies in Grade 5. Previous cross-sectional studies found small to moderate correlations between reading strategies and reading comprehension (e.g., Muijselaar & de Jong, 2015; Samuelstuen & Bråten, 2005). In accordance with these previous studies, we also found moderate relations between knowledge of reading strategies and reading comprehension, in both fourth and fifth grades. The correlations between reading comprehension and the control measures in the present study were also in line with previous studies (e.g., Oakhill & Cain, 2012). That is, the correlations between reading comprehension and reading fluency were moderate, there were moderate to high correlations between reading comprehension and vocabulary, and reading comprehension and working memory showed low to moderate correlations. The most important findings were the cross-lagged effects between reading strategies and reading comprehension. The effect of knowledge of reading strategies on reading comprehension is in line with theories of reading comprehension, which emphasize the importance of reading strategies for the construction of a situation model of the text (e.g., Graesser, 2007).
⁴ To take the reliability of the AK-Reading Comprehension tests into account, the residual variances were fixed at 1 minus the reliability times the variance of the AK-Reading Comprehension test: (1 − r_xx) × s² (Kline, 2011).
The effect of reading comprehension on knowledge of reading strategies is consistent with the findings that children are able to learn from texts (McMaster et al., 2014; Verhoeven & Perfetti, 2008) and with the results of studies on differences between novices and experts (e.g., Alexander, 2003; Alexander & Judy, 1988). As children get older, they probably read more and, more important, increasingly more difficult texts. While reading difficult texts, children may experience a breakdown in comprehension more often than when they read less difficult texts. When there is a breakdown in text comprehension, reading strategies will be used to fix this comprehension gap, and such strategies might be more advanced when reading more difficult texts. Thus, during the acquisition of reading comprehension, children may gain more knowledge of reading strategies by reading increasingly more difficult texts.
Note, however, that the observation of cross-lagged effects does not necessarily imply reciprocal causal relations between knowledge of reading strategies and reading comprehension (Selig & Little, 2012). We controlled for a number of variables that are known to be involved in the acquisition of reading comprehension, but we might not have accounted for all potential "third" variables. One obvious candidate that was not controlled for is higher order skills such as inference making. However, higher order skills are known to correlate very highly with reading comprehension (e.g., Oakhill & Cain, 2012) and might even be conceived as a part of reading comprehension skill. Therefore, it seems likely that the autoregressive effect of reading comprehension takes into account the contribution of those higher order skills. A straightforward causal interpretation of the cross-lagged effects is also hindered by the use of different measures of reading comprehension across occasions (e.g., de Jong & van der Leij, 2002). If the fifth-grade test for reading comprehension places higher demands on reading strategies than the fourth-grade test, then the autoregressive effect only partly controls for the earlier ability. Accordingly, we cannot exclude the possibility that the effect of Grade 4 knowledge of reading strategies on reading comprehension in Grade 5 is due to higher demands on reading strategies in the Grade 5 comprehension test. Our measure of knowledge of reading strategies was the same on both occasions. Therefore, the cross-lagged effect of reading comprehension on knowledge of reading strategies seems more likely to reflect a causal effect.
A limitation of the current study is that only the knowledge of reading strategies was measured. Such a measure does not give information about the actual use of reading strategies during text comprehension. In some studies the use of reading strategies was measured with questionnaires on which students had to report how often they used specific strategies or with measures on which students were required to apply specific reading strategies (e.g., Kozminsky & Kozminsky, 2001; Mokhtari & Reichard, 2002). The use of reading strategies seems to be more directly related to reading comprehension than the knowledge of reading strategies, although the validity of the measures to assess it can be questioned (e.g., Cromley & Azevedo, 2006; Veenman et al., 2006). However, we observed moderate correlations between the reading comprehension measures and scores on the Reading Strategies Questionnaire, as in previous studies (e.g., Muijselaar & de Jong, 2015). Apparently, knowledge of reading strategies is important in text comprehension, possibly because the knowledge of reading strategies is a prerequisite to the use of strategies. Nevertheless, future studies should measure both knowledge and use of reading strategies in one study, such that the contributions of these two types of measures to reading comprehension can be investigated.
As a second limitation it might be argued that the number of parameters to be estimated in our models was rather large in comparison to the number of children in the sample (Kline, 2011). It should be noted, however, that our main interest was in the cross-lagged effects, which involve only four parameters. Moreover, a model without the control variables (reading fluency, vocabulary, and working memory) gave the same parameter estimates for the cross-lagged effects. A third limitation is that the reading comprehension tests administered at the two measurement occasions were only partly the same. This might have affected the stability of the latent variable of reading comprehension. The use of a different test might lower the stability, but with the same test, administered twice, the stability might be artificially high. In the current study, the stability was rather high. Such a high stability of reading comprehension may result in an underestimation of the effect of reading strategies on reading comprehension. However, additional analyses with the two AK-Reading Comprehension tests, which can be considered parallel versions, revealed comparable cross-lagged effects between reading strategies and reading comprehension. A fourth limitation concerns the low reliability of some of the measures, especially the measures of working memory. Although unfortunate, this does not seem to have affected the results. We used latent variables, and the factor loadings of the indicators of the main latent variables (reading strategies and reading comprehension) were adequate. As a last limitation we should mention that this longitudinal study concerned a rather short period, that is, fourth through fifth grade. Future research should focus on the developmental relations between reading strategies and comprehension in other age groups.
To conclude, this study is the first to show a reciprocal developmental relation between reading comprehension and reading strategies. This implies that the moderate relationships found in cross-sectional studies are due to the effect of reading strategies on reading comprehension, as well as the effect of reading comprehension on reading strategies. The rather small effect of reading strategies on reading comprehension can explain the small or even nonsignificant effects of strategy interventions on reading comprehension (Compton, Miller, Elleman, & Steacy, 2014; Droop et al., 2016; McKeown, Beck, & Blake, 2009; Scammacca et al., 2015). Only large gains in the knowledge and use of reading strategies will result in effects that have relevance. Therefore, it might be important to pursue other reading comprehension interventions than enhancing reading strategies. Such interventions might simply aim to spend more time on reading and the comprehension of texts of increasing complexity, or more specifically focus on the construction of a situation model through a context approach or by practice in inference making (e.g., Elbro & Buch-Iversen, 2013; McKeown et al., 2009; McMaster et al., 2012).

Funding
This research was funded by the NWO Programming Council for Educational Research (PROO) (411-10-925).