The presence of students identified as having special needs as a moderating effect on their classmates’ reading comprehension scores in relation to other major class composition effects

ABSTRACT This study investigates the impact of the presence of students identified as having special educational needs (SEN) on their classmates’ achievements in reading comprehension. Multi-level regression modelling was conducted with the data of more than 75,000 fourth graders in 4,937 classes in Austria. Students’ reading comprehension scores were used as the dependent variable in the models. The number of students with SEN was used as the independent variable, alongside other class-level predictors such as socio-economic status and self-concept. To disentangle individual from classroom composition aspects, variables at the individual level were used as independent variables as well (gender, age, first language, number of books at home, socio-economic background, kindergarten attendance, and self-concept). Results show only a small relationship (Cohen’s d = −0.16) between the presence of students with SEN and their classmates’ national standard scores, in particular compared to other class-composition effects like socio-economic status or self-concept.

Inclusion; national educational standards; reading comprehension; multi-level modelling

Introduction: inclusion and other factors of class composition related to reading achievement
In recent years, people from different backgrounds (e.g. teachers, educational scientists, policymakers) have intensively discussed the benefits of inclusive schooling (Ainscow 2020; Gasteiger-Klicpera et al. 2013; Persson 2013). Inclusive education aims at enabling all children, from those considered gifted to those with learning disabilities, to participate equally in the education system. The focus is on effective participation regarding abilities and competences as well as on social participation (Grosche 2015). Inclusion is an ongoing process, which demands a rigorous transformation of the educational system at all levels (Ainscow 2015). Inclusive education means teaching all students in one classroom with high-quality education, differentiated instruction and individualised support for all learners. For students with SEN, longitudinal studies demonstrate that inclusion positively influences academic achievement in mathematics (Haeberlin, Bless, Moser, and Klaghofer 1991) as well as in reading (Cosier, Causton-Theoharis, and Theoharis 2013; Dessemontet, Bless, and Morin 2012). Although some longitudinal studies do not reveal differences in achievement gains between inclusive and exclusive schooling (Wild et al. 2015), meta-analyses confirm the positive effect of inclusive education for students with disabilities (Lindsay 2007). In German-speaking countries, the positive impact of inclusion on the reading achievement of students with SEN has also been demonstrated in cross-sectional large-scale studies (Stranghöner et al. 2017).
For social and emotional development, the results are mixed (Kalambouka et al. 2007; Ruijs and Peetsma 2009). Overall, children with special needs in mainstream classes seem to have fewer friends and to be less integrated than children without special needs. Special attention has been paid to victimisation. Although victimisation can also occur in special schools, evidence shows that children with SEN in inclusive classes are also at risk. Regarding the development of self-concept, that is, the self-perception of one's own abilities in academic domains, scholars have observed almost no difference between students with SEN in inclusive or exclusive education (Bear, Minke, and Manning 2002). Learning in a special school can help students with SEN to create a more positive self-concept, but it prevents students from transitioning to the labour market after completing schooling. Social relationships between students with and without disabilities are far more dependent on the learning environment and the school framework conditions (Bossaert, de Boer, Frostad, Pijl, and Petry 2015; Gasteiger-Klicpera and Klicpera 2008).
While these benefits of inclusion for students with SEN are widely acknowledged, the debate about the advantages and disadvantages of inclusion for students without SEN remains ongoing. The central assumption is that the presence of an additional special education teacher or assistant in classes with students with SEN should have a positive effect not only on the promotion of pupils with SEN, but also on the quality of teaching. The presence of a second teacher can help teachers plan lessons together, give feedback and provide mutual support. A secondary assumption is the need for differentiated teaching in those classes, which should have a positive impact on the performance of all children. We also assume that in many cases placement in an inclusive class rests on a conscious decision by teachers, peers and their parents, which could introduce a bias from the beginning. Yet, following the resource-sharing model, it can also be assumed that students with SEN claim more attention and support from the teacher, so that he or she may have less time for the other children, which could have a negative impact on their performance (Hienonen et al. 2018; Kristoffersen et al. 2015; Ruijs 2017).
Most studies, however, do not support this view. Scholars report either positive effects of inclusive schooling on students without SEN (Cosier, Causton-Theoharis, and Theoharis 2013; Hanushek, Kain, and Rivkin 2002) or no significant effect of the presence of students with SEN on the academic achievement of their peers (Gebhardt, Heine, and Sälzer 2015; Krammer et al. 2019; Ruijs, Van Der Veen, and Peetsma 2010). The positive or neutral effects of inclusion seem to outweigh negative effects. In the meta-analysis of Kalambouka et al. (2007), for example, 81% of the outcomes reported positive or neutral effects of inclusive schooling on students without SEN, although some of the results indicated that the implementation of inclusion in primary school may be easier than in secondary school. Szumski, Smogorzewska and Karwowski's (2017) recent meta-analysis of existing studies, encompassing nearly 4,800,000 students from different school levels and national educational contexts, comes to a similar conclusion: it revealed a slightly positive effect of inclusive schooling. Nevertheless, the inclusive schooling debate remains one of the most prominent educational policy discussions because providing a supportive school environment for all children is an important goal in most countries.
In addition to research on class composition in relation to students with SEN, other scholars have investigated further class composition variables, with more consistent results. Studies recognise, for example, that the composition of students' socio-economic backgrounds in a classroom influences students' achievements (Bellin 2009; Biedermann, Weber, Herzog-Punzenberger, and Nagel 2015; Stubbe, Schwippert, and Wendt 2016). In addition, the number of students with a first language different from the language of instruction also affects students' achievements in certain classrooms (Bellin 2009; Biedermann et al. 2016).

Reading comprehension and its predictors
Reading comprehension is the ability to understand the meaning of a written word, sentence, or text (Perfetti, Landi, and Oakhill 2005). During the reading process, a reader constructs a mental representation of the text (Kintsch and Rawson 2005). For comprehending the meaning of a written text, two different processes are necessary: the word identification process and the reading comprehension process (Perfetti and Stafura 2014). For word identification, decoding abilities (the fast and correct capturing of letters, parts of words, and words) and linguistic abilities (especially vocabulary, but also syntax, etc.) are fundamental (Melby-Lervåg and Lervåg 2014). Students acquire basic reading comprehension skills in primary school, and the successful acquisition of these skills has a major impact on their further school career (Breit, Bruneforth, and Schreiner 2016). The acquisition of reading comprehension competences, however, depends not only on students' individual cognitive development (Wang, Haertel, and Walberg 1993), but also on the quality of teaching (Hattie 2009) and other contextual factors, such as the socio-economic status of the family (economic resources, parents' occupational status, etc.), the level of parental education, the family's migration background, and the possession of learning-relevant resources (e.g., books).
These learning-relevant resources, as well as parental education, are closely linked to family literacy, which has a strong impact on reading development (Rodriguez-Brown 2011). Both PISA 2015 (Breit 2016; Salchegger et al. 2016) and PIRLS 2011 (Bergmüller and Herzog-Punzenberger 2012; Schreiner 2012) showed that these factors are closely related to reading performance in Austria. Studies reveal that a low socio-economic status, a low number of books at home, low or no parental education, and a migrant background are related to lower reading performance. However, these variables seem to be highly intertwined. Hippmann, Jambor-Fahlen, and Becker-Mrotzek (2019), for example, recently demonstrated in the German context that the influence of migration background and multilingualism on the reading skills of students at the end of second grade vanished once socio-economic status and intelligence were controlled for. Likewise, Maitz et al. (2018) observed that differences in socio-economic background and cultural capital partially accounted for the performance gap between Austrian children with German as a first language and students with a different first language.
Kindergarten attendance is another factor that has an impact on school achievements (for an overview, see Anders 2013). For example, in the United States, children's attendance of pre-kindergarten, kindergarten and preschool was found to have positive effects on the development of reading skills (Haslip 2018; Huang, Invernizzi, and Drake 2012; Valenti and Tracey 2009). Such positive effects of institutional early childhood care have been reported in other countries as well (e.g., Great Britain: Sammons et al. 2002; New Zealand: Wylie et al. 2006; Sweden: Broberg et al. 1997; Germany: Bos et al. 2003). In Germany, Bos et al. (2003), using the PIRLS data, found that children who attended kindergarten for more than 1 year showed better reading abilities than children whose attendance was less than 1 year.
Another factor that influences reading comprehension skills is gender. In international large-scale reading assessment studies, girls often outperform boys in reading skills (Progress in International Reading Literacy Study: Mullis et al. 2012; National Assessment of Educational Progress: National Center for Education Statistics 2011; Reilly, Neumann, and Andrews 2018). This gender gap, it should be noted, could be related to differences in reading habits and reading motivation, along with differences in performance on specific test item formats (Schwabe, McElvany, and Trendtel 2014). According to Chiu and McBride-Chang (2006), who analysed the PIRLS data of 43 countries, the gender effect is highly mediated by reading enjoyment. Besides reading motivation and enjoyment, reading self-concept is also strongly connected with reading competence (Retelsdorf, Köller, and Möller 2014). The reciprocal effects between the two aspects are the subject of intensive discussion. Longitudinal studies have shown that reading ability is the most important predictor of reading self-concept (Retelsdorf, Köller, and Möller 2014) and that additional support enhances the development of self-concept in reading for students with special needs (Savolainen, Timmermans, and Savolainen 2018).

Inclusion within the Austrian school system
Within the first 4 years of compulsory education in the Austrian school system, one class teacher usually teaches all subjects. When students with special needs are taught in regular classes, team teaching occurs either during the entire time (when there are five or more students with SEN in the class) or for a limited number of lessons per week. The Federal Ministry of Education defines special educational needs as follows: "Special educational needs . . . exist if a pupil is unable to follow lessons in primary or secondary schools or polytechnic schools without special educational support as a result of physical or psychological disability and is not exempted from attending school . . . ". In addition, it is made clear that ". . . Insufficient school performance without the determining characteristic of the disability does not justify any special educational support requirement" (Zöhrer et al. 2010).
Students with SEN may be taught either separately in special needs schools or in an inclusive manner in regular schools (from kindergarten to primary school and then secondary education). In the 2017/2018 school year, 64.8% of all students with SEN in Austria were taught in inclusive classes in regular schools and 35.2% in special schools (23.4% of all students with SEN were taught in primary schools, while 37.5% were taught in Middle Schools) (Statistik Austria 2018).

Testing of the educational standards in Austria
The Austrian educational standards, defined as desired learning outcomes, are tested periodically. The results of the tests provide comprehensive information on students' competences and should initiate targeted quality development processes at each school (BIFIE 2012). Participating in the educational standards tests is obligatory for all students except students identified as having SEN and so-called "irregular" students, who are typically language learners without sufficient competence in German. For each student who is not tested, teachers must document the reason. This documentation is an important source for the present study. For this study, data were collected in the BIST-D4 tests from 2015. This study concentrates on the competences in reading comprehension.
The Federal Institute for Research and Development in Education in Austria (BIFIE: Bundesinstitut für Bildungsforschung, Innovation & Entwicklung des österreichischen Schulwesens) collected and evaluated the test data (Breit et al. 2016). To allow for further analysis, researchers can obtain the anonymised data set from BIFIE/IQS (as done in Krammer et al. 2019 and in the present study). More information about the BIFIE and the dataset used is available at the following link: www.iqs.gv.at/fdb/bist. In summary, the BIFIE/IQS can be regarded as the institution responsible for evidence-based monitoring and development of the Austrian school system. In this regard, the BIFIE/IQS is also responsible for large-scale assessments like PISA or TIMSS/PIRLS.

Aims and research questions
The present study further investigates the influence of students with SEN on their classmates' achievement, focusing on reading comprehension achievement in the national educational standards tests from 2015. To do so, this paper explores the following questions and hypotheses:
i.) Does the number of students with SEN affect the educational standards achievement scores in reading comprehension of their classmates without SEN in a positive or negative way?
i.a.) The presence of students identified as having SEN is statistically significantly related to the reading achievement of their classmates.
i.b.) Given the existing literature and the presence of a statistically significant relationship, a positive relationship is assumed.
ii.) Given that a statistically significant relationship exists between the presence of students identified as having SEN and reading achievement, how strong is this effect compared to other known class-composition effects like socio-economic status and self-concept for the same population?
ii.a) The presence of students identified as having SEN as a composition effect is considerably weaker than other well-known composition effects based on socio-economic status.
ii.b) The presence of students identified as having SEN as a composition effect is considerably weaker than other well-known composition effects based on self-concept.
iii.) Does the presence of students identified as having SEN have a different effect on the reading performance of low-achieving students without SEN than on that of high performers?
iii.a) Low-performing students profit from the presence of students with SEN because of resources, knowledge, presence of specialised teachers, etc.
iii.b) High-performing students, in contrast, are disadvantaged through the presence of students with SEN because of levelling down the class performance and instruction speed.
Moderating variables at individual level (age, gender, first language, number of books at home, socio-economic background, kindergarten attendance and self-concept) were considered.

Sample
The data were taken from the educational standards test in German (BIST-D4), which was carried out by the BIFIE/IQS as part of an Austria-wide survey in primary schools in 2015. A total of 75,341 students from 2,994 primary schools in 4,937 classes in all nine Austrian federal states participated in this test. On average, students were 10.36 years old (SD = 0.46). Of the students, 49.37% were female and 81% spoke German as their first language (in the case of bilingual students, as one of their first languages; this information was obtained from the students' questionnaire) (Breit, Bruneforth, and Schreiner 2016).

Variables used in the regression model
The Austrian educational standards test in German is based on a competence-model encompassing three different levels (from Level 1, educational standards only partly achieved, to Level 3, educational standards surpassed). In terms of reading comprehension, the test assesses comprehension at word, sentence and text level. At text level, factual and narrative, as well as linear and non-linear texts (e.g. tables, graphics) of different length and complexity in terms of content, structure and language, are provided. The test assesses the competence to determine explicit information, general text comprehension, text-related interpretation and reflection (BIFIE 2016).
In the regression models, the educational standards achievement scores in reading comprehension were used as the dependent variable. The underlying test consists of two parts. The first part is a (speed) reading test at word level, in which students have 1 minute to solve as many items as possible. The second part tests reading comprehension at sentence and text level. Two kinds of texts are used: short texts with a maximum of 300 words, followed by a longer text with 400-800 words, with 4-6 items to be processed per text. The raw test scores were then scaled using a Random Coefficients Multinomial Logit Model (Adams and Wu 2007; Trendtel, Pham, and Yanagida 2016). Within this very flexible model class, classical IRT models, such as the Rasch model or the Partial Credit model for polytomous items, were used to estimate the item parameters. Based on the estimated item parameters, individual abilities can be estimated as plausible values (pv's) within a Marginal Maximum Likelihood (MML) approach, as was done for the dataset used here (Adams and Wu 2007; Trendtel, Pham, and Yanagida 2016; Wu 2005). It is important to note that these pv's are not multiple imputations of the weighted likelihood estimates of the MML approach. Instead, the pv's are computed as random draws from the a posteriori distribution. In the MML approach, item parameters are first estimated on the basis of an assumed distribution of the individual parameters (prior). After this estimation, this prior distribution is combined with the likelihood of the response pattern (given the known item parameters and other covariates) to form an a posteriori distribution. One advantage of this procedure is that individual parameters, such as individual means and variances (and hence the individual measurement error), can be estimated.
In this regard, pv's were used to control for measurement error (von Davier, Gonzalez, and Mislevy 2009; Robitzsch, Pham, and Yanagida 2016; Wu 2005). Moreover, reliability coefficients can be estimated from these pv's, as was done in PISA (OECD 2017). Applied to the dataset used here, this yields a reliability coefficient of .88. This coefficient reflects the combined IRT model and latent regression model (Trendtel, Pham, and Yanagida 2016; Wu 2005). It should not be confused with classical reliability coefficients, as it is based on more than the item responses (OECD 2017). For further information, see Trendtel, Pham, and Yanagida (2016).
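As an illustration of how analyses based on plausible values are commonly combined, the following minimal sketch pools one regression coefficient across several plausible values using Rubin's rules. This is an assumption about a typical workflow, not a description of the routine used for the present data set; the function name `pool_plausible_values` and all numbers are hypothetical.

```python
from statistics import mean, variance

def pool_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value results with Rubin's rules (illustrative)."""
    m = len(estimates)
    pooled = mean(estimates)                 # average point estimate
    within = mean(sampling_variances)        # average sampling variance
    between = variance(estimates)            # variance across plausible values
    total = within + (1 + 1 / m) * between   # total error variance
    return pooled, total ** 0.5              # pooled estimate and standard error

# Hypothetical coefficients and squared standard errors, one per plausible value.
est = [-1.0, -0.9, -1.1, -1.0, -1.0]
var = [0.04, 0.04, 0.04, 0.04, 0.04]
coef, se = pool_plausible_values(est, var)
print(round(coef, 3), round(se, 3))
```

The pooled standard error is larger than the average per-value standard error because the spread across plausible values, which reflects measurement error, is added to the sampling variance.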
The reading comprehension score data were normally distributed (M = 523, SD = 100.48). The interquartile range spans from 454 to 594 score points. In other words, 50% of the students achieved scores within this range, whereas 25% scored below and 25% scored above it. It should be noted again that students identified as having SEN or insufficient German language skills were excluded from the educational standards tests.
At the student (individual) level, gender, age, first language, attendance of kindergarten, self-concept in German and the individual socio-economic status were used as independent variables. Characteristics of the variables gender and age can be found in section 6.1. The variable "first language" was obtained from the students' questionnaire. For the regression analysis, this variable was dichotomised with 1 representing students with German as a first language and 0 representing both other categories. The variable "attendance of a kindergarten" is a 6-point categorical variable with a range from 1 (no attendance) to 5 (more than 3 years) and a sixth category, "attendance without knowledge of the duration". Most of the students who chose one of the first five categories attended a kindergarten for more than two years (83.22%), while 8% reported an attendance of 1 year and 8.79% reported no attendance. Imputation has reduced the sixth category. Self-concept in German is a 4-point categorical variable ranging from 1 (full agreement) to 4 (do not agree at all). For more information about the distribution of the values in the data set, see BIFIE (2017).
Moreover, individual socio-economic status was estimated from a range of background variables: the HISEI (highest International Socio-Economic Index of the parents), the number of books at home and the education of the parents. These single variables were aggregated into an index mainly because of multicollinearity concerns. The HISEI was used to estimate the highest socio-economic status of the students' parents. The mean of the HISEI was 48.14 (SD = 20.78) for the student ratings and 50.2 (SD = 20.6) for the parents' ratings. The variable "number of books at home" was used (as in the PISA tests) to estimate the cultural background of the students. It is a 5-point categorical variable with a range from 1 (less than 10 books) to 5 (more than 200). As with the HISEI, student ratings as well as parent ratings (from the parents' questionnaires) were used. Finally, the education of the parents was measured on a 4-point scale ranging from 1 (no compulsory schooling) to 4 (university degree). All of these variables were z-standardised, aggregated using formula (I) and then used as an independent variable:
$$ S = \frac{1}{6}\left(z(\mathrm{HISEI}_{\mathrm{parents}}) + z(\mathrm{HISEI}_{\mathrm{students}}) + z(\mathrm{Books}_{\mathrm{parents}}) + z(\mathrm{Books}_{\mathrm{students}})\right) + \frac{1}{3}\,z(\mathrm{ParentsEducation}) \quad \text{(I)} $$
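The aggregation of the socio-economic index can be sketched as follows. The weights (1/6 for each of the four HISEI and book ratings, 1/3 for parental education) are taken from the formula in the text; the function names, the use of the population standard deviation for z-standardisation and the toy data for four students are assumptions made purely for illustration.

```python
from statistics import mean, pstdev

def z_scores(values):
    """z-standardise a list of values (population SD, an assumption)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def ses_index(hisei_parents, hisei_students, books_parents,
              books_students, parents_education):
    """Weighted composite: 1/6 per HISEI/books rating, 1/3 for education."""
    z_hp, z_hs = z_scores(hisei_parents), z_scores(hisei_students)
    z_bp, z_bs = z_scores(books_parents), z_scores(books_students)
    z_ed = z_scores(parents_education)
    return [
        (hp + hs + bp + bs) / 6 + ed / 3
        for hp, hs, bp, bs, ed in zip(z_hp, z_hs, z_bp, z_bs, z_ed)
    ]

# Hypothetical ratings for four students.
index = ses_index(
    hisei_parents=[30, 45, 60, 75],
    hisei_students=[28, 50, 55, 80],
    books_parents=[1, 2, 4, 5],
    books_students=[1, 3, 3, 5],
    parents_education=[1, 2, 3, 4],
)
print([round(s, 2) for s in index])
```

Because the weights sum to one and every component is z-standardised, the resulting index is centred on zero, with parental education contributing twice as much as any single HISEI or books rating.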
Self-concept in German language arts was estimated using a 4-point scale ranging from 1 (low self-concept) to 4 (high self-concept). The mean of this variable is 3.1 (SD = 0.8). The independent variables for self-concept in German language arts and socio-economic status were group-centred (Raudenbush and Bryk 2002).
At the class level, the individual variables self-concept in German and individual socio-economic status were also used as group-mean aggregated predictors (Raudenbush and Bryk 2002). Furthermore, the independent variable "number of students identified as having SEN" (number SEN) was used. This information was obtained from the BIFIE/IQS. This variable has a range from 0 to 8. In total, 99.47% of the classes included 0 to 5 students identified as having SEN (76.61% of classes had no students identified as having SEN, 12.96% had one, 4.22% had two, 2.41% had three, 1.7% had four and 0.97% had five). The remaining classes (0.53%) are likely to be invalid entries or misestimations from missing values.
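The combination of group-mean centred student-level predictors and class-mean aggregated class-level predictors described above can be sketched as follows. The record layout, the field names `class_id` and `self_concept` and the toy values are hypothetical; the sketch only illustrates the centring step, not the actual data preparation.

```python
from collections import defaultdict
from statistics import mean

def center_within_class(records, key):
    """Return (class means, group-mean centred values) for one variable."""
    by_class = defaultdict(list)
    for r in records:
        by_class[r["class_id"]].append(r[key])
    # Class-level predictor: the mean of the variable in each class.
    class_means = {c: mean(v) for c, v in by_class.items()}
    # Student-level predictor: deviation from the student's own class mean.
    centred = [r[key] - class_means[r["class_id"]] for r in records]
    return class_means, centred

# Hypothetical students in two classes.
students = [
    {"class_id": "A", "self_concept": 4},
    {"class_id": "A", "self_concept": 2},
    {"class_id": "B", "self_concept": 3},
    {"class_id": "B", "self_concept": 3},
    {"class_id": "B", "self_concept": 1},
]
class_means, centred = center_within_class(students, "self_concept")
print(class_means, centred)
```

After centring, the student-level values sum to zero within each class, so the student-level coefficient captures within-class differences while the class mean carries the composition effect.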

Statistical methods
Following Krammer et al. (2019), the initial calculations were carried out with the R package BIFIEsurvey on a subsample of approximately 4500 students from 250 classes in 218 schools. The generated R code was then sent to the BIFIE/IQS and applied to the complete data set. This procedure was chosen for privacy reasons and to preserve anonymity. All calculations in this study refer to the complete data set.
The data set was multiply imputed with a nested data structure. Multi-level regression modelling was performed (George, Oberwimmer, and Itzlinger-Bruneforth 2016; Robitzsch and Oberwimmer 2016). The general underlying mathematical function used for the regression analysis was:
$$ y_{ij} = X_{ij}\beta + Z_{ij}u_{j} + \varepsilon_{ij} \quad \text{(II)} $$
Here, $y_{ij}$ is the dependent variable, in this case the overall achievement score in reading comprehension in the national standards evaluation of student $i$ in class $j$. $X_{ij}$ contains the fixed effects (with coefficients $\beta$), $Z_{ij}$ contains the random effects (with class-specific coefficients $u_{j}$) and $\varepsilon_{ij}$ is the student-level residual. Moreover, the decomposition of variance of fixed and random effects is considered between and within levels. For the full mathematical definition of the models used in this study, see Krammer et al. (2019) or Robitzsch and Oberwimmer (2016).
Moreover, only two levels were used, because most primary schools in Austria have only one class in the 4th grade; hence, school level and class level mostly coincide.
In the first step, a model without explanatory variables (unconditional model, Model 0) was calculated to determine whether variance at class level existed. Then, explanatory variables at student level were added in Models 1-3. First, only the variables gender, age, first language and kindergarten attendance were used in Model 1. In Model 2, the calculated social status was added, and in Model 3 the individual self-concept was added as well.
In Model 4, the variable number of students with SEN was added as an explanatory variable at the class level. Furthermore, in Model 5, class size was added as a class-level predictor. Finally, in Model 6, social status and self-concept were considered on both levels (class-mean aggregated and group-mean centred).
Models 7 and 8 are the same as Model 6 but refer only to the lower (Model 7) and upper (Model 8) quartile of the tested population.

Results
A multi-level analysis was performed in order to consider the nested structure of the data and the predictors that affect the achievement score in reading comprehension. In Table 1, eight different multi-level regression models are presented. The ICC in the unconditional model (Model 0) reveals that 17% of the total variance in reading achievement lies between classes and, hence, 83% within classes. In Models 1, 2 and 3, only individual predictors were taken into account. Unsurprisingly, strong relationships emerge for virtually all individual predictors, with beta-values ranging from 37.02 for self-concept to 11.60 for kindergarten attendance. Variance explanation at the individual level reaches 35% in Model 3. Models 4, 5 and 6 included the variables at class level. In Model 4, the number of students with SEN in class (−1.92) was included. This model revealed a very weak relationship between the number of students with SEN in class and the reading achievement of their classmates. The same is true for Model 5, which included class size. Model 6 represents the main model, in which the predictors social status (25.53) and self-concept (16.37) at class level were considered. These relationships are much stronger than those of class size or the number of students with SEN in Models 4 and 5. Moreover, including these two variables reduces the relationship of the predictor number of SEN (−0.98) with reading achievement.
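To make the ICC reported for the unconditional model concrete, the following minimal sketch computes it from a between-class and a within-class variance component. The 17%/83% split is taken from the text; expressing the components in squared score points via the overall SD of 100.48 is an assumption for illustration only, since the estimated variance components themselves are not reported here.

```python
def intraclass_correlation(between_var, within_var):
    """Share of the total variance that lies between classes."""
    return between_var / (between_var + within_var)

# Assumption: total variance approximated by the squared overall SD.
total_var = 100.48 ** 2
between = 0.17 * total_var   # implied class-level variance
within = 0.83 * total_var    # implied student-level variance

icc = intraclass_correlation(between, within)
print(round(between), round(within), round(icc, 2))
```

Under this assumption, the class-level standard deviation would be roughly 41 score points, which gives a sense of scale for the class-level coefficients in Models 4 to 6.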
Model 6 achieves the highest variance explanation, with 57% at Level 2 and 34% at Level 1. In contrast to self-concept and social status at class level, the variance explained by the number of students with SEN is only 1%. Models 7 and 8 are similar to Model 6 but applied only to the lower (Model 7) and upper (Model 8) quartile, respectively. Interestingly, the variance explained by the class-level predictors is considerably higher for the lower quartile than for the upper quartile. Moreover, these two models are the only ones in which the predictor number of students with SEN is not significant. All other predictors in all other models are statistically highly significant.
With regard to effect size (Cohen's d), further analysis revealed only a very small negative effect of the number of students with SEN on their classmates' standard performance. In fact, the calculation of Cohen's d, by pooling the only slightly different means of the classes with different numbers of students with SEN, revealed only a small negative effect of d = −.159 for the presence of students with SEN in class compared to classes without students with SEN. Following Cohen (1988), this effect can be deemed very small. This is noticeable in contrast to the effect sizes for self-concept (d = .815) and first language (d = .67), for example, which are considerably higher.
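For readers unfamiliar with the effect size, the following minimal sketch computes Cohen's d for two groups using the pooled standard deviation. It shows the textbook definition only; the exact pooling procedure applied to the study data is not specified in detail here, and the scores below are hypothetical.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical reading scores: classes with vs. without students with SEN.
with_sen = [500, 510, 520, 530, 540]
without_sen = [510, 520, 530, 540, 550]
d = cohens_d(with_sen, without_sen)
print(round(d, 3))
```

A negative value indicates that the first group scores lower on average; values around 0.2 in absolute size are conventionally labelled small following Cohen (1988).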

Discussion
The multi-level regression analysis reveals a significant but small negative relationship between the number of students with SEN in class and the standard achievement scores in reading comprehension of their peers for the entire population. The beta-value of −0.98 for this variable in Model 6 means that each additional student identified as having SEN is associated with a decrease in the average reading comprehension achievement of .98 points. Compared to a mean of 523 and a standard deviation of approximately 100, the presence of students identified as having SEN appears to have no practical implications for the reading comprehension performance of their classmates. According to these results, hypothesis (i.a) regarding a statistically significant relationship between the number of students with SEN in class and the reading achievement of their peers can be confirmed. As this relationship tends to be negative, hypothesis (i.b) must be rejected. These results are comparable to the small effects revealed by Krammer et al. (2019), who found both weak positive and weak negative effects for mathematics standard achievements depending on the exact number of students with SEN in a certain class. Additionally, these results do not contradict those of the recent meta-analysis by Szumski, Smogorzewska, and Karwowski (2017), who found a very weak (d = .12) positive effect on the academic achievement of students without SEN in inclusive settings. The present study's effect is comparably weak (d = −.16).
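The practical size of this coefficient can be illustrated by expressing it in standard deviation units. The coefficient (−0.98) and the standard deviation (100.48) are taken from the text; the extrapolation to five students with SEN simply assumes the linear form of the model and is meant as a rough illustration only.

```python
def points_to_sd_units(points, sd):
    """Express a difference in score points as a fraction of the score SD."""
    return points / sd

B_SEN = -0.98        # Model 6 coefficient per additional student with SEN
SD_SCORES = 100.48   # SD of the reading comprehension scores

per_student = points_to_sd_units(B_SEN, SD_SCORES)
five_students = points_to_sd_units(5 * B_SEN, SD_SCORES)  # linear extrapolation
print(round(per_student, 4), round(five_students, 4))
```

Even for a class with five students identified as having SEN, the implied difference stays at roughly 5% of a standard deviation under this linear reading of the model.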
In contrast to this negligible effect of the presence of students with SEN in class, different factors at the individual level seem to be stronger predictors for reading comprehension achievement. As seen in Model 6, strong effects of gender (−19.60), age (−18.41), first language (24.99), socio-economic status (11.43), kindergarten attendance (11.46) and self-concept (37.78) on the standard achievement in reading comprehension were revealed.
As expected, girls scored almost 20 points higher than boys. Hence, this study confirms that girls outperform boys, as reported in large-scale reading assessment studies such as PIRLS (Mullis et al. 2012), NAEP (Reilly, Neumann, and Andrews 2018), IEA and PISA (Baye and Monseur 2016). In addition, a child's age in relation to the age of the peers (relative age) is a strong predictor of reading abilities as well. Older children are more likely to score fewer points, which replicates the findings of Krammer et al. (2019), but contradicts previous findings from Germany that showed relative age effects for reading in favour of older children in Grade 2 (Thoren, Heinig, and Brunner 2016). The age effect reported there, however, diminished by Grade 3 and was actually reversed by Grade 8 in favour of younger students. Similarly, Gold et al. (2012) revealed a relative age effect in favour of older German students in first grade. This effect narrowed in second grade, whereas in Grade 3 it was absent. All these findings refer to grade differences, indicating that older students show better reading abilities than younger ones only in lower grades, whereas in higher grades younger children tend to outperform the older ones. The effect revealed in the present study is very likely to be related to the fact that class repeaters and underachievers are often older children. These findings are in line with Lincove and Painter (2006), who found that repeating kindergarten and Grades 1 through 8 has a negative effect on academic outcomes. Moreover, when controlling for grade retention, they identified younger students as having "better opportunities to acquire human capital" (p. 165) because they were more likely to attend college, showed an academic advantage in high school and had higher wages (Lincove and Painter 2006).
Kindergarten attendance seems to act as a good predictor of reading achievement, in line with Anders' (2013) systematic literature review on the effects of kindergarten attendance. In the present study, having attended a kindergarten for more than 3 years increased the reading achievement scores of a student by more than 11 points. From an empirical point of view, the effects of the inclusion of students with SEN on the reading skills of students without SEN are negligible compared to the effects of kindergarten attendance. For students with unfavourable starting conditions for the development of reading abilities (e.g. different first language than the language of instruction, few literacy resources at home), expanded kindergarten attendance could enhance their learning opportunities.
Students with German as a first language achieved an average of 24.99 points more than students who speak a different first language (either a different first language only or in a bilingual context). As other scholars have shown (e.g. Bergmüller and Herzog-Punzenberger 2012; Breit, Bruneforth, and Schreiner 2016; Melby-Lervåg and Lervåg 2014), learning to read in a language that is not (exclusively) their first language greatly influences students' reading achievements. However, there are indications that several other contextual factors mediate the relation between first language and school performance. It was shown, for example, that students with a first language different from the language of instruction usually have significantly fewer books at home, and that their parents are less well educated and more likely to pursue professions that are associated with low incomes or to be unemployed (McElvany, Becker, and Lüdtke 2009; Schnepf 2007; Zöller and Roos 2009). Comparably, in a study with Austrian third graders, differences in socioeconomic background and cultural capital partially accounted for the performance gap in reading comprehension between students with German as a first language and those with a different first language (Maitz et al. 2018).
The present study also confirms the often-reported influence of home socioeconomic status on students' reading skills (for an overview, see Chiu and McBride-Chang 2006; Niklas and Schneider 2013; Philipp 2011; Rodriguez-Brown 2011). This variable encompasses the home literacy environment, the highest education of the parents and the occupational status of the parents as represented by the HISEI. As expected, this variable is strongly related to reading achievement at the individual level (11.43) and also at Level 2.
A similar trend was revealed for the predictor self-concept at the individual level (37.88) and at class level (16.27), where, in contrast to socioeconomic status, the relationship seems to be much stronger at the individual level than at the class level; nonetheless, both levels are statistically highly significant. As expected, self-concept plays an important role above all for individual reading (Retelstorf et al. 2014).
However, compared with the effect size of the number of students with SEN, the relationship of the class-level variables socioeconomic status and self-concept with reading achievement seems to be much stronger. Along these lines, the variance explained by these two variables (17%) also seems to be much higher than that explained by the number of students with SEN (1%). In this regard, the hypotheses (ii.a and ii.b) are confirmed.
Finally, a differentiated look at the relationships between reading achievement and the mentioned predictors reveals some interesting discrepancies for different sub-populations. In fact, the relationship between the attendance of students identified as having SEN and the standards achievement of their classmates was statistically significant neither for the lower quartile (0.32) nor for the upper quartile (−0.13) of the achievement groups. This was not expected because we assumed that in inclusive settings low-achievers would benefit from individualised instruction, the presence of resources and of specialised teachers. Moreover, the results provide no evidence for an often-mentioned concern (from policymakers and practitioners; Ruijs 2017) that the presence of students with SEN levels down class performance. In this regard, the hypotheses (iii.a and iii.b) can be rejected. However, a closer look at the relationships of other predictors leads to further questions. In fact, all individual predictors show quite different beta-values for the two sub-populations. Moreover, variance explanation at level two differs considerably between the lower quartile (74%) and the upper quartile (55%), indicating quite different relationships and composition mechanisms for the different sub-populations.
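For readers less familiar with variance explanation at level two, one common way to quantify it in multi-level models is the proportional reduction in between-class variance relative to a null (intercept-only) model. The sketch below assumes this approach; the text does not state which formula was used here, and the function name and variance components are hypothetical values invented for illustration.

```python
def level2_variance_explained(tau_null, tau_model):
    """Proportional reduction in between-class (level-2) variance.

    tau_null:  between-class variance of the intercept-only model
    tau_model: between-class variance after adding the predictors
    """
    if tau_null <= 0:
        raise ValueError("tau_null must be positive")
    return (tau_null - tau_model) / tau_null


# Invented variance components, for illustration only (NOT taken from the study)
share = level2_variance_explained(tau_null=2000.0, tau_model=1660.0)
print(f"Level-2 variance explained: {share:.0%}")
```

Under this definition, a value of 0 means the predictors leave the between-class variance unchanged, while a value close to 1 means that nearly all differences between classes are accounted for by the predictors in the model.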

Limitations
The present study refers only to the standards achievements in reading comprehension of fourth-grade students in Austrian primary schools. Although similar results have been revealed for mathematics standards achievements in Austria (Krammer et al. 2019), implications for other subjects cannot be drawn. Additionally, it was not possible to include all variables, but only those of significant importance. Variables on the class level that have already been shown to influence students' reading achievements (e.g. the number of students with a different first language and the composition of students' socio-economic backgrounds in a classroom, see Bellin 2009; Biedermann et al. 2016) could not be included due to lack of data. The results are also limited to the Austrian context. More studies in different countries are needed to reveal the effects of inclusive schooling for students with and without SEN. The cross-sectional nature of the data does not allow causal conclusions to be drawn. Longitudinal studies are needed to further investigate the development in inclusive classes and to reveal the crucial predictors for enhanced learning opportunities for all students.
As the present study did not focus on different populations of students with special needs, the question remains whether the results are replicable considering these population differences. Although Szumski, Smogorzewska, and Karwowski (2017) did not find any differences for classes with children with severe disabilities or emotional and behavioural disorders, further studies should address these questions.

Conclusions
According to the results of this study, the presence of students with SEN in inclusive classrooms in Austria has practically no implications for their classmates' reading achievement. This replicates the results by Krammer et al. (2019) for abilities in mathematics and contradicts the assumption that the effect in reading and language could be more positive than in maths (Murawski and Swanson 2001).
Although a recent meta-analysis by Szumski, Smogorzewska, and Karwowski (2017) showed a positive effect of inclusion on the academic achievement of students without disabilities, the effects were marginal as well. Furthermore, these positive effects were restricted to studies from the United States and Canada. In European countries, however, no such positive effects of inclusion on students without SEN were found in the meta-analysis. The authors explain this difference with the assumption that teachers and participants of the educational system in the United States understand inclusion as a strategy to change the entire school system, whereas in Europe it is often assumed that inclusion can be implemented in a superficial way. Against this background, the present results suggest that inclusion in Austria is implemented rather half-heartedly. Transforming an educational system from an exclusive one to a system of education for all does not seem possible under these circumstances.
To reveal the positive effects of inclusion on the achievements of students without SEN, a strong commitment to a positive implementation of an inclusive school system would be necessary. Developing an inclusive school requires differentiated and individualised learning with high-quality instruction and involves all participating persons: teachers, parents, principals and school authorities.