Variation, context, and inequality: comparing models of school effectiveness in two states in India

ABSTRACT Existing research on “school effectiveness” indicates that differences at the school level contribute significantly towards variation in student outcomes; however, less is known about the effectiveness of schooling in low- and middle-income countries (LMICs). This paper addresses this gap using quantitative analysis of data from two states in India. It compares four multilevel model specifications to explore how school performance can be measured in the Indian context. The analysis reveals a large “school effect”, while also offering evidence that a considerable proportion of between-school variation stems from student intake. Findings suggest that school “value-added” models could offer better understanding of school performance and learning equity in India, and indicate the importance of recognising how differences in model specification affect those schools identified as “more effective”.


Introduction
Within international education policy, there is increasing emphasis on measuring student learning, with the aim of supporting greater accountability within education systems (United Nations Educational, Scientific and Cultural Organization [UNESCO], 2016) and improving educational effectiveness (World Bank, 2018). In low- and middle-income countries (LMICs) in particular, the shift of focus from educational access to quality of learning (United Nations [UN], 2016; UNESCO, 2015) has drawn attention towards the outcomes of schooling (Rolleston, 2016), initially within "basic" education but increasingly also within secondary schools (UN, 2016; World Bank, 2018). Evidence from the Indian context exemplifies much of this broader international debate on access versus quality. Enrolment in India is now close to universal at elementary level, and has reached almost 80% at lower secondary (Ministry of Human Resource Development [MHRD], 2018); yet, despite this, there is evidence that student learning levels are notoriously low (Dundar et al., 2014) and in some places appear to be declining further over time (Annual Status of Education Report [ASER], 2017, 2018). As a result, questions have been raised about the "effectiveness" of many schools within the Indian education system, and about the quality of education they are providing (Muralidharan et al., 2014).
International "school effectiveness" research taking place mainly in the USA, UK, and other high-income countries finds that differences at the school level contribute significantly towards variation in student learning outcomes, indicating that children will learn more in a "more effective" school (Rivkin et al., 2005; Scheerens et al., 2003). This paper considers this in the Indian context, using secondary data from two states (Andhra Pradesh and Telangana) to estimate school effectiveness and thereby explore the extent to which school-level differences contribute to student learning and to potential gaps in the equitability of education outcomes. This is a topic of considerable policy interest in India, where there are great concerns about the failings of the education system to provide a high standard of education to all children (ASER, 2018). Inspired by school effectiveness research conducted in other contexts, the analysis undertaken in this paper offers four alternate approaches by which school performance in India could be assessed, and considers some of the implications of each approach for how we understand and define school quality. All four models use an outcome-based approach which identifies an "effective school" as one which leads to increased student learning according to a standardised assessment over a defined period of time (in this case, one school year; Rivkin et al., 2005), although it is recognised that there are other, equally valid, ways of defining school quality (Goe et al., 2008). In particular, this paper makes use of the concept of school "value-added", which assumes that there are things schools can do which "add" to student learning and progress over time (Scheerens et al., 2003).
The research questions addressed in this paper are as follows: (1) How do alternate ways of modelling school effectiveness affect our understanding of school performance and learning equity in India? (2) How does the perceived effectiveness of different schools vary when different models are used, and what are the policy implications of this?
These questions are addressed using secondary data from the Young Lives 2016-17 school survey in Andhra Pradesh (AP) and Telangana, India. The data are analysed using four different multilevel random effects model specifications to estimate and explore school effectiveness in the Indian context, using Grade 9 student maths test scores as the measure of student learning. The Young Lives data set is well suited to this analysis as it provides cognitive test scores from the beginning and end of the school year from a large number of students in a random sample of schools, along with linked data on student, teacher, and school background. This is unique in the Indian context, where few other large-scale linked data sets are available at the school level, and where, as a result, little school effectiveness research has been undertaken to date.
What is known about educational quality in India?
In India, universal primary school enrolment is now close to being achieved, leading to an increased focus on secondary and higher levels of education (Banks & Dheram, 2013; Lewin, 2011; MHRD, 2018; World Bank, 2009). Nationally, 80% of children are enrolled in secondary school, while in AP and Telangana (the states in which the Young Lives data are collected), the figures are 76% and 83%, respectively (MHRD, 2018). However, despite higher numbers of children entering secondary schooling, there are ongoing concerns about the quality and equity of educational outcomes (Dundar et al., 2014; World Bank, 2012). Evidence is mounting of the low levels of learning among children who have attended school for many years (ASER, 2017, 2018; Das & Zajonc, 2010; Walker, 2011), with India's 2009 Programme for International Student Assessment (PISA) performance revealing that almost 90% of 15-year-olds who took part in the study fell far below the baseline levels of reading, maths, or scientific literacy "needed to participate effectively and productively in life" (Walker, 2011, p. xiv).
There is also extensive existing research to show that the Indian education system is highly unequal, with girls, children from poorer backgrounds, and those in rural areas less likely to enrol in school, more likely to drop out, and likely to have a lower level of attainment (Alcott & Rose, 2017; Moore et al., 2017; Singh & Mukherjee, 2017). Children from these backgrounds (and particularly those at the intersections of these groups; e.g., rural girls) are found to be concentrated in poorly performing schools. In the context of growing privatisation, this often (although not exclusively) means government-funded schools, with those who can afford it opting for private schooling where possible (Härmä, 2011; James & Woodhead, 2014; Kingdon, 2017; Singh & Bangay, 2014). As a result, disparities present at the beginning of schooling due to student background are further exacerbated by the education system, creating a "Matthew effect" in which the more advantaged benefit from the concentration of further advantage over time, while those who start off with less fall further behind with each subsequent year of schooling (Alcott & Rose, 2017; Dundar et al., 2014). India has many excellent schools, providing a level of education similar to that found in higher income countries, yet it also has many others in which children appear to be given few chances to learn (Das & Zajonc, 2010). This suggests that the ability to compare the effectiveness of schools using school-level analysis could be key to understanding how the outcomes of the education system as a whole could be improved. It also highlights the need to understand more clearly how variations in school effectiveness contribute towards the equity (or inequity) of learning in India, and to the learning outcomes of those who are the most disadvantaged.

Measuring school effectiveness: the international literature base
There are multiple reasons we might want to understand how a school is performing; whether as a parent choosing a school for their child, a policy maker identifying schools which may be struggling, or a researcher determining those factors which make a school more effective. School effectiveness approaches assess performance in relation to the "output" of the school, usually in terms of student learning (Scheerens et al., 2003), with the assumption that attending a more effective school will enable a student to learn more (Rivkin et al., 2005). As a result, school effectiveness research has a relatively narrow focus, seeking to quantify how effective schools are (i.e., how much difference they make to student learning in comparison to other schools), and identify those characteristics and practices which are associated with greater student attainment (Teddlie & Reynolds, 2000). In recent years, it has come to offer a concrete link between educational research and policy (Saunders, 1998), becoming more commonly used within the public domain (e.g., in English school performance statistics; see Perry, 2016). As a result, questions of measurement are no longer a concern for researchers alone, and definitions of school effectiveness are politicised, with decisions between approaches such as raw attainment, value-added, or contextual value-added becoming increasingly contentious (Leckie & Goldstein, 2017; Organisation for Economic Co-operation and Development [OECD], 2008).
Quantitative measurement of school performance can take various forms. "Raw attainment", a cross-sectional measure of student learning at one point in time, was one of the earliest methods used, most often in the form of average exam scores or rate of exam passes (OECD, 2008). Initially popular with policy makers for its "non-technical" nature (Saunders, 1998), in recent years measures of raw attainment have faced increasing criticism that they are not able to remove the influence of factors such as student intake which are beyond the control of the school (Goldstein et al., 2000). As a result, they can cause bias by assuming that all students, in all schools, regardless of their background, will have the same predicted outcome (Harris, 2011).
Measures of "contextualised attainment" go some way to addressing these concerns through the inclusion of student-level background variables. Used within a number of studies as an alternative to a prior attainment measure (e.g., Lenkeit, 2013), the reduction of the size of the "school effect" in these models reveals the extent to which student intake impacts on school outcomes. However, while these estimates may offer a means of accounting for some of the differences in the student population, they offer "little improvements over the use of completely unadjusted ('raw') […] scores" (Goldstein et al., 2000, p. 2), and as a result still tell us little about the effectiveness of the school as an institution (Goldstein et al., 2000). However, what they can reveal "is the considerable size of the achievement gap between schools in many educational systems" (Muñoz-Chereau & Thomas, 2016, p. 3) even when differences in background are controlled for, thereby highlighting the need for measurement approaches more able to take this into account. This is particularly important in contexts such as India, where there are very large differences in student intake between schools, as will be discussed below.
"Value-added" approaches to the estimation of school performance offer an alternative to those based on a single measure of learning. Value-added measures are based on an "input-output model" of schooling (Saunders, 1998, p. 2) whereby schools can "add" to the learning gain students would otherwise be expected to make (Thomas, 2001). By adjusting models to take account of differences in students' prior attainment (Goldstein, 1997), value-added approaches allow us to compare "like with like" (Perry, 2016), "[attempting] to strip away factors which are associated with performance, either positively or negatively, but are not related to institutional quality" (Saunders, 1998, p. 2). There is now close to unanimous agreement among researchers that value-added measures of school effectiveness are more robust than those based on raw attainment (Papay, 2011). However, despite (or perhaps because of) the increasing dominance of value-added models, the risk that the estimates they generate can be easily misinterpreted is an issue of some concern. As with any estimate, they are subject to error, particularly when schools are small (Goldstein, 1997), and in some studies have been found to be unstable over time and across subjects (Harris, 2011; Thomas & Mortimore, 1996). This suggests a need for caution, particularly when taking a single-subject, single-cohort model as an accurate means of measuring school effectiveness, or when using estimates for high-stakes purposes (Goldstein, 1997).
Growing awareness of the benefits of value-added modelling has resulted in a policy shift in some countries, for example, in England, where school performance tables moved from reporting raw attainment to progress-based statistics from 2002 onwards (Leckie & Goldstein, 2017, 2019; Saunders, 1998). However, this trend has not been universal, and in many countries reporting of raw attainment remains the norm (OECD, 2008), either due to a lack of suitable linked student data or insufficient awareness of the benefits of more sophisticated modelling approaches. Just as raw attainment measures can be improved by the addition of student background variables, value-added models can also be "fine-tuned" using student and school-level contextual factors (Thomas & Mortimore, 1996), which help account for factors outside the control of the school (Leckie & Goldstein, 2019; Perry, 2016).

Measuring school effectiveness in LMICs
To date, the majority of school effectiveness research has taken place in more economically developed countries. While a number of studies have taken place in LMICs since the 1980s (e.g., see Crawfurd & Elks, 2019; Fuller, 1987; Glewwe et al., 2011; Hanushek, 1995; Scheerens, 2001), these remain relatively rare, in part because large-scale data sets linking student learning outcomes to school and teacher data are considerably less common in these contexts. As a result, the knowledge base on what contributes towards school effectiveness in LMICs is less complete in comparison to higher income contexts (Scheerens, 2001), and the range of methodological approaches used is more limited (Fuller, 1987; Riddell, 1997). These concerns are increasingly relevant, given the current international policy focus on understanding and improving educational quality around the world (UNESCO, 2016).
The literature base which does exist suggests that there are some findings relating to school effectiveness which are more common in a lower income context. For example, most studies have found a larger school effect in LMICs, indicating that "the predominant influence on student learning is the quality of schools and teachers to which children are exposed" (Heyneman & Loxley, 1983, p. 1162). Input factors such as textbooks, teacher qualifications, and school infrastructure are also found to have a much larger effect than in higher income countries (Fuller, 1987); perhaps unsurprisingly, given that there is likely to be a much greater degree of variation in these basic inputs in LMICs (Scheerens et al., 2003). However, other findings are less consistent across studies, providing an important reminder that contextual factors remain the driving force in understanding school systems, particularly when looking across a diverse set of cultural contexts (Scheerens, 2001; Yu, 2007).
In LMICs, as elsewhere, methods of school effectiveness analysis remain linked to purpose and context, as well as to the structure and availability of data (Timmermans & Thomas, 2015). While a contextualised value-added measure may be the "fairest" way for a government to identify schools performing above or below expectation (Leckie & Goldstein, 2019), it may be less useful for a parent seeking to identify the school most suited for their child, not least because it is based on data relating to a past cohort of children (Leckie & Goldstein, 2011). However, what does appear clear from existing international literature is that measurement using raw attainment is likely to be the least useful in estimating the contribution of the school accurately, particularly in contexts where school intakes vary considerably (Muñoz-Chereau & Thomas, 2016). It is unfortunate, therefore, that such measures remain particularly common in many LMICs (OECD, 2008), including India (e.g., see NITI Aayog, 2019).

Methodology and data
The Young Lives study
This paper addresses questions of school effectiveness using school survey data from the Young Lives study, a longitudinal study in four lower income countries (Ethiopia, India, Vietnam, and Peru) which gathers data on childhood poverty and associated social and educational themes (Boyden & James, 2014). In India, Young Lives' data collection has taken place within 20 sites (at mandal level) in AP and Telangana. These states and sites were purposively selected when the Young Lives study began in 2002: the states were chosen as they represented approximately "median" states in terms of their Human Development Index and net state domestic product (Kapoor, 2017), while the sites within these states were selected with the aim of illustrating social diversity, albeit with a "pro-poor" slant (Boyden & James, 2014).

Young Lives school survey data
The majority of Young Lives' longitudinal data are collected at the household level. To complement these data, Young Lives has also conducted a series of school surveys in each of the four focus countries. In India, these include a primary school survey (in 2010) and a secondary school survey (2016-17; Iyer & Moore, 2017), which aimed to shed light on the quality, equity, and effectiveness of the schooling accessed by children living within the Young Lives sites.
This paper utilises data from the 2016-17 secondary school survey, which took place within the 20 Young Lives sites. The aim of the survey was to look at "the quality and effectiveness of education in AP and Telangana" (Moore et al., 2017, p. 7). Schools were sampled using a two-stage process of stratified sampling by school management type within each site. Across all 20 sites, the survey sample consisted of 9,820 Grade 9 students (age 14; the penultimate year of lower secondary school) within 205 schools, along with their maths and English teachers. As a result of this sampling strategy, student data are nested within the relevant teacher-, school-, and site-level data (see Table 1), and are thus well suited to multilevel analysis. A selection of descriptive statistics on the students sampled by this survey is presented in Table 2.
The India school survey data set includes three student outcome measures (maths, English, and transferable skills) gathered through bespoke cognitive tests. All Grade 9 students present on the day of the first survey visit completed these cognitive tests and accompanying background instruments, with a second measure of cognitive attainment collected from the same students using linked tests at the end of the school year. For this study of school effectiveness, analysis focuses on data from the maths cognitive tests, which were designed to be closely related to the school curriculum, while the English test aimed to assess "functional language" learned both within and outside school and is therefore less relevant to understanding the school effect (see Azubuike et al., 2017). Tests were developed in collaboration with an Indian test development organisation and a group of subject and curriculum experts from AP and Telangana through a multistage process of suggestion, feedback, and pilot testing. As a result, student outcomes in the standardised tests can be related to grade-level competencies within the state maths curriculum (James & Rossiter, 2018).

Methodology: estimating and comparing four models of school performance
This paper uses four different models of school performance to explore the extent to which estimation of school effectiveness in India varies when different measurement approaches are used. The models used are: raw attainment, contextualised attainment, school value-added, and contextualised school value-added (see Table 3). These models, which make reference to those within Muñoz-Chereau and Thomas's (2016) paper on understandings of school effectiveness in Chile, are used to shed light upon the implications of measuring school performance in different ways in India. In this study, each model has two levels, with students clustered within schools, which is well suited to the hierarchical structure of the Young Lives data discussed above. This analysis structure also helps to account for the similarities found between students within the same school (in comparison to those in other schools), which mean that treating student observations as independent would result in biased estimates. As will be discussed in the next section, there is a high degree of homogeneity within schools in this data set (and more broadly within the Indian context), making this a particularly important attribute to consider in this study.
Table notes: The abbreviations used in Column 5 are as follows: SC = Scheduled Caste; ST = Scheduled Tribe. These refer to officially designated groups of people who are considered to be among the most disadvantaged. (a) The total number of schools and students is reduced from that shown in Table 1 due to data cleaning prior to analysis.
The value-added models include a measure of prior maths attainment from the start of Grade 9, around 8 months earlier than the end-of-year test score which forms the outcome measure for the models. The contextualised models also include additional background variables at the student level which existing literature suggests have relevance to both student attainment and school intake in the Indian context and more broadly (e.g., see Muñoz-Chereau & Thomas, 2016; Singh & Mukherjee, 2015). As discussed above, the inclusion of these variables aims to control for differences in the student population which may affect learning outcomes but which are outside the control of the school, thus making it more possible to identify any "school effect". It should be noted that the inclusion of data from just one school cohort and one academic year within the Young Lives data set presents a limitation in estimating the effectiveness of schools as institutions (as discussed in Goldstein, 1997). However, given the scarcity of studies considering school effectiveness within lower income contexts (Scheerens, 2001; Scheerens et al., 2003), and the lack of longer term linked education data from India (Aslam et al., 2019), it is hoped that this paper will nonetheless offer a useful contribution to understandings of school quality, while also highlighting the potential for collecting and analysing longer term data sets to enable more in-depth work in the future.
Estimating school effectiveness: the effect of different models overall and for individual schools
All four models listed in Table 3 are multilevel random effects models. The outcome variable for each is student maths attainment from the end of Grade 9, estimated using three-parameter logistic (3PL) item response theory (IRT) scaling and scaled to have a mean of 500 and a standard deviation of 100 (for further details of the IRT scaling process used, see Moore, 2018). The value-added models also include a measure of student maths attainment from the start of the school year; this is estimated on the same scale as the outcome variable.
Table 3 (fragment recovered from the source; it appears to list the explanatory variables of the contextualised value-added model): Prior maths attainment from start of Grade 9, student background variables (sex, mother's education, number of books at home, caste).
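The rescaling of test scores to a mean of 500 and a standard deviation of 100, as described above, is a linear transformation of the underlying ability estimates. The following is a minimal sketch of that step only, not of the IRT estimation itself. The function name, the example ability values, and the use of the population standard deviation are assumptions made for illustration; the source does not specify them.

```python
from statistics import mean, pstdev


def rescale_scores(abilities, target_mean=500.0, target_sd=100.0):
    """Linearly rescale ability estimates to a chosen mean and SD."""
    centre = mean(abilities)
    spread = pstdev(abilities)  # population SD: an assumption, not from the paper
    if spread == 0:
        raise ValueError("all ability estimates are identical; cannot rescale")
    return [target_mean + target_sd * (a - centre) / spread for a in abilities]


# Hypothetical IRT ability estimates (in logits) for five students.
theta = [-1.2, -0.4, 0.0, 0.5, 1.1]
scaled = rescale_scores(theta)
```

Because the transformation is linear, it changes the units of the scores but not the ordering of students or the proportion of variance lying between schools.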
In estimating the "effectiveness" of the schools within this data set, the output of the four models is compared both in terms of the fixed effect parameter coefficients and the proportion of variation in student learning found at the school level (the variance partition coefficient [VPC]), which is estimated from the random effect parameters. The latter measure is of particular importance in understanding how conceptualisations of institutional effectiveness at the school level alter when student factors are considered.
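In a two-level random intercept model, the VPC is conventionally the school-level variance divided by the sum of the school-level and student-level variances. The sketch below illustrates this calculation and, for a model with no explanatory variables, a simple method-of-moments (one-way ANOVA) estimate of the two variance components. This estimator is a stand-in chosen so the example needs no external libraries; it assumes equal numbers of students per school and is not the estimation procedure used in the paper. All names and data are hypothetical.

```python
from statistics import mean


def vpc(between_var, within_var):
    """Share of total variance that lies between schools."""
    return between_var / (between_var + within_var)


def anova_variance_components(groups):
    """Method-of-moments variance components for balanced groups.

    `groups` is a list of lists of scores, one inner list per school.
    Returns (between_school_variance, within_school_variance).
    """
    sizes = {len(g) for g in groups}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()
    j = len(groups)
    if j < 2 or n < 2:
        raise ValueError("need at least two schools and two students each")
    grand = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in group_means) / (j - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (j * (n - 1))
    # A negative estimate is truncated at zero, as is conventional.
    between = max((ms_between - ms_within) / n, 0.0)
    return between, ms_within


# Hypothetical end-of-year scores for three schools of three students each.
schools = [[480, 500, 520], [580, 600, 620], [380, 400, 420]]
between, within = anova_variance_components(schools)
example_vpc = vpc(between, within)
```

In this toy data set the schools differ far more from one another than students do within them, so the VPC is close to 1; real data would give much lower values.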
To consider this further, school-level residual estimates (an estimate of the extent to which each school differs from the average) are then categorised into "above average", "below average", and "not statistically different to average" for each of the four models. This comparison, and the change in school categorisation between the models, highlights the potential effect of the use of different models on individual schools, with relevance for those seeking to interpret school performance in different ways in the Indian context.
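One common way to carry out this kind of categorisation is to place a confidence interval around each school residual and check whether it includes zero. The sketch below assumes a 95% interval (a multiplier of 1.96); the source does not state which interval was used, and the school names, residuals, and standard errors are invented for illustration.

```python
def categorise_school(residual, std_error, z=1.96):
    """Label a school by whether its residual interval excludes zero.

    z=1.96 gives an approximate 95% interval; this level is an assumption.
    """
    lower = residual - z * std_error
    upper = residual + z * std_error
    if lower > 0:
        return "above average"
    if upper < 0:
        return "below average"
    return "not statistically different to average"


# Hypothetical (residual, standard error) pairs for three schools.
school_residuals = {
    "school_A": (35.0, 10.0),
    "school_B": (-8.0, 12.0),
    "school_C": (-40.0, 15.0),
}
categories = {
    name: categorise_school(res, se)
    for name, (res, se) in school_residuals.items()
}
```

Running the same function on the residuals from each of the four models would show which schools change category as the specification changes.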

Considering between-school inequities in learning and progress
Descriptive analysis of the Young Lives data highlights the high degree of homogeneity in student background within schools (and particularly within school management types) in India, along with a correspondingly high level of heterogeneity between schools and school types (Table 2). When viewed in combination with a great deal of variation in school performance, this indicates a process by which children are "sorted" into different schools on the basis of background, serving to exacerbate inequities in learning outcomes between different groups of children. For this reason, the school residuals estimated using the four different models of school effectiveness described above are explored for different groups of schools (school management types, school locations) to consider how the use of different models informs our understanding of effectiveness for these different kinds of schools. Such comparison is indicative of those factors both within the school and more broadly which play a role in driving inequities in student learning outcomes in this context. It also allows deeper examination of the implications of these findings, particularly with regard to what they suggest for the identification of effective schools within this context.
The final stage of analysis in this paper involves the ranking of individual schools by their school residual, enabling comparison of how this ranking changes which schools appear as most or least effective for each of the four models. Through identifying the kinds of schools (by school management type, and location) ranked as the "best" or "worst" performers across the models, it is possible to consider what the implications of these differences are for student learning and progress in different schools, and thus to begin to explore what this means for learning equity in these two Indian states.
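The ranking comparison described above can be sketched as follows: schools are ranked by residual under two models, and the shift in each school's position is recorded. A Spearman rank correlation is added here as one possible summary of how far the two rankings agree; it is not mentioned in the source. The residual values are invented, and the sketch assumes no tied residuals.

```python
def rank_schools(residuals):
    """Rank schools from highest residual (rank 1) to lowest."""
    ordered = sorted(residuals, key=residuals.get, reverse=True)
    return {school: pos for pos, school in enumerate(ordered, start=1)}


def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation, assuming no ties and at least two schools."""
    n = len(ranks_a)
    d_squared = sum((ranks_a[s] - ranks_b[s]) ** 2 for s in ranks_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical school residuals under two of the four models.
raw_attainment = {"s1": 60.0, "s2": 20.0, "s3": -10.0, "s4": -70.0}
contextual_va = {"s1": 5.0, "s2": 15.0, "s3": -4.0, "s4": -16.0}

ranks_raw = rank_schools(raw_attainment)
ranks_cva = rank_schools(contextual_va)

# Positive shift: the school ranks better under the second model.
rank_shift = {s: ranks_raw[s] - ranks_cva[s] for s in ranks_raw}
rho = spearman_rho(ranks_raw, ranks_cva)
```

In this invented example the first two schools swap places once prior attainment and background are taken into account, while the other two keep their positions.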

Findings
Although concerns about the quality of education in India receive a great deal of attention, there is currently no means by which the performance of individual schools is officially assessed, with government data collection at the school level focused largely on inputs such as the number of teachers and quality of the buildings (e.g., see http://schoolreportcards.in). Consideration of different ways in which school performance could be modelled is therefore of considerable relevance.

A highly segregated school system: descriptive analysis
Descriptive analysis of the Young Lives school survey data reveals a highly unequal and segregated school system, with children "sorted" into schools according to their background. This is a finding with important implications for learning equity which will be relevant to consider in subsequent sections of this paper when exploring different models of school effectiveness. As shown in Table 2, Private Unaided schools and those in urban areas in particular are attended by more advantaged children (those from richer families, from more advantaged social groups, and those with more educated parents). These schools also achieve higher learning outcomes, on average. The presence of this type of unofficial segregation suggests that, unless these factors are taken into account when school performance is evaluated, there is a risk that measures of effectiveness will simply reflect how advantaged a school's intake is and the correlation of this with attainment, effectively "punishing" those which teach more disadvantaged students (Leckie & Goldstein, 2019).

The effect of context: modelling school performance in AP and Telangana
Model 1: a raw attainment (RA) model
\[ \text{Maths test score}_{ij} = b_0 + u_{0j} + e_{ij} \]
The first model of school performance used in this paper is a simple "raw attainment" model, which contains only the outcome variable (maths attainment at the end of Grade 9) and no explanatory variables. This model considers student attainment at just one point in time, and is therefore equivalent to reporting school performance based on a one-off measure such as exam pass rate.
Output from Model 1 (shown in Table 4) reveals that, when differences in student intake or prior attainment are not controlled for, there is a high degree of between-school variation in student outcomes. Almost 40% of the variation in student learning is found at the school level (as measured by the variance partition coefficient); this is fairly similar to that seen in other lower income countries (Scheerens, 2001), although it is considerably higher than would be expected in a context such as England (Thomas & Mortimore, 1996). This model makes clear that if how well children do at the end of a period of schooling is all that is considered, without looking at any contextual variables or considering student prior attainment, the institution which a child attends appears to have a very large effect on how much they learn. It therefore also highlights the extent of between-school variation in learning outcomes, with implications for learning equity (or inequity) within broader society. Given the high level of homogeneity in school intake discussed above, and the "sorting" of children into schools by background indicated by Table 2, this finding is highly suggestive of a society with large disparities in learning attainment at the end of secondary schooling on the basis of socioeconomic status and other background factors.
Model 2: a contextualised attainment model
[…] although it goes some way to controlling for differences in student intake, it cannot yet be considered an estimate of "school effectiveness". Existing literature suggests that school performance in India is highly varied, with a great deal of inequality in accessing those schools which do better (Singh & Bangay, 2014). Controlling for differences in student background is therefore important when modelling school performance in order to understand whether schools are doing well despite or because of their intake.
The output from this model reveals that most of the included student background characteristics are significantly associated with maths attainment. Being male is associated with a higher maths score, as is having a more educated mother, living in a household with more books, and being from a more advantaged social group (i.e., "general caste" compared to other caste categories). This provides further indication of a high level of inequity in learning outcomes associated with student background characteristics. Once these background variables are included in the model, the estimated school-level variation (the VPC) drops to 31%. This confirms that, while student attainment varies considerably between schools, this relates at least partly to differences in the composition of the school population, with implications both for understandings of school performance and for the equity of learning opportunities for children from different backgrounds.

Model 3: a value-added (VA) model
$$\text{Maths test score}_{ij} = b_0 + b_1\,\text{Prior attainment}_{i} + u_{0j} + e_{ij}$$

Model 3 offers an alternative way of extending Model 1, through the inclusion of a prior attainment measure from the start of Grade 9. This model does not include any student background variables other than prior attainment, which in itself will go some way to controlling for the earlier effects of background on learning. The results from this model reveal that, unsurprisingly, prior maths attainment is strongly and significantly associated with maths proficiency at the end of Grade 9, with an increase of 1 point at the start of the year associated with an increase of 0.7 points 8 months later. What is particularly notable from this model is that the inclusion of the prior attainment variable halves the amount of variation found at school level to just 20%. This confirms that, to a large extent, the apparent variation between these schools occurs due to differences in student starting points rather than in the effectiveness of schooling. As noted with regard to Model 2, this provides strong evidence for the informal "sorting" of children into schools in India, with higher achieving students clustered into certain schools which are higher attaining (in terms of raw test scores) and therefore have the appearance of performing better, even if they are not actually "adding more value" to student outcomes than those teaching students with a lower starting point.
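The distinction between raw attainment and "value added" can be illustrated with a deliberately simplified sketch. It assumes a pooled ordinary least squares fit of end-of-year score on prior score, followed by averaging the residuals within each school; this is a stand-in for, not a reproduction of, the multilevel model above (it applies no shrinkage and estimates no random effects). The two schools and all scores are hypothetical.

```python
from statistics import mean

# Hypothetical (school, prior score, end-of-year score) records.
# School A has a higher-attaining intake; School B adds slightly more progress.
records = [
    ("A", 60, 52), ("A", 70, 59), ("A", 80, 66),
    ("B", 20, 27), ("B", 30, 34), ("B", 40, 41),
]


def ols_fit(xs, ys):
    """Return (intercept, slope) of a simple least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


def mean_by_school(rows):
    """Average a value within each school; rows are (school, value) pairs."""
    grouped = {}
    for school, value in rows:
        grouped.setdefault(school, []).append(value)
    return {school: mean(values) for school, values in grouped.items()}


priors = [prior for _, prior, _ in records]
ends = [end for _, _, end in records]
intercept, slope = ols_fit(priors, ends)
overall_mean = mean(ends)

# "Raw" effect: school mean score relative to the overall mean.
raw_effect = mean_by_school((s, end - overall_mean) for s, _, end in records)
# "Value-added" effect: school mean of residuals after adjusting for prior score.
va_effect = mean_by_school(
    (s, end - (intercept + slope * prior)) for s, prior, end in records
)

print("raw:", raw_effect)
print("value-added:", va_effect)
```

In this constructed example School A looks far stronger on raw scores, while the ordering reverses once prior attainment is taken into account, which mirrors the pattern described in the text.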

Model 4: a contextualised value-added (CVA) model
$$\text{Maths test score}_{ij} = b_0 + b_1\,\text{Prior attainment}_{i} + b_2\,\text{Female}_{i} + b_3\,\text{Mother's education}_{i} + b_4\,\text{Number of books}_{i} + b_5\,\text{Caste}_{i} + u_{0j} + e_{ij}$$

The final model, Model 4, extends Models 2 and 3 with the inclusion of both prior attainment and selected student background variables. This model reveals that the inclusion of prior attainment reduces the significance of some other background variables. For example, "female" is no longer significant, suggesting that although girls have lower maths attainment at the end of Grade 9 (as seen in Model 2), this is not because they have made less progress over the surveyed school year but is instead likely to be because of disparities which occurred at an earlier stage in their education. Higher levels of mothers' education (i.e., secondary or higher education) remain significantly associated with learning, but having a mother who attended primary school is no longer significantly different to a mother with no education. Similarly, "scheduled caste" and "scheduled tribe" both remain negatively associated with maths attainment (although the coefficient is considerably smaller), but the "other backward caste" category is no longer significantly different. The number of books in the home remains significant, revealing that this is important both for overall attainment and progress made over the course of Grade 9. In addition, as would be expected, student prior attainment also remains highly significant in the CVA model. In this final model, 16.7% of variation in student attainment is found between schools. This is considerably lower than in Model 1, which considers raw attainment alone.
This finding has important implications both for policy and for those researching education in India, confirming that a large proportion of what initially appears to be difference in school performance is actually attributable to differences in the student intake, both in terms of initial levels of learning and other student background characteristics. Situating this within research into school effectiveness in LMICs more broadly, this highlights that, while the school effect is often found to be larger in lower income contexts, this may be due to more extreme variation in student intake between different categories of schools, such as different school management types or schools in urban and rural areas. This is likely to be particularly relevant in countries such as India which have experienced huge growth of the "low fee private school sector" (Srivastava, 2013), leading to a large-scale marketisation of the education system.
Model 4 also highlights the importance of considering other implications for the inclusion of background variables within such models. While controlling for differences in student background and prior attainment potentially helps to provide a more accurate reflection of the effectiveness of individual schools in terms of their support for student learning, controlling for these factors (which are associated with attainment) may also result in the huge variation in learning between different types of schools (and therefore different groups of students) being obscured. This indicates the need for a balance between modelling learning at the school level and at the student level, as will be considered later in this paper.
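The change in the school-level share of variance across the four models can be summarised with a short calculation. The sketch below uses the rounded percentages quoted in the text and, as an assumption, treats "almost 40%" for Model 1 as 0.40. It compares VPCs only, so it describes the change in the school-level share rather than in the absolute between-school variance.

```python
# School-level share of variance (VPC) as quoted in the text.
# Assumption: "almost 40%" for Model 1 is approximated as 0.40.
reported_vpc = {
    "Model 1": 0.40,   # raw attainment
    "Model 2": 0.31,   # contextualised attainment
    "Model 3": 0.20,   # value-added
    "Model 4": 0.167,  # contextualised value-added
}


def relative_reduction(baseline, adjusted):
    """Proportional fall in the VPC relative to the baseline model."""
    return (baseline - adjusted) / baseline


baseline = reported_vpc["Model 1"]
reductions = {
    model: relative_reduction(baseline, vpc)
    for model, vpc in reported_vpc.items()
    if model != "Model 1"
}

for model, reduction in reductions.items():
    print(f"{model}: VPC falls by {reduction:.1%} relative to Model 1")
```

On these rounded figures the school-level share in Model 4 is less than half of that in Model 1, which is consistent with the interpretation offered above.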

The effect of modelling decisions on understandings of the performance of individual schools
These different ways of measuring school effectiveness can then be used to explore how the performance of individual schools or groups of schools is perceived; something of considerable relevance in the Indian context where parents and guardians are found to make repeated "school choice" decisions throughout the course of their children's school career (James & Woodhead, 2014). To consider the effects of the different models in further detail, Table 5 shows the number of schools classed as performing significantly above or significantly below average (in terms of the school residual) for each of the four models. This reveals the importance of model specification in assessing school effectiveness (Goldstein et al., 2000; Leckie & Goldstein, 2019). In particular, those models where a greater number of fixed effects have been included identify fewer schools as significantly different from the average, as a large proportion of the factors contributing to the between-school variation has been controlled for. This again highlights the importance of considering the reasons for investigating school performance: If trying to assess which school is most effective in helping students learn, it appears fairest to control for all factors which are outside the control of the institution. In contrast, if there is interest in understanding how the distribution of students across different schools relates to inequities in attainment, it may be less helpful to control for these factors, as this would serve to obscure some of the variation being found between the schools.
Exploring this further, it can be seen that moving from Model 1, a raw attainment model, to Model 4, a contextualised value-added model, leads around half of the schools (101) to move between these performance categorisations. As shown in Table 6, it is clear that in Model 4, where both student prior attainment and background are considered, the number of schools classed as performing "significantly below average" drops from 76 to 37. It is also particularly notable that two schools move from being classed as performing "significantly below average" to "significantly above average" once these additional explanatory variables are included. Given the high stakes of perceptions of school performance in India, where increasing numbers of families are opting out of the government sector and into private schools, this analysis makes clear the importance of acknowledging the effect of student background when assessing school quality, remembering that "adjusting for pupil background qualitatively changes many of the interpretations and conclusions one draws as to how schools […] are performing" (Leckie & Goldstein, 2019, p. 532).

Table 5. Number of schools in each performance category, by model
                                         Model 1   Model 2   Model 3   Model 4
Significantly better than average a         59        49        43        39
Significantly worse than average a          76        63        47        37
Not significantly different to average      65        88       110       124
Total                                      200       200       200       200
Note: Model 1 = raw attainment; Model 2 = contextualised attainment (CA); Model 3 = value-added (VA); Model 4 = contextualised value-added (CVA).
a Confidence intervals of +/-1.96 standard error used to classify schools as significantly different to average (0).
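The classification rule described in the table note can be sketched as follows. The function is a minimal illustration, assuming each school has an estimated residual and an associated standard error (the function and variable names are hypothetical). The counts are those reported for the four models, and are used only to show how the share of schools flagged as different from average falls as more controls are added.

```python
def classify_school(residual, standard_error, z=1.96):
    """Classify a school by whether the interval residual +/- z * SE excludes zero."""
    lower = residual - z * standard_error
    upper = residual + z * standard_error
    if lower > 0:
        return "significantly above average"
    if upper < 0:
        return "significantly below average"
    return "not significantly different"


# Counts of schools in each category, as reported for the four models.
reported_counts = {
    "Model 1": {"above": 59, "below": 76, "not_different": 65},
    "Model 2": {"above": 49, "below": 63, "not_different": 88},
    "Model 3": {"above": 43, "below": 47, "not_different": 110},
    "Model 4": {"above": 39, "below": 37, "not_different": 124},
}

flagged_share = {}
for model, counts in reported_counts.items():
    total = sum(counts.values())
    flagged = counts["above"] + counts["below"]
    flagged_share[model] = flagged / total
    print(f"{model}: {flagged} of {total} schools differ significantly from average")
```

A call such as `classify_school(0.5, 0.2)` returns "significantly above average", because the whole interval lies above zero, whereas a residual of 0.3 with the same standard error would not be distinguishable from the average.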

Considering the effect of different models across different school types
The effect of modelling decisions on understandings of school performance becomes even more apparent when considering how the use of each model affects different types of schools. For example, as can be seen in Figure 1, the average school effect (in terms of the "school residual"; i.e., the extent to which the school differs from the average) for Private Unaided schools drops dramatically moving from Model 1 to Model 2, and then drops still further when prior attainment is also considered in Model 3. On the other hand, for Tribal/Social Welfare schools (and to a lesser extent, State Government schools) the opposite pattern is seen, and the school effect is found to "improve" once background variables and prior attainment are added into the model. This indicates that, while there may be big differences in the performance of different school types in terms of raw attainment, with Private Unaided schools (attended by the most advantaged children) appearing to perform best, there are fewer differences once the background of children and their prior attainment is included in the analysis. A similar effect can be seen in Figure 2 in terms of school locality, with Model 1 benefitting urban schools, while the performance of rural schools appears better once student background and prior attainment are taken into account. With so much discussion of the poor performance of rural schools and government-run schools in India, this finding is important: Unless the sorting of students into each type of school is considered when measuring the quality of schooling, the resulting estimate will further benefit those schools which are already advantaged even though the data suggest they are not actually performing better once their intake is controlled for. However, it must also be remembered that the resulting school residuals from Model 1 are highly revealing in terms of the extent of inequalities in attainment they suggest. 
As Figure 2 shows, there is a huge gap between the learning attainment of children in urban and rural schools, and it is this which is likely to determine the types of opportunities they can subsequently access, rather than the effectiveness of the school they attend.

Comparing school rankings across models
As a final comparison, the effect of the different models on individual schools is considered. Ranking schools by their school residual for each model, it is clear that there is a big difference in which are seen to be the most or least effective once contextual variables are included (see Table 7). In Model 1, all of the top five performing schools are urban Private Unaided schools, while the bottom five are either Tribal/Social Welfare schools or State Government schools in rural areas. However, when both prior attainment and student background are included in the model (Model 4), this alters somewhat, and there is a great deal more variation in the types of schools found to be more or less effective. The school ranked as "most effective" is now a rural State Government school, while one of the bottom five ranked schools is a Private Unaided school in an urban area. Although such rankings would have little practical use in reality, as the schools they compare are spread across seven districts and 20 mandals, this comparison highlights how different ways of understanding and measuring school effectiveness can have a big impact on which schools are perceived as most effective, with very real implications for schools and the children within them.
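The ranking comparison can be illustrated with a small sketch. It assumes that each school has one residual per model and simply sorts schools by that residual; the school labels and residual values are hypothetical and chosen only to show how positions can shift between a raw attainment model and a contextualised value-added model.

```python
def rank_schools(residuals):
    """Return school identifiers ordered from highest to lowest residual."""
    return sorted(residuals, key=residuals.get, reverse=True)


# Hypothetical school residuals under two model specifications.
raw_residuals = {"S1": 1.2, "S2": 0.8, "S3": 0.1, "S4": -0.4, "S5": -0.9}
cva_residuals = {"S1": 0.1, "S2": -0.3, "S3": 0.4, "S4": 0.2, "S5": -0.2}

raw_ranking = rank_schools(raw_residuals)
cva_ranking = rank_schools(cva_residuals)

# Positive values mean the school moves up the ranking under the CVA model.
rank_shift = {
    school: raw_ranking.index(school) - cva_ranking.index(school)
    for school in raw_residuals
}

print("Raw attainment ranking:", raw_ranking)
print("CVA ranking:           ", cva_ranking)
print("Change in position:    ", rank_shift)
```

In this constructed example the school ranked first on raw attainment falls to third place, while a mid-ranked school rises to the top, which is the kind of reordering described for Table 7.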

Discussion
The analysis conducted in this paper reveals that, as expected, student prior attainment and background variables related to socioeconomic status, such as caste, mother's education, and the number of books in the household, are positively associated with Grade 9 attainment in maths in AP and Telangana. However, it is notable that the effect of some of these background variables becomes less significant in Model 4, suggesting that their impact largely occurs in the years of schooling prior to Grade 9 and is therefore accounted for once differences in prior attainment are included in the model. This confirms the importance of addressing issues of education equity at a much earlier point in the schooling cycle if learning outcomes at secondary level are to be improved for all students.
Existing literature suggests that there is a high degree of variation in school performance in India (Das & Zajonc, 2010). However, the four models estimated in this paper reveal that over half of the between-school variation in student raw attainment is explained by differences in intake and prior learning, suggesting that much of the apparent school-level difference is due to student intake rather than differences in school effectiveness. This is perhaps not surprising, as India has many deeply embedded inequalities (Alcott & Rose, 2017; Bhagavatheeswaran et al., 2016; Crouch & Rolleston, 2017) as well as an education system based on "school choice" which supports the unofficial segregation of children into better or worse schools depending on their socioeconomic status and location (Härmä, 2011).
Due to the clear impact of school composition on observable measures of performance such as test scores, it is important for policy makers and researchers to reflect on how information on school effectiveness (rather than only raw attainment) can be made accessible to parents and others making decisions about schooling. The analysis conducted in this paper reveals that the model specification used has considerable effect on how school performance is classified, suggesting that perhaps it would be helpful to report a selection of different progress measures (as suggested for the English education system by Leckie & Goldstein, 2019) along with an explanation for a "lay audience" of why each is relevant. This would have particular, and ongoing, relevance in the Indian context, given the importance of "school choice" and the regular movement of children between schools on the basis of perceived "quality" (James & Woodhead, 2014). Without this, discussions of school performance will continue to support the dominant deficit narrative around government schools in particular, when compared to private schools, and of rural schools compared to urban schools. Alongside this, the analysis undertaken in this paper also confirms the importance of seeking to understand more about the learning inequities arising out of the "sorting" of children into schools which have different levels of (raw) performance, both in terms of the mechanisms for such sorting, and the outcomes of it. Finally, this paper also highlights the importance of future attempts to examine school effectiveness in India taking into account differences in student background and prior attainment, if they are to reflect the very different circumstances in which these schools are working and the extent to which they can be said to be truly succeeding at helping their students learn. 
In particular, the finding that many lower attaining schools (in terms of raw scores) may be more or equally effective (in terms of student progress) indicates that there may be policy relevance in taking such analysis forward to consider how different school types deliver "effective" schooling. In this way, extending this type of school effectiveness analysis in the Indian context could serve as a means by which to identify how processes of schooling could be reformed to address, rather than exacerbate, the extreme inequities in learning currently found within the education system.

Notes
1. Gross enrolment rate.
2. The school management types used for sampling were: Private Aided, Private Unaided, State Government, and Tribal/Social Welfare. These four school types make up 99% of the schools within these sites. Further information on the characteristics of these school types can be found in Moore et al. (2017).

Funding
Work on this paper was supported by ESRC funding (Grant reference: SWDTP ES/P000630/1).

Notes on contributor
Rhiannon Moore is a PhD student based in the School of Education at University of Bristol. Her mixed-methods PhD explores teacher effectiveness, motivation, and professional knowledge in the Indian context, and considers the extent to which different teacher traits make a difference to the classroom environment and to how much students learn. Before starting her PhD, Rhiannon worked on a number of education projects in India, Sub-Saharan Africa, and in the UK.