Professional learning communities under test-based accountability: evidence from an Israeli intervention programme

ABSTRACT Test-based accountability (TBA) draws on a managerialist ideology that emphasises standards, constant measurement, and external motivation for improvement. It stands in sharp contrast to the idea of professional learning communities (PLCs) that aim to mobilise teachers’ internal motivation and willingness to cooperate with peers to facilitate a joint, self-reflective inquiry process of pedagogical improvement. The Israeli education system has adopted TBA policies. To determine how this affects a professional learning programme focused on reflective inquiry, we analysed staff discussions recorded in 180 PLC meetings in 17 schools. The study suggests that TBA not only narrows the curriculum and the repertoire of pedagogical practices used by teachers but also constrains the ability of teacher professional learning to counterbalance these negative consequences.


Introduction
In education, test-based accountability (TBA) and professional learning communities (PLCs) aim at a similar goal: improving educational processes whilst holding teachers accountable for their pedagogical improvement. Yet they build on very different logics. TBA approaches draw on a managerialist ideology that emphasises top-down predefined standards, constant measurement, and external motivation for improvement (e.g. Lingard, Martino, and Rezai-Rashti 2013). PLC programmes, in contrast, endeavour to mobilise teachers' internal motivation and willingness to cooperate with their peers to facilitate a joint, self-initiated, self-directed process of learning and improvement (e.g. Cochran-Smith and Lytle 1999). A major difference between these two approaches is related to trust. TBA expresses a distrust of the will and ability of teachers and principals to improve their pedagogical practices without external coercion (e.g. Ball 2003; Lingard, Martino, and Rezai-Rashti 2013). The PLC approach is much more optimistic about school staff's capacity and internal motivation to improve and expresses trust in bottom-up processes of agentive self-reflection, authentic learning, and educational change (e.g. Bannister 2015; Bryk et al. 2010; Cochran-Smith and Lytle 1999).
The study we report here began as a comprehensive research project on a large-scale programme supporting and examining teacher pedagogical talk in the framework of professional learning communities (Vedder-Weiss et al. 2018, 2020; Vedder-Weiss, Segal, and Lefstein 2019). As soon as we started analysing the data, however, we noticed that whilst teachers were encouraged to critically inquire into their pedagogical practices and to cooperate in developing ways to improve learning and instruction in their schools, in many instances, their frame of reference was national assessment tests. Using audio recordings of 180 PLC meetings from 17 Israeli schools, we were able to analyse authentic teacher talk about pedagogical challenges associated with TBA. Most previous research on the effects of TBA on teachers' professional lives is based on interview and survey data (e.g. Hardy and Lewis 2017; Perryman et al. 2011; Saeki et al. 2018; Smith and Holloway 2020), but in our study, teachers were not asked about standardised tests. Rather, the topic spontaneously emerged in professional discussions, giving us a unique opportunity to understand the ways in which TBA interacts with professional learning in PLC meetings and whether, and if so how, teachers' pedagogical reasoning in PLC discussions is affected by the policy framework of TBA.

TBA: rationale and unintended consequences
Drawing on the American case of the 2002 No Child Left Behind Act, Wills and Sandholtz (2009) maintain that whilst there is wide consensus on the need for high teacher quality, there are contesting views on how to promote high-quality instruction in schools. More specifically, there are two main opposing approaches: professionalisation and standardisation. Both build on the concept of accountability; that is, schools, principals, and teachers are accountable for pedagogical quality and its improvement. Yet the approaches differ in their understanding of how to achieve educational improvement. The professionalisation approach emphasises teacher knowledge, experience, expertise, and judgment. Thus, 'when teaching is viewed as more than the simple transmission of facts and ideas, teachers must make reasoned judgments in determining how to make content and ideas more accessible to students' (Wills and Sandholtz 2009, 1068). Accountability is achieved in this approach by building a strong base of knowledge and expertise through professional learning processes. The standardisation approach centres on the technical aspects of teaching, shifting 'the responsibility for developing and interpreting curriculum . . . from individual teachers to the district or state level and often relies on outside experts and curriculum designers' (Wills and Sandholtz 2009, 1070). External tests are the main tool to achieve schools' compliance with these external standards. In this context, TBA is described by Lingard and associates as a 'vertical, one way, top-down, one-dimensional form of accountability with restrictive and reductive effects on the work of principal and teachers, and on the school experiences of students and their parents' (Lingard, Martino, and Rezai-Rashti 2013, 544).
Like the United States, the United Kingdom implemented TBA relatively early on. Ball discusses this policy turn in the context of performativity, which he sees as 'a technology, a culture and a mode of regulation that employs judgements, comparisons and displays as means of incentive, control' (Ball 2003, 216).
According to Ball, subjecting teachers to the policy of performativity increases the individualisation of teachers' work. Whilst the use of numerical indicators is aimed at producing more and 'better' information on the educational process, teachers' day-to-day professional life becomes increasingly elusive and insecure as they constantly ask themselves whether they are 'doing enough, doing the right thing, doing as much as others, or as well as others' (Ball 2003, 220). Anderson and Cohen further maintain that teachers and administrators 'are put in a position in which they must look to market and test-based forms of accountability for direction rather than their professional training, associations, or unions' (Anderson and Cohen 2015, 5). Following Foucault's notion of disciplinary power, they argue that in this policy environment, teachers and administrators internalise norms such as the idea that quantitative measurements are more important than other ways of evaluation and that a school's ranking in a standardised test is ultimate proof of quality.
Verger et al. (2019) point to the rapid growth of national large-scale assessments, mostly in the context of TBA. In the mid-1990s, about 15 countries used such assessments, but by 2012, the figure had doubled. The number of assessments within countries burgeoned as well. The extensive body of research on TBA documents several major unintended consequences for pedagogical practices in diverse education systems driven by pressures to meet the tests' standards. These include, for example, narrowing the curriculum and concentrating on subjects covered by the tests (e.g. Au 2007; Berliner 2011), spending more school hours on teaching to the test (e.g. Jennings and Bearak 2014; Ohemeng and McCall-Thomas 2013), and sorting students into groups to maximise the percentage who pass the test at the expense of inclusive and heterogeneous instruction (a practice referred to as 'educational triage'; see Booher-Jennings 2005; Ladd and Lauen 2010). TBA has also been found to increase stress and burnout among teachers, who feel frustrated and deprofessionalised when constantly working under external assessment and dictates (e.g. Perryman et al. 2011; Saeki et al. 2018).
Whilst the body of research on the unintended consequences of TBA is extensive and covers many education systems, research suggests that the consequences (both intended and unintended) of the use of standardised testing vary according to the characteristics of the accountability system. Based on a literature review from the US, Jennings (2012) argues that teachers' use of test data for pedagogical improvement (versus turning to distortive practices aimed at raising test scores) depends on the accountability system attached to those data. She emphasises five main characteristics: amount of pressure, locus of pressure, type of goals for student performance, assessment policy, and whether the accountability system uses multiple measures of processes or is focused on specific outcomes. Ehren et al. (2020) aim at complicating the dichotomy between accountability and trust that dominates much of the research presented above. Building on the case of South Africa, they maintain that accountability and trust can be aligned only when schools have the financial, organisational, and professional resources they need to improve their pedagogical practices and meet external learning goals. When the system does not provide schools with these resources, 'the trust relation breaks down and the accountability becomes a tick box exercise where data is sometimes manipulated to seemingly comply to external demands' (Ehren, Paterson, and Baxter 2020, 206).

TBA and teacher professional discourse
An important topic in the extensive literature on TBA is how this policy environment influences teacher professional discourse, in either staff meetings or professional learning frameworks. This line of research is especially relevant to our present focus on teacher professional discourse in the framework of PLCs. Lundahl and Waldow (2009) examined the use of standardised tests in Sweden and Germany over the 20th century. Based on their findings, they argue the tests were originally used to 'strengthen vague progressive ideas,' but over time, they began to overemphasise educational efficiency, thus contributing 'to a turn away from a more complex, or at least history-laden, educational language' (Lundahl and Waldow 2009, 381). As a result, actors in the field of education tend to adopt a simplistic discourse of educational problems (which they label 'quick language') heavily influenced by psychometric concepts. Thus, instead of inquiring into complicated and multifaceted pedagogical issues, teachers and administrators use numeric information to provide a sense of accuracy and objectivity. Similarly, in a study of the Norwegian education system, Mausethagen finds that with the introduction of TBA measures, teachers began to 'communicate a greater drift toward performance and the integration of national testing as a part of their work' (Mausethagen 2013, 140). Whilst some teachers tried to resist the logic of the tests, they saw students' preparation for the tests as important for their wellbeing and development. The teachers justified this by saying they needed to 'protect' the students and help them achieve the highest possible score. Bausell and Glazier (2018) examined the role of high-stakes testing in early career teachers' socialisation. The study was conducted in North Carolina from 2009 to 2015 and focused on learning communities in which early career elementary school teachers met and shared stories of practice with other teachers. The researchers report that the language of teachers changed over the course of six years. The first phase was characterised by teachers distancing themselves from the test and applying different strategies to resist high-stakes testing. In this phase, teachers saw themselves as educators and emphasised student needs. In the second phase, teachers were more concerned with their effectiveness vis-à-vis the educational standards measured by the tests; their professional discourse was shaped by this concern and the logic of assessment. In the third phase, the focus of the meetings was on test scores, and teachers used the tests to evaluate their peers. In this phase, all the dilemmas presented by teachers were framed by the 'testing jargon' of the high-stakes testing policy. In the view of Lewis and Holloway, the datafication of the teaching profession has reshaped its underlying logics: 'teachers are valued most for openly professing a data-responsive disposition and for their ability to embody these data-informed renderings of self, over and above other more educative and pedagogical practices' (Lewis and Holloway 2019, 48).
Garver's recent ethnographic study, conducted in an American school serving diverse and high-needs students, provides important insights into TBA. Whilst policy makers are concerned by the reliability and validity of using tests to evaluate teachers, Garver argues they tend to ignore the broader influences of these tests on teachers' interactions and professional culture. In line with previous studies, she finds that large portions of teachers' collegial time are spent on preparing students for tests, administering tests, and grading tests. Her observations of staff meetings reveal that "sharing student test scores and proctoring and grading for colleagues' students fostered vulnerability and anxiety in teacher-teacher relationships" (Garver 2020, 640), resulting in an eroded professional culture. Looking at Russia, Gurova and Camphuijsen suggest that whilst TBA may have different effects in different schools, it tends to 'reinforce instrumental, and impede expressive logics of action in schools' (Gurova and Camphuijsen 2020, 199). In New Zealand, Dyson explores the ways accountability affects teachers' inquiry processes, highlighting the danger of 'overemphasis on narrow student outcomes' at the expense of broader examination of student learning. She shows how inquiry processes that are supposed to support teacher agency 'slip into performativity and become a way to hold educators accountable for their performance' (Dyson 2020, 5), warning, with Smardon and Charteris (2017, 7), against 'accountability in the guise of agency'.
Whilst one of the main reasons for the introduction of standardised tests into education systems has been to provide actors with information to improve their practice, research has shown the reality is much more complicated. As Ingram et al. (2004) show, teachers face several organisational and cultural barriers when they analyse standardised test results. This complexity is partly a result of contradictory beliefs about data in the age of TBA. When Hardy and Lewis investigated school actors' responses to performance data in Queensland, Australia, they found 'teachers engaged in a form of the doublethink of data, where data were simultaneously denied and deified' (Hardy and Lewis 2017, 683). As they point out, "Such responses reveal contradictory logics within a field of schooling practices, in which there seemed to be at least some evidence of teachers' representations of student (and teacher) performance being constituted at the expense of more substantive teaching practice" (Hardy and Lewis 2017, 683).

Professional learning communities
In contrast to the TBA approach, which focuses on external top-down pressure, the PLC approach to educational improvement builds on the school staff's capacity and intrinsic motivation to engage in bottom-up processes of agentive self-reflection and authentic learning enabling educational change (e.g. Bannister 2015; Bryk et al. 2010; Cochran-Smith and Lytle 1999). PLCs are groups of teachers seeking to improve classroom teaching and student learning by 'sharing and critically interrogating their practice in an ongoing, reflective, collaborative, inclusive, learning-oriented, growth-promoting way' (Stoll et al. 2006, 222). Accumulating evidence demonstrates the potential of PLCs to facilitate teacher learning and improve teaching (Grossman, Wineburg, and Woolworth 2001; Horn and Kane 2015; Horn and Little 2010; Vescio, Ross, and Adams 2008). For example, Bryk et al.'s (2010) large-scale, longitudinal study of school reform in Chicago finds that strong PLCs improve teachers' instruction. Similarly, Bolam et al.'s (2005) study of PLCs in 393 schools in England notes a positive impact of PLC participation on teaching practice. In contrast to TBA, which builds on a top-down accountability regime, the PLC approach offers a model for educational change that builds on developing in-school capacities that enable school leaders and teachers to be accountable for their own pedagogical improvement (Bryk et al. 2010; Stoll et al. 2006).
PLCs have become increasingly popular in many countries (Bryk et al. 2010; Lefstein, Vedder-Weiss, and Segal 2020; Tam 2015; Vangrieken et al. 2017; Vescio, Ross, and Adams 2008), as the PLC model aligns with current learning theory, namely a view of learning as situative and therefore entangled with practice (Lave and Wenger 1991; Putnam and Borko 2000). PLCs can help close the gap between 'learning' and 'practice,' a gap often observed to impede teachers' professional learning (Ball and Cohen 1999). The PLC model may also help overcome the isolation characterising the professional context of many teachers (Hargreaves 1994; Little 1990; Lortie 1975). It acknowledges teachers' expertise and their ability to agentively direct their learning (Cochran-Smith and Lytle 1999).
The PLC model assumes teachers' professional conversations play a critical role in their learning (Lefstein, Vedder-Weiss, and Segal 2020). Teacher discourse both reflects and shapes the way teachers think. The concepts and categories teachers use to discuss teaching and learning are also those with which they think about them. Hence, the ways teachers talk about educational aims, how to achieve them, and how to evaluate them are consequential for how they think about their work (Vedder-Weiss et al. 2020). Discourse in PLCs is a key socialising medium through which members of the community negotiate their work, make sense of their practice, and construct professional norms (Goodwin 1994). Thus, PLC discourse is a fruitful context in which to examine the impact of accountability and standardised testing on how teachers think about their teaching and student learning. Such 'in the wild' (Hutchins 1995) loosely structured pedagogical discourse provides naturalistic data offering a glimpse into the way TBA penetrates schools' professional discourse and professional learning endeavours.
Research suggests productive PLC discourse (that is, discourse that improves participants' teaching; Lefstein, Vedder-Weiss, and Segal 2020) entails the following: (a) it focuses on problems of practice (Horn and Little 2010); (b) it inquires into such problems from an agentive, generative, yet critical orientation (Louie 2016); (c) it is anchored in rich representations of practice, such as student work and video recordings of classroom practice (Little 2003); (d) it involves pedagogical reasoning that includes a broad range of considerations from different perspectives (Horn and Little 2010); and (e) it productively manages multiple voices and contrasting ideas (Grossman, Wineburg, and Woolworth 2001) by combining support and critique (Segal 2019).
Yet research also demonstrates that such productive discourse is rare (Lefstein, Vedder-Weiss, and Segal 2020). PLCs often demonstrate contrived collegiality (Hargreaves 2000), avoiding disagreements and restricting critical voices (Grossman, Wineburg, and Woolworth 2001). Teachers hesitate to 'open their classroom door' and share representations of their practice (Borko et al. 2008). They often prefer sharing 'tips and tricks' (Horn et al. 2017) and success stories (Segal 2019) over problems of practice, thus inhibiting opportunities to collaboratively reason about instructional challenges. When they do share problems, teachers tend to normalise and frame them in unproductive ways (Bannister 2015; Rainio and Hofmann 2021; Vedder-Weiss et al. 2018).
Thus, whilst the concept of PLCs holds great promise, realising their promise requires coping with professional norms that undermine it.We study how TBA adds to these constraining norms.

TBA in the Israeli context
The Meitzav, a Hebrew acronym for Growth and Efficiency Measures of Schools, is a testing regime introduced to the Israeli education system in 2002 as part of a shift in Israeli education policy towards a neo-liberal orientation emphasising educational standards, performance, and accountability (Resnik 2011; Yonah, Dahan, and Markovich 2008). This testing regime includes student achievement tests for the second, fifth, and eighth grades and questionnaires on school climate and other pedagogical aspects. Achievement is measured in four core subjects: language (Hebrew or Arabic), mathematics, English, and science. The assessments reflect the Israeli curricula, and they are aimed at examining the extent to which students in elementary and middle schools achieve the expected level of proficiency required by those curricula. Participation in the Meitzav, once every two or three years, is mandatory for all Israeli state schools and most publicly funded independent schools (some ultra-orthodox schools are exempt from the tests). About 20 percent of all second, fifth, and eighth graders sit Meitzav tests each year. The Meitzav was presented to the education system and the wider public as a low-stakes accountability system with no formal consequences for students or schools. The Israeli Ministry of Education opposed the publication of the results from these tests. Yet after a few years of public and legal debates, a Supreme Court ruling obliged the Ministry to publish test scores at the school level (school means), and they now appear on the Ministry's website. As was expected, the publication of Meitzav results intensified its unintended consequences and the competitive pressures among schools and municipalities (Feniger, Israeli, and Yehuda 2016).
In an early study, Klieger (2009) interviewed teachers and principals and discovered that most saw the Meitzav as a tool for measuring student achievement, not a tool for improving learning and instruction, as stated by the Ministry of Education. More recently, Klein (2017) used a questionnaire to survey elementary and middle school teachers on the effects of the Meitzav on their work. The study compared the Meitzav's external tests with similar internal school tests. The external tests consumed much more preparation time and drew more teacher and administrator attention. They were also related, according to teachers' reports, to ethical deviations (such as asking low-achieving students not to attend school on the day of the test). Zohar and Alboher Agmon (2018) studied how senior science teachers viewed the influence of policies aimed at raising Meitzav and other test scores on the instruction of higher order thinking. The interviews revealed complex views on the tests, but the authors conclude that in conditions of TBA, "instruction may have facilitated students' ability to improve test scores but did not make a real contribution to the development of their scientific reasoning and deep understanding" (Zohar and Alboher Agmon 2018, 258). Feniger et al. (2016) collected data from school principals using a questionnaire and in-depth interviews. They documented increased pressure on teachers and principals, diversion of resources to tested subjects, narrowed curricula, a focus on tested subjects and skills, increased school hours spent teaching to the tests, and forms of educational triage. Furthermore, although principals criticised the Meitzav, they accepted the logic of the use of standardised testing. Most said the Meitzav provides valuable information, although they emphasised that their knowledge of how to analyse these data is very limited. When they were asked about actions taken following their analysis of test results, in most cases, the principals reported practices associated with preparing pupils for the tests.

The PLC intervention programme
The programme we studied is part of a large design-based implementation research project to advance and study in-school teacher learning by fostering productive pedagogical discourse (see Vedder-Weiss et al. 2018, 2020; Vedder-Weiss, Segal, and Lefstein 2019). Collaborating with the Ministry of Education and a philanthropic foundation, both of which invested financial resources in the programme, we worked with two large education districts that managed the programme and recruited participating schools. The research team was responsible for documenting activity, leading the development of the programme, training and supporting district coaches, and advising school and district leaders. Participating schools established two to five in-school PLCs facilitated by leading teachers from the school. The PLCs were organised according to subject matter (e.g. mathematics PLC or language arts PLC) or according to a pedagogical topic (e.g. evaluation methods or student motivation). PLCs were expected to collaboratively inquire into problems of practice important to the group members, using classroom videos, student work, or detailed case studies they wished to share. They were encouraged to share problems and examine them through multiple perspectives, whilst respecting and considering all voices and agentively weighing various coping alternatives. Leading teachers learned to facilitate these pedagogical discussions in bi-weekly regional workshops, where they experienced participating in such inquiry, discussed its rationale, and reflected on its advantages and challenges. The leading teachers were provided with discussion protocols and materials, such as classroom video excerpts, to scaffold their facilitation, but they were autonomous to set their own goals and foci of discussion and to select which of the programme's materials and tools, if any, to use.
It may seem paradoxical that TBA and PLCs, which are almost opposite approaches to educational improvement, emerged over the same period and coexist in many education systems. In the Israeli case, the Ministry of Education adopted TBA as a major feature of education policy whilst encouraging and collaborating in the implementation of large-scale PLC interventions, even investing financial resources in their implementation. The Israeli case suggests two explanations for the paradoxical coexistence of two very different approaches. First, the Ministry of Education, like any other large organisation, is not monolithic in its ideology; it is composed of different individuals with different views of its aims and the ways to achieve them. Second, even in centralised education systems, such as the Israeli one, education policy is influenced by actors who are not part of the state administration, such as academic and philanthropic organisations. These actors can influence educational decision-making by bringing new ideas and external funding sources (e.g. Resnik 2007). For example, the Ministry of Education was highly invested in the programme we studied, but it was initiated by philanthropists and academics.

Methodology
To explore the impact of standardised testing on teachers' everyday pedagogical reasoning and their professional learning, we analysed meetings of PLCs focusing on mathematics, language arts (Hebrew), English as a foreign language, and science, all of which are included in the Meitzav. In these PLCs, teachers were encouraged to discuss topics relevant to their professional lives. The Meitzav was not a topic that the intervention programme asked teachers to discuss. In other words, in all the meetings we observed, the Meitzav appeared because teachers believed it was relevant to the discussion and decided to refer to it. Thus, in contrast to most research on TBA, which relies on interview data wherein teachers are directed to talk about TBA, we were able to analyse authentic teacher professional discourse that reveals the influences of standardised testing on how teachers think and act.
The data cover 180 full meetings (one hour long, on average), audio recorded, from 26 PLC teams in 17 Hebrew-speaking elementary and middle schools in Israel. Data collection and analysis were approved by the Chief Scientist Office in the Israeli Ministry of Education and by an Institutional Ethics Committee. All teachers who participated in the PLCs gave their explicit permission to have the meetings recorded, and they were able to ask to stop the recording at any time.
First, we scanned the entire data corpus, carefully listening to the recordings and searching for mentions of the Meitzav. It was mentioned in 71 meetings, that is, in almost 40% of the meetings in our sample. In 30 of these 71 meetings, there was a discussion of the tests, and in eight meetings, the Meitzav became the focus, or one of the main foci, of the discussion. Next, we qualitatively analysed the 30 meetings in which the Meitzav stood out in the discussion, using inductive thematic analysis to characterise the ways it was referred to. The second author, together with a research assistant, repeatedly read the transcripts of these meetings, using open coding to identify the ways the Meitzav was discussed. Through an iterative process of constant comparison (Glaser and Strauss 1967), involving the other authors as well, we refined these codes and aggregated them into four main themes. To further demonstrate how standardised testing and TBA influence teachers' pedagogical discourse, reasoning, and learning, we conducted an in-depth analysis of one meeting in which the Meitzav was salient. In analysing this meeting, we took a broadly linguistic ethnographic approach (Rampton, Maybin, and Roberts 2015), which assumes the conduct of everyday life is sensible and meaningful for those involved in it. Linguistic ethnography examines everyday interactions as they naturally occur, focusing on participants' actions and their orientation towards each other and the context. Accordingly, we applied micro-analytical methods (Rampton, Maybin, and Tusting 2007), analysing the sequential unfolding of the episode and examining the details of the conversation line by line. Such micro-analysis involves making sense of the content of participants' speech and their linguistic choices, examining what was said as well as what could have been said but was not, considering the broader social and cultural context. This micro-analysis helped us understand the subtle ways TBA logic and practices are embedded in teachers' pedagogical discourse.

Findings
In what follows, we present findings from our thematic analysis of all the meetings in which the Meitzav appeared in the discussion. This analysis is followed by an in-depth micro-analysis of one meeting in which the Meitzav became a central topic. Combined, these two parts of the analysis show how TBA affects professional learning in PLC meetings and how teachers' pedagogical reasoning in PLC discussions is affected by the policy framework of TBA.

Thematic analysis of ways the Meitzav was discussed
Our thematic analysis of the recorded meetings identified four main themes. The first and most frequent was 'teaching to the test.' Teachers used PLC meetings to discuss lessons specifically devoted to students' preparation for the Meitzav. This included, for example, attempts to predict subjects that would likely appear on the next round of the tests, pedagogical practices to prepare students to answer the type of questions included (e.g. using questions from previous tests), and administrative considerations of intensive preparation days before the test (referred to as 'teaching marathons' by the teachers). This theme was especially frequent in mathematics PLC meetings, but it was also prominent in language arts PLCs. For example, in one language arts PLC, a teacher shared her students' assignments, aiming to collaboratively examine their writing performance and derive insights for her teaching. When asked to introduce her teaching goals for this lesson, rather than listing benchmarks or other content objectives to which students' performance could be compared, she argued, 'I have a very clear goal . . . The main goal is preparing for the Meitzav.' Another teacher expressed her discomfort with this by saying, 'The goal is not to teach to the Meitzav; everything you teach will be covered by the Meitzav.' The first teacher agreed that it was not a formal goal of her teaching but emphasised, 'It's always in the back of my mind.' In a different school, a math PLC was engaged in coordinating assessment across teachers and grade levels. They asked which questions to include in the fifth-grade exam, to which the leading teacher responded, 'from one of the latest Meitzavs. Doesn't matter which.' One teacher suggested that they should include an 'exercise of a fraction . . . when the denominator is not the same', and started to explain the difficulties his students faced with this kind of exercise. Rather than leveraging this opportunity to inquire into students' difficulties and ways to map and tackle them, the leading teacher interrupted him midsentence, reminding her colleagues about the preparation for the exam: 'Well, you forgot the Meitzav preparation "marathon"; we agreed that we'll do two of them.' A few minutes later, the 'marathon' emerged again in the discussion when it became clear to the teachers that they would have to work on it during their spring break.
All of these practices are widely documented in previous studies from different education systems, as we noted in the section on TBA. What is interesting here is that they were discussed in PLC meetings aimed at promoting professional learning. That is, the pressure created by the Meitzav tests led teachers to divert time allotted to the PLC programme to administrative and pedagogical preparations for these tests.
The second theme was the Meitzav as a pedagogical tool. This theme included teachers' views and use of the test as a tool to achieve goals such as skill development, understanding of the subject matter, and assessment of students' knowledge and understanding. Here again, the theme appeared most frequently in math PLC meetings. In one meeting, a teacher argued that schools with high Meitzav scores are also schools in which students achieve highly on higher-order thinking tasks. She explained that although their school was not participating in the national assessment that year, using Meitzav questions should be an integral part of learning because they help develop higher-order thinking skills in mathematics.
In another interesting example, this time from a language arts team, teachers devoted the entire meeting to discussing preparation for a Meitzav test, including the use of previous Meitzav tests. One teacher expressed a concern that this might burden the students too much and even went so far as to equate giving repeated tests to committing 'war crimes.' Another teacher agreed but argued that 'if you want them to succeed you have no choice' and later used an Israeli military idiom, 'train hard, fight easy', to reflect her sense that practice was required. She further explained that using these tests can also help teachers 'know what the weakness was actually' in students' understanding of the curriculum. The group then discussed specific test questions and the difficulties they might raise, but rather than analysing the sources of student mistakes, their main focus was on figuring out the scoring chart and how many points each type of answer deserved. In a meeting of a math PLC, the teachers analysed the Meitzav results, going through the scores for each measure and comparing their school's average scores and standard deviations to those of other schools, happy to discover they were 'much higher than the general schools' average' and even 'first in the district.' Teachers suggested they should 'advertise this in the school website and in the newspapers' to make parents 'happy' and help them 'decide which school to send their child to.' They did not go into detail on how they could use these data to improve their teaching.
Here we identify a complex and nuanced influence of the Meitzav test on PLC discourse. On the one hand, the main reason for discussing Meitzav tests was the external pressure to show high performance. Furthermore, the findings support Anderson and Cohen's (2015) argument on the internalisation of accountability norms, with the Meitzav seen as able to test higher-order thinking abilities. On the other hand, teachers perceived the Meitzav test as a tool serving purposes beyond 'teaching to the test.' More specifically, they discussed using these tests as a diagnostic tool to better understand which parts of the curriculum were less well understood by students and which students had difficulties requiring more attention and extra help. Yet whilst these discussions confirmed that standardised tests can encourage the development of student capacities, it was evident that teachers often interpreted and used this in narrow ways, mainly to increase students' success in the tests. It is also important to remember that when these capacities depend mainly on previous Meitzav tests, they are limited to the subjects included in the tests and to the grades tested. This can be seen, for example, in the use of Meitzav tests developed for the fifth grade to evaluate fourth graders, as one teacher suggested in an English language PLC. More broadly, whilst Meitzav tests can promote pedagogical professionalism, this professionalism is restricted to Meitzav tools and a logic that prioritises quantitative measures over other types of student evaluation.
The third theme was the collision of the Meitzav with teachers' pedagogical beliefs. As the second theme showed, teachers used the Meitzav for pedagogical purposes and accepted its logic and relevance, but they also referred to it as an obstacle to meaningful learning and as hampering their ability to tailor instruction to students' needs. Several times teachers expressed the idea that standardised testing was an outdated pedagogical concept incompatible with the skills and knowledge needed in the 21st century. This theme mainly appeared in math PLC meetings, but we also heard it in English and science PLCs. Although mentioned less frequently, when it appeared, it developed into an elaborate discussion. Thus, for example, when discussing 'differentiated' (or personalised) teaching and evaluation methods, an English teacher argued, 'We always talk about differential learning, and we write our tests in two different levels . . . and the Meitzav is the same for the whole class . . . including students who need special adjustments . . . Explain why a Ministry of Education that requires differential learning from me doesn't do it itself.' In a math meeting, when teachers analysed the internal test of a third-grade student, one teacher commented, 'They won't accept this answer in the Meitzav,' referring to the fact that the student received all the points for this answer only because the teacher was familiar with the student's difficulties. In regular internal testing, teachers know each student's strengths and weaknesses and can provide appropriate constructive feedback, but this does not happen in the Meitzav. As this discussion demonstrates, the pedagogical belief that differentiated learning is necessary to promote every student's ability and achievement collides with the standardised nature of the Meitzav and limits the teachers' discussion.
The fourth theme was the negative consequences of Meitzav tests for teachers and students. This theme included teachers' claims about how the preparation negatively influenced learning and instruction, criticism of the amount of time dedicated to preparation, and claims about the emotional price paid by teachers and students alike. For example, in a math team, a teacher consulted her colleagues about helping the many challenging students in her class. A teacher suggested that the fact that these students were in 'a Meitzav class [i.e. a grade that should be tested this year] caused us to neglect the basic curriculum and to focus on what is going to be tested.' The group returned to this explanation when summarising their analysis of the problem. In an English language PLC, a teacher expressed her anger and frustration, saying her students left the classroom in tears after the Meitzav, and added, 'A kid who used to get 90 on tests will get 75 . . . I see them every year, they come to me crying . . . Give me a break, 25 pages for a kid who can't finish a two-page test that I prepared . . . Oh well, we can't change the world.' Whilst this teacher harshly criticised the tests, she also thought teachers needed to adapt because they cannot change the system.
Our findings from the thematic analysis echo Hardy and Lewis' (2017) notion of the 'doublethink of data.' Building on a case study of the response to TBA in Queensland, Australia, they found that teachers and administrators criticised the prioritisation of quantitative outcomes and their constant presentation in charts, but simultaneously 'devoted their energies to preparing these very same "beautiful charts" and "eye-catching" presentations' (Hardy and Lewis 2017, 682). A similar 'doublethink of data' emerged in our data. On the one hand, criticism of the Meitzav, its logic, and its consequences was relatively common; on the other hand, teachers frequently saw the Meitzav as a useful tool not only for preparation for the next round of tests but also for internal student assessment and for their own learning about teaching goals. Some teachers even saw it as a tool for public relations.

Microanalysis of one discussion of Meitzav
We now turn to an in-depth analysis of one discussion in which the Meitzav was a central topic. This was a language arts PLC meeting whose main subject was assessment and testing. Whilst the aim of the meeting was to develop a general discussion of assessment as part of a learning process, the teachers repeatedly referred to the Meitzav and accepted it as a frame of reference. As we show below, this is a clear example of how TBA becomes a main lens through which teachers reason about their practice (see, e.g. Dyson 2020; Hardy and Lewis 2017).
At the beginning of the meeting, Ayala (a pseudonym, as are all names in this section), a fourth-grade science teacher who joined the language arts PLC, presented a case to discuss with her colleagues. She described a science test she administered; to her 'astonishment' and disappointment, many students failed. This failure particularly worried her because the test simulated fifth-grade Meitzav questions. The failure might indicate that her students would not do well on the Meitzav. Thus, even before her students had reached grade five, when students are first tested, Ayala chose to give them a similar test. As she explained, 'This class is going to face the Meitzav next year . . . They're supposed to know how to deal with this kind of questions.' After presenting this case, Ayala asked her colleagues what the reason for the low scores might be and how she could improve them. She thought the problem was a lack of the skills needed to answer the Meitzav questions. Other teachers suggested the source of the problem might be a lack of discipline in her class or the social and emotional problems characterising some of her students. Ayala admitted these problems existed but wondered why they affected only this Meitzav-like test and not other forms of assessment. Alon, another fourth-grade teacher, asked Ayala if the students were familiar with the material covered by the test. Ayala answered, 'It's hard to know if the students know before you test them.' This statement highlights the common perception that testing is the most accurate and important way to assess students' knowledge.
Mor, the coach of this PLC, suggested the reason for Ayala's students' failure was not a lack of subject matter knowledge but difficulties in reading comprehension and thus in understanding the questions. Mor tried to steer the conversation back to the original topic, student assessment, but the argument remained Meitzav-related: 'When the Meitzav tests arrive, the science teachers need help from the [Ministry of Education] literacy coaches.' Alon agreed, saying this was a common problem with all Meitzav tests. Mor suggested leaving the Meitzav aside and focusing on student skill assessment rather than on a specific test. This may be seen as an attempt to resist the tendency of the teachers in the group to frame a discussion of assessment as a 'Meitzav topic.' Ayala agreed with Mor's definition of the problem, but a few minutes later, the Meitzav found its way back into the conversation. As the discussion continued, instead of focusing on the development of reading comprehension skills to help students cope with diverse assignments, the teachers returned to the Meitzav and focused on ways teachers could improve their students' understanding of Meitzav-type questions. The conversation evolved into a direct discussion of preparations for Meitzav tests, as two other teachers in the group, Effi and Helli, explained how they worked with their students on Meitzav questions using tests from previous years. At this point, another teacher said she taught in a different school, where one hour of science was dedicated each week to the Meitzav using a special 'Meitzav book.' When one teacher hesitantly reminded the others that the Ministry of Education does not encourage teaching to the Meitzav, Helli replied, 'I still think we need to expose them [the students] to the Meitzav, maybe not in a Meitzav lesson, but once a week to use the topic . . . to give them questions from the Meitzav and work with them [on these questions].'
As the conversation continued, Mor suggested various strategies to work with students on their reading comprehension. But Ayala dismissed them all, before even considering them, based on managerial and accountability reasoning, such as, 'You're like wasting an hour . . . there're goals to achieve . . . At the end, one needs to show performances . . . [so] how will you explain that you didn't cover the material?' By telling her colleagues about practices she enjoyed applying 'before the Meitzav and all this nonsense,' Ayala made it clear she was committed to her students' success in the standardised tests even though she strongly opposed them.
Whilst this meeting may be seen as an extreme example of the influence of standardised testing and TBA on pedagogical discourse, it vividly reveals how this policy environment penetrates schools' professional culture. First, Ayala's motivation for bringing this case to the meeting stemmed from her fear that her students would not be ready for the Meitzav test the following year. Ayala opposed standardised testing and protested its negative consequences. Nevertheless, when she was offered the opportunity to share a problem from her practice with her community, rather than posing a question that aligned with her beliefs, she chose to initiate a discussion on how to better prepare for the Meitzav. Although she perceived PLC meetings as a space for professional learning, her choice reflected the lack of agency she experienced facing the Meitzav, not only in her teaching (as described, for example, by Dyson 2020) but also in navigating her own learning and her community's learning. Second, whilst the case could have been a good starting point for a constructive discussion of student skill development and its assessment, most participants insisted on focusing on the Meitzav and how to prepare students for it. Although at times the coach, serving as the official representative of the Ministry in the room, attempted to resist this narrow concentration on teaching to the test, the other teachers returned to it again and again. Moreover, the coach's practical suggestions about ways to promote students' skills were rejected at the outset because of accountability considerations. What we find here, then, is a discussion contradicting the original logic of PLC, with its emphasis on teacher agency in a critical, multi-voiced, and generative examination of knowledge, beliefs, and practice (Grossman et al. 2001). Instead of considering other assessment tools to help them understand why students had difficulties with reading comprehension and the role these difficulties might play in assessing students' understanding of the subject matter, and instead of generating ways to manage these difficulties, the teachers talked about their students' performance on future Meitzav tests, despite their resistance to the tests and despite the coach's attempts to prevent this turn in the discussion. The discussion was extremely narrow, considering only the perspective of the Meitzav and using its questions as the frame of reference.

Conclusion
Both TBA and PLC have been widely studied in recent decades. We offer a new perspective on their interaction, using data collected in Israel. Based on observations of staff meetings in which TBA emerged as part of a genuine discussion among teachers, we explored how the policy environment of TBA affects professional learning in PLC meetings and how it shapes teachers' pedagogical reasoning in PLC discussions. In line with previous studies (e.g. Berliner 2011; Feniger, Israeli, and Yehuda 2016; Jennings and Bearak 2014), we found many indications of teaching to the test and of pedagogical concentration on materials and skills covered by the Meitzav. Since the PLC programme we studied focused on professional development, these topics were not expected to appear in the discussions. The fact that they did demonstrates the strong pressure to perform well on the Meitzav tests. It also suggests that under TBA, PLC and similar programmes may see time and attention diverted from professional development to standardised test preparation.
We also found evidence that TBA, at least in the Israeli context, may constrain professional development by limiting the scope of the discussion and the range of options to the logic and materials of the standardised test. The micro-analysis of one discussion focusing on assessment tools clearly showed how teachers' professional discourse was subjected to the framework of the Meitzav. Even when some teachers tried to steer the discussion away, it returned again and again to the testing. It seems the salience of the Meitzav in the Israeli education system, together with the scarcity of other ready-made assessment tools available to the teachers who participated in the meeting, set the boundaries for this discussion.
The study contributes to a better understanding of how the policy framework of TBA constrains constructive implementation of the logic of PLC (Dyson 2020). Previous research has suggested that PLCs have the potential to improve teachers' instruction when participating teachers consider diverse perspectives and critical voices (Lefstein, Vedder-Weiss, and Segal 2020; Grossman et al. 2001), arguing for teacher professional learning grounded in teacher agency to pose, examine, and solve problems in their specific contexts. However, our analysis shows that the Israeli PLC discourse in our data was often limited to the Meitzav perspective, at the expense of other pedagogically imperative considerations and more critical stances. In our study, when teachers had the opportunity to exert agency by choosing which problems to pose and what to focus their community learning on, they often succumbed to the needs and rationale of standardised testing. This happened despite the clear messages conveyed by the programme leaders that this was not the goal. The resulting narrowing of teachers' agency and pedagogical reasoning contrasts with the PLC goal of expanding them (see Horn and Little 2010). We found teachers often complied with test-based expectations even if they felt uneasy doing so. When teachers accept and adhere to the logic of standardised testing even though it contradicts their professional beliefs and their students' needs, without explicitly negotiating the inherent tension, their ability to engage in critical but generative discussions and to agentively explore instructional alternatives (Horn and Kane 2015; Louie 2016) is limited, again working against the underlying assumptions of PLC programmes. Further research is required to examine whether more training and more structured conversation protocols may be effective in resisting the narrowing caused by TBA. We suggest that explicitly and critically discussing with teachers TBA's effect on their reasoning, using excerpts such as the ones we provide in this article, may help them negotiate these tensions more productively.
Alongside these negative unintended consequences of TBA for the PLC intervention, our findings revealed a positive aspect of the Meitzav for teachers' professional learning and pedagogical reasoning, namely teachers' use of data based on Meitzav tests. The use of data in teachers' decision-making is an important goal of professional learning interventions (e.g. Little 2012), including the one we studied. Our analysis suggests standardised tests can encourage the use of data not only because they pressure school personnel to do so as part of the concept of accountability but also because they equip teachers with ready-made tools that can help them measure their students' proficiency and identify topics that need further attention and students who need more help. This positive implication, however, comes at the price of prioritising quantitative measures over other evaluation methods and of narrowing school assessment to what standardised tests measure. To build on the positive effect of the tests whilst counterbalancing their salience in discourse on assessment, PLC intervention developers could offer teachers tools to use the tests more productively in their professional development (e.g. focusing on extracting insights into student understanding and related teaching). However, as the impact of the TBA logic appears so powerful, we suggest that a better approach would be to offer diverse assessment and evaluation tools that can foster more holistic discussions, taking into account broader aspects of the curriculum and a much more complex and nuanced concept of students' knowledge and abilities.
The findings of this study are relevant for policy makers considering the design of accountability systems. To enable authentic and effective teacher professional learning, which is essential for any process of school improvement, external pressures on schools, such as those exerted by TBA, should be used sparingly and with great caution. For example, instead of using mandatory external tests, schools could be encouraged to use standardised tests as part of internal assessment processes that enrich both pedagogical decision-making and teacher professional learning.