Implementation Support Improves Outcomes of a Fluency-Based Mathematics Strategy: A Cluster-Randomized Controlled Trial

Abstract
The Say-All-Fast-Minute-Every-Day-Shuffled (SAFMEDS) strategy promotes fast and accurate recall. The existing literature suggests that the strategy can help learners improve academic outcomes. Through a cluster randomized controlled trial, we assessed the impact of implementation support on children’s mathematics outcomes during a teacher-led SAFMEDS intervention. Following training and prior to baseline assessments, we randomly allocated schools to receive either no (n = 31) or ongoing (n = 33) support from a researcher. Support consisted of three in-situ visits and email contact. Assessors remained blind to the condition of the schools throughout. We analyzed the outcomes of children (n Support = 294, n NoSupport = 281) using a multi-level mixed-effects model, accounting for the children nested within schools. The results suggest that implementation support has a small effect on children’s fluency of arithmetic facts (Mathematics Fluency and Calculation Tests (MFaCTs): Grades 1–2, d = 0.23, 95% CI: 0.06–0.40; MFaCTs: Grades 3–5, d = 0.25, 95% CI: 0.08–0.42). These results are larger than the average effect sizes reported within the professional development literature that applies coaching elements to mathematics programs.

systematic support from researchers. Implementation science suggests that researchers first conduct efficacy studies on a small scale to validate an intervention before implementing it at a larger scale under real-world (day-to-day) conditions. The quality of implementation during the latter phase may bound the benefits of evidence-based programs in school environments (Cook & Odom, 2013). Implementation fidelity refers to the extent to which someone implements a program according to the original and intended design (Lee et al., 2009). Durlak and DuPre (2008) found that educators who do not specialize in research (e.g., teachers) often do not implement an intervention to 100% fidelity under the real-world conditions of a classroom. They also found that low-quality implementation of evidence-based interventions results in smaller effect sizes on outcome variables, including those linked to student achievement. This highlights the importance of identifying effective implementation support models to ensure that teachers are able to elicit desired and intended outcomes from evidence-based educational interventions.
Training can be an effective way of helping teachers to develop conceptual understanding of interventions but alone may not yield sufficient changes in practice (Education Endowment Foundation, 2019). Coaching teachers offers a lever for improving the quality of implementation by supporting them to translate knowledge into classroom practice (Kraft et al., 2018). Sailors and Shanklin (2010) used the term coaching to describe a process of sustained school-based support from a knowledgeable individual. Coaches model research-driven interventions and work with teachers to explore how they can use the strategies with their own students. Coaching programs can take a variety of forms, but generally consist of one-on-one interactions between a coach and a teacher. These interactions provide a platform for teachers to receive individualized feedback based on their professional development (PD) needs (Fletcher & Mullen, 2012).
Following a meta-analysis of 60 studies, Kraft et al. (2018) found that teachers often receive coaching in conjunction with additional treatment elements (i.e., in 90% of the reviewed studies teachers received coaching alongside group training, instructional content, and/or video resources). Their analysis revealed a pooled effect size of +0.18 standard deviations (SD) relating to the effect of these programs on student achievement and +0.49 SD relating to teachers' instructional practices. In their theory of action, Kraft et al. outlined that training sessions help improve teacher pedagogical and content knowledge. This knowledge, alongside coaching and the availability of relevant materials, positively influences teaching behavior. As a result, teachers implement higher-quality teaching practices and are better able to identify and use strategies that support student outcomes. However, it is worth noting that most of the interventions that met Kraft et al.'s inclusion criteria focused on applying these practices to literacy and content-based interventions, with only two studies reporting the outcomes of mathematics programs. Moreover, Kraft et al.'s analysis revealed that the effects on student outcomes from larger-scale effectiveness trials were smaller (+0.10 SD) than those employing smaller-scale efficacy designs (+0.28 SD). Whilst coaching might be a valuable tool, research is still needed to disentangle the effects of coaching from additional treatment elements and to establish the effects of using coaching programs at scale.
In a complementary meta-analysis of 95 studies, Lynch et al. (2019) reviewed mathematics and science interventions supported by PD and/or curriculum materials. Lynch et al. defined PD as a set of experiences that intend to effect change in teacher- and classroom-level phenomena. They too highlighted that PD programs can be, and often are, multifaceted. As such, their inclusion criteria focused on the number of hours a teacher spent experiencing PD; the focus on improving knowledge of content, pedagogy, and/or use of curriculum materials; as well as the format of the program (e.g., one-on-one coaching, summer workshops, online learning). Curriculum materials are instructional practices, guided by activities and text within the program itself. Their review identified that 22% of studies focused on PD alone, whilst 75% used PD in combination with curriculum materials. Overall, they found PD programs to be effective. However, only 20% of the PD programs included a coaching element. There was no evidence that coaching elements added value in terms of outcomes, but neither did they reduce the interventions' effectiveness. The vast majority of included studies with a coaching element were multi-component programs. With few published studies reporting the outcomes of coaching as a standalone PD format to support mathematics interventions, further research is needed.

SAFMEDS Overview and Prior Research
The current study is set within the context of North Wales, United Kingdom. Following disappointing results in the internationally comparative Programme for International Student Assessment tests in 2009 (OECD, 2010), the Welsh Government identified a need to raise educational standards in its schools. In recent years, education policymakers in Wales have focused on improving the use of evidence-based practice within education (Furlong, 2015; OECD, 2016). As a result, an increasing number of teachers in North Wales are using the Say-All-Fast-Minute-Every-Day-Shuffled (SAFMEDS) strategy in their classrooms to improve children's fluency of basic mathematics skills (Tyler et al., 2018).
Traditionally, within educational practice, teachers deem children to have mastered skills if they are able to perform them to a level of 90-100% accuracy (Fuller & Fienup, 2018). Binder et al. (2002) argued that a percentage correct criterion is too simplistic: being accurate is necessary but alone is not sufficient to demonstrate mastery of content. If children practice skills beyond mastery they will be able to develop fluency (the combination of accuracy and speed). Adding a dimension of time into assessment provides more detail about performance and can more accurately predict whether children will be able to retain, apply, and generalize learned skills (Binder, 1996; Johnson & Street, 2012). SAFMEDS is a practice and assessment strategy that applies the principles of precision teaching (PT) to help children develop their skills to fluency (Lindsley, 1995). Kubina and Yurich (2012) described PT as a system for defining, measuring, recording, and analyzing teaching effectiveness on a child-by-child basis. To achieve this, teachers must reflect upon children's learning regularly, and use these data to make subsequent decisions about their teaching approach (Lindsley, 1995). A child engages with the SAFMEDS strategy using a deck of flashcards, with a question or statement on the front and the corresponding correct answer on the back. They read the front of the card silently before vocalizing the answer (Quigley et al., 2018). During each 1-minute timing, they aim to get through as many cards as possible, whilst separating their correct responses from their "not yets" (Cihon et al., 2012). The child then plots their best score from the session on a Standard Celeration Chart (SCC), which develops a learning picture over time (for more details see Lindsley, 1995). Learning pictures enable teachers, children, and/or other practitioners to decide if additional support is necessary. For example, if a learning picture shows that the number of cards a child can answer correctly in 1 minute has plateaued over several consecutive sessions, an intuitive approach might be to assess if the child has mastered all of the necessary prerequisite skills, and if not, to ensure that they do so (Johnson & Street, 2012).
The SAFMEDS strategy has clear utility within schools, with a growing quantitative evidence-base suggesting some of the associated advantages. For example, practitioners can adapt the strategy to implement it on a one-on-one basis (e.g., Cunningham et al., 2012), with small groups of children (e.g., Beverley et al., 2018), or class-wide (Hunter et al., 2016). The strategy also has evidence to support its effectiveness amongst different populations including learners attending mainstream classes and children with additional learning needs (e.g., Casey et al., 2003; Greene et al., 2018; Kubina et al., 2000).
Much of the available literature on the SAFMEDS strategy documents small N and case study research designs. These studies demonstrate the positive effects of the approach in improving academic outcomes of learners across a variety of domains. This includes helping children become more fluent at arithmetic (see for example, Casey et al., 2003), recalling content-specific terminology and definition dyads (see for example, Stockwell & Eshleman, 2010), as well as sight reading Dolch words (Lambe et al., 2015). More recently, comparative group studies have investigated the effectiveness of the SAFMEDS strategy against an education as usual control group (e.g., Hunter et al., 2016; Greene et al., 2018). Within these studies, a researcher with experience using the SAFMEDS strategy was present at each intervention session to support implementation and ensure high levels of fidelity. Although sparse, there is some evidence to suggest that teachers can elicit positive student outcomes from a SAFMEDS intervention even when researchers offer no in-situ support following training. Beverley et al. (2016) acknowledged the importance of providing teachers with the training necessary for them to implement and manage a SAFMEDS intervention. Following training, the teachers participating in their study did not receive any in-situ support from a researcher to implement the strategy on a class-wide scale. The results demonstrated that the class of children who engaged with the SAFMEDS intervention made more reliable fluency progress between pre- and post-test compared to the class of children who did not use the strategy.
To date, the majority of empirical research investigating the SAFMEDS strategy focuses on efficacy designs with researcher-driven implementation. Beverley et al. (2016) suggest that teachers can elicit positive student outcomes under conditions with no researcher input following training. Whilst both of these approaches have shown positive results, it is still unclear whether researcher involvement after initial teacher training is important for implementation and children's outcomes. The aim of the current study was to provide direct insight into the impact of coaching (i.e., in-situ individualized implementation support from a researcher) during a teacher-led SAFMEDS mathematics program in schools. The teachers and teaching assistants used the SAFMEDS strategy with the children in their schools to help develop fluency of arithmetic.
In our theory of action, the initial training was intended to support teachers' pedagogical knowledge of the SAFMEDS strategy and the associated data-driven teaching practice (PT). We anticipated that coaching support would improve the fidelity of the teachers' implementation. A researcher tailored each in-situ visit to the individual needs of each teacher, but broadly these sessions aimed to address challenges such as: interpreting learning pictures; identifying and correcting children's procedural steps as they progressed through each SAFMEDS timing; and managing challenges such as cheating and identifying appropriate learning materials. As a result of more accurate implementation, our theory of action proposed that children attending schools where their teacher received coaching would make greater fluency progress between baseline and follow-up, compared to those attending schools that did not receive coaching following training.
In line with previous studies that have investigated the effects of the SAFMEDS strategy, the outcomes from this research relate to children's arithmetic fluency. We acknowledge that it would have been beneficial to collect data directly relating to fidelity of implementation, but we were unable to do so due to practical and funding constraints. To our knowledge, this is the first randomized controlled trial (RCT) investigating the effects of providing implementation support to teachers following SAFMEDS training. Answering this question would provide a foundation for further research investigating the mechanisms that make SAFMEDS coaching programs effective and contribute to the broader literature about the effects of coaching for teachers on intervention outcomes for students.

Trial Design and Participants
As part of a wider initiative to improve numeracy standards across North Wales, the Regional School Effectiveness and Improvement Service for North Wales (GwE) disseminated the initial advertisement for this project. For a school to be considered eligible, it had to be located within one of the six local authorities supported by GwE (Conwy, Denbighshire, Flintshire, Gwynedd, Anglesey, or Wrexham). Table 1 outlines the characteristics of the schools included in the randomization. To participate, schools needed to be willing to release teacher(s) to attend the training at the beginning of the project. The nominated teacher needed to be able to invest the necessary amount of time per week to deliver the SAFMEDS intervention (i.e., three 20-minute sessions). The advertisement explained that by enrolling on the project, schools would be randomized to one of the two trial arms. Schools had nominated teachers to complete the training before they knew which trial arm they had been allocated to. Therefore, any trial arm differences in the roles of teaching staff selected for training by the schools were due to chance. Table 2 displays the baseline characteristics of the teachers who attended training.
Each school selected up to 10 children to participate in the SAFMEDS mathematics intervention prior to randomization. We disseminated an opt-out consent form to all the children's parents/guardians detailing the aims of the study. This form asked if we could collect and analyze their child's outcome data. In instances where the consent form was returned, teachers could still include the children in the SAFMEDS intervention, but we did not collect their anonymized data for analysis. We had consent to analyze the data from 575 children (n Support = 294, n NoSupport = 281), across 60 schools (n Support = 31, n NoSupport = 29).
For children in year 3 or above (aged ≥ 7 years), we asked teachers to implement the intervention with children who scored less than 100 standard points on the national numeracy procedural test undertaken at the end of the preceding academic year. All children in years 2-9 (aged 6-14 years) who attend a maintained school in Wales (i.e., schools funded by a local education authority) sit this formative test at the end of each academic year. Children sit the procedural test online as it offers a personalized assessment experience (i.e., the questions get easier/more challenging depending on the child's ability). The procedural numeracy test assesses all relevant aspects of the numeracy curriculum in Wales.
Children in year 2 had not completed the national tests at the start of the study. In these instances, we asked schools to identify the children who they felt needed intervention support to improve fluency of basic mathematics skills and/or who they judged to be working below the expected standard for their age. These children were those who needed supplementary tuition to improve their fluency of arithmetic facts.
The mean age of the children attending schools randomized to the no support arm was 7-years 3-months (range: 6-years 0-months to 9-years 2-months; SD = 14.34 months). The mean age of children attending schools allocated to the ongoing support arm was also 7-years 3-months (range: 6-years 0-months to 15-years 10-months; SD = 14.32 months). It is worth noting that two secondary (high) schools participated in this study. One of these schools worked with a group of year 7 students (aged 11-12 years) who significantly underperformed on the procedural test. The other secondary school was a special educational needs school that supported children aged 11-17 years; these children lacked basic mathematics skills (e.g., single digit addition). Table 3 outlines the characteristics of the children included in the randomization. Table 4 displays the baseline characteristics for the children's outcome measures.

Randomization
Randomization occurred after all teachers received the SAFMEDS training but prior to the children completing the baseline assessments. A statistician, who was independent of the study, randomly allocated schools to one of the two trial arms using minimization. During this allocation, the statistician stratified schools by County (local education authority) and the language used predominantly for teaching (English versus Welsh medium). Some of the schools had the same headteacher; in these instances, the statistician treated the schools as one cluster to prevent bleeding effects across conditions. In terms of hierarchical structure, teachers and children were nested within each school. The first author could not be masked to the randomization due to the need to conduct support visits. However, the assessors who conducted the baseline and follow-up assessments remained blind to the allocation of each school.

[Table fragment: number of children by school year group. The column headings and the label of the first row did not survive extraction.]
Year group (age) | Count | Count
(label missing) | 192 | 128
4 (8-9 years) | 4 | 9
5 (9-10 years) | 3 | 1
6 (10-11 years) | 2 | -
7 (11-12 years) | 11 | -
8 (12-13 years) | 1 | -
9 (13-14 years) | 4 | -
10 (14-15 years) | - | -
11 (15-16 years) | 4 | -

SAFMEDS Training (All Teachers)
All teachers received the same training prior to randomization. During the 3-hour training session, we introduced the teachers to some of the basic theory behind the SAFMEDS strategy, modeled the procedure (as detailed in Table 5), and gave them the opportunity to practice using the cards. Following four 1-minute SAFMEDS timings, we showed the teachers how to record data and graph it on a SCC. During the training, we also emphasized the importance of interpreting learning pictures in relation to children's learning progression throughout the intervention (see Lindsley, 1995). After showing the different learning pictures, we went through a series of common scenarios using SCC data from previous research projects. The scenarios prompted discussion relating to cheating, identifying skill deficits, and deciding whether something in the surrounding environment may be affecting a child's scores (e.g., missing their favorite lesson to take part in the SAFMEDS session, or a loud music lesson scheduled in the room next door). We discussed what learning pictures may develop as a result of these scenarios and suggested some interventions that might be appropriate to try (e.g., creating individualized score targets, building fluency of prerequisite skills, or changing the time/location of the SAFMEDS session).

Table 5. An outline of the Say-All-Fast-Minute-Every-Day-Shuffled strategy.
Timing | Action | Learning principle (corresponding action)
Before timing | 1. Shuffle the cards. | Prevents serial learning (1)
Before timing | 2. Teacher sets a timer for 1-minute. | Short focused practice sprints (2)
During timing | 3. Children read the front of the card in their head and say the answer out loud. They should turn each card over to check their answer, before placing it in either their "correct" or "not yet" pile. | Active responding (3); Immediate feedback (3)
After timing | 4. Once the timer has finished, the teacher says stop. |
After timing | 5. Children count their cards and write their scores down in the given table. | Formative assessment (5)
After timing | 6. If a child gets any cards in their "not yet" pile, they should address these cards (error correction). | Practice and firm new skills (6)
After timing | 7. All cards should be put back in one pile, ready to shuffle and go again. | Repetition to build mastery (7)
After timing | 8. Following all four timings, the child should take their best score and plot it on their SCC. | Assessment of learning (8, 9)
After timing | 9. At the end of each week, a teacher should look at each child's data. If they have shown little, to no, progression over three consecutive days they should consider making a change within the program. | Assessment of learning (8, 9)

Throughout the intervention period the children engaged with the SAFMEDS strategy via a deck of flashcards. On the front of each card was a question (e.g., 5 + 6 =) and on the back was the corresponding correct answer. During the training, we provided teachers with all the materials that they would need to start the SAFMEDS intervention in their school. This included decks of addition and subtraction SAFMEDS cards, score tables, SCCs, and a placemat (so the children could easily distinguish between their "correct" and "not yet" cards). All teachers who attended the training also had access to printable PDF materials of component arithmetic skills across the national curriculum which they could download at their convenience. We instructed all schools to focus on single digit addition skills first and then progress through card decks as required (in line with the children's learning pictures).
We instructed teachers to use the SAFMEDS strategy at least three times per week with the children they were supporting. Each session was to consist of four SAFMEDS timings and last approximately 20 minutes. Within these sessions, the teachers and children had clearly defined roles. The children were to work through their cards independently during each 1-minute timing (as outlined within Table 5). Teachers were required to monitor aspects of fidelity (e.g., ensuring the children: followed each of the appropriate steps, were not cheating, and were regularly engaged with the sessions). Additionally, we encouraged teachers to support children during the error correction step (including some one-on-one or small group teaching if necessary), review charted data regularly, and ensure that children were practicing a skill that was appropriately matched to their existing skill level. Once children claimed that they had become fluent with a deck of cards, it was also important that the teacher verified this (e.g., by watching a timing) before issuing a deck for a more difficult skill.
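The charting and review steps of the procedure (Table 5, steps 8 and 9) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the example scores, and the `min_gain` threshold for "little to no progression" are assumptions introduced here, not values from the study, and a real SCC judges progress by celeration rather than by a simple difference.

```python
from typing import Sequence


def session_best(timing_scores: Sequence[int]) -> int:
    """Return the score to chart for one session: the best of its timings."""
    if not timing_scores:
        raise ValueError("a session needs at least one timing")
    return max(timing_scores)


def needs_review(daily_best: Sequence[int], window: int = 3, min_gain: int = 1) -> bool:
    """Flag a possible plateau over the last `window` charted days.

    `min_gain` is a hypothetical threshold: a gain smaller than this across
    the window is treated as "little to no progression".
    """
    if len(daily_best) < window:
        return False  # not enough charted days to judge
    recent = daily_best[-window:]
    return recent[-1] - recent[0] < min_gain


# Hypothetical correct-card counts for four sessions of four 1-minute timings.
sessions = [[18, 21, 20, 22], [21, 23, 22, 23], [22, 23, 21, 23], [20, 23, 22, 23]]
charted = [session_best(s) for s in sessions]
print(charted)                # [22, 23, 23, 23]
print(needs_review(charted))  # True
```

A flag from a check like this would only prompt the teacher to look more closely, for example at prerequisite skills or the session environment; it does not decide what to change.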

Ongoing Support
Schools allocated to the ongoing support trial arm received three in-situ support visits from the first author throughout the duration of the study (November, February, and May). The first author had several years' experience of using the SAFMEDS strategy in schools, so was able to advise teachers on themes around implementation and interpreting the children's data. Each visit was individualized based on the needs of each teacher and the children they were supporting. Examples of support varied, but largely consisted of the following: modeling sessions; observing the teachers delivering the intervention and providing direct feedback on implementation; suggesting interventions for children who were struggling to progress with particular decks (e.g., focusing on building fluency in prerequisite skills); discussing ways that teachers might be able to integrate the intervention more readily (e.g., adopting a peer-led approach to support error correction and reduce cheating); and supporting teachers to interpret the children's learning pictures. Each scheduled visit lasted 1 hour.
Between visits, teachers could email the first author about any issues relating to the intervention or the technology used to support the project. Teachers allocated to this trial arm contacted the author about accessing materials (18 instances; 12 schools), to gain advice about helping children progress (7 instances; 6 schools), and for advice about interpreting data (2 instances; 2 schools).
We gave all teachers, irrespective of trial arm, the option to plot the children's data using either paper or electronic SCCs. There were 10 instances (across 8 schools) where teachers allocated to receive support emailed the first author to report issues logging the data electronically. Moreover, we made all of the resources for this project available via the Welsh Government's online school platform for educational resources (HwB). Two teachers (across 2 schools) allocated to the support trial arm emailed about gaining access to the SAFMEDS HwB platform.

No Support
Following training at the beginning of the project, schools allocated to the no support trial arm received no implementation support from the first author. Teachers in this condition could contact the first author if they had any technical problems accessing the resources or inputting data into the electronic charts; however, they were not able to seek advice regarding the day-to-day implementation of the SAFMEDS strategy. There were 9 instances (across 7 schools) where teachers emailed the author to request access to resources, 2 instances (across 2 schools) where teachers needed support accessing the SAFMEDS HwB platform, and 10 instances (across 9 schools) where teachers reported issues logging their children's data electronically.
During the training, we highlighted an additional caveat about the support we could offer schools allocated to this arm. We had an ethical obligation to provide the teachers with support if they felt that they could not initiate or sustain the intervention without it. No school in this condition asked for additional support; had any done so, we would have provided it and handled their data appropriately. It is also important to note that the "no support" group was essentially a "support as usual" group in the context of school improvement efforts in Wales. Typically, schools would seek a training course, send their staff on the course, and then implement interventions on their own (unless they specifically purchase additional support with implementation). Thus, we believe that the no support trial arm is an ecologically valid comparison for inclusion within this study.

Baseline Assessments
The children completed the Mathematics Fluency and Calculation Tests (MFaCTs; Reynolds et al., 2015). The Grades 1-2 fluency assessment measures addition and subtraction fluency and is intended for children aged between 6-years 0-months and 8-years 11-months. The Grades 3-5 fluency assessment measures addition, subtraction, multiplication, and division fluency; this assessment is intended for children aged between 8-years 0-months and 11-years 11-months. We used both measures with all the children in the sample to provide an inclusive overview of their skill progress across the intervention. To reduce practice effects, the MFaCTs assessments offer parallel test forms. The published statistics for these tests show high internal reliability across ages (α > .80). We used Form A during the baseline assessments.
The children came out in a group to complete these assessments but filled in their forms individually and in silence. The assessors provided each child with a pencil and the test form. The children completed MFaCTs: Grades 1-2 first. They had 5 minutes to answer as many of the 100 questions on the page as they could, working across the page from left to right. If they did not know the answer to a question, they were allowed to skip it and move on to the next one. Once the timer finished, an assessor instructed the children to turn their forms over so that the assessors could collect them. The children then repeated this procedure for the MFaCTs: Grades 3-5 assessment.

Eight-Month Follow-up Assessments
Eight months post-randomization, we reassessed the children who participated in the study. This process mirrored the administration of the baseline assessments, with the children completing both MFaCTs fluency assessments (Form B). Figure 1 outlines the flow of participants from enrollment to the final analysis. Prior to the follow-up assessments, four schools indicated that they were no longer using the SAFMEDS intervention due to unforeseen challenges with staffing. Three of these schools were happy for us to still collect follow-up data from their children (denoted as intend to treat), whilst one school was unable to accommodate this (denoted as withdrawal).

Analysis
The data for this study fall within two hierarchical levels (level 1 = children, level 2 = school). Because children were nested within schools, we analyzed the data using a multi-level linear mixed-effects model. This analysis is consistent with other studies that have adopted cluster RCT designs (see, for example, Jahoda et al., 2017; Zimmermann et al., 2014). Linear mixed-effects models enable analysis of continuous outcome variables within hierarchical research designs by partitioning the overall variance of the outcome variable into factors that correspond to the different levels of the hierarchy (Gałecki & Burzykowski, 2013). Baayen et al. (2008) further outlined some of the advantages of using mixed-effects modeling over univariate alternatives, such as ANOVA or ordinary least squares regression.
Due to the lack of standardized scores for the range of ages included within this sample, we opted to analyze the children's raw scores on the MFaCTs measures. We used Stata v13.0 to analyze the raw data from this trial. Using the xtmixed command, we assessed the interaction between time (baseline versus follow-up) and trial arm (ongoing support versus no support) across the fluency (MFaCTs) measures. Level 1 within our model contains the covariates associated with individual children: gender, predominant home language, eligibility for free school meals (eFSM), and school year group. Level 2 within our model refers to the covariates associated with each school: school administrative county, trial arm, and time. The model also generated the intraclass correlation coefficient (ICC) values associated with each level of the model.
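One plausible way to write down a model of this kind is given below. It is a sketch under stated assumptions rather than the exact specification fitted in Stata: the text does not report the random-effects structure, so a single random intercept for school ($u_j$) is assumed here, and all symbols are introduced for illustration. $y_{tij}$ is the raw MFaCTs score of child $i$ in school $j$ at time $t$, $\mathbf{x}_{ij}$ collects the covariates listed above, and $\beta_3$ is the trial arm by time interaction of interest.

```latex
\begin{aligned}
y_{tij} &= \beta_0 + \beta_1\,\mathrm{Arm}_j + \beta_2\,\mathrm{Time}_t
          + \beta_3\,(\mathrm{Arm}_j \times \mathrm{Time}_t)
          + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + u_j + \varepsilon_{tij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{tij} \sim \mathcal{N}(0, \sigma_\varepsilon^2), \\
\mathrm{ICC} &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}.
\end{aligned}
```

Under this assumed form, the ICC is the share of the unexplained variance in scores that lies between schools rather than within them.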
To assess the impact of support, we calculated a Cohen's d effect size for each measure. To calculate Cohen's d and the associated 95% confidence intervals, we adhered to Feingold's (2015) formulae. We interpret the results in relation to Cohen's (1988) benchmarks, whereby an effect is small (d = 0.20-0.49), medium (d = 0.50-0.79), or large (d ≥ 0.80). We carried out sensitivity analyses by repeating the main analysis using multiple imputation and a complete-case analysis approach. The effect sizes varied minimally (refer to supplementary material). Existing published data suggest that certain factors predict an attainment gap between sub-groups of school-aged children. These include differences in outcome variables across genders and levels of social deprivation (OECD, 2012). Moreover, Van Rinsveld et al. (2017) provided evidence to suggest that bilingual individuals rely on differential activation patterns in the brain to solve simple and complex arithmetic questions in their different languages. As such, we also conducted a series of moderation analyses to investigate the effects of these variables (refer to supplementary material). We found no evidence of these factors moderating the effect of trial arm on children's mathematics outcomes.
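As a small illustration of how the benchmarks above map an effect size onto a label, the following Python sketch classifies the two effect sizes reported for this trial (d = 0.23 and d = 0.25). The function name and the "below small" label for values under 0.20 are assumptions added here; the sketch does not reproduce Feingold's (2015) formulae.

```python
def cohen_benchmark(d: float) -> str:
    """Label an effect size using Cohen's (1988) benchmarks as stated in the text."""
    size = abs(d)  # the benchmarks concern magnitude, not direction
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"  # hypothetical label; the text gives no name for d < 0.20


# Effect sizes reported for the two fluency measures.
reported = {"MFaCTs: Grades 1-2": 0.23, "MFaCTs: Grades 3-5": 0.25}
for measure, d in reported.items():
    print(f"{measure}: d = {d:.2f} ({cohen_benchmark(d)})")
```

Both reported values fall in the small band.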

Support Model
Figure 1 outlines the number of schools that completed each support visit. By the final visit, two schools allocated to receive ongoing support had stopped using the SAFMEDS strategy due to unforeseen changes to staff availability. Of the schools continuing to use the SAFMEDS strategy, all but one engaged with the three support visits. Seventeen schools allocated to the support arm made email contact with the first author between visits to access further support.

Fluency Outcomes
We were interested in investigating whether implementation support from a researcher could help improve children's fluency outcomes during a teacher-led SAFMEDS intervention. In terms of the MFaCTs: Grades 1-2 assessment, the statistical analysis revealed a small positive effect of ongoing support over no support on the children's addition and subtraction fluency between baseline and follow-up (Trial arm × Time: b = 2.92, SE = 0.86, p = .001, d = 0.23). A pairwise comparison of marginal linear predictions, with Bonferroni correction, revealed significant improvements on this measure for children in both arms. Children's raw scores in the support arm improved to a greater extent on average between baseline (M = 12.00) and follow-up (M = 22.59; p < .001) compared to children in the no support arm (M baseline = 9.02, M follow-up = 16.50, p < .001).
Analysis of the MFaCTs: Grades 3-5 showed that ongoing support had a small positive effect, relative to the no support arm, on the children's addition, subtraction, multiplication, and division fluency (b = 2.68, SE = 0.75, p < .001, d = 0.25). Bonferroni-corrected pairwise comparisons revealed that children's raw scores on this measure improved significantly in both arms of the study. Children in the ongoing support arm improved to a greater extent between baseline and follow-up (M baseline = 8.52, M follow-up = 19.12, p < .001) than children in the no support arm (M baseline = 5.94, M follow-up = 13.76, p < .001). Table 6 displays further descriptive statistics from the linear mixed effects analysis for both MFaCTs outcomes.
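As a rough check on the direction and size of these effects, the gain in each arm and the difference in gains between arms can be computed directly from the means reported above. This is a minimal sketch: the variable and function names are our own, and the calculation is an unadjusted difference in gains, so it should not be expected to match the covariate-adjusted interaction coefficients from the mixed model exactly.

```python
# Reported means as (baseline, follow-up) pairs for each trial arm.
reported_means = {
    "MFaCTs: Grades 1-2": {"support": (12.00, 22.59), "no_support": (9.02, 16.50)},
    "MFaCTs: Grades 3-5": {"support": (8.52, 19.12), "no_support": (5.94, 13.76)},
}


def gain(baseline: float, follow_up: float) -> float:
    """Raw-score improvement between baseline and follow-up."""
    return follow_up - baseline


def difference_in_gains(means: dict) -> float:
    """Gain in the support arm minus gain in the no support arm."""
    return gain(*means["support"]) - gain(*means["no_support"])


for measure, means in reported_means.items():
    print(
        f"{measure}: support gain = {gain(*means['support']):.2f}, "
        f"no support gain = {gain(*means['no_support']):.2f}, "
        f"difference = {difference_in_gains(means):.2f}"
    )
```

The unadjusted differences in gains (3.11 and 2.78 raw-score points) are close to, but not identical with, the model-based coefficients (b = 2.92 and b = 2.68), which additionally account for the covariates and the nesting of children within schools.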

Discussion
Our aim for the current study was to gain insight into the putative benefits of providing teachers with implementation support throughout a SAFMEDS mathematics program. An increasing number of teachers across North Wales are using the SAFMEDS strategy to support children's fluency of basic mathematics skills. Yet, no known research internationally had investigated whether implementation support from a researcher can lead to better fluency outcomes than the more traditional "no support" approach following teacher training. Identification of a successful coaching model could help researchers to support this program at scale, help teachers advance their PD, and improve the outcomes of the children they teach. The results from this cluster RCT suggest that providing teachers with initial training in SAFMEDS and then three 1-hour visits and email contact with a researcher has a positive effect on children's fluency of arithmetic facts compared to initial training only. This paper also contributes to the growing literature reporting the effects of coaching teachers to implement evidence-based interventions within their schools; with a specific focus on mathematics outcomes.
The Education Endowment Foundation (EEF; Education Endowment Foundation, 2019) is a leading UK charity that supports the generation of research and good practice within schools. It aims to support teachers to use evidence that works to improve educational outcomes for children. In its recent implementation guidance report, the EEF identified the importance of reinforcing initial training for interventions with expert follow-on support within school. The results from our study further support this guidance in the context of a SAFMEDS mathematics intervention. Whilst children attending schools in the no support arm of this trial did improve their fluency of arithmetic facts, children made greater progress when their teacher received coaching to support their implementation of the SAFMEDS strategy.
Data from Kraft et al. (2018) supported the idea that someone with expertise can coach teachers to implement evidence-based interventions in schools. The results from their meta-analysis revealed that PD programs with an element of coaching can have a positive effect on student achievement outcomes of +0.18 SD, although this largely reflected their application to literacy and content-based interventions. In contrast, Lynch et al.'s (2019) meta-analysis suggested that there was no added benefit to including a coaching element in the format of PD interventions for mathematics and science. In the current study, we experimentally manipulated the coaching element of the SAFMEDS intervention and found the effect of coaching to be between +0.21 and +0.23 SD across the MFaCTs measures; these outcomes are similar to Kraft et al.'s findings. Our results provide some additional support for the effectiveness of teacher coaching in the context of a fluency-based arithmetic intervention. Kraft et al.'s (2018) analysis also revealed that effect sizes varied significantly depending on whether the researchers devised their own assessments or administered standardized tests. When considering effect sizes within education research, Kraft (2020) outlined that researcher-designed assessments often reflect content that aligns more closely with the outcomes of the evaluated program, compared to the broader scope of standardized assessments. The MFaCTs measures are published and standardized; however, their focus on fluency of arithmetic facts aligned closely with the content the children covered within the SAFMEDS sessions. This may have inflated the observed effect sizes compared to alternative standardized assessments. When interpreting the results from the current study, it is important to consider the underlying mechanisms and social contingencies that might have made coaching effective.
First, the support visits served to provide teachers with feedback to improve their implementation fidelity. Durlak and DuPre (2008) reported that without support teachers are often unable to implement an intervention to 100% fidelity following training. This is not surprising given that field studies come with additional extraneous variables compared to efficacy/laboratory designs (Cook & Odom, 2013). However, improved implementation fidelity of evidence-based interventions in the classroom can lead to improved student outcomes (Durlak & DuPre, 2008; Ysseldyke et al., 2003).
By providing teachers with in-situ support during the present RCT, our aim was to help them deliver the program in a way that more closely aligns with its intended design (e.g., ensuring the children engaged with the practice regularly, discussing methods to address and reduce cheating, as well as reviewing and acting upon children's progress data). Improved adherence to the procedural aspects of the program may explain why the children who attended schools allocated to the support trial arm made greater fluency gains.
Second, between each visit, the teachers had the opportunity to adapt their practice based on the feedback they had received. By design, PT practices allow teachers to monitor and reflect upon their children's learning (Lindsley, 1995). If children are not making desirable progress toward fluency, then their teacher should adapt the instruction or materials that they provide. Through session observation and review of these data, the first author would have been able to see progress across the program. As such, there is a level of accountability that the teachers might have experienced to avoid feeling embarrassed during the following support visit. In a qualitative evaluation of a coaching program in a healthcare setting, Liddy et al. (2015) reported that coaches helped patients realize that they need to play an active role in managing and improving their health. The patients also reported that their personal accountability increased as a result of their engagement with the coaches because they knew someone else was monitoring their engagement with the program. It seems feasible that this finding could extend to programs relating to school-based educational interventions. It is difficult to disentangle implementation fidelity and accountability, but both of these mechanisms provide direction for future research in this area.
We acknowledge that this study would have been enhanced had we collected data relating to teachers' and children's implementation fidelity across both trial arms. By employing blind observers to attend a SAFMEDS session in each school following each cycle of support visits, it would have been possible to directly assess the effects of expertise on the teachers' implementation fidelity. Moreover, analysis of these data could identify common aspects of the strategy that teachers struggle, or fail, to implement in school settings. Due to practical and funding constraints, we were unable to incorporate this into the current study. However, this would provide a valuable extension to future replications. It is possible that the implementation support offered by the researcher helped teachers to: interpret learning pictures more readily and accurately; identify and correct children's procedural steps as they progressed through each SAFMEDS timing; and manage challenges such as cheating and identifying appropriate materials.
Whilst we did not carry out a formal economic analysis, we believe that this support model may be a cost-effective and feasible alternative to embedding a researcher in each school to run and maintain a SAFMEDS intervention. Adoption of the current support model would enable a researcher to provide necessary implementation support at scale and may encourage teachers to use the intervention beyond the termination of a research study. Costs associated with the replication of this support model include a researcher's time (three 1-hour support visits and designated time to respond to email queries), the cost of travel between schools, the cost of materials (e.g., printing each SAFMEDS deck double-sided onto card, at approximately 6 sheets of A4 card per deck per child), and the cost of a teacher's or TA's time to prepare and deliver three SAFMEDS sessions per week (with each session lasting approximately 20 minutes).
The results from the current study suggest that initial training can provide teachers with the skills to implement the SAFMEDS strategy in their school. Children across both trial arms evidenced improvements in their fluency of arithmetic facts between baseline and follow-up across both MFaCTs measures. Support from a researcher helped teachers to elicit greater fluency progress from the children that they worked with. Further research is still needed to establish the components of this model that make the support effective, including exploration of the effects of coaching on teachers' implementation fidelity and perceived accountability.