Discrepancies between teachers’ self-reported perceived skills and use of classroom assessment practices: The case of Botswana

Abstract
As teachers implement a diverse repertoire of assessment methods to gather information about students’ progress, they may recognise shortfalls in their own assessment skills. These perceived growth areas may be productive targets for future professional learning. To quantify any discrepancies between teachers’ self-reported assessment use needs and their own skills, we surveyed 691 teachers from selected public schools in Botswana. Multivariate analysis of variance and descriptive discriminant analysis were used to relate discrepancies between classroom assessment skills and usage to certain teacher characteristics. We found teachers felt most skilled at criterion-referenced testing practices and used them the most often; however, their frequency of use significantly exceeded their reported skill level. Conversely, teachers’ perceived skill in constructing objective test items generally exceeded their use. Moving forward, planning of future professional learning experiences for teachers in Botswana can draw on this study’s findings and survey instrumentation.


Introduction
Current understanding of teacher professional learning emphasises that teachers' learning outcomes depend critically on their motivation to re-evaluate and change their current practices (Kennedy, 2016; Kraft et al., 2018; Opfer & Pedder, 2011). Teachers recognise the importance of their continued professional learning about assessment, although they report different specific training needs (DeLuca et al., 2018; Livingston & Hutchinson, 2017). Leveraging teachers' own perceived learning needs may enhance the effectiveness of professional development programming about assessment (Andersson & Palm, 2018; Fulmer et al., 2015; Korthagen, 2017).

ABOUT THE AUTHORS
Setlhomo Koloi-Keaikitse is a Senior Lecturer at the University of Botswana with a PhD in Educational Psychology. She currently teaches research methods and evaluation, and test and measurement courses, to both graduate and undergraduate students. Anne Traynor is Associate Professor of Educational Psychology and Research Methodology in the College of Education at Purdue University. Her areas of expertise are test-to-curriculum alignment methods and score reliability and precision estimation methods.
We conceptualise teachers' own perceived assessment skills as views of personal effectiveness on a range of data acquisition, interpretation, and evaluation activities to monitor student learning outcomes and inform other educational processes (Pastore & Andrade, 2019).
Although teachers vary considerably in their beliefs about and practical approaches to assessment (e.g., Brown et al., 2019; DeLuca et al., 2018), perceived assessment skills have typically been measured by presenting teachers with a list of student assessment activities and asking them to rate their expertise on each (DeLuca et al., 2016; Zhang & Burry-Stock, 2003). For example, DeLuca et al. asked teachers to rate their skill level on the statement "I provide timely feedback to students to improve their learning" (p. 255). The skill rating scales for such items are anchored with descriptors (e.g., "some experience," "very skilled") to support response reliability and to avoid measuring teachers' beliefs, emotions, or values about assessment, which are conceptually distinct constructs. We view teachers' own perceived assessment skills as different from assessment confidence or self-efficacy, but note that they are sometimes measured similarly (e.g., Levy-Vered & Nasser-Abu Alhija, 2015). About 20 years ago, Zhang and Burry-Stock (2003) found no significant multivariate relationship between years of teaching experience and seven perceived assessment skill factors. In a more recent study, Coombs et al. (2018) found that teachers' confidence in designing classroom assessment and in monitoring their own assessment practices increased across career stages. Although the rating scale labels for each assessment skill in the confidence scale were evaluative (e.g., "novice," "proficient," "expert"), the items seem likely to have captured teachers' own perceived assessment skills. These studies were conducted in Canada and the United States, however, so the generalisability of their results to nations with different teacher retention patterns is uncertain. While many studies have made inferences about discrepancies between the classroom assessment competencies that teachers have and those they need, to the best of our knowledge no studies have attempted to quantify the magnitude of such discrepancies on specific assessment skills. Further, much of the research on teachers' perceived skill in and use of classroom assessment practices has been conducted in Europe, North America, and New Zealand. Little research has been conducted in developing countries such as Botswana, which leaves educators there without adequate information about teacher practices that may influence learning in their context. This study therefore examined classroom assessment skills and use among primary and secondary school teachers in Botswana.

Classroom assessment practices
Assessment entails a broad spectrum of activities, including the collection, analysis, and interpretation of information for decision-making. Teachers are responsible for collecting information through various assessment methods that can be used to make informed decisions about students' learning progress. Classroom assessment information also helps teachers evaluate their own teaching by revealing what they taught well and what they need to modify. Zhang and Burry-Stock (2003) argued that to communicate assessment results effectively, teachers must clearly understand the strengths and limitations of various assessment methods. Teachers must also use clear terminology when they use assessment results to inform others about decisions regarding student learning. Stiggins (2001) made a related argument, noting that schools were historically designed to use assessment results to rank students from the lowest to the highest achievers. When assessment information was used this way, many students did not perform well and developed a sense of hopelessness about learning. Over the past few decades, however, the mission of ranking students has given way to holding teachers accountable for ensuring that all students have the chance to meet their educational potential.
Student assessment methods have long been a prominent topic of debate in educational forums, and educators are divided about the best methods of assessing student learning outcomes. Some advocate traditional forms of assessment, such as multiple-choice and other objective tests; others advocate extended performance assessments, such as portfolios, journal critiques, and research essays. Supporters of traditional assessment believe that such tests are appropriate for assessing the skills and knowledge that students are expected to develop within a short period of time, are fair, and can achieve relatively high content coverage (Segers & Dochy, 2001). Supporters of alternative assessment argue that traditional selected-response tests assess facts and skills in isolation, seldom requiring students to apply what they know and can do in real-life situations. They also suggest that traditional forms of assessment may be misaligned with contemporary academic content standards, and that over-reliance on them often leads to instruction that stresses basic knowledge and skills rather than complex reasoning. In their view, if teachers want to measure students' ability to engage in debate, write a poem, tune an engine, use a microscope, or prepare a meal, only authentic performance assessments can measure such skills (Frey & Schmitt, 2007; Reynolds et al., 2009).
Criterion-referenced measures are used to ascertain an individual's status with respect to some criterion or performance standard. The student's performance is compared with an established criterion rather than with other students' performance (e.g., Lok et al., 2016). The main purpose of criterion-referenced testing is to ensure that individual students attain the set objectives, so evaluation focuses on the extent to which each student attained them. Some researchers argue that collecting, analysing, and interpreting assessment information does not necessarily require sophisticated statistical analysis of assessment results. Simple tallies of how many students missed each assessment item or failed to meet a specific criterion may provide sufficient evidence and require only basic statistical competencies (Guskey, 2003). Although this assertion may be true, determining whether teachers in Botswana perceive a need to use statistical techniques for analysing and interpreting assessment data could also inform the development of professional learning experiences.
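The kind of simple tallies Guskey describes need nothing more than counting. The sketch below is illustrative only. The scored responses, the 0.8 mastery cutoff, and the function names are all hypothetical, not taken from the study. It counts how many students missed each item, then compares each student's proportion correct with a fixed criterion rather than with classmates' scores.

```python
def item_miss_tallies(scored):
    """Count, for each item, how many students answered it incorrectly.

    scored: dict student -> list of item scores (1 = correct, 0 = incorrect).
    """
    n_items = len(next(iter(scored.values())))
    tallies = [0] * n_items
    for item_scores in scored.values():
        for i, s in enumerate(item_scores):
            if s == 0:
                tallies[i] += 1
    return tallies


def meets_criterion(scored, cutoff):
    """Criterion-referenced judgement: compare each student's proportion
    correct with a fixed cutoff (hypothetical), not with other students."""
    return {student: sum(s) / len(s) >= cutoff for student, s in scored.items()}


# Invented scores for three students on five items.
scored = {
    "A": [1, 1, 0, 1, 1],
    "B": [1, 0, 0, 1, 0],
    "C": [1, 1, 1, 1, 0],
}
print(item_miss_tallies(scored))            # [0, 1, 2, 0, 2]
print(meets_criterion(scored, cutoff=0.8))  # {'A': True, 'B': False, 'C': True}
```

Items 3 and 5 stand out as the most frequently missed, which is the sort of evidence a teacher could act on without further statistics.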

Gaps in teacher assessment practice in Botswana
As a developing country, Botswana identifies assessment as a critical component of education reform and improvement, and sound classroom assessment practices are imperative for helping students attain the set educational objectives. The Government of Botswana has thus recently promulgated several policy frameworks and strategies aimed at improving teaching and learning outcomes in schools. These include Vision 2036, the Botswana Education and Training Sector Strategic Plan (ETSSP 2015-2020), and the National Curriculum and Assessment Framework (2015). All of them envision Botswana as a knowledgeable nation with a relevant, quality, and outcome-based education system. Despite these plans, however, increasing failure rates on the national examinations that determine admission into tertiary education suggest a need for better alignment between assessment and instruction (Makwinja, 2017) and for more "student-friendly" assessment methods (Suping, 2022, p. 6). Koloi-Keaikitse (2016) highlighted that teachers in Botswana continue to decry the level of assessment training they receive and the general lack of alignment between that training and their day-to-day classroom assessment activities.
Gathering information about the classroom assessment competencies of teachers in Botswana, to determine their strengths and weaknesses, can also inform countries with similar educational systems as they pursue their strategic education policy goals. Such information can be used by institutions that provide teacher education and professional learning. Policy makers and assessment or curriculum experts can likewise use it to gauge what types of assessment training teachers need and to align that training with the skills teachers use in their daily instruction.
This study therefore investigated the extent to which teacher characteristics (educational level, standard/form taught, and teaching subjects) are related to discrepancies between teachers' perceived skills and their use of different classroom assessment practices across Botswana. Primary and secondary school teachers were asked about both their existing assessment skills and their use of those skills. We anticipated that teachers' expertise with certain skills might be misaligned with the assessment needs they encounter daily in their classrooms.

Study population and sampling
A representative sample of N = 691 teachers, selected from nine educational inspectoral areas in Botswana, participated in the study. To ensure that participants were representative of relevant subgroups, stratified random sampling was used to select teachers based on their educational level, teaching level (i.e., Standard or Form), and subject taught. Among the sampled teachers, 265 were primary school, 243 were junior secondary school, and 183 were senior secondary school teachers. Primary school teachers in Botswana teach one standard (grade) each year and teach all subjects. Junior and senior secondary school teachers teach specialised subjects to different classes. Teachers in different inspectoral areas and schools have varying levels of teacher training. Some teachers hold a certificate in primary education, the lowest level of postsecondary teacher training, which was intended to give teachers the basic skills they need for classroom teaching. These certificates have since been replaced by an advanced diploma, adopted to increase the production of teachers trained at a higher level. In this study, primary school teachers held a certificate in primary education (n = 55), a diploma in primary education (n = 170), or a degree in primary education (n = 40). Among junior secondary school teachers, some held a Diploma in Secondary Education (n = 162), some a Degree in Secondary Education (n = 79), and only a few a Master's Degree in Education (n = 2). Senior secondary school teachers held degrees in Humanities, Science, and Social Sciences, as well as postgraduate Diplomas in Secondary Education (n = 175), and Master's Degrees in Education (n = 8); see Table 1.
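The stratified selection described above can be sketched as drawing a simple random sample of the same fraction from every stratum. This is a minimal illustration under stated assumptions. The study does not report its allocation rule, so proportional allocation is assumed. The sampling frame, the stratum key (level and subject), and the 10% fraction are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(frame, stratum_of, fraction, seed=0):
    """Simple random sample of the same fraction within every stratum.

    frame: list of sampling units (here, hypothetical teacher records).
    stratum_of: function mapping a unit to its stratum label.
    fraction: sampling fraction applied to each stratum (proportional allocation).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for label in sorted(strata):
        units = strata[label]
        k = max(1, round(fraction * len(units)))  # keep every stratum represented
        sample.extend(rng.sample(units, k))
    return sample


# Hypothetical sampling frame: strata and counts are invented.
frame = (
    [{"id": f"P{i}", "level": "primary", "subject": "all"} for i in range(400)]
    + [{"id": f"J{i}", "level": "junior", "subject": "science"} for i in range(150)]
    + [{"id": f"S{i}", "level": "senior", "subject": "humanities"} for i in range(50)]
)
sample = stratified_sample(frame, lambda t: (t["level"], t["subject"]), fraction=0.1)
print(Counter(t["level"] for t in sample))
```

Sampling within strata guarantees that small subgroups, such as a rare subject, appear in the sample. A single simple random sample of the whole frame could miss them by chance.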

Instrument
The Classroom Assessment Practices and Skills (CAPS) questionnaire (Koloi-Keaikitse, 2016) was used as the data collection instrument. The initial set of CAPS items was adopted from the Assessment Practices Inventory (API; Zhang & Burry-Stock, 2003). The API was chosen as the basis for the CAPS over other assessment competency measures for two reasons. It covers a wide range of specific assessment activities that are currently used in Botswana. It is also in line with the conceptualisation of assessment competency as a cognitive disposition acquired to help teachers deal consistently with assessment demands in their classroom settings (Herppich et al., 2018). The Skill items were designed to measure teachers' perceptions of their own assessment skills on a 5-point scale labelled "not at all skilled," "a little skilled," "somewhat skilled," "skilled," and "very skilled." Using the same items as the Skill scale, the Use scale was intended to measure teachers' self-reported assessment practices on a 5-point frequency scale with descriptors "not at all used," "seldom used," "used occasionally," "used often," and "used very often." Item stems included, for example, "Matching performance tasks to instruction and course objectives" and "Assessing students through observation" (p. 339). After pilot testing, the final questionnaire had 29 perceived skill items (Cronbach's α = .95) and 29 perceived use items (Cronbach's α = .91). These coefficients indicate high internal consistency of teachers' responses within each scale.
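Cronbach's α, reported above for both scales, compares the sum of the item variances with the variance of respondents' total scores. For k items, α = k/(k − 1) × (1 − Σ item variances / total-score variance). A minimal standard-library sketch follows. The ratings are invented for illustration and are not CAPS data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of k item ratings per respondent (e.g. on a 1-5 scale).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals).
    Sample variances are used throughout; the n - 1 denominators cancel.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*rows))  # transpose: one tuple per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Invented ratings: 5 respondents x 3 items.
ratings = [
    [1, 2, 2],
    [2, 1, 3],
    [3, 3, 1],
    [4, 5, 4],
    [5, 4, 5],
]
print(round(cronbach_alpha(ratings), 3))  # 0.857
```

Items that rise and fall together inflate the total-score variance relative to the sum of item variances, which pushes α toward 1.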

Teacher recruitment and data collection procedure
Permission to administer the questionnaire to teachers was requested from school heads (principals) by letter, and permission was granted verbally. To recruit teachers, the principals/headmasters introduced the first author to the teachers in their staff rooms during tea breaks. During these introductions, the first author gave a brief overview of the study and addressed any concerns. Those who agreed to participate were given the questionnaire to complete. The completed questionnaires were collected immediately by the researcher and checked for any accidental omissions.

Data analysis
Exploratory factor analysis (EFA) was used to reduce the items to a smaller number of factors that can be analysed and interpreted more easily. Data were factor-analysed with principal axis factoring and Promax oblique rotation (Thompson, 2005). Based on scree plots and the eigenvalue-greater-than-1 guideline, the 29 items of the Skill subscale loaded on six factors that accounted for 53% of the variation in item responses after extraction. For the Use subscale, six factors explained 45% of the variance. Both the Skill and Use scales had complex factor structures: many items had non-negligible cross-loadings (i.e., loadings with absolute value greater than 0.40 on more than one factor). The six factors for perceived skill and frequency of use of classroom assessment practices could be characterised as (1) Criterion-Referenced Testing, (2) Grading Practices, (3) Statistical Application, (4) Assessment Application, (5) Essay Items, and (6) Objective Items (see Tables 2 and 3). Items were generally assigned to the factor on which they loaded most highly, except for five items that had similar loadings (within ±0.03) on multiple factors. Those items were assigned to factors based on their substantive content, because we intended to combine item ratings into sub-scores for each factor and wanted those scores to be meaningful.
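The item-assignment rule described above can be expressed as a small function. This is a sketch under stated assumptions. The loading values are invented, and factor indices are zero-based. "Similar loading values (± 0.03)" is read as a gap of at most 0.03 between an item's two largest absolute loadings. Items flagged this way are returned as None; these are the items the authors assigned by substantive content instead.

```python
def assign_items(loadings, tie_margin=0.03, cross_cutoff=0.40):
    """Assign each item to the factor with its largest absolute loading.

    loadings: dict item name -> list of pattern loadings, one per factor.
    Returns item -> (factor_index_or_None, has_cross_loading), where None marks
    a near-tie needing a content-based judgement, and has_cross_loading is True
    when more than one |loading| exceeds cross_cutoff.
    """
    result = {}
    for item, row in loadings.items():
        ranked = sorted(range(len(row)), key=lambda j: abs(row[j]), reverse=True)
        best, runner_up = ranked[0], ranked[1]
        near_tie = abs(row[best]) - abs(row[runner_up]) <= tie_margin
        cross = sum(abs(x) > cross_cutoff for x in row) > 1
        result[item] = (None if near_tie else best, cross)
    return result


# Invented loadings on three factors (not values from Tables 2 or 3).
loadings = {
    "item01": [0.72, 0.10, 0.05],
    "item02": [0.45, 0.43, 0.02],
    "item03": [0.15, -0.61, 0.48],
}
for item, (factor, cross) in assign_items(loadings).items():
    print(item, factor, cross)
```

Comparing absolute values matters with an oblique rotation such as Promax, where a strong negative loading (item03 above) still identifies the item's factor.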
To represent each factor in further analyses addressing our research question, we computed the mean rating on each set of items for every participating teacher. Each teacher thus had mean scores for perceived skill and frequency of use on each of the six classroom assessment factors. All mean scores ranged between 1 and 5 and had reliability coefficients that we judged to be acceptable (see Table 4). For every teacher, a discrepancy score on each assessment factor was computed (James & Pedder, 2006; Winterbottom et al., 2008) as the difference between the skill and use mean scores (skill minus use). A negative discrepancy therefore indicates that reported use exceeded perceived skill, which is how discrepancies are interpreted in the Results.
How are teachers' characteristics of educational level, standard/form taught, and teaching subjects related to the discrepancies between how skilled teachers believe they are and how frequently they use different classroom assessment practices?
After computing the discrepancy scores, multivariate analysis of variance and descriptive discriminant analysis were used to examine how teachers' skill-use discrepancy scores relate to their demographic characteristics of educational level, teaching level, and teaching subjects. Discrepancy scores were analysed on each assessment factor: Criterion-Referenced Testing, Grading Practices, Statistical Applications, Assessment Applications, Essay Items, and Objective Items. The discrepancy variables served as dependent variables, and teacher characteristics (educational level, teaching level, and subjects taught) were fixed factors. Our method differs from that of James and Pedder (2006) in two ways. We used a multivariate linear model rather than cluster analysis as the final analytic step, and we used a different assessment practices survey instrument.
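The two computational steps above can be sketched with NumPy: forming per-teacher skill-minus-use discrepancy scores, and summarising group differences on them. This is a minimal one-way illustration under stated assumptions. The arrays, factor-to-item mapping, and group labels are hypothetical. The skill-minus-use sign follows the interpretation used in the Results, where a negative value means use exceeds perceived skill. Only Wilks' λ = det(E)/det(E + H) is computed, where E and H are the within-groups and between-groups sums-of-squares-and-cross-products (SSCP) matrices. Converting λ to an approximate F and p value is not shown.

```python
import numpy as np


def factor_means(ratings, factor_items):
    """ratings: (n_teachers, n_items) array of 1-5 ratings.
    factor_items: dict factor name -> list of item column indices.
    Returns an (n_teachers, n_factors) array of mean ratings per factor."""
    return np.column_stack([ratings[:, cols].mean(axis=1)
                            for cols in factor_items.values()])


def discrepancy_scores(skill, use, factor_items):
    """Skill minus use for every teacher and factor (negative: use > skill)."""
    return factor_means(skill, factor_items) - factor_means(use, factor_items)


def wilks_lambda(X, groups):
    """One-way MANOVA Wilks' lambda, det(E) / det(E + H).

    X: (n, p) array of dependent variables (here, discrepancy scores).
    groups: length-n sequence of group labels (e.g. educational level)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    E = np.zeros((p, p))  # within-groups SSCP matrix
    H = np.zeros((p, p))  # between-groups SSCP matrix
    for g in np.unique(groups):
        Xg = X[groups == g]
        dev = Xg - Xg.mean(axis=0)
        E += dev.T @ dev
        d = (Xg.mean(axis=0) - grand_mean).reshape(-1, 1)
        H += len(Xg) * (d @ d.T)
    return np.linalg.det(E) / np.linalg.det(E + H)


# Hypothetical example: two teachers, four items, two factors.
factor_items = {"CRT": [0, 1], "Objective": [2, 3]}
skill = np.array([[3, 4, 5, 4], [4, 4, 3, 3]])
use = np.array([[5, 5, 3, 4], [4, 5, 3, 2]])
print(discrepancy_scores(skill, use, factor_items))
```

λ near 1 means group membership explains little of the variation in the discrepancy profile; smaller values mean larger group differences. For a multi-factor design like the one described here, an established implementation such as the `MANOVA` class in `statsmodels.multivariate.manova` would normally be used instead of this one-way sketch.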
Assumptions of parametric statistics were examined. Multivariate normality was assessed by checking the univariate normality of each dependent variable with Q-Q plots (Tabachnick & Fidell, 2007); univariate normality is a necessary, though not sufficient, condition for multivariate normality. Although the data were not perfectly normal for all dependent variables, the non-normality was not severe, and we judged the normality assumption to be tenable. Box's M test indicated that the assumption of homogeneity of covariance matrices was also tenable for the dependent variables across teacher educational level and subject-taught groups.
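A normal Q-Q plot pairs the sorted values of a variable with standard-normal quantiles; points near a straight line suggest approximate normality. The sketch below computes those pairs with the standard library. As an assumption beyond what the study reports, it summarises the plot's straightness as the correlation between the two sets of quantiles, a probability-plot correlation. The plotting positions (i − 0.5)/n are one common choice among several, and the discrepancy values in the example are invented.

```python
from statistics import NormalDist, mean


def qq_points(values):
    """Pair standard-normal quantiles at plotting positions (i - 0.5) / n
    with the sorted sample values, as drawn on a normal Q-Q plot."""
    xs = sorted(values)
    n = len(xs)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, xs


def qq_correlation(values):
    """Pearson correlation between theoretical and sample quantiles;
    values close to 1 mean the Q-Q points lie close to a straight line."""
    theo, xs = qq_points(values)
    mt, mx = mean(theo), mean(xs)
    num = sum((t - mt) * (x - mx) for t, x in zip(theo, xs))
    den = (sum((t - mt) ** 2 for t in theo) * sum((x - mx) ** 2 for x in xs)) ** 0.5
    return num / den


# Invented discrepancy scores for one factor.
discrepancy = [-1.5, -0.5, 0.0, 0.5, -1.0, 0.25, 1.0, -0.25, -0.75, 0.0]
print(round(qq_correlation(discrepancy), 3))
```

A single number such as this complements, but does not replace, visual inspection of the plot, which also shows where a departure from normality occurs (for example, in one tail).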

Ethical considerations
The study was approved by the Ball State IRB (Ref 231,092) and the University of Botswana. Permission to conduct the study was obtained from the Permanent Secretary, Botswana Ministry of Education (E1/20/2/X1(46)). Furthermore, permission to visit schools and administer the questionnaire to teachers was sought from Regional Education Directors (CREOS 4/3/15 1 (39)). Permission to contact teachers was solicited from the headteachers of all selected schools. Individual informed consent was obtained from each teacher who participated in the study.

Results
Multivariate tests showed significant differences by educational level in the discrepancies between teachers' self-perceived skill and frequency of use of different classroom assessment practices, Wilks' λ = .90, F(5, 95) = 12.000, p < .001. Following this significant result, descriptive discriminant analysis was conducted to determine how teachers with different educational backgrounds differed. The structure coefficients indicate that teachers' discrepancies for Assessment Applications tend to be most similar in size and direction to their discrepancies for Essay Items and Criterion-Referenced Testing Practices. Teachers with a certificate as their highest credential showed negative discrepancies in grading practices, statistical applications, assessment applications, and objective items. This indicates that their reported use of these assessment practices exceeded their perceived skill in them. Teachers with a diploma showed negative discrepancies between their perceived skill and use of objective items and assessment applications. Teachers with a degree reported only positive discrepancies (see Table 5).
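In descriptive discriminant analysis, structure coefficients are correlations between each dependent variable and each discriminant function score. One common variant, sketched below with NumPy, uses pooled within-groups correlations. The function weights are taken from the eigenvectors of E⁻¹H, where E and H are the within-groups and between-groups SSCP matrices. The data and function name are hypothetical. Eigenvector signs, and hence coefficient signs, are arbitrary, and statistical packages may scale or sign them differently.

```python
import numpy as np


def structure_coefficients(X, groups):
    """Pooled within-groups correlations between each variable (rows) and
    each discriminant function score (columns)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    E = np.zeros((p, p))  # within-groups SSCP
    H = np.zeros((p, p))  # between-groups SSCP
    for g in labels:
        Xg = X[groups == g]
        dev = Xg - Xg.mean(axis=0)
        E += dev.T @ dev
        d = (Xg.mean(axis=0) - grand_mean).reshape(-1, 1)
        H += len(Xg) * (d @ d.T)
    # Discriminant weights: eigenvectors of E^-1 H, largest eigenvalue first.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(E, H))
    order = np.argsort(eigvals.real)[::-1]
    n_functions = min(p, len(labels) - 1)
    V = eigvecs.real[:, order[:n_functions]]
    # Within-groups covariance of variable j with score m is (E V)[j, m] / df;
    # df cancels when dividing by the within-groups standard deviations.
    cov = E @ V
    sd_vars = np.sqrt(np.diag(E))
    sd_scores = np.sqrt(np.diag(V.T @ E @ V))
    return cov / np.outer(sd_vars, sd_scores)


# Invented data: the two groups differ only on the first variable.
X = [[0, 0], [2, 0], [1, 1], [1, -1], [10, 0], [12, 0], [11, 1], [11, -1]]
groups = ["a"] * 4 + ["b"] * 4
print(np.round(structure_coefficients(X, groups), 3))
```

A variable that drives group separation correlates strongly with the function (here about ±1), while an unrelated variable correlates near 0. Reading these correlations is how a variable is associated with the function on which its structure value is largest.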
Multivariate tests were similarly significant for standard/form taught (teaching level), Wilks' λ = .78, F(7, 31) = 24.00, p < .001. Following this significant result, descriptive discriminant analysis was computed to identify which discrepancy variables contributed most to the group differences. Structure values (coefficients) showed that the discrepancies for objective items and statistical applications contributed most. Further examination of group means revealed that lower primary school teachers reported the largest discrepancies in objective items and statistical applications. They were followed by middle primary school teachers, who also reported some discrepancies in these two areas. Upper primary school teachers reported very small discrepancies in statistical applications and objective items, while Forms 1-3 and Forms 4-5 teachers did not report any discrepancies in these factors (see Table 6). This indicates that primary school teachers, especially those who teach the lower classes, used statistical applications and objective items more than their perceived skill in these classroom assessment practices.
Table 6. Structure function values and group means by standard/form taught for the discrepancy between skill and use of classroom assessment practices
Similarly, there were significant differences by teaching subject in the discrepancies between how skilled teachers believed they were and how frequently they used different areas of classroom assessment practices, Wilks' λ = .76, F(4, 81) = 24.00, p < .001. Examination of the structure values showed which discrepancies had the largest coefficients on the first discriminant function. Criterion-referenced testing practices ranked highest, followed by grading practices, assessment applications, and statistical applications. Group means were interpreted to identify which teachers showed major discrepancies by teaching subject (see Table 7). Teachers of humanities, sciences, business, and practical subjects all showed negative discrepancies for criterion-referenced testing practices, with humanities and science teachers showing the largest discrepancies. This suggests that teachers in all subject areas used criterion-referenced testing practices more than their perceived skill in those practices.

Discussion
This study found that teachers with the lower training levels of a certificate or advanced diploma showed negative discrepancies between their perceived skill and use of grading practices, statistical applications, assessment applications, and objective items. This indicates that these teachers felt less skilled in the assessment practices they used most often. The result is broadly consistent with Zhang and Burry-Stock's (2003) earlier findings in the USA, obtained with the Assessment Practices Inventory on which the CAPS was based. In our results, teachers with higher levels of training, such as a degree, did not show negative discrepancies between their perceived skills and use of various classroom assessment practices. This suggests that more highly trained teachers believed they possessed the assessment skills they needed to use most often. Teacher educators in Botswana should attend to this result. It is consistent with findings of other researchers that teachers with more training (pre-service and in-service) are more likely to use varied forms of assessment than those with less training (Alkharusi, 2011; Lander et al., 2015; Mellati & Khademi, 2018). Career stage also seems to matter for assessment use. Early-career teachers tend toward assessment approaches that are standardised across students, while mid- and late-career teachers tend to use more differentiated approaches (Coombs et al., 2018).
Primary school teachers also reported relatively higher use than perceived skill for statistical applications and objective items. By contrast, secondary school teachers (Forms 1-5) reported more skill than use for these practices. These results suggest that even though primary school teachers may be implementing a curriculum that requires them to use objective items and statistical applications, they do not perceive themselves as well skilled in these assessment practices. These findings partly coincide with Stiggins's (2005) argument that teachers are likely to be taught assessment skills they may not need in their day-to-day classroom practice. This underscores the need for pre-service teacher training to emphasise assessment practices that can be readily integrated into the curricula teachers are most likely to be assigned to teach. In the same vein, Wyatt-Smith et al. (2020) and Alkharusi (2011) emphasised the need to align assessment training with teachers' curriculum requirements.
Science and humanities teachers in particular reported higher use than skill for criterion-referenced testing practices. Also, the more skilled teachers are in grading practices and statistical applications, the more they tend to use these classroom assessment practices. All teachers, regardless of teaching subject, are expected to use statistical applications to monitor students' performance appropriately. Even so, the findings of this study indicate shortcomings in the use of appropriate methods for monitoring students' learning outcomes, as highlighted in earlier literature. For example, Onyefulu (2018) and Guskey (2003) reported significant differences between teacher groups in the use of statistical applications to improve assessment practices.
Collectively, the findings of this study confirm a concern raised more than three decades ago by Schafer and Lissitz (1987) and still voiced by assessment scholars today (Brookhart, 2011; Pastore & Andrade, 2019; Wiliam, 2011). The concern is that teachers may lack an adequate knowledge base about recommended measurement concepts, even though they are expected to implement those concepts in their day-to-day classroom practice. These findings may provide valuable insights into teachers' classroom assessment practices and needs in Botswana and other parts of the world. They may also indicate that professional learning about assessment skills tends to be underemphasised during pre- and in-service teacher training, relative to the substantial time teachers spend on assessment-related activities (e.g., Kang & Furtak, 2021; Wylie & Lyon, 2015). Assessing student learning is one of teachers' main responsibilities. Yet teachers feel that they do not possess sufficient skills to apply the most appropriate assessment practices and make well-informed decisions based on assessment results (Mertler & Campbell, 2005; Koloi-Keaikitse, 2016). Teachers' values and preferences for assessment professional learning also vary considerably (DeLuca et al., 2016). This suggests the need for more flexibility and for incorporating teachers' individual perceptions when determining the content of professional learning sessions. Professional learning about assessment may encourage reflection that leads teachers to reconceptualise their identity as assessors and to make better-justified decisions (Xu & Brown, 2016).

Limitations
Though this study revealed some important findings that can inform both policy and practice, our findings were limited by the use of a self-report survey questionnaire only. A major limitation of this study was that it relied solely on a survey of perceptions, without observing actual assessment practices, analysing relevant documents, or holding dialogues with teachers. Qualitative approaches might have given a clearer picture of the classroom assessment practices in which teachers perceive themselves to be skilful, and might have revealed why teachers use certain practices more often. Further research using qualitative approaches is therefore recommended.

Conclusion
Teacher classroom assessment practices are recognised by educators around the world as integral to the instructional process and central to helping students learn. Understanding these practices provides insight into how they may influence educational practice and policy. Our findings have several implications for improving teacher assessment practices. The discrepancies between teachers' perceived skills and their use of assessment practices across teaching levels and subject areas indicate a need to link assessment training requirements firmly to the practices teachers are likely to use often in their classroom teaching. We do not suggest that teachers should be denied training in any assessment practice. However, if their curriculum requires them to be more skilled in one form of assessment practice than another, it remains imperative to spend more time on what teachers may need most. Yan et al.'s (2021) meta-analysis found that teachers' education and training was the individual factor with the most consistent positive relationship with their implementation of formative assessment practices. Thus, one strategy for distributing professional learning time as effectively as possible may be to analyse teacher education course requirements and in-service workshop offerings against teachers' curriculum needs. Attending to the critical assessment skills that teachers are required to use in their subject areas and teaching levels remains key to ensuring that all learners are assessed appropriately and thereby receive a quality education.
Follow-up studies are needed for two purposes. The first is to understand what causes the discrepancies between teachers' perceived skill and their use of certain classroom assessment practices, particularly criterion-referenced assessment practices. The second is to confirm whether the assessment practices for which teachers' use needs exceed their skill level are linked to the curriculum delivery expected of them. If such a link is verified, assessment training programmes should be evaluated with a view to reducing the time spent on assessment practices that teachers are less likely to use and focusing on assessment concepts more relevant to their curriculum teaching needs.

Table 2. Factor loadings for teachers' perceived skill in classroom assessment practice scale (columns: Items, Loadings). Factor 1: Use of Criterion-Referenced Testing Practices
Factor score skill-use discrepancies for Objective Items (.85), Statistical Applications (.49), and Grading Practices (.36) had their largest structure coefficient values on the first discriminant function. Assessment Applications (.92), Essay Items (.40), and Criterion-Referenced Testing Practices (.37) skill-use discrepancies had their largest structure values on the second discriminant function. Assessment Applications thus has a larger structure value on the second discriminant function than on the first.

Table 7. Structure function values and group means by teaching subject for the discrepancy between skill and use of classroom assessment practices (columns: structure values for predicting teaching subject; group means for Criterion Ref. Testing, Grading Practices, Statistical Applications, Assessment Practices)
Notes: Criterion Ref. = Criterion-Referenced Testing Practices; Appl. = Applications; Humanit. = Humanities.