Students’ argumentation in the contexts of science, religious education, and interdisciplinary science-religious education scenarios

ABSTRACT Background Argumentation, that is, the coordination of evidence and reasons to support claims, is an important skill for democratic society and for developing subject-specific literacies, and it can be embedded in multiple school subjects. While argumentation has been extensively researched in science education, interdisciplinary argumentation is less explored, particularly between subjects where collaboration is not the norm, such as science and religious education (RE). Yet everyday issues often involve considering information from multiple sources, such as scientific information or ethical, moral, or religious perspectives. Purpose The purpose of this study was to better understand students' abilities in argumentation within and across the school subjects of science and RE, to inform research and practice of interdisciplinary argumentation. Sample The participants of this study were 457 students, aged between 11 and 14 years, from 10 secondary schools in England. Following data cleaning, 394 student responses were analysed. Design and Methods Students completed simultaneous written assessments of argumentation in three tasks situated within three different subject contexts: (1) science, (2) RE, and (3) an interdisciplinary context which involved argumentation from science and RE. Results In each of the three contexts, high proportions of students achieved all available marks for identifying claims and evidence. These proportions dropped when constructing the link between claim and evidence (warrant) and when constructing an evaluative argument. Higher performances were generally noted in the context of science, and students experienced particular challenges in argumentation in the RE scenario. Conclusions This study contributes to our understanding of the challenges and successes of students' argumentation within and across the subjects of science and RE. Implications for both research and practice are discussed.


Introduction
Argumentation is often defined as the justification of claims with evidence and reasons (Toulmin, 1958). It is widely recognised as an important skill to learn in school, both for the development of critically literate citizens and for a deep understanding of the disciplines being studied (e.g. European Union 2006; Monte-Sano 2016). Crucially, argumentation is not some marginal nicety for a select group of students: being able to select knowledge and reason with it is a foundation of learning across the curriculum (Kuhn and Moore 2015; Wolfe 2011). Beyond this, many of the issues that we face in everyday life are not confined to the boundaries of disciplines or subjects, but are complex and multi- or interdisciplinary in nature, drawing on information from a range of sources (Crujeiras-Pérez and Jiménez-Aleixandre, 2019). In such contexts, concern needs to be focused simultaneously on developing students' argumentation within and across subjects/disciplines, including those that consider morals and ethics. Clearly, students need to be taught the skills of argumentation, but consideration also needs to be given to how these skills are recognised and assessed by both teachers and researchers (Duschl and Osborne 2002). While argumentation has been extensively researched in many contexts (Rapanta, Garcia-Mila, and Gilabert 2013), there has been limited research about how to assess school students' competence in argumentation in science (Osborne et al. 2016), and fewer studies still simultaneously consider science and other subjects, such as Religious Education, or interdisciplinary argumentation. This paper investigates students' skills in argumentation in three tasks, each embedded in a different context: Science, Religious Education, and an interdisciplinary Science & Religious Education context. We use the term 'context' to include the two curriculum subjects and the interdisciplinary space.
Performance in each of these individual contexts is of inherent interest; however, our structured and theoretically informed approach to task construction allows some tentative comparisons to be made between performances on tasks embedded as typical arguments in these three contexts. The findings reveal some interesting challenges and opportunities for supporting students' argumentation in and beyond the subject of science, providing useful implications for researchers and practitioners.

Toulmin's (1958) model sets out the structure of arguments, including components such as claims, data/evidence, warrants, qualifiers and rebuttals, and Toulmin argued that argumentation could have both generic and discipline-specific features, varying in the form of argumentation and the nature of the evidence being utilised (Wolfe 2011). Though argumentation is thought of as the justification of claims with evidence and reasons, it is recognised that the acts of constructing and critiquing arguments within different disciplines require slightly differing, but complementary, skillsets of argumentation (Osborne et al. 2016). These skills can be fostered in school through multiple school subjects where argumentation is an important epistemic practice of the discipline (Wolfe 2011). Argumentation can reflect the epistemology of a discipline or subject as the epistemic criteria of the subject are enacted: what counts as knowledge, or as evidence that can legitimately support claims within the subject. In this sense, it is an important component of the procedures and practices of a subject, in creating and deploying substantive content knowledge. The practice of argumentation in both science and religious education will now be considered, before turning to the interdisciplinary context.

Argumentation in the disciplines
Argumentation has been a highly prominent area of research in science education for many years and has been extensively investigated under the broad goals of scientific literacy (Erduran, Ozdem, and Park 2015). For example, it has been studied in relation to developing students' reasoning and ability to draw justified conclusions (Sadler 2004; Sadler and Zeidler 2005), engagement in discourse in educational contexts (Sadler 2006) and acquisition of scientific knowledge (Schwarz et al. 2003; Venville and Dawson 2010; Pabuccu and Erduran 2017). Furthermore, it enables students to understand how science works and how scientific knowledge is justified or evaluated, and as such it represents an important epistemic practice of the discipline (Erduran and Dagher 2014). Research about argumentation in science education has been broad and far-reaching, including work on the understanding of arguments from both students' (Berland and Reiser 2009) and teachers' perspectives (Sadler 2006), the quality of arguments produced by students, teachers' role in classroom argumentation (McNeill 2009; Simon, Erduran, and Osborne 2006), and the influence of argumentation on learning scientific skills (Aydeniz et al. 2012; Duschl and Osborne 2002). Within the realm of research on argumentation in science education has been a focus on socio-scientific issues (SSI), or socio-scientific argumentation (Sadler 2004; Erduran, Ozdem, and Park 2015). These are issues with a scientific basis but of societal concern, and they may be debated beyond the scientific context to include, for example, ethical and moral dimensions (Sadler 2011). However, these have not always explicitly drawn on concepts from religious education as part of the argumentation.
Religious Education in England, as well as in much of Western Europe, is often pluralistic in nature and concerned with the impartial study of different religions and worldviews, often through dialogical learning, rather than induction into a particular faith (Jackson 2015; Jawoniyi 2015). Thus, as a school subject, it is often positioned as a multidisciplinary field that draws on many cognate disciplines such as philosophy, theology, sociology and psychology, among others (Freathy et al. 2017). Unlike the case of science education, argumentation has been less extensively researched in the context of religious education. However, argumentation is a strong feature of many Religious Education curriculum documents in England, as students are asked to analyse and evaluate various truth claims about faith and various moral positions, and to generate well-informed and reasoned responses for themselves that draw on a range of sources (Chan, Fancourt, and Guilfoyle 2020). This form of religious education is often intended to contribute to pupils' understanding of, and ability to contribute to, issues of societal concern, thereby overlapping with SSI.

Interdisciplinary argumentation and transfer
Many issues we face in daily life require interdisciplinary thinking and complex reasoning drawing on multiple disciplinary knowledge bases (Crujeiras-Pérez and Jimenez-Aleixandre 2019) or the integration of moral and ethical values (Joshi 2016). However, school subjects can often be presented in fragmented and siloed ways that limit integration (Billingsley et al. 2018), and it is unusual for science and RE teachers to collaborate (Hall et al. 2014). Focusing on argumentation can be helpful in generating coherence across the curriculum, highlighting the similarities and differences between different subjects. In this sense, argumentation can be a boundary-crossing mechanism. The learning of argumentation can clarify the distinctiveness of a particular subject in terms of how knowledge is justified and what counts as evidence, and can also show how there are general skills of argumentation across subjects, which can be applied beyond the boundaries of the classroom. This is useful for coherence in learning and for gaining transferable skills, but also for moving to the more complex understanding of arguments needed in everyday life, where issues often require the consideration of information from multiple sources, including scientific information and ethical considerations (Levinson 2010). A recent example is the evolving debate about the wearing of face masks during the Covid-19 pandemic, which balanced the scientific efficacy of wearing face masks with the ethics of equitable allocation (Horwell and McDonald 2020).
While the research literature on argumentation has been extensive in science education, interdisciplinary argumentation between science and other disciplines is less explored (Erduran et al. 2019). Although there has been research on argumentation on science and religion debates (e.g. Basel et al. 2014;Weiß 2016), these have often focused on aetiological issues and have been conducted in German-speaking contexts where RE is more confessional in nature.
There are a number of studies that have considered the extent to which argumentation skills might be transferred between contexts, though often these have focused on the transfer between familiar and unfamiliar contexts within a particular subject area. For example, Zohar (1996) demonstrated the successful transfer of reasoning skills from one biological topic area (seed germination) to another (rodent population size), and Khishfe (2014) demonstrated the transfer of argumentation skills from a familiar scientific context (water usage) to another familiar scientific context (water fluoridation) and an unfamiliar scientific context (genetic modification). Other researchers have focused on examining students' argumentation ability in interdisciplinary contexts, such as socio-scientific issues (e.g. Dawson and Carson 2020) or science-religion debates (note the distinction between religion and religious education) (e.g. Basel et al. 2014). Each of these studies has focused on the explicit teaching of argumentation skills to students and on giving them time in the classroom to develop these skills. However, few have focused on the comparison, relationship, or transfer between science and other subject domains (Osborne et al. 2016), with a dearth of research on the simultaneous comparison of student argumentation in the subjects of science and RE as individual subject areas alongside argumentation in the interdisciplinary science-RE context. Nussbaum and Asterhan (2016) note some difficulties with the transfer of argumentation from one domain or context to another. They point out that even if students develop a nuanced understanding of the need for evidence within arguments, they may not be able to identify what counts as evidence, or may misidentify evidence, in unfamiliar contexts. Additionally, a lack of expertise or knowledge in the unfamiliar context may limit the ability to utilise or adjudicate evidence.
Nevertheless, developing students' knowledge and understanding in domains should facilitate the transfer of knowledge between them.

Measures of argumentation
Assessing the quality of student argumentation is persistently challenging for the field, with researchers proposing a range of ways to do so (Erduran 2008; Sampson and Clarke 2008). The lack of high-quality assessment measures of students' skills and proficiency in argumentation has been recognised in previous reviews (Lee et al. 2014). Scenarios have been used as a route to assessing argumentation in SSI contexts in particular (e.g. Dawson and Carson 2017), with responses being judged at different levels of quality depending on the components of argumentation present in the student response. However, there is also a particular need to have measures which recognise the subject-specific nature of argumentation (Wolfe 2011) and which differentiate the performance between components of the argumentation skill (Osborne et al. 2016). Osborne et al. (2016) produced and validated a learning progression for argumentation in science and developed structured argumentation tasks based on scenarios to assess student argumentation. Five levels of their learning progression pertinent to this study, and illustrative examples in Science and RE contexts, are displayed in Table 1.
Assessing student argumentation using this heavily structured approach may afford greater opportunity to compare argumentation performance between tasks in ways that are less feasible in more open tasks. Furthermore, the clear stratification of the subcomponents of argumentation (or 'levels' of the learning progression) affords the opportunity to focus on student success and challenges in particular components of argumentation.
It can be seen that argumentation is an important skill for students to develop within and across subject disciplines, and that little work has been done to investigate the measurement of students' skills in argumentation. Hence, this study sought to address the following research question: How does student performance vary between argumentation tasks in science, RE and Science-RE cross-curricular contexts?

Instrument
Even with decades of research on argumentation, in science education and more broadly, the assessment of students' argumentation remains a challenge (Henderson et al. 2018). This is in part due to the diversity of theoretical approaches to argumentation, but also a recognition of the difficulty in capturing such a complex competency (Rapanta, Garcia-Mila, and Gilabert 2013). Osborne et al. (2016) sought to address the limited research on how to assess student ability in argumentation through the construction and validation of assessments for argumentation in scientific and general contexts. Both the learning progression that underpins the assessments and the structural approach to the tasks informed the assessment used in this research study. The instrument (Appendix 1) used to assess students' argumentation skills in this study comprised three tasks: 'Christmas for non-Christians' (CfNC; Religious Education), addressing arguments over whether religious festivals can be celebrated by non-adherents; 'What's growing?' (WG; Science), addressing the biological distinction between plants and fungi; and 'A zoo near you' (ZNY; Science and Religious Education), addressing a SSI of the ethics and value of zoos, previously addressed in SSI research but explicitly including an element from religious education in the shared religious notion of stewardship (Hitzhusen and Tucker 2013). The tasks all presented two characters with different views on each topic, so that students had to identify their individual lines of argumentation.
The tasks had a similar question structure, which is detailed in Table 1. Each task consisted of seven items (A-F), though the exact presentation differed slightly to avoid repetition, fatigue and learning from the test. The test was also administered in two different sequences to ascertain if any sequence effects would emerge. Each task contains two items related to identifying the claim (graded 0 or 1), two items related to identifying the evidence (graded 0 or 1), two items that ask students to explain the reasoning/warrant that links the evidence and claim (graded 0, 1, or 2), and one item that expects the student to construct an argument by asking them to decide between the two competing arguments provided (graded 0, 1, 2, or 3). Toulmin's elements of rebuttal and qualification were not assessed. In total, then, the highest possible score is 11 for each task, or 33 for the whole assessment, as set out in Table 2.

Illustrative examples from Table 1 for the two higher levels of the learning progression are as follows.

Constructing a link between claim and evidence (warrant). Science: 'A frog is an amphibian because it has soft, moist skin, lays eggs and has a backbone, and amphibians are egg-laying vertebrates which have soft, moist skin so they can absorb water through their skin.' RE: 'Drinking alcohol is wrong because the Qur'an prohibits wine and intoxication, which by analogy cover all forms of alcohol.'

Providing a two-sided comparative argument (the student makes an evaluative judgment about two competing arguments and makes an explicit argument for why one argument is stronger and why the other is weaker). Science: 'They claim that turtles are reptiles because they have scaly skin, but some fish also have scales, so they need another reason. But the argument about frogs explains why frogs are amphibians and not a different class, so it is a better argument.' RE: 'Christians say that abortion is wrong because it is murder, but it is only murder if the foetus is human, so they need another line of reasoning. But the Muslim argument about alcohol explains how a specific prohibition on wine and intoxication extends to all alcohol.'
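The mark scheme above implies the stated maxima. As a minimal arithmetic check, the per-task and whole-assessment maxima can be reproduced as follows (illustrative Python; the variable names are our own, not from the study):

```python
# Mark scheme per task: two claim items (0-1 each), two evidence items (0-1 each),
# two warrant items (0-2 each), and one evaluative-argument item (0-3).
# Each entry maps an item type to (number of items, top mark per item).
ITEM_SCHEME = {
    "claim": (2, 1),
    "evidence": (2, 1),
    "warrant": (2, 2),
    "evaluative": (1, 3),
}

task_max = sum(n_items * top for n_items, top in ITEM_SCHEME.values())  # 2 + 2 + 4 + 3
assessment_max = 3 * task_max  # three tasks in the full assessment
```

This confirms a maximum of 11 marks per task and 33 for the whole assessment.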
The project was designed with Key Stage 3 (11-14 yrs) students in mind, and teachers were recruited to the project on the basis that they taught this age group. The assessment was constructed such that the subject-matter content of the test would be largely inconsequential to students' ability to engage with the tasks across the three year groups of Key Stage 3 (Years 7-9). While the content may be related to the subject area, students would not require deep prior knowledge, as the necessary content was provided; the topics were also selected such that students would be broadly familiar with the ideas presented.
When tests were administered, guidance was provided explaining that students could ask questions about the meaning of any of the words on the assessment. The purpose was not to test content knowledge or reading ability, but the ability to identify components of arguments and construct an argument based on the information provided.
Piloting was carried out to get feedback on key elements of the design and construction. Six tasks (including initial versions of the three ultimately used) were provided to 3 teachers and 66 students to trial. Teachers provided their professional perspectives on the tasks, including on content area, language used, question structure, length, and even minor details such as the names and images used. Teachers were also able to report specific difficulties their students experienced when attempting the tasks. Students helped to inform the expected responses and the construction of the grading rubrics for the tasks. Through these processes, the instruments were refined to balance each of the considerations, and reduced to the three most suitable tasks.
There are some points to be noted about the design of these instruments and the implications for the interpretation of the findings. First, the tasks were intentionally designed using similar structures, informed by Toulminian argumentation as well as validated task structures and a learning progression for assessing argumentation (Osborne et al. 2016). This, along with the piloting for feedback on difficulty, should enhance the comparability of the assessments. However, given the different subject content and topics, it cannot be guaranteed that the difficulty level is identical, so comparisons need to be cautiously interpreted and conclusions tentatively drawn. Furthermore, we note that while these tasks were designed as being rather 'typical' of the subject context they are representing, the nature of argumentation for any discipline or subject is more complex than can be represented by one task. Students' performance on a single task for that context will not necessarily represent their performance for all argumentation in that context but only for a 'typical' example. Therefore, claims about students' argumentation in each of these disciplines or contexts also need to be considered cautiously.

Participants
The participants in this study were the students in the classes of the teachers who were involved in a professional development programme for the teaching and learning of argumentation in science and religious education in England, the Oxford Argumentation in Religion and Science (OARS) project (Erduran 2020). As part of this professional development, teachers had chosen to work with these particular classes to trial some new teaching approaches. This assessment was given before any new teaching approaches beyond their normal practice were trialled. Four hundred and fifty-seven students from ten schools completed the assessment. The vast majority of these students were in Year 9 (n = 404). The remaining were in Year 7 (n = 28) and Year 8 (n = 25). Just over two-thirds of the respondents were male (67%). The mean age was 13 yrs (SD 0.78 yrs).
For each item, across all three tasks, there was approximately 6% missing values (M = 5.96%, SD = 0.91%). However, given the comparative analysis involved across question types and domains (RE, Science, Interdisciplinary), missing values anywhere within the assessment will be impactful. Therefore, missing values needed to be removed and were dealt with in two ways: (1) If a response box was left empty but other boxes surrounding it on that same task were complete, the empty box was graded as 0.
(2) If a response box was left empty and the rest of the task was empty, or a large portion of that task was unanswered, then this assessment was removed for the purposes of analysis.
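The two rules above can be sketched in code (a hypothetical pandas implementation; the column names and the threshold for "a large portion unanswered" are our own illustrative assumptions, not values reported by the study):

```python
import pandas as pd

# Hypothetical item columns for one task (names are illustrative only).
TASK_ITEMS = ["q_a", "q_b", "q_c", "q_d", "q_e", "q_f", "q_g"]

def clean_task(df: pd.DataFrame, items: list, max_missing: int = 3) -> pd.DataFrame:
    """Apply the two data-cleaning rules to one task's item columns.

    Rule 2: remove a student whose task is empty or largely unanswered
            (here: more than `max_missing` blank items, an assumed cut-off).
    Rule 1: otherwise, grade any remaining blank item as 0.
    """
    n_missing = df[items].isna().sum(axis=1)
    kept = df.loc[n_missing <= max_missing].copy()  # rule 2: drop largely empty tasks
    kept[items] = kept[items].fillna(0)             # rule 1: blank box graded as 0
    return kept

# Three hypothetical students: one blank item, two blank items, fully blank task.
scores = pd.DataFrame({
    "q_a": [1, None, None],
    "q_b": [1, 1, None],
    "q_c": [None, 2, None],
    "q_d": [0, 1, None],
    "q_e": [1, 0, None],
    "q_f": [2, None, None],
    "q_g": [3, 1, None],
})
cleaned = clean_task(scores, TASK_ITEMS)
# The first two students are kept with blanks graded 0; the third is removed.
```

In practice the removal rule was applied at the level of whole assessments rather than single tasks, so a production version would aggregate this check across all three tasks before dropping a student.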
Following this data cleaning process, 394 student assessments remained for analysis ( Table 3). The students in this data set were still majority Year 9 (90.1%), male (69%), and had a mean age of 13 (SD = 1.1).

Analysis
The grading of the assessment was conducted by the three authors. We sought to ensure robust reliability between the grading of multiple raters. Krippendorff's alpha (Hayes and Krippendorff 2007) was used to estimate the interrater reliability between the three raters.
Rubrics were initially created on the basis of expected responses, the literature, and feedback from teachers and expert academics. We refined these rubrics through an iterative process of establishing intercoder reliability. Initially, our intercoder reliability across the whole assessment was low (α = 0.6870). Following a discussion of disagreements and refinement of the rubrics in use, we evaluated another sample of student assessments and interrater reliability improved (α = 0.7258). However, to scrutinise and improve the reliability further, we examined each of the different question types. We had complete agreement for 1-mark questions (α = 1.0000) and acceptable agreement for 2-mark questions (α = 0.7632). However, 3-mark questions were more problematic (α = 0.4851). We discussed the disagreements for these questions in depth and revised the rubrics accordingly. Taking a new sample of questions, we achieved an acceptable reliability (α = 0.8674). With this established, each rater continued to grade an allocation of between 100 and 250 student assessments.
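For readers unfamiliar with the statistic, Krippendorff's alpha for nominal data can be computed from a coincidence matrix, as in the from-scratch sketch below. This is illustrative only; the study itself may have used dedicated software, and the sample data here are hypothetical.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal ratings.

    `units` is a list of rating lists, one list per unit (e.g. one student
    response graded by several raters); None marks a missing rating. Units
    with fewer than two ratings are skipped, as alpha needs pairable values.
    """
    coincidence = Counter()
    for ratings in units:
        values = [v for v in ratings if v is not None]
        m = len(values)
        if m < 2:
            continue
        # Each ordered pair of values within a unit contributes 1/(m - 1).
        for a, b in permutations(values, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)

    n_c = Counter()  # marginal totals per category
    for (a, _b), w in coincidence.items():
        n_c[a] += w
    n = sum(n_c.values())

    d_o = sum(w for (a, b), w in coincidence.items() if a != b)  # observed disagreement
    d_e = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b) / (n - 1)
    if d_e == 0:  # every rating identical: no expected disagreement
        return 1.0
    return 1.0 - d_o / d_e

# Two raters, four units: agreement on three, disagreement on one.
alpha = krippendorff_alpha_nominal([[1, 1], [1, 1], [2, 2], [1, 2]])
```

For the toy data above, alpha works out to 16/30 ≈ 0.533, illustrating how even a single disagreement in a small sample depresses the statistic.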
Means, standard deviations, and patterns of distribution were considered for each of the task's total scores and sub-components which were assessing different levels of argumentation. The performance on each of these levels of argumentation is also considered across the three tasks.

Results
The results will be presented from three perspectives. First, in terms of the total scores achieved for each task: 'Christmas for Non-Christians' (CfNC) in the Religious Education (RE) context, 'What's Growing?' (WG) in the science context, and 'A Zoo Near You' (ZNY) in the science-RE cross-curricular context. Second, each individual task's scores will be presented with respect to the levels of argumentation assessed (Claim, Evidence, Warrant, and Constructing an Evaluative Argument). Third, we consider the total performances for each level of argumentation. The means and standard deviations for each of these are presented in Table 4.

Total task scores
In the case of the CfNC task, the distribution appears skewed somewhat more towards the lower scores (M = 5.28, SD = 2.28) than the WG (M = 6.76, SD = 2.28) and the ZNY task (M = 6.19, SD = 2.02). Each context has a similar spread of scores with standard deviations ranging from 2.02 to 2.28. Figure 1 shows these distributions. Next, we consider how the cohort performs in each level of argumentation. Figures 2-5 show the performance for Identifying Claim, Identifying Evidence, Constructing a link between Claim and Evidence, and Constructing a Two-sided Comparative Argument respectively in each of the three contexts. These findings are then unpacked for each context alone in the text below.

RE: Christmas for non-Christians
When identifying claims in the CfNC task, 76% of students were able to identify the claims being asserted by both characters in the argumentation task, 22% could identify at least one claim, and only 2% were unable to identify the claims correctly (M = 1.74, SD = 0.49). Identifying the evidence in the arguments was a similarly graded task, allowing for some comparison (M = 1.22, SD = 0.77). In this case, more students were unsuccessful in identifying the evidence for either argument (21%) and fewer students were successful at identifying the evidence that both characters used to support their claims (43%). The task components which focused on the warrant and constructing an evaluative argument were graded differently, but both distributions skew more towards the lower end of the available marks. Looking across Figures 2-5, the performance for the CfNC task continues to shift more to the left.

SCIRE: a zoo near you
In the ZNY task, the pattern of performance across the various elements of the task followed a similar trajectory, with those obtaining the higher mark decreasing in subsequent components; 96% identified both claims (M = 1.95, SD = 0.27), 66% identified evidence for both arguments (M = 1.61, SD = 0.27), and scores continued towards the lower end of the available marks for warrant and constructing an evaluative argument, where the majority achieved less than half the available marks. It was also the case that more students achieved zero marks for the warrant element (24%) than achieved zero for the constructing an evaluative argument element (19%).

SCI: what's growing
When identifying claims in the WG task, 84% of students were able to identify the claims being asserted by both characters in the argumentation task, 9% identified one of two claims, while 7% identified neither claim (M = 1.77, SD = 0.57). There was again a drop in performance when progressing to the next task element, where 66% of the students could identify the evidence for both arguments in this task (M = 1.58, SD = 0.66). For the two later elements, the students' performance appeared to peak in the middle with a slight skew towards the higher end for constructing an evaluative argument, where 67% achieved 2 or 3 marks. Again, more students achieved zero in the warrant element (19%) than in the constructing an evaluative argument element (14%).

Total scores for each level of argumentation
The vast majority of students were generally successful for most items of identifying claims (M = 5.47, SD = 0.89). The distribution of scores for identifying the evidence of the argument was more spread out, with fewer students achieving six marks (M = 4.40, SD = 1.26). This is highlighted in Figure 3 (fewer than 67% identified the evidence in both items in any of the contexts). The distribution of total scores for the warrant element was quite spread out and tending towards the lower end. While the maximum score was 12, the mean score was 4.53 (SD = 2.20). Figure 4 shows how score distributions differ between the domains, with higher proportions of students scoring in the middle for WG, higher proportions scoring lower in CfNC, and proportions more evenly spread for ZNY. There was a wider spread of scores for constructing evaluative argument items, with a tendency towards the lower values. The mean score for these items overall was 3.84 (SD = 1.96). Figure 5 appears to show that higher scores were achieved by higher proportions of students in the domain of science. When seen in the context of the other tasks in Figures 1-5, it can be observed across the figures that scores shift more to the left for the CfNC task. With the exception of identifying evidence, there were higher performances for the WG than the ZNY task in each level of argumentation.

Discussion
The purpose of this study was to examine how student argumentation ability varies between science, RE and science-RE interdisciplinary contexts. It was observed that overall performance was generally higher in the context of science than in the cross-curricular context, and also higher in the cross-curricular context than in the RE context alone. This appears to confirm and add to the findings of Basel et al. (2014), who state that students found it easier to generate arguments from a scientific perspective in the context of a science-religion debate. This might perhaps be explained by the distinctions between arguments as rationalistic, emotive, or intuitive (Sadler and Zeidler 2005). That is to say, the arguments in the science context were perhaps more rationalistic, invoking fewer emotive or intuitive considerations, and therefore more straightforward to grapple with. Conversely, Osborne et al. (2016) reported that students appear to find argumentation more difficult in science than in more general contexts, because of the need for specific content knowledge. However, there are two reasons to expect different patterns within the present study. Firstly, while the interdisciplinary context may appear similar to 'general' argumentation contexts, it is differentiated by the presence of information from both scientific and RE perspectives. Secondly, the study attempted to negate the influence of content knowledge by providing the necessary information and using concepts that students would have been exposed to previously.
When examining the progression of performance within each context for subcomponents of argumentation, it can be seen that performance drops in subsequent questions. A range of 76-96% of students identified both claims in each context, but success in identifying both items of evidence dropped to 46-67%. The identification of the warrant and the construction of the evaluative argument were scored differently, but the performance generally skewed towards the lower grades. This pattern of performance is in line with expectations of the theoretical model, where subsequent tasks were for levels of argumentation considered to be more difficult (Osborne et al. 2016).
Overall performance at identifying claims was generally high across all three contexts. However, it was highest in the science-RE context, and the RE context showed observably lower success in identifying both claims.
In the case of identifying evidence, while all scores were lower than those for the identification of claims in their respective contexts, performance in the CfNC task was lower than in the other two contexts. Recent research has shown that students often find the identification of evidence to be particularly difficult in scientific argumentation (Rodríguez-Mora, Cebrián-Robles, and Blanco-López 2021), but the lower performances in CfNC may be attributed to difficulties surrounding the nature of evidence in this task, which may be typical for the RE subject context. In prior research on science and RE teachers' views about the nature of argumentation in these two subjects, differences in the nature and range of acceptable evidence were highlighted (Park 2020, 2021). As RE is itself a multidisciplinary subject, there are arguably no clear standards about what counts as evidence for an argument in the subject, or how different evidence is warranted (Chan, Fancourt, and Guilfoyle 2020). This possibility of operating with ill-defined argumentation standards in RE is also indicated in students' performances on warrants, where the RE performance was again lower than in the other two contexts. In terms of constructing evaluative arguments, too, performances were generally higher in the science context than in either of the other two contexts. These findings seem to lend further support to the interpretation that science could perhaps be considered more 'straightforward', with more clearly defined argumentation and reasoning on more rationalistic terms alone, while the other contexts are made more challenging by a lack of clarity about what counts as evidence or warrant and the greater likelihood of invoking emotive or intuitive reasoning.
It is perhaps also worth noting that the level of argumentation concerned with warranting elicited lower levels of success: across all three tasks, there were more zero-scores, and distributions skewed more towards lower marks, at the level of warrant than at the level of constructing an evaluative argument. This may highlight a particular challenge in engaging with this component of argumentation across all contexts, which may be worthy of further attention in research and teaching.
The concurrent assessment of argumentation in RE, science, and the interdisciplinary context is novel in the research literature. While this offers new approaches for research into interdisciplinary argumentation, there are further refinements which could be made to explore these findings in greater detail. For example, if pluralistic RE is discursive and dialogical, then the elements of argumentation that we did not assess (i.e. rebuttal and qualification) might be more developed than in the other two contexts; students might be developing all these skills concurrently in RE, whereas in science these features are not addressed until later in the curriculum.
The findings of this study make a number of important contributions to research and practice. While there has been interest in science-religion or science-RE debates in the past, these have rarely focused on argumentation, and where they have, it has been within settings with theological or confessional approaches to RE (e.g. Basel et al. 2014) rather than pluralistic RE as in England. Furthermore, while argumentation in science education has been extensively researched (Lin et al. 2018) and interest in interdisciplinarity is growing, much work remains to be done in relating it to other traditionally disparate school subjects such as RE (Erduran et al. 2019). Gaining a deeper understanding of the challenges and successes students experience when engaging in argumentation across these contexts is important because it allows us, in research and practice, to home in on the challenges and capitalise on the successes. For example, the stronger performances noted in the science context raise questions about the comparative challenges of identifying evidence and warrants within the RE or interdisciplinary contexts, and the extent to which standards about what counts as evidence or warrant are clear. But there are many indicators for optimism too. The differences between the contexts do not suggest a large deficit; the spread of overall performance showed a realistic distribution in which a sizeable proportion of students were capable of engaging in argumentation successfully, and many did so across multiple contexts. Evidently, it is not beyond the capabilities of lower secondary school students to engage in argumentation tasks within and between their school subjects.
Given the benefits of argumentation for developing subject literacies, as well as its importance in growing individuals' capacities to engage critically as active citizens (Monte-Sano 2016), and that most real-life issues require the integration of information from multiple sources (Crujeiras-Pérez and Jimenez-Aleixandre 2019), there is a pressing need to better understand how argumentation can be integrated across the school curriculum. This study advances our understanding of the challenges and opportunities in advancing the agenda for interdisciplinary argumentation.