Group Argumentation Development through Philosophical Dialogues for Persons with Acquired Brain Injuries

ABSTRACT The high prevalence of brain injury incidents in adolescence and adulthood demands effective models for re-learning lost cognitive abilities. Impairment in brain injury survivors’ higher-level cognitive functions is common and a negative predictor of long-term outcome. We conducted two small-scale interventions (N = 12; 33.33% female) with persons with acquired brain injuries in two municipalities in Sweden. Age ranged from 17 to 65 years (M = 51.17, SD = 14.53). The interventions were dialogic, inquiry-based, and inspired by the Philosophy for Children Programme, a participatory thinking skills approach with documented higher-order cognitive outcomes, such as developed argumentation skills, in other target groups. Philosophical dialogues were conducted once a week in the two groups, totalling 12 dialogues per group. Group argumentation development was measured by comparing scores from structured observations of filmed dialogues from early and late in the intervention. Large positive changes in mean scores from early to late in the intervention, together with consistently high facilitator quality, suggest that argumentation in the sample developed as a result of the intervention.


Introduction
Acquired brain injuries (ABIs) are brain injuries sustained after birth. They comprise traumatic brain injuries (TBIs), caused by external forces such as motor vehicle accidents or falls, and non-traumatic ones, caused by internal forces such as strokes or infections. TBIs have received considerable attention in the recent research literature. This is unsurprising, as TBI is estimated to be the third leading cause of the global disease burden in 2020 (Colantonio et al., 2016) and causes disabilities in all age groups in all countries (WHO, 2004). Worldwide, more than 10 million people acquire a TBI each year (Colantonio et al., 2016). Given an annual incidence of 235 TBIs per 100 000 people (see Jacobsson, 2010), there would be approximately 1.7 million TBI incidents annually in the European countries. Adding non-traumatic injuries would raise these numbers substantially. In fact, ABIs are currently the leading cause of disability among young adults (Walsh, Fortune, Gallagher, & Muldoon, 2014).
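The European figure comes from scaling a per-capita incidence rate up to a population. As a quick check, the sketch below works backwards and computes the population that the estimate of 1.7 million incidents implies. Both inputs are taken from the text. The resulting population of roughly 723 million is our own derived number and is not stated in the cited sources.

```python
# Back-calculate the population implied by the European TBI estimate.
# Both inputs are taken from the text; the resulting population is derived here,
# not reported by the cited sources.
RATE_PER_100K = 235        # annual TBI incidents per 100 000 people (Jacobsson, 2010)
ANNUAL_INCIDENTS = 1.7e6   # estimated annual TBI incidents in European countries

rate_per_person = RATE_PER_100K / 100_000
implied_population = ANNUAL_INCIDENTS / rate_per_person

print(f"Implied population: {implied_population / 1e6:.0f} million")
```

Running the same arithmetic in the forward direction (population × 235 / 100 000) for any population estimate gives the corresponding annual incidence.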
CONTACT Ylva Backman ylva.backman@ltu.se This article has been republished with minor changes. These changes do not impact the academic content of the article.
Because survivors can live for many years after injury, the long-term societal costs of, for instance, public assistance, lost wages, and lost income taxes are large (Sabatello, 2014). In the US, for example, costs for TBIs total $48.3-$76.5 billion annually (Fabiano & Sharrard, 2017). In Sweden, 70% of persons with an ABI who responded to an ABI survey receive economic compensation from the Swedish Social Insurance Agency (Soukka, 2012). Swedish costs for mild TBIs alone are approximately 4.5 billion SEK annually (Stålnacke, Styrke, Sojka, & Björnstig, 2005), i.e. approximately €450 million.
A broad array of negative effects, varying with injury severity and demographics, is consistently reported in the cognitive, social, and emotional domains (Cancelliere et al., 2014; Fabiano & Sharrad, 2017; Sabatello, 2014). These effects lead to second-order problems such as higher unemployment rates (Sabatello, 2014), employment loss (Colantonio et al., 2016), frequent job changes and lower levels of workplace responsibility (Fabiano & Sharrad, 2017), as well as reduced workplace productivity and activity (Fabiano & Sharrad, 2017; Soeker, 2016). About 70% of persons with moderate TBIs do not return to work (Soeker, 2016). Besides the economic and social costs to society, the personal costs for individuals with ABIs and their families are significant (Durham, 2012).
In a Swedish report, the expression 'black hole' is used to denote the situation in which persons with ABIs are left uncertain about where to turn to address their remaining impairments once they have finished their initial rehabilitation (Swedish Association of Brain Injured and their Families [SABIF], 2012, p. 15). Long-term impairments that extend beyond initial rehabilitation consist of lower cognitive, social, or emotional functioning (Colantonio et al., 2016; Jacobsson, 2010; Mills & Kreutzer, 2015; Sabatello, 2014; WHO, 2004). Cognitive ABI effects such as impaired abstract reasoning, mental flexibility, and problem-solving skills are examples of executive functioning deficits (Dodson, 2010; Fabiano & Sharrad, 2017; Whiting, Deane, Simpson, McLeod, & Ciarrochi, 2017). Such deficits are 'a potent negative predictor for long-term outcome' (Løvstad et al., 2012, p. 1586) and pose significant challenges for rehabilitation efforts (Constantinidou, Wertheimer, Tsanadis, Evans, & Paul, 2012). Important long-term needs of these individuals are not fully met by current rehabilitation services (Mitsch, Curtin, & Badge, 2014; Sabatello, 2014; SABIF, 2012). The majority of research in brain injury rehabilitation concerns medical-biological interventions (Soeker, 2016; Strandberg, 2006), and alternative models are called for to meet the long-term needs of the group (Mitsch et al., 2014; Strandberg, 2006).
In recent decades, the number of studies in the educational and closely related sciences on the effectiveness of different pedagogical methods for cognitive development has increased. There is evidence that engaging in dialogic, inquiry-based, and argumentation-centred pedagogies is conducive to improved reasoning skills and cognition (see, e.g. Asterhan & Schwarz, 2016; Kuhn, Zillmer, Crowell, & Zavala, 2013; Murphy, Wilkinson, Soter, Hennessey, & Alexander, 2009; Topping & Trickey, 2007a). One such approach is Philosophy for Children (P4C), a participatory community of inquiry approach to critical thinking (Lipman, Sharp, & Oscanyan, 1980). P4C has a record of previous research and emerging evidence for positive effects on cognition (García Moriyón, Rebollo, & Colom, 2005; Murphy et al., 2009; Trickey & Topping, 2004; Yan, Walters, Wang, & Wang, 2018). Some previous studies indicate that P4C, or, more broadly, philosophical dialogues, are effective with socially disadvantaged groups (Trickey & Topping, 2004). However, research on the potential of philosophical dialogues to promote cognitive skills in persons with ABIs is nearly non-existent globally. The only previous research known to the present authors was a small-scale pilot study with promising results (Gardelli, 2012; Gardelli, Backman, Gardelli, Gardelli, & Strömberg, 2013). In the exploratory research presented here, we have studied group argumentation development in persons with ABIs through an intervention based on philosophical dialogues.

Argumentation, Reasoning, and Cognition
According to Asterhan and Schwarz (2016), '[i]nterest in argumentation is as old as Western culture' (p. 164). This is not surprising. Fostering argumentation is a viable means of promoting participation in a democratic society (see, e.g. Asterhan & Schwarz, 2016), argumentation skills are central to communicative ability (cf. Felton & Kuhn, 2011; Kuhn & Udell, 2003), and arguing is central to the development of cognitive ability (Kuhn & Crowell, 2011).
In this article, we use the term 'reasoning' to denote a proper subclass of cognition that includes, for example, the evaluation of information and evidence, logical thinking, and the creation of arguments. Reasoning is hence taken to be internal. In contrast, we use the term 'argumentation' to denote the communication of arguments (reasons, evidence) in order to jointly investigate or influence someone else's thinking or beliefs. Argumentation, then, could be seen as a form of external reasoning, or the voicing of reasoning. The distinction we adopt between argumentation and reasoning is similar to Mercier's (2016). He suggests that reasoning is a 'specific type of inferential mechanism that allows us to find and evaluate reasons', while he takes argumentation to be the 'public exchange of arguments meant to convince' (Mercier, 2016, p. 690).
Based on ideas proposed by theorists such as Vygotsky, Dewey, Bakhtin, and Mead, much recent research on argumentation and reasoning rests on the premise that individual cognition is shaped through social interactions and that 'verbal dialogue plays a special role in this process' (Asterhan & Schwarz, 2016, p. 165). Indeed, according to Kuhn et al. (2013), sustained engagement in argumentation creates a social climate supporting development of individual argumentative competence (cf. Anderson et al., 2001). Asterhan and Schwarz (2016) argue that engaging in argumentation activities is especially effective for learning complex topics requiring deep cognitive engagement.
However, much research has found that argumentation skills are difficult to develop and that 'individuals of all ages [are] performing poorly in assessments of both production and evaluation of arguments' (Kuhn et al., 2013, p. 456). Similarly, Mercier and Sperber (2011, p. 58) note that for half a century, research has found that 'humans reason rather poorly, failing at simple logical tasks [and are] being subject to sundry irrational biases in decision making'. Other research has documented further weaknesses in argumentative abilities (Trouche, Johansson, Hall, & Mercier, 2016). Indeed, efforts to promote gains in argumentative skills have often required long periods of engagement to bear fruit, sometimes several years of practice (Kuhn et al., 2013).
On the other hand, even children display, and can learn, quite complex and advanced reasoning skills (Reznitskaya et al., 2009). This in turn invites us to ask whether we as a society, through education and elsewhere, do enough to promote the development of argumentative skills. Doing so is important, since argumentation and reasoning are intimately connected (Asterhan & Schwarz, 2016; Fischer et al., 2014; Mercier & Sperber, 2011; Papathomas & Kuhn, 2017). Engagement in argumentation is a means of furthering reasoning abilities (cf. Reznitskaya et al., 2001), and some (e.g. Mercier, 2016) even claim that 'the main function of reasoning is to exchange arguments with others' (p. 689). Developing reasoning abilities, in turn, is a central part of cognitive development, since reasoning is a core cognitive function (Kievit et al., 2017; Kuhn, 1992; Mercier, 2016).

Dialogic Education
It is common to use sociocultural and sociocognitive theory to derive the theoretical rationales for the importance of discussion-based pedagogy (see, e.g. Alexander, 2018; Murphy et al., 2009; Reznitskaya, 2005; Reznitskaya & Glina, 2013). Furthermore, according to Alexander (2018), a broad array of evidential bases, such as psycholinguistics, sociolinguistics, philosophy, and pedagogy, are used to argue for dialogic teaching. For instance, as Alexander (2018) argues, research in psychology and neuroscience demonstrates the effect of spoken language on cognitive development. In addition, a number of studies have found that certain dialogic pedagogies are conducive to the development of reasoning skills (for a meta-analysis, see Murphy et al., 2009; for an example, see Topping & Trickey, 2007a).
Dialogic teaching currently has many proponents in, for instance, the educational sciences, and the concept is richly elaborated theoretically (Sedova, 2017). Even so, there is to date no single, agreed definition of 'dialogic teaching' (Alexander, 2018). Nevertheless, some central characteristics are found across various dialogic approaches (Alexander, 2018; Reznitskaya et al., 2012). Typical student activities in dialogic teaching include responding to each other's contributions and supporting or criticising each other's ideas. The discussions are characterised by the students' active engagement, high levels of autonomy, and influence upon the development of the discussion (Sedova, 2017).
According to Sedova (2017, p. 279), '[a] dialogic teaching framework includes various conceptual tools, which in general can be distinguished as indicators, principles and methods of dialogic teaching'. Examples of indicators are students expressing their thoughts with reasoning and support from arguments, and authentic questions that demand high cognitive responses to answer. Examples of principles to follow in dialogic teaching are reciprocity between the teacher and the students in listening and sharing thoughts, and supportiveness, in the sense that students should be free to express ideas without fear of, for instance, being ridiculed. Examples of dialogic teaching methods are P4C and Collaborative Reasoning (Sedova, 2017). Collaborative Reasoning and P4C are also two of the nine discussion approaches scrutinised in a meta-analysis of the effects of classroom discussion on critical thinking and reasoning outcomes, as well as on student talk and reading comprehension (Murphy et al., 2009). The authors attempted to be exhaustive in their selection, but one selection criterion was that the approach be 'substantiated by a record of published, peer-reviewed research' (Murphy et al., 2009, p. 760). Of the nine approaches, three (P4C, Collaborative Reasoning, and Paideia Seminar) were called 'critical-analytic', since 'each of these approaches has an enacted goal of querying and interrogating the underlying arguments and evidence' (Murphy et al., 2009, p. 742). The evidence reviewed supported the effectiveness of P4C and Collaborative Reasoning for development in critical thinking, reasoning, argumentation, and amount of student talk.
In this study, we conducted an intervention for persons with ABIs inspired by P4C. The P4C programme was originally designed for school years K-12 (Lipman et al., 1980). The method has since been adapted for other contexts (UNESCO, 2007) while remaining a distinct didactical method with certain typical facilitation procedures (Trickey & Topping, 2004). Today, it is an 'established educational model that places dialogue at the center of its pedagogy' (Reznitskaya & Glina, 2013, p. 50). In short, P4C is based on a cooperative learning approach to critical thinking, more specifically a community of inquiry approach (Lipman, 2003; Lipman et al., 1980), in which a qualified facilitator promotes active and critical dialogue about contestable issues (Lipman et al., 1980). Trickey and Topping (2004) maintain that 'the process is dependent on the quality of interaction and dialogue engendered, rather than rigidly following a step-by-step procedure' (p. 370). Nonetheless, a 'routine classroom philosophical enquiry' (p. 369) is sometimes summarised in the following nine steps, with some explanatory additions and minor alterations by the present authors based on recent literature and newly developed facilitation tools: (1) Getting started (including agreeing upon rules of interaction); (2) Sharing a stimulus to prompt inquiry (such as a text or a film); (3) Pausing for thought; (4) Questioning (the participants think of interesting and contestable questions); (5) Making connections (making links between the questions); (6) Choosing a question to begin an inquiry; (7) Inquiring into the chosen question under the guidance of the facilitator; (8) Recording the discussion (e.g. by graphic mapping); and (9) Engaging in meta-dialogue (reviewing, summarising, reflecting on the process, etc.).
During step 7 above, which often overlaps with step 8 and is the main and lengthiest step, the facilitator uses productive talk moves. These help students pay attention to 'the quality of their reasoning, the inclusiveness of their group interactions, and the progress of their inquiry', moving from 'contestable questions to reasoned judgments' (Reznitskaya & Glina, 2013, p. 51), in an emotionally supportive climate (Trickey & Topping, 2004). The contestable questions inquired into reflect the participants' interests and have no simple answer known to them (Reznitskaya & Glina, 2013). Throughout the inquiry, the facilitator uses open-ended questioning, such as 'If someone disagreed with you, what would they say to argue against you?', 'How are you using the word . . . ?' and 'How does this relate to what she said?' (Reznitskaya & Wilkinson, 2017). The facilitator also avoids trying to impose substantive values on the participants (Gardelli, Alerby, & Persson, 2014; Gardelli, 2016; Lipman et al., 1980; Nilsson, Gardelli, Backman, & Gardelli, 2015).
Meta-analyses reviewing decades of experimental P4C research report evidence that P4C is effective for developing cognitive skills in children (García Moriyón et al., 2005; Trickey & Topping, 2004; Yan et al., 2018). Development of non-verbal and verbal cognitive skills has been found (Topping & Trickey, 2007a), including logical and critical thinking skills (García Moriyón et al., 2005; Topping & Trickey, 2007a, 2007b), reasoning skills, and argumentation abilities (Topping & Trickey, 2007b). Although the evidence for P4C is emerging, some researchers have expressed concerns about a lack of both well-articulated theoretical foundations (Reznitskaya, 2005) and methodological rigour (Trickey & Topping, 2004). However, according to Trickey and Topping (2004), its 'quality and quantity of evidence nevertheless bears favourable comparison with that on many other methods in education' (p. 374).
To summarise, ABIs are highly prevalent globally and have a broad array of negative long-term cognitive effects. These carry high personal and societal costs and pose challenges for rehabilitation efforts. Numerous studies in the educational and related sciences conclude that dialogic, inquiry-based, and argumentation-centred pedagogies, such as P4C, are conducive to improved reasoning and argumentation skills, which are central to cognitive development. It is therefore relevant to ask to what extent persons with ABIs who participate in philosophical dialogues develop their argumentation skills.

Study Context and Overview
This exploratory study was part of a research project funded by the Swedish Research Council with the purpose of studying the possible effectiveness of philosophical dialogues as an educational method for persons with ABIs to regain lost, and develop new, abilities important for participating in society. The project aimed to study development of communicative abilities and critical thinking skills and dispositions through participation in philosophical dialogues. Two small-scale interventions were carried out during spring 2015 in collaboration with two ABI organisations in Sweden. Philosophical dialogues were conducted once a week (with the exception of holidays, etc.) in two groups, totalling 24 dialogues (12 per group). All 24 dialogues were recorded with two film cameras from different angles. In this particular study, we used the structured observational scale the Argumentation Rating Tool (ART) (Reznitskaya & Wilkinson, 2017) to measure group argumentation development in the two groups from early to late in the interventions through analysis of recorded dialogues. The details of the data processing procedure are provided below under 'Data processing'.
Before the interventions, the participants in both groups completed the Montreal Cognitive Assessment (MoCA). The MoCA is a widely used and quickly administered tool (approximately 10 min per respondent) that is used to 'detect and quantify cognitive impairment' (Bernstein, Lacritz, Barlow, Weiner, & DeFina, 2011, p. 119). It measures abstraction, attention, executive functioning, orientation, language, naming, and delayed recall through 30 items (Bernstein et al., 2011). The MoCA data (see 'Participants' below) were used only to describe the study participants, not as a pre-test.
An application for ethical vetting in accordance with the Swedish 'Act concerning the Ethical Review of Research Involving Humans' was sent to the Regional Ethical Review Board in Umeå, Sweden, which approved the project before the start of the interventions. Permission was also obtained from the principal/manager and staff of the participating ABI organisations. The principle of informed consent was applied to all research participants. They were informed that they were free to withdraw at any time without giving a reason and that they were guaranteed confidentiality.

Participants
Persons with ABIs from two different municipalities in Sweden were invited to participate in the intervention. In Group A, eight persons (50% female) with an ABI who were enrolled in an educational programme for persons with ABIs participated in the study. One person declined to participate in the study but still took part in the dialogues, and one person with another medical condition (but no ABI) also took part in the intervention. In Group B, four persons (0% female) with ABIs who were active in a daytime activity centre participated in the study. The total number of study participants was 12 (33.33% female).
All participants in the study were medically rehabilitated, in the sense that no further intensive medical treatment was needed. Participants from the educational programme took courses in basic language, mathematics, brain knowledge, arts, and 'activities for daily life', among other things. The daytime activity centre offered social interaction and individually designed activities with the support of trained staff, such as crafts, various leisure activities, kitchen activities (baking, cooking, etc.), gardening, and planned activities with social orientation.
The average participant with an ABI in Group A participated in 11 dialogues, as did the average participant in Group B. In addition to persons with ABIs and researchers from the research group, staff also participated. At the time of their first philosophical dialogue, the participants with ABIs in Group A were between 17 and 65 years old (M = 45.50, SD = 14.79, based on each participant's age in whole years), and those in Group B were between 60 and 65 years old (M = 62.50, SD = 1.80). In the two groups taken together, the participants with ABIs were between 17 and 65 years old (M = 51.17, SD = 14.53).
MoCA mean scores were 22.9/30 points for Group A and 15.3/30 points for Group B. In Group B, two participants had aphasia. The mean score for the total sample of persons with ABIs was 20.3/30 points (≈ 68%). A total score of 26 or more indicates normal functioning. The lowest mean scores were obtained on the items Language 2 (0.25/1 point) and Delayed recall (1.58/5 points), followed by Attention 3 (1.75/3 points).
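The combined age statistics can be recovered from the group summaries. The sketch below does this by pooling the group means and standard deviations. It assumes that the reported SDs are population SDs (divisor n). With that assumption, the pooled SD comes out at 14.53, matching the reported value. With sample SDs (divisor n − 1), the pooled value would be about 14.50 instead. This reading is our inference from the numbers and is not stated in the text. The helper name `pooled_mean_sd` is hypothetical.

```python
import math

# Group summaries as reported in the text (ages in years at the first dialogue).
groups = {
    "A": {"n": 8, "age_mean": 45.50, "age_sd": 14.79, "female": 4},
    "B": {"n": 4, "age_mean": 62.50, "age_sd": 1.80, "female": 0},
}

def pooled_mean_sd(groups, mean_key, sd_key):
    """Hypothetical helper: combine group means and SDs into whole-sample values.

    Assumes each group SD is a population SD (divisor n), so each group's sum
    of squares about the grand mean is n * (sd**2 + (group_mean - grand_mean)**2).
    """
    n_total = sum(g["n"] for g in groups.values())
    grand_mean = sum(g["n"] * g[mean_key] for g in groups.values()) / n_total
    total_ss = sum(
        g["n"] * (g[sd_key] ** 2 + (g[mean_key] - grand_mean) ** 2)
        for g in groups.values()
    )
    return grand_mean, math.sqrt(total_ss / n_total)  # population SD of the whole sample

n_total = sum(g["n"] for g in groups.values())
age_mean, age_sd = pooled_mean_sd(groups, "age_mean", "age_sd")
female_share = sum(g["female"] for g in groups.values()) / n_total

print(f"Age: M = {age_mean:.2f}, SD = {age_sd:.2f}")  # Age: M = 51.17, SD = 14.53
print(f"Female: {female_share:.2%}")                  # Female: 33.33%
```

The pooled figures depend on two components: the spread within each group and the gap between the group means. The large gap between Group A and Group B (45.50 vs 62.50 years) explains most of the difference between the combined SD and Group B's small SD.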

Intervention Procedure
During the dialogues, approximately 20-25 min were spent on coming up with, refining, and choosing questions. The remaining active time, usually 45-50 min, was used for inquiry, i.e. for discussing the chosen question. Mean session length was 75 min in Group A, including a short break (mean length 6 min), and 105 min in Group B, including a long break (mean length 31 min). A typical dialogue in Group A included eight participants with an ABI, three members of staff, and two persons from the research group (one of whom acted as a facilitator). Here, "typical" means the median attendance, rounded to whole numbers, counting only participants with ABIs who took part in the study. A typical dialogue in Group B (median attendance, rounded to whole numbers) included four participants with an ABI, two members of staff, and two persons from the research group (one of whom acted as a facilitator).
At the beginning of each philosophical dialogue in this research project, the participants were told that they were going to take part in a philosophical dialogue consisting of four steps: 1) thinking about questions, 2) raising questions, 3) voting, and 4) conducting the dialogue. In step 1, the participants thought in silence (or sometimes together with a member of staff) about what questions they wanted to ask (cf. steps 3-4 in the section 'Dialogic education' above). In step 2, the participants raised their question(s), if they had any. The facilitator wrote the questions on a whiteboard or similar tool for everyone to see and asked for clarifications if needed (cf. step 5). Other participants could also ask such clarifying questions or in other ways assist in formulating the questions. In step 3, the participants voted for the question(s) they wanted to discuss (cf. step 6 above). The facilitator told everyone to close or cover their eyes and then read the questions aloud, and the participants raised their hands when the question(s) they wanted to vote for were read. The question that received the most votes was discussed in step 4, in accordance with common methodological P4C guidelines (Lipman et al., 1980). This step often involved graphic mapping of the inquiry by the facilitator on a whiteboard or similar (cf. steps 7-8).
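In the voting step (step 3), each participant may raise a hand for more than one question, and the question with the most hands is discussed. The sketch below tallies votes of this kind. The questions and ballots are hypothetical. The text does not say how ties were resolved, so the function returns all tied questions together rather than guessing a tie-break rule.

```python
from collections import Counter

def tally_votes(questions, ballots):
    """Count hand-raises per question and return the most-voted question(s).

    Each ballot is the set of questions one participant raised a hand for;
    participants may vote for more than one question. Ties are returned
    together, since the text does not describe a tie-break rule.
    """
    counts = Counter({q: 0 for q in questions})
    for ballot in ballots:
        for q in set(ballot):      # at most one hand-raise per question per participant
            if q in counts:
                counts[q] += 1
    top = max(counts.values())
    winners = [q for q in questions if counts[q] == top]
    return winners, counts

# Hypothetical questions and ballots, for illustration only.
questions = ["What is courage?", "Can we know anything for certain?", "Is it ever right to lie?"]
ballots = [
    {"What is courage?"},
    {"Is it ever right to lie?", "What is courage?"},
    {"Can we know anything for certain?"},
    {"Is it ever right to lie?"},
    {"Is it ever right to lie?"},
]
winners, counts = tally_votes(questions, ballots)
print(winners)  # ['Is it ever right to lie?']
```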
Two facilitators participated in the vast majority of the dialogues. Both had B.A.s in Philosophy and previous experience of conducting philosophical dialogues with persons with ABIs (and other persons) and of teaching philosophy at university level. They acted in accordance with a specific model in which the two facilitators have different roles: one leading facilitator and one participating facilitator. This model was developed in a related project on the methodological development and implementation of philosophical dialogues ('Young thoughts: philosophical dialogues in democratic forms', funded by the Swedish Inheritance Fund ['Allmänna arvsfonden'] 2010-2014), previously run by members of the research group. According to the model, the leading facilitator leads steps 1-3 described above and, in step 4, tries to facilitate good inquiry. This is done through talk moves such as asking the participants clarifying questions, summarising (or asking for a summary), probing for alternative perspectives, or asking for reasons. These moves are in line with several methodological descriptions provided by, for instance, Lipman et al. (1980) and Reznitskaya and Wilkinson (2017). The participating facilitator, on the other hand, acts like any other participant in the dialogue, discussing the selected question by presenting answers, ideas, arguments, counterexamples, etc. The two members of the research group who acted as facilitators during almost all of the dialogues took turns in the two roles. (During two of the dialogues, another member of the research group was a facilitator.)

Measures
As mentioned earlier, the Argumentation Rating Tool was used to measure argumentation development through analysis of recorded dialogues from early and late in the interventions. This detailed observational instrument contains four key standards of quality argumentation. These standards were identified during the construction of the instrument through reviews of previous scholarship on reasoning, argumentation, logic, and critical thinking (Reznitskaya, Wilkinson, Oyler, Bourdage-Reninger, & Sykes, 2016). For each standard, the developers linked talk moves intended to uphold that standard (Reznitskaya et al., 2016) (see Table 1).
The 11 'items' (as we will call them) were based on the Dialogic Inquiry Tool, a previously developed instrument. Its construction was informed by a comprehensive review of over a hundred articles about indicators of productive classroom talk, established pedagogical dialogue models promoting argumentation, existing observational instruments targeting classroom interactions, and repeated use and revision through an empirical research programme (Reznitskaya et al., 2016). The rating scale runs from 1 to 6 for each of the 11 items and rates the group of participants and the facilitator, respectively, on aspects of community reasoning. Validation studies indicate high inter-rater reliability and internal consistency for composite scores, and that the ART is sensitive to experimental manipulation (Reznitskaya et al., 2016).

Data Processing
After permission was granted by Montclair State University, members of the research group translated the ART into Swedish, and the translated tool was used throughout the study.
The ART was developed by researchers in dialogic education and P4C methodology (Reznitskaya & Wilkinson, 2017). It was initially designed for evaluating the quality of collaborative reasoning in group discussions about texts in elementary school (Reznitskaya et al., 2016). In this study, the sample differed from the intended population in terms of both age and mental condition, and no texts were read in connection with the dialogues. To reach a shared view of how to interpret and apply the ART criteria in this particular context, the raters calibrated their ratings in early May 2016. They did so by rating parts of other dialogues from the intervention and discussing their ratings.
After the calibration process, three blind raters applied the ART (all 11 items) to an early and a late sample from the total of 24 filmed dialogues. For Group A, these were the first two dialogues and the last one (numbers 1, 2, and 12). For Group B, they were the first two and the last two (numbers 1, 2, 11, and 12). The initial plan was to use data from dialogues 1, 2, 11, and 12 for both groups. However, dialogue 11 in Group A did not fulfil the sampling requirements: only one staff member participated, and that person had never before taken part in the intervention. Each of the three raters individually rated the seven sampled dialogues, using the 45-50-min inquiry parts (i.e. the discussion that followed coming up with, refining, and choosing the question). The facilitators were also rated on all facilitator items in the ART in order to determine whether facilitation quality remained consistent over the course of the intervention; these data bear on internal validity. All three raters had experience in conducting philosophical dialogues and at least a B.A. in Philosophy. One had a PhD in Education, another received a PhD in Education about a month after the ratings, and the third had a teacher education degree.
During the rating process, the three raters individually noted and categorised every relevant interaction in the sampled dialogues. They then individually scored each item for each dialogue. The scores were passed to a fourth researcher, who calculated mean scores for all rated dialogues. Mean scores were then calculated for dialogues 1-2 in each of Groups A and B, for dialogue 12 in Group A, and for dialogues 11-12 in Group B. These calculations provided data on differences between early and late dialogues in each of the two groups (see Tables 5 and 6). Combined mean scores for early and late dialogues across both groups were also calculated (see Table 3). To assess inter-rater reliability, we calculated the percentage of agreement within 1 point between pairs of raters. For example, suppose that on participant item 1 in dialogue 1 in Group A, one rater gave the score '3', another '4', and the third '5'. The agreement within 1 point on that item would then be 67%: raters 1 and 2 scored within 1 point of each other, as did raters 2 and 3, while raters 1 and 3 did not (their ratings differed by 2 points). Table 2 shows the average percentages of agreement between each pair of raters for the participant and facilitator items, respectively, as well as the means of these numbers.
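The agreement measure can be computed with a short sketch. `item_agreement` reproduces the worked example above: for scores 3, 4, and 5, two of the three rater pairs fall within 1 point, giving 67%. `pairwise_agreement` gives each pair of raters the share of rated items on which that pair fell within 1 point. This is our assumption about how the per-pair figures in Table 2 were aggregated, since the text does not spell it out. The function names and the small score matrix are hypothetical.

```python
from itertools import combinations

def item_agreement(item_scores, tolerance=1):
    """Share of rater pairs whose scores on a single item differ by <= tolerance."""
    pairs = list(combinations(item_scores, 2))
    return sum(abs(x - y) <= tolerance for x, y in pairs) / len(pairs)

def pairwise_agreement(scores_by_rater, tolerance=1):
    """For each pair of raters, the share of items on which they agree within tolerance.

    scores_by_rater: one equal-length list of scores per rater (one score per
    rated item). Raters are numbered from 1 in the returned keys.
    """
    result = {}
    for (i, a), (j, b) in combinations(enumerate(scores_by_rater), 2):
        agree = sum(abs(x - y) <= tolerance for x, y in zip(a, b))
        result[(i + 1, j + 1)] = agree / len(a)
    return result

# Worked example from the text: three raters score one item 3, 4 and 5.
print(f"{item_agreement([3, 4, 5]):.0%}")  # 67%

# Hypothetical scores from three raters on two items, for illustration only.
print(pairwise_agreement([[3, 2], [4, 4], [5, 3]]))
# {(1, 2): 0.5, (1, 3): 0.5, (2, 3): 1.0}
```

Agreement within 1 point is a lenient criterion on a 6-point scale. Setting `tolerance=0` gives exact agreement instead.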

Results
The differences between early and late means of participant scores on each ART item for both groups combined are displayed in Table 3.
The combined mean score difference between the early and late dialogues was 0.88 points (corresponding to an 18 percentage point increase). The mean score difference was 1.0 for the items in the 'Shared' category, 0.9 for the 'Clear' category, 0.3 for the 'Acceptable' category, and 1.2 for the 'Logical' category. The outlier here is the 'Acceptable' category, which consists of the items 'Evaluating facts' and 'Evaluating values'. There are, however, plausible explanations for the relatively low mean score difference in this category. In P4C settings, a philosophical dialogue commonly centres on a particular text that the group reads before the actual dialogue begins. In such cases, students in a well-functioning dialogue will often 'refer to the text or other sources to support their positions' (as the ART states with regard to 'Evaluating facts'). In this intervention, the dialogues were not prompted by any texts, which limited the group's ability to score high on this item. Regarding the 'Evaluating values' item, a fairly well-functioning group might score very high in a dialogue centring on an ethical question but lower in a dialogue centring on a non-ethical philosophical question. If the question being discussed is 'big' but not ethical (e.g. an epistemological question), a group of participants could score low on this item without missing any opportunities to discuss questionable statements. This problem does not, however, arise for the facilitator items. A facilitator should receive a high score on those items as long as they do not miss any opportunities to prompt examination of questionable factual or value statements. In Group A, the mean score difference between the early and late dialogues was 1.3, and in Group B it was 0.5 (see Table 4).
We calculated effect sizes per group using the following formula:

$$d_A = \frac{M_{2,A} - M_{1,A}}{S_{1,A}}$$

Here, $M_{1,A}$ denotes the mean value in the early dialogues for Group A, $M_{2,A}$ the mean value in the late dialogues for Group A and $S_{1,A}$ the sample standard deviation in the early dialogues for Group A; $d_B$ is defined analogously for Group B. The effect sizes for each group, based on the participant mean score (the total score divided by the number of ART items), are shown in Table 4. The effect size was 2.2 in Group A and 1.0 in Group B, or 1.6 on average. Hattie writes the following concerning effect sizes: 'Cohen (1988), for example, suggested that d = 0.2 was small, d = 0.5 medium, and d = 0.8 large, whereas the results in this book could suggest d = 0.2 for small, d = 0.4 for medium, and d = 0.6 for large when judging educational outcomes.' (Hattie, 2009, chap. 2). Later on, he notes that '[c]ertainly effects above d = 0.40 are worth having' (Hattie, 2009, chap. 2).
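To illustrate the effect size calculation, d = (M_2 − M_1) / S_1 with S_1 the sample standard deviation of the early scores, the minimal Python sketch below computes d for one group and labels it against the Cohen (1988) and Hattie (2009) benchmarks quoted above. The function names and score lists are hypothetical, and the units over which the standard deviation is taken (for example dialogues or items) are an assumption that may differ from the study's.

```python
from collections.abc import Sequence
from statistics import mean, stdev

# Benchmarks (small, medium, large) as quoted from Hattie (2009).
COHEN_1988 = (0.2, 0.5, 0.8)
HATTIE_2009 = (0.2, 0.4, 0.6)  # Hattie's suggestion for educational outcomes


def effect_size(early: Sequence[float], late: Sequence[float]) -> float:
    """d = (M2 - M1) / S1, with S1 the sample SD of the early scores."""
    # statistics.stdev uses the n - 1 (sample) denominator.
    return (mean(late) - mean(early)) / stdev(early)


def classify(d: float, thresholds: tuple[float, float, float] = COHEN_1988) -> str:
    """Label the magnitude of d against (small, medium, large) cut-offs."""
    small, medium, large = thresholds
    size = abs(d)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"


if __name__ == "__main__":
    # Hypothetical early and late scores for one group (not study data).
    early = [2.0, 2.5, 3.0]
    late = [3.5, 4.0, 3.0]
    d = effect_size(early, late)
    print(f"d = {d:.1f} ({classify(d)})")  # d = 2.0 (large)

    # The reported average effect size is the simple mean of the group values.
    print(round(mean([2.2, 1.0]), 1))  # 1.6
```

Because only the early standard deviation is used as the denominator, this resembles Glass's delta rather than a pooled-SD Cohen's d; the two can differ noticeably when the early and late spreads differ.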
Because the two groups differed in age, sex and MoCA scores, the numerical data are also displayed separately for Groups A and B. Differences between early and late means of participant scores for Group A are displayed in Table 5.
The combined mean score difference for this group was 1.26 points (a 25 p.p. increase). The mean score difference in Group A was 1.2 for the items in the 'Shared' category, 1.4 for the 'Clear' category, 1.3 for the 'Acceptable' category and 1.1 for the 'Logical' category.
For Group B, differences between early and late means of participant scores are displayed in Table 6.
In Group B, the difference between early and late means was overall smaller than in Group A, with a combined mean score difference of 0.5 points (a 10 p.p. increase). The mean score difference in Group B was 0.8 for the items in the 'Shared' category, 0.5 for the 'Clear' category, −0.7 for the 'Acceptable' category and 1.3 for the 'Logical' category.
Since internal validity partly depends on the intervention quality remaining constant (i.e. the facilitator quality, measured in this study by all facilitator ART items), the facilitator mean scores are shown in Table 7.
The difference in facilitator quality between early and late in the intervention is very small for most items, ranging from −0.2 to +0.25 points on all items except for 'Labeling moves and parts of an argument', where the difference is 0.75. The mean total difference for all 11 ART items taken together is 0.08 points (a 2 p.p. difference).

Discussion
The large positive changes in the participants' ART scores from early to late in the intervention, as judged by mean scores or by effect size, together with constant facilitator quality, suggest that group argumentation skills developed during the intervention. As theory and prior research suggest that developments at the group level promote individual competencies, this gives reason to believe that individual cognitive abilities also developed. A positive change from early to late in the intervention was expected, given that much previous research on dialogic, inquiry-based and argumentation-centred pedagogies, such as philosophical dialogues, has found positive effects in other groups (e.g. children) (Topping & Trickey, 2007a; Yan et al., 2018). Nonetheless, a positive change of the magnitude indicated in this study is larger than what is typically reported even in P4C studies (cf. Yan et al., 2018) and considerably larger than that of typical educational interventions (cf. Hattie, 2009), and hence would not have been expected had there been no intervention. This suggests that there was a positive change due to the intervention.
The underlying mechanisms that make engagement in dialogue and argumentation effective for persons with ABIs should be studied further, but some initial remarks can be made here. Persons with ABIs suffer long-term losses of cognitive and verbal skills, which affect their daily lives. Being able to practise such skills while discussing contestable questions of common interest could be motivating, which could partly explain the large effects. We also believe that the guidance and modelling provided by experienced, philosophically educated facilitators was a crucial causal factor in the observed effect. Furthermore, the interventions allowed the participants with ABIs to participate more equally in, and to a greater extent influence, the activities than has previously been described as common in the daily life of persons with disabilities (Gardelli, 2004). Active participation, empowerment, agency, equality and higher expectations are commonly considered to be lacking to some extent for persons with disabilities and have been suggested to be important for them (Gardelli, 2004); the influence of these factors in similar interventions for persons with ABIs could be studied further.
However, the conclusion that argumentation skills developed due to the intervention should be treated with caution because of certain methodological choices. In particular, we chose not to use an experimental design with a control group, mainly because of the difficulty of finding a truly equivalent group given the large individual differences among the participants with ABIs. Furthermore, the ART was used in this study for a different target group than the one originally intended by its constructors, which raises questions about the validity of the study. This was addressed through the calibration procedure, but higher inter-rater agreement for the participant scores would have provided stronger reasons to believe that considerable group argumentation development occurred. We suggest that future studies using the ART to analyse dialogues with persons with ABIs calibrate their ratings until higher percentage agreement is reached.
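Percentage agreement between raters, as referred to in the calibration recommendation above, can be computed as the share of rated items on which two raters give the same score. The minimal sketch below assumes integer item ratings and adds a hypothetical `tolerance` parameter (0 for exact agreement, 1 to count adjacent scores as agreeing); the function name and ratings are illustrative and do not reproduce the calibration procedure used in this study.

```python
from collections.abc import Sequence


def percent_agreement(
    rater_a: Sequence[int], rater_b: Sequence[int], tolerance: int = 0
) -> float:
    """Share of items (in %) on which two raters' scores differ by <= tolerance.

    tolerance=0 gives exact agreement; tolerance=1 counts adjacent scores
    as agreeing. Both variants are illustrative assumptions.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    agreements = sum(abs(a - b) <= tolerance for a, b in zip(rater_a, rater_b))
    return 100.0 * agreements / len(rater_a)


if __name__ == "__main__":
    # Hypothetical scores from two raters on ten items (not study data).
    rater_a = [3, 4, 2, 5, 3, 4, 1, 2, 3, 4]
    rater_b = [3, 4, 3, 5, 2, 4, 1, 2, 4, 4]
    print(percent_agreement(rater_a, rater_b))               # 70.0
    print(percent_agreement(rater_a, rater_b, tolerance=1))  # 100.0
```

Simple percentage agreement does not correct for agreement expected by chance; chance-corrected statistics such as Cohen's kappa are a common complement when reporting calibration results.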
The mean score change was larger in Group A than in Group B overall and in three of the four ART categories, the exception being the 'Logical' category (1.1 versus 1.3). It is of interest to compare this with the MoCA mean scores (which were higher in Group A) and the mean age (which was lower in Group A). Taken alone, this would suggest that participation in philosophical dialogues has greater cognitive effects for persons with less severe ABIs or of lower age, which would be in accordance with previous research on brain injury rehabilitation suggesting that injury severity and age influence recovery. However, it is also relevant to consider the differences in mean scores in relation to the groups' different contextual circumstances: the participants in Group A were enrolled in an educational programme, while those in Group B were not.
An obvious limitation of the present study is the small sample size, which calls for caution with regard to generalisability. As an exploratory study, it nonetheless paves the way for future research on P4C-like interventions for persons with ABIs related to argumentation development and similar forms of cognitive and communication development. This is the first study to examine the effects of philosophical dialogues with persons with ABIs, and further similar studies would be useful. It would also be of interest to conduct longitudinal studies of long-term effects, as well as single-subject designs with repeated interventions and baseline measures (see, e.g., Lammers & Badia, 2005). Nevertheless, the findings of this study give reasons, though inconclusive, to believe that P4C-like interventions, if implemented, would fill an important gap in or after rehabilitation following an ABI.
To further the possibility of studying argumentation development, a theoretical framework and a practical instrument or method enabling in-depth examination of the quality of argumentation at both the group and the individual level would be beneficial. Such tools could benefit both future research on dialogic interventions for people with ABIs and dialogic education in general.

Disclosure statement
No potential conflict of interest was reported by the authors.