Playing a team game improves word production in poststroke aphasia

ABSTRACT Background: High-intensity, one-to-one rehabilitation therapy is effective in the treatment of poststroke aphasia, but it can put strain on public health providers, as well as lead to high attrition. Working within a group of peers may be efficient for professional speech and language therapists, as well as reduce feelings of isolation and lack of confidence in patients, which can negatively affect progress. Evidence-based, structured group-based approaches, however, are lacking. Aims: We wanted to assess the feasibility of a new group-delivered game-based intervention, designed to provide efficacious word-retrieval rehabilitation in a cost-effective and motivating environment. Method & Procedure: Two cohorts of six participants took part. Each was split into two teams to play language games where pictures were named with the help of team members and facilitation from a speech and language therapist. Facilitation was varied in three different cueing conditions: phonemic, gesture + phonemic, and semantic + phonemic. Overall, 180 words were practiced (90 nouns and 90 verbs). Therapy was delivered 3 days per week, for 6 weeks (for a total of 54 hr). Outcomes & Results: Our intervention was equally effective across the three cueing conditions and for nouns and verbs. Gains were demonstrated in naming the pictures used in training, but also in the description of pictured scenes designed to elicit the same words. With these tasks, there were improvements of 25% and 18% from baseline accuracy, which compare well with gains reported in the literature using individually delivered speech and language therapy based on picture naming. Improvements were mostly maintained at both 4–7 weeks and 6 months post-therapy and were significant in all but the two most severely affected participants. There was some generalization of gains to narrative production, but not to other language tasks, nor to untreated words in picture naming.
These positive language outcomes were combined with a high level of engagement and satisfaction (with participants stating a preference for games over standard therapy). Conclusions: Our results support embedding theoretically and empirically based techniques for aphasia rehabilitation within games with a strong social aspect, which may promote linguistic recovery in a way that is time- and cost-efficient as well as engaging. Future research should explore more formally outcomes in terms of increased well-being and reduced social isolation, as well as language proficiency.

Aphasia is often chronic and life-changing. It reduces quality of life and can hinder education, employment, and community integration (e.g., Astrom, Adolfsson, & Asplund, 1993; Hilari & Northcott, 2017; Hilari, Needle, & Harrison, 2012). Speech and language therapists (SLTs) are uniquely placed to treat people with aphasia (PwA). There is good evidence that speech and language therapy is effective in ameliorating language difficulties, at least when it is delivered with the right intensity (meaning, here, a dose that is high enough in terms of number of hours of therapy; see Bhogal, Teasell, & Speechley, 2003; Brady, Kelly, Godwin, Enderby, & Campbell, 2016; Cherney, 2012; Denes, Perazzolo, Piani, & Piccione, 1996; Hinckley & Carr, 2005). Nevertheless, PwA report unmet needs after they leave inpatient care (McKevitt et al., 2011). This situation is only predicted to worsen: demand for health care is increasing as the population grows and people live longer, and many national health services are already strained. To address this situation, we must devise new ways of delivering aphasia therapy that are both effective and cost-effective.
Our study assessed the feasibility of a new intervention based on playing a social game, with the expectation that this could be at the same time efficacious (being based on sound principles of language rehabilitation), fun, motivating, and cost-effective, since games can be played simultaneously by a group of participants supervised by a single SLT, thus reducing demands on professional time. We focused on poststroke word production difficulties, including both difficulties in word retrieval (Broca's aphasia; anomia) and difficulties in phonological encoding (conduction aphasia; Wernicke's aphasia). Word production difficulties are among the most common, debilitating, and long-lasting consequences of stroke aphasia (e.g., Goodglass & Wingfield, 1997), affecting a person's ability to communicate (Basso, Razzano, Faglioni, & Zanobio, 1990; Herbert, Hickin, Howard, Osborne, & Best, 2008). For this reason, rehabilitation therapies often focus on word production and use confrontation naming as a practice tool (see Doesborgh et al., 2004; Nickels, 2002; Sul et al., 2016; for reviews, see Basso, 2005; Bhogal, Teasell, & Speechley, 2003; Bhogal, Teasell, Speechley, & Albert, 2003; Wisenburn & Mahoney, 2009). Our study maintains a focus on naming. Studies have shown item-specific and non-item-specific generalization of naming gains to connected speech (see Conroy, Sage, & Ralph, 2009; Herbert et al., 2008; Rider, Wright, Marshall, & Page, 2008), supporting the usefulness of this approach for improving functional communication, which is a priority for PwA (Rider et al., 2008).

Background and rationale
One way to reduce therapy costs is to treat patients in a group. Moreover, practicing language in a group offers potential additional advantages, because interacting with peers could be less intimidating and more motivating than face-to-face interaction with a proficient speaker. In addition, participating in a group may reduce social isolation, which can be as debilitating as the language impairment itself (Parr, 2007). Aphasia groups are commonly used to help PwA in their pathway to recovery, either as the sole form of intervention or as an adjunct to one-to-one SLT (see Elman, 2007a, 2007b). Aphasia groups, however, have been used only very sparingly to deliver structured interventions (see Lanyon, Rose, & Worrall, 2013, for a review), with the exception of CIAT (Constraint-Induced Aphasia Therapy)/CILT (Constraint-Induced Language Therapy) protocols, discussed later, where group size is limited (up to three patients). Most aphasia groups aim either to provide education and support or to provide a conversational environment for less severe patients (e.g., see Rose & Attard, 2015).
The social and emotional benefits of participating in a group are clear. Participating in a group normalizes experiences, allows socializing and encourages new friendships (Vickers, 2010), provides much-needed feelings of understanding and acceptance (Northcott, Moss, Harrison, & Hilari, 2016; Ross, Winslow, & Marchant, 2006; Vickers, 2010), and reduces depression (Brumfitt & Sheeran, 1997). The language benefits of unstructured conversations, however, are less clear, and only a few studies have assessed them. Two studies assessed gains on general linguistic measures and found positive effects, but results were weakened by possible confounding with spontaneous recovery (Elman & Bernstein-Ellis, 1999; Wertz et al., 1981). Other studies have assessed gains linked more specifically to what was practiced within the group. Drummond and Simmons (1995) examined the quality of discourse (in terms of phonology, semantics, syntax, and pragmatics) in four PwA while they practiced topics of conversation within the group. They found gains in quantity of verbal output, but no improvement in any of the quality measures. Falconer and Antonucci (2012) combined semantic feature analysis with group-based conversation in four PwA and found gains in informativeness and/or efficiency of communication (see also Antonucci, 2009). Two further studies have specifically assessed benefits for word production; their results provided only weak evidence of benefits. Eales and Pring (1998) carried out a within-subject study with four PwA. Target words were practiced first with individual therapy and then with group conversations using topics designed to elicit the target words. Performance was assessed at different points with picture naming. Performance improved mostly after individual therapy. It also improved after group conversations, but with no difference between the words practiced in conversation and control words.
Nickels, McDonald, and Mason (2016) also carried out a within-subject study with four PwA. Participants' lexical retrieval abilities were assessed with both picture naming and structured interviews designed to elicit the target words. Performance was compared for three matched sets of 30 words, which were (1) untreated; (2) treated with group conversations on associated topics; (3) treated with group conversations + home-based confrontation naming exercises. Treated sets, but not untreated sets, showed improvements, but only in picture naming. Moreover, gains were confounded by general trends for improvement, which occurred in both the treatment and no-treatment phases of the study, weakening the results.
Taken together, the studies reviewed earlier show limited evidence that unstructured conversation approaches are beneficial. Structured linguistic intervention (following a defined protocol) may work better, especially for patients with moderate-to-severe impairments who may find conversation too difficult (see also Lanyon et al., 2013). Structured interventions, however, are mostly delivered one-to-one, with the important exception of Constraint-Induced Protocols (CIP), which involve small groups of participants (known as CIAT, CILT, or ILAT, Intensive Language Action Therapy; see Pulvermüller et al., 2001; Difrancesco, Pulvermüller, & Mohr, 2012; for a review, see Balardin & Miotto, 2009; Meinzer, Rodriguez, & Rothi, 2012; Zhang et al., 2017). These protocols share the following defining characteristics: (1) treatment is delivered in small groups (up to three patients); (2) practice is strictly focused on verbal, spoken output, with other forms of communication either not practiced or actively discouraged (constrained); (3) treatment is intensive, where intensity refers to the therapy being delivered both at a high dose and in a compact way (massed rather than distributed practice); (4) treatment is focused on word production (picture naming); (5) treatment involves shaping, where word production is practiced repeatedly, with different carrier sentences and different degrees of facilitation; (6) naming is promoted in the context of social requests as part of a card game (Go Fish) where participants ask other participants for matching cards. CIP have received a lot of attention because studies have shown benefits for treated words and, occasionally, improvements on standardized tasks (e.g., Carpenter & Cherney, 2016; Pulvermüller et al., 2001; for a review, see Zhang et al., 2017; but for negative results, see Attard, Rose, & Lanyon, 2012; Hameister, Nickels, Ca, & Croot, 2017; Kurland, Stanek, Stokes, Li, & Andrianopoulos, 2016; Nickels & Osborne, 2016).
Which elements are responsible for the success of CIP, however, remains unclear.
A recent study by Stahl et al. (2016) compared two forms of naming therapy delivered in small groups. One was ILAT, where participants ask for cards in the context of the game "Go Fish" (the same game used by other CIP). Here, naming is carried out for the purpose of acquiring matching cards (when a matching card is acquired, the pair can be discarded; the player who is left without cards wins). The other was a traditional naming therapy, where participants were asked to name what was depicted on the cards. Eighteen PwA carried out both the ILAT protocol and the confrontation naming protocol in counterbalanced order. Results showed that the ILAT protocol produced greater improvements on subscales of the Aachen Aphasia Test. The authors interpreted this result as showing the importance of social interaction for therapy outcomes. In particular, they stressed the importance of embedding naming in the context of social requests. Another possible interpretation of these results, however, is that CIP involves playing a game, which could be more motivating than carrying out naming exercises individually.
In our experimental investigation, we wanted to retain a number of elements used in CIP (as well as in other therapies) that are known to be effective, such as a focus on spoken word naming, shaping with facilitation techniques based on cueing, and a high dose of therapy. Our protocol, however, also differed from CIP in important respects. We did not focus on speech acts involving requests. We focused on confrontation naming, but we embedded naming in the context of a social game that allowed more participants to play at once and to play in teams. We believe that the potential of social/team games in the treatment of PwA has not been sufficiently exploited. There is evidence that playing games results both in learning and in improved mood (e.g., for dementia, see Dartigues et al., 2013; for motor impairments, see Vanacken et al., 2010). Embedding language exercises in team games played in medium-sized groups may increase motivation and engagement, which can be difficult to sustain in intensive therapy (e.g., Brady et al., 2016), and may provide additional social and emotional benefits while reducing costs. Finally, we wanted to assess the effect of cueing more systematically, given the importance of cueing facilitation techniques for rehabilitation (see Best et al., 2013; Nickels, 2002).
There is strong evidence that phonological cueing helps with word retrieval, in both control and aphasic speakers (see Kay & Ellis, 1987; Patterson, Purell, & Morton, 1983; Pease & Goodglass, 1978). It is not clear whether semantic cueing significantly helps retrieval at the point when a word is unavailable (see Meteyard & Bose, 2018). However, naming therapies focused on phonological cueing and those focused on semantic cueing have both been shown to be effective, probably because both help to strengthen links between phonological and semantic representations in lexical networks (for phonological therapies, see Hillis, 1993, 1998; Nickels, 2002; Raymer, Thompson, Jacobs, & Le Grand, 1993; for semantic therapies, see Boyle, 2004; Coelho et al., 2000; Nickels, 2002; Raymer et al., 1993; for a review of the efficacy of semantic feature analysis to improve picture naming, see also Maddy et al., 2014). There is also some evidence that practicing picture naming in association with gestures (observed or carried out) is effective, especially for PwA with lexical retrieval difficulties (Boo & Rose, 2011; Kroenke, Kraft, Regenbrecht, & Obrig, 2013; Marangolo et al., 2010; Rose, 2013; Rose & Douglas, 2008; Rose, Douglas, & Matyas, 2002), and that PwA can use gestures to self-cue while naming (Hanlon, Brown, & Gerstman, 1990; Lanyon & Rose, 2009). Gestures may help naming because of possible relationships between lexical representations and associated motor patterns (see embodied cognition; e.g., Jirak, Menz, Buccino, Borghi, & Binkofski, 2010; Pulvermüller, 2005). This may be particularly true for verbs (which are generally the target of gesture facilitation; see Boo & Rose, 2011; Marangolo, Cipollari, Fiori, Razzano, & Caltagirone, 2012), but it may also apply to concrete nouns, which are often associated with actions.
While facilitation approaches are generally effective, it is unclear which one is more successful in improving naming. When phonological and semantic approaches have been compared, both have been found to be effective (Greenwald, Raymer, Richardson, & Rothi, 1995; Stimley & Noll, 1991; Wambaugh, 2003; Wambaugh et al., 2001), although there is some evidence of longer-lasting effects and more generalization with semantic therapies (Howard, Patterson, Franklin, Orchard-Lisle, & Morton, 1985; Holland, Johns, & Woollams, 2018; Neumann, 2018; Lorenz & Ziegler, 2009; for a review, see Wisenburn & Mahoney, 2009). Equally, when therapy using gesture has been compared with other approaches, similar efficacy has been reported (Boo & Rose, 2011; Raymer et al., 2007; Rose & Sussmilch, 2007). Comparing different types of cueing within our group-based, game-based therapy can provide further evidence about the relative efficacy of different approaches.
In conclusion, our study aimed to assess the feasibility of a new mode of delivering SLT based on playing language games in teams (from now on, game therapy, GT), while incorporating rehabilitation techniques with a strong theoretical and empirical basis. We practiced picture naming and repetition combined with cueing, but in the context of a competitive game where participants worked/played in teams. This approach should be suitable for many PwA. Picture naming practices word retrieval and benefits participants with a clinical classification of anomia (see Howard, 1994; Maher & Raymer, 2004). Repetition practices phonological encoding and benefits participants who have difficulty selecting and organizing phonemes for production (Wernicke's aphasia, conduction aphasia, jargon aphasia; see Galluzzi, Bureca, Guariglia, & Romani, 2015; Nickels, 2002; Romani & Galluzzi, 2005).
We assessed feasibility in terms of positive outcomes achieved (with gains ideally comparable to those reached through one-to-one therapy) and acceptability to participants. More specifically, we assessed efficacy in terms of (1) treatment-specific gains; (2) maintenance of gains over time; and (3) gains obtained both in picture naming and in a narrative context, as evidence of generalization to functional communication. In a very preliminary way, we also compared outcomes of GT with what is currently offered by the National Health Service (NHS) and considered interactions with order of administration (standard therapy, ST, before GT or vice versa). We assessed acceptability in terms of attrition rate and responses to a satisfaction questionnaire. Finally, nested within the aim of establishing the efficacy of GT, we aimed to assess whether different cueing techniques (phonological, semantic, or gestural) were differentially effective. We hoped that a new team-game approach to SLT could be effective and acceptable while, at the same time, bearing the promise of reducing professional costs and increasing well-being and engagement by making therapy more fun.

Participants
Twelve participants with stroke-induced aphasia were recruited from an outpatient neurorehabilitation unit (Moor Green Outpatient Brain Injury Unit) and two community services within Birmingham Community Healthcare NHS Foundation Trust. SLTs provided information to their patients and invited participation. Informed consent was obtained using an "aphasia-friendly" information sheet. Recruitment occurred in two phases, each aimed at recruiting a cohort of six participants; recruitment stopped as soon as this was achieved.
Inclusion criteria were moderate-to-severe word-finding difficulties, with performance on the Boston Naming Test (BNT) below 50% correct, and relatively well-preserved comprehension, to allow participants to cope with the demands of the game. Exclusion criteria were a history of alcohol and/or substance abuse; developmental difficulties; and/or any other neurological, psychiatric, or degenerative disease that could contribute to language or communication impairment. All participants were fluent English speakers before their stroke. They were either monolingual speakers or bilingual since early childhood, with the exception of one participant (P5), who learned English in school in India but reported being already fluent in English when he arrived in the UK at age 27. All participants were at least 3 months post onset.
All participants received some standard speech and language therapy during our study, as well as our experimental GT; we use the term ST (standard therapy) to refer to the SLT received by participants as part of current standard NHS care. All participants had also received some SLT prior to the beginning of our study. Participants from Cohort 1 (P1-6) received some additional ST after GT. Participants from Cohort 2 (P7-12) received some additional ST after our initial assessment but before GT. Participants from the two cohorts differed marginally in age and months post onset (Cohort 1 included older and more chronic participants), but the two groups did not differ significantly in the amount of additional therapy received, education, or baseline measures (see Table 1). Clinical classification was established through discussion with the referring SLT.
Standard Speech and Language Therapy (ST) was delivered either at the neuro-rehabilitation outpatient unit (Moor Green) or in the community by NHS Speech and Language Therapists. It was flexibly adapted to the needs of the patient and included a mixture of therapy approaches according to the individual's therapy goals: impairment-based (e.g., picture-naming), functional (e.g., use of a communication book), activity-directed (e.g., practicing phone calls), or participation-based (e.g., conversation groups). There was no overlap with the materials used in the GT. On average, participants carried out 51 hr of ST over 4 months (15.4 weeks), but there was a lot of variability with patients attending for 7-33 weeks and receiving between 7 and 101 hr of ST. This variability was due to different offerings by different NHS services and variable patient needs/goals.
Experimental GT was carried out at the outpatient neurorehabilitation unit. Each game was delivered by a senior SLT (Louise Lander, a member of the research team), assisted by either a trained psychology student or another SLT. Overall, participants carried out 54 hr of GT over a total period of 8 weeks (three periods of therapy with assessment weeks in between).

GT protocol
Each cohort of six participants was split into two teams of three. The purpose of the game was to gain points for one's team by naming pictures. The participant whose turn it was picked a card from a set and, without showing it to the other players, tried to name it. He/she received facilitating cues if necessary. The other members of the team could also accrue points for the team by helping the participant on call. Once the player on call had produced the target, each member on his/her team would repeat it. This ensured shaping and widened participation on each trial. At the end of each round, the card was placed face down at the bottom of the pile and play passed to the other team. Different numbers of points were gained depending on ease of naming and degree of help by the facilitator.
At the end of each (1 hr) session, points were tallied and the winning team declared. Participants were encouraged to change teams after each session to ensure that each individual had the opportunity to interact with and against all other individuals. As well as negating potential differences in outcomes due to differing one-to-one interactions, this strategy also helped to maintain interest in the games. The facilitation techniques used by the SLT during the games were systematically varied by contrasting phonological, semantic, and gestural cueing techniques. These techniques were used with matched sets of nouns and verbs at different phases of therapy, referred to from now on as Game P (phonological cueing), Game PG (phonological and gestural cueing), and Game PS (phonological and semantic cueing).
Game P. If the participant could not name the target, the facilitator provided phonemic or syllabic cues, or a model for repetition, as required. For example, for the target word "umbrella", the following hierarchy of prompts would be used: "what sound does it begin with?" -> "it begins with uh" -> "it starts um" -> "it's an umbrella".
Game PG. Participants were encouraged to gesture appropriately while trying to produce the target. If naming was unsuccessful, the facilitator produced gestural, as well as phonemic cues. For example, for the target word "umbrella", the following hierarchy of prompts would be used: "can you show me what you do with it?" -> therapist gestures opening an umbrella -> phonemic cueing hierarchy in tandem with gestures.
Game PS. Participants were encouraged to talk around the target by producing similar words, describing its semantic features, or producing a phrase containing the target. If naming was unsuccessful, the facilitator provided semantic, as well as phonemic cues. For example, for the target word "umbrella", the following hierarchy of prompts would be used: "what do you use it for?" -> "what does it look like?" -> "you need it when it rains" -> "you open it" -> "It's raining, you open your …" -> phonemic cueing hierarchy.
Each game condition was played for 3 hr per day, split into three separate 1-hr sessions, three times a week, for 2 weeks, totaling 18 hr for each game condition (for a total of 54 hr over 6 weeks across all game conditions).
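The dose arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the constant and function names are illustrative rather than part of the protocol, and the calculation deliberately ignores the assessment weeks that stretched GT over 8 calendar weeks.

```python
# Minimal sketch of the GT dose arithmetic described in the text.
# Names are illustrative; the values come from the protocol description.
SESSIONS_PER_DAY = 3       # three separate 1-hr sessions per day
HOURS_PER_SESSION = 1
DAYS_PER_WEEK = 3          # therapy days per week
WEEKS_PER_CONDITION = 2    # each game condition ran for 2 weeks
CONDITIONS = ("P", "PG", "PS")


def hours_per_condition() -> int:
    """Therapy hours delivered within one game condition."""
    return SESSIONS_PER_DAY * HOURS_PER_SESSION * DAYS_PER_WEEK * WEEKS_PER_CONDITION


def total_hours() -> int:
    """Therapy hours summed over all game conditions."""
    return hours_per_condition() * len(CONDITIONS)


def therapy_weeks() -> int:
    """Weeks of active therapy, excluding assessment weeks."""
    return WEEKS_PER_CONDITION * len(CONDITIONS)


if __name__ == "__main__":
    print(hours_per_condition(), total_hours(), therapy_weeks())  # 18 54 6
```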
For Cohort 1, each item was presented either 16 or 17 times during Games P and PG, and 12 times during Game PS. For Cohort 2, each item was presented 19 or 20 times during Game P, 25 times during Game PG, and 15 times during Game PS. The lower number of presentations during the first cohort's therapy reflects the presence of more severely dyspraxic participants in this cohort, who often struggled with articulation and took longer to produce the targets. Fewer presentations occurred during Game PS for both cohorts because of the additional time needed for semantic elaboration.

GT materials
A set of 60 words was trained in each game condition (30 nouns and 30 verbs), for a total of 180 words. This number was deemed adequate to achieve a reasonable "therapy dose" for each target, while also ensuring that participants remained interested in the protocol and that functional gains could be achieved (Cherney, 2012). Set A was trained in Game P, Set B was trained in Game PG, and Set C was trained in Game PS. Words in the three sets were carefully matched for frequency, age of acquisition, length, and phonological complexity (see Appendix 1). Picturable, easy-to-name verbs are harder to find than equivalent nouns. Thus, across the three sets of words (A, B, and C), verbs had significantly higher frequency than nouns and were shorter (see Appendix 1). We included nouns and verbs in our therapy materials because improvements with both types of stimuli are important if functional gains are to be reflected in connected speech and narrative production.
Pictures were black and white line drawings mostly taken from the Object and Action Naming Battery (Druks, 2000) and the International Picture-Naming Project Database (Szekely et al., 2004). A small number were also taken from clipart sources online. All pictures were presented on 8-cm² white cards. Trained words were assessed through naming the same pictures used in therapy and through descriptions of pictured scenes that had previously been shown to elicit the trained words in control participants.
There were three scenes for each set of words, each designed to elicit 20 target words (see Appendix 2). Word set A was probed by scenes depicting (1) a house interior (with kitchen, study, and living room); (2) a beach; (3) a street. Word set B was probed by scenes depicting (1) a garden; (2) the interior of a café; (3) a fair at a castle. Word set C was probed by scenes depicting (1) another house interior (two bedrooms, bathroom and room to be decorated); (2) a countryside scene; (3) a concert. All scenes were black and white drawings; each was presented on an A3 sheet.
The scenes were given to a group of 9 younger control participants and a group of 17 older control participants, all of whom were asked to describe what was happening. The number of target words produced was counted for each participant and each scene. The control results indicated that the pictured scenes were successful in eliciting the production of target words. For the group of older adult controls (N = 17), 35.9 (SD 6.9) targets were elicited for Set A, 41.1 (SD 9.1) for Set B, and 34.4 (SD 7.3) for Set C (maximum = 60 for each set). For the group of younger adult controls (N = 9), the figures were 35.4 (SD 8.2) for Set A, 40.1 (SD 9.7) for Set B, and 35.8 (SD 8.9) for Set C.

Design
When participants are few (as in our case, with 12 participants) and may differ substantially on variables that affect therapy outcome (such as age, lesion severity, time post onset, and education), between-group comparisons lack power. A better option is offered by multiple baseline designs, where the same participants are assessed multiple times with matched materials that have been either treated or untreated (see Nickels, 2002). We used a multiple baseline design in our study: we compared performance on trained and untrained word sets at the same point in time, as well as performance on the same word sets at different times (before and after training).
Our design did not include a direct comparison with an alternative treatment, since our main aim was to assess whether our intervention was viable, effective, and well-liked by PwA. However, we did want to gather some preliminary results on the relative improvements offered by GT and by ST as currently delivered within the NHS, and on possible interactions between these treatments depending on order of administration. Thus, across the two cohorts, we counterbalanced participation in ST, with one cohort receiving some ST before GT and the other receiving GT first and ST afterward. This allowed some comparison of general language gains after the two approaches, as well as an evaluation of whether GT is more beneficial when administered before or after some improvements have already been obtained with ST. However, we should note from the start that these results can only be considered very preliminary, not only because of the size of our sample, but also because the ST received by our participants was very variable in content, frequency, and intensity, mirroring the variability of therapy offered within the NHS, which depends on the goals of the individual but also on the practice and resources of referring trusts.
A schematic schedule reflecting our design is shown in Table 2. Across time points, we carried out the following assessments, some more comprehensive and others more limited (see later for more details):
Time 1. Cohort 2: baseline comprehensive assessment before ST.
Time 2. Cohort 2: comprehensive assessment after ST. Cohort 1: baseline comprehensive assessment before GT.
Time 3. After Game P (both cohorts): word sets A (trained) and B (untrained), to compare trained and untrained sets and performance on set A before and after therapy.
Time 4. After Game PG (both cohorts): word sets B (trained) and C (untrained), to compare trained and untrained sets and performance on set B before and after therapy.
Time 5. After Game PS (both cohorts): comprehensive assessment, with word set C (trained) to assess gains compared with baseline, word sets A and B to assess short-term maintenance, and the language battery to assess general gains.
Time 6. Cohort 1: after ST (and 5-6 months after GT): comprehensive assessment, with all three sets to assess long-term maintenance of GT gains and general language assessment to assess any further gains provided by ST.

Assessment
We assessed gains in production of both nouns and verbs with the same materials used in training (picture naming), but also with descriptions of pictured scenes that had previously been shown to elicit the words used in therapy in control speakers. In addition, we assessed gains in an unrelated narrative task (retelling the Cinderella story) with a number of measures (see later), to determine whether gains extended beyond the narrow conditions used in therapy. Finally, we assessed possible improvements on standardized tests such as the Comprehensive Aphasia Test (CAT) and the BNT, and we probed satisfaction with our protocol using a brief questionnaire and a focus group. Assessments of varying comprehensiveness were carried out at different phases of the therapy.

Limited Assessment
Limited assessments were conducted after each round of GT. Potential improvements after each specific game were assessed through production of the target words in picture naming and scene description tasks.
2.5.1.1. Picture naming. Participants were asked to name the same pictures presented in therapy, but this time presented in randomized order on a computer screen. There was no time limit for responses. Responses were transcribed and scored 1 point if correct, 0 points if incorrect, and 0.5 points if correct but produced after an appreciable delay (more than 5 sec, as in the CAT) and/or after a self-correction.
2.5.1.2. Scene descriptions. Participants were presented with each scene in turn and asked "What is happening here?" and, if a particular area needed prompting, "What about here?". Descriptions were recorded and transcribed verbatim, including hesitations, false starts, and fillers (e.g., "umm"). The number of words trained in therapy that were produced correctly was counted. In addition, the quality of the narrative was scored using the total number of words produced, words produced per minute, percent of correct information units (CIU), and percent of errors (syntactic, morphological, phonological, and semantic). The same method was used for the Cinderella story (described later).
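The picture-naming scoring rule (1 point for a prompt correct response, 0.5 for a correct response after a delay of more than 5 sec and/or a self-correction, 0 otherwise) can be sketched as follows. This is a minimal illustration, not the authors' scoring tool: the `NamingResponse` record, its fields, and the exact-string comparison are hypothetical simplifications, since in practice correctness was judged by a human scorer from transcriptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# "Appreciable delay" threshold: more than 5 sec, following the CAT convention cited in the text.
DELAY_THRESHOLD_SEC = 5.0


@dataclass
class NamingResponse:
    """One transcribed picture-naming response (hypothetical record format)."""
    target: str
    response: Optional[str]   # final word produced; None if no response
    latency_sec: float        # time from picture onset to the final response
    self_corrected: bool = False


def score_response(r: NamingResponse) -> float:
    """1 = correct and prompt; 0.5 = correct but delayed and/or self-corrected; 0 = incorrect."""
    if r.response is None or r.response.strip().lower() != r.target.lower():
        return 0.0
    if r.latency_sec > DELAY_THRESHOLD_SEC or r.self_corrected:
        return 0.5
    return 1.0


def naming_accuracy(responses: List[NamingResponse]) -> float:
    """Total score as a proportion of the maximum (1 point per picture)."""
    return sum(score_response(r) for r in responses) / len(responses)


# Hypothetical responses to four pictures.
example = [
    NamingResponse("tractor", "tractor", 2.0),                 # prompt and correct -> 1
    NamingResponse("windmill", "windmill", 7.5),               # correct after > 5 sec -> 0.5
    NamingResponse("duck", "duck", 3.0, self_corrected=True),  # correct after self-correction -> 0.5
    NamingResponse("hammer", "nail", 1.0),                     # incorrect -> 0
]
print(naming_accuracy(example))  # 0.5
```

Returning a proportion of the maximum score keeps results comparable across word sets of different sizes.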

Comprehensive Assessment
A more complete assessment was carried out at three points in time: for Cohort 1, before GT, after GT, and after ST; for Cohort 2, before ST, after ST, and after GT. Besides picture naming and scene descriptions, we administered the following tests.
The BNT (Kaplan, Goodglass, & Weintraub, 1983). This is a standardized measure of picture naming, making it an effective tool for identifying any generalization of word-retrieval gains to items not directly targeted during the GT protocol.
The CAT (Swinburn, Porter, & Howard, 2004). The CAT provides a comprehensive assessment of language ability, with 27 language and cognition subtests probing semantics (semantic memory, word fluency, visual recognition, and object use with gestures), repetition (of words, nonwords, digit strings, and sentences), comprehension (of spoken and written words, sentences, and paragraphs), spoken production, reading (words, complex words, function words, and nonwords), and writing (copying, picture naming, writing to dictation, and picture description). We used all but one subtest of the language battery, excluding CAT 17 (naming objects), since naming was evaluated with the BNT. We calculated an overall standardized score, substituting the participant's mean for CAT 17. The CAT overall score has a mean of 50 and an SD of 10, based on the performance of a large population of PwA. Baseline language assessments with the CAT were used by a trained SLT to classify aphasia type (see Table 1).
The Cinderella Story Retell (Saffran, Berndt, & Schwartz, 1989). This task is commonly used with PwA to probe narrative production. Participants were asked to retell the well-known story of Cinderella; a picture book with the text blocked out was provided before the retell task to remind them of the story. The task provides a way to assess generalization of therapy gains to connected speech (Conroy et al., 2009; Saffran et al., 1989). Narratives were recorded and transcribed verbatim, including hesitations, false starts, and fillers (e.g., "umm"). We scored the total number of words produced (excluding false starts and fillers), the word rate per minute, the percentage of meaningful words over the total number of words (rate of CIU), and the percentage of syntactic, morphological, phonological, and semantic errors out of the total words produced (see Marini, Andreetta, Del Tin, & Carlomagno, 2011; Nicholas & Brookshire, 1993).
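Once a transcript has been annotated, the four narrative measures reduce to simple tallies over the counted words. The sketch below assumes a hypothetical per-token annotation (`Token`, with flags for excluded fillers/false starts, CIU status, and error type); the CIU and error judgments themselves are made by a trained scorer following Nicholas and Brookshire (1993), and the code only does the arithmetic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    """One transcribed token, annotated by a human scorer (hypothetical format)."""
    text: str
    excluded: bool = False       # false start or filler (e.g., "umm"): not counted as a word
    is_ciu: bool = False         # scorer judged it a correct information unit
    error: Optional[str] = None  # "syntactic", "morphological", "phonological" or "semantic"


def narrative_measures(tokens: List[Token], duration_sec: float) -> Dict[str, float]:
    """Total words, words per minute, % CIU and % errors, all over counted words."""
    words = [t for t in tokens if not t.excluded]
    n = len(words)
    if n == 0 or duration_sec <= 0:
        return {"total_words": 0, "words_per_min": 0.0, "pct_ciu": 0.0, "pct_errors": 0.0}
    return {
        "total_words": n,
        "words_per_min": n * 60.0 / duration_sec,
        "pct_ciu": 100.0 * sum(t.is_ciu for t in words) / n,
        "pct_errors": 100.0 * sum(t.error is not None for t in words) / n,
    }


# Hypothetical 12-second excerpt: "umm the girl go to the ball"
sample = [
    Token("umm", excluded=True),
    Token("the", is_ciu=True),
    Token("girl", is_ciu=True),
    Token("go", error="morphological"),  # scored here as an error and not a CIU
    Token("to", is_ciu=True),
    Token("the", is_ciu=True),
    Token("ball", is_ciu=True),
]
m = narrative_measures(sample, duration_sec=12.0)
# total_words = 6, words_per_min = 30.0, pct_ciu ≈ 83.3, pct_errors ≈ 16.7
```

All percentages share the same denominator (words after excluding fillers and false starts), matching the definitions given above.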
Finally, we administered the Disability Questionnaire from the CAT, which assesses the impact of the impairment on an individual's life from that individual's perspective, with questions such as "What is it like talking to the person closest to you?" and "Does it make you feel frustrated?". Questions are answered using a rating scale.

End of therapy
At the end of the therapy program, participants were invited to provide feedback through an aphasia-friendly questionnaire with 12 questions about issues such as the suitability of the protocol, whether they enjoyed it, their perceived improvements, and whether they found the therapy tiring, and through a focus group involving five participants.

Ethical approval
This study received ethical approval from the NHS Health Research Authority: Coventry and Warwick NRES Committee, REC Reference 15/WM/0210.

Effects of GT on trained words
These effects were assessed for picture naming and scene description. Figure 1 shows performance by time point and word set (trained vs. untrained), with results collapsed across word type (nouns and verbs). There are clear interactions between word set and time, with steep improvements in performance after a word set had received training but not at other times. A number of planned comparisons were run to assess the significance of these results.

Overall analyses
First, to assess the overall effect of GT, we carried out within-subjects ANOVAs with accuracy (rate correct) in either picture naming or scene description as the dependent variable and Time as the independent variable, contrasting Time 2 (T2, before any GT) with Time 5 (T5, immediately after completion of all GT). Performance was significantly better after therapy for both picture naming and scene descriptions (picture naming: F(1,11) = 30.1, p < .001, ηp² = .73; scene description: F(1,9) = 20.2, p = .002, ηp² = .69).
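Partial eta squared can be recovered from any ANOVA F statistic and its degrees of freedom, since ηp² = SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error). The short sketch below applies this standard identity as a consistency check on the effect sizes reported above; the function name is our own.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Picture naming, 12 participants: F(1, 11) = 30.1
print(round(partial_eta_squared(30.1, 1, 11), 2))  # 0.73
# Scene description, 10 participants who completed the task (error df = 9): F = 20.2
print(round(partial_eta_squared(20.2, 1, 9), 2))   # 0.69
```

With an error df of 11, the same F = 20.2 would give ηp² ≈ .65, so the reported .69 corresponds to the 10 participants who completed the scene descriptions.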

Modulation of outcomes by type of therapy
Effects of therapy on picture naming according to type of game (with phonological cues, phonological + gestural cues, or phonological + semantic cues) and type of word (nouns or verbs) are shown in Figures 2 and 3, respectively. Effects were analyzed with within-subjects ANOVAs containing three within-subjects factors: Word-class (nouns vs. verbs), Game-type (P, PG, PS), and Therapy-phase (before GT vs. after GT; T2 vs. T5). There was no main effect of Game-type (F(1.33,14.66) = 0.56, p = .52) and no Game-type × Therapy-phase interaction (F(2,22) = 0.292, p = .75): the effects of therapy were the same regardless of the cueing strategy used in the game. There was also no significant main effect of Word-class (F(1,11) = 26.266, p = .18), with similar gains for nouns and verbs, and no Word-class × Therapy-phase (F(1,11) = 0.002, p = .99) or Game-type × Word-class × Therapy-phase (F(1,11) = .62, p = .56) interactions. One might expect Game PG (stressing gestures) to be particularly beneficial for verbs. However, actions are also closely associated with most concrete nouns. We did not systematically contrast the strength and type of association with gestures for nouns and verbs; instead, we wanted to assess generalized gains across types of words, and establishing possible differences between nouns and verbs was beyond the remit of our study.

Results by participant
Individual participant results are shown in Figure 4. Separate panels show outcomes for trained words immediately after therapy, 4-7 weeks after therapy (short-term maintenance), and 5-6 months after therapy (long-term maintenance). Immediately after GT, gains were significant in 9/12 participants in picture naming and in 9/10 participants in scene description. No significant improvements were seen in participants P8 and P9, who had very severe impairments with a floor effect at baseline. A third participant, P1, showed no significant effect in picture naming but a significant effect in scene description. P9 and P10 were not tested with scene description because they were unable to complete the task. After 4-7 weeks, 9/12 participants in picture naming and 8/10 participants in scene description showed significant improvement compared to baseline. After 6 months, 5/6 participants tested (Cohort 1) showed significant gains compared to baseline; only P4 showed no difference.
3.1.5. Effect of experimental and demographic variables (cohort, age, and time post onset)
To examine a possible effect of cohort, we carried out mixed ANOVAs with the number of words produced correctly in picture naming and scene descriptions as dependent variables, Cohort (Cohort 1, with ST after GT, vs. Cohort 2, with ST before GT) as a between-subjects factor, and Therapy-phase (T2/before GT vs. T5/after GT) as a within-subjects factor. There was no significant main effect of Cohort (picture naming: F(1,10) = .117, p = .74; scene description: F(1,8) = 0.08, p = .78) and no significant interaction between Cohort and Therapy-phase (picture naming: F(1,10) = .006, p = .94; scene description: F(1,8) = 0.5, p = .50). Further studies with well-matched cohorts are needed to properly assess the advantages of delivering ST and GT in different orders. Most importantly, significant improvements were shown across participants: there was no significant correlation between the degree of improvement immediately after GT and either age (Pearson's r = .09, p = .79) or months post onset (Pearson's r = .32, p = .31), although these correlations are based on small samples.

General effects on language functions
Generalization of gains from GT was assessed by comparing performance before and after GT (T2 vs. T5) on the CAT, the BNT, untrained words in picture naming, and measures of narrative production. Results are shown in Figure 5. Narrative measures were collapsed across the Cinderella story and the scene descriptions. There were no significant group differences on the CAT (even considering individual subtests) or in the naming of untreated words. There was, however, a significant improvement on the BNT with a one-tailed t-test (t(11) = 1.75, p = .05), as well as significant improvements in narrative production in terms of overall number of words produced (t(9) = 2.68, p = .03) and percent of CIU (t(9) = 2.69, p = .03). Error rate and word rate per minute did not change (error rate before GT: mean 53.2%, SD 34.6%; after GT: mean 56.1%, SD 28.8%; t(9) = 0.466, p = .65; word rate before GT: mean 136.6, SD 156.7; after GT: mean 158.9, SD 141.7; t(9) = 1.130, p = .29). The presence of significant generalization from picture naming to connected speech at the group level is encouraging.
At the individual level, only P11 showed significant improvement across tasks and measures. He showed gains in producing untrained words in picture naming and in the scene descriptions, as well as improvements on the BNT and in percent of CIU in narrative production. Since he was the participant with the most recent stroke (12 weeks poststroke when he entered our study), his gains could have been boosted by spontaneous recovery. However, P11 received ST first, for 3 months, and showed no improvement within that period. This lack of improvement does not necessarily indicate that ST was ineffective, as he received very little of it (9 hr). Moreover, P11 was initially very anxious and distressed by his condition, and this may have affected the assessments. Nevertheless, the contrast between the lack of gains within the first 3 months and the significant gains obtained later with GT suggests that these gains were not simply due to spontaneous recovery. P11 really enjoyed the games and relaxed during the course of GT, and was therefore able to take full advantage of the practice provided.
Figure 5. Outcomes compare performance before any game therapy (T2) with performance at the end of all game therapy (T5). Asterisks mark significant differences evaluated with chi-square for individual participants and with paired, one-tailed t-tests for the group (MEAN). As there is no set total for the CAT or word count in narrative production, individual chi-squares could not be performed for these measures.
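Individual significance was evaluated with chi-square tests on items with a fixed total. The sketch below shows one way such a test can be computed for a single participant, as a Pearson chi-square on a 2 × 2 table of correct vs. incorrect items before and after therapy. The source does not say whether a continuity correction was applied, how 0.5-point responses were counted, or whether a paired alternative such as McNemar's test was considered; here we assume binarized counts and no correction. The function name and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(correct_before: int, n_before: int,
                   correct_after: int, n_after: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) on a 2x2 table of
    correct/incorrect counts before vs. after therapy; returns (chi2, p), df = 1."""
    table = [
        [correct_before, n_before - correct_before],
        [correct_after, n_after - correct_after],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:  # an empty row/column contributes nothing
                chi2 += (table[i][j] - expected) ** 2 / expected
    # With df = 1, chi2 is a squared standard normal, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Hypothetical participant: 15/60 trained words correct before GT, 35/60 after.
chi2, p = chi_square_2x2(15, 60, 35, 60)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 13.71, p = 0.0002
```

The closed-form p-value via `math.erfc` is exact only for one degree of freedom, which is all a 2 × 2 table needs; larger tables would require a general chi-square distribution function.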

Effects of ST vs. GT
The effects of ST are shown in Figure 6, which reports performance on experimental words in picture naming and scene description and performance on the BNT and the CAT before and after ST. Here, we wanted to assess any positive effect of ST and to compare benefits on standardized tests such as the BNT and the CAT with those obtained with GT. Note, however, that our results cannot offer more than a rough indication of outcomes, since the type and amount of ST were so variable from one participant to another. Results were analyzed using mixed ANOVAs with language performance in the different tasks as the dependent variable, Therapy-phase (before vs. after ST) as a within-subjects factor, and Cohort (1 vs. 2) as a between-subjects factor.
Figure 6. Individual outcomes after standard therapy (ST). Asterisks mark significant differences evaluated with chi-square for individual participants and with paired, one-tailed t-tests for the group (MEAN). As there is no set total for the CAT or word count in narrative production, individual chi-squares could not be performed for these measures.

Satisfaction with therapy
Our therapy and therapy schedule were very well accepted by participants. Overall, 93.6% (SD = 9.9) of scheduled GT hours were attended, and attendance declined only modestly over the three games (percentage of scheduled hours attended: Game P = 99.3; Game PG = 96.8; Game PS = 84.7).
The CAT disability questionnaire did not show any differences in self-assessed disability after either GT or ST. However, the therapy satisfaction questionnaire administered at the end of the protocol yielded very positive feedback for GT, with participants' scores averaging 4.76/5, indicating very strong satisfaction with the therapy. All participants either agreed or strongly agreed that GT increased their confidence and was enjoyable. All participants reported an improvement in their talking, and 9/12 expressed a preference for GT over ST, with the remaining three expressing no preference either way.
Participants in the focus groups highlighted how playing the games was useful, enjoyable, and helpful. They also noted that it was good to meet other people with language difficulties, and that the teamwork and mutual support during the games made this type of intervention preferable to therapy delivered one-to-one.

General discussion
The aim of our study was to pilot a therapy intervention for poststroke aphasia which combined, in a novel way, ingredients known to be effective in aphasia rehabilitation. We strove to devise an intervention which (1) allowed high-intensity practice at reduced professional cost while maintaining high engagement; (2) focused on treating language impairments but also improved social interaction and confidence; (3) incorporated evidence-based rehabilitation techniques; and (4) was suitable for most PwA experiencing moderate-to-severe difficulties.
Our solution was an intervention focused on word retrieval which used tasks (picture naming and repetition) and cueing techniques (phonological, gestural, semantic) of proven efficacy in aphasia rehabilitation, but incorporated them within the setting of a team game. This aspect of the therapy was crucial in achieving many of the characteristics we were aiming for. It lowered costs, since a single SLT could supervise therapy for several PwA at the same time (six in our case). It made the therapy more enjoyable than traditional approaches, which in turn maintained high motivation throughout the intervention; this is especially important for prolonged, high-intensity therapy. Finally, it addressed the need for greater social support and interaction: playing in teams created excitement and increased cohesion as participants worked together toward a common goal.
Our approach is not the first attempt to deliver aphasia therapy in the context of a game. CIAT/CILT are popular protocols which adopt a game approach and show benefits (e.g., Zhang et al., 2017). Our intervention, however, has novel aspects. It stresses the social aspect of the game more than CIAT/CILT by allowing more participants to play at once, split into teams, thus increasing the social and motivating aspects of the game. Additionally, it systematically incorporates cueing techniques which have proven efficacy and are commonly used by SLTs in face-to-face therapy. These cueing techniques should not only facilitate retrieval but also strengthen links to phonological representations (through phonological cueing) and semantic representations (through both semantic cues and gestures).
Our results are encouraging. Our intervention was very well tolerated, with high rates of attendance. It was also enjoyed by all participants, who often preferred it to one-to-one ST. Language gains were significant, widespread across participants, maintained over time, and demonstrable across different tasks, suggesting benefits to functional communication. Gains immediately after GT averaged 25% in picture naming and 17% in the scene descriptions, which is close to or above the level of 20% proposed as clinically relevant by Ramsberger and Marie (2007). Across the two tasks, gains were maintained long-term (6 months after therapy), with an average improvement of 14.5% from baseline. Considering picture naming and scene descriptions together, all but two participants (10/12) showed significant gains. The two participants showing no improvement had very severe impairments, with a floor effect at baseline, and showed no improvement in spite of good engagement with the intervention. These participants may need more time to show benefits or may not have enough neurological resources left to support recovery (see also Sul et al., 2016, for less or slower recovery in global aphasia).
Importantly, our game intervention produced significant gains with materials other than those directly used in therapy. Participants used the trained words better in connected speech when asked to describe pictured scenes constructed to elicit those words. Moreover, there were gains in narrative production in terms of both the overall number of words produced and the rate of meaningful words produced (CIU). These behavioral gains align with participants' self-perceived improvements in talking. Our results are consistent with previous studies which have demonstrated significant benefits of practicing picture naming on functional communication (Conroy et al., 2009) and a strong association between the ability to produce words in picture naming and in connected speech (Herbert et al., 2008). In contrast, we found no gains in picture naming for untreated (experimental) words and only marginal gains on the BNT. Picture naming therapies typically do not produce gains in these conditions (e.g., see Best et al., 2013; Nickels, 2002; Raymer et al., 2007). Gains for untreated words may be more difficult to demonstrate in conditions where production is very constrained, with no leeway in the choice of words. Finally, our GT produced no gains on the CAT. This is not surprising: we trained word production and expected gains to be selective to this domain. In contrast (and pleasingly), gains on the CAT were seen after ST, where SLTs worked to improve their clients' communication across domains. We are currently developing more elaborate group game-based approaches to train more integrated aspects of communication, using games in which participants practice not only picture naming but also requests in everyday situations (e.g., at the café, at the doctor, at the post office), which should boost gains in functional communication.
Our results compare well with gains reported in the literature for other forms of picture naming therapy treating aphasic participants singly or in pairs. We searched the literature using combinations of the following keywords: aphas* or anomi* AND therap* AND naming or "word retrieval" or constraint. We reviewed 19 studies and 22 therapy comparisons which reported the number or percentage of words gained after therapy, as well as crucial treatment parameters such as number of hours and duration of treatment. Sixteen studies involved a one-to-one intervention, six involved treating participants in pairs, and one involved both a one-to-one and a group intervention. On average, studies treated a limited number of participants (N = 5.5 per study; SD = 4; overall N = 122); therapy involved 15 hr (SD = 9), lasted 25.5 days (SD = 15), and treated 57 words (SD = 29). Treated words showed a 31% increase in number correct (SD = 15), with, on average, 17 words gained after therapy (SD = 11). Our study involved more participants (N = 12) and more therapy hours (54 hr), lasted longer (42 days), and treated many more words (180); treated words showed a 25% increase in number correct, with 45 words gained after therapy.
Comparing the efficacy of different forms of therapy is not straightforward, but two criteria are relevant: the number of words gained and the effort involved (number of hours of therapy). Thus, a rough measure of therapy efficacy is the number of words gained per hour of therapy. By this measure, the reviewed studies returned 1.7 words gained per hour of therapy, compared to 0.8 words in our case. Our figure is lower; however, we treated a much larger number of words than most studies, since this is important for improving functional communication, and gains are likely to be harder to achieve the larger the number of words treated. Moreover, while in one-to-one therapy the hours engaged by the client and by the therapist coincide, this is not the case in group therapy, where a single therapist treats several clients simultaneously (six in our case; thus, an SLT would spend 1/6 of the time required for one-to-one treatment). There are therefore potential cost savings with a group approach. Finally, the enjoyment and social interaction offered by social games may well produce emotional gains not elicited by one-to-one approaches. Future studies should compare group game therapy more directly with matched forms of picture naming therapy delivered one-to-one, in terms of language gains, satisfaction with the intervention, and emotional gains.
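The words-per-hour comparison is simple arithmetic, sketched below with the figures reported for this study (180 treated words, a 25% gain, 54 scheduled hours, six participants per SLT). The per-SLT-hour figure is our illustrative reframing of the "1/6 of the time" argument, not a number reported in the study; it assumes the SLT's hours are shared equally among the six participants and ignores attendance (93.6% overall) and preparation time. Names are hypothetical.

```python
def words_gained_per_hour(words_gained: float, hours: float) -> float:
    """Rough efficacy index: words gained divided by hours of therapy."""
    return words_gained / hours


TREATED_WORDS = 180
GAIN_PROPORTION = 0.25      # 25% increase in number correct for treated words
SCHEDULED_HOURS = 54        # GT hours per participant (3 days/week for 6 weeks)
GROUP_SIZE = 6              # participants supervised simultaneously by one SLT

words_gained = TREATED_WORDS * GAIN_PROPORTION                           # 45 words
per_client_hour = words_gained_per_hour(words_gained, SCHEDULED_HOURS)    # about 0.83
slt_hours_per_client = SCHEDULED_HOURS / GROUP_SIZE                       # 9 hr of SLT time
per_slt_hour = words_gained_per_hour(words_gained, slt_hours_per_client)  # 5.0

print(f"{per_client_hour:.1f} words per client hour; {per_slt_hour:.1f} per SLT hour")
# 0.8 words per client hour; 5.0 per SLT hour
```

For one-to-one therapy the two indices coincide, which is why the reviewed studies' 1.7 words per hour applies to both client and therapist time.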
We found no difference in efficacy depending on the type of facilitation cues used during the games. This result is consistent with others in the literature showing similar benefits across types of facilitation techniques (Greenwald et al., 1995; Holland et al., 2018; Neumann, 2018; Stimley & Noll, 1991; Wambaugh, 2003; Wambaugh et al., 2001). This does not mean that all types of facilitation are equally effective for PwA with different kinds of impairment (although relationships are not always transparent; see Boo & Rose, 2011; Kroenke et al., 2013; Lorenz & Ziegler, 2009). In a mixed group, however, it is not surprising to see no differences in average benefits, and our study did not have the power to differentiate between types of impairment. When team language games are applied in a clinical setting, we would favor an inclusive approach in which people with different types of impairment are treated together, but facilitation is used flexibly by the game leader depending on the individual participant. This would be consistent with recent studies which have combined different types of cueing techniques during therapy with positive outcomes (Carragher, Conroy, Sage, & Wilkinson, 2012; Drew & Thompson, 1999; Hashimoto, 2012; Le Dorze, Boulay, Gaudreau, & Brassard, 1994; Rose, Attard, Mok, Lanyon, & Foster, 2013; for evidence that using multiple cueing techniques results in increased gains, see also Greenwald et al., 1995; Lorenz & Ziegler, 2009; Wambaugh, 2003; Wambaugh et al., 2001; Rose, Raymer, Lanyon, & Attard, 2013).

Conclusions
We found positive outcomes for a new game-based, group rehabilitation intervention targeting word production difficulties in individuals with poststroke aphasia. Our results suggest that interventions like ours, which combine theoretically and empirically motivated techniques with the social and motivating aspects of a game, are a positive way to supplement one-to-one therapy delivered in resource-stretched national health systems. We specifically targeted word production difficulties, but there is no reason why a similar approach based on social/team games could not be extended to other aspects of language (sentence production, for example) and thus become appropriate for PwA with a wider set of needs. We are not advocating that interventions of the type assessed here should substitute for one-to-one therapy delivered by professional SLTs. However, they can be a valuable means of increasing practice, allowing patients to work on areas of special difficulty, consolidate gains, and enjoy social interactions in a safe and supportive environment.

Disclosure statement
No potential conflict of interest was reported by the authors.

Scene 2: In the countryside
There is a tractor by the windmill. Man parachuting under rainbow. Man hammering nail on bench. Boy shivering wearing jacket, turning collar up, button hanging off. Boys feeding ducks and fishing, wearing scarf. Chickens following mother. Girl watching egg hatching. Dog burying bone, dog barking.

Scene 3: At a concert
Band playing guitar trumpet and piano. Woman singing at microphone. Man conducting, woman curtseying. People on balcony arguing, got camera. Listening to music, leaning against wall, got bottle, see shadow. Give ticket, shaking hands. People clapping and dancing.