Training flexible conceptual retrieval in stroke aphasia

Semantic therapy in post-stroke aphasia typically focusses on strengthening links between conceptual representations and their lexical-articulatory forms to aid word retrieval. However, research has shown that semantic deficits in this group can affect both verbal and non-verbal tasks, particularly in patients with deregulated retrieval as opposed to degraded knowledge. This study therefore aimed to facilitate semantic cognition in a sample of such patients with post-stroke semantic aphasia (SA) by training the identification of both strong and weak semantic associations and providing explicit pictorial feedback that demonstrated both common and more unusual ways of linking concepts together. We assessed the effects of this training on (i) trained and untrained items; and (ii) trained and untrained tasks in eleven individuals with SA. In the training task, the SA group showed improvement with practice, particularly for trained items. A similar untrained task using pictorial stimuli (Camel and Cactus Test) also improved. Together, these results suggest that semantic training can be beneficial in patients with SA and may show some degree of generalisation to untrained situations. Future research should seek to understand which patients are most likely to benefit from this type of training.


Introduction
Research has shown that semantic deficits arise in at least three ways: there can be difficulty activating heteromodal concepts from specific input modalities, degradation of heteromodal knowledge itself, or impairment of the control processes that support access to non-dominant aspects of knowledge. These different deficits might benefit from different types of intervention. The first pattern is seen in patients with post-stroke aphasia who have difficulty accessing conceptual meaning from language yet good understanding of pictures, as in pure word deafness and Wernicke's aphasia (Robson et al., 2012; Thompson et al., 2015). Understanding the meaning of visual objects can also be specifically disrupted after posterior cerebral artery infarcts (Roberts et al., 2013). These patients are likely to benefit from compensatory strategies that maximize their use of preserved input pathways.
Degradation of heteromodal concepts, in contrast, results in multimodal semantic impairment, affecting both verbal and non-verbal stimuli. Atrophy of the ventrolateral anterior temporal lobes (seen in semantic dementia, SD) leads to progressive degradation of semantic knowledge. SD patients show loss of specific and less familiar items first, and consistent performance across different tasks probing the same concepts (Jefferies & Lambon Ralph, 2006; Lambon Ralph et al., 2010). These patients show some benefits in training studies focussed on relearning conceptual distinctions, as long as the training is continued, potentially reflecting the fact that the anterior temporal lobes (ATLs) can support patterns of relearning despite degradation (Bier et al., 2009; Heredia et al., 2009; Hoffman et al., 2015; Jokel et al., 2006; Jokel et al., 2010; Mayberry, Sage, Ehsan, et al., 2011; Reilly et al., 2010; Savage et al., 2013).
Heteromodal semantic impairment does not always reflect degraded knowledge, however. Work by our group and others (Jefferies, 2013; Jefferies & Lambon Ralph, 2006; Lambon Ralph et al., 2017; Thompson-Schill, 2003) shows that semantic deficits following left-hemisphere stroke can also reflect difficulty constraining retrieval so that it is appropriate to the context or task. We have referred to this pattern as "semantic aphasia" (SA), since it affects both verbal and non-verbal manipulations of semantic knowledge, including picture matching and object use (Corbett, Jefferies, Ehsan, et al., 2009; Lambon Ralph, 2009, 2011). SA patients are thought to have impaired executive control processes that "shape" conceptual retrieval, following damage to left inferior frontal and/or posterior temporal regions, in the face of intact conceptual knowledge and brain damage that spares ventrolateral ATL. This causes greater impairment when non-dominant information needs to be retrieved, or when strong distractors need to be inhibited (cf. Whitney et al., 2011). There has been little attempt to design training or rehabilitation strategies for these patients based on this theoretical framework, although we might expect approaches that provide practice in retrieving a range of different kinds of association (including non-dominant aspects of knowledge) to be most successful in promoting flexible patterns of semantic cognition.
Many studies employing training tasks in post-stroke aphasia have focussed on picture naming (Kiran & Bassetto, 2008) rather than on cognitive or semantic control. Studies employing picture naming tasks tend to show a clear benefit for items that are trained multiple times, but weak generalization to untrained items (Davis & Pring, 1991; Marshall et al., 1990; Pring et al., 1993). This suggests that such training strengthens lexical-articulatory forms, or the links between these representations and the conceptual features activated by the picture. Semantic approaches to aphasia therapy also typically target speech production, but seek to drive improvements through greater accessibility of the semantic features that converge on the target concept, allowing activation of the required lexical item (for a review, see Efstratiadou et al., 2018). In Semantic Feature Analysis (SFA), devised by Ylvisaker and Szekeres (1985), clients are asked to generate (or, in some variants, verify) semantic features for concrete nouns, including superordinate category membership, properties such as colour or shape, actions, locations and associations. Meta-analyses and case-series studies suggest that SFA is generally successful at cueing lexical retrieval in picture naming tasks in post-stroke aphasia (e.g., Boyle, 2010; Maddy et al., 2014): for example, Efstratiadou et al. (2018) performed a meta-analysis of 21 studies and 55 participants and found improvements in naming in 45 individuals (with 40% of the sample showing generalization to untrained items). SFA activates the central or dominant features and associations of each item. Consequently, while it can facilitate picture naming, this approach may not be optimal for improving comprehension in SA, since these patients are able to retrieve semantic information in dominant contexts but show reduced flexibility when weak or subordinate knowledge is required to suit the current goal or context (e.g., Noonan et al., 2010).
A related approach, Verb Network Strengthening Treatment (VNeST; Edmonds et al., 2009), involves activating when/where/why information, together with agents and recipients for verbs. There is some evidence that VNeST can produce improvements in sentence production that generalize to untrained items (e.g., Edmonds et al., 2009; Edmonds et al., 2014), although again, effects on semantic tasks have not been widely explored, and this approach is not aimed at the retrieval of non-dominant information.
Efforts to support semantic processing in post-stroke aphasia have been largely unsuccessful in people with poor comprehension at the single-item level (e.g., Van Hees et al., 2013), and fewer investigations have attempted to ameliorate these comprehension problems, perhaps because, as noted previously, semantic deficits in aphasia are often accompanied by broader deficits of cognitive control (Baldo et al., 2015; Jefferies & Lambon Ralph, 2006; Purdy, 2002; Thompson et al., 2018). Executive control is often impaired in people with post-stroke aphasia (especially in those with more significant impairment; Glosser & Goodglass, 1990; Purdy, 2002), and its preservation is thought to be necessary for strong recovery of language after stroke (Geranmayeh et al., 2017). People with poor cognitive control respond less well to conventional speech and language therapy (Fillingham et al., 2005a, 2005b, 2006; for a systematic literature review see Simic et al., 2019). This might be because such individuals are less able to allocate and maintain attention to the training task, and/or because their primary difficulty is not a weakness in any specific type of language or conceptual representation that can be overcome through practice. In fact, massed practice at retrieving the same specific meanings or associations would arguably be unhelpful in people with SA, who have deregulated semantic cognition, since their primary problem appears to be the flexible retrieval of diverse information about the same concept at different times, depending on the context. A more successful approach might involve helping patients to access a wide range of associations, some relatively strong and some weaker, depending on the semantic decision to be made.
The capacity to control mental activity flexibly, to suit the changing demands of a task, is highly relevant to communication and comprehension. For example, it can be necessary to focus on the subordinate meanings of ambiguous words, or on specific task-relevant meanings in the context of strong distracting information. Semantic control areas, most notably left inferior frontal gyrus (LIFG) and left posterior middle temporal gyrus, are activated in healthy participants by a range of semantic control manipulations, including the contrast of hard and easy semantic judgements (Badre & Wagner, 2007; Noonan et al., 2013; Thompson-Schill et al., 1997). This semantic control network partially overlaps with "multiple demand cortex", which supports cognitive control across tasks (Davey et al., 2016; Duncan, 2010). Patients with SA, who have damage focused on the left-lateralised semantic control network, have particular problems accessing non-dominant semantic features and associations, and typically also have executive deficits on non-semantic tasks (Noonan et al., 2010). In a recent study, we found that SA patients with lesions to LIFG showed increased recruitment of undamaged nodes within the semantic control network during the auditory presentation of ambiguous sentences; this pattern is consistent with functional compensation in non-lesioned parts of this network. Similarly, another study found that the fMRI response to language stimuli in aphasia resembles the response evoked by hard-to-comprehend material in healthy controls, while the ability to activate cognitive control regions is predictive of recovery (Geranmayeh et al., 2017). Evidence from patients with traumatic brain injury suggests that cognitive control training is more effective than knowledge-based training (Vas et al., 2016) and promotes increased connectivity in multiple-demand control regions (Han et al., 2018).
Given these observations, cognitive training might benefit people with post-stroke aphasia if it can strengthen the engagement of control mechanisms within language and semantic tasks.
In this study, we trained the retrieval of diverse types of association to improve comprehension in patients with SA. Although we examined SA patients in this study, our approach might be applicable to any group with deregulated semantic retrieval, in which heteromodal comprehension is impaired as a consequence of poor control (such as patients with lesions in key semantic control areas following non-stroke aetiologies). Participants were asked to decide which word was associated with a probe word, and the associations to be retrieved ranged in associative strength from weak to strong. On each trial, participants were helped to understand the relevant association through feedback and a linking photograph that captured the association in a concrete way. We presented novel training items within each session to encourage flexibility, but a subset of the items was also repeated across time points. In this way, we could examine the extent to which any training effect generalized to untrained items.

Participants
Eleven patients [7 females; mean age = 61.1 years (SD = 11.3); mean age at leaving education = 16.5 years (SD = 1.35); mean years since CVA = 7.9 (SD = 5.32)] with chronic aphasia following left-hemisphere CVA were recruited from stroke and communication support groups in Yorkshire, UK. Demographic details are reported in Table 1. Patients were selected to show multimodal semantic control impairment (see section "Inclusion criteria"). Besides their multimodal semantic impairment, the patients had a range of other language impairments (e.g., deficits in repetition and fluency of speech), although their comprehension problems could not be entirely accounted for in these terms. None of the patients was undergoing a structured course of individual or group therapy for comprehension deficits during the study, although one patient (MB) was using React2, a computerized self-guided naming therapy. This participant had been using React2 regularly for many years, making it unlikely that changes over the course of our two-week training could be attributed to React2.

Inclusion criteria
In line with the original use of the term "semantic aphasia" by Henry Head (1926) and the inclusion criteria proposed by Jefferies and Lambon Ralph (2006), the patients in this study were selected to show deficits affecting the appropriate use of concepts presented as words and objects when control demands were high. In addition to verbal semantic problems, they were impaired on at least one non-verbal task (see section "Tests of semantic control"). The sample size was determined by the maximum number of patients available to take part in the study. These criteria for including participants were established prior to data collection. There were no other inclusion/exclusion criteria. In common with previous SA samples (e.g., Jefferies & Lambon Ralph, 2006;Stampacchia et al., 2018), the patients showed strong effects of semantic control manipulations across tasks (details below). Individual patient data and task descriptions are provided in section "Tests of semantic control".

Lesion analysis
MRI scans were traced onto standardized templates (Damasio & Damasio, 1989) and lesion identification was performed manually (see Table 2 and Figure 1 for the lesion overlay). All eleven patients had lesions affecting the left posterior LIFG; in eight cases this damage extended to mid-to-anterior LIFG. Parietal regions (supramarginal gyrus and/or angular gyrus) were also affected in 9 out of 11 cases, and posterior middle temporal gyrus (pMTG) was affected in all but four cases. While there was some damage to ATL in 4 patients (SD, KQ, KA, VN), the ventral portion of ATL, which has been implicated in conceptual representation across modalities (Binney et al., 2012; Visser et al., 2012), was intact in all cases. This region is supplied by both the anterior temporal cortical artery of the middle cerebral artery and the anterior temporal branch of the distal posterior cerebral artery, reducing its vulnerability to stroke (Borden, 2006; Conn, 2008; Phan et al., 2005). The hippocampus and parahippocampal gyrus were intact in all patients.
Figure 1. Lesion overlay: patients' brains compared with age-matched controls. Grey matter, white matter and CSF were segmented, and changes from the healthy control brains were highlighted as "lesion" using automated methods (Seghier et al., 2008). The colour bar indicates the amount of overlap, from 1 to 11 patients.
Table 2 note: MRI scans were manually traced onto Damasio templates. Lesion size* was calculated as % of the template damaged. For areas not comprehensively characterized by Damasio templates, analyses were combined with manual analysis of the structural scan with the help of a trained radiographer. Quantification of lesion: 2 = complete destruction/serious damage to cortical grey matter; 1 = partial destruction/mild damage to cortical grey matter; "-" = intact.
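The Table 2 note defines lesion size as the percentage of the template that is damaged. As a minimal sketch, assuming the template and the traced lesion are each represented as flat 0/1 masks over the same grid (a hypothetical representation; the study traced scans manually onto Damasio templates), the percentage is the number of lesioned cells inside the template divided by all template cells:

```python
def lesion_percentage(template_mask, lesion_mask):
    """Percentage of template cells that are marked as lesioned.

    Both arguments are equal-length sequences of 0/1 values; this flat-mask
    format is an illustrative assumption, not the authors' data format.
    """
    if len(template_mask) != len(lesion_mask):
        raise ValueError("masks must cover the same grid")
    # Keep only the lesion flags for cells that belong to the template.
    in_template = [lesioned for inside, lesioned in zip(template_mask, lesion_mask) if inside]
    if not in_template:
        raise ValueError("template region is empty")
    damaged = sum(1 for lesioned in in_template if lesioned)
    return 100.0 * damaged / len(in_template)


print(lesion_percentage([1, 1, 1, 1, 0], [1, 0, 1, 0, 1]))
```

In the usage line, the lesioned cell that falls outside the template is ignored, so two of four template cells are damaged and the result is 50.0. The 0/1/2 severity ratings in Table 2 were clinical judgements and are not modelled here.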

Open access and declarations
The conditions of our ethical approval do not permit public archiving of brain data, because participants did not provide sufficient consent. Researchers who wish to access the data should contact the Research Ethics Committee of the Department of Psychology, University of York, or the corresponding author. Sufficient data to replicate all results reported in the paper will be released to researchers, subject to the approval of the Research Ethics Committee of the Department of Psychology, University of York, when this is possible under the terms of the GDPR (General Data Protection Regulation EU 2016/679). Behavioural data are provided on the Open Science Framework (https://osf.io/2vuhk). Digital study materials (i.e., experimental scripts and pictorial stimuli, as described in the following sections) are provided on the Open Science Framework (https://osf.io/4ebgr/). The background neuropsychological materials are not provided on OSF, since these include published and copyrighted tests and were administered as paper-and-pencil tests. Researchers who wish to access these materials should contact the corresponding author. Analysis code for the behavioural data is provided on the Open Science Framework (https://osf.io/gh9qz).
No part of the study procedures and analyses was pre-registered prior to the research being conducted. All manipulations and measures of this study are reported in the following sections.

Non-semantic tests
Individual scores are reported in Table 1. To characterize language processing, we examined word repetition (Test 9 from the Psycholinguistic Assessments of Language Processing in Aphasia, PALPA; Kay et al., 1992) and words per minute on the Cookie Theft picture description task (BDAE; Goodglass & Kaplan, 1983). Four patients showed severe impairment of repetition, while one had a milder impairment. Three of these four individuals were also unable to produce speech in the Cookie Theft picture description task, and three additional cases showed reduced speech fluency. Digit span was impaired in six patients. We assessed executive function and non-verbal reasoning with Raven's Coloured Progressive Matrices (Raven, 1962) and the Brixton rule attainment test (Burgess & Shallice, 1997).
Eight of the group showed deficits on at least one of these assessments, in line with previous studies which found that deregulated semantic cognition correlated with executive dysfunction in stroke aphasia (Jefferies & Lambon Ralph, 2006;Noonan et al., 2010;Thompson et al., 2018).

Cambridge semantic battery
This battery assesses semantic retrieval for a set of 64 items across tasks (Adlam et al., 2010; Bozeat et al., 2000), including picture naming, word-picture matching, and verbal and pictorial semantic associations (Camel and Cactus Test). Word-picture matching involved an array of ten semantically related items, while the association judgements required a probe to be matched with one of four response options, presented as either pictures or words (in written form and also spoken aloud by the researcher). In line with their varying language output impairment, patients showed large variability in picture naming [percentage correct M(SD) = 58% (40.3)]. In contrast, performance was uniformly close to ceiling in word-picture matching [M(SD) = 95.9% (5.2)]. Performance was poorer on the Camel and Cactus Test, which has higher control demands, and there was no difference across modalities [words M(SD) = 79.4% (15.7); pictures M(SD) = 80.4% (14.5)]. Individual test scores are provided in Table 3. All but one of the patients (DF) showed some impairment on this standard semantic battery.

Tests of semantic control
Four tasks manipulated control demands. All of the patients were below the normal cut-off on both verbal tasks and non-verbal judgements. Individual scores are reported in Table 3.

Ambiguity task
This task probed the dominant (MONEY) and subordinate (RIVER) meanings of ambiguous words (e.g., BANK) in a four-alternative forced-choice task (Noonan et al., 2010). On some trials, there were sentence cues (e.g., for MONEY, I WENT TO SEE THE BANK MANAGER) or miscues that related to the irrelevant interpretation (e.g., THE BANK WAS SLIPPERY). All the patients were below the normal cut-off in all conditions, and showed higher performance in the dominant than the subordinate condition, and higher performance following cues than miscues (with the exception of VN and PV, who were not tested with cues and miscues).

Object use task
This task required patients to select an object to accomplish a task (e.g., bash a nail into wood), with all items represented as photographs (Corbett et al., 2011). The target was either a canonical tool normally used to complete the task (e.g., HAMMER) or an alternative non-canonical option (e.g., BRICK), presented among a set of five unsuitable distractors; selecting the non-canonical option requires suppression of the dominant but currently irrelevant use of the object. All of the patients (except JI, who was below the normal range on the picture Camel and Cactus Test) were more impaired at selecting non-canonical targets [canonical M(SD) = 92.4 (7.5) vs. alternative M(SD) = 61.7 (19.4); t(10) = 7.70, p < .001]. As a group, they performed more poorly on non-canonical targets than controls, who were not asked to select canonical targets because of ceiling effects: t(10.6) = 5.99, p < .001 (control data from Corbett et al., 2011).
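The fractional degrees of freedom in the patient-versus-control comparison, t(10.6), are what an unequal-variance (Welch) t-test produces; the text does not name the test, so this is our reading rather than a stated method. A minimal standard-library sketch of the Welch statistic and its Welch-Satterthwaite degrees of freedom:

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df.

    a, b: scores for two independent groups (e.g. patients vs. controls).
    statistics.variance uses the n - 1 (sample) denominator.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


t, df = welch_t([1, 2, 3], [4, 5, 6])
print(t, df)
```

The Welch df always lies between min(n1, n2) - 1 and n1 + n2 - 2; with equal group sizes and equal variances (as in the toy call) it reaches the upper bound, and unequal variances pull it down, which is how non-integer values arise.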

Synonym tasks
(i) Frequency effects in 96-item synonym judgement: In this task, administered to all patients except VN, a probe word was presented with three response options. The words on each trial varied in lexical frequency and imageability (full task details in Jefferies et al., 2009). Patients with semantic aphasia, in common with those with "access" impairment, typically do not show sensitivity to frequency (Hoffman et al., 2011; Jefferies et al., 2007; Thompson et al., 2015; Warrington & Cipolotti, 1996), unlike semantic dementia patients with "storage" impairment.
In summary, we selected patients with multimodal semantic deficits following left-hemisphere stroke to take part in this study, since previous work has shown that patients with this profile have deregulated semantic cognition, typically associated with damage to key regions implicated in semantic control, particularly left inferior frontal gyrus (e.g., Jefferies & Lambon Ralph, 2006; Noonan et al., 2010). All eleven patients in this investigation had damage to this region, which studies using inhibitory transcranial magnetic stimulation in healthy participants have causally implicated in the control of semantic retrieval (e.g., Whitney et al., 2011). In contrast, ventral ATL, which studies of semantic dementia implicate in heteromodal semantic representation, was intact: this watershed site is rarely damaged in stroke patients (e.g., Payabvash et al., 2011). Consequently, patients with heteromodal semantic deficits following left-hemisphere stroke are thought to have semantic "access" deficits that disrupt the ability to flexibly retrieve relevant information to suit the current goals or context. In line with expectations for semantic control impairment, the SA patients in our study, like previous samples, were impaired at retrieving non-dominant aspects of meaning across verbal and non-verbal tasks (Corbett et al., 2011; Jefferies & Lambon Ralph, 2006; Noonan et al., 2010).
This pattern was seen near-universally, even in patient VN, for whom we had limited data. The patients showed attenuated effects of word frequency on the synonym judgement task compared with a sample of semantic dementia patients examined previously, in line with the profile for semantic "access" deficits. They also showed strong sensitivity to manipulations of semantic control. Difficulties in retrieving weak and non-dominant aspects of knowledge could reflect either loss of this knowledge or difficulty in constraining retrieval to suit the circumstances. In this context, it is notable that the SA patients were vulnerable to miscuing effects, since it is not trivial to explain how these could arise in the absence of control impairment. Using factor analysis, a composite score reflecting each patient's overall semantic control ability was derived from the Camel and Cactus Tests, the object use task and the ambiguity task without cues (i.e., the semantic control tests that were administered to all participants). Patients are ordered by this composite score in all graphs and tables.
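The composite was derived with factor analysis, and the exact model is not reported here. As a simplified stand-in (an assumption, not the authors' procedure), the sketch below z-scores each test across patients, extracts the first principal component of the tests' correlation matrix by power iteration, and projects each patient onto it. When the tests are strongly intercorrelated the first component behaves much like a single common factor, but the two methods are not identical.

```python
from statistics import mean, stdev


def zscore(column):
    """Standardize one test's scores across patients (sample SD)."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]


def first_component_scores(scores, iterations=500):
    """One composite per patient from several test scores.

    scores: list of patients, each a list of scores in the same test order.
    Returns projections of the z-scored data onto the first principal
    component of the tests' correlation matrix (a PCA stand-in for the
    factor analysis reported in the text).
    """
    n_patients, n_tests = len(scores), len(scores[0])
    columns = [zscore([row[j] for row in scores]) for j in range(n_tests)]
    z = [[columns[j][i] for j in range(n_tests)] for i in range(n_patients)]
    # Correlation matrix: cross-products of z-scores divided by n - 1.
    corr = [[sum(z[i][a] * z[i][b] for i in range(n_patients)) / (n_patients - 1)
             for b in range(n_tests)] for a in range(n_tests)]
    # Power iteration, assuming a dominant eigenvalue, finds the leading eigenvector.
    v = [1.0] * n_tests
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(n_tests)) for a in range(n_tests)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Orient the component so that better test scores raise the composite.
    if sum(v) < 0:
        v = [-x for x in v]
    return [sum(row[j] * v[j] for j in range(n_tests)) for row in z]
```

Ordering patients for the tables and graphs then amounts to sorting by the returned values, e.g. `sorted(range(len(comp)), key=comp.__getitem__)`.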

Training study overview
The experimental design is summarized in Figure 2. Patients were trained using a semantic associative task (hereinafter referred to as the "training task"), administered in six consecutive sessions across two weeks. We examined training effects by looking at performance (i) over the course of training and (ii) on a semantic associative task that had the same design as the training task but without feedback, administered before and after training. In both cases, generalization was examined by comparing performance on novel trials (i.e., presented only once over the course of training) with performance on repeatedly trained trials. (iii) We also repeated the ambiguity task, the object use task and the harder trials from the picture Camel and Cactus Test shortly after the training period, to assess generalization beyond the training paradigm. All eleven patients took part in the behavioural tasks (i.e., training task, semantic associative task without feedback, ambiguity task, object use task and Camel and Cactus Test), with the exception of VN, who withdrew from the study and was not tested on the ambiguity and object use tasks after training.
Figure 2. Schematic of study design. Trained trials were repeated in every training session, whereas novel trials were only presented once.

Training task: Procedure
Participants performed a three-alternative forced-choice semantic association task (see Figure 3). Three words appeared at the bottom of the screen for 2500 ms, during which time they were read aloud by the examiner, followed by a single probe word appearing at the top. Participants were required to point to the one of the three words that had the closest semantic association with the probe word. There was no response time limit; participants were asked to guess if they were unsure. The examiner repeated the words at the participant's request, to reduce the impact of reading impairment on performance.
At the end of each trial, participants were given feedback on whether they were correct or incorrect. This took the form of a green tick with the word "correct" or a red cross with "incorrect". An image that reinforced the relevant semantic association was also displayed, together with the probe and the correct response. For example, for the association between TAXI and PHONE, an image of a taxi freephone was displayed (see Figure 3). A verbal description summarizing the link between the target and probe was added if the picture was unclear to the patient. These images were presented on both correct and incorrect trials. The feedback and summary picture remained on screen until the patient was ready to move on to the next trial. Trials were separated by a 250 ms fixation cross.
We manipulated the strength of association between the probe and target. Strong associations required little control over retrieval, since the dominant association for the probe corresponded to the target, while medium and weak associations required more control over semantic activation in order to focus retrieval on the relevant relationship and suppress stronger but currently irrelevant associations (cf. Whitney et al., 2011). The distractor words on each trial were related to the target to increase inhibitory demands. For example: TAXI-PHONE, not E-MAIL or FAX (weak association); JELLY BEAN-NEWSAGENT, not FLORIST or BUTCHER (medium-strength association); HEN-EGGS, not MILK or CHEESE-CAKE (strong association; see Figure 3). Forty trials were repeated in every session, whereas 25 novel trials per session were presented to test for generalization (see Figure 2). This gave 65 trials per session for analysis. Each training session lasted around 15-20 minutes and started with 3 practice trials, which were omitted from the analysis. The order of the training sessions was counterbalanced across participants. The strength of association of the trials was matched across sessions (i.e., each session had the same overall level of difficulty). Associative strength was derived from the Edinburgh Association Thesaurus (EAT; Kiss et al., 1973). In both the repeated and novel conditions, approximately one third of the trials were strong, one third medium and one third weak associations. For the repeated trials, the mean association strength on the EAT was: strong M (SD) = 5.9 (0.3); medium M (SD) = 4.8 (0.4); weak M (SD) = 3.1 (0.5). The novel trials had similar values: strong M (SD) = 6.1 (0.4); medium M (SD) = 4.8 (0.5); weak M (SD) = 3.2 (0.6).
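To make the session structure concrete, the sketch below assembles one session's analysed trials: all 40 repeated trials plus 25 novel trials drawn without replacement from strength-specific pools, so that no novel trial appears twice and each session draws the same number of strong, medium and weak novel items. The `Trial` type, the pool layout and the 8/8/9 quota split used in the example are hypothetical; the text states only that roughly one third of trials fell in each strength band and that difficulty was matched across sessions. Practice trials and presentation timing (run in E-Prime in the study) are omitted.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    probe: str
    target: str
    distractors: tuple
    strength: str  # "strong", "medium" or "weak"


def build_session(repeated, novel_pools, quotas, seed=None):
    """Return the analysed trials for one training session, shuffled.

    repeated: trials presented in every session.
    novel_pools: dict mapping strength -> list of not-yet-used novel trials;
        drawn trials are removed, so each novel trial is shown only once.
    quotas: dict mapping strength -> number of novel trials to draw.
    """
    rng = random.Random(seed)
    novel = []
    for strength, k in quotas.items():
        pool = novel_pools[strength]
        if len(pool) < k:
            raise ValueError(f"only {len(pool)} unused {strength} trials left")
        for _ in range(k):
            novel.append(pool.pop(rng.randrange(len(pool))))
    trials = list(repeated) + novel
    rng.shuffle(trials)
    return trials
```

For instance, the weak-association example from the text would be represented as `Trial("TAXI", "PHONE", ("E-MAIL", "FAX"), "weak")`. Consuming the pools in place is the design choice that enforces the "novel trials presented only once" constraint across all six sessions.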
The six training sessions were conducted over a two- to three-week period, with sessions separated by at least 24 hours. This was motivated by accumulating evidence that brief intensive aphasia therapies are associated with better outcomes than more distributed and prolonged interventions (Stahl et al., 2018). The task was presented using E-Prime 2.0 (Psychology Software Tools). The complete list of stimuli is provided in the Appendix, Table 1.

Semantic associations without feedback: Procedure
Before and after training, participants performed a task with the same format as the training task, but without feedback or the linking picture after each trial. As in the training task, associative strength between the probe and target was manipulated; this was matched across the pre- and post-training sessions. There were 82 trials: 24 were trained (16 of these were trained repeatedly and 8 were trained only once; all trained trials were tested in both the pre- and post-training sessions) and 58 were not (34 of these were tested in both the pre- and post-training sessions, whereas 24 were tested either before or after training). The complete list of trials is provided in the Appendix, Table 2. This procedure therefore assessed (i) whether there was an overall improvement in selecting the correct semantic associate among distractors following training, and (ii) whether any improvement was restricted to trained trials or generalized to trials that had not been trained.

Untrained semantic tasks: Procedure
A set of semantic assessments was repeated in the two weeks before and after training, to characterize any changes in performance over the training period. After training, we retested the ambiguity task (dominant vs. subordinate, without cues), the object use task and a subset of 26 of the harder Camel and Cactus Test trials (these trials were selected according to the performance of an earlier sample of SA cases who had completed the full assessment; they were the items with the poorest performance across both the picture and word versions). Individual analyses were performed on overall accuracy (without distinguishing between conditions) to retain sufficient statistical power.
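The selection rule for the 26 harder Camel and Cactus Test trials can be sketched as follows, assuming per-item accuracies from the earlier SA sample are available for both versions and reading "poorest performance across both versions" as the lowest mean of the two accuracies (our interpretation; the data layout and item names are hypothetical):

```python
def hardest_items(picture_acc, word_acc, k=26):
    """Return the k items with the lowest mean accuracy across versions.

    picture_acc, word_acc: dicts mapping item -> proportion correct in an
    earlier SA sample for the picture and word versions. Only items present
    in both dicts are considered; ties are broken by item name.
    """
    common = picture_acc.keys() & word_acc.keys()
    mean_acc = {item: (picture_acc[item] + word_acc[item]) / 2 for item in common}
    return sorted(mean_acc, key=lambda item: (mean_acc[item], item))[:k]


print(hardest_items({"a": 0.9, "b": 0.2, "c": 0.5},
                    {"a": 0.8, "b": 0.4, "c": 0.3}, k=2))
```

Averaging the two versions favours items that are hard in both formats; an alternative reading, such as taking the minimum of the two accuracies, would be a one-line change.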

Behavioural analyses overview
Repeated-measures ANOVAs and two-tailed paired-samples t-tests were used to assess training effects and experimental manipulations (e.g., trained vs. novel, associative strength) at the group level. Individual performance was analyzed using McNemar tests when the same trials were tested at different time points (such as repeatedly trained trials in the first vs. last session of the training task). When different trials were presented before and after training, such as the novel trials of the training task, chi-square and Fisher's exact tests were used.
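The choice between tests follows from whether the same items were observed twice: paired observations call for McNemar's test, independent sets of items for a 2 × 2 contingency test. A minimal standard-library sketch of the three tests, under stated assumptions: McNemar's test is shown in its exact (binomial) form on the discordant items, the chi-square test is the uncorrected Pearson test on a 2 × 2 table, and Fisher's exact test is two-sided, summing all tables with the observed margins that are no more probable than the observed one. The text does not say which variants were used, so these are illustrative choices.

```python
import math


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p for paired binary outcomes.

    b: items correct at the first time point but wrong at the second.
    c: items wrong at the first time point but correct at the second.
    Concordant items carry no information and are not needed.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) and its p.

    Table [[a, b], [c, d]], e.g. rows = session (first, last) and
    columns = (correct, wrong) for different items in each session.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1 the chi-square survival function equals erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one count.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1))
                        if p <= observed * (1 + 1e-9)))
```

For example, a patient who improved on 8 repeated items and got worse on none gives `mcnemar_exact(0, 8)` = 2/256 = 0.0078125, the smallest two-sided p attainable with eight discordant items.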

Training task: Results
Group level effects during training task
Figure 4 shows the key results. A 6 (training session) by 2 (repeated vs. novel) by 3 (strong, medium and weak associations) ANOVA revealed an overall improvement across sessions [main effect of training session: F(5,50) = 4.1, p = .004] and higher accuracy for repeated than novel items [F(1,10) = 68.61, p < .001]. There was also a main effect of strength of association [F(2,20) = 32.57, p < .001], with accuracy highest for strong, intermediate for medium and lowest for weak associations. There were also two interactions. [...] for six comparisons; all the other comparisons were non-significant]. All other interactions were non-significant [F < 1.5].

Individual analysis during training task
Figure 5 shows the key results. Effects of repetition and strength of association were examined in each individual patient using separate analyses to increase statistical power. For repeated trials, 3 patients (SD, WB and VN) showed significant improvement from session 1 to session 6 [McNemar p ≤ .008]. In all the other cases, performance was higher in the last than in the first session of training, but this did not reach significance. For novel trials, only KQ showed a trend towards higher accuracy in the last vs. the first session [χ²(1) = 3.31, p = .069]. SD, KQ and PV showed increased accuracy between the first and last sessions for strong [χ²(1) = 4.75, p = .029], medium [χ²(1) = 4.75, p = .029] and weak [Fisher's exact test: p = .037] association trials, respectively. No other patient showed a significant improvement [χ²(1) < 3].
In summary, out of 22 sets of items (11 patients by novel/repeated sets), only 3 showed a statistically significant effect of training, although 16 sets showed a numerical difference in the expected direction.

Semantic associations without feedback: Results
Group level effects comparing pre- and post-training sessions
Figure 6 shows the key results. A 2 (session: pre vs. post) by 2 (trained vs. for trained trials. Paired t-tests showed an improvement in accuracy on trained items [t(10) = 3.65, p = .008, Bonferroni-corrected for two comparisons] but no significant improvement on untrained items [t(10) = 1.56, p = .3, Bonferroni-corrected for two comparisons]. There was also an interaction between training and strength of association [F(2,20) = 4.64, p = .022] and a three-way interaction between training, strength of association and session [F(2,20) = 8.94, p = .002]. Follow-up tests showed that, for trained trials, there was a trend toward improvement after training for weak trials only [t(10) = 3.01, p = .072, Bonferroni-corrected for six comparisons; all other comparisons were non-significant]. For untrained trials, there was a trend toward improvement for strong trials only [t(10) = 3.17, p = .06, Bonferroni-corrected for six comparisons]. Performance did not differ for medium trials and fell numerically for weak trials, although this decrease was not significant [p = .33].
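The degrees of freedom and corrected p-values reported in these results follow from the design (11 patients, all factors within-subject) and from the Bonferroni rule. The short worked derivation below assumes uncorrected repeated-measures degrees of freedom; the text reports no sphericity correction. The same arithmetic accounts for F(5,50) in the training-task ANOVA.

```latex
For a within-subject factor with $k$ levels tested in $n = 11$ patients,
\[
  df_{\text{effect}} = k - 1, \qquad df_{\text{error}} = (k - 1)(n - 1),
\]
so $k = 2$ gives $F(1, 10)$, $k = 3$ gives $F(2, 20)$ and six training
sessions ($k = 6$) give $F(5, 50)$. For an interaction between factors with
$k_1$ and $k_2$ levels,
\[
  df_{\text{effect}} = (k_1 - 1)(k_2 - 1), \qquad
  df_{\text{error}} = (k_1 - 1)(k_2 - 1)(n - 1),
\]
so training (2 levels) by strength of association (3 levels) gives
$F(2, 20)$. A paired $t$-test has $df = n - 1 = 10$.
With $m$ comparisons, the Bonferroni-corrected value is
\[
  p_{\text{Bonf}} = \min(1,\; m\,p),
\]
so a corrected $p = .072$ over $m = 6$ comparisons corresponds to an
uncorrected $p = .072 / 6 = .012$.
```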
Individual analysis comparing pre- and post-training sessions
Effects of training and strength of association were examined in each individual patient using separate analyses to increase statistical power. Training effects were examined using McNemar tests for trials presented both before and after training, and chi-square tests for trials presented in only the pre- or the post-training session. None of the patients showed a significant improvement in accuracy for untrained 3.72, p = .054: Fisher's exact test: p = .005]. No significant improvement was found for medium or weak trials in any patient [χ²(1) < 2].
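For trials presented in only one of the two sessions, the chi-square test compares two independent 2 × 2 proportions. The sketch below is a minimal plain-Python version; the function names are illustrative. Whether Yates' continuity correction was applied is not stated, so it is offered as an optional flag. With one degree of freedom, χ² equals z², so the p-value can be computed from the complementary error function without a statistics library.

```python
from math import erfc, sqrt


def p_from_chi2_df1(chi2: float) -> float:
    # For df = 1, chi2 = z**2, so the upper tail equals the two-sided
    # normal tail: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = False):
    """Pearson chi-square test of independence for [[a, b], [c, d]].

    Rows: pre- vs. post-training session (different trials in each);
    columns: correct vs. incorrect. Assumes no row or column total is zero.
    Returns (chi2, p) with df = 1.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            diff = abs(observed - expected)
            if yates:
                # Optional continuity correction (assumption, not stated).
                diff = max(0.0, diff - 0.5)
            chi2 += diff ** 2 / expected
    return chi2, p_from_chi2_df1(chi2)
```

As a quick consistency check, passing the χ²(1) values reported in this section back through `p_from_chi2_df1` recovers the reported p-values: 3.31 gives about .069, 4.75 gives about .029, and 3.72 gives about .054.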
Untrained semantic tasks: Results comparing pre- and post-training sessions

Discussion
In a group of semantic aphasia (SA) patients with multimodal semantic deficits stemming from poor semantic control, we assessed the effects of a training task designed to encourage flexibility in the retrieval of semantic associations. We found improvement in the accessibility of semantic associations, although this effect was more marked for items that were trained repeatedly. There was little evidence of generalization of this training effect to novel items within the training task itself. Nevertheless, the group showed some generalization: performance on a similar but untrained task involving the retrieval of semantic associations (Camel & Cactus Test) also improved after training. It is not clear why the novel items within the training task and the Camel & Cactus Test showed different patterns; one possibility is that the harder Camel & Cactus items were highly sensitive to changes in semantic control.
Training often does not generalize to new items; for example, in picture naming therapies for participants with aphasia, often only the trained item set shows facilitation (Davis & Pring, 1991; Marshall et al., 1990; Pring et al., 1993). Training effects also typically fail to generalize to untrained tasks; for example, after protocols designed to increase cognitive control or working memory capacity, performance gains often do not extend to untrained paradigms that recruit the same putative cognitive processes (Melby-Lervåg & Hulme, 2013). Our results are broadly consistent with this pattern of weak or non-existent generalization. However, the group-level effect on one of our background semantic tasks (Camel & Cactus Test) is promising, suggesting that this type of semantic training might be more broadly beneficial. There is also some evidence that some individuals in our case series benefitted more than others. Although our single-subject analyses had substantially reduced statistical power relative to the group-level analyses, one individual (KQ) showed a trend toward improvement on novel trials of the semantic training task, while three individuals (SD, WB and VN) showed significant improvement on repeated trials. More research is needed to predict which patients are most likely to benefit from semantic training. Unlike studies of Semantic Feature Analysis (e.g., Boyle, 2010), there is no suggestion in our data that the most impaired patients were least able to benefit, even though these more severely affected individuals are likely to have had additional deficits in cognitive control (Jefferies & Lambon Ralph, 2006).
There are relatively few studies of semantic rehabilitation that use semantic judgements rather than lexical retrieval (e.g., picture naming tasks) as the outcome measure (see Efstratiadou et al., 2018, for a review). Because semantic deficits are common in aphasia, an important aspect of the current investigation is our demonstration that single-item semantic judgements (including non-verbal semantic decisions) improve following semantic training, at least at the group level. Moreover, since multimodal semantic deficits in aphasia are associated with specific difficulties in retrieving non-dominant aspects of knowledge, our study shows how training on a task designed to tap this particular difficulty can improve performance in patients with SA. More intensive training over a longer period, with more sessions or more trials per session, might produce a larger effect. It might also be possible to optimize the training to encourage relevant patterns of retrieval across different contexts. For example, the word bank could be trained on associations of its dominant meaning (i.e., a financial institution; e.g., bank–money, not morning, heart or child) and its subordinate meaning (i.e., the edge of a lake or river; e.g., bank–river, not dress, song or birth) on consecutive trials. This could promote flexible retrieval of conceptual knowledge according to task requirements. The current investigation also cannot identify the cognitive change underlying the improved performance we observed. The training task may have facilitated the recovery of semantic control processes, for example through additional recruitment of undamaged parts of the semantic control network (cf. Hallam et al., 2018), or it may have allowed patients to identify compensatory strategies beyond semantic control.
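The sketch below encodes the proposed consecutive-trial design for a homonym, using the bank items given above. The `Trial` type and the `homonym_block` helper are hypothetical illustrations, not the study's materials or software. The `dominant_first` flag is likewise an assumption, added so that meaning order could be counterbalanced across blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    probe: str                    # the ambiguous word, e.g. "bank"
    target: str                   # associate that fits the cued meaning
    distractors: tuple[str, ...]  # unrelated foils
    meaning: str                  # "dominant" or "subordinate"


def homonym_block(probe, dominant, subordinate, dominant_first=True):
    """Return two consecutive trials probing different meanings of `probe`.

    `dominant` and `subordinate` are (target, distractors) pairs.
    """
    pair = [
        Trial(probe, dominant[0], tuple(dominant[1]), "dominant"),
        Trial(probe, subordinate[0], tuple(subordinate[1]), "subordinate"),
    ]
    return pair if dominant_first else pair[::-1]


# Example using the items mentioned in the text.
block = homonym_block(
    "bank",
    dominant=("money", ["morning", "heart", "child"]),
    subordinate=("river", ["dress", "song", "birth"]),
)
for trial in block:
    print(trial.meaning, trial.probe, "->", trial.target, "not", trial.distractors)
```

Placing both trials in one block forces a switch between meanings of the same probe on adjacent trials, which is the retrieval flexibility the proposed design aims to exercise.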
In conclusion, patients with semantic control deficits may benefit from training tasks that encourage the retrieval of diverse types of semantic associations. There were clear individual differences in our sample, suggesting that not all patients will be able to generalize the effects of training to untrained items or tasks. Our results highlight the need to develop more effective training protocols that target semantic control and executive processes in patients with aphasia, since this kind of training is thought to be more likely to produce functionally meaningful improvement (Han et al., 2018; Vas et al., 2016). This may be especially true for people with semantic aphasia who have difficulty regulating their retrieval of conceptual information, yet little loss of semantic knowledge from long-term memory. Not enough is yet known about how to optimize such cognitive training. For example, too much repetition of the same items might reduce mental flexibility, as these items become too dominant within the mental landscape. However, too little repetition might reduce the opportunity for patients to retrieve diverse types of associations accurately for themselves, and thereby acquire more effective retrieval strategies. Finally, research is needed to test whether different approaches to neurorehabilitation are maximally effective for semantic deficits with different underlying causes, for example by contrasting patients with deregulated semantic retrieval following left-hemisphere stroke with patients with degraded conceptual knowledge in the context of semantic dementia.

Disclosure statement
No potential conflict of interest was reported by the author(s).