How long can naturalistic L2 pronunciation learning continue in adults? A 10-year study

Abstract We examined the naturalistic pronunciation development of two groups of L2 speakers over 10 years. Initially, 50 beginner ESL students participated in production tasks; despite attrition, the tasks were administered eight more times. Here we report listener judgments of accentedness, comprehensibility, and fluency for the six Mandarin and 12 Slavic language speakers who remained at Year 10. Analyses of listener judgments of utterances recorded at the 2-month, 1-year, 2-year, 7-year, and 10-year points revealed that the Slavic language speakers improved in comprehensibility and fluency at each comparison point, while the Mandarin speakers' results were variable; there was improvement in comprehensibility from Year 7 to Year 10, but only after worsening at earlier points. The Slavic language group showed improvement in accentedness at several points, whereas the Mandarin group showed no improvement in accentedness at any point. The data were examined for individual differences in learning trajectories, and interview responses and a survey of language use were compared to participants' trajectories. Some speakers showed steady improvement from Year 7 to Year 10, but the majority plateaued or regressed. We also elicited speakers' views of their progress. The results are interpreted through Complexity Theory and the Willingness to Communicate framework. Suggestions are made for research and teaching interventions.
FRENCH ABSTRACT (English translation) We examined, longitudinally, the naturalistic development of L2 English pronunciation in two groups of speakers: Mandarin (MA) and Slavic languages (SL). At the outset, the 50 participants completed the production tasks, which were administered to them eight more times over a period of 10 years. Here we report judgments of accentedness, comprehensibility, and fluency for the speakers remaining at Year 10, after attrition within both groups (MA n = 6; SL n = 12). The data collected at Years 7 and 10 revealed that the SL speakers improved in comprehensibility and fluency at each comparison point, whereas the MA speakers showed variability: an improvement in comprehensibility from Year 7 to Year 10, after deterioration at earlier comparison points. The SL speakers showed improvement in accent on several occasions, whereas the MA speakers showed none over time. The data were examined to identify individual differences. Interview responses and a survey of language use were compared to the participants' trajectories. Steady improvement was observed from Year 7 to Year 10 in some speakers, but the majority plateaued or even regressed. We also solicited the speakers' views of their progress. These results are interpreted within the 'willingness to communicate' framework. Suggestions are offered for research and pedagogical interventions.
PLAIN LANGUAGE SUMMARY This longitudinal study involved collecting data from two groups of second language speakers (Mandarin and Slavic language speakers) over 10 years. We recorded picture narratives at the 2-month, 1-year, 2-year, 7-year, and 10-year points. Audio samples of 20–25 s were randomized and played to 20 listeners, who rated them on three dimensions using 9-point scales: fluency (1 = extremely fluent; 9 = extremely dysfluent); comprehensibility (1 = easy to understand; 9 = extremely difficult to understand); and accentedness (1 = no accent; 9 = extremely strong accent).
Analyses indicated that the Slavic language speakers improved in comprehensibility and fluency at each comparison point, while the Mandarin speakers' results were variable: there was improvement in comprehensibility from Year 7 to Year 10, but only after worsening at earlier points. The Slavic language group showed improvement in accentedness at several points, whereas the Mandarin group showed no improvement in accentedness at any point. We then considered individual trajectories in light of the language contexts in which each participant was immersed. There was tremendous variability in learner progress, but, interestingly, the person who had the best ratings at the outset was also judged the best after 10 years. Each person's individual circumstances appeared to affect their productions in English. The trajectories also suggested that some individuals made consistent progress in comprehensibility and fluency from Year 7 to Year 10, indicating that naturalistic language learning, at least for these dimensions, can continue for a long time, depending on the individual's personal circumstances. The richer the interaction opportunities, the better.


Introduction
Language acquisition, whether first (L1) or second (L2), involves the development of a complex cognitive skill. Yet, it is also an inherently social phenomenon (Beckner et al., 2009; Ellis, 2014). Across a wide range of social contexts and cultures, L1 listening and speaking skills arise as a product of interaction with caregivers, with extended family, and with the wider community (Bybee, 2010). In contrast, adult second language acquisition (SLA), even for listening and speaking, normally entails some type of formal instruction or study. This requirement stems, in part, from fundamental differences between the cognitive systems of L1 child and L2 adult learners (Serafini & Sanz, 2016). It is also linked to constraints on the naturalistic learning opportunities available to adult L2 learners, which could contribute to further changes in their linguistic systems (Flege, 2018; Hartshorne et al., 2018; Rothman & Guijarro-Fuentes, 2010). Competing demands on adult L2 learners' time (e.g., employment, family care) limit exposure to L2 input, relative to the massive and varied forms of language experience available to children. Thus, explicit instruction helps to fill a gap in adults' experience in a targeted and time-efficient manner.
The sheer abundance of instructed SLA studies investigating adult learners, in contrast to a relative dearth of studies of naturalistic L2 learning by adults, may cause researchers to regard instruction as the predominant pathway to SLA. It is possible, however, that L2 skills of adult learners in target language environments continue to develop far beyond a period of formal instruction (Flege, 2018), despite frequently cited claims to the contrary (Han, 2013; Lenneberg, 1967; Nakuma, 1998; Selinker, 1972). Consequently, it is important to understand which factors promote naturalistic learning, what trajectories that learning takes, and whether there are observable limits to improvement. There are no shortcuts to investigating such questions. Rather, the necessary research must be longitudinal, tracking the development of language skills over an extended timeframe. Researchers can then consider how cognitive mechanisms are affected by diverse patterns of language experience, resulting in widely varying outcomes (e.g., Backus, 2020; Beckner et al., 2009; Behrens, 2009; Bybee, 2010; Ortega et al., 2016).

Complexity theory
Complexity Theory (CT), which includes complex adaptive systems (CAS), provides a useful framework for understanding individual differences in L2 learning trajectories, because it highlights how relationships between subcomponents of both cognitive and social systems influence linguistic performance (Larsen-Freeman, 2020; Larsen-Freeman & Cameron, 2008). As Beckner et al. (2009) point out, cognitive and social variables associated with language learning 'are not independent of one another but are facets of the same complex adaptive system (CAS)' (p. 2). This CAS stems from individual speakers each bringing their own experience to bear upon learning, in interaction with the language experience of individual interlocutors. A learner's language ability during any communicative event is the product of both their previous language experiences and the present interaction. Myriad competing constraints related to individual differences (e.g., L1 background, language aptitude, cultural constraints, motivation) also influence past, present, and future language performance. Beckner et al. (2009) argue that a CAS approach is best suited to providing a unified explanation for variation in language use within and between speakers, and for continuous changes to their linguistic repertoire.
Emergentist scholars (Caldwell-Harris & MacWhinney, 2023; MacWhinney, 2021) provide similar insights, arguing that linguistic complexity and diversity emerge from a competition between cognitive constraints (e.g., aptitude, automatized L1 systems) and motivational and environmental forces, which in turn lead to varied communicative goals and outcomes. The emergentist account strongly emphasizes a complex relationship between L2 learning and a wide range of timescales, from the moment an interaction takes place to changes across the lifespan of the learner.
While Complexity Theoretic approaches can be applied to any subdomain of SLA, we are particularly interested in how these approaches can help us understand the development of oral skills, which have long been recognized as being especially resistant to change relative to other aspects of SLA such as grammatical and lexical knowledge (Selinker & Lamendella, 1978; Scovel, 1988). We know, for example, that a primary determinant in the development of more target-like L2 pronunciation is experience, which can at least partially overcome the significant effects of learners' biological age and L1 background (Flege, 2018; Flege & Bohn, 2021). In terms of timescale, Derwing and Munro (2014) have defined a window of maximal opportunity (WMO) during the early stages of exposure to the target language in a submersion environment, when L2 pronunciation seems most amenable to change. They have also demonstrated that the WMO does not preclude change that occurs well beyond adult learners' early months or years, especially in the context of explicit instruction (Derwing et al., 2014).

Willingness to communicate
In naturalistic L2 learning, the availability of interlocutors to provide learners with the massive experience their developing system needs, and learners' willingness to take advantage of such opportunities, should be of paramount concern. Access to and desire to speak with interlocutors in the L2 can be conceived of as a subsystem of the larger CAS. Historically, as Ushioda (2011) notes, 'L2 motivation research has been concerned more with idealised language learners as theoretical abstractions or bundles of variables, rather than with learners as uniquely complex individual "people", with particular social identities, situated in particular contexts' (p. 222). MacIntyre et al.'s (1998) Willingness to Communicate (WTC) framework situates learning within a multi-layered interactive hierarchy of contextual, affective, and cognitive factors. Together, these factors can help to predict the probability that individual L2 learners will participate in specific communicative acts with specific interlocutors, ultimately leading to change within their L2 systems. Recognizing the importance of timescale, the L2 WTC framework has more recently expanded to account for the highly fluid nature of motivational propensities to communicate, using an idiodynamic method to evaluate individual L2 learners' experience of communicative events in real time and on timescales as short as a few minutes (MacIntyre, 2020). This line of research has revealed that the psychological conditions necessary for accessing opportunities for language input can shift rapidly in both instructed and naturalistic learning contexts.

The current study
While SLA research has clearly evolved toward greater emphasis on the complex and dynamic systems of individual learners, the subfield of L2 pronunciation research has been slower to follow suit. For the most part, pronunciation research continues to focus primarily on group averages, with the laudable goal of identifying the most productive instructional means of enhancing speakers' intelligibility, comprehensibility, and fluency. However, reliance on group data has serious drawbacks, as Zielinski and Pryor (2020) have recently shown. In a longitudinal study tracing comprehensibility ratings and English use in 14 learners over 10 months, they determined that use of English varied considerably, not just between learners but within learners over time. Outcomes varied, with an apparent relationship between L2 use and better comprehensibility in some but not all cases. Their data clearly demonstrate the need to examine this relationship further. Moreover, Zielinski and Pryor's findings are in accord with Flege and Bohn's (2021) revised Speech Learning Model (SLM-r), which articulates the view that 'phonetic systems of individuals reorganize over the lifespan in response to the phonetic input received during naturalistic learning' (p. 3). Nagle (2022a, 2022b) has called for more longitudinal research probing individual differences, taking into account the dynamic nature of factors such as motivation, use, and opportunity. A key point he raises is the time-varying nature of the many influences that distinguish individual performance. Nagle's (2021) survey of the literature from 2006 to 2021 identified 39 self-described 'longitudinal' studies, of which only 14 had more than three data collection points, and a mere five spanned more than 1 year.
Several years ago, we launched a longitudinal project involving two groups of adult immigrant learners who were studying English out of necessity, unlike the participants in many language studies, who self-select to take post-secondary language courses.Our participants were likely to represent a greater range of aptitudes, motivations, and social circumstances (Derwing et al., 2006).Comparing their performance over an initial 2-year period, Slavic language (SL) speakers showed significant improvement in both fluency and comprehensibility, whereas Mandarin (MA) speakers did not (Derwing et al., 2007).In a follow-up investigation comparing comprehensibility ratings of speech samples from 10 months to 2 years and 2 years to 7 years, the SL group again showed significant improvement over both time periods, whereas the MA speakers showed no change (Derwing & Munro, 2013).The SL group also showed evidence of significant improvement in fluency over the same time periods.
In the current work, we have extended this research to 10 years for a residual subset of the same MA and SL speakers who participated in these earlier studies.We used quantitative data to measure changes in accentedness, comprehensibility, and fluency.We then related the results to qualitative data based on interviews (Appendix A) and language use questionnaires (LUQs) (Appendix B).We looked closely at each speaker's unique experiences over the years of the study.Because this study was begun well before much work was being done within the CT framework, we did not apply data collection methods with this particular analysis in mind.Nevertheless, CT and the WTC framework provide helpful tools for making sense of our data.

Research questions
1. In view of the progress, or lack of it, in our earlier studies of these individuals, is there naturalistic improvement or worsening in accentedness, comprehensibility, and fluency between the 7th and 10th years in the L2 environment?
2. Will differences be maintained in the trajectories of the two distinct learner groups (L1-Mandarin and L1-Slavic language)?
3. To what degree do the sociolinguistic contexts of the speakers influence their individual development?

Speakers
The L2-English speakers were 6 L1 speakers of Mandarin (MA group: M age = 34.5 years, range = 26-39 at time of immigration to Canada; 3 men and 3 women) and 12 L1 speakers of a Slavic language (SL group: 9 Russian and 3 Ukrainian; M age = 38.8 years, range = 19-48 at time of immigration to Canada; 5 men and 7 women). At the outset of the study, the participants were enrolled in a full-time English language program; they were all at Canadian Language Benchmarks Stage 1 (beginners), and all had completed secondary education in their countries of origin. These speakers were the only individuals still accessible from the original cohort that had enrolled in the longitudinal study 10 years earlier. At the first data collection point in 2002, oral L2 English narrative and sentence recordings were elicited from 25 native speakers of Mandarin and 25 native speakers of a Slavic language; this step was repeated eight more times. Table 1 summarizes the numbers of speakers who participated at the end of Years 1, 2, 7, and 10. Several of the initial participants returned to China. Others moved elsewhere in Canada, and some dropped out for unknown reasons.
We obtained demographic information from participants at the study's outset and collected language use information at all other times.The questions in the language use questionnaires (LUQs) included self-report data concerning the frequency with which they interacted in English with monolingual English speakers and L2-English speakers and information about their daily exposure to English through television and talk radio.With respect to English interactions, participants chose from five categories (never, 1-3 times/week, 4-6 times/week, 1 time/day, or several times/day) to indicate how often conversations lasted longer than 10 minutes.In response to questions about television and talk radio exposure, they selected from among five categories (<1 h, 1 h, 2 h, 3 h, or >3 h/day).
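To chart LUQ responses over time, the circled categories have to be turned into ordered values. The Python sketch below shows one way to do that, assuming a simple 0-4 rank coding in the order the categories are listed above; the codes, the function name `code_response`, and the example answers are hypothetical and are not taken from the study's materials.

```python
# Hypothetical ordinal coding of the LUQ response categories; the numeric
# codes (0-4, in listed order) are illustrative, not the study's own scheme.
INTERACTION_CATEGORIES = [
    "never",
    "1-3 times/week",
    "4-6 times/week",
    "1 time/day",
    "several times/day",
]
MEDIA_CATEGORIES = ["<1 h", "1 h", "2 h", "3 h", ">3 h/day"]


def code_response(response: str, categories: list) -> int:
    """Return the 0-based ordinal rank of a circled LUQ category."""
    try:
        return categories.index(response)
    except ValueError:
        raise ValueError(f"unknown LUQ category: {response!r}") from None


# Example: one hypothetical participant's interaction answers over three waves.
waves = ["4-6 times/week", "several times/day", "several times/day"]
print([code_response(r, INTERACTION_CATEGORIES) for r in waves])  # [2, 4, 4]
```

Rank codes preserve order but not spacing: the step from "never" to "1-3 times/week" is not numerically comparable to the step from "1 time/day" to "several times/day", so such codes are better suited to plotting than to arithmetic.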

Stimuli
Oral narratives from the L2 speakers were recorded at multiple points after the beginning of the study, five of which are considered here: 2 months, 1 year, 2 years, 7 years, and 10 years. Although we collected data at nine times, these five were selected because they provide a full perspective on progress over 10 years. Moreover, inclusion of speech samples from all nine times would have made the listener rating task prohibitively long. In the first year, the recording site was a quiet room at the learners' language schools. Later recordings were made in a research lab at the University of Alberta. In all sessions, a high-quality digital recorder with an external microphone was used. The speaking task was a description of an 8-frame picture story of a man and a woman who meet on a busy city street and accidentally switch identical suitcases (Derwing et al., 2004). The first 20-25 s of each recording, excluding any initial false starts, were normalized to peak amplitude and saved for subsequent presentation to listeners. We also selected speech samples from two L1-English speakers to verify listeners' attention to task during the rating procedure. One recording from a Mandarin speaker (MA3) was missing because the speaker was absent at the 2-year point; thus, the final set comprised 91 items (89 L2 samples and 2 monolingual English samples). A randomized presentation was digitally recorded with an inter-trial interval of 5 s to allow time for listeners to complete each rating.
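The item count follows from simple arithmetic: 18 speakers × 5 time points = 90 L2 samples, minus the one missing recording, plus 2 L1-English control samples, gives 91. The Python sketch below rebuilds a stimulus list of that size, shuffles it with a fixed seed so the order can be reproduced, and estimates playback time from the sample length and the 5 s inter-trial interval. The speaker IDs, the seed, and the function names are hypothetical placeholders, and the estimate ignores practice items and the break.

```python
import random

TIMES = ["2 months", "1 year", "2 years", "7 years", "10 years"]


def build_stimuli(speakers, missing, controls, seed=None):
    """Return a randomized list of (speaker, time) items.

    speakers: L2 speaker IDs; missing: set of (speaker, time) pairs with no
    recording; controls: IDs of L1-English attention-check samples.
    """
    items = [(s, t) for s in speakers for t in TIMES if (s, t) not in missing]
    items += [(c, "control") for c in controls]
    rng = random.Random(seed)  # seeded so the presentation order is reproducible
    rng.shuffle(items)
    return items


def session_minutes(n_items, sample_s, iti_s=5.0):
    """Playback time in minutes: each sample plus its inter-trial interval."""
    return n_items * (sample_s + iti_s) / 60


# Hypothetical placeholder IDs (not the study's speaker codes).
speakers = [f"MA{i}" for i in range(1, 7)] + [f"SL{i}" for i in range(1, 13)]
stimuli = build_stimuli(
    speakers,
    missing={("MA1", "2 years")},
    controls=["EN1", "EN2"],
    seed=1,
)
print(len(stimuli))  # 91
print(round(session_minutes(91, 20), 1), round(session_minutes(91, 25), 1))  # 37.9 45.5
```

At 20-25 s per sample, playback alone comes to roughly 38-46 min; with practice items and a three-minute break, this is broadly consistent with the approximately 50 min rating session described in the Rating task section.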

Listeners
Listeners were recruited via an open call to undergraduate applied linguistics majors at Brock University. The only requirements for participation were that listeners identify as native speakers of English and report having normal hearing. We capped participation at 20 raters (M age = 21.2 years, range = 19-26; 6 men and 14 women). While all had studied foreign languages previously (mainly French), none had studied Mandarin or a Slavic language. One had studied Cantonese. Eight reported being fluent in one or more second languages (five in French; one in Cantonese; one in Spanish; one in Spanish and Polish). While four reported having spent brief periods of time in another country, none had been in Slavic- or Mandarin-speaking locations. We did not ask anything further about the raters' experience with foreign-accented speech. Each listener received a $20 honorarium.

Rating task
We followed the same procedures used in Derwing and Munro (2013), variations of which have been widely used by other researchers (see Thomson, 2018). The listeners attended one of two listening sessions (10 listeners/session). First, they completed a questionnaire to provide demographic information. To reduce familiarity effects during the rating task, the listeners were shown the picture sequence used to elicit the L2 productions while the experimenter narrated the story. Next, they listened to brief speech samples presented through a loudspeaker and rated them on three dimensions using 9-point Likert-type scales: comprehensibility (1 = easy to understand; 9 = extremely difficult to understand); fluency (1 = extremely fluent; 9 = extremely dysfluent); and accentedness (1 = no foreign accent; 9 = extremely strong foreign accent). After judging two practice items, the listeners were invited to ask clarification questions and comment on the suitability of the volume. The ratings were completed with pencil and paper: listeners circled a number from 1 to 9 on each scale for each item. Playback was controlled by the experimenter, ensuring that all participants stayed in step. The total time for rating was approximately 50 min, including a three-minute break at the mid-point. For a full account of the procedure, see Appendix C.
In line with the analyses reported in our earlier papers, we examined the ratings via a series of repeated measures ANOVAs (see Table 2). Separate analyses were carried out in JASP (JASP Team, 2023) for each L1 group because between-group statistical comparisons were not an aim of the study; rather, we were interested in each group's trajectory over time. For each speech dimension, the independent factor was Time (5 levels), and data were pooled over speakers. In all but one case (accentedness for the MA group), the sphericity assumption, tested via Mauchly's W, was met. As shown in Table 2, significant effects of Time (Type III Sum of Squares) were observed for all L1-by-dimension combinations, owing to generally downward-trending (i.e., improving) ratings. Effect sizes (η²) were generally moderate to large, but noticeably larger for the SL than for the MA group.
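The analyses themselves were run in JASP. For readers who want to see what a one-way repeated-measures ANOVA with Time as the within factor computes, the following is a minimal NumPy sketch, not a reproduction of JASP's output. It assumes, as our reading of "pooled over speakers", that each listener contributes one mean rating per time point; it computes η² as SS_Time / SS_total and also reports partial η². It performs no Mauchly test or sphericity correction, and the ratings are invented for illustration.

```python
import numpy as np


def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA with Time as the within factor.

    data: shape (n_units, k_times); each row is one repeated unit (assumed
    here to be a listener's mean rating over speakers) and each column one
    time point. Sphericity is assumed; no correction is applied.
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    time_means = x.mean(axis=0)  # one mean per time point
    unit_means = x.mean(axis=1)  # one mean per repeated unit (listener)
    ss_total = ((x - grand) ** 2).sum()
    ss_time = n * ((time_means - grand) ** 2).sum()
    ss_units = k * ((unit_means - grand) ** 2).sum()
    # Residual = Time x unit interaction, which serves as the error term for Time.
    resid = x - time_means[None, :] - unit_means[:, None] + grand
    ss_error = (resid ** 2).sum()
    df_time, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_time / df_time) / (ss_error / df_error)
    return {
        "F": f_stat,
        "df": (df_time, df_error),
        "eta2": ss_time / ss_total,
        "partial_eta2": ss_time / (ss_time + ss_error),
        "ss": (ss_time, ss_units, ss_error, ss_total),
    }


# Hypothetical ratings: 4 listeners x 5 time points (lower = better).
ratings = [
    [6, 5, 5, 4, 4],
    [7, 6, 5, 5, 4],
    [6, 6, 5, 4, 3],
    [5, 5, 4, 4, 4],
]
result = rm_anova_oneway(ratings)
print(result["df"], round(result["F"], 2), round(result["eta2"], 2))
```

Because listener-to-listener differences are removed as a separate sum of squares, η² (relative to the total) is always at most partial η² (relative to Time plus error), which is why the two indices can differ noticeably in designs like this one.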
To examine the group trajectories more closely, we computed post hoc pairwise comparisons (Bonferroni-Holm adjustment to ensure overall α = .05) between the outset and the 10-year time point and between all successive pairs of time points. The results are summarized in Table 3 and illustrated in Figure 1 (panels a-f). In the figure, the boxplots summarize speaker data on the three speech dimensions at each data collection point, with individual means overlaid as dots. The broken lines capture the mean trajectory for each group. For the MA group, in spite of the significant ANOVA results, the listeners detected no net change in either accentedness or comprehensibility between the outset and the 10-year points. However, both trajectories illustrate intervals of improvement and worsening over time. Although fluency was judged to have improved between the outset and Year 10, significant worsening occurred between Years 2 and 7. This was followed by recovery between Years 7 and 10.
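Holm's procedure orders the p-values in a family of comparisons and applies progressively smaller Bonferroni multipliers, controlling the familywise error rate at α. The plain-Python sketch below builds the five planned comparisons described above (outset vs. Year 10, plus the four successive pairs) and applies the Holm adjustment; an adjusted p below .05 would count as significant. The p-values are hypothetical, and the function names are illustrative rather than those of any statistics package.

```python
TIMES = ["2 months", "1 year", "2 years", "7 years", "10 years"]


def planned_pairs(times):
    """Outset vs. final time point, then every successive pair."""
    return [(times[0], times[-1])] + list(zip(times, times[1:]))


def holm_adjust(pvalues):
    """Holm (step-down Bonferroni) adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The rank-th smallest p-value is multiplied by (m - rank); the running
        # maximum keeps adjusted values monotone in the sorted order.
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


pairs = planned_pairs(TIMES)
# Hypothetical unadjusted p-values, one per pair (not the study's results).
raw_p = [0.01, 0.04, 0.03, 0.005, 0.2]
for (a, b), p in zip(pairs, holm_adjust(raw_p)):
    print(f"{a} vs {b}: adjusted p = {p:.3f}")
```

With these invented inputs, the smallest p-value (0.005) is multiplied by 5, the next by 4, and so on; the fourth-smallest (0.04 × 2 = 0.08) is raised to 0.09 so that it is never smaller than the adjusted value before it.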
In contrast, the SL group was judged to have improved on all three dimensions between the outset and Year 10. Moreover, the listeners detected significant improvement between all adjacent pairs of time points, except for accentedness between 2 months and Year 1. While the SL speakers showed continuing improvement on all dimensions over the full span of the study, Figure 1 indicates that improvement in comprehensibility and fluency was more rapid in the earlier intervals than in the later ones; this pattern was not evident for accentedness.

Individual differences analyses
Caution is necessary when interpreting the group-level results because group means may conceal important individual differences. For that reason, we undertook a close examination of individual learner performance. Another matter that prompted us to do so was the speakers' perceptions of their own ongoing learning. Of the 18 participants, one Mandarin speaker and four Slavic language speakers reported at the 10-year point that they had plateaued, while the others said they were still learning English. Interestingly, the speakers who indicated that they had stopped learning received ratings at the extreme ends of the comprehensibility and fluency continua; three had poor ratings, while the other two were among the best rated of all. Comments from those who said they were still learning included 'I am still improving in pronunciation, speaking and listening' (MA11); 'I'm still learning vocabulary' (SL44); and 'My understanding is better. I am still learning, but slower' (SL50).
During the Year 10 interviews, we asked participants about the language learning advice they could offer to newcomers.They expressed a range of suggestions.Both MA and SL speakers recommended talking to local people, going to school, watching TV or movies in English, and avoiding the use of the L1.Speakers from the MA group also proposed getting children to help with learning, getting a job where English is spoken, and doing volunteer work.Members of the SL group suggested increasing vocabulary and asking for feedback.Several of their other recommendations pointed to a more sophisticated grasp of effective linguistic and social strategies, such as learning phonetics, listening, actively initiating conversations, studying subject matter in the L2, using 'ready to use phrases' (formulaic chunks), attending to 'tone of voice' (intonation), and focusing on grammar.
To inspect individual progress, we created plots of learning trajectories for comprehensibility and fluency, which are shown in Figure 2. We have chosen to focus on these two dimensions because they are generally considered more important than accentedness, both communicatively and pedagogically. A full set of individual trajectories, including accentedness, is available in the supplementary data.
Clearly, the individual speakers do not show monotonic progress. Although some trajectories demonstrate relatively consistent improvement over time, we also see examples of speakers whose final ratings were comparable to their initial ratings despite notable changes in between. In a small number of cases, ratings were actually worse at 10 years than at the outset (e.g., MA3). Interpretation of these trajectories is complicated by the likelihood that some differences in scores are simply the result of regression to the mean. A single measurement of performance at any particular time is not necessarily a satisfactory representation of a speaker's general ability, and changes from one time to another may simply fall within the range of the speaker's day-to-day variability. Nonetheless, it would be irresponsible to ignore variability in the data without closer examination, especially in light of trajectories such as that of SL40, which stood out because of apparently sizeable improvements in comprehensibility and fluency between the 7- and 10-year points (see Figure 2). This speaker had shown a fairly continuous trend toward improvement in comprehensibility from the very start. For that reason, we were motivated to look more closely at the sociolinguistic contexts in which the speakers performed.
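One way to make the distinction between net change and monotonic progress explicit is to summarize each speaker's sequence of mean ratings. The Python sketch below is illustrative only: it reports net change (Year 10 minus outset, so negative means improvement), whether the trajectory is non-increasing, and how many steps worsened. The `tolerance` parameter is a hypothetical allowance for day-to-day variability that was not part of the study's analysis, and the example values are invented.

```python
def summarize_trajectory(means, tolerance=0.0):
    """Summarize one speaker's mean ratings over successive time points.

    Lower ratings are better on all three scales, so a negative net change
    means improvement. Steps no larger than `tolerance` (a hypothetical
    allowance for day-to-day variability) are not counted as worsening.
    """
    steps = [later - earlier for earlier, later in zip(means, means[1:])]
    worsening = [s for s in steps if s > tolerance]
    return {
        "net_change": means[-1] - means[0],
        "non_increasing": not worsening,  # True if no step worsened
        "worsening_steps": len(worsening),
    }


# Hypothetical means at 2 months, 1, 2, 7 and 10 years (not study data).
example = [5.8, 5.1, 4.6, 5.3, 4.9]
print(summarize_trajectory(example))
print(summarize_trajectory(example, tolerance=1.0))
```

With the default tolerance, the example counts one worsening step (Year 2 to Year 7) despite a net improvement; with a tolerance of 1.0 scale point, that step is treated as within normal variability.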
A reviewer of the current paper asked whether bilingual status (prior to acquisition of English) could have influenced learner outcomes.All bilingual participants were in the SL group.Of those, four ranked in the top half according to comprehensibility, while three were in the bottom half.The best performance overall on comprehensibility, fluency, and accentedness was from a monolingual speaker.Although other researchers have shown a benefit for third language acquisition in bilinguals (Cenoz, 2013), our limited data are inconclusive.

Context influences on learner performance: Examples from SL speakers
We collected contextual data in two ways. The first was a simple pen-and-paper circling task, the Language Use Questionnaire (LUQ). The second was an extensive interview. We now consider these data in relation to the learning trajectories in Figure 2. First, we examine the case of a married couple, Speakers SL37 and SL39, whose comprehensibility and fluency ratings are given in Figure 2; note that Speaker SL37 had the best ratings on all dimensions at both the beginning and end of the study. Her comments during interviews indicated a very positive attitude towards learning English, and although she reported feeling nervous talking to others early on, she forced herself to do so. At the 10-year point she was asked what language learning advice she would give to newcomers, and her reply was 'Don't be afraid. People are friendly. Very few people judge you. It's [perceived criticism] just in your head. Use English speaker tricks by saying "pardon me" instead of "please repeat slowly"'. Speaker SL37 followed her own advice: aside from once at the 2-year point, in her LUQs she consistently reported having extended conversations with L1-English speakers several times a day (see Figure 3). After her initial language program, she studied higher-level ESL and then took a Master's degree at a local university. Following that, she took a management position in government, where she oversaw several staff, worked with numerous people on a daily basis, prepared presentations for a high-level bureaucrat to deliver, and completed multiple tasks requiring sophisticated pragmalinguistic skills.
Speaker SL39 fell into the less comprehensible and less fluent half of SL speakers at the outset. As can be seen in Figure 2, he improved dramatically on both dimensions during the first 2 years, such that he was the best-rated SL speaker in comprehensibility and fluency at the 2-year point. He subsequently appeared to have regressed slightly, but was still rated in the top half of participants and was considerably improved at the 10-year point relative to the outset. Insight into possible sociolinguistic contributors to this pattern can be gained through an examination of his interview responses. Although he indicated ongoing, daily conversations in English with L1-English speakers in his LUQs over the whole study period, several life events may have influenced his unique comprehensibility trajectory. While enrolled in English classes, he took a part-time job (which became full-time in Year 2) as a massage therapist in a high-end spa. In his interview, he reported having extensive conversations with clients all day long on a wide range of topics. At approximately the 4th year, when his wife had a baby, Speaker SL39 became a stay-at-home parent, allowing his wife to return to the workplace. Not only did his extensive conversations in English at work come to a halt, but he primarily used his L1 with the baby. As shown in Figure 3, he still reported having conversations in English lasting more than ten minutes every day, but interview responses indicated that his interactions were far more limited than before. At the 6-year point, Speaker SL39 enrolled in a 4-year nursing program, where he again had more interactions in English, but not to the same extent as during the first 2 years. Thus, there appears to be a relationship between the nature of his language use and his comprehensibility and fluency ratings. It is worth noting that the LUQ data failed to convey important nuances in this participant's language experience, which emerged more fully in the interviews.
Speakers SL45 and SL47 are another married couple. Speaker SL45 was the least comprehensible SL individual throughout the study. Although his fluency ratings improved somewhat over the first 2 years, he regressed and was eventually rated the least fluent at Year 10, with a mean rating of 6.4, only half a scale point better (i.e., lower) than his rating at the beginning (6.9). At Year 10, Speaker SL45 reported using English on the job at coffee and lunch with his immigrant co-workers and interacting with L1-English speakers no more than 4-6 times a week, roughly the same pattern as throughout the whole study (see Figure 3). He did not watch much English TV, and he rarely listened to the radio. The researchers judged his demeanor as relatively shy and observed that he tended to defer to his wife, Speaker SL47, in contexts where they were both present. She was gregarious and talkative, but in her LUQ responses she reported using English with L1-English speakers at levels similar to her husband's. This does not align well with her comments during interviews, in which she stated that her job entailed speaking all day long with personal banking clients. Her comprehensibility and fluency scores were considerably better than those of her husband at the outset, and continued to improve across the study. By Year 10, her comprehensibility and fluency ratings were virtually indistinguishable from those of others in the top third.

Context influences on learner performance: Examples from MA speakers
Speaker MA11 described himself as a quiet person who found it hard to contribute to class discussions. His comprehensibility was rated in the middle of the MA group at the study's outset; it worsened over the first 2 years and declined further by Year 7, then improved over the last 3 years, ending up approximately where he started. A similar trajectory was seen in his fluency data. In an interview at the end of Year 2, he indicated that he was depressed about his lack of progress in English, especially because it had a negative impact on his employment opportunities. A professional in his home country, he had to resort to factory and labouring jobs in Canada. At approximately Year 5, he enrolled in a 2-year technical training program. At the end of the study, Speaker MA11 had been working for 3 years as a lab tech, testing samples all day and speaking with no one except at lunch, when he ate either alone or with another Mandarin speaker. Although in his Year 10 LUQ he reported speaking with both L1 and L2 speakers of English several times a day, he also indicated that he did not watch much TV or listen to the radio in English. At earlier points in the study, his reported interactions in English occurred a few times a week at most. His friends were mostly Chinese. Despite his limited interactions, MA11's improved performance between Years 7 and 10 coincided with his taking new employment.
Speaker MA18 had the poorest comprehensibility and fluency ratings of all the Mandarin speakers at both the outset and the end of the study. Although he made noticeable improvement during the first 2 years, he then regressed to a point just slightly better than his initial ratings. His LUQs showed low levels of interaction in English over the course of the study until Year 10, when he reported speaking with L1-English speakers several times a day (see Figure 3). He did not watch much English TV or many English movies, did not listen to the radio, and socialized primarily with other Mandarin speakers. In the Year 10 interview, Speaker MA18 said that speaking was the most difficult aspect of language learning. He believed that immersion was the best way to learn. In his advice to newcomers, he said that it was 'better to find a job where you can talk and use English'. He stated, however, that he had no regrets about immigrating to Canada and said that he was a 'tough guy' despite his language struggles.
Finally, we consider the trajectories of Speaker MA20, a petroleum lab analyst. Her comprehensibility and fluency scores were among the best in the group at the outset, and she showed clear improvement in both at the 2-year point. She regressed slightly by Year 7, but at Year 10 her ratings showed slight improvement in comprehensibility and some improvement in fluency; she was the most fluent of all the Mandarin speakers. Throughout the first year of the study, Speaker MA20 reported in her LUQs that she had very few conversations in English with L1 speakers, but she interacted in English with other L2 speakers to a greater extent. By the 2-year mark, she was talking in English with both L1-English and L2-English speakers extensively every day; this change coincided with her employment history. Throughout the study, her TV watching was minimal, but she consistently listened to the radio for an hour a day. Speaker MA20 felt that she communicated well at work but still had difficulty in social situations.

Discussion
We remind readers that the participants in this study are a subset of a larger longitudinal population originally enrolled in the project and differ in composition from the subsets covered in our earlier reports (e.g., Derwing & Munro, 2013). In addition, the raters are new; thus, the results may not be directly comparable to earlier findings. That said, the general pattern of outcomes is consistent with our previous work. In this study, as in the earlier research, the SL group was judged to have improved in both comprehensibility and fluency up to the 7-year point, whereas the MA group's ratings showed no similar improvement on those dimensions. To these previous findings, we now add evidence of continuing improvement on the part of the SL group on all three dimensions between the 7- and 10-year points. Although the MA group also improved significantly in comprehensibility during the same interval, that change came after a period of significant worsening between Years 2 and 7. Our earlier papers from this project highlighted a Window of Maximal Opportunity (WMO) during the initial period of massive exposure to the L2. An examination of the SL group's trajectories for comprehensibility and fluency suggests that, for that group, the WMO may have spanned the first 2 years of the study, with continued but slower improvement over the remaining time. Such a pattern is not fully evident in the MA data, in part because of reversals in progress over time.
In the current study, our first research question was whether learners continue to improve between their 7th and 10th years in a new country. The early literature (e.g., Selinker, 1972) suggested that, in the absence of teaching interventions, L2 learners' development stops. Our earlier studies cast some doubt on this assumption, at least for pronunciation and fluency (Derwing & Munro, 2013), because improvement on those dimensions occurred in the SL group after the 2-year point. The findings reported in the current investigation provide new insight, indicating that improvement remains possible on all three dimensions even after 7 years, despite the absence of instruction.
Our second research question, regarding whether we would observe a difference in trajectories between the two distinct cultural and linguistic groups, can be answered mainly in the affirmative. The between-group divergence that we observed by the 2-year point in earlier studies persisted to the 10-year point, with minimal narrowing of the gap. Notably, the within-group comparisons revealed periods of both improvement and worsening in the MA group, whereas the SL group showed continued improvement. These findings require qualification, however. First, the attrited samples in this study were small, especially for the MA group; inclusion of speakers who could not be reached at Year 10 could have led to different findings. Second, the individual learner trajectories suggest that speakers differed in whether they improved after the 7-year point, and that improvement was somewhat commensurate with their use of the L2. Just as Zielinski and Pryor (2020) found, mean data can obscure important individual variability.
A primary benefit of this longitudinal study is the opportunity to examine individual learning trajectories. A notable feature of the trajectories in Figure 2 is that they are non-linear, as has been noted in much commentary from advocates of Complexity Theory, for example, Lowie and Verspoor (2019, 2022) and van Dijk and van Geert (2007). As we noted in the introduction, this study began well before CT was as widely discussed in the field of SLA as it is today. Nevertheless, the results of this study are consistent with a CT view. By examining L1, cultural, and contextual variables over time, in relation to simultaneously shifting individual learner differences, we are able to identify how facets of L2 speech emerge, progress, and at times regress in systems that are far from stable. As Hulstijn et al. (2014) argue in great detail, the separation between the cognitive and the social in the domain of SLA is an unnatural one, and researchers need to bridge the gap between the two to arrive at a better understanding of how both cognitive and social aspects of language systems develop. Longitudinal studies of language development can make an important contribution in this regard because they can help to uncover the complex interplay between cognitive and social functions.
This interplay is relevant to our third research question, which asked, 'To what degree do the sociolinguistic contexts of the speakers influence their individual development?' In several cases, we identified life events, personality characteristics, or behavioral patterns discussed in the interviews that were associated with changes in the participants' comprehensibility and fluency ratings. Language use as indicated in the LUQs did not follow a linear pattern either. This outcome fits with Nagle's (2022b) observation that latent influences on learning, such as motivation, anxiety, and engagement, are not themselves invariant. Such variables also interact with each other in a dynamic system. For example, positive emotions can help to sustain motivation, which might otherwise wane (Dewaele et al., 2023; Dörnyei & Henry, 2022; MacIntyre & Vincze, 2017).
Although the participants appeared to differ in their awareness of what would most promote learning, many recognized that they needed interaction and focused on finding people with whom they could talk in the L2. Speaker SL37, who demonstrated a strong willingness to communicate, saw the necessity of initiating those conversations despite fear of negative social evaluation. The SL speakers appeared to be more aware than the MA speakers of sociolinguistic factors that may benefit language learning.
A key finding of this research is that naturalistic improvement continued much longer than previous work assumed (Han, 2013), although the rate of change appears to slow over time. Both groups showed improvement early on, but the SL group continued to improve beyond that initial period, notably after 2 years. However, the gap between the 2-year and 7-year data collection points prevents us from pinpointing when learning slowed after the Window of Maximal Opportunity closed. Nonetheless, the slowing may reflect the onset of an attractor state (i.e., a stable tendency) as described in Complex Dynamic Systems Theory (Hiver, 2015). With respect to learning influences beyond the 7-year point, our interview and LUQ data indicated that some participants' continued and extensive use of English likely contributed to their ongoing advancement.
One pedagogical implication of these findings is that some adult immigrant L2 students need to learn strategies for initiating conversations outside the classroom to enrich their input. Local topics of interest, such as sporting or cultural events, politics, hobbies, and movies/TV, generally make for good small talk and may lead to more extensive conversations than superficial exchanges about such things as the weather. As Derwing et al. (2007) suggest, learners who do not follow current events may be missing out on opportunities to interact with others in their L2. Teachers are in a position to assist their students both through the content they introduce in the classroom and by arranging contact activities and heightening students' awareness of ways to obtain richer interactions, which in turn may lead to higher levels of WTC (for further discussion, see Derwing & Munro, 2015).
Another practical implication of these findings, which teachers could relay to their adult L2 students, is that language learning does not stop when they exit their program. Naturalistic learning still takes place, and it can go on for a very long time. As Ellis (2014) states, 'language bridges society and cognition. It is a distributed emergent phenomenon. People and language create each other, grow from each other, and act and change under the influence of the other' (p. 607).
Beyond naturalistic learning, targeted interventions delivered much later have also been shown to be effective in improving oral language skills (Derwing et al., 1997; Derwing et al., 2014; Inceoglu, 2021). Teachers would be well-advised to discuss this with their students, especially since some may face a temporary language barrier to workplace advancement years after leaving L2 classes. Such L2 speakers may benefit from a short-term pronunciation course such as the one outlined in Derwing et al. (2014), and raising learners' metalinguistic awareness can help them become better at self-assessment (O'Brien, 2019). More broadly, learners' communicative skills could be further improved through instruction on pragmatics and non-verbal features of interaction (Derwing et al., 2021; McDonough et al., 2023).
The methods we have described here offer several benefits. Among these is the use of quantitative and qualitative data in concert to help us draw inferences about improvement. Another concerns the distinction between individual and group performance. As Lowie and Verspoor (2019) have indicated, it cannot be assumed that group statistics represent individual behaviour or vice versa: 'That is why we need two lines of research in applied linguistics: group studies and single case studies' (p. 185). The current investigation combines some merits of both; it is a two-group study, allowing cross-group comparisons, and it incorporates detailed individual data. Furthermore, the data were collected at several points over 10 years. Cross-sectional designs do not necessarily provide an accurate window on development over time, whereas an ongoing focus on the same dependent and independent variables offers insights that cannot be obtained otherwise. An important aspect of longitudinal study design is the incorporation of several distinct and informative variables. For example, although we did not report all our variables in this paper, over the course of 10 years we considered intelligibility, comprehensibility, fluency, and accentedness at different points rather than restricting our focus to a single speech dimension. We also measured both speech perception and production in a variety of tasks. No researcher can anticipate exactly what is going to matter in the long run, so it is crucial to investigate several lines of inquiry. Not only does this make meaningful and interesting results more likely, but, as the foci of applied linguistics change over time, covering a lot of ground also helps to safeguard relevance.
The paucity of longitudinal studies described by Nagle (2021) is no doubt partly due to researchers' concerns about investing extensive resources in investigations that may turn up no meaningful findings. Our studies ought to reassure other researchers that it is worthwhile to extend projects across several years because substantive results are very likely to be obtained.

Limitations
In longitudinal research, both the independent and dependent variables must be evaluated over time to gain a full understanding of their relationship. A complication that arose in this study, however, is that responses on the LUQs could not always be taken at face value. A survey of this type is a very crude measure that may sometimes be contradicted by more specific information obtainable only through in-depth interviews or a more sophisticated instrument such as a language log (Ranta & Meckelborg, 2013).
Just as almost all adults continue to learn new words in their L1, the speakers here indicated that they were continuing to learn new words in the L2; several participants mentioned making an effort to do so. Our tests, however, did not focus on vocabulary. Nor did we examine grammar over time. Thus, we make no claim to having assessed overall linguistic improvement. We also recognize the problems associated with interpreting data from an attrited sample. Nonetheless, very few studies of language learning extend even beyond a year or two (Nagle, 2021), let alone up to the 10-year period covered here. A reviewer proposed that we comment on whether there were interaction effects across independent variables and indicate whether any variables seemed to be irrelevant. We recognize the value of being able to make such observations; however, the limited number of participants makes it especially imprudent to go further than we have done.

Figure 1. Boxplots illustrating ratings over time on the three dimensions as assigned to the MA (left side) and SL (right side) speakers. Individual dots show mean ratings for each speaker, and mean group performance is shown by the broken lines. a) MA comprehensibility ratings, b) SL comprehensibility ratings, c) MA fluency ratings, d) SL fluency ratings, e) MA accent ratings, f) SL accent ratings.

Figure 2. Individual trajectories for comprehensibility and fluency for MA (top panels) and SL (bottom panels) speakers. On both scales, lower values indicate better performance.

Figure 3. Self-reported L2 use from the LUQs for eight selected speakers. Trajectories are stylized to improve readability.

Table 1. Attrition over the course of the study (speakers remaining of the original 50).

Table 2. Repeated-measures ANOVA summary for listener ratings.
A Greenhouse-Geisser adjustment was applied because the sphericity assumption was violated.

Table 3. Results of pairwise post hoc comparisons.ᵃ
ᵃ All 'gain' and 'worse' entries are significant (Holm-adjusted p < .05); n/c = no statistically significant difference.
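Table 3 reports significance after Holm adjustment. As a reading aid only, the following Python sketch illustrates the standard Holm-Bonferroni step-down procedure. It is not the analysis script used in this study; the function name holm_adjust is illustrative, and the four raw p-values are hypothetical values chosen for demonstration, not results from Table 3.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjustment (illustrative sketch).

    Returns adjusted p-values in the same order as the input, so each
    one can be compared directly against alpha (e.g., .05).
    """
    m = len(p_values)
    # Rank the comparisons from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - rank,
        # capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted p-value may not be smaller than
        # the adjusted value of any comparison with a smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for four pairwise comparisons (not study data).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

With these hypothetical inputs, the comparisons with raw p-values of .03 and .04 would no longer count as significant at .05 after adjustment, which illustrates why Holm-adjusted results are more conservative than unadjusted ones. Standard statistical software implements the same procedure (e.g., the "holm" method of R's p.adjust).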