Emotional vocabulary in immigrants’ L2 written discourse: is linguistic distance a proxy for L2 emotionality?

Emotional vocabulary is an important element in daily conversations, and knowledge and teaching of this vocabulary in a second language (L2) should be a primary goal in migration contexts. This study aimed to identify the emotional words used in the written productions of 288 adult immigrants from different countries of origin who were beginner-level learners of Spanish and to analyse the affective dimensions of valence and arousal of these words. The study also investigated whether the linguistic distance between the first language (L1) of these immigrants and their L2 (Spanish) – as assessed with the normalised and divided Levenshtein distance – constituted a proxy for emotionality in L2 written discourse. Multiple regression models and mediation analysis revealed that the effect of linguistic distance on the number of high-arousal words was mediated by L2 proficiency level, and that L2 proficiency level had a positive influence on the number of emotional (positive/negative) words. The results also revealed that these immigrants used a greater number of positive words in their L2 written productions.


Introduction
The movement of people across national and international borders, for reasons that range from searching for better work and life conditions to on-going wars such as Russia's invasion of Ukraine in 2022 and the armed conflict between Israel and Hamas in the Gaza Strip in 2023, represents one of the most important socio-political challenges. It is thus not surprising that international migration and the integration of migrants and refugees have become important topics in many scientific fields, as migration has significant economic and social implications for contemporary societies. One of the main priorities for this population is to learn the language of the host country when this differs from their first language (L1), since the mastery of literacy and oral skills is a prerequisite for successful integration into the society, the educational system and the workplace, as well as for social networking (Dustmann 1994; Esser 2006; Robinson and Gadelii 2003). As Tarone and Bigelow (2012) argued, these newcomers 'will weave their stories together with our own' and therefore, 'we must understand them and how they learn language as part of their adaptation process' (22). These stories will include not only facts and undisputed truths, but also emotions, feelings, and a considerable amount of subjective affectivity. This highlights the need to investigate productive emotional language among migrants, not just their general linguistic competence or assimilation process, since both are needed for a successful societal integration (see Piller 2012). Knaller (2017) claimed that 'Emotions on the level of production are among those least explored' (20) and emphasised the inherent link between the writing processes and emotions: 'writing relates to emotions as a prerequisite for self- and other-experiences, judgments, evaluations, understanding and perception' (22). Focusing on this link and its psycholinguistic dimension, the current study analysed the emotional vocabulary in a corpus of Spanish as a second language (L2) written productions by 288 adult immigrants from different countries of origin who were beginner-level learners of Spanish and were living in Madrid (Spain) at the time of data collection. These written productions were part of a certification exam and did not target specific emotions or highly emotional topics. However, in our view, emotions arise unpredictably and regularly without an emotionally charged context necessarily being present, hence the need to investigate the affective features of these more neutral – yet not unemotional – writing contexts. Specifically, we addressed the following research questions: (1) Does the linguistic distance between the L1 and the L2 relate to the affective dimensions (valence and arousal) of the vocabulary used by adult immigrant beginner-level learners of Spanish in their L2 written productions? If yes, to what extent is the linguistic distance a proxy for emotionality, operationalised as the number of positive, negative, and high-arousal words used in L2 writing? (2) To what extent do gender, age, length of residence in the host country, and L2 proficiency level contribute to the emotionality of these immigrants' L2 written discourse?
As emotional words are an important indicator of discourse emotionality, our hypothesis was that the emotionality of the texts produced by our immigrant participants would decrease as a result of the linguistic distance between L1 and L2. However, the overall L2 proficiency level has been proposed as a significant determinant of L2 learners' emotional vocabulary (Dewaele and Pavlenko 2002; Mavrou and Bustos-López 2018; Pavlenko and Driagina 2007) and might therefore mediate the above link. Based on previous research among L1 and bi-/multilingual speakers (Brody, Hall, and Stokes 2016; Chaplin 2015; Dewaele and Pavlenko 2002; Goldshmidt and Weller 2000; Mavrou 2021; Montepare and Dobish 2014), we also hypothesised that women, younger participants, and those who had spent more time in the host country (Spain) would use a greater number of affective words.
Our study contributes to bi-/multilingualism and L2 emotion research in several ways. The influence of the L1 on L2 learning and use has occupied a privileged position from the very beginning of applied linguistics, and comparative and cross-linguistic studies have produced a considerable amount of evidence pointing to the more (or less) facilitative role of the L1 in L2 comprehension and production (Odlin 2003; Ringbom 1987; Zobl 1980). However, Second Language Acquisition (SLA) theories and models based on this evidence have mainly been grounded in the study of academic or highly literate students in Western societies, particularly learners of English as an L2. This trend represents a significant obstacle to the identification of universal cognitive – or emotional – processes involved in L2 acquisition in migration contexts (Tarone and Bigelow 2012; Tarone, Bigelow, and Hansen 2009; van de Craats 2011; van de Craats, Kurvers, and Schöneberger 2011; van Hout 2006; Young-Scholten 2013). As van Hout (2006) argued, 'University students are equipped and motivated learners of second languages, but they are trained to learn, using all written knowledge sources available, including digital resources and tools. We cannot generalise research results obtained from them to groups that have very low levels of schooling or have no schooling at all' (6), as might be the case for some – but definitely not all – migrants.
A second important caveat concerns the ways in which the L1 is usually operationalised. The common tendency in SLA studies is to compare two or more groups of L2 learners with different L1 by means of – in the best-case scenario – multivariate analysis of variance techniques in order to identify between-group differences in the outcome variable. Our view is supported by Plonsky's (2013) meta-analysis of 606 SLA primary studies, which revealed that the majority of these studies used comparison of means tests such as t-tests and (M)AN(C)OVAs. Moreover, if large samples of L2 learners with different L1 are treated as a whole, the variance observed in the outcome variable will undoubtedly be confounded with L1. A potential solution to this problem is the use of a measure that quantifies the linguistic distance between the L1 and the target language (Isphording and Otten 2014). Such a measure has been used in the present study to examine its predictive validity as a proxy for emotionality in immigrants' L2 written discourse.
Additionally, the terms emotion and emotionality clearly refer to many different domains and dimensions; however, there is a consensus amongst scholars that language and particularly emotion concepts are the foremost means of the communication of emotions (Dewaele 2008, 2013; Pavlenko 2008, 2013). A single word such as happy or war would be sufficient to describe a person's emotional state or to evoke strong emotions and feelings in the reader or listener. This emotional vocabulary is an important element in daily conversations, and knowledge of this vocabulary can serve as an essential learning strategy for beginner learners who, even if they are not able to produce coherent or grammatically correct discourses, can still communicate their emotions using a few (isolated) emotion concepts. This is relevant not only in instructional or naturalistic L2 settings, but particularly in migration contexts in which new values, beliefs, and mannerisms must be learned and applied to achieve a high degree of integration into the host society and culture.
Ultimately, this is an exploratory study that aims to inform both language policies and (psycho)linguistic research. From an educational perspective, examining the emotional words that migrant learners at the beginner level have acquired and are able to use in the L2 can provide educators with useful insights for selecting the most appropriate and effective means of pedagogical instruction to enhance this vocabulary. From a psycholinguistic perspective, the question is whether migrants' acquisition of emotional vocabulary in the L2 follows the same developmental trajectory as it does in children who acquire the emotion labels in their L1. Finally, from a linguistic perspective, it is important to determine whether L1-L2 interaction patterns that have already been identified in grammar or general vocabulary knowledge can also be extended to the acquisition and use of L2 emotional words by migrants.

Emotional vocabulary
Over the last decades, increasing attention has been paid to the ways in which emotions are perceived and expressed in different languages and cultures. Emotional vocabulary encompasses the verbal manifestation of emotions and is not universal but rather tends to be specific to a certain language or culture (Barrett 2017; Kitayama, Mesquita, and Karasawa 2006). The mastery of a rich and varied emotional vocabulary – both in L1 and L2 – is particularly relevant for self-awareness, emotional regulation, conflict management, and successful interpersonal relationships and integration (Altarriba and Bauer 2004; Bisquerra and Filella 2003; Fisher and Shapiro 2005). Pavlenko (2008) distinguished between emotion concepts, which make direct reference to an affective state or process; emotion-related words, which define behaviours associated with specific emotions; and emotion-laden words, which are related to emotions indirectly, depending on the emotional weight that individuals attribute to them based on their personal circumstances or contextual factors. Of interest, Altarriba and Bauer (2004) challenged the common belief that emotion concepts and emotion-laden words are abstract and demonstrated that these words are represented, processed, and remembered differently compared to concrete and abstract words. Another way to classify emotion(al) words is according to their valence and emotional arousal, and this approach has been adopted in this study. Valence refers to the degree of pleasantness of a stimulus (i.e. whether it is or is perceived as being positive, negative, or neutral). Arousal concerns the degree of agitation or excitement that individuals experience in response to a stimulus (i.e. whether it is exciting and stimulating or relaxing) (Russell 1980).
Evidence suggests that the emotional force of a language is influenced by many factors. For example, L1 vocabulary is usually perceived as more emotionally intense and has stronger semantic representations in bilinguals' minds than L2 words; however, a naturalistic context of L2 acquisition, an early age of onset, frequent use of the L2, advanced L2 proficiency, and prolonged socialisation in the L2 tend to minimise the above differences (Dewaele 2009, 2013; Dewaele and Pavlenko 2002; Ożańska-Ponikwia 2012, 2013; Ponari et al. 2015). It is also worth noting that language dominance may not always coincide with the order of language acquisition, as extensive socialisation in the L2 may lead to L1 attrition and convert the L2 into the language of the heart (Dewaele 2004).

Emotions and emotional expression in migration contexts
The last decade has witnessed an increasing number of studies focusing on the emotional aspects of speech of migrant – including underrepresented and marginalised – populations; that is, the ability of migrants to express their emotions and feelings in their different languages and to perceive the emotional intensity of the target language (Dewaele 2013; Pavlenko 2013), which are fundamental for integration, socialisation, and a sense of belonging (Panicacci 2020, 2021). Some studies placed emphasis on immigrants' language preferences and physiological reactivity to emotional phrases. For example, Panicacci (2020) investigated whether expressing emotions in the language of the host country predicted acculturation attitudes towards both the L1 and the language spoken in that country. The results revealed that frequent use of the L2 to express specific emotions, such as anger and love, as well as for swearing, was associated with higher levels of acculturation. In fact, the use of the L2 to express anger was the best predictor variable of migrants' sense of belonging to the L2 society. More recently, Shakiba and Dewaele (2022) examined how language preferences for swearing are linked to certain socio-biographical variables and the degree of acculturation of Persian-English immigrants residing outside Iran. Their results indicated that the frequency of swearing in English was associated with gender (females > males), an early age of onset, higher levels of L2 proficiency, and more time spent in English speaking countries (see also Dewaele 2013). Switching to the L2 to express their anger helped these immigrants escape the stigma of swearing in their L1. Similarly, in Caldwell-Harris et al. (2011), Chinese immigrant learners of English preferred to express anger, taboo phrases, and intimacies in the L2, which appeared to reflect cultural conventions and constraints related to their culture of origin. Cook and Dewaele (2021) further highlighted the liberating effects of using an L2 (English) to narrate traumatic experiences among survivors of sexual persecution. The use of the L2 helped these refugees bring suffering into words, feel free to express love and to be themselves and find a way to conciliate with their L1. Since emotion concepts are at the basis of emotional communication (Barrett 2009, 2017), the above studies reinforce the idea that having a rich L2 emotional vocabulary would provide immigrants with additional opportunities not only to make friends, integrate, and socialise, but also to reflect upon, share, and eventually overcome negative experiences. It would also allow them to confront the 'culture shock' and the day-to-day practical and pragmatic challenges when looking for 'a house, a job, health care, insurance and a driver's licence' (Dewaele and Stavans 2014, 204). Nevertheless, the above studies were mostly based on self-reports of language choices and preferences rather than on the actual language use (i.e. productive language) to express emotions; in other words, they are methodologically different and thus not directly comparable with the current study.
The (emotional) vocabulary used by immigrant L2 learners and heritage speakers (i.e. individuals who were bi-/multilinguals from a young age) has been the focus of an important number of studies, which mainly analysed large corpora or used emotion-evoking stimuli and autobiographical memories as a technique to elicit emotional experiences. In a seminal study, Dewaele and Pavlenko (2002) explored several factors that may influence the number of emotional types and tokens (nouns, verbs, adverbs, adjectives) produced by Dutch L1-French L2 and Russian L1-English L2 speakers. Overall, the results revealed that higher levels of L2 proficiency and sociocultural competence predicted the number of emotional tokens and types, respectively. Moreover, females and more extroverted participants used more emotional tokens and types.
Evidence from heritage speakers who emigrated to a different country has suggested that an early age of onset and longer exposure to the L2 are associated with larger general vocabularies in the majority language (L2) of Japanese L1-English L2 (Mori and Calder 2013) and Frisian L1-Dutch L2 heritage speakers (Blom and Bosma 2016), as well as heritage speakers with 33 different L1 in Smolander et al.'s (2021) study. Vañó and Pennebaker (1997) investigated the emotional vocabulary of heritage speakers of Spanish in the United States and found that the amount of English input and output, both at home and in the classroom, was linked to these speakers' emotion-specific vocabulary size. Marian and Kaushanskaya's (2008) study with young adult Russian-English bilinguals who had emigrated to the United States revealed that these participants used more emotion words in their L2 narratives about the migration experience (but see Gökmen and Yarici 2018). They also used more positive words the earlier the age they had emigrated to the host country. Vidal Noguera, Villar, and Mavrou (2022) analysed the emotional vocabulary used by adolescent heritage speakers of Spanish who were living in Germany in their L1 and L2 autobiographical memories about events that triggered anger. Using Bayesian analysis, they found that the autobiographical memories in German contained a significantly higher number of affective types, but the autobiographical memories in Spanish included more positive tokens (i.e. total number of positive words, including repetitions) and types and more high-arousal tokens and types, with Bayes factors suggesting moderate to strong evidence in favour of these differences. On the other hand, no relationship between daily language input and output in Spanish and German and affective vocabulary in these languages was found. Of interest, those heritage speakers who had been exposed to German from birth used more varied emotional vocabulary in both their L1 and L2, which points to the advantages of early bilingualism when acquiring and using affective words.
Heritage speakers usually acquire their languages simultaneously during childhood, and this differentiates them from migrant L2 learners who emigrate to a new country during early or late adulthood. Furthermore, heritage speakers are usually dominant in their two languages (although not necessarily in all four basic language skills), while migrants may have a low L2 proficiency level upon arrival to the new country or even after years residing in it, as is the case of many participants of the current study. To our knowledge, only Mavrou and Bustos-López (2018; see also Bustos-López and Mavrou 2019) analysed the valence of the emotional vocabulary in speaking tasks used by adult immigrant beginner-level L2 learners. Their participants were from Ukraine, Morocco, Syria, and Egypt, and were living in Madrid (Spain) at the time of data collection. Sociodemographic and linguistic variables such as gender, age, length of residence in Spain, linguistic family, overall L2 proficiency level, and linguistic accuracy were considered. The results revealed that around one third of these immigrants' oral discourse consisted of emotional words, particularly positive ones. Male participants and those who had spent more time in Spain used more negative words. Furthermore, L2 proficiency level had a positive influence on the number of emotional tokens and types, and the use of more emotional words led to more lexically diverse oral discourses.
Altogether, the above findings highlight that L2 proficiency is a key factor influencing vocabulary knowledge and use, including emotional vocabulary. In addition, there seems to be a complex interaction between (emotional) vocabulary acquisition and use and certain sociodemographic variables – such as gender, age of onset, length of residence in the host country and exposure to the L2 –, but their contribution should be considered along with the specific language profile of the participants of previous studies (foreign language learners versus heritage speakers versus migrant L2 learners). An important shortcoming of previous research concerns the limited combination of language pairs that are examined and compared within the same research design. For example, studies on heritage speakers usually recruit and analyse language performance or production of speakers of a specific language pair. Immigrants' L1(s) and differences in language families between these L1(s) and the language spoken in the host country may have an impact on the acquisition and use of L2 emotional vocabulary. In a recent study, Mavrou and Chao (2023) found that the linguistic proximity between the L1 and the L2 – as assessed with the normalised and divided Levenshtein distance – was a significant predictor of accuracy, text-production fluency, and overall L2 writing skills of immigrant L2 learners of Spanish, and that L2 proficiency level mediated the link between linguistic distance and fluency. Based on these results, we hypothesised that the linguistic distance may also have a role to play in the acquisition of other aspects of communicative competence, such as (emotional) vocabulary. Furthermore, mediation analysis appears to be a suitable tool to uncover hidden or complex relationships between variables and, as will be seen later, it has been used with this aim in the current study.

Participants
The participants were 288 immigrant L2 learners of Spanish, 191 females and 90 males (missing data = 7), aged between 16 and 71 (M = 34, SD = 9.88, missing data = 1), from 39 different countries of origin: Algeria, Bangladesh, Belarus, Brazil, Bulgaria, Cameroon, China, Congo, Ghana, Guinea, Guinea Bissau, Egypt, India, Italy, Ivory Coast, Jordan, Kenya, Kuwait, Mali, Mauritania, Moldova, Morocco, Nigeria, Pakistan, Palestine, Philippines, Poland, Portugal, Romania, Russia, Senegal, Sierra Leone, Somalia, Sri Lanka, Sudan, Syria, Uganda, Ukraine, United States. The average number of years they had lived in Spain was 6.14 (SD = 5.93, missing data = 3), and their education levels varied: 105 had attended university or had a university degree, 31 had obtained a vocational degree, 48 had ceased their studies after high school (12 years of schooling), 60 after secondary education, and 25 after primary education (10 and 6 years of schooling, respectively), while 6 participants were non-literate in their L1 upon arrival in Spain (missing data = 13). The participants' L1 used in the statistical models was the national language (i.e. standard language variety) spoken in their countries of origin. Some participants reported having a second L1 (e.g. some Ukrainian participants reported having 2 L1, Ukrainian and Russian). However, for reasons of parsimony, only the linguistic distance between the official language spoken in the participants' countries of origin and the target language (Spanish) was used in the statistical analyses.

Corpus
The written productions were derived from a language proficiency examination for immigrant learners who want to certify their proficiency level in Spanish up to the A2-n level. This level is slightly lower than the A2 level established by the Common European Framework of Reference for Languages (CEFR; Council of Europe 2001). The aim of the exam is to assess immigrants' communicative abilities to cope with daily language demands in public and professional domains; for example, to provide personal information, to conduct transactions, to accept or reject an invitation, and so forth. The exam comprises four sections: reading, speaking, listening, and writing. This study used the written productions from the writing section of the exam, specifically Tasks 2 and 3, which require the participants to create a job posting and to respond to an email, respectively. Part of the analysed samples have been transcribed and are available online (https://slabank.talkbank.org/access/Spanish/Nebrija-INMIGRA.html).
As mentioned previously, our participants had a low L2 proficiency in Spanish. This was also evidenced by their overall scores in the certification exam, which exclusively assesses Spanish as an L2 at the beginner level (M = 7.74 out of 10.00, which was the maximum score that a candidate could obtain, SD = 1.46). The average length of the participants' written productions was 61.15 words for the two tasks combined, with a standard deviation of 18.58. These results are in the expected direction considering the profile of our participants, their L2 proficiency level (i.e. slightly lower than the A2 level according to the CEFR), and the context in which the data collection took place (i.e. language testing setting that required participants to produce their texts in a specified time period without additional external aids or resources).

Measurement of emotional vocabulary
The emotional vocabulary that the participants used in their texts was analysed according to the total number of positively-valenced, negatively-valenced, and high-arousal words. As mentioned previously, valence represents the degree of pleasantness of a stimulus (positive, negative, neutral), while arousal refers to the level of excitement or intensity (high or low) caused by a stimulus. These affective dimensions allow for a relatively objective assessment of the emotionality of a text, at least at the word level. We calculated (1) the sum of the total number of positive and negative types that the participants produced in both writing tasks, which served as an indicator of emotional vocabulary development (or growth), and (2) the total number of high-arousal types used in both writing tasks, which served as a proxy for emotional intensity. To accomplish this, we used the web-based search engine emoFinder (Fraga et al. 2018), which includes subjective estimates of valence and arousal of Spanish words. These values range from 1 to 9. In the present study, we followed the cut-off points for valence and arousal established by Ferré et al. (2012) and Hinojosa et al. (2016). For valence, values of 1.00-3.99 correspond to negative words, 4.00-5.99 indicate neutral words, and 6.00-9.00 represent positive words; for arousal, words with values between 1.00 and 4.99 are low-arousal words, and values of 5.00-9.00 indicate high-arousal words. We analysed 17,612 words and identified a total of 5,330 positive words, 258 negative words, and 5,217 high-arousal words. Given the low proportion of negative words, positive and negative words were all incorporated into one variable: emotional words.
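The cut-off points described above can be illustrated with a minimal sketch. It assumes that ratings falling between the published bands (e.g. 3.995) belong to the lower band, and it uses a small invented lookup table (`toy_norms`) in place of emoFinder, whose interface is not reproduced here. The function names and the toy ratings are hypothetical and are not taken from the study.

```python
def classify_valence(valence: float) -> str:
    """Map a 1-9 valence rating to negative / neutral / positive."""
    if not 1.0 <= valence <= 9.0:
        raise ValueError("valence ratings range from 1 to 9")
    if valence < 4.0:   # 1.00-3.99
        return "negative"
    if valence < 6.0:   # 4.00-5.99
        return "neutral"
    return "positive"   # 6.00-9.00


def is_high_arousal(arousal: float) -> bool:
    """Ratings of 5.00-9.00 count as high arousal, 1.00-4.99 as low."""
    if not 1.0 <= arousal <= 9.0:
        raise ValueError("arousal ratings range from 1 to 9")
    return arousal >= 5.0


def count_affective_types(text: str, norms: dict) -> dict:
    """Count affective types (unique word forms) found in the norms."""
    counts = {"positive": 0, "negative": 0, "high_arousal": 0}
    for word in set(text.lower().split()):
        if word not in norms:   # words without ratings are skipped
            continue
        valence, arousal = norms[word]
        label = classify_valence(valence)
        if label != "neutral":
            counts[label] += 1
        if is_high_arousal(arousal):
            counts["high_arousal"] += 1
    # Positive and negative words are pooled into one variable.
    counts["emotional"] = counts["positive"] + counts["negative"]
    return counts


# Invented ratings for illustration only; these are NOT emoFinder values.
toy_norms = {"feliz": (8.0, 6.5), "guerra": (1.5, 7.0), "mesa": (5.0, 3.0)}
result = count_affective_types("Feliz feliz guerra mesa trabajo", toy_norms)
print(result)
```

Counting unique word forms mirrors the type-based measures described above; a token-based count would iterate over the word list without the `set` call.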

Linguistic distance
Linguistic distance was operationalised as 'the minimum number of insertions, deletions, or substitutions of a single character needed to transform one word into the other' (Petroni and Serva 2010, 2281) and was measured using the normalised and divided Levenshtein distance. This measure is derived from the Automated Similarity Judgment Program developed by the German Max Planck Institute for Evolutionary Anthropology and estimates the phonetic similarity of a set of 40 words (Swadesh list) that refer to common objects in different languages (Isphording and Otten 2011, 2013, 2014; Petroni and Serva 2010). The normalised and divided Levenshtein distance represents the percentage of dissimilarity between languages; lower values indicate closer linguistic proximity, while higher values indicate greater linguistic divergence. It covers a wide range of language pairs; for example, Spain is matched with around 178 different countries to compute the linguistic distance between Spanish and the official language spoken in those countries. In addition, the measure is not influenced by incentives to learn a language and can take a range of values distributed along a continuum – exceeding 100% when a language pair is very dissimilar – which makes it ideal for use in many statistical models (see Isphording and Otten 2011, 2013, 2014, for more information about this measure).
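To make the definition concrete, the following sketch computes the Levenshtein distance, its length-normalised version, and a divided version for two aligned word lists. The 'divided' step (dividing the average distance of same-meaning pairs by the average distance of different-meaning pairs) follows the general description of the ASJP measure and is an assumption here, since the text above does not spell it out. The sketch also works on plain orthographic strings rather than the phonetic transcriptions used by the original programme, so it should be read as an illustration and not as the procedure used to obtain the values in this study.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalised_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def divided_distance(words_a: list, words_b: list) -> float:
    """Assumed 'divided' step: same-meaning average over different-meaning average.

    words_a[i] and words_b[i] must express the same meaning; the lists need
    at least two entries.
    """
    n = len(words_a)
    same = sum(normalised_distance(words_a[i], words_b[i]) for i in range(n)) / n
    different = sum(normalised_distance(words_a[i], words_b[j])
                    for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return same / different


# Two-item toy lists (Spanish vs Portuguese spellings), far shorter than
# the 40-item list mentioned above.
print(levenshtein("noche", "noite"))          # two substitutions
print(normalised_distance("noche", "noite"))  # 2 / 5
print(divided_distance(["noche", "agua"], ["noite", "água"]))
```

Because the denominator captures chance similarity between unrelated words, the ratio can exceed 1 (i.e. 100%) for very dissimilar language pairs, which is consistent with the range described above.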

Data analysis
Statistical analyses were conducted using both NHST and Bayesian analysis procedures and were performed using SPSS v.23.0 (IBM Corp 2015) and JASP v.0.12.2 (JASP Team 2020). Correlations between the linguistic distance and measures of emotional vocabulary were computed first. Multiple regression models were then run to examine the contribution of a series of potentially relevant predictor variables to participants' emotional vocabulary. These variables were gender, age, length of residence in the host country, proficiency level in Spanish based on the final score that the participants obtained in the certification exam, and linguistic distance between their L1 and Spanish. The results of these analyses informed the mediation analysis, which allowed us to investigate whether the effect of linguistic distance on emotional vocabulary was mediated by L2 proficiency level. To determine whether the sample size was sufficiently large, a power analysis was conducted a priori using G*Power 3 Software (Faul et al. 2007, 2009), with input parameters that were set as follows: α = .05, β = .20, effect size f² = .15, and number of predictors = 5. The suggested sample size was N = 138, while the actual sample was much larger (N = 288). The study was approved by the Ethics Committee of Nebrija University (Ref. no. UNNE-2021-001) and followed the principles expressed in the Declaration of Helsinki.

Results
Descriptive statistics for the variables of the study are summarised in Table 1. Pearson product-moment correlations revealed a statistically significant negative correlation between the linguistic distance and the number of high-arousal words (r = −.153, p = .009, BF10 = 2.103), while the correlation between the linguistic distance and the number of positive and negative words was close to zero and therefore not statistically significant (r = −.055, p = .348, BF10 = 0.114). In other words, the closer the linguistic proximity between participants' L1 and Spanish, the greater the number of high-arousal words they used in their texts.
Two multiple regression models were then computed, with the number of positive and negative words (Table 2) and the number of high-arousal words (Table 3) as the outcome variables. The reference category for gender was female. The backward elimination method was chosen because it is more appropriate for exploratory model building (Field 2009). The results revealed that length of residence in Spain and L2 proficiency level were statistically significant predictors, explaining 11% and 16.5% of the variability in the number of emotional and high-arousal words, respectively. Of interest, the coefficients for length of residence in Spain were negative; that is, participants who had spent longer periods in Spain used fewer positive/negative and high-arousal words.
These results led us to the hypothesis that the link between L1-L2 linguistic distance and emotional vocabulary might be mediated by a third variable, such as the level of L2 proficiency. Further regression models were run to investigate the contribution of the linguistic distance to the dependent variables (measures of emotional vocabulary) and to the mediator (L2 proficiency level) and showed that the linguistic distance was only a significant predictor of L2 proficiency level (b = −0.045, t = −4.297, p < .001) and the number of high-arousal words (b = −0.102, t = −2.613, p = .009). In addition, when controlling for L2 proficiency level, the linguistic distance was not a statistically significant predictor of the number of high-arousal words (Table 3). For reasons of parsimony, since no statistically significant link was found between the linguistic distance and the number of positive/negative words, we assumed the null hypothesis of no relationship for this vocabulary measure and did not proceed with a mediation analysis to avoid the increased risk of false positives (Agler and De Boeck 2017; Baron and Kenny 1986).
A mediation analysis was performed only for the number of high-arousal words, using a bootstrapping method. The results of the indirect effect based on 5,000 bootstrap samples revealed a significant indirect negative relationship between the linguistic distance and the number of high-arousal words mediated by L2 proficiency level (b = −0.010, z-value = −3.476, p < .001, 95% bootstrap CI [−0.016, −0.006]; see also Figure 1). On the other hand, no statistically significant direct relationship between the linguistic distance and the number of high-arousal words was observed (b = −0.009, z-value = −1.243, p = .214, 95% bootstrap CI [−0.022, 0.005]). Following Zhao, Lynch, and Chen (2010), the type of mediation produced can be described as indirect-only mediation, or full mediation according to Baron and Kenny (1986), and allows for the conclusion that the mediator (L2 proficiency) accounted for all of the observed relationship between the latent variables. Given that the length of residence in Spain was correlated significantly with both the L2 proficiency level (r = .351, p < .001) and the number of high-arousal words (r = −.130, p = .028), a second mediation analysis was conducted with length of residence in Spain as the background confounder.
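The logic of the bootstrapped indirect effect reported above can be illustrated with a minimal sketch. It is not the software or the data used in this study: the function names (`mediation_paths`, `bootstrap_indirect`, `simulate`), the simulated values, and the use of a percentile confidence interval are assumptions made purely for illustration. Path a is the slope of the mediator (L2 proficiency) on the predictor (linguistic distance), path b is the slope of the outcome (high-arousal words) on the mediator while controlling for the predictor, and the indirect effect is their product.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def _cov(u, v):
    """Sample covariance (n - 1 denominator)."""
    mu, mv = _mean(u), _mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


def mediation_paths(x, m, y):
    """Return (a, b, direct) from two OLS fits: m ~ x and y ~ x + m."""
    vx, vm = _cov(x, x), _cov(m, m)
    cxm, cxy, cmy = _cov(x, m), _cov(x, y), _cov(m, y)
    a = cxm / vx                          # predictor -> mediator
    det = vx * vm - cxm ** 2
    b = (cmy * vx - cxy * cxm) / det      # mediator -> outcome, given x
    direct = (cxy * vm - cmy * cxm) / det  # predictor -> outcome, given m
    return a, b, direct


def bootstrap_indirect(x, m, y, n_boot=1000, seed=0):
    """Point estimate and 95% percentile bootstrap CI for a * b."""
    rng = random.Random(seed)
    n = len(x)
    a, b, _direct = mediation_paths(x, m, y)
    draws = []
    for _rep in range(n_boot):
        # Resample participants with replacement.
        idx = [rng.randrange(n) for _i in range(n)]
        a_s, b_s, _d = mediation_paths(
            [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]
        )
        draws.append(a_s * b_s)
    draws.sort()
    lower = draws[int(0.025 * n_boot)]
    upper = draws[int(0.975 * n_boot) - 1]
    return a * b, (lower, upper)


def simulate(n=288, seed=1):
    """Hypothetical indirect-only data: x affects y only through m."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _i in range(n)]
    m = [-0.8 * xi + rng.gauss(0, 1) for xi in x]
    y = [0.6 * mi + rng.gauss(0, 1) for mi in m]
    return x, m, y


if __name__ == "__main__":
    x, m, y = simulate()
    estimate, (lower, upper) = bootstrap_indirect(x, m, y)
    print(f"indirect = {estimate:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

In this simulated indirect-only scenario, a confidence interval for the indirect effect that excludes zero, together with a direct effect close to zero, corresponds to the pattern that Zhao, Lynch, and Chen (2010) label indirect-only mediation. The sketch defaults to 1,000 resamples for speed; `n_boot=5000` would match the number of bootstrap samples reported above.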

Discussion
Our study suggests that the linguistic distance between the L1 and the L2 (Spanish in this case) is negatively linked to immigrants' L2 proficiency level. In turn, L2 proficiency level appears to play a facilitative role in the use of high-arousal words and, eventually, in the production of more emotionally charged L2 written discourses. Based on these findings, we argue that the linguistic distance between the L1 and the L2 is an indirect proxy for emotionality related to immigrants' L2 written discourse, and that this link is mediated by L2 proficiency level (see also Chiswick and Miller 2005; Mavrou and Chao 2023). A plausible explanation for this finding is the positive transfer of high-arousal words between similar languages (i.e. languages that share a considerable number of words and similar syntactic rules). It is also possible that, once learned, L2 high-arousal words are harder to forget than L2 neutral words (i.e. a memory benefit for emotional vocabulary; see Kensinger and Corkin 2003) or are more easily remembered because of their cognitive distinctiveness (see Hourihan, Fraundorf, and Benjamin 2017).
In terms of emotional vocabulary development, operationalised in this study as the number of positive and negative words that the participants produced in their texts, L2 proficiency level emerged as a statistically significant predictor variable. This is in line with previous findings pointing to the overall benefits of being highly proficient in the target language (Dewaele and Pavlenko 2002; Mavrou and Bustos-López 2018; Pavlenko and Driagina 2007). More advanced L2 learners have probably reached an appropriate threshold of syntactic knowledge that allows them to focus explicitly on the acquisition of vocabulary, including emotional vocabulary, in a more active way and with instrumental purposes (Qian and Lin 2020), as well as to use more (low-frequency) emotion words in their discourse (Dewaele and Pavlenko 2002).
Length of residence in Spain also turned out to be a statistically significant predictor variable. However, contrary to expectation, it had an inverse relationship with emotional vocabulary (i.e. those participants who had spent less time in Spain used more emotional words). One possible reason is that length of residence in the host country may not be a sufficient condition to acquire a varied emotional vocabulary, unless it is accompanied by both formal instruction that helps to enhance this vocabulary (Juan Garau 2008) and high levels of psychological acculturation and socialisation within the host society (Panicacci 2020). In fact, previous studies have suggested that immigrants living outside their country of origin for a long time may have a larger network and more frequent contact with people from their L1 community, which may hinder their full emotional acculturation (Shakiba and Dewaele 2022; Zhou et al. 2021). The above finding can also be explained by the fact that immigrants who have lived in the host country for a long time may be more aware of the emotional force of L2 affective vocabulary, which makes them refine it to some degree and avoid abusing it in their social interactions with L1 speakers.
Our study also revealed a predominance of positive words being used by immigrant learners of Spanish L2 at the beginner level (for similar results see Bustos-López and Mavrou 2019; Jiménez Catalán and Dewaele 2017; Mavrou and Bustos-López 2018, 2019), which appears to resemble the development of emotional language in children. Children acquire and develop their emotional vocabularies by prioritising positive words as opposed to neutral and negative ones, and this pattern has been explained by early interactions between those infants and their caregivers (Bloom and Beckwith 1989; Li and Yu 2015; see also Hinojosa, Moreno, and Ferré 2020, for a review). An alternative explanation for this finding is the predominance of positive words in L2 textbooks. For example, Sánchez and Pérez-García (2020) measured the number of emotion words in intermediate-level English L2 textbooks and found that most of these words were high-frequency, positively-valenced words (see Ma 2012, for similar results). Moreover, immigrants' conscious effort to integrate into the new society perhaps makes them more prone to use positive words in an attempt to accommodate to, or gain sympathy from, their L1 interlocutors. It is also possible that learning to manage negative emotions requires more sociopragmatic skills and a more in-depth socialisation into the host culture compared to positive emotions, as Panicacci (2020) suggested. The topic of the writing tasks might have also played a role in the predominance of positive vocabulary in our participants' texts, as they were required to write a job posting and to respond to a party invitation. In the second topic in particular, they tended to use words related to gratitude, happiness, and positively laden concepts such as family- and food-related terms. Participants who accepted the invitation to attend the party generally expressed their gratitude for the invitation, mentioned whether they would bring food to the party, and stated who their companion(s) would be. Those who rejected the invitation also thanked the host for inviting them, apologised for declining the invitation, and justified their lack of attendance based on reasons related to family issues. These observations highlight the need to account for differences in the valence of emotional vocabulary based on the materials and prompts used for emotion elicitation.
Nevertheless, this study has several limitations. Participants were from 39 different countries, and some of them spoke languages or dialects which are highly underrepresented in the scientific literature, such as Tagalog, Ilocano, Bisaya, Bicolano, and Igbo. It was practically impossible for us to empirically assess participants' proficiency level in their different languages and dialects or how this proficiency influenced the emotional vocabulary they produced, nor could we address the role of cultural differences for the same reason. Moreover, the corpus consisted of short written productions that did not target emotional topics, and our participants used very few negative words in their texts, making it difficult to analyse the two poles of valence separately. This is because of the language testing setting in which the data collection took place. Future replication studies should employ longer narratives or autobiographical memories by immigrants and analyse emotionality at the discourse level, as well as the specific themes emerging from these narratives. Including measures of cultural distance, in addition to linguistic distance, collecting data about immigrants' socioeconomic status, cultural and ethnic backgrounds, frequency of L2 use, age of immigration and degree of acculturation, as well as qualitative data about the social challenges they face and reasons for migration, would be particularly valuable to understand the ways immigrant L2 learners express their emotions.

Implications
The findings of this study can inform language policies that are intended to support migrants in their integration into the host society. Linguistic distance, L2 proficiency, and the expression of affective states and emotions are factors that deserve further attention. L2 lessons for migrants should promote the teaching and sharing not only of positive but particularly of negative experiences, which will help immigrant L2 learners enhance their knowledge of negatively-valenced words and expressions. L2 proficiency exams could employ grading rubrics designed to assess candidates' ability to express emotional content as part of their written (as well as oral) discourses, since this ability has already been included in the new descriptors of the 2018 Companion Volume of the CEFR (Council of Europe 2018). In addition to providing adequate language training and pragmatic knowledge about the host society's verbal and non-verbal communication styles, these language policies should focus on promoting the expression and understanding of migrants' discourse, particularly their emotional discourse. At the same time, raising awareness of migrants' linguistic and emotional barriers is essential in order to provide them with the necessary resources to promote their psychological health (such as interpreters in the health care system and multilingual therapy) and facilitate their integration into the host society and culture, as well as to foster prosocial communication that enables the achievement of cultural-emotional connectedness (Bennett, Volet, and Fozdar 2013). Such policies will not only provide migrants with the necessary skills that will allow them to navigate social interactions more effectively but will also strengthen the fabric of their ethnic communities, equipping them with 'emotional, relational, sociocultural, and political anchoring' (Martin and Nakayama 2022, 63). Including L1-L2 linguistic distance in the discussion will also help to bridge the gap between linguistic features and emotions as manifested in the use of emotional words.

Table 1.
Descriptive statistics for emotional words, high-arousal words, and L2 proficiency scores.

Table 2.
Predictors of the number of emotional (positive and negative) words.

Table 3.
Predictors of the number of high-arousal words.