Does bilingualism influence neuropsychological test performance in older adults? A systematic review

Abstract. Objective: Using standardized tests that have been normed on monolinguals to assess bilinguals presents challenges to the accurate characterization of cognitive profiles, as the literature provides compelling evidence for the influence of bilingualism on cognitive abilities. However, little is known about the generalizability of these findings to clinical neuropsychology. The aim of this review was to address this gap by summarizing current evidence on the performance of bilingual older adults on standardized tests routinely used in clinical practice. Method: A systematic search of Web of Science, PsycINFO and PubMed was conducted. Twenty-seven cross-sectional and longitudinal studies that used at least one standardized neuropsychological test for cognitive impairment were included in the review. Potential demographic (cultural/linguistic background of the participants, immigrant status), clinical (diagnostic status), and methodological confounders (language of test administration, components of bilingualism) were also examined. The review protocol was registered at the PROSPERO International Prospective Register of Systematic Review with registration number CRD42018114658. Results: The results of this review revealed some bilingual advantage on measures of inhibitory control and a bilingual disadvantage on measures of verbal fluency in cross-sectional studies. Bilingualism status was not associated with test performance in longitudinal studies. However, findings lack consistency due to demographic variables and methodological differences across studies. Conclusion: Neuropsychological tests assessing language domains and, to some extent, executive function act as clinically relevant features of bilingualism for neuropsychological evaluation. However, immigration status, acculturation level and language of test administration need to be taken into account when assessing bilingual older adults.


Introduction
The accuracy of clinical diagnosis of dementia and determination of its underlying cause play a crucial role in prognosis and in the early, suitable and effective application of disease-specific treatments (Salmon & Bondi, 2009). Neuropsychological assessment, a fundamental tool used in the diagnosis of mild cognitive impairment (MCI) and dementia, is considered to be the primary means of guiding differential diagnosis, monitoring the disease state and measuring post-treatment outcome (Casaletto & Heaton, 2017; Gracey & Morris, 2007). Normative comparison, a critical concept in neuropsychological assessment, allows for more accurate detection of disease-related changes in cognitive function since it enables a comparison between the performance of an individual and reference groups of the same age, gender, ethnicity and educational attainment (Harvey, 2012). Due to the significant independent effects of these demographic factors on test performance, the extant literature provides demographically adjusted normative data for a variety of neuropsychological tests (Casaletto et al., 2015; Heaton et al., 2009; Norman et al., 2000). However, the demographic characteristics of the population in Europe and the United States have changed over the years because of increased migration. This has resulted in increased diversity in the population in terms of ethnicity and spoken languages (Llorente et al., 1999).
Employing standardized tests, which have been normed on monolingual populations, for the assessment of bilinguals presents challenges to clinical diagnostic processes (Gasquoine & Gonzalez, 2012). Bilinguals' possible advantages over monolinguals in executive tasks, and disadvantages in language-based tasks (Mindt et al., 2008), may produce biased results. Specifically, the cognitive advantages associated with bilingualism have been shown as smaller costs in task switching (Prior & MacWhinney, 2010) and smaller conflict effects (Bialystok et al., 2004; Coderre et al., 2013) across a variety of executive functioning experimental tasks and the Stroop task in populations ranging from children (Bialystok & Feng, 2009) to young adults, with the most robust advantage reported in older adults (Bialystok et al., 2012). On the other hand, cognitive disadvantages in relation to bilingualism have been indicated as reduced category fluency (Portocarrero et al., 2007), poorer naming performance (Roberts et al., 2002) and retrieval failures (Golan & Brown, 2006), with the majority of these studies focusing on young adults.
Bilingualism may influence neuropsychological test performance (NP) beyond what is estimated by normative corrections for age, education, sex and ethnicity (Suarez et al., 2014). Thus, it is imperative to understand the impact of bilingualism on NP for the effective interpretation of test results, in order to determine whether obtained scores truly reflect an individual's cognitive abilities or the effects of life experiences such as bilingualism, particularly given the lack of normative data for this population (Mindt et al., 2008).

The role of confounding variables
The findings in studies of cognition and bilingualism are controversial and inconclusive due to the large number of potential confounding variables stated in the extant literature (Bak, 2016). Language proficiency constitutes one of the critical confounding variables in this field, as it has implications for the operationalization of language dominance (Hulstijn, 2012) and its definition covers multiple aspects of language experience, such as vocabulary size, language use in a specific pattern or age of acquisition (AoA). The impact of language proficiency on cognition was reported in the Stroop task, with its modulating effect on inhibition (Singh & Mishra, 2013), conflict resolution and goal maintenance (Tse & Altarriba, 2012). AoA is considered a factor modulating language proficiency in studies assessing the relation between bilingualism and cognition. However, its role in the neural organization of the bilingual brain has been more commonly investigated, with findings showing a differential pattern of brain activation associated with lexical retrieval and syntax (Mahendra et al., 2003) in early language acquisition.
Variations in language proficiency and amount of exposure to the language (e.g., AoA) play a crucial role in determining the language of testing in the neuropsychological evaluation of bilingual individuals. The impact of testing language on NP has been consistently shown on neuropsychological tests with significant language demands, namely, verbal fluency (Boone et al., 2007; Kisser et al., 2012), the Boston Naming Test (BNT) and Digit Span (Boone et al., 2007), in samples of culturally diverse young and middle-aged adults comparing native versus non-native English speakers. In the study by Kisser et al. (2012), native language status was additionally associated with better performance on the Trail-Making Test (TMT-A). However, the associations between native language status and letter fluency and TMT-A did not remain significant when ethnicity was included as a covariate in the analyses. In another study, which included a more culturally homogenous group of Spanish/English bilingual adults (Gasquoine et al., 2007), no significant differences were found in test scores between Spanish and English language administrations in balanced bilinguals. However, the impact of language of test administration was observed on tests that place a higher demand on the language domain, such as letter fluency, the Woodcock-Munoz Language Survey-Revised (WMLS-R), the Stroop Test and Story Memory, in Spanish-dominant and English-dominant bilinguals. Therefore, a critical understanding of the language of test administration is needed to interpret differences in NP in both clinical and research settings.
Immigration status plays a distinct role in bilingualism research, as prolonged contact between culturally dissimilar groups or members not only involves acquisition of another language but also leads to cultural, psychological and cognitive changes in terms of adapting to the cognitive styles of the host culture, a process referred to as acculturation (Berry, 2007; Park & Huang, 2010). The findings regarding immigration status and cognition are mixed. Regardless of bilingualism, some studies showed improved cognitive function and a slower rate of cognitive decline in older adult immigrants (Hill et al., 2012). In contrast, other studies reported either no significant differences in cognitive function and decline (Sheffield & Peek, 2009) or poorer performance on measures of abstract reasoning, verbal fluency, and naming in immigrant groups (Touradji et al., 2001). Hence, immigration status is a significant background variable which may account for some differences in NP attributed to bilingualism.
In light of these findings, language proficiency, AoA, language of test administration and immigration status have all been explored as possible modifiers of NP in bilingual older adults. The influence of education and socioeconomic status on NP was not addressed in relation to findings as they were beyond the scope of this review.

Objectives of the current review
The aim of the present work is to summarize current evidence on the performance of healthy bilingual older adults or bilinguals with cognitive decline due to dementia on standardized neuropsychological assessments routinely used in clinical practice. Specifically, this systematic review addresses whether (1) bilingual older adults display advantages or disadvantages in NP, (2) the advantage/disadvantage is found in specific cognitive domains and (3) the findings are influenced by language proficiency, AoA, immigration status and language of test administration.

Search strategy and study selection
A systematic review of the literature was conducted using the following databases: Web of Science, PsycINFO and PubMed. The search terms included were neuropsychological assessment, neuropsychology, cognitive tests, cognitive assessment, cognitive performance combined with second language, language proficiency, bilingualism, trilingualism and multilingualism. Using these search terms, published peer reviewed articles which were relevant to this review were identified and collected for further review. No restrictions on study design or date of publication were applied. The final search was carried out on March 24th, 2020. Additionally, manual searches were conducted in which reference lists of reviews and other articles were examined in order to identify relevant studies not detected by database searches. This review was conducted in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009) and it was registered with the PROSPERO International Prospective Register of Systematic Review (Registration No: CRD42018114658).
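As an illustration of how the two groups of search terms listed above might be combined, the following minimal sketch builds a single Boolean query string. The exact syntax entered into each database is not reported here, so the use of OR within each group and AND between the groups is an assumption, and the names `or_group` and `build_query` are hypothetical helpers introduced only for this example.

```python
# Hypothetical reconstruction: the review lists the search terms but not the
# exact Boolean syntax used in Web of Science, PsycINFO or PubMed.
COGNITION_TERMS = [
    "neuropsychological assessment",
    "neuropsychology",
    "cognitive tests",
    "cognitive assessment",
    "cognitive performance",
]
LANGUAGE_TERMS = [
    "second language",
    "language proficiency",
    "bilingualism",
    "trilingualism",
    "multilingualism",
]


def or_group(terms):
    """Join terms into a parenthesised OR group, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(group_a, group_b):
    """Combine two OR groups with AND (assumed reading of 'combined with')."""
    return f"{or_group(group_a)} AND {or_group(group_b)}"


query = build_query(COGNITION_TERMS, LANGUAGE_TERMS)
print(query)
```

A string of this form would still need to be adapted to the field tags and phrase-search rules of each database.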

Inclusion and exclusion criteria
Cross-sectional and longitudinal studies that used at least one standardized neuropsychological assessment routinely used in the diagnosis of dementia or cognitive impairment were included. Neuropsychological tools were identified in this review as brief screening measures, psychometric instruments, intelligence tests or computer-based cognitive assessments. The criteria used for selection of the studies were as follows: (1) the study investigated the impact of bilingualism on NP in older adults, (2) the study administered standardized neuropsychological tests or cognitive screening instruments which are commonly used in clinical practice, (3) the study included a group of cognitively healthy older adults, or individuals with MCI or any type of dementia, with a mean age of 60 years or older, and (4) the study included a monolingual group for comparison. Studies were excluded if: (1) the samples studied included comorbid neurological or psychiatric conditions which did not allow clear differentiation of findings by suspected etiology, (2) a modified version of a neuropsychological test or a novel test was used, (3) they were conference abstracts, reviews, case studies, or PhD dissertations, (4) they were published in a language other than English, or (5) they aimed to provide normative data in bilingual older adults.
The abstracts were screened for eligibility and potentially relevant articles were reviewed in full via Covidence by two independent reviewers, namely, doctoral students in psychology (SC, EK). Disagreements between the two independent reviewers were discussed and resolved or a third reviewer (BT; PhD in biology) was consulted.
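The inclusion and exclusion criteria above can be read as a simple screening rule. The sketch below is one hypothetical way to encode them; the `Study` record, all of its field names and the `is_eligible` function are assumptions made for illustration and do not describe the reviewers' actual screening procedure in Covidence. Inclusion criterion (3) is reduced here to the mean-age threshold.

```python
from dataclasses import dataclass

# Publication types excluded under exclusion criterion (3).
EXCLUDED_TYPES = {"conference abstract", "review", "case study", "phd dissertation"}


@dataclass
class Study:
    """Screening record; every field name here is hypothetical."""
    examines_bilingualism_in_older_adults: bool  # inclusion (1)
    uses_standardized_clinical_test: bool        # inclusion (2)
    mean_age: float                              # inclusion (3), simplified
    has_monolingual_group: bool                  # inclusion (4)
    confounding_comorbidity: bool = False        # exclusion (1)
    modified_or_novel_test: bool = False         # exclusion (2)
    publication_type: str = "journal article"    # exclusion (3)
    language: str = "English"                    # exclusion (4)
    normative_data_study: bool = False           # exclusion (5)


def is_eligible(s: Study) -> bool:
    """Return True if all inclusion criteria hold and no exclusion criterion applies."""
    meets_inclusion = (
        s.examines_bilingualism_in_older_adults
        and s.uses_standardized_clinical_test
        and s.mean_age >= 60
        and s.has_monolingual_group
    )
    meets_exclusion = (
        s.confounding_comorbidity
        or s.modified_or_novel_test
        or s.publication_type.lower() in EXCLUDED_TYPES
        or s.language.lower() != "english"
        or s.normative_data_study
    )
    return meets_inclusion and not meets_exclusion


candidate = Study(True, True, 71.3, True)
print(is_eligible(candidate))
```

In practice each criterion required reviewer judgment; the boolean fields stand in for those decisions rather than replace them.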

Data extraction
The data extraction process was guided by an extraction table designed by the first author and tailored to the aims of this review. The extracted data included information on geographical region, study design, participant characteristics (sample sizes for all groups, age, immigration status of participants, diagnostic criteria used), the neuropsychological tool(s) used, the language of test administration, results and conclusions. With regard to the outcome measures, only measures of cognition based on standardized tests were extracted from eligible studies; measures of cognition based on experimental tasks were excluded from this review. Additionally, in studies that included both younger and older adults, only the data of older adults were included in the analysis.

Quality assessment
The methodological quality of the studies was evaluated by two reviewers (SC, EK) using a modified version of the Standard Quality Assessment Criteria for Evaluating Primary Research Papers: Quality Scoring for Quantitative Studies or "QualSyst" (Kmet et al., 2004). The QualSyst tool consists of 14 items, of which three items regarding interventional studies were removed as they were irrelevant to the included studies. Additionally, one item related to adjustment for potential confounders was eliminated, because the large number of potential confounding variables reported in this research area did not allow their effects to be accounted for in each study (Bak, 2016). The remaining 10 items assess studies based on objective, design, subject selection, subject characteristics, outcome measures, sample size, analytic methods, estimate of variance, results and conclusion supported by the results. Each item was scored as 2 (fully met the quality criterion), 1 (partially met the quality criterion) or 0 (did not meet the quality criterion). A summary score was calculated for each study by dividing the obtained total score by the maximum possible score. As defined by Lee et al. (2008), the quality of the studies was categorized as limited (<0.50), adequate (0.50-0.70), good (0.71-0.79) or high (>0.80). Studies were not excluded on the basis of their score on the quality checklist; rather, this tool was used to identify strengths and weaknesses of the current literature and provide recommendations for future research.
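The summary score and quality categories described above can be sketched as follows. This is a minimal illustration, not the reviewers' scoring sheet: the function names are hypothetical, and because the stated bands leave a score of exactly 0.80 unassigned, the sketch assumes that 0.80 counts as "high". With 10 items scored 0 to 2, summary scores move in steps of 0.05, so no score can fall between 0.70 and 0.71.

```python
from fractions import Fraction

N_ITEMS = 10  # items retained from the 14-item QualSyst checklist


def summary_score(item_scores):
    """Total obtained score divided by the maximum possible score (2 per item)."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores")
    if any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("each item must be scored 0, 1 or 2")
    return Fraction(sum(item_scores), 2 * N_ITEMS)


def quality_category(score):
    """Map a summary score to the bands attributed to Lee et al. (2008)."""
    if score < Fraction(1, 2):
        return "limited"
    if score <= Fraction(7, 10):
        return "adequate"
    if score < Fraction(4, 5):
        return "good"
    return "high"  # assumption: a score of exactly 0.80 is treated as high


example = [2, 2, 2, 2, 2, 1, 1, 1, 1, 1]  # hypothetical ratings for one study
score = summary_score(example)
print(float(score), quality_category(score))  # 0.75 good
```

Exact fractions are used instead of floating-point numbers so that scores sitting on a band boundary, such as 14/20, are classified consistently.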

Results
The search strategy identified 2,553 citations from the three electronic databases and 7 citations from hand searching. After removal of duplicates, 2,168 citations remained in the pool of papers. After screening of titles and abstracts, 85 citations were potentially eligible, and their full reports were retrieved and analyzed. Fifty-eight studies were excluded based on a detailed examination of the full text. A total of 27 studies were identified and included in the review. The study selection process is summarized in Figure 1.
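The counts reported above can be checked for internal consistency with a few lines of arithmetic. In this sketch, the number of duplicates removed and the number of records excluded at the title and abstract stage are derived by subtraction from the reported figures; they are not stated in the text, and the variable names are chosen only for this example.

```python
# Reported counts from the study selection process.
from_databases = 2553
from_hand_search = 7
after_deduplication = 2168
full_text_assessed = 85
excluded_at_full_text = 58
included_reported = 27

# Derived counts (not reported in the text, obtained by subtraction).
identified = from_databases + from_hand_search                    # 2560
duplicates_removed = identified - after_deduplication             # 392
excluded_at_screening = after_deduplication - full_text_assessed  # 2083
included = full_text_assessed - excluded_at_full_text             # 27

# The derived number of included studies should match the reported one.
assert included == included_reported
print(identified, duplicates_removed, excluded_at_screening, included)
```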

Study characteristics
Of the 27 included studies, 18 were cross-sectional and 9 were longitudinal studies. Tables 1 and 2 provide an overview of the main findings on bilingualism and its impact on the NP based on the study design.
Two studies did not report on the cognitive status of participants (Bialystok et al., 2008; Ihle et al., 2016). Among twelve cross-sectional and longitudinal studies that included cognitively healthy older adults, three studies provided no information on the instruments used for cognitive screening, but reported that the participants were cognitively healthy (Bialystok, Poarch, et al., 2014; Kowoll et al., 2015; Luo et al., 2013). The absence of neurological and psychiatric conditions was the criterion used in some of the studies (Anderson et al., 2017; Papageorgiou et al., 2019; Sheppard et al., 2016). Other studies reported on the instruments they employed to test the cognitive status of the participants, such as the Mini Mental State Examination (MMSE; Folstein et al., 1975) [...] Scale (Zigmond & Snaith, 1983) scores. Nine of the eleven studies that included clinical samples reported detailed information about diagnostic criteria. The diagnostic criteria used in the studies, study characteristics and quality rating scores are summarized in Supplementary Tables 1 and 2.
Of those, six studies (Bialystok et al., 2008; Bialystok, Poarch, et al., 2014; Friesen et al., 2015; Massa et al., 2020; Sheppard et al., 2016; Soltani et al., 2019) did not specify their criteria for bilingualism. In general, the definitions used for bilingualism varied based on the goal of the study and the bilingual population that was being tested. Bak et al. (2014) and Cox et al. (2016) referred to bilingualism as being able to speak a second language; Yeung et al. (2014) and Costumero et al. (2020) described bilingual groups based on the participants' first language (e.g., English bilinguals, for a group whose first language was English, or bilinguals, for a group whose native language is Catalan), whereas Ihle et al. (2016) identified bilingual or multilingual groups based on the number of languages participants spoke. Three studies identified bilingualism based on the frequency of second language use (Mungas et al., 2018; Padilla et al., 2016; Papageorgiou et al., 2019), two based on the level of proficiency (Rosselli et al., 2000, 2019), and two others based exclusively on second language proficiency, instead of proficiency in each language (Nielsen et al., 2019; Zahodne et al., 2014). The remaining four studies (Clare, Whitaker, Craik, et al., 2016; Kousaie et al., 2014; Luo et al., 2013) used more specific definitions of the bilingual groups, showing some similarities with the definition used by Bialystok et al. (2007).

Factors influencing neuropsychological test performance in bilinguals
Language proficiency. The level of proficiency reported and the tools used to measure language proficiency differ considerably among studies. In sixteen studies, proficiency was only measured through self-report questionnaires wherein participants rated their proficiency in the domains of reading, writing, comprehension and speaking in either one or both languages, based on objective or subjective measures. Eight studies did not report proficiency level. Four studies conducted interviews to assess participants' language history (Chertkow et al., 2010; Costumero et al., 2020; Nielsen et al., 2019; Yeung et al., 2014), and one study (Chertkow et al., 2010) categorized participants based on the number of languages they spoke instead of language proficiency. In seven studies, proficiency was determined by formal assessment in addition to the self-report questionnaire. Of these, five studies administered the measures to both bilingual and monolingual groups. Two studies (Kousaie et al., 2014; Sheppard et al., 2016) applied the Animacy Judgment Task (Segalowitz & Frenkiel-Fishman, 2005), one study (Papageorgiou et al., 2019) employed the British Picture Vocabulary Scale-III (Dunn & Dunn, 1997) and one other study (Massa et al., 2020) administered a verbal fluency task to both monolingual and bilingual groups. Additionally, one study included the BNT as a measure of bilingualism (Rosselli et al., 2000) for both bilingual and monolingual groups. The other two studies only assessed proficiency for the bilingual group, using a combination of the BNT and a semantic verbal fluency task in one study (Kowoll et al., 2015), and the Wide Range Achievement Test-Version 3 (WRAT-3) in the other (Zahodne et al., 2014). The variation in measuring language proficiency makes comparisons between studies difficult.
Seven studies which included highly proficient bilinguals reported an association between bilingualism and NP, with four studies showing a bilingual advantage in the inhibition domain, namely, on the Stroop task (Anderson et al., 2017; Bialystok, Poarch, et al., 2014; Kousaie et al., 2014) and the elevator counting with distraction subtest of the Test of Everyday Attention (TEA; Clare, Whitaker, Craik, et al., 2016). The remaining three studies showed better NP on the Corsi Block Test (Luo et al., 2013), letter fluency (Friesen et al., 2015), and the Color Trails and Five Digit Test (Nielsen et al., 2019). Additionally, one study (Zahodne et al., 2014) demonstrated better performance on executive function, memory and language composites of neuropsychological tests with increasing self-reported bilingualism. However, other studies which reported a high level of language proficiency in bilingual participants did not show any significant differences in neuropsychological test scores between monolinguals and bilinguals (Kowoll et al., 2015; Papageorgiou et al., 2019). One study which did not apply strict proficiency criteria found no significant differences in the NP of bilinguals and monolinguals (Padilla et al., 2016).
Additionally, cutoff ages used to distinguish early and late language acquisition varied between studies, depending on the characteristics of included participants, with some studies using 11 (Clare, Whitaker, Craik, et al., 2016), 12 (Rosselli et al., 2000), 18 (Bak et al., 2014), and 8 versus 12 or older (Bialystok et al., 2008) as cutoff points. This confounds the interpretation of outcome comparisons across studies.
Four studies included AoA in their analysis. Of these, the study by Bak et al. (2014) showed an effect of early AoA on the tests for general intelligence and reading (NART) and of late AoA on the tests of general intelligence, processing speed (Symbol Search and Digit Symbol subtests of WAIS-III) and reading. On the other hand, the studies conducted by Bialystok et al. (2008), Papageorgiou et al. (2019) and Rosselli et al. (2000) did not find any significant differences in performance of early and late bilinguals in any of the cognitive domains. However, in the study by Rosselli et al. (2000), there was an interaction between AoA and the language of test administration on tests of repetition, naming and verbal fluency. Bilinguals who acquired English before the age of 12 performed better in their second language, namely, in the English version of the tests.
Language of test administration and the use of culturally appropriate tools. Only five studies used validated neuropsychological assessment tools or reported conducting culturally or linguistically appropriate neuropsychological tests for bilinguals (Massa et al., 2020; Mungas et al., 2018; Nielsen et al., 2019; Padilla et al., 2016; Rosselli et al., 2000). The potential impact of cultural, ethnic, and/or language factors on test performance was rarely considered. Of eighteen cross-sectional studies, seven included culturally and/or linguistically diverse participants in bilingual groups who were tested in either their non-dominant or second language. Monolingual groups were more likely to share a similar cultural and/or linguistic background (Anderson et al., 2017; Bialystok, Poarch, et al., 2014; Bialystok et al., 2008; Kowoll et al., 2015; Luo et al., 2013; Ossher et al., 2013; Papageorgiou et al., 2019). Additionally, a lack of neuropsychological tests standardized for the cultural background of the bilingual participants was commonly observed in these studies.
Three studies assessed the language groups in their native language or language of preference (Ihle et al., 2016; Nielsen et al., 2019; Rosselli et al., 2019). Eight studies administered the tests in both languages of participants (Clare, Whitaker, Craik, et al., 2016; [...]). Among the studies analyzing the impact of language of test administration on NP, [...] indicated no significant differences in performance between English and Welsh administration of the Stroop task in healthy bilinguals. Similarly, Rosselli et al. (2000) reported similar test performance in healthy bilinguals on English and Spanish language tests. The only exception was the Cookie Theft task from the Boston Diagnostic Aphasia Examination, on which bilinguals produced a greater number of words when using their second language, English. On the other hand, in the study by Soltani et al. (2019), monolinguals performed better than bilinguals on the verbal fluency test when bilinguals were tested in their native language. However, no significant difference was found on this test between monolinguals and bilinguals when bilinguals were assessed in their second and most frequently used language. Furthermore, Kowoll et al. (2015) compared the performance of bilinguals across different diagnostic groups and tested them in their dominant and non-dominant language on the verbal fluency task and the BNT. Bilingual MCI patients performed more poorly when tested in their dominant language, whereas bilingual AD patients performed more poorly when tested in their non-dominant language.
Two other studies employed a different testing procedure and tested French-English healthy bilinguals in three conditions, namely, English, French and bilingual (either-language) test administration, wherein participants could perform the test using both languages simultaneously (Kousaie et al., 2014; Sheppard et al., 2016). Both studies reported an advantage on the BNT for monolingual English speakers compared to monolingual French speakers and bilinguals. The findings from these studies suggest that the items of the BNT may not be equivalent in terms of difficulty or familiarity in French and English. Furthermore, Sheppard et al. (2016) showed that the majority of bilingual participants performed better on the BNT when they used both languages during testing.
Of the nine longitudinal studies, three comprised participants with mixed linguistic and cultural backgrounds, who were tested in their second language ([...]; Chertkow et al., 2010; Yeung et al., 2014). In the study by Yeung et al. (2014), the participants whose second language was English performed worse on a test of global cognitive function (3MS) than those whose first language was English. Three studies included more culturally and linguistically homogenous samples of Spanish-English speaking Hispanic and Latino American individuals and assessed the participants in their preferred language (Padilla et al., 2016; Mungas et al., 2018; Zahodne et al., 2014). In the study by Zahodne et al. (2014), bilinguals demonstrated better baseline performance on executive function and verbal episodic memory (Selective Reminding Test) than monolinguals when tested in their language of choice. The studies by Padilla et al. (2016) and Mungas et al. (2018) used a neuropsychological measure validated for the cultural and linguistic background of participants and reported no effect of language of test administration on the 3MS in bilinguals.
Immigration status. The included studies were limited geographically and, thereby, culturally. With the exception of two studies, conducted in Israel and Iran, all studies were conducted in Europe and North America. In half of the studies from Canada and four from the USA, knowledge of a second language was linked to immigration status. Additionally, in the majority of studies that included immigrant populations, the age at immigration ([...]; Chertkow et al., 2010; Padilla et al., 2016; Papageorgiou et al., 2019; Sheppard et al., 2016) and the length of residence in the new country (Anderson et al., 2017; Bialystok et al., 2008; Chertkow et al., 2010; Kowoll et al., 2015; Mungas et al., 2018; Padilla et al., 2016; Papageorgiou et al., 2019; Sheppard et al., 2016) were not taken into account.
Among cross-sectional studies, seven studies did not report on the immigration status of participants (Bialystok, Poarch, et al., 2014; Friesen et al., 2015; Massa et al., 2020; Ihle et al., 2016; Luo et al., 2013; Ossher et al., 2013; Sheppard et al., 2016; Soltani et al., 2019). Two studies that included nonimmigrants showed a bilingual advantage only in the domain of response inhibition and management of response conflict on the TEA and Stroop task (Clare, Whitaker, Craik, et al., 2016; Kousaie et al., 2014). Of the nine longitudinal studies, one study (Yeung et al., 2014) did not specify the immigration status of participants. Of three studies including nonimmigrant language groups, one (Cox et al., 2016) did not report any effect of bilingualism on a reasoning/planning test, whereas one showed a multilingual advantage on tests of verbal fluency, general intelligence and vocabulary/reading (Bak et al., 2014) and another indicated a bilingual advantage on tests of global cognitive function and letter fluency (Costumero et al., 2020). Three studies included immigrant participants in both monolingual and bilingual groups, two of which had a higher number of immigrants in the bi/multilingual group ([...]; Chertkow et al., 2010; Mungas et al., 2018). Lastly, the other two studies included samples composed entirely of immigrants in both language groups (Padilla et al., 2016; Zahodne et al., 2014). Bilingualism was associated with better initial performance on language, executive function and praxis items of a global cognitive functioning test, before and after controlling for immigration status (Padilla et al., 2016), as well as on tests of verbal episodic memory (Selective Reminding Test) and executive function without controlling for immigration status (Zahodne et al., 2014).

Discussion
The primary rationale for this review was to assess the findings from the extant literature to determine whether bilingualism confers an advantage and/or disadvantage in NP in older adults and, thereby, to give an insight into the assessment of bilingual older adults in clinical practice as little is known about the generalizability of research on NP of bilingual older adults to clinical neuropsychology. Language proficiency, AoA, immigrant status, and language of test administration were factors addressed in this review to examine whether participant and language related variables influence the degree of advantage or disadvantage that bilingual older adults may experience during test performance.
Does bilingualism in older adults offer advantages or disadvantages in neuropsychological test performance?
In cross-sectional studies that included bilingual healthy older adults, a bilingual disadvantage was observed in lexical access tests involving vocabulary knowledge and verbal fluency (Anderson et al., 2017; Bialystok et al., 2008; Bialystok, Poarch, et al., 2014; Luo et al., 2013; Rosselli et al., 2000), particularly in category fluency subtests (Anderson et al., 2017; Bialystok et al., 2008; Rosselli et al., 2000). This is consistent with previous literature showing poorer performance among bilinguals in comparison to monolinguals on verbal tasks which require vocabulary knowledge and faster lexical access across different age groups (Kroll & Gollan, 2013). An explanation for these findings might lie in the frequency lag account (Gollan et al., 2011; Gollan et al., 2008) and the competition account (Abutalebi & Green, 2007; Green, 1998; Kroll & Gollan, 2013). According to the frequency lag account, bilinguals are exposed to each word less frequently, as they speak each language less often than monolinguals. Therefore, there is a weaker connection between the words and the concepts represented by the words, leading to reduced accessibility of words in each language.
On the other hand, the competition account assumes that the intention to speak prompts simultaneous activation of both languages in bilinguals, resulting in greater competition for selection of the target language (Green, 1998; Kroll et al., 2008). The need to resolve this competition between languages could slow down retrieval of target language words. This may lead to fewer correct responses in bilinguals than in monolinguals (Sandoval et al., 2010), particularly in a semantic fluency task, which relies primarily on lexical retrieval speed (Gordon et al., 2018). The semantic fluency task is more constrained, as it is highly dependent on the connections between conceptual and semantic representations, as opposed to the letter fluency task, where such connections are not required (Kavé & Knafo-Noam, 2015; Meinzer et al., 2009). Additionally, semantic fluency is more likely to be influenced by non-target language interference, since translation equivalents are in the same semantic category (Giezen & Emmorey, 2017). The study by Sheppard et al. (2016) provides additional support for this account, showing an association between either-language test administration and improved test performance in a non-speeded measure of semantic memory, namely the BNT. This is perhaps not surprising given that bilinguals may need to control interference from the non-target language and resolve competition in lexical selection, and that both the BNT and semantic fluency place demands on semantic memory (Henry et al., 2004). With the simultaneous use of both languages, there may be no need to inhibit either language. Therefore, no additional resources are used in inhibition, making it easier for bilinguals to access the lexicon more quickly than they would in a single-language test administration. Better performance on the BNT when using either-language test administration was found in bilingual healthy adults (Gollan et al., 2007).
Further, in cross-sectional studies and baseline results of longitudinal studies, there was some evidence of a bilingual advantage in the domain of inhibition and management of response conflict, as assessed by the Stroop task, in healthy older adults (Bialystok, Poarch, et al., 2014; Kousaie et al., 2014; Massa et al., 2020) as well as in MCI and AD patients (Anderson et al., 2017). These findings are consistent with the results of a previous study (Kousaie & Phillips, 2017) using both behavioral and electrophysiological measures in older adults, which showed superior performance in the Stroop task in bilinguals compared to monolinguals. The competition account may provide an explanation for both the observed disadvantage in lexical access and the advantage in executive control. The mechanism of inhibitory control used regularly by bilinguals to reduce interference from the non-target language (Green, 1998; Linck et al., 2008) may result in enhanced cognitive control. The results from two studies with healthy older adults (Friesen et al., 2015; Massa et al., 2020) may also lend some support to this notion, with findings showing better performance in bilinguals in letter fluency, which demands greater executive control functioning (Gordon et al., 2018; Shao et al., 2014). The recruitment of executive control in letter fluency may be a possible explanation of the observed bilingual advantages in these studies, as it is a verbal task which requires retrieval of words starting with a specific letter and inhibition of words beginning with different letters, suggesting involvement of conflict-resolution skills (Blumenfeld et al., 2016; Luo et al., 2010).
This result is in line with a previous study in which Stroop performance, as a measure of inhibitory control and conflict resolution, was correlated with verbal fluency measures, and bilingual healthy adults outperformed monolinguals in the letter fluency task, especially where there was a higher demand for executive control (Patra et al., 2020). However, the studies by Bialystok et al. (2008) and Soltani et al. (2019) with healthy older adults provide contradictory findings, with monolinguals performing better than bilinguals in letter fluency. A possible explanation for these contradictory findings may lie in the language of test administration and the executive control abilities of bilinguals. In the studies by Friesen et al. (2015) and Massa et al. (2020), the majority of bilingual participants were assessed in their dominant language, and the groups were either matched on executive function performance or bilinguals showed better performance on the Stroop test. Furthermore, in the study by Soltani et al. (2019), the bilingual disadvantage disappeared when bilinguals were tested in their most frequently used language. These findings suggest that bilingual advantages may occur in verbal tasks which have a higher role for executive control once language proficiency has been accounted for (Luo et al., 2010), particularly in the language of testing. Overall, these results indicate that the advantages and disadvantages associated with cross-linguistic interference and inhibition may differ for tasks which impose demands on lexical access and executive control.
The longitudinal studies examining healthy older adults (Cox et al., 2016), older adults with either intact cognition or cognitive impairment at baseline (Mungas et al., 2018; Padilla et al., 2016), MCI and AD patients, and conversion to dementia (Yeung et al., 2014; Zahodne et al., 2014) yield a consistent pattern of findings showing no bilingual advantage in test performance over time, with one exception. The longitudinal data from Costumero et al. (2020), which included multiple-domain aMCI patients, showed a greater cognitive decline in letter fluency and global cognition in monolinguals compared to bilinguals. This finding is in contrast to the findings from a longitudinal study with MCI patients, which did not find any differences between monolinguals and bilinguals on the letter fluency task. The languages of bilinguals in the study by Costumero et al. (2020) were two typologically similar languages, namely Spanish and Catalan, and monolinguals were identified as passive bilinguals who could speak Spanish and understand Catalan with poor fluency. Furthermore, bilinguals and monolinguals did not differ in their baseline performance on several tests. In the other MCI study, by contrast, bilinguals were participants with various language combinations and cultural backgrounds who were tested in their second language. For this reason, interpretation of these studies and their apparently conflicting findings is limited by a number of factors, namely differences in typology, structure, and culture between languages (Eng et al., 2019), language of test administration, and the definitions used for bilingualism. These contradictory and inconclusive results highlight the need for more longitudinal studies investigating differences in performance on verbal fluency tasks in monolingual and bilingual MCI patients.
In general, the evidence in favor of a bilingual advantage is weak. It is observed more often in cross-sectional studies using measures of inhibitory control or in the baseline performance of bilinguals assessed in longitudinal studies, but the findings lack consistency. A possible reason for these contradictory findings may be methodological differences across studies, namely the sample (size, inclusion/exclusion criteria, inclusion of different clinical samples), different study designs, lack of consistency in the standardized tests used to assess cognitive domains, and language- and participant-related variables (see Figure 2).

What factors affect the findings of neuropsychological test performance in older bilingual adults?
A number of methodological differences might explain the apparent discrepancies in results across studies. First, the characteristics of bilingual participants varied across studies because there is no standardized definition of bilingualism (Anderson et al., 2018). In order to gain a clear understanding of the nature of the relationship between bilingualism and cognitive outcomes, it is important to explore specific factors associated with bilingualism that may be responsible for the observed effect (Kroll, 2009).
Language proficiency is a core aspect of language experience in bilingualism research. However, only 19 studies reported the level of language proficiency of bilinguals. Studies showed that a higher level of language proficiency was associated with better performance, particularly on tests measuring inhibition. This corresponds well with the findings of other studies using Stroop tasks (Tzelgov et al., 1990; Zied et al., 2004), which indicate that the size of the Stroop effect is influenced by proficiency in the second language, such that a minimum level of language proficiency is necessary to elicit interference effects and higher levels of proficiency result in better controlled processing. Additionally, the finding regarding higher language proficiency and the letter fluency task is in line with the results reported in young adults (Luo et al., 2010). A possible explanation for this may be that the competition evoked by higher language proficiency results in more engagement of executive control, due to the increasing demand imposed by managing cross-linguistic influences. However, language proficiency alone does not fully account for the advantage reported in the studies, as a few studies including highly proficient bilinguals indicated no significant differences in test scores between language groups (Kowoll et al., 2015; Papageorgiou et al., 2019).
Another variable related to bilingualism, namely AoA, may provide further insight and help explain the inconclusive findings. AoA is a complex variable, as it is associated not only with the level of input received according to years of exposure, but also with varied patterns of language use between speakers with early and late acquisition (Antoniou & Wright, 2017). Indeed, some studies included in this review revealed a different pattern of test performance among early and late bilinguals in various cognitive domains. However, a major confounder is the different ages used as cutoffs for the classification of early versus late acquisition. Studies investigating the effect of AoA on the neural organization of the bilingual brain commonly use 6 years of age as the cutoff (Wattendorf & Festman, 2008). It is important for studies on the NP of bilinguals to determine AoA for the categorization of early and late bilinguals based on the available literature, and to examine its effect on NP so that findings can be integrated. However, simultaneous bilinguals who acquired two languages from birth can be considered a potentially distinct group, since their experience may vary from that of bilinguals with early AoA (Sabourin & Vinerte, 2015).
Inconsistent results in NP of language groups may further be explained by a methodological difference in the comparison of monolingual versus bilingual groups, namely language of test administration. In a few studies (Rosselli et al., 2000), scores on the Stroop task and language tests did not differ according to the language in which the tests were administered. Upon closer examination of the characteristics of the samples, it can be hypothesized that inclusion of bilinguals with comparable proficiency in each language may help eliminate the effects related to language of test administration (Gasquoine et al., 2007). Furthermore, contradictory findings in letter fluency performance highlight the need to consider the testing language in bilinguals. Language of test administration is also closely linked to the issue of linguistically and culturally equivalent versions of the neuropsychological tests used in the studies. In some studies, monolingual English speakers performed better than monolingual French speakers on the BNT, pointing to a limitation in the cross-linguistic and cultural equivalence of this test. This is further evident in studies which found no impact of language of administration on test performance when using a validated assessment tool targeting the cultural and linguistic background of the test-takers (Padilla et al., 2016; Mungas et al., 2018). Taken together, the results point to an interplay between language proficiency, language of test administration, and the use of culturally/linguistically appropriate tools in the assessment of bilinguals. Furthermore, these results highlight the need to evaluate the cross-cultural and cross-linguistic potential of neuropsychological tests and emphasize the importance of using validated assessment tools appropriate for the cultural/linguistic background of the participants in order to obtain more accurate results of NP in bilinguals.
Besides language-related confounding variables, another critical factor that has been neglected in the studies is the demographically diverse backgrounds of monolingual and bilingual groups, such as different cultures/ethnicities and language families. Specifically, studies were more likely to compare a homogeneous group of English monolinguals with a group of bilinguals of varying cultural, ethnic, and linguistic backgrounds (Bialystok et al., 2008; Luo et al., 2013; Ossher et al., 2013; Papageorgiou et al., 2019). The effect of cultural variables on neuropsychological tests is well established in the extant literature (Agranovich & Puente, 2007; Ardila, 2005, 2007a; Loewenstein et al., 1994; Walker et al., 2010). Cultural variables may exert a greater impact on test performance in individuals with a non-Western or non-English background, as the majority of the tests used in clinical settings are developed in Western, English-speaking countries (Wong, 2000). Performance of culturally/ethnically diverse bilinguals on neuropsychological tests may be mediated by an interaction of various linguistic and cultural variables and may not necessarily reflect a direct relationship between cognition and test performance.
The diverse cultural and linguistic backgrounds in bilingual groups also bring out another confounding factor in the studies: the immigration status of participants. In studies examining nonimmigrant language groups (Clare, Whitaker, Craik, et al., 2016; Cox et al., 2016; Kousaie et al., 2014), there was no difference in NP between monolinguals and bilinguals on tests of executive function, except for a small advantage in bilinguals on the inhibition domain reported in cross-sectional studies (Kousaie et al., 2014). The findings in longitudinal studies were mixed, showing either no long-term effect of bilingualism on NP or better performance at baseline on tests of global cognitive functioning, verbal episodic memory, and executive function. The association between immigration status and cognitive function is still poorly understood (for a review, see Xu et al., 2017). The inconsistent findings may be partially explained by level of acculturation. There is a growing body of evidence indicating that acculturation might be a salient factor in NP differences among various ethnic/cultural groups (Arentoft et al., 2012). A higher acculturation level is significantly linked to better NP in a variety of cognitive domains, including executive function, verbal fluency, naming, processing speed, and global cognitive functioning (Arentoft et al., 2012; Boone et al., 2007; Coffey et al., 2005; Manly et al., 1998). However, acculturation was measured in a few studies only in cursory terms, by focusing on country of birth and length of residence in the new country without referring to them as proxies of acculturation. Acculturation is a multidimensional construct, and these proxies yield only an indirect measure of it (Abe-Kim et al., 2001; Lopez-Class et al., 2011). It could be that different levels of acculturation as a consequence of immigration status have distinct effects on NP in language groups.
Hence, the implementation of acculturation measures has strong potential to benefit research in NP of bilingual older adults.
Taken together, the factors listed above reveal the difficulty of disentangling the interplay between linguistic and participant-related variables and their effects on NP.

Clinical implications
Despite the mixed findings, the results of this review can be used to aid interpretation when working with bi/multi-lingual populations in both clinical and research settings. To assess individuals' cognitive abilities accurately, researchers and neuropsychologists should take into account factors such as the acculturation level of the individuals, culturally corrected norms or cross-cultural neuropsychological assessment tools for participants from diverse cultural backgrounds, and language proficiency. Additionally, given the impact of language of test administration, consideration of alternate assessment approaches and identification of strategies for increasing the utility of neuropsychological testing are of critical importance when testing bilingual populations with differing proficiency in each language. Lastly, due to contradictory findings on letter fluency between cross-sectional and longitudinal studies, it remains an open question whether a differential performance emerges between monolinguals and bilinguals.

Limitations and future recommendations
The findings of this review indicate that AoA and proficiency may play distinct roles in bilingualism research. Although they are highly correlated, it has been shown that AoA and proficiency influence the brain in different ways, with AoA influencing regions used in grammatical processing and proficiency affecting those involved in semantic judgements (Wartenburger et al., 2003). A more rigorous investigation of these two factors may help determine the weight of their impact on test performance, specifically on tests measuring the language domain. This review also revealed another critical language-related factor that has been largely overlooked in this area of research: the context of language use and the pattern of language switching. Studies included in the review predominantly addressed language proficiency or the number of languages spoken by the participants. It is possible that the context in which language is used and the frequency of switching between languages might be associated with a different pattern of findings. According to the Adaptive Control Hypothesis proposed by Green and Abutalebi (2013), context is a key factor in determining which language can be used and the extent to which bilinguals can switch between languages. For this reason, it has been proposed that the demand on cognitive control processes can vary as a result of bilingual language context. However, there is a paucity of research in this area, and future studies should closely examine the relationship between context of language use, language switching pattern, and NP.
Furthermore, variables accounted for and neuropsychological assessments employed vary considerably across studies. For this reason, a meta-analysis was not possible. Besides this, there was a lack of consensus on categorization of cognitive domains assessed by neuropsychological tests in the studies.
Finally, due to the scarcity of longitudinal studies, it is currently difficult to fully understand the changes in NP that occur with the progression of the disease in bilingual MCI and AD patients. The use of longitudinal studies, rather than cross-sectional studies, seems to shed more light on the relationship between language knowledge and cognition (Calvo et al., 2016).
Taken together, efforts should be made to address or examine the moderating variables identified in this review and to use more standardized neuropsychological tests that are appropriate for culturally and linguistically diverse bilingual individuals, in order to elucidate the effects of bilingualism on cognitive domains more comprehensively.

Conclusions
The research on NP in bilingual older adults is marked by complexity and discrepancies regarding the conceptualization and measurement of bilingualism and its association with immigration status, language of test administration, and the standardized assessment tools appropriate for the cultural background of the tested population. This review demonstrated that neuropsychological tests in the language domain, specifically verbal fluency, and to some extent in executive function (particularly inhibitory control) represent a core and clinically relevant feature of bilingualism which needs to be taken into account during the evaluation of participants in clinical and research settings.
There is an increasing need for measures that accurately and efficiently contribute to the neuropsychological assessment of bilingual populations. Cross-culturally and cross-linguistically applicable tests, namely the Cross-Linguistic Naming Test (Ardila, 2007b) and the European Cross-Cultural Neuropsychological Test Battery (Nielsen et al., 2018), or tests which are less influenced by differences in cultural and language background (Five Digit Test, Stick Design Test, Rowland Universal Dementia Assessment Scale; Franzen et al., 2020), may be utilized in research on the neuropsychological performance of bi/multilingual older adult groups with various cultural/language backgrounds. Furthermore, the Language and Social Background Questionnaire (LSBQ; Anderson et al., 2018), which was developed to quantify bilingualism based on a composite score in culturally diverse populations and to provide evidence-based classifications of language groups, may help compare the results of various studies effectively. Alternatively, language proficiency is a significant indicator of individuals' level of acculturation, as it characterizes the degree of integration in a new country and the amount of exposure to the dominant language of the host culture (Schumann, 1986). Thus, evaluation of acculturation level may aid in determining the language in which participants can be tested (Pontón), and it can allow for addressing the contextual variables which may maintain or improve language proficiency in bilinguals.
A comprehensive theoretical model is necessary to portray which cognitive domains are influenced by which components of bilingualism and to suggest testable predictions (Antoniou, 2019). Furthermore, components of bilingualism and participant-related variables stated in this review (see Figure 2) highlight the lack of a thorough and systematic understanding of the underlying factors contributing to the inconsistent findings in the field and may provide an initial step toward studies with more rigorous methodology.

Disclosure statement
The authors declare no conflict of interest.

Funding
The study was supported by the Robert Bosch Foundation (grant no. 11.5.G411.0005.0) Stuttgart, Germany (Robert Bosch Stiftung) and the interdisciplinary Graduate Program "People with Dementia in Acutecare Hospitals", located at the Network Aging Research (NAR), Heidelberg University, Germany.