Cross-Country (Brazil and Iran) Invariance of Fractionation of Executive Functions in Early Adolescence

ABSTRACT Cultural background can influence cognition, including executive functions (EFs), abilities that encompass skills responsible for self-regulation of thoughts and behavior. The seminal unity and diversity model of EFs proposes the existence, in adulthood, of at least three correlated but separable latent EF domains (i.e., variance shared by more than one task/indicator): inhibition, updating and shifting. However, evidence of the cross-cultural generality of this framework is lacking, especially in adolescence, an age during which these domains become more clearly separable. We tested whether this EF fractionation could be observed in early adolescents (9- to 15-year-olds) from metropolitan areas in Brazil (São Paulo) and Iran (Tehran) (total sample: 739; 407 Iranians; 358 girls). Participants carried out two open-access tasks representative of each EF domain, adapted to each cultural context. Seven latent model configurations were tested. The three-correlated latent factor structure had adequate fit, and multiple-group confirmatory factor analysis invariance testing showed invariance for country at the level of the latent factor structure (configural) and factor loadings (metric), and partial invariance at the intercept (scalar) level. Iranians had higher scores in all domains. Multiple indicators multiple causes invariance testing showed model invariance across age (except for one task) and parental education. Performance in all domains improved with age and only minimally with parental schooling. We conclude that EF fractionation into three domains is present in the first half of adolescence in two samples from populations underrepresented in the literature, suggesting a potential generality of EF latent unity/diversity development at this age.


Introduction
It is generally agreed that cultural factors such as language, educational practices, social norms and values can influence behavior and cognition (Barrett, 2020). Because culture is a multifaceted and plural construct with no consensual definition, many studies have resorted to comparing cognitive abilities across countries (Taras, Steel, & Kirkman, 2016) and have shown that, at the level of individual cognition, there are indeed country-specific cognitive abilities, including perception, memory and higher-order cognitive skills (Barrett, 2020) such as executive functions (EFs) (see Schirmbeck, Rao, & Maehler, 2020). EFs are top-down cognitive abilities that overlap with the concept of controlled attention and govern how people achieve the goals that they have in mind at any given moment (Friedman & Miyake, 2017). These abilities have been extensively studied (with various types of assessment tools) because they promote adaptive functioning and underlie the skills needed for adequate reasoning, planning, decision-making and the regulation of many types of behavior (see Fatima & Sharif, 2019; Garcia-Barrera, 2019), all of which are essential to lead a healthy and fulfilling life, regardless of cultural context. It therefore stands to reason that such fundamental abilities could also have parallels across countries (Barrett, 2020; Liebal & Haun, 2018) in people of different ages. Cross-country developmental studies can help establish such similarities, because the emergence of abilities at similar ages in different cultures can indicate their generality (Fatima & Sharif, 2019; Liebal & Haun, 2018). However, much less attention has been paid to the cognitive features shared by samples from different nations than to cross-country differences (see Barrett, 2020; Fatima & Sharif, 2019). Therefore, we set out to examine whether some particular types of EFs are similarly dissociated in two culturally diverse samples during adolescence, a period in which EFs rapidly improve (e.g., Schirmbeck, Rao, & Maehler, 2020).
There are many contemporary conceptualizations and theories of EFs, which often distinguish between cool and hot EF abilities (see Garcia-Barrera, 2019). Of interest here are cool EFs, that is, executive abilities involved in problem solving that is not predominantly associated with socio-affective features or contexts, unlike hot EF skills (see Garcia-Barrera, 2019). We based our analyses on the seminal EF unity and diversity framework (Miyake et al., 2000; see also Friedman & Miyake, 2017), which proposes the existence of at least three EF facets or domains that are inter-correlated (unity) but also statistically differentiable (diversity or fractionation of EFs) (Miyake et al., 2000). The domains proposed by Miyake and collaborators were not meant to be an exhaustive list of existing EFs; they were chosen for convenience, based on the range of well-known EF laboratory tasks in the literature. These domains are: 1) prepotent response inhibition (henceforth referred to as inhibition), the ability to inhibit automatic/prepotent responses; 2) switching, the ability to switch goals sequentially; and 3) updating, the ability to refresh the content kept in mind so as to retain only information relevant to the goals being pursued. This theoretical perspective considers that EFs are constructs that cannot be directly measured using raw scores in EF tasks. Instead, EFs represent individual characteristics that can be psychometrically determined as latent traits, which represent the variance shared by raw performance scores (indicators) in more than one task designed to measure the same construct (Friedman & Miyake, 2017; Miyake et al., 2000). The same group of researchers later updated the model configuration to a bifactor model using the same types of tests. It includes (see Friedman & Miyake, 2017): 1) a common factor that represents the "unity" of EF and reflects the common variance among scores on all tasks; this latent factor was found to be equivalent to the variance previously attributed to the inhibition factor and is believed to measure "general goal-directed behavior"; and 2) two orthogonal (uncorrelated) latent factors that reflect the variance specific to updating and shifting scores after the common EF variance is removed (see Friedman & Miyake, 2017). However, the absence of a covariance between the specific (updating and shifting) factors in this type of model has since been shown to introduce anomalies in the factor loadings that may affect model interpretability (Eid, Geiser, Koch, & Heene, 2017). To our knowledge, this problem has been acknowledged and corrected for [bifactor-(S-1)] in only one study that applied this model to the unity/diversity of EF (Segura et al., 2022). Despite this issue, it is generally accepted that both the three-correlated and bifactor models show EF unity and diversity, and that using latent traits instead of raw scores minimizes the problem of the intrinsic impurity of EF tasks (i.e., the fact that carrying out EF tests recruits not only executive abilities but also other cognitive capacities such as language skills and perception) (Friedman & Miyake, 2017). Latent traits also contain less random measurement error, which varies across tasks, even across those that supposedly measure the same EF construct/domain.
Although the existence of correlated, yet separable EF domains has been replicated extensively in adult samples, mostly from high-income populations (e.g., see Karr et al., 2018), evidence of the cross-country generality of this theoretical concept is lacking because studies carried out in different countries: 1) reported different model configurations (generally two or more factors that are not necessarily equivalent); 2) assessed different executive abilities that are not necessarily comparable, which matters because there are many theoretical misconceptions in the choice of EF tasks, many of which are not representative of the domains proposed in the unity/diversity framework as initially conceptualized (e.g., confusing working memory with updating and mental flexibility with shifting: see Morra, Panesi, Traverso, & Usai, 2018); and 3) did not provide evidence that the EF domains measured were equivalent across the compared samples. Stated differently, even if studies in different countries use tasks that measure the same EF domains and find the same model configurations, cross-country comparisons can still be biased by many aspects intrinsic to specific countries, such as language and sociocultural norms, which can affect performance in the same task differently (Chen, 2008). One way of ascertaining whether EF constructs measure the same things across samples (e.g., from different countries) is a psychometric approach called invariance testing (Chen, 2008). Although establishing invariance for country is a prerequisite for cross-country comparisons, we are unaware of any study that has done so within the EF unity/diversity framework. This is true for all age groups, even though cross-country analyses throughout human development are recognized as helpful for identifying cognitive capacities that are not country-specific (see Liebal & Haun, 2018).
Choosing the ideal age, or time window, for such a cross-country comparison of the unity/diversity framework requires taking into account the developmental/maturational trajectory of the separability of the proposed EF domains. In single-country studies, the exact age at which the separability observed in adults first becomes apparent is still under debate (see Garcia-Barrera, 2019; Karr et al., 2018). Most authors agree that young children do not present separable latent EF domains and that, at some point during adolescence, the domains become differentiable (for reviews, see Garcia-Barrera, 2019; Karr et al., 2018; Schirmbeck, Rao, & Maehler, 2020). In adolescents, variable model configurations have been found in cross-sectional studies that specifically used tasks measuring the domains conceptualized by Miyake et al. (2000) and Friedman and Miyake (2017) (i.e., that did not confound, for instance, working memory capacity and mental flexibility with updating and shifting, respectively: Morra, Panesi, Traverso, & Usai, 2018). Only a minority of these studies show that the domains are not differentiable at all (unifactorial models: Xu et al., 2013, 7- to 12-year-olds only), mirroring what is found in childhood. Others found only two domains (Huizinga, Dolan, & van der Molen, 2006; van der Sluis, de Jong, & van der Leij, 2007), and most report at least three distinguishable domains (Agostino, Johnson, & Pascual-Leone, 2010; Duan, Wei, Wang, & Shi, 2010; Friedman et al., 2008; Friedman, Miyake, Robinson, & Hewitt, 2011; Hartung, Engelhardt, Thibodeaux, Harden, & Tucker-Drob, 2020; Segura et al., 2022; Wu et al., 2011; Xu et al., 2013, 13- to 15-year-olds and all ages collapsed; Zanini et al., 2021), whereas longitudinal studies indicate that differentiation into three correlated domains is only attained around 15 years of age (Lee, Bull, & Ho, 2013). The conflicting findings regarding EF factor structure in adolescence, that is, which latent domains are or are not differentiable, may have resulted from the diversity of age ranges adopted in these studies and, therefore, of the participants' developmental cognitive status. As such, a more restricted age range during adolescence seems to be the ideal sensitive phase of life to explore developmental aspects of the cross-country generality of the EF unity/diversity framework [i.e., models that contain at least three latent factors that either intercorrelate (as in Miyake et al., 2000) or consist of a general factor and two specific factors (see Friedman & Miyake, 2017)].
Concerning prior evidence from cross-country comparisons of executive functioning during development, most of the literature is, regrettably, restricted to preschool children (see Schirmbeck, Rao, & Maehler, 2020). Additionally, most studies in adolescents only contrasted raw performance scores in EF tasks between samples from different countries (see Schirmbeck, Rao, & Maehler, 2020), so their findings should be interpreted with caution: given the intrinsic impurity of EF tasks, raw-score comparisons could have led to spurious results. The few cross-country studies that included adolescents and determined EF latent traits considered only a general, single latent EF factor structure (unity) formed by single tasks from different theoretical EF domains (Ellefson, Zachariou, Ng, Wang, & Hughes, 2020; Holding et al., 2018; Wang, Devine, Wong, & Hughes, 2016; Xu, Ellefson, Ng, Wang, & Hughes, 2020), which precludes the determination of EF fractionation. Nevertheless, these publications tested whether samples from different countries displayed equivalence in terms of their statistical models, mostly using multiple-group confirmatory factor analysis (MGCFA: Brown, 2015) to test for invariance [except for Wang, Devine, Wong, and Hughes (2016), who used another statistical technique]. These studies showed invariance for country at one or more of three hierarchical levels: 1) configural invariance, meaning that the factor structure was comparable (i.e., indicators were related to the same number of latent factors in both countries), found by Holding et al. (2018); 2) metric invariance, that is, equivalence of the factor loadings (the strength of the relationship between each factor and its indicators: Brown, 2015), found by Ellefson, Zachariou, Ng, Wang, and Hughes (2020) and Xu, Ellefson, Ng, Wang, and Hughes (2020); and 3) scalar invariance, or comparable intercepts of the raw-score indicators, found in none of these studies except Wang, Devine, Wong, and Hughes (2016), who found no differential item functioning, which indicates invariance of the intercepts. Overall, these studies provide evidence of the cross-country generalizability of the common variance among EF indicators of different domains (unity). They also showed that Eastern samples tend to outperform Western ones in this general ability (Ellefson, Zachariou, Ng, Wang, & Hughes, 2020; Wang, Devine, Wong, & Hughes, 2016; Xu, Ellefson, Ng, Wang, & Hughes, 2020). However, they provide no information on cross-country EF domain separability (fractionation or diversity) during adolescence, which we set out to study here.
To explore the cross-country generalization of the dissociability and intercorrelation of EF traits in terms of the unity/diversity framework domains in adolescents, we used samples of 9- to 15-year-olds, an age range we refer to as early adolescence, from two countries (Brazil and Iran). This restricted age range was chosen because we wanted to determine whether EF domain separability was already present at the beginning of adolescence, which model configuration would be found, and whether it was invariant across two cultures. Brazil and Iran were selected not only because their populations are distinct from those of high-income countries, which have been the focus of studies in the EF unity/diversity literature, but also because they differ from each other in many respects, such as oral and written language (Portuguese and Farsi, respectively), geographic location (West versus East, respectively, as well as latitude, longitude, continent and climate), history and sociocultural aspects [e.g., the Hofstede cultural dimensions (https://www.hofstede-insights.com/country-comparison/brazil,iran/), except for individualism, on which they are similar]. Brazil and Iran also differ from most high-income countries in that they still present above-average income inequality (https://worldpopulationreview.com/country-rankings/gini-coefficient-by-country), despite currently being regarded as upper-middle-income countries (https://www.wipo.int/edocs/pubdocs/en/wipo_pub_gii_2021/br.pdf). We reasoned that, from a developmental point of view, comparing the EF unity/diversity framework in under-studied and diverse early adolescent populations with varying socioeconomic status (SES) could contribute (see Barrett, 2020) to a better understanding of the applicability and/or adequacy of the unity/diversity framework worldwide.
To this end, we used a test battery that includes measures reflecting the originally proposed domains of EF unity/diversity (Free Research Executive Functions Evaluation, FREE: Zanini et al., 2021) and similar to those used in studies that found some evidence that these abilities are differentiable at the latent trait level in adolescence (Agostino, Johnson, & Pascual-Leone, 2010; Duan, Wei, Wang, & Shi, 2010; Hartung, Engelhardt, Thibodeaux, Harden, & Tucker-Drob, 2020; Wu et al., 2011; Xu et al., 2013). Importantly, the FREE battery allows language and sociocultural adaptation to a diverse range of contexts and was built to minimize the effects of SES by using simple instructions and highly familiar stimuli, making it a culturally fair assessment tool, which is essential to reduce cultural bias (e.g., Fatima & Sharif, 2019; Zanini et al., 2021). Indeed, scores in the FREE EF tasks have been found to be only minimally affected by SES (Segura et al., 2022) and to be psychometrically reliable (Segura et al., 2022; Zanini et al., 2021). Additionally, the battery is open-access and does not require copyrighted or expensive equipment/software, allowing its use in under-resourced research settings and countries.
To determine which factor structure best explains the organization of the latent EF domains in early adolescents from Brazil and Iran, we explored whether the EF domains could be differentiated using confirmatory factor analyses (CFA) with various alternative factor structures (e.g., single factor, two factors, three correlated factors, or a general factor plus two specific factors, as reviewed by Karr et al., 2018). The second step was to establish whether the best-fitting, most fractionated model measured the same constructs in the samples from the two countries; that is, we tested the model for invariance at the hierarchical configural, metric and scalar levels using MGCFA, which also allows latent traits to be compared between countries if evidence of invariance is obtained. The third and last step explored invariance for age in months and average parental schooling level (as a proxy for SES) and, in the case of invariance, the extent to which the EF domains differed in adolescents of different ages and SES backgrounds. Because age and parental schooling were continuous measures, which cannot be tested with MGCFA, this analysis was carried out using multiple-indicators, multiple-causes invariance testing (MIMIC: Brown, 2015). The lack of prior studies on fractionated latent EF factor structure at the age of interest that tested for invariance across countries, age and SES precluded us from hypothesizing whether evidence of invariance would be obtained in adolescents. If invariance was found for an adequately fitting model, based on the literature (e.g., Last et al., 2018; Schirmbeck, Rao, & Maehler, 2020) we hypothesized that: 1) Iranians would have higher EF abilities, because children and adolescents from Eastern populations often outperform those from Western ones; 2) EF performance would increase with age, because EFs improve throughout adolescence; and 3) despite reports of differential SES effects on EFs in adolescents, parental schooling would have only minimal or no influence on performance, because the tasks were adapted to minimize this effect (Segura et al., 2022; Zanini et al., 2021).

Participants
A total of 739 participants aged 9 to 15 years were tested: 407 Farsi-speaking Iranians (170 girls) and 332 Portuguese-speaking Brazilians (188 girls). Participants in both countries were enrolled in the local equivalents of United States grades 4 through 9 in public and private schools in urban areas of Tehran (Iran) and São Paulo (Brazil). All participants had normal or corrected-to-normal vision and were regarded as typically developing because they were enrolled in a school grade compatible with their age and their guardians reported no clinical history of neurodevelopmental or neuropsychiatric disorders.

General procedures
The procedures of this cross-sectional study were approved by the Ethics Committee of the Universidade Federal de São Paulo in Brazil (CAAE# 56284216.7.0000.5505 and 50662015.30000.5505) and by the Education Office of Tehran (Approval code: D/100/ 10247; DATE: November 1, 2018). Informed consent was obtained from the participants' legal guardians (both samples), and the participants gave their assent in Brazil, following local ethical guidelines. Detailed information on sampling can be found in the Supplementary Material. Note that most of the data were used in other, single-country publications: Segura et al. (2022); Segura and Pompéia (2021); Zanini et al. (2021). Participants were tested individually at their schools and provided information on their age, sex and their guardians'/parents' schooling level, used as a proxy for family SES (see Last et al., 2018) and coded as: 1 = did not complete high school; 2 = finished high school; 3 = completed two years of higher (tertiary) education; 4 = completed tertiary education; 5 = master's degree or above. The examiners were trained to administer the tests and determine scores following a test manual described in Zanini et al. (2021) (available in English, Portuguese and Farsi at https://doi.org/10.17605/OSF.IO/2BX8N). The present study was not preregistered.

Executive functions measures
The Free Research Executive Evaluation (FREE; Zanini et al., 2021) battery was used. This test battery contains two tasks for each of the three executive domains (inhibition, updating and switching; Figure 1). Inhibition and switching tasks included three blocks each: two control blocks (naming or classifying stimuli) and a third, executive block that includes the same task as the control blocks plus an executive demand (inhibiting a prepotent response or switching between classifications). As per the literature, updating tasks did not include control blocks. Tasks and scoring are described in Table 1. Participants were asked to complete each task as fast as they could while avoiding mistakes. Tasks were carried out until the end (no interruption criteria) and were self-paced: participants proceeded to the next stimulus and/or task block by swiping a touchscreen or pressing a single key on a keyboard. Participants answered vocally in their native language. The experimenters recorded accuracy and used a stopwatch to time how long participants took to complete each task or task block. The stimuli were the same as those used in Zanini et al. (2021) and were found to be adequate in pilot studies in both countries, except for one color that was changed in the Victoria Stroop task for the Iranian sample. All task instructions and stimuli were in Portuguese for Brazilian and in Farsi for Iranian participants.
The whole test session included the EF tasks, which were alternated with other tasks and questionnaires not discussed here and administered in a pseudorandom order to minimize the effects of fatigue. The EF tasks were presented on a tablet or laptop screen and preceded by simple instructions (read by the examinee, or by the experimenter if the examinee preferred) and practice stimuli (except for the inhibition tasks; see Table 1).
We used seven EF indicators, with performance measured as Rate Correct Scores (RCS: accuracy divided by the time taken to complete the task), because this measure corrects for speed-accuracy tradeoffs (Vandierendonck, 2017) and there is no clear rationale in the EF literature regarding when and why to select accuracy or reaction times (Karr et al., 2018). The updating tasks (Number Memory and 2-Back) contained only one type of block, with updating demands, so the total raw RCS was used, following the literature (Miyake et al., 2000; see also Hartung, Engelhardt, Thibodeaux, Harden, & Tucker-Drob, 2020; Zanini et al., 2021). As per the same literature, the procedure differed for the switching (Color-Shape and Category Switch) and inhibition (Victoria Stroop and Happy-Sad Stroop) tasks, which contained several task blocks, only one of which had executive demands; the other blocks in each task were control blocks, that is, they contained similar stimuli but did not require participants to switch or inhibit, respectively. For tasks in both of these EF domains, the indicators were absolute executive cost measures, that is, the RCS of the task block with executive demands minus the RCS computed over all trials of both control blocks (so that the number of stimuli in the control and executive blocks was the same). There was one exception: we added an extra indicator for the Victoria Stroop task [non-incongruent inhibition cost score: detailed in Segura et al. (2022), in the Supplementary Material and in Table 1] to correct for possible differences in reading skill. This was necessary because the task involves inhibiting word reading and, instead, naming the ink colors in which the words are printed; less proficient readers may find it easier simply to name the ink colors, so their scores may not reflect the use of EF abilities per se.
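The RCS indicators described above can be sketched in a few lines of Python. This is a minimal sketch, assuming that RCS is the number of correct responses divided by completion time in seconds, and following the text's description of the control term as the RCS over all trials of both control blocks pooled (Table 1 words the control term as the sum of the two block RCS values, so the pooling choice here is an assumption). The names Block, rcs and absolute_cost are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One timed task block (hypothetical container)."""
    correct: int     # number of correct vocal responses in the block
    seconds: float   # stopwatch time to complete the block


def rcs(correct: int, seconds: float) -> float:
    """Rate Correct Score: correct responses per second (higher = better)."""
    if seconds <= 0:
        raise ValueError("completion time must be positive")
    return correct / seconds


def absolute_cost(executive: Block, control_1: Block, control_2: Block) -> float:
    """Executive-block RCS minus the RCS of both control blocks pooled.

    Pooling two 20-trial control blocks yields 40 control trials, matching
    the 40 trials of the executive block (assumed pooling rule, see lead-in).
    """
    pooled = rcs(control_1.correct + control_2.correct,
                 control_1.seconds + control_2.seconds)
    return rcs(executive.correct, executive.seconds) - pooled


if __name__ == "__main__":
    # Hypothetical participant: slower executive block, fast control blocks.
    cost = absolute_cost(Block(36, 60.0), Block(20, 20.0), Block(19, 20.0))
    print(round(cost, 3))
```

For the updating tasks, which have no control block, the indicator in this sketch would simply be rcs() computed over the whole task.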
Further details on the rationale for the choice of tasks and the scoring system, task adaptation, administration and scoring procedures can be found in Zanini et al. (2021) and are available at https://doi.org/10.17605/OSF.IO/2BX8N in English, Portuguese and Farsi.

Measurement model: unity/diversity model configurations
We tested seven different EF model configurations using CFAs with the maximum likelihood (ML) estimator, including the seven EF indicators described above, for the merged samples. The same models were tested separately for the Iranian sample (Segura et al., 2022) and for the Brazilian sample (Supplementary Material). Six of these EF model configurations have been reported in the literature, as reviewed by Karr et al. (2018): (1) a unidimensional model (with all indicators loading on a single factor); three two-correlated-factor models, in which two of the three domains were merged into a single latent factor (inhibition had three indicators; switching and updating had two each) [(2) updating and switching merged, plus an inhibition latent factor; (3) updating and inhibition merged, plus a switching latent factor; (4) switching and inhibition merged, plus an updating latent factor]; (5) a three-correlated latent factor model (inhibition, shifting and updating latent factors), which would indicate fractionation of the three domains; and (6) a bifactor model with a common factor on which all indicators loaded, plus updating and switching latent factors that did not covary with each other. We also tested (7) a bifactor-(S-1) model (Eid, Geiser, Koch, & Heene, 2017): a common factor plus updating and shifting latent factors, with a covariance between the two specific latent factors to correct for anomalous patterns of factor loadings. Importantly, an a priori covariance was included in all tested configurations: the residual covariance between the non-incongruent and incongruent absolute inhibition costs of the Victoria Stroop task (see Table 1), because both were obtained from the same test (see Segura et al., 2022).

Table 1 (excerpt). Executive function tasks and indicators.
Domain (task) | Description | Indicator
Updating | Each screen contains a single-digit number (1 to 9). As participants pass from screen to screen, they report the last three digits seen (trios), in the order in which they were presented. Total number of updating opportunities = 24. | Total RCS (no control block)
Switching (Color-Shape) | Three blocks: single-colored geometric pictures are presented on each screen (trial). From screen to screen, pictures must be classified by shape (squares/circles) (block 1: 20 trials), by color (black/gray) (block 2: 20 trials) or by alternating (switching) classifications (block 3: 40 trials), according to cues presented above the pictures (an abstract shape for shape, a rainbow for color). | Switching cost: RCS in block 3 minus the sum of RCS in blocks 1 and 2
Switching (Category Switch) | Three blocks: single pictures are presented on each screen (trial). From screen to screen, each picture must be classified as living or non-living (block 1: 20 trials), big or small (block 2: 20 trials) or by alternating (switching) classifications without cues (living/non-living, then big/small, and so forth) (block 3: 40 trials). | Switching cost: RCS in block 3 minus the sum of RCS in blocks 1 and 2
Note: Detailed descriptions of the tasks can be found at https://doi.org/10.17605/OSF.IO/2BX8N. RCS: Rate Correct Scores (Vandierendonck, 2017), i.e., accuracy (vocal responses) divided by the total time taken to complete the blocks/task. See Figure 1 for a visual illustration of the tasks.
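The correlated-factor configurations (1) to (5) differ only in how the seven indicators are assigned to factors, which a short script can make explicit. The sketch below writes Mplus MODEL statements for those five configurations; the indicator names (STRINC, STRNON, HSSTR, COLSHP, CATSW, NUMMEM, TWOBACK) and factor names are hypothetical placeholders, not the variable names in the authors' scripts (which are in the Supplementary Material). It relies on the Mplus default that factors in a CFA are allowed to correlate, and it adds the a priori residual covariance between the two Victoria Stroop cost scores to every configuration.

```python
# Hypothetical indicator names (Mplus names must be 8 characters or fewer).
INHIBITION = ["STRINC", "STRNON", "HSSTR"]  # Victoria Stroop (2 costs), Happy-Sad Stroop
SWITCHING = ["COLSHP", "CATSW"]             # Color-Shape, Category Switch
UPDATING = ["NUMMEM", "TWOBACK"]            # Number Memory, 2-Back

# A priori residual covariance between the two Victoria Stroop cost scores.
RESIDUAL_COVARIANCE = "STRINC WITH STRNON;"

# Configurations (1)-(5): factor name -> indicators loading on it.
CONFIGURATIONS = {
    "1_unidimensional": {"EF": INHIBITION + SWITCHING + UPDATING},
    "2_upd_shi_merged": {"INH": INHIBITION, "UPDSHI": UPDATING + SWITCHING},
    "3_upd_inh_merged": {"UPDINH": UPDATING + INHIBITION, "SHI": SWITCHING},
    "4_shi_inh_merged": {"SHIINH": SWITCHING + INHIBITION, "UPD": UPDATING},
    "5_three_correlated": {"INH": INHIBITION, "SHI": SWITCHING, "UPD": UPDATING},
}


def mplus_model(factors: dict[str, list[str]]) -> str:
    """Build an Mplus MODEL command; factor covariances are free by default."""
    lines = ["MODEL:"]
    for factor, indicators in factors.items():
        lines.append(f"  {factor} BY {' '.join(indicators)};")
    lines.append(f"  {RESIDUAL_COVARIANCE}")
    return "\n".join(lines)


if __name__ == "__main__":
    for name, factors in CONFIGURATIONS.items():
        print(f"! {name}")  # '!' starts a comment in Mplus input
        print(mplus_model(factors))
```

The bifactor variants (6) and (7) are not generated here; in this sketch's terms they would add a general factor measured by all indicators and fix selected factor covariances to zero with @0 in WITH statements, leaving the covariance between the two specific factors free in the S-1 variant.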
Fit indices for each model were evaluated according to the recommendations of Schreiber, Nora, Stage, Barlow, and King (2006): chi-square (χ²) test (p ≥ 0.05); root mean square error of approximation (RMSEA < .06 to .08, with corresponding p-value > 0.05); standardized root mean square residual (SRMR ≤ .08); Tucker-Lewis index (TLI ≥ .95); and comparative fit index (CFI ≥ .95). When a model presented poor fit, we inspected the parameters proposed by Mplus that could improve model fit (expressed as modification indices: MIs). If MI values were higher than 4 and the parameters were theoretically justifiable (Brown, 2015), analyses were rerun including estimation of the respective parameters.
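To make the cutoffs above concrete, the sketch below computes RMSEA, CFI and TLI from the model and baseline (independence) chi-squares using common textbook formulas, then checks them against the thresholds listed in the text. It is an illustration, not the Mplus implementation: software differs in whether N or N - 1 is used in RMSEA and in multiple-group adjustments. SRMR is passed in rather than computed, because it requires the residual correlation matrix. Function names and the example numbers are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate, common (N - 1) form; 0 when chi2 <= df."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to the baseline (independence) model."""
    num = max(chi2 - df, 0.0)
    den = max(chi2_base - df_base, chi2 - df, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def tli(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis index (non-normed fit index)."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2 / df) / (base_ratio - 1.0)


def meets_cutoffs(rmsea_v: float, srmr_v: float, tli_v: float, cfi_v: float,
                  rmsea_max: float = 0.08) -> dict[str, bool]:
    """Cutoffs from the text; RMSEA uses the lenient end (.08) by default."""
    return {
        "RMSEA": rmsea_v < rmsea_max,
        "SRMR": srmr_v <= 0.08,
        "TLI": tli_v >= 0.95,
        "CFI": cfi_v >= 0.95,
    }


if __name__ == "__main__":
    # Hypothetical values, not taken from the study's tables.
    r = rmsea(30.0, 11, 739)
    c = cfi(30.0, 11, 500.0, 21)
    t = tli(30.0, 11, 500.0, 21)
    print(round(r, 3), round(c, 3), round(t, 3))
    print(meets_cutoffs(r, 0.04, t, c))
```

With these made-up inputs, RMSEA and CFI pass while TLI falls just short of .95, which illustrates why several indices are inspected jointly rather than relying on one.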
If two or more models had acceptable fit indices, we compared them using the chi-square difference test: a non-significant p value indicates no significant increase in misfit from one model to another. After defining the best model indicating EF domain diversity (separability or fractionation of EF domains), we tested the measurement invariance of this model configuration for country with the samples merged, as explained next.
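The chi-square difference test for nested models can be sketched without external libraries: the difference in χ² between the more restricted and the less restricted model is referred to a χ² distribution whose degrees of freedom equal the difference in model df. The sketch below assumes plain ML chi-squares, as used here; robust estimators such as MLR would require a scaled difference test instead. The helper names chi2_sf and chi2_difference are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = math.exp(-half)
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: start from df = 1 (erfc) and step up by 2 via the
    # recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    total = math.erfc(math.sqrt(half))
    for k in range(1, df, 2):
        total += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
    return total


def chi2_difference(chi2_restricted: float, df_restricted: int,
                    chi2_free: float, df_free: int) -> tuple[float, int, float]:
    """Return (delta chi2, delta df, p) for two nested models."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


if __name__ == "__main__":
    # Delta values reported in the Results section of this article.
    for d_chi2, d_df in [(5.17, 4), (7.78, 2), (12.95, 6), (66.81, 4)]:
        print(d_chi2, d_df, round(chi2_sf(d_chi2, d_df), 4))
```

Applied to the differences reported in the Results, this gives p ≈ 0.27 for Δχ²(4) = 5.17, p ≈ 0.02 for Δχ²(2) = 7.78, p ≈ 0.04 for Δχ²(6) = 12.95 and p < 0.001 for Δχ²(4) = 66.81, matching the reported values.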

Multiple-Groups Confirmatory Factor Analysis (MGCFA): invariance for country
Three progressive invariance testing steps (Brown, 2015) were carried out, which establish whether the latent constructs (configural invariance), factor loadings (metric invariance) and intercepts (scalar invariance) are equivalent between the two countries. This is done by hierarchically imposing restrictions on a progressive sequence of nested models and checking for significant decreases in model fit from one step to the next (Schreiber, Nora, Stage, Barlow, & King, 2006). The models resulting from these three steps were compared pairwise using the chi-square difference test and differences in CFI, RMSEA and SRMR (Chen, 2007). Details of these analyses and the Mplus syntax can be found in the Supplementary Material.
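The fit-index difference criteria can be written as a small decision function. This is a sketch of one common reading of Chen's (2007) recommendations for samples larger than 300: a CFI decrease of at least .010, supplemented by an RMSEA increase of at least .015 or an SRMR increase of at least .030 (loadings) or .010 (intercepts), signals non-invariance. The function name, argument conventions and cutoff values should be treated as assumptions to be checked against the original source.

```python
def noninvariance_flag(delta_cfi: float, delta_rmsea: float,
                       delta_srmr: float, level: str = "metric") -> bool:
    """Return True if the fit-index changes suggest non-invariance.

    All deltas are magnitudes of worsening when moving to the more
    constrained model: CFI decrease, RMSEA increase, SRMR increase.
    level: "metric" (loadings) or "scalar" (intercepts).
    """
    if level == "metric":
        srmr_cut = 0.030
    elif level == "scalar":
        srmr_cut = 0.010
    else:
        raise ValueError("level must be 'metric' or 'scalar'")
    # CFI drop is the primary criterion, supplemented by RMSEA or SRMR.
    return delta_cfi >= 0.010 and (delta_rmsea >= 0.015 or delta_srmr >= srmr_cut)


if __name__ == "__main__":
    # Fit-index differences reported in the Results for partial scalar invariance.
    print(noninvariance_flag(0.006, 0.003, 0.007, level="scalar"))
```

Applied to the differences reported in the Results (ΔCFI = 0.006, ΔRMSEA = 0.003, ΔSRMR = 0.007), the function does not flag non-invariance, consistent with the conclusion that partial scalar invariance is acceptable.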

Multiple Indicators, Multiple Causes (MIMIC): invariance for age and SES
Measurement invariance of the best-fitting model configuration across age (in months) and SES (level of parental schooling) was also tested using MIMIC models (Brown, 2015). In MIMIC models, MIs > 4 for direct paths between the covariates of interest (i.e., age and SES) and the indicators are evidence of differential item functioning (DIF), that is, of measurement non-invariance. Models were also inspected for significant effects of the covariates (age and parental schooling) on the latent factors, which is evidence of population heterogeneity (i.e., the factor means differ across levels of the covariates). Model fit and MI cutoffs were the same as described above.
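The DIF screening rule above can be illustrated with a short function. A modification index is asymptotically distributed as χ² with 1 df, so 3.84 is the critical value at α = .05 and the cutoff of 4 used here is a slightly conservative rounding. The sketch flags covariate-to-indicator direct paths whose MI exceeds the cutoff and reports the approximate p-value; the function names, the covariate labels AGE and PARED, the indicator labels X1 to X3 and the MI values are all hypothetical and do not correspond to the study's results.

```python
import math

MI_CUTOFF = 4.0  # MI ~ chi-square(1); 3.84 is the .05 critical value


def mi_p_value(mi: float) -> float:
    """Approximate p-value of an MI treated as a chi-square with 1 df."""
    return math.erfc(math.sqrt(mi / 2.0)) if mi > 0 else 1.0


def flag_dif(direct_path_mis: dict[tuple[str, str], float],
             cutoff: float = MI_CUTOFF) -> list[tuple[str, str, float, float]]:
    """Return (covariate, indicator, MI, p) for direct paths above the cutoff."""
    flagged = []
    for (covariate, indicator), mi in sorted(direct_path_mis.items()):
        if mi > cutoff:
            flagged.append((covariate, indicator, mi, mi_p_value(mi)))
    return flagged


if __name__ == "__main__":
    # Hypothetical MIs for covariate -> indicator direct paths.
    mis = {("AGE", "X1"): 9.2, ("AGE", "X2"): 1.1, ("PARED", "X3"): 3.9}
    for row in flag_dif(mis):
        print(row)
```

In a MIMIC analysis, a flagged path would then be freed and the model re-estimated, and the covariate-to-factor paths would be inspected separately for population heterogeneity.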

Results
The databank is available at https://doi.org/10.17605/OSF.IO/DN82J and the Mplus scripts can be found in the Supplementary Material. Participants were 9 to 15 years old (mean ± SD: Brazil = 12.52 ± 1.85; Iran = 11.60 ± 1.74), and average parental schooling level (SES proxy) ranged from 1 to 5 in both countries (mean ± SD: Brazil = 2.48 ± 1.18; Iran = 3.31 ± 1.20). Extreme outlier scores, lying more than five SD from the mean, were excluded (one participant's score from the Brazilian sample in the Number Memory task and one from the Iranian sample in the Happy-Sad Stroop task). RCS of the inhibition and switching tasks were reverse scored so that higher values represent better performance. Raw scores and RCS per task and country can be found in the Supplementary Material (Table S1).
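The 5-SD exclusion rule can be sketched as follows, assuming that it is applied separately to each task within each country, that the mean and sample SD are computed on all available scores before exclusion, and that an excluded score is set to missing rather than dropping the participant. These conventions and the function name are assumptions, not details confirmed by the text. One property worth knowing: with n scores, the largest attainable distance from the mean is (n - 1)/sqrt(n) sample SDs, so a 5-SD criterion can only be triggered when n is at least 27.

```python
import statistics
from typing import Optional


def exclude_extreme(scores: list[Optional[float]],
                    max_sd: float = 5.0) -> list[Optional[float]]:
    """Set to None any score lying more than max_sd SDs from the mean.

    Missing scores (None) are ignored when computing the mean and the
    sample SD, and are passed through unchanged.
    """
    present = [s for s in scores if s is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # sample SD (n - 1 denominator)
    return [
        None if s is not None and abs(s - mean) > max_sd * sd else s
        for s in scores
    ]


if __name__ == "__main__":
    # Hypothetical task scores: 29 typical values and one extreme value.
    data = [1.0] * 29 + [50.0]
    cleaned = exclude_extreme(data)
    print(cleaned[-1], sum(s is None for s in cleaned))
```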

Measurement model
For the model configurations tested in each sample separately, see Segura et al. (2022) for the Iranian participants and Figure S1 in the Supplementary Material for the Brazilian sample. In both cases, the three-correlated factor model had adequate fit. Fit indices for all seven model configurations tested in the merged sample (Table 2) showed that the one- and two-factor solutions did not reach acceptable fit, and the modifications suggested by their MIs were not theoretically acceptable. The bifactor and bifactor-(S-1) models were also not acceptable, owing to non-convergence and a Heywood case, respectively. The only model with acceptable fit was the three-correlated factor solution. Figure 2a shows the configuration, standardized factor loadings and fit indices for this model. Significant correlations were found among all factors (ranging from r = 0.29 to r = 0.79), and all indicators loaded significantly on their respective factors. Factor loadings were lowest for the inhibition domain (λ = 0.19 to λ = 0.50) and moderate to high for switching (λ = 0.63 to λ = 0.68) and updating (λ = 0.62 to λ = 0.81).
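A Heywood case, which led to the rejection of the bifactor-(S-1) model, is an improper solution. Typical examples are a negative residual variance and a standardized loading whose absolute value exceeds 1. The check below is a minimal sketch of how such estimates could be screened. The function name and the input dictionaries are hypothetical stand-ins for estimates exported from SEM software.

```python
def heywood_flags(residual_variances: dict, std_loadings: dict) -> list:
    """List indicators with improper estimates (Heywood cases)."""
    flags = []
    for name, var in residual_variances.items():
        if var < 0:  # variances cannot be negative in a proper solution
            flags.append(f"{name}: negative residual variance ({var})")
    for name, lam in std_loadings.items():
        if abs(lam) > 1:  # implies a negative residual variance in standardized metric
            flags.append(f"{name}: standardized loading beyond +/-1 ({lam})")
    return flags
```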

Multiple-Groups Confirmatory Factor Analysis (MGCFA)
The MGCFA analysis of invariance was carried out using the three-correlated factor model that had adequate fit in the merged sample (Table 3). Configural invariance was achieved with good model fit, indicating that the factor structure was equivalent between the two countries. The next step (metric invariance) constrained the factor loadings to be equal across the two samples. This model also fit the data well, with no significant degradation in model fit (∆χ²(4) = 5.17, p = 0.27), meaning that the factor loadings were equivalent between countries and can be directly compared. The last and most restrictive model (scalar) constrained the intercepts of both countries to be equal. This model fit the data poorly (see Table 3), produced a significant degradation in model fit (∆χ²(4) = 66.81, p < 0.001) and yielded a negative residual variance due to problems with the 2-Back indicator. We therefore tested a similar scalar model in which the intercepts of the 2-Back task were instead freely estimated. This solution presented adequate fit to the data (χ²(27) = 74.18, p < 0.001, CFI = 0.95, TLI = 0.93, RMSEA = 0.069). Inspection of MIs indicated that the model could be improved by freeing the equality constraint on the Happy-Sad Stroop intercepts. This final model (in which the intercepts of the 2-Back and Happy-Sad Stroop indicators were non-invariant) showed acceptable fit indices (see Table 3) and yielded no further MIs. The constraints of this model, however, produced a significant, although marginal, degradation in model fit according to the chi-square difference test (scalar vs. metric: ∆χ²(2) = 7.78, p = 0.02; scalar vs. configural: ∆χ²(6) = 12.95, p = 0.04). The differences in the other fit indices between models suggested that partial scalar invariance is acceptable (∆CFI = 0.006, ∆RMSEA = 0.003 and ∆SRMR = 0.007). Freeing noninvariant intercepts under the assumption of partial invariance has been shown to be robust and to recover factor means effectively (Pokropek, Davidov, & Schmidt, 2019). Therefore, to compare countries in terms of EF latent abilities, we used the differences in latent means from the final model, which accounted for the two noninvariant parameters (2-Back and Happy-Sad Stroop intercepts). Iranian participants had higher latent EF traits in all EF domains (factor mean differences: inhibition, 0.70, p = 0.005; shifting, 1.03, p < 0.001; updating, 0.82, p < 0.001).
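The specification search that leads to a partially invariant scalar model can be sketched as a loop. The loop frees one intercept equality constraint at a time, largest MI first, until no remaining constraint has an MI above the cutoff. This is a minimal sketch under stated assumptions. `fit` is a hypothetical callback that refits the model with the given intercepts freed and returns the MI of each still-constrained intercept. In the analysis reported above, the 2-Back intercept was freed first because of a negative residual variance rather than an MI, so its name can be passed as `initially_free`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: tuple of freed intercepts -> MI of each still-constrained intercept
FitFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def free_intercepts_sequentially(fit: FitFn, mi_cutoff: float = 4.0,
                                 initially_free: Tuple[str, ...] = ()) -> List[str]:
    """Free intercept constraints one at a time, largest MI first, until
    no remaining constraint has an MI above the cutoff."""
    freed = list(initially_free)
    while True:
        mis = fit(tuple(freed))
        candidates = {name: mi for name, mi in mis.items()
                      if name not in freed and mi > mi_cutoff}
        if not candidates:
            return freed
        freed.append(max(candidates, key=candidates.get))
```

Freeing one parameter per step keeps each change interpretable. Because MIs shift after every refit, they are recomputed rather than freeing all flagged intercepts at once.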

Multiple Indicators, Multiple Causes (MIMIC)
MGCFA can only be used with categorical grouping variables, so we used MIMIC modeling (Brown, 2015) to determine the measurement invariance of the task indicators across age in months and average parental schooling, as well as the effects of these covariates on the EF factors (population heterogeneity). The initial MIMIC model included paths from the covariates to each EF factor and no direct paths to the indicators. Inspection of MIs indicated that model fit would improve by including a correlation between the covariates age and SES and by regressing either of the two updating indicators (Number Memory or 2-Back) onto age, with equal evidence of DIF for both paths. The standardized expected parameter change (StdYX E.P.C.), which indicates the expected value of a parameter if its constraint were released, was higher for the 2-Back task. We therefore estimated the model (Figure 2b) including both the correlation between the covariates and the direct path to the 2-Back indicator. This model yielded no further DIF and had adequate fit (χ²(17) = 67.77, p < 0.001, CFI = 0.97, TLI = 0.93, RMSEA = 0.064, SRMR = 0.046). These results indicate partial invariance for age, with DIF confined to the updating 2-Back task (β = −0.38, p < 0.001). No DIF was found for parental schooling, which indicates measurement invariance across this proxy of SES. Regarding the effects of the covariates on the latent factors, older age predicted better inhibition (β = 0.70, p < 0.001), switching (β = 0.23, p < 0.001) and updating (β = 0.49, p < 0.001). Higher levels of parental schooling also significantly predicted better inhibition (β = 0.20, p = 0.009), switching (β = 0.30, p < 0.001) and updating (β = 0.32, p < 0.001).
Based on Cohen's d, all the effects of the covariates on the EF factors are small, except for the medium effect of age on inhibition and updating (Cohen's d = 0.70 and 0.49, respectively).

Discussion
The present cross-country (Brazil, Iran) study found that the three-correlated EF factor configuration proposed in the initial version of the unity/diversity framework, comprising inhibition, shifting and updating (Miyake et al., 2000), is an acceptable solution for two culturally diverse samples of early adolescents (aged under 16 years) from developing countries. These results corroborate the separable nature of these three domains reported by studies that explored this issue in a single country/culture in adults (Karr et al., 2018; Lee, Bull, & Ho, 2013), although other configurations have also been found (Karr et al., 2018). Additionally, a novel finding was that the three-correlated latent factor configuration (configural invariance) and the extent to which each task loaded on (contributed to) the EF latent trait domains (metric invariance) were comparable across the samples from the two countries. Together with partial scalar invariance, this allowed us to directly contrast their latent traits and to find better EF abilities in Eastern (Iranian) than in Western (Brazilian) young people, in line with previous reports (see Schirmbeck, Rao, & Maehler, 2020). The model was also mostly invariant across age (except for one indicator) and fully invariant across parental schooling.
Corroborating prior findings (Last et al., 2018; Schirmbeck, Rao, & Maehler, 2020), the MIMIC model also showed positive effects of age and SES on EF latent traits, although the latter were minimal. Furthermore, because we used samples from underrepresented and diverse countries (Barrett, 2020), one in Latin America and the other in the Middle East, our findings suggest that this particular EF fractionation may extend to samples from other cultural contexts from a similar age onwards. Below, we discuss our results in more detail.
Prior studies of adolescents that tested for invariance across countries assessed only one-factor models, that is, they studied the unity of EFs (Ellefson, Zachariou, Ng, Wang, & Hughes, 2020; Holding et al., 2018; Wang, Devine, Wong, & Hughes, 2016; Xu, Ellefson, Ng, Wang, & Hughes, 2020). The only publication we are aware of that investigated the separability of three correlated EF domains (invariant at the configural, metric and even scalar levels) was carried out in pre-schoolers from Hong Kong and Germany (Schirmbeck et al., 2022). However, the model in that study comprised theoretical domains (i.e. working memory, cognitive flexibility and inhibition) that are not the same as those proposed in the unity/diversity framework tested here (see Morra, Panesi, Traverso, & Usai, 2018). This highlights the need to define more precisely the theoretical constructs under investigation when studying the fractionation of EFs. It is also noteworthy that construct validity (see Sherman, Brooks, Iverson, Slick, & Strauss, 2011) was supported by the careful selection of the scoring metric (RCS; Vandierendonck, 2017), tasks and stimuli, which were culturally adapted (Zanini et al., 2021) and grounded in theory confirmed by literature reviews (Friedman & Miyake, 2017; Karr et al., 2018). Construct validity (Sherman, Brooks, Iverson, Slick, & Strauss, 2011) was also supported by the fact that the scores in the tasks of each EF domain shared variance. Additionally, the internal reliability of the measures was attested (see Sherman, Brooks, Iverson, Slick, & Strauss, 2011) by the finding of a well-fitting three-intercorrelated factor model, as proposed by Miyake et al. (2000), in the merged sample that was invariant at the configural and metric levels and partially invariant at the scalar level. Together, these results suggest that the FREE test battery, which can be adapted to other cultural contexts, may be useful to researchers who intend to replicate our results and to further explore possible cross-cultural generalities of EF unity/diversity and the factors that can affect these abilities.
Our three-correlated EF factor model displayed only partial scalar invariance across countries, with two noninvariant indicators (the 2-Back and Happy-Sad Stroop tasks). This is not surprising, as full scalar invariance is seldom found across different samples (see Chen, 2008). It means that mean raw performance in the two noninvariant tasks cannot be explained solely by participants' latent EF abilities and also includes systematic errors that are specific to each sample. Consequently, directly comparing the raw scores of participants from the two countries in these two noninvariant tasks would be fallacious and could lead to spurious conclusions (Chen, 2008). The present study was not designed to determine why this occurred, so we can only speculate on possible reasons for this particular lack of scalar invariance.
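The consequence of intercept non-invariance can be made explicit with the usual linear factor model. The notation here is assumed (standard CFA notation, not taken from the Supplementary Material): x_ig is indicator i in country g (IR = Iran, BR = Brazil), ν_ig its intercept, λ_i its loading (equal across countries under metric invariance), η_g the latent EF factor with mean κ_g, and ε_ig a zero-mean residual:

```latex
\begin{aligned}
x_{ig} &= \nu_{ig} + \lambda_i\,\eta_g + \varepsilon_{ig}, \qquad \mathrm{E}[\varepsilon_{ig}] = 0,\\
\mathrm{E}[x_{ig}] &= \nu_{ig} + \lambda_i\,\kappa_g,\\
\mathrm{E}[x_{i,\mathrm{IR}}] - \mathrm{E}[x_{i,\mathrm{BR}}]
  &= \underbrace{(\nu_{i,\mathrm{IR}} - \nu_{i,\mathrm{BR}})}_{\text{intercept bias}}
   + \lambda_i\,(\kappa_{\mathrm{IR}} - \kappa_{\mathrm{BR}}).
\end{aligned}
```

Under full scalar invariance the first term vanishes, and raw mean differences are proportional to latent mean differences. For a noninvariant indicator such as the 2-Back or Happy-Sad Stroop task, the raw difference mixes intercept bias with the latent difference, which is why latent means from the partially invariant model are the appropriate basis for comparison.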
van de Vijver and Tanzer (2004) suggest three factors that can lead to this type of bias: 1) poor item translation; 2) "nuisance effects"; and 3) cultural specifics. We believe that poor translation/adaptation of verbal content is an unlikely cause, because stimuli were carefully selected and piloted to ensure that instructions and stimuli were easily understood by, and highly familiar to, both samples. The so-called nuisance effects (biases due to additional traits or abilities that are recruited to respond to tasks) cannot be ruled out as a source of bias, since the problem of impurity in EF tasks is pervasive (Friedman & Miyake, 2017; Miyake et al., 2000); however, we focused on latent factor scores, which are expected to minimize this problem. This shows the importance of invariance testing in cross-country studies, at least at the configural and metric levels, and of using more than one measure of each EF domain studied so that latent traits can be obtained. It therefore seems likely that culture-specific factors are responsible for the lack of invariance at the intercept level of the 2-Back and Happy-Sad Stroop tasks, which may have been influenced by abilities involved in EF task performance but not by executive functioning per se. For example, biases in the horizontal direction of visual exploration or reading direction (left-to-right for Brazilians and right-to-left for Iranians) have been shown to affect how efficiently visual images are scanned and processed (Gonthier, 2022), which could have affected 2-Back task performance. However, more work is needed to explore the importance of spatial abilities in cross-cultural studies (Fatima & Sharif, 2019; Gonthier, 2022; Schirmbeck, Rao, & Maehler, 2020). It is also unsurprising that performance in the Happy-Sad Stroop task was noninvariant, considering cultural differences in the attribution of emotions to facial expressions and in the development of socio-emotional competence (e.g. Fatima & Sharif, 2019; Schirmbeck, Rao, & Maehler, 2020). Other cultural factors, described next, may also have been responsible for these specific effects and/or for why Iranian adolescents had better EF latent abilities in all domains than Brazilians, corroborating the East vs. West performance advantage found in many studies of children and adolescents (e.g. Schirmbeck, Rao, & Maehler, 2020). For instance, this may have stemmed from a myriad of factors such as the length of word pronunciation or the number of syllables in the words representing the stimuli (numbers, shapes, colors, etc.), which differ between Portuguese and Farsi. This could affect the amount of information that can be kept in mind at a given moment and thus the goal-directed maintenance and manipulation of information (Friedman & Miyake, 2017), potentially altering how people respond to task demands in all EF domains. We also cannot rule out other culture-specific aspects such as multilingualism, which is much more prevalent in Iran than in Brazil and is believed to positively affect some EF measures (see Schirmbeck, Rao, & Maehler, 2020). Similarly, differences in schooling systems, parenting styles, mathematical abilities, social norms and/or social relations could have led to differential development of EFs, as could differences in motivation and/or test-taking attitudes (see Fatima & Sharif, 2019; Schirmbeck, Rao, & Maehler, 2020).
The MIMIC analysis showed measurement invariance across age, except for a direct effect of age on the updating 2-Back task. This may be because updating tasks in the EF unity/diversity literature are not usually controlled for processing speed, which has been shown to increase throughout development, especially during adolescence (see Korzeniowski, Ison, & Difabio de Anglat, 2021). This improvement may systematically affect 2-Back performance with age in a way that is not captured by the EF updating latent trait. The improvement of EF latent traits with age in all domains (small-to-medium effect sizes) in the merged sample was expected, as it has been found with both raw (Last et al., 2018; Schirmbeck, Rao, & Maehler, 2020) and latent factor scores (Xu, Ellefson, Ng, Wang, & Hughes, 2020) across many EF tasks. The MIMIC model also showed that the tasks were invariant across parental schooling, a factor that has been found to affect EF development (e.g. Fatima & Sharif, 2019; Last et al., 2018). However, unlike age, parental schooling had only small positive effects on the latent scores. This is in keeping with our hypothesis and may reflect the adaptation of the tasks to be minimally influenced by this factor (simple instructions and highly familiar stimuli); most studies in the literature report larger SES effects (Last et al., 2018).
In summary, this dual-site (Brazil and Iran) study in early adolescents (9- to 15-year-olds) involved diverse samples that exhibit high relative cultural distance (see Fatima & Sharif, 2019) in terms of history, geographic features, oral and written language, and general sociocultural and educational practices and values. Despite these differences, the use of standardized, socioculturally adapted tasks (the FREE battery) allowed us to show: 1) measurement invariance across countries of both the three-correlated EF domain factor structure initially proposed in the unity/diversity framework (Miyake et al., 2000) and the loadings of the indicators on their respective EF domains; 2) measurement invariance across SES (parental schooling) and partial invariance across age; and 3) the expected improvement of EF latent traits with age and minimal effects of SES, despite the wide range of parental schooling levels. The fact that we studied populations underrepresented in the literature and replicated the three separable EF domains found in adolescent samples from (mostly) high-income nations (Duan, Wei, Wang, & Shi, 2010; Hartung, Engelhardt, Thibodeaux, Harden, & Tucker-Drob, 2020; Wu et al., 2011; Xu, Ellefson, Ng, Wang, & Hughes, 2020) speaks to the possible cross-cultural generality of the separability of inhibition, shifting and updating. Overall, our findings substantiate the psychometric adequacy of the EF unity/diversity configuration in an age range that includes all stages of the pubertal transition (Marceau, Ram, Houts, Grimm, & Susman, 2011).
Concerning limitations, our samples were not nationally representative of their respective countries and were restricted to adolescents from a specific generation living in highly urbanized cities. We also did not include highly underprivileged individuals, who seem to be more vulnerable to poor EF development (Schirmbeck, Rao, & Maehler, 2020). Thus, our results do not capture the plurality of sociocultural contexts (see Taras, Steel, & Kirkman, 2016) that exist within Brazil and Iran. As is usual in the literature, we did not include control blocks or conditions in the updating tasks; future work may find that this affects the extent to which this domain is associated with switching and inhibition, both of which had indicators that control for the non-EF demands of each task (e.g. speed). We used a cross-sectional design, although longitudinal designs are ideally needed to establish the inter- and cross-country developmental trajectories of the fractionation of EF domains. Nonetheless, longitudinal studies make it difficult to control for the practice effects that are common in EF tasks and that do not arise in cross-sectional studies such as ours. Additionally, because our goal was to test the unity and diversity framework, we used reflective models, as originally proposed (Miyake et al., 2000). However, the adequacy of these models has been challenged, and other analytic approaches, including formative (Willoughby, Blair, & The Family Life Project Investigators, 2016) and composite models (Camerota, Willoughby, & Blair, 2020), have been suggested as alternatives, although how they can contribute to the development of EF theory is still unclear. Also, we cannot exclude the possibility that the latent factors derived from the tested models merely reflect the pattern of inter-associations within this particular set of tasks rather than differentiable EF abilities per se (Garcia-Barrera, 2019). Another limitation is that we followed the common practice of using only two tasks per domain for young participants (Karr et al., 2018), although at least three indicators per factor are advised to achieve model identification (Kline, 2011). Nevertheless, most of our models were empirically identified (Brown, 2015), and even with only two tasks per domain we found a well-fitting three-factor solution by using tasks that reflect the EF abilities proposed in the original framework, suggesting that adequate task selection may be more crucial than the number of tasks in demonstrating the diversity of EFs. However, more indicators could have helped to overcome the estimation problems of the bifactor and bifactor-(S-1) models. As to the adaptation of tasks and scores to the populations under study, we stress that most publications in this field employ different tests and/or use reaction times or accuracy as measures, except for some rare cases (studies carried out by the same researchers/laboratory and/or papers on the same cohort tested at different life stages). Therefore, if the use of varying tests and scoring systems is a limitation of our study, it applies equally to most papers on the unity/diversity framework. Lastly, the idea that EF differentiation is brought about by the development of EFs themselves has been challenged. It has been proposed that, rather than fractionating into domains, EFs may develop as a result of age-related changes in other abilities that influence how people use executive control to reach specific goals and that depend on the improvement of a range of broader skills recruited when carrying out EF tasks (e.g. acquisition of knowledge, beliefs and values, and motor and procedural abilities; see Doebel, 2020). Consequently, to better understand the cross-cultural generality of EF development and to establish how specific demographic and cultural factors affect it, future studies must not only replicate our results in more samples from different backgrounds and wider age ranges but also use other types of tasks that assess inhibition, shifting and updating, and include more indicators. Future studies should also investigate other EF domains, such as planning, dual tasking and verbal fluency (see Garcia-Barrera, 2019). After all, as proposed by the original unity/diversity framework (Miyake et al., 2000), EFs are not limited to three domains, nor to a general factor plus updating- and shifting-specific factors (Friedman & Miyake, 2017). Taking these additional factors into consideration may help to address an as yet unresolved issue (Doebel, 2020): how performance in laboratory EF tasks is associated with real-life self-regulation outcomes, and whether this association changes across development and culture.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Figure 1. Illustration of the two tasks of each domain of executive functions: inhibition (A and B), switching (C and D) and updating (E and F). Note: Tasks are shown in English, but they were administered in modern Persian (Farsi) to the Iranian sample and in Brazilian Portuguese to the Brazilian sample. See Table 1 for details of the tasks and scoring method or, for a text version, see https://doi.org/10.17605/OSF.IO/2BX8N.

Figure 2. Diagrams include factor loadings (values on the linear arrows) of executive measures (squares) on the executive latent variables (ovals): a) diagram of the best solution among the seven tested confirmatory factor analysis models; b) Multiple Indicators, Multiple Causes (MIMIC) model showing the effect of the continuous covariates (age in months and mean level of parental schooling as a proxy for socioeconomic status, SES) on the three latent factors (ovals) of the three-correlated factor solution. Note: Standard errors of residuals for each task are represented by the numbers at the end of the arrows pointing toward each square. Double-headed arrows represent correlations of residual variances. Fit indices of each psychometrically adequate model are presented below each diagram. For fit index cutoffs, see text. Rate Correct Score costs of the inhibition and switching tasks are reverse scored (see text). *Indicator with differential item functioning for age in months (significant direct effect of age on the 2-Back total score, β = −0.38, p < 0.001).

Table 2. Model fit of the seven confirmatory factor analysis model configurations tested in the merged samples from Brazil and Iran.

Table 3. Model fit information for the Multiple-Groups Confirmatory Factor Analysis (MGCFA) nested models of the three-correlated factor structure in the merged samples. In the final scalar model, the intercepts of the 2-Back and Happy-Sad Stroop tasks were not constrained to equality between groups. For fit index cutoff values, see text.