Questioning cognitive heterogeneity and intellectual functioning in fetal alcohol spectrum disorders from the Wechsler Intelligence Scale for Children

Abstract Introduction: Fetal Alcohol Spectrum Disorders (FASD) are characterized by multiple cognitive and behavioral impairments, with intellectual, attentional, and executive impairments being the most commonly reported. In populations with multiple neurodevelopmental disorders, the Full Scale Intelligence Quotient (FSIQ) may not be an adequate measure of intellectual abilities. In FASD clinical practice it is rarely interpreted, because the cognitive profile is deemed too heterogeneous. We propose a quantitative characterization of this heterogeneity and of the profile of strengths and weaknesses. We also propose a differential analysis of global cognitive functioning (FSIQ) and elementary reasoning abilities in a large retrospective single-center FASD sample. Methods: Using clinical and cognitive data (Wechsler Intelligence Scale for Children) from 107 children with FASD, we characterized within-subject heterogeneity (variance and scatter of scaled and composite scores) and searched for strengths and weaknesses. We also specified intellectual functioning in terms of FSIQ and elementary reasoning (General Ability Index, GAI; Highest Reasoning Scaled Score, HRSS). These measures were compared with standardization norms and with a Monte Carlo sample simulated from normative data. Results: Children with FASD performed lower on all subtests, with significant weaknesses in working memory and processing speed. We found no increase in the variance or scatter of the scores. However, there was a discordance between the assessment of global cognitive functioning (28% borderline, 23% deficient) and that of global and elementary reasoning abilities (GAI and HRSS, respectively: 23% and 9% borderline, 15% and 14% deficient). Conclusion: Our results question the notion of WISC profile heterogeneity in FASD. They point to a disproportionate impairment of working memory and processing speed, with global repercussions but most often preserved elementary reasoning abilities.


Introduction
Alcohol consumption during pregnancy is a major cause of neurodevelopmental disorders (Flak et al., 2014; Tsang et al., 2016). The clinical consequences of prenatal alcohol exposure are grouped under the diagnostic continuum of Fetal Alcohol Spectrum Disorder (FASD) (Riley & McGee, 2005). Several international diagnostic guidelines converge on common key clinical criteria (Astley & Clarren, 2000; Cook et al., 2016; Hoyme et al., 2016). They distinguish fetal alcohol syndrome (FAS), defined by a specific association of physical features, from non-syndromic FASD (NS-FASD). In NS-FASD these physical criteria are absent or incomplete, but a probabilistic causal link can be assumed between cognitive and behavioral impairment and significant prenatal alcohol exposure.
Primary studies have reported a group-level cognitive profile in FASD characterized by poor intellectual functioning. Most children with FASD showed borderline to mildly deficient Intelligence Quotient (IQ) scores (Mattson et al., 1997, 2011; Streissguth et al., 1996), but IQ correlated only weakly with adaptive functioning in this population (Boseck et al., 2015; Kautz-Turnbull & Petrenko, 2021). A typical FASD cognitive profile has also been reported, with poorer performance on executive and perceptual-motor tasks than on verbal ones (Sampson et al., 1989; Streissguth et al., 1989, 1996). However, this profile has not been fully or consistently replicated. The current consensus emphasizes instead the inter-individual variability of the cognitive profile in FASD (Astley et al., 2009; Kodituwakku, 2007). Indeed, more recent studies of the cognitive phenotype in FASD highlight impairments in cognitive domains that vary from study to study but collectively cover the full range of cognition: executive functions, working memory, attention, learning, language, problem-solving, intelligence, and fluid reasoning (Astley et al., 2009; Green et al., 2009; J. L. Jacobson et al., 2021; Lewis et al., 2015; Rasmussen et al., 2013; Streissguth et al., 1989; Vaurio et al., 2008). Yet, within this variability of profiles, the executive deficit seems to be the most consistent and common, leading some authors to conceptualize it as the core of the cognitive symptomatology (e.g., Fuglestad et al., 2015). In clinical practice, patients with prenatal alcohol exposure have received multiple neurodevelopmental diagnoses, which reflects this variability and multiplicity of profiles. In decreasing order of frequency, these are Attention-Deficit/Hyperactivity Disorder (ADHD), mild to borderline Intellectual Developmental Disorder (IDD), various Specific Learning Disorders (SLD), and Developmental Coordination Disorder (DCD) (Chasnoff et al., 2015; Geier & Geier, 2022; Popova et al., 2016; Weyrauch et al., 2017). International diagnostic guidelines also reflect this diversity and multiplicity of functional diagnoses: they require cognitive and behavioral impairment in at least two domains for the diagnosis of FASD (Astley & Clarren, 2000; Cook et al., 2016; Hoyme et al., 2016).
Multiple impairments, particularly attentional and executive ones, may influence the assessment of intellectual functioning in children with FASD. This question has implications for the diagnosis of Intellectual Developmental Disorder (IDD) in this population, but it has not been specifically studied. According to the International Classification of Diseases, 11th revision (ICD-11), IDD is characterized by intellectual and adaptive functioning below the 2.5th percentile of the general population (World Health Organization, 2019). Intelligence is most often seen as an overall measure of cognitive abilities, a latent variable that explains most of the variability in cognitive performance (Deary et al., 2010). The Wechsler Intelligence Scales for Children (WISC) are the most widely used scales for intellectual assessment in the pediatric population. They are based on multiple specific cognitive tasks, and a hierarchical factorial model supports the calculation of several domain-specific composite scores and of the Full Scale Intelligence Quotient (FSIQ), a proxy for the most general factor (Wechsler, 2016). However, using the FSIQ to assess intellectual functioning may be misleading in some individuals with neurodevelopmental disorders because of strong cognitive dissociations (Fiorello et al., 2007; Flanagan & Kaufman, 2004). Hence, earlier versions of the WISC cautioned against interpreting the FSIQ if there was "too much" heterogeneity in the cognitive profile, at least in the series of scaled or composite scores hereafter referred to as the "Wechsler or WISC profile" (Wechsler, 1991). This caution is no longer present in more recent editions (e.g., Wechsler, 2005b, 2016). It nevertheless persists in daily clinical practice, which can make intellectual assessment complicated or equivocal. The Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5) definition of IDD tries to specify the clinical meaning of intellectual functioning as a reflection of abilities in "reasoning, problem-solving, planning, abstract thinking, judgment, academic learning, and learning from experience" (American Psychiatric Association, 2013). However, this definition still mixes different levels of integration and, to a certain extent, reasoning and executive functioning. The WISC offers the General Ability Index (GAI), a composite measure of intellectual functioning that better disentangles elementary reasoning abilities from more integrative, executive, or procedural ones. The GAI is based on reasoning subtests and excludes the working memory and processing speed subtests (Lecerf et al., 2010). Nevertheless, the GAI remains sensitive to instrumental dysfunction, such as verbal, gestural, and visuospatial impairments, or to any source of domain variability in reasoning. Alongside the FSIQ and GAI, a third level of analysis may therefore be proposed at the level of each subtest built upon an elementary reasoning task (for instance, inductive, sequential, quantitative, or categorical reasoning; McGrew, 2009). This level of analysis can reveal any preserved reasoning ability, even one accessible in only a single modality.
This study examines how the WISC should be analyzed as the first step of cognitive investigation in children with FASD. It asks what can be expected in terms of the WISC profile (internal heterogeneity and inter-subject regularity), and how intellectual functioning outcomes should be interpreted. Our first hypothesis was that the WISC profile would be more heterogeneous in the FASD sample than in the standardization sample, reflecting the variability and multiplicity of cognitive impairments described in the literature. To test this hypothesis, we measured the individual variance of scaled scores and of composite scores as an adjunct to the classical scatter analysis. Second, we assumed sufficient regularity in the WISC profile among children with FASD to identify a mean profile of strengths and weaknesses at the group level. Finally, we expected that, in the FASD sample, an assessment of intellectual functioning grounded in reasoning abilities (GAI) would classify fewer patients within the deficient or borderline range than a global cognitive functioning indicator such as the FSIQ. We also introduced the Highest Reasoning Scaled Score (HRSS), a new metric intended to discriminate better than the GAI. The HRSS refines intellectual assessment by identifying and quantifying preserved elementary reasoning in at least one modality. We then compared the distribution of intellectual functioning levels in our FASD sample under classification by the HRSS, the GAI, or the FSIQ.

Participants
Participants were retrospectively included from a large clinical series of patients attending the child neurology clinic dedicated to neurodevelopmental disorders (NDD) at Robert-Debré University Hospital (AP-HP, Paris, France) between 2012 and 2020. The diagnosis of FASD was established according to the 4-Digit Diagnostic Code (4-DDC; Astley & Clarren, 2000) and the revised Institute of Medicine guidelines (Hoyme et al., 2016). A complete differential diagnostic work-up, including systematic brain MRI and genetic testing, also had to have been performed (see supplementary material Table 1 for details of the diagnostic procedure). We assessed the concordance between the FASD diagnoses assigned under each of the two guidelines.
Inclusion required a fully documented diagnosis of FASD by a specialized child neurologist with reference expertise in FASD (D.G.). It also required a cognitive assessment using the 4th or 5th edition of the WISC, which de facto restricted the age range to 6 to 16 years. Exclusion criteria were prenatal exposure to another major developmental toxicant, such as sodium valproate (see supplementary material Table 1 for details), or explicit refusal to participate in the study. Of the 108 children who met the inclusion criteria, one patient with concomitant exposure to sodium valproate was excluded, leaving a total of 107 children (39 girls, 68 boys). Part of the cohort has been described in three previous neuroimaging studies (Fraize, Convert, et al., 2023; Fraize, Garzón, et al., 2023; Fraize, Fischer, et al., 2023).
This study was not a clinical trial, and the use of care-related data was approved by the appropriate ethics committee (CER-Paris-Saclay 2020-094) ensuring proper consent of participants.

Primary variables and cognitive profiles
From the medical records of each patient, we collected clinical and demographic variables, as well as WISC scaled scores, composite scores, and the version of the scale (IV or V). The clinical and demographic variables were age, sex, family status, maternal professional status, FASD diagnosis, associated NDD diagnoses according to DSM-5, methylphenidate treatment at the time of assessment, and site of assessment. A neuropsychologist of the research team (E.K.) reviewed all cognitive assessments.

WISC profile heterogeneity and strengths/weaknesses
To examine within-subject variability, we calculated the scatter of the scaled scores and of the composite scores, that is, the difference between the maximum and minimum score. We also calculated the variance of the scaled scores and the variance of the composite scores within each profile.
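The two within-subject heterogeneity measures can be illustrated with a short Python sketch (the study's analyses were run in R). This is not the study's code. It assumes that each child's profile is a plain list of numeric scores. It also uses the population variance of the profile (denominator n), which is an assumption: the estimator is not stated here.

```python
from statistics import pvariance


def profile_heterogeneity(scores):
    """Return (scatter, variance) for one child's scaled or composite scores.

    Scatter is the maximum minus the minimum score. The variance uses the
    population formula (divide by n); this choice is an assumption.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to measure heterogeneity")
    scatter = max(scores) - min(scores)
    variance = pvariance(scores)
    return scatter, variance


# Hypothetical five-index composite profile for one child.
scatter, variance = profile_heterogeneity([100, 85, 90, 75, 80])
# scatter = 25, variance = 74
```

The same function applies unchanged to a list of scaled scores. Only the scores passed in differ.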
To examine each patient's profile of strengths and weaknesses, we computed the difference between each composite score and the Mean of the Composite Scores (MCS). We also computed the difference between each scaled score and the Mean of the Scaled Scores used to calculate the composite scores (MSS). Hereafter, these deviations from the reference mean are called strengths and weaknesses when the corresponding inferential tests are significant.
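The deviations from the profile mean can be computed as in the Python sketch below. The sketch assumes a hypothetical dictionary that maps score names to values. The caller passes only the scores that enter the reference mean: the composite scores for the MCS, or the subtests used to compute the composites for the MSS. The sketch does not decide whether a deviation is a strength or a weakness, because that is a separate, group-level inferential step.

```python
from statistics import fmean


def deviations_from_profile_mean(scores):
    """Return each score's difference from the mean of the supplied scores.

    `scores` maps a score name to its value. Passing composite scores yields
    differences from the MCS; passing the scaled scores that enter the
    composites yields differences from the MSS.
    """
    reference = fmean(scores.values())
    return {name: value - reference for name, value in scores.items()}


# Hypothetical WISC-IV index scores for one child (their mean is 84).
child = {"VCI": 92, "PRI": 88, "WMI": 74, "PSI": 82}
print(deviations_from_profile_mean(child))
# {'VCI': 8.0, 'PRI': 4.0, 'WMI': -10.0, 'PSI': -2.0}
```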

Intellectual functioning and elementary reasoning abilities
We calculated the General Ability Index (GAI) for the WISC-IV subgroup using the French norms published by Lecerf et al. (2010), because the French Administration and Scoring Manual does not provide it (Wechsler, 2005a). We took the Highest Reasoning Scaled Score (HRSS) among the elementary reasoning subtests (Similarities, Comprehension, Block Design, Matrix Reasoning, Figure Weights, Visual Puzzles, and Picture Concepts). This yields a single score reflecting the best elementary reasoning performance irrespective of modality.
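The HRSS is a maximum taken over the reasoning subtests, as the Python sketch below illustrates. The dictionary layout is hypothetical. The sketch skips subtests that were not administered (Figure Weights and Visual Puzzles, for example, are not part of the WISC-IV). This skipping rule is an assumption and does not describe how the study handled incomplete profiles.

```python
# Elementary reasoning subtests listed in the text.
REASONING_SUBTESTS = (
    "Similarities",
    "Comprehension",
    "Block Design",
    "Matrix Reasoning",
    "Figure Weights",
    "Visual Puzzles",
    "Picture Concepts",
)


def highest_reasoning_scaled_score(scaled_scores):
    """Return the highest scaled score among the available reasoning subtests.

    Subtests that are absent from `scaled_scores` or set to None are skipped
    (an assumption of this sketch).
    """
    available = [
        scaled_scores[name]
        for name in REASONING_SUBTESTS
        if scaled_scores.get(name) is not None
    ]
    if not available:
        raise ValueError("no elementary reasoning subtest available")
    return max(available)


# Hypothetical profile: non-reasoning subtests (Coding, Digit Span) are ignored.
profile = {"Similarities": 9, "Block Design": 6, "Matrix Reasoning": 11,
           "Coding": 4, "Digit Span": 5, "Figure Weights": None}
print(highest_reasoning_scaled_score(profile))  # 11
```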
We categorized FSIQ, GAI, and HRSS into three categories (deficient, borderline, and normal). For the FSIQ and the GAI, the deficient-range threshold followed the thresholds reported in the DSM-5 and ICD-11 (American Psychiatric Association, 2013; World Health Organization, 2019). For the borderline range, since DSM-5 no longer specifies an FSIQ score range, we used the "below average" threshold of the American Academy of Clinical Neuropsychology consensus (Guilmette et al., 2020). The HRSS thresholds were chosen to fit the distribution of the highest score among the five elementary reasoning subtests. They were also chosen to be clinically meaningful for the interpretation of the highest-scoring subtest. To this end, we computed the cumulative distribution of the HRSS in the WISC-IV and WISC-V standardization samples (see the Simulated comparison group section below). In both cases, the 2.5th percentile lay between an HRSS of 7 and 8. The 10th percentile lay between 8 and 9 for the WISC-V and slightly above 9 for the WISC-IV (see supplementary material Table 2). These values are consistent with our choice to restrict the borderline range to an HRSS of 8. At the level of the highest-scoring subtest, these choices remain in agreement with the classical clinical interpretation, in which a scaled score ≤ 7 is weak and 8 is the first value of the average range (Sattler & Dumont, 2004). A score of 8 is therefore arguably borderline, particularly for the highest-scoring subtest.
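The HRSS cut-offs can be written as a small classification rule. In the Python sketch below, the HRSS thresholds (≤ 7 deficient, 8 borderline, ≥ 9 normal) are our reading of the paragraph above. This section does not restate the FSIQ and GAI cut-offs, so the generic `categorize` function takes them as parameters rather than hard-coding values. The helper name `cumulative_proportion` is hypothetical. It shows how a threshold can be checked against a simulated normative distribution, as was done to locate the 2.5th and 10th percentiles.

```python
def categorize(score, deficient_max, borderline_max):
    """Three-level classification: deficient, borderline, or normal."""
    if score <= deficient_max:
        return "deficient"
    if score <= borderline_max:
        return "borderline"
    return "normal"


def categorize_hrss(hrss):
    # Reading of the text: HRSS <= 7 is deficient, 8 is the only borderline
    # value, and 9 or more is normal.
    return categorize(hrss, deficient_max=7, borderline_max=8)


def cumulative_proportion(normative_values, threshold):
    """Share of normative (e.g., simulated) values at or below `threshold`."""
    return sum(v <= threshold for v in normative_values) / len(normative_values)


print([categorize_hrss(s) for s in (6, 7, 8, 9, 12)])
# ['deficient', 'deficient', 'borderline', 'normal', 'normal']
```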

Statistics
All analyses were performed using R statistical software (v4.1.2; R Core Team, 2021). We used the Benjamini and Hochberg false discovery rate (FDR) method (Benjamini & Hochberg, 1995) to correct p-values for multiple comparisons and limit the inflation of type I error.
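The Python sketch below illustrates the Benjamini-Hochberg step-up adjustment. It is written to follow the logic of R's `p.adjust(p, method = "BH")`, but it is not the code used in the study. For the p-value of ascending rank i among n tests, the adjusted value is the minimum of p(j)·n/j over all ranks j ≥ i, capped at 1.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest; the running minimum keeps the
    # adjusted values monotone and never above 1.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # ascending rank: 1 for the smallest p-value
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
# approximately [0.04, 0.0533, 0.0533, 0.2]
```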

Accounting for covariables and test version
We assessed whether cognitive data from the two versions of the WISC could be pooled. First, we tested whether the WISC-IV and WISC-V subgroups differed on clinical, cognitive, and demographic variables (Student's t-tests for continuous variables and chi-square tests for categorical variables). Next, we assessed the effect of WISC version on the scaled and composite scores in unifactorial analyses (Student's t-tests). We then performed multifactorial analyses (ANOVAs) to disentangle the influence of clinical, sociodemographic, and version factors. To handle missing sociodemographic data, we first estimated the overall percentage of missing data. We then checked whether the data were missing completely at random using Little's test (Little, 1988) from the naniar package (Tierney & Cook, 2018). Finally, we imputed the missing data using factorial analysis of mixed data from the FactoMineR package (Lê et al., 2008).
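The Python sketch below illustrates the single-factor version comparison. It computes the pooled-variance Student's t statistic, its degrees of freedom, and a Cohen's d standardized by the pooled standard deviation. The choice of d formula is an assumption, since the text does not state it. The sketch does not compute the p-value, which would come from a t distribution with the returned degrees of freedom, as in R's `t.test(x, y, var.equal = TRUE)`.

```python
from math import sqrt
from statistics import fmean, variance


def student_t_and_cohens_d(x, y):
    """Pooled-variance two-sample t statistic, degrees of freedom, and Cohen's d."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two observations")
    mean_diff = fmean(x) - fmean(y)
    df = nx + ny - 2
    # Pooled variance: each group's sample variance weighted by its df.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    pooled_sd = sqrt(pooled_var)
    t = mean_diff / (pooled_sd * sqrt(1 / nx + 1 / ny))
    d = mean_diff / pooled_sd
    return t, df, d


# Hypothetical scaled scores in two small subgroups.
t, df, d = student_t_and_cohens_d([10, 12, 11, 13, 9], [8, 9, 7, 10, 6])
# t is about 3.0, df = 8, d is about 1.90
```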

Simulated comparison group (SCG).
The norms available in the administration and scoring manuals (Wechsler, 2005a, 2016) did not cover all the metrics we wanted to analyze, and some norms were not calculated in the same way for both versions of the scale. We therefore constructed a simulated comparison group (SCG) using two Monte Carlo simulations based on the method fully described in Aubry and Bourdin (2018). This method generates a large sample (n = 10,000) that follows a normal distribution while respecting the correlations between subtests in the standardization sample. The procedure was run twice to simulate two SCGs, one for each version of the scale (WISC-IV and WISC-V). For comparison with the pooled FASD data (WISC-IV and WISC-V), we created a pooled SCG by randomly drawing from the WISC-IV and WISC-V SCGs in the same proportions as in the FASD sample.
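The Python sketch below shows the general idea of a correlation-preserving Monte Carlo comparison group. It is a generic illustration, not the Aubry and Bourdin (2018) procedure. It assumes:

- a hypothetical correlation matrix in place of the published subtest intercorrelations;
- scaled scores with mean 10 and SD 3, rounded and clipped to the 1 to 19 range;
- a fixed random seed.

Multiplying independent standard normal draws by the Cholesky factor of the correlation matrix produces correlated scores. `pooled_scg` then mixes the two version-specific samples in the proportion observed in the FASD sample (62.62% WISC-IV).

```python
import random
from math import sqrt


def cholesky(matrix):
    """Lower-triangular L with L times L-transpose equal to `matrix` (positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulate_scaled_scores(corr, n, rng):
    """Draw n profiles of correlated scaled scores (mean 10, SD 3, range 1 to 19)."""
    lower = cholesky(corr)
    k = len(corr)
    sample = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        correlated = [sum(lower[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
        sample.append([min(19, max(1, round(10 + 3 * c))) for c in correlated])
    return sample


def pooled_scg(scg_iv, scg_v, share_iv, n, rng):
    """Randomly mix WISC-IV and WISC-V simulated profiles in a given proportion."""
    n_iv = round(n * share_iv)
    return rng.sample(scg_iv, n_iv) + rng.sample(scg_v, n - n_iv)


# Hypothetical 3-subtest correlation matrix, reused for both versions here only
# as a placeholder; real matrices (and subtest sets) differ between editions.
corr = [[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]]
rng = random.Random(2018)
scg_iv = simulate_scaled_scores(corr, 10_000, rng)
scg_v = simulate_scaled_scores(corr, 10_000, rng)
pooled = pooled_scg(scg_iv, scg_v, share_iv=0.6262, n=10_000, rng=rng)
```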
Inferences. As a preliminary to our heterogeneity analyses, we tested whether the magnitude of the FSIQ was related to the proposed heterogeneity measures in the standardization sample, i.e., whether lower or higher IQ was associated with lower or higher WISC profile heterogeneity.
To answer this preliminary question, we fitted linear models predicting scatter and variance from the FSIQ. We then made three comparisons between children with FASD and the reference groups:

- mean variance and scatter, FASD subgroups versus SCG, with unpaired two-sample Student's t-tests;
- the proportions of children with FASD showing abnormal variance or scatter (at or above the 90th percentile of the SCG distribution), with Fisher's exact tests;
- mean scatter, FASD subgroups versus the standardization samples, with one-sample Student's t-tests.

For the profile itself, we compared the means of the FASD subgroups and pooled group with those of the SCG for each scaled and composite score using unpaired two-sample Student's t-tests. The procedure was the same for strengths and weaknesses. In addition, we used Fisher's exact tests to compare the proportions of children with FASD who had a weakness at or below the 10th percentile of the SCG distribution. Finally, we compared the proportions of children with FASD and SCG subjects classified in the deficient, borderline, or normal range using Fisher's exact tests and the associated odds ratios (OR). We also used bootstrap resampling (1,000 samples) to assess whether the OR of being classified in the borderline-to-deficient range differed between metrics. For this last analysis, the FASD sample was reduced to 88 children, because the FSIQ, the GAI, and the HRSS across all reasoning subtests all had to be computable. Post-hoc multiple linear regression analyses examined potential contributions to WISC profile heterogeneity. The predictors were either the functional diagnoses and their interactions or the sociodemographic variables.
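The Python sketch below, written with the standard library alone, shows Fisher's exact test on a 2 × 2 table (group × classified or not) together with an odds ratio. Two assumptions should be noted. First, the two-sided p-value sums the hypergeometric probabilities of all tables no more likely than the observed one, which is the convention of R's `fisher.test`. Second, the odds ratio here is the simple cross-product ratio, not the conditional maximum-likelihood estimate that `fisher.test` reports. The sketch does not cover the bootstrap comparison of ORs between metrics.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g., FASD, SCG); columns are (in range, not in range).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denominator = comb(total, col1)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table with the observed margins.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denominator
             for x in range(low, high + 1)}
    observed = probs[a]
    # Small relative tolerance so numerically tied tables are counted.
    p = sum(q for q in probs.values() if q <= observed * (1 + 1e-7))
    return min(1.0, p)


def sample_odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a*d)/(b*c); inf or nan when b*c is zero."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


# Classic 'tea tasting' table: p = 34/70 (about 0.486), sample OR = 9.
print(fisher_exact_two_sided(3, 1, 1, 3), sample_odds_ratio(3, 1, 1, 3))
```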
We also searched for an abnormal proportion of subjects beyond the 10th or 90th percentile cut-offs (normative analysis). This analysis complemented the comparison of means because patient distributions could be unequal or heterogeneous. It would also show the extent of any FASD specificity at the individual level.
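In the normative analysis, the 10th and 90th percentiles are located in the simulated comparison group and the patients beyond them are counted. The Python sketch below uses `statistics.quantiles(..., method="inclusive")`, which interpolates linearly in the same way as R's default `quantile()` (type 7). That choice and the function name are assumptions of the illustration. The low share corresponds to weaknesses (at or below the 10th percentile) and the high share to abnormal heterogeneity (at or above the 90th percentile). The resulting counts would then enter Fisher's exact tests against the SCG.

```python
from statistics import quantiles


def normative_proportions(patient_values, reference_values):
    """Proportions of patients at/below the 10th and at/above the 90th percentile.

    Percentiles come from the reference (e.g., simulated comparison group)
    distribution, with linear interpolation between order statistics.
    """
    deciles = quantiles(reference_values, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]
    n = len(patient_values)
    low_share = sum(v <= p10 for v in patient_values) / n
    high_share = sum(v >= p90 for v in patient_values) / n
    return p10, p90, low_share, high_share


# Hypothetical reference distribution (0 to 100) and five patient values.
print(normative_proportions([5, 10, 50, 90, 95], list(range(101))))
# (10.0, 90.0, 0.4, 0.4)
```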

Results
All included children had received a cognitive assessment using the WISC-IV (62.62%) or the WISC-V (37.38%). The assessment was performed on-site (Robert-Debré Hospital) for 55.14% of the children, at a mean age of 10.14 years (SD = 2.28). Supplementary material Tables 3 and 4 present descriptive statistics of the 4-DDC criteria in the FASD sample. Supplementary material Figure 1 shows the proportions of the main associated NDD diagnoses. Agreement between the 4-DDC and IOM diagnostic guidelines was high (κ = .79, p < .001).

Unifactorial effect of WISC version
Table 1 presents descriptive statistics of the sociodemographic and clinical variables of the FASD sample and compares the two subgroups defined by WISC version. The FASD subgroups did not differ in clinical and sociodemographic variables, except for methylphenidate treatment at the time of assessment. A higher proportion of subjects were treated in the WISC-V subgroup (N = 15 vs. 11 in the WISC-IV subgroup), but this difference was not significant after correction for multiple comparisons (χ2(1) = 4.68, p adj = .49). Mean scaled and composite scores did not differ between the two subgroups, except on the Similarities subtest, where the mean was higher in the WISC-IV subgroup (t(102) = 3.03, p adj = .02, Cohen's d = .59; Table 2).

Multifactorial clinical and sociodemographic effects
Missing data on the 7 subtests common to both scales and the 7 clinical and sociodemographic variables accounted for 3.60% of the data. The missingness pattern was consistent with data missing completely at random (χ2(227) = 199, p = .08), which allowed imputation for the multifactorial ANOVA. Only two factors affected WISC performance in FASD:

- sex, on the Coding subtest (girls > boys, F(1,96) = 8.38, p = .005, p adj = .04, η2 = .08);
- FASD diagnosis, on the Symbol Search subtest (NS-FASD > FAS, F(1,96) = 8.22, p = .005, p adj = .04, η2 = .08).

Several other effects did not survive correction for multiple comparisons (Table 3): a WISC version effect on the Similarities subtest (p adj = .15), effects of socio-economic status and methylphenidate treatment on the Vocabulary subtest (p adj = .19 and .07, respectively), and a sex effect on the Symbol Search subtest (p adj = .14).
We therefore performed further analyses on both the separate WISC subgroups and the pooled data. The exceptions were the Similarities subtest and the Verbal Comprehension Index, for which pooling seemed too questionable.

Heterogeneity
Preliminary analyses found no significant relationship between FSIQ categories and the variance or scatter of composite and scaled scores (all p adj > .05; supplementary material Table 5). This allowed direct comparison of children with FASD and the SCG on these measures. There were no significant mean differences in heterogeneity measures (scatter and variance) between the FASD subgroups and the SCG (all p adj > .05; Table 4). The results were similar for normative analyses (supplementary material Table 6). They were also similar for comparisons with the standardization sample (supplementary material Table 7), with a single exception: scaled-score scatter differed between the WISC-IV subgroup and the standardization sample (p adj = .04, d = 0.35). There were no significant associations between measures of heterogeneity and sociodemographic or clinical factors such as associated NDD diagnoses (all p adj > .05; supplementary material Tables 8 and 9).

General population contrast
The FASD and SCG groups differed significantly on all scaled and composite scores (all p adj < .05), except for the Similarities subtest in the WISC-IV subgroup (p adj = .17). Effect sizes were large (|d| > 1, with Cohen's d ranging from −1.09 to −1.35; Table 5) for:

- the FSIQ;
- the Working Memory Index;
- the verbal working memory subtests (Letter-Number Sequencing for the WISC-IV subgroup, Digit Span for the WISC-V and pooled groups);
- Coding.

Results were similar when children with FASD were compared with standardization norms and in the normative analysis (supplementary material Table 10).

Intra-subject contrast: strengths and weaknesses
We found significant strengths on the Similarities and Picture Concepts subtests and on the Verbal Comprehension Index, only in the WISC-IV subgroup. Conversely, we found significant weaknesses (all p adj < .05; Table 6):

- on the Letter-Number Sequencing and Coding subtests, in the WISC-IV subgroup;
- on the Digit Span subtest, in the WISC-V subgroup and the pooled group;
- on the Working Memory Index, in both subgroups and the pooled group.

Consistently, normative analyses showed significant weaknesses on the Coding subtest and the Working Memory Index in the pooled group (supplementary material Table 11).

Global intellectual functioning vs. elementary reasoning abilities
Descriptively, children with FASD were 1.4 times more likely to be classified within the borderline-to-deficient range on the FSIQ than on the GAI, and 2.25 times more likely than on the HRSS. The classification (deficient, borderline, normal) was the same across all criteria (FSIQ, GAI, and HRSS) for 51 subjects (58%). Among the 37 (42%) remaining subjects (Table 7):

- 12 (13.6%) had an FSIQ in the borderline range but a GAI and an HRSS in the normal range;
- 9 (10.2%) had an FSIQ and a GAI in the borderline-to-deficient range but an HRSS in the normal range;
- 3 (3.4%) had an FSIQ in the deficient range, a GAI in the borderline range, and an HRSS in the normal range.
We then compared the proportions of subjects in the borderline, deficient, and borderline-to-deficient ranges between the SCG and children with FASD for each intellectual functioning criterion. The odds ratios (OR) for children with FASD being classified in each range were all > 1, whether based on the FSIQ, the HRSS, or the GAI (OR ranging from 3.68 to 12.71, all p adj < .01; Table 8). The OR of being classified as borderline-to-deficient was approximately 1.7 times higher with the FSIQ than with the GAI (p adj < .001), and it also differed significantly between the FSIQ and the HRSS (p adj = .03). The HRSS was more conservative, classifying fewer patients as borderline-to-deficient. Nonetheless, this OR did not differ between the GAI and the HRSS (p adj = .81). Supplementary material Figure 2 shows example profiles of six children with FASD who were either consistently or inconsistently categorized by the three metrics.

Discussion
Our study found that children with FASD may not have more heterogeneous WISC profiles than the general population, contrary to our first hypothesis. It did, however, confirm our second assumption that these profiles would show some regularity. Against a background of globally lower mean performance, working memory and processing speed subtests and composite scores emerged as significant weaknesses. This was true both at the mean level and in an excessive proportion of patients. The WISC-IV assessments suggested better verbal functioning, but this was not confirmed in the whole sample. In line with our last hypothesis, the proportion of children with FASD classified in the borderline-to-deficient range was approximately twice as large with the FSIQ as with the GAI or the Highest Reasoning Scaled Score. There was little group difference between the latter two metrics. These findings support the view that choosing a clinically relevant measure of intellectual functioning is a practical, not only a theoretical, issue in this population.
Table 6. Strengths and weaknesses: comparison of the mean differences between composite scores and the mean composite score, and between scaled scores and the mean scaled score, in the fetal alcohol spectrum disorders and simulated comparison groups.

No indication of excessive cognitive heterogeneity in FASD WISC profile
The absence of significant excess heterogeneity in the WISC profiles of our FASD sample may be surprising. The literature describes FASD as highly heterogeneous in terms of cognitive impairment and associated functional diagnoses (Astley et al., 2009; Kodituwakku, 2007). In addition, a few studies have demonstrated heterogeneity in Wechsler profiles in samples of children with specific learning disorders, using scaled-score scatter or discrepancies between composite scores (Poletti, 2016; Watkins, 2005). As we checked at the outset, a shrinkage related to the lower overall level of functioning in this population cannot directly explain this lack of significant heterogeneity. However, this lower overall level of functioning may still have masked the heterogeneity that a substantial proportion of language and coordination developmental disorders would be expected to produce. Another plausible and non-exclusive explanation is that the Wechsler Intelligence Scales do not allow a sufficiently accurate assessment of cognitive heterogeneity. The range of cognitive domains they cover is limited, because their tasks focus mainly on multimodal reasoning, working memory, and processing speed. Moreover, FASD-related executive dysfunction may induce variability in performance across WISC subtests, but not in a systematized or even systematic (over time) way. This would be consistent with the fact that the Wechsler Intelligence Scales were not designed to be sensitive to, or to analyze, executive functioning. In any case, our results challenge the idea that the FSIQ should not be used in children with FASD because their WISC profile is excessively heterogeneous. Profile heterogeneity does not call into question the construct and prognostic validity of the FSIQ (Daniel, 2007; Freberg et al., 2008; Watkins et al., 2007). Rather, it is the diagnostic and functional significance of the FSIQ that should be reconsidered in these clinical contexts.

Strengths and weaknesses revealed by FASD WISC profiles
Regarding group-level cognitive profiles, the strength observed on the Similarities subtest is consistent with a previous study of a large cohort of children with FASD that identified the same group-level characteristic (Streissguth et al., 1996). It is not strictly consistent with another study by the same team, which identified the Vocabulary and Comprehension subtests instead (Sampson et al., 1989). Nor is it consistent with the many subsequent studies that found no verbal/nonverbal dissociation (see Kodituwakku, 2007, for a review). Moreover, this effect was absent in the WISC-V subgroup, and we found no explanation for this other than the test version. The significant weaknesses in working memory and processing speed were consistent with previous work showing greater impairment of the Arithmetic and Digit Span subtests (Sampson et al., 1989; Streissguth et al., 1996) and of the processing speed composite score (Dalen et al., 2009). Given the large proportion of children with ADHD in our sample, this pattern is also consistent with the cognitive weaknesses previously reported in ADHD samples unrelated to FASD (Thaler et al., 2013; Theiling & Petermann, 2016). Interestingly, children with FASD-related ADHD have been reported to show more impairment in the working memory and processing speed domains than children with ADHD unrelated to FASD. Furthermore, their impairment in the processing speed domain was driven mostly by the Coding subtest (Raldiris et al., 2018). This subtest draws on executive functions, working memory, graphomotor skills (L. A. Jacobson et al., 2011), and procedural learning (Beljan et al., 2022). Together, these findings are consistent with the proposition that the executive deficit is a key feature of the cognitive profile of children with FASD (Fuglestad et al., 2015). Identifying a specific neurobehavioral profile for individuals with FASD has been a major area of research (Mattson & Riley, 2011). One goal of this effort is to aid the diagnosis of non-syndromic forms of FASD, where causal attribution remains probabilistic in the absence of the key physical features of FAS. Despite extensive research, a clear characterization of individuals with FASD has yet to be established. Our findings are based solely on the Wechsler Intelligence Scales rather than on a variety of neuropsychological tasks. They therefore do not settle this ongoing debate, nor do they suggest any specificity of the WISC profile. However, they show that children with FASD can be expected to present with a significant and suggestive impairment in working memory and processing speed on an initial WISC-based neuropsychological assessment.

Global cognitive functioning vs. elementary reasoning abilities in FASD
We found that the three selected measures (or at least proxies) of intellectual functioning classified more than half of the children with FASD in our sample consistently, and that low scores were restricted to the mild to borderline range of severity. With few exceptions, the remaining children were, as expected, classified in a higher functional category when the intellectual categorization was based on their elementary reasoning abilities (HRSS and GAI) rather than on the FSIQ. Similarly, compared with the general population, the odds of a child with FASD being classified in the borderline and deficient ranges were higher based on the FSIQ than on the GAI or the HRSS, but did not differ between the latter two metrics. Admittedly, fewer children were classified as borderline on the HRSS. It can be argued, however, that this range of functioning is narrower by construction, owing to the coarse granularity of this metric. Nevertheless, the HRSS proved useful in distinguishing individuals who had borderline or even deficient composite reasoning abilities (GAI) but well-preserved reasoning ability in at least one modality (see supplementary material, Figure 2). Taken together, these findings strongly suggest that intellectual impairment defined as a global deficit in cognitive functioning is not uncommon in FASD, whereas a generalized deficit in elementary reasoning abilities is much less common in this population.
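The odds comparison above rests on 2×2 contingency tables (group by classification in the borderline-to-deficient range), tested with Fisher's exact test according to the note of Table 8. As an illustration only, the following standard-library Python sketch computes the sample odds ratio and a two-sided Fisher's exact p-value; the table layout and function names are assumptions made for this sketch, and the study's own computations may have relied on a statistical package.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = group (FASD, simulated comparison group),
    columns = classified borderline-to-deficient (yes, no).
    """
    if b * c == 0:
        # Infinite or undefined without a continuity correction; returned as inf here.
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]].

    Conditions on the row and column margins and sums the hypergeometric
    probabilities of every table that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # P(top-left cell == x) given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance keeps tables that tie with the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For instance, `fisher_exact_two_sided(3, 1, 1, 3)` returns 34/70 (about 0.486) and `odds_ratio(3, 1, 1, 3)` returns 9.0; these counts are a textbook example, not data from this study. The summing rule with a relative tolerance follows the usual two-sided definition, so results should agree with `scipy.stats.fisher_exact` on such tables.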
Several studies have questioned the relevance of the FSIQ as a proxy of intellectual functioning across the whole spectrum of neurodevelopmental disorders. In specific learning disorders, which by definition are neurodevelopmental disorders without intellectual impairment, poorer performance in working memory and processing speed was shown to be associated with a lower FSIQ and with a dissociation between the Cognitive Proficiency Index (CPI, an index including only the working memory and processing speed subtests) and the GAI (Toffalini et al., 2017). These findings are consistent with the dissociation between the GAI and the FSIQ observed in a sample of individuals with ADHD (Theiling & Petermann, 2016), and with the underestimation by the FSIQ of the intellectual functioning of children with ASD compared with their fluid reasoning ability as measured by Raven's Progressive Matrices (Dawson et al., 2007; Nader et al., 2016). However, the debate suffers from a certain circularity, since the populations in which the different proxies of intellectual functioning were tested were often defined in the first place as having or not having intellectual impairment, or were occasionally characterized by measures of adaptive functioning, which can be strongly affected even in the absence of intellectual impairment (Cornoldi et al., 2014; Koriakin et al., 2013). The relevance of distinguishing a global cognitive measure from more elementary reasoning measures when assessing intellectual functioning seems to be restricted to mild or borderline situations, but these situations are unfortunately frequent. Ultimately, theoretical considerations on intelligence aside, what is clearly at stake is the ability to characterize elementary reasoning abilities independently of instrumental or executive ones, and this may require the joint use of several proxies of intellectual functioning, rather than choosing between them or applying a single one to all situations (Lanfranchi, 2013).

Clinical considerations
The first clinical lesson of our results is that measuring and interpreting heterogeneity should not be made a prerequisite for analyzing the psychometric profile of children with FASD, since the intra-subject WISC variances and scatters failed to capture or anticipate either the weakness in working memory and processing speed or the dissociation between the FSIQ and more elementary reasoning proxies. Such a preliminary step is all the more debatable because, by construction, the scatter is a rather crude measure of heterogeneity, and normative data are not readily available for the variances. Second, we consider that our results support a bottom-up approach to these WISC profiles, in which certain characteristics at the lower level (scaled or composite scores) determine not the validity but the interpretability (for instance, the functional or diagnostic meaning) of higher-level constructs such as the FSIQ.
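To make the two heterogeneity measures concrete, the sketch below assumes that a child's scatter is the range (highest minus lowest) of their scaled scores and that the intra-subject variance is the population variance of the same scores. The function name is hypothetical, and the exact definitions and subtest sets used in the study are those given in its Methods, which may differ.

```python
from statistics import pvariance
from typing import Dict, List


def profile_heterogeneity(scaled_scores: List[int]) -> Dict[str, float]:
    """Intra-subject heterogeneity of one child's WISC scaled scores.

    scatter: highest minus lowest score (the range of the profile);
    variance: population variance of the child's own scores (assumed definition).
    """
    if len(scaled_scores) < 2:
        raise ValueError("need at least two scores")
    return {
        "scatter": max(scaled_scores) - min(scaled_scores),
        "variance": pvariance(scaled_scores),
    }
```

The two hypothetical profiles `[4, 10, 10, 10, 10, 16]` and `[4, 16, 4, 16, 4, 16]` share a scatter of 12 but have variances of 12 and 36, respectively, which illustrates why the scatter, driven only by the two extreme scores, is a crude index of heterogeneity.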
Regarding intellectual functioning, this means that an HRSS in the preserved or borderline range of functioning should, in the first instance, prompt the computation of the GAI, among other ancillary indices, before any interpretation of the FSIQ in terms of intellectual impairment or intellectual developmental disorder. In the case of a dissociation between the FSIQ and the GAI or HRSS, one should first consider only the most parsimonious interpretation of these three potential proxies of the patient's intellectual functioning: the FSIQ as a measure of global cognitive abilities, the GAI as a measure of multimodal reasoning abilities, and the HRSS as a measure of the best elementary reasoning ability within a specific modality (conceptual, inductive, sequential, visuo-spatial, or quantitative). The aim of this approach is to look for preserved intellectual abilities even in a context of multiple instrumental, procedural, or executive (attentional) impairments that prevent more comprehensive and consistent access to intellectual resources. This last interpretation should of course be made very cautiously, given the limited robustness of individual scaled score measurements (McDermott et al., 1990; Watkins, 2003), but it is at least supported by the fairly good consistency between the odds of being considered intellectually impaired based on either the HRSS or the GAI in this population. Finally, FSIQ, GAI, and/or HRSS values falling within different ranges of functioning should prompt a search for multiple specific cognitive impairments through complementary neuropsychological investigations.
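The sequential reading proposed above can be sketched as follows. The cut-offs (composite below 70 for the deficient range, 70 to 79 for the borderline range) are conventional values assumed here; scaled scores (mean 10, SD 3) are placed on the composite metric (mean 100, SD 15) through their z-score; the list of reasoning subtests passed to the HRSS function is a hypothetical input; and treating any category disagreement as a dissociation is a simplification. None of these choices should be read as the study's exact procedure, and all function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def composite_equivalent(scaled: int) -> int:
    """Map a scaled score (mean 10, SD 3) onto the composite metric (mean 100, SD 15)."""
    return 100 + 5 * (scaled - 10)


def classify(composite: float) -> str:
    """Assumed cut-offs: < 70 deficient, 70-79 borderline, >= 80 preserved."""
    if composite < 70:
        return "deficient"
    if composite < 80:
        return "borderline"
    return "preserved"


def highest_reasoning_scaled_score(scores: Dict[str, int],
                                   reasoning_subtests: Iterable[str]) -> Tuple[str, int]:
    """Return (subtest, score) for the best reasoning subtest the child completed."""
    available = [(name, scores[name]) for name in reasoning_subtests if name in scores]
    if not available:
        raise ValueError("no reasoning subtest available")
    return max(available, key=lambda item: item[1])


def review_profile(fsiq: int, gai: int, hrss: int) -> Dict[str, object]:
    """Bottom-up reading of the three proxies: HRSS first, then GAI, then FSIQ."""
    levels = {
        "HRSS": classify(composite_equivalent(hrss)),
        "GAI": classify(gai),
        "FSIQ": classify(fsiq),
    }
    return {
        "levels": levels,
        # An HRSS in the preserved or borderline range calls for the GAI
        # (among other ancillary indices) before the FSIQ is read as impairment.
        "compute_gai_first": levels["HRSS"] in ("preserved", "borderline"),
        # Any disagreement between the three categories is flagged as a dissociation.
        "dissociation": len(set(levels.values())) > 1,
    }
```

Under these assumed cut-offs, only scaled scores of 4 and 5 fall in the borderline band, which illustrates how narrow the borderline range of the HRSS is by construction. For a hypothetical child with FSIQ 68, GAI 76 and HRSS 9, `review_profile(68, 76, 9)` places the FSIQ, GAI and HRSS in the deficient, borderline and preserved ranges, respectively, and flags both the need to compute the GAI first and a dissociation.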
Regarding more specific cognitive domains (instrumental, procedural, or working memory), strengths and weaknesses should be systematically searched for in both scaled and composite scores, using at least a 10th (respectively 90th) percentile threshold, and then related to the patient's behavioral and adaptive complaints. The relevance of the scaled score level of analysis, and not only the composite level, is highlighted in our results by the difference between the Coding and Symbol Search subtests, which may be more sensitive to executive or attentional impairment than the composite Processing Speed Index and calls for more in-depth investigation during further neuropsychological assessment, especially of the executive and attentional domains.
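One possible operationalization of this threshold is sketched below, under explicit assumptions: each score is expressed as a deviation from the child's own mean, and that deviation is compared with the 10th and 90th percentiles of the same deviation in a reference sample, such as a simulated comparison group. The reference distributions and the function name are hypothetical; a purely normative reading (a score below the 10th percentile of the norms) is another possible interpretation of the text.

```python
from statistics import mean, quantiles
from typing import Dict, List


def flag_strengths_weaknesses(scores: Dict[str, float],
                              reference_deviations: Dict[str, List[float]]) -> Dict[str, str]:
    """Flag each score relative to the child's own mean (ipsative comparison).

    reference_deviations[name] holds (score - personal mean) values for that
    subtest or index in a reference sample (assumed to be available).
    """
    personal_mean = mean(scores.values())
    flags = {}
    for name, score in scores.items():
        deciles = quantiles(reference_deviations[name], n=10)  # 9 cut points
        p10, p90 = deciles[0], deciles[-1]
        deviation = score - personal_mean
        if deviation <= p10:
            flags[name] = "weakness"
        elif deviation >= p90:
            flags[name] = "strength"
        else:
            flags[name] = "none"
    return flags
```

The same function applies unchanged to scaled scores or to composite scores, provided the reference deviations are expressed on the matching metric.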
That said, this sequential procedure does not call into question the value of the FSIQ construct or its well-established adaptive and prognostic correlates (Daniel, 2007; Freberg et al., 2008; Watkins et al., 2007). The FSIQ can, and perhaps should, be calculated and interpreted in any case, but its interpretation as a proxy of intellectual functioning that may lead to a diagnosis of intellectual impairment or IDD must be made cautiously in the multiple-impairment or multiple-NDD context of FASD.

Limitations
Our study has several limitations due to the retrospective nature of the data collection. The first is the lack of a proper typically developing control group. The fourth and fifth editions of the WISC have French standardizations based on large normative samples, and we were able to overcome the limitations of these norms by extending them, through computational simulation (Aubry & Bourdin, 2018), to the metrics we wanted to use, following a statistically sound strategy previously reported in several publications (Brooks, 2010, 2011; Brooks & Iverson, 2010; Crawford et al., 2007). Nevertheless, such a simulated comparison group remains an external reference that is neither fully controlled for confounding factors (e.g., socioeconomic status) nor assessed under similar conditions.
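The simulated comparison group can be illustrated by a generic Monte Carlo scheme: correlated standard-normal vectors are drawn through a Cholesky factor of a subtest intercorrelation matrix and rescaled to the scaled-score metric. The sketch below uses only the Python standard library; the function names are hypothetical, the correlation matrix in the usage example is a placeholder rather than the French WISC normative data, and the procedure is not necessarily the one described by Aubry and Bourdin (2018).

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def cholesky(matrix: Matrix) -> Matrix:
    """Lower-triangular L with L @ L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulate_scaled_profiles(corr: Matrix, n_children: int, seed: int = 0) -> List[List[int]]:
    """Draw synthetic scaled-score profiles from a multivariate normal model.

    corr: subtest intercorrelation matrix with a unit diagonal (hypothetical here).
    """
    rng = random.Random(seed)
    lower = cholesky(corr)
    k = len(corr)
    profiles = []
    for _ in range(n_children):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # x = L z has unit variances and the requested correlations.
        x = [sum(lower[i][m] * z[m] for m in range(i + 1)) for i in range(k)]
        # Rescale to the scaled-score metric (mean 10, SD 3), round, clip to 1..19.
        profiles.append([min(19, max(1, round(10 + 3 * v))) for v in x])
    return profiles
```

With `corr = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.45], [0.4, 0.45, 1.0]]`, calling `simulate_scaled_profiles(corr, 20000, seed=1)` yields 20,000 synthetic profiles. Computing a derived metric on each (for example `max(p) - min(p)` for the scatter) and passing the results to `statistics.quantiles(..., n=10)` then gives an empirical reference distribution for a metric that has no published norms.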
A second limitation of this study is the difference in mean scores on the Similarities subtest that we observed between our two WISC version subgroups. Our analysis of the factors influencing performance on the different subtests did not allow us to explain this difference, except through an effect of the scale version. It should be noted that this task differs between the two French versions only in its items: only 9 of the 23 items are common to both versions, and the semantic category linking the word pairs is the same for 14 items, whereas the principle of the task and the instructions are unchanged. To our knowledge, only one study has found a similar effect on this subtest, in a sample of children with ASD (Kuehnel et al., 2019). Nor can this difference be explained by the Flynn effect, because the group difference concerns a single subtest and because the VCI appears to capture the dimension (i.e., crystallized intelligence) that is least sensitive to this effect (Pietschnig & Voracek, 2015). The most plausible explanation, although unexpected given the sample size, remains a difference in the recruitment of the two samples on some parameter not captured by the variables we collected. In any case, our results leave open the question of some degree of dissociation in favor of verbal functioning in the FASD population, at least at the group level.
A third limitation to the scope of our results may be the choice of the 4-DDC as the principal diagnostic guideline (Astley & Clarren, 2000). Several studies have shown differences in diagnostic assignment between diagnostic systems (Coles et al., 2016, 2023). It is difficult to infer how different the results would have been had other guidelines been used. More conservative guidelines (Coles et al., 2023) might have yielded a more severely affected sample, in which the cognitive dissociation and the heterogeneity of WISC profiles might have been greater. However, we carefully assessed diagnostic concordance with the revised Institute of Medicine guidelines (Hoyme et al., 2016), which was high in our sample when diagnoses were grouped into two classes (FAS and non-syndromic FASD). This high concordance may partly result from our medical practice of applying a minimal threshold of clinical relevance for prenatal alcohol exposure, as proposed by Hoyme et al. (2016), even though we otherwise followed the 4-DDC criteria. In any case, we consider that both this concordance analysis and our description of the diagnostic procedure and clinical features allow our results to be considered of interest beyond the choice of guidelines.
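Diagnostic concordance between two classification systems, once collapsed into two classes, can be summarized by the observed agreement and a chance-corrected index. The text does not state which statistic was used; the sketch below assumes Cohen's kappa for illustration, the function name is hypothetical, and the labels (`"FAS"`, and `"ND"` for non-syndromic FASD) are placeholders.

```python
from collections import Counter
from typing import Sequence, Tuple


def agreement_and_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> Tuple[float, float]:
    """Observed agreement and Cohen's kappa between two diagnostic classifications."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from the marginal proportions of each system.
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both systems use a single identical class")
    return observed, (observed - expected) / (1 - expected)
```

For four hypothetical children labelled `["FAS", "FAS", "ND", "ND"]` by one system and `["FAS", "ND", "ND", "ND"]` by the other, observed agreement is 0.75 and kappa is 0.5, because half of the agreement is already expected by chance from the marginals.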
Finally, because the neuropsychological scales used in our care settings were not systematized, we could not collect systematic data on functioning in other specific domains, such as executive functions or adaptive functioning, which would have been valuable to consider in our differential analysis of global cognitive functioning and of broad and elementary reasoning abilities. In addition, even though the WISC is widely used in clinical settings as a first step in neuropsychological assessment, our findings are limited to this particular tool and step. The use of a different intelligence scale, and the addition of complementary tests covering a broader range of cognitive domains, might have shed a different, or at least more precise, light on the issue of cognitive heterogeneity and intellectual functioning in FASD.

Conclusion
Our study provides a detailed analysis of the WISC-based cognitive profile in a large sample of children with FASD. WISC profile heterogeneity was not greater than that of the standardization sample. Performance was reduced across the whole cognitive profile, but more so on the working memory and processing speed tasks. Intellectual assessment based on the FSIQ led to a much higher frequency of classification in the deficient range than assessment based on more specific measures of reasoning, suggesting that, at least in the FASD population, the FSIQ should be considered a measure of global cognitive functioning that may underestimate specific aspects of intellectual functioning. We therefore advocate the more systematic use of ancillary indices such as the GAI, or the search for at least one preserved modality of elementary reasoning that reveals access to conceptualization and abstraction.

Table 1.
Clinical and demographic characterization of the fetal alcohol spectrum disorders sample and comparisons between the Wechsler Intelligence Scale for Children 4th and 5th edition groups.
* With revised Institute of Medicine criteria: 62 children with fetal alcohol syndrome, 39 with non-syndromic fetal alcohol spectrum disorders, and 6 with another diagnosis. M = mean; SD = standard deviation; FASD = fetal alcohol spectrum disorders; group comparison = false discovery rate-corrected p-value from chi-squared tests or Student's t-tests between the Wechsler Intelligence Scale for Children 4th and 5th edition FASD groups.

Table 3.
Socio-demographic and clinical factors affecting performance on the subtests common to the two versions of the Wechsler Intelligence Scale for Children in the fetal alcohol spectrum disorders sample. Before this analysis, we checked that the missing data (3.60%) were missing completely at random and then imputed them. prof. = professional; FASD = fetal alcohol spectrum disorders; * = at the time of testing. FDR = false discovery rate; p-values are FDR-adjusted. Italic p-value = p-value ≤ .05 before FDR adjustment. Bold p-value = p-value ≤ .05 after FDR adjustment.

Table 4.
Mean comparisons of scatter and variance profiles between the fetal alcohol spectrum disorders and simulated comparison groups.

Table 5.
Mean comparisons of scaled and composite scores between the fetal alcohol spectrum disorders and simulated comparison groups.

Table 7.
Differential analysis of global intellectual functioning and of broad and elementary reasoning abilities: classification as a function of the General Ability Index, Highest Reasoning Scaled Score, and Full Scale Intelligence Quotient.

Table 8.
Differential analysis of global intellectual functioning and of broad and elementary reasoning abilities: comparison of proportions. Note. 1/OR represents the odds ratio (OR) of being classified in the borderline-to-deficient range. FSIQ = Full Scale Intelligence Quotient; GAI = General Ability Index; HRSS = Highest Reasoning Scaled Score; FASD = fetal alcohol spectrum disorders; SCG = simulated comparison group. False discovery rate-adjusted p-values from Fisher's exact tests: ns p adj > .05; *p adj ≤ .05; **p adj < .01; ***p adj < .001.
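Tables 1, 3, and 8 report false discovery rate-adjusted p-values. Assuming the Benjamini-Hochberg step-up procedure (the table notes do not name the specific FDR method), adjusted values can be computed as in the sketch below. The function name is hypothetical, and established implementations such as R's `p.adjust(method = "BH")` or `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` are preferable in practice.

```python
from typing import List, Sequence


def bh_adjust(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `bh_adjust([0.01, 0.04, 0.03, 0.5])` returns approximately `[0.04, 0.0533, 0.0533, 0.5]`: the monotonicity step lowers the adjusted value of 0.03 from 0.06 to 0.0533, the value carried down from the next larger p-value.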