Syntactic revision in wh-questions: developmental trajectory and the role of cognitive control

ABSTRACT In our study, we examine children’s and adults’ interpretation of argument wh-questions requiring syntactic revision using a questions-after-stories procedure, leading to three major findings. First, children aged 7–8 years, but not 5–6-year-olds, were found to preferably attach the fronted wh-element to the first available gap, rather than the second, indicating an adultlike incremental parsing strategy of active gap filling. Second, older children show a similar rate of revision as adults, higher than younger children, although a substantial rate of misinterpretations remains. Third, we find a significant link between the rate of revision and cognitive control skills measured with an N-back task. Implications for theories of language development and parsing are discussed.


Introduction
A number of psycholinguistic studies have highlighted children's difficulties in the processing of garden-path sentences, i.e. temporarily ambiguous structures requiring syntactic revision. Revision difficulties are subject to a developmental trend, and this trend may arise due to immature syntactic processing and/or to immature cognitive skills, in particular cognitive control. Assessing the developmental trend and clarifying factors involved in syntactic revision is crucial for a better understanding of parsing mechanisms and their relationship with other components of cognition.

Incremental parsing and revision in children and adults
A large body of experimental evidence shows that adult sentence comprehension is an incremental process. The incremental nature of sentence processing gives rise to interpretive predictions about the upcoming linguistic material before the sentence entirely unfolds (Sussman & Sedivy, 2003). Linguistic predictions can be viewed as an instance of a more general tendency of the human cognitive system to efficiently interpret stimuli from noisy or ambiguous input (cf. Pickering & Garrod, 2007). With respect to parsing syntactic dependencies, a trigger of (temporary) ambiguity is constituted by a dislocated constituent, called filler. Upon encountering a filler, the parser may draw structural predictions to resolve the temporary ambiguity and refrain from retaining lexical and morpho-syntactic information in working memory for a prolonged time. For example, long-distance dependencies, such as wh-questions or relative clauses, involve a constituent dislocation requiring the parser to keep the filler in memory and to associate it with an available thematic position, called gap, in the sentence. Experimental evidence suggests that adults complete filler-gap dependencies as soon as possible, i.e. they associate the filler with the first available (predicted) gap site (e.g. Stowe, 1986). This phenomenon, called active gap filling (Fodor, 1978), attests to top-down influences and has been robustly evidenced in various languages with different word orders (e.g. in Dutch: Frazier, 1987; Frazier & Flores d'Arcais, 1989; in English: Garnsey et al., 1989; Stowe, 1986; Sussman & Sedivy, 2003; Traxler & Pickering, 1996; Wagers & Phillips, 2014; in Japanese: Aoshima et al., 2004; for a review, see Phillips & Wagers, 2007). For example, in object wh-questions like (2), adults' reading times are increased at the gap position (indicated by the underscore) as compared to declarative sentences with an if-clause like (1) that contain no filler-gap dependency (Stowe, 1986). Although incremental parsing is efficient in resolving syntactic dependencies, it comes at a cost in ambiguous sentences that are compatible with several structural analyses, called garden-path sentences. If subsequent elements of the sentence are incompatible with the initial parse, syntactic revision, also called garden-path recovery, is required.
(1) My brother wanted to know if Ruth will bring us home to Mom at Christmas.
(2) My brother wanted to know who Ruth will bring __ home to Mom at Christmas.
Revision takes time and garden-path sentences sometimes lead to misinterpretations, either due to revision failure or to the initial parse lingering in memory (for a review, see Ferreira & Patson, 2007). Evidence for misinterpretation in temporarily ambiguous sentences like (3) was reported by Christianson et al. (2001) who found that adults interpret the baby as the object of the verb dressed, even though later arriving information signals that the only grammatically correct interpretation of this sentence is that Anna dressed herself and not the baby. The authors argued that the parser generates an initial "quick-and-dirty" interpretation of the sentence based on various types of heuristics; this good enough representation is fast and efficient, but not necessarily well-formed syntactically (see also Ferreira, 2003; Ferreira & Patson, 2007, on the good enough approach to comprehension).
(3) While Anna dressed [ the baby [ that was cute and cuddly ] played in the crib.]
Turning to child sentence processing, children also show considerable difficulties with temporarily ambiguous sentences, attributed to syntactic revision failure. In a seminal paper, Trueswell et al. (1999) presented garden-path sentences containing a syntactic ambiguity like (4) to 5-year-old children and adults. Before the arrival of the prepositional phrase (PP) in the box, the PP on the napkin is interpreted as the argument of the verb put, i.e. the destination. When the second PP arrives, this initial parse has to be revised in that on the napkin becomes the NP modifier of the frog and in the box becomes the argument of the verb, i.e. the ultimate destination of the frog. Eye gaze and act-out performance showed that adults initially fixated the napkin, but then shifted their eye gaze to the ultimate destination (the box) and put the frog in the box. Children also initially fixated the napkin, but failed to shift and moved the frog to the empty napkin, showing that they had retained their initial interpretation.
(4) Put the frog on the napkin in the box.
Although this initial study focused on children's difficulty in revising an initial parse (unlike adults), the data also attest to the fact that children behave like adults in parsing sentences incrementally and predictively: the verb put triggers the expectation of a destination that is met upon encountering the PP on the napkin, which triggers its immediate attachment as argument of the verb.
A number of studies more directly explored the possibility that children engage in incremental parsing and predictive processing in online sentence comprehension. The ability of young learners to make use of linguistic information for predicting upcoming words in simple sentences has been attested at three years of age, e.g. by measuring anticipatory looks to the target in eye-tracking experiments (cf. Borovsky et al., 2012). More recent studies showed that lexically-driven structural predictions in simple sentences were already drawn by toddlers at two and a half years of age (Gambi et al., 2016; Lukyanenko & Fisher, 2016) and even at 19 months (White & Lidz, 2022). As for more complex structures like wh-questions, the use of early lexical cues to disambiguate sentences has been revealed in visual world studies in 5-year-old children (Contemori et al., 2018). Several studies investigating filler-gap dependencies by means of offline tasks provided additional evidence for incremental parsing with complex syntactic structures by showing that children engage in the active completion of filler-gap dependencies (Love, 2007; Roberts et al., 2007). In studies using the questions-after-stories paradigm (in English: de Villiers & Roeper, 1996; in English and Japanese: Omaki et al., 2014; in French: Lassotta et al., 2016), children are presented with short animations and then asked to answer ambiguous biclausal wh-questions as in (5). Critically, these questions are ambiguous because the grammar allows attaching the wh-element either to the verb of the main clause (tell) or to the embedded verb of the subordinate clause (catch). Data revealed that at age 5 already, English-speaking children prefer attaching the wh-element to the first (main) verb over the second (embedded) verb, suggesting that they actively complete the wh-attachment before the sentence entirely unfolds.
(5) Where did Lizzie tell someone __ [ that she was gonna catch butterflies __? ]
Interestingly, the first verb attachment preference was found regardless of whether the first verb is the main verb or the embedded verb. Omaki et al. (2014) tested 5-year-old children speaking Japanese, a verb-final language where the gap position for the wh-element is preverbal. In Japanese, the embedded verb appears first in the linear sequence, as illustrated in (6). The authors found that children preferably attached the wh-element to the first verb (the embedded verb tsukamaeru "catch") in 94% of the cases. However, when these questions included a disambiguating embedded clause PP filling the gap, like kouen-de "in the park" in (7), which forces wh-attachment to the second verb (the main verb itteta "tell"), children rarely made use of this cue, despite the embedded-clause gap being filled (answering where the "catching" event happened). This suggests, again, that children struggle to revise their initial wh-attachment.
(6) Doko-de Yukiko-chan-wa [ __ pro choucho-o tsukamaeru-to ] __ itteta-no?
where-at Yukiko-Dim-Top she butterfly-Acc catch-Comp was telling-Q
"Where was Yukiko telling someone that she will catch butterflies?"
(7) Doko-de Yukiko-chan-wa [ kouen-de pro choucho-o tsukamaeru-to ] __ itteta-no?
where-at Yukiko-Dim-Top park-at she butterfly-Acc catch-Comp was telling-Q
"Where was Yukiko telling someone that she will catch butterflies in the park?"
Lassotta et al. (2016) provided further evidence for children's difficulties in revising filled-gap wh-questions in French, a verb-initial language where the gap position for the wh-element is postverbal like in English. When presented with biclausal adjunct wh-questions, French 6-year-olds showed the expected adultlike first-verb attachment bias in 85% of the cases (corresponding to the main verb expliquer "explain") when both attachment sites were available, as in (8). But again, when the first attachment site was blocked by the main clause PP filling the gap, like dans le salon "in the living room" in (9), children nevertheless kept attaching the wh-element to the first verb in 88% of the cases (answering where the "explaining" event happened).
where Q Aline has explained in the room that she went catch some butterflies
"Where did Aline explain in the living room that she was going to catch butterflies?"
Interestingly though, French-speaking children performed better in revising argument wh-questions than adjunct wh-questions. As for adjunct questions, they preferably attached the argument wh-element to the first verb in 95% of the cases in ambiguous questions like (10) (answering where the "telling" event happened), but when presented with filled-gap argument questions like (11), second-verb attachment rose to 43% (answering where the "distributing" event happened).
(10) À qui Marie a raconté __ [ qu' elle avait distribué des bonbons __? ]
to whom Marie has told that she had distributed some candy
"To whom did Marie tell that she had distributed some candy?"
(11) À qui Marie a raconté à son papa [ qu' elle avait distribué des bonbons __? ]
to whom Marie has told to her Dad that she had distributed some candy
"To whom did Marie tell her Dad that she had distributed some candy?"
This finding suggests that French children are, to some extent, able to revise their initial interpretation, but that this ability is modulated by linguistic factors, in particular here, the type of wh-question (see Discussion).

Parsing and the role of cognitive control abilities
Domain-general cognitive abilities, and in particular cognitive control, have been hypothesised to be involved in syntactic revision. Cognitive control (also called executive control or executive functions) is an umbrella concept for multiple related but discrete mental functions regulating and controlling thought and action. It can be defined as a set of specific higher order cognitive processes responsible for complex, non-automatic, flexible, adaptive, goal-directed behaviours (e.g. Chan et al., 2008; Diamond, 2013; Miyake et al., 2000). It consists of several components, such as inhibition, switching and updating of working memory representations (e.g. Chan et al., 2008; Miyake et al., 2000). Inhibition (also called inhibitory control) refers to the ability to actively suppress an initial and/or prepotent mental operation when a different operation is currently required, despite the interfering effect of the initial/prepotent operation. Switching (also called mental shifting or set-shifting) refers to the ability to appropriately disengage from an operation that has become irrelevant and to actively engage in a different, relevant operation instead, which contributes to mental flexibility. Updating working memory representations refers to the ability to actively manipulate (i.e. to code and to monitor) incoming information that is relevant for the currently ongoing operation and to refresh representations held in working memory by replacing information that has become irrelevant with newer, relevant information. Behavioural as well as functional imaging studies using experimental tasks arguably targeting cognitive control indicated that these components are clearly separable and involve process-specific neural areas, although they correlate with one another and also have shared neural areas (Miyake et al., 2000; Sylvester et al., 2003; for a review of both the unity and the diversity of cognitive control, see Miyake & Friedman, 2012).
Novick et al. (2005) observed that patients with damage in the frontal lobe neural network, which underlies cognitive control, present difficulties in syntactic revision. Moreover, behavioural studies revealed that adults' garden-path recovery can be improved by cognitive control training (using an N-back task: Hussey et al., 2017; Novick et al., 2013; using a Flanker task: Hsu et al., 2020). Further evidence for a selective link between cognitive control and revision has been found in adults, as garden-path effects correlate with performance in cognitive control tasks (like the Flanker and N-back tasks), but not with attention or working memory tasks (like reading span; Hsu et al., 2020; Oishi et al., 2010). All these results lead to the hypothesis that cognitive control plays a crucial role in dealing with misinterpretation in sentence comprehension by enabling the parser to disengage from prepotent interpretations when other (ultimately correct, but dispreferred) interpretative options arise (Novick et al., 2005). Under this view, cognitive control would assist the parser in three ways. Inhibition helps to countermand an initial (eventually prepotent) interpretation that has been invalidated by newly arriving information (i.e. disambiguating revision cues in temporarily ambiguous sentences or the referential context in sentences like (4)) and to deal with its interfering effect. Switching allows the parser to disengage from the initial interpretation and actively engage in an alternative (more appropriate or ultimately correct) interpretation. Updating enables the parser to hold past linguistic material in working memory and to actively integrate new upcoming material as the sentence unfolds.
Given that cognitive control matures between ages 5 and 8 (sometimes even until age 13; e.g. Davidson et al., 2006), it has been suggested that children's lack of revision could also be explained by their immature cognitive control (Mazuka et al., 2009; Novick et al., 2005; Omaki & Lidz, 2015; Ye & Zhou, 2009). In line with that hypothesis, misinterpretations of garden-path sentences like (4) disappear around age 8 (Weighall, 2008), coinciding with the maturation of cognitive control. Also, good performance on a Flanker task requiring inhibitory control correlates with success in resolving the syntactic ambiguity of sentences like (4) in 5-year-olds (Qi et al., 2020; Woodard et al., 2016). Hence, cognitive control seems to relate to syntactic revision in children, just like in adults. No study to date, however, has systematically investigated the relationship between domain-general cognitive functions and syntactic revision in filler-gap wh-dependencies across ages, focusing on executive control capacities. Filler-gap dependencies constitute an optimal testing ground for studying the relationship between parsing and cognitive control for two reasons: first, they are a paradigmatic case of how the language processor generates syntactic predictions to optimise the parsing process (incremental parsing); second, because of this, they can be exploited to systematically generate garden-path configurations that require structural revision. The present work aims to fill this gap in the literature by administering a linguistic task and three non-linguistic tasks (two tasks to assess executive functions, namely the Dimensional Change Card Sorting task and the N-back task, and a control spatial memory task, the Corsi task) in order to track the interplay between the development of cognitive functions and that of the ability to revise, in two age groups of French-speaking children and in adults.

Current study
The current study has two goals. First, we aimed at obtaining a clearer picture of the developmental trajectory of syntactic revision. We wanted to examine age differences more closely than previous studies, which either examined children of only a narrow age range or collapsed child groups with an age range of two years and more. We studied children from age 5 to 8 years and considered not only group effects, but also individual variations. This age range was selected on the basis of previous studies (Mazuka et al., 2009; Novick et al., 2005; Omaki & Lidz, 2015; Trueswell et al., 1999; Ye & Zhou, 2009) suggesting that major changes occur within that window (Diamond, 2013; Diamond & Kirkham, 2005). We estimated the revision rate for each participant as the difference between the rate of embedded verb responses provided in ambiguous wh-questions like (10) and those provided in unambiguous questions involving a filled gap as in (11). Based on previous research (Contemori et al., 2018; Lassotta et al., 2016; Omaki et al., 2014; Sussman & Sedivy, 2003; Trueswell et al., 1999), we assume that a main verb preference in ambiguous biclausal wh-questions reflects a predictive active gap filling strategy. Thus, the increase of embedded verb responses in filled-gap wh-questions compared to ambiguous ones constitutes a reasonable estimate of syntactic revisions performed by participants (as argued by Omaki et al., 2014 and Lassotta et al., 2016). It should be noted that because the adopted estimate of revision rate is indirect, i.e. it is derived by computing the difference in the rate of offline responses between the ambiguous vs. filled-gap condition, it presents a limitation. Namely, it informs us about the frequency of participants' revision but provides no information about the timing of this process during online processing. However, using online measures to investigate revision in sentences like (10) and (11), such as listeners' fixations monitored via eye-tracking, would lead to another problem. Recall that the revision trigger in sentence (11) is the filled gap "to her Dad". The lexical content of this constituent matches the picture depicting the MV character. It is therefore difficult to determine whether looks to this picture in the filled-gap condition would index a temporary interpretation with MV-attachment, an ongoing revision process or a mere visual strategy to fixate the picture matching the lexical material just heard. By contrast, an offline estimate of revision is not affected by this problem.
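The revision estimate described above can be illustrated with a minimal sketch. It assumes each offline answer has already been coded as main verb ("MV") or embedded verb ("EV") attachment; the function name `revision_rates`, the tuple layout, and the example data are hypothetical and are not taken from the study's materials or analysis scripts.

```python
from collections import defaultdict


def revision_rates(responses):
    """Estimate a per-participant revision rate.

    `responses` is an iterable of (participant, condition, attachment)
    tuples, with condition in {"ambiguous", "filled_gap"} and attachment
    in {"MV", "EV"}. This layout is an assumption for illustration only.
    The estimate is the EV response rate in the filled-gap condition
    minus the EV response rate in the ambiguous condition.
    """
    # For each participant and condition: [number of EV answers, number of answers]
    counts = defaultdict(lambda: {"ambiguous": [0, 0], "filled_gap": [0, 0]})
    for participant, condition, attachment in responses:
        cell = counts[participant][condition]
        cell[0] += attachment == "EV"
        cell[1] += 1

    rates = {}
    for participant, by_condition in counts.items():
        ev_amb, n_amb = by_condition["ambiguous"]
        ev_fg, n_fg = by_condition["filled_gap"]
        if n_amb == 0 or n_fg == 0:
            continue  # the difference is undefined without both conditions
        rates[participant] = ev_fg / n_fg - ev_amb / n_amb
    return rates


# Invented data: four ambiguous and four filled-gap trials per participant.
demo = (
    [("P1", "ambiguous", "EV")] + [("P1", "ambiguous", "MV")] * 3
    + [("P1", "filled_gap", "EV")] * 3 + [("P1", "filled_gap", "MV")]
    + [("P2", "ambiguous", "MV")] * 4 + [("P2", "filled_gap", "MV")] * 4
)
print(revision_rates(demo))  # {'P1': 0.5, 'P2': 0.0}
```

In this invented example, P1 gives one EV answer out of four in the ambiguous condition and three out of four in the filled-gap condition, yielding an estimate of 0.5, whereas P2 never leaves the MV interpretation and obtains 0.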
The second goal was to explore the role of some specific components of cognitive control in syntactic revision. We focused on inhibition, switching, and updating, which were argued to play a key role in revision. Although empirical evidence has already highlighted their role in adults, much less is known about children. Moreover, studies have focused on a single syntactic dependency (PP attachment), without systematically exploring the role of each of these cognitive control components in revision. Finally, most studies were conducted on English speakers and none assessed both children and adults with the same battery of tasks.
We collected measures of syntactic revision in French wh-questions through a questions-after-stories comprehension task comparing French ambiguous biclausal argument wh-questions like (12) with disambiguating filled-gaps like in (13), similar to Lassotta et al. (2016).
(12) À qui Marie a dit __ [ qu'elle avait offert des chocolats __? ]
to whom Marie has said that she has offered some chocolate __?
"To whom did Marie say that she offered some chocolate?"
(13) À qui Marie a dit à sa maman [ qu'elle avait offert des chocolats __? ]
to whom Marie has said to her Mom that she has offered some chocolate __?
"To whom did Marie say to her mom that she offered some chocolate?"
In order to prevent floor effects, given the very low rates of revision previously reported, we used the semantically simple verb dire "say" as main verb instead of the more complex verbs previously used (raconter "tell" and expliquer "explain"). While de Villiers and Roeper (1996) only employed the verb "say" without further VP modification, de Villiers et al. (2008), Omaki et al. (2014) and Lassotta et al. (2016) employed more complex verbs such as "ask", "tell" and "explain". These verbs are semantically more complex than "say", in that they involve richer event predication and may thus attract the wh-element more strongly, reducing the strength of embedded verb attachment (in line with Omaki et al., 2014, Experiment 1). This adjustment is of particular importance given the report, in some visual world eye-tracking studies, that children tend to over-rely on verb semantics when interpreting complex structures (e.g. garden-path sentences, Kidd et al., 2011; Snedeker & Trueswell, 2004), such that semantically richer verbs may enhance main verb attachment.
Cognitive control measures were obtained through two tasks that were known to be challenging for both adults and children but nevertheless child-friendly and adequate for 5-year-olds. First, to assess updating and inhibition, we created an N-back task with lures. Although similar tasks have often been used in adults, only a few studies used them with young children and none of them involved lures (Ciesielski et al., 2006; López-Vicente et al., 2016; Pelegrina et al., 2015). In the N-back task, participants are asked to monitor a sequence of stimuli and to provide a response whenever the current stimulus is the same as the one presented N trials ago, requiring that information to be constantly updated as the sequence of stimuli unfolds. The presence of lures (i.e. the same stimulus in a non-N position) adds an inhibition component to the task (Novick et al., 2013). Second, we chose the Dimensional Change Card Sorting (DCCS) task (standard and advanced version), a classical index of perceptual inhibition and switching (Diamond & Kirkham, 2005; Zelazo, 2006; Zelazo et al., 1996). In this task, participants are asked to sort objects, which requires inhibition of processing of a perceptual characteristic (colour or shape), and then to switch from the initial sorting rule (based on colour or shape) to a new rule (based on the other criterion). We also tested participants on a task that does not involve cognitive control, to test for the selectivity of the link predicted between syntactic revision and cognitive control. We used the Corsi block-tapping task of spatial working memory (Bull et al., 2008; Corsi, 1972). Under the hypothesis that cognitive control, and more specifically inhibition, switching, and updating, play a role in syntactic revision, we expect a selective and systematic link between performance on the tasks measuring these functions and revision. This holds for children and adults alike, given that we presume child-adult continuity in cognitive mechanisms underlying syntactic revision. We explored how the different cognitive skills illustrated above affect revision as well as active gap filling. To do so, we assessed the statistical relationship between the cognitive indexes and (i) the ratings in the ambiguous condition, for active gap filling, and (ii) the ratings in the filled-gap condition together with the difference in ratings between the two conditions, for revision.
With specific regard to the age range selected in the present study, a strong link between executive control and the ability to revise predicts an increase of syntactic revision from the group of younger children (aged 5-6 years) to the older ones (aged 7-8 years) and adults, in parallel with an increase in effectiveness of executive functions. That is, younger children are expected to perform more poorly than their older peers both in the executive function assessment tasks and in syntactic revision, computed as the difference in embedded verb attachment responses to ambiguous vs. filled-gap questions. In contrast, no link is expected between perceptual inhibition and revision, nor between spatial working memory and revision.

Participants
Forty-eight French-speaking children (27 female) participated in the experiment, divided into two age groups: 24 younger children aged 5;2 to 6;8 years (mean age 6;0 years) and 24 older children aged 7;1 to 8;3 years (mean age 7;10 years). They were recruited in three primary schools in Geneva. Thirty-two of them (67%) also spoke at least one other language. Based on the UBiLEC parental questionnaire (Unsworth, 2011), we assessed children's language exposure and proficiency: bilinguals had been exposed to French from birth onwards and had regularly practised this language since then, so French was their first and dominant language. We therefore considered bilinguals and monolinguals as one group. Bilinguals were similarly distributed across age groups (17 younger and 15 older bilinguals). Data from seven additional children were excluded from the analyses since they did not complete all the tasks due to fussiness (n = 5) or illness on the second testing day (n = 2). Forty-eight French-speaking adults (40 female) aged 19 to 34 years (mean age 23 years) also participated in this experiment. They were recruited from the student community of the University of Geneva and received course credit for participation. Seven (15%) were bilingual, but French was always their first and dominant language.

Materials and procedures
All participants were presented with four tasks: a linguistic task involving a questions-after-stories paradigm and three non-linguistic tasks: N-back, DCCS, and Corsi. The tasks were administered within a total duration of 90 min, split into two testing sessions (first session: wh-question and DCCS tasks; second session: N-back and Corsi tasks). Materials and procedures of each task are detailed hereafter.
WH-QUESTION task. The same questions-after-stories design as in Lassotta et al. (2016, Experiment 2) was used. We created eight argument wh-question sets with two conditions: ambiguous vs. filled-gap. Ambiguous questions, as in (12), contain the fronted wh-element à qui "to whom" that can be attached to one out of two possible positions: the main verb (MV) dire "say" or the embedded verb (EV) offrir "offer" in example (12). Hence, the question is globally ambiguous: it can either be interpreted as "To whom did Marie say something?", reflecting an attachment of the wh-element to the MV, or as "To whom did Marie offer chocolates?", reflecting an attachment to the EV. In case of MV attachment, the answer would be "to her Mom" (corresponding to the upper right picture in Figure 1), whereas in case of EV attachment, the answer would be "to her friends" (corresponding to the upper left picture in Figure 1). Filled-gap questions, as in (13), were constructed by adding an overt PP recipient to the ambiguous questions in the empty position of the MV object, i.e. filling the gap in the main clause, which syntactically blocks the attachment of the wh-element to the MV and thus serves as revision cue disambiguating the sentence. Hence, the grammatically correct interpretation of the question in (13) is "To whom did Marie offer some chocolates?", corresponding to the EV attachment of the wh-element, and the answer would be "to her friends".
The four ambiguous questions were always presented before the four filled-gap questions in order to first establish participants' preferential interpretation of ambiguous questions. Four experimental lists were created containing eight stories each, in randomised orders, each story being followed by two questions. The first question was the test question (either ambiguous or filled-gap). It was followed by a filler question to ensure that participants were paying attention to the task (i.e. simple yes/no questions about an element of the story, e.g. Est-ce que Marie a offert des bonbons? "Did Marie offer some candies?"). In half of the filler questions, the correct answer was "yes". All verbal stimuli (stories and questions) were pre-recorded by a female native speaker of French and test questions were carefully controlled to create a natural prosody that is compatible with both main and embedded clause attachment interpretation. The stimuli were presented via two loudspeakers located in front of the participants. Participants were randomly distributed across the four lists.
Participants listened to eight short animated cartoon stories (see example in Figure 1). Each story contained two storylines of two events each. In the example illustrated below, the sequence of events is the following: (1.A) Marie offers some chocolate to her friends, (1.B) Marie says to her mom that she offered some chocolate to her friends, (2.A) Marie offers some candies to her sisters, and (2.B) Marie says to her dad that she offered some candies to her sisters (see full example story in Appendix A). In each story, the main character, Marie, thus accomplishes four actions: two "saying" actions (dire "say") and two "doing" actions involving different ditransitive verbs (donner "give", offrir "offer", montrer "show", distribuer "distribute", prêter "lend", envoyer "send", acheter "buy", or servir "serve"). The "saying" actions are always displayed on the right side of the screen and the "doing" actions on the left. So, there are two storylines with two actions each that belong together: 1.A & B and 2.A & B. The aim of presenting two "saying" actions and two "doing" actions was to increase the pragmatic felicity of asking a question about the "saying" action as well as about the "doing" action. Four picture animations corresponding to each of the four actions appeared one by one together with the pre-recorded verbal description of the event. After each story, the experimenter presented participants with a pre-recorded wh-question about storyline 1 or 2, a factor that was randomised (half of the items were about storyline 1). Prior to starting the experiment, a practice trial identical to the test trials was introduced in order to familiarise participants with the paradigm.
Participants were asked to provide each answer out loud after the question offset and the experimenter noted the answer. Children received regular positive comments and encouragements, independently of whether their answers were correct or not. The adult version of this task was identical to the child one, but adults received no such feedback. The task lasted approximately 15 min.
N-BACK task. We designed a new N-back task suitable for our child population. Memory items consisted of pictures of six well-known fruits (apple, strawberry, cherry, banana, grape, and pear; see Figure 2). A train containing one fruit per wagon travelled from the left to the right of the screen, partly hidden by a fence (see Figure 3). The content of the wagons could only be seen through a hole in the fence on the left side of the screen, next to the little girl Marie. On the right side, there was a door in the fence next to a farmer who could open this door. The goal of the game was to detect when the fruit hidden behind the farmer's door was the same as the visible fruit situated in the wagon next to Marie. The task was a 2-back task since the critical fruit behind the door was always situated two wagons away from the one currently seen. Participants were instructed to press a key (space bar) whenever Marie's fruit was the same as the farmer's.
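The 2-back rule with lures can be made concrete with a short sketch. It labels each wagon of a train as a target (same fruit as two wagons earlier), a lure (same fruit as one or three wagons earlier, but not two), or a non-target. The function name `classify_trials`, its parameters, and the example train are hypothetical illustrations and do not reproduce the actual stimulus lists or scoring procedure of the study.

```python
def classify_trials(sequence, n=2, lure_offsets=(1, 3)):
    """Label each item of an N-back sequence.

    "target":     the item matches the one presented n positions earlier,
                  so a key-press is expected.
    "lure":       the item matches an earlier item at one of the lure
                  offsets, but not at n, so no key-press is expected.
    "non-target": neither of the above.
    The default lure offsets (1-back and 3-back) are an assumption based
    on the task description.
    """
    labels = []
    for i, item in enumerate(sequence):
        if i >= n and sequence[i - n] == item:
            labels.append("target")
        elif any(i >= k and sequence[i - k] == item for k in lure_offsets):
            labels.append("lure")
        else:
            labels.append("non-target")
    return labels


# Invented train, listed in order of appearance through the hole in the fence.
train = ["cherry", "apple", "cherry", "strawberry", "strawberry",
         "banana", "pear", "grape", "banana"]

for fruit, label in zip(train, classify_trials(train)):
    print(f"{fruit:<10} {label}")
```

In this invented train, the second cherry (third wagon) is the only target, the second strawberry is a 1-back lure, and the second banana is a 3-back lure. A key-press on a lure would count as an incorrect response under the rule stated above.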
In the example displayed in Figure 2, participants would have to press the key when they saw the second cherries appearing on the screen (i.e. the third wagon after the locomotive), since this is a target 2-back situation. In all other cases no key-press should be provided. Each train also contained lure items, i.e. 1-back lures (like the strawberries in Figure 2) and 3-back lures (like the bananas in Figure 2). Prior to the test phase, our N-back procedure involved a complex familiarisation comprising eight steps, detailed below and illustrated in Figure 2. Each step was repeated until the participant had succeeded twice, after which the experimenter continued with the next step. If a participant did not succeed in one of the steps, the experimenter stopped the task.
1. Fruit naming: All fruits are presented on the screen and participants are asked to name each fruit.
2. Recall of one fruit: A train with one wagon containing a fruit travels from the left to the right through the landscape and, once it disappears, participants are asked to say which fruit it was.
3. Recall of two fruits: A train with two wagons containing a fruit each travels from the left to the right and, once it disappears, participants are asked to say which fruits were in the wagons (the order did not matter).
4. Recall of two fruits and 2-back rule introduction: Same as step 3, but participants are also asked to recall the order of appearance and to tell if the last fruit seen was the same as the one before.
5. 2-back practice with three wagons and introduction of the rules of the game: A train with three wagons containing a fruit each travels from the left to the right (see Figure 3, scenes A-C), stopping automatically when the third/last wagon is visible through the hole in the fence (scene D). To win the game, participants are asked to press the key if the fruit seen through the hole in the fence next to the girl is the same as the one behind the farmer's door, i.e. the one seen two wagons ago/in the first wagon. After the participant's response (key-press or not), the experimenter lets the farmer's door open to show the hidden fruit (scene E). At each correct response (key-press) to a 2-back situation, a magical star appears and the farmer jumps and smiles (scene F).
6. 2-back practice with four wagons: Same as step 5, but with four wagons (the 2-back always happening when the fourth/last wagon is visible through the hole in the fence and the train has automatically stopped), and a recap of the rules of the game.
7. 2-back practice with an ongoing train: A train with 18 wagons travels from the left to the right, stopping and revealing the content of all the wagons on key-press (the whole train being lifted above the fence). Participants are instructed to stop the ongoing train by pressing the key whenever the girl's fruit is the same as the farmer's fruit.
8. 2-back practice: Same as step 7, except that the train does not stop on key-press, but continues to travel behind the fence. At each hit, i.e. correct key-press in a 2-back situation, a magical star appears and positive auditory feedback is provided (sound of children cheering and applauding). False alarms, i.e. inappropriate key-presses at lure or foil items, cause negative auditory feedback (sound of children saying a sad "ooh").
The test phase followed familiarisation step 8 and was identical to that step, except that it was composed of four more trains with 18 wagons. In each train, we presented three target items and three lure items intermixed with 12 foil items. During the test phase, participants were rewarded and motivated by receiving magical stars with each hit (target detection), but auditory feedback was no longer provided, so false alarms no longer triggered negative feedback. In total, the N-back task lasted approximately 30 min for children and 20 min for adults. The adult version of the task was identical to the child one, but with a reduced familiarisation procedure: only the last two familiarisation steps (7 and 8) were presented prior to the test phase. Also, in order to increase task difficulty, train speed was increased and the size of the hole in the fence reduced, such that the stimulus presentation was shortened and higher processing speed was required for adults to perform the task.
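The item types described above (2-back targets, 1-back and 3-back lures, foils) can be illustrated with a minimal Python sketch. The function name `classify_wagons` and the example train are hypothetical and are not taken from the experimental software; the sketch also assumes that an item matching the fruit two wagons back counts as a target even if it additionally matches a neighbouring wagon.

```python
def classify_wagons(fruits):
    """Label each wagon of a train as 'target', 'lure' or 'foil' (2-back rule).

    Assumption: a 2-back match takes priority over a 1-back or 3-back match.
    """
    labels = []
    for i, fruit in enumerate(fruits):
        two_back = i >= 2 and fruits[i - 2] == fruit      # target situation
        one_back = i >= 1 and fruits[i - 1] == fruit      # 1-back lure
        three_back = i >= 3 and fruits[i - 3] == fruit    # 3-back lure
        if two_back:
            labels.append("target")
        elif one_back or three_back:
            labels.append("lure")
        else:
            labels.append("foil")
    return labels


# Hypothetical short train, loosely modelled on the description of Figure 2
train = ["cherry", "apple", "cherry", "strawberry", "strawberry",
         "banana", "pear", "grape", "banana"]
print(classify_wagons(train))
```

In this hypothetical train, the second cherry is the only target, the second strawberry is a 1-back lure, and the second banana is a 3-back lure.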
The dependent variables of the N-back task were overall accuracy, indexed by a d' score, accuracy on lure items and hit latency. The d' score takes into account the number of hits (appropriate key-press when a 2-back target occurred) and the number of false alarms (inappropriate key-press when no 2-back occurred): it was estimated as $d' = Z_{\text{hits}} - Z_{\text{false alarms}}$ and provides a general estimation of inhibition and working memory updating mobilised in this task. The higher the d', the better the cognitive control performance. Lure accuracy is of particular interest since the appropriate response to lures is a non-response, i.e. to inhibit the key-press strongly triggered by the previously seen fruit. This index thus represents a specific measure of inhibition capacity, higher lure accuracy meaning better inhibition. We were also interested in hit latency (i.e. reaction times of 2-back target detection), which assesses the efficiency of working memory updating, since each newly arriving fruit has to be integrated in the memorised sequence and correctly retrieved in order to provide the appropriate key-press. The lower the hit latency, the more efficient the updating.
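As a worked illustration of the d' index, the following Python sketch computes the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The function name `d_prime` is hypothetical, and the log-linear correction used to avoid infinite z-scores at rates of 0 or 1 is an assumption: the text does not state how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(hits, n_targets, false_alarms, n_nontargets):
    """d' = Z(hit rate) - Z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to counts, 1 to totals)
    keeps both rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_nontargets + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Four test trains with 3 targets and 15 non-targets (lures + foils) each
print(d_prime(hits=9, n_targets=12, false_alarms=6, n_nontargets=60))
```

A participant who presses the key equally often for targets and non-targets obtains a d' of zero under this scheme, and d' grows as hits increase or false alarms decrease.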
DCCS task. We created a computerised version of the DCCS in which participants were required to sort toys (see examples in Figure 4) according to one of two dimensions: their shape (car or teddy bear) or their colour (blue or red). The colours were the same as in Zelazo (2006), and we followed the same procedure as Diamond and Kirkham (2005) testing children and adults. Stimuli were presented with E-prime 2.0. They appeared one by one in the upper middle of the screen whereas the two response icons illustrated by grey sorting boxes were displayed in the bottom left corner and the bottom right corner. Throughout the whole experiment, the box on the left contained a picture of a blue car while the box on the right contained a picture of a red teddy bear. In the congruent condition, the stimulus to be sorted was identical to the response icons (as in Figure 4, example A), while it was different in the incongruent condition (as in Figure 4, example B).
The task involved three blocks that assess three cognitive control components. Block 1 tested for perceptual inhibition: participants were asked to sort according to one of the two sorting rules (either shape or colour), which requires inhibiting the irrelevant dimension of incongruent stimuli (colour in the shape game and shape in the colour game). Half of the participants started with the shape game in block 1. Half of the 12 trials contained incongruent stimuli. In block 2, participants were asked to sort according to the second rule: now the other dimension had to be considered (colour if they initially sorted by shape and conversely). Inhibition of the irrelevant dimension was again necessary in incongruent stimuli, but in addition, participants had to switch to a new rule. Again, half of the 12 trials of this block employed incongruent stimuli. These first two blocks parallel the standard DCCS and were followed by block 3, corresponding to the advanced DCCS. In block 3, the sorting rule changed unpredictably from trial to trial, a pre-recorded voice announcing the relevant rule (audio word "colour" or "shape") before each stimulus appeared. So, participants had to switch between rules flexibly and multiple times. All of the 24 trials in this block contained incongruent stimuli and half of them were switch trials. The stimuli of all three blocks were presented in a pseudo-random order. Before each of the test blocks, a short practice block familiarised the participant with the task, but used a different set of objects (orange and purple balls and Legos).
Each participant was presented with one of two lists: colour-first or shape-first. In both lists and all blocks, each trial began with the presentation of a centred black fixation cross on a white background for 800 ms, followed by the stimulus and the response icons appearing simultaneously and remaining on screen until the participant responded. Participants were instructed to press one of two keys situated on the keyboard below the left and the right sorting boxes represented on the screen. Participants were asked to keep their fingers on the keys. Once one of the keys had been pressed, the stimulus and the response icons disappeared, leaving a blank screen for 800 ms, after which the next fixation cross was presented, starting the next trial. In block 3, the audio sorting rule cue was presented at the onset of the fixation cross. Thus, in block 3, there was an 800 ms response-cue interval and an 800 ms cue-stimulus interval. Key-press responses were recorded by E-prime 2.0. In total, the task took approximately 10 min. The adult version of the task was identical to the child experiment, but with twice as many trials (24 trials in blocks 1 & 2 and 48 trials in block 3).
We computed three different indexes of cognitive control cost by subject (Davidson et al., 2006): perceptual inhibition cost, single-switch cost, and multiple-switch cost. The lower the cost index, the stronger the respective cognitive control abilities. The switching costs, but not the perceptual inhibition cost, are expected to be related to syntactic revision. Perceptual inhibition accuracy cost was obtained by calculating the mean accuracy in congruent trials of block 1 minus the mean accuracy in incongruent trials of block 1, and perceptual inhibition latency cost was obtained by calculating the mean latency of correct responses in incongruent trials of block 1 minus the mean latency of correct responses in congruent trials of block 1. Lower performance in incongruent trials (requiring inhibition) than in congruent trials (not requiring inhibition) would result in a positive value of this index, indicating a high conflict cost (i.e. low inhibition abilities). Single-switch accuracy cost was obtained by calculating the mean accuracy in the last two incongruent trials of block 1 minus the mean accuracy in the first two incongruent trials of block 2, and single-switch latency cost was obtained by calculating the mean latency of correct responses in the first two incongruent trials of block 2 minus the mean latency of correct responses in the last two incongruent trials of block 1. Lower performance in block 2 (requiring rule-switching) than in block 1 (not requiring rule-switching) would result in a positive value of this index, indicating high single-switch cost (i.e. low single-switch abilities). Multiple-switch accuracy cost was obtained by calculating the mean accuracy in no-switch trials of block 3 minus the mean accuracy in switch trials of block 3, and multiple-switch latency cost was obtained by calculating the mean latency of correct responses in switch trials of block 3 minus the mean latency of correct responses in no-switch trials of block 3. Again, lower performance with switch trials (requiring rule-switching) than with no-switch trials (not requiring rule-switching) would result in a positive value of this index, indicating high multiple-switch cost (i.e. low multiple-switch abilities).
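The six DCCS cost indexes all follow the same two subtraction schemes, which can be sketched in Python as follows. The function names and the trial representation (dictionaries with hypothetical `correct` and `rt` keys) are illustrative assumptions and do not reflect the format of the E-prime output.

```python
from statistics import mean


def accuracy_cost(baseline_trials, demanding_trials):
    """Mean accuracy in baseline trials minus mean accuracy in demanding trials."""
    baseline = mean(int(t["correct"]) for t in baseline_trials)
    demanding = mean(int(t["correct"]) for t in demanding_trials)
    return baseline - demanding


def latency_cost(baseline_trials, demanding_trials):
    """Mean latency of correct demanding trials minus that of correct baseline trials.

    Raises statistics.StatisticsError if a condition has no correct response.
    """
    def mean_correct_rt(trials):
        return mean(t["rt"] for t in trials if t["correct"])
    return mean_correct_rt(demanding_trials) - mean_correct_rt(baseline_trials)


# Hypothetical block 1 data: congruent = baseline, incongruent = demanding
congruent = [{"correct": True, "rt": 600}, {"correct": True, "rt": 700}]
incongruent = [{"correct": True, "rt": 900}, {"correct": False, "rt": 1500}]
print(accuracy_cost(congruent, incongruent), latency_cost(congruent, incongruent))
```

For the perceptual inhibition cost, the baseline and demanding trials are the congruent and incongruent trials of block 1; for the single-switch cost, the last two incongruent trials of block 1 and the first two incongruent trials of block 2; for the multiple-switch cost, the no-switch and switch trials of block 3. In every case a positive value indicates a cost.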
Corsi task. In this standardised task (Corsi, 1972) assessing the spatial working memory span, participants were presented with a board containing nine blocks. The experimenter tapped a sequence of blocks that the participant was requested to replicate, i.e. to tap the same blocks in the same order. The number of blocks in the sequence increased from one to nine or until performance breakdown, which is defined as failure on all five items of a level. Depending on the participant's performance, the duration of the task varied between 5 and 10 min. The adult version of this task was identical to the child one, but started directly at level 4, the previous levels being automatically counted as passed. Performance was measured by the Corsi level, i.e. the maximum sequence length at which at least one sequence was correctly reproduced (called "span" in Farrell Pagulayan et al., 2006), ranging from 0 to 9.
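The scoring rule for the Corsi level can be stated compactly in Python. This is a minimal sketch: the function name `corsi_level` and the dictionary layout of the results are assumptions made for illustration only.

```python
def corsi_level(results_by_length):
    """Return the longest sequence length with at least one correct reproduction.

    results_by_length maps a sequence length (1-9) to a list of booleans,
    one per item administered at that level. Returns 0 if nothing was passed.
    """
    passed = [length for length, outcomes in results_by_length.items()
              if any(outcomes)]
    return max(passed, default=0)


# Hypothetical child: breakdown at level 4 (all five items failed)
child = {1: [True] * 5, 2: [True, True, False, True, True],
         3: [False, True, False, False, False], 4: [False] * 5}
print(corsi_level(child))
```

For the adult version, levels 1 to 3 would simply be entered as passed before scoring, mirroring the procedure described above.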

Wh-question task
Participants' verbal responses to target wh-questions were coded either as MV or EV responses. Other responses, i.e. null responses and distractor responses (concerning a character from the alternative storyline), were observed at a low and similar rate in each participant group: 16% in 5-6-year-old children, 13% in 7-8-year-old children, and 14% in adults. These responses were removed from analyses. Since MV and EV response types are in complementary distribution, all our analyses were conducted on EV responses. The proportions of EV responses (calculated over all EV + MV responses), displayed in Figure 5, show that 7-8-year-olds and adults chose the EV picture over the MV picture much more frequently in the filled-gap question type than in the ambiguous one (7-8 yo: 95% vs. 32%; adults: 68% vs. 6%), a difference in EV responses of 62-63 percentage points between the two conditions. 5-6-year-olds provided 82% of EV responses in the filled-gap vs. 60% in the ambiguous question type, a difference of 22 percentage points between the two conditions. Thus, while all three groups displayed a high rate of EV choices in the filled-gap question type (5-6 yo: 82%; 7-8 yo: 95%; adults: 68%), the 7-8-year-olds and the adults also displayed a low rate of EV choices in the ambiguous question type (32% and 6%, respectively), whereas the younger children chose EV in 60% of trials.
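The way the EV proportion is computed (over EV + MV responses only, after removing null and distractor responses) can be made explicit with a short Python sketch. The response labels and function names are hypothetical stand-ins for the actual coding scheme.

```python
def ev_proportion(responses):
    """Proportion of EV responses among EV + MV responses.

    Any other code (e.g. 'null', 'distractor') is dropped first.
    Returns None if no EV or MV response remains.
    """
    target = [r for r in responses if r in ("EV", "MV")]
    if not target:
        return None
    return sum(r == "EV" for r in target) / len(target)


def ev_increase(filled_gap_responses, ambiguous_responses):
    """Difference in EV proportion: filled-gap minus ambiguous question type."""
    return ev_proportion(filled_gap_responses) - ev_proportion(ambiguous_responses)


# Hypothetical participant with four trials per question type
filled_gap = ["EV", "EV", "EV", "MV"]
ambiguous = ["MV", "MV", "EV", "null"]
print(ev_proportion(filled_gap), ev_proportion(ambiguous))
print(ev_increase(filled_gap, ambiguous))
```

Applied to the group-level rates reported above, the same subtraction gives 95% − 32% = 63 points for 7-8-year-olds, 68% − 6% = 62 points for adults, and 82% − 60% = 22 points for 5-6-year-olds.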
The rates of EV responses across conditions and groups were analysed using mixed-effects logistic regression with question type (ambiguous vs. filled-gap) and age group (5-6-year-old children vs. 7-8-year-old children vs. adults) as fixed factors, subjects and items as random factors, adopting a binomial family distribution. The model included random intercepts and random slopes of question type varying by subject and item (maximal random effect structure until convergence, see Barr et al., 2013) and the levels of the fixed factors were coded with sum contrasts. Estimates (β) and standard errors (SE) of the model, reported in Output B1 in Appendix B, were obtained via the "afex" package (Singmann et al., 2016), and χ² and p-values were obtained via likelihood-ratio tests. In this model we found a significant effect of age group (χ²(2) = 25.95; p < .001), question type (χ²(1) = 18.91; p < .001) and an interaction between the two factors (χ²(2) = 11.45; p < .01).
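For readers who wish to check how the reported χ² values map onto the significance thresholds, the upper-tail probability of a χ² statistic has a closed form for 1 and 2 degrees of freedom. The Python sketch below is an illustration only and is not the analysis pipeline (the models themselves were fitted in R); the function name `lrt_p_value` is hypothetical.

```python
import math


def lrt_p_value(chi2, df):
    """Upper-tail p-value of a chi-square statistic (closed form, df = 1 or 2).

    A likelihood-ratio statistic is 2 * (loglik_full - loglik_reduced) and is
    compared against a chi-square distribution with df equal to the number of
    parameters removed in the reduced model.
    """
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("closed form implemented only for df = 1 or 2")


for label, chi2, df in [("age group", 25.95, 2),
                        ("question type", 18.91, 1),
                        ("interaction", 11.45, 2)]:
    print(label, lrt_p_value(chi2, df))
```

The three values fall below .001, below .001 and between .001 and .01, respectively, in line with the thresholds reported in the text.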
To further investigate the effects and interactions for each comparison between the three age groups, we computed post hoc comparisons via the "emmeans" package (Lenth, 2023), i.e. Type 3 tests, based on the full model, on the effects of question type by age group and of age group by question type. First of all, the overall effect of question type was significant in 7-8-year-olds (χ²(1) = 23.503; p < .001) and adults (χ²(1) = 27.070; p < .001), but not in 5-6-year-olds (χ²(1) = 0.272; p = .602). This shows that 7-8-year-olds and adults were sensitive to the different interpretations of the two question types. The same cannot be concluded about 5-6-year-olds, in spite of a difference of 22 percentage points in the rate of EV responses between the two conditions.
A similar pattern emerged from the comparison between 5-6-year-olds and adults, where the effects of age group (χ²(1) = 19.927; p < .001) and question type (χ²(1) = 21.115; p < .001) were significant, as well as their interaction (χ²(1) = 14.236; p < .001). This suggests that 5-6-year-olds provided more EV responses than adults and that both provided overall more EV responses with filled-gap questions than with ambiguous questions. The interaction between the two factors supports the idea that this difference was larger in adults than in 5-6-year-olds. Again, in the post hoc analysis the effect of age group was significant in the ambiguous questions (χ²(1) = 18.487; p < .001) but not in the filled-gap questions (χ²(1) = 1.147; p = .284).
Under the assumption that revisions are estimated as the relative difference between the rate of EV responses in the filled-gap vs. ambiguous question type, the interactions discussed above lead to the conclusion that 5-6-year-olds revised significantly less frequently than older children and adults. The fact that in all comparisons the effect of age was present for the ambiguous but not for the filled-gap question type suggests that the former condition was the driving factor for the differences in the rate of revision among the three groups.
Chance levels of responses were assessed using Student's t-tests; degrees of freedom (df), t- and p-values (two-tailed) were obtained via the t.test function of the "stats" package (R Core Team, 2019). In order to specifically assess active gap filling, we explored chance levels in ambiguous questions: active gap filling is expected to manifest in terms of MV attachment preference, i.e. a low rate of EV attachment (which we kept as our dependent variable across all analyses). Post hoc t-tests revealed that 5-6-year-olds' EV response rate is at chance (60%; t = 0.960, df = 23, p = .348), while the rate of EV responses is marginally below chance in 7-8-year-olds (32%; t = −1.978, df = 23, p = .059) and significantly below chance in adults (6%; t = −23.146, df = 47, p < .001). In other words, only adults, and to a lesser extent 7-8-year-olds, showed a MV attachment preference. Looking at filled-gap questions, a preference for EV attachment was found in all three groups, with EV rates above chance, attesting to a grammatical parse: 82% in 5-6-year-olds (t = 4.894, df = 22, p < .001), 95% in 7-8-year-olds (t = 14.922, df = 23, p < .001) and 68% in adults (t = 2.864, df = 47, p < .01). Given that the effect of question type was not significant in 5-6-year-olds in the post hoc analysis, the fact that their responses to the filled-gap questions are not at chance attests that they understood the experimental task and did not answer randomly.
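The logic of the chance-level comparison can be sketched as a one-sample t statistic computed on per-participant EV rates against a chance value of 50%. The Python code below is a simplified stand-in for R's t.test; the function name and the example rates are hypothetical, and the p-value (which requires the t distribution) is deliberately left out to keep the sketch dependency-free.

```python
import math
from statistics import mean, stdev


def t_against_chance(rates, chance=0.5):
    """One-sample t statistic and degrees of freedom for rates vs. chance.

    rates: per-participant proportions of EV responses in one question type.
    """
    n = len(rates)
    standard_error = stdev(rates) / math.sqrt(n)  # sample SD (n - 1 denominator)
    t = (mean(rates) - chance) / standard_error
    return t, n - 1


# Hypothetical EV rates of four participants in the ambiguous question type
t, df = t_against_chance([0.25, 0.5, 0.75, 1.0])
print(round(t, 3), df)
```

A negative t in the ambiguous question type indicates fewer EV responses than expected by chance, i.e. a MV attachment preference; a positive t in the filled-gap question type indicates a preference for the grammatical EV parse.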
In order to assess revision, we examined the increase of EV responses from the ambiguous to the filled-gap question type. We found a significantly lower increase in 5-6-year-olds (+22%) than in 7-8-year-olds (+63%; β = −3.694; SE = 0.927; z = −3.984; p < .001), and no difference between 7-8-year-olds and adults (+62%; β = 0.906; SE = 0.930; z = 0.974; p = .330). In the next sections we will explore the correlation of this index with memory and cognitive capacities, assessed with the three non-linguistic tasks we adopted.

Non-linguistic cognitive tasks
Participants' performance in the N-back, DCCS and Corsi tasks is reported in Table 1 by age group. Participants who scored more than 2.5 SD above or below the mean, calculated within each age group, in at least one of the basic DCCS conditions (i.e. congruent trials of block 1 and/or no-switch trials of block 3) were removed (six of the 5-6-year-olds, seven of the 7-8-year-olds, and eight of the adults). Also, two participants (one of the 5-6-year-olds and one of the adults) were removed from the N-back analysis because they performed at more than 2.5 SD from the mean in at least one of the N-back indexes. No participants were excluded from the Corsi analysis since they all performed within ±2.5 SD of the mean Corsi level.
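The exclusion criterion can be illustrated with the following Python sketch, to be applied separately within each age group and for each index. The function name, the participant identifiers and the use of the sample standard deviation are assumptions; the text does not specify which SD estimator was used.

```python
from statistics import mean, stdev


def outliers(scores_by_participant, cutoff=2.5):
    """Return the participants whose score lies more than cutoff SD from the mean.

    Assumption: mean and sample SD are computed over all participants of the
    group, including the potential outliers themselves.
    """
    values = list(scores_by_participant.values())
    m, sd = mean(values), stdev(values)
    return {pid for pid, v in scores_by_participant.items()
            if abs(v - m) > cutoff * sd}


# Hypothetical group of 20 participants, one with very low accuracy
group = {f"p{i}": 0.90 for i in range(19)}
group["p19"] = 0.10
print(outliers(group))
```

Note that with very small groups no score can exceed 2.5 SD from the mean, so a criterion of this kind only becomes informative with group sizes such as those tested here.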

Linking syntactic revision with cognitive control
To inspect the link between revision and cognitive control, the increase of EV responses from the ambiguous to the filled-gap question type was correlated with the various cognitive measures collected. We also computed the correlation between the cognitive indexes and the rate of EV responses for each question type to estimate their contribution to the effect of executive functions on revision. One participant (5-6-year-old) provided no target responses in the filled-gap question type and therefore had to be excluded from the correlational analysis, reducing the number of participants to 95. Also, two participants were excluded from this analysis due to N-back outlier rejection, as well as 21 participants due to DCCS outlier rejection (see previous section). Partial correlations (controlling for age, given the important age variability) between EV increase, the rates of EV responses and each of the cognitive control indexes were calculated. Pearson's r and p-values were obtained via the pcor.test function of the "ppcor" package for partial correlations (Kim, 2015).
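A first-order partial correlation, i.e. the correlation between two variables after the linear contribution of a third (here, age) has been removed from both, can be computed from the three pairwise Pearson correlations. The Python sketch below illustrates the formula only; it is not the ppcor implementation, and the function names and example data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y controlling for a single covariate z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical data: both measures increase with age, but their age-independent
# parts are perfectly opposed.
age = [1, 2, 3, 4]
ev_increase = [2, 1, 2, 5]
hit_latency = [0, 3, 4, 3]
print(pearson(ev_increase, hit_latency), partial_corr(ev_increase, hit_latency, age))
```

In this constructed example the raw correlation is weakly positive because both variables grow with the covariate, whereas the partial correlation is strongly negative, which shows why controlling for age matters when age variability is large.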
Analyses (see Table 2) reveal that d' positively correlated with EV increase (r = .23; p = .025), indicating that the more accurately participants responded in the N-back task, the larger the EV increase was. This index also displayed a weaker negative correlation with the rate of EV in the ambiguous question type (r = -.21; p = .048), suggesting that, while both conditions contributed to the correlation with the d', trials in the ambiguous question type played a major role. Further analyses conducted on the groups of children did not reveal any other significant correlation (p > .1).
Lure accuracy positively correlated with EV increase (r = .21; p = .046), indicating that the more often participants successfully inhibited their response to lures, the larger the EV increase. Furthermore, this index did not significantly correlate with the rate of EV in any of the single conditions (p > .1). To further inspect this effect, we computed partial correlations between Lure accuracy and the rate of EV responses within the groups of children. The analyses on the data including both groups of children showed a significant correlation between the rate of EV in the filled-gap condition and Lure accuracy (r = .31; p = .037), suggesting a high sensitivity of this index to revision in children.
Hit latency was negatively correlated with EV increase (r = -.26; p = .013), showing that the faster participants detected the target, the larger the EV increase was. This index also displayed a negative correlation with the rate of EV in the ambiguous question type (r = -.23; p = .028), but not in the filled-gap question type (r = -.15; p = .165), suggesting that a MV preference in the ambiguous trials, indexing active gap filling, was the determining factor for this correlation. To further investigate this effect, we explored this correlation within the groups of children and found that Hit latency correlates with the rate of EV responses in the ambiguous question type in 7-8-year-olds (r = .50; p = .016) but not in the dataset including both age groups (r = .17; p = .267). These analyses show that the effect of Hit latency was strongest in 7-8-year-olds and mainly affected their MV preference in the ambiguous condition.
The DCCS single-switch accuracy cost index correlated marginally negatively with EV increase (r = -.21; p = .070), but it also correlated with the rate of EV responses in the ambiguous question type (r = .30; p = .010) provided by adults and children, suggesting that the smaller the cost due to rule change (i.e. the stronger the switching ability), the larger the EV increase and the MV preference in the ambiguous condition. Interestingly, this correlation was not significant in the data subset including children but not adults (r = -.12; p = .424). All other correlations were non-significant (all ps > .10).

Discussion
Four major findings emerge from the current study. First, when parsing ambiguous wh-sentences, young children aged 5-6 years do not show a clear attachment preference; this preference arises at age 7-8, attesting to an adultlike use of the incremental parsing strategy of active gap filling. Second, children aged 5-6 revise less than 7-8-year-olds and adults, who show a similar rate of revision, again attesting to adultlike parsing at age 7-8. Third, although 7-8-year-olds and adults revise more, adults show a substantial rate of misparsing. Fourth, we found a selective link between active gap filling and the rate of revision on the one side, and indices of cognitive control, more particularly inhibition and updating, on the other. Participants with stronger cognitive control abilities apply active gap filling and revise more often than those with weaker cognitive control. In the following sections, we discuss the developmental trajectory of incremental parsing and revision and the link between syntactic revision and cognitive control.

Developmental trajectory of incremental parsing and revision
As reviewed in the Introduction, incremental parsing has been largely attested in the adult literature (e.g. Sussman & Sedivy, 2003; for a review, see Phillips & Wagers, 2007), but less so in children (Contemori et al., 2018; Lassotta et al., 2016; Love, 2007; Omaki et al., 2014; Roberts et al., 2007). We explored incremental parsing through the study of ambiguous biclausal wh-questions, which can be parsed by attaching the wh-argument to the main verb or to the embedded verb. A preference for main verb attachment in this context is viewed as evidence for incremental parsing, as it attests to wh-attachment to the first available gap. We found no evidence for active gap filling in 5-6-year-old children, who opted for MV attachment in only 40% of the cases. However, inspection of individual profiles in that group shows that some children actually do show a preference: 46% of them went for MV attachment in 75% or more of their responses. A change is observed at age 7-8 years, where we found a MV preference at the group level (68%). However, although the majority of the children showed a MV bias in that group (63%), a number of them still failed to show active gap filling. Our data show that incremental parsing still develops beyond age 7-8, as adults show a massive MV preference, both at the group level (94%) and at the individual level (96%). A novel result from the present study is that the use of active gap filling in resolving wh-dependencies is linked to cognitive control, in particular to the N-back indexes that are sensitive to working memory updating, such as Hit latency. This supports the idea that the process of active gap filling, which requires the prior stipulation of a gap as soon as the filler is encountered followed by an "active" ranking of possible attachment sites (cf. Frazier & Flores d'Arcais, 1989), taps into domain-general cognitive resources involved in the updating of temporary representations in working memory. In particular, the correlation between a MV-attachment preference in the ambiguous condition and the speed of target detection in the N-back task found in 7-8-year-old children suggests that this age is critical for the development of adultlike incremental parsing strategies that require such resources.
However, it is well known that young children are subject to robust verbal and working memory recency effects, as shown in different studies employing memory (Berry et al., 2018), question/answer (Sumner et al., 2019) and reasoning (Chiesi & Primi, 2009) tasks. The sensitivity to information that has recently been stored in working memory may have biased their interpretation of wh-questions towards the last-mentioned portion of the sentence, which includes the action described by the embedded verb. This may have at least in part reduced the MV attachment bias in young children as well as prompted more EV responses in both experimental conditions, and cognitive control may well be involved in reducing the sensitivity to this bias, as discussed in the next section.
Interestingly, comparing the results of this study with those of previous work investigating filled-gap wh-questions in children suggests the following. The MV attachment bias in wh-questions appears to be modulated by two linguistic factors: verb semantics and argument structure (argument vs. adjunct wh-, see Table 3). In Lassotta et al. (2016), the very same argument wh-structure in French was used as in the current study, but with semantically more complex MVs (raconter "tell" and expliquer "explain") instead of the simple verb dire "say" used here. As can be seen in Table 3 (column "Active gap filling"), children's MV bias is much lower with "say" than with more complex verbs, and a similar tendency is present in adults, although less pronounced. Lassotta et al. (2016) also uncovered that MV attachment is shaped by verb argument structure: both children and adults showed slightly stronger MV attachment in argument wh-questions, in which the wh-element is an argument of the verb, than in adjunct wh-questions, in which the wh-element is an optional complement (Table 3). Hence, the main point of departure between children and adults lies in the role of verb semantics: while children showed a strong MV preference with semantically complex verbs, the preference is absent with simple verbs in younger 5-6-year-olds, and still mild in 7-8-year-olds. In contrast, adults show a strong MV preference independently of verb semantics. This suggests that children's initial parsing strategy is contingent on verb semantics, attaching the wh-element to the semantically complex verb, and that it progressively evolves towards a more efficient, default strategy of active gap filling. This possibility is in line with findings showing children's sensitivity to verb semantics in visual world eye-tracking studies, which report over-reliance on verb information in the interpretation of garden-path sentences like (4), while adults make use of both verb information and information about the referential context (Kidd et al., 2011; Snedeker & Trueswell, 2004).
Now turning to the parsing of filled-gap questions, the current study uncovered revision, manifested by an increase of EV responses in filled-gap vs. ambiguous wh-questions, in all age groups, showing that participants were sensitive to the filled-gap PP signalling the parser to reject MV attachment (see Table 3, column "Revision"). The results revealed that among children, revision increases with age, 5-6-year-olds showing less revision (22%) than 7-8-year-olds (63%). Interestingly, adults revised at a similar rate as 7-8-year-olds (62%). At the individual level, 43% of the 5-6-year-olds showed revision (i.e. an EV increase of at least +25%; 13% systematically revising in all items), against 75% of the 7-8-year-olds and 77% of the adults (in both groups 42% revising in all items). This shows, on the one side, fully operational revision skills by age 7-8 years, and on the other side, substantial inter-individual variability in all age groups, with some 5-6-year-old children being already able to revise while some older children and adults (occasionally) fail. It should be noted, however, that pairwise comparisons of the three groups reveal that 5-6-year-olds' responses differ significantly from those provided by older participants with ambiguous questions but not with filled-gap questions. This means that the reduced rate of revisions in younger children was mainly caused by their inability or reluctance to apply the active gap filling strategy, which is the prerequisite for generating a garden-path interpretation to be subsequently revised, rather than by an inability to revise.
Considering how the two linguistic factors discussed in regard to MV attachment preference affect revision, it turns out that while children revise argument wh-questions as often with semantically complex verbs (43%) as with simple verbs (43%), adults succeed less with complex verbs (21%) than with simple verbs (62%). This suggests that adults are sensitive to the semantic properties of the verb when it comes to revision, while children are not. With respect to verb argument structure, Lassotta et al. (2016) reported that whereas adults revise adjunct and argument wh-questions equally often (both 21%), children only revise argument wh-questions. Adjunct wh-questions involve an optional filled-gap whereas argument wh-questions involve an obligatory filled-gap, which determines the nature of the error signal (optional in the former and obligatory in the latter). Children thus appear to be very sensitive to the nature of the error signal in that revision is easier for them with more salient, obligatory filled-gaps than with less salient, optional filled-gaps.
Table 3. Overview of rates of active gap filling, misparsing and revision (in %) in studies on wh-attachment in French biclausal wh-questions (Lassotta et al., 2016, and …).
Considering parsing errors, that is, interpretations based on the attachment of the wh-element to the MV although its argument slot is already occupied by a PP, both the current results and Lassotta et al.'s (2016) attest to substantial error rates in adults. We focus here on argument questions, because attaching the wh-element to the filled MV gap in adjunct questions can actually be made grammatical (e.g. "Where did Aline in the living room explain that … ?" could be parsed as "Whereabouts in the living room?"; see Lassotta et al., 2016, for a discussion of this option). Both studies show that adults and children make more errors with heavy main verbs, which appear to play a role in attracting MV attachment (see Table 3, column "Misparsing"). But more crucially, both studies show that adults misparse filled-gap argument questions significantly more than children: 79% errors vs. 52% with complex MVs, and 32% vs. 12% (on average across the two child groups) with simple verbs. Interestingly, 7-8-year-olds showed only 5% errors with simple verbs. Our finding that adults sometimes generate erroneous parses is actually in line with the few studies that tested adults' comprehension of complex structures involving either garden-path or object movement, which also attested to unexpectedly high rates of comprehension errors (e.g. Christianson et al., 2001; Ferreira, 2003; Villata et al., 2018; Villata & Franck, 2020). One possible explanation for the high rate of ungrammatical parses in adults, even with simple verbs, lies in their strong reliance on the active gap filling parsing strategy, which could make the initial parse too stable to be reconsidered, so that no alternative parse is generated (i.e. revision failure). Another possible interpretation of misparses is that an alternative parse is generated, but finally not selected due to lingering memory traces (or semantic persistence) of the early parse that are too strong to be ruled out (e.g. Slattery et al., 2013; Staub, 2007; Sturt, 2007; van Gompel et al., 2006). We discuss both revision and misparsing in relation to cognitive control skills in the next section.

Role of cognitive control in syntactic revision
Different cognitive control indices were collected through the N-back and DCCS tasks, tapping into inhibition, switching, and updating. A general, and unsurprising, observation is that adults show higher performance than children, that is, higher accuracy (N-back d' and lure accuracy), lower latencies (N-back hit latency), and lower costs (all six DCCS indexes). Considering accuracy indices, performance was similar between 5-6-year-olds and 7-8-year-olds (N-back lure accuracy, DCCS perceptual inhibition accuracy cost, single-switch accuracy cost, and multiple-switch accuracy cost), with one exception: the N-back d' is lower in 5-6-year-olds than in 7-8-year-olds, who do not differ from adults. Considering latencies, we found that older children outperformed younger children in all indexes (N-back hit latency, DCCS perceptual inhibition latency cost, single-switch latency cost, and multiple-switch latency cost). These findings are in line with previous studies showing a continuous development of cognitive control abilities in childhood (e.g. for DCCS: Diamond et al., 2005; N-back: Pelegrina et al., 2015). We also used the Corsi task as a control test and found developmental changes here, too, with 5-6-year-olds' level being lower than that of 7-8-year-olds, whose level is in turn lower than that of adults. Normative data from Farrell Pagulayan et al. (2006) show similar Corsi levels (age 7: 5.0; age 8: 5.2).
Only very few studies using N-back tasks with children of that age are available. A categorical picture N-back task (without lures) showed hit latencies that fit our data (age 6: 760 ms, age 10: 490 ms, adults: 480 ms; Ciesielski et al., 2006) and also demonstrated that the underlying updating abilities continuously develop beyond age 8. Normative 2-back data (without lures) indicated a lower d' (age 7: 0.7; age 8: 0.8) and longer hit latencies (age 7: 1,220 ms; age 8: 1,180 ms; Pelegrina et al., 2015) as compared to our results (age 7-8: d' = 1.7 and hit latency = 601 ms). The letter N-back used in this normative study was much less child-friendly than our N-back task with animated pictures and a gaming context including positive feedback and goal-setting; it is thus plausible that the higher performance found here reflects a more adequate task setting (e.g. Harackiewicz, 1979). Interestingly, despite the child-friendly design and procedure, adults were challenged by our N-back task, as attested by the fact that neither their d' (1.87) nor their lure accuracy (86%) was at ceiling. Hence, the new N-back task we designed appears suitable for multiple age groups, thus providing an excellent tool for the study of cognitive control development.
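For readers less familiar with the sensitivity index d' compared above, a minimal sketch of its standard signal-detection definition follows: the z-transformed hit rate minus the z-transformed false-alarm rate. The log-linear correction (adding 0.5 to counts and 1 to totals) is an assumption made here to keep rates away from 0 and 1; the correction actually applied in our analyses or in the normative studies may differ, and the function name and counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction is applied so that neither rate
    is exactly 0 or 1, where the inverse normal CDF is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical 2-back counts for one participant:
# 18 of 20 targets detected, 3 false alarms on 40 non-targets.
print(round(d_prime(18, 2, 3, 37), 2))
```

A participant who responds identically to targets and non-targets obtains d' = 0 under this definition, whereas higher values indicate better discrimination of 2-back targets.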
We saw that both the linguistic indices (active gap filling and revision) and the cognitive control indices develop with age, and that revision rates substantially vary across children within the same age group. Moreover, although children have been argued to struggle with syntactic revision due to immature syntactic processing mechanisms (e.g. Trueswell et al., 1999; Trueswell & Gleitman, 2004), we saw that adults still struggle. Hence, it seems plausible that revision difficulties in both children and adults do not (only) lie in the immaturity of parsing strategies, but rather in the involvement of factors external to the language domain (e.g. Mazuka et al., 2009; Novick et al., 2005; Omaki & Lidz, 2015). Indeed, we found significant links between the frequency of syntactic revision and these components of cognitive control. Participants with higher N-back and DCCS performance were found to be better at rejecting the erroneous MV interpretation and opting for the alternative and correct EV interpretation. More specifically, the rate and the success of revision are tied to correctly and quickly detecting 2-back targets and to discarding 1-back and 3-back lures. These two indexes measure the ability to constantly update what constitutes the last two items encoded in memory, and to inhibit stored objects from memory that are retrieved in virtue of their identity with the current object being processed, although they are not targets. Interestingly, our results suggest that N-back indexes sensitive to updating capacity (hit latency) correlate to a greater extent with active gap filling, measured as MV preference in ambiguous questions, while indexes more specific to inhibition (lure accuracy) correlate more strongly with revision, as measured both by EV responses to filled-gap questions and by the difference in EV rates between the two conditions.
In line with these findings, participants with a lower single-switch cost in the DCCS also tended to revise more frequently. This index measures the ability to switch sorting rules, which in fact means deactivating (or inhibiting) a previously relevant rule and actively selecting a new rule to be currently used (possibly also involving updating).
In accordance with our prediction of a selective link between revision and some components of cognitive control, we found no correlation between revision and perceptual inhibition assessed in the DCCS task, and no correlation with spatial working memory assessed in the Corsi task. We also failed to find a significant correlation between revision and the multiple-switch cost of the DCCS, although this index is expected to tap switching. Multiple-switch cost was measured in the third block of the DCCS task, after participants performed two blocks in which they first had to sort cards according to one criterion, and then according to the other. Although we expected this third block to be particularly difficult, given that participants have to regularly switch between the two sorting criteria, it turned out that this switching cost was low (see Table 1). It is thus plausible that this measure did not mobilise cognitive control as expected, explaining why it fails to correlate with revision.
The attested link between N-back performance and revision in our study is in line with previous findings showing that repeated training with the N-back task improves adults' ability to revise temporarily ambiguous sentences (Hussey et al., 2017; Hussey & Novick, 2012; Novick et al., 2013). Interestingly, even a brief training with a trial preceding the comprehension of garden-path sentences was found to reduce comprehension errors and facilitate revision in adults (Hsu et al., 2020). To date, only one study used N-back training with children (age 9) and found it to be effective, i.e. increasing fluid intelligence on untrained tasks, and lasting over three months (Jaeggi et al., 2011). Future investigations could examine whether such a training effect also transfers to revision performance and test younger children as well.
In summary, the overall data collected in the current study show developmental changes in both the linguistic and non-linguistic domains. At the linguistic level, we found that a main verb attachment preference for filler-gap argument questions in French is absent in 5-6-year-olds, starts to emerge at age 7-8, and keeps consolidating until adulthood. Indeed, while only the older children significantly differed from the adult controls in the rate of EV responses in the filled-gap question type, all three groups differed from each other in the ambiguous question type. This finding supports the idea that the incremental parsing strategy of active gap filling is absent in 5-6-year-olds, although alternative hypotheses may also help account for this result. The higher number of EV answers provided by younger children in the ambiguous question type could be due to (a) a reluctance to perform an MV attachment because of the weak semantics of the verb "say", although this would go against what was found by Omaki et al. (2014) in Japanese, and (b) a higher sensitivity of young children to recent elements in memory, from which they struggled to divert their attention, thereby boosting the EV interpretation.
As for the focus of the present study, i.e. the ability to revise an initial parse and its relationship with the development of cognitive functions, the picture emerging from the results is quite straightforward. Namely, 5-6-year-olds sometimes manage to revise their initial parse, but less often than children aged 7-8 and adults, who revise alike, as shown by the comparable difference in their EV responses across the two question types. At the level of cognitive control, the key measures of updating and inhibition that correlate with revision (d', lure accuracy, and hit latency) improve between ages 5-6 and 7-8, and between ages 7-8 and adulthood. The overall pattern can be accounted for if one assumes that two factors pull syntactic revision in opposite directions. On the one hand, the strength of the initial engagement towards main verb attachment, possibly channelled by incremental processing, increases the difficulty of disengaging from that initial path and revising. On the other hand, the strength of cognitive control abilities reduces the difficulty of disengaging from that initial analysis. In other words, the more systematically active gap filling is applied, the more efficient cognitive control needs to be to overcome the initial analysis. Younger children benefit from the fact that they have not yet developed a main verb attachment preference, but are penalised by their immature cognitive control. In contrast, adults benefit from mature cognitive control, allowing them to revise more easily than young children, but they are penalised by their strong active gap filling parsing strategy, leading them to still get stuck on their initial parse in almost one third of the garden-path sentences. Children aged 7-8 seem to show a nearly optimal balance between the two factors, with improved cognitive control but a mild active gap filling strategy, allowing them to revise as much as adults, but generate fewer misinterpretations.

Implications for theories of language development and parsing
The results from this study, together with data from other works investigating filler-gap dependencies with wh-questions in French and other languages (Lassotta et al., 2016; Omaki et al., 2014), offer new insights into language development and parsing. One line of research has produced evidence that children's immature parser leads them to rely more heavily than adults on semantic information and lexical cues during online sentence comprehension (Kidd et al., 2011; Snedeker & Trueswell, 2004). Our findings suggest that semantics interacts with syntactic revision in the following way. In previous studies involving French wh-questions with semantically complex main verbs (e.g. "to explain", "to tell"), children displayed a strong and adultlike MV-attachment preference that they struggled to overrule. In the present study, which used semantically simple main verbs, revision was found in the majority of 7-8-year-olds and even in a subset of 5-6-year-olds. This suggests that complex verbs have the potential to trigger a stronger semantic dependency between the main verb and the wh-element, making revision more difficult.
With respect to the role of executive functions in parsing, our data lead to two conclusions. On the one hand, they indicate that executive functions play a role in revision rate: participants with stronger executive control revise more often. On the other hand, our data show that although 7-8-year-olds have weaker executive control than adults, they do not struggle more than adults with revising an initial MV-attachment. This means that even though executive functions appear to play a role in revision, immature executive control alone does not prevent children from performing adultlike revision. Our results also confirm what was reported by Lassotta et al. (2016), namely that the type of wh-filler, i.e. whether it is a wh-argument or a wh-adjunct, affects the success and rate of revision in children. The rate of misparsing dropped from 88% with wh-adjunct questions to 52% (Lassotta et al., 2016) and even 12% (current study) with wh-argument questions, whereas the revision rate increased from virtually no revision with wh-adjunct questions to about 43% with wh-arguments. This substantial modulation shows that linguistic factors strongly affect revision in children and should therefore be carefully considered in future studies.
As for children's ability to draw predictions based on structural information (Borovsky et al., 2012; Gambi et al., 2016; Lukyanenko & Fisher, 2016), our results show that the use of predictive active gap filling increases with age and correlates with executive functions, in particular with cognitive skills involved in updating representations in working memory. What emerges from our study is that strong executive functions facilitate the ability to revise and boost predictive parsing strategies, with inhibition skills being more selectively involved in the former process and working memory updating in the latter. This conclusion, however, builds on the assumption that early attachment preference on the main verb results from an early commitment of the parser following an active gap filling strategy. Further support for this hypothesis should be sought in studying children's comprehension of filler-gap dependencies with online methodologies such as eye-tracking.
All in all, the results coming from the present study provide support for the view that domain-general cognitive resources are recruited by syntactic operations such as positing the existence of a predicted syntactic structure and revising an initial parse. This finding speaks against the hypothesis that such operations only recruit cognitive resources that are specialised for syntactic processing (i.e. syntactic working memory; see Fiebach et al., 2001; for discussion, see Ryskin et al., 2020).

Conclusion
The systematic investigation of children's and adults' processing of complex wh-questions has revealed that young children do not show a clear active gap filling strategy of incremental parsing, which keeps developing even after age 8. However, young children are capable of syntactic revision, an ability that improves until age 7-8 years, at which age it stabilises. In line with the few existing studies on adults' comprehension of complex sentences, adults were found to frequently misinterpret garden-path sentences, even more so than children. The systematic investigation of participants' cognitive control skills in parallel to the linguistic measures collected allowed us to extend the finding of a link between revision and cognitive control to children for the first time. The current conclusions rely on an indirect estimate of the revision rate, based on offline responses given by participants. Future work would benefit from gathering more direct measures of revision from methodologies investigating online comprehension. Nevertheless, our study opens a path for research on how sentence processing, and revision in particular, could be improved by cognitive training in children, both with and without developmental language disorder, and whether long-term benefits are discernible.

Figure 1. Example of story pictures in the wh-question task.

Figure 2. Example of fruit sequence in the N-back task.

Figure 3. Example of 2-back detection in the familiarisation step V of the N-back task.

Figure 4. Example of (A) congruent and (B) incongruent trials in the DCCS task.

Table 1. Means of cognitive control indexes of DCCS, N-back and Corsi tasks for each age group.
Notes: Latency measures are in ms. Standard deviations are in parentheses. Significance levels of comparisons between age groups are indicated between the respective pairs of age groups (significance levels: *** p < .001; ** p < .01; * p < .05; . p < .10).

Table 2. Partial correlations between EV increase and EV responses, and N-back, DCCS, and Corsi task indexes (controlled for age).

Table 3. Overview of rates of active gap filling, misparsing and revision (in %) in studies on wh-attachment in French biclausal wh-questions (Lassotta et al., 2016, and the current study), depending on argument structure and main verb semantics. Active gap filling is measured through MV attachment in ambiguous questions, misparsing through MV attachment in filled-gap questions, and revision through EV increase between ambiguous and filled-gap questions.