Enhanced prosody adds to morpho-syntactic cues in the interpretation of structural ambiguities in German

ABSTRACT This study investigated the effects of syntactically marked and enhanced prosody on local ambiguity resolution in German SVO and OVS sentences. In a visual-world experiment, thirty younger and thirty elderly healthy participants performed a sentence-picture matching task. Response accuracy, reaction times and fixation proportions to the target picture were analysed using linear mixed models. We found no support for beneficial effects of syntactically marked prosody. However, the results suggested a facilitative role of enhanced prosodic cues (i.e. increased f0 maximum) prior to the point of disambiguation in SVO structures, as well as beneficial effects of enhanced prosody adding to morpho-syntactic cues in OVS structures. Both age groups showed comparable cue use but inter-individual variability in prosodic cue processing. Overall, our study replicates and extends previous findings, demonstrating the importance of examining variability in prosodic cue processing in future research.


Introduction
The integration of linguistic cues such as lexical, morpho-syntactic or prosodic information plays an important role in deriving sentence interpretations and in generating predictions to facilitate sentence comprehension. Previous studies have demonstrated rapid integration of morpho-syntactic and lexical cues for structural prediction (e.g. Hopp, 2015; Kamide et al., 2003; Knoeferle et al., 2005). Notably, prosodic cues might provide additional or earlier information to facilitate syntactic disambiguation and thematic role assignment (Grünloh et al., 2011; Henry et al., 2017; Weber et al., 2006). Therefore, in this study, we aimed to examine the effects of syntactically marked prosody for disambiguation of structural ambiguities in German subject-verb-object (SVO) and object-verb-subject (OVS) sentences by means of a sentence-picture matching task and the visual-world paradigm. Moreover, we aimed to explore effects of a (phonetically) enhanced prosody condition (i.e. increased f0 maximum), in addition to the marked prosodic cues, because of a previously reported facilitative role of enhanced prosody for prosodic processing and parsing among variable groups of listeners (Fernald & Simon, 1984; Grant, 1987).
In the following, we present previous findings on the resolution of locally ambiguous structures and discuss the functional gap arising from form-function ambiguities and flexible word order in German. We further discuss the role of prosodic cues in German sentence comprehension and the effects of prosody on the interpretation of locally ambiguous German SVO and OVS sentences. In addition, we point to variations in prosodic cues in comprehension and production, to variations in prosodic cue strength, as well as to variability in prosodic cue processing in younger and elderly individuals.

Local ambiguity resolution in German SVO and OVS sentences
In sentence comprehension, listeners process and incrementally combine several types of linguistic cues, such as lexical, semantic, morpho-syntactic, contextual-pragmatic, as well as prosodic information to develop sentence interpretations and to generate predictions about the upcoming input (Hopp, 2015). According to the model of sentence processing proposed by Garrett (1988) and developed further by Bock and Levelt (1994), listeners build a syntactic structure to subsequently link structural sentence components (subject, object) to thematic roles (agent, patient), which is referred to as structural mapping. Such underlying processing mechanisms are also represented in the general assumptions of more current, computational models (e.g. Vasishth et al., 2019). In German, transitive sentences typically include a subject as the agent and an object as the patient of an action. However, German word order is flexible and allows for both canonical SVO structures like in (1) as well as non-canonical OVS structures like in (2). Therefore, besides positional order information, listeners frequently need to make use of cues like morpho-syntactic case marking or lexical-semantic verb information for thematic role assignment to determine "who did what to whom" (Hopp, 2015; Kamide et al., 2003). In German SVO structures, the determiner of a masculine sentence-initial noun phrase (NP1) carries nominative case and marks the constituent as the subject (agent), while the determiner of the post-verbal noun phrase (NP2) has accusative case and thus marks it as the object (patient) of the sentence. However, there is a form-function ambiguity (syncretism) among neuter and feminine determiners in German leading to ambiguities in case marking. For transitive sentences with a neuter NP1, this results in locally ambiguous structures exemplified in (1) for SVO and (2) for OVS sentences:
(1) Das Kamel tritt nun den Tiger.
    the[NOM/ACC-n.] camel kicks currently the[ACC-m.] tiger
    "The camel is currently kicking the tiger."
(2) Das Kamel tritt nun der Tiger.
    the[ACC/NOM-n.] camel kicks currently the[NOM-m.] tiger
    "The camel is currently kicked by the tiger."
Typically, listeners initially interpret the ambiguous NP1 in (1) and (2) as the subject making use of positional order information. Hence, the parser initially prefers to assign the thematic role of the agent to NP1 in both sentence structures because of a strong subject-before-object preference (subject-first bias) in German (Gorrell, 2000; Hemforth & Konieczny, 2000). However, in OVS sentences like (2), this initial interpretation turns out to be incorrect and a re-analysis is required at NP2 (Hanne et al., 2015). The case-marked determiner of NP2 (den[ACC] vs. der[NOM]) thus constitutes the morpho-syntactic point of disambiguation. Re-analysis, however, is costly and entails a complete re-assignment of the verb's thematic roles to both arguments (Bornkessel et al., 2002; Grewe et al., 2007). Consequently, the revision of the initial SVO interpretation towards an OVS structure is associated with higher processing demands for OVS sentences resulting in, for instance, slower reaction times in sentence-picture matching tasks on auditory sentence comprehension (Hanne et al., 2011, 2015) or longer reading times in studies of written sentence comprehension (Bader & Meng, 1999; Gorrell, 2000; Schlesewsky et al., 2000).
Since cognitive processes involved in syntactic analysis and structural prediction can be inferred from eye-movements reflecting the direction of visual attention, eye-tracking in the visual-world paradigm is a widely used method to investigate auditory sentence comprehension (for an overview: see Huettig et al., 2011; Ito & Knoeferle, 2023). For instance, Knoeferle et al. (2005) tested locally ambiguous German SVO and OVS sentences similar to those provided in (1) and (2) with an initial structural and role ambiguity and an unambiguously case-marked NP2. They conducted an eye-tracking study using the visual-world paradigm where participants listened to auditory sentences and inspected a visual scene showing agent-agent-patient events. The authors found support for verb-mediated visual event information allowing syntactic disambiguation such that participants showed anticipatory eye-movements towards the appropriate patient or agent of the SVO or OVS sentence, respectively. However, when the main verb occurred in sentence-final position, participants were not able to establish reference to the depicted scene. Consequently, verb-final sentence disambiguation was guided by morpho-syntactic case marking information at NP2 only. Similar effects of linguistic cues have been found in visual-world studies with matching tasks including two pictures, one of which shows the correct sentence interpretation while the other displays a thematic role reversal (e.g. Hanne et al., 2015; Schumacher et al., 2015; Wendt et al., 2014). The studies mentioned so far either used naturally produced stimuli (Schumacher et al., 2015; Wendt et al., 2014), or stimuli that did not differ in their intonation patterns between conditions by using SVO intonation patterns in both conditions (Knoeferle et al., 2005), or ambiguous intonation patterns that were kept as constant as possible while recording, matched according to their prosodic parameters and artificially manipulated if needed (Hanne et al., 2015).
In sum, previous studies have revealed that listeners rapidly integrate positional order and visual event information, as well as lexical and morpho-syntactic cues to predict and/or revise syntactic structure for the interpretation of auditory sentences. Effects of different SVO and OVS intonation patterns were not systematically investigated with respect to guiding parsing decisions in the interpretation of auditory sentences.

Prosodic cues in German sentence comprehension
In auditory sentence comprehension, prosody constitutes another relevant cue besides, for instance, morpho-syntactic or lexical information (for a review: see Frazier et al., 2006). This becomes obvious when ambiguous sentences are solely disambiguated by prosodic cues, for instance in a sentence like "Who came out ahead? Old men and women with very large houses" in which with very large houses modifies either women (i.e. low attachment) or old men and women (i.e. high attachment) depending on the relative prosodic strength of the prosodic boundary after men or women (example taken from: Frazier et al., 2006). Accordingly, to overcome structural ambiguities arising from syncretisms and flexible word order in German, prosodic cues might provide additional information to facilitate the interpretation of auditory sentences (Grünloh et al., 2011; Weber et al., 2006). Since prosody unfolds globally over the whole utterance, such facilitation might even appear at very early points in time, that is, prior to the morpho-syntactic point of disambiguation at NP2 in the processing of transitive sentences like (1) and (2).
Prosody comprises suprasegmental aspects of the speech stream such as pitch, loudness and length corresponding to fundamental frequency, intensity and duration at an acoustic level (for a review: see Wagner & Watson, 2010). Acoustic variations due to phrase-level stress (pitch accents) relate to focus and information structure and thus to the semantic meaning and thematic relations of a sentence in a variety of languages. Accordingly, previous studies have demonstrated that listeners rapidly integrate pitch accents for syntactic disambiguation and thematic role assignment (Carlson, 2009; Grünloh et al., 2011; Müller et al., 2021; Nakamura et al., 2020; Weber et al., 2006). Therefore, in the present study, we focus on changes in pitch, corresponding to fundamental frequency (f0), for structural prediction: While f0 rise or fall movements align relative to the syllables of the constituents in a sentence, pitch accents can determine the f0 contour of the entire sentence.
Several studies have investigated the role of prosody for sentence interpretation in German SVO and OVS sentences using different f0 contours for the respective word order conditions (Grünloh et al., 2011; Henry et al., 2017; Henry et al., 2022; Kröger, 2018; Kröger et al., 2017; Weber et al., 2006). For morphologically unambiguous SVO and OVS sentences, Kröger et al. (2017) tested two prosodic conditions using the visual-world paradigm: an ambiguous prosody condition (i.e. prosody did not indicate either of the two sentence structures) and a marked prosody condition with SVO or OVS intonation patterns (i.e. with prosodic cues differentiating between the two sentence structures). The authors found support for rapid integration of morpho-syntactic cues, but no beneficial effects of marked prosody. In contrast, Henry et al. (2017) described effects of additivity of prosodic and morpho-syntactic cues in unambiguous sentences. That is, participants used prosodic cues in addition to morpho-syntactic case marking information to anticipate the postverbal argument and to maximise their prediction success or, in other words, to minimise their prediction error. The presence of prosodic cues led to a faster rise in fixation proportions to the target picture and a reduction in entropy, thus demonstrating beneficial effects of syntactically marked prosody. The results, however, also showed that the parser constantly adapts to the availability and utility of linguistic cues: Prosodic cues were considered less useful and showed no beneficial effects on the parsing process when they were not consistently available.
For locally ambiguous SVO and OVS structures, previous findings on the effects of prosodic cues in ambiguity resolution also provided mixed results (Kröger, 2018; Weber et al., 2006). Weber et al. (2006) investigated sentences in a marked prosody condition with SVO or OVS intonation patterns using the visual-world paradigm. Sentences consisted of the following structure: NP1, verb, adverb, NP2. For example, SVO structures like Die Katze jagt womöglich den[ACC] Vogel ("The cat is possibly chasing the bird.") and OVS structures like Die Katze jagt womöglich der[NOM] Hund ("The cat is possibly chased by the dog.") were used (example taken from: Weber et al., 2006). In their study, SVO intonation was characterised by a prenuclear accent on NP1 (L*+H, following the GToBI annotation system; Grice et al., 2005) with sentence stress on the verb (H*), while OVS intonation was characterised by sentence stress on NP1 (L+H*). They found anticipatory eye-movements towards the suitable patient in SVO structures when participants heard the post-verbal adverb (i.e. prior to the morpho-syntactic point of disambiguation at the determiner of NP2) indicating a facilitated interpretation of NP1 as the subject. However, there were no reliable facilitating effects of marked prosody on the interpretation of OVS sentences, since no clear object-before-subject preference was found for OVS structures. Kröger (2018) used the same paradigm to test locally ambiguous OVS sentences in a marked prosody condition with either SVO or OVS intonation assigning similar intonation patterns as used in the study of Weber et al. (2006). However, they found no reliable effects of marked prosody.
To sum up, previous studies investigating the effects of prosodic cues on the interpretation of auditory sentences provided mixed results for unambiguous as well as for locally ambiguous structures. Overall, the results suggest that prosodic cues constitute more subtle information for thematic role assignment and facilitate sentence interpretation of canonical SVO structures. However, previous studies did not find support for reliable effects of prosody on the interpretation of non-canonical OVS structures.

LANGUAGE, COGNITION AND NEUROSCIENCE

Variability of prosodic cues
The mixed results found for the effects of prosody on the interpretation of auditory sentences and local ambiguity resolution might be explained by factors such as variations in experimental design, differences in visual and auditory stimuli and, most importantly, in the examined f0 contours.
Previous studies reported variability in the production of f0 contours associated with German SVO and OVS sentences, respectively, and there is intra- as well as inter-individual variability in the use of prosody for sentence production (Huttenlauch et al., 2022; Weber et al., 2006). Accordingly, prosodic cues can be variable in their consistency and strength within and across speakers (Cangemi et al., 2015; Henry et al., 2017), which might result in inter-individual differences in syntactic disambiguation with the help of prosody. Following the GToBI annotation system (Grice et al., 2005), previous studies generally used an f0 contour in OVS sentences that was characterised by sentence stress on NP1 (L+H* accent with f0 rise on the stressed syllable), while the f0 contour of SVO sentences was either characterised by an initial accent on NP1 (L*+H accent with f0 minimum at the centre and f0 rise at the end of the stressed syllable) and sentence stress on the verb (H* accent with an f0 maximum) (Kröger, 2018; Kröger et al., 2017; Weber et al., 2006), or an H+L* accent with an f0 fall on NP2 (Henry et al., 2017, 2022). These differences in stimulus characteristics might be a potential factor responsible for the mixed results observed so far.
Moreover, the use of prosodic cues for sentence interpretation and thus its facilitative role in local ambiguity resolution might also be influenced by listener-dependent variations. Listeners might be sensitive to speaker-specific intonation patterns in the use of prosodic cues. However, inter-individual differences among listeners lead to variability in the detectability of prosodic cues and in the reliability of decoding intonation patterns across speakers (Cangemi et al., 2015). That is, some listeners more reliably decode specific intonation patterns, while others might be less able to reliably decode them. Therefore, increasing prosodic cue strength might lead to better detectability of prosodic cues and to more reliability in decoding of intonation patterns in general. Moreover, increased prosodic cue strength might be especially beneficial for some groups of listeners. For instance, studies on infant-directed speech (e.g. Fernald & Simon, 1984) or listeners with hearing impairments (e.g. Grant, 1987) have investigated enhanced f0 contours with increased prosodic cue strength and demonstrated beneficial effects for listeners in following intonation patterns and parsing the speech stream. Finally, age might be another listener-dependent source of variation that influences the processing of prosodic cues (for a review: see Burke & Shafto, 2007). Titone et al. (2006), for instance, demonstrated that both younger and elderly individuals make use of prosodic cues in a comparable manner. However, younger participants were able to employ prosodic cues more quickly than elderly participants.
Given the differential findings for the use of prosody in comprehension and production of SVO and OVS sentences, Huttenlauch et al. (2022) investigated speaker-specific intonation patterns to naturally distinguish SVO and OVS structures in a production study. They used a referential communication task, where participants were asked to utter locally ambiguous sentences like (1) and (2) in such a way that the listener would be able to identify the picture matching the correct sentence interpretation as quickly and accurately as possible. Visual stimuli depicted agent, patient and the action for either an SVO or OVS sentence thus showing either the correct sentence interpretation (i.e. the target picture) or a thematic role reversal (i.e. the foil picture). At first, participants saw both the target and the foil picture in a preview phase of 4000 ms. Next, the target picture was highlighted with a green frame and the question Was sehen Sie? ("What do you see?") was presented via headphones to trigger production of either the SVO or OVS sentence. The authors found, firstly, a rather consistent use of prosodic cues within participants and across trials, reflecting only minor use of prosody to differentiate between SVO and OVS structures. Secondly, there was a high degree of variability between participants, and only one speaker (out of 16 participants) naturally and consistently produced intonation patterns to distinguish SVO and OVS structures (Huttenlauch et al., 2022). This speaker used an L*+H accent on NP1 (f0 minimum at the centre and f0 rise at the end of the stressed syllable) in SVO structures in contrast to an L+H* accent on NP1 (f0 rise on the stressed syllable) in OVS structures.¹ However, it has not been tested yet whether listeners are capable of reliably decoding these speaker-specific intonation patterns to guide their parsing decisions in the processing of local ambiguities.

Aim of the study
The present study aimed to investigate whether younger and elderly listeners are sensitive to the SVO and OVS intonation patterns found by Huttenlauch et al. (2022) and use these prosodic cues in addition to or earlier than morpho-syntactic cues for structural prediction and syntactic disambiguation. In addition, we aimed to explore to what extent enhanced prosody facilitates thematic role assignment and the interpretation of locally ambiguous structures in both age groups.
We thus examined the effects of ambiguous, marked and enhanced f0 contours on the interpretation of locally ambiguous German SVO and OVS sentences at the group level as well as at the level of individual participants by means of a sentence-picture matching task and the visual-world paradigm. To this end, we investigated three different prosodic conditions: The ambiguous prosody condition involved auditory sentences without acoustic differences in the intonation patterns of SVO and OVS structures in reference to Kröger et al. (2017), while stimuli in the marked prosody condition complied with the f0 contours distinguishing SVO and OVS sentences that were naturally produced by one speaker in the study of Huttenlauch et al. (2022). They were thus used to syntactically mark locally ambiguous structures. The enhanced prosody condition involved the same f0 contours, which were, however, phonetically enhanced (i.e. increased f0 maximum) by the speaker.
Our overarching research question reads as follows: How efficient are prosodic cues in guiding parsing decisions in the interpretation of locally ambiguous sentences? To address this question, we investigated (i) sentence interpretation of locally ambiguous SVO and OVS structures (word order effect), (ii) the facilitative role of marked and enhanced prosodic cues in local ambiguity resolution in comparison to the ambiguous prosody condition (prosodic effect), and (iii) variability in prosodic cue processing between younger (aged 18-35 years) and elderly (aged 60-80 years) individuals (age effect). Firstly, because of the subject-first bias in German (Gorrell, 2000; Hemforth & Konieczny, 2000), we expected to find higher response accuracy, faster reaction times and higher fixation proportions to the target picture for SVO compared to OVS sentences. Secondly, given the finding of rapid integration of prosody for syntactic disambiguation (Weber et al., 2006), we hypothesised a facilitative role of prosodic cues prior to the point of disambiguation in locally ambiguous structures in the marked prosody condition. Based on the findings by Weber et al. (2006), marked prosody in SVO structures would lead to higher response accuracy, faster reaction times, higher fixation proportions, as well as earlier looks to the target picture (prior to the morpho-syntactic point of disambiguation at NP2) compared to the ambiguous prosody condition. However, we expected to find no support for reliable prosodic effects in OVS structures (Kröger, 2018; Weber et al., 2006). Therefore, we aimed to explore an additional enhanced prosody condition and expected beneficial effects of enhanced prosody on the interpretation of both SVO and OVS structures based on previous findings demonstrating the facilitative role of enhanced prosody for listeners with hearing impairments (e.g. Grant, 1987) and in infant-directed speech (e.g. Fernald & Simon, 1984). Lastly, the age ranges of 18-35 for younger and 60-80 years for elderly participants were set to explore variability in prosodic cue processing across the adult lifespan (Burke & Shafto, 2007; Titone et al., 2006). Following Titone et al. (2006), we hypothesised a comparable use of prosodic cues and no differences in response accuracy for younger and elderly individuals between prosodic conditions. At the same time, we expected to find faster reaction times for younger vs. elderly participants in the marked prosody condition compared to the ambiguous prosody condition based on their ability to make use of prosodic cues for sentence interpretation more quickly. This might also result in earlier looks to the target picture for marked prosody in younger vs. elderly participants. We further hypothesised enhanced prosody to show more beneficial effects on sentence interpretation in elderly individuals because of an increase in prosodic cue strength potentially leading to better detectability of prosodic cues and to more reliability in decoding of intonation patterns.

Participants
Sixty-three native speakers of German participated in this eye-tracking study: thirty younger and thirty-three elderly participants. Younger participants (25 females, 5 males) were aged between 18 and 32 years (M = 23.73, SD = 4.03). They reported being right-handed (n = 26) or left-handed (n = 4). Four younger participants were raised bilingually (Chinese, Polish, Russian, Turkish), but each considered German as their dominant language. Data from three elderly participants had to be excluded from analyses due to ocular artefacts or the use of hearing aids which led to interference effects with the headphones. The remaining thirty elderly participants (21 females, 9 males) were aged between 61 and 79 years (M = 70.27, SD = 5.29) and were right-handed (n = 28), left-handed (n = 1) or ambidextrous (n = 1) as assessed by the Edinburgh Handedness Inventory (Oldfield, 1971). All participants had normal or corrected-to-normal vision and reported no history of neurological or psychiatric impairments. A hearing screening was conducted to ensure normal hearing, that is, a mean of air-conduction thresholds at 0.5, 1, 2 and 4 kHz < 25 dB HL in the better ear (World Health Organization, 1991). Accordingly, all younger as well as twenty-four elderly participants showed normal hearing abilities. Six elderly participants showed mean air-conduction thresholds between 25 and 30 dB HL due to a slight decline in hearing ability affecting hearing at 2 and 4 kHz. However, this does not compromise hearing of normal levels of conversations and can be associated with age-appropriate changes in hearing ability (World Health Organization, 1991). Importantly, these age-related changes in hearing had no impact on processing of auditory stimuli in the present study encompassing other frequencies.
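The hearing criterion described above can be illustrated with a minimal sketch. It assumes that thresholds are stored per ear as a mapping from frequency (kHz) to dB HL; the function names and the example values are hypothetical and are not taken from the study.

```python
from statistics import mean

# Frequencies (kHz) entering the average and the cut-off (dB HL) named in the text.
PTA_FREQS_KHZ = (0.5, 1, 2, 4)
NORMAL_LIMIT_DB_HL = 25


def pure_tone_average(thresholds):
    """Mean air-conduction threshold (dB HL) of one ear over the four frequencies."""
    return mean(thresholds[freq] for freq in PTA_FREQS_KHZ)


def better_ear_average(left, right):
    """The better ear is the one with the lower mean threshold."""
    return min(pure_tone_average(left), pure_tone_average(right))


def has_normal_hearing(left, right):
    """True if the better-ear average lies below the 25 dB HL cut-off."""
    return better_ear_average(left, right) < NORMAL_LIMIT_DB_HL


# Hypothetical listener with elevated thresholds at 2 and 4 kHz.
example_left = {0.5: 15, 1: 20, 2: 30, 4: 40}
example_right = {0.5: 10, 1: 15, 2: 25, 4: 35}
print(has_normal_hearing(example_left, example_right))
```

A listener whose better-ear average falls between 25 and 30 dB HL would fail this check, which corresponds to the six elderly participants described above.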
We conducted a screening of non-linguistic cognitive abilities in both age groups using German versions of the following neuropsychological tests: Multiple-choice Vocabulary Test (MWT-B; Lehrl, 2005) as a measure of crystallised intelligence, Digit and Block Span Tests of Wechsler's Memory Scale (WMS-R; Härting et al., 2000) to assess working memory capacities, the Digit-Symbol-Substitution Test of Wechsler's Intelligence Scale for Adults (WAIS-IV; Petermann, 2012) for an assessment of general processing speed as well as the Trailmaking Test A and B (TMT; Reitan, 1979) as a measure of attention and executive functioning. Elderly participants differed from younger participants in the following measures (Welch's two-sample t-test, two-sided, p < .05): As expected, they outperformed younger participants in measures of crystallised intelligence, but showed lower general processing speed and visual working memory capacities. However, both age groups showed similar performances in measures of auditory working memory, attention and executive functioning (Welch's two-sample t-test, two-sided, p > .05). Elderly participants additionally performed and successfully passed the Montreal Cognitive Assessment (MoCA; Nasreddine et al., 2005) as a screening tool for mild cognitive impairments.
The study was approved by the ethics committee of the University of Potsdam (registration number 99/2020) and participants gave informed consent in accordance with the Declaration of Helsinki (World Medical Association, 2013). Younger participants were recruited via the cloud-based participant pool software at the University of Potsdam and via word-of-mouth. Elderly participants were recruited via flyers, word-of-mouth and announcements on the website of a local senior association. They were all compensated by monetary reimbursement. The study took place in the eye-tracking lab at the University of Potsdam.
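For readers unfamiliar with Welch's two-sample t-test, the following minimal sketch computes the test statistic and the Welch-Satterthwaite degrees of freedom for two independent groups. The function name is hypothetical, and the p-value is deliberately left out because it requires the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df
```

Unlike Student's t-test, this variant does not assume equal variances in the two age groups, which is why the degrees of freedom are estimated from the data.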

Materials and validation study
Since it has not been tested yet whether listeners are sensitive to the speaker-specific intonation patterns (marked prosody) found in the study of Huttenlauch et al. (2022) and capable of reliably decoding these SVO and OVS intonation patterns as well as the magnified f0 contours with increased prosodic cue strength (enhanced prosody), we conducted a validation study in order to examine to what extent listeners would be sensitive to marked and enhanced prosody to differentiate between SVO and OVS sentences. We aimed to examine how efficient differentially produced f0 contours are as disambiguating cues to distinguish locally ambiguous structures. Subsequently, we selected the best distinguishable and most representative auditory stimuli for the eye-tracking study.

Auditory stimuli
Auditory stimuli were taken from Huttenlauch et al. (2022) and consisted of 42 locally ambiguous semantically reversible German sentences (SVO: n = 21; OVS: n = 21) with the following structure: noun phrase (NP1), verb, adverb, noun phrase (NP2). The two different word order conditions are exemplified in (1) for SVO and (2) for OVS sentences. A list of stimuli is provided in Table I in the Appendix.
The stimuli included two-place verbs describing reversible depictable actions. Mono- and bisyllabic animate nouns of the categories humans, animals and fairy tale characters were used in subject and object positions. All nouns in NP1 were neuter and thus case-ambiguous (das[NOM/ACC]). The nouns in NP2 were masculine and the unambiguous case-marked determiner at NP2 thus served as the morpho-syntactic point of disambiguation of the sentence (den[ACC] in SVO vs. der[NOM] in OVS sentences). The adverb was the same for every stimulus and was used to stretch the distance between NP1 and NP2 to give time for prosodic encoding.

Stimuli recording
Auditory stimuli were spoken by a linguistically trained female native speaker of German in three different prosodic conditions: marked prosody condition (marked), enhanced prosody condition (enhanced) and ambiguous prosody condition (ambiguous). The speaker was a member of the research team familiar with the prosodic conditions and able to naturally produce the auditory stimuli in the different conditions. The stimuli were recorded in a sound-attenuated booth at a sampling rate of 44.1 kHz. Marked and enhanced prosody differed in their SVO and OVS intonation patterns of NP1, respectively. The marked condition was recorded in reference to productions of one speaker in the study of Huttenlauch et al. (2022), who used differential f0 contours to naturally distinguish SVO and OVS sentences. Although we did not run a formal GToBI analysis, the auditory stimuli can be characterised as follows: In the SVO condition, the f0 contour resembled an L*+H accent on NP1 showing its peak at the end of the last syllable of NP1 around 15-20 ms from word onset with a longer rise time from f0 minimum to maximum. In contrast, the f0 contour in the OVS condition resembled an L+H* accent showing its peak in the centre of NP1 around 10-15 ms from word onset with a shorter rise time compared to the SVO condition. The f0 contours in the marked prosody condition of the examples in (1) for SVO and (2) for OVS sentences are provided in Figure 1.
The enhanced condition can be characterised by the same f0 contours as the marked condition, but differed from the marked condition by showing an increased f0 maximum (around 70 Hz) in both word order conditions. Thus, the marked condition was phonetically enhanced while recording. However, similar to the marked condition, the f0 contour showed its peak at the end of the last syllable of NP1 for SVO sentences (L*+H accent), while the f0 maximum was more centred for OVS sentences (L+H* accent). In the ambiguous condition, there was no difference in the intonation patterns of SVO and OVS sentences, since in reference to Kröger et al. (2017) no accent was placed on NP1. Multiple tokens of each experimental and practice item were recorded and four tokens were then selected for further examination based on sound quality judgements and visual inspections of the f0 contours.

Validation study
In order to validate that listeners would indeed be sensitive to the marked and enhanced f0 contours to differentiate between SVO and OVS sentences, stimuli in the marked and enhanced conditions were investigated in a preceding web-based validation study using a two-alternative forced choice (2AFC) task that was set up via the web-based experimental tool LabVanced (Finger et al., 2017).In this validation study (see Schneider et al., 2024), participants (n = 32) listened to overall 336 recordings of the experimental and practice items (21 sentences * 2 word order conditions * 2 prosodic conditions * 4 tokens of each sentence), which were played only up to the disambiguating determiner at NP2 (e.g.Das Kamel tritt nun).The task was to decide, based on the heard intonation pattern, which of two possible options for NP2 would best complete the sentence: One option represented an NP2 in accusative case and thus the SVO condition (for example: den Tiger), the other option showed an NP2 in nominative case therefore representing the OVS condition (for example: der Tiger).Response accuracy and reaction times were measured and analysed following signal detection theory and by applying linear mixed models for significance testing.An a'-score of 0.69 (b''d = −0.01)for the two word order conditions indicated a moderate mean sensitivity to discriminate SVO and OVS sentences.Sensitivity levels in the enhanced condition (a' = 0.72, b''d = 0.02) were higher than in the marked condition (a' = 0.64, b''d = −0.02).Hence, the differentially produced f0 contours in marked and even more so in enhanced prosody could indeed serve as disambiguating cues between SVO and OVS structures.
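The sensitivity and bias scores reported for the validation study can be illustrated with a minimal sketch. The source only states that the analysis followed signal detection theory and reports a' and b''d; the formulas below are the commonly used non-parametric definitions and are an assumption about what was computed. Treating an "accusative" response to an SVO recording as a hit and an "accusative" response to an OVS recording as a false alarm is likewise a hypothetical mapping chosen for illustration.

```python
def a_prime(hit_rate, fa_rate):
    """Non-parametric sensitivity A' (0.5 = chance, 1.0 = perfect).

    Assumed formula; the source does not spell out its computation.
    """
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


def b_double_prime_d(hit_rate, fa_rate):
    """Non-parametric bias B''D, ranging from -1 to +1 (0 = no bias).

    Assumed formula; the source does not spell out its computation.
    """
    h, f = hit_rate, fa_rate
    numerator = (1 - h) * (1 - f) - h * f
    denominator = (1 - h) * (1 - f) + h * f
    if denominator == 0:
        # Undefined for perfect or perfectly inverted performance;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return numerator / denominator


# Hypothetical rates, not data from the study.
sensitivity = a_prime(0.8, 0.2)
bias = b_double_prime_d(0.8, 0.2)
```

With symmetric hit and false-alarm rates as in the hypothetical call above, the bias index is zero, which matches the near-zero b''d values reported for the validation study.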
Selection procedure for auditory stimuli
Based on the findings of the validation study, for each experimental and practice item, the best distinguishable and most representative tokens in the marked and enhanced conditions were selected for the eye-tracking study by applying the following selection criteria in weighted order:
1. For each possible pair of SVO and OVS tokens of an item, the f0 rise quotient should be (a) maximally different from 1 and (b) preferably larger than 1, resembling an earlier f0 rise in the OVS than in the SVO condition (see below for more details on how the f0 rise quotient was determined for each pair).
2. Response accuracy to the tokens in the 2AFC experiment should preferably be above 60%.
3. Visual inspections of the time-normalised f0 contours of the pair of SVO and OVS tokens should support a prosodic differentiation between the two word order conditions.
For calculation of the f0 rise quotients, f0 values were extracted and f0 contours were post-processed using PRAAT (Boersma & van Heuven, 2001) following Huttenlauch et al. (2022) and Hanne et al. (2015) in reference to procedures of Mausmooth (Cangemi, 2015) and ProsodyPro (Xu, 2013). The f0 rise quotient served as the first criterion in the weighted hierarchy. It was determined for each pair of SVO and OVS tokens by first calculating the difference between the f0 minimum and maximum within NP1, resulting in the f0 rise value of each token. Next, the f0 rise value of each SVO token was paired with the f0 rise value of each OVS token. The ratio between the two respective f0 rise values was calculated for each pair to select stimuli in the marked condition (M = 1.35, SD = 0.22) and in the enhanced condition (M = 1.23, SD = 0.15). Based on response accuracy in the 2AFC task as the second criterion in the weighted hierarchy, stimuli were selected in the marked SVO condition (M = 70.3%, SD = 9.9%), marked OVS condition (M = 61.1%, SD = 9.9%), enhanced SVO condition (M = 74.1%, SD = 9.0%) as well as enhanced OVS condition (M = 66.6%, SD = 6.8%). For the third criterion, the visual differentiation of the token pair was judged based on the plotted time-normalised f0 contours in Hz. For items with a monosyllabic NP1 or a bisyllabic NP1 with ultima stress, differences in f0 rise delay, that is, the time from NP1 onset until f0 starts to rise, were examined in SVO vs. OVS structures. For items with a bisyllabic NP1 with penultima stress, differences in f0 maximum prolongation were inspected. Visualisations were generated using the package ggplot2 (Wickham, 2016) in RStudio (R Core Team, 2020).
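The pairing procedure for the first criterion can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the direction of the ratio (OVS rise divided by SVO rise) is assumed, since the text only speaks of "the ratio between the two respective f0 rise values", and all token names and f0 values are hypothetical.

```python
from itertools import product


def f0_rise(f0_values_hz):
    """f0 rise of one token: f0 maximum minus f0 minimum within NP1 (Hz)."""
    return max(f0_values_hz) - min(f0_values_hz)


def rise_quotients(svo_tokens, ovs_tokens):
    """Pair every SVO token with every OVS token of an item.

    Returns {(svo_id, ovs_id): quotient}. The OVS/SVO direction of the
    quotient is an assumption made for this sketch.
    """
    quotients = {}
    for (s_id, s_f0), (o_id, o_f0) in product(svo_tokens.items(),
                                              ovs_tokens.items()):
        quotients[(s_id, o_id)] = f0_rise(o_f0) / f0_rise(s_f0)
    return quotients


def select_pair(quotients, ambiguous=False):
    """Pick the pair furthest from 1 (marked/enhanced) or nearest to 1 (ambiguous)."""
    if ambiguous:
        return min(quotients, key=lambda pair: abs(quotients[pair] - 1))
    # Ties in distance from 1 are broken in favour of quotients above 1.
    return max(quotients,
               key=lambda pair: (abs(quotients[pair] - 1), quotients[pair] > 1))


# Hypothetical NP1 f0 samples (Hz) for two SVO and two OVS tokens of one item.
svo = {"svo_1": [180, 200, 220], "svo_2": [190, 220, 250]}
ovs = {"ovs_1": [180, 230, 240], "ovs_2": [200, 240, 245]}
q = rise_quotients(svo, ovs)
best_distinct = select_pair(q)
best_ambiguous = select_pair(q, ambiguous=True)
```

In the study, this first criterion was weighed together with 2AFC response accuracy and visual inspection of the contours, so a quotient-based ranking alone would not reproduce the final selection.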
Auditory stimuli for SVO and OVS items in the ambiguous condition were selected based on a similar pairing procedure. However, since the ambiguous condition is not supposed to show any difference in the intonation patterns between SVO and OVS structures, we determined the pair for which the quotient between the two f0 rise values was nearest to 1 and selected the respective stimuli accordingly (M = 1.01, SD = 0.06). The mean f0 contours of SVO and OVS structures in all three prosodic conditions are provided in Figure 2.
In addition to f0 calculations, constituent durations (NP1, verb and adverb, NP2) were statistically compared between the two word order conditions within one prosodic condition.Constituents did not show significant differences in duration between SVO and OVS structures, except for the duration of NP2 in the marked condition with significantly shorter durations of NP2 in the OVS compared to the SVO condition (t = 2.68, df = 39.25, p < .05).Hanne et al. (2015) pointed to a shorter duration of the determiner der NOM compared to den ACC at NP2 because of the phonological difference between fricatives and nasal sounds that might explain these durational differences.However, because the determiner at NP2 constituted the morpho-syntactic point of disambiguation in the present study, these durational differences do not affect prosodic cue processing in the ambiguous part of the sentence prior to NP2.Finally, the overall speech rate in syllables per second was determined across conditions for the selected stimuli and resembled a natural speech rate of 3-6 syllables per second across languages including German (M = 3.97, SD = 0.43) (Levelt, 2001).

Experimental design
The selected auditory stimuli consisted of 120 experimental items (20 sentences * 2 word order conditions * 3 prosodic conditions) and 6 practice items (1 sentence * 2 word order conditions * 3 prosodic conditions).The test phase of the experiment consisted of two blocks: Half of the participants first listened to the mixed ambiguous and marked conditions in the first block (80 trials), followed by the enhanced condition in the second block (40 trials).For the other half of participants, the order was reversed.Block order was varied between participants and experimental items were assembled into eight pseudorandomised lists, in which lists 1-4 started with the mixed ambiguous-marked block and lists 5-8 started with the enhanced block.They were controlled for the following conditions that varied within participants: word order (SVO vs. OVS), prosody (ambiguous vs. marked for the mixed block), target position (left vs. right) and action direction (from left to right vs. from right to left) with maximally three occurrences of the same condition in a row, respectively.Further constraints involved the occurrence of the same sentence (at least five items in between), as well as the occurrences of NP1 and NP2 (as agent: at least three items in between; as patient: at least two items in between).
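The repetition constraint on the pseudorandomised lists (maximally three occurrences of the same condition in a row) can be checked with a short routine. The sketch below is hypothetical: the trial representation as dictionaries and the field names are assumptions, and the additional spacing constraints on sentences and noun phrases are not covered.

```python
def max_run_length(values):
    """Length of the longest run of identical consecutive values."""
    longest = 0
    current = 0
    previous = object()  # sentinel that equals no trial value
    for value in values:
        current = current + 1 if value == previous else 1
        previous = value
        longest = max(longest, current)
    return longest


def list_respects_run_limit(trials, factors, max_run=3):
    """True if no factor level repeats more than max_run times in a row."""
    return all(max_run_length([trial[f] for trial in trials]) <= max_run
               for f in factors)


# Hypothetical field names; the design varies these factors within participants.
FACTORS = ("word_order", "prosody", "target_position", "action_direction")

# Item counts stated in the text: 20 sentences * 2 word orders * 3 prosodic conditions.
N_EXPERIMENTAL_ITEMS = 20 * 2 * 3
N_MIXED_BLOCK = 20 * 2 * 2      # ambiguous + marked
N_ENHANCED_BLOCK = 20 * 2 * 1   # enhanced only
```

A candidate list would be re-shuffled until `list_respects_run_limit` returns `True`; whether the authors used such rejection sampling is not stated in the source.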

Materials for preview
For the preview phase of each item in the eye-tracking study, carrier sentences were recorded by the same trained speaker for every trial. The carrier sentence exemplified in (3) described the depicted figures and actions of the visual stimuli and ensured the recognition of the constituents of the auditory sentences in (1) or (2).
(3) "You can see a camel and a tiger on each of those pictures. The action is kicking."
The carrier sentences were post-processed using PRAAT (Boersma & van Heuven, 2001). The single components for the introduction parts (Auf diesen Bildern sehen Sie jeweils; Die Handlung ist), NP1 and NP2 (for example: ein Kamel und einen Tiger) and the verb (for example: treten) were recorded separately and spliced afterwards. Hence, all carrier sentences involved the same introduction parts at the beginning of the sentence. The auditory stimuli and carrier sentences were all scaled to an intensity level of 70 dB. A silence period of 250 ms before the stimulus as well as 40 ms after the stimulus was added to the audio files to ensure smooth loading into MATLAB (MATLAB, 2021).

Visual stimuli
Visual stimuli were taken from Huttenlauch et al. (2022) and consisted of black-and-white line drawings depicting the events of the auditory stimuli, that is, the agent, patient and action, in both word order conditions.Thus, there was a target picture matching the event mentioned in the sentence and a foil picture showing a thematic role reversal.The visual stimuli of the examples in (1) for SVO and (2) for OVS sentences are provided in Figure 3.

Procedure
Participants first received study information and signed the consent form and data protection declaration. Demographic information was collected and recorded in the study protocol. Next, participants performed the hearing screening and the neuropsychological tests. For the visual-world experiment, they received verbal and written instructions on paper, before they were seated in a comfortable position at approximately 65 cm in front of a computer screen (screen size: 24 inch, resolution: 1920 × 1080 pixels). A Tobii Pro X3-120 eye-tracker (binocular tracking, accuracy: 0.4°, head-movement tolerance at 80 cm: 50 cm × 40 cm) was mounted to the screen, calibrated and aligned appropriately. The experiment was implemented using the PsychToolbox extension (version 3.0.17; Brainard, 1997) for MATLAB (MATLAB, 2021), with the eye-tracker interfaced via the Tobii SDK (version 1.8.0.21). Participants' eye-movements were monitored at a sampling rate of 120 Hz. They completed a 9-point calibration procedure and were again led through the most important instructions on screen. Participants wore headphones to listen to the auditory stimuli. After a practice phase for the first block, participants were given the opportunity to ask questions. In each trial, participants first looked at a central fixation cross for 600 ms. In a subsequent preview phase of 7400 ms, they looked at both the target and foil picture while listening to the corresponding carrier sentence of the presented stimulus. After the central fixation cross re-appeared for 600 ms, participants again looked at the target and foil picture while now listening to the auditory stimulus in the prosodic condition(s) of the first block. Their task was to look at the pictures and listen to the sentence. In addition, participants performed a sentence-picture matching task. They needed to select the picture that matched the auditory sentence as quickly and accurately as possible via button press. To press the button, participants used the index and middle finger of the dominant hand. The maximum response time was set to 8000 ms. After an inter-stimulus interval of 600 ms, the next trial started. Following the first block, there was a short break to preserve attention. At the beginning of the second block, a re-calibration was administered and an additional practice phase for the second block was performed. The whole experiment lasted approximately sixty minutes.
Response accuracy was coded as correct (1) or incorrect (0). Responses exceeding the maximum response time of 8000 ms were treated as incorrect. For five participants (younger: n = 3, elderly: n = 2), one block was excluded from further analyses based on an error rate of 2.5 SD above the mean error rate per block and age group or based on reported misunderstanding of instructions. Younger participants showed a total mean error rate of 4.25%, while elderly participants showed a total mean error rate of 7.22%. Only correct responses were included in further analyses on reaction times and eye-movement data.
Reaction times beyond 2.5 SD from the individual participants' means per word order condition were defined as outliers and excluded from further analyses. As a result, 2.22% of reaction time data for younger and 2.02% for elderly participants were excluded. Reaction times were measured in ms from sentence onset. However, for modelling, reaction times were normalised to sentence offset to account for durational differences due to the different prosodic conditions. Reaction times thus ranged from -650.30 ms to 5367.16 ms from sentence offset. They were re-zeroed relative to the fastest given response to account for responses given before sentence offset and to avoid negative values. Finally, reaction times were log-transformed following the result of the Box-Cox transformation test (λ = −0.30).
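The reaction time preprocessing steps described above can be sketched as a small pipeline. This is a simplified illustration, not the authors' analysis code: it operates on one participant and one word order condition at a time, it re-zeroes relative to the fastest response in the list it receives (the study re-zeroed relative to the fastest response in the data set), and the constant `log_offset_ms`, added so that the fastest response does not become log(0), is an assumption that the source does not mention.

```python
import math
from statistics import mean, stdev


def preprocess_rts(rts_from_onset_ms, sentence_durations_ms,
                   sd_cutoff=2.5, log_offset_ms=1.0):
    """Outlier removal, normalisation to sentence offset, re-zeroing, log.

    Both input lists belong to one participant and one word order
    condition and are aligned trial by trial (correct responses only).
    """
    m = mean(rts_from_onset_ms)
    sd = stdev(rts_from_onset_ms)
    # 1. Drop trials beyond sd_cutoff standard deviations from the mean.
    kept = [(rt, dur)
            for rt, dur in zip(rts_from_onset_ms, sentence_durations_ms)
            if abs(rt - m) <= sd_cutoff * sd]
    # 2. Express reaction times relative to sentence offset.
    from_offset = [rt - dur for rt, dur in kept]
    # 3. Re-zero on the fastest response so that no value is negative;
    #    log_offset_ms (an assumption) keeps the minimum above zero.
    fastest = min(from_offset)
    rezeroed = [rt - fastest + log_offset_ms for rt in from_offset]
    # 4. Log-transform.
    return [math.log(rt) for rt in rezeroed]


# Hypothetical trials: ten ordinary responses and one very slow response.
rts = [2000.0] * 10 + [8000.0]
durations = [1500.0] * 9 + [2100.0, 1500.0]
log_rts = preprocess_rts(rts, durations)
```

In the hypothetical data, the 8000 ms response is removed as an outlier, and the response given 100 ms before sentence offset becomes the zero point of the re-zeroed scale.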
Eye-movements were recorded to examine fixation proportions towards the target picture. They were post-processed using the package eyetrackingR (Dink & Ferguson, 2015) as well as a fixation detection algorithm adapted from Van der Lans et al. (2011), which captured within-fixation variability like drifts, as well as saccades and blinks in the data. Invalid gazes, saccades and blinks were thus excluded from the data set. As a result, 83.38% of the data for younger and 79.04% of the data for elderly participants were treated as fixations and entered further analyses. Visual areas of interest (AOIs) consisted of the two black-and-white line drawings on the screen. Non-AOI looks were treated as missing data. Fixation data from both eyes were used to examine gaze positions. Missing data points from both eyes were treated as trackloss and removed, resulting in 7.49% of overall trackloss in the fixation data set. Previously calculated trigger points were implemented in MATLAB (MATLAB, 2021) in the form of onset times of constituents in relation to sentence onset and served to define the following auditory regions of interest (ROIs) for a time-window analysis of fixation proportions: NP1, verb, adverb, NP2. The time window after sentence offset until the participants' response was additionally defined as a silence period. Time windows were shifted 200 ms forward to account for the time to make a saccade in response to the auditory stimulus (Saslow, 1967). Conducting a time window analysis was motivated by our research questions on potential effects of our independent variables occurring in the respective time windows (interest periods) following previous studies on disambiguation (for an overview: see Ito & Knoeferle, 2023).
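The definition of the shifted time windows and the computation of target fixation proportions can be sketched as follows. The data layout (time-stamped samples labelled "target", "foil" or `None`), the function names and the decision to leave the response time itself unshifted are assumptions made for this illustration; the study itself used eyetrackingR and MATLAB.

```python
def shifted_windows(onsets_ms, sentence_offset_ms, response_ms, shift_ms=200):
    """Build time windows from constituent onsets (relative to sentence onset).

    onsets_ms maps region names to onsets in temporal order, e.g. NP1,
    verb, adverb, NP2. Each window runs to the next onset; the silence
    period runs from sentence offset to the response. Window boundaries
    are shifted forward by shift_ms, except the response time, which is
    left unshifted (an assumption of this sketch).
    """
    names = list(onsets_ms)
    bounds = [onsets_ms[name] + shift_ms for name in names]
    bounds.append(sentence_offset_ms + shift_ms)
    windows = {name: (bounds[i], bounds[i + 1]) for i, name in enumerate(names)}
    windows["silence"] = (sentence_offset_ms + shift_ms, response_ms)
    return windows


def target_proportion(samples, window):
    """Proportion of AOI samples on the target within one window.

    samples is a list of (time_ms, aoi) tuples; aoi is None for non-AOI
    looks or trackloss, which are treated as missing data.
    """
    start, end = window
    looks = [aoi for t, aoi in samples if start <= t < end and aoi is not None]
    if not looks:
        return None
    return looks.count("target") / len(looks)


# Hypothetical trial: onsets, offset and response in ms from sentence onset.
windows = shifted_windows({"NP1": 0, "verb": 600, "adverb": 1000, "NP2": 1400},
                          sentence_offset_ms=2000, response_ms=3000)
samples = [(850, "target"), (900, "foil"), (1000, None),
           (1100, "target"), (1250, "foil")]
verb_proportion = target_proportion(samples, windows["verb"])
```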
For statistical analyses, generalised linear mixed models with a binomial error distribution were fit on response accuracy data as well as on fixation proportions separately for every time window. Linear mixed models with a Gaussian error distribution were fit on reaction time data of correct responses. Word order, prosody, age group and their interactions were included as predictors, while block order was additionally included as a covariate. Contrasts for word order, age group and block order were effect coded. For prosody, a custom contrast was coded where the intercept reflected the grand mean of all three prosodic conditions, the first slope coded the difference between ambiguous and marked prosody and the second slope coded the difference between ambiguous and enhanced prosody. Models further comprised random effects of word order, prosody and age group with correlated varying intercepts and slopes by participant (word order, prosody) and by item (word order, prosody, age group). Model reductions were performed following the concept of parsimony in mixed models (Matuschek et al., 2017). Model comparisons were performed using the likelihood ratio test and residuals were checked for their distributional properties. For the coded contrasts, coefficient estimates, standard errors, and z- or t-values are reported. In addition, p-values are provided indicating statistical significance. The resulting p-values of the generalised linear mixed models on fixation proportions for each time window were corrected for multiple comparisons using the Bonferroni method. In the following, reported results focus on main effects and interactions in accordance with their relevance to the research questions.
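The custom prosody contrast can be made concrete with a small worked example. A contrast matrix with the described interpretation is obtained by inverting a hypothesis matrix whose rows state what each coefficient should estimate. The sketch below verifies one such pair of matrices with exact fractions; the sign convention (marked minus ambiguous, enhanced minus ambiguous) is an assumption, as the source does not state the direction of the differences, and the cell means are hypothetical.

```python
from fractions import Fraction as Fr

LEVELS = ("ambiguous", "marked", "enhanced")

# Hypothesis matrix: each row defines what one coefficient estimates.
HYPOTHESES = [
    [Fr(1, 3), Fr(1, 3), Fr(1, 3)],  # intercept: grand mean of the three conditions
    [Fr(-1), Fr(1), Fr(0)],          # slope 1: marked minus ambiguous (assumed sign)
    [Fr(-1), Fr(0), Fr(1)],          # slope 2: enhanced minus ambiguous (assumed sign)
]

# Contrast matrix (rows = LEVELS, columns = intercept, slope 1, slope 2):
# the inverse of HYPOTHESES, which is what would be assigned to the factor.
CONTRASTS = [
    [Fr(1), Fr(-1, 3), Fr(-1, 3)],
    [Fr(1), Fr(2, 3), Fr(-1, 3)],
    [Fr(1), Fr(-1, 3), Fr(2, 3)],
]


def matmul(a, b):
    """Plain matrix product for small lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def coefficients(cell_means):
    """Coefficients implied by the hypothesis matrix for given cell means."""
    return [sum(w * m for w, m in zip(row, cell_means)) for row in HYPOTHESES]


identity_check = matmul(HYPOTHESES, CONTRASTS)
# Hypothetical cell means for ambiguous, marked and enhanced prosody.
example = coefficients([60, 63, 69])
```

For the hypothetical means 60, 63 and 69, the intercept is the grand mean 64 and the slopes are the two differences from the ambiguous condition, 3 and 9, which is the interpretation described in the text.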

Response accuracy
Means and standard errors of response accuracy in younger and elderly participants are shown in Table 1. For both younger and elderly participants, mean response accuracy showed performances at ceiling for all three prosodic conditions (ambiguous, marked, enhanced) with only slight differences between conditions. Overall, response accuracy was higher and standard errors tended to be lower in the SVO compared to the OVS condition.
This was confirmed by the results of the maximal generalised linear mixed model on response accuracy (4) that showed a significant main effect of word order indicating overall higher response accuracy for SVO vs. OVS structures with an estimated difference of 2.62%.In addition, there was a significant main effect of age group indicating overall higher response accuracy for younger vs. elderly participants with an estimated difference of 1.69%.We did not find significant main effects of prosody, block order or any statistically significant interactions (see Table 2).Given the ceiling effects for all three prosodic conditions, results on response accuracy must, however, be interpreted with caution.

Reaction times
Reaction times for younger and elderly participants are visualised in Figures 4 and 5. The maximal linear mixed model on reaction times did not converge and showed a singular fit. Therefore, it was reduced stepwise in its random effects structure following the procedure described by Matuschek et al. (2017). The final model (5) explained variance equally well in comparison to the maximal model (χ²(92) = 79.63, p = 0.82) and showed a better model fit in terms of a smaller AIC value. Hence, the final model was preferred over the maximal model as the less complex one.
(5) log(reaction times) ∼ word order * prosody * age group + block order + (1 + word order + prosody (ambiguous-enhanced
Results showed a significant main effect of word order indicating overall faster reaction times for SVO vs. OVS structures with an estimated difference of 328.84 ms. In addition, there was a significant main effect of age group indicating overall faster reaction times of younger vs. elderly participants with an estimated difference of 573.71 ms. Moreover, a significant main effect of block order indicated overall faster reaction times of participants when starting with the mixed ambiguous-marked block in comparison to the enhanced block. However, there were no significant main effects of prosody or any statistically significant interactions (see Table 3).

Eye-movements
Fixation proportions to the target picture for correct responses of younger and elderly participants are shown in Figure 6 separated by the following time windows: NP1, verb, adverb, NP2 and the silence period.
In the following, results on statistical comparisons of fixation proportions to the target picture are reported separately for each time window (for a description of a similar procedure on interest periods: see Ito & Knoeferle, 2023). P-values were Bonferroni-adjusted for five comparisons (i.e. according to the number of regions of interest). The maximal generalised linear mixed model on fixation proportions for each time window is exemplified in (6).
For the NP1 region, the maximal generalised linear mixed model on fixation proportions did not show any statistically significant effects or interactions (see Table 4).
For the verb region, there was a significant main effect of word order which was qualified by a statistically significant interaction between word order and prosody: ambiguous-enhanced.This interaction effect indicated a larger difference in fixation proportions between SVO and OVS structures in the enhanced compared to the ambiguous condition.There were no significant main effects of prosody, age group or block order and no further statistically significant interactions (see Table 5).
For the adverb region, there was a significant main effect of word order with higher fixation proportions to the target picture for SVO vs. OVS structures (estimated difference: 47.36%).There were no significant main effects of prosody, age group, block order and no statistically significant interactions (see Table 6).
For the NP2 region, we found a significant main effect of word order with higher fixation proportions to the target picture for SVO vs. OVS structures (estimated difference: 47.39%).In addition, there was a significant main effect of prosody: ambiguous-enhanced indicating more fixations to the target picture in the enhanced compared to the ambiguous condition (estimated difference: 7.20%).There were no significant main effects of prosody: ambiguous-marked, age group or block order and no statistically significant interactions (see Table 7).
For the silence period, we found a significant main effect of word order with higher fixation proportions to the target picture for SVO vs. OVS structures (estimated difference: 5.23%).There were no further statistically significant effects or interactions (see Table 8).
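The Bonferroni adjustment applied to the per-window results above amounts to multiplying each p-value by the number of time windows and capping the result at 1. A minimal sketch with hypothetical p-values (not values from Tables 4 to 8):

```python
def bonferroni(p_values, n_comparisons=None):
    """Bonferroni-adjust p-values; defaults to the number of p-values given."""
    n = len(p_values) if n_comparisons is None else n_comparisons
    return [min(1.0, p * n) for p in p_values]


# Five comparisons, one per region: NP1, verb, adverb, NP2, silence period.
N_WINDOWS = 5
adjusted = bonferroni([0.004, 0.03, 0.5], n_comparisons=N_WINDOWS)
```

Under this correction, an unadjusted p-value has to fall below 0.05 / 5 = 0.01 to remain significant at the conventional 5% level.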
Post-hoc, we checked for potential learning effects by comparing responses made earlier (first half of trials in each block) or later (second half of trials in each block) in the course of the experiment.Notably, adding learning (early vs. late) as a predictor to the maximal generalised linear mixed models on fixation proportions did not affect the previously found significance patterns pointing to stable effects.In addition, we found a significant main effect of learning for the NP2 region with higher fixation proportions to the target picture for late vs. early responses (estimated difference: 4.26%).We further found several statistically significant interactions in different time windows.However, since this is only an exploratory analysis, we concentrate on effects that were repeatedly encountered: Firstly, we found a significant interaction between prosody: ambiguous-marked and learning in the NP1, verb, and NP2 regions as well as in the silence period.In the NP1 and verb regions, the interaction effects indicated higher fixation proportions to the target picture for late vs. early responses for marked prosody.In contrast, lower fixation proportions to the target picture for late vs. early responses were  found for ambiguous prosody.In the NP2 region and silence period, ambiguous prosody showed the reversed pattern.Secondly, there was a significant three-way interaction between prosody: ambiguous-marked, age group and learning in all time windows.Up to the NP2 region, the interaction effects pointed to higher fixation proportions to the target picture in late vs. early responses for marked prosody, but only in younger participants while elderly participants showed the opposite pattern.For ambiguous prosody, fixation proportions were more comparable in early and late responses for both participant groups, except for the silence period, where elderly participants demonstrated higher fixation proportions in late vs.

Discussion
The present study examined the effects of prosodic cues on the interpretation of locally ambiguous German SVO and OVS sentences.We investigated (i) the effect of word order in locally ambiguous SVO and OVS structures, (ii) the effect of prosody and a potentially facilitative role of marked and enhanced prosody on the interpretation of structural ambiguities in German, and (iii) the effect of age and variability in prosodic cue processing across the adult lifespan.

Word order effect
We examined local ambiguity resolution in German SVO and OVS sentences and found significant main effects of word order for response accuracy, reaction times and fixation proportions over time from the verb region to the silence period. In line with our hypotheses, we thus found higher response accuracy, faster reaction times and higher fixation proportions to the target picture for SVO compared to OVS structures. This indicates a strong subject-before-object preference in the interpretation of these auditory sentences in line with previous findings on the subject-first bias in German (Gorrell, 2000; Hemforth & Konieczny, 2000). That is, participants initially interpreted the locally ambiguous structures as SVO sentences and assigned the thematic role of the agent to the ambiguous NP1 in both sentence structures. Consequently, when encountering the morpho-syntactic cues at the unambiguous NP2 in OVS sentences, participants needed to revise their initial misinterpretation. Hence, the significant main effects of word order support the assumption of higher processing demands attributable to a costly re-analysis in OVS structures in both age groups (Bornkessel et al., 2002b; Grewe et al., 2007; Hanne et al., 2015). The main effect of word order for fixation proportions to the target picture from the beginning of the verb region is further consistent with findings by Weber et al. (2006): For SVO structures, the authors reported anticipatory eye-movements towards the suitable patient prior to the morpho-syntactic point of disambiguation at NP2, but this effect was not found for OVS structures, indicating the absence of an object-before-subject preference. However, making use of morpho-syntactic case marking information at NP2, all participants were able to disambiguate the locally ambiguous SVO and OVS structures in the present study, which is apparent from both age groups' performances at ceiling in the sentence-picture matching task. The integration of morpho-syntactic cues for syntactic disambiguation is in line with previous studies (e.g. Knoeferle et al., 2005; Kröger et al., 2017).

Prosodic effect
We further examined whether marked and/or enhanced prosody facilitates local ambiguity resolution in comparison to the ambiguous prosody condition, that is, whether participants can make use of prosodic cues for thematic role assignment and sentence interpretation. For marked prosody, our findings are only partially in line with the formulated hypotheses. In contrast to the study of Weber et al. (2006), who reported a rapid integration of prosodic cues for syntactic disambiguation, our results on response accuracy, reaction times and fixation proportions did not show statistically significant differences between the ambiguous and marked prosody condition. Visual inspections of the mean fixation proportions to the target picture indicated a tendency of marked prosody facilitating the interpretation (Kröger, 2018; Weber et al., 2006). Thus, in the present study, marked prosody did not facilitate the interpretation of SVO or OVS structures, nor did it lead to a reduction of the subject-first bias at very early points in time, that is, prior to the morpho-syntactic point of disambiguation at NP2. For enhanced prosody, our findings on response accuracy and reaction times did not show statistically significant differences in comparison to the ambiguous prosody condition either. However, statistical analysis of fixation proportions showed a statistically significant interaction between word order and the ambiguous and enhanced prosodic contrast in the verb region. This interaction indicated a larger difference in fixation proportions between SVO and OVS structures in the enhanced prosody condition. Even though this difference seems to be driven by the subject-first bias, enhanced prosody might still have played a facilitative role in the interpretation of SVO structures prior to the morpho-syntactic point of disambiguation. Since the verb region is one of the critical regions of interest following the prosodic cues in the NP1 region and prior to the morpho-syntactic point of disambiguation at NP2, our results point to beneficial effects of enhanced prosody in SVO structures. Moreover, a significant main effect of the ambiguous and enhanced prosodic contrast in the NP2 region supports beneficial effects of enhanced prosody in general. Hence, our results demonstrate an additive use of morpho-syntactic and enhanced prosodic cues in both age groups in line with findings by Henry et al. (2017), who found cue additivity of prosodic and morpho-syntactic cues in unambiguous sentences. Our data extend those findings to enhanced prosody, which seems to add to morpho-syntactic cues in the interpretation of locally ambiguous structures. The beneficial effects of enhanced prosody are thus in line with our hypotheses and previous studies on hearing impairments (e.g. Grant, 1987) and infant-directed speech (e.g. Fernald & Simon, 1984).
There are several possible explanations why we did not find support for such beneficial effects of marked prosody, for instance related to the specific study design and/or stimulus characteristics, inter-individual variability among listeners or a potential adaptivity of prosodic cues: Firstly, our experimental design involved a sentence-picture matching task with two pictures, one of which showed a thematic role reversal, while previous studies examining prosodic effects on local ambiguity resolution used visual scenes including agents, patients as well as distractor items or role fillers without an explicit matching task (Kröger, 2018; Weber et al., 2006). In contrast to previous studies, we focused our analysis on fixation proportions to the target vs. foil picture rather than on agent vs. patient fixations in line with other visual-world studies (e.g. Hanne et al., 2015; Schumacher et al., 2015; Wendt et al., 2014). The preview phase ensured recognition of the constituents so that participants were able to attribute the target and foil pictures to SVO and OVS structures, respectively, before encountering the sentence. Using pictures with reversed thematic roles as foils may even model processes of thematic role assignment for sentence interpretation more closely. As for the f0 contours in our materials, our trained speaker based her productions on intonation patterns that were produced by one naive speaker for syntactic disambiguation of SVO and OVS structures in the study of Huttenlauch et al. (2022). Accordingly, we defined the f0 contours of marked prosody as an L*+H accent on NP1 in SVO structures in contrast to an L+H* accent on NP1 in OVS structures. As a result, the marked f0 contours of OVS structures in the present investigation resembled those examined in previous studies (Henry et al., 2017, 2022; Kröger, 2018; Kröger et al., 2017; Weber et al., 2006), while the f0 contours of SVO structures showed different intonation patterns, that is, a missing H* accent on the verb, compared to Weber et al. (2006) or Kröger (2018). However, Weber et al. (2006) argued that there were no actions depicted in the visual-world scene in their experimental design, such that the verb referred to new information and sentence stress on the verb was therefore appropriate. This was not the case in the present study, where the visual stimuli depicted the agent, patient as well as the action of the scene, which were additionally named in the preview phase. Huttenlauch et al. (2022) used the same visual materials in their production study. Nevertheless, we have to keep in mind that other speakers in the study of Huttenlauch et al. (2022) showed only minor use of pitch accents or used additional durational cues to differentiate between the two word order conditions. This points to a high degree of inter-speaker variability with regard to f0 contours of German SVO and OVS sentences and raises the question of whether prototypical OVS intonation patterns exist at all (for a discussion on prosody in object-initial German sentence structures: see Wierzba, 2020). At least for the examined speaker-specific SVO and OVS intonation patterns, the present study did not find support for reliable decoding of syntactically marked intonation patterns in younger and elderly participants.
Secondly, and closely related to inter-speaker variability, listeners in turn vary in how reliably they can decode intonation patterns across speakers (Cangemi et al., 2015). Post-hoc exploratory visual inspections of the by-participant mean fixation proportions to the target picture revealed a high degree of inter-individual variability among listeners. Nevertheless, it was possible to define subgroups of listeners from both age groups based on similar fixation patterns: Subgroup 1 (no-prosody-group; younger: n = 7, M = 21.29, SD = 2.21; elderly: n = 4, M = 72.25, SD = 4.19) can be characterised by a strong subject-first bias with no effects of prosody on sentence interpretation (i.e. no difference in fixation proportions between the three prosodic conditions). Subgroup 2 (prosody-group; younger: n = 12, M = 25.17, SD = 4.49; elderly: n = 21, M = 69.43, SD = 5.19) can also be characterised by a subject-first bias with, however, potential effects of marked and/or enhanced prosody on the interpretation of SVO structures, namely higher fixation proportions to the target picture in these two prosodic conditions compared to the ambiguous prosody condition. Moreover, this subgroup showed potential prosodic effects in OVS structures with an increase in fixation proportions prior to the morpho-syntactic point of disambiguation at NP2 and/or an additive use of morpho-syntactic and prosodic cues at NP2. Notably, however, individuals of this subgroup revealed a high degree of variability. Subgroup 3 (waiting-group; younger: n = 11, M = 23.73, SD = 3.9; elderly: n = 5, M = 72.2, SD = 6.53) applied a possible waiting strategy by fixating both the target and foil picture to the same extent up to the disambiguating determiner at NP2. That is, this subgroup did not demonstrate a subject-first bias but waited for the integration of morpho-syntactic cues to derive an interpretation of the auditory sentence. A similar top-down "wait-and-see" strategy was found by Hanne et al. (2015) in individuals with acquired language disorders (aphasia). Individuals with aphasia relied on a "wait-and-see" strategy until unambiguous morpho-syntactic cues were available. Only then did they initiate prediction of upcoming syntactic structures, which led to an overall delay in cue processing. In the present study, subgroup 3 (including healthy younger and elderly individuals) might have relied on a similar waiting strategy to minimise their prediction error. In sum, exploratory visual inspections of fixation patterns in the three subgroups highlight the importance of investigating variability in the processing of linguistic cues in general in future studies.
Lastly, another possible explanation for the null effect of marked prosody could be the presentation of, on the one hand, blocks with mixed ambiguous and marked prosody trials, and on the other hand, enhanced prosody blocks without an ambiguous prosody condition. Therefore, prosodic cues were consistently available in the enhanced prosody block but not in the mixed ambiguous-marked block. According to Henry et al. (2017), prosodic cues only show beneficial effects on the parsing process when they are consistently available, reflecting adaptive cue processing in the course of a constant assessment of utility. Consequently, our results on marked and enhanced prosody need to be interpreted with caution and the differential block design can be considered as a potential limitation of the present study (see Note 2). Nevertheless, we found support for cue adaptivity in our data comparable to Henry et al. (2017), namely overall significantly faster reaction times when starting with the mixed ambiguous-marked block followed by the enhanced block in comparison to the reversed block order. Half of the participants started with the mixed ambiguous-marked block, in which the prosodic cues of the marked condition were not consistently available and might have been added to the parsing process only until the parser no longer considered them useful. But the increase in prosodic cue strength (i.e. higher level of cue utility) in the following enhanced block might have led to the parser's immediate adaptation and integration of enhanced prosodic cues. The other half of participants were confronted with the reversed block order (i.e. they started with the enhanced block). They might have used the consistently available enhanced prosodic cues in the first block, but in the following mixed ambiguous-marked block, prosodic cues in the marked condition were immediately considered less useful (i.e. lower level of cue utility). Therefore, the parser might not have added them to the parsing process at all, which finally resulted in overall slower reaction times in this block order.

Age effect
We additionally investigated variability in prosodic cue processing between a younger and an elderly group of participants. We found significant main effects of age group for response accuracy and reaction times. Thus, younger participants showed overall higher response accuracy compared to elderly participants, although these differences were very small since response accuracy was close to ceiling in both age groups. In line with our hypotheses and Titone et al. (2006), the two age groups did not differ in response accuracy between prosodic conditions. Our results further showed overall faster reaction times in younger vs. elderly participants, probably based on age-related differences in general processing speed. We did not find support for statistically significant interactions between prosody and age group that would qualify these age differences in reaction times as being related to differential prosodic cue processing. Hence, the formulated hypotheses and previous findings by Titone et al. (2006) were not supported. Rather, we demonstrate comparable prosodic cue processing in both age groups. This was further illustrated by our findings on fixation proportions to the target picture, where we did not find any statistically significant interactions with age group.

Limitations and outlook
In the following, we would like to point to possible limitations of our study design and/or stimulus characteristics, discuss post-hoc analyses and highlight outlooks on future research. Firstly, our differential block design with mixed ambiguous and marked prosody trials in one block as well as enhanced prosody trials in the other block might limit our results on marked and enhanced prosody and serves as one explanation for the null effect of marked prosody. An additional explanation might be related to our within-participant design. Since participants were presented with the same visual stimulus multiple times in different prosodic conditions, this might have led to strategic fixation patterns. However, post-hoc exploratory visual inspections of the mean fixation proportions to the target picture revealed no systematic differences between fixation proportions of first displays of our stimuli in the two word order conditions compared to the full data set. Hence, we did not find support for effects of marked prosody in first displays or strategic fixation patterns in general.
Secondly, another shortcoming of the present study could be the fact that our auditory stimuli contained local ambiguities that were morpho-syntactically disambiguated at NP2. Hence, participants would in principle be able to perform the task by solely waiting for the morpho-syntactic cues at the point of disambiguation. This was supported by post-hoc exploratory visual inspections of the mean fixation proportions to the target picture that revealed one subgroup of participants (Subgroup 3, waiting-group), who applied such a waiting strategy to derive an unambiguous interpretation of the sentences. Therefore, we discuss two different alternatives to further explore the effects of prosodic cues on ambiguity resolution: (a) global ambiguity and (b) sentence completion. Alternative (a) might be to examine global ambiguity resolution in German. For the interpretation of globally ambiguous structures, participants would need to rely on prosody only. However, participants in a pilot study by Huttenlauch et al. (2022) showed tremendous difficulties in producing f0 contours to distinguish globally ambiguous SVO and OVS sentences like Die[NOM/ACC-f.] Mutter sucht das[ACC/NOM-n.] Kind ("The mother looks for the child."). In fact, some participants refused to perform the task, since the OVS interpretation of the sentences was inexplicable to them. Alternative (b) might be to conduct a sentence completion task with locally ambiguous SVO and OVS sentences in which participants would listen to the auditory stimuli only up to the disambiguating determiner at NP2 (e.g. Das Kamel tritt nun) and would need to decide which of two possible options for NP2 would best complete the sentence. Both options (for example: den Tiger vs. der Tiger) could be presented together with the respective visual stimuli to record eye-movements. In this way, participants would need to use prosodic cues to resolve the task, but would still be able to relate to both the SVO and OVS interpretation of the sentences.
Thirdly, since the marked condition was recorded in reference to productions of only one speaker in the study of Huttenlauch et al. (2022), our stimuli might have been too weak to support prosodic differentiation between the two word order conditions. Notably, this speaker was the only person who naturally and consistently produced intonation patterns to distinguish SVO and OVS structures. Moreover, our preceding validation study aimed to examine to what extent listeners would be sensitive to these intonation patterns and found a moderate discriminability of SVO and OVS sentences. In the eye-tracking study, post-hoc exploratory visual inspections of the mean fixation proportions to the target picture revealed one subgroup of participants (Subgroup 2, prosody-group), who showed potential effects of marked and/or enhanced prosody on the interpretation of SVO and OVS structures even though there was a high degree of variability. Hence, at least for some participants, the differentially produced f0 contours in marked and enhanced prosody provided sufficient information to distinguish SVO and OVS structures. By further investigating potential learning effects and comparing early vs. late responses, we found additional support for our stimulus characteristics. Results revealed interesting temporal dynamics of fixation patterns in the course of the experiment and dependent on the age group, suggesting an improved prosodic cue use for sentence interpretation over time (i.e. higher mean fixation proportions to the target picture for marked prosody in the second half of trials compared to the first half in each block). We thus found support for learning effects in using prosodic cues for ambiguity resolution and for adaptations to the examined speaker-specific intonation patterns in the course of the experiment.
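The early vs. late comparison described above can be illustrated with a minimal sketch. It is not the analysis code of the present study: the record fields (block, trial_index, target_prop) and the function name half_means are hypothetical, and the sketch only computes descriptive means per half of each block, without any statistical model.

```python
from collections import defaultdict
from statistics import mean


def half_means(trials):
    """Mean target fixation proportion for the first and second half of each block.

    trials: list of dicts with the hypothetical keys
        'block'        block label (e.g. 'mixed' or 'enhanced'),
        'trial_index'  position of the trial within its block,
        'target_prop'  fixation proportion to the target picture (0..1).
    Returns a dict mapping (block, 'first' | 'second') to the mean proportion.
    Halves without any trials are omitted.
    """
    by_block = defaultdict(list)
    for trial in trials:
        by_block[trial["block"]].append(trial)

    result = {}
    for block, items in by_block.items():
        # Order trials as presented, then split at the midpoint of the block.
        items = sorted(items, key=lambda t: t["trial_index"])
        mid = len(items) // 2
        halves = {"first": items[:mid], "second": items[mid:]}
        for label, half in halves.items():
            if half:  # skip an empty half (e.g. a block with a single trial)
                result[(block, label)] = mean(t["target_prop"] for t in half)
    return result
```

A higher value for the second half than for the first half of a block would correspond to the descriptive pattern reported above; whether such a difference is reliable would still need to be tested with an inferential model.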
Lastly, following previous studies on disambiguation, we conducted a time window analysis using linear mixed modelling to test for our effects of interest. However, additional analysis approaches would serve to further assess the temporal emergence of the found effects, for instance growth curve analysis, bootstrapped differences of time-series or generalised additive modelling (for an overview, see Ito & Knoeferle, 2023).
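As an illustration of the aggregation step that precedes a time window analysis, the following minimal sketch bins eye-tracking samples into named windows and computes the proportion of samples on the target picture per window. The function name window_proportions, the sample format and any window boundaries passed to it are assumptions made for illustration; the window names in the usage note follow the regions reported here (NP1, verb, adverb, NP2, silence period), but the actual boundaries depend on the individual stimuli. Fitting the mixed model itself is not shown.

```python
def window_proportions(samples, windows):
    """Proportion of samples on the target picture within each time window.

    samples: iterable of (time_ms, on_target) pairs, where on_target is a bool.
    windows: dict mapping a window name to a half-open interval (start_ms, end_ms).
    Returns a dict mapping each window name to the proportion of its samples
    that were on the target, or None if no sample fell into the window.
    """
    counts = {name: [0, 0] for name in windows}  # [on-target samples, all samples]
    for time_ms, on_target in samples:
        for name, (start, end) in windows.items():
            if start <= time_ms < end:
                counts[name][0] += int(on_target)
                counts[name][1] += 1
                break  # windows are assumed not to overlap
    return {
        name: (hits / total if total else None)
        for name, (hits, total) in counts.items()
    }
```

In use, windows would map each region (for example "NP1" or "verb") to its onset and offset relative to sentence onset, and the resulting per-trial proportions would serve as the dependent variable of the model for that window.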
To conclude, the present study examined the effects of prosodic cues on guiding parsing decisions in the interpretation of locally ambiguous German SVO and OVS sentences in younger and elderly individuals. Our results replicate findings of a strong subject-first bias, as well as of rapid integration of morpho-syntactic cues for sentence interpretation. We did not find support for a facilitative role of syntactically marked prosody, reflecting previously reported mixed results of prosodic effects on the interpretation of locally ambiguous structures. However, we extend previous findings to an enhanced prosody condition which indeed showed beneficial effects prior to the point of disambiguation in SVO structures, and added to morpho-syntactic cues in OVS structures in both age groups. The present study further highlights the importance of examining variability in prosodic cue processing in future research.

Notes
1. Please note that no formal GToBI analysis was run here. The prosodic contours resembled the pitch accents of the GToBI system and were described in reference to previous studies. In this way, we aimed to make it easier to compare the auditory stimuli across different studies.
2. We decided to test the ambiguous and marked prosody condition in a mixed block, because the aim of the present study was to investigate syntactically marked prosody in direct comparison to the ambiguous prosody condition. In addition, we aimed to explore potential effects of enhanced prosody, but we refrained from mixing it with the ambiguous prosody condition to avoid an imbalance of prosodically ambiguous trials in the overall experiment. Moreover, we did not include a mixed ambiguous and enhanced prosody block so as not to direct the participants' attention to the prosodic differences between the most opposing conditions.

Figure 1. Prototypical f0 contours (pitch in Hz) for SVO (left) and OVS (right) sentences with bisyllabic nouns (ultima stress) at NP1 in the marked prosody condition as analysed in PRAAT.

Figure 2. Mean and standard error of time-normalised f0 contours (in Hz) of two word order conditions (SVO = dashed line, OVS = solid line) and three prosodic conditions (ambiguous = black, marked = grey, enhanced = light blue) separated by syllable structure and stress patterns.

Figure 3. Visual stimuli for an example of SVO (left) and OVS (right) sentences.

Figure 4. Mean reaction times (in ms) for younger participants in the two word order conditions (SVO, OVS), separated for the three prosodic conditions (ambiguous = black, marked = grey, enhanced = light blue); violin plots show kernel probability density; boxplots show median, interquartile range and outliers.

Figure 5. Mean reaction times (in ms) for elderly participants in the two word order conditions (SVO, OVS), separated for the three prosodic conditions (ambiguous = black, marked = grey, enhanced = light blue); violin plots show kernel probability density; boxplots show median, interquartile range and outliers.
early responses. Mean fixation proportions to the target picture for early and late responses are presented in Figure I in the Appendix. All fixed effects and interactions including learning as a predictor in the different time windows are provided in Table II in the Appendix.

Figure 6. Mean fixation proportions to the target picture (in %) for younger and elderly participants in the two word order conditions (SVO = dashed line, OVS = solid line), separated for the three prosodic conditions (ambiguous = black, marked = grey, enhanced = light blue) and by time window (NP1, verb, adverb, NP2, silence period); whiskers show ±1 standard error.

Table 1. Mean of response accuracy (in % correct) for younger and elderly participants in two word order conditions (SVO, OVS) and three prosodic conditions (ambiguous, marked, enhanced); SE = standard error (in brackets).

Table 2. Fixed effects and interactions of the maximal generalised linear mixed model on response accuracy; SE = standard error.

Table 3. Fixed effects and interactions of the final linear mixed model on reaction times of correct responses; SE = standard error; df = degrees of freedom.

Table 4. Fixed effects and interactions of the maximal generalised linear mixed model on fixation proportions in the NP1 region; SE = standard error; p-values were Bonferroni-adjusted.

Table 5. Fixed effects and interactions of the maximal generalised linear mixed model on fixation proportions in the verb region; SE = standard error; p-values were Bonferroni-adjusted.

Table 6. Fixed effects and interactions of the maximal generalised linear mixed model on fixation proportions in the adverb region; SE = standard error; p-values were Bonferroni-adjusted.

Table 7. Fixed effects and interactions of the maximal generalised linear mixed model on fixation proportions in the NP2 region; SE = standard error; p-values were Bonferroni-adjusted.

Table 8. Fixed effects and interactions of the maximal generalised linear mixed model on fixation proportions in the silence period; SE = standard error; p-values were Bonferroni-adjusted.

SVO structures in the verb and adverb region especially for younger individuals, but this was statistically not confirmed by the data. In OVS structures, neither statistical data analysis nor visual inspections indicated such a tendency in the ambiguous part of the sentence for either of the age groups, which is in line with previous studies that have reported a lack of reliable prosodic effects in the interpretation of locally ambiguous OVS sentences of