Lexical and perceptual biases in speakers’ syntactic choices

Russell Tomlin’s groundbreaking work laid a foundation for the explorations of the interplay between linguistic and nonlinguistic factors determining structural choice during sentence production. The present article investigates the interplay between two such factors, lexical priming and visual cueing, which may prioritize a particular referent of an event for privileged subject status within a sentence. To this end, native speakers of English read the name of either the agent or the patient prior to describing a transitive event. Before the event description, an exogenous visual cue directed their attention to the location of one of the event’s referents. Analysis of the proportion of passive-voice descriptions revealed a strong effect of lexical priming, with more passives after patient-related lexical primes, but no effect of visual cueing. This highlights potential constraints on previously documented visual cueing effects on speakers’ structural choices.


Introduction
When people talk, they need to organize their sentences in order to facilitate communication. As a result, the content and the structure of individual sentences need to reflect the world's constantly changing nature accurately and consistently. One part of this process is being able to identify, track, and bind the concepts that these sentences are about. Any given language equips its speakers with an inventory of lexical and structural alternatives to choose from when describing the same semantic content. For example, a speaker of English could describe an event in which a professor is praising a student (a rare or frequent event, depending on perspective) by producing one of the following structural options, among many others:
(1) A professor is praising a student . . .
(2) A student is (being) praised by a professor . . .
(3) A student whom a professor is praising . . .
(4) A professor who is praising a student . . .
What determines which of these available structural alternatives a native speaker of English will select in a given discourse environment? What factors determine the choice between available grammatical constructions, all seemingly equally suited to describe how a professor is praising a student? The current article offers new evidence that helps elucidate the balance between two (out of many) such contributing factors: the privileged access to one of the referents' names (lexical priming) and the visual bias to one of these referents at the point of event apprehension (perceptual cueing).
Existing literature indicates that perceptual, conceptual, and linguistic contexts together determine both the positioning of the constituents in a spoken sentence and the overall structural frame selected by the speaker (Myachykov et al., 2012). For example, research within the syntactic or structural priming tradition convincingly demonstrates that one of the strongest determiners of syntactic choice is the preceding syntactic context. The corresponding structural priming effect is arguably one of the most studied and robust psycholinguistic phenomena (e.g., Mahowald et al., 2016). In its most basic form, syntactic priming demonstrates how interlocutors tend to recycle previously encountered and/or produced syntactic frames. In this sense, if a speaker of English has recently used an unrelated sentence in the passive voice (e.g., A burglar is chased by a policeman), they will be more likely to use number 2 out of the options described above (Bock, 1986a, inter alia).
Importantly, syntactic priming is not fully autonomous and immune to other factors determining syntactic choice. For example, several studies demonstrated that repeating the same verb (e.g., Pickering & Branigan, 1998) or noun phrase (Cleland & Pickering, 2003) from prime to target "boosts" the magnitude of the overall syntactic priming effect and that this lexical repetition boost may be cumulative, that is, the more lexical material is shared between the prime and the target, the stronger the overall syntactic priming effect (Scheepers et al., 2017). Moreover, it has been demonstrated that speakers' syntactic biases can be successfully primed by a single word presentation, that is, with no syntactic information overtly available to them. For example, Melinger and Dobel (2005) showed that the presentation of an isolated verb that is strongly biased toward either a prepositional dative or a double object verb frame is sufficient to prime the production of a prepositional dative or double object sentence (using a different verb) in subsequent picture description. Similarly, a study by Ferreira (1994) showed that English speakers are more likely to use a passive voice frame when the verb they have to use foregrounds the "experiencer" of the event (e.g., challenged) rather than its "instigator" (e.g., avoided).
Even more relevant for the current investigation is that several previous reports demonstrated speakers' syntactic choices to be biased by a single noun presentation. One of the first such studies (Flores, 1975) primed Italian participants with words like cat or dog before they described a transitive interaction involving them (e.g., dog chasing cat). When the primed referent was the event's agent, participants were more likely to produce an active voice sentence (The dog is chasing a cat), while in the patient-primed condition, they were more likely to use a passive voice frame (A cat is chased by a dog).
Similarly, Bock (1986b) presented participants with a priming word such as thunder or worship, which preceded a picture of, for example, lightning striking a church. Importantly, the noun primes were semantically related to either lightning (thunder) or church (worship). Participants were more likely to describe this event in the active voice (e.g., Lightning is striking the church) when primed by the noun prime semantically related to lightning (i.e., thunder). In contrast, they were more likely to describe the picture in the passive voice (e.g., The church is being struck by lightning) when they were primed by the word semantically related to church (i.e., worship). These and similar studies (Bates & Devescovi, 1989; Bock & Irwin, 1980; Ferreira & Yoshita, 2003; Flores, 1975; Osgood & Bock, 1977; Prat-Sala & Branigan, 2000) indicate that lexically or semantically primed referents tend to be assigned to the most prominent syntactic slot in the sentence (i.e., the subject role), which determines the resulting choice between an active and a passive voice frame.
The choice between syntactic alternatives can also be influenced by the speaker's attentional focus. In English, if the speaker wants to foreground the agent over the patient, they may select the active voice frame for a spoken sentence describing a transitive interaction between them. Conversely, if they want to foreground the patient over the agent, they may choose to use the passive voice frame. The foundational research in this area was conducted by Russell Tomlin (1995, 1997), who demonstrated that visually salient referents tend to occupy prominent syntactic roles (e.g., subject), which in turn biases the resulting selection between active and passive voice frames. In the original study, participants observed and described an unfolding interaction between two fish, ending with one fish eating the other. An explicit visual cue (e.g., a pointing arrow) accompanied either the agent or the patient fish throughout the trial. The grammatical voice of the participants' descriptions of the target event (one fish eating the other) varied dramatically as a function of the cue direction, such that the cued referent was consistently mapped onto the subject slot (thereby making an active or a passive voice description more likely when the cue was on the agent or the patient, respectively). Studies that followed used updated versions of the perceptual priming paradigm developed by Tomlin. These studies targeted languages other than English (Hwang & Kaiser, 2015; Myachykov et al., 2010; Myachykov & Tomlin, 2008; Pokhoday et al., 2019) as well as syntactic constructions different from simple transitive sentences (e.g., Gleitman et al., 2007; Montag & MacDonald, 2014).
Importantly, some studies also investigated how perceptual priming of syntactic choice interacts with the lexical and syntactic priming effects reviewed above. One such study (Myachykov et al., 2012) used all three priming manipulations. English-speaking participants first read priming sentences in either active or passive voice, half of which contained a notional verb matching the one necessary to describe the subsequent target event presentation (cf. Pickering & Branigan, 1998). Immediately afterward (but before the to-be-described event was presented), participants' attention was directed to the location of either the agent or the patient of the target event. Hence, all three priming manipulations (syntactic, lexical, and perceptual) were sequentially administered before describing the target event. First, in line with Tomlin's (and others') findings, it was found that the speaker's attentional focus on the patient or the agent reliably predicted the choice between active and passive voice. Second, both "abstract" and "lexically boosted" structural priming effects were observed, as participants were more likely to use passive voice descriptions after passive voice primes, and even more so when the prime's notional verb matched the target event. Third, the lexical effect (verb match) interacted with perceptual cueing: Participants were more likely to use passive voice when the prime verb matched the target event and the cue was on the patient. This latter finding indicates a degree of interplay between perceptual and lexical cues in driving syntactic choices. Apparently, the prior presentation of a matching prime verb preactivates syntactic alternatives (e.g., passive) that otherwise have a relatively low activation status (Tomlin, 2014). However, this verb-based priming effect promotes (the typically less preferred) passive voice choice only in the absence of a simultaneously available visual cue to the patient, indicating a degree of competition between lexical and perceptual cues (cf. Melinger & Dobel, 2005; but see Anton-Mendez, 2017).
The literature reviewed above clearly indicates that sentence production unfolds in a rich context where multiple sentence production cues are available simultaneously and that they may compete to bias the speaker's syntactic choice. Some of these cues are linguistic (cf. syntactic and lexical priming) and others are not (cf. perceptual cueing). A comprehensive sentence production architecture needs to account for these interactive properties and model how different priming parameters known to influence the speaker's choice of syntax, including lexical, syntactic, and perceptual effects, interact. Some sentence production models indeed support this view. For example, a dual-path mapping mechanism proposed by Chang (2002) suggests that nonlinguistic priming (such as perceptual salience) and linguistic effects (such as syntactic and lexical accessibility) can affect subject role assignment independently and in parallel, each contributing its individual biases. Even so, and given the paucity of existing studies using simultaneous linguistic and nonlinguistic priming manipulations, several questions remain largely unaddressed.
Importantly for the current study, and as noted above, the results by Myachykov et al. (2012) indicate potential competition between lexical and perceptual cues in their capacity to bias speakers' syntactic choices. One recent study (Stanford & Delage, 2023) reports data largely consistent with this view. This study investigated syntactic voice in French sentence production with typically developing children, children with attention-deficit/hyperactivity disorder and developmental language disorder, and adults without a developmental disorder. Three cueing manipulations were used: no cue, a visual cue to the agent or the patient, and a linguistic cue condition, which involved topicalization of the referent in the sentence. These cue types were presented independently and sequentially. The study primarily focused on the comparisons between the clinical and the typically developing children's groups; however, analysis of the performance in the adult group makes it comparable with the previously reported findings. It clearly indicates that while the perceptual cueing manipulation was successful (in line with previous findings), linguistic cues facilitated the production of passives much more than the visual cues. This finding indicates that while both types of cues may be successful under certain conditions, lexical cues tend to dominate perceptual ones.
A comparison between Myachykov et al. (2012) and Stanford and Delage (2023), beyond the obvious fact that these studies used different languages (English vs. French), motivated the research reported below. First, while the former study used a mixed-cue design whereby a putatively interactive contribution of linguistic and nonlinguistic cues was tested, the latter used a sequential cue presentation protocol. As a result, while both studies highlight an overall higher relevance of the linguistic cues to syntactic choice than the perceptual ones, only the former study addressed the question of whether these cues can be accommodated in parallel. Nevertheless, both studies presented primes/cues sequentially, but the ordering was different: visual before lexical in Stanford and Delage (2023) but lexical before visual in Myachykov et al. (2012).
Second, Myachykov et al. (2012) used verb overlap as a lexical boost manipulation while the latter tested lexical priming (names referring to agent or patient, respectively). Arguably, these two manipulations relate to two different aspects of using lexical information in sentence processing: While the lexical boost effect is crucial for the retrieval of subcategorization frames associated with verb-lemma nodes, the lexical priming manipulation only leads to the activation of the corresponding lexical-semantic representations, most likely in their canonical form (e.g., nominative case for the primed nouns). As a result, both noun- and verb-related lexical boost may bias speakers to retrieve the primed sentence structure from explicit memory, whereas lexical priming may be more closely associated with a lexical accessibility effect per se (Scheepers et al., 2017; Zhang et al., 2022).
To investigate the putatively interactive contribution of lexical and perceptual cues to speakers' syntactic choice, we designed a study that used the elements of both Myachykov et al. (2012) and Stanford and Delage (2023). In our study, native English speakers' syntactic choices were primed by first presenting an agent- or patient-related noun in the center of the screen and then presenting a visual cue (to the location of the agent or the patient) before they described a depicted transitive event. Based on existing findings, we predicted the following. First, we expected to register a main effect of lexical prime, manifested as a higher probability of producing passive voice descriptions when the lexical prime refers to the patient rather than the agent. Second, we expected to observe a visual cueing effect, with participants producing more passive voice sentences when their attention is directed toward the patient rather than the agent. Third, and following the previous findings by Myachykov et al. (2012), we also expected an interaction between these two manipulations, with congruent prime/cue combinations (e.g., patient prime/patient cue) further amplifying the likelihood of selecting the corresponding syntactic frame (i.e., passive).

Participants
We recruited 24 participants (8 men and 16 women) from the population of University of Glasgow undergraduate students. All participants had normal or corrected-to-normal vision. Participants received two course credits as compensation for their participation. The study received ethical approval from the University of Glasgow psychology department.

Apparatus
The experiment was conducted using SR-Research Experiment Builder. Participants' eye movements were monitored using an EyeLink II head-mounted eye-tracker to ensure effectiveness of the cueing manipulation (further details below). The experimental materials were displayed on a 17-inch CRT monitor of a Dell Optiplex GX 270 desktop computer operating at a display refresh rate of 75 Hz. Speech recording was performed using a Sony DAT recorder.

Materials
The experimental materials consisted of black and white line drawings of simple transitive events showing interactions between two easily recognizable characters (see example in Figure 1). The pictures were from a material set that has been used extensively in previous studies on structural priming or related research (e.g., Pickering & Branigan, 1998). There were 15 characters and 8 events, with a total of 32 pictures used. Each picture was controlled for the referents' positioning on the screen (left or right) with an equal number of left-to-right and right-to-left agent-patient orientations across items. We also included 66 filler pictures showing various arrangements of geometrical shapes presented in different regions of the screen (e.g., a square diagonally above and right of a heart). In the filler trials, participants had to describe those visual arrangements by producing a locative sentence describing the shapes and the relationship between them (e.g., "the triangle is above the square"; cf. Myachykov et al., 2012).

Design
This experiment used a 2 × 2 within-subject, between-item design with two factors manipulated as independent variables: lexical prime, with two levels (prime referred to either the agent or the patient in the image), and perceptual cue, also with two levels (cue was either on the agent or the patient). The occurrence of a passive voice target description (1 for yes, 0 for no) was the main dependent variable in our analyses.
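The 2 × 2 layout can be made concrete with a short sketch. The Python snippet below enumerates the four prime × cue cells and assigns each of the 32 items to exactly one cell, which is what a between-item manipulation implies. The ±0.5 numeric codes, the balanced split of eight items per cell, and all names are illustrative assumptions; they are not taken from the study's materials or analysis scripts.

```python
from itertools import product

# Factor levels as described in the Design section.
LEXICAL_PRIME = ("agent", "patient")
PERCEPTUAL_CUE = ("agent", "patient")

# Assumed deviation codes (illustrative only): agent = -0.5, patient = +0.5.
CODE = {"agent": -0.5, "patient": 0.5}


def design_cells():
    """Return the four prime x cue cells with numeric predictors."""
    cells = []
    for prime, cue in product(LEXICAL_PRIME, PERCEPTUAL_CUE):
        cells.append({
            "prime": prime,
            "cue": cue,
            "prime_c": CODE[prime],
            "cue_c": CODE[cue],
            # The interaction predictor is the product of the two codes.
            "prime_x_cue": CODE[prime] * CODE[cue],
        })
    return cells


def assign_items(n_items=32):
    """Map each item to a single cell (between-item), assuming a balanced split."""
    cells = design_cells()
    return {item: cells[item % len(cells)] for item in range(1, n_items + 1)}


if __name__ == "__main__":
    for cell in design_cells():
        print(cell)
```

Because every participant sees every item, each participant contributes data to all four cells (within-subject), while each picture is tied to one prime/cue combination.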

Procedure
First, participants completed a consent form before taking part in the main experiment. Then, a practice session followed that familiarized participants with the names of the referent characters and the events. Participants were instructed to use these names in the upcoming experimental task. In the practice session, the pictures of the characters appeared individually in the center of the screen, followed by the names of the corresponding characters. Participants were instructed to read the word aloud and then press the space bar to receive the next picture. In a similar way, participants also practiced the names of the events: A picture of the event would appear on the screen, followed by the verb written in the center of the screen. Participants examined the picture, read the word aloud, and then pressed the space bar to receive the next picture.
Each experimental trial began with the presentation of a central fixation cross, which participants were instructed to direct their gaze to. Then a noun (lexical prime) was presented in the center of the screen. Participants read the word aloud and then pressed the space bar. The noun prime referred to either the agent or the patient of the subsequent target picture. A perceptual cue (a red circle 25 pixels in diameter) then appeared on the screen in the approximate center of one of the subsequently presented referents (agent or patient). Participants were instructed to direct their gaze to the red dot whenever it appeared on the screen. The progression from the cue display to the target picture display was contingent on fixating the cue for a minimum of 100 ms. The cueing manipulation was therefore 100% effective in the sense that participants always directed their attention to the cued location before the target picture was presented. The target picture was presented in the center of the screen with the visually cued locations corresponding to either the agent or the patient of the event. Participants were instructed to describe the target picture extemporaneously and in a single sentence. After describing the picture, they had to press the space bar to initiate the next trial. The target picture remained on the screen for a maximum of 7,700 ms or until the participant pressed the space bar. There were 32 critical trials and 66 filler trials. Randomization was constrained so that there were always four fillers at the beginning of each session and that each prime-target trial was separated by at least two filler trials.
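The randomization constraints can be illustrated with a small sketch. Assuming that "four fillers at the beginning" means exactly four and that all 98 trials belong to one session, the reported counts leave no slack: 4 + 2 × 31 = 66, so every filler is needed to satisfy the constraints and only the order of items within the fixed slots can vary. The Python code below builds such an order; the function and variable names are hypothetical, and the sketch is not the Experiment Builder implementation used in the study.

```python
import random

N_CRITICAL = 32   # critical trials (from the Procedure)
N_FILLER = 66     # filler trials (from the Procedure)
LEAD_IN = 4       # fillers opening the session (assumed to be exactly four)
MIN_GAP = 2       # minimum number of fillers between two critical trials


def build_trial_order(seed=None):
    """Return a list of (kind, id) tuples satisfying the stated constraints."""
    rng = random.Random(seed)
    critical = [("critical", i) for i in range(1, N_CRITICAL + 1)]
    fillers = [("filler", i) for i in range(1, N_FILLER + 1)]
    rng.shuffle(critical)
    rng.shuffle(fillers)

    # Fillers left over once the lead-in and the minimum gaps are covered.
    spare = N_FILLER - LEAD_IN - MIN_GAP * (N_CRITICAL - 1)
    if spare < 0:
        raise ValueError("not enough fillers to satisfy the constraints")

    # One gap after each critical trial; the last slot is the session tail.
    gaps = [MIN_GAP] * (N_CRITICAL - 1) + [0]
    for _ in range(spare):
        gaps[rng.randrange(len(gaps))] += 1

    order = [fillers.pop() for _ in range(LEAD_IN)]
    for trial, gap in zip(critical, gaps):
        order.append(trial)
        order.extend(fillers.pop() for _ in range(gap))
    return order


if __name__ == "__main__":
    order = build_trial_order(seed=1)
    print(len(order), order[:6])
```

With different counts (for example, more fillers), the `spare` fillers would be scattered at random over the gaps, which is why the sketch keeps that step even though it is a no-op for the numbers reported here.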

Results
We coded participants' verbal responses as one of active voice, passive voice, or other. There were four trials coded as "other" where participants incorrectly mapped the agent and patient referents onto the sentential structure. For example, the picture with a cowboy kicking a boxer would be described as A boxer kicked a cowboy. To be coded as active voice, the description had to use a transitive verb referring to the depicted event, a subject noun phrase referring to the agent, and a direct object noun phrase referring to the patient (e.g., The cowboy is punching the boxer). To be coded as passive voice, the description had to use a passivized transitive verb referring to the depicted event, a subject noun phrase referring to the patient, and a by-phrase referring to the agent (e.g., The boxer is [being] punched by the cowboy). Note that truncated passives (not including a by-phrase) were hardly ever produced since they were explicitly discouraged in the practice session. All remaining responses (including missing responses) were excluded from analysis. Of the 24 (participants) × 32 (items) = 768 trials, 9 trials (1.2%) were affected, which left 759 valid trials for analysis.

Observed passive voice frequencies
Overall, participants produced passive voice descriptions about 28% of the time (i.e., active voice descriptions were clearly preferred). Table 1 shows a breakdown of passive voice descriptions per experimental design cell. What becomes immediately obvious is that passive voice descriptions were much more likely when the patient rather than the agent was lexically primed.

Statistical modeling
Occurrences of passive voice descriptions were analyzed by fitting binary logistic mixed effects models to the data, using the glmer() function of the R package lme4 (Bates et al., 2015). The categorical predictors perceptual cue and lexical prime were mean-centered (deviation coding), and the model used the maximal random effect structure justified by the design (Barr et al., 2013). At the participant level, we included the random intercept term plus random slopes (and appropriate random correlations) for the main effect of perceptual cue, the main effect of lexical prime, and the cue × prime interaction. At the item level, we included only the random intercept term, but no random slopes because the two factors varied between items. Models were fitted using the bobyqa optimizer with a maximum of 50,000 iterations and a tolerance of 0.0005. The p values were determined via likelihood-ratio chi-square model comparisons, each time testing the full model against a reduced model in which a given fixed effect term of interest was excluded. Figure 2 shows model-predicted log odds for occurrences of passive voice descriptions per design cell.
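As a reading aid for the statistics reported in this section, the following Python sketch shows how a likelihood-ratio chi-square statistic with one degree of freedom maps onto a p value, and how a log-odds estimate translates into an odds ratio. It uses only the standard library and relies on the identity that, for 1 df, the chi-square survival function equals erfc(√(x/2)). It is not the authors' analysis code (the models were fitted in R with lme4), and the helper names are made up for illustration. Reading the lexical prime estimate (3.867) as an odds ratio of roughly 48 assumes the two prime levels were coded one unit apart.

```python
import math


def lr_test_p_value(chi_square):
    """p value for a likelihood-ratio chi-square statistic with 1 df.

    If Z is standard normal, Z**2 follows a chi-square distribution with
    1 df, so P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


def odds_ratio(log_odds_estimate):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(log_odds_estimate)


def inverse_logit(log_odds):
    """Convert log odds (the scale of Figure 2) into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Chi-square values reported for lexical prime and perceptual cue.
    for label, stat in [("lexical prime", 19.723), ("perceptual cue", 0.515)]:
        print(f"{label}: chi2(1) = {stat}, p = {lr_test_p_value(stat):.3g}")
    print(f"odds ratio for lexical prime: {odds_ratio(3.867):.1f}")
```

Running the sketch on the reported statistics gives a p value well below .001 for lexical prime and one that rounds to .473 for perceptual cue, in agreement with the values given in the text.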
In line with the descriptive findings, and as suggested by Figure 2, there was a very clear and significant main effect of lexical prime (Est. = 3.867, SE = 0.779, $\chi^2_{LR}(1) = 19.723$, p < .001), due to increased likelihoods of passive voice descriptions when the lexical prime matched the patient rather than the agent in the depicted scene. In contrast, neither the main effect of perceptual cue (Est. = 0.368, SE = 0.510, $\chi^2_{LR}(1) = 0.515$, p = .473) nor the cue × prime interaction (Est. = −0.139, SE = 1.044, $\chi^2$

Discussion
The current study investigated the interplay between lexical and perceptual cues to speakers' syntactic choices in English transitive sentence production by using a mixed-cue design whereby both the lexical and the visual cues are available to the participants in the same trial. To this end, native speakers of English described visually presented transitive interactions between two human characters after the label of either the agent or the patient was lexically primed and following a visual-attentional cue to one of these referents. Overall, only one of our three originally entertained hypotheses was supported by the data. Specifically, we registered a strong main effect of lexical prime: Participants were more likely to generate a passive voice sentence when the patient rather than the agent was lexically primed (and in turn, more likely to generate an active voice sentence when the agent rather than the patient was lexically primed). Our perceptual cueing manipulation led to an observable but not statistically reliable increase in passive voice sentences in the patient prime/patient cue condition, but neither the main effect of perceptual cue nor the interaction between lexical prime and perceptual cue was significant. Overall, the effect of lexical prime on syntactic choice further supports similar findings by Flores (1975) and Bock (1986b). At the same time, there are only two studies that we are aware of that previously attempted to use both perceptual cueing and lexical priming within the same experiment: Myachykov et al. (2012) and Stanford and Delage (2023). The former reported both a main effect of perceptual cueing and an interaction between lexical priming (verb overlap) and perceptual cueing. Using nouns as lexical primes in the mixed-cue design of the current study resulted in a much stronger lexical priming effect than perceptual priming effect. Below we briefly discuss our new findings, highlight their limitations, and suggest directions for future research.

Interaction between lexical priming and the perceptual cueing effects
One important difference between Myachykov et al. (2012) and the current study highlighted in the Introduction is the fact that the former used verbs as lexical boost primes presented within the syntactically priming sentences while the latter used nouns as stand-alone lexical primes. So, the absence of the interaction between the lexical priming manipulation and the effect of the perceptual cueing in our current study may suggest that lexical priming (particularly noun-related) outside of sentence context could be more closely associated with a lexical accessibility effect leading to the preferential order of naming rather than a syntactic priming effect per se.

Prime cue order
Presentation of the noun prime before the visual cue likely resulted in an early commitment to insert the primed referent into the frame as its subject regardless of the cue. In addition to the points discussed above, the influence of perceptual cues is constrained by the relative power of the top-down biases resulting from priming by lexical information.

Agent preference, prime presentation modality, and left-to-right scanning bias
Some recent studies (Esaulova et al., 2020; Pokhoday et al., 2019; Schlenter & Penke, 2022) showed that event orientation and the related left-to-right arrangement of referents play a distinct role in biasing speakers' syntactic choices. Overall, speakers are heavily biased to ascribe a more privileged role to the agents and scan the described events from left to right, with a further preference for agents positioned to the left of the patients (Isasi-Isasmendi et al., 2023; Kemmerer, 2022; Sauppe & Flecken, 2021; Wilson et al., 2022; Zuberbühler & Bickel, 2022). This may be referred to as a canonical event schema. In our study, the lexical primes were written words, which, in addition to a preferential top-down lemma activation due to the primacy of the lexical priming manipulation, may have resulted in a regular establishment of a co-occurring left-to-right attentional bias. This, in turn, may have led to an increase in the proportion of the registered active voice responses on the one hand and to a reduced effectiveness of the right lateral visual cueing manipulation and the associated decrease in the proportion of the registered passive voice responses on the other.

Imbalanced structural choice
The imbalance between active and passive voice may be too large for perceptual cues to overcome, given only 5% frequency of passive voice occurrences in English corpora (Svartvik, 1966). Indeed, Tomlin's original Fish Film study registered a very strong referential cueing effect on the proportion of active- and passive-voice sentences produced by English speakers. However, one needs to consider several features of Tomlin's design that may account for this unusually strong finding. Bock et al. (2004) emphasized the following critical points: the repetitive use of the same event without filler materials, the explicit nature of the visual cue (and related experimental instructions), and the joint presentation of the cue and its target. The studies that followed (see the Introduction) reported consistently much smaller effect sizes, leaving the possibility that in the presence of lexical primes, perceptual cues are not always considered. The use of more balanced structural contrasts (e.g., prepositional dative/double object, symmetrical predicates, and conjoined noun phrases) could give perceptual cues a larger chance of survival in the presence of lexical information. A similar logic is described in Myachykov et al. (2018), and some existing studies indeed document stronger perceptual cueing effects for constructions other than transitive frames. For example, Montag and MacDonald (2014) demonstrated that visual salience operationalized as animate/inanimate referential contrast influenced relative clause production strategies even in the presence of a linguistic manipulation. Similarly, Gleitman et al. (2007) registered perceptual cueing effects of higher magnitude in conjoined noun phrases and symmetrical predicates than in active/passive voice selection.

Future directions
Future studies will need to address the methodological limitations of the current research. First, auditory rather than visual lexical primes could be used in order to control for left-to-right scanning processes. Second, a protocol that co-presents multiple cues could enforce cue competition and avoid commitment to the initially presented cue. Third, varying the order of cue presentation would serve the same aim. Fourth, the noun prime could be displaced to the location (right or left) of the eventual patient or agent, again creating a simultaneous mapping conflict.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Figure 1. Experimental stimulus example. The picture can be described as (e.g.) the policeman is kicking the clown or the clown is being kicked by the policeman.

Figure 2. Box plot of the predicted values from the binary logistic mixed effects model, broken down by perceptual cue (red for agent, blue for patient) and lexical prime (x-axis labels). Predicted values are in log odds units.

Table 1. Absolute observed frequencies of producing a passive voice description by levels of perceptual cue (on the agent or on the patient) and lexical prime (matching the agent or the patient).