Auditory distraction of vocal-motor behaviour by different components of song: testing an interference-by-process account

ABSTRACT The process-oriented account of auditory distraction suggests that task-disruption is a consequence of the joint action of task- and sound-related processes. Here, four experiments put this view to the test by examining the extent to which to-be-ignored melodies (with or without lyrics) influence vocal-motor processing. Using song retrieval tasks (i.e., reproduction of melodies or lyrics from long-term memory), the results revealed a pattern of disruption that was consistent with an interference-by-process view: disruption depended jointly on the nature of the vocal-motor retrieval (e.g., melody retrieval via humming vs. spoken lyrics) and the characteristics of the sound (whether it contained lyrics and was familiar to the participants). Furthermore, the sound properties, influential in disrupting song reproduction, were not influential for disrupting visual-verbal short-term memory—a task that is arguably underpinned by non-semantic vocal-motor planning processes. Generally, these results cohere better with the process-oriented view, in comparison with competing accounts (e.g., interference-by-content).

Extraneous task-irrelevant sound is typically omnipresent. Humans are passive recipients of music, speech, and environmental sound around them and may even be unaware of its presence. However, through the unique, sentinel quality of the human hearing system, our auditory environments are nonetheless processed and therefore carry the potential to influence cognition. Thus, irrespective of how we are otherwise engaged, an irrelevant auditory stimulus can detrimentally impact our goal-directed behaviour. Selective attention must be permeable to enable a necessary response irrespective of current behaviour, for example, to alert us to danger. However, it is this quality that leaves an organism open to the necessary consequence of distraction (Hughes & Jones, 2003; Johnston & Strayer, 2001). Previous exposure to sound, such as listening to music before conducting a task, can sometimes enhance cognitive task performance, an effect attributed to the benefit of increased arousal or positive mood (Perham & Whitney, 2012; Schellenberg, 2005; Thompson et al., 2001). However, a more common consequence of the passive processing of concurrent sound while engaged in mental activity is the disruption of cognitive task performance, compared to quiet (e.g. Colle & Welsh, 1976; Furnham & Strbac, 2002; Perham et al., 2013; Thompson et al., 2011). For example, reading comprehension is impaired by background speech (Martin et al., 1988; Sörqvist et al., 2010) and to a greater extent by lyrical than by non-lyrical music (Martin et al., 1988; Perham & Currie, 2014). In the current study, we explore whether the disruptive effects of task-irrelevant sound, especially different components of song, are jointly characterised by properties of the sound and properties of the mental processes underpinning a focal task (cf. Jones & Tremblay, 2000). Specifically, we address whether the characteristics of sound that disrupt tasks involving semantic memory for music and lexical retrieval differ from those that disrupt tasks requiring visual-verbal short-term memory.
The empirical platform upon which most work on auditory distraction has been undertaken uses a visual-verbal short-term memory task (e.g. Beaman & Jones, 1997; Conrad, 1964). This task requires recall of to-be-remembered information (usually digits or letters) in the exact order of its initial presentation. An appreciable decrease in serial short-term memory performance in the presence of to-be-ignored sound has been termed the "irrelevant sound effect" (ISE, Beaman & Jones, 1997; see e.g. Colle & Welsh, 1976; Hughes et al., 2007; Jones & Macken, 1993; LeCompte, 1996; Salamé & Baddeley, 1982). Several accounts have been put forward to explain the ISE and one purpose of the current study was to augment theoretical debates regarding mechanisms of auditory distraction (e.g. Bell et al., 2019; Hughes, 2014).

Theories of auditory distraction
The classical structuralist view (e.g. Baddeley, 1986) promotes the idea that focal task disruption from irrelevant sound is due to interference caused by the structural similarity between to-be-remembered and to-be-ignored items that cohabit a short-term store or space. Identified as "interference-by-content" accounts, they have numerous manifestations (Neath, 2000; Oberauer & Lange, 2008; Salamé & Baddeley, 1982). In the context of the ISE, for example, this class of accounts suggests that background speech impairs memory for lists of stimuli because the to-be-ignored stimuli (the speech) and the to-be-recalled items are similar in content.
An alternative explanation, identified as "interference-by-process" (Jones et al., 1996; Jones & Tremblay, 2000), proposes that task impairment from auditory distraction arises from concurrent, common mental processes operating during the serial organisation of stimuli. According to this account, disruption produced by irrelevant sound on visual-verbal serial recall is attributable to a clash between two contemporaneous serial-ordering mechanisms: one deliberate and applied to the focal task (serial rehearsal) and one automatic and applied pre-attentively to the irrelevant sound (Jones et al., 1996; Tremblay & Jones, 1998).
Seriation processes must apply to both the focal task and the processing of irrelevant sound for disruption to take place. Disruption therefore happens irrespective of the content of the sound (e.g. whether it comprises language or not) and irrespective of whether it is familiar or not (Jones et al., 1990). In this view (e.g. Jones & Tremblay, 2000), any disruption to focal task performance by irrelevant sound will be determined by factors related to perceptual streaming (see Bregman, 1990).
Changes in the acoustic make-up of the stimulus are important determinants of disruption. Changing sounds (e.g. spoken letters d, w, r, l, or varied pitches A, C, F#, B) are more distracting than repeated sounds (e.g. d, d, d, d, or constant pitch A, A, A, A). This is called "the changing state effect" (Jones, 1993) and plays a central role in the interference-by-process account (Jones & Tremblay, 2000). A series of changing-state speech or non-speech sounds will invariably produce greater disruption of serial recall than a sequence of unvarying material (Jones & Macken, 1993). The fact that the changing-state effect is determined by the sounds' acoustic properties, rather than the sound items' identity or content, aligns with the interference-by-process view but contradicts that of interference-by-content (Neath, 2000; Salamé & Baddeley, 1982).
The concept of interference-by-process is encompassed within an emerging account of short-term memory phenomena that has been coined the perceptual-gestural account (e.g. Jones et al., 2006), or perceptual-motor view (e.g. Hughes et al., 2009; Hughes & Marsh, 2017; Jones et al., 2004). This view assumes that short-term memory performance is parasitic on processes and systems that are general-purpose mechanisms not specifically dedicated to memory. The perceptual-motor view (Hughes & Marsh, 2017; Jones et al., 2004, 2006) eschews the notion that inner speech (sub-vocal rehearsal) serves the function of refreshing decaying memory traces within a mnemonic store or space, as promoted, for example, by the phonological store interference-by-content account (e.g. Baddeley, 1986, 2007). Rather, the skill of speaking (overt or inner speech) is exploited to bind together items that have no pre-existing syntactical or grammatical cues as to their order into a single temporally extended motor-plan on a common carrier. Speech, being necessarily sequential, is thus well suited to embodying the serial order of items, and co-articulatory and prosodic characteristics of natural speech can further imbue the motor-plan with serial order cues (Macken et al., 2014; Woodward et al., 2008). Furthermore, forward internal models that operate to enable sensory outcome predictions of planned or imagined actions (sensory, auditory or visual) can influence the online processing of external sensory information (e.g. music) within the environment (Halász & Cunnington, 2012; Maes & Leman, 2013; Schutz-Bosbach & Prinz, 2007; Witt, 2011). This suggests an intricate relationship between motor planning and musical perception, similar to the relationship between perceptual organisation and motor planning (e.g. Hughes et al., 2016). In support of this relationship, research suggests that long-term experiences (and activities) involved in playing a musical instrument yield a superior association between sensory information and fine motor movements in musicians, as compared to non-musicians, that gives rise to superior sequence learning (e.g. Anaya et al., 2017).

Vocal production
The perceptual-motor view explains the greater disruption of serial recall observed from changing-state (as compared to steady-state) sound sequences as being due to the motor-plan's susceptibility to the obligatory process involved in perceptually organising sound into streams (Hughes & Jones, 2005; Jones & Macken, 1993). Provided that successive sounds within a changing-state sequence, verbal or non-verbal, are acoustically similar to one another (in frequency or timbre) and share a common ground (e.g. voice), they are assigned to the same stream. Their serial order sequence is formed via the assignment of successive cues to the same stream, which results in it being a candidate for populating the motor-plan (Hughes, 2014; Hughes & Jones, 2005; Hughes & Marsh, 2017; Jones et al., 2004; see also Marsh et al., 2009). Therefore, according to the perceptual-motor view, limitations in performance (in the context of serial recall) by changing-state sound are not due to the limitation of mnemonic capacity within a putative static short-term memory store or space that is determined by the existence of items prone to decay, interference, or displacement, as interference-by-content accounts hold (Baddeley, 1986; Neath, 2000). Rather, it represents a competition-for-action, whereby disruption to serial recall is produced by the conflict in changing-state information between the target items and the irrelevant sounds (Hughes, 2014). While the interference-by-content accounts focus on the individuality of each item presented (and their rate of decay, or structure, such as phonological similarity; Baddeley, 1986; Nairne, 1990; Neath, 2000), the perceptual-motor view focuses on sequence factors rather than constituent, individual items, and on the means of output via motor-planning that enable sequences to be assembled and sub-vocally rehearsed (Hughes & Jones, 2005; Jones et al., 2006, 2007).
The generality of the vocal-motor disruption via background sound, as suggested by the perceptual-motor view (e.g. Hughes & Jones, 2005), implies that it should not be confined to short-term serial recall. It should also extend to other tasks involving motor-planning and sequential verbal production. For example, the automatic processing of irrelevant auditory input could assume, or threaten to assume, control of the sub-vocal motor system responsible for the planning and production of vocalisable components of song (cf. Godøy & Leman, 2009; Leman, 2007; Leman & Maes, 2014, 2015). One goal of the current study was to determine whether the perceptual-motor view (Hughes & Jones, 2005; Hughes & Marsh, 2017) and the construct of interference-by-process could be extended to a hitherto unstudied research domain involving vocal retrieval of known song components (melody, lyrics) from long-term memory.
Later in this empirical series, we introduce accounts that attribute auditory distraction to an attentional capture mechanism. Specifically, we investigate whether some of the disruptive effects of to-be-ignored sound reported in the current investigation align better with an account that assumes that particular properties of sound are capable of wresting attention from a focal task, thereby resulting in performance decrement (Bell et al., 2010, 2012; Hughes et al., 2005, 2007; Marsh et al., 2018; Parmentier et al., 2018; Röer et al., 2013; Vachon et al., 2017).
A considerable body of existing research has focused on the impact of presenting irrelevant sound/music during short-term memory recall/recognition tasks of visually presented notated tones/words. By contrast, little research focuses on the impact of irrelevant sound/music on vocal production. It is conceivable that task-irrelevant sound, known to disrupt the vocal-motor planning process in short-term memory (Hughes & Marsh, 2017), may impact on the vocal production mechanisms required for melody/lyrics retrieval from long-term memory. Existing studies have focused more on exploring music representation within memory than on its production (e.g. Fiveash et al., 2018; Miranda & Ullman, 2007; Perruchet & Poulin-Charronnat, 2013; Thompson & Yankeelov, 2012). Consequently, it is unclear how processes involved in vocal input/output are related to one another and, of greater consequence, how vocal-motor planning required for vocal music/language production may be affected by a competing motor-plan provided by the presence of extraneous music or language. One aim of the current study was thus to explore the properties of background sounds that impact on the vocal-motor planning process in the context of a familiar song. Watkins and Allender (1987, p. 565) state: "it is surely more difficult to bring to mind a particular tune if another is being heard". The motivation for the current work was driven, in part, by the lack of any known empirical study exploring this anecdote and the questions to which it gives rise: Is it, in fact, more difficult to produce a familiar song when hearing another? If so, which features of the background music (e.g. melody, lyrics, familiarity) are responsible for promoting the disruption? Does the nature of the production of song (e.g. speaking lyrics vs. humming melody) also interact with the features of the background sound to determine disruption? Addressing such questions could potentially inform the nature of the perceptual and cognitive processes underpinning memory for, and production of, song, and more generally inform the nature of verbal behaviour and memory functions at large.

Integration versus independence
Language and melody within song are thought to be subtended by independent, hierarchical systems that have comparable properties (Schön et al., 2004). For example, discrete items (notes of pitch, words) are combined into musical chords, or sentences, with syntax rules governing the operation (Fiveash et al., 2018; Thaut, 2009). A continued source of debate, however, is whether the language and melody components of song are integrated or processed independently when listening to song (Hamzelou, 2010; Hébert & Peretz, 2001).
The finding that melodies of songs are better recognised when heard with their original words than when heard with text of another equally familiar song, and vice versa (Crowder et al., 1990; Samson & Zatorre, 1991), is taken as support for the integration of melody and lyrics. However, some neurological studies have demonstrated independent processing of melodic and semantic song components (e.g. ERP studies, Besson et al., 1998; PET studies, Gröussard et al., 2010; see also Bonnel et al., 2001).
Exploring the impact of different properties of task-irrelevant auditory distracters on the production of melody (Experiment 1) and lyrics (Experiment 2) in the present study affords a window onto the nature of processing within the mnemonic systems responsible for the two properties of song, thereby shedding light on the integration versus independence debate. According to the integration view (Schön et al., 2010; Serafine et al., 1986), we might expect the pattern of any disruption attributable to task-irrelevant sound with different properties to be observed regardless of whether melody production (e.g. humming: Experiment 1) or lyrics production (Experiment 2) is required. On the flipside, a divergence in the patterns of interference observed could be taken as prima facie evidence for independence (e.g. Besson et al., 1998; Bonnel et al., 2001). As mentioned in the foregoing, supposing that the presence of background music does indeed impair vocal production of melody or lyrics, it is important to determine whether an effect is underpinned by prior memory for, and thus familiarity with, distracter stimuli.

Influence of distracter familiarity
To understand the potential influence of task-irrelevant music on the production of familiar melody (via humming; Experiment 1) or lyrics (via speaking; Experiment 2) one might look to modular models of music processing (e.g. Peretz & Coltheart, 2003). These models propose the presence of brain specialisation, a "modular architecture", for local neural circuitries essential for music processing (Peretz & Coltheart, 2003, p. 689). Within the context of this modular model, memory for familiar melodies, but not necessarily the temporal process of learning, is generally understood to denote musical semantic memory.
The embodied music cognition theory suggests music perception activates motor systems that allow melodies to be reproduced or performed (Leman, 2007; Leman & Maes, 2015; Sievers et al., 2013). Therefore, the production of a familiar target melody (with associated lyrics) or lyrics from long-term memory and the mere perception of a different familiar to-be-ignored melody/lyrics should involve some degree of activation within the motor system. For example, subvocalisation (covert singing of lyrics using inner speech) has been implicated in the rehearsal of auditory images (the imagining of an auditory stimulus [e.g. music] without hearing the actual sound, utilising the inner ear; Halpern, 2001; Pring & Walker, 1994; Reisberg et al., 1989; Smith et al., 1995). Moreover, since familiar, as compared to unfamiliar, stimuli are well-represented in long-term memory, such vocal-articulatory activation should be greater for familiar than for unfamiliar stimuli and should, therefore, be more disruptive to the vocal production of another song. The current study set out to test these predictions.

Experiment 1
The incentive for Experiments 1 and 2 of the current study was guided by the need to explore whether and how retrieval processes for song operate and interact. The vocal production mechanism, responsible for producing components of familiar song (e.g. Peretz & Coltheart, 2003; Tsang et al., 2011), was explored by investigating its propensity to be disrupted by the mere presence of familiar and unfamiliar task-irrelevant music and/or language. In Experiment 1, participants hummed the melody of familiar songs that were cued by title. Humming was chosen because, arguably, it does not necessitate access to the phonological lexicon (Peretz & Coltheart, 2003) but nonetheless requires low-level sensorimotor operations of singing mechanisms such as pitch control and timing of movement (Özdemir et al., 2006; Schubotz et al., 2000). While humming target melodies in Experiment 1, participants were exposed to to-be-ignored sounds comprising: (1) a different instrumental melody that was familiar or unfamiliar to participants, (2) sung familiar or unfamiliar lyrics to a familiar or unfamiliar melody, and (3) spoken lyrics from a different familiar or unfamiliar song. Inclusion of these conditions made it possible to study which features of song drive disruption of vocal-motor processing by sound.
The design of Experiment 1 enables us to address the suggestion that familiar songs are more difficult to produce when hearing another (Watkins & Allender, 1987). It allows insight into which features of background music (melody, lyrics, familiarity) promote disruption. In turn, the design can also inform whether language and music are integrated or independent within song (Besson & Schön, 2001; Peretz & Zatorre, 2005; Thompson & Yankeelov, 2012). From the standpoint that verbal retrieval is vulnerable to distraction from activation of similar information stored within semantic memory (Jones et al., 2012), it was hypothesised that production of a familiar melody would be more impaired by the presence of a task-irrelevant familiar melody, which has a representation within long-term memory and can therefore compete with the target melody for retrieval, than by an unfamiliar melody combining a novel pitch contour with a familiar rhythmic pattern. However, it was also hypothesised that unfamiliar melody, acting as an irrelevant auditory distraction, would produce some disruption as compared with quiet (Jones & Macken, 1993; Salamé & Baddeley, 1989). If humming does not require access to the phonological lexicon (cf. Peretz & Coltheart, 2003), then it was expected that an irrelevant familiar melody with sung lyrics would produce no more disruption than a familiar melody without lyrics. However, if access to the phonological lexicon was required, a distracter with lyrics would produce a greater disruption than one without lyrics. Moreover, if stored separately, spoken lyrics (familiar or unfamiliar) without melody should be less disruptive than spoken lyrics with a familiar melody.

Participants
Two hundred and ninety-four adults aged 18-92 (mode range 45-49) from local community/university groups participated in return for travel reimbursement or course credits. As multiple between-participants conditions were used, to ensure comparability between the participants across groups, all 150 participants aged over 50 years completed the Addenbrooke's Cognitive Examination Revised (ACE-R). A mixed analysis of variance (ANOVA) showed the Sound Condition × Cognition interaction to be nonsignificant. Visual acuity and hearing tests confirmed acceptable levels. Ethical clearance for the ensuing four experiments was obtained from the University of Central Lancashire. To determine an appropriate sample size in each experimental condition, we conducted two a priori power analyses (using G*Power; Faul et al., 2007). There is, to the best of our knowledge, no previously published experiment that is identical or nearly identical to the experiments conducted in the present paper. Because of this, we based the power analyses on the theoretical assumption that the effects of the present study would resemble the effects of background sound on retrieval from long-term memory reported in Marsh, Crawford, et al. (2017) and in Jones et al. (2012, Experiment 3). The effects reported in their studies are conceptually similar to the effects under study in the current investigation. The experimental effect (the difference between the two Sound Conditions) in Marsh, Crawford, et al.'s study had an effect size of dz = 0.41. The estimated sample size needed to detect an effect (one-tailed) of this size is 66 participants. Moreover, the experimental effect (from a repeated measures analysis of variance across three Sound Conditions) reported in Experiment 3 of Jones et al. (2012), which had spoken output as the dependent measure, similar to the current study, had an effect size of ηp² = .22 (from which we obtained an effect size f of 0.53). The estimated sample size needed to detect this experimental effect is 11 participants. Taken together, we aimed to collect data from about 70 participants in each condition, which should be enough to detect the effects according to the power analyses.
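The conversion from partial eta squared to the effect size of 0.53 quoted above can be checked with a short calculation. The sketch below assumes the standard relation from Cohen (1988), f = sqrt(ηp² / (1 − ηp²)); the function name is ours and purely illustrative, and the code does not reproduce the G*Power sample-size computation itself.

```python
import math


def cohens_f_from_partial_eta_squared(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f.

    Assumed relation (Cohen, 1988): f = sqrt(eta_p2 / (1 - eta_p2)).
    """
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


# Value reported for Jones et al. (2012, Experiment 3).
f = cohens_f_from_partial_eta_squared(0.22)
print(round(f, 2))  # 0.53
```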

Materials and apparatus
Thirty target and ten distracter songs were compiled from western culture nursery/traditional rhymes following a pre-study questionnaire completed by 20 non-participant volunteers to assess familiarity. All songs shared distinctive intrinsic characteristics, with simple duple, triple, quadruple, or compound duple time signatures, a duration range of 12-24 s (M = 18.4), and a beat range within each excerpt of 24-48 (M = 36.9). Pace in western music is indicated by beats per minute (bpm) and each excerpt had a tempo of 120 bpm to a crotchet pulse (quarter note). Time values included quavers, crotchets, and minims, with occasional dotted quaver + semiquaver combinations. A trained contralto pre-recorded a live a cappella performance, and monotonic spoken lyrics (paced to the tempo of the sung version [Simmons-Stern et al., 2012]), into a laptop computer via a line-in USB microphone using Audacity. Instrumental stimuli were computer generated using synthesised sounds of a concert flute from the "Avid Sibelius 7" music software notation program. Phrasing was legato throughout. All stimuli were looped from their original durations to run for 32 s, matching the 32 s allowed for humming each target melody. For the unfamiliar melodies, while tonality, rhythmic pattern, and implied harmony were matched to the original melody, the pitch contour was reformed to create novel interval progressions. Unfamiliar lyrics followed the original syntactic structure and syllabic setting, with limited melisma (see Figure 1(a,b)). Irrelevant sound was presented via Sennheiser HD201 headphones, averaged at 62 dB(A), executed by the E-Prime program (2) software tool (Taylor & Marsh, 2017) via a laptop computer. The dynamic was consistently mezzo-forte (mf) with no tone gradation.

Design
A mixed 3 (Sound Condition) × 4 (Sound Type) design was used. The within-participants factor of Sound Condition had three levels: quiet (no distracter), familiar distracter present, and unfamiliar distracter present. Each level comprised 10 trials, and each of the four song sets (the fourth being the distracter set) contained 10 melodies. The between-participants factor of Sound Type (Group) had four levels: Melody, Familiar Lyrics, Speech (spoken lyrics), and Unfamiliar Lyrics. Participants were alternately allocated to each Sound Type (Group) according to the balance of numbers per condition. Each Sound Type had 72 participants with an even distribution of age (except that 78 participants took part in the Familiar Lyrics Group), as shown in Table 1. To address unsystematic variation in the repeated measures, participants were alternately assigned into four 72-participant Groups, the six orders of presentation within each being counterbalanced. All participants experienced a quiet control condition.
Hummed recall accuracy of melodic contour was the dependent variable. Prior knowledge was established in a final Recognition test in which participants indicated whether they recognised familiar against unfamiliar material, using the same stimulus type (e.g. sung lyrics, melody, spoken lyrics) they had previously experienced within the experiment.
Although not a key goal, the onset time (OST) to begin each melody performance was also analysed using a mixed 3 (Sound Condition) × 4 (Sound Type) design.
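The six presentation orders mentioned in this section are consistent with the 3! = 6 possible orderings of the three Sound Conditions. A minimal sketch of such counterbalancing follows, assuming participants within a group are assigned to orders in simple rotation; the rotation rule and the function name are illustrative assumptions, not the documented procedure.

```python
from itertools import permutations

CONDITIONS = ("quiet", "familiar", "unfamiliar")

# All possible presentation orders of the three within-participants levels.
ORDERS = list(permutations(CONDITIONS))


def order_for_participant(index: int) -> tuple:
    """Hypothetical rotation rule: participant i receives order i mod 6."""
    return ORDERS[index % len(ORDERS)]


# With 72 participants in a group, rotation uses every order equally often.
counts = {order: 0 for order in ORDERS}
for i in range(72):
    counts[order_for_participant(i)] += 1

print(len(ORDERS))           # 6
print(set(counts.values()))  # {12}
```

Under this assumption a 72-participant group divides evenly across the six orders, which is one reason a group size that is a multiple of six is convenient.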

Procedure
Participants were tested individually in a quiet room in the presence of the researcher. Following an explanation of task requirements, instruction to ignore any auditory sequences heard through their headphones, and completion of a demographic response sheet and consent form, two practice trials in quiet allowed familiarisation. The task was to hum the target melody, or as much of it as was known, in response to a title presented visually for 32 s, after which an audible bleep signalled the next title. Performances were recorded via the computer's [...], known and understood by the author. The scale was designed to allow a zero score for no rewardable material, the total number of bars for each melody then being divided into four sections of equal number to match the criterion descriptors. Two independent raters, both musically trained, qualified teachers, also each assessed half of each Sound Type Group for participant accuracy to the target melody following the same (understood) scoring criterion. To assess author/inter-rater reliability, an intraclass correlation coefficient (ICC) was computed and showed a high consistency between author/researcher and independent raters, Cronbach's α = .966. In addition, performances were judged recognisable in relation to the song title by two non-musically trained raters, Cronbach's α = .988.
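For readers unfamiliar with the reliability statistic reported above, the sketch below shows one common way to compute Cronbach's α across raters, treating each rater as an "item". The scores are invented toy values on a 0-4 scale, and the function is an illustrative assumption; it is not the analysis pipeline used in the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a ratings table.

    scores: list of rows, one per performance; each row holds one score per rater.
    Formula: alpha = k / (k - 1) * (1 - sum(rater variances) / variance of row totals).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two raters are required")
    rater_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)


# Toy data (made up): three performances scored by two raters.
perfect_agreement = [[1, 1], [2, 2], [3, 3]]
partial_agreement = [[0, 1], [2, 2], [4, 3]]

print(cronbach_alpha(perfect_agreement))            # 1.0
print(round(cronbach_alpha(partial_agreement), 3))  # 0.889
```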

Statistics
Within the results sections throughout the current article, we report Cohen's d as a measure of effect size for pairwise comparisons, and the size of these effects is interpreted as small, medium, or large using Cohen's (1988) conventions. We also report Bayes factors for all pairwise comparisons. These were computed using a Cauchy prior with a scaling factor set to 1 (Rouder et al., 2009), and we used the categorisation scheme developed by Jeffreys (1961) and updated by Lee and Wagenmakers (2013) to define the strength of evidence for the alternative hypothesis and, where appropriate, for the null hypothesis, to back up null hypothesis significance testing (NHST) based inferences relating to the absence of between-condition differences. Since some of the key conclusions within our study rest on the observation of null effects, the Bayesian approach was adopted to provide further support for the null hypothesis (H0). Where assumptions of sphericity are violated, the Huynh-Feldt correction is deployed and reported. We report unadjusted pairwise comparisons for multiple tests. However, when the p values depart from those derived from the Holm-Bonferroni sequential method (Holm, 1979) to deal with familywise error rates, we report the adjusted p value in parentheses, preceded by an asterisk, after reporting the unadjusted p value.
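The Holm-Bonferroni sequential method referred to above can be sketched as follows. This is a generic implementation of the step-down adjustment (Holm, 1979) with made-up p values; the function name is ours, and the code is not taken from the authors' analysis scripts.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p values in the original order.

    The smallest p is multiplied by m, the next by m - 1, and so on;
    adjusted values are capped at 1 and forced to be non-decreasing
    along the sorted sequence.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Illustrative (invented) unadjusted p values for three comparisons.
raw = [0.01, 0.04, 0.03]
print([round(p, 2) for p in holm_adjust(raw)])  # [0.03, 0.06, 0.06]
```

In this toy case the comparison with p = .04 is significant before adjustment but not after, which is the kind of departure that would be flagged with an asterisked value.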

Retrieval accuracy
As illustrated in Figure 2, the mean proportion of correct recall was affected by the presence of sound. Data were relatively consistent across Sound Type in the quiet condition, although overall performance was slightly better for the Unfamiliar Lyrics Group in comparison to the Melody Group, MD = .203, SE = .103, p = .05 (*p = .3); 95% CI [.000, .405], d = .321, BF01 = 1.341. However, this advantage was not maintained in the Sound Conditions. Familiar distracter stimuli appeared to produce greater disruption than unfamiliar stimuli, but this depended on whether the distracters comprised melody (without lyrics), a combination of lyrics sung to a melody, or speech (spoken lyrics).
A 3 (Sound Condition: quiet, unfamiliar, familiar) × 4 (Sound Type: Melody, Familiar Lyrics, Unfamiliar Lyrics, Speech [spoken lyrics]) mixed ANOVA showed a significant main effect of Sound Condition, F(1.960, 568.49) = 161.226, MSE = .230, p < .001, ηp² = .357. There was a significant between-participants main effect of Sound Type, F(3, 290) = 6.089, MSE = 1.335, p < .001, ηp² = .059, and a significant interaction between Sound Condition × Sound Type, F(5.881, 568.494) = 10.686, MSE = .230, p < .001, ηp² = .100. A simple effects analysis (LSD) to decompose the Sound Condition × Sound Type interaction found that retrieval performance was poorer when accompanied by sound, regardless of Sound Type or whether it comprised a familiar component (e.g. lyrics): Performance in quiet was superior to performance in all other Sound Conditions regardless of Sound Type (all ps < .05).
We first compared unfamiliar and familiar Sound Conditions within Sound Type. The only significant difference to arise was for the Unfamiliar Lyrics Group (MD = −.325, SE = .081, p < .001; 95% CI [−.485, −.165], dz = 0.533, BF01 = .001, demonstrating extreme evidence for H1), thereby illustrating a familiarity effect whereby unfamiliar lyrics sung to a familiar melody were more disruptive than unfamiliar lyrics sung to an unfamiliar melody. Next, we compared the impact of unfamiliar, then familiar, stimuli across Sound Type.
Unfamiliar Stimuli. In relation to unfamiliar stimuli, an analysis of Sound Type (e.g. Group) as a function of Sound Condition found that melody (without lyrics) was more disruptive than speech

Mean onset time
Mean onset times as a function of Sound Condition and Sound Type can be observed in Figure 3. A 3 (Sound Condition: quiet, unfamiliar, familiar) × 4 (Sound Type: Melody, Familiar Lyrics, Unfamiliar Lyrics, Speech [spoken lyrics]) mixed ANOVA

Recognition test
The results of the recognition test, which served as a check for the familiarity manipulation, demonstrated that 79% of familiar melodies were known by participants across Groups. Pairwise comparisons for Sound within Groups showed no significant differences between conditions.
The results of Experiment 1 demonstrate for the first time that the production of song, via humming, from long-term memory is disrupted by the mere presence of background sound regardless of the type of sound deployed (melody or speech). This suggests that vocal-motor planning in the production of sequence from long-term memory, like vocal-motor processing in the context of visual-verbal serial recall (e.g. Hughes & Marsh, 2017; Jones et al., 2004), is susceptible to disruption from the mere presence of background sound. Crucially, however, in contrast to studies reporting comparable (Jones & Macken, 1993) or greater (Körner et al., 2017) disruption of visual-verbal serial recall by speech than by non-speech sounds, non-speech sounds (instrumental melodies) were more disruptive of humming performance than were speech sounds (spoken lyrics), regardless of their familiarity. This shows that the similarity between the sequential information provided by the irrelevant material, compared to the target melody (greater for irrelevant melody than irrelevant spoken material), exacerbates disruption. Familiar melody in combination with lyrics (regardless of the familiarity of those lyrics) drove additional disruption of melody retrieval. The fact that unfamiliar lyrics combined with familiar melody (i.e. unfamiliar lyrics sung to a familiar melody) were shown to be more disruptive than unfamiliar lyrics combined with unfamiliar melody (i.e. unfamiliar lyrics sung to an unfamiliar melody) suggests that incongruity in familiarity between lyrics and melody may also be important in determining disruption. Finally, a striking finding from the analysis of onset times demonstrated that a combination of familiar melody with familiar lyrics (familiar lyrics sung to a familiar melody) slowed initial production of a target melody to a greater extent than all other conditions, suggesting this condition differentially impairs retrieval of the target melody.
The results of Experiment 1, in part, support the interference-by-process component of the perceptual-motor view of memory (Hughes & Jones, 2005; Jones et al., 2006, 2007). Experiment 1 clearly showed familiarity within a component of song (melody or lyrics) to be a potent distracter for melody retrieval, especially when both the melody and associated lyrics components are familiar. This pattern of results suggests disruption related to increased demands on the vocal-motor system resulting from the concurrent involuntary processing of sound sequences.
At the same time, the pattern of findings is inconsistent with interference-by-content views (Neath, 2000; Salamé & Baddeley, 1982) that assume disruption by irrelevant sound occurs at the item-level. According to such a view, familiar melodies should have been no more distracting than unfamiliar melodies, regardless of the presence of lyrics. Since acoustic variability of all musical items in Experiment 1 was carefully controlled, it is unlikely that any additional disruption produced by familiar melody with lyrics (familiar or unfamiliar) over unfamiliar melody with unfamiliar lyrics is attributable to greater acoustic variation (cf. Schlittmeier et al., 2008) and hence a more pronounced (acoustic) interference-by-process (cf. Jones & Tremblay, 2000). It has been suggested that musicians, as compared to non-musicians, use two working memory (WM) systems (Schulze et al., 2011). These purportedly comprise the phonological loop (e.g. Baddeley, 1986), for rehearsing verbal information, and a tonal loop supporting pitch rehearsal. It is difficult to see how appeal to a tonal loop could explain the patterns of irrelevant sound disruption observed in Experiment 1, since the pitch information at the item-level is the unit of currency within the tonal loop and the disruption observed in Experiment 1 is attributed to the involuntary processing of sound sequence and post-categorical factors (e.g. familiarity).
In relation to the independence vs. integration debate, the finding that lyrics and melody combined within song, familiar or unfamiliar, in Experiment 1 impaired melody retrieval to a greater degree than spoken-lyrics, familiar or unfamiliar, suggests that song is stored differently to verbal information in WM (Berz, 1995). These results appear to support a degree of independence between lyrics and melody (Besson et al., 1998; Besson & Schön, 2001; Peretz et al., 1994). Further, that spoken-lyrics, familiar or unfamiliar, without melody, were less disruptive than familiar melody and sung-lyrics also suggests a degree of separation.

Experiment 2
The purpose of Experiment 2 was to shift the focal task process to enable further evaluation of the integration debate and an interference-by-process account (cf. Jones & Tremblay, 2000) of the disruption of song production by different components of song.
Instead of requesting the production of melody via humming, Experiment 2 required participants to vocally produce, via speech, the lyrics component of familiar song in the presence of the same irrelevant auditory conditions as for Experiment 1. Changing the characteristics of the focal task is a device frequently used to enable further assessment of the interference-by-process view (Hughes et al., 2007; Marsh et al., 2018). If the same pattern of auditory distraction is obtained when attempting to speak the lyrics song component as found when trying to produce the melody component of familiar song by humming (Experiment 1), then this would support the notion that melody and lyrics appear to be integrated within song (e.g. Schön et al., 2010). If, however, spoken or sung lyrics impair spoken lyrics performance to a greater degree than melody alone, then some evidence would be obtained for the independence of melody and lyrics processing within song (e.g. Besson et al., 1998; Bonnel et al., 2001).
It should also be considered that the additional demands on semantic processing for lyrics assembly/performance may render the task vulnerable to disruption via the presence of meaning within the irrelevant sound (e.g. Jones et al., 2012; Marsh et al., 2009; Marsh & Jones, 2010). Previous studies have shown that impairment to tasks requiring semantic processing, such as the free, or categorically organised, recall of lists of semantically related words (Marsh et al., 2008, 2009, 2014), or the generation of category exemplars from semantic memory (Jones et al., 2012; Marsh et al., 2017), is prevalent when the irrelevant sound contains semantic information (for a related finding, see Meng et al., 2020). This finding suggests that the principle of interference-by-process also extends to automatic (applied to to-be-ignored sound) and voluntary (applied in the context of the memory task) semantic processes (Marsh & Jones, 2010; Meng et al., 2020). According to the notion that a semantic interference-by-process can arise for lexical semantic retrieval, to-be-ignored spoken lyrics should disrupt spoken retrieval of lyrics from a target song more than to-be-ignored melody (the opposite pattern to that identified in Experiment 1). However, some caution might be exercised here, since the extent to which familiar melody alone also governs the implicit retrieval of associated lyrics (Pring & Walker, 1994), and in doing so promotes additional disruption via a semantic interference-by-process, remains unclear.

Method
Any modifications/differences from Experiment 1 are detailed below.

Participants
Two hundred and eighty-eight adults aged 18-90 (M = 46.53, SD = 22.41) took part, and a mixed ANOVA for 144 participants aged over 50 years showed that the Condition × Cognition interaction was non-significant. The interaction between Sound Condition and Sound Type detected in Experiment 1 (with retrieval accuracy as dependent variable) had an effect size of ηp² = .100. An a priori power analysis (using G*Power) of ANOVA with repeated measures, within-between interaction, with number of groups set to 4 and number of measurements set to 3, revealed that a sample size of at least 36 participants is needed to detect an interaction effect of this magnitude.

Design
An identical mixed 3 (Sound Condition) × 4 (Sound Type) design was used, with the dependent variable being exact recall accuracy of known song lyrics within each bar of the song. A total of 72 participants took part in each Sound Type Condition.

Procedure
To ensure comparative assessment, each bar of the song's spoken lyrics matched its hummed melody counterpart and was measured with reference to the number of correct lyrics produced in each bar, rather than the correct notes of pitch hummed as for Experiment 1.⁹ Musically trained rater comparison revealed Cronbach's α = .994. There were no non-musical raters, as speech assessment was considered to be less subjective than melody assessment. Demographic responses¹⁰ were comparable to Experiment 1.
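For readers unfamiliar with the reliability index reported above, Cronbach's α is conventionally computed as k/(k − 1) × (1 − Σ rater variances / variance of summed scores), where k is the number of raters. The sketch below illustrates that formula in Python; the rater scores are hypothetical and are not taken from the study, and the function name `cronbach_alpha` is an illustrative choice.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of raters.

    ratings: one list of scores per rater, aligned so that position i in
    every list refers to the same scored performance.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("at least two raters are required")
    rater_variances = sum(variance(scores) for scores in ratings)
    summed_scores = [sum(per_item) for per_item in zip(*ratings)]
    return (k / (k - 1)) * (1 - rater_variances / variance(summed_scores))


# Hypothetical per-bar lyric accuracy scores from two raters (illustration only)
rater_a = [8, 6, 7, 3, 5]
rater_b = [8, 6, 6, 3, 5]
print(round(cronbach_alpha([rater_a, rater_b]), 3))
```

Values approaching 1 indicate that the raters ordered and scaled the performances almost identically.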

Retrieval accuracy
Figure 4 shows the retrieval accuracy of spoken lyrics as a function of Sound Condition and Sound Type in Experiment 2. Retrieval accuracy was higher for all Sound Type Groups in the quiet condition (although significantly lower for the Unfamiliar Lyrics as compared to Melody Group, p = .027), identifying the Melody Group as the most able. The disruption of the accuracy of lyric retrieval by to-be-ignored sound appeared to depend on the familiarity of the distracters and whether they comprised melody (without lyrics), lyrics combined with melody, or speech (without melody). Accuracy appeared to be impaired to a greater extent in the presence of a familiar, as compared to an unfamiliar, distracter stimulus, and more so when lyrics were familiar (Familiar Lyrics Group). Furthermore, melody (without lyrics) was generally less disruptive than speech (spoken lyrics), but only when the distracter stimuli were unfamiliar.

Recognition test
Based on the Recognition test, 80% of song lyrics were known by the participants across conditions in Experiment 2. Pairwise comparisons for Sound within Groups showed no significant differences between conditions.

Discussion
The results of Experiment 2 showed that target lyrics retrieval, as indexed by the accuracy of lyrics retrieval performance, was adversely affected by all Sound Conditions as compared to quiet (with the exception of unfamiliar melody). Of greater importance, however, was that the results clearly demonstrate that the production of spoken lyrics is more detrimentally affected by sung lyrics than by spoken lyrics or melody, thereby replicating the results of Experiment 1. In the absence of lyrics, familiar melody was more potent at impeding spoken lyric production than was unfamiliar melody. Crucially, however, familiar lyrics combined with familiar melody (familiar lyrics sung to an associated familiar melody) were more disruptive than familiar lyrics combined with an unfamiliar melody (familiar lyrics sung to an unfamiliar melody). This suggests that melody familiarity overrode lyrics familiarity if there was an incongruity (familiar lyrics sung to an unfamiliar melody). A striking difference between the patterns of disruption observed in Experiment 1 and Experiment 2 is that unfamiliar speech (spoken lyrics) produced greater disruption of lyric retrieval than unfamiliar melody (without lyrics). The opposite pattern was true in Experiment 1, whereby unfamiliar melody (without lyrics) produced greater disruption than unfamiliar speech (spoken lyrics).
Although some previous studies provide evidence that a common neurological network subserves lexical/phonological and melodic processing (e.g. Schön et al., 2010), the results of Experiment 2 support a degree of processing independence (e.g. Besson et al., 1998; Besson & Schön, 2001; Peretz et al., 1994) at the behavioural level, as here, irrespective of familiarity, sung-lyrics were more disruptive than spoken lyrics of spoken lyrics retrieval. The fact that unfamiliar spoken lyrics produced greater impairment than unfamiliar melody likewise suggests different processes are involved in melody versus spoken retrieval, as the opposite pattern of disruption was observed for melody retrieval via humming in Experiment 1. The pattern of disruption is consistent with the notion that the vocal plan for humming, compared to spoken lyrics retrieval, requires melodic, pitch, spectral and temporal information and the processing of these components could be disrupted by similar properties within the to-be-ignored sound (see later).
For Experiments 1 and 2 the passive, interference-by-content view (e.g. Neath, 2000) was pitched against the functional, interference-by-process view (Hughes & Jones, 2005; Jones & Tremblay, 2000). According to the interference-by-content account, whereby disruption occurs at the item-level, familiarity with lyrics or melody should be no more distracting than unfamiliar lyrics/melodies. Inadequacies of the interference-by-content account are underscored by several findings in the context of Experiment 2. For example, familiar melody produced more disruption than unfamiliar melody despite little overlap in the similarity in content between the task-irrelevant and to-be-recalled material. One possible explanation for the difference in disruption, in line with the interference-by-content view, is that familiar melody automatically activates familiar lyrics and these disrupt retrieval of target lyrics (Pring & Walker, 1994). However, this view does not seem to adequately explain why familiar lyrics combined with familiar melody are more disruptive to lyric retrieval than unfamiliar lyrics combined with familiar melody, since in both cases the presence of task-irrelevant lyrics should interfere with the representation and production of task-relevant lyrics. Furthermore, the interference-by-content account does not adequately explain how performance disruption attributable to different properties of task-irrelevant sound is acutely sensitive to the nature of the focal task processing.
The overarching pattern of disruption observed from the Sound Conditions and Sound Type deployed in Experiments 1 and 2 coheres with the suggestion that the presence of to-be-ignored sound containing lyrics impacts upon the efficacy of an articulatory plan responsible for the assembly and production of sequences of lyric patterns. This notion coheres with the perceptual-gestural view whereby the motor planning process operates according to the sequence of items rather than their individuality (e.g. Hughes & Jones, 2005). To-be-ignored sung familiar lyrics, in particular, appear to have disrupted the motor-programming necessary to retrieve and perform familiar target melodies and spoken production of lyrics (cf. Leman, 2007; Leman & Maes, 2015). This finding, in part, supports the concept that the motor-planning mechanism underpinning spoken lyric production is more vulnerable to disruption via the presence of sung as opposed to spoken lyrics.
The finding that spoken lyric production in Experiment 2 was disrupted more by the presence of speech (spoken lyrics) than melody (without lyrics) when the stimuli were unfamiliar, and that the reverse was true for melody production via humming in Experiment 1, aligns well with the interference-by-process account (Jones & Tremblay, 2000; Marsh et al., 2009). According to this account, the degree and nature of disruption by to-be-ignored sound is jointly dependent on the properties of the sound and the characteristics of the processes underpinning the focal task. The semantic retrieval element of lyric production, therefore, would render the task more susceptible to disruption via the processing of the semantic, as opposed to the acoustic, properties of to-be-ignored sound (e.g. Jones et al., 2012). Likewise, the melody retrieval element of the humming production task would render the task more susceptible to disruption via the processing of the acoustic features of the to-be-ignored sound (pitch, spectral and temporal information) as opposed to its semantic attributes.
To assess more directly whether effects observed in Experiment 1 (melody retrieval via humming) and Experiment 2 (spoken lyric retrieval) were a consequence of the combination of the characteristics of the focal task and the nature of the to-be-ignored sound, a comparison analysis including data from Experiments 1 and 2 was undertaken.

Further discussion
The key finding from the comparative analysis was that long-term memory retrieval/performance accuracy for Experiment 1 and Experiment 2 did not show similar patterns of retrieval in the presence of the same Sound Type distracters. The fact that melody retrieval via humming (Experiment 1) was impaired to a greater degree than retrieval of spoken lyrics (Experiment 2), by the same to-be-ignored sound, implies some independence of the processing of melody and lyrics (Besson et al., 1998; Bonnel et al., 2001; Peretz et al., 1994). Of particular interest was that in the context of an unfamiliar stimulus, a dissociation arose between Speech Groups. In Experiment 1, humming a target melody was significantly disrupted more by an unfamiliar melody (without lyrics) than by unfamiliar spoken lyrics. In contrast, for Experiment 2, spoken lyrics retrieval performance was significantly disrupted more by unfamiliar spoken lyrics than by an unfamiliar melody (without lyrics). While onset time was significantly increased by to-be-ignored sung-lyrics for melody retrieval via humming in Experiment 1, no type of sound, relative to quiet, produced any differential prolonging of onset time for lyrics retrieval via speaking in Experiment 2.
In sum, the results thus far have been interpreted as illuminating a specific interference-by-process (Jones & Tremblay, 2000) in the context of melody retrieval via humming (Experiment 1) and retrieval of spoken lyrics (Experiment 2). Earlier we described how the vocal-motor system is involved in the planning of melody retrieval and production (cf. Godøy & Leman, 2009; Leman, 2007; Leman & Maes, 2014, 2015) and musical imagery (Halpern, 2001; Smith et al., 1995). For example, Halpern (2001) proposes that the supplementary motor area (SMA) is activated during the mental "replaying of music" and may mediate rehearsal involving motor programmes including imagined humming. Thus, the disruption produced via to-be-ignored sounds could be localised to their degree of competition for the SMA-related processes: familiar lyrics combined with either familiar or unfamiliar melody may be the most potent distracter, since this overlearned sequence of melody (and lyrics in the case of familiar lyrics) may be a particularly strong competitor to target retrieval requiring melody or spoken lyric production. In the specific instance of exposure to familiar lyrics sung to an unfamiliar tune, it is possible that concurrent parsing of both the auditory unfamiliar melody and the "correct tune" associated with the lyrics could initiate a competing response.
A question remains, however, concerning whether this strong competition for target retrieval imposed by familiar melody combined with familiar/unfamiliar lyrics results in a seizure of the vocal-motor system by the to-be-ignored sound regardless of the particular processing required by the focal task. For example, are exactly the same effects observed for a task that merely requires the visual-verbal serial recall of digits?
As in the case for musical imagery tasks, the SMA has also been implicated in the serial rehearsal of visual-verbal items (Awh et al., 1996; Smith & Jonides, 1998). However, the seriation processes required for the serial recall task may be both quantitatively and qualitatively different from those of melody retrieval via humming and/or retrieval of spoken text. For example, in contrast to retrieval of melody or spoken lyrics, serial recall is characterised by the phenomenological experience of relentless subvocal repetition of the sequence of items. Further, the serial recall task, unlike the melodic and lyrics retrieval task, does not require recall of melodic elements or semantically rich (e.g. lyrics) material. Moreover, familiar melodies are, by definition, learned, habitual sequences, whereas random sequences of letters or digits (typically used as to-be-recalled items in serial recall tasks) are not. The former is typically more disruptive to task performance (Marsh et al., 2013). Thus, the competition for action in serial recall may be driven by different attributes of the to-be-ignored material: its familiarity or the presence of lyrics that interfere with melody and spoken lyrics retrieval may be redundant in disrupting performance on the serial recall task. To address this, a non-music task involving strict serial recall was used in Experiment 3. The deployment of the serial recall task enabled the determination of whether dissociations in susceptibility to task performance disruption from the same irrelevant Sound Conditions (e.g. familiarity of melody/lyrics) can be modulated by the nature of goal-driven processes adopted (e.g. retrieval of melody/lyrics vs. serial rehearsal). Evidence to this effect would gel with predictions of the interference-by-process account (Jones & Tremblay, 2000; Marsh et al., 2009).

Experiment 3
The classical irrelevant sound effect (ISE; Beaman & Jones, 1997) is associated with the serial recall task (Colle & Welsh, 1976; Jones & Macken, 1993; Salamé & Baddeley, 1982). According to the interference-by-process account of the ISE (Jones & Tremblay, 2000), task-irrelevant sounds gain automatic access to a representational system wherein they are organised into coherent streams that flow directly into the articulatory-planning process (Jones et al., 1993). Since interference-by-process results from a conflict of seriation processes, the degree of focal task disruption rests upon the degree of seriation necessary for task performance (Jones & Tremblay, 2000). This view correctly accounts for why little, if any, impact of irrelevant sound is observed on tasks for which serial recall (seriation) is not required (Beaman & Jones, 1997; Klatte et al., 2007). Further, the interference-by-process account is consistent with the finding that the post-categorical content of to-be-ignored sound (e.g. phonology or meaning) is uninfluential in the distraction of serial recall performance (e.g. meaningfulness, Jones et al., 1990). By design, serial recall tasks minimise participants' use of syntactical or lexical semantic processing that might aid ordered recall. Consequently, participants are considered to co-opt vocal-motor processing and paralinguistic skills to graft transitional probabilities onto items that do not have such properties. However, motor-processing using serial order is open to interference by other serially ordered material that may at some level fit the action-parameters (Hughes & Jones, 2005).
According to the interference-by-process view, the preattentive processing of pre-categorical acoustic changes drives the disruption of serial recall. Therefore, the familiarity of irrelevant auditory distracters should not be influential in the disruption that sound conveys to serial recall performance. Of central interest in Experiment 3 is whether the specific patterns of disruption observed from Experiments 1 and 2, including the specific effects of familiarity, are produced due to active engagement in a melody and lyrics retrieval process. The interference-by-process account (e.g. Jones & Tremblay, 2000) assumes that the degree and type of disruption by sound is dictated jointly by the demands of the focal task and the characteristics of the to-be-ignored sound. On this account, the same patterns of disruption from the auditory conditions adopted for Experiments 1-2 and observed on melody retrieval (Experiment 1) and lexical (lyrical) retrieval (Experiment 2) should not be observed for the serial recall task, which requires vocal-motor processing (subvocalisation) for serial rehearsal of the target items (digits), but does not require access to, or production of, song components (e.g. melody/lyrics).
If the results of Experiment 3 are to be consistent with the interference-by-process account (Jones & Tremblay, 2000), then the changing nature of acoustic properties of Sound Condition should be disruptive of serial recall, with irrelevant speech, as compared to tonal melody, being more disruptive (due to its greater acoustic complexity; cf. Tremblay et al., 2000). For example, unlike the pattern of results obtained in Experiment 1, the combination of familiar melody with lyrics should not disrupt the serial recall of digits more than unfamiliar melody with lyrics. This is because the clash of processing in serial recall is due to seriation processes driven by acoustical changes within the to-be-ignored sequence and not post-categorical processes related to the familiarity of the task-irrelevant sound.

Method

Participants
Two hundred participants aged 18-88 (M = 46.55, SD = 22.68) undertook this experiment, with 100 aged over 50 years taking the Addenbrooke's Cognitive Examination Revised (ACR). We used the cut-off point of <82 for inclusion, giving 0.84 sensitivity for dementia, with MMSE >24 showing no cognitive impairment. Although two participants scored below this point, removing these participants did not materially affect the pattern of results and conclusions drawn. Demographic outcomes were non-significant.¹¹ In a study by Sörqvist (2010; Experiment 2) an effect with the size of dz = 1.18 was obtained for the difference in disruptive effects of changing-state tone sequences and steady-state tone sequences on serial recall. This can be taken as indicative of the sample size needed to detect an effect of melodies on serial recall. An a priori power analysis (using G*Power) indicates that a sample size of at least 12 participants is needed to detect an effect of this magnitude.

Design
A 5 (Sound Condition) × 2 (Sound Type) design was adopted. The within-participants variable of Sound Condition was classified into five levels for each of two between-participants Sound Type Groups. Group 1: quiet (no distracter), Melody-familiar, Melody-unfamiliar, Unfamiliar sung-lyrics-familiar melody, Unfamiliar sung-lyrics-unfamiliar melody. Group 2: quiet (no distracter), Familiar sung-lyrics-familiar melody, Familiar sung-lyrics-unfamiliar melody, Speech-familiar, Speech-unfamiliar. Accuracy of digit recall in strict serial order served as the dependent variable.

Materials and apparatus
Digits 1-8 were compiled into 10 series of random presentation for each Sound Condition, controlled by the E-Prime (2) psychology software tool. A confidence continuum allowed for participant self-reported prior knowledge of distracter melodies and lyrics.

Procedure
There were three main modifications to the procedure from Experiments 1-2. First, all participants experienced five, as compared to three, irrelevant Sound Conditions; second, each sound stimulus was reduced from 32 to 10 seconds (see Klatte et al., 2002) to correspond to focal task duration; and third, irrelevant sound occurred during short-term memoranda presentation, rather than during long-term memory recall/performance.
Participants undertook four practice trials in quiet conditions where they were instructed to memorise, in the order of visual presentation via a computer screen, series of eight single digits drawn from digits 1-8. No digit was repeated within a trial. Prior to each trial a visual orienting + was shown, and instructions to click "Begin Trial" were consistent for each movement through the programme. Digits were presented consecutively at a rate of 1 per second (800 milliseconds on, 200 milliseconds off) in 72-point equidistant black Monaco font on a white background. From a viewing distance of 45 centimetres each number subtended a vertical visual angle of 1.49° and a horizontal angle of 0.92°. Following a 2-second retention interval at the end of each trial, participants saw a circular array of all 1-8 digits (changed for each trial to eliminate practice order effect) and were required to click on each digit in order of presentation. The "?" in the centre of the circle was to be used if a digit could not be recalled in a specific position. Fifty sequences of digits were created by sampling without replacement from the set 1-8. From these, 10 sequences were allocated to each Sound Condition. A confidence continuum was also developed that allowed for participants' self-reported confidence in their decision, ranging from "Not confident" to "Totally confident". The total task took approximately 45 min. Raw data from each participant were collapsed to establish the mean, with accuracy measured using a scale ranging from zero, for no digits, to eight, for all digits correctly recalled according to the strict serial recall criterion.
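The sequence construction and scoring rule described above can be sketched as follows. This is a minimal Python illustration rather than the E-Prime implementation used in the study: the function names, the seed argument and the block-wise allocation of sequences to conditions are assumptions, and the source does not state whether the 50 sequences were required to be unique.

```python
import random

DIGITS = list(range(1, 9))  # the digit set 1-8


def make_sequences(n_sequences=50, seed=None):
    """Create digit sequences by sampling the set 1-8 without replacement."""
    rng = random.Random(seed)
    return [rng.sample(DIGITS, len(DIGITS)) for _ in range(n_sequences)]


def allocate(sequences, conditions):
    """Allocate an equal number of sequences to each Sound Condition (assumed block-wise)."""
    per_condition = len(sequences) // len(conditions)
    return {
        condition: sequences[i * per_condition:(i + 1) * per_condition]
        for i, condition in enumerate(conditions)
    }


def strict_serial_score(presented, recalled):
    """Count digits recalled in their exact presented position (0 to 8).

    A "?" response marks a position the participant could not recall.
    """
    return sum(p == r for p, r in zip(presented, recalled))


# Group 1 condition labels as listed in the Design section
group_1 = [
    "quiet",
    "Melody-familiar",
    "Melody-unfamiliar",
    "Unfamiliar sung-lyrics-familiar melody",
    "Unfamiliar sung-lyrics-unfamiliar melody",
]
trials = allocate(make_sequences(seed=1), group_1)
print({condition: len(seqs) for condition, seqs in trials.items()})
```

Under the strict criterion a transposition costs two positions: recalling 1 2 4 3 5 6 7 ? for the sequence 1 2 3 4 5 6 7 8 scores 5, not 7.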

Results
Mean serial recall performance as a function of Sound Conditions in Experiment 3 can be observed in Figure 5 (Panel A and Panel B).
Following a one-way repeated measures ANOVA, serial-digit recall scores were found to be more accurate in quiet conditions.An unexpected inconsistency in recall score accuracy in the quiet condition between Group 1 and Group 2 created a slight baseline slip (Group 2 had higher performance).However, for both Groups, scores in quiet conditions were significantly higher than those from any of the distracter conditions.
The combined Group mean Recognition scores for known melodies was 8.55¹² with 8.48 for known lyrics. A univariate analysis on d′ (d-prime) scores (sensitivity/discriminability) for melody or spoken lyrics showed no difference between Groups. However, increased confidence was identified, from the c scores, for matched speech (spoken lyrics) compared to matched melody.
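For reference, the standard signal detection definitions of these two indices are d′ = z(hit rate) − z(false-alarm rate) and c = −0.5 × [z(hit rate) + z(false-alarm rate)], where z is the inverse of the standard normal cumulative distribution. The sketch below implements those textbook formulas in Python. The hit and false-alarm rates are hypothetical, since the source reports only summary recognition scores, and the log-linear correction for extreme proportions is an assumption rather than a procedure described by the authors.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse standard normal CDF


def corrected_rate(count, n_trials):
    """Log-linear correction, keeping rates away from exactly 0 or 1 (assumed here)."""
    return (count + 0.5) / (n_trials + 1)


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity: distance between signal and noise distributions in z units."""
    return _z(hit_rate) - _z(false_alarm_rate)


def criterion_c(hit_rate, false_alarm_rate):
    """Response criterion: negative values indicate a bias towards responding 'known'."""
    return -0.5 * (_z(hit_rate) + _z(false_alarm_rate))


# Hypothetical counts out of 10 recognition items (illustration only)
hits = corrected_rate(9, 10)
false_alarms = corrected_rate(2, 10)
print(round(d_prime(hits, false_alarms), 2), round(criterion_c(hits, false_alarms), 2))
```

Without some correction of this kind, a participant who recognises all 10 items has a hit rate of 1 and an undefined z score.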

Discussion
In line with previous literature using musical excerpts as distracters (Alley & Greene, 2008; Baddeley, 1986; Jones & Macken, 1993; McCorkell, 2012; Perham & Sykora, 2012; Perham & Vizard, 2011; Pring & Walker, 1994; Salamé & Baddeley, 1989; Schlittmeier et al., 2008), serial recall accuracy was highest in quiet conditions as compared to all Sound Conditions, and lowest, irrespective of melody or lyrics familiarity, when combined within song (cf. Iwanaga & Ito, 2002; Nittono, 1997; Salamé & Baddeley, 1989). Consistent with the interference-by-process approach (Jones & Tremblay, 2000), serial recall performance was poorer for spoken lyrics unaccompanied by melody than for melody alone, suggesting that the acoustic complexity of speech over tonal melody yields a more pronounced interference via competing seriation processes. Note that the opposite pattern is true for the melody retrieval task in Experiment 1, wherein melody was more disruptive than spoken lyrics. Although spoken lyrics were more disruptive than melody in Experiment 2, which required spoken retrieval of lyrics, we assume that the basis of this effect (a semantic interference-by-process) is different from that in the context of serial recall in Experiment 3. Taken together, the opposite findings in relation to task disruption via to-be-ignored melody versus spoken lyrics observed in Experiments 1 and 3 cohere with the notion that the deliberate processing of melody in the focal task (Experiment 1) renders the task vulnerable to disruption from the automatic processing of task-irrelevant melody, as the interference-by-process account suggests (Jones & Tremblay, 2000). Also consistent with the interference-by-process account is the failure to replicate the patterns of disruption observed in Experiments 1 and 2.
For example, unlike Experiment 1 wherein familiar melody combined with lyrics (either familiar or not) was more disruptive of melody retrieval than unfamiliar melody combined with lyrics, and familiar melody combined with unfamiliar lyrics was more disruptive of melody retrieval than unfamiliar melody combined with unfamiliar lyrics, these sequences were not differentially disruptive of serial recall. Similarly, the lack of any differential disruption according to different sequence types on serial recall differed from the results of Experiment 2 wherein familiar sung lyrics combined with familiar melody was more disruptive of lyrics retrieval than familiar sung lyrics combined with unfamiliar melody. This pattern of results argues against the notion that familiar melody combined with lyrics usurps control of the vocal-motor system regardless of the focal task process. Rather, it suggests that a more specific competition for action drives the disruptive effect attributable to a combination of familiar melody with lyrics.
¹² Hit rate from 10 items.
The results of Experiment 3 mostly favour the interference-by-process/perceptual-gestural account (Hughes et al., 2009). According to this account, serial recall requires the conversion of visual-verbal sequences into articulatory form, exploiting the speech-planning mechanism for organisation and maintenance of the required sequence via subvocal rehearsal. Thus, the process required for retention of visual-verbal information across the short term is independent of melodic/semantic processing. Further, the "perceptual organization" of irrelevant sound via the streaming process operates on precategorical acoustic changes. Thus, this interference-by-process mechanism explains why the familiarity of melody/lyrics, when combined within task-irrelevant sequences, has no disruptive power in the context of serial recall.
The idea that impairment results from the requirement to remember and simultaneously ignore similar items that require similar seriation processes (interference-by-content; Neath, 2000; Salamé & Baddeley, 1982) was not supported by the results of Experiment 3. Irrespective of the nature of the sound, serial-digit recall was disrupted compared to recall in quiet conditions.
One unexpected finding from Experiment 3 was a distracter familiarity effect, whereby familiar as compared to unfamiliar melody was more disruptive of serial recall. This difference occurred even though the changing-state properties (and thus acoustic variability) of both sequence types were equated, such that their disruptive potency should have been similar. This distracter familiarity effect is also somewhat puzzling given that the effect of familiarity failed to materialise when melody was combined with lyrics. At first glance it might appear that the presence of to-be-ignored speech (either familiar or unfamiliar) overrides the disruption produced by familiar melody, possibly because its presence increases the changing-state content of the sequence and thus its power to disrupt serial recall (Jones, 1994). However, that familiar to-be-ignored spoken lyrics failed to disrupt performance relative to unfamiliar to-be-ignored spoken lyrics suggests some limit to the notion that increasing changing-state information eliminates the unique disruption attributable to distracter familiarity. The cognitive underpinnings of the melody familiarity effect observed in Experiment 3, therefore, require further investigation.
A pivotal question emerging from the results of Experiment 3 is whether the melody familiarity effect observed for serial recall can be reconciled with the interference-by-process view. Any explanation of this effect in terms of the interference-by-process account must go beyond a simplistic conceptualisation of the changing-state effect: It implies some post-categorical processing (e.g. identification of a melody). Adhering to the framework of interference-by-process, one potential explanation for the melody familiarity effect is that familiar melodies, on account of being overlearned, may strongly activate serial order representations that act as a stronger competing subvocal motor plan (Lima et al., 2016), thereby exacerbating interference-by-(seriation)-process. Further, schema-driven processing (Bregman, 1990) may be invoked by a to-be-ignored familiar melody, giving rise to a competing serial order representation that is stronger than that derived from non-schema-driven processing of an unfamiliar to-be-ignored melody: Since the strength of order cues dictates the magnitude of interference-by-process, it is possible that greater disruption of serial digit recall should be observed for sequences of familiar than of unfamiliar melody. Yet, the distracter familiarity effect might be better explained by a separate mechanism altogether.
While a debate continues regarding whether the changing-state effect reflects attentional capture (e.g. Körner et al., 2018; Labonté et al., 2021), there is consensus that some forms of auditory distraction reflect such attentional diversion. For example, the duplex-mechanism account argues for the existence of two discrete forms of auditory distraction (Hughes et al., 2007, 2013; Sörqvist, 2010): One is attributable to interference-by-process, the other reflects attentional capture (Hughes et al., 2005; Hughes et al., 2007; but see Bell et al., 2010, 2012). Interference-by-process is manifest through the changing-state effect which, in the context of Experiment 3, accounts for why disruption occurs from all sound sequences, regardless of familiarity, compared to quiet (all sound sequences satisfy the criteria for changing-state). However, in the view of the duplex-mechanism account (Hughes, 2014), the distracter familiarity effect (whereby familiar as compared to unfamiliar melody was more disruptive of serial recall) may require appeal to an attentional diversion mechanism. Two forms of attentional capture have been outlined. First, aspecific attentional capture can occur from a sound that violates expectation (e.g. Parmentier et al., 2018). This form of attentional capture occurs even when there is no relationship between the properties of the sound and those of the task. Second, specific attentional capture occurs when the sound's content has the capacity to divert attention and may be independent of the task (Hughes, 2014). For example, increased task errors may be incurred when the background sound's content is responsible for gifting the sound its power to divert attention, irrespective of the processing involved in that task (Hughes et al., 2007; Vachon et al., 2017). Such capture can be produced, for example, by a high-valence word (Buchner et al., 2004; Marsh et al., 2018), or by a sound's personal significance (Röer et al., 2013). In the present context, the additional disruption from familiar relative to unfamiliar melody observed in Experiment 3 may cohere with the notion that familiar, as compared with unfamiliar, melody produces specific attentional capture that is independent of the requirements of the task (e.g. Röer et al., 2013; Marsh et al., 2018).

Experiment 4
In Experiment 4, we deploy a task devoid of serial recall, the missing-item task, to assess the mechanism underpinning the greater disruption produced by familiar versus unfamiliar melody. The missing-item task has been shown to be sensitive to disruption via properties of sound that induce attentional diversion (e.g. Hughes et al., 2007; Marsh et al., 2018) but is immune to disruption produced by changing sequences of sound (e.g. Beaman & Jones, 1997). The rationale for adopting the missing-item task was that if familiar melody seizes control of, or recruits, motor-planning systems more than unfamiliar melody, then familiar melody should only impair serial digit recall and not the missing-item task, which does not require such sequential motor-planning. Establishing the missing-item task to be invulnerable to the effect of distracter familiarity would lend support to the perceptual-gestural account (albeit requiring an explanation for why enhanced seriation derives from, for example, schema-driven processing of melody), while finding a disruptive effect of melody familiarity would support an attentional diversion account of the distracter familiarity effect in the context of short-term memory.

Method
Only melody distracter material was delivered for this experiment, as the familiarity effect identified in Experiment 3 was observed for the Melody Sound Type Group. Any modifications to the design and procedures of Experiment 3 are detailed below.

Participants
One hundred and six adults aged 18-85 (M = 43.62, SD = 23.04) participated, 44 of whom were aged over 50 years. For theoretical reasons explicated by the interference-by-process account of auditory distraction, there should be no effect of sound on missing-item task performance unless the sound sequence comprises deviant (e.g. surprising) sound elements. A power analysis of the sample size needed to detect an effect of sound on the missing-item task is therefore, arguably, better conducted based on data that revealed such an effect and that are theoretically consistent with the interference-by-process account. In a study by Hughes et al. (2007; Experiment 2), an effect of background sound (comprising deviating sound elements) on missing-item task performance was reported with a size of d_z = 1.12. An a priori power analysis (using G*Power) revealed that a sample size of at least 13 participants is needed to detect this effect.
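As a rough cross-check on this figure, the required sample size for a within-participants comparison can be approximated in closed form. The sketch below is not the G*Power computation itself: it uses a normal approximation with a small-sample correction rather than the exact noncentral t distribution, and it assumes a two-tailed alpha of .05 and a target power of .95, neither of which is stated in the text. The function name is illustrative.

```python
import math
from statistics import NormalDist


def approx_paired_n(d_z: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Approximate n for a two-tailed paired t-test.

    alpha and power are assumed defaults, not values reported in the paper.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-tailed critical value
    z_power = z.inv_cdf(power)            # quantile for the target power
    n = ((z_alpha + z_power) / d_z) ** 2  # large-sample requirement
    n += z_alpha ** 2 / 2                 # correction for using t rather than z
    return math.ceil(n)


print(approx_paired_n(1.12))
```

Under these assumed settings the approximation returns 13 for d_z = 1.12, in line with the reported minimum; different alpha or power values would give a different number.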

Design
In a within-participants design, the Melody distraction variable had three levels: quiet (no distracter), unfamiliar melody and familiar melody. Two blocks of three conditions yielded six orders of presentation, fully counterbalanced. The single item response time was not computed.

Materials and apparatus
To yield a richer set of data, an additional set of ten familiar melodies (without their associated lyrics) and ten additional matched unfamiliar melodies were used to create a second block of trials of identical structure to the original distracter block.

Procedure
Following completion of demographic and consent forms, five practice trials were undertaken in quiet conditions. Participants were visually presented with a series of eight single digits, drawn from the digits 1-9 (each digit presented once in each trial), and were directed to identify the missing digit. Digits were presented consecutively at a rate of 1 per second (800-millisecond on, 200-millisecond off) in 72-point equidistant black Monaco font on a white background. Following a 200-millisecond retention interval at the end of each trial, all nine digits (1-9) were presented on the computer screen with the instruction to double click on the digit that was missing from the original trial sequence. There was no time limit on recall. Ten experimental trials were then delivered for each of six condition blocks (two each in quiet, familiar melody and unfamiliar melody). Participants were informed that they should ignore any sounds played through headphones; onset of the to-be-ignored sound was simultaneous with the onset of each visual digit pattern presentation. A 30-second rest between experimental conditions was permitted, if needed. Following all experimental trials, participants indicated, from a given list, the strategy they had used to complete the missing-item task (based on Morrison et al., 2016).
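The structure of a single missing-item trial described above can be sketched in a few lines. This is an illustrative reconstruction only, not the authors' experiment script: the function names are hypothetical, and the random presentation order is an assumption, since the text does not state how digit order was determined.

```python
import random

DIGITS = list(range(1, 10))  # the digits 1-9


def make_missing_item_trial(rng: random.Random) -> tuple[list[int], int]:
    """Return (eight presented digits, the one digit left out)."""
    missing = rng.choice(DIGITS)
    presented = [d for d in DIGITS if d != missing]
    rng.shuffle(presented)  # assumed random order; no digit repeats
    return presented, missing


def score_response(response: int, missing: int) -> int:
    """Score 1 if the selected digit is the missing one, otherwise 0."""
    return int(response == missing)


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
sequence, target = make_missing_item_trial(rng)
print(sequence, target, score_response(target, target))
```

Averaging the 0/1 scores over the ten trials of a block gives a proportion correct with a maximum of 1, which is the scale on which the results are reported.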

Results
Overall scores in quiet conditions were the highest (maximum 1). Performance decreased in the unfamiliar melody condition, with a further decrement to scores in the familiar melody condition (see Figure 6).
Several participants self-reported using a grouping strategy, which possibly indicated that they had undertaken some serial rehearsal (cf. Hughes & Marsh, 2020). Therefore, mean performance was compared between participants who self-reported a rehearsal strategy and those who self-reported a non-rehearsal strategy. The ensuing ANOVA revealed no between-participant main effect of Self-Reported Strategy, F(1, 104) < 0.001, MSE = .090, p = .999, ηp² < .001, and no Sound Condition × Strategy interaction, F(2, 208) = 0.697, MSE = .013, p = .499, ηp² = .007. Total Recognition test scores yielded a hit rate of 7.2, and the d′ measure of recognition and the criterion shift score showed that participants were able to discriminate unfamiliar from familiar melodies, albeit conservatively (see Benjamin & Bawa, 2004).
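For readers less familiar with the recognition measures mentioned here, the standard equal-variance signal detection formulas are d′ = z(H) − z(FA) and c = −[z(H) + z(FA)]/2, where H is the hit rate and FA the false-alarm rate. The sketch below is a minimal illustration of those formulas, not the authors' analysis code; the rates passed in the example are hypothetical and are not taken from the reported data.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity: distance between signal and noise distributions."""
    return _z(hit_rate) - _z(fa_rate)


def criterion(hit_rate: float, fa_rate: float) -> float:
    """Response bias c; positive values indicate conservative responding."""
    return -0.5 * (_z(hit_rate) + _z(fa_rate))


# Hypothetical rates, chosen only for illustration.
example_hits, example_fas = 0.72, 0.20
print(round(d_prime(example_hits, example_fas), 2),
      round(criterion(example_hits, example_fas), 2))
```

A positive d′ indicates above-chance discrimination of familiar from unfamiliar melodies, and a positive c corresponds to the conservative responding described above. Rates of exactly 0 or 1 have no finite z-score and would need a correction before these functions are applied.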

Interim discussion
The results of Experiment 4 were unequivocal: the missing-item task, widely thought to involve non-seriation processes (e.g. Beaman & Jones, 1997; Buschke & Hinrichs, 1968), was indeed vulnerable to the distracter familiarity effect. Consequently, this vulnerability supports the idea that the mechanism underlying the effect is specific attentional capture (Marsh et al., 2018), and not interference-by-process (Jones & Tremblay, 2000). Moreover, further evidence in support of the attentional capture view was gleaned from the fact that the additional disruption from familiar over unfamiliar melody occurred regardless of whether participants self-reported a seriation or non-seriation-based strategy, as reflected in the absence of a Sound Condition × Self-Reported Strategy interaction. Thus, the distracter familiarity effect appears task-process insensitive. In contrast, task-process sensitivity appears to be operating for the disruption that unfamiliar melody produces: this effect appears to be more pronounced on serial digit recall (Experiment 3) than on the missing-item task (Experiment 4). This is addressed below, with an analysis that directly compares the results of Experiment 4 with those from the Melody conditions of Experiment 3, which required serial recall.

Further discussion
The pattern of results across both the serial recall and the missing-item tasks calls into question an interpretation that familiar melodies, being overlearned, activate serial-order representations, such as a competing subvocal motor plan (e.g. Lima et al., 2016), to a greater degree than unfamiliar melody distracters, empowering them with a superior propensity to clash with the subvocal motor output organisation underpinning the serial rehearsal process. On this approach, it would be expected that an interference-by-process attributable to melody familiarity would be exacerbated on serial, rather than non-serial, short-term memory tasks. At odds with this, the melody familiarity effect was observed on the non-seriation-based missing-item task (Beaman & Jones, 1997; Jones & Macken, 1993). Rather, the familiarity effect gels with an attentional diversion view according to which the content of familiar, as compared to unfamiliar, melody is responsible for diverting attention away from the focal task (i.e. specific attentional capture; cf. Marsh et al., 2018).

General discussion
The results of the current series can be summarised as follows: As indexed by accuracy of melody retrieval via humming, Experiment 1 demonstrated that the mere presence of background sound disrupted the production of song. Regardless of their familiarity, instrumental melodies were more disruptive than spoken lyrics. Familiar melody in combination with familiar or unfamiliar lyrics produced additional disruption to song retrieval via humming. Furthermore, unfamiliar lyrics combined with familiar melody produced greater disruption than unfamiliar lyrics combined with unfamiliar melody. Onset time for song retrieval was increased when familiar lyrics were combined with melody as compared with all other background Sound Conditions. Similarly, Experiment 2 revealed that the accuracy of the retrieval of target lyrics was disrupted by all background Sound Conditions. Replicating Experiment 1, greater disruption was observed for sung lyrics as compared with spoken lyrics or melody alone. However, familiar melody disrupted spoken lyric production to a greater extent than unfamiliar melody. Familiar lyrics combined with familiar melody disrupted spoken lyrics retrieval more than unfamiliar lyrics combined with an unfamiliar melody. Unfamiliar spoken lyrics produced greater disruption than unfamiliar melody alone. This is at odds with Experiment 1, where the opposite pattern emerged. Experiment 3 revealed that all background sound conditions disrupted the serial recall task. Here, spoken lyrics unaccompanied by melody produced greater disruption than melody alone, an opposite pattern to that found in Experiment 1.
Unlike Experiments 1 and 2, no differences emerged between the conditions in which familiar or unfamiliar lyrics were combined with unfamiliar or familiar melody. However, familiar melody alone produced greater disruption of serial recall than unfamiliar melody alone. This melody familiarity effect was replicated in Experiment 4, wherein the missing-item task, which does not necessitate serial order processing, was adopted to address whether the manifestation of this melody familiarity effect depended on the adoption of a seriation strategy.
The results of all the experiments appear to gel well within a process-oriented account of the disruption produced by task-irrelevant material (Hughes & Jones, 2005; Jones & Tremblay, 2000; Marsh et al., 2009; Neumann, 1996). The perceptual-gestural view (Hughes & Marsh, 2017; Jones et al., 2004, 2006) holds that short-term memory is subserved by general-purpose perceptual and motor mechanisms wherein inner speech is co-opted to bind together visual-verbal items into a motor plan that enables their sequential reproduction. In the case of the classical ISE, the perceptual processing of serial order in the sound, as part of the perceptual streaming process (Bregman, 1990), conflicts with the vocal-motor processing of serial order in the primary task. In this setting, interference may emerge as the result of the costs of mechanisms that play a role in resolving the competition-for-action promoted by order cues that arise from the deliberate processing of to-be-remembered material and the automatic processing of to-be-ignored material (Hughes & Jones, 2003; Marsh et al., 2009). On this view the semantic properties of background sound, including the familiarity of music and/or lyrics, do not play a role in the disruption (Experiment 3; see also Buchner et al., 1996; Jones et al., 1990; Marsh et al., 2009). This is because the classic ISE is driven by the information that the background sound conveys in terms of its serial order, not its meaning. Such information is a superficial fit with the action (or vocal-motor process) of serial rehearsal. Due to its greater acoustic complexity than non-speech sounds, background speech conveys more cues as to serial order, which enhances its competition for the vocal-motor process underpinning serial recall performance (e.g. Tremblay et al., 2000). The interference-by-process approach also explains why different characteristics of background sound become more disruptive when the focal task processes change (Experiments 1-2; Marsh et al., 2008, 2009; Meng et al., 2020). When, for example, the primary task requires dynamic vocal-motor retrieval processes involving a familiar target melody (Leman, 2007; Leman & Maes, 2014, 2015), irrelevant melodic (e.g. contour) information extracted from background sound produces stronger specific competition for these vocal-motor processes, while spoken lyrics that lack a melodic element produce weaker competition. This finding is reminiscent of previous research demonstrating a modality-specific ISE when comparing tones and speech (LeCompte et al., 1997; Pechmann & Mohr, 1992; Schendel & Palmer, 2007; Williamson et al., 2010). Furthermore, background song wherein one of the two elements (lyrics or melody) is familiar confers even stronger competition with the vocal-motor melodic retrieval processes, because such elements activate candidate overlearned sequences within long-term semantic memory that compete for retrieval. Slower onset time measures for melody retrieval in the presence of familiar background melody combined with lyrics suggest that vocal-motor planning may be particularly compromised by familiar background song, perhaps because it assumes control of the retrieval process and requires removal.
Figure 7. Probability of correct recall as a function of the quiet condition and unfamiliar irrelevant Sound Condition for the serial recall task (Experiment 3) and missing-item task (Experiment 4). Error bars represent standard error of the means.
The interference-by-process account also readily explains the shift in the patterns of disruption from background sound that occur when the focal task requires lyrics retrieval (Experiment 2). Here, spoken (unfamiliar) lyrics become more disruptive than (unfamiliar) melody because the primary task involves dynamic retrieval processes focused on lexical-semantic representations. Thus, irrelevant semantic information extracted from the speech produces competition for these processes: a semantic interference-by-process (Marsh et al., 2009). That all Sound Conditions produced disruption relative to quiet suggests that vocal-motor planning is required for the overt production of sequences of spoken lyrics. The greater disruption from song in which one element (lyrics or melody) was familiar compared to when both components were unfamiliar coheres with the notion that this information activates long-term representations of well-known songs that highly specify context-compatible, but ultimately response-inappropriate, information in the context of the lyric retrieval task. The greater disruptive effect of familiar against unfamiliar melody (without lyrics) within Experiment 2 suggests that familiar melody may activate competitor song to a greater extent when the focal task requires lyrics retrieval relative to melody retrieval (Experiment 1). Perhaps this is because melody retrieval via humming requires melodic, pitch, spectral and temporal cues that are disrupted to a greater extent by cues from any (unfamiliar or familiar) background melody.
One finding that is potentially problematic for the process-oriented approach is that familiar melody (without lyrics) produced greater disruption than unfamiliar melody (without lyrics) in the context of visual-verbal serial recall (Experiment 3). Although this pattern was observed for lyric retrieval in Experiment 2, it may have a different basis in the context of visual-verbal short-term memory. To explore this, in Experiment 4 the impact of familiar versus unfamiliar melody was explored in the context of the missing-item task, which arguably does not involve, or at least does not necessitate (Hughes & Marsh, 2020; Morrison et al., 2016), memory for serial order and hence vocal-motor planning. That the missing-item task was shown to be as vulnerable to disruption produced by familiar against unfamiliar background melody suggests that the effect, in the context of visual-verbal short-term memory, is underpinned by a mechanism other than interference-by-process, namely specific attentional capture (Marsh et al., 2018). On this view, the content of the background sound has the capability to divert attention away from the focal task, irrespective of the processes underpinning that task (Hughes et al., 2007; Vachon et al., 2017). In the context of visual-verbal serial recall, the changing-state effect and attentional capture can be additive (Hughes et al., 2005, 2007; Marsh et al., 2018). Thus, familiar and unfamiliar melodies both produce a changing-state effect on serial recall, but the additional disruption produced by familiar melody can be attributed to specific attentional capture (for a similar logic, see Hughes & Marsh, 2019). Specific attentional capture can be due to the content being of personal significance (Röer et al., 2013) or intrigue (Hughes & Marsh) to the participant, or because it possesses emotional valence for the listener (Buchner et al., 2004; Marsh et al., 2018).
Taken together, however, the results of Experiments 1-4 can be better interpreted within a process-oriented approach than within attentional-resource-based accounts of the disruption produced by background sound (Bell et al., 2019; Cowan, 1995; Lange, 2005; Neath, 2000). For example, on the attentional capture approach (Bell et al., 2019; Cowan, 1995; Lange, 2005), the classical ISE is explained by acoustic changes-in-state from one item to the next in an irrelevant sequence causing an orienting response towards the sound and thus away from the focal task. On this account it must be assumed that the same patterns of disruption from background sound should be observed regardless of the nature of the focal task-processing. Evidence demonstrating that attentional capture effects are observed on the missing-item task (Beaman & Jones, 1997; Buschke, 1963), which does not engage, or necessitate, a seriation process, while a changing-state effect is not (Hughes et al., 2007; see also Hughes & Marsh, 2020; Marsh et al., 2018, 2023), suggests that attentional capture and changing-state effects are functionally distinct. Similarly, that Experiments 1-4 clearly show the nature of auditory distraction to be crucially dependent on the prevailing mental activity further undermines the attentional capture approach while at the same time supporting an axiomatic tenet of the interference-by-process approach (Jones & Tremblay, 2000; Marsh et al., 2008, 2009). Indeed, task-sensitivity to auditory distraction suggests that process, not content, dictates the magnitude and character of disruption and aligns with a functionalist approach to memory in which task-goals and the retrieval environment (instructions, cues, task demands) are central to remembering and forgetting (cf. Toth & Hunt, 1999).
The current study also sought to shed light on the integration/independence debate concerning melody and lyrics within song. If melody and lexical-semantic (e.g. lyric) retrieval are undertaken within the same processing system, or via shared processes, then comparable patterns of disruption should have been observed between Experiments 1 and 2. However, the retrieval accuracy for melody (Experiment 1) and spoken lyrics (Experiment 2) was influenced differently by the same background sounds, which supports the notion of independence (Besson et al., 1998; Besson & Schön, 2001; Bonnel et al., 2001; Peretz et al., 2009). Further evidence of independence was provided by the finding that familiar melody (without lyrics) was not significantly more disruptive than unfamiliar melody (without lyrics) for melody retrieval in Experiment 1 but was for lyrics retrieval in Experiment 2. Further, that familiar melody with associated lyrics was indeed more disruptive than unfamiliar melody, and unfamiliar melody with unfamiliar lyrics (Experiment 1), suggests that associated lyrics were not imagined or automatically activated by the presence of familiar melody per se (i.e. without lyrics; see Bailes et al., 2012; Pring & Walker, 1994), as an integration account might predict.
Arguably, several findings reported in Experiments 1 and 2 might be understood within the modular model of music processing (Peretz & Coltheart, 2003). On this account, the musical lexicon contains all the representations of musical phrases that an individual has been exposed to during their lifetime. Recognition of an incoming familiar tune requires selection of that tune from others within the musical lexicon. Output from the musical lexicon, for example when one is to produce a song, requires pairing of the song with lyrics that are stored in the phonological lexicon. Song and lyrics are then integrated and planned for vocal production. It is possible that, consistent with the perceptual-gestural approach (Hughes & Marsh, 2017; Jones et al., 2004, 2006), this planning draws upon a dynamic interplay between music perception, motor-planning and action-planning and reflects embodied cognition (Leman, 2007; Leman & Maes, 2014). On the modular model of music processing, melody production (e.g. via humming, Experiment 1) can occur independently of the phonological lexicon but requires pitch and temporal organisational processes. Activation of the same pitch and temporal processes by a background melody would thus produce difficulty in accessing the musical lexicon for familiar melody as well as its vocal production via humming (Experiment 1). However, spoken lyrics, which primarily activate the phonological lexicon, will consequently produce less disruption. Conversely, spoken lyric production makes greater demands on access to, and output from, the phonological lexicon, and thus activation of other entries within the phonological lexicon by unfamiliar or familiar spoken lyrics produces greater disruption than melody.
According to the modular model of music processing (Peretz & Coltheart, 2003), bidirectional links exist between the musical lexicon and the phonological lexicon, which means that when a familiar background melody activates its representation within the musical lexicon it can also activate its associated lyrics within the phonological lexicon. Similarly, spoken lyrics that activate the phonological lexicon can activate the associated melody within the musical lexicon. That familiar melody against unfamiliar melody produces greater disruption to spoken lyric retrieval (Experiment 2) than melodic retrieval (Experiment 1) suggests that the link from the musical lexicon to the phonological lexicon is stronger than the link in the opposite direction: That is, familiar melody activates representations of associated lyrics that compete with the vocal planning of spoken lyrics, while spoken lyrics do not (at least strongly) activate melody that can interfere with the vocal planning of melody production via humming. Unfamiliar melody accompanied by unfamiliar lyrics produces less disruption than the other combinations of melody and lyrics because both unfamiliar melody and unfamiliar lyrics fail to strongly activate representations of familiar songs within the musical and phonological lexicons that could compete with vocal planning. Provided at least one component of the song is familiar, competing representations within the musical and phonological lexicons, activated either directly or through the bidirectional link between them, compete strongly for the vocal planning underpinning melodic and spoken lyric retrieval. That is, selection of a familiar target melody or familiar target lyrics is influenced by the activation of competitors within the musical and phonological lexicons. This can have a bearing on the actual selection of target information (melody or lyrics) to populate the motor-planning stage for the target sequence (cf. Peretz et al., 2009; Peretz & Zatorre, 2005). Dynamic selective attentional processes such as inhibition may facilitate target selection by resolving this competition or by removing the candidate, but response-inappropriate, melody or lyrics from the planning process in the event that they assume the control of action (cf. Neumann, 1987).
The results in relation to onset-time (Experiment 1) may also be explicable within the modular model of music processing (Peretz & Coltheart, 2003). Here, the presentation of a familiar melody with familiar lyrics would result in faster recognition and activation of the competitor within the musical lexicon and thus more completely influence the vocal-planning process, perhaps through capturing the articulators. The finding that familiar lyrics combined with familiar melody delayed onset time for melody retrieval relative to the other background conditions in Experiment 1 is consistent with this explanation: a longer time would be required to resolve the competition and recover the target melody in this condition. That the effect on onset time was not observed for lyrics retrieval in Experiment 2 might be attributable to two explanations. First, when lyrics are decoupled from melody, as in spoken lyric retrieval, there may be less demand on the vocal planning mechanism to produce the output. Second, melody retrieval may be entwined to a greater degree with associated memories than spoken lyric retrieval, and as such the task parameters yield greater opportunity generally for background material to compete for, and therefore disrupt, the vocal-planning mechanism responsible for target melody production.

Limitations
Great care was taken to match familiar and unfamiliar melodies and lyrics within the background Sound Conditions used within this experimental series. For this reason, however, it is possible that some unfamiliar background sequences may have been perceived as familiar due to melody priming. To avoid this potential problem in future work, unfamiliar folk song melodies might be adopted as unfamiliar melodies (e.g. Dowling et al., 1995). The results of Experiments 1-4, however, demonstrate that participants were able to discriminate familiar from unfamiliar melodies well enough for them to be differentially disruptive. Indeed, this was also supported by supplementary data from recognition tests. Nevertheless, the use of melodies that are not derived from familiar melodies may produce purer effects. It is also possible that the effects of familiarity might have been diluted somewhat through semantic priming. Some of the songs used may be semantically and associatively related to one another through, for example, theme (e.g. Christmas, or holiday songs) or learning episode (e.g. during pre-school).
A key question for future research would be to establish whether the production of a target song (melody or lyrics, e.g., "Jingle Bells") is more disrupted by the concurrent presentation of a semantically (and contextually) associated song (e.g., "Rudolph the Red-Nosed Reindeer") than by a semantically (and contextually) non-associated song (e.g., "Row, row, row your Boat"). On the modular model of music processing (Peretz & Coltheart, 2003), such categorical priming between musical lexicon representations should exacerbate competition for the vocal-motor planning process.
In Experiment 2 we requested spoken retrieval of lyrics. This task, however, may have involved inhibition of the tendency to sing lyrics accompanied by their melody. An outstanding question, therefore, is whether requesting singing as the vocal-motor output would change the patterns of disruption observed from unfamiliar and familiar melody and lyrics. Singing, as compared with humming (Experiment 1), requires a greater level of vocal co-ordination, due to the simultaneous activation of melody and lyrics. Producing melody and lyrics through singing also offers the advantage that both facets (accuracy of melody and of lyric retrieval) can be scored on the same output, rather than on different outputs as in Experiments 1 and 2. Furthermore, it would be interesting to discover whether speaking and singing in the presence of different properties of background sound yield patterns of results similar to the clinical dissociations identified in neurological studies, wherein patients with left frontal lesions can sing, but not speak, words (e.g., Brust, 2003).
Some of our conclusions should also be treated as tentative. When adjusting for multiple tests using the Holm-Bonferroni sequential method (Holm, 1979) to control the familywise error rate, some of the comparisons in Experiments 1 and 2 approached rather than achieved significance. These are indicated in the results sections.
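For readers unfamiliar with the procedure, the following is a minimal sketch of the Holm-Bonferroni sequential method, not the analysis script used in the experiments. The function name `holm_bonferroni` and the example p-values are hypothetical and chosen purely for illustration. The p-values are sorted in ascending order, and the k-th smallest (counting from zero) is compared against alpha / (m - k), where m is the number of tests; testing stops at the first comparison that fails.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Holm's step-down rule: sort p-values ascending, compare the k-th
    smallest (k = 0, 1, ...) with alpha / (m - k), and stop at the
    first p-value that exceeds its threshold.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        threshold = alpha / (m - rank)
        if p_values[idx] <= threshold:
            reject[idx] = True
        else:
            # All remaining (larger) p-values are retained as well.
            break
    return reject


# Hypothetical p-values for four planned comparisons (illustration only).
example_p = [0.010, 0.040, 0.030, 0.005]
print(holm_bonferroni(example_p))  # [True, False, False, True]
```

In this illustrative case the comparisons with p = .030 and p = .040 fall below an uncorrected .05 criterion but are not rejected after correction, which is the sense in which a comparison can approach rather than achieve significance.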

Conclusion
The findings reported here show that retrieval of melody (Experiment 1) and spoken lyrics (Experiment 2) is disrupted by background sound in ways coherent with the dynamic interference-by-process view of the interference-forgetting relationship (Jones & Tremblay, 2000; Marsh et al., 2009). Disruption to the vocal-motor planning process required to retrieve melody or lyrics is produced by a competition for retrieval that differs in specificity according to the requirements of the focal task. As such, the explanatory compass of the interference-by-process construct, successful in explaining auditory distraction within short-term serial recall (Jones & Tremblay, 2000), semantic organisation (Marsh et al., 2009), creativity (Marsh et al., 2021), reading (e.g., Meng et al., 2020; Vasilev et al., 2020) and writing (Sörqvist et al., 2012), can be successfully extended to the retrieval of song.

Figure 1. An example of familiar melody with familiar lyrics (panel A) and an example of unfamiliar matched melody with unfamiliar lyrics (panel B).

Figure 2. Mean proportion retrieval accuracy for melody humming across the quiet (no sound) and Sound Conditions as a function of Sound Type, deployed in Experiment 1. Error bars represent standard errors of the means.

Figure 3. Mean onset time (in seconds) for melody humming retrieval across Sound Condition as a function of Sound Type deployed in Experiment 1. Error bars represent standard errors of the means.

Figure 4. Mean proportion retrieval accuracy for lyrics across the quiet, unfamiliar, and familiar Sound Conditions according to Sound Type in Experiment 2.

Figure 5. Mean serial recall performance as a function of group (Panel A and Panel B, respectively) and Sound Conditions deployed in Experiment 4. Error bars represent standard errors of the means.

Figure Probability correct recall as a function of irrelevant Sound Condition for the missing-item task in Experiment 4. Error bars represent standard errors of the means.

Table 1. Combination of experimental Sound Conditions for Sound Type Groups.

Table 2. Mean serial-digit recall scores across Sound Conditions by Sound Type Group.