Between Sound and Speech: Liminal Signs in Interaction

ABSTRACT When people talk, they recruit a wide range of expressive devices for interactional work, from sighs, sniffs, clicks, and whistles to other conduct that borders on the linguistic. These resources are used in the management of turn and sequence and the marking of stance and affect, and they represent an aspect of the interactional machinery that is as elusive as it is powerful. Phenomena long assumed to be beyond the purview of linguistic inquiry emerge as systematically deployed practices whose ambiguous degree of control and convention allows participants to carry out subtle interactional work without committing to specific words. While these resources have been characterized as nonlexical, nonverbal, or nonconventional, I propose that they are unified in their liminality: They work well precisely because they equivocate between sound and speech. The empirical study of liminal signs shows the promise of sequential analysis for building a science of language on interactional foundations.

Everyday language use is full of sighs, sniffs, whistles, clicks, and other conduct that borders on the linguistic. For a long time, these sounds and actions have escaped analytical attention. One reason is technological: Recording and transcription methods were not up to the task. Another is theoretical: A focus on idealized competence led a generation of linguists to look away from what was seen as the flotsam and jetsam of mere performance. But the most intriguing reason is delightfully self-referential: For talk-in-interaction to work as smoothly as it does, perhaps some aspects of it must seem like not-talk.
The articles in this issue move the goalposts of research on language in social interaction in three ways. They showcase new affordances of technology and transcription that bring new areas within reach of systematic scrutiny. They respecify conceptual foundations to call into question received views on what is and isn't linguistic. And they bring into view some of the more elusive aspects of language in everyday use. The interactional resources we encounter here emerge as integral elements of an interactional machinery that is "finely powerful" (McHoul, 2005, citing Harvey Sacks's unfinished book) and that underpins the most sophisticated uses of language.
CONTACT Mark Dingemanse m.dingemanse@let.ru.nl P.O. Box 9103, Nijmegen HD 6500, The Netherlands. I thank Leelo Keevallik and Emily Hofstetter for inviting me to a stimulating inaugural workshop on "Non-lexical vocalizations" in Linköping, and Keith Jarrett and Esperanza Spalding for sustaining that other liminal world between sound and speech.
A metaphor of machinery (Garfinkel & Sacks, 1986; Sacks, 1985) is quite apt for the resources and structures of talk-in-interaction featured in this issue. In the careful technical analyses of mundane interaction we can feel the hum of a whirring machine, gears turning and pawls clicking into place. The mechanical metaphor should not distract us from the artisanal aspects of the resources recruited here. If this is a machine, it is one of flesh and blood, interactionally achieved and human-operated. The articles in this issue study talk-in-interaction in a wide range of everyday activities: from beer tastings to board games, from reality shows to outdoor runs, and from kitchens to construction sites. Together, they paint a compelling picture of bodily sounds and actions recruited as interlocking elements in the service of a human-made interactional order.

Between sound and speech
Grounded in bodies of work on sound objects (e.g., Reber, 2012) and prosody and affect in interaction (e.g., Couper-Kuhlen, 2009), Reber & Couper-Kuhlen (2020/this issue) describe some recurrent forms and functions of conversational whistling. They show that whistles can serve as carriers for recognizable prosodic contours (as in the sequentially initial two-tone whistle, which bears similarity to a calling contour also found on spoken words), and more often end up in nonsanctioned overlap than otherwise functionally similar forms like "wow," perhaps in line with their being noticeably not speech. Reber and Couper-Kuhlen propose a continuum of sound objects according to their degree of lexicality, with whistles at the nonlexical and nonlinguistic extreme and words like hey or wow closer to the other end.
Clicks are known as phonemes in a number of African languages, but in this issue we find them described as systematic interactional resources in two languages not ordinarily thought of as featuring phonological clicks: English and Mandarin Chinese. Ogden (2020/this issue) focuses on a number of recurrent and recognizable practices involving clicks. One aim is to contribute to a "less logocentric view of language," a goal we also see reflected in earlier work on paralinguistic phenomena in interaction (e.g., Ogden, 2012, 2013). The clicks in focus are postpositioned relative to some prior action, and they function as stance markers that comment on rather than respond to these actions, or, as Ogden puts it in a particularly felicitous formulation, as a resource for "audibly not saying something." Li (2020/this issue) focuses on the use of clicks in an in-progress turn constructional unit, where they help mark a sequential trajectory change. In one sequential environment, clicks attend the repair of a socially and interactionally sensitive action-in-progress. The click in this case allows something not on the record (the possible inappositeness of an action) to not pass by without notice. As part of a larger phonetic format, the click helps project turn continuation, halts the talk-in-progress, and foreshadows a sequential trajectory change. Another use is more self-oriented and occurs predominantly in extended tellings. Ogden and Li's articles both provide illuminating discussions of the affordances connected to the phonetic substance and semiotic status of clicks. Clicks work well as noncommittal sound objects, at least in English and Mandarin Chinese, because they are clearly positionable yet don't involve explicitly lexical material.
The theme of turn and sequence organization is continued in a contribution by Hoey (2020/this issue), who describes the interactional deployment of sniffs as part of an impressive line of work exploring embodied social life at the seams of the interactional order (e.g., Hoey, 2014, 2017). Hoey finds two recurrent uses of sniffs, or quick inhalatory actions through the nose. Placed prior to or in the course of an unfolding turn, sniffs function as members of a larger "natural class of delay devices" that also includes throat clearings and items like uh and um (Clark & Fox Tree, 2002). The other sequential environment in which sniffs occur is in the postcompletion space of a turn, where Hoey documents a close relation between sniffing and turn-completion by means of an exhaustive comparative analysis of inbreaths in the same environment.
Mondada (2020/this issue) focuses on the sounds made while smelling in a beer tasting session for amateurs. Her description of the confluence of semiotic resources in these tasting events is part of a larger body of work on the multimodal and sensorial nature of human interaction (e.g., Mondada, 2016). Whereas the short sniffs described by Hoey are connected to the management of turn and sequence, here sniffing forms an integral part of the beer tasting activity, with sniff-prefaced turns offering aroma descriptions in response to olfactory inquiries or previous descriptions. By detailing the sequential organization of sniffing in interaction, Mondada brings to light an intersubjective dimension of smelling, which is too often treated merely in psychophysiological, individual-bound terms.
Pehkonen (2020/this issue) describes some structural and sequential aspects of a vocalization transcribed huh huh [ ɦ uh w ɦ uh w ], which he analyzes as a lexicalized, stylized version of exhalation or sighing. One of the more telling data fragments in this study is of a dad lifting his daughters (aged 2 and 6) while climbing a mountain and uttering a > huh ↓hu(hh)h.<, which is followed, after 1.3 s of silence, by the 6-year-old saying erm did you get hot when you climbed that mountain? This is a candidate gloss of the meaning of the token, which the dad then ratifies. Like many other sound objects studied in this issue, huh huh is implicated in displays of stance but also plays roles in the management of turn and sequence; in particular, it is frequently found in transitions from a joint activity to assessment of that activity.
Hofstetter (2020/this issue) studies communicative displays produced in response to board game events in which a player suffers. A perfect embodiment of Bateson's (1955) paradox of framing, these "moans" index suffering-in-the-game while also signaling a willingness to continue play. Hofstetter's sequential analysis shows that they are produced and treated in line with this two-faced nature: Their producers often follow up "moans" with downgrading expressions, and other players can publicly index alignment with them while also treating them as laughable. Hofstetter points out that "moans," as generic and personal expressions of in-game suffering, have the interactionally useful feature of avoiding some of the collateral effects (Sidnell & Enfield, 2012) of lexical expressions like "but why w-why would you do that to me." As she shows, the latter are more vulnerable to being interpreted as serious or as involving personally directed complaints that result in interactional turbulence.

Liminal signs
The area of talk-in-interaction targeted in this issue is beset with the terminological misfortune of non-labels: The phenomena are described as nonlexical vocalizations, nonverbal responses, or nonconventional sounds, with nonphonemic sound structure, nonarbitrary form-meaning mappings and noncommittal meanings. At the same time the analyses show that these negative definitions are frequently too strong, as many of these items do in fact draw on routinized practices, are used in systematic ways, and stand in paradigmatic relations to other resources.
Rather than turning the dial all the way to the "non" side, if there is one thing that unites these phenomena it may well be their in-between status. They are recurrently described as ambiguously conventional, borderline linguistic, semantically vague, and equivocal as to physiological or interactional causes. Because they equivocate between showing and saying, giving off and giving, symptoms and symbols, I will refer to them as liminal signs. Liminal signs are signs that derive interactional utility from being ambiguous with regard to conventionality, intentionality, and accountability. Their in-between status (Turner, 1969) is an essential part of their form and function, as they occupy the interstices of talk and frequently serve to navigate liminal and transitory spaces in interaction. Besides the phenomena described in this issue, liminal signs can take the form of coughs (Bailey, 2009), sighs (Hoey, 2014), and inbreaths (Torreira, Bögels, & Levinson, 2015; Winter & Grawunder, 2012) as well as visual conduct like winks, nose wrinkles, and "thinking" facial expressions, some of which may play broadly equivalent roles in sign language interaction (Mesch, 2016).
Other terms exist in this area. One is "sound objects" (Reber, 2012), used most commonly in relation to affective responses. This fits some of the contributions in this issue, but it is hampered by its modality-specificity and doesn't fully bring out the semiotic and intersubjective dimensions of the phenomena. Another is "collateral signals" (Clark, 1996), a much broader notion covering any sign doing metacommunicative work, from repair initiations and continuers to certain gestures and gaze patterns. The term liminal signs is more general than the first yet more specific than the second and is meant to highlight the importance of liminality, ambiguity, and deniability in social interaction.

Why these forms?
Why do liminal signs take the forms they do? A number of nonexclusive explanations are offered in the contributions. Modality-specific affordances are one reason: Whistles, strain grunts, and "moans" can carry affective prosodic contours and are overlappable with speech. Clicks and sniffs physiologically preclude speech and are used to manage the unfolding and sequential placement of turn constructional units. Relatedly, at least some liminal signs have their origins in bodily actions with noninteractional purposes. This lends them an air of plausible deniability: They can be noticeable yet off-record, perceptible yet ignorable.
Most generally, all of the liminal signs discussed here are markedly different from other words. This is one reason they have been overlooked and consigned to the margins of language. But it bears a straightforward relation to the interactional work they do. For it is not merely that they are different from other words: Most of them display a tight fit of form to function in their particular sequential environment. A whistle vents surprise where words fail. A sniff recruits respiratory conduct for regulatory use. A click audibly conveys not saying something. A playful "moan" embodies the paradox of play and not-play. The sheer aptness of form inspires awe about the resourcefulness of people in interaction, and at a larger timescale, about the cultural evolutionary processes that sample the landscape of possible resources and arrive at adaptive solutions (Enfield, 2013, 2014). To the extent that we are dealing here with interactional needs encountered by embodied participants anywhere, we can expect to find similar solutions even across unrelated languages: a new crop of candidate pragmatic universals awaiting systematic comparative investigation. Recent work on human interaction is marked by a renewed interest in human pragmatic and metacognitive skills: the processes by which we monitor self and other in interaction (Frith, 2012; Woensdregt & Smith, 2017). A fundamental methodological conundrum in this field is a lack of direct access to the underlying cognitive processes. The interactive and intersubjective liminal signs we meet in these pages are a crucial part of the solution, for users and analysts of language alike (Schegloff, 1991). They are material symbols of metacognition (Dingemanse, 2020): routinely deployable resources that help streamline social interaction by providing a running commentary on states of mind.

Technological and conceptual contributions
Half a century ago, Sacks already foresaw the far reach of the methods of discovery pioneered by him with Jefferson and Schegloff and noted that technologies and theories shape the questions we can ask: "[I]t would be nice if things were ripe so that any question you wanted to ask, you could ask. But there are all sorts of problems that we know in the history of any field that can't be asked at a given time. They don't have the technology, they don't have the conceptual apparatus, etc. We just have to live with that, and find what we can ask and what we can handle." (Spring 1966 lecture in Sacks, 1992).
The articles in this issue are part of a wave of new research into multimodal talk-in-interaction (Keevallik, 2018; Mondada, 2016) that is steadily making progress in just what the study of talk-in-interaction can handle.

Technological advances
Take sniffing. Whereas many transcripts have featured descriptions like ((sniff)), for his analysis Hoey proposes the more precise transcription >.nh<, in a move that evokes Jefferson's successive refinements of the transcription of laughter (Jefferson, 1985). One important benefit of >.nh< is that it does not rely on an analyst's gloss, which could carry its own connotations about form and meaning. Perhaps more importantly, >.nh< is a more direct approximation of the acoustic features of the short inhalatory sniff, opening up possibilities for comparative observations. Compare ((renifle)) (Mondada, 2006, Extract 1) with .nh (Mondada, this issue, Extract 1): The first requires another layer of translation; the second is transparently readable as similar to >.nh< but longer. A related development is the adoption of symbols from the International Phonetic Alphabet for the transcription of clicks and "moans" in other articles in the issue. This transcriptional innovation combines the advantages of Jefferson-style transcription with the crosslinguistic comparability of the IPA.
Other advances lie in capturing the fine temporal structure of sound. Participants in interaction are capable of timing their contributions in fine coordination with others, and Ogden (2020/this issue) and Hofstetter (2020/this issue) demonstrate such precision timing in graphical depictions of the temporal and acoustic structure of speech. It is important that this is not replacing but rather complementing prior treatments. Work on entrainment in conversational speech suggests that timing can be thought of in relative terms (Ogden, 2015), fully in line with Jefferson's treatment of beats as a "shared time unit" (Jefferson, 1973). Visualizations of the acoustic structure make this entrainment directly perceptible. This opens up a new field for discovery and enables adjacent disciplines, from phonetics to quantitative studies of conversation, to build on conversation analytic findings.
The rigorous sequential treatment of the phenomena in this issue also represents an important advance over prior work. One of the earliest recorded observations of the nonmelodic whistling described by Reber and Couper-Kuhlen (2020/this issue) is by Darwin, who cites evidence from an English novel and from a personal report from Southern Africa where a Zulu- or Xhosa-speaking girl, "on hearing the high price of an article, raised her eyebrows and whistled just as a European would" (Darwin, 1872, p. 286). The type of situation described is just enough to identify it as likely equivalent in terms of sequential context: a response to the numerical breach of an implicit norm. But the subtle facts about overlappability and its interactional achievement uncovered by Reber and Couper-Kuhlen would not have been discoverable without systematic sequential analysis.
Darwin also wrote about clicks, relaying observations on the use of a "clucking noise" in expressions of gentle surprise in Australians as well as Europeans (Darwin, 1872, p. 286). Such clicks were taken up recently in a survey targeting two broad uses: affective, expressing positive or negative affect, and logical, expressing affirmation or negation (Gil, 2013). Gil's survey, based on personal communication with a large number of fieldworkers and linguists, indicates that at least some interactional uses of clicks may be fairly widespread across the world's languages (see also Pillion, Grenoble, Ngué Um, & Kopper, 2019). Such surveys of course are several steps removed from the actual data of everyday talk-in-interaction, and hampered by limits to metapragmatic awareness (Silverstein, 1981). Ogden's (2020/this issue) and Li's (2020/this issue) careful sequential analyses reveal more precise interactional uses and show how they can be investigated in recordings of talk-in-interaction.

Conceptual connections
Besides technology, Sacks also referred to the "conceptual apparatus" as shaping what we can ask. For developing this conceptual apparatus, Sacks of course drew inspiration from a wide range of fields (Sacks, 1989). Fittingly, the contributions to this issue also display a productive engagement with other interaction-focused disciplines, from cybernetics to sociology and from linguistics to physiology. This engagement is likely to be reciprocated. For instance, the rigorous analysis of the intersubjective use of embodied interactional resources is of direct relevance to enactive and embodied approaches to language and cognitive science, from work on material symbols (Clark, 2006) to studies of participatory sense-making (Di Paolo, Cuffari, & De Jaegher, 2018; De Jaegher, Peräkylä, & Stevanovic, 2016). Sharing epistemological commitments to materiality, intersubjectivity, and emergence that will strike scholars of talk-in-interaction as familiar, these lines of research find allies in conversation analysis and interactional linguistics.
The research in this issue displays a productive strategy for interrogating subjective notions of marginality. One key insight is that we can describe and analyze the sequential and structural organization of empirically observed phenomena without getting hung up prematurely on questions about intentionality or conventionality. For instance, we don't need to speculate about the psychological states of participants in order to attend to how participants themselves navigate matters of accountability and distinctions like play/not-play (Bateson, 1955) or serious/unserious (Goffman, 1963). Similarly, when studying the systematic deployment of sniffs, sighs, or "moans," the question of whether to count them as linguistic is at best secondary to the actual facts of the orderly organization of routinized methods in interaction. Indeed such an investigation may well shift our sense of where the boundaries lie: For instance, the systematicity of some of the liminal signs described here makes it possible to think of them as an instantiation of multiple phonemic systems (Fries & Pike, 1949) in the service of phatic and expressive functions of language (Jakobson, 1960). This approach is a constructive epistemic stance to take in the study of language in general, especially if we want to make sure we are not circularly defining the object of study as that which inherited theories dictate as the most interesting (Bolinger, 1946; Dingemanse, 2017).
To be clear, the point is not that questions of intentionality or conventionalization are irrelevant. It is that insisting on a priori boundaries invites us to disregard whole swathes of meaningful behavior along with the conceptual tools that can help us shed light on them. The language sciences need approaches that can deal just as well with the fluid, hybrid, and liminal aspects of language in interaction as they can deal with its better-studied systematic, structural, and compositional aspects.

In closing
Language scientists have long distinguished between center and periphery in language and linguistics. Vision scientists, perhaps more acquainted with the importance of perspective, speak instead of central and peripheral vision, putting the distinction in the eye of the beholder. The articles in this issue can be seen as diligently mapping out interactional practices that form the backdrop to our everyday use of language even if they tend to fall just outside our gaze. By starting from linguistic structures that present themselves directly to our awareness as accountable actions, linguistics has been able to make considerable progress. But the careful, cumulative sequential analysis of records of talk-in-interaction is now giving us access also to more liminal resources that streamline our linguistic lives. If we have been overlooking liminal signs, it is because they were not meant to be particularly noticeable, and work best when they seem least like talk.
There is in these pages a palpable sense of discovery as new phenomena are brought under the lens of sequential analysis. This bringing under the lens, as in any field, is not merely a question of looking: It requires the right conceptual and methodological tools.