Ingressive phonation conveys arousal in human nonverbal vocalizations

ABSTRACT Animals normally vocalise while exhaling. Ingressive, or inspiratory, voice production occurs in humans and many other species, but its communicative function, if any, remains unknown. To test the perceptual effects of ingressive phonation, naturally occurring ingressive syllables in 109 human nonverbal vocalisations (55 laughs, 21 cries, and 33 moans) were experimentally attenuated or morphed into quiet and unvoiced intakes of breath using voice resynthesis technology. Ratings of the intensity of discrete emotions (amusement, sadness, pleasure) and of general arousal in three perceptual experiments revealed that listeners (N = 283) judged vocalisations with attenuated ingressive syllables to be less emotionally intense compared to the originals. Ingressive vocalisations were not experienced as either unnatural or unpleasant, confirming that they are a familiar part of human vocal repertoire. In sum, ingressive phonation can occur in a wide range of human nonverbal vocalisations and typically conveys intense emotion, presumably because listeners associate heavy breathing, imperfect vocal control, and continuous egressive-ingressive vocalising with the physiological state of high arousal. It remains to be seen whether ingressive phonation is a mere byproduct of high arousal or whether it can be exaggerated, and whether its communicative function extends to vocalisations of non-human animals.


Introduction
The standard textbook explanation of voice production is that the lungs provide the necessary pressure, which drives a continuous airflow through the glottis and sets the vocal folds in motion (Titze 1994; Behrman 2018). As anyone who has ever gasped in surprise knows, this flow of air does not have to be in a particular direction in order to set the vocal folds vibrating, but the voice sounds rather different depending on whether we vocalise while exhaling (egressive or expiratory phonation) or inhaling (ingressive or inspiratory phonation). Ingressive phonation is generally quieter and harsher (Eklund 2008), with less energy in upper harmonics (Orlikoff et al. 1997; Vanhecke et al. 2016) and a breathy voice quality. Subjectively, it may feel effortful or unpleasant to sustain ingressive phonation for a long time (Orlikoff et al. 1997) due to an anatomical asymmetry of the vocal folds (Ohala 1983), which suggests that the vocal apparatus evolved to be optimised for expiratory airflow. This design feature makes sense physiologically: it is easier to achieve high pressure when contracting, as opposed to enlarging, the rib cage (Behrman 2018). Furthermore, delaying an intake of breath with a long vocalisation may be more costly metabolically compared to prolonging the exhalation. As a result, humans, and apparently most other mammals, find it more natural to vocalise during exhalation.
While ingressive phonation is clearly the exception rather than the rule, the list of these exceptions has been steadily growing without any unifying theoretical framework to explain them. A wide range of animal vocalisations are partly ingressive, including pant-hoots of chimpanzees (Riede et al. 2004), laughs of all great apes (Ross et al. 2009), purring of felines (Eklund et al. 2010), songs of gibbons (Geissmann 1984) and birds (Goller and Daley 2001), bellows of koalas (Charlton 2015), groans of fallow deer (Reby and McComb 2003), display calls of African penguins (Favaro et al. 2014), braying of donkeys, roars of howler monkeys, etc. In humans, ingressive phonation is found in some speech registers (Eklund 2008) and in singing (Vanhecke et al. 2016), but above all in nonverbal vocalisations such as laughs (Tanaka and Campbell 2011; Bryant 2020), sobs (Darwin 1872; Aucouturier et al. 2011), and gasps (Anikin 2020a). We are not aware of any quantitative analyses of the prevalence of ingressive phonation in vocalisations of adult humans, but it was straightforward to find over a hundred examples with loud ingressive syllables for this study, suggesting that they are fairly common. This raises the following question: is the presence of ingressive phonation in nonverbal vocalisations an epiphenomenon, or does it contribute meaningfully to communication? There are two theoretical considerations that suggest a possible function for ingressive phonation.
First, polysyllabic bouts of vocalising unfold over multiple respiratory cycles and therefore require precisely timing voice onsets and offsets to keep them purely egressive, particularly when the breathing is heavy and rapid due to generally high arousal. In other words, because respiratory rate increases and vocal control becomes more challenging with mounting physiological arousal, ingressive syllables may be increasingly difficult to avoid, making them reliable indicators of the vocaliser's agitated state. To take an extreme example, respiration may be seriously disrupted during a fit of uncontrollable giggles, leaving the sufferer literally, and audibly, gasping for breath. As a result, listeners can be expected to interpret heavily ingressive laughs as a sign of genuine, intense amusement. This reasoning primarily applies to rapid egressive-ingressive call sequences, such as sobbing in humans or pant-hoots in chimpanzees (Riede et al. 2004), but probably not to call bouts that are produced at a relatively slow rate (e.g. purring of cats, human moans) or to purely ingressive calls (e.g. isolated human gasps or fallow deer groans).
Second, phonating during both exhalation and inhalation enables an animal to maximise the number of calls per unit of time, as well as the proportion of time spent vocalising, which are important acoustic indicators of high arousal (Volodin et al. 2009; Briefer 2012). Unlike players of wind instruments who have mastered circular breathing (White 2014), most ordinary mammals have to interrupt their vocal production now and then to draw a breath. Ingressive phonation can fill the silences otherwise left between syllables, increasing the rate of acoustic events and intensifying the 'acoustic bombardment' of listeners. This is important because intense vocalisations, and in fact emotional speech, need to be effective at involuntarily attracting and holding the attention of listeners, which creates a close alignment between emotion intensity and low-level auditory salience of vocalisations (Anikin 2020b). In other words, it may be adaptive for callers to vocalise during inhalation in urgent, high-arousal contexts, such as intense distress, in order to maximise the salience of their vocal output.
Both theoretical arguments lead to the same prediction: ingressive phonation should perceptually convey high arousal or emotion intensity. Two approaches can be used to test this prediction. One is to correlate the degree of ingressiveness in naturally occurring vocalisations with other perceptual qualities. In the only such study that we are aware of, Kret et al. (2021) found that subjective ratings of ingressiveness were negatively correlated with the pleasantness or contagiousness of infant laughter, concluding that infants are socialised to laugh in a purely egressive manner because adults find this 'proper' laughter more pleasing. While the above study is a welcome pioneering attempt to shed some light on the role of ingressive phonation in laughter, the analysed laughs presumably differed in many acoustic characteristics apart from their perceived ingressiveness. A stronger case can be made if vocalisations are manipulated experimentally to change the amount of ingressive phonation while preserving their temporal structure, pitch, voice quality, and other acoustic properties. This is the approach taken in the current study: nonverbal vocalisations with a loud, voiced ingressive syllable had this syllable attenuated and morphed into inhalation noise, turning it into a soft intake of breath with approximately the same spectral and amplitude envelopes as in the original. The stimuli consisted of three types of polysyllabic human nonverbal vocalisations that often contain ingressive syllables: laughs, cries or sobs, and moans.
Laughter is extremely common in everyday life, present in all human societies (Provine 2000; Bryant et al. 2018), found in children born deaf and blind (Eibl-Eibesfeldt 1989), and similar to the laughter of great apes in terms of its function and basic acoustics (Ross et al. 2009). The evolutionary origins of laughter are assumed to lie in rough-and-tumble social play accompanied by laborious breathing, which was eventually ritualised into a pant-like vocalisation (van Hooff 1972; Provine 2000; Vettin and Todt 2005; Bryant 2020). Consistent with this hypothesis, great apes produce pant-like laughs with alternating egressive and ingressive syllables. Interestingly, human infants initially laugh in an 'ape-like', egressive-ingressive manner and only switch to adult-type, primarily egressive laughter over the first few years of life (Kret et al. 2021). Notwithstanding this developmental trajectory, the acoustic variability of laughs should not be underestimated (Bachorowski et al. 2001; Vettin and Todt 2004), and ingressive phonation is well documented in the laughter of adult humans (Eklund 2008; Tanaka and Campbell 2011; Bryant 2020).
Crying in humans is functionally similar to distress calls in other animals and has some acoustic parallels with the screams and whimpers of any young mammal (Lingle et al. 2012). However, human crying is a multimodal nonverbal behaviour with various components emerging at different times in development. Newborn infants are already capable of producing screams and cries of distress of the kind analysed by Lingle et al. (2012), but both weeping with tears and egressive-ingressive sobbing emerge only after a few months (Darwin 1872). Crying in adults is less common than in infants, but very distinct acoustically when it does occur. Ingressive phonation in crying appears to be very common, as captured in the English word to sob.
The third vocalisation type we tested is moaning. There has been little formal research on moaning, but moans are known to be common in painful contexts such as childbirth, as well as during sexual activities (Anikin and Persson 2017; Prokop 2021), and are sometimes joined by loud ingressive syllables. Gasps were not considered because these single-syllable calls are fully ingressive by definition (Anikin 2020a).
For all three call types (laughs, cries, and moans), experimentally attenuating their ingressiveness was predicted to lower the perceived level of arousal and emotion intensity. To test this hypothesis, listeners rated the original and manipulated vocalisations on several perceptual scales designed to capture the perceived level of general arousal, the intensity of expressed emotion (amusement for laughs, sadness for cries, and pleasure or pain for moans), and induced emotion (contagiousness, pleasing or disturbing nature of manipulated vocalisations). Finally, authenticity ratings were obtained to test whether removing ingressive phonation made the vocalisations more or less natural.

Stimuli
Recordings of laughs, cries, and moans (Table 1) were chosen for high audio quality and the prominence of ingressive phonation. Laughs (n = 55) were obtained from two sources: 21 from a collection of nonverbal vocalisations from YouTube videos (Anikin and Persson 2017) and 34 from unscripted dyadic interactions based around retelling funny videos (Wood 2020). In both collections, adult humans were laughing spontaneously and presumably were genuinely amused, although the acoustic intensity of laughter was considerably higher in the YouTube collection. Each clip was trimmed to contain a single prominent and at least partly voiced ingressive syllable, which could be located at the onset (n = 8), offset (n = 7), or in the middle (n = 40) of the laugh. Ingressive syllables were manually annotated; their duration was (mean ± SD) 436 ± 152 ms, range [237,1086].
Cries of sadness (n = 21) were taken from Anikin and Persson (2017) and trimmed to contain 1-3 ingressive syllables. Six were produced by school-age children, and 15 by adult men and women. Moans (n = 33) were taken from recordings of women giving birth (n = 16 recordings) and engaging in real or simulated sexual activities (n = 17), which were obtained from two sources (Magnard 2014; Anikin and Persson 2017) and edited to contain between one and three voiced ingressive syllables and one or more egressive syllables.

Manipulation of ingressiveness
Ingressive syllables were manually annotated and manipulated in one of two ways. In the original condition, they were attenuated by 0, 9, or 18 dB by means of separating, processing, and then cross-fading egressive and ingressive parts of each laugh with an R script. In the noise condition, there was an additional processing step: the ingressive syllable was mixed with a synthetic noise with the same RMS amplitude and the same spectral envelope, which was achieved by using the transplantFormants() function in the R package soundgen (Anikin 2019). The relative contribution of noise was 0% at attenuation = 0 dB (i.e. unmodified original recording), 50% at attenuation = 9 dB, and 100% at attenuation = 18 dB (i.e. only noise). In other words, in addition to making the ingressive syllable quieter, in the noise condition it was morphed into an aspiration-like sound with the same formant structure. Obviously, this is only an approximation since an unvoiced inhalation would not have exactly the same smoothed spectral envelope as a voiced ingressive syllable. Coupled with the manipulation of intensity, however, in most cases this produced a passable impression of the speaker drawing a quiet breath without phonating, while the egressive parts of the original vocalisation remained unchanged (Figure 1).
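The two manipulations can be illustrated with a minimal Python sketch. It is not the authors' R script: the function names are hypothetical, the noise is assumed to have already been matched to the syllable in RMS amplitude and spectral envelope (the step performed with transplantFormants() in the study), and both the linear interpolation of the noise share between the three tested levels and the linear mixing of sample values are assumptions.

```python
def db_to_gain(attenuation_db):
    """Amplitude scaling factor for an attenuation expressed in dB."""
    return 10 ** (-attenuation_db / 20)


def noise_share(attenuation_db, max_db=18.0):
    """Proportion of noise in the mix: 0 at 0 dB, 0.5 at 9 dB, 1 at 18 dB.

    Linear interpolation between these levels is an assumption.
    """
    return attenuation_db / max_db


def manipulate(syllable, noise, attenuation_db, condition):
    """Return the manipulated ingressive syllable as a list of samples.

    syllable, noise: equal-length lists of samples; the noise is assumed
    to be pre-shaped to the syllable's spectral envelope and RMS amplitude.
    condition: "original" (attenuate only) or "noise" (attenuate and morph).
    """
    w = noise_share(attenuation_db) if condition == "noise" else 0.0
    gain = db_to_gain(attenuation_db)
    return [gain * ((1 - w) * s + w * n) for s, n in zip(syllable, noise)]
```

At 0 dB both conditions return the input unchanged, which is why the two conditions share the same unmodified recording at that level. An attenuation of 18 dB scales the amplitude to roughly 13% of the original.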

Participants
Participants (N = 283) were recruited on the online testing platform Prolific (https://www.prolific.co/). Independent samples of listeners rated laughs (n = 100 listeners), cries (n = 82 listeners), and moans (n = 101 listeners). All participants self-reported to have normal hearing and to be fluent in English; 55% self-identified as female, 43% as male, and 1% as 'unspecified'; the average age was 24 ± 6 years, range [18,59]. Sample sizes were chosen to ensure sufficient precision of estimates of population-level effect sizes in Bayesian multilevel models. This precision primarily depends on the number of stimulus prototypes (here 55 + 21 + 33 = 109) and the number of times each unique manipulated sound (here 109 * 6 = 654) is tested on each response scale (here 6.7 times for laughs and moans, 8.3 times for cries). The average width of 95% credible intervals (CIs) of effect sizes was 4.0% for laughs, 6.5% for cries, and 6.9% for moans, and the main factor limiting the precision of this analysis is the number of prototypes available, rather than the number of raters.
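The number of ratings per manipulated sound can be approximated with a short calculation, sketched here in Python. It assumes that each block presents every prototype once in one of its six manipulated versions, and that versions and the five response scales are assigned uniformly at random; neither detail is stated explicitly in the text, and the function name is hypothetical. Under these assumptions the number of prototypes cancels out.

```python
def ratings_per_stimulus(listeners, blocks, versions=6, scales=5):
    """Expected number of ratings per manipulated sound on each scale.

    Each listener contributes `blocks` ratings per prototype, spread over
    `versions` manipulated sounds and `scales` response scales, so the
    number of prototypes cancels out of the ratio.
    """
    return listeners * blocks / (versions * scales)


for call_type, listeners, blocks in [
    ("laughs", 100, 2),
    ("cries", 82, 3),
    ("moans", 101, 2),
]:
    print(call_type, round(ratings_per_stimulus(listeners, blocks), 1))
```

This gives about 6.7 ratings for laughs and moans, as reported, and about 8.2 for cries, close to the reported 8.3. The small difference for cries presumably reflects details of the design that these assumptions do not capture.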

Procedure
We obtained listeners' ratings of the manipulated stimuli on five response scales (Table 1) in three online experiments (for laughs, cries, and moans), each with an independent sample of raters. Each participant rated all prototypes in two (laughs and moans) or three (cries) blocks, each with a randomly selected response scale. The order of blocks and trials within blocks was randomised for each participant. Responses were given on a horizontal Visual Analog Scale labelled at the extremes, without tick marks but with grey stripes.

Data analysis
Each of the three experiments (laughs, cries, and moans) was analysed independently. Unaggregated responses were analysed with Bayesian multilevel zero-one-inflated beta models using the R package brms (Bürkner 2017). The model predicted the rating as a function of Condition (2 levels: original or noise), Attenuation of ingressive syllable (linear effect: 0, 9, or 18 dB), and Scale (5 levels), with all possible interactions. Because an attenuation of 0 dB means that the sounds were identical in both conditions, the corresponding beta coefficients were set to zero as part of prior specification, forcing the separate regression lines per condition to converge at zero (Figure 2). The effects of condition and scale were assumed to vary across subjects and across prototypes, and the effect of scale could also vary across individual stimuli. Finally, the precision parameter of the beta distribution (phi), which governs the variance of responses, was assumed to vary across participants to account for individual differences in using the response scales. The model structure in brms syntax was as follows:

response ~ condition * attenuation * scale + (condition * scale | subject + prototype) + (scale | stimulus), phi ~ (scale | subject)
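For readers unfamiliar with the zero-one-inflated beta family, its likelihood for a single rating can be sketched in Python. This is an illustration, not the authors' code: it uses the mean-precision parameterisation of the beta distribution (mu, phi) and the parameter names zoi and coi that brms uses for this family, and it assumes that ratings were rescaled from 0-100 to the interval [0, 1].

```python
import math


def zoib_density(y, mu, phi, zoi, coi):
    """Likelihood of one rating y in [0, 1] under a zero-one-inflated beta.

    mu:  mean of the beta component (0 < mu < 1)
    phi: precision of the beta component (larger phi = smaller variance)
    zoi: probability that the rating is exactly 0 or exactly 1
    coi: probability that such an extreme rating is 1 rather than 0
    """
    if y == 0:
        return zoi * (1 - coi)
    if y == 1:
        return zoi * coi
    # Beta density with shape parameters a = mu * phi, b = (1 - mu) * phi
    a, b = mu * phi, (1 - mu) * phi
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_pdf = (a - 1) * math.log(y) + (b - 1) * math.log(1 - y) - log_beta_fn
    return (1 - zoi) * math.exp(log_pdf)
```

The inflation components handle responses placed at the very ends of the Visual Analog Scale, which an ordinary beta distribution cannot accommodate because its density is defined only on the open interval (0, 1).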
As a measure of inter-rater agreement in the rating task, we aggregated the ratings of each vocal stimulus on each response scale and calculated the mean Pearson's correlation between the responses of each participant and these aggregated ratings.
Averaging across five response scales, this correlation was r = .65 for laughs, .53 for cries, and .63 for moans. Comparing different scales, the lowest inter-rater reliability was obtained when rating the contagiousness of cries (r = .38) and the authenticity of moans (r = .39), suggesting that these judgements were rather idiosyncratic. Posterior distributions of model parameters and fitted values were summarised by their medians and 95% credible intervals (CIs). The proportion of posterior distribution (PD) of effect sizes that is positive (or negative) is also reported in the text. The audio, datasets, and R code for audio manipulation and data analysis are available in online supplements at http://cogsci.se/publications.html.
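The inter-rater agreement measure described above (each participant's ratings correlated with the ratings aggregated across participants) could be computed roughly as in the following Python sketch. The data layout and function names are hypothetical. The sketch also assumes that the aggregate includes the participant's own ratings, which the text does not specify.

```python
from statistics import mean


def pearson(x, y):
    """Pearson's correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_agreement(ratings):
    """Mean correlation of each participant with the aggregated ratings.

    ratings: dict mapping participant -> dict mapping item -> rating,
    where an item is one stimulus rated on one response scale.
    """
    by_item = {}
    for responses in ratings.values():
        for item, value in responses.items():
            by_item.setdefault(item, []).append(value)
    item_means = {item: mean(values) for item, values in by_item.items()}

    correlations = []
    for responses in ratings.values():
        items = sorted(responses)
        own = [responses[i] for i in items]
        aggregated = [item_means[i] for i in items]
        correlations.append(pearson(own, aggregated))
    return mean(correlations)
```

For example, two listeners who rank three sounds identically but use different parts of the scale, such as {"a": 10, "b": 50, "c": 90} and {"a": 20, "b": 60, "c": 100}, yield a mean agreement of essentially 1.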

Results
The effect sizes below correspond to fitted values of the difference between manipulated and unmanipulated stimuli on a scale of 0 to 100, reported as percentage points. The effects mentioned in the text are for the maximum attenuation of 18 dB (see Figures 2 and 3 for full results), which turned a loud ingressive syllable into a barely audible one in the original condition, or into a quiet intake of breath in the noise condition.
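The summaries reported for each effect (posterior median, 95% CI, and the proportion of the posterior distribution on one side of zero) can be obtained from posterior draws as in the following Python sketch. The function names are hypothetical, and an equal-tailed interval with linearly interpolated quantiles is assumed; the draws themselves would come from the fitted model.

```python
from statistics import median


def quantile(sorted_x, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def summarise(draws):
    """Median, equal-tailed 95% CI, and share of posterior draws above zero."""
    s = sorted(draws)
    return {
        "median": median(s),
        "ci95": (quantile(s, 0.025), quantile(s, 0.975)),
        "pd_positive": sum(d > 0 for d in s) / len(s),
    }
```

With an equal-tailed interval, a 95% CI that excludes zero implies that at least 97.5% of the posterior lies on one side of zero, which is a useful consistency check when reading the results.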

Laughs
Attenuating ingressive syllables did not noticeably change the pleasantness of laughs: by 0.2% [−2.6, 3.2] in the original condition and −0.8% [−3.8, 2.2] in the noise condition. There was a slight, statistically uncertain tendency for laughs to sound less authentic when the original ingressive syllable was made quieter (by 2.6%, 95% CI [−0.5, 5.8], 95.0% of PD > 0), but no change at all in perceived authenticity when it was morphed into breathing noise (−0.9% [−3.8, 2.0], 72.2% of PD < 0). Thus, experimental manipulations did not sound unnatural, but neither did laughs become more authentic when loud ingressive syllables were replaced with quiet breathing.

Cries
Cries whose ingressive syllables were turned into quiet breathing were rated lower on sadness (6.5% [2.4, 10.7], 99.9% of PD > 0) and authenticity (6.8% [1.8, 11.9], 99.6% of PD > 0). There was also a statistically less certain tendency for them to be rated lower on emotion intensity (3.4% [−0.7, 7.5], 95.2% of PD > 0). The effects of simply making the ingressive syllables quieter were qualitatively similar, but less pronounced compared to morphing them into breathing (Figure 3). Thus, less ingressive cries were perceived as less intense emotionally, but also less natural or authentic-sounding, even when ingressive syllables were simply made quieter.

Moans
Whereas laughing is a relatively unambiguous signal of amusement or joy, and crying of sadness, moans can express very different emotions. Among the 33 tested moans, 17 were moans of sensual pleasure, and all 17 had mean valence ratings above 50% before manipulations of their ingressiveness (mean = 77%), suggesting that they were recognised as pleasure (Figure 2C-D). Likewise, 15 out of 16 childbirth moans had a mean valence rating under 50% (mean = 32%), so the context was usually recognised as unpleasant, but with more variability compared to sexual moans. There may be differences in how moans with positive and negative perceived valence are affected by the manipulations of ingressiveness. For example, if less ingressive moans sound less emotionally intense, this would be expected to translate into lower valence ratings for moans of pleasure (less intense pleasure = less positive valence), but higher valence for moans of pain (less intense pain = more positive valence). Accordingly, moans were analysed with an additional interaction with their production context: pain (giving birth) or pleasure (sexual activities).

Discussion
We used voice resynthesis technology to directly manipulate the ingressiveness of human laughs, cries, and moans, revealing that ingressive phonation enhances the perceived level of emotion intensity. Thus, morphing loud ingressive syllables into quiet, unvoiced breaths makes it appear that a person laughing is less amused, a person crying is less sad, and a person moaning is not enjoying herself as much as when audible ingressive phonation is preserved in otherwise identical nonverbal vocalisations. This confirms the hypothesis that the presence of ingressive phonation conveys high arousal. Below, we discuss the findings in relation to the three tested call types (laughs, cries, and moans) and consider their implications for the study of animal and human vocal communication.
The presence of ingressive gasp-like syllables in laughs was found to enhance the perceived level of amusement, emotion intensity, and contagiousness of laughs. This agrees very well with the notion of the 'animal nature' of spontaneous laughter advocated by Greg Bryant (Bryant and Aktipis 2014; Bryant 2020), who argues that regular, completely voiced laughs are produced under the control of the neurological circuits responsible for speech and are therefore perceived as less spontaneous, whereas laughs caused by a powerful, genuine emotion rely on evolutionarily older brain stem mechanisms and are more variable acoustically. Specifically, rapid heavy breathing associated with high physiological arousal is likely to introduce noisy and occasionally ingressive syllables into laughs (Gervais and Wilson 2005). Accordingly, listeners are probably correct in interpreting ingressive phonation in laughs as a signal of high arousal and intense amusement.
Contrary to the recently published evidence that ingressiveness ratings of infant laughs negatively correlated with their pleasantness and contagiousness ratings (Kret et al. 2021), ingressive laughs in the present study were not experienced as either unnatural or unpleasant, and in fact were rated as more, not less contagious. Direct manipulation of the target acoustic characteristic is a more powerful method than the correlation analysis used by Kret et al. (2021). On the other hand, the manipulated laughs contained only a single ingressive syllable; repeated ingressive syllables in a long bout of laughing might indeed begin to sound irritating. Intense nonverbal vocalisations are seldom pleasant to listen to; in fact, intense delight may sound remarkably similar to extreme anguish (Anikin and Persson 2017; Atias et al. 2019; Atias and Aviezer 2020).
Nevertheless, it appears unlikely that the negative correlation between ingressiveness and pleasantness or contagiousness reported by Kret et al. (2021) in infant laughs is a general phenomenon as there was no increase in contagiousness or pleasantness ratings of any vocalisations when their ingressiveness was experimentally reduced. Ingressive phonation thus appears to be a normal feature of human laughter, neither uncommon nor repulsive to listeners. Accordingly, it is probably an exaggeration to draw a strict distinction between egressive-ingressive laughter of other apes and egressive-only laughter of adult humans, as suggested in some publications (e.g. Provine 2000; Ross et al. 2009; Kret et al. 2021).
As expected, cries that were experimentally made less ingressive conveyed less intense sadness. Interestingly, they were also rated as less authentic, even when the ingressive syllables were simply made quieter without rendering them unvoiced. As no such effect was observed in laughs and moans, ingressiveness may be such a common feature in intense sobbing that its absence is experienced as unnatural. If so, ingressive phonation may need to be added to the list of acoustic features that 'make a cry a cry' (Lingle et al. 2012) alongside chevron-shaped pitch contours and nonlinear phenomena. Only cries of adults and school-age children were tested in this study. Considering the higher prevalence of crying in infants, as well as the great theoretical and practical importance of infant vocalisations, in future studies it will be important to extend the present findings by examining ingressive phonation in baby cries. It is well known that baby cries contain ingressive syllables (Darwin 1872; Aucouturier et al. 2011), but their prevalence and communicative significance have not been investigated. Judging by the present findings, ingressive phonation may turn out to be an important marker of intense distress in the crying of infants.
Moans are different from laughs and cries in their lack of an intrinsic temporal structure: a single vocalisation is normally produced per respiratory cycle, with no apparent sequences comparable to bouts of laughing or crying. It is therefore possible that listeners are less familiar with ingressive syllables between moans than they are with ingressive laughs and cries. Manipulation effects were less clear-cut for moans compared to laughs and cries, but the observed patterns are consistent with the hypothesis that ingressiveness in moans may intensify the conveyed level of pain or pleasure. In particular, moans became more neutral in valence when we attenuated voiced ingressive syllables between them. Interestingly, breathiness was previously found to be associated with perceived intensity of pleasure in moans (Anikin 2020a), so there appears to be a general expectation to find breathy and ingressive moans in erotic contexts, which can be investigated further in future studies.
The distinction between spontaneous and volitional vocal production (Bryant and Aktipis 2014; Anikin and Lima 2018; Atias and Aviezer 2020) is important to consider when studying acoustic indicators of emotion intensity such as ingressiveness. Humans can exert volitional control over their voice and imitate or modify otherwise innate vocalisations such as laughs (Ackermann et al. 2014), which means that speakers may be suppressing or exaggerating ingressiveness and other acoustic features in accordance with the context and cultural expectations. For instance, while the ingressiveness of 'animal-like' spontaneous laughter may be involuntary and indexical of high arousal, ingressiveness in erotic moans might well be exaggerated volitionally to conform to cultural stereotypes or please the partner. In fact, while 68% of women in a recent survey (Prokop 2021) reported moaning during intercourse, 38% also reported faking sexual vocalisations. Like other indicators of emotional arousal, ingressiveness is thus open to intentional manipulation.
The novel method of manipulating ingressiveness that was developed for this study produced encouraging results. Simply making ingressive syllables quieter (by separating, processing, and then cross-fading separate syllables) is a viable 'low-tech' approach, but the alternative method of morphing voiced ingressive syllables into aspiration noise consistently produced stronger effects, while preserving the naturalness of vocalisations. It works particularly well for removing the tonal component from relatively short ingressive syllables surrounded by egressive syllables, for example in egressive-ingressive laughs. This manipulation sounds highly authentic because both the spectral envelope (formant structure) and the amplitude envelope of the original are preserved, but some adjustment of overall amplitude is still necessary because breathing is seldom as loud as ingressive phonation. It is more technically challenging to turn very high-pitched or whistle-like ingressive sounds into aspiration, and especially to add ingressive syllables where there were none originally, which is why in this study we only worked with original vocalisations that already contained ingressive syllables. Despite these current limitations, the ability to manipulate ingressiveness directly is a powerful experimental technique and another example of the benefits of using voice synthesis technology in research on vocal communication.
An important limitation of the present study is that only perceptual effects of manipulating ingressiveness were tested. In future it will be essential to compare the ingressiveness of laughs and other vocalisations produced under low and high arousal, verifying that ingressive phonation indeed occurs more frequently as the urgency or intensity of emotional state increases. A good start would be to perform a large-scale, quantitative analysis of the prevalence of ingressive phonation in human and animal nonverbal vocalisations associated with different contexts, which can actually be challenging to do because it is not always clear which syllables are ingressive. Targeted manipulation of ingressiveness, such as the method proposed in the present study, in combination with perceptual experiments in humans and playback studies in other species can then provide decisive evidence on the role that ingressive phonation plays in vocal communication.
To summarise, the present results demonstrate experimentally that vocalising during both exhaling and inhaling (egressively and ingressively) conveys high arousal in a range of human nonverbal vocalisations such as laughing, crying, and moaning. Assuming that this perceptual bias has some foundation in reality (that is, that ingressive phonation is indeed more common when vocalising in a state of high physiological arousal), an important area for future research will be to investigate the extent to which ingressiveness is a mere consequence of vocalising in an excited state and/or a flexible acoustic feature that may be exploited and exaggerated by speakers. On the one hand, ingressive syllables may be considered 'vocal slips' that betray a lack of vocal control: as the breathing rate increases and repeated vocalisations are produced under high arousal, imperfect timing of voice onsets and offsets may create unintentional gasps, whistles, wheezes, gurgles, coughs, and other unexpected sounds that introduce such incredible variety into high-intensity spontaneous vocalisations such as laughs (Bachorowski et al. 2001; Vettin and Todt 2004; Bryant 2020). Ingressive syllables may then function as indexical and therefore honest (Bradbury and Vehrencamp 2011) signals of intense emotion. On the other hand, it may be adaptive for the caller to utilise both exhalation and inhalation for vocalising, as this would optimise the production of acoustic events per unit of time. From this perspective, and perhaps especially in slower vocalisations such as sequences of moans, ingressive syllables may be introduced more or less voluntarily in order to maximise auditory salience and hold the listeners' attention. To learn more about the evolution and diverse roles of ingressive phonation, it will be crucial to systematically investigate its occurrence and communicative functions in vocalisations of non-human animals.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Ethical statement
All participants provided informed consent. Ethical approval for performing perceptual experiments with human subjects was provided by the Comité d'Ethique du CHU de Saint-Etienne (IRBN692019/CHUSTE).