Introduction

Several mammalian species are known to scale their vocal frequencies (see refs 1, 2, 3, 4 for reviews). Red deer (Cervus elaphus) offer a prime example, wherein stags drastically lower their larynges to extend their vocal tracts during roaring and do so predominantly in response to threatening male competitors5,6. This behaviour lowers formants beyond what would be expected based on mammalian acoustic allometry, thus exaggerating the animal’s apparent size. Several researchers have proposed similar capabilities in humans, suggesting that systematic voice frequency modulation for size exaggeration should be observed not only across mammalian species3, but also across human cultures7,8. Others have further hypothesized that exaggeration of body size through voice frequency modulation may have contributed to the descent of the human larynx9, and is likely to have played a critical role in the early evolution of nonverbal communication, ultimately paving the way for the emergence of articulated speech4,9. The present study is the first to empirically test whether men or women do in fact systematically modulate F0 and formants when instructed to deliberately alter their apparent body size.

Anatomical constraints on voice frequencies

Guided by the source-filter theory of speech production10,11, behavioral scientists studying acoustic communication of body size in humans and other mammals have focused on two voice features: fundamental frequency (F0) and vocal tract resonances (formants). Voice F0 is produced by the vocal folds, whose rate of vibration is related to their mass, length and tension, whereas the supralaryngeal vocal tract filters the voice producing formants that are inversely related to supralaryngeal vocal tract length12. Voice F0 and formants affect our perception of pitch and timbre, respectively, and play a major role in speech articulation13. These voice features are also highly sexually dimorphic and have likely undergone intense sexual selection in humans14.

Formants scale fairly allometrically with vocal tract length and body size15, because the mammalian vocal tract is constrained by the skeletal structures that surround it. In contrast, although larger vocal folds produce a lower F0, the larynx grows largely independently of the rest of the body and F0 does not therefore scale allometrically with body size in humans16. Indeed, formants explain several times more variation in body size than does F0 when sex and age are controlled17. Nevertheless, among humans, neither vocal feature explains a substantial portion of the variance in body size at the within-sex level17,18.

The lack of a robust physical relationship between the human voice and body size suggests a lack of constraints to maintain allometry. Volitional voice modulation to exaggerate body size should therefore be possible, and could help to further explain this puzzling disassociation. At the perceptual level, and despite the lack of robust physical relationships, listeners cross-culturally associate both low F0 and low formants with large body size even within sexes19,20,21,22,23. This further suggests that similar to other mammals (see e.g. refs 24 and 25) the human voice conveys both honest and exaggerated cues to size. Perceptual correspondences between low voice frequencies and large body size are important because they may drive selection for vocal communication (or exaggeration) of size, even in the absence of robust physical relationships between the voice and body.

Morphological modifications for size exaggeration

The vocal anatomy of many mammals has undergone morphological modifications that appear to function, at least in part, to exaggerate apparent size1. These include non-laryngeal velar vocal folds in koalas (Phascolarctos cinereus) that allow males to produce F0s typical of an animal as large as an elephant26, the subhyoid air sacs in black-and-white colobus monkeys (Colobus guereza) that amplify resonant frequencies24, and the descended larynx in males of several polygynous deer species6 and in koalas27, which enables them to produce low formant frequencies characteristic of much larger species.

Humans also have a descended larynx. It allows for the production of a broader range of speech sounds relative to the vocal repertoires of other primates28 but, importantly, also results in a lengthened pharyngeal cavity and thus relatively lower formants9. Among men, pubertal hormones cause the larynx to descend even further, a full vertebra lower than among women16, and cause men’s vocal folds to grow 60% larger than women’s29. These morphological modifications are evolutionarily relevant, as they implicate a role of sexual selection and size exaggeration in the evolution of human vocal frequencies. However, men’s F0 and formants are approximately 80% and 20% lower than women’s, respectively, and these sex differences in F0 and formants exceed what can be explained by sexual dimorphism in the vocal anatomy (i.e., men’s vocal folds are on average only 60% larger than women’s, and their vocal tracts are typically 15% longer) or by sexual dimorphism in body size (men are on average only 10% taller than women)30. This discrepancy points to possible behavioural differences between men and women in vocal production or modulation31, wherein men may lower their F0 and formants more than women through the behavioural mechanism of voice modulation. If true, voice modulation may account for some portion of the unexplained variance between men’s and women’s vocal frequencies.

Voice frequency modulation in humans

Mechanistically, volitional modulation of F0 is achieved by manipulating the tension and effective length or surface area of the vocal folds using the laryngeal muscles (cricothyroid muscles lengthen the vocal folds and increase F0, whereas thyroarytenoid muscles shorten the vocal folds and decrease F0, and their opposing effects can be coordinated or independent)32,33 or by increasing subglottal pressure. In contrast, lowering the larynx or protruding the lips increases supralaryngeal vocal tract length and reduces formant spacing13,32,33. Although recent investigations suggest some flexible control of voice frequencies in nonhuman primates34,35,36, the ability to intentionally and volitionally modulate source and filter components is uniquely advanced in humans and is thought to constitute a precursor of speech4,9. Indeed volitional voice modulation in humans involves comparatively complex neural processes that are absent in other mammals, including nonhuman primates37.

Infant directed speech, in which adults speak with higher F0 and exaggerated prosodic cues when addressing infants compared to older individuals, represents perhaps the most extensively studied form of voice modulation in humans and appears to be present across diverse cultures38,39. More recently, a small number of empirical studies have begun to examine voice modulation as a social tool used to exploit ecologically relevant traits, and among these, almost all have focused on F0 modulation (see ref. 4 for review). For example, in a series of recent studies, Cartei and colleagues40,41,42 showed that men, women, and children volitionally decreased both F0 and formants when asked to sound masculine, and increased both voice features to sound feminine. Several studies report F0 modulation in men or women when speaking to a potential mate43,44,45,46,47,48 or competitor49. In the context of mate preferences, these studies have found that both sexes volitionally modulate F0 when instructed to speak in a more attractive voice43 as well as when directing their speech toward an attractive person of the opposite sex45,47.

Voice modulation may therefore be utilized to deemphasize or accentuate various indexical traits, and this may be evolutionarily adaptive. In particular, men who can effectively exaggerate their apparent body size through F0 and formant modulation may reap the social benefits associated with physical largeness, such as increased access to resources and mates. Indeed, taller men, and those with relatively lower voice F0 and formants indicating larger body size, are typically preferred as mates by women across a diverse range of cultures50. Nevertheless, to be effective, vocal modulation of body size should exceed the just-noticeable differences in F0/formant perception23,51,52 and should have the intended effects on listeners’ social assessments. While some studies have found that volitional voice modulation effectively increased listeners’ assessments of the vocalizer’s attractiveness, competence, and intelligence43,47, one study found that sex-typical F0 modulation influenced listeners’ assessments of dominance but not voice attractiveness46.

The Present Study

The present study is the first to test whether humans can modulate voice features known to be associated with body size (fundamental and formant frequencies) when instructed to deliberately alter their apparent body size. In addition, we examined whether this voice modulation reflects real (physical) and perceived relationships between the human voice and body (i.e., lower F0 and formants indicate larger size and vice versa), whether the behaviour differs between the sexes, and whether the behaviour is present cross-culturally.

We tested these hypotheses in 167 men and women from three distinct cultures and language groups: Canada (English), Cuba (Spanish), and Poland (Polish). Participants were recorded speaking vowel sounds in their baseline voice and while imitating a physically large and small body size. We predicted that participants would lower F0 and formants (increase apparent vocal tract length, VTL) to convey large size, and raise voice F0 and formants (reduce VTL) to convey small size. We further predicted that men would modulate their voices more than women, thereby accounting for some of the unexplained sexual dimorphism in F0 and formants. In contrast, we predicted that patterns of voice modulation would not differ across the three cultures. This latter finding would provide some support for fairly universal sound-size correspondences, and/or anatomical or biomechanical constraints on voice modulation.

The present study was specifically designed to test for the first time whether adult speakers are capable of volitional adjustments to their larynx (fundamental frequency modulation) and vocal tract (formant frequency modulation) in a manner that parallels the known relationships between these vocal parameters and body size in humans. Acoustic analyses were utilized to measure voice frequency parameters and to test whether these modulations exceed just-noticeable differences in F0 and formant perception. However in the present study we did not test whether these modulations effectively alter listeners’ perceptions of the vocalizer’s body size.

Results

Table 1 shows unstandardized means and maxima in VTL and F0 modulation for each sex and condition. As predicted, both sexes decreased VTL and increased F0 to sound small, and increased VTL and decreased F0 to sound large (Fig. 1; Supplementary Audio S1). Notably, men increased their apparent VTLs by as much as 25% to portray a physically larger body size, and increased their F0 by up to three times the baseline frequency (i.e., almost 300%) to sound smaller, reaching pitch registers characteristic of a child53.

Table 1 Means and maxima in VTL (cm) and F0 modulation (Hz and ERB) for each sex and condition, given in absolute units and percentage change from baseline.
Figure 1: Spectrograms illustrating the vowel /a/ spoken by the same adult male in each condition.

Formants (F1–F4) are labeled. Fundamental frequency and the first two harmonics (integer multiples of F0) are indicated by red arrows. Participants raised formants and F0 to sound small, thus increasing spacing between F1–F4 and between harmonics (left), and lowered formants and F0 to sound large (right). Gaussian FFT, window length 0.04; dynamic range 60 dB. Refer to Supplementary Audio S1 for corresponding voice recording.

Formant or vocal tract length modulation

An analysis of variance revealed a main effect of condition (large versus small body size imitation) (F1,111 = 109.2, p < 0.001, partial η² = 0.50; Fig. 2a) and an interaction between condition and sex (F1,111 = 8.1, p = 0.005, partial η² = 0.07; Fig. 2b) on VTL modulation. There were no other significant effects (all F < 2.1, all p > 0.13), including no effects of culture (Fig. 2c). Post-hoc analyses showed that participants increased their VTL from baseline in the large condition (one-sample t132 = 9.7, p < 0.001) and decreased their VTL in the small condition (t132 = −5.4, p < 0.001). Moreover, men increased VTL in the large condition (one-way F1,132 = 6.01, p = 0.016) and decreased VTL in the small condition (F1,122 = 5.78, p = 0.018) significantly more than did women. A model examining absolute differences from baseline (i.e., magnitude of modulation) indicated that VTL modulations were more extreme in the large than small condition, and more extreme among men than women in both conditions (see Supplementary Information; see also Fig. 2).

Figure 2: Vocal tract length (VTL) modulation given as the standardized difference from baseline in the large and small conditions.

(a) Participants increased VTL to sound physically large and decreased VTL to sound small. The magnitude of VTL modulations was greater in the large than small condition. (b) Men modulated their VTLs more than did women in both size conditions. (c) VTL modulation did not vary cross-culturally. ***p < 0.001, *p < 0.05, ns p > 0.05.

Fundamental frequency modulation

We observed main effects of condition (F1,161 = 55.77, p < 0.001, partial η² = 0.26; Fig. 3a), sex (F1,161 = 10.7, p = 0.001, partial η² = 0.06; Fig. 3b) and culture (F2,161 = 6.1, p = 0.003, partial η² = 0.07; Fig. 3c) on F0 modulation. These effects were qualified by a significant interaction between condition and sex (F1,161 = 4.4, p = 0.037, partial η² = 0.03) and a marginally non-significant interaction between condition and culture (F2,161 = 3.1, p = 0.051, partial η² = 0.04). There were no other significant effects (all F < 1.9, all p > 0.16).
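As a quick consistency check on the effect sizes, the sketch below recomputes them from the reported F ratios and degrees of freedom. It assumes the effect sizes are partial eta squared, which is a reading of the text rather than something it states, and uses the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The function name is illustrative only.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    Assumption: the effect sizes in the text are partial eta squared.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effects on F0 modulation as reported in the text: (F, df_effect, df_error)
reported_main_effects = {
    "condition": (55.77, 1, 161),
    "sex": (10.7, 1, 161),
    "culture": (6.1, 2, 161),
}

for name, (f_value, df_effect, df_error) in reported_main_effects.items():
    print(f"{name}: {partial_eta_squared(f_value, df_effect, df_error):.2f}")
```

Under that assumption, the three main effects round to 0.26, 0.06 and 0.07, matching the values given for condition, sex and culture.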

Figure 3: Fundamental frequency (F0) modulation.

(a) Participants decreased F0 to sound physically large, and increased F0 to sound small. The magnitude of F0 modulations was greater in the small than large condition. (b) Men modulated F0 more than did women, but only in the small condition. (c) F0 modulation did not vary cross-culturally in the large condition; in the small condition, however, Poles modulated their F0 more than did Canadians. ***p < 0.001, **p < 0.01, *p < 0.05, ns p > 0.05.

Planned post-hoc analyses showed that participants decreased their F0 in the large condition (one-sample t166 = −2.6, p = 0.01) and increased their F0 in the small condition (t166 = 6.7, p < 0.001). Men increased their F0 more than did women to sound small (one-way F1,166 = 7.2, p = 0.008); however, women decreased their F0 more than did men to sound large (F1,166 = 5.5, p = 0.021). Cultural differences in F0 modulation emerged only in the small condition (F2,166 = 4.4, p = 0.014), and only between Canadians and Poles (Fisher’s LSD p = 0.004; all other p > 0.11; Fig. 3c). A model examining absolute magnitude indicated that F0 modulations were more extreme in the small than large condition. Within the small condition, F0 modulations were more extreme among men than women (see Supplementary Information; see also Fig. 3).

Discussion

The capacity for humans to volitionally modulate the source and filter components of our voices has traditionally been studied in the context of speech and language production9,11. The extent to which we modulate our voices for nonverbal communication, for instance to sound more masculine/feminine or attractive, has been investigated in comparatively few empirical studies40,41,43,44,45,46,47,48,54,55. Our study provides the first evidence that men and women from diverse cultures can spontaneously and volitionally modulate their fundamental and formant frequencies with the intent to exaggerate or reduce apparent body size, and that regardless of culture, men generally modulate their voices more than do women in this context. Acoustic analyses indicated that these modulations were in the predicted direction, such that men and women lowered F0 and formants when instructed to sound large, and increased F0 and formants when instructed to sound small, and that in most cases these modulations exceeded the just-noticeable differences in F0 and formant perception.

The patterns of voice frequency modulation observed in our study map onto real physical relationships between the voice and body, as larger people generally have lower formants and F0 than do smaller people17,18,22. However, because neither vocal parameter (especially F0) can explain a substantial proportion of the variance in human body size when sex and age are controlled17,18,22, volitional voice modulation of these parameters may also reflect an exploitation of listeners’ perceptual biases linking low voice frequencies to large body size and dominance7,8,21,22,23,54, or more general sound symbolic correspondences56. Indeed our results support Ohala’s prediction that similar voice frequency modulations will be observed across cultures, reflecting a universal “frequency code”7,8. It has also previously been suggested that perceptual biases based on the laws of physics, such that large objects resonate at lower frequencies, are likely to be cross-culturally universal precisely because they are determined by physics, not culture57 (see also ref. 3). Our cross-cultural results may alternatively reflect constraints on voice production in humans. Formants are especially constrained by the bony anatomy surrounding the vocal tract15, which is likely to impose upper and lower limits on formant modulation.

The sex differences in voice modulation observed here may be tied to a number of factors, most parsimoniously to differences in the vocal anatomy of men and women. For example, a longer supralaryngeal vocal tract among men may allow for greater laryngeal mobility that could result in a broader range of formant manipulations. Men’s voices are also lower in frequency than are women’s, and as a result men must raise their voices more than women to reach similar high frequency targets. Nevertheless our results indicate that men exceeded the frequency targets reached by women even when raising their voice frequencies to sound small. Indeed we observed extreme maxima in modulations of both F0 and VTL, particularly among men. On one hand, this demonstrates an impressive capacity for men to volitionally manipulate their larynges and vocal tracts. On the other hand, it elicits a question about the ecological validity of such extreme modulations, which may be perceived as abnormal.

Our results indicate that speakers modulated F0 more than VTL. We also observed asymmetries within each vocal parameter, specifically greater decreases than increases in formants, and greater increases than decreases in F0. This latter finding might be explained by nonlinearities in the relationship between vocal fold length and F032, and the greater physiological effort required to increase versus decrease vocal fold tension58. Indeed baseline F0 is closer to the minimum than maximum producible F012. As a consequence, sopranos can reach F0’s above 1200 Hz, whereas bass singers lower their F0 by only a fraction of this magnitude, typically to around 80 Hz59.

The demonstrated capacity to volitionally modulate vocal parameters known to be physically related to and perceptually associated with body size can be evolutionarily advantageous, as various indicators of physical size in humans are known to influence a wide range of socioeconomic variables and the mate preferences of both sexes50. At the same time, voice modulation is ecologically relevant only if and when it affects listeners. Perceptually, human listeners can discriminate changes in F0 or formants of about 5% from a series of vowel sounds52, and formant manipulations of 5% are known to affect listeners’ body size estimates60. Based on this our results suggest that, on average, men’s formant-based size exaggeration, and both men’s and women’s F0-based size reduction, would be perceptually detectable. Studies examining the effectiveness of voice modulation on other types of judgments have produced mixed results43,46,47, but generally suggest that voice modulation may be an effective tool for manipulating listeners’ social judgments of traits such as attractiveness, dominance, and competence. For instance, one recent study found that listeners preferred the voices of men and women whose speech was directed towards attractive individuals, and these preferences were observed for voices recorded in the listener’s own language as well as in a foreign language47. In the case of vocally faking a larger body size, and thus a more dominant persona, individuals who are perceived as physically larger due to voice modulation could reap the socioeconomic and reproductive benefits typically linked to these traits across various social contexts including mating, political and marketing contexts. Currently we are conducting playback experiments to test whether vocal modulation can effectively alter listeners’ estimates of body size.

Methods

Participants

A total of 167 men and women from Canada (students of McMaster University in Hamilton), Cuba (students of the University of Havana, and staff and students of the Cuban Neuroscience Center in Havana), and Poland (students of the University of Wrocław and the College of Humanities and Economics in Brzeg) took part in the experiment. All participants provided informed consent. Sample characteristics are given in Table 2.

Table 2 Sample characteristics (mean (s.d., range)).

Procedure

All participants were first recorded speaking the five monophthong vowels /α/, /i/, /ɛ/, /o/, and /u/ (International Phonetic Alphabet) in their natural, baseline voice. Following this, participants were asked to repeat the five vowels while sounding physically small (small condition) and physically large (large condition). These instructions, back translated and given in the native language of the participant, were the only instructions given. Condition order was counter-balanced between participants. Participants then completed a short questionnaire indicating their sex and age. Height was measured using metric tape and weight using an electronic scale. The study was approved by the McMaster Research Ethics Board and methods were carried out in accordance with the approved guidelines.

Voice recording

All participants were recorded using condenser microphones with a cardioid pick-up pattern at an approximate distance of 5–10 cm (Canada: Sennheiser MKH 800; Cuba: Sennheiser MKH 70; Poland: Audio-M Nova). Audio was digitally encoded with an M-Audio Fast Track interface at a sampling rate of 44.1–96 kHz and 16–24 bit amplitude quantization, and stored onto a computer as PCM WAV files. Recordings from participants at McMaster University and the Cuban Neuroscience Center were conducted in an anechoic sound-controlled booth and recordings at the Universities of Havana and Wrocław were conducted in a quiet room.

Voice measurement and acoustic analysis

All acoustic measures were performed in Praat61. Voice measures were taken from each vowel separately and then averaged across vowels within each vocalizer and condition to obtain mean values. We measured F0 using Praat’s autocorrelation algorithm. Following previous work, we set a broad search range of 30–500 Hz for men, and 65–600 Hz for women41. We transformed F0 measures into equivalent rectangular bandwidth (ERB) units, a quasi-logarithmic scale that controls for the difference between physical and perceived properties of pitch, where 1 ERB is approximately equal to a 40 Hz change at a centre frequency of 120 Hz62. The ERB scale correlates strongly with F0 in Hz in the range of adult human speech (e.g., r = 0.99 in men)21.
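The Hz-to-ERB transformation can be sketched as follows. The text does not say which ERB formula was applied, so this example assumes the widely used Glasberg and Moore (1990) ERB-rate scale; Praat's built-in conversion uses a different approximation and would give somewhat different values. The function names are illustrative.

```python
import math


def hz_to_erb_rate(f_hz: float) -> float:
    """Convert frequency in Hz to ERB-rate units.

    Assumption: Glasberg & Moore (1990) formula, which may differ from
    the variant used in the study.
    """
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def erb_bandwidth_hz(f_hz: float) -> float:
    """Width in Hz of one ERB centred on f_hz (same assumed formula family)."""
    return 24.7 * (0.00437 * f_hz + 1.0)


if __name__ == "__main__":
    for f0 in (100.0, 120.0, 200.0, 300.0):
        print(f"{f0:6.1f} Hz -> {hz_to_erb_rate(f0):.2f} ERB, "
              f"1 ERB spans about {erb_bandwidth_hz(f0):.1f} Hz")
```

With this formula, one ERB at a centre frequency of 120 Hz spans roughly 38 Hz, in line with the approximate 40 Hz figure quoted above.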

We measured formants (F1–F4) using Praat’s Burg Linear Predictive Coding algorithm with the initial settings of maximum formant set to 5500 Hz for women and 5000 Hz for men. Formants were first overlaid on a spectrogram and formant number was manually adjusted until the best visual fit of predicted onto observed formants was obtained. From the mean centre frequencies of F1–F4 we computed formant spacing, ∆F, a measure of the distance among adjacent formants, as well as apparent vocal tract length derived from formant spacing, VTL(∆F)63. The results of a recent meta-analysis indicate that ∆F and VTL(∆F) each independently explain more variance in men’s heights and women’s weights than do any other formant measures17, and are strongly inversely related (here, r = −0.99 within each sex).

Each individual formant is related to ∆F by Equation (1):

$$F_i = \frac{2i - 1}{2}\,\Delta F \qquad (1)$$

where i represents formant position (F1–F4). Thus, we derived ∆F by plotting mean formant frequencies for each individual against the expected increments of formant spacing [(2i − 1)/2], where ∆F is equal to the slope of the linear regression line with an intercept set to zero41,63. From this, we estimated the apparent vocal tract length of each individual following Equation (2):

$$\mathrm{VTL} = \frac{c}{2\,\Delta F} \qquad (2)$$

where c is 35 000 cm/s, the approximate speed of sound in warm, humid air within a uniform tube closed at one end (i.e., the vocal tract12). From the pooled samples, we confirmed that baseline VTL explained several times more variance in men’s (12%, rS = 0.35) and women’s (16%, rS = 0.40) heights than did baseline F0 (2.5% in each sex, rS = 0.16; see Supplementary Fig. S1). This pattern of results was similar across samples and agrees with weighted relationships reported at the population level17.
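The two-step estimate described here can be sketched in a few lines. The example assumes the standard uniform-tube relations, F_i = ((2i − 1)/2)·∆F and VTL = c/(2∆F), fits ∆F as the slope of a zero-intercept least-squares regression, and uses c = 35 000 cm/s as in the text. Function names and the idealised formant values are hypothetical, chosen only for illustration.

```python
from typing import Sequence

SPEED_OF_SOUND_CM_S = 35_000.0  # value of c given in the text


def formant_spacing(formants_hz: Sequence[float]) -> float:
    """Estimate formant spacing (delta F, in Hz) from F1..Fn.

    Fits F_i = x_i * dF with x_i = (2i - 1) / 2 by least squares through
    the origin, so dF = sum(x_i * F_i) / sum(x_i ** 2).
    """
    xs = [(2 * i - 1) / 2 for i in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(xs, formants_hz))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def apparent_vtl_cm(formants_hz: Sequence[float],
                    c: float = SPEED_OF_SOUND_CM_S) -> float:
    """Apparent vocal tract length in cm, VTL = c / (2 * dF)."""
    return c / (2.0 * formant_spacing(formants_hz))


if __name__ == "__main__":
    # Hypothetical formants of an ideal 17.5 cm uniform tube (not study data).
    ideal = [500.0, 1500.0, 2500.0, 3500.0]
    print(formant_spacing(ideal), apparent_vtl_cm(ideal))
```

Because the regression is forced through the origin, raising all formants by a common factor raises ∆F by the same factor and shortens the apparent VTL proportionally, which is why the two measures are so strongly inversely related.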

Statistical analysis

We first calculated differences in voice measures between each size condition and baseline, separately for F0 and VTL. Positive values indicate increases, and negative values decreases, from baseline. We then ran separate repeated measures ANOVAs for F0 and VTL. In each model, the dependent variable was the standardized difference from baseline ([large–baseline]/baseline; [small–baseline]/baseline), controlling for baseline sex differences. Condition (large, small) was included as a within-subject factor, and sex (male, female) and culture (Canada, Cuba, Poland) as between-subject factors. To examine differences in the magnitude of voice modulations, we re-ran the models on the absolute standardized difference from baseline in each condition (see Supplementary Information). Significant effects were further examined using planned post-hoc tests. All tests were two-tailed with an alpha of 0.05.
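The dependent variable used in these models can be illustrated with a minimal sketch. It computes the standardized difference from baseline, (condition − baseline)/baseline, and its absolute value as the magnitude of modulation. The optional threshold check uses the roughly 5% discrimination figure cited in the Discussion as an assumed default, and the numeric values are hypothetical, not data from the study.

```python
def standardized_difference(condition_value: float, baseline_value: float) -> float:
    """(condition - baseline) / baseline; positive means an increase from baseline."""
    return (condition_value - baseline_value) / baseline_value


def modulation_magnitude(condition_value: float, baseline_value: float) -> float:
    """Absolute standardized difference from baseline, ignoring direction."""
    return abs(standardized_difference(condition_value, baseline_value))


def exceeds_jnd(condition_value: float, baseline_value: float,
                jnd: float = 0.05) -> bool:
    """True if the modulation exceeds an assumed just-noticeable difference (5%)."""
    return modulation_magnitude(condition_value, baseline_value) > jnd


if __name__ == "__main__":
    # Hypothetical F0 values in Hz for one speaker (illustration only).
    baseline_f0, small_f0, large_f0 = 120.0, 180.0, 108.0
    print(standardized_difference(small_f0, baseline_f0))   # increase from baseline
    print(standardized_difference(large_f0, baseline_f0))   # decrease from baseline
    print(exceeds_jnd(large_f0, baseline_f0))
```

Dividing by each speaker's own baseline puts men and women on a common proportional scale, which is what allows sex to be entered as a between-subject factor without the raw baseline difference driving the result.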

Additional Information

How to cite this article: Pisanski, K. et al. Volitional exaggeration of body size through fundamental and formant frequency modulation in humans. Sci. Rep. 6, 34389; doi: 10.1038/srep34389 (2016).