CNBH, Physiology Department, Cambridge University (www.mrc-cbu.cam.ac.uk/cnbh)
The perception of size and sex in vowel sounds (P60)
David R. R. Smith and Roy D. Patterson

INTRODUCTION

The voices of men, women and children sound different: men tend to have low-frequency voices, children have high-frequency voices, and women lie somewhere in between. Our perception of these size differences (adult or child) and sex differences (male or female) is more complicated than it first seems. Two physical parameters that both vary with size and sex are combined in our perception. One is the rate at which the vocal folds open and close (glottal pulse rate, GPR), which determines voice pitch; the other is the length of the supra-laryngeal vocal tract (VTL), which determines the centre of gravity of the frequencies in the voice. Both GPR and VTL are linked to the physical size and sex of the speaker (Fig. 1, panel 4); however, it is unclear how they interact to determine our perception of speaker size and sex. The purpose of this work is to measure the relative contribution of GPR and VTL to judgements of (1) speaker size and (2) speaker sex and age.

APPROACH

Vowels were scaled with the vocoder STRAIGHT (Kawahara et al., 1999) to simulate people with a very wide range of GPRs and VTLs, including many well beyond the normal range of the population (Fig. 1, panel 4). Panel 8 gives more details. Listeners were presented with vowels in a single-interval, two-response rating paradigm. They made one judgement about speaker size (on a 7-point ordinal scale ranging from “very short” to “very tall”) and a second judgement about speaker sex (man, woman, boy, girl).

RESULTS & CONCLUSIONS

SPEAKER SIZE
Listeners are able to estimate the size of the speaker from vowel sounds (Fig. 2, panel 5). The influence of VTL (SER) on size estimates is up to six times greater than that of GPR (Fig. 3, panel 6).

SPEAKER SEX & AGE
Listeners’ judgements of the sex and age of the speaker are affected about equally by the GPR and the VTL (SER) of the vowel (Fig. 4, panel 7). When listeners are presented with supra-normal combinations of GPR and VTL, the VTL information is weighted more heavily than the GPR information.

FIGURE 1. The four ellipses show estimates of the normal range of GPR and SER values in speech for men, women, boys and girls (derived from Peterson and Barney, 1952). In each case, the ellipse encompasses 90% of the individuals in the Peterson and Barney data for that category of speaker. The open circles are the combinations of GPR and SER values used in the experiment. The abscissa is GPR and the ordinate is SER, both plotted on logarithmic axes.

FIGURE 2. Speaker size rating judgements presented as a 2D surface plot, with colour showing perceived speaker size. The 7-point ordinal speaker size rating scale goes from 1 (“very short”) to 7 (“very tall”). Sample points are shown as circles, with interpolation between the data points. Data is collapsed across the five vowels and four listeners, giving 100 trials per point.

FIGURE 3. Speaker size rating as a function of GPR and as a function of SER (VTL). Error bars are one standard error of the mean (across four listeners). Best-fitting regression lines were calculated for speaker size rating as a function of the natural logarithm of each parameter. Probabilities are from one-tailed Spearman’s rank-order correlations for non-parametric variables. (Panel annotations give the slope and Spearman r_s of each fit; the values are not preserved in this transcript.)

FIGURE 4. Sex categorisation performance. Data are presented as 2D surface plots, with colour showing the probability of assigning a given GPR-SER combination to one of the four sex categories. The dotted black contour line marks our classification threshold, that is, a probability ≥0.50 of consistently choosing one category out of the four available. Data is collapsed across the four listeners, giving 100 trials per point.
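
The regression described in the Figure 3 caption (rating as a linear function of the natural logarithm of GPR or SER) can be sketched as an ordinary least-squares fit. This is only an illustrative sketch: the function name `fit_log_regression` and the demo numbers are assumptions made here, not the poster's analysis code or data, and the Spearman rank-order test is not reproduced.

```python
import math

def fit_log_regression(values, ratings):
    """Least-squares fit of: rating = intercept + slope * ln(value)."""
    xs = [math.log(v) for v in values]          # natural log of the parameter
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ratings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic demo values (hypothetical, NOT the poster's data): ratings are
# generated from a known line so the fit can be checked by eye.
demo_values = [1.0, 2.0, 4.0, 8.0]
demo_ratings = [4.0 - 1.5 * math.log(v) for v in demo_values]
slope, intercept = fit_log_regression(demo_values, demo_ratings)
```

Fitting GPR and SER separately in this way yields one slope per parameter, which is presumably the kind of quantity the slope annotations in Figure 3 report.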

ACKNOWLEDGEMENTS

Research supported by the UK MRC (G ; G ) and the German Volkswagen Foundation (VWF 1/79 783).

REFERENCES

Kawahara, H., Masuda-Katsuse, I., and de Cheveigne, A. (1999). “Restructuring speech representations using pitch-adaptive time-frequency smoothing and instantaneous-frequency based F0 extraction: Possible role of repetitive structure in sounds,” Speech Communication 27.
Smith, D. R. R., Patterson, R. D., and Jefferis, J. (2003). “The perception of scale in vowel sounds,” British Society of Audiology, Nottingham. P35.
Smith, D. R. R., and Patterson, R. D. (2004). “The existence region of scaled vowels in pitch-VTL space,” 18th International Conference on Acoustics, Kyoto, Japan, vol I.

METHODS

This work extends our previous work on 2AFC size discrimination (Smith et al., 2003; Smith and Patterson, 2004) by using a rating procedure to measure speaker size.

CANONICAL VOWELS
Vowels (/a/, /e/, /i/, /o/, /u/) were extracted from a natural /hVd/ speech sequence spoken by an adult male (RP): haad, hayed, heed, hoed, who’d. Sounds were digitized with 16-bit quantization at a sampling rate of 44.1 kHz. All vowels were 500 ms long.

SCALE MANIPULATION
Vowels were manipulated to have a range of GPRs and simulated VTLs using STRAIGHT (Kawahara et al., 1999). STRAIGHT produces a pitch-independent spectral envelope that accurately tracks the motion of the vocal tract through an utterance. Once STRAIGHT has segregated a vowel into its GPR contour and a sequence of spectral-envelope frames, the vowel can be resynthesized with the spectral-envelope dimension (frequency) expanded or contracted and with the GPR dimension (time) expanded or contracted; the two operations are largely independent.

EXPERIMENT
We used a single-interval, two-response rating paradigm. On each trial, listeners (n=4) heard a scaled version of one of five English vowels (the vowel and the GPR-SER value were chosen pseudo-randomly; cf. Fig. 1). They made one judgement about the size of the speaker (very short, short, quite short, average, quite tall, tall, very tall) and a second judgement about the sex of the speaker (man, woman, boy, girl). The level of the vowel was roved over a 10 dB range. There was no feedback.
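
The trial structure described above can be sketched as a pseudo-randomised list of vowel and GPR-SER combinations with a roved level. This is a minimal sketch under stated assumptions, not the software used in the experiment: `make_trials`, the two GPR-SER points, and the choice to centre the 10 dB rove on a nominal level (±5 dB) are all hypothetical. The value of 5 repeats is an inference from Figure 2, where 100 trials per point across five vowels and four listeners implies 100 / (5 × 4) = 5 trials per vowel, per point, per listener.

```python
import random

VOWELS = ["a", "e", "i", "o", "u"]
SIZE_SCALE = ["very short", "short", "quite short", "average",
              "quite tall", "tall", "very tall"]      # 7-point rating scale
SEX_CATEGORIES = ["man", "woman", "boy", "girl"]      # second response

def make_trials(points, repeats, rove_db=10.0, seed=0):
    """Build a shuffled trial list for one listener.

    points  : list of (GPR in Hz, SER) pairs (hypothetical values here)
    repeats : presentations of each vowel at each point
    rove_db : total width of the level rove in dB (assumed centred on 0)
    """
    rng = random.Random(seed)                          # seeded: pseudo-random
    combos = [(v, gpr, ser)
              for (gpr, ser) in points
              for v in VOWELS
              for _ in range(repeats)]
    rng.shuffle(combos)
    trials = []
    for vowel, gpr, ser in combos:
        offset_db = rng.uniform(-rove_db / 2, rove_db / 2)
        trials.append({
            "vowel": vowel,
            "gpr_hz": gpr,
            "ser": ser,
            "level_offset_db": offset_db,
            "gain": 10 ** (offset_db / 20),            # dB to amplitude factor
        })
    return trials

# Two hypothetical GPR-SER points; the poster's actual values are in Fig. 1.
demo_points = [(100.0, 1.0), (200.0, 0.8)]
trials = make_trials(demo_points, repeats=5)
```

Each trial would then collect one response from `SIZE_SCALE` and one from `SEX_CATEGORIES`, with no feedback given to the listener.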