Effectiveness of spatial cues, prosody, and talker characteristics in selective attention C.J. Darwin & R.W. Hukin.



Background
Spatial attention often the focus of studies of the cocktail party effect
But humans can separate sources that aren’t separated in space
What other aspects of the speech signal are useful for source separation?
–Pitch contour?
–Individual characteristics?
–A combination of characteristics?

Aims
Characterize the role of natural prosody in sound source separation
Characterize the role of vocal-tract size in sound source separation


Methods
13 listeners (21-52 yrs)
“Could you PLEASE write the word bead/globe down NOW?” / “You’ll ALSO hear the sound bead/globe played HERE”
–Target word onsets aligned
–Target word duration matched
–Similar phrase durations

Methods
Three pitch conditions
–Original
–Together (Equalize target word F0s)
–Apart (Shift target word F0s apart)
Two splicing methods
–Normal
–Swapped

You will ALSO hear the sound globe played here
You will also hear the sound globe played HERE
Could you please write the word bead down NOW
Could you PLEASE write the word bead down now

You will ALSO hear the sound globe played here
Could you please write the word bead down NOW
Swapped…
Could you PLEASE write the word bead down now
You will also hear the sound globe played HERE

Methods
Three pitch conditions
–Original
–Together (Equalize target word F0s)
–Apart (Shift target word F0s apart)
Two splicing methods
–Normal (prosodic cues reinforce spatial)
–Swapped (prosodic cues oppose spatial)
ITDs
–0, ±45.3, ±90.7 µs
144 trials heard 5 times each (720 trials)
You will ALSO hear the sound globe played here / Could you please write the word BEAD down now
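The ITD values listed above look like whole-sample delays. A minimal sketch, assuming a 44.1 kHz sampling rate (the slides do not state the rate, and the helper names `itd_to_samples` and `apply_itd` are hypothetical), converts an ITD to a sample count and delays one ear of a mono signal by that many samples. It is only an illustration of how such an ITD could be imposed, not the authors' procedure.

```python
import numpy as np

FS = 44_100  # ASSUMED sampling rate in Hz; not given in the slides


def itd_to_samples(itd_us, fs=FS):
    """Convert an ITD in microseconds to a (possibly fractional) sample delay."""
    return itd_us * 1e-6 * fs


def apply_itd(mono, n_samples):
    """Build a two-column (left, right) array with one ear delayed.

    Positive n_samples delays the left ear (right ear leads);
    negative n_samples delays the right ear. Zero leaves both ears identical.
    """
    n = abs(int(n_samples))
    pad = np.zeros(n)
    lead = np.concatenate([mono, pad])  # undelayed ear, padded at the end
    lag = np.concatenate([pad, mono])   # delayed ear, padded at the start
    left, right = (lag, lead) if n_samples > 0 else (lead, lag)
    return np.stack([left, right], axis=1)


if __name__ == "__main__":
    for itd_us in (45.3, 90.7):
        print(itd_us, "µs ->", round(itd_to_samples(itd_us), 3), "samples")
```

Under the 44.1 kHz assumption, 45.3 µs and 90.7 µs come out within about 0.01 of 2 and 4 samples respectively, which is why an integer-sample shift is a plausible way to realize them.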

Results: ITD = 0
Normal: listeners select the target with matching prosody (83%)
Swapped: lower accuracy (69%)
In the absence of other cues, listeners can use the natural F0 contour to track a sentence

Results: ITD ≠ 0
Normal: improved accuracy (93%)
Swapped: chance selection with an ITD of ±45.3 µs
With an ITD of ±90.7 µs, listeners report the target word that shares the target sentence’s ITD

Results: ITD ≠ 0
Apart condition strengthens prosodic cues
–Higher chance of reporting the target with the same prosody as the target sentence
Together condition weakens prosodic cues
ITD cues dominate, but natural prosody can help direct listeners’ attention

Aims
Characterize the role of natural prosody in sound source separation
Characterize the role of vocal-tract size in sound source separation

Experiment 2
Changed spectral envelope by 15%
–Formant frequencies changed
–Voice source characteristics changed
–F0 unchanged
Produced 2 apparently different talkers
ITD 0, ±45.3, ±90.7, ±181.4 µs

Different vocal tract sizes have a large effect
Even with large ITDs in the swapped condition, listeners prefer the original target word (73%)

Experiment 3
Fixed ITD ±90.7 µs
Vocal tract size changes of ±2, ±4, ±8, ±15%
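To get a feel for what these size changes do to formant frequencies, here is a rough sketch under loudly stated assumptions: the vocal tract is treated as a uniform tube closed at one end (a textbook idealization, not the manipulation used in the study), the speed of sound is taken as 350 m/s, the baseline length is 17.5 cm, and the percentage is read as a change in tract length. The function names are hypothetical.

```python
def uniform_tube_formants(length_cm, n_formants=3, c_cm_per_s=35_000.0):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]


def scaled_formants(formants_hz, tract_change_pct):
    """Formants after lengthening the tract by tract_change_pct percent.

    ASSUMPTION: the percentage is a change in tract length, so every
    formant is divided by (1 + pct/100). A longer tract lowers all formants.
    """
    factor = 1.0 / (1.0 + tract_change_pct / 100.0)
    return [f * factor for f in formants_hz]


if __name__ == "__main__":
    base = uniform_tube_formants(17.5)  # 500, 1500, 2500 Hz for this idealized tube
    for pct in (2, 4, 8, 15):
        for sign in (+1, -1):
            shifted = scaled_formants(base, sign * pct)
            print(f"{sign * pct:+d}%:", [round(f) for f in shifted])
```

If the percentage is instead meant as a direct scaling of the spectral envelope along the frequency axis, the formants would be multiplied by (1 + pct/100) rather than divided; the slides do not say which convention applies.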

A ±8% size difference is comparable to that between males and females
Vocal tract size changes smaller than ±8% produce little significant change

Conclusions
Natural prosodic variations more effectively override spatial cues than monotone F0
Vocal tract size changes ≥ average male/female differences can override spatial cues

Things to consider
Natural cues? Natural setting?
In a natural environment are these cues ever pitted against one another?
What are listeners really attending to?
Can we really conclude that more attention is being paid to ITD than to prosody?

But is the vocal tract modification of realistic proportions?