Dynamic Aspects of the Cocktail Party Listening Problem Douglas S. Brungart Air Force Research Laboratory.


2 Credits AFOSR Sponsored Research Team: Brian Simpson Alex Kordik Rich McKinley Mark Ericson Collaborators: Chris Darwin Gerald Kidd

3 Introduction 1) Energetic and Informational Masking: Speech in Noise vs. Speech in Speech 2) Monaural speech segregation 3) Binaural and Dichotic speech segregation 4) Dynamic aspects of the cocktail party problem 5) Audio-Visual cocktail party effects

4 Energetic Masking In classic speech-on-noise masking, only one type of masking occurs: Energetic Masking In Energetic Masking: -The masking sound is more intense than the target in one or more critical bands -Some portion of the target signal is inaudible at the periphery

5 Energetic Masking Articulation Theory Energetic masking in speech was studied for years by Fletcher and others at Bell Labs -Articulation Theory -Articulation Index (AI) Allows accurate prediction of intelligibility: -For any phonetically balanced vocabulary -For any continuous noise source -Plus numerous correction factors High-Amplitudes, Reverb, Peak-Clipping, etc.
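
As a rough illustration of the Articulation Index idea on the slide above, the sketch below computes a simplified index from per-band speech and noise levels. The +12 dB speech-peak offset, the 30 dB usable range, and the equal band weights follow a common textbook simplification and are assumptions here, not values taken from the slides. The correction factors listed above (high amplitudes, reverberation, peak clipping) are deliberately left out.

```python
def articulation_index(speech_db, noise_db, weights=None):
    """Simplified Articulation Index from per-band levels in dB.

    Assumptions (not from the slides): speech peaks sit 12 dB above the
    long-term band level, each band contributes over a 30 dB range, and
    bands are weighted equally unless explicit weights are supplied.
    """
    if len(speech_db) != len(noise_db):
        raise ValueError("speech_db and noise_db must have the same length")
    n_bands = len(speech_db)
    if weights is None:
        weights = [1.0 / n_bands] * n_bands

    total = 0.0
    for speech, noise, weight in zip(speech_db, noise_db, weights):
        # Portion of the 30 dB speech range that rises above the noise.
        audible_db = min(max(speech + 12.0 - noise, 0.0), 30.0)
        total += weight * audible_db / 30.0
    return total
```

A band whose noise level exceeds the speech peaks contributes nothing, which corresponds to the "inaudible at the periphery" case described for energetic masking.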

6 Informational Masking Energetic Masking also occurs in Speech-on-Speech masking - Where signals overlap within a critical band However, informational masking also occurs: Listeners hear two or more audible sounds, but can’t segregate them into separate messages Classic example: multi-tone complexes - No energetic overlap in stimuli, but substantial masking is observed (Kidd, Neff)

7 Methods The Coordinate Response Measure (CRM) Data collected with the Coordinate Response Measure (CRM) - Originally developed by Moore & McKinley (1980) - Format: “Ready (Call Sign) go to (Color) (Number) now.” - Target is indicated by the call sign “Baron” - Maskers are indicated by other call signs - Complete CRM corpus is available (Bolia et al., 2001) - 8 Talkers in corpus (4 M, 4 F), 2048 Phrases - 8 Talkers x 4 Colors x 8 Numbers x 8 Call Signs - Embedded call sign is ideal for multitalker studies - Similar to many multichannel monitoring tasks
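
The corpus arithmetic on the slide above (8 talkers x 4 colors x 8 numbers x 8 call signs = 2048 phrases) can be checked with a short sketch. Only the call sign "Baron" and the phrase format come from the slides. The remaining call signs, the color names, the number range 1 to 8, and the talker IDs are assumptions added for illustration.

```python
from itertools import product

# Assumed vocabulary: only "Baron" appears in the slides.
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle",
              "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = list(range(1, 9))
# Hypothetical talker IDs: four male, four female.
TALKERS = [f"M{i}" for i in range(1, 5)] + [f"F{i}" for i in range(1, 5)]


def crm_phrase(call_sign, color, number):
    """Build one phrase in the format given on the slide."""
    return f"Ready {call_sign} go to {color} {number} now."


def build_corpus():
    """Enumerate every talker / call sign / color / number combination."""
    return [
        {"talker": talker, "call_sign": call_sign, "color": color,
         "number": number, "text": crm_phrase(call_sign, color, number)}
        for talker, call_sign, color, number
        in product(TALKERS, CALL_SIGNS, COLORS, NUMBERS)
    ]


corpus = build_corpus()
print(len(corpus))  # 8 * 8 * 4 * 8 combinations
```

Because every phrase carries its own call sign, a trial can mark the target simply by picking the phrase whose call sign is "Baron", with no prior designation of the target talker.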

8 Methods The Coordinate Response Measure Your call sign is Baron. Listeners respond by selecting the appropriate colored digit with the computer mouse

9 Methods Pros and Cons of CRM Advantages of CRM: Rapid data collection: training and scoring Sentences are reusable Embedded call sign to designate target - does not require a priori designation Disadvantages of CRM: Limited vocabulary - partially offset by lack of context - not phonetically balanced Not “conversationally” realistic CRM emphasizes “speech on speech” masking

11 Methods Pros and Cons of CRM Advantages of CRM: Rapid data collection: training and scoring Sentences are reusable Embedded call sign to designate target - does not require a priori designation Disadvantages of CRM: Limited vocabulary - partially offset by lack of context - not phonetically balanced Not “conversationally” realistic CRM emphasizes “informational” masking

12 Two-Talker Diotic Listening Results TM=Mod. Noise Masker TN=Cont. Noise Masker TD=Diff. Sex Masker TS=Same Sex Masker TT=Same Talker Masker

13 Two-Talker Diotic Listening Error Distribution Most errors match the color and number spoken by the masking talker…. This is indicative of informational masking
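
The error analysis described above can be sketched as a small scoring routine that sorts each response into "correct", "masker" (the response matches the masking talker's color and number), or "other". The function names and the tuple layout are hypothetical. The slides do not describe the scoring code that was actually used.

```python
from collections import Counter


def classify_response(response, target, masker):
    """Classify one response; each argument is a (color, number) tuple."""
    if response == target:
        return "correct"
    if response == masker:
        return "masker"  # listener reported the masking talker's words
    return "other"


def tally(trials):
    """Count outcomes over (response, target, masker) trials.

    Returns the counts and the share of errors that match the masker.
    """
    counts = Counter(classify_response(r, t, m) for r, t, m in trials)
    errors = counts["masker"] + counts["other"]
    masker_share = counts["masker"] / errors if errors else 0.0
    return counts, masker_share
```

A masker share well above what random guessing would produce suggests that listeners heard the masker phrase clearly but attributed it to the wrong talker, which is the pattern the slide reads as informational masking.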

14 Three-Talker Diotic Listening Results T=Target Talker M=Mod. Noise Masker D=Diff. Sex Masker S=Same Sex Masker T=Same Talker Masker

15 Four-Talker Diotic Listening Results T=Target Talker M=Mod. Noise Masker D=Diff. Sex Masker S=Same Sex Masker T=Same Talker Masker

Talker Listening Results

17 Dichotic Listening Introduction To this point, all stimuli have been diotic Spatial separation is known to play a role - Cherry’s “Cocktail Party Problem” Dichotic masking is pure informational masking - No contralateral energetic masking occurs Previous results have suggested: - Almost perfect segregation across ears - Cherry, Broadbent, Treisman, Kidd, Neff, etc.

18 Dichotic Listening Procedure Dichotic listening procedure was similar to the other procedures, but 1) Talkers were known a priori - 1 male, 1 female target talker 2) 2 Talkers presented in right ear (T and M) 3) Masking signal was presented in left ear

19 Dichotic Listening Results With 2 talkers in right ear… Noise in left ear doesn’t interfere (Even when Loud) Speech interferes substantially… (Even when Quiet) Reversed Speech interferes… but only when target in right ear lower than masker in right ear

20 Binaural Listening Spatial Separation in Azimuth From the classic “cocktail party effect” Spatial separation improves segregation Diotic vs. 45° Separation, same-sex talkers

21 Binaural Listening Spatial Separation in Distance

22 Binaural Listening Spatial Separation in Distance With Natural Better-Ear SNR Cues, Both speech and noise Benefit from separation in distance

23 Binaural Listening Spatial Separation in Distance With normalization, speech is Better but Noise is not

24 Dynamic Aspects of Multitalker Listening Most Cocktail-Party Listening Experiments assume either 1) Target talker is known (“Selective Attention”) or 2) Target talker is unknown (“Divided Attention”) Real world listening falls in between these extremes - Attention focused primarily on one talker - Other talkers monitored for “important” info How do listeners adapt to conversational dynamics?

25 Dynamic Cocktail Party Effects Multitalker Transition Probability Experiment: 3-Talker Condition 1) Standard CRM task 2) 2, 3, or 4 Spatially Separated Same-Sex Talkers - Close or Far separation for 2 and 3 talkers 3) 5 Transition Probabilities (0-1) 4) 3 Talker Configurations - Talkers selected randomly - Each location assigned a talker - Target talker follows target location 5) Total of 106,200 Trials - Balanced by Target Talker and Target Location
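
One way to picture the transition-probability manipulation above is a trial sequence in which the target location changes from one trial to the next with a fixed probability. The sketch below assumes that reading of "transition probability". The function name, the location indexing, and the uniform choice among the other locations are illustrative assumptions, and the five probability values used in the experiment are not given in the slides.

```python
import random


def target_location_sequence(n_trials, n_locations, p_transition, seed=None):
    """Generate target locations for a block of trials.

    After the first trial, the target moves to a different location with
    probability p_transition and otherwise stays where it was.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    if n_locations < 2:
        raise ValueError("n_locations must be at least 2")
    if not 0.0 <= p_transition <= 1.0:
        raise ValueError("p_transition must lie in [0, 1]")

    rng = random.Random(seed)
    current = rng.randrange(n_locations)
    sequence = [current]
    for _ in range(n_trials - 1):
        if rng.random() < p_transition:
            # Move to one of the other locations, chosen uniformly.
            others = [loc for loc in range(n_locations) if loc != current]
            current = rng.choice(others)
        sequence.append(current)
    return sequence
```

With p_transition = 0 the target never moves, which corresponds to the selective-attention extreme. With p_transition = 1 it moves on every trial, so the listener can never rely on the previous target location.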

26 Dynamic Cocktail Party Effects Multitalker Transition Probability Overall Performance Improves Gradually After Transitions

27 Conclusions ? 1) Speech-on-Speech ≠ Speech-in-Noise - Deployment of Auditory Attention is Important - Signal “similarity” is a major factor - Spatial separation is particularly beneficial 2) Multitalker Listening is a Dynamic Process - Listeners adapt to source location changes over 5-8 trials - Listeners learn new situations quickly (10 trials) - Listeners adopt optimal listening strategies