Precedence-based speech segregation in a virtual auditory environment

Precedence-based speech segregation in a virtual auditory environment
Brungart, Simpson & Freyman (2005)

The Precedence Effect
- Sounds produced in spaces with multiple surfaces give rise to reflections, so many copies of a sound reach a listener's ears; the direct sound arrives first.
- With complex sounds like speech, early reflections tend to perceptually "fuse" with the direct sound (the "Haas" effect).
- The direct sound dominates localisation – the precedence effect.
- Effect of the delay D between the two copies (diagram on the original slide):
  - |D| < 0.5 ms: "summing localisation" – the perceived direction lies between the two sources.
  - D > 1 ms: "precedence effect" – localisation is dominated by the leading sound.
  - D > 20 ms: the "echo threshold" is exceeded and two separate sources are perceived.
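The lead–lag stimuli described above can be sketched in a few lines of numpy. This is an illustrative sketch, not the authors' stimulus code; the sample rate, click stimulus, and 4 ms delay are assumed values chosen to fall in the precedence range.

```python
import numpy as np

def add_reflection(x, fs, delay_ms, gain=1.0):
    """Return x plus a copy delayed by delay_ms milliseconds (one 'reflection')."""
    d = int(round(fs * delay_ms / 1000.0))   # delay in samples
    y = np.zeros(len(x) + d)
    y[:len(x)] += x        # direct sound
    y[d:] += gain * x      # delayed copy (the simulated reflection)
    return y

fs = 44100
click = np.zeros(100)
click[0] = 1.0
y = add_reflection(click, fs, delay_ms=4.0)  # 4 ms lag: precedence range
print(np.nonzero(y)[0])  # direct sound at sample 0, "echo" at sample 176
```

With a 0.3 ms delay instead, the two copies would fall in the summing-localisation range; beyond roughly 20 ms the copy would be heard as a separate echo.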

Masking
- "…the amount of interference one stimulus can cause in the perception of another stimulus." (Yost and Nielsen, 1977) – the elevation in threshold of a target signal due to the presence of a masker.
- Energetic masking: "…masking that results from competition between target and masker at the periphery of the auditory system, i.e., overlapping excitation patterns in the cochlea or auditory nerve (AN)." (Durlach et al., 2003)
- Informational masking (non-energetic, "central" masking): "difficulty segregating the audible acoustic components of the target speech signal from the audible acoustic components of a perceptually similar speech masker." (p. 3241)

Some Assumptions
- Speech target + random noise masker = purely energetic masking?
- Speech target + speech masker = energetic and informational masking?
- So if an experimental manipulation affects the amount of masking produced by the speech masker but not by the noise masker, the difference is attributed to a reduction in informational masking. Seems reasonable.

The Basic Experiment
- Freyman et al. (1999): free field. Brungart et al.: virtual auditory space over headphones.
- Conditions (first letter: target position, always front; remaining letters: masker position(s)):
  - F-F: masker also from the front – baseline masking.
  - F-R: masker from the right – release from masking regardless of masker type.
  - F-RF: masker from the right plus a delayed copy from the front – release from masking with a speech masker but NOT with a noise masker.

Experiment 1
- Adding a delayed copy of the noise masker to the front drops performance back to baseline.
- Adding a delayed copy of the speech masker to the front makes hardly any difference.
- Note: the speech recognition task used is resistant to energetic masking – therefore a large informational masking component?

Interpretation
- The precedence effect causes the listener to localise the RF masker off to the right, which helps auditory selective attention focus on the target speech, hence reducing informational masking.
- This doesn't affect the noise masker because it produces no informational masking – adding a copy to the front just increases its energetic masking effect.
- BUT the effect is also observed when the delay is negative, so that the first copy of the masker comes from the front (i.e. F-FR) (Freyman et al., 1999).
- Precedence should localise the masker to the front in this condition – so why the release from masking with a speech masker?

Experiment 2
- Question: what is the effect of varying the delay between the two masker presentations over +/- 64 ms? (The original slide plots performance against delay for the F-RF and F-FR conditions, with SNRs from -8 dB to 0 dB and the baseline marked.)
- Noise masker: very little effect. Some release from masking at delays that put "notches" in the spectrum of the masker far enough apart to be resolved by the ear.
- Single-speaker speech masker: little effect of delay, positive or negative, until the "echo threshold" is exceeded.
- Two-speaker speech masker: much more variation, but still substantial release from masking – possibly some release from energetic masking effects. Note that as speakers are added, multi-speaker babble approaches speech-shaped noise.
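The "notches" mentioned for the noise masker can be made concrete: adding a copy of a signal delayed by D seconds acts as a comb filter with magnitude response 2|cos(pi*f*D)|, whose nulls fall at odd multiples of 1/(2D). A small sketch of that arithmetic (an illustration of the comb-filtering idea, not code from the paper):

```python
import numpy as np

def comb_notch_frequencies(delay_s, f_max):
    """Notch frequencies (Hz) of x(t) + x(t - delay_s), up to f_max.

    |1 + exp(-j*2*pi*f*D)| = 2*|cos(pi*f*D)|, so zeros lie at
    f = (2k + 1) / (2*D) for k = 0, 1, 2, ...
    """
    k = np.arange(int(f_max * delay_s))
    f = (2 * k + 1) / (2 * delay_s)
    return f[f <= f_max]

print(comb_notch_frequencies(0.004, 1000))  # 4 ms delay: 125, 375, 625, 875 Hz
```

Longer delays pack the notches closer together (a 64 ms delay spaces them only ~15.6 Hz apart), which is why only certain delays produce notches far enough apart for the ear to resolve.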

A Puzzle
- There is virtually no difference between positive and negative delays with the single-speaker masker, and not much of an advantage with the two-speaker masker. What is going on here?
- Two possibilities (actually 3, but more on that later):
  1) The effect is not based on perceived location, but on timbre or "source width".
  2) Even when the copy of the masker added at the front leads the one from the right, the copy at the right "pulls" the perceived location off a little, so the masker is heard somewhere between front and right.
- If (2) is the case, then shifting the apparent location of the target to match that of the masker should abolish the release from masking.

Experiment 3
- Position of the target varied from 0° to 60° in 5° steps, at 7 different delay values from +4 to -4 ms.
- U-shaped performance curves for all 3 maskers at D = 0 ms: the masker is heard midway between front and right.
- For the two-speaker masker, when there is a lag (+ve D) > 0.5 ms, subjects do best when the target is located near the front (0°), as expected.
- When there is a lead (-ve D) > 0.5 ms, subjects do best when the target is located to the right. BUT the minimum performance is found around 10°, NOT at 0°.

Conclusions
- This would appear to support hypothesis (2) above. BUT why is there not a similar minimum around 50° when the delay is positive?
- Also, energetic and informational masking do not seem to have been as completely separated by this paradigm as was first thought.
- AND no mention is made of the binaural masking level difference (BMLD):
  - Whenever the phase or level differences of the target signal at the two ears differ from those of the masker, the ability to detect or identify the target improves.
  - Inverting the signal at one ear gives better performance than delaying it – so this is not just segregation by spatial separation.
  - Large BMLDs occur even when target and masker are not subjectively well separated.
  - Hearing is sensitive to the profile of interaural decorrelation across frequency.
- The BMLD could potentially explain why negative delays are as useful as positive delays: adding a delayed copy of the masker at the right changes the interaural correlation of the masker relative to the target.
- But this still wouldn't explain the difference between speech and noise…
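The interaural-correlation point can be sketched numerically. This is an assumed illustration (not from the paper): a diotic masker is perfectly correlated across the ears, and adding a delayed copy at one ear reduces that correlation while a diotic target stays at correlation 1.

```python
import numpy as np

def interaural_correlation(left, right):
    """Normalized zero-lag correlation between the two ear signals."""
    l = left - left.mean()
    r = right - right.mean()
    return float(np.dot(l, r) / np.sqrt(np.dot(l, l) * np.dot(r, r)))

rng = np.random.default_rng(0)
masker = rng.standard_normal(44100)          # 1 s of noise at 44.1 kHz

# Diotic masker (identical at both ears): correlation is 1.
print(interaural_correlation(masker, masker))

# Add a copy delayed by 176 samples (~4 ms) to the right ear only:
# the masker decorrelates (for white noise, to roughly 1/sqrt(2)),
# while a diotic target keeps its interaural correlation of 1.
right = masker.copy()
right[176:] += masker[:-176]
print(interaural_correlation(masker, right))
```

A mismatch of this kind between the target's and the masker's interaural statistics is the kind of cue that BMLD accounts exploit, regardless of whether the delayed copy leads or lags.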