Published by Easter Goodwin. Modified over 9 years ago.
1
Source Localization in Complex Listening Situations: Selection of Binaural Cues Based on Interaural Coherence
Christof Faller, Mobile Terminals Division, Agere Systems
Juha Merimaa, Institut für Kommunikationsakustik, Ruhr-Universität Bochum
2
Complex listening situations
Speech source ("Blaah, blaah, blaah") at -15º, good music (jazz) at 50º, and noise (hum) through an open door at -125º azimuth
3
This work
A model to extract binaural cues corresponding to human localization performance in several complex listening situations
4
Outline
1. Model description
2. Simulation results
   A) Independent sources in free-field
   B) Precedence effect
   C) Independent sources and reverberation
3. Comparison with earlier models
4. Summary
5
Model structure (block diagram)
Stimuli 1 to N, each filtered with its own HRTFs/BRIRs, form the left and right ear inputs
Processing blocks: gammatone filterbank, model of neural transduction (Bernstein et al. 1999), internal noise, normalized cross-correlation & level difference calculation
Exponential time window: 10 ms
6
Extraction of binaural cues
Estimated at each time instant:
– Interaural time difference (ITD): time lag of the maximum of the normalized cross-correlation
– Interaural level difference (ILD): ratio of signal energies within the time window
– Interaural coherence (IC): maximum of the normalized cross-correlation
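The three cue definitions above can be sketched for a single critical band as follows. This is a minimal illustration and not the authors' implementation: it assumes the band signals have already passed the peripheral stages, uses a first-order recursive average as the 10 ms exponential window, and picks its own sign conventions (positive ITD when the left ear leads, ILD as right-over-left energy in dB). The names `binaural_cues`, `tau`, and `max_itd` are hypothetical.

```python
import numpy as np

def _delay(x, d):
    """Delay x by d >= 0 samples, zero-padding the start."""
    d = int(d)
    return x.copy() if d == 0 else np.concatenate([np.zeros(d), x[:-d]])

def binaural_cues(left, right, fs, tau=0.010, max_itd=0.001):
    """Running ITD (s), ILD (dB) and IC for one critical band.

    Assumed conventions: lag m > 0 means the left signal leads;
    ILD is 10*log10(right energy / left energy).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    max_lag = int(round(max_itd * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    alpha = 1.0 / (tau * fs)  # forgetting factor of the exponential window
    eps = 1e-12               # guards against division by zero in silence

    # One row per candidate lag: delay one ear so that the pair is aligned.
    x1 = np.stack([_delay(left, max(int(m), 0)) for m in lags])
    x2 = np.stack([_delay(right, max(-int(m), 0)) for m in lags])

    n = left.size
    a12 = np.zeros(lags.size)
    a11 = np.zeros(lags.size)
    a22 = np.zeros(lags.size)
    itd, ild, ic = np.zeros(n), np.zeros(n), np.zeros(n)
    for t in range(n):
        # Exponentially windowed cross- and auto-correlation estimates.
        a12 = alpha * x1[:, t] * x2[:, t] + (1.0 - alpha) * a12
        a11 = alpha * x1[:, t] ** 2 + (1.0 - alpha) * a11
        a22 = alpha * x2[:, t] ** 2 + (1.0 - alpha) * a22
        gamma = a12 / np.sqrt(a11 * a22 + eps)  # normalized cross-correlation
        k = int(np.argmax(gamma))
        ic[t] = gamma[k]                         # IC: maximum over lags
        itd[t] = lags[k] / fs                    # ITD: lag of that maximum
        ild[t] = 10.0 * np.log10((a22[k] + eps) / (a11[k] + eps))
    return itd, ild, ic
```

For a single noise source whose right-ear signal is a delayed and attenuated copy of the left-ear signal, the estimates settle at the imposed delay and level difference, with IC close to 1.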
7
Assumption for correct localization
The auditory system needs to acquire ITD and ILD cues similar to those evoked by each source separately in an anechoic environment
8
Example: Two active sound sources
Superposition with different level and phase relations at left and right ears
For independent or non-stationary source signals:
– Time-varying binaural cues
– Reduced IC
9
How to obtain correct localization cues?
Simply select ITDs and ILDs only when IC is above a set threshold
– An adaptive threshold is assumed
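The selection rule can be sketched as a simple mask over the running cue estimates. This is an illustration only: the threshold is written `c0` and the selected power fraction `p0` to match the notation on the result slides, and treating `p0` as the share of total signal power at the selected time instants is an assumption of this sketch. The function name `select_cues` and the `power` argument are hypothetical.

```python
import numpy as np

def select_cues(itd, ild, ic, c0, power=None):
    """Keep ITD/ILD samples only where IC exceeds the threshold c0.

    If a per-sample power array is given, also return the fraction of
    total power at the selected instants (assumed meaning of p0).
    """
    itd = np.asarray(itd, dtype=float)
    ild = np.asarray(ild, dtype=float)
    ic = np.asarray(ic, dtype=float)
    mask = ic > c0
    p0 = None
    if power is not None:
        power = np.asarray(power, dtype=float)
        p0 = float(power[mask].sum() / power.sum())
    return itd[mask], ild[mask], p0
```

A fixed `c0` is used here for simplicity; the slide assumes an adaptive threshold, which this sketch does not model.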
10
Simulation results
11
1. Effect of number of sources
Speech sources at same overall level (Hawley et al. 1999; Drullman & Bronkhorst 2000)
– One or two distracters have little effect on localization performance
– Performance is still good for 5 competing sources
Simulations with different phonetically balanced sentences recorded by the same male speaker
12
Two talkers, ±40º azimuth: 65 % and 58 % selected signal power
13
3 and 5 talkers
Simulated at 500 Hz critical band
– 3 talkers: 0º and ±40º azimuth
– 5 talkers: 0º, ±40º, and ±80º azimuth
14
Figure panels: All cues, Selected cues
– 3 talkers: c0 = 0.99, p0 = 54 %
– 5 talkers: c0 = 0.99, p0 = 22 %
15
2. Effect of target-to-distracter ratio
Click-train target in presence of a white noise distracter
– Target is localizable down to a few dB above detection threshold (Good & Gilkey 1996; Good et al. 1997)
– High frequencies are more important for localization (Lorenzi et al. 1999)
16
Simulation
2 kHz critical band
White noise at 0º azimuth
100 Hz click train at 30º azimuth
-3, -9, and -21 dB absolute target-to-distracter ratios (T/D)
– Corresponds to 8, 2, and -10 dB T/D relative to detection threshold, as defined by Good & Gilkey (1996)
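The relation between the absolute and relative T/D values can be checked with a few lines of arithmetic. The sketch assumes that "relative" simply means the absolute T/D minus the detection threshold in dB; the threshold itself is not stated on the slide and is only inferred here from the two lists of numbers.

```python
# Absolute T/D values and the matching relative values, both from the slide.
absolute_td_db = [-3.0, -9.0, -21.0]
relative_td_db = [8.0, 2.0, -10.0]

# Assumed relation: relative = absolute - detection threshold,
# so the implied threshold is absolute - relative.
implied_threshold_db = [a - r for a, r in zip(absolute_td_db, relative_td_db)]

# The same absolute ratios expressed as linear target/distracter power ratios.
power_ratio = [10.0 ** (a / 10.0) for a in absolute_td_db]

print(implied_threshold_db)
print(power_ratio)
```

All three pairs imply the same detection threshold of -11 dB absolute T/D, so the two lists are mutually consistent under this reading.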
17
Figure panels: All cues, Selected cues
– -3 dB T/D: c0 = 0.990, p0 = 3 %
– -9 dB T/D: c0 = 0.992, p0 = 9 %
– -21 dB T/D: c0 = 0.992, p0 = 99 %
18
Precedence effect
Perception of subsequent sound events
– Fusion
– Localization dominance by the first event
– Suppression of directional discrimination of later events
Depends on interstimulus delay
– Summing localization (approx. 0-1 ms)
– Localization dominance by first event (stimulus dependent, until 2-50 ms)
– Independent localization
19
1. Click pairs
Classical precedence effect experiment: Two consecutive clicks with same level from different directions
20
Lead: 40º, lag: -40º, ICI: 5 ms
21
Click pairs as a function of inter-click interval (ICI)
Simulations for ICI between 0 and 20 ms
Same click sources: ±40º azimuth
500 Hz critical band
A single threshold did not predict all cases correctly
– Threshold was determined for each ICI such that the standard deviation of ITD is 15 μs
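One way to fix a threshold from a target ITD spread is sketched below. The slide does not say how the search was carried out, so the grid search for the lowest threshold that brings the standard deviation of the selected ITDs down to 15 μs is an assumption of this sketch, as are the function name `threshold_for_itd_std` and the parameters `step` and `min_count`.

```python
import numpy as np

def threshold_for_itd_std(itd, ic, target_std=15e-6, step=0.001, min_count=2):
    """Lowest IC threshold on a grid for which std(selected ITDs) <= target.

    Returns None if too few samples remain before the target is met.
    """
    itd = np.asarray(itd, dtype=float)
    ic = np.asarray(ic, dtype=float)
    for c0 in np.arange(0.0, 1.0, step):
        selected = itd[ic > c0]
        if selected.size < min_count:
            return None  # not enough cues left to estimate a spread
        if np.std(selected) <= target_std:
            return float(c0)
    return None
```

Called once per ICI on the cues simulated for that interval, this would give one threshold per condition, which is the per-ICI procedure the slide describes.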
22
Click pairs as a function of ICI
24
Note on cross-frequency processing
At certain small ICIs the required IC threshold gets very high
– Anomalies of precedence effect have been reported for bandpass filtered clicks (Blauert & Cobben 1978)
Some characteristic power peaks occur at different ICIs at different critical bands
Processing across frequency bands would allow extraction of correct cues
25
2. Sinusoidal tones and a reflection
Steady state cues are a result of coherent summation of sound at the ears of a listener
Localization depends on onset rate (Rakerd & Hartmann 1986)
– Correct localization with a fast onset
– Localization based on misleading steady state cues for tones with a slow onset
26
Sinusoidal tones: Simulation
500 Hz sinusoidal tone
Direct sound from 0º azimuth
Reflection after 1.4 ms from 30º
Linear onset ramp
Steady state level of 65 dB SPL
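The stimulus and the coherent summation in the steady state can be sketched in a single channel. This illustration leaves out everything spatial (no HRTFs, so no ITD or ILD), and the sampling rate, ramp duration, tone duration, and reflection gain are assumed values that do not come from the slide. Only the 500 Hz frequency and the 1.4 ms delay are taken from it. The name `tone_with_reflection` is hypothetical.

```python
import numpy as np

def tone_with_reflection(fs=50000, f0=500.0, delay_s=0.0014,
                         ramp_s=0.005, dur_s=0.1, refl_gain=1.0):
    """Return (direct, reflection) for a tone with a linear onset ramp.

    The reflection is a delayed, scaled copy of the direct sound.
    fs, ramp_s, dur_s and refl_gain are assumptions of this sketch.
    """
    t = np.arange(int(round(dur_s * fs))) / fs
    envelope = np.clip(t / ramp_s, 0.0, 1.0)   # linear onset ramp
    direct = envelope * np.sin(2.0 * np.pi * f0 * t)
    d = int(round(delay_s * fs))               # reflection delay in samples
    reflection = refl_gain * np.concatenate([np.zeros(d), direct[:-d]])
    return direct, reflection

direct, reflection = tone_with_reflection()
total = direct + reflection

# Steady-state amplitude of the sum, measured and predicted from the
# phase shift 2*pi*f0*delay between the two components.
measured = np.max(np.abs(total[-1000:]))
predicted = abs(1.0 + np.exp(-2j * np.pi * 500.0 * 0.0014))
```

With equal amplitudes the two components add to a single sinusoid whose amplitude and phase depend only on the delay, which is why the steady state by itself carries no trace of the direct sound's direction.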
28
Sinusoidal tones: Results
The model cannot as such explain discounting of the steady state cues
Dependence on onset rate can be explained by considering cues at the time when signal level gets high enough above internal noise
29
Independent sources and reverberation
Final test for the model
Simulation at 2 kHz critical band
– One speech source at 30º azimuth
– Two speech sources at ±30º azimuth
BRIRs measured in a hall with RT = 1.4 s at 2 kHz octave band
30
Figure panels: All cues, Selected cues
– 1 talker: c0 = 0.99, p0 = 1 %
– 2 talkers: c0 = 0.99, p0 = 1 %
31
Comparison with earlier models
32
Weighting of localization cues with signal power
Not done outside 10 ms analysis window
Contribution of each time instant to localization is defined by IC
Model can neglect information corresponding to high power when due to concurrent activity of several sources
Power still affects how often ITDs and ILDs of individual sources are sampled
33
Lindemann (1986)
Based on contralateral inhibition using a fixed (10 ms) time constant
Tends to hold cross-correlation peaks with high IC
Differences
– Operation of the cue selection method is not limited to the 10 ms time window
– When necessary (complex situations), the “memory” of past cues can last longer
34
Zurek (1987)
Localization inhibition controlled by onset detection
In precedence effect conditions, the cue selection naturally derives most localization cues from onsets
Differences
– Cue selection is not limited to getting information from signal onsets
35
Summary
A method was proposed for modeling auditory localization in the presence of concurrent sound
ITD and ILD cues are selected only when they coincide with a large IC
Operation of the model was verified with results of several psychoacoustical studies from the literature
36
Thank you!