Published by Russell Wells. Modified over 9 years ago.
1
Modelling compensation for reverberation: work done and planned
Amy Beeston and Guy J. Brown
Department of Computer Science, University of Sheffield
EPSRC 12-month meeting, Sheffield: 23rd Oct 2009
2
1 of 36 Overview
1. Work done
- sir-stir framework
- 3 across-channel model configurations
2. Work planned
- Across-channel model
- Within-channel model
- Further questions
3
2 of 36 Part 1: Work Done
- sir-stir framework
- 3 across-channel model configurations
4
3 of 36 Watkins’ sir/stir paradigm
[Figure: category boundary (step) for human listeners against context distance (m). Annotations: effect of reverberation; effect of compensation; more ‘sir’ responses; more ‘stir’ responses.]
5
4 of 36 Efferent auditory processing
- Reverberating a speech signal reduces its dynamic range
- Reflections fill gaps in the temporal envelope
- Efferent system helps control dynamic range (Guinan & Gifford, 1988)
- Could compensation be characterised as restoration of dynamic range?
[Figure: dry signal, small mean, mean/peak = 0.1216; reverberated signal, larger mean, mean/peak = 0.2142.]
6
5 of 36 Mean-to-peak ratio (MPR)
- measured over some time-window
- peak does not vary greatly with source-receiver distance
- mean increases with source-receiver distance
- MPR = mean/peak, therefore MPR increases with distance
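The definition MPR = mean/peak can be illustrated with a minimal sketch. The function name and the toy envelope values are invented for illustration and are not taken from the model; the only point shown is that filling the gaps in an envelope raises the mean while leaving the peak unchanged, so the ratio rises.

```python
def mean_to_peak_ratio(window):
    """Return mean/peak of a non-negative sequence, or None if undefined."""
    if not window:
        return None  # no samples in the window
    peak = max(window)
    if peak <= 0:
        return None  # ratio is undefined when the peak is zero
    return (sum(window) / len(window)) / peak


# Toy envelopes (invented values). The gaps in the 'dry' envelope are partly
# filled in the 'reverberated' one, and the peak is the same in both.
dry = [0.0, 1.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0]
reverberated = [0.0, 1.0, 0.5, 0.25, 0.8, 0.4, 0.2, 0.1]

mpr_dry = mean_to_peak_ratio(dry)
mpr_reverberated = mean_to_peak_ratio(reverberated)
```

Returning None for an empty or all-zero window is a design choice of this sketch; the slides do not say how the model treats that case.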
7
6 of 36 Modelling framework
8
7 of 36 Stimuli
Watkins, JASA 2005, experiment 5
- forward/reversed speech carrier
- forward/reversed reverberation
Watkins, JASA 2005, experiment 4
- reverberate, then flip polarities: noise after
- flip polarities, then reverberate: noise before
[Figure labels: speech carrier (fwd, rev); reverberation (fwd, rev); noise (before, after).]
9
8 of 36 Auditory Periphery
Outer/middle ear
- Simulates human data from Huber et al. (2001)
Basilar membrane
- Dual resonance nonlinear (DRNL) filterbank
- Originally proposed by Meddis, O’Mard and Lopez-Poveda (2001)
- Human parameters from Meddis (2006)
- Efferent attenuation introduced by Ferry and Meddis (2007)
Hair cell
- Linear output between threshold and saturated firing rate (Messing 2007)
- Does not model adaptation in the auditory nerve
10
9 of 36 Spectro-temporal excitation patterns (STEP)
[Figure: auditory nerve STEP, best frequency (Hz, log-spaced, 100 to 8000) against time, for the context “… ok, next you’ll get to click on …” followed by the test word {sir, stir}.]
11
10 of 36 Efferent attenuation based on dynamic range
- Idea: control the amount of efferent attenuation applied in the model according to the dynamic range of the context
- Dynamic range is measured by the mean-to-peak ratio in the AN response
[Figure labels: kurtosis; negative differentials; offsets; mean-to-peak ratio.]
12
11 of 36 Auditory Nerve response
Across-channel model
- the auditory nerve response is summed across all frequency channels prior to implementation of the efferent system component
Within-channel model
- the auditory nerve response is NOT summed across all frequency channels
13
12 of 36 Across Channel
- the auditory nerve response is summed across all frequency channels prior to implementation of the efferent system component
[Diagram: frequency channels over time, summed (Σ) into a single MPR and a single ATT.]
14
13 of 36 Within Channel
- Auditory nerve response in each frequency channel influences the efferent system component
[Diagram: frequency channels over time, each with its own MPR and ATT.]
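The difference between the two configurations can be sketched as follows. This is an illustration only: the auditory nerve response is assumed to be available as a list of channels of equal length, and the function names and toy values are hypothetical.

```python
def mpr(values):
    """Mean-to-peak ratio of a non-negative sequence, or None if undefined."""
    if not values:
        return None
    peak = max(values)
    return (sum(values) / len(values)) / peak if peak > 0 else None


def across_channel_mpr(an_response):
    """Sum the channels sample by sample, then take one MPR of the pooled response."""
    summed = [sum(samples) for samples in zip(*an_response)]
    return mpr(summed)


def within_channel_mpr(an_response):
    """Take one MPR per frequency channel, with no pooling."""
    return [mpr(channel) for channel in an_response]


# Toy response with two channels and four time samples (invented values).
an = [
    [0.0, 4.0, 0.0, 0.0],  # channel with one sharp peak
    [2.0, 2.0, 2.0, 2.0],  # channel with constant activity
]
pooled = across_channel_mpr(an)
per_channel = within_channel_mpr(an)
```

In the across-channel case a single value drives a single attenuation; in the within-channel case each channel yields its own value, which could then be mapped to a channel-specific attenuation.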
15
14 of 36 Efferent attenuation
- Linear map from MPR (of summed AN) to efferent attenuation, ATT
- ATT turns down the gain on the non-linear pathway of DRNL
- The rate-intensity curve shifts to the right
[Diagram: MPR mapped to ATT.]
16
15 of 36 Recognition helps to recover the dip in the temporal envelope corresponding to the ‘t’ closure in ‘stir’
Templates: sir, stir
17
16 of 36 3 Model configurations
Open loop
Semi-closed loop
- amount of attenuation is estimated during one second preceding the test-word, and held constant thereafter
Closed loop
- amount of attenuation is estimated continually in a sliding time window, and updated on a sample-by-sample basis (or at a specified control rate)
18
17 of 36 Efferent system: Open loop
- Many simulations were run: the amount of attenuation applied was varied across a range of values (0-30 dB), and the resulting category boundaries were recorded in calibration charts.
- The ‘best match’ to human results was found (manually) for each condition.
- Near contexts match best with low attenuation values, while far contexts match best with higher attenuation values.
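The tuning procedure above was carried out manually; the sketch below only illustrates the same idea as a lookup. The grid of 0 to 30 dB in 0.5 dB steps follows the calibration curves in the results, but `toy_model_boundary` is a hypothetical stand-in for running the full model, and its numbers are invented.

```python
def attenuation_grid(start=0.0, stop=30.0, step=0.5):
    """Attenuation values (dB) at which the model is run."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def best_match(calibration, human_boundary):
    """Return the attenuation whose model boundary is closest to the human one.

    calibration maps attenuation (dB) to model category boundary (step).
    """
    return min(calibration, key=lambda att: abs(calibration[att] - human_boundary))


def toy_model_boundary(att_db):
    """Hypothetical stand-in for one model run (invented linear relation)."""
    return 10.0 - 0.25 * att_db


calibration = {att: toy_model_boundary(att) for att in attenuation_grid()}
best_att = best_match(calibration, human_boundary=7.0)
```

With a real calibration chart, one lookup per condition would give the best-matching attenuation for that condition.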
19
18 of 36 Results: Open loop
[Figure: category boundary results for near and far contexts against attenuation applied (dB); values shown: 12.5, 9.0, 22.0, 21.5. Calibration curves for tuning at attenuations of 0, 0.5, … 29.5, 30.]
20
19 of 36 Efferent system: Semi-closed loop
- amount of attenuation is estimated during one second preceding the test-word, and held constant thereafter
21
20 of 36 Semi-closed loop
- Examine context within time window to derive a metric value
- Use metric value to determine the efferent attenuation
[Diagram: context “…………… ok, next you’ll get to click on ……………” followed by the test word {sir, stir}; ATT derived from the context window.]
22
21 of 36 Metric: Semi-closed loop MPR
MPR in experiment 5 (JASA 2005)
- across-channel AN response
- measured over 1 s window before test-word
- increases with context distance
- small difference for reversed-carrier (squares/circles)
- larger difference (decrease) for reversed-reverb (black/white)
23
22 of 36 Results: Semi-closed loop
- Tuned to match near-near and far-far (fwd fwd) conditions
- Experiment 5 achieves qualitative (not quantitative) match to human data…
- …but experiment 4 conditions do not match well
ATTENUATION = (38.36 * MPR) + 13.77
[Figure labels: speech carrier (fwd, rev); reverberation (forward, reverse); noise (before, after).]
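A minimal sketch of the semi-closed loop estimate, using the linear map quoted on this slide. The slope and intercept come from the slide; everything else (function names, the list representation of the summed AN response, the toy sample rate and values) is assumed for illustration.

```python
def mpr(values):
    """Mean-to-peak ratio of a non-negative sequence, or None if undefined."""
    if not values:
        return None
    peak = max(values)
    return (sum(values) / len(values)) / peak if peak > 0 else None


def semi_closed_loop_attenuation(summed_an, test_word_onset, sample_rate,
                                 slope=38.36, intercept=13.77, window_s=1.0):
    """Estimate the attenuation once from the window preceding the test word.

    The returned value is meant to be held constant for the rest of the trial.
    test_word_onset is a sample index into summed_an.
    """
    window_n = int(round(window_s * sample_rate))
    start = max(0, test_word_onset - window_n)
    ratio = mpr(summed_an[start:test_word_onset])
    if ratio is None:
        return None  # empty or silent context: left undefined in this sketch
    return slope * ratio + intercept


# Toy signal at 4 samples per second (invented values); the test word starts at index 6.
signal = [5.0, 5.0, 0.0, 2.0, 0.0, 2.0, 9.0, 9.0]
att = semi_closed_loop_attenuation(signal, test_word_onset=6, sample_rate=4)
```

Here the one-second window covers the four samples before the test word, giving MPR = 0.5 and an attenuation of 38.36 * 0.5 + 13.77.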
24
23 of 36 Efferent system: Closed loop
- amount of attenuation is estimated continually in a sliding time window, and updated on a sample-by-sample basis (or at a specified control rate)
25
24 of 36 Closed loop
- Examine context within time window to derive a metric value
- Use metric value to determine the efferent attenuation applied
- Window slides forward, process repeats…
[Diagram: context “…………… ok, next you’ll get to click on ……………” followed by the test word {sir, stir}; ATT derived from the sliding window.]
26
25 of 36 Metric: Closed loop MPR
MPR in experiment 5 (JASA 2005)
- measured continually over window, shifting forwards from start of context
- Insignificant change with reversal of speech carrier (left/right)
- Significant change with reversal of reverberation (top/bottom)
[Figure: MPR through time in four panels; speech carrier (fwd, rev) by reverberation (forward, reverse).]
27
26 of 36 Closed loop (expt 5)
- tuned to ‘best’ match near-near and far-far (fwd fwd) conditions
- variation possible due to granularity of model (± 0.5)
ATTENUATION = (45 * MPR) + 18
ATTENUATION = (45 * MPR) + 19
[Figure: one panel per mapping; speech carrier (fwd, rev) by reverberation (fwd, rev).]
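The sliding-window update can be sketched as below, using the first of the two mappings quoted on this slide. This is a simplification under stated assumptions: the summed AN response is taken as given, so the sketch does not model how the applied attenuation would in turn change that response through the DRNL; the function names, toy rates and values are invented.

```python
def mpr(values):
    """Mean-to-peak ratio of a non-negative sequence, or None if undefined."""
    if not values:
        return None
    peak = max(values)
    return (sum(values) / len(values)) / peak if peak > 0 else None


def closed_loop_attenuation(summed_an, sample_rate, control_rate,
                            slope=45.0, intercept=18.0, window_s=1.0):
    """Return one attenuation value per sample, re-estimated at the control rate.

    At each update the metric is taken over the window of samples preceding
    the current one; between updates the last value is held.
    """
    window_n = int(round(window_s * sample_rate))
    hop = max(1, int(round(sample_rate / control_rate)))
    trace = []
    current = None  # no estimate until some context has been heard
    for i in range(len(summed_an)):
        if i % hop == 0:
            ratio = mpr(summed_an[max(0, i - window_n):i])
            if ratio is not None:
                current = slope * ratio + intercept
        trace.append(current)
    return trace


# Toy signal at 4 samples per second, updated twice per second (invented values).
signal = [0.0, 2.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0]
trace = closed_loop_attenuation(signal, sample_rate=4, control_rate=2)
```

Setting `control_rate` equal to `sample_rate` gives the sample-by-sample update; a lower control rate trades temporal resolution for speed.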
28
27 of 36 Closed loop (expt 4)
- MPR mapping does not generalise for experiment 4 noise contexts
ATTENUATION = (45 * MPR) + 18
[Figure labels: speech carrier (fwd, rev); reverberation (fwd, rev); noise (before, after).]
29
28 of 36 Part 2: Work Planned
- Across-channel model
- Within-channel model
- Further questions
30
29 of 36 Across-channel model
Practical considerations
- Control rate specified to speed up the simulation (usually 1 kHz, i.e. the attenuation parameter is updated every 1 ms)
- Time-window over which to determine metric (usually previous 1 second, different values under investigation at present)
- Shape of window (rectangular at present, should have a ‘forgetting function’)
Question to Tony et al.
- What data can we use to determine the shape and duration of the window?
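One possible form of a ‘forgetting function’ is sketched below purely as an illustration. The exponential shape, the time constant and the choice to weight only the mean (leaving the peak unweighted) are all assumptions of this sketch; the slide leaves the window shape as an open question.

```python
import math


def exponential_weights(n, tau_samples):
    """Weights for n samples; the most recent sample (last index) has weight 1."""
    return [math.exp(-(n - 1 - k) / tau_samples) for k in range(n)]


def weighted_mpr(window, tau_samples):
    """Mean-to-peak ratio with an exponentially weighted mean.

    Older samples contribute less to the mean. The peak is taken unweighted,
    which is one choice among several and is not specified by the source.
    """
    if not window:
        return None
    peak = max(window)
    if peak <= 0:
        return None
    weights = exponential_weights(len(window), tau_samples)
    weighted_mean = sum(w * x for w, x in zip(weights, window)) / sum(weights)
    return weighted_mean / peak


# Same energy, placed early or late in the window (invented values).
early = [4.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 4.0]
mpr_early = weighted_mpr(early, tau_samples=2.0)
mpr_late = weighted_mpr(late, tau_samples=2.0)
```

A rectangular window gives both toy signals the same ratio (0.25); with the weighting, recent energy counts for more than old energy.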
31
30 of 36 Across-channel model
[Diagram: frequency channels over time, summed (Σ) into MPR and ATT; weight against time: window shape/duration?]
32
31 of 36 Within-channel model
- Previously we asked what duration and shape the metric-window has in time.
- Now we ask what duration and shape the metric-window has in frequency.
- /t/ is defined by a sharp onset burst at 2 to 8 kHz (Régnier & Allen, 2008)
- template matching over restricted areas of the frequency domain
33
32 of 36 Within-channel model
Frequency-dependent suppression:
- Feedback from efferent system appears to be fairly narrowly tuned
- fall-off in the effect of efferent-induced threshold shift at low BFs [data from cat, Guinan & Gifford (1988)]
- improves representation of low-frequency speech structure when efferent attenuation is high
Modelling implications:
- Need no longer be a pooled auditory nerve (STEP) response for metric/map to attenuation
- Each channel can react quasi-independently to the audio context it hears
34
33 of 36 Within-channel model
[Diagram: frequency channels over time, each with its own MPR and ATT; weight against time: window shapes/durations?]
35
34 of 36 Questions
Is there a time-analogy to the frequency gaps in 8-band stimuli?
- imposing gaps so that bits are missing from the freq/time pattern in the context window.
- might allow an importance weighting for time-bands like for the frequency bands.
36
35 of 36 Implication?
What happens with a silent context?
- Physiology predicts that efferent system is not activated
- Model predicts small dynamic range: maximum mean/peak ratio, high efferent attenuation, low category boundary (more stirs)
Specifically, if (when) context is shorter than metric window:
- should we shorten the metric window?
- zero pad the utterances?
- count previous trial as context?
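The three options for a short context can be compared with a small sketch. The function and strategy names are hypothetical and the values are invented; the sketch only shows that the choice changes the metric. It also illustrates the silent-context point: a response that stays at a constant non-zero level has mean equal to peak, so its mean/peak ratio is 1, the maximum.

```python
def mpr(values):
    """Mean-to-peak ratio of a non-negative sequence, or None if undefined."""
    if not values:
        return None
    peak = max(values)
    return (sum(values) / len(values)) / peak if peak > 0 else None


def context_for_metric(context, window_n, strategy, previous_trial=()):
    """Return the samples the metric is computed over when the context is short."""
    if len(context) >= window_n:
        return list(context[-window_n:])
    missing = window_n - len(context)
    if strategy == "shorten":
        return list(context)  # use only what is available
    if strategy == "zero_pad":
        return [0.0] * missing + list(context)
    if strategy == "previous_trial":
        return list(previous_trial[-missing:]) + list(context)
    raise ValueError("unknown strategy: " + strategy)


# A constant two-sample context with a four-sample metric window (invented values).
context = [2.0, 2.0]
previous = [1.0, 3.0, 1.0, 3.0]

mpr_shorten = mpr(context_for_metric(context, 4, "shorten"))
mpr_zero_pad = mpr(context_for_metric(context, 4, "zero_pad"))
mpr_previous = mpr(context_for_metric(context, 4, "previous_trial", previous))
```

In this toy case zero padding halves the ratio relative to shortening the window, so the two options would lead to different attenuation values under a linear map.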
37
36 of 36 Thanks
- Tony Watkins, Simon Makin and Andrew Raimond of Reading University for all the data.
- Ray Meddis and Robert Ferry of Essex University for the DRNL program code.
- Kalle Palomäki, Hynek Hermansky and Roger Moore for discussion.
38
The end