
Modelling compensation for reverberation: work done and planned Amy Beeston and Guy J. Brown Department of Computer Science University of Sheffield EPSRC 12-month meeting, Sheffield: 23rd Oct 2009

1 of 36 Overview
1. Work done
   - sir-stir framework
   - three across-channel model configurations
2. Work planned
   - across-channel model
   - within-channel model
   - further questions

2 of 36 Part 1: Work Done
- sir-stir framework
- three across-channel model configurations

3 of 36 Watkins' sir/stir paradigm
[Figure: category boundary (step) for human listeners against context distance (m), annotated with the effect of reverberation and the effect of compensation; regions marked 'more sir responses' and 'more stir responses'.]

4 of 36 Efferent auditory processing
- Reverberating a speech signal reduces its dynamic range: reflections fill gaps in the temporal envelope.
- The efferent system helps to control dynamic range (Guinan & Gifford, 1988).
- Could compensation be characterised as restoration of dynamic range?
[Figure: dry signal, small mean and mean/peak ratio; reverberated signal, larger mean and mean/peak ratio.]

5 of 36 Mean-to-peak ratio (MPR)
- measured over some time window
- the peak does not vary greatly with source-receiver distance
- the mean increases with source-receiver distance
- MPR = mean/peak, so MPR increases with distance
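As a concrete illustration, the MPR over a window can be computed as follows; this is a minimal NumPy sketch (assuming a rectangular 1 s analysis window), not the model's actual implementation.

```python
import numpy as np

def mean_to_peak_ratio(env, fs, window_s=1.0):
    """Mean-to-peak ratio of a non-negative envelope over the final
    `window_s` seconds. As reflections fill the gaps in the temporal
    envelope the mean rises while the peak stays roughly fixed, so
    MPR increases with source-receiver distance."""
    seg = env[-int(window_s * fs):]   # analysis window (rectangular)
    peak = float(np.max(seg))
    if peak == 0.0:
        # Guard against 0/0; the real AN response has spontaneous
        # activity, so a truly all-zero window does not arise there.
        return 0.0
    return float(np.mean(seg)) / peak
```

Smearing a sparse dry envelope with a decaying reverberant tail raises the mean but barely changes the peak, so the ratio grows.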

6 of 36 Modelling framework

7 of 36 Stimuli
Watkins, JASA 2005, experiment 5:
- forward/reversed speech carrier crossed with forward/reversed reverberation (fwd/rev)
Watkins, JASA 2005, experiment 4:
- reverberate, then flip polarities: 'noise after'
- flip polarities, then reverberate: 'noise before'

8 of 36 Auditory periphery
- Outer/middle ear: simulates human data from Huber et al. (2001).
- Basilar membrane: dual-resonance nonlinear (DRNL) filterbank
  - originally proposed by Meddis, O'Mard and Lopez-Poveda (2001)
  - human parameters from Meddis (2006)
  - efferent attenuation introduced by Ferry and Meddis (2007)
- Hair cell: linear output between threshold and saturated firing rate (Messing, 2007); does not model adaptation in the auditory nerve.

9 of 36 Spectro-temporal excitation patterns
[Figure: spectro-temporal excitation pattern (STEP) of the auditory-nerve response, best frequency (Hz, log-spaced) against time, for the carrier '… ok, next you'll get to click on …' followed by the test word {sir, stir}.]

10 of 36 Efferent attenuation based on dynamic range
- Idea: control the amount of efferent attenuation applied in the model according to the dynamic range of the context.
- Dynamic range is measured by the mean-to-peak ratio of the AN response.
- Candidate metrics considered: kurtosis, negative differentials, offsets, mean-to-peak ratio.

11 of 36 Auditory nerve response
- Across-channel model: the auditory nerve response is summed across all frequency channels before the efferent system component.
- Within-channel model: the auditory nerve response is NOT summed across frequency channels.

12 of 36 Across channel
- The auditory nerve response is summed across all frequency channels before the efferent system component.
[Figure: frequency-by-time AN response summed (Σ) into a single channel, feeding MPR → ATT.]

13 of 36 Within channel
- The auditory nerve response in each frequency channel influences the efferent system component separately.
[Figure: frequency-by-time AN response with a separate MPR → ATT stage per channel.]

14 of 36 Efferent attenuation: MPR → ATT
- Linear map from MPR (of the summed AN response) to efferent attenuation, ATT.
- ATT turns down the gain on the nonlinear pathway of the DRNL filterbank.
- The rate-intensity curve shifts to the right.
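A sketch of such a linear map, reusing the closed-loop coefficients quoted later (ATT = 45·MPR + 18) purely for illustration; the clip to the 0-30 dB open-loop calibration range is an added assumption.

```python
def efferent_attenuation(mpr, slope=45.0, offset=18.0, max_att_db=30.0):
    """Linear map from mean-to-peak ratio to efferent attenuation (dB).
    slope/offset follow the fitted map ATT = 45*MPR + 18; clipping to
    [0, max_att_db] dB is an assumption, matching the range swept in
    the open-loop calibration."""
    att = slope * mpr + offset
    return min(max(att, 0.0), max_att_db)
```

A higher MPR (more reverberant context) thus yields more attenuation, shifting the rate-intensity curve further to the right.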

15 of 36 Recognition
- Attenuation helps to recover the dip in the temporal envelope corresponding to the 't' closure in 'stir'.
- Templates: sir, stir.
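The recognition stage can be sketched as nearest-template matching over the STEP; normalised cross-correlation as the similarity measure is an assumption here, not necessarily the matcher used in the model.

```python
import numpy as np

def classify_step(step, templates):
    """Return the name of the template ('sir' or 'stir') whose STEP
    (channels x time array) best matches the test-word response,
    using normalised cross-correlation on the flattened pattern."""
    def ncc(a, b):
        a = a.ravel() - a.mean()
        b = b.ravel() - b.mean()
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(templates, key=lambda name: ncc(step, templates[name]))
```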

16 of 36 Three model configurations
- Open loop: the amount of attenuation is fixed in advance.
- Semi-closed loop: the amount of attenuation is estimated during the one second preceding the test word, and held constant thereafter.
- Closed loop: the amount of attenuation is estimated continually in a sliding time window, and updated on a sample-by-sample basis (or at a specified control rate).

17 of 36 Efferent system: open loop
- Many simulations were run: the amount of attenuation applied was varied across a range of values (0-30 dB), and the resulting category boundaries were recorded in calibration charts.
- The best match to the human results was found (manually) for each condition.
- Near contexts match best with low attenuation values, while far contexts match best with higher attenuation values.
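The open-loop calibration amounts to the sweep below; `run_model` (returning a category boundary for a given fixed attenuation) is a hypothetical stand-in for the full simulation.

```python
import numpy as np

def open_loop_calibration(run_model, att_values=np.arange(0.0, 30.5, 0.5)):
    """Run the model once per fixed attenuation (0-30 dB in 0.5 dB
    steps, per the calibration charts) and record the resulting
    category boundary, giving one calibration curve per condition."""
    return {float(att): run_model(float(att)) for att in att_values}

def best_match(curve, human_boundary):
    """Choose the attenuation whose boundary lies closest to the human
    category boundary (done by hand in the original study)."""
    return min(curve, key=lambda att: abs(curve[att] - human_boundary))
```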

18 of 36 Results: open loop
[Figure: calibration curves for tuning, category boundary against attenuation applied (0, 0.5, … 29.5, 30 dB), for near and far conditions.]

19 of 36 Efferent system: semi-closed loop
- The amount of attenuation is estimated during the one second preceding the test word, and held constant thereafter.

20 of 36 Semi-closed loop
- Examine the context within a time window to derive a metric value.
- Use the metric value to determine the efferent attenuation.
[Figure: attenuation (ATT) derived once from the context window of '… ok, next you'll get to click on …' and held through the test word {sir, stir}.]

21 of 36 Metric: semi-closed loop
MPR in experiment 5 (JASA 2005):
- across-channel AN response, measured over a 1 s window before the test word
- increases with context distance
- small difference for the reversed carrier (squares vs circles)
- larger difference (a decrease) for reversed reverberation (black vs white)

22 of 36 Results: semi-closed loop
- Tuned to match the near-near and far-far (fwd-fwd) conditions: ATTENUATION = (38.36 * MPR).
- Experiment 5 achieves a qualitative (not quantitative) match to the human data…
- …but the experiment 4 conditions do not match well.
[Figure: category boundaries for forward/reversed speech carrier and reverberation (expt 5), and noise before/after conditions (expt 4).]

23 of 36 Efferent system: closed loop
- The amount of attenuation is estimated continually in a sliding time window, and updated on a sample-by-sample basis (or at a specified control rate).

24 of 36 Closed loop
- Examine the context within a time window to derive a metric value.
- Use the metric value to determine the efferent attenuation applied.
- The window slides forward and the process repeats.
[Figure: attenuation (ATT) updated continually through '… ok, next you'll get to click on …' and the test word {sir, stir}.]
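A minimal sketch of the closed-loop update: MPR is re-estimated in a sliding window over the summed AN response and mapped to attenuation at the control rate (assumed here to be 1 kHz, i.e. one update per millisecond).

```python
import numpy as np

def closed_loop_attenuation(an_sum, fs, window_s=1.0, control_hz=1000.0,
                            slope=45.0, offset=18.0):
    """One attenuation value (dB) per control step, each computed from
    the MPR of the preceding `window_s` seconds (or whatever part of
    the signal exists so far). The linear map reuses the fitted
    coefficients ATT = 45*MPR + 18 for illustration."""
    hop = max(1, int(fs / control_hz))
    win = int(window_s * fs)
    atts = []
    for end in range(hop, len(an_sum) + 1, hop):
        seg = an_sum[max(0, end - win):end]     # sliding metric window
        peak = float(np.max(seg))
        mpr = float(np.mean(seg)) / peak if peak > 0.0 else 0.0
        atts.append(slope * mpr + offset)
    return np.asarray(atts)
```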

25 of 36 Metric: closed loop
MPR in experiment 5 (JASA 2005):
- measured continually over a window shifting forwards from the start of the context
- insignificant change with reversal of the speech carrier (left vs right)
- significant change with reversal of the reverberation (top vs bottom)
[Figure: MPR through time for forward/reversed speech carrier and forward/reversed reverberation.]

26 of 36 Closed loop (expt 5)
- Tuned to best match the near-near and far-far (fwd-fwd) conditions.
- Variation is possible due to the granularity of the model (± 0.5): ATTENUATION = (45 * MPR) + 18, or ATTENUATION = (45 * MPR) + 19.
[Figure: category boundaries for forward/reversed speech carrier and forward/reversed reverberation under each mapping.]

27 of 36 Closed loop (expt 4)
- The MPR mapping, ATTENUATION = (45 * MPR) + 18, does not generalise to the experiment 4 noise contexts.
[Figure: category boundaries for forward/reversed speech carrier and reverberation, noise before/after conditions.]

28 of 36 Part 2: Work Planned
- Across-channel model
- Within-channel model
- Further questions

29 of 36 Across-channel model
Practical considerations:
- Control rate: specified to speed up the simulation (usually 1 kHz, i.e. the attenuation parameter is updated every 1 ms).
- Time window over which to determine the metric (usually the previous 1 second; different values are under investigation at present).
- Shape of the window (rectangular at present; it should have a 'forgetting function').
Question to Tony et al.: what data can we use to determine the shape/duration of the window?
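One candidate 'forgetting function' is an exponentially decaying weight over the metric window; the 250 ms time constant below is purely illustrative, not a fitted value.

```python
import numpy as np

def forgetting_window(window_s=1.0, fs=1000, tau_s=0.25):
    """Exponentially decaying weights over the metric window: recent
    context counts most, older context is progressively 'forgotten'.
    Returns weights summing to 1, ordered oldest to newest."""
    age = (np.arange(int(window_s * fs))[::-1] + 1) / fs  # seconds into the past
    w = np.exp(-age / tau_s)
    return w / w.sum()

def weighted_mean(x, w):
    """Weighted mean of the most recent len(w) samples of x, e.g. for
    a forgetting-function variant of the MPR's mean term."""
    return float(np.dot(x[-len(w):], w))
```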

30 of 36 Across-channel model
[Figure: summed (Σ) AN response feeding MPR → ATT, with the metric window's shape/duration (weight against time) marked as an open question.]

31 of 36 Within-channel model
- Previously we asked what duration and shape the metric window should have in time.
- Now we ask what duration and shape it should have in frequency.
- /t/ is defined by a sharp onset burst from 2 to 8 kHz (Régnier & Allen, 2008).
- Template matching over restricted regions of the frequency domain.

32 of 36 Within-channel model
Frequency-dependent suppression:
- Feedback from the efferent system appears to be fairly narrowly tuned.
- The efferent-induced threshold shift falls off at low BFs [data from cat, Guinan & Gifford (1988)].
- This improves the representation of low-frequency speech structure when efferent attenuation is high.
Modelling implications:
- There need no longer be a pooled auditory nerve (STEP) response for the metric/map to attenuation.
- Each channel can react quasi-independently to the audio context it hears.
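The within-channel scheme can be sketched by applying the MPR-to-attenuation map per channel; reusing the across-channel coefficients (45, 18) in every channel is an assumption made only for illustration.

```python
import numpy as np

def per_channel_attenuation(an, slope=45.0, offset=18.0):
    """Per-channel efferent attenuation (dB) from a (channels x time)
    AN response: each channel derives its own MPR, so it reacts
    quasi-independently to the context it hears. All-zero channels
    get MPR = 0 (a guard; real AN responses have spontaneous
    activity), hence only the offset attenuation."""
    peak = an.max(axis=1)
    mean = an.mean(axis=1)
    mpr = np.divide(mean, peak, out=np.zeros_like(mean), where=peak > 0.0)
    return slope * mpr + offset
```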

33 of 36 Within-channel model
[Figure: per-channel MPR → ATT stages over the frequency-by-time AN response, with the window shapes/durations (weight against time) marked as an open question.]

34 of 36 Questions
- Is there a time analogy to the frequency gaps in the 8-band stimuli?
- Imposing gaps so that parts are missing from the frequency/time pattern in the context window might allow an importance weighting for time bands, as for the frequency bands.

35 of 36 Implication? What happens with a silent context?
- Physiology predicts that the efferent system is not activated.
- The model predicts a small dynamic range: maximum mean/peak ratio, high efferent attenuation, and a low category boundary (more 'stir' responses).
- Specifically, if (when) the context is shorter than the metric window: should we shorten the metric window, zero-pad the utterances, or count the previous trial as context?

36 of 36 Thanks
- Tony Watkins, Simon Makin and Andrew Raimond of Reading University, for all the data.
- Ray Meddis and Robert Ferry of Essex University, for the DRNL program code.
- Kalle Palomäki, Hynek Hermansky and Roger Moore, for discussion.

The end