Abstract
We report comparisons between a model incorporating a bank of dual-resonance nonlinear (DRNL) filters and one incorporating a bank of linear gammatone filters. Previous computational models of the auditory system have typically used linear gammatone filters to model the frequency selectivity of the basilar membrane. These filters have been adequate as a first approximation, but lack certain characteristics that have been demonstrated in psychophysical and physiological studies: compression, downward shifts in best frequency and widening of the filter response with increasing level, and tuning curves with a long low-frequency tail and a sharp high-frequency cutoff. The complete model incorporates several stages of processing. English vowels synthesised using the Klatt synthesiser are passed through a pre-emphasis filter modelling the outer/middle ear transfer function. A filterbank models the frequency selectivity of the basilar membrane. Auditory nerve spikes are generated for each frequency channel using a model of inner hair cell/auditory nerve (IHC/AN) function. The spiking activity in each channel is used to generate an autocorrelation function (ACF) that displays signal periodicity, and the ACFs are summed across all channels to give a summary autocorrelation function (SACF). The SACF picks up timbral properties of the vowels at lags from 0 to 4.5 ms. The model is run using both a nonlinear and a linear bank of filters, and the output patterns from the two filterbanks are distinctly different. Each linear filter shows a unique response dominated by its corresponding harmonic. By contrast, adjacent nonlinear filters may show similar responses dominated by the nearest spectral peak that is lower in frequency than the filter's best frequency. This different pattern of responses in the filter channels is reflected in the ACF channels and therefore in the SACF. In addition, the nonlinear model retains the same pattern of peaks and troughs in the SACF when the signal level is varied between 50 and 90 dB SPL, whereas the linear model shows large changes in the SACF at different levels. This is because the IHC/AN stage in the nonlinear model becomes saturated across all channels at low levels (below 50 dB), while the IHC/AN stage in the linear model saturates only slowly with increasing level in the spectral troughs, so the overall spike pattern, and therefore the SACF, changes as the level varies. We anticipate that the level invariance of the nonlinear model will facilitate vowel recognition in future modelling work. This investigation was carried out using the Development System for Auditory Modelling (DSAM).

Introduction
We report comparisons of the responses of two computational models of auditory processing to synthesised English vowels. The two models differ only in the filters used to simulate the basilar membrane. The first, referred to as the linear model, incorporates a bank of linear gammatone auditory filters. The second, referred to as the nonlinear model, incorporates a bank of dual-resonance nonlinear (DRNL) auditory filters. The linear model is typical of previous computational models in using gammatone filters (e.g. Meddis & Hewitt, 1991; De Cheveigne, 1997). A problem is that linear filters lack certain characteristics known from physiology and psychophysics (e.g. Plack & Oxenham, 2000; Rhode & Cooper, 1996):
- Compression with increasing level
- Downward shift in best frequency (BF) with increasing level
- Widening of filter bandwidth with increasing level
- A long low-frequency tail and a sharp high-frequency cutoff
It is desirable to include such characteristics in modelling work. The nonlinear model therefore uses DRNL filters to introduce the above characteristics, in order to see how they affect an established vowel representation, the summary autocorrelation function (SACF). Of particular interest are changes in the SACF with vowel intensity.
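Since the DRNL filter is central to the comparison, a minimal sketch of its two-path idea may help: a linear path (gain plus bandpass) summed with a compressively nonlinear path (bandpass, broken-stick compression, bandpass). This is our illustration, not DSAM's implementation: the parameter values (gain, broken-stick constants, detuning of the linear path) are placeholders rather than published fits, and scipy's gammatone design stands in for the cascaded gammatone sections of the published filter.

```python
# Sketch of a DRNL channel: two parallel paths whose outputs are summed.
# All parameter values below are illustrative, not published fits.
import numpy as np
from scipy.signal import gammatone, lfilter

def drnl_channel(x, cf, fs, lin_gain=200.0, a=1e4, b=5e3, c=0.25):
    """Filter one channel; x is stapes velocity, cf the best frequency (Hz)."""
    bl, al = gammatone(cf * 0.9, 'iir', fs=fs)   # linear path, slightly detuned
    bn, an = gammatone(cf, 'iir', fs=fs)         # nonlinear path, centred at cf

    # Linear path: broadband gain followed by a gammatone bandpass.
    linear = lfilter(bl, al, lin_gain * x)

    # Nonlinear path: gammatone -> broken-stick compression -> gammatone.
    y = lfilter(bn, an, x)
    y = np.sign(y) * np.minimum(a * np.abs(y), b * np.abs(y) ** c)
    nonlinear = lfilter(bn, an, y)

    # At low levels the sensitive nonlinear path dominates; at high levels
    # compression lets the broader, detuned linear path take over, giving
    # the compression and level-dependent tuning listed above.
    return linear + nonlinear
```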

Model Description
The model comprises several sequential processing stages:
1. Stimulus input
2. Outer/middle ear filter
3. Filterbank (linear or nonlinear)
4. Inner hair cell excitation / auditory nerve spiking
5. Autocorrelation function (ACF)
6. Summary autocorrelation (SACF)
Stimulus input stage: English vowels synthesised using a Klatt (1980) synthesiser at a 10 kHz sampling rate. Subsequent stages are described in the following sections. All figures below show responses to vowels with a fundamental frequency of 100 Hz.
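A minimal sketch of how the six stages chain together, assuming the per-stage functions are defined as in the sketches in the following sections; the function names are ours, not DSAM's.

```python
# Hypothetical end-to-end pipeline; each stage is sketched later on.
import numpy as np

def run_model(vowel, fs=10000):
    x = pre_emphasis(vowel, fs)              # 2. outer/middle ear filter
    bm = filterbank(x, fs)                   # 3. BM velocity, one row per channel
    #    (swap in drnl_channel, above, per channel for the nonlinear model)
    spikes = ihc_an_spikes(bm, fs)           # 4. IHC excitation / AN spiking
    acfs = [acf(ch, fs) for ch in spikes]    # 5. per-channel ACF
    return np.sum(acfs, axis=0)              # 6. summary autocorrelation (SACF)
```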

Outer/Middle Ear and Basilar Membrane Filtering
Outer/middle ear pre-emphasis filter: a single linear bandpass filter with 3 dB down points at 450 Hz and 5000 Hz.
Basilar membrane (BM) filtering: a bank of 100 filters, logarithmically spaced, with centre frequencies from 100 to 4000 Hz; the filters are either linear or nonlinear. Each filter channel converts the pressure at its frequency to BM velocity.
The synthesised vowel 'Ah' has formants at 650, 950, 2950, 3300 and 3850 Hz. The linear model clearly picks out the spectral peaks in the signal, whereas the nonlinear model shows a response distributed across a wider range of filters.
[Figures: linear model response to 'Ah' at 50 dB; nonlinear model response to 'Ah' at 50 dB.]
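A hedged sketch of these two stages for the linear model, assuming a second-order Butterworth bandpass for the outer/middle ear and scipy's IIR gammatone design for the basilar-membrane filters; the design details are our assumptions, and DSAM's actual implementation differs.

```python
# Sketch of stages 2-3: bandpass pre-emphasis, then 100 log-spaced
# gammatone channels. Filter orders are illustrative choices.
import numpy as np
from scipy.signal import butter, gammatone, lfilter

def pre_emphasis(x, fs, lo=450.0, hi=5000.0):
    hi = min(hi, 0.49 * fs)                  # keep strictly below Nyquist at fs = 10 kHz
    b, a = butter(2, [lo, hi], btype='bandpass', fs=fs)
    return lfilter(b, a, x)

def filterbank(x, fs, n_channels=100, f_lo=100.0, f_hi=4000.0):
    cfs = np.geomspace(f_lo, f_hi, n_channels)   # logarithmic spacing
    out = np.empty((n_channels, len(x)))
    for i, cf in enumerate(cfs):
        b, a = gammatone(cf, 'iir', fs=fs)
        out[i] = lfilter(b, a, x)                # "BM velocity" per channel
    return out
```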

IHC Excitation / AN Spike Generation
The Meddis2000 IHC/AN module (Sumner et al., 2000) converts BM velocity into auditory nerve spikes: BM velocity generates excitation in the inner hair cell model, which leads to spike generation in the auditory nerve model.
- AN fibre refractory period of 1 ms
- 170 high spontaneous rate (50 spikes/s) auditory nerve fibres per channel
- Limited dynamic range of 30 dB
[Figure: AN spiking (linear model) for 'Ah' at 50 dB.]
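The Meddis2000 module models transmitter-pool dynamics that are beyond a short sketch; below is a deliberately simplified stand-in that only captures the properties listed above (a saturating rate function for the limited dynamic range, 1 ms refractoriness, 170 fibres per channel). All constants are illustrative.

```python
# Crude IHC/AN stand-in: half-wave rectification, saturating rate,
# Bernoulli spiking per fibre with an enforced refractory period.
import numpy as np

def ihc_an_spikes(bm, fs, n_fibres=170, r_spont=50.0, r_max=300.0,
                  k=1e-3, t_refract=0.001, rng=np.random.default_rng(0)):
    rect = np.maximum(bm, 0.0)                    # half-wave rectification
    rate = r_spont + r_max * rect / (rect + k)    # saturating rate (spikes/s)
    p = np.clip(rate / fs, 0.0, 1.0)              # spike probability per sample
    spikes = np.zeros_like(bm)
    dead = int(t_refract * fs)                    # refractory period in samples
    for ch in range(bm.shape[0]):
        for _ in range(n_fibres):
            draw = rng.random(bm.shape[1]) < p[ch]
            last = -dead
            for t in np.flatnonzero(draw):
                if t - last >= dead:              # enforce refractoriness
                    spikes[ch, t] += 1
                    last = t
    return spikes                                 # spike counts summed over fibres
```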

Autocorrelation & Summary Autocorrelation Functions
The IHC/AN output in each channel is correlated with itself at varying lags to produce an autocorrelation function (ACF) for that channel. The ACF detects signal periodicity: the dominant period(s) in each channel produce a peak at the corresponding lag. All ACF channels are summed at each lag to give the summary autocorrelation function (SACF). Note that the peak at 10 ms corresponds to the fundamental frequency of the vowel (100 Hz).
[Figure: ACF (linear model) for 'Ah' at 50 dB.]
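The ACF/SACF computation itself is simple; a minimal sketch, assuming acf operates on one channel's summed spike train and a lag range that covers the 10 ms period of a 100 Hz fundamental:

```python
# Sketch of stages 5-6: short-lag autocorrelation per channel,
# then summation across channels to form the SACF.
import numpy as np

def acf(spike_train, fs, max_lag_s=0.0125):
    max_lag = int(max_lag_s * fs)
    n = len(spike_train)
    return np.array([np.dot(spike_train[:n - lag], spike_train[lag:])
                     for lag in range(max_lag)])

def sacf(spikes, fs):
    return np.sum([acf(ch, fs) for ch in spikes], axis=0)

# A peak in sacf(...) at lag 10 ms (sample 100 at fs = 10 kHz) signals
# the 100 Hz fundamental, as in the figure described above.
```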

ACF Variations with Signal Level (Linear Model)
Comparison of the ACF plots for the vowels 'Ah' and 'Ee' at 50 and 90 dB SPL. Individual channels generally respond differently from their neighbours. At 50 dB SPL the ACFs for each vowel are distinctive because only the channels at the spectral peaks are active. With increasing signal level the ACF changes as more channels become active. At high signal levels all the channels become saturated, so the ACFs for the two vowels become more similar as spectral information is lost.
[Figures: ACFs for 'Ah' and 'Ee' at 50 dB and 90 dB, linear model.]

ACF Variations with Signal Level (Nonlinear Model)
Comparison of the ACF plots for the vowels 'Ah' and 'Ee' at 50 and 90 dB SPL. Bands of coherent activity are visible in the figures. The ACFs for each vowel are distinctive at both 50 and 90 dB SPL. With increasing signal level the ACF changes as more channels become active.
[Figures: ACFs for 'Ah' and 'Ee' at 50 dB and 90 dB, nonlinear model.]

SACF Variations with Signal Level
The first formants correspond to lags of 1.54 ms (650 Hz) for 'Ah' and 4 ms (250 Hz) for 'Ee'. The linear model SACFs pick out no strong formant features: at both 50 and 90 dB they show only the contribution of the 200 and 300 Hz harmonics (corresponding to lags of 5 ms and 3.3 ms respectively). The SACFs for the two vowels do not differ significantly in the linear model at 90 dB. The nonlinear model SACFs do vary with level, but show peaks at the same lags across sound levels, corresponding to the first formants of the vowels.
[Figures: SACFs for the linear and nonlinear models, 'Ah' and 'Ee', at 50 dB and 90 dB.]

Results
The output patterns from the two banks of filters are distinctly different. Each linear filter shows a unique response dominated by its corresponding harmonic. Adjacent nonlinear filters may show similar responses, dominated by the nearest spectral peak that is lower in frequency than the filter's best frequency. The linear model does not highlight vowel formants in the SACF because few channels respond to the formant harmonics, even though the filterbank response at 50 dB shows the spectral peaks. At 90 dB nearly all the channels in the linear model are saturated, so spectral information about the vowel formants is lost; the SACF is dominated by the more densely packed low-frequency channels, regardless of the spectral shape of the stimulus. In contrast, the nonlinear model retains a representation of the vowel formants across signal levels. Within each filter channel the strongest harmonic drives the activity in that channel. Many channels in the nonlinear model are already saturated at 50 dB, so there is less growth in activity at higher levels, just a spread of the response at the first formant of the vowel, resulting in a more level-invariant representation.

Conclusion
Preliminary results suggest that the nonlinear model generates a more level-invariant representation of the vowel formants than the linear model, a property that should be useful for vowel identification. Although the linear filters themselves are level-invariant, the combination of linear filters with a (nonlinear) hair cell model is not. Nonlinear filters vary with level, yet the combination of nonlinear filters with a (nonlinear) hair cell model can be approximately level-invariant. The linear model does not reflect vowel formants in the SACF at high signal levels, owing to saturation in all channels, and would therefore not be expected to distinguish between different vowels presented at high levels (90 dB). The nonlinear model, in contrast, does represent the vowel formants and should therefore distinguish between vowels; ongoing work is investigating this prediction. We anticipate that the level invariance demonstrated here by the nonlinear model will improve vowel identification.

References
De Cheveigne, A. (1997). Concurrent vowel identification. III. A neural model of harmonic interference cancellation. Journal of the Acoustical Society of America, 101, 2857-2865.
Klatt, D. H. (1980). Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America, 67, 971-990.
Meddis, R. & Hewitt, M. J. (1991). Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification. Journal of the Acoustical Society of America, 89, 2866-2882.
Plack, C. J. & Oxenham, A. J. (2000). Basilar-membrane nonlinearity estimated by pulsation threshold. Journal of the Acoustical Society of America, 107, 501-507.
Rhode, W. S. & Cooper, N. P. (1996). Nonlinear mechanics in the apical turn of the chinchilla cochlea in vivo. Auditory Neuroscience, 3, 101-121.
Sumner, C. J., Meddis, R. & O'Mard, L. P. (2000). An enhanced computational model of the inner hair cell auditory-nerve complex. British Journal of Audiology, 34, 117.

Acknowledgments
This work was carried out using the Development System for Auditory Modelling (DSAM), developed by Dr. Lowel P. O'Mard.