Auditory Scene Analysis


Auditory Scene Analysis
Auditory Neuroscience 6
Prof. Jan Schnupp
wschnupp@cityu.edu.hk
http://auditoryneuroscience.com

Understanding sound perception How the auditory system extracts the pitch, identity, and spatial location of sounds is usually studied with a single sound source present at any one time. Most of the time, however, the sound we receive is a mixture of sounds from multiple sources. Traditionally, studies of sound perception have taken two forms. Psychophysics examines the relation between the physical attributes of a sound stimulus and our percepts of it. Neurophysiological approaches examine changes in the activity of a neural population as a function of a particular stimulus parameter; but what role those neural populations play in shaping an animal’s perception of the stimulus remains largely a matter of conjecture. One approach designed to address this relation is “neurometric” analysis: a statistical method for asking whether stimulus-related changes in firing rate within a population of sensory neurons could provide the “psychophysical signal” on which sensory discriminations are based. Neurometrics has been used with great success in the visual and somatosensory systems, but so far to a much lesser extent in auditory research.

Auditory grouping and segregation

Tonotopicity of the auditory pathway (Figure: frequency maps along the auditory pathway, labelled at 0.5 kHz, 6 kHz, and 16 kHz.)

The Spectrogram The spectrum is a complete description of a sound only if the frequency composition of the sound is constant over time. Natural sounds, however, usually do vary with time. To deal with this, a sound can be divided into short time segments, and a spectrum calculated for each segment in turn. The result of this analysis is called a spectrogram.
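
This short-time analysis can be sketched in a few lines of Python; the Hann window, 256-sample window length, and 128-sample hop below are illustrative choices, not parameters from the text:

```python
import numpy as np

def spectrogram(signal, fs, win_len=256, hop=128):
    """Divide a signal into short windowed segments and compute the
    magnitude spectrum of each one (a basic short-time Fourier transform)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))   # one spectrum per segment
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)   # frequency axis (Hz)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return freqs, times, spec.T                  # rows: frequency, cols: time

# A tone whose frequency changes over time: 1 kHz, then 2 kHz.
fs = 8000
t = np.arange(fs) / fs
sig = np.where(t < 0.5, np.sin(2 * np.pi * 1000 * t),
                        np.sin(2 * np.pi * 2000 * t))
freqs, times, spec = spectrogram(sig, fs)
```

The single long-term spectrum of `sig` would show energy at both 1 and 2 kHz; the spectrogram shows that the two frequencies occur at different times.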

The Long Road from Spectrogram to Auditory Scene Analysis The neurogram idea is deceptively simple, and it does not capture some of the fine-scale temporal encoding the auditory system is capable of. It is nevertheless clear that the job of the auditory system is to perform some sort of “spectro-temporal analysis” to identify and localise “auditory objects”. The auditory system is astonishingly good at this job. To appreciate how hard it is, try to guess what the sound on the left represents.

Masking a tone by noise The auditory scene consists of separate tone and noise “objects” http://8nerve.org/masking

Masking Tone By Noise

Masking Tone By Noise To decide which of the time periods delineated by the vertical stripes contains a tone, an “ideal observer” need only look at the amount of energy in the frequency band of the tone. But is that how the brain analyses the sound to decide whether it “hears” a tone embedded in the noise?
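
A minimal sketch of this ideal-observer decision rule, assuming the observer knows the tone frequency and simply picks the observation window with the most energy in a narrow band around it (the 1-kHz tone, 250-ms windows, and 100-Hz band are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000
win = fs // 4                        # four 250-ms observation windows

# Broadband noise throughout; a 1 kHz tone added only in window 2.
noise = rng.normal(0, 1, 4 * win)
t = np.arange(4 * win) / fs
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
tone[: 2 * win] = 0.0                # silence the tone outside window 2
tone[3 * win :] = 0.0
sound = noise + tone

def band_energy(segment, fs, f_lo=950, f_hi=1050):
    """Energy in a narrow frequency band around the (known) tone frequency."""
    spectrum = np.abs(np.fft.rfft(segment)) ** 2
    freqs = np.fft.rfftfreq(len(segment), d=1 / fs)
    return spectrum[(freqs >= f_lo) & (freqs <= f_hi)].sum()

energies = [band_energy(sound[i * win : (i + 1) * win], fs) for i in range(4)]
guess = int(np.argmax(energies))     # ideal observer picks the loudest band
```

The detector ignores everything outside the tone’s band, which is exactly what makes co-modulation masking release (next slide) paradoxical: human listeners clearly do use information from other bands.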

Co-modulation masking release http://auditoryneuroscience.com/cmr If the noise fluctuates in amplitude, the tone becomes easier to detect: this is called “release from masking”. Paradoxically, this release is also observed when the fluctuating noise is added at frequencies away from that of the tone. Moore 1999

Co-modulation masking release has a neurophysiological correlate

Co-modulation masking release has a neurophysiological correlate (Figure: intracellular recordings from an auditory cortex neuron; panels show the noise stimulus and tone onset, and the responses to fluctuating noise alone, to a weak or loud tone alone, and to the tone plus fluctuating noise.) The noise response is suppressed in the presence of the tone. Las et al. 2005

“Gestalt Psychology” Principles Similarity Good continuation Proximity Closure (Pattern Completion) Common Fate

The continuity illusion The red line is obviously broken in two, since the gap is visible. Most people, however, see the blue line as continuous, assuming that it continues behind the green boxes. A similar effect occurs in hearing…

Visual objects can occlude each other, but they rarely mix: “grouping” asks which parts of the image belong to what (whose foot is this?). Sound waves from different sources, by contrast, do mix. Grouping cues – Common onset – Pitch & harmonic structure – Interaural time differences

Common onset as a grouping cue A: an artificial vowel is heard as /I/ or /e/ depending on where the first formant peak is. B: Moving the first formant peak toward 500 Hz makes the vowel sound more like /I/. But if the 500-Hz harmonic is started 32 ms or 240 ms earlier, it is no longer grouped with the vowel and no longer contributes perceptually to the formant: the percept shifts to /e/. According to Bregman, common onset (asynchronies of less than about 30 ms) is one of the strongest grouping cues. http://auditoryneuroscience.com/scene-analysis/onsets-vowel-identity AN Fig 6.4. Based on Darwin and Sutherland 1984
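
The onset-asynchrony manipulation can be sketched as follows. This is not Darwin and Sutherland’s actual vowel stimulus, just a plain harmonic complex in which one harmonic (here 500 Hz, the 4th harmonic of an assumed 125-Hz fundamental) is given a head start:

```python
import numpy as np

fs = 16000

def harmonic_complex(f0, n_harm, dur, fs, early_harm=None, lead=0.0):
    """Harmonic complex of duration `dur` seconds; optionally start one
    harmonic `lead` seconds before the rest (asynchronous onset)."""
    n_lead = int(lead * fs)
    n_main = int(dur * fs)
    t = np.arange(n_lead + n_main) / fs
    sound = np.zeros_like(t)
    for k in range(1, n_harm + 1):
        component = np.sin(2 * np.pi * k * f0 * t)
        if k != early_harm:
            component[:n_lead] = 0.0   # late harmonics are silent at first
        sound += component
    return sound

# 500 Hz harmonic starts 240 ms before the rest, vs. a synchronous complex:
async_vowel = harmonic_complex(125, 8, dur=0.4, fs=fs, early_harm=4, lead=0.24)
sync_vowel = harmonic_complex(125, 8, dur=0.4, fs=fs)
```

In the asynchronous version the 500-Hz component is physically present during the vowel, yet, as the slide describes, listeners group it with the earlier-starting “object” and it stops contributing to the formant.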

Pitch and harmonic structure as a grouping cue A difference in pitch improves the identification of two vowels heard simultaneously. (Figure: % of trials in which both vowels are correctly identified, plotted against the difference in fundamental frequency of the two simultaneously presented vowels, for level differences of 0 dB, 10 dB, and 20 dB.) http://auditoryneuroscience.com/scene-analysis/double-vowels de Cheveigné et al. 1997

“Unmixing” of Responses by Pitch Responses of a “chopper” neuron from the cochlear nucleus to the vowel /I/ at F0 = 88 Hz and /ae/ at F0 = 112 Hz, mixed in varying proportions: A, /I/ alone; B, /I/ with some /ae/; C, /ae/ with some /I/; D, /ae/ alone. Note that the periodicity of the responses transitions quite rapidly as the /I/-to-/ae/ ratio changes: B and C are not simply mixtures of A and D. AN Fig 6.7. Based on Keilson et al. 1997

Sequential Grouping Cues Proximity Similarity Rhythm

Auditory stream formation With small frequency separation, the tones are heard as a single 3-tone galloping melody; with increased frequency separation, they split into 2 streams of tones of different frequencies. https://auditoryneuroscience.com/scene-analysis/streaming-alternating-tones

https://auditoryneuroscience.com/scene-analysis/la-campanella

A possible neural correlate of auditory streaming As the rate at which the tones of alternating frequencies are presented is increased, the response to the tone closest to the neuron’s best frequency comes to dominate. Fishman et al. 2001

The Galloping Rhythm Paradigm https://auditoryneuroscience.com/scene-analysis/streaming-galloping-rhythm
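
The galloping-rhythm stimulus is easy to synthesize. The sketch below builds an ABA_ triplet sequence; the 50-ms tone durations, 5-ms ramps, and frequency pairs are illustrative values (not the exact parameters of any published experiment):

```python
import numpy as np

def galloping_sequence(f_a, f_b, n_triplets, fs=16000,
                       tone_dur=0.05, gap=0.05):
    """ABA_ triplet sequence: tone A, tone B, tone A, then a silent gap.
    A larger f_b - f_a separation makes the sequence more likely to split
    into two perceptual streams."""
    n_tone = int(tone_dur * fs)
    n_gap = int(gap * fs)
    t = np.arange(n_tone) / fs
    # 5-ms linear on/off ramps to avoid clicks at tone edges
    ramp = np.minimum(1.0, np.minimum(t, t[::-1]) / 0.005)
    tone_a = np.sin(2 * np.pi * f_a * t) * ramp
    tone_b = np.sin(2 * np.pi * f_b * t) * ramp
    silence = np.zeros(n_gap)
    triplet = np.concatenate([tone_a, tone_b, tone_a, silence])
    return np.tile(triplet, n_triplets)

# Small separation (tends toward one galloping stream) vs. large (two streams):
one_stream = galloping_sequence(400, 504, n_triplets=10)
two_streams = galloping_sequence(400, 713, n_triplets=10)
```

With a small separation the ABA_ pattern is heard as a single gallop; with a large separation it splits into a fast A-A-A stream and a slower B-B stream.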

Buildup of Streaming It often takes time for an alternating-tone or galloping-rhythm sequence to break into two streams; how long it takes depends on the frequency separation and the presentation rate. This suggests that the brain starts from the simplest assumption, that there is only one source or stream, and introduces additional streams only if enough evidence accumulates (“Occam’s razor”).

Build-up of streaming: behaviour and physiology (Figure: human behaviour, probability of hearing 2 streams as a function of time in seconds, alongside monkey physiology, neuron firing rate from the first to the last triplet over the same time scale.) Micheyl et al. 2005

Rhythm and Streaming When one of two interleaved sequences of sounds has a regular rhythm and the other an irregular (“jittered”) rhythm, the two sequences are more likely to be heard as separate streams. Rajendran et al. (2013, JASA-EL)

Mismatch Negativity and Deviant Detection Figure 6.15 Mismatch negativity (MMN) to frequency deviants. (Left) The potential between electrode Fz (a relatively frontal midline electrode) and the mastoid in response to a 1,000-Hz tone (dashed line) that serves as the standard, and to deviant tones at the indicated frequencies. (Right) The difference waveforms (deviant minus standard), showing a clear peak around 150 ms after stimulus onset. Note that negative potentials are plotted upward in this figure. From figure 1 of Näätänen et al. (2007).

Sensing changes in the auditory scene The “mismatch negativity”: the difference in human event-related potentials between responses to unexpected sounds (“deviants”) embedded in a stream of expected sounds (“standards”). Winkler et al. 2003

Neural correlates of scene segregation (Figure: spectrograms of a natural birdsong, of its main chirp alone, and of the background noise alone.) Bar-Yosef & Nelken 2007

Listening to 2 people talking simultaneously Speaker 1 (male) Speaker 2 (female) Mesgarani & Chang, 2012, Nature

Sensing changes in the auditory scene Stimulus-specific adaptation in the responses of auditory neurons: neurons tire of the same repetitive stimulus, but fire vigorously to a different, rare stimulus. (Figure: response amplitude over time in ms for tones presented with different probabilities.) Ulanovsky et al. 2003
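
Stimulus-specific adaptation can be illustrated with a toy model (this is not Ulanovsky et al.’s analysis, and the `jump` and `decay` parameters are arbitrary): each tone frequency carries its own adaptation variable that builds up whenever that frequency is presented and recovers during other tones, so a rare deviant keeps evoking vigorous responses while the repeated standard fades:

```python
import numpy as np

def ssa_responses(sequence, jump=0.4, decay=0.9):
    """Toy stimulus-specific adaptation: each tone frequency has its own
    adaptation variable; response to a tone = 1 - its current adaptation."""
    adaptation = {}
    responses = []
    for tone in sequence:
        a = adaptation.get(tone, 0.0)
        responses.append(1.0 - a)                  # adapted channels fire less
        for f in list(adaptation):
            adaptation[f] *= decay                 # slow recovery everywhere
        adaptation[tone] = adaptation.get(tone, 0.0) + jump * (1.0 - a)
    return np.array(responses)

# Oddball sequence: every 10th tone is a rare "deviant" frequency.
seq = ["deviant" if i % 10 == 9 else "standard" for i in range(100)]
resp = ssa_responses(seq)
std_resp = resp[[s == "standard" for s in seq]][-20:].mean()
dev_resp = resp[[s == "deviant" for s in seq]][-5:].mean()
```

In this sketch the standard’s responses settle near a strongly adapted level while the deviant’s stay close to the unadapted maximum, mirroring the figure: response amplitude depends on stimulus probability, not just on the stimulus itself.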

Further Reading Auditory Neuroscience, Chapter 6. Bregman AS (1990) Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press.