

How does auditory perceptual organisation work? by Elvira Perez and Georg Meyer, Dept. of Psychology, Liverpool University, UK. Hoarse Meeting, Chrysler Ulm, Germany, 28th-30th October 2004

1. Introduction:
Ears receive mixtures of sounds. We can tolerate surprisingly high levels of noise and still orient our attention to whatever we want to attend to. But how can the auditory system do this so accurately?

1. Introduction:
Auditory scene analysis (Bregman, 1990) is a theoretical framework that aims to explain auditory perceptual organisation. Basics:
– The environment contains multiple objects
– Decomposition of the mixture into its constituent elements
– Grouping
It proposes two grouping mechanisms:
1. 'Bottom-up': primitive cues (F0, intensity, location); grouping mechanisms based on Gestalt principles.
2. 'Top-down': schema-based (speech pattern matching).

1. Introduction:
Primitive processes (Gestalt: Koffka, 1935):
– Similarity
– Good continuation
– Common fate
– Disjoint locations
– Closure

1. Introduction
Criticisms:
– Too simplistic: whatever cannot be explained by the primitive processes is explained by the schema-based processes.
– Primitive processes only work in the lab.
Sine-wave replicas of utterances (Remez et al., 1992):
– Phonetic principles of organisation find a single speech stream, whereas auditory principles find several simultaneous whistles.
– Grouping is by phonetic coherence rather than by simple auditory coherence.

3. Experiments (baseline):
The purpose of these studies is to explore how noise (a chirp) affects speech perception. The stimulus is a vowel-nasal syllable that is perceived as /en/ when presented in isolation, but as /em/ when it is presented with a frequency-modulated sine wave in the position where the second formant transition would be expected. In all three experiments participants categorised the synthetic syllable as /em/ or /en/. Direction, duration, and position of the chirp were the manipulated variables.
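A chirp of this kind is simply a frequency-modulated sine wave. As a minimal sketch (Python; the frequency range, duration, and amplitude are illustrative values, not the exact synthesis parameters used in the experiments), the stimulus can be generated by integrating a linear frequency trajectory:

```python
import numpy as np

def make_chirp(f_start, f_end, duration, sr=44100, amp=0.1):
    """Linear frequency-modulated sine wave ('chirp').

    The instantaneous frequency sweeps from f_start to f_end;
    the phase is the running integral of that frequency trajectory.
    """
    t = np.arange(int(duration * sr)) / sr
    f_inst = f_start + (f_end - f_start) * t / duration   # Hz, linear sweep
    phase = 2 * np.pi * np.cumsum(f_inst) / sr            # integrate frequency
    return amp * np.sin(phase)

# A 20 ms downward chirp spanning the 2 kHz -> 1 kHz region where
# the second formant transition would be expected (values illustrative).
chirp_down = make_chirp(2000, 1000, 0.020)
```

An upward chirp is obtained by swapping `f_start` and `f_end`; duration and temporal position in the syllable are then the other two manipulated variables.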

3. Experiments
The percept of the nasal changes from /n/ to /m/ when a chirp is added between the vowel and the nasal. [Figure: formant frequency (Hz) over time (ms) for the vowel-nasal stimulus.]

Experiment 1: Baseline/Direction
[Figure: up and down chirps between the vowel and the nasal.] In 80% of the trials the participants heard the difference between the up and the down chirp.

Experiment 2: Duration
[Figure: chirps of different durations between the vowel and the nasal.]

Experiment 3: Position
[Figure: chirp position varied between the vowel and the nasal.]

5. Conclusions:
Chirps of 4 ms to 20 ms duration, in the range 1 kHz to 2 kHz, are apparently integrated into the speech signal independently of their direction, changing the percept from /en/ to /em/. Subjects nevertheless very clearly hear two objects, so some scene analysis is taking place: the chirp is not completely integrated into the speech. This is duplex perception with one ear. Listeners can also discriminate the direction of motion of the chirp when they focus their attention on it (80% accuracy), so a higher level of auditory processing also takes place.

Mr. Background Noise
Do human listeners actively generate a representation of background noise to improve speech recognition? Hypothesis: recognition performance should be higher when the spectral and temporal structure of the interfering noise is regular, so that a good noise model can be generated, than when the noise is unpredictable.

Experiments 4 & 5
– Stimuli: chirps + /en/
– Ten subjects
– The amplitude of the chirp varies (5 conditions: 0 dB, -8 dB, -14 dB, -20 dB, no chirp)
– Background noise (down chirps):
– Quantity: lots (170/20 ms) vs few (19/20 ms)
– Time of appearance: regular vs irregular
– Categorisation task (2AFC); threshold shifts.

[Figure: /en/ stimuli embedded in the regular and the irregular background-chirp conditions.]

Each point in the scatter is the mean threshold over all subjects for a given session. The solid lines show the Boltzmann fit (Eq. (1)) for each individual subject in the five different conditions. All the fits have the same upper and lower asymptotes.
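The Boltzmann function referred to as Eq. (1) is a sigmoid; with the upper and lower asymptotes fixed, fitting reduces to estimating a midpoint (the threshold) and a slope. A sketch of such a fit (Python with scipy; the data points are illustrative, not the experimental values):

```python
import numpy as np
from scipy.optimize import curve_fit

def boltzmann(x, x0, dx, bottom=0.0, top=1.0):
    """Boltzmann sigmoid with fixed asymptotes (as in the fits shown)."""
    return bottom + (top - bottom) / (1.0 + np.exp((x0 - x) / dx))

# Illustrative data: proportion of /em/ responses vs chirp level (dB)
levels = np.array([-20.0, -14.0, -8.0, 0.0])
p_em = np.array([0.10, 0.35, 0.75, 0.95])

# Fit only x0 and dx; bottom and top keep their fixed default values
(x0, dx), _ = curve_fit(boltzmann, levels, p_em, p0=[-10.0, 3.0])
# x0 is the 50% threshold, dx the slope parameter
```

Fitting each subject and condition this way yields the per-condition thresholds that the scatter summarises.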

Exp. 4 (rand/reg):
– lots vs. few (t = -3.34, df = 38, p = 0.001)
– control vs. lots (t = -3.34, df = 38, p = 0.001)
– No effect between irregular and regular.
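With df = 38, the comparisons above are consistent with independent-samples t-tests over two groups of 20 threshold estimates. A sketch with simulated data (the group sizes, means, and spreads are illustrative assumptions, not the experimental values):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Illustrative per-session thresholds (dB) for two conditions,
# 20 sessions each, giving df = 20 + 20 - 2 = 38
lots = rng.normal(-12.0, 2.0, 20)
few = rng.normal(-14.0, 2.0, 20)

t, p = ttest_ind(lots, few)  # independent-samples t-test
```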

Two aspects change from Exp. 4 to Exp. 5:
– Amplitude scale of the chirps (0 dB, -4 dB, -8 dB, -16 dB, no chirp).
– The 'lots' condition now includes 100/20'' instead of the earlier 170/20''.

Exp. 5 (rand/reg):
– lots vs. few (t = 2.27, df = 38, p = 0.05)
– control vs. lots (t = 3.12, df = 38, p < 0.05)
– No effect between irregular and regular.

5. Conclusions
Only the amount of background noise seems to affect performance on the recognition task. The regularity of the background noise seems to be an irrelevant cue for improving auditory stream segregation, and therefore speech perception. This is a counterintuitive phenomenon.

The irrelevant sound effect (ISE) (Colle & Welsh, 1976) disrupts serial recall. The level of meaning (reversed vs forward speech), the predictability of the sequence (random vs regular), and the similarity (semantic or physical) of the irrelevant sound to the target material seem to have little impact on the focal task (Jones et al., 1990). Changing state: the degree of variability or physical change within an auditory stream is the primary determinant of the degree of disruption in the focal task.

[Figure: frequency-time schematics of a smooth change vs an abrupt change in the background sequence.]

[Figure, zoomed in: the four chirp types (Up, Down, Top, Bottom).]

Experiment 6
– Stimuli: synthesised vowel-nasal + background FM tone + chirps
– Three blocks (200 trials each): first Control, then Smooth or Abrupt (counterbalanced order)
– Chirps: four types (Up/Down/Top/Bottom)
– Five amplitudes: 0 dB, -4 dB, -8 dB, -16 dB, no chirp

Experiment 6
– Subjects: 42
– Musicians vs non-musicians; female vs male
– Nationalities (27); mean age 27.7; languages spoken (3)
– Hearing issues (AP, RP, tinnitus)

Results (Tukey test, p < )
– down vs up: YES
– control vs. smooth: YES
– control vs abrupt: YES
– No effect between smooth and abrupt.

Results Musicians vs. Non Musicians …no differences.

More analysis…
– Removing the intermediate conditions: results remain the same.
– Habituation: differences between musicians and non-musicians overall and in the first 10, 5, and 3 blocks, but not in the first block alone.
– Again, control differs from smooth/abrupt.

6. Conclusions
It seems that listeners do not use pattern prediction as a cue for auditory perceptual organisation… or they do it extremely fast (3 ms), or it is due to short-term memory (pre-perceptual auditory storage). Attention must be focused on an object (the background noise) for a change in that object to be detected (Rensink et al., 1997)… or we simply ignore the information contained in the transitions because it is not reliable.

[Figure: formant frequency (Hz) over the vowel-nasal stimulus, with the conflict area marked.]

[Figure: formant frequency (Hz) over the vowel-nasal stimulus; the transition is ignored because it is not reliable.]

[Figure: two vowel-nasal formant tracks that are both perceived as /em/.]

Which information are we taking into account? Do we combine cues? Do we measure the variances associated with each cue to test its reliability? A maximum-likelihood integrator. Antecedents: the nervous system seems to combine visual and haptic information in a manner similar to an MLE integrator (Ernst et al., 2002).
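The MLE integration idea can be sketched directly: each cue's estimate is weighted by its reliability (inverse variance), and the combined estimate has lower variance than either cue alone. A minimal illustration (the cue values and variances below are made up):

```python
import numpy as np

def mle_combine(estimates, variances):
    """Maximum-likelihood cue combination: each cue is weighted
    by its inverse variance (its reliability)."""
    v = np.asarray(variances, float)
    w = 1.0 / v
    w /= w.sum()                              # normalised reliability weights
    combined = float(np.dot(w, estimates))    # weighted average of the cues
    combined_var = 1.0 / np.sum(1.0 / v)      # always < min(variances)
    return combined, combined_var

# Illustrative: a reliable transition cue and a noisier nasal cue
est, var = mle_combine([1.0, 2.0], [0.25, 1.0])
# est is pulled toward the more reliable cue, and the combined
# variance is smaller than that of either cue on its own
```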

[Figure: schematic of the transition, nasal, and noise components.]

Conditions:
– Full syllable (vowel + transition + nasal)
– No transition
– No nasal
– No transition/nasal
[Figure: schematic /e/-/n/ stimuli for each condition, segmented into vowel, transition, and nasal.]

Preliminary results

New Methodology
Until now, the method of constant stimuli: several stimulus levels are chosen beforehand, and groups of observations are placed at each of these stimulus levels. The order of observations is randomised. A conventional method of estimation is used to fit the psychometric function to the resulting data.
Adaptive procedures: the stimulus level on any one trial is determined by the preceding stimuli and responses.
– Sequential experiment: the course of the experiment depends on the experimental data.

Up-down procedures (staircase method): the stimulus level (amplitude of the speech signal) is decreased after a positive response and increased after a negative one. On each trial, the participant gives both a categorisation judgment (em/en/e) and a confidence rating. The judgments are used to decide the direction of change in the stimulus level, and the confidence ratings are used to decide the step size (dB). Advantage: most of the observations are placed at or near the 50% level.
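The basic up-down rule can be sketched as follows (Python; an idealised deterministic observer stands in for the participant, real responses are stochastic, and the confidence-driven step-size rule is omitted for brevity):

```python
def staircase(true_threshold, start=0.0, step=2.0, n_trials=40):
    """1-up/1-down staircase: the level is decreased after a positive
    response and increased after a negative one, so testing
    concentrates near the 50% point of the psychometric function."""
    level = start
    reversals = []
    last_dir = None
    for _ in range(n_trials):
        # Idealised observer: responds positively whenever the
        # level is above threshold (a real observer is noisy).
        positive = level > true_threshold
        direction = -1 if positive else +1
        if last_dir is not None and direction != last_dir:
            reversals.append(level)   # record each reversal of direction
        last_dir = direction
        level += direction * step
    # Threshold estimate: mean of the reversal levels
    return sum(reversals) / len(reversals)
```

With a noisy observer the track oscillates around the 50% point, and averaging the reversal levels gives the threshold estimate; adding the confidence rating would shrink `step` as the track converges.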

Conclusions …In the next meeting.

Thank you