Mr Background Noise and Miss Speech Perception, by Elvira Perez and Georg Meyer


Structure
1. Introduction
2. Background
3. Experiments
4. Conclusions
5. Future research

1. Introduction: Our ears receive mixtures of sounds. We can tolerate surprisingly high levels of noise and still orient our attention to whatever we want to attend to. But how does the auditory system do this so accurately?

1. Introduction: Auditory scene analysis (Bregman, 1990) is a theoretical framework that aims to explain auditory perceptual organisation. Basics:
–The environment contains multiple objects; the acoustic mixture is decomposed into its constituent elements, which are then grouped.
It proposes two grouping mechanisms:
–1. 'Bottom-up': primitive cues (F0, intensity, location); a grouping mechanism based on Gestalt principles.
–2. 'Top-down': schema-based (e.g. speech pattern matching).

1. Introduction: Primitive processes (Gestalt; Koffka, 1935):
–Similarity: sounds will be grouped into a single perceptual stream if they are similar in pitch, timbre, loudness, or location.
–Good continuation: smooth changes in frequency, intensity, location, or spectrum will be perceived as changes within a single source, whereas abrupt changes indicate a change of source.
–Common fate: two components of a sound that share the same kinds of changes at the same time (e.g. onset-offset) will be perceived and grouped as parts of the same single source.
–Disjoint locations: a given element of a sound can only form part of one stream at a time (cf. duplex perception).
–Closure: when parts of a sound are masked or occluded, that sound will be perceived as continuous if there is no direct sensory evidence to indicate that it has been interrupted.

1. Introduction: Criticisms:
–Too simplistic: whatever cannot be explained through the primitive processes is attributed to the schema-based processes.
–Primitive processes only work in the lab.
–Sine-wave replicas of utterances (Remez et al., 1992): phonetic principles of organisation find a single speech stream, whereas auditory principles find several simultaneous whistles; grouping is by phonetic rather than by simple auditory coherence.

2. Background: Previous studies (Meyer and Barry, 1999; Harding and Meyer, 2003) have shown that perceptual streams may also be formed from complex features such as formants in speech, and not just from low-level sounds such as tones.

The perception of a synthetic /m/ changes to /n/ when it is preceded by a vowel with a high second formant and no transition between the vowel and the consonant. [Figure: two schematic spectrograms, one heard as /m/ (the grouped F2 matches that of an /m/) and one heard as /n/ (the grouped F2 matches that of an /n/).]

3. Experiments (baseline): The purpose of these studies is to explore how noise (a chirp) affects speech perception. The stimulus is a vowel-nasal syllable that is perceived as /en/ when presented in isolation, but as /em/ when presented with a frequency-modulated sine wave in the position where the second formant transition would be expected. In all three experiments participants categorised the synthetic syllable as /em/ or /en/. The direction, duration, and position of the chirp were the manipulated variables.

3. Experiments: The perception of a nasal /n/ changes to /m/ when a chirp is added between the vowel and the nasal F2. [Figure: schematic spectrogram of formant frequency (Hz) against time (ms), showing the vowel and nasal segments.]

3. Experiments: 'Chirp down' means that the frequency of the sine wave falls from 2 kHz to 1 kHz; 'chirp up' means that its frequency rises from 1 kHz to 2 kHz.
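
The transcript gives only the endpoint frequencies. As a rough sketch (not the authors' synthesis code; the sample rate, duration, and windowing are assumptions), such frequency-modulated sine waves could be generated like this in Python:

    import numpy as np
    from scipy.signal import chirp

    fs = 16000                   # sample rate in Hz (assumed)
    dur = 0.030                  # 30 ms, one of the chirp durations tested
    t = np.linspace(0, dur, int(fs * dur), endpoint=False)

    # Linear frequency sweeps matching the endpoints given on the slide.
    chirp_down = chirp(t, f0=2000, t1=dur, f1=1000, method='linear')  # 2 kHz -> 1 kHz
    chirp_up   = chirp(t, f0=1000, t1=dur, f1=2000, method='linear')  # 1 kHz -> 2 kHz

    # A raised-cosine window avoids onset/offset clicks (an assumption;
    # the slides do not describe the amplitude envelope).
    chirp_down *= np.hanning(t.size)
    chirp_up   *= np.hanning(t.size)

In the experiments, a chirp like this stands in for the missing second formant transition between the vowel and the nasal.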

3. Experiments: Subjects: 7 male and 6 female. Their task was always the same: after each signal presentation, subjects judged whether the syllable sounded more like an /em/ or an /en/ by pressing the appropriate button on a computer graphics tablet. There was no transition between the vowel and the nasal; a chirp between the vowel and the nasal F2 was added to the signal.

Experiment 1: Baseline/Direction. [Figure: schematic spectrograms (ms) of the vowel-nasal stimulus with chirp-up and chirp-down variants.] In 80% of the trials the participants heard the difference between the up and the down chirp.

Experiment 2: Duration. [Figure: schematic spectrograms (ms) showing chirps of varying duration between the vowel and the nasal.]

Experiment 3: Position. [Figure: schematic spectrograms showing the chirp at different positions relative to the vowel and the nasal.]

5. Conclusions: Chirps of 4 ms and 10 ms duration, independent of their direction, are apparently integrated into the speech signal and change the percept from /en/ to /em/. For longer chirps the direction matters, but for chirp durations up to 30 ms the majority of stimuli are still heard as /em/. Subjects very clearly hear two objects, so some scene analysis is taking place: the chirp is not integrated completely into the speech.

Duplex perception with one ear: it seems that listeners can also discriminate the direction of motion of the chirp when they focus their attention on the chirp; a higher level of auditory processing then takes place (80% accuracy).

The data suggest that a low-level spectro-temporal analysis of the transitions is carried out, because: (1) the spectral structure of the chirp is very different from the harmonically structured speech sound, and (2) the direction of the chirp motion has little significance in the percept. These results complement the auditory scene analysis theoretical framework.

Mr. Background Noise: Do human listeners actively generate a representation of background noise to improve speech recognition? Hypothesis: recognition performance should be highest if the spectral and temporal structure of the interfering noise is regular, so that a good noise model can be generated (noise prediction), and lower for random noise.

Experiments 4 & 5:
–Same stimuli as before (chirp down).
–The amplitude of the chirp varies (5 conditions).
–Background noise (down chirps): quantity: lots vs. few; quality: random vs. regular.
–Two-alternative forced-choice (2AFC) categorisation task; threshold shifts measured.

[Figure: schematic of the stimulus sequences, with the target /en/ syllables shown in the regular and irregular background-chirp conditions.]

Each point in the scatter plot is the mean threshold over all subjects for a given session. The solid lines show the Boltzmann fit (Eq. (1)) for each individual subject in the five different conditions. All the fits have the same upper and lower asymptotes.
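
Eq. (1) is not reproduced in the transcript. A standard Boltzmann (sigmoidal) psychometric function, consistent with the description of shared asymptotes, has the form

    y(x) = A2 + (A1 - A2) / (1 + exp((x - x0) / dx))

where A1 and A2 are the upper and lower asymptotes (fixed across subjects here), x0 is the midpoint, read off as the threshold, and dx sets the slope of the transition.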

Experiment 4 (random/regular): lots vs. few (t = -3.34, df = 38, p = 0.001); control vs. lots (t = -3.34, df = 38, p = 0.001). No effect between random and regular.

Two aspects changed from Experiment 4 to Experiment 5:
–The amplitude scale of the chirps.
–The 'lots' condition now includes 100 chirps per 20 s, where before it was 170 per 20 s.

Experiment 5 (random/regular): lots vs. few (t = 2.27, df = 38, p = 0.05); control vs. lots (t = 3.12, df = 38, p < 0.05). No effect between random and regular.
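
For illustration only: the per-session thresholds are not included in the transcript, but df = 38 is consistent with two groups of 20 sessions, so a comparison of this kind could be computed as an independent-samples t-test (the numbers below are placeholders, not the experimental data):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    lots = rng.normal(3.0, 0.8, 20)   # placeholder thresholds, 'lots' condition
    few  = rng.normal(2.4, 0.8, 20)   # placeholder thresholds, 'few' condition

    t_val, p_val = stats.ttest_ind(lots, few)          # df = 20 + 20 - 2 = 38
    print(f"t = {t_val:.2f}, df = {lots.size + few.size - 2}, p = {p_val:.3f}")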

Conclusions: Only the amount of background noise seems to affect performance in the recognition task. The sequential grouping of the background noise seems to be an irrelevant cue for improving auditory stream segregation and, therefore, speech perception. This is a counterintuitive phenomenon: attention must be focused on an object (the background noise) for a change in that object to be detected (Rensink et al., 1997).

Change deafness may occur as a function of (not) attending to the relevant stimulus dimension (Vitevitch, 2003). SPEECH dimensions:
–Lexical (meaning)
–Indexical (age, gender, emotions of the talker)
–Acoustical (loudness, pitch, timbre)

The irrelevant sound effect (ISE; Colle & Welsh, 1976) disrupts serial recall. The level of meaning (reversed vs. forward speech), the predictability of the sequence (random vs. regular), and the similarity (semantic or physical) of the irrelevant sound to the target material seem to have little impact on the focal task (Jones et al., 1990). Changing state: the degree of variability or physical change within an auditory stream is the primary determinant of the degree of disruption of the focal task (see the sketch after the figure below).

[Figure: frequency-time schematics contrasting one stream with two streams: a changing single stream of alternating tones vs. two unchanging streams, one per frequency.]
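
As a sketch of the distinction (all parameters are assumptions, not values from the slides): a sequence of alternating tone bursts is heard as one changing stream when the frequency separation is small, but segregates into two unchanging streams when the separation is large, the classic streaming manipulation.

    import numpy as np

    fs = 16000  # sample rate in Hz (assumed)

    def burst(freq, dur=0.05):
        """A Hann-windowed tone burst."""
        t = np.arange(int(fs * dur)) / fs
        return np.sin(2 * np.pi * freq * t) * np.hanning(t.size)

    def alternating(f_low, f_high, n_pairs=8):
        """Low/high/low/high... burst sequence."""
        return np.concatenate([burst(f) for _ in range(n_pairs)
                               for f in (f_low, f_high)])

    one_changing_stream = alternating(500, 600)    # small separation: one stream
    two_steady_streams  = alternating(500, 2000)   # large separation: two streams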

Experiment 6:
–Shannon et al. (1999) database of CVC syllables.
–Constant white noise + a high-frequency burst + a low-frequency burst.
–3 conditions: control, background noise as one stream, background noise as two streams.
–Question: does auditory stream segregation require attention?

Results: Is there a confounding effect? A Tukey test was significant (the p value is missing from the transcript). [Figure: results chart, with the annotation 'more quantity of noise'.]

Procedure:
–White noise: 68 dB
–High burst: 71 dB
–Low burst: 67 dB
–Speech signal: 61.5 dB
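
Taken at face value, these levels put the speech 6.5 dB below the steady white noise alone (61.5 - 68 = -6.5 dB SNR); during a high burst the combined masker rises to about 10*log10(10^6.8 + 10^7.1) ≈ 72.8 dB, roughly 11 dB above the speech.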

Conclusions:
–A changing-state stream is more disruptive than two steady-state streams.
–This implicates working memory (or STM) and the phonological loop (Baddeley, 1999).
–Auditory streaming can occur in the absence of focal attention.

6. Future research:
–Streaming and focal attention.
–Streaming and location (diotic vs. dichotic presentation).

Thank you

References:
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organisation of Sound. MIT Press, Cambridge, MA.
Harding, S. and Meyer, G. (2003). Changes in the perception of synthetic nasal consonants as a result of vowel formant manipulations. Speech Communication, 39.
Meyer, G. and Barry, W. (1999). Continuity-based grouping affects the perception of vowel-nasal syllables. In: Proc. 14th International Congress of Phonetic Sciences, University of California.

Table 1: Klatt synthesizer parameters. Fx: formant frequency (Hz); B: bandwidth (Hz); A: amplitude (dB). The nasals have nasal formants at 250 Hz and nasal zeros at 750 Hz (/m/) and 1000 Hz (/n/). [Table: rows for /m/, /n/, and the vowel, with A, B, and F columns for F1-F3; the numeric values were lost in the transcript.]

Spectrogram of the stimulus used in the three experiments, with vowel formants at 375, 2000, and 2700 Hz, corresponding to an /e/, and nasal prototype formants at 250, 2000, and 2700 Hz, corresponding to an /n/. There were no transitions between the vowel and the nasal.