The Relation Between Speech Intelligibility and The Complex Modulation Spectrum Steven Greenberg International Computer Science Institute 1947 Center Street,


The Relation Between Speech Intelligibility and the Complex Modulation Spectrum
Steven Greenberg, International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704, USA
Takayuki Arai, Department of Electrical and Electronics Engineering, Sophia University, 7-1 Kioi-cho, Chiyoda-ku, Tokyo, Japan

Acknowledgements and Thanks
Technical Assistance: Joy Hollenback, Shino Sakaguchi and Rosaria Silipo
Research Funding: U.S. National Science Foundation

Germane Publications
PERCEPTUAL BASES OF SPEECH INTELLIGIBILITY
Arai, T. and Greenberg, S. (1998) Speech intelligibility in the presence of cross-channel spectral asynchrony. IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle, pp.
Greenberg, S. and Arai, T. (1998) Speech intelligibility is highly tolerant of cross-channel spectral asynchrony. Proceedings of the Joint Meeting of the Acoustical Society of America and the International Congress on Acoustics, Seattle, pp.
Greenberg, S. and Arai, T. (2001) The relation between speech intelligibility and the complex modulation spectrum. Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech-2001).
Greenberg, S., Arai, T. and Silipo, R. (1998) Speech intelligibility derived from exceedingly sparse spectral information. Proceedings of the International Conference on Spoken Language Processing, Sydney, pp.
Greenberg, S. (1996) Understanding speech understanding - towards a unified theory of speech perception. Proceedings of the ESCA Tutorial and Advanced Research Workshop on the Auditory Basis of Speech Perception, Keele, England, p.
Silipo, R., Greenberg, S. and Arai, T. (1999) Temporal constraints on speech intelligibility as deduced from exceedingly sparse spectral representations. Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech-99).

What is the Complex Modulation Spectrum?
The complex modulation spectrum combines both the magnitude and the phase of the modulation pattern distributed across the (tonotopically organized) spectrum
This representation predicts the intelligibility of (locally) time-reversed speech, a manipulation that dissociates the phase and magnitude components of the modulation spectrum
It thereby demonstrates the importance of modulation phase (across the frequency spectrum) for understanding spoken language
We’ll discuss this in greater detail shortly

Modulation Phase Across the Spectrum
The modulation phase pattern distributed across the (tonotopically organized) frequency spectrum is most easily visualized as follows:
The signal spectrum is partitioned into 15 separate 1/3-octave channels
Only 4 of the channels are retained; the remaining 11 are “tossed”
The upper edge of a channel is one octave below the lower edge of the adjacent (upper) channel
The modulation pattern in the waveform emanating from each channel is shown
Note that the timing of the peaks and valleys (i.e., phase) of the modulation pattern varies across the spectrum
An earlier study (Greenberg et al., 1998), using spectrally sparse speech signals, suggested that the modulation phase pattern across the frequency spectrum could be important for intelligibility
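The channel layout described on this slide can be checked with a little arithmetic. The sketch below is only an illustration: the lowest band edge (`F_LOW`) and the choice of which four channels are retained (`retained`) are assumptions, since the slide gives neither. It shows that keeping every fourth channel out of 15 contiguous 1/3-octave channels leaves exactly one octave between the upper edge of one retained channel and the lower edge of the next.

```python
import math

def third_octave_edges(f_low, n_channels=15):
    """Return (lower, upper) edges in Hz for contiguous 1/3-octave channels."""
    return [(f_low * 2 ** (k / 3), f_low * 2 ** ((k + 1) / 3))
            for k in range(n_channels)]

F_LOW = 250.0              # hypothetical lowest band edge in Hz (not given in the slides)
edges = third_octave_edges(F_LOW)

# Assumption: the four retained channels are every fourth channel.
# Starting at index 1 or 2 would satisfy the same spacing constraint.
retained = [0, 4, 8, 12]

# Gap, in octaves, between each retained channel and the next retained one.
gaps = [math.log2(edges[upper][0] / edges[lower][1])
        for lower, upper in zip(retained, retained[1:])]

for idx in retained:
    lo_hz, hi_hz = edges[idx]
    print(f"channel {idx:2d}: {lo_hz:8.1f} Hz to {hi_hz:8.1f} Hz")
print("gaps between retained channels (octaves):", [round(g, 6) for g in gaps])
```

With this spacing, 4 channels are kept and 11 are discarded, which matches the counts on the slide.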

What is (Locally) Time-Reversed Speech?
The speech signal is divided into successive segments, and each segment is reversed in time (“flipped” along its horizontal axis)
The length of the segment thus flipped is the primary experimental parameter
This signal manipulation has the effect of dissociating the phase and magnitude components of the modulation spectrum
What impact does this manipulation have on intelligibility?
Stimulus paradigm based on K. Saberi and D. Perrott (1999) “Cognitive restoration of reversed speech,” Nature 398: 760.
The experimental paradigm and acoustic analysis bear virtually no relation to those described in the Saberi and Perrott study
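The manipulation described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `locally_time_reverse`, the use of fixed-length contiguous segments starting at the first sample, and the handling of a shorter final segment (reversed on its own) are all assumptions.

```python
import numpy as np

def locally_time_reverse(signal, sample_rate, segment_ms):
    """Reverse a 1-D signal in time within consecutive fixed-length segments."""
    signal = np.asarray(signal)
    # Segment length in samples (at least one sample).
    seg_len = max(1, int(round(sample_rate * segment_ms / 1000.0)))
    out = np.empty_like(signal)
    for start in range(0, len(signal), seg_len):
        segment = signal[start:start + seg_len]
        # Assumption: a final, shorter segment is simply reversed as-is.
        out[start:start + len(segment)] = segment[::-1]
    return out

# Toy example: 10 samples at 1 kHz, reversed in 4 ms (4-sample) segments.
toy = np.arange(10.0)
reversed_toy = locally_time_reverse(toy, sample_rate=1000, segment_ms=4)
print(reversed_toy)
```

Applying the same function a second time with the same segment length restores the original signal, which is a convenient sanity check.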

Intelligibility of (Locally) Time-Reversed Speech
What impact does local time reversal have on intelligibility?
There is a progressive decline in intelligibility with increasing length of the reversed segment
When the segment exceeds 40 ms the intelligibility is very poor
What acoustic properties are correlated with this decline in intelligibility?
Stimuli were sentences from the TIMIT corpus
Sample sentence: “She washed his dark suit in greasy wash water all year”
80 different sentences, each spoken by a different speaker

Intelligibility Does NOT Depend Solely on the Magnitude Component of the Modulation Spectrum
[Figure panels: intelligibility as a function of reversed-segment length; modulation spectrum (magnitude component only)]
Saberi and Perrott had conjectured that the results of their experiment could be explained on the basis of the magnitude component of the modulation spectrum
Brain – 1, (Cognitive) Scientists – 0

Increasing Modulation Phase Dispersion as a Function of Increasing Reversed-Segment Length
Let’s examine the relation between modulation phase and intelligibility ...
[Figure panels: phase dispersion (relative to the original signal) across 40 sentences as a function of reversed-segment length (ms) (example = Hz sub-band; 4.5 Hz); intelligibility as a function of reversed-segment length]

Increasing Modulation Phase Dispersion Across Frequency as a Function of Increasing Reversed-Segment Length
Let’s examine the relation between modulation phase and intelligibility from a slightly different perspective ...
[Figure: phase dispersion across the spectrum (frequency axis) for a single sentence at 4.5 Hz]
For reversed-segment lengths greater than 40 ms there is significant phase dispersion (relative to the original) that becomes severe for segments > 80 ms
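The slides do not say how phase dispersion was quantified. As a rough sketch under that caveat, one plausible ingredient is the per-channel phase deviation of the processed signal from the original at a given modulation frequency, wrapped to the range (-π, π]. The function name `phase_deviation` and all the numbers below are hypothetical.

```python
import numpy as np

def phase_deviation(original, processed):
    """Wrapped phase difference (radians) between two sets of complex
    modulation components, one value per frequency channel."""
    original = np.asarray(original, dtype=complex)
    processed = np.asarray(processed, dtype=complex)
    # angle(processed * conj(original)) is the phase of processed minus the
    # phase of original, wrapped to (-pi, pi].
    return np.angle(processed * np.conj(original))

# Hypothetical per-channel modulation phases (radians) at one modulation frequency.
orig_phases = np.array([0.2, 1.0, -2.5, 3.0])
shifts = np.array([0.0, 0.5, -1.0, 0.4])      # hypothetical phase shifts

original = np.exp(1j * orig_phases)
processed = np.exp(1j * (orig_phases + shifts))

deviation = phase_deviation(original, processed)
print(np.round(deviation, 3))
```

The spread of these deviations across channels (or across sentences) could then serve as a dispersion measure; which statistic the authors used is not stated here.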

Computing the Complex Modulation Spectrum
Complex Modulation Spectrum = Magnitude x Phase
It is important to compute the phase dispersion across the spectrum with precision and to ascertain its impact on the global modulation spectral representation (shown on the following slide)
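To make "Magnitude x Phase" concrete, the sketch below treats the modulation component of each channel's envelope as a complex number and compares two ways of pooling across channels: averaging magnitudes only, and averaging the complex values so that phase matters. This is one plausible reading and not the authors' documented procedure. The function names, the synthetic envelopes, and the pooling rule are all assumptions.

```python
import numpy as np

def modulation_component(envelope, sample_rate, mod_freq_hz):
    """Complex Fourier coefficient of an envelope at one modulation frequency."""
    envelope = np.asarray(envelope, dtype=float)
    t = np.arange(len(envelope)) / sample_rate
    centered = envelope - envelope.mean()          # remove the DC offset
    return np.mean(centered * np.exp(-2j * np.pi * mod_freq_hz * t))

def pool(components):
    """Return (magnitude-only average, magnitude of the complex average)."""
    components = np.asarray(components, dtype=complex)
    magnitude_only = np.mean(np.abs(components))   # ignores phase
    complex_pooled = np.abs(np.mean(components))   # phase-sensitive
    return magnitude_only, complex_pooled

FS = 1000.0                  # hypothetical envelope sampling rate (Hz)
FM = 4.5                     # modulation frequency used as the example in the slides
t = np.arange(2000) / FS     # 2 s, i.e. 9 full cycles at 4.5 Hz

def synthetic_envelope(phase):
    # Hypothetical envelope: a 4.5 Hz modulation with a given phase.
    return 1.0 + 0.5 * np.cos(2 * np.pi * FM * t + phase)

aligned = [modulation_component(synthetic_envelope(0.0), FS, FM)
           for _ in range(4)]
dispersed = [modulation_component(synthetic_envelope(p), FS, FM)
             for p in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)]

aligned_mag, aligned_cplx = pool(aligned)
dispersed_mag, dispersed_cplx = pool(dispersed)
print("aligned:   magnitude-only", round(aligned_mag, 3), "complex", round(aligned_cplx, 3))
print("dispersed: magnitude-only", round(dispersed_mag, 3), "complex", round(dispersed_cplx, 3))
```

In this toy case the magnitude-only average is identical for the aligned and dispersed channels, while the phase-sensitive value collapses when the phases are scattered. That is the kind of dissociation the slides attribute to local time reversal.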

Intelligibility is Based on BOTH the Magnitude and Phase Components of the Modulation Spectrum
[Figure panels: intelligibility as a function of reversed-segment length; complex modulation spectrum (both magnitude and phase)]
The relation between intelligibility and the complex modulation spectrum isn’t bad!
Complex modulation spectrum computed for all 80 sentences

Complex Modulation Spectrum - Summary
Locally time-reversed speech provides a convenient means to dissociate the magnitude and phase components of the modulation spectrum
The intelligibility of time-reversed speech decreases as the segment length increases up to ca. 100 ms
Speech intelligibility is NOT correlated with the magnitude component of the low-frequency modulation spectrum
Speech intelligibility IS CORRELATED with the COMPLEX modulation spectrum (magnitude x phase)
Thus, the phase of the modulation pattern distributed across the frequency spectrum appears to play an important role in understanding spoken language

That’s All, Folks
Many Thanks for Your Time and Attention