Temporal Properties of Spoken Language Steven Greenberg The Speech Institute

Acknowledgements and Thanks. Research funding: U.S. Department of Defense and U.S. National Science Foundation. Research collaborators: Hannah Carvey, Shawn Chang, Ken Grant, Leah Hitchcock, Joy Hollenback, Rosaria Silipo.

For Further Information Consult the web site:

This presentation examines WHY the temporal properties of speech are the way they are Some General Questions

Specifically, we ask …. WHY is the average duration of a syllable (in spontaneous speech) ca. 200 ms? Some General Questions

Specifically, we ask …. WHY are some syllables significantly longer than others? Some General Questions

Specifically, we ask …. WHY are some phonetic segments (usually vowels) longer than others (typically consonants)? Some General Questions

And …. WHAT can the temporal properties of speech tell us about spoken language? Some General Questions

The temporal properties of spoken language reflect INFORMATION contained in the speech signal Conclusions

PROSODY is the most sensitive LINGUISTIC reflection of INFORMATION (PROSODY refers to the RHYTHM and TEMPO of syllables in an utterance) Conclusions

Much of the temporal variation in spoken language reflects prosodic factors Conclusions

Hence, prosody is the key to understanding much of the temporal (and phonetic) variation observed in spoken language Conclusions

Prosody also shields the information contained in the speech signal against the deleterious forces of nature (a.k.a. background noise and reverberation) Conclusions

Therefore, to understand spoken language, it is also necessary to understand how prosody is encapsulated in the speech signal (acoustic and otherwise). This is the focus for today’s presentation. But, before considering prosody per se, let’s first examine an important acoustic property of the speech signal …. Conclusions

Importance of Slow Modulations: SLOW modulation of acoustic energy, reflecting movement of the speech articulators, is crucial for understanding spoken language. The fine spectral detail is FAR less important (80% of the spectrum can be discarded without much impact on intelligibility; ca. 90% intelligibility is retained). WHY should this be so?

Quantifying Modulation Patterns in Speech: The modulation spectrum provides a convenient quantitative method for computing the amount of modulation in the speech signal. The technique is illustrated for a paradigmatic, simple signal. The computation is performed for each spectral channel separately.
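Below is a minimal sketch (not code from the presentation) of how such a per-channel modulation spectrum might be computed: band-pass filter the signal into spectral channels, extract each channel’s slow amplitude envelope, and Fourier-analyse that envelope. The band edges, filter order, and the use of a Hilbert envelope are illustrative assumptions.

```python
# Hedged sketch of a per-channel modulation-spectrum computation.
# Assumptions (not from the slides): four octave-wide bands, Hilbert envelopes,
# and a plain FFT of each envelope.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_spectrum(signal, fs, band_edges=(300, 600, 1200, 2400, 4800)):
    """Return (mod_freqs, spectra): one modulation spectrum per spectral channel."""
    spectra = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # 1. Isolate one spectral channel with a band-pass filter
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        # 2. Extract the slow amplitude envelope of that channel
        envelope = np.abs(hilbert(band))
        envelope = envelope - envelope.mean()  # remove DC so the peak reflects modulation
        # 3. Fourier-analyse the envelope: energy as a function of modulation frequency
        spectra.append(np.abs(np.fft.rfft(envelope)))
    mod_freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return mod_freqs, np.array(spectra)
```

In practice the envelope spectra would be averaged over frames and normalized, but the sketch captures the channel-by-channel logic described above.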

Modulation Spectrum of Spoken Language: The low-frequency modulation patterns can thus be quantified using the modulation spectrum, which looks like this for spontaneous speech: a broad peak of energy between 3 and 10 Hz.

Modulation Spectrum of Spoken Language: Linguistically, the modulation spectrum reflects SYLLABLES. The distribution of syllable duration is similar to the modulation spectrum. [Figure: syllable-duration distribution and modulation spectrum for 15 minutes of spontaneous material from a single Japanese speaker]
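To make the duration-to-modulation correspondence concrete, the toy example below (the durations are invented, not data from the Japanese corpus) converts syllable durations into approximate modulation rates, so a duration histogram can be compared directly with the modulation spectrum.

```python
# Toy sketch: a syllable lasting d seconds corresponds roughly to a 1/d Hz
# modulation rate, so a 200 ms syllable maps onto the ~5 Hz region where the
# modulation spectrum of speech peaks. Durations below are purely illustrative.
import numpy as np

syllable_durations_ms = np.array([120, 150, 175, 180, 200, 210, 240, 260, 320, 90])

rates_hz = 1000.0 / syllable_durations_ms        # e.g., 200 ms -> 5 Hz
hist, edges = np.histogram(rates_hz, bins=np.arange(1, 16))
for lo, count in zip(edges[:-1], hist):
    print(f"{int(lo):2d}-{int(lo) + 1:2d} Hz | {'#' * int(count)}")
```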

Variation in Syllable Duration: Why do syllables vary so much in duration? And why is the peak of the modulation spectrum so broad? [Figure: syllable-duration distribution and modulation spectrum for 15 minutes of spontaneous material from a single Japanese speaker]

Variation in Syllable Duration: Why do syllables vary so much in duration? In large part, it is because syllables carry differential amounts of information. Longer syllables tend to contain more information than shorter syllables. Below, the vowels in “ride” and “bikes” are longer (as well as more intense) than those in other words.

Duration Correlates with Syllable Stress: Duration is one of the most important correlates of syllable accent (prosody). We know this from studies SIMULATING syllable prominence (accent) labeling by highly trained linguistic transcribers. In one study, duration was shown to be the single most important acoustic property related to syllable prominence in American English, ahead of amplitude and pitch (Silipo and Greenberg, 1999).
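As an illustration only (this is not the Silipo and Greenberg model), a crude prominence scorer could weight the three cues just mentioned, with duration weighted most heavily; the weights and the z-scored inputs are hypothetical.

```python
# Illustrative sketch: weight z-scored duration, amplitude, and pitch cues to score
# syllable prominence, with duration given the largest weight. Weights are invented.
def prominence_score(duration_z, amplitude_z, pitch_z,
                     w_duration=0.5, w_amplitude=0.3, w_pitch=0.2):
    """Return a crude accent/prominence score from z-scored acoustic cues."""
    return w_duration * duration_z + w_amplitude * amplitude_z + w_pitch * pitch_z

# A long, loud syllable with relatively flat pitch still scores as prominent
print(prominence_score(duration_z=1.8, amplitude_z=1.0, pitch_z=0.1))  # 1.22
```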

Word Duration and Syllabic Accent Level: Words that contain an accented syllable tend to be considerably longer than unaccented words. What are the implications of this insight? [Figure: word-duration distributions for heavily accented, lightly accented, and unaccented words, and for all words]

Word Duration and Stress Accent Level: The broad distribution of word duration (and, in turn, syllable duration) largely reflects the co-existence of accented and unaccented words (and syllables), often within the same utterance. This interleaving of long and short syllables reflects the DIFFERENTIAL DISTRIBUTION of ENTROPY across an utterance. [Figure: word-duration distributions for heavily accented, lightly accented, and unaccented words, and for all words]

Breadth of the Modulation Spectrum: The broad bandwidth of the modulation spectrum, as it reflects syllable duration, encapsulates the heterogeneity in syllabic and lexical duration associated with variation in syllable prominence. Does this insight have implications for spoken language? [Figure: modulation spectrum of 40 TIMIT sentences (computed across a 6-kHz bandwidth) for unaccented syllables, heavily accented syllables, and all accent levels (convergence)]

Modulation Spectrum Breadth & Intelligibility: Long ago, Houtgast and Steeneken demonstrated that the modulation spectrum is highly predictive of speech intelligibility. In highly reverberant environments, the modulation spectrum’s peak is severely attenuated and shifted down to ca. 2 Hz, and the signal becomes largely unintelligible. What does this imply with respect to prosody? [Figure: modulation spectrum under clean and reverberant conditions, based on an illustration by Hynek Hermansky]

Intelligibility and Modulation Frequency: As the modulation spectrum is progressively low-pass filtered, intelligibility declines, suggesting that intelligibility requires both long and short (i.e., accented and unaccented) syllables. (However, some syllables, the accented ones, are “more equal” than others.) [Figure: intelligibility as a function of modulation-frequency cutoff, for unaccented syllables, heavily accented syllables, and all accent levels (convergence); Silipo et al. (1999)]
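The low-pass manipulation referred to above can be sketched as follows: smooth a channel’s amplitude envelope below a chosen modulation cutoff and re-impose it on the carrier. The cutoff, filter order, and envelope method are illustrative assumptions, not the exact procedure of Silipo et al. (1999).

```python
# Hedged sketch: low-pass filter a band's temporal envelope at mod_cutoff_hz,
# then re-impose the smoothed envelope on the (unit-envelope) carrier.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def lowpass_modulations(band_signal, fs, mod_cutoff_hz=8.0):
    """Return the band with its envelope smoothed below mod_cutoff_hz."""
    envelope = np.abs(hilbert(band_signal))
    sos = butter(2, mod_cutoff_hz, btype="lowpass", fs=fs, output="sos")
    smoothed = np.maximum(sosfiltfilt(sos, envelope), 0.0)
    carrier = band_signal / np.maximum(envelope, 1e-9)  # strip the original envelope
    return carrier * smoothed                           # re-impose the smoothed one
```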

Syllable Duration and Accent (canonical syllable forms; V = Vowel, C = Consonant): Heavily accented syllables are generally % longer than their unaccented counterparts. The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV). This pattern implies that accent has its greatest impact on vocalic duration.

Vowel Duration - Accent Level/Syllable Form (canonical syllable forms): Vowels in accented syllables are at least twice as long as their unaccented counterparts. This pattern implies that the syllabic nucleus absorbs much of accent’s impact (at least as far as duration is concerned).

Syllable Onset/Coda Duration and Accent (canonical syllable forms): ONSETS of accented syllables are generally 50-60% longer than their unaccented counterparts and are somewhat sensitive to stress accent, while there is little difference in duration between accented and unaccented CODA constituents. CODAS are relatively insensitive to prosody and carry less information than onsets.

Sensitivity of Syllable Constituents to Accent: Thus, the duration of syllabic nuclei (usually vowels) is most sensitive to syllable accent, while syllable CODAS are LEAST sensitive to prosodic accent. This differential sensitivity to prosodic accent reflects some fundamental principles of information encoding within the syllable, as well as principles of auditory function (e.g., onsets are more important than offsets for evoking neural discharge; hence much of the neural entropy is embedded in the onset).

Syllable Prominence (Accent) Illustrated: [Figure: full-spectrum view of the word “seven” ([s] [eh] [vx] [en]) from OGI Numbers95, comparing an accented and an unaccented syllable (mean durations shown) and marking the onset, nucleus, ambi-syllabic, and “pure” juncture constituents]

Robustness Based on Temporal Properties: Reflections from walls and other surfaces routinely modify the spectro-temporal structure of the speech signal under everyday conditions. Yet, the intelligibility of speech is remarkably stable. This implies that intelligibility is NOT based on the spectro-temporal DETAILS but rather on some more basic, TEMPORAL parameter(s).

Temporal Basis of Intelligibility: Four narrow spectral channels (“slits”), presented synchronously, yield ca. 90% intelligibility. Intelligibility for two channels ranges between 10 and 60%.

Desynchronizing Slits Affects Intelligibility: When the center slits lead or lag the lateral slits by more than 25 ms, intelligibility suffers significantly. Intelligibility plummets to ca. 55% for leads/lags of 50 ms, and declines to 40% for leads/lags of 75 ms.

Slit Asynchrony Affects Intelligibility: Asynchrony greater than 50 ms results in intelligibility lower than baseline. A trough in performance occurs at ca. ms asynchrony, roughly the interval associated with the syllable. What does this mean? Perhaps that there is a syllable-length time window of integration.
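A minimal sketch of the slit paradigm described in the preceding slides: extract four narrow spectral slits and delay the two center slits relative to the lateral ones. The slit center frequencies, bandwidths, and the 50 ms lag are illustrative assumptions, not the exact stimulus parameters of the study.

```python
# Hedged sketch of slit desynchronization: band-pass four narrow "slits" and
# delay the center pair by lag_ms relative to the lateral pair.
# Assumes a sampling rate high enough for the highest slit (>~11 kHz here).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def make_slit(signal, fs, lo_hz, hi_hz):
    """Extract one narrow spectral slit with a band-pass filter."""
    sos = butter(6, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

def desynchronized_slits(signal, fs, lag_ms=50):
    """Mix four slits, delaying the two center slits by lag_ms (illustrative bands)."""
    lateral = make_slit(signal, fs, 300, 400) + make_slit(signal, fs, 5300, 5400)
    center = make_slit(signal, fs, 1100, 1200) + make_slit(signal, fs, 2100, 2200)
    lag = int(round(fs * lag_ms / 1000.0))
    delayed_center = np.concatenate([np.zeros(lag), center])[: len(center)]
    return lateral + delayed_center
```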

Importance of Visual Cues: Visual cues often supplement the acoustic signal, and are particularly important in adverse acoustic environments (i.e., noise and reverberation). What is the basis of visual supplementation to understanding speech? One possibility is the common modulatory properties of the visual and acoustic components of the speech signal. [Figure: amplitude fluctuation (RMS amplitude, dB) in different spectral regions (WB, F1, F2, F3) and lip-aperture variation (lip area, in²) over time for the sentence “Watch the log float in the wide river”; data courtesy of Ken Grant]

Combining Audio and Visual Cues: Visual cues (a.k.a. speechreading) also provide important information about consonantal place of articulation and the nature of both prosodic and vocalic properties. One can desynchronize the audio and visual streams and measure the impact on intelligibility. [Figure: baseline condition with synchronous A/V, plus conditions in which the video leads by 40–400 ms or the audio leads by 40–400 ms]

Focus on Audio-Leading-Video Conditions: When the AUDIO signal LEADS the VIDEO, there is a progressive decline in intelligibility, similar to that observed for audio-alone signals. These data are next compared with data from the audio-alone study to illustrate the similarity in the slope of the function.

Comparison of A/V and Audio-Alone Data: The decline in intelligibility for the audio-alone condition is similar to that of the audio-leading-video condition. Such similarity in the slopes associated with intelligibility in the two experiments suggests that the underlying mechanisms may be similar. The intelligibility of the audio-alone signals is higher than that of the A/V signals because slits 2+3 are highly intelligible by themselves.

Focus on Video-Leading-Audio Conditions: When the VIDEO signal LEADS the AUDIO, intelligibility is preserved for asynchrony intervals as large as 200 ms. These data are rather strange, implying some form of “immunity” against intelligibility degradation when the video channel leads the audio.

Auditory-Visual Integration - the Full Monty: The slope of the intelligibility decline associated with the video-leading-audio conditions is rather different from that of the audio-leading-video conditions. WHY?

Time Constants of Audio-Visual Integration: The temporal limits of combining visual and acoustic information are SYLLABLE length, particularly when the video precedes the audio signal, suggesting that visual speech cues are syllabically organized.

WHY are the temporal properties of speech the way they are? Because the brain requires such intervals to combine information across sensory modalities and to associate the sensory streams with meaning. Some General Answers

WHY is the average duration of a syllable (in spontaneous speech) ca. 200 ms? The syllable’s duration reflects a basic sensori-motor integration time constant and can be considered to represent the sampling rate of consciousness Some General Answers

WHY are some syllables significantly longer than others? The heterogeneity in duration reflects the unequal distribution of entropy across an utterance and is a basic requirement for decoding the speech signal Some General Answers

WHY are some phonetic segments (usually vowels) longer than others (typically consonants)? Vowels reflect the influence of prosodic factors far more than consonants, and therefore convey more information concerning a syllable’s intrinsic entropy than their consonantal counterparts Some General Answers

WHAT can the temporal properties of speech tell us about spoken language in general? They provide a general theoretical framework for understanding the organization of spoken language and how the brain decodes the speech signal. Some General Answers

Conclusions and Summary: The temporal properties of spoken language reflect INFORMATION contained in the speech signal. PROSODY is the most sensitive LINGUISTIC reflection of INFORMATION. Much of the temporal variation in spoken language reflects prosodic factors. Hence, prosody is the key to understanding much of the temporal (and phonetic) variation observed in spoken language. Prosody shields the information contained in the speech signal against the deleterious forces of nature (a.k.a. background noise and reverberation). Therefore, to understand spoken language, it is also necessary to understand how prosody is encapsulated in the speech signal.

That’s All Many Thanks for Your Time and Attention

Language - A Syllable-Centric Perspective: An empirically grounded perspective of spoken language focuses on the SYLLABLE and syllabic ACCENT as the interface between “sound” and “meaning” (or at least lexical form). [Figure: linguistic tiers and modes of analysis (energy, time-frequency, prosodic accent, phonetic interpretation, manner segmentation: Fric, Voc, V, Nas, J) for the word “seven”]

Syllable as Interface between Sound & Meaning: The syllable serves as a key organizational unit that binds the lower and higher tiers of linguistic organization. There is a systematic relationship between the syllable and the articulatory-acoustic features comprising phonetic constituents. Moreover, the syllable is the primary carrier of prosodic information and is linked to morphology and the lexicon as well.

Modulation Spectra Across Frequency: These slow modulation patterns are DIFFERENTIALLY distributed across the acoustic frequency spectrum. The modulation spectra are similar (in certain respects) across frequency, but vary in certain important ways ….

Modulation Spectrum Varies Across Frequency: In Houtgast and Steeneken’s original formulation of the STI, the modulation spectrum was assumed to be similar across the acoustic frequency axis. An analysis of spoken English (in this instance, TIMIT sentences) suggests that their formulation was not quite accurate for the high-frequency channels, as shown below: the highest channels have considerable energy between 10 and 30 Hz.

Summary of the Presentation: Low-frequency modulation patterns reflect SYLLABLES, as well as their specific content and structure.

Summary of the Presentation: Such temporal properties reflect a basic sensory-motor time constant of ca. 200 ms: the SAMPLING RATE of CONSCIOUSNESS.

Modulation Spectrum as Predictor of Intelligibility: In the 1970s, Houtgast and Steeneken demonstrated that the magnitude of the modulation spectrum could be used to predict speech intelligibility over a wide range of acoustic environments. In optimum listening conditions, the modulation spectrum has a peak between 4 and 5 Hz, as shown below. In highly reverberant environments, the modulation spectrum’s peak is attenuated and shifted down to ca. 2 Hz, and the speech becomes increasingly unintelligible. [Figure: modulation spectrum under clean and reverberant conditions, based on an illustration by Hynek Hermansky]
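The attenuation of the modulation peak by reverberation can be illustrated with the classic modulation transfer function for an ideal exponential reverberant decay, which underlies the STI framework; the T60 values below are illustrative choices, not figures from the presentation.

```python
# Sketch: modulation transfer function (MTF) of an ideal exponentially decaying
# reverberant field; long T60 values strongly attenuate modulations above a few Hz.
import numpy as np

def reverberation_mtf(mod_freq_hz, t60_s):
    """Modulation reduction factor m(F) = 1 / sqrt(1 + (2*pi*F*T60/13.8)^2)."""
    return 1.0 / np.sqrt(1.0 + (2.0 * np.pi * mod_freq_hz * t60_s / 13.8) ** 2)

freqs = np.array([2.0, 5.0, 10.0])                 # modulation frequencies (Hz)
for t60 in (0.3, 1.0, 3.0):                        # mild, moderate, severe reverberation
    print(f"T60 = {t60:.1f} s:", np.round(reverberation_mtf(freqs, t60), 2))
```

For a T60 of 3 s the 5 and 10 Hz modulations are reduced to a small fraction of their original depth, which is consistent with the downward shift of the peak toward ca. 2 Hz described above.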

Audio-Visual Integration of Speech: In face-to-face interaction, the visual component of the speech signal can be extremely important for understanding spoken language (particularly in noisy and/or reverberant conditions). It is therefore of interest to ascertain the brain’s tolerance for asynchrony between the audio and visual components of the speech signal. This exercise can also provide potentially illuminating insights into the nature of the neural mechanisms underlying speech comprehension. Specifically, the contribution of speechreading cues can provide clues about what REALLY is IMPORTANT in the speech signal for INTELLIGIBILITY, independent of the sensory modality involved.

In Conclusion ….

Language - A Syllable-Centric Perspective: A more empirically grounded perspective of spoken language focuses on the SYLLABLE as the interface between “sound,” “vision,” and “meaning.” Important linguistic information is embedded in the TEMPORAL DYNAMICS of the speech signal (irrespective of the modality).

Germane Publications
Arai, T. and Greenberg, S. (1998) Speech intelligibility in the presence of cross-channel spectral asynchrony. IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle.
Grant, K. and Greenberg, S. (2001) Speech intelligibility derived from asynchronous processing of auditory-visual information. Proceedings of the ISCA Workshop on Audio-Visual Speech Processing (AVSP-2001).
Greenberg, S. and Arai, T. (1998) Speech intelligibility is highly tolerant of cross-channel spectral asynchrony. Proceedings of the Joint Meeting of the Acoustical Society of America and the International Congress on Acoustics, Seattle.
Greenberg, S., Arai, T. and Silipo, R. (1998) Speech intelligibility derived from exceedingly sparse spectral information. Proceedings of the International Conference on Spoken Language Processing, Sydney.
Greenberg, S. (1996) Understanding speech understanding - towards a unified theory of speech perception. Proceedings of the ESCA Tutorial and Advanced Research Workshop on the Auditory Basis of Speech Perception, Keele, England.
Silipo, R., Greenberg, S. and Arai, T. (1999) Temporal constraints on speech intelligibility as deduced from exceedingly sparse spectral representations. 6th European Conference on Speech Communication and Technology (Eurospeech-99).

The Energy Arc Illustrated: Syllables rise and fall in energy over the course of their duration. Vocalic nuclei are highest in amplitude. Onset consonants gradually rise in energy, arching towards the peak. Coda consonants decline in amplitude, usually more abruptly than onsets. [Figure: spectro-temporal profile (STeP), spectrogram, and waveform of “seven”]
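The energy arc can be made visible with a short-time RMS energy contour of the kind sketched below; the 10 ms frame size is an illustrative choice, and the input is any mono waveform.

```python
# Sketch: frame-by-frame RMS energy (dB) of a waveform, which rises through the
# syllable onset, peaks at the vocalic nucleus, and falls through the coda.
import numpy as np

def energy_contour(signal, fs, frame_ms=10):
    """Return the short-time RMS energy contour in dB."""
    frame = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame
    frames = np.asarray(signal[: n_frames * frame]).reshape(n_frames, frame)
    rms = np.sqrt((frames ** 2).mean(axis=1) + 1e-12)
    return 20.0 * np.log10(rms)
```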

Audio-Video Integration – Summary: Spectrally sparse audio and speech-reading information provide minimal intelligibility when presented alone, in the absence of the other modality. This same information can, when combined across modalities, provide good intelligibility (63% average accuracy). When the audio signal leads the video, intelligibility falls off rapidly as a function of modality asynchrony. When the video signal leads the audio, intelligibility is maintained for asynchronies as long as 200 ms. For eight out of nine subjects, the highest intelligibility is associated with conditions in which the video signal leads the audio (often by ms). There are many potential interpretations of the data. The interpretation currently favored (by the speaker) posits a relatively long (200 ms) integration buffer for audio-visual integration when the brain is confronted exclusively (even for short intervals) with speech-reading information (as occurs when the video signal leads the audio). The data further suggest that place-of-articulation cues evolve over syllabic intervals of ca. 200 ms in length and could therefore potentially apply to models of speech processing in general. Speechreading also appears to provide important prosodic information that is extremely useful for decoding the speech signal.

Take Home Messages: The temporal properties of spoken language reflect INFORMATION contained in the speech signal. PROSODY is the most sensitive LINGUISTIC reflection of INFORMATION. Much of the temporal variation in spoken language reflects prosodic factors. Hence, prosody is the key to understanding much of the temporal (and phonetic) variation observed in spoken language. Prosody is what likely shields the information contained in the speech signal against the deleterious forces of nature (a.k.a. background noise and reverberation). Therefore, to understand spoken language, it is also necessary to understand how prosody is encapsulated in the speech signal. This is the focus for today’s presentation. But, before considering prosody, let’s first examine an important acoustic property of the speech signal ….