The auditory and the visual percept evoked by the same audiovisual stimuli
Hartmut Traunmüller, Niklas Öhrström
Dept. of Linguistics, University of Stockholm

Theoretical background It is fairly obvious that acoustic speech stimuli evoke an auditory percept, while optic speech stimuli evoke a visual percept. In phonetic terms, these percepts agree with each other in congruent AV stimuli. In incongruent AV stimuli, this is not necessarily so.

Theoretical background [Figure 1: acoustic signal → auditory signal analysis → an auditory percept; optic signal → visual signal analysis → a visual percept; audiovisual integration → a common percept.]

Theoretical background According to the Motor Theory and the Direct Realist Theory of speech perception, the 'object' of speech perception is gestural in nature. These theories recognize only one percept of speech, which may be identified with the common AV percept in Figure 1.

Theoretical background Another theory, the Modulation Theory, considers speech primarily as modulated voice. The 'object' of normal speech perception is vocal in nature and consists of the modulation of a voice. The theory also allows for a different percept in lipreading, which is gestural and consists of the modulation of a face.

Theoretical background In order to clarify the situation, it is necessary to investigate not only the effects an optic speech signal has on auditory perception, but also those an acoustic speech signal has on visual perception of speech – and to compare these effects with each other.

Earlier studies In an earlier experiment, we presented congruent and incongruent AV stimuli to subjects. The stimuli consisted of different front vowels presented within a [g_g] frame. The vowels were incongruent with respect to openness (height) or roundedness or both. The subjects had to report which vowel they had heard. The response alternatives consisted of the nine letters that represent the long vowel phonemes of Swedish.

Earlier studies Typical result
A     V     Percept
ɡyɡ   ɡeɡ   → ɡiɡ
ɡeɡ   ɡyɡ   → ɡøɡ
ɡiɡ   ɡyɡ   → ɡyɡ
ɡeɡ   ɡiɡ   → ɡeɡ
Visual roundedness combined with auditory openness.

Earlier studies Explanation: Acoustic cues to openness (F1, etc.) are salient and reliable. Optic cues to openness are less reliable because of variation due to individual habits, attitude and emotion. Optic cues to roundedness are more reliable; rounded lips are easy to distinguish from unrounded ones in most conditions. Acoustic cues to roundedness (higher formants) lack salience and are less reliable.

Earlier studies That experiment was designed to investigate perception in terms of phonemic categories. However, subjects informally reported having heard vowels whose quality differed from that of ordinary Swedish vowels. Auditorily rounded vowels appeared to be shifted backwards in the front-back dimension when presented together with optically unrounded vowels.

The present study The present experiment aims to explore the cross-modal effects on the finer phonetic, sub-categorical perception of vowels. A further aim is to compare the auditory and the visual perception of the same AV stimuli.

The present study We reused a subset of the stimuli from the previous experiment (A = acoustic, V = optic, -- = no signal in that modality):
A     V
ɡyɡ   ɡiɡ
ɡyɡ   ɡeɡ
ɡyɡ   --
--    ɡyɡ
ɡeɡ   ɡiɡ
ɡeɡ   ɡyɡ
ɡeɡ   --
--    ɡeɡ
ɡiɡ   ɡyɡ
ɡiɡ   ɡeɡ
ɡiɡ   --
--    ɡiɡ

The present study There were 4 speakers: 2 male, 2 female.

The present study There were 8 perceivers, selected from a previous experiment in which they had shown sensitivity to the optic signal in incongruent audiovisual stimuli. All 8 were phonetically skilled and familiar with the IPA vowel chart.

The present study The subjects perceived the stimuli via headphones and a computer screen. The stimuli were presented in quasi-random order, and responses were given on electronic response sheets.

The present study The subjects were instructed to rate these dimensions of the vowels:
Lip rounding (6 degrees): 1st = unrounded, 5th = rounded
Lip spreading (3 degrees)
Openness (18 degrees): 2nd = close vowels, 6th = close-mid vowels
Backness (11 degrees auditorily, 7 degrees visually): 2nd = front vowels, 6th (auditorily) = central vowels

The present study In a first experiment, the subjects were instructed to rate the dimensions of vowels they heard. In a second experiment, the same subjects were instructed to rate the dimensions of vowels they saw. The incongruent stimuli were the same in the two experiments.

Results Openness opn vs. roundedness rnd; acoustic stimuli (listening only). Symbols represent speakers.

Results Openness opn vs. roundedness rnd; optic stimuli (lipreading only). Symbols represent speakers.

Results Heard openness of incongruent AV-stimuli vs. opn of A-stimuli (ρ =.80*). Symbols represent acoustically presented vowels.

Results Heard openness of incongruent AV-stimuli plotted against opn of A-stimuli (left, ρ = .71*) and of V-stimuli (right, ρ = .03). Symbols represent acoustically presented vowels.

Results Heard roundedness of incongruent AV-stimuli plotted against rnd of A-stimuli (left, ρ = -.05) and of V-stimuli (right, ρ = .79*). Symbols represent acoustically presented vowels.

Results Heard spreadness of incongruent AV-stimuli plotted against spr of A-stimuli (left, ρ = .07) and of V-stimuli (right, ρ = .90*). Symbols represent acoustically presented vowels.

Results Heard backness of incongruent AV-stimuli plotted against roundedness of A-stimuli (left, ρ = .71*) and of V-stimuli (right, ρ = -.59*). Symbols represent acoustically presented vowels.
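The slides report correlation coefficients (ρ) between mean ratings of the AV stimuli and of the unimodal stimuli, but do not say which coefficient was computed. Assuming ρ is Spearman's rank correlation (the usual meaning of the symbol), the sketch below shows how such a value is obtained from two lists of mean ratings. The function names and the example ratings are hypothetical, not data from the study.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    heard = [2.1, 3.4, 1.2, 4.8]     # hypothetical mean ratings, AV stimuli
    unimodal = [2.0, 3.9, 1.5, 4.1]  # hypothetical mean ratings, A-only stimuli
    print(spearman_rho(heard, unimodal))  # 1.0 (identical rank order)
```

Because only rank order matters, ρ is insensitive to how evenly subjects used the rating scales, which suits ordinal ratings such as the 6- to 18-degree scales used here.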

Results The results were subjected to linear regression analyses in which the average ratings obtained in each unimodal presentation were taken as candidate independent variables together with the interaction terms. A comparison of the regression equations that describe the results of the listening task and the viewing task shows that the two percepts need to be distinguished from each other.
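The regression analysis described above can be sketched as ordinary least squares in which the unimodal mean ratings and their product (the interaction term) serve as predictors. The sketch assumes NumPy; the function name, the variable names and the noise-free toy data are hypothetical, and it fits all candidate terms at once rather than reproducing the authors' predictor-selection procedure, which the slides do not describe.

```python
import numpy as np


def fit_with_interaction(a: np.ndarray, v: np.ndarray, y: np.ndarray):
    """Least-squares fit of y ~ b0 + bA*a + bV*v + bAV*(a*v).

    a, v: mean ratings from the acoustic-only and optic-only presentations.
    y:    mean ratings of the corresponding AV stimuli.
    Returns (coefficients [b0, bA, bV, bAV], r_squared).
    """
    # Design matrix: intercept, both unimodal predictors, interaction term.
    X = np.column_stack([np.ones_like(a), a, v, a * v])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coefs, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    opn_a = rng.uniform(1, 18, size=24)  # hypothetical A-only means
    opn_v = rng.uniform(1, 18, size=24)  # hypothetical V-only means
    # Toy "listening" pattern, dominated by the acoustic predictor.
    opn_heard = 0.5 + 0.9 * opn_a + 0.1 * opn_v
    coefs, r2 = fit_with_interaction(opn_a, opn_v, opn_heard)
    print(coefs.round(3), round(r2, 3))
```

A fit whose visual coefficient comes out near zero would correspond to a percept based on the acoustic cues alone; non-negligible weights on both predictors would correspond to a weighted sum of acoustic and optic cues.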

Results The difference is particularly clear in the dimension of openness:
$\mathrm{opn}_{\mathrm{heard}} = \ldots\,\mathrm{opn}_{A} \ldots\,\mathrm{opn}_{V}$ ($r^2 = 0.97$)
$\mathrm{opn}_{\mathrm{seen}} = \ldots\,\mathrm{opn}_{A} \ldots\,\mathrm{opn}_{V}$ ($r^2 = 0.81$)
In the listening task, the estimates were based on the acoustic cues alone. In the viewing task, they were based on a weighted sum of the acoustic and the optic cues.

Results In the perception of roundedness and spreadness, there were only minor differences between the results of the two tasks. In these dimensions, our subjects relied almost entirely on optic cues, not only when asked what they saw but also when asked what they heard.

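Results There was, however, an interesting difference in perceived backness:
$\mathrm{bac}_{\mathrm{heard}} = \ldots\,\mathrm{rnd}_{A} \ldots\,\mathrm{rnd}_{AV}$ ($r^2 = 0.74$)
$\mathrm{bac}_{\mathrm{seen}} = \ldots\,\mathrm{bac}_{V}$ ($r^2 = 0.22$)
Note that $\mathrm{bac}_{\mathrm{heard}}$ is given by cues reflecting roundedness rather than backness.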

Discussion There are two hypothetical explanations for an effect of roundedness on perceived backness:
1. The distance from the lips to the dorso-palatal 'place of articulation' is increased by lip rounding as well as by tongue retraction. This would provide an articulatory (gestural) explanation.
2. The upper formants (F2′) are lowered by lip rounding as well as by tongue retraction. This would provide an auditory explanation.
Both explanations would be consistent with the placement of the rounded vowels to the right of their unrounded counterparts in IPA charts.

Discussion Analysis of perceived backness
Stimulus A (acoustic)   Stimulus V (optic)   Prediction, Expl. 1 (gestural)   Prediction, Expl. 2 (auditory)   Observation
rounded                 unrounded            fronted                          retracted                        retracted
unrounded               rounded              retracted                        fronted                          fronted
Conclusion: The effect is due to auditory (F2′) rather than articulatory (gestural) associations.

Discussion The observed effect of lip rounding on perceived backness cannot be explained by a late-integration hypothesis: such a hypothesis would require non-front unrounded vowel phonemes and phones, which Swedish lacks. This is clear and direct evidence for early, pre-categorical integration. The result also shows that this integration takes place in an auditory space in which roundedness and backness have an essential component in common.

Discussion [Figure 1 shown again: the auditory percept, the visual percept and the common AV percept.]

Discussion [Figure: acoustic signal → auditory analysis (demodulation) → vocal percept (modulation of voice), with integration of gestural information; optic signal → visual analysis (demodulation) → gestural percept (modulation of face), with integration of vocal information.]

Summary Some earlier findings:
1) In clear AV vowel stimuli, Swedes hear roundedness predominantly by eye, but openness only by ear. (The strength of the influence of a modality reflects the reliability of the information.)
2) A predominantly male minority is less sensitive to vision. (There is a significant sex difference.)
3) The presence of visible lip rounding (a 'marked' feature) is more influential than its absence.
Ref: H. Traunmüller and N. Öhrström (2007) "Audiovisual perception of openness and lip rounding in front vowels", Journal of Phonetics 35:

Summary Recent findings:
4) In addition to the auditory (vocal) percept, which may be influenced by vision, there is a visual (gestural) percept, which may be influenced by audition. (There are two AV percepts!)
5) The auditory perception of frontness/backness is based on AV integration at the level of phonetically informative properties, prior to categorization. (This is likely to hold more generally for AV integration.)

Summary Recent findings:
6) In AV vocal perception, only a minority comes close to optimal (Bayesian) integration.
7) In AV gesture perception (by normal-hearing subjects), integration is less optimal.
Ref: H. Traunmüller (2006) "Cross-modal interactions in visual as opposed to auditory perception of vowels", Working Papers 52: (Lund University, Dept. of Linguistics).
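"Optimal (Bayesian) integration" is commonly formalized as the maximum-likelihood fusion of two independent Gaussian cues: each modality's estimate is weighted by its reliability (inverse variance), and the fused estimate is less variable than either cue alone. The slides do not give the model used in the cited working paper, so the sketch below illustrates only that standard formulation; the `Cue` type, the function name and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    estimate: float  # e.g. a rating on a perceptual scale such as openness
    variance: float  # variability of that estimate; must be > 0


def optimal_combination(auditory: Cue, visual: Cue) -> Cue:
    """Inverse-variance (reliability-weighted) fusion of two independent Gaussian cues."""
    if auditory.variance <= 0 or visual.variance <= 0:
        raise ValueError("variances must be positive")
    rel_a = 1.0 / auditory.variance  # reliability of the auditory cue
    rel_v = 1.0 / visual.variance    # reliability of the visual cue
    w_a = rel_a / (rel_a + rel_v)    # auditory weight; the two weights sum to 1
    w_v = 1.0 - w_a
    fused = w_a * auditory.estimate + w_v * visual.estimate
    # The fused variance is smaller than either input variance.
    return Cue(fused, 1.0 / (rel_a + rel_v))


if __name__ == "__main__":
    # Hypothetical openness cues: reliable by ear, unreliable by eye.
    print(optimal_combination(Cue(2.0, 0.25), Cue(6.0, 4.0)))
```

With an auditory variance of 0.25 and a visual variance of 4.0, the auditory cue receives the weight 4 / (4 + 0.25) = 16/17 ≈ 0.94, so the fused estimate stays close to the auditory value. Under this model, "less optimal" integration would show up as observed weights that deviate from these reliability-based ones.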

Conclusions
• The results clash irreconcilably with gestural-only theories of speech perception, such as the Motor Theory and the Direct Realist Theory.
• Models of auditory-visual integration need to be extended in order to capture the two percepts.
• The Modulation Theory, according to which speech is primarily modulated voice but also modulated face, provides a possible foundation for such an extension.

Acknowledgement Research supported by the Swedish Research Council

Thank you for your attention!

Results Left: Seen spreadness plotted against seen roundedness. Right: Heard spreadness plotted against heard roundedness. Symbols represent acoustically presented vowels.

The Modulation Theory Speech is modulated voice and face. What is said is conveyed by the modulation. Perceptual recovery requires 'demodulation'. Users associate modulations with corresponding somatosensations. Ref: H. Traunmüller, "Speech considered as modulated voice".