1
The auditory and the visual percept evoked by the same audiovisual stimuli
Hartmut Traunmüller, Niklas Öhrström
Dept. of Linguistics, University of Stockholm
2
Theoretical background It is fairly obvious that acoustic speech stimuli evoke an auditory percept, while optic speech stimuli evoke a visual percept. In phonetic terms, these percepts agree with each other in congruent AV stimuli. In incongruent AV stimuli, this is not necessarily so.
3
Theoretical background Diagram labels: Acoustic signal; Optic signal; Auditory signal analysis; Visual signal analysis; An auditory percept; A visual percept
4
Theoretical background Diagram labels: Acoustic signal; A common percept; Optic signal; Auditory signal analysis; Audiovisual integration; Visual signal analysis; An auditory percept; A visual percept
5
Theoretical background According to the Motor Theory and the Direct Realist theory of speech perception, the ‘object’ of speech perception is gestural in nature. These theories know of only one percept of speech, which may be identified with the common AV-percept in Figure 1.
6
Theoretical background Another theory, the Modulation Theory, considers speech primarily as modulated voice. The ‘object’ of normal speech perception is vocal in nature and consists in the modulation of a voice. The theory allows for a different percept in lip reading. This is gestural and consists in the modulation of a face.
7
Theoretical background In order to clarify the situation, it is necessary to investigate not only the effects an optic speech signal has on auditory perception, but also those an acoustic speech signal has on visual perception of speech – and to compare these effects with each other.
8
Earlier studies In an earlier experiment, we presented congruent and incongruent AV stimuli to subjects. The stimuli consisted of different front vowels presented within a [g_g] frame. The vowels were incongruent with respect to openness (height), roundedness, or both. The subjects had to report which vowel they had heard. The response alternatives were the nine letters that represent the long vowel phonemes of Swedish.
12
Earlier studies Typical result (columns: A, V, Percept): ɡyɡ ɡeɡ ɡeɡ → ɡiɡ ɡeɡ ɡyɡ → ɡøɡ ɡøɡ ɡiɡ ɡyɡ → ɡyɡ ɡyɡ ɡeɡ ɡeɡ ɡiɡ → ɡeɡ. Visual roundedness combined with auditory openness.
14
Earlier studies Explanation Acoustic cues to openness (F1 etc.) are salient and reliable. Optic cues to openness are less reliable because of variation due to individual habits, attitude and emotion. Optic cues to roundedness are more reliable; rounded lips are easy to distinguish from unrounded lips in most conditions. Acoustic cues to roundedness (higher formants) lack salience and are less reliable.
16
Earlier studies The earlier experiment was designed to investigate perception in terms of phonemic categories. However, subjects informally reported having heard vowels whose quality differed from that of ordinary Swedish vowels. Auditorily rounded vowels appeared to be shifted backwards in the front-back dimension when presented together with optically unrounded vowels.
18
The present study The present experiment aims to explore cross-modal perceptual effects on the finer phonetic, sub-categorical perception of vowels. It also aims to compare the auditory and the visual perception of the same AV stimuli.
19
The present study We reused a subset of the stimuli from the previous experiment.
A     V
ɡyɡ   ɡiɡ
ɡyɡ   ɡeɡ
ɡyɡ   --
--    ɡyɡ
ɡeɡ   ɡiɡ
ɡeɡ   ɡyɡ
ɡeɡ   --
--    ɡeɡ
ɡiɡ   ɡyɡ
ɡiɡ   ɡeɡ
ɡiɡ   --
--    ɡiɡ
20
The present study There were 4 speakers: 2 male, 2 female.
21
The present study There were 8 perceivers. They were selected from a previous experiment in which they had shown sensitivity to the optic signal in incongruent audiovisual stimuli. All 8 were phonetically skilled and familiar with the IPA chart for vowels.
22
The present study The subjects perceived the stimuli by way of headphones and a computer screen. The stimuli were presented in quasi-random order. Responses were given on electronic response sheets.
23
The present study The subjects were instructed to rate these dimensions of the vowels:
Lip rounding (6 degrees); 1st: unrounded, 5th: rounded
Lip spreading (3 degrees)
Openness (18 degrees); 2nd: close vowels, 6th: close-mid vowels
Backness (11 degrees auditorily, 7 degrees visually); 2nd: front vowels, 6th (auditorily): central vowels
24
The present study In a first experiment, the subjects were instructed to rate the dimensions of vowels they heard. In a second experiment, the same subjects were instructed to rate the dimensions of vowels they saw. The incongruent stimuli were the same in the two experiments.
25
Results Openness (opn) vs. roundedness (rnd); acoustic stimuli (listening only). Symbols represent speakers.
26
Results Openness (opn) vs. roundedness (rnd); optic stimuli (lipreading only). Symbols represent speakers.
27
Results Heard openness of incongruent AV-stimuli vs. opn of A-stimuli (ρ =.80*). Symbols represent acoustically presented vowels.
28
Results Heard roundedness of incongruent AV-stimuli vs. rnd of A-stimuli (ρ = -.05). Symbols represent acoustically presented vowels.
29
Results Heard spreadness of incongruent AV-stimuli vs. spr of A-stimuli (ρ =.07). Symbols represent acoustically presented vowels.
30
Results Heard backness of incongruent AV-stimuli vs. roundedness of A-stimuli (ρ =.71*). Symbols represent acoustically presented vowels.
31
Results Heard openness of incongruent AV-stimuli plotted against opn of A-stimuli (left, ρ = .71*) and of V-stimuli (right, ρ = .03). Symbols represent acoustically presented vowels.
32
Results Heard roundedness of incongruent AV-stimuli plotted against rnd of A-stimuli (left, ρ = -.05) and of V-stimuli (right, ρ = .79*). Symbols represent acoustically presented vowels.
33
Results Heard spreadness of incongruent AV-stimuli plotted against spr of A-stimuli (left, ρ = .07) and of V-stimuli (right, ρ = .90*). Symbols represent acoustically presented vowels.
34
Results Heard backness of incongruent AV-stimuli plotted against roundedness of A-stimuli (left, ρ = .71*) and of V-stimuli (right, ρ = -.59*). Symbols represent acoustically presented vowels.
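The ρ values on the preceding Results slides are presumably Spearman rank correlations; the slides do not say so, so that reading is an assumption. The sketch below shows how such a coefficient could be computed from paired mean ratings. The function names and the example numbers are hypothetical and are not data from the study.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical mean ratings (not from the study): unimodal vs. audiovisual.
unimodal_ratings = [0.10, 0.40, 0.35, 0.80]
audiovisual_ratings = [0.20, 0.50, 0.30, 0.90]
rho = spearman_rho(unimodal_ratings, audiovisual_ratings)
```

Working on ranks makes the coefficient insensitive to the unequal step sizes of the rating scales, which may be why a rank correlation would suit ratings of this kind.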
35
Results The results were subjected to linear regression analyses in which the average ratings obtained in each unimodal presentation were taken as candidate independent variables together with the interaction terms. A comparison of the regression equations that describe the results of the listening task and the viewing task shows that the two percepts need to be distinguished from each other.
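A regression of this kind, with the two unimodal ratings and their interaction term as candidate predictors, could be sketched as follows. This is an ordinary least-squares fit via the normal equations in plain Python, not the authors' actual procedure, which the slides do not describe in detail. The helper names `design_row` and `fit_ols` are hypothetical, and the data are synthetic, generated from assumed coefficients so that the fit can be checked.

```python
def design_row(rating_a, rating_v):
    """Predictors: intercept, A rating, V rating, and their interaction."""
    return [1.0, rating_a, rating_v, rating_a * rating_v]

def fit_ols(rows, targets):
    """Least-squares coefficients obtained by solving (X^T X) b = X^T y."""
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    aug = [xtx[i] + [xty[i]] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

# Synthetic, noise-free data on a 3 x 3 grid of unimodal ratings (assumed 0..1 scale).
true_coefs = [0.05, 0.59, 0.42, 0.0]
grid = [0.0, 0.5, 1.0]
rows = [design_row(a, v) for a in grid for v in grid]
targets = [sum(c * x for c, x in zip(true_coefs, row)) for row in rows]

coefs = fit_ols(rows, targets)  # recovers true_coefs up to rounding error
```

With real ratings the interaction coefficient would be retained only if it improved the fit, which is what "candidate" independent variables suggests.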
37
Results The difference is particularly clear in the dimension of openness:
$\text{opn}_{\text{heard}} = 0.05 + 1.00\,\text{opn}_{A} + 0.00\,\text{opn}_{V}$ ($r^2 = 0.97$)
$\text{opn}_{\text{seen}} = 0.05 + 0.59\,\text{opn}_{A} + 0.42\,\text{opn}_{V}$ ($r^2 = 0.81$)
In the listening task, the estimates were based on the acoustic cues alone. In the viewing task, they were based on a weighted sum of the acoustic and the optic cues.
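To make the contrast concrete, the two reported equations can be applied to one incongruent stimulus. The coefficients below are those on the slide; the function names, the input values and the assumption that ratings are normalized to a 0..1 scale are hypothetical.

```python
def predict_opn_heard(opn_a, opn_v):
    """Reported listening-task equation: the optic cue has zero weight."""
    return 0.05 + 1.00 * opn_a + 0.00 * opn_v

def predict_opn_seen(opn_a, opn_v):
    """Reported viewing-task equation: a weighted sum of both cues."""
    return 0.05 + 0.59 * opn_a + 0.42 * opn_v

# Hypothetical incongruent stimulus: acoustically close, optically more open.
opn_a, opn_v = 0.2, 0.6
heard = predict_opn_heard(opn_a, opn_v)
seen = predict_opn_seen(opn_a, opn_v)
```

For these inputs the heard value follows the acoustic rating alone, while the seen value falls between the two unimodal ratings.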
38
Results In perception of roundedness and spreadness, there were only some minor differences between the results of the two tasks. In these dimensions, our subjects relied almost totally on optic cues not only when asked what they saw, but also when asked what they heard.
40
Results There was, however, an interesting difference in perceived backness.
$\text{bac}_{\text{heard}} = 0.06 + 0.25\,\text{rnd}_{A} - 0.20\,\text{rnd}_{AV}$ ($r^2 = 0.74$)
$\text{bac}_{\text{seen}} = 0.09 + 0.42\,\text{bac}_{V}$ ($r^2 = 0.22$)
Note that $\text{bac}_{\text{heard}}$ is given by cues reflecting roundedness rather than backness.
43
Discussion There are two hypothetical explanations for an effect of roundedness on perceived backness:
1. The distance from the lips to the dorso-palatal ‘place of articulation’ is increased by lip rounding as well as by tongue retraction. This would provide an articulatory (gestural) explanation.
2. The upper formants (F2′) are lowered by lip rounding as well as by tongue retraction. This would provide an auditory explanation.
Both explanations would be consistent with the placement of the rounded vowels to the right of their unrounded counterparts in IPA charts.
45
Discussion Analysis of perceived backness
Stimulus A (acoustic) | Stimulus V (optic) | Prediction, Expl. 1 (gestural) | Prediction, Expl. 2 (auditory) | Observation
rounded | unrounded | fronted | retracted |
unrounded | rounded | retracted | fronted |
Conclusion: The effect is due to auditory (F2′) rather than articulatory (gestural) associations.
46
Discussion The observed effect of lip rounding on perceived backness cannot be explained on the basis of a late-integration hypothesis. Swedish lacks non-front unrounded vowel phonemes and phones, whose existence would be required in order to apply such a hypothesis. This is clear and direct evidence for early, pre-categorical integration. The result also shows that this integration takes place in an auditory space in which roundedness and backness have an essential component in common.
47
Discussion Diagram labels: Acoustic signal; A common percept; Optic signal; Auditory signal analysis; Audiovisual integration; Visual signal analysis; An auditory percept; A visual percept
48
Discussion Diagram labels: Acoustic signal; Vocal percept; Optic signal; Auditory analysis (demodulation); Integration of gestural information; Visual analysis (demodulation); Integration of vocal information; Modulation of voice; Modulation of face; Gestural percept
49
Summary Some earlier findings:
1) In clear AV vowel stimuli, Swedes hear roundedness predominantly by eye – but openness only by ear. (The strength of the influence of a modality reflects the reliability of the information.)
2) A predominantly male minority is less sensitive to vision. (There is a significant sex difference.)
3) Presence of visible lip rounding (a ‘marked’ feature) is more influential than its absence.
Ref: H. Traunmüller and N. Öhrström (2007) "Audiovisual perception of openness and lip rounding in front vowels" Journal of Phonetics 35: 244-258.
52
Summary Recent findings:
4) In addition to the auditory (vocal) percept that may be influenced by vision, there is a visual (gestural) percept that may be influenced by audition. (There are two AV percepts!)
5) The auditory perception of frontness/backness is based on AV integration at the level of phonetically informative properties prior to categorization. (This is likely to hold more generally for AV integration.)
54
Summary Recent findings:
6) In AV vocal perception, only a minority comes close to optimal (Bayesian) integration.
7) In AV gesture perception (by normal-hearing subjects), integration is less optimal.
Ref: H. Traunmüller (2006) "Cross-modal interactions in visual as opposed to auditory perception of vowels" Working Papers 52: 137-140 (Lund University, Dept. of Linguistics).
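For orientation, "optimal (Bayesian) integration" is commonly modelled as reliability-weighted averaging, in which each cue is weighted by its inverse variance. The sketch below illustrates that textbook model under the assumption of independent Gaussian noise in each modality. The slides do not specify which model was used, and the function name and numbers are hypothetical.

```python
def integrate_cues(est_a, var_a, est_v, var_v):
    """Combine an auditory and a visual estimate by inverse-variance weighting.

    Returns the combined estimate, its variance, and the auditory weight.
    Assumes independent Gaussian noise with known variances in each modality.
    """
    precision_a = 1.0 / var_a
    precision_v = 1.0 / var_v
    weight_a = precision_a / (precision_a + precision_v)
    weight_v = 1.0 - weight_a
    combined = weight_a * est_a + weight_v * est_v
    combined_var = 1.0 / (precision_a + precision_v)
    return combined, combined_var, weight_a

# Hypothetical case: the acoustic cue is four times as reliable as the optic cue.
combined, combined_var, weight_a = integrate_cues(0.0, 1.0, 1.0, 4.0)
```

Under this model the more reliable cue dominates and the combined variance is never larger than either unimodal variance, which gives a benchmark against which the regression weights of individual subjects could be compared.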
58
Conclusions The results clash irreconcilably with gestural-only theories of speech perception, such as the Motor Theory and the Direct Realist Theory. Models of auditory-visual integration need to be extended in order to capture the two percepts. The Modulation Theory, according to which speech is primarily modulated voice, but also modulated face, provides a possible foundation for such an extension.
59
Acknowledgement Research supported by the Swedish Research Council
60
Thank you for your attention!
61
Results Left: Seen spreadness plotted against seen roundedness. Right: Heard spreadness plotted against heard roundedness. Symbols represent acoustically presented vowels.
62
The Modulation Theory Speech is modulated voice and face. What is said is conveyed by the modulation. Perceptual recovery requires 'demodulation'. Users associate modulations with corresponding somatosensations. Ref: H. Traunmüller "Speech considered as modulated voice".