
1 Optical Phonetics and Visual Perception of Lexical and Phrasal Stress in English
Patricia Keating, Marco Baroni, Sven Mattys, Rebecca Scarborough, Abeer Alwan, Edward T. Auer, Lynne E. Bernstein

2 Introduction
Phrasal (focal) stress can be perceived visually above chance, though intonation cannot (e.g. Bernstein et al. 1989).
Many studies have shown that stress is marked by longer, larger, and faster movements of the jaw, lips, and tongue; sometimes by eyebrow movements; and acoustically mainly by f0 (pitch accents), lengthening, and loudness.
Jaw lowering and acoustic duration are known to correlate with auditory perception of stress, and eyebrow movement with visual perception.

3 Optical phonetics of stress
Extents, durations, and velocities of movements of lips, chin, and eyebrows, and mouth opening, are all potentially visible to perceivers. Our production (optical) measures are position and movement measures of visible fleshpoints.

4 This study
Production experiment: Do speakers show any consistent optical correlates of phrasal and lexical stresses?
Perception experiment: Are there differences in the visual intelligibility of phrasal and lexical stress, and of the different speakers?
Production-perception comparison: Which, if any, of the optical production correlates account for visual intelligibility?

5 Production methods Lexical stress materials
4 minimal pairs: DIScharge / disCHARGE, DIScount / disCOUNT, PERvert / perVERT, SUBject / subJECT
4 non-minimal pairs: DEbit / casSETTE, INstance / conVINCE, BUSiness / subMIT, COUrage / gaZELLE
Minimal pairs read as given, and also reiterantly
Non-minimal pairs read only reiterantly
2 reiterant syllables, which differ in mouth opening: “buh” = [bʌ] / [bə], “fer” = [fɝ] / [fɚ]
TOTAL: 40 words

6 Production methods Phrasal stress materials
“So TOMMY gave Timmy a song from Debby.”
“So Tommy gave TIMMY a song from Debby.”
“So Tommy gave Timmy a song from DEBBY.”
“So Tommy gave Timmy a song from Debby.”
Narrow (contrast) accent on one name, or “neutral” broad focus
These 4 stress conditions x 6 combinations of names = 24 sentences
Sentences not read reiterantly
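The item counts on the two materials slides can be checked with a short sketch. The word lists and the sentence frame come from the slides; the assumption that the "6 combinations of names" are the six orderings of the three names is ours, and the names `render` and `stress_conditions` are hypothetical.

```python
from itertools import permutations

# Lexical stress materials, as listed on the lexical stress materials slide.
minimal_pairs = [("DIScharge", "disCHARGE"), ("DIScount", "disCOUNT"),
                 ("PERvert", "perVERT"), ("SUBject", "subJECT")]
non_minimal_pairs = [("DEbit", "casSETTE"), ("INstance", "conVINCE"),
                     ("BUSiness", "subMIT"), ("COUrage", "gaZELLE")]
reiterant_syllables = ["buh", "fer"]

minimal_words = [w for pair in minimal_pairs for w in pair]          # 8 words
non_minimal_words = [w for pair in non_minimal_pairs for w in pair]  # 8 words

# Minimal pairs: read as given, plus once per reiterant syllable.
# Non-minimal pairs: reiterant versions only.
n_word_items = (len(minimal_words) * (1 + len(reiterant_syllables))
                + len(non_minimal_words) * len(reiterant_syllables))

# ASSUMPTION: the 6 name combinations are the orderings of the three names.
names = ["Tommy", "Timmy", "Debby"]
# Index of the accented name, or None for the neutral (broad focus) reading.
stress_conditions = [0, 1, 2, None]

def render(order, accent):
    """Fill the sentence frame, capitalizing the accented name (if any)."""
    shown = [n.upper() if i == accent else n for i, n in enumerate(order)]
    return "So {} gave {} a song from {}.".format(*shown)

sentences = [render(order, accent)
             for order in permutations(names)
             for accent in stress_conditions]

print(n_word_items, len(sentences))
```

Under these assumptions the sketch reproduces the stated totals of 40 words and 24 sentences.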

7 Production methods Both stress contrasts involve nuclear accent
Lexical stress items read in isolation
Phrasal stress items read with narrow focus to show contrast and/or emphasis
Both carry the tonal pattern H* L-L%: “…a song from TIMMY” (phrasal stress) and “DIScount” (lexical stress)

8 Production Methods Speakers
3 male Californians differing in perceptually-determined visual intelligibility for segments: low-medium = Sp-LO, medium = Sp-MID, high = Sp-HI
Visual intelligibility scoring: speakers were video-recorded reading 320 (other) sentences; 8 expert deaf lipreaders transcribed the sentences, yielding % correct visual intelligibility scores

9 Production methods Recording set-up and procedure
Videorecording: professional-quality, teleprompter under camera
DAT recording
Facial motion using Qualisys™ system: 120 Hz sampling rate, 20 small passive retroreflectors, three cameras, infrared flash, 3D position for each retroreflector
Items blocked by stress location
Two tokens of each item

10 Production methods Facepoint marker locations and measurements
Measurements:
Left eyebrow displacement
Head displacement
Interlip maximum distance
Interlip opening displacement
Interlip closing displacement
Lower lip opening peak velocity
Lower lip closing peak velocity
Chin opening displacement
Chin opening peak velocity
Chin closing displacement
Chin closing peak velocity
Marker locations (figure labels): eyebrow markers, head marker, lip markers, chin marker
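As a rough illustration of what displacement, peak velocity, and interlip distance measures involve, here is a minimal sketch for marker positions sampled at 120 Hz. It is not the authors' procedure, which the slides do not describe: the function names, the use of simple sample-to-sample differences, the millimetre units, and the toy trace are all assumptions.

```python
SAMPLE_RATE_HZ = 120.0  # sampling rate stated on the recording set-up slide

def displacement(trace):
    """Extent of movement over a 1D position trace: max minus min."""
    return max(trace) - min(trace)

def peak_velocity(trace, rate_hz=SAMPLE_RATE_HZ):
    """Largest absolute sample-to-sample velocity, in position units per second."""
    return max(abs(b - a) * rate_hz for a, b in zip(trace, trace[1:]))

def interlip_distance(upper, lower):
    """Euclidean distance between two 3D marker positions."""
    return sum((u - l) ** 2 for u, l in zip(upper, lower)) ** 0.5

# Invented toy data: vertical chin position (assumed mm) during an opening gesture.
chin_y = [0.0, -1.0, -3.0, -6.0, -8.0, -9.0]

print(displacement(chin_y))   # extent of the opening movement
print(peak_velocity(chin_y))  # fastest step, scaled to units per second
print(interlip_distance((0.0, 0.0, 0.0), (0.0, 3.0, 4.0)))
```

A real analysis would also need to segment each trace into opening and closing gestures and would probably smooth the signal before differentiating; both steps are omitted here.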

11 Production methods Data analysis
Prosody of audio speech signals checked by two transcribers (some small differences found between prompted and produced stresses, but these differences generally do not affect analyses presented here)
Here, only tokens used in perception study analyzed (1 of the 2 tokens of each item)
Effects of stress on the 11 facepoint marker measurements tested by (factorial) ANOVAs

12 Production results Overview
Stress is well-marked by these measures
Lexical vs. phrasal stress: more significantly different measures, and larger differences between stressed and unstressed, with phrasal stress than with lexical
Reiterant vs. nonreiterant words: both sets show stress effect

13 Production results Significant differences due to Lexical stress
5 of 11 measures distinguish stress: 3 opening gesture measures, Head, and Interlip Max. Distance
Generally holds across speakers and real vs. reiterant
[Figure: e.g. Interlip Opening Displacement, all reiterant words, syllable 1 vs. syllable 2]

14 Production results Significant differences due to Phrasal stress
All 11 measures distinguish stress
Chin and eyebrow measures are more consistent across speakers
[Figure: e.g. Chin Closing Peak Velocity, accented vs. unaccented]

15 Production results Significant Head and Eyebrow movements
Stress in words: head moves, eyebrow does not
Stress in phrases: head down (2 speakers), eyebrow up
Example: “So TIMMY gave Tommy a song from Debby”

16 Production results An aside: Eyebrows and F0
40 sentences from the phrasal stress corpus
F0 from audio, and right and left eyebrow positions, at 12 ms intervals
Significant correlations between eyebrows and F0, but accounting for little variance (only 1-4%)
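To put "1-4% of variance" in terms of correlation size: variance explained is the square of the correlation coefficient, so these figures correspond to |r| of roughly 0.1 to 0.2. The sketch below shows the conversion along with a plain Pearson correlation; the function names are hypothetical, and the slides do not say which correlation procedure was used.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def variance_explained(r):
    """Proportion of variance shared by two variables with correlation r."""
    return r * r

# Correlation magnitudes implied by the reported 1% and 4% of variance.
r_low = math.sqrt(0.01)
r_high = math.sqrt(0.04)
print(round(r_low, 3), round(r_high, 3))
```

With many samples per sentence, correlations this small can still reach significance, which is consistent with the slide's contrast between "significant" and "little variance".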

17 Perception methods
1 token of each item from production corpus (120 words, 72 sentences), each presented twice (384 total trials)
16 hearing perceivers (not screened for lipreading ability)
Test video clip (no sound) on right monitor, clickable response choices on left monitor
Lexical stress: Response choices were pairs of real words, even for reiterant items
Sentences: Click on one name, or on “NoStress”
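The trial arithmetic can be reconstructed from the numbers given on the methods slides. The split of word items into 8 non-reiterant and 32 reiterant items per speaker follows from the lexical stress materials slide; the variable names are hypothetical.

```python
n_speakers = 3
n_word_items = 40        # per speaker, from the lexical stress materials
n_sentence_items = 24    # per speaker, from the phrasal stress materials
n_presentations = 2
n_perceivers = 16

word_clips = n_word_items * n_speakers          # 120 words
sentence_clips = n_sentence_items * n_speakers  # 72 sentences
trials_per_perceiver = (word_clips + sentence_clips) * n_presentations

# Responses pooled over perceivers.
sentence_responses = sentence_clips * n_presentations * n_perceivers
word_responses = word_clips * n_presentations * n_perceivers

# Word items split into non-reiterant (8 per speaker) and reiterant (32 per speaker).
nonreiterant_responses = 8 * n_speakers * n_presentations * n_perceivers
reiterant_responses = 32 * n_speakers * n_presentations * n_perceivers

print(trials_per_perceiver, sentence_responses,
      reiterant_responses, nonreiterant_responses)
```

The pooled totals (2304, 3072, and 768) match the N values shown on the overall results slide, which suggests, though the transcript does not confirm it, that those N's refer to sentences, reiterant words, and non-reiterant words respectively.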

18 Perception results Overview
Stress is perceived above chance
Lexical vs. phrasal stress: phrasal stress is perceived better
Reiterant vs. nonreiterant words: perceived equally well

19 Perception results Overall results, all above chance
[Figure: % correct for three conditions (N=2304, N=3072, N=768), with chance marked at 25%]

20 Perception results Lexical vs. phrasal stress
Individual subjects’ % correct relative to levels that are significantly above chance: phrasal perceived better (significantly so by paired t-test)
[Figure: phrasal vs. all lexical]
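One common way to obtain a "significantly above chance" level for an individual perceiver is a one-sided binomial test against the chance rate. The slides do not state which test was used, so the sketch below is an assumption, as are the function names and the 0.05 criterion. Per-perceiver trial counts (240 word trials at 50% chance, 144 sentence trials at 25% chance) follow from the perception methods slide.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def min_correct_above_chance(n, p, alpha=0.05):
    """Smallest number correct whose one-sided p-value is below alpha."""
    for k in range(n + 1):
        if binom_upper_tail(k, n, p) < alpha:
            return k
    return None

# Per-perceiver trial counts: 120 words x 2 and 72 sentences x 2.
word_threshold = min_correct_above_chance(240, 0.5)
sentence_threshold = min_correct_above_chance(144, 0.25)

print(100 * word_threshold / 240, 100 * sentence_threshold / 144)
```

The printed values are the minimum % correct a single perceiver would need on words and on sentences under this assumed test.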

21 Perception results Lexical stress
All lexical speech conditions equally well perceived overall:
Reiterant & non-reiterant
buh & fer
Minimal & non-minimal
[Figure: % correct, minimal pairs vs. non-minimal]

22 Perception results Speakers: lexical stress
All speakers’ lexical stress perceived above chance (50%)
Sp-LO perceived better on reiterant words
[Figure: % correct for non-reiterant, reiterant minimal, and reiterant non-minimal items]

23 Perception results Phrasal stress
3 focal positions perceived equally well, and correct above chance for almost every item
Responses to Neutral condition at chance
[Figure: % correct by position of stress in sentence]

24 Perception results Speakers: phrasal stress
All speakers’ phrasal stress perceived above chance (25%)
Sp-MID perceived less accurately
Sp-LO best for Neutral condition (not shown here)
[Figure: % correct]

25 Production-perception comparisons: Speaker differences
Prosodic intelligibility: Sp-LO highest for words and Neutral sentences; Sp-MID lowest for sentences
Production: Sp-LO shows larger lip differences than Sp-MID on sentences, and largest Chin closing displacement on words (but Sp-HI has largest head movement differences)
Unrelated to segmental intelligibility: compare above with speakers’ names LO-MID-HI, which reflect their segmental intelligibility

26 Production-perception comparisons: Correlational analyses of sentences
Tested relations between production measures and % correct perception of phrasal stresses
10 of 11 measures correlated significantly with perception, with chin measures accounting for the most variance (up to 40%)
Only Interlip maximum distance (mouth opening) did not correlate with perception

27 Production-perception comparisons: Correlational analyses of sentences
Partial correlations (controlling for contributions of various lip measures) show independent contributions to perception of:
Chin opening displacement (15% of variance)
Chin peak opening velocity (11%)
Lower lip peak opening velocity (11%)
Closing gestures generally make no independent contributions to perception
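For readers unfamiliar with partial correlation, the simplest case (one controlled variable) can be computed from three pairwise correlations. This is only a sketch of the general idea: the analysis on the slide controls for several lip measures at once, its exact procedure is not given, and the input values below are invented for illustration.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical values: x = a chin measure, y = % correct, z = a lip measure.
r = partial_corr(r_xy=0.6, r_xz=0.5, r_yz=0.5)
print(round(r, 3), round(r * r, 3))  # partial r and its squared value
```

The squared partial correlation is one way to express an "independent contribution" as a share of variance, which is how figures like 15% and 11% are usually read.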

28 Summary
Lexical and phrasal stress are visually perceived above chance
Phrasal stress is marked by more and larger production differences, and perceived better
Chin opening accounts for most variance in perception of phrasal stress
Speakers’ visual intelligibility for prosody does not correspond to their segmental intelligibility

