

Optical Phonetics and Visual Perception of Lexical and Phrasal Stress in English
Patricia Keating, Marco Baroni, Sven Mattys, Rebecca Scarborough, Abeer Alwan, Edward T. Auer, Lynne E. Bernstein

Introduction
Phrasal (focal) stress can be perceived visually above chance, though intonation cannot (e.g. Bernstein et al. 1989).
Many studies have shown that stress is marked by longer, larger, and faster movements of jaw, lips, and tongue; sometimes by eyebrow movements; and acoustically mainly by f0 (pitch accents), lengthening, and loudness.
Jaw lowering and acoustic duration are known to correlate with auditory perception of stress, and eyebrow movement with visual perception.

Optical phonetics of stress
Extents, durations, and velocities of movements of lips, chin, and eyebrows, and mouth opening, are all potentially visible to perceivers.
Our production (optical) measures are position and movement measures of visible fleshpoints.

This study
Production experiment: Do speakers show any consistent optical correlates of phrasal and lexical stresses?
Perception experiment: Are there differences in the visual intelligibility of phrasal and lexical stress, and of the different speakers?
Production-perception comparison: Which, if any, of the optical production correlates account for visual intelligibility?

Production methods: Lexical stress materials
4 minimal pairs: DIScharge / disCHARGE, DIScount / disCOUNT, PERvert / perVERT, SUBject / subJECT
4 non-minimal pairs: DEbit / casSETTE, INstance / conVINCE, BUSiness / subMIT, COUrage / gaZELLE
Minimal pairs were read as given, and also reiterantly; non-minimal pairs were read only reiterantly.
2 reiterant syllables, which differ in mouth opening: “buh” = [bʌ] / [bə] and “fer” = [fɝ] / [fɚ]
TOTAL: 40 words

Production methods: Phrasal stress materials
“So TOMMY gave Timmy a song from Debby.”
“So Tommy gave TIMMY a song from Debby.”
“So Tommy gave Timmy a song from DEBBY.”
“So Tommy gave Timmy a song from Debby.”
Narrow (contrast) accent on one name, or “neutral” broad focus.
These 4 stress conditions x 6 combinations of names = 24 sentences.
Sentences were not read reiterantly.

Production methods
Both stress contrasts involve nuclear accent.
Lexical stress items were read in isolation.
Phrasal stress items were read with narrow focus to show contrast and/or emphasis.
Example tonal patterns: H* L-L% on “…a song from TIMMY” (phrasal stress); H* L-L% on “DIScount” (lexical stress)

Production methods: Speakers
3 male Californians differing in perceptually-determined visual intelligibility for segments:
low-medium = Sp-LO
medium = Sp-MID
high = Sp-HI
Visual intelligibility scoring: speakers were video-recorded reading 320 (other) sentences; 8 expert deaf lipreaders transcribed the sentences, yielding % correct visual intelligibility scores.

Production methods: Recording set-up and procedure
Videorecording: professional-quality, with teleprompter under camera
DAT recording
Facial motion recorded using a Qualisys™ system: 120 Hz sampling rate, 20 small passive retroreflectors, three cameras, infrared flash, 3D position for each retroreflector
Items blocked by stress location
Two tokens of each item

Production methods: Facepoint marker locations and measurements
Marker locations (figure labels): eyebrow markers, head marker, lip markers, chin marker
Measurements:
Left eyebrow displacement
Head displacement
Interlip maximum distance
Interlip opening displacement
Interlip closing displacement
Lower lip opening peak velocity
Lower lip closing peak velocity
Chin opening displacement
Chin opening peak velocity
Chin closing displacement
Chin closing peak velocity
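The slides name the measurements but do not say how displacements and peak velocities were derived from the marker trajectories. The following is a minimal sketch under stated assumptions: interlip distance is taken as the per-frame Euclidean distance between an upper-lip and a lower-lip marker, displacement as the range of a one-dimensional series, and velocity as a first difference scaled by the 120 Hz sampling rate given on the recording slide. All function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 120.0  # sampling rate stated on the recording slide


def interlip_distance(upper, lower):
    """Per-frame Euclidean distance between two 3D marker tracks (assumed definition)."""
    return [math.dist(u, l) for u, l in zip(upper, lower)]


def displacement(series):
    """Range of a 1D series: largest value minus smallest (assumed definition)."""
    return max(series) - min(series)


def peak_velocities(series, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (opening, closing) peak velocities from first differences.

    Opening is the fastest increase of the series, closing the fastest
    decrease, both reported as positive units-per-second values.
    """
    rates = [(b - a) * sample_rate_hz for a, b in zip(series, series[1:])]
    return max(rates), -min(rates)


# Toy data for illustration only; not taken from the study.
upper_lip = [(0, 0, 0), (0, 0, 0)]
lower_lip = [(0, 3, 4), (0, 0, 1)]
print(interlip_distance(upper_lip, lower_lip))
print(peak_velocities([0, 1, 3, 2]))
```

A real analysis would likely smooth the trajectories before differentiating, since first differences amplify measurement noise.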

Production methods: Data analysis
Prosody of audio speech signals was checked by two transcribers (some small differences were found between prompted and produced stresses, but these generally do not affect the analyses presented here).
Here, only the tokens used in the perception study are analyzed (1 of the 2 tokens of each item).
Effects of stress on the 11 facepoint marker measurements were tested by (factorial) ANOVAs.

Production results: Overview
Stress is well-marked by these measures.
Lexical vs. phrasal stress: more significantly different measures, and larger differences between stressed and unstressed, with phrasal stress than with lexical.
Reiterant vs. nonreiterant words: both sets show a stress effect.

Production results: Significant differences due to lexical stress
5 of 11 measures distinguish stress: 3 opening gesture measures, plus Head and Interlip Max. Distance.
This generally holds across speakers and across real vs. reiterant words.
Example (figure): Interlip Opening Displacement, all reiterant words, syllable 1 vs. syllable 2

Production results: Significant differences due to phrasal stress
All 11 measures distinguish stress.
Chin and eyebrow measures are more consistent across speakers.
Example (figure): Chin Closing Peak Velocity, accented vs. unaccented

Production results: Significant Head and Eyebrow movements
Stress in words: Head moves, eyebrow does not.
Stress in phrases: Head down (2 speakers), Eyebrow up.
Example (figure): “So TIMMY gave Tommy a song from Debby”

Production results: An aside on eyebrows and F0
40 sentences from the phrasal stress corpus
F0 from audio, and right and left eyebrow positions, at 12 ms intervals
Significant correlations between eyebrows and F0, but accounting for little variance (only 1-4%)

Perception methods
1 token of each item from the production corpus (120 words, 72 sentences), each presented twice (384 total trials)
16 hearing perceivers (not screened for lipreading ability)
Test video clip (no sound) on right monitor, clickable response choices on left monitor
Lexical stress: response choices were pairs of real words, even for reiterant items
Sentences: click on one name, or on “NoStress”
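The trial counts on this slide follow from the design figures given earlier (3 speakers, 40 words and 24 sentences each, two presentations, 16 perceivers). A short sketch of that arithmetic is below; the constant and function names are illustrative only, and the pooled response totals assume that every perceiver completed every trial, which the slides do not state explicitly.

```python
# Design figures taken from the slides; names are illustrative.
SPEAKERS = 3
WORDS_PER_SPEAKER = 40       # lexical stress items per speaker
SENTENCES_PER_SPEAKER = 24   # 4 stress conditions x 6 name combinations
PRESENTATIONS = 2            # each token shown twice
PERCEIVERS = 16


def design_counts():
    """Return item, trial, and pooled response counts implied by the design."""
    words = SPEAKERS * WORDS_PER_SPEAKER
    sentences = SPEAKERS * SENTENCES_PER_SPEAKER
    trials = (words + sentences) * PRESENTATIONS
    return {
        "words": words,
        "sentences": sentences,
        "trials_per_perceiver": trials,
        # Assumes no missing responses.
        "word_responses": words * PRESENTATIONS * PERCEIVERS,
        "sentence_responses": sentences * PRESENTATIONS * PERCEIVERS,
    }


print(design_counts())
```

Under that assumption the pooled sentence total is 2304 and the pooled word total is 3840, which agrees with the N values on the overall-results slide (2304, and 3072 + 768 = 3840).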

Perception results: Overview
Stress is perceived above chance.
Lexical vs. phrasal stress: phrasal stress is perceived better.
Reiterant vs. nonreiterant words: perceived equally well.

Perception results: Overall results, all above chance
[Figure: % correct by condition, with a chance line at 25%; N=2304, N=3072, N=768]

Perception results: Lexical vs. phrasal stress
Individual subjects’ % correct relative to levels that are significantly above chance: phrasal perceived better (significantly so by paired t-test).
[Figure: % correct per subject for phrasal and all lexical items]
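The slides do not say how the "significantly above chance" levels for individual subjects were computed. One common choice, assumed here purely for illustration, is an exact one-sided binomial test: find the smallest number of correct responses whose upper-tail probability under the chance rate falls below alpha. The function names and the 0.05 alpha level are assumptions, not values from the study.

```python
from math import comb


def binomial_p_at_least(k, n, p):
    """Exact probability of k or more successes in n trials at success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_above_chance(n, chance, alpha=0.05):
    """Smallest correct count that is significantly above chance (one-sided)."""
    for k in range(n + 1):
        if binomial_p_at_least(k, n, chance) < alpha:
            return k
    return None  # no count reaches significance


# Per-perceiver trial counts implied by the design: 72 sentences x 2
# presentations at 25% chance, 120 words x 2 presentations at 50% chance.
print(min_correct_above_chance(144, 0.25))
print(min_correct_above_chance(240, 0.50))
```

Dividing each threshold by its trial count gives the % correct level a single subject would need to exceed under this assumed test.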

Perception results: Lexical stress
All lexical speech conditions were perceived equally well overall: reiterant vs. non-reiterant, buh vs. fer, minimal vs. non-minimal.
[Figure: % correct for minimal pairs and non-minimal pairs]

Perception results: Speakers, lexical stress
All speakers’ lexical stress is perceived above chance (50%).
Sp-LO is perceived better on reiterant words.
[Figure: % correct by speaker for non-reiterant, reiterant minimal, and reiterant non-minimal items]

Perception results: Phrasal stress
The 3 focal positions are perceived equally well, and correct above chance for almost every item.
Responses to the Neutral condition are at chance.
[Figure: % correct by position of stress in sentence]

Perception results: Speakers, phrasal stress
All speakers’ phrasal stress is perceived above chance (25%).
Sp-MID is perceived less accurately.
Sp-LO is best for the Neutral condition (not shown here).
[Figure: % correct by speaker]

Production-perception comparisons: Speaker differences
Prosodic intelligibility: Sp-LO highest for words and Neutral sentences; Sp-MID lowest for sentences.
Production: Sp-LO shows larger lip differences than Sp-MID on sentences, and the largest Chin closing displacement on words (but Sp-HI has the largest head movement differences).
Unrelated to segmental intelligibility: compare the above with the speakers’ names LO-MID-HI, which reflect their segmental intelligibility.

Production-perception comparisons: Correlational analyses of sentences
Tested relations between production measures and % correct perception of phrasal stresses.
10 of 11 measures correlated significantly with perception, with chin measures accounting for the most variance (up to 40%).
Only Interlip maximum distance (mouth opening) did not correlate with perception.

Production-perception comparisons: Correlational analyses of sentences
Partial correlations (controlling for contributions of various lip measures) show independent contributions to perception of:
Chin opening displacement (15% of variance)
Chin peak opening velocity (11%)
Lower lip peak opening velocity (11%)
Closing gestures generally make no independent contributions to perception.
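As a reminder of what a partial correlation measures, here is a minimal sketch of the first-order case: the correlation between x and y after removing the linear contribution of a single control variable z, with variance explained taken as the squared coefficient. The slides report controlling for several lip measures at once, which would need the multi-covariate generalization; the single-covariate formula, the function names, and the toy data are assumptions for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))


def variance_explained(r):
    """Proportion of variance accounted for by a correlation coefficient."""
    return r * r


# Toy data only: x might stand for a chin measure, y for % correct,
# z for a lip measure being controlled.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
z = [1, -1, -1, 1]
print(partial_r(x, y, z), variance_explained(partial_r(x, y, z)))
```

In the toy data z is uncorrelated with both x and y, so the partial correlation equals the plain correlation; with a correlated control variable the two would differ.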

Summary
Lexical and phrasal stress are visually perceived above chance.
Phrasal stress is marked by more and larger production differences, and is perceived better.
Chin opening accounts for most variance in perception of phrasal stress.
Speakers’ visual intelligibility for prosody does not correspond to their segmental intelligibility.