Intonation and Its Meanings


Intonation and Its Meanings
Julia Hirschberg
CS 6998
12/30/2018

What do speech researchers do?
Study human production and perception
Try to embody it in machines
  Production: TTS, CTS
  Perception: ASR, ASRU, speaker ID, language ID

Pitch Accent/Prominence in ToBI
Which items are made intonationally prominent and how?
Accent type:
  H* simple high (declarative)
  L* simple low (yes/no question)
  L*+H scooped, late rise (uncertainty/incredulity)
  L+H* early rise to stress (contrastive focus)
  H+!H* fall onto stress (implied familiarity)

Downstepped accents: !H*, L+!H*, L*+!H
Degree of prominence:
  within a phrase: HiF0
  across phrases
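
The accent labels above follow a small naming scheme: the tone marked with * is the one aligned with the stressed syllable, a tone joined by + leads or trails it, and ! marks downstep. A minimal Python sketch of that scheme follows. The names PitchAccent and parse_accent are hypothetical, introduced here only for illustration; they are not part of any ToBI tool, and the check is purely syntactic (it does not test whether a label belongs to the inventory listed above).

```python
from dataclasses import dataclass
from typing import Optional

TONES = {"H", "L"}


@dataclass(frozen=True)
class PitchAccent:
    starred: str             # tone aligned with the stressed syllable
    leading: Optional[str]   # tone before the starred tone, as in L+H*
    trailing: Optional[str]  # tone after the starred tone, as in L*+H
    downstepped: bool        # True if the label carries a "!" diacritic


def parse_accent(label: str) -> PitchAccent:
    """Split a ToBI-style accent label into its component tones (illustrative only)."""
    downstepped = "!" in label
    parts = label.replace("!", "").split("+")
    starred_positions = [i for i, p in enumerate(parts) if p.endswith("*")]
    well_formed = (
        1 <= len(parts) <= 2
        and len(starred_positions) == 1
        and all(p.rstrip("*") in TONES and p.count("*") <= 1 for p in parts)
    )
    if not well_formed:
        raise ValueError(f"not a recognised accent label: {label!r}")
    star = starred_positions[0]
    return PitchAccent(
        starred=parts[star][0],
        leading=parts[0] if star == 1 else None,
        trailing=parts[1] if star == 0 and len(parts) == 2 else None,
        downstepped=downstepped,
    )
```

For example, parse_accent("L+H*") yields a starred H with a leading L, while parse_accent("L*+!H") yields a starred L with a downstepped trailing H.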

Functions of Pitch Accent
Given/new information
  S: Do you need a return ticket?
  U: No, thanks, I don’t need a return.
Contrast (narrow focus)
  U: No, thanks, I don’t need a RETURN…. (I need a time schedule, receipt,…)
Disambiguation of discourse markers
  S: Now let me get you the train information.
  U: Okay (thanks) vs. Okay….(but I really want…)

Prosodic Phrasing in ToBI
‘Levels’ of phrasing:
  intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
  intonational phrase: one or more intermediate phrases plus a boundary tone (H% or L%)
ToBI break-index tier:
  0 no word boundary
  1 word boundary
  2 strong juncture with no tonal markings
  3 intermediate phrase boundary
  4 intonational phrase boundary
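
The two levels of phrasing amount to a small grammar over tone labels, which can be checked mechanically. The sketch below is an assumption-laden illustration, not a ToBI tool: the function names tokenize and is_intonational_phrase are hypothetical, the accent inventory is limited to the labels that appear in these slides, and contours are assumed to be written as in the slides (for example "L* H- H%" or "L*+H L-H%").

```python
from typing import List

# Inventories restricted to the labels used in these slides (an assumption).
PITCH_ACCENTS = {"H*", "L*", "L*+H", "L+H*", "H+!H*", "!H*", "L+!H*", "L*+!H"}
PHRASE_ACCENTS = {"H-", "L-"}
BOUNDARY_TONES = {"H%", "L%"}


def tokenize(contour: str) -> List[str]:
    """Split a contour string into tone labels, separating fused forms like 'L-H%'."""
    return contour.replace("-", "- ").split()


def is_intonational_phrase(contour: str) -> bool:
    """True if the contour is one or more intermediate phrases plus a boundary tone.

    Each intermediate phrase is one or more pitch accents followed by a
    phrase accent; the boundary tone must be the final label.
    """
    tones = tokenize(contour)
    i, n = 0, len(tones)
    while i < n:
        start = i
        while i < n and tones[i] in PITCH_ACCENTS:
            i += 1
        if i == start:                      # no pitch accent in this phrase
            return False
        if i >= n or tones[i] not in PHRASE_ACCENTS:
            return False                    # missing phrase accent
        i += 1
        if i < n and tones[i] in BOUNDARY_TONES:
            return i + 1 == n               # boundary tone must end the contour
    return False                            # ran out of labels before a boundary tone
```

Under these assumptions, "L* H- H%" and "H* L- L+H* L-H%" are accepted, while "H* L%" (no phrase accent) and "H* L-" (no boundary tone) are rejected.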

Functions of Phrasing
Disambiguates syntactic constructions, e.g. PP attachment:
  S: You should buy the ticket with the discount coupon.
Disambiguates scope ambiguities, e.g. negation:
  S: You aren’t booked through Rome because of the fare.
Or modifier scope:
  S: This fare is restricted to retired politicians and civil servants.

[Figure: contours for the pitch accents L*+H, L*, and H* combined with the phrase accent/boundary tone sequences H-H%, H-L%, L-H%, and L-L%]

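[Figure: contours for the pitch accents H*, !H*, H+!H*, and L+H* combined with the phrase accent/boundary tone sequences H-H%, H-L%, L-H%, and L-L%]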

Contour Examples
http://www.cs.columbia.edu/~julia/cs6998/cards/examples.html

Contours: Accent + Phrasing
What do intonational contours ‘mean’ (Ladd ’80, Bolinger ’89)?
Speech acts (statements, questions, requests)
  S: That’ll be credit card? (L* H- H%)
Propositional attitude (uncertainty, incredulity)
  S: You’d like an evening flight. (L*+H L- H%)
Speaker affect (anger, happiness, love)
  U: I said four SEVEN one! (L+H* L- L%)
“Personality”
  S: Welcome to the Sunshine Travel System.

Propositional attitude (uncertainty)
  Did you feed the animals?
  I fed the L*+H goldfish L-H%
Distinguish direct/indirect speech acts
  Can you open the door?

And Other Things Contribute: Pitch Range and Timing (Rate, Pause)
Level of speaker engagement
  Hello vs. HELLO
Contour interpretation
  Rise/fall/rise (L*+H L-H%): Elephantiasis isn’t incurable
Discourse/topic structure: paratones

Prosodic Generation for TTS
Corpus-based approaches
  Train prosodic variation on large labeled corpora using machine learning techniques
  Accent and phrasing decisions
  Associate prosodic labels with simple features of transcripts
To do: Contour variation
TTS default prosodic assignment is developed to be independent of domains and tasks. It uses simple text analysis to vary phrasing, accent, and possibly pitch range. While hand-built rule sets are still used for particular application domains, most systems have moved toward automatically trained prosodic assignment.
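
To make "simple text analysis" concrete, here is a toy rule-based accent assigner in Python. It is a hand-built baseline under stated assumptions, not the trained approach the slide describes and not any particular TTS system: the FUNCTION_WORDS list and the assign_accents function are hypothetical, and "given" is approximated crudely as "a content word already mentioned in an earlier utterance".

```python
import re
from typing import List, Tuple

# A deliberately tiny, hypothetical function-word list for illustration.
FUNCTION_WORDS = {
    "a", "an", "the", "to", "of", "in", "on", "and", "or",
    "i", "you", "do", "don't", "is", "it",
}


def assign_accents(utterances: List[str]) -> List[List[Tuple[str, bool]]]:
    """Label each word as accented (True) or deaccented (False).

    Rule 1: function words are left unaccented.
    Rule 2: content words mentioned in an earlier utterance count as
            'given' and are deaccented; new content words are accented.
    """
    seen = set()      # content words mentioned in earlier utterances
    labelled = []
    for utterance in utterances:
        words = re.findall(r"[a-z']+", utterance.lower())
        labelled.append(
            [(w, w not in FUNCTION_WORDS and w not in seen) for w in words]
        )
        seen.update(w for w in words if w not in FUNCTION_WORDS)
    return labelled


dialogue = ["Do you need a return ticket?", "No, thanks, I don't need a return."]
for labels in assign_accents(dialogue):
    # Print accented words in upper case, deaccented words in lower case.
    print(" ".join(w.upper() if accented else w for w, accented in labels))
```

On the given/new example from the pitch accent slide, this leaves "return" unaccented in the user's reply because it was already mentioned by the system. A corpus-trained system would replace these two hand-written rules with a classifier over similar transcript features.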

Timing and backchanneling
Disfluencies?
Emotion and ‘personality’
Personalized voices
Work in spoken language generation is only beginning as a serious topic of research and development. Along the way there are large questions to answer, both for dialogue and monologue generation:

Concept to Speech
Decisions in TTS depend on text analysis
Concept-to-Speech (CTS) systems should be able to do better
  System knows what it wants to say and can specify how
But….
  Still need labeled corpora to train on
  CTS features may be hard to label (focus, given/new,…)
  How to decide how to realize these?
In principle, the information TTS systems lack to support natural prosodic assignment is readily available to CTS systems. So the initial hope in the NLG community was that prosodic assignment would be a simple problem. It has proven, however, to be fairly hard. Why?

Prosody in ASRU
Little success in improving ASR transcription
More promise in other areas:
  Improving rejection
  Shrinking search space
  Automatic topic segmentation for browsing/retrieval
  Identifying ‘salient’ words in turns
  Disambiguating speech/dialogue acts: okay

Recognizing communicative ‘problems’
  ASR errors
  User corrections
  ‘Aware’ turns
  ‘Problematic’ dialogues
  Disfluencies and self-repairs
Recognizing speaker emotion

Some Research Topics
Meaning of intonational contours:
  Rise/fall/rise (L*+H L-H%)
    A: Did you take out the garbage?
    B: Sort of.
    A: Sort of!
  High rise questions (H* H-H%)
    This is the chicken Chermula?
    I’m from Skokie?

Compositional theory of intonational meaning (w/Pierrehumbert)
Intonational disambiguation across languages: Spanish, Italian and English (w/Avesani & Prieto)
  William isn’t drinking because he’s unhappy
Disfluencies: self-repairs (w/Nakatani)
  I want to go to Ba- Baltimore.
Cue phrases (w/Litman)
  Now let’s go to work.

Accent and strict/sloppy interpretations of ellipsis (w/Ward)
  People who live in Los Angeles adore its beaches and so do people who live in New York

Accent and given/new (w/Terken)
  The ball touches the circle. The ball touches the triangle. The ball touches the cone. The square touches the ball.
Intonation and discourse structure (w/Grosz & Nakatani)
  Boston Directions Corpus
Automatic assignment of accent and phrasing for TTS (w/Wang, Sproat, Koehn, Abney, Collins, Rambow)

ToBI prosodic labeling conventions (w/many)
Prosody in dialogue systems (w/Litman & Swerts): generation and understanding (TOOT)
Audio browsing and retrieval: SCAN and SCANMail (w/many)

Potential Projects
Build a TTS system in a limited domain
Build a speech recognizer
Study a speech phenomenon (disfluencies, accenting, contours, pitch range variation)
Do some experiments (production, perception)
Examples: Speech summarization, eye tracking and emotion, deceptive speech, given/new and contour,….