

CS 4705 Lecture 22 Intonation and Discourse

What does prosody convey?
In general, information about:
–What the speaker is trying to convey
  Is this a statement or a question?
–The speaker state
  Is the speaker getting angry, frustrated?
In dialogue, information about:
–The structure of the dialogue
  Is the user or the system trying to start a new topic?
  Is the speaker talking about given or new information?
–The state of the interaction
  Is the user having trouble being understood?
  Is the user having trouble understanding the system?

Current Trends
New description schemes (e.g. ToBI)
Corpus-based research and machine learning
Emphasis on evaluation of algorithms and systems (NLE ‘00 special issue)
Investigation of spontaneous speech phenomena and variation in speaking style
Applications to CTS, ASR and SDS

Corpora
Public and semi-public databases
–ATIS, SwitchBoard, Call Home, Meetings (NIST/DARPA/LDC)
–TRAINS/TRIPS (U. Rochester), FM Radio (BU), BDC (Harvard, AT&T)
Private collections
–Acquired for speech or dialogue research (August, KTH; Voic , AT&T, IBM)
–Meetings, call centers, operator services, focus group collections
The Web
–Newscasts, radio

To(nes and)B(reak)I(ndices)
Developed by prosody researchers over four meetings
Goals:
–devise a common labeling scheme for Standard American English that is robust and reliable
–promote collection of large, prosodically labeled, shareable corpora
ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, ...

Minimal ToBI transcription:
–recording of speech
–f0 contour
–ToBI tiers:
  orthographic tier: words
  break-index tier: degrees of junction (Price et al ‘89)
  tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80)
  miscellaneous tier: disfluencies, non-speech sounds, etc.

Sample ToBI Labeling

Online training material, available at:
Evaluation
–Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. ‘92, Pitrelli et al ‘94)
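The three reliability figures above can be read as agreement rates over labeler pairs. The slide does not say how the cited studies computed them, so the sketch below assumes simple pairwise agreement: every pair of labelers is compared word by word. The function name `pairwise_agreement` and the toy labels are hypothetical, not taken from the studies.

```python
from itertools import combinations


def pairwise_agreement(labelings, agree):
    """Fraction of (labeler pair, word) comparisons for which agree(a, b) is true.

    labelings: list of equal-length label sequences, one per labeler.
    agree: function taking two labels and returning True or False.
    """
    total = 0
    matches = 0
    for a, b in combinations(labelings, 2):
        for x, y in zip(a, b):
            total += 1
            matches += bool(agree(x, y))
    return matches / total if total else 0.0


# Hypothetical toy data: three labelers, four words.
# None means "no pitch accent on this word".
accents = [
    ["H*", None, "L+H*", None],
    ["H*", None, "H*", None],
    ["H*", "H*", "L+H*", None],
]
breaks = [
    [1, 1, 3, 4],
    [1, 2, 3, 4],
    [1, 1, 4, 4],
]

# Agreement on presence/absence of a tonal category.
presence = pairwise_agreement(accents, lambda x, y: (x is None) == (y is None))
# Agreement on the exact category label.
category = pairwise_agreement(accents, lambda x, y: x == y)
# Agreement on break indices to within 1 level.
break_within_1 = pairwise_agreement(breaks, lambda x, y: abs(x - y) <= 1)
```

Passing the agreement criterion as a function keeps the three measures on one code path; only the definition of "agree" changes.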

Pitch Accent/Prominence in ToBI
Which items are made intonationally prominent and how?
Accent type:
–H* simple high (declarative)
–L* simple low (yes-no question)
–L*+H scooped, late rise (uncertainty/incredulity)
–L+H* early rise to stress (contrastive focus)
–H+!H* fall onto stress (implied familiarity)

Downstepped accents: !H*, L+!H*, L*+!H
Degree of prominence:
–within a phrase: HiF0
–across phrases

Functions of Pitch Accent
Given/new information
–S: Do you need a return ticket?
–U: No, thanks, I don’t need a return.
Contrast (narrow focus)
–U: No, thanks, I don’t need a RETURN…. (I need a time schedule, receipt,…)
Disambiguation of discourse markers
–S: Now let me get you the train information.
–U: Okay (thanks) vs. Okay….(but I really want…)

Predicting Accent: Is it accented or not?
Applications: TTS and CTS
Corpora: read and spontaneous speech
Features: POS window of 3, sentence position, position within NP, # of syllables, position in complex nominal, inferred given/new status, inferred focus, mutual information
Results: 75-85% correct, depending on genre
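A minimal sketch of how a few of the listed features might be assembled for each word before being handed to a classifier. It covers only the POS window of 3, sentence position, number of syllables, and a crude stand-in for inferred given/new status (a word form already mentioned counts as given). The function name `accent_features`, the hand-assigned POS tags, and the syllable counts are illustrative assumptions, not the features as implemented in the work the slide summarizes.

```python
def accent_features(words, pos_tags, syllable_counts, history=()):
    """Build one feature dict per word for an accented/unaccented classifier.

    history: words mentioned earlier in the dialogue (used for given/new).
    """
    seen = {w.lower() for w in history}
    n = len(words)
    rows = []
    for i, word in enumerate(words):
        key = word.lower()
        rows.append({
            # POS window of 3: previous tag, current tag, next tag.
            "pos_prev": pos_tags[i - 1] if i > 0 else "<s>",
            "pos": pos_tags[i],
            "pos_next": pos_tags[i + 1] if i < n - 1 else "</s>",
            # Relative position in the sentence, from 0.0 to 1.0.
            "sent_position": i / (n - 1) if n > 1 else 0.0,
            "n_syllables": syllable_counts[i],
            # Assumption: "given" means the word form was already mentioned.
            "given": key in seen,
        })
        seen.add(key)
    return rows


# The system turn from the given/new example supplies the history.
history = ["do", "you", "need", "a", "return", "ticket"]
words = ["no", "thanks", "I", "don't", "need", "a", "return"]
pos_tags = ["UH", "UH", "PRP", "VBP", "VB", "DT", "NN"]  # hand-assigned
syllables = [1, 1, 1, 1, 1, 1, 2]

rows = accent_features(words, pos_tags, syllables, history)
```

Here "return" comes out as given because the system already said it, which is the kind of cue a classifier could use to predict that it is deaccented.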

Prosodic Phrasing in ToBI
‘Levels’ of phrasing:
–intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
–intonational phrase: 1 or more intermediate phrases + boundary tone (H% or L%)
ToBI break-index tier
–0 no word boundary
–1 word boundary
–2 strong juncture with no tonal markings
–3 intermediate phrase boundary
–4 intonational phrase boundary
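The two levels of phrasing can be recovered from the break-index tier alone: a 3 or a 4 closes an intermediate phrase, and a 4 also closes the intonational phrase. A minimal sketch, assuming one break index per word that describes the juncture after that word; the function name `group_phrases` and the break indices in the example are hypothetical.

```python
def group_phrases(words, break_indices):
    """Group words into intonational phrases, each a list of intermediate phrases.

    break_indices[i] is the ToBI break index following words[i].
    """
    intonational = []   # completed intonational phrases
    current_ip = []     # intermediate phrases in the open intonational phrase
    current = []        # words in the open intermediate phrase
    for word, bi in zip(words, break_indices):
        current.append(word)
        if bi >= 3:     # 3 or 4: an intermediate phrase ends here
            current_ip.append(current)
            current = []
        if bi == 4:     # 4: the intonational phrase ends here too
            intonational.append(current_ip)
            current_ip = []
    # Flush any material not closed by a final boundary.
    if current:
        current_ip.append(current)
    if current_ip:
        intonational.append(current_ip)
    return intonational


words = "You should buy the ticket with the discount coupon".split()
breaks = [1, 1, 1, 1, 3, 1, 1, 1, 4]  # hypothetical labeling
phrases = group_phrases(words, breaks)
```

With this labeling the sentence forms one intonational phrase containing two intermediate phrases, split after "ticket".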

Functions of Phrasing
Disambiguates syntactic constructions, e.g. PP attachment, restrictive/non-restrictive relative clause:
–S: You should buy the ticket with the discount coupon.
–S: The itinerary which I faxed includes deluxe accommodations.
Disambiguates scope ambiguities, e.g. negation:
–S: You aren’t booked through Rome because of the fare.
Or modifier scope:
–S: This fare is restricted to retired politicians and civil servants.

Predicting Phrase Boundaries
Applications: TTS, CTS, ASR
Corpora: AP news, Penn Treebank, ATIS
Features: sentence position, sentence length, POS window of 4, location of previous predicted boundary, mutual information, constituent information, dependency structure
Results: 96% correct
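Because one feature is the location of the previous predicted boundary, decisions have to be made in order, each one feeding the next. The sketch below shows that left-to-right loop with a subset of the listed features. The function `predict_boundaries`, the feature names, and the `toy_rule` classifier are hypothetical stand-ins; the slide does not specify the model, and any trained classifier could replace the rule.

```python
def predict_boundaries(pos_tags, classify):
    """Greedy left-to-right phrase-boundary prediction over word junctures.

    classify(features) -> bool is supplied by the caller.
    Returns one decision per juncture (juncture j follows word j).
    """
    n = len(pos_tags)
    padded = ["<s>", "<s>"] + list(pos_tags) + ["</s>", "</s>"]
    last_boundary = -1  # juncture index of the previous predicted boundary
    decisions = []
    for j in range(n - 1):
        features = {
            # POS window of 4: two words on each side of the juncture.
            "pos_window": tuple(padded[j + 1 : j + 5]),
            "sent_length": n,
            "words_from_start": j + 1,
            "words_to_end": n - (j + 1),
            # Depends on earlier decisions, hence the sequential loop.
            "words_since_boundary": j - last_boundary,
        }
        is_boundary = bool(classify(features))
        decisions.append(is_boundary)
        if is_boundary:
            last_boundary = j
    return decisions


def toy_rule(features):
    """Stand-in classifier: break before a preposition after 3+ words."""
    next_pos = features["pos_window"][2]
    return next_pos == "IN" and features["words_since_boundary"] >= 3


# "You should buy the ticket with the discount coupon", hand-assigned tags.
pos_tags = ["PRP", "MD", "VB", "DT", "NN", "IN", "DT", "NN", "NN"]
decisions = predict_boundaries(pos_tags, toy_rule)
```

The toy rule places a single boundary after "ticket", before the prepositional phrase.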

Contours: Accent + Phrasing
What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)?
–Speech acts (statements, questions, requests)
  S: That’ll be credit card? (L* H- H%)
–Propositional attitude (uncertainty, incredulity)
  S: You’d like an evening flight. (L*+H L- H%)
–Speaker affect (anger, happiness, love)
  U: I said four SEVEN one! (L+H* L- L%)
–“Personality”
  S: Welcome to the Sunshine Travel System.
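The contour labels in parentheses combine the three tone types of the tonal tier: pitch accents carry a `*`, phrase accents end in `-`, and boundary tones end in `%`. A minimal sketch of splitting such a label into its parts, assuming the tones are separated by spaces as in the examples above; the function name `parse_contour` is hypothetical.

```python
def parse_contour(label):
    """Split a space-separated ToBI tone string into its three tone types."""
    parsed = {"pitch_accents": [], "phrase_accents": [], "boundary_tones": []}
    for tone in label.split():
        if tone.endswith("%"):
            parsed["boundary_tones"].append(tone)
        elif tone.endswith("-"):
            parsed["phrase_accents"].append(tone)
        elif "*" in tone:
            parsed["pitch_accents"].append(tone)
        else:
            raise ValueError(f"unrecognized tone label: {tone!r}")
    return parsed


question = parse_contour("L* H- H%")       # "That'll be credit card?"
uncertain = parse_contour("L*+H L- H%")    # "You'd like an evening flight."
emphatic = parse_contour("L+H* L- L%")     # "I said four SEVEN one!"
```

The first two examples share a high boundary tone and differ in pitch accent and phrase accent, which is why the contour has to be read as a whole rather than tone by tone.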

Pitch Range and Timing
Level of speaker engagement
–S: Welcome to InfoTravel. How may I help you?
Contour interpretation
–S: You can take the L*+H bus from Malpensa to Rome L-H%.
–U: Take the bus. vs. Take the bus!
Discourse/topic structure
–Topic beginnings have higher pitch range, faster, preceded by longer pauses
–Endings the opposite

Prosody and Speaker Emotion
What makes an utterance sound angry? Sad?
–How much comes from the lexical information?
–How much from the acoustic/prosodic?
–Does all anger, e.g., sound the same?
Cahn ‘88 (examples)

Applications
Text-to-Speech and Concept-to-Speech generation: improve naturalness
Speech Recognition: identify suprasegmental meaning
Spoken Dialogue Systems: understand when people are confused, angry
Audio Browsing: format corpora for browsing and search

Challenges
We don’t really know what most contours ‘mean’
Our accent prediction needs to be sensitive to better models of given/new, focus, grammatical function
Our phrasing prediction needs better information about e.g. attachment
We don’t know much about emotional speech or ‘personality’, both critical to applications