Prosodic Cues to Discourse Segment Boundaries in Human-Computer Dialogue SIGDial 2004 Gina-Anne Levow April 30, 2004.


Roadmap
Motivation
Data Collection
  – Segment Boundary Selection
Feature Extraction & Analysis
Cues to Segment Boundaries
Preliminary Classification Study
Conclusion

Why Segment?
Enables language understanding tasks
  – Reference resolution: anaphors typically refer to entities in the current segment
  – Summarization: identify and represent the range of topics
  – Conversational understanding: constrain recognition; different interpretations in different contexts

Approaches to Segmentation
Monologue
  – Text similarity: vector space, language model, cue phrases (Hearst 1994; Beeferman et al. 1999; Marcu 2000)
  – Prosodic cues (with text): pitch, amplitude, duration, pause (Nakatani et al. 1995; Swerts 1997; Tur et al. 2001)
Human dialogue
  – Dialogue act classification (Shriberg et al. 1998; Taylor et al. 1998): text via language models; prosody via contour, accent type
Multi-party segmentation
  – Text + silence (Galley et al. 2003)
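The vector-space text-similarity approach cited above can be sketched as a minimal TextTiling-style pass: score each candidate gap by the cosine similarity of adjacent word-count windows, and treat low-similarity dips as likely topic boundaries. The window size and whitespace tokenization here are illustrative assumptions, not the cited systems' implementations.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def boundary_scores(sentences, window=2):
    """Score each gap between sentence i-1 and i by comparing the
    `window` sentences on either side; low scores suggest a boundary."""
    scores = []
    for i in range(window, len(sentences) - window + 1):
        left = Counter(w for s in sentences[i - window:i] for w in s.split())
        right = Counter(w for s in sentences[i:i + window] for w in s.split())
        scores.append(cosine(left, right))
    return scores
```

With two clearly distinct topics the single interior gap scores zero, while lexically overlapping sentences score high.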

Prosody in Human-Computer Dialogue
Errors in speech recognition
  – Prosody provides an additional source of evidence
Topic change can be expensive
Possible contrasts to human-human dialogue
  – More stilted speaking style
  – Slower conversation

Data Collection
System:
  – SpeechActs (Sun Microsystems, )
  – Voice-only interface to desktop applications: mail, calendar, weather, stock quotes, time, currency
Data:
  – 60 hours, collected during a field trial
  – 19 subjects: 4 expert, 14 novice, guest
  – Recorded at 8 kHz, 8-bit u-law; logged
  – Manually transcribed: > 7500 user utterances

Discourse Segment Boundary Data
Focus:
  – High-level discourse segment boundaries, not fine-grained subtopic analysis (future work)
    More reliably coded and extracted (Swerts 1997; Nakatani et al. 1995)
  – Task-based correspondence: align with changes from application to application
    Reliably extractable from the current data set

Data Set
Paired data set:
  – Discourse segment-final and segment-initial pairs of user utterances: the last command in the current application, and the application-change command
    U: What's the price for Sun?  (segment-final)
    S: …
    U: Switch to mail.  (segment-initial)
  – 473 pairs: extracted automatically; alignment and content verified manually

Acoustic Analysis
Features:
  – Pitch and intensity, extracted automatically (Praat)
  – 5-point median smoothed
  – Normalized per speaker/call
Scalar measures:
  – Maximum, minimum, mean over the full utterance
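A minimal sketch of the post-extraction steps on this slide: running-median smoothing, per-speaker/call z-score normalization, and the scalar measures over the full utterance. Praat does the actual pitch/intensity extraction; the function names here are hypothetical.

```python
import statistics

def median_smooth(values, window=5):
    """5-point running-median smoothing of a pitch or intensity track
    (window shrinks at the edges of the utterance)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(statistics.median(values[lo:hi]))
    return smoothed

def zscore_normalize(values):
    """Per-speaker/call normalization: zero mean, unit variance."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def scalar_features(track):
    """Scalar measures over the full utterance, as on the slide."""
    return {"max": max(track), "min": min(track), "mean": statistics.fmean(track)}
```

The median filter is what suppresses isolated pitch-tracker errors (e.g. octave jumps) that a mean filter would smear across neighboring frames.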

Acoustic Contrasts
Pitch:
  – Segment-initial vs. segment-final: maximum, minimum, and mean significantly higher
  – Lower final fall in segment-final utterances
Intensity:
  – Segment-initial vs. segment-final: mean intensity significantly higher
  – No other measures significant
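Because the data set pairs each segment-final utterance with its matching segment-initial one, the natural significance test for these contrasts is a paired t-test on per-pair differences. The slide does not name the test used, so this is a sketch of one reasonable choice, assuming already-normalized feature values as input:

```python
import math
import statistics

def paired_t(initial, final):
    """Paired t-statistic for per-pair feature differences
    (segment-initial minus segment-final); n - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(initial, final)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)
    return mean_d / (sd_d / math.sqrt(n))
```

A large positive t on, say, pitch minimum corresponds to the slide's finding that segment-initial utterances sit significantly higher.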

Acoustic Contrasts (chart slide; figure not recoverable from the transcript)

Discussion
Segment-initial utterances
  – Significantly higher in pitch and intensity
Largest contrast
  – Dramatically lower pitch in segment-final utterances
  – Low pitch as a cue to topic finality
Robust cues to discourse segment boundaries

Classification: Preliminary Experiments
Automatic prosody-based identification of segment boundaries
  – Question: does a pair of utterances span a segment boundary?
Data:
  – Ordered utterance pairs: half segment-final + segment-initial, half non-boundary

Classifier and Features
Decision tree classifier (C4.5)
Features:
  – Pitch and intensity: maximum, minimum, mean
  – Values for each utterance
  – Differences across the pair
Preliminary classification results:
  – 70–80% accuracy
  – Key features: minimum pitch, average intensity
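A toy sketch of the feature construction described above, plus the kind of single-threshold split a learned tree might place on the key feature (pitch minimum). The threshold, feature names, and decision rule are illustrative assumptions, not the paper's learned C4.5 tree.

```python
def pairwise_features(final_track, initial_track):
    """Feature vector for an ordered utterance pair: per-utterance
    max/min/mean plus cross-pair differences, as on the slide.
    Inputs are per-speaker-normalized pitch (or intensity) tracks."""
    feats = {}
    for name, track in (("final", final_track), ("initial", initial_track)):
        feats[f"{name}_max"] = max(track)
        feats[f"{name}_min"] = min(track)
        feats[f"{name}_mean"] = sum(track) / len(track)
    for stat in ("max", "min", "mean"):
        feats[f"diff_{stat}"] = feats[f"initial_{stat}"] - feats[f"final_{stat}"]
    return feats

def is_boundary(feats, min_pitch_rise=0.5):
    """Hypothetical decision stump: flag a boundary when the pitch minimum
    rises sharply from the first utterance to the second (illustrative
    threshold, not from the paper)."""
    return feats["diff_min"] > min_pitch_rise
```

In practice one would fit the tree (C4.5, or e.g. scikit-learn's `DecisionTreeClassifier`) on the 473 boundary pairs plus an equal number of non-boundary pairs rather than hand-setting a threshold.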

Classifier Tree (decision-tree figure; only a "Min" node label survives in the transcript)

Conclusions & Future Work
Discourse segment boundaries in HCI
  – Segment-initial utterances show significant increases in pitch and intensity relative to segment-final
  – Robust contrastive use of pitch and intensity
  – Preliminary classification: 70–80% accuracy; key differences in pitch minimum and intensity
Extend to subdialogue structure
  – Richer feature set, data set
