Adapting Dialogue Models Discourse & Dialogue CMSC 35900-1 November 19, 2006.


Roadmap
- The problem: portability
- Task domain: call-routing
- Porting:
  - Speech recognition
  - Call-routing
  - Dialogue management
- Conclusions

SLS Portability
- Spoken language system design:
  - Record or simulate user interactions
  - Collect vocabulary, sentence style, and sequences; transcribe and label
  - An expert creates the vocabulary, language model, and dialogue model
- Problem: costly, time-consuming, and dependent on expert effort

Call-routing
- Goal: given an utterance, identify its type and dispatch it to the right operator
- Classification task: manual rules or data-driven methods
  - Feature-based classification (boosting)
- Pre-defined types, e.g.:
  - "Hello?" -> hello
  - "I have a question" -> request(info)
  - "I would like to know my balance." -> request(balance)
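The feature-based routing sketched on this slide can be illustrated with a toy scorer. The actual system uses a boosted classifier over word n-grams; the call types, feature weights, and reject threshold below are invented for illustration.

```python
# Toy call-router: scores an utterance against per-type weights on
# word n-gram features. A stand-in for the boosted classifier on the
# slide; the types and weights are illustrative, not from the paper.

def ngrams(words, n_max=2):
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats.append(" ".join(words[i:i + n]))
    return feats

# Hypothetical feature weights per call type.
WEIGHTS = {
    "hello":            {"hello": 2.0, "hi": 2.0},
    "request(info)":    {"question": 1.5, "have a": 0.5},
    "request(balance)": {"balance": 2.0, "my balance": 1.0},
}

def route(utterance, reject_threshold=1.0):
    words = utterance.lower().strip("?!.").split()
    feats = ngrams(words)
    scores = {t: sum(w.get(f, 0.0) for f in feats)
              for t, w in WEIGHTS.items()}
    best = max(scores, key=scores.get)
    # Mirror the reject-yn step: low-confidence calls go to a human.
    return best if scores[best] >= reject_threshold else "reject"

print(route("I would like to know my balance"))  # request(balance)
print(route("Hello?"))                           # hello
```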

Dialogue Management
- Flow controller:
  - Pluggable dialogue strategy modules
  - ATN: call-flow, easy to augment, manages context
  - Inputs: context, semantic representation of the utterance
- ASR:
  - Language models: trigrams, in a probabilistic framework
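The "pluggable strategy modules" idea can be sketched as a controller that hands each semantic representation, plus the running context, to the first module that claims it. The module names and semantic keys here are hypothetical, not from the system described.

```python
# Minimal flow controller with pluggable dialogue strategy modules.
# Each module inspects the semantic representation of the utterance;
# the controller owns the shared context.

class ClarifyStrategy:
    def handles(self, sem):
        return sem.get("type") == "vague"
    def respond(self, context, sem):
        return "Could you tell me more about what you need?"

class RouteStrategy:
    def handles(self, sem):
        return "route_to" in sem
    def respond(self, context, sem):
        context["routed"] = sem["route_to"]   # record routing in context
        return f"Connecting you to {sem['route_to']}."

class FlowController:
    def __init__(self, strategies):
        self.strategies = strategies
        self.context = {}
    def step(self, sem):
        for s in self.strategies:
            if s.handles(sem):
                return s.respond(self.context, sem)
        return "Sorry, I didn't understand."

dm = FlowController([ClarifyStrategy(), RouteStrategy()])
print(dm.step({"type": "vague"}))
print(dm.step({"route_to": "billing"}))
```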

Adaptation: ASR
- ASR language models are usually trained from in-domain transcriptions
- Here: out-of-domain transcriptions
  - Switchboard; spoken dialogue corpora (telecom, insurance)
  - In-domain web pages
- New domain: pharmaceuticals
- Style differences: spoken dialogue data best covers pronouns; web data best covers OOV medical terms
- Best accuracy: spoken dialogue + web
  - Switchboard alone too big/slow
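One standard way to mix out-of-domain dialogue transcripts with in-domain web text, consistent with the combination reported on this slide, is linear interpolation of the component models. The sketch below uses unigrams and made-up corpora and weights; the actual system used trigram models.

```python
# Sketch: combine language models estimated from different corpora by
# linear interpolation. Corpora and the weight lam are invented.

from collections import Counter

def unigram_lm(corpus):
    counts = Counter(w for sent in corpus for w in sent.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

dialogue = ["i would like to check my account"]       # spoken-style data
web      = ["ibuprofen dosage information", "aspirin dosage"]  # domain terms

p_dial, p_web = unigram_lm(dialogue), unigram_lm(web)

def p_mix(word, lam=0.6):
    # Interpolated probability; lam would be tuned on held-out data.
    return lam * p_dial.get(word, 0.0) + (1 - lam) * p_web.get(word, 0.0)

print(p_mix("my"))      # covered by the dialogue data (pronouns)
print(p_mix("dosage"))  # covered by the web data (medical terms)
```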

Adaptation: Call-routing
- Manual tagging: slow, expensive
- Here: existing out-of-domain labeled data
- Meta call-type library:
  - Generic: shared across all applications
  - Re-usable: in-domain, but already exist
  - Specific: only this application
- Grouping done by experts
- Bootstrap: start with generic and re-usable types

Call-type Classification
- Boostexter: word n-gram features; 1,100 iterations
  - Classifies from ASR output
- Telecom-based call-type library
- Two classifications: reject-yes/no; call-type classification
- Results (true transcripts; ASR output):
  - In-domain: 78%; 62%
  - Generic, tested on generic: 95%; 91%
  - Bootstrap (generic + re-usable + rules): 79%; 68%
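Boosting over word features, in the spirit of the BoosTexter setup named here, can be sketched with classic AdaBoost and word-presence "stumps" as the weak learners. The real system uses word n-gram features and far more rounds; the data, labels, and round count below are invented.

```python
# Toy AdaBoost with word-presence stumps for a binary reject decision.
# Everything here is illustrative, not the published configuration.

import math

# (utterance, label): +1 = in-domain request, -1 = reject
data = [
    ("check my balance",         +1),
    ("what is my balance",       +1),
    ("transfer funds please",    +1),
    ("uh hello is anyone there", -1),
    ("wrong number sorry",       -1),
    ("hello hello",              -1),
]

vocab = sorted({w for s, _ in data for w in s.split()})

def stump(word, s):                  # weak learner: is the word present?
    return +1 if word in s.split() else -1

def train(rounds=5):
    n = len(data)
    w = [1.0 / n] * n                # example weights
    ensemble = []                    # (alpha, word) pairs
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = min(vocab, key=lambda v: sum(
            wi for wi, (s, y) in zip(w, data) if stump(v, s) != y))
        err = sum(wi for wi, (s, y) in zip(w, data) if stump(best, s) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        # Re-weight: boost the misclassified examples.
        w = [wi * math.exp(-alpha * y * stump(best, s))
             for wi, (s, y) in zip(w, data)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def classify(ensemble, s):
    score = sum(a * stump(v, s) for a, v in ensemble)
    return +1 if score >= 0 else -1

model = train()
print(classify(model, "show me my balance"))  # +1 (route)
print(classify(model, "hello anyone"))        # -1 (reject)
```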

Dialogue Model
- Build dialogue strategy templates based on call-type classification
- Generic types (e.g. yes, no, hello, repeat, help) trigger a generic, context-dependent reply
- Tag call-types as vague or concrete:
  - Vague ("I have a question") -> clarification
  - Concrete -> clear routing; attributes -> sub-dialogues

Dialogue Model Porting
- Evaluation: compare to the original transcribed dialogues
- Task 1: DM categories: 32 clusters of calls
  - Bootstrap covers 16 categories, ~70% of instances
- Using the call-type classifiers: get class, confidence, concrete?
  - If confident, concrete, and correct -> correct; if incorrect -> error
  - Also classify vague/generic
- 67-70% accuracy for the DM and routing tasks

Conclusions
- Portability: bootstrapping of ASR, call-type classification, and DM
- Generally effective:
  - Call-type success high
  - Others show potential

Turn-taking Discourse and Dialogue CS November 16, 2004

Agenda
- Motivation: silence in human-computer dialogue
- Turn-taking in human-human dialogue:
  - Turn-change signals
  - Back-channel acknowledgments
  - Maintaining contact
- Exploiting these to improve human-computer communication:
  - Automatic identification of disfluencies, jump-in points, and jump-ins

Turn-taking in HCI
- Human turn end: detected by 250 ms of silence
- System turn end: signaled by end of speech
- Barge-in: indicated by any human sound
- Continued attention: no signal
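The 250 ms silence endpointer described here can be sketched as a counter over voice-activity frames: the human's turn is declared over once enough consecutive non-speech frames accumulate. The frame size is an assumption for illustration.

```python
# Silence-based endpointing: declare the turn finished after 250 ms
# of continuous non-speech. Frame size (10 ms) is illustrative.

FRAME_MS = 10
SILENCE_MS = 250

def turn_end_frame(is_speech):
    """Index of the frame where the turn is declared over,
    or None if the silence threshold is never reached."""
    needed = SILENCE_MS // FRAME_MS
    run = 0
    for i, speech in enumerate(is_speech):
        run = 0 if speech else run + 1
        if run >= needed:
            return i
    return None

# 30 speech frames, then silence: turn ends 25 frames (250 ms) later.
frames = [True] * 30 + [False] * 40
print(turn_end_frame(frames))  # 54
```

This also shows why such endpointers misfire: a thoughtful mid-turn pause longer than the threshold is indistinguishable from a finished turn.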

Missed turn example

Yielding & Taking the Floor
- Turn-change signal: offers the floor to the auditor/hearer
  - Cues: pitch fall, lengthening, "but uh", end of gesture, amplitude drop + "uh", end of clause
  - Likelihood of a turn change increases with more cues
  - Negated by any gesticulation
- Speaker-state signal:
  - Shift in head direction and/or start of gesture
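The "more cues, more likely a change" pattern can be sketched as a simple proportion of observed cues, zeroed out by gesticulation. The linear scale is an illustrative assumption, not the empirical relationship.

```python
# Toy turn-change likelihood: fraction of turn-yielding cues observed,
# negated entirely by any gesticulation. Scale is illustrative.

CUES = {"pitch_fall", "lengthening", "but_uh", "end_gesture",
        "amplitude_drop_uh", "end_clause"}

def turn_change_likelihood(observed, gesticulating=False):
    if gesticulating:            # any gesticulation negates the signal
        return 0.0
    return len(CUES & set(observed)) / len(CUES)

print(turn_change_likelihood({"pitch_fall", "end_clause"}))        # 2 of 6 cues
print(turn_change_likelihood({"pitch_fall"}, gesticulating=True))  # 0.0
```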

Retaining the Floor
- Within-turn signal: still speaker; looks at the hearer at end of clause
- Continuation signal: still speaker; looks away after within-turn/back-channel
- Back-channel:
  - "mm-hm"/"okay"/etc.; nods, sentence completion, clarification request, restatement
  - NOT a turn: signals attention, agreement, or confusion

Segmenting Turns
- Speaker alone:
  - Within-turn signal -> end of one unit
  - Continuation signal -> beginning of the next unit
- Joint signals:
  - Speaker turn signal (end); auditor -> speaker; speaker -> auditor
  - Within-turn + back-channel + continuation
- Back-channels signal understanding
  - Early back-channel + continuation

Regaining Attention
- Gaze & disfluency
  - Disfluency: a "perturbation" in speech: silent pause, filled pause, restart
  - Gaze: conversants don't stare at each other constantly, but the speaker expects to meet the hearer's gaze to confirm the hearer's attention
- Disfluency occurs when the speaker realizes the hearer is NOT attending
  - The speaker pauses until the hearer begins gazing, or to request attention

Improving Human-Computer Turn-taking
- Identify cues to turn change and turn start
- Meeting conversations:
  - Recorded, natural research meetings
  - Multi-party, with overlapping speech
  - Units = "spurts" between 500 ms silences
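The "spurt" unit can be sketched directly: one talker's word stream is cut wherever a pause of 500 ms or more occurs. The word timings below are invented for illustration.

```python
# Segment one speaker's words into "spurts" at pauses >= 500 ms.
# Input: list of (start_s, end_s) word times; numbers are invented.

def spurts(word_times, gap_s=0.5):
    units, current = [], [word_times[0]]
    for prev, word in zip(word_times, word_times[1:]):
        if word[0] - prev[1] >= gap_s:   # pause before this word
            units.append(current)
            current = []
        current.append(word)
    units.append(current)
    return units

words = [(0.0, 0.2), (0.25, 0.5), (1.2, 1.4), (1.45, 1.7)]
print(spurts(words))  # two spurts: the 0.7 s gap splits the stream
```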

Tasks
- Sentence/disfluency/non-boundary identification: end of sentence, break-off, or continue
- Jump-in points: times when others "jump in"
- Jump-in words: interruption vs. start from silence
- Off- and on-line; language model and/or prosodic cues

Text + Prosody
- Text sequence:
  - Modeled as an n-gram language model
  - Hidden event prediction, e.g. boundary as hidden state; implemented as an HMM
- Prosody:
  - Duration, pitch, pause, energy
  - Decision trees: classify + give a probability
- Integrate LM + DT
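The "integrate LM + DT" step can be illustrated with a per-position combination of the two posteriors. The actual integration runs through the hidden-event HMM; the log-linear mixture and weight below are a simplified stand-in.

```python
# Combine the language model's boundary probability with the prosody
# decision tree's probability at one inter-word position. Log-linear
# interpolation with a made-up weight lam; the real system combines
# the knowledge sources inside a hidden-event HMM.

import math

def combined_boundary_prob(p_lm, p_dt, lam=0.5):
    # Normalized log-linear mixture over {boundary, no-boundary}.
    b  = math.exp(lam * math.log(p_lm) + (1 - lam) * math.log(p_dt))
    nb = math.exp(lam * math.log(1 - p_lm) + (1 - lam) * math.log(1 - p_dt))
    return b / (b + nb)

# LM fairly confident, prosody less so:
print(round(combined_boundary_prob(0.9, 0.6), 3))  # 0.786
```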

Interpreting Breaks
- For each inter-word position: is it a disfluency, a sentence end, or a continuation?
- Key features: pause duration, vowel duration
- 62% accuracy wrt a 50% chance baseline (~90% overall)
- Best results combine LM & DT

Jump-in Points (Used)
- Possible turn changes: points WITHIN a spurt where a new speaker starts
- Key features: pause duration, low energy, pitch fall
  - No lexical/punctuation features used
  - Forward features useless: these points look like sentence boundaries but aren't
- Accuracy: 65% wrt a 50% baseline
- Performance depends only on the preceding prosodic features

Jump-in Features
- Do people speak differently when jumping in? Do jump-ins differ from regular turn starts?
- Examine only the first words of turns: no LM
- Key features: raised pitch, raised amplitude
- Accuracy: 77% wrt a 50% baseline, using prosody only

Summary
- Prosodic features signal conversational moves:
  - Pause and vowel duration distinguish sentence end, disfluency, and fluent continuation
  - Jump-ins occur at locations that sound like sentence ends
  - Speakers raise their voices when jumping in