Spoken Dialogue Systems


Spoken Dialogue Systems CS 4705

Talking to a Machine... and (Often) Getting an Answer
Today’s spoken dialogue systems make it possible to accomplish real tasks without talking to a person.
Could Eliza do this? What do today’s systems do better? Do they actually embody human intelligence?
Key advances:
  Stick to goal-directed interactions in a limited domain
  Prime users to adopt the vocabulary you can recognize
  Partition the interaction into manageable stages
  Judicious use of system vs. mixed initiative

10 or 15 years ago we talked about talking to machines, we studied the problems involved, and we even built demo dialogue systems. Today when you talk to a machine, you have a chance of getting a useful answer about travel arrangements, your email, or your finances. What’s made this possible?

Dialogue vs. Monologue
Monologue and dialogue both involve interpreting:
  Information status
  Coherence issues
  Reference resolution
  Speech acts, implicature, intentionality
Dialogue involves managing:
  Turn-taking
  Grounding and repairing misunderstandings
  Initiative and confirmation strategies

Segmenting Speech into Utterances
What is an ‘utterance’? Why is end-of-utterance (EOU) detection harder than end-of-sentence (EOS) detection? How does speech differ from text?
A single syntactic sentence may span several turns:
  A: We've got you on USAir flight 99
  B: Yep
  A: leaving on December 1.
Multiple syntactic sentences may occur in a single turn:
  A: We've got you on USAir flight 99 leaving on December 1. Do you need a rental car?
Intonational definitions: intonational phrase, breath group, intonation unit

Turns and Utterances
Dialogue is characterized by turn-taking: who should talk next, and when they should talk
How do we identify turns in recorded speech?
  Little speaker overlap (around 5% in English, although this depends on domain)
  But little silence between turns either
How do we know when a speaker is giving up or taking a turn? Holding the floor? How do we know when a speaker is interruptible?

Simplified Turn-Taking Rule (Sacks et al)
At each transition-relevance place (TRP) of each turn:
  If current speaker has selected A as next speaker, then A must speak next
  If current speaker does not select next speaker, any other speaker may take next turn
  If no one else takes next turn, the current speaker may take next turn
TRPs are where the structure of the language allows speaker shifts to occur
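The three clauses of the simplified rule can be read as a small decision procedure. The sketch below is only an illustration under stated assumptions: the function name `next_speaker` and its arguments are hypothetical, and real TRP detection (deciding when the rule applies at all) is not modeled.

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 self_selectors: Sequence[str]) -> str:
    """Apply the simplified turn-taking rule at a single TRP.

    current        -- who holds the turn now
    selected       -- the speaker chosen by `current`, if any
    self_selectors -- other participants who try to start, in order of onset
    (All three arguments are illustrative assumptions, not part of the slides.)
    """
    # Clause 1: the current speaker selected someone, so that person speaks next.
    if selected is not None:
        return selected
    # Clause 2: nobody was selected, so the first other participant
    # to self-select takes the turn.
    for speaker in self_selectors:
        if speaker != current:
            return speaker
    # Clause 3: nobody else took the turn, so the current speaker may continue.
    return current
```

For example, `next_speaker("A", None, [])` returns `"A"`: with no selection and no self-selection, the floor stays with the current speaker.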

Adjacency pairs set up next speaker expectations
  GREETING/GREETING
  QUESTION/ANSWER
  COMPLIMENT/DOWNPLAYER
  REQUEST/GRANT
‘Significant silence’ is dispreferred
  A: Is there something bothering you or not? (1.0s)
  A: Yes or no? (1.5s)
  A: Eh?
  B: No.

Intonational Cues to Turntaking
Continuation rise (L-H%) holds the floor
H-H% requests a response
  L* H-H% (ynq contour)
  H* H-H% (high-rise question contour)
Intonational contours signal dialogue acts in adjacency pairs

Timing and Turntaking
How should we time responses in an SDS?
Japanese studies of aizuchi (backchannels) in natural speech (Koiso et al ‘98, Takeuchi et al ‘02)
  Lexical information: particles ne and ka ending the preceding turn or (in telephone shopping) product names
  Length of preceding utterance, f0, loudness, and following pause are even more important in predicting turntaking

Turntaking and Initiative Strategies
System Initiative
  S: Please give me your arrival city name.
  U: Baltimore.
  S: Please give me your departure city name....
User Initiative
  S: How may I help you?
  U: I want to go from Boston to Baltimore on November 8.
‘Mixed’ initiative
  U: I want to go to Boston.
  S: What day do you want to go to Boston?
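One way to picture the difference between these strategies is as slot filling over a travel frame. The following is a minimal sketch, not how any particular system implements it: the slot names, the prompt table, and the toy regular expressions are all assumptions made for illustration, and a real system would use an ASR and language-understanding component instead of pattern matching on text.

```python
import re

# Hypothetical slot names and prompts (assumptions for this sketch).
SLOTS = ("arrival_city", "departure_city", "date")
PROMPTS = {
    "arrival_city": "Please give me your arrival city name.",
    "departure_city": "Please give me your departure city name.",
    "date": "What day do you want to travel?",
}

# Toy patterns: capitalized word after "from"/"to", "Month day" after "on".
PATTERNS = {
    "departure_city": r"\bfrom ([A-Z]\w+)",
    "arrival_city": r"\bto ([A-Z]\w+)",
    "date": r"\bon ([A-Z]\w+ \d+)",
}


def system_initiative_prompt(frame):
    """System initiative: ask for the first unfilled slot, in fixed order."""
    for slot in SLOTS:
        if slot not in frame:
            return PROMPTS[slot]
    return None  # every slot is filled


def mixed_initiative_update(frame, utterance):
    """Mixed initiative: accept whatever slots the user volunteers."""
    updated = dict(frame)
    for slot, pattern in PATTERNS.items():
        match = re.search(pattern, utterance)
        if match:
            updated[slot] = match.group(1)
    return updated
```

With system initiative the user answers one prompt at a time; with the mixed-initiative update, a single utterance such as "I want to go from Boston to Baltimore on November 8." fills all three slots, and the system only prompts for whatever is still missing.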

Grounding (Clark & Schaefer ‘89)
Conversational participants don’t just take turns speaking... they try to establish common ground (or mutual belief)
The hearer (H) must ground the speaker’s (S’s) utterances by making it clear whether or not understanding has occurred
How do hearers do this?
  S: I can upgrade you to an SUV at that rate.
Continued attention
  (U gazes appreciatively at S)
Relevant next contribution
  U: Do you have a RAV4 available?

Acknowledgement/backchannel
  U: Ok/Mhmmm/Great!
Demonstration/paraphrase
  U: An SUV.
Display/repetition
  U: You can upgrade me to an SUV at the same rate?
Request for repair
  U: I beg your pardon?

Detecting Grounding Behavior
Evidence of system misconceptions is reflected in user responses (Krahmer et al ‘99, ‘00)
Responses to incorrect verifications:
  contain more words (or are empty)
  show marked word order (especially after implicit verifications)
  contain more disconfirmations, more repeated/corrected info
‘No’ after incorrect verifications vs. other ynq’s has:
  higher boundary tone
  wider pitch range
  longer duration
  longer pauses before and after
  more additional words after it

Signalling whether information is grounded or not (Clark & Wilkes-Gibbs ‘86, Clark & Schaefer ‘89): presentation/acceptance
Corpus: 120 dialogues for Dutch train information; one version of the system uses explicit verification and one implicit; 20 users were each given 3 tasks; 443 verification question/answer pairs were analyzed.
Prediction: responses to correct verifications would be shorter, with unmarked word order, not repeating or correcting information but presenting new information (positive cues), following the principle of least effort.
Findings: where there are problems, subjects use more words (or say nothing), use marked word order (especially after implicit verifications), and produce more disconfirmations (duh), with more repeated and corrected information.
ML experiments (memory-based learning) show 97% correct prediction from these features (>8 words or marked word order or corrects info -> 92%).
Krahmer et al ‘99b predicted additional prosodic cues for negative signals: high boundary tone, high pitch range, long duration of ‘nee’ and of the entire utterance, long pause after ‘nee’, long delay before ‘no’, from 109 negative answers to ynqs of 7 speakers; hyp
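The simple rule quoted in the notes (more than 8 words, or marked word order, or corrected information) can be written down directly. This is only a sketch of that rule of thumb, not the memory-based learner used in the study: the function name is hypothetical, and the two boolean cues are assumed to come from some upstream annotation or parser that is not shown here.

```python
def signals_problem(response: str,
                    marked_word_order: bool,
                    corrects_info: bool,
                    max_words: int = 8) -> bool:
    """Flag a user response as a likely reaction to an incorrect verification.

    marked_word_order and corrects_info are assumed to be supplied by
    annotation or another component; only the word count is computed here.
    """
    word_count = len(response.split())
    return word_count > max_words or marked_word_order or corrects_info
```

A short plain answer such as "yes" with neither cue present is not flagged, while a long response, or any response that corrects information, is.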

User information state reflected in response (Shimojima et al ’99, ‘01)
Echoic responses repeat prior information, as acknowledgment or request for confirmation
  S1: Then go to Keage station.
  S2: Keage.
Experiment:
  Identify ‘degree of integration’ and prosodic features (boundary tone, pitch range, tempo, initial pause)
  Perception studies to elicit ‘integration’ effect
Results: fast tempo, little pause and low pitch signal high integration

Shimojima et al: a prior observational study produced a corpus of task-oriented conversations in Japanese (block configuration task, face-to-face, 2 party).
Focused on immediate repeats from the responder (not hello/hello, etc.).
Consensus labeling of the ‘degree to which responder seemed to have integrated information’ (1-5).
Chi-square and t-tests show close correlation of ratings and prosodic features.
Modified the tokens’ tempo, ppau and pitch to create prosody favoring very high and very low integration ratings, and presented these to subjects to rate, 1-5.
Asked what grounding functions were being performed (request-repair or ack).
Overall, fast tempo, little pause and low pitch signal integration.

Grounding and Confirmation Strategies
U: I want to go to Baltimore.
Explicit
  S: Did you say you want to go to Baltimore?
Implicit
  S: Baltimore. (H* L- L%)
  S: Baltimore? (L* H- H%)
  S: What time do you want to leave Baltimore?
No confirmation
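A system has to choose among these strategies turn by turn. One common design, which is an assumption here and not something stated on the slide, is to pick the strategy from the recognizer's confidence score. The thresholds and function names below are hypothetical, and the prompt wording simply reuses the slide's examples.

```python
def choose_strategy(asr_confidence: float,
                    low: float = 0.5,
                    high: float = 0.9) -> str:
    """Map a recognizer confidence in [0, 1] to a confirmation strategy.

    The thresholds are illustrative placeholders, not tuned values.
    """
    if asr_confidence < low:
        return "explicit"
    if asr_confidence < high:
        return "implicit"
    return "none"


def confirmation_prompt(strategy: str, city: str) -> str:
    """Build the next system prompt for a recognized city name."""
    if strategy == "explicit":
        # Ask directly whether the recognized value is right.
        return f"Did you say you want to go to {city}?"
    if strategy == "implicit":
        # Embed the recognized value in the next question.
        return f"What time do you want to leave {city}?"
    # No confirmation: move on without repeating the value.
    return "What time do you want to leave?"
```

The trade-off is the usual one: explicit confirmation costs an extra turn but makes errors easy to catch, while implicit confirmation is faster but relies on the user noticing and correcting a wrong value.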

How do we evaluate Dialogue Systems?
PARADISE framework (Walker et al ’00)
“Performance” of a dialogue system is affected both by what gets accomplished by the user and the dialogue agent and by how it gets accomplished
  Maximize Task Success
  Minimize Costs
    Efficiency Measures
    Qualitative Measures

What metrics should we use?
Efficiency of the Interaction: User Turns, System Turns, Elapsed Time
Quality of the Interaction: ASR rejections, Time Out Prompts, Help Requests, Barge-Ins, Mean Recognition Score (concept accuracy), Cancellation Requests
User Satisfaction
Task Success: perceived completion, information extracted

User Satisfaction: Sum of Many Measures
Was Annie easy to understand in this conversation? (TTS Performance)
In this conversation, did Annie understand what you said? (ASR Performance)
In this conversation, was it easy to find the message you wanted? (Task Ease)
Was the pace of interaction with Annie appropriate in this conversation? (Interaction Pace)
In this conversation, did you know what you could say at each point of the dialog? (User Expertise)
How often was Annie sluggish and slow to reply to you in this conversation? (System Response)
Did Annie work the way you expected her to in this conversation? (Expected Behavior)
From your current experience with using Annie to get your email, do you think you'd use Annie regularly to access your mail when you are away from your desk? (Future Use)

Performance Model
Weights trained for each independent factor via multiple regression modeling: how much does each contribute to User Satisfaction?
Result useful for system development
  Making predictions about system modifications
  Distinguishing ‘good’ dialogues from ‘bad’ dialogues
But... can we also tell on-line when a dialogue is ‘going wrong’?
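In the PARADISE papers the performance function is usually written as a weighted task-success term minus a weighted sum of costs, with every factor z-score normalized so the weights are comparable. The sketch below only shows how such a model would be applied once weights are available; the weights, factor names, and data are made-up placeholders, and the regression step that would actually fit the weights against user satisfaction scores is not shown.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Z-score normalize a list of values (assumes they are not all equal)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def predicted_performance(task_success, costs, alpha, weights):
    """Score each dialogue as alpha * N(success) - sum_i w_i * N(cost_i).

    task_success -- one task-success value per dialogue
    costs        -- dict: cost name -> one value per dialogue
    alpha        -- weight on task success (hypothetical, from regression)
    weights      -- dict: cost name -> weight (hypothetical, from regression)
    """
    n_success = z_normalize(task_success)
    n_costs = {name: z_normalize(vals) for name, vals in costs.items()}
    scores = []
    for i in range(len(task_success)):
        penalty = sum(weights[name] * n_costs[name][i] for name in costs)
        scores.append(alpha * n_success[i] - penalty)
    return scores
```

With two dialogues, `task_success=[1.0, 0.0]`, `costs={"user_turns": [10, 20]}`, `alpha=0.5` and a `user_turns` weight of 0.25, the successful, shorter dialogue scores 0.75 and the other scores -0.75.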

Identifying Misrecognitions, Awares and User Corrections Automatically (Hirschberg, Litman & Swerts)
Collect corpus from interactive voice response system
Identify speaker ‘turns’
  that are incorrectly recognized
  where speakers are first aware of an error
  that correct misrecognitions
Identify prosodic features of turns in each category and compare to other turns
Use Machine Learning techniques to train a classifier to make these distinctions automatically

Turn Types
Here are examples of the 3 turn types we focus on: misrecognition, aware site, correction.
  TOOT: Hi. This is AT&T Amtrak Schedule System. This is TOOT. How may I help you?
  User: Hello. I would like trains from Philadelphia to New York leaving on Sunday at ten thirty in the evening.
  TOOT: Which city do you want to go to?
  User: New York.

Results
Reduced error in predicting misrecognized turns to 8.64%
Error in predicting ‘awares’ (12%)
Error in predicting corrections (18-21%)

Conclusions
Spoken dialogue systems present new problems, but also new possibilities
  Recognizing speech introduces a new source of errors
  Additional information provided in the speech stream offers new information about users’ intended meanings and emotional state (grounding of information, speech acts, reaction to system errors)
Why spoken dialogue systems rather than web-based interfaces?