Spoken Dialogue Systems


Spoken Dialogue Systems: Lecture 22, CS 4705

Talking to a Machine... and Getting an Answer

Today's spoken dialogue systems make it possible to accomplish real tasks, over the phone, without talking to a person:
- Real-time speech technology enables real-time interaction.
- Speech recognition and understanding are 'good enough' for limited, goal-directed interactions.
- Careful dialogue design can be tailored to the capabilities of the component technologies: a limited domain, and judicious use of system initiative vs. mixed initiative.

Ten or fifteen years ago we talked about talking to machines, we studied the problems involved, and we even built demo dialogue systems. Today, when you talk to a machine, you have a chance of getting a useful answer, whether about travel arrangements, your email, or your finances. What has made this possible?

Why is Dialogue Different?

Different phenomena to model:
- Turn-taking
- Grounding
- Speech acts / dialogue acts

New problems to handle:
- How much flexibility to allow: initiative strategies
- How to deal with error: confirmation strategies
- How to evaluate 'success'

Turns and Utterances

Dialogue is characterized by turn-taking: who should talk next, and when they should talk.
- How do we identify turns in recorded speech? There is little speaker overlap (around 5% in English, although this depends on the domain), but also little silence between turns.
- How do we know when a speaker is giving up or taking a turn?

Simplified Turn-Taking Rule (Sacks et al.)

At each transition-relevance place of each turn:
1. If during this turn the current speaker has selected A as the next speaker, then A must speak next.
2. If the current speaker does not select the next speaker, any other speaker may take the next turn.
3. If no one else takes the next turn, the current speaker may take the next turn.

Transition-relevance places are points where the structure of the language allows speaker shifts to occur. A minimal code sketch of the rule appears below.
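
The following is a minimal sketch of the rule as dialogue-manager logic, under assumed inputs: a hypothetical `selected_next` field (the speaker the current speaker has selected, if any) and a `volunteers` list (speakers bidding for the floor). These names are illustrative, not from Sacks et al.

```python
from typing import List, Optional

def next_speaker(current: str,
                 selected_next: Optional[str],
                 volunteers: List[str]) -> str:
    """Apply the simplified turn-taking rule at a transition-relevance place."""
    # Rule 1: the current speaker has selected the next speaker.
    if selected_next is not None:
        return selected_next
    # Rule 2: otherwise, any other speaker may self-select.
    if volunteers:
        return volunteers[0]  # e.g., whoever starts speaking first
    # Rule 3: otherwise, the current speaker may continue.
    return current

# No one is selected and no one self-selects, so A keeps the turn:
assert next_speaker("A", None, []) == "A"
```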

Adjacency pairs (set up next-speaker expectations):
- GREETING / GREETING
- QUESTION / ANSWER
- COMPLIMENT / DOWNPLAYER
- REQUEST / GRANT

Significant silence follows the first element of an adjacency pair:
A: Is there something bothering you or not? (1.0s)
A: Yes or no? (1.5s)
A: Eh?
B: No.
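
A dialogue manager can encode these expectations as a simple lookup; the sketch below is illustrative only (the table and function names are invented):

```python
# Adjacency-pair expectations as a lookup table a dialogue manager can use
# to predict, and help interpret, the next dialogue act.
ADJACENCY_PAIRS = {
    "GREETING": "GREETING",
    "QUESTION": "ANSWER",
    "COMPLIMENT": "DOWNPLAYER",
    "REQUEST": "GRANT",
}

def expected_response(first_pair_part: str) -> str:
    """Return the dialogue act that the first pair part sets up."""
    return ADJACENCY_PAIRS.get(first_pair_part, "UNKNOWN")

print(expected_response("QUESTION"))  # -> ANSWER
```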

Utterances

Transition-relevance places typically occur at utterance boundaries, but how do we define 'utterance'?
- Spoken utterances are typically shorter, contain more pronouns, and include repairs, compared to written sentences.
- Cues for finding boundaries: cue words, n-grams, prosody.
- A single sentence may span several turns:
  A: We've got you on USAir flight 99
  B: Yep
  A: leaving on December 1.
- Multiple sentences may occur in a single turn:
  A: We've got you on USAir flight 99 leaving on December 1. Do you need a rental car?

Grounding

Conversational participants must continually establish common ground (i.e., mutual belief). The hearer must ground the speaker's utterances, by making it clear that (believed) understanding has occurred, or else indicate that a grounding problem occurred. How do hearers do this?

Acknowledgement: a continuer / backchannel / acknowledgement token (also nods, if vision is available).
A: ... returning on U.S. flight one.
C: Mm hmm
('Mm hmm' grounds A's utterance, and also returns the turn.)

Display (a stronger method): display all or part of the utterance to be grounded, verbatim.
C: OK I'll take the 5ish flight on the 11th.
A: On the 11th?

A request for repair indicates lack of grounding:
A: Huh?
C: I'll take the 5ish flight on the 11th.

Detecting Grounding Automatically

Evidence of system misconceptions is reflected in user responses (Krahmer et al. '99, '00). Responses to incorrect verifications:
- contain more words (or are empty)
- show marked word order (especially after implicit verifications)
- contain more disconfirmations, and more repeated/corrected information

'No' after an incorrect verification, compared with 'no' after other yes/no questions, has:
- a higher boundary tone
- a wider pitch range
- a longer duration
- longer pauses before and after
- more additional words after it

Background: signalling whether or not information is grounded is analyzed as presentation/acceptance (Clark & Wilkes-Gibbs '86, Clark & Schaefer '89).

Study details: 120 dialogues with a Dutch train-information system; one system version used explicit verification and one implicit; 20 users were each given 3 tasks; 443 verification question/answer pairs were analyzed. The prediction, following a principle of least effort, was that responses to correct verifications would be shorter, with unmarked word order, not repeating or correcting information but presenting new information (positive cues). The findings: where there were problems, subjects used more words (or said nothing), used marked word order (especially after implicit verifications), and produced more disconfirmations, with more repeated and corrected information. Machine learning experiments (memory-based learning) showed 97% correct prediction from these features (>8 words, or marked word order, or corrected info alone gives 92%). Krahmer et al. '99b predicted additional prosodic cues for negative signals: a high boundary tone, high pitch range, long duration of 'nee' and of the entire utterance, a long pause after 'nee', and a long delay before the 'no', based on 109 negative answers to yes/no questions from 7 speakers.
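
The memory-based learning setup can be approximated with a k-nearest-neighbor classifier. The sketch below uses scikit-learn with an invented feature encoding and toy data; the original work used a memory-based learner on the Dutch corpus.

```python
from sklearn.neighbors import KNeighborsClassifier

# Features per response: [number_of_words,
#                         marked_word_order (0/1),
#                         repeats_or_corrects_info (0/1)]
X_train = [
    [3, 0, 0],   # short, unmarked, no correction -> grounding OK
    [2, 0, 0],
    [12, 1, 1],  # long, marked order, corrects info -> grounding problem
    [9, 0, 1],
]
y_train = ["ok", "ok", "problem", "problem"]

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print(clf.predict([[10, 1, 0]]))  # long + marked word order -> ['problem']
```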

User information state is also reflected in the response (Shimojima et al. '99, '01).

Echoic responses repeat prior information, either as an acknowledgment or as a request for confirmation:
S1: Then go to Keage station.
S2: Keage.

Experiment: identify the 'degree of integration' and prosodic features (boundary tone, pitch range, tempo, initial pause); run perception studies to elicit the 'integration' effect.
Results: fast tempo, little pause, and low pitch signal high integration.

Study details: a prior observational study produced a corpus of task-oriented conversations in Japanese (a block-configuration task, face-to-face, two-party), focusing on immediate repeats by the responder (excluding hello/hello greetings and the like). Labelers rated by consensus the degree to which the responder seemed to have integrated the information (on a 1-5 scale). Chi-squared and t-tests showed a close correlation between the ratings and the prosodic features. The tokens' tempo, pause, and pitch were then modified to create prosody favoring very high and very low integration ratings, and subjects rated the modified tokens (1-5) and judged which grounding function was being performed (request for repair or acknowledgment). Overall, fast tempo, little pause, and low pitch signal integration.

Dialogue Acts

Austin (1962) observed that dialogue utterances are a kind of speaker action: a speech act. For example, performative sentences:
- I name this ship the Titanic.
- I second this motion.
- I bet you five dollars it will snow tomorrow.

Types of Speech Acts
- Locutionary acts: the utterance of a sentence with a particular meaning.
- Illocutionary acts: the act of asking, answering, promising, etc., in uttering a sentence.
- Perlocutionary acts: the (often intentional) production of certain effects upon the feelings, thoughts, or actions of the addressee in uttering a sentence.

Example: 'You can't do that.'
- Locutionary act: the utterance itself.
- Illocutionary force: protesting.
- Perlocutionary effect: stopping or annoying the hearer.

Types of Illocutionary Acts (Searle's 1975 classification)
- Assertives: committing the speaker to something's being the case (suggesting, putting forward, swearing, boasting, concluding).
- Directives: attempts by the speaker to get the addressee to do something (asking, ordering, requesting, inviting, advising, begging).
- Commissives: committing the speaker to some future course of action (e.g., promising, planning, vowing, betting, opposing).
- Expressives: expressing the psychological state of the speaker about a state of affairs (thanking, apologizing, welcoming, deploring).
- Declarations: bringing about a different state of the world via the utterance (including many of the performative acts above: 'I resign', 'You're fired').
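
In a dialogue system, classes like these often form the coarsest level of a dialogue-act tag set. The toy sketch below maps performative verbs to Searle's classes with a naive lookup, purely for illustration; real dialogue-act taggers use statistical models over words, prosody, and context.

```python
# Toy mapping from performative verbs to Searle's five classes.
SEARLE_CLASSES = {
    "ASSERTIVE":   ["suggest", "swear", "boast", "conclude"],
    "DIRECTIVE":   ["ask", "order", "request", "invite", "advise", "beg"],
    "COMMISSIVE":  ["promise", "plan", "vow", "bet", "oppose"],
    "EXPRESSIVE":  ["thank", "apologize", "welcome", "deplore"],
    "DECLARATION": ["name", "second", "resign", "fire"],
}

def coarse_act(performative_verb: str) -> str:
    for act_class, verbs in SEARLE_CLASSES.items():
        if performative_verb in verbs:
            return act_class
    return "UNKNOWN"

print(coarse_act("promise"))  # -> COMMISSIVE
```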

Types of Initiative
- System initiative
- User initiative
- 'Mixed' initiative

Some Representative Spoken Dialogue Systems

Deployed:
- Mixed initiative: Brokerage (Schwab-Nuance)
- User initiative: E-Mail Access (myTalk)
- System initiative: Directory Assistant (BNR), Air Travel (UA Info-SpeechWorks), Communicator (DARPA Travel)
Also: Waxholm, August, PADIS (Bouwman & Hulstijn '98), MIT Galaxy/Jupiter, Communications (Wildfire, Portico), Customer Care (HMIHY, AT&T), Banking (ANSER), ATIS (DARPA Travel), Train Schedule (ARISE), Multimodal Maps (TRAINS, Quickset). (The original slide arranged these on a timeline from 1980+ to 1999+.)

Not only have spoken dialogue systems become useful for real tasks, they have also become increasingly usable. While early systems relied on the system controlling all stages of the interaction (e.g., 'Please say yes or no.'), more recent systems allow the user greater input into how the dialogue will proceed (e.g., 'Hello. I want to go to Philadelphia on 11 May.'). But actual commercially deployed systems mostly still use system initiative: the system tells the user precisely what to say and accepts no deviations (i.e., no additional information or topic switching), although we are starting to see some mixed initiative, where the system is able to process additional information offered by the user. User-initiative systems, with open-ended prompts (like How May I Help You), are mostly in trial. And all systems can fail utterly.

The systems in brief:
- ANSER: Japanese banking information; primarily single-word commands.
- ATIS: DARPA air travel (explained earlier).
- MIT Galaxy/Jupiter: grew out of DARPA ATIS; mixed initiative for multiple applications, including weather information, directions to restaurants in the MIT area, and airline reservations.
- Directory Assistant: Bell Northern Research telephone-number retrieval by name.
- ARISE: European research effort on dialogues for train timetable information and ticket purchase (some systems are system initiative, others mixed initiative).
- Brokerage information: stock quotes and account information; mostly system initiative, with limited mixed initiative (e.g., 'transfer to checking from savings').
- E-mail access: mostly single commands.
- Flight arrival information: system initiative, for United Airlines flight information.
- Communicator: DARPA project on a modular dialogue-system architecture; mixed-initiative dialogue.
- How May I Help You: customer-care call routing; user initiative for the initial request, then mixed initiative.
- Multimodal maps (research): the TRAINS application (U. Rochester), planning train routes; QuickSet (Oregon Graduate Institute), a disaster-management application.

Types of Confirmation Strategies

U: I want to go to Baltimore.
Explicit:
S: Did you say you want to go to Baltimore?
Implicit:
S: Baltimore?
S: What time do you want to leave Baltimore?
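
A common design pattern, sketched below under invented thresholds and prompt wording, is to choose the confirmation strategy based on the recognizer's confidence in the value:

```python
def confirmation_prompt(value: str, asr_confidence: float) -> str:
    """Pick a confirmation strategy from the ASR confidence score."""
    if asr_confidence < 0.5:
        # Low confidence: confirm explicitly before proceeding.
        return f"Did you say you want to go to {value}?"
    elif asr_confidence < 0.9:
        # Medium confidence: confirm implicitly inside the next question.
        return f"What time do you want to leave {value}?"
    else:
        # High confidence: accept silently and move on.
        return "What time do you want to leave?"

print(confirmation_prompt("Baltimore", 0.6))
# -> What time do you want to leave Baltimore?
```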

Evaluating Dialogue Systems: the PARADISE framework (Walker et al. '00)

The 'performance' of a dialogue system is affected both by what gets accomplished by the user and the dialogue agent and by how it gets accomplished:
- Maximize task success.
- Minimize costs: both efficiency measures and qualitative measures.
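
PARADISE combines these into a single performance function: roughly, a weighted, normalized task-success term minus a weighted sum of normalized costs, with the weights estimated by regressing user satisfaction on the predictors. A sketch, with invented weights and data:

```python
from statistics import mean, stdev

def zscore(x: float, population: list) -> float:
    """Normalize a value against a population of dialogues."""
    return (x - mean(population)) / stdev(population)

def performance(kappa, costs, alpha, weights, kappa_pop, cost_pops):
    success = alpha * zscore(kappa, kappa_pop)
    total_cost = sum(w * zscore(c, pop)
                     for w, c, pop in zip(weights, costs, cost_pops))
    return success - total_cost

# Toy example: one dialogue's task success (kappa) and two costs
# (elapsed time in seconds, number of help requests).
print(performance(kappa=0.9, costs=[180, 1], alpha=1.0, weights=[0.2, 0.1],
                  kappa_pop=[0.5, 0.7, 0.9],
                  cost_pops=[[120, 180, 300], [0, 1, 3]]))
```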

Task Success

Task goals are represented as an Attribute-Value Matrix (AVM). Example from the ELVIS e-mail retrieval task (Walker et al. '97), 'Find the time and place of your meeting with Kim':

Attribute            Value
Selection Criterion  Kim or Meeting
Time                 10:30 a.m.
Place                2D516

Task success is defined by the match between the AVM values at the end of the dialogue and the 'true' values for the AVM.
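
In PARADISE this match is scored with the kappa statistic computed over a confusion matrix of attribute values; the sketch below uses a simpler exact-match rate, purely for illustration:

```python
def task_success(final_avm: dict, true_avm: dict) -> float:
    """Fraction of attributes whose final value matches the reference AVM.
    (PARADISE proper uses kappa, which corrects for chance agreement.)"""
    matches = sum(final_avm.get(attr) == value
                  for attr, value in true_avm.items())
    return matches / len(true_avm)

true_avm = {"criterion": "Kim or Meeting", "time": "10:30 a.m.", "place": "2D516"}
final_avm = {"criterion": "Kim or Meeting", "time": "10:30 a.m.", "place": "2D516"}
print(task_success(final_avm, true_avm))  # -> 1.0
```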

Metrics
- Efficiency of the interaction: user turns, system turns, elapsed time.
- Quality of the interaction: ASR rejections, time-out prompts, help requests, barge-ins, mean recognition score (concept accuracy), cancellation requests.
- User satisfaction.
- Task success: perceived completion, information extracted.

Experimental Procedures
- Subjects are given specified tasks.
- The spoken dialogues are recorded.
- Cost factors, states, and dialogue acts are automatically logged; ASR accuracy and barge-ins are hand-labeled.
- Users specify the task solution via a web page.
- Users complete user-satisfaction surveys.
- Multiple linear regression is used to model user satisfaction as a function of task success and costs, and to test for significant predictive factors (see the sketch below).
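
A sketch of the regression step, with invented data (one row per dialogue; in the real studies the predictors are the logged and hand-labeled measures above):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: perceived completion (COMP), mean recognition score (MRS),
#          elapsed time in seconds (ET).
X = np.array([[1.0, 0.95, 120],
              [0.5, 0.70, 300],
              [1.0, 0.85, 180],
              [0.0, 0.60, 420]])
y = np.array([4.5, 2.5, 4.0, 1.5])  # user-satisfaction survey scores

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # fitted weights for COMP, MRS, ET
```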

User Satisfaction: Sum of Many Measures
- Was Annie easy to understand in this conversation? (TTS performance)
- In this conversation, did Annie understand what you said? (ASR performance)
- In this conversation, was it easy to find the message you wanted? (Task ease)
- Was the pace of interaction with Annie appropriate in this conversation? (Interaction pace)
- In this conversation, did you know what you could say at each point of the dialog? (User expertise)
- How often was Annie sluggish and slow to reply to you in this conversation? (System response)
- Did Annie work the way you expected her to in this conversation? (Expected behavior)
- From your current experience with using Annie to get your email, do you think you'd use Annie regularly to access your mail when you are away from your desk? (Future use)

Performance Functions from Three Systems

ELVIS: User Sat. = .21*COMP + .47*MRS - .15*ET
TOOT:  User Sat. = .35*COMP + .45*MRS - .14*ET
ANNIE: User Sat. = .33*COMP + .25*MRS + .33*Help

COMP: user perception of task completion (task success)
MRS: mean recognition accuracy (cost)
ET: elapsed time (cost)
Help: help requests (cost)
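
Applying one of the fitted functions is simple arithmetic; for instance, the ELVIS function over (normalized) predictor values, with example inputs invented here:

```python
def elvis_user_sat(comp: float, mrs: float, et: float) -> float:
    """ELVIS performance function from the slide above."""
    return 0.21 * comp + 0.47 * mrs - 0.15 * et

# High completion and recognition, moderate elapsed time (normalized units):
print(round(elvis_user_sat(1.0, 1.0, 0.5), 3))  # -> 0.605
```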

Performance Model
- Perceived task completion and mean recognition score are consistently significant predictors of user satisfaction.
- The performance model is useful for system development: making predictions about system modifications, and distinguishing 'good' dialogues from 'bad' dialogues.
- But can we also tell on-line when a dialogue is 'going wrong'?

Identifying Misrecognitions, Awares, and User Corrections Automatically (Hirschberg, Litman & Swerts)
- Collect a corpus from an interactive voice response system.
- Identify speaker 'turns' that were incorrectly recognized, where speakers first became aware of an error, and that correct misrecognitions.
- Identify the prosodic features of the turns in each category and compare them to other turns.
- Use machine learning techniques to train a classifier to make these distinctions automatically (a hedged sketch of this step follows).
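
The sketch below stands in for the classification step with a small decision tree over invented prosodic features and toy data; the original work used machine-learned classifiers over prosodic features such as F0, energy, duration, and tempo:

```python
from sklearn.tree import DecisionTreeClassifier

# Columns: [f0_max_hz, energy_max, duration_sec, preceding_pause_sec]
X = [[220, 0.8, 1.2, 0.3],   # ordinary turn
     [310, 1.0, 2.5, 0.9],   # loud, slow, hyperarticulated turn
     [200, 0.7, 1.0, 0.2],
     [330, 1.1, 2.8, 1.1]]
y = ["recognized", "misrecognized", "recognized", "misrecognized"]

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(clf.predict([[300, 1.0, 2.4, 1.0]]))  # -> ['misrecognized']
```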

Turn Types

Here are examples of the three turn types we focus on:

TOOT: Hi. This is AT&T Amtrak Schedule System. This is TOOT. How may I help you?
User: Hello. I would like trains from Philadelphia to New York leaving on Sunday at ten thirty in the evening.   [misrecognition]
TOOT: Which city do you want to go to?
User: New York.   [aware site; correction]

Results
- Error in predicting misrecognized turns was reduced to 8.64%.
- Error in predicting 'awares': 12%.
- Error in predicting corrections: 18-21%.

Conclusions
- Dialogue, especially spoken dialogue, presents new problems but also new possibilities.
- Recognizing speech introduces a new source of errors.
- But the additional information in the speech stream offers new evidence about users' intended meanings and emotional state (grounding of information, speech acts, reactions to system errors).