Spoken Dialogue Systems

Spoken Dialogue Systems Julia Hirschberg CS 4706 9/20/2018

Today
- Some Swedish examples
- Controlling the dialogue flow
  - State prediction
  - Controlling lexical choice
- Learning from human-human dialogue
  - User feedback
- Evaluating systems

The Waxholm Project at KTH
- Tourist information for the Stockholm archipelago: time-tables, hotels, hostels, camping and dining possibilities
- Mixed initiative dialogue
- Speech recognition
- Multimodal synthesis
- Graphic information: pictures, maps, charts and time-tables
- Demos at http://www.speech.kth.se/multimodal

The Waxholm system
- When do the evening boats depart?
- I think I want to go to Waxholm
- The city
- This is a table of the boats...
- Information about the restaurants in Waxholm is shown in this table
- Is it possible to eat in Waxholm?
- Which day of the week do you want to go?
- I want to go tomorrow
- There are lots of boats from Stockholm to Waxholm on a Friday. At what time do you want to go?
- I am looking for boats to Waxholm
- From where do you want to go
- Thank you
- Information about the hotels in Waxholm is shown in this table
- Thank you too
- Where can I find hotels?
- Waxholm is shown on this map
- Information about hotels is shown in this table
- Which hotels are in Waxholm?
- Where is Waxholm?

Dialogue control - state prediction
- Dialog grammar specified by a number of states
- Each state is associated with an action (database search, system question, …)
- Probable state determined from semantic features
- Transition probability from one state to another
- Dialog control design tool with a graphic interface

Waxholm Topics
- TIME_TABLE. Task: get a time-table. Example: När går båten? (When does the boat leave?)
- SHOW_MAP. Task: get a chart or a map displayed. Example: Var ligger Vaxholm? (Where is Vaxholm located?)
- EXIST. Task: display lodging and dining possibilities. Example: Var finns det vandrarhem? (Where are there hostels?)
- OUT_OF_DOMAIN. Task: the subject is out of the domain. Example: Kan jag boka rum. (Can I book a room?)
- NO_UNDERSTANDING. Task: no understanding of user intentions. Example: Jag heter Olle. (My name is Olle.)
- END_SCENARIO. Task: end a dialog. Example: Tack. (Thank you.)

Topic selection: argmax_i { p(t_i | F) }

FEATURE       TIME TABLE  SHOW MAP  FACILITY  NO UNDERSTANDING  OUT OF DOMAIN  END
OBJECT        .062        .312      .073      .091              .067           .091
QUEST-WHEN    .188        .031      .024      .091              .067           .091
QUEST-WHERE   .062        .688      .390      .091              .067           .091
FROM-PLACE    .250        .031      .024      .091              .067           .091
AT-PLACE      .062        .219      .293      .091              .067           .091
TIME          .312        .031      .024      .091              .067           .091
PLACE         .091        .200      .500      .091              .067           .091
OOD           .062        .031      .122      .091              .933           .091
END           .062        .031      .024      .091              .067           .909
HOTEL         .062        .031      .488      .091              .067           .091
HOSTEL        .062        .031      .122      .091              .067           .091
ISLAND        .333        .556      .062      .091              .067           .091
PORT          .125        .750      .244      .091              .067           .091
MOVE          .875        .031      .098      .091              .067           .091
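A minimal sketch of the argmax step, using a few rows copied from the table above. The slide does not say how the per-feature values are combined into p(t_i | F); multiplying them (summing logs), as if the features were independent, is an assumption made here for illustration only, as are the function name `select_topic` and the fallback to NO_UNDERSTANDING when no feature is found.

```python
import math

TOPICS = ["TIME_TABLE", "SHOW_MAP", "FACILITY",
          "NO_UNDERSTANDING", "OUT_OF_DOMAIN", "END"]

# A subset of rows from the slide's feature/topic table, in TOPICS order.
TABLE = {
    "QUEST-WHEN":  [.188, .031, .024, .091, .067, .091],
    "QUEST-WHERE": [.062, .688, .390, .091, .067, .091],
    "HOTEL":       [.062, .031, .488, .091, .067, .091],
    "PORT":        [.125, .750, .244, .091, .067, .091],
    "MOVE":        [.875, .031, .098, .091, .067, .091],
    "END":         [.062, .031, .024, .091, .067, .909],
}


def select_topic(features):
    """Return the topic with the highest combined score for the given features.

    Assumption: per-feature table values are combined by multiplication
    (done here as a sum of logs). The slide only specifies the argmax.
    """
    if not features:
        # Assumption: with no semantic features there is nothing to go on.
        return "NO_UNDERSTANDING"
    scores = [0.0] * len(TOPICS)
    for feature in features:
        for i, p in enumerate(TABLE[feature]):
            scores[i] += math.log(p)
    best = max(range(len(TOPICS)), key=lambda i: scores[i])
    return TOPICS[best]


print(select_topic(["QUEST-WHEN", "MOVE"]))    # a "when does the boat leave" style query
print(select_topic(["QUEST-WHERE", "HOTEL"]))  # a "where are there hotels" style query
```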

Topic prediction results
[Bar chart: % errors for three input conditions (raw data, no extralinguistic sounds, complete parse), with two series (all utterances; “no understanding” excluded). Bar values shown: 12.9, 12.7, 8.8, 8.5, 3.1, 2.9]

User answers to questions?
The answers to the question: “What weekday do you want to go?” (Vilken veckodag vill du åka?)
- 22% Friday (fredag)
- 11% I want to go on Friday (jag vill åka på fredag)
- 11% I want to go today (jag vill åka idag)
- 7% on Friday (på fredag)
- 6% I want to go a Friday (jag vill åka en fredag)
- - are there any hotels in Vaxholm? (finns det några hotell i Vaxholm)

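Examples of questions and answers
Questions:
- Hur ofta åker du utomlands på semestern? (How often do you go abroad on holiday?)
- Hur ofta reser du utomlands på semestern? (How often do you travel abroad on holiday?)
Answers:
- jag åker en gång om året kanske (I go once a year maybe)
- jag åker ganska sällan utomlands på semester (I quite rarely go abroad on holiday)
- jag åker nästan alltid utomlands under min semester (I almost always go abroad during my holiday)
- jag åker ungefär 2 gånger per år utomlands på semester (I go abroad on holiday about 2 times a year)
- jag åker utomlands nästan varje år (I go abroad almost every year)
- jag åker utomlands på semestern varje år (I go abroad on holiday every year)
- jag åker utomlands ungefär en gång om året (I go abroad about once a year)
- jag är nästan aldrig utomlands (I am almost never abroad)
- en eller två gånger om året (once or twice a year)
- en gång per semester kanske (once per holiday maybe)
- en gång per år ungefär (about once a year)
- åtminståne en gång om året (at least once a year)
- nästan aldrig (almost never)
- jag reser en gång om året utomlands (I travel abroad once a year)
- jag reser inte ofta utomlands på semester det blir mera i arbetet (I don't often travel abroad on holiday, it is more for work)
- jag reser reser utomlands på semestern vartannat år (I travel travel abroad on holiday every other year)
- jag reser utomlands en gång per semester (I travel abroad once per holiday)
- jag reser utomlands på semester ungefär en gång per år (I travel abroad on holiday about once a year)
- jag brukar resa utomlands på semestern åtminståne en gång i året (I usually travel abroad on holiday at least once a year)
- en gång per år kanske (once a year maybe)
- en gång vart annat år (once every other year)
- varje år (every year)
- vart tredje år ungefär (about every third year)
- nu för tiden inte så ofta (not so often nowadays)
- varje år brukar jag åka utomlands (every year I usually go abroad)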

Results
- reuse: 52%
- ellipse: 18%
- other: 24%
- no reuse: 4%
- no answer: 2%
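As a rough illustration of how answers like the ones above might be sorted into such categories, here is a small sketch. The slides do not give the actual labelling criteria, so the rule used here (does the answer contain a form of the question's verb, a form of the competing verb, or neither) is an assumption, as are the function name `classify_answer` and the merged label "ellipse/other".

```python
def classify_answer(answer, question_verb_forms, other_verb_forms):
    """Label an answer by whether it reuses the verb from the question.

    Hypothetical rule, not the study's coding scheme:
      - empty answer                      -> "no answer"
      - contains the question's verb      -> "reuse"
      - contains the competing verb       -> "no reuse"
      - contains neither verb             -> "ellipse/other"
    """
    words = answer.lower().split()
    if not words:
        return "no answer"
    if any(w in question_verb_forms for w in words):
        return "reuse"
    if any(w in other_verb_forms for w in words):
        return "no reuse"
    return "ellipse/other"


AKA = {"åker", "åka"}     # forms of "åka" (go)
RESA = {"reser", "resa"}  # forms of "resa" (travel)

# Question asked with "åker":
print(classify_answer("jag åker en gång om året kanske", AKA, RESA))
# Question asked with "reser", answered with "åka":
print(classify_answer("varje år brukar jag åka utomlands", RESA, AKA))
# Verbless answer:
print(classify_answer("en gång per år kanske", AKA, RESA))
```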

The August system
- Strindberg was married three times!
- Yes, it might be that we will!
- What is your name?
- Strindberg was born in 1849
- Over a million people live in the Stockholm area
- What do you do for a living?
- I call myself Strindberg, but I don’t really have a surname
- I can answer questions about Strindberg, KTH and Stockholm
- When were you born?
- Do you like it here?
- How many people live in Stockholm?
- The information is shown on the map
- People who live in glass houses should not throw stones
- I come from the department of Speech, Music and Hearing
- Yes, that was a smart thing to say!
- Good bye!
- Thank you!
- The Royal Institute of Technology!
- You are welcome!
- Perhaps we will meet soon again!

Evidence from Human Performance
- Users provide explicit positive and negative feedback
- Corpus-based vs. laboratory experiments: do these tell us different things?
- Signalling whether information is grounded or not (Clark & Wilkes-Gibbs ’86, Clark & Schaefer ’89): presentation/acceptance
- 120 dialogues for Dutch train info; one version uses explicit verification and one implicit; 20 users given 3 tasks; analyzed 443 verification question/answer pairs
  - Predicted that responses to correct verifications would be shorter, with unmarked word order, not repeating or correcting information but presenting new information (positive cues): the principle of least effort
  - Findings: where there are problems, subjects use more words (or say nothing), use marked word order (especially after implicit verifications), and give more disconfirmations (duh), with more repeated and corrected info
  - ML experiments (memory-based learning) show 97% correct prediction from these features (>8 words or marked word order or corrects info -> 92%)
- Krahmer et al ’99b: predicted additional prosodic cues for negative signals: high boundary tone, high pitch range, long duration of ‘nee’ and of the entire utterance, long pause after ‘nee’, long delay before ‘no’; from 109 negative answers to yes/no questions by 7 speakers; hyp
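The simple rule quoted in parentheses on this slide (more than 8 words, or marked word order, or corrected information) can be written down directly. This is only a sketch of that one rule, not of the memory-based learner; the function and parameter names are made up for illustration, and how "marked word order" or "corrects info" would be detected automatically is left open.

```python
def signals_problem(n_words, marked_word_order, corrects_info):
    """Return True if the user's reply suggests the system got something wrong.

    Implements the slide's rule of thumb: >8 words OR marked word order
    OR the reply corrects information. The inputs are assumed to have been
    annotated already (by hand or by some upstream component).
    """
    return n_words > 8 or marked_word_order or corrects_info


# A short, unmarked reply with no correction: treated as a positive cue.
print(signals_problem(n_words=2, marked_word_order=False, corrects_info=False))
# A long reply: treated as a sign of trouble.
print(signals_problem(n_words=11, marked_word_order=False, corrects_info=False))
```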

Adapt: demonstration of “complete” system

Feedback and ‘Grounding’: Bell & Gustafson ’00
- Positive and negative
- Previous corpora: August system
  - 18% of users gave positive or negative feedback in subcorpus
  - Push-to-talk
- Corpus: Adapt system
  - 50 dialogues, 33 subjects, 1845 utterances
  - Feedback utterances labeled with
    - Positive or negative
    - Explicit or implicit
    - Attention/Attitude
- Results:
  - 18% of utterances contained feedback
  - 94% of users provided feedback

- 65% positive, 2/3 explicit, equal amounts of attention vs. attitude
- Large variation
  - Some subjects provided feedback at almost every turn
  - Some never did
- Utility of study:
  - Use positive feedback to model the user better (preferences)
  - Use negative feedback in error detection

The HIGGINS domain
- The primary domain of HIGGINS is city navigation for pedestrians.
- Secondarily, HIGGINS is intended to provide simple information about the immediate surroundings.
- This is a 3D test environment

Initial experiments
- Studies on human-human conversation
- The Higgins domain (similar to Map Task)
- Using ASR in one direction to elicit error handling behaviour
[Diagram labels: User, Operator, ASR, Vocoder; Speaks, Listens, Reads]

Non-Understanding Error Recovery (Skantze ’03)
- Humans tend not to signal non-understanding:
  O: Do you see a wooden house in front of you?
  U: ASR: YES CROSSING ADDRESS NOW (I pass the wooden house now)
  O: Can you see a restaurant sign?
- This leads to
  - Increased experience of task success
  - Faster recovery from non-understanding

Evaluating Dialogue Systems
- PARADISE framework (Walker et al ’00)
- “Performance” of a dialogue system is affected both by what gets accomplished by the user and the dialogue agent and how it gets accomplished
  - Maximize Task Success
  - Minimize Costs
    - Efficiency Measures
    - Qualitative Measures

Task Success
- Task goals seen as Attribute-Value Matrix
- ELVIS e-mail retrieval task (Walker et al ‘97): “Find the time and place of your meeting with Kim.”

  Attribute            Value
  Selection Criterion  Kim or Meeting
  Time                 10:30 a.m.
  Place                2D516

- Task success defined by match between AVM values at end of dialogue with “true” values for AVM
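A minimal sketch of the match described on this slide, with the attribute-value matrix held as a plain dictionary. The two scoring functions (an all-or-nothing match and a per-attribute match rate) and their names are illustrative assumptions; the slide only says that success is defined by agreement with the “true” AVM values.

```python
# Reference ("true") AVM for the ELVIS example task, copied from the slide.
TRUE_AVM = {
    "Selection Criterion": "Kim or Meeting",
    "Time": "10:30 a.m.",
    "Place": "2D516",
}


def task_success(final_avm, true_avm):
    """True only if every attribute in the reference AVM was filled correctly."""
    return all(final_avm.get(attr) == value for attr, value in true_avm.items())


def attribute_match_rate(final_avm, true_avm):
    """Fraction of reference attributes whose final value matches (hypothetical metric)."""
    hits = sum(final_avm.get(attr) == value for attr, value in true_avm.items())
    return hits / len(true_avm)


# A dialogue that ended with the wrong room number.
final = {"Selection Criterion": "Kim or Meeting", "Time": "10:30 a.m.", "Place": "2D515"}
print(task_success(final, TRUE_AVM))
print(attribute_match_rate(final, TRUE_AVM))
```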

Metrics
- Efficiency of the Interaction: User Turns, System Turns, Elapsed Time
- Quality of the Interaction: ASR rejections, Time Out Prompts, Help Requests, Barge-Ins, Mean Recognition Score (concept accuracy), Cancellation Requests
- User Satisfaction
- Task Success: perceived completion, information extracted

Experimental Procedures
- Subjects given specified tasks
- Spoken dialogues recorded
- Cost factors, states, dialog acts automatically logged; ASR accuracy, barge-in hand-labeled
- Users specify task solution via web page
- Users complete User Satisfaction surveys
- Use multiple linear regression to model User Satisfaction as a function of Task Success and Costs; test for significant predictive factors

User Satisfaction: Sum of Many Measures
- Was Annie easy to understand in this conversation? (TTS Performance)
- In this conversation, did Annie understand what you said? (ASR Performance)
- In this conversation, was it easy to find the message you wanted? (Task Ease)
- Was the pace of interaction with Annie appropriate in this conversation? (Interaction Pace)
- In this conversation, did you know what you could say at each point of the dialog? (User Expertise)
- How often was Annie sluggish and slow to reply to you in this conversation? (System Response)
- Did Annie work the way you expected her to in this conversation? (Expected Behavior)
- From your current experience with using Annie to get your email, do you think you'd use Annie regularly to access your mail when you are away from your desk? (Future Use)

Performance Functions from Three Systems
- ELVIS User Sat. = .21 * COMP + .47 * MRS - .15 * ET
- TOOT User Sat. = .35 * COMP + .45 * MRS - .14 * ET
- ANNIE User Sat. = .33 * COMP + .25 * MRS + .33 * Help

- COMP: User perception of task completion (task success)
- MRS: Mean recognition accuracy (cost)
- ET: Elapsed time (cost)
- Help: Help requests (cost)
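To make the performance functions concrete, the sketch below evaluates them for one dialogue. The coefficients and their signs are copied from the slide as given; the function name `predicted_user_sat` is made up, and the inputs are assumed to be already normalized (for example as z-scores over the corpus), which this slide does not state.

```python
# Coefficients as they appear on the slide, one weighted sum per system.
WEIGHTS = {
    "ELVIS": {"COMP": 0.21, "MRS": 0.47, "ET": -0.15},
    "TOOT":  {"COMP": 0.35, "MRS": 0.45, "ET": -0.14},
    "ANNIE": {"COMP": 0.33, "MRS": 0.25, "Help": 0.33},
}


def predicted_user_sat(system, measures):
    """Weighted sum of the measures that appear in the system's performance function.

    `measures` maps measure names (COMP, MRS, ET, Help) to values that are
    assumed to be normalized already; extra keys are ignored.
    """
    return sum(weight * measures[name] for name, weight in WEIGHTS[system].items())


# One (hypothetical) dialogue: completion and recognition one unit above
# average, elapsed time one unit above average.
dialogue = {"COMP": 1.0, "MRS": 1.0, "ET": 1.0}
print(round(predicted_user_sat("ELVIS", dialogue), 2))
print(round(predicted_user_sat("TOOT", dialogue), 2))
```

Holding the weights in a table keeps the three systems comparable: the same call works for each, and only the measures named in that system's function are read.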

Performance Model
- Perceived task completion and mean recognition score are consistently significant predictors of User Satisfaction
- Performance model useful for system development
  - Making predictions about system modifications
  - Distinguishing ‘good’ dialogues from ‘bad’ dialogues
- But can we also tell on-line when a dialogue is ‘going wrong’?

Next Class
- Turn-taking (J&M, Link to conversational analysis description, Beattie on Margaret Thatcher)