online supervised learning of non-understanding recovery policies
Dan Bohus, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA
with thanks to: Alex Rudnicky, Brian Langner, Antoine Raux, Alan Black, Maxine Eskenazi

2 understanding errors in spoken dialog
MIS-understanding:
S: Where are you flying from?
U: Birmingham [BERLIN PM]
S: Did you say Berlin? / from Berlin … where to?
The system constructs an incorrect semantic representation of the user’s turn.
NON-understanding:
S: Where are you flying from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: ? ? ?
The system fails to construct a semantic representation of the user’s turn.
Possible system responses to a non-understanding: Sorry, I didn’t catch that … / Can you repeat that? / Can you rephrase that? / Where are you flying from? / Please tell me the name of the city you are leaving from … / Could you please go to a quieter place? / Sorry, I didn’t catch that … tell me the state first …

3 recovery strategies
- large set of strategies (“strategy” = a 1-step action)
- tradeoffs not well understood
- some strategies are more appropriate at certain times: after an out-of-vocabulary (OOV) word, asking the user to repeat is not a good idea; after a door slam, asking the user to repeat might work well
[Example recovery prompts: Sorry, I didn’t catch that … / Can you repeat that? / Can you rephrase that? / Where are you flying from? / Please tell me the name of the city you are leaving from … / Could you please go to a quieter place? / Sorry, I didn’t catch that … tell me the state first …]

4 recovery policy
- “policy” = method for choosing between strategies
- difficult to handcraft, especially over a large set of recovery strategies
- common approaches are heuristic, e.g. “three strikes and you’re out” [Balentine]: 1st non-understanding: ask the user to repeat; 2nd non-understanding: provide more help, including examples; 3rd non-understanding: transfer to an operator

5 this talk … … an online, supervised method for learning a non-understanding recovery policy from data

6 overview  introduction  approach  experimental setup  results  discussion

7 overview  introduction  approach  experimental setup  results  discussion

8 intuition … if we knew the probability of success for each strategy in the current situation, we could easily construct a policy
S: Where are you flying from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
Candidate recovery prompts: Sorry, I didn’t catch that … / Can you repeat that? / Can you rephrase that? / Where are you flying from? / Please tell me the name of the city you are leaving from … / Could you please go to a quieter place? / Sorry, I didn’t catch that … tell me the state first …
[figure: each prompt annotated with its estimated probability of success: 32%, 15%, 20%, 30%, 45%, 25%, 43%]

9 two step approach step 1: learn to estimate probability of success for each strategy, in a given situation step 2: use these estimates to choose between strategies (and hence build a policy)

10 learning predictors for strategy success
- supervised learning: logistic regression
- target: did the strategy recover successfully or not; “success” = the next turn is correctly understood; labeled semi-automatically
- features: describe the current situation, extracted from different knowledge sources: recognition features, language understanding features, dialog-level features [state, history]
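As a concrete illustration of this setup, here is a minimal Python sketch of how one training example per non-understanding could be assembled. The helper functions and the correctly_understood attribute are hypothetical stand-ins for the labeling and feature-extraction machinery, not the authors' code.

```python
# Hedged sketch: one (features, label) pair per non-understanding event.
# recognition_features / understanding_features / dialog_features are
# hypothetical extractors standing in for the knowledge sources on the slide.

def label_success(next_turn):
    # Supervised target from the slide: the turn that follows the recovery
    # strategy is correctly understood (no mis- or non-understanding).
    return int(next_turn.correctly_understood)

def make_example(nonu_turn, next_turn):
    features = {}
    features.update(recognition_features(nonu_turn))     # ASR-level features
    features.update(understanding_features(nonu_turn))   # parse / grammar features
    features.update(dialog_features(nonu_turn))          # dialog state and history
    return features, label_success(next_turn)
```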

11 logistic regression
- well-calibrated class-posterior probabilities: predictions reflect the empirical probability of success, i.e. x% of the cases where P(S|F)=x are indeed successful
- sample efficient: one model per strategy, so data will be sparse
- stepwise construction: automatic feature selection
- provides confidence bounds: very useful for online learning
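Below is a sketch of one way such a per-strategy predictor could be implemented, assuming numpy and statsmodels are available; stepwise feature selection is omitted, and the upper confidence bound is obtained by pushing a bound on the linear predictor through the logistic link. This illustrates the idea and is not the authors' implementation.

```python
import numpy as np
import statsmodels.api as sm

def fit_success_model(X_s, y_s):
    """Logistic regression estimating P(success | features) for one strategy."""
    X = sm.add_constant(np.asarray(X_s, dtype=float))   # prepend intercept column
    return sm.Logit(np.asarray(y_s, dtype=float), X).fit(disp=0)

def predict_with_upper_bound(model, x, z=1.645):
    """Calibrated mean prediction plus a ~95% one-sided upper confidence bound."""
    x = np.concatenate(([1.0], np.asarray(x, dtype=float)))   # match the intercept
    eta = float(x @ model.params)                             # linear predictor
    se = float(np.sqrt(x @ model.cov_params() @ x))           # its standard error
    logistic = lambda t: 1.0 / (1.0 + np.exp(-t))
    return logistic(eta), logistic(eta + z * se)
```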

12 two step approach step 1: learn to estimate probability of success for each strategy, in a given situation step 2: use these estimates to choose between strategies (and hence build a policy)

13 policy learning
- choose the strategy most likely to succeed
- BUT: we want to learn online, so we have to deal with the exploration / exploitation tradeoff
[figure: estimated success probabilities for strategies S1-S4 on a 0 to 1 scale]

14 highest-upper-bound learning
- choose the strategy with the highest upper bound
- proposed by [Kaelbling 93]
- empirically shown to do well in various problems
- intuition: [figure: confidence intervals on the success estimates for strategies S1-S4, illustrating the balance between exploitation and exploration; repeated on slides 15-18 as animation steps]
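The selection rule itself is then just an argmax over upper bounds. The sketch below reuses predict_with_upper_bound from the previous sketch; it assumes a numeric feature vector for the current situation and takes as input the set of strategies currently allowed by the hand-written constraints described later.

```python
def choose_strategy(models, features, admissible):
    """Pick the admissible strategy with the highest upper confidence bound
    on its predicted probability of success."""
    best_name, best_ucb = None, -1.0
    for name in admissible:
        _, ucb = predict_with_upper_bound(models[name], features)
        if ucb > best_ucb:
            best_name, best_ucb = name, ucb
    return best_name
```

A strategy can win either because its estimated success probability is high (exploitation) or because it has been tried rarely and its confidence interval is still wide (exploration); as data accumulates, the bounds tighten and the policy shifts toward pure exploitation.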

19 overview  introduction  approach  experimental setup  results  discussion

20 system  Let’s Go! Public bus information system  connected to PAT customer service line during non-business hours  ~30-50 calls / night

21 strategies
HLP: For instance, you can say ‘FORBES AND MURRAY’, or ‘DOWNTOWN’
HLP_R: For instance, you can say ‘FORBES AND MURRAY’, or ‘DOWNTOWN’, or say ‘START OVER’ to restart
RP: Where are you leaving from? [repeats previous system prompt]
AREP: Can you repeat what you just said?
ARPH: Could you rephrase that?
MOVE: Tell me first your departure neighborhood … [ignore the current non-understanding and back off to an alternative dialog plan]
ASA: Please use shorter answers because I have trouble understanding long sentences …
SLL: Sorry, I understand people best when they speak softer …
IT: Give general interaction tips to the user
ASO: I’m sorry but I’m still having trouble understanding you and I might do better if we restarted. Would you like to start over?
GUP: I’m sorry, but it doesn’t seem like I’m able to help you. Please call back during regular business hours …

22 constraints
- don’t AREP more than twice in a row
- don’t ARPH if #words <= 3
- don’t ASA unless #words > 5
- don’t ASO unless (4 non-understandings in a row) and (ratio of non-understandings > 50%)
- don’t GUP unless (dialog > 30 turns) and (ratio of non-understandings > 80%)
- the constraints capture expert knowledge and ensure the system doesn’t use an unreasonable policy
- on average, 4.2 of the 11 strategies were available (min=1, max=9)
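As a sketch, the constraints on this slide can be applied as a simple admissibility filter before the learned policy chooses; the field names on ctx are hypothetical, not the system's actual representation.

```python
def admissible_strategies(ctx, all_strategies):
    """Return the subset of strategies the hand-written constraints allow."""
    ok = set(all_strategies)
    if ctx.consecutive_arep >= 2:                              # AREP at most twice in a row
        ok.discard("AREP")
    if ctx.num_words <= 3:                                     # no ARPH on very short turns
        ok.discard("ARPH")
    if ctx.num_words <= 5:                                     # ASA only after long turns
        ok.discard("ASA")
    if not (ctx.consecutive_nonu >= 4 and ctx.nonu_ratio > 0.5):
        ok.discard("ASO")                                      # ask-start-over only when stuck
    if not (ctx.num_turns > 30 and ctx.nonu_ratio > 0.8):
        ok.discard("GUP")                                      # give up only as a last resort
    return ok
```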

23 features
- current non-understanding: recognition, lexical, grammar, timing info
- current non-understanding segment: length, which strategies were already taken
- current dialog state and history: encoded dialog states, “how good things have been going”

24 learning
- baseline period [2 weeks, 3/11 -> 3/25, 2006]: the system randomly chose a strategy, while obeying the constraints; in effect, a heuristic / stochastic policy
- learning period [5 weeks, 3/26 -> 5/5, 2006]: each morning, data from the previous night was labeled and the likelihood-of-success predictors were retrained, then installed in the system for the next night
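A minimal sketch of this daily update cycle is shown below. The slide does not say whether retraining uses only the previous night or all data accumulated so far; this sketch accumulates. The helpers label_logs, examples_for_strategy, and deploy are placeholders, not the system's actual API; fit_success_model is the per-strategy logistic regression sketched earlier.

```python
def nightly_update(all_examples, new_logs, models, strategies):
    all_examples.extend(label_logs(new_logs))        # semi-automatic labeling each morning
    for s in strategies:
        X_s, y_s = examples_for_strategy(all_examples, s)
        if len(set(y_s)) == 2:                        # need both outcomes to refit the model
            models[s] = fit_success_model(X_s, y_s)
    deploy(models)                                    # updated predictors drive the next night
    return models
```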

25 2 strategies eliminated [the strategy table from slide 21 is shown again]
HLP: For instance, you can say ‘FORBES AND MURRAY’, or ‘DOWNTOWN’
HLP_R: For instance, you can say ‘FORBES AND MURRAY’, or ‘DOWNTOWN’, or say ‘START OVER’ to restart
RP: Where are you leaving from? [repeats previous system prompt]
AREP: Can you repeat what you just said?
ARPH: Could you rephrase that?
MOVE: Tell me first your departure neighborhood … [ignore the current non-understanding and back off to an alternative dialog plan]
ASA: Please use shorter answers because I have trouble understanding long sentences …
SLL: Sorry, I understand people best when they speak softer …
IT: Give general interaction tips to the user
ASO: I’m sorry but I’m still having trouble understanding you and I might do better if we restarted. Would you like to start over?
GUP: I’m sorry, but it doesn’t seem like I’m able to help you. Please call back during regular business hours …

26 overview  introduction  approach  experimental setup  results  discussion

27 results
- average non-understanding recovery rate (ANNR) improved from 33.6% to 37.8% (p=0.03), a 12.5% relative improvement
- fitted learning curve: parameters A, B, C, D [values shown in the slide figure]

28 policy evolution
- MOVE, HLP, ASA engaged more often
- AREP, ARPH engaged less often
[figure: strategy usage over time for MOVE, ASA, IT, SLL, ARPH, AREP, HLP, RP, HLP_R]

29 overview  introduction  approach  experimental setup  results  discussion

30 are the predictors learning anything?
- AREP (653), IT (273), SLL (300): no informative features
- ARPH (674), MOVE (1514): 1 informative feature (#prev.nonu, #words)
- ASA (637), RP (2532), HLP (3698), HLP_R (989): 4 or more informative features in the model, including dialog state (especially explicit confirm states) and dialog history

31 more features, more (specific) strategies
- more features would be useful: day-of-week, clustered dialog states, ? (any ideas?)
- more strategies / variants: the approach might be able to filter out bad versions
- more specific strategies and features: ask-short-answers worked well … speak-less-loud didn’t … (why?)

32 “noise” in the experiment
- ~15-20% of the responses following non-understandings are not user responses: transient noises, secondary speech, primary speech not directed to the system
- this might affect training; in a future experiment we want to eliminate it

33 unsupervised learning
- supervised version: “success” = the next turn is correctly understood [i.e. no misunderstanding, no non-understanding]
- unsupervised versions: “success” = the next turn is not a non-understanding, or “success” = the confidence score of the next turn
- training labels are automatically available
- performance improvements might still be possible
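A sketch of the two automatic label definitions from this slide; both can be derived from the system's own logs with no manual labeling, and the attribute names on next_turn are hypothetical.

```python
def unsupervised_label_binary(next_turn):
    return int(not next_turn.is_nonunderstanding)   # success = next turn is not a non-understanding

def unsupervised_label_soft(next_turn):
    return next_turn.confidence_score               # graded success derived from the confidence score
```

The confidence-score variant yields a graded target rather than a 0/1 label, so it would call for a regression-style or weighted fit rather than the binary classifier used in the supervised setting.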

34 thank you!