Intelligent Help (or lack thereof) in Spoken Dialog Systems
Dialogs on Dialogs discussion
Stefanie Tomko, LTI
20-Feb-04
HELP!

Papers
  Adding intelligent help to mixed-initiative spoken dialogue systems. G. Gorrell, I. Lewin & M. Rayner. In Proc. of ICSLP.
  Targeted help for spoken dialogue systems: intelligent feedback improves naive users' performance. B.A. Hockey, O. Lemon, E. Campana, L. Hiatt, G. Aist, J. Hieronymus, A. Gruenstein, J. Dowding. In Proc. of EACL.
  ????? (there isn't a lot out there about this!)

We need Help!
  56% of NL system users in experiment asked for help without explicit knowledge that they could do so
  Speech Graffiti users knew about various help/orientation keywords
    – 91% used options
    – 70% used where was I?
    – 48% used help

What is Help?
  How do I do ?
  What can I do?
  Where was I?
  How do I say that?
  I didn't understand what you said…

User-initiated Help examples
  NL Movieline:
    – Wordy, general
      This system allows you to obtain movie and theater information for Pittsburgh. You can ask for the location, phone number, and movie listing for a certain theater. Or, you can ask about a particular movie to get the rating and/or genre, or find out where it is playing. Specify both a movie and theater to obtain showtimes. If you get stuck, you can say Reset, to start over.
  Jupiter
    – Example based
      You can ask about general weather forecasts as well as information on temperature, windspeed,… Try saying one of the following: 'what's the weather for Denver?' 'what cities do you know about?' 'what do you know about besides weather?' 'what can I say?'
      Try saying one of the following: 'are there any advisories for the United States?' 'what is the extended forecast for Boston?' 'will it rain in Toronto?'…

User-initiated Help examples
  Speech Graffiti
    – Somewhat "state" based
      slot + options: you can say, rating is... G, PG, PG-13, R, NC-17, not rated, or you can ask, what is the rating?
      options: you can specify or ask about title, show time, day…
      help: gives list of keywords on 1st round, then gives explanation of keyword functions
  TellMe
    – Orientation, or, at main level, lots of general system info
      You're in Sports, in the NHL section.

These are all kind of "dumb"
  They might not take system state into account
  They aren't smart about what users really want to do
  They might not tell users exactly how to speak
  They might not orient users to where they are in the system
  But at least they give users some information…

System-initiated "Help" examples
  NL Movieline:
    – Excuse me?
    – Didn't catch that.
  Jupiter
    – Pardon me?
  Speech Graffiti
    – I'm sorry, I'm having trouble understanding you
  TellMe
    – I'm sorry, I didn't get that. Please say a category in Travel.
  These are really dumb!

Intelligent/Targeted Help
  Makes system-initiated help a little smarter
  Goal: provide immediate feedback, tailored to what the user said, for cases in which the system was not able to understand an utterance
  Kind of different perspective compared to traditional error handling
    – What should I do to deal with this error?
    – How can I help the user not make this error in the future?

Gorrell et al. ICSLP paper
  Grammar-based vs. statistical LMs
    – Grammars easy to create (?)
    – GB performs better if users know what to say
    – SLMs better for unusual & less constrained utts
  1st attempt: recognition only (i.e. no help)
    – Run all utts through GBLM & SLM, choose based on confidence scores
    – Not reliable enough

On/Off House
  User initiative
  Natural language
    – Turn off the light in the bathroom
    – Are the hall and kitchen lights switched on?
    – Could you tell me which lights are on?

Targeted Help
  Grammar-based LM parsable?
    – yes: Play regular output
    – no: Send to SLM, classify result, play appropriate help message
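
The flow on this slide can be sketched roughly as below. This is only an illustration, not the authors' implementation: the function names, the toy grammar, the stand-in SLM and classifier, and the use of text in place of audio are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-grammar utterances standing in for the grammar-based LM.
IN_GRAMMAR = {"turn on the kitchen light", "turn off the hall light"}

def toy_grammar_parse(text: str) -> Optional[str]:
    """Return the text if the toy grammar covers it, else None."""
    return text if text in IN_GRAMMAR else None

def toy_slm_recognize(text: str) -> str:
    """Stand-in for the statistical LM: pretend it transcribes anything."""
    return text

def toy_classify(hypothesis: str) -> str:
    """Stand-in classifier: always returns one (assumed) class label."""
    return "REFEXP_COMMAND"

def respond(utterance: str,
            grammar_parse: Callable[[str], Optional[str]],
            slm_recognize: Callable[[str], str],
            classify: Callable[[str], str],
            help_messages: Dict[str, str]) -> str:
    """One user turn: regular output if parsable, targeted help otherwise."""
    parsed = grammar_parse(utterance)
    if parsed is not None:
        return f"OK: {parsed}"                 # play regular output
    hypothesis = slm_recognize(utterance)      # send to SLM
    label = classify(hypothesis)               # classify result
    # Play the help message for that class, with a generic fallback.
    return help_messages.get(label, "I didn't quite catch that.")

HELP = {"REFEXP_COMMAND": "I didn't quite catch that. To turn a device on or "
                          "off, you could try something like 'turn on the "
                          "kitchen light.'"}
```

Calling `respond("make the kitchen bright", toy_grammar_parse, toy_slm_recognize, toy_classify, HELP)` falls through to the help branch, while an in-grammar utterance gets the regular output.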

Classification
  Hand-classified training set
  12 classes
  24 features
  Most common classes
    – REFEXP_COMMAND (35%)
      I didn't quite catch that. To turn a device on or off, you could try something like 'turn on the kitchen light.'
    – LONG_COMMAND (13%)
      I didn't quite catch that. Long commands can be difficult to understand. Perhaps try giving separate commands for each device.
    – PRON_COMMAND (11%)
      I didn't quite catch that. To change the status of a device or group of devices you've just referred to, you could try for example 'turn it on' or 'turn them off.'
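
To make the class-to-message idea concrete, here is a minimal sketch. The three class names and help messages are taken from the slide; everything else is assumed. In particular, the three features and the hand-written decision rules below are hypothetical stand-ins for the paper's 24 features and its trained classifier, which the slides do not spell out.

```python
from typing import Dict

# Help messages for the three most common classes, as quoted on the slide.
HELP_MESSAGES: Dict[str, str] = {
    "REFEXP_COMMAND": "I didn't quite catch that. To turn a device on or off, "
                      "you could try something like 'turn on the kitchen light.'",
    "LONG_COMMAND": "I didn't quite catch that. Long commands can be difficult "
                    "to understand. Perhaps try giving separate commands for "
                    "each device.",
    "PRON_COMMAND": "I didn't quite catch that. To change the status of a "
                    "device or group of devices you've just referred to, you "
                    "could try for example 'turn it on' or 'turn them off.'",
}

PRONOUNS = {"it", "them"}  # assumed pronoun list

def extract_features(hypothesis: str) -> Dict[str, object]:
    """Compute a few illustrative features of an SLM hypothesis."""
    words = hypothesis.lower().split()
    return {
        "n_words": len(words),
        "has_pronoun": any(w in PRONOUNS for w in words),
        "has_on_off": any(w in {"on", "off"} for w in words),
    }

def classify_utterance(hypothesis: str) -> str:
    """Hand-written rules standing in for a learned classifier."""
    f = extract_features(hypothesis)
    if f["n_words"] > 8:          # assumed length threshold
        return "LONG_COMMAND"
    if f["has_pronoun"]:
        return "PRON_COMMAND"
    return "REFEXP_COMMAND"

def help_for(hypothesis: str) -> str:
    """Look up the help message for the predicted class."""
    return HELP_MESSAGES[classify_utterance(hypothesis)]
```

A real system would learn the rules from the hand-classified training set rather than hard-coding them.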

Evaluation
  Baseline classification error: 65%
  Cross-validated final decision tree error: 12%
  Between-subjects user study task
    – call a voice-enabled house & leave it in a secure state
  No training
  Targeted help (N=16) vs. control help (N=15)

Results
                         Targeted help   Control help
  WER (GB only?)         39%             55%
  Grammaticality         47%             36%
  WER(?): 1st 5 utts     45%             76%

Results (2)
  Targeted help group had more variety in constructions
  Targeted help users requested help more often
    – Six TH users vs. only one (!) control user
  Longer dialogs in TH groups
    – Some of this is system exploration
  No significant differences in awareness of final house state or perception of systems' abilities
  No comparison of task completion

Hockey et al. EACL paper
  Domain: WITAS command & control for robotic helicopter
  Targeted Help is an independent module
  Grammar-based LM parsable?
    – yes: Play regular output
    – no: Send to SLM
  SLM parsable? (yes / no)
  Create & play appropriate help message

Help message content
  Message contains one or more of
    – A. What the system heard
      A report of the backup SLM recognition hypothesis
    – B. What the problem was (diagnostic)
      A description of the problem with the user's utterance
    – C. What you might say instead
      A similar in-grammar example
  Rule-based determination of exact content for B & C
  Not clear how often A, B & C appear & in what combinations
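
A minimal sketch of assembling a message from the optional parts A, B and C might look like the following. The function name, the prompt wording, and the fallback message are all assumptions for illustration; the slides do not give the actual templates.

```python
from typing import List, Optional

def build_help_message(heard: Optional[str] = None,
                       diagnostic: Optional[str] = None,
                       example: Optional[str] = None) -> str:
    """Join whichever of parts A, B, C are available into one prompt."""
    parts: List[str] = []
    if heard:                       # A. what the system heard (SLM hypothesis)
        parts.append(f"I heard: '{heard}'.")
    if diagnostic:                  # B. what the problem was
        parts.append(diagnostic)
    if example:                     # C. what you might say instead
        parts.append(f"You could try saying: '{example}'.")
    if not parts:                   # nothing to report: generic fallback
        return "Sorry, I didn't understand that."
    return " ".join(parts)
```

For instance, `build_help_message(heard="fly hospital", example="fly to the hospital")` produces a two-part message containing A and C only.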

B. Diagnostic
  Endpointing
    – Check if initial recognized word is ok initial parsable-input word
  Out-of-vocabulary
    – Compare SLM vocab to GBLM vocab
  Subcategorization
    – Check features of verbs in SLM hypothesis
      Zoom in [+intrans] => ! Zoom in on the red car
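
The three diagnostic checks could be sketched as simple rules over the SLM hypothesis, as below. The vocabulary, the set of valid initial words, and the intransitive-verb list are invented toy data, and the checks are deliberately crude; this is an assumption-laden illustration rather than the WITAS module's actual logic.

```python
from typing import List

# Hypothetical toy resources standing in for the grammar-based LM.
GRAMMAR_VOCAB = {"fly", "to", "the", "hospital", "zoom", "in", "on",
                 "land", "at", "red", "car"}
VALID_INITIAL_WORDS = {"fly", "zoom", "land"}
INTRANSITIVE_VERBS = [("zoom", "in")]   # verbs marked [+intrans]

def diagnose(slm_hypothesis: str) -> List[str]:
    """Return a list of problem labels found in the SLM hypothesis."""
    words = slm_hypothesis.lower().split()
    problems: List[str] = []

    # Endpointing: the first recognized word should be able to start
    # a parsable input; otherwise the start may have been cut off.
    if words and words[0] not in VALID_INITIAL_WORDS:
        problems.append("endpointing")

    # Out-of-vocabulary: words the SLM knows but the grammar does not.
    oov = [w for w in words if w not in GRAMMAR_VOCAB]
    if oov:
        problems.append("out-of-vocabulary: " + ", ".join(oov))

    # Subcategorization: an intransitive verb followed by more material.
    for verb in INTRANSITIVE_VERBS:
        n = len(verb)
        if tuple(words[:n]) == verb and len(words) > n:
            problems.append("subcategorization: " + " ".join(verb))

    return problems
```

With these toy resources, "zoom in on the red car" is flagged only for subcategorization, matching the example on the slide.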

C. In-grammar example
  Try to use words & dialog-move type from user's original utterance
    – wh-question
    – yn-question
    – answer
    – command
  Example
    User: Fly over to the hospital
    GBLM: [reject]
    SLM: fly hospital
    TH: fly to the hospital
    (how does TH know this is a command?)
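
One plausible way to pick an in-grammar example is to choose, among stored examples of the same dialog-move type, the one sharing the most words with the SLM hypothesis. The sketch below assumes exactly that; the example inventory is invented, and the move type is simply passed in, since the slide itself leaves open how it is determined.

```python
from typing import Dict, List, Optional

# Hypothetical in-grammar examples, grouped by dialog-move type.
EXAMPLES: Dict[str, List[str]] = {
    "command": ["fly to the hospital", "land at the tower", "zoom in"],
    "wh-question": ["where is the red car"],
    "yn-question": ["is the red car at the hospital"],
    "answer": ["yes", "no"],
}

def pick_example(slm_hypothesis: str, move_type: str) -> Optional[str]:
    """Return the same-type example with the largest word overlap."""
    hyp_words = set(slm_hypothesis.lower().split())
    candidates = EXAMPLES.get(move_type, [])
    if not candidates:
        return None
    # max() keeps the first candidate when several tie on overlap.
    return max(candidates, key=lambda ex: len(hyp_words & set(ex.split())))
```

Given the SLM hypothesis "fly hospital" and the move type "command", this returns "fly to the hospital", as in the slide's example.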

Evaluation
  Between-groups user study
    – Targeted help vs. no help
    – Was user-initiated help available?
  N=20, 5 tasks each
    – Only T1 & T5 assessed
    – Locate an x and then land at the y

Results
  Significantly fewer TH users gave up on tasks
    – Control users gave up on 39% of tasks
    – TH users gave up on only 6%
  Time to completion effects
    – Hard to measure "completion!"
    – Task (=> users get better over time)
    – Help x Task
    – Help alone (p<.1 in "lenient" analysis)

Discussion
  Definitely an improvement over "dumb" options
  How easy are these options to automate and port to new domains/systems?
    – Classifier version needs training data
    – Rule-based version needs… rules
  Is there such a thing as too smart?
    The system doesn't understand the word X
    The system doesn't understand the word X used with the red car

Discussion (2)
  Do grammaticality improvements fostered by TH persist?
  How frequently is TH activated?
    – Does frequency decrease over time?
      At a faster rate cf. plain-old help?
    – In rule-based system, how often do both LMs fail?

Discussion (3)
  How often does either system (esp. rule-based) provide inappropriate help?
    – Wrong dialogue-move type?
    – Wrong vocabulary?
  What % of 1st-utt-after-TH are grammatical?
    – cf. plain-old help
  Are there other ways to implement/supplement TH?
    – State information?
    – Back-off to directed dialog? (in worst case…)

Anything else?
  Let me know if you come across any more references to this sort of thing…