Learning Dialog Acts for Embodied Agents
Thomas K Harris
KTH: 19 May 2005

Today's Talk
Introduction: Problems talking to robots
SGPUC: A mini-problem addressed
Learning
– Supervised
– Semi-supervised
Application: Weak contributors
Grounding: Back to Sensors

Scenario for Search
(Figure; in-figure text: "We found it!", "We are at")

Issues in Spoken HRI
1. How do people decompose the task into sub-tasks?
2. What language do people use to get the tasks performed by the robots?
3. Given a human command, what is the expected robot behavior?
Explore using Wizard of Oz experiments.

WOZ design
Natural spoken communication takes place via walkie-talkie.
All communication and robot movements are recorded.
Participants: 1 experimenter, 2 robot teleoperator-actors, 1 subject.
The experimenter places the treasure.
The teleoperators can only see what the robots can identify.
The subject can only see the map data generated by the robots.

Annotation and analysis
Utterances were classified into functional categories.
For one experiment: 8 major utterance categories, 20 minor utterance categories, 394 unique utterances.
Annotation was done with the Carnegie Mellon MockBrow annotation tool.

Utterance/Task Breakdown
Controlling team behaviors
Grounding
– Positive/negative feedback
– Informing the robot of its state or the world
– Explanations of commands
– Orientation grounding
Navigation
– Simple navigation commands
– Spatial referential navigation
– Object referential navigation
Manipulation
– Manipulating the environment
– Manipulating treasure
– Manipulating the webcam view
Coverage
– Object coverage commands
– Generic coverage
Asking about the robot's abilities
Filler
Real-time command modifications

Designing the SDS Input Pass
The words -> speech acts and concepts mapping is usually a knowledge-engineered "white-box" function.
Coverage issues:
– One-to-many mapping from concepts to words
– The space (of words) is large (nobody can even say how large)
– ASR is sensitive to overcoverage
Input issues:
– Noisy
– Probabilistic
– Dynamic and situational
Output (concepts) is difficult to share or generalize from one domain/system to another.

What do we do?
A lot of design iterations!
Restrict the domain.
Share components.
Control the speaker through:
– Training and entrainment
– Domain-related expectations
– Influencing or outright directing the dialog

Use the Data
Words -> speech acts and concepts can also be a data-driven "black-box" function, or a hybrid.
This has its own set of problems:
– Labeling data is costly.
– The catch-22 (data collection requires a working system). Iterate starting with seed data, which can be: nothing; designer-hypothesized data; WoZ data; data from a similar or previous-version SDS; or data from some human-human analog.
– The performance often seems nice at first, but then asymptotes quickly.
I'm only going to address the labeling cost issue here.

Avoiding Labeling Costs
Easily labeled data
– Observe broad classes of utterances relevant to a domain, e.g. "request for train ticket, request for train schedule, other."
Automatically observable data
– Observe co-occurring, automatically identifiable phenomena, e.g. record which tickets are purchased by a human agent after which customer utterances.
Unlabeled data

A Mini-Problem
Let's look at a small part of the words -> speech acts and concepts problem in a real system, the Speech Graffiti Personal Universal Controller (SGPUC). Hopefully this small, concrete system and its mini-problem will make experimentation with approaches manageable. But first, a little about the system itself.

Speech Graffiti Personal Universal Controller
Protocol-based appliance communication architecture
Automatically built SDS from an appliance description language
Speech Graffiti-style user interface

Appliance Communication Architecture
(Architecture diagram: the Speech Graffiti Personal Universal Controller connected to appliances via a HAVi adapter and an X10 adapter.)

James
(Tree diagram of the device function hierarchies: stereo, with mode (tuner: AM/FM radio band, frequency, station, seek forward/backward; auxiliary; CD: status, play/stop/pause, repeat (off, single track, single disc, all discs), disc #, track #, next/last track, random on/off), x-bass on/off, and volume (up, down, off); digital camera control, with play mode (play, stop, fast fwd, rewind, record, pause, step forward/backward, info), device mode (camera, VCR), and media type (digital video, VHS, unknown, none); other devices…)

Speech Graffiti Dialog
Artificial subset language
Tree-structured functions
Universal conversational primitives
User-directed
Great for recognition
Entirely declarative (and automatic)

Six Dialog Acts in SGPUC
Session Management
Help
Execution
Query
Response Related
Other

Session Management
User: blah blah blah...
System: ignoring user
User: hello james
System: stereo, digital camera
User: stereo
System: stereo here
User: goodbye
System: goodbye
User: blah blah blah...
System: ignoring user

Help/Exploration/Orientation
User: alarm clock options
System: alarm, clock, radio, sleep, and more…
User: help
System: to hear what options you have, say options, to repeat an utterance, say repeat…
User: stereo options
System: while turning stereo on: off, am, fm, auxiliary, cd, and more…
User: where am I
System: stereo options

Execution/Specification
User: stereo auxiliary
System: while turning the stereo on and switching to auxiliary: auxiliary
User: cd
System: while switching to cd mode: cd
User: play
System: while playing a cd: playing cd
User: switch disc to four
System: while switching discs: switching cd disc to four

Query
User: what is the am frequency
System: the am frequency is five hundred thirty
User: random status
System: random is off
User: status
System: tuner am, station wabc, volume low, and more…

Response Related
User: radio band am options
System: am options are frequency, kabc, k001, k002, and more…
User: more
System: k003, k004
User: more
System: band options are fm
User: repeat
System: band options are fm

Back to the Mini-Problem
The language is explicit and regular in classifying dialog acts; a grammar will accurately classify dialog acts.
But users are taught the SG language, learn it incompletely, and have faulty memories; utterances have false starts, spurious repetitions, etc.; and ASR is error-prone.
As a result, 37.5% of utterances' dialog acts were misclassified.

Data
Listening to the actual speech, I labeled 2010 utterances (from 10 participants). Each utterance is labeled with one of the six dialog acts.
Note that this labeling is much faster than transcription or much other labeling: the 2010 utterances were labeled in 2½ hours, close to real-time.
Each utterance is represented by a boolean vector, where each element represents whether that word appears or not in the utterance (i.e. word order is ignored!).
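As a rough illustration (not the code used in this work, and the names are placeholders), the word-presence representation might be built like this in Python/numpy:

import numpy as np

def build_vocab(utterances):
    # Collect every word seen in the (decoded) utterances.
    return sorted({w for u in utterances for w in u.lower().split()})

def boolean_vectors(utterances, vocab):
    # One row per utterance, one column per word; True if the word appears.
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(utterances), len(vocab)), dtype=bool)
    for i, u in enumerate(utterances):
        for w in u.lower().split():
            if w in index:
                X[i, index[w]] = True
    return X

# Illustrative Speech Graffiti-style utterances:
utts = ["stereo options", "what is the am frequency", "stereo auxiliary"]
vocab = build_vocab(utts)
X = boolean_vectors(utts, vocab)   # word order and repetition are ignored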

A Naïve Bayes Classifier
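As an illustration of the general technique (a sketch, not the classifier used here), a minimal Bernoulli Naïve Bayes over the boolean vectors above could look like this; add-one smoothing is included, whereas an unsmoothed version exhibits the zero-probability problem noted on the "Problems with Naïve Bayes" slide below:

import numpy as np

def train_nb(X, y, n_classes, alpha=1.0):
    # Estimate log P(class) and log P(word present | class) from boolean data.
    X = X.astype(float)
    log_priors = np.zeros(n_classes)
    p = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        Xc = X[y == c]
        log_priors[c] = np.log((len(Xc) + alpha) / (len(X) + alpha * n_classes))
        p[c] = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)  # smoothed presence probability
    return log_priors, np.log(p), np.log(1.0 - p)

def classify(x, log_priors, log_p, log_not_p):
    # Most likely dialog act for one boolean word-presence vector x.
    x = x.astype(float)
    scores = log_priors + x @ log_p.T + (1.0 - x) @ log_not_p.T
    return int(np.argmax(scores))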

Classifier Results

Problems with Naïve Bayes
Independence assumption
– Word presence in an utterance contributes a fixed amount to class distinction regardless of context.
– i.e. "bank" contributes the same thing to the classifier in the context of "world bank" and "river bank".
Estimates a high-dimensional model
– The model estimates 5 parameters (#classes - 1) for each word. Words that occur infrequently will be severely over-fitted.
Problems with singleton words
– If an utterance contains a word that hasn't occurred in the training data for a particular class, the probability assigned to that class is zero.

Latent Semantic Analysis to the Rescue
Independence assumption
– LSA models both synonymy and polysemy.
– Polysemy: words that occur in different contexts (i.e. "bank" in "world bank" vs. "river bank") tend to become distinguished.
– Synonymy: words that occur in similar contexts (i.e. the "white" and "black" of "white sheep" and "black sheep") tend to become undistinguished.
Estimates a high-dimensional model
– The effective dimension is arbitrarily fixed.
Problems with singleton words
– The dimensionality reduction serves as a smoothing function.

How Does LSA Work?
C1: Human machine interface for ABC computer applications.
C2: A survey of user opinion of computer system response time.
C3: The EPS user interface management system.
C4: System and human system engineering testing of EPS.
C…:

{X} =
             C1  C2  C3  C4  …
Human         1   0   0   1  …
Interface     1   0   1   0  …
Computer      1   1   0   0  …
User          0   1   1   0  …
System        0   1   1   2  …
Response      0   1   0   0  …
Time          0   1   0   0  …
EPS           0   0   1   1  …
Survey        0   1   0   0  …

Singular Value Decomposition
Any m x n matrix X with m > n can be decomposed into the product of three matrices, X = U D V^T, where:
– U is an m x n matrix and V is an n x n matrix, both with orthogonal columns.
– D is an n x n diagonal matrix.
D is a sort-of basis in n dimensions for X.
In Matlab: [U, D, V] = svd(X);
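An assumed numpy equivalent of that Matlab call (full_matrices=False requests the economy-size factorization):

import numpy as np

X = np.random.rand(9, 4)                    # any m x n matrix with m > n
U, d, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(X, U @ np.diag(d) @ Vt)  # X is recovered exactly (up to rounding)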

LSA Algorithm in 4 Easy Steps
1. Build your feature-passage matrix X. (Here I chose word-utterance.)
2. [U, D, V] = svd(X)
3. Zero out all but the highest g values of D to form a new, reduced D.
4. Recompose a reduced X as U D V^T.
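The four steps as a numpy sketch (again an illustration, not the original code; X is the boolean word-utterance matrix from the earlier sketch, and g is the number of retained dimensions):

import numpy as np

def lsa_reconstruct(X, g):
    # Steps 2-4: SVD, zero out all but the g largest singular values, recompose.
    # numpy returns singular values in descending order, so d[:g] are the largest.
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    d_reduced = np.zeros_like(d)
    d_reduced[:g] = d[:g]
    return U @ np.diag(d_reduced) @ Vt

X_hat = lsa_reconstruct(X.astype(float), g=2)   # e.g. keep two latent dimensions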

The Recomposed Matrix
C1: Human machine interface for ABC computer applications.
C2: A survey of user opinion of computer system response time.
C3: The EPS user interface management system.
C4: System and human system engineering testing of EPS.
C…:

{X} =  (original values in parentheses)
                  C1        C2        C3        C4     …
Human        (1) 0.16  (0) 0.40  (0) 0.38  (1) 0.47  …
Interface    (1) 0.14  (0) 0.37  (1) 0.33  (0) 0.40  …
Computer     (1) 0.15  (1) 0.51  (0) 0.36  (0) 0.41  …
User         (0) 0.26  (1) 0.84  (1) 0.61  (0) 0.70  …
System       (0) 0.45  (1) 1.23  (1) 1.05  (2) 1.27  …
Response     (0) 0.16  (1) 0.58  (0) 0.38  (0) 0.42  …
Time         (0) 0.16  (1) 0.58  (0) 0.38  (0) 0.42  …
EPS          (0) 0.22  (0) 0.55  (1) 0.51  (1) 0.63  …
Survey       (0) 0.10  (1) 0.53  (0) 0.23  (0) 0.21  …

And This Means?
Cosine distances between words show patterns of similarity, as do cosine distances between passages.
Clustering with these distances makes clusters that feel "semantic" and mimic human choices in standardized tests for word sorting and lexical priming so well that people have suggested that LSA may be an actual psycholinguistic mechanism.
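The comparison described here, sketched against the recomposed matrix X_hat from the earlier snippet: cosine similarity between two rows (words) or two columns (passages):

import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors in the reduced LSA space.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

word_sim = cosine(X_hat[0], X_hat[1])           # two words, compared across passages
passage_sim = cosine(X_hat[:, 0], X_hat[:, 1])  # two passages, compared across words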

LSA-Discounted NB Estimators
Why don't we try to use an LSA-reconstructed matrix to train the NB classifier?
Used various amounts of labeled data, discounted by various amounts of unlabeled LSA data.
Unlabeled decoder output boosts classification!
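The slide does not spell out the discounting recipe, so the following is only one possible reading (an assumption, not the original method): run LSA over the combined labeled and unlabeled word-utterance matrix, then blend the raw labeled word counts with their LSA-smoothed counterparts before estimating the NB likelihoods. It reuses lsa_reconstruct from the earlier sketch; the g and discount values are placeholders.

import numpy as np

def lsa_discounted_counts(X_labeled, y, X_unlabeled, n_classes, g=50, discount=0.5):
    # LSA over labeled + unlabeled rows together; keep the smoothed labeled rows.
    X_all = np.vstack([X_labeled.astype(float), X_unlabeled.astype(float)])
    X_hat = np.clip(lsa_reconstruct(X_all, g), 0.0, None)[:len(X_labeled)]
    blended = np.zeros((n_classes, X_labeled.shape[1]))
    for c in range(n_classes):
        raw = X_labeled[y == c].astype(float).sum(axis=0)
        smoothed = X_hat[y == c].sum(axis=0)
        blended[c] = (1.0 - discount) * raw + discount * smoothed
    return blended   # use in place of the raw per-class counts when training NB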

Results

Applications for Weak Contributors
By itself, a la "How may I help you?" systems.
Informing dialog management by adjusting confidence measures of parsed concepts:
– More effective error correction, i.e. "Please repeat the name of the city to which you want to travel?" vs. "I'm sorry, I didn't understand that?"
– More effective confirmation strategies.
Guided utterance self-correction: a coarse classifier could re-weight the language model or re-order hypotheses to elicit a corrected best hypothesis.
How much information needs to be understood for the conversation to progress?

Grounding Language for Embodied Agents
Prediction functions
– Concepts and actions -> words
– Concepts and actions -> sensor data
Perceptive function
– Words, sensor data, proprioception, and predictions -> concepts
Planning function
– Concepts and goals -> actions

Sensory Deprivation
Push: To press forcefully
Force: Energy or strength
Energy: Strength of force
Strength: The power to resist force
From D. Roy, 2004

Prediction
Why bother with prediction? Among other things, we'd like to see robots find stable meanings of things "in the wild".
"Tiger" predicts

Summary
Spoken dialogue is poorly characterized by engineers.
Approaches that learn in both supervised and unsupervised settings can help.
Embodied agents provide an ideal platform for grounded language acquisition.

Controlling team behaviors
– "you guys get together"
– "T- you go first and B- follow"

Grounding
Positive/negative feedback
– "ok that's better"
Informing robot of state
– "so that's up"
– "I don't see anything there"
Explanations of commands
– "so I can see which direction is up"
Orientation grounding
– "What you're facing now with the camera – is that the vehicle that you just circumnavigated"
– "I can tell you're going in the wrong direction, stop"

Navigation
Simple navigation commands
– "so um T- turn to you left"
– "T- I want you to turn right 90 degrees"
– "can you go in that general direction"
– "can you proceed in that direction"
Spatial referential navigation
– "go to that open area"
– "continue around the periphery of that open area"
– "back out of that alley"
– "proceed in that direction until you find an opening to turn left"
Object referential navigation
– "go over by T-"
– "can you go on the other side of that vehicle"
– "go over by the posters"

Manipulation
Manipulating the environment
– "T- why don't you move the trash can"
Manipulating treasure
– "T- bring the coin to me"
Manipulating the webcam view
– "ok B- look to your left"
– "B- can you look around with the camera a little"

Coverage
Object coverage commands
– "ok so examine the shelf"
– "do you see something on that shelf in front of B-"
– "can you look over by that table over there"
Generic coverage
– "do you see anything that looks interesting"

Asking about the robot's abilities
– "is that possible"

Filler
– "and now um"
– "ok um"

Real-time command modifications
– "keep going"
– "stop"
– "a little more"
– "change of plans"
– "other direction"