
© 2008 SRI International

Question Asking to Inform Preference Learning: A Case Study

Melinda Gervasio, SRI International
Karen Myers, SRI International
Marie desJardins, Univ. of Maryland Baltimore County
Fusun Yaman, BBN Technologies

AAAI Spring Symposium: Humans Teaching Agents, March 2009

AAAI 2009 Spring Symposium: Humans Teaching Agents 2

POIROT: Learning from a Single Demonstration

Demonstration Trace:
((lookupReqmts S42)
((lookupAirport PFAL 300m) ((ORBI 90m BaghdadIntl)))
((setPatientAPOE P1 ORBI))
((getArrivalTime P1 PFAL ORBI) (1h 3h))
((setPatientAvailable P1 3h))
((lookupHospitalLocation HKWC) ((KuwaitCity)))
((lookupAirport KuwaitCity 300m 2) ((OKBK 250m KuwaitIntl)))
((setPatientAPOD P1 OKBK))
((lookupMission ORBI OKBK 24h 3h))
((lookupAsset ORBI OKBK 24h 3h) ((C h 2h 10)))
((initializeTentativeMission c ORBI OKBK 15h 2h))
((getArrivalTime P1 OKBK HKWC 17h) (18h 19h))
…

Learning Generalized Problem-solving Knowledge

AAAI 2009 Spring Symposium: Humans Teaching Agents 3

Target Workflow

Learned Knowledge:
– Temporal ordering
– Conditional branching
– Iterations
– Selection criteria
– Method generalization

AAAI 2009 Spring Symposium: Humans Teaching Agents 4

QUAIL: Question Asking to Inform Learning

Goal: improve learning performance through system-initiated question asking

Approach:
1. Define a question catalog to inform learning by demonstration
2. Develop question models and representations
3. Explore question-asking strategies

"Tell me and I forget, show me and I remember, involve me and I understand." – Chinese proverb

AAAI 2009 Spring Symposium: Humans Teaching Agents 5

Question Models

Question Cost: approximates the 'cognitive burden' of answering

  Cost(q) = w_F · FormatCost(q) + w_G · GroundednessCost(q),  with w_F + w_G = 1

Question Utility: normalizes utilities across learners

  Utility(q) = ∑_{l ∈ L} w_l · Utility_l(q),  with ∑_{l ∈ L} w_l = 1
  Utility_l(q) = w_B · BaseUtility_l(q) + w_G · GoalUtility_l(q),  with w_B + w_G = 1
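For concreteness, here is a minimal Python sketch of these two weighted-sum models. The component scorers passed in (format_cost, groundedness_cost, base_utility, goal_utility) and the weight arguments are hypothetical placeholders, not part of the original slides.

```python
# A minimal sketch of the slide's cost and utility models. The component
# scorers are stand-ins for whatever QUAIL actually computes.

def question_cost(q, w_f, format_cost, groundedness_cost):
    """Cost(q) = w_F*FormatCost(q) + w_G*GroundednessCost(q), w_F + w_G = 1."""
    return w_f * format_cost(q) + (1.0 - w_f) * groundedness_cost(q)

def question_utility(q, learner_weights, w_b, base_utility, goal_utility):
    """Utility(q) = sum over learners l of w_l * Utility_l(q), with sum w_l = 1,
    where Utility_l(q) = w_B*BaseUtility_l(q) + w_G*GoalUtility_l(q)."""
    return sum(
        w_l * (w_b * base_utility(q, l) + (1.0 - w_b) * goal_utility(q, l))
        for l, w_l in learner_weights.items()
    )
```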

AAAI 2009 Spring Symposium: Humans Teaching Agents 6

Question Selection

Given:
– questions Q = {q_1, …, q_n} with costs and utilities
– a budget B

Problem: find Q' ⊆ Q with Cost(Q') ≤ B and maximal utility
– equivalent to the 0/1 knapsack problem (no question dependencies)
– admits efficient dynamic-programming approaches: O(nB)
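Since the slide reduces selection to 0/1 knapsack, the standard O(nB) dynamic program applies directly. A sketch, assuming non-negative integer costs (the example numbers at the bottom are illustrative):

```python
# 0/1 knapsack selection of questions, per the slide's reduction.
# dp[b] holds (best utility, chosen question indices) achievable within budget b.

def select_questions(costs, utilities, budget):
    dp = [(0.0, [])] * (budget + 1)
    for i, c in enumerate(costs):
        for b in range(budget, c - 1, -1):  # descending: each question used at most once
            take = dp[b - c][0] + utilities[i]
            if take > dp[b][0]:
                dp[b] = (take, dp[b - c][1] + [i])
    return dp[budget]

best_utility, chosen = select_questions([2, 3, 4], [3.0, 4.0, 5.0], budget=5)
assert chosen == [0, 1] and best_utility == 7.0
```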

AAAI 2009 Spring Symposium: Humans Teaching Agents 7

CHARM (Charming Hybrid Adaptive Ranking Model)

Learns lexicographic preference models:
– there is an order of importance on the attributes
– for every attribute there is a preferred value

Example: airports characterized by Authority (civil, military) and Size (small, medium, large), e.g., (Size: Small, Authority: Civil) vs. (Size: Large, Authority: Military)

Preference model:
– a civil airport is preferred to a military one
– among civil airports, a large airport is preferred to a small one
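As a reading aid, here is a minimal sketch of evaluating such a lexicographic model, with the airport example encoded as Python dicts. The representation is my own for illustration, not CHARM's internal one.

```python
# Lexicographic preference: walk the attributes in importance order; the first
# attribute on which the objects differ decides, in favor of the preferred value.

PREFS = [("Authority", "Civil"), ("Size", "Large")]  # (attribute, preferred value)

def prefer(a, b, prefs=PREFS):
    """Return the preferred object, or None if the model cannot separate them."""
    for attr, preferred in prefs:
        if a[attr] != b[attr]:
            return a if a[attr] == preferred else b
    return None

small_civil = {"Size": "Small", "Authority": "Civil"}
large_military = {"Size": "Large", "Authority": "Military"}
assert prefer(small_civil, large_military) is small_civil  # civil beats military
```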

AAAI 2009 Spring Symposium: Humans Teaching Agents 8

CHARM Learning

Idea:
– keep track of a set of models consistent with data of the form Obj1 < Obj2 (a partial order on the attributes and values)
– the object preferred by more models is the more preferred one

Algorithm for learning the models (sketched below):
– initially, assume all attributes and all values are equally important
– loop until nothing changes:
  – given Obj1 < Obj2, predict a winner using the current model
  – if the predicted winner is actually the preferred object, do nothing
  – otherwise, decrease the importance of the attributes/values that led to the wrong prediction
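A minimal sketch of this loop, filling in details the slides leave open (per-value integer ranks with 1 = most important, and demotion by one rank on a mistake):

```python
# Mistake-driven rank learning in the style the slides describe. Each attribute
# value carries a rank (1 = most important); a wrong (or tied) prediction
# demotes the loser's top-ranked distinguishing values.

def charm_update(ranks, winner, loser, attrs):
    """One update for a training pair 'winner < loser' (winner preferred)."""
    diffs = [x for x in attrs if winner[x] != loser[x]]
    if not diffs:
        return False
    top = min(min(ranks[winner[x]], ranks[loser[x]]) for x in diffs)
    # the loser's values sitting at the top rank voted for the wrong object
    wrong = [loser[x] for x in diffs if ranks[loser[x]] == top]
    for v in wrong:
        ranks[v] += 1  # decrease importance
    return bool(wrong)

def charm_fit(pairs, ranks, attrs):
    """Loop until nothing changes; may not terminate if the data is not
    lexicographic (the no-convergence case noted on a later slide)."""
    changed = True
    while changed:
        changed = any([charm_update(ranks, w, l, attrs) for w, l in pairs])
```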

AAAI 2009 Spring Symposium: Humans Teaching Agents 9

Learn from Mistakes

1) Given training data (e.g., BWI < DCA):

   Airport  Size   Authority
   BWI      Large  Civil
   DCA      Small  Civil

2) The most important attribute values predict a winner.
3) The ranks of the values that voted for the loser are updated:

   Before:          After:
   Rank  Value      Rank  Value
   1     Small      2     Small
   1     Large      1     Large
   1     Civil      1     Civil
   1     Military   1     Military

AAAI 2009 Spring Symposium: Humans Teaching Agents 10

Learn from Mistakes

Given: BWI < Andrews

   Airport  Size   Authority
   BWI      Large  Civil
   Andrews  Large  Military

   Before:          After:
   Rank  Value      Rank  Value
   2     Small      2     Small
   1     Large      1     Large
   1     Civil      1     Civil
   1     Military   2     Military
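Running the charm_update sketch from above on these two training pairs reproduces the rank tables on the last two slides:

```python
attrs = ["Size", "Authority"]
bwi     = {"Size": "Large", "Authority": "Civil"}
dca     = {"Size": "Small", "Authority": "Civil"}
andrews = {"Size": "Large", "Authority": "Military"}

ranks = {"Small": 1, "Large": 1, "Civil": 1, "Military": 1}
charm_update(ranks, bwi, dca, attrs)      # BWI < DCA: Small demoted, 1 -> 2
charm_update(ranks, bwi, andrews, attrs)  # BWI < Andrews: Military demoted, 1 -> 2
print(ranks)  # {'Small': 2, 'Large': 1, 'Civil': 1, 'Military': 2}
```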

AAAI 2009 Spring Symposium: Humans Teaching Agents 11

Finally

– If the model truly is lexicographic, the ranks will converge (no convergence => the underlying model is not lexicographic).
– If the training data is consistent, the learned model will correctly predict all examples.

Converged ranks for the airport example:

   Rank  Value
   3     Small
   2     Large
   1     Civil
   3     Military

i.e., a civil airport is preferred to a military one; among civil airports, a large airport is preferred to a small one.

AAAI 2009 Spring Symposium: Humans Teaching Agents 12

QUAIL+CHARM Case Study

Goal: investigate how different question-selection strategies impact CHARM preference learning for ordering patients

Performance metric: CHARM's accuracy in predicting pairwise ordering preferences

Learning target: a lexicographic preference model for ordering patients, defined over a subset of 5 patient attributes: triageCode, woundType, personClass, readyForTransport, LAT

Training input: pairs P1 < P2, indicating that P1 is at least as preferred as P2

AAAI 2009 Spring Symposium: Humans Teaching Agents 13

Question Types for CHARM

– Object ordering: Should Patient1 be handled before Patient2?
– Attribute relevance: Is Attr relevant to the ordering?
– Attribute ordering: Is Attr1 preferred to Attr2?
– Attribute value ordering: For Attr, is Val1 preferred to Val2?

A uniform question cost model is used.
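Tying this catalog back to the slide-6 selector: under a uniform cost model, knapsack selection reduces to taking the highest-utility questions that fit the budget. The question texts and utility numbers below are invented for illustration, and select_questions is the earlier sketch.

```python
questions = [
    ("Should Patient1 be handled before Patient2?",          1, 0.40),  # object ordering
    ("Is triageCode relevant to the ordering?",              1, 0.25),  # attribute relevance
    ("Is triageCode preferred to woundType?",                1, 0.55),  # attribute ordering
    ("For triageCode, is 'urgent' preferred to 'routine'?",  1, 0.35),  # value ordering
]

costs     = [c for _, c, _ in questions]
utilities = [u for _, _, u in questions]
best_utility, chosen = select_questions(costs, utilities, budget=2)
print([questions[i][0] for i in chosen])  # the two highest-utility questions
```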

AAAI 2009 Spring Symposium: Humans Teaching Agents 14

Experiment Setup

– Target preference models generated randomly, drawing on a database of 186 patient records
– Train on 1 problem; test on 4 problems (each training/test instance is a pairwise preference among 5 patients)
– 10 runs for each target preference model:
  – 3 handcrafted target models with irrelevant attributes
  – 5 randomly generated target models over all 5 patient attributes

AAAI 2009 Spring Symposium: Humans Teaching Agents 15

Results

AAAI 2009 Spring Symposium: Humans Teaching Agents 16

Observations on Results

– Question answering is generally useful:
  – asking more questions (generally) yields greater performance improvements
  – questions have greater impact when fewer training examples are available for learning (i.e., when the learned model is weaker)
– A little knowledge can be a dangerous thing:
  – CHARM's incorporation of isolated answers can decrease performance
  – related questions can lead to significant performance improvements
  – being told {Attr1 > Attr2, Attr4 > Attr5} may not be useful (and may be harmful)
  – being told {Attr1 > Attr2, Attr2 > Attr3} is very useful
– There is a need for more sophisticated models of question utility, e.g., learning the utility models

AAAI 2009 Spring Symposium: Humans Teaching Agents 17

Future Directions

– Learn utility models through controlled experimentation:
  – assess the impact of different question types in different settings
  – features for learning: question attributes, state of the learned model, training data, previously asked questions
– Expand the set of questions; support questions with differing costs
– Expand coverage to a broader set of learners
– Move to a continuous model of question asking

AAAI 2009 Spring Symposium: Humans Teaching Agents 18

Related Work

– Active learning: focus to date on classification, emphasizing selection of additional training data for a human to label
– Interactive task learning: Allen et al.'s work on Learning by Discussion; Blythe's work on Learning by Being Told