Kshitij Judah EECS, OSU Dissertation Proposal Presentation.


- PART I: User-Initiated Learning
- PART II: RL via Practice and Critique Advice
- PART III: Proposed future directions for the PhD program
  - Extending RL via practice and critique advice
  - Active learning for sequential decision making

All of CALO's learning components can perform Learning In The Wild (LITW), but the learning tasks are all pre-defined by CALO's engineers:
- What to learn
- What information is relevant for learning
- How to acquire training examples
- How to apply the learned knowledge
UIL goal: make it possible for the user to define new learning tasks after the system is deployed.

Timeline (motivating scenario): A scientist collaborates with a research team on a classified project. When sending email to the team she sets sensitivity to confidential; when sending a colleague a "Lunch today?" email she does not. Later she sends email to the team but forgets to set sensitivity to confidential.

Timeline (continued): "Please do not forget to set sensitivity when sending email." The scientist teaches CALO to learn to predict whether the user has forgotten to set sensitivity. Later, when she sends email to the team, CALO reminds her to set sensitivity.

[UIL architecture diagram. Components: instrumented Outlook (events from the user composing new email); Integrated Task Learning (procedure demonstration and learning, task creation, procedure modification), which produces a SPARK procedure; a SAT-based reasoning system over the knowledge base and CALO ontology, which produces class labels, legal features, and training examples; a user interface for feature guidance (user-selected features and related objects); and the machine learner, whose output is the trained classifier.]

LAPDOG: transforms an observed sequence of instrumented events into a SPARK procedure. The SPARK representation generalizes the dataflow between the actions of the workflow.

TAILOR: supports procedure editing. For UIL, it allows adding a condition to one or more steps in a procedure; the condition becomes the new predicate to be learned.

Goal: autonomously generate labeled training instances for the learning component from the user's stored emails.
Problem: the actions used to create those emails are not stored in the CALO knowledge base, so we need to infer how each email was created.

    {defprocedure do_rememberSensitivity
      ....
      [do: (openComposeWindow $newEmail)]
      [do: (changeField $newEmail "to")]
      [do: (changeField $newEmail "subject")]
      [do: (changeField $newEmail "body")]
      [if: (learnBranchPoint $newEmail)
        [do: (changeField $newEmail "sensitivity")]]
      [do: (sendInitial $newEmail)]
      ....
    }

Specifically, we want to know: Is the email an instance of this procedure? Which branch was taken during creation of the email? No such inference can be drawn directly from the stored email alone.

Domain axioms (examples):
  NewComposition ↔ ComposeNewMail
  ReplyComposition ↔ ReplyToMail
  HasAttachment ↔ (AttachFile ∨ ForwardMail)
  ...
SPARK axioms (derived from the procedure above):
  ProcInstance ↔ (U1 ∧ U2 ∧ … ∧ Un)
  (¬forget ∧ Label) ↔ (C1 ∧ C2 ∧ … ∧ Cn)
The label analysis formula (LAF) combines these axioms with the knowledge-base facts about a stored email E (NewComposition, ReplyComposition, HasToField, HasSubject, HasBody, HasAttachment, …). The reasoning engine then decides:
  E ∧ ¬forget ⊨ (ProcInstance ∧ Label)    →  positive example
  E ∧ ¬forget ⊨ (ProcInstance ∧ ¬Label)   →  negative example
  otherwise                               →  discard
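A rough illustration of this three-way decision (not the CALO implementation; sympy's propositional logic is used here as a toy stand-in for the SAT-based reasoning engine, and the facts are made up):

    from sympy import symbols
    from sympy.logic.boolalg import And, Not
    from sympy.logic.inference import satisfiable

    def entails(kb, query):
        # KB |= query  iff  KB & ~query is unsatisfiable
        return not satisfiable(And(kb, Not(query)))

    ProcInstance, Label, Forget, HasBody = symbols("ProcInstance Label Forget HasBody")

    # Toy evidence for one stored email, plus the "user did not forget" assumption:
    kb = And(HasBody, ProcInstance, Label, Not(Forget))

    if entails(kb, And(ProcInstance, Label)):
        print("positive example")
    elif entails(kb, And(ProcInstance, Not(Label))):
        print("negative example")
    else:
        print("discard")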

Logistic regression is used as the core learning algorithm.
- Features: relational features extracted from the ontology.
- Incorporating user advice on features:
  - Apply a large prior variance to user-selected features.
  - Select the prior variance for the remaining features through cross-validation.
- Automated model selection:
  - Parameters: prior variance on weights, classification threshold.
  - Technique: maximization of the leave-one-out cross-validation estimate of kappa (κ).
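A minimal sketch of this idea (illustrative, not the CALO code): encode the per-feature prior variances as a ridge-style penalty in a MAP logistic regression. The variable names and toy data below are made up.

    import numpy as np
    from scipy.optimize import minimize

    def fit_logreg(X, y, prior_var):
        """MAP weights under independent N(0, prior_var[j]) priors; y in {0, 1}."""
        def neg_log_posterior(w):
            z = X @ w
            log_lik = np.sum(y * z - np.logaddexp(0.0, z))   # logistic log-likelihood
            log_prior = -0.5 * np.sum(w ** 2 / prior_var)    # per-feature Gaussian prior
            return -(log_lik + log_prior)
        return minimize(neg_log_posterior, np.zeros(X.shape[1]), method="L-BFGS-B").x

    X = np.random.randn(40, 6)
    y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(float)
    prior_var = np.full(X.shape[1], 0.1)   # small variance: strong shrinkage (tuned by CV)
    prior_var[[0, 3]] = 100.0              # large variance on the user-selected features
    weights = fit_logreg(X, y, prior_var)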

- Problems: attachment prediction, importance prediction.
- Learning configurations compared:
  - No user advice + fixed model parameters
  - User advice + fixed model parameters
  - No user advice + automatic parameter tuning
  - User advice + automatic parameter tuning
- User advice: 18 keywords in the body text for each problem.

- A set of 340 emails obtained from a real desktop user: 256 for the training set + 84 for the test set.
- For each training set size, compute mean kappa (κ) on the test set to generate learning curves.
- κ is a statistical measure of inter-rater agreement for discrete classes, and a common evaluation metric when the classes have a skewed distribution.
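For reference, Cohen's kappa for a binary confusion matrix can be computed as follows (this is the standard formula, not anything specific to these experiments):

    def cohens_kappa(tp, fp, fn, tn):
        n = tp + fp + fn + tn
        p_o = (tp + tn) / n                                              # observed agreement
        p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2   # chance agreement
        return (p_o - p_e) / (1 - p_e)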

Attachment Prediction

Importance Prediction

To test the robustness of the system to bad advice, bad advice was generated as follows:
- Use SVM-based feature selection in WEKA to produce a ranking of the user-provided keywords.
- Replace the top three words in the ranking with randomly selected words from the vocabulary.

Attachment Prediction

Importance Prediction

- We want to evaluate the utility of the system for the user.
- We use a new metric called the Critical Cost Ratio (CCR).
- Intuition: CCR measures how high the cost of forgetting must be, relative to the cost of interruption, for the system to be useful. Hence, if CCR is low, the system is useful more often.
- For example, if CCR = 10, then the cost of forgetting must be 10 times the cost of interruption to gain a net benefit.
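The slides do not give the CCR formula, so the following is only an illustration of the break-even reading above (an assumed definition, not necessarily the one used in this work): treat the benefit as the forgets the system catches and the cost as the false alarms it raises.

    def break_even_ratio(caught_forgets, false_alarms):
        # Net benefit > 0 iff cost_forget / cost_interrupt > false_alarms / caught_forgets
        return false_alarms / caught_forgets

    print(break_even_ratio(caught_forgets=10, false_alarms=50))   # -> 5.0

With these hypothetical counts, forgetting would need to cost at least 5 times an interruption for the reminders to pay off, which matches how CCR is read on the next slide.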

Attachment prediction: at training-set size 256, the cost of forgetting must be at least 5 times the cost of interruption to gain a net benefit from the system.

Importance Prediction

- User interfaces should support rich instrumentation, automation, and intervention.
- User interfaces should come with models of their behavior.
- User advice is helpful but not critical.
- Self-tuning learning algorithms are critical for success.

PART II: RL via Practice and Critique Advice

PROBLEM: RL usually takes a long time to learn a good policy.
RESEARCH QUESTION: Can we make RL perform better with outside help, such as critique/advice from a teacher, and how?
GOALS:
- Non-technical users as teachers
- Natural interaction methods
[Diagram: the standard agent-environment RL loop (state, action, reward), augmented with a teacher who observes the agent's behavior and provides advice.]

[Diagram: the learning loop. A practice session produces trajectory data; in a critique session the teacher uses the advice interface to mark actions ("In a state s_i action a_i is bad, whereas action a_j is good"), producing critique data; both are used to update the policy parameters θ.]

[Same learning-loop diagram, with the practice side annotated:] the expected utility of the current policy is estimated from the trajectory data using importance sampling (Peshkin & Shelton, ICML 2002).
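A minimal sketch of that kind of estimator (assumed interfaces, not the authors' code): reweight the return of each trajectory, collected under a behavior policy q, by its probability ratio under the current policy π_θ.

    import numpy as np

    def importance_sampled_utility(trajectories, log_pi_theta, log_q):
        """trajectories: list of (states, actions, total_return) triples."""
        estimates = []
        for states, actions, total_return in trajectories:
            # product of per-step probability ratios, accumulated in log space
            log_weight = sum(log_pi_theta(s, a) - log_q(s, a)
                             for s, a in zip(states, actions))
            estimates.append(np.exp(log_weight) * total_return)
        return float(np.mean(estimates))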

Imagine our teacher is an ideal teacher: for each critiqued state s_i she provides O(s_i), the set of all good actions. Any action not in O(s_i) is suboptimal according to the ideal teacher, and all actions within O(s_i) are equally good. (Through the advice interface, a real teacher instead labels only some actions good, some bad, and leaves many unlabeled.)
- Learning goal: find a probabilistic policy, or classifier, that has a high probability of returning an action in O(s) when applied to s.
- ALL likelihood L_ALL(θ, C): the probability of selecting an action in O(s_i) given state s_i, taken over all critiqued states.
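A sketch of the per-state quantity inside that likelihood (assuming a softmax policy over action features; all names are illustrative):

    import numpy as np

    def prob_good_action(theta, action_features, good_actions):
        """P(policy picks an action in O(s)) for one state; action_features: action -> vector."""
        actions = list(action_features)
        scores = np.array([theta @ action_features[a] for a in actions])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        return sum(p for a, p in zip(actions, probs) if a in good_actions)

    # L_ALL(theta, C) is then the product of this probability over all critiqued states.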

Coming back to reality: not all teachers are ideal. The good and bad labels a teacher provides are only partial evidence about O(s_i).
- What about the naive approach of treating the labeled good actions as the true set O(s_i)?
- Difficulties:
  - When there are unlabeled actions that are just as good as the labeled ones, the learning problem becomes even harder.
  - We want a principled way of handling the situation where either the good-labeled set or the bad-labeled set can be empty.

User model: the teacher's good and bad labels provide partial evidence about O(s_i). Assuming independence among different states, the user model gives a distribution over O(s_i) for each state; combining these across all states yields the expected ALL loss. The gradient of this expected loss has a compact closed form.

- Our domain: micro-management in tactical battles in the real-time strategy (RTS) game Wargus.
- 5 friendly footmen against a group of 5 enemy footmen (Wargus AI).
- Two battle maps (Map 1 and Map 2), which differed only in the initial placement of the units.
- Both maps had winning strategies for the friendly team and are of roughly the same difficulty.

- Difficulty: fast pace and multiple units acting in parallel.
- Our setup: provide end-users with an advice interface that allows them to watch a battle and pause at any moment.

- Goal is to evaluate two systems:
  1. Supervised system = no practice session
  2. Combined system = includes practice and critique
- The user study involved 10 end-users: 6 with a CS background, 4 without.
- Each user trained both the supervised and combined systems: 30 minutes total for supervised, 60 minutes for combined due to the additional time for practice.
- Since repeated runs are not practical, the user-study results are qualitative; to provide statistical results we first present simulated experiments.

- After the user study, we selected the worst- and best-performing users on each map when training the combined system.
- Total critique data: User 1: 36, User 2: 91, User 3: 115, User 4: 33.
- For each user, the critique data was divided into 4 equal-sized segments, creating four data sets per user containing 25%, 50%, 75%, and 100% of their critique data.
- We provided the combined system with each of these data sets and allowed it to practice for 100 episodes. All results are averaged over 5 runs.

RL alone is unable to learn a winning policy (i.e., achieve a positive value).

With more critiques, performance increases slightly.

- As the amount of critique data increases, performance improves for a fixed number of practice episodes.
- RL did not exceed a health difference of 12 on any map, even after 500 trajectories.

- Even with no practice, the critique data was sufficient to outperform RL.
- RL did not exceed a health difference of 12.

With more practice, performance increases as well.

Our approach is able to leverage practice episodes to improve the effectiveness of a given amount of critique data.

- Goal is to evaluate two systems:
  1. Supervised system = no practice session
  2. Combined system = includes practice and critique
- The user study involved 10 end-users: 6 with a CS background, 4 without.
- Each user trained both systems: 30 minutes total for supervised, 60 minutes for combined due to the additional time for practice.

Comparing to RL:
- 9 out of 10 users achieved a performance of 50 or more using the supervised system.
- 6 out of 10 users achieved a performance of 50 or more using the combined system.
- Users effectively performed better than RL using either the supervised or the combined system: RL did not exceed a health difference of 12 on any map, even after 500 trajectories.

Frustrating problems for users:
- Long delays during practice (not an issue in many realistic settings).
- The policy returned after practice was sometimes poor and seemed to ignore the advice (perhaps the practice sessions were too short).
Comparing combined and supervised:
- The end-users had slightly greater success with the supervised system than with the combined system.
- More users were able to achieve performance levels of 50 and 80 using the supervised system.

Understanding the effects of user models:
- Study the sensitivity of our algorithm to various settings of the model parameters.
- Study the robustness of our algorithm against inaccurate parameter settings.
- Study the benefits of using more elaborate user models.
Understanding the effects of mixing advice from multiple teachers:
- Pros: addresses the incompleteness and limited quality of advice from a single teacher.
- Cons: introduces variation and more complex patterns that are hard to generalize.
- Study the benefits and harms of mixing advice.
Understanding the effects of advice types:
- Study the effects of feedback-only versus mixed advice.

The current advice collection mechanism is very basic:
- An entire episode is played before the teacher.
- The teacher scans the episode to locate places where critique is needed.
- Only one episode is played.
Problems with the current mechanism:
- The teacher is fully responsible for locating places where critique is needed.
- Scanning an entire episode is very cognitively demanding.
- There is a good chance of missing places where advice is critical.
- Showing only one episode is a limitation, especially in stochastic domains.
GOAL: The learner should itself discover places where it needs advice and query the teacher at those places.

[Diagram: the practice/critique loop as before, but in the critique session the teacher is shown a full episode generated by the current policy and must find the places worth critiquing.]

[Diagram: the same loop with an added active learning module. Given the current policy, the trajectory data, and a cost model ($$), the module selects the best sub-sequence of an execution to show the teacher.] Problem: how do we select the sequence that best optimizes the benefit-to-cost tradeoff?

- Few techniques exist for the problem of "active learning" in sequential decision making with an external teacher.
- All existing techniques make assumptions that work only for certain applications:
  - Some request a full demonstration from the start state.
  - Some assume the teacher is always available and request a single- or multi-step demonstration when needed.
  - Some remove the assumption that the teacher is present at all times, but they pause until the request for a demonstration is satisfied.

- We feel such assumptions are unnecessary in general.
- Providing a demonstration is quite labor intensive and sometimes not even practical; we instead seek feedback and guidance on potential execution traces of our policy.
- Pausing and waiting for the teacher is also inefficient; we never want to pause, but rather keep generating execution traces from our policy for the teacher to critique later, when he/she is available.

- Active learning is well developed for the supervised setting: all instances come from a single distribution of interest, and the best instance is selected based on some criterion and queried for its label.
- In our setting, the distribution of interest is the distribution of states along the teacher's policy (or a good policy).
- Asking queries about states that are far off the teacher's policy is unlikely to produce useful feedback (e.g., losing states in Chess or Wargus).
- The learner therefore faces the additional challenge of identifying states that occur along the teacher's policy and querying in those states.

We define a new performance metric, Expected Common Prefix Length (ECPL): the expected number of time steps for which the learner's policy and the teacher's policy agree up to their first disagreement, starting from the initial state.
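A sketch of the underlying quantity for a single rollout (assumed environment and policy interfaces; ECPL itself is the expectation of this length over start states):

    def common_prefix_length(env, learner_action, teacher_action, max_steps=100):
        """Number of steps until the learner first disagrees with the teacher."""
        state = env.reset()
        for t in range(max_steps):
            action = learner_action(state)
            if action != teacher_action(state):
                return t                          # first disagreement ends the prefix
            state, done = env.step(action)        # assumed: step() -> (next_state, done)
            if done:
                return t + 1
        return max_steps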

[Diagram: a rollout annotated with the common prefix length, the state with the first disagreement, and the unimportant states that follow it.]

- Ideally we should select sequences that directly maximize ECPL.
- Heuristic: identify a sequence that contains states with a high probability of first disagreement (similar to uncertainty sampling).
[Diagram: along a rollout, a high-confidence execution means we are most likely on the right path; a low-confidence execution means we are most likely about to make a wrong turn; states after the first disagreement are unimportant, since we most likely should not be there at all.]
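One way such a heuristic could look (a sketch assuming per-state action probabilities are available from the stochastic policy; the confidence threshold and names are made up):

    import numpy as np

    def first_likely_disagreement(action_probs_per_state, threshold=0.6):
        """Index of the first state whose chosen action is held with low confidence."""
        for t, probs in enumerate(action_probs_per_state):
            if np.max(probs) < threshold:         # low confidence: a wrong turn is likely
                return t
        return None                               # confident everywhere along this rollout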

- Fixed-length sequences.
- Variable-length sequences: compute the optimal length that trades off the benefit of critique against its cost. We plan to use the return-on-investment heuristic proposed by Haertel et al. (Return on Investment for Active Learning, NIPS Workshop on Cost-Sensitive Learning, 2009).
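A sketch of a return-on-investment style selection (the benefit and cost estimators here are placeholders standing in for whatever models the heuristic would use):

    def best_sequence_by_roi(candidate_sequences, est_benefit, est_cost):
        """Pick the candidate with the highest estimated benefit-to-cost ratio."""
        return max(candidate_sequences,
                   key=lambda seq: est_benefit(seq) / est_cost(seq))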

- Practice followed by active learning: let the agent practice; when the critique session starts, use the heuristics from the supervised setting to select the best sequence for the current policy.
- Modify the heuristic from the supervised setting: can we better discover sequences with a first disagreement using information from practice, e.g., based on self-assessment during practice?

- We will use the RTS game Wargus as our domain.
- Extend our current tactical battle micromanagement scenarios with more and different types of combat units.
- Teach control policies for attack and maneuver behavior of individual units.
- Teach policies to control low-level actions in resource-gathering tasks.
- User studies: conduct further user studies.

Timeline:
  Aug 2010: Get proposal approved.
  Sept-Oct 2010: Active learning in supervised setting, teacher models (for journal).
  Nov-Dec 2010: Active learning in practice setting, advice patterns (for journal).
  Jan 2011: Submit a paper to IJCAI/AAAI.
  Feb-Mar 2011: Start writing journal paper, mixing advice (for journal).
  Apr-May 2011: Finish writing journal paper, start writing dissertation.
  June-Aug 2011: Finish writing dissertation, submit journal paper.
  Sept 2011: Defense.

- Presented our completed work on:
  - User-Initiated Learning
  - RL via Practice and Critique Advice
- Proposed future directions for the PhD program:
  - Extending RL via practice and critique advice
  - Active learning for sequential decision making
- Presented potential approaches and evaluation plans for carrying out the proposed work.
- Presented a timeline for completing the proposed work.

Figure: fraction of positive, negative, and mixed advice for the supervised and combined systems.
- Positive (or negative) advice is where the user only gives feedback on the action taken by the agent.
- Mixed advice is where the user not only gives feedback on the agent's action but also suggests alternative actions to the agent.