Learning a Policy for Opportunistic Active Learning


Learning a Policy for Opportunistic Active Learning Aishwarya Padmakumar, Peter Stone, Raymond J. Mooney Department of Computer Science The University of Texas at Austin

Natural Language Interaction with Robots With the advances in AI, robots are increasingly becoming a part of human environments. For most users, the easiest way to communicate with them would be the same way they communicate with each other: in natural language.

Objects in Human Environments Human environments are typically filled with a lot of small objects that robots will likely have to handle. These present a number of challenges.

Objects in Human Environments Diverse It is difficult to catalog all the possible objects one would see in a home or office, and some objects, such as decorations and souvenirs, are especially hard to describe.

Objects in Human Environments Diverse Transient Objects may be transient. This person will probably throw away the coffee cup after finishing it, so the robot cannot rely on memorizing specific objects.

Objects in Human Environments Diverse Transient Described using diverse perceptual properties “light empty yellow spiky container” “a yellow pineapple” “a full neon green water bottle” “a green water bottle that's heavy” People typically refer to objects in terms of their properties, rather than with unique names. Understanding such descriptions is important for robots to communicate with humans.

Understanding Object Descriptions Robots need to be able to Ground language in perception. Handle novel perceptual concepts during operation. Since the objects and properties used will be diverse, robots need to be able to ground such descriptions using perception. They also need to be able to handle the use of novel perceptual concepts during operation.

Opportunistic Active Learning (Thomason et al., CoRL 2017) A framework for incorporating active learning queries into test time interactions. Demonstrated improvement in learning novel perceptual concepts to ground natural language descriptions of objects. Prior work introduced the framework of opportunistic active learning - where an agent asks locally convenient questions during an interaction that may not be immediately relevant, but are expected to improve performance in future tasks. They demonstrated that this helped improve a robot’s performance in understanding natural language descriptions of objects.

Goal of this Work Learning a dialog policy for an interactive object retrieval task. (This work sits at the intersection of opportunistic active learning, grounded language learning, and reinforcement learning.) We extend this work by learning a policy for an interactive task of grounding object descriptions using reinforcement learning. Our goal is to learn to trade off learning new perceptual concepts through opportunistic active learning against completing dialogs quickly with successful object retrieval.

Task Setup Based on task from prior work (Thomason et al., CoRL 2017) Our contribution - Setting it up in simulation using the Visual Genome dataset.

Task Setup Active Training Set | Dialog | Active Test Set Our task setup looks like this. In each set of interactions, there is a set of objects called the active test set.

Task Setup Robot: Describe the object I should find. Human: A white umbrella. (Target Description) The robot is given a target description of one of the objects in this set. The robot needs to identify the object being described.

Task Setup Robot: Describe the object I should find. Human: A white umbrella. There is another set of objects called the active training set.

Task Setup Robot: Describe the object I should find. Human: A white umbrella. Robot: Is there something in Train_6 that can be described as yellow? Human: No. (Label Query) The robot can ask questions about objects in the active training set before it makes a guess. Label queries ask whether a concept applies to a particular object.

Task Setup Robot: Describe the object I should find. Human: A white umbrella. Robot: Is there something in Train_6 that can be described as yellow? Human: No. (Opportunistic Query) A query is said to be opportunistic if it is not directly relevant to the current target, for example asking about yellow when the task is to find a white umbrella.

Task Setup Robot: Can you show me an image with something that can be described as white? Human: Train_1 (Example Query) Example queries ask for an object in the active training set that a given concept applies to.

Task Setup Robot: My guess is Test_4 Human: Correct (Guess) Finally, the robot guesses which object in the active test set matches the description, and the user indicates whether the guess is correct.

Goal of the Task Learn to maximize the fraction of successful guesses across interactions by Learning when to ask queries, and when to stop and guess. Learning to choose between different possible queries. The goal is to learn to maximize the fraction of successful guesses across interactions by learning when to query, which queries to ask, and when to stop and guess.

Active Learning We want the robot to use active learning to identify the questions that would be most useful for improving its model.

Why Opportunistic Queries? Robot may have good models for on-topic concepts. No useful on-topic queries. Some off-topic concepts may be more important because they are used in more interactions. Why would we want to ask off-topic questions? The robot may have good models for blue and mug, or there may be nothing around it for which it is uncertain whether the object is blue or a mug. Or it might have seen that a lot of people ask for tall objects, so that concept is more important to learn.

Opportunistic Active Learning - Challenges Some other object might be a better candidate for the question Purple? But the setting is challenging for many reasons. The agent only considers the objects currently available in the context of the interaction, and there may be a better choice of query in the future.

Opportunistic Active Learning - Challenges The question interrupts another task and may be seen as unnatural Find a green bottle. Would you use the word “metallic” to describe this object? The question could irritate the user because it appears irrelevant.

Opportunistic Active Learning - Challenges The information needs to be useful for a future task. Red? The agent might learn a concept, say red, because it has seen many people ask for red things, but after it learns this, other users may no longer ask for red things.

Grounding Model A white umbrella {white, umbrella} SVM white/not white Pretrained CNN SVM umbrella/not umbrella We use the following model to ground object descriptions. Given a language description, we assume all words except stopwords are perceptual predicates. For each of these we learn a binary SVM over images that uses a feature representation from VGGNet pretrained on ImageNet.
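
As a concrete illustration, here is a minimal sketch of such per-predicate classifiers using scikit-learn, assuming VGGNet image features are precomputed. The stopword list, class structure, and cross-validation settings are assumptions for illustration, not the authors' implementation.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    STOPWORDS = {"a", "an", "the", "that", "with"}  # toy stopword list (assumption)

    def predicates(description):
        # Treat every non-stopword token as a perceptual predicate.
        return [w for w in description.lower().split() if w not in STOPWORDS]

    class PredicateClassifier:
        """One binary SVM per perceptual predicate, over pretrained CNN features."""
        def __init__(self):
            self.svm = SVC(kernel="linear")
            self.confidence = 0.0  # estimated F1, updated by cross-validation

        def fit(self, features, labels):
            # features: (n, d) array of CNN image features; labels: +1 / -1.
            # Assumes at least two examples of each class for cross-validation.
            self.svm.fit(features, labels)
            if len(set(labels)) > 1:
                self.confidence = cross_val_score(
                    self.svm, features, labels, cv=2, scoring="f1").mean()

        def decision(self, feature):
            # d(p, o) in {-1, +1} for a single object's feature vector.
            return 1 if self.svm.predict(feature[None, :])[0] > 0 else -1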

Grounding Model To guess, we choose the object in the active test set that maximizes the following score.

Grounding Model For an object o and a description with perceptual predicates p_1, ..., p_n, the score is score(o) = sum_i C(p_i) * d(p_i, o), where d(p_i, o) in {-1, 1} is the decision of the classifier for predicate p_i on object o, and C(p_i) in (0, 1) is that classifier's confidence, its F1 estimated by cross-validation. The score is summed over all perceptual predicates in the description.
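
Putting the pieces together, a minimal sketch of the scoring and guessing rule, reusing the illustrative predicates() and PredicateClassifier helpers above; this is a sketch of the idea, not the authors' exact code.

    def score(description, obj_feature, classifiers):
        # Confidence-weighted sum of classifier decisions for one object.
        total = 0.0
        for p in predicates(description):
            clf = classifiers.get(p)
            if clf is None:
                continue  # no classifier yet for a novel predicate
            total += clf.confidence * clf.decision(obj_feature)  # C(p_i) * d(p_i, o)
        return total

    def guess(description, active_test_set, classifiers):
        # active_test_set: dict mapping object id -> CNN feature vector
        return max(active_test_set,
                   key=lambda o: score(description, active_test_set[o], classifiers))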

Active Learning Agent starts with no classifiers. Labeled examples are acquired through questions and used to train the classifiers. Agent needs to learn a policy to balance active learning with task completion.

Modelling the Dialog as an MDP State: target description, active training and test objects, the agent's perceptual classifiers. Actions: label query, example query, guess. Reward: +100 for a correct guess, -100 for an incorrect guess, -1 per turn to shorten dialogs. We model the dialog as an MDP where the state consists of the target description, the active training and test objects, and the agent's perceptual classifiers. The possible actions are guessing, and the currently possible label and example queries, which pair any predicate the agent knows with any object in the active training set. We set up a reward function to maximize the number of correct guesses while keeping dialogs as short as possible.
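
For concreteness, a hedged sketch of the reward function implied by the slide; the function structure and argument names are illustrative, not the authors' implementation.

    CORRECT_GUESS_REWARD = 100.0
    INCORRECT_GUESS_REWARD = -100.0
    PER_TURN_PENALTY = -1.0

    def reward(action, guessed_object=None, target_object=None):
        if action == "guess":
            return (CORRECT_GUESS_REWARD if guessed_object == target_object
                    else INCORRECT_GUESS_REWARD)
        # Label and example queries only incur the per-turn penalty.
        return PER_TURN_PENALTY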

Challenges What information about classifiers should be represented? → Features based on active learning metrics. Variable number of queries and classifiers → Create features for state-action pairs. Large action space → Sample a beam of promising queries.

Feature Groups Query features - Active learning metrics used to determine whether a query is useful. Examples - Current estimated F1 of classifier Margin of object for classifier (for label query) We have two main groups of features. Query features are based on active learning metrics such as uncertainty sampling and density weighting, and are used to determine whether a query would be useful to the agent. Guess features use the predictions and confidences of classifiers to determine whether a guess would be correct.

Feature Groups Guess features - Features that use the predictions and confidences of classifiers to determine whether a guess will be correct. Examples - Highest score among regions in the active test set. Average estimated F1 of classifiers of concepts in description We have two main groups of features. Query features are based on active learning metrics such as uncertainty sampling and density weighting, and are used to determine whether a query would be useful to the agent. Guess features use the predictions and confidences of classifiers to determine whether a guess would be correct.
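
To make the two groups concrete, here is a toy sketch of a state-action feature vector for one candidate label query. It reuses the illustrative helpers above (predicates, score, PredicateClassifier), and the specific features chosen here are an assumption rather than the authors' exact set.

    import numpy as np

    def label_query_features(predicate, obj_feature, description,
                             active_test_set, classifiers):
        clf = classifiers.get(predicate)
        has_clf = clf is not None
        est_f1 = clf.confidence if has_clf else 0.0
        # Query feature: margin of the queried object from the SVM decision boundary.
        margin = (abs(clf.svm.decision_function(obj_feature[None, :])[0])
                  if has_clf else 0.0)
        # Guess features: how confident would an immediate guess be?
        scores = sorted((score(description, f, classifiers)
                         for f in active_test_set.values()), reverse=True)
        top = scores[0] if scores else 0.0
        second = scores[1] if len(scores) > 1 else 0.0
        return np.array([float(has_clf), est_f1, margin, top, top - second])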

Experiment Setup Policy learning using REINFORCE. Baseline - A hand-coded dialog policy that asks a fixed number of questions selected using the same sampling distribution. In our experiments we simulated dialogs using the Visual Genome dataset. We used the annotated region descriptions to provide target descriptions, and annotated objects and attributes to answer queries. We used the REINFORCE algorithm for policy learning, and compared to a baseline static policy that asks a fixed number of questions in every dialog. The questions asked were sampled from the same distribution as that used by the learned policy.
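
For reference, a minimal sketch of a REINFORCE update for a linear softmax policy over state-action features; the episode format, learning rate, and use of reward-to-go are assumptions for illustration, not the authors' exact training details.

    import numpy as np

    def softmax(x):
        z = x - x.max()
        e = np.exp(z)
        return e / e.sum()

    def reinforce_update(theta, episode, learning_rate=0.01):
        """episode: list of (candidate_feature_matrix, chosen_index, reward) per turn."""
        rewards = [r for _, _, r in episode]
        returns = np.cumsum(rewards[::-1])[::-1]  # reward-to-go from each turn
        for (features, chosen, _), G in zip(episode, returns):
            probs = softmax(features @ theta)
            # Gradient of log pi(chosen | state) for a softmax-over-features policy.
            grad = features[chosen] - probs @ features
            theta = theta + learning_rate * G * grad
        return theta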

Experiment Phases Initialization - Collect experience using the baseline to initialize the policy. Training - Improve the policy from on-policy experience. Testing - Policy weights are fixed, and we run a new set of interactions, starting with no classifiers, over an independent test set with different predicates. We initialized the policy by collecting experience from the baseline, and then trained it with on-policy experience. Each of these consisted of 10 batches of 100 dialogs each. Following this, we froze the policy weights, reset the system to start with no classifiers, and ran 10 batches of interactions on a test set having novel predicates in descriptions. We report results comparing the last batch of this interaction with a similar batch using the static policy. We also report results ablating the two major groups of features.

Results Systems evaluated on dialog success rate and average dialog length. We prefer high success rate and low dialog length (top left corner)

Results [Plot: success rate versus average dialog length for the static baseline]

Results The learned policy is more successful than the baseline, while also using shorter dialogs on average. [Plot: learned policy versus static baseline]

Results If we ablate either group of features, the success rate drops considerably, but dialogs are also much shorter. In both cases, the system chooses to ask very few queries. [Plot: learned policy, query-feature ablation, guess-feature ablation, static baseline]

Summary We can learn a dialog policy that acquires knowledge of predicates through opportunistic active learning. The learned policy is more successful at object retrieval than a static baseline, using fewer dialog turns on average. In summary, we demonstrate that we can learn a policy for an interactive object retrieval task involving opportunistic active learning. The learned policy is more successful at object retrieval than a static baseline, using fewer dialog turns on average.

Learning a Policy for Opportunistic Active Learning Aishwarya Padmakumar, Peter Stone, Raymond J. Mooney Department of Computer Science The University of Texas at Austin

Policy Representation

Query Features Does the concept have a classifier? Current estimated F1 of the classifier Fraction of previous dialogs in which the predicate has been used, and the agent’s success rate in these. Is the query opportunistic?

Query Features Margin of object Density of object Fraction of k nearest neighbours of the object which are unlabelled
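
A possible way these three metrics could be computed, sketched with scikit-learn; the precise definitions of density and the neighbour fraction are assumptions, not the authors' code.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def active_learning_metrics(obj_feature, all_features, labelled_mask, clf, k=10):
        # labelled_mask: boolean numpy array over the pool of objects.
        # Margin: distance of the object from the SVM decision boundary.
        margin = abs(clf.decision_function(obj_feature[None, :])[0])
        # Density: mean negative distance to all other objects in the pool.
        density = -np.linalg.norm(all_features - obj_feature, axis=1).mean()
        # Fraction of the k nearest neighbours that are still unlabelled.
        nn = NearestNeighbors(n_neighbors=k).fit(all_features)
        _, idx = nn.kneighbors(obj_feature[None, :])
        unlabelled_fraction = (~labelled_mask[idx[0]]).mean()
        return margin, density, unlabelled_fraction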

Guess Features Lowest, highest, second highest, and average estimated F1 among classifiers of concepts in the description. Highest score among regions in the active test set, and the differences between this and the second highest, and average scores respectively. An indicator of whether the two most confident classifiers agree on the decision of the top scoring region.

Sampling distribution