Confidence-Based Autonomy: Policy Learning by Demonstration
Manuela M. Veloso (thanks to Sonia Chernova)
Computer Science Department, Carnegie Mellon University
Grad AI – Spring 2013

Task Representation
- Robot state: s = (f1, f2, …, fn), computed from sensor data
- Robot actions: A = {a1, …, ak}
- Training dataset: D = {(s1, a1), …, (sm, am)}
- Policy as classifier C : S → A (e.g., Gaussian Mixture Model, Support Vector Machine)
  - a_p = C(s) — policy action
  - decision boundary with greatest confidence for the query
  - c — classification confidence w.r.t. that decision boundary
[figure: feature space with axes f1 and f2, showing a query state s relative to the decision boundaries]
(A classifier sketch follows.)
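As an illustration only (not the authors' implementation), a policy-as-classifier can be sketched with scikit-learn. The synthetic `states`/`actions` data and the choice of an SVM with probability outputs are assumptions; any classifier exposing class-membership confidences would do:

```python
# Sketch: learning a policy as a classifier over demonstrated (state, action) pairs.
import numpy as np
from sklearn.svm import SVC

# Hypothetical demonstration data: each row of `states` is a feature vector
# s = (f1, f2); `actions` holds the demonstrated action labels.
rng = np.random.default_rng(0)
states = rng.uniform(size=(40, 2))
actions = np.where(states[:, 0] > states[:, 1], "stay", "merge_left")

policy = SVC(probability=True).fit(states, actions)

def policy_action(s):
    """Return (predicted action a_p, classification confidence c) for query s."""
    probs = policy.predict_proba([s])[0]     # confidence per action class
    best = int(np.argmax(probs))
    return policy.classes_[best], probs[best]

a_p, c = policy_action([0.15, 0.85])
```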

Confidence-Based Autonomy: Assumptions
- Teacher understands and can demonstrate the task
- High-level task learning
  - Discrete actions
  - Non-negligible action duration
- State space contains all information necessary to learn the task policy
- Robot is able to stop to request a demonstration
  - … however, the environment may continue to change

Confident Execution
[flowchart: over timesteps, states s1, s2, s3, s4, …, si arrive; for the current state si the robot asks "Request demonstration?" — No: execute policy action a_p; Yes: request demonstration a_d from the teacher, add training point (si, a_d), relearn the classifier, then execute action a_d]
(A code sketch of this loop follows.)
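A minimal sketch of the loop above, assuming abstract callbacks `get_state`, `teacher_demo`, `execute`, and `relearn`, and a `should_request_demo` test that the following slides fill in:

```python
def confident_execution(get_state, teacher_demo, execute, relearn, dataset):
    """Confident Execution loop, following the flowchart above."""
    while True:
        s_i = get_state()                    # current state
        if should_request_demo(s_i):         # defined on the next slides
            a_d = teacher_demo(s_i)          # request demonstration
            dataset.append((s_i, a_d))       # add training point (s_i, a_d)
            relearn(dataset)                 # relearn classifier
            execute(a_d)
        else:
            a_p, _ = policy_action(s_i)      # confident: act autonomously
            execute(a_p)
```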

Demonstration Selection
When should the robot request a demonstration?
- To obtain useful training data
- To restrict autonomy in areas of uncertainty

Fixed Confidence Threshold
Why not apply a fixed classification confidence threshold?
- Example: τ_conf = 0.5
- Simple
- But how to select a good threshold value?
[figure: example query states s showing that a single fixed threshold fits some regions of the state space poorly]

Confident Execution: Demonstration Selection
- Distance parameter τ_dist
  - Used to identify outliers and unexplored regions of the state space
- Set of confidence parameters τ_conf
  - Used to identify ambiguous state regions in which more than one action is applicable

Confident Execution: Distance Parameter
Distance parameter τ_dist.
Given training dataset D, define the distance from a query to its nearest training point:
  d(s) = min over (si, ai) ∈ D of ||s − si||
Given a state query s, request a demonstration if d(s) > τ_dist.
(A code sketch of this criterion follows.)
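One plausible reading of this criterion in code, assuming Euclidean distance; the helper name is hypothetical:

```python
import numpy as np

def distance_triggers_demo(s, training_states, tau_dist):
    """True if s is farther than tau_dist from every training point,
    i.e. s lies in an outlier / unexplored region of the state space."""
    d = np.min(np.linalg.norm(np.asarray(training_states) - np.asarray(s), axis=1))
    return d > tau_dist
```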

Confident Execution: Confidence Parameters
Set of confidence parameters τ_conf — one for each decision boundary.
Given training dataset D and classifier C, each state query s receives a policy action and a classification confidence c with respect to its decision boundary.
Given a state query s, request a demonstration if c falls below the threshold τ_conf for that boundary.
(A simplified code sketch follows.)
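A simplified sketch: the original keeps one threshold per decision boundary, while this version keeps one per predicted action class; `tau_conf` is an assumed dict of thresholds, set elsewhere (e.g., from confidences of misclassified training points):

```python
def confidence_triggers_demo(s, tau_conf):
    """True if the classifier's confidence for state s falls below the
    threshold associated with the predicted action's region (one threshold
    per decision boundary in the original; per class here for simplicity)."""
    a_p, c = policy_action(s)
    return c < tau_conf[a_p]
```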

Confident Execution (combined criteria)
[flowchart: as before, but for the current state si the robot requests a demonstration if the distance criterion OR the confidence criterion fires — No: execute policy action a_p; Yes: request demonstration a_d, add training point (si, a_d), relearn the classifier, execute a_d]
(The combined test is sketched below.)
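The two criteria combine with a logical OR, completing the `should_request_demo` placeholder used in the earlier loop sketch; the threshold values below are illustrative, not from the paper, and `states` is the training data from the first sketch:

```python
tau_dist = 0.2                                     # illustrative values only
tau_conf = {"stay": 0.8, "merge_left": 0.8}

def should_request_demo(s):
    """Request a demonstration if either selection criterion fires."""
    return (distance_triggers_demo(s, states, tau_dist)
            or confidence_triggers_demo(s, tau_conf))
```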

Confidence-Based Autonomy = Confident Execution + Corrective Demonstration
[flowchart: as in Confident Execution, for state si the robot either requests a demonstration a_d or executes the policy action a_p; in addition, after an autonomous action the teacher may supply a corrective demonstration a_c, in which case training point (si, a_c) is added and the classifier relearned]
(A sketch of this extended step follows.)
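A sketch of one Confidence-Based Autonomy timestep under the same assumptions as before, with a hypothetical `teacher_correct` callback that returns a corrective action or None if the teacher agrees:

```python
def cba_step(s_i, dataset, teacher_demo, teacher_correct, execute, relearn):
    """One timestep of Confidence-Based Autonomy."""
    if should_request_demo(s_i):
        a_d = teacher_demo(s_i)              # Confident Execution branch
        dataset.append((s_i, a_d))
        relearn(dataset)
        execute(a_d)
    else:
        a_p, _ = policy_action(s_i)
        execute(a_p)
        a_c = teacher_correct(s_i, a_p)      # None if the teacher agrees
        if a_c is not None:                  # Corrective Demonstration branch
            dataset.append((s_i, a_c))
            relearn(dataset)
```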

Evaluation in Driving Domain
Introduced by Abbeel and Ng, 2004.
Task: teach the agent to drive on the highway
- Fixed driving speed
- Pass slower cars and avoid collisions
State: current lane, nearest car in lane 1, nearest car in lane 2, nearest car in lane 3
Actions: merge left, merge right, stay in lane
(An illustrative encoding is sketched below.)
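An illustrative encoding of this state and action space; the field and action names follow the slide, but the classes themselves are assumptions:

```python
from dataclasses import dataclass, astuple
from enum import Enum

class Action(Enum):
    MERGE_LEFT = "merge left"
    MERGE_RIGHT = "merge right"
    STAY_IN_LANE = "stay in lane"

@dataclass
class DrivingState:
    current_lane: int
    nearest_car_lane_1: float    # distance to nearest car in lane 1
    nearest_car_lane_2: float
    nearest_car_lane_3: float

    def features(self):
        """Feature vector s = (f1, ..., f4) fed to the policy classifier."""
        return astuple(self)
```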

Evaluation in Driving Domain: Demonstration Selection

Demonstration Selection Method              | # Demonstrations | Collision Timesteps
"Teacher knows best"                        | —                | —%
Confident Execution, fixed τ_conf           | —                | —%
Confident Execution, τ_dist & mult. τ_conf  | —                | —%
CBA                                         | 703              | 0%

CBA Final Policy: [demonstration figure/video]

Demonstrations Over Time
[figure: number of demonstrations accumulated over time — total demonstrations, Confident Execution requests, and Corrective Demonstrations]

Summary
Confidence-Based Autonomy algorithm:
- Confident Execution demonstration selection
- Corrective Demonstration

What did we do today?
(PO)MDPs: need to generate a good policy
- Assumes the agent has some method for estimating its state (given the current belief state, action, and observation: where do I think I am now?)
- How do we estimate this?
  - Discrete latent states → HMMs (the simplest DBNs)
  - Continuous latent states, observations drawn from a Gaussian, linear dynamical system → Kalman filters
    - (Assumptions relaxed by the Extended Kalman Filter, etc.)
  - Not analytic → particle filters
    - Take weighted samples ("particles") of an underlying distribution
We have mainly looked at policies for discrete state spaces.
For continuous state spaces, we can use LfD:
- ML gives us a good-guess action based on past actions
- If we're not confident enough, ask for help!
(A minimal Kalman filter step is sketched below.)
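As a reminder of the Kalman filter mentioned in the recap, here is a minimal predict/update step for the linear-Gaussian model; these are the standard equations, and the code is illustrative:

```python
import numpy as np

def kalman_step(mu, Sigma, z, A, C, Q, R):
    """One predict/update cycle for x' = A x + w, z = C x + v,
    with process noise w ~ N(0, Q) and observation noise v ~ N(0, R)."""
    # Predict: propagate the Gaussian belief through the linear dynamics.
    mu_bar = A @ mu
    Sigma_bar = A @ Sigma @ A.T + Q
    # Update: correct the prediction with observation z via the Kalman gain.
    K = Sigma_bar @ C.T @ np.linalg.inv(C @ Sigma_bar @ C.T + R)
    mu_new = mu_bar + K @ (z - C @ mu_bar)
    Sigma_new = (np.eye(len(mu_new)) - K @ C) @ Sigma_bar
    return mu_new, Sigma_new
```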