Published by Lesley Beeson. Modified over 10 years ago.
1
Beyond Geometric Path Planning: When Context Matters
Ashesh Jain, Shikhar Sharma, Thorsten Joachims and Ashutosh Saxena
2
Outline
Motivation
Approach
– Context-based score
– Feedback mechanism
– Learning algorithm
Results
Jain, Sharma, Joachims, Saxena
3
Structured to Unstructured Environments
Kuka arm, Kiva, Beam, Baxter, PR2, Robot Nurse [Images from Google]
4
Path Planning
High-DoF manipulators
Continuous, high-dimensional space
Obstacles
[Figure: 7-DoF arm with base, joint, link and end-effector labeled; points A and B]
Geometric criteria:
– Collision free
– Shortest path
– Least time
– Minimum energy
Planners: PRM (Kavraki et al.), RRT (LaValle et al.), CHOMP (Ratliff et al.), RRT* (Karaman et al.), TrajOpt (Schulman et al.)
5
Context-Rich Environment
6
Context-Rich Environment
Video [14 sec to 18 sec]: http://www.youtube.com/watch?v=uLktpkd7ojA
7
What went wrong?
The robot does not model the context
It does not understand the user's preferences
8
Does Existing Work Address This?
Inverse Reinforcement Learning (Kober and Peters 2011, Abbeel et al. 2010, Ziebart et al. 2008, Ratliff et al. 2006)
Context is not important; focuses on a specific trajectory
– Modeling human navigation patterns: Kitani et al., ECCV 2012
Optimal demonstrations: requires an expert
Abbeel et al., Ratliff et al., Kober et al.
9
Our Goal
Model context
Generate multiple trajectories for a task
User preferences
Learn from non-experts
10
Outline
Motivation
Approach
– Context-based score
– Feedback mechanism
– Learning algorithm
Results
11
Learning Setting
User, Robot
1. Online learning system
2. Learns from user feedback
3. Sub-optimal feedback
Goal: Learn user preferences
12
Outline
Motivation
Approach
– Context-based score
– Feedback mechanism
– Learning algorithm
Results
13
Example of Preferences
Move a glass of water
[Figure labels: Upright, Context, Contorted arm]
Preferences vary with users, tasks and environments
14
Score function
Context, Trajectory
Object-object interactions
– Trajectory graph: connecting waypoints to neighboring objects
Robot configuration and environment interactions
15
Score function
Object-object interactions
Trajectory graph
Object attributes: {electronic, fragile, sharp, liquid, hot, …}
E.g. Laptop: {electronic, fragile}, Knife: {sharp}, …
Distance features
Hermans et al., ICRA workshop 2011; Koppula et al., NIPS 2011
16
Score function
Object-object interactions
Robot configuration and environment interactions
[Figure labels: Bad, Good]
Features
1. Spectrogram
2. Object's distance from horizontal and vertical surfaces
3. Object's angle with the vertical axis
4. Robot's wrist and elbow configuration in cylindrical coordinates
Cakmak et al., IROS 2011
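The slides name two parts of the score (object-object interactions and robot configuration/environment interactions) but the formula itself did not survive extraction. A minimal sketch, assuming the score is a weighted sum of the two feature groups and that object-object features accumulate closeness between waypoints and nearby objects per attribute pair. The function names, the `radius` cutoff and the closeness formula are illustrative assumptions, not the authors' implementation.

```python
from math import dist

# Attribute vocabulary taken from the slide's example list.
ATTRIBUTES = ["electronic", "fragile", "sharp", "liquid", "hot"]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def object_object_features(waypoints, carried, neighbors, radius=0.5):
    """One feature per (carried attribute, neighbor attribute) pair.

    An edge of the trajectory graph is assumed to exist whenever a waypoint
    lies within `radius` of an object; each edge adds a closeness value.
    """
    n = len(ATTRIBUTES)
    phi = [0.0] * (n * n)
    for p in waypoints:
        for position, attrs in neighbors:
            d = dist(p, position)
            if d >= radius:
                continue  # too far away: no edge in the graph
            closeness = 1.0 - d / radius
            for a in carried:
                for b in attrs:
                    phi[ATTRIBUTES.index(a) * n + ATTRIBUTES.index(b)] += closeness
    return phi

def score(w_obj, phi_obj, w_env, phi_env):
    """Assumed linear score: object-object term plus robot/environment term."""
    return dot(w_obj, phi_obj) + dot(w_env, phi_env)

# Carrying a glass of water (liquid) past a laptop (electronic, fragile).
laptop = ((0.5, 0.0, 0.0), {"electronic", "fragile"})
near = [(0.0, 0.0, 0.1), (0.5, 0.0, 0.1), (1.0, 0.0, 0.1)]
far = [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (1.0, 0.0, 1.0)]

# Hypothetical weights: penalize liquid passing close to electronics.
w_obj = [0.0] * 25
w_obj[ATTRIBUTES.index("liquid") * 5 + ATTRIBUTES.index("electronic")] = -1.0

phi_near = object_object_features(near, {"liquid"}, [laptop])
phi_far = object_object_features(far, {"liquid"}, [laptop])
near_score = score(w_obj, phi_near, [], [])
far_score = score(w_obj, phi_far, [], [])
print(near_score, far_score)
```

With these made-up weights the trajectory that keeps the water away from the laptop scores higher, which is the kind of preference the slides describe.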
17
Outline
Motivation
Approach
– Context-based score
– Feedback mechanism
– Learning algorithm
Results
18
User Feedback
Intuitive feedback mechanisms
– Re-rank
– Interactive
– Zero-G
19
1. Re-rank
Robot ranks trajectories and the user selects one
[Figure labels: Top three trajectories; User feedback; User observing top three trajectories]
20
2. Zero-G
User corrects trajectory waypoints
Bad waypoint in red
Holding the wrist activates zero-G mode
22
3. Interactive
Not all robots support zero-G feedback
24
Outline
Motivation
Approach
– Context-based score
– Feedback mechanism
– Learning algorithm
Results
25
Coactive Learning
User, Robot
Goal: Learn user preferences
Learn from sub-optimal feedback
Shivaswamy & Joachims, ICML 2012
26
Trajectory Preference Perceptron
Regret bound
Shivaswamy & Joachims, ICML 2012
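The algorithm box on this slide was lost in extraction. As a rough illustration, the preference perceptron from the cited coactive learning work updates the weights by adding the features of the user's preferred trajectory and subtracting those of the trajectory the robot proposed. The sketch below simulates that loop with re-rank feedback; the candidate features, the hidden user weights and the helper names are invented for the example. The cited paper gives an average-regret bound that shrinks roughly as 1/√T under its feedback assumptions; the exact expression from the slide is not reproduced here.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank(w, candidates):
    """Indices of candidate trajectories sorted by score under w, best first."""
    return sorted(range(len(candidates)), key=lambda i: -dot(w, candidates[i]))

def perceptron_update(w, phi_shown, phi_feedback):
    """w <- w + phi(feedback) - phi(shown)."""
    return [wi + f - s for wi, f, s in zip(w, phi_feedback, phi_shown)]

def simulate(candidates, w_user, rounds=5, top_k=3):
    """Re-rank feedback: the user picks their favorite among the robot's top_k."""
    w = [0.0] * len(w_user)
    for _ in range(rounds):
        top = rank(w, candidates)[:top_k]
        shown = top[0]  # the robot's current best guess
        # Simulated user: chooses by a hidden utility the robot never sees.
        chosen = max(top, key=lambda i: dot(w_user, candidates[i]))
        w = perceptron_update(w, candidates[shown], candidates[chosen])
    return w

# Hypothetical 2-D trajectory features and hidden user preference.
CANDIDATES = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7), (0.2, 0.1)]
W_USER = (0.2, 1.0)

w_learned = simulate(CANDIDATES, W_USER)
best_for_robot = rank(w_learned, CANDIDATES)[0]
print(w_learned, best_for_robot)
```

The feedback only has to be better than what the robot showed, not optimal: once the robot's top choice matches the user's pick, the update is zero and the weights stop changing.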
27
Outline
Motivation
Approach
– Context-based score
– Feedback mechanism
– Learning algorithm
Results
28
Experimental Setup
Two robots: Baxter and PR2
35 tasks in household setting
– 2100 expert-labeled trajectories
16 tasks in grocery store checkout setting
– 1300 expert-labeled trajectories
14 objects
– Bowl, knife, laptop, metal box, fruits, egg cartons, etc.
7 users
29
Experimental Setting 1
Household environment on PR2
[Figure labels: Pouring; Cleaning the table; Setting up the table]
35 tasks
Variation in objects and environment
Experts labeled 2100 trajectories on a scale of 1 to 5
30
Experimental Setting 2
Grocery store checkout on Baxter
[Figure labels: Cereal box; Egg carton; Knife in human vicinity]
16 tasks
Variations in objects and their placement
Experts labeled 1300 trajectories on a scale of 1 to 5
31
Generalization
Household setting: testing on a new environment
[Plot: nDCG@3 vs. #Feedback for Ours w/o pre-training, Ours pre-trained, SVM-rank, MMP-online]
Higher nDCG w/o feedback
SVM-rank is trained on expert labels
MMP-online is an IRL technique
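The plot reports nDCG@3, which measures how well the top three ranked trajectories agree with the expert labels. A minimal sketch of the metric, assuming the common linear-gain, log2-discount form (other variants use an exponential gain, and the slides do not say which one was used):

```python
from math import log2

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list."""
    return sum(rel / log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=3):
    """DCG of the given ranking divided by the DCG of the ideal ordering.

    `relevances` holds the expert scores (e.g. 1 to 5) of the trajectories
    in the order the robot ranked them.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical expert labels for five trajectories, in ranked order.
perfect_ranking = [5, 4, 3, 1, 2]
poor_ranking = [1, 2, 5, 4, 3]
print(ndcg_at_k(perfect_ranking), ndcg_at_k(poor_ranking))
```

Only the first three positions matter at k = 3, so a ranking that places the three best-labeled trajectories on top in the right order scores 1 regardless of how the rest are ordered.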
32
User Study
10 tasks per user
– 7 users
– 7 hours of robot interaction in total
– Users interact until satisfied
33
User Study
Grocery setting on Baxter
[Plot: #Feedback (Re-rank, Zero-G) and Time (min) vs. Task No., in order of increasing difficulty]
Re-rank is popular for easier tasks
Zero-G use increases for hard tasks
34
User Study
User | # Re-rank | # Zero-G | Time (min) | Self Score | Cross Score
1 | 5.4 | 3.3 | 7.8 | 3.8 | 4.0
2 | 1.8 | 1.7 | 4.6 | 4.3 | 3.6
3 | 2.9 | 2.0 | 5.0 | 4.4 | 3.2
4 |  | 1.5 | 5.3 | 3.0 | 3.7
5 | 3.6 | 1.9 | 5.0 | 3.5 | 3.3
6 | 3.1 | 2.4 | - | 3.5 | 3.6
7 | 2.3 | 1.8 | - | 4.1 |
Avg. | 3.2 (1.1) | 2.1 (0.6) | 5.5 (1.3) | 3.8 (0.5) | 3.6 (0.3)
35
User Study (table as on the previous slide)
5 feedback: 3 re-rank, 2 zero-G
36
User Study (table as on the previous slide)
5 to 6 min. per task
37
User Study (table as on the previous slide)
Similar preferences
38
Robot Demonstration
Video [full video]: http://www.youtube.com/watch?v=uLktpkd7ojA
39
Conclusion
Challenges of unstructured environments
Geometric approaches are not enough
Modeling context is crucial
Learning from users and not experts
40
Thank You
For more details visit http://pr.cs.cornell.edu/coactive