Autonomous Skill Acquisition on a Mobile Manipulator. Hauptseminar: Topics in Robotics, 12.06.2013. Presenter: Jonah Vincke. Paper authors: George Konidaris, Scott Kuindersma (MIT CSAIL); Roderic Grupen (Laboratory for Perceptual Robotics, University of Massachusetts Amherst); Andrew Barto (Autonomous Learning Laboratory, University of Massachusetts Amherst)

Table of Contents ● Motivation ● uBot-5 ● Basics / Theories ● The Experiment ● Results ● Conclusion

Motivation ● In robotics, machine learning is a core research area ● In many cases, autonomous learning is necessary or desired. Figure 1: Mars robot; Figure 2: Household robot [F-1]

Idea ● The robot knows basic (low-level) actions ● The robot learns combinations of basic actions (called "skills") by solving a first task ● For similar tasks, the robot can reuse the learned skills ● The goal of the research is to determine whether this idea is feasible, not to develop a system for a specific purpose

uBot-5 ● The whole robot was developed at UMass Amherst ● Two arms terminated by balls (only basic manipulations are possible) ● Two cameras are used to identify objects using the ARToolkit system ● Navigation to an object and balancing had already been developed. Figure 3: The uBot-5 [1]

Configuration Skill Tree (CST) ● Scene: the robot moves through a door to a key, picks it up, and takes it to a lock ● The distances to each of the objects are used to segment the whole trajectory into single "skills" ● Segmenting the trajectory requires multiple changepoint detection → MAP inference [4] ● The sum of the rewards is returned. Figure 4: Example of a CST [1]
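To give a feel for the segmentation step, here is a heavily simplified Python sketch. It is not the online MAP changepoint detection of [4] used by the real CST; it only illustrates the idea of splitting a demonstration into segments when a simple model of the current segment stops fitting. The signal, the threshold, and the helper name are invented for illustration.

```python
import numpy as np

def segment_trajectory(values, threshold=2.0):
    """Heavily simplified changepoint sketch: start a new segment whenever a
    linear fit to the current segment explains the data poorly.
    (The real CST uses online MAP changepoint detection [4]; this only
    illustrates the segmentation idea.)"""
    changepoints, start = [0], 0
    for t in range(2, len(values)):
        x = np.arange(start, t + 1)
        y = values[start:t + 1]
        coeffs = np.polyfit(x, y, 1)                      # fit a line to the segment so far
        residual = np.abs(np.polyval(coeffs, x) - y).max()
        if residual > threshold:                          # fit broke down: new changepoint
            changepoints.append(t)
            start = t
    return changepoints

# Toy "distance to target" signal with an abrupt change of slope.
demo = np.concatenate([np.linspace(5, 0, 50), np.linspace(0, 4, 30)])
print(segment_trajectory(demo))
```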

Markov Decision Process (MDP) ● The environment/system is described as a Markov decision process (MDP) (S, A, P, R): S = set of states, A = set of actions, P = probability of reaching the next state, R = expected reward ● This is extended by the Hierarchical Reinforcement Learning (HRL) method ● Reinforcement learning: methods in which an agent learns which sequence of actions to take; they are often used to determine which action of an MDP should be taken next
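As a concrete illustration of an MDP (S, A, P, R) and how a value function over it can be computed, below is a minimal value-iteration sketch in Python. The tiny toy MDP (states, transitions, rewards) is invented for illustration and is not the task MDP from the paper.

```python
# Minimal value iteration over a tabular MDP (S, A, P, R).
# The toy MDP below is invented for illustration only.

S = ["start", "at_button", "done"]            # states
A = ["move", "press"]                         # actions
P = {                                         # P[(s, a)] -> list of (next_state, prob)
    ("start", "move"): [("at_button", 1.0)],
    ("start", "press"): [("start", 1.0)],
    ("at_button", "move"): [("start", 1.0)],
    ("at_button", "press"): [("done", 1.0)],
    ("done", "move"): [("done", 1.0)],
    ("done", "press"): [("done", 1.0)],
}
R = {("at_button", "press"): 10.0}            # expected reward, default 0
gamma = 0.95                                  # discount factor

V = {s: 0.0 for s in S}
for _ in range(100):                          # value-iteration sweeps
    V = {
        s: max(
            R.get((s, a), 0.0)
            + gamma * sum(p * V[s2] for s2, p in P[(s, a)])
            for a in A
        )
        for s in S
    }

print({s: round(v, 2) for s, v in V.items()})
```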

Red Room Task 1 ● The robot has to solve the task: 1. Press button 2. Pull handle → door opens 3. Press switch ● A state is described as s = (r, p, h): ● r = state of the room (4 bits: the button, the handle, the door, and the switch) ● p = 5 possible robot positions (start, in front of each of the 3 objects, through the door) ● h = 7 possible positions of the hand. Figure 5: Red Room Task 1 [1]
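As a rough illustration of how compact this discrete state space is, the sketch below simply enumerates it. The factor sizes (16 room configurations, 5 robot positions, 7 hand positions) follow the slide; the field names are my own.

```python
from itertools import product
from collections import namedtuple

# State s = (r, p, h) as described above; names are illustrative only.
RedRoomState = namedtuple("RedRoomState", ["room_bits", "position", "hand"])

room_configs = range(16)    # 4 bits: button, handle, door, switch
positions = range(5)        # start, in front of each of the 3 objects, through the door
hand_positions = range(7)   # 7 discrete hand positions

states = [RedRoomState(r, p, h)
          for r, p, h in product(room_configs, positions, hand_positions)]
print(len(states))          # 16 * 5 * 7 = 560 discrete states
```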

Hierarchical Reinforcement Learning (HRL) + Options ● In HRL, the actions are extended to options ● Each option consists of: ● its own option policy → a combination of actions ● an initiation set → 1 for all states in which it can be used (else 0) ● a termination condition ● Additionally, an option can define an abstraction, reducing the state space and the action space ● In this case the abstractions were defined as pairs: ● (Body, Target): distance to the object, distance to the wall it is mounted on, and the angle to the normal of that wall ● (Hand, Target): distance from the hand to the object
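The option components listed above can be captured in a small data structure. The sketch below is a generic Python illustration; the class and field names (Option, policy, initiation_set, termination, abstraction) are my own and not from the paper or any library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Set

State = Any
Action = Any

@dataclass
class Option:
    """An option in the HRL sense: option policy, initiation set, termination."""
    policy: Callable[[State], Action]        # option policy: state -> action
    initiation_set: Set[State]               # states in which the option may be invoked
    termination: Callable[[State], float]    # probability of terminating in a state
    abstraction: Optional[Any] = None        # optional state/action abstraction

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set

    def act(self, s: State) -> Action:
        return self.policy(s)
```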

Hierarchical Reinforcement Learning (HRL) + Options (2) ● If a new option is to be created, all of its parts have to be set: ● exactly one abstraction from the library ● policy: the segmented actions ● initiation set: all states in the segment ● termination condition: the initiation set of the segment that succeeds it ● Reward estimate if unknown (measured later): acquired skill → 0, basic action → 3 hours ● → the robot first tries to use acquired skills
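Putting the previous pieces together, the following sketch shows how a segment produced by the CST might be turned into an Option as defined in the earlier sketch. The segment fields, the helper name option_from_segment, and the abstraction library are illustrative assumptions; only the default reward estimates (0 for an acquired skill, 3 hours for a basic action) follow the slide.

```python
# Sketch only: building an Option (see the earlier Option class) from one
# CST segment. Field names and the abstraction library are assumptions.

HOURS = 3600.0  # seconds

def option_from_segment(segment, abstraction_library):
    """segment: dict with keys 'states', 'actions', 'abstraction', 'next_initiation'."""
    chosen_abstraction = abstraction_library[segment["abstraction"]]   # exactly one abstraction
    segment_policy = dict(zip(segment["states"], segment["actions"]))  # segmented actions

    return Option(
        policy=lambda s: segment_policy[s],                 # replay the segmented actions
        initiation_set=set(segment["states"]),              # all states in the segment
        termination=lambda s: 1.0 if s in segment["next_initiation"] else 0.0,
        abstraction=chosen_abstraction,
    )

# Default reward (cost) estimates until the real execution times are measured:
# with these values the robot prefers acquired skills over basic actions.
default_cost = {"acquired_skill": 0.0, "basic_action": 3 * HOURS}
```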

Experiment ● The robot solved the task 7 times ● The best solutions were used to generate 5 demonstration trajectories for the CST ● The CST segmented each of them into the same sequence of 10 skills (see Fig. 6) ● Then the skills were merged together ● Moving skills were discarded. Figure 6: example of a segmented demonstration trajectory [1]

Red Room Task 2 ● The robot has to solve a similar task, with and without the acquired skills: 1. Press switch 2. Press first button → door opens 3. Pull handle → door closes 4. Press second button ● The completion time is measured. Figure 7: Red Room Task 2 [1]

Results ● Using the acquired skills halves the mean completion time ● The completion times of the two conditions did not overlap. Figure 8: Results of the second task with and without acquired skills [1]

Conclusion ● By solving one problem, skills can be extracted that improve the performance on another, similar problem ● The MDP has to be defined beforehand ● For a good segmentation, an abstraction library is needed ● How a new option is set up has to be defined → the skills that can be created are limited ● It is foreseeable that, in addition to autonomous skill acquisition, skill management will become important

Thanks for your attention! Figure 9: Clapping machine arms [F-2]

References
[1] Barto, A., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems 13:41–77.
[2] Konidaris, G.; Kuindersma, S.; Barto, A.; and Grupen, R. 2010. Constructing skill trees for reinforcement learning agents from demonstration trajectories. In Lafferty, J.; Williams, C.; Shawe-Taylor, J.; Zemel, R.; and Culotta, A., eds., Advances in Neural Information Processing Systems 23, 1162–1170.
[3] Konidaris, G.; Kuindersma, S.; Grupen, R.; and Barto, A. 2011. Autonomous Skill Acquisition on a Mobile Manipulator. In Proceedings of the AAAI Conference on Artificial Intelligence.
[4] Fearnhead, P., and Liu, Z. 2007. On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society B 69:589–605.
Videos can be found under:

Picture References
[F-1] © Drew Bell (license: nd/2.0/)
[F-2] © Ars Electronica (license: nd/2.0/)