Autonomous Skill Acquisition on a Mobile Manipulator
Hauptseminar: Topics in Robotics
Jonah Vincke
George Konidaris, MIT CSAIL
Scott Kuindersma, Roderic Grupen, Laboratory for Perceptual Robotics, University of Massachusetts Amherst
Andrew Barto, Autonomous Learning Laboratory, University of Massachusetts Amherst
Table of Contents ● Motivation ● uBot-5 ● Basics / Theory ● The Experiment ● Results ● Conclusion
Motivation ● In robotics, machine learning is a core research area ● In many cases, autonomous learning is necessary or desirable: Figure 2: Household Robot [F-1] Figure 1: Mars Robot
Idea ● The robot knows basic (low-level) actions ● The robot learns combinations of basic actions (called “skills”) by solving a first task ● For similar tasks, the robot can reuse the learned skills ● The goal of the research is to determine whether the idea is feasible, not to develop a special-purpose system
uBot-5 ● The entire robot was developed at UMass ● Two arms ending in balls (only basic manipulation) ● Two cameras are used to identify objects with the ARToolKit system ● Navigation to an object and balancing were already developed Figure 3: The uBot-5 [1]
Constructing Skill Trees (CST) ● Scenario: the robot moves through a door to a key, picks it up, and carries it to a lock ● The distances to each of the objects are used to segment the whole trajectory into individual “skills” ● Segmentation requires multiple changepoint detection → maximum a posteriori (MAP) changepoint detection [4] ● The return (sum of rewards) is the quantity modeled within each segment Figure 4: Example of CST [1]
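To make the idea of splitting one trajectory into segments concrete, here is a deliberately simplified, offline sketch. It is NOT the online MAP algorithm of Fearnhead and Liu [4] used by CST; it swaps in penalized optimal partitioning: each segment is fit by a straight line, and every extra segment costs a fixed `penalty`. The names `segment_cost`, `optimal_partition`, `penalty`, `min_len` and the example signal are hypothetical illustrations.

```python
import numpy as np

def segment_cost(t, y):
    """Squared error of the best straight-line fit y ~ a*t + b on one segment."""
    if len(y) < 2:
        return 0.0
    A = np.column_stack([t, np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def optimal_partition(y, penalty, min_len=2):
    """Return changepoint indices minimising total fit error + penalty per segment."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)
    best = np.full(n + 1, np.inf)      # best[j]: optimal cost of segmenting y[:j]
    best[0] = 0.0
    prev = np.zeros(n + 1, dtype=int)  # prev[j]: start index of the last segment of y[:j]
    for j in range(min_len, n + 1):
        for i in range(0, j - min_len + 1):
            if np.isinf(best[i]):
                continue
            c = best[i] + segment_cost(t[i:j], y[i:j]) + penalty
            if c < best[j]:
                best[j], prev[j] = c, i
    cps, j = [], n
    while j > 0:                       # walk back through the segment start indices
        j = int(prev[j])
        if j > 0:
            cps.append(j)
    return sorted(cps)

# Hypothetical 1-D signal whose behaviour changes at index 6.
signal = [0, 1, 2, 3, 4, 5, 10, 10, 10, 10, 10, 10]
print(optimal_partition(signal, penalty=1.0))  # [6]
```

The penalty plays the role that the prior over segment lengths plays in a MAP formulation: without it, splitting into ever smaller segments would always reduce the fit error.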
Markov Decision Process (MDP) ● The environment / system is described as a Markov decision process (MDP) (S, A, P, R): – S := set of states – A := set of actions – P := transition probabilities (probability of reaching the next state, given the current state and action) – R := expected reward ● Extended by hierarchical reinforcement learning (HRL) ● Reinforcement learning: – methods in which an agent learns, from rewards, which sequence of actions to take – often used to decide which action of an MDP should be taken next
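A tiny MDP makes the tuple (S, A, P, R) concrete. The sketch below, assuming a hypothetical 3-state chain with actions “left” and “right”, stores P and R as arrays and picks the next action greedily. It uses value iteration, a planning method that assumes P and R are known; reinforcement learning methods such as Q-learning estimate the same values from experience instead.

```python
import numpy as np

# Hypothetical 3-state chain: states 0, 1, 2 (state 2 = goal, absorbing).
# Actions: 0 = left, 1 = right. Transitions are deterministic.
n_states, n_actions = 3, 2
P = np.zeros((n_actions, n_states, n_states))  # P[a, s, s'] = Pr(s' | s, a)
R = np.zeros((n_actions, n_states))            # R[a, s] = expected reward
for s in range(n_states):
    if s == 2:
        P[:, s, s] = 1.0                       # goal is absorbing, no further reward
        continue
    P[0, s, max(s - 1, 0)] = 1.0
    P[1, s, s + 1] = 1.0
R[1, 1] = 1.0                                  # moving right from state 1 reaches the goal

def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Return optimal state values V and the greedy action for every state."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * P @ V                  # Q[a, s], shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

V, policy = value_iteration(P, R)
print(V, policy)  # greedy action is "right" (1) in states 0 and 1
```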
Red Room Task 1 ● The robot has to solve the following task: 1. Press the button 2. Pull the handle → door opens 3. Press the switch ● A state is described by: ● r = state of the room (4 bits: button, handle, door, and switch) ● p = 5 possible positions (start, in front of each of the 3 objects, through the door) ● h = 7 possible hand positions Figure 5: Red Room Task 1 [1]
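The state description above is factored, so the size of the state space is the product of the factors: 2^4 · 5 · 7 = 560. The sketch below enumerates it, assuming every combination is allowed (in the real task some combinations may be unreachable, so 560 is an upper bound). The position labels and variable names are hypothetical.

```python
from itertools import product

# Factored state from the slide: r (4 room bits), p (5 positions), h (7 hand positions).
ROOM_BITS = ("button", "handle", "door", "switch")
POSITIONS = ("start", "at_button", "at_handle", "at_switch", "through_door")  # hypothetical labels
N_HAND = 7  # hand positions, numbered 0..6 here

room_states = list(product((0, 1), repeat=len(ROOM_BITS)))  # 2**4 = 16 room configurations
states = list(product(room_states, POSITIONS, range(N_HAND)))
index = {s: i for i, s in enumerate(states)}  # tabular methods need one index per state

start = ((0, 0, 0, 0), "start", 0)
print(len(states))   # 560 = 16 * 5 * 7
print(index[start])  # 0
```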
Hierarchical Reinforcement Learning (HRL) + Options ● In HRL, actions are extended to options ● Each option consists of: ● Its own option policy: → a combination of actions ● Initiation set: → 1 for all states in which the option can be started (else 0) ● Termination condition: → probability of terminating in each state ● Additionally, an option can define an abstraction that reduces the state space and the action space ● Here, the abstractions were defined as pairs: ● (Body, Target): distance to the object, distance to the wall it is mounted on, and the angle to the normal of that wall ● (Hand, Target): distance from the hand to the object
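The four parts of an option listed above map naturally onto a small data structure. This is a sketch under stated assumptions, not the authors' implementation: `Option`, `run_option`, `approach_button`, `toy_step` and all feature names are hypothetical, and the termination probability is sampled with Python's `random.Random`. The example option uses a (Body, Button) abstraction that keeps only the three features named on the slide.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Dict[str, float]  # hypothetical: raw state as named features

@dataclass
class Option:
    name: str
    initiation: Callable[[State], bool]                # I: may the option start in s?
    policy: Callable[[Tuple[float, ...]], str]         # pi: abstract state -> action
    termination: Callable[[State], float]              # beta: stop probability in s
    abstraction: Callable[[State], Tuple[float, ...]]  # keeps only relevant features

def run_option(opt, s, step, rng, max_steps=100):
    """Execute opt from s until beta says stop; return final state and step count."""
    if not opt.initiation(s):
        raise ValueError(f"{opt.name} cannot start in this state")
    for k in range(1, max_steps + 1):
        s = step(s, opt.policy(opt.abstraction(s)))
        if rng.random() < opt.termination(s):  # stop with probability beta(s)
            return s, k
    return s, max_steps

# Hypothetical (Body, Button) option: distance to the button, distance to its wall,
# and angle to the wall normal.
approach_button = Option(
    name="approach_button",
    initiation=lambda s: s["button_visible"] > 0,
    policy=lambda z: "forward" if z[0] > 0.1 else "stop",
    termination=lambda s: 1.0 if s["dist_button"] < 0.1 else 0.0,
    abstraction=lambda s: (s["dist_button"], s["dist_wall"], s["angle_wall"]),
)

def toy_step(s, action):
    """Toy dynamics: 'forward' moves 0.25 closer to the button and its wall."""
    s = dict(s)
    if action == "forward":
        s["dist_button"] -= 0.25
        s["dist_wall"] -= 0.25
    return s

start = {"button_visible": 1.0, "dist_button": 1.0, "dist_wall": 1.2, "angle_wall": 0.0}
final, steps = run_option(approach_button, start, toy_step, random.Random(0))
print(steps, final["dist_button"])  # 4 0.0
```

Because the policy only sees the abstract state, it is independent of features such as the hand position, which is exactly the reduction of the state space the slide describes.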
Hierarchical Reinforcement Learning (HRL) + Options (2) ● To create a new option, all of its parts have to be specified: ● Abstraction: exactly one abstraction from the library ● Policy: the segmented actions ● Initiation set: all states in the segment ● Termination condition: the initiation set of the option that succeeds it ● Reward while still unknown (later replaced by the measured value): – Acquired skill → 0 – Basic action → 3 hours ● → The robot tries acquired skills first
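The construction rules above can be sketched in a few lines, under explicit assumptions: states are plain integers, the policy is simply a lookup table copied from the segment, and the reward is assumed to be the negative of elapsed time, so the initial estimates become durations (0 s for an acquired skill, 3 hours for a basic action). `SkillOption`, `options_from_segments`, `pick` and all labels are hypothetical. The last option's termination set is left empty here; in practice it would end at the task goal.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

@dataclass
class SkillOption:
    abstraction: str                           # exactly one abstraction from the library
    policy: Dict[int, str]                     # state -> action, copied from the segment
    initiation: FrozenSet[int]                 # all states visited in the segment
    termination: FrozenSet[int] = frozenset()  # set to the next skill's initiation set

def options_from_segments(segments: List[Tuple[str, List[Tuple[int, str]]]]) -> List[SkillOption]:
    """segments: (abstraction, [(state, action), ...]) in trajectory order."""
    opts = [SkillOption(abstr, dict(pairs), frozenset(s for s, _ in pairs))
            for abstr, pairs in segments]
    for cur, nxt in zip(opts, opts[1:]):
        cur.termination = nxt.initiation  # stop where the succeeding skill can start
    return opts

# Assumption: reward = -(elapsed time), so lower estimated duration is better.
UNKNOWN_ESTIMATE_S = {"skill": 0.0, "primitive": 3 * 3600.0}  # 0 s vs. 3 hours

def pick(actions: List[Tuple[str, str]], measured: Dict[str, float]) -> str:
    """Choose the (name, kind) action with the lowest measured or initial duration."""
    return min(actions, key=lambda a: measured.get(a[0], UNKNOWN_ESTIMATE_S[a[1]]))[0]

opts = options_from_segments([
    ("body-button", [(0, "forward"), (1, "forward")]),
    ("hand-button", [(2, "push"), (3, "push")]),
])
actions = [("push", "primitive"), ("press_button_skill", "skill")]
print(pick(actions, {}))                                         # press_button_skill
print(pick(actions, {"press_button_skill": 40.0, "push": 5.0}))  # push
```

The optimistic value for unknown skills is what makes the robot try acquired skills first; once real durations are measured, the choice is driven by data.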
Experiment ● The robot solved the task 7 times ● The best solutions were used to generate 5 demonstration trajectories for CST ● CST segmented each of them into the same sequence of 10 skills (see Fig. 6) ● The corresponding skills were then merged ● Movement skills were discarded Figure 6: Sample of a segmented demonstration trajectory [1]
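One plausible reading of the merging step is sketched below, under assumptions: since every demonstration was segmented into the same skill sequence, the i-th segment of each demonstration is pooled into one skill, and skills whose (hypothetical) label starts with "move" are dropped. `merge_demonstrations`, the labels, and the integer sample ids are illustrative only, not the authors' code.

```python
from typing import Callable, List, Tuple

Segment = Tuple[str, list]  # (skill label, samples such as (state, action) pairs)

def merge_demonstrations(demos: List[List[Segment]],
                         discard: Callable[[str], bool] = lambda label: label.startswith("move"),
                         ) -> List[Segment]:
    """Pool the i-th segment of every demo; all demos must share one skill sequence."""
    labels = [label for label, _ in demos[0]]
    for demo in demos[1:]:
        if [label for label, _ in demo] != labels:
            raise ValueError("demonstrations were segmented into different skill sequences")
    return [(label, [x for demo in demos for x in demo[i][1]])
            for i, label in enumerate(labels) if not discard(label)]

# Two hypothetical demonstrations with integer sample ids.
demo_a = [("move-to-button", [0, 1]), ("push-button", [2]), ("move-to-handle", [3])]
demo_b = [("move-to-button", [10]), ("push-button", [11, 12]), ("move-to-handle", [13])]
print(merge_demonstrations([demo_a, demo_b]))  # [('push-button', [2, 11, 12])]
```

Checking that all demonstrations share one skill sequence first mirrors the observation that CST produced the same 10 skills for every trajectory; without that agreement, pooling by position would mix unrelated skills.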
Red Room Task 2 ● The robot has to solve a similar task, once with and once without the acquired skills: 1. Press the switch 2. Press the first button → door opens 3. Pull the handle → door closes 4. Press the second button ● The completion time is measured Figure 7: Red Room Task 2 [1]
Results ● With acquired skills: ● → the mean completion time is halved ● The completion times of the two conditions did not overlap Figure 8: Results of the second task with and without acquired skills [1]
Conclusion ● Skills extracted while solving one problem can improve performance on another problem ● The MDP has to be defined beforehand ● Good segmentation requires an abstraction library ● How a new option is set up has to be specified ● → The set of skills that can be created is limited ● It is foreseeable that, in addition to autonomous skill acquisition, skill management will become important
Thank you for your attention! Figure 9: Clapping machine arms [F-2]
References
[1] Barto, A., and Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13, 41–77.
[2] Konidaris, G., Kuindersma, S., Barto, A., and Grupen, R. (2010). Constructing skill trees for reinforcement learning agents from demonstration trajectories. In Lafferty, J., Williams, C., Shawe-Taylor, J., Zemel, R., and Culotta, A. (eds.), Advances in Neural Information Processing Systems 23, 1162–1170.
[3] Konidaris, G., Kuindersma, S., Grupen, R., and Barto, A. (2011). Autonomous skill acquisition on a mobile manipulator. Association for the Advancement of Artificial Intelligence (AAAI).
[4] Fearnhead, P., and Liu, Z. (2007). On-line inference for multiple changepoint problems. Journal of the Royal Statistical Society, Series B, 69, 589–605.
Videos can be found under:
Picture References
[F-1] © Drew Bell (license: nd/2.0/)
[F-2] © Ars Electronica (license: nd/2.0/)