Developmental Mechanisms for Life-Long Autonomous Learning in Robots. Pierre-Yves Oudeyer, FLOWERS Project-Team, INRIA-ENSTA-ParisTech.

Presentation transcript:

Developmental Mechanisms for Life-Long Autonomous Learning in Robots. Pierre-Yves Oudeyer, FLOWERS Project-Team, INRIA-ENSTA-ParisTech.

Sensorimotor and social learning: autonomous, open, "life-long" learning in the real world, physical and social, with experimental validation. Developmental robotics: intrinsic motivation, maturation, imitation and social guidance. Two goals: a fundamental understanding of the mechanisms of development, and applications to assistive robotics.

"Engineered" robot learning takes one of two routes. Either the engineer demonstrates the task, with a fixed interaction protocol in the lab, and the target is an action policy mapping state/context to action, learned with regression algorithms (e.g. LGP, LWPR, Gaussian Mixture Regression); or the engineer provides a reward/fitness function, and the target is reached with optimization algorithms (e.g. NAC, non-linear Nelder-Mead, ...). In the "real" world, the developmental approach asks two questions: (Axis 1) how does a non-engineer human behave? (Axis 2) which generic reward function can drive spontaneous, curiosity-driven learning?
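For concreteness, here is a minimal sketch of the regression route, using scikit-learn's k-nearest-neighbour regressor as a stand-in for the LWPR or Gaussian Mixture Regression algorithms named above; the demonstration data and the demonstrated policy are hypothetical:

```python
# Sketch of "engineered" robot learning by regression: an engineer
# provides (state, action) demonstrations and a generic regressor
# fits the policy u = pi(x). KNN stands in for LWPR / GMR.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Hypothetical demonstrations: 2-D states x, 1-D actions u.
X_demo = rng.uniform(-1.0, 1.0, size=(200, 2))
U_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1]  # demonstrated policy

policy = KNeighborsRegressor(n_neighbors=5).fit(X_demo, U_demo)

x_new = np.array([[0.3, -0.2]])
print("predicted action:", policy.predict(x_new))
```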

Learning from interactions with non-engineers: what does non-engineer human behaviour look like?
1. Intuitive multimodal interfaces: synthesis and recognition of emotion in speech (IJHCS, 2001; 5 patents), clicker-training (RAS, 2002; 1 patent), physical human-robot interfaces (Humanoids 2011).
2. User studies (Humanoids 2009, HRI 2011).
3. Adaptation: learning flexible teaching interfaces (Conn. Sci., 2006; ICDL 2011; IROS 2010).

Which generic reward function? Intrinsic motivation (Berlyne, 1960; Csikszentmihalyi, 1996; Dayan and Balleine, 2002): spontaneous active exploration, artificial curiosity. The learning-progress signal de/dt is a non-stationary function that is difficult to model, so it is evaluated empirically with statistical regression, as in IAC (2004, 2007), R-IAC (2009), SAGG-RIAC (2010), McSAGG-RIAC (2011) and SGIM (2011).
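One simple way to evaluate de/dt empirically is to regress recent prediction errors against time and take the negative slope. This is a sketch under my own assumptions (sliding window, least-squares fit), not the exact estimator of the cited systems:

```python
# Empirical learning progress: -d(error)/dt over a sliding window,
# estimated by least-squares linear regression on recent errors.
import numpy as np

def learning_progress(errors, window=25):
    """Return -d(error)/dt estimated from the last `window` errors."""
    recent = np.asarray(errors[-window:], dtype=float)
    if recent.size < 2:
        return 0.0
    t = np.arange(recent.size)
    slope = np.polyfit(t, recent, deg=1)[0]  # least-squares slope
    return -slope  # decreasing error => positive progress
```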

Exploring and learning generalized forward and inverse models: forward models predict the effect of a parameterized action in a given context; inverse models retrieve, for a desired effect in a given context, the corresponding action parameters.

Active learning of models: which experiment next? Classical criteria explore zones where uncertainty or errors are maximal, or that are least explored; they assume spatial or temporal stationarity and that everything is learnable within the lifetime. The developmental approach instead explores zones where empirically measured learning progress is maximal, moving from simple to complex.

Sensory, action and context states feed a classic machine learner M (e.g. neural net, SVM, Gaussian process) that predicts the sensory state at t+1. The prediction-error feedback drives a meta machine learner metaM, which progressively categorizes the sensorimotor space and maintains local models of learning progress; these supply the intrinsic reward used by the action selection system. IAC, IEEE Trans. EC (2007); R-IAC, IEEE Trans. AMD (2009).
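A much-simplified sketch of this loop (my reading of the architecture; the progressive categorization that carves out regions is omitted, so region identities are assumed to be given):

```python
# Simplified IAC-style loop: per-region error histories yield a local
# learning-progress estimate, used as intrinsic reward for selecting
# which region of the sensorimotor space to explore next.
import numpy as np
from collections import defaultdict

class CuriosityAgent:
    def __init__(self, n_regions, window=20):
        self.n_regions = n_regions
        self.window = window
        self.errors = defaultdict(list)  # region id -> prediction errors

    def progress(self, region):
        e = self.errors[region][-self.window:]
        if len(e) < 2:
            return float("inf")  # optimistic: try unexplored regions first
        t = np.arange(len(e))
        return -np.polyfit(t, e, 1)[0]  # intrinsic reward = -d(error)/dt

    def select_region(self, epsilon=0.1):
        if np.random.random() < epsilon:  # keep some random exploration
            return np.random.randint(self.n_regions)
        return max(range(self.n_regions), key=self.progress)

    def record(self, region, prediction_error):
        self.errors[region].append(prediction_error)
```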

The Playground Experiments (IEEE Trans. EC 2007; Connection Science 2006; AAAI Work. Dev. Learn. 2005)

Experimentation on open learning in the real world: the Playground Experiments. Autonomous learning of novel affordances and skills, e.g. object manipulation (IEEE TEC, 2007; IROS 2010; IEEE TAMD, 2009; Front. Neurorobotics, 2007; Connect. Sc., 2006; IEEE ICDL 2010, 2011). Self-organization of developmental trajectories, from simple to complex, and bootstrapping of communication, yielding new hypotheses for understanding infant development (Front. Neuroscience 2007; Infant and Child Dev. 2008; Connect. Science 2006).

Active learning of inverse models: SAGG-RIAC (RAS, 2012). Sensorimotor mappings (context, movement) → effect are redundant. The shift is from the active choice of an action followed by observation of its effect, to the active choice of an effect (a goal) followed by the search for a corresponding action policy through goal-directed optimization (e.g. using NAC, POWER, PI^2-CMA, ...): a self-defined RL problem. This amounts to spontaneous active exploration of a space of fitness functions parameterized by goals, where one iteratively chooses the goal that maximizes the empirical evaluation of learning progress.
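Here is a sketch of one such goal-directed step. The forward model, the goal and the random-search optimizer are all stand-ins of mine (the cited work samples goals by competence progress over regions and uses optimizers such as NAC, POWER or PI^2-CMA):

```python
# One SAGG-style step: pick a goal (effect) in task space, then search
# for action parameters whose effect reaches it, with distance-to-goal
# defining a self-chosen fitness function.
import numpy as np

rng = np.random.default_rng(0)

def competence(effect, goal):
    """Self-defined fitness: negative distance between effect and goal."""
    return -np.linalg.norm(effect - goal)

def random_search(fitness, dim, iters=200):
    """Crude black-box optimizer standing in for NAC / POWER / PI^2-CMA."""
    best_theta, best_f = None, -np.inf
    for _ in range(iters):
        theta = rng.uniform(-1.0, 1.0, size=dim)
        f = fitness(theta)
        if f > best_f:
            best_theta, best_f = theta, f
    return best_theta, best_f

def forward_model(theta):
    """Hypothetical redundant mapping: 3 motor parameters -> 2-D effect."""
    return np.array([np.sin(theta[0]) + theta[1], np.cos(theta[2])])

goal = np.array([0.5, 0.5])  # actively chosen effect (fixed here for brevity)
theta, f = random_search(lambda th: competence(forward_model(th), goal), dim=3)
print("best competence reached:", f)
```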

Learning omnidirectional locomotion: experimental evaluation of active learning efficiency, comparing exploration in the control space with exploration in the task space. Performance is higher than with more classical active learning algorithms in real sensorimotor spaces, which are non-stationary and non-homogeneous (IEEE TAMD 2009; ICDL 2010, 2011; IROS 2010; RAS 2012).

Maturational constraints: progressive growth of the number of degrees of freedom and of spatio-temporal resolution, with an adaptive maturational schedule controlled by active learning / learning progress (Bjorklund, 1997; Turkewitz and Kenny, 1985).
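A minimal sketch of such a schedule, under an assumption of my own: one more degree of freedom is released whenever measured learning progress on the currently active set rises above a threshold (the cited McSAGG work uses a more elaborate maturational clock):

```python
# Hypothetical adaptive maturational schedule: degrees of freedom are
# unlocked progressively, gated by the measured learning progress.
class MaturationalSchedule:
    def __init__(self, total_dofs, start_dofs=2, threshold=0.01):
        self.active_dofs = start_dofs
        self.total_dofs = total_dofs
        self.threshold = threshold

    def update(self, recent_progress):
        """Release one more DOF when current learning progress is high."""
        if recent_progress > self.threshold and self.active_dofs < self.total_dofs:
            self.active_dofs += 1
        return self.active_dofs
```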

McSAGG-RIAC: maturationally constrained curiosity-driven learning (IEEE ICDL-Epirob 2011a).

SGIM: Socially Guided Intrinsic Motivation (ICDL-Epirob, 2011b)

"Life-long" experimentation: Acroban (SIGGRAPH 2010; IROS 2011; World Expo, South Korea, 2012). Experimenting with algorithms for "life-long" learning in the real world requires technological experimental platforms that are robust, reconfigurable, precise, easily repaired and cheap.

"Life-long" experimentation: Ergo-Robots (exhibition "Mathematics, a Beautiful Elsewhere", Fondation Cartier). Experimenting with algorithms for "life-long" learning in the real world requires technological experimental platforms that are robust, reconfigurable, precise, easily repaired and cheap. Mid-term: open-source distribution of the platform to the scientific community.

Baranes, A., Oudeyer, P-Y. (2012) Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots, Robotics and Autonomous Systems.
Baranes, A., Oudeyer, P-Y. (2011a) The Interaction of Maturational Constraints and Intrinsic Motivation in Active Motor Development, in Proceedings of IEEE ICDL-Epirob 2011.
Lopes, M., Melo, F., Montesano, L. (2009) Active Learning for Reward Estimation in Inverse Reinforcement Learning, European Conference on Machine Learning (ECML/PKDD), Bled, Slovenia.
Nguyen, M., Baranes, A., Oudeyer, P-Y. (2011b) Bootstrapping Intrinsically Motivated Learning with Human Demonstrations, in Proceedings of IEEE ICDL-Epirob 2011.
Oudeyer, P-Y., Kaplan, F., Hafner, V. (2007) Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, 11(2).
Baranes, A., Oudeyer, P-Y. (2009) R-IAC: Robust Intrinsically Motivated Exploration and Active Learning, IEEE Transactions on Autonomous Mental Development, 1(3).
Ly, O., Lapeyre, M., Oudeyer, P-Y. (2011) Bio-inspired Vertebral Column, Compliance and Semi-passive Dynamics in a Lightweight Robot, in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), San Francisco, US.
Lopes, M., Lang, T., Toussaint, M., Oudeyer, P-Y. (2012) Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress, Neural Information Processing Systems (NIPS 2012), Tahoe, USA.
Lopes, M., Oudeyer, P-Y. (2012) The Strategic Student Approach for Life-Long Exploration and Learning, in Proceedings of IEEE ICDL-Epirob 2012.