Presentation is loading. Please wait.

Presentation is loading. Please wait.

Developmental Mechanisms for Life-Long Autonomous Learning in Robots Pierre-Yves Oudeyer Project-Team INRIA-ENSTA-ParisTech FLOWERS

Similar presentations


Presentation on theme: "Developmental Mechanisms for Life-Long Autonomous Learning in Robots Pierre-Yves Oudeyer Project-Team INRIA-ENSTA-ParisTech FLOWERS"— Presentation transcript:

1 Developmental Mechanisms for Life-Long Autonomous Learning in Robots Pierre-Yves Oudeyer Project-Team INRIA-ENSTA-ParisTech FLOWERS http://www.pyoudeyer.com http://flowers.inria.fr

2 Sensorimotor and social learning: Autonomous Open, « life-long learning » Real world, physical and social  Experimental validation Developmental robotics Intrinsic Motivation Maturation Imitation, Social guidance Fundamental understanding of the mechanisms of development Application to assistive robotics Fundamental understanding of the mechanisms of development Application to assistive robotics

3 “Engineered” robot learning Engineer shows, with fixed interaction protocol in the lab: Target:  Regression algorithms (e.g. LGP, LWPR, Gaussian Mixture Regression) Action State/ context Action policy Engineer provides a reward/fitness function: Target:  Optimization algorithms (e.g. NAC, non- linear Nelder-Mead, …) OR « Real » world Developmental approach Which generic reward function for spontaneous curiosity driven learning? Axe 2 ? Behaviour of human (non-engineer) ? Axe 1

4 Learning from interactions with non-engineers Non-engineer human behaviour ? 1.Intuitive multimodal interfaces Synthesis and recognition of emotion in speech (IJHCS, 2001, 5 patents) Clicker-training (RAS, 2002; 1 patent) Physical human-robot interfaces (Humanoids 2011) 2.User studies (Humanoids 2009, HRI 2011) 3.Adaptation: learning flexible teaching interfaces (Conn. Sci., 2006, ICDL 2011, IROS 2010)

5 Spontaneous active exploration, artificial curiosity in the vicinity of  Non-stationary function, difficult to model  Algorithms for empirical evaluation of de/dt with statistical regression  IAC (2004, 2007), R-IAC (2009), SAGG-RIAC (2010) McSAGG-RIAC ( 2011 ), SGIM ( 2011 ) Non ! Intrinsic Motivation Berlyne (1960), Csikszentmihalyi (1996) Dayan and Belleine (2002) Quelle fonction de récompense générique ?

6 Exploring and learning generalized forward and inverse models Parameterized by

7 simple complexe simple complexe Explore zones where: Uncertainty/errors maximal Least explored Assume: Spatial or temporal stationarity Everything is learnable within lifetime  Which experiment ? Developmental approach Explore zones where empirically learning progress is maximal Active learning of models

8 Sensori state Sensori state Action state Action state Context state Context state Classic machine learner M (e.g. neural net, SVM, Gaussian process) Classic machine learner M (e.g. neural net, SVM, Gaussian process) Meta machine learner metaM Progressive categorization Local model of learning progress... Sensori state at t+1 Prediction Error feedback Action selection system Intrinsic reward Local model of learning progress IAC, IEEE Trans. EC (2007) R-IAC, IEEE Trans. AMD (2009)

9 The Playground Experiments (IEEE Trans. EC 2007; Connection Science 2006; AAAI Work. Dev. Learn. 2005)

10 Experimentations on Open Learning in the Real World Playground Experiments Autonomous learning of novel affordances and and skills, e.g. object manipulation IEEE TEC, 2007; IROS 2010; IEEE TAMD, 2009; Front. Neurorobotics, 2007; Connect. Sc., 2006; IEEE ICDL 2010, 2011 simple complex Self-organization of developmental trajectories, bootstrapping of communication  New hypotheses for understanding infant development Front. Neuroscience 2007, Infant and Child Dev. 2008, Connect. Science 2006

11 Active learning of inverse models SAGG-RIAC (RAS, 2012) (Context, Movement)  Effect Redundancy of sensorimotor spaces From the active choice of action, followed by observation of effect … … to the active choice of effect, followed by the search of a corresponding action policy through goal-directed optimization (e.g. using NAC, POWER, PI^2-CMA, …)  self-defined RL problem Spontaneous active exploration of a space of fitness functions parameterized by where one iteratively chooses the which maximizes the empirical evaluation of:

12 Apprentissage de la locomotion omnidirectionnelle  Performance higher than more classical active learning algorithms in real sensorimotor spaces (non-stationary, non homogeneous) (IEEE TAMD 2009; ICDL 2010, 2011; IROS 2010; RAS 2012) Experimental evaluation of active learning efficiency Control Space:Task Space:

13

14 Maturational constraints Progressive growths of DOF number and spatio-temporal resolution Adaptive maturational schedule controlled by active learning/learning progress (Bjorklund, 1997; Turkewitz and Kenny, 1985)

15 McSAGG-RIAC Maturationally constrained curiosity-driven learning (IEEE ICDL-Epirob 2011a)

16 SGIM: Socially Guided Intrinsic Motivation (ICDL-Epirob, 2011b)

17 « Life-long » Experimentation Acroban (Siggraph 2010, IROS 2011, World Expo, South Korea, 2012) Experimentation of algorithms for « life-long » learning in the real world  Technological experimental platforms: robust, reconfigurable, precise, easily repaired, cheap

18 Ergo-Robots (Exhibition « Mathematics, a beautiful elsewhere », Fond. Cartier, 2011-2012) Experimentation of algorithms for « life-long » learning in the real world  Technological experimental platforms: robust, reconfigurable, precise, easily repaired, cheap  Ergo-Robots  Mid-term: open-source distribution of the platform to the scientific community « Life-long » Experimentation

19 Baranes, A., Oudeyer, P-Y. (2012) Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots, Robotics and Autonomous Systems. http://www.pyoudeyer.com/RAS-SAGG-RIAC-2012.pdf Baranes, A., Oudeyer, P-Y. (2011a) The Interaction of Maturational Constraints and Intrinsic Motivation in Active Motor Development, in Proceedings of IEEE ICDL-Epirob 2011. http://flowers.inria.fr/BaranesOudeyerICDL11.pdf Lopes, M., Melo, F., Montesano, L. (2009) Active Learning for Reward Estimation in Inverse Reinforcement Learning, European Conference on Machine Learning (ECML/PKDD), Bled, Slovenia, 2009. http://flowers.inria.fr/mlopes/myrefs/09-ecml-airl.pdf Nguyen, M., Baranes, A., Oudeyer, P-Y. (2011b) Bootstrapping Intrinsically Motivated Learning with Human Demonstrations, in Proceedings of IEEE ICDL-Epirob 2011. http://flowers.inria.fr/NguyenBaranesOudeyerICDL11.pdfhttp://flowers.inria.fr/NguyenBaranesOudeyerICDL11.pdf Oudeyer P-Y, Kaplan, F. and Hafner, V. (2007) Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, 11(2), pp. 265--286. http://www.pyoudeyer.com/ims.pdf Baranes, A., Oudeyer, P-Y. (2009 )R-IAC: Robust intrinsically motivated exploration and active learning, IEEE Transactions on Autonomous Mental Development, 1(3), pp. 155--169. Ly, O., Lapeyre, M., Oudeyer, P-Y. (2011) Bio-inspired vertebral column, compliance and semi-passive dynamics in a lightweight robot, in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), San Francisco, US.Bio-inspired vertebral column, compliance and semi-passive dynamics in a lightweight robot, in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), San Francisco, US. Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress, Manuel Lopes, Tobias Lang, Marc Toussaint and Pierre-Yves Oudeyer. Neural Information Processing Systems (NIPS 2012), Tahoe, USA. http://flowers.inria.fr/mlopes/myrefs/12-nips-zeta.pdf http://flowers.inria.fr/mlopes/myrefs/12-nips-zeta.pdf The Strategic Student Approach for Life-Long Exploration and Learning, Manuel Lopes and Pierre-Yves Oudeyer. In Proceedings of IEEE ICDL-Epirob 2012, http://flowers.inria.fr/mlopes/myrefs/12-ssp.pdf http://flowers.inria.fr/mlopes/myrefs/12-ssp.pdf


Download ppt "Developmental Mechanisms for Life-Long Autonomous Learning in Robots Pierre-Yves Oudeyer Project-Team INRIA-ENSTA-ParisTech FLOWERS"

Similar presentations


Ads by Google