Biological Arm Motion through Reinforcement Learning, by Jun Izawa, Toshiyuki Kondo, and Koji Ito. Presented by Helmut Hauser.

Overview
- biological motivation and basic idea
- biological muscle force model
- mathematical formulations
- reaching task
- results and conclusions

Biological Motivation
(1) Reinforcement learning occurs in biology (dopamine signalling, ...), yet in this framework we face a large state and action space (the curse of dimensionality).
(2) Multiple muscles produce the joint torques, so the system is highly redundant. This redundancy enables the system to maintain robustness and flexibility, but it also enlarges the action space.
Humans can deal with this, but how?

Basic Idea
How do humans learn a new motion? We coactivate muscles and stiffen our joints. The stiffness then decreases while learning (we feel "safer") and our motions get smoother.
Maybe there exists some preferred domain in the action space that gets higher priority in the learning process.
Idea: restrict the learning domain of the action space while learning, then soften the restriction as performance improves.

Muscle Force Model
Each muscle is modeled as a spring-damper whose parameters depend on the motor command u:
f = k(u) (l_r - l) - b(u) dl/dt
where k(u) is the elasticity ("stiffness"), b(u) the viscosity, and l_r the equilibrium (rest) length of the muscle.
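
A minimal sketch of this spring-damper model in Python. The linear dependence of the elasticity on the motor command, k(u) = k0 + k1*u, mirrors K = diag(k_0 + k_i u_i) from the formulation slide below; the analogous linear form for the viscosity b(u) and all numeric values are assumptions.

    def muscle_force(u, l, l_dot, k0=100.0, k1=500.0, b0=10.0, b1=50.0, l_r=0.1):
        """Spring-damper muscle: elasticity pulls the muscle toward its
        equilibrium length l_r, viscosity damps the contraction velocity."""
        k = k0 + k1 * u    # elasticity ("stiffness"), grows with activation
        b = b0 + b1 * u    # viscosity, also grows with activation (assumed)
        return k * (l_r - l) - b * l_dot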

Biological Model
[Figure: two-link arm model with an upper arm and a lower arm, joint angles θ1 and θ2.]

Merging Two Worlds
The muscle force model and the dynamic 2-link model are connected through the constant matrix G and some transformations:
R = G^T K G ... elasticity
D = G^T B G ... viscosity
Θ_v = λ R^-1 G^T K

Mathematical Formulation
Remember: G is constant.
K = diag(k_0 + k_i u_i)
R = G^T K G
Θ_v = λ R^-1 G^T K
D = G^T B G (constant, since B is constant)
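
These definitions translate directly into a few lines of NumPy. All numeric values below are hypothetical placeholders, and treating λ as the vector of muscle equilibrium lengths (so that Θ_v = R^-1 G^T K λ is read as a matrix-vector product) is an assumption:

    import numpy as np

    rng = np.random.default_rng(0)
    G   = rng.uniform(-0.05, 0.05, (6, 2))  # constant 6-muscle / 2-joint matrix (placeholder)
    k0  = np.full(6, 100.0)                 # baseline elasticities k_0 (placeholder)
    ki  = np.full(6, 500.0)                 # activation-dependent gains k_i (placeholder)
    B   = np.diag(np.full(6, 50.0))         # constant viscosities (placeholder)
    lam = np.full(6, 0.1)                   # equilibrium lengths λ (assumed meaning)
    u   = rng.uniform(0.0, 1.0, 6)          # motor command, one entry per muscle

    K = np.diag(k0 + ki * u)                # K = diag(k_0 + k_i u_i)
    R = G.T @ K @ G                         # joint elasticity R = G^T K G
    D = G.T @ B @ G                         # joint viscosity D = G^T B G (constant)
    theta_v = np.linalg.solve(R, G.T @ K @ lam)  # virtual equilibrium posture Θ_v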

Mathematical Formulation
Orthogonal decomposition (using the pseudoinverse): the motor command and the exploration noise are each split into two orthogonal components,
u = u_1' + u_2'
n = n_1' + n_2'
and the applied noise attenuates the second component:
ň = n_1' + c · n_2' (note: 0 ≤ c ≤ 1)
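
A minimal NumPy sketch of this decomposition, assuming the split is taken with respect to the range and null space of a task map J (as suggested by the R(J)/N(J) labels on the next two slides); the concrete J used here is a placeholder:

    import numpy as np

    def restricted_noise(n, J, c):
        """Split noise n into the component J can see (row-space part) and the
        component J cannot see (null-space part), then attenuate the latter by c."""
        P = np.linalg.pinv(J) @ J   # orthogonal projector onto the row space of J
        n1 = P @ n                  # n_1': task-relevant component
        n2 = n - n1                 # n_2': null-space component
        return n1 + c * n2          # ň = n_1' + c · n_2',  0 <= c <= 1

    # usage sketch: 6 muscle activations, 2-dimensional task variable
    J = np.random.default_rng(0).normal(size=(2, 6))
    n = np.random.default_rng(1).normal(size=6)
    print(restricted_noise(n, J, c=0.2))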

helmut igi 10 N(J) R(J) action space u ρ θvθv

helmut igi 11 N(J) R(J) action space u ρ θvθv c

Architecture
[Figure: actor-critic architecture. The actor network receives the state q_{t-1} and, together with a noise generator, produces the motor command u_t; the critic network evaluates the reward and emits the TD error that drives learning.]
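
A schematic of one step of this loop in Python; the actor, critic, noise generator, and environment are placeholders, and only the TD-error bookkeeping follows the diagram:

    def actor_critic_step(actor, critic, noise_gen, env, q_prev, gamma=0.99):
        """One interaction step of the actor-critic architecture."""
        u = actor(q_prev) + noise_gen(q_prev)   # motor command plus (restricted) exploration noise
        q, reward = env.step(u)                 # apply the command, observe next state and reward
        td_error = reward + gamma * critic(q) - critic(q_prev)  # temporal-difference error
        # the TD error updates the critic and, acting as an internal
        # reinforcement signal, moves the actor's output toward (or away from) u
        return q, td_error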

Reaching Task
[Figure: reaching from the start point S to the goal (G_A).]
Reward model (piecewise):
r = 1 - c_E r_E for ...
r = -c_E r_E for ...
r = -1 for ...
with r_E = Σ u_i^2 over all 6 muscles.
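
A sketch of this reward in Python. The branch conditions are not fully legible in the transcript, so the goal test (hand within a radius of the goal) and the explicit failure flag are assumptions; only the energy term r_E = Σ u_i^2 and the three branch values come from the slide:

    import numpy as np

    def reward(x, x_goal, u, goal_radius=0.02, c_E=0.01, failed=False):
        """Piecewise reward: bonus minus energy at the goal, pure energy
        penalty otherwise, and -1 on failure (assumed condition)."""
        r_E = float(np.sum(np.square(u)))   # energy term over all 6 muscles
        if failed:
            return -1.0
        if np.linalg.norm(np.asarray(x) - np.asarray(x_goal)) < goal_radius:
            return 1.0 - c_E * r_E          # reached the goal region
        return -c_E * r_E                   # otherwise: energy penalty only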

Some Implementation Facts
- The input q is extended, since the reward model also needs u.
- The stiffness R is set to rather "high" values.
- A neural network (as proposed by Shibata) serves as function approximator, trained with backpropagation.
- In a second experiment, a load with arbitrary orientation (kept constant within one trial) is applied within a certain region.
- Parameters (noise parameters, c_E of the reward model, ...) have to be tuned.

Results
- The proposed architecture collects more reward than a standard approach.
- The cumulative reward does not tend to zero.
- The energy does not change in the early stage and decreases after the target is first hit.
- With the extra force applied: the peak of the stiffness moves to that area.

Conclusions
- The approach can deal with redundant systems (the typical case in nature).
- The search noise is restricted to a subspace.
- A robust controller has been achieved.
- Some extra tuning was needed (done by evolution?).
Future outlook:
- applying it to hierarchical systems (more stages)
- how to prevent the extra tuning?

Literature
- Jun Izawa, Toshiyuki Kondo, Koji Ito: "Biological Robot Arm Motion through Reinforcement Learning", Proceedings of the 2002 IEEE International Conference on Robotics & Automation.
- Jun Izawa, Toshiyuki Kondo, Koji Ito: "Motor Learning Model using Reinforcement Learning with Neural Internal Model", Department of Computational Intelligence and Systems.
- Jun Izawa, Toshiyuki Kondo, Koji Ito: "Biological Robot Arm Motion through Reinforcement Learning", Biol. Cybern. 91 (2004), Springer-Verlag.