On Linking Reinforcement Learning with Unsupervised Learning. Cornelius Weber, FIAS. Presented at Honda HRI, Offenbach, 17th March 2009.


For taking action, we need only the relevant features. (Diagram variables: x, y, z.)

Unsupervised learning in cortex; reinforcement learning in the basal ganglia (state space, actor). (Doya, 1999)

A 1-layer RL model of the basal ganglia (actor over a state space: go left? go right?) is too simple to handle complex input.

Complex input (cortex): we need another layer (or layers) to pre-process the complex data, performing feature detection before action selection (actor, state space).

Models' background:
- Gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- Reward-modulated Hebbian learning: Triesch, Neural Comp 19 (2007); Roelfsema & van Ooyen, Neural Comp 17 (2005); Franz & Triesch, ICDL (2007)
- Reward-modulated activity leads to input selection: Nakahara, Neural Comp 14 (2002)
- Reward-modulated STDP: Izhikevich, Cereb Cortex 17 (2007); Florian, Neural Comp 19/6 (2007); Farries & Fairhall, J Neurophysiol 98 (2007); ...
- RL models learn a partitioning of the input space: e.g. McCallum, PhD Thesis, Rochester, NY, USA (1996)

Scenario (sensory input, reward, action): bars are controlled by the actions 'up', 'down', 'left', 'right'; reward is given when the horizontal bar is at a specific position.
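As an illustration of this scenario, here is a minimal Python sketch of such a bars world, assuming a 12x12 grid, one relevant horizontal bar, one distractor vertical bar, and a hypothetical target row for the reward; all names and sizes are illustrative, not taken from the original experiments.

```python
import numpy as np

class BarsEnv:
    """Toy sketch of the bars scenario: a horizontal and a vertical bar on a grid.
    'up'/'down' move the horizontal bar, 'left'/'right' move the vertical bar;
    reward 1 when the horizontal bar reaches the (assumed) target row."""

    def __init__(self, size=12, target_row=0, seed=0):
        self.size = size
        self.target_row = target_row
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.h_row = int(self.rng.integers(self.size))  # row of horizontal bar (relevant)
        self.v_col = int(self.rng.integers(self.size))  # column of vertical bar (distractor)
        return self.observe()

    def observe(self):
        img = np.zeros((self.size, self.size))
        img[self.h_row, :] = 1.0                        # horizontal bar
        img[:, self.v_col] = 1.0                        # vertical bar
        return img.ravel()                              # flat sensory input

    def step(self, action):                             # 0: up, 1: down, 2: left, 3: right
        if action == 0:
            self.h_row = max(self.h_row - 1, 0)
        elif action == 1:
            self.h_row = min(self.h_row + 1, self.size - 1)
        elif action == 2:
            self.v_col = max(self.v_col - 1, 0)
        else:
            self.v_col = min(self.v_col + 1, self.size - 1)
        reward = 1.0 if self.h_row == self.target_row else 0.0
        return self.observe(), reward, reward > 0       # observation, reward, done
```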

A model that learns the relevant features. Top layer: SARSA RL. Lower layer: winner-take-all feature learning. Both layers: learning modulated by δ. (Diagram: RL weights, feature weights; input, action.)
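A hedged sketch of a forward pass through such a two-layer model: a winner-take-all feature layer feeding a layer of state-action values with epsilon-greedy action selection. Layer sizes, the exploration scheme, and the variable names (W for feature weights, Q for RL weights) are assumptions for illustration, not the original implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_input, n_features, n_actions = 144, 20, 4        # illustrative sizes (12x12 input)
W = rng.uniform(0, 0.1, (n_features, n_input))     # feature weights (non-negative)
Q = rng.uniform(0, 0.1, (n_actions, n_features))   # RL (action) weights

def forward(x, epsilon=0.1):
    """Winner-take-all feature layer, then state-action values and epsilon-greedy choice."""
    h = W @ x                                      # feature activations
    s = np.zeros(n_features)
    s[np.argmax(h)] = 1.0                          # winner-take-all state code
    q = Q @ s                                      # state-action values
    if rng.random() < epsilon:
        a = int(rng.integers(n_actions))           # explore
    else:
        a = int(np.argmax(q))                      # exploit
    return s, a, q
```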

SARSA with WTA input layer
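Continuing the sketch above, one possible form of the δ-modulated updates: the SARSA TD error δ updates the action weights in the usual way and also gates a Hebbian-style update of the winning feature unit's weights, with non-negativity enforced by clipping. The learning rates and the exact form of the feature update are assumptions, not the original equations.

```python
import numpy as np

def sarsa_wta_update(Q, W, x, s, a, r, s_next, a_next,
                     gamma=0.9, lr_q=0.1, lr_w=0.01, done=False):
    """One SARSA step; the same TD error delta modulates both layers (in place)."""
    q_sa = float(Q[a] @ s)
    q_next = 0.0 if done else float(Q[a_next] @ s_next)
    delta = r + gamma * q_next - q_sa              # SARSA TD error

    Q[a] += lr_q * delta * s                       # action-weight update
    k = int(np.argmax(s))                          # winning feature unit
    W[k] += lr_w * delta * (x - W[k])              # delta-gated feature update (assumed form)
    np.clip(W, 0.0, None, out=W)                   # non-negativity constraint on weights
    return delta
```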

Note: non-negativity constraint on the weights. Energy function: the estimation error of the state-action value (derivation shown as equations on the slide).
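The equations themselves did not survive the transcript; as a hedged reconstruction, the standard form of such an energy function and the resulting gradient-descent update are:

```latex
% Estimation error of the state-action value (standard SARSA form; a
% reconstruction, not necessarily the exact identities on the slide).
E = \tfrac{1}{2}\bigl(r + \gamma\,Q(s',a') - Q(s,a)\bigr)^{2},
\qquad
\delta = r + \gamma\,Q(s',a') - Q(s,a),
% Treating the target r + \gamma Q(s',a') as constant:
\qquad
\Delta w \;\propto\; -\frac{\partial E}{\partial w}
        \;=\; \delta\,\frac{\partial Q(s,a)}{\partial w}.
```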

Learning the 'short bars' data (figure: RL action weights, feature weights, data; reward, action).

Short bars in a 12x12 input; average number of steps to goal: 11.

Learning the 'long bars' data (figure: RL action weights, feature weights, input, reward; 2 actions not shown).

Comparison of three variants: WTA with non-negative weights; SoftMax with non-negative weights; SoftMax with no weight constraints.
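For concreteness, a small sketch of the two state codings being compared, assuming feature activations h as in the earlier sketch; the temperature parameter is illustrative.

```python
import numpy as np

def wta_code(h):
    """Winner-take-all: one-hot code of the most active feature unit."""
    s = np.zeros_like(h)
    s[np.argmax(h)] = 1.0
    return s

def softmax_code(h, temperature=1.0):
    """SoftMax: graded, normalized code over the feature units."""
    z = np.exp((h - np.max(h)) / temperature)   # subtract max for numerical stability
    return z / z.sum()
```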

Discussion
- Simple model: SARSA on a winner-take-all network with δ-feedback
- Learns only the features that are relevant for the action strategy
- Theory behind it: derived from (approximate) value function estimation
- Non-negative coding aids feature extraction
- A link between unsupervised and reinforcement learning
- Demonstration with more realistic data is needed

Sponsors: Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project "IM-CLeVeR", call FP7-ICT; Frankfurt Institute for Advanced Studies, FIAS. Thank you...