On Linking Reinforcement Learning with Unsupervised Learning
Cornelius Weber, FIGSS talk, FIAS, 20th April 2009

For taking action, we need only the relevant features x, y, z.

Unsupervised learning in the cortex; reinforcement learning in the basal ganglia (state space, actor). Doya, 1999.

Reinforcement learning: go up? go right? go down? go left?

Reinforcement learning: input s, action a, weights.

Reinforcement learning: minimizing the value estimation error.
Δv(s,a) ≈ 0.9 v(s',a') − v(s,a)   (moving target)
Δv(s,a) ≈ 1 − v(s,a)   (value fixed at the goal)
v(s,a) is the value of a state-action pair (coded in the weights). Repeated running to the goal: in state s the agent performs the best action a (with some randomness), yielding s' and a'; the values and action choices converge.
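
A minimal sketch of this SARSA-style value update, using the discount of 0.9 and the reward of 1 fixed at the goal as stated on the slide; the grid size, learning rate and exploration rate are illustrative assumptions:

```python
import numpy as np

GAMMA, LR, EPSILON = 0.9, 0.1, 0.1
n_states, n_actions = 16, 4            # e.g. a 4x4 grid with up/right/down/left
v = np.zeros((n_states, n_actions))    # v(s, a), "coded in the weights"

def choose_action(s):
    """Best action with some randomness (epsilon-greedy)."""
    if np.random.rand() < EPSILON:
        return np.random.randint(n_actions)
    return int(np.argmax(v[s]))

def sarsa_update(s, a, s_next, a_next, at_goal):
    """Move v(s,a) toward 0.9*v(s',a') (moving target), or toward 1 at the goal."""
    target = 1.0 if at_goal else GAMMA * v[s_next, a_next]
    v[s, a] += LR * (target - v[s, a])
```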

Reinforcement learning input (state space): with a simple input the actor can decide "go right!", but with a complex input ("go right? go left?") it can't handle this.

Sensory input, reward, action. Complex input scenario: bars are controlled by the actions 'up', 'down', 'left', 'right'; reward is given if the horizontal bar is at a specific position.
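
A minimal sketch of a "bars" environment in this spirit, assuming a 12x12 image containing one horizontal and one vertical bar, actions that shift either bar, and reward when the horizontal bar reaches a fixed target row; the exact dynamics of the original experiment may differ:

```python
import numpy as np

SIZE, TARGET_ROW = 12, 0

class BarsWorld:
    def __init__(self):
        self.h_row = np.random.randint(SIZE)   # row of the horizontal (relevant) bar
        self.v_col = np.random.randint(SIZE)   # column of the vertical (distractor) bar

    def observe(self):
        img = np.zeros((SIZE, SIZE))
        img[self.h_row, :] = 1.0               # horizontal bar
        img[:, self.v_col] = 1.0               # vertical bar, irrelevant to the reward
        return img.ravel()                     # input vector I

    def step(self, action):                    # 0: up, 1: down, 2: left, 3: right
        if action == 0:
            self.h_row = max(self.h_row - 1, 0)
        elif action == 1:
            self.h_row = min(self.h_row + 1, SIZE - 1)
        elif action == 2:
            self.v_col = max(self.v_col - 1, 0)
        else:
            self.v_col = min(self.v_col + 1, SIZE - 1)
        reward = 1.0 if self.h_row == TARGET_ROW else 0.0
        return self.observe(), reward
```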

We need another layer (or layers) to pre-process the complex data: feature detection, then action selection. Network definition:
s = softmax(W I)
P(a = 1) = softmax(Q s)
v = a Q s
Here a is the action, s the state, I the input, and W, Q are weight matrices; the feature detectors (W) extract the position of the relevant bar, which encodes v.
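
A minimal sketch of this forward pass; the layer sizes and the action-sampling step are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Illustrative sizes: a 12x12 input image, 12 feature units, 4 actions.
n_input, n_state, n_actions = 144, 12, 4
W = 0.01 * np.random.rand(n_state, n_input)      # feature weights (non-negative)
Q = 0.01 * np.random.rand(n_actions, n_state)    # action/value weights

def forward(I):
    s = softmax(W @ I)                           # feature layer: s = softmax(W I)
    p_a = softmax(Q @ s)                         # action probabilities: P(a=1) = softmax(Q s)
    a = np.zeros(n_actions)
    a[np.random.choice(n_actions, p=p_a)] = 1.0  # sample one action as a one-hot vector
    v = a @ Q @ s                                # value of the chosen pair: v = a Q s
    return s, a, v
```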

Feature detection and action selection, network training: minimize the error with respect to the current target.
E = (0.9 v(s',a') − v(s,a))² = δ²
ΔQ ≈ dE/dQ = δ a s   (reinforcement learning)
ΔW ≈ dE/dW = δ Q s I + ε   (δ-modulated unsupervised learning)
with a the action, s the state, I the input, and W, Q the weight matrices.
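
A minimal sketch of one training step in this spirit, reusing W, Q and forward() from the sketch above; the precise outer-product form of the W update and the small ε term are assumptions where the slide only gives the shorthand "δ Q s I + ε":

```python
import numpy as np

LR, GAMMA, EPS_NOISE = 0.05, 0.9, 1e-4

def train_step(I, s, a, v, v_next, at_goal):
    """SARSA-style update of Q (RL) and δ-modulated update of W (unsupervised)."""
    global W, Q
    target = 1.0 if at_goal else GAMMA * v_next
    delta = target - v                               # TD error δ
    Q += LR * delta * np.outer(a, s)                 # ΔQ ≈ δ a s
    W += LR * delta * np.outer((Q.T @ a) * s, I)     # ΔW ≈ δ (Qᵀa ⊙ s) I  (assumed form)
    W += EPS_NOISE * np.random.rand(*W.shape)        # small ε term
    np.clip(W, 0.0, None, out=W)                     # non-negativity constraint on the weights
    np.clip(Q, 0.0, None, out=Q)
```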

Note: a non-negativity constraint is imposed on the weights. Network training: minimize the error with respect to the target V^π; identities used:

SARSA with WTA input layer
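
A minimal sketch of how the winner-take-all (WTA) input layer could look: the soft assignment s = softmax(W I) is replaced by a hard winner, which makes the feature-weight update local (only the winner's weights change, gated by δ); the exact form of this local rule is an assumption:

```python
import numpy as np

def wta(x):
    """Hard winner-take-all: a one-hot vector at the position of the maximum."""
    s = np.zeros_like(x)
    s[np.argmax(x)] = 1.0
    return s

def wta_train_step(I, a, delta, W, Q, lr=0.05):
    """δ-gated, local update: only the winning feature unit's weights change."""
    s = wta(W @ I)                      # hard assignment instead of softmax(W I)
    k = int(np.argmax(s))               # index of the winning feature unit
    Q += lr * delta * np.outer(a, s)    # SARSA update of the action weights
    W[k] += lr * delta * I              # local, δ-modulated update of the winner's row
    np.clip(W, 0.0, None, out=W)        # keep the feature weights non-negative
    return s
```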

Learning the 'short bars' data (figure: data, feature weights, RL action weights; reward and action indicated).

Short bars in a 12x12 input; average number of steps to the goal: 11.

Learning the 'long bars' data (figure: input data, feature weights, RL action weights; reward shown, 2 actions not shown).

Comparison of feature weights (panel titles): WTA with non-negative weights; SoftMax with non-negative weights; SoftMax with no weight constraints.

Models' background:
- Gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- Reward-modulated Hebb: Triesch, Neur Comp 19 (2007); Roelfsema & van Ooyen, Neur Comp 17 (2005); Franz & Triesch, ICDL (2007)
- Reward-modulated activity leads to input selection: Nakahara, Neur Comp 14 (2002)
- Reward-modulated STDP: Izhikevich, Cereb Cortex 17 (2007); Florian, Neur Comp 19/6 (2007); Farries & Fairhall, J Neurophysiol 98 (2007); ...
- RL models learn a partitioning of the input space: e.g. McCallum, PhD Thesis, Rochester, NY, USA (1996)

Discussion:
- Two-layer SARSA RL performs gradient descent on the value estimation error.
- The approximation with winner-take-all leads to a local rule with δ-feedback.
- The network learns only action-relevant features.
- Non-negative coding aids feature extraction.
- This establishes a link between unsupervised and reinforcement learning.
- A demonstration with more realistic data is still needed.

Sponsors: Bernstein Focus Neurotechnology, BMBF grant 01GQ0840; EU project “IM-CLeVeR”, call FP7-ICT; Frankfurt Institute for Advanced Studies, FIAS. Thank you...