ICO Learning
Gerhard Neumann, Seminar A, SS06
Overview
- Short overview of different control methods
- Correlation-based learning
- ISO learning
- Comparison to other methods ([Wörgötter05])
  - TD learning
  - STDP
- ICO learning ([Porr06])
- Learning receptive fields ([Kulvicius06])
Comparison of ISO Learning to other Methods
- Comparison for classical conditioning learning problems (open-loop control)
- Relating RL to classical conditioning
  - Classical conditioning: the pairing of two subsequent stimuli is learned, such that the presentation of the first stimulus is taken as a predictor of the second one
  - RL: maximization of rewards; v … predictor of future reward
RL for Classical Conditioning
- TD error, derivative term and weight change follow the standard TD formulation => nothing new so far…
- Goal: after learning, the output v should react to the onset of the CS x_n and remain active until the reward terminates
- Present the CS internally by a chain of n + 1 delayed pulses x_i
- Replace the states from traditional RL with time steps
RL for Classical Conditioning
- Special kind of e-trace: the serial compound representation
- Over the learning steps, v develops a rectangular response
- Special treatment of the reward is not necessary: x_0 can replace the reward when w_0 is set to 1 at the beginning
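A minimal runnable sketch of this setup, assuming the standard discrete-time TD error delta(t) = r(t) + gamma*v(t) - v(t-1) with v(t) = sum_i w_i x_i(t) and a one-step trace instead of a full eligibility trace; all names and constants are illustrative, not taken from the slides.

```python
import numpy as np

# Sketch: TD learning for classical conditioning with a serial compound
# representation (the CS is unrolled into a chain of delayed pulses x_i).
n_steps = 20                       # time steps per trial
cs_onset, us_onset = 5, 15         # CS onset and reward (US) time
gamma, mu = 1.0, 0.3               # discount factor and learning rate

# x[i, t] = 1 if the i-th delayed pulse of the CS is active at time t
x = np.zeros((n_steps, n_steps))
for i in range(n_steps - cs_onset):
    x[i, cs_onset + i] = 1.0

r = np.zeros(n_steps)
r[us_onset] = 1.0                  # the reward; could also be written as w_0 * x_0
w = np.zeros(n_steps)              # one weight per delayed pulse

for trial in range(200):
    for t in range(1, n_steps):
        v = x.T @ w                                # prediction v(t) = sum_i w_i x_i(t)
        delta = r[t] + gamma * v[t] - v[t - 1]     # TD error
        w += mu * delta * x[:, t - 1]              # update the pulse active at t-1

print(np.round(x.T @ w, 2))        # v rises at CS onset, stays up until the reward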
Comparison for Classical Conditioning
- Correlation-based learning: the "reward" x_0 is not an independent term as in TD learning
- TD learning
Comparison for Classical Conditioning
- TD learning
- ISO learning: uses another form of e-traces (band-pass filters)
  - Used for all input pathways -> also for calculating the output
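A rough sketch of the ISO rule as characterised here, with a damped resonator as the band-pass e-trace: every input is filtered, the output v is built from the same filtered traces, and the weights change as dw_j = mu * u_j * dv/dt. The filter parameters, pulse timings and learning rate are illustrative assumptions.

```python
import numpy as np

def resonator(f=0.01, q=0.6, n_taps=150):
    """Damped-oscillator impulse response used as a band-pass 'e-trace'."""
    t = np.arange(n_taps)
    return np.exp(-np.pi * f / q * t) * np.sin(2 * np.pi * f * t)

T, mu = 400, 0.01
h = resonator()
x0 = np.zeros(T); x0[60] = 1.0      # late (reflex / "reward"-like) input
x1 = np.zeros(T); x1[40] = 1.0      # early, predictive input
u0 = np.convolve(x0, h)[:T]         # band-pass filtered traces
u1 = np.convolve(x1, h)[:T]

w0, w1 = 1.0, 0.0                   # w0 stays fixed, w1 is learned
v_prev = 0.0
for t in range(T):
    v = w0 * u0[t] + w1 * u1[t]     # output built from the filtered traces
    w1 += mu * u1[t] * (v - v_prev) # ISO rule: dw1 = mu * u1 * dv/dt
    v_prev = v

print(w1)                           # > 0: x1 has become a predictor of x0
```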
Comparison for the Closed Loop
- Closed loop: actions of the agent affect future sensory input
- The comparison is no longer so easy, because the behavior of the algorithms is now quite different
- Reward-based architectures (Actor-Critic architecture)
  - Use evaluative feedback: reward maximization
  - A good reward signal is very often hard to find; in nature it is found by evolution
  - Can theoretically be applied to any learning problem
  - Resolution in the state space: only applicable to low-dimensional state spaces -> curse of dimensionality!
Comparison for the Closed Loop
- Correlation-based architectures
  - Non-evaluative feedback, all signals are value-free: minimize the disturbance
  - Valid regions are usually much bigger than for reward maximization -> better convergence!
  - Restricted solutions: the evaluation is implicitly built into the sign of the reaction behavior
  - Actor and critic are the same architectural building block
  - Only for a restricted set of learning problems; hard to apply to complex tasks
  - Resolution in time: only looks at the temporal correlations of the input variables
  - Can be applied to high-dimensional state spaces
Comparison of ISO Learning and STDP
- ISO learning generically produces a bimodal weight change curve
- Similar to the STDP (spike-timing dependent plasticity) weight change curve
- ISO learning vs. the STDP rule
  - Potential from the synapse: a filtered version of a spike
  - Gradient-dependent model
- A much faster time scale is used in STDP
- Different kinds of synapses can easily be modelled with different filters
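Sweeping the interval between the predictive and the reflex pulse in the same kind of setup traces out the bimodal weight change curve mentioned here; again a sketch with assumed filter parameters, evaluated in the small-learning-rate limit (w_1 held at 0 within one pulse pair).

```python
import numpy as np

def resonator(f=0.01, q=0.6, n_taps=150):
    t = np.arange(n_taps)
    return np.exp(-np.pi * f / q * t) * np.sin(2 * np.pi * f * t)

def weight_change(offset, h, T=600, mu=0.01):
    """Total ISO weight change for one pulse pair; offset > 0: x1 precedes x0."""
    x0 = np.zeros(T); x0[300] = 1.0
    x1 = np.zeros(T); x1[300 - offset] = 1.0
    u0 = np.convolve(x0, h)[:T]
    u1 = np.convolve(x1, h)[:T]
    return mu * np.sum(u1[1:] * np.diff(u0))   # v ~ u0 for w0 = 1, w1 ~ 0

h = resonator()
for off in (-40, -20, 20, 40):
    print(off, weight_change(off, h))
# Positive change when x1 precedes x0, negative for the reverse order:
# a bimodal curve qualitatively similar to the STDP window, but on a much
# slower time scale.
```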
Overview
- Short overview of different control methods
- Correlation-based learning
- ISO learning
- Comparison to other methods ([Wörgötter05])
  - TD learning
  - STDP
- ICO learning ([Porr06])
- Learning receptive fields ([Kulvicius06])
ICO (Input Correlation Only) Learning
- Drawback of Hebbian learning: the auto-correlation term can cause divergence even if x_0 = 0
- ISO learning relies on the filters of the different inputs being orthogonal, and on each filter being orthogonal to its own derivative
  - This only holds if a steady state is assumed
  - The auto-correlation no longer vanishes if the weights change during the impulse response of the filters -> cannot be applied with large learning rates
- => Can only be used with small learning rates, otherwise the auto-correlation causes divergence of the weights
ICO & ISO Learning
- ISO learning
- ICO learning
Simple Adaptation of the ISO Learning Rule
- Correlate only the inputs with each other
  - No correlation with the output -> no auto-correlation
  - Define one input as the reflex input x_0
- Drawback: loss of generality
  - Not isotropic any more; not all inputs are treated equally
- Advantages:
  - Much higher learning rates can be used (up to 100x faster)
  - Almost arbitrary types of filter can be used
  - No divergence of the weights any more
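A compact way to see the difference between the two rules on the same filtered traces u (with u[0] the filtered reflex input x_0): ICO replaces the derivative of the output by the derivative of the filtered reflex input, so no weight appears inside its own update. This is an illustrative formulation, not code from [Porr06].

```python
import numpy as np

def iso_update(w, u, u_prev, mu):
    """ISO: dw_j = mu * u_j * dv/dt, with v = w . u (the output contains w,
    so every weight correlates with itself -> auto-correlation terms)."""
    dv = w @ u - w @ u_prev
    return w + mu * u * dv

def ico_update(w, u, u_prev, mu):
    """ICO: dw_j = mu * u_j * du_0/dt for j > 0; only the filtered reflex
    input u_0 drives the update, so there is no auto-correlation and the
    reflex weight w_0 stays fixed."""
    du0 = u[0] - u_prev[0]
    w = w.copy()
    w[1:] += mu * u[1:] * du0
    return w

# One illustrative step:
w = np.array([1.0, 0.0, 0.0])       # w[0] = fixed reflex weight
u_prev = np.zeros(3)
u = np.array([0.2, 0.5, 0.1])
print(ico_update(w, u, u_prev, mu=0.1))
```

Removing the output from the rule removes the u_j * du_j terms, which is why the divergence at high learning rates disappears.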
ICO Learning
- Weight change curve (open loop, just one filter bank): same as for ISO learning
- Weight development over time: ISO learning shows exponential instability, even when x_0 is set to 0 after 100000 time steps
ICO Learning: Closing the Loop
- The output of the learner, v, feeds back to its inputs x_j after being modified by the environment
- Reactive pathway: fixed reactive feedback control
- Learning goal: learn an earlier reaction that keeps x_0 (the disturbance or error signal) at 0
- One can prove that, under simplified conditions (one filter bank, impulse signals, using the Z-transform), one-shot learning is possible
ICO Learning: Applications
- Simulated robot experiment: the robot has to find food (disks in the environment)
- Sensors for the unconditioned stimulus: 2 touch sensors (left + right)
  - Reflex: the robot elicits a sharp turn as it touches a disk, which pulls it into the centre of the disk
- Sensors for the predictive stimulus: 2 sound (distance) sensors (left + right)
  - Can measure the distance to the disks
  - Stimulus: difference between the left and right sound signals
- Use 5 filters (resonators) in the filter bank
- Output v: steering angle of the robot
ICO Learning: Simulated Robot
- A single experience was sufficient to produce adapted behavior
- This is only possible with ICO learning
Simulated Robot
- Comparison for different learning rates: ICO learning vs. ISO learning
- Learning was judged successful over a sequence of four contacts
- The two rules are equivalent for small learning rates (small auto-correlation term)
Simulated Robot
- Two different learning rates
- Divergent behavior of ISO learning for high learning rates: the robot shows avoidance behavior towards the food disks
Applications Continued
- More complex task: three food disks simultaneously
- No simple relationship between the reflex input and the predictive input any more (superimposed sound fields)
- This is only learned by ICO learning, not by ISO learning
ICO: Real Robot Application
- Real robot: target a white disk from a distance
- Reflex: pulls the robot onto the white disk just at the moment the robot drives over it
  - Achieved by analysing the bottom scanline of a camera image
- Predictive input: analysing a scanline from the top of the image
- Filter bank: 5 FIR filters of different lengths, all coefficients set to 1 -> smear out the signal
- Narrow viewing angle of the camera: the robot is placed more or less in front of the disk
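A small sketch of such a filter bank: FIR filters of different lengths with all coefficients equal to 1, which simply smear the predictive signal out over different time windows. The concrete lengths are assumptions, not values from the slides.

```python
import numpy as np

def fir_bank(x, lengths=(5, 10, 20, 40, 80)):
    """Return one smeared-out copy of x per all-ones FIR filter length."""
    return [np.convolve(x, np.ones(n), mode="full")[:len(x)] for n in lengths]

x = np.zeros(200); x[50] = 1.0           # an impulse-like predictive event
u = fir_bank(x)                           # 5 traces of increasing duration
print([float(tr.sum()) for tr in u])      # each filter holds the event for n steps
```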
ICO: Real Robot Experiment
- Processing the input: calculate the deviation of the positions of all white points in a scanline from the centre of the scanline -> a 1D signal
- Results: A (before learning), B & C (after learning, 14 contacts)
- The weights oscillate around their best values, but do not diverge
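One possible reading of this preprocessing step in code; the white-pixel threshold and the sign convention are assumptions, only the idea of a signed mean deviation from the scanline centre comes from the slide.

```python
import numpy as np

def scanline_deviation(scanline, white_threshold=200):
    """Collapse one scanline into a single signed deviation value."""
    cols = np.where(scanline > white_threshold)[0]    # positions of white pixels
    if cols.size == 0:
        return 0.0                                    # no target visible
    centre = (len(scanline) - 1) / 2.0
    return float(np.mean(cols - centre))              # signed mean deviation

row = np.zeros(64); row[40:46] = 255                  # white blob right of centre
print(scanline_deviation(row))                        # positive -> steer right (by convention)
```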
ICO Learning: Other Applications
- Mechanical arm: the arm is always controlled by a PI controller towards a specified set point
  - Input of the PI controller: the motor position
  - The PI controller is used as the reactive filter
- Disturbance: the pushing force of a second, small arm mounted on the main arm
  - A fast-reacting touch sensor measures the disturbance
- Use 10 resonator filters in the filter bank
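The reactive pathway here is an ordinary PI controller; a minimal sketch with illustrative gains and time step, producing the fixed reflex reaction that ICO learning then shifts earlier in time.

```python
class PIController:
    """Minimal PI controller used as the fixed reactive pathway."""
    def __init__(self, kp=2.0, ki=0.5, dt=0.01):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def step(self, setpoint, measured):
        error = setpoint - measured          # x_0: the error / disturbance signal
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral

ctrl = PIController()
print(ctrl.step(setpoint=1.0, measured=0.8))
```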
ICO Learning: Other Applications
- Result: the control is shifted backwards in time; the error signal (deviation from the set point) almost vanishes
- Other example: temperature control, predicting temperature changes caused by another heater
Overview
- Short overview of different control methods
- Correlation-based learning
- ISO learning
- Comparison to other methods ([Wörgötter05])
  - TD learning
  - STDP
- ICO learning ([Porr06])
- Learning receptive fields ([Kulvicius06])
Development of Receptive Fields through Temporal Sequence Learning [Kulvicius06]
- Develop receptive fields by ICO learning
- Learn the behavior and the receptive fields simultaneously
  - Usually these two learning processes are considered separately
  - First approach where the receptive field and the behavior are trained simultaneously!
- Shows the application of ICO learning to high-dimensional input spaces
Line Following
- System: the robot should learn to better follow a line painted on the ground
- Reactive input x_0: pixels at the bottom of the image
- Predictive input x_1: pixels in the middle of the image
- Use 10 different filters (resonators) in the filter bank
- Reflexive output: brings the robot back to the line, but not with a smooth behavior
- Motor output: constant speed S, with v modifying the speed and steering of the robot
- Use left-right symmetry
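One plausible reading of the motor output for a differential-drive robot: a constant forward speed with the learned output v applied antisymmetrically to the two wheel speeds for steering. The exact mapping and sign convention are assumptions.

```python
def motor_command(v, base_speed=1.0):
    """Left and right wheel speeds: constant forward speed S = base_speed,
    steering by the learned output v; left-right symmetry means the same
    learned signal is applied with opposite signs to the two sides."""
    return base_speed - v, base_speed + v

print(motor_command(0.2))   # (0.8, 1.2): turn while keeping the average speed
```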
Line Following: Simple System
- Fixed sensor banks, all pixels are summed up
- Input x_1 predicts x_0
Line Following
- Three different tracks: steep, shallow, sharp
- For one learning experiment, always the same track is used
- The robot steers much more smoothly; usually 1 trial is enough for learning
- Videos: without learning, steep track, sharp track
Line Following: Receptive Fields
- Use 225 pixels for the far sensors
- Use an individual filter bank for each pixel (10 filters per pixel)
- Left-right symmetry: the left receptive field is a mirror of the right one
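A shape-level sketch of this setup: one bank of 10 filters per far pixel, an ICO update driven by the derivative of the filtered reflex signal, and the plotted receptive field taken as the per-pixel sum of weights. The 15 x 15 layout of the 225 pixels and all constants are assumptions.

```python
import numpy as np

n_pixels, n_filters = 225, 10
w = np.zeros((n_pixels, n_filters))           # one weight per pixel and filter

def ico_pixel_update(w, u, du0, mu=1e-4):
    """One ICO step for the whole pixel array.
    u: filtered predictive traces, shape (n_pixels, n_filters);
    du0: derivative of the filtered reflex signal at this time step (scalar)."""
    return w + mu * u * du0

# Example step with random traces, just to show the shapes involved:
u = np.random.rand(n_pixels, n_filters)
w = ico_pixel_update(w, u, du0=0.05)

receptive_field = w.sum(axis=1).reshape(15, 15)   # assuming a 15 x 15 patch
rf_left = np.fliplr(receptive_field)              # left field as mirror of the right
print(receptive_field.shape, rf_left.shape)
```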
Line Following: Receptive Fields
- Results: lower learning rates have to be used, and more trials are needed (3 to 6)
- Different receptive fields are learned for different tracks
- Plots (steep and sharp track) show the sum of all filter weights for each pixel
Conclusion
- Correlation-based learning tries to minimize the influence of disturbances
  - Easier to learn than reinforcement learning, but the framework is less general
- Questions:
  - When should correlation-based learning be applied, and when reinforcement learning?
  - How is it done by animals/humans?
  - How can the two methods be combined? E.g. correlation learning in an early learning stage, RL for fine-tuning
- ICO learning: an improvement of ISO learning
  - More stable, higher learning rates can be used
  - One-shot learning is possible
Literature
[Wörgötter05]: F. Wörgötter and B. Porr, Temporal Sequence Learning, Prediction and Control: A Review of Different Control Methods and Their Relation to Biological Mechanisms
[Porr03]: B. Porr and F. Wörgötter, Isotropic Sequence Order Learning
[Porr06]: B. Porr and F. Wörgötter, Strongly Improved Stability and Faster Convergence of Temporal Sequence Learning by Utilising Input Correlations Only
[Kulvicius06]: T. Kulvicius, B. Porr and F. Wörgötter, Behaviourally Guided Development of Primary and Secondary Receptive Fields through Temporal Sequence Learning