Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events.

Slides:



Advertisements
Similar presentations
Introductory Control Theory I400/B659: Intelligent robotics Kris Hauser.
Advertisements

Neural Networks  A neural network is a network of simulated neurons that can be used to recognize instances of patterns. NNs learn by searching through.
Motor systems1 ACTIVE SENSING Lecture 2: Motor systems.
LECTURE#08 PROCESS CONTROL STRATEGIES
How a Stimulus Elicits a Response
Nervous System GCSE Science Chapter 2.
Reflexes Chapter 19 KINE 3301 Biomechanics of Human Movement.
Muscle-Reflex Model That Encodes Principles of Legged Mechanics Produces Human Walking Dynamics and Muscle Activities 서울대학교 이경호
B.Macukow 1 Lecture 3 Neural Networks. B.Macukow 2 Principles to which the nervous system works.
The Proportional-Integral-Derivative Controller
Artificial Neural Networks - Introduction -
Artificial Neural Networks - Introduction -
* Operational Amplifiers * Op-Amp Circuits * Op-Amp Analysis
ME 746 Spring Dynamic Models Differential Equations in State-Variable Form.
CS274 Spring 01 Lecture 5 Copyright © Mark Meyer Lecture V Higher Level Motion Control CS274: Computer Animation and Simulation.
Open loop vs closed loop By Norbert Benei ZI5A58.
Ch7 Operational Amplifiers and Op Amp Circuits
Lecture 5: PID Control.
Perspectives on Walking in an Environment Işık Barış Fidaner BM 526 Project.
By Po-Han Chen  Creatures on the world can feel the surrounding in order to adapt the environment for survival. Animals relies on Endocrine system and.
Stimuli and Response-Notes
ICO Learning Gerhard Neumann Seminar A, SS06. Overview Short Overview of different control methods Correlation Based Learning ISO Learning Comparison.
Nervous system and Integumentary System (skin)
The Nervous System and the Brain Information in this presentation is taken from UCCP content.
Ch. 6 Single Variable Control
System & Control Control theory is an interdisciplinary branch of engineering and mathematics, that deals with the behavior of dynamical systems. The desired.
Gait development in children. The prerequisite for Gait development Adequate motor control. C.N.S. maturation. Adequate R.O.M. Muscle strength. Appropriate.
20/10/2009 IVR Herrmann IVR: Introduction to Control OVERVIEW Control systems Transformations Simple control algorithms.
Introduction to Computer Vision and Robotics: Motion Generation
System/Plant/Process (Transfer function) Output Input The relationship between the input and output are mentioned in terms of transfer function, which.
Nervous System. Learning Outcomes Understand the role of the Nervous System Understand what Stimuli, Receptors and Effectors are Understand what the role.
ENT364/4 – Control System Sazali Yaacob BEng(Malaya), MSc(Surrey), PhD(Sheffield) Chartered Engineer, CEng (United Kingdom) Member Institute of Engineers.
Spike timing dependent plasticity - STDP Markram et. al ms -10 ms Pre before Post: LTP Post before Pre: LTD.
20/10/2009 IVR Herrmann IVR:Control Theory OVERVIEW Control problems Kinematics Examples of control in a physical system A simple approach to kinematic.
Spinal Control of Movement Lesson 19. Anatomy n Ventral Spinal Cord l Topographic organization n Alpha motor neurons n Spinal interneurons n Striate muscle.
Chapter 4. Angle Modulation. 4.7 Generation of FM Waves Direct Method –A sinusoidal oscillator, with one of the reactive elements in the tank circuit.
Simulating Balance Recovery Responses to Trips Based on Biomechanical Principles Takaaki Shiratori 1,2 Rakié Cham 3 Brooke Coley 3 Jessica K. Hodgins 1,2.
Pathways for Motor Control and Learning. Spinal Cord: The stretch reflex Maintain stability.
By: Lee Vang. First attempt to make a humanoid Robot by Honda was in 1986 (Model E0) History:
Feedback Control System
Introduction to Operational Amplifiers
Introduction to Biped Walking
Review: Neural Network Control of Robot Manipulators; Frank L. Lewis; 1996.
Control systems KON-C2004 Mechatronics Basics Tapio Lantela, Nov 5th, 2015.
Artificial Neural Networks Students: Albu Alexandru Deaconescu Ionu.
1 Introduction to Control System CHAPTER 1. 2   Basic terminologies.   Open-loop and closed-loop.   Block diagrams.   Control structure. . 
Functional Organization of Nervous Tissue Chapter 11
Lecture 25: Implementation Complicating factors Control design without a model Implementation of control algorithms ME 431, Lecture 25.
FEEDBACK CONTROL SYSTEMS
Lecture 5 Neural Control
Neural Networks Presented by M. Abbasi Course lecturer: Dr.Tohidkhah.
PROSTHETIC LEG PRESENTED BY:-AWAIS IJAZ HASNAT AHMED KHAN.
Robot Project by Ahmad Shtaiyat Supervised by Dr. Salem Al-Agtash.
How a Stimulus Elicits a Response
Stryker Interaction Design Workshop September 7-8, January 2006 Functional biomimesis * Compliant Sagittal Rotary Joint Active Thrusting Force *[Cham.
Robot Intelligence Technology Lab. 10. Complex Hardware Morphologies: Walking Machines Presented by In-Won Park
1. What are your 5 senses? 2. Give an example of a stimulus for each one of your senses. (stimulus = something you can sense) Example: Hearing  Listening.
The Neuron Building Blocks of our Nervous System.
Intro. ANN & Fuzzy Systems Lecture 3 Basic Definitions of ANN.
הטכניון - מ.ט.ל. הפקולטה להנדסת חשמל - אביב תשס"ה
Memory Network Maintenance Using Spike-Timing Dependent Plasticity David Jangraw, ELE ’07 Advisor: John Hopfield, Department of Molecular Biology 12 T.
Questions of the Day “ WHAT MAKES YOU WHO YOU ARE ? ” Why are you different from Everyone Else” What makes you so similar?
Neural Networks.
Closed Loop Temporal Sequence Learning
PID Controllers Jordan smallwood.
Differential Hebbian Learning – Introducing Temporal Asymmetry
How a Stimulus Elicits a Response
Nervous system.
Presentation transcript:

Closed Loop Temporal Sequence Learning Learning to act in response to sequences of sensor events

Overview over different methods You are here !

Real Life Heat radiation predicts pain when touching a hot surface. Sound of a prey may precede its smell will precede its taste. Control Force precedes position change. Electrical (disturbance) pulse may precede change in a controlled plant. Psychological Experiment Pavlovian and/or Operant Conditioning: Bell precedes Food. Natural Temporal Sequences in life and in “control” situations

  Early: “Bell” Late: “Food” x What we had so far !

This is an open-loop system Sensor 2 conditioned Input BellFood Salivation Pavlov, 1927 Temporal Sequence This is an Open Loop System !

Adaptable Neuron Env. Closed loop Sensing Behaving

How to assure behavioral & learning convergence ?? This is achieved by starting with a stable reflex-like action and learning to supercede it by an anticipatory action. Remove before being hit !

Robot Application   x Early: “Vision” Late: “Bump”

Reflex Only (Compare to an electronic closed loop controller!) This structure assures initial (behavioral) stability (“homeostasis”) Think of a Thermostat !

This is a closed-loop system before learning The Basic Control Structure Schematic diagram of A pure reflex loop Bump Retraction reflex

Learning Goal: Correlate the vision signals with the touch signals and navigate without collisions. Initially built-in behavior: Retraction reaction whenever an obstacle is touched. Robot Application

Lets look at the Reflex first

This is an open-loop system This is a closed-loop system before learning This is a closed-loop system during learning Closed Loop TS Learning The T represents the temporal delay between vision and bump.

Lets see how learning continues

Closed Loop TS Learning Antomically this loop still exists but ideally it should never be active again ! This is the system after learning

What has the learning done ? Elimination of the late reflex input corresponds mathematically to the condition: X 0 =0 This was the major condition of convergence for the ISO/ICO rules. ( The other one was T=0). What’s interesting about this is that the learning-induced behavior self-generates this condition. This is non-trivial as it assures the synaptic weight AND behavior stabilize at exactly the same moment in time.

This is the system after learning. This is the system after learning What has happened in engineering terms? The inner pathway has now become a pure feed-forward path

Formally: Delta pulse Disturbance D Calculations in the Laplace Domain!

Phase Shift The Learner has learned the inverse transfer function of the world and can compensate the disturbance therefore at the summation node! D P1P1 -e -sT e -sT 0

The out path has, through learning, built a Forward Model of the inner path. This is some kind of a “holy grail” for an engineer called: “Model Free Feed-Forward Compensation” For example: If you want to keep the temperature in an outdoors container constant, you can employ a thermostat. You may however also first measure how the sun warms the liquid in the container, relating outside temperature to inside temperature (“Eichkurve”). From this curve you can extract a control signal for the cooling of the liquid and react BEFORE the liquid inside starts warming up. As you react BEFORE, this is called Feed-Forward-Compensation (FFC). As you have used a curve, have performed Model-Based FFC. Our robot learns the model, hence we do Model Free FFC.

Example of ISO instability as compared to ICO stability ! Remember the Auto-Correlation Instability of ISO?? =  u 1 v’ dd dt =  u 1 u 0 ’ dd dt ICO ISO

Old ISO-Learning

New ICO-Learning

Full compensation One-shot learning Over compensation

Statistical Analysis: Conventional differential hebbian learning One-shot learning (some instances) Highly unstable

Stable in a wide domain One-shot learning (common) Statistical Analysis: Input Correlation Learning

Stable learning possible with about fold higher learning rate

Two more applications: RunBot With different convergent conditions! Walking is a very difficult problem as such and requires a good, multi-layered control structure only on top of which learning can take place.

An excursion into Multi-Layered Neural Network Control Some statements: Animals are behaving agents. They receive sensor inputs from MANY sensors and they have to control the reaction of MANY muscles. This is a terrific problem and solved by out brain through many control layers. Attempts to achieve this with robots have so far been not very successful as network control is somewhat fuzzy and unpredictable.

Central Control (Brain) Spinal Cord (Oscillations ) Motorneurons & Sensors (Reflex generation) Muscles & Skeleton (Biomechanics) Muscle length Ground- contact Step control Terrain control Adaptive Control during Waliking Three Loops

Muscles & Skeleton (Biomechanics) Self-Stabilization by passive Biomechanical Properties (Pendelum)

Passive Walking Properties

Motor neurons & Sensors (Reflex generation) Muscles & Skeleton (Biomechanics) Muscle length RunBot’s Network of the lowest loop

Leg Control of RunBot Reflexive Control (cf. Cruse)

Instantaneous Parameter Switching

Body (UBC) Control of RunBot

Central Control (Brain) Terrain control Long-Loop Reflex of one of the upper loops Before or during a fall: Leaning forward of rump and arms

Central Control (Brain) Terrain control Long-Loop Reflexe der obersten Schleife Forward leaning UBC Backward leaning UBC

The Leg Control as Target of the Learning

Learning in RunBot Differential Hebbian Learning related to Spike-Timing Dependent Plasticity

RunBot: Learning to climb a slope Spektrum der Wissenschaft, Sept. 06

Human walking versus machine walking ASIMO by Honda

Lower floor Upper floor Ramp Falls Manages RunBot’s Joints

Change of Gait when walking upwards

Too lateEarly enough X X 0 =0: Weight stabilize together with the behaviour Learning relevant parameters

AL, (AR) Stretch receptor for anterior angle of left (right) hip; GL, (GR) Sensor neuron for ground contact of left (right) foot; EI (FI) Extensor (Flexor) reflex inter-neuron, EM (FM) Extensor (Flexor) reflex motor-neuron; ES (FS) Extensor (Flexor) reflex sensor neuron A different Learning Control Circuitry

Motor Neuron Signal Structure (spike rate) x0x0 x1x1 Left Leg Right Leg Learning reverses pulse sequence  Stability of STDP

Motor Neuron Signal Structure (spike rate) x0x0 x1x1 Left Leg Right Leg Learning reverses pulse sequence  Stability of STDP

The actual Experiment Note: Here our convergence condition has been T=0 (oscillatory!) and NOT as usual X 0 =0

Speed Change by Learning Note the Oscillations ! T ≈ 0 !

Stable learning possible with about fold higher learning rate =  u 1 v’ dd dt =  u 1 u 0 ’ dd dt

Driving School: Learn to follow a track and develop RFs in a closed loop behavioral context

Central Features: Filtering of all inputs with low-pass filters (membrane property) Reproduces asymmetric weight change curve Derivative of the Output Filtered Input ISO-learning for temporal sequence learning: (Porr and Wörgötter, 2003)  T Differential Hebbian learning used for STDP

Differential Hebbian Learning rule: Transfer of a computational neuroscience approach to a technical domain. Modelling STDP (plasticity) and using this approach (in a simplified way) to control behavioral learning. An ecological approach to some aspects of theoretical neuroscience

This is a closed-loop system before learning This is a closed-loop system during learning Adding a secondary loop This delay constitutes the temporal sequence of sensor events ! Pavlov in “real life” !

What has happened during learning to the system ? The primary reflex re-action has effectively been eliminated and replaced by an anticipatory action