Robot Learning
Jeremy Wyatt
School of Computer Science
University of Birmingham

Plan
Why and when
What we can do
– Learning how to act
– Learning maps
– Evolutionary Robotics
How we do it
– Supervised Learning
– Learning from punishments and rewards
– Unsupervised Learning

Learning How to Act
What can we do?
– Reaching
– Road following
– Box pushing
– Wall following
– Pole-balancing
– Stick juggling
– Walking

Learning How to Act: Reaching
We can learn from reinforcement or from a teacher (supervised learning)
Reinforcement Learning:
– Action: Move your arm
– You received a reward of 2.1
Supervised Learning:
– Action: Move your hand to …
– You should have moved to (x,y,z)
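The difference between the two feedback signals can be sketched as follows (a minimal illustration, not from the slides: the negative-distance reward is a hypothetical choice of reward function):

```python
import numpy as np

# Supervised learning: the teacher supplies the correct hand position,
# so the learner receives a full error vector to adjust from.
def supervised_feedback(correct_xyz, actual_xyz):
    return np.asarray(correct_xyz) - np.asarray(actual_xyz)

# Reinforcement learning: the environment returns only a scalar reward.
# Here, hypothetically, the reward is the negative distance to the goal.
def reinforcement_feedback(goal_xyz, actual_xyz):
    return -float(np.linalg.norm(np.asarray(goal_xyz) - np.asarray(actual_xyz)))
```

The supervised signal says *what* the right answer was; the reinforcement signal only says *how well* the chosen action worked.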

Learning How to Act: Driving
ALVINN: learned to drive in 5 minutes
Learns to copy the human response
Feedforward multilayer neural network
Steering wheel position

Learning How to Act: Driving
Network outputs form a Gaussian
Mean encodes the driving direction
Compare with the “correct” human action
Compute error for each unit given the desired Gaussian
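The Gaussian output encoding can be sketched like this (a sketch only: the number of output units and the bump width are illustrative assumptions, not ALVINN's actual parameters):

```python
import numpy as np

def desired_outputs(human_unit, n_units=30, width=2.0):
    """Desired activation pattern: a Gaussian bump over the output units,
    centred on the unit matching the human's steering direction."""
    units = np.arange(n_units)
    return np.exp(-0.5 * ((units - human_unit) / width) ** 2)

def per_unit_error(actual, human_unit):
    """Error for each output unit, given the desired Gaussian."""
    return desired_outputs(human_unit, len(actual)) - actual
```

Each output unit then gets its own training error, rather than training only the single unit nearest the human's steering angle.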

Learning How to Act: Driving
The distribution of training examples from on-the-fly learning causes problems
The network doesn’t see how to cope with misalignments
The network can forget if it doesn’t see a situation for a while
Answer: generate new examples from the on-the-fly images

Learning How to Act: Driving
Use camera geometry to assess the new field of view
Fill in using information about road structure
Transform the target steering direction
Present as a new training example
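One way to picture this example-generation step (a rough sketch: `np.roll` is a crude stand-in for the real camera-geometry transform, and the pixel-to-steering conversion factor is hypothetical):

```python
import numpy as np

def make_shifted_example(image, steering, shift_px, px_to_steering=0.01):
    """Synthesise a training example as if the camera were misaligned:
    shift the image sideways, then offset the steering target so it
    would bring the vehicle back on course."""
    shifted = np.roll(image, shift_px, axis=1)  # stand-in for the geometric transform
    corrected = steering + shift_px * px_to_steering
    return shifted, corrected
```

Each real image can thus yield several synthetic misaligned views, covering situations the human driver never produced.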

Learning How to Act: Driving

Learning How to Act
Obelix learns to push boxes
Reinforcement Learning

What is Reinforcement Learning?
Learning from punishments and rewards
Agent moves through world, observing states and rewards
Adapts its behaviour to maximise some function of reward
[Figure: a trajectory of states s1, s2, …, actions a1, a2, …, and rewards r1, r2, …]

Return: Long-term Performance
Let’s assume our agent acts according to some rules, called a policy, π
The return R_t is a measure of long-term reward collected after time t
The expected return for a state-action pair is called a Q value, Q(s,a)
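Written out in standard notation (the usual discounted-return formulation; the discount factor γ is not stated on the slide):

```latex
R_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad 0 \le \gamma < 1
```

```latex
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[\, R_t \mid s_t = s,\; a_t = a \,\right]
```

So Q(s,a) is the long-term reward the agent can expect if it takes action a in state s and follows policy π thereafter.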

One-step Q-learning
Guess how good state-action pairs are
Take an action
Watch the new state and reward
Update the state-action value
[Figure: agent in state s_t takes action a_t, observes reward r_{t+1} and new state s_{t+1}]
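The guess–act–observe–update loop above can be sketched in Python (a minimal sketch: the tabular Q representation and the values α = 0.1, γ = 0.9 are illustrative assumptions, not taken from the slides):

```python
from collections import defaultdict

def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # current guess of the best follow-on value
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

# Guess (Q starts at zero), act, observe reward and next state, update:
Q = defaultdict(float)
q_learning_step(Q, s=0, a="push", r=2.1, s_next=1, actions=["push", "turn"])
```

Because Q starts at zero, the first update simply moves Q(0, "push") a fraction α towards the observed reward 2.1.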

Obelix
Won’t converge with a single controller
Works if you divide it into behaviours
But …

Evolutionary Robotics

Learning Maps