
Robotica Lecture 13

Lecture Outline  What is learning?  What is adaptation?  Why should robots learn?  Why is learning in robots hard?  Learning types and methods  Reinforcement learning  credit assignment  basic model  examples

Adaptation v. Learning  Learning produces changes within an organism that, over time, enable it to perform more effectively within its environment.  Adaptation is learning through making adjustments in order to become better attuned to its environment.  different time scales: acclimatization (slow) v. homeostasis (rapid)  different levels: genotypic v. phenotypic

Types of Adaptation (McFarland)  Behavioral - behaviors are adjusted relative to each other  Evolutionary - descendants are based on ancestors' performance over long time scales  Sensory - sensors become more attuned to the environment  Learning as adaptation - anything else that results in a more ecologically fit agent

Adaptive Control  Åström, 1950s

Importance of Learning  Learning is more than just adaptation  Learning, the ability to improve one's performance over time, is considered the main hallmark of intelligence, and the greatest challenge of AI.  Learning is particularly difficult to achieve in physical robots, for all the reasons that make intelligent behavior in the physical world difficult.

What Learning Enables  Generalizing concepts  Specializing concepts  Reorganizing information  Introducing new knowledge (facts, behaviors, rules) into the system  Creating or discovering new concepts  Creating explanations  Reusing past experiences

Why Learn in Robots?  What is the purpose of learning in robots?  1) providing the robot with the ability to adapt to changes in its task and/or environment  2) providing the robot with the ability to improve performance  3) providing the robot designer with a way of automating the design and/or programming of the robot

What Can be Done?  automatically design the robot's body  automatically design a physical network for its processor  automatically generate its behaviors  automatically store and re-use its previously executed plans  automatically improve the way its layers interact  automatically tune the parameters within the behaviors

What Has Been Done?  Parts of robot bodies, brains (i.e., processors), and programs have been automatically generated (i.e., learned).  Robots given initial programs have used experience and trial & error to improve the programs (from parameter tuning to switching entire behaviors)  Robots programmed for a task have adapted to changes in the environment (e.g., new obstacles, new maps, heavier loads, new goals)

Challenges in Robots  Situatedness in the world  Noise, occlusion, dynamics, hard to model  Real-time constraints  Slow learners die easily in the real world  Simultaneous and multi-modal  Multiple goals & tasks  Need to walk and talk (not just one or the other)

Challenges in Learning  Saliency: what is relevant right now?  Credit assignment: who is to receive credit/blame for the outcome?  New term: when should a new concept/representation be created?  Indexing: how to organize the memory?  Utility: what should be forgotten?

Levels of Learning  Within a behavior  Suitable responses  Suitable stimulus  Suitable behavioral functional mapping  Magnitude of response (gain)  Whole new behavior  Within a behavior assemblage  suitable set of behaviors  relative strengths  suitable coordination function

Learning Methods  Reinforcement learning  Neural network (connectionist) learning  Evolutionary learning  Learning from experience  memory-based  case-based  Inductive learning  Explanation-based learning  Multistrategy learning

Types of Learning  Numeric or symbolic  numeric: manipulate numeric functions  symbolic: manipulate symbolic representations  Inductive or deductive  inductive: generalize from examples  deductive: optimize what is known  Continuous or batch  continuous: during interaction w/ world  batch: after interaction, all at once

Some Terminology  Reward/punishment  Positive/negative “feedback”  Cost Function/Performance Metric  Scalar (usually) goodness measure  Induction  Generating a function (a hypothesis) that approximates the observed examples  Teacher, critic  Provides feedback

More Terminology  Plant/Model  System/Agent that we want to train  Convergence  reaching a desired (or steady) state  Credit assignment problem  who should get the credit/blame?  hard to tell over time  hard to tell in multi-robot systems

Reinforcement Learning  Reinforcement learning is currently the most popular approach to learning on mobile robots.  Reinforcement learning is inspired by conditioning in psychology  Law of effect (Thorndike 1911):  Applying a reward immediately after the occurrence of a response increases its probability of reoccurring...

Thorndike's Law of Effect  Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond.

Reinforcement Learning ...while providing punishment after the response will decrease the probability.  Translated to robotics:  Some combinations of stimuli (i.e., sensory readings and/or state) and responses (i.e., actions/behaviors) are coupled with subsequent reward in order to increase their probability of future use.

Reinforcement Learning  Desirable responses or outcomes are positively reinforced (rewarded) and thus strengthened, while undesirable ones are negatively reinforced (punished) and thus weakened.  This very general notion can be translated into a variety of specific reinforcement learning algorithms.  Reinforcement learning is a set of learning problems, not algorithms!

Challenges in RL  Learning from delayed rewards: the problem is difficult if the feedback is not immediate  Credit assignment problem: when something good or bad happens, what exact state/condition-action/behavior should be rewarded or punished?  Common approach: use the expected value of exponentially weighted past/future reinforcement
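The forward-looking half of that common approach is usually formalized as the exponentially discounted return; a sketch in standard textbook notation (the symbols R_t, r, and the discount factor γ follow the usual RL convention and are not defined on these slides):

\[
R_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^2\, r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k\, r_{t+k+1}, \qquad 0 \le \gamma < 1
\]

With γ near 0 the agent is myopic (only immediate reinforcement matters); with γ near 1, distant reinforcement weighs almost as much as immediate reinforcement.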

RL Algorithms  RL algorithms prescribe exact mathematical functions that associate states/situations, actions/behaviors, reinforcement, and various associated parameters.  Some of these algorithms (Q-learning, TD learning, etc.) have well-understood convergence properties; they are guaranteed to make the robot learn the optimal solution.

Optimality in Learning  Optimality depends on strong assumptions: the robot must be given infinitely many trials of each state/action combination, must know what state it is in and what action it has executed, and the world must not change too quickly  But: There is not enough time for infinite trials, outcomes of actions are uncertain, and the world can change.  => Optimality is impractical.

Observability  The world of a robot is partially observable; the robot does not know exactly what state/situation it is in at all times  Learning algorithms have been developed for partially observable environments, and while they are more realistic, they require even more time to converge. In general, managing uncertainty is very hard in learning.

Unsupervised Learning  RL is a form of unsupervised learning  RL allows a robot to learn on its own, using its own experiences (and some built-in notions of desirable and undesirable situations, associated with reward and punishment)  The designer can also provide reinforcement (reward/punishment) directly, to influence the robot  The robot is never told what to do.

RL Function and Critic  RL systems contain a reinforcement function, which determines when and how much positive or negative reinforcement to deliver to the robot  Note: properly scaled positive reinforcement is sufficient.  The critic is the part of the RL system which provides the reinforcement, i.e., executes the reinforcement function.
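As a concrete illustration of a reinforcement function, here is a minimal sketch for a hypothetical box-pushing robot; the state fields, thresholds, and reward magnitudes are invented for this example and are not taken from the slides:

```python
def reinforcement(state):
    """Hypothetical reinforcement (critic) function for a box-pushing robot.

    Rewards progress while in contact with the box, punishes losing the
    box or stalling. All values are illustrative only.
    """
    if state["box_contact"] and state["moved_forward"]:
        return 1.0    # pushing and making progress: reward
    if not state["box_contact"]:
        return -1.0   # lost contact with the box: punish
    if state["stalled"]:
        return -3.0   # pushing against an immovable obstacle: punish strongly
    return 0.0        # otherwise: no reinforcement
```

An external critic would be a person delivering such values by hand; an internal critic evaluates the same kind of function from the robot's own sensor readings.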

Types of Critic  The critic can be  external: if the user provides the reinforcement  internal: if the system itself provides the reinforcement  In both cases the approach is unsupervised in that the answer is never given explicitly by the critic

RL System Diagram (figure)

What Can Be Learned with RL?  Policy: the function that maps states/situations to actions/behaviors  Utility: the function that gives a value to each state  Both of the above are learned relative to a specific goal  If the goal changes, so must the policy and/or the utility function  Example: maze learning
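To make the maze example concrete, here is a minimal sketch of what a learned policy and utility table might look like for a tiny 2x2 gridworld with the goal in one corner; the grid, the action names, and the numeric values are invented for illustration:

```python
# Hypothetical 2x2 gridworld; states are (row, col), goal at (1, 1).
policy = {           # policy: state -> action that heads toward the goal
    (0, 0): "east",
    (0, 1): "south",
    (1, 0): "east",
    (1, 1): "stay",  # goal state
}
utility = {          # utility: state -> estimated value (higher nearer the goal)
    (0, 0): 0.81,
    (0, 1): 0.90,
    (1, 0): 0.90,
    (1, 1): 1.00,
}
# If the goal moved to (0, 0), both tables would have to be relearned.
```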

Adaptive Heuristic Critic  Adaptive Heuristic Critic (AHC) is an RL algorithm (Barto & Sutton)  The process of learning what action to take in what state (the policy) is separate from learning the value of each state (the utility function)  Both are based on trying different actions in different states and observing the outcomes over time
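A minimal sketch of the actor/critic separation AHC relies on: the critic learns a value for each state with a temporal-difference update, and the actor adjusts its action preferences using the same TD error. The dictionary representation, learning rates, and function name below are assumptions made for illustration, not the original formulation:

```python
def ahc_step(V, prefs, x, a, r, y, alpha=0.1, beta=0.1, gamma=0.9):
    """One Adaptive-Heuristic-Critic-style update (illustrative sketch).

    V     : dict state -> estimated value (the critic / utility function)
    prefs : dict (state, action) -> action preference (the actor / policy)
    x, a  : state visited and action taken
    r, y  : reinforcement received and resulting state
    """
    td_error = r + gamma * V.get(y, 0.0) - V.get(x, 0.0)
    V[x] = V.get(x, 0.0) + alpha * td_error                   # critic: update state value
    prefs[(x, a)] = prefs.get((x, a), 0.0) + beta * td_error  # actor: update policy preference
    return td_error
```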

Q Learning  Q learning is the most popular RL algorithm (Watkins 1980’s)‏  A single utility Q-function is learned in order to evaluate both actions and states.  Shown to be superior to AHC  Q values are stored in a table, usually  Updated at each step, using the following update rule:

Q Learning Algorithm  Q(x,a) ← Q(x,a) + β (r + γ E(y) − Q(x,a))  x is the state, a is the action  β is the learning rate  r is the reward  γ is the discount factor, γ ∈ (0,1)  E(y) is the utility of the successor state y, computed as E(y) = max(Q(y,a)) over all actions a  Guaranteed to converge to the optimal solution, given infinite trials
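A compact sketch of this update rule in code, with the Q table held in a dictionary; the variable names follow the slide (x, a, r, y), while the default parameter values and the explicit action list are assumptions for illustration:

```python
def q_update(Q, x, a, r, y, actions, b=0.1, gamma=0.9):
    """One Q-learning step: Q(x,a) <- Q(x,a) + b * (r + gamma*E(y) - Q(x,a))."""
    old = Q.get((x, a), 0.0)
    E_y = max(Q.get((y, a2), 0.0) for a2 in actions)  # E(y) = max over actions of Q(y, .)
    Q[(x, a)] = old + b * (r + gamma * E_y - old)
```

For example, starting from an empty table (all Q values default to 0), a reward of r = 1 on the first step raises Q[(x, a)] from 0 to b = 0.1; repeated rewarded visits move it further toward the discounted optimum.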

RL Architectures

Example: Genghis’ Walking  Correlation using statistics  Genghis  trailing wheel: positive feedback  underbelly contact sensors: negative feedback  Learned stable tripod stance and tripod gait

Example: Obelix’s Pushing  Obelix used Q-learning (Connell 90)  Subsumption (colony) architecture  Learned to push as well as or better than a hand-tuned controller  8 ultrasonic sensors, 1 IR, 1 motor current  5 action choices only: ahead, left or right (22°), sharp left and right (45°)

Obelix (photo)