
Computational Modeling Lab, Wednesday 18 June 2003
Reinforcement Learning: an introduction
Ann Nowé, based on Sutton and Barto

Outline
The elements of reinforcement learning (RL):
- Supervised vs unsupervised learning
- States, actions, policies
- Goals and reward schemes
- Returns
- Episodic and continuing tasks, plus a unified view

Reinforcement Learning: what is it?
Learning from interaction: learning about, from, and while interacting with an external environment. Learning what to do, that is, how to map situations to actions so as to maximize a numerical reward signal.

Reinforcement Learning: key features
- The learner is not told which actions to take; it must discover them by trial-and-error search.
- Reward may be delayed: it can pay to sacrifice short-term gains for greater long-term gains.
- The agent needs to both explore and exploit.
- RL considers the whole problem of a goal-directed agent interacting with an uncertain environment.

Supervised learning versus reinforcement learning
In supervised learning, the system maps inputs to outputs, and the training information is the desired (target) output; the error signal is (target output - actual output).
In reinforcement learning, the inputs are "states" and the outputs are "actions"; the training information is evaluations ("rewards" / "penalties"), and the objective is to get as much reward as possible.

The Agent-Environment Interface
Agent and environment interact at discrete time steps t = 0, 1, 2, ...: in state s_t the agent selects action a_t, then receives reward r_{t+1} and observes the next state s_{t+1}, producing the trajectory
s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, ...
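In code, this interaction is just a loop. Below is a minimal sketch of one episode, assuming hypothetical env.reset()/env.step() and agent.act()/agent.learn() interfaces (these names are illustrative, not part of the slides):

    def run_episode(env, agent, max_steps=1000):
        """Run one episode of the agent-environment loop."""
        state = env.reset()                  # s_0
        total_reward = 0.0
        for t in range(max_steps):
            action = agent.act(state)        # choose a_t given s_t
            next_state, reward, done = env.step(action)   # r_{t+1}, s_{t+1}
            agent.learn(state, action, reward, next_state)
            total_reward += reward
            state = next_state
            if done:                         # episode ended
                break
        return total_reward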

Learning how to behave
Reinforcement learning methods specify how the agent changes its policy as a result of experience. Roughly, the agent's goal is to get as much reward as it can over the long run.

Example: Tic-Tac-Toe
[Figure: a tree of tic-tac-toe positions, branching on x's moves and o's moves in alternation.]
Assume an imperfect opponent: he/she sometimes makes mistakes.

An RL Approach to Tic-Tac-Toe
1. Make a table with one entry per state. V(s) is the estimated probability of winning from state s: 1 for a won position, 0 for a lost or drawn position, and an initial guess of 0.5 for everything else.
2. Now play lots of games. To pick our moves, look ahead one step from the current state to the various possible next states. Usually, just pick the next state with the highest estimated probability of winning, i.e. the largest V(s): a greedy move. But 10% of the time pick a move at random: an exploratory move.
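A minimal sketch of this move selection, assuming the table is a dict V mapping states to estimated win probabilities and that possible_next_states lists the states reachable in one move (both are illustrative assumptions):

    import random

    EPSILON = 0.1  # fraction of exploratory moves, as on the slide

    def pick_next_state(V, possible_next_states):
        """Greedy in V(s) most of the time, random 10% of the time."""
        if random.random() < EPSILON:
            return random.choice(possible_next_states)       # exploratory move
        # greedy move: largest V(s); unseen states default to the 0.5 guess
        return max(possible_next_states, key=lambda s: V.get(s, 0.5))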

RL Learning Rule for Tic-Tac-Toe
After each greedy move, back up the value of the state s before the move toward the value of the state s' after it:
V(s) ← V(s) + α [ V(s') - V(s) ],
where α is a small step-size parameter. This is a temporal-difference learning rule; "exploratory" moves do not cause a backup.
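A sketch of this backup, under the same assumed dict-based value table (the step size 0.1 is an arbitrary choice for illustration):

    ALPHA = 0.1  # step-size parameter

    def td_update(V, s, s_next):
        """Move V(s) a fraction ALPHA of the way toward V(s_next)."""
        v_s, v_next = V.get(s, 0.5), V.get(s_next, 0.5)
        V[s] = v_s + ALPHA * (v_next - v_s)

Repeated over many games, these small backups propagate win probabilities from terminal positions back to earlier positions.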

Examples of Reinforcement Learning
- Robocup soccer teams (Stone & Veloso, Riedmiller et al.): world's best player of simulated soccer, 1999; runner-up 2000
- Inventory management (Van Roy, Bertsekas, Lee & Tsitsiklis): 10-15% improvement over industry-standard methods
- Dynamic channel assignment (Singh & Bertsekas, Nie & Haykin): world's best assigner of radio channels to mobile telephone calls
- Elevator control (Crites & Barto): (probably) world's best down-peak elevator controller
- Many robots: navigation, bipedal walking, grasping, switching between skills, ...
- TD-Gammon and Jellyfish (Tesauro, Dahl): world's best backgammon player

Goals and Rewards
Is a scalar reward signal an adequate notion of a goal? Maybe not, but it is surprisingly flexible.
A goal should specify what we want to achieve, not how we want to achieve it.
A goal must be outside the agent's direct control, and thus outside the agent itself.
The agent must be able to measure success: explicitly, and frequently during its lifespan.

What's the objective?
Episodic tasks: interaction breaks naturally into episodes, e.g. plays of a game, trips through a maze.
Immediate reward: the reward r_{t+1} received at each step.
Long-term reward: the return, the total reward accumulated until the episode ends at time T:
R_t = r_{t+1} + r_{t+2} + ... + r_T.

Returns for Continuing Tasks
Continuing tasks: interaction does not break into natural episodes.
Discounted return:
R_t = r_{t+1} + γ r_{t+2} + γ^2 r_{t+3} + ... = Σ_{k=0}^{∞} γ^k r_{t+k+1},
where 0 ≤ γ < 1 is the discount rate.
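A quick sketch of computing this return from a finite reward sequence (the infinite sum is truncated at the end of the list; with gamma = 1 the same function gives the undiscounted episodic return):

    def discounted_return(rewards, gamma=0.9):
        """Sum of gamma^k * r_{t+k+1} over a reward sequence."""
        return sum(gamma**k * r for k, r in enumerate(rewards))

    # e.g. a single reward of 1 arriving three steps from now:
    # discounted_return([0, 0, 1], gamma=0.9) == 0.81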

An Example
Avoid failure: the pole falling beyond a critical angle, or the cart hitting the end of the track.
As an episodic task, where the episode ends upon failure: reward = +1 for each step before failure, so the return is the number of steps before failure.
As a continuing task with discounted return: reward = -1 upon failure and 0 otherwise, so the return is -γ^k when failure occurs k steps in the future.
In either case, the return is maximized by avoiding failure for as long as possible.
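A small sketch contrasting the two formulations when failure occurs after k steps (reward schemes as stated above; gamma = 0.9 is an arbitrary illustration):

    def episodic_return(k):
        """+1 per step before failure: the return is the step count."""
        return float(k)

    def continuing_return(k, gamma=0.9):
        """0 everywhere except -1 at failure: the return is -gamma^k."""
        return -(gamma ** k)

Both returns grow as k grows (the continuing return rises toward 0), so both formulations favor balancing longer.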

Another Example
Get to the top of the hill as quickly as possible. Reward = -1 for each step the top is not reached, so the return is maximized by minimizing the number of steps taken to reach the top of the hill.

A Unified Notation
Think of each episode as ending in an absorbing state that always produces a reward of zero: after the terminal state, every transition returns to the absorbing state with r = 0.
We can then cover both episodic and continuing cases by writing
R_t = Σ_{k=0}^{T} γ^k r_{t+k+1},
where T can be ∞ or γ can be 1 (but not both).

Getting the Degree of Abstraction Right
- Time steps need not refer to fixed intervals of real time.
- Actions can be low-level (voltages to motors), high-level (accept a job offer), or "mental" (shift the focus of attention), etc.
- States can be low-level "sensations", or abstract, symbolic, based on memory, or subjective ("surprised" or "lost").
- The environment is not necessarily unknown to the agent, only incompletely controllable.