Reinforcement Learning
Michael Roberts
With material from: Reinforcement Learning: An Introduction, Sutton & Barto (1998)
What is RL?
Trial-and-error learning
Can be structured without a model (model-free) or with a model (model-based)
RL vs. Supervised Learning
Evaluative vs. instructional feedback
Role of exploration
On-line performance
K-armed Bandit Problem
The agent repeatedly chooses among K actions; each choice returns a reward, and the value of an action is estimated by its average reward.
Example sample rewards:
  Action 1: 0, 0, 5, 10, 35   (average 10)
  Action 2: 5, 10, -15, -15, -10   (average -5)
[Figure: agent selecting among the K actions]
K-armed Bandit Cont.
Action-selection strategies: greedy, ε-greedy, softmax
Average reward, incremental formula: $Q_{k+1} = Q_k + \alpha\,(r_{k+1} - Q_k)$, where $\alpha = 1/(k+1)$
Softmax probability of choosing action a: $P(a) = \dfrac{e^{Q(a)/\tau}}{\sum_{b=1}^{K} e^{Q(b)/\tau}}$, where $\tau$ is a temperature parameter
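A minimal sketch of an ε-greedy agent on a K-armed bandit using the incremental average update above. The reward distributions, means, and parameter values are illustrative assumptions, not taken from the slides.

import random

def run_bandit(true_means, epsilon=0.1, steps=1000):
    """Epsilon-greedy action selection with incremental sample-average estimates."""
    k = len(true_means)
    Q = [0.0] * k          # estimated value of each action
    N = [0] * k            # number of times each action was taken
    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(k)                  # explore: random action
        else:
            a = max(range(k), key=lambda i: Q[i])    # exploit: greedy action
        r = random.gauss(true_means[a], 1.0)         # noisy reward (assumed Gaussian)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                    # incremental average: alpha = 1/N(a)
        total_reward += r
    return Q, total_reward

# Illustrative 3-armed bandit (hypothetical true means).
estimates, total = run_bandit([10.0, -5.0, 100.0])
print(estimates, total)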
More General Problems
More than one state
Delayed rewards
Markov Decision Process (MDP):
  Set of states
  Set of actions
  Reward function
  State transition function
Represented as a table or with function approximation
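In the notation of Sutton & Barto (1998), the transition and reward components of an MDP can be written as:

$\mathcal{P}^{a}_{ss'} = \Pr\{\, s_{t+1} = s' \mid s_t = s,\ a_t = a \,\}, \qquad \mathcal{R}^{a}_{ss'} = E\{\, r_{t+1} \mid s_t = s,\ a_t = a,\ s_{t+1} = s' \,\}$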
Example: Recycling Robot
Recycling Robot: Transition Graph
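The transition graph itself is a figure; as a sketch, the same information can be written as a table keyed by (state, action), following the standard Sutton & Barto recycling-robot formulation. The parameter values alpha, beta, r_search, and r_wait below are placeholders, not the numbers from the original graph.

# Recycling robot MDP as a transition table: (state, action) -> list of
# (probability, next_state, reward) triples.
alpha, beta = 0.9, 0.6          # prob. battery stays high / stays low while searching (illustrative)
r_search, r_wait = 2.0, 1.0     # expected rewards for searching / waiting (illustrative)

transitions = {
    ("high", "search"):   [(alpha, "high", r_search), (1 - alpha, "low", r_search)],
    ("high", "wait"):     [(1.0, "high", r_wait)],
    ("low",  "search"):   [(beta, "low", r_search), (1 - beta, "high", -3.0)],  # -3: rescue penalty
    ("low",  "wait"):     [(1.0, "low", r_wait)],
    ("low",  "recharge"): [(1.0, "high", 0.0)],
}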
Dynamic Programming
Backup Diagram
[Figure: backup diagram with branch probabilities on the action and transition arcs and rewards at the successor states]
Dynamic Programming: Optimal Policy
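The backup that dynamic programming applies to find the optimal policy is the Bellman optimality equation; in Sutton & Barto's notation:

$V^{*}(s) = \max_{a} \sum_{s'} \mathcal{P}^{a}_{ss'}\left[\mathcal{R}^{a}_{ss'} + \gamma V^{*}(s')\right], \qquad \pi^{*}(s) = \arg\max_{a} \sum_{s'} \mathcal{P}^{a}_{ss'}\left[\mathcal{R}^{a}_{ss'} + \gamma V^{*}(s')\right]$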
Backup for Optimal Policy
Performance Metrics
Eventual convergence to optimality
Speed of convergence to optimality
Regret
(Kaelbling, L., Littman, M., & Moore, A., 1996)
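Regret can be made precise as the expected reward lost relative to always taking the best action; one common definition (an assumption here, since the slide does not spell one out) is:

$\text{Regret}(T) = T\,\mu^{*} - \sum_{t=1}^{T} E[r_t]$, where $\mu^{*}$ is the expected reward of the best action.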
Gridworld Example
Value Iteration
Initialize V(s) arbitrarily, e.g. V(s) = 0, for all s ∈ S⁺
Repeat
  Δ ← 0
  For each s ∈ S:
    v ← V(s)
    V(s) ← max_a Σ_{s'} P^a_{ss'} [ R^a_{ss'} + γ V(s') ]
    Δ ← max(Δ, |v − V(s)|)
until Δ < θ (a small positive number)
Output a deterministic policy π such that:
  π(s) = argmax_a Σ_{s'} P^a_{ss'} [ R^a_{ss'} + γ V(s') ]
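A runnable sketch of the value-iteration loop above on a small gridworld. The grid size, reward of -1 per step, terminal corner, and discount are illustrative assumptions, not the exact example from the slide.

# Value iteration on a 4x4 gridworld: deterministic moves, -1 reward per step,
# single absorbing goal state in one corner (illustrative layout).
N = 4
GAMMA = 1.0
THETA = 1e-6
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
TERMINAL = (0, 0)

def step(state, action):
    """Deterministic transition: move if possible, otherwise stay; reward -1 per step."""
    if state == TERMINAL:
        return state, 0.0
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    return (nr, nc), -1.0

V = {(r, c): 0.0 for r in range(N) for c in range(N)}
while True:
    delta = 0.0
    for s in V:
        v = V[s]
        # Bellman optimality backup: V(s) <- max_a [R + gamma * V(s')]
        V[s] = max(step(s, a)[1] + GAMMA * V[step(s, a)[0]] for a in ACTIONS) if s != TERMINAL else 0.0
        delta = max(delta, abs(v - V[s]))
    if delta < THETA:
        break

# Greedy policy extracted from the converged value function.
policy = {s: max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
          for s in V if s != TERMINAL}
print(V[(3, 3)], policy[(3, 3)])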
Temporal Difference Learning
RL without a model
Issue of temporal credit assignment
Bootstraps like DP
TD(0) update: $V(s_t) \leftarrow V(s_t) + \alpha\,[\, r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \,]$
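A minimal sketch of the tabular TD(0) update for policy evaluation. The 5-state random-walk task, step size, and episode count are illustrative assumptions.

import random

def td0_random_walk(episodes=1000, alpha=0.1, gamma=1.0):
    """Tabular TD(0) on a 5-state random walk (states 1..5, terminals 0 and 6).
    Reward +1 on reaching the right terminal, else 0 (illustrative task)."""
    V = {s: 0.0 for s in range(7)}        # value estimates; terminals stay 0
    for _ in range(episodes):
        s = 3                             # start in the middle state
        while s not in (0, 6):
            s_next = s + random.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # TD(0) update: V(s) <- V(s) + alpha * [r + gamma*V(s') - V(s)]
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V

print(td0_random_walk())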
TD Learning
Again, TD(0): $V(s_t) \leftarrow V(s_t) + \alpha\,\delta_t$, with TD error $\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$
TD(λ): $V(s) \leftarrow V(s) + \alpha\,\delta_t\, e_t(s)$ for all $s$, with $e_t(s) = \gamma\lambda\, e_{t-1}(s) + \mathbf{1}[s = s_t]$,
where e is called an eligibility trace
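A sketch of TD(λ) with accumulating eligibility traces, using the same illustrative random-walk task as above (the task and parameter values are assumptions, not from the slides).

import random

def td_lambda_random_walk(episodes=1000, alpha=0.1, gamma=1.0, lam=0.8):
    """Tabular TD(lambda) with accumulating eligibility traces on a 5-state random walk."""
    V = {s: 0.0 for s in range(7)}
    for _ in range(episodes):
        e = {s: 0.0 for s in range(7)}    # eligibility traces, reset each episode
        s = 3
        while s not in (0, 6):
            s_next = s + random.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            delta = r + gamma * V[s_next] - V[s]   # TD error
            e[s] += 1.0                            # accumulate trace for the visited state
            for x in e:                            # update every state in proportion to its trace
                V[x] += alpha * delta * e[x]
                e[x] *= gamma * lam                # decay traces by gamma*lambda
            s = s_next
    return V

print(td_lambda_random_walk())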
Backup Diagram for TD(λ)
TD-Gammon (Tesauro)
Additional Work
POMDPs
Macros
Multi-agent RL
Multiple reward structures