Chapter 10: Dimensions of Reinforcement Learning

Objectives of this chapter:
- Review the treatment of RL taken in this course
- What have we left out?
- What are the hot research areas?

Slides based on R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction.

Three Common Ideas
- Estimation of value functions
- Backing up values along real or simulated trajectories
- Generalized Policy Iteration: maintain an approximate optimal value function and an approximate optimal policy, and use each to improve the other (sketched below)
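
The slides only name Generalized Policy Iteration, so here is a minimal sketch of the idea on a tabular MDP with an explicit model. The interface is an assumption for illustration: `P[s][a]` is a list of `(probability, next_state)` pairs, `R[s][a]` an expected reward, and `gamma` the discount factor.

```python
# Minimal GPI sketch: alternate evaluation and greedy improvement until
# the two processes agree. All names and the model format are illustrative.

def policy_evaluation(policy, P, R, gamma, n_states, tol=1e-6):
    """Iteratively evaluate V for the current (deterministic) policy."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            a = policy[s]
            v = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def greedy_policy(V, P, R, gamma, n_states, n_actions):
    """Improvement step: act greedily with respect to the value estimate."""
    return [max(range(n_actions),
                key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
            for s in range(n_states)]

def generalized_policy_iteration(P, R, gamma, n_states, n_actions):
    policy = [0] * n_states
    while True:
        V = policy_evaluation(policy, P, R, gamma, n_states)
        improved = greedy_policy(V, P, R, gamma, n_states, n_actions)
        if improved == policy:  # evaluation and improvement are consistent
            return policy, V
        policy = improved
```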

Backup Dimensions
[Figure: the space of backups, spanned by the depth of the backup (one-step bootstrapping vs. full-trajectory returns) and its width (sample backups vs. full expected backups); dynamic programming, exhaustive search, temporal-difference learning, and Monte Carlo sit at the four corners.]
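
To make the depth dimension concrete, here is a hedged sketch contrasting a shallow one-step TD(0) backup with a deep Monte Carlo backup. `episode` is an assumed list of `(state, reward)` pairs, each reward being the one received on leaving that state.

```python
# Illustrative only: two value-function updates at opposite ends of the
# backup-depth dimension.

def td0_update(V, s, r, s_next, alpha, gamma):
    # Shallow backup: one real step, then bootstrap from V[s_next].
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def monte_carlo_update(V, episode, alpha, gamma):
    # Deep backup: back up the complete return to every visited state.
    G = 0.0
    for s, r in reversed(episode):
        G = r + gamma * G
        V[s] += alpha * (G - V[s])
```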

Other Dimensions
- Function approximation: tables, aggregation, other linear methods, many nonlinear methods
- On-policy vs. off-policy (compare the updates below):
  - On-policy: learn the value function of the policy being followed
  - Off-policy: try to learn the value function of the best policy, irrespective of the policy being followed
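
One common way to see the on-/off-policy distinction is to compare the Sarsa and Q-learning one-step updates, sketched here for a tabular `Q` stored as a dict keyed by `(state, action)`; the names are illustrative, not from the slides.

```python
# Illustrative one-step updates.

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy: bootstrap from the action the behavior policy actually took.
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    # Off-policy: bootstrap from the greedy action, whatever was taken.
    best = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
```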

Still More Dimensions
- Definition of return: episodic, continuing, discounted, etc.
- Action values vs. state values vs. afterstate values
- Action selection/exploration: ε-greedy, softmax, more sophisticated methods (two are sketched below)
- Synchronous vs. asynchronous
- Replacing vs. accumulating traces
- Real vs. simulated experience
- Location of backups (search control)
- Timing of backups: part of selecting actions, or only afterward?
- Memory for backups: how long should backed-up values be retained?
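
The two named exploration schemes are easy to state in code. A minimal sketch, assuming a tabular `Q` keyed by `(state, action)`:

```python
import math
import random

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    # With probability epsilon explore uniformly; otherwise exploit.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def softmax_action(Q, s, actions, tau=1.0):
    # Boltzmann exploration: P(a) proportional to exp(Q(s, a) / tau).
    m = max(Q[(s, a)] for a in actions)  # subtract max for numerical stability
    prefs = [math.exp((Q[(s, a)] - m) / tau) for a in actions]
    return random.choices(actions, weights=prefs)[0]
```

The temperature `tau` plays the role epsilon plays above: high values make selection nearly uniform, low values nearly greedy.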

Frontier Dimensions
- Prove convergence for bootstrapping control methods
- Trajectory sampling
- Non-Markov case: Partially Observable MDPs (POMDPs)
  - Bayesian approach: belief states, constructed from the sequence of observations (see the sketch below)
  - Try to do the best you can with non-Markov states
- Modularity and hierarchies
  - Learning and planning at several different levels
  - Theory of options
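
The Bayesian belief-state idea has a standard recursive form: after taking action a and observing o, the new belief is b'(s') proportional to O(o | s', a) times the predicted probability of s'. A sketch under assumed nested-dict models `T[s][a][s2]` and `O[s2][a][o]`:

```python
# Hypothetical POMDP belief update: predict with the transition model,
# correct with the observation model, then renormalize.

def belief_update(b, a, o, T, O, states):
    new_b = {}
    for s2 in states:
        pred = sum(T[s][a][s2] * b[s] for s in states)  # predict step
        new_b[s2] = O[s2][a][o] * pred                  # observation correction
    z = sum(new_b.values())
    return {s: v / z for s, v in new_b.items()}         # normalize
```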

More Frontier Dimensions
- Using more structure
  - Factored state spaces: dynamic Bayes nets (see the sketch below)
  - Factored action spaces
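
As a rough illustration of the factored idea: instead of one monolithic transition table over whole states, each state variable gets its own conditional distribution given a few parent variables, as in a dynamic Bayes net. Everything here (the `var_models` format in particular) is an assumption for the sketch.

```python
import random

def factored_step(state, action, var_models):
    # state: dict of variable name -> value.
    # var_models: variable name -> (parents, conditional), where conditional
    # maps (tuple of parent values, action) to a {value: probability} dict.
    next_state = {}
    for var, (parents, cond) in var_models.items():
        dist = cond[(tuple(state[p] for p in parents), action)]
        values, probs = zip(*dist.items())
        next_state[var] = random.choices(values, weights=probs)[0]
    return next_state
```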

Still More Frontier Dimensions
- Incorporating prior knowledge
  - Advice and hints
  - Trainers and teachers
  - Shaping (one common form is sketched below)
  - Lyapunov functions
  - etc.
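
The slide only names shaping; one well-known concrete form is potential-based reward shaping (Ng, Harada & Russell, 1999), which adds F(s, s') = γΦ(s') − Φ(s) to the reward and provably preserves the optimal policy. The potential function `phi` below is any heuristic over states, supplied by the designer.

```python
# Potential-based reward shaping: the shaping bonus telescopes along any
# trajectory, so optimal policies are unchanged.

def shaped_reward(r, s, s_next, phi, gamma):
    return r + gamma * phi(s_next) - phi(s)
```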