Chapter 10: Dimensions of Reinforcement Learning

Objectives of this chapter:
- Review the treatment of RL taken in this course
- What have we left out?
- What are the hot research areas?

Slides based on R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction.

Three Common Ideas
- Estimation of value functions
- Backing up values along real or simulated trajectories
- Generalized Policy Iteration: maintain an approximate optimal value function and an approximate optimal policy, and use each to improve the other (sketched below)
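
The slides only name Generalized Policy Iteration, so here is a minimal sketch of the idea on a tabular MDP with an explicit model. The interface is an assumption for illustration: `P[s][a]` is a list of `(probability, next_state)` pairs, `R[s][a]` an expected reward, and `gamma` the discount factor.

```python
# Minimal GPI sketch: alternate evaluation and greedy improvement until
# the two processes agree. All names and the model format are illustrative.

def policy_evaluation(policy, P, R, gamma, n_states, tol=1e-6):
    """Iteratively evaluate V for the current (deterministic) policy."""
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            a = policy[s]
            v = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def greedy_policy(V, P, R, gamma, n_states, n_actions):
    """Improvement step: act greedily with respect to the value estimate."""
    return [max(range(n_actions),
                key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
            for s in range(n_states)]

def generalized_policy_iteration(P, R, gamma, n_states, n_actions):
    policy = [0] * n_states
    while True:
        V = policy_evaluation(policy, P, R, gamma, n_states)
        improved = greedy_policy(V, P, R, gamma, n_states, n_actions)
        if improved == policy:  # evaluation and improvement are consistent
            return policy, V
        policy = improved
```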

Backup Dimensions
[Figure: the space of backups, spanned by the depth of the backup (one-step bootstrapping vs. full-trajectory returns) and its width (sample backups vs. full expected backups); dynamic programming, exhaustive search, temporal-difference learning, and Monte Carlo sit at the four corners.]
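
To make the depth dimension concrete, here is a hedged sketch contrasting a shallow one-step TD(0) backup with a deep Monte Carlo backup. `episode` is an assumed list of `(state, reward)` pairs, each reward being the one received on leaving that state.

```python
# Illustrative only: two value-function updates at opposite ends of the
# backup-depth dimension.

def td0_update(V, s, r, s_next, alpha, gamma):
    # Shallow backup: one real step, then bootstrap from V[s_next].
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def monte_carlo_update(V, episode, alpha, gamma):
    # Deep backup: back up the complete return to every visited state.
    G = 0.0
    for s, r in reversed(episode):
        G = r + gamma * G
        V[s] += alpha * (G - V[s])
```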

Other Dimensions
- Function approximation: tables, aggregation, other linear methods, many nonlinear methods
- On-policy vs. off-policy (compare the updates below):
  - On-policy: learn the value function of the policy being followed
  - Off-policy: try to learn the value function of the best policy, irrespective of the policy being followed
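
One common way to see the on-/off-policy distinction is to compare the Sarsa and Q-learning one-step updates, sketched here for a tabular `Q` stored as a dict keyed by `(state, action)`; the names are illustrative, not from the slides.

```python
# Illustrative one-step updates.

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy: bootstrap from the action the behavior policy actually took.
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    # Off-policy: bootstrap from the greedy action, whatever was taken.
    best = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
```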

Still More Dimensions
- Definition of return: episodic, continuing, discounted, etc.
- Action values vs. state values vs. afterstate values
- Action selection/exploration: ε-greedy, softmax, more sophisticated methods (two are sketched below)
- Synchronous vs. asynchronous
- Replacing vs. accumulating traces
- Real vs. simulated experience
- Location of backups (search control)
- Timing of backups: part of selecting actions, or only afterward?
- Memory for backups: how long should backed-up values be retained?
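
The two named exploration schemes are easy to state in code. A minimal sketch, assuming a tabular `Q` keyed by `(state, action)`:

```python
import math
import random

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    # With probability epsilon explore uniformly; otherwise exploit.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def softmax_action(Q, s, actions, tau=1.0):
    # Boltzmann exploration: P(a) proportional to exp(Q(s, a) / tau).
    m = max(Q[(s, a)] for a in actions)  # subtract max for numerical stability
    prefs = [math.exp((Q[(s, a)] - m) / tau) for a in actions]
    return random.choices(actions, weights=prefs)[0]
```

The temperature `tau` plays the role epsilon plays above: high values make selection nearly uniform, low values nearly greedy.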

Frontier Dimensions
- Prove convergence for bootstrapping control methods
- Trajectory sampling
- Non-Markov case: Partially Observable MDPs (POMDPs)
  - Bayesian approach: belief states, constructed from the sequence of observations (see the sketch below)
  - Try to do the best you can with non-Markov states
- Modularity and hierarchies
  - Learning and planning at several different levels
  - Theory of options
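
The Bayesian belief-state idea has a standard recursive form: after taking action a and observing o, the new belief is b'(s') proportional to O(o | s', a) times the predicted probability of s'. A sketch under assumed nested-dict models `T[s][a][s2]` and `O[s2][a][o]`:

```python
# Hypothetical POMDP belief update: predict with the transition model,
# correct with the observation model, then renormalize.

def belief_update(b, a, o, T, O, states):
    new_b = {}
    for s2 in states:
        pred = sum(T[s][a][s2] * b[s] for s in states)  # predict step
        new_b[s2] = O[s2][a][o] * pred                  # observation correction
    z = sum(new_b.values())
    return {s: v / z for s, v in new_b.items()}         # normalize
```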

More Frontier Dimensions
- Using more structure
  - Factored state spaces: dynamic Bayes nets (see the sketch below)
  - Factored action spaces
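
As a rough illustration of the factored idea: instead of one monolithic transition table over whole states, each state variable gets its own conditional distribution given a few parent variables, as in a dynamic Bayes net. Everything here (the `var_models` format in particular) is an assumption for the sketch.

```python
import random

def factored_step(state, action, var_models):
    # state: dict of variable name -> value.
    # var_models: variable name -> (parents, conditional), where conditional
    # maps (tuple of parent values, action) to a {value: probability} dict.
    next_state = {}
    for var, (parents, cond) in var_models.items():
        dist = cond[(tuple(state[p] for p in parents), action)]
        values, probs = zip(*dist.items())
        next_state[var] = random.choices(values, weights=probs)[0]
    return next_state
```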

Still More Frontier Dimensions
- Incorporating prior knowledge
  - Advice and hints
  - Trainers and teachers
  - Shaping (one common form is sketched below)
  - Lyapunov functions
  - etc.
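
The slide only names shaping; one well-known concrete form is potential-based reward shaping (Ng, Harada & Russell, 1999), which adds F(s, s') = γΦ(s') − Φ(s) to the reward and provably preserves the optimal policy. The potential function `phi` below is any heuristic over states, supplied by the designer.

```python
# Potential-based reward shaping: the shaping bonus telescopes along any
# trajectory, so optimal policies are unchanged.

def shaped_reward(r, s, s_next, phi, gamma):
    return r + gamma * phi(s_next) - phi(s)
```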