TD(0) prediction; Sarsa, on-policy learning; Q-learning, off-policy learning
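For reference, the one-step update rules these three names stand for (the standard Sutton & Barto forms, added here since the transcript lists only the names):

$$V(S_t) \leftarrow V(S_t) + \alpha\bigl[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)\bigr] \quad \text{TD(0)}$$
$$Q(S_t,A_t) \leftarrow Q(S_t,A_t) + \alpha\bigl[R_{t+1} + \gamma Q(S_{t+1},A_{t+1}) - Q(S_t,A_t)\bigr] \quad \text{Sarsa}$$
$$Q(S_t,A_t) \leftarrow Q(S_t,A_t) + \alpha\bigl[R_{t+1} + \gamma \max_a Q(S_{t+1},a) - Q(S_t,A_t)\bigr] \quad \text{Q-learning}$$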

Actor-Critic

Unified View

N-step TD Prediction
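For reference, the n-step return being backed up (standard definition): truncate the sampled return after n rewards and bootstrap from the current estimate of the state reached.

$$G_t^{(n)} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^n V_t(S_{t+n})$$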

Forward View

Random Walk

19-state random walk

The n-step method is a simple precursor of TD(λ). Example: back up toward the average of the 2-step and 4-step returns (written out below).
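The averaged backup from the example, using the n-step return defined above:

$$\bar{G}_t = \tfrac{1}{2} G_t^{(2)} + \tfrac{1}{2} G_t^{(4)}, \qquad \Delta V_t(S_t) = \alpha\bigl[\bar{G}_t - V_t(S_t)\bigr]$$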

Forward View, TD(λ): weight the n-step return backups by $(1-\lambda)\lambda^{n-1}$, decaying with n (time since visitation). The λ-return: $G_t^\lambda = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}$. Backup using the λ-return: $\Delta V_t(S_t) = \alpha\bigl[G_t^\lambda - V_t(S_t)\bigr]$.

Weighting of λ-return

Relationship with TD(0) and MC

Backward View
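The backward-view mechanics, for reference (standard accumulating-trace form): every state carries an eligibility trace, and each one-step TD error is broadcast to all states in proportion to their traces.

$$e_t(s) = \gamma\lambda\, e_{t-1}(s) + \mathbb{1}[S_t = s]$$
$$\delta_t = R_{t+1} + \gamma V_t(S_{t+1}) - V_t(S_t)$$
$$V_{t+1}(s) = V_t(s) + \alpha\, \delta_t\, e_t(s) \quad \text{for all } s$$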

The book shows that the forward and backward views are actually equivalent (exactly so for off-line updating, approximately for on-line updating)

On-line, Tabular TD(λ)

Update rule: $V(s) \leftarrow V(s) + \alpha\, \delta_t\, e_t(s)$ for all s, with $\delta_t$ and $e_t$ as in the backward view above (a code sketch follows below). As before, λ = 0 recovers TD(0). Now, when λ = 1, you get MC, but unlike ordinary MC it:
– Can be applied to continuing tasks
– Works incrementally and on-line!
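A minimal Python sketch of one episode of this update loop with accumulating traces, under assumed interfaces: a hypothetical `env` with `reset()` and `step(action)` returning `(next_state, reward, done)`, and a `policy(state)` function; none of these names come from the slides.

```python
from collections import defaultdict

def td_lambda_episode(env, policy, V, alpha=0.1, gamma=1.0, lam=0.9):
    """Run one episode of on-line tabular TD(lambda) with accumulating traces.

    V: defaultdict(float) mapping state -> value estimate, updated in place.
    Assumed interfaces: env.reset() -> state,
    env.step(action) -> (next_state, reward, done), policy(state) -> action.
    """
    e = defaultdict(float)               # eligibility traces, initially zero
    s = env.reset()
    done = False
    while not done:
        a = policy(s)
        s_next, r, done = env.step(a)
        # One-step TD error; terminal states are taken to have value 0.
        delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
        e[s] += 1.0                      # accumulating trace for visited state
        for state in list(e):            # broadcast TD error to traced states
            V[state] += alpha * delta * e[state]
            e[state] *= gamma * lam      # decay every trace
        s = s_next
    return V
```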

Control: Sarsa(λ)
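Sarsa(λ) is the same backward-view machinery applied to state-action values (standard form, added for reference):

$$e_t(s,a) = \gamma\lambda\, e_{t-1}(s,a) + \mathbb{1}[S_t = s,\, A_t = a]$$
$$\delta_t = R_{t+1} + \gamma Q_t(S_{t+1}, A_{t+1}) - Q_t(S_t, A_t)$$
$$Q_{t+1}(s,a) = Q_t(s,a) + \alpha\, \delta_t\, e_t(s,a) \quad \text{for all } s,a$$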

Gridworld Example

Watkin’s Q(λ) Why isn’t Q-learning as easy as Sarsa?

Watkin’s Q(λ) Why isn’t Q-learning as easy as Sarsa?

Watkin’s Q(λ)

Accumulating traces:
– Eligibilities can grow greater than 1
– This can cause convergence problems
Replacing traces: reset the visited state's trace to 1 instead of incrementing it (the two update rules are written out below)
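Side by side, for the visited state $S_t$ (standard forms; in both variants, non-visited states simply decay by $\gamma\lambda$):

$$\text{accumulating: } e_t(S_t) = \gamma\lambda\, e_{t-1}(S_t) + 1 \qquad\qquad \text{replacing: } e_t(S_t) = 1$$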

Example: why do accumulating traces do particularly poorly in this task?

Implementation Issues: TD(λ) could require significant amounts of computation per step
– But most traces are very close to zero…
– We can actually throw them out when they get very small
You will want an efficient data structure holding only the non-negligible traces (a sketch follows below). In practice, traces increase computation only by a small multiple.
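A minimal sketch of the pruning idea, assuming the traces live in a plain Python dict keyed by state; the cutoff `trace_min` is an illustrative choice, not a value from the slides.

```python
def decay_and_prune(traces, gamma, lam, trace_min=1e-4):
    """Decay all eligibility traces and drop any that fall below trace_min.

    Keeping only near-active traces means each TD(lambda) step touches
    O(number of active states) entries rather than the whole state set.
    """
    for s in list(traces):          # list() so we can delete while iterating
        traces[s] *= gamma * lam
        if traces[s] < trace_min:
            del traces[s]
    return traces
```

The update loop in the earlier TD(λ) sketch would then iterate only over this pruned dict when applying each TD error.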

AnonymousFeedback.net. Send to:
1. What's been most useful to you (and why)?
2. What's been least useful (and why)?
3. What could students do to improve the class?
4. What could Matt do to improve the class?