CHAPTER 11: REINFORCEMENT LEARNING VIA TEMPORAL DIFFERENCES

Organization of chapter in ISSO:
–Introduction
–Delayed reinforcement
–Basic temporal difference algorithm
–Batch and online implementations of TD
–Examples
–Connections to stochastic approximation

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

11-2 Reinforcement Learning

Reinforcement learning is an important class of methods in computer science, AI, engineering, etc.
–Based on the common-sense idea that good results are reinforced while bad results provide negative reinforcement

Delayed reinforcement only provides an output after several intermediate "actions"

Want to create a model for predicting the state of the system
–Model depends on parameters θ
–"Training" or "learning" (estimating θ) is not based on methods such as stochastic gradient (supervised learning) because of the delay in the response
–Need a learning method that copes with the delayed response

11-3 Schematic of Delayed Reinforcement Process

Suppose time moves left to right in the diagram below
Z represents some system output at a future time
The values ẑ₀, ẑ₁, …, ẑₙ represent some intermediate predictions of Z, made before Z is observed

11-4 Temporal Difference (TD) Learning

Focus is the delayed reinforcement problem
Prediction function has the form h(θ, x), where θ is a vector of parameters and x is an input
Need to estimate θ from a sequence of inputs and outputs {x₀, x₁, …, xₙ; Z}
TD learning is a method for using the intermediate predictions ẑₖ = h(θ, xₖ) in training, rather than only the inputs and outputs
–Implies that some forms of TD allow updating of the θ value before observing Z
–TD exploits prior information embedded in the predictions to modify θ
Basic form of TD for updating θ is θ_new = θ_old + Σₖ δₖ, where the increments δₖ are to be determined
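To make the update above concrete, here is a minimal sketch. The function names, the linear predictor h(θ, x) = θᵀx, and the Sutton-style increment δₖ = α(ẑₖ₊₁ − ẑₖ)∇θh(θ, xₖ), with Z serving as the final "prediction", are my illustrative assumptions, not the book's exact algorithm:

```python
import numpy as np

def td_increments(theta, xs, Z, alpha=0.1):
    """TD increments delta_k for a linear predictor h(theta, x) = theta @ x.

    xs: input vectors x_0..x_n; Z: the eventual observed outcome.
    Each increment uses the difference between successive predictions,
    with Z playing the role of the final prediction.
    """
    preds = [theta @ x for x in xs] + [Z]  # h(theta, x_0), ..., h(theta, x_n), Z
    # delta_k = alpha * (pred_{k+1} - pred_k) * grad_theta h(theta, x_k);
    # for the linear predictor the gradient is simply x_k
    return [alpha * (preds[k + 1] - preds[k]) * xs[k] for k in range(len(xs))]

def td_update(theta, xs, Z, alpha=0.1):
    """One application of theta_new = theta_old + sum_k delta_k."""
    return theta + sum(td_increments(theta, xs, Z, alpha))
```

Note that every δₖ except the last depends only on successive predictions, which is why some forms of TD can update θ before Z is observed.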

11-5 Exercise 11.4 in ISSO: Conceptual Example of Benefits of TD

[Figure: circles denote game states, labeled "Novel" and "Bad"; the game outcome is a Loss 90% of the time and a Win 10% of the time.]

11-6 Batch Version of TD Learning
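The transcript drops this slide's algorithm box. As a hedged sketch of the batch idea (again assuming a linear predictor h(θ, x) = θᵀx, which is my choice for illustration): within each pass, all increments are computed using the *same* θ over the full data {x₀, …, xₙ; Z} (possibly several such sequences), then applied in one step:

```python
import numpy as np

def batch_td(theta, sequences, alpha=0.05, n_passes=50):
    """Batch TD for a linear predictor h(theta, x) = theta @ x.

    sequences: list of (xs, Z) pairs, each a full input sequence with its
    eventual outcome Z.  Increments are accumulated over all data with a
    fixed theta, then applied at once; repeat for n_passes passes.
    """
    theta = np.asarray(theta, dtype=float)
    for _ in range(n_passes):
        total = np.zeros_like(theta)
        for xs, Z in sequences:
            preds = [theta @ x for x in xs] + [Z]  # Z acts as final prediction
            for k, x in enumerate(xs):
                total += alpha * (preds[k + 1] - preds[k]) * x
        theta = theta + total  # single batch update per pass
    return theta
```

The contrast with an online implementation is only *when* the increments are applied: online TD updates θ after each increment, while the batch form holds θ fixed for a full pass.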

11-7 Random-Walk Model (Example 11.3 in ISSO)

All walks begin in state S3
Each step involves a 50–50 chance of moving left or right until terminal state T_left or T_right is reached
Use TD to estimate the probabilities of reaching T_right from any of the states S1, S2, S3, S4, or S5

[Figure: T_left – S1 – S2 – S3 – S4 – S5 – T_right, with the walk starting at S3]
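This example can be reproduced with a short tabular TD(0) simulation (a sketch; the integer state encoding, step size, and walk count are my choices, not from the slide). From state Sₛ with absorbing ends, the true probability of reaching T_right is s/6, which the TD estimates should approach:

```python
import random

def td_random_walk(n_walks=5000, alpha=0.05, seed=0):
    """Tabular TD(0) estimate of P(reach T_right | state s) for the
    5-state random walk  T_left - S1 - S2 - S3 - S4 - S5 - T_right.

    States are encoded 1..5, with 0 = T_left and 6 = T_right.
    Every walk starts at S3 and moves left/right with probability 1/2.
    """
    rng = random.Random(seed)
    V = {s: 0.5 for s in range(1, 6)}  # initial guess for each nonterminal state
    for _ in range(n_walks):
        s = 3  # all walks begin in state S3
        while 1 <= s <= 5:
            s_next = s + rng.choice((-1, 1))
            # terminal target: 1.0 at T_right (state 6), 0.0 at T_left (state 0)
            target = V.get(s_next, 1.0 if s_next == 6 else 0.0)
            V[s] += alpha * (target - V[s])  # TD(0) update toward the next prediction
            s = s_next
    return V
```

After a few thousand walks the estimates cluster near the true values 1/6, 2/6, …, 5/6, in increasing order across the states.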