CHAPTER 10: Reinforcement Learning and Utility Theory
Recap: MDPs
Reinforcement Learning
Example: Animal Learning
RL studied experimentally for more than 60 years in psychology
Rewards: food, pain, hunger, drugs, etc.
Mechanisms and sophistication debated
Example: foraging
– Bees learn a near-optimal foraging plan in a field of artificial flowers with controlled nectar supplies
– Bees have a direct neural connection from nectar intake measurement to the motor planning area
Example: Backgammon
Passive Learning
Example: Direct Estimation
Model-Based Learning
Idea:
– Learn the model empirically (rather than values)
– Solve the MDP as if the learned model were correct
Empirical model learning:
– Simplest case: count outcomes for each s, a
– Normalize to give an estimate of T(s, a, s')
– Discover R(s) the first time we enter s
– More complex learners are possible (e.g. if we know that all squares have related action outcomes, "stationary noise")
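A rough sketch of the simplest case above (not from the original slides; the transition format (s, a, s_next, r) and the helper name estimate_model are assumptions): count outcomes for each s, a pair, normalize to estimate T(s, a, s'), and record R(s) the first time a state is entered.

```python
from collections import defaultdict

def estimate_model(episodes):
    """Empirical model learning (simplest case).

    `episodes` is assumed to be an iterable of (s, a, s_next, r) transitions,
    where r is the reward observed on entering s_next.
    Returns (T_hat, R_hat): estimated transitions and first-visit rewards.
    """
    counts = defaultdict(lambda: defaultdict(int))  # counts[(s, a)][s_next]
    R_hat = {}                                      # R(s) discovered on first entry

    for s, a, s_next, r in episodes:
        counts[(s, a)][s_next] += 1
        if s_next not in R_hat:
            R_hat[s_next] = r   # discover R(s) the first time we enter s

    # Normalize counts to give an estimate of T(s, a, s')
    T_hat = {}
    for sa, outcomes in counts.items():
        total = sum(outcomes.values())
        T_hat[sa] = {s_next: n / total for s_next, n in outcomes.items()}
    return T_hat, R_hat
```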
Example: Model-Based Learning
Model-Free Learning
(Greedy) Active Learning
In general, we want to learn the optimal policy
Idea:
– Learn an initial model of the environment
– Solve for the optimal policy for this model (value or policy iteration)
– Refine the model through experience and repeat
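The loop above can be sketched in a few lines (assumptions: an environment object with reset/step/random_action methods, plus estimate_model and solve_mdp helpers standing in for the empirical model estimator and value or policy iteration; none of these names come from the slides).

```python
def greedy_active_learning(env, num_rounds, estimate_model, solve_mdp):
    """Greedy active learning: act on the current best policy, then re-fit and re-solve."""
    experience = []   # accumulated (s, a, s_next, r) transitions
    policy = {}       # state -> action; empty until the first model is solved

    for _ in range(num_rounds):
        # Run one episode, acting greedily with respect to the current policy
        s = env.reset()
        done = False
        while not done:
            a = policy[s] if s in policy else env.random_action()
            s_next, r, done = env.step(a)
            experience.append((s, a, s_next, r))
            s = s_next

        # Refine the model from all experience and re-solve for the optimal policy
        T_hat, R_hat = estimate_model(experience)
        policy = solve_mdp(T_hat, R_hat)   # e.g. value or policy iteration

    return policy
```

Because actions always exploit the current model's policy, this greedy scheme can get stuck with a misleading model; the exploration/exploitation slides later in the chapter address this.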
Example: Greedy Active Learning
Q-Functions
Learning Q-Functions: MDPs
Q-Learning
Exploration / Exploitation
Exploration Functions
Function Approximation
Problem: too slow to learn each state's utility one by one
Solution: what we learn about one state should generalize to similar states
– Very much like supervised learning
– If states are treated entirely independently, we can only learn on very small state spaces
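A minimal sketch of this generalization idea, anticipating the linear value and TD update slides below (the feature function feat and the step sizes are illustrative assumptions, not from the slides): the value estimate is a weighted sum of features, so a TD update driven by one state also shifts the estimates of every state that shares those features.

```python
import numpy as np

def linear_value(weights, feat, s):
    """Approximate V(s) as a dot product of weights and features: V(s) = w . f(s)."""
    return np.dot(weights, feat(s))

def td_update(weights, feat, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD update of the weights from a single transition (s, r, s_next).

    The TD error compares the sampled target r + gamma * V(s_next) against V(s);
    each weight moves in proportion to its feature's value at s, so states that
    share features generalize from the same experience.
    `feat` is assumed to return a NumPy feature vector for a state.
    """
    target = r + gamma * linear_value(weights, feat, s_next)
    error = target - linear_value(weights, feat, s)
    return weights + alpha * error * feat(s)
```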
Discretization
Linear Value Functions
TD Updates for Linear Values