Presentation transcript:

Any Questions? Programming Assignments?

CptS 450/516 – Singularity – Weka / Torch / RL-Glue / BURLAP / RLPy – Finite MDP = ? – Start arbitrarily, moving towards the “truth”
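For reference, here is a minimal sketch of the standard answer to "Finite MDP = ?", in Sutton & Barto's notation (added for clarity; not transcribed from the slide):

\[
\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, p, r, \gamma \rangle,
\qquad
p(s' \mid s, a) = \Pr\{S_{t+1} = s' \mid S_t = s, A_t = a\},
\qquad
r(s, a, s') = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s'],
\]

with \(\mathcal{S}\) and \(\mathcal{A}\) finite and \(\gamma \in [0, 1]\) the discount factor.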

Transfer Learning
– Transfer Learning with Probabilistic Mapping Selection – A. Fachantidis, I. Partalas, M. E. Taylor, I. Vlahavas – in Adaptive Behavior, 2014
Agents Teaching Agents
– Reinforcement Learning Agents Providing Advice in Complex Video Games – M. E. Taylor, N. Carboni, A. Fachantidis, I. Vlahavas and L. Torrey – in Journal of Connection Science, 2014
Agents Teaching Humans
– Agents Teaching Humans in Reinforcement Learning Tasks – Y. Zhan, A. Fachantidis, I. Vlahavas, and M. E. Taylor – in Adaptive and Learning Agents workshop (at AAMAS), 2014
Humans Teaching Agents
– Generating Real-Time Crowd Advice to Improve Reinforcement Learning Agents – G. V. de la Cruz Jr., B. Peng, W. L. Lasecki, M. E. Taylor – in Workshop on Learning for General Competency in Video Games: tomorrow

Sutton & Barto, Chapter 4 Dynamic Programming

Review: – V, V* – Q, Q* – π, π* Bellman Equation vs. Update
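To make the "equation vs. update" distinction concrete, here is a sketch in standard Sutton & Barto notation (added for clarity, not transcribed from the slide). The Bellman equation is a consistency condition satisfied by the true value function; the update is the same expression used as an assignment and applied repeatedly:

\[
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma\, v_\pi(s') \bigr]
\qquad \text{(Bellman equation)}
\]
\[
v_{k+1}(s) \leftarrow \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma\, v_k(s') \bigr]
\qquad \text{(iterative update)}
\]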

Solutions Given a Model – Finite MDPs – Exploration / Exploitation? – Where would Dynamic Programming be used?

“DP may not be practical for very large problems...” & “For the largest problems, only DP methods are feasible” – Curse of Dimensionality

Value Function Existence and uniqueness guaranteed when – γ < 1 or – eventual termination is guaranteed from all states under π
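As a sketch of what is meant (the standard definition, added here rather than transcribed from the slide), the value function in question is the expected discounted return under π:

\[
v_\pi(s) = \mathbb{E}_\pi\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \;\middle|\; S_t = s \right],
\]

which is finite and uniquely defined when γ < 1, or when every episode terminates under π, so that the sum has only finitely many nonzero terms.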

Policy Evaluation: Value Iteration Iterative Policy Evaluation – Full backup: go through each state and consider each possible subsequent state – Two copies of V or “in place” backup?

Policy Evaluation: Algorithm
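The algorithm itself (an image on the original slide) is not reproduced in the transcript. Below is a minimal Python sketch of iterative policy evaluation with in-place backups; the names `states`, `policy`, and `transitions` describe an assumed interface to the known MDP model and are illustrative, not from the slides.

# Iterative policy evaluation with in-place ("single array") backups.
# Assumed interface: `states` iterates over non-terminal states, `policy[s]`
# maps each action a to pi(a|s), and `transitions(s, a)` returns
# (prob, next_state, reward) triples from the known model.
# Terminal states are valued at 0.
def evaluate_policy(states, policy, transitions, gamma=0.9, theta=1e-6):
    V = {s: 0.0 for s in states}              # start arbitrarily (zeros here)
    while True:
        delta = 0.0
        for s in states:                      # one full sweep of the state set
            v_old = V[s]
            v_new = 0.0
            for a, pi_sa in policy[s].items():
                for prob, s_next, reward in transitions(s, a):
                    # full backup: every action and every successor state
                    v_new += pi_sa * prob * (reward + gamma * V.get(s_next, 0.0))
            V[s] = v_new                      # in-place: later backups in this
            delta = max(delta, abs(v_old - v_new))  # sweep use the new value
        if delta < theta:                     # stop when a sweep barely moves V
            return V

The "two copies of V" variant from the previous slide differs only in writing new values into a second dictionary and swapping it in after the sweep finishes.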

Policy Improvement Update the policy so that, in each state, the chosen action maximizes the expected one-step return (immediate reward plus γ·V(s’)) For each state, π’(s)=?
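One way to write the answer (the standard greedy improvement step, stated here as a sketch rather than transcribed from the slide):

\[
\pi'(s) = \arg\max_{a} \sum_{s'} p(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma\, V(s') \bigr].
\]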
