Presentation transcript:

Any Questions? Programming Assignments?

CptS 450/516 – Singularity – Weka / Torch / RL-Glue / BURLAP / RLPy – Finite MDP = ? – Start arbitrarily, moving towards the “truth”
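For reference, here is a minimal sketch of the standard answer to "Finite MDP = ?", in Sutton & Barto's notation (added for clarity; not transcribed from the slide):

\[
\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, p, r, \gamma \rangle,
\qquad
p(s' \mid s, a) = \Pr\{S_{t+1} = s' \mid S_t = s, A_t = a\},
\qquad
r(s, a, s') = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a, S_{t+1} = s'],
\]

with \(\mathcal{S}\) and \(\mathcal{A}\) finite and \(\gamma \in [0, 1]\) the discount factor.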

Transfer Learning
– Transfer Learning with Probabilistic Mapping Selection – A. Fachantidis, I. Partalas, M. E. Taylor, I. Vlahavas – in Adaptive Behavior, 2014
Agents Teaching Agents
– Reinforcement Learning Agents Providing Advice in Complex Video Games – M. E. Taylor, N. Carboni, A. Fachantidis, I. Vlahavas and L. Torrey – in Journal of Connection Science, 2014
Agents Teaching Humans
– Agents Teaching Humans in Reinforcement Learning Tasks – Y. Zhan, A. Fachantidis, I. Vlahavas, and M. E. Taylor – in Adaptive and Learning Agents workshop (at AAMAS), 2014
Humans Teaching Agents
– Generating Real-Time Crowd Advice to Improve Reinforcement Learning Agents – G. V. de la Cruz Jr., B. Peng, W. L. Lasecki, M. E. Taylor – in Workshop on Learning for General Competency in Video Games: tomorrow

Sutton & Barto, Chapter 4 Dynamic Programming

Review: – V, V* – Q, Q* – π, π* Bellman Equation vs. Update
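To make the "equation vs. update" distinction concrete, here is a sketch in standard Sutton & Barto notation (added for clarity, not transcribed from the slide). The Bellman equation is a consistency condition satisfied by the true value function; the update is the same expression used as an assignment and applied repeatedly:

\[
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma\, v_\pi(s') \bigr]
\qquad \text{(Bellman equation)}
\]
\[
v_{k+1}(s) \leftarrow \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma\, v_k(s') \bigr]
\qquad \text{(iterative update)}
\]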

Solutions Given a Model – Finite MDPs – Exploration / Exploitation? – Where would Dynamic Programming be used?

“DP may not be practical for very large problems...” & “For the largest problems, only DP methods are feasible” – Curse of Dimensionality

Value Function Existence and uniqueness guaranteed when – γ < 1 or – eventual termination is guaranteed from all states under π
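As a sketch of what is meant (the standard definition, added here rather than transcribed from the slide), the value function in question is the expected discounted return under π:

\[
v_\pi(s) = \mathbb{E}_\pi\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \;\middle|\; S_t = s \right],
\]

which is finite and uniquely defined when γ < 1, or when every episode terminates under π, so that the sum has only finitely many nonzero terms.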

Policy Evaluation: Value Iteration Iterative Policy Evaluation – Full backup: go through each state and consider each possible subsequent state – Two copies of V or “in place” backup?

Policy Evaluation: Algorithm
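The algorithm itself (an image on the original slide) is not reproduced in the transcript. Below is a minimal Python sketch of iterative policy evaluation with in-place backups; the names `states`, `policy`, and `transitions` describe an assumed interface to the known MDP model and are illustrative, not from the slides.

# Iterative policy evaluation with in-place ("single array") backups.
# Assumed interface: `states` iterates over non-terminal states, `policy[s]`
# maps each action a to pi(a|s), and `transitions(s, a)` returns
# (prob, next_state, reward) triples from the known model.
# Terminal states are valued at 0.
def evaluate_policy(states, policy, transitions, gamma=0.9, theta=1e-6):
    V = {s: 0.0 for s in states}              # start arbitrarily (zeros here)
    while True:
        delta = 0.0
        for s in states:                      # one full sweep of the state set
            v_old = V[s]
            v_new = 0.0
            for a, pi_sa in policy[s].items():
                for prob, s_next, reward in transitions(s, a):
                    # full backup: every action and every successor state
                    v_new += pi_sa * prob * (reward + gamma * V.get(s_next, 0.0))
            V[s] = v_new                      # in-place: later backups in this
            delta = max(delta, abs(v_old - v_new))  # sweep use the new value
        if delta < theta:                     # stop when a sweep barely moves V
            return V

The "two copies of V" variant from the previous slide differs only in writing new values into a second dictionary and swapping it in after the sweep finishes.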

Policy Improvement Update the policy so that, in each state, the chosen action maximizes the expected one-step return (immediate reward plus γ·V(s’)) For each state, π’(s)=?
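One way to write the answer (the standard greedy improvement step, stated here as a sketch rather than transcribed from the slide):

\[
\pi'(s) = \arg\max_{a} \sum_{s'} p(s' \mid s, a)\,\bigl[ r(s, a, s') + \gamma\, V(s') \bigr].
\]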
