Some Final Thoughts
Abhijit Gosavi

From MDPs to SMDPs
The semi-Markov decision process (SMDP) is a more general model in which the transition time is itself a random variable. The MDP Bellman equations can be extended to SMDPs to account for this random transition time.
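As a sketch of how that extension looks in the discounted-reward case (one common form, assuming lump-sum rewards at the start of each transition; the notation here is generic rather than taken from the slides):

```latex
% Discounted-reward Bellman optimality equation for an SMDP (sketch).
% \tau is the random sojourn time of the transition; \gamma > 0 is the
% continuous-time discount rate; rewards are assumed lump-sum.
V^*(i) = \max_{a \in A(i)} \Bigl[ \bar{r}(i,a)
  + \sum_{j \in S} p(i,a,j)\, \mathbb{E}\bigl[ e^{-\gamma \tau} \mid i, a, j \bigr]\, V^*(j) \Bigr]
```

Note that fixing \tau \equiv 1 recovers the familiar MDP equation with discount factor e^{-\gamma}.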

SMDPs (contd.)
- In the average-reward case, we are interested in maximizing the average reward per unit time.
- In the discounted-reward case, we must discount in proportion to the time spent in each transition.
- The Q-Learning algorithm for discounted reward has a direct extension to SMDPs (see the sketch below).
- For average reward, there is a family of algorithms called R-SMART (see the book for references).
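For concreteness, here is a minimal Python sketch of that extension of discounted-reward Q-Learning: the effective discount factor exp(-gamma * tau) now depends on the observed sojourn time tau. The state/action encoding, parameter values, and the tiny demo update are hypothetical placeholders, not part of the original slides.

```python
import math
from collections import defaultdict

def smdp_q_update(Q, s, a, r, s_next, tau, actions, alpha=0.1, gamma=0.5):
    """One discounted-reward SMDP Q-Learning update.

    Unlike the MDP case, the discount factor exp(-gamma * tau) depends on
    the random sojourn time tau observed for this particular transition.
    """
    discount = math.exp(-gamma * tau)
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + discount * best_next - Q[(s, a)])

# Tiny illustrative update (all numbers hypothetical):
Q = defaultdict(float)
actions = [0, 1]
smdp_q_update(Q, s=0, a=1, r=2.0, s_next=1, tau=3.5, actions=actions)
print(Q[(0, 1)])  # 0.1 * (2.0 + exp(-1.75) * 0 - 0) = 0.2
```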

Policy Iteration
- Another method for solving the MDP: an alternative to value iteration
- Slightly more involved mathematically
- Sometimes more efficient than value iteration
- Its Reinforcement Learning counterpart is called Approximate Policy Iteration
A sketch of the tabular version follows below.
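To make the comparison with value iteration concrete, here is a minimal Python sketch of tabular policy iteration: exact policy evaluation by a linear solve, followed by greedy policy improvement. The toy transition matrices and rewards at the bottom are illustrative placeholders, not from the course.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Tabular policy iteration.

    P[a] is an (n_states x n_states) transition matrix for action a;
    R[a] is an (n_states,) expected immediate-reward vector for action a.
    """
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# Usage sketch on a toy 2-state, 2-action MDP (numbers are illustrative):
P = [np.array([[0.8, 0.2], [0.1, 0.9]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
policy, V = policy_iteration(P, R)
print(policy, V)
```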

Other Applications
- Supply chain problems
- Disaster response management
- Production planning in remanufacturing systems
- Continuous event systems (LQG control)

What you’ve learned (hopefully!)
- Markov chains and how they can be employed to model systems
- Markov decision processes: the idea of optimizing systems (controls) driven by Markov chains
- Some concepts from Artificial Intelligence
- Some (hopefully) cool applications of Reinforcement Learning
- Some coding (for those who were not averse to doing it)
- Systems thinking
- Coding iterative algorithms
- Some discrete-event simulation
HOPE YOU’VE ENJOYED THE CLASS!

HAPPY HOLIDAYS!!!