KI2 - 10 Kunstmatige Intelligentie / RuG: Markov Decision Processes (AIMA, Chapter 17)

Presentation transcript:

1 KI2 Kunstmatige Intelligentie / RuG: Markov Decision Processes (AIMA, Chapter 17)

2 Markov Decision Problem: How to use knowledge about the world to make decisions even when the outcomes of an action are uncertain and the payoffs will not be obtained until several (or many) actions have passed.

3 The Solution: Sequential decision problems in uncertain environments can be solved by calculating a policy that associates an optimal decision with every state the agent might reach => Markov Decision Process (MDP).
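
For reference, the standard AIMA formalization of an MDP (the notation below, including the discount factor γ, is the textbook's and is not spelled out on this slide):
- a set of states S with initial state s_0, and a set of actions A(s) available in each state
- a transition model P(s' | s, a): the probability of reaching s' when action a is taken in state s (the Markov property: the next state depends only on the current state and action)
- a reward function R(s), and optionally a discount factor γ
A solution is a policy π : S → A specifying an action for every state; an optimal policy π* maximizes the expected utility of the resulting state sequences.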

4 Example (figure: the world, with the agent's start state marked). Actions have uncertain consequences.

5-10 (figures only; no transcribed text)

11 Utility of a State Sequence
- Additive rewards
- Discounted rewards
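
The reward formulas themselves appear to have been dropped from the transcript; for reference, the standard AIMA definitions are (γ is the discount factor, 0 < γ ≤ 1):

  Additive rewards:    U([s_0, s_1, s_2, ...]) = R(s_0) + R(s_1) + R(s_2) + ...
  Discounted rewards:  U([s_0, s_1, s_2, ...]) = R(s_0) + γ R(s_1) + γ^2 R(s_2) + ...

Additive rewards are the special case γ = 1.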

12 (figure only; no transcribed text)

13 Utility of a State
- The utility of each state is the expected sum of discounted rewards if the agent executes the policy π
- The true utility of a state corresponds to the optimal policy π*
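
Written out (standard definition, not shown explicitly in the transcript):

  U^π(s) = E[ Σ_{t≥0} γ^t R(s_t) | π, s_0 = s ],    and the true utility is U(s) = U^{π*}(s)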

14 (figure only; no transcribed text)

15 Algorithms for Calculating the Optimal Policy
- Value iteration
- Policy iteration
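
Both algorithms rest on the Bellman equation for the utility of a state (standard AIMA form, assuming a discount factor γ; the equation is not written out in the transcript):

  U(s) = R(s) + γ max_a Σ_{s'} P(s' | s, a) U(s')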

16 Value Iteration
- Calculate the utility of each state
- Then use the state utilities to select an optimal action in each state
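
Once the utilities have converged, the action selection step is (standard formulation, not spelled out on the slide):

  π*(s) = argmax_a Σ_{s'} P(s' | s, a) U(s')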

17 Value Iteration Algorithm

function value-iteration(MDP) returns a utility function
  local variables: U, U', initially identical to R
  repeat
    U ← U'
    for each state s do
      U'[s] ← R(s) + γ max_a Σ_{s'} P(s' | s, a) U[s']   (the Bellman update)
    end
  until close-enough(U, U')
  return U
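
A minimal executable sketch of the algorithm above, assuming the MDP is represented with plain Python dictionaries (the data structures, parameter names, and stopping test are illustrative choices, not taken from the slides):

  def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
      # states:  list of states
      # actions: dict state -> list of available actions ([] for terminal states)
      # P:       dict (state, action) -> list of (probability, next_state) pairs
      # R:       dict state -> immediate reward
      U = {s: 0.0 for s in states}               # current utility estimates
      while True:
          U_new, delta = {}, 0.0
          for s in states:
              if not actions[s]:                  # terminal state
                  U_new[s] = R[s]
              else:                               # Bellman update
                  U_new[s] = R[s] + gamma * max(
                      sum(p * U[s2] for p, s2 in P[(s, a)])
                      for a in actions[s])
              delta = max(delta, abs(U_new[s] - U[s]))
          U = U_new
          if delta < eps:                         # utilities have (approximately) converged
              return U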

18 The Utilities of the States Obtained After Value Iteration (figure: the state utilities computed by the value iteration algorithm)

19 Policy Iteration
- Pick a policy, then calculate the utility of each state given that policy (value determination step)
- Update the policy at each state using the utilities of the successor states
- Repeat until the policy stabilizes

20 Policy Iteration Algorithm

function policy-iteration(MDP) returns a policy
  local variables: U, a utility function; π, a policy
  repeat
    U ← value-determination(π, U, MDP, R)
    unchanged? ← true
    for each state s do
      if max_a Σ_{s'} P(s' | s, a) U[s'] > Σ_{s'} P(s' | s, π[s]) U[s'] then
        π[s] ← argmax_a Σ_{s'} P(s' | s, a) U[s']
        unchanged? ← false
    end
  until unchanged?
  return π

21 Value Determination
- Simplification of the value iteration algorithm, because the policy is fixed
- Linear equations, because the max() operator has been removed
- Solve exactly for the utilities using standard linear algebra
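
A minimal sketch of value determination and the surrounding policy-iteration loop, using the same dictionary-based MDP representation as the value-iteration sketch above (function and variable names are illustrative). With the policy π fixed, the utilities satisfy the linear system u(s) = R(s) + γ Σ_{s'} P(s' | s, π(s)) u(s'), i.e. (I - γ T_π) u = r, which can be solved directly:

  import numpy as np

  def value_determination(states, pi, P, R, gamma=0.9):
      # Solve (I - gamma * T_pi) u = r exactly for the fixed policy pi.
      idx = {s: i for i, s in enumerate(states)}        # state -> matrix index
      n = len(states)
      T = np.zeros((n, n))
      r = np.array([R[s] for s in states], dtype=float)
      for s in states:
          if pi.get(s) is not None:                     # terminal states keep a zero row
              for p, s2 in P[(s, pi[s])]:
                  T[idx[s], idx[s2]] += p
      u = np.linalg.solve(np.eye(n) - gamma * T, r)
      return {s: u[idx[s]] for s in states}

  def policy_iteration(states, actions, P, R, gamma=0.9):
      # Start from an arbitrary policy and improve it until it is stable.
      pi = {s: (actions[s][0] if actions[s] else None) for s in states}
      while True:
          U = value_determination(states, pi, P, R, gamma)
          unchanged = True
          for s in states:
              if not actions[s]:
                  continue
              q = lambda a: sum(p * U[s2] for p, s2 in P[(s, a)])
              best = max(actions[s], key=q)
              if q(best) > q(pi[s]) + 1e-12:            # strictly better action found
                  pi[s] = best
                  unchanged = False
          if unchanged:
              return pi, U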

22 Optimal Policy (policy iteration with 11 linear equations)

u(1,1) = 0.8 u(1,2) + 0.1 u(2,1) + 0.1 u(1,1)
u(1,2) = 0.8 u(1,3) + 0.2 u(1,2)
…

23 Partially Observable MDP (POMDP)
- In an inaccessible environment, the percept does not provide enough information to determine the state or the transition probability
- POMDP
  – State transition function: P(s_{t+1} | s_t, a_t)
  – Observation function: P(o_t | s_t, a_t)
  – Reward function: E(r_t | s_t, a_t)
- Approach
  – Calculate a probability distribution over the possible states given all previous percepts, and base decisions on this distribution
- Difficulty
  – Actions cause the agent to obtain new percepts, which cause the agent's beliefs to change in complex ways
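
The "probability distribution over the possible states" is the agent's belief state b; after taking action a and receiving observation o it can be updated with the standard Bayesian filtering step (not written out in the transcript; α is a normalizing constant):

  b'(s') = α P(o | s', a) Σ_s P(s' | s, a) b(s)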