1 Operations Research, ITGD4207, University of Palestine
Prepared by: Abed Alhameed Mohammed Alfarra
Supervised by: Dr. Sana’a Wafa Al-Sayegh
2nd Semester 2008-2009

2 ITGD4207 Operations Research, Chapter 14: Markov Decision Processes

3 Outline
- Introduction to MDPs
- Definition of an MDP
- Solution
- MDP Basics and Terminology
- Markov Assumption
- A Prototype Example (Example 1)
- Example 2

4 Introduction to MDPs
A Markov decision process is a discrete-time stochastic control process characterized by a set of states; in each state there are several actions from which the decision maker must choose. For a state s and an action a, the transition function P(s' | s, a) gives the probability of moving to each possible next state s'. The decision maker earns a reward for each state transition. MDPs have their roots in operations research and are also used in economics, communications engineering, ecology, and performance modeling.

5 Definition of an MDP
Formally, an MDP is defined as a tuple (S, A, T, R):
- S: the set of states
- A: the set of actions
- T: the transition function, a table P(s' | s, a) giving the probability that action a taken in state s at time t leads to state s' at time t + 1
- R: the reward function, R(s, a) = the cost or reward of taking action a in state s
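To make the tuple concrete, here is a minimal sketch in Python of how an MDP could be stored as plain data. The two-state MDP below (states s0, s1 and actions stay, switch) is purely illustrative and not taken from the slides; only the structure of S, A, T = P(s' | s, a), and R(s, a) follows the definition above.

```python
# A minimal sketch of the MDP tuple (S, A, T, R) as plain Python data.
# The two-state "toy" MDP below is illustrative; only the structure follows
# the definition on this slide.

S = ["s0", "s1"]                      # state set
A = ["stay", "switch"]                # action set

# T[(s, a)] is a dict mapping next state s' -> P(s' | s, a)
T = {
    ("s0", "stay"):   {"s0": 0.9, "s1": 0.1},
    ("s0", "switch"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"):   {"s0": 0.1, "s1": 0.9},
    ("s1", "switch"): {"s0": 0.8, "s1": 0.2},
}

# R[(s, a)] is the immediate reward (or negative cost) of taking a in s
R = {
    ("s0", "stay"): 1.0, ("s0", "switch"): -0.5,
    ("s1", "stay"): 0.0, ("s1", "switch"): -0.5,
}

# Sanity check: every transition distribution sums to 1
for (s, a), dist in T.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, (s, a)
```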

6 Definition of an MDP
The goal is to maximize some cumulative function of the rewards, typically the expected discounted sum over a potentially infinite horizon:
E[ Σ_{t ≥ 0} γ^t R(s_t, a_t) ], where 0 ≤ γ < 1 is the discount factor.
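As a quick illustration of this objective, the sketch below sums γ^t r_t over a finite prefix of a reward sequence; the discount factor 0.9 and the reward values are arbitrary choices, not from the slides.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over a (finite prefix of a) reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Five steps of reward 1 with gamma = 0.9:
# 1 + 0.9 + 0.81 + 0.729 + 0.6561 = 4.0951
print(discounted_return([1, 1, 1, 1, 1]))
```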

7 Solution
The solution to a Markov decision process can be expressed as a policy π, a function from states to actions. Note that once a Markov decision process is combined with a policy in this way, the action is fixed for each state and the resulting combination behaves like a Markov chain.
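This remark can be made concrete: fixing an action π(s) for every state collapses the transition table into an ordinary Markov chain with P(s' | s) = T(s, π(s), s'). A minimal sketch, again using an illustrative two-state MDP and an arbitrary policy (neither is from the slides):

```python
# Combining an MDP transition function with a policy pi yields a Markov chain.
S = ["s0", "s1"]
T = {  # T[(s, a)][s'] = P(s' | s, a)
    ("s0", "stay"):   {"s0": 0.9, "s1": 0.1},
    ("s0", "switch"): {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"):   {"s0": 0.1, "s1": 0.9},
    ("s1", "switch"): {"s0": 0.8, "s1": 0.2},
}
pi = {"s0": "switch", "s1": "stay"}   # a deterministic policy

# Induced Markov chain: P[s][s'] = T(s, pi(s), s')
P = {s: T[(s, pi[s])] for s in S}
print(P)  # {'s0': {'s0': 0.2, 's1': 0.8}, 's1': {'s0': 0.1, 's1': 0.9}}
```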

8 MDP Basics and Terminology
The goal is to choose a sequence of actions that is optimal. Commonly used MDP models:
- Finite horizon: maximize the expected reward over the next n steps.
- Infinite horizon: maximize the expected discounted reward.
- Average reward (per transition): maximize the average expected reward per transition.
- Goal state: maximize the expected reward (minimize the expected cost) of reaching some target state G.

9 Markov Assumption
Markov assumption: transition probabilities (and rewards) from any given state depend only on that state, not on the previous history; where you end up after an action depends only on the current state. The decision maker chooses a sequence of actions (not just one decision or one action), so utility is based on a sequence of decisions.

10 A Prototype Example (Example 1)
A manufacturer has one key machine at the core of one of its production processes. Because of heavy use, the machine deteriorates rapidly in both quality and output. Therefore, at the end of each week a thorough inspection is done whose results classify the condition of the machine into one of four possible states, labelled 0 through 3.

11 The following matrix shows the relative frequency (probability) of each possible transition from the state in one week (a row of the matrix) to the state in the following week (a column of the matrix).

12 The expected costs per week from this source are as follows (in particular, the total cost when the machine enters state 3 is $6,000). Find the expected average cost per unit time.

13 Solution

14 Steady-state equations:
π0 = π3
π1 = 7/8 π0 + 3/4 π1
π2 = 1/16 π0 + 1/8 π1 + 1/2 π2
π3 = 1/16 π0 + 1/8 π1 + 1/2 π2
1 = π0 + π1 + π2 + π3

From the π1 equation: π1 - 3/4 π1 = 7/8 π0, so 1/4 π1 = 7/8 π0 and π1 = 3.5 π0.
From the π2 equation: π2 - 1/2 π2 = 1/16 π0 + 1/8 π1, so 0.5 π2 = 1/16 π0 + 1/8 π1 and π2 = 0.125 π0 + 0.25 π1.
From the π3 equation and π0 = π3: π3 = 1/16 π0 + 1/8 π1 + 1/2 π2 = π0.

Substituting into 1 = π0 + π1 + π2 + π3:
1 = π0 + 3.5 π0 + (0.125 π0 + 0.25 (3.5 π0)) + π0 = 6.5 π0

π0 = 2/13 ≈ 0.154
π1 = 3.5 (2/13) = 7/13
π2 = 0.125 (2/13) + 0.25 (7/13) = 2/13
π3 = 2/13
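The hand calculation can be checked numerically. The transition matrix below is the one implied by the balance equations above (its last row sends state 3 back to state 0, which corresponds to replacing the failed machine); the sketch solves πP = π together with Σ π_i = 1 using NumPy and should reproduce π = (2/13, 7/13, 2/13, 2/13).

```python
import numpy as np

# Transition matrix implied by the steady-state equations on this slide.
P = np.array([
    [0.0, 7/8, 1/16, 1/16],
    [0.0, 3/4, 1/8,  1/8 ],
    [0.0, 0.0, 1/2,  1/2 ],
    [1.0, 0.0, 0.0,  0.0 ],
])

n = P.shape[0]
# Solve pi P = pi together with sum(pi) = 1 as a least-squares system.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

print(pi)       # ≈ [0.1538, 0.5385, 0.1538, 0.1538]
print(pi * 13)  # ≈ [2, 7, 2, 2], i.e. 2/13, 7/13, 2/13, 2/13
```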

15

16 Example 2
Assume we have three brands of household detergent, Ariel, Tide, and Omo, competing to attract customers. A study of the market found that the current market shares of the three brands are as follows: Ariel = 40%, Tide = 35%, Omo = 25%.

17 The study also estimated the changes in demand for the three brands over a regular 6-week period. The rates of switching from one brand to another during the study period were measured and are given in the following table (rows: the brand switched from; columns: the brand switched to, for Ariel, Tide, and Omo).

18 Find: the market share (of sales volume) for each detergent during the coming periods, based on the current share estimates and the transition probability matrix.

19 Solution
Market share for Ariel = (0.40 * … + 0.35 * … + 0.25 * 0.10) = 0.42
Market share for Tide = (0.40 * … + 0.35 * … + 0.25 * 0.15) = 0.3375
Market share for Omo = (0.40 * … + 0.35 * … + 0.25 * 0.75) = 0.2425
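The same numbers can be reproduced with a single matrix-vector product. The brand-switching matrix below is an assumption: only its Omo row (0.10 to Ariel, 0.15 to Tide, 0.75 to Omo) and the resulting shares 0.42 / 0.3375 / 0.2425 appear on the slides above, so the Ariel and Tide rows are illustrative values chosen to be consistent with those figures.

```python
import numpy as np

brands = ["Ariel", "Tide", "Omo"]
current = np.array([0.40, 0.35, 0.25])   # current market shares

# Assumed switching matrix (rows = from, columns = to: Ariel, Tide, Omo).
# Only the Omo row and the final shares are given on the slides; the
# Ariel and Tide rows here are illustrative values consistent with them.
P = np.array([
    [0.90, 0.05, 0.05],   # from Ariel
    [0.10, 0.80, 0.10],   # from Tide
    [0.10, 0.15, 0.75],   # from Omo
])

next_period = current @ P                 # share vector for the next period
for brand, share in zip(brands, next_period):
    print(f"{brand}: {share:.4f}")        # Ariel 0.4200, Tide 0.3375, Omo 0.2425
```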

20 Comparing the new shares with the current ones, we find:
- Ariel's share of the local market increases by 2%.
- Tide's share of the market declines by 1.25%.
- Omo's share declines by 0.75%.

21