Published by Delilah Gaines. Modified over 9 years ago.
1 Operations Research
Prepared by: Abed Alhameed Mohammed Alfarra
Supervised by: Dr. Sana’a Wafa Al-Sayegh
2nd Semester 2008-2009, ITGD4207, University of Palestine
2 ITGD4207 Operations Research Chapter 14 Markov Decision Processes
3 Outline
– Introduction to MDPs
– Definition MDP
– Solution
– MDP Basics and Terminology
– Markov Assumption
– A prototype Example 1
– Example 2
4 Introduction to MDPs
A Markov decision process (MDP) is a discrete-time stochastic control process characterized by a set of states; in each state there are several actions from which the decision maker must choose. For a state s and an action a, a state transition function Pa(s) determines the transition probabilities to the next state. The decision maker earns a reward for each state transition.
– Roots in operations research
– Also used in economics, communications engineering, ecology, and performance modeling
5 Definition MDP
An MDP is defined formally as a tuple (S, A, T, R):
– S: set of states
– A: set of actions
– T: transition function (table) P(s' | s, a), the probability of reaching s' given action a in state s
– R: reward function R(s, a), the cost or reward of taking action a in state s
– $P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$ is the probability that action a in state s at time t will lead to state s' at time t + 1.
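As a minimal sketch of the tuple above, the four components can be held in one small Python container. The class name `MDP`, the dictionary layout, and the two-state toy numbers are all assumptions made for illustration; none of them come from the slides.

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: list   # S
    actions: list  # A
    T: dict        # T[(s, a)] -> {s_next: probability}, i.e. P(s' | s, a)
    R: dict        # R[(s, a)] -> reward (or cost) of taking a in s

    def check(self):
        """Return True if every T[(s, a)] sums to 1 over next states."""
        return all(abs(sum(dist.values()) - 1.0) < 1e-9
                   for dist in self.T.values())

# Hypothetical toy MDP: the states, actions and numbers are illustrative only.
toy = MDP(
    states=["ok", "broken"],
    actions=["wait", "repair"],
    T={
        ("ok", "wait"): {"ok": 0.9, "broken": 0.1},
        ("ok", "repair"): {"ok": 1.0},
        ("broken", "wait"): {"broken": 1.0},
        ("broken", "repair"): {"ok": 0.8, "broken": 0.2},
    },
    R={
        ("ok", "wait"): 0.0,
        ("ok", "repair"): -1.0,
        ("broken", "wait"): -5.0,
        ("broken", "repair"): -2.0,
    },
)
print(toy.check())
```

The `check` method only verifies that each transition row is a probability distribution; it does not validate rewards or state names.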
6 Definition MDP
The goal is to maximize some cumulative function of the rewards, typically the discounted sum over a potentially infinite horizon:
$\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)$, where $0 \le \gamma < 1$ is the discount factor.
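The discounted sum can be illustrated on a finite reward sequence. The sketch below assumes a discount factor `gamma` between 0 and 1 (the usual convention, not a symbol defined on the slide) and a hypothetical list of per-step rewards.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t] over the given (finite) reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Hypothetical rewards for three consecutive steps.
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With `gamma` strictly below 1 and bounded rewards, later rewards contribute less and less, which is what keeps the infinite-horizon sum finite.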
7 Solution
The solution to a Markov decision process can be expressed as a policy π, a function from states to actions. Note that once a Markov decision process is combined with a policy in this way, this fixes the action for each state and the resulting combination behaves like a Markov chain.
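To make the last point concrete, the sketch below fixes a policy on a small MDP and reads off the resulting Markov chain. The two-state, two-action model and every probability in it are hypothetical, chosen only to illustrate the construction.

```python
# Hypothetical MDP: P[action][s][s_next] is the transition probability.
P = {
    "wait":   [[0.9, 0.1], [0.0, 1.0]],
    "repair": [[1.0, 0.0], [0.8, 0.2]],
}

# A policy maps each state to one action.
policy = {0: "wait", 1: "repair"}

def induced_chain(P, policy):
    """Row s of the chain is the transition row of the action chosen in s."""
    return [list(P[policy[s]][s]) for s in range(len(policy))]

chain = induced_chain(P, policy)
print(chain)  # [[0.9, 0.1], [0.8, 0.2]]
```

Because the policy removes the choice of action, the result is an ordinary transition matrix and can be analyzed with standard Markov chain tools such as steady-state probabilities.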
8 MDP Basics and Terminology
The goal is to choose a sequence of actions that is optimal under one of the following MDP models:
– Finite horizon: maximize the expected reward for the next n steps.
– Infinite horizon: maximize the expected discounted reward.
– Transition model: maximize the average expected reward per transition.
– Goal state: maximize the expected reward (minimize the expected cost) to reach some target state G.
9 Markov Assumption
– Markov assumption: transition probabilities (and rewards) from any given state depend only on that state and not on the previous history.
– Where you end up after an action depends only on the current state.
– Choose a sequence of actions (not just one decision or one action).
  – Utility is based on a sequence of decisions.
10 A prototype Example 1
A manufacturer has one key machine at the core of one of its production processes. Because of heavy use, the machine deteriorates rapidly in both quality and output. Therefore, at the end of each week, a thorough inspection is done that results in classifying the condition of the machine into one of four possible states:
11 The following matrix shows the relative frequency (probability) of each possible transition from the state in one month (a row of the matrix) to the state in the following month (a column of the matrix).
12 The expected costs per week from this source are as follows:
Find the expected average cost per unit time.
Total cost when the machine enters state 3 = $6,000
13 Solution
14
π0 = π3
π1 = 7/8 π0 + 3/4 π1
π1 - 3/4 π1 = 7/8 π0
0.25 π1 = 7/8 π0
π1 = 3.5 π0
π2 = 1/16 π0 + 1/8 π1 + 1/2 π2
π2 - 1/2 π2 = 1/16 π0 + 1/8 π1
0.5 π2 = 1/16 π0 + 1/8 π1
π2 = 0.125 π0 + 0.25 π1
π3 = 1/16 π0 + 1/8 π1 + 1/2 π2 = π0
1 = π0 + π1 + π2 + π3
1 = π0 + 3.5 π0 + 0.125 π0 + 0.25 (3.5 π0) + π0
1 = (1 + 3.5 + 0.125 + 0.875 + 1) π0
1 = 6.5 π0
π0 = 2/13 ≈ 0.154
(1) π1 = 3.5 (2/13) = 7/13
(2) π2 = 0.125 (2/13) + 0.25 (7/13) = 2/13
(3) π3 = 2/13
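The hand calculation can be cross-checked with exact fractions. In the sketch below, the matrix `P` is an assumption read back from the coefficients of the balance equations (with state 3 returning to state 0 with probability 1); the original matrix image is not available in this transcript. The helper names `steady_state` and `average_cost` are likewise illustrative.

```python
from fractions import Fraction as F

# Transition matrix implied by the balance equations (row = current state,
# column = next state). Assumption: reconstructed, not copied from the slide.
P = [
    [F(0), F(7, 8), F(1, 16), F(1, 16)],
    [F(0), F(3, 4), F(1, 8), F(1, 8)],
    [F(0), F(0), F(1, 2), F(1, 2)],
    [F(1), F(0), F(0), F(0)],
]

def steady_state(P):
    """Solve pi = pi P with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(P)
    # Row j encodes sum_i pi_i * P[i][j] - pi_j = 0, right-hand side last.
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n)]
    # One balance equation is redundant; replace it with the normalization.
    A[-1] = [F(1)] * n + [F(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

def average_cost(pi, costs):
    """Long-run expected cost per period: sum of pi_j * cost_j."""
    return sum(p * c for p, c in zip(pi, costs))

pi = steady_state(P)
print(pi)  # pi0 = 2/13, pi1 = 7/13, pi2 = 2/13, pi3 = 2/13
```

Once the per-state weekly costs are known, `average_cost(pi, costs)` gives the expected average cost per unit time; only the state 3 cost survives in this transcript, so no cost vector is filled in here.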
16 Example 2
Assume three brands of household detergent, Ariel, Tide, and Omo, are competing to attract customers. A broad study of the market found the current market shares of the three brands to be as follows:
Ariel = 40%, Tide = 35%, Omo = 25%
17 The study estimated the changes in demand for the three brands over a regular 6-week period. The rates at which customers switched from one brand to another during the study period are given in the following table:

From \ To   Ariel   Tide   Omo
Ariel       0.90    0.05   0.05
Tide        0.10    0.80   0.10
Omo         0.10    0.15   0.75
18 Find the market share of sales volume for each detergent in the coming periods, based on the current share estimates and the matrix of transition probabilities.
19 Solution
Market share for Tide = (0.40*0.05 + 0.35*0.8 + 0.25*0.15) = 0.3375
Market share for Ariel = (0.40*0.9 + 0.35*0.1 + 0.25*0.1) = 0.42
Market share for Omo = (0.40*0.05 + 0.35*0.1 + 0.25*0.75) = 0.2425
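The same arithmetic is a row vector of shares multiplied by the transition matrix. The sketch below uses the shares and switching probabilities from this example; the function names `next_shares` and `project` are illustrative, and `project` simply repeats the one-period step under the assumption that the switching probabilities stay constant.

```python
brands = ["Ariel", "Tide", "Omo"]
shares = [0.40, 0.35, 0.25]  # current market shares

# Row = current brand, column = brand bought in the next period.
P = [
    [0.90, 0.05, 0.05],  # from Ariel
    [0.10, 0.80, 0.10],  # from Tide
    [0.10, 0.15, 0.75],  # from Omo
]

def next_shares(shares, P):
    """One period ahead: new_j = sum over i of shares[i] * P[i][j]."""
    n = len(shares)
    return [sum(shares[i] * P[i][j] for i in range(n)) for j in range(n)]

def project(shares, P, periods):
    """Apply the one-period update repeatedly (assumes P does not change)."""
    for _ in range(periods):
        shares = next_shares(shares, P)
    return shares

for brand, share in zip(brands, next_shares(shares, P)):
    print(f"{brand}: {share:.4f}")
# Ariel: 0.4200
# Tide: 0.3375
# Omo: 0.2425
```

Calling `project(shares, P, k)` for larger `k` gives the shares after several periods, which is one way to read "the coming periods" in the problem statement.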
20 Comparing the new shares with the current ones, we find:
- Ariel's share of the local market increases by 2 percentage points.
- Tide's share of the local market declines by 1.25 percentage points.
- Omo's share declines by 0.75 percentage points.