
1 Chapter 19: Markov Decision Processes
© 2015 McGraw-Hill Education. All rights reserved.

2 Introduction
Stochastic processes
– Evolve over time in a probabilistic manner
Markov chain
– A type of stochastic process
– Special property: how the process will evolve in the future depends only on the current state, independent of past events
– May be continuous-time or discrete-time

3 Introduction
Transition matrix
– Gives the probabilities of what the state will be at the next time step
Many important systems can be modeled as a discrete-time or continuous-time Markov chain
This chapter focuses on:
– How to design a discrete-time Markov chain for optimal performance
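As a concrete illustration (a minimal Python sketch; the two-state matrix here is invented, not the chapter's example), a transition matrix is a square array whose rows are probability distributions over next states:

import numpy as np

# Row i gives the distribution of the next state, given current state i;
# each row must sum to 1. (Illustrative values only.)
P = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
])
assert np.allclose(P.sum(axis=1), 1.0)

# One time step: if the current state distribution is pi,
# the next-period distribution is pi @ P.
pi = np.array([1.0, 0.0])
print(pi @ P)   # -> [0.9 0.1]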

4 19.1 A Prototype Example
Manufacturer with one key machine
– Machine deteriorates rapidly in quality and output
– An end-of-week inspection classifies the machine's state

5 A Prototype Example
Transition matrix
– Created by analyzing historical data
– Shows the relative frequency of transitions from the state in one week to the state in the following week
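In practice, estimating the matrix from historical data amounts to row-normalizing a table of observed transition counts. A hedged Python sketch (the counts are invented for illustration; the prototype example uses four states, 0 = good as new through 3 = inoperable):

import numpy as np

# counts[i, j] = number of observed weeks in which the machine moved
# from state i to state j (illustrative values only).
counts = np.array([
    [ 0, 70,  5,  5],
    [ 0, 60, 10, 10],
    [ 0,  0, 20, 20],
    [80,  0,  0,  0],
])

# Relative frequencies: p_ij = (transitions i -> j) / (total exits from i).
P = counts / counts.sum(axis=1, keepdims=True)
print(P)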

6 A Prototype Example
State 3 is an absorbing state
– Once the machine becomes inoperable, it remains inoperable unless it is replaced
Replacement process
– Takes one week to complete
– Lost profit of $2,000 during that week
– Cost of replacing the machine is $4,000

7 A Prototype Example
Costs incurred per week by the machine in states other than state 3

8 A Prototype Example
Transition matrix describing the state of the system

9 A Prototype Example
Expected average cost per unit time
– Widely used performance measure for Markov chains
– Calculating it requires the steady-state probabilities
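In symbols (standard notation for this chapter; the slide's formulas are not reproduced in this transcript): if \(\pi_j\) is the steady-state probability of state j and \(C_j\) the expected cost incurred in state j, the steady-state probabilities solve

\[
\pi_j = \sum_i \pi_i \, p_{ij} \quad \text{for each } j, \qquad \sum_j \pi_j = 1,
\]

and the long-run expected average cost per unit time is \(\sum_j \pi_j C_j\).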

10 A Prototype Example
Potential maintenance decisions
– Do nothing
– Overhaul
– Replace
"Do nothing" maintenance policy (replace only when the machine becomes inoperable)
– Expected average cost per week: computed from the steady-state probabilities, as sketched below
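A sketch of this computation in Python, assuming the data of the standard version of the prototype example (these probabilities and costs are assumptions supplied here, not taken from the slides above): under the "do nothing" policy the machine is left alone in states 0–2 and replaced in state 3, replacement taking one week and costing $4,000 plus $2,000 in lost profit.

import numpy as np

# States: 0 = good as new, 1 = minor deterioration, 2 = major deterioration,
# 3 = inoperable. Transition matrix under "do nothing" (assumed values).
P = np.array([
    [0, 7/8, 1/16, 1/16],
    [0, 3/4, 1/8,  1/8 ],
    [0, 0,   1/2,  1/2 ],
    [1, 0,   0,    0   ],   # replacement returns the machine to state 0
])

# Expected weekly costs by state: defective-item costs in states 0-2,
# replacement cost plus lost profit in state 3 (assumed values).
c = np.array([0, 1000, 3000, 6000])

# Steady-state probabilities: solve pi P = pi together with sum(pi) = 1
# (one balance equation is redundant and is replaced by normalization).
n = len(c)
A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
b = np.zeros(n); b[-1] = 1.0
pi = np.linalg.solve(A, b)

print(pi)        # steady-state distribution
print(pi @ c)    # expected average cost per week (about $1,923 here)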

11 A Prototype Example

12 19.2 A Model for Markov Decision Processes

13 A Model for Markov Decision Processes
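For reference, the ingredients shown graphically on these slides are, in the chapter's standard notation: the system is observed at discrete time points and classified into one of a finite number of states; after each observation a decision k is chosen from a finite set of alternatives; an expected cost C_ik is incurred that depends on the state i and decision k; and the decision determines the transition probabilities p_ij(k) for the next state j. The objective is a policy that minimizes the long-run expected average cost per unit time.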

14 A Model for Markov Decision Processes
A policy is stationary if
– Whenever the system is in state i, the rule for making the decision is always the same, regardless of the time period
A policy is deterministic if
– Whenever the system is in state i, the rule for making the decision definitely chooses one particular decision

15 A Model for Markov Decision Processes
Solving the prototype example by exhaustive enumeration (see the sketch below)
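With so few states and decisions, every feasible stationary deterministic policy can simply be evaluated and compared. A minimal Python sketch under the same assumed data as above (decisions: 1 = do nothing, feasible in states 0–2; 2 = overhaul, feasible only in state 2 and returning the machine to state 1; 3 = replace, feasible in states 1–3):

import numpy as np

# Transition rows and expected weekly costs by (state, decision) pair
# (assumed values for the prototype example).
ROW = {
    (0, 1): [0, 7/8, 1/16, 1/16],
    (1, 1): [0, 3/4, 1/8,  1/8 ],
    (2, 1): [0, 0,   1/2,  1/2 ],
    (2, 2): [0, 1,   0,    0   ],   # overhaul: back to state 1
}
COST = {(0, 1): 0, (1, 1): 1000, (2, 1): 3000, (2, 2): 4000}
for i in (1, 2, 3):                  # replacement, feasible in states 1-3
    ROW[(i, 3)] = [1, 0, 0, 0]       # back to state 0 after one week
    COST[(i, 3)] = 6000              # $4,000 replacement + $2,000 lost profit

def average_cost(policy):
    """Long-run average cost of a policy (decision for states 0,1,2,3)."""
    P = np.array([ROW[(i, k)] for i, k in enumerate(policy)], dtype=float)
    c = np.array([COST[(i, k)] for i, k in enumerate(policy)], dtype=float)
    n = len(policy)
    A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n); b[-1] = 1.0
    return np.linalg.solve(A, b) @ c

# All feasible policies: do nothing in state 0, replace in state 3.
feasible = [(1, k1, k2, 3) for k1 in (1, 3) for k2 in (1, 2, 3)]
for pol in sorted(feasible, key=average_cost):
    print(pol, round(average_cost(pol), 2))
# Best: (1, 1, 2, 3) -- overhaul in state 2, replace in state 3 (~$1,667/week)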

16–17 A Model for Markov Decision Processes
(Exhaustive-enumeration calculations for the prototype example, shown as tables in the original slides)

18 19.3 Linear Programming and Optimal Policies
A policy R can also be characterized by assigning each D_ik a value of either zero or one in matrix form
– D_ik = 1 if policy R calls for decision k when the system is in state i, and D_ik = 0 otherwise (example below)
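For instance, the policy that does nothing in states 0 and 1, overhauls in state 2, and replaces in state 3 corresponds to the matrix (rows = states 0–3, columns = decisions 1–3):

D = [1 0 0]
    [1 0 0]
    [0 1 0]
    [0 0 1]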

19 Linear Programming and Optimal Policies
The optimal policy (from the enumeration above) is R_b
Linear programming formulation issue
– The D_ik values are integers (zero or one)
– Linear programming requires variables that can take continuous values

20 Linear Programming and Optimal Policies
Solution: redefine D_ik as the probability of choosing decision k when the system is in state i
– A policy that chooses decisions according to such probability distributions is called a randomized policy
Example of a randomized policy (shown below)
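As an illustration (values invented here, not taken from the slide): a randomized policy could leave states 0 and 1 alone, flip a fair coin between doing nothing and replacing in state 2, and always replace in state 3:

D = [1    0    0  ]
    [1    0    0  ]
    [1/2  0    1/2]
    [0    0    1  ]

Each row is now a probability distribution over decisions rather than a single chosen decision.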

21 Linear Programming and Optimal Policies
A linear programming formulation (written out below)
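The formulation (standard for this chapter; written out here because the slide's image is not reproduced) uses decision variables y_ik, interpreted as the steady-state probability of being in state i and making decision k:

\[
\begin{aligned}
\text{Minimize } Z &= \sum_{i}\sum_{k} C_{ik}\, y_{ik} \\
\text{subject to } & \sum_{i}\sum_{k} y_{ik} = 1, \\
& \sum_{k} y_{jk} - \sum_{i}\sum_{k} y_{ik}\, p_{ij}(k) = 0 \quad \text{for each state } j, \\
& y_{ik} \ge 0 \quad \text{for all } i \text{ and } k.
\end{aligned}
\]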

22 Linear Programming and Optimal Policies
Constraints on the y_ik: nonnegativity, summing to one, and the steady-state balance equations given above
Solve using the simplex method
– Once the optimal y_ik are found, each D_ik is recovered from the ratio below
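\[
D_{ik} = \frac{y_{ik}}{\sum_{k'} y_{ik'}},
\]

that is, the conditional probability of making decision k given that the system is in state i.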

23 Linear Programming and Optimal Policies
Key conclusion
– The optimal policy found by the simplex method is deterministic rather than randomized
– This holds because an optimal basic feasible solution has only one positive y_ik for each state, so each D_ik comes out as 0 or 1
Solve the prototype example by linear programming
– See the model given on the next slide

24 Linear Programming and Optimal Policies
The LP model for the prototype example (a sketch follows below)
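A minimal sketch of that model in Python using scipy's linprog (the HiGHS solver stands in for hand simplex iterations; the costs and transition probabilities are the same assumed data used earlier, not values from the slides):

import numpy as np
from scipy.optimize import linprog

# Feasible (state, decision) pairs with their weekly costs and transition
# rows (assumed data; decisions: 1 = do nothing, 2 = overhaul, 3 = replace).
pairs = [(0, 1), (1, 1), (1, 3), (2, 1), (2, 2), (2, 3), (3, 3)]
cost  = [0, 1000, 6000, 3000, 4000, 6000, 6000]
row = {
    (0, 1): [0, 7/8, 1/16, 1/16],
    (1, 1): [0, 3/4, 1/8,  1/8 ],
    (1, 3): [1, 0,   0,    0   ],
    (2, 1): [0, 0,   1/2,  1/2 ],
    (2, 2): [0, 1,   0,    0   ],
    (2, 3): [1, 0,   0,    0   ],
    (3, 3): [1, 0,   0,    0   ],
}

# Equality constraints: y sums to 1, plus one balance equation per state j:
#   sum_k y_jk - sum_{i,k} y_ik p_ij(k) = 0.
n_states, n_vars = 4, len(pairs)
A_eq, b_eq = [np.ones(n_vars)], [1.0]
for j in range(n_states):
    a = np.zeros(n_vars)
    for v, (i, k) in enumerate(pairs):
        a[v] = (1.0 if i == j else 0.0) - row[(i, k)][j]
    A_eq.append(a); b_eq.append(0.0)

res = linprog(cost, A_eq=np.array(A_eq), b_eq=b_eq,
              bounds=(0, None), method="highs")
print(res.fun)   # minimum expected average cost per week (~$1,667)

# Recover D_ik = y_ik / sum_k y_ik; the positive entries give the policy.
y = res.x
for i in range(n_states):
    total = sum(y[v] for v, (s, _) in enumerate(pairs) if s == i)
    for v, (s, k) in enumerate(pairs):
        if s == i and y[v] > 1e-9:
            print(f"state {s}: decision {k} (D = {y[v] / total:.2f})")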

25 Linear Programming and Optimal Policies
Applying the simplex method, the resulting optimal policy is:
– Leave the machine as is if it is in state 0 or 1
– Overhaul the machine if it is in state 2
– Replace the machine if it is in state 3

26 19.4 Conclusions
Markov decision processes
– A powerful tool for optimizing the performance of processes that can be modeled as discrete-time Markov chains
Common objective
– Find a policy (a decision for each state of the system) that minimizes the expected average cost per unit time
Solution methods
– Exhaustive enumeration and linear programming

