Chapter 19 Markov Decision Processes
© 2015 McGraw-Hill Education. All rights reserved.
Introduction
Stochastic processes
–Evolve over time in a probabilistic manner
Markov chain
–A special type of stochastic process
–Special property: how the process will evolve in the future depends only on the current state, independent of past events
–May be continuous-time or discrete-time
Introduction
Transition matrix
–Gives the probabilities for what the state will be next time
Many important systems can be modeled as a discrete-time or continuous-time Markov chain
This chapter focuses on:
–How to design a discrete-time Markov chain for optimal performance
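The role of a transition matrix can be illustrated with a small sketch (the matrix below is hypothetical, not taken from this chapter's example):

```python
import numpy as np

# A transition matrix for a 3-state discrete-time Markov chain.
# Entry P[i, j] is the probability that the process moves from
# state i to state j in one step, so each row sums to 1.
# (These values are hypothetical.)
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.0, 0.4, 0.6],
])

# Markovian property: the distribution of the next state depends
# only on the current state. If pi_t is the current state
# distribution (a row vector), the distribution one step later
# is pi_t @ P.
pi_0 = np.array([1.0, 0.0, 0.0])   # start in state 0 with certainty
pi_1 = pi_0 @ P                    # -> [0.7, 0.2, 0.1]
pi_2 = pi_1 @ P                    # approximately [0.55, 0.28, 0.17]
print(pi_1, pi_2)
```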
19.1 A Prototype Example
A manufacturer has one key machine
–The machine deteriorates rapidly in quality and output
–An end-of-week inspection classifies the machine's state
A Prototype Example
Transition matrix
–Created by analyzing historical data
–Shows the relative frequency of transitions from the state in one week to the state in the following week
A Prototype Example
State 3 is an absorbing state
–Once the machine becomes inoperable, it remains inoperable until it is replaced
Replacement process
–Takes one week to complete
–Lost profit of $2,000 during that week
–Cost of replacing the machine is $4,000
A Prototype Example
Costs incurred per week by the machine in states other than state 3
A Prototype Example
Transition matrix describing the state of the system
A Prototype Example
Expected average cost per unit time
–A widely used performance measure for Markov chains
–Calculating it requires the steady-state probabilities
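Steady-state probabilities solve pi P = pi together with the normalization that the probabilities sum to 1. A minimal sketch, using the transition probabilities that the standard version of this example assigns to the policy of replacing only when the machine is inoperable (these probabilities are assumptions here and should be checked against the chapter's own table):

```python
import numpy as np

# One-week transition matrix when the machine is replaced only once
# it reaches state 3 (inoperable). Probabilities are assumed from
# the standard version of this example.
P = np.array([
    [0, 7/8, 1/16, 1/16],   # state 0: good as new
    [0, 3/4, 1/8,  1/8 ],   # state 1: minor deterioration
    [0, 0,   1/2,  1/2 ],   # state 2: major deterioration
    [1, 0,   0,    0   ],   # state 3: inoperable -> replaced, back to 0
])

# Solve pi @ P = pi with sum(pi) = 1: transpose to (P.T - I) pi = 0,
# drop one redundant balance equation, and append the normalization.
n = P.shape[0]
A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
b = np.zeros(n)
b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi)   # approximately [2/13, 7/13, 2/13, 2/13]
```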
A Prototype Example
Potential maintenance policies
–Do nothing
–Overhaul
–Replace
Do-nothing maintenance policy
–Expected average cost per week
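Given the steady-state probabilities, the expected average cost per week is the cost vector weighted by those probabilities. A sketch for the policy of replacing only when the machine is inoperable, with the steady-state distribution and weekly costs assumed from the standard version of this example:

```python
import numpy as np

# Steady-state probabilities for states 0..3 under the policy of
# replacing only when the machine is inoperable (assumed figures).
pi = np.array([2/13, 7/13, 2/13, 2/13])

# Assumed weekly costs: $0 (good as new), $1,000 (minor
# deterioration), $3,000 (major deterioration), and $6,000 in the
# replacement week ($4,000 replacement + $2,000 lost profit).
costs = np.array([0, 1000, 3000, 6000])

# Expected average cost per week = sum_i pi_i * C_i.
avg_cost = pi @ costs
print(round(avg_cost, 2))   # -> 1923.08, i.e., 25000/13
```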
A Prototype Example
19.2 A Model for Markov Decision Processes
A Model for Markov Decision Processes
A Model for Markov Decision Processes
A policy is stationary if
–Whenever the system is in state i, the rule for making the decision is always the same, regardless of the time period
A policy is deterministic if
–Whenever the system is in state i, the rule definitely chooses one particular decision
A Model for Markov Decision Processes
Solving the prototype example by exhaustive enumeration
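Exhaustive enumeration can be sketched as: list every feasible deterministic policy, compute each one's steady-state distribution, and keep the policy with the smallest expected average cost. The transition rows, feasible decisions, and costs below are assumed from the standard version of this example and should be verified against the chapter:

```python
import itertools
import numpy as np

# Decisions: 1 = do nothing, 2 = overhaul (machine returns to
# state 1), 3 = replace (machine returns to state 0). All numbers
# below are assumed from the standard version of this example.
DO_NOTHING = {0: [0, 7/8, 1/16, 1/16],
              1: [0, 3/4, 1/8,  1/8],
              2: [0, 0,   1/2,  1/2]}
ALLOWED = {0: [1], 1: [1, 3], 2: [1, 2, 3], 3: [3]}

def transition_row(state, decision):
    if decision == 1:
        return DO_NOTHING[state]
    return [0, 1, 0, 0] if decision == 2 else [1, 0, 0, 0]

def weekly_cost(state, decision):
    if decision == 1:
        return {0: 0, 1: 1000, 2: 3000}[state]
    return 4000 if decision == 2 else 6000   # overhaul / replace

def steady_state(P):
    # Solve pi @ P = pi with sum(pi) = 1.
    n = P.shape[0]
    A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

best_policy, best_cost = None, float("inf")
for policy in itertools.product(*(ALLOWED[i] for i in range(4))):
    P = np.array([transition_row(i, k) for i, k in enumerate(policy)])
    pi = steady_state(P)
    cost = sum(pi[i] * weekly_cost(i, k) for i, k in enumerate(policy))
    if cost < best_cost:
        best_policy, best_cost = policy, cost

# Best: do nothing in states 0 and 1, overhaul in 2, replace in 3.
print(best_policy, round(best_cost, 2))   # -> (1, 1, 2, 3) 1666.67
```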
19.3 Linear Programming and Optimal Policies
A policy R can also be characterized by a matrix of values D_ik, each set to either zero or one
Linear Programming and Optimal Policies
The optimal policy R_b
Linear programming formulation issue
–The D_ik values are integers
–Linear programming requires continuous variables
Linear Programming and Optimal Policies
Solution: redefine the D_ik as probabilities
–The resulting policy, which involves probability distributions over decisions, is called a randomized policy
Example of a randomized policy
Linear Programming and Optimal Policies
A linear programming formulation
Linear Programming and Optimal Policies
Constraints on the y_ik
Solve using the simplex method
–Once the y_ik are found, each D_ik is found from D_ik = y_ik / Σ_k y_ik
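The whole formulation can be sketched with `scipy.optimize.linprog`: minimize the total cost Σ C_ik y_ik subject to the steady-state balance constraints and Σ y_ik = 1, then recover each D_ik as y_ik / Σ_k y_ik. The costs and transition probabilities below are assumed from the standard version of this example:

```python
import numpy as np
from scipy.optimize import linprog

# Decisions: 1 = do nothing, 2 = overhaul (-> state 1),
# 3 = replace (-> state 0). Numbers assumed from the standard
# version of this example.
DO_NOTHING = {0: [0, 7/8, 1/16, 1/16],
              1: [0, 3/4, 1/8,  1/8],
              2: [0, 0,   1/2,  1/2]}
ALLOWED = {0: [1], 1: [1, 3], 2: [1, 2, 3], 3: [3]}

def p_row(i, k):
    if k == 1:
        return DO_NOTHING[i]
    return [0, 1, 0, 0] if k == 2 else [1, 0, 0, 0]

def cost(i, k):
    if k == 1:
        return {0: 0, 1: 1000, 2: 3000}[i]
    return 4000 if k == 2 else 6000

pairs = [(i, k) for i in range(4) for k in ALLOWED[i]]
c = [cost(i, k) for i, k in pairs]

# Balance constraints: sum_k y_jk - sum_{i,k} p_ij(k) y_ik = 0 for
# each state j (one is redundant and dropped), plus sum y_ik = 1.
A_eq = [[(1.0 if i == j else 0.0) - p_row(i, k)[j] for i, k in pairs]
        for j in range(1, 4)]
b_eq = [0.0, 0.0, 0.0]
A_eq.append([1.0] * len(pairs))
b_eq.append(1.0)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(pairs))
y = dict(zip(pairs, res.x))

# D_ik = y_ik / sum_k y_ik; the optimum is deterministic, so for
# each state one decision carries all the probability mass.
policy = {i: max(ALLOWED[i], key=lambda k: y[(i, k)]) for i in range(4)}
print(policy, round(res.fun, 2))   # -> {0: 1, 1: 1, 2: 2, 3: 3} 1666.67
```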
Linear Programming and Optimal Policies
Key conclusion
–The optimal policy found by the simplex method is deterministic rather than randomized
Solving the prototype example by linear programming
–See the model given on the next slide
Linear Programming and Optimal Policies
Linear Programming and Optimal Policies
Applying the simplex method, the resulting optimal solution is:
–Leave the machine as is if it is in state 0 or 1
–Overhaul the machine if it is in state 2
–Replace the machine if it is in state 3
19.4 Conclusions
Markov decision process
–A powerful tool for optimizing the performance of discrete-time Markov chain processes
Common objective
–Find a policy, specifying a decision for each state of the system, that minimizes the expected average cost per unit time
Solution methods
–Exhaustive enumeration and linear programming