ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 1)
Chapter 1: The DP Algorithm
To do:
▪ sequential decision-making
▪ state
▪ random elements
▪ discrete-time stochastic dynamic system
▪ optimal control/decision problem
▪ actions vs. strategies (information gathering, feedback)
These ideas are illustrated via examples; the general model is described later on.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 2)
Example: Inventory Control Problem
We keep a quantity of a certain item in stock, e.g. gas in a service station, oil in a refinery, cars in a dealership, spare parts in a maintenance facility, etc.
The stock is checked at equally spaced points in time, e.g. every morning, or at the end of each week.
At each of those times, a decision must be made as to what quantity of the item to order, so that demand over the present period is "satisfactorily" met (we will give this a quantitative meaning).
[Figure: timeline of periods 0, 1, ..., k−1, k, k+1, ..., N−1, N; at the start of the kth period we check the stock and place an order.]

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 3)
Example: Inventory Control Problem
Stochastic difference equation: x_{k+1} = x_k + u_k − w_k
▪ x_k: stock at the beginning of the kth period
▪ u_k: quantity ordered at the beginning of the kth period (assumed delivered during the kth period)
▪ w_k: demand during the kth period; {w_k} is a stochastic process
▪ assume all variables are real-valued

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 4)
Example: Inventory Control Problem
Negative stock is interpreted as excess demand, which is backlogged and filled as soon as possible.
Cost of operation:
1. purchasing cost: c u_k (c = cost per unit)
2. H(x_{k+1}): penalty for holding and storing excess stock (x_{k+1} > 0), or for shortage (x_{k+1} < 0)
Cost for period k: g(x_k, u_k, w_k) = c u_k + H(x_k + u_k − w_k) = c u_k + H(x_{k+1})

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 5)
Example: Inventory Control Problem
Let H(x) denote the holding/shortage penalty. A typical choice is H(x) = h·max(0, x) + p·max(0, −x), i.e., a per-unit holding cost h for excess stock, or a per-unit shortage penalty p for backlogged demand.
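To make the model concrete, here is a minimal simulation sketch of the inventory dynamics and per-period cost. The numerical parameters (c, h, p, the demand distribution, and the fixed order schedule) are illustrative assumptions, not values from the slides.

```python
import random

def H(x, h=1.0, p=3.0):
    """Holding/shortage penalty: h per unit held, p per unit backlogged (assumed linear form)."""
    return h * max(0.0, x) + p * max(0.0, -x)

def simulate(x0, orders, demand_sampler, c=2.0):
    """Run the inventory recursion x_{k+1} = x_k + u_k - w_k and accumulate
    the per-period cost g(x_k, u_k, w_k) = c*u_k + H(x_{k+1})."""
    x, total = x0, 0.0
    for u in orders:                     # u_k: quantity ordered at the start of period k
        w = demand_sampler()             # w_k: random demand during period k
        x_next = x + u - w               # stock at the start of the next period
        total += c * u + H(x_next)       # purchasing cost + holding/shortage penalty
        x = x_next
    return total

# Example: N = 4 periods, fixed order schedule, uniform demand on [0, 10]
cost = simulate(x0=5.0, orders=[6, 6, 6, 6], demand_sampler=lambda: random.uniform(0, 10))
print(cost)
```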

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 6)
Example: Inventory Control Problem
Objective: minimize, in some meaningful sense, the total cost of operation over a finite number of periods (finite "horizon"):
total cost over N periods = \sum_{k=0}^{N-1} g(x_k, u_k, w_k) = \sum_{k=0}^{N-1} [ c u_k + H(x_k + u_k − w_k) ]

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 7)
Example: Inventory Control Problem
Two distinct situations can arise.
Deterministic Case: x_0 is perfectly known, and the demands are known in advance to the manager.
1. At k = 0, all future demands {w_0, w_1, ..., w_{N−1}} are known.
Select all orders at once so as to exactly meet the demand: x_1 = x_2 = ... = x_{N−1} = 0 (assume x_0 ≤ w_0, so that no order is negative).
0 = x_1 = x_0 + u_0 − w_0  ⟹  u_0 = w_0 − x_0
u_k = w_k, 1 ≤ k ≤ N−1: a fixed order schedule.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 8)
Example: Inventory Control Problem
Here we select a set of fixed "actions" (numbers, i.e. a precomputed order schedule).
2. At the beginning of period k, w_k becomes known (perfect forecast). Hence, we must gather information and make decisions sequentially.
"Strategy": a rule for making decisions based on information (here, the forecast) as it becomes available.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 9)
Stochastic Case
x_0 is perfectly known (this can be generalized to the case when only its distribution is known), but the demand {w_k} is a random process.
Assume the w_k are i.i.d. real-valued random variables with pdf f_w, independent of k.
P_w: probability distribution (measure), i.e. P_w(A) is the probability that w_k takes a value in the set A.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 10)
Stochastic Case
Note that the stock x_k is now a random variable.
Alternatively, we can describe the evolution of the system in terms of a transition law:
Prob(x_{k+1} ∈ A | x_k, u_k) = Prob(w_k ∈ {w : x_k + u_k − w ∈ A}) = P_w({w : x_k + u_k − w ∈ A})

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 11)
Stochastic Case
Also, the cost is now a random quantity ⟹ minimize the expected cost.
Action: select all orders (numbers) at k = 0; most likely not "optimal" (this reduces to a nonlinear programming problem).
vs.
Strategy: select a sequence of functions μ_0, ..., μ_{N−1} such that u_k = μ_k(x_k), where x_k is the information available at the kth period.
This is a difficult problem! The optimization is over a function space.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 12)
Stochastic Dynamic Program
Let π = (μ_0, μ_1, ..., μ_{N−1}): control/decision strategy (policy, law).
Π: the set of all admissible strategies (e.g. those with μ_k(x) ≥ 0).
The stochastic DP problem is then: minimize over π ∈ Π the expected cost
J_π(x_0) = E[ \sum_{k=0}^{N-1} g(x_k, μ_k(x_k), w_k) ].
If the problem is feasible, then there exists an optimal strategy π*, i.e. J_{π*}(x_0) = min_{π ∈ Π} J_π(x_0).
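Since the cost is random, the expected cost J_π(x_0) of a given strategy can be estimated by simulation. Below is a minimal Monte Carlo sketch; it reuses the hypothetical H, demand model, and cost parameters of the earlier simulation sketch and is only illustrative.

```python
import random

def H(x, h=1.0, p=3.0):
    # Assumed linear holding/shortage penalty (illustrative, not from the slides)
    return h * max(0.0, x) + p * max(0.0, -x)

def expected_cost(policy, x0, N, demand_sampler, c=2.0, n_runs=10_000):
    """Monte Carlo estimate of J_pi(x0) = E[ sum_k g(x_k, mu_k(x_k), w_k) ]
    for a strategy pi given as a list of functions mu_k(x)."""
    total = 0.0
    for _ in range(n_runs):
        x, cost = x0, 0.0
        for k in range(N):
            u = policy[k](x)             # u_k = mu_k(x_k): decision based on current stock
            w = demand_sampler()         # w_k: i.i.d. demand
            x = x + u - w                # x_{k+1} = x_k + u_k - w_k
            cost += c * u + H(x)         # per-period cost g
        total += cost
    return total / n_runs

# Example: compare a fixed-order "action" schedule with a simple feedback strategy
N = 4
demand = lambda: random.uniform(0, 10)
open_loop = [lambda x, u=5.0: u] * N                 # same order every period, ignores x_k
feedback  = [lambda x: max(0.0, 8.0 - x)] * N        # "order up to 8 units" (an ad-hoc rule)
print(expected_cost(open_loop, x0=5.0, N=N, demand_sampler=demand),
      expected_cost(feedback,  x0=5.0, N=N, demand_sampler=demand))
```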

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 13)
Summary of the Problem
1. Discrete-time stochastic system, described either by a system equation, e.g. x_{k+1} = x_k + u_k − w_k, or by a transition law.
Note: if no backlogging is allowed, the stock cannot go negative, e.g. x_{k+1} = max(0, x_k + u_k − w_k).

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 14)
Stochastic Dynamic Program
2. Stochastic element {w_k}, assumed i.i.d.; this will be generalized to distributions depending on x_k and u_k.
3. Control constraint: u_k ∈ U_k(x_k). For example, if there is a maximum storage capacity M, then 0 ≤ u_k ≤ M − x_k.
4. Additive cost: the total cost is the sum of the per-period costs g(x_k, u_k, w_k).

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 15)
Stochastic Dynamic Program
5. Optimization is over the admissible strategies π ∈ Π.
We will see later on that this problem has a neat closed-form solution:
μ_k(x_k) = T_k − x_k if x_k < T_k, and μ_k(x_k) = 0 otherwise,
for some threshold levels T_k: a base-stock ("order up to T_k") policy.
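A base-stock policy is easy to express in code. The sketch below builds such a strategy and plugs it into the hypothetical expected_cost estimator from the previous sketch; the threshold values are arbitrary illustrations, not the optimal T_k.

```python
def base_stock_policy(thresholds):
    """Build pi = (mu_0, ..., mu_{N-1}) where mu_k orders up to the level T_k."""
    def make_mu(T):
        # Order T - x if the stock is below the threshold, otherwise order nothing
        return lambda x: max(0.0, T - x)
    return [make_mu(T) for T in thresholds]

# Example (illustrative thresholds; finding the optimal T_k is what DP will give us):
pi = base_stock_policy([8.0, 8.0, 7.0, 5.0])
# print(expected_cost(pi, x0=5.0, N=4, demand_sampler=demand))
```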

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 16)
Role of Information: Actions vs. Strategies
Example: a two-stage (N = 2) problem is given, where w_0 is a random variable taking the values ±1 with given probabilities.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 17)
Role of Information: Actions vs. Strategies
Problem A: choose actions (u_0, u_1) (open loop, a control schedule) to minimize the expected cost.
Equivalently, with N = 2, minimize the two-stage expected cost subject to the system equations (*).

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 18)
Role of Information: Actions vs. Strategies
Solution A, Case (i):

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 19)
Role of Information: Actions vs. Strategies
Case (ii):

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 20)
Role of Information: Actions vs. Strategies
No information gathering: we choose (u_0, u_1) at the start and do not take x_1 into consideration at the beginning of stage 1.
The realized x_1 can be anything; u_1 must then be chosen appropriately in advance.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 21)
Role of Information: Actions vs. Strategies
Problem B: choose u_0 and u_1 sequentially, using the observed value of x_1 (sequential decision-making, feedback control).
Thus, to make decision u_1, we wait until the outcome x_1 becomes available, and act accordingly.
Solution B: from (*), we select u_1 as a function of the observed x_1.
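The slide's exact two-stage equations are not reproduced here, so the sketch below uses an assumed stand-in (x_1 = x_0 + u_0 + w_0, x_2 = x_1 + u_1, cost x_2^2, x_0 = 0, w_0 = ±1 equally likely) purely to illustrate why a feedback strategy can beat every open-loop schedule.

```python
import itertools, statistics

# Assumed two-stage stand-in (not the slide's exact equations):
#   x_1 = x_0 + u_0 + w_0,  x_2 = x_1 + u_1,  cost = x_2**2,  x_0 = 0,  w_0 in {-1, +1} equally likely
W0 = [-1.0, +1.0]

def cost_open_loop(u0, u1, x0=0.0):
    """Expected cost when (u_0, u_1) are fixed numbers chosen before w_0 is observed."""
    return statistics.mean(((x0 + u0 + w0) + u1) ** 2 for w0 in W0)

def cost_feedback(mu1, u0=0.0, x0=0.0):
    """Expected cost when u_1 = mu_1(x_1) may depend on the observed x_1."""
    return statistics.mean((x1 + mu1(x1)) ** 2 for x1 in (x0 + u0 + w0 for w0 in W0))

# Best open-loop schedule over a small grid: expected cost stays at 1 (the variance of w_0)
best_open = min(cost_open_loop(u0, u1) for u0, u1 in itertools.product(range(-3, 4), repeat=2))
# Feedback strategy u_1 = -x_1 drives x_2 to 0, so the expected cost is 0
best_feedback = cost_feedback(lambda x1: -x1)
print(best_open, best_feedback)   # 1.0 vs 0.0: information gathering strictly helps here
```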

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 22)
Role of Information: Actions vs. Strategies
Note: information gathering does not always help. If w_0 is deterministic (the deterministic case), we do not gain anything by making decisions sequentially.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 23)
Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
1. Discrete-time stochastic dynamic system x_{k+1} = f_k(x_k, u_k, w_k), k = 0, 1, ..., N−1 (the index k can count time or events), with
▪ x_k ∈ X_k: the state space at time k
▪ u_k ∈ U_k: the control space
▪ w_k ∈ W_k: the disturbance space (countable)
Also, depending on the state of the system, there are constraints on the actions that can be taken: u_k ∈ U_k(x_k) ⊆ U_k, a non-empty subset.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 24)
Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
2. Stochastic disturbance {w_k}, with probability measure (distribution) P_k(· | x_k, u_k): it may depend explicitly on the time, the current state, and the action, but not on the previous disturbances w_{k−1}, ..., w_0.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 25)
Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
3. Admissible control/decision laws (strategies, policies). These define information patterns!
▪ Feasible policies: μ_k(x_k) ∈ U_k(x_k) for all x_k and k   (*)
▪ Markov policies:
  - Deterministic: u_k = μ_k(x_k), with (*) holding
  - Randomized: u_k is drawn from a distribution depending on x_k, and (*) holds

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 26)
Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
4. Finite-horizon optimal control/decision problem. Given an initial state x_0 and cost functions g_k, k = 0, ..., N−1, find π ∈ Π that minimizes the cost functional
J_π(x_0) = E[ \sum_{k=0}^{N-1} g_k(x_k, μ_k(x_k), w_k) ]
subject to the system equation constraint x_{k+1} = f_k(x_k, μ_k(x_k), w_k), k = 0, ..., N−1.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 27)
Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
We say that π* ∈ Π is optimal for the initial state x_0 if J_{π*}(x_0) = J*(x_0) := min_{π ∈ Π} J_π(x_0), where J*(x_0) is the optimal N-stage cost (or value) function.
Likewise, for a given ε > 0, π_ε is said to be ε-optimal if J_{π_ε}(x_0) ≤ J*(x_0) + ε.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 28)
Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
This stochastic optimal control problem is difficult: we are optimizing over strategies.
The Dynamic Programming algorithm will give us necessary and sufficient conditions to decompose this problem into a sequence of coupled minimization problems over actions, from which we will obtain J* and an optimal strategy π*.
DP is the only general approach to sequential decision-making under uncertainty.
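To preview what that decomposition looks like in code, here is a minimal backward-recursion sketch for a finite state/control model. The model interface (functions f, g, a terminal cost, and finite lists of states, feasible controls, and disturbance values with probabilities) and the tiny inventory instance at the end are generic assumptions for illustration, not notation from the slides.

```python
def dp_backward(states, controls, disturbances, f, g, g_terminal, N):
    """Backward DP recursion:
        J_N(x) = g_terminal(x)
        J_k(x) = min_{u in U_k(x)} E_w[ g(k, x, u, w) + J_{k+1}(f(k, x, u, w)) ]
    Returns the value functions J_k and a policy mu_k(x) attaining the minimum.
    `disturbances` is a list of (w, prob) pairs; `controls(k, x)` returns the feasible set U_k(x)."""
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    for x in states:
        J[N][x] = g_terminal(x)
    for k in range(N - 1, -1, -1):                    # stages N-1, N-2, ..., 0
        for x in states:
            best_u, best_val = None, float("inf")
            for u in controls(k, x):                  # minimize over feasible actions
                val = sum(p * (g(k, x, u, w) + J[k + 1][f(k, x, u, w)])
                          for w, p in disturbances)   # expectation over the disturbance
                if val < best_val:
                    best_u, best_val = u, val
            J[k][x], mu[k][x] = best_val, best_u
    return J, mu

# Tiny illustrative inventory instance (all numbers are assumptions):
# stock and orders in {0,...,4}, demand 0 or 1 with equal probability, storage capacity 4
states = range(5)
f = lambda k, x, u, w: max(0, x + u - w)
g = lambda k, x, u, w: 2 * u + 1 * max(0, x + u - w) + 3 * max(0, w - (x + u))
J, mu = dp_backward(states, lambda k, x: range(0, 5 - x), [(0, 0.5), (1, 0.5)],
                    f, g, g_terminal=lambda x: 0, N=3)
print(J[0][0], mu[0])   # optimal 3-stage expected cost from empty stock, and the stage-0 policy
```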

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 29)
Alternative System Description
Given a dynamic description of a system via a system equation x_{k+1} = f_k(x_k, u_k, w_k), we can alternatively describe the system via a transition law.

ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm (Slide 30)
Alternative System Description
Given x_k and u_k, x_{k+1} has the distribution
P(x_{k+1} ∈ A | x_k, u_k) = P_k({w : f_k(x_k, u_k, w) ∈ A} | x_k, u_k).
System equation ⟹ system transition law P.
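For a countable disturbance space, the transition law can be tabulated directly from the system equation. Below is a minimal sketch under assumed generic names (not notation from the slides).

```python
from collections import defaultdict

def transition_law(f, disturbances):
    """Turn a system equation x' = f(x, u, w) plus a disturbance distribution
    (a list of (w, prob) pairs) into a transition law P(x' | x, u)."""
    def P(x, u):
        probs = defaultdict(float)
        for w, p in disturbances:
            probs[f(x, u, w)] += p     # every w mapping to the same x' adds its probability
        return dict(probs)
    return P

# Example with the inventory equation x' = max(0, x + u - w) and demand 0/1/2:
P = transition_law(lambda x, u, w: max(0, x + u - w), [(0, 0.25), (1, 0.5), (2, 0.25)])
print(P(1, 1))   # {2: 0.25, 1: 0.5, 0: 0.25}
```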