Slide 1: Markov Decision Models for Order Acceptance/Rejection Problems
Florian Defregger and Heinrich Kuhn
Catholic University of Eichstätt-Ingolstadt
Fifth International Conference on "Analysis of Manufacturing Systems - Production Management"
Zakynthos, May 24, 2005
Slide 2: Structure
1. Introduction
2. Decision Problem
3. Markov Decision Model
4. Solution Procedure
5. Numerical Results
Slide 3: Introduction
Revenue Management (RM)
– Service industries (air transportation, hotels, car rental, etc.)
– Manufacturing industries (steel, paper, aluminum, etc.), see Kniker/Burman (2001)
– Implementations of RM systems have increased profits by 2–10%.
Slide 4: Introduction
Which kind of manufacturing company could potentially use revenue management to increase the bottom line? One where:
a) fixed costs are high
b) a short-term increase of capacity to meet demand peaks is very expensive or even impossible
c) demand fluctuates over time
d) customers are willing to pay different prices for essentially the same product
Slide 5: Steps of an RM System
1. Customer segmentation: customers are segmented into customer classes, where each customer class has its own data:
– lead time specified by the customers of this class
– price (profit margin) per order of these customers
– processing time per order of these customers
– probability that an order of this class arrives in a given time period (to be estimated)
2. Capacity optimization:
– assignment of capacity booking limits to each customer class
– rejection of customers with lower profit margins when certain capacity utilization levels are reached
Slide 6: Decision Problem – Assumptions
– One single bottleneck in the manufacturing process
– Orders have a specific price, volume, and lead time (due date)
– At most one order arrives in a given time period; arrivals are independent of one another
– Products can be made to stock
– Limited inventory capacity
– Infinite planning horizon
Slide 7: Decision Problem
1. Accept the order? (yes/no)
2. If yes: how much inventory should be used?
Slide 8: Notation – Orders
N order classes, n ∈ {1, ..., N}; each arriving order is assigned to exactly one order class.
Parameters for orders of class n:
– m_n: profit margin
– u_n: capacity usage
– l_n: lead time
– p_n: probability of arrival in a given period
The dummy order class 0 represents periods in which no order arrives.
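For illustration, the per-class parameters above can be collected in a small record type; a minimal Python sketch (names are ours; the numbers reuse the three-class example from slide 30, while the arrival probabilities are hypothetical values chosen in the 60:30:10 ratio of that example's traffic intensities):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderClass:
    m: float  # profit margin per order
    u: int    # capacity usage (periods of machine time)
    l: int    # lead time (periods until the order is due)
    p: float  # probability that an order of this class arrives in a period

# Hypothetical data; class 0 is the dummy class "no order arrives this period".
classes = {
    0: OrderClass(m=0.0,   u=0, l=0,  p=0.50),
    1: OrderClass(m=20.0,  u=4, l=10, p=0.30),
    2: OrderClass(m=60.0,  u=4, l=4,  p=0.15),
    3: OrderClass(m=100.0, u=4, l=2,  p=0.05),
}
```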
Slide 9: Notation – Inventory
– I_max: maximum inventory level
– i: inventory level, i ∈ {0, 1, ..., I_max}
– h: inventory holding cost per unit of inventory per period
The inventory level i is expressed in the number of periods the machine needed to produce that inventory.
Slide 10: Notation – States
States (n, c, i) ∈ S (state space):
– n: order class of the order that arrived at the beginning of the current period
– c: number of periods the machine is reserved for already accepted but not yet finished orders, c ∈ {0, 1, ..., H}
– i: current inventory level
– H − c: available capacity within the considered horizon H
Problem size: |S| grows with the product of the three state dimensions n, c, and i.
Slide 11: Sequence of Decisions
D[(n, c, i)] is the set of feasible decisions in state (n, c, i), where n is the order class, c the machine usage, and i the inventory level:
– D1 := reject and do not raise the inventory level (always feasible)
– D2 := reject and raise the inventory level, feasible if c = 0 ∧ i < I_max
– D3(r) := accept, do not raise the inventory level, and satisfy the order with r units from inventory, feasible if n > 0 ∧ c + u_n − l_n ≤ min(i, u_n), with r ∈ {r_min, ..., r_max} where r_min = max(0, c + u_n − l_n) and r_max = min(i, u_n)
– D4 := accept, satisfy the order completely from inventory, and raise the inventory level, feasible if n > 0 ∧ c = 0 ∧ u_n ≤ i
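A sketch of these feasibility conditions as we read them from the slide (and from the r-range on slide 14), using the `OrderClass` record from before:

```python
def feasible_decisions(n, c, i, cls, I_max):
    """Enumerate the feasible decisions in state (n, c, i).

    D1 is always feasible; D2 needs an idle machine and free inventory
    space; D3(r)/D4 need a real order (n > 0) that can meet its due date.
    """
    D = [("D1", None)]                      # reject, keep inventory
    if c == 0 and i < I_max:
        D.append(("D2", None))              # reject, produce one unit to stock
    if n > 0:
        u, l = cls[n].u, cls[n].l
        r_min = max(0, c + u - l)           # stock needed to meet the lead time
        r_max = min(i, u)                   # stock that can be used at all
        if r_min <= r_max:
            for r in range(r_min, r_max + 1):
                D.append(("D3", r))         # accept, take r units from stock
        if c == 0 and u <= i:
            D.append(("D4", None))          # accept fully from stock, then restock
    return D
```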
Slide 12: Rewards
– R_D1 = R_D2 = −h · i
– R_D3(r) = m_n − h · (i − r)
– R_D4 = m_n − h · (i − u_n)
(D1: reject, do not raise inventory; D2: reject, raise inventory; D3: accept, do not raise inventory; D4: accept, raise inventory)
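The reward formulas translate directly into code; a minimal sketch matching the decision encoding above:

```python
def reward(decision, n, c, i, cls, h):
    """One-period reward of a decision in state (n, c, i), per slide 12."""
    kind, r = decision
    if kind in ("D1", "D2"):
        return -h * i                       # holding cost only
    if kind == "D3":
        return cls[n].m - h * (i - r)       # margin minus cost on remaining stock
    if kind == "D4":
        return cls[n].m - h * (i - cls[n].u)
    raise ValueError(kind)
```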
Slide 13: Time-Discrete Markov Decision Process
Objective: find the best action for every state in order to maximize the long-term average reward per period.
|D| = number of decision possibilities.
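For reference, this objective corresponds to the standard average-reward optimality equation, which the slides do not spell out; in textbook form, with gain g and bias values v(s):

```latex
g + v(s) \;=\; \max_{d \in D[s]} \Big\{ R_d(s) + \sum_{s' \in S} P_d[s, s']\, v(s') \Big\},
\qquad s = (n, c, i) \in S .
```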
Slide 14: Transition Probabilities (c > 0)
P_D1[(n, c, i), (m, c − 1, i)] = p_m for (n, c, i) ∈ {S : c ≠ 0} and m ∈ {0, ..., N}; 0 otherwise.
P_D3(r)[(n, c, i), (m, c + u_n − r − 1, i − r)] = p_m for (n, c, i) ∈ S, m ∈ {0, ..., N}, and r ∈ {min(max(0, c + u_n − l_n), min(i, u_n)), ..., min(i, u_n)}; 0 otherwise.
(D1: reject, do not raise inventory; D3: accept, do not raise inventory; n, m: order classes, c: machine usage, i: inventory level)
Slide 15: Transition Probabilities (c = 0)
P_D1[(n, 0, i), (m, 0, i)] = p_m for n, m ∈ {0, ..., N} and i ∈ {0, ..., I_max}; 0 otherwise.
P_D2[(n, 0, i), (m, 0, i + 1)] = p_m for n, m ∈ {0, ..., N} and i ∈ {0, ..., I_max − 1}; 0 otherwise.
P_D3(r)[(n, 0, i), (m, max(0, u_n − r − 1), i − r)] = …
P_D4[(n, 0, i), (m, 0, i − u_n + 1)] = p_m for (n, 0, i) ∈ S and m ∈ {0, ..., N}; 0 otherwise.
(D1: reject, do not raise inventory; D2: reject, raise inventory; D3: accept, do not raise inventory; D4: accept, raise inventory)
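The transitions above separate into a deterministic bookkeeping part (machine reservation and inventory) and the random next arrival. A minimal simulator sketch under that reading (function names are ours):

```python
import random

def next_state(state, decision, m, cls):
    """Successor state when `decision` is taken in `state` and an order of
    class m arrives next period (m = 0 means no arrival)."""
    n, c, i = state
    kind, r = decision
    if kind == "D1":
        return (m, max(0, c - 1), i)                    # machine works off its backlog
    if kind == "D2":
        return (m, 0, i + 1)                            # idle machine produces to stock
    if kind == "D3":
        return (m, max(0, c + cls[n].u - r - 1), i - r)
    if kind == "D4":
        return (m, 0, i - cls[n].u + 1)
    raise ValueError(kind)

def sample_step(state, decision, cls, rng=random):
    """Draw the next arrival with probabilities p_m and advance one period."""
    m = rng.choices(list(cls), weights=[cls[k].p for k in cls])[0]
    return next_state(state, decision, m, cls)
```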
Slide 16: Solution Procedure
This Markov decision process can be solved with standard methods, e.g. linear programming, policy iteration, or value iteration. For large problem instances, however, the computation times are too long (see the numerical results below).
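As one concrete instance of these standard methods, here is a relative value iteration sketch for the average-reward criterion, reusing the helpers from the earlier sketches; it assumes S enumerates every reachable state, and is illustrative rather than the authors' implementation:

```python
def relative_value_iteration(S, cls, I_max, h, eps=1e-6, max_iter=10_000):
    """Relative value iteration for the average-reward MDP defined above."""
    v = {s: 0.0 for s in S}
    ref = S[0]                                  # reference state for normalization
    gain = 0.0
    for _ in range(max_iter):
        v_new = {}
        for s in S:
            n, c, i = s
            best = float("-inf")
            for d in feasible_decisions(n, c, i, cls, I_max):
                # expected continuation value over the next arrival m
                exp_v = sum(cls[m].p * v[next_state(s, d, m, cls)] for m in cls)
                best = max(best, reward(d, n, c, i, cls, h) + exp_v)
            v_new[s] = best
        gain = v_new[ref]                       # approximates the average reward g
        v_new = {s: val - gain for s, val in v_new.items()}
        if max(abs(v_new[s] - v[s]) for s in S) < eps:
            break
        v = v_new
    return gain, v
```

The per-iteration cost is proportional to |S| times the number of decisions per state, which is why runtimes explode for the larger problem classes reported later.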
Slide 17: Solution Procedure – Heuristic
Objective: find good policies within acceptable runtimes.
Idea: reject "bad" order classes and accept "good" order classes.
"Goodness" of an order class n: its relative profit margin m_n / u_n [profit per unit of capacity usage].
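With the three-class example from slide 30, this ranking is straightforward to compute (reusing the hypothetical `classes` dict from the slide-8 sketch):

```python
# Rank real order classes (n >= 1) by relative profit margin m_n / u_n, best first.
ranked = sorted((n for n in classes if n > 0),
                key=lambda n: classes[n].m / classes[n].u,
                reverse=True)
# With the slide-30 data: [3, 2, 1]  (25.00 > 15.00 > 5.00 profit per capacity period)
```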
Slide 18: Solution Procedure
Consider an "accept if possible" order class, e.g. n = 4 or n = 5: acceptance levels increase with lower machine usage or higher inventory levels.
Slide 19: Solution Procedure
Consider an "accept in favorable situations" order class, e.g. n = 2 or n = 3: acceptance levels again increase with lower machine usage or higher inventory levels.
Slide 20: Solution Procedure
Policies can be approximated by an N-dimensional vector A^T = (a_1, a_2, ..., a_N), where element a_n specifies at what inventory level i orders of class n can be accepted when the machine usage is 0, with a_n ∈ {max(0, u_n − l_n), ..., I_max}.
Example: a_n = 5.
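Such a vector reduces the accept/reject decision at an idle machine to a threshold test; a minimal sketch of our reading (states with c > 0 would additionally need the feasibility check from slide 11):

```python
def accept(n, c, i, a):
    """Threshold policy A = (a_1, ..., a_N): accept an order of class n
    arriving at an idle machine (c = 0) iff inventory has reached a[n]."""
    return n > 0 and c == 0 and i >= a[n]
```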
Slide 21: Solution Procedure
The result is a combinatorial optimization problem in N dimensions.
Idea for the heuristic: evaluate the average reward of candidate policies A^T = (a_1, a_2, ..., a_N) via simulation and find good policies through simulation comparisons.
Example: N = 5.
Slide 22: Solution Procedure
Simulation comparison of two policies: each policy corresponds to a Markov reward process. Both Markov chains are simulated, and at the end of each replication the average reward of each policy is estimated. If the difference of the average rewards is greater than 0 at a certain confidence level, the comparison stops; otherwise another replication is made.
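The slides do not specify the statistical test; one plausible realization is a sequential comparison using a normal-approximation confidence interval on the per-replication reward difference:

```python
from statistics import NormalDist, mean, stdev

def compare(sim_a, sim_b, alpha=0.05, min_reps=10, max_reps=1000):
    """Sequentially compare two policies by simulated average reward.

    sim_a()/sim_b() run one replication each and return its average reward
    per period. Stops once a (1 - alpha) confidence interval for the mean
    difference excludes 0; returns +1 if A looks better, -1 if B does,
    0 if still undecided after max_reps replications.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diffs = []
    for _ in range(max_reps):
        diffs.append(sim_a() - sim_b())
        if len(diffs) >= min_reps:
            half = z * stdev(diffs) / len(diffs) ** 0.5
            d = mean(diffs)
            if d - half > 0:
                return +1
            if d + half < 0:
                return -1
    return 0
```

Common random numbers across the two simulators would sharpen the comparison; the slides leave this choice open.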
Slide 23: Solution Procedure
Policy i:
– order classes n ∈ {0, 1, ..., i} are completely rejected
– order classes n ∈ {i + 1, ..., N} are completely accepted
R(i): average reward of policy i
Slide 24: Solution Procedure
Procedure:
1. Sort the order classes in ascending order of their relative profit margins.
2. Close the order classes successively, n = 1, 2, ..., until the average reward reaches its maximum.
3. The cutoff whose policy attains the maximum average reward R* is called n*.
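A sketch of this successive-closing search, assuming the classes are already sorted ascending by relative profit margin and that the average reward is unimodal in the cutoff (both implied by the slide); `avg_reward(k)` would be estimated with the simulation comparison above:

```python
def close_successively(N, avg_reward):
    """Close order classes 1, 2, ... until the average reward stops improving.

    avg_reward(k): estimated average reward of policy k, which rejects
    classes 1..k and accepts classes k+1..N. Returns the best cutoff n*
    and its reward R*.
    """
    best_k, best_r = 0, avg_reward(0)       # start by closing nothing
    for k in range(1, N + 1):
        r = avg_reward(k)
        if r < best_r:                      # reward maximum has been passed
            break
        best_k, best_r = k, r
    return best_k, best_r
```

The refinement on the next slide would then perturb the acceptance levels of classes n* and n* + 1 around this cutoff.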
Slide 25: Solution Procedure
Further improvement of the policy:
– close half of the order class to the right of n*, i.e. n = n* + 1
– open half of order class n*
– determine which of these policies yields the maximum average reward
Slide 26: Numerical Results – Problem Classes

problem class          | 1         | 2         | 3         | 4         | 5
number of states       | 10,000    | 50,000    | 100,000   | 500,000   | 1,000,000
number of instances    | 100       | 100       | 100       | 100       | 100
order classes          | [5,20]    | [5,20]    | [10,30]   | [20,50]   | [20,50]
maximum inventory      | 10        | 15        | 20        | 50        | 100
relative profit margin | [1,3]     | [1,3]     | [1,3]     | [1,3]     | [1,3]
maximum lead time      | 151       | 520       | 423       | 466       | 471
inventory cost         | 0.01      | 0.01      | 0.01      | 0.01      | 0.01
traffic intensity      | [1.5,2.5] | [1.5,2.5] | [1.5,2.5] | [1.5,2.5] | [1.5,2.5]
Slide 27: Numerical Results – Average Reward per Period: FCFS Policy vs. Value Iteration Algorithm

problem class                  | 1    | 2     | 3      | 4      | 5
proportion optimum [%]         | 99   | 93    | 94     | 0      | 0
runtime value iteration [sec.] | 82.3 | 880.9 | 1584.1 | 3681.3 | 3741.1
average [%]                    | 4.4  | 3.8   | 4.0    | 2.4    | -8.5
minimum [%]                    | 0.0  | 0.0   | 0.0    | -3.0   | -69.9
maximum [%]                    | 18.3 | 33.9  | 34.2   | 22.2   | 8.6
standard deviation [%]         | 4.7  | 6.2   | 6.0    | 3.9    | 13.6
Slide 28: Numerical Results – Average Reward per Period: Heuristic Procedure vs. Value Iteration Algorithm

problem class                       | 1    | 2     | 3
proportion optimum [%]              | 99   | 93    | 94
running time heuristic [sec.]       | 42.8 | 92.8  | 115.3
running time value iteration [sec.] | 82.3 | 880.9 | 1584.1
average [%]                         | 1.7  | 1.8   | 1.5
minimum [%]                         | 0.0  | 0.0   | 0.0
maximum [%]                         | 17.9 | 33.9  | 23.1
standard deviation [%]              | 2.9  | 4.8   | 3.1
Slide 29: Numerical Results – Average Reward per Period: FCFS Policy vs. Heuristic Procedure

problem class            | 1    | 2    | 3     | 4     | 5
runtime FCFS [sec.]      | 15.0 | 62.8 | 115.3 | 70.5  | 143.2
runtime heuristic [sec.] | 42.8 | 92.8 | 58.3  | 254.8 | 206.9
average [%]              | 2.7  | 2.1  | 2.5   | 2.0   | 1.7
minimum [%]              | 0.0  | 0.0  | 0.0   | 0.0   | 0.0
maximum [%]              | 16.6 | 19.2 | 32.1  | 18.4  | 11.7
standard deviation [%]   | 3.8  | 4.1  | 5.1   | 2.8   | 2.5
Slide 30: Numerical Results – Example with Three Order Classes

order class                | 1       | 2       | 3
lead time                  | 10      | 4       | 2
profit margin              | 20.00 € | 60.00 € | 100.00 €
capacity usage             | 4       | 4       | 4
relative profit margin     | 5.00    | 15.00   | 25.00
relative traffic intensity | 60%     | 30%     | 10%
Slides 31–33: Numerical Results
[Charts: average reward per period, heuristic procedure vs. value iteration algorithm]
Slide 34
Thank you for your attention.