Markov Decision Models for Order Acceptance/Rejection Problems
Florian Defregger and Heinrich Kuhn
Catholic University of Eichstätt-Ingolstadt
Fifth International Conference on "Analysis of Manufacturing Systems - Production Management"
Zakynthos, May 24th, 2005
Structure
1. Introduction
2. Decision Problem
3. Markov Decision Model
4. Solution Procedure
5. Numerical Results
Introduction
Revenue Management (RM)
- Service industries (air transportation, hotels, car rental, etc.)
- Manufacturing industries (steel, paper, aluminum, etc.), see Kniker/Burman (2001)
- Implementations of RM systems have increased profits by 2-10%.
Introduction
Which kind of manufacturing company could potentially use revenue management to increase the bottom line? One where:
a) fixed costs are high
b) a short-term increase of capacity to meet demand peaks is very expensive or even impossible
c) demand fluctuates over time
d) customers are willing to pay different prices for essentially the same product
Steps of an RM system
1. Customer segmentation: customers are segmented into customer classes, where each customer class is characterized by its own
   - lead time specified by the customers of this class
   - price (profit margin) per order of these customers
   - processing time per order of these customers
   - probability of arrival for an order of the customer class in a given time period (to be estimated)
2. Capacity optimization:
   - assignment of capacity booking limits to each customer class
   - rejection of customers with lower profit margins when certain capacity utilization levels are reached
Decision problem
Assumptions:
- One single bottleneck in the manufacturing process
- Orders: specific price, volume, and lead time (due date); one arrival in a given time period; arrivals are independent of one another
- Products can be made to stock
- Limited inventory capacity
- Infinite planning horizon
Decision problem
1. Accept the order? yes/no
2. If yes: how much inventory should be used?
Notation
Orders: N order classes, n ∈ {1, ..., N}. Each order can be assigned to one order class.
Parameters for orders of class n:
- m_n: profit margin
- u_n: capacity usage
- l_n: lead time
- p_n: probability of arriving
Dummy order class 0 represents the event that no order arrives in a period.
Notation
Inventory:
- I_max: maximum inventory level
- i: inventory level, i ∈ {0, 1, ..., I_max}
- h: inventory holding costs per unit of inventory per period
The inventory level i is expressed in the number of periods the machine needed to produce that inventory.
Notation
States (n, c, i) ∈ S (state space):
- n: order class of the order arrived at the beginning of the current period
- c: number of periods the machine is reserved for already accepted but not yet finished orders, c ∈ {0, 1, ..., H}
- i: current inventory level
- H - c: available capacity in the considered horizon H
Problem size: |S| = (N + 1) · (H + 1) · (I_max + 1).
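As an illustration of the problem size, a short Python sketch that enumerates S; the values of N, H and I_MAX are placeholders, not numbers from the talk:

```python
from itertools import product

N = 3        # number of order classes (illustrative value)
H = 20       # horizon for reserved machine periods, c in {0,...,H}
I_MAX = 10   # maximum inventory level, i in {0,...,I_MAX}

# Enumerate all states (n, c, i); n = 0 is the dummy class (no arrival).
states = list(product(range(N + 1), range(H + 1), range(I_MAX + 1)))

# Problem size: |S| = (N+1) * (H+1) * (I_max+1)
assert len(states) == (N + 1) * (H + 1) * (I_MAX + 1)   # 4 * 21 * 11 = 924
```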
Sequence of decisions
D[(n, c, i)]: set of feasible decisions in state (n, c, i), where n is the order class, c the machine usage and i the inventory level:
- D1 := reject and do not raise inventory level (always feasible)
- D2 := reject and raise inventory level: feasible if c = 0 ∧ i < I_max
- D3(r) := accept, do not raise inventory and satisfy order with r units from inventory: feasible if n > 0 ∧ c + u_n - l_n ≤ min(i, u_n); r ∈ {r_min, ..., r_max} with r_min = max(0, c + u_n - l_n) and r_max = min(i, u_n)
- D4 := accept, satisfy order completely from inventory and raise inventory level: feasible if n > 0 ∧ c = 0 ∧ u_n ≤ i
Rewards
- R_{D1} = R_{D2} = -h · i (D1: reject and do not raise inventory level; D2: reject and raise inventory level)
- R_{D3(r)} = m_n - h · (i - r) (D3: accept and do not raise inventory level)
- R_{D4} = m_n - h · (i - u_n) (D4: accept and raise inventory level)
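A minimal Python sketch of the feasible decision sets and their one-period rewards, using the notation above (the helper names are illustrative, not from the talk):

```python
from typing import NamedTuple

class OrderClass(NamedTuple):
    m: float   # m_n: profit margin
    u: int     # u_n: capacity usage
    l: int     # l_n: lead time
    p: float   # p_n: arrival probability

def feasible_decisions(n, c, i, classes, i_max, h):
    """Enumerate the feasible decisions in state (n, c, i) together with
    their one-period rewards, following D1-D4 above; classes[0] is the
    dummy class. A sketch, not the authors' implementation."""
    decisions = [("D1", None, -h * i)]            # reject, keep inventory
    if c == 0 and i < i_max:
        decisions.append(("D2", None, -h * i))    # reject, produce one unit to stock
    if n > 0:
        m_n, u_n, l_n = classes[n].m, classes[n].u, classes[n].l
        r_min = max(0, c + u_n - l_n)   # stock needed to still meet the lead time
        r_max = min(i, u_n)             # bounded by stock and by order size
        for r in range(r_min, r_max + 1):
            decisions.append(("D3", r, m_n - h * (i - r)))       # accept, r units from stock
        if c == 0 and u_n <= i:
            decisions.append(("D4", None, m_n - h * (i - u_n)))  # accept fully from stock, restock
    return decisions
```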
Time-discrete Markov Decision Process
Objective: find the best action for every state in order to maximize the long-term average reward per period.
|D| = number of decision possibilities.
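Written out in standard average-reward MDP notation (the formula itself is not on the slide), the objective is to find a policy π maximizing the gain:

```latex
% Long-run average reward (gain) of a policy \pi, to be maximized:
g^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}^{\pi}\!\left[\sum_{t=1}^{T} R_{d_t}(s_t)\right],
\qquad s_t = (n_t, c_t, i_t) \in S,\quad d_t \in D[s_t].
```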
May 24, p m, (n, c, i) {S : c 0}, m {0,..., N} 0,else P D1 [(n, c, i), (m, c – 1, i)] = n, m: order class c: machine usage i: inventory level Transition Probabilities = p m, (n, c, i) S, m {0,..., N}, r {min(max(0, c + u n – l n ), min(i, u n ),..., min(i, u n )} 0,else P D3(r) [(n, c, i), (m, c + u n – r – 1, i – r )] = D1: reject and do not raise inventory level D3: accept and do not raise inventory level
Transition probabilities for states with c = 0 (n, m: order class; c: machine usage; i: inventory level):
- P_{D1}[(n, 0, i), (m, 0, i)] = p_m for n, m ∈ {0, ..., N}, i ∈ {0, ..., I_max}; 0 else (D1: reject and do not raise inventory level)
- P_{D2}[(n, 0, i), (m, 0, i + 1)] = p_m for n, m ∈ {0, ..., N}, i ∈ {0, ..., I_max - 1}; 0 else (D2: reject and raise inventory level)
- P_{D3(r)}[(n, 0, i), (m, max(0, u_n - r - 1), i - r)] = p_m, analogous to the case c ≠ 0 (D3: accept and do not raise inventory level)
- P_{D4}[(n, 0, i), (m, 0, i - u_n + 1)] = p_m for (n, 0, i) ∈ S, m ∈ {0, ..., N}; 0 else (D4: accept and raise inventory level)
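The transition structure can be summarized in one successor function; a sketch in the same notation, where the next arriving order is of class m with probability p_m:

```python
def successor(state, decision, r, m, classes):
    """Successor state when decision D1/D2/D3(r)/D4 is taken in `state` and
    the next arriving order has class m. Mirrors the transition rules above;
    max(0, .) covers the c = 0 cases in one expression."""
    n, c, i = state
    if decision == "D1":                    # reject, no production for stock
        return (m, max(0, c - 1), i)
    if decision == "D2":                    # only feasible for c = 0, i < I_max
        return (m, 0, i + 1)                # the idle period builds one unit of stock
    u_n = classes[n].u
    if decision == "D3":                    # accept, satisfy r units from stock
        return (m, max(0, c + u_n - r - 1), i - r)
    if decision == "D4":                    # accept fully from stock, restock one unit
        return (m, 0, i - u_n + 1)
    raise ValueError(decision)
```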
Solution procedure
This Markov decision process can be solved via standard methods, e.g. linear programming, policy iteration or value iteration. But for large problem instances the computation times become too long (see Numerical Results).
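For completeness, a generic sketch of relative value iteration for the average-reward criterion; this is a standard textbook method, not necessarily the authors' implementation:

```python
def relative_value_iteration(states, actions, reward, trans, eps=1e-8):
    """Relative value iteration for the long-run average reward criterion.
    actions[s] lists the feasible decisions of s, reward[s, a] is the
    one-period reward, trans[s, a] is a list of (probability, successor)
    pairs. Generic sketch assuming a unichain model."""
    v = {s: 0.0 for s in states}
    ref = states[0]                           # reference state for normalization
    while True:
        q = {s: max(reward[s, a] + sum(p * v[t] for p, t in trans[s, a])
                    for a in actions[s])
             for s in states}
        span = (max(q[s] - v[s] for s in states)
                - min(q[s] - v[s] for s in states))
        if span < eps:                        # span seminorm small: converged
            gain = q[ref] - v[ref]            # long-run average reward per period
            policy = {s: max(actions[s], key=lambda a, s=s:
                             reward[s, a] + sum(p * v[t] for p, t in trans[s, a]))
                      for s in states}
            return gain, policy
        v = {s: q[s] - q[ref] for s in states}  # subtract q(ref) to keep v bounded
```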
Solution procedure
Heuristic
Objective: find good policies in acceptable runtimes.
Idea: reject "bad" order classes and accept "good" order classes.
"Goodness" of an order class: relative profit margin m_n / u_n [profit / capacity usage].
May 24, Consider an ”accept if possible” order class, e.g. n =4 or n =5: Acceptance levels increase with lower machine usages or higher inventory levels Solution Procedure
Solution procedure
Consider an "accept in favorable situations" order class, e.g. n = 2 or n = 3: acceptance levels increase with lower machine usages or higher inventory levels.
Solution procedure
Policies can be approximated by an N-dimensional vector A^T = (a_1, a_2, ..., a_N). The element a_n specifies at what inventory level i orders of class n can be accepted if the machine usage is 0, with a_n ∈ {max(0, u_n - l_n), ..., I_max}. Example: a_n = 5.
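A small sketch of this policy representation, reusing the OrderClass tuple from earlier; extending the threshold beyond c = 0 by simply shifting it with the machine usage is an illustrative assumption consistent with the observation above, not the authors' exact rule:

```python
import random

def random_policy_vector(classes, i_max):
    """Draw a policy vector A = (a_1,...,a_N) uniformly from the feasible
    range a_n in {max(0, u_n - l_n), ..., I_max} given on the slide."""
    return [random.randint(max(0, oc.u - oc.l), i_max) for oc in classes[1:]]

def accepts(a, n, c, i):
    """Illustrative decision rule induced by A: accept a class-n order once
    the inventory reaches a_n when the machine is idle (c = 0); for busy
    states the threshold is shifted by c (an assumption)."""
    return i >= a[n - 1] + c
```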
Solution procedure
The result is a combinatorial optimization problem in N dimensions. Idea for the heuristic: evaluate the average reward of certain policies A^T = (a_1, a_2, ..., a_N) via simulation and find good policies by simulation comparisons. Example: N = 5.
Solution procedure
Simulation comparison of two policies: each policy corresponds to a Markov reward process. Both Markov chains are simulated, and at the end of each replication the average reward of each policy is estimated. If the difference of the average rewards differs from 0 with a certain confidence level, the simulation comparison stops; otherwise another replication is made.
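A sketch of such a comparison using common random numbers and a normal-approximation confidence interval; `simulate_reward` is a hypothetical helper that returns the average reward of one replication of a policy's Markov reward process:

```python
import random
import statistics

def compare(policy_a, policy_b, simulate_reward,
            conf_z=1.96, min_reps=10, max_reps=1000):
    """Run paired replications of both Markov reward processes until the
    confidence interval for the difference in average reward excludes 0,
    then return the better policy. A sketch, not the authors' code."""
    diffs = []
    for rep in range(max_reps):
        seed = random.getrandbits(32)            # common random numbers
        diffs.append(simulate_reward(policy_a, seed)
                     - simulate_reward(policy_b, seed))
        if rep + 1 >= min_reps:
            mean = statistics.fmean(diffs)
            half = conf_z * statistics.stdev(diffs) / (len(diffs) ** 0.5)
            if abs(mean) > half:                 # CI excludes 0: decision made
                return policy_a if mean > 0 else policy_b
    return policy_a if statistics.fmean(diffs) >= 0 else policy_b
```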
Solution procedure
Policy i:
- order classes n ∈ {0, 1, ..., i} are completely rejected
- order classes n ∈ {i + 1, ..., N} are completely accepted
R(i): average reward of policy i
Solution procedure
Procedure:
1. Sort the order classes in ascending order of their relative profit margins.
2. Close order classes successively (n = 1, 2, ...) until the maximum of the average reward is reached.
The last order class that was closed yields the maximum reward R*; this class is called n*.
Solution procedure
Further improvement of the policy:
- close half of the order class to the right of n*, i.e. n = n* + 1,
- open half of class n*,
- determine which policy offers the maximum average reward.
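A compact sketch of the whole procedure; `avg_reward` is a hypothetical helper built on the simulation comparison above:

```python
def heuristic_search(n_classes, avg_reward):
    """Search over the policies R(i) from the previous slides. avg_reward(i)
    is assumed to estimate, via simulation, the average reward of the policy
    that rejects classes 1..i and accepts i+1..N; the classes are assumed
    already sorted ascending by relative profit margin m_n / u_n."""
    best_i, best_r = 0, avg_reward(0)        # policy 0: accept everything
    for i in range(1, n_classes + 1):        # close the worst class, then the next, ...
        r = avg_reward(i)
        if r < best_r:                       # average reward stopped improving
            break
        best_i, best_r = i, r                # best_i is n*, best_r is R*
    # Refinement step from this slide (not shown): close half of class n* + 1,
    # open half of class n*, and keep the policy with the higher average reward.
    return best_i, best_r
```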
Numerical Results
Problem classes:

problem class          1        2        3         4         5
number of states       10,000   50,000   100,000   500,000   1,000,000
number of instances    100      100      100       100       100

Across the problem classes: number of order classes drawn from the ranges [5,20], [10,30] and [20,50]; relative profit margin in [1,3]; inventory cost 0.01; traffic intensity in [1.5,2.5]. [Values for maximum inventory and maximum lead time were lost in extraction.]
Numerical Results
Average reward per period: FCFS policy vs. value iteration algorithm.
[Table for problem classes 1-5 with rows: proportion optimum [%]; runtime value iteration [sec.]; average [%]; minimum [%]; maximum [%]; standard deviation [%]. Numeric values were lost in extraction.]
Numerical Results
Average reward per period: heuristic procedure vs. value iteration algorithm.
[Table for problem classes 1-3 with rows: proportion optimum [%]; running time heuristic [sec.]; running time value iteration [sec.]; average [%]; minimum [%] (0.0); maximum [%]; standard deviation [%]. The remaining numeric values were lost in extraction.]
Numerical Results
Average reward per period: FCFS policy vs. heuristic procedure.
[Table for problem classes 1-5 with rows: runtime FCFS [sec.]; runtime heuristic [sec.]; average [%]; minimum [%] (0.0); maximum [%]; standard deviation [%]. The remaining numeric values were lost in extraction.]
Numerical Results
Example with three order classes:

order class                 1         2         3
lead time                   10        4         2
profit margin               €20.00    €60.00    €100.00
capacity usage              4         4         4
relative profit margin      5.00      15.00     25.00
relative traffic intensity  60%       30%       10%
Numerical Results
Average reward per period: heuristic procedure vs. value iteration algorithm.
[Three chart slides; the plotted values were lost in extraction.]
Thank you for your attention.