1 An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem
Matthew Streeter & Stephen Smith, Carnegie Mellon University
NESCAI, April 29, 2006
2 Outline
Problem statement & motivations
Modeling payoff distributions
An asymptotically optimal algorithm
3 The k-Armed Bandit
You are in a room with k slot machines.
Pulling the arm of machine i returns a payoff drawn (independently at random) from an unknown distribution D_i.
You are allowed n total pulls.
Goal: maximize the total payoff.
Studied in more than 50 years of papers.
[Figure: machines 1, 2, 3 with payoff distributions D_1, D_2, D_3]
4 The Max k-Armed Bandit
You are in a room with k slot machines.
Pulling the arm of machine i returns a payoff drawn (independently at random) from an unknown distribution D_i.
You are allowed n total pulls.
Goal: maximize the highest single payoff.
Introduced circa 2003.
[Figure: machines 1, 2, 3 with payoff distributions D_1, D_2, D_3]
5 The Max k-Armed Bandit: Motivations
Given: some optimization problem and k randomized heuristics (e.g. simulated annealing, hill climbing, tabu search, with payoff distributions D_1, D_2, D_3).
Each time you run a heuristic, you get a solution with a certain quality.
You are allowed n runs. Goal: maximize the quality of the best solution found.
Assumption: each run has the same computational cost.
Cicirello & Smith (2005) show competitive performance on the RCPSP.
6 The Max k-Armed Bandit: Example
Given n pulls, what strategy maximizes the expected maximum payoff?
If n = 1, you should pull arm 1 (higher mean).
If n = 1000, you should pull arm 2 (higher variance).
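The mean-versus-variance trade-off above can be checked numerically. The sketch below uses two illustrative Gaussian arms (these parameters are not taken from the slide): with one pull the higher-mean arm wins, but over many pulls the higher-variance arm produces a larger maximum.

```python
import random

def expected_max(mu, sigma, n_pulls, trials=1000):
    """Monte Carlo estimate of E[max payoff] over n_pulls draws
    from a Gaussian(mu, sigma) arm. Parameters are illustrative."""
    total = 0.0
    for _ in range(trials):
        total += max(random.gauss(mu, sigma) for _ in range(n_pulls))
    return total / trials

random.seed(0)
for n in (1, 1000):
    e1 = expected_max(5.0, 1.0, n)   # arm 1: higher mean
    e2 = expected_max(0.0, 10.0, n)  # arm 2: higher variance
    print(f"n={n}: arm 1 ~ {e1:.1f}, arm 2 ~ {e2:.1f}")
```

For n = 1 the expected maximum is just the mean, so arm 1 dominates; for large n the maximum is driven by the upper tail, so arm 2 dominates.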
7 Modeling Payoff Distributions
8 Can't Handle Arbitrary Payoff Distributions
"Needle in the haystack": with a distribution that almost always pays 0, you can't distinguish the arms until you get a payoff > 0, at which point the highest payoff can no longer be improved.
9 Assumption
We will assume each machine returns payoffs drawn from a generalized extreme value (GEV) distribution.
Why? Compare to the Central Limit Theorem: the sum of n draws from a fixed distribution converges in distribution to a Gaussian. By the Extremal Types Theorem, the maximum of n independent draws from a fixed distribution converges in distribution to a GEV.
10 The GEV Distribution
Z has a GEV distribution if
    P(Z ≤ z) = exp( -(1 + s(z - μ)/σ)^(-1/s) )
for constants s, μ, and σ > 0 (the s = 0 case is read as the limit exp(-exp(-(z - μ)/σ))).
μ determines the mean, σ determines the standard deviation, s determines the shape.
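A minimal sketch of the GEV family, using the standard CDF form above (function names here are illustrative, not from the paper). Sampling is by inverting the CDF:

```python
import math
import random

def gev_cdf(z, mu, sigma, s):
    """CDF of the GEV: exp(-(1 + s*(z-mu)/sigma)**(-1/s)) for s != 0,
    and the Gumbel limit exp(-exp(-(z-mu)/sigma)) for s == 0."""
    if s == 0:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1 + s * (z - mu) / sigma
    if t <= 0:                      # outside the support
        return 0.0 if s > 0 else 1.0
    return math.exp(-t ** (-1.0 / s))

def gev_sample(mu, sigma, s):
    """Draw one sample by inverse-transform sampling."""
    u = random.random()
    if s == 0:
        return mu - sigma * math.log(-math.log(u))
    return mu + sigma * ((-math.log(u)) ** (-s) - 1) / s
```

Shifting mu translates the distribution, scaling sigma stretches it, and s controls the tail shape, matching the three roles named on the slide.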
11 Example Payoff Distribution: Job Shop Scheduling
Job shop scheduling: assign start times to operations, subject to constraints.
Length of a schedule = latest completion time of any operation.
Goal: find a schedule with minimum length.
Many heuristics exist (branch and bound, simulated annealing, ...).
12 Example Payoff Distribution: Job Shop Scheduling
"ft10" is a notorious instance of the job shop scheduling problem.
Heuristic h: do hill-climbing 500 times.
We ran h 1000 times on ft10 and fit a GEV to the payoff data.
13 Example Payoff Distribution: Job Shop Scheduling
[Figure: fitted GEV over payoff = -(schedule length), and E[max payoff] as a function of the number of runs]
The best of 50,000 sampled schedules has length 1014.
Distribution truncated at 931; the optimal schedule length is 930 (Carlier & Pinson, 1986).
14 An Asymptotically Optimal Algorithm
15 Notation
m_i(t) = expected maximum payoff from pulling the i-th arm t times
m*(t) = max over 1 ≤ i ≤ k of m_i(t)
S(t) = expected maximum payoff obtained by following strategy S for t pulls
16 The Algorithm
Strategy S* (ε and δ to be determined):
For i from 1 to k: using D pulls, estimate m_i(n). Pick D so that, with probability 1 - δ, the estimate is within ε of the true m_i(n).
For the remaining n - kD pulls: pull the arm with the maximum estimated m_i(n).
Guarantee: S*(n) = m*(n) - o(1).
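The explore-then-commit structure of S* can be sketched as follows. This is an illustrative skeleton, not code from the paper: `arms` is a list of zero-argument pull functions, and `predict_m(samples, n)` stands in for the m_i(n) estimation procedure described on the later slides.

```python
import random

def strategy_s_star(arms, n, D, predict_m):
    """Sketch of strategy S*: spend D pulls per arm predicting
    m_i(n), then commit the remaining n - k*D pulls to the arm
    with the best prediction. Returns the best payoff seen."""
    k = len(arms)
    best = float("-inf")
    predictions = []
    for pull in arms:                       # exploration phase
        samples = [pull() for _ in range(D)]
        best = max(best, max(samples))
        predictions.append(predict_m(samples, n))
    i_star = max(range(k), key=lambda i: predictions[i])
    for _ in range(n - k * D):              # exploitation phase
        best = max(best, arms[i_star]())
    return best

random.seed(4)
arms = [lambda: random.gauss(0.0, 1.0), lambda: random.gauss(10.0, 1.0)]
result = strategy_s_star(arms, n=500, D=50,
                         predict_m=lambda s, n: sum(s) / len(s))
```

The usage example plugs in a plain sample mean as the predictor just to exercise the skeleton; the actual algorithm predicts m_i(n) by the log-interpolation described later.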
17 The GEV Distribution
Z has a GEV distribution if
    P(Z ≤ z) = exp( -(1 + s(z - μ)/σ)^(-1/s) )
for constants s, μ, and σ > 0 (the s = 0 case is read as the limit exp(-exp(-(z - μ)/σ))).
μ determines the mean, σ determines the standard deviation, s determines the shape.
18 Behavior of the GEV
[Figure: behavior of the GEV in the three shape cases s = 0, s < 0, and s > 0. The s = 0 case is "not so bad"; the s < 0 and s > 0 cases require "lots of algebra".]
19 Predicting m_i(n)
Estimation procedure: linear interpolation! Estimate m_i(1) and m_i(2), then interpolate to get m_i(n).
[Figure: m_i(t) plotted against log t, showing the empirical m_i(1) and m_i(2) and the predicted m_i(n)]
20 Predicting m_i(n): Lemma
Let X be a random variable with unknown mean and standard deviation ≤ σ_max. Then O(ε^(-2) log δ^(-1)) samples of X suffice to obtain an estimate that, with probability at least 1 - δ, is within ε of the true mean.
Proof idea: use the "median of means".
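The "median of means" idea behind the lemma can be sketched directly. This is the textbook construction, not code from the paper: split the sample into blocks, average each block, and take the median of the block averages; about log(1/δ) blocks of about 1/ε² samples each give the stated bound.

```python
import random
import statistics

def median_of_means(sample, num_groups):
    """Median-of-means mean estimator: partition the sample into
    num_groups blocks, average each block, and return the median
    of the block averages. The median step makes the estimate
    concentrate with exponentially small failure probability."""
    size = len(sample) // num_groups
    block_means = [statistics.fmean(sample[g * size:(g + 1) * size])
                   for g in range(num_groups)]
    return statistics.median(block_means)

random.seed(2)
data = [random.gauss(3.0, 1.0) for _ in range(900)]
estimate = median_of_means(data, num_groups=9)
```

A single outlier-heavy block can only shift one block mean, so the median of the block means is far more robust than the overall sample mean.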
21 Predicting m_i(n)
Equation for the line: m_i(n) = m_i(1) + [m_i(2) - m_i(1)] · log_2(n)
Estimating m_i(n) requires O((log n)^2 ε^(-2) log δ^(-1)) pulls.
[Figure: empirical m_i(1) and m_i(2), with the line extrapolated to the predicted m_i(n)]
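The extrapolation line is a one-liner. As a sanity check (my own worked example, not from the slides): for the s = 0 (Gumbel) case with μ = 0, σ = 1, the true value is m(t) = γ + ln t, where γ is the Euler-Mascheroni constant, and the log-linear prediction from the exact m(1) and m(2) recovers it exactly.

```python
import math

def predict_m(m1_hat, m2_hat, n):
    """Slide 21's line: m_i(t) is linear in log t (s = 0 case), so
    extrapolate from estimates of m_i(1) and m_i(2):
        m_i(n) = m_i(1) + [m_i(2) - m_i(1)] * log2(n)."""
    return m1_hat + (m2_hat - m1_hat) * math.log2(n)

gamma = 0.5772156649          # Euler-Mascheroni constant
m1 = gamma                    # E[max of 1 Gumbel(0,1) draw]
m2 = gamma + math.log(2)      # E[max of 2 Gumbel(0,1) draws]
pred = predict_m(m1, m2, 1000)   # should equal gamma + log(1000)
```

Since [m(2) - m(1)] = ln 2 and ln 2 · log_2(n) = ln n, the prediction is γ + ln n, exactly the Gumbel value; with empirical rather than exact m(1) and m(2), the ε error is magnified by the log_2(n) factor, which is where the extra (log n)² in the pull count comes from.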
22 The Algorithm
Strategy S* (ε and δ to be determined):
For i from 1 to k: using D pulls, estimate m_i(n). Pick D so that, with probability 1 - δ, the estimate is within ε of the true m_i(n).
For the remaining n - kD pulls: pull the arm with the maximum predicted m_i(n).
Guarantee: S*(n) = m*(n) - o(1).
Three things make S* less than optimal: the probability δ that an estimate misses its confidence interval, the estimation error ε, and the exploration cost m*(n) - m*(n - kD).
23 Analysis
Setting δ = n^(-2) and ε = n^(-1/3) takes care of the first two sources of suboptimality. For the third, the exploration cost m*(n) - m*(n - kD):
m*(n) - m*(n - kD) = O(log n - log(n - kD))
                   = O(kD/n)
                   = O(k (log n)^2 ε^(-2) (log δ^(-1)) / n)
                   = O(k (log n)^3 n^(-1/3))
                   = o(1)
24 Summary & Future Work
Defined the max k-armed bandit problem and discussed applications to heuristic search.
Presented an asymptotically optimal algorithm for GEV payoff distributions (we analyzed the special case s = 0).
Working on applications to scheduling problems.
25 The Extremal Types Theorem
Define M_n = the maximum of n draws, and suppose
    lim (n → ∞) P(r_n(M_n) ≤ z) = G(z),
where each r_n is a linear "rescaling function". Then G is either a point mass or a generalized extreme value distribution:
    G(z) = exp( -(1 + s(z - μ)/σ)^(-1/s) )
for constants s, μ, and σ > 0.
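The theorem can be seen numerically. In this sketch (my own example, not from the slides) the maximum of n Exp(1) draws under the linear rescaling r_n(z) = z - log n approaches the Gumbel law exp(-exp(-z)), i.e. the GEV with shape s = 0:

```python
import math
import random

def rescaled_max(n):
    """Max of n Exp(1) draws, rescaled by r_n(z) = z - log(n).
    The Extremal Types Theorem predicts convergence in
    distribution to the Gumbel law G(z) = exp(-exp(-z))."""
    return max(random.expovariate(1.0) for _ in range(n)) - math.log(n)

random.seed(3)
draws = [rescaled_max(500) for _ in range(2000)]
frac = sum(d <= 1.0 for d in draws) / len(draws)
# Empirical P(Z <= 1) should be close to exp(-exp(-1)) ~ 0.69.
```

Different base distributions lead to the three shape cases: exponential-type tails give s = 0, heavy tails give one sign of s, and bounded supports give the other.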