Slide 1
Max Weight Learning Algorithms with Application to Scheduling in Unknown Environments
Michael J. Neely, University of Southern California
http://www-rcf.usc.edu/~mjneely
Information Theory and Applications Workshop (ITA), UCSD, Feb. 2009
Sponsored in part by the DARPA IT-MANET Program, NSF OCE-0520324, NSF Career CCF-0747525
Pr(success_1, …, success_n) = ??
Slide 2
Slotted system, slots t in {0, 1, 2, …}.
Network queues: Q(t) = (Q_1(t), …, Q_L(t)).
Two-stage control decision every slot t:
1) Stage-1 decision: k(t) in {1, 2, …, K}. Reveals a random vector η(t) (i.i.d. given k(t)); η(t) has an unknown distribution F_k(η).
2) Stage-2 decision: I(t) in I (a possibly infinite set). Affects the queue rates A(k(t), η(t), I(t)), μ(k(t), η(t), I(t)), and incurs a "penalty vector" x(t) = x(k(t), η(t), I(t)).
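The slotted queue dynamics implied by these arrival and service rates can be sketched as follows; the max-plus update form and all names are assumptions, since the slide does not spell out the recursion:

```python
# Assumed standard slotted-queue recursion (not spelled out on the slide):
# Q_l(t+1) = max[Q_l(t) - mu_l(t), 0] + A_l(t) for each queue l.
def queue_update(Q, A, mu):
    """One-slot update: serve up to mu_l units, then add new arrivals A_l."""
    return [max(q - m, 0.0) + a for q, a, m in zip(Q, A, mu)]
```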
Slide 3
Stage 1: k(t) in {1, …, K}; reveals random η(t).
Stage 2: I(t) in I; incurs penalties x(k(t), η(t), I(t)) and affects the queue dynamics via A(k(t), η(t), I(t)), μ(k(t), η(t), I(t)).
Goal: choose the stage-1 and stage-2 decisions over time so that the time-average penalties x̄ solve:
Minimize: f(x̄)
Subject to: h_n(x̄) ≤ b_n for all n; all queues stable
where f(x) and h_n(x) are general convex functions of multiple variables.
Slide 4
Motivating Example 1: min-power scheduling with channel measurement costs.
(Figure: L queues with arrivals A_1(t), …, A_L(t) and channel states S_1(t), …, S_L(t).)
Objective: minimize average power subject to stability.
If the channel states are known every slot, we can schedule without knowing the channel statistics or arrival rates! (EECA, Neely 2005, 2006; Georgiadis, Neely, Tassiulas F&T 2006)
Slide 5
Motivating Example 1 (continued): if there is a "cost" to measuring, we make a 2-stage decision:
Stage 1: measure or not? (measuring reveals the channels S(t))
Stage 2: transmit over a known channel, or over a blind channel?
(Li and Neely 2007; Gopalan, Caramanis, Shakkottai 2007)
Existing solutions require a-priori knowledge of the full joint channel-state distribution! (on the order of 2^L, or even 1024^L, entries)
Objective: minimize average power subject to stability.
Slide 6
Motivating Example 2: Diversity Backpressure Routing (DIVBAR).
(Figure: a node broadcasts to neighbors 1, 2, 3; some receptions are in error.)
Networking with lossy channels and multi-receiver diversity:
DIVBAR Stage 1: choose a commodity and transmit.
DIVBAR Stage 2: get success feedback, choose the next hop.
If there is a single commodity (no stage-1 decision), we do not need the success probabilities!
If there are two or more commodities, we need the full joint success-probability distribution over all neighbors! [Neely, Urgaonkar 2006, 2008]
Slide 7
Stage 1: k(t) in {1, …, K}; reveals random η(t).
Stage 2: I(t) in I; incurs penalties x(k(t), η(t), I(t)) and affects the queue dynamics via A(k(t), η(t), I(t)), μ(k(t), η(t), I(t)).
Goal: equivalent to:
Minimize: f(γ̄)
Subject to: h_n(γ̄) ≤ b_n for all n; γ̄_m = x̄_m for all m; all queues stable
where γ(t) is an auxiliary vector that acts as a proxy for x(t).
Slide 8
(Same two-stage setup as above, with the equivalent goal in terms of γ̄.)
Technique: form a virtual queue for each constraint.
For h_n(γ̄) ≤ b_n (queue U_n(t) with input h_n(γ(t)) and service b_n):
U_n(t+1) = max[U_n(t) + h_n(γ(t)) − b_n, 0]
For γ̄_m = x̄_m (queue Z_m(t) with input x_m(t) and service γ_m(t); possibly negative):
Z_m(t+1) = Z_m(t) − γ_m(t) + x_m(t)
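A minimal sketch of these two virtual-queue updates (function and variable names are illustrative, not from the paper):

```python
def update_virtual_queues(U, Z, gamma, x, h, b):
    """One-slot update of the virtual queues:
    U_n(t+1) = max[U_n(t) + h_n(gamma(t)) - b_n, 0]  (enforces h_n(gamma-bar) <= b_n)
    Z_m(t+1) = Z_m(t) - gamma_m(t) + x_m(t)          (enforces gamma-bar_m = x-bar_m)"""
    U_next = [max(U_n + h_n(gamma) - b_n, 0.0) for U_n, h_n, b_n in zip(U, h, b)]
    # Z_m has no max[., 0]: it may go negative, as the slide notes.
    Z_next = [Z_m - g_m + x_m for Z_m, g_m, x_m in zip(Z, gamma, x)]
    return U_next, Z_next
```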
Slide 9
Use the stochastic Lyapunov optimization technique: [Neely 2003], [Georgiadis, Neely, Tassiulas F&T 2006].
Define: Θ(t) = all queue states = [Q(t), Z(t), U(t)]
Define: L(Θ(t)) = (1/2)[sum of squared queue sizes]
Define: Δ(Θ(t)) = E{L(Θ(t+1)) − L(Θ(t)) | Θ(t)}
Schedule using the modified "max-weight" rule: every slot t, observe the queue states and make a 2-stage decision to minimize the "drift plus penalty":
Minimize: Δ(Θ(t)) + V·f(γ(t))
where V is a constant control parameter that affects the proximity to optimality (and a delay tradeoff).
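The "observe queues, minimize drift plus penalty" rule can be sketched as a scoring loop over candidate decisions; the candidate representation and the queue-weighted drift bound used here are illustrative assumptions:

```python
def drift_plus_penalty_choice(candidates, Q, V):
    """Pick the candidate decision minimizing a drift-plus-penalty score.
    Each candidate is a tuple (A, mu, penalty): its arrival vector, service
    vector, and slot penalty. The queue-weighted term sum_l Q_l*(A_l - mu_l)
    is the standard bound on the Lyapunov drift for quadratic L."""
    def score(c):
        A, mu, penalty = c
        return sum(q * (a - m) for q, a, m in zip(Q, A, mu)) + V * penalty
    return min(range(len(candidates)), key=lambda i: score(candidates[i]))
```

With V = 0 this reduces to pure max-weight stabilization; larger V trades queue size for penalty.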
Slide 10
How to (try to) minimize Δ(Θ(t)) + V·f(γ(t)):
The proxy variables γ(t) appear separably, and their terms can be minimized without knowing the system stochastics!
Minimize: V·f(γ(t)) + Σ_n U_n(t)·h_n(γ(t)) − Σ_m Z_m(t)·γ_m(t)
Subject to: γ(t) in a bounded feasible region
[Z_m(t) and U_n(t) are known queue backlogs for slot t]
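Because the γ(t) terms are separable and involve only known backlogs, the proxy subproblem can be solved by direct search; this grid-search sketch assumes a small finite candidate set and illustrative f, h_n:

```python
def choose_proxy(grid, f, h, U, Z, V):
    """Minimize V*f(gamma) + sum_n U_n*h_n(gamma) - sum_m Z_m*gamma_m
    over a finite candidate grid of gamma vectors (tuples)."""
    def objective(gamma):
        return (V * f(gamma)
                + sum(U_n * h_n(gamma) for U_n, h_n in zip(U, h))
                - sum(Z_m * g_m for Z_m, g_m in zip(Z, gamma)))
    return min(grid, key=objective)
```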
Slide 11
Minimizing the remaining terms of Δ(Θ(t)) + V·f(γ(t)): these terms involve the stage-1 and stage-2 decisions k(t), I(t), and their conditional expectations depend on the unknown distributions F_k.
Slide 12
Solution: define γ^(mw)(t), I^(mw)(t), k^(mw)(t) as the ideal max-weight decisions (minimizing the drift expression), and define e_k(t) as the expected max-weight cost of stage-1 option k:
(Stage 1) k^(mw)(t) = argmin_{k in {1, …, K}} e_k(t)
(Stage 2) I^(mw)(t) = argmin_{I in I} Y_{k(t)}(η(t), I, Θ(t))
γ^(mw)(t) = solution to the proxy problem
Then: how do we compute e_k(t) when F_k is unknown?
Slide 13
Approximation Theorem (related to Neely 2003, G-N-T F&T 2006): suppose the actual decisions come within errors ε_Q, ε_Z of the ideal max-weight decisions, where ε_max and β are slackness parameters of the constraints. Then:
- All constraints are satisfied.
- Average queue sizes < [B + C + c_0·V] / min[ε_max − ε_Q, β − ε_Z]
- The penalty satisfies: f(x̄) < f* (optimal) + O(max[ε_Q, ε_Z]) + (B + C)/V
Slide 14
It all hinges on our approximation of e_k(t):
Declare a "type-k exploration event" independently with probability θ > 0 (small). We must use k(t) = k at such an event.
{η_1^(k)(t), …, η_W^(k)(t)} = the η samples over the past W type-k exploration events.
Approach 1: estimate e_k(t) by the empirical average of the max-weight cost over these W samples, evaluated at the current queue backlogs Θ(t).
Slide 15
It all hinges on our approximation of e_k(t):
Declare a "type-k exploration event" independently with probability θ > 0 (small). We must use k(t) = k at such an event.
{η_1^(k)(t), …, η_W^(k)(t)} = the η samples over the past W type-k exploration events.
{Θ_1^(k)(t), …, Θ_W^(k)(t)} = the queue backlogs at these sample times.
Approach 2: estimate e_k(t) by the empirical average of the max-weight cost over the W samples, with each sample evaluated at the queue backlogs of its own sample time.
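Approach 2's estimator can be sketched as an empirical average over stored (η, backlog) pairs; the names W, theta, Y, and the sample representation are placeholders for the slide's quantities:

```python
import random

def is_exploration_event(theta):
    """Declare a type-k exploration event independently w.p. theta > 0."""
    return random.random() < theta

def estimate_e_k(samples, I_set, Y):
    """samples: the last W (eta_w, Theta_w) pairs recorded at type-k
    exploration events. Averages the stage-2 max-weight minimum, with each
    eta sample paired with the backlogs of its own sample time (Approach 2)."""
    return sum(min(Y(eta, I, Theta) for I in I_set)
               for eta, Theta in samples) / len(samples)
```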
Slide 16
Analysis (Approach 2). Subtleties:
1) An "inspection paradox" issue requires using the samples taken at the exploration events themselves, so that {η_1^(k)(t), …, η_W^(k)(t)} are i.i.d.
2) Even so, {η_1^(k)(t), …, η_W^(k)(t)} are correlated with the queue backlogs at time t, and so we cannot directly apply the Law of Large Numbers!
Slide 17
Analysis (Approach 2): use a "delayed queue" analysis. (Figure: a timeline with a start time t_start, the W sample times, and the current time t.) Compare against the queue backlogs at a time t_start before the W samples were taken; the backlogs change by at most a constant over that interval, and with respect to the delayed backlogs the samples are independent, so the LLN can be applied.
Slide 18
Max-Weight Learning Algorithm (Approach 2) — no knowledge of the probability distributions is required!
- Have random exploration events (probability θ).
- Otherwise choose the stage-1 decision k(t) = argmin_{k in {1, …, K}} ê_k(t).
- Use I^(mw)(t) for the stage-2 decision: I^(mw)(t) = argmin_{I in I} Y_{k(t)}(η(t), I, Θ(t)).
- Use γ^(mw)(t) for the proxy variables.
- Update the virtual queues and the moving averages.
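Putting the steps above together, one slot of the algorithm might look like this; every helper passed in is a hypothetical placeholder for the corresponding slide quantity:

```python
import random

def one_slot(K, theta, e_hat, observe_eta, stage2_choice, apply_decision):
    """One slot of the (sketched) Max-Weight Learning loop: explore with
    probability theta, otherwise take the empirical stage-1 argmin; then
    make the stage-2 choice on the revealed eta and apply the decision."""
    if random.random() < theta:
        k = random.randrange(K)                    # forced exploration event
    else:
        k = min(range(K), key=lambda j: e_hat[j])  # argmin_k of the estimates
    eta = observe_eta(k)        # the stage-1 decision reveals the random eta(t)
    I = stage2_choice(k, eta)   # stage-2 max-weight decision
    return apply_decision(k, eta, I)
```

Queue and moving-average updates would follow each call, as the slide's last bullet says.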
Slide 19
Theorem (fixed W, V): with window size W:
- All constraints are satisfied.
- Average queue sizes < [B + C + c_0·V] / min[ε_max − ε_Q, β − ε_Z]
- The penalty satisfies: f(x̄) < f* + O(1/√W) + (B + C)/V
Slide 20
Concluding Theorem (variable W, V): let 0 < d_1 < d_2 < 1 and define V(t) = (t+1)^d_1, W(t) = (t+1)^d_2. Then under the Max-Weight Learning Algorithm:
- All constraints are satisfied.
- All queues are mean rate stable*.
- The average penalty achieves exact optimality (subject to the random exploration events): f(x̄) = f*
*Mean rate stability does not imply finite average congestion and delay. In fact, average congestion and delay are necessarily infinite when exact optimality is reached.
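The variable-parameter schedule in this theorem is easy to state in code; the exponent values d1, d2 below are illustrative defaults, only 0 < d1 < d2 < 1 is required:

```python
def vw_schedule(t, d1=0.4, d2=0.8):
    """V(t) = (t+1)^d1 and W(t) = (t+1)^d2 with 0 < d1 < d2 < 1, so both
    the penalty weight V and the sample window W grow without bound but
    sublinearly in t."""
    assert 0 < d1 < d2 < 1
    return (t + 1) ** d1, (t + 1) ** d2
```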