Network Utility Maximization over Partially Observable Markov Channels
[Figure: three channels with unknown states (Channel State 1 = ?, Channel State 2 = ?, Channel State 3 = ?). Restless Multi-Arm Bandit.]
This work is from the following papers:*
Li, Neely, WiOpt 2010
Li, Neely, ArXiv 2010, submitted for conference
Neely, Asilomar 2010

Chih-Ping Li is graduating and is currently looking for post-doc positions!

*The above paper titles are given below, and are available at:
C. Li and M. J. Neely, “Exploiting Channel Memory for Multi-User Wireless Scheduling without Channel Measurement: Capacity Regions and Algorithms,” Proc. WiOpt
C. Li and M. J. Neely, “Network Utility Maximization over Partially Observable Markovian Channels,” arXiv: , Aug
M. J. Neely, “Dynamic Optimization and Learning for Renewal Systems,” Proc. Asilomar Conf. on Signals, Systems, and Computers, Nov
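Restless Multi-Arm Bandit with vector rewards
[Figure: three channels with unknown states $S_1(t) = ?$, $S_2(t) = ?$, $S_3(t) = ?$]
N-user wireless system.
Timeslots $t$ in $\{0, 1, 2, \ldots\}$.
Choose one channel for transmission every slot $t$.
Channels $S_i(t)$ are ON/OFF Markov; the current states $S_i(t)$ are unknown.
Process $S_i(t)$ for channel $i$: [Figure: two-state ON/OFF Markov chain, where $\varepsilon_i$ is the ON-to-OFF transition probability and $\delta_i$ is the OFF-to-ON transition probability]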
Restless Multi-Arm Bandit with vector rewards
Suppose we serve channel $i$ on slot $t$:
If $S_i(t) = \text{ON}$: we receive an ACK, and the reward vector is $r(t) = (0, \ldots, 0, 1, 0, \ldots, 0)$, with the 1 in position $i$.
If $S_i(t) = \text{OFF}$: we receive a NACK, and the reward vector is $r(t) = (0, \ldots, 0, 0, 0, \ldots, 0)$.
Let $\omega_i(t) = \Pr[S_i(t) = \text{ON}]$.
If we serve channel $i$, we update:
$$\omega_i(t+1) = \begin{cases} 1-\varepsilon_i & \text{if we get “ACK”} \\ \delta_i & \text{if we get “NACK”} \end{cases}$$
If we do not serve channel $i$, we update:
$$\omega_i(t+1) = \omega_i(t)(1-\varepsilon_i) + (1-\omega_i(t))\delta_i$$
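The two belief updates can be sketched in a few lines of Python. This is a minimal illustration rather than code from the papers; the function name `update_belief` and its arguments are hypothetical.

```python
def update_belief(omega, eps, delta, served, ack=None):
    """Return omega_i(t+1) given omega_i(t) for one ON/OFF Markov channel.

    eps   : ON -> OFF transition probability (epsilon_i on the slides)
    delta : OFF -> ON transition probability (delta_i on the slides)
    served: True if channel i was served on slot t
    ack   : True for ACK, False for NACK (required only when served)
    """
    if served:
        if ack is None:
            raise ValueError("ack must be True or False when the channel is served")
        # An ACK reveals S_i(t) = ON; a NACK reveals S_i(t) = OFF.
        return 1.0 - eps if ack else delta
    # Not served: propagate the belief one step through the Markov chain.
    return omega * (1.0 - eps) + (1.0 - omega) * delta


# Hypothetical parameters, for illustration only.
after_ack = update_belief(0.5, eps=0.2, delta=0.3, served=True, ack=True)
after_idle = update_belief(0.5, eps=0.2, delta=0.3, served=False)
```

Serving a channel resets its belief to a known value, while an unserved channel's belief drifts according to the chain, which is why current decisions affect future belief vectors.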
We want to:
1) Characterize the capacity region $\Lambda$ of the system.
   $\Lambda$ = { all stabilizable input rate vectors $(\lambda_1, \ldots, \lambda_N)$ } = { all possible time average reward vectors }
2) Perform concave utility maximization over $\Lambda$.
   Maximize: $g(r_1, \ldots, r_N)$
   Subject to: $(r_1, \ldots, r_N)$ in $\Lambda$
[Figure: axes $\lambda_1$, $\lambda_2$, $\lambda_3$]
What is known about such systems?
1) If $(S_1(t), \ldots, S_N(t))$ is known every slot:
   Capacity region is known [Tassiulas, Ephremides 1993].
   Greedy “Max-Weight” is optimal [Tassiulas, Ephremides 1993].
   Capacity region is the same, and Max-Weight works, for both iid vectors and time-correlated Markov vectors.
2) If $(S_1(t), \ldots, S_N(t))$ is unknown but iid over slots:
   Capacity region is known.
   Greedy Max-Weight decisions are optimal [Gopalan, Caramanis, Shakkottai Allerton 2007], [Li, Neely CDC 2007, TMC 2010].
3) If $(S_1(t), \ldots, S_N(t))$ is unknown and time-correlated:
   Capacity region is unknown.
   Seems to be an intractable multi-dimensional Markov Decision Problem (MDP).
   Current decisions affect future $(\omega_1(t), \ldots, \omega_N(t))$ probability vectors.
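For reference, a greedy Max-Weight decision in cases 1 and 2 can be sketched as follows. This assumes the common form of the rule for a single server with ON/OFF channels: weight each queue backlog by the known channel state, or by the ON probability when states are unknown and iid. The function names and the backlog list `queues` are hypothetical and are not taken from the cited papers.

```python
def max_weight_known(queues, states):
    """Case 1 sketch: states known. Serve the channel maximizing Q_i(t) * S_i(t)."""
    weights = [q * s for q, s in zip(queues, states)]
    return max(range(len(weights)), key=weights.__getitem__)


def max_weight_iid(queues, p_on):
    """Case 2 sketch: states unknown but iid. Weight Q_i(t) by Pr[S_i(t) = ON]."""
    weights = [q * p for q, p in zip(queues, p_on)]
    return max(range(len(weights)), key=weights.__getitem__)


# Hypothetical backlogs, states (1 = ON, 0 = OFF), and ON probabilities.
choice_known = max_weight_known([5, 3, 9], [1, 1, 0])
choice_iid = max_weight_iid([5, 3, 9], [0.2, 0.9, 0.1])
```

In case 3 no such one-slot rule is known to be optimal, because the ON probabilities themselves depend on past scheduling decisions.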
Our Contributions:
1) We construct an operational capacity region (inner bound).
2) We construct a novel frame-based technique for utility maximization over this region.
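Assume channels are positively correlated: $\varepsilon_i + \delta_i \le 1$.
[Figure: two-state Markov chain with transition probabilities $\varepsilon_i$ and $\delta_i$, and a plot of $\omega_i(t)$ versus $t$ with levels $1-\varepsilon_i$ and $\delta_i$]
After “ACK”: $\omega_i(t)$ > steady state $\Pr[S_i(t) = \text{ON}] = \delta_i/(\delta_i+\varepsilon_i)$.
After “NACK”: $\omega_i(t)$ < steady state $\Pr[S_i(t) = \text{ON}] = \delta_i/(\delta_i+\varepsilon_i)$.
Gives good intuition for scheduling decisions.
For the special case of channel symmetry ($\varepsilon_i = \varepsilon$, $\delta_i = \delta$ for all $i$), “round-robin” maximizes sum output rate [Ahmad, Liu, Javidi, Zhao, Krishnamachari, Trans IT 2009].
How can we use this intuition to construct a capacity region (for possibly asymmetric channels)?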
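The ACK/NACK intuition can be checked numerically. Iterating the not-served update gives the closed form $\omega_i(t+k) = \pi_i + (1-\varepsilon_i-\delta_i)^k(\omega_i(t) - \pi_i)$ with $\pi_i = \delta_i/(\delta_i+\varepsilon_i)$, so under positive correlation the belief decays monotonically toward the steady state from whichever side it started on (with equality throughout when $\varepsilon_i + \delta_i = 1$). The sketch below uses a hypothetical function name and hypothetical parameter values.

```python
def belief_after_idle_slots(omega0, eps, delta, k):
    """Belief after k consecutive unserved slots, starting from belief omega0.

    Closed form of iterating omega <- omega*(1-eps) + (1-omega)*delta.
    Assumes eps + delta > 0 so that the steady state is well defined.
    """
    steady = delta / (eps + delta)
    return steady + (1.0 - eps - delta) ** k * (omega0 - steady)


eps, delta = 0.2, 0.3            # hypothetical values with eps + delta <= 1
steady = delta / (eps + delta)   # steady-state Pr[ON]

# Belief right after an ACK is 1 - eps; right after a NACK it is delta.
after_ack = [belief_after_idle_slots(1.0 - eps, eps, delta, k) for k in range(6)]
after_nack = [belief_after_idle_slots(delta, eps, delta, k) for k in range(6)]
```

Here `after_ack` decreases toward `steady` and `after_nack` increases toward it, which is the reason to keep serving a channel after an ACK and to leave it alone for a while after a NACK.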
Inner Bound $\Lambda_{int}$ on $\Lambda$ (“Operational Capacity Region”):
[Figure: $N$ channels with rates $\lambda_1$, $\lambda_2$, …, $\lambda_N$, served over a variable length frame]
Every frame, randomly pick a subset and an ordering according to some probability distribution over the $\approx N!\,2^N$ choices.
$\Lambda_{int}$ = Convex hull of all randomized round-robin policies.
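To make the idea of a frame-based round-robin policy concrete, the sketch below simulates one fixed subset and ordering and estimates its time average reward vector. The rule for leaving a channel is an assumption made for illustration, since the slides do not state it: each channel in the ordering is served until its first NACK, then the next channel is served. The function name, parameters, and numeric values are all hypothetical.

```python
import random


def simulate_round_robin(eps, delta, order, num_frames, seed=0):
    """Estimate time-average rewards of one round-robin ordering.

    Assumption (not stated in the slides): each channel in `order` is served
    until its first NACK. Requires eps[i] > 0 for served channels, otherwise
    an ON channel would never produce a NACK and the frame would not end.
    """
    rng = random.Random(seed)
    n = len(eps)
    # Start every channel from its steady-state distribution.
    states = [rng.random() < delta[i] / (eps[i] + delta[i]) for i in range(n)]
    rewards = [0] * n
    slots = 0
    for _ in range(num_frames):
        for i in order:
            got_ack = True
            while got_ack:
                got_ack = states[i]          # ACK iff S_i(t) = ON
                rewards[i] += int(got_ack)
                slots += 1
                # All channels evolve every slot, served or not (restless).
                states = [
                    rng.random() < ((1.0 - eps[j]) if s else delta[j])
                    for j, s in enumerate(states)
                ]
    return [r / slots for r in rewards]


rates = simulate_round_robin([0.2, 0.2], [0.3, 0.3], order=[0, 1], num_frames=2000, seed=1)
```

Different subsets and orderings give different reward vectors; randomizing over them each frame yields convex combinations of such vectors.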
Inner Bound Properties:
Bound contains a huge number of policies.
Touches true capacity boundary as $N \to \infty$.
Even a good bound for $N = 2$.
Can obtain efficient algorithms for optimizing over this region!
Let’s see how…
New Lyapunov Drift Analysis Technique:
Lyapunov function: $L(t) = \sum_i Q_i(t)^2$
T-slot drift for frame $k$: $\Delta[k] = L(t[k] + T[k]) - L(t[k])$
[Figure: variable length frame from $t[k]$ to $t[k]+T[k]$]
New Drift-Plus-Penalty Ratio Method on each frame:
Minimize:
$$\frac{E\{\Delta[k] + V \times \text{Penalty}[k] \mid Q(t[k])\}}{E\{T[k] \mid Q(t[k])\}}$$
Development of the technique:
Tassiulas, Ephremides 90, 92, 93 (queue stability)
Neely, Modiano 2003, 2005 (queue stability + utility optimization)
Li, Neely 2010 (queue stability + utility optimization for variable frames)
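As a rough illustration of the ratio rule, suppose the conditional expectations of the drift, the penalty, and the frame length have already been computed for a handful of candidate frame policies. Choosing the policy for frame $k$ is then an argmin over the ratio. The class `FrameCandidate`, the function names, and every number below are hypothetical; how the expectations are actually obtained is not covered here.

```python
from dataclasses import dataclass


@dataclass
class FrameCandidate:
    name: str
    expected_drift: float    # E{ Delta[k] | Q(t[k]) }
    expected_penalty: float  # E{ Penalty[k] | Q(t[k]) }
    expected_length: float   # E{ T[k] | Q(t[k]) }, must be positive


def drift_plus_penalty_ratio(c, V):
    """(E{drift} + V * E{penalty}) / E{frame length} for one candidate."""
    if c.expected_length <= 0:
        raise ValueError("expected frame length must be positive")
    return (c.expected_drift + V * c.expected_penalty) / c.expected_length


def choose_frame_policy(candidates, V):
    """Pick the candidate with the smallest drift-plus-penalty ratio."""
    return min(candidates, key=lambda c: drift_plus_penalty_ratio(c, V))


candidates = [
    FrameCandidate("A", expected_drift=-10.0, expected_penalty=2.0, expected_length=4.0),
    FrameCandidate("B", expected_drift=-6.0, expected_penalty=1.5, expected_length=2.0),
]
best_small_v = choose_frame_policy(candidates, V=0.0)   # drift dominates
best_large_v = choose_frame_policy(candidates, V=10.0)  # penalty dominates
```

The parameter $V$ sets how strongly the penalty is weighted against queue drift, and dividing by the expected frame length keeps long and short frames comparable on a per-slot basis.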
Conclusions:
Multi-Armed Bandit Problem with Reward Vectors (complex MDP).
Operational Capacity Region = Convex Hull over Frame-Based Randomized Round-Robin Policies.
Stochastic Network Optimization via the Drift-Plus-Penalty Ratio method.

Quick Advertisement: New Book:
M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool,
PDF also available from “Synthesis Lecture Series” (on digital library).
Link available on Mike Neely homepage.
Lyapunov Optimization theory (including renewal system problems).
Detailed Examples and Problem Set Questions.