Asynchronous Control for Coupled Markov Decision Systems
Michael J. Neely, University of Southern California
Information Theory Workshop (ITW), Lausanne, Sept
[Figure: asynchronous timelines for several devices (labels include Device 1 and Device 2), with event times t0 through t10. Activity labels: Image Processing, Camera Mode, Receive, Transmit.]
Example: Network of Smart Devices
Each device m has a Processing Chip and a Wireless Communication Chip.
[Figure: Processing Chip (device m): a chain over States 1 to 4, with a timeline divided into Frames 1 to 3 and labeled with energy and bits. Wireless Comms Chip (device m): a queue of arriving bits, with channel quality plotted over time.]
Example: Network of Smart Devices
There are many such devices sharing wireless resources. Opportunistic scheduling is possible.
Example: Network of Smart Devices
Because the devices have heterogeneous timelines, we must solve a time-averaged fractional optimization:
Minimize: $\sum_m \frac{(\text{transmit energy})_m + (\text{processing energy})_m}{(\text{frame size})_m}$
Subject to: $\sum_m \frac{(\text{bits generated for link } i)_m}{(\text{frame size})_m} \le (\text{transmission})_i$ for all links $i \in \{1, \ldots, L\}$
General Model
There are S separate embedded Markov systems. Each system $s \in \{1, \ldots, S\}$ has state space $K^{(s)}$ and its own (variable-length) frames.
On frame r for system s:
1. Observe the random event $\omega^{(s)}[r]$.
2. Observe the current state $k^{(s)}[r]$.
3. Choose a control action $\alpha^{(s)}[r]$.
The 3-tuple $(k^{(s)}[r], \omega^{(s)}[r], \alpha^{(s)}[r])$ determines:
1. The frame size $T^{(s)}[r]$.
2. The penalty vector $(x^{(s)}[r], y_1^{(s)}[r], \ldots, y_L^{(s)}[r])$.
3. The transition probabilities $P_{ij}^{(s)}[r]$.
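Generalized Goal:
Minimize: $\sum_s \frac{\overline{x}^{(s)}}{\overline{T}^{(s)}}$
Subject to: $\sum_s \frac{\overline{y}_i^{(s)}}{\overline{T}^{(s)}} \le d_i$ for all penalties $i \in \{1, \ldots, L\}$
Overbars denote time averages over the frames of system s. The objective and constraints are sums of fractional terms with different denominators. General problems of this type are intractable, but this one has special structure that admits an optimal solution.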
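To make the fractional structure concrete, here is a minimal Python sketch that evaluates the objective and the constraint left-hand sides from recorded per-frame data. It assumes each fractional term is a ratio of time averages over that system's own frames (in line with the "time averaged" wording earlier). The dictionary layout, the function names, and the sample numbers are hypothetical and not from the talk.

```python
def ratio_of_averages(per_frame_values, frame_sizes):
    # (average of values) / (average of frame sizes); the 1/R factors cancel,
    # so this is simply total value divided by total time.
    return sum(per_frame_values) / sum(frame_sizes)


def objective(systems):
    # Sum over systems s of (average x) / (average T).
    return sum(ratio_of_averages(sysm["x"], sysm["T"]) for sysm in systems)


def constraint_values(systems, num_penalties):
    # For each penalty i: sum over systems s of (average y_i) / (average T).
    return [
        sum(ratio_of_averages(sysm["y"][i], sysm["T"]) for sysm in systems)
        for i in range(num_penalties)
    ]


# Hypothetical data: two systems with different numbers of frames
# (each system runs on its own timeline), and one penalty (L = 1).
systems = [
    {"x": [2.0, 4.0], "T": [1.0, 3.0], "y": [[1.0, 1.0]]},
    {"x": [3.0, 3.0, 6.0], "T": [2.0, 2.0, 2.0], "y": [[0.0, 3.0, 3.0]]},
]
d = [2.0]  # hypothetical constraint bound d_1

obj = objective(systems)
lhs = constraint_values(systems, num_penalties=1)
feasible = all(v <= bound for v, bound in zip(lhs, d))
print(obj, lhs, feasible)
```

Each system contributes its own denominator, which is why the terms cannot be merged into a single ratio.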
Theorem 1:
Consider the special case with no random event processes $\omega^{(s)}[r]$. Then:
1. The problem can be transformed into a linear program via a nonlinear change of variables.
2. The total complexity is linear in the number of systems S.
Translation: The total complexity is essentially the same as having each system solve its own MDP over its own state space. There is no curse of dimensionality as the number of systems S grows large!
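The following is a sketch of how a nonlinear change of variables can linearize a ratio of time averages, written for a single system with no random events. It is the standard normalization used for semi-Markov decision problems and is offered only as an illustration; the transformation used in the talk may differ. The symbols $p(k,\alpha)$, $\gamma(k,\alpha)$, and $P_{kj}(\alpha)$ are assumptions introduced here and do not appear in the slides.

```latex
% Assumed notation (not from the slides):
%   p(k,\alpha)   = fraction of frames spent in state k using action \alpha
%   P_{kj}(\alpha) = probability of moving from state k to state j under \alpha
\[
\frac{\overline{x}}{\overline{T}}
  = \frac{\sum_{k,\alpha} p(k,\alpha)\, x(k,\alpha)}
         {\sum_{k,\alpha} p(k,\alpha)\, T(k,\alpha)}
  \qquad \text{(nonlinear in } p\text{)}
\]
% Change of variables: rescale p by the average frame size.
\[
\gamma(k,\alpha) = \frac{p(k,\alpha)}{\sum_{k',\alpha'} p(k',\alpha')\, T(k',\alpha')}
\quad\Longrightarrow\quad
\frac{\overline{x}}{\overline{T}} = \sum_{k,\alpha} \gamma(k,\alpha)\, x(k,\alpha)
\]
% The remaining conditions are linear in \gamma.
\[
\sum_{k,\alpha} \gamma(k,\alpha)\, T(k,\alpha) = 1, \qquad
\sum_{\alpha} \gamma(j,\alpha) = \sum_{k,\alpha} \gamma(k,\alpha)\, P_{kj}(\alpha)
\ \ \text{for all } j, \qquad
\gamma(k,\alpha) \ge 0
\]
```

Under these assumptions, each system would carry its own variables $\gamma^{(s)}$, and a coupling constraint of the form $\sum_s \sum_{k,\alpha} \gamma^{(s)}(k,\alpha)\, y_i^{(s)}(k,\alpha) \le d_i$ stays linear. The number of variables then adds across systems rather than multiplying, which is consistent with (though not a proof of) the linear-in-S claim.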
Now Treat Random Events
Example: L channels, each with multiple quality levels, give a number of probabilities for the quality vector $\omega^{(s)}[r]$ that grows exponentially in L (we cannot estimate this huge number of statistics).
Even single “standard” MDPs do not have such random event processes $\omega^{(s)}[r]$.
Idea: Use Lyapunov optimization and virtual queues to estimate appropriate scalar max-weight functionals.
Theorem 2: This is a computational tool for total optimality with no curse of dimensionality.
Overview of Algorithm
Virtual queue update (for system s, penalty i, state k):
$Z_i[r+1] = \max\left[ Z_i[r] + \sum_s \theta^{(s)}[r]\, y_i^{(s)}[r] - d_i,\ 0 \right]$
$H_k^{(s)}[r+1] = H_k^{(s)}[r] + \theta^{(s)}[r]\, 1_k^{(s)}[r] - \sum_i \theta^{(s)}[r]\, q_{ik}^{(s)}[r]$
$J^{(s)}[r+1] = J^{(s)}[r] + \theta^{(s)}[r]\, T^{(s)}[r] - 1$
Use a drift-plus-penalty (or “max-weight”) decision to choose actions on each frame based on virtual queue values and observed random events $\omega^{(s)}[r]$. Here $\theta^{(s)}[r]$ is an auxiliary variable related to 1/(frame size).
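The three virtual queue updates can be sketched in Python as below. This is a minimal illustration of the update arithmetic only, not the full algorithm: the drift-plus-penalty rule that chooses the actions and $\theta^{(s)}[r]$ is not specified on the slide and is not implemented here. The function names and list-based layout are hypothetical. It is also assumed that $1_k^{(s)}[r]$ is the indicator that system s is in state k on frame r, and that $q_{ik}^{(s)}[r]$ is supplied by the caller as `q_s[i][k]`.

```python
def update_Z(Z, theta, y, d):
    # Z[i] <- max(Z[i] + sum_s theta[s] * y[s][i] - d[i], 0)
    # theta[s]: auxiliary variable of system s; y[s][i]: penalty i of system s.
    num_systems = len(theta)
    return [
        max(Z[i] + sum(theta[s] * y[s][i] for s in range(num_systems)) - d[i], 0.0)
        for i in range(len(Z))
    ]


def update_H(H_s, theta_s, current_state, q_s):
    # H_s[k] <- H_s[k] + theta_s * 1{current_state == k} - sum_i theta_s * q_s[i][k]
    # Assumption: the indicator marks the state system s occupies on this frame.
    num_states = len(H_s)
    return [
        H_s[k]
        + theta_s * (1.0 if current_state == k else 0.0)
        - sum(theta_s * q_s[i][k] for i in range(num_states))
        for k in range(num_states)
    ]


def update_J(J_s, theta_s, T_s):
    # J_s <- J_s + theta_s * T_s - 1
    return J_s + theta_s * T_s - 1.0


# Hypothetical numbers: two systems, two penalties, two states per system.
Z_next = update_Z(
    Z=[0.0, 1.0],
    theta=[0.5, 0.25],
    y=[[2.0, 0.0], [4.0, 4.0]],
    d=[1.0, 3.0],
)
H_next = update_H(
    H_s=[0.0, 0.0],
    theta_s=0.5,
    current_state=0,
    q_s=[[0.5, 0.5], [0.0, 0.0]],
)
J_next = update_J(J_s=0.0, theta_s=0.5, T_s=3.0)
print(Z_next, H_next, J_next)
```

Only the Z queue is clipped at zero, since it tracks an inequality constraint; the H and J queues may go negative, which matches the updates as written above.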