Published by Ann Wiggins. Modified over 9 years ago.
1
Asynchronous Control for Coupled Markov Decision Systems
Michael J. Neely, University of Southern California
Information Theory Workshop (ITW), Lausanne, Sept. 2012
[Figure: asynchronous frame timelines for Device 1, Device 2, and Device 3, with numbered states, modes labeled Image Processing, Camera Mode, Receive, and Transmit, and frame boundaries at t0, t1, …, t10.]
2
Example: Network of Smart Devices
Each device m has a Processing Chip and a Wireless Communication Chip.
[Figure: Processing Chip (device m) timeline with States 1 to 4 and energy per frame over Frames 1 to 3; Wireless Comms Chip (device m) with a queue of arriving bits, energy use, and time-varying channel quality.]
3
Example: Network of Smart Devices
Many such devices share wireless resources, which allows opportunistic scheduling.
4
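Example: Network of Smart Devices
Because the devices have heterogeneous timelines, we must solve a time-averaged fractional optimization (overbars denote time averages):
$$\text{Minimize: } \sum_m \frac{\overline{(\text{transmit energy})}_m + \overline{(\text{processing energy})}_m}{\overline{(\text{frame size})}_m}$$
$$\text{Subject to: } \sum_m \frac{\overline{(\text{bits generated for link } i)}_m}{\overline{(\text{frame size})}_m} \le \overline{(\text{transmission})}_i \quad \text{for all links } i \in \{1, \dots, L\}$$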
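Each fractional term is a ratio of time averages (total energy divided by total time), not an average of per-frame ratios. A minimal Python sketch with made-up numbers for a single device shows that the two quantities differ; the data values and function names are hypothetical, chosen only for illustration.

```python
# Illustrative (made-up) per-frame data for one device m.
energy = [2.0, 2.0]      # energy used on each frame
frame_size = [1.0, 4.0]  # length of each frame


def ratio_of_averages(num, den):
    """Time-average cost per unit time: total numerator over total time."""
    return sum(num) / sum(den)


def average_of_ratios(num, den):
    """Average of per-frame ratios; this is NOT the time-average cost."""
    return sum(n / d for n, d in zip(num, den)) / len(num)


print(ratio_of_averages(energy, frame_size))  # 0.8  (4.0 / 5.0)
print(average_of_ratios(energy, frame_size))  # 1.25 ((2.0 + 0.5) / 2)
```

Long frames weigh more in the time average, which is why frame sizes appear as denominators of averaged quantities rather than inside a per-frame average.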
5
General Model
There are S separate embedded Markov systems. Each system $s \in \{1, \dots, S\}$ has state space $\mathcal{K}^{(s)}$ and its own variable-length frames. On frame r for system s:
- Observe the random event $\omega^{(s)}[r]$.
- Observe the current state $k^{(s)}[r]$.
- Choose a control action $\alpha^{(s)}[r]$.
The 3-tuple $(k^{(s)}[r], \omega^{(s)}[r], \alpha^{(s)}[r])$ determines:
- the frame size $T^{(s)}[r]$;
- the penalty vector $(x^{(s)}[r], y_1^{(s)}[r], \dots, y_L^{(s)}[r])$;
- the transition probabilities $P_{ij}^{(s)}[r]$.
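The per-frame interface of one system can be sketched as a small data structure: given the current state, the observed random event, and the chosen action, a model returns the frame size, the penalty vector, and the next-state probabilities, and the system then jumps to a random next state. This is a minimal sketch; the class names, `toy_model`, and all of its numbers are hypothetical and not taken from the talk.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class FrameOutcome:
    frame_size: float               # T^(s)[r]
    penalties: Tuple[float, ...]    # (x^(s)[r], y_1^(s)[r], ..., y_L^(s)[r])
    transition: Dict[int, float]    # next state j -> probability P_{kj}^(s)[r]


@dataclass
class EmbeddedMarkovSystem:
    state: int                                          # k^(s)[r]
    model: Callable[[int, float, int], FrameOutcome]    # (k, omega, alpha) -> outcome

    def step(self, omega: float, alpha: int, rng: random.Random) -> FrameOutcome:
        """Run one frame: evaluate the model, then sample the next state."""
        out = self.model(self.state, omega, alpha)
        states = list(out.transition.keys())
        probs = list(out.transition.values())
        self.state = rng.choices(states, weights=probs, k=1)[0]
        return out


def toy_model(k: int, omega: float, alpha: int) -> FrameOutcome:
    """Hypothetical 2-state model: active frames (alpha=1) are longer and cost energy."""
    frame = 1.0 + alpha
    energy = alpha * (1.0 + omega)     # x: energy penalty
    bits = 2.0 * alpha * omega         # y_1: bits generated
    transition = {1 - k: 1.0} if alpha == 1 else {k: 1.0}
    return FrameOutcome(frame, (energy, bits), transition)


system = EmbeddedMarkovSystem(state=0, model=toy_model)
out = system.step(omega=0.5, alpha=1, rng=random.Random(0))
print(out.frame_size, out.penalties, system.state)  # 2.0 (1.5, 1.0) 1
```

Keeping the model as a plain callable makes each system independent, which matches the slide's picture of S separate systems that are coupled only through their time-averaged penalties.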
6
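Generalized Goal:
$$\text{Minimize: } \sum_s \frac{\overline{x}^{(s)}}{\overline{T}^{(s)}}$$
$$\text{Subject to: } \sum_s \frac{\overline{y}_i^{(s)}}{\overline{T}^{(s)}} \le d_i \quad \text{for all penalties } i \in \{1, \dots, L\}$$
The objective and constraints are sums of fractional terms with different denominators. General problems of this type are intractable, but this one has special structure that admits an optimal solution.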
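To make the goal concrete, the sketch below only evaluates the objective and checks the constraints for given per-system time averages; it does not optimize anything. The function names and example numbers are hypothetical.

```python
from typing import List, Sequence


def general_objective(x_bar: Sequence[float], T_bar: Sequence[float]) -> float:
    """Sum over systems s of x_bar[s] / T_bar[s]."""
    return sum(x / t for x, t in zip(x_bar, T_bar))


def constraint_values(y_bar: Sequence[Sequence[float]],
                      T_bar: Sequence[float]) -> List[float]:
    """For each penalty i: sum over systems s of y_bar[s][i] / T_bar[s]."""
    num_penalties = len(y_bar[0])
    return [sum(y_bar[s][i] / T_bar[s] for s in range(len(T_bar)))
            for i in range(num_penalties)]


def is_feasible(y_bar, T_bar, d) -> bool:
    """True if every constraint value is at most its bound d_i."""
    return all(v <= di for v, di in zip(constraint_values(y_bar, T_bar), d))


# Two systems with different average frame sizes (illustrative numbers).
x_bar, T_bar = [3.0, 1.0], [2.0, 0.5]
y_bar, d = [[1.0], [0.25]], [1.0]
print(general_objective(x_bar, T_bar))   # 3.5
print(constraint_values(y_bar, T_bar))   # [1.0]
print(is_feasible(y_bar, T_bar, d))      # True
```

Because each term has its own denominator $\overline{T}^{(s)}$, the terms cannot be merged into a single ratio, which is what makes the general problem hard.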
7
Theorem 1:
Consider the special case with no random event processes $\omega^{(s)}[r]$. Then:
1. The problem can be transformed into a linear program via a nonlinear change of variables.
2. The total complexity is linear in the number of systems S.
Translation: the total complexity is essentially the same as each system solving its own MDP over its own state space. There is no curse of dimensionality as the number of systems S grows large!
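The slide does not spell out the change of variables. One standard construction for a single system, assumed here and possibly different from the paper's, uses the state-action frame frequencies $\pi(k,a)$ and sets $z(k,a) = \pi(k,a)/\overline{T}$. Then $\overline{x}/\overline{T} = \sum z(k,a)\,x(k,a)$ is linear in $z$, with linear constraints $\sum z(k,a)\,T(k,a) = 1$ and flow balance $\sum_a z(j,a) = \sum_{k,a} z(k,a)\,P(k,a,j)$. With S systems, each gets its own $z^{(s)}$ and the coupling constraints $\sum_s \sum_{k,a} y_i^{(s)}(k,a)\,z^{(s)}(k,a) \le d_i$ are also linear, so the LP size grows linearly in S. The sketch below does not solve an LP; it verifies these identities for one hypothetical 2-state, 2-action system under a fixed randomized policy (all numbers are made up).

```python
from typing import Dict, Tuple

StateAction = Tuple[int, int]

# Hypothetical system data (illustrative only).
T = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 1.0}   # frame size T(k, a)
x = {(0, 0): 0.5, (0, 1): 4.0, (1, 0): 1.0, (1, 1): 2.0}   # penalty x(k, a)
P = {  # P[(k, a)][j]: probability the next state is j
    (0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.2, 1: 0.8},
    (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.7, 1: 0.3},
}
policy = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0, 1: 0.0}}  # stationary randomized policy


def state_action_frequencies(iters: int = 2000) -> Dict[StateAction, float]:
    """pi(k, a): long-run fraction of frames in state k using action a (power iteration)."""
    p = {0: 0.5, 1: 0.5}
    for _ in range(iters):
        nxt = {0: 0.0, 1: 0.0}
        for k, pk in p.items():
            for a, pa in policy[k].items():
                for j, pkj in P[(k, a)].items():
                    nxt[j] += pk * pa * pkj
        p = nxt
    return {(k, a): p[k] * policy[k][a] for k in p for a in policy[k]}


pi = state_action_frequencies()
T_bar = sum(pi[ka] * T[ka] for ka in pi)   # average frame size
x_bar = sum(pi[ka] * x[ka] for ka in pi)   # average penalty per frame

# Nonlinear change of variables: z(k, a) = pi(k, a) / T_bar.
z = {ka: pi[ka] / T_bar for ka in pi}

fractional = x_bar / T_bar                       # original ratio objective
linear = sum(z[ka] * x[ka] for ka in z)          # same value, linear in z
normalization = sum(z[ka] * T[ka] for ka in z)   # linear constraint, equals 1


def balance_residual(j: int) -> float:
    """Inflow minus outflow of state j in z-variables; 0 at stationarity."""
    inflow = sum(z[ka] * P[ka][j] for ka in z)
    outflow = sum(z[(j, a)] for a in policy[j])
    return inflow - outflow


print(fractional, linear, normalization)
```

An LP solver would then optimize over all $z \ge 0$ satisfying these linear constraints and recover a policy from $\pi(k,a) = z(k,a)/\sum z$; that step is omitted to keep the sketch dependency-free.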
8
Now Treat Random Events
Example: L channels, each with 10000 quality levels, give $10000^L$ probabilities for the quality vector $\omega^{(s)}[r]$ (far too many statistics to estimate). Even a single “standard” MDP formulation does not include such random event processes $\omega^{(s)}[r]$.
Idea: Use Lyapunov optimization and virtual queues to estimate appropriate scalar max-weight functionals.
Theorem 2: This is a computational tool for achieving total optimality with no curse of dimensionality.
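The growth of the $10000^L$ count is easy to check directly. The helper name below is hypothetical; it simply evaluates levels raised to the number of channels.

```python
def joint_table_size(levels: int, channels: int) -> int:
    """Number of probabilities in the joint distribution of an L-channel quality vector."""
    return levels ** channels


for L in (1, 2, 3, 4):
    print(L, joint_table_size(10000, L))
# 1 10000
# 2 100000000
# 3 1000000000000
# 4 10000000000000000
```

Already at four channels the table has $10^{16}$ entries, which is why the approach reacts to each observed $\omega^{(s)}[r]$ instead of estimating its distribution.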
9
Overview of Algorithm
Virtual queue updates (for system s, penalty i, state k):
$$Z_i[r+1] = \max\Big[Z_i[r] + \sum_s \theta^{(s)}[r]\, y_i^{(s)}[r] - d_i,\ 0\Big]$$
$$H_k^{(s)}[r+1] = H_k^{(s)}[r] + \theta^{(s)}[r]\, 1_k^{(s)}[r] - \sum_i \theta^{(s)}[r]\, q_{ik}^{(s)}[r]$$
$$J^{(s)}[r+1] = J^{(s)}[r] + \theta^{(s)}[r]\, T^{(s)}[r] - 1$$
Here $1_k^{(s)}[r]$ is the indicator that $k^{(s)}[r] = k$. Use a drift-plus-penalty (or “max-weight”) decision to choose actions on each frame based on the virtual queue values and the observed random events $\omega^{(s)}[r]$. The quantity $\theta^{(s)}[r]$ is an auxiliary variable related to 1/(frame size).
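The three virtual queue updates translate directly into code. This is a minimal sketch: $\theta^{(s)}[r]$ and $q_{ik}^{(s)}[r]$ are treated as inputs supplied by the caller (the slide does not define how $q_{ik}^{(s)}[r]$ is formed), the function names are hypothetical, and the drift-plus-penalty action choice itself is not sketched because its exact form is not given on the slide.

```python
from typing import Dict, List, Sequence, Tuple


def update_Z(Z: List[float], theta: Sequence[float],
             y: Sequence[Sequence[float]], d: Sequence[float]) -> List[float]:
    """Z_i[r+1] = max(Z_i[r] + sum_s theta^(s)[r] * y_i^(s)[r] - d_i, 0).

    theta[s] is theta^(s)[r], y[s][i] is y_i^(s)[r], d[i] is d_i.
    """
    return [
        max(Z[i] + sum(theta[s] * y[s][i] for s in range(len(theta))) - d[i], 0.0)
        for i in range(len(Z))
    ]


def update_H(H: Dict[int, float], theta: float, state: int,
             q: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """One system s: H_k[r+1] = H_k[r] + theta * 1{state == k} - theta * sum_i q[(i, k)]."""
    return {
        k: H[k] + theta * (1.0 if state == k else 0.0)
        - theta * sum(v for (i, kk), v in q.items() if kk == k)
        for k in H
    }


def update_J(J: float, theta: float, T: float) -> float:
    """One system s: J[r+1] = J[r] + theta * T[r] - 1."""
    return J + theta * T - 1.0


# Illustrative values only.
print(update_Z([0.0, 1.0], [0.5, 1.0], [[2.0, 0.0], [1.0, 0.0]], [1.0, 2.0]))  # [1.0, 0.0]
q = {(0, 0): 0.25, (0, 1): 0.75, (1, 0): 0.0, (1, 1): 0.0}
print(update_H({0: 0.0, 1: 0.0}, 0.5, 0, q))  # {0: 0.375, 1: -0.375}
print(update_J(0.0, 0.5, 3.0))                # 0.5
```

Only $Z_i$ is clipped at zero, mirroring the slide: $H_k^{(s)}$ and $J^{(s)}$ track equality-type balances and may go negative.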