Asynchronous Control for Coupled Markov Decision Systems Michael J. Neely University of Southern California Information Theory Workshop (ITW) Lausanne,


1 Asynchronous Control for Coupled Markov Decision Systems. Michael J. Neely, University of Southern California. Information Theory Workshop (ITW), Lausanne, Sept. 2012.
[Figure: asynchronous frame timelines for Device 1, Device 2, and Device 3, with frames numbered per device and event times t0 through t10 on the time axis t; the modes shown are Image Processing, Camera Mode, Receive, and Transmit.]

2 Example: Network of Smart Devices
Each device m has a Processing Chip and a Wireless Communication Chip.
[Figure: Processing Chip (device m) timeline with States 1 to 4 and Frames 1 to 3, labeled with energy and bits; Wireless Comms Chip (device m) with a queue of arriving bits, labeled with energy and channel quality over time.]

3 Example: Network of Smart Devices
There are many such devices sharing wireless resources. Can do opportunistic scheduling.

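4 Example: Network of Smart Devices
Heterogeneous timelines mean we must solve a time-averaged fractional optimization (overbars denote time averages over frames):

Minimize: $\sum_m \dfrac{\overline{(\text{transmit energy})}_m + \overline{(\text{processing energy})}_m}{\overline{(\text{frame size})}_m}$

Subject to: $\sum_m \dfrac{\overline{(\text{bits generated for link } i)}_m}{\overline{(\text{frame size})}_m} \le \overline{(\text{transmission})}_i$ for all links i in {1, …, L}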

5 General Model
S separate embedded Markov systems. Each system s in {1, …, S} has state space $K^{(s)}$ and its own (variable-length) frames.
On frame r for system s:
- Observe the random event $\omega^{(s)}[r]$.
- Observe the current state $k^{(s)}[r]$.
- Choose a control action $\alpha^{(s)}[r]$.
The 3-tuple $(k^{(s)}[r], \omega^{(s)}[r], \alpha^{(s)}[r])$ determines:
- The frame size $T^{(s)}[r]$.
- The penalty vector $(x^{(s)}[r], y_1^{(s)}[r], \ldots, y_L^{(s)}[r])$.
- The transition probabilities $P_{ij}^{(s)}[r]$.
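The per-frame model above can be sketched in Python. This is only an illustration under stated assumptions: the two states ("idle", "busy"), the actions ("fast", "slow"), the event values, and every number in `toy_outcome` are hypothetical and do not come from the talk. The sketch shows only the shape of the model: a (state, event, action) triple determines a frame size, a penalty vector, and the next-state probabilities.

```python
from dataclasses import dataclass
import random


@dataclass
class FrameOutcome:
    frame_size: float        # T^(s)[r]
    penalties: tuple         # (x, y_1, ..., y_L); here L = 1
    transition_probs: dict   # next state -> probability


def toy_outcome(state, event, action):
    """Hypothetical rule mapping (k, omega, alpha) to a frame outcome."""
    # Assumed: the 'fast' action shortens the frame but costs more energy x.
    frame_size = 1.0 if action == "fast" else 2.0
    x = (3.0 if action == "fast" else 1.0) * event
    y1 = 1.0 if state == "busy" else 0.0
    if state == "idle":
        probs = {"idle": 0.5, "busy": 0.5}
    else:
        probs = {"idle": 0.9, "busy": 0.1}
    return FrameOutcome(frame_size, (x, y1), probs)


def run_frames(num_frames, policy, seed=0):
    """Simulate one system for a number of frames under a given policy."""
    rng = random.Random(seed)
    state = "idle"
    history = []
    for _ in range(num_frames):
        event = rng.choice([1.0, 2.0])      # observe the random event
        action = policy(state, event)       # choose the control action
        outcome = toy_outcome(state, event, action)
        history.append(outcome)
        states, weights = zip(*outcome.transition_probs.items())
        state = rng.choices(states, weights=weights)[0]  # next state
    return history


# A simple hypothetical policy: go fast only when the event value is high.
history = run_frames(5, lambda state, event: "fast" if event > 1.5 else "slow")
```

Each system would run its own copy of this loop on its own frame boundaries, which is what makes the timelines asynchronous.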

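6 Generalized Goal
Overbars denote time averages over frames.

Minimize: $\sum_s \dfrac{\overline{x}^{(s)}}{\overline{T}^{(s)}}$

Subject to: $\sum_s \dfrac{\overline{y}_i^{(s)}}{\overline{T}^{(s)}} \le d_i$ for all penalties i in {1, …, L}

These are fractional terms with different denominators. General problems of this type are intractable, but this one has special structure that admits an optimal solution.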
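A small numeric sketch may help show why each term is a ratio of averages. For one system with variable-length frames, the penalty per unit time is the total penalty divided by the total time, which is generally not the average of the per-frame ratios. The numbers below are made up for illustration.

```python
def ratio_of_averages(xs, Ts):
    """Penalty per unit time: (sum of penalties) / (sum of frame sizes)."""
    return sum(xs) / sum(Ts)


def average_of_ratios(xs, Ts):
    """Per-frame ratio averaged over frames (not the quantity in the goal)."""
    return sum(x / T for x, T in zip(xs, Ts)) / len(xs)


xs = [2.0, 2.0]   # hypothetical penalties on two frames
Ts = [1.0, 4.0]   # hypothetical frame sizes

print(ratio_of_averages(xs, Ts))   # 0.8
print(average_of_ratios(xs, Ts))   # 1.25
```

Because each system has its own denominator, the sum over systems cannot be folded into a single fraction, which is the difficulty the slide points to.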

7 Theorem 1
Consider the special case with no random event processes $\omega^{(s)}[r]$. Then:
1. The problem can be transformed into a linear program via a nonlinear change of variables.
2. The total complexity is linear in the number of systems S.
Translation: Total complexity is essentially the same as having each system solve its own MDP over its own state space. There is no curse of dimensionality as the number of systems S grows large!

8 Now Treat Random Events
Example: L channels, each with 10000 quality levels. The quality vector $\omega^{(s)}[r]$ then has $10000^L$ probabilities, far too many statistics to estimate. Even single “standard” MDPs do not have such random event processes $\omega^{(s)}[r]$.
Idea: Use Lyapunov optimization and virtual queues to estimate appropriate scalar max-weight functionals.
Theorem 2: This is a computational tool for total optimality with no curse of dimensionality.
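To make the size of the statistics concrete, the count of joint probabilities can be computed directly. The helper name `joint_table_size` is introduced here for illustration only.

```python
def joint_table_size(levels, num_channels):
    """Number of joint outcomes of a quality vector with one entry per channel."""
    return levels ** num_channels


for num_channels in (1, 2, 3):
    print(num_channels, joint_table_size(10000, num_channels))
```

With only three channels the joint table already has 10^12 entries, which is why the approach avoids estimating the joint distribution and reacts to the observed events instead.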

9 Overview of Algorithm
Virtual queue updates (for system s, penalty i, state k):

$$Z_i[r+1] = \max\Big[\, Z_i[r] + \sum_s \theta^{(s)}[r]\, y_i^{(s)}[r] - d_i,\ 0 \,\Big]$$

$$H_k^{(s)}[r+1] = H_k^{(s)}[r] + \theta^{(s)}[r]\, 1_k^{(s)}[r] - \sum_i \theta^{(s)}[r]\, q_{ik}^{(s)}[r]$$

$$J^{(s)}[r+1] = J^{(s)}[r] + \theta^{(s)}[r]\, T^{(s)}[r] - 1$$

Use a drift-plus-penalty (or “max-weight”) decision to choose actions on each frame based on virtual queue values and observed random events $\omega^{(s)}[r]$.
$\theta^{(s)}[r]$ is an auxiliary variable related to 1/(frame size).
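The three virtual queue updates can be written as plain functions. This is a minimal sketch, not the talk's implementation: it assumes the first sum runs over systems s and the second over states i, it treats the indicator as 1 when the system is in state k on frame r, and it takes `q_s` as a given input because the slide does not define that quantity. It also omits the drift-plus-penalty action choice. All numeric values are hypothetical.

```python
def update_Z(Z, theta, y, d):
    """Z[i] <- max(Z[i] + sum_s theta[s] * y[s][i] - d[i], 0)."""
    num_systems = len(theta)
    return [
        max(Z[i] + sum(theta[s] * y[s][i] for s in range(num_systems)) - d[i], 0.0)
        for i in range(len(Z))
    ]


def update_H(H_s, theta_s, current_state, q_s):
    """H_s[k] <- H_s[k] + theta_s * 1{state == k} - sum_i theta_s * q_s[i][k]."""
    num_states = len(H_s)
    return [
        H_s[k]
        + theta_s * (1.0 if current_state == k else 0.0)
        - sum(theta_s * q_s[i][k] for i in range(num_states))
        for k in range(num_states)
    ]


def update_J(J_s, theta_s, T_s):
    """J_s <- J_s + theta_s * T_s - 1 (no projection onto zero)."""
    return J_s + theta_s * T_s - 1.0


# Hypothetical values: two systems, one penalty, two states per system.
Z = update_Z([0.0], theta=[0.5, 0.25], y=[[2.0], [4.0]], d=[1.5])
H = update_H([0.0, 0.0], theta_s=0.5, current_state=0,
             q_s=[[0.5, 0.5], [0.0, 0.0]])
J = update_J(0.0, theta_s=0.5, T_s=2.0)

print(Z)  # [0.5]
print(H)  # [0.25, -0.25]
print(J)  # 0.0
```

Only the Z queues are clipped at zero, matching the max in the first update; the H and J queues can go negative because their updates track equality-type conditions rather than inequality constraints.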

