1
Dynamic Optimization and Learning for Renewal Systems
Michael J. Neely, University of Southern California
Asilomar Conference on Signals, Systems, and Computers, Nov. 2010
PDF of paper at: http://ee.usc.edu/stochastic-nets/docs/renewal-systems-asilomar2010.pdf
Sponsored in part by the NSF CAREER grant CCF-0747525 and the ARL Network Science Collaborative Tech. Alliance.
[Title figure: a task-processing network with a T/R network coordinator serving Tasks 1, 2, 3, and a timeline of renewal frames T[0], T[1], T[2].]
2
A General Renewal System
[Figure: timeline of renewal frames T[0], T[1], T[2] with penalty vectors y[0], y[1], y[2].]
Renewal frames r in {0, 1, 2, …}.
π[r] = policy chosen on frame r.
P = abstract policy space (π[r] in P for all r).
Policy π[r] affects the frame size and penalty vector on frame r:
y[r] = [y_0(π[r]), y_1(π[r]), …, y_L(π[r])]
T[r] = T(π[r]) = frame duration
3
A General Renewal System
[Figure: timeline of renewal frames T[0], T[1], T[2] with penalty vectors y[0], y[1], y[2].]
Renewal frames r in {0, 1, 2, …}.
π[r] = policy chosen on frame r.
P = abstract policy space (π[r] in P for all r).
Policy π[r] affects the frame size and penalty vector on frame r.
These are random functions of π[r] (distribution depends on π[r]):
y[r] = [y_0(π[r]), y_1(π[r]), …, y_L(π[r])]
T[r] = T(π[r]) = frame duration
4
A General Renewal System
Same setup as above; one random realization on frame r:
y[r] = [1.2, 1.8, …, 0.4]
T[r] = 8.1 = frame duration
5
A General Renewal System
Same setup as above; another random realization on frame r:
y[r] = [0.0, 3.8, …, -2.0]
T[r] = 12.3 = frame duration
6
A General Renewal System
Same setup as above; another random realization on frame r:
y[r] = [1.7, 2.2, …, 0.9]
T[r] = 5.6 = frame duration
7
Example 1: Opportunistic Scheduling
All frames = 1 slot.
S[r] = (S_1[r], S_2[r], S_3[r]) = channel states for slot r.
Policy π[r]: On frame r, first observe S[r], then choose a channel to serve (i.e., π[r] in {1, 2, 3}).
Example objectives: throughput, energy, fairness, etc.
8
Example 2: Markov Decision Problems
[Figure: a 4-state recurrent Markov chain.]
M(t) = recurrent Markov chain (continuous or discrete time).
Renewals are defined as recurrences to state 1.
T[r] = random inter-renewal frame size (frame r).
y[r] = penalties incurred over frame r.
π[r] = policy that affects transition probabilities over frame r.
Objective: Minimize the time average of one penalty subject to time-average constraints on the others.
9
Example 3: Task Processing over Networks
[Figure: a network with a T/R network coordinator processing Tasks 1, 2, 3.]
Infinite sequence of tasks, e.g., query sensors and/or perform computations.
Renewal frame r = processing time for frame r.
Policy types:
Low level: {specify transmission decisions over the network}
High level: {Backpressure1, Backpressure2, Shortest Path}
Example objective: Maximize quality of information per unit time subject to per-node power constraints.
10
Quick Review of Renewal-Reward Theory (Pop Quiz Next Slide!)
Define the frame average of y_0[r] over the first R frames:
(1/R) * sum_{r=0}^{R-1} y_0[r]
The time average of y_0[r] (per unit time) is then:
[ sum_{r=0}^{R-1} y_0[r] ] / [ sum_{r=0}^{R-1} T[r] ]
*If behavior is i.i.d. over frames, by the LLN this is the same as E{y_0}/E{T}.
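Below is a small numerical sketch (not from the slides) that simulates i.i.d. frames and checks that the time average, computed as a ratio of sums, matches the renewal-reward prediction E{y_0}/E{T}; the frame model (uniform durations and energies) is purely illustrative.

```python
import random

# Minimal sketch: simulate i.i.d. frames and compare the time average
# (ratio of sums) against the renewal-reward prediction E{y0}/E{T}.
# The frame model below (uniform durations/energies) is purely illustrative.
random.seed(0)
R = 100000

y0_samples = []   # penalty (e.g., energy) on each frame
T_samples = []    # duration of each frame

for r in range(R):
    T = random.uniform(1.0, 3.0)          # frame duration T[r]
    y0 = random.uniform(0.0, 2.0) * T     # penalty correlated with duration
    T_samples.append(T)
    y0_samples.append(y0)

frame_average = sum(y0_samples) / R                             # (1/R) sum y0[r]
time_average = sum(y0_samples) / sum(T_samples)                 # ratio of sums
renewal_reward = (sum(y0_samples) / R) / (sum(T_samples) / R)   # E{y0}/E{T} estimate

print(f"frame average of y0: {frame_average:.4f}")
print(f"time average (sum y0 / sum T): {time_average:.4f}")
print(f"E{{y0}}/E{{T}} estimate: {renewal_reward:.4f}")  # matches the time average
```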
11
Pop Quiz: (10 points)
Let y_0[r] = energy expended on frame r.
Time average power = (total energy used)/(total time).
Suppose (for simplicity) behavior is i.i.d. over frames.
To minimize time average power, which one should we minimize: (a) or (b)?
13
Two General Problem Types:
1) Minimize a time average subject to time-average constraints: minimize the time average of y_0[r] (per unit time), subject to (time average of y_l[r]) ≤ c_l for each l in {1, …, L}.
2) Maximize a concave function φ(x_1, …, x_L) of the time averages of y_1[r], …, y_L[r].
14
Solving the Problem (Type 1):
Define a "virtual queue" Z_l[r] for each inequality constraint, with arrival y_l[r] and service c_l*T[r] on frame r:
Z_l[r+1] = max[ Z_l[r] - c_l*T[r] + y_l[r], 0 ]
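A minimal sketch of this queue update in Python, assuming the constraint targets c_l and the per-frame observations are given; all numbers below are illustrative placeholders.

```python
# Minimal sketch of the virtual queue update Z_l[r+1] = max(Z_l[r] - c_l*T + y_l, 0).
# The constraint targets c_l and the observed penalties y_l[r] are placeholders.

def update_virtual_queues(Z, y, c, T):
    """One frame of virtual-queue updates for L inequality constraints."""
    return [max(Z_l - c_l * T + y_l, 0.0) for Z_l, y_l, c_l in zip(Z, y, c)]

# Example: two constraints (time-average power <= 0.25 on each of two nodes).
Z = [0.0, 0.0]            # virtual queue backlogs
c = [0.25, 0.25]          # constraint targets c_l
y_frame = [0.9, 0.1]      # energy spent by each node on this frame (assumed)
T_frame = 3.0             # frame duration (assumed)

Z = update_virtual_queues(Z, y_frame, c, T_frame)
print(Z)  # queue 1 grows because 0.9 > 0.25*3.0; queue 2 stays at 0
```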
15
Lyapunov Function and "Drift-Plus-Penalty Ratio":
Scalar measure of queue sizes (Lyapunov function):
L[r] = Z_1[r]^2 + Z_2[r]^2 + … + Z_L[r]^2
Δ(Z[r]) = E{L[r+1] - L[r] | Z[r]} = "frame-based Lyapunov drift"
Algorithm technique: Every frame r, observe Z_1[r], …, Z_L[r]. Then choose a policy π[r] in P to minimize the "drift-plus-penalty ratio":
( Δ(Z[r]) + V*E{y_0[r] | Z[r]} ) / E{T[r] | Z[r]}
16
The Algorithm Becomes:
Observe Z[r] = (Z_1[r], …, Z_L[r]). Choose π[r] in P to minimize the drift-plus-penalty ratio:
( Δ(Z[r]) + V*E{y_0[r] | Z[r]} ) / E{T[r] | Z[r]}
Then update the virtual queues:
Z_l[r+1] = max[ Z_l[r] - c_l*T[r] + y_l[r], 0 ]
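A hedged sketch of one frame of this rule, assuming a small finite policy set with known per-policy expectations; the exact drift Δ(Z[r]) is replaced by the standard bound sum_l Z_l*(E{y_l} - c_l*E{T}), a common surrogate in Lyapunov optimization, and all policy data below are illustrative.

```python
# Hedged sketch of the frame-based drift-plus-penalty ratio rule.
# Assumptions (not from the slides): a small finite policy set with known
# per-policy expected penalties/durations, and the drift bound surrogate
# sum_l Z_l * (E{y_l} - c_l*E{T}) in place of the exact drift.

def dpp_ratio(Z, V, c, exp_y, exp_T):
    """Ratio (sum_l Z_l*(E{y_l} - c_l*E{T}) + V*E{y_0}) / E{T} for one policy."""
    drift_bound = sum(Z_l * (exp_y[l + 1] - c_l * exp_T)
                      for l, (Z_l, c_l) in enumerate(zip(Z, c)))
    return (drift_bound + V * exp_y[0]) / exp_T

def choose_policy(policies, Z, V, c):
    """Pick the policy minimizing the drift-plus-penalty ratio.
    `policies` maps name -> (exp_y, exp_T), where exp_y = [E{y_0}, E{y_1}, ..., E{y_L}]."""
    return min(policies, key=lambda p: dpp_ratio(Z, V, c, *policies[p]))

# Illustrative example: two policies, one constraint with c_1 = 0.25.
policies = {
    "fast_but_costly": ([1.0, 0.8], 2.0),   # ([E{y_0}, E{y_1}], E{T})
    "slow_but_cheap":  ([1.5, 0.2], 4.0),
}
Z, V, c = [5.0], 10.0, [0.25]
print(choose_policy(policies, Z, V, c))
```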
17
Theorem: Assume the constraints are feasible. Then under this algorithm (minimizing the DPP ratio ( Δ(Z[r]) + V*E{y_0[r] | Z[r]} ) / E{T[r] | Z[r]} on every frame), for all frames r in {1, 2, 3, …} we achieve:
(a) every virtual queue Z_l[r] remains bounded, so all time-average constraints are satisfied;
(b) the time average of y_0 is within O(1/V) of its optimal value.
18
Solving the Problem (Type 2):
We reduce it to a problem with the structure of Type 1 via:
Auxiliary variables γ[r] = (γ_1[r], …, γ_L[r]).
The following variation on Jensen's inequality: For any concave function φ(x_1, …, x_L) and any (arbitrarily correlated) vector of random variables (X_1, X_2, …, X_L, T) with T > 0, we have:
E{ T*φ(X_1, …, X_L) } / E{T}  ≤  φ( E{T*X_1}/E{T}, …, E{T*X_L}/E{T} )
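As a sanity check (not part of the slides), the sketch below samples a correlated pair (X, T) and verifies the one-variable case of this weighted Jensen inequality numerically; the distributions and the choice φ(x) = sqrt(x) are illustrative.

```python
import math
import random

# Numerical sanity check of the weighted Jensen inequality:
# E{T*phi(X)}/E{T} <= phi( E{T*X}/E{T} ) for concave phi and T > 0.
random.seed(1)

def phi(x):
    return math.sqrt(x)   # a concave function of one variable (L = 1 here)

N = 200000
lhs_num = rhs_num = T_sum = 0.0
for _ in range(N):
    T = random.uniform(0.5, 2.0)          # frame duration, correlated with X below
    X = random.uniform(0.0, 1.0) * T      # penalty sample correlated with T
    lhs_num += T * phi(X)
    rhs_num += T * X
    T_sum += T

lhs = lhs_num / T_sum                      # estimate of E{T*phi(X)} / E{T}
rhs = phi(rhs_num / T_sum)                 # estimate of phi( E{T*X} / E{T} )
print(f"lhs = {lhs:.4f} <= rhs = {rhs:.4f}: {lhs <= rhs}")
```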
19
The Algorithm (Type 2) Becomes:
On frame r, observe the virtual queues Z[r] = (Z_1[r], …, Z_L[r]) and G[r] = (G_1[r], …, G_L[r]).
(Auxiliary variables) Choose γ_1[r], …, γ_L[r] to maximize a deterministic subproblem involving φ and the G_l[r] queues.
(Policy selection) Choose π[r] in P to minimize the corresponding drift-plus-penalty ratio.
Then update the virtual queues:
Z_l[r+1] = max[ Z_l[r] - c_l*T[r] + y_l[r], 0 ]
G_l[r+1] = max[ G_l[r] + γ_l[r]*T[r] - y_l[r], 0 ]
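A minimal sketch of the two queue updates at the end of a Type-2 frame, assuming γ[r], y[r], and T[r] have already been determined for that frame; the interpretation in the docstring and the numbers are illustrative.

```python
# Minimal sketch of the Type-2 virtual queue updates at the end of frame r,
# assuming gamma[r], y[r], T[r] for that frame are already known.

def update_type2_queues(Z, G, y, gamma, c, T):
    """Z_l enforces (time average of y_l) <= c_l; G_l enforces (time average of y_l) >= (time average of gamma_l)."""
    Z_next = [max(Z_l - c_l * T + y_l, 0.0) for Z_l, y_l, c_l in zip(Z, y, c)]
    G_next = [max(G_l + g_l * T - y_l, 0.0) for G_l, y_l, g_l in zip(G, y, gamma)]
    return Z_next, G_next

Z, G = [0.0, 0.0], [0.0, 0.0]
Z, G = update_type2_queues(Z, G, y=[0.9, 0.1], gamma=[0.4, 0.3], c=[0.25, 0.25], T=3.0)
print(Z, G)   # [0.15, 0.0] and [0.3, 0.8]
```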
20
Example Problem - Task Processing:
[Figure: T/R network coordinator serving Tasks 1, 2, 3; each frame r consists of a setup phase, a transmission, and an idle period I[r].]
Every task reveals random task parameters η[r]:
η[r] = [(qual_1[r], T_1[r]), (qual_2[r], T_2[r]), …, (qual_5[r], T_5[r])]
Choose π[r] = [which node transmits, how much idle time] in {1, 2, 3, 4, 5} × [0, I_max].
Transmissions incur power.
We use a quality distribution that tends to be better for higher-numbered nodes.
Objective: Maximize quality/time subject to p_av ≤ 0.25 for all nodes.
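Below is a hypothetical per-frame decision rule for this example in the spirit of the drift-plus-penalty ratio algorithm; the energy model (unit transmit power for the chosen node only), the setup time, the idle-time grid, and the sample η are all assumptions made for illustration.

```python
# Hedged sketch of one frame's decision for the task-processing example,
# assuming a drift-plus-penalty ratio style rule. The energy model (unit
# transmit power for the chosen node only), the setup time, and the observed
# task parameters eta are illustrative assumptions, not taken from the slides.

SETUP, I_MAX, P_AV, V = 0.5, 5.0, 0.25, 50.0
I_GRID = [k * 0.5 for k in range(int(I_MAX / 0.5) + 1)]   # discretized idle choices

def choose_action(eta, Z):
    """eta: list of (quality, transmit_time) per node; Z: per-node power virtual queues.
    Returns (node, idle_time) minimizing the ratio used to maximize quality/time."""
    best, best_ratio = None, float("inf")
    for n, (qual, t_tx) in enumerate(eta):
        for idle in I_GRID:
            T = SETUP + t_tx + idle                    # frame length
            energy = [t_tx if m == n else 0.0 for m in range(len(eta))]  # unit power
            drift = sum(Z[m] * (energy[m] - P_AV * T) for m in range(len(eta)))
            ratio = (drift - V * qual) / T             # maximizing quality => penalty -qual
            if ratio < best_ratio:
                best, best_ratio = (n, idle), ratio
    return best

# Example frame: 5 nodes, higher-numbered nodes tend to offer better quality.
eta = [(1.0, 2.0), (1.5, 2.5), (2.0, 3.0), (2.5, 3.5), (3.0, 4.0)]
Z = [0.0] * 5
print(choose_action(eta, Z))
```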
21
Minimizing the Drift-Plus-Penalty Ratio:
Minimizing a pure expectation, rather than a ratio of expectations, is typically easier (see Bertsekas & Tsitsiklis, Neuro-Dynamic Programming).
Define, for a trial value θ, the minimum over π in P of the expected numerator minus θ times the expected frame length.
"Bisection Lemma": This quantity is zero exactly when θ equals the optimal ratio, so the optimal ratio can be found by bisection over θ, where each bisection step only requires minimizing a pure expectation.
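A hedged sketch of this idea as a Dinkelbach-style bisection over a finite policy set with known expected numerator N(π) and expected frame length T(π) > 0; the policy data are illustrative.

```python
# Hedged sketch of ratio minimization via bisection, assuming a finite policy
# set with known expected numerator N(pi) and expected frame length T(pi) > 0.
# Each bisection step minimizes the pure expectation N(pi) - theta*T(pi).

def min_ratio_by_bisection(policies, theta_lo, theta_hi, tol=1e-6):
    """policies: dict name -> (exp_numerator, exp_frame_length)."""
    def f(theta):
        # f(theta) = min_pi [ N(pi) - theta*T(pi) ]; nonincreasing in theta,
        # with f(theta*) = 0 at the optimal ratio theta*.
        return min(N - theta * T for (N, T) in policies.values())

    while theta_hi - theta_lo > tol:
        mid = 0.5 * (theta_lo + theta_hi)
        if f(mid) > 0:
            theta_lo = mid    # optimal ratio is larger than mid
        else:
            theta_hi = mid    # optimal ratio is at most mid
    theta = 0.5 * (theta_lo + theta_hi)
    best = min(policies, key=lambda p: policies[p][0] - theta * policies[p][1])
    return theta, best

policies = {"A": (3.0, 2.0), "B": (5.0, 4.5), "C": (2.0, 1.2)}   # (N, T) pairs
print(min_ratio_by_bisection(policies, theta_lo=0.0, theta_hi=10.0))
```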
22
Learning via Sampling from the Past:
Suppose the randomness is characterized by W past random samples {η_1, η_2, …, η_W}.
We want to compute an expectation over the unknown random distribution of η.
Approximate it via the W samples from the past, replacing the expectation with the empirical average over {η_1, …, η_W}.
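A minimal sketch of this approximation, assuming a stored window of the W most recent samples and an arbitrary per-sample score; the functions g_num, g_den and the toy policies are placeholders, not quantities from the paper.

```python
# Hedged sketch: replace an expectation over the unknown distribution of eta
# with an empirical average over the W most recent samples. The per-sample
# scores and the candidate policies below are illustrative placeholders.

def empirical_expectation(g, pi, past_samples):
    """Approximate E{ g(pi, eta) } by (1/W) * sum over the W stored samples."""
    return sum(g(pi, eta) for eta in past_samples) / len(past_samples)

def choose_policy_from_samples(policies, g_num, g_den, past_samples):
    """Pick the policy minimizing (empirical numerator)/(empirical frame length)."""
    def est_ratio(pi):
        num = empirical_expectation(g_num, pi, past_samples)
        den = empirical_expectation(g_den, pi, past_samples)
        return num / den
    return min(policies, key=est_ratio)

# Toy usage: eta = per-frame (quality, duration) pair; the policy scales both.
past_samples = [(1.0, 2.0), (2.0, 3.0), (1.5, 2.5)]     # W = 3 past samples
g_num = lambda pi, eta: -pi * eta[0]                     # maximize quality => negate
g_den = lambda pi, eta: 1.0 + pi * eta[1]                # frame length model
print(choose_policy_from_samples([0.5, 1.0, 2.0], g_num, g_den, past_samples))
```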
23
Simulation:
[Figure: quality of information per unit time versus sample size W, comparing the drift-plus-penalty ratio algorithm with bisection against an alternative algorithm with time averaging.]
24
Concluding Sims (values for W = 10):
Quick Advertisement - New Book:
M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010.
http://www.morganclaypool.com/doi/abs/10.2200/S00271ED1V01Y201006CNT007
PDF also available from the "Synthesis Lecture Series" (on the digital library).
Covers Lyapunov optimization theory (including these renewal system problems), with detailed examples and problem set questions.