Dynamic Optimization and Learning for Renewal Systems -- with Applications to Wireless Networks and Peer-to-Peer Networks
Michael J. Neely, University of Southern California
[Title figure: a timeline of renewal frames T[0], T[1], T[2] and a task-processing network of T/R nodes with a network coordinator handling Task 1, Task 2, Task 3.]
Outline:
Optimization of Renewal Systems
Application 1: Task Processing in Wireless Networks
  Quality-of-Information (ARL CTA project)
  Task "deluge" problem
Application 2: Peer-to-Peer Networks
  Social networks (ARL CTA project)
  Internet and wireless
References:
General theory and Application 1:
M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool, 2010.
M. J. Neely, "Dynamic Optimization and Learning for Renewal Systems," Proc. Asilomar Conf. on Signals, Systems, and Computers, Nov. 2010.
Application 2 (peer-to-peer):
M. J. Neely and L. Golubchik, "Utility Optimization for Dynamic Peer-to-Peer Networks with Tit-for-Tat Constraints," Proc. IEEE INFOCOM, 2011.
These works are available at: http://www-bcf.usc.edu/~mjneely/
A General Renewal System
[Figure: a timeline of renewal frames with durations T[0], T[1], T[2], … and penalty vectors y[0], y[1], y[2], …]
Renewal frames r in {0, 1, 2, …}.
π[r] = policy chosen on frame r.
P = abstract policy space (π[r] in P for all r).
The policy π[r] affects the frame size and the penalty vector on frame r. These are random functions of π[r] (their distribution depends on π[r]):
y[r] = [y_0(π[r]), y_1(π[r]), …, y_L(π[r])] = penalty vector
T[r] = T(π[r]) = frame duration
For example, successive frames might realize y[r] = [1.2, 1.8, …, 0.4] with T[r] = 8.1, then y[r] = [0.0, 3.8, …, -2.0] with T[r] = 12.3, then y[r] = [1.7, 2.2, …, 0.9] with T[r] = 5.6.
Example 1: Opportunistic Scheduling
All frames = 1 slot.
S[r] = (S_1[r], S_2[r], S_3[r]) = channel states for slot r.
Policy π[r]: on frame r, first observe S[r], then choose a channel to serve (i.e., one of {1, 2, 3}).
Example objectives: throughput, energy, fairness, etc.
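For concreteness, a minimal sketch of one such policy (a simple weighted max rule; the rule and the weights are illustrative assumptions, not from the talk, which leaves the policy abstract):

```python
def opportunistic_policy(S, weights=(1.0, 1.0, 1.0)):
    """Serve the channel with the largest weighted state S_i[r].

    Illustrative only: objectives like fairness or energy would change
    the weights, or replace this rule entirely.
    """
    return max(range(len(S)), key=lambda i: weights[i] * S[i])  # channel index in {0, 1, 2}

# e.g. opportunistic_policy((0.3, 0.9, 0.5)) -> 1 (serve channel 2 of 3)
```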
Example 2: Convex Programs (Deterministic Problems)
Minimize: f(x_1, x_2, …, x_N)
Subject to: g_k(x_1, x_2, …, x_N) ≤ 0 for all k in {1, …, K}
(x_1, x_2, …, x_N) in A
Example 2: Convex Programs (Deterministic Problems)
All frames = 1 slot. Policy π[r] = (x_1[r], x_2[r], …, x_N[r]) in A.
Define time averages over frames, e.g. f̄ = lim_{R→∞} (1/R) ∑_{r=0}^{R-1} f(x_1[r], …, x_N[r]).
The convex program above is equivalent to:
Minimize: f̄ (the time average of f(x_1[r], …, x_N[r]))
Subject to: ḡ_k ≤ 0 for all k in {1, …, K} (time averages of g_k(x_1[r], …, x_N[r]))
(x_1[r], x_2[r], …, x_N[r]) in A for all frames r
Example 2: Convex Programs (Deterministic Problems)
Jensen's inequality: since f and each g_k are convex, f(x̄) ≤ f̄ and g_k(x̄) ≤ ḡ_k ≤ 0. So the time average x̄ = (x̄_1, …, x̄_N) of the dynamic solution (x_1[r], x_2[r], …, x_N[r]) is feasible and has cost no larger than the time-averaged cost: it solves the original convex program!
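A quick numeric sanity check of that Jensen step (illustrative only; the objective and constraint below are made up): for convex f and g_k, the average of feasible iterates stays feasible and its cost is at most the average cost.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: float(np.sum(x**2))      # convex objective
g = lambda x: float(np.sum(x) - 1.0)   # convex (linear) constraint, g(x) <= 0

# Feasible iterates x[r], standing in for the dynamic solution's choices.
X = rng.uniform(-1.0, 0.2, size=(1000, 3))
X = X[[g(x) <= 0 for x in X]]

x_bar = X.mean(axis=0)                            # time average of the iterates
assert g(x_bar) <= 0                              # average of feasible points stays feasible
assert f(x_bar) <= np.mean([f(x) for x in X])     # Jensen: f(x_bar) <= average of f(x[r])
print("f(x_bar) =", f(x_bar))
```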
Example 3: Markov Decision Problems
[Figure: a recurrent Markov chain with states 1, 2, 3, 4.]
M(t) = recurrent Markov chain (continuous or discrete time).
Renewals are defined as recurrences to state 1.
T[r] = random inter-renewal frame size (frame r).
y[r] = penalties incurred over frame r.
π[r] = policy that affects the transition probabilities over frame r.
Objective: minimize the time average of one penalty subject to time average constraints on the others.
Example 4: Task Processing over Networks
[Figure: a network of T/R nodes with a network coordinator processing Task 1, Task 2, Task 3, ….]
Infinite sequence of tasks, e.g.: query sensors and/or perform computations.
Renewal frame r = processing time for frame r.
Policy types:
Low level: {specify transmission decisions over the network}
High level: {Backpressure1, Backpressure2, Shortest Path}
Example objective: maximize quality of information per unit time subject to per-node power constraints.
Quick Review of Renewal-Reward Theory (Pop Quiz Next Slide!)
Define the frame average for y_0[r]: ȳ_0 = lim_{R→∞} (1/R) ∑_{r=0}^{R-1} y_0[r], and similarly T̄ for T[r].
The time average for y_0[r] (per unit time) is then: lim_{R→∞} [∑_{r=0}^{R-1} y_0[r]] / [∑_{r=0}^{R-1} T[r]] = ȳ_0 / T̄.
*If behavior is i.i.d. over frames, by the LLN this is the same as E{y_0}/E{T}.
Pop Quiz: (10 points)
Let y_0[r] = energy expended on frame r. Time average power = (total energy used)/(total time). Suppose (for simplicity) behavior is i.i.d. over frames. To minimize time average power, should we minimize (a) the per-frame expectation E{y_0}, or (b) the ratio E{y_0}/E{T}?
Answer: the ratio E{y_0}/E{T}, since by renewal-reward theory that is exactly the time average power. For example, a policy with E{y_0} = 2, E{T} = 10 uses average power 0.2, beating a policy with E{y_0} = 1, E{T} = 1 despite its larger per-frame energy.
Two General Problem Types:
1) Minimize a time average subject to time average constraints:
Minimize: ȳ_0/T̄
Subject to: ȳ_l/T̄ ≤ c_l for all l in {1, …, L}, with π[r] in P for all frames r.
2) Maximize a concave function φ(x_1, …, x_L) of the time averages:
Maximize: φ(ȳ_1/T̄, …, ȳ_L/T̄)
Subject to: π[r] in P for all frames r (possibly with additional time average constraints).
Solving the Problem (Type 1):
Define a "virtual queue" for each inequality constraint:
Z_l[r+1] = max[Z_l[r] − c_l T[r] + y_l[r], 0]
[Figure: queue Z_l[r] with arrivals y_l[r] and service c_l T[r].]
Stabilizing Z_l enforces the corresponding constraint ȳ_l/T̄ ≤ c_l.
Lyapunov Function and "Drift-Plus-Penalty Ratio":
Scalar measure of queue sizes: L[r] = Z_1[r]² + Z_2[r]² + … + Z_L[r]²
Δ(Z[r]) = E{L[r+1] − L[r] | Z[r]} = "frame-based Lyapunov drift"
Algorithm technique: every frame r, observe Z_1[r], …, Z_L[r]. Then choose a policy π[r] in P to minimize the "drift-plus-penalty ratio":
( Δ(Z[r]) + V E{y_0[r] | Z[r]} ) / E{T[r] | Z[r]}
The Algorithm Becomes:
Observe Z[r] = (Z_1[r], …, Z_L[r]). Choose π[r] in P to minimize the drift-plus-penalty ratio:
( Δ(Z[r]) + V E{y_0[r] | Z[r]} ) / E{T[r] | Z[r]}
Then update the virtual queues:
Z_l[r+1] = max[Z_l[r] − c_l T[r] + y_l[r], 0]
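A minimal simulation sketch of this frame loop, assuming a finite policy set with known per-policy means (the arrays E_y0, E_yl, E_T and the levels c are illustrative inputs, not from the talk). Under the usual drift bound, minimizing the ratio reduces to an argmin over policies of (V·E{y_0(π)} + Σ_l Z_l·E{y_l(π) − c_l T(π)}) / E{T(π)}:

```python
import numpy as np

def choose_policy(Z, c, V, E_y0, E_yl, E_T):
    """Pick the policy index minimizing the drift-plus-penalty ratio bound.

    Z    : (L,)  current virtual queues
    c    : (L,)  constraint levels c_l
    V    : float performance/queue-size trade-off parameter
    E_y0 : (P,)  E{y_0(pi)} for each policy pi
    E_yl : (P,L) E{y_l(pi)} for each policy and constraint
    E_T  : (P,)  E{T(pi)} > 0 for each policy
    """
    ratios = (V * E_y0 + (E_yl - np.outer(E_T, c)) @ Z) / E_T
    return int(np.argmin(ratios))

def update_queues(Z, c, T, y):
    """Z_l[r+1] = max(Z_l[r] - c_l*T[r] + y_l[r], 0), componentwise."""
    return np.maximum(Z - c * T + y, 0.0)

# One frame with P=2 policies and L=1 constraint:
Z = np.zeros(1)
pi = choose_policy(Z, c=np.array([0.25]), V=10.0,
                   E_y0=np.array([1.0, 0.5]),
                   E_yl=np.array([[0.2], [0.4]]),
                   E_T=np.array([1.0, 2.0]))
Z = update_queues(Z, np.array([0.25]), T=2.0, y=np.array([0.4]))
```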
Theorem: Assume the constraints are feasible. Then under this algorithm (minimizing the drift-plus-penalty ratio ( Δ(Z[r]) + V E{y_0[r] | Z[r]} ) / E{T[r] | Z[r]} on every frame r in {1, 2, 3, …}), we achieve:
(a) All time average constraints ȳ_l/T̄ ≤ c_l are satisfied.
(b) The time average penalty ȳ_0/T̄ is within O(1/V) of its optimal value.
Application 1 – Task Processing:
[Figure: five-node T/R network with a network coordinator; each frame r consists of a setup phase, a transmit phase, and an idle phase of length I[r].]
Every task reveals random task parameters η[r]:
η[r] = [(qual_1[r], T_1[r]), (qual_2[r], T_2[r]), …, (qual_5[r], T_5[r])]
Choose π[r] = [which node transmits, how long to idle] in {1, 2, 3, 4, 5} × [0, I_max].
Transmissions incur power. We use a quality distribution that tends to be better for higher-numbered nodes.
Maximize quality/time subject to p_av ≤ 0.25 for all nodes.
Minimizing the Drift-Plus-Penalty Ratio:
Minimizing a pure expectation, rather than a ratio, is typically easier (see Bertsekas & Tsitsiklis, Neuro-Dynamic Programming).
Define: h(θ) = min over π in P of E{ num(π) − θ T(π) }, where num(π) is the drift-plus-penalty numerator above.
"Bisection Lemma": h(θ) is decreasing in θ, and the optimal ratio θ* is its root, h(θ*) = 0. So θ* can be found by bisection, and the policy minimizing the expectation at θ* also minimizes the ratio.
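A sketch of the bisection search implied by the lemma, assuming a finite policy set with known expected numerators and frame lengths (the arrays E_num and E_T are illustrative, and the bracket [lo, hi] is assumed to contain θ*):

```python
import numpy as np

def bisect_ratio(E_num, E_T, lo=0.0, hi=1e6, tol=1e-9):
    """Find theta* with h(theta*) = min_pi (E_num[pi] - theta* * E_T[pi]) = 0.

    h is strictly decreasing since every E_T[pi] > 0, so bisection applies;
    at the root, an argmin policy also minimizes the ratio E_num/E_T.
    """
    E_num, E_T = np.asarray(E_num), np.asarray(E_T)
    h = lambda theta: np.min(E_num - theta * E_T)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid   # theta below the root: shrink from the left
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return theta, int(np.argmin(E_num - theta * E_T))

# Example: two policies with ratios 3/2 and 5/4 -> returns theta* ~ 1.25, policy 1.
print(bisect_ratio([3.0, 5.0], [2.0, 4.0], hi=10.0))
```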
Learning via Sampling from the Past:
Suppose the randomness is characterized by past random samples {η_1, η_2, …, η_W}.
We want to compute expectations over the unknown random distribution of η.
Approximate each such expectation by an empirical average over the W samples from the past: E{value(π, η)} ≈ (1/W) ∑_{w=1}^{W} value(π, η_w).
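A minimal sketch of this approximation (the callables num_fn and T_fn, which evaluate what a policy would have incurred on a stored sample, are hypothetical stand-ins):

```python
def empirical_means(policy, samples, num_fn, T_fn):
    """Approximate E{num(policy, eta)} and E{T(policy, eta)} by averaging
    over the W stored past samples eta_1..eta_W."""
    W = len(samples)
    num_avg = sum(num_fn(policy, eta) for eta in samples) / W
    T_avg = sum(T_fn(policy, eta) for eta in samples) / W
    return num_avg, T_avg
```

These empirical means can be plugged into the bisection routine above in place of the true expectations; the simulation that follows studies how performance varies with the sample size W.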
Simulation:
[Plot: quality of information per unit time versus sample size W, comparing the drift-plus-penalty ratio algorithm with bisection against an alternative algorithm with time averaging.]
Concluding Sims (values for W = 10):
[Figure: simulation results for the five-node task-processing network with setup/transmit/idle frame phases, evaluated at sample size W = 10.]
Application 2 – Peer-to-Peer Wireless Networking:
[Figure: a network cloud connecting nodes 1–5.]
N nodes. Each node n has a download social group G_n, a subset of {1, …, N}.
Each file f is stored at some subset of nodes N_f.
Node n can request download of a file f from any node in G_n ∩ N_f.
Transmission rates (µ_ab(t)) between nodes are chosen from some (possibly time-varying) set Γ(t).
"Internet Cloud" Example 1:
Uplink capacity C_n^uplink is constant (no variation), and ∑_b µ_nb(t) ≤ C_n^uplink for all nodes n.
This example assumes uplink capacity is the bottleneck.
"Internet Cloud" Example 2:
Γ(t) specifies a single supportable rate matrix (µ_ab(t)), so there are no "transmission rate decisions."
The allowable rates (µ_ab(t)) are given to the peer-to-peer system by some underlying transport and routing protocol.
"Wireless Basestation" Example 3:
[Figure: a base station with wireless devices and device-to-device links.]
Wireless device-to-device transmission increases capacity.
(µ_ab(t)) is chosen in Γ(t), with transmissions coordinated by the base station.
"Commodities" for Request Allocation
Multiple file downloads can be active, and each file corresponds to a subset of nodes. Queueing files according to subsets would require O(2^N) queues (a complexity explosion!). Instead, without loss of optimality, we use the following alternative commodity structure, which needs only one request queue per node pair (at most N² queues)…
"Commodities" for Request Allocation
[Figure: node n with request arrivals (A_n(t), N_n(t)) and candidate helper nodes j, k, m in G_n ∩ N_n(t).]
Use the subset info only to determine the decision set G_n ∩ N_n(t), then choose which node m will help download.
That node queues the request:
Q_mn(t+1) = max[Q_mn(t) + R_mn(t) − µ_mn(t), 0]
The subset info can then be thrown away.
Stochastic Network Optimization Problem:
Maximize: ∑_n g_n(∑_a r̄_an)
Subject to:
(1) Q_mn < ∞ for all (m, n) (queue stability constraint)
(2) α ∑_a r̄_an ≤ β + ∑_b r̄_nb for all n (tit-for-tat constraint)
Here g_n is a concave utility function and r̄_an is a time average request rate; the tit-for-tat constraint reads α × (download rate) ≤ β + (upload rate).
Solution Technique (INFOCOM paper):
Use the "drift-plus-penalty" framework in a new "universal scheduling" scenario: we make no statistical assumptions on the stochastic processes [S(t); (A_n(t), N_n(t))].
Resulting Algorithm:
(Auxiliary variables) For each n, choose an auxiliary variable γ_n(t) in the interval [0, A_max] to maximize: V g_n(γ_n(t)) − H_n(t) γ_n(t)
(Request allocation) For each n, observe the following value for all m in G_n ∩ N_n(t):
−Q_mn(t) + H_n(t) + (F_m(t) − α F_n(t))
Give A_n(t) to the queue m with the largest non-negative value; drop A_n(t) if all of the values are negative.
(Scheduling) Choose (µ_ab(t)) in Γ(t) to maximize: ∑_{n,b} µ_nb(t) Q_nb(t)
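A sketch of the two per-node decisions, assuming the illustrative utility g_n(x) = log(1 + x) (the closed form below holds only for that choice; the dict-based Q, H, F containers and the candidate set are hypothetical):

```python
def aux_variable(V, H_n, A_max):
    """gamma_n(t) in [0, A_max] maximizing V*log(1 + gamma) - H_n(t)*gamma.

    Setting the derivative V/(1+gamma) - H_n to zero gives gamma = V/H_n - 1,
    clipped to the interval; H_n <= 0 pushes gamma to A_max.
    """
    if H_n <= 0:
        return A_max
    return min(max(V / H_n - 1.0, 0.0), A_max)

def request_allocation(n, candidates, Q, H, F, alpha):
    """Route node n's new requests A_n(t) to the source m in G_n ∩ N_n(t)
    with the largest non-negative score, or return None to drop them."""
    scores = {m: -Q[m][n] + H[n] + (F[m] - alpha * F[n]) for m in candidates}
    if not scores:
        return None
    m_best = max(scores, key=scores.get)
    return m_best if scores[m_best] >= 0 else None

# e.g. aux_variable(V=10.0, H_n=2.0, A_max=3.0) -> 3.0 (clipped at A_max)
```

The scheduling step is a max-weight choice over Γ(t) and depends on the structure of that set (e.g. the uplink-capacity example), so it is omitted here.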
How the Incentives Work for Node n:
F_n(t) = "node n reputation" (good reputation = low value). It evolves as a bounded virtual queue with input α × Help Received(t) and output β + Help Others(t).
Node n can only request downloads from others if it finds a node m with a non-negative value of:
−Q_mn(t) + H_n(t) + (F_m(t) − α F_n(t))
So nodes effectively compare reputations: a node that has helped others (low F_n) finds it easier to get help.
Concluding Theorem: For any arbitrary sample path [S(t); (A_n(t), N_n(t))], we guarantee:
(a) Q_mn(t) ≤ Q_max = O(V) for all t and all (m, n).
(b) All tit-for-tat constraints are satisfied.
(c) For any T > 0:
liminf_{K→∞} [Achieved Utility(KT)] ≥ liminf_{K→∞} (1/K) ∑_{i=1}^{K} ["T-Slot-Lookahead-Utility[i]"] − BT/V
[Figure: the timeline 0, T, 2T, 3T, … partitioned into frames 1, 2, 3, ….]
Conclusions for the Peer-to-Peer Problem:
A framework for posing peer-to-peer networking as a stochastic network optimization problem, whose optimal solution can be computed in polynomial time.
Conclusions Overall:
The renewal optimization framework can be viewed as "generalized linear programming" with variable-length scheduling modes. It has many applications: task processing, peer-to-peer networks, Markov decision problems, linear programs, convex programs, the stock market, smart grid, energy harvesting, and many more.
Solving the Problem (Type 2):
We reduce it to a problem with the structure of Type 1 via:
Auxiliary variables γ[r] = (γ_1[r], …, γ_L[r]).
The following variation on Jensen's inequality: for any concave function φ(x_1, …, x_L) and any (arbitrarily correlated) vector of random variables (X_1, X_2, …, X_L, T) with T > 0:
E{T φ(X_1, …, X_L)} / E{T} ≤ φ( E{T X_1}/E{T}, …, E{T X_L}/E{T} )
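This variation follows from ordinary Jensen's inequality under a tilted measure; a short derivation (a standard argument filled in here, not spelled out on the slide):

```latex
\[
\frac{\mathbb{E}\{T\,\varphi(X_1,\dots,X_L)\}}{\mathbb{E}\{T\}}
  = \mathbb{E}_Q\{\varphi(X_1,\dots,X_L)\}
  \le \varphi\big(\mathbb{E}_Q\{X_1\},\dots,\mathbb{E}_Q\{X_L\}\big)
  = \varphi\!\left(\frac{\mathbb{E}\{T X_1\}}{\mathbb{E}\{T\}},\dots,
                   \frac{\mathbb{E}\{T X_L\}}{\mathbb{E}\{T\}}\right),
\]
where $dQ \triangleq \dfrac{T}{\mathbb{E}\{T\}}\,dP$ is a probability measure
(valid since $T>0$) and the inequality is ordinary Jensen under $Q$.
```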
The Algorithm (Type 2) Becomes:
On frame r, observe the virtual queues Z[r] = (Z_1[r], …, Z_L[r]) and G[r] = (G_1[r], …, G_L[r]).
(Auxiliary variables) Choose γ_1[r], …, γ_L[r] to maximize the deterministic problem:
V φ(γ_1[r], …, γ_L[r]) − ∑_l G_l[r] γ_l[r]
(Policy selection) Choose π[r] in P to minimize the drift-plus-penalty ratio, now including the G_l[r] queue terms.
Then update the virtual queues:
Z_l[r+1] = max[Z_l[r] − c_l T[r] + y_l[r], 0]
G_l[r+1] = max[G_l[r] + γ_l[r] T[r] − y_l[r], 0]