
1 Dynamic Optimization and Learning for Renewal Systems -- With applications to Wireless Networks and Peer-to-Peer Networks. Michael J. Neely, University of Southern California. [Title figure: a timeline of renewal frames T[0], T[1], T[2], and a task-processing network of T/R nodes with a Network Coordinator handling Tasks 1, 2, 3.]

2 Outline: Optimization of Renewal Systems Application 1: Task Processing in Wireless Networks  Quality-of-Information (ARL CTA project)  Task “deluge” problem Application 2: Peer-to-Peer Networks  Social networks (ARL CTA project)  Internet and wireless

3 References: General Theory and Application 1: M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool, 2010. M. J. Neely, “Dynamic Optimization and Learning for Renewal Systems,” Proc. Asilomar Conf. on Signals, Systems, and Computers, Nov. 2010. Application 2 (Peer-to-Peer): M. J. Neely and L. Golubchik, “Utility Optimization for Dynamic Peer-to-Peer Networks with Tit-for-Tat Constraints,” Proc. IEEE INFOCOM, 2011. These works are available on: http://www-bcf.usc.edu/~mjneely/

4 A General Renewal System. [Figure: a timeline of renewal frames of durations T[0], T[1], T[2] with penalty vectors y[0], y[1], y[2].] Renewal frames r in {0, 1, 2, …}. π[r] = policy chosen on frame r. P = abstract policy space (π[r] in P for all r). The policy π[r] affects the frame size and penalty vector on frame r: y[r] = [y_0(π[r]), y_1(π[r]), …, y_L(π[r])], T[r] = T(π[r]) = frame duration.

5 These are random functions of π[r] (their distribution depends on π[r]). Example realizations over three frames: y[r] = [1.2, 1.8, …, 0.4] with T[r] = 8.1; y[r] = [0.0, 3.8, …, −2.0] with T[r] = 12.3; y[r] = [1.7, 2.2, …, 0.9] with T[r] = 5.6.
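For concreteness, here is a minimal Python sketch of this abstraction: a policy determines the distribution of the (random) frame duration and penalty vector. The exponential/Gaussian distributions and the dictionary policy format are placeholders for illustration, not part of the model.

```python
import random

def run_frame(pi):
    """Simulate one renewal frame under policy pi; returns (T, y).
    The distributions below are illustrative placeholders."""
    T = random.expovariate(1.0 / pi["mean_frame"])            # frame duration T(pi)
    y = [random.gauss(mu, 0.5) for mu in pi["mean_penalty"]]   # penalties y_0(pi), ..., y_L(pi)
    return T, y

# Example: three frames under a fixed policy.
pi = {"mean_frame": 8.0, "mean_penalty": [1.0, 2.0, 0.5]}
for r in range(3):
    T, y = run_frame(pi)
    print(f"frame {r}: T = {T:.1f}, y = {[round(v, 1) for v in y]}")
```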

9 Example 1: Opportunistic Scheduling. All frames = 1 slot. S[r] = (S_1[r], S_2[r], S_3[r]) = channel states for slot r. Policy π[r]: on frame r, first observe S[r], then choose a channel to serve (i.e., a channel in {1, 2, 3}). Example objectives: throughput, energy, fairness, etc.
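A small Python sketch of this per-slot structure; the three-valued channel states and the greedy "serve the best channel" rule below are placeholder assumptions for illustration, not the algorithm developed later in the talk.

```python
import random

def observe_channels():
    """Observe the channel state vector S[r] = (S_1[r], S_2[r], S_3[r]) for the slot."""
    return [random.choice([0, 1, 2]) for _ in range(3)]

for r in range(5):
    S = observe_channels()
    chosen = max(range(3), key=lambda i: S[i])   # serve the channel with the best state
    print(f"slot {r}: S = {S}, serve channel {chosen + 1}")
```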

10 Example 2: Convex Programs (Deterministic Problems). Minimize: f(x_1, x_2, …, x_N). Subject to: g_k(x_1, x_2, …, x_N) ≤ 0 for all k in {1, …, K}, and (x_1, x_2, …, x_N) in A.

11 Example 2: Convex Programs (Deterministic Problems). All frames = 1 slot. Policy π[r] = (x_1[r], x_2[r], …, x_N[r]) in A. Time average: lim_{R→∞} (1/R) ∑_{r=0}^{R-1} f(x[r]). The problem is equivalent to: Minimize the time average of f(x_1[r], x_2[r], …, x_N[r]), subject to: the time average of g_k(x_1[r], x_2[r], …, x_N[r]) ≤ 0 for all k in {1, …, K}, and (x_1[r], x_2[r], …, x_N[r]) in A for all frames r.

12 Example 2: Convex Programs (Deterministic Problems). Jensen's Inequality: the time average of the dynamic solution (x_1[r], x_2[r], …, x_N[r]) to the equivalent time-average problem solves the original convex program!
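The reasoning behind this claim, written out (the standard Jensen argument; here the set A is assumed convex, as in a convex program, and overbars denote time averages over R frames):

```latex
\bar{x}_i = \frac{1}{R}\sum_{r=0}^{R-1} x_i[r], \qquad
f(\bar{x}_1,\ldots,\bar{x}_N) \;\le\; \frac{1}{R}\sum_{r=0}^{R-1} f\big(x_1[r],\ldots,x_N[r]\big), \qquad
g_k(\bar{x}_1,\ldots,\bar{x}_N) \;\le\; \frac{1}{R}\sum_{r=0}^{R-1} g_k\big(x_1[r],\ldots,x_N[r]\big) \;\le\; 0 .
```

So the vector of time averages is feasible for the original program and has cost no larger than the optimal time-averaged cost, hence it is optimal.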

13 Example 3: Markov Decision Problems. M(t) = recurrent Markov chain (continuous or discrete time). Renewals are defined as returns to state 1. T[r] = random inter-renewal frame size for frame r. y[r] = penalties incurred over frame r. π[r] = policy that affects the transition probabilities over frame r. Objective: minimize the time average of one penalty subject to time average constraints on the others. [Figure: a 4-state recurrent Markov chain.]

14 Example 4: Task Processing over Networks. [Figure: a network of T/R nodes with a Network Coordinator handling Tasks 1, 2, 3.] Infinite sequence of tasks, e.g., query sensors and/or perform computations. Renewal frame r = processing time for frame r. Policy types: low level = {specify transmission decisions over the network}; high level = {Backpressure1, Backpressure2, Shortest Path}. Example objective: maximize quality of information per unit time subject to per-node power constraints.

15 Quick Review of Renewal-Reward Theory (Pop Quiz Next Slide!): Define the frame average of y_0[r], and the time average of y_0[r] (total penalty over total time). *If the system is i.i.d. over frames, then by the LLN the time average is the same as E{y_0}/E{T}.
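In standard renewal-reward notation (a reconstruction consistent with the LLN remark above), the two quantities are:

```latex
\overline{y}_0 \;=\; \lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} y_0[r]
\quad\text{(frame average)},
\qquad
\lim_{R\to\infty}\frac{\sum_{r=0}^{R-1} y_0[r]}{\sum_{r=0}^{R-1} T[r]}
\;=\;\frac{\overline{y}_0}{\overline{T}}
\quad\text{(time average)} .
```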

16 Pop Quiz: (10 points) Let y_0[r] = energy expended on frame r. Time avg. power = (total energy use)/(total time). Suppose (for simplicity) behavior is i.i.d. over frames. To minimize time average power, which quantity should we minimize: the per-frame expectation E{y_0}, or the ratio E{y_0}/E{T}? (The two options appeared as formulas (a) and (b) on the slide.)

17 Pop Quiz Answer: minimize the ratio E{y_0}/E{T}. By renewal-reward theory the time average power equals (total energy)/(total time), which converges to E{y_0}/E{T} under i.i.d. behavior, so minimizing E{y_0} alone can mislead: a policy with small per-frame energy but very short frames can still have large average power.

18 Two General Problem Types: 1) Minimize a time average subject to time average constraints. 2) Maximize a concave function φ(x_1, …, x_L) of time averages.
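In the notation of the preceding slides (and consistent with the virtual queues defined on the next slide), one natural way to write the two types, with overbars denoting frame averages, is:

```latex
\text{Type 1:}\quad \min_{\pi[r]\in\mathcal{P}}\;\frac{\overline{y}_0}{\overline{T}}
\quad\text{subject to}\quad \frac{\overline{y}_l}{\overline{T}}\le c_l \;\;\text{for } l=1,\ldots,L.
\qquad
\text{Type 2:}\quad \max_{\pi[r]\in\mathcal{P}}\;\phi\!\left(\frac{\overline{y}_1}{\overline{T}},\ldots,\frac{\overline{y}_L}{\overline{T}}\right)
\quad\text{subject to similar time average constraints.}
```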

19 Solving the Problem (Type 1): Define a “virtual queue” Z_l[r] for each inequality constraint, with arrivals y_l[r], service c_l·T[r], and update: Z_l[r+1] = max[Z_l[r] − c_l·T[r] + y_l[r], 0].

20 Lyapunov Function and “Drift-Plus-Penalty Ratio”: Scalar measure of queue sizes (Lyapunov function): L[r] = Z_1[r]^2 + Z_2[r]^2 + … + Z_L[r]^2. Frame-based Lyapunov drift: Δ(Z[r]) = E{L[r+1] − L[r] | Z[r]}. Algorithm technique: every frame r, observe Z_1[r], …, Z_L[r], then choose a policy π[r] in P to minimize the “drift-plus-penalty ratio”: [Δ(Z[r]) + V·E{y_0[r] | Z[r]}] / E{T[r] | Z[r]}.

21 The Algorithm Becomes: Observe Z[r] = (Z_1[r], …, Z_L[r]). Choose π[r] in P to minimize the drift-plus-penalty ratio [Δ(Z[r]) + V·E{y_0[r] | Z[r]}] / E{T[r] | Z[r]}. Then update the virtual queues: Z_l[r+1] = max[Z_l[r] − c_l·T[r] + y_l[r], 0].
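A minimal Python sketch of this per-frame loop. Illustrative assumptions (not from the slides): the policy space is a finite candidate list, each policy carries known conditional means E{T}, E{y_0}, E{y_l}, and sample_frame(pi) returns the realized frame length and penalty vector for the chosen policy.

```python
def dpp_ratio(policy, Z, c, V):
    ET, Ey0, Eyl = policy["ET"], policy["Ey0"], policy["Eyl"]   # Eyl = [E{y_1}, ..., E{y_L}]
    # Usual upper-bound expression for the drift of L[r] = sum_l Z_l[r]^2 (constants dropped).
    drift = sum(Z[l] * (Eyl[l] - c[l] * ET) for l in range(len(Z)))
    return (drift + V * Ey0) / ET

def run(policies, c, V, num_frames, sample_frame):
    Z = [0.0] * len(c)                              # one virtual queue per constraint
    for _ in range(num_frames):
        pi = min(policies, key=lambda p: dpp_ratio(p, Z, c, V))
        T, y = sample_frame(pi)                     # realized T[r] and y[r] = [y_0, y_1, ..., y_L]
        for l in range(len(Z)):                     # virtual queue update from slide 21
            Z[l] = max(Z[l] - c[l] * T + y[l + 1], 0.0)
    return Z
```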

22 Theorem: Assume the constraints are feasible. Then under this drift-plus-penalty ratio algorithm, for all frames r in {1, 2, 3, …}: (a) all of the time average constraints are satisfied, and (b) the time average of y_0 is within O(1/V) of its optimal value.

23 Application 1 – Task Processing: [Figure: T/R network with a Network Coordinator handling Tasks 1, 2, 3; each frame r consists of a setup phase, a transmit phase, and an idle period I[r].] Every task reveals random task parameters η[r]: η[r] = [(qual_1[r], T_1[r]), (qual_2[r], T_2[r]), …, (qual_5[r], T_5[r])]. Choose π[r] = [which node transmits, how long to idle] in {1, 2, 3, 4, 5} × [0, I_max]. Transmissions incur power. We use a quality distribution that tends to be better for higher-numbered nodes. Maximize quality/time subject to p_av ≤ 0.25 for all nodes.

24 Minimizing the Drift-Plus-Penalty Ratio: Minimizing a pure expectation, rather than a ratio, is typically easier (see Bertsekas and Tsitsiklis, Neuro-Dynamic Programming). “Bisection Lemma”: the minimum ratio θ* can be found by bisection over a scalar θ, where each bisection step solves a pure-expectation subproblem of the form min over π in P of E{numerator(π)} − θ·E{T(π)} and checks the sign of its value.
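A sketch of this bisection idea in Python, assuming a placeholder routine solve_subproblem(theta) that returns the minimized value of the pure-expectation subproblem for a fixed theta:

```python
def minimize_ratio_by_bisection(solve_subproblem, theta_lo, theta_hi, tol=1e-6):
    """Find theta* with min_pi [ E{num(pi)} - theta* * E{T(pi)} ] = 0.

    solve_subproblem(theta) must return the minimized value of
    E{num(pi)} - theta * E{T(pi)} over pi in P (a pure expectation, as noted above).
    Because T(pi) > 0, this value is non-increasing in theta, and its zero crossing
    theta* equals the minimum achievable ratio E{num(pi)} / E{T(pi)}.
    """
    assert solve_subproblem(theta_lo) >= 0 >= solve_subproblem(theta_hi)
    while theta_hi - theta_lo > tol:
        theta = 0.5 * (theta_lo + theta_hi)
        if solve_subproblem(theta) >= 0:
            theta_lo = theta        # optimal ratio lies at or above theta
        else:
            theta_hi = theta        # optimal ratio lies below theta
    return 0.5 * (theta_lo + theta_hi)
```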

25 Learning via Sampling from the Past: Suppose the randomness is characterized by a random vector η whose distribution is unknown, and that we have W past samples {η_1, η_2, …, η_W}. We want to compute an expectation over the unknown random distribution of η; we approximate it by an average over the W samples from the past.
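A minimal Python sketch of this approximation; cost(pi, eta) is a placeholder for whatever per-frame quantity the expectation is taken of:

```python
def empirical_expectation(cost, pi, past_samples):
    """Approximate E_eta{ cost(pi, eta) } by the average over the W observed samples
    {eta_1, ..., eta_W} taken from the system's past."""
    return sum(cost(pi, eta) for eta in past_samples) / len(past_samples)

def best_policy(policies, cost, past_samples):
    """Pick the candidate policy that minimizes the sampled approximation."""
    return min(policies, key=lambda pi: empirical_expectation(cost, pi, past_samples))
```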

26 Simulation: [Figure: quality of information per unit time versus sample size W, comparing the drift-plus-penalty ratio algorithm with bisection against an alternative algorithm with time averaging.]

27 Concluding simulations (values for W = 10): [Figure: numerical results for the task-processing network (T/R nodes, Network Coordinator, Tasks 1, 2, 3; frame = setup, transmit, idle I[r]).]

28 “Application 2” – Peer-to-Peer Wireless Networking:

29 [Figure: a network cloud connecting nodes 1–5.] N nodes. Each node n has a download social group G_n, a subset of {1, …, N}. Each file f is stored at some subset of nodes N_f. Node n can request download of a file f from any node in G_n ∩ N_f. Transmission rates (µ_ab(t)) between nodes are chosen in some (possibly time-varying) feasible rate set.

30 “Internet Cloud” Example 1: [Figure: nodes 1–5 connected through a network cloud, each node n with uplink capacity C_n^uplink.] The feasible rate set is constant (no variation): ∑_b µ_nb(t) ≤ C_n^uplink for all nodes n. This example assumes uplink capacity is the bottleneck.

31 “Internet Cloud” Example 2: [Figure: nodes 1–5 connected through a network cloud.] The feasible set at time t specifies a single supportable rate matrix (µ_ab(t)), so there are no transmission rate decisions: the allowable rates (µ_ab(t)) are given to the peer-to-peer system by some underlying transport and routing protocol.

32 “Wireless Basestation” Example 3: [Figure: a base station and several wireless devices.] Wireless device-to-device transmission increases capacity. The rates (µ_ab(t)) are chosen in the feasible set at time t, with transmissions coordinated by the base station.

33 “Commodities” for Request Allocation: Multiple file downloads can be active, and each file corresponds to a subset of nodes. Queueing files according to subsets would result in O(2^N) queues (a complexity explosion!). Instead, without loss of optimality, we use the following alternative commodity structure…

34 “Commodities” for Request Allocation: Use the subset info to determine the decision set. [Figure: node n with new requests (A_n(t), N_n(t)) and candidate helper nodes j, k, m in G_n ∩ N_n(t).]

35 Choose which node will help with the download.

36 That node queues the request: Q_mn(t+1) = max[Q_mn(t) + R_mn(t) − µ_mn(t), 0]. The subset info can now be thrown away. [Figure: node n's request held in queue Q_mn(t) at helper node m.]
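A small Python sketch of this commodity structure; the shortest-queue helper choice below is only a placeholder (the actual allocation rule appears on slide 43):

```python
from collections import defaultdict

Q = defaultdict(float)      # Q[(m, n)] = requests from node n queued at helper node m

def allocate_request(n, A_n, helpers):
    """Hand node n's new requests A_n(t) to one helper node m in G_n intersect N_n(t).
    Shortest-queue choice is a placeholder rule."""
    if not helpers:
        return None                                   # no node can supply the file
    m = min(helpers, key=lambda h: Q[(h, n)])
    Q[(m, n)] += A_n                                  # R_mn(t): requests routed to helper m
    return m

def serve(m, n, mu_mn):
    """Queue update from slide 36 (arrivals already added in allocate_request):
    Q_mn(t+1) = max[Q_mn(t) + R_mn(t) - mu_mn(t), 0]."""
    Q[(m, n)] = max(Q[(m, n)] - mu_mn, 0.0)
```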

37 Stochastic Network Optimization Problem:
Maximize: ∑_n g_n(∑_a r_an), where g_n is a concave utility function and r_an is the time average rate of node n's requests allocated to node a.
Subject to:
(1) Q_mn < ∞ for all (m, n) (queue stability constraint).
(2) α ∑_a r_an ≤ β + ∑_b r_nb for all n (tit-for-tat constraint: α × download rate ≤ β + upload rate).
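Collecting the annotations above, the problem reads (with r_an interpreted as a time average request rate):

```latex
\text{Maximize:}\;\; \sum_n g_n\!\Big(\textstyle\sum_a r_{an}\Big)
\qquad\text{Subject to:}\;\;
(1)\;\; Q_{mn} < \infty \;\;\forall (m,n),
\qquad
(2)\;\; \alpha \sum_a r_{an} \;\le\; \beta + \sum_b r_{nb} \;\;\forall n .
```

Constraint (2) says that α times node n's total download rate cannot exceed β plus its total upload rate, which is what gives every node an incentive to help others.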

42 Solution Technique for the INFOCOM Paper: Use the “Drift-Plus-Penalty” framework in a new “Universal Scheduling” scenario. We make no statistical assumptions on the stochastic processes [S(t); (A_n(t), N_n(t))].

43 Resulting Algorithm: (Auxiliary variables) For each n, choose an auxiliary variable γ_n(t) in the interval [0, A_max] to maximize: V·g_n(γ_n(t)) − H_n(t)·γ_n(t). (Request allocation) For each n, observe the following value for all m in G_n ∩ N_n(t): −Q_mn(t) + H_n(t) + (F_m(t) − α·F_n(t)). Give A_n(t) to the queue m with the largest non-negative value; drop A_n(t) if all of the values are negative. (Scheduling) Choose (µ_ab(t)) in the feasible rate set to maximize: ∑_{n,b} µ_nb(t)·Q_nb(t).
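A compact Python sketch of these three decisions per slot. The grid scan over γ, the brute-force scan over candidate rate matrices, and the dict-based queue/counter structures are illustrative assumptions, not part of the INFOCOM algorithm itself:

```python
def aux_variable(V, g_n, H_n, A_max, grid=101):
    """(Auxiliary variables) choose gamma_n(t) in [0, A_max] maximizing
    V*g_n(gamma) - H_n(t)*gamma; a simple grid scan stands in for the 1-D maximization."""
    candidates = [A_max * i / (grid - 1) for i in range(grid)]
    return max(candidates, key=lambda gamma: V * g_n(gamma) - H_n * gamma)

def request_allocation(n, A_n, candidates, Q, H, F, alpha):
    """(Request allocation) give A_n(t) to the helper m in G_n intersect N_n(t) with the
    largest non-negative value of -Q_mn + H_n + (F_m - alpha*F_n); drop if all negative."""
    best, best_val = None, 0.0
    for m in candidates:
        val = -Q.get((m, n), 0.0) + H[n] + (F[m] - alpha * F[n])
        if val >= best_val:
            best, best_val = m, val
    if best is not None:
        Q[(best, n)] = Q.get((best, n), 0.0) + A_n
    return best

def scheduling(feasible_rate_matrices, Q):
    """(Scheduling) choose the rate matrix (mu_ab(t)) in the feasible set maximizing
    sum_nb mu_nb(t)*Q_nb(t); brute force over a finite candidate list of dicts."""
    def weight(mu):
        return sum(rate * Q.get(key, 0.0) for key, rate in mu.items())
    return max(feasible_rate_matrices, key=weight)
```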

44 How the Incentives Work for node n: F_n(t) = “node n reputation” (good reputation = low value). [Figure: F_n(t) evolves like a queue with input α × (help received by n at time t) and service β + (help given by n at time t); callouts: “Bounded”, “Compare Reputations!”.] Node n can only request downloads from others if it finds a node m with a non-negative value of: −Q_mn(t) + H_n(t) + (F_m(t) − α·F_n(t)). The term (F_m(t) − α·F_n(t)) compares the two reputations, so a node that helps others (keeping F_n small) finds it easier to have its own requests accepted.

46 Concluding Theorem: For any arbitrary sample path of [S(t); (A_n(t), N_n(t))], we guarantee:
a) Q_mn(t) ≤ Q_max = O(V) for all t and all (m, n).
b) All tit-for-tat constraints are satisfied.
c) For any T > 0: liminf_{K→∞} [Achieved Utility(KT)] ≥ liminf_{K→∞} (1/K) ∑_{i=1}^{K} [“T-Slot-Lookahead-Utility[i]”] − BT/V.
[Figure: the timeline [0, KT] partitioned into K frames of length T.]

47 Conclusions for the Peer-to-Peer Problem: A framework for posing peer-to-peer networking as a stochastic network optimization problem; the solution can be computed in polynomial time. Conclusions overall: the renewal optimization framework can be viewed as “generalized linear programming” with variable-length scheduling modes, and it has many applications (task processing, peer-to-peer networks, Markov decision problems, linear programs, convex programs, stock market trading, smart grid, energy harvesting, and many more).

48 Solving the Problem (Type 2): We reduce it to a problem with the structure of Type 1 via auxiliary variables γ[r] = (γ_1[r], …, γ_L[r]) and the following variation on Jensen's inequality: for any concave function φ(x_1, …, x_L) and any (arbitrarily correlated) vector of random variables (X_1, X_2, …, X_L, T) with T > 0, we have E{T·φ(X_1, …, X_L)} / E{T} ≤ φ( E{T·X_1}/E{T}, …, E{T·X_L}/E{T} ).

49 The Algorithm (Type 2) Becomes: On frame r, observe the virtual queues Z[r] = (Z_1[r], …, Z_L[r]) and G[r] = (G_1[r], …, G_L[r]). (Auxiliary variables) Choose γ_1[r], …, γ_L[r] to maximize a deterministic per-frame subproblem. (Policy selection) Choose π[r] in P to minimize a drift-plus-penalty ratio, as in the Type 1 algorithm but now including the G_l queues. Then update the virtual queues: Z_l[r+1] = max[Z_l[r] − c_l·T[r] + y_l[r], 0], G_l[r+1] = max[G_l[r] + γ_l[r]·T[r] − y_l[r], 0].
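The two virtual queue updates above, as a small Python sketch (the realized T[r] and y[r] = (y_1, …, y_L), and the chosen γ[r], are assumed to be supplied by the rest of the algorithm):

```python
def update_virtual_queues(Z, G, c, T, y, gamma):
    """Per-frame virtual queue updates from slide 49, with y = [y_1, ..., y_L] the
    realized constraint penalties, T the realized frame length, and gamma = [gamma_1,
    ..., gamma_L] the auxiliary variables chosen this frame."""
    for l in range(len(Z)):
        Z[l] = max(Z[l] - c[l] * T + y[l], 0.0)          # enforces the y_l constraint
        G[l] = max(G[l] + gamma[l] * T - y[l], 0.0)      # ties gamma_l to y_l
    return Z, G
```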

