A Lyapunov Optimization Approach to Repeated Stochastic Games Michael J. Neely University of Southern California Proc.

Presentation transcript:

A Lyapunov Optimization Approach to Repeated Stochastic Games. Michael J. Neely, University of Southern California. Proc. Allerton Conference on Communication, Control, and Computing, Oct. [Figure: game manager and Players 1 to 5.]

Game structure: Slotted time $t \in \{0, 1, 2, \ldots\}$. $N$ players, 1 game manager. The slot-$t$ utility for each player depends on: (i) random events $\omega(t) = (\omega_0(t), \omega_1(t), \ldots, \omega_N(t))$; (ii) control actions $\alpha(t) = (\alpha_1(t), \ldots, \alpha_N(t))$. Players: maximize time average utility. Game manager: provides suggestions; maintains fairness of utilities subject to equilibrium constraints.

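Random events $\omega(t)$: Player $i$ sees $\omega_i(t)$. Manager sees $\omega(t) = (\omega_0(t), \omega_1(t), \ldots, \omega_N(t))$. [Figure: the game manager observes $(\omega_0(t), \omega_1(t), \ldots, \omega_N(t))$, labeled "Only known to manager!"; Players 1, 2, 3 observe $\omega_1(t)$, $\omega_2(t)$, $\omega_3(t)$.]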

Random events $\omega(t)$ (continued): The vector $\omega(t)$ is i.i.d. over slots (components are possibly correlated).

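Actions and utilities: Manager sends suggested actions $M_i(t)$. Players take actions $\alpha_i(t) \in A_i$. The resulting utility is $U_i(t) = u_i(\alpha(t), \omega(t))$. [Figure: the game manager, observing $(\omega_0(t), \omega_1(t), \ldots, \omega_N(t))$, sends $M_1(t)$, $M_2(t)$, $M_3(t)$ to Players 1, 2, 3, who take actions $\alpha_1(t)$, $\alpha_2(t)$, $\alpha_3(t)$.]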

Example (wireless MAC game): Manager knows current channel conditions: $\omega_0(t) = (C_1(t), C_2(t), \ldots, C_N(t))$. Users do not have this knowledge: $\omega_i(t) = \text{NULL}$. [Figure: Users 1, 2, 3 and an access point, with channels $C_1(t)$, $C_2(t)$, $C_3(t)$.]

Example (economic market): $\omega_0(t)$ = vector of current prices, e.g. $\omega_0(t) = (\text{price}_{\text{HAM}}(t), \text{price}_{\text{EGGS}}(t))$. Prices are commonly known to everyone: $\omega_i(t) = \omega_0(t)$ for all $i$. [Figure: game manager and Players 1, 2, 3.]

Participation: At the beginning of the game, each player chooses either: (i) Participate: receive messages $M_i(t)$ and always choose $\alpha_i(t) = M_i(t)$. (ii) Do not participate: do not receive messages $M_i(t)$; can choose $\alpha_i(t)$ however they like. Need incentives for participation…

Participation (continued): Need incentives for participation… Three equilibrium concepts are considered: Nash equilibrium (NE), correlated equilibrium (CE), and coarse correlated equilibrium (CCE).

NE for static game: Consider the special case with no $\omega(t)$ process. Nash equilibrium (NE): players' actions are independent, $\Pr[\alpha] = \Pr[\alpha_1]\Pr[\alpha_2]\cdots\Pr[\alpha_N]$; the game manager is not needed. Definition: A distribution $\Pr[\alpha]$ is a Nash equilibrium (NE) if no player can benefit by unilaterally changing its action probabilities. Finding a NE in a general game is a nonconvex problem!

CE for static game: Manager suggests actions $\alpha(t)$, i.i.d. with distribution $\Pr[\alpha]$. Suppose all players participate. Definition [Aumann 1974, 1987]: A distribution $\Pr[\alpha]$ is a correlated equilibrium (CE) if $E[U_i(t) \mid \alpha_i(t) = \alpha] \ge E[u_i(\beta, \alpha_{-i}) \mid \alpha_i(t) = \alpha]$ for all $i \in \{1, \ldots, N\}$ and all pairs $\alpha, \beta \in A_i$. This is an LP with $|A_1|^2 + |A_2|^2 + \cdots + |A_N|^2$ constraints.
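
The CE definition above can be checked directly for a small game. The following is a minimal sketch, not taken from the slides: the two-player "chicken" payoffs, the names `is_correlated_equilibrium`, `ACTIONS`, `PAYOFF`, and the test distributions are all illustrative assumptions. Each conditional inequality is multiplied by $\Pr[\alpha_i(t) = \alpha]$, so the check sums over joint actions and never divides by a probability.

```python
from itertools import product

def is_correlated_equilibrium(dist, payoff, action_sets, tol=1e-9):
    """Check the CE inequalities for a finite static game.

    dist:        dict mapping a joint action tuple to its probability.
    payoff:      dict mapping a joint action tuple to a tuple of utilities.
    action_sets: list of per-player action lists (hypothetical layout).
    """
    for i, actions_i in enumerate(action_sets):
        # One constraint per ordered pair (suggested action a, deviation b).
        for a, b in product(actions_i, actions_i):
            gain = 0.0
            for joint, p in dist.items():
                if joint[i] != a:
                    continue
                deviated = joint[:i] + (b,) + joint[i + 1:]
                gain += p * (payoff[deviated][i] - payoff[joint][i])
            if gain > tol:  # deviating from a to b would pay off
                return False
    return True

# Hypothetical "chicken" game: D = dare, C = chicken.
ACTIONS = [["D", "C"], ["D", "C"]]
PAYOFF = {("D", "D"): (0, 0), ("D", "C"): (7, 2),
          ("C", "D"): (2, 7), ("C", "C"): (6, 6)}

third = {("D", "C"): 1 / 3, ("C", "D"): 1 / 3, ("C", "C"): 1 / 3}
uniform = {joint: 0.25 for joint in PAYOFF}

print(is_correlated_equilibrium(third, PAYOFF, ACTIONS))    # True
print(is_correlated_equilibrium(uniform, PAYOFF, ACTIONS))  # False
```

The double loop over `(a, b)` visits $|A_i|^2$ pairs per player, which matches the constraint count quoted on the slide.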

Criticism of CE: The manager gives suggestions $M_i(t)$ to players even if they do not participate. Without knowing the message $M_i(t) = \alpha_i$, player $i$ only knows the a priori likelihood of the other players' actions via the joint distribution $\Pr[\alpha]$. Knowing $M_i(t) = \alpha_i$, player $i$ knows the a posteriori likelihood of the other players' actions via the conditional distribution $\Pr[\alpha \mid \alpha_i]$.

CCE for static game: Manager suggests $\alpha(t)$, i.i.d. with distribution $\Pr[\alpha]$, and gives suggestions only to participating players. Suppose all players participate. Definition [Moulin and Vial, 1978]: A distribution $\Pr[\alpha]$ is a coarse correlated equilibrium (CCE) if $E[U_i(t)] \ge E[u_i(\beta, \alpha_{-i})]$ for all $i \in \{1, \ldots, N\}$ and all $\beta \in A_i$. This is an LP with $|A_1| + |A_2| + \cdots + |A_N|$ constraints (significantly less complex!).
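
The CCE condition compares a player's expected utility under the suggested distribution with what a fixed action $\beta$ would earn against the other players' suggested actions. A minimal sketch of that check follows; the "chicken" payoffs and all names (`is_coarse_correlated_equilibrium`, `CCE_ACTIONS`, `CCE_PAYOFF`) are illustrative assumptions, not from the slides.

```python
def is_coarse_correlated_equilibrium(dist, payoff, action_sets, tol=1e-9):
    """Check E[U_i] >= E[u_i(beta, alpha_{-i})] for every player i and fixed beta.

    dist:        dict mapping a joint action tuple to its probability.
    payoff:      dict mapping a joint action tuple to a tuple of utilities.
    action_sets: list of per-player action lists (hypothetical layout).
    """
    for i, actions_i in enumerate(action_sets):
        # Expected utility when player i follows the suggestions.
        expected = sum(p * payoff[joint][i] for joint, p in dist.items())
        # One constraint per fixed action beta (no conditioning on the suggestion).
        for beta in actions_i:
            deviation = sum(
                p * payoff[joint[:i] + (beta,) + joint[i + 1:]][i]
                for joint, p in dist.items()
            )
            if deviation > expected + tol:
                return False
    return True

# Hypothetical "chicken" game: D = dare, C = chicken.
CCE_ACTIONS = [["D", "C"], ["D", "C"]]
CCE_PAYOFF = {("D", "D"): (0, 0), ("D", "C"): (7, 2),
              ("C", "D"): (2, 7), ("C", "C"): (6, 6)}

mixed = {("D", "C"): 1 / 3, ("C", "D"): 1 / 3, ("C", "C"): 1 / 3}
both_chicken = {("C", "C"): 1.0}

print(is_coarse_correlated_equilibrium(mixed, CCE_PAYOFF, CCE_ACTIONS))         # True
print(is_coarse_correlated_equilibrium(both_chicken, CCE_PAYOFF, CCE_ACTIONS))  # False
```

Only one loop over `beta` is needed per player, giving $|A_i|$ constraints per player rather than $|A_i|^2$.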

Superset theorem: The NE, CE, and CCE definitions extend easily to the stochastic game. Theorem: $\{\text{all NE}\} \subseteq \{\text{all CE}\} \subseteq \{\text{all CCE}\}$.
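
For the static game, the second inclusion can be seen in one step. This is a worked sketch added for illustration, using the static-game definitions of CE and CCE; it is not a reproduction of the proof in the paper. Weight each CE inequality by $\Pr[\alpha_i(t) = \alpha]$, sum over $\alpha \in A_i$, and apply the law of total expectation:

```latex
\begin{align*}
E[U_i(t)]
  &= \sum_{\alpha \in A_i} \Pr[\alpha_i(t) = \alpha]\,
     E[\,U_i(t) \mid \alpha_i(t) = \alpha\,] \\
  &\ge \sum_{\alpha \in A_i} \Pr[\alpha_i(t) = \alpha]\,
     E[\,u_i(\beta, \alpha_{-i}) \mid \alpha_i(t) = \alpha\,]
   = E[\,u_i(\beta, \alpha_{-i})\,]
  \quad \text{for every } \beta \in A_i .
\end{align*}
```

So every distribution satisfying the CE constraints also satisfies the CCE constraints.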

Example (static game): [Figure: utility function tables for Player 1 and Player 2 (entries not recoverable), and a plot of Avg. Utility 2 against Avg. Utility 1 showing the CCE region, the NE and CE point, and the labeled points (3.5, 2.4), (3.5, 9.3), (3.87, 3.79).] All players benefit if non-participants are denied access to the suggestions of the game manager.

Pure strategies for stochastic games: Player $i$ observes $\omega_i(t) \in \Omega_i$ and chooses $\alpha_i(t) \in A_i$. Definition: A pure strategy for player $i$ is a function $b_i : \Omega_i \to A_i$. There are $|A_i|^{|\Omega_i|}$ pure strategies for player $i$. Define $S_i$ as this set of pure strategies. [Figure: the map $b_i(\omega_i)$ from $\Omega_i$ to $A_i$.]
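
The count $|A_i|^{|\Omega_i|}$ follows from picking one action independently for each observation. A minimal sketch that enumerates the set $S_i$ for a toy player is below; the observation and action labels and the helper name `pure_strategies` are hypothetical.

```python
from itertools import product

def pure_strategies(observations, actions):
    """Return every function b_i: observations -> actions, each as a dict."""
    return [
        dict(zip(observations, choice))
        for choice in product(actions, repeat=len(observations))
    ]

# Hypothetical observation set Omega_i and action set A_i for one player.
omega_i = ["good", "bad"]
A_i = ["transmit", "idle", "probe"]

S_i = pure_strategies(omega_i, A_i)
print(len(S_i))  # 9, i.e. 3 actions raised to the power of 2 observations
```

The exponential growth in $|\Omega_i|$ is visible here: each extra observation multiplies the number of pure strategies by $|A_i|$.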

Stochastic optimization problem: Maximize: $\phi(U_1, U_2, \ldots, U_N)$ (concave fairness function). Subject to: 1) $U_i \ge U_i^{(s)}$ for all $i \in \{1, \ldots, N\}$, for all $s \in S_i$ (CCE constraints); 2) $\alpha(t) \in A_1 \times A_2 \times \cdots \times A_N$ for all $t \in \{0, 1, 2, \ldots\}$.

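Lyapunov optimization approach: Constraints: $U_i \ge U_i^{(s)}$ for all $i \in \{1, \ldots, N\}$, for all $s \in S_i$. Virtual queue: $Q_i^{(s)}(t)$. [Figure: virtual queue $Q_i^{(s)}(t)$, labeled with the processes $u_i(\alpha(t), \omega(t))$ and $u_i^{(s)}(\alpha(t), \omega(t))$.] Formally: $u_i^{(s)}(\alpha(t), \omega(t)) = u_i\big((b_i^{(s)}(\omega_i(t)), \alpha_{-i}(t)), \omega(t)\big)$.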
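
The queue update itself did not survive in the transcript. A common form in Lyapunov optimization, assumed here rather than quoted from the slides, treats the deviation utility $u_i^{(s)}$ as arrivals and the actual utility $u_i$ as service, with the queue clipped at zero. The function and variable names below are hypothetical.

```python
def update_virtual_queue(q, u_deviation, u_actual):
    """One slot of an assumed virtual queue update.

    q:           current backlog Q_i^(s)(t).
    u_deviation: u_i^(s)(alpha(t), omega(t)), utility under pure strategy s.
    u_actual:    u_i(alpha(t), omega(t)), utility actually received.
    """
    return max(q + u_deviation - u_actual, 0.0)

def run_queue(samples):
    """Apply the update over a list of (u_deviation, u_actual) pairs, from Q = 0."""
    q = 0.0
    for u_deviation, u_actual in samples:
        q = update_virtual_queue(q, u_deviation, u_actual)
    return q

# Hypothetical per-slot utilities for one (player, strategy) pair.
samples = [(1.0, 3.0), (4.0, 1.0), (2.0, 2.5)]
final_q = run_queue(samples)
print(final_q)  # 2.5
```

Under this form the backlog is never smaller than the running sum of `u_deviation - u_actual`, so keeping the queue small relative to the number of slots keeps the time average of $u_i^{(s)}$ from exceeding that of $u_i$.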

Online algorithm (main part): Every slot $t$, the game manager observes the queues and $\omega(t)$; chooses $\alpha(t) \in A_1 \times A_2 \times \cdots \times A_N$ to minimize: [expression not captured in the transcript]; does an auxiliary variable selection (omitted here); and updates the virtual queues. Knowledge of $\Pr[\omega(t) = (\omega_0, \omega_1, \ldots, \omega_N)]$ not required!
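
The expression minimized in each slot is missing from the transcript, so the sketch below substitutes a generic drift-plus-penalty weight as an assumption: each queue $Q_i^{(s)}(t)$ multiplies the gap $u_i^{(s)} - u_i$, and each $Z_i(t)$ multiplies $-u_i$. The exact expression is in the paper and may differ. The brute-force search, the "chicken" payoffs, and every name here (`choose_action`, `utility`, `strategies`) are hypothetical.

```python
from itertools import product

def choose_action(action_sets, omega, utility, strategies, Q, Z):
    """Pick the joint action minimizing an ASSUMED drift-plus-penalty weight.

    omega:         tuple (omega_0, omega_1, ..., omega_N); player i sees omega[i + 1].
    utility:       function (i, alpha, omega) -> slot utility of player i.
    strategies[i]: list of pure strategies, each a function omega_i -> action.
    Q[i][s], Z[i]: current virtual queue values.
    """
    def weight(alpha):
        total = 0.0
        for i in range(len(action_sets)):
            u_actual = utility(i, alpha, omega)
            total -= Z[i] * u_actual
            for s, b in enumerate(strategies[i]):
                # Joint action if player i followed pure strategy s instead.
                deviated = alpha[:i] + (b(omega[i + 1]),) + alpha[i + 1:]
                total += Q[i][s] * (utility(i, deviated, omega) - u_actual)
        return total

    return min(product(*action_sets), key=weight)

# Hypothetical static "chicken" game with no random events (omega is all None).
PAYOFFS = {("D", "D"): (0, 0), ("D", "C"): (7, 2),
           ("C", "D"): (2, 7), ("C", "C"): (6, 6)}
action_sets = [["D", "C"], ["D", "C"]]
omega = (None, None, None)
utility = lambda i, alpha, omega: PAYOFFS[alpha][i]
always = lambda a: (lambda omega_i: a)
strategies = [[always("D"), always("C")], [always("D"), always("C")]]

# Empty deviation queues: the choice simply maximizes the Z-weighted utilities.
print(choose_action(action_sets, omega, utility, strategies,
                    Q=[[0, 0], [0, 0]], Z=[1, 1]))   # ('C', 'C')
# Large backlog on player 1's "always D" deviation shifts the choice.
print(choose_action(action_sets, omega, utility, strategies,
                    Q=[[10, 0], [0, 0]], Z=[1, 1]))  # ('C', 'D')
```

The search uses only the observed `omega` and the current queue values, consistent with the slide's remark that the distribution of $\omega(t)$ is not needed. Exhaustive search over joint actions is used only to keep the sketch short.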

Conclusions: CCE constraints are simpler and lead to improved utilities. Online algorithm for the stochastic game. No knowledge of $\Pr[\omega(t) = (\omega_0, \omega_1, \ldots, \omega_N)]$ required! Complexity and convergence time are independent of the size of $\Omega_0$. Scales gracefully with large $N$.

Aux variable update: Choose $x_i(t) \in [0, 1]$ to maximize $V\phi(x_1(t), \ldots, x_N(t)) - \sum_{i=1}^{N} Z_i(t) x_i(t)$, where $Z_i(t)$ is another virtual queue, one for each player $i \in \{1, \ldots, N\}$. See paper for details:
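
For a separable fairness function the maximization splits into one scalar problem per player. As an illustration only, assume $\phi(x) = \sum_i \log x_i$ (a proportional-fairness choice that the slides do not specify) and $V > 0$. Setting the derivative $V/x_i - Z_i(t)$ to zero and clipping to $[0, 1]$ gives $x_i(t) = \min(1, V/Z_i(t))$. The function names are hypothetical.

```python
import math

def aux_update_log_fairness(V, Z):
    """Maximize V*sum(log x_i) - sum(Z_i * x_i) over x_i in (0, 1], assuming V > 0.

    Closed form per coordinate: x_i = 1 if Z_i <= V, else V / Z_i.
    """
    return [1.0 if z <= V else V / z for z in Z]

def objective(V, Z, x):
    """Value of the assumed objective, used only to sanity-check the closed form."""
    return V * sum(math.log(xi) for xi in x) - sum(z * xi for z, xi in zip(Z, x))

V = 2.0
Z = [0.0, 1.0, 4.0, 8.0]           # hypothetical virtual queue values Z_i(t)
x = aux_update_log_fairness(V, Z)
print(x)  # [1.0, 1.0, 0.5, 0.25]
```

Players with a large backlog $Z_i(t)$ receive a smaller auxiliary target, while a larger $V$ pushes every $x_i(t)$ toward 1.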