Oblivious Equilibrium for Stochastic Games with Concave Utility


Oblivious Equilibrium for Stochastic Games with Concave Utility Sachin Adlakha, Ramesh Johari, Gabriel Weintraub and Andrea Goldsmith DARPA ITMANET Meeting March 5-6, 2009

ACHIEVEMENT DESCRIPTION
Oblivious equilibrium for stochastic games with concave utility
S. Adlakha, R. Johari, G. Weintraub, A. Goldsmith

STATUS QUO: Many cognitive radio models do not account for the reaction of other devices to a single device's action. In prior work, we developed a general stochastic game model to tractably capture the interactions of many devices.

NEW INSIGHTS: In principle, tracking the state of other devices is complex. We approximate the state of other devices via a mean field limit.

MAIN RESULT: Consider stochastic games whose per-period utility and state dynamics are increasing, concave, and submodular. Then in a large system, each node can find approximately optimal policies by treating the state of other nodes as constant.

HOW IT WORKS: Under our assumptions, no single node is overly influential, so we can replace the other nodes' states by their mean. The optimal policies therefore decouple between nodes.

ASSUMPTIONS AND LIMITATIONS: This result holds under much more general technical assumptions than our early results on the problem. A key modeling limitation, however, is that the limit requires all nodes to interact with each other; thus the results apply only to dense networks.

IMPACT: Our results provide a general framework to study the interaction of multiple devices. Further, our results unify existing models for which such limits were known, and provide simple exogenous conditions that can be checked to ensure the main result holds.

NEXT-PHASE GOALS: We will apply our results to a model of interfering transmissions among energy-constrained devices. Our main goal is to develop a related model that applies when a single node interacts with a small number of other nodes each period.
1. What technical challenge is being undertaken on behalf of the project?
Answer: In this project we aim to understand competition among wireless nodes in dynamic settings. In particular, we are interested in understanding mean field approximations to large-scale wireless games in a reactive environment.

2. Why is it hard, and what are the open problems?
Answer: The standard game-theoretic techniques for dynamic games are computationally prohibitive and require information flow between nodes, something that is hard in practice.

3. How has this problem been addressed in the past?
Answer: Most studies related to cognitive radios have focused on either a static environment or have restricted attention to small toy problems with few nodes.

4. What new intellectual tools are being brought to bear on the problem?
Answer: We have generalized the concept of oblivious equilibrium to a large class of stochastic games. Wireless games form an interesting subclass of these games.

5. What is the main intermediate achievement?
Answer: In earlier work we had results on special cases, including linear-dynamics, quadratic-cost models. Our result now generalizes and extends all our prior results, and unifies them under a single framework with easily verifiable assumptions. The main achievement has been isolating a set of conditions on model primitives under which oblivious equilibrium can approximate the (more computationally difficult) Markov perfect equilibrium.

6. How and when does this achievement align with the project roadmap (end-of-phase or end-of-project goal)?
Answer: A key end-of-phase goal was to generalize our models to handle much more complex scenarios of interaction. Our results directly address this goal.

7. What are the longer-term objectives and consequences?
Answer: Our model is fairly general, so one immediate goal is to exploit the model structure for models of interaction among energy-constrained nodes. However, the more important long-term objective is to develop models where the number of nodes is large, but a single node interacts with a limited subset of other nodes at any given time.

8. Which thrusts and SOW tasks does this contribution fit under, and why?
Answer: This fits under Thrust 3, "Application Metrics and Network Performance", and specifically provides tools to ensure such systems are robust against noncooperative interactions between mobiles, as well as to ensure distributed coordination.

Real environments are reactive and non-stationary; this requires new game-theoretic models of interaction.

Wireless environments are reactive

Scenario: Wireless devices sharing the same spectrum.
Typical approach: Assume that the environment is non-reactive.
Flawed assumption at best: In cognitive radio networks, the environment consists of other cognitive radios, and hence is highly reactive.
Questions: How do we design policies for such networks? What is the performance loss if we assume a non-reactive environment?

Foundational theory – Markov Perfect Equilibrium

We model such reactive environments as stochastic dynamic games. The key solution concept is Markov perfect equilibrium (MPE): the action of each player depends on the state of every player. Problems: tracking the state of everyone else is hard, and MPE is hard to compute.

Foundational Theory – Oblivious Equilibrium

Oblivious policies: each player reacts only to its own state and the average state of the other players. Such policies are easy to compute and implement, and require little information exchange. Question: when is oblivious equilibrium close to MPE?

Our model

m players. The state of player i is xi; the action of player i is ai. Both the state evolution and the per-period payoff of player i depend on xi, ai, and f-i, the empirical distribution of the other players' states (the exact formulas appeared as figures on the slide and are not preserved in the transcript). Here f-i(y) is the fraction of the other m - 1 players whose state equals y.
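To make the setup concrete, here is a minimal Python sketch of the empirical distribution f-i and the model's moving parts. The transition and payoff functions are hypothetical stand-ins (the slide's actual formulas were figures and are lost); only the empirical-distribution computation follows directly from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def transition(x, a, rng):
    # Hypothetical state dynamics: state rises with the action, decays randomly.
    # NOT the paper's transition function, which is not given in the transcript.
    return max(0, x + a - int(rng.poisson(1.0)))

def payoff(x, a, f_other):
    # Hypothetical concave payoff, discounted by the other players' mean state.
    congestion = sum(y * p for y, p in f_other.items())
    return np.log(1 + x) / (1 + congestion) - 0.1 * a

def empirical_dist(states, i):
    """f_{-i}: fraction of the other m-1 players in each state."""
    others = [x for j, x in enumerate(states) if j != i]
    values, counts = np.unique(others, return_counts=True)
    return dict(zip(values.tolist(), (counts / len(others)).tolist()))

states = [2, 3, 3, 5, 2]          # m = 5 players
f_minus_0 = empirical_dist(states, 0)
print(f_minus_0)                  # {2: 0.25, 3: 0.5, 5: 0.25}
```

In the full game, each player i would pick ai each period, collect payoff(xi, ai, f-i), and move to transition(xi, ai, rng); the question the slides pose is what information ai may depend on.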

MPE and OE

A Markov policy is a decision rule based on the current state and the current empirical distribution: ai,t = μ(xi,t, f-i,t(m)). A Markov perfect equilibrium is a vector of Markov policies under which each player maximizes its present discounted payoff, given the policies of the other players. Under an oblivious policy, a player responds instead to xi,t and only the long-run average f-i(m). In an oblivious equilibrium, each player maximizes its present discounted payoff using an oblivious policy, given the long-run average state induced by the other players' policies.
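The informational difference between the two policy classes can be sketched as follows. The linear functional forms are hypothetical (the slides give no concrete policies); the point is only that an oblivious policy consumes the distribution of the other players' states once, so it collapses to a function of the player's own state.

```python
def markov_policy(x_i, f_minus_i_t):
    # MPE-style policy: needs the time-varying empirical distribution
    # f_{-i,t} of everyone else's states, re-observed every period.
    mean_other = sum(y * p for y, p in f_minus_i_t.items())
    return max(0.0, 1.0 - 0.1 * x_i - 0.05 * mean_other)

def make_oblivious_policy(long_run_dist):
    # OE-style policy: the long-run average distribution is fixed, so the
    # policy reduces to a function of the player's own state alone.
    mean_other = sum(y * p for y, p in long_run_dist.items())
    def policy(x_i):
        return max(0.0, 1.0 - 0.1 * x_i - 0.05 * mean_other)
    return policy

long_run = {2: 0.25, 3: 0.5, 5: 0.25}   # illustrative long-run average
pol = make_oblivious_policy(long_run)
print(pol(3))                           # 0.5375
```

No per-period tracking or information exchange is needed once `long_run` is known, which is exactly why oblivious policies are cheap to implement.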

Prior Work

- Generalized the idea of OE to general stochastic games [Allerton 07].
- Unified existing models, such as LQG games, via our framework [CDC 08].
- Exogenous conditions for approximating MPE using OE, for linear dynamics and separable payoffs [Allerton 08].

Current results: We have a general set of exogenous conditions (allowing nonlinear dynamics and nonseparable payoffs) under which OE is a good approximation to MPE. These conditions also unify our previous results and existing models.

Assumptions

[A1] The state transition function is concave in state and action and has decreasing differences in state and action.
[A2] For any action, [quantity lost in transcription] is a non-increasing function of the state and eventually becomes negative.
[A3] The payoff function is jointly concave in state and action and has decreasing differences in state and action.
[A4] The logarithm of the payoff is Gateaux differentiable with respect to f-i.
[A5] An MPE and an OE exist.
[A6] We restrict attention to policies that make the individual state Markov chain recurrent and keep the discounted sum of the squared payoff finite.
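Concavity and decreasing differences ([A1], [A3]) can be checked numerically on a grid for candidate model primitives. A sketch, using the hypothetical transition function h(x, a) = sqrt(x + a) (chosen for illustration; it is not from the slides):

```python
import numpy as np

def h(x, a):
    # Hypothetical transition function: increasing, concave, and with
    # decreasing differences, since its cross-partial is negative.
    return np.sqrt(x + a)

def has_decreasing_differences(f, xs, acts, tol=1e-9):
    # f has decreasing differences if, whenever x' >= x and a' >= a,
    #   f(x', a') - f(x', a) <= f(x, a') - f(x, a).
    for x in xs:
        for xp in xs:
            if xp <= x:
                continue
            for a in acts:
                for ap in acts:
                    if ap <= a:
                        continue
                    if (f(xp, ap) - f(xp, a)) - (f(x, ap) - f(x, a)) > tol:
                        return False
    return True

xs = np.linspace(0.0, 5.0, 11)
acts = np.linspace(0.0, 2.0, 5)
print(has_decreasing_differences(h, xs, acts))  # True
```

A grid check like this can falsify a candidate primitive but not prove the property everywhere; for the paper's result the conditions must hold on the whole domain.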

Assumptions (continued)

Define g(y) [definition lost in transcription]; g(y) can be interpreted as the maximum rate of change of the logarithm of the payoff function with respect to a small change in the fraction of players at state y.
[A7] We assume that the payoff function is such that g(y) = O(y^K) for some K.
[A8] We assume that there exists a constant C such that the payoff function satisfies the following condition [condition lost in transcription].

Main Result

Under [A1]-[A8], the oblivious equilibrium payoff is approximately optimal over Markov policies as m → ∞. In other words, an OE is approximately an MPE. The key point is that no single player is overly influential, and the true state distribution is close to its time average, so knowledge of the other players' policies does not significantly improve the payoff. Advantage: each player can use an oblivious policy without loss in performance.
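The mean-field intuition behind the result can be illustrated with a quick law-of-large-numbers check (i.i.d. stand-in states, not the paper's actual dynamics): as m grows, the empirical average state concentrates around its long-run value, so a player reacting only to the fixed average loses little relative to one tracking the realized distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

long_run_mean = 2.0  # illustrative long-run average state
for m in [10, 100, 10000]:
    # Stand-in: draw m i.i.d. states with the given long-run mean.
    states = rng.poisson(long_run_mean, size=m)
    gap = abs(states.mean() - long_run_mean)
    print(m, round(gap, 3))  # gap shrinks as m grows
```

This is only the concentration half of the argument; the paper's assumptions [A1]-[A8] are what guarantee that small fluctuations in the distribution translate into small payoff losses for an oblivious player.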

Main Contributions and Future Work

- Provides a general framework to study the interaction of multiple devices.
- Provides exogenous conditions that can be easily checked to ensure the main result holds.
- Unifies existing models for which such limits are known.

Future work: Apply this model to interfering transmissions between energy-constrained nodes. Develop similar models in which a single node interacts with a small set of nodes in each time period.