General-sum games You could still play a minimax strategy in general- sum games –I.e., pretend that the opponent is only trying to hurt you But this is.

Slides:



Advertisements
Similar presentations
CPS Extensive-form games Vincent Conitzer
Advertisements

A study of Correlated Equilibrium Polytope By: Jason Sorensen.
Nash’s Theorem Theorem (Nash, 1951): Every finite game (finite number of players, finite number of pure strategies) has at least one mixed-strategy Nash.
Game Theory Assignment For all of these games, P1 chooses between the columns, and P2 chooses between the rows.
1 University of Southern California Keep the Adversary Guessing: Agent Security by Policy Randomization Praveen Paruchuri University of Southern California.
This Segment: Computational game theory Lecture 1: Game representations, solution concepts and complexity Tuomas Sandholm Computer Science Department Carnegie.
Lecturer: Moni Naor Algorithmic Game Theory Uri Feige Robi Krauthgamer Moni Naor Lecture 8: Regret Minimization.
Mixed Strategies For Managers
Game Theoretical Insights in Strategic Patrolling: Model and Analysis Nicola Gatti – DEI, Politecnico di Milano, Piazza Leonardo.
Chapter 6 Game Theory © 2006 Thomson Learning/South-Western.
Calibrated Learning and Correlated Equilibrium By: Dean Foster and Rakesh Vohra Presented by: Jason Sorensen.
MIT and James Orlin © Game Theory 2-person 0-sum (or constant sum) game theory 2-person game theory (e.g., prisoner’s dilemma)
EC3224 Autumn Lecture #04 Mixed-Strategy Equilibrium
Study Group Randomized Algorithms 21 st June 03. Topics Covered Game Tree Evaluation –its expected run time is better than the worst- case complexity.
Part 3: The Minimax Theorem
Game-theoretic analysis tools Necessary for building nonmanipulable automated negotiation systems.
Extensive-form games. Extensive-form games with perfect information Player 1 Player 2 Player 1 2, 45, 33, 2 1, 00, 5 Players do not move simultaneously.
ECO290E: Game Theory Lecture 4 Applications in Industrial Organization.
An Introduction to Game Theory Part I: Strategic Games
Vincent Conitzer CPS Normal-form games Vincent Conitzer
by Vincent Conitzer of Duke
Sep. 5, 2013 Lirong Xia Introduction to Game Theory.
A camper awakens to the growl of a hungry bear and sees his friend putting on a pair of running shoes, “You can’t outrun a bear,” scoffs the camper. His.
Complexity Results about Nash Equilibria
An Introduction to Game Theory Part II: Mixed and Correlated Strategies Bernhard Nebel.
This Wednesday Dan Bryce will be hosting Mike Goodrich from BYU. Mike is going to give a talk during :30 to 12:20 in Main 117 on his work on UAVs.
1 Computing Nash Equilibrium Presenter: Yishay Mansour.
UNIT II: The Basic Theory Zero-sum Games Nonzero-sum Games Nash Equilibrium: Properties and Problems Bargaining Games Bargaining and Negotiation Review.
QR 38, 2/22/07 Strategic form: dominant strategies I.Strategic form II.Finding Nash equilibria III.Strategic form games in IR.
Network Formation Games. Netwok Formation Games NFGs model distinct ways in which selfish agents might create and evaluate networks We’ll see two models:
Game Theory.
UNIT II: The Basic Theory Zero-sum Games Nonzero-sum Games Nash Equilibrium: Properties and Problems Bargaining Games Bargaining and Negotiation Review.
Inefficiency of equilibria, and potential games Computational game theory Spring 2008 Michal Feldman.
An Intro to Game Theory Avrim Blum 12/07/04.
Experts Learning and The Minimax Theorem for Zero-Sum Games Maria Florina Balcan December 8th 2011.
Minimax strategies, Nash equilibria, correlated equilibria Vincent Conitzer
Game Theory.
Vincent Conitzer CPS Game Theory Vincent Conitzer
CPS 170: Artificial Intelligence Game Theory Instructor: Vincent Conitzer.
Computing Equilibria Christos H. Papadimitriou UC Berkeley “christos”
Game representations, solution concepts and complexity Tuomas Sandholm Computer Science Department Carnegie Mellon University.
A quantum protocol for sampling correlated equilibria unconditionally and without a mediator Iordanis Kerenidis, LIAFA, Univ Paris 7, and CNRS Shengyu.
CPS LP and IP in Game theory (Normal-form Games, Nash Equilibria and Stackelberg Games) Joshua Letchford.
Extensive-form games Vincent Conitzer
Standard and Extended Form Games A Lesson in Multiagent System Based on Jose Vidal’s book Fundamentals of Multiagent Systems Henry Hexmoor, SIUC.
Game Theory: introduction and applications to computer networks Game Theory: introduction and applications to computer networks Lecture 2: two-person non.
Chapters 29, 30 Game Theory A good time to talk about game theory since we have actually seen some types of equilibria last time. Game theory is concerned.
The Science of Networks 6.1 Today’s topics Game Theory Normal-form games Dominating strategies Nash equilibria Acknowledgements Vincent Conitzer, Michael.
Issues on the border of economics and computation נושאים בגבול כלכלה וחישוב Speaker: Dr. Michael Schapira Topic: Dynamics in Games (Part III) (Some slides.
Beyond selfish routing: Network Games. Network Games NGs model the various ways in which selfish users (i.e., players) strategically interact in using.
CPS LP and IP in Game theory (Normal-form Games, Nash Equilibria and Stackelberg Games) Joshua Letchford.
1 What is Game Theory About? r Analysis of situations where conflict of interests is present r Goal is to prescribe how conflicts can be resolved 2 2 r.
Shall we play a game? Game Theory and Computer Science Game Theory /06/05 - Zero-sum games - General-sum games.
Algorithms for solving two-player normal form games
CPS 570: Artificial Intelligence Game Theory Instructor: Vincent Conitzer.
CPS Utility theory, normal-form games
Vincent Conitzer CPS Learning in games Vincent Conitzer
5.1.Static Games of Incomplete Information
GAME THEORY Day 5. Minimax and Maximin Step 1. Write down the minimum entry in each row. Which one is the largest? Maximin Step 2. Write down the maximum.
Network Formation Games. NFGs model distinct ways in which selfish agents might create and evaluate networks We’ll see two models: Global Connection Game.
Game theory basics A Game describes situations of strategic interaction, where the payoff for one agent depends on its own actions as well as on the actions.
Communication Complexity as a Lower Bound for Learning in Games
Vincent Conitzer Normal-form games Vincent Conitzer
Presented By Aaron Roth
CPS Extensive-form games
Vincent Conitzer Extensive-form games Vincent Conitzer
CPS 173 Extensive-form games
Instructor: Vincent Conitzer
Normal Form (Matrix) Games
Presentation transcript:

General-sum games You could still play a minimax strategy in general- sum games –I.e., pretend that the opponent is only trying to hurt you But this is not rational: 0, 03, 1 1, 02, 1 If Column was trying to hurt Row, Column would play Left, so Row should play Down In reality, Column will play Right (strictly dominant), so Row should play Up Is there a better generalization of minimax strategies in zero- sum games to general-sum games?

Nash equilibrium [Nash 50] A vector of strategies (one for each player) is called a strategy profile A strategy profile (σ 1, σ 2, …, σ n ) is a Nash equilibrium if each σ i is a best response to σ -i –That is, for any i, for any σ i ’, u i (σ i, σ -i ) ≥ u i (σ i ’, σ -i ) Note that this does not say anything about multiple agents changing their strategies at the same time In any (finite) game, at least one Nash equilibrium (possibly using mixed strategies) exists [Nash 50] (Note - singular: equilibrium, plural: equilibria)

Nash equilibria of “chicken” 0, 0-1, 1 1, -1-5, -5 D S DS S D D S (D, S) and (S, D) are Nash equilibria –They are pure-strategy Nash equilibria: nobody randomizes –They are also strict Nash equilibria: changing your strategy will make you strictly worse off No other pure-strategy Nash equilibria

Nash equilibria of “chicken”… 0, 0-1, 1 1, -1-5, -5 D S DS Is there a Nash equilibrium that uses mixed strategies? Say, where player 1 uses a mixed strategy? Recall: if a mixed strategy is a best response, then all of the pure strategies that it randomizes over must also be best responses So we need to make player 1 indifferent between D and S Player 1’s utility for playing D = -p c S Player 1’s utility for playing S = p c D - 5p c S = 1 - 6p c S So we need -p c S = 1 - 6p c S which means p c S = 1/5 Then, player 2 needs to be indifferent as well Mixed-strategy Nash equilibrium: ((4/5 D, 1/5 S), (4/5 D, 1/5 S)) –People may die! Expected utility -1/5 for each player

The presentation game Pay attention (A) Do not pay attention (NA) Put effort into presentation (E) Do not put effort into presentation (NE) 4, 4-16, -14 0, -20, 0 Presenter Audience Pure-strategy Nash equilibria: (A, E), (NA, NE) Mixed-strategy Nash equilibrium: ((1/10 A, 9/10 NA), (4/5 E, 1/5 NE)) –Utility 0 for audience, -14/10 for presenter –Can see that some equilibria are strictly better for both players than other equilibria, i.e. some equilibria Pareto-dominate other equilibria

The “equilibrium selection problem” You are about to play a game that you have never played before with a person that you have never met According to which equilibrium should you play? Possible answers: –Equilibrium that maximizes the sum of utilities (social welfare) –Or, at least not a Pareto-dominated equilibrium –So-called focal equilibria “Meet in Paris” game - you and a friend were supposed to meet in Paris at noon on Sunday, but you forgot to discuss where and you cannot communicate. All you care about is meeting your friend. Where will you go? –Equilibrium that is the convergence point of some learning process –An equilibrium that is easy to compute –…–… Equilibrium selection is a difficult problem

Stackelberg (commitment) games Unique Nash equilibrium is (R,L) –This has a payoff of (2,1) 1, -13, 1 2, 14, -1 L R L R

Commitment What if the officer has the option to (credibly) announce where he will be patrolling? This would give him the power to “commit” to being at one of the buildings –This would be a pure-strategy Stackelberg game LR L(1,-1)(3,1) R(2,1)(4,-1)

Commitment… If the officer can commit to always being at the left building, then the vandal's best response is to go to the right building –This leads to an outcome of (3,1) LR L(1,-1)(3,1)

Committing to mixed strategies What if we give the officer even more power: the ability to commit to a mixed strategy –This results in a mixed-strategy Stackelberg game –E.g., the officer commits to flip a weighted coin which decides where he patrols LR L(1,-1)(3,1) R(2,1)(4,-1)

Committing to mixed strategies is more powerful Suppose the officer commits to the following strategy: {(.5+ε)L,(.5- ε)R} –The vandal’s best response is R –As ε goes to 0, this converges to a payoff of (3.5,0) LR L(1,-1)(3,1) R(2,1)(4,-1)

Stackelberg games in general One of the agents (the leader) has some advantage that allows her to commit to a strategy (pure or mixed) The other agent (the follower) then chooses his best response to this

Visualization LCR U 0,11,00,0 M 4,00,10,0 D 1,01,1 (1,0,0) = U (0,1,0) = M (0,0,1) = D L C R

Easy polynomial-time algorithm for two players For every column t, we solve separately for the best mixed row strategy (defined by p s ) that induces player 2 to play t maximize Σ s p s u 1 (s, t) subject to for any t’, Σ s p s u 2 (s, t) ≥ Σ s p s u 2 (s, t’) Σ s p s = 1 (May be infeasible) Pick the t that is best for player 1

(a particular kind of) Bayesian games leader utilities follower utilities (type 1) follower utilities (type 2) probability.6probability.4

Multiple types - visualization (1,0,0) (0,1,0) L C R (0,0,1) (1,0,0) (0,1,0) L C R (0,0,1) (1,0,0) (0,1,0) (0,0,1) (R,C) Combined

Solving Bayesian games There’s a known MIP for this 1 Details omitted due to the fact that its rather nasty. The main trick of the MIP is encoding a exponential number of LP’s into a single MIP Used in the ARMOR system deployed at LAX [1] Paruchuri et al. Playing Games for Security: An Efficient Exact Algorithm for Solving Bayesian Stackelberg Games

(In)approximability (# types)-approximation: optimize for each type separately using the LP method. Pick the solution that gives the best expected utility against the entire type distribution. Can’t do any better in polynomial time, unless P=NP –Reduction from INDEPENDENT-SET For adversarially chosen types, cannot decide in polynomial time whether it is possible to guarantee positive utility, unless P=NP

Reduction from independent set AB al1al1 31 al2al2 010 al3al3 01 AB al1al1 0 al2al2 31 al3al3 0 AB al1al1 01 al2al2 0 al3al AB al1al1 10 al2al2 10 al3al3 10 leader utilities follower utilities (type 1) follower utilities (type 2) follower utilities (type 3)

Value We define three ratios: –Value of Pure Commitment (VoPC) –Value of Mixed Commitment (VoMC) –Mixed vs Pure (MvP) LR L(1,-1)(3,1) R(2,1)(4,-1) 3/2 3.5/2 3.5/3

Related concepts Value of mediation 1 –Ratio of combined utility of all players between: Best correlated equilibrium Best Nash equilibrium Price of anarchy 2 –Measures loss of welfare lost due to selfishness of the players –Ratio between: Worst Nash equilibrium Optimal social welfare One difference is that we are only concerned with player 1’s utility 1 Ashlagi et al Koutsoupias and Papadimitriou 99

Normal form (2×2) VoPC = VoMC = ∞ LR U,11,0 D0,01-,1

Normal form (2×2) continued MvP = ∞ LR U,11,0 D0,02,1

Summary of value results Game typeVoPCVoMCMvP Normal-form (2×2)∞ (ISD) Symmetric normal-form (3×3)∞ (ISD) Extensive-form (3 leaves)∞ (SPNE) Security games (2×2)∞ ( UNash) Atomic Selfish routing Linear costs (n players)n (ISD) Quadratic costs (n players)n 2 (ISD) k-nomnial costs (n players)n k (ISD) Arbitrary costs (2 players)∞ (ISD)

Learning Single follower type Unknown follower payoffs Repeated play: commit to mixed strategy, see follower’s (myopic) response LR U1,?3,? D2,?4,?

Visualization LCR U 0,11,00,0 M 4,00,10,0 D 1,01,1 (1,0,0) = U (0,1,0) = M (0,0,1) = D L C R

Sampling L R C (1,0,0) (0,0,1) (0,1,0)

Three main techniques in the learning algorithm Find one point in each region (using random sampling) Find a point on an unknown hyperplane Starting from a point on an unknown hyperplane, determine the hyperplane completely

Finding a point on an unknown hyperplane L C R R or L Intermediate state Step 1. Sample in the overlapping region Region: R Step 2. Connect the new point to the point in the region that doesn’t match Step 3. Binary search along this lineL R

Determining the hyperplane L C R R or L Intermediate state Step 1. Sample a regular d-simplex centered at the point Step 2. Connect d lines between points on opposing sides Step 3. Binary search along these lines Step 4. Determine hyperplane (and update the region estimates with this information) L R

Bound on number of samples Theorem. Finding all of the hyperplanes necessary to compute the optimal mixed strategy to commit to requires O(Fk log(k) + dLk 2 ) samples –F depends on the size of the smallest region –L depends on desired precision –k is the number of follower actions –d is the number of leader actions