Issues on the border of economics and computation (נושאים בגבול כלכלה וחישוב)
Speaker: Dr. Michael Schapira
Topic: Dynamics in Games (Part III)
(Some slides from Prof. Yishay Mansour's course at TAU)

Two Things
1. Ex1 to be published by Thu
   - submission deadline: , midnight
   - can submit in pairs
   - submit through Dr. Blumrosen's mailbox
2. Debt from last class

Reminder: Zero-Sum Games
A zero-sum game is a 2-player strategic game such that for each s ∈ S we have u_1(s) + u_2(s) = 0.
- What is good for me is bad for my opponent, and vice versa.

            Left       Right
  Left    (-1, 1)    (1, -1)
  Right   (1, -1)    (-1, 1)

Reminder: Minimax-Optimal Strategies
A (mixed) strategy s_1* is minimax optimal for player 1 if
  min_{s_2 ∈ S_2} u_1(s_1*, s_2) ≥ min_{s_2 ∈ S_2} u_1(s_1, s_2)  for all s_1 ∈ S_1.
Similarly for player 2.
Can be found via linear programming.
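To make the last point concrete, here is a minimal sketch (not from the slides; it assumes numpy and scipy are available) of computing the row player's minimax-optimal strategy by linear programming:

```python
# Sketch: row player's minimax-optimal mixed strategy via an LP.
# Variables: x_1..x_m (mixed strategy) and v (guaranteed value).
# Maximize v subject to: sum_i x_i * A[i, j] >= v for every column j.
import numpy as np
from scipy.optimize import linprog

def minimax_strategy(A):
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])   # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - sum_i A[i, j] x_i <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)  # sum_i x_i = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[:m], res.x[m]                  # (strategy, game value)

# The zero-sum game from the previous slide: value 0, strategy (1/2, 1/2).
A = np.array([[-1.0, 1.0], [1.0, -1.0]])
print(minimax_strategy(A))
```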

Reminder: Minimax Theorem
Every 2-player zero-sum game has a unique value V.
A minimax-optimal strategy for R guarantees that R's expected gain is at least V.
A minimax-optimal strategy for C guarantees that R's expected gain is at most V.

Algorithmic Implications
The minimax theorem is a useful tool in the analysis of randomized algorithms. Let's see why.

Find Bill
There are n boxes; exactly one box contains a dollar bill, and the rest of the boxes are empty. A probe is defined as opening a box to see if it contains the dollar bill. The objective is to locate the box containing the dollar bill while minimizing the number of probes performed.
How well can a deterministic algorithm do? (An adversary can always force n − 1 probes.)
Can we do better via a randomized algorithm?
- i.e., an algorithm that is a probability distribution over deterministic algorithms

Randomized Find Alg
Randomized Find: select x ∈ {H, T} uniformly at random.
1. If x = H, probe boxes in order from 1 through n and stop when the bill is found.
2. Otherwise, probe boxes in order from n through 1 and stop when the bill is found.
The expected number of probes made by the algorithm is (n+1)/2:
- if the dollar bill is in the i-th box, then i probes are made with probability ½ and (n − i + 1) probes are made with probability ½.
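A minimal Python sketch of Randomized Find (an illustration, not code from the lecture):

```python
import random

def randomized_find(boxes):
    """boxes: list of booleans with exactly one True (the bill).
    Returns the number of probes made."""
    order = range(len(boxes))
    if random.random() < 0.5:          # x = T: probe from n down to 1
        order = reversed(order)
    for probes, i in enumerate(order, start=1):
        if boxes[i]:                   # probe box i
            return probes

# If the bill is in box i (1-indexed), this makes i probes w.p. 1/2 and
# n - i + 1 probes w.p. 1/2, for an expectation of (n + 1) / 2.
```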

Randomized Find is Optimal
Lemma: A lower bound on the expected number of probes required by any randomized algorithm to solve the Find-Bill problem is (n + 1)/2.
Proof via the minimax theorem!

The Algorithm Game
The row player aims to choose malicious inputs; the column player aims to choose efficient algorithms.
The payoff for (I, ALG) is the running time of ALG on I.
(Payoff matrix: rows Input 1, …, Input m; columns ALG_1, …, ALG_n; the (I, ALG) entry is T(I, ALG).)

The Algorithm Game
Pure strategies:
- a specific input for the row player
- a deterministic algorithm for the column player
Mixed strategies:
- a distribution over inputs for the row player
- a randomized algorithm for the column player

The Algorithm Game
If I'm the column player, what strategy (i.e., which randomized algorithm) do I want to choose?

The Algorithm Game
What does the minimax theorem mean here?

Yao's Principle
Let T(I, Alg) denote the time required for deterministic algorithm Alg to run on input I. Then
  max_p min_Alg E[T(I_p, Alg)] = min_q max_I E[T(I, Alg_q)],
where p ranges over distributions on inputs and q over distributions on deterministic algorithms.
So, for any two probability distributions p and q:
  min_Alg E[T(I_p, Alg)] ≤ max_I E[T(I, Alg_q)]

Using Yao's Principle
A useful technique for proving lower bounds on the running times of randomized algorithms:
Step I: Design a probability distribution I_p over inputs for which every deterministic algorithm's expected running time is at least c.
Step II: Deduce that every randomized algorithm's (expected) running time is at least c.

Back to Find-Bill
Lemma: A lower bound on the expected number of probes required by any randomized algorithm to solve the Find-Bill problem is (n + 1)/2.
Proof:
- Consider the scenario in which the bill is located in one of the n boxes uniformly at random.
- It suffices to consider deterministic algorithms that never probe the same box twice.
- By symmetry, we can assume that the probe order of such a deterministic algorithm ALG is 1 through n.
- The expected number of probes made by ALG is Σ_{i=1}^{n} i/n = (n+1)/2.
- Yao's principle implies the lower bound.
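A tiny sanity check of the expectation in the proof (a sketch; n = 10 is an arbitrary assumed size):

```python
# Under the uniform distribution, the fixed probe order 1..n costs i probes
# with probability 1/n each, so the expected cost is sum_i i/n = (n + 1)/2.
from fractions import Fraction

n = 10
expected_probes = sum(Fraction(i, n) for i in range(1, n + 1))
assert expected_probes == Fraction(n + 1, 2)
```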

No-Regret Algs: So Far…
- In some games (e.g., potential games), best-/better-response dynamics are guaranteed to converge to a PNE.
- In 2-player zero-sum games, no-regret dynamics converge to a NE.
- What about general games?

Chicken Game

            Stop       Go
  Stop    (0, 0)    (-3, 1)
  Go      (1, -3)   (-4, -4)

What are the pure NEs? What are the (mixed) NEs?
(The slide's annotations mark the answers: the pure NEs are (Stop, Go) and (Go, Stop); in the mixed NE each player plays each strategy with probability ½, putting probability ¼ on each cell.)

Correlated Equilibrium: Illustration

            Stop       Go
  Stop    (0, 0)    (-3, 1)
  Go      (1, -3)   (-4, -4)

Suppose that there is a trusted random device that samples a pure strategy profile from a distribution P and tells each player his component of the strategy profile. If all players other than i are following the strategy suggested by the random device, then i does not have any incentive to deviate.
(Here P puts probability ½ on each of (Stop, Go) and (Go, Stop), and 0 on the other two profiles.)

Correlated Equilibrium: Illustration

            Stop       Go
  Stop    (0, 0)    (-3, 1)
  Go      (1, -3)   (-4, -4)

Suppose that there is a trusted random device that samples a pure strategy profile from a distribution P and tells each player his component of the strategy profile. If all players other than i are following the strategy suggested by the random device, then i does not have any incentive to deviate.
(Here P puts probability 1/3 on each of (Stop, Stop), (Stop, Go), and (Go, Stop), and 0 on (Go, Go).)

Correlated Equilibrium
Consider a game:
- S_i is the set of (pure) strategies for player i; S = S_1 x S_2 x … x S_n
- s = (s_1, s_2, …, s_n) ∈ S is a vector of strategies
- u_i : S → R is the payoff function for player i
Notation: given a strategy vector s, let s_{-i} = (s_1, …, s_{i-1}, s_{i+1}, …, s_n)
- the vector s with the i-th element omitted

Correlated Equilibrium
A correlated equilibrium is a probability distribution p over (pure) strategy profiles in S such that for every player i and all s_i, s_i':
  Σ_{s_{-i}} p(s_i, s_{-i}) u_i(s_i, s_{-i}) ≥ Σ_{s_{-i}} p(s_i, s_{-i}) u_i(s_i', s_{-i})
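The definition is a finite set of linear inequalities, so membership can be checked mechanically. A minimal sketch (assumed code, not from the slides) for two-player games, applied to the Chicken distribution from the illustration:

```python
def is_correlated_eq(S1, S2, u1, u2, p, tol=1e-9):
    # Player 1: for a recommended s1 and a deviation s1', the expected gain
    # from deviating, sum_{s2} p(s1, s2) * (u1(s1', s2) - u1(s1, s2)), must be <= 0.
    for s1 in S1:
        for s1p in S1:
            if sum(p[(s1, s2)] * (u1[(s1p, s2)] - u1[(s1, s2)]) for s2 in S2) > tol:
                return False
    # Player 2, symmetrically.
    for s2 in S2:
        for s2p in S2:
            if sum(p[(s1, s2)] * (u2[(s1, s2p)] - u2[(s1, s2)]) for s1 in S1) > tol:
                return False
    return True

S = ["Stop", "Go"]
u1 = {("Stop", "Stop"): 0, ("Stop", "Go"): -3, ("Go", "Stop"): 1, ("Go", "Go"): -4}
u2 = {(a, b): u1[(b, a)] for a in S for b in S}        # Chicken is symmetric
p = {("Stop", "Stop"): 1/3, ("Stop", "Go"): 1/3, ("Go", "Stop"): 1/3, ("Go", "Go"): 0}
print(is_correlated_eq(S, S, u1, u2, p))               # True
```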

Facts About Correlated Equilibrium
1. A CE always exists
   - why?
2. The set of CEs is convex
   - what about NEs?
   - the CEs are exactly the solutions to a set of linear inequalities
3. A CE can be computed efficiently (e.g., via linear programming), as sketched below
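Fact 3, sketched (assumptions: numpy/scipy available; Chicken payoffs as above): an LP that finds the CE maximizing the sum of payoffs.

```python
import numpy as np
from scipy.optimize import linprog

S = ["Stop", "Go"]
profiles = [(a, b) for a in S for b in S]
u1 = {("Stop", "Stop"): 0, ("Stop", "Go"): -3, ("Go", "Stop"): 1, ("Go", "Go"): -4}
u2 = {(a, b): u1[(b, a)] for a in S for b in S}

# One variable p(s) per pure profile; maximize total welfare (minimize its negation).
c = np.array([-(u1[s] + u2[s]) for s in profiles])

rows = []           # CE incentive constraints, one per (player, recommended s_i, deviation s_i')
for s1 in S:
    for s1p in S:
        if s1p != s1:
            rows.append([(u1[(s1p, s2)] - u1[(s1, s2)]) if a == s1 else 0.0
                         for (a, s2) in profiles])
for s2 in S:
    for s2p in S:
        if s2p != s2:
            rows.append([(u2[(s1, s2p)] - u2[(s1, s2)]) if b == s2 else 0.0
                         for (s1, b) in profiles])

res = linprog(c, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
              A_eq=np.ones((1, len(profiles))), b_eq=[1.0],
              bounds=[(0, 1)] * len(profiles))
print(dict(zip(profiles, np.round(res.x, 3))))
```

On Chicken this recovers the distribution from the second illustration: 1/3 on each profile except (Go, Go).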

Moreover…
When every player uses a no-regret algorithm to select strategies, the dynamics converge to a CE
- in any game!
But this requires a stronger definition of no-regret…

Types of No-Regret Algs
No external regret: do (nearly) as well as the best single strategy in hindsight
- what we've been talking about so far
- "I should have always taken the same route to work…"
No internal regret: the algorithm could not have gained (in hindsight) by consistently substituting one strategy with another
- each time strategy s_i was chosen, substitute it with s_i'
- "each time I bought a Microsoft stock, I should have bought the Google stock"
No internal regret implies no external regret
- why?

Reminder: Minimizing Regret
- There are n actions (experts) 1, 2, …, n.
- At each round t = 1, 2, …, T, the algorithm selects an action in {1, …, n} and then observes the gain g_{i,t} ∈ [0,1] of each action i ∈ {1, …, n}.
- Let g_i = Σ_t g_{i,t} and g_max = max_i g_i.
- No external regret: do (at least) "nearly as well" as g_max in hindsight. (A standard such algorithm is sketched below.)
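For concreteness, a minimal sketch (assumed, not from the lecture) of one standard no-external-regret algorithm, multiplicative weights ("Hedge"); with η ≈ √(ln n / T) its external regret is O(√(T log n)):

```python
import math
import random

class Hedge:
    """Multiplicative-weights algorithm for gains in [0, 1]."""
    def __init__(self, n, eta):
        self.weights = [1.0] * n
        self.eta = eta

    def distribution(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def select(self):
        return random.choices(range(len(self.weights)), self.distribution())[0]

    def update(self, gains):                 # gains[i] = g_{i,t}
        self.weights = [w * math.exp(self.eta * g)
                        for w, g in zip(self.weights, gains)]
```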

Internal Regret
Assume that the algorithm outputs the action sequence A = a_1 … a_T.
The action sequence A^{(b→d)}:
- change every a_t = b to a_t = d in A
- g^{(b→d)} is the gain of A^{(b→d)} (for the same gains g_{i,t})
Internal regret: max_{b,d} (g^{(b→d)} − g_alg)
- for a randomized algorithm that plays distribution p_t at round t, this equals max_{b,d} Σ_t (g_{d,t} − g_{b,t}) p_{b,t}
An algorithm has no internal regret if its (time-averaged) internal regret goes to 0 as T goes to infinity.
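A short sketch (assumed code) that computes the internal regret of a realized action sequence directly from this definition:

```python
def internal_regret(actions, gains):
    """actions[t] in {0, ..., n-1}; gains[t][i] = gain of action i at round t."""
    n = len(gains[0])
    alg_gain = sum(g[a] for a, g in zip(actions, gains))
    best = 0.0
    for b in range(n):
        for d in range(n):
            if b != d:
                # Gain of the modified sequence A^(b -> d).
                swapped = sum(g[d] if a == b else g[a]
                              for a, g in zip(actions, gains))
                best = max(best, swapped - alg_gain)
    return best
```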

Internal Regret and Dominated Strategies
Suppose that a player uses a no-internal-regret algorithm to select strategies
- in a repeated game against others.
What guarantees does the player have?
- beyond the no-regret guarantee

Dominated Strategies
Strategy s_i is dominated by a (mixed) strategy s_i' if for every s_{-i} we have u_i(s_i, s_{-i}) < u_i(s_i', s_{-i}).
Clearly, we would like to avoid choosing dominated strategies.

Internal Regret and Dominated Strategies
s_i is dominated by s_i'
- every time we played s_i, we would have done better with s_i'
Define internal regret
- swapping the pair of strategies (s_i → s_i')
No internal regret ⇒ no dominated strategies

  strategy sequence | gain | gain for (s_i → s_i') | internal regret (s_i → s_i')
  s_i               |  1   |  2                    |  2 − 1 = 1
  s_i               |  2   |  5                    |  5 − 2 = 3
  s_i               |  3   |  9                    |  9 − 3 = 6
  s_i               |  0   |  1                    |  1 − 0 = 1

Does a No-Internal-Regret Alg Exist?
Yes! In fact, there exist algorithms with a stronger guarantee: no swap regret.
- no swap regret: the algorithm cannot benefit in hindsight by changing action i to F(i), for any swap function F : {1,…,n} → {1,…,n}
We show a generic reduction that turns no-external-regret algorithms into a no-swap-regret algorithm.

External to Swap Regret
Our algorithm uses n no-external-regret algorithms Alg_1, …, Alg_n to achieve no swap regret:
- intuitively, each algorithm represents a strategy in {1, …, n}
- for algorithm Alg_i, and for any sequence of gain vectors: g_{Alg_i} ≥ g_max − R_i

External to Swap Regret
At time t:
- each Alg_i outputs a distribution q_i; together these form a matrix Q (row i of Q is q_i)
- our algorithm uses Q to decide on a distribution p over the strategies {1, …, n}
- the adversary decides on a gain vector g = (g_1, …, g_n)
- our algorithm returns to each Alg_i some gain vector

Combining the No-External-Regret Algs
Approach I:
- select an expert Alg_i with probability r_i
- let the "selected" expert decide the outcome
- the resulting strategy distribution is p = rQ, i.e., p_j = Σ_i r_i q_{i,j}
Approach II:
- directly decide on p
Our approach: make p = r
- find a p such that p = pQ (a stationary distribution of the stochastic matrix Q)

Distributing Gain
- The adversary selects gains g = (g_1, …, g_n).
- Return to Alg_i the gain vector p_i · g.
- Note: Σ_i p_i · g = g (since Σ_i p_i = 1).

External to Swap Regret
At time t:
- each Alg_i outputs a distribution q_i; together these form the matrix Q
- output a distribution p such that p = pQ, i.e., p_j = Σ_i p_i q_{i,j}
- observe the gains g = (g_1, …, g_n)
- return to Alg_i the gain vector p_i · g
(A code sketch of this reduction follows.)
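A compact sketch of the reduction (an illustration under assumptions: it reuses the Hedge class sketched earlier, and finds the fixed point p = pQ by power iteration, which converges because Hedge keeps all entries of Q positive):

```python
import numpy as np

class SwapRegretAlg:
    def __init__(self, n, eta):
        self.n = n
        self.copies = [Hedge(n, eta) for _ in range(n)]   # Alg_1 ... Alg_n

    def distribution(self):
        # Row i of the stochastic matrix Q is Alg_i's distribution q_i.
        Q = np.array([alg.distribution() for alg in self.copies])
        p = np.full(self.n, 1.0 / self.n)
        for _ in range(100):                              # power iteration toward p = pQ
            p = p @ Q
        self.p, self.Q = p / p.sum(), Q
        return self.p

    def update(self, g):
        # Distribute the observed gain vector: Alg_i receives p_i * g.
        g = np.asarray(g, dtype=float)
        for p_i, alg in zip(self.p, self.copies):
            alg.update(p_i * g)
```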

External to Swap Regret
Gain of Alg_i (from its own view) at round t: p_{i,t} ⟨q_{i,t}, g_t⟩
No-external-regret guarantee, for every fixed action j:
  g_{Alg_i} = Σ_t p_{i,t} ⟨q_{i,t}, g_t⟩ ≥ Σ_t p_{i,t} g_{j,t} − R_i
For any swap function F:
  g_Alg = Σ_t ⟨p_t, g_t⟩ = Σ_t ⟨p_t Q_t, g_t⟩ = Σ_t Σ_i p_{i,t} ⟨q_{i,t}, g_t⟩ = Σ_i g_{Alg_i}
        ≥ Σ_i (Σ_t p_{i,t} g_{F(i),t} − R_i) = g_{Alg,F} − Σ_i R_i

Swap Regret
Corollary: the swap regret of the combined algorithm is at most Σ_i R_i; with standard no-external-regret algorithms (R_i = O(√(T log n))) this gives swap regret O(n √(T log n)).
Can be improved to O(√(n T log n)).

Summary
- The minimax theorem is a useful tool for analyzing randomized algorithms
  - Yao's principle
- There exist no-swap-regret algorithms
- Next time: when all players use no-swap-regret algorithms to select strategies, the dynamics converge to a CE