Issues on the Border of Economics and Computation
Speaker: Dr. Michael Schapira
Topic: Dynamics in Games (Part III)
(Some slides from Prof. Yishay Mansour’s course at TAU)
Two Things
1. Ex1 to be published by Thu – submission deadline: , midnight – can submit in pairs – submit through Dr. Blumrosen’s mailbox
2. Debt from last class.
Reminder: Zero-Sum Games
A zero-sum game is a 2-player strategic game such that for each s ∈ S we have u_1(s) + u_2(s) = 0.
– What is good for me is bad for my opponent, and vice versa.
Example (Matching Pennies):
         Left      Right
Left    (-1,1)    (1,-1)
Right   (1,-1)    (-1,1)
Reminder: Minimax-Optimal Strategies
A (mixed) strategy s_1* is minimax optimal for player 1 if
min_{s_2 ∈ S_2} u_1(s_1*, s_2) ≥ min_{s_2 ∈ S_2} u_1(s_1, s_2) for all s_1 ∈ S_1.
Similarly for player 2.
Can be found via linear programming.
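As an aside (not from the slides): a minimal sketch of how such a minimax-optimal strategy can be computed with an off-the-shelf LP solver. The payoff matrix is the Matching Pennies game above; the variable names and the use of scipy are my own choices.

```python
# Sketch: minimax-optimal strategy for the row player via linear programming.
import numpy as np
from scipy.optimize import linprog

A = np.array([[-1.0,  1.0],   # row player's payoffs u_1(s_1, s_2)
              [ 1.0, -1.0]])  # (Matching Pennies)
m, n = A.shape

# Variables z = (x_1, ..., x_m, v); maximize the game value v (minimize -v).
c = np.zeros(m + 1)
c[-1] = -1.0

# For every column j:  v - sum_i x_i * A[i, j] <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)

# The x_i form a probability distribution.
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])

bounds = [(0, 1)] * m + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:m], res.x[-1]
print("minimax strategy:", x, "game value:", v)   # expect (0.5, 0.5), value 0
```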
Reminder: Minimax Theorem
Every 2-player zero-sum game has a unique value V.
A minimax-optimal strategy for R guarantees that R’s expected gain is at least V.
A minimax-optimal strategy for C guarantees that R’s expected gain is at most V.
Algorithmic Implications The minimax theorem is a useful tool in the analysis of randomized algorithms Let’s see why.
Find Bill
There are n boxes; exactly one box contains a dollar bill and the rest of the boxes are empty. A probe is defined as opening a box to see if it contains the dollar bill. The objective is to locate the box containing the dollar bill while minimizing the number of probes performed.
How well can a deterministic algorithm do? Can we do better via a randomized algorithm?
– i.e., an algorithm that is a probability distribution over deterministic algorithms
Randomized Find Alg
Randomized Find: select x ∈ {H, T} uniformly at random.
1. If x = H, probe boxes in order from 1 through n and stop when the bill is found.
2. Otherwise, probe boxes in order from n through 1 and stop when the bill is found.
The expected number of probes made by the algorithm is (n+1)/2:
– if the dollar bill is in the i-th box, then i probes are made with probability ½ and (n − i + 1) probes are made with probability ½.
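A small sketch (my own illustration, not part of the course material) of Randomized Find, together with a simulation that checks the (n+1)/2 expectation:

```python
# Sketch: the Randomized Find algorithm and a check of its expected #probes.
import random

def randomized_find(boxes):
    """boxes[i] is True iff the bill is in box i; returns the #probes used."""
    order = list(range(len(boxes)))
    if random.random() < 0.5:          # with prob 1/2, scan from box n down to 1
        order.reverse()
    for probes, i in enumerate(order, start=1):
        if boxes[i]:
            return probes

n = 101
bill = 30                              # arbitrary location of the bill
boxes = [i == bill for i in range(n)]
avg = sum(randomized_find(boxes) for _ in range(200_000)) / 200_000
print(avg, (n + 1) / 2)                # the two numbers should be close
```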
Randomized Find is Optimal
Lemma: A lower bound on the expected number of probes required by any randomized algorithm to solve the Find-Bill problem is (n + 1)/2.
Proof via the minimax theorem!
The Algorithm Game
The row player aims to choose malicious inputs; the column player aims to choose efficient algorithms.
The payoff for (I, ALG) is the running time of ALG on I.
[Matrix: rows Input 1, …, Input m; columns ALG 1, …, ALG n; entry T(ALG, I).]
The Algorithm Game
Pure strategies:
– a specific input for the row player
– a deterministic algorithm for the column player
Mixed strategies:
– a distribution over inputs for the row player
– a randomized algorithm for the column player
The Algorithm Game
If I’m the column player, what strategy (i.e., randomized algorithm) do I want to choose?
The Algorithm Game
What does the minimax theorem mean here?
Yao’s Principle
Let T(I, Alg) denote the time required for deterministic algorithm Alg to run on input I. Then
max_p min_{Alg} E[T(I_p, Alg)] = min_q max_I E[T(I, Alg_q)],
where p ranges over distributions on inputs and q over distributions on deterministic algorithms.
So, for any two probability distributions p and q:
min_{deterministic Alg} E[T(I_p, Alg)] ≤ max_I E[T(I, Alg_q)]
Using Yao’s Principle
A useful technique for proving lower bounds on the running times of randomized algorithms:
Step I: Design a probability distribution p over inputs for which every deterministic algorithm’s expected running time is at least some bound B.
Step II: Deduce that every randomized algorithm’s (expected) running time is at least B on its worst-case input.
Back to Find-Bill
Lemma: A lower bound on the expected number of probes required by any randomized algorithm to solve the Find-Bill problem is (n + 1)/2.
Proof:
– Consider the scenario in which the bill is located in one of the n boxes uniformly at random.
– Consider only deterministic algorithms that do not probe the same box twice.
– By symmetry we can assume that the probe order of a deterministic algorithm ALG is 1 through n.
– The expected number of probes for ALG is Σ_{i=1}^{n} i/n = (n+1)/2.
– Yao’s principle implies the lower bound.
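As a sanity check on the key step (again my own illustration, not part of the proof): against the uniform distribution over bill locations, any deterministic probe order makes (n+1)/2 probes in expectation.

```python
# Sketch: exact expected #probes of an arbitrary deterministic probe order
# when the bill's location is uniform over the n boxes.
import itertools

def expected_probes(order, n):
    # position[box] = the probe at which `box` is opened (1-based)
    position = {box: t + 1 for t, box in enumerate(order)}
    return sum(position[box] for box in range(n)) / n

n = 7
for order in itertools.islice(itertools.permutations(range(n)), 5):
    print(order, expected_probes(order, n))   # always (n + 1) / 2 = 4.0
```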
No Regret Algs: So far… In some games (e.g., potential games), best-/better-response dynamics are guaranteed to converge to a PNE. In 2-player zero-sum games no-regret dynamics converge to a NE. What about general games?
Chicken Game
         Stop      Go
Stop    (0,0)     (-3,1)
Go      (1,-3)    (-4,-4)
What are the pure NEs? What are the (mixed) NEs?
(In the mixed NE each player plays Stop and Go with probability ½ each, so each of the four outcomes occurs with probability ¼.)
Correlated Equilibrium: Illustration (Chicken game payoffs as above)
Suppose that there is a trusted random device that samples a pure strategy profile from a distribution P…
…and tells each player his component of the strategy profile.
If all players other than i are following the strategy suggested by the random device, then i does not have any incentive to deviate.
(Here P places probability ½ on each of (Stop, Go) and (Go, Stop), and 0 on the other two profiles.)
Correlated Equilibrium: Illustration (Chicken game payoffs as above)
Now consider the distribution P that places probability 1/3 on each of (Stop, Stop), (Stop, Go) and (Go, Stop), and 0 on (Go, Go).
Again: if all players other than i follow the strategy suggested by the random device, then i does not have any incentive to deviate.
Correlated Equilibrium
Consider a game:
– S_i is the set of (pure) strategies for player i; S = S_1 × S_2 × … × S_n
– s = (s_1, s_2, …, s_n) ∈ S is a vector of strategies
– u_i : S → R is the payoff function for player i
Notation: given a strategy vector s, let s_{-i} = (s_1, …, s_{i-1}, s_{i+1}, …, s_n)
– the vector s with the i-th element omitted
Correlated Equilibrium
A correlated equilibrium is a probability distribution p over (pure) strategy profiles in S such that for every i, s_i, s_i':
Σ_{s_{-i}} p(s_i, s_{-i}) u_i(s_i, s_{-i}) ≥ Σ_{s_{-i}} p(s_i, s_{-i}) u_i(s_i', s_{-i})
Facts About Correlated Equilibrium
1. A CE always exists
– why?
2. The set of CEs is convex
– what about NEs?
– the CEs are exactly the solutions to a set of linear inequalities (the constraints above, plus p being a probability distribution)
3. A CE can be computed efficiently (e.g., via linear programming)
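To illustrate fact 3 (a sketch under my own modeling choices, not the course’s code): a CE of the Chicken game above can be found with a linear program, here maximizing the players’ total payoff subject to the CE incentive constraints.

```python
# Sketch: computing a correlated equilibrium of Chicken via linear programming.
import numpy as np
from scipy.optimize import linprog

# u[i][a][b] = payoff of player i when the row player plays a and the column
# player plays b; actions: 0 = Stop, 1 = Go (payoffs from the Chicken slide).
u = np.array([[[0, -3], [ 1, -4]],     # row player
              [[0,  1], [-3, -4]]])    # column player

idx = lambda a, b: 2 * a + b           # flatten p[a, b] into a length-4 vector
A_ub, b_ub = [], []

# Row player: for every recommended a and deviation a2, and over all b:
#   sum_b p[a, b] * (u0[a, b] - u0[a2, b]) >= 0
for a in range(2):
    for a2 in range(2):
        row = np.zeros(4)
        for b in range(2):
            row[idx(a, b)] = -(u[0, a, b] - u[0, a2, b])
        A_ub.append(row); b_ub.append(0.0)

# Column player: symmetric constraints over recommended b and deviation b2.
for b in range(2):
    for b2 in range(2):
        row = np.zeros(4)
        for a in range(2):
            row[idx(a, b)] = -(u[1, a, b] - u[1, a, b2])
        A_ub.append(row); b_ub.append(0.0)

c = -(u[0] + u[1]).flatten()                      # maximize total payoff
A_eq, b_eq = np.ones((1, 4)), np.array([1.0])
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * 4)
# This should recover the (1/3, 1/3, 1/3, 0) distribution from the earlier slide.
print(res.x.reshape(2, 2))
```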
Moreover…
When every player uses a no-regret algorithm to select strategies, the dynamics converge to a CE
– in any game!
But this requires a stronger definition of no-regret…
Types of No-Regret Algs
No external regret: do (nearly) as well as the best single strategy in hindsight
– what we’ve been talking about so far
– “I should have always taken the same route to work…”
No internal regret: the algorithm could not have gained (in hindsight) by consistently substituting a single strategy with another
– each time strategy s_i was chosen, substitute it with s_i'
– “each time I bought Microsoft stock I should have bought Google stock instead”
No internal regret implies no external regret
– why?
Reminder: Minimizing Regret
There are n actions (experts) 1, 2, …, n.
At each round t = 1, 2, …, T the algorithm selects an action in {1, …, n} and then observes the gain g_{i,t} ∈ [0,1] of each action i ∈ {1, …, n}.
Let g_i = Σ_t g_{i,t}. Let g_max = max_i g_i.
No external regret: do (at least) “nearly as well” as g_max in hindsight.
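For concreteness, here is a minimal sketch of the multiplicative-weights algorithm, a standard way to obtain no external regret in this setting; the step size eta and all names are my own choices.

```python
# Sketch: multiplicative weights, a no-external-regret algorithm.
import numpy as np

def multiplicative_weights(gains, eta=0.1):
    """gains: T x n array with gains[t][i] in [0, 1]. Returns the distributions
    played and the algorithm's expected total gain."""
    T, n = gains.shape
    w = np.ones(n)
    total, dists = 0.0, []
    for t in range(T):
        p = w / w.sum()                 # distribution over the n actions
        dists.append(p)
        total += p @ gains[t]           # expected gain this round
        w *= (1 + eta) ** gains[t]      # reward actions that did well
    return np.array(dists), total

rng = np.random.default_rng(0)
gains = rng.random((1000, 5))
_, alg_gain = multiplicative_weights(gains)
print(alg_gain, gains.sum(axis=0).max())   # alg gain within the regret bound of g_max
```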
Internal Regret
Assume that the algorithm outputs the action sequence A = a_1 … a_T.
The action sequence A(b → d):
– change every a_t = b to a_t = d in A
– g(b → d) is the gain of A(b → d) (for the same gains g_{i,t})
Internal regret: max_{b,d} g(b → d) − g_alg
– for a randomized algorithm playing distributions p_t this equals max_{b,d} Σ_t (g_{d,t} − g_{b,t}) p_{b,t}
An algorithm has no internal regret if its internal regret goes to 0 as T goes to infinity.
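A small helper (my own illustration) that computes the internal regret of a sequence of distributions p_t against gain vectors g_t, exactly per the max_{b,d} Σ_t (g_{d,t} − g_{b,t}) p_{b,t} formula:

```python
# Sketch: computing the internal regret of a sequence of played distributions.
import numpy as np

def internal_regret(ps, gains):
    """ps, gains: T x n arrays of distributions played and observed gains."""
    n = ps.shape[1]
    best = 0.0
    for b in range(n):
        for d in range(n):
            if b != d:
                best = max(best, np.sum((gains[:, d] - gains[:, b]) * ps[:, b]))
    return best

rng = np.random.default_rng(1)
ps = rng.dirichlet(np.ones(3), size=500)
print(internal_regret(ps, rng.random((500, 3))))
```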
Internal Regret and Dominated Strategies Suppose that a player uses a no-internal- regret algorithm to select strategies –in a repeated game against others What guarantees does the player have? –beyond the no-regret guarantee
Dominated Strategies
Strategy s_i is dominated by a (mixed) strategy s_i' if for every s_{-i} we have that u_i(s_i, s_{-i}) < u_i(s_i', s_{-i}).
Clearly, we would like to avoid choosing dominated strategies.
Internal Regret and Dominated Strategies
s_i is dominated by s_i'
– every time we played s_i we would have done better with s_i'
Define the internal regret of swapping the pair of strategies (s_i → s_i'):

strategy sequence   gains   gains for (s_i → s_i')   internal regret (s_i → s_i')
s_i                 1       2                         2 − 1 = 1
s_i                 2       5                         5 − 2 = 3
s_i                 3       9                         9 − 3 = 6
s_i                 0       1                         1 − 0 = 1

No internal regret ⇒ no dominated strategies
Does a No-Internal-Regret Alg Exist?
Yes! In fact, there exist algorithms with a stronger guarantee: no swap regret.
– no swap regret: the algorithm cannot benefit in hindsight from changing action i to F(i), for any function F: {1,…,n} -> {1,…,n}
We show a generic reduction that turns no-external-regret algorithms into a no-swap-regret algorithm.
External to Swap Regret
Our algorithm utilizes no-external-regret algorithms to achieve no swap regret:
– n no-external-regret algorithms Alg_1, …, Alg_n
– intuitively, each algorithm represents a strategy in {1, …, n}
– for algorithm Alg_i, and for any sequence of gain vectors: g_{Alg_i} ≥ g_max − R_i
External to Swap Regret
At time t:
– each Alg_i outputs a distribution q_i; together these induce a matrix Q (row i of Q is q_i)
– our algorithm uses Q to decide on a distribution p over the strategies {1, …, n}
– the adversary decides on a gains vector g = (g_1, …, g_n)
– our algorithm returns to each Alg_i some gains vector
Combining the No-External-Regret Algs
Approach I:
– select an expert Alg_i with probability r_i
– let the “selected” expert decide the outcome
– strategy distribution p = Qr
Approach II:
– directly decide on p. Our approach: make p = r
– find a p such that p = Qp
Distributing Gain
The adversary selects gains g = (g_1, …, g_n).
Return to Alg_i the gain vector p_i · g
– note: Σ_i p_i · g = g (since Σ_i p_i = 1)
External to Swap Regret
At time t:
– each Alg_i outputs a distribution q_i; these induce a matrix Q
– output a distribution p such that p = Qp, i.e., p_j = Σ_i p_i q_{i,j}
– observe gains g = (g_1, …, g_n)
– return to Alg_i the gain vector p_i · g
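Putting the pieces together, here is a sketch (my own, under the assumption that each Alg_i is a multiplicative-weights algorithm) of the external-to-swap-regret reduction: compute a fixed point p with p_j = Σ_i p_i q_{i,j}, play p, and feed each Alg_i the scaled gains p_i · g.

```python
# Sketch: the external-to-swap-regret reduction with n multiplicative-weights copies.
import numpy as np

class MW:
    def __init__(self, n, eta=0.1):
        self.w, self.eta = np.ones(n), eta
    def dist(self):
        return self.w / self.w.sum()
    def update(self, gains):
        self.w *= (1 + self.eta) ** gains

def swap_regret_play(gain_vectors):
    """gain_vectors: T x n array; returns the distributions p_t played."""
    T, n = gain_vectors.shape
    algs = [MW(n) for _ in range(n)]
    played = []
    for t in range(T):
        Q = np.array([alg.dist() for alg in algs])   # row i is q_i
        # Fixed point p_j = sum_i p_i * q_{i,j}: the eigenvector of Q^T for
        # eigenvalue 1 (a more careful implementation would solve this directly).
        vals, vecs = np.linalg.eig(Q.T)
        p = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
        p = np.abs(p) / np.abs(p).sum()
        played.append(p)
        g = gain_vectors[t]
        for i, alg in enumerate(algs):
            alg.update(p[i] * g)                     # Alg_i sees the scaled gains p_i * g
    return np.array(played)

rng = np.random.default_rng(2)
ps = swap_regret_play(rng.random((500, 4)))
print(ps[-1])        # the last distribution played
```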
External to Swap Regret
Gain of Alg_i (from its own point of view) at round t: ⟨q_{i,t}, p_{i,t} g_t⟩ = p_{i,t} ⟨q_{i,t}, g_t⟩
No-external-regret guarantee (for every fixed action j):
– g_{Alg_i} = Σ_t p_{i,t} ⟨q_{i,t}, g_t⟩ ≥ Σ_t p_{i,t} g_{j,t} − R_i
For any swap function F:
– g_Alg = Σ_t ⟨p_t, g_t⟩ = Σ_t Σ_i p_{i,t} ⟨q_{i,t}, g_t⟩ = Σ_i g_{Alg_i}
  ≥ Σ_i (Σ_t p_{i,t} g_{F(i),t} − R_i) = g_{Alg,F} − Σ_i R_i
(the second equality uses p_t = Q_t p_t)
Swap Regret
Corollary: the combined algorithm’s swap regret is at most Σ_i R_i; with standard no-external-regret algorithms (e.g., multiplicative weights, R_i = O(√(T log n))) this is O(n √(T log n)).
Can be improved to O(√(n T log n)).
Summary The Minimax Theorem is a useful tool for analyzing randomized algorithms –Yao’s Principle There exist no-swap-regret algorithms Next time: When all players use no-swap- regret algorithms to select strategies the dynamics converge to a CE