Speaker: Dr. Michael Schapira Topic: Dynamics in Games (Part II)


1 Issues on the border of economics and computation
Speaker: Dr. Michael Schapira Topic: Dynamics in Games (Part II) (Some slides from Prof. Avrim Blum’s course at CMU and Prof. Yishay Mansour’s course at TAU)

2 Recap: Regret Minimization

3 Example 1: Weather Forecast
Our web site forecasts Sunny or Rainy with no meteorological understanding, using other web sites’ forecasts (CNN, BBC, weather.com). OUR Goal: be nearly as accurate as the most accurate forecast among them.

4 Example 2: Route Selection
Goal: Fastest route Challenge: Partial Information

5 Financial Markets Model: Select a portfolio each day.
Gain: The changes in stock values. Performance goal: Compare well with the best “static” policy.

6 Reminder: Minimizing Regret
There are n strategies (experts) 1, 2, …, n. At each round t = 1, 2, …, T: the algorithm selects a strategy in {1,…,n} and then observes the loss l_{i,t} ∈ [0,1] of each strategy i ∈ {1,…,n}. Let l_i = Σ_t l_{i,t} and let l_min = min_i l_i. Goal: do “nearly as well” as l_min in hindsight. Have no regret!

7 Randomized Weighted Majority
Initially: w_{i,1} = 1 and p_{i,1} = 1/n for each strategy i. At each time t = 1,…,T: select strategy i with probability p_{i,t}, observe the loss vector l_t, and update the weights: w_{i,t+1} := w_{i,t}(1 − ε·l_{i,t}) and p_{i,t+1} := w_{i,t+1}/W_{t+1}, where W_{t+1} = Σ_i w_{i,t+1} and ε is a parameter set in the analysis below.
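
A minimal Python sketch of the update rule above (the class and variable names are mine, not from the slides; it assumes full-information losses in [0,1] and a fixed ε):

```python
import random

class RWM:
    """Randomized Weighted Majority, as on this slide (a sketch)."""
    def __init__(self, n, eps):
        self.eps = eps                 # the parameter epsilon
        self.w = [1.0] * n             # w_{i,1} = 1 for every strategy i

    def select(self):
        # Sample strategy i with probability p_{i,t} = w_{i,t} / W_t.
        total = sum(self.w)
        r = random.uniform(0.0, total)
        acc = 0.0
        for i, wi in enumerate(self.w):
            acc += wi
            if r <= acc:
                return i
        return len(self.w) - 1

    def update(self, losses):
        # Multiplicative update: w_{i,t+1} = w_{i,t} * (1 - eps * l_{i,t}).
        for i, li in enumerate(losses):
            self.w[i] *= 1.0 - self.eps * li
```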

8 Formal Guarantee for RWM
Theorem: If L = expected total loss of the algorithm by time T+1 and OPT = accumulated loss of the best strategy by time T, then L ≤ OPT + 2·√(T·log(n)), an “additive regret” bound. An algorithm is called a “no-regret algorithm” if L ≤ OPT + f(T) and f(T)/T goes to 0 as T goes to infinity.
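
The bound is easy to sanity-check empirically; a sketch reusing the RWM class above (the setup of 5 strategies with i.i.d. uniform losses is hypothetical, and the guarantee is on the expected loss, so a single run can fall slightly outside it):

```python
import math
import random

n, T = 5, 10000
alg = RWM(n, eps=math.sqrt(math.log(n) / T))
alg_loss = 0.0
totals = [0.0] * n                      # cumulative loss of each strategy

for _ in range(T):
    i = alg.select()
    losses = [random.random() for _ in range(n)]
    alg_loss += losses[i]
    totals = [c + l for c, l in zip(totals, losses)]
    alg.update(losses)

opt = min(totals)                       # OPT: best strategy in hindsight
print(alg_loss <= opt + 2 * math.sqrt(T * math.log(n)))   # expect True
```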

9 Analysis Let F_t be the expected loss of RWM at time t: F_t = Σ_i p_{i,t}·l_{i,t}
Observe that W_final = n·(1 − εF_1)(1 − εF_2)···(1 − εF_T), so ln(W_final) = ln(n) + Σ_t ln(1 − εF_t) ≤ ln(n) − ε·Σ_t F_t (using ln(1−x) ≤ −x) = ln(n) − εL (using Σ_t F_t = L = E[total loss of RWM by time T+1])

10 Analysis (Cont.) Let i be the best strategy in hindsight
Observe that w_{i,final} = 1·(1 − ε·l_{i,1})(1 − ε·l_{i,2})···(1 − ε·l_{i,T}), so ln(w_{i,final}) = ln(1 − ε·l_{i,1}) + ln(1 − ε·l_{i,2}) + … + ln(1 − ε·l_{i,T}) ≥ −Σ_t ε·l_{i,t} − Σ_t ε²·l_{i,t}² (using −z − z² ≤ ln(1−z) for 0 ≤ z ≤ ½) ≥ −ε(1+ε)·Σ_t l_{i,t} (using l_{i,t} ∈ [0,1], so l_{i,t}² ≤ l_{i,t}) = −ε(1+ε)·l_i = −ε(1+ε)·l_min

11 Analysis (Cont.) w_{i,final} ≤ W_final, so ln(w_{i,final}) ≤ ln(W_final)
Combining the two bounds: −ε(1+ε)·l_min ≤ ln(n) − εL, i.e., L ≤ (1+ε)·l_min + (1/ε)·ln(n). Since l_min ≤ T, setting ε = (log(n)/T)^{1/2} makes each of the two extra terms at most √(T·log(n)), giving L ≤ l_min + 2·√(T·log(n))

12 Equivalently There are n strategies (experts) 1, 2, …, n
At each round t = 1, 2, …, T: the algorithm selects a strategy in {1,…,n} and then observes the gain g_{i,t} ∈ [0,1] of each strategy i ∈ {1,…,n}. Let g_i = Σ_t g_{i,t} and g_max = max_i g_i. Goal: do “nearly as well” as g_max in hindsight. Let G be the algorithm’s expected total gain by time T+1. RWM (run with the losses l_{i,t} = 1 − g_{i,t}, as below) guarantees that G ≥ g_max − 2·√(T·log(n))
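
The reduction in the last line is literally one step in code (a sketch reusing the RWM class above; the wrapper name is mine):

```python
# Feed RWM the losses l_{i,t} = 1 - g_{i,t} when only gains are observed.
def update_with_gains(alg, gains):
    alg.update([1.0 - g for g in gains])
```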

13 So, why is this in an algorithmic game theory course?

14 Rock-Paper-Scissors
          Rock     Paper    Scissors
Rock      (0,0)    (-1,1)   (1,-1)
Paper     (1,-1)   (0,0)    (-1,1)
Scissors  (-1,1)   (1,-1)   (0,0)

15 Rock-Paper-Scissors Say that you are the row player in the matrix above, and you play the game multiple times. How should you play?! What does “playing well” even mean? It is highly opponent-dependent. One (weak?) option: do “nearly as well” as the best pure strategy in hindsight!

16 Rock-Paper-Scissors Use a no-regret algorithm to select a strategy at each time step. Why does this make sense? See the sketch below.
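
One way to see it: below, the row player runs the RWM class from earlier against a hypothetical opponent who overplays Rock 60% of the time, and the average gain approaches that of the best pure response in hindsight (Paper):

```python
import math
import random

# Row player's gains: GAIN[my_move][their_move], 0=Rock, 1=Paper, 2=Scissors.
GAIN = [[ 0, -1,  1],
        [ 1,  0, -1],
        [-1,  1,  0]]

T = 20000
alg = RWM(3, eps=math.sqrt(math.log(3) / T))
total = 0
for _ in range(T):
    me = alg.select()
    them = random.choices([0, 1, 2], weights=[0.6, 0.2, 0.2])[0]
    total += GAIN[me][them]
    # Rescale gains in [-1,1] to losses in [0,1] before the RWM update.
    alg.update([(1 - GAIN[i][them]) / 2 for i in range(3)])

print(total / T)   # approaches 0.4, Paper's expected gain vs. this opponent
```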

17 Zero-Sum Games

18 Reminder: Mixed strategies
Definition: a “mixed strategy” is a probability distribution over actions. If {a_1, a_2, …, a_m} are the pure strategies of A, then {p_1, …, p_m} is a mixed strategy for A if (1) Σ_i p_i = 1 and (2) p_i ≥ 0 for all i. For example, in a Heads/Tails game, distributions such as (1/2, 1/2), (9/10, 1/10), or (1/3, 2/3) are mixed strategies.
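
The two conditions as a tiny check (a sketch; the function name and tolerance are my own):

```python
def is_mixed_strategy(p, tol=1e-9):
    # (1) probabilities sum to 1, and (2) every p_i is non-negative
    return abs(sum(p) - 1.0) <= tol and all(pi >= -tol for pi in p)

assert is_mixed_strategy([0.5, 0.5])        # 50/50 over Heads/Tails
assert not is_mixed_strategy([0.9, 0.2])    # sums to 1.1
```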

19 Reminder: Mixed Nash Eq.
Main idea: given a fixed behavior of the others, I will not change my strategy. Definition: (S_A, S_B) are in Nash equilibrium if each strategy is a best response to the other. For example, in the Even/Odd (matching pennies) game, both players mixing (1/2, 1/2) is a Nash equilibrium.

20 Zero-Sum Games A zero-sum game is a 2-player strategic game such that for each s ∈ S, we have u_1(s) + u_2(s) = 0. What is good for me is bad for my opponent, and vice versa. Note: any game where the sum is a constant c can be transformed into a zero-sum game with the same set of equilibria, as below: u'_1(a) = u_1(a), u'_2(a) = u_2(a) − c
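
The note’s transformation in code (a sketch; representing a game as a dictionary from strategy profiles to payoff pairs is my own choice):

```python
# Subtract the constant c from player 2's payoffs; player 1 is untouched.
def to_zero_sum(game, c):
    return {s: (u1, u2 - c) for s, (u1, u2) in game.items()}

# E.g., a fragment of a constant-sum game with c = 10:
g = {("a", "x"): (3, 7), ("a", "y"): (6, 4)}
print(to_zero_sum(g, 10))   # {('a', 'x'): (3, -3), ('a', 'y'): (6, -6)}
```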

21 2-Player Zero-Sum Games
Examples: Rock-Paper-Scissors (matrix above) and the penalty-shot game:
          Left     Right
Left      (-1,1)   (1,-1)
Right     (1,-1)   (-1,1)

22 How to Play Zero-Sum Games?
Assume that only pure strategies are allowed. Be paranoid: try to minimize your loss by assuming the worst! In the example below, player 1 takes the minimum over each row’s values (T: −6, M: −1, B: −6) and then maximizes over rows, choosing M (value −1):
     L       M       R
T    8,-8    3,-3    -6,6
M    2,-2    -1,1    …
B    …       …       4,-4
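
The same paranoid computation in code (a sketch; the entries 5, −6, and 0 below fill the matrix’s missing cells with hypothetical values consistent with the stated row minima):

```python
def pure_maximin(rows):
    # rows: row name -> player 1's payoffs across the columns L, M, R.
    worst = {r: min(vals) for r, vals in rows.items()}  # minimum per row
    best = max(worst, key=worst.get)                    # maximize the minimum
    return best, worst[best]

print(pure_maximin({"T": [8, 3, -6],
                    "M": [2, -1, 5],     # third entry hypothetical
                    "B": [-6, 0, 4]}))   # first two entries hypothetical
# -> ('M', -1), matching the slide
```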

23 Minimax-Optimal Strategies
A (mixed) strategy s_1* is minimax optimal for player 1 if min_{s_2 ∈ S_2} u_1(s_1*, s_2) ≥ min_{s_2 ∈ S_2} u_1(s_1, s_2) for all s_1 ∈ S_1. Similarly for player 2. Such strategies can be found via linear programming, sketched below.
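
A sketch of that linear program using scipy.optimize.linprog (the formulation and names are mine; A holds the row player’s payoffs, one row per row-player strategy):

```python
import numpy as np
from scipy.optimize import linprog

def minimax_strategy(A):
    """Maximize v s.t. x^T A_j >= v for every column j, x a distribution."""
    A = np.asarray(A, dtype=float)
    m, k = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                   # linprog minimizes, so use -v
    A_ub = np.hstack([-A.T, np.ones((k, 1))])      # v - x^T A_j <= 0 for each j
    b_ub = np.zeros(k)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                              # probabilities sum to 1
    bounds = [(0, None)] * m + [(None, None)]      # x >= 0, v unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:m], res.x[-1]                    # (mixed strategy, value V)
```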

24 Minimax-Optimal Strategies
E.g., penalty shot:
          Left     Right
Left      (-1,1)   (1,-1)
Right     (1,-1)   (-1,1)
Minimax optimal for both players is 50/50, giving an expected gain of 0. Any other strategy is worse.

25 Minimax-Optimal Strategies
E.g., penalty shot with a goalie who’s weaker on the left:
          Left     Right
Left      (0,0)    (1,-1)
Right     (1,-1)   (-1,1)
Minimax optimal for both players is (2/3, 1/3), giving an expected gain of 1/3. Any other strategy is worse.
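
Both penalty-shot slides check out against the LP sketch above (payoffs from the kicker’s point of view, kicker as the row player):

```python
even = [[-1, 1], [1, -1]]        # symmetric goalie
weak = [[ 0, 1], [1, -1]]        # goalie weaker on the left
print(minimax_strategy(even))    # ~([0.5, 0.5], 0.0)
print(minimax_strategy(weak))    # ~([0.667, 0.333], 0.333)
```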

26 Minimax Theorem (von Neumann 1928)
Every 2-player zero-sum game has a unique value V. A minimax optimal strategy for the row player R guarantees R’s expected gain is at least V. A minimax optimal strategy for the column player C guarantees R’s expected gain is at most V.

27 Proof Via Regret (sketch)
Suppose for contradiction that this is false, i.e., some game G has V_C > V_R: if the column player commits first, there exists a row that gains at least V_C, but if the row player has to commit first, the column player can hold him down to only V_R. Scale the matrix so that payoffs are in [0,1], and say that V_R = (1−ε)·V_C.

28 Proof (Cont.) Consider the scenario where both players repeatedly play the game G and each uses a no-regret algorithm. Let s_{i,t} be the (pure) strategy of player i at time t and let q_i = (1/T)·Σ_t s_{i,t} (each pure strategy viewed as a probability vector); q_i is called i’s empirical distribution. Observe that player 1’s average gain when playing a pure strategy s_1 against the sequence {s_{2,t}} is exactly u_1(s_1, q_2).

29 Proof (Cont.) Player 1 is using a no-regret algorithm, so his average gain is at least V_C (as T goes to infinity). Similarly, player 2 is using a no-regret algorithm, so player 1’s average gain is at most V_R (as T goes to infinity). A contradiction!

30 Convergence to Nash Suppose that each of the players in a 2-player zero-sum game is using a no-regret algorithm to select strategies, and let q_i be the empirical distribution of each player i ∈ {1,2}. Then (q_1, q_2) converges to a Nash equilibrium as T goes to infinity! A simulation sketch follows.
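
A self-play sketch of this convergence on the penalty-shot game, reusing the RWM class from earlier (rescaling payoffs in [−1,1] to losses in [0,1] is my own choice):

```python
import math

T = 50000
eps = math.sqrt(math.log(2) / T)
kicker, goalie = RWM(2, eps), RWM(2, eps)
counts1, counts2 = [0, 0], [0, 0]
A = [[-1, 1], [1, -1]]                   # kicker's payoffs; goalie gets -A

for _ in range(T):
    i, j = kicker.select(), goalie.select()
    counts1[i] += 1
    counts2[j] += 1
    # Each player observes the loss of every strategy vs. the opponent's move.
    kicker.update([(1 - A[r][j]) / 2 for r in range(2)])
    goalie.update([(1 + A[i][c]) / 2 for c in range(2)])

print([c / T for c in counts1])          # empirical q1, near [0.5, 0.5]
print([c / T for c in counts2])          # empirical q2, near [0.5, 0.5]
```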

31 Rock-Paper-Scissors Use a no-regret algorithm
It adjusts to the opponent’s play. There is no need to know the entire game in advance. The payoff can even exceed the game’s value V. And if both players do this, the outcome converges to a Nash equilibrium.

32 Summary No-regret algorithms help deal with uncertainty in repeated decision making. Implications for game theory: when both players use no-regret algorithms in a 2-player zero-sum game, convergence to a Nash equilibrium is guaranteed.

