Computing Nash Equilibrium


1 Computing Nash Equilibrium
Presenter: Yishay Mansour

2 Outline
Problem definition
Notation
Today: zero-sum games
Next week: general-sum games, multiple players

3 Model
Multiple players: $N = \{1, \ldots, n\}$
Strategy set: player i has m actions, $S_i = \{s_{i1}, \ldots, s_{im}\}$; the $S_i$ are the pure actions of player i, and $S = \times_i S_i$
Payoff functions: player i has $u_i : S \to \mathbb{R}$

4 Strategies
Pure strategies: actions
Mixed strategy: player i plays $p_i$, a distribution over $S_i$
Game: $P = \times_i p_i$, a product distribution
Modified distribution: $P_{-i}$ is the distribution P over all players except player i; $(q, P_{-i})$ means player i plays q and every other player j plays $p_j$

5 Notations
Average payoff, player i: $u_i(P) = E_{s \sim P}[u_i(s)] = \sum_{s} P(s)\, u_i(s)$, where $P(s) = \prod_i p_i(s_i)$
Nash equilibrium: $P^*$ is a Nash equilibrium if for every player i and any distribution $q_i$, $u_i(q_i, P^*_{-i}) \le u_i(P^*)$
Best response: each $p^*_i$ is a best response to $P^*_{-i}$

6 Notations
Alternative payoff: $x_{ij}(P) = u_i(s_{ij}, P_{-i}) = E_{s \sim P}[u_i(s) \mid s_i = s_{ij}]$
Difference in payoff: $z_{ij}(P) = x_{ij}(P) - u_i(P)$
Improvement in payoff: $g_{ij}(P) = \max\{z_{ij}(P), 0\}$
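The three quantities above can be computed directly for one player. The following is a minimal sketch that assumes a two-player game in which the player's payoffs form a matrix `A` (rows are its own actions, columns the opponent's). The function name, the matching-pennies matrix, and the chosen strategies are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction

def payoffs_and_gains(A, p, q):
    """Return (u, x, z, g) for the row player of a two-player game.

    A[j][k] is the row player's payoff when it plays action j and the
    opponent plays action k; p and q are the two mixed strategies.
    """
    # x[j]: alternative payoff of pure action j against the opponent's mix q.
    x = [sum(A[j][k] * q[k] for k in range(len(q))) for j in range(len(p))]
    # u: average payoff under the product distribution (p, q).
    u = sum(p[j] * x[j] for j in range(len(p)))
    # z[j]: difference in payoff; g[j]: improvement in payoff.
    z = [xj - u for xj in x]
    g = [max(zj, 0) for zj in z]
    return u, x, z, g

# Hypothetical example: matching pennies, payoffs of the row player.
A = [[1, -1], [-1, 1]]
half = Fraction(1, 2)

# At the uniform profile no pure action beats the average payoff.
u_eq, x_eq, z_eq, g_eq = payoffs_and_gains(A, [half, half], [half, half])

# At a pure, non-equilibrium profile the second action offers a gain.
u_dev, x_dev, z_dev, g_dev = payoffs_and_gains(
    A, [Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]
)
```

Exact fractions are used so that the zero gains at equilibrium are not blurred by floating-point rounding.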

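7 Fixed point theorems
Intermediate Value Theorem
Domain $[a,b]$, function f continuous, $f(a) f(b) < 0$
Then there exists z such that $f(z) = 0$
Proof: $M^+ = \{x \mid f(x) \ge 0\}$ and $M^- = \{x \mid f(x) \le 0\}$ are nonempty closed sets that cover the connected interval $[a,b]$, so they intersect, and any point z in the intersection satisfies $f(z) = 0$.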
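The theorem is an existence statement, but in one dimension the sign condition also gives a simple way to locate such a z. The sketch below uses bisection, which is a different argument from the closed-set proof on the slide. The function name, tolerance, and test function are assumptions chosen for illustration.

```python
def bisect_root(f, a, b, tol=1e-12):
    """Approximate a zero of a continuous f on [a, b], given f(a) * f(b) < 0."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        mid = (a + b) / 2
        fm = f(mid)
        if fm == 0:
            return mid
        # Keep the half-interval whose endpoints still have opposite signs.
        if fa * fm < 0:
            b, fb = mid, fm
        else:
            a, fa = mid, fm
    return (a + b) / 2

# Hypothetical example: f(x) = x^2 - 2 changes sign on [0, 2].
root = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
```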

8 Brouwer's Fixed Point Theorem
$f : S \to S$ continuous, S compact and convex
Then there exists z in S such that $z = f(z)$
For $S = [0,1]$, this follows from the previous theorem

9 Kakutani's Fixed Point Theorem
$L : S \to S$ is a correspondence (set-valued map)
$L(x)$ is a nonempty convex set
L is (upper) semi-continuous
S is compact and convex
Then there exists z such that $z \in L(z)$

10 Nash Equilibrium I
Best response correspondence: $L(P) = \arg\max_{Q} \{u_i(q_i, P_{-i})\}$, taken for every player i
L is a correspondence, continuous
Nash is a fixed point of L: $P^* \in L(P^*)$
Existence follows from Kakutani's fixed point theorem

11 Nash Equilibrium II
Fixed point: K(P) has mn parameters
$K_{ij}(P) = \dfrac{p_{ij} + g_{ij}(P)}{1 + \sum_{k} g_{ik}(P)}$
Nash is a fixed point of K: $P^* = K(P^*)$
Original proof of Nash: K is a continuous function on a compact space, so Brouwer's fixed point theorem applies
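To see the map K at work, the sketch below applies it once to each player of a two-player game. It assumes each player's payoffs are given as a matrix indexed by (own action, opponent action); the helper names and the matching-pennies matrices are hypothetical.

```python
from fractions import Fraction

def nash_map(payoff, own, other):
    """Apply K to one player: payoff[j][k] is own action j vs other action k."""
    x = [sum(payoff[j][k] * other[k] for k in range(len(other)))
         for j in range(len(own))]
    u = sum(own[j] * x[j] for j in range(len(own)))
    g = [max(xj - u, 0) for xj in x]          # improvement of each pure action
    total = 1 + sum(g)                        # normaliser keeps a distribution
    return [(own[j] + g[j]) / total for j in range(len(own))]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical example: matching pennies, A for player 1 and B = -A for player 2.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
half = Fraction(1, 2)

# The uniform profile is left unchanged, so it is a fixed point of K.
p_eq = nash_map(A, [half, half], [half, half])
q_eq = nash_map(transpose(B), [half, half], [half, half])

# A pure profile is not: player 2 gains by switching, so K moves its strategy.
p_pure, q_pure = [Fraction(1), Fraction(0)], [Fraction(1), Fraction(0)]
q_moved = nash_map(transpose(B), q_pure, p_pure)
```

Applying K repeatedly is not guaranteed to converge to an equilibrium; the map is used here only to check the fixed-point property.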

12 Nash Equilibrium III
Non-linear complementarity problem (NCP)
Recall $z_{ij}(P)$
For every player i and action $s_{ij}$: $z_{ij}(P) \cdot p_{ij} = 0$, so $z_i(P)$ is orthogonal to $p_i$
Nash: $z(P^*) \le 0$, that is, $z_{ij}(P^*) \le 0$ for all i, j

13 Nash Equilibrium IV
Stationary point problem
Recall: x = alternative payoff
Nash: $P^*$ such that for every P, $(P - P^*) \cdot x(P^*) \le 0$, that is, $\sum_{i,j} (p_{ij} - p^*_{ij})\, x_{ij}(P^*) \le 0$

14 Nash Equilibrium V
Minimizing a function
Objective function: $V(P) = \sum_i \sum_j [g_{ij}(P)]^2$
V(P) is a continuous, differentiable, non-negative function
Nash: $V(P^*) = 0$
Local minima

15 Nash Equilibrium VI
Semi-algebraic set
Distribution P: $\sum_j p_{ij} = 1$
Difference in payoff: $z_{ij}(P) \le 0$, that is, $z_{ij}(P) = x_{ij}(P) - u_i(P) \le 0$
Explicitly:
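Substituting the definitions of $x_{ij}$ and $u_i$ from the notation slides gives one possible explicit form of these constraints. This expansion is reconstructed from those definitions and is not the formula shown on the original slide; the player index k and the non-negativity constraint are added here for completeness.

```latex
\sum_{s_{-i}} u_i(s_{ij}, s_{-i}) \prod_{k \ne i} p_k(s_k)
  \;-\; \sum_{s \in S} u_i(s) \prod_{k} p_k(s_k) \;\le\; 0
  \qquad \text{for every player } i \text{ and action } j,
\qquad \sum_{j} p_{ij} = 1, \quad p_{ij} \ge 0 .
```

Each constraint is a polynomial inequality in the variables $p_{ij}$, which is why the set of Nash equilibria is semi-algebraic.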

16 Two player games
Payoff matrices (A, B) with m rows and n columns: player 1 has m actions, player 2 has n actions
Strategies p and q
Payoffs: $u_1(p,q) = p A q^T$ and $u_2(p,q) = p B q^T$
Zero-sum game: $A = -B$

17 Linear Programming
Primal LP: x in $SET_{primal}$ is feasible
maximize $\langle c, x \rangle$ subject to x in $SET_{primal}$

18 Linear Programming
Dual LP: y in $SET_{dual}$ is feasible
minimize $\langle b, y \rangle$ subject to y in $SET_{dual}$

19 Duality Theorem
Weak duality: $\langle c, x \rangle \le \langle b, y \rangle$ for any feasible x and y (proof)
Strong duality: if there are feasible solutions, then $\langle c, x \rangle = \langle b, y \rangle$ for some feasible x and y (sketch of proof)
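The slides leave the feasible sets abstract, so the following one-line argument for weak duality assumes the standard form: the primal is "maximize $\langle c, x \rangle$ subject to $Gx \le b$, $x \ge 0$" and the dual is "minimize $\langle b, y \rangle$ subject to $G^T y \ge c$, $y \ge 0$". The constraint matrix G is a name introduced here and is unrelated to the payoff matrices.

```latex
\langle c, x \rangle
  \;\le\; \langle G^{T} y, x \rangle
  \;=\; \langle y, G x \rangle
  \;\le\; \langle y, b \rangle
  \;=\; \langle b, y \rangle
```

The first inequality uses $G^T y \ge c$ together with $x \ge 0$, and the second uses $Gx \le b$ together with $y \ge 0$.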

20 Two players zero sum
Fix strategy q of player 2. Player 1 best response: maximize $p (A q^T)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$
Dual LP: minimize u such that $u \ge A q^T$ (componentwise, for every row of A)
Player 2 selects strategy q: minimize u such that $u \ge A q^T$, $\sum_i q_i = 1$ and $q_i \ge 0$
Dual (strategy for player 1): maximize v such that $v \le pA$ (componentwise, for every column of A), $\sum_j p_j = 1$ and $p_j \ge 0$
There exists a unique value v.
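The equality of the two optimization problems can be checked numerically on a small game. The sketch below replaces the LP solver with a brute-force grid search over mixed strategies, which only works for two actions per player. The matrix, the grid resolution, and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def maximin_row(A, steps=100):
    """Grid search for max over p of min over columns k of (pA)_k, 2 rows."""
    best = None
    for i in range(steps + 1):
        p = [Fraction(i, steps), 1 - Fraction(i, steps)]
        # Player 2 answers with the column that is worst for player 1.
        v = min(p[0] * A[0][k] + p[1] * A[1][k] for k in range(len(A[0])))
        if best is None or v > best[0]:
            best = (v, p)
    return best

def minimax_col(A, steps=100):
    """Grid search for min over q of max over rows j of (A q^T)_j, 2 columns."""
    best = None
    for i in range(steps + 1):
        q = [Fraction(i, steps), 1 - Fraction(i, steps)]
        # Player 1 answers with the row that is best for itself.
        u = max(A[j][0] * q[0] + A[j][1] * q[1] for j in range(len(A)))
        if best is None or u < best[0]:
            best = (u, q)
    return best

# Hypothetical zero-sum game: A holds player 1's payoffs, player 2 gets -A.
A = [[2, -1], [-1, 1]]
v, p = maximin_row(A)
u, q = minimax_col(A)
```

For this matrix the optimal strategies happen to lie on the grid, so the two searches return the same value exactly; for larger games an LP solver would take the place of the grid.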

21 Example

22 Summary
Two players zero sum: linear programming, polynomial time
Can have multiple Nash equilibria, but a unique value!
If (p,q) and (p',q') are Nash, then (p,q') and (p',q) are Nash

23 Online learning
Playing with an unknown payoff matrix
Online algorithm: at each step it selects an action (which can be stochastic or fractional), observes all possible payoffs, and updates its parameters
Goal: achieve the value of the game
The payoff matrix of the "game" is defined at the end

24 Online learning - Algorithm
Notations: opponent distribution $Q_t$, our distribution $P_t$, observed cost $M(i, Q_t)$ (should be $M Q_t$)
Goal: minimize cost
Algorithm: exponential weights. Action i has weight proportional to $b^{L(i,t)}$, where $L(i,t)$ = loss of action i until time t

25 Online algorithm: Notations
Formally: parameter b, $0 < b < 1$
$w_{t+1}(i) = w_t(i)\, b^{M(i, Q_t)}$
$Z_t = \sum_i w_t(i)$
$P_{t+1}(i) = w_{t+1}(i) / Z_{t+1}$
The total number of steps T is known
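A single round of the update can be written in a few lines. This is a minimal sketch: the function name and the example losses are hypothetical, and the new distribution is obtained by dividing by the sum of the updated weights so that it sums to one.

```python
def exp_weights_step(weights, losses, b):
    """One exponential-weights update.

    weights[i] is the current weight of action i, losses[i] is the observed
    cost M(i, Q_t) in [0, 1], and b is the parameter with 0 < b < 1.
    Returns the updated weights and the next distribution over actions.
    """
    if not 0 < b < 1:
        raise ValueError("b must lie strictly between 0 and 1")
    new_weights = [w * b ** loss for w, loss in zip(weights, losses)]
    total = sum(new_weights)                  # normaliser over updated weights
    return new_weights, [w / total for w in new_weights]

# Hypothetical round with three actions, starting from uniform weights.
weights = [1.0, 1.0, 1.0]
losses = [0.0, 0.5, 1.0]
weights, dist = exp_weights_step(weights, losses, b=0.5)
```

Actions with a higher observed cost lose weight faster, so the distribution shifts toward the actions that have performed well so far.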

26 Online algorithm: Theorem
For any matrix M with entries in [0,1] and any sequence of distributions $Q_1, \ldots, Q_T$, the algorithm generates $P_1, \ldots, P_T$
Relative entropy: $RE(A \| B) = E_{x \sim A}[\ln(A(x) / B(x))]$

27 Online algorithm: Analysis
Lemma: for any mixed strategy P
Corollary

28 Online Algorithm: Optimization
$b = 1 / \left(1 + \sqrt{2 (\ln n) / T}\right)$
Average loss: $v + O\left(\sqrt{(\ln n) / T}\right)$
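The effect of this choice of b can be observed in a small simulation. The sketch below assumes a loss matrix with rows as our actions, and an opponent that answers each round with the column that maximizes our expected loss. Both the opponent model and the 2x2 matrix (whose value is 1/2) are illustrative assumptions.

```python
import math

def run_exp_weights(M, T):
    """Average loss of exponential weights over T rounds of the game M."""
    n = len(M)                                    # number of our actions
    b = 1 / (1 + math.sqrt(2 * math.log(n) / T))  # parameter from the slide
    w = [1.0] * n
    total_loss = 0.0
    for _ in range(T):
        z = sum(w)
        p = [wi / z for wi in w]
        # Expected loss of our mixed strategy against each pure column.
        col_losses = [sum(p[i] * M[i][k] for i in range(n))
                      for k in range(len(M[0]))]
        # Adversarial opponent: the column that hurts us most this round.
        k = max(range(len(col_losses)), key=col_losses.__getitem__)
        total_loss += col_losses[k]
        w = [w[i] * b ** M[i][k] for i in range(n)]
    return total_loss / T

# Hypothetical loss matrix with entries in [0, 1] and game value v = 1/2.
M = [[1.0, 0.0], [0.0, 1.0]]
avg_short = run_exp_weights(M, 100)
avg_long = run_exp_weights(M, 10000)
```

Against this opponent the average loss can never fall below the value of the game, and the gap above it shrinks as T grows, in line with the $O(\sqrt{(\ln n)/T})$ term.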

29 Two players General sum games
Input: matrices (A, B)
No unique value
Computational issues: find some Nash equilibrium, or all of them
Player 1 best response, as for zero sum: fix strategy q of player 2, then maximize $p (A q^T)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$
Dual LP: minimize u such that $u \ge A q^T$ (componentwise)

30 Two players General sum games
Assume the supports of the strategies are known: p has support $S_p$ and q has support $S_q$
Can formulate the Nash equilibrium as an LP:
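Once the supports are fixed, the equilibrium conditions become linear: every action in a player's support must earn the same payoff against the opponent's strategy. The sketch below covers only the smallest case, a 2x2 game in which both supports contain both actions, where the indifference equations can be solved in closed form. The function name and the example matrices are hypothetical.

```python
from fractions import Fraction

def full_support_equilibrium_2x2(A, B):
    """Mixed equilibrium of a 2x2 game, assuming both players mix over both actions.

    A[j][k] and B[j][k] are the payoffs of players 1 and 2 when player 1
    plays row j and player 2 plays column k.
    """
    denom_a = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    denom_b = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    if denom_a == 0 or denom_b == 0:
        raise ValueError("indifference equations are degenerate")
    # q makes player 1 indifferent between its two rows.
    q1 = Fraction(A[1][1] - A[0][1], denom_a)
    # p makes player 2 indifferent between its two columns.
    p1 = Fraction(B[1][1] - B[1][0], denom_b)
    if not (0 < p1 < 1 and 0 < q1 < 1):
        raise ValueError("no equilibrium with full support")
    return [p1, 1 - p1], [q1, 1 - q1]

# Hypothetical coordination game with different preferred outcomes.
A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 2]]
p, q = full_support_equilibrium_2x2(A, B)
```

For larger games the same idea yields one equality per action inside a support and one inequality per action outside it, and all candidate support pairs would have to be tried in turn.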

31 Approximate Nash

32 Lemke & Howson

33 Example

