Computing Nash Equilibrium

Presenter: Yishay Mansour

Outline
Problem definition
Notation
Today: zero-sum games
Next week: general-sum games, multiple players

Model
Multiple players: $N = \{1, \dots, n\}$
Strategy set: player $i$ has $m$ actions, $S_i = \{s_{i1}, \dots, s_{im}\}$; the $S_i$ are the pure actions of player $i$
Joint action space: $S = \prod_i S_i$
Payoff functions: player $i$ has $u_i : S \to \mathbb{R}$

Strategies
Pure strategies: actions
Mixed strategy: player $i$ plays $p_i$, a distribution over $S_i$
Game: $P = \prod_i p_i$, a product distribution
Modified distribution: $P_{-i}$ is the distribution $P$ with player $i$ left out; $(q, P_{-i})$ means player $i$ plays $q$ and every other player $j$ plays $p_j$

Notations
Average payoff of player $i$: $u_i(P) = E_{s \sim P}[u_i(s)] = \sum_s P(s)\, u_i(s)$, where $P(s) = \prod_i p_i(s_i)$
Nash equilibrium: $P^*$ is a Nash equilibrium if for every player $i$ and any distribution $q_i$: $u_i(q_i, P^*_{-i}) \le u_i(P^*)$
Best response: each $p^*_i$ is a best response to $P^*_{-i}$
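The definitions above can be sketched in code. This is a minimal illustration, not part of the slides: the function names (`expected_payoff`, `deviate`, `is_nash`) and the matching-pennies payoffs are assumptions chosen for the example. Since $u_i(q_i, P_{-i})$ is linear in $q_i$, it is enough to test deviations to pure actions.

```python
from itertools import product
from math import prod

def expected_payoff(payoff, profile):
    """u_i(P) = sum over pure profiles s of P(s) * u_i(s), with P(s) = prod_i p_i(s_i)."""
    total = 0.0
    for s in product(*(range(len(p)) for p in profile)):
        prob = prod(p[a] for p, a in zip(profile, s))
        total += prob * payoff(s)
    return total

def deviate(profile, i, q):
    """(q, P_{-i}): player i plays q, every other player j keeps p_j."""
    return [q if k == i else p for k, p in enumerate(profile)]

def is_nash(payoffs, profile, tol=1e-9):
    """Check u_i(q_i, P_{-i}) <= u_i(P) for every player i and every pure q_i."""
    for i, u_i in enumerate(payoffs):
        base = expected_payoff(u_i, profile)
        m = len(profile[i])
        for j in range(m):
            pure = [1.0 if a == j else 0.0 for a in range(m)]
            if expected_payoff(u_i, deviate(profile, i, pure)) > base + tol:
                return False
    return True

# Hypothetical example: matching pennies (player 1 wins on a match).
u1 = lambda s: 1.0 if s[0] == s[1] else -1.0
u2 = lambda s: -u1(s)
uniform = [[0.5, 0.5], [0.5, 0.5]]

print(is_nash([u1, u2], uniform))                   # True
print(is_nash([u1, u2], [[1.0, 0.0], [0.5, 0.5]]))  # False: player 2 can deviate
```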

Notations
Alternative payoff: $x_{ij}(P) = u_i(s_{ij}, P_{-i}) = E_{s \sim P}[u_i(s) \mid s_i = s_{ij}]$
Difference in payoff: $z_{ij}(P) = x_{ij}(P) - u_i(P)$
Improvement in payoff: $g_{ij}(P) = \max\{z_{ij}(P), 0\}$
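A small worked instance may help fix the three quantities. The game and the strategies here are assumptions made for illustration only: matching pennies, where player 1 gets $+1$ on a match and $-1$ otherwise, with $p_1 = (3/4, 1/4)$ and $p_2 = (1, 0)$.

```latex
\begin{align*}
x_{11}(P) &= u_1(s_{11}, P_{-1}) = 1, &
x_{12}(P) &= u_1(s_{12}, P_{-1}) = -1, \\
u_1(P) &= \tfrac34 \cdot 1 + \tfrac14 \cdot (-1) = \tfrac12, \\
z_{11}(P) &= 1 - \tfrac12 = \tfrac12, &
z_{12}(P) &= -1 - \tfrac12 = -\tfrac32, \\
g_{11}(P) &= \max\{\tfrac12, 0\} = \tfrac12, &
g_{12}(P) &= \max\{-\tfrac32, 0\} = 0.
\end{align*}
```

Only the first action has a positive improvement, which matches the intuition that player 1 should shift weight toward it.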

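Fixed point Theorems
Intermediate Value Theorem
Domain $[a,b]$, function $f$ continuous, $f(a)\, f(b) < 0$
Then there exists $z$ such that $f(z) = 0$
Proof: $M^+ = \{x \mid f(x) \ge 0\}$ and $M^- = \{x \mid f(x) \le 0\}$ are closed sets. Both are nonempty and together they cover the connected interval $[a,b]$, so they have an intersection, and any $z$ in it has $f(z) = 0$.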

Brouwer’s Fixed point theorem
$f: S \to S$ continuous, $S$ compact and convex
There exists $z$ in $S$ with $z = f(z)$
For $S = [0,1]$ this follows from the previous theorem
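The one-dimensional case can be made concrete. With $g(x) = f(x) - x$ we have $g(0) \ge 0$ and $g(1) \le 0$, so the intermediate value theorem gives a zero of $g$, which is a fixed point of $f$. The sketch below locates it by bisection; bisection is a technique swapped in here for illustration (the slide's proof is the closed-set argument), and the function name and the choice $f = \cos$ are assumptions.

```python
import math

def fixed_point_on_unit_interval(f, tol=1e-12):
    """Bisection on g(x) = f(x) - x for a continuous f mapping [0,1] into [0,1]."""
    lo, hi = 0.0, 1.0            # g(lo) >= 0 and g(hi) <= 0 hold at the start
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) - mid >= 0:    # keep the half that still brackets a zero of g
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# cos maps [0,1] into [cos(1), 1], a subset of [0,1]
z = fixed_point_on_unit_interval(math.cos)
print(z, math.cos(z))            # the two printed numbers agree to about 1e-12
```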

Kakutani's Fixed Point Theorem
$L: S \to S$ is a correspondence: each $L(x)$ is a nonempty subset of $S$
$L(x)$ is a convex set
$L$ is upper semi-continuous
$S$ compact and convex
There exists $z$ with $z \in L(z)$

Nash Equilibrium I
Best response correspondence: $L(P) = \arg\max_Q \{ u_i(q_i, P_{-i}) \}$, taken for every player $i$
$L$ is a correspondence, and is (upper semi-)continuous
Nash is a fixed point of $L$: $P^* \in L(P^*)$
Existence follows from Kakutani's fixed point theorem

Nash Equilibrium II
Fixed point
$K(P)$ has $mn$ parameters, one for each player $i$ and action $j$:
$K_{ij}(P) = \frac{p_{ij} + g_{ij}(P)}{1 + \sum_k g_{ik}(P)}$
Nash is a fixed point of $K$: $P^* = K(P^*)$
Original proof of Nash: $K$ is a continuous function on a compact convex set, so Brouwer's fixed point theorem applies
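One application of the map $K$ can be sketched for a two-player game given by payoff matrices. The helper names (`gains`, `nash_map`) and the matching-pennies matrices are assumptions for illustration. The map is used for the existence proof, and nothing here claims that iterating it converges.

```python
def gains(payoff_rows, own, other):
    """g_j = max(x_j - u, 0) for a player whose own actions index the rows."""
    x = [sum(a * o for a, o in zip(row, other)) for row in payoff_rows]  # x_j = u(s_j, P_-i)
    u = sum(p * xj for p, xj in zip(own, x))                             # u(P)
    return [max(xj - u, 0.0) for xj in x]

def nash_map(A, B, p, q):
    """Apply K once: K_ij(P) = (p_ij + g_ij(P)) / (1 + sum_k g_ik(P))."""
    Bt = [list(col) for col in zip(*B)]   # player 2's payoffs, own actions as rows
    g1 = gains(A, p, q)
    g2 = gains(Bt, q, p)
    new_p = [(pj + gj) / (1 + sum(g1)) for pj, gj in zip(p, g1)]
    new_q = [(qj + gj) / (1 + sum(g2)) for qj, gj in zip(q, g2)]
    return new_p, new_q

# Hypothetical example: matching pennies, B = -A.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]

print(nash_map(A, B, [0.5, 0.5], [0.5, 0.5]))  # unchanged: the uniform profile is a fixed point
print(nash_map(A, B, [1.0, 0.0], [1.0, 0.0]))  # q moves toward player 2's better action
```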

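Nash Equilibrium III
Nonlinear complementarity problem (NCP)
Recall $z_{ij}(P)$
$z_i(P)$ is orthogonal to $p_i$: $\sum_j p_{ij}\, z_{ij}(P) = 0$
Nash: $z(P^*) \le 0$, i.e. $z_{ij}(P^*) \le 0$ for all $i, j$
Then for every player $i$ and action $s_{ij}$: $z_{ij}(P^*) \cdot p^*_{ij} = 0$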

Nash Equilibrium IV
Stationary point problem
Recall: $x$ = alternative payoff
Nash: $P^*$ such that for every $P$: $(P - P^*) \cdot x(P^*) \le 0$
That is, $\sum_i \sum_j (p_{ij} - p^*_{ij})\, x_{ij}(P^*) \le 0$

Nash Equilibrium V
Minimizing a function
Objective function: $V(P) = \sum_i \sum_j [g_{ij}(P)]^2$
$V(P)$ is a continuous, differentiable, non-negative function
Nash: $V(P^*) = 0$
Local minima

Nash Equilibrium VI
Semi-algebraic set
Distribution $P$: $\sum_j p_{ij} = 1$ and $p_{ij} \ge 0$
Difference in payoff: $z_{ij}(P) \le 0$, where $z_{ij}(P) = x_{ij}(P) - u_i(P)$
Explicitly:
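The explicit form is not preserved in this transcript. Expanding $x_{ij}$ and $u_i$ with the definitions from the notation slides gives one way to write it; the symbol $S_{-i} = \prod_{k \ne i} S_k$ is introduced here for convenience and is an assumption of this sketch.

```latex
\[
z_{ij}(P)
= \sum_{s_{-i} \in S_{-i}} u_i(s_{ij}, s_{-i}) \prod_{k \ne i} p_k(s_k)
\;-\; \sum_{s \in S} u_i(s) \prod_{k} p_k(s_k)
\;\le\; 0 .
\]
```

Each constraint is a polynomial inequality in the variables $p_{ij}$, which is what makes the set of Nash equilibria semi-algebraic.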

Two player games
Payoff matrices $(A, B)$ with $m$ rows and $n$ columns
Player 1 has $m$ actions, player 2 has $n$ actions
Strategies $p$ and $q$
Payoffs: $u_1(p,q) = pAq^t$ and $u_2(p,q) = pBq^t$
Zero-sum game: $A = -B$

Linear Programming
Primal LP: maximize $\langle c, x \rangle$ subject to $x \in SET_{primal}$
$x \in SET_{primal}$ is feasible

Linear Programming
Dual LP: minimize $\langle b, y \rangle$ subject to $y \in SET_{dual}$
$y \in SET_{dual}$ is feasible

Duality Theorem
Weak duality: $\langle c, x \rangle \le \langle b, y \rangle$ for any feasible $x$ and $y$ (proof!)
Strong duality: if there are feasible solutions then $\langle c, x \rangle = \langle b, y \rangle$ for some feasible $x$ and $y$ (sketch of proof)
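The feasible sets are not spelled out in this transcript. Assuming the standard symmetric form, with a constraint matrix written $G$ here to avoid a clash with the payoff matrix $A$, weak duality is a two-step chain:

```latex
\[
SET_{primal} = \{\, x : Gx \le b,\; x \ge 0 \,\}, \qquad
SET_{dual} = \{\, y : G^{t} y \ge c,\; y \ge 0 \,\},
\]
\[
\langle c, x \rangle \;\le\; \langle G^{t} y, x \rangle
\;=\; \langle y, Gx \rangle \;\le\; \langle y, b \rangle .
\]
```

The first inequality uses $x \ge 0$ together with $G^{t} y \ge c$; the second uses $y \ge 0$ together with $Gx \le b$.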

Two players zero sum
Fix strategy $q$ of player 2
Player 1 best response: maximize $p\,(Aq^t)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$
Dual LP: minimize $u$ such that $u \ge Aq^t$ (componentwise)
Player 2 selects strategy $q$: minimize $u$ such that $u \ge Aq^t$, $\sum_i q_i = 1$ and $q_i \ge 0$
Dual (strategy for player 1): maximize $v$ such that $v \le pA$ (componentwise), $\sum_j p_j = 1$ and $p_j \ge 0$
By strong duality the two optima coincide, so there exists a unique value $v$.
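The two LPs give a simple optimality certificate: any $p$ guarantees player 1 at least $\min_j (pA)_j$, any $q$ caps player 1 at $\max_i (Aq^t)_i$, and when the two numbers meet, both strategies are optimal and the common number is the value. The sketch below only checks a candidate pair and does not solve the LP; the function name `value_bounds` and the 2x2 matrix are assumptions for illustration.

```python
from fractions import Fraction as F

def value_bounds(A, p, q):
    """Return (v, u): v = min_j (pA)_j is feasible for player 1's LP,
    u = max_i (A q^t)_i is feasible for player 2's LP, and v <= u always."""
    rows, cols = len(A), len(A[0])
    pA = [sum(p[i] * A[i][j] for i in range(rows)) for j in range(cols)]
    Aq = [sum(A[i][j] * q[j] for j in range(cols)) for i in range(rows)]
    return min(pA), max(Aq)

# Hypothetical zero-sum game (payoffs to player 1).
A = [[F(2), F(-1)], [F(-1), F(1)]]
p = [F(2, 5), F(3, 5)]
q = [F(2, 5), F(3, 5)]

v, u = value_bounds(A, p, q)
print(v, u)   # 1/5 1/5, so (p, q) is optimal and the value is 1/5

half = [F(1, 2), F(1, 2)]
print(value_bounds(A, half, half))   # a gap: 0 below, 1/2 above
```

Exact rational arithmetic is used so that the equality $v = u$ can be tested without floating-point tolerance.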

Example

Summary
Two players zero sum: linear programming, polynomial time
Can have multiple Nash equilibria, but a unique value!
If $(p,q)$ and $(p',q')$ are Nash then $(p,q')$ and $(p',q)$ are Nash
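The exchange property follows from the unique value $v$ in a few lines. In a zero-sum Nash equilibrium $(p,q)$, player 1's strategy guarantees at least $v$ and player 2's strategy concedes at most $v$; the symbols $\tilde p, \tilde q$ below denote arbitrary strategies and are notation introduced for this sketch.

```latex
\begin{align*}
&\text{$p$ optimal:}  && p A \tilde q^{\,t} \ge v \ \text{ for all } \tilde q, \\
&\text{$q'$ optimal:} && \tilde p A q'^{\,t} \le v \ \text{ for all } \tilde p, \\
&\text{hence}         && v \le p A q'^{\,t} \le v, \ \text{ so } p A q'^{\,t} = v, \\
&\text{and}           && \tilde p A q'^{\,t} \le p A q'^{\,t} \le p A \tilde q^{\,t}
                         \ \text{ for all } \tilde p, \tilde q .
\end{align*}
```

The last line says neither player gains by deviating from $(p, q')$; the argument for $(p', q)$ is symmetric.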

Online learning
Playing with an unknown payoff matrix
Online algorithm: at each step it selects an action (can be stochastic or fractional), observes all possible payoffs, and updates its parameters
Goal: achieve the value of the game
The payoff matrix of the "game" is defined at the end

Online learning - Algorithm
Notations: opponent distribution $Q_t$, our distribution $P_t$
Observed cost: $M(i, Q_t)$ for each action $i$ (should be $M Q_t$)
Goal: minimize cost
Algorithm: exponential weights
Action $i$ has weight proportional to $b^{L(i,t)}$, where $L(i,t)$ = loss of action $i$ until time $t$

Online algorithm: Notations
Formally:
Parameter $b$, $0 < b < 1$
$w_{t+1}(i) = w_t(i)\, b^{M(i, Q_t)}$
$Z_t = \sum_i w_t(i)$
$P_t(i) = w_t(i) / Z_t$
The number of total steps $T$ is known
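The update rule can be sketched directly. This is a minimal illustration under stated assumptions: the weights start uniform ($w_1(i) = 1$, which the slide does not specify), the function name `exponential_weights` is made up for this example, and the loss matrix and opponent sequence are hypothetical.

```python
def exponential_weights(M, opponent_plays, b):
    """Run the multiplicative update w_{t+1}(i) = w_t(i) * b**M(i, Q_t).

    M: loss matrix with entries in [0, 1], rows indexed by our actions.
    opponent_plays: list of distributions Q_t over the columns.
    Returns the list of our distributions P_t and the total loss sum_t M(P_t, Q_t).
    """
    w = [1.0] * len(M)                       # assumed uniform initial weights
    history, total_loss = [], 0.0
    for Q in opponent_plays:
        Z = sum(w)
        P = [wi / Z for wi in w]             # P_t(i) = w_t(i) / Z_t
        losses = [sum(m * q for m, q in zip(row, Q)) for row in M]   # M(i, Q_t)
        total_loss += sum(p * l for p, l in zip(P, losses))          # M(P_t, Q_t)
        w = [wi * b ** l for wi, l in zip(w, losses)]
        history.append(P)
    return history, total_loss

# Hypothetical run: action 0 never loses against an opponent who always plays column 0.
M = [[0.0, 1.0], [1.0, 0.0]]
history, total = exponential_weights(M, [[1.0, 0.0]] * 10, 0.5)
print(history[0])    # [0.5, 0.5]
print(history[-1])   # almost all weight on action 0
print(total)         # roughly 1.26, while the best fixed action has loss 0
```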

Online algorithm: Theorem
For any matrix $M$ with entries in $[0,1]$
and any sequence of distributions $Q_1, \dots, Q_T$,
the algorithm generates $P_1, \dots, P_T$
Relative entropy: $RE(A \| B) = E_{x \sim A}[\ln (A(x) / B(x))]$
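The bound itself is not preserved in this transcript. The standard guarantee for this multiplicative-weights scheme (Freund and Schapire), which uses the same hypotheses and the same relative entropy, reads as follows; treating it as the statement intended on the slide is an assumption.

```latex
\[
\sum_{t=1}^{T} M(P_t, Q_t)
\;\le\;
\min_{P} \Big[\, a_b \sum_{t=1}^{T} M(P, Q_t) \;+\; c_b \, RE(P \,\|\, P_1) \Big],
\qquad
a_b = \frac{\ln(1/b)}{1 - b}, \quad c_b = \frac{1}{1 - b}.
\]
```

With a uniform $P_1$ over $n$ actions, $RE(P \| P_1) \le \ln n$ for every $P$, which is where the $\ln n$ term in the final rate comes from.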

Online algorithm: Analysis
Lemma: for any mixed strategy $P$
Corollary

Online Algorithm: Optimization
$b = \frac{1}{1 + \sqrt{2 (\ln n) / T}}$
Average loss: $v + O(\sqrt{(\ln n) / T})$
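The parameter choice is easy to evaluate numerically. The helper name `tuned_b` is an assumption; the formula is the one on the slide. As $T$ grows, $b$ approaches 1, so each step updates the weights more gently.

```python
import math

def tuned_b(n, T):
    """b = 1 / (1 + sqrt(2 ln(n) / T)) for n actions and a known horizon T."""
    return 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))

for T in (100, 10_000, 1_000_000):
    # the second column is the sqrt(ln(n) / T) rate from the average-loss bound
    print(T, round(tuned_b(2, T), 4), round(math.sqrt(math.log(2) / T), 4))
```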

Two players General sum games
Input: matrices $(A, B)$
No unique value
Computational issues: find some Nash, find all Nash
Player 1 best response, like for zero sum: fix strategy $q$ of player 2, maximize $p\,(Aq^t)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$
Dual LP: minimize $u$ such that $u \ge Aq^t$ (componentwise)

Two players General sum games
Assume the support of the strategies is known: $p$ has support $S_p$ and $q$ has support $S_q$
Can formulate the Nash as an LP:
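The formulation itself is not preserved in this transcript. A standard way to write it, given here as a sketch under that caveat, is a linear feasibility system in the unknowns $(p, q, u, v)$, where $u$ and $v$ are the two players' equilibrium payoffs:

```latex
\begin{align*}
&\textstyle\sum_{i \in S_p} p_i = 1, \quad p_i \ge 0 \ (i \in S_p), \quad p_i = 0 \ (i \notin S_p), \\
&\textstyle\sum_{j \in S_q} q_j = 1, \quad q_j \ge 0 \ (j \in S_q), \quad q_j = 0 \ (j \notin S_q), \\
&(A q^{t})_i = u \ (i \in S_p), \qquad (A q^{t})_i \le u \ (i \notin S_p), \\
&(p B)_j = v \ (j \in S_q), \qquad (p B)_j \le v \ (j \notin S_q).
\end{align*}
```

Every action in a player's support must be a best response, which gives the equalities; actions outside the support may not do better, which gives the inequalities. Once the supports are fixed, all constraints are linear.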

Approximate Nash

Lemke & Howson

Example
