1
Computing Nash Equilibrium
Presenter: Yishay Mansour
2
Outline
Problem definition
Notation
Today: zero-sum games
Next week: general-sum games; multiple players
3
Model
Multiple players: $N = \{1, \dots, n\}$
Strategy set: player $i$ has $m$ actions, $S_i = \{s_{i1}, \dots, s_{im}\}$; the $S_i$ are the pure actions of player $i$, and $S = \times_i S_i$
Payoff functions: player $i$ has $u_i : S \to \mathbb{R}$
4
Strategies
Pure strategies: actions
Mixed strategy: player $i$ plays $p_i$, a distribution over $S_i$
Game: $P = \times_i p_i$, a product distribution
Modified distribution: $P_{-i}$ is the distribution $P$ without player $i$; $(q, P_{-i})$ means player $i$ plays $q$ and every other player $j$ plays $p_j$
5
Notations
Average payoff, player $i$: $u_i(P) = E_{s \sim P}[u_i(s)] = \sum_s P(s)\, u_i(s)$, where $P(s) = \prod_i p_i(s_i)$
Nash equilibrium: $P^*$ is a Nash equilibrium if for every player $i$ and any distribution $q_i$, $u_i(q_i, P^*_{-i}) \le u_i(P^*)$
Best response: each $p^*_i$ is a best response to $P^*_{-i}$
6
Notations
Alternative payoff: $x_{ij}(P) = u_i(s_{ij}, P_{-i}) = E_{s \sim P}[u_i(s) \mid s_i = s_{ij}]$
Difference in payoff: $z_{ij}(P) = x_{ij}(P) - u_i(P)$
Improvement in payoff: $g_{ij}(P) = \max\{z_{ij}(P), 0\}$
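The three quantities above can be computed by brute-force enumeration of pure profiles. The following is a minimal sketch, not part of the original slides: the function names `expected_payoff` and `improvements`, the representation of a payoff as a Python function on action-index tuples, and the matching-pennies example are all illustrative assumptions.

```python
from itertools import product

def expected_payoff(u_i, P):
    """u_i(P) = sum over pure profiles s of P(s) * u_i(s), with P(s) = prod_k p_k(s_k)."""
    total = 0.0
    for s in product(*(range(len(p)) for p in P)):
        prob = 1.0
        for k, action in enumerate(s):
            prob *= P[k][action]
        total += prob * u_i(s)
    return total

def improvements(u, P):
    """Return g[i][j] = max(z_ij(P), 0), where z_ij(P) = x_ij(P) - u_i(P)."""
    g = []
    for i, u_i in enumerate(u):
        base = expected_payoff(u_i, P)          # u_i(P)
        row = []
        for j in range(len(P[i])):
            # Player i switches to pure action j; everyone else keeps p_k.
            pure = [1.0 if a == j else 0.0 for a in range(len(P[i]))]
            x_ij = expected_payoff(u_i, P[:i] + [pure] + P[i + 1:])
            row.append(max(x_ij - base, 0.0))
        g.append(row)
    return g

# Hypothetical example: matching pennies (player 1 wins on a match).
u1 = lambda s: 1.0 if s[0] == s[1] else -1.0
u2 = lambda s: -u1(s)
g_uniform = improvements([u1, u2], [[0.5, 0.5], [0.5, 0.5]])
g_skewed = improvements([u1, u2], [[1.0, 0.0], [0.5, 0.5]])
```

At the uniform profile every improvement is zero, which is the Nash condition from the previous slide. In the skewed profile player 2 can gain by moving to the second action, so the corresponding improvement is positive.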
7
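Fixed point Theorems
Intermediate Value Theorem
Domain $[a,b]$, function $f$ continuous, $f(a) \cdot f(b) < 0$
Then there exists $z$ such that $f(z) = 0$
Proof: $M^+ = \{x \mid f(x) \ge 0\}$ and $M^- = \{x \mid f(x) \le 0\}$ are closed, nonempty sets that cover $[a,b]$; since $[a,b]$ is connected they intersect, and $f(z) = 0$ at any point $z$ of the intersection.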
8
Brouwer's Fixed Point Theorem
$f : S \to S$ continuous, $S$ compact and convex
There exists $z \in S$ such that $z = f(z)$
For $S = [0,1]$ this follows from the previous theorem applied to $f(x) - x$
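In one dimension the reduction to the intermediate value theorem is constructive: bisection on $f(x) - x$ locates a fixed point of a continuous $f : [0,1] \to [0,1]$. The sketch below is an illustration only; the helper names and the choice of `math.cos` as the example map are assumptions, not taken from the slides.

```python
import math

def bisect_root(g, a, b, tol=1e-12):
    """Find z in [a, b] with g(z) close to 0, assuming g is continuous and g(a)*g(b) <= 0."""
    ga, gb = g(a), g(b)
    if ga == 0:
        return a
    if gb == 0:
        return b
    if ga * gb > 0:
        raise ValueError("g(a) and g(b) must have opposite signs")
    while b - a > tol:
        mid = (a + b) / 2
        gm = g(mid)
        if gm == 0:
            return mid
        if ga * gm < 0:
            b = mid                 # sign change is in the left half
        else:
            a, ga = mid, gm         # sign change is in the right half
    return (a + b) / 2

def fixed_point_unit_interval(f):
    """Fixed point of a continuous f: [0,1] -> [0,1] via a root of f(x) - x."""
    return bisect_root(lambda x: f(x) - x, 0.0, 1.0)

# cos maps [0, 1] into [cos 1, 1], a subset of [0, 1].
z = fixed_point_unit_interval(math.cos)
```

No comparable halving argument is available in higher dimensions, which is one reason computing fixed points (and hence equilibria) is harder than proving they exist.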
9
Kakutani's Fixed Point Theorem
$L : S \to S$ is a correspondence (each $L(x)$ is a subset of $S$)
$L(x)$ is a convex set
$L$ is semi-continuous
$S$ compact and convex
There exists $z$ such that $z \in L(z)$
10
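Nash Equilibrium I
Best response correspondence: $L(P) = \arg\max_Q \{u_i(q_i, P_{-i})\}$, with the maximum taken for every player $i$
$L$ is a semi-continuous correspondence, and $L(P)$ is convex because $u_i(q_i, P_{-i})$ is linear in $q_i$
Nash is a fixed point of $L$: $P^* \in L(P^*)$
Existence follows from Kakutani's fixed point theorem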
11
Nash Equilibrium II
Fixed point: $K(P)$ has $m \cdot n$ parameters, one per player and action
$K_{ij}(P) = \dfrac{p_{ij} + g_{ij}(P)}{1 + \sum_k g_{ik}(P)}$
Nash is a fixed point of $K$: $P^* = K(P^*)$
Original proof of Nash: $K$ is a continuous function on a compact space, so Brouwer's fixed point theorem applies
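One application of the map $K$ can be sketched for a two-player game given by payoff matrices. This is an illustrative sketch: the function name `nash_map` and the matching-pennies matrices are assumptions, and the update normalizes by $1 + \sum_k g_{ik}(P)$, which keeps each player's updated strategy a probability distribution.

```python
def nash_map(A, B, p, q):
    """Apply K once. A, B are the payoff matrices of players 1 and 2 (lists of rows)."""
    m, n = len(p), len(q)
    # Alternative payoffs x_ij: payoff of each pure action against the other player's mix.
    x1 = [sum(A[i][j] * q[j] for j in range(n)) for i in range(m)]
    x2 = [sum(p[i] * B[i][j] for i in range(m)) for j in range(n)]
    # Current expected payoffs u_i(P).
    u1 = sum(p[i] * x1[i] for i in range(m))
    u2 = sum(q[j] * x2[j] for j in range(n))
    # Improvements g_ij = max(x_ij - u_i, 0).
    g1 = [max(x - u1, 0.0) for x in x1]
    g2 = [max(x - u2, 0.0) for x in x2]
    new_p = [(p[i] + g1[i]) / (1.0 + sum(g1)) for i in range(m)]
    new_q = [(q[j] + g2[j]) / (1.0 + sum(g2)) for j in range(n)]
    return new_p, new_q

# Hypothetical example: matching pennies.
A = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-1.0, 1.0], [1.0, -1.0]]
fixed = nash_map(A, B, [0.5, 0.5], [0.5, 0.5])
moved = nash_map(A, B, [1.0, 0.0], [0.5, 0.5])
```

The uniform profile is returned unchanged, as expected of an equilibrium. From the non-equilibrium profile, player 2's strategy shifts toward the action that improves on its current payoff. Iterating $K$ is not guaranteed to converge; the map is a proof device, not an algorithm.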
12
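Nash Equilibrium III
Non-linear complementarity problem (NCP)
Recall $z_{ij}(P)$, the difference in payoff
For every player $i$ and action $s_{ij}$: $z_{ij}(P) \cdot p_{ij} = 0$, so $z_i(P)$ is orthogonal to $p_i$
Nash: $z(P^*) \le 0$, that is, $z_{ij}(P^*) \le 0$ for all $i, j$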
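The orthogonality can be checked directly from the definitions of the alternative payoff and the difference in payoff. The short derivation below is a reconstruction under those definitions, not a transcription of the slide; it uses only the fact that $u_i(P)$ is the $p_i$-weighted average of the alternative payoffs.

```latex
\sum_j p_{ij}\, z_{ij}(P)
  = \sum_j p_{ij}\, x_{ij}(P) - u_i(P) \sum_j p_{ij}
  = u_i(P) - u_i(P)
  = 0 .
```

The sum vanishes for every $P$. At an equilibrium, $z_{ij}(P^*) \le 0$ and $p^*_{ij} \ge 0$ make every term non-positive, so each term $p^*_{ij}\, z_{ij}(P^*)$ must be zero individually: an action played with positive probability earns exactly $u_i(P^*)$.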
13
Nash Equilibrium IV
Stationary point problem
Recall: $x$ = alternative payoff
Nash: $P^*$ such that for every $P$, $(P - P^*) \cdot x(P^*) \le 0$
That is, $\sum_i \sum_j (p_{ij} - p^*_{ij})\, x_{ij}(P^*) \le 0$
14
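Nash Equilibrium V
Minimizing a function
Objective function: $V(P) = \sum_i \sum_j [g_{ij}(P)]^2$
$V(P)$ is a continuous, differentiable, non-negative function
Nash: $V(P^*) = 0$
Local minima: a minimization method can stop at a local minimum with $V > 0$, which is not a Nash equilibrium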
15
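Nash Equilibrium VI
Semi-algebraic set
Distribution $P$: $\sum_j p_{ij} = 1$ and $p_{ij} \ge 0$
Difference in payoff: $z_{ij}(P) = x_{ij}(P) - u_i(P) \le 0$
Explicitly, substituting the definitions of $x_{ij}$ and $u_i$: $\sum_{s_{-i}} u_i(s_{ij}, s_{-i}) \prod_{k \ne i} p_k(s_k) - \sum_{s} u_i(s) \prod_{k} p_k(s_k) \le 0$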
16
Two player games
Payoff matrices $(A, B)$ with $m$ rows and $n$ columns: player 1 has $m$ actions, player 2 has $n$ actions
Strategies $p$ and $q$
Payoffs: $u_1(p, q) = p A q^T$ and $u_2(p, q) = p B q^T$
Zero-sum game: $A = -B$
17
Linear Programming
Primal LP: maximize $\langle c, x \rangle$ subject to $x \in SET_{primal}$
Any $x \in SET_{primal}$ is feasible
18
Linear Programming
Dual LP: minimize $\langle b, y \rangle$ subject to $y \in SET_{dual}$
Any $y \in SET_{dual}$ is feasible
19
Duality Theorem
Weak duality: $\langle c, x \rangle \le \langle b, y \rangle$ for any feasible $x$ and $y$ (proof!)
Strong duality: if there are feasible solutions, then $\langle c, x \rangle = \langle b, y \rangle$ for some feasible $x$ and $y$ (sketch of proof)
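The transcript does not preserve the definitions of the two feasible sets, so the following weak-duality argument assumes the standard symmetric form: $SET_{primal} = \{x : Gx \le b,\ x \ge 0\}$ and $SET_{dual} = \{y : G^T y \ge c,\ y \ge 0\}$. The constraint matrix $G$ is a name introduced here for illustration. Under that assumption the proof is one line:

```latex
\langle c, x \rangle
  \;\le\; \langle G^{T} y, x \rangle
  \;=\; \langle y, G x \rangle
  \;\le\; \langle y, b \rangle .
```

The first inequality uses $x \ge 0$ together with $G^T y \ge c$; the second uses $y \ge 0$ together with $Gx \le b$.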
20
Two players zero sum
Fix strategy $q$ of player 2. Player 1's best response: maximize $p (A q^T)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$
Dual LP: minimize $u$ such that $u \ge A q^T$ (componentwise)
Player 2 selects strategy $q$: minimize $u$ such that $u \ge A q^T$, $\sum_i q_i = 1$ and $q_i \ge 0$
Dual (strategy for player 1): maximize $v$ such that $v \le p A$ (componentwise), $\sum_j p_j = 1$ and $p_j \ge 0$
There exists a unique value $v$
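The two linear programs give a simple optimality certificate that can be checked without an LP solver. For any pair $(p, q)$, $v = \min_j (pA)_j$ is feasible for player 1's program and $u = \max_i (Aq^T)_i$ is feasible for player 2's; weak duality gives $v \le u$, and equality certifies that both strategies are optimal. The sketch below is illustrative: the function name `security_levels` and the rock-paper-scissors matrix are assumptions, and it only verifies a candidate pair rather than solving the LP.

```python
from fractions import Fraction

def security_levels(A, p, q):
    """Return (v, u): the payoff p guarantees player 1, and the payoff q caps player 1 at."""
    m, n = len(A), len(A[0])
    v = min(sum(p[i] * A[i][j] for i in range(m)) for j in range(n))  # largest v with v <= pA
    u = max(sum(A[i][j] * q[j] for j in range(n)) for i in range(m))  # smallest u with u >= Aq^T
    return v, u

# Hypothetical example: rock-paper-scissors, payoffs to the row player.
rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
uniform = [Fraction(1, 3)] * 3

v, u = security_levels(rps, uniform, uniform)            # matching levels: optimal pair
v_bad, u_bad = security_levels(rps, [1, 0, 0], uniform)  # pure "rock" guarantees less
```

With both players uniform the two levels coincide, so the value of the game is their common value. Playing a pure action lowers player 1's guaranteed payoff, and the gap $u - v$ measures how far the pair is from optimal.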
21
Example
22
Summary
Two players zero sum: solved by linear programming in polynomial time
Can have multiple Nash equilibria, but the value is unique!
If $(p, q)$ and $(p', q')$ are Nash, then $(p, q')$ and $(p', q)$ are Nash
23
Online learning
Playing with an unknown payoff matrix
Online algorithm: at each step it selects an action (which can be stochastic or fractional), observes all possible payoffs, and updates its parameters
Goal: achieve the value of the game
The payoff matrix of the "game" is defined at the end
24
Online learning - Algorithm
Notations: Opponent distribution Qt Our distribution Pt Observed cost M(i, Qt) Should be MQt Goal: minimize cost Algorithm: Exponential weights Action i has weight proportional to bL(i,t) L(i,t) = loss of action i until time t
25
Online algorithm: Notations
Formally: parameter: b 0< b < 1 wt+1(i) = wt(i) bM(i,Qt) Zt = wt(i) Pt+1(i) = wt+1(i) / Zt Number of total steps T is known
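A single update of the exponential-weights rule can be sketched as follows. The function names and the toy matrix are illustrative assumptions; the sketch normalizes by the sum of the updated weights so that $P_{t+1}$ sums to one.

```python
def expected_losses(M, Q):
    """M(i, Q) for every action i: the expected loss of row i against the distribution Q."""
    return [sum(M[i][j] * Q[j] for j in range(len(Q))) for i in range(len(M))]

def exp_weights_step(w, losses, b):
    """w_{t+1}(i) = w_t(i) * b**loss_i, followed by normalization into P_{t+1}."""
    new_w = [w_i * b ** loss for w_i, loss in zip(w, losses)]
    z = sum(new_w)
    return new_w, [x / z for x in new_w]

# Hypothetical toy matrix: action 0 never loses, action 1 always loses.
M = [[0.0, 0.0],
     [1.0, 1.0]]
w = [1.0, 1.0]
P = [0.5, 0.5]
for _ in range(3):
    w, P = exp_weights_step(w, expected_losses(M, [0.5, 0.5]), b=0.5)
```

After three rounds the losing action's weight has been multiplied by $b^3$, so its probability is $b^3 / (1 + b^3)$, matching the statement that action $i$ has weight proportional to $b^{L(i,t)}$.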
26
Online algorithm: Theorem
For any matrix M with entries in [0,1] Any sequence of dist. Q1 ... QT The algorithm generates P1, ... , PT RE(A||B) = Ex~A [ln (A(x) / B(x) ) ]
27
Online algorithm: Analysis
Lemma For any mixed strategy P Corollary
28
Online Algorithm: Optimization
b= 1/(1 + sqrt{2 (ln n) / T}) Average Loss: v + O(sqrt{(ln n )/T})
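The guarantee can be tried out numerically by running exponential weights with this choice of $b$ against an opponent that best-responds at every step. The sketch below is illustrative only. The function name, the rock-paper-scissors loss matrix (win 0, tie 0.5, lose 1, so $v = 0.5$), and the adversary are assumptions. The explicit slack $\sqrt{2 (\ln n)/T} + (\ln n)/T$ is the constant from the standard Freund-Schapire analysis and should also be read as an assumption, since the slide states only the $O(\cdot)$ form.

```python
import math

def run_exp_weights(M, T):
    """Average loss of exponential weights over T rounds against a best-responding opponent."""
    n, k = len(M), len(M[0])
    b = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / T))
    w = [1.0] * n                      # uniform initial weights
    total = 0.0
    for _ in range(T):
        z = sum(w)
        P = [x / z for x in w]
        # The opponent picks the column that maximizes our expected loss under P.
        col_loss = [sum(P[i] * M[i][j] for i in range(n)) for j in range(k)]
        j_star = max(range(k), key=col_loss.__getitem__)
        total += col_loss[j_star]
        # Multiplicative update with the observed loss of every action.
        w = [w[i] * b ** M[i][j_star] for i in range(n)]
    return total / T

# Losses for the row player in rock-paper-scissors, scaled into [0, 1].
M = [[0.5, 1.0, 0.0],
     [0.0, 0.5, 1.0],
     [1.0, 0.0, 0.5]]
T = 1000
avg = run_exp_weights(M, T)
delta = math.sqrt(2.0 * math.log(len(M)) / T) + math.log(len(M)) / T
```

Against a best-responding opponent the average loss cannot fall below the value $v$, so `avg` lies between $v$ and $v$ plus `delta`, and the gap shrinks as $T$ grows.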
29
Two players General sum games
Input matrices (A,B) No unique value Computational issues: find some, all Nash player 1 best response: Like for zero sum: Fix strategy q of player 2 maximize p (Aqt) such that j pj = 1 and pj 0 dual LP: minimize u such that u Aqt
30
Two players General sum games
Assume the support of strategies known. p has support Sp and q has support Sq Can formulate the Nash as LP:
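Once the supports are fixed, the equilibrium conditions become linear: every action in a player's support must earn the same payoff against the other player's mix, and no action outside the support may earn more. The slide's LP is not preserved in the transcript, so the sketch below covers only the smallest case, a 2x2 game where both supports are assumed to be the full action set, and solves the two indifference equations in closed form. The function name and the battle-of-the-sexes matrices are illustrative assumptions.

```python
from fractions import Fraction

def full_support_2x2(A, B):
    """Mixed equilibrium of a 2x2 game, assuming both players mix over both actions."""
    dA = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    dB = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    if dA == 0 or dB == 0:
        raise ValueError("indifference equations are degenerate")
    # q makes player 1 indifferent between rows: (A q^T)_0 = (A q^T)_1.
    q0 = Fraction(A[1][1] - A[0][1], dA)
    # p makes player 2 indifferent between columns: (p B)_0 = (p B)_1.
    p0 = Fraction(B[1][1] - B[1][0], dB)
    if not (0 < p0 < 1 and 0 < q0 < 1):
        raise ValueError("no equilibrium with full support")
    return [p0, 1 - p0], [q0, 1 - q0]

# Hypothetical example: battle of the sexes.
A = [[2, 0], [0, 1]]
B = [[1, 0], [0, 2]]
p, q = full_support_2x2(A, B)
row_payoffs = [sum(A[i][j] * q[j] for j in range(2)) for i in range(2)]
col_payoffs = [sum(p[i] * B[i][j] for i in range(2)) for j in range(2)]
```

For larger games the same conditions form a linear feasibility problem per support pair. The difficulty is that the number of candidate support pairs grows exponentially with the number of actions.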
31
Approximate Nash
32
Lemke & Howson
33
Example