Published by Stacey Ludwick. Modified over 9 years ago.
2
Online learning and game theory
Adam Kalai (joint with Sham Kakade)
4
How do we learn?
Goal: learn a function $f: X \to Y$
Batch (offline) model:
  Get training data $(x_1,y_1),\dots,(x_n,y_n)$ drawn independently from some distribution $\mu$ over $X \times Y$
  We output $f: X \to Y$ with low error $\Pr_\mu[f(x) \ne y]$
Online (repeated game) model, for $i = 1,2,\dots,n$:
  Observe the $i$th example $x_i \in X$
  We predict its label
  Observe the true label $y_i \in \{-,+\}$
  Goal: make as few mistakes as possible
Distribution-free learning, $Y = \{-,+\}$
5
Outline
1. Online/batch learnability of $\mathcal{F}$
   Online learnability $\Rightarrow$ batch learnability
   Finite learning: batch and online (via weighted majority)
   Batch learnability $\Rightarrow$ online learnability?
2. Online learning in repeated games
   Zero-sum: weighted majority
   General-sum: no "internal regret" $\Rightarrow$ correlated equilibrium
6
Online learning
$X = \mathbb{R}^2$, $Y = \{-,+\}$
Adversary picks $(x_1,y_1) \in X \times Y$; we see $x_1$; we predict $z_1$; we see $y_1$
…
Adversary picks $(x_n,y_n) \in X \times Y$; we see $x_n$; we predict $z_n$; we see $y_n$
Online algorithm: $A(x_1,y_1,\dots,x_{i-1},y_{i-1},x_i) = z_i$
"Empirical error": $\mathrm{err}(A,\text{data}) = |\{i \mid z_i \ne y_i\}| / n$
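The protocol on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `online_error`, the toy learner `repeat_last_label`, and the one-dimensional inputs are assumptions, not part of the talk.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, str]  # (x, y) with y in {"-", "+"}

def online_error(predict: Callable[[List[Example], float], str],
                 data: Sequence[Example]) -> float:
    """Run the online protocol and return the empirical error (mistakes / n)."""
    history: List[Example] = []
    mistakes = 0
    for x, y in data:
        z = predict(history, x)   # z_i depends only on past pairs and x_i
        if z != y:
            mistakes += 1
        history.append((x, y))    # the true label y_i is revealed afterwards
    return mistakes / len(data)

def repeat_last_label(history: List[Example], x: float) -> str:
    """Toy learner (an assumption): predict the most recent true label."""
    return history[-1][1] if history else "+"

data = [(0.1, "+"), (0.4, "+"), (0.7, "-"), (0.9, "-")]
print(online_error(repeat_last_label, data))  # one mistake out of four: 0.25
```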
8
Batch learning
$X = \mathbb{R}^2$, $Y = \{-,+\}$
[Figure: points in the plane labeled + and -, drawn from a distribution $\mu$ over $X \times Y$; the learned $f$ splits the plane into a + region and a - region]
Data: $(x_1,y_1),\dots,(x_n,y_n)$
Learning algorithm $A$ outputs $f: X \to Y$
"Generalization error": $\mathrm{err}(f,\mu) = \Pr_\mu[f(x) \ne y]$
"Empirical error": $\mathrm{err}(f,\text{data}) = |\{i \mid f(x_i) \ne y_i\}| / n$
9
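Online/batch learnability of $\mathcal{F}$
Family $\mathcal{F}$ of functions $f: X \to Y$ ($Y = \{-,+\}$)
Algorithm $A$ learns $\mathcal{F}$ online if $\exists k, c > 0$ such that $\forall$ data: $\mathbb{E}[\mathrm{Regret}(A,\text{data})] \le k/n^c$
  Online input: data $(x_1,y_1),\dots,(x_n,y_n)$
  $\mathrm{Regret}(A,\text{data}) = \mathrm{err}(A,\text{data}) - \min_{g \in \mathcal{F}} \mathrm{err}(g,\text{data})$
Algorithm $B$ batch learns $\mathcal{F}$ if $\exists k, c > 0$ such that $\forall \mu$: $\mathbb{E}_{\text{data}}[\mathrm{Regret}(B,\mu)] \le k/n^c$
  Input: $(x_1,y_1),\dots,(x_n,y_n)$ drawn independently from a distribution $\mu$ over $X \times Y$
  Output: $f \in \mathcal{F}$, with $\mathrm{Regret}(f,\mu) = \mathrm{err}(f,\mu) - \min_{g \in \mathcal{F}} \mathrm{err}(g,\mu)$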
10
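Online learnable $\Rightarrow$ batch learnable
Given an online learning algorithm $A$, define a batch learning algorithm $B$:
  Input: $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$ drawn from a distribution $\mu$ over $X \times Y$
  Let $f_i: X \to Y$ be $f_i(x) = A(x_1,y_1,\dots,x_{i-1},y_{i-1},x)$
  Pick $i \in \{1,2,\dots,n\}$ at random and output $f_i$
Analysis:
  $\mathbb{E}[\mathrm{Regret}(A,\text{data})] = \mathbb{E}[\mathrm{err}(A,\text{data})] - \mathbb{E}[\min_{g \in \mathcal{F}} \mathrm{err}(g,\text{data})]$
  $\mathbb{E}[\mathrm{Regret}(B,\mu)] = \mathbb{E}[\mathrm{err}(B,\mu)] - \min_{g \in \mathcal{F}} \mathrm{err}(g,\mu)$
  Since $\mathbb{E}[\mathrm{err}(B,\mu)] = \mathbb{E}[\mathrm{err}(A,\text{data})]$ and $\mathbb{E}[\min_{g} \mathrm{err}(g,\text{data})] \le \min_{g} \mathrm{err}(g,\mu)$, we get $\mathbb{E}[\mathrm{Regret}(B,\mu)] \le \mathbb{E}[\mathrm{Regret}(A,\text{data})]$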
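The conversion can be sketched as follows. The calling convention `A(history, x)`, the toy learner `last_label`, and the explicit random generator are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, str]                      # (x, y) with y in {"-", "+"}
OnlineAlg = Callable[[List[Example], float], str]

def online_to_batch(A: OnlineAlg, data: Sequence[Example],
                    rng: random.Random) -> Callable[[float], str]:
    """Pick i uniformly from {1, ..., n}; return f_i(x) = A(first i-1 examples, x)."""
    i = rng.randrange(1, len(data) + 1)
    prefix = list(data[: i - 1])                 # the first i-1 labeled examples
    return lambda x: A(prefix, x)

def last_label(history: List[Example], x: float) -> str:
    """Toy online learner (an assumption): repeat the most recent true label."""
    return history[-1][1] if history else "+"

data = [(0.1, "+"), (0.2, "+"), (0.3, "+")]
f = online_to_batch(last_label, data, random.Random(0))
print(f(0.9))   # "+" for every choice of i, since all labels are "+"
```

The returned hypothesis freezes the online algorithm's state after a random prefix, which is what makes its expected generalization error equal the online algorithm's expected error rate on i.i.d. data.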
12
Outline
1. Online/batch learnability of $\mathcal{F}$
   Online learnability $\Rightarrow$ batch learnability
   Finite learning: batch and online (via weighted majority)
   Batch learnability $\not\Rightarrow$ online learnability
   Batch learnability $\Rightarrow$ transductive online learnability
2. Online learning in repeated games
   Zero-sum: weighted majority $\Rightarrow$ equilibrium
   General-sum: no "internal regret" $\Rightarrow$ correlated equilibrium
13
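Online majority algorithm
Say there is some perfect $f^* \in \mathcal{F}$ with $\mathrm{err}(f^*,\text{data}) = 0$, and say $|\mathcal{F}| = F$
Predict according to the majority of the consistent ("live") $f$'s
Each mistake Maj makes eliminates $\ge \tfrac12$ of the live $f$'s
Maj's #mistakes $\le \log_2(F)$, so $\mathrm{err}(\text{Maj},\text{data}) \le \log_2(F)/n$
[Table: predictions of $f_1, f_2, f_3, \dots, f_F$ on $x_1, x_2, x_3, \dots, x_n$, with the perfect $f \in \mathcal{F}$ marked, next to the (live) majority vote and the truth $y$]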
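A minimal sketch of the majority algorithm, assuming a small hypothetical family of threshold functions that contains a perfect predictor. Ties are broken toward "+", which is a choice made here, not something the slide specifies.

```python
import math

def majority_mistakes(functions, data):
    """Predict with the majority of still-consistent functions; return #mistakes."""
    live = list(functions)
    mistakes = 0
    for x, y in data:
        votes = sum(1 if f(x) == "+" else -1 for f in live)
        z = "+" if votes >= 0 else "-"           # tie goes to "+"
        if z != y:
            mistakes += 1                        # at least half of live was wrong
        live = [f for f in live if f(x) == y]    # drop inconsistent functions
    return mistakes

# Hypothetical family: 9 thresholds, f_c(x) = "+" iff x >= c.
thresholds = [i / 8 for i in range(9)]
functions = [lambda x, c=c: "+" if x >= c else "-" for c in thresholds]
data = [(0.5, "-"), (0.8, "+"), (0.6, "-"), (0.7, "+")]   # c = 0.625 is perfect

m = majority_mistakes(functions, data)
print(m, "<=", math.log2(len(functions)))
```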
14
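Naive batch learning
Say there is some perfect $f^* \in \mathcal{F}$ with $\mathrm{err}(f^*,\text{data}) = 0$, and say $|\mathcal{F}| = F$
Select a consistent $f$
Say $\forall g \ne f^*$: $\mathrm{err}(g,\mu) = \log(F)/n$. Then $\Pr[\mathrm{err}(g,\text{data}) = 0] = (1 - \log(F)/n)^n \approx 1/F$
Wow! Online looks like batch.
[Table: predictions of $f_1, f_2, f_3, \dots, f_F$ on $x_1, x_2, x_3, \dots, x_n$, with the perfect $f \in \mathcal{F}$ marked, next to the truth $y$]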
15
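Naive batch learning ($F = |\mathcal{F}|$)
Naive batch algorithm (n.b.): choose $f \in \mathcal{F}$ that minimizes $\mathrm{err}(f,\text{data})$
For any $f \in \mathcal{F}$ and $\epsilon > 0$: $\Pr[|\mathrm{err}(f,\text{data}) - \mathrm{err}(f,\mu)| > \epsilon] \le 2e^{-2n\epsilon^2}$
With $\epsilon = 10\sqrt{\ln(F)/n}$: $\Pr[\exists f \in \mathcal{F}: |\mathrm{err}(f,\text{data}) - \mathrm{err}(f,\mu)| > \epsilon] \le 2F e^{-200 \ln F} \le 2^{-100}$
$\mathbb{E}[\mathrm{Regret}(\text{n.b.},\mu)] \le c\sqrt{\ln(F)/n}$ for a constant $c$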
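The naive batch algorithm is empirical error minimization over a finite family. A minimal sketch, assuming a hypothetical family of five threshold functions and a tiny hand-made sample:

```python
def empirical_error(f, data):
    """Fraction of examples (x, y) on which f(x) != y."""
    return sum(f(x) != y for x, y in data) / len(data)

def naive_batch(functions, data):
    """Return a function in the family with the lowest empirical error."""
    return min(functions, key=lambda f: empirical_error(f, data))

# Hypothetical family: f_c(x) = "+" iff x > c, for five thresholds c.
functions = [lambda x, c=c: "+" if x > c else "-" for c in (0.1, 0.3, 0.5, 0.7, 0.9)]
data = [(0.2, "-"), (0.4, "-"), (0.6, "+"), (0.8, "+")]

best = naive_batch(functions, data)
print(empirical_error(best, data))   # the threshold c = 0.5 fits this sample exactly
```

Enumerating the family costs time linear in $F$, which is fine for a sketch but is exactly the efficiency concern for large families.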
16
Weighted majority' [LW89] ($F = |\mathcal{F}|$)
Assign a weight to each $f$: $\forall f \in \mathcal{F}$, $w(f) = 1$
On period $i = 1,2,\dots,n$:
  Predict the weighted majority of the $f$'s
  For each $f$: if $f(x_i) \ne y_i$, set $w(f) := w(f)/2$
WM' errs $\Rightarrow$ total weight decreases by $\ge 25\%$
Final total weight $\le F\,(3/4)^{\#\text{mistakes}(\mathrm{WM}')}$
Final total weight $\ge 2^{-\min_f \#\text{mistakes}(f)}$
$\#\text{mistakes}(\mathrm{WM}') \le 2.41\,(\min_f \#\text{mistakes}(f) + \log_2 F)$
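The constant 2.41 comes from combining the two bounds on the final total weight. Writing $M$ for the number of mistakes of WM' and $m$ for the number of mistakes of the best $f$ (both shorthand introduced here), and noting that a mistake by WM' means at least half of the weight was wrong and is halved, so the total shrinks by a factor of at most $3/4$:

```latex
\begin{align*}
2^{-m} \;\le\; W_{\text{final}} \;&\le\; F \left(\tfrac{3}{4}\right)^{M} \\
-m \;&\le\; \log_2 F - M \log_2 \tfrac{4}{3} \\
M \;&\le\; \frac{m + \log_2 F}{\log_2 (4/3)} \;\approx\; 2.41\,(m + \log_2 F)
\end{align*}
```

This is a bound on mistake counts; dividing both sides by $n$ gives the corresponding statement for error rates.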
17
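Weighted majority [LW89] ($F = |\mathcal{F}|$)
Assign a weight to each $f$: $\forall f \in \mathcal{F}$, $w(f) = 1$
On period $i = 1,2,\dots,n$:
  Predict the weighted majority of the $f$'s
  For each $f$: if $f(x_i) \ne y_i$, set $w(f) := w(f)(1 - \epsilon)$
Thm: $\mathbb{E}[\mathrm{Regret}(\mathrm{WM},\text{data})] \le 2\sqrt{\ln(F)/n}$
Wow! Online looks like batch.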
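A sketch of the multiplicative update, under two loudly stated assumptions: it uses the randomized variant (follow $f$ with probability proportional to $w(f)$), which is the version for which an expected-regret bound of the form $2\sqrt{\ln(F)/n}$ is standard, and it sets $\epsilon = \sqrt{\ln(F)/n}$. The expectation is computed exactly rather than sampled, so the run is deterministic. All names are illustrative.

```python
import math

def wm_expected_regret(predictions, labels, eps):
    """predictions[f][i] is function f's label on round i.

    Returns the expected per-round regret of randomized weighted majority,
    which follows f with probability w(f) / sum(w) and multiplies the weight
    of every f that errs by (1 - eps).
    """
    F, n = len(predictions), len(labels)
    w = [1.0] * F
    expected_mistakes = 0.0
    for i, y in enumerate(labels):
        wrong = [predictions[f][i] != y for f in range(F)]
        expected_mistakes += sum(w[f] for f in range(F) if wrong[f]) / sum(w)
        w = [w[f] * (1 - eps) if wrong[f] else w[f] for f in range(F)]
    best = min(sum(p != y for p, y in zip(predictions[f], labels)) for f in range(F))
    return (expected_mistakes - best) / n

F, n = 4, 16
labels = ["+" if i % 2 == 0 else "-" for i in range(n)]
predictions = [
    labels[:],                                    # always right
    ["+"] * n,
    ["-"] * n,
    ["-" if y == "+" else "+" for y in labels],   # always wrong
]
eps = math.sqrt(math.log(F) / n)
regret = wm_expected_regret(predictions, labels, eps)
bound = 2 * math.sqrt(math.log(F) / n)
print(regret <= bound)
```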
18
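Weighted majority extensions…
Tracking: on any window $W$, $\mathbb{E}[\mathrm{Regret}(\mathrm{WM},W)] \le c\,\ldots$
[Table: predictions of $f_1, f_2, f_3, \dots, f_F$ and of WM on $x_1, x_2, x_3, \dots, x_n$ next to the truth $y$, with a window $W$ of consecutive rounds marked]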
19
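Weighted majority extensions…
Multi-armed bandit:
  You don't see $x_i$
  You pick $f$
  Find out if you erred
  $\mathbb{E}[\mathrm{Regret}] \le c\,\ldots$
[Table: predictions of $f_1, f_2, f_3, \dots, f_F$ and of WM on $x_1, x_2, x_3, \dots, x_n$ next to the truth $y$, with most entries hidden]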
20
Outline
1. Online/batch learnability of $\mathcal{F}$
   Online learnability $\Rightarrow$ batch learnability
   Finite learning: batch and online (via weighted majority)
   Batch learnability $\not\Rightarrow$ online learnability
   Batch learnability $\Rightarrow$ (transductive) online learnability
2. Online learning in repeated games
   Zero-sum: weighted majority $\Rightarrow$ equilibrium
   General-sum: no "internal regret" $\Rightarrow$ correlated equilibrium
21
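Batch $\not\Rightarrow$ Online
Define $f_c: [0,1] \to \{+,-\}$, $f_c(x) = \mathrm{sgn}(x - c)$
Simple threshold functions $\mathcal{F} = \{f_c \mid c \in [0,1]\}$
Batch learnable: Yes
Online learnable: No!
Adversary does a "random binary search":
  Each label is equally likely to be +/-
  $\mathbb{E}[\mathrm{Regret}] = \tfrac12$ for any online algorithm
[Figure: the interval from 0 to 1 with $x_1 = .5$ labeled +, $x_2$ labeled -, $x_3$ labeled +, $x_4$ labeled -, and $x_5$ as the next query point]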
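The "random binary search" adversary can be sketched as below. The function name and the returned witness threshold are illustrative assumptions. Each label is a fair coin flip independent of the past, so any online algorithm errs with probability ½ per round, yet some threshold is always perfectly consistent with the whole sequence.

```python
import random

def binary_search_adversary(n: int, rng: random.Random):
    """Generate n labeled points that some threshold f_c labels perfectly.

    x_i is the midpoint of the interval of thresholds still consistent with
    the labels so far; y_i is a fair coin flip.
    """
    lo, hi = 0.0, 1.0
    data = []
    for _ in range(n):
        x = (lo + hi) / 2
        y = rng.choice("+-")
        if y == "+":      # sgn(x - c) = + forces c < x
            hi = x
        else:             # sgn(x - c) = - forces c > x
            lo = x
        data.append((x, y))
    return data, (lo + hi) / 2   # any c strictly inside (lo, hi) is consistent

data, c = binary_search_adversary(20, random.Random(1))
consistent = all(("+" if x > c else "-") == y for x, y in data)
print(consistent)
```

Since the best threshold makes zero mistakes while the learner makes about $n/2$ in expectation, the expected regret stays at ½ no matter how large $n$ is.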
22
Key idea: transductive online learning [KakadeK05]
We see $x_1,x_2,\dots,x_n \in X$ in advance
$y_1,y_2,\dots,y_n \in \{+,-\}$ are revealed online
[Figure: the interval from 0 to 1 with $x_1 = .5$ labeled +, $x_2$ labeled -, $x_3$ labeled +, $x_4$ labeled -]
23
Key idea: transductive online learning [KakadeK05]
$X = \mathbb{R}^2$, $Y = \{-,+\}$
Adversary picks $(x_1,y_1),\dots,(x_n,y_n) \in X \times Y$
Adversary reveals $x_1,x_2,\dots,x_n$
We predict $z_1$; we see $y_1$
We predict $z_2$; we see $y_2$
…
We predict $z_n$; we see $y_n$
Transductive online algorithm: $T(x_1,y_1,\dots,x_{i-1},y_{i-1},x_i,x_{i+1},\dots,x_n) = z_i$
"Empirical error": $\mathrm{err}(A,\text{data}) = |\{i \mid z_i \ne y_i\}| / n$
24
Algorithm for transductive online learning [KK05]
We see $x_1,x_2,\dots,x_n \in X$ in advance
$y_1,y_2,\dots,y_n \in \{+,-\}$ are revealed online
There are $L$ distinct labelings $f(x_1),f(x_2),\dots,f(x_n)$ over all $f \in \mathcal{F}$
The effective size of $\mathcal{F}$ is $L$
Run WM on the $L$ functions: $\mathbb{E}[\mathrm{Regret}(\mathrm{WM},\text{data})] \le 2\sqrt{\ln(L)/n}$
[Table: the labelings of $x_1, x_2, x_3, \dots, x_n$ produced by $f_1, f_2, f_3, \dots$, with duplicate columns merged]
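Collapsing a large family to its distinct labelings of the known points is the step that makes weighted majority applicable. A minimal sketch, assuming a hypothetical grid of 101 threshold functions and three known points:

```python
def distinct_labelings(functions, xs):
    """Return the set of label vectors (f(x_1), ..., f(x_n)) realized by some f."""
    return {tuple(f(x) for x in xs) for f in functions}

# Hypothetical family: f_c(x) = "+" iff x > c, for c on a grid of 101 thresholds.
thresholds = [k / 100 for k in range(101)]
functions = [lambda x, c=c: "+" if x > c else "-" for c in thresholds]
xs = [0.2, 0.5, 0.9]                       # all inputs are known in advance

experts = sorted(distinct_labelings(functions, xs))
print(len(functions), "functions,", len(experts), "distinct labelings")
```

Each surviving label vector then acts as one "function" for weighted majority, so the regret depends on $L$ rather than on the size of the original family.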
25
Candidate efficient algorithm
[Table: predictions of $f_1, f_2, f_3, \dots$ on $x_1, x_2, x_3, \dots, x_n$ next to the true $y$, with one row of labels marked "(random)"]
26
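How many labelings? Shattering & VC
Def: $S \subseteq X$ is shattered if there are $2^{|S|}$ ways to label $S$ by $f \in \mathcal{F}$
$\mathrm{VC}(\mathcal{F}) = \max\{|S| : S \text{ is shattered by } \mathcal{F}\}$
VC dimension captures the complexity of $\mathcal{F}$
[Figure: example of a small point set shown with several different +/- labelings]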
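On a finite domain both definitions can be checked by brute force. The sketch below uses two hypothetical families (thresholds and intervals over the domain {1, 2, 3}) and is exponential in the domain size, so it is only meant to make the definitions concrete.

```python
from itertools import combinations

def is_shattered(S, functions):
    """S is shattered if the functions realize all 2^|S| labelings of S."""
    seen = {tuple(f(x) for x in S) for f in functions}
    return len(seen) == 2 ** len(S)

def vc_dimension(domain, functions):
    """Largest size of a shattered subset of a finite domain."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(S, functions) for S in combinations(domain, k)):
            d = k
        else:
            break   # every subset of a shattered set is shattered, so stop here
    return d

domain = [1, 2, 3]
thresholds = [lambda x, c=c: x > c for c in (0.5, 1.5, 2.5, 3.5)]
intervals = [lambda x, a=a, b=b: a <= x <= b
             for a in range(5) for b in range(a, 5)]

print(vc_dimension(domain, thresholds), vc_dimension(domain, intervals))
```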
27
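How many labelings? Shattering & VC
Sauer's lemma: # labelings $L = O(n^{\mathrm{VC}(\mathcal{F})})$
$\Rightarrow \mathbb{E}[\mathrm{Regret}(\mathrm{WM},\text{data})] = O\!\left(\sqrt{\mathrm{VC}(\mathcal{F}) \log(n) / n}\right)$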
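To see how the labeling count feeds into the regret, the sketch below evaluates the usual explicit form of Sauer's lemma, $L \le \sum_{i \le d} \binom{n}{i}$ with $d = \mathrm{VC}(\mathcal{F})$, and plugs it into a weighted-majority bound of the form $2\sqrt{\ln(L)/n}$. The second formula is an assumption here (the standard randomized weighted-majority rate); the function names are illustrative.

```python
import math

def sauer_bound(n: int, d: int) -> int:
    """Upper bound on the number of labelings of n points by a class of VC dimension d."""
    return sum(math.comb(n, i) for i in range(d + 1))

def wm_regret_bound(n: int, d: int) -> float:
    """Assumed weighted-majority rate 2 * sqrt(ln(L) / n) with L from Sauer's lemma."""
    return 2 * math.sqrt(math.log(sauer_bound(n, d)) / n)

print(sauer_bound(3, 1))                                  # thresholds on 3 points: 4 labelings
print(wm_regret_bound(100, 2), wm_regret_bound(1000, 2))  # the bound shrinks as n grows
```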
28
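Cannot batch learn faster than $\mathrm{VC}(\mathcal{F})$
Shattered set $S$, $|S| = \mathrm{VC}(\mathcal{F})$, $n > 0$
Batch training set of size $n$, drawn from a distribution $\mu$ over $X \times Y$
Each $x \in S$ is not in the training set with probability $(1 - 1/n)^n \approx e^{-1}$
$\Rightarrow \mathbb{E}[\mathrm{Regret}(B,\mu)] \ge c\,\mathrm{VC}(\mathcal{F})/n$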
29
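Putting it together
Transductive online: $\mathbb{E}[\mathrm{Regret}(\mathrm{WM},\text{data})] = O\!\left(\sqrt{\mathrm{VC}(\mathcal{F}) \log(n) / n}\right)$
Batch: $\mathbb{E}[\mathrm{Regret}(B,\mu)] \ge c\,\mathrm{VC}(\mathcal{F})/n$
Transductive online learnable $\Leftrightarrow$ batch learnable $\Leftrightarrow$ finite $\mathrm{VC}(\mathcal{F})$
Almost identical to the standard VC bound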
30
Learnability conclusions
Finite $\mathrm{VC}(\mathcal{F})$ characterizes batch and transductive online learnability
Open problem: what property of $\mathcal{F}$ characterizes (non-transductive) online learnability?
Efficiency!?
  The WM algorithm requires enumeration of $\mathcal{F}$
  Thm [KK05]: if one can efficiently find the lowest-error $f \in \mathcal{F}$, then one can design an efficient online learning algorithm
31
Online learning in repeated games
32
Repeated games
Example: Rock-Paper-Scissors (payoff to Pl. 1, payoff to Pl. 2)
                 Pl. 2
             R       P       S
Pl. 1   R   0,0    -1,1    1,-1
        P   1,-1    0,0   -1,1
        S  -1,1    1,-1    0,0
Rounds $i = 1,2,\dots,n$:
  Players simultaneously choose actions
  Players receive payoffs; goal: maximize total payoff
Learning: players need not know the opponent or the game
Feedback: a player only finds out the payoff of his action and of the alternatives (not the opponent's action)
33
(Mixed) Nash Equilibrium
Each player chooses a distribution over actions
Players are optimizing relative to their opponent(s)
[Table: the Rock-Paper-Scissors payoff matrix, with probability 1/3 on each action]
34
Online learning in 0-sum games (Schapire recap)
Payoff is $A(i,j)$ for pl. 1, $-A(i,j)$ for pl. 2
Going first is a disadvantage: $\max_i \min_j A(i,j) \le \min_j \max_i A(i,j)$
Mixed strategies $p$ (pl. 1) and $q$ (pl. 2): $\max_p \min_q A(p,q) \le \min_q \max_p A(p,q)$
Min-max theorem: "="
35
Online learning in 0-sum games (Schapire recap)
Each player uses weighted majority:
  Maintain a weight on each action, initially equal
  Choose an action with probability proportional to its weight
  Find out the payoffs of each action (assume payoffs are in $[-1,1]$)
  For each action: weight $\leftarrow$ weight $\cdot\,(1 + \epsilon \cdot \text{payoff})$
Regret = possible improvement (in hindsight) from always playing a single action
WM $\Rightarrow$ regret is low
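The update rule can be tried out in self-play on Rock-Paper-Scissors. This sketch makes several assumptions that are not in the slides: the step size `eps = 0.05`, updating on expected payoffs against the opponent's current mixed strategy (so the run is deterministic instead of sampling actions), and unequal starting weights for player 2, chosen only so the dynamics do not sit at the uniform fixed point from the first round.

```python
# Payoff to player 1; rows and columns are ordered Rock, Paper, Scissors.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def normalize(w):
    total = sum(w)
    return [v / total for v in w]

def self_play(A, rounds=2000, eps=0.05, w1=(1, 1, 1), w2=(2, 1, 1)):
    """Both players update weight <- weight * (1 + eps * payoff) each round.

    Returns the time-averaged mixed strategies of player 1 and player 2.
    """
    m, k = len(A), len(A[0])
    w1, w2 = [float(v) for v in w1], [float(v) for v in w2]
    avg_p, avg_q = [0.0] * m, [0.0] * k
    for _ in range(rounds):
        p, q = normalize(w1), normalize(w2)
        g1 = [sum(A[i][j] * q[j] for j in range(k)) for i in range(m)]    # pl. 1 payoffs
        g2 = [-sum(A[i][j] * p[i] for i in range(m)) for j in range(k)]   # pl. 2 payoffs
        w1 = [w * (1 + eps * g) for w, g in zip(w1, g1)]
        w2 = [w * (1 + eps * g) for w, g in zip(w2, g2)]
        avg_p = [a + x / rounds for a, x in zip(avg_p, p)]
        avg_q = [a + x / rounds for a, x in zip(avg_q, q)]
    return avg_p, avg_q

avg_p, avg_q = self_play(A)
# Best fixed response of player 1 against player 2's average play.
exploit = max(sum(A[i][j] * avg_q[j] for j in range(3)) for i in range(3))
print(avg_p, avg_q, exploit)
```

With these settings the regret of each player is small, so the amount player 1 could gain by best-responding to player 2's average play stays close to the game's value of 0.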
36
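Online learning in 0-sum games (Schapire recap)
Actions are $(a_1,b_1),(a_2,b_2),\dots,(a_n,b_n)$
Regret of pl. 1 is $\max_a \frac{1}{n}\sum_{i=1}^n A(a,b_i) - \frac{1}{n}\sum_{i=1}^n A(a_i,b_i)$
Let $\hat{p}, \hat{q}$ be the empirical distributions of actions $a_1,\dots,a_n$ and $b_1,\dots,b_n$, respectively
Then $\max_a \frac{1}{n}\sum_{i=1}^n A(a,b_i) = \max_a A(a,\hat{q})$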
37
WM $\Rightarrow$ "min-max" theorem
$\max_p \min_q A(p,q) = \min_q \max_p A(p,q)$ = "value of game"
Using WM, each player guarantees regret $\to 0$, regardless of the opponent
  Can beat an idiot in tic-tac-toe
  Reasonable strategy to use
Justifies how such equilibria might arise
38
Justifying equilibrium in games
Online learning gives a plausible explanation for how equilibrium might arise
Nash equilibrium:
  Zero-sum games
    Unique value
    Fast and easy to "learn"
  General-sum games
    Not unique
    Fast and easy to "learn"??
    Polynomial time algorithm to find one??
39
General-sum games
No unique "value"
Many very different equilibria
Can't naively improve a "no regret" algorithm (by playing a single mixed strategy)
Low regret for both players $\not\Rightarrow$ equilibrium
[Table: example 2×2 game with actions A and B for each player; visible payoffs are 1,1, 0,0 and 2,2]
40
General-sum games
Low regret $\not\Rightarrow$ Nash equilibrium, e.g., the play sequence (1,1),(2,2),(1,1),(2,2),(1,1),(2,2),…
[Table: 4×4 general-sum game with actions 1 to 4 for each player]
41
Refined notion of regret
Can't naively improve a "no regret" algorithm (by playing a single mixed strategy)
Might be able to naively improve it by replacing: "When alg. suggests 1, play 3"
[Table: 4×4 general-sum game with actions 1 to 4 for each player]
42
Internal regret
Internal regret $\mathrm{IR}(i,j)$ is how much we could have improved by replacing all occurrences of action $i$ with action $j$
No internal regret $\Rightarrow$ correlated equilibrium
Calibration $\Rightarrow$ correlated equilibrium [FosterVohra]
Correlated equilibrium [Aumann]: best strategy is to listen to the fish
[Figure: 3×3 game in which a signal tells one player "Play row 1" and the other "Play col. 2"; the cells in the support each have probability 1/6]
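Internal regret is easy to compute from a play history. The sketch below normalizes by the number of rounds and uses a Rock-Paper-Scissors history as the example; both the normalization and the example are assumptions made for illustration.

```python
# Payoff to player 1; actions 0, 1, 2 stand for Rock, Paper, Scissors.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def internal_regret(my_actions, opp_actions, payoff, i, j):
    """Average gain from replacing every occurrence of action i with action j."""
    n = len(my_actions)
    return sum(payoff[j][b] - payoff[a][b]
               for a, b in zip(my_actions, opp_actions) if a == i) / n

mine = [0, 1, 0, 1]     # Rock, Paper, Rock, Paper
theirs = [1, 2, 1, 2]   # Paper, Scissors, Paper, Scissors

# "When the algorithm suggests Rock, play Scissors instead."
print(internal_regret(mine, theirs, A, i=0, j=2))   # 1.0
```

Here Rock loses to Paper twice (payoff -1 each time) while Scissors would have won (+1), a gain of 2 on two of the four rounds.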
43
Low internal regret $\to$ correlated eq.
Sequence like (1,1),(2,1),(3,2),…
Think about this as a distribution $P$
No internal regret $\Leftrightarrow$ correlated eq.
[Figure: the same 3×3 game with "Play row 1" / "Play col. 2" signals and probability 1/6 on each cell in the support of $P$]
44
Online learning in games: conclusions
Online learning in zero-sum games
  Weighted majority (low regret)
  Achieves the value of the game
Online learning in general-sum games
  Low internal regret
  Achieves correlated equilibrium
Open problems
  Are there natural dynamics $\Rightarrow$ Nash equilibrium?
  Is correlated equilibrium going to take over?