Online learning and game theory Adam Kalai (joint with Sham Kakade)

How do we learn? Goal: learn a function f: X → Y (distribution-free learning, Y = {–,+}).
Batch (offline) model:
- Get training data (x_1,y_1),…,(x_n,y_n) drawn independently from some distribution D over X × Y
- We output f: X → Y with low error Pr_D[f(x) ≠ y]
Online (repeated game) model, for i = 1,2,…,n:
- Observe the ith example x_i ∈ X
- We predict its label
- Observe the true label y_i ∈ {–,+}
- Goal: make as few mistakes as possible

Outline
1. Online/batch learnability of F
- Online learnability ⇒ batch learnability
- Learning finite F: batch and online (via weighted majority)
- Batch learnability ⇒? online learnability
2. Online learning in repeated games
- Zero-sum: weighted majority
- General-sum: no "internal regret" ⇒ correlated eq.

Online learning (X = ℝ², Y = {–,+})
Online algorithm A(x_1,y_1,…,x_{i−1},y_{i−1},x_i) = z_i
- Adversary picks (x_1,y_1) ∈ X × Y; we see x_1, we predict z_1, we see y_1
- …
- Adversary picks (x_n,y_n) ∈ X × Y; we see x_n, we predict z_n, we see y_n
"Empirical error" err(A,data) = |{i : z_i ≠ y_i}| / n
(figure: points in the plane labeled – and +, with a new point to predict)

Batch learning (X = ℝ², Y = {–,+})
Distribution D over X × Y; data: (x_1,y_1),…,(x_n,y_n); a learning algorithm A outputs f: X → Y.
"Generalization error" err(f,D) = Pr_D[f(x) ≠ y]
"Empirical error" err(f,data) = |{i : f(x_i) ≠ y_i}| / n
(figure: labeled training points in the plane and the learned classifier)

Online/batch learnability of F
Family F of functions f: X → Y (Y = {–,+}); distribution D over X × Y.
Alg. A learns F online if ∃ k, c > 0 such that for all data sequences, E[Regret(A,data)] ≤ k/n^c, where
- online input: data (x_1,y_1),…,(x_n,y_n)
- Regret(A,data) = err(A,data) − min_{g∈F} err(g,data)
Alg. B batch learns F if ∃ k, c > 0 such that for every D, E_data[Regret(B,D)] ≤ k/n^c, where
- input: (x_1,y_1),…,(x_n,y_n) drawn independently from D
- output: f ∈ F, with Regret(f,D) = err(f,D) − min_{g∈F} err(g,D)

Online learnable ⇒ Batch learnable
Given an online learning algorithm A, define a batch learning algorithm B:
- Input: (x_1,y_1),(x_2,y_2),…,(x_n,y_n) drawn from D
- Let f_i: X → Y be f_i(x) = A(x_1,y_1,…,x_{i−1},y_{i−1},x)
- Pick i ∈ {1,2,…,n} at random and output f_i
Analysis: E[Regret(B,D)] = E[err(B,D)] − min_{g∈F} err(g,D) ≤ E[err(A,data)] − E[min_{g∈F} err(g,data)] = E[Regret(A,data)],
since E[err(B,D)] = E[err(A,data)] (a random f_i faces a fresh example exactly as A does) and E[min_{g∈F} err(g,data)] ≤ min_{g∈F} err(g,D).
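
As an illustration, here is a minimal Python sketch of this online-to-batch conversion; the interface online_alg(history, x) and the uniform sampling of a single f_i are assumptions of the sketch, not details from the slides.

```python
import random

def online_to_batch(online_alg, data):
    """Online-to-batch conversion sketch.  `online_alg(history, x)` is an
    assumed interface: it returns the online algorithm's prediction for x
    after seeing the labeled `history`.  We form the hypotheses f_1,...,f_n
    the online algorithm would use at each step and return one of them
    uniformly at random, as on the slide."""
    hypotheses = []
    for i in range(len(data)):
        history = data[:i]  # (x_1,y_1),...,(x_{i-1},y_{i-1})
        hypotheses.append(lambda x, h=history: online_alg(h, x))
    return random.choice(hypotheses)
```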

Outline
1. Online/batch learnability of F
- Online learnability ⇒ batch learnability
- Learning finite F: batch and online (via weighted majority)
- Batch learnability ⇏ online learnability
- Batch learnability ⇒ transductive online learnability
2. Online learning in repeated games
- Zero-sum: weighted majority ⇒ eq.
- General-sum: no "internal regret" ⇒ correlated eq.

Online majority algorithm
Say there is some perfect f* ∈ F with err(f*,data) = 0, and |F| = F.
- Predict according to the majority of the "live" (still-consistent) f's
- Each mistake Maj makes eliminates ≥ ½ of the live f's
- Maj's #mistakes ≤ log₂(F), so err(Maj,data) ≤ log₂(F)/n
(table: predictions of f_1, f_2, …, f_F on x_1,…,x_n vs. the true labels)
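
A minimal sketch of this halving/majority rule in Python, assuming labels in {−1,+1} and a finite list of candidate functions:

```python
def halving(functions, xs, ys):
    """Majority (halving) algorithm for the realizable case: predict by
    majority vote over the functions still consistent with the data, then
    drop every function that mislabeled the example.  Each mistake wipes
    out at least half of the live functions, so mistakes <= log2(|F|)."""
    live = list(functions)
    mistakes = 0
    for x, y in zip(xs, ys):
        votes = sum(f(x) for f in live)
        prediction = +1 if votes >= 0 else -1
        mistakes += (prediction != y)
        # keep only the functions that labeled this example correctly
        live = [f for f in live if f(x) == y]
    return mistakes
```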

Naive batch learning
Say there is some perfect f* ∈ F with err(f*,data) = 0, and |F| = F.
- Select any consistent f
- Say every g ≠ f* has err(g,D) = log(F)/n; then P[err(g,data) = 0] = (1 − log(F)/n)^n ≈ 1/F
Wow! Online looks like batch.
(table: predictions of f_1, f_2, …, f_F on x_1,…,x_n vs. the true labels)

Naive batch learning
Naive batch algorithm: choose f ∈ F that minimizes err(f,data). (F = |F|)
- For any f ∈ F, P[|err(f,data) − err(f,D)| > ε] ≤ 2e^{−2nε²} (Hoeffding)
- Taking ε = 10√(ln(F)/n): P[∃ f ∈ F with |err(f,data) − err(f,D)| > ε] ≤ 2F·e^{−200·ln F}
- ⇒ E[Regret(naive batch, D)] ≤ c·√(log(F)/n)

Weighted majority′ (WM′) [LW89]
Assign a weight to each f: ∀ f ∈ F, w(f) = 1. On period i = 1,2,…,n:
- Predict the weighted majority of the f's
- For each f: if f(x_i) ≠ y_i, then w(f) := w(f)/2
Analysis (F = |F|):
- WM′ errs ⇒ total weight decreases by ≥ 25%
- Final total weight ≤ F·(3/4)^{#mistakes(WM′)}
- Final total weight ≥ 2^{−min_f #mistakes(f)}
- ⇒ #mistakes(WM′) ≤ 2.41·(min_f #mistakes(f) + log₂ F)

Weighted majority (WM) [LW89]
Assign a weight to each f: ∀ f ∈ F, w(f) = 1. On period i = 1,2,…,n:
- Predict the weighted majority of the f's
- For each f: if f(x_i) ≠ y_i, then w(f) := w(f)·(1 − ε)
Thm: E[Regret(WM,data)] ≤ 2√(log(F)/n) (F = |F|)
Wow! Online looks like batch.
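
A small Python sketch of the (1 − ε) update, assuming labels in {−1,+1}; the randomized weighted vote is one standard way to realize the expected-regret guarantee and is an assumption of the sketch:

```python
import random

def weighted_majority(functions, xs, ys, eps=0.1):
    """Weighted majority sketch: every f starts with weight 1 and is
    multiplied by (1 - eps) each time it errs.  The prediction is a
    randomized vote weighted by the functions' current weights."""
    weights = [1.0] * len(functions)
    errors = 0
    for x, y in zip(xs, ys):
        plus = sum(w for f, w in zip(functions, weights) if f(x) == +1)
        total = sum(weights)
        prediction = +1 if random.random() < plus / total else -1
        errors += (prediction != y)
        weights = [w * (1 - eps) if f(x) != y else w
                   for f, w in zip(functions, weights)]
    return errors / len(xs)
```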

Weighted majority extensions…
Tracking: on any window W, E[Regret(WM,W)] ≤ c·(…)
(table: predictions of f_1,…,f_F and WM vs. the true labels, with a window W highlighted)

Weighted majority extensions…
Multi-armed bandit:
- You don't see x_i
- You pick an f
- You find out only whether you erred
- E[Regret] ≤ c·(…)
(table: bandit feedback, where only the chosen function's outcome is revealed)

Outline
1. Online/batch learnability of F
- Online learnability ⇒ batch learnability
- Learning finite F: batch and online (via weighted majority)
- Batch learnability ⇏ online learnability
- Batch learnability ⇒ (transductive) online learnability
2. Online learning in repeated games
- Zero-sum: weighted majority ⇒ eq.
- General-sum: no "internal regret" ⇒ correlated eq.

Batch ⇏ Online
Simple threshold functions: define f_c: [0,1] → {+,–} by f_c(x) = sgn(x − c), and let F = {f_c | c ∈ [0,1]}.
Batch learnable: yes. Online learnable: no!
- The adversary does a "random binary search" (x_1 = .5, then recurse on the half consistent with the labels so far)
- Each label is equally likely to be +/–
- So E[Regret] = ½ for any online algorithm
(figure: binary-search queries x_1, x_2, x_3, x_4, x_5 with random labels)
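
A sketch of that adversary in Python (the learner interface learner_predict(history, x) is an assumption): the labels are fair coin flips, so any learner errs half the time in expectation, yet some threshold f_c remains consistent with every label, giving regret about ½.

```python
import random

def threshold_adversary(n, learner_predict):
    """Random-binary-search adversary for thresholds f_c(x) = sgn(x - c)
    on [0,1].  Labels are coin flips independent of the learner's guess,
    but the interval [lo, hi] of consistent thresholds is kept nonempty."""
    lo, hi = 0.0, 1.0
    history, mistakes = [], 0
    for _ in range(n):
        x = (lo + hi) / 2
        z = learner_predict(history, x)   # learner's guess in {-1, +1}
        y = random.choice([-1, +1])       # label is a fair coin flip
        mistakes += (z != y)
        # y = +1 means x >= c (keep c <= x); y = -1 means c > x
        if y == +1:
            hi = x
        else:
            lo = x
        history.append((x, y))
    return mistakes / n
```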

Key idea: transductive online learning [KakadeK05]
We see x_1, x_2, …, x_n ∈ X in advance; y_1, y_2, …, y_n ∈ {+,–} are revealed online.
(figure: the binary-search example, now with all query points known up front)

Key idea: transductive online learning [KakadeK05] (X = ℝ², Y = {–,+})
Transductive online algorithm T(x_1,y_1,…,x_{i−1},y_{i−1}, x_i, x_{i+1},…,x_n) = z_i
- Adversary picks (x_1,y_1),…,(x_n,y_n) ∈ X × Y
- Adversary reveals x_1, x_2, …, x_n
- We predict z_1, we see y_1; we predict z_2, we see y_2; …; we predict z_n, we see y_n
"Empirical error" err(T,data) = |{i : z_i ≠ y_i}| / n

Algorithm for transductive online learning [KK05]
We see x_1, x_2, …, x_n ∈ X in advance; y_1, y_2, …, y_n ∈ {+,–} are revealed online.
- There are L distinct labelings (f(x_1), f(x_2), …, f(x_n)) over all f ∈ F
- The effective size of F is L
- Run WM on these L labelings
- E[Regret(WM,data)] ≤ 2√(log(L)/n)
(table: the distinct labelings of x_1,…,x_n)
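
A sketch of this reduction in Python; enumerating F is only feasible when F is small, so this is for illustration under that assumption (labels in {−1,+1}):

```python
def transductive_wm(functions, xs, ys, eps=0.1):
    """Transductive online learning sketch: collapse F to its L distinct
    labelings of the known points x_1..x_n, then run weighted majority
    with one expert per labeling."""
    labelings = list({tuple(f(x) for x in xs) for f in functions})
    weights = [1.0] * len(labelings)
    errors = 0
    for i, y in enumerate(ys):
        plus = sum(w for lab, w in zip(labelings, weights) if lab[i] == +1)
        total = sum(weights)
        prediction = +1 if plus >= total / 2 else -1  # deterministic vote for brevity
        errors += (prediction != y)
        weights = [w * (1 - eps) if lab[i] != y else w
                   for lab, w in zip(labelings, weights)]
    return errors / len(ys)
```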

Candidate efficient algorithm
(table: the distinct labelings of x_1,…,x_n, with the labels seen so far filled in by the truth and the remaining labels filled in at random)

How many labelings? Shattering & VC
Def: S ⊆ X is shattered if there are 2^|S| ways to label S by functions f ∈ F.
VC(F) = max{|S| : S is shattered by F}.
VC dimension captures the complexity of F.
(figure: example of a shattered set of points labeled – and +)
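
For concreteness, a brute-force check of these definitions in Python (exponential time, illustration only; labels assumed to come from a two-element set):

```python
from itertools import combinations

def is_shattered(S, functions):
    """S is shattered by F if all 2^|S| labelings of S are realized by F."""
    realized = {tuple(f(x) for x in S) for f in functions}
    return len(realized) == 2 ** len(S)

def vc_dimension(points, functions):
    """Size of the largest shattered subset of `points` (brute force)."""
    vc = 0
    for k in range(1, len(points) + 1):
        if any(is_shattered(S, functions) for S in combinations(points, k)):
            vc = k
    return vc
```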

How many labelings? Shattering & VC
Sauer's lemma: # labelings L = O(n^{VC(F)})
⇒ E[Regret(WM,data)] ≤ O(√(VC(F)·log(n)/n))

Cannot batch learn faster than VC(F)
Take a shattered set S with |S| = VC(F), a distribution D over X × Y, and a batch training set of size n.
Each x ∈ S is not in the training set with probability (1 − 1/n)^n ≈ e^{−1}
⇒ E[Regret(B,D)] ≥ c·VC(F)/n

Putting it together
Transductive online: E[Regret(WM,data)] = O(√(VC(F)·log(n)/n))
Batch: E[Regret(B,D)] ≥ c·VC(F)/n
Transductive online learnable ⇔ batch learnable ⇔ finite VC(F)
Almost identical to the standard VC bound.

Learnability conclusions
- Finite VC(F) characterizes batch and transductive online learnability
- Open problem: what property of F characterizes (non-transductive) online learnability?
- Efficiency!? The WM algorithm requires enumeration of F. Thm [KK05]: if one can efficiently find a lowest-error f ∈ F, then one can design an efficient online learning algorithm.

Online learning in repeated games

Repeated games
Example: rock-paper-scissors. Rounds i = 1,2,…,n:
- Players simultaneously choose actions
- Players receive payoffs; goal: maximize total payoff
- Learning: players need not know the opponent or the game
- Feedback: a player only finds out the payoff of his action and its alternatives (not the opponent's action)
Payoffs (Pl. 1 chooses the row, Pl. 2 the column; entries are Pl. 1's, Pl. 2's payoffs):
        R      P      S
  R    0,0   −1,1    1,−1
  P    1,−1   0,0   −1,1
  S   −1,1    1,−1   0,0

(Mixed) Nash equilibrium
Each player chooses a distribution over actions; each player's distribution is optimal relative to the opponent(s).
In rock-paper-scissors, the equilibrium plays each action with probability 1/3.

Online learning in 0-sum games (Schapire recap)
Payoff is A(i,j) for Pl. 1 and −A(i,j) for Pl. 2.
Going first is a disadvantage: max_i min_j A(i,j) ≤ min_j max_i A(i,j).
Mixed strategies p, q: max_p min_q A(p,q) ≤ min_q max_p A(p,q).
Min-max theorem: with mixed strategies, the "≤" is an "=".

Online learning in 0-sum games (Schapire recap)
Each player uses weighted majority:
- Maintain a weight on each action, initially equal
- Choose an action with probability proportional to its weight
- (Assume payoffs are in [−1,1])
- Find out the payoffs of each action
- For each action, weight ← weight·(1 + ε·payoff)
Regret = possible improvement (in hindsight) from always playing a single action.
WM ⇒ regret is low.
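
A minimal self-play sketch in Python: both players run the multiplicative-weights update above on a payoff matrix A with entries in [−1,1]; the rock-paper-scissors matrix is included as an example. The step size and round count are arbitrary choices for the sketch.

```python
import random

# Rock-paper-scissors payoffs for the row player; the column player gets -A[i][j].
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def mw_self_play(A, n_rounds=10000, eps=0.05):
    """Both players keep a weight per action, play proportionally to the
    weights, and update weight <- weight * (1 + eps * payoff of that action).
    The time-averaged payoff approaches the value of the game (0 for RPS),
    up to the regret bounds."""
    rows, cols = len(A), len(A[0])
    w_row, w_col = [1.0] * rows, [1.0] * cols
    avg_payoff = 0.0
    for _ in range(n_rounds):
        i = random.choices(range(rows), weights=w_row)[0]
        j = random.choices(range(cols), weights=w_col)[0]
        avg_payoff += A[i][j] / n_rounds
        # full-information feedback: each player sees what every one of its
        # actions would have earned against the opponent's realized action
        w_row = [w * (1 + eps * A[a][j]) for a, w in enumerate(w_row)]
        w_col = [w * (1 + eps * (-A[i][b])) for b, w in enumerate(w_col)]
    return avg_payoff
```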

Online learning in 0-sum games (Schapire recap)
Actions played are (a_1,b_1),(a_2,b_2),…,(a_n,b_n).
Regret of Pl. 1 is max_i (1/n)·Σ_t A(i,b_t) − (1/n)·Σ_t A(a_t,b_t).
Let p̂, q̂ be the empirical distributions of the actions a_1,…,a_n and b_1,…,b_n, respectively; then the regret of Pl. 1 equals max_i A(i,q̂) − (1/n)·Σ_t A(a_t,b_t).

WM ⇒ "min-max" theorem
max_p min_q A(p,q) = min_q max_p A(p,q) = "value of the game"
Using WM, each player guarantees regret → 0, regardless of the opponent:
- Can beat an idiot in tic-tac-toe
- A reasonable strategy to use
- Justifies how such equilibria might arise

Justifying equilibrium in games
Online learning gives a plausible explanation for how equilibrium might arise.
Nash equilibrium:
- Zero-sum games: unique value; fast and easy to "learn"
- General-sum games: not unique; fast and easy to "learn"?? polynomial-time algorithm to find one??

General-sum games
- No unique "value"; many very different equilibria
- Can't naively improve a "no regret" algorithm (by playing a single mixed strategy)
- Low regret for both players ⇏ equilibrium
E.g., a coordination game (Pl. 1 chooses the row, Pl. 2 the column):
        A      B
  A    1,1    0,0
  B    0,0    2,2

General-sum games
Low regret ⇏ Nash equilibrium: e.g., the joint play (1,1),(2,2),(1,1),(2,2),(1,1),(2,2),… can have low regret for both players without being a Nash equilibrium.
(payoff matrix of the example game)

Refined notion of regret
- Can't naively improve a "no regret" algorithm by playing a single mixed strategy
- But we might be able to naively improve it by a replacement rule such as "when the algorithm suggests action 1, play action 3"
(payoff matrix of the example game)

Internal regret
- Internal regret IR(i,j) is how much we could have improved by replacing all occurrences of action i with action j
- No internal regret ⇒ correlated equilibrium
- Calibration ⇒ correlated equilibrium [Foster-Vohra]
Correlated equilibrium [Aumann]: a mediator draws a joint action profile (e.g., "play row 1", "play col. 2") and the best strategy is to listen to the fish.
(example payoff matrix with a correlated distribution placing probability 1/6 on cells)
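
A small Python helper illustrating the definition of IR(i,j) for player 1, given a history of joint plays and a payoff function (both assumed interfaces for the sketch):

```python
def internal_regret(history, payoff, num_actions):
    """IR(i, j): the total (hindsight) gain from replacing every round in
    which player 1 played i with action j.  `history` is a list of joint
    plays (a_t, b_t) and payoff(a, b) is player 1's payoff.  Returns the
    largest gain over all ordered pairs (i, j)."""
    best = 0.0
    for i in range(num_actions):
        for j in range(num_actions):
            gain = sum(payoff(j, b) - payoff(a, b)
                       for a, b in history if a == i)
            best = max(best, gain)
    return best
```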

Low internal regret → correlated eq.
- Sequence of joint plays like (1,1),(2,1),(3,2),…
- Think about this as a distribution P (the empirical distribution of joint plays)
- No internal regret ⇒ P is (approximately) a correlated equilibrium
(example payoff matrix with the distribution P placing probability 1/6 on cells)

Online learning in games: conclusions
Online learning in zero-sum games:
- Weighted majority (low regret)
- Achieves the value of the game
Online learning in general-sum games:
- Low internal regret
- Achieves correlated equilibrium
Open problems:
- Are there natural dynamics ⇒ Nash equilibrium?
- Is correlated equilibrium going to take over?