CPS 590.4 Learning in games Vincent Conitzer


CPS 590.4 Learning in games Vincent Conitzer

“2/3 of the average” game
Everyone writes down a number between 0 and 100
Person closest to 2/3 of the average wins
Example:
– A says 50
– B says 10
– C says 90
– Average(50, 10, 90) = 50
– 2/3 of average = 33.33
– A is closest (|50 - 33.33| = 16.67), so A wins
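A minimal Python sketch of the example's arithmetic:

```python
# Compute the winner of the "2/3 of the average" game for the example above.
guesses = {"A": 50, "B": 10, "C": 90}

average = sum(guesses.values()) / len(guesses)   # (50 + 10 + 90) / 3 = 50
target = (2 / 3) * average                       # 33.33...

# The winner is the player whose guess is closest to the target.
winner = min(guesses, key=lambda p: abs(guesses[p] - target))
print(winner, abs(guesses[winner] - target))     # A is closest, at distance ~16.67
```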

“2/3 of the average” game revisited
Any number above (2/3)*100 is dominated (it can never be closest to 2/3 of the average)
After those are removed, any number above (2/3)*(2/3)*100 is dominated after removal of (originally) dominated strategies
… iterating this argument, only 0 survives iterated removal of dominated strategies
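A short sketch of the same argument: each round of removal shrinks the upper bound on undominated guesses by a factor of 2/3.

```python
# Numbers above (2/3)*100 are dominated; once those are removed, numbers above
# (2/3)^2 * 100 are dominated; and so on -- the bound shrinks toward 0.
bound = 100.0
for round_num in range(1, 11):
    bound *= 2 / 3
    print(f"round {round_num}: guesses above {bound:.2f} are now dominated")
# In the limit, only 0 survives iterated removal of dominated strategies.
```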

Learning in (normal-form) games
Approach we have taken so far when playing a game: just compute an optimal/equilibrium strategy
Another approach: learn how to play a game by
– playing it many times, and
– updating your strategy based on experience
Why?
– Some of the game’s utilities (especially the other players’) may be unknown to you
– The other players may not be playing an equilibrium strategy
– Computing an optimal strategy can be hard
– Learning is what humans typically do
– …
Learning strategies ~ strategies for the repeated game
Does learning converge to equilibrium?

Iterated best response
In the first round, play something arbitrary
In each following round, play a best response against what the other players played in the previous round
If all players play this, it can converge (i.e., we reach an equilibrium) or cycle
Rock-paper-scissors:
          R        P        S
   R    0, 0    -1, 1    1, -1
   P    1, -1    0, 0   -1, 1
   S   -1, 1     1, -1   0, 0
A simple congestion game:
          L        R
   L   -1, -1    0, 0
   R    0, 0    -1, -1
Alternating best response: players alternatingly change strategies: one player best-responds each odd round, the other best-responds each even round
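A minimal sketch of (simultaneous) iterated best response for two players, assuming payoffs are given as numpy matrices A (row player) and B (column player):

```python
import numpy as np

def iterated_best_response(A, B, rounds=10, start=(0, 0)):
    """Each round, both players best-respond to the previous round's play."""
    r, c = start                          # arbitrary initial pure strategies
    history = [(r, c)]
    for _ in range(rounds):
        r_new = int(np.argmax(A[:, c]))   # row's best response to column's last action
        c_new = int(np.argmax(B[r, :]))   # column's best response to row's last action
        r, c = r_new, c_new
        history.append((r, c))
    return history

# Rock-paper-scissors (0 = R, 1 = P, 2 = S): play cycles (R,R) -> (P,P) -> (S,S) -> ...
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(iterated_best_response(A, -A))
```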

Fictitious play [Brown 1951]
In the first round, play something arbitrary
In each following round, play a best response against the empirical distribution of the other players’ play
– I.e., as if the other player randomly selects from his past actions
Again, if this converges, we have a Nash equilibrium
Can still fail to converge…
(Same example games as above: rock-paper-scissors and the simple congestion game)
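A sketch of fictitious play in the same matrix representation (ties in the best response are broken by argmax's first-index rule):

```python
import numpy as np

def fictitious_play(A, B, rounds=10000):
    n, m = A.shape
    row_counts, col_counts = np.zeros(n), np.zeros(m)   # empirical action counts
    r, c = 0, 0                                         # arbitrary first-round play
    for _ in range(rounds):
        row_counts[r] += 1
        col_counts[c] += 1
        # Best-respond to the opponent's empirical distribution of past play.
        r = int(np.argmax(A @ (col_counts / col_counts.sum())))
        c = int(np.argmax((row_counts / row_counts.sum()) @ B))
    return row_counts / rounds, col_counts / rounds

# On rock-paper-scissors (zero-sum), the empirical distributions approach the
# uniform equilibrium (1/3, 1/3, 1/3), as Robinson's theorem guarantees.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(fictitious_play(A, -A))
```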

Fictitious play on rock-paper-scissors
(Trace of play omitted; at the point shown, the empirical distributions are: Row: 30% R, 50% P, 20% S; Column: 30% R, 20% P, 50% S)

Does the empirical distribution of play converge to equilibrium?
… for iterated best response?
… for fictitious play?
(Example 2x2 game; payoffs only partially recoverable from the transcript: 3, 0 | 1, 2 in one row, 2, 1 | … in the other)

Fictitious play is guaranteed to converge in…
Two-player zero-sum games [Robinson 1951]
Generic 2x2 games [Miyasawa 1961]
Games solvable by iterated strict dominance [Nachbar 1990]
Weighted potential games [Monderer & Shapley 1996]
Not in general [Shapley 1964]
But, fictitious play always converges to the set of ½-approximate equilibria [Conitzer 2009; more detailed analysis by Goldberg, Savani, Sørensen, Ventre 2011]

Shapley’s game, on which fictitious play does not converge (starting with (U, M)):
          L        M        R
   U    0, 0     0, 1     1, 0
   M    1, 0     0, 0     0, 1
   D    0, 1     1, 0     0, 0

Regret
For each player i, action a_i, and time t, define the regret r_i(a_i, t) as
r_i(a_i, t) = (Σ_{1 ≤ t' ≤ t-1} [u_i(a_i, a_{-i,t'}) - u_i(a_{i,t'}, a_{-i,t'})]) / (t-1)
An algorithm has zero regret if for each a_i, the regret for a_i becomes nonpositive as t goes to infinity (almost surely) against any opponents
Regret matching [Hart & Mas-Colell 00]: at time t, play an action that has positive regret r_i(a_i, t) with probability proportional to r_i(a_i, t)
– If none of the actions have positive regret, play uniformly at random
Regret matching has zero regret
If all players use regret matching, then play converges to the set of weak correlated equilibria
– Weak correlated equilibrium: playing according to the joint distribution is at least as good as any strategy that does not depend on the signal
Variants of this converge to the set of correlated equilibria
Smooth fictitious play [Fudenberg & Levine 95] also gives no regret
– Instead of just best-responding to history, assign some small value to having a more “mixed” distribution
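A sketch of regret matching in the same matrix representation; it tracks cumulative rather than time-averaged regrets, which yields the same proportional-play rule:

```python
import numpy as np

rng = np.random.default_rng(0)

def regret_matching_action(regret_sums):
    positive = np.maximum(regret_sums, 0.0)
    if positive.sum() > 0:
        probs = positive / positive.sum()   # proportional to positive regret
    else:
        probs = np.full(len(regret_sums), 1 / len(regret_sums))  # uniform otherwise
    return int(rng.choice(len(regret_sums), p=probs))

def play_regret_matching(A, B, rounds=20000):
    n, m = A.shape
    r_regrets, c_regrets = np.zeros(n), np.zeros(m)
    joint = np.zeros((n, m))                # empirical joint distribution of play
    for _ in range(rounds):
        r = regret_matching_action(r_regrets)
        c = regret_matching_action(c_regrets)
        joint[r, c] += 1
        # Regret of each action = its payoff this round minus the realized payoff.
        r_regrets += A[:, c] - A[r, c]
        c_regrets += B[r, :] - B[r, c]
    return joint / rounds

A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(play_regret_matching(A, -A))
```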

Targeted learning
Assume that there is a limited set of possible opponents
Try to do well against these
Example: is there a learning algorithm that
– learns to best-respond against any stationary opponent (one that always plays the same mixed strategy), and
– converges to a Nash equilibrium (in actual strategies, not historical distribution) when playing against a copy of itself (so-called self-play)?
[Bowling and Veloso AIJ02]: yes, if it is a 2-player 2x2 game and mixed strategies are observable
[Conitzer and Sandholm ML06]: yes (without those assumptions)
– AWESOME algorithm (Adapt When Everybody is Stationary, Otherwise Move to Equilibrium), (very) rough sketch: start by playing according to an equilibrium strategy; if not all players appear to be playing the equilibrium, switch to best-responding to recent history; if not all players appear to be playing stationary strategies, restart with the equilibrium strategy

“Teaching”
Suppose you are playing against a player that uses one of these strategies
– Fictitious play, anything with no regret, AWESOME, …
Also suppose you are very patient, i.e., you only care about what happens in the long run
How will you (the row player) play in the following repeated games?
– Hint: the other player will eventually best-respond to whatever you do
Game 1:
   4, 4    3, 5
   5, 3    0, 0
Game 2:
   1, 0    3, 1
   2, 1    4, 0
Note the relationship to optimal strategies to commit to
There is some work on learning strategies that are in equilibrium with each other [Brafman & Tennenholtz AIJ04]
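A small sketch of that relationship to commitment: if the opponent will eventually best-respond to whatever the patient row player repeatedly plays, the row player can simply pick the pure strategy whose induced best response pays best (matrices as assumed above):

```python
import numpy as np

def best_pure_commitment(A, B):
    best = None
    for r in range(A.shape[0]):
        c = int(np.argmax(B[r, :]))   # column's eventual best response to r
        if best is None or A[r, c] > best[2]:
            best = (r, c, A[r, c])
    return best                       # (row action, induced response, row payoff)

# Game 2 above: myopically the bottom row looks better, but committing to the
# top row induces a response worth 3 > 2 to the row player.
A = np.array([[1, 3], [2, 4]])
B = np.array([[0, 1], [1, 0]])
print(best_pure_commitment(A, B))     # (0, 1, 3)
```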

Evolutionary game theory
Given: a symmetric game
           dove     hawk
   dove    1, 1     0, 2
   hawk    2, 0    -1, -1
A large population of players plays this game; players are randomly matched to play with each other
Each player plays a pure strategy
– Fraction of players playing strategy s = p_s
– p is the vector of all fractions p_s (the state)
Utility for playing s is u(s, p) = Σ_{s'} p_{s'} u(s, s')
Players reproduce at a rate that is proportional to their utility; their offspring play the same strategy
– Replicator dynamic: dp_s(t)/dt = p_s(t) (u(s, p(t)) - Σ_{s'} p_{s'}(t) u(s', p(t)))
What are the steady states of this?
Nash equilibria: (d, h), (h, d), ((.5, .5), (.5, .5))
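A minimal sketch integrating the replicator dynamic for the hawk-dove game above (Euler steps; the step size and starting state are arbitrary choices):

```python
import numpy as np

U = np.array([[1, 0],       # dove vs (dove, hawk)
              [2, -1]])     # hawk vs (dove, hawk)

p = np.array([0.9, 0.1])    # start with mostly doves
dt = 0.01
for _ in range(10_000):
    fitness = U @ p                   # u(s, p) for each pure strategy s
    avg = p @ fitness                 # population-average utility
    p = p + dt * p * (fitness - avg)  # dp_s/dt = p_s (u(s, p) - avg)
print(p)                              # approaches the mixed steady state (0.5, 0.5)
```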

Stability
A steady state is stable if slightly perturbing the state will not cause us to move far away from the state
E.g., everyone playing dove is not stable, because if a few hawks are added, their percentage will grow
What about the mixed steady state?
Proposition: every stable steady state is a Nash equilibrium of the symmetric game
Slightly stronger criterion: a state is asymptotically stable if it is stable, and after slightly perturbing this state, we will (in the limit) return to this state
(Hawk-dove game as above)

Evolutionarily stable strategies
Now suppose players play mixed strategies
A (single) mixed strategy σ is evolutionarily stable if the following is true:
– Suppose all players play σ
– Then, whenever a very small number of invaders enters that play a different strategy σ’,
– the players playing σ must get strictly higher utility than those playing σ’ (i.e., σ must be able to repel invaders)
σ will be evolutionarily stable if and only if for all σ’:
– u(σ, σ) > u(σ’, σ), or
– u(σ, σ) = u(σ’, σ) and u(σ, σ’) > u(σ’, σ’)
Proposition: every evolutionarily stable strategy is asymptotically stable under the replicator dynamic
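A sketch checking the two ESS conditions numerically for the hawk-dove game above, testing σ = (0.5, 0.5) against a grid of candidate invaders:

```python
import numpy as np

U = np.array([[1, 0], [2, -1]])   # hawk-dove payoffs (row player's payoff)

def u(x, y):
    return x @ U @ y              # expected utility of mixed x against mixed y

def is_ess(sigma, invaders, eps=1e-9):
    for sp in invaders:
        if u(sigma, sigma) > u(sp, sigma) + eps:
            continue              # first condition: strictly better against sigma
        if abs(u(sigma, sigma) - u(sp, sigma)) <= eps and u(sigma, sp) > u(sp, sp) + eps:
            continue              # second condition: strictly better against the invader
        return False
    return True

sigma = np.array([0.5, 0.5])
invaders = [np.array([q, 1 - q]) for q in np.linspace(0, 1, 101) if abs(q - 0.5) > 1e-6]
print(is_ess(sigma, invaders))    # True: (0.5, 0.5) repels all these invaders
```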