Belief Learning in an Unstable Infinite Game
Paul J. Healy, CMU
Belief Learning in an Unstable Infinite Game
Three issues: Issue #1, Issue #2, Issue #3
Issue #1: Infinite Games
Typical learning model:
–Finite set of strategies
–Strategies get weight based on ‘fitness’
–Bells & whistles: experimentation, spillovers…
Many important games have infinite strategy spaces
–Duopoly, public goods, bargaining, auctions, war of attrition…
Is quality of fit sensitive to grid size?
Models don’t use the structure of the strategy space
Previous Work
Grid size and fit quality:
–Arifovic & Ledyard: Groves-Ledyard mechanisms; convergence failure of reinforcement learning with |S| = 51
Strategy space structure:
–Roth & Erev AER ’99
Quality-of-fit/error measures:
–What’s the right metric space? Closeness in probabilities or closeness in strategies?
Issue #2: Unstable Game
Usual exercise: predicting convergence rates
–Example: p-beauty contests
Instability:
–Toughest test for learning models
–Most statistical power
Previous Work
Chen & Tang ’98
–Walker mechanism & unstable Groves-Ledyard
–Reinforcement > fictitious play > equilibrium
Healy ’06
–5 public goods mechanisms; predicting convergence or not
Feltovich ’00
–Unstable finite Bayesian game
–Fit varies by game and error measure
Issue #3: Belief Learning
If subjects are forming beliefs, measure them!
Method 1: Direct elicitation
–Incentivized guesses about s_{-i}
Method 2: Inferred from payoff table usage
–Tracking payoff ‘lookups’ may inform our models
Previous Work
Nyarko & Schotter ’02
–Subjects best respond to stated beliefs
–Stated beliefs not too accurate
Costa-Gomes, Crawford & Broseta ’01
–Mouselab to identify types
–About how players solve games, not learning
This Paper
Pick an unstable infinite game
Give subjects a calculator tool & track its usage
Elicit beliefs in some sessions
Fit models to the data in the standard way
Study the formation of “beliefs”
–“Beliefs” inferred from calculator tool usage
–“Beliefs” from elicited guesses
The Game
Walker’s public goods mechanism for 3 players
Added a ‘punishment’ parameter
Parameters & Equilibrium
v_i(y) = b_i·y − a_i·y² + c_i (per-player parameters a_i, b_i, c_i given in a table, not reproduced here)
Pareto optimum: y = 7.5
Unique PSNE: s_i* = 2.5
Punishment γ = 0.1
Purpose: not too wild, payoffs rarely negative
Guessing payoff: 10 − |g_L − s_L|/4 − |g_R − s_R|/4
Game payoffs: Pr(payoff < 50) = 8.9%, Pr(payoff > 100) = 71%
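As a concrete illustration of the two payoff formulas on this slide, a minimal Python sketch; the a, b, c values are placeholders, since the per-player parameter table (a_i, b_i, c_i) is not reproduced here.

```python
def valuation(y, a=1.0, b=15.0, c=100.0):
    """Quasilinear valuation v_i(y) = b_i*y - a_i*y**2 + c_i.
    a, b, c are placeholder values, not the experiment's parameters."""
    return b * y - a * y**2 + c

def guess_payoff(g_left, g_right, s_left, s_right):
    """Guessing payoff: 10 - |g_L - s_L|/4 - |g_R - s_R|/4."""
    return 10 - abs(g_left - s_left) / 4 - abs(g_right - s_right) / 4

# Example: guessing both neighbors' strategies exactly earns the full 10 points.
assert guess_payoff(2.5, 2.5, 2.5, 2.5) == 10
```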
Choice of Grid Size
S = [−10, 10]
Grid widths considered: 5, 2, 1, 1/2, 1/4, 1/8
# Grid points (implied by S = [−10, 10]): 5, 11, 21, 41, 81, 161
% of choices on grid: … (values not shown)
Properties of the Game
Best response: (formula not shown)
BR dynamics: unstable
–One eigenvalue is +2
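For readers unfamiliar with the stability condition behind the “+2 eigenvalue” claim, a brief sketch in standard notation (the Jacobian J and its entries are not taken from the slides):

```latex
% Cournot best-response dynamics, linearized around the equilibrium s^*:
\[
  s(t+1) - s^* \approx J\,\bigl(s(t) - s^*\bigr),
\]
% where J is the Jacobian of the joint best-response map at s^*.
% The equilibrium is locally stable under these dynamics iff every
% eigenvalue of J lies strictly inside the unit circle:
\[
  \text{stable} \iff \max_k \bigl|\lambda_k(J)\bigr| < 1 .
\]
% An eigenvalue of +2 therefore implies instability: deviations along
% that eigenvector roughly double each period.
```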
Interface
Design
PEEL Lab, U. Pittsburgh
All sessions:
–3-player groups, 50 periods
–Same group and ID#s for all periods
–Payoffs etc. common information
–No explicit public good framing
–Calculator always available
–5-minute ‘warm-up’ with calculator
Sessions 1-6: guess s_L and s_R
Sessions 7-13: baseline, no guesses
Does Elicitation Affect Choice?
Total variation: no significant difference (p = 0.745)
No. of strategy switches: no significant difference (p = 0.405)
Autocorrelation (predictability): slightly more without elicitation
Total earnings per session: no significant difference (p = 1)
Missed periods: elicited 9/300 (3%) vs. not elicited 3/350 (0.8%)
Does Play Converge?
(Figure: average |s_i − s_i*| per period; average |y − y°| per period)
Does Play Converge, Part 2
Accuracy of Beliefs
Guesses get better over time
(Figure: average distance ||guess of s_{-i} − s_{-i}(t)|| per period, for elicited guesses and for calculator inputs)
Model 1: Parametric EWA
δ: weight on hypothetical payoffs of strategies not actually played
φ: decay rate of past attractions
ρ: decay rate of past experience
A(0): initial attractions
N(0): initial experience
λ: response sensitivity to attractions
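For reference, these parameters enter the standard Camerer-Ho EWA updating rule, written here in its usual textbook form (a sketch; the paper’s exact notation may differ):

```latex
% EWA updating (Camerer & Ho 1999): experience weight and attractions.
% N(t): experience weight; A_i^j(t): attraction of strategy j for player i;
% \pi_i(s_i^j, s_{-i}(t)): payoff strategy j would have earned at time t.
\[
  N(t) = \rho\, N(t-1) + 1
\]
\[
  A_i^j(t) =
  \frac{\phi\, N(t-1)\, A_i^j(t-1)
        + \bigl[\delta + (1-\delta)\,\mathbf{1}\{s_i(t) = s_i^j\}\bigr]\,
          \pi_i\bigl(s_i^j, s_{-i}(t)\bigr)}
       {N(t)}
\]
% Choice probabilities: logit response with sensitivity \lambda.
\[
  P_i^j(t+1) = \frac{e^{\lambda A_i^j(t)}}{\sum_k e^{\lambda A_i^k(t)}}
\]
```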
Model 1’: Self-Tuning EWA
N(0) = 1
Replace δ and φ with deterministic functions (sketched below)
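The deterministic replacements are presumably the standard self-tuning functions of Ho, Camerer & Chong; they are sketched here under that assumption:

```latex
% Self-tuning EWA: \phi and \delta become history-dependent functions.
% Surprise index: squared distance between the cumulative frequency of
% opponents' play h_i^k(t) and the recent-history vector r_i^k(t).
\[
  S_i(t) = \sum_k \bigl(h_i^k(t) - r_i^k(t)\bigr)^2, \qquad
  \phi_i(t) = 1 - \tfrac{1}{2}\, S_i(t)
\]
% Attention function: a hypothetical strategy is reinforced only if it
% would have earned at least the payoff actually received, \pi_i(t).
\[
  \delta_{ij}(t) =
  \begin{cases}
    1 & \text{if } \pi_i\bigl(s_i^j, s_{-i}(t)\bigr) \ge \pi_i(t) \\
    0 & \text{otherwise}
  \end{cases}
\]
```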
STEWA: Setup
Only remaining parameters: λ and A(0)
–λ will be estimated
–5 minutes of ‘calculator time’ gives A(0)
Initial attractions: average payoff from calculator trials (one formalization sketched below)
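One natural formalization of the last bullet: take A_j(0) to be the average payoff that strategy j earned across a subject’s warm-up calculator trials. This is a sketch under that assumption; the data format and the treatment of untried strategies are guesses, not taken from the paper.

```python
from collections import defaultdict

def initial_attractions(calculator_trials, strategy_grid, default=0.0):
    """A_j(0) = mean payoff of strategy j over the 5-minute warm-up trials.

    calculator_trials: list of (own_strategy, payoff) pairs the subject
    entered into the calculator (assumed format).  Strategies never tried
    during the warm-up fall back to `default`.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for s, payoff in calculator_trials:
        totals[s] += payoff
        counts[s] += 1
    return {s: totals[s] / counts[s] if counts[s] else default
            for s in strategy_grid}
```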
STEWA: Fit
Likelihoods are ‘zero’ for all λ
–Guess: lots of near misses in predictions
Alternative measure: quadratic scoring rule (sketched below)
–Best fit: λ = 0.04 (previous studies: λ > 4)
–Suggests attractions are very concentrated
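A minimal sketch of the quadratic scoring rule as the fit measure, assuming the usual definition (squared distance between the model’s predicted distribution over the grid and the indicator of the realized choice); under this definition the worst possible score is 2, consistent with the “2-QSR” column in the comparison tables below.

```python
def quadratic_score(predicted_probs, realized_index):
    """Quadratic score of a predicted distribution against the realized
    choice: sum_j (p_j - 1{j == realized_index})^2.  Lower is better;
    the score is bounded by 2, unlike the log-likelihood, which is -inf
    whenever the model assigns zero probability to the realized strategy.
    """
    return sum((p - (1.0 if j == realized_index else 0.0)) ** 2
               for j, p in enumerate(predicted_probs))
```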
STEWA: Adjustment Attempts
The problem: near misses in strategy space, not in time
Suggests altering δ (weight on hypotheticals):
–Original specification: QSR* = …, λ* = 0.04
–δ = 0.7 (p-beauty estimate): QSR* = …, λ* = 0.03
–δ = 1 (belief model): QSR* = …, λ* = 0.175
–δ(k,t) = % of best-response payoff: QSR* = …, λ* = 0.06
Altering φ:
–1/8 weight on surprises: QSR* = …, λ* = 0.04
STEWA: Other Modifications
Equal initial attractions: worse
Smoothing (sketch below)
–Takes advantage of the strategy space structure
–λ spreads probability across all strategies evenly; smoothing spreads probability to nearby strategies
–Smoothed attractions
–Smoothed probabilities
–But… no improvement in QSR* or λ*!
Tentative conclusion:
–STEWA: either not broken, or can’t be fixed…
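A sketch of the kind of smoothing described above, spreading attraction or probability mass to nearby strategies with a simple triangular kernel over the grid; the kernel shape and bandwidth are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def smooth_over_grid(values, bandwidth=2.0):
    """Spread each grid point's value (attraction or probability mass)
    onto neighboring strategies, exploiting the metric structure of the
    strategy space.  Weights fall off linearly with grid distance."""
    n = len(values)
    idx = np.arange(n)
    weights = np.maximum(0.0, 1.0 - np.abs(idx[:, None] - idx[None, :]) / bandwidth)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ np.asarray(values, dtype=float)

# For smoothed probabilities, renormalize afterwards:
#   p = smooth_over_grid(p); p = p / p.sum()
```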
Other Standard Models
Nash equilibrium
Uniform mixed strategy (‘Random’)
Logistic Cournot BR
Deterministic Cournot BR
Logistic fictitious play (sketch below)
Deterministic fictitious play
k-period BR
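As one concrete instance from this list, a sketch of logistic fictitious play on a discretized strategy grid: beliefs are the empirical distribution of opponents’ past play, and choices are a logit response to expected payoffs. The grid and payoff function are placeholders supplied by the game.

```python
import numpy as np

def logistic_fictitious_play(opponent_history, own_grid, payoff_fn, lam=1.0):
    """opponent_history: list of past opponent strategy profiles s_{-i}.
    Returns logit choice probabilities over own_grid, where each
    strategy's expected payoff is taken against the empirical
    distribution of past opponent play (fictitious-play beliefs)."""
    expected = np.array([
        np.mean([payoff_fn(s, s_minus) for s_minus in opponent_history])
        for s in own_grid
    ])
    z = lam * (expected - expected.max())   # subtract max for numerical stability
    probs = np.exp(z)
    return probs / probs.sum()
```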
“New” Models
Best respond to stated beliefs (Sessions 1-6 only)
Best respond to calculator entries
–Issue: how to aggregate calculator usage?
–Decaying average of input (one reading sketched below)
Reinforcement based on calculator payoffs
–Decaying average of payoffs
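One reading of “decaying average of input”: summarize a subject’s past calculator entries with an exponentially decaying average and best respond to that summary. The recursion below is an assumption about the aggregation rule; the decay weight δ = 1/2 matches the value reported in the second comparison table.

```python
def decayed_average(entries, delta=0.5):
    """Exponentially decaying average of a subject's calculator inputs:
    the running summary keeps weight `delta` and each new entry gets
    weight 1 - delta (entries in chronological order)."""
    avg = None
    for x in entries:
        avg = x if avg is None else delta * avg + (1 - delta) * x
    return avg
```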
Model Comparisons
Fit measures: BIC, 2-QSR, MAD, MSD; each evaluated In-sample (periods 1-35) and Out-of-sample (periods 36-end); numerical fit values not reproduced here
–Random Choice*: no parameters; BIC In: infinite
–Logistic STEWA* (λ): BIC In: infinite; λ* = 0.04 (2-QSR), 0.41 (MAD), 0.35 (MSD)
–Logistic Cournot* (λ): BIC In: infinite; λ* = 0.00 (!) (2-QSR), 4.30 (MAD), 4.30 (MSD)
–Logistic F.P.* (λ): BIC In: infinite; λ* = 14.98 (2-QSR), 4.47 (MAD), 4.47 (MSD)
* Estimates on the grid of integers {−10, −9, …, 9, 10}
Model Comparisons 2
Fit measures: MAD and MSD, In (periods 1-35) and Out (periods 36-end); numerical fit values not reproduced here
–BR(Guesses) (6 sessions only): no parameters
–BR(Calculator Input): δ (= 1/2)
–Calculator Reinforcement*: δ (= 1/2)
–k-Period BR: k* = 4 (MAD and MSD)
–Cournot: no parameters
–Weighted F.P.: δ* = 0.56 (MAD), δ* = 0.65 (MSD)
The “Take-Homes”
Methodological issues:
–Infinite strategy space
–Convergence vs. instability
–The right notion of error
Self-tuning EWA fits best.
Guesses & calculator input don’t seem to offer any more predictive power… ?!?!