Convergence Analysis of Reinforcement Learning Agents
Srinivas Turaga
MIT Brain and Cognitive Sciences, 9.912
30th March, 2004

The Learning Algorithm

The Assumptions
– Players use stochastic strategies.
– Players observe only their own reward.
– Players attempt to estimate the value of choosing a particular action.

The Algorithm (sketched in code below)
– Play action i with probability Pr(i).
– Observe reward r.
– Update value function v.
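In code, this play/observe/update loop might look like the following minimal Python sketch. The value-proportional choice rule is the one named on the next slide; the particular value update and the learning rate eta are illustrative assumptions, since the talk's exact update equations appear further on.

    import numpy as np

    rng = np.random.default_rng(0)

    def play_round(v, reward_fn, eta=0.1):
        """One play/observe/update round for a single learner.

        v         -- current value estimates, one entry per action
        reward_fn -- returns the observed reward for the chosen action
                     (it hides the opponent's simultaneous choice)
        eta       -- learning rate; an illustrative constant
        """
        # Play action i with probability Pr(i), proportional to its value
        # (the rule named on the next slide; values must stay nonnegative).
        p = v / v.sum()
        i = rng.choice(len(v), p=p)

        # Observe only our own reward.
        r = reward_fn(i)

        # Update the value of the chosen action. This rule is a placeholder;
        # the talk's two schemes are sketched after the next slide.
        v[i] += eta * (r - v[i])
        return v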

The Learning Algorithm

[Figures on slide: the payoff matrix, with Player 1's choice indexing rows and Player 2's choice indexing columns, and the value of action i.]

The Algorithm
– Play action i with probability Pr(i), proportional to the value of action i.
– Observe reward r, which also depends on the other player's choice j.
– Update value function v, by one of 2 simple schemes, each specifying the update when action i is chosen and when it is not: Algorithm 1 (forgetting) and Algorithm 2 (no forgetting). [Update equations shown on slide; a reconstruction is sketched below.]
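The update equations themselves appeared as images on the slide and are not recoverable here. The pair of rules below is one common reconstruction consistent with the "forgetting" / "no forgetting" labels: the forgetting scheme decays every value each round, while the no-forgetting scheme accumulates rewards Erev-Roth-style. Both are assumptions, not the talk's exact equations.

    import numpy as np

    def update_forgetting(v, i, r, eta=0.1):
        """Algorithm 1 ("forgetting"), one plausible reconstruction:
        every value decays each round, and the chosen action i is
        additionally reinforced by the observed reward r."""
        v = (1 - eta) * v        # if action i not chosen: pure decay
        v[i] += eta * r          # if action i chosen: decay + reinforcement
        return v

    def update_no_forgetting(v, i, r, eta=0.1):
        """Algorithm 2 ("no forgetting"), one plausible reconstruction:
        rewards simply accumulate on the chosen action, and unchosen
        actions keep their values unchanged."""
        v = v.copy()
        v[i] += eta * r
        return v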

Analysis Techniques

Analysis of the stochastic dynamics is hard! So approximate:
– Consider the average case (deterministic).
– Consider continuous time (differential equation).

[Diagram on slide: random, discrete-time dynamics → deterministic, discrete-time dynamics → deterministic, continuous-time dynamics.]
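One standard way to make these two approximation steps concrete is the stochastic-approximation sketch below, stated for the forgetting-style update assumed earlier; it follows the slide's diagram but is not necessarily the talk's exact derivation. Here \bar r_i denotes the expected reward for playing i against the opponent's current mixed strategy.

    \[
    v_i(t+1) = v_i(t) + \eta\,\mathbf{1}[a_t = i]\,\bigl(r_t - v_i(t)\bigr)
    \qquad \text{(random, discrete time)}
    \]
    \[
    \mathbb{E}[\Delta v_i \mid v] = \eta\,\Pr(i)\,\bigl(\bar r_i - v_i\bigr)
    \qquad \text{(average case: deterministic, discrete time)}
    \]
    \[
    \dot v_i = \Pr(i)\,\bigl(\bar r_i - v_i\bigr)
    \qquad \text{(continuous-time limit: deterministic ODE)}
    \]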

Results – Matching Pennies Game

For one of the two update schemes:
– Analysis shows a stable fixed point corresponding to matching behavior.
– Simulations of the stochastic algorithm and of the deterministic dynamics converge as expected.

For the other:
– Analysis shows a fixed point corresponding to the Nash equilibrium, but linear stability analysis shows only marginal stability.
– Simulations of the stochastic algorithm and of the deterministic dynamics diverge to the corners.
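A compact self-play simulation one could use to probe this qualitative picture is sketched below. Everything in it is an illustrative assumption rather than the talk's actual setup: the 0/1 win/lose payoffs (which keep value-proportional choice well defined), the reconstructed update rules from the earlier sketch, and the constants.

    import numpy as np

    rng = np.random.default_rng(0)

    # Matching pennies with 0/1 rewards: player 1 wins when the choices
    # match, player 2 wins when they differ. (0/1 rewards keep the
    # value-proportional choice rule well defined; the talk's exact
    # payoffs may differ.)
    R1 = np.array([[1, 0],
                   [0, 1]])
    R2 = 1 - R1

    def update_forgetting(v, i, r, eta):
        v = (1 - eta) * v    # decay everything ("forgetting"), as above
        v[i] += eta * r      # reinforce the chosen action
        return v

    def update_no_forgetting(v, i, r, eta):
        v = v.copy()
        v[i] += eta * r      # pure accumulation ("no forgetting"), as above
        return v

    def simulate(update, steps=50000, eta=0.05):
        v1, v2 = np.ones(2), np.ones(2)
        for _ in range(steps):
            i = rng.choice(2, p=v1 / v1.sum())   # player 1 plays
            j = rng.choice(2, p=v2 / v2.sum())   # player 2 plays
            v1 = update(v1, i, R1[i, j], eta)
            v2 = update(v2, j, R2[i, j], eta)
        return v1 / v1.sum(), v2 / v2.sum()

    # Per the talk's results: the forgetting scheme should stay near mixed
    # 0.5/0.5 (matching) play, while the no-forgetting scheme is expected
    # to drift out toward a corner of the strategy simplex.
    print(simulate(update_forgetting))
    print(simulate(update_no_forgetting))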

Future Directions
– Validate the approximation technique.
– Analyze the properties of more general reinforcement learners.
– Consider situations with asymmetric learning rates.
– Study the behavior of the algorithms for arbitrary payoff matrices.