1 Near-Optimal Play in a Social Learning Game
Ryan Carr, Eric Raboin, Austin Parker, and Dana Nau
Department of Computer Science, University of Maryland, College Park, MD 20742, USA

2 Outline
- Introduction
- The Social Learning Strategies Tournament
- Evaluating Social Learning Strategies
- Computing “Near-Optimal” Strategies
- Conclusions
- Ongoing and Future Work

3 Introduction
- Exploration vs. exploitation in a social setting
- You are an agent in an unknown environment
- You do not know how to do simple tasks, like finding food or traveling
- You can work out how to do them on your own, or copy others
- What do you do?

4 The Social Learning Strategies Tournament
- Evolutionary simulation with social interaction
- Designed to examine the biological and cultural evolution of communication
- Developed by the European Commission’s Cultaptation Project
- €10,000 prize
- 102 entries from numerous fields (biology, physics, psychology, sociology, mathematics, philosophy, ...)

5 Our Contributions
- A metric for evaluating social learning strategies: Expected Reproductive Utility (ERU)
- A proof that maximizing ERU maximizes the chance of winning the game
- A formula for calculating ERU to within any ε > 0
- A proof that near-optimal strategies (within any ε > 0) can be found with a finite amount of computation
- An algorithm that performs this computation and finds the near-optimal strategies

6 The Social Learning Strategies Tournament: Rules
- Let V be a set of 100 actions, each with a payoff drawn from a probability distribution π
- The environment is populated by 100 agents, each using some strategy
- Agents begin with no knowledge of V or π
- Each round, each agent may:
  - Innovate (I): learn a random action from V, and its payoff
  - Observe (O): learn an action performed by some other agent, and its payoff
  - Exploit (E): perform a known action to gain its payoff
- Each round, an action's payoff changes with probability c

7 The Social Learning Strategies Tournament: Rules
- Agents die with a 2% chance every round
- Dead agents are replaced by offspring of living agents
- An agent's chance to reproduce is proportional to its per-round payoff
- A game lasts 10,000 rounds
- Each strategy's score is its average population over the last 2,500 rounds
(A minimal simulation sketch of these rules follows below.)
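
To make the rules on slides 6 and 7 concrete, here is a minimal, illustrative sketch of the per-round dynamics in Python. All identifiers, the value of c, and the choice of payoff distribution are assumptions made for this example; this is not the official tournament code.

```python
import random

N_ACTIONS, N_AGENTS = 100, 100
DEATH_RATE = 0.02      # d: per-round death probability
CHANGE_PROB = 0.05     # c: per-round payoff-change probability (value assumed here)

def draw_payoff():
    # pi is not specified on the slides; an exponential distribution is a stand-in
    return random.expovariate(1.0)

payoffs = [draw_payoff() for _ in range(N_ACTIONS)]   # current payoff of each action in V

class Agent:
    def __init__(self, strategy):
        self.strategy = strategy        # function: history -> "I" | "O" | ("E", action)
        self.history = []               # [(move, result), ...]
        self.last_exploited = None
        self.total_payoff, self.age = 0.0, 0

def step(agents):
    # Payoff drift: each action's payoff changes with probability c.
    for a in range(N_ACTIONS):
        if random.random() < CHANGE_PROB:
            payoffs[a] = draw_payoff()
    # Each agent makes one move: Innovate, Observe, or Exploit.
    for ag in agents:
        ag.age += 1
        move = ag.strategy(ag.history)
        if move == "I":                                   # learn a random action and its payoff
            act = random.randrange(N_ACTIONS)
            ag.history.append(("I", (act, payoffs[act])))
        elif move == "O":                                 # learn an action another agent exploited
            demos = [o for o in agents if o is not ag and o.last_exploited is not None]
            if demos:
                act = random.choice(demos).last_exploited
                ag.history.append(("O", (act, payoffs[act])))
        else:                                             # ("E", act): perform a known action for payoff
            _, act = move
            ag.total_payoff += payoffs[act]
            ag.last_exploited = act
            ag.history.append(("E", (act, payoffs[act])))
    # Death, then reproduction proportional to per-round payoff.
    survivors = [ag for ag in agents if random.random() >= DEATH_RATE]
    parents = survivors[:]
    weights = [p.total_payoff / p.age if p.age else 0.0 for p in parents]
    while len(survivors) < N_AGENTS and parents:
        if sum(weights) > 0:
            parent = random.choices(parents, weights=weights)[0]
        else:
            parent = random.choice(parents)
        survivors.append(Agent(parent.strategy))          # offspring inherits the parent's strategy
    return survivors
```

Running step for 10,000 rounds and averaging each strategy's population over the last 2,500 rounds would then give the score described above.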

8 Example Game
- Strategy I1: innovates once, then exploits forever. Learns action 1 (payoff 3).
- Strategy I2O: innovates twice, observes, then exploits forever. Learns action 3 (payoff 8) and action 2 (payoff 5), then observes I1.

9 Outline
- Introduction
- The Social Learning Strategies Tournament
- Evaluating Social Learning Strategies
- Computing “Near-Optimal” Strategies
- Conclusions
- Ongoing and Future Work

10 Terminology
- A history h records, for each round, the move the agent made and its result: h = ⟨(move 1, result 1), (move 2, result 2), ...⟩
- H is the set of all possible histories
- A strategy S is a function that gives the move an agent will make for a given history h: S(h) = move
- Example: after I1's initial Innovate reveals action 1, I1(h) returns "exploit action 1" for every later history h (an illustrative encoding of I1 follows below)
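
Using the same (assumed) history representation as the simulation sketch above, the strategy I1 from slide 8 can be written as a function from histories to moves:

```python
def I1(history):
    # Innovate once, then exploit the learned action forever (strategy I1 from slide 8).
    if not history:
        return "I"                         # first round: Innovate
    _move, (action, _payoff) = history[0]  # result of the initial Innovate
    return ("E", action)                   # every later round: Exploit that action
```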

11 Expected Reproductive Utility
- Each round, a single agent reproduces, with probability proportional to its per-round payoff
- ERU measures the expected total per-round payoff an agent will earn in its lifetime
- For a strategy S, ERU(S) is defined by a sum over all possible histories (a reconstructed form is sketched below)
- Note: not directly computable, since |H| = ∞
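
The defining formula on this slide was an image and is not reproduced in the transcript. A plausible reconstruction, based on the description above and the proof sketch on slide 17 (survival probability times expected per-round payoff, summed over rounds), is:

```latex
% Reconstruction for illustration; the exact definition is in the paper.
% d: per-round death probability; \bar{u}_r(S): expected per-round payoff of an
% agent using S that is alive in round r. Equivalently, the sum can be taken over
% all histories h \in H weighted by \Pr(h \mid S), which is why |H| = \infty
% prevents evaluating the definition directly.
\mathrm{ERU}(S) \;=\; \sum_{r=1}^{\infty} (1-d)^{\,r}\,\bar{u}_r(S)
```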

12 Expected Reproductive Utility
- ERU can also be defined recursively, expanding one round at a time
- Still not computable, since the recursion never stops
- We can stop the recursion at a certain depth, if we are willing to lose some accuracy

13 Expected Reproductive Utility
- A depth-limited version stops the recursion after round k
- Computable, but not exact
- We need an upper bound on the ERU obtained after round k (a reconstruction of both forms follows below)
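
The recursive and depth-limited formulas were also images on the original slides. A rough reconstruction, using h for a history, h' for its one-round extensions, u(h) for the per-round payoff after h, and d for the death probability, might look like:

```latex
% Hedged reconstruction, not the exact equations from the paper.
\mathrm{ERU}(S, h) \;=\; u(h) \;+\; (1-d)\sum_{h'} \Pr\bigl(h' \mid h, S(h)\bigr)\,\mathrm{ERU}(S, h')
% Depth-limited variant: stop expanding after round k, which is computable but
% omits everything earned later; that remainder is bounded by G(k, v_max).
\mathrm{ERU}_k(S, h) \;=\;
\begin{cases}
0, & |h| > k,\\
u(h) + (1-d)\sum_{h'} \Pr\bigl(h' \mid h, S(h)\bigr)\,\mathrm{ERU}_k(S, h'), & \text{otherwise},
\end{cases}
\qquad
\mathrm{ERU}(S) \;\le\; \mathrm{ERU}_k(S) + G(k, v_{\max}).
```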

14 Bounding ERU
- Each round, an agent dies with probability d
- Probability of living k rounds: (1-d)^k
- Let Q be the reproductive utility gained by exploiting some action
- Assume we do so on every round after k
- The ERU we get from this is Q(1-d)^(k+1) + Q(1-d)^(k+2) + Q(1-d)^(k+3) + ...
- This geometric series converges when 0 < d < 1

15 Bounding ERU
- G(k, v) gives the ERU gained by exploiting an action with value v on every round after k
- The derivation of G(k, v) is in the paper
- Let v_max be the highest possible payoff in V
- G(k, v_max) is an upper bound on the ERU gained after round k (a sketch of the closed form follows below)
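
The slide does not show G(k, v) itself (the derivation is in the paper), but summing the geometric series from slide 14 with Q = v suggests a closed form of roughly this shape:

```latex
% Sum of v(1-d)^{k+1} + v(1-d)^{k+2} + ... ; converges for 0 < d < 1.
G(k, v) \;=\; \sum_{r=k+1}^{\infty} v\,(1-d)^{\,r} \;=\; \frac{v\,(1-d)^{\,k+1}}{d}
```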

16 Bounding ERU (figure: timeline from game start to round k, illustrating the bound on ERU earned after round k)

17 Maximizing ERU Leads to Optimal Performance
Proof intuition (more details in the paper):
- ERU(S) sums, for each round r:
  - A: the probability that an agent lives for r rounds, times
  - B: the expected per-round payoff of S on round r
- Each round, the chance that any agent using S reproduces is proportional to the sum, over all r, of:
  - C: the portion of the population of S agents that have lived for r rounds, times
  - D: the average per-round payoff of agents that have lived r rounds
- In the expected case, A ~ C and B ~ D

18 Outline
- Introduction
- The Social Learning Strategies Tournament
- Evaluating Social Learning Strategies
- Computing “Near-Optimal” Strategies
- Conclusions
- Ongoing and Future Work

19 Finding a Near-Optimal Strategy
- Corollary 1: for any error tolerance ε > 0, there is some k' such that G(k', v_max) < ε
- Given k', we can compute a strategy S' that optimizes ERU over the first k' rounds
- ERU(S') must then be within ε of optimal
- Algorithm 1 in the paper does this; it is essentially a depth-first search of all pairs (a rough sketch follows below)
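
This is not Algorithm 1 from the paper, but the idea can be sketched as a depth-limited, exhaustive search under an assumed environment model. Here outcomes(history, move) is a hypothetical model of the environment and the other agents, returning (probability, result, payoff) triples; the real algorithm optimizes ERU exactly, while this sketch maximizes a simplified survival-discounted payoff.

```python
DEATH_RATE = 0.02   # d: per-round death probability

def best_move(history, k_prime, outcomes, moves=("I", "O", "E")):
    """Exhaustively search moves up to round k_prime; return (move, value) for `history`.

    `outcomes(history, move)` is an assumed model returning (probability, result, payoff)
    triples for each way the move could turn out.
    """
    if len(history) >= k_prime:
        return None, 0.0                       # truncate; the tail is bounded by G(k', v_max)
    best = (None, float("-inf"))
    for move in moves:
        value = 0.0
        for prob, result, payoff in outcomes(history, move):
            _, future = best_move(history + ((move, result),), k_prime, outcomes)
            value += prob * (payoff + (1 - DEATH_RATE) * future)
        if value > best[1]:
            best = (move, value)
    return best

def near_optimal_strategy(k_prime, outcomes):
    """Package the search as a strategy: history -> move."""
    def strategy(history):
        move, _ = best_move(tuple(history), k_prime, outcomes)
        return move
    return strategy
```

Branching over every move and every outcome at every depth is what produces the exponential running time mentioned in the conclusions.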

20 Outline
- Introduction
- The Social Learning Strategies Tournament
- Evaluating Social Learning Strategies
- Computing “Near-Optimal” Strategies
- Conclusions
- Ongoing and Future Work

21 Conclusions
- Studying games like the SLST can help us understand the cultural and biological evolution of communication
- We have:
  - Introduced the ERU metric for social learning strategies
  - Proven that maximizing ERU leads to optimal play in the Social Learning Strategies Tournament
  - Given a formula for computing the ERU of a strategy to within any ε > 0
  - Given an algorithm that, for any ε > 0, finds a strategy whose ERU is within ε of optimal
- Limitations:
  - The strategy-generating algorithm has exponential running time
  - The algorithm needs detailed models of the environment and opponents to guarantee near-optimality

22 Ongoing and Future Work
- Ongoing work:
  - Mitigating the exponential running time of our algorithm
  - Dealing with the lack of accurate opponent models
  - Testing near-optimal strategies against in-house agents
- Future work:
  - Testing near-optimal strategies against the tournament winners
  - Dealing with unknown V and π
  - Dealing with noise in Observe moves