1 Near-Optimal Play in a Social Learning Game
Ryan Carr, Eric Raboin, Austin Parker, and Dana Nau
Department of Computer Science, University of Maryland
College Park, MD 20742, USA
2 Outline
- Introduction
- The Social Learning Strategies Tournament
- Evaluating Social Learning Strategies
- Computing “Near-Optimal” Strategies
- Conclusions
- Ongoing and Future Work
3 Introduction
- Exploration vs. exploitation in a social setting
- You are an agent in an unknown environment
  - Do not know how to do simple tasks, like find food, travel, etc.
  - Can work out how to do them on your own, or copy others
- What do you do?
4 The Social Learning Strategies Tournament
- Evolutionary simulation with social interaction
- Designed to examine biological and cultural evolution of communication
- Developed by the European Commission’s Cultaptation Project
- €10,000 prize
- 102 entries from numerous fields (Biology, Physics, Psychology, Sociology, Mathematics, Philosophy…)
5 Our Contributions
- Metric for evaluating social learning strategies: Expected Reproductive Utility (ERU)
- Proof that maximizing ERU maximizes chances of winning the game
- Formula for calculating ERU within any ε > 0
- Proof that near-optimal strategies (within ε > 0) can be found with a finite amount of computation
- Algorithm to do this computation and find the near-optimal strategies
6 The Social Learning Strategies Tournament: Rules
- Let V be a set of 100 actions, each with some payoff utility drawn from probability distribution π
- Environment populated by 100 agents, each using some strategy
- Agents begin with no knowledge of V or π
- Each round, each agent may:
  - Innovate (I): Learn a random action from V and its payoff
  - Observe (O): Learn an action performed by some other agent, and its payoff
  - Exploit (E): Perform an action to gain payoff
- Payoff of an action changes with probability c
7 The Social Learning Strategies Tournament: Rules
- Agents die with a 2% chance every round
  - Dead agents are replaced by offspring of living agents
  - Agents' chance to reproduce is proportional to their per-round payoff
- Game lasts 10,000 rounds
- Each strategy's score is its average population over the last 2,500 rounds
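The rules on the two slides above can be sketched as a small simulation. This is a simplified illustration under stated assumptions, not the tournament's actual implementation: the unknown distribution π is replaced by an exponential draw, c is set to an arbitrary 0.01, Observe is modeled as copying an action that some agent exploited on the previous round, and the names `Agent`, `simulate`, `innovate_once`, and `observe_first` are hypothetical.

```python
import random

N_ACTIONS = 100      # |V|, from the rules
N_AGENTS = 100       # population size, from the rules
DEATH_PROB = 0.02    # per-round death probability, from the rules
CHANGE_PROB = 0.01   # c: assumed value, the slides do not fix it


def draw_payoff(rng):
    # Stand-in for the unknown distribution pi (assumption: exponential, mean 10).
    return rng.expovariate(1 / 10)


def innovate_once(known, age):
    """Strategy I1: innovate until one action is known, then exploit."""
    return "I" if not known else "E"


def observe_first(known, age):
    """Hypothetical strategy: observe on the first round, innovate if that fails."""
    if not known:
        return "O" if age == 0 else "I"
    return "E"


class Agent:
    def __init__(self, strategy):
        self.strategy = strategy
        self.known = {}      # action index -> last payoff seen for it
        self.age = 0         # rounds lived
        self.total = 0.0     # lifetime payoff

    def per_round_payoff(self):
        return self.total / self.age if self.age else 0.0


def simulate(strategies, rounds=10_000, scored_rounds=2_500, seed=0):
    rng = random.Random(seed)
    payoffs = [draw_payoff(rng) for _ in range(N_ACTIONS)]
    agents = [Agent(strategies[i % len(strategies)]) for i in range(N_AGENTS)]
    exploited_last_round = []   # (action, payoff) pairs visible to Observe
    score = {s.__name__: 0.0 for s in strategies}

    for rnd in range(rounds):
        exploited_now = []
        for agent in agents:
            move = agent.strategy(agent.known, agent.age)
            if move == "I":
                a = rng.randrange(N_ACTIONS)
                agent.known[a] = payoffs[a]
            elif move == "O" and exploited_last_round:
                a, p = rng.choice(exploited_last_round)
                agent.known[a] = p
            elif move == "E" and agent.known:
                a = max(agent.known, key=agent.known.get)
                agent.known[a] = payoffs[a]   # exploiting reveals the current payoff
                agent.total += payoffs[a]
                exploited_now.append((a, payoffs[a]))
            agent.age += 1
        exploited_last_round = exploited_now

        # Each action's payoff changes with probability c.
        payoffs = [draw_payoff(rng) if rng.random() < CHANGE_PROB else p
                   for p in payoffs]

        # Death and replacement: parents are drawn in proportion to per-round payoff.
        survivors = [a for a in agents if rng.random() >= DEATH_PROB]
        parents = survivors or agents
        weights = [a.per_round_payoff() + 1e-9 for a in parents]
        births = N_AGENTS - len(survivors)
        children = [Agent(p.strategy)
                    for p in rng.choices(parents, weights, k=births)]
        agents = survivors + children

        # Score: average population over the final scored_rounds rounds.
        if rnd >= rounds - scored_rounds:
            for a in agents:
                score[a.strategy.__name__] += 1 / scored_rounds
    return score
```

Calling `simulate([innovate_once, observe_first], rounds=300, scored_rounds=100)` runs a shortened game and returns each strategy's average population over the scored rounds; the values sum to the population size of 100.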
8 Example Game
- Strategy I1: Innovates once, then exploits forever
  - Learns action 1 (payoff 3)
- Strategy I2O: Innovates twice, observes, then exploits forever
  - Learns action 3 (payoff 8), action 2 (payoff 5), then observes I1
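10 Terminology
- History h describes what actions the agent performed each round, and their results
  - h is a sequence with one (action, result) entry per round
  - Ex: for I1, the first entry records the Innovate move and its result (action 1, payoff 3)
- H is the set of all possible histories
- A strategy S is a function that describes what action an agent will take for a given history h
  - S(h) = action
  - Ex: once I1 has learned action 1, I1(h) = X[1] (exploit action 1)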
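A minimal sketch of these definitions, using the two strategies from the example game. The encoding is an assumption made for illustration: a history is a list of (move, result) entries, a move is "I", "O", or ("X", i) for exploiting action i, and a result is the (action, payoff) pair the agent saw. "Exploits forever" is read here as exploiting the best action learned, which the slides do not state explicitly.

```python
from typing import List, Tuple, Union

Move = Union[str, Tuple[str, int]]   # "I", "O", or ("X", action_index)
Result = Tuple[int, float]           # (action_index, payoff)
History = List[Tuple[Move, Result]]  # one (move, result) entry per round


def strategy_i1(h: History) -> Move:
    """I1: innovate once, then exploit the action that was learned."""
    if not h:
        return "I"
    _move, (action, _payoff) = h[0]
    return ("X", action)


def strategy_i2o(h: History) -> Move:
    """I2O: innovate twice, observe once, then exploit the best action learned."""
    if len(h) < 2:
        return "I"
    if len(h) == 2:
        return "O"
    learned = [result for _move, result in h[:3]]
    best_action, _best_payoff = max(learned, key=lambda r: r[1])
    return ("X", best_action)


# Histories taken from the example game.
h_i1 = [("I", (1, 3.0))]
h_i2o = [("I", (3, 8.0)), ("I", (2, 5.0)), ("O", (1, 3.0))]
print(strategy_i1(h_i1))    # exploit action 1
print(strategy_i2o(h_i2o))  # exploit action 3, the highest payoff learned
```

Representing a strategy as a plain function of the history keeps it stateless, which matches the definition S(h) = action.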
11 Expected Reproductive Utility
- Each round, a single agent reproduces with probability proportional to its per-round payoff
- ERU measures the expected total per-round payoff the agent will earn in its lifetime
- For strategy S, the definition ranges over all histories h in H
- Note: not computable, since |H| = ∞
12 Expected Reproductive Utility
- A recursive definition of ERU
- Still not computable, since the recursion never stops
- We can stop the recursion at a certain depth, if we are willing to lose some accuracy
13 Expected Reproductive Utility
- A depth-limited version
  - Stops recursion after round k
  - Computable, but not exact
- Need an upper bound on ERU obtained after round k
14 Bounding ERU
- Each round, agent dies with probability d
  - Probability of living k rounds: $(1-d)^k$
- Let Q be the reproductive utility gained by exploiting some action
  - Assume we do so on every round after k
  - $Q(1-d)^{k+1} + Q(1-d)^{k+2} + Q(1-d)^{k+3} + \cdots$ is the ERU we get from this
- This geometric series converges when 0 < d < 1
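The series on this slide has a closed form, which makes the size of the tail explicit. The following is the standard geometric-series identity applied to the slide's own symbols Q, d, and k; it is not necessarily the paper's G(k, v), whose derivation is given there.

```latex
\begin{align*}
\sum_{i=k+1}^{\infty} Q(1-d)^{i}
  &= Q(1-d)^{k+1} \sum_{j=0}^{\infty} (1-d)^{j} \\
  &= \frac{Q(1-d)^{k+1}}{d}, \qquad 0 < d < 1 .
\end{align*}
```

With the tournament's d = 0.02, this is $50\,Q\,(0.98)^{k+1}$, which shrinks toward 0 as k grows.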
15 Bounding ERU
- G(k, v) gives us the ERU gained by exploiting an action with value v on every round after k
  - Derivation of G(k, v) is in the paper
- Let $v_{\max}$ be the highest possible payoff in V
  - $G(k, v_{\max})$ is an upper bound on ERU gained after round k
16 Bounding ERU
[Figure: timeline from game start to round k]
17 Maximizing ERU Leads to Optimal Performance
- Proof intuition (more details in paper):
- ERU(S) sums, for each round r:
  - A: Probability that an agent lives for r rounds, times
  - B: Expected per-round payoff for S on round r
- Each round, chance that any agent using S reproduces is proportional to the sum, for all r, of:
  - C: Portion of the population of S agents that have lived for r rounds, times
  - D: Average per-round payoff of agents that have lived r rounds
- In the expected case, A ≈ C and B ≈ D
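The "A times B" sum can be illustrated with a depth-limited calculation. This is a sketch under simplifying assumptions, not the paper's formula: the payoff the agent earns on each round is taken as a fixed, known number rather than an expectation over histories, per-round payoff on round r is taken to be the cumulative payoff divided by r, and the function name `truncated_eru` is hypothetical.

```python
def truncated_eru(payoff_by_round, death_prob, depth):
    """Sum, for r = 1..depth, of P(agent lives r rounds) * per-round payoff at r."""
    if depth > len(payoff_by_round):
        raise ValueError("need one payoff per round up to the requested depth")
    total_payoff = 0.0
    eru = 0.0
    for r in range(1, depth + 1):
        total_payoff += payoff_by_round[r - 1]   # payoff earned on round r
        survive = (1 - death_prob) ** r          # A: probability of living r rounds
        per_round = total_payoff / r             # B: per-round payoff on round r
        eru += survive * per_round
    return eru


# Strategy I1 from the example game: innovate (payoff 0), then exploit payoff 3.
i1_payoffs = [0.0] + [3.0] * 9
for depth in (2, 5, 10):
    print(depth, truncated_eru(i1_payoffs, death_prob=0.02, depth=depth))
```

Each additional round adds a non-negative term, so the truncated value never decreases as the depth grows; what is left out is the contribution of rounds after the cutoff.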
19 Finding a Near-Optimal Strategy
- Corollary 1: For any error tolerance ε > 0, there is some k' such that $G(k', v_{\max}) < \varepsilon$
- Given k', we can compute a strategy S' that optimizes ERU over the first k' rounds
  - ERU(S') must be within ε of optimal
- Algorithm 1 in the paper, essentially a depth-first search of all (history, action) pairs
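Corollary 1 suggests a simple way to pick the search depth: increase k until the bound on the remaining ERU drops below ε. The sketch below uses the geometric tail v(1-d)^(k+1)/d as a stand-in for G(k, v), which is an assumption, since the paper's G is derived separately. The function names and the value v_max = 50 are hypothetical.

```python
def tail_bound(k, v_max, death_prob):
    """Geometric tail v_max * (1-d)^(k+1) / d, used here as a stand-in for G(k, v_max)."""
    return v_max * (1 - death_prob) ** (k + 1) / death_prob


def smallest_depth(eps, v_max, death_prob):
    """Smallest k for which the stand-in bound falls below eps."""
    if eps <= 0 or not 0 < death_prob < 1:
        raise ValueError("need eps > 0 and 0 < death_prob < 1")
    k = 0
    while tail_bound(k, v_max, death_prob) >= eps:
        k += 1
    return k


# d = 0.02 comes from the rules; v_max = 50 is an illustrative value only.
k_prime = smallest_depth(eps=0.1, v_max=50.0, death_prob=0.02)
print(k_prime, tail_bound(k_prime, 50.0, 0.02))
```

Because the bound shrinks by a factor of (1-d) per round, the loop always terminates, and the required depth grows only logarithmically as ε shrinks. The cost lies in the search itself, which is exponential in that depth.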
21 Conclusions
- Studying games like the Social Learning Strategies Tournament (SLST) can help us understand cultural and biological evolution of communication
- We have:
  - Introduced the ERU metric for social learning strategies
  - Proven that maximizing ERU leads to optimal play in the Social Learning Strategies Tournament
  - Given a formula for computing the ERU of a strategy to within any ε > 0
  - Given an algorithm that, for any ε > 0, will find a strategy whose ERU is within ε of optimal
- Limitations:
  - Strategy-generating algorithm has exponential running time
  - Algorithm needs detailed models of environment and opponents to guarantee near-optimality
22 Ongoing and Future Work
- Ongoing work:
  - Mitigating exponential running time of our algorithm
  - Dealing with the lack of accurate opponent models
  - Testing near-optimal strategies against in-house agents
- Future work:
  - Testing near-optimal strategies against tournament winners
  - Dealing with unknown V and π
  - Dealing with noise in Observe moves