Competition between adaptive agents: learning and collective efficiency
Damien Challet, Oxford University
Matteo Marsili, ICTP-Trieste (Italy)
● My definition of the Minority Game
● Simple worlds (M = 0)
● Markovian behavior
● Neural networks
● Reinforcement learning
● Multistate worlds (M > 0)
● Cause of large inefficiencies
● Remedies
● From El Farol to MG and back
'Truth is always in the minority' Kierkegaard
Zig-Zag-Zoug ● Game played by Swiss children ● 3 players, 3 feet, 3 magic words ● “Ziiig”... “Zaaag”.... “ZOUG!”
Minority Game ● Zig-Zag-Zoug with N players ● Aim: to be in the minority ● Outcome = #UP-#DOWN = #A-#B ● Model of competition between adaptive players Challet and Zhang (1997), from El Farol's bar problem (Arthur 1994)
Initial goals of the MG
● El Farol (1994): impossible to understand
● Drastic simplification, keeping key ingredients: bounded rationality, reinforcement learning
● Symmetrize the problem: 60/100 -> 50/50
● Understand the symmetric problem
● Generalize results to the asymmetric problem
Repeated games
● Why play again? Frustration: the losers are in the majority
● How to play?
● Deduction: rationality, best answer -> all lose!
● Induction: limited capabilities; beliefs, strategies, personality; trial and error; learning
Minority Game
● N agents, i = 1, ..., N
● Choice: $a_i(t) = \pm 1$
● Outcome: $A(t) = \sum_i a_i(t)$
● Payoff to player i: $-a_i(t)\,A(t)$
● Total losses $= A^2$
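The definitions on this slide can be checked with a minimal sketch of one round. The uniformly random choices, the function name `play_round` and the value N = 101 are illustrative assumptions, not part of the talk; N is taken odd so that a minority always exists.

```python
import random

def play_round(n_agents, rng):
    """Play one round with uniformly random choices a_i = +1 or -1 (assumption)."""
    choices = [rng.choice((-1, 1)) for _ in range(n_agents)]
    outcome = sum(choices)                      # A(t) = sum_i a_i(t)
    payoffs = [-a * outcome for a in choices]   # payoff of agent i: -a_i(t) A(t)
    return choices, outcome, payoffs

rng = random.Random(0)
choices, outcome, payoffs = play_round(101, rng)
total_payoff = sum(payoffs)                     # equals -A(t)^2: total losses are A^2
n_winners = sum(1 for p in payoffs if p > 0)    # agents on the minority side
```

Summing the payoffs gives $-A \sum_i a_i = -A^2$, which is why the total losses on the slide equal $A^2$, and only agents on the minority side receive a positive payoff.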
Markovian learning
'If it ain't broken, don't fix it' (Reents et al., Physica A 2000):
● If I won, I stick to my previous choice
● If I lost, I change to the other choice with probability p
Results ($\sigma^2 = \langle A^2 \rangle$):
● $pN = x = \text{const}$ (small p): $\sigma^2 = 1 + 2x\,(1 + x/6)$
● $p \sim N^{-1/2}$: $\sigma^2 \sim N$
● $p \sim 1$: $\sigma^2 \sim N^2$
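The win-stay, lose-shift rule above can be simulated in a few lines. This is a sketch under stated assumptions: the function name `markovian_sigma2`, the random initial condition, the simultaneous update of all losers and the discarded first half of the run are choices made here for illustration; N should be odd so that A is never zero.

```python
import random

def markovian_sigma2(n_agents, p, n_steps, seed=0):
    """Estimate sigma^2 = <A^2> for the win-stay / lose-shift rule."""
    rng = random.Random(seed)
    choices = [rng.choice((-1, 1)) for _ in range(n_agents)]
    burn_in = n_steps // 2          # assumption: discard the first half as transient
    total = 0.0
    for t in range(n_steps):
        outcome = sum(choices)      # A(t)
        if t >= burn_in:
            total += outcome * outcome
        # Losers sit on the majority side (a_i has the sign of A);
        # each of them switches with probability p, winners keep their choice.
        for i, a in enumerate(choices):
            if a * outcome > 0 and rng.random() < p:
                choices[i] = -a
    return total / (n_steps - burn_in)

sigma2_small_p = markovian_sigma2(101, 1.0 / 101, 2000)   # x = pN = 1
sigma2_p_one = markovian_sigma2(101, 1.0, 2000)           # every loser switches
```

With p = 1 the whole majority switches at once, so after the first step all agents move together and $A = \pm N$ at every step. For small p with $x = pN$ fixed, the estimate can be compared with $1 + 2x(1 + x/6)$ from the slide.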
Markovian learning II
● Problem: if N is unknown, p = ?
● Try $p = f(t)$, e.g. $p = t^{-k}$
● Convergence for any N
● Freezing
● When to stop?
Neural networks
● Simple perceptrons, learning rate R (Metzler)
● $\sigma^2 = N + N(N-1)\,F(N,R)$
● $\min \sigma^2 = N\,(1 - 2/\pi) \approx 0.363\,N$
Reinforcement learning
● Each player has a register $D_i$
● $D_i > 0$: + is better
● $D_i < 0$: - is better
● $D_i(t+1) = D_i(t) - A(t)$
● Choice: $\mathrm{prob}(+ \mid D_i) = f(D_i)$, with $f'(x) > 0$ (RL)
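Reinforcement learning II
● Central result: agents minimize $\langle A \rangle^2$ (predictability) for all f
● Stationary state: $\langle A \rangle = 0$
● Fluctuations $\sigma^2 = \langle A^2 \rangle = ?$
● Ex: $f(x) = (1 + \tanh(Kx))/2$, exponential learning, K = learning rate
● $K < K_c$: $\sigma^2 \sim N$
● $K > K_c$: $\sigma^2 \sim N^2$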
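The register dynamics $D_i(t+1) = D_i(t) - A(t)$ with $f(x) = (1 + \tanh(Kx))/2$ can be explored with the sketch below. The function name `rl_sigma2`, the initial registers drawn uniformly in [-1, 1], the run length and the two values of K are assumptions chosen for illustration; the slides do not specify them.

```python
import math
import random

def rl_sigma2(n_agents, K, n_steps, seed=0):
    """Estimate sigma^2 = <A^2> for agents choosing + with prob (1 + tanh(K D_i)) / 2."""
    rng = random.Random(seed)
    # Assumption: small random initial registers.
    D = [rng.uniform(-1.0, 1.0) for _ in range(n_agents)]
    burn_in = n_steps // 2          # assumption: discard the first half as transient
    total = 0.0
    for t in range(n_steps):
        choices = [1 if rng.random() < 0.5 * (1.0 + math.tanh(K * d)) else -1
                   for d in D]
        outcome = sum(choices)      # A(t)
        if t >= burn_in:
            total += outcome * outcome
        D = [d - outcome for d in D]    # D_i(t+1) = D_i(t) - A(t)
    return total / (n_steps - burn_in)

sigma2_slow = rl_sigma2(101, 0.001, 200)   # small learning rate
sigma2_fast = rl_sigma2(101, 10.0, 200)    # large learning rate
```

With a small K the agents behave almost like coin tossers, while a large K makes them overreact together to the last outcome, which is the qualitative difference between the two regimes listed on the slide.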
Reinforcement learning III
Market impact: each agent has an influence on the outcome
● Naive agents: payoff $-A = -A_{-i} - a_i$
● Non-naive agents: payoff $-A + c\,a_i$
● Smart agents: payoff $-A_{-i}$ (cf. WLU, AU)
● Central result 2: non-naive agents minimize $\langle A^2 \rangle$ (fluctuations) for all f -> Nash equilibrium, $\sigma^2 \sim 1$
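The three payoffs differ only in how much of the agent's own contribution $a_i$ is removed from A. A minimal sketch, assuming that the payoff is what gets added to the register $D_i$ (an extension of the naive update rule, not stated on this slide) and using the hypothetical names `perceived_payoff` and `update_register`:

```python
def perceived_payoff(a_i, outcome, c):
    """Payoff seen by agent i: -A + c * a_i (c = 0 naive, c = 1 smart)."""
    return -outcome + c * a_i

def update_register(d_i, a_i, outcome, c):
    """One register update; c = 0 gives back D_i(t+1) = D_i(t) - A(t)."""
    return d_i + perceived_payoff(a_i, outcome, c)

# Example: agent i played a_i = +1 and the outcome was A = 5, so A_{-i} = A - a_i = 4.
naive = perceived_payoff(+1, 5, c=0.0)    # -A
smart = perceived_payoff(+1, 5, c=1.0)    # -A_{-i}
```

Setting c = 1 removes the agent's own action from the outcome exactly, while any intermediate c > 0 corresponds to the non-naive case on the slide.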
Summary
Minority Games with memory
If an agent believes that the outcome depends on the past results, the outcome will depend on the past results.
● Sunspot effect
● Self-fulfilling prophecies
● Fallacies of causal inference
Consequence: the other agents will change their behavior accordingly
Minority Games with memory: naïve agents
● Fixed randomly drawn strategies = quenched disorder
● Tools of statistical physics give the exact solution in principle
● Agents minimize the predictability
● Predictability = Hamiltonian -> optimization problem
● Numeric: Savit++ PRL99
● Analytic: Challet++ PRL99, Coolen+ J. Phys A 2002 ?
[Figure: $\sigma^2/N$ versus $\alpha = P/N$]
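A minimal simulation sketch of a Minority Game with P states and fixed random strategies is given below. Several details are assumptions not fixed by this slide: two strategies per agent, a state drawn uniformly at random at each step instead of a real history, integer scores updated by $-a\,A$ for every strategy, and the function name `mg_sigma2_over_n`.

```python
import random

def mg_sigma2_over_n(n_agents, n_states, n_steps, n_strategies=2, seed=0):
    """Estimate sigma^2 / N for naive agents with fixed random strategies."""
    rng = random.Random(seed)
    # Quenched disorder: each strategy maps each of the P states to an action +1 or -1.
    strategies = [[[rng.choice((-1, 1)) for _ in range(n_states)]
                   for _ in range(n_strategies)]
                  for _ in range(n_agents)]
    scores = [[0] * n_strategies for _ in range(n_agents)]
    burn_in = n_steps // 2          # assumption: discard the first half as transient
    total = 0.0
    for t in range(n_steps):
        mu = rng.randrange(n_states)    # assumption: state drawn uniformly at random
        actions = []
        for i in range(n_agents):
            # Play the best-scoring strategy (ties go to the lowest index).
            best = max(range(n_strategies), key=lambda s: scores[i][s])
            actions.append(strategies[i][best][mu])
        outcome = sum(actions)          # A(t)
        if t >= burn_in:
            total += outcome * outcome
        # Naive update: every strategy is scored as if it had been played.
        for i in range(n_agents):
            for s in range(n_strategies):
                scores[i][s] -= strategies[i][s][mu] * outcome
    return total / (n_steps - burn_in) / n_agents

ratio = mg_sigma2_over_n(51, 8, 2000)   # alpha = P/N = 8/51
```

Calling the function for several values of `n_states` at fixed `n_agents` gives points of $\sigma^2/N$ as a function of $\alpha = P/N$.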
Minority Games with memory: low efficiency
[Figure: horizontal axis $\alpha = P/N$]
Minority Games with memory: low efficiency P/N is not the right scaling for large fluctuations
Minority Games with memory: origin of low efficiency
● Stochastic dynamical equation for the strategy score $U_i$: slowly varying part (I) + correlated noise (II)
● I: size independent
● II $= K P^{-1/2}$
● When I << II: large fluctuations
● Transition at $I/K = G/P^{1/2}$
● Critical signal-to-noise ratio $= G/P^{1/2}$
Minority Games with memory: origin of low efficiency
● Check: determine G
● Predict the critical points from $I/K = G/P^{1/2}$
Minority Games with memory: origin of low efficiency
[Figure: two panels, BEFORE and AFTER]
Minority Games with memory: origin of low efficiency
Minority Games with memory: sophisticated agents Agents minimize fluctuations Optimization problem again
Reverse problem
Many variations, different global utility functions:
● Grand canonical game (play or not play)
● Time window of scores (exponential moving average)
● Any payoff
Hence, given a task (global utility function), one knows how to design agents (local utility).
Example: optimal defects combinations (cf. Neil's talk)
From El Farol to MG and back
● El Farol: attendance from 0 to N, threshold L
● MG: attendance from 0 to N, threshold L = N/2
● Differences, similarities?
● Which results from MG are valid for El Farol?
From El Farol to MG and back
[Figure: axis from 0 to N with threshold L]
● Theorem: all results from MG apply to El Farol
● Everything scales like (L/N – )/ = P ½
● The El Farol problem with P states of the world is solved.
From El Farol to MG and back: new results If (L/N – )/ = P ½ 0, P>P c = 2 S 2 / [ (L/N- ) 2 ]: no more phase transition.
Summary
● AU/WLU suppresses large fluctuations -> Nash equilibrium
● Design: agents must know they have an impact; knowledge of the exact impact is not crucial
● Reverse problem also possible
● MG: simple, rich, fun, and useful
Commented references