1
Mutually-guided Multi-agent Learning
Raghav Aras, Alain Dutech, François Charpillet (MAIA), June 2004
2
Outline
A review of some multiagent Q-learning approaches
Our approach for multiagent learning in a stochastic game
Some preliminary results
3
Multiagent Q-Learning (1)
Q-Learning (single-agent learning):
  Q(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α [R_t + γ max_a Q(s_{t+1}, a)]
  Known to converge to optimal values
minimax-Q (for zero-sum, 2-player games):
  V_1(s) ← max_{P_1 ∈ Π(A_1)} min_{a_2 ∈ A_2} Σ_{a_1 ∈ A_1} P_1(a_1) Q_1(s, (a_1, a_2))
  Known to converge to optimal values
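For reference, a minimal Python sketch of the single-agent tabular Q-learning update above; the dictionary-based Q-table, the ε-greedy selection and the hyperparameter values are illustrative assumptions, not part of the original slides.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # illustrative values

Q = defaultdict(float)                   # Q[(state, action)] -> estimated value

def choose_action(state, actions):
    """Epsilon-greedy selection over the available actions."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """Q(s_t, a_t) <- (1 - alpha) Q(s_t, a_t) + alpha [R_t + gamma max_a Q(s_t+1, a)]."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] + ALPHA * (reward + GAMMA * best_next)
```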
4
Multiagent Q-Learning (2)
Nash-Q learning (for n-agent, general-sum SGs):
  Q_i(s, a_1, .., a_n) ← (1 − α) Q_i(s, a_1, .., a_n) + α [R_i + γ NashQ_i(s′)]
  where NashQ_i(s′) = Q_i(s′, π_1(s′) π_2(s′) … π_n(s′))
  Converges under strict conditions (existence, uniqueness of Nash equilibria)
5
Drawbacks of Nash-Q learning
Coordination in choice of Nash equilibrium
Observability of all actions and all rewards
Space complexity (each agent): n |S| |A|^n
6
The problem that we treat…
n-agent SG <S, A_1..A_n, R_1..R_n, P>:
  R_i: S × (A_1 × … × A_n) → ℝ
  P: S × (A_1 × … × A_n) × S → {0,1} (deterministic)
  G_i ⊆ S (agent i's set of equally good goal states)
  G = G_1 ∩ G_2 ∩ … ∩ G_n, |G| ≥ 1 (at least one common goal state)
  An agent's payoff is the same in all its goal states
  Agents' payoffs may be different in the common goal state
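A sketch of how this class of games could be written down as a data structure; the field names and the choice to encode the deterministic transition P as a successor-state function are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

State = int
JointAction = Tuple[int, ...]            # one basic action per agent

@dataclass
class StochasticGame:
    """n-agent SG <S, A_1..A_n, R_1..R_n, P> with deterministic transitions."""
    states: Set[State]
    action_sets: List[List[int]]                          # A_i for each agent i
    rewards: List[Callable[[State, JointAction], float]]  # R_i(s, (a_1..a_n))
    successor: Callable[[State, JointAction], State]      # deterministic P
    goal_sets: List[Set[State]]                           # G_i: equally good goal states

    def common_goals(self) -> Set[State]:
        """G = G_1 ∩ ... ∩ G_n, assumed non-empty (at least one common goal)."""
        common = set(self.states)
        for g in self.goal_sets:
            common &= g
        return common
```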
7
Our Interest
A more realistic assumption for SGs: actions and rewards of other agents are hidden
Investigating « independent » learning in SGs (leading also to scalability)
Using communication to forge cooperation
A single-agent learning algorithm giving maximum payoff to a maximum number of agents
8
Communication in Our Approach
Agents send and receive 'ping messages'
Sending a message is an action
A ping message …is an (n−1)-sized array of 0s and 1s …has no content
[Figure: 5-agent example. Agent 1 sends a ping; Agent 2 receives an array indexed by agents 1, 3, 4, 5; Agent 3 receives one indexed by agents 1, 2, 4, 5]
9
Communication-based Q-Values
Agent state = <game state, message received>
Agent action = <basic action, message to send>
2^(n−1) possible messages; M_i is agent i's message set
State set = S × M_i
Action set = A_i × M_i
Size of Q-value set: |S × M_i| × |A_i × M_i|
Agent policy π_i: S × M_i → A_i × M_i
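A small sketch of the message set and of the per-agent Q-table size that follows from the augmented state and action sets; enumerating messages as (n−1)-bit tuples is an assumption consistent with the ping-message definition above.

```python
from itertools import product

def message_set(n_agents):
    """M_i: all ping messages, i.e. (n-1)-sized arrays of 0s and 1s."""
    return list(product((0, 1), repeat=n_agents - 1))

def q_table_size(n_game_states, n_basic_actions, n_agents):
    """|S x M_i| * |A_i x M_i| Q-values per agent."""
    m = len(message_set(n_agents))        # 2^(n-1) possible messages
    return (n_game_states * m) * (n_basic_actions * m)

# Example: 3 agents, 1000 game states, 3 basic actions per agent
print(q_table_size(1000, 3, 3))           # (1000*4) * (3*4) = 48000
```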
10
What do we envisage the messages doing?
Alert others of proximity to a goal state
Discover the common goal state
Enforce preference for the common goal state
Main principle of our algorithm:
Play safe by inverting actual rewards
Create artificial rewards based on messages
11
The Q-comm Learning algorithm
Agent i's initial state: σ_i ← <S, ∅> (empty message)
Loop (each agent):
  Select b_i = <a_i, mess_send> (Boltzmann or ε-greedy)
  Execute b_i, observe reward R_i
  σ′_i ← <S′, mess_recd> (next state)
  RM_i ← (R_i · mess_send) + (R_i · mess_recd)
  Invert reward: R_i ← −1 · R_i
  Q_i(σ_i, b_i) ← (1 − α) Q_i(σ_i, b_i) + α [R_i + RM_i + γ max_b Q_i(σ′_i, b)]
  σ_i ← σ′_i, S ← S′
Until S ∈ G (a goal state)
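A minimal Python sketch of one Q-comm step for a single agent, under stated assumptions: the environment interface `env.step` (returning the next game state, the actual reward and the message received) is hypothetical, and R_i · mess is read as the reward times the number of 1-bits in the message, a detail the slide leaves implicit.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1    # illustrative hyperparameters

Q = defaultdict(float)                    # Q[(agent_state, agent_action)] -> value

def select(agent_state, agent_actions):
    """Epsilon-greedy over <basic action, message to send> pairs
    (the slide also allows Boltzmann selection)."""
    if random.random() < EPSILON:
        return random.choice(agent_actions)
    return max(agent_actions, key=lambda b: Q[(agent_state, b)])

def q_comm_step(env, agent_state, agent_actions):
    """One Q-comm update for agent i.

    agent_state  = (game state, message received)
    agent_action = (basic action, message to send)
    env.step is an assumed interface; messages are (n-1)-bit tuples.
    """
    action = select(agent_state, agent_actions)
    basic_action, mess_send = action
    next_game_state, reward, mess_recd = env.step(basic_action, mess_send)

    # Artificial message-based reward RM_i = (R_i . mess_send) + (R_i . mess_recd),
    # read here as reward scaled by the number of 1-bits in each message (assumption).
    rm = reward * sum(mess_send) + reward * sum(mess_recd)

    # Play safe: invert the actual reward.
    reward = -1 * reward

    next_state = (next_game_state, mess_recd)
    best_next = max(Q[(next_state, b)] for b in agent_actions)
    Q[(agent_state, action)] = (1 - ALPHA) * Q[(agent_state, action)] \
        + ALPHA * (reward + rm + GAMMA * best_next)
    return next_state
```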
12
Test Problem: Find the Winning Number (FWN)
An n-digit array (number), e.g. 9 3 5 8, controlled by n agents
Each agent controls a digit
Actions: +1, −1, 0
G_i: list of « winning » numbers for agent i (unknown)
Each number in G_i gives an equal payoff to agent i
G = G_1 ∩ G_2 ∩ … ∩ G_n contains a common « winning » number
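A sketch of the FWN game dynamics as a tiny environment; the modulo-10 digit arithmetic, the payoff value of 1.0 and the termination condition are assumptions, and communication is left out to keep the focus on the game itself.

```python
import random

class FWNEnv:
    """Find the Winning Number: an n-digit number, one digit per agent."""

    def __init__(self, goal_sets, n_digits):
        self.goal_sets = goal_sets                       # e.g. [{2, 16, 119}, ...]
        self.digits = [random.randint(0, 9) for _ in range(n_digits)]

    def number(self):
        """The n-digit array read as an integer (the game state)."""
        return int("".join(str(d) for d in self.digits))

    def step(self, joint_action):
        """joint_action: one of +1, -1, 0 per agent, applied to its own digit."""
        for i, a in enumerate(joint_action):
            self.digits[i] = (self.digits[i] + a) % 10   # modulo-10 digits: an assumption
        num = self.number()
        # Equal payoff for agent i in any of its winning numbers G_i (value 1.0 is illustrative).
        rewards = [1.0 if num in g else 0.0 for g in self.goal_sets]
        # Terminating on the common winning number is an assumption; the slides only
        # require stopping in a goal state.
        done = num in set.intersection(*self.goal_sets)
        return num, rewards, done

# Example: the 3-agent instance from the results slides
env = FWNEnv([{2, 16, 119}, {68, 102, 119}, {37, 86, 119}], n_digits=3)
```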
13
Results (1): 3-agent FWN, G_1 = {2, 16, 119}, G_2 = {68, 102, 119}, G_3 = {37, 86, 119}
14
Results (2): 3-agent FWN, G_1 = {2, 16, 119}, G_2 = {68, 102, 119}, G_3 = {37, 86, 119}
15
Results (3): Multiple Common Goals
Agents select one common goal
16
Results (4): 4-agent FWN
Not all agents satisfied!
17
Summary of Results: Empirically, Q-comm learning finds the common goal
Works with multiple common goals
Agents coordinate equilibrium choice
Works with up to 3 agents
Doesn't always work for 4 or more agents
18
Future work:
Increase scalability by localising communication
Investigate how it can work for n ≥ 4
Analyse convergence
19
Thank you! Your questions…