Presentation is loading. Please wait.

Presentation is loading. Please wait.

Convergent Learning in Unknown Graphical Games Dr Archie Chapman, Dr David Leslie, Dr Alex Rogers and Prof Nick Jennings School of Mathematics, University.

Similar presentations


Presentation on theme: "Convergent Learning in Unknown Graphical Games Dr Archie Chapman, Dr David Leslie, Dr Alex Rogers and Prof Nick Jennings School of Mathematics, University."— Presentation transcript:

1 Convergent Learning in Unknown Graphical Games Dr Archie Chapman, Dr David Leslie, Dr Alex Rogers and Prof Nick Jennings School of Mathematics, University of Bristol and School of Electronics and Computer Science University of Southampton david.leslie@bristol.ac.uk

2 Playing games?

3

4

5

6

7 Dense deployment of sensors to detect pedestrian and vehicle activity within an urban environment. Berkeley Engineering

8 Learning in games Adapt to observations of past play Hope to converge to something “good” Why?! –Bounded rationality justification of equilibrium –Robust to behaviour of “opponents” –Language to describe distributed optimisation

9 Notation Players Discrete action sets Joint action set Reward functions Mixed strategies Joint mixed strategy space Reward functions extend to

10 Best response / Equilibrium Mixed strategies of all players other than i is Best response of player i is An equilibrium is a satisfying, for all i,

11 Fictitious play Estimate strategies of other players Game matrix Select best action given estimates Update estimates

12 Belief updates Belief about strategy of player i is the MLE Online updating

13 Processes of the form where and F is set-valued (convex and u.s.c.) Limit points are chain-recurrent sets of the differential inclusion Stochastic approximation

14 Best-response dynamics Fictitious play has M and e identically 0, and Limit points are limit points of the best- response differential inclusion In potential games (and zero-sum games and some others) the limit points must be Nash equilibria

15 Best-response dynamics Limit points are limit points of the best- response differential inclusion In potential games (and zero-sum games and some others) the limit points must be Nash equilibria

16 Generalised weakened fictitious play Consider any process such that where and and also an interplay betweenand M. The convergence properties do not change

17 Fictitious play Estimate strategies of other players Game matrix Select best action given estimates Update estimates

18 Learning the game

19 Reinforcement learning Track the average reward for each joint action Play each joint action frequently enough Estimates will be close to the expected value Estimated game converges to the true game

20 Q-learned fictitious play Estimate strategies of other players Game matrix Select best action given estimates Estimated game matrix Select best action given estimates Update estimates

21 Theoretical result Theorem – If all joint actions are played infinitely often then beliefs follow a GWFP Proof: The estimated game converges to the true game, so selected strategies are  -best responses.

22 Claus and Boutilier Claus and Boutilier (1998) state a similar result It is restricted to team games

23 Playing games? Dense deployment of sensors to detect pedestrian and vehicle activity within an urban environment. Berkeley Engineering

24 It’s impossible! N players, each with A actions Game matrix has A N entries to learn Each individual must estimate the strategy of every other individual It’s just not possible for realistic game scenarios

25 Marginal contributions Marginal contribution of player i is total system reward – system reward if i absent Maximised marginal contributions implies system is at a (local) optimum Marginal contribution might depend only on the actions of a small number of neighbours

26 Sensors – rewards Global reward for action a is Marginal reward for i is Actually use

27 Marginal contributions

28 Graphical games

29 Local learning Estimate strategies of other players Game matrix Select best action given estimates Estimated game matrix Select best action given estimates Update estimates

30 Local learning Estimate strategies of neighbours Game matrix Select best action given estimates Estimated game matrix for local interactions Select best action given estimates Update estimates

31 Theoretical result Theorem – If all joint actions of local games are played infinitely often then beliefs follow a GWFP Proof: The estimated game converges to the true game, so selected strategies are  -best responses.

32 Sensing results

33 So what?! Play converges to (local) optimum with only noisy information and local communication An individual always chooses an action to maximise expected reward given information If an individual doesn’t “play cricket”, the other individuals will reach an optimal point conditional on the behaviour of the itinerant

34 Summary Learning the game while playing is essential This can be accommodated within the GWFP framework Exploiting the neighbourhood structure of marginal contributions is essential for feasibility


Download ppt "Convergent Learning in Unknown Graphical Games Dr Archie Chapman, Dr David Leslie, Dr Alex Rogers and Prof Nick Jennings School of Mathematics, University."

Similar presentations


Ads by Google