1
Convergence Analysis of Reinforcement Learning Agents
Srinivas Turaga
MIT Brain and Cognitive Sciences, 9.912
30 March 2004
2
The Learning Algorithm

The Assumptions
–Players use stochastic strategies.
–Players only observe their own reward.
–Players attempt to estimate the value of choosing a particular action.

The Algorithm (a minimal sketch follows below)
–Play action i with probability Pr(i)
–Observe reward r
–Update value function v
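A minimal Python sketch of one round of this loop. The names (`choose_action`, the three-action value vector, the stand-in reward) are illustrative, not from the slides:

```python
import numpy as np

def choose_action(v, rng):
    """Sample an action from the mixed strategy Pr(i) = v_i / sum_k v_k,
    assuming the values v are kept non-negative."""
    p = v / v.sum()
    return rng.choice(len(v), p=p)

# One round of the assumed play/observe/update loop.
rng = np.random.default_rng(0)
v = np.ones(3)             # hypothetical 3-action value vector
i = choose_action(v, rng)  # play action i with probability Pr(i)
r = rng.random()           # stand-in for the reward the game would return
v[i] += r                  # simplest possible update; refined on the next slide
```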
3
The Learning Algorithm

[Figure: payoff matrix, rows indexed by player 1's choice i, columns by player 2's choice j; v_i denotes the value of action i.]

–Play action i with probability Pr(i), proportional to the value of action i
–Observe reward r, which also depends on the other player's choice j
–Update value function v: 2 simple schemes, each specifying the update when action i is chosen and when it is not
  Algorithm 1: forgetting
  Algorithm 2: no forgetting
(One possible form of these updates is sketched below.)
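The update equations themselves did not survive extraction, so the following Python sketch is a standard reconstruction (Erev-Roth-style), assuming an exponential decay rate `lam` for the forgetting scheme and a purely cumulative rule for the no-forgetting scheme:

```python
import numpy as np

def update_forgetting(v, i, r, lam=0.1):
    """Algorithm 1 ('forgetting'): every action's value decays each
    round, and the chosen action i also accumulates the observed
    reward r. The decay rate lam is an assumed parameter."""
    v = (1.0 - lam) * v   # action not chosen: v_k <- (1 - lam) * v_k
    v[i] += r             # action chosen:     v_i <- (1 - lam) * v_i + r
    return v

def update_no_forgetting(v, i, r):
    """Algorithm 2 ('no forgetting'): the chosen action accumulates
    the reward; unchosen actions keep their values unchanged."""
    v = v.copy()
    v[i] += r
    return v
```

Rewards are assumed non-negative throughout, so that Pr(i) proportional to v_i always defines a valid probability distribution.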
4
Analysis Techniques

Analysis of stochastic dynamics is hard! So approximate:
–Consider the average case (deterministic)
–Consider continuous time (differential equation)

Random, discrete time → Deterministic, discrete time → Deterministic, continuous time
(see the derivation sketched below)
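Applied to the forgetting update reconstructed above (itself an assumption), the two approximation steps give first the expected one-step change, then its continuous-time limit; here p is player 1's strategy, q the opponent's, and A the payoff matrix:

```latex
% Average case: expectation of the one-step change of v_i
\mathbb{E}[\Delta v_i] \;=\; -\lambda v_i \;+\; p_i \,\bar{r}_i,
\qquad \bar{r}_i = \sum_j q_j A_{ij}

% Continuous time: treat the averaged map as a differential equation
\frac{dv_i}{dt} \;=\; -\lambda v_i \;+\; p_i \,\bar{r}_i,
\qquad p_i = \frac{v_i}{\sum_k v_k}
```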
5
Results - Matching Pennies Game

–Analysis shows a stable fixed point corresponding to matching behavior. Simulations of the stochastic algorithm and of the deterministic dynamics converge, as expected.
–Analysis shows a fixed point corresponding to the Nash equilibrium. Linear stability analysis shows marginal stability. Simulations of the stochastic algorithm and of the deterministic dynamics diverge to the corners.
(A simulation sketch follows below.)
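A self-contained Python simulation of the stochastic forgetting scheme on matching pennies. The {0, 1} payoff encoding, the learning rate, and the pairing of the stable-fixed-point result with the forgetting scheme are all assumptions; the slides do not state them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Matching pennies with rewards shifted to {0, 1} so values stay
# positive and the proportional rule remains a valid distribution.
A1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])   # player 1 is rewarded on a match
A2 = 1.0 - A1                 # player 2 is rewarded on a mismatch

lam, T = 0.01, 50_000
v1, v2 = np.ones(2), np.ones(2)

for _ in range(T):
    p1, p2 = v1 / v1.sum(), v2 / v2.sum()
    i = rng.choice(2, p=p1)   # player 1's action
    j = rng.choice(2, p=p2)   # player 2's action
    # forgetting update, as reconstructed on the algorithm slide
    v1 = (1 - lam) * v1
    v1[i] += A1[i, j]
    v2 = (1 - lam) * v2
    v2[j] += A2[i, j]

print("player 1 mixed strategy:", v1 / v1.sum())
print("player 2 mixed strategy:", v2 / v2.sum())
# If matching behavior is the stable fixed point for this scheme, both
# printed strategies should hover near (0.5, 0.5).
```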
6
Future Directions

–Validate the approximation technique.
–Analyze the properties of more general reinforcement learners.
–Consider situations with asymmetric learning rates.
–Study the behavior of the algorithms for arbitrary payoff matrices.