When playing two-person finitely repeated games, do people behave as if they are adapting a policy directly (gradient ascent), or as if they are estimating action values and choosing actions based on those estimates (Q-learning)? Some research in neuroscience suggests that monkeys behave like Q-learners.
Compared three learners › Q-learner with ε-greedy exploration › Gradient ascent learner with decreasing step size › Human learner
[Payoff matrix for the Child vs. Parent game]
Assign arbitrary initial Q-values to each strategy, A and B. › We will refer to these values as Q(A) and Q(B), respectively. › ε-greedy exploration: with probability ε the Q-learner chooses a random action; otherwise it chooses the strategy with the higher Q-value.
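The ε-greedy Q-learner above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the initial Q-values, ε, and the learning rate α are illustrative choices, and the bandit-style update rule is an assumption (the slides do not give the exact update).

```python
import random

class EpsilonGreedyQLearner:
    """Minimal sketch of a stateless epsilon-greedy Q-learner over
    two strategies, A and B (parameters are illustrative)."""

    def __init__(self, epsilon=0.1, alpha=0.1):
        self.epsilon = epsilon          # exploration probability
        self.alpha = alpha              # learning rate (assumed)
        self.q = {"A": 0.0, "B": 0.0}   # arbitrary initial Q-values

    def choose(self):
        # With probability epsilon, explore uniformly at random;
        # otherwise exploit the strategy with the higher Q-value.
        if random.random() < self.epsilon:
            return random.choice(["A", "B"])
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Move the chosen strategy's value toward the observed reward.
        self.q[action] += self.alpha * (reward - self.q[action])
```

For example, after observing a reward of 1 for playing A, a purely greedy learner (ε = 0) will keep choosing A until B's estimate catches up.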
Gradient ascent learner › Reward function › Gradient of the reward › Decreasing step size › Update function
[Slide formulas for the reward function, its gradient, the step-size schedule, and the update rule]
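The components listed above can be sketched for a 2x2 game. The slides name the pieces without giving their exact forms, so the payoff table, the linear expected-reward function, and the 1/t step-size schedule below are all assumptions for illustration.

```python
def expected_reward(p, q, payoff):
    """Reward function: expected payoff when we play A with
    probability p and the opponent plays A with probability q.
    payoff[(x, y)] is our reward for the joint play (x, y)."""
    return (p * q * payoff[("A", "A")]
            + p * (1 - q) * payoff[("A", "B")]
            + (1 - p) * q * payoff[("B", "A")]
            + (1 - p) * (1 - q) * payoff[("B", "B")])

def gradient(q, payoff):
    """Gradient of the reward with respect to p; the expected
    reward is linear in p, so the gradient does not depend on p."""
    return (q * (payoff[("A", "A")] - payoff[("B", "A")])
            + (1 - q) * (payoff[("A", "B")] - payoff[("B", "B")]))

def ascend(p, q, payoff, t):
    """Update function: step uphill with decreasing step size 1/t,
    then clip p back into the valid probability range [0, 1]."""
    step = 1.0 / t
    p = p + step * gradient(q, payoff)
    return min(1.0, max(0.0, p))
```

The decreasing step size lets early plays move the strategy quickly while later plays fine-tune it, which is what drives the convergence results for gradient dynamics in general-sum games.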
Quickly forgiving tit-for-tat player › Always plays the opponent's last action. › Exception: if the last action pair was BB, it plays strategy A on the next round, quickly forgiving the opponent for making a "poor" choice.
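The rule above fits in a few lines. The function signature and the convention of opening with A when there is no history yet are illustrative assumptions; the mirroring and forgiveness rules come from the slide.

```python
def forgiving_tit_for_tat(my_last, opponent_last):
    """Quickly forgiving tit-for-tat over strategies A and B."""
    # No history yet (assumed convention): open with A.
    if opponent_last is None:
        return "A"
    # Forgiveness rule: after a mutual B play, return to A
    # immediately instead of locking into repeated punishment.
    if my_last == "B" and opponent_last == "B":
        return "A"
    # Otherwise, mirror the opponent's last play.
    return opponent_last
```

Against this opponent, a learner that plays A after a BB round is rewarded with a quick return to mutual A play, which is what makes the strategy informative for comparing learners.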
[Results: plays when the previous selection is known vs. when it is unknown]
Intelligence ordering › Human > Q-learner > gradient ascent learner = Nash
Future research › Q-learner with history: a Q-learner that conditions on past actions is expected to better resemble human behavior. How do people attempt to make "good" action choices? › Create a GUI so other people can play against the Q-learner. › Alter the payoffs?
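The proposed "Q-learner with history" could be sketched by indexing Q-values by the previous joint action rather than keeping a single stateless table. This is a speculative illustration of the future-research idea, not an implementation from the talk; the state encoding and parameters are assumptions.

```python
import random
from collections import defaultdict

class HistoryQLearner:
    """Sketch of a Q-learner whose state is the previous joint
    action (my_last, opponent_last); parameters are illustrative."""

    def __init__(self, epsilon=0.1, alpha=0.1):
        self.epsilon = epsilon
        self.alpha = alpha
        # One Q-table per history state, values initialized to 0.
        self.q = defaultdict(lambda: {"A": 0.0, "B": 0.0})

    def choose(self, state):
        # Epsilon-greedy over the Q-values for this history state.
        if random.random() < self.epsilon:
            return random.choice(["A", "B"])
        values = self.q[state]
        return max(values, key=values.get)

    def update(self, state, action, reward):
        values = self.q[state]
        values[action] += self.alpha * (reward - values[action])
```

Conditioning on history lets the learner respond differently after mutual cooperation than after a defection, which is the kind of pattern-sensitive behavior humans show against tit-for-tat-style opponents.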