1
Reinforcement Learning for an Adaptive Game Learner
Richard S. Sutton
Presenter: 정승우
2
Introduction
Selecting a game for RL:
- Simple rules
- Huge search space
- Played by many people (so it deserves analysis)
3
Introduction (2)
So, Omok (five in a row) is the chosen game.
- Goal: make a line of five in a row against an opponent
- Board: 19 x 19 = 361 positions
- Search space (brute force): 361!
- Search space (heuristic): roughly 100^10? (a quick scale check follows this list)
- Backgammon has approximately 10^20 states; Gerry Tesauro (1992, 1995) combined RL with a neural network, producing a program that plays at the level of the world's best human players
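As a rough sanity check on these scale claims, the short Python snippet below (illustrative only, not from the slides) counts the digits of 361! and restates the two smaller estimates for comparison.

```python
from math import factorial

# Illustrative scale check for the search-space figures quoted above.
digits = len(str(factorial(361)))  # number of decimal digits in 361!
print(f"361! has {digits} digits, i.e. roughly 10^{digits - 1}")
print("heuristic estimate: 100^10 = 10^20")
print("backgammon: ~10^20 states")
```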
4
MiniMax Search (Delayed Effects)
- One of the most commonly used search methods in board games
- Explores the game tree and chooses the best node; it never chooses a node that leads to a loss within the search tree (a minimal sketch follows this list)
- Assumes the opponent plays optimally
- Depends on the value function and the tree depth
- Consequently, it never chooses a node that carries risk within the search tree, precisely because it assumes the opponent plays optimally
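A minimal depth-limited minimax sketch in Python. The `game` interface (`legal_moves`, `apply`, `is_terminal`, `value`) is a hypothetical assumption for illustration; none of these names come from the slides.

```python
def minimax(state, depth, maximizing, game):
    """Depth-limited minimax over a hypothetical `game` interface.

    `game` is assumed to provide legal_moves, apply, is_terminal, and value;
    leaves are scored with the heuristic value function.
    """
    if depth == 0 or game.is_terminal(state):
        return game.value(state)

    child_scores = (
        minimax(game.apply(state, move), depth - 1, not maximizing, game)
        for move in game.legal_moves(state)
    )
    # The maximizing player picks the best child; the opponent is assumed
    # to play optimally, so the minimizing player picks the worst (for us).
    return max(child_scores) if maximizing else min(child_scores)
```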
5
MiniMax Search (2)
6
Adaptive Play Needs a Model
- A plain search player simply restarts exploring the search space at the start of every game
- Its ability stays fixed no matter how many times it has played
- It implicitly assumes the same opponent every time
- Adaptive play needs a model of the opponent
- With opponent models, the learner can change its ability and policy as it plays against them
- If the opponent models are general, so is the learned model
7
Adaptive Play Needs a Model (2)
Who can serve as a model? (see the sketch after this list)
- People who play well or poorly
- Another AI player
- Self-play
- A shadow model
- A mixed model
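One way to keep these options interchangeable is a small opponent-model interface. The class names below (`Opponent`, `SelfPlayOpponent`, `MixedOpponent`) are my own illustrative assumptions, not terms from the slides.

```python
import random
from abc import ABC, abstractmethod

class Opponent(ABC):
    """Hypothetical interface: anything that can pick a move can serve as a model."""
    @abstractmethod
    def choose_move(self, state, legal_moves):
        ...

class SelfPlayOpponent(Opponent):
    """Uses the learner's own policy as its opponent (self-play)."""
    def __init__(self, policy):
        self.policy = policy

    def choose_move(self, state, legal_moves):
        return self.policy(state, legal_moves)

class MixedOpponent(Opponent):
    """Samples one of several opponent models on each move (mixed model)."""
    def __init__(self, models, weights):
        self.models, self.weights = models, weights

    def choose_move(self, state, legal_moves):
        model = random.choices(self.models, weights=self.weights, k=1)[0]
        return model.choose_move(state, legal_moves)
```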
8
Remember the Past Plays
- Save experience: store a description of each board position encountered during play, together with its backed-up value determined by the minimax procedure (a caching sketch follows this list)
- If a previously encountered position occurs again as a terminal position of a search tree, the depth of the search is effectively amplified, since the stored value caches the results of one or more earlier searches
- In effect, more and more of the space gets searched over time
- Experience accumulates this way, but what about generalization?
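A minimal sketch of this caching idea, assuming a hashable board representation, the `minimax`-style `game` interface sketched earlier, and a plain dictionary as the position store; `search_with_memory` is my own illustrative name.

```python
# Cache of backed-up values from earlier games, keyed by board position.
position_cache = {}

def search_with_memory(state, depth, maximizing, game):
    """Like plain minimax, but reuses values stored during previous searches.

    When a leaf of the current search tree was already evaluated in an
    earlier game, its cached value effectively extends the search depth.
    """
    key = (state, maximizing)
    if key in position_cache:
        return position_cache[key]

    if depth == 0 or game.is_terminal(state):
        value = game.value(state)
    else:
        children = [
            search_with_memory(game.apply(state, m), depth - 1, not maximizing, game)
            for m in game.legal_moves(state)
        ]
        value = max(children) if maximizing else min(children)

    position_cache[key] = value  # remember the backed-up value for later games
    return value
```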
9
Generalization
- Use not only information gathered during past plays, but also inference when facing new situations
- Neural network combined with RL (sketched below): generalizes from its experience and makes searching the space more efficient
- Input nodes: board position information, recent moves, and the value function
- But regression (function approximation) may occasionally make an important mistake
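A toy value-network sketch, assuming the board is encoded as a flat numeric feature vector (for example, 361 entries for the 19 x 19 board plus one extra feature for whose turn it is); the layer sizes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class ValueNet:
    """Tiny two-layer network mapping board features to a value in (-1, 1)."""
    def __init__(self, n_inputs, n_hidden=64):
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def predict(self, features):
        hidden = np.tanh(self.W1 @ features + self.b1)  # hidden layer
        return np.tanh(self.W2 @ hidden + self.b2)      # scalar value estimate

# Example: 19 x 19 board flattened to 361 features, plus 1 turn indicator.
net = ValueNet(n_inputs=362)
board_features = np.zeros(362)
print(net.predict(board_features))
```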
10
Value Function: Getting Close to Optimal
How 1: approximating through search (a worked sketch follows this list)
    Value(s) <- α × Value(s) + (1 - α) × max_a { Reward(a) + Value(s') }, where s' is the position reached by action a
How 2: GA/GP for selecting the information (features) used in the value function (e.g., associating position information with other factors)
Intuition: e.g., the number of three-in-a-row lines, the number of white stones in some area, correlation between three-in-a-row lines
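A minimal sketch of the "How 1" update, assuming a dictionary of values keyed by (hashable) positions, the hypothetical `game` interface used earlier, and a hypothetical `game.reward(state, move)`; the α-blend of old and backed-up values mirrors the formula above.

```python
def update_value(values, state, game, alpha=0.9):
    """One 'approximate through search' update for a single position.

    values : dict mapping positions to their current estimated value
    alpha  : weight kept on the old estimate, as in the slide's formula
    """
    backed_up = max(
        game.reward(state, move) + values.get(game.apply(state, move), 0.0)
        for move in game.legal_moves(state)
    )
    old = values.get(state, 0.0)
    values[state] = alpha * old + (1 - alpha) * backed_up
    return values[state]
```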
11
Conclusion
- How to make a learner adaptive using opponent models
- How to bring the value function close to the optimal one
- How searching part of the space can gain generalization power: information vs. inference
- The search space can be divided into several parts: divide and conquer