Reinforcement Learning of Strategies for Settlers of Catan
Michael Pfeiffer
Institute for Theoretical Computer Science, Graz University of Technology, Austria
CGAIDE, Reading UK, 10th November 2004
Motivation
Computer Game AI
- mainly relies on prior knowledge of the AI designer
- inflexible and non-adaptive
Machine Learning in Games
- successfully used for classical board games
- TD-Gammon [Tesauro 95]: self-play reinforcement learning, playing strength of human grandmasters
[Figures from Sutton & Barto: Reinforcement Learning]
Goal of this Work
- Demonstrate self-play Reinforcement Learning (RL) for a large and complex game
- Settlers of Catan: popular board game; has more in common with commercial computer strategy games than backgammon or chess in terms of number of players, possible actions, interaction, non-determinism, ...
- New RL methods: model-tree-based function approximation, speeding up learning
- Combination of learning and knowledge: where in the learning process can we use our prior knowledge about the game?
Agenda
- Introduction
- Settlers of Catan
- Method
- Results
- Conclusion
The Task: Settlers of Catan
- Popular modern board game (1995)
- Key elements: resources, production, construction, victory points, trading, strategies
What makes Settlers so difficult?
- Huge state and action space
- 4 players
- Non-deterministic environment
- Interaction with opponents
Agenda: Introduction, Settlers of Catan, Method, Results, Conclusion
Reinforcement Learning
- Goal: maximize the cumulative discounted reward
- Learn the optimal state-action value function Q*(s, a)
- Learning of strategies through interaction with the environment:
  - try out actions to get an estimate of Q
  - explore new actions, exploit good actions
  - improve the currently learned policies
- Various learning algorithms: Q-Learning, SARSA, ...
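As a reminder of the update behind the algorithms listed above, here is a minimal sketch of one tabular Q-learning step. The learning rate, discount factor, and dictionary-based Q-table are illustrative choices only; the agent in this work uses function approximation instead of a table (see the following slides).

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (illustrative value)
GAMMA = 0.95  # discount factor (illustrative value)

Q = defaultdict(float)  # maps (state, action) pairs to value estimates

def q_learning_update(s, a, reward, s_next, actions_next):
    """One Q-learning step: move Q(s,a) towards r + gamma * max_a' Q(s',a')."""
    best_next = max((Q[(s_next, a2)] for a2 in actions_next), default=0.0)
    target = reward + GAMMA * best_next
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Example: one update after observing a hypothetical transition
q_learning_update(s="turn_3", a="build_road", reward=0.0,
                  s_next="turn_4", actions_next=["build_road", "trade", "pass"])
```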
Self-Play
- How to simulate opponents? The agent learns by playing against itself
- Co-evolutionary approach
- Most successful approach for RL in games, e.g. TD-Gammon [Tesauro 95]
- Apparently works better in non-deterministic games
- Sufficient exploration must be guaranteed
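Because all four seats are played by the same learner, exploration has to come from the action-selection rule itself. A common way to guarantee it is epsilon-greedy selection, sketched below; the epsilon value and the `q_value` callable are illustrative assumptions, not details taken from the slides.

```python
import random

EPSILON = 0.1  # exploration rate (illustrative value)

def epsilon_greedy(state, legal_actions, q_value):
    """Pick a random action with probability EPSILON, otherwise the greedy one.

    `q_value(state, action)` is a hypothetical callable returning the current
    estimate of Q(state, action), e.g. a model-tree prediction.
    """
    if random.random() < EPSILON:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: q_value(state, a))

# Example with a dummy value function:
action = epsilon_greedy("some_state", ["build_road", "trade", "pass"],
                        q_value=lambda s, a: len(a))  # dummy Q for illustration
```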
Typical Problems of RL in Games
- State space is too large -> value function approximation
- Action space is too large -> hierarchy of actions
- Learning time is too long -> suitable representation and approximation method
- Even obvious moves need to be discovered -> a-priori knowledge
Function Approximation
- Impossible to visit the whole state space
- Need for generalization from visited states to the whole state space
- Regression task: Q(s, a) ≈ F(φ(s), a, θ)
  - φ(s) ... feature representation of s
  - θ ... finite parameter vector (e.g. weights of linear functions or ANNs)
- Features for Settlers of Catan: 216 high-level concept features (using knowledge)
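A minimal linear instance of F, with one weight vector per action over the 216-dimensional feature vector, is sketched below. The dummy feature vector and lazy weight initialisation are placeholders for illustration; the approximator actually used in this work is a model tree, described next.

```python
import numpy as np

NUM_FEATURES = 216  # high-level concept features described above

# One weight vector per action: Q(s, a) ~ theta[a] . phi(s)
theta = {}  # action -> weight vector, filled lazily during training

def q_approx(features, action):
    """Linear approximation of Q(s, a), given phi(s) as a feature vector."""
    weights = theta.setdefault(action, np.zeros(NUM_FEATURES))
    return float(weights @ features)

# Example call with a dummy feature vector (real features are game-specific):
print(q_approx(np.zeros(NUM_FEATURES), "build_road"))
```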
Choice of Approximator
- Discontinuities in the value function: global smoothing is undesirable
- Local importance of certain features: impossible with linear methods
- Learning time is crucial
- [Sridharan and Tesauro, 00]: tree-based approximation techniques learn faster than ANNs in such scenarios
Model Trees
- Q-values are the regression targets
- Partition the state space into homogeneous regions
- Learn local linear regression models in the leaves
- Generalization via pruning
- M5 learning algorithm [Quinlan, 92]
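To make the idea concrete, here is a small, simplified sketch of a model tree with axis-aligned splits and linear regression models in the leaves. It is not Quinlan's full M5 algorithm (no smoothing, and only a crude stopping rule instead of pruning), the split search is deliberately naive, and parameters such as `min_samples` and `max_depth` are illustrative.

```python
import numpy as np

class ModelTree:
    """Simplified M5-style model tree: axis-aligned splits, linear leaf models."""

    def __init__(self, min_samples=20, max_depth=6):
        self.min_samples = min_samples
        self.max_depth = max_depth

    def fit(self, X, y, depth=0):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        self.split = None
        self.model = self._linear_fit(X, y)            # leaf model by default
        if depth >= self.max_depth or len(y) < 2 * self.min_samples:
            return self
        best = (self._sse(X, y, self.model), None)     # (error, (feature, threshold))
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j])[:-1]:
                left, right = X[:, j] <= t, X[:, j] > t
                if left.sum() < self.min_samples or right.sum() < self.min_samples:
                    continue
                err = (self._sse(X[left], y[left], self._linear_fit(X[left], y[left])) +
                       self._sse(X[right], y[right], self._linear_fit(X[right], y[right])))
                if err < best[0]:
                    best = (err, (j, t))
        if best[1] is not None:                        # grow children if a split helps
            j, t = best[1]
            self.split = (j, t)
            left, right = X[:, j] <= t, X[:, j] > t
            self.left = ModelTree(self.min_samples, self.max_depth).fit(X[left], y[left], depth + 1)
            self.right = ModelTree(self.min_samples, self.max_depth).fit(X[right], y[right], depth + 1)
        return self

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        if self.split is not None:
            j, t = self.split
            return self.left.predict(x) if x[j] <= t else self.right.predict(x)
        return float(np.append(x, 1.0) @ self.model)   # local linear model

    @staticmethod
    def _linear_fit(X, y):
        A = np.hstack([X, np.ones((len(X), 1))])       # add intercept column
        return np.linalg.lstsq(A, y, rcond=None)[0]

    @staticmethod
    def _sse(X, y, model):
        A = np.hstack([X, np.ones((len(X), 1))])
        return float(np.sum((A @ model - y) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(200, 3))
    y = np.where(X[:, 0] > 0.5, 2.0 * X[:, 1], -3.0 * X[:, 2])  # piecewise-linear target
    tree = ModelTree().fit(X, y)
    print(tree.predict(np.array([0.8, 0.5, 0.1])))     # roughly 1.0
```

The piecewise-linear toy target in the demo is exactly the kind of function (discontinuous, locally linear) that motivates model trees over globally smooth approximators.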
Pros and Cons of Model Trees
Pros:
- Discrete and real-valued features
- Ignores irrelevant features
- Local models
- Feature combinations
- Discontinuities
- Easy interpretation
- Few parameters
Cons:
- Only offline learning
- Need to store all training examples
- Long training time
- Little experience in RL context
- No convergence results in RL context
Offline Training Algorithm
One model tree approximates the Q-function for one action.
1. Use the current policy to play 1000 training games
2. Store the game traces (states, actions, rewards, successor states) of all 4 players
3. Use the current Q-function approximation (model trees) to calculate Q-values for the new training examples and add them to the existing training set
4. Update older training examples
5. Build new model trees from the updated training set
6. Go back to step 1
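The loop above can be summarized in pseudocode-like Python. The helpers `play_game`, `compute_q_targets`, `refresh_target`, and `build_model_trees` are hypothetical stand-ins for the game simulator, the Q-value back-ups from the stored traces, and the M5 tree induction; they are not part of any published code.

```python
def offline_training(num_iterations, games_per_iteration=1000):
    """Iterate: self-play with the current trees, relabel targets, rebuild trees."""
    model_trees = {}        # one model tree per action; empty => initial policy
    training_set = []       # accumulated (features, action, q_target) examples

    for _ in range(num_iterations):
        # Steps 1 + 2: play games with the current policy and store the traces
        # (states, actions, rewards, successor states) of all 4 players.
        traces = [play_game(model_trees) for _ in range(games_per_iteration)]

        # Step 3: compute Q-value targets for the new traces with the current
        # trees and add them to the training set.
        training_set.extend(compute_q_targets(traces, model_trees))

        # Step 4: update the targets of the older examples with the current trees.
        training_set = [refresh_target(example, model_trees) for example in training_set]

        # Step 5: build new model trees (one per action) from the updated set.
        model_trees = build_model_trees(training_set)

    return model_trees
```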
Hierarchical RL
- Division of the action space: 3-layer model
- Easier integration of a-priori knowledge
- Learned information defines the primitive actions, e.g. placing roads, trading
- Independent rewards:
  - high level: winning the game
  - low level: reaching the behavior's goal
  - otherwise zero
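A minimal sketch of the two independent reward signals described above; the function names and the binary 0/1 reward scale are assumptions made for illustration.

```python
def high_level_reward(game_over, agent_won):
    """High-level behaviors are rewarded only for winning the game."""
    return 1.0 if (game_over and agent_won) else 0.0

def low_level_reward(goal_reached):
    """Low-level policies are rewarded only for reaching their behavior's goal
    (e.g. a road-building module completing its planned road), otherwise zero."""
    return 1.0 if goal_reached else 0.0
```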
Example: Trading
- Select which trades to offer / accept / reject
- Evaluation of a trade: what increase in low-level value would each trade bring?
- Select the highest-valued trade
- Simplification of game design: no economic model needed; the gain in value function naturally replaces prices
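The trade evaluation can be sketched as follows: each candidate trade is scored by the increase in the active low-level behavior's value it would produce, and the best positive-gain trade is chosen. The helpers `low_level_value` and `apply_trade` are hypothetical stand-ins for the learned value function and the game engine.

```python
def best_trade(state, candidate_trades, low_level_value, apply_trade):
    """Return the trade with the largest value gain, or None if no trade helps.

    `low_level_value(state)` is the current behavior's value estimate and
    `apply_trade(state, trade)` returns the state after executing the trade.
    """
    baseline = low_level_value(state)
    best_gain, best = 0.0, None
    for trade in candidate_trades:
        gain = low_level_value(apply_trade(state, trade)) - baseline
        if gain > best_gain:
            best_gain, best = gain, trade
    return best
```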
Approaches
Apply learning or prior knowledge at different stages of the learning architecture:
- Pure Learning: learning at both levels
- Heuristic High-Level: simple hand-coded high-level strategy, learning of low-level policies (= AI modules)
- Guided Learning: use the hand-coded strategy only during learning, off-policy learning of the high-level policy
Agenda: Introduction, Settlers of Catan, Method, Results, Conclusion
Evaluation Method
- 3000 to 8000 training matches per approach
- Long training time: 1 day for 1000 training games, 1 day for training the model trees
- Evaluation against: random players, the other approaches, a human player (myself); no benchmark program available
Relative Comparison
- The combination of learning and prior knowledge yields the best results
- Low-level learning is responsible for the improvement
- Guided approach: worse than heuristic, better than pure learning
[Figure: victories of the heuristic strategy against the other approaches (20 games)]
Against Human Opponent
- 3 agents vs. the author
- Performance measure: average maximum victory points
  - 10 VP: win every game
  - 8 VP: close to winning in every game
- The heuristic policy wins 2 out of 10 matches
- The best policies are successful against opponents not encountered in training
- Demo matches confirm the results (not included here)
[Figure: victory points of the different strategies against a human opponent (10 games)]
Agenda: Introduction, Settlers of Catan, Method, Results, Conclusion
Conclusion
- RL works in large and complex game domains: not grandmaster level like TD-Gammon, but pretty good
- Settlers of Catan is an interesting testbed, closer to commercial computer games than backgammon, chess, ...
- Model trees as a new approximation methodology for RL
- Combining prior knowledge with RL yields promising results:
  - hierarchical learning allows incorporating knowledge at multiple points of the learning architecture
  - learning of AI components
  - knowledge speeds up learning
Future Work
- Opponent modeling: recognizing and beating certain opponent types
- Reward filtering: how much of the reward signal is caused by other agents?
- Model trees: other games, improvement of the offline training algorithm (tree structure)
- Settlers of Catan as a game AI testbed: trying other algorithms, improving results
Thank you!
Sources
- M. Pfeiffer: Machine Learning Applications in Computer Games, MSc Thesis, Graz University of Technology, 2003
- J.R. Quinlan: Learning with Continuous Classes, Proceedings of the Australian Joint Conference on Artificial Intelligence, 1992
- M. Sridharan, G.J. Tesauro: Multi-agent Q-learning and Regression Trees for Automated Pricing Decisions, Proceedings of the 17th International Conference on Machine Learning (ICML), 2000
- R. Sutton, A. Barto: Reinforcement Learning: An Introduction, MIT Press, Cambridge, 1998
- G.J. Tesauro: Temporal Difference Learning and TD-Gammon, Communications of the ACM 38, 1995
- K. Teuber: Die Siedler von Catan, Kosmos Verlag, Stuttgart, 1995
Extra Slides
Approaches
- High-level behaviors always run until completion; allowing high-level switches every time-step (feudal approach) did not work
- Module-based Approach: high level is learned, low level is learned
- Heuristic Approach: simple hand-coded high-level strategy during training and in the game, low level is learned, selection of the high level influences the primitive actions
- Guided Approach: hand-coded high-level strategy during learning, off-policy learning of the high-level strategy for the game, low level is learned
Comparison of Approaches
- Module-based: good low-level choices, poor high-level strategy
- Heuristic high-level: significant improvement; the learned low level is clearly responsible for the improvement
- Guided approach: worse than heuristic, better than module-based
[Figure: victories of the heuristic strategy against the other approaches (20 games)]
Comparison of Approaches
- Comparison of the strategies in games against each other
- All are significantly better than random
- The heuristic approach is best
Against Human Opponent
- 10 games of each policy vs. the author (3 agents vs. the human)
- Average victory points as the measure of performance
  - 10 VP: win every game
  - 8 VP: close to winning in every game
- Only the heuristic policy wins any matches: 2 out of 10
- Demo matches confirm the results (not included here)
[Figure: performance of the different strategies against a human opponent (10 games)]