UCT for Tactical Assault Battles in Real-Time Strategy Games – Radha-Krishna Balla, 19 February 2009

Overview I. Introduction II. Related Work III. Method IV. Experiments & Results V. Conclusion

I. Introduction II. Related Work III. Method IV. Experiments & Results V. Conclusion

Domain RTS games – Resource Production, Tactical Planning. Focus: Tactical Assault battles.

RTS game - Wargus Screenshot of a typical battle scenario in Wargus

Planning problem Large state space Temporal actions Spatial reasoning Concurrency Stochastic actions Changing goals

I. Introduction II. Related Work III. Method IV. Experiments & Results V. Conclusion

Related Work Board games (bridge, poker, Go, etc.) – Monte Carlo simulations. RTS games: Resource Production – means-ends analysis (Chan et al.); Tactical Planning – Monte Carlo simulations (Chung et al.), Nash strategies (Sailer et al.), reinforcement learning (Wilson et al.). Bandit-based problems, Go – UCT (Kocsis et al., Gelly et al.).

Our Approach Monte Carlo simulations with the UCT algorithm. Advantages: complex plans from simple abstract actions; exploration/exploitation tradeoff; ability to handle changing goals.

I. Introduction II. Related Work III. Method IV. Experiments & Results V. Conclusion

Method Planning architecture UCT Algorithm Search space formulation Monte Carlo simulations Challenges

Planning Architecture Online Planner. State space abstraction – grouping of units. Abstract actions – Join(G), Attack(f,e).
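To make the two abstract actions concrete, a minimal sketch of how they could be represented as data follows; the class and field names are illustrative, not taken from the planner:

from dataclasses import dataclass

@dataclass(frozen=True)
class Join:
    """Join(G): merge a set of friendly groups G into a single group."""
    groups: frozenset   # ids of the friendly groups being merged

@dataclass(frozen=True)
class Attack:
    """Attack(f, e): order friendly group f to attack enemy group e."""
    friendly: int       # id of the attacking friendly group
    enemy: int          # id of the targeted enemy group

Frozen dataclasses are hashable, so such actions can be used directly as dictionary keys in the UCT tree sketched after the pseudocode slides below.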

UCT Algorithm Exploration/Exploitation tradeoff. Monte Carlo simulation – get subsequent states. Search tree: root node – current state; edges – available actions; intermediate nodes – subsequent states; leaf nodes – terminal states. Rollout-based construction. Value estimates.

UCT Algorithm – Pseudo Code 1

At each interesting time point in the game:
    build_UCT_tree(current state)
    choose argmax action(s) based on the UCT policy
    execute the aggregated actions in the actual game
    wait until one of the actions gets executed

build_UCT_tree(state):
    for each UCT pass do
        run UCT_rollout(state)

(continued on the next slide)

UCT Algorithm – Pseudo Code 2

UCT_rollout(state):    (recursive)
    if leaf node reached then
        estimate final reward
        propagate reward up the tree and update value functions
        return
    populate possible actions
    if all actions explored at least once then
        choose the action with the best value function
    else
        choose an unexplored action by random sampling
    run Monte Carlo simulation to get the next state from the current state and action
    call UCT_rollout(next state)
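The pseudocode above maps onto a generic UCT implementation. The following is a minimal, runnable Python sketch of the same control flow, not the authors' code: legal_actions, simulate, is_terminal and reward are placeholder callables standing in for the planner's abstract actions and Monte Carlo game simulation, a single child per action is kept for simplicity, and the depth cutoff stands in for the partial simulations (time cutoff) noted later.

import math
import random

class Node:
    """One node of the UCT search tree: an abstract game state plus visit statistics."""
    def __init__(self, state):
        self.state = state
        self.visits = 0        # n(s): number of times this node has been visited
        self.children = {}     # action -> child Node
        self.value = {}        # action -> Q(s, a), running mean of rollout rewards
        self.counts = {}       # action -> n(s, a), number of times the action was tried

def uct_rollout(node, legal_actions, simulate, is_terminal, reward, c=1.0, depth=50):
    """One rollout: descend the tree with UCB action selection, expand, back up the reward."""
    if depth == 0 or is_terminal(node.state):
        return reward(node.state)           # estimate final reward at a leaf / cutoff
    actions = legal_actions(node.state)     # populate possible (abstract) actions
    untried = [a for a in actions if a not in node.counts]
    if untried:
        action = random.choice(untried)     # try every action at least once
    else:
        # all actions explored: pick the UCB-maximizing action (value + exploration bonus)
        action = max(actions, key=lambda a: node.value[a]
                     + c * math.sqrt(math.log(node.visits) / node.counts[a]))
    next_state = simulate(node.state, action)   # Monte Carlo simulation of the action
    child = node.children.setdefault(action, Node(next_state))
    r = uct_rollout(child, legal_actions, simulate, is_terminal, reward, c, depth - 1)
    # back up the reward: incremental update of the running mean Q(s, a)
    node.visits += 1
    node.counts[action] = node.counts.get(action, 0) + 1
    node.value[action] = node.value.get(action, 0.0) + \
        (r - node.value.get(action, 0.0)) / node.counts[action]
    return r

def choose_action(state, n_rollouts, legal_actions, simulate, is_terminal, reward):
    """Build a UCT tree with n_rollouts rollouts and return the best action at the root."""
    root = Node(state)
    for _ in range(n_rollouts):
        uct_rollout(root, legal_actions, simulate, is_terminal, reward)
    return max(root.value, key=root.value.get)

With the setting reported later (5000 rollouts per decision point), choose_action(current_state, 5000, ...) would return the root action to issue next.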

UCT Algorithm – Formulae: Action Selection; Value Update.
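The two formula images on this slide do not survive in the transcript. For reference, the standard UCT rules these labels normally denote, with exploration constant c, visit counts n(s) and n(s,a), estimated value Q(s,a), and rollout reward R, are:

    a^{*} = \arg\max_{a} \Big[ Q(s,a) + c \sqrt{ \tfrac{\ln n(s)}{n(s,a)} } \Big]        (action selection)

    n(s,a) \leftarrow n(s,a) + 1, \qquad Q(s,a) \leftarrow Q(s,a) + \frac{R - Q(s,a)}{n(s,a)}        (value update)

The exact constants and the sign of the objective used in this work may differ from this generic form.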

Search Space Formulation Abstract state – friendly and enemy groups (hit points, location, current actions) and the current game time. Calculation of group hit points; calculation of mean location (centroid).
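The formula images for these two calculations are not in the transcript. A plausible reading consistent with the text, where a group G consists of units u with hit points hp_u at positions (x_u, y_u), is the following (an assumed formulation, not necessarily the authors' exact one):

    HP(G) = \sum_{u \in G} hp_u, \qquad loc(G) = \frac{1}{|G|} \sum_{u \in G} (x_u, y_u)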

Monte Carlo Simulations Domain-specific; actual game play – Wargus (Join actions, Attack actions). Reward calculation – objective function: time, hit points. Note: partial simulations (time cutoff).
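A hedged sketch of what the two objective functions could look like, one per planner variant introduced later (UCT(t) optimizing battle duration, UCT(hp) optimizing surviving hit points). The loss penalty and scaling are assumptions, not taken from the slides:

def reward_time(start_time, end_time, friendly_won, loss_penalty=-1e6):
    """UCT(t): faster wins score higher; a lost battle gets a large penalty."""
    return -(end_time - start_time) if friendly_won else loss_penalty

def reward_hitpoints(remaining_friendly_hp, friendly_won, loss_penalty=-1e6):
    """UCT(hp): more surviving friendly hit points score higher."""
    return remaining_friendly_hp if friendly_won else loss_penalty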

Domain-specific Challenges State space abstraction – grouping of units (proximity-based). Concurrency – aggregation of actions: Join actions – simple; Attack actions – complex (partial simulations).
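Proximity-based grouping can be illustrated with a short sketch: units are merged into the same group whenever they are (transitively) within a distance threshold of each other. The threshold and the exact clustering rule used in the planner are assumptions here:

import math

def group_units(positions, radius):
    """Group unit positions (x, y): units within `radius` of some member of a
    group (transitively) join that group; returns lists of unit indices."""
    groups, assigned = [], [False] * len(positions)
    for i in range(len(positions)):
        if assigned[i]:
            continue
        stack, group = [i], []
        assigned[i] = True
        while stack:
            u = stack.pop()
            group.append(u)
            for v in range(len(positions)):
                if not assigned[v] and math.dist(positions[u], positions[v]) <= radius:
                    assigned[v] = True
                    stack.append(v)
        groups.append(group)
    return groups

For example, group_units([(0, 0), (1, 1), (20, 20)], radius=3.0) yields [[0, 1], [2]].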

Planning problem – revisited Large state space – abstraction; temporal actions – Monte Carlo simulations; spatial reasoning – Monte Carlo simulations; concurrency – aggregation of actions; stochastic actions – UCT (online planning); changing goals – UCT (different objective functions).

I. Introduction II. Related Work III. Method IV. Experiments & Results V. Conclusion

Experiments

 #   Scenario  Friendly groups  Friendly composition  Enemy groups  Enemy composition  Join actions  Attack actions  Total actions
 1   2vs2      2                {6,6}                 2             {5,5}              1             4               5
 2   3vs2      3                {6,2,4}               2             {5,5}              3             6               9
 3   4vs2_1    4                {2,4,2,4}             2             {5,5}              6             8               14
 4   4vs2_2    4                {2,4,2,4}             2             {5,5}              6             8               14
 5   4vs2_3    4                {2,4,2,4}             2             {5,5}              6             8               14
 6   4vs2_4    4                {2,4,2,4}             2             {5,5}              6             8               14
 7   4vs2_5    4                {2,4,2,4}             2             {5,5}              6             8               14
 8   4vs2_6    4                {2,4,2,4}             2             {5,5}              6             8               14
 9   4vs2_7    4                {3,3,6,4}             2             {5,9}              6             8               14
10   4vs2_8    4                {3,3,3,6}             2             {5,8}              6             8               14
11   2vs4_1    2                {9,9}                 4             {4,5,5,4}          1             8               9
12   2vs4_2    2                {9,9}                 4             {5,5,5,5}          1             8               9
13   2vs4_3    2                {9,9}                 4             {5,5,5,5}          1             8               9
14   2vs5_1    2                {9,9}                 5             {5,5,5,5,5}        1             10              11
15   2vs5_2    2                {10,10}               5             {5,5,5,5,5}        1             10              11
16   3vs4      3                {12,4,4}              4             {5,5,5,5}          3             12              15

Table 1: Details of the different game scenarios
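The action counts in Table 1 follow a simple pattern consistent with the abstract action definitions: one Join per pair of friendly groups and one Attack per friendly/enemy pair, so with n friendly and m enemy groups

    \#\text{Join} = \binom{n}{2}, \qquad \#\text{Attack} = n \cdot m, \qquad \text{Total} = \binom{n}{2} + n \cdot m,

which matches the listed scenarios (e.g. 2vs2: 1 + 4 = 5; 3vs2: 3 + 6 = 9; 3vs4: 3 + 12 = 15).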

Planners UCT Planners: UCT(t), UCT(hp). Number of rollouts – 5000; results averaged over 5 runs.

Planners Baseline Planners: Random, Attack-Closest, Attack-Weakest, Stratagus-AI, Human.

Video – Planning in action Simple scenario Complex scenario

Results Figure 1: Time results for UCT(t) and baselines.

Results Figure 2: Hit point results for UCT(t) and baselines.

Results Figure 3: Time results for UCT(hp) and baselines.

Results Figure 4: Hit point results for UCT(hp) and baselines.

Results – Comparison Figures 1, 2, 3 & 4: comparison between UCT(t) and UCT(hp), with panels arranged by planner (UCT(t), UCT(hp)) and by metric (time results, hit point results).

Results Figure 5: Time results for UCT(t) with varying rollouts.

I. Introduction II. Related Work III. Method IV. Experiments & Results V. Conclusion

Conclusion Hard planning problem; less expert knowledge; different objective functions. Future Work: computational time – engineering aspects; machine learning techniques; beyond Tactical Assault.

Thank you