UCT for Tactical Assault Battles in Real-Time Strategy Games
Radha-Krishna Balla, 19 February 2009


Overview
I. Introduction
II. Related Work
III. Method
IV. Experiments & Results
V. Conclusion

I. Introduction

Domain
- RTS games
  - Resource production
  - Tactical planning
    - Tactical assault battles

RTS game: Wargus
[Figure: screenshot of a typical battle scenario in Wargus; labels mark Battle 1, Battle 2, an enemy group, and a friendly group]

Planning problem
- Large state space
- Temporal actions
- Spatial reasoning
- Concurrency
- Stochastic actions
- Changing goals

II. Related Work

Related Work
- Board games (bridge, poker, Go, etc.)
  - Monte Carlo simulations
- RTS games
  - Resource production: means-ends analysis
  - Tactical planning: Monte Carlo simulations, Nash strategies, reinforcement learning
- Bandit-based problems, Go
  - UCT

Our Approach
- Monte Carlo simulations
- UCT algorithm
Advantages
- Complex plans from simple abstract actions
- Exploration/exploitation tradeoff
- Changing goals

III. Method

Method
- Planning architecture
- UCT algorithm
- Search space formulation
- Monte Carlo simulations
- Challenges

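Online Planning Framework
Components: UCT planner, action dispatcher, Stratagus engine
- Current game state: unit locations and hit points
- Ground actions: Move(unit1, pos1, pos2), Attack(unit1, unit2)
- Abstract game state: group locations and hit points
- Abstract actions: Join(f1, f2), Attack(f1, e1)
The UCT planner reasons over the abstract game state and issues abstract actions; the action dispatcher translates them into ground actions that the Stratagus engine executes.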
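To make the dispatcher's role concrete, here is a minimal sketch of how abstract actions might be expanded into the ground actions named on the slide. The `Unit` record, the `dispatch` function, the use of the groups' centroid as the Join meeting point, and the rule that attackers target the weakest enemy unit are all hypothetical; the slides do not say how the dispatcher makes these choices, and nothing here calls the real Stratagus engine.

```python
from dataclasses import dataclass


# Hypothetical unit-level record; field names are illustrative, not a Stratagus API.
@dataclass
class Unit:
    uid: str
    x: float
    y: float
    hp: int


def centroid(units):
    """Mean position of a non-empty list of units."""
    n = len(units)
    return (sum(u.x for u in units) / n, sum(u.y for u in units) / n)


def dispatch(abstract_action, groups):
    """Translate one abstract action into ground-action tuples (hypothetical rules).

    groups maps a group name such as "f1" or "e1" to its list of units.
    Join(a, b):   every unit of both groups moves to their combined centroid.
    Attack(f, e): every unit of f attacks the enemy unit with the fewest hit points.
    """
    kind, a, b = abstract_action
    if kind == "Join":
        members = groups[a] + groups[b]
        target = centroid(members)
        # Move(unit, current position, destination)
        return [("Move", u.uid, (u.x, u.y), target) for u in members]
    if kind == "Attack":
        victim = min(groups[b], key=lambda u: u.hp)
        return [("Attack", u.uid, victim.uid) for u in groups[a]]
    raise ValueError(f"unknown abstract action: {kind}")
```

A real dispatcher would also have to decide when to re-issue ground actions as the battle evolves; the sketch only shows the one-shot translation.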

Abstraction
- Abstract state space: grouping of units
- Abstract actions: Join(G), Attack(f, e)
[Figure: friendly groups f1, f2, f3 and enemy groups e1, e2]
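The abstract game state keeps only group locations and hit points. A minimal sketch of that collapse is below, assuming a group's location is the mean of its units' positions and its hit points are the sum of theirs; the `Unit`, `Group`, and `abstract_state` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    x: float
    y: float
    hp: int


@dataclass
class Group:
    name: str
    x: float  # assumed: mean x of member units
    y: float  # assumed: mean y of member units
    hp: int   # assumed: total hit points of member units


def abstract_state(groups):
    """Collapse unit-level groups into abstract group records.

    groups maps a name such as "f1" or "e2" to a non-empty list of units.
    """
    result = {}
    for name, units in groups.items():
        n = len(units)
        result[name] = Group(
            name,
            sum(u.x for u in units) / n,
            sum(u.y for u in units) / n,
            sum(u.hp for u in units),
        )
    return result
```

Discarding individual unit positions is what keeps the planner's state space small; the cost is that the planner cannot distinguish a compact group from a scattered one with the same centroid.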

UCT Algorithm
- Monte Carlo simulation: obtain subsequent states
- Search tree
  - Root node: current state
  - Edges: available actions
  - Intermediate nodes: subsequent states
  - Leaf nodes: terminal states
- Rollout-based construction
- Value estimates
- Exploration/exploitation tradeoff
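A compact, generic sketch of rollout-based UCT, assuming a simulator that exposes hypothetical `actions`, `step`, and `terminal` callables over hashable states. Instead of explicit node objects it keeps visit counts and value estimates in tables keyed by state, so every state reached during a rollout acts as a tree node; this is one common formulation, not necessarily the one used in this work. Returns are undiscounted, and the exploration constant `c` is an assumed value that should be scaled to the reward range.

```python
import math
import random
from collections import defaultdict


class UCT:
    """Rollout-based UCT over a simulator with hashable states (generic sketch).

    Hypothetical simulator interface:
      actions(s)  -> list of legal actions in s (non-empty unless s is terminal)
      step(s, a)  -> (next_state, reward); may be stochastic
      terminal(s) -> True when s ends the episode
    """

    def __init__(self, actions, step, terminal, c=1.0, max_depth=50):
        self.actions, self.step, self.terminal = actions, step, terminal
        self.c = c                    # exploration constant (assumed value)
        self.max_depth = max_depth    # safety cap on rollout length
        self.n_s = defaultdict(int)   # n(s): visits to state s
        self.n_sa = defaultdict(int)  # n(s, a): times a was tried in s
        self.q = defaultdict(float)   # Q(s, a): mean return after (s, a)

    def select(self, s):
        """UCB1 choice: exploitation term Q plus an exploration bonus."""
        legal = self.actions(s)
        untried = [a for a in legal if self.n_sa[(s, a)] == 0]
        if untried:                   # try every action once first
            return random.choice(untried)
        log_n = math.log(self.n_s[s])
        return max(legal, key=lambda a: self.q[(s, a)]
                   + self.c * math.sqrt(log_n / self.n_sa[(s, a)]))

    def rollout(self, root):
        """Simulate one trajectory from root, then back up its returns."""
        path, s = [], root
        while not self.terminal(s) and len(path) < self.max_depth:
            a = self.select(s)
            s_next, r = self.step(s, a)
            path.append((s, a, r))
            s = s_next
        ret = 0.0
        for s, a, r in reversed(path):
            ret += r                  # undiscounted return from (s, a) onward
            self.n_s[s] += 1
            self.n_sa[(s, a)] += 1
            self.q[(s, a)] += (ret - self.q[(s, a)]) / self.n_sa[(s, a)]

    def plan(self, root, n_rollouts=5000):
        """Run n_rollouts simulations, then return the best root action by Q."""
        for _ in range(n_rollouts):
            self.rollout(root)
        return max(self.actions(root), key=lambda a: self.q[(root, a)])
```

In an online setting like the one above, `plan` would be called on the current abstract state, the chosen action dispatched, and planning repeated as the game state changes.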

UCT
[Figure; source: Sylvain Gelly and David Silver, "Achieving Master Level Play in Computer Go"]

UCT Algorithm: Formulae
[Equations: action selection and value update]
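In the standard form of UCT (Kocsis and Szepesvári, 2006), which the slide's formulas presumably follow, action selection and value update read as below. Here $n(s)$ counts visits to state $s$, $n(s,a)$ counts how often action $a$ was taken in $s$, $Q(s,a)$ is the running mean return, $R$ is the return observed after taking $a$ in $s$ during the current rollout, and $c > 0$ is an exploration constant whose value in this work is not stated here. The first term inside the brackets drives exploitation and the square-root term drives exploration.

```latex
\[
a^{*} = \arg\max_{a} \left[ Q(s,a) + c \sqrt{\frac{\ln n(s)}{n(s,a)}} \right]
\]
\[
n(s) \leftarrow n(s) + 1, \qquad
n(s,a) \leftarrow n(s,a) + 1, \qquad
Q(s,a) \leftarrow Q(s,a) + \frac{1}{n(s,a)} \bigl[ R - Q(s,a) \bigr]
\]
```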

Exploitation
[Figure; source: Sylvain Gelly and David Silver, "Achieving Master Level Play in Computer Go"]

Exploration
[Figure; source: Sylvain Gelly and David Silver, "Achieving Master Level Play in Computer Go"]

Search Space: Join Actions
[Figure: friendly groups f1, f2, f3 and enemy groups e1, e2; after Join(f1, f2), friendly groups f1, f2’ and enemy groups e1, e2]

Search Space: Attack Actions
[Figure: friendly groups f1, f2’ and enemy groups e1, e2; after Attack(f2’, e2), friendly groups f1, f2’’ and enemy group e1]
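The branching factor at each node comes from the Join and Attack actions available in the abstract state. The sketch below enumerates them, assuming Join is offered for every unordered pair of friendly groups (a restriction of the more general Join(G)) and Attack for every friendly/enemy pair. This assumption is ours, but it reproduces the counts in the rows of Table 1 that list them: 2vs2 gives 1 + 4 = 5, 3vs2 gives 3 + 6 = 9, and 3vs4 gives 3 + 12 = 15. The function name `abstract_actions` is hypothetical.

```python
from itertools import combinations, product


def abstract_actions(friendly, enemy):
    """List the abstract actions available in an abstract state.

    friendly, enemy: lists of group names, e.g. ["f1", "f2", "f3"] and ["e1", "e2"].
    Assumption: Join for each unordered pair of friendly groups,
    Attack for each (friendly group, enemy group) pair.
    """
    joins = [("Join", a, b) for a, b in combinations(friendly, 2)]
    attacks = [("Attack", f, e) for f, e in product(friendly, enemy)]
    return joins + attacks
```

Under this assumption the number of actions grows as n_f(n_f − 1)/2 + n_f · n_e, which is why abstraction to a handful of groups matters.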

Monte Carlo Simulations
- Domain-specific
- Actual game play
  - Join actions
  - Attack actions
- Reward calculation: objective function
  - Time
  - Hit points
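The slides say only that the simulations are domain-specific, modeled on actual game play, and scored by an objective based on time or hit points. The attrition model below is a purely hypothetical stand-in for a simulated Attack(f, e): each side deals damage proportional to its remaining hit points with a random factor. The `reward` function shows how the same outcome could be scored for a time objective or a hit-point objective; the constants and function names are assumptions.

```python
import random


def simulate_attack(friendly_hp, enemy_hp, rate=0.05, rng=random):
    """Hypothetical stochastic attrition model for Attack(f, e) on two groups.

    Each tick, each side deals damage proportional to its own remaining hit
    points (scaled by `rate` and a random factor), and always at least 1 hit
    point, so the loop terminates. Returns (ticks, friendly_hp, enemy_hp).
    """
    ticks = 0
    while friendly_hp > 0 and enemy_hp > 0:
        ticks += 1
        to_enemy = max(1, int(rate * friendly_hp * rng.uniform(0.5, 1.5)))
        to_friendly = max(1, int(rate * enemy_hp * rng.uniform(0.5, 1.5)))
        # Damage is applied simultaneously; hit points never go below zero.
        enemy_hp = max(0, enemy_hp - to_enemy)
        friendly_hp = max(0, friendly_hp - to_friendly)
    return ticks, friendly_hp, enemy_hp


def reward(objective, ticks, friendly_hp):
    """Score a simulated outcome under one of two assumed objectives.

    "time": shorter battles score higher (in the spirit of UCT(t)).
    "hp":   more surviving friendly hit points score higher (UCT(hp)).
    """
    if objective == "time":
        return -ticks
    if objective == "hp":
        return friendly_hp
    raise ValueError(f"unknown objective: {objective}")
```

Swapping the objective changes only the reward passed back to UCT, which is how one planner can be specialized into time-minimizing and hit-point-maximizing variants.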

Domain-specific Challenges
- State space abstraction
  - Grouping of units (proximity-based)
- Concurrency
  - Aggregation of actions
  - Join actions: simple
  - Attack actions: complex (partial simulations)
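One simple way to realize proximity-based grouping is single-linkage clustering: two units share a group if a chain of units, each within a fixed radius of the next, connects them. The sketch below does this with a union-find structure. The radius, the name `group_by_proximity`, and the choice of single linkage are assumptions; the function would be run separately on friendly and on enemy units.

```python
import math


def group_by_proximity(positions, radius):
    """Single-linkage grouping of units by distance (assumed scheme).

    positions: list of (x, y) tuples, one per unit.
    Units within `radius` of each other, directly or through a chain of
    neighbours, land in the same group. Returns a sorted list of groups,
    each a sorted list of unit indices.
    """
    parent = list(range(len(positions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= radius:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(positions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The pairwise loop is quadratic in the number of units, which is acceptable for battles of a few dozen units like those in Table 1.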

Planning problem: revisited
- Large state space: abstraction
- Temporal actions, spatial reasoning: Monte Carlo simulations
- Stochastic actions: UCT (online planning)
- Changing goals: UCT (objective functions)
- Concurrency: aggregation of actions

IV. Experiments & Results

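Experiments

No. | Scenario | # friendly groups | Friendly groups composition | # enemy groups | Enemy groups composition | # possible Join actions | # possible Attack actions | Total # possible actions
1 | 2vs2 | 2 | {6,6} | 2 | {5,5} | 1 | 4 | 5
2 | 3vs2 | 3 | {6,2,4} | 2 | {5,5} | 3 | 6 | 9
3 | 4vs2_1 | 4 | {2,4,2,4} | 2 | {5,5} | | |
4 | 4vs2_2 | 4 | {2,4,2,4} | 2 | {5,5} | | |
5 | 4vs2_3 | 4 | {2,4,2,4} | 2 | {5,5} | | |
6 | 4vs2_4 | 4 | {2,4,2,4} | 2 | {5,5} | | |
7 | 4vs2_5 | 4 | {2,4,2,4} | 2 | {5,5} | | |
8 | 4vs2_6 | 4 | {2,4,2,4} | 2 | {5,5} | | |
9 | 4vs2_7 | 4 | {3,3,6,4} | 2 | {5,9} | | |
10 | 4vs2_8 | 4 | {3,3,3,6} | 2 | {5,8} | | |
11 | 2vs4_1 | 2 | {9,9} | 4 | {4,5,5,4} | | |
12 | 2vs4_2 | 2 | {9,9} | 4 | {5,5,5,5} | | |
13 | 2vs4_3 | 2 | {9,9} | 4 | {5,5,5,5} | | |
14 | 2vs5_1 | 2 | {9,9} | 5 | {5,5,5,5,5} | | |
15 | 2vs5_2 | 2 | {10,10} | 5 | {5,5,5,5,5} | | |
16 | 3vs4 | 3 | {12,4,4} | 4 | {5,5,5,5} | 3 | 12 | 15

Table 1: Details of the different game scenarios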

Planners
UCT planners
- UCT(t): minimize time
- UCT(hp): maximize hit points
Number of rollouts: 5000
Results averaged over 5 runs

Planners
Baseline planners
- Random
- Attack-Closest
- Attack-Weakest
- Stratagus-AI
- Human
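The two heuristic baselines are named but not defined on the slide. A plausible reading is sketched below: under Attack-Closest each friendly group attacks the nearest enemy group, and under Attack-Weakest every friendly group attacks the enemy group with the fewest hit points. This interpretation and the (x, y, hit_points) tuple layout are assumptions; the Random, Stratagus-AI, and Human baselines are not modeled.

```python
import math

# Abstract groups: name -> (x, y, hit_points). The tuple layout is an assumption.


def attack_closest(friendly, enemy):
    """Each friendly group attacks the enemy group nearest to it."""
    return [
        ("Attack", f, min(enemy, key=lambda e: math.dist(friendly[f][:2], enemy[e][:2])))
        for f in friendly
    ]


def attack_weakest(friendly, enemy):
    """Every friendly group attacks the enemy group with the fewest hit points."""
    weakest = min(enemy, key=lambda e: enemy[e][2])
    return [("Attack", f, weakest) for f in friendly]
```

Both heuristics pick a fixed target rule regardless of the objective, which is the contrast with UCT(t) and UCT(hp), whose choices depend on the reward being optimized.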

Video: Planning in action
Simple scenario: 2 vs 2
- UCT(t): optimize time
- UCT(hp): optimize hit points

Video: Planning in action
Complex scenario: 3 vs 4
- UCT(t): optimize time
- UCT(hp): optimize hit points

Results
Figure 1: Time results for UCT(t) and baselines.

Results
Figure 2: Hit point results for UCT(t) and baselines.

Results
Figure 3: Time results for UCT(hp) and baselines.

Results
Figure 4: Hit point results for UCT(hp) and baselines.

Results: Comparison
Figures 1, 2, 3 & 4: comparison between UCT(t) and UCT(hp) metrics
[Figure panels: time results and hit point results for UCT(t) and UCT(hp)]

Results
Figure 5: Time results for UCT(t) with varying rollouts.

V. Conclusion

Conclusion
- Hard planning problem
- Less expert knowledge
- Different objective functions
Future Work
- Computational time: engineering aspects
- Machine learning techniques
- Beyond tactical assault

Thank you