Improving Monte Carlo Tree Search Policies in StarCraft

Presentation transcript:

Improving Monte Carlo Tree Search Policies in StarCraft via Probabilistic Models Learned from Replay Data
Alberto Uriarte and Santiago Ontañón
Drexel University, Philadelphia
October 10, 2016

Motivation Sorry, this paper is not really about Monte Carlo Tree Search

Motivation Sorry, this paper is not really about Monte Carlo Tree Search It is about the Multi-Armed Bandit Problem

Multi-Armed Bandit Problem

Multi-Armed Bandit Problem: How do you pick between slot machines so that you walk out of Las Vegas with the most $$$?

Multi-Armed Bandit Problem. Common strategies: Epsilon-Greedy, Upper Confidence Bounds (UCB), Thompson sampling, …
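As a minimal illustration (not from the slides), here is a sketch of two of these bandit strategies, assuming each arm's empirical mean reward and play count are tracked in parallel lists:

import math
import random

def epsilon_greedy(means, epsilon=0.2):
    """Explore a uniformly random arm with probability epsilon, else exploit the best mean."""
    if random.random() < epsilon:
        return random.randrange(len(means))
    return max(range(len(means)), key=lambda a: means[a])

def ucb1(means, counts, total_plays, c=math.sqrt(2)):
    """Pick the arm with the highest upper confidence bound (unplayed arms are tried first)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(means)),
               key=lambda a: means[a] + c * math.sqrt(math.log(total_plays) / counts[a]))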

Problem We are broke

Problem We are broke Thousands of Slot Machines

Problem We are broke Thousands of Slot Machines We don’t have enough budget to explore each Slot Machine even once!!

Solution: Abstraction

Solution: Data Acquisition

RTS games face the same problem: Money = Computational Budget; Slot Machine = Next Action to Choose; Number of Slot Machines = Branching Factor.

Abstraction
[Map figures: units grouped into squads; the numbers 2, 1, 4, 4, 1, 2 are squad sizes.]

Abstraction
Each squad's actions are abstracted to Idle, Attack, and Move (To Friend, To Enemy, Towards Friend, Towards Enemy); a move target is described by predicates such as "NOT To Friend, To Enemy, NOT Towards Friend, Towards Enemy".

Abstraction
Even with this abstraction the branching factor can be too big to handle (10^200).
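Purely as an illustration (hypothetical names, not the paper's actual data structures), one way to represent such an abstract squad action from the predicates above:

from dataclasses import dataclass
from enum import Enum

class ActionType(Enum):
    IDLE = "idle"
    ATTACK = "attack"
    MOVE = "move"

@dataclass(frozen=True)
class SquadAction:
    """An abstract action for one squad, described by target-region predicates."""
    kind: ActionType
    to_friend: bool = False       # target region contains friendly units
    to_enemy: bool = False        # target region contains enemy units
    towards_friend: bool = False  # moving in the direction of friendly units
    towards_enemy: bool = False   # moving in the direction of enemy units

# Example: "NOT To Friend, To Enemy, NOT Towards Friend, Towards Enemy"
attack_move = SquadAction(ActionType.MOVE, to_enemy=True, towards_enemy=True)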

Data Acquisition Professional Player Game Replays

Squad-Action Naïve Bayes
Learned from the professional player game replays: the probability of choosing action T given a game state X (X = the set of currently possible actions).

Squad-Action Naïve Bayes
One term of the model: the probability of choosing action T.

Squad-Action Naïve Bayes
The other term: the probability that action Xj was an option when action T was selected.
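The formula itself appears only as an image on the slides; assuming the standard Naïve Bayes factorization of the two terms described above, it reads:

P(T \mid X) \;\propto\; P(T) \prod_{j=1}^{n} P(X_j \mid T)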

Epsilon-Greedy Sampling
Explore (20%): select using a uniform distribution.
Exploit (80%): select the current best.

Informed Epsilon-Greedy Sampling
Explore (20%): select using our Squad-Action Naïve Bayes distribution.
Exploit (80%): select the current best.

Best Informed ε-Greedy Sampling
If none of the children have been explored: select the most probable action from our Naïve Bayes distribution.
Else:
Explore (20%): select using our Naïve Bayes distribution.
Exploit (80%): select the current best.
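A minimal sketch of the two informed variants (in addition to the plain ε-greedy shown earlier), assuming a hypothetical helper nb_probs(children) that returns the learned Naïve Bayes probability of each child action, and that each child node tracks visits and value:

import random

def informed_epsilon_greedy(children, nb_probs, epsilon=0.2):
    """Explore with probability epsilon by sampling from the Naive Bayes distribution,
    otherwise exploit the child with the best average value."""
    if random.random() < epsilon:
        return random.choices(children, weights=nb_probs(children), k=1)[0]
    return max(children, key=lambda c: c.value / c.visits if c.visits else float("-inf"))

def best_informed_epsilon_greedy(children, nb_probs, epsilon=0.2):
    """If no child has been explored yet, pick the most probable action under Naive Bayes;
    otherwise fall back to informed epsilon-greedy."""
    if all(c.visits == 0 for c in children):
        return max(zip(children, nb_probs(children)), key=lambda cp: cp[1])[0]
    return informed_epsilon_greedy(children, nb_probs, epsilon)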

MCTS Policies: We can use the new sampling strategies in both MCTS policies, the Tree Policy and the Default Policy.

MCTS Policies: Experiments
Tree Policy: ε, NB-ε, BestNB-ε. Default Policy: UNIFORM, NB.

MCTS Policies: Experiments
Tree Policy: ε, NB-ε, BestNB-ε. Default Policy: UNIFORM, NB.
Setup: no fog of war; 1 MCTS search every 400 frames (16 s); Terran vs. Terran (TvT) against the default AI.

How deep to search? Simulating until the end of the game is not feasible, so playouts are limited to 2 minutes into the future.
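A minimal sketch of a depth-limited playout under this 2-minute cutoff, assuming a hypothetical state object with is_terminal() and legal_actions(), a forward model simulate_frame, and an evaluation function evaluate (none of these names are from the paper):

FRAMES_PER_SECOND = 24  # StarCraft: Brood War runs at roughly 24 frames per second
PLAYOUT_HORIZON = 2 * 60 * FRAMES_PER_SECOND  # stop playouts 2 minutes into the future

def depth_limited_playout(state, default_policy, simulate_frame, evaluate):
    """Follow the default policy until the horizon or a terminal state, then score the state."""
    frames = 0
    while frames < PLAYOUT_HORIZON and not state.is_terminal():
        action = default_policy(state.legal_actions())
        state = simulate_frame(state, action)
        frames += 1
    return evaluate(state)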

Experiments
[Results charts omitted; the remaining 10% of games are ties.]

Conclusions: The BestNB-ε and NB policies with 40 playouts win 80% of games with less than 0.1 s spent per search. BestNB-ε wins in less time and loses fewer units than NB-ε.

Improving Monte Carlo Tree Search Policies in StarCraft via Probabilistic Models Learned from Replay Data
Alberto Uriarte (albertouri@cs.drexel.edu)
Santiago Ontañón (santi@cs.drexel.edu)
Our lab is looking for new PhD students!