Introduction to the Heuristically Accelerated Hierarchical Reinforcement Learning in RTS Games
Omar Enayet, Amr Saqr, AbdelRahman Al-Ogail, Ahmed Atta

Agenda
– Complexity of RTS Games
– Analysis of the Strategy Game
– The HAHRL-RTS Platform
  – The Hierarchy
  – Heuristic Algorithms
  – Function Approximation
– References

Complexity of RTS Games
There is no doubt that strategy games are complex domains:
– A gigantic (almost infinite) set of allowed actions
– A gigantic (almost infinite) set of game states
– Imperfect information
– Nondeterministic behavior
However, real-time planning and real-time reactions are still required!

Complexity of RTS Games
– No model of the game, i.e. we do not know exactly how to get from one state to another.
– An infinite number of states and actions.
Result: learning with raw reinforcement learning is infeasible.

Solution
– Approximation of the state space, the action space, and the value functions.
– Hierarchical reinforcement learning.
– Applying heuristics.
– Others.

Analysis of the Strategy Game

Primitive Actions
Primary primitive actions:
1. Move a unit
2. Train/Upgrade a unit
3. Gather a resource
4. Make a unit attack
5. Make a unit defend
6. Build a building
7. Repair a building
NB: Upgrading units or buildings is not available in BosWars but is found in most RTS games.
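For concreteness, a higher-level controller could expose these actions as a small enumeration that it issues to the game engine. The following is a hypothetical Python encoding; the names are illustrative and not taken from BosWars or from the platform's actual API:

    from enum import Enum, auto

    class PrimitiveAction(Enum):
        """Illustrative encoding of the seven primary primitive actions."""
        MOVE_UNIT = auto()
        TRAIN_OR_UPGRADE_UNIT = auto()
        GATHER_RESOURCE = auto()
        ATTACK_WITH_UNIT = auto()
        DEFEND_WITH_UNIT = auto()
        BUILD_BUILDING = auto()
        REPAIR_BUILDING = auto()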

Winning a Game
Any player wins through two types of actions used simultaneously: actions that strengthen himself and actions that weaken his enemy (Fig. 1).

Winning a Game (figure slide, Fig. 1)

6 Main Sub-Strategies
When a human plays a strategy game, he does not learn everything at the same time. He learns each of the following six, largely independent, sub-strategies separately:

1. Train What Units?
Train/build/upgrade attacking units: Which units does he need to train? Will he depend on fast, cheap units to perform successive quick attacks, or on powerful, expensive, slow units to perform one or two brutal attacks that finish his enemy? Or a combination of the two, which is often the better choice? Does his enemy have a weakness against a certain unit? Does the enemy have units that can infiltrate his defenses, so that he must train their counter-units? Does he prefer to spend his money on expensive upgrades, or on larger numbers of non-upgraded units?
NB: I treat attacking buildings as static attacking units.

2. How to Defend?
Defend: How will he use his current units to defend? Will he concentrate all his units in one tightly packed force, or will he stretch his units along his borders, or mix the two approaches? Will he keep the defending units (which may include attacking buildings) around his buildings, or will he post them as guards far from the base to stop the enemy early, or a mix of the two? If he detects an attack on his radar, will he order his units to engage it at once, or will he wait for the opponent to come to his base and be crushed, or a mix of the two? How will he defend unarmed units? Will he place armed units near them for protection, or would he rather use those armed units for something more useful elsewhere? If an unarmed unit is under attack, how will he react? What are his reactions to different events while defending?

3. How to Attack?
Attack: How will he use his current units to attack? Will he attack the important buildings first, or will he prefer to crush all the defensive buildings and units first, or a mix of the two approaches? Will he divide his attacking force into separate small forces that strike from different directions, or will he attack with one big solid force, or a mix of the two? What are his reactions to different events while attacking?

4. How to Gather Resources?
Gather resources: How will he gather resources? Will he train many gatherers to achieve a high gathering rate, or will he train only a limited number because more would be a waste of money and he wants to rush (attack early), so he needs that money? Or a mix of the two approaches? Will he gather the far resources first, because the near resources are more secure, or will he be greedy and take the nearer resources first, or a mix of the two?

5. How to Construct Buildings?
Construct buildings: How does he place his buildings? Will he pack them close together so they are easy to defend, or will he leave large spaces between them to make it harder for the opponent to destroy them all, or a mix of the two approaches?

6. How to Repair?
Repair: How will he handle repairing? Although it is a minor issue, different approaches are used. Will he place a repairing unit near every building in case of an attack, or will he simply order the nearest one to repair the building being attacked, or a mix of the two approaches?

Heuristically Accelerated Hierarchical RL in RTS Games

The Hierarchy
Since the six sub-strategies hardly depend on each other (think about it and you will find them nearly independent), I divide the AI system into a hierarchy, as shown in Figure 1. Each child node is itself a semi-Markov decision process (SMDP) to which heuristically accelerated reinforcement learning techniques are applied. Each child node will later be divided into further sub-nodes, which are again SMDPs.
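As a rough illustration of this decomposition, the top-level node could simply own one learner per sub-strategy and hand the current game state to each of them every decision cycle. The sketch below is a minimal, hypothetical Python skeleton; the class and method names are assumptions, not the platform's real interface:

    class SubStrategyLearner:
        """One SMDP learner, e.g. 'attack' or 'gather_resources'."""
        def __init__(self, name):
            self.name = name

        def choose_action(self, state):
            # Placeholder: the heuristically accelerated RL policy would go here.
            return ("noop", self.name)

    class HierarchicalAgent:
        """Top-level node owning the six nearly independent sub-strategy learners."""
        def __init__(self):
            self.learners = [SubStrategyLearner(n) for n in (
                "train_units", "defend", "attack",
                "gather_resources", "construct_buildings", "repair")]

        def act(self, state):
            # Each sub-strategy chooses its own action; they run side by side.
            return {l.name: l.choose_action(state) for l in self.learners}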

Heuristic Algorithms
A heuristic is an algorithm that is able to produce an acceptable solution to a problem in many practical scenarios, but for which there is no formal proof of its correctness. Alternatively, it may be correct, but it may not be provable that it produces an optimal solution or that it uses reasonable resources.

Heuristic Algorithms (Cont'd)
Firstly: splitting the learning into learning the six sub-strategies is a heuristic.
Secondly: using case-based reasoning when choosing actions is a heuristic.
Why heuristics?
– Because they accelerate the learning dramatically.
– They decrease the non-determinism of the AI, so testing is easier.
Why not heuristics? The programming effort increases.
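To make "accelerating learning with a heuristic" concrete: heuristically accelerated Q-learning in the style of the Bianchi et al. reference biases greedy action selection with a heuristic function H(s, a), for example one derived from retrieved cases, while leaving the learned Q-values themselves untouched. The following is a simplified sketch under that assumption; the weight xi, epsilon, and the toy heuristic values are illustrative:

    import random
    from collections import defaultdict

    def choose_action(Q, H, state, actions, xi=1.0, epsilon=0.1):
        """Epsilon-greedy selection over Q(s,a) + xi * H(s,a).

        Q, H: dicts mapping (state, action) -> float; H encodes the
        case-based heuristic advice, Q the learned value estimates.
        """
        if random.random() < epsilon:
            return random.choice(actions)   # explore
        return max(actions, key=lambda a: Q[(state, a)] + xi * H[(state, a)])

    # Toy usage: the heuristic pushes the agent towards 'defend' when the
    # base is threatened, before any Q-values have been learned.
    Q = defaultdict(float)
    H = defaultdict(float, {("base_threatened", "defend"): 5.0})
    print(choose_action(Q, H, "base_threatened", ["attack", "defend", "gather"]))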

Feature-Based Function Approximation
The problem: the state-action space is infinite.
The goal: approximate the state-action space while keeping reinforcement learning efficient.

The Approach
If the actions are infinite, make them discrete in any appropriate way. For example, in the resource-gathering problem the action is to assign N additional gatherers to a resource, where N could be any number. We convert it to discrete intervals such as [0,1], [2,4], [5,8], [9,15], [16,22], [23,35] only. Notice that it is rare to need to assign more than 35 extra gatherers to an already-worked resource.
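A simple way to implement this bucketing is to map the raw integer onto the index of the interval that contains it, so the learner only ever sees a handful of discrete actions. A minimal sketch, using the illustrative interval boundaries from the slide:

    # Interval boundaries for "assign N more gatherers", taken from the slide.
    GATHERER_BUCKETS = [(0, 1), (2, 4), (5, 8), (9, 15), (16, 22), (23, 35)]

    def discretize_gatherers(n):
        """Map a raw gatherer count onto a small discrete action index."""
        for index, (low, high) in enumerate(GATHERER_BUCKETS):
            if low <= n <= high:
                return index
        return len(GATHERER_BUCKETS) - 1   # clamp anything above 35 to the last bucket

    print(discretize_gatherers(3))   # -> 1, i.e. the [2, 4] bucket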

The Approach (Cont'd)
The states will not be represented explicitly, but rather through their features. For example, in the resource-gathering problem the states are infinite, depending on combinations of the following features: number of gatherers, relative distance between each gatherer and the resource, available resources, wanted resources, etc. That is a huge number of states, so instead we will use the features themselves, as you will see.
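In practice this means each game situation is reduced to a short numeric feature vector before it ever reaches the learner. The sketch below is a hypothetical extractor for the resource-gathering sub-problem; the particular features and normalisations are assumptions:

    def gathering_features(num_gatherers, mean_distance, available, wanted):
        """Turn one resource-gathering situation into a small feature vector.

        All values are scaled to roughly [0, 1] so no single feature
        dominates the learned weights.
        """
        return [
            1.0,                                   # bias feature
            min(num_gatherers / 35.0, 1.0),        # gatherers already working here
            min(mean_distance / 100.0, 1.0),       # how far away they are, in map tiles
            min(available / (wanted + 1.0), 1.0),  # how much of the demand is covered
        ]

    print(gathering_features(num_gatherers=4, mean_distance=20, available=500, wanted=2000))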

The Approach (Cont’d)
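The original slide at this point held a figure or equation that is not recoverable from the transcript. A standard linear form, consistent with the per-action theta count given on the next slide, would be (this reconstruction is an assumption):

    Q(s, a) \approx \sum_{i=1}^{n} \theta_{a,i} \, \phi_i(s)

with the usual gradient Q-learning update

    \theta_{a,i} \leftarrow \theta_{a,i} + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \phi_i(s)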

Result of Approximation
The complexity no longer depends on (number of states × number of actions); instead it depends on (number of features × number of actions). In the resource-gathering problem, if we have 6 distinct actions and we approximate the infinite number of states down to at least 100 states, we would still have to learn at least 600 Q-values. With this approach, if we have 5 features and 6 distinct actions, we learn only 5 × 6 = 30 thetas. Note that we approximated only the state space (infinite states down to a definite number of features), not the action space, so a problem remains if the action space is large.
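Concretely, the learner then stores one weight vector per discrete action (here 6 actions × 5 features = 30 parameters) and updates only those weights after each step. A minimal sketch, assuming the standard linear Q-learning update shown above; action names and constants are illustrative:

    NUM_FEATURES = 5
    ACTIONS = ["assign_0_1", "assign_2_4", "assign_5_8",
               "assign_9_15", "assign_16_22", "assign_23_35"]

    # 6 actions x 5 features = 30 learned parameters in total.
    theta = {a: [0.0] * NUM_FEATURES for a in ACTIONS}

    def q_value(features, action):
        return sum(w * f for w, f in zip(theta[action], features))

    def update(features, action, reward, next_features, alpha=0.1, gamma=0.9):
        """One Q-learning step on the 30 thetas instead of a huge Q-table."""
        target = reward + gamma * max(q_value(next_features, a) for a in ACTIONS)
        td_error = target - q_value(features, action)
        theta[action] = [w + alpha * td_error * f
                         for w, f in zip(theta[action], features)]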

References
– Andrew G. Barto and Sridhar Mahadevan, 2003. Recent Advances in Hierarchical Reinforcement Learning.
– Marina Irodova and Robert H. Sloan, 2005. Reinforcement Learning and Function Approximation.
– Reinaldo A. C. Bianchi, Raquel Ros and Ramón López de Mántaras, 2009. Improving Reinforcement Learning by Using Case-Based Heuristics.
– Richard S. Sutton and Andrew G. Barto, 1998. Reinforcement Learning: An Introduction.
– Wikipedia.