GOMOKU ALGORITHM STUDY: MIN-MAX AND MONTE CARLO APPROACHES

Jingtong Liu, Sun Wei, Weixun Ge, Xie Guochen

Introduction Gomoku, also called Gobang or Five in a Row, is an abstract strategy board game played on a board of 15×15 intersections. The game is complex, but not prohibitively so. Algorithms studied: Minimax and Monte Carlo.

Minimax

Minimax Evaluation: the total score of all patterns on the board. (Example positions from the slide score -800000, -970000, and 950000.)
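The pattern-scoring idea can be sketched as follows. Only the idea (sum the scores of all patterns on the board) comes from the slides; the pattern table and its score values here are illustrative assumptions, not the authors' exact numbers.

```python
# Sketch of a pattern-based Gomoku evaluation: sum the scores of all
# stone patterns on the board. The pattern table below is an assumption.

SIZE = 15
EMPTY, BLACK, WHITE = 0, 1, 2

# (run length, open ends) -> score; longer, more open runs score higher.
PATTERN_SCORES = {
    (5, 0): 1_000_000,  # five in a row wins regardless of ends
    (4, 2): 100_000,    # open four
    (4, 1): 10_000,     # closed four
    (3, 2): 1_000,      # open three
    (3, 1): 100,
    (2, 2): 10,
}

DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def evaluate(board, player):
    """Total score of `player`'s patterns minus the opponent's."""
    opponent = BLACK if player == WHITE else WHITE
    return _score_side(board, player) - _score_side(board, opponent)

def _score_side(board, side):
    total = 0
    for r in range(SIZE):
        for c in range(SIZE):
            if board[r][c] != side:
                continue
            for dr, dc in DIRECTIONS:
                # Count each run only once, from its starting cell.
                pr, pc = r - dr, c - dc
                if 0 <= pr < SIZE and 0 <= pc < SIZE and board[pr][pc] == side:
                    continue
                length = 0
                rr, cc = r, c
                while 0 <= rr < SIZE and 0 <= cc < SIZE and board[rr][cc] == side:
                    length += 1
                    rr, cc = rr + dr, cc + dc
                open_ends = 0
                if 0 <= pr < SIZE and 0 <= pc < SIZE and board[pr][pc] == EMPTY:
                    open_ends += 1
                if 0 <= rr < SIZE and 0 <= cc < SIZE and board[rr][cc] == EMPTY:
                    open_ends += 1
                if length >= 5:
                    total += PATTERN_SCORES[(5, 0)]
                else:
                    total += PATTERN_SCORES.get((length, open_ends), 0)
    return total
```

A Minimax leaf would call `evaluate` from the perspective of the maximizing player; negative totals (as in the slide's example positions) mean the opponent's patterns dominate.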

Minimax Improvement
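The improvement reported in the results below is restricting Minimax to a limited area near existing stones rather than the whole board. A minimal sketch of such candidate-move generation, assuming a radius-2 neighborhood (the radius value is an assumption):

```python
# Sketch of limited-area move generation for Minimax: only consider
# empty cells near existing stones, which shrinks the branching factor.

SIZE = 15
EMPTY = 0

def candidate_moves(board, radius=2):
    """Empty cells within `radius` of any stone; the center cell
    when the board is empty (opening move)."""
    occupied = [(r, c) for r in range(SIZE) for c in range(SIZE)
                if board[r][c] != EMPTY]
    if not occupied:
        return [(SIZE // 2, SIZE // 2)]
    candidates = set()
    for r, c in occupied:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < SIZE and 0 <= cc < SIZE and board[rr][cc] == EMPTY:
                    candidates.add((rr, cc))
    return sorted(candidates)
```

With one stone on the board this yields 24 candidates instead of 224 empty cells, which is consistent with the large node-count reduction in the results table.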

Why Monte? (Motivation) For some games, Minimax works really well, but for others the search tree can be very large. This motivates us to implement an alternative algorithm: Monte Carlo Tree Search. We believe that, for some games, combining simulated annealing with a local Minimax search will produce a stronger agent than plain Minimax search.

Assumptions (modified later) (1) Moves are played randomly, with probabilities assigned by simulated annealing. (2) The value of a position is the win rate from that position. (3) To find the best move in a given position, play the game to the very end as in (1) and evaluate as in (2); play thousands of such random games, and the best move is the one that performs best. Gomoku has its own quirks, so modifications come later.
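Under these assumptions, one rollout plays moves to the end of the game and reports the result. A minimal sketch, using uniform random moves rather than annealing-weighted ones, with a board encoding and helper names that are assumptions:

```python
# Sketch of one Monte Carlo rollout: play random moves until someone
# gets five in a row or the board fills, then report the result.

import random

SIZE = 15
EMPTY, BLACK, WHITE = 0, 1, 2

def has_five(board, player):
    """True if `player` has five in a row anywhere on the board."""
    for r in range(SIZE):
        for c in range(SIZE):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                count = 0
                rr, cc = r, c
                while (0 <= rr < SIZE and 0 <= cc < SIZE
                       and board[rr][cc] == player and count < 5):
                    count += 1
                    rr, cc = rr + dr, cc + dc
                if count == 5:
                    return True
    return False

def rollout(board, to_move, rng=random):
    """Play uniformly random moves to the end; return the winner
    (or EMPTY for a tie)."""
    board = [row[:] for row in board]   # don't mutate the caller's board
    empties = [(r, c) for r in range(SIZE) for c in range(SIZE)
               if board[r][c] == EMPTY]
    rng.shuffle(empties)
    player = to_move
    for r, c in empties:
        board[r][c] = player
        if has_five(board, player):
            return player
        player = BLACK if player == WHITE else WHITE
    return EMPTY  # board full: tie
```

Repeating `rollout` thousands of times per candidate move and tallying the results gives the win-rate estimate of assumption (2).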

Win Time (Evaluation Function)

Updating the win time Update the win time after each rollout. The best move should always be played.

Issues to solve or improve How to choose the roots used to build the search tree. Should simulations be two random players ("two idiots"), or something smarter? Uniqueness of the game: the order of moves matters.

Monte Carlo Gomoku Smart simulation (trained with Minimax as the opponent)

Smart simulation (not all random players) When trained by Minimax twice, the agent performs worse.

How to build the roots (genetic algorithm) Instead of a single root or 5 roots, we enlarge the set to 20.

Order importance (short-cut) The best moves should be played immediately. Urgent moves are more important than big moves.
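The short-cut can be sketched as a pre-check before any search: play an immediate win at once, and block the opponent's immediate win before considering merely "big" moves. The board encoding and helper names here are the same assumptions as in the earlier sketches.

```python
# Sketch of the urgent-move short-cut: check immediate wins and
# immediate blocks before falling through to the full search.

SIZE = 15
EMPTY, BLACK, WHITE = 0, 1, 2

def _wins(board, r, c, player):
    """Would placing `player` at (r, c) complete five in a row?"""
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1
        for sign in (1, -1):
            rr, cc = r + sign * dr, c + sign * dc
            while 0 <= rr < SIZE and 0 <= cc < SIZE and board[rr][cc] == player:
                count += 1
                rr, cc = rr + sign * dr, cc + sign * dc
        if count >= 5:
            return True
    return False

def urgent_move(board, player):
    """An immediate win if one exists, else a block of the opponent's
    immediate win, else None (fall through to the full search)."""
    opponent = BLACK if player == WHITE else WHITE
    block = None
    for r in range(SIZE):
        for c in range(SIZE):
            if board[r][c] != EMPTY:
                continue
            if _wins(board, r, c, player):
                return (r, c)            # best move: play it immediately
            if block is None and _wins(board, r, c, opponent):
                block = (r, c)           # urgent move beats a "big" move
    return block
```

Checking `urgent_move` first lets the agent skip simulation entirely on forced moves, which is the short-cut the slide describes.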

Win Time History (improvement) If the simulation ends in a win, the win time of every move tried in that simulation increases by 1 as a reward; if it ends in a loss, the win time decreases by 2 as a punishment; if it is a tie, the win time does not change.
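A sketch of the win-time bookkeeping with the +1 / -2 / 0 scheme from the slide; the dictionary-based store and function names are assumptions:

```python
# Sketch of the win-time history update: reward +1 for a win,
# punish -2 for a loss, no change for a tie.

WIN_REWARD, LOSS_PENALTY = 1, 2

def update_win_times(win_time, tried_moves, result):
    """Update the win time of every move tried in one simulation.
    `result` is 'win', 'lose', or 'tie' from the simulating player's view."""
    for move in tried_moves:
        if result == 'win':
            win_time[move] = win_time.get(move, 0) + WIN_REWARD
        elif result == 'lose':
            win_time[move] = win_time.get(move, 0) - LOSS_PENALTY
        # 'tie': win time does not change

def best_move(win_time):
    """The best move should always be played: pick the highest win time."""
    return max(win_time, key=win_time.get)
```

The asymmetric -2 penalty makes losing simulations count more heavily than winning ones, which is the improvement over a plain win counter.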

Tests and results (search tree size)
Unimproved Minimax(3), whole board: 1,597,360 search tree nodes for the 1st step; 1,013,302 avg nodes/step through the 5th step; 642,326 avg nodes/step through game end.
Improved Minimax(3), limited area: 73,870 nodes for the 1st step; 135,515 avg nodes/step through the 5th step; 128,551 avg nodes/step through game end.

Tests and results (Monte Carlo vs. Minimax(3) opponent; Minimax training depth 2)
Repeat smart simulating: 1
- 1,000 simulated games per root move: W:L:T 14:15:1 (30 games), win 46%, lose 50%, tie 3.33%; Monte 1.13 s/step vs. Minimax 5.48 s/step; 74 steps/game.
- 5,000 simulated games per root move: 18:10:2 (30), win 60%, lose 33.33%, tie 6.67%; 1.58 s vs. 5.21 s; 75 steps/game.
- 10,000 simulated games per root move: 7:2:1 (10), win 70%, lose 20%, tie 10%; 2.46 s vs. 5.7 s; 92 steps/game.
Repeat smart simulating: 3
- 16:11:3 (30), win 53%, lose 36.67%; 3.43 s vs. 4.67 s; 82 steps/game.
- 7:1:2 (10), lose 10%; 6.61 s vs. 8.57 s; 113 steps/game.
- 13:5:2 (20), win 65%, lose 25%; 8.2 s vs. 7.7 s; 94 steps/game.
Trained by Minimax(2) twice: 4:16 (20 games); 1.16 s vs. 2.2 s; 20 steps/game.

Conclusion The Monte Carlo approach shows reliable win rates and good time efficiency. It is a promising approach for more complicated games (e.g., Go).

Future work Map history and current win time to [-1000, 1000]. Find the simulation count that maximizes win rate. Find the best reward and punishment strategy.