How to Win a Chinese Chess Game: Reinforcement Learning. Cheng, Wen Ju.

Set Up (board diagram, showing the RIVER that divides the two sides)

General

Guard

Minister

Rook

Knight

Cannon

Pawn

Training: How long does it take for a human? How long does it take for a computer? The chess program KnightCap used TD learning to learn its evaluation function while playing on the Free Internet Chess Server (FICS, fics.onenet.net); it improved from a 1650 rating to a 2100 rating (the level of a US Master; world champions are rated around 2900) in just 308 games and three days of play.

Training: play a series of games in a self-play learning mode using temporal difference learning. The goal is to learn some simple strategies, namely the piece values (the weights).

Why Temporal Difference Learning: the average branching factor of the game tree is usually around 30, and the average game lasts around 100 ply, so the size of the game tree is on the order of 30^100 ≈ 10^148, far too large to search exhaustively.

Searching: alpha-beta search; 3-ply search vs. 4-ply search; horizon effect; quiescence cutoff search.
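
As a rough illustration of the alpha-beta search used here, a minimal depth-limited sketch in Python (not the presentation's actual code); `legal_moves`, `make_move`, `undo_move`, `evaluate`, and the `pos` object are hypothetical stand-ins.

```python
def alpha_beta(pos, depth, alpha, beta, maximizing):
    """Depth-limited fail-soft alpha-beta search (a sketch).
    `evaluate(pos)` returns the board evaluation Y for the current player."""
    if depth == 0 or pos.is_terminal():
        return evaluate(pos)
    if maximizing:
        best = float("-inf")
        for move in legal_moves(pos):
            make_move(pos, move)
            best = max(best, alpha_beta(pos, depth - 1, alpha, beta, False))
            undo_move(pos, move)
            alpha = max(alpha, best)
            if alpha >= beta:        # beta cutoff: the opponent will avoid this line
                break
        return best
    else:
        best = float("inf")
        for move in legal_moves(pos):
            make_move(pos, move)
            best = min(best, alpha_beta(pos, depth - 1, alpha, beta, True))
            undo_move(pos, move)
            beta = min(beta, best)
            if beta <= alpha:        # alpha cutoff
                break
        return best
```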

Horizon Effect (diagram: successive positions at t, t+1, t+2, t+3)

Evaluation Function
– feature: a property of the game
– feature evaluators: Rook, Knight, Cannon, Minister, Guard, and Pawn
– weight w_k: the value of a specific piece type
– feature function f_k: returns the current player's advantage in that piece type, on a scale from -1 to 1
– evaluation function Y: Y = ∑ k=1 to 7 w_k * f_k
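
A minimal sketch of this linear evaluation in Python, assuming a hypothetical `Position` object with a `count(piece_type, side)` method; only the six piece-type features listed on the slide are included, and the per-piece scaling is just one way to keep each feature in the -1 to 1 range, not a detail taken from the slides.

```python
PIECE_TYPES = ["Rook", "Knight", "Cannon", "Minister", "Guard", "Pawn"]

# each side starts with this many pieces of each type (standard Chinese chess setup)
START_COUNT = {"Rook": 2, "Knight": 2, "Cannon": 2,
               "Minister": 2, "Guard": 2, "Pawn": 5}

def piece_advantage(pos, piece_type):
    """Hypothetical feature function f_k: the current player's piece advantage
    in one piece type, scaled to [-1, 1] by the starting count."""
    diff = pos.count(piece_type, pos.to_move) - pos.count(piece_type, pos.opponent)
    return diff / START_COUNT[piece_type]

def evaluate(pos, weights):
    """Linear evaluation Y = sum over k of w_k * f_k (a sketch)."""
    return sum(weights[k] * piece_advantage(pos, k) for k in PIECE_TYPES)
```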

TD(λ) and Updating the Weights
w_{i,t+1} = w_{i,t} + α (Y_{t+1} − Y_t) ∑ k=1 to t λ^(t−k) * ∇w_i Y_k
          = w_{i,t} + α (Y_{t+1} − Y_t) (f_{i,t} + λ f_{i,t−1} + λ^2 f_{i,t−2} + … + λ^(t−1) f_{i,1})
– α: learning rate, how quickly the weights can change; α = 0.01
– λ: feedback coefficient, how much to discount past values
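
A sketch of this update in Python; the data layout and the λ value of 0.7 are assumptions (the slides only give α = 0.01). Because Y is linear in the weights, the gradient of Y_k with respect to w_i is simply f_{i,k}, which gives the second form of the equation above.

```python
def td_lambda_update(weights, features, Y, t, alpha=0.01, lam=0.7):
    """One TD(lambda) step after observing Y[t+1] (a sketch of the update above).
    weights:  dict {feature i: w_i}
    features: dict {time step k: {feature i: f_{i,k}}}
    Y:        dict {time step k: evaluation Y_k}
    lam (lambda) = 0.7 is a placeholder, not a value from the slides."""
    delta = alpha * (Y[t + 1] - Y[t])
    for i in weights:
        # eligibility trace: sum over k=1..t of lambda^(t-k) * f_{i,k}
        trace = sum(lam ** (t - k) * features[k][i] for k in range(1, t + 1))
        weights[i] += delta * trace
    return weights
```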

Features Table (columns: time step t and the feature values f1 … f6 recorded at each step), plus the array of weights.

Example (board positions at t = 5, t = 6, t = 7, t = 8)

Final Reward
loser:
– if the game is a draw, the final reward is 0
– if the board evaluation is negative, the final reward is twice the board evaluation
– if the board evaluation is positive, the final reward is -2 times the board evaluation
winner:
– if the game is a draw, the final reward is 0
– if the board evaluation is negative, the final reward is -2 times the board evaluation
– if the board evaluation is positive, the final reward is twice the board evaluation
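
A small sketch of this reward rule in Python; the function name and the explicit draw flag are my own.

```python
def final_reward(board_eval, is_winner, is_draw=False):
    """Final reward as described on the slide (a sketch). `board_eval` is the
    player's board evaluation Y at the end of the game."""
    if is_draw:
        return 0.0
    if is_winner:
        # winner: always pushed toward a positive reward
        return 2 * board_eval if board_eval > 0 else -2 * board_eval
    # loser: always pushed toward a negative reward
    return 2 * board_eval if board_eval < 0 else -2 * board_eval
```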

Final Reward (normalization): the weights are normalized by dividing by the greatest weight; any negative weights are set to zero; the most valuable piece therefore has weight 1.
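
The normalization step, sketched in Python under the assumption that the greatest weight is positive:

```python
def normalize_weights(weights):
    """Divide every weight by the greatest weight and clamp negatives to zero,
    so the most valuable piece ends up with weight 1 (a sketch)."""
    top = max(weights.values())
    return {piece: max(0.0, w / top) for piece, w in weights.items()}
```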

Summary of Main Events
1. Red's turn
2. Update weights for Red using TD(λ)
3. Red does alpha-beta search
4. Red executes the best move found
5. Blue's turn
6. Update weights for Blue using TD(λ)
7. Blue does alpha-beta search
8. Blue executes the best move found (go to 1)
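
A sketch of this loop in Python; `new_game`, `update_weights_td`, `best_move_alpha_beta`, `make_move`, and the position object's attributes are hypothetical stand-ins for the program's real routines.

```python
def play_one_game(red_weights, blue_weights, depth=4, max_plies=200):
    """One self-play game following the eight steps above (a sketch)."""
    pos = new_game()
    for _ in range(max_plies):
        weights = red_weights if pos.to_move == "Red" else blue_weights
        update_weights_td(weights, pos)                   # steps 2 / 6: TD(lambda) update
        move = best_move_alpha_beta(pos, weights, depth)  # steps 3 / 7: alpha-beta search
        pos = make_move(pos, move)                        # steps 4 / 8: execute the best move
        if pos.is_over():
            break
    return pos
```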

After the Game Ends
1. Calculate and assign the final reward for the losing player
2. Calculate and assign the final reward for the winning player
3. Normalize the weights between 0 and 1

Results: training was run as a 10-game series and a 100-game series; the learned weights are carried over into the next series; training began with all weights initialized to 1. The goal is to learn piece values that are close to the default values defined by H.T. Lau, or even better.
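
A sketch of how such a training run might be driven, under the assumption (not stated in the slides) that each series simply continues from the previous weights; `play_one_game_and_learn` is a hypothetical helper standing in for the self-play loop above.

```python
PIECES = ["Rook", "Knight", "Cannon", "Minister", "Guard", "Pawn"]

def train(series_lengths=(10, 100)):
    """Run successive self-play series, carrying the learned weights into the
    next series; all weights start at 1 (a sketch)."""
    weights = {p: 1.0 for p in PIECES}
    for n_games in series_lengths:
        for _ in range(n_games):
            # one self-play game with TD(lambda) updates, final reward, and normalization
            weights = play_one_game_and_learn(weights)
    return weights
```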

Observed Behavior
– early stages: played pretty randomly
– after 20 games: had identified the most valuable piece, the Rook
– after 250 games: played better, protecting the valuable pieces and trying to capture a valuable piece

Weights (table comparing H.T. Lau's weights with the weights learned after 20 games and after 250 games, for Rook, Knight, Cannon, Guard, Minister, and Pawn).

Testing: self-play games in which Red played using the weights learned after 250 games and Blue used H.T. Lau's weights. Out of 5 games, Red won 3, Blue won 1, and 1 was a draw.

Future Work: 8 different types or "categories" of features:
1. Piece Values
2. Comparative Piece Advantage
3. Mobility
4. Board Position
5. Piece Proximity
6. Time Value of Pieces
7. Piece Combinations
8. Piece Configurations

Examples

Cannon behind Knight

Conclusion: Computer Chinese chess has been studied for more than twenty years. Recently, due to advances in AI research and improvements in computer hardware, in both speed and capacity, some Chinese chess programs at grandmaster level (about 6-dan in Taiwan) have been successfully developed. Professor Shun-Chin Hsu of Chang-Jung University (CJU), who has been involved in the development of computer Chinese chess programs for a long period, points out that "the strength of Chinese chess programs increases by 1 dan every three years." He also predicts that a computer program will beat the world champion of Chinese chess before 2012.

When and What: 2004 World Computer Chinese Chess Championship
Competition dates: June 25-26, 2004
Prizes:
(1) First place: USD 1,500 and a gold medal
(2) Second place: USD 900 and a silver medal
(3) Third place: USD 600 and a bronze medal
(4) Fourth place: USD 300

References
C. Szeto. Chinese Chess and Temporal Difference Learning.
J. Baxter. KnightCap: A Chess Program That Learns by Combining TD(λ) with Minimax Search.
T. Trinh. Temporal Difference Learning in Chinese Chess.