AlphaGo from Google DeepMind beat human grandmasters in 2016

Presentation transcript:

AlphaGo from Google DeepMind beat human grandmasters in 2016
uses Monte Carlo game tree search
Science Breakthrough of the Year runner-up
Silver et al. (2016). Nature, 529:484–489.

Go - the game
19x19 board, black vs. white stones (images from https://en.wikipedia.org/wiki/Go_(game))
Game-tree size is O(b^d) for branching factor b and depth d: chess ~ 35^80, Go ~ 250^150
Rule 1: Every stone remaining on the board must have at least one open "point" ("liberty") directly adjacent (up, down, left, or right), or must be part of a connected group that has at least one such open point next to it. Stones or groups of stones that lose their last liberty are removed from the board.
Strategies:
Connection: Keeping one's own stones connected means that fewer groups need to make living shape, and one has fewer groups to defend.
Stay alive: The simplest way to stay alive is to establish a foothold in the corner or along one of the sides. At a minimum, a group must have two eyes (separate open points) to be "alive".
Image labels: liberties, capture
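
Rule 1 can be checked mechanically with a flood fill over a connected group. The following is a minimal sketch, not code from the presentation: the board encoding (a list of strings with "B", "W" and "." for empty) and the function names `group_and_liberties` and `remove_captured` are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]  # (row, column)


def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Return the connected group containing `start` and its set of liberties."""
    rows, cols = len(board), len(board[0])
    color = board[start[0]][start[1]]
    if color == ".":
        raise ValueError("start must be a stone, not an empty point")
    group: Set[Point] = {start}
    liberties: Set[Point] = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        # Only the four orthogonal neighbours count as adjacent.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            cell = board[nr][nc]
            if cell == ".":
                liberties.add((nr, nc))
            elif cell == color and (nr, nc) not in group:
                group.add((nr, nc))
                stack.append((nr, nc))
    return group, liberties


def remove_captured(board: List[str], color: str) -> Tuple[List[str], int]:
    """Remove every group of `color` that has no liberties; return (new board, stones removed)."""
    grid = [list(row) for row in board]
    seen: Set[Point] = set()
    captured = 0
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell != color or (r, c) in seen:
                continue
            group, liberties = group_and_liberties(board, (r, c))
            seen |= group
            if not liberties:
                for gr, gc in group:
                    grid[gr][gc] = "."
                captured += len(group)
    return ["".join(row) for row in grid], captured


# Hypothetical 5x5 position: the white group has a single liberty at (0, 3).
BOARD = [
    ".BW..",
    "BWWB.",
    ".BB..",
    ".....",
    ".....",
]
```

In this position, a black stone played at row 0, column 3 takes the white group's last liberty, so calling `remove_captured` for "W" on the resulting board removes all three white stones.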

AlphaGo implementation
uses deep networks (13 layers) to represent a "value function" and a "policy function"
these are for reinforcement learning: learn the value of good moves or positions by whether they lead to wins (discounted rewards)
performs Monte Carlo game search: explores the state space like minimax
random "rollouts" simulate probable plays by the opponent according to the policy function
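
The rollout idea can be illustrated on a much smaller game. The sketch below is not AlphaGo's algorithm: it is flat Monte Carlo evaluation on a toy game of Nim (take 1 to 3 stones, whoever takes the last stone wins), with a uniform random policy standing in for the learned policy function and no tree or value network. The game, the function names, and the parameters `n_rollouts` and `seed` are all assumptions chosen for illustration.

```python
import random
from typing import Dict, List


def legal_moves(pile: int) -> List[int]:
    """A move removes 1, 2 or 3 stones, never more than remain."""
    return [k for k in (1, 2, 3) if k <= pile]


def rollout(pile: int, to_move: int, rng: random.Random) -> int:
    """Play uniformly random moves from a non-empty pile; return the winner (0 or 1)."""
    player = to_move
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return player  # this player took the last stone
        player = 1 - player


def estimate_move_values(pile: int, n_rollouts: int = 2000, seed: int = 0) -> Dict[int, float]:
    """Estimate player 0's win rate for each legal move by averaging random rollouts."""
    rng = random.Random(seed)
    values: Dict[int, float] = {}
    for move in legal_moves(pile):
        remaining = pile - move
        wins = 0
        for _ in range(n_rollouts):
            # Taking the last stone wins outright; otherwise the opponent (player 1) moves next.
            winner = 0 if remaining == 0 else rollout(remaining, 1, rng)
            wins += winner == 0
        values[move] = wins / n_rollouts
    return values


def best_move(pile: int, n_rollouts: int = 2000, seed: int = 0) -> int:
    """Pick the move with the highest estimated win rate."""
    values = estimate_move_values(pile, n_rollouts, seed)
    return max(values, key=values.get)
```

With a pile of 5, taking one stone leaves the opponent with 4, which is the game-theoretically correct move, and the rollout averages also favour it. A stronger rollout policy or a search tree on top of these estimates would sharpen the values further; the slide describes AlphaGo as doing this with its learned policy and value networks.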

AlphaGo implementation
hardware: 1920 CPUs, 280 GPUs
training of networks:
phase 1: supervised learning from a database of 30 million human moves
phase 2: play against itself using reinforcement learning

match against Tang Weixing (2016)