You already have these slides; this is just a repackaging in the order I intend to present them on Oct 14.

For 1% Extra Credit Who makes Van Halen’s drums?

Who do you think?

Project 1 Project_1_The_Eight_Puzzle.doc

Read the project handout. This is not hard or complicated ;-) I can do this in 24 lines of Matlab. If you find yourself writing hundreds of lines of code, stop, think, and start again. You need to write a report on your findings. Here is a hint; your findings will be: For shallow problems, it does not matter too much what heuristic you use, if any. As the problems get harder, heuristics help more and more. A good heuristic is better than a weak heuristic. You need to write “a two- to five-page report which summarizes your findings”. I expect this report to be well written, coherent, and largely free of misspellings/typos/poor grammar. I expect clean, well-thought-out figures and/or tables. The basic story of the report is: For simple problems (only slightly “messed-up” puzzles), having a heuristic does not make a difference. However, for harder puzzles, having a heuristic like Misplaced Tiles helps, and having a tight heuristic like Manhattan Distance really helps.
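For concreteness, here is a minimal sketch of the two heuristics mentioned above. It assumes a state is a 9-tuple in row-major order with 0 standing in for the blank; that representation is an illustrative choice, not something the handout requires.

```python
# Sketch of the two 8-puzzle heuristics. A state is a 9-tuple in row-major
# order; 0 stands in for the blank ('*'). Representation is illustrative.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def misplaced_tiles(state):
    """Count the tiles (ignoring the blank) that are not in their goal cell."""
    return sum(1 for tile, goal in zip(state, GOAL) if tile != 0 and tile != goal)

def manhattan(state):
    """Sum over tiles of horizontal plus vertical distance to the goal cell."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        g = GOAL.index(tile)
        total += abs(i // 3 - g // 3) + abs(i % 3 - g % 3)
    return total
```

Manhattan Distance dominates Misplaced Tiles (it is never smaller on any state), which is why it prunes more of the search tree on the harder puzzles.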

I have given you a sample report. It was created by a young lady in my class two years ago. It is not perfect, but it is very good.

Don’t cheat. If you copy a single line of text or a single line of code without proper attribution, I will fail you. I am better at catching cheaters than you are at cheating.

Test cases (tiles listed row by row; * = the blank):

Oh Boy:     8 7 1 6 * 2 5 4 3
IMPOSSIBLE: 1 2 3 4 5 6 8 7 *   (this puzzle is impossible to solve; if you can solve it, you have a bug in your code)
Trivial:    1 2 3 4 5 6 7 8 *
Very Easy:  7 * 8
Easy:       1 2 * 4 5 3 7 8 6
Doable:     * 1 2
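As an aside, the IMPOSSIBLE case can be confirmed with a standard parity test: a 3×3 sliding puzzle is solvable exactly when the number of inversions among its tiles (ignoring the blank) is even. A minimal sketch, again using 0 for the blank:

```python
def solvable(state):
    """3x3 sliding puzzle is solvable iff the tile inversion count is even.
    state: 9-tuple in row-major order, 0 for the blank."""
    tiles = [t for t in state if t != 0]
    inversions = sum(1 for i in range(len(tiles))
                       for j in range(i + 1, len(tiles))
                       if tiles[i] > tiles[j])
    return inversions % 2 == 0
```

The IMPOSSIBLE arrangement has a single inversion (8 before 7), which is odd, so no sequence of moves can reach the goal.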

Adversarial Search: Revisited (game-playing search). We have experience in search where we assume that we are the only intelligent entity and we have explicit control over the “world”. Let us consider what happens when we relax those assumptions.

Example Utility Functions II: Chess. Assume Max is “White”. Assume each piece has the following values: pawn = 1; knight = 3; bishop = 3; rook = 5; queen = 9. Let w = the sum of the values of the white pieces, and b = the sum of the values of the black pieces. Then

e(n) = (w − b) / (w + b)

Note that this value ranges between −1 and 1.
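As a sketch (the function and variable names are illustrative, not from any chess library):

```python
# Material-balance utility e(n) = (w - b) / (w + b). The position is assumed
# to be given as two lists of piece names; this interface is hypothetical.
PIECE_VALUE = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def evaluate(white_pieces, black_pieces):
    w = sum(PIECE_VALUE[p] for p in white_pieces)
    b = sum(PIECE_VALUE[p] for p in black_pieces)
    return (w - b) / (w + b)   # always in [-1, 1]
```

Dividing by w + b normalizes the score, so a one-pawn advantage matters more in an endgame with few pieces than in the opening.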

Depth-limited Minimax search: Search the game tree as deep as you can in the given time. Evaluate the fringe nodes with the utility function. Back up the values to the root. Choose the best move, and repeat. And, replace Minimax with alpha-beta to go a little deeper (about twice as deep). Search to cutoff, make the best move, wait for the reply; after the reply, search to cutoff again, make the best move, wait for the reply…
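The loop's core can be sketched as a generic depth-limited alpha-beta; `children` and `utility` are placeholder callbacks, not a specific game API.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing, children, utility):
    """Depth-limited minimax with alpha-beta pruning.
    children(state) -> successor states; utility(state) -> fringe evaluation."""
    succs = children(state)
    if depth == 0 or not succs:          # cutoff or terminal: evaluate fringe node
        return utility(state)
    if maximizing:
        value = -math.inf
        for s in succs:
            value = max(value, alphabeta(s, depth - 1, alpha, beta, False, children, utility))
            alpha = max(alpha, value)
            if alpha >= beta:            # prune: Min will never allow this branch
                break
        return value
    else:
        value = math.inf
        for s in succs:
            value = min(value, alphabeta(s, depth - 1, alpha, beta, True, children, utility))
            beta = min(beta, value)
            if beta <= alpha:            # prune: Max already has something better
                break
        return value
```

On the toy tree `[[3, 5], [2, 9]]` (leaves are utilities), Max's value is 3, and the 9 leaf is never examined: once Min can force 2 in the right subtree, Max (already guaranteed 3 on the left) abandons it. That pruning is what buys the roughly doubled search depth.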

Branching factor, average game length, and game-tree complexity:

          Branching factor   Average game length   Game-tree complexity
Checkers  8                  70                    10^31
Chess     35                 123                   10^123
Go        250                150                   10^360

The game-tree complexity of Go is about 10^237 (roughly 2^787) times that of Chess. So even if Moore's law holds out forever, it would take about 700 years for Alpha-Beta to be competitive here.

Super-Human Performance? Checkers: 1994. Chess: 1997. Go: ?

“It may be a hundred years before a computer beats humans at Go—maybe even longer.” Dr. Piet Hut, 1997

“Although progress has been steady, it will take many decades of research and development before world-championship-caliber Go programs exist.” Jonathan Schaeffer, 2001

Some Concrete Numbers: Go. Exhaustively calculating the first eight moves would require computing 512 quintillion (5.12×10^20) possible combinations. As of March 2014, the most powerful supercomputer in the world, NUDT's "Tianhe-2", could sustain 33 petaflops. At that rate, even given an exceedingly low estimate of 10 operations required to assess the value of one play of a stone, Tianhe-2 would require 4 hours to assess all possible combinations of the next eight moves in order to make a single play.

We have good utility functions for Checkers, Chess, etc. What about Go? All utility functions tend to work better the deeper you are in the tree. Even if we could get a supercomputer, and we could wait four hours, we would find that the utility function returns basically the same value for all nodes on the horizon! So we have no information to make an informed choice. This is the key reason why Go is much harder than Chess.

Monte Carlo Tree Search: Intuition. Imagine the following: we are playing Go, and we need to choose between two moves; one leads to the root of the red subtree, the other to the root of the blue subtree. The best evaluation function basically says “I have no idea which is better, maybe flip a coin?” It happens to be true (but we have no way to know this) that: 90% of the terminal nodes in Red are wins for White (Max), and 50% of the terminal nodes in Blue are wins for White (Max). So, all things being equal, Red is a much better choice.

Suppose we started at the root of Red and randomly traversed down the tree to a terminal node. What is the probability that this terminal node is a win for White (Max)? It is 0.9. If we did the same for the Blue subtree, what is the probability that the terminal node is a win for White (Max)? It is 0.5.

Suppose we do this multiple times, let's say ten times. What is the expected number of wins for White (Max)?

Red: 9/10. Think of this as 90%: 90% of the random games that passed through this node were wins for Max. Likewise, Blue: 6/10 = 60%: 60% of the games that passed through this node were wins for Max. Note that the correct value is 50%, but our estimate of 60% is pretty close.

Monte-Carlo tree search (MCTS). Until we run out of time (let's say one minute), MCTS repeats four phases: descent, roll-out, update, and growth. Descent phase: expand the current node. Roll-out phase: play n random games from each leaf node (in my example, n = 3). Update phase: the statistics (number of wins) are passed up the tree. Growth phase: expand the most promising node.
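The four phases can be sketched as a toy program. This is an illustrative sketch, not AlphaGo's implementation: the game interface (`moves`, `is_win_for_max`) is hypothetical, only the root's children are tracked (so the growth phase of deepening the tree is omitted for brevity), and selection greedily follows the best current win rate rather than the UCB1 rule real MCTS implementations use.

```python
import random

class Node:
    """Per-move statistics: wins / visits, as in the slides' 2/3, 3/3, ..."""
    def __init__(self, state):
        self.state, self.wins, self.visits = state, 0, 0

def mcts(root_state, moves, is_win_for_max, n_iter=20, n_rollouts=3):
    # descent phase: expand the current (root) node
    children = [Node(s) for s in moves(root_state)]
    for _ in range(n_iter):
        # selection: pick the currently most promising child (greedy stand-in for UCB1;
        # the +1/+2 smoothing lets unvisited children be tried)
        node = max(children, key=lambda c: (c.wins + 1) / (c.visits + 2))
        for _ in range(n_rollouts):          # roll-out phase: n random games to the end
            s = node.state
            while moves(s):
                s = random.choice(moves(s))
            if is_win_for_max(s):            # update phase: pass the win count up
                node.wins += 1
            node.visits += 1
    best = max(children, key=lambda c: c.wins / c.visits if c.visits else 0.0)
    return best.state
```

Even without an evaluation function, the win/visit counts concentrate play on the move whose subtree wins random games most often, which is exactly the Red-vs-Blue intuition above.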

descent phase

roll-out phase. Random Game 1: win for White.

roll-out phase. Random Game 1: win for White. Random Game 2: win for Black.

roll-out phase. Random Game 3: win for White, so this leaf's statistics are 2/3. Think of 2/3 as 66.6%: 66.6% of the games that passed through this node were wins for Max.

roll-out phase (second leaf). Random Games 1, 2, and 3 are all wins for White, so this leaf's statistics are 3/3; the first leaf remains 2/3.

update phase. We can update the ancestor node(s): with leaves 2/3, 3/3, 1/3, and 1/3, the root becomes 7/12. So 7/12 of the games that passed through the root were wins for Max.

growth phase: expand the most promising node (root 7/12; leaves 2/3, 3/3, 1/3, 1/3).

roll-out phase: three random games from the new node give 1/3.

update phase: the new 1/3 is passed up the tree; its parent (previously 3/3) becomes 4/6, and the root becomes 8/15.

growth phase: expand the most promising node again.

roll-out phase: three random games from the newest node give 3/3.

update phase: the 3/3 is passed up; its parent (previously 2/3) becomes 5/6, and the root becomes 11/18.

Stop! We ran out of time; our one minute is up. Of our four options, one has the highest estimated probability of a win (5/6), so that is our move.

We wait a minute for Min to make his move. He chooses the bold circle. This now becomes the root of our new search tree, and we begin our one minute of MCTS again….

When will AI Go be superhuman? I found this chart of progress in 2013, so I extrapolated it (next slide)…. Zen19 ratings over time on the KGS Go servers. Data from KGS (2010, 2013a). Adapted from: Algorithmic Progress in Six Domains, Katja Grace. There are about 100 USA players at 5 dan or above.

When will AI Go be superhuman? It correctly predicted human equivalence around 2017. (The chart plots dan ratings 7–9 against the years 2014–2017, with the “Best Humans” level marked.) At the 2017 Future of Go Summit, AlphaGo beat Ke Jie, the world No. 1 ranked player at the time, in a three-game match.

Here is a puzzle I found on a t-shirt. The task is to go from start to finish, alternating the colors of the “orbs” you pass…

So this partial solution is not legal, since I have passed two blues in a row…

Here is a solution (I am not sure if it is unique)

To Think About: Assume we are to solve this with blind search. How would you represent the states and the operators? What is the branching factor? What is the diameter? Can you get it exactly, or at least upper and/or lower bounds? Which blind search algorithm would you use?

The states should be the intersections; it is only there that we have a choice. The choices are our operators. For each state we need to know the possible operators allowed. For example: from C, we can get to B, E, or F; from A, we can only get to B. (In the figure, the intersections are labeled A through P.)

We also need to know the parity for each operator; that is to say, we need to know the last color visited. For example, for node F: if C was the parent, the last color was Blue; if H was the parent, Blue; if M was the parent, Blue; if E was the parent, Red.

For node F: if C was the parent, the last color was Blue; if H was the parent, Blue; if M was the parent, Blue; if E was the parent, Red. Given the above, we can list the legal operators:

IF last color was Blue
    legal operators = { E }
ELSE
    legal operators = { C, M, H }
END
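The parity rule above amounts to making the state a pair (intersection, last color). A minimal sketch; the EDGE_COLOR table is hypothetical and covers only node F from the slide:

```python
# Sketch of the (intersection, last-color) state idea. The edge colors below
# are illustrative and cover only node F from the slide.
EDGE_COLOR = {   # color of the orb on the edge between two intersections
    ('F', 'C'): 'blue', ('F', 'H'): 'blue',
    ('F', 'M'): 'blue', ('F', 'E'): 'red',
}

def legal_moves(node, last_color):
    """Successors reachable over an edge whose orb differs from the last color."""
    moves = []
    for (a, b), color in EDGE_COLOR.items():
        if a == node and color != last_color:
            moves.append((b, color))   # new state: (neighbor, new last color)
    return moves
```

From F with last color Blue the only legal operator is E (over the red edge); with last color Red, the legal operators are C, M, and H, matching the IF/ELSE above.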

What can we say about the branching factor? There are some four-way intersections, but you can never go back (because the last color you saw would be seen again), so the branching factor is at most three for the four-way intersections, and there are four of them. There are some three-way intersections, but again you can never go back, so the branching factor is at most two for the three-way intersections, and there are twelve of them (really, at most twelve, because A has only one choice). So an estimate for the branching factor is b = (12/16 × 2) + (4/16 × 3) = 2.25. This is actually an upper bound.

What can we say about the depth? Let's do a lower bound: if I take away all the colors, then I can solve this in ten moves: A, B, C, F, H, I, M, L, N, O, P. So 10 is a lower bound. Let's do an upper bound: if we count the number of intersections, we might guess 16, thinking we never have to visit an intersection twice. However, with a little introspection, we can see that we might have to visit some intersections twice, each time from a different parent and each time going a different way. So an upper bound is d = 16 × 2 = 32.

Which algorithm should we use? How many nodes do we have to check? Assume we take one nanosecond to test each state that we pop off NODES. Worst case: b = 2.25 and d = 32, so 2.25^32 nanoseconds ≈ 3.1 minutes. Best case: b = 2.25 and d = 10, so 2.25^10 nanoseconds ≈ 3.3 microseconds. If the actual depth is halfway between best and worst case: d = 24, so 2.25^24 nanoseconds ≈ 0.28 seconds. It probably does not matter what algorithm we use; we are not likely to run out of space or time. Likewise, I would not bother to optimize my code. However, this assumes we are solving the puzzle once. The similar “Google Maps” problem needs to be solved millions of times a day, so for that problem we would optimize the code very carefully.
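A few lines reproduce these back-of-envelope estimates; the constants come straight from the slide (one nanosecond per state popped off NODES), nothing else is assumed:

```python
# Back-of-envelope node counts and times for b = 2.25 at three depths.
b = 2.25
for d, label in [(10, "best case"), (24, "in between"), (32, "worst case")]:
    seconds = (b ** d) * 1e-9          # one nanosecond per popped state
    print(f"{label}: b^{d} = {b ** d:.3g} nodes -> about {seconds:.2g} seconds")
```

The spread (microseconds to minutes) is why the choice of blind search algorithm barely matters for a one-off solve of this puzzle.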

SETTING SUN. The blanks are the spaces that pieces can slide into. Here is a depth-16 solution; I don’t know if it is optimal. Here is a video of a solution: https://www.youtube.com/watch?v=8diOixSxc0g