Published by Brittney Stewart, modified over 5 years ago
1
You already have these slides; this is just a repackaging in the order I intend to present them on Oct 14.
3
For 1% extra credit: who makes Van Halen’s drums?
5
Who do you think?
6
Project 1: Project_1_The_Eight_Puzzle.doc
7
Read the project handout.
This is not hard or complicated ;-) I can do this in 24 lines of Matlab. If you find yourself writing hundreds of lines of code, stop, think, and start again.
You need to write a report that has your findings. Here is a hint; your findings will be: For shallow problems, it does not matter much which heuristic you use, if any. As the problems get harder, heuristics help more and more. A good heuristic is better than a weak heuristic.
You need to write “a two- to five-page report which summarizes your findings”. I expect this report to be well written, coherent, and largely free of misspellings, typos, and poor grammar. I expect clean, well-thought-out figures and/or tables.
The basic story of the report is: for simple problems (puzzles that are only slightly “messed up”), having a heuristic does not make a difference. However, for harder puzzles, having a heuristic like Misplaced Tiles helps, and having a tight heuristic like Manhattan distance really helps.
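For orientation, here is a minimal Python sketch of the kind of search the project calls for. It is not the 24-line Matlab solution mentioned above (that is not shown here); the names `astar`, `h_uniform`, `h_misplaced` and `h_manhattan`, and the encoding of a board as a 9-tuple with 0 for the blank, are assumptions. With `h_uniform` (always 0), A* reduces to uniform cost search, which plays the role of the "no heuristic" case.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 marks the blank

def h_uniform(state):
    return 0  # no heuristic: A* degenerates to uniform cost search

def h_misplaced(state):
    # count tiles (not the blank) that are not in their goal position
    return sum(1 for i, t in enumerate(state) if t != 0 and t != GOAL[i])

def h_manhattan(state):
    # sum over tiles of |row - goal_row| + |col - goal_col|
    total = 0
    for i, t in enumerate(state):
        if t != 0:
            g = t - 1  # goal index of tile t
            total += abs(i // 3 - g // 3) + abs(i % 3 - g % 3)
    return total

def neighbors(state):
    b = state.index(0)
    r, c = divmod(b, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            s = list(state)
            n = nr * 3 + nc
            s[b], s[n] = s[n], s[b]  # slide the neighbouring tile into the blank
            yield tuple(s)

def astar(start, h):
    """Return (solution depth, nodes expanded), or (None, expanded) if unsolvable."""
    frontier = [(h(start), 0, start)]
    best_g = {start: 0}
    expanded = 0
    while frontier:
        f, g, state = heapq.heappop(frontier)
        if g > best_g[state]:
            continue  # stale queue entry, a cheaper path was found later
        if state == GOAL:
            return g, expanded
        expanded += 1
        for nxt in neighbors(state):
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None, expanded

easy = (1, 2, 0, 4, 5, 3, 7, 8, 6)
for name, h in (("uniform", h_uniform), ("misplaced", h_misplaced), ("manhattan", h_manhattan)):
    print(name, astar(easy, h))
```

Comparing the expanded-node counts of the three heuristics on progressively harder puzzles is the kind of data the report's figures or tables could be built from.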
8
I have given you a sample report.
It was created by a young lady in my class two years ago. It is not perfect, but it is very good.
9
Don’t cheat. If you copy a single line of text or a single line of code without proper attribution, I will fail you. I am better at catching cheaters than you are at cheating.
10
Test cases (* marks the blank)
Trivial
1 2 3
4 5 6
7 8 *
Very Easy
1 2 3
4 5 6
7 * 8
Easy
1 2 *
4 5 3
7 8 6
Doable
* 1 2
4 5 3
7 8 6
Oh Boy
8 7 1
6 * 2
5 4 3
IMPOSSIBLE: The following puzzle is impossible to solve; if you can solve it, you have a bug in your code.
1 2 3
4 5 6
8 7 *
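The IMPOSSIBLE case can be detected before any search, using the standard inversion-parity fact for the 3×3 puzzle (a well-known result, not something stated on the slides): on a board of odd width, a state can reach the goal exactly when its number of inversions (pairs of tiles out of order, ignoring the blank) is even. A minimal sketch, with 0 standing for the *:

```python
def inversions(state):
    tiles = [t for t in state if t != 0]  # ignore the blank
    return sum(1 for i in range(len(tiles))
               for j in range(i + 1, len(tiles)) if tiles[i] > tiles[j])

def solvable(state):
    # 3x3 board, goal 1..8 followed by the blank: solvable iff inversions are even
    return inversions(state) % 2 == 0

TEST_CASES = {
    "Trivial":    (1, 2, 3, 4, 5, 6, 7, 8, 0),
    "Easy":       (1, 2, 0, 4, 5, 3, 7, 8, 6),
    "Oh Boy":     (8, 7, 1, 6, 0, 2, 5, 4, 3),
    "IMPOSSIBLE": (1, 2, 3, 4, 5, 6, 8, 7, 0),
}
for name, s in TEST_CASES.items():
    print(name, inversions(s), solvable(s))
```

Swapping the 8 and the 7 creates exactly one inversion, which is why no sequence of moves can fix that puzzle.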
12
Adversarial Search: Revisited (game-playing search)
So far, our search has assumed that we are the only intelligent entity and that we have explicit control over the “world”. Let us consider what happens when we relax those assumptions.
13
Example Utility Functions II
Chess I
Assume Max is “White”. Assume each piece has the following values: pawn = 1; knight = 3; bishop = 3; rook = 5; queen = 9.
Let w = sum of the values of the white pieces.
Let b = sum of the values of the black pieces.
$e(n) = \frac{w - b}{w + b}$
Note that this value ranges between -1 and 1.
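A direct transcription of $e(n)$ into Python, as a sketch: the encoding of pieces as letters ("P", "N", "B", "R", "Q") is an assumption, kings are not counted (the slide gives them no value), and returning 0 when w + b = 0 is an added convention to avoid dividing by zero.

```python
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}  # kings are not scored

def evaluate(white_pieces, black_pieces):
    """e(n) = (w - b) / (w + b), from White's (Max's) point of view."""
    w = sum(PIECE_VALUES[p] for p in white_pieces)
    b = sum(PIECE_VALUES[p] for p in black_pieces)
    if w + b == 0:
        return 0.0  # only kings left: call it even
    return (w - b) / (w + b)

full_set = "PPPPPPPP" + "NN" + "BB" + "RR" + "Q"        # 39 points of material
print(evaluate(full_set, full_set))                     # 0.0 at the start
print(evaluate(full_set, full_set.replace("Q", "")))    # White up a queen: 9/69
```

Dividing by w + b means the same material lead counts for more as the board empties: a queen up is about 0.13 at the start but 1.0 when it is the only piece left.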
14
Search the game tree as deep as you can in the given time.
Depth-limited Minimax search: Search the game tree as deep as you can in the given time. Evaluate the fringe nodes with the utility function. Back up the values to the root. Choose the best move, and repeat. Better still, replace Minimax with alpha-beta to go a little deeper (about twice as deep, with good move ordering). The cycle is: search to cutoff, make best move, wait for reply; after the reply, search to cutoff, make best move, wait for reply…
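A compact Python sketch of depth-limited Minimax with alpha-beta pruning. Assumptions: the game tree is given as nested lists whose leaves are terminal utilities, and `static_eval` is a hypothetical stand-in for the utility function applied at the cutoff. A real player would wrap this in iterative deepening to search "as deep as you can in the given time"; that loop is not shown. The example is the standard three-by-three textbook tree, where alpha-beta finds the Minimax value 3 while evaluating only 7 of the 9 leaves.

```python
import math

def static_eval(node):
    # Hypothetical utility function: exact at leaves, "no idea" (0) at a cut-off internal node
    return node if not isinstance(node, list) else 0

def alphabeta(node, depth, alpha, beta, maximizing, evaluate, stats):
    # Fringe: a leaf, or the depth cutoff, so apply the utility function.
    if not isinstance(node, list) or depth == 0:
        stats["evaluated"] += 1
        return evaluate(node)
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, evaluate, stats))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # Min will never allow this line: prune the remaining children
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, evaluate, stats))
        beta = min(beta, value)
        if alpha >= beta:
            break  # Max already has something better elsewhere: prune
    return value

def best_move(root, depth, evaluate):
    """Search from Max's root to `depth` plies; return (best child index, its value, leaves evaluated)."""
    stats = {"evaluated": 0}
    best_i, best_v, alpha = None, -math.inf, -math.inf
    for i, child in enumerate(root):
        v = alphabeta(child, depth - 1, alpha, math.inf, False, evaluate, stats)
        if v > best_v:
            best_i, best_v = i, v
        alpha = max(alpha, v)
    return best_i, best_v, stats["evaluated"]

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(best_move(tree, 2, static_eval))  # full depth: move 0, value 3, 7 leaves evaluated
print(best_move(tree, 1, static_eval))  # cut off after one ply: every fringe node looks like 0
```

The second call illustrates the problem raised later for Go: when the utility function returns the same value for every fringe node, the search has nothing to choose between.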
15
Branching factor, average game length and game-tree complexity
The game-tree complexity of Go is about $2^{787}$ (roughly $10^{237}$) times that of Chess. So even if Moore's law holds out forever, it would take about 700 years for alpha-beta to be competitive here.
                       Checkers    Chess        Go
Branching factor       8           35           250
Average game length    70          123          150
Game-tree complexity   $10^{31}$   $10^{123}$   $10^{360}$
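A quick check of the ratio, plus one possible reading of the 700-year figure. That reading is an assumption, not stated on the slide: alpha-beta with good move ordering examines roughly the square root of the tree (which halves the number of speed doublings needed), and computing speed doubles every 1.5 to 2 years.

```python
import math

# Exponents of the game-tree complexities from the table above.
GO_EXP, CHESS_EXP = 360, 123

ratio_exp10 = GO_EXP - CHESS_EXP            # Go / Chess = 10**237
doublings = ratio_exp10 * math.log2(10)     # the same ratio written as 2**doublings

# Assumption: alpha-beta roughly square-roots the tree, halving the doublings needed.
needed = doublings / 2
years_fast = needed * 1.5   # if speed doubles every 18 months
years_slow = needed * 2.0   # if speed doubles every 2 years

print(ratio_exp10, round(doublings))          # 237 787
print(round(years_fast), round(years_slow))   # 590 787
```

Under these assumptions the estimate lands between roughly 590 and 790 years, consistent with "about 700".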
16
Superhuman Performance
“Although progress has been steady, it will take many decades of research and development before world-championship-caliber go programs exist.” Jonathan Schaeffer, 2001
“It may be a hundred years before a computer beats humans at Go, maybe even longer.” Dr. Piet Hut, 1997
[Figure: superhuman performance reached in Checkers (1994) and Chess (1997); Go marked “?”]
17
Some Concrete Numbers
Exhaustively calculating the first eight moves would require computing 512 quintillion ($5.12 \times 10^{20}$) possible combinations. As of March 2014, the most powerful supercomputer in the world, NUDT's "Tianhe-2", could sustain 33 petaflops. At that rate, even given an exceedingly low estimate of 10 operations required to assess the value of one play of a stone, Tianhe-2 would require about 43 hours (over 4 hours even at a single operation per play) to assess all possible combinations of the next eight moves in order to make a single play.
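The arithmetic behind the Tianhe-2 estimate, using only the numbers quoted above:

```python
COMBINATIONS = 5.12e20   # possible combinations of the first eight moves
OPS_PER_PLAY = 10        # the "exceedingly low" estimate of operations per play
FLOPS = 33e15            # Tianhe-2: 33 petaflops sustained

hours = COMBINATIONS * OPS_PER_PLAY / FLOPS / 3600
hours_one_op = COMBINATIONS / FLOPS / 3600   # if each play cost a single operation

print(f"{hours:.1f} h at 10 ops/play, {hours_one_op:.1f} h at 1 op/play")
# 43.1 h at 10 ops/play, 4.3 h at 1 op/play
```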
18
We have good utility functions for Checkers, Chess etc. What about Go?
All utility functions tend to work better the deeper you are in the tree. Even if we had a supercomputer and could wait that long, we would find that the utility function returns basically the same value for every node on the horizon! So we have no information on which to base an informed choice. This is the key reason why Go is much harder than chess.
19
Monte Carlo Tree Search Intuition
Imagine the following: we are playing Go, and we need to choose between two moves. One is the move to the root of the red subtree; the other is the move to the root of the blue subtree. The best evaluation function basically says “I have no idea which is better; maybe flip a coin?” It happens to be true (but we have no way to know this) that 90% of the terminal nodes in Red are wins for White (Max), and 50% of the terminal nodes in Blue are wins for White (Max). So, all things being equal, Red is a much better choice.
20
Suppose we start at the root of Red and randomly traverse down the tree to a terminal node. What is the probability that this terminal node is a win for White (Max)? It is 0.9 (assuming each terminal node is equally likely to be reached). If we do the same for the Blue subtree, what is the probability that the terminal node is a win for White (Max)? It is 0.5.
21
Suppose I do this multiple times, let's say ten times: what is the expected number of wins for White (Max)?
23
Think of 9/10 as 90%: 90% of the games that pass through the Red node are wins for Max.
Likewise, think of 6/10 as 60%: 60% of the games that pass through the Blue node are wins for Max. Note that the correct value for Blue is 50%, but our estimate of 60% is pretty close.
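To see why a handful of random games gives a usable (if noisy) estimate, here is a small Python simulation. It is a modelling assumption, not the slides' tree: each subtree is reduced to the fraction of its terminal nodes that are wins for Max (0.9 for Red, 0.5 for Blue), and each random descent is treated as drawing one terminal node at random.

```python
import random

def estimate_win_rate(p_win, n_games, rng):
    """Play n_games random descents into a subtree whose terminal nodes are
    wins for Max with probability p_win; return (wins, games)."""
    wins = sum(1 for _ in range(n_games) if rng.random() < p_win)
    return wins, n_games

rng = random.Random(14)  # fixed seed so the run is repeatable
for name, p in (("Red", 0.9), ("Blue", 0.5)):
    for n in (10, 10_000):
        wins, games = estimate_win_rate(p, n, rng)
        print(f"{name}: {wins}/{games} = {wins / games:.2f}")
```

With ten games, an estimate can easily be off by 0.1 or more (as Blue's 6/10 was); the error shrinks roughly as $1/\sqrt{n}$, so ten thousand games land very close to 0.9 and 0.5.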
24
Monte-Carlo tree search (MCTS)
Until we run out of time (let's say one minute), MCTS repeats four phases: descent, roll-out, update, and growth.
Descent phase: expand the current node.
Roll-out phase: play n random games from each leaf node (in my example, n = 3).
Update phase: the statistics (number of wins) are passed up the tree.
Growth phase: expand the most promising node.
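A minimal Python sketch of this loop, under several assumptions that are not from the slides: the toy game is single-pile Nim (take 1 to 3 stones; whoever takes the last stone wins), a fixed number of iterations stands in for the one-minute budget, each iteration grows one child at a time, and "most promising" is decided with the standard UCB1 rule (swapped in here; the slides do not name a rule). As on the slides, every node stores wins for Max out of games played; Min nodes prefer children that are bad for Max, and the final move is the root child with the highest estimated win rate.

```python
import math
import random

class Node:
    def __init__(self, pile, max_to_move, parent=None, move=None):
        self.pile, self.max_to_move = pile, max_to_move
        self.parent, self.move = parent, move
        self.children = []
        self.wins = 0     # wins for Max among games that passed through this node
        self.visits = 0   # games that passed through this node

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2, 3) if m <= self.pile and m not in tried]

def rollout(pile, max_to_move, rng):
    """Play uniformly random moves to the end; return 1 if Max wins, else 0."""
    if pile == 0:  # the previous player took the last stone
        return 0 if max_to_move else 1
    while True:
        pile -= rng.randint(1, min(3, pile))
        if pile == 0:
            return 1 if max_to_move else 0
        max_to_move = not max_to_move

def ucb_child(node, c=1.4):
    def score(child):
        rate = child.wins / child.visits
        if not node.max_to_move:
            rate = 1 - rate  # Min wants children that are bad for Max
        return rate + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children, key=score)

def mcts(pile, iterations=2000, n_rollouts=3, seed=0):
    rng = random.Random(seed)
    root = Node(pile, max_to_move=True)
    for _ in range(iterations):
        # Descent: follow UCB1 through fully expanded, non-terminal nodes.
        node = root
        while node.pile > 0 and not node.untried_moves():
            node = ucb_child(node)
        # Growth: add one untried child (terminal nodes cannot grow).
        if node.pile > 0:
            move = rng.choice(node.untried_moves())
            child = Node(node.pile - move, not node.max_to_move, node, move)
            node.children.append(child)
            node = child
        # Roll-out: n random games from the new leaf.
        wins = sum(rollout(node.pile, node.max_to_move, rng) for _ in range(n_rollouts))
        # Update: pass the win/game counts up to the root.
        while node is not None:
            node.wins += wins
            node.visits += n_rollouts
            node = node.parent
    best = max(root.children, key=lambda ch: ch.wins / ch.visits)
    return best.move, [(ch.move, ch.wins, ch.visits) for ch in root.children]

print(mcts(5))
```

With five stones, the search should settle on taking one stone (leaving a multiple of four is the known winning strategy for this game); the list shows (move, wins for Max, games) for each root child, in the same spirit as the 2/3, 3/3, 1/3 labels on the slides.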
25
Descent phase: expand the current node (the root).
26
Roll-out phase: Random game 1: win for White.
27
Roll-out phase: Random game 1: win for White. Random game 2: win for Black.
28
Roll-out phase: Random game 1: win for White. Random game 2: win for Black. Random game 3: win for White. Think of 2/3 as 66.7%: 66.7% of the games that pass through this node are wins for Max.
29
Roll-out phase (second leaf): Random games 1, 2 and 3 are all wins for White, so this leaf scores 3/3 (the first leaf remains at 2/3).
30
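Update phase: after the remaining two leaves are rolled out (1/3 each), the four leaves score 2/3, 3/3, 1/3 and 1/3. We can now update the ancestor node(s): (2 + 3 + 1 + 1)/(3 + 3 + 3 + 3) = 7/12.
So 7/12 of the games that pass through the root are wins for Max.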
31
Growth phase: expand the most promising node (the 3/3 leaf). Tree so far: root 7/12; children 2/3, 3/3, 1/3, 1/3.
32
Roll-out phase: the new leaf under the 3/3 node scores 1/3.
33
Update phase: the 3/3 node becomes (3 + 1)/(3 + 3) = 4/6, and the root becomes (7 + 1)/(12 + 3) = 8/15.
34
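Growth phase: expand the next most promising node (the 2/3 child, which ties with 4/6). Tree so far: root 8/15; children 2/3, 4/6, 1/3, 1/3.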
35
Roll-out phase: the new leaf under the 2/3 node scores 3/3.
36
Update phase: the 2/3 node becomes (2 + 3)/(3 + 3) = 5/6, and the root becomes (8 + 3)/(15 + 3) = 11/18.
37
Stop! We have run out of time; our one minute is up. The tree is now: root 11/18; children 5/6, 4/6, 1/3, 1/3. Of our four options, one has the highest estimated probability of a win (5/6), so that is our move.
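The numbers in this worked example can be checked with a few lines of Python. Each node here is just a hypothetical `[wins for Max, games]` pair, and `update` adds one roll-out's results to every node on the path back to the root, which is what the update phase does.

```python
root = [0, 0]                          # [wins for Max, games played through the node]
children = [[0, 0] for _ in range(4)]  # the four options under the root

def update(path, wins, games):
    """Add roll-out results to every node on the path from a leaf up to the root."""
    for node in path:
        node[0] += wins
        node[1] += games

# First roll-outs: n = 3 games from each of the four children.
for child, wins in zip(children, (2, 3, 1, 1)):
    update([child, root], wins, 3)
print(root)                    # [7, 12]

# Growth under the 3/3 child; its new leaf scores 1/3.
update([[0, 0], children[1], root], 1, 3)
print(children[1], root)       # [4, 6] [8, 15]

# Growth under the 2/3 child; its new leaf scores 3/3.
update([[0, 0], children[0], root], 3, 3)
print(children[0], root)       # [5, 6] [11, 18]

# Out of time: choose the option with the highest estimated win rate.
best = max(range(4), key=lambda i: children[i][0] / children[i][1])
print(best, children[best])    # 0 [5, 6]
```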
38
We wait a minute for Min to make his move. He chooses the bold circle. This now becomes the root of our new search tree, and we begin our one minute of MCTS again.
39
When will AI Go be superhuman?
I found this chart of progress from 2013, so I extrapolated it (next slide). [Figure: Zen19 ratings over time on KGS Go servers. Data from KGS (2010, 2013a). Adapted from Katja Grace, Algorithmic Progress in Six Domains.] There are about 100 USA players at 5 dan or above.
40
When will AI Go be superhuman?
It correctly predicted human equivalence around 2017. [Figure: extrapolated rating (7 to 9 dan axis) by year, 2014 to 2017, with the level of the best humans marked.] At the 2017 Future of Go Summit, AlphaGo beat Ke Jie, the world No. 1 ranked player at the time, in a three-game match.
41
Here is a puzzle I found on a t-shirt
The task is to go from start to finish, alternating the colors of the “orbs” you pass.
42
So this partial solution is not legal, since I have passed two blues in a row.
43
Here is a solution (I am not sure if it is unique)
44
To think about: assume we are to solve this with blind search.
How would you represent the states and the operators? What is the branching factor? What is the diameter? Can you get it exactly, or at least upper and/or lower bounds? Which blind search algorithm would you use?
45
The states should be the intersections
The states should be the intersections, since it is only there that we have a choice. Here, the choices are the operators. For each state, we need to know the possible operators allowed. For example: from C, we can get to B, E or F; from A, we can get to B.
46
We also need to know the parity for each operator.
That is to say, we need to know the last color visited. For example, for node F: if C was the parent, the last color was Blue; if H was the parent, the last color was Blue; if M was the parent, the last color was Blue; if E was the parent, the last color was Red.
47
For node F, given the parent-to-color list above, we can list the legal operators:
IF last color was Blue
    legal operators = { E }
ELSE
    legal operators = { C, M, H }
END
48
[Figure: the puzzle map, from Start (A) to Goal! (P)]
What can we say about the branching factor? There are some four-way intersections, but you can never go back (because you would see the same color as the last one again). So the branching factor is at most three for the four-way intersections, and there are four of them. There are some three-way intersections, but again you can never go back. So the branching factor is at most two for the three-way intersections, and there are twelve of them¹. So an estimate for the branching factor is $b = \frac{12}{16} \cdot 2 + \frac{4}{16} \cdot 3 = 2.25$. This is actually an upper bound.
¹ Really, this should say at most twelve, because A has only one choice.
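The weighted-average estimate of the branching factor, checked with exact fractions in Python (the counts of twelve three-way and four four-way intersections come from the slide):

```python
from fractions import Fraction

three_way, four_way = 12, 4
n = three_way + four_way  # 16 intersections in total

# At most 2 choices at a three-way intersection and 3 at a four-way one (never go back).
b = Fraction(three_way, n) * 2 + Fraction(four_way, n) * 3
print(b, float(b))  # 9/4 2.25
```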
49
What can we say about the branching factor? There are some four-way intersections, but you can never go back (because you would see the same color as the last one again). So the branching factor is at most three. That is a good enough estimate.
What can we say about the depth? Let's do a lower bound. If I take away all the colors, then I can solve this with ten moves: A, B, C, F, H, I, M, L, N, O, P. So 10 is a lower bound.
Let's do an upper bound. If we count the number of intersections, we might guess 16. However, with a little introspection, we can see that we might have to visit some intersections twice, each time from a different parent and each time going a different way (but never more than twice, since there are only two possible last colors). So an upper bound is $d = 16 \times 2 = 32$.
50
Which algorithm should we use?
How many nodes do we have to check? Assume we take one nanosecond to test each state that we pop off NODES.
Assume the worst case: we have b = 2.25 and d = 32, so $2.25^{32}$ nanoseconds ≈ 3.1 minutes.
Assume the best case: we have b = 2.25 and d = 10, so $2.25^{10}$ nanoseconds ≈ 3.3 microseconds.
Assume the actual depth is somewhere in between, say d = 24: then $2.25^{24}$ nanoseconds ≈ 0.28 seconds.
It probably does not matter which algorithm we use; we are not likely to run out of space or time. Likewise, I would not bother to optimize my code. However, this assumes we are solving the puzzle once. The similar “Google Maps” problem needs to be solved millions of times a day, so for that problem, we would optimize the code very carefully.
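The three timing estimates, recomputed in Python under the slide's assumptions of one nanosecond per node popped off NODES and roughly $b^d$ nodes examined:

```python
B = 2.25
SECONDS_PER_NODE = 1e-9  # one nanosecond per node popped off NODES

times = {}
for label, d in (("best case", 10), ("in between", 24), ("worst case", 32)):
    nodes = B ** d
    times[d] = nodes * SECONDS_PER_NODE
    print(f"{label}: d = {d}, about {nodes:.3g} nodes, {times[d]:.3g} s")
# best case: 3.33e-06 s; in between: 0.283 s; worst case: 186 s (about 3.1 minutes)
```

Each extra level of depth multiplies the work by 2.25, which is why the 22 levels between the bounds span almost eight orders of magnitude.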
51
SETTING SUN. Blanks that pieces can slide into. Here is a depth-16 solution; I don’t know if it is optimal. Here is a video of a solution.