
1 Adversarial Search. This slide deck courtesy of Dan Klein at UC Berkeley.

2 Game Playing  Many different kinds of games!  Axes:  Deterministic or stochastic?  One, two, or more players?  Perfect information (can you see the state)?  Turn taking or simultaneous action?  Want algorithms for calculating a strategy (policy) which recommends a move in each state

3 Pruning for Minimax

4 Pruning in Minimax Search [minimax game-tree figure omitted]

5 Alpha-Beta Pruning  General configuration  We’re computing the MIN-VALUE at n  We’re looping over n’s children  n’s value estimate is dropping  α is the best value that MAX can get at any choice point along the current path  If n becomes worse than α, MAX will avoid it, so can stop considering n’s other children  Define β similarly for MIN [layer-labeled tree figure omitted]

6 Alpha-Beta Pseudocode [pseudocode figure omitted; a sketch follows]
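A minimal sketch of the α-β recursion described on the previous slide, written against the same assumed helpers as the expectimax pseudocode on slide 18 (successors and evaluation; is_terminal is an additional assumed helper):

    def max_value(s, alpha, beta):
        if is_terminal(s):
            return evaluation(s)
        v = float('-inf')
        for sp in successors(s):
            v = max(v, min_value(sp, alpha, beta))
            if v >= beta:            # MIN above would never allow this node
                return v             # prune the remaining children
            alpha = max(alpha, v)
        return v

    def min_value(s, alpha, beta):
        if is_terminal(s):
            return evaluation(s)
        v = float('inf')
        for sp in successors(s):
            v = min(v, max_value(sp, alpha, beta))
            if v <= alpha:           # MAX above already has a better option
                return v             # prune the remaining children
            beta = min(beta, v)
        return v

    # Root call: max_value(start_state, float('-inf'), float('inf'))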

7 Alpha-Beta Pruning Example [worked game-tree figure omitted; its annotations included bounds such as ≥8, ≤2, ≤1]  α is MAX’s best alternative here or above  β is MIN’s best alternative here or above

8 Alpha-Beta Pruning Properties  This pruning has no effect on the final result at the root  Values of intermediate nodes might be wrong!  Good child ordering improves effectiveness of pruning  With “perfect ordering”:  Time complexity drops to O(b^(m/2))  Doubles solvable depth!  Full search of, e.g., chess, is still hopeless…  This is a simple example of metareasoning (computing about what to compute)

9 Expectimax Search Trees  What if we don’t know what the result of an action will be? E.g.,  In solitaire, next card is unknown  In minesweeper, mine locations  In pacman, the ghosts act randomly  Can do expectimax search  Chance nodes, like min nodes, except the outcome is uncertain  Calculate expected utilities  Max nodes as in minimax search  Chance nodes take average (expectation) of value of children  Later, we’ll learn how to formalize the underlying problem as a Markov Decision Process [max/chance tree figure omitted]

10 Maximum Expected Utility  Why should we average utilities? Why not minimax?  Principle of maximum expected utility: an agent should choose the action which maximizes its expected utility, given its knowledge  General principle for decision making  Often taken as the definition of rationality  We’ll see this idea over and over in this course!  Let’s decompress this definition…

11 Reminder: Probabilities  A random variable represents an event whose outcome is unknown  A probability distribution is an assignment of weights to outcomes  Example: traffic on freeway?  Random variable: T = whether there’s traffic  Outcomes: T in {none, light, heavy}  Distribution: P(T=none) = 0.25, P(T=light) = 0.55, P(T=heavy) = 0.20  Some laws of probability (more later):  Probabilities are always non-negative  Probabilities over all possible outcomes sum to one  As we get more evidence, probabilities may change:  P(T=heavy) = 0.20, P(T=heavy | Hour=8am) = 0.60  We’ll talk about methods for reasoning and updating probabilities later

12 What are Probabilities?  Objectivist / frequentist answer:  Averages over repeated experiments  E.g. empirically estimating P(rain) from historical observation  Assertion about how future experiments will go (in the limit)  New evidence changes the reference class  Makes one think of inherently random events, like rolling dice  Subjectivist / Bayesian answer:  Degrees of belief about unobserved variables  E.g. an agent’s belief that it’s raining, given the temperature  E.g. pacman’s belief that the ghost will turn left, given the state  Often learn probabilities from past experiences (more later)  New evidence updates beliefs (more later)

13 Uncertainty Everywhere  Not just for games of chance!  I’m sick: will I sneeze this minute?  Email contains “FREE!”: is it spam?  Tooth hurts: have cavity?  60 min enough to get to the airport?  Robot rotated wheel three times, how far did it advance?  Safe to cross street? (Look both ways!)  Sources of uncertainty in random variables:  Inherently random process (dice, etc)  Insufficient or weak evidence  Ignorance of underlying processes  Unmodeled variables  The world’s just noisy – it doesn’t behave according to plan!  Compare to fuzzy logic, which has degrees of truth, rather than just degrees of belief

14 Reminder: Expectations  We can define a function f(X) of a random variable X  The expected value of a function is its average value, weighted by the probability distribution over inputs  Example: How long to get to the airport?  Length of driving time as a function of traffic: L(none) = 20, L(light) = 30, L(heavy) = 60  What is my expected driving time?  Notation: E[ L(T) ]  Here, P(T) = {none: 0.25, light: 0.5, heavy: 0.25}  E[ L(T) ] = L(none) * P(none) + L(light) * P(light) + L(heavy) * P(heavy)  E[ L(T) ] = (20 * 0.25) + (30 * 0.5) + (60 * 0.25) = 35

15 Expectations  Real valued functions of random variables:  Expectation of a function of a random variable  Example: Expected value of a fair die roll: X ∈ {1, 2, 3, 4, 5, 6}, P(X = x) = 1/6 for each face, f(X) = X, so E[f(X)] = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5
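A short sketch computing the two expectations above (the values and probabilities are exactly those given on the slides):

    def expectation(values, weights):
        # Weighted average: sum of value * probability over the outcomes
        return sum(v * w for v, w in zip(values, weights))

    # Driving time with P(T) = {none: 0.25, light: 0.5, heavy: 0.25}
    print(expectation([20, 30, 60], [0.25, 0.5, 0.25]))   # 35.0

    # Fair die roll with f(X) = X and P(X = x) = 1/6
    print(expectation([1, 2, 3, 4, 5, 6], [1/6] * 6))     # 3.5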

16 Utilities  Utilities are functions from outcomes (states of the world) to real numbers that describe an agent’s preferences  Where do utilities come from?  In a game, may be simple (+1/-1)  Utilities summarize the agent’s goals  Theorem: any set of preferences between outcomes can be summarized as a utility function (provided the preferences meet certain conditions)  In general, we hard-wire utilities and let actions emerge (why don’t we let agents decide their own utilities?)  More on utilities soon…

17 Expectimax Search  In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state  Model could be a simple uniform distribution (roll a die)  Model could be sophisticated and require a great deal of computation  We have a node for every outcome out of our control: opponent or environment  The model might say that adversarial actions are likely!  For now, assume for any state we magically have a distribution to assign probabilities to opponent actions / environment outcomes  Having a probabilistic belief about an agent’s action does not mean that agent is flipping any coins!

18 Expectimax Pseudocode

    def value(s):
        if s is a max node: return maxValue(s)
        if s is an exp node: return expValue(s)
        if s is a terminal node: return evaluation(s)

    def maxValue(s):
        values = [value(s') for s' in successors(s)]
        return max(values)

    def expValue(s):
        values = [value(s') for s' in successors(s)]
        weights = [probability(s, s') for s' in successors(s)]
        return expectation(values, weights)

[tree figure omitted]
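For concreteness, here is a self-contained runnable variant over a tiny hand-built tree (the node encoding and the leaf values 8, 4, 5, 6 are illustrative assumptions, not taken from the slide's figure):

    # Node encoding (illustrative): a bare number is a terminal utility;
    # ('max', [child, ...]) is a max node;
    # ('exp', [(prob, child), ...]) is a chance node.
    tree = ('max', [
        ('exp', [(0.5, 8), (0.5, 4)]),   # expected value 6.0
        ('exp', [(0.5, 5), (0.5, 6)]),   # expected value 5.5
    ])

    def value(node):
        if isinstance(node, (int, float)):   # terminal: return its utility
            return node
        kind, children = node
        if kind == 'max':
            return max(value(c) for c in children)
        if kind == 'exp':                    # average children by probability
            return sum(p * value(c) for p, c in children)

    print(value(tree))   # 6.0: max prefers the left chance node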

19 Expectimax for Pacman  Notice that we’ve gotten away from thinking that the ghosts are trying to minimize pacman’s score  Instead, they are now a part of the environment  Pacman has a belief (distribution) over how they will act  Quiz: Can we see minimax as a special case of expectimax?  Quiz: what would pacman’s computation look like if we assumed that the ghosts were doing 1-ply minimax and taking the result 80% of the time, otherwise moving randomly?  If you take this further, you end up calculating belief distributions over your opponents’ belief distributions over your belief distributions, etc…  Can get unmanageable very quickly!

20 Expectimax for Pacman  Results from playing 5 games (Pacman used depth 4 search with an eval function that avoids trouble; the ghost used depth 2 search with an eval function that seeks Pacman):

                          Minimizing Ghost             Random Ghost
    Minimax Pacman        Won 5/5, Avg. Score: 493     Won 5/5, Avg. Score: 483
    Expectimax Pacman     Won 1/5, Avg. Score: -303    Won 5/5, Avg. Score: 503

21 Expectimax Pruning?

22 Expectimax Evaluation  For minimax search, the evaluation function scale doesn’t matter  We just want better states to have higher evaluations (get the ordering right)  We call this property insensitivity to monotonic transformations  For expectimax, we need the magnitudes to be meaningful as well  E.g., must know whether a 50% / 50% lottery between A and B is better than a 100% chance of C  “100 or -10 vs 0” is different from “10 or -100 vs 0”
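To see why, take the slide's own numbers: a 50/50 lottery over 100 and -10 has expected value 0.5 * 100 + 0.5 * (-10) = 45, so it beats a certain 0; a 50/50 lottery over 10 and -100 has expected value 0.5 * 10 + 0.5 * (-100) = -45, so the certain 0 wins. Both lotteries order the outcomes identically, yet the decisions differ, which is why magnitudes, not just ordering, must be meaningful.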

23 Mixed Layer Types  E.g. Backgammon  Expectiminimax  Environment is an extra player that moves after each agent  Chance nodes take expectations, otherwise like minimax
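A sketch of how the chance layer slots in between the minimax layers, reusing the illustrative node encoding from the expectimax example above (backgammon specifics are not modeled):

    def expectiminimax(node):
        if isinstance(node, (int, float)):   # terminal utility
            return node
        kind, children = node
        if kind == 'max':
            return max(expectiminimax(c) for c in children)
        if kind == 'min':
            return min(expectiminimax(c) for c in children)
        if kind == 'exp':                    # dice layer between agent moves
            return sum(p * expectiminimax(c) for p, c in children)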

24 Stochastic Two-Player  Dice rolls increase b: 21 possible rolls with 2 dice  Backgammon: about 20 legal moves  Depth 4 = 20 × (21 × 20)^3 ≈ 1.2 × 10^9  As depth increases, probability of reaching a given node shrinks  So value of lookahead is diminished  So limiting depth is less damaging  But pruning is less possible…  TD-Gammon uses depth-2 search + very good eval function + reinforcement learning: world-champion level play

25 Non-Zero-Sum Games  Similar to minimax:  Utilities are now tuples  Each player maximizes their own entry at each node  Propagate (or back up) nodes from children  Can give rise to cooperation and competition dynamically… [game-tree figure omitted; leaf utility triples included (1,2,6), (4,3,2), (6,1,2), (7,4,1), (5,1,1), (1,5,2), (7,7,1), (5,4,5)]

26 Iterative Deepening  Iterative deepening uses DFS as a subroutine:
1. Do a DFS which only searches for paths of length 1 or less. (DFS gives up on any path of length 2.)
2. If “1” failed, do a DFS which only searches paths of length 2 or less.
3. If “2” failed, do a DFS which only searches paths of length 3 or less.
…and so on.
Why do we want to do this for multiplayer games? (See the sketch below.)  Note: wrongness of eval functions matters less and less the deeper the search goes!
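A minimal sketch of the loop just described (successors and is_goal are assumed problem callbacks; max_depth is an illustrative safety cutoff):

    def depth_limited_dfs(node, limit, successors, is_goal):
        # DFS that gives up on any path longer than `limit` edges
        if is_goal(node):
            return [node]
        if limit == 0:
            return None
        for child in successors(node):
            path = depth_limited_dfs(child, limit - 1, successors, is_goal)
            if path is not None:
                return [node] + path
        return None

    def iterative_deepening(start, successors, is_goal, max_depth=50):
        for limit in range(max_depth + 1):   # try length 0, 1, 2, ...
            path = depth_limited_dfs(start, limit, successors, is_goal)
            if path is not None:
                return path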

28 What is Search For?  Models of the world: single agents, deterministic actions, fully observed state, discrete state space  Planning: sequences of actions  The path to the goal is the important thing  Paths have various costs, depths  Heuristics to guide, fringe to keep backups  Identification: assignments to variables  The goal itself is important, not the path  All paths at the same depth (for some formulations)  CSPs are specialized for identification problems

29 Constraint Satisfaction Problems  Standard search problems:  State is a “black box”: arbitrary data structure  Goal test: any function over states  Successor function can be anything  Constraint satisfaction problems (CSPs):  A special subset of search problems  State is defined by variables X_i with values from a domain D (sometimes D depends on i)  Goal test is a set of constraints specifying allowable combinations of values for subsets of variables  Simple example of a formal representation language  Allows useful general-purpose algorithms with more power than standard search algorithms

30 Example: N-Queens  Formulation 1:  Variables: X_ij, one per board square  Domains: {0, 1} (1 iff a queen occupies square (i, j))  Constraints: no two queens share a row, column, or diagonal, and exactly N queens are placed in total

31 Example: N-Queens  Formulation 2:  Variables: Q_k, one per row (the column of the queen in row k)  Domains: {1, 2, …, N}  Constraints:  Implicit: for all i ≠ j, non-threatening(Q_i, Q_j)  -or-  Explicit: (Q_i, Q_j) ∈ {the allowed value pairs, enumerated exhaustively}

32 Example: Map-Coloring  Variables: WA, NT, Q, NSW, V, SA, T  Domain: D = {red, green, blue}  Constraints: adjacent regions must have different colors, e.g., WA ≠ NT  Solutions are assignments satisfying all constraints, e.g.: {WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = green}

33 Constraint Graphs  Binary CSP: each constraint relates (at most) two variables  Binary constraint graph: nodes are variables, arcs show constraints  General-purpose CSP algorithms use the graph structure to speed up search. E.g., Tasmania is an independent subproblem!

34 Example: Cryptarithmetic  Variables (circles):  Domains:  Constraints (boxes):  [puzzle figure omitted]

35 Example: Sudoku  Variables:  Each (open) square  Domains:  {1,2,…,9}  Constraints: 9-way alldiff for each row 9-way alldiff for each column 9-way alldiff for each region
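A small sketch of the consistency check these alldiff constraints encode, assuming a 9×9 grid of ints with 0 marking open squares (the representation is an illustrative choice):

    def alldiff(values):
        # Filled values must be pairwise distinct; 0 marks an open square
        filled = [v for v in values if v != 0]
        return len(filled) == len(set(filled))

    def sudoku_consistent(grid):
        rows = all(alldiff(row) for row in grid)
        cols = all(alldiff([grid[r][c] for r in range(9)]) for c in range(9))
        regions = all(alldiff([grid[3*br + dr][3*bc + dc]
                               for dr in range(3) for dc in range(3)])
                      for br in range(3) for bc in range(3))
        return rows and cols and regions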

36 Example: Boolean Satisfiability  Given a Boolean expression, is it satisfiable?  Very basic problem in computer science  It turns out you can always express such a formula in 3-CNF (at most three literals per clause)  3-SAT: find a satisfying truth assignment

37 Example: 3-SAT  Variables: X_1, …, X_n  Domains: {true, false}  Constraints: clauses, each a disjunction of three literals, e.g., (X_1 ∨ ¬X_2 ∨ X_3)  Implicitly conjoined (all clauses must be satisfied)
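A brute-force sketch for illustration (exponential, of course; the clause encoding, positive ints for variables and negative ints for negated literals, is an assumed convention):

    from itertools import product

    def satisfies(clauses, assignment):
        # A clause is satisfied iff at least one of its literals is true
        return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
                   for clause in clauses)

    def brute_force_3sat(clauses, n):
        # Try all 2^n assignments; clauses are implicitly conjoined
        for bits in product([False, True], repeat=n):
            assignment = {i + 1: bits[i] for i in range(n)}
            if satisfies(clauses, assignment):
                return assignment
        return None

    # (X1 or not X2 or X3) and (not X1 or X2 or X3)
    print(brute_force_3sat([(1, -2, 3), (-1, 2, 3)], 3))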

38 Varieties of CSPs  Discrete Variables  Finite domains  Size d means O(d^n) complete assignments  E.g., Boolean CSPs, including Boolean satisfiability (NP-complete)  Infinite domains (integers, strings, etc.)  E.g., job scheduling, variables are start/end times for each job  Linear constraints solvable, nonlinear undecidable  Continuous variables  E.g., start/end times for Hubble Telescope observations  Linear constraints solvable in polynomial time by LP methods

39 Varieties of Constraints  Unary constraints involve a single variable (equivalent to shrinking domains), e.g., SA ≠ green  Binary constraints involve pairs of variables, e.g., SA ≠ WA  Higher-order constraints involve 3 or more variables: e.g., cryptarithmetic column constraints  Preferences (soft constraints):  E.g., red is better than green  Often representable by a cost for each variable assignment  Gives constrained optimization problems  (We’ll ignore these until we get to Bayes’ nets)

40 Real-World CSPs  Assignment problems: e.g., who teaches what class  Timetabling problems: e.g., which class is offered when and where?  Hardware configuration  Transportation scheduling  Factory scheduling  Floorplanning  Fault diagnosis  … lots more!  Many real-world problems involve real-valued variables…

41 Backtracking Example [step-by-step figure omitted; a code sketch follows]
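As a concrete illustration, a minimal backtracking solver for the Australia map-coloring CSP from slide 32 (depth-first assignment with an incremental consistency check; static variable order and no filtering, to keep the sketch short):

    # Australia map-coloring: adjacency from slide 32
    NEIGHBORS = {
        'WA': ['NT', 'SA'], 'NT': ['WA', 'SA', 'Q'],
        'SA': ['WA', 'NT', 'Q', 'NSW', 'V'], 'Q': ['NT', 'SA', 'NSW'],
        'NSW': ['SA', 'Q', 'V'], 'V': ['SA', 'NSW'], 'T': [],
    }
    COLORS = ['red', 'green', 'blue']

    def consistent(var, color, assignment):
        # Incremental check: no already-assigned neighbor shares the color
        return all(assignment.get(n) != color for n in NEIGHBORS[var])

    def backtrack(assignment, variables):
        if len(assignment) == len(variables):
            return assignment
        var = next(v for v in variables if v not in assignment)
        for color in COLORS:
            if consistent(var, color, assignment):
                assignment[var] = color
                result = backtrack(assignment, variables)
                if result is not None:
                    return result
                del assignment[var]       # undo and try the next value
        return None                       # all values failed: backtrack

    print(backtrack({}, list(NEIGHBORS)))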

42 Improving Backtracking  General-purpose ideas can give huge gains in speed:  Which variable should be assigned next?  In what order should its values be tried?  Can we detect inevitable failure early?  Can we take advantage of problem structure?
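One hedged illustration of the first three ideas, written to plug into the map-coloring solver above (representing domains as a dict of value lists is an assumption of this sketch):

    import copy

    def forward_check(var, value, domains, neighbors):
        # "Detect inevitable failure early": after tentatively assigning
        # var=value, prune that value from each neighbor's domain.
        new_domains = copy.deepcopy(domains)
        new_domains[var] = [value]
        for n in neighbors[var]:
            if value in new_domains[n]:
                new_domains[n].remove(value)
                if not new_domains[n]:
                    return None           # a domain was wiped out: fail now
        return new_domains

    def mrv(domains, assignment):
        # "Which variable next?": minimum remaining values heuristic,
        # i.e. pick the unassigned variable with the fewest values left
        unassigned = [v for v in domains if v not in assignment]
        return min(unassigned, key=lambda v: len(domains[v]))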

43 Summary  CSPs are a special kind of search problem:  States defined by values of a fixed set of variables  Goal test defined by constraints on variable values  Backtracking = depth-first search with incremental constraint checks  Ordering: variable and value choice heuristics help significantly  Filtering: forward checking, arc consistency prevent assignments that guarantee later failure  Structure: Disconnected and tree-structured CSPs are efficient  Iterative improvement: min-conflicts is usually effective in practice

44 A (Short) History of AI  1940-1950: Early days  1943: McCulloch & Pitts: Boolean circuit model of brain  1950: Turing's “Computing Machinery and Intelligence”  1950—70: Excitement: Look, Ma, no hands!  1950s: Early AI programs, including Samuel's checkers program, Newell & Simon's Logic Theorist, Gelernter's Geometry Engine  1956: Dartmouth meeting: “Artificial Intelligence” adopted  1965: Robinson's complete algorithm for logical reasoning  1970—88: Knowledge-based approaches  1969—79: Early development of knowledge-based systems  1980—88: Expert systems industry booms  1988—93: Expert systems industry busts: “AI Winter”  1988—: Statistical approaches  Resurgence of probability, focus on uncertainty  General increase in technical depth  Agents and learning systems… “AI Spring”?  2000—: Where are we now?

45 Some Hard Questions…  Who is liable if a robot driver has an accident?  Will machines surpass human intelligence?  What will we do with superintelligent machines?  Would such machines have conscious existence? Rights?  Can human minds exist indefinitely within machines (in principle)?

