Informed Search Modified by M. Perkowski. Yun Peng
Informed Methods Add Domain-Specific Information All heuristic domain specific knowledge is in function h Informed Methods add domain-specific information to select what is the best path to continue searching along They define a heuristic function, h(n), that estimates the "goodness" of a node n with respect to reaching a goal. Specifically, h(n) = estimated cost (or distance) of minimal cost path from n to a goal state. h(n) is about cost of the future search, g(n) past search h(n) is an estimate (rule of thumb), based on domain-specific information that is computable from the current state description. Heuristics do not guarantee feasible solutions and are often without theoretical basis. a g b c d e f h i h=2 h=1 h=0 better solution Here only heuristic is used to guide search
Examples of Heuristics Missionaries and Cannibals: Number of people on starting river bank 8-puzzle: Number of tiles out of place (i.e., not in their goal positions) 8-puzzle: Sum of Manhattan distances each tile is from its goal position 8-queen: # of un-attacked positions – un-positioned queens In general: h(n) >= 0 for all nodes n h(n) = 0 implies that n is a goal node h(n) = infinity implies that n is a deadend from which a goal cannot be reached
Heuristic Functions
Heuristic Functions 5 4 6 1 8 7 3 2 1 2 3 8 4 7 6 5 8 puzzle Branching Factor ≈ 3 9! =362,880 unique states Heuristics h1 = # tiles in wrong position h2 = city block distance h2 dominates h1 h2(n) > h1(n) Both are admissible → h2 is better (see Figure 4.8) 5 4 6 1 8 7 3 2 1 2 3 8 4 7 6 5
Heuristic Functions If h2 dominates h1 and both are admissible, then use h2 A* expands all nodes with f(n) < f*. But f(n) = g(n)+h(n) < f* → h(n) < f*- g(n). If n is expanded by h2, then n will be expanded by h1 since h1(n) < h2(n) but some nodes may be expanded by h1 and not by h2
Inventing Heuristic Functions Solutions to relaxed problems h2 is exact cost if squares can be on top of each other ABSOLVER (see text) Max of several heuristics Statistics (may not be admissible) Features Somewhat problem dependent
IDA* Iterative Deepening A* Use f-cost as limit instead of depth Properties of cost function influence complexity 8 puzzle has small number of f values Traveling salesperson large number
Best First Search
Best First Search Greedy search Minimize estimated cost to reach the goal h(n) = estimated cost of the cheapest path from the state at node n to a goal state Expand node with smallest h(n) Like depth first Prefers to follow single path Backtracking Not optimal, but sometimes good
Best First Search – Greedy Search (not bad)
Best First Search Best First Search orders nodes on the OPEN list by increasing value of an evaluation function, f(n) , that incorporates domain-specific information in some way. Example of f(n): f(n) = g(n) (uniform-cost) f(n) = h(n) (greedy algorithm) f(n) = g(n) + h(n) (algorithm A) This is a generic way of referring to the class of informed methods. Thus, uniform-cost, breadth-first, greedy and A are special cases of Best First Search
Analysis of Greedy Search Like depth-first Not Optimal Complete if space is finite
Greedy Search a g b c d e f h i Evaluation function f(n) = h(n), sorting open nodes by increasing values of f. Selects node to expand believed to be closest (hence it's "greedy") to a goal node (i.e., smallest f = h value) Not admissible, as in the example. Assuming all arc costs are 1, then Greedy search will find goal f, which has a solution cost of 5, while the optimal solution is the path to goal i with cost 3. Not complete (if no duplicate check) a g b c d e f h i h=2 h=1 h=0 h=4 Remember: An admissible algorithm finds the optimal solution! Minimal solution of cost 3 Solution found of cost 5
Beam search Idea = to limit the size of OPEN list. Use an evaluation function f(n) = h(n), but the maximum size of the list of nodes is k, a fixed constant Only keeps k best nodes as candidates for expansion, and throws the rest away More space efficient than greedy search, but may throw away a node that is on a solution path Not complete Not admissible
Algorithm A
Algorithm A f(n) = g(n) + h(n) S 8 5 1 A 5 B C 8 9 3 4 D G Algorithm A uses as an evaluation function f(n) = g(n) + h(n) The h(n) term represents a “depth-first“ factor in f(n) g(n) = minimal cost path from the start state to state n generated so far The g(n) term adds a "breadth-first" component to f(n). Ranks nodes on OPEN list by estimated cost of solution from start node through the given node to goal. Completeness and admissibility depends on h(n) 8 S g(n) 8 5 1 1 A 5 B C 8 9 3 5 1 4 D G h(n) 9 f(D)=4+9=13 f(B)=5+5=10 f(C)=8+1=9 C is chosen next to expand as having the smallest value of f Observe that there is no constraint on h g(n) h(n)
Algorithm A OPEN := {S}; CLOSED := {}; repeat Select node n from OPEN with minimal f(n) and place n on CLOSED; if n is a goal node exit with success; Expand(n); For each child n' of n do compute h(n'), g(n')=g(n)+ c(n,n'), f(n')=g(n')+h(n'); if n' is not already on OPEN or CLOSED then put n’ on OPEN; set backpointer from n' to n else if n' is already on OPEN or CLOSED and if g(n') is lower for the new version of n' then discard the old version of n‘; Put n' on OPEN; set backpointer from n' to n until OPEN = {}; exit with failure
Algorithm A*
Algorithm A* A* is admissible Algorithm A* is algorithm A with constraint that h(n) <= h*(n) This is called the admissible heuristic. h*(n) = true cost of the minimal cost path from n to any goal. g*(n) = true cost of the minimal cost path from S to n. f*(n) = h*(n)+g*(n) = true cost of the minimal cost solution path from S to any goal going through n. h is admissible when h(n) <= h*(n) holds. Using an admissible heuristic guarantees that the first solution found will be an optimal one. A* is complete when: (1) the branching factor is finite, and (2) every operator has a fixed positive cost : i.e. the total # of nodes with f(.) <= f*(goal) is finite A* is admissible Observe that we can calculate h* when the whole tree is already known. Expanding the whole tree can be then used to define better function h.
Some Observations on A* Null heuristic: If h(n) = 0 for all n, then this is an admissible heuristic and A* acts like Uniform-Cost Search. Thus Uniform-Cost is a special case of A* Better heuristic: If h1(n) <= h2(n) <= h*(n) for all non-goal nodes, then h2 is a better heuristic than h1 If A1* uses h1, and A2* uses h2, then every node expanded by A2* is also expanded by A1*. In other words, A1* expands at least as many nodes as A2*. We say that A2* is better informed than A1*. The closer h is to h*, the fewer extra nodes that will be expanded Perfect heuristic: If h(n) = h*(n) for all n, then only the nodes on the optimal solution path will be expanded. So, no extra work will be performed.
Example search space: please note values of g and h in nodes B A D G E 1 5 8 9 4 3 7 start state goal state arc cost h value parent pointer g value f(n) = g(n) + h(n) Optimal path with cost 9 Use this and next slide to discuss in detail how A* works
S C B A D G E 1 5 8 9 4 3 7 parent pointer arc cost h value g value Example search space n g(n) h(n) f(n) h*(n) S 0 8 8 9 A 1 8 9 9 B 5 4 9 4 C 8 3 11 5 D 4 inf inf inf E 8 inf inf inf G 9 0 9 0 f*(n) 9 10 9 13 inf inf 0 S C B A D G E 1 5 8 9 4 3 7 start state goal state arc cost h value parent pointer g value h*(n)=9 Accurate prediction h*(n)=9 h*(n)=4 h*(n)=5 h*(n)=0 h*(n)=inf h*(n)=inf f(n) = g(n) + h(n)
Example f(n) = g(n) + h(n) Using f*(n) we would be able to find directly the optimum solution. f(n) = g(n) + h(n) Example n g(n) h(n) f(n) h*(n) S 0 8 8 9 A 1 8 9 9 B 5 4 9 4 C 8 3 11 5 D 4 inf inf inf E 8 inf inf inf G 9 0 9 0 h*(n) is the (hypothetical) perfect heuristic. Since h(n) <= h*(n) for all n, h is admissible Optimal path = S B G with cost 9. Optimal path would be found directly if we would be able to calculate h* f*(n) 9 10 9 13 inf inf 0
Greedy Algorithm S C B A D G E 1 5 8 9 4 3 7 f(n) = h(n) start state node exp. OPEN list {S(8)} S {C(3) B(4) A(8)} C {G(0) B(4) A(8)} G {B(4) A(8)} Solution path found is S C G with cost 13. 3 nodes expanded. Fast, but not optimal. f(n) = h(n) S C B A D G E 1 5 8 9 4 3 7 start state goal state arc cost h value parent pointer g value
A* Search S C B A D G E 1 5 8 9 4 3 7 f(n) = g(n) + h(n) start state node exp. OPEN list {S(8)} S {A(9) B(9) C(11)} A {B(9) G(10) C(11) D(inf) E(inf)} B {G(9) G(10) C(11) D(inf) E(inf)} G {C(11) D(inf) E(inf)} A* Search f(n) = g(n) + h(n) S C B A D G E 1 5 8 9 4 3 7 start state goal state arc cost h value parent pointer g value Solution path found is S B G with cost 9 4 nodes expanded. Still pretty fast. And optimal, too.
Proof of the Optimality of A* Let l* be the optimal solution path (from S to G), let f* be its cost At any time during the search, one or more node on l* are in OPEN We assume that A* has selected G2, a goal state with a suboptimal solution (g(G2) > f*). We show that this is impossible. Let node n be the shallowest OPEN node on l* Because all ancestors of n along l* are expanded, g(n)=g*(n) Because h(n) is admissible, h*(n)>=h(n). Then f* =g*(n)+h*(n) >= g*(n)+h(n) = g(n)+h(n)= f(n). If we choose G2 instead of n for expansion, f(n)>=f(G2). This implies f*>=f(G2). G2 is a goal state: h(G2) = 0, f(G2) = g(G2). Therefore f* >= g(G2) Contradiction. Read it at home. See Luger book
A* Minimizing total path cost Admissible heuristic Monotonicity Combines greedy + uniform cost Admissible heuristic Monotonicity Optimality and Completeness Complexity
A* - minimizing total path cost Evaluation function f(n) = g(n) + h(n) g(n) = cost of the path so far h(n) = estimated cost of the cheapest path from the state at node n to a goal state (heuristic)
A* - admissible heuristic Never over-estimates the cost to reach the goal Definition of A* search Best first search using f as the evaluation function when h is an admissible heuristic e.g. straight line distance
f(n’) = max(f(n), g(n’)+h(n’)) A* - monotonicity f cost never decreases along a solution path If f is not monotonic, i.e. f(n’) < f(n) but n’ is a child of n, then we should redefine f as f(n’) = max(f(n), g(n’)+h(n’)) The redefined f is monotonic
A* - Bucharest Craiova Rimnicu Arad 366 Timisoara 447 Sibiu 393 Zerind 449 Fagaras 417 Rimnicu 413 Arad 646 Sibiu 553 Pitesti 415 Craiova 526 Craiova Bucharest 418 Rimnicu Oradea 526
A* - Bucharest What if straight line distance to Fagaras is less? Sibiu 592 Arad 366 Zerind 449 Sibiu 393 Timisoara 447 Arad 646 Rimnicu 413 Fagaras 317 Oradea 526 Sibiu 553 Pitesti 415 Craiova 526 Craiova Bucharest 418 Rimnicu 78
Proof of Optimality of A* Proof by contradiction Assume it is not optimal and derive a contradiction.
Proof of Optimality of A* Let G be an optimal goal state Path cost f* Let G2 be a suboptimal goal state h(G2) = 0 (goal state) g(G2) > f* (suboptimal) Assume A* selects G2 from the queue Prove this leads to a contradiction
Proof of Optimality of A* Let n be a current leaf node on optimal path to G path not completely expanded h admissible → f* ≥ f(n) (can’t overestimate) A* chooses G2 → f(n) ≥ f(G2) Hence, f* ≥ f(G2) = g(G2) + h(G2) = g(G2) > f*
Iterative Deepening A* (IDA*)
Iterative Deepening A* (IDA*) Idea: Similar to IDDF: In IDDF search at each iteration is bound by the depth, In IDA* search is bound by the current f_limit At each iteration, all nodes with f(n) <= f_limit will be expanded (in DF fashion). If no solution is found at the end of an iteration, increase f_limit and start the next iteration f_limit: Initialization: f_limit := h(s) Increment: at the end of each (unsuccessful) iteration, f_limit := max{f(n)|n is a cut-off node} Goal testing: test all cut-off nodes until a solution is found select the least cost solution if there are multiple solutions IDA* is Admissible if h is admissible Recall: h(n) <= h*(n) is called the admissible heuristic.
What’s a good heuristic? 11/8/2012 As we remember if h1(n) < h2(n) <= h*(n) for all n, then h2 is better than h1 We say, h2 dominates h1. The following can be done to improve the algorithm: Relaxing the problem: remove constraints to create a (much) easier problem; use the solution cost for this easier problem as the heuristic function h Combining heuristics: take the max of several admissible heuristics to create h: still have an admissible heuristic, and it’s better! Use statistical estimates to compute h: may lose admissibility Identify good features, then use a learning algorithm to find a heuristic function: also may lose admissibility
Nature of heuristics Domain knowledge: some knowledge about the game, the problem to choose Heuristic: a guess about which is best, not exact Heuristic function, h(n): estimate the distance from current node to goal
Automatic generation of h functions by relaxing the problem. Original problem P Relaxed problem P' A set of constraints removing one or more constraints P is complex P' becomes simpler Use cost of a best solution path from n in P' as h(n) for P Admissibility: h* h cost of best solution in P cost of best solution in P' Original problem Solution space of P Solution space of P' Relaxed problem
8 PUZZLE
8-puzzle URLS http://www.permadi.com/java/puzzle8 http://www.cs.rmit.edu.au/AI-Search/Product
Heuristics for 8 Puzzle Suppose 8-puzzle off by one Is there a way to choose the best move next? Good news: Yes! We can use domain knowledge or heuristic to choose the best move
1 2 3 4 5 6 7
Heuristic for the 8-puzzle # tiles out of place (h1) Manhattan distance (h2) Sum of the distance of each tile from its goal position Tiles can only move up or down city blocks 1 2 3 4 5 6 7 2 5 3 1 7 6 4 1 2 3 4 5 6 7 Goal State h1=5 h2=1+1+1+2+2=7 h1=1 h2=1
Admissability of A*
Example for Admissible Function
Admissable heuristics A heuristic that never overestimates the cost to the goal h1 and h2 for the 8 puzzle are admissable heuristics Consistency: the estimated cost of reaching the goal from n is no greater than the step cost of getting to n’ plus estimated cost to goal from n’ h(n) <=c(n,a,n’)+h(n’)
Which heuristic is better for 8 puzzle? Better means that fewer nodes will be expanded in searches h2 dominates h1 if h2 >= h1 for every node n Intuitively, if both are underestimates, then h2 is more accurate Using a more informed heuristic is guaranteed to expand fewer nodes of the search space Which heuristic is better for the 8-puzzle?
Relaxed Problems Admissable heuristics can be derived from the exact solution cost of a relaxed version of the problem If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1 gives the shortest solution If the rules are relaxed so that a tile can move to any adjacent square, then h2 gives the shortest solution Key: the optimal solution cost of a relaxed problem is no greater than the optimal solution cost of the real problem.
Automatic generation of h functions by relaxing the problem Example: 8-puzzle Constraints: to move a tile from cell A to cell B there are 3 conditions as follows: cond1: there is a tile on A cond2: cell B is empty cond3: A and B are adjacent (horizontally or vertically) Removing cond2 we obtain function h2: h2 (sum of Manhattan distances of all misplaced tiles) Removing cond2 and cond3 we obtain function h1: h1 (# of misplaced tiles) Removing cond3 we obtain function h3: h3, a new heuristic function calculated as below h1(start) = 7 h2(start) = 18 h3(start) = 7 Number of misplaced tiles is 7 here
Example of using a heuristic function h repeat if the current empty cell A is to be occupied by tile x in the goal, move x to A. Otherwise, move into A any arbitrary misplaced tile. until the goal is reached It can be checked that: h2 h3 h1 Relaxing the problem: remove constraints to create a (much) easier problem; use the solution cost for this easier problem as the heuristic function h Example of using a heuristic function h h1(start) = 7 h2(start) = 18 h3(start) = 7 Use cost of a best solution path from n in P' as h(n) for P
Heuristics for other problems Shortest path from one city to another Touring problem: visit every city exactly once Challenge: Is there an admissable heuristic for sudoku?
Sudoku
Another Example: Traveling Salesman Problem. Relaxation Methods Another Example: Traveling Salesman Problem. A legal tour is a (Hamiltonian) circuit Variant 1: It is a connected second degree graph (each node has exactly two adjacent edges) Removing the connectivity constraint leads to h1: find the cheapest second degree graph from the given graph (with o(n^3) complexity) The given complete graph A legal tour Other second degree graphs After removing connectivity
Relaxation Methods The given complete graph A legal tour Combine second degree graphs
Hamiltonian) circuit continued Variant 2: legal tour is a spanning tree (when an edge is removed) with the constraint that each node has at most 2 adjacent edges) Removing the constraint leads to h2: find the cheapest minimum spanning tree from the given graph (with O(n^2/log n) The given graph A legal tour Other spanning trees Relaxing the problem: remove constraints to create a (much) easier problem; use the solution cost for this easier problem as the heuristic function h
Complexity of A* search
Complexity of A* search In general: exponential time complexity exponential space complexity For subexponential growth of # of nodes expanded, we need |h(n)-h*(n)| <= O(log h*(n)) for all n For most problems we have |h(n)-h*(n)| <= O(h*(n) Relaxing optimality can be done using one of the following methods: Method 1 of relaxing optimality. Weighted evaluation function f(n) = (1-w)*g(n)+w*h(n) w=0: uniformed-cost search w=1: greedy algorithm w=1/2: A* algorithm when w > ½, search is more depth-first (radical) than A*
Method 2 of relaxing optimality. Dynamic weighting f(n)=g(n)+h(n)+ [1- d(n)/N]*h(n) d(n): depth of node n N: anticipated depth of an optimal goal at beginning of search: << N encourages DF search when close to the end: back to A* It is -admissible (solution cost found is (1+ ) solution found by A*)
Method 3 of relaxing optimality : : another -admissible algorithm select n from OPEN for expansion if f(n) <= (1 + )*smallest f value among all nodes in OPEN Select n if it is a goal Otherwise select randomly or with smallest h value Method 4 of relaxing optimality : Pruning OPEN list (cutting-off): Find a solution using some quick but non-admissible method (e.g., greedy algorithm, hill-climbing, neural networks) with cost f+ Do not put a new node n on OPEN unless f(n) <= f+ Admissible: suppose f(n) > f+ >= f*, the least cost solution sharing the current path from s to n would have cost g(n) + h*(n) >= g(n) + h(n) = f(n)>f*
Conclusions on A* By relaxing optimality we can create more efficient algorithms that can find solution while A* cannot. These are typical tricks that allow us to create efficient algorithms when we understand the problem or when we can use Machine Learning to learn the problem characteristics
Iterative Improvement Search Examples of other good search algorithms
Iterative Improvement Search Another approach to search involves starting with an initial guess at a solution and gradually improving it. Some examples: Hill Climbing Simulated Annealing Genetic algorithm
Hill Climbing on a Surface of States
Hill Climbing Expand best next node Doesn’t keep search tree Can hit local maxima/minima, plateaus Multi-dimensional Random restart Neural networks often rely on this Cost function is differentiable Hills determined by gradient Gradient descent
Hill Climbing on a Surface of States Height Defined by Evaluation Function
Hill Climbing Search If there exists a successor n’ for the current state n such that h(n’) < h(n) h(n’) <= h(t) for all the successors t of n, then move from n to n’. Otherwise, halt at n. Looks one step ahead to determine if any successor is better than the current state; if there is, move to the best successor. Similar to greedy search in that it uses h only, but does not allow backtracking or jumping to an alternative path since it doesn’t “remember” where it has been. OPEN = {current-node} Not complete since the search will terminate at "local minima," "plateaus," and "ridges."
Hill climbing example 2 8 3 1 6 4 7 5 start goal h = 3 h = 2 h = 1 Here or here? Selected this because of “look ahead heuristic” f(n) = (number of tiles out of place)
Drawbacks of Hill-Climbing Problems: Local Maxima: Plateaus: the space has a broad flat plateau with a singularity as it’s maximum Ridges: steps to the North, East, South and West may go down, but a step to the NW may go up. Remedy: Random Restart. Multiple HC searches from different start states Some problems spaces are great for Hill Climbing and others horrible.
Example of a local maximum 1 2 5 7 4 8 6 3 -4 start goal 1 2 5 7 4 8 6 3 1 2 5 7 4 8 6 3 1 2 4 7 3 8 6 5 -4 -3 1 2 5 7 4 8 6 3 -4
Simulated Annealing
Simulated Annealing belongs to random search Example of Random Search Hill climbing but chooses next node randomly and may go that way even if not better!
Simulated Annealing Philosophy and temperature C = current node N = randomly selected successor dE = value(N) – value(C) If(dE > 0 ) Then C←N Else C←N with probability edE/T T is temperature parameter
Explanation of Simulated Annealing Step What does C←N with probability edE/T mean? Generate random number, r, from uniform distribution on [0, 1]. Prob( r ≤ edE/T ) = edE/T. So it means that If r ≤ edE/T Then assign C←N
Simulated Annealing analogy to nature Simulated Annealing (SA) exploits an analogy between the way in which a metal cools and freezes into a minimum energy crystalline structure (the annealing process) and the search for a minimum in a more general system. Each state n has an energy value f(n), where f is an evaluation function SA can avoid becoming trapped at local minimum energy state by introducing randomness into search so that it not only accepts changes that decreases state energy, but also some that increase it. SAs use a a control parameter T, which by analogy with the original application is known as the system “temperature” irrespective of the objective function involved. T starts out high and gradually (very slowly) decreases toward 0.
Algorithm for Simulated Annealing current := a randomly generated state; T := T_0 /* initial temperature T0 >>0 */ forever do if T <= T_end then return current; next := a randomly generated new state; /* next != current */ = f(next) – f(current); current := next with probability ; T := schedule(T); /* reduce T by a cooling schedule */ Commonly used cooling schedule T := T * alpha where 0 < alpha < 1 is a cooling constant T_k := T_0 / log (1+k)
Observation with Simulated Annealing This simulates Nature Probability that the system is at any particular state depends on the state’s energy (Boltzmann distribution) If time taken to cool is infinite then So statistically the best solution is found
Genetic Algorithms can be combined with search and SA Emulating biological evolution (survival of the fittest by natural selection process ) population selection of parents for reproduction (based on a fitness function) parents reproduction (cross-over + mutation) next generation of population Reminder GA Population of individuals (each individual is represented as a string of symbols: genes and chromosomes) Fitness function: estimates the goodness of individuals Selection: only those with high fitness function values are allowed to reproduce Reproduction: crossover allows offspring to inherit good features from their parents mutation (randomly altering genes) may produce new (hopefully good) features bad individuals are throw away when the limit of population size is reached To ensure good results, the population size must be large There are several similarities of GA, Simulated Annealing and Search. These algorithms have been combined. You can invent your own new variants and compare to “pure” algorithms from the class
Informed Search Summary Best-first search is general search where the minimum cost nodes (according to some measure) are expanded first. Greedy search uses minimal estimated cost h(n) to the goal state as measure. This reduces the search time, but the algorithm is neither complete nor optimal. A* search combines uniform-cost search and greedy search: f(n) = g(n) + h(n) and handles state repetitions and h(n) never overestimates. A* is complete, optimal and optimally efficient, but its space complexity is still bad. The time complexity depends on the quality of the heuristic function. IDA* reduces the memory requirements of A*.
Informed Search Summary Hill-climbing algorithms keep only a single state in memory, but can get stuck on local optima. Simulated annealing escapes local optima, and is complete and optimal given a slow enough cooling schedule (in probability). Genetic algorithms escapes local optima, and is complete and optimal given a long enough evolution time and large population size.