3/3 Factoid for the day: “Most people have more than the average number of feet” & eyes & ears & noses.

Slides:



Advertisements
Similar presentations
Informed search algorithms
Advertisements

Review: Search problem formulation
Lecture 7. Network Flows We consider a network with directed edges. Every edge has a capacity. If there is an edge from i to j, there is an edge from.
Uninformed search strategies
Informed search algorithms
A* Search. 2 Tree search algorithms Basic idea: Exploration of state space by generating successors of already-explored states (a.k.a.~expanding states).
Mobile and Wireless Computing Institute for Computer Science, University of Freiburg Western Australian Interactive Virtual Environments Centre (IVEC)
Situation Calculus for Action Descriptions We talked about STRIPS representations for actions. Another common representation is called the Situation Calculus.
Solving Problem by Searching
1 Graphs Traversals In many graph problems, we need to traverse the vertices of the graph in some order Analogy: Binary tree traversals –Pre-order Traversal.
Lecture 17 Path Algebra Matrix multiplication of adjacency matrices of directed graphs give important information about the graphs. Manipulating these.
Problem Solving Agents A problem solving agent is one which decides what actions and states to consider in completing a goal Examples: Finding the shortest.
Solving Problems by Searching Currently at Chapter 3 in the book Will finish today/Monday, Chapter 4 next.
Graphs By JJ Shepherd. Introduction Graphs are simply trees with looser restrictions – You can have cycles Historically hard to deal with in computers.
Artificial Intelligence Lecture No. 7 Dr. Asad Safi ​ Assistant Professor, Department of Computer Science, COMSATS Institute of Information Technology.
CPSC 322, Lecture 9Slide 1 Search: Advanced Topics Computer Science cpsc322, Lecture 9 (Textbook Chpt 3.6) January, 23, 2009.
8/29. Administrative.. Bouncing mails –Qle01; jmussem; rbalakr2 Send me a working address for class list Blog posting issues Recitation session.
9/9. Num iterations: (d+1) Asymptotic ratio of # nodes expanded by IDDFS vs DFS (b+1)/ (b-1) (approximates to 1 when b is large)
MAE 552 – Heuristic Optimization Lecture 27 April 3, 2002
3/3 Factoid for the day: “Most people have more than the average number of feet” & eyes & ears & noses.
DAST 2005 Tirgul 11 (and more) sample questions. DAST 2005 Q.Let G = (V,E) be an undirected, connected graph with an edge weight function w : E→R. Let.
Using Search in Problem Solving
Tirgul 11 BFS,DFS review Properties Use. Breadth-First-Search(BFS) The BFS algorithm executes a breadth search over the graph. The search starts at a.
9/10  Name plates for everyone!. Blog qn. on Dijkstra Algorithm.. What is the difference between Uniform Cost Search and Dijkstra algorithm? Given the.
Tirgul 7 Review of graphs Graph algorithms: – BFS (next tirgul) – DFS – Properties of DFS – Topological sort.
Class of 28 th August. Announcements Lisp assignment deadline extended (will take it until 6 th September (Thursday). In class. Rao away on 11 th and.
Hardness Results for Problems
CS420 lecture eight Greedy Algorithms. Going from A to G Starting with a full tank, we can drive 350 miles before we need to gas up, minimize the number.
Minimum Spanning Trees. Subgraph A graph G is a subgraph of graph H if –The vertices of G are a subset of the vertices of H, and –The edges of G are a.
Informed Search Idea: be smart about what paths to try.
Minimum Spanning Trees
Approximating the Minimum Degree Spanning Tree to within One from the Optimal Degree R 陳建霖 R 宋彥朋 B 楊鈞羽 R 郭慶徵 R
Prof. Swarat Chaudhuri COMP 482: Design and Analysis of Algorithms Spring 2012 Lecture 10.
Expanders via Random Spanning Trees R 許榮財 R 黃佳婷 R 黃怡嘉.
Informed (Heuristic) Search
CSE 2331 / 5331 Topic 12: Shortest Path Basics Dijkstra Algorithm Relaxation Bellman-Ford Alg.
Lecture 3: Uninformed Search
Introduction to Graphs And Breadth First Search. Graphs: what are they? Representations of pairwise relationships Collections of objects under some specified.
Ricochet Robots Mitch Powell Daniel Tilgner. Abstract Ricochet robots is a board game created in Germany in A player is given 30 seconds to find.
Fahiem Bacchus © 2005 University of Toronto 1 CSC384: Intro to Artificial Intelligence Search II ● Announcements.
GRAPH ALGORITHM. Graph A pair G = (V,E) – V = set of vertices (node) – E = set of edges (pairs of vertices) V = (1,2,3,4,5,6,7) E = ( (1,2),(2,3),(3,5),(1,4),(4,5),(6,7)
Romania. Romania Problem Initial state: Arad Goal state: Bucharest Operators: From any node, you can visit any connected node. Operator cost, the.
CSC317 1 At the same time: Breadth-first search tree: If node v is discovered after u then edge uv is added to the tree. We say that u is a predecessor.
Nattee Niparnan. Graph  A pair G = (V,E)  V = set of vertices (node)  E = set of edges (pairs of vertices)  V = (1,2,3,4,5,6,7)  E = ((1,2),(2,3),(3,5),(1,4),(4,
Graphs – Breadth First Search
Lecture 3: Uninformed Search
Unweighted Shortest Path Neil Tang 3/11/2010
Minimal Spanning Trees
CSE 421: Introduction to Algorithms
Elementary Graph Algorithms
Autumn 2015 Lecture 11 Minimum Spanning Trees (Part II)
Elementary graph algorithms Chapter 22
Breadth First Search 11/21/ s
What to do when you don’t know anything know nothing
Searching for Solutions
Chapter 22: Elementary Graph Algorithms I
Lectures on Graph Algorithms: searching, testing and sorting
On the effect of randomness on planted 3-coloring models
Artificial Intelligence
CSE 421: Introduction to Algorithms
Breadth First Search s
Introducing Underestimates
HW 1: Warmup Missionaries and Cannibals
Informed Search Idea: be smart about what paths to try.
Elementary graph algorithms Chapter 22
Breadth First Search s
HW 1: Warmup Missionaries and Cannibals
CSC 325: Algorithms Graph Algorithms David Luebke /24/2019.
Artificial Intelligence
Informed Search Idea: be smart about what paths to try.
Presentation transcript:

3/3 Factoid for the day: “Most people have more than the average number of feet” & eyes & ears & noses

A B C D G Uniform Cost Search No:A (0) N1:B(1)N2:G(9) N3:C(2)N4:D(3)N5:G(5) Completeness? Optimality? if d < d’, then paths with d distance explored before those with d’ Branch & Bound argument (as long as all op costs are +ve) Efficiency? (as bad as blind search..) A B C D G Bait & Switch Graph Notation: C(n,n’) cost of the edge between n and n’ g(n) distance of n from root dist(n,n’’) shortest distance between n and n’’ If the graph is undirected?

A B C D G Uniform Cost Search on Undirected Graph No:A (0) N1:B(1)N2:G(9) N31:C(2)N41:D(3)N51:G(5) Notation: C(n,n’) cost of the edge between n and n’ g(n) distance of n from root dist(n,n’’) shortest distance between n and n’’ N32:A(2) N42:B(3) N52:C(3) Cycles do not affect the completeness and optimality! The same world state may come into the search queue multiple times (e.g. N0:A; N32:A), but each additional time it comes it comes with a higher g-value!  If edge costs are positive then no single node can be expanded more than a finite times..  We will eventually get out of the cycles and find the optimal solution  You can reduce the number of Repeated expansions by doing cycle checking

Visualizing Breadth-First & Uniform Cost Search Breadth-First goes level by level This is also a proof of optimality…

Proof of Optimality of Uniform search Proof of optimality: Let N be the goal node we output. Suppose there is another goal node N’ We want to prove that g(N’) >= g(N) Suppose this is not true. i.e. g(N’) < g(N) --Assumption A1 When N was picked up for expansion, Either N’ itself, or some ancestor of N’, Say N’’ must have been on the search queue If we picked N instead of N’’ for expansion, It was because g(N) <= g(N’’) ---Fact f1 But g(N’) = g(N’’) + dist(N’’,N’) So g(N’) >= g(N’’) So from f1, we have g(N) <= g(N’) But this contradicts our assumption A1 No N N’ N’’ Holds only because dist(N’’,N’) >= 0 This will hold if every operator has +ve cost The partial path to N’ through N’’ is already longer than the path to N.

“Informing” Uniform search… A B C D G Bait & Switch Graph No:A (0) N1:B(.1)N2:G(9) N3:C(.2)N4:D(.3)N5:G(25.3) Would be nice if we could tell that N2 is better than N1 --Need to take not just the distance until now, but also distance to goal --Computing true distance to goal is as hard as the full search --So, try “bounds” h(n) prioritize nodes in terms of f(n) = g(n) +h(n) two bounds: h1(n) <= h*(n) <= h2(n) Which guarantees optimality? --h1(n) <= h2(n) <= h*(n) Which is better function? Admissibility Informedness

“Informing” Uniform search… A B C D G Bait & Switch Graph No:A (0) N1:B(.1)N2:G(9) N3:C(.2)N4:D(.3)N5:G(25.3) Would be nice if we could tell that N2 is better than N1 --Need to take not just the distance until now, but also distance to goal --Computing true distance to goal is as hard as the full search --So, try “bounds” h(n) prioritize nodes in terms of f(n) = g(n) +h(n) two bounds: h1(n) <= h*(n) <= h2(n) Which guarantees optimality? --h1(n) <= h2(n) <= h*(n) Which is better function? Admissibility Informedness

(if there are multiple goal nodes, we consider the distance to the nearest goal node) Several proofs: 1. Based on Branch and bound --g(N) is better than f(N’’) and f(n’’) <= cost of best path through N’’ 2. Based on contours -- f() contours are more goal directed than g() contours 3. Based on contradiction

Where do heuristics (bounds) come from? From relaxed problems (the more relaxed, the easier to compute heuristic, but the less accurate it is) For path planning on the plane (with obstacles)? For 8-puzzle problem? For Traveling sales person? Assume away obstacles. The distance will then be The straightline distance (see next slide for other abstractions) Assume ability to move the tile directly to the place distance= # misplaced tiles Assume ability to move only one position at a time distance = Sum of manhattan distances. Relax the “circuit” requirement. Minimum spanning tree

Different levels of abstraction for shortest path problems on the plane I G I G “circular abstraction” I G “Polygonal abstraction” I G “disappearing-act abstraction” hDhD hChC hPhP h* The obstacles in the shortest path problem canbe abstracted in a variety of ways. --The more the abstraction, the cheaper it is to solve the problem in abstract space --The less the abstraction, the more “informed” the heuristic cost (i.e., the closer the abstract path length to actual path length) Actual Why are we inscribing the obstacles rather than circumscribing them?

hDhD hChC hPhP h*h* h0h0 Cost of computing the heuristic Cost of searching with the heuristic Total cost incurred in search Not always clear where the total minimum occurs Old wisdom was that the global min was closer to cheaper heuristics Current insights are that it may well be far from the cheaper heuristics for many problems E.g. Pattern databases for 8-puzzle polygonal abstractions for SP Plan graph heuristics for planning How informed should the heuristic be? Reduced level of abstraction (i.e. more and more concrete)

h* h1 h4 h5 Admissibility/Informedness h2 h3 Max(h2,h3) Seach Nodes Heuristic Value

Manhattan Distance Heuristic Manhattan distance is 6+3=9 moves

Performance on 15 Puzzle Random 15 puzzle instances were first solved optimally using IDA* with Manhattan distance heuristic (Korf, 1985). Optimal solution lengths average 53 moves. 400 million nodes generated on average. Average solution time is about 50 seconds on current machines.

Limitation of Manhattan Distance To solve a 24-Puzzle instance, IDA* with Manhattan distance would take about 65,000 years on average. Assumes that each tile moves independently In fact, tiles interfere with each other. Accounting for these interactions is the key to more accurate heuristic functions.

More Complex Tile Interactions Min disance (M.d.) is 19 moves, but 31 moves are needed. M.d. is 20 moves, but 28 moves are needed M.d. is 17 moves, but 27 moves are needed

Pattern Database Heuristics Culberson and Schaeffer, 1996 A pattern database is a complete set of such positions, with associated number of moves. e.g. a 7-tile pattern database for the Fifteen Puzzle contains 519 million entries.

Heuristics from Pattern Databases moves is a lower bound on the total number of moves needed to solve this particular state.

Precomputing Pattern Databases Entire database is computed with one backward breadth-first search from goal. All non-pattern tiles are indistinguishable, but all tile moves are counted. The first time each state is encountered, the total number of moves made so far is stored. Once computed, the same table is used for all problems with the same goal state.

Proof of Optimality of A* search Proof of optimality: Let N be the goal node we output. Suppose there is another goal node N’ We want to prove that g(N’) >= g(N) Suppose this is not true. i.e. g(N’) < g(N) --Assumption A1 When N was picked up for expansion, Either N’ itself, or some ancestor of N’, Say N’’ must have been on the search queue If we picked N instead of N’’ for expansion, It was because f(N) <= f(N’’) ---Fact f1 i.e. g(N) + h(N) <= g(N’’) + h(N’’) Since N is goal node, h(N) = 0 So, g(N) <= g(N’’) + h(N’’) But g(N’) = g(N’’) + dist(N’’,N’) Given h(N’) <= h*(N’’) = dist(N’’,N’) (lower bound) So g(N’) = g(N’’)+dist(N’’,N’) >= g(N’’) +h(N’’) ==Fact f2 So from f1 and f2 we have g(N) <= g(N’) But this contradicts our assumption A1 No N N’ N’’ Holds only because h(N’’) is a lower bound on dist(N’’,N’) The lower-bound (optimistic) estimate on the length of the path to N’ through N’’ is already longer than the path to N. f(n) is the estimate of the length of the shortest path to goal passing through n

It will not expand Nodes with f >f* (f* is f-value of the Optimal goal which is the same as g* since h value is zero for goals) Uniform cost search A* Visualizing A* Search But, does f value really increase along a path?

A B C D G A* Search—Admissibility, Monotonicity, Pathmax Correction No:A (0) N1:B(.1+8.8)N2:G(9+0) N3:C(max(.2+0),8.8)N4:D(.3+25) No:A (0) N1:B( )N2:G(9+0) f(B)= = 8.9 f(C)=.2+0 = 0.2 This doesn’t make sense since we are reducing the estimate of the actual cost of the path A—B—C—D—G To make f(.) monotonic along a path, we say f(n) = max( f(parent), g(n)+h(n)) PathMax Adjustment This is just enforcing Triangle law of inequality That the sum of two sides Must be greater than the third B C G f(C) 0.2 f(B) 8.9 C(B,C) 0.1 Is magenta-h admissible? Is green-h admissible? Does f(c) make sense? Why wasn’t this needed for uniform cost search?

A B C D G A* Search—Admissibility, Monotonicity, Pathmax Correction No:A (0) N1:B(.1+8.8)N2:G(9+0) N3:C(max(.2+0),8.8)N4:D(.3+25) No:A (0) N1:B( )N2:G(9+0) f(B)= = 8.9 f(C)=.2+0 = 0.2 This doesn’t make sense since we are reducing the estimate of the actual cost of the path A—B—C—D—G To make f(.) monotonic along a path, we say f(n) = max( f(parent), g(n)+h(n)) PathMax Adjustment Consistency of a heuristic will be enforced if it obeys triangle inequality B C G h(C) 0.1 h(B) 8.8 C(B,C) 0.1 Is magenta-h admissible? Is green-h admissible? Does f(c) make sense? Why wasn’t this needed for uniform cost search? If a heuristic is inconsistently optimistic, then it can be more optimistic at a node than its parent node. When this happens, f values of the expanded nodes are not guaranteed to increase monotonically.

2/5

IDA*--do iterative depth first search but Set threshold in terms of f (not depth)

IDA* to handle the A* memory problem Basicaly IDDFS, except instead of the iterations being defined in terms of depth, we define it in terms of f-value –Start with the f cutoff equal to the f-value of the root node –Loop Generate and search all nodes whose f-values are less than or equal to current cutoff. –Use depth-first search to search the trees in the individual iterations –Keep track of the node N’ which has the smallest f- value that is still larger than the current cutoff. Let this f-value be next-largest-f-value -- If the search finds a goal node, terminate. If not, set cutoff = next-largest-f-value and go back to Loop Properties: Linear memory. #Iterations in the worst case? = B d !!  (Happens when all nodes have distinct f-values. There is such a thing as too much discrimination…)

Using memory more effectively: SMA* A* can take exponential space in the worst case IDA* takes linear space (in solution depth) always If A* is consuming too much space, one can argue that IDA* is consuming too little Better idea is to use all the memory that is available, and start cleaning up as memory starts filling up –Idea: When the memory is about to fill up, remove the leaf node with the worst f-value from the search tree But remember its f-value at its parent (which is still in the search tree) –Since the parent is now the leaf node, it too can get removed to make space If ever the rest of the tree starts looking less promising than the parent of the removed node, the parent will be picked up and expanded again. –Works quite well—but can thrash when memory is too low Not unlike your computer with too little RAM..

IDA* to handle the A* memory problem Basicaly IDDFS, except instead of the iterations being defined in terms of depth, we define it in terms of f-value –Start with the f cutoff equal to the f-value of the root node –Loop Generate and search all nodes whose f-values are less than or equal to current cutoff. –Use depth-first search to search the trees in the individual iterations –Keep track of the node N’ which has the smallest f- value that is still larger than the current cutoff. Let this f-value be next-largest-f-value -- If the search finds a goal node, terminate. If not, set cutoff = next-largest-f-value and go back to Loop Properties: Linear memory. #Iterations in the worst case? = B d !!  (Happens when all nodes have distinct f-values. There is such a thing as too much discrimination…)

Nothing beyond this was discussed in Spring 2012

Used while discussing A* alg

On “predicting” the effectiveness of Heuristics Unfortunately, it is not the case that a heuristic h 1 that is more informed than h 2 will always do fewer node expansions than h 2. -We can only gurantee that h 1 will expand less nodes with f-value less than f* than h 2 will Consider the plot on the right… do you think h1 or h2 is likely to do better in actual search? –The “differentiation” ability of the heuristic—I.e., the ability to tell good nodes from the bad ones-- is also important. But it is harder to measure. Some new work that does a histogram characterization of the distribution of heuristic values [Korf, 2000] Nevertheless, informedness of heuristics is a reasonable qualitative measure Nodes  Heuristic value h1 h2 h* Let us divide the number of nodes expanded n E into Two parts: n I which is the number of nodes expanded Whose f-values were strictly less than f* (I.e. the Cost of the optimal goal), and n G is the # of expanded Nodes with f-value greater than f*. So, n E =n I +n G A more informed heuristic is only guaranteed to have A smaller n I —all bets are off as far as the n G value is Concerned. In many cases n G may be relatively large Compared to n I making the n E wind up being higher For an informed heuristic! Is h 1 better or h 2 ?

Not enough to show the correct configuration of the 18-puzzle problem or rubik’s cube..  (although by including the list of actions as part of the state, you can support hill-climbing)

What is needed: --A neighborhood function The larger the neighborhood you consider, the less myopic the search (but the more costly each iteration) --A “goodness” function needs to give a value to non-solution configurations too for 8 queens: (-ve) of number of pair-wise conflicts

Problematic scenarios for hill-climbing  When the state-space landscape has local minima, any search that moves only in the greedy direction cannot be (asymptotically) complete  Random walk, on the other hand, is asymptotically complete Idea: Put random walk into greedy hill-climbing Ridges Solution(s):  Random restart hill-climbing  Do the non-greedy thing with some probability p>0  Use simulated annealing

Making Hill-Climbing Asymptotically Complete Random restart hill-climbing –Keep some bound B. When you made more than B moves, reset the search with a new random initial seed. Start again. Getting random new seed in an implicit search space is non-trivial! –In 8-puzzle, if you generate a random state by making random moves from current state, you are still not truly random (as you will continue to be in one of the two components) “biased random walk”: Avoid being greedy when choosing the seed for next iteration –With probability p, choose the best child; but with probability (1-p) choose one of the children randomly Use simulated annealing –Similar to the previous idea—the probability p itself is increased asymptotically to one (so you are more likely to tolerate a non- greedy move in the beginning than towards the end) With random restart or the biased random walk strategies, we can solve very large problems million queen problems in under minutes!