CMSC 380 Graph Traversals and Search
Graph Traversals
Graphs can be traversed breadth-first, depth-first, or by path length.
We need to guard specifically against cycles: mark each vertex as “closed” when we encounter it, and do not consider closed vertices again.
Queuing Function
Used to maintain a ranked list of nodes that are candidates for expansion.
Substituting different queuing functions yields different traversals/searches:
– FIFO queue: breadth-first traversal
– LIFO stack: depth-first traversal
– Priority queue: Dijkstra’s algorithm / uniform-cost search
Bookkeeping Structures
Typical node structure includes:
– vertex ID
– predecessor node
– path length
– cost of the path
Problem includes:
– graph
– starting vertex
– goalTest(Node n): tests whether a node is a goal state (can be omitted for full graph traversals)
General Graph Search / Traversal
// problem describes the graph, start vertex, and goal test
// queuingFn inserts a node into open, ordered by a comparator that ranks two states
// graphSearch returns either a goal node or FAILURE
graphSearch(problem, queuingFn) {
    open = {}, closed = {}
    queuingFn(open, new Node(problem.startvertex))     // initialize
    loop {
        if empty(open) then return FAILURE             // no nodes remain
        curr = removeFront(open)                       // get current node
        if problem.goalTest(curr.vertex)               // optional goal test
            return curr                                // for search
        if curr.vertex is not in closed {              // avoid duplicates
            add curr.vertex to closed
            for each Vertex w adjacent to curr.vertex  // expand node
                queuingFn(open, new Node(w, curr));
        }
    }
}
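The pseudocode above can be sketched in Python to show how one loop yields three different searches just by changing how the open list is ordered. This is a minimal sketch under stated assumptions, not the course's implementation: the graph is assumed to be a dict mapping each vertex to a list of (neighbor, weight) pairs, and the names `graph_search`, `strategy`, and `demo_graph` are hypothetical.

```python
from collections import deque
import heapq

def graph_search(graph, start, is_goal, strategy="bfs"):
    """Return (path, cost) for the first goal removed from the open list, or None."""
    counter = 0                                  # tie-breaker so the heap never compares nodes
    start_node = (0, counter, start, None)       # (path cost, tie, vertex, parent node)
    if strategy == "ucs":                        # priority queue ordered by path cost
        open_list = [start_node]
        push = lambda n: heapq.heappush(open_list, n)
        pop = lambda: heapq.heappop(open_list)
    else:
        open_list = deque([start_node])
        push = open_list.append
        # FIFO removal gives breadth-first, LIFO removal gives depth-first
        pop = open_list.popleft if strategy == "bfs" else open_list.pop
    closed = set()
    while open_list:
        node = pop()
        cost, _, vertex, _ = node
        if is_goal(vertex):                      # goal test on removal
            path = []
            while node is not None:              # backtrack through predecessors
                path.append(node[2])
                node = node[3]
            return path[::-1], cost
        if vertex not in closed:                 # avoid expanding duplicates
            closed.add(vertex)
            for w, weight in graph.get(vertex, []):
                counter += 1
                push((cost + weight, counter, w, node))
    return None

# Hypothetical example graph: the cheapest route to G is not the one with fewest edges.
demo_graph = {"S": [("A", 1), ("B", 5)], "A": [("B", 1)], "B": [("G", 1)], "G": []}
bfs_result = graph_search(demo_graph, "S", lambda v: v == "G", "bfs")
ucs_result = graph_search(demo_graph, "S", lambda v: v == "G", "ucs")
```

On this example the FIFO ordering returns the path with the fewest edges, while the priority-queue ordering returns the cheapest path.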
Unweighted Shortest Path Problem
Unweighted shortest-path problem: Given an unweighted graph G = (V, E) and a starting vertex s, find the shortest unweighted path from s to every other vertex in G.
Breadth-first search:
– Use a FIFO queue
– Finds shortest paths if edges are unweighted (or of equal cost)
– Recover a path by backtracking through predecessor nodes
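As an illustration of the last point, the following sketch runs a breadth-first search and then recovers a path by walking predecessor links backward. It assumes the graph is a dict from vertex to a list of neighbors; the function names and the small example graph are hypothetical.

```python
from collections import deque

def bfs_shortest_paths(adj, source):
    """Return (dist, pred) where dist counts edges from source."""
    dist = {source: 0}
    pred = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first discovery is along a shortest path
                dist[w] = dist[u] + 1
                pred[w] = u
                queue.append(w)
    return dist, pred

def recover_path(pred, target):
    """Walk predecessor links from target back to the source, then reverse."""
    path = []
    while target is not None:
        path.append(target)
        target = pred[target]
    return path[::-1]

# Hypothetical example graph
adj = {"v1": ["v2", "v3"], "v2": ["v4"], "v3": ["v4"], "v4": ["v5"], "v5": []}
dist, pred = bfs_shortest_paths(adj, "v1")
path_to_v5 = recover_path(pred, "v5")
```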
Breadth-First Example: Queue
[Figure: breadth-first traversal of an example graph on vertices v1 to v5 using a queue as the open list; distances start at ∞; BFS traversal order shown: v1, v2, v3, v4.]
DFS Example: Stack
[Figure: depth-first traversal of the example graph on vertices v1 to v5 using a stack as the open list; DFS traversal order shown: v1, v3, v2, v4.]
Traversal Performance
What is the performance of DF and BF traversal?
Each vertex is placed on the stack or queue at most once, and in the worst case every vertex is. Therefore, the traversals take at least time proportional to $|V|$.
However, at each vertex, we must find the adjacent vertices. Therefore, DF and BF traversal performance depends on the performance of the getAdjacent operation.
GetAdjacent
Method 1: Look at every vertex (except u), asking “are you adjacent to u?”
    List L;
    for each Vertex v except u
        if (v.isAdjacentTo(u))
            L.push_back(v);
Assuming $O(1)$ performance for isAdjacentTo, getAdjacent has $O(|V|)$ performance and traversal performance is $O(|V|^2)$.
GetAdjacent (2)
Method 2: Look only at the edges that impinge on u. At each vertex, the number of vertices to be looked at is then deg(u), the degree of the vertex.
With this approach getAdjacent is $O(\deg(u))$. The traversal performance is $O\left(\sum_{i=1}^{|V|} \deg(v_i)\right) = O(|E|)$, since getAdjacent is done $O(|V|)$ times.
However, in a disconnected graph, we must still look at every vertex, so the performance is $O(|V| + |E|)$.
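The $O(|V| + |E|)$ bound can be checked by counting operations directly. The sketch below assumes an undirected graph stored as adjacency lists (a dict from vertex to list of neighbors) and counts how many vertices are visited and how many neighbor entries are examined over a full traversal; the function name and the example graph are hypothetical.

```python
def count_adjacency_work(adj):
    """Return (vertex_visits, neighbor_checks) for a full traversal of adj."""
    vertex_visits = 0
    neighbor_checks = 0
    visited = set()
    for start in adj:                    # restart in every connected component
        if start in visited:
            continue
        visited.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            vertex_visits += 1
            for w in adj[u]:             # deg(u) entries examined here
                neighbor_checks += 1
                if w not in visited:
                    visited.add(w)
                    stack.append(w)
    return vertex_visits, neighbor_checks

# Hypothetical disconnected graph: edges a-b, b-c, e-f and an isolated vertex d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": [], "e": ["f"], "f": ["e"]}
visits, checks = count_adjacency_work(adj)
```

Here there are 6 vertices and 3 undirected edges, so every vertex is visited once and each edge is examined twice (once from each endpoint).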
Weighted Shortest Path Problem
Single-source shortest-path problem: Given as input a weighted graph, G = (V, E), and a distinguished starting vertex, s, find the shortest weighted path from s to every other vertex in G.
Dijkstra’s algorithm (also called uniform cost search)
– Use a priority queue in the general search/traversal.
– Keep a tentative distance for each vertex, giving the shortest path length using only vertices visited so far.
– Record the vertex visited before this vertex (to allow printing of the path).
– At each step, choose the vertex with the smallest distance among the unvisited vertices (greedy algorithm).
Example Network
[Figure: example network of vertices labeled v1 through v10.]
Dijkstra’s Algorithm
The pseudocode for Dijkstra’s algorithm assumes the following structure for a Vertex object:
class Vertex {
    public List adj;        // adjacency list
    public boolean known;
    public DistType dist;   // DistType is probably int
    public Vertex path;
    // other fields and methods as needed
}
Dijkstra’s Algorithm
void dijkstra(Vertex start) {
    for each Vertex v in V {
        v.dist = Integer.MAX_VALUE;
        v.known = false;
        v.path = null;
    }
    start.dist = 0;
    while there are unknown vertices {
        v = unknown vertex with smallest distance
        v.known = true;
        // note: if v.dist is still Integer.MAX_VALUE, v is unreachable
        // and the addition below would overflow
        for each Vertex w adjacent to v
            if (!w.known)
                if (v.dist + weight(v, w) < w.dist) {
                    decrease(w.dist to v.dist + weight(v, w))
                    w.path = v;
                }
    }
}
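A minimal runnable sketch of the pseudocode above, using a linear scan to pick the unknown vertex with the smallest distance. It assumes the graph is a dict mapping each vertex to a dict of neighbor weights, with all weights non-negative; `dijkstra_scan` and the example graph are hypothetical names and data, and `math.inf` replaces `Integer.MAX_VALUE` to sidestep the overflow concern.

```python
import math

def dijkstra_scan(graph, start):
    """Return (dist, path) where path[w] is the vertex visited before w."""
    dist = {v: math.inf for v in graph}
    path = {v: None for v in graph}
    known = set()
    dist[start] = 0
    while len(known) < len(graph):
        # linear scan for the unknown vertex with the smallest tentative distance
        v = min((u for u in graph if u not in known), key=dist.get)
        if dist[v] == math.inf:
            break                        # remaining vertices are unreachable
        known.add(v)
        for w, weight in graph[v].items():
            if w not in known and dist[v] + weight < dist[w]:
                dist[w] = dist[v] + weight
                path[w] = v
    return dist, path

# Hypothetical directed example graph with non-negative weights
graph = {
    "v1": {"v2": 2, "v4": 1},
    "v2": {"v4": 3, "v5": 10},
    "v3": {"v1": 4, "v6": 5},
    "v4": {"v3": 2, "v5": 2, "v6": 8, "v7": 4},
    "v5": {"v7": 6},
    "v6": {},
    "v7": {"v6": 1},
}
dist, path = dijkstra_scan(graph, "v1")
```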
Correctness of Dijkstra’s Algorithm
The algorithm is correct because of a property of shortest paths:
If $P_k = v_1, v_2, \ldots, v_j, v_k$ is a shortest path from $v_1$ to $v_k$, then $P_j = v_1, v_2, \ldots, v_j$ must be a shortest path from $v_1$ to $v_j$. Otherwise $P_k$ would not be as short as possible, since $P_k$ extends $P_j$ by just one edge (from $v_j$ to $v_k$).
$P_j$ must be shorter than $P_k$ (assuming that all edges have positive weights), so the algorithm must have found $P_j$ on an earlier iteration than the one on which it found $P_k$.
That is, shortest paths can be found by extending earlier known shortest paths by single edges, which is what the algorithm does.
Running Time of Dijkstra’s Algorithm
The running time depends on how the vertices are manipulated.
The main ‘while’ loop runs $O(|V|)$ times (once per vertex).
Finding the “unknown vertex with smallest distance” (inside the while loop) can be a simple linear scan of the vertices and so is also $O(|V|)$. With this method the total running time is $O(|V|^2)$. This is acceptable (and perhaps optimal) if the graph is dense ($|E| = O(|V|^2)$), since it is then linear in the number of edges.
If the graph is sparse ($|E| = O(|V|)$), we can use a priority queue to select the unknown vertex with smallest distance, using the deleteMin operation ($O(\lg |V|)$). We must also decrease the path lengths of some unknown vertices, which is also $O(\lg |V|)$.
The deleteMin operation is performed for every vertex, and the “decrease path length” operation is performed at most once per edge, so the running time is $O(|E| \lg |V| + |V| \lg |V|) = O((|V| + |E|) \lg |V|) = O(|E| \lg |V|)$ if all vertices are reachable from the starting vertex.
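A sketch of the priority-queue variant using Python's `heapq`. Note one swapped-in technique: `heapq` has no decrease-key operation, so instead of decreasing a path length in place this sketch pushes a new entry and skips stale ones when they are popped. That keeps the $O(|E| \lg |V|)$ bound, since the heap holds at most $|E|$ entries and $\lg |E| = O(\lg |V|)$. The graph format (dict of dicts) and the names are assumptions for illustration.

```python
import heapq
import math

def dijkstra_heap(graph, start):
    """Return shortest distances from start; weights must be non-negative."""
    dist = {v: math.inf for v in graph}
    dist[start] = 0
    heap = [(0, start)]
    known = set()
    while heap:
        d, v = heapq.heappop(heap)               # deleteMin
        if v in known:
            continue                              # stale entry from an earlier improvement
        known.add(v)
        for w, weight in graph[v].items():
            if d + weight < dist[w]:
                dist[w] = d + weight
                heapq.heappush(heap, (dist[w], w))  # stands in for "decrease path length"
    return dist

# Hypothetical sparse example graph
graph = {"s": {"a": 4, "b": 1}, "b": {"a": 2, "c": 5}, "a": {"c": 1}, "c": {}}
dist = dijkstra_heap(graph, "s")
```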
Dijkstra and Negative Edges
Note that in the previous discussion, we made the assumption that all edges have positive weight. If any edge has a negative weight, then Dijkstra’s algorithm fails. Why is this so?
Suppose a vertex, u, is marked as “known”. This means that the shortest path from the starting vertex, s, to u has been found.
However, it is possible that there is a negatively weighted edge from an unknown vertex, v, back to u. In that case, taking the path from s to v to u is actually shorter than the path from s to u without going through v.
Other algorithms exist that handle edges with negative weights for the weighted shortest-path problem.
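The failure described above can be reproduced on a three-vertex graph. The sketch below runs a Dijkstra variant with “known” marking and compares it with Bellman-Ford, which is named here only as one example of an algorithm that tolerates negative edges (provided there is no negative cycle). The graph and function names are hypothetical.

```python
import math

def dijkstra_known(graph, start):
    """Dijkstra with 'known' marking; only valid for non-negative weights."""
    dist = {v: math.inf for v in graph}
    dist[start] = 0
    known = set()
    while len(known) < len(graph):
        v = min((u for u in graph if u not in known), key=dist.get)
        known.add(v)
        for w, weight in graph[v].items():
            if w not in known and dist[v] + weight < dist[w]:
                dist[w] = dist[v] + weight
    return dist

def bellman_ford(graph, start):
    """Relax every edge |V| - 1 times; assumes no negative cycle."""
    dist = {v: math.inf for v in graph}
    dist[start] = 0
    for _ in range(len(graph) - 1):
        for v in graph:
            for w, weight in graph[v].items():
                if dist[v] + weight < dist[w]:
                    dist[w] = dist[v] + weight
    return dist

# u is marked known at distance 2 before the negative edge v -> u is examined
graph = {"s": {"u": 2, "v": 5}, "u": {}, "v": {"u": -4}}
wrong = dijkstra_known(graph, "s")
right = bellman_ford(graph, "s")
```

The path s, v, u has total weight 5 + (-4) = 1, which is shorter than the direct edge of weight 2 that Dijkstra commits to.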
Directed Acyclic Graphs
A directed acyclic graph (DAG) is a directed graph with no cycles.
A strict partial order R on a set S is a binary relation such that
– for all $a \in S$, aRa is false (irreflexive property)
– for all $a, b, c \in S$, if aRb and bRc then aRc is true (transitive property)
To represent a partial order with a DAG:
– represent each member of S as a vertex
– for each pair of vertices (a, b), insert an edge from a to b if and only if aRb
More Definitions
Vertex i is a predecessor of vertex j if and only if there is a path from i to j.
Vertex i is an immediate predecessor of vertex j if and only if (i, j) is an edge in the graph.
Vertex j is a successor of vertex i if and only if there is a path from i to j.
Vertex j is an immediate successor of vertex i if and only if (i, j) is an edge in the graph.
Topological Ordering
A topological ordering of the vertices of a DAG G = (V, E) is a linear ordering such that, for vertices $i, j \in V$, if i is a predecessor of j, then i precedes j in the linear order; i.e., if there is a path from $v_i$ to $v_j$, then $v_i$ comes before $v_j$ in the linear order.
Topological Sort
TopSort Example
Running Time of TopSort
1. Each vertex is enqueued at most once, so there are $O(|V|)$ constant-time queue operations.
2. The body of the for loop is executed at most once per edge: $O(|E|)$.
3. The initialization is proportional to the size of the graph if adjacency lists are used: $O(|E| + |V|)$.
4. The total running time is therefore $O(|E| + |V|)$.
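The running-time argument above matches a queue-based topological sort that tracks indegrees. The following is a sketch of that general technique, assuming the graph is a dict from each vertex to its list of immediate successors; it is not necessarily the exact routine from the slides, and the names are hypothetical.

```python
from collections import deque

def topological_sort(adj):
    """Return a topological ordering of adj, or raise ValueError on a cycle."""
    indegree = {v: 0 for v in adj}               # initialization: O(|V| + |E|)
    for v in adj:
        for w in adj[v]:
            indegree[w] += 1
    queue = deque(v for v in adj if indegree[v] == 0)
    order = []
    while queue:                                 # each vertex is enqueued at most once
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:                         # runs once per edge overall
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    if len(order) != len(adj):
        raise ValueError("graph has a cycle")
    return order

# Hypothetical DAG
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
order = topological_sort(dag)
```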
Example for illustrating uninformed search strategies
[Figure: example search graph with nodes S, A, B, C, D, E, and G.]
Evaluating Search Strategies
Completeness
– Guarantees finding a solution whenever one exists
Time complexity
– How long (worst or average case) does it take to find a solution? Usually measured in terms of the number of nodes expanded
Space complexity
– How much space is used by the algorithm? Usually measured in terms of the maximum size of the “open” list during the search
Optimality/Admissibility
– If a solution is found, is it guaranteed to be an optimal one? That is, is it the one with minimum cost?
Breadth-First
Enqueue nodes in FIFO (first-in, first-out) order.
Complete.
Optimal (i.e., admissible) if all operators have the same cost. Otherwise, not optimal, but finds the solution with the shortest path length.
Exponential time and space complexity, $O(b^d)$, where d is the depth of the solution and b is the branching factor (i.e., number of children) at each node.
Will take a long time to find solutions with a large number of steps because it must look at all shorter-length possibilities first.
– A complete search tree of depth d where each non-leaf node has b children has a total of $1 + b + b^2 + \cdots + b^d = (b^{d+1} - 1)/(b - 1)$ nodes.
– For a complete search tree of depth 12, where every node at depths 0, ..., 11 has 10 children and every node at depth 12 has 0 children, there are $1 + 10 + 100 + \cdots + 10^{12} = (10^{13} - 1)/9 = O(10^{12})$ nodes in the complete search tree. If BFS expands 1000 nodes/sec and each node uses 100 bytes of storage, then BFS will take 35 years to run in the worst case, and it will use 111 terabytes of memory!
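The figures in the depth-12 example can be rechecked with a few lines of arithmetic. This sketch assumes a 365-day year and decimal terabytes ($10^{12}$ bytes); the variable names are illustrative.

```python
b, d = 10, 12                                    # branching factor and tree depth

nodes = (b ** (d + 1) - 1) // (b - 1)            # closed form of 1 + b + ... + b^d
seconds = nodes / 1000                           # 1000 nodes expanded per second
years = seconds / (365 * 24 * 3600)              # assumes a 365-day year
terabytes = nodes * 100 / 1e12                   # 100 bytes per node, 1 TB = 10^12 bytes
```

The node count comes to roughly $1.1 \times 10^{12}$, which gives a little over 35 years and about 111 terabytes.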
Breadth-First Traversal
void bfs() {
    Queue q;
    for all v in V, d[v] = ∞              // mark each vertex unvisited
    q.enqueue(startvertex);               // start with any vertex
    d[startvertex] = 0;                   // mark visited
    while ( !q.isEmpty() ) {
        Vertex u = q.dequeue( );
        for each Vertex w adjacent to u {
            if (d[w] == ∞) {              // w not marked as visited
                d[w] = d[u] + 1;          // mark visited
                path[w] = u;              // where we came from
                q.enqueue(w);
            }
        }
    }
}
Recursive Depth First Traversal
void dfs() {
    for (each v ∈ V)
        dfs(v)
}
void dfs(Vertex v) {
    if (!v.visited) {
        v.visited = true;
        for each Vertex w adjacent to v {
            if ( !w.visited )
                dfs(w)
        }
    }
}
DFS with explicit stack
void dfs() {
    Stack s;
    s.push(startvertex);                  // initialize the stack
    startvertex.visited = true;           // mark as visited
    while ( !s.isEmpty() ) {
        Vertex u = s.pop();
        for each Vertex w adjacent to u {
            if (!w.visited) {
                w.visited = true;
                s.push(w);
            }
        }
    }
}
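A short sketch contrasting the explicit-stack traversal with a recursive one. Both visit each reachable vertex exactly once, but because the stack version marks vertices when they are pushed and pops the most recently pushed neighbor first, the two can produce different visit orders on the same graph. The graph format (dict of neighbor lists) and function names are assumptions for illustration.

```python
def dfs_stack(adj, start):
    """Explicit-stack DFS; vertices are marked visited when pushed."""
    visited = {start}
    stack = [start]
    order = []
    while stack:
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in visited:
                visited.add(w)
                stack.append(w)
    return order

def dfs_recursive(adj, start):
    """Recursive DFS; vertices are marked visited on entry."""
    visited, order = set(), []
    def visit(v):
        visited.add(v)
        order.append(v)
        for w in adj[v]:
            if w not in visited:
                visit(w)
    visit(start)
    return order

# Hypothetical example graph
adj = {"v1": ["v2", "v3"], "v2": ["v4"], "v3": ["v4"], "v4": []}
stack_order = dfs_stack(adj, "v1")
recursive_order = dfs_recursive(adj, "v1")
```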
Depth-First (DFS)
Enqueue nodes in LIFO (last-in, first-out) order. That is, use a stack data structure to order nodes.
May not terminate without a “depth bound,” i.e., cutting off search below a fixed depth D (“depth-limited search”).
Not complete (with or without cycle detection, and with or without a cutoff depth).
Exponential time, $O(b^d)$, but only linear space, $O(bd)$.
Can find long solutions quickly if lucky (and short solutions slowly if unlucky!).
When search hits a dead end, it can only back up one level at a time, even if the “problem” occurs because of a bad operator choice near the top of the tree. Hence, it only does “chronological backtracking.”
Uniform-Cost (UCS)
Enqueue nodes by path cost. That is, let g(n) = cost of the path from the start node to the current node n. Sort nodes by increasing value of g.
Called “Dijkstra’s Algorithm” in the algorithms literature and similar to the “Branch and Bound Algorithm” in the operations research literature.
Complete (*)
Optimal/Admissible (*)
Admissibility depends on the goal test being applied when a node is removed from the nodes list, not when its parent node is expanded and the node is first generated.
Exponential time and space complexity, $O(b^d)$.
Depth-First Iterative Deepening (DFID)
First do DFS to depth 0 (i.e., treat the start node as having no successors), then, if no solution is found, do DFS to depth 1, etc.
    until solution found do
        DFS with depth cutoff c
        c = c + 1
Complete.
Optimal/Admissible if all operators have the same cost. Otherwise, not optimal, but guarantees finding a solution of shortest length (like BFS).
Time complexity seems worse than BFS or DFS because nodes near the top of the search tree are generated multiple times, but because almost all of the nodes are near the bottom of a tree, the worst-case time complexity is still exponential, $O(b^d)$.
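The loop above can be sketched as a depth-limited recursive search wrapped in an increasing cutoff. This is an illustrative sketch under stated assumptions: `successors` is a caller-supplied function returning a node's children, `max_depth` is a hypothetical safety bound not mentioned in the slides, and cycles are avoided only along the current path.

```python
def depth_limited(successors, node, is_goal, limit, path):
    """DFS from node that refuses to go deeper than limit edges."""
    if is_goal(node):
        return path
    if limit == 0:
        return None                      # treat node as having no successors
    for child in successors(node):
        if child in path:                # avoid cycles along the current path
            continue
        found = depth_limited(successors, child, is_goal, limit - 1, path + [child])
        if found is not None:
            return found
    return None

def iterative_deepening(successors, start, is_goal, max_depth=50):
    """Run depth-limited DFS with cutoff 0, 1, 2, ... until a solution is found."""
    for cutoff in range(max_depth + 1):
        found = depth_limited(successors, start, is_goal, cutoff, [start])
        if found is not None:
            return found
    return None

# Hypothetical graph where plain left-to-right DFS would find the longer path S, A, C, G
graph = {"S": ["A", "B"], "A": ["C"], "C": ["G"], "B": ["G"], "G": []}
solution = iterative_deepening(lambda n: graph[n], "S", lambda n: n == "G")
```

On this graph the search returns the two-edge path through B, because the cutoff reaches depth 2 before the three-edge path through A and C can be completed.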
Depth-First Iterative Deepening
If the branching factor is b and the solution is at depth d, then nodes at depth d are generated once, nodes at depth d-1 are generated twice, etc.
IDS: $d\,b + (d-1)\,b^2 + \cdots + 2\,b^{d-1} + b^d = O(b^d)$.
If b = 4, then the worst case is $1.78 \cdot 4^d$, i.e., 78% more nodes searched than exist at depth d (in the worst case).
However, let’s compare this to the time spent on BFS:
BFS: $b + b^2 + \cdots + b^d + (b^{d+1} - b) = O(b^d)$.
Same time complexity of $O(b^d)$, but BFS expands some nodes at depth d+1, which can make a HUGE difference. With b = 10, d = 5:
BFS: 10 + 100 + 1,000 + 10,000 + 100,000 + 999,990 = 1,111,100
IDS: 50 + 400 + 3,000 + 20,000 + 100,000 = 123,450
IDS can actually be quicker in practice than BFS, even though it regenerates early states.
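The two node-count formulas can be evaluated directly to check the comparison. The function names below are illustrative; the formulas are the IDS and BFS sums stated above.

```python
def ids_nodes(b, d):
    """Nodes generated by iterative deepening: depth-k nodes are generated (d - k + 1) times."""
    return sum((d - k + 1) * b ** k for k in range(1, d + 1))

def bfs_nodes(b, d):
    """Nodes generated by BFS, including the extra depth d+1 nodes (b^(d+1) - b)."""
    return sum(b ** k for k in range(1, d + 1)) + (b ** (d + 1) - b)

ids_count = ids_nodes(10, 5)
bfs_count = bfs_nodes(10, 5)
ratio_b4 = ids_nodes(4, 20) / 4 ** 20    # approaches 16/9, about 1.78, as d grows
```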
Depth-First Iterative Deepening
Exponential time complexity, $O(b^d)$, like BFS.
Linear space complexity, $O(bd)$, like DFS.
Has the advantage of BFS (i.e., completeness) and also the advantages of DFS (i.e., limited space and finding longer paths more quickly).
Generally preferred for large state spaces where the solution depth is unknown.