COSC 3101A - Design and Analysis of Algorithms
Lecture 10: BFS, DFS, Topological Sort, Strongly Connected Components
Many of these slides are taken from Monica Nicolescu, Univ. of Nevada, Reno, monica@cs.unr.edu
Searching in a Graph
Graph searching: systematically follow the edges of the graph so as to visit its vertices.
Two basic graph-searching algorithms:
  Breadth-first search (BFS)
  Depth-first search (DFS)
They differ in the order in which they explore the unvisited edges of the graph.
Many graph algorithms are elaborations of these basic graph-searching algorithms.
7/6/2004 Lecture 10 COSC3101A
Breadth-First Search (BFS)
Input: a graph G = (V, E) (directed or undirected) and a source vertex s ∈ V
Goal: explore the edges of G to "discover" every vertex reachable from s, taking the vertices closest to s first
Output:
  d[v] = distance (smallest number of edges) from s to v, for all v ∈ V
  a "breadth-first tree" rooted at s that contains all reachable vertices
Breadth-First Search (cont.)
Discover vertices in increasing order of distance from the source s: search in breadth, not depth.
Find all vertices at 1 edge from s, then all vertices at 2 edges from s, and so on.
Breadth-First Search (cont.)
Keeping track of progress: color each vertex white, gray, or black.
  Initially, all vertices are white.
  When a vertex is discovered, it becomes gray.
  After all of its adjacent vertices have been discovered, the vertex becomes black.
Use a FIFO queue Q to maintain the set of gray vertices.
Breadth-First Tree
BFS constructs a breadth-first tree:
  Initially it contains only the root (the source vertex s).
  When vertex v is discovered while scanning the adjacency list of a vertex u, vertex v and edge (u, v) are added to the tree, and u becomes the predecessor (parent) of v in the breadth-first tree.
  A vertex is discovered only once, so it has at most one parent.
BFS Additional Data Structures
G = (V, E) is represented using adjacency lists.
color[u]: the color of vertex u, for all u ∈ V
π[u]: the predecessor of u; if u = s (the root) or u has not yet been discovered, π[u] = NIL
d[u]: the distance from the source s to vertex u
A FIFO queue Q maintains the set of gray vertices.
BFS(G, s)
  for each u ∈ V[G] - {s}
      do color[u] ← WHITE
         d[u] ← ∞
         π[u] ← NIL
  color[s] ← GRAY
  d[s] ← 0
  π[s] ← NIL
  Q ← ∅
  ENQUEUE(Q, s)
(Figure: example graph with vertices r, s, t, u, v, w, x, y; initially Q: s.)
BFS(G, s) (cont.)
  while Q ≠ ∅
      do u ← DEQUEUE(Q)
         for each v ∈ Adj[u]
             do if color[v] = WHITE
                   then color[v] ← GRAY
                        d[v] ← d[u] + 1
                        π[v] ← u
                        ENQUEUE(Q, v)
         color[u] ← BLACK
(Figure: s is dequeued and its white neighbors w and r are discovered at distance 1; Q: w, r.)
Example
(Figure sequence: BFS from s on the example graph, one step per dequeue.)
Queue contents after each step:
  Q: s
  Q: w, r
  Q: r, t, x
  Q: t, x, v
  Q: x, v, u
  Q: v, u, y
  Q: u, y
  Q: y
  Q: ∅
Final distances: d[s] = 0; d[r] = d[w] = 1; d[t] = d[v] = d[x] = 2; d[u] = d[y] = 3
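The BFS pseudocode can be sketched in Python as follows. This assumes the graph is a dict that maps each vertex to its adjacency list. In the sketch, None stands in for NIL, math.inf stands in for ∞, and collections.deque serves as the FIFO queue. The text does not give the example's edge set, so the undirected graph below is an assumption. It was chosen so that the queue evolves exactly as in the trace above.

```python
from collections import deque
from math import inf

WHITE, GRAY, BLACK = "WHITE", "GRAY", "BLACK"


def bfs(adj, s):
    """BFS from source s over a graph given as adjacency lists.

    Returns (d, pi): d[v] is the smallest number of edges from s to v
    (inf if v is unreachable) and pi[v] is v's parent in the
    breadth-first tree (None plays the role of NIL).
    """
    color = {u: WHITE for u in adj}
    d = {u: inf for u in adj}
    pi = {u: None for u in adj}
    color[s] = GRAY
    d[s] = 0
    q = deque([s])                    # FIFO queue holding the gray vertices
    while q:
        u = q.popleft()               # DEQUEUE
        for v in adj[u]:
            if color[v] == WHITE:     # v is discovered for the first time
                color[v] = GRAY
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)           # ENQUEUE
        color[u] = BLACK              # every neighbor of u has been examined
    return d, pi


# Undirected example graph (each edge listed in both directions).
# Assumed edge set, chosen to reproduce the queue trace of the example.
adj = {
    "r": ["s", "v"], "s": ["w", "r"], "t": ["w", "x", "u"],
    "u": ["t", "x", "y"], "v": ["r"], "w": ["s", "t", "x"],
    "x": ["w", "t", "u", "y"], "y": ["x", "u"],
}
d, pi = bfs(adj, "s")
print(d)
```

On this graph the sketch yields the same distances as the example: d[s] = 0, d[r] = d[w] = 1, d[t] = d[v] = d[x] = 2, d[u] = d[y] = 3.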
Analysis of BFS
Initialization:
  The for loop over u ∈ V - {s} (color[u] ← WHITE, d[u] ← ∞, π[u] ← NIL) takes O(V).
  Initializing s and the queue (color[s] ← GRAY, d[s] ← 0, π[s] ← NIL, Q ← ∅, ENQUEUE(Q, s)) takes Θ(1).
Analysis of BFS (cont.)
Main loop (while Q ≠ ∅ ...):
  A vertex is enqueued only when it is white, so each vertex is enqueued, and dequeued, at most once. Each queue operation takes Θ(1), so queue operations take O(V) in total.
  Adj[u] is scanned only when u is dequeued, so each adjacency list is scanned at most once.
  The sum of the lengths of all adjacency lists is Θ(E), so scanning takes O(E) in total.
Total running time for BFS = O(V + E)
Shortest Paths Property
BFS finds the shortest-path distance from the source vertex s ∈ V to each vertex in the graph: when BFS terminates, d[v] = δ(s, v) for all v ∈ V.
Shortest-path distance δ(s, u): the minimum number of edges in any path from s to u (∞ if u is not reachable from s).
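Each π link points to the vertex from which its vertex was discovered. A shortest path from s to v can therefore be read off the breadth-first tree by following π backwards from v. The helper below, shortest_path, is a hypothetical illustration and not part of the slides. It assumes π is a dict with None for NIL. The sample π is the breadth-first tree implied by the queue trace of the BFS example: each vertex's parent is the vertex being dequeued when it was enqueued.

```python
def shortest_path(pi, s, v):
    """Return the vertices of a shortest s-to-v path, following predecessor
    links back from v; return None if v is not in s's breadth-first tree."""
    path = []
    while v is not None:
        path.append(v)
        if v == s:
            return path[::-1]     # we walked from v back to s, so reverse
        v = pi[v]
    return None                   # reached a NIL link without meeting s


# Breadth-first tree from s implied by the queue trace of the BFS example.
pi = {"s": None, "w": "s", "r": "s", "t": "w", "x": "w",
      "v": "r", "u": "t", "y": "x"}
print(shortest_path(pi, "s", "y"))    # ['s', 'w', 'x', 'y']
```

The returned path has 3 edges, which matches d[y] = 3 in the example.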
Depth-First Search
Input: G = (V, E) (no source vertex is given!)
Goal: explore the edges of G to "discover" every vertex in V, always continuing from the most recently discovered vertex
  The search may be repeated from multiple sources.
Output:
  2 timestamps on each vertex: d[v] = discovery time, f[v] = finishing time (done with examining v's adjacency list)
  a depth-first forest
Depth-First Search (cont.)
Search "deeper" in the graph whenever possible:
  Edges are explored out of the most recently discovered vertex v that still has unexplored edges.
  After all edges of v have been explored, the search "backtracks" to the parent of v, the vertex from which v was discovered.
  The process continues until all vertices reachable from the original source have been discovered.
  If undiscovered vertices remain, choose one of them as a new source and repeat the search from that vertex.
DFS creates a "depth-first forest".
DFS Additional Data Structures
Global variable time: incremented whenever a vertex is discovered or finished
color[u]: as in BFS; white before discovery, gray while being processed, black when finished
π[u]: predecessor of u
d[u], f[u]: discovery and finishing times
Each vertex is discovered once and finished once, so 1 ≤ d[u] < f[u] ≤ 2|V|.
(Figure: on a time line from 1 to 2|V|, u is WHITE before d[u], GRAY from d[u] to f[u], and BLACK after f[u].)
DFS(G)
  for each u ∈ V[G]
      do color[u] ← WHITE
         π[u] ← NIL
  time ← 0
  for each u ∈ V[G]
      do if color[u] = WHITE
            then DFS-VISIT(u)
Every time DFS-VISIT(u) is called from this main loop, u becomes the root of a new tree in the depth-first forest.
(Figure: example directed graph with vertices u, v, w, x, y, z.)
DFS-VISIT(u)
  color[u] ← GRAY
  time ← time + 1
  d[u] ← time
  for each v ∈ Adj[u]
      do if color[v] = WHITE
            then π[v] ← u
                 DFS-VISIT(v)
  color[u] ← BLACK
  time ← time + 1
  f[u] ← time
(Figure: first steps on the example graph; u is discovered at time 1, then v at time 2.)
Example
(Figure sequence: DFS on the directed graph with vertices u, v, w, x, y, z. Discovery order: u at time 1, v at 2, y at 3, x at 4; then x finishes at 5, y at 6, and v at 7. A back edge (B) and a forward edge (F) are labeled along the way.)
Example (cont.)
(Figure sequence, continued: u finishes at 8; w is discovered at 9 and a cross edge (C) is labeled; z is discovered at 10 and finishes at 11; w finishes at 12. Final timestamps d/f: u 1/8, v 2/7, w 9/12, x 4/5, y 3/6, z 10/11.)
The results of DFS may depend on:
  the order in which vertices are explored in procedure DFS
  the order in which the neighbors of a vertex are visited in DFS-VISIT
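The DFS and DFS-VISIT pseudocode can be sketched recursively in Python as follows. This assumes adjacency lists in a dict. A closure variable updated with nonlocal replaces the global time. The optional order argument controls the main loop, which is one of the two sources of variation listed above. The example's edge set is not in the text, so the graph below is an assumption, chosen to be consistent with the final timestamps of the example.

```python
WHITE, GRAY, BLACK = "WHITE", "GRAY", "BLACK"


def dfs(adj, order=None):
    """DFS(G): visit every vertex, starting a new tree at each white vertex.

    adj maps each vertex to its adjacency list; order (optional) is the
    order in which the main loop tries vertices. Returns (d, f, pi).
    """
    color = {u: WHITE for u in adj}
    pi = {u: None for u in adj}       # None plays the role of NIL
    d, f = {}, {}
    time = 0

    def visit(u):                     # DFS-VISIT(u)
        nonlocal time
        color[u] = GRAY               # u has just been discovered
        time += 1
        d[u] = time
        for v in adj[u]:              # explore edge (u, v)
            if color[v] == WHITE:
                pi[v] = u
                visit(v)
        color[u] = BLACK              # u's adjacency list is exhausted
        time += 1
        f[u] = time

    for u in (adj if order is None else order):
        if color[u] == WHITE:         # u becomes the root of a new tree
            visit(u)
    return d, f, pi


# Directed example graph; assumed edge set consistent with the example's timestamps.
adj = {"u": ["v", "x"], "v": ["y"], "w": ["y", "z"],
       "x": ["v"], "y": ["x"], "z": ["z"]}
d, f, pi = dfs(adj)
for v in adj:
    print(f"{v}: {d[v]}/{f[v]}")
```

This prints the timestamps shown in the example: u 1/8, v 2/7, w 9/12, x 4/5, y 3/6, z 10/11. The recursive form follows the slides most closely. However, Python's default recursion limit (about 1000 frames) makes it unsuitable for very deep graphs. An explicit stack avoids that problem.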
Edge Classification
Tree edge (reaches a WHITE vertex): (u, v) is a tree edge if v was first discovered by exploring edge (u, v).
Back edge (reaches a GRAY vertex): (u, v) connects a vertex u to an ancestor v in a depth-first tree.
  Self-loops (in directed graphs) are also back edges.
(Figure: the back edge found in the example is labeled B.)
Edge Classification (cont.)
Forward edge (reaches a BLACK vertex and d[u] < d[v]): a non-tree edge (u, v) that connects a vertex u to a descendant v in a depth-first tree.
Cross edge (reaches a BLACK vertex and d[u] > d[v]): can go between vertices in the same depth-first tree (as long as neither vertex is an ancestor of the other) or between different depth-first trees.
(Figure: edges labeled B, F, and C in the example.)
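The color-based rules above can be applied during the search itself, as in the sketch below for a directed graph. classify_edges is a hypothetical helper. It stores only a discovery counter rather than the slides' d[u], because comparing d[u] with d[v] needs only the relative order of discovery. The example edge set is the same assumed one used for the DFS example.

```python
def classify_edges(adj):
    """Run DFS over a directed graph and label every edge (u, v) by the
    color of v when the edge is explored: WHITE -> tree ('T'),
    GRAY -> back ('B'), BLACK -> forward ('F') if d[u] < d[v], else cross ('C')."""
    color = {u: "WHITE" for u in adj}
    d = {}                    # discovery order only (not full timestamps)
    kind = {}

    def visit(u):
        color[u] = "GRAY"
        d[u] = len(d) + 1     # next discovery number
        for v in adj[u]:
            if color[v] == "WHITE":
                kind[(u, v)] = "T"
                visit(v)
            elif color[v] == "GRAY":
                kind[(u, v)] = "B"    # v is an ancestor of u (or v == u)
            elif d[u] < d[v]:
                kind[(u, v)] = "F"    # v is an already finished descendant of u
            else:
                kind[(u, v)] = "C"    # no ancestor/descendant relation
        color[u] = "BLACK"

    for u in adj:
        if color[u] == "WHITE":
            visit(u)
    return kind


# Same assumed example graph as in the DFS example.
adj = {"u": ["v", "x"], "v": ["y"], "w": ["y", "z"],
       "x": ["v"], "y": ["x"], "z": ["z"]}
print(classify_edges(adj))
```

On this graph, (x, v) and the self-loop (z, z) come out as back edges, (u, x) as a forward edge, and (w, y) as a cross edge.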
Analysis of DFS(G)
The initialization loop (color[u] ← WHITE, π[u] ← NIL for each u ∈ V[G]) takes Θ(V).
The main loop (call DFS-VISIT(u) for each white u ∈ V[G]) takes Θ(V), exclusive of the time spent in DFS-VISIT.
Analysis of DFS-VISIT(u)
DFS-VISIT is called exactly once for each vertex v: it is called only on white vertices, and it immediately colors its argument gray.
The for loop in DFS-VISIT(v) executes |Adj[v]| times.
Total: Σ_{v ∈ V} |Adj[v]| + Θ(V) = Θ(E) + Θ(V) = Θ(V + E)
Properties of DFS
u = π[v] ⇔ DFS-VISIT(v) was called during a search of u's adjacency list.
Vertex v is a descendant of vertex u in the depth-first forest ⇔ v is discovered during the time in which u is gray.
Parenthesis Theorem
In any DFS of a graph G, for all vertices u and v, exactly one of the following holds:
  [d[u], f[u]] and [d[v], f[v]] are disjoint, and neither of u and v is a descendant of the other
  [d[v], f[v]] is entirely within [d[u], f[u]], and v is a descendant of u
  [d[u], f[u]] is entirely within [d[v], f[v]], and u is a descendant of v
Example (timestamps d/f): s 1/10, z 2/9, y 3/6, x 4/5, w 7/8, t 11/16, v 12/13, u 14/15
Writing "(v" at time d[v] and "v)" at time f[v], for times 1 through 16:
  (s (z (y (x x) y) (w w) z) s) (t (v v) (u u) t)
This is a well-formed expression: the parentheses are properly nested.
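The theorem can be checked mechanically on the example's timestamps. In the sketch below, parenthesize turns the (d, f) pairs into the parenthesis string. intervals_ok, a hypothetical helper, tests that every pair of intervals is either disjoint or nested. The timestamps are the ones listed above. Sorting works because every timestamp is distinct.

```python
def parenthesize(times):
    """times maps vertex -> (d, f). Emit '(v' at time d and 'v)' at time f."""
    events = []
    for v, (dv, fv) in times.items():
        events.append((dv, "(" + v))
        events.append((fv, v + ")"))
    return " ".join(token for _, token in sorted(events))


def intervals_ok(times):
    """True iff every pair of [d, f] intervals is disjoint or nested."""
    items = list(times.values())
    for i, (du, fu) in enumerate(items):
        for dv, fv in items[i + 1:]:
            disjoint = fu < dv or fv < du
            nested = du < dv < fv < fu or dv < du < fu < fv
            if not (disjoint or nested):
                return False
    return True


times = {"s": (1, 10), "z": (2, 9), "y": (3, 6), "x": (4, 5),
         "w": (7, 8), "t": (11, 16), "v": (12, 13), "u": (14, 15)}
print(parenthesize(times))
# (s (z (y (x x) y) (w w) z) s) (t (v v) (u u) t)
```

A pair of overlapping but non-nested intervals, such as (1, 3) and (2, 4), can never arise from a DFS. intervals_ok rejects such a pair.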
Other Properties of DFS
Corollary: vertex v is a proper descendant of u ⇔ d[u] < d[v] < f[v] < f[u].
Theorem (White-Path Theorem): in a depth-first forest of a graph G, vertex v is a descendant of u if and only if, at time d[u], there is a path u ⇝ v consisting only of white vertices.
Topological Sort
Topological sort of a directed acyclic graph G = (V, E): a linear order of the vertices such that if there exists an edge (u, v), then u appears before v in the ordering.
Directed acyclic graphs (DAGs) are used to represent precedence of events or processes that have a partial order:
  a before b and b before c ⇒ a before c
  b before c and a before c: what about a and b? The partial order does not say.
Topological sort helps us establish a total order.
Topological Sort (cont.)
TOPOLOGICAL-SORT(V, E)
  call DFS(V, E) to compute finishing times f[v] for each vertex v
  when each vertex is finished, insert it onto the front of a linked list
  return the linked list of vertices
Running time: Θ(V + E) (DFS takes Θ(V + E), and each of the |V| insertions takes O(1))
Example (getting dressed; timestamps d/f): undershorts 11/16, socks 17/18, pants 12/15, shoes 13/14, shirt 1/8, belt 6/7, watch 9/10, tie 2/5, jacket 3/4
Resulting order: socks, undershorts, pants, shoes, watch, shirt, belt, tie, jacket
Topological Sort (cont.)
Topological sort: an ordering of vertices along a horizontal line so that all directed edges go from left to right.
(Figure: the example laid out left to right as socks, undershorts, pants, shoes, watch, shirt, belt, tie, jacket.)
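TOPOLOGICAL-SORT can be sketched in Python as follows, with collections.deque.appendleft playing the role of "insert onto the front of a linked list". In this sketch a visited set replaces the three colors, because only "white or not" matters here. The sketch also assumes its input is a DAG and does not check for cycles. The text does not give the arrows of the dressing example, so the edge set below is an assumption. It was chosen so that, with this dict order, the DFS reproduces the timestamps above.

```python
from collections import deque


def topological_sort(adj):
    """Run DFS and, as each vertex finishes, insert it at the front of the
    output, so vertices come out in decreasing order of finishing time."""
    visited = set()
    order = deque()

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        order.appendleft(u)     # u is finished: insert at the front

    for u in adj:
        if u not in visited:
            visit(u)
    return list(order)


# Getting-dressed example with an assumed edge set.
adj = {
    "shirt": ["tie", "belt"], "tie": ["jacket"], "jacket": [],
    "belt": ["jacket"], "watch": [], "undershorts": ["pants", "shoes"],
    "pants": ["shoes", "belt"], "shoes": [], "socks": ["shoes"],
}
print(topological_sort(adj))
# ['socks', 'undershorts', 'pants', 'shoes', 'watch', 'shirt', 'belt', 'tie', 'jacket']
```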
Lemma
A directed graph G is acyclic ⇔ a DFS on G yields no back edges.
Proof of "⇒" (acyclic ⇒ no back edge), by contrapositive: assume a back edge and derive a cycle.
  Assume there is a back edge (u, v).
  ⇒ v is an ancestor of u in the depth-first forest
  ⇒ there is a path v ⇝ u in G
  ⇒ the path v ⇝ u plus the back edge (u, v) yields a cycle.
Lemma (cont.)
A directed graph G is acyclic ⇔ a DFS on G yields no back edges.
Proof of "⇐" (no back edge ⇒ acyclic), by contrapositive: assume a cycle and derive a back edge.
  Suppose G contains a cycle c.
  Let v be the first vertex of c to be discovered, and let (u, v) be the edge of c that precedes v.
  At time d[v], the vertices of c form a white path v ⇝ u.
  ⇒ u is a descendant of v in the depth-first forest (by the White-Path Theorem)
  ⇒ (u, v) is a back edge.
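The lemma gives a cycle test for directed graphs: run DFS and report a cycle as soon as some edge reaches a GRAY vertex. has_cycle below is a hypothetical helper that sketches this idea. It stops at the first back edge instead of completing the search.

```python
def has_cycle(adj):
    """Return True iff DFS on the directed graph finds a back edge, i.e.
    an edge (u, v) whose head v is still GRAY (an ancestor of u, or u itself)."""
    color = {u: "WHITE" for u in adj}

    def visit(u):
        color[u] = "GRAY"
        for v in adj[u]:
            if color[v] == "GRAY":
                return True               # back edge found: G has a cycle
            if color[v] == "WHITE" and visit(v):
                return True
        color[u] = "BLACK"
        return False

    # The generator re-reads color[u] lazily, so vertices reached by an
    # earlier search are skipped, as in the main loop of DFS(G).
    return any(color[u] == "WHITE" and visit(u) for u in adj)


print(has_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))   # True
print(has_cycle({"a": ["b", "c"], "b": ["c"], "c": []}))  # False
```

In the second graph, the edge (a, c) reaches a BLACK vertex, so it is a forward edge and does not indicate a cycle.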
Strongly Connected Components
Given a directed graph G = (V, E):
A strongly connected component (SCC) of G is a maximal set of vertices C ⊆ V such that for every pair of vertices u, v ∈ C, we have both u ⇝ v and v ⇝ u.
The Transpose of a Graph
G^T = transpose of G: G with all edges reversed
G^T = (V, E^T), E^T = {(u, v) : (v, u) ∈ E}
If G is represented using adjacency lists, G^T can be created in Θ(V + E) time.
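Building G^T from adjacency lists takes one pass over every list, which is where the Θ(V + E) bound comes from. Here is a minimal sketch, assuming the graph is a dict of adjacency lists; the small graph in the usage line is made up for illustration.

```python
def transpose(adj):
    """Return G^T as adjacency lists: same vertices, every edge reversed.
    Each adjacency-list entry is touched once, so this takes Theta(V + E)."""
    adj_t = {u: [] for u in adj}
    for u, neighbors in adj.items():
        for v in neighbors:
            adj_t[v].append(u)        # (u, v) in E becomes (v, u) in E^T
    return adj_t


print(transpose({1: [2], 2: [3, 1], 3: []}))   # {1: [2], 2: [1], 3: [2]}
```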
Finding the SCCs
Observation: G and G^T have the same SCCs.
  u and v are reachable from each other in G ⇔ they are reachable from each other in G^T
Idea for computing the SCCs of a directed graph G = (V, E): make two depth-first searches, one on G and one on G^T.
STRONGLY-CONNECTED-COMPONENTS(G)
  call DFS(G) to compute finishing times f[u] for each vertex u
  compute G^T
  call DFS(G^T), but in the main loop of DFS, consider the vertices in order of decreasing f[u] (as computed in the first DFS)
  output the vertices of each tree in the depth-first forest formed in the second DFS as a separate SCC
Example
First DFS on G (vertices a through h), timestamps d/f: a 13/14, b 11/16, c 1/10, d 8/9, e 12/15, f 3/4, g 2/7, h 5/6
Vertices in decreasing order of finishing time: b 16, e 15, a 14, c 10, d 9, g 7, h 6, f 4
DFS on G^T:
  start at b: visit a, e
  start at c: visit d
  start at g: visit f
  start at h
Strongly connected components: C1 = {a, b, e}, C2 = {c, d}, C3 = {f, g}, C4 = {h}
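The four steps of STRONGLY-CONNECTED-COMPONENTS can be sketched in Python as follows. As in the other sketches, a visited set stands in for the colors. The example graph's edges are not in the text, so the edge set below is an assumption that yields the components C1 through C4. The dict order (c, then b, ...) is chosen so that the first DFS starts at c and then at b, as in the example. With that order, the second DFS visits the vertices exactly as listed above.

```python
def strongly_connected_components(adj):
    """STRONGLY-CONNECTED-COMPONENTS(G) for a graph given as adjacency lists.
    Returns the trees of the second depth-first forest, in the order found."""
    # 1. DFS on G, recording vertices in increasing order of finishing time.
    finished, seen = [], set()

    def visit_g(u):
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                visit_g(v)
        finished.append(u)            # u finishes now

    for u in adj:
        if u not in seen:
            visit_g(u)

    # 2. Compute G^T.
    adj_t = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            adj_t[v].append(u)

    # 3. DFS on G^T, trying roots in decreasing order of f[u].
    seen.clear()
    components = []

    def visit_gt(u, tree):
        seen.add(u)
        tree.append(u)
        for v in adj_t[u]:
            if v not in seen:
                visit_gt(v, tree)

    for u in reversed(finished):
        if u not in seen:
            tree = []
            visit_gt(u, tree)
            components.append(tree)   # 4. each tree is one SCC
    return components


# Assumed edge set for the example graph.
adj = {"c": ["g", "d"], "b": ["e", "c", "f"], "a": ["b"], "d": ["c", "h"],
       "e": ["a", "f"], "f": ["g"], "g": ["f", "h"], "h": ["h"]}
print(strongly_connected_components(adj))
# [['b', 'a', 'e'], ['c', 'd'], ['g', 'f'], ['h']]
```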
Component Graph
The component graph G^SCC = (V^SCC, E^SCC):
  V^SCC = {v1, v2, …, vk}, where vi corresponds to strongly connected component Ci
  there is an edge (vi, vj) ∈ E^SCC if G contains a directed edge (x, y) for some x ∈ Ci and y ∈ Cj, with i ≠ j
The component graph is a DAG.
(Figure: the example graph and its component graph, with one vertex each for {a, b, e}, {c, d}, {f, g}, and {h}.)
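Given the SCCs, the edges of G^SCC follow directly from the definition. component_graph below is a hypothetical helper that numbers the components 0, 1, 2, ... in list order. The graph reuses the assumed edge set from the SCC example, with C1 through C4 as indices 0 through 3.

```python
def component_graph(adj, components):
    """Build the edge set of G^SCC: vertex i stands for components[i], and
    (i, j) is an edge if G has an edge (x, y) with x in C_i, y in C_j, i != j."""
    comp = {v: i for i, c in enumerate(components) for v in c}
    edges = set()
    for x in adj:
        for y in adj[x]:
            if comp[x] != comp[y]:      # edges inside one SCC are dropped
                edges.add((comp[x], comp[y]))
    return edges


# Example graph (assumed edge set) and its SCCs C1..C4 as indices 0..3.
adj = {"c": ["g", "d"], "b": ["e", "c", "f"], "a": ["b"], "d": ["c", "h"],
       "e": ["a", "f"], "f": ["g"], "g": ["f", "h"], "h": ["h"]}
components = [{"a", "b", "e"}, {"c", "d"}, {"f", "g"}, {"h"}]
print(sorted(component_graph(adj, components)))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

Every edge goes from a lower index to a higher one, so this component graph has no cycle, as the slide states.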
Lemma 1
Let C and C' be distinct SCCs in G, let u, v ∈ C and u', v' ∈ C', and suppose there is a path u ⇝ u' in G. Then there cannot also be a path v' ⇝ v in G.
Proof: Suppose there is a path v' ⇝ v.
  There exists a path u ⇝ u' ⇝ v' (u' ⇝ v' because u' and v' are in the same SCC C').
  There exists a path v' ⇝ v ⇝ u (v ⇝ u because v and u are in the same SCC C).
  So u and v' are reachable from each other, and therefore they are not in separate SCCs: contradiction!
Notations
Extend the notation for d (discovery time) and f (finishing time) to sets of vertices U ⊆ V:
  d(U) = min_{u ∈ U} { d[u] } (earliest discovery time)
  f(U) = max_{u ∈ U} { f[u] } (latest finishing time)
In the example: d(C1) = 11, f(C1) = 16; d(C2) = 1, f(C2) = 10; d(C3) = 2, f(C3) = 7; d(C4) = 5, f(C4) = 6
Lemma 2
Let C and C' be distinct SCCs in a directed graph G = (V, E). If there is an edge (u, v) ∈ E, where u ∈ C and v ∈ C', then f(C) > f(C').
Example: C1 and C2 are connected by edge (b, c), and f(C1) = 16 > f(C2) = 10.
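The set notation and Lemma 2 can be checked on the example. The sketch below computes d(U) and f(U) from the first-DFS timestamps listed above. It then asserts f(C) > f(C') for every edge that crosses between components. The timestamps and components come from the example. The edge list is an assumption (the same one used in the SCC sketch), because the figure is not in the text.

```python
# First-DFS timestamps (d, f) from the example, and its SCCs.
times = {"a": (13, 14), "b": (11, 16), "c": (1, 10), "d": (8, 9),
         "e": (12, 15), "f": (3, 4), "g": (2, 7), "h": (5, 6)}
components = {"C1": {"a", "b", "e"}, "C2": {"c", "d"},
              "C3": {"f", "g"}, "C4": {"h"}}
# Assumed edge set of the example graph.
edges = [("a", "b"), ("b", "c"), ("b", "e"), ("b", "f"), ("c", "d"),
         ("c", "g"), ("d", "c"), ("d", "h"), ("e", "a"), ("e", "f"),
         ("f", "g"), ("g", "f"), ("g", "h"), ("h", "h")]


def d_of(U):
    """d(U): earliest discovery time of any vertex in U."""
    return min(times[u][0] for u in U)


def f_of(U):
    """f(U): latest finishing time of any vertex in U."""
    return max(times[u][1] for u in U)


name_of = {v: name for name, U in components.items() for v in U}

for name, U in components.items():
    print(name, d_of(U), f_of(U))

# Lemma 2: every edge from C to a different C' satisfies f(C) > f(C').
for u, v in edges:
    C, C_prime = name_of[u], name_of[v]
    if C != C_prime:
        assert f_of(components[C]) > f_of(components[C_prime]), (u, v)
```

The printed values match the Notations slide: C1 11 16, C2 1 10, C3 2 7, C4 5 6.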
Corollary
Let C and C' be distinct SCCs in a directed graph G = (V, E). If there is an edge (u, v) ∈ E^T, where u ∈ C and v ∈ C', then f(C) < f(C').
Example: consider C2 and C1, connected by edge (c, b) ∈ E^T, so C = C2 and C' = C1.
  Since (c, b) ∈ E^T ⇒ (b, c) ∈ E
  From Lemma 2: f(C1) > f(C2), i.e., f(C') > f(C)
Discussion
f(C) < f(C'): each edge in G^T that goes between different components goes from a component with an earlier finishing time (in the first DFS) to one with a later finishing time.
Why Does the SCC Algorithm Work?
When we do the second DFS, on G^T, we start with the component C for which f(C) is maximum (C1, containing b, in our case).
  We start from b and visit all vertices in C1.
  From the corollary: since f(C) > f(C') for all C' ≠ C, there are no edges from C to any other SCC in G^T.
  So DFS will visit only vertices in C1.
  The depth-first tree rooted at b contains exactly the vertices of C1.
Why Does the SCC Algorithm Work? (cont.)
The next root chosen in the second DFS is in SCC C2, for which f(C) is maximum over all SCCs other than C1.
  DFS visits all vertices in C2.
  The only edges out of C2 in G^T go to C1, which we have already visited.
  So the only tree edges will be to vertices in C2.
Each time we choose a new root, it can reach only:
  vertices in its own component
  vertices in components already visited
Readings
Chapter 22, Appendix B