BFS and DFS
  BFS and DFS in directed graphs
  BFS in undirected graphs
  An improved undirected BFS algorithm
The Buffered Repository Tree (BRT)
  Stores key-value pairs (k, v).
  Supported operations:
    INSERT(k, v): inserts a new pair (k, v) into T.
    EXTRACT(k): extracts all pairs with key k.
  Complexity:
    INSERT: O((1/B) log_2(N/B)) I/Os amortized.
    EXTRACT: O(log_2(N/B) + K/B) I/Os amortized (K = number of reported elements).
The Buffered Repository Tree (BRT)
  A (2,4)-tree.
  Leaves store between B/4 and B elements.
  Internal nodes have buffers of size B.
  The root is kept in main memory, the rest on disk.
INSERT(k, v)
  Emptying a buffer of size X ≥ B costs O(X/B) I/Os.
  Amortized charge per element and level: O(1/B).
  Height of the tree: O(log_2(N/B)).
  Insertion cost: O((1/B) log_2(N/B)) I/Os amortized.
EXTRACT(k)
  Number of traversed nodes: O(log_2(N/B) + K/B).
  I/Os per node: O(1).
  Cost of the operation: O(log_2(N/B) + K/B) I/Os.
  But be careful with the removal of extracted elements.
Cost of Rebalancing
  O(N/B) leaf creations and deletions
  ⇒ O(N/B) node splits, fusions, and merges.
  Each such operation costs O(1) I/Os
  ⇒ O(N/B) I/Os for rebalancing.
  Theorem: The BRT supports INSERT and EXTRACT operations in O((1/B) log_2(N/B)) and O(log_2(N/B) + K/B) I/Os amortized, respectively.
Directed DFS
  The algorithm proceeds like the internal-memory algorithm:
    Use a stack to determine the order in which vertices are visited.
    For the current vertex v:
      Find an unvisited out-neighbor w.
      Push w on the stack.
      Continue the search at w.
    If no unvisited out-neighbor exists:
      Remove v from the stack.
      Continue the search at v's parent.
  Stack operations cost O(N/B) I/Os.
  Problem: finding an unvisited vertex.
Directed DFS
  Data structures:
    BRT T: stores directed edges (v, w) with key v.
    Priority queues P(v), one per vertex: P(v) stores the unexplored out-edges of v.
  Invariant: every out-edge of v is in one of three states (color-coded in the figure):
    not in P(v);
    in P(v) and in T;
    in P(v), but not in T.
Directed DFS
  Finding the next vertex after vertex v (cost per step; total over the whole algorithm):
    EXTRACT(v): retrieve the red edges from T.
      O(log_2(|E|/B) + K_1/B); total O(|V| log_2(|E|/B) + |E|/B).
    Remove these edges from P(v) using DELETE.
      O(sort(K_1)); total O(|V| + sort(|E|)).
    Retrieve the next edge (v, w) using DELETEMIN on P(v).
      O((1/B) log_m(|E|/B)) amortized; total O(sort(|E|)).
    Insert the in-edges of w into T.
      O(1 + (K_2/B) log_2(|E|/B)); total O((|E|/B) log_2(|E|/B)).
    Push w on the stack.
      O(1/B) amortized; total O(|V|/B).
  Total: O((|V| + |E|/B) log_2(|E|/B)) I/Os.
Directed DFS + BFS
  BFS can be solved using the same algorithm.
  Only modification: use a queue (FIFO) instead of a stack.
  Theorem: Depth-first search and breadth-first search in a directed graph G = (V, E) can be solved in O((|V| + |E|/B) log_2(|E|/B)) I/Os.
  Exercise: Convince yourself that the priority queues P(v) are not necessary in the case of BFS.
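The interplay of T and the queues P(v) can be traced with a small in-memory sketch. It is an illustration under assumptions, not the external-memory implementation: a dict of lists stands in for the BRT, a binary heap with lazy deletion stands in for each P(v), the graph is assumed to be simple, and the names `brt_style_dfs`, `T`, `P`, and `dead` are hypothetical. It reproduces the visiting order only, not the I/O behavior.

```python
import heapq
from collections import defaultdict


def brt_style_dfs(adj, source):
    """Directed DFS that never tests "is w visited?" directly.

    adj maps every vertex to the list of its out-neighbors.
    T (stand-in for the BRT) holds edge (v, w) under key v once w is visited.
    P[v] (stand-in for the priority queue of v) holds unexplored out-edges.
    """
    in_nbrs = defaultdict(list)
    for v, ws in adj.items():
        for w in ws:
            in_nbrs[w].append(v)

    P = {v: list(ws) for v, ws in adj.items()}
    for heap in P.values():
        heapq.heapify(heap)
    dead = defaultdict(set)      # entries of P[v] that were deleted lazily
    T = defaultdict(list)        # key v -> edges (v, w) whose head w is visited

    def visit(w):
        # Insert all in-edges (u, w) of w into T, keyed by their tail u.
        for u in in_nbrs[w]:
            T[u].append((u, w))

    order, stack = [source], [source]
    visit(source)
    while stack:
        v = stack[-1]
        # EXTRACT(v): edges whose heads were visited since v was last on top.
        for _, w in T.pop(v, []):
            dead[v].add(w)       # DELETE from P(v), applied lazily
        nxt = None
        while P[v]:              # DELETEMIN, skipping deleted entries
            w = heapq.heappop(P[v])
            if w not in dead[v]:
                nxt = w
                break
        if nxt is None:
            stack.pop()          # no unvisited out-neighbor: backtrack
        else:
            visit(nxt)
            order.append(nxt)
            stack.append(nxt)
    return order
```

Replacing the stack by a FIFO queue in this sketch would give the BFS variant mentioned on the slide.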
Undirected BFS
  Partition the graph into levels L(0), L(1), ... around the source.
  Observation: For v ∈ L(i), all its neighbors are in L(i - 1) ∪ L(i) ∪ L(i + 1).
  ⇒ Build the BFS-tree level by level:
    Initially, L(0) = {r}.
    Given levels L(i - 1) and L(i):
      Let X(i) = set of all neighbors of vertices in L(i).
      Let L(i + 1) = X(i) \ (L(i - 1) ∪ L(i)).
Undirected BFS
  Constructing L(i + 1):
    Retrieve the adjacency lists of the vertices in L(i) ⇒ X(i).
    Sort X(i).
    Scan L(i - 1), L(i), and X(i) to
      remove duplicates from X(i) and
      compute X(i) \ (L(i - 1) ∪ L(i)).
  Complexity:
    O(|L(i)| + sort(|L(i - 1)| + |X(i)|)) I/Os per level,
    O(|V| + sort(|E|)) I/Os in total.
  Theorem: Breadth-first search in an undirected graph G = (V, E) can be solved in O(|V| + sort(|E|)) I/Os.
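The level construction can be sketched in a few lines of Python. This is a minimal in-memory sketch that mirrors the steps above (retrieve, sort, scan); the function name `level_by_level_bfs` is hypothetical, and a Python set stands in for the parallel scan of L(i - 1) and L(i).

```python
def level_by_level_bfs(adj, r):
    """Return the BFS levels L(0), L(1), ... of an undirected graph.

    adj maps every vertex to the list of its neighbors.
    """
    levels = [[r]]
    prev, cur = [], [r]
    while True:
        # X(i): concatenate the adjacency lists of L(i), then sort.
        x = sorted(w for v in cur for w in adj[v])
        # One pass over the sorted X(i) removes duplicates and subtracts
        # L(i - 1) and L(i); the set stands in for scanning those two lists.
        seen = set(prev) | set(cur)
        nxt = []
        for w in x:
            if w not in seen and (not nxt or nxt[-1] != w):
                nxt.append(w)
        if not nxt:
            return levels
        levels.append(nxt)
        prev, cur = cur, nxt
```

Note that no global "visited" structure is needed: only the two most recent levels are consulted, which is exactly what the observation about undirected graphs permits.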
A Faster BFS-Algorithm
  Problem with the simple BFS-algorithm: random accesses to retrieve adjacency lists.
  Idea for a faster algorithm: load more than one adjacency list at a time.
    Reduces the number of random accesses.
    Causes edges to be involved in more than one iteration of the algorithm.
  ⇒ Trade-off
A Faster BFS-Algorithm (Randomized)
  Let 0 < μ < 1 be a parameter (specified later).
  Two phases:
    Build μ|V| disjoint clusters of diameter O(1/μ).
    Perform a modified version of SIMPLEBFS.
  Clusters C_1, ..., C_q are formed using BFS from a randomly chosen set V' = {r_1, ..., r_q} of masters.
  A vertex is chosen as a master with probability μ (coin flip).
  Observation: E[|V'|] = μ|V|. That is, the expected number of clusters is μ|V|.
Forming Clusters (Randomized)
  Apply SIMPLEBFS to form the clusters:
    L(0) = V'.
    v ∈ C_i if v is a descendant of r_i.
Forming Clusters (Randomized)
  Lemma: The expected diameter of a cluster is 2/μ.
  [Figure: path through the vertices x, v_1, ..., v_k.] E[k] ≤ 1/μ.
  Corollary: The clusters are formed in expected O((1/μ) sort(|E|)) I/Os.
Forming Clusters (Randomized)
  Form files F_1, ..., F_q, one per cluster:
    F_i = concatenation of the adjacency lists of the vertices in C_i.
  Augment every edge (v, w) ∈ F_i with the start position p_j of the file F_j such that w ∈ C_j:
    Edge = triple (v, w, p_j).
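The cluster formation can be illustrated with a short in-memory sketch. It rests on several assumptions that are not on the slides: the graph is connected, at least one master is forced if the coin flips select none, the cluster's master id stands in for the file start position p_j, and the names `form_clusters`, `mu`, and `seed` are hypothetical.

```python
import random
from collections import defaultdict, deque


def form_clusters(adj, mu, seed=0):
    """Grow clusters by a multi-source BFS from randomly chosen masters.

    adj maps every vertex of a connected undirected graph to its neighbors.
    Returns (masters, cluster, files), where cluster[v] is the master of v
    and files[m] lists triples (v, w, master of w) for all v in m's cluster.
    """
    rng = random.Random(seed)
    masters = [v for v in adj if rng.random() < mu]   # coin flip per vertex
    if not masters:                  # assumption: force at least one master
        masters = [next(iter(adj))]
    cluster = {m: m for m in masters}
    queue = deque(masters)           # L(0) = V'
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in cluster:     # w becomes a descendant of v's master
                cluster[w] = cluster[v]
                queue.append(w)
    files = defaultdict(list)
    for v in adj:
        for w in adj[v]:
            # The master id of w's cluster replaces the start position p_j.
            files[cluster[v]].append((v, w, cluster[w]))
    return masters, cluster, files
```

A smaller `mu` yields fewer, larger clusters, which is the trade-off the BFS phase exploits.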
The BFS-Phase
  Maintain a sorted pool H of edges such that the adjacency lists of the vertices in L(i) are contained in H.
  One step (with its cost in I/Os):
    Scan L(i) and H to find the vertices in L(i) whose adjacency lists are not in H.
      O((|L(i)| + |H|)/B)
    Form the list of start positions of the files containing these adjacency lists and remove duplicates.
      O(sort(|L(i)|))
    Retrieve the files, sort them, and merge the resulting list H' with H.
      O(K + sort(|H'|) + |H|/B)
    Scan L(i) and H to build X(i).
      O((|L(i)| + |H|)/B)
    Construct L(i + 1) from L(i - 1), L(i), and X(i) as before.
      O(sort(|L(i)| + |L(i - 1)| + |X(i)|))
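The BFS-Phase
  I/O-complexity of a single step: O(K + |H|/B + sort(|H'| + |L(i - 1)| + |L(i)| + |X(i)|)).
  Expected I/O-complexity in terms of the cluster parameter μ: O(μ|V| + |E|/(μB) + sort(|E|)).
  Choose μ = √(|E|/(|V|B)), which balances the first two terms.
  Theorem: BFS in an undirected graph G = (V, E) can be solved in O(√(|V||E|/B) + sort(|E|)) I/Os.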
Single Source Shortest Paths
  The tournament tree
  SSSP in undirected graphs
  SSSP in planar graphs
Single Source Shortest Paths
  Need:
    An I/O-efficient priority queue
    An I/O-efficient method to update only unvisited vertices
The Tournament Tree
  = I/O-efficient priority queue
  Supports:
    INSERT(x, p)
    DELETE(x)
    DELETEMIN
    DECREASEKEY(x, p)
  All operations take O((1/B) log_2(N/B)) I/Os amortized.
  Note: N = size of the universe ≠ number of elements in the tree.
The Tournament Tree
  Static binary tree over all elements in the universe.
  Elements map to leaves, M elements per leaf.
  Internal nodes store between M/2 and M elements.
  Internal nodes have signal buffers of size M.
  Root in main memory, rest on disk.
The Tournament Tree
  Elements stored at each node are sorted by priority.
  Elements at node v have smaller priority than elements at v's descendants.
  Convention: x ∈ T if and only if p(x) is finite.
The Tournament Tree: Deletions
  Operation DELETE(x) ⇒ signal DELETE(x).
  [Figure: the signal arrives at a node v storing x; labels DELETE(x) and UPDATE(x, ∞).]
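The Tournament Tree: Insertions and Updates
  Operations INSERT(x, p) and DECREASEKEY(x, p) ⇒ signal UPDATE(x, p).
  When the signal reaches a node v with child w:
    x is stored at v with current priority p':
      If p < p': update.
      If p ≥ p': do nothing.
    x is not stored at v:
      All elements at v have priority < p: forward the signal to w.
      At least one element at v has priority ≥ p: insert x at v and send DELETE(x) to w.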
The Tournament Tree: Handling Overflow
  Let y be the element at v with the highest priority p_y.
  Send the signal PUSH(y, p_y) to the appropriate child of v.
The Tournament Tree: Keeping the Nodes Filled
  O(M/B) I/Os to move M/2 elements one level up the tree.
The Tournament Tree: Signal Propagation
  Scan v's signal buffer X and partition it into sets X_u and X_w.
  Load u into memory, apply the signals in X_u to u, and insert the signals into u's signal buffer.
  Do the same for w.
  O((|X| + M)/B) = O(|X|/B) I/Os.
The Tournament Tree: Analysis
  Elements travel up the tree:
    Cost: O(1/B) I/Os amortized per element and level.
    O((K/B) log_2(N/B)) I/Os for K operations.
  Signals travel down the tree:
    Cost: O(1/B) I/Os amortized per signal and level.
    O(K) signals for K operations.
    O((K/B) log_2(N/B)) I/Os.
  Theorem: The tournament tree supports INSERT, DELETE, DELETEMIN, and DECREASEKEY operations in O((1/B) log_2(N/B)) I/Os amortized.
Single Source Shortest Paths
  Modified Dijkstra:
    Retrieve the next vertex v from priority queue Q using DELETEMIN.
    Retrieve v's adjacency list.
    Update the distances of all of v's neighbors, except the predecessor u on the path from s to v.
    Repeat.
  O(|V| + (|E|/B) log_2(|V|/B)) I/Os using a tournament tree.
Single Source Shortest Paths
  Problem: spurious updates, where a vertex v updates a neighbor u that has already been visited.
  Observation: If v performs a spurious update of u, then u has tried to update v before.
  Record this update attempt of u on v by inserting u into another priority queue Q'.
    Priority: d(s, u) + w({u, v}).
Single Source Shortest Paths
  Second modification:
    Retrieve the next vertex using two DELETEMINs, one on Q and one on Q'.
    Let (x, p_x) be the element retrieved from Q, and let (y, p_y) be the element retrieved from Q'.
    If p_x ≤ p_y: re-insert (y, p_y) into Q' and proceed as normal.
    If p_y < p_x: re-insert (x, p_x) into Q and perform a DELETE(y) on Q.
Single Source Shortest Paths
  Lemma: A spurious update is removed from Q before the targeted vertex can be retrieved using DELETEMIN.
  For an edge e = {u, v} along which v performs a spurious update of u:
    Event A: The spurious update happens ("time": d(s, v)).
    Event B: Vertex u is deleted from Q by the retrieval of u from Q' ("time": d(s, u) + w(e)).
    Event C: Vertex u is retrieved from Q using a DELETEMIN operation ("time": d(s, v) + w(e)).
Single Source Shortest Paths
  Assume that all vertices have different distances from the source s.
  ⇒ d(u) < d(v), and therefore d(v) ≤ d(u) + w(e) < d(v) + w(e).
  Sequence of events: A → B → C.
  Theorem: The single source shortest path problem on an undirected graph G = (V, E) can be solved in O(|V| + (|E|/B) log_2(|V|/B)) I/Os.
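The two-queue scheme can be tried out in memory. The sketch below is a simplified illustration, not the external-memory algorithm: a binary heap with lazy deletion stands in for the tournament tree, the rule about skipping the predecessor is omitted, edge weights are assumed positive with pairwise distinct distances, and the names `UpdatablePQ` and `sssp_two_queues` are hypothetical. The point is that no "visited" set is kept; spurious re-insertions into Q are cancelled by the entries of Q'.

```python
import heapq


class UpdatablePQ:
    """In-memory stand-in for the tournament tree (uses lazy deletion)."""

    def __init__(self):
        self.prio = {}
        self.heap = []

    def update(self, x, p):
        # INSERT if x is absent, DECREASEKEY if p improves the stored priority.
        if p < self.prio.get(x, float("inf")):
            self.prio[x] = p
            heapq.heappush(self.heap, (p, x))

    def delete(self, x):
        self.prio.pop(x, None)

    def delete_min(self):
        while self.heap:
            p, x = heapq.heappop(self.heap)
            if self.prio.get(x) == p:    # skip stale heap entries
                del self.prio[x]
                return x, p
        return None


def sssp_two_queues(adj, s):
    """adj maps each vertex to a list of (neighbor, weight) pairs."""
    q = UpdatablePQ()
    q.update(s, 0)
    q2 = []                              # Q': entries (priority, vertex)
    dist, order = {}, []
    while True:
        top = q.delete_min()
        if top is None:
            return dist, order
        x, px = top
        if q2:
            py, y = heapq.heappop(q2)
            if py < px:
                q.update(x, px)          # re-insert x first ...
                q.delete(y)              # ... then cancel the spurious y
                continue
            heapq.heappush(q2, (py, y))  # p_x <= p_y: put y back
        dist[x] = px
        order.append(x)
        for w, wt in adj[x]:
            q.update(w, px + wt)         # spurious if w is already finished
            heapq.heappush(q2, (px + wt, x))
```

Re-inserting x before deleting y matters when x and y are the same vertex: in that case x itself is the spurious entry and must not survive.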
Planar Graphs
  Shortest paths in planar graphs
  Planar separators
  Planar DFS
Shortest Paths in Planar Graphs
  [Figure: graph G with source s and the graph G_R.]
Shortest Paths in Planar Graphs
  Observation: For every separator vertex v, the distances from s to v in G and G_R are the same.
  ⇒ The distances from s to all separator vertices can be computed in G_R.
Shortest Paths in Planar Graphs
  Observation: For every vertex v in G_i, dist(s, v) = min{dist(s, x) + dist(x, v) : x is a boundary vertex of G_i}.
  ⇒ Can compute dist(s, v) in the following graph: [figure not reproduced]
Shortest Paths in Planar Graphs
  Three main steps:
    1. Solve all-pairs shortest paths in the subgraphs G_i.
    2. Compute shortest paths from s to the separator vertices in G_R.
    3. Compute shortest paths from s to all remaining vertices.
Shortest Paths in Planar Graphs
  Regular h-partition:
    O(N/h) subgraphs G_1, ..., G_r.
    Each G_i has size at most h.
    Each G_i has boundary size O(√h).
    Total number of separator vertices: O(N/√h).
    Number of boundary sets is O(N/h).
Shortest Paths in Planar Graphs
  Three main steps:
    1. Solve all-pairs shortest paths in the subgraphs G_i.
    2. Compute shortest paths from s to the separator vertices in G_R.
    3. Compute shortest paths from s to all remaining vertices.
  Assume the given partition is a regular B^2-partition.
  ⇒ Steps 1 and 3 take O(scan(N)) I/Os.
  ⇒ Graph G_R has O(N/B) vertices and O(N) edges.
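The size of G_R can be checked with a short calculation. It is a sketch under two assumptions that the slides do not spell out: a regular h-partition has O(√h) boundary vertices per subgraph and O(N/√h) separator vertices overall, and G_R connects every pair of boundary vertices of the same subgraph G_i by one edge.

```latex
\begin{align*}
h = B^2:\qquad
|V(G_R)| &= O\!\left(\frac{N}{\sqrt{h}}\right) = O\!\left(\frac{N}{B}\right),\\
|E(G_R)| &= O\!\left(\frac{N}{h}\right)\cdot O\!\left(\bigl(\sqrt{h}\bigr)^{2}\right)
          = O\!\left(\frac{N}{B^{2}}\right)\cdot O\!\left(B^{2}\right) = O(N).
\end{align*}
```

So G_R has few vertices but is dense relative to them, which is why the number of vertex retrievals, not the number of edges, is the quantity to keep small.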
Shortest Paths in Planar Graphs
  Data structures:
    List L storing the tentative distances of all vertices.
    Priority queue Q storing vertices with their tentative distances as priorities.
  One step:
    Retrieve the next vertex v using DELETEMIN.
    Get the distances of v's neighbors from L.
    Update their distances in Q using DELETE and INSERT.
  ⇒ O(N + sort(N)) I/Os
Shortest Paths in Planar Graphs
  One I/O per boundary set.
  Each boundary set is touched O(B) times: once per vertex on the boundary of the region.
  O(N/B^2) boundary sets
  ⇒ O(N/B) I/Os
Planar Separator
  Goal: Compute a separator S of size O(N/√h) whose removal partitions G into subgraphs of size at most h.
  Basic idea:
    Compute a hierarchy of log(DB) graphs of geometrically decreasing size using graph contraction.
    Compute a separator of the smallest graph.
    Undo the contractions and maintain the separator while doing this.
  Assumption: M = Ω(h log^2 B)
Planar Separator
  [Figure: graph hierarchy G_0, G_1, G_2.]
  Properties:
    All G_i are planar.
    |G_{i+1}| ≤ |G_i|/2.
    Every vertex in G_{i+1} represents only a constant number of vertices in G_i.
    Every vertex in G_{i+1} represents at most 2^{i+2} vertices in G_0.
    r = log_2(DB); graphs G_0, ..., G_r
    ⇒ |G_r| = O(N/(DB))
Planar Separator
  Compute a separator S_r of G_r:
    Size of S_r: [formula not recoverable]
    S_r partitions G_r into connected components of size at most h log^2(DB).
    Takes O(|G_r|) = O(N/B) I/Os [AD96].
Planar Separator
  Compute S_i from S_{i+1}:
    Start with the set of vertices in G_i represented by the vertices in S_{i+1}.
    The connected components of G_i without this set have size at most c·h log^2(DB).
    Partition every connected component of size more than h log^2(DB) into components of size at most h log^2(DB); the added vertices complete the separator S_i.
  Takes O(sort(|G_i|)) I/Os:
    Connected components: O(sort(|G_i|)).
    Partitioning happens in internal memory.
  Total: O(sort(N)) I/Os
Planar Separator
  Separator S_0 partitions G_0 into connected components of size at most h log^2(DB).
  Size of S_0: [formula not recoverable]
Planar Separator
  Compute a superset S of S_0 so that no connected component of G - S has size more than h:
    Partition every connected component of G - S_0 separately in internal memory.
    Total number of extra separator vertices: [formula not recoverable]
    Extra cost: O(sort(N)) I/Os.
  Theorem: A separator S of size O(N/√h) whose removal partitions G into subgraphs of size at most h can be obtained in O(sort(N)) I/Os, provided that M = Ω(h log^2 B).
Building the Graph Hierarchy
  Properties:
    All G_i are planar.
    |G_{i+1}| ≤ |G_i|/2.
    Every vertex in G_{i+1} represents only a constant number of vertices in G_i.
    Every vertex in G_{i+1} represents at most 2^{i+2} vertices in G_0.
  Build G_{i+1} from G_i by
    contracting edges and
    merging vertices of degree 2 with the same neighbors.
Building the Graph Hierarchy
  Iterative approach:
    Extract the set of edges that can be contracted.
    Contract a subset of these edges to reduce the number of vertices by a factor of two.
    Repeat until no contractible edges remain.
  Problem: The standard graph contraction procedure may contract too many vertices into a single vertex.
Building the Graph Hierarchy
  Solution:
    Compute a maximal matching of the contractible subgraph.
    Contract the edges in the matching.
  New problem: We may not contract a sufficient number of edges to reduce the number of vertices by a constant factor.
  Two-stage contraction:
    Contract a maximal matching.
    Contract edges between matched and unmatched vertices.
Building the Graph Hierarchy
  Why is this two-stage approach good?
    No unmatched vertex remains in the contractible subgraph.
    Every matched vertex represents at least two vertices before the contraction.
  ⇒ The size of the graph reduces by a factor of two.
  ⇒ If a single iteration takes O(sort(|G_i|)) I/Os, the whole construction of G_{i+1} from G_i takes O(sort(|G_i|)) I/Os.
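The two-stage contraction can be sketched in memory as follows. This is a simplified illustration with assumptions: the whole input graph is treated as contractible, planarity and the bound on how many vertices one vertex may absorb are ignored, vertex labels are comparable, and the name `two_stage_contraction` is hypothetical.

```python
def two_stage_contraction(adj):
    """Map every vertex of an undirected graph to the vertex it merges into.

    adj maps every vertex to the list of its neighbors.
    """
    mate = {}
    for v in adj:                        # stage 1: greedy maximal matching
        if v in mate:
            continue
        for w in adj[v]:
            if w != v and w not in mate:
                mate[v], mate[w] = w, v
                break
    # Each matched pair is contracted into its smaller endpoint.
    group = {v: min(v, mate[v]) for v in mate}
    for v in adj:                        # stage 2: attach unmatched vertices
        if v in mate:
            continue
        for w in adj[v]:
            if w in mate:                # holds for every neighbor, by maximality
                group[v] = group[w]
                break
        else:
            group[v] = v                 # isolated vertex: nothing to contract
    return group
```

Because every non-isolated vertex ends up in a group that contains a matched pair, the number of groups is at most half the number of such vertices.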
A Single Contraction Phase
  A maximal matching can be computed and contracted in O(sort(|H|)) I/Os, where H is the current contractible subgraph.
  Bipartite contraction: takes O(sort(|H|)) I/Os using a buffer tree as priority queue.
Building the Graph Hierarchy
  Lemma: Graph G_{i+1} can be constructed from G_i in O(sort(|G_i|)) I/Os.
  Corollary: The whole graph hierarchy can be built in O(sort(|G_0|)) = O(sort(N)) I/Os.
Planar DFS
  [Figure: the graph partitioned into levels 0, 1, and 2 around the source s.]
Observation
  Every cycle in the i-th layer is a boundary cycle of graph G_i.
  ⇒ Every bicomp of a layer is a cycle.
  [Figure labels: Level > i, Level < i.]
DFS in a Layer
Planar DFS
  DFS in a single layer H_i takes O(sort(|H_i|)) I/Os:
    Compute the bicomps.
    Root the bicomp tree.
    Remove one of the edges incident to the parent cutpoint in each cycle.
  ⇒ Total I/O-complexity: O(sort(N))
Planar DFS
  [Figure: graph G_i with vertex v and root r.]
Building the Face-on-Vertex Graph
Lower Bounds and Open Problems
  Lower bounds
    List ranking, BFS, DFS, and shortest paths
    Connected and biconnected components
  Open problems
Lower Bounds Split Proximate Neighbors
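  Lemma: Split proximate neighbors requires Ω(perm(N)) I/Os.
  Reduction (figure): an algorithm for split proximate neighbors using I(N) I/Os, plus O(scan(N)) additional I/Os.
    Total: O(I(N) + scan(N)) = O(I(N))
    ⇒ I(N) = Ω(perm(N))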
Lower Bounds: List Ranking
  Consider general algorithms for weighted list ranking.
  The algorithm is only allowed to use the associativity of the sum operator.
  ⇒ The algorithm can be made to have the following property: for every vertex v, v and succ(v) are both in main memory at some point during the course of the algorithm.
  Note: The lower bound we show does not hold for unweighted list ranking or weighted list ranking over groups.
Lower Bounds: List Ranking
  When both copies of x are in main memory, move them to a buffer of size B.
  When the buffer is full, flush it to disk.
  ⇒ Split proximate neighbors could be solved in O(I(N) + scan(N)) I/Os.
  ⇒ I(N) = Ω(perm(N))
Lower Bounds: List Ranking, BFS, DFS, and Shortest Paths
  Theorem: List ranking requires Ω(perm(N)) I/Os.
  List ranking can be solved using BFS, DFS, or SSSP from the head of the list.
  Theorem: BFS, DFS, and SSSP require Ω(perm(N)) I/Os.
  Note: Again, the lower bound holds only for algorithms that compute distances from the source only by adding path lengths.
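The reduction from list ranking to BFS is short enough to write down. The following is a minimal in-memory sketch with unit weights; the function name `rank_list_by_bfs` and the `succ` dictionary representation are assumptions made for illustration. With SSSP in place of BFS, the edge weights would carry the list weights.

```python
from collections import deque


def rank_list_by_bfs(succ, head):
    """Rank a linked list by running BFS from its head.

    succ maps each list node to its successor (None at the tail).
    """
    # Interpret the list as a graph with one edge per successor pointer.
    adj = {v: ([] if nxt is None else [nxt]) for v, nxt in succ.items()}
    rank = {head: 0}
    queue = deque([head])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in rank:
                rank[w] = rank[v] + 1    # BFS distance from the head = rank
                queue.append(w)
    return rank
```

Since any BFS algorithm yields a list-ranking algorithm this way, a lower bound for list ranking carries over to BFS.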
Lower Bounds: Segmented Duplicate Elimination
  Let P ≤ N ≤ P^2.
  Elements are drawn from the interval [2P+1, 3P].
  Construct a Boolean array C[2P+1..3P] such that C[i] = 1 iff i ∈ S.
  Proposition: Segmented duplicate elimination requires Ω(perm(N)) I/Os.
  [Figure labels: S, P/2.]
Lower Bounds: Connected Components
  [Figure: segments S_1, S_2, S_3, S_4, ...]
  Graph construction: O(scan(N)) I/Os.
  |V| = Θ(P), |E| = N
Lower Bounds: Connected and Biconnected Components
  Theorem: Computing the connected components of a graph G = (V, E) requires Ω(perm(|E|)) I/Os.
  Theorem: Computing the biconnected components of a graph G = (V, E) requires Ω(perm(|E|)) I/Os.
More Classes of Sparse Graphs
  Grid graphs
    Separators: size [bound not recoverable] in O(sort(N)) I/Os
    BFS/SSSP: O(sort(N))
    DFS: [entry not recoverable]
  Graphs of bounded treewidth
    Separators: O(N/h) in O(sort(N)) I/Os
    BFS/SSSP: O(sort(N))
    DFS: ???
Open Problems
  Optimal separators for grid graphs
  DFS
    Grid graphs
    Graphs of bounded treewidth
  Semi-external shortest paths
  Optimal connectivity
  Optimal BFS, DFS, and shortest paths, or lower bounds
  Directed graphs
    Topological sorting
    Strongly connected components