External Memory Graph Algorithms and Applications to GIS Laura Toma Bowdoin College.

Outline
- A GIS application: TerraFlow
- I/O-efficient graph algorithms
  - Problems and results
  - Graph separation
  - Topological sort on planar DAGs
- Future directions/open questions

Why GIS?
- How it all started: Duke environmental researchers
  - Computing flow accumulation for the Appalachian Mountains took 14 days (with 512MB memory)
  - 800km x 800km at 100m resolution: ~64 million points
- GIS (Geographic Information Systems)
  - System that handles spatial data
  - Visualization, processing, queries, analysis…
  - Rich source of problems for computer science: graphics, graph theory, computational geometry, scientific computing…

GIS and the Environment
- Indispensable tool
- Monitoring: keep an eye on the state of earth systems using satellites and monitoring stations (water, pollution, ecosystems, urban development, …)
- Modeling and simulation: predict consequences of human actions and natural processes
- Analysis and risk assessment: find the problem areas and analyse the possible causes (soil erosion, groundwater pollution, ..)
- Planning and decision support: provide information and tools for better management of resources
[Figures: precipitation in tropical South America (legend: lots of rain / dry); nitrogen in Chesapeake Bay (legend: high nitrogen concentrations)]

Computations on Terrains
- Reality: elevation of terrain is a continuous function of two variables, h(x,y)
  - Estimate, predict, simulate: flooding, pollution; erosion, deposition; vegetation structure; …
- GIS: a DEM (Digital Elevation Model) is a set of sample points and their heights, {(x, y, h_xy)}
  - Model and compute indices

DEM Representations: TIN, grid, contour lines, sample points

Modeling Flow on Terrains
- What happens when it rains?
  - Predict areas susceptible to floods
  - Predict location of streams
- Flow is modeled by computing two basic attributes from the DEM of the terrain:
  - Flow Direction (FD): the direction water flows at a point
  - Flow Accumulation (FA): total amount of water that flows through a point if water is distributed according to the flow directions

Flow Accumulation of Panama

Panama Flow Accumulation: zoom

Uses
Flow direction and flow accumulation are used for:
- Computing other hydrological attributes: river network, moisture indices, watersheds and watershed divides
- Analysis and prediction of sediment and pollutant movement in landscapes
- Decision support in land management, flood and pollution prevention, and disaster management

Massive Terrain Data
- Remote sensing technology
  - Massive amounts of terrain data
  - Higher resolutions (1km, 100m, 30m, 10m, 1m, …)
- NASA SRTM
  - Mission launched in 2000
  - Acquired data for 80% of earth at 30m resolution
  - 5TB
- USGS: most of US at 10m resolution
- LIDAR: 1m resolution

Example: LIDAR Terrain Data
- Massive (irregular) point sets (1-10m resolution)
- Relatively cheap and easy to collect
- Example: Jockey’s Ridge (NC coast)

It’s Growing!
- Appalachian Mountains: area is approx. 800 km x 800 km
- Sampled at:
  - 100m resolution: ~64 million points (128MB)
  - 30m resolution: ~640 million points (1.2GB)
  - 10m resolution: ~6400 million = 6.4 billion points (12GB)
  - 1m resolution: ~640 billion points (1.2TB)
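The point counts on this slide follow from simple arithmetic, sketched below. The function names are hypothetical, and the 2 bytes per point is an assumption inferred from the slide's "64 million points (128MB)". Note that the exact count at 30m is about 711 million; the slide's 640 appears to be a rounder estimate.

```python
def grid_points(side_km: float, resolution_m: float) -> int:
    """Number of samples in a square grid covering side_km x side_km."""
    cells_per_side = round(side_km * 1000 / resolution_m)
    return cells_per_side * cells_per_side


def size_bytes(points: int, bytes_per_point: int = 2) -> int:
    """Raw size, assuming (hypothetically) 2 bytes per elevation sample."""
    return points * bytes_per_point


for res in (100, 30, 10, 1):
    n = grid_points(800, res)
    print(f"{res:>4} m: {n:,} points, {size_bytes(n) / 1e9:.2f} GB")
```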

I/O Model [AV’88]
- Parameters: N = problem size, B = disk block size, M = memory size
- I/O-operation: movement of one block of data from/to disk
- Complexity measure: number of I/Os
- Fundamental bounds:
  - Scanning: scan(N) = Θ(N/B) I/Os
  - Sorting: sort(N) = Θ((N/B) log_{M/B} (N/B)) I/Os
- In practice B and M are big
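To get a feel for the model, the sketch below evaluates the standard bounds scan(N) = Θ(N/B) and sort(N) = Θ((N/B) log_{M/B}(N/B)) with all constants dropped. The function names and the sample values of N, M and B are illustrative assumptions, not figures from the talk.

```python
import math


def scan_ios(n: int, b: int) -> int:
    """scan(N): read N items in blocks of B items."""
    return math.ceil(n / b)


def sort_ios(n: int, m: int, b: int) -> int:
    """sort(N) = (N/B) * log_{M/B}(N/B), constant factors ignored."""
    blocks = n / b
    if blocks <= 1:
        return 1
    # Number of merge passes with fan-out M/B; at least one pass over the data.
    passes = max(1.0, math.log2(blocks) / math.log2(m / b))
    return math.ceil(blocks * passes)


N, M, B = 2**30, 2**20, 2**10  # hypothetical sizes, counted in items
print(scan_ios(N, B), sort_ios(N, M, B), N)
```

With B and M this large, sort(N) is only a small multiple of scan(N) and far below N, which is why an O(N)-I/O algorithm is considered slow in this model.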

Flow Direction (FD) on Grids
- Water flows downhill, following the gradient
- On grids: approximated using the 3x3 neighborhood
  - SFD (Single-Flow Direction): FD points to the steepest downslope neighbor
  - MFD (Multiple-Flow Direction): FD points to all downslope neighbors

Computing FD
- Goal: compute FD for every cell in the grid (FD grid)
- Algorithm: for each cell, compute SFD/MFD by inspecting its 8 neighbor cells
- Analysis: O(N) time for a grid of N cells
- Is this all? No! Flat areas: plateaus and sinks
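A minimal in-memory sketch of the SFD rule, assuming a small grid stored as a list of lists. The function name and the choice to divide by the neighbor distance (so diagonal neighbors count as farther away) are assumptions for illustration; the talk does not specify them. Cells with no strictly lower neighbor get `None`, which is exactly the flat-area case (plateaus and sinks) that needs separate handling.

```python
import math

# Offsets of the 8 neighbors in the 3x3 neighborhood.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def single_flow_direction(grid):
    """Return a grid whose cells hold the (row, col) of the steepest
    downslope neighbor, or None if no neighbor is strictly lower."""
    rows, cols = len(grid), len(grid[0])
    fd = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Drop in height per unit of horizontal distance.
                    slope = (grid[r][c] - grid[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best, best_slope = (nr, nc), slope
            fd[r][c] = best
    return fd


terrain = [[9, 8, 7],
           [8, 5, 4],
           [7, 4, 1]]
print(single_flow_direction(terrain))
```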

FD on Flat Areas
- Plateaus
  - A cell flows towards the nearest spill point on the boundary of the plateau
  - Compute FD on plateaus using CC and BFS
- Sinks
  - Route the water uphill out of the sink by modeling flooding: uniformly pouring water on the terrain until steady state is reached
  - Flooding removes (fills) sinks
  - Assign uphill flow directions on the original terrain by assigning downhill flow directions on the flooded terrain

Flooding
- Watershed: part of the terrain that flows into a sink
- Sinks => partition of terrain into watersheds => watershed graph G_T
  - Vertices are watersheds; add a vertex for the “outside” watershed
  - Edge (u,v) if watersheds u and v are adjacent
  - Edge (u,v) is labeled with the lowest height on the boundary between u and v
- Flooding: compute for each watershed u the height h_u of the lowest-height path in G_T from u to the “outside” watershed
  - The height of a path is the height of the highest edge on the path

Flooding
- Plane-sweep algorithm with a Union-Find structure
  - Initially only the outside watershed is done
  - Sweep the watershed graph bottom-up with a horizontal plane
  - When the plane hits edge (u,v):
    - If both watersheds u and v are done, ignore
    - If neither is done, union them
    - If precisely one is not done, raise it to h(u,v) and mark it done
- Theorem: Flooding and the FD grid can be computed in O(sort(N)) I/Os on a grid DEM of size N.
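The sweep can be sketched in memory as follows. This is an illustration under assumptions, not the I/O-efficient algorithm from the talk: the function name, the `(u, v, height)` edge format, and the explicit member lists kept per union-find component are all hypothetical choices made for brevity.

```python
def flood(edges, outside="outside"):
    """edges: iterable of (u, v, h) for adjacent watersheds u, v whose
    shared boundary has lowest height h. Returns {watershed: raised height}."""
    edges = list(edges)
    parent, members, done, raised = {}, {}, {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        for w in (u, v):
            if w not in parent:
                parent[w], members[w] = w, [w]
                done[w] = (w == outside)  # initially only the outside is done

    # Sweep bottom-up: process edges by increasing height.
    for u, v, h in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv or (done[ru] and done[rv]):
            continue  # nothing to do
        if done[ru] != done[rv]:
            # Exactly one side is done: raise the other side to h.
            pending = rv if done[ru] else ru
            for w in members[pending]:
                raised[w] = h
            done[pending] = True
            continue
        # Neither side is done: union them (smaller into larger).
        if len(members[ru]) < len(members[rv]):
            ru, rv = rv, ru
        parent[rv] = ru
        members[ru].extend(members.pop(rv))
    return raised


example = [("A", "B", 5), ("B", "outside", 10), ("A", "outside", 12),
           ("C", "A", 11), ("C", "outside", 20)]
print(flood(example))
```

In the example, A and B merge at height 5 and spill to the outside together at height 10, while C spills through A at height 11.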

Flow Accumulation (FA) on Grids
- FA models water flow through each cell with “uniform rain”
  - Initially one unit of water in each cell
  - Water is distributed from each cell to the neighbors pointed to by its FD
  - Flow conservation: if several FDs, distribute proportionally to height difference
  - Flow accumulation of a cell is the total flow through it
- Goal: compute FA for every cell in the grid (FA grid)

Computing FA
- FD graph
  - Node for each cell
  - Edge from a to b if FD of a points to b
- FD graph must be acyclic
- FD graph
  - SFD: a set of trees
  - MFD: a DAG

Standard FA Algorithm
- Algorithm:
  - Input: flow direction grid FD
  - Output: flow accumulation grid FA (initialized to 1)
  - Process (sweep) cells in topological order. For each cell:
    - Read flow from FA grid and direction from FD grid
    - Update flow in FA grid for downslope neighbors
- Analysis
  - One sweep is enough: O(sort) + O(N) time for a grid of N cells
  - ..but O(N) I/Os: cells in topological order are distributed over the terrain
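A small in-memory sketch of this sweep for the SFD case, assuming every non-`None` flow direction points to a strictly lower cell, so that decreasing height is a valid topological order. The function name and the data layout are hypothetical. The scattered updates `fa[tr][tc] += ...` are the accesses that cost one I/O each once the grids no longer fit in memory.

```python
def flow_accumulation(heights, fd):
    """heights: grid of elevations; fd[r][c]: (row, col) of the single
    downslope neighbor, or None. Returns the FA grid."""
    rows, cols = len(heights), len(heights[0])
    fa = [[1] * cols for _ in range(rows)]  # one unit of rain per cell
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    # Topological order: highest cells first (the O(sort) step).
    cells.sort(key=lambda rc: heights[rc[0]][rc[1]], reverse=True)
    for r, c in cells:
        target = fd[r][c]
        if target is not None:
            tr, tc = target
            fa[tr][tc] += fa[r][c]  # push all flow downslope
    return fa


print(flow_accumulation([[3, 2, 1]], [[(0, 1), (0, 2), None]]))
```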

Scalability Problem
- We can compute FD and FA using simple O(N)-time algorithms
- ..but.. for large sets..??
[Plot; x-axis: dataset size (log scale)]

I/O-Efficient Flow Accumulation
- Eliminating scattered accesses to FD grid
  - Store FD grid in topological order
- Eliminating scattered accesses to FA grid
  - Observation: flow to a neighbor cell is only needed at the time when the neighbor is processed
    - Time when a cell is processed = its topological rank = its priority
  - Push flow by inserting the flow increment in a priority queue with priority equal to the neighbor’s time
  - Flow of a cell is obtained using DeleteMin operations
- Use I/O-efficient priority queue [A95, BK97]
  - O(N) operations in O(sort(N)) I/Os [ATV00]
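The priority-queue idea can be sketched as follows, with Python's in-memory `heapq` standing in for the I/O-efficient priority queue. The input format is an assumption made for illustration: cells are already listed in topological order, and each entry holds the topological rank of its single downslope neighbor (SFD), or `None`.

```python
import heapq


def fa_with_priority_queue(downslope_rank):
    """downslope_rank[t] is the rank (> t) of the cell that cell t flows
    into, or None. Returns the flow accumulation of each cell by rank."""
    pq = []  # entries are (rank of receiving cell, flow increment)
    fa = []
    for t, target in enumerate(downslope_rank):
        flow = 1  # the cell's own unit of rain
        # DeleteMin: collect every increment addressed to this cell.
        while pq and pq[0][0] == t:
            flow += heapq.heappop(pq)[1]
        fa.append(flow)
        if target is not None:
            # Push flow forward to the time the neighbor is processed.
            heapq.heappush(pq, (target, flow))
    return fa


print(fa_with_priority_queue([2, 2, 3, None]))
```

The FA grid is never accessed at random: each cell's flow is assembled entirely from the queue and written out sequentially.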

TerraFlow
- TerraFlow: implementation of I/O-efficient FD and FA algorithms
  - Significantly faster on very large grids than existing GIS software
  - Scalable: 1 billion elements!! (>2GB data)
- Implementation
  - C++, uses TPIE (Transparent Parallel I/O Environment)
  - Library of I/O-efficient modules developed at Duke
- Experimental platform
  - TerraFlow, ArcInfo: 500MHz Alpha, FreeBSD 4.0, 1GB RAM
  - GRASS/TARDEM: 500MHz Intel PIII, FreeBSD/Windows, 1GB RAM

Experimental Results
- GRASS cannot handle Hawaii dataset (killed after 17 days)
- TARDEM cannot handle Cumberlands dataset (killed after 20 days)
- Significant speedup over ArcInfo (ESRI) for large datasets
  - East-Coast: TerraFlow 8.7 hours, ArcInfo 78 hours
  - Washington: TerraFlow 63 hours, ArcInfo: %

TerraFlow in GRASS
Platform: dual 1GHz PIII, 1GB RAM
Columns: Dataset, Grid dimensions, Grid size (million elements)
Datasets: Kaweah 1163 x, Puerto Rico 4452 x, Sierra Nevada 3750 x, Hawaii 6784 x, Lower New England 9148 x, Panama 11283 x
r.terraflow: 1.85 min, 4.65 min, min, min, 114 min, 3.5 hr
r.watershed: 9.2 min, 93 min, 18.2 hours, killed after 6 days, < 1% done

Outline
- A GIS application: TerraFlow
- I/O-efficient graph algorithms
  - Problems and results
  - Graph separation
  - Topological sort on planar DAGs
- Future directions/open questions

I/O-Efficient Graph Algorithms
- Input: G = (V,E)
  - Assume edge-list representation of G stored on disk
  [Figure: G stored as Adj(v1), Adj(v2), Adj(v3), …]
- Basic problems: BFS, DFS, CC, SSSP, MST
  - Standard internal-memory algorithms for these problems use O(E) I/Os
  - Hard in external memory!
  - Lower bound: Ω(min{V, sort(V)}) (practically Ω(sort(V)))

BFS and DFS
DFS(u)
  Mark u
  For every v in Adj(u)
    If v not marked
      DFS(v)
- Internal memory: O(V+E)
- External memory:
  - One I/O per vertex to load its adjacency list => O(V) I/Os
  - One I/O per edge to check if v is marked => O(E) I/Os
  - => O(V+E) = O(E) I/Os
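A runnable version of the DFS above that also counts the two kinds of accesses the slide charges one I/O each for. The counters are only a stand-in for real disk accesses, and the function name is hypothetical.

```python
def dfs_with_access_counts(adj, source):
    """adj: dict mapping each vertex to its adjacency list.
    Returns (visit order, counts of adjacency-list loads and mark checks)."""
    marked = set()
    order = []
    counts = {"adjacency_loads": 0, "mark_checks": 0}

    def visit(u):
        marked.add(u)
        order.append(u)
        counts["adjacency_loads"] += 1  # one access per vertex
        for v in adj[u]:
            counts["mark_checks"] += 1  # one access per edge endpoint
            if v not in marked:
                visit(v)

    visit(source)
    return order, counts


graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(dfs_with_access_counts(graph, 0))
```

On this undirected graph with 4 vertices and 4 edges the counts are 4 loads and 8 checks (each edge is seen from both endpoints), matching the O(V) and O(E) terms.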

I/O-Efficient Graph Algorithms
- Problems:
  1. Random (unstructured) accesses to the adjacency lists of vertices as they are visited => Ω(V) I/Os
  2. Need to check if v has already been visited and/or read its key => Ω(E) I/Os
- o(E) algorithm: solve (2)
- o(V) algorithm: solve (1) and (2)

Upper Bounds: Sparse Graphs
- Sparse graphs: E = O(V)
  - CC, MST: O(sort(V)) if the graph stays sparse under edge contraction
  - Undirected BFS: O(sort(V))? open
  - Undirected SSSP: O(sort(V))? open
  - Undirected DFS: O(V); o(V)? open
  - Directed BFS, DFS, SSSP
- O(sort(N)) BFS, SSSP, (DFS) on special classes of sparse graphs
  - Planar
  - Outerplanar, grid, bounded-treewidth

Planar Undirected Graphs
- BFS, DFS, SSSP: O(sort(N)) I/Os
  - O(sort(N)) I/O-efficient reductions [ABT’00, AMTZ’01]
  - Separators can be computed in O(sort(N)) I/Os [MZ’02]
[Diagram: reductions among separators, SSSP, BFS, and DFS; edges labeled O(sort(N)) I/Os [AMTZ’01] and O(sort(N)) I/Os [ABT’00]]

Planar graph separation: R-division
- A partition of a planar graph, using a set S of separator vertices, into O(N/R) subgraphs (clusters) G_i of at most R vertices each such that:
  - There are O(N/√R) separator vertices in total
  - There is no edge between a vertex in G_i and a vertex in G_j
  - Each cluster is adjacent to O(√R) separator vertices

R-divisions and Planar Graph Algorithms
- R-divisions [Frederickson’87] => dynamic graph algorithms [GI’91, KS’93], faster SP algorithms [HKRS’97], SP data structures
- In external memory choose R = B^2
  - O(N/B) separator vertices
  - O(N/B^2) clusters of O(B^2) vertices each and O(B) boundary vertices
  - Can be computed in O(sort(N)) I/Os [MZ’02]
- B^2-division => SSSP, BFS, DFS, topological sort, APSP, diameter, SP data structures, ..

Directed Ear Decomposition (DED)
- A directed ear decomposition of a graph G is a partition of G into simple directed paths P_0, P_1, …, P_k such that:
  - P_0 is a simple cycle
  - The endpoints of each P_i, i>0, are in lower-indexed paths P_j, P_l with j,l<i
  - The internal vertices of each P_i, i>0, are not in any P_j, j<i
- G has a directed ear decomposition if and only if it is strongly connected (there exists a directed cycle containing each pair of vertices u,v)
- Planar DED: O(sort(N)) I/Os [ATZ’03]

Planar Topological Sort using DED
- Theorem [KK’79]: The directed dual of a planar DAG is strongly connected and therefore has a directed ear decomposition.
- Idea:
  - Place vertices to the left of P_0 before vertices to the right
  - Sort the two sets recursively
- Used in PRAM topological sort algorithm [KK93, K93]
  - PRAM simulation => O(sort(N) log N) I/Os
- Improved to O(sort(N)) by defining and utilizing an ordered ear decomposition tree [ATZ’03]

O(sort(N)) Topological Sort using B^2-division
- Construct a substitute graph G_R using the B^2-division
  - Edge from v to u on the boundary of G_i if there exists a path from v to u in G_i
  - G_R has O(N/B^2) · O(B^2) = O(N) edges and O(N/B) vertices
- Topologically sort G_R (separator vertices in G):
  - Store the in-degree of each vertex in list L
  - Maintain a list of in-degree-zero vertices
  - Repeatedly:
    - Number an in-degree-zero vertex v
    - Consider all edges (v,u) and decrement the in-degree of u in L
- Incorporate the vertices in G in the topological order of G_R
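The in-degree procedure for sorting G_R is the classic one, sketched here in memory. The function name and the edge-list input are assumptions; the I/O-efficient version differs in how the list L of in-degrees is laid out and accessed, which this sketch does not model.

```python
from collections import deque


def topological_sort(vertices, edges):
    """Number vertices by repeatedly removing an in-degree-zero vertex."""
    indegree = {v: 0 for v in vertices}
    adj = {v: [] for v in vertices}
    for v, u in edges:
        adj[v].append(u)
        indegree[u] += 1

    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)  # "number" v
        for u in adj[v]:
            indegree[u] -= 1  # consider edge (v,u)
            if indegree[u] == 0:
                ready.append(u)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle")
    return order


print(topological_sort(["s", "a", "b", "t"],
                       [("s", "a"), ("s", "b"), ("a", "b"), ("b", "t")]))
```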

O(sort(N)) Topological Sort using B^2-division
- Problem: not clear how to incorporate removed vertices from G in the topological order of separator vertices (G_R)
- Solution (assuming only one in-degree-zero vertex s for simplicity):
  - Longest-path-from-s order is a topological order
  - Longest paths to removed vertices are locally computable from longest paths to boundary vertices
[Figure: example graph with vertices s, A, B, C, D, E, F, t and a cluster of size B^2]
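The claim that longest-path-from-s order is a topological order can be checked on a small DAG. The sketch assumes unit edge lengths and that s is the only in-degree-zero vertex, as on the slide; the function name is hypothetical. For every edge (v,u), the longest path to u is at least one longer than the longest path to v, so v always sorts before u.

```python
from functools import lru_cache


def longest_path_order(vertices, edges, s):
    """Return (vertices sorted by longest-path distance from s, distances)."""
    preds = {v: [] for v in vertices}
    for v, u in edges:
        preds[u].append(v)

    @lru_cache(maxsize=None)
    def dist(u):
        if u == s:
            return 0
        # Every vertex other than s has a predecessor by assumption.
        return max(dist(v) for v in preds[u]) + 1

    distances = {v: dist(v) for v in vertices}
    return sorted(vertices, key=distances.get), distances


dag = [("s", "a"), ("s", "b"), ("a", "b"), ("b", "t"), ("s", "t")]
order, d = longest_path_order(["s", "a", "b", "t"], dag, "s")
print(order, d)
```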

O(sort(N)) Topological Sort using B^2-division
1. Compute a B^2-division of G
2. Construct substitute graph G_R
   - Weight of the edge between v and u on the boundary of G_i equals the length of the longest path from v to u in G_i
3. Compute the longest path to each vertex in G_R (same as in G):
   - Maintain list L of longest paths seen to each vertex
   - Repeatedly:
     - Obtain the longest path for the next vertex v in topological order
     - Consider all edges (v,u) and update the longest path to u
4. Find longest paths to vertices inside clusters

O(sort(N)) Topological Sort using B^2-division: Analysis
- Accessing the adjacency lists of all vertices takes O(N/B) I/Os
- But.. need dist(s,u) for all u in Adj(v)
  ..need indegree(u) for all u in Adj(v)
- Keep list L_S = {dist(s,u), for any u in S}
- For each vertex v, read from L_S the current distances of adjacent vertices
- O(N) edges => O(N) accesses to L_S => O(N) I/Os

R-division
- Boundary vertices Bnd(G_i) of G_i: the separator vertices adjacent to G_i
- Boundary set: maximal subset of separator vertices that are adjacent to the same clusters
- Lemma [Frederickson’87]: An R-division of a planar graph of bounded degree has O(N/R) boundary sets.

O(sort(N)) Topological Sort using B^2-division: Longest paths on G_R
- Idea: use boundary sets
- Store L_S so that vertices in the same boundary set are consecutive
  - There are O(N/B^2) boundary sets
  - Vertices in the same boundary set have the same O(B) neighbors in G_R, assuming G has bounded degree
  - Each boundary set is accessed once by each neighbor in G_R
  - Each boundary set has size O(B)
- => O(N/B^2) x O(B) = O(N/B) I/Os

Open Problems/Future Directions
- I/O-Efficient Graph Algorithms
  - O(sort(N)) DFS on planar digraphs
    - Planar DAGs: can a DFS-tree be computed using topological order?
  - Improved algorithms for general digraphs
  - Simple and feasible O(sort(N)) algorithms for planar graphs, and in particular for triangulations
- I/O-Efficient GIS
  - Processing LIDAR data: point-to-grid conversion, point-to-TIN conversion, terrain simplification, Delaunay triangulation…
  - TINs
    - Practical algorithms on triangulations
    - Flow modeling on TINs

End

Upper Bounds
- General undirected graphs
  - CC, MST: [MR’99, ABT’01]
  - BFS: [MM’02]
  - SSSP: [MZ’03]
  - DFS: [KS’96]
- General directed graphs
  - BFS, DFS, SSSP: [BVWB’00]
  - Topological sort

Planar DAGs: Summary and Open Problems
- If the B^2-division is given
  - Topological sort can be computed in O(scan(N)) I/Os
  - Extends to BFS and SSSP
- Simplified O(scan(N)) algorithms for planar DAGs
[Diagram: B^2-division, SSSP, BFS, topological sort, DFS; edges labeled scan(N) and "?"]

Planar SSSP
1. Compute a B^2-division of G
2. Construct a substitute graph G_R on the separator vertices such that it preserves SP in G between any u,v in S
   - Replace each subgraph G_i with a complete graph on Bnd(G_i)
   - For any u,v on Bnd(G_i), the weight of edge (u,v) is δ_Gi(u,v)
   - G_R has O(N/B^2) · O(B^2) = O(N) edges and O(N/B) vertices
3. Compute SSSP on G_R
4. Compute SSSP to vertices inside clusters

Planar SSSP: SSSP on G_R with O(N/B) vertices and O(N) edges
- Dijkstra’s algorithm with I/O-efficient priority queue
  - Accessing the adjacency lists of all vertices takes O(N/B) I/Os
  - O(N) Insert/Delete/DeleteMin operations in O(sort(N)) I/Os [A95]
- But.. need dist(s,u) for all u in Adj(v)
  - Keep list L_S = {dist(s,u), for any u in S}
  - For each vertex v, read from L_S the current distances of adjacent vertices
  - O(N) edges => O(N) accesses to L_S => O(N) I/Os
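For reference, an in-memory sketch of Dijkstra's algorithm with a binary heap. One labeled difference from the slide: instead of Delete operations on the queue, this version leaves stale entries in the heap and skips them when popped (lazy deletion). The `dist` dictionary plays the role of the list L_S, and the lookups `dist.get(u, ...)` are the per-edge accesses that cost O(N) I/Os when done naively. Names and input format are hypothetical.

```python
import heapq


def dijkstra(adj, s):
    """adj: dict mapping v to a list of (u, weight) pairs with weight >= 0.
    Returns shortest-path distances from s to every reachable vertex."""
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, v = heapq.heappop(pq)  # DeleteMin
        if d > dist.get(v, float("inf")):
            continue  # stale entry, already improved
        for u, w in adj.get(v, []):
            nd = d + w
            if nd < dist.get(u, float("inf")):  # lookup of dist(s,u)
                dist[u] = nd
                heapq.heappush(pq, (nd, u))  # Insert
    return dist


g = {"s": [("a", 2), ("b", 5)], "a": [("b", 1), ("t", 7)], "b": [("t", 2)]}
print(dijkstra(g, "s"))
```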

Planar SSSP: SSSP on G_R
- Idea: use boundary sets
- Store L_S so that vertices in the same boundary set are consecutive
  - There are O(N/B^2) boundary sets
  - Vertices in the same boundary set have the same O(B) neighbors in G_R, assuming G has bounded degree
  - Each boundary set is accessed once by each neighbor in G_R
  - Each boundary set has size O(B)
- => O(N/B^2) x O(B) = O(N/B) I/Os

Planar APSP
- Straightforward bound: O(N sort(N)) = O(sort(N^2))
- Improved to optimal O(scan(N^2))
- Idea: compute SP from all vertices in a cluster while the cluster is in memory
- For each cluster G_i
  - For any α in Bnd(G_i), compute SSSP(α) in G_R
  - For each cluster G_j
    - Load in memory G_j, Bnd(G_j) and δ(Bnd(G_i), Bnd(G_j))
    - Compute the shortest paths between all vertices u in G_i and v in G_j:
      d(u,v) = min{δ_Gi(u,α) + δ_GR(α,β) + δ_Gj(β,v) | α in Bnd(G_i), β in Bnd(G_j)}
    - Write the output
- O(N/B^2) clusters => O(sort(N^2)/B) [compute] + O(scan(N^2)) [output]
- Diameter: O(sort(N^2)/B)
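The combination step can be sketched for a single pair (u,v), assuming u lies in the cluster whose boundary contains α and v in the cluster whose boundary contains β. The three inputs are assumed to be precomputed: distances from u to its cluster's boundary vertices, distances in G_R between boundary vertices, and distances from the other cluster's boundary vertices to v. All names are hypothetical.

```python
def combine_distance(to_boundary, between, from_boundary):
    """d(u,v) = min over (alpha, beta) of
    to_boundary[alpha] + between[(alpha, beta)] + from_boundary[beta]."""
    return min(
        du + between[(a, b)] + dv
        for a, du in to_boundary.items()
        for b, dv in from_boundary.items()
        if (a, b) in between  # skip pairs with no known path
    )


to_bnd = {"a1": 2, "a2": 1}    # u -> alpha inside u's cluster
from_bnd = {"b1": 4, "b2": 1}  # beta -> v inside v's cluster
in_gr = {("a1", "b1"): 1, ("a1", "b2"): 10,
         ("a2", "b1"): 3, ("a2", "b2"): 5}
print(combine_distance(to_bnd, in_gr, from_bnd))
```

Since each cluster has O(B) boundary vertices, all three tables for a pair of clusters are small enough to hold in memory while the pair is processed.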

General AP-BFS
- The APSP idea (compute SP from all vertices of a cluster while the cluster is in main memory) can be generalized to other algorithms which use clustering, like the BFS algorithm [MM’02] on general undirected graphs.
- Theorem: AP-BFS of a general undirected graph and its unweighted diameter can be computed in O(V sort(E)) I/Os.
- Note: general undirected BFS is O(sort(E)) amortized over V vertices

Planar DFS
- Idea: partition the faces of G into levels around a source face containing s, and grow DFS level by level
  - Levels can be obtained from BFS in the dual graph
  - Structure of levels is simple (bicomps are cycles)
  - Rooting/attaching: use that a spanning tree is a DFS-tree if and only if it has no cross edges
- => A DFS-tree of a planar graph can be computed in O(sort(N)) I/Os

Planar Graphs
- Shortest paths generalize to digraphs: compute the B^2-division on the underlying graph
  - BFS, SSSP in O(sort(N))
  - APSP (transitive closure) in O(scan(N^2))
  - Diameter in O(sort(N^2)/B)
- DFS
  - Undirected: O(sort(N)) using BFS in the dual [O(sort(N) log N) direct algorithm using cycle separators]
  - Directed: the planar undirected DFS algorithms do not extend to digraphs; O(sort(N)) DFS? open