External Memory Graph Algorithms and Applications to GIS Laura Toma Duke University July 14 2003.

1 External Memory Graph Algorithms and Applications to GIS Laura Toma Duke University July 14 2003

2 Massive Data Massive datasets are being collected everywhere Storage management software is billion-$ industry Examples:  Geography: NASA satellites generate 1.2TB per day  WEB: Web crawl of 200M pages and 2000M links, Akamai stores 7 billion clicks per day  Phone: AT&T 20TB phone call database  Consumer: WalMart 70TB database, buying patterns (supermarket checkout)

3 I/O Model [AV’88] N = problem size, B = disk block size, M = memory size • I/O-operation: movement of one block of data from/to disk • Complexity measure: number of I/Os • Fundamental bounds: Scanning: scan(N) = Θ(N/B) I/Os; Sorting: sort(N) = Θ((N/B) log_{M/B} (N/B)) I/Os • In practice B and M are big
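To get a feel for the two bounds, the following Python sketch evaluates their leading terms for concrete parameters. The function names `scan_ios` and `sort_ios` and the sample values of N, M and B are illustrative assumptions, and constant factors are ignored.

```python
import math


def scan_ios(n: int, b: int) -> int:
    """Block transfers needed to read n items stored in blocks of b items."""
    return math.ceil(n / b)


def sort_ios(n: int, m: int, b: int) -> float:
    """Leading term of the sorting bound, (n/b) * log_{m/b}(n/b), constants omitted."""
    blocks = n / b
    # At least one pass over the data is always needed.
    passes = max(1.0, math.log(blocks, m / b))
    return blocks * passes


if __name__ == "__main__":
    n, m, b = 10**9, 10**8, 10**4   # hypothetical problem, memory and block sizes
    print("scan:", scan_ios(n, b))
    print("sort:", sort_ios(n, m, b))
    print("one I/O per item:", n)
```

Because B is large, both bounds are far below N, which is why an algorithm that performs one I/O per vertex or per edge is considered slow in this model.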

4 Outline  I/O-efficient graph algorithms Problems, techniques and results  Algorithms for planar graphs using graph separation  A GIS application: TerraFlow

5 I/O-Efficient Graph Algorithms • Input: G = (V,E) Assume edge-list representation of G stored on disk • Basic problems: BFS, DFS, CC, SSSP, MST Hard in external memory! Lower bound: Ω(min{V, sort(V)}) (practically Ω(sort(V))) Standard internal memory algorithms for these problems use O(E) I/Os [Figure: G stored on disk as Adj(v1), Adj(v2), Adj(v3), …]

6 BFS and DFS DFS(u)  Mark u  For every v in Adj(u) If v not marked DFS(v) Internal memory: O(V+E) External memory:  one I/O per vertex to load adjacency list  Ω (V ) I/Os  one I/O per edge to check if v is marked  Ω (E) I/Os  O(V+E)= O(E) I/Os

7 SSSP and MST • Dijkstra’s algorithm Maintain p-queue on vertices not yet included in SSSP Repeatedly DeleteMin(v) and relax each adjacent edge (v,u): if d(s,u) > d(s,v) + w_vu then DecreaseKey(u, d(s,v) + w_vu) • External memory: one I/O per vertex to load adjacency list → Ω(V) I/Os External p-queue: O(E) Insert/Delete/DeleteMin in O(sort(E)) I/Os DecreaseKey: O(1) I/Os to read key of u → Ω(E) I/Os → O(V+E+sort(E)) = O(E) I/Os
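The relaxation step can be sketched in internal memory as follows. This is a minimal Python sketch, not the external-memory algorithm: the function name `dijkstra` and the adjacency-dict input are assumptions, and since Python's `heapq` has no DecreaseKey, a new entry is pushed and stale entries are skipped on removal.

```python
import heapq


def dijkstra(adj, s):
    """adj maps a vertex to a list of (neighbor, weight); returns distances from s."""
    dist = {s: 0}
    pq = [(0, s)]
    done = set()
    while pq:
        d, v = heapq.heappop(pq)          # DeleteMin
        if v in done:
            continue                      # stale entry left by an earlier update
        done.add(v)
        for u, w in adj.get(v, []):       # loading Adj(v): one I/O per vertex on disk
            # Reading dist[u] is the access that costs one I/O per edge on disk.
            if u not in dist or dist[u] > d + w:
                dist[u] = d + w
                heapq.heappush(pq, (d + w, u))   # stands in for DecreaseKey
    return dist


if __name__ == "__main__":
    graph = {"s": [("a", 4), ("b", 1)], "b": [("a", 2)], "a": [("t", 1)]}
    print(dijkstra(graph, "s"))
```

The two commented accesses are exactly the ones the slide counts: one random access per vertex for the adjacency list and one per edge for the current key.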

8 I/O-Efficient Graph Algorithms  Problems: 1.Random (unstructured) accesses to the adjacency lists of vertices as they are visited  Ω(V) I/Os 2.Need to check if v has been already visited and/or read its key  Ω(E) I/Os o(E) algorithm: solve (2) o(V) algorithm: solve (1) and (2)

9 o(E) Algorithms  Store edges to previously seen vertices  Undirected/directed BFS, DFS, SSSP buffered repository tree (BRT) [BGVW’00] Insert(v, e), ExtractAll(v)  Process/update all adjacent edges without checking if necessary  Undirected SSSP: I/O-efficient tournament tree [KS’96] DecreaseKey(v,k)  Undirected MST: O(V + sort(E)) [ABT’01] Maintain a priority queue on edges incident to current MST How to decide if v is in MST without doing one I/O? –If next edge returned by DeleteMin is the same then v already in MST v u v

10 o(V) Algorithms • CC and MST: [MR’99, ABT’01] graph contraction Goal: reduce the problem to the same problem on a smaller graph by selecting disjoint subgraphs and contracting them A contraction phase reduces the number of vertices by a constant fraction Typically use a sequence of contraction steps G = G_0 → G_1 → G_2 … → G_i … CC and MST algorithms: general idea Use contraction steps Use an O(V+sort(E)) algorithm on G’

11 o(V) Algorithms Undirected BFS, SSSP [MM’02, MZ’03] Clustering partition graph into V/k subgraphs (clusters) of k vertices BFS Idea: Keep a pool of hot clusters A cluster is loaded in the pool once A cluster stays in the pool until all its vertices have been visited 

12 Upper Bounds  General undirected graphs CC, MST: [MR’99, ABT’01] BFS: [MM’02] SSSP: [MZ’03] DFS: [KS’96]  General directed graphs BFS, DFS, SSSP: [BVWB’00] Topological sort

13 Upper Bounds Sparse Graphs  Sparse graphs E=O(V) CC, MST : O(sort(V)) if graph stays sparse under edge contraction Undirected BFS: O(sort(V)) ? open Undirected SSSP: O(sort(V)) ? open Undirected DFS: O(V) o(V) ? open Directed BFS, DFS, SSSP  O(sort(N)) BFS, SSSP, (DFS) on special classes of sparse graphs Planar Outerplanar, grid, bounded-treewidth

14 Planar Undirected Graphs  BFS, DFS, SSSP: O(sort(N)) I/Os O(sort(N)) I/O-efficient reductions [ABT’00, AMTZ’01] Separators can be computed in O(sort(N)) I/Os [MZ’02] O(sort(N)) I/Os [AMTZ’01] O(sort(N)) I/Os [ABT’00] DFS BFSSSSP separators

15 I/O-Efficient Graph Algorithms Our Contributions  An MST on general undirected graphs.  O(sort(N)) algorithms on planar graphs Reducibility on planar undirected graphs Planar digraphs: SSSP, BFS, directed ear decomposition and topological sort  An O(sort(N) log N) DFS algorithm for planar undirected graphs O(sort(N)) cycle separator  All-pair-shortest-paths and diameter Planar digraphs General undirected graphs  Data structure for shortest path queries on planar digraphs Trade-off space-query  GIS application: TerraFlow Flow modeling on grid terrains r.terraflow: Port into GRASS, the open source GIS

16 Outline  I/O-efficient graph algorithms Problems, techniques and results  Algorithms for planar graphs using graph separation Shortest paths (SSSP, BFS, APSP) DFS Topological sort on planar DAGs Data structure for SP queries  A GIS application: TerraFlow

17 Planar graph separation: R-division • A partition of a planar graph using a set S of separator vertices into O(N/R) subgraphs (clusters) G_i of at most R vertices each such that: There are O(N/√R) separator vertices in total There is no edge between a vertex in G_i and a vertex in G_j Each cluster is adjacent to O(√R) separator vertices

18 R-division • Boundary vertices Bnd(G_i) of G_i The separator vertices adjacent to G_i • Boundary set Maximal subset of separator vertices that are adjacent to the same clusters • Lemma [Frederickson’87]: An R-division of a planar graph of bounded degree has O(N/R) boundary sets.

19 R -divisions and Planar Graph Algorithms  R-divisions [Frederickson’87]  dynamic graph algorithms [GI’91,KS’93], faster SP algorithms [HKRS’97], SP data structures  In external memory choose R = B 2 O(N/B) separator vertices O(N/B 2 ) clusters of O(B 2 ) vertices each and O(B) boundary vertices O(N/B 2 ) boundary sets Can be computed in O(sort(N)) I/Os [MZ’02]  B 2 -division  SSSP, BFS, DFS, topological sort, APSP, diameter, SP data structures,..

20 Planar SSSP 1. Compute a B 2 -division of G 2. Construct a substitute graph G R on the separator vertices such that it preserves SP in G between any u,v in S replace each subgraph G i with a complete graph on Bnd(G i ) for any u, v on Bnd(G i ), the weight of edge (u,v) is δ Gi (u,v)  G R has O(N/B 2 )· O(B 2 )=O(N) edges and O(N/B) vertices 3. Compute SSSP on G R 4. Compute SSSP to vertices inside clusters s t B2B2

21 SSSP on G R with O(N/B) vertices and O(N) edges  Dijkstra’s algorithm with I/O-efficient p-queue Access to adjacency list of each vertex takes O(N/B) I/Os O(N) Insert/Delete/DeleteMin in O(sort(N)) I/Os [A95] But..need dist(s,u) for all u in Adj(v)  Keep list L S ={dist(s,u), for any u in S} For each vertex v read from L S the current distances of adjacent vertices  O(N) edges => O(N) accesses to L S  O(N) I/Os Planar SSSP v

22 SSSP on G R Idea: use boundary sets Store L S so that vertices in the same boundary set are consecutive There are O(N/B 2 ) boundary sets Vertices in same boundary set have same O(B) neighbors in G R assuming G has bounded degree Each boundary set is accessed once by each neighbor in G R Each boundary set has size O(B)  O(N/B 2 ) x O(B) = O(N/B) I/Os Planar SSSP

23 Planar APSP • Straightforward bound: O(N sort(N)) = O(sort(N^2)) • Improved to optimal O(scan(N^2)) • Idea: compute SP from all vertices in a cluster while cluster is in memory • For each cluster G_i • For any α in Bnd(G_i) compute SSSP(α) in G_R • For each cluster G_j • load in memory G_j, Bnd(G_j) and δ(Bnd(G_i), Bnd(G_j)) • compute the shortest paths between all vertices u in G_i and v in G_j: d(u,v) = min{δ_Gi(u,α) + δ_GR(α,β) + δ_Gj(β,v) | α in Bnd(G_i), β in Bnd(G_j)} • write the output • O(N/B^2) clusters → O(sort(N^2)/B) [compute] + O(scan(N^2)) [output] • Diameter: O(sort(N^2)/B)
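The combination step for one pair of vertices in different clusters can be sketched as below. This is an illustrative Python sketch under assumptions: u lies in cluster G_i and v in cluster G_j, the three distance tables are already in memory, and the function name `cluster_pair_distance` and the dictionary layout are hypothetical.

```python
INF = float("inf")


def cluster_pair_distance(d_u_to_bnd, d_bnd, d_bnd_to_v):
    """
    d_u_to_bnd: alpha -> distance inside G_i from u to boundary vertex alpha
    d_bnd:      (alpha, beta) -> distance in the substitute graph G_R
    d_bnd_to_v: beta -> distance inside G_j from boundary vertex beta to v
    Returns the minimum over all (alpha, beta) of the three-part path length.
    """
    best = INF
    for alpha, du in d_u_to_bnd.items():
        for beta, dv in d_bnd_to_v.items():
            middle = d_bnd.get((alpha, beta), INF)   # missing pair: no path
            best = min(best, du + middle + dv)
    return best


if __name__ == "__main__":
    d_u = {"a1": 2, "a2": 5}
    d_r = {("a1", "b1"): 10, ("a2", "b1"): 1, ("a1", "b2"): 3}
    d_v = {"b1": 1, "b2": 9}
    print(cluster_pair_distance(d_u, d_r, d_v))
```

With O(B) boundary vertices per cluster, each pair of clusters needs only O(B^2) boundary-to-boundary distances, which fit in memory together with the two clusters.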

24 General AP-BFS The APSP idea (compute SP from all vertices of a cluster while the cluster is in main memory) can be generalized to other algorithms which use clustering, like the BFS algorithm [MM’02] on general undirected graphs. Theorem: AP-BFS of a general undirected graph and its unweighted diameter can be computed in O(V sort(E)) I/Os. Note: general undirected BFS is O(sort(E)) amortized over V vertices

25 Planar DFS Idea: Partition the faces of G into levels around a source face containing s and grow DFS level-by-level Levels can be obtained from BFS in dual graph Structure of levels is simple (bicomps are cycles) Rooting/Attaching: use that a spanning tree is a DFS-tree if and only if it has no cross edges  A DFS-tree of a planar graph can be computed in O(sort(N)) I/Os

26 Planar Graphs  Shortest paths generalize to digraphs: compute B 2 -division on the underlying graph BFS, SSSP in O(sort(N)) APSP (transitive closure) in O(scan(N 2 )) diameter in O(sort(N 2 )/B)  DFS Undirected O(sort(N)) using BFS in the dual [O(sort(N) log N) direct algorithm using cycle separators] Directed The planar undirected DFS algorithms do not extend to digraphs O(sort(N)) DFS? open

27 Outline  I/O-efficient graph algorithms Problems, techniques and results  Algorithms for planar graphs using graph separation Shortest paths (SSSP, BFS, APSP) DFS Topological sort on planar DAGs O(sort(N)) using directed ear decomposition (DED) of its dual Simplified algorithm using B 2 -division Data structure for SP queries  A GIS application: TerraFlow

28 Directed Ear Decomposition (DED) • A directed ear decomposition of a graph G is a partition of G into simple directed paths P_0, P_1, …, P_k such that: P_0 is a simple cycle; the endpoints of each P_i, i>0, are in lower-indexed paths P_j, P_l, j,l<i; the internal vertices of each P_i, i>0, are not in any P_j, j<i • G has a directed ear decomposition if and only if it is strongly connected (there exists a directed cycle containing each pair of vertices u,v). • Planar DED: O(sort(N)) I/Os

29 Planar Topological Sort using DED  Theorem [KK’79]: The directed dual of a planar DAG is strongly connected and therefore has a directed ear decomposition.  Idea: Place vertices to the left of P 0 before vertices to the right Sort two sets recursively  Used in PRAM topological sort algorithm [KK93,K93]  PRAM simulation  O(sort(N)log N) I/Os  Improved to O(sort(N)) by defining and utilizing ordered ear decomposition tree [ATZ’03]

30 O( sort (N)) Topological Sort using B 2 -division Same idea as in planar SSSP algorithm  Construct a substitute graph G R using B 2 -division edge from v to u on boundary of G i if exists path from v to u in G i  Topologically sort G R (separator vertices in G): Store in-degree of each vertex in list L Maintain list of in-degree zero vertices Repeatedly: Number an in-degree zero vertex v Consider all edges (v,u) and decrement in-degree of u in L  analysis exactly as in SSSP algorithm O(scan(N)) if B 2 -division is given B2B2 v
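The in-degree based numbering described on this slide can be sketched in internal memory as follows. This is a minimal Python sketch; the function name `topological_order` and the vertex numbering 0..n-1 are assumptions, and the I/O-efficient version additionally lays out the in-degree list by boundary sets.

```python
from collections import deque


def topological_order(n, edges):
    """Vertices are 0..n-1, edges is a list of (v, u) meaning v -> u."""
    indeg = [0] * n                      # the list L of in-degrees
    adj = [[] for _ in range(n)]
    for v, u in edges:
        adj[v].append(u)
        indeg[u] += 1
    zero = deque(v for v in range(n) if indeg[v] == 0)   # in-degree zero vertices
    order = []
    while zero:
        v = zero.popleft()
        order.append(v)                  # number v
        for u in adj[v]:
            indeg[u] -= 1                # decrement in-degree of u in L
            if indeg[u] == 0:
                zero.append(u)
    if len(order) != n:
        raise ValueError("graph contains a cycle")
    return order


if __name__ == "__main__":
    print(topological_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))
```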

31 O( sort (N)) Topological Sort using B 2 -division  Problem: Not clear how to incorporate removed vertices from G in topological order of separator vertices (G R )  Solution (assuming only one in-degree zero vertex s for simplicity): Longest-path-from-s order is a topological order Longest paths to removed vertices locally computable from longest-paths to boundary vertices 1 2 3 4 5 B F C D A E s t B2B2

32 O(sort(N)) Topological Sort using B^2-division 1. Compute a B^2-division of G 2. Construct substitute graph G_R: the weight of the edge between v and u on the boundary of G_i equals the length of the longest path from v to u in G_i 3. Compute longest path to each vertex in G_R (same as in G): Maintain list L of longest paths seen to each vertex Repeatedly: Obtain longest path for next vertex v in topological order Consider all edges (v,u) and update longest path to u 4. Find longest path to vertices inside clusters → analysis exactly as for planar SSSP algorithm → O(scan(N)) if B^2-division is given
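The longest-path idea can be illustrated with a small internal-memory sketch. It is a Python sketch under assumptions: the function name `longest_paths_from` is hypothetical, all edge weights are positive, and s is the only in-degree zero vertex, so that sorting by longest-path length yields a topological order.

```python
from collections import deque


def longest_paths_from(n, edges, s):
    """edges: list of (v, u, w) meaning v -> u with weight w; vertices are 0..n-1."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for v, u, w in edges:
        adj[v].append((u, w))
        indeg[u] += 1
    longest = [float("-inf")] * n        # the list L of longest paths seen so far
    longest[s] = 0
    ready = deque(v for v in range(n) if indeg[v] == 0)
    while ready:
        v = ready.popleft()              # next vertex in topological order
        for u, w in adj[v]:
            if longest[v] + w > longest[u]:
                longest[u] = longest[v] + w   # update longest path to u
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return longest


if __name__ == "__main__":
    dag = [(0, 1, 1), (0, 2, 1), (1, 3, 1), (2, 3, 1), (1, 2, 1)]
    lengths = longest_paths_from(4, dag, 0)
    print(lengths, sorted(range(4), key=lambda v: lengths[v]))
```

Every edge (v,u) with positive weight forces the longest path to u to be strictly longer than the one to v, which is why the sorted order respects all edges.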

33 Outline  I/O-efficient graph algorithms Problems, techniques and results  Algorithms for planar graphs using graph separation Shortest paths (SSSP, BFS, APSP) DFS Topological sort on planar DAGs Data structure for SP queries  A GIS application: TerraFlow

34 Data Structure for SP Queries on Planar Digraphs  Problem: pre-process a planar digraph into a data structure in order to answer efficiently distance (shortest path) queries between arbitrary vertices  Trade-off space-query: O(S) space, query = ? The two extreme straightforward solutions: O(N) space, O(sort(N)) I/O query O(N 2 ) space, O(1) I/O query  Related work: Planar graphs: [Arikati et al, Djidjev, 1996] [Chen & Xu, 2000] Space-query trade-off: for any S in [N, N 2 ], S x Q = O(N 2 ) General graphs: approx shortest paths [Cohen, Halperin, Zwick, …] I/O-model : space, query [HMZ’99]

35  Basic data structure [Arikati et al, Djidjev]: Recursively, compute a separator and store for each vertex u in G the shortest path from u to all separator vertices. Space, query time, I/Os [HMZ’99]  Generalized to any S in [N, N 2 ]: O(S) space, Q=O(N2/S) Use R-division S in [N, N 3/2 ]: Store shortest paths between the separator vertices and compute shortest path in each cluster on the fly. S in [N 3/2, N 2 ]: Pre-process each cluster as a basic data structure and for any vertex u in G store shortest paths from u to all separator vertices.  I/O-model S in [N, N 3/2 ]: ? S in [N 3/2, N 2 ]: O(S) space, query using [HMZ’99] Data Structure for SP Queries on Planar Digraphs

36 • General framework: Compute an R-division. Store APSP between separator vertices. This uses space O(N^2/R). Query for u in G_i and v in G_j: δ(u,v) = min{δ_Gi(u,α) + δ_GR(α,β) + δ_Gj(β,v) | α in Bnd(G_i), β in Bnd(G_j)} • Problems 1. Store APSP between separator vertices so that the O(R) distances δ(Bnd(G_i), Bnd(G_j)) can be retrieved efficiently in O(scan(R)) I/Os 2. Compute δ_Gj(u,v) in O(scan(R)) I/Os Pre-process each cluster recursively 3. Compute δ_Gi(u, Bnd(G_i)) in O(scan(R)) I/Os Pre-process each cluster into a data structure for answering all-boundary-SP queries

37 Data Structure for SP Queries on Planar Digraphs  Let G be a planar graph of size N and Bnd(G) its boundary of size O(N 1/2 ). There exists a data structure that uses space O(N lg N) and answers all- boundary-shortest-path queries in O(N/B) I/Os.  Theorem: For any S in [N, N 2 /B] there exists a data structure which answers distance queries in I/Os and can be built in I/Os. The size is. if and if. For S = Θ(N): O(N log 2 N) space and O(N/B) query For any S/N = Ω (N ε ) or S = Ω (N 1 +ε ) for some ε in (0,1] There exists a data structure of size O(S) which answers distance queries in I/Os and can be built in I/Os.

38 Outline  I/O-efficient graph algorithms Problems, techniques and results  Algorithms for planar graphs using graph separation Shortest paths DFS Topological sort on planar DAGs Data structure for SP queries  A GIS application: TerraFlow

39 DEM Representations 324 758 719 324 758 719 324 758 719 324 758 719 TIN Grid Contour lines Sample points TerraFlow  Grids DEMs  grid graphs  On grid graphs: BFS, SSSP, CC in O(sort(N)) I/Os

40 Example: LIDAR Terrain Data  Massive (irregular) point sets (1-10m resolution)  Relatively cheap and easy to collect Example: Jockey’s ridge (NC coast) TerraFlow

41 Modeling Flow on Terrains  What happens when it rains? Predict areas susceptible to floods. Predict location of streams.  Flow is modeled by computing two basic attributes from the DEM of the terrain: Flow Direction (FD) The direction water flows at a point Flow Accumulation (FA) Total amount of water that flows through a point if water is distributed according to the flow directions TerraFlow

42 Flow Accumulation of Panama TerraFlow

43 Panama Flow Accumulation: zoom TerraFlow

44 GIS Performance on Massive Data • GRASS (open source GIS) Killed after running for 17 days on a 6700 x 4300 grid (approx 50 MB dataset) • TARDEM (research, U. Utah) Killed after running for 20 days on a 12000 x 10000 grid (approx 240 MB dataset) CPU utilization 5%, 3GB swap file • ArcInfo (commercial GIS) Can handle the 240MB dataset Doesn’t work for datasets bigger than 2GB TerraFlow

45 Flow Direction (FD) on Grids • On grids: Approximated using 3x3 neighborhood Problem: flat areas (plateaus and sinks) • Goal: compute FD grid Every cell has flow direction Flow directions do not induce cycles Every cell has a flow path outside the terrain TerraFlow
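The 3x3-neighborhood approximation can be sketched as a single-flow-direction (D8-style) rule: each cell points to its steepest downslope neighbor. This is a minimal Python sketch; the function name `d8_flow_directions` is hypothetical, and as a simplification cells with no lower neighbor (flat areas, sinks, and low cells on the grid edge) are left as `None` rather than being resolved or routed off the terrain.

```python
import math


def d8_flow_directions(elev):
    """elev: list of rows of heights. Returns a grid of (dr, dc) offsets or None."""
    rows, cols = len(elev), len(elev[0])
    fd = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    # Diagonal neighbors are sqrt(2) cell widths away.
                    slope = (elev[r][c] - elev[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope = slope
                        fd[r][c] = (dr, dc)
    return fd


if __name__ == "__main__":
    grid = [[9, 8, 7],
            [8, 5, 4],
            [7, 4, 1]]
    for row in d8_flow_directions(grid):
        print(row)
```

Since every assigned direction points strictly downhill, the directions cannot form a cycle; the cells left as `None` are the ones the plateau and sink handling has to resolve.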

46 FD on Flat Areas  Plateaus A cell flows towards the nearest spill point on the boundary of the plateau Compute FD on plateaus using CC and BFS  Sinks Route the water uphill out of the sink by modeling flooding: uniformly pouring water on terrain until steady-state is reached Flooding removes (fills) sinks  Assign uphill flow directions on the original terrain by assigning downhill flow directions on the flooded terrain TerraFlow

47 Flooding • Watershed: part of the terrain that flows into a sink • Sinks → partition of terrain into watersheds → watershed graph G_T Vertices are watersheds; add vertex for the “outside” watershed Edge (u,v) if watersheds u,v are adjacent Edge (u,v) labeled with lowest height on boundary between u and v • Flooding: Compute for each watershed u the height h_u of the lowest-height path in G_T from u to the “outside” watershed, where the height of a path is the height of the highest edge on the path TerraFlow

48 Flooding  Plane-sweep algorithm with a Union-Find structure Initially only the outside watershed is done Sweep watershed graph bottom-up with a horizontal plane When hit edge (u,v) If both watersheds u and v are done, ignore If none is done, union them If precisely one is not done, raise it at h (u,v) and mark it done  Theorem: Flooding and the FD grid can be computed in O(sort(N)) I/Os on a grid DEM of size N. TerraFlow
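The sweep can be sketched in internal memory with a small union-find structure. This is an illustrative Python sketch, not TerraFlow's implementation: the function name `flood_heights`, the integer watershed ids, and the edge triples `(u, v, height)` are assumptions.

```python
def flood_heights(num_vertices, outside, edges):
    """
    Vertices 0..num_vertices-1 are watersheds, `outside` is the outside watershed.
    edges: list of (u, v, h) with h the lowest height on the shared boundary.
    Returns {watershed: height it is raised to}; unreachable watersheds are omitted.
    """
    parent = list(range(num_vertices))
    members = {w: [w] for w in range(num_vertices)}
    done = {outside}                      # roots of components that are done
    raised = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    for u, v, h in sorted(edges, key=lambda e: e[2]):    # sweep bottom-up
        ru, rv = find(u), find(v)
        if ru == rv or (ru in done and rv in done):
            continue                                      # both done: ignore
        if ru not in done and rv not in done:
            parent[rv] = ru                               # none done: union
            members[ru].extend(members.pop(rv))
        else:
            pending = rv if ru in done else ru            # exactly one not done
            for w in members[pending]:
                raised[w] = h                             # raise to h(u,v)
            done.add(pending)
    return raised


if __name__ == "__main__":
    graph = [(0, 1, 5), (1, 3, 8), (0, 3, 10), (2, 3, 3), (1, 2, 6)]
    print(flood_heights(4, outside=3, edges=graph))
```

In the example, watersheds 0 and 1 merge at height 5 and spill into the already-done watershed 2 at height 6, which is lower than their direct edges to the outside.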

49 Flow Accumulation (FA) on Grids FA models the amount of water flowing through each cell under “uniform rain” Initially one unit of water in each cell Water distributed from each cell to neighbors pointed to by its FD Flow conservation: If several FD, distribute proportionally to height difference Flow accumulation of cell is total flow through it Goal: compute FA for every cell in the grid (FA grid) Theorem: The FA grid can be computed in O(sort(N)) I/Os. TerraFlow

50 • TerraFlow: implementation of I/O-efficient FD and FA algorithms Significantly faster on very large grids than existing GIS software Scalable: 1 billion elements!! (>2GB data) Allows multiple methods of flow modeling • Implementation C++, uses TPIE (Transparent Parallel I/O Environment) Library of I/O-efficient modules developed at Duke • Experimental platform TerraFlow, ArcInfo: 500MHz Alpha, FreeBSD 4.0, 1GB RAM GRASS/TARDEM: 500MHz Intel PIII, FreeBSD/Windows, 1GB RAM http://www.cs.duke.edu/geo*/terraflow TerraFlow

51  GRASS cannot handle Hawaii dataset (killed after 17 days)  TARDEM cannot handle Cumberlands dataset (killed after 20 days)  Significant speedup over ArcInfo (ESRI) for large datasets East-Coast TerraFlow: 8.7 Hours ArcInfo: 78 Hours Washington TerraFlow: 63 Hours ArcInfo: % http://www.cs.duke.edu/geo*/terraflow TerraFlow

52 TerraFlow in GRASS  r.terraflow Port of TerraFlow into GRASS Available with GRASS 5.0.2  Preliminary results on Quality of output Comparison with r.watershed SFD, MFD comparison Performance analysis  Good response from users http://grass.itc.it TerraFlow

53 Preliminary Experimental Results PIII dual 1GHz processor, 1GB RAM
Dataset             Grid dimensions   Grid size (million elements)   r.terraflow
Kaweah              1163 x 1424       1.6                            1.85 min
Puerto Rico         4452 x 1378       5.9                            4.65 min
Sierra Nevada       3750 x 2672       9.5                            19.22 min
Hawaii              6784 x 4369       28.2                           22.35 min
Lower New England   9148 x 8509       77.8                           114 min
Panama              11283 x 10862     122.5                          3.5 hr
r.watershed: 9.2 min, 93 min, 18.2 hours, killed after 6 days < 1% done
http://grass.itc.it TerraFlow

54 I/O-Efficient GIS Future Directions  TerraFlow Extend flow direction modeling (D-inf) Realistic treatment of flat areas Partial flooding Computing complete watershed hierarchy  Processing LIDAR data Point to grid conversion, point to TIN conversion, terrain simplification, Delaunay triangulation…  TINs Practical algorithms on triangulations Flow modeling on TINs Geometric? Graph theoretical?

55 I/O-Efficient Graph Algorithms Open Problems  Improved algorithms for general digraphs  O(sort(N)) DFS on planar digraphs Planar DAGs: can a DFS-tree be computed using topological order?  O(sort(E)) algorithm for CC/MST  Improved DFS on general undirected graphs (clustering?)  Simple and feasible O(sort(N)) algorithms for planar graphs and in particular for triangulations  Dynamic data structures for planar graphs

56 The End

57 Upper Bounds Dense graphs  CC,MST: :  BFS:  SSSP:  DFS: Sparse graphs E=O(V)  CC, MST: O(sort(V)) if graph closed under edge contraction  BFS:  SSSP:  DFS: O(V)  O(sort(V)) BFS, DFS, SSSP on planar graphs, outerplanar graphs, grid graphs, bounded-tree-width graphs General undirected graphs

58 MST Contraction Step  Used in PRAM MST algorithms [CLC’82]  Each vertex selects its lightest adjacent edge  Lemma: Each selected edge must be part of MST  The selected edges are contracted: Number of resulting vertices at most V/2 Note: contraction does not reduce the number of edges  MST contraction step in O(sort(E)) I/Os Finding the representative of a super-vertex [ABT’01]
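Repeating the contraction step until one super-vertex remains gives Borůvka's MST algorithm, sketched below in internal memory. This is a Python sketch under assumptions: the function name `boruvka_mst` is hypothetical, edges are `(weight, u, v)` triples, and ties are broken by comparing whole triples so that the selected edges never form a cycle.

```python
def boruvka_mst(n, edges):
    """Vertices are 0..n-1; edges is a list of distinct (w, u, v) triples."""
    rep = list(range(n))                  # representative of each super-vertex

    def find(x):
        while rep[x] != x:
            rep[x] = rep[rep[x]]
            x = rep[x]
        return x

    mst = set()
    while True:
        lightest = {}                     # super-vertex -> lightest adjacent edge
        for e in edges:
            w, u, v = e
            ru, rv = find(u), find(v)
            if ru == rv:
                continue                  # edge lies inside a super-vertex
            for r in (ru, rv):
                if r not in lightest or e < lightest[r]:
                    lightest[r] = e
        if not lightest:
            break                         # nothing left to contract
        for w, u, v in lightest.values():
            ru, rv = find(u), find(v)
            if ru != rv:                  # same edge may be selected twice
                rep[ru] = rv              # contract the selected edge
                mst.add((w, u, v))
    return mst


if __name__ == "__main__":
    g = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
    print(sorted(boruvka_mst(4, g)))
```

Each round scans the whole edge list, which mirrors the note that contraction reduces the number of vertices but not the number of edges.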

59 I/O-Efficient MST Graph contraction algorithm can be improved by grouping the contraction steps in super-steps Each super-step in O(sort(E) + sort(V)) I/Os Basic idea: in order to perform k contraction steps need to know only the 2 k lightest edges adjacent to each node  each super-step works with a subset of the edges {nb contraction steps} x {subset of edges} = O(V) 

60 A direct O( sort (N) log N) DFS Algorithm on Planar Undirected Graphs  Divide-and-conquer using cycle separators [PRAM DFS, Smith86]  Algorithm Compute a cycle separator C and path P Compute DFS recursively in the connected components G i of G\P Attach the DFS trees of G i onto the cycle  I/O-analysis O(log N) recursive steps O(sort(N)) I/Os per step simple O(sort(N)) algorithm for finding a cycle separator  O(sort(N) log N) I/Os in total

61 Planar DFS  Denote G i = union of the boundaries of faces at level <= i H i = G i \ G i-1 T i = DFS-tree of G i  Structure of levels is simple The bicomps of the H i are the boundary cycles of G i

62 Planar DFS  Algorithm: Compute DFS of H i and attach it onto T i-1  Attaching onto T i-1 :

63 Planar DAGs Summary and Open Problems  If the B 2 -division is given Topological sort can be computed in O(scan(N)) I/Os Extends to BFS and SSSP  Simplified O(scan(N)) algorithms for planar DAGs B 2 -division ? ? scan(N) SSSP BFS Topological sort DFS

64 Massive Terrain Data  Remote sensing technology Massive amounts of terrain data Higher resolutions (1km, 100m, 30m, 10m, 1m,…)  NASA-SRTM Mission launched in 2001 Acquired data for 80% of earth at 30m resolution 5TB  USGS Most of US at 10m resolution  LIDAR 1m res TerraFlow

65 Uses Flow direction and flow accumulation are used for:  Computing other hydrological attributes river network moisture indices watersheds and watershed divides  Analysis and prediction of sediment and pollutant movement in landscapes.  Decision support in land management, flood and pollution prevention and disaster management TerraFlow

66  Algorithm: Input: flow direction grid FD Output: flow accumulation grid FA (initialized to 1) Process (sweep) cells in topological order. For each cell: Read flow from FA grid and direction from FD grid Update flow in FA grid for downslope neighbors  Analysis One sweep enough: O(sort) + O(N) time for a grid of N cells,..but O(N) I/Os: Cells in topological order distributed over the terrain Standard FA Algorithm TerraFlow

67 I/O-Efficient FA Algorithm  Eliminating scattered accesses to FD grid Store FD grid in topological order  Eliminating scattered accesses to FA grid.. ….by replacing them with accesses to a p-queue Idea: Flow to neighbor is only needed when neighbor is processed time when cell is processed topological rank priority Push flow by Insert-ing a flow increment in p-queue with priority equal to neighbor’s time Flow of cell obtained using DeleteMin Note: Augment each cell with priorities of 8 neighbors Obs: Space (~9N) traded for I/O  The FA grid can be computed in O(sort(N)) I/Os. TerraFlow
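The priority-queue idea can be sketched in internal memory as follows. This is an illustrative Python sketch, not the TerraFlow code: the function name `flow_accumulation` is hypothetical, it handles a single flow direction per cell, it assumes every direction points to a strictly lower cell, and it looks up the neighbor's rank in a dictionary where the I/O-efficient version stores the neighbor priorities with each cell.

```python
import heapq


def flow_accumulation(elev, fd):
    """elev: grid of heights; fd: grid of (dr, dc) offsets or None. Returns FA grid."""
    rows, cols = len(elev), len(elev[0])
    # Topological order: cells sorted by decreasing height.
    cells = sorted(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda rc: -elev[rc[0]][rc[1]])
    rank = {cell: t for t, cell in enumerate(cells)}   # time when cell is processed
    fa = [[0] * cols for _ in range(rows)]
    pq = []                                            # entries: (priority, flow)
    for t, (r, c) in enumerate(cells):
        flow = 1                                       # one unit of rain per cell
        while pq and pq[0][0] == t:
            flow += heapq.heappop(pq)[1]               # DeleteMin: incoming flow
        fa[r][c] = flow
        if fd[r][c] is not None:
            dr, dc = fd[r][c]
            # Push flow forward to the time at which the neighbor is processed.
            heapq.heappush(pq, (rank[(r + dr, c + dc)], flow))
    return fa


if __name__ == "__main__":
    heights = [[5, 4],
               [3, 1]]
    directions = [[(1, 1), (1, 0)],
                  [(0, 1), None]]
    print(flow_accumulation(heights, directions))
```

The FA grid is written once in processing order and never read back, so the scattered reads of the standard algorithm are replaced by queue operations.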

68 GRASS:>r.terraflow help
Description: Flow computation for massive grids.
Usage: r.terraflow [-sq] elev=name filled=name direction=name watershed=name accumulation=name tci=name [d8cut=value] [memory=value] [STREAM_DIR=name] [stats=name]
Flags:
  -s  SFD (D8) flow (default is MFD)
  -q  Quiet
Parameters:
  elev          Input elevation grid
  filled        Output (filled) elevation grid
  direction     Output direction grid
  watershed     Output watershed grid
  accumulation  Output accumulation grid
  tci           Output tci grid
  d8cut         If flow accumulation is larger than this value it is routed using SFD (D8) direction (meaningfull only for MFD flow only). default: infinity
  memory        Main memory size (in MB) default: 300
  STREAM_DIR    Location of intermediate STREAMs default: /var/tmp
  stats         Stats file default: stats.outv
http://www.cs.duke.edu/geo*/terraflow

69 Flat DEM

70 r.terraflow MFD

71 r.terraflow SFD

72 r.watershed

73 r.terraflow MFD zoom,2D

74 r.terraflow SFD zoom,2D

75 It’s Growing! • Appalachian Mountains Area of approx. 800 km x 800 km Sampled at: 100m resolution: ~64 million points (128MB) 30m resolution: ~640 million points (1.2GB) 10m resolution: ~6400 million = 6.4 billion points (12GB) 1m resolution: ~640 billion points (1.2TB)
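The point counts follow from simple arithmetic, sketched below in Python. The function name `grid_points` is hypothetical, and the 2 bytes per sample is an assumption inferred from the pairing of 64 million points with 128MB.

```python
def grid_points(side_km: int, resolution_m: int) -> int:
    """Number of samples in a square side_km x side_km area at the given spacing."""
    per_side = side_km * 1000 // resolution_m
    return per_side * per_side


if __name__ == "__main__":
    BYTES_PER_POINT = 2   # assumption, see the lead-in
    for res in (100, 30, 10, 1):
        pts = grid_points(800, res)
        print(f"{res}m: {pts:,} points, {pts * BYTES_PER_POINT / 1e9:.2f} GB")
```

The slide's figures are rounded: each step from 100m to 10m to 1m multiplies the count by 100, while the exact count at 30m is about 711 million.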

