Terracost: Hazel, Toma, Vahrenhold, Wickremesinghe. Terracost: A Versatile and Scalable Approach to Computing Least-Cost-Path Surfaces for Massive Grid-Based Terrains.

Presentation transcript:

Terracost: A Versatile and Scalable Approach to Computing Least-Cost-Path Surfaces for Massive Grid-Based Terrains
ACM SAC, April 2006, Dijon, France
Thomas Hazel, Laura Toma, Jan Vahrenhold, Rajiv Wickremesinghe
Bowdoin College, U. Muenster, Duke University

Least-cost path surfaces
Problem
– Input: A cost surface of a grid terrain and a set of sources
– Output: A least-cost path surface: each point represents the shortest distance to a source
Applications
– Spread of fires from different sources
– Distance from streams or roads
– Cost of building pipelines or roads

Grid terrains
[Figures: Sierra Nevada, 30m resolution; Sierra Nevada, a cost surface]

Least-Cost Surface with one source
[Figures: Cost surface; Least-cost path surface (1 source)]

Least-cost surface with Multiple Sources
[Figures: Multiple sources; Least-cost surface (multiple sources)]

Least-cost path surfaces on massive terrains
Why massive terrains?
– Large amounts of data are becoming available
    NASA SRTM project: 30m resolution over the entire globe (~10TB)
    LIDAR data: sub-meter resolution
Traditional algorithms designed in RAM model don’t scale
– Buy more RAM? Data grows faster than memory
– Data does not fit in memory, sits on disk
– Random I/O + Virtual memory => swapping => I/O-bottleneck

I/O-Efficient Algorithms
Input data (grid) stored on disk
I/O-model [Aggarwal and Vitter, 88]
– N = size of grid
– M = size of main memory
– B = size of disk block
– I/O-operation (I/O): Reading/Writing one block of data from/to disk
I/O efficiency
– Number of I/Os performed by the algorithm
Basic I/O bounds
– Scanning: Scan(N) = Θ(N/B) I/Os
– Sorting: Sort(N) = Θ((N/B) log_{M/B} (N/B)) I/Os
– In practice M and B are big: Scan(N) < Sort(N) << N I/Os
[Figure labels: blocked I/O; M]
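To get a feel for these bounds, the sketch below plugs sample values into the two formulas. The values chosen for N, M and B are illustrative assumptions (they do not come from the slides), and the constant factors hidden by Θ are ignored.

```python
import math

def scan_ios(n, b):
    """Scan(N) ~ N/B block transfers (constant factors dropped)."""
    return math.ceil(n / b)

def sort_ios(n, m, b):
    """Sort(N) ~ (N/B) * log_{M/B}(N/B) block transfers (constant factors dropped)."""
    blocks = n / b
    return math.ceil(blocks * max(1.0, math.log(blocks, m / b)))

# Hypothetical sizes, all measured in grid cells: a 1-billion-cell grid,
# main memory holding 64M cells, disk blocks of 4096 cells.
N, M, B = 10**9, 64 * 2**20, 4096

print(scan_ios(N, B))     # block reads needed to scan the grid once
print(sort_ios(N, M, B))  # block transfers needed to sort it
print(N)                  # one I/O per cell: the random-access worst case
```

With large M and B the logarithm is a small constant, which is why sorting costs only slightly more than scanning while both stay far below N.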

Terracost
Scalable approach to computing least-cost path surfaces on massive terrains
– Based on optimal I/O-efficient algorithm: O(Sort(N)) I/Os
Experimental analysis on real-life data
– Can handle bigger grids
– Can handle more sources
Versatile: Interpolate between versions optimized for I/O or CPU
Parallelization on a cluster

Outline
Background
Shortest paths
Related Work
Shortest paths in the I/O-Model
Terracost Algorithm
Experimental analysis
Cluster implementation
Conclusions

Shortest Paths
Least-cost path surfaces correspond to computing shortest paths
Shortest paths
– Ubiquitous graph problem
– Variations
    SSSP: Single source shortest path
    MSSP: Multiple source shortest path
Grid terrains --> graphs
[Figures: Cost grid; Corresponding graph; Shortest-distance from center point]
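One way to picture the grid-to-graph step is the sketch below. The slides do not say how edge weights are derived, so the convention used here is an assumption: each cell is linked to its 8 neighbors, and an edge costs the mean of the two cell costs times the step length (1 for straight moves, √2 for diagonal ones). The function name `grid_to_graph` is hypothetical.

```python
import math

def grid_to_graph(cost):
    """Return an adjacency dict {(r, c): [((nr, nc), weight), ...]} for a cost grid."""
    rows, cols = len(cost), len(cost[0])
    graph = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        step = math.hypot(dr, dc)  # 1 for straight, sqrt(2) for diagonal
                        # Assumed weight: mean of the two cell costs times step length.
                        weight = step * (cost[r][c] + cost[nr][nc]) / 2
                        nbrs.append(((nr, nc), weight))
            graph[(r, c)] = nbrs
    return graph

g = grid_to_graph([[1, 2],
                   [3, 4]])
print(g[(0, 0)])
```

A grid of N cells yields a graph with N vertices and fewer than 8N directed edges, so the graph is sparse and its size is linear in the grid size.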

Related Work
Dijkstra’s Algorithm
– Best known for SSSP/MSSP on general graphs, non-negative weights
Recent variations on the SP algorithm
– Goldberg et al SODA 200, WAE 2005
– Kohler, Mohring, Schilling WEA 2005
– Gutman WEA 2004
– Lauther 2004
Different setting
– Point-to-point SP
    E.g. Route planning, navigation systems
– Exploit geometric characteristics of graph to narrow down search space
    Route planning graphs
– Use RAM model
– When dealing with massive graphs ==> I/O bottleneck

Dijkstra’s Algorithm
1: Insert sources in a priority queue (PQ)
2: While PQ is not empty
3:    DeleteMin vertex u with the least cost from PQ
4:    Relax all edges incident to u
In external memory
– Dijkstra’s algorithm requires 3 main data structures:
    1: Priority queue (can be implemented I/O-efficiently)
    2: Grid of costs (size = N >> M)
    3: Grid of current shortest path (size = N >> M)
– Each time we DeleteMin from PQ, for every adjacent edge (u,v) we must do a lookup in both grids to check whether v can be relaxed
– These lookups can cost O(1) I/Os each in the worst case ==> Total O(N) I/Os
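The four numbered steps can be sketched in memory as follows. This is a minimal illustration, not the Terracost code: it assumes the graph is given as a plain adjacency dict, and because Python's `heapq` has no DecreaseKey, it pushes duplicate entries and skips stale ones instead. The name `multi_source_dijkstra` and the sample graph are hypothetical.

```python
import heapq

def multi_source_dijkstra(graph, sources):
    """graph: {u: [(v, weight), ...]} with non-negative weights.
    Returns {vertex: least cost to the nearest source}."""
    dist = {s: 0 for s in sources}
    pq = [(0, s) for s in sources]        # step 1: insert sources
    heapq.heapify(pq)
    while pq:                             # step 2: while PQ is not empty
        d, u = heapq.heappop(pq)          # step 3: DeleteMin
        if d > dist[u]:
            continue                      # stale entry: u was already settled more cheaply
        for v, w in graph[u]:             # step 4: relax edges incident to u
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

graph = {
    "a": [("b", 4), ("c", 1)],
    "b": [("a", 4), ("c", 2), ("d", 5)],
    "c": [("a", 1), ("b", 2), ("d", 8)],
    "d": [("b", 5), ("c", 8)],
}
dist = multi_source_dijkstra(graph, ["a", "d"])
print(dist)
```

In this in-memory version `graph[u]` and `dist.get(v)` are cheap dictionary lookups. On a grid larger than memory, those are exactly the accesses that turn into random disk reads.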

I/O-Efficient SSSP on Grids [ATV’01]
0: Divide grid G into subgrids (G_i) of size O(M)
1: Construct a substitute graph S on the boundary vertices
   – Replace each subgrid with a complete graph on its boundary
   – For any u, v on the boundary of G_i, the weight of edge (u,v) in S is SP_{G_i}(u,v)
   Lemma: S has O(N/√M) vertices, O(N) edges and it preserves the SP in G between any two boundary vertices u, v.
2: Solve SP in S
   – Gives SP for all boundary vertices in G
3: Compute SP to vertices inside subgrids
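The vertex and edge counts in the lemma follow from simple arithmetic. The derivation below is a sketch under the assumption that each subgrid is a square of side √M, so that it holds M cells and has at most 4√M boundary cells:

```latex
\begin{align*}
\#\text{subgrids} &= \frac{N}{M}, \\
|V(S)| &\le \frac{N}{M}\cdot 4\sqrt{M} = O\!\left(\frac{N}{\sqrt{M}}\right), \\
|E(S)| &\le \frac{N}{M}\cdot \binom{4\sqrt{M}}{2} \le \frac{N}{M}\cdot 8M = O(N).
\end{align*}
```

So S has far fewer vertices than G, while its edge count stays linear in the size of the grid.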

Terracost
Step 1: (intra-tile Dijkstra)
– Partition into tiles of size R
– Compute an edge-list representation of substitute graph S
    Dijkstra from each boundary to tile boundaries
    Dijkstra from sources to tile boundaries
Step 2:
– Sort boundary-to-boundary stream
Step 3: (inter-tile Dijkstra)
– Dijkstra on S
Step 4: (final Dijkstra)
– MSSP for each tile
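The four steps can be mimicked entirely in memory to see why tiling preserves the result. The sketch below is not the authors' implementation and makes several assumptions: moves are 4-neighbor, an edge costs the mean of the two cell costs, tiles are squares of side R + 1 that overlap by one row or column so that every edge lies inside some tile, and Step 2 (the external sort) is skipped because a dict stands in for the sorted edge stream. All function names are hypothetical.

```python
import heapq

INF = float("inf")

def dijkstra(cost, seeds, r0, r1, c0, c1):
    """Dijkstra restricted to rows r0..r1, cols c0..c1; seeds: {(r, c): start distance}."""
    dist = dict(seeds)
    pq = [(d, cell) for cell, d in seeds.items()]
    heapq.heapify(pq)
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if d > dist[(r, c)]:
            continue  # stale queue entry
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if r0 <= nr <= r1 and c0 <= nc <= c1:
                nd = d + (cost[r][c] + cost[nr][nc]) / 2
                if nd < dist.get((nr, nc), INF):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return dist

def boundary(t):
    r0, r1, c0, c1 = t
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)
            if r in (r0, r1) or c in (c0, c1)]

def inside(t, s):
    return t[0] <= s[0] <= t[1] and t[2] <= s[1] <= t[3]

def tiled_least_cost(cost, sources, R):
    rows, cols = len(cost), len(cost[0])
    tiles = [(r0, min(r0 + R, rows - 1), c0, min(c0 + R, cols - 1))
             for r0 in range(0, max(rows - 1, 1), R)
             for c0 in range(0, max(cols - 1, 1), R)]
    # Step 1: intra-tile Dijkstra builds the edge list of substitute graph S.
    edges, start = {}, {}
    for t in tiles:
        bnd = boundary(t)
        for b in bnd:
            d = dijkstra(cost, {b: 0.0}, *t)
            edges.setdefault(b, []).extend((d[v], v) for v in bnd if v != b)
        seeds = {s: 0.0 for s in sources if inside(t, s)}
        if seeds:
            d = dijkstra(cost, seeds, *t)
            for b in bnd:
                start[b] = min(d[b], start.get(b, INF))
    # Step 3: inter-tile Dijkstra on S, seeded with source-to-boundary distances.
    bdist = dict(start)
    pq = [(d, v) for v, d in start.items()]
    heapq.heapify(pq)
    while pq:
        d, u = heapq.heappop(pq)
        if d > bdist[u]:
            continue
        for w, v in edges[u]:
            if d + w < bdist.get(v, INF):
                bdist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    # Step 4: final MSSP inside each tile, seeded from its boundary and its sources.
    result = [[INF] * cols for _ in range(rows)]
    for t in tiles:
        seeds = {b: bdist[b] for b in boundary(t) if b in bdist}
        seeds.update({s: 0.0 for s in sources if inside(t, s)})
        for (r, c), d in dijkstra(cost, seeds, *t).items():
            result[r][c] = min(result[r][c], d)
    return result

surface = tiled_least_cost([[1, 1, 5], [1, 9, 1], [1, 1, 1]], [(0, 0)], 1)
```

Each `dijkstra` call touches only one tile, which is what lets the real algorithm keep its working set in memory. It is also why Step 1 is the CPU-heavy part: it runs one Dijkstra per boundary vertex of every tile.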

Experimental Analysis
Experimental Platform
– Apple Power Macintosh G5
– Dual 2.5 GHz processors
– 512 KB L2 cache
– 1 GB RAM
Compare Terracost with r.cost in GRASS
– r.cost has same functionality
– GRASS users have complained it is very slow for large terrains
[Table: Dataset; Grid Size (million elements); MB (Grid Only)]
– Kaweah 1.66
– Puerto Rico
– Hawaii
– Sierra Nevada
– Cumberlands
– Lower New England
– Midwest USA

Experimental Analysis
– GRASS: r.cost
– Opt Dijkstra: internal memory version of Terracost (num tiles = 1)
– Terracost: I/O-efficient version of Terracost

CPU-I/O Tradeoff
R = tile size
I/O-complexity = O(N/√R + sort(N))
– Dominated by Step 3 (inter-tile Dijkstra)
CPU-complexity = O(N √R log R)
– Dominated by Step 1 (intra-tile Dijkstra)
So, to optimize I/Os, we want a large R. But, to optimize CPU, we want a small R.
Optimal performance: balance I/O-CPU
[Figure: Lower NE, Cost = Slope, 1GB RAM, Single source]
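The opposing pull of the two bounds is easy to see numerically. The sketch below evaluates only the R-dependent terms, with constant factors dropped and a hypothetical grid size, so the printed numbers are not comparable to each other in absolute terms; only their trends matter. A plausible reading of the CPU bound, not stated on the slide, is that there are N/R tiles, each with about 4√R boundary vertices, and each boundary vertex needs one O(R log R) Dijkstra run.

```python
import math

def io_term(n, r):
    """N / sqrt(R): the tile-size-dependent part of the I/O bound."""
    return n / math.sqrt(r)

def cpu_term(n, r):
    """N * sqrt(R) * log R: the CPU bound."""
    return n * math.sqrt(r) * math.log2(r)

N = 10**8  # hypothetical grid size in cells
for R in (10**2, 10**4, 10**6):
    print(f"R={R:>8}  I/O ~ {io_term(N, R):.2e}  CPU ~ {cpu_term(N, R):.2e}")
```

Growing R by a factor of 100 cuts the I/O term by 10 and raises the CPU term by more than 10, which is why the best tile size has to be found by balancing the two.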

Terracost on Clusters
Terracost lends itself to parallelization
We parallelized the most CPU-intensive part
– Computing the substitute graph (Step 1)
Hgrid
– Cluster management tool
– Clients submit requests (run jobs, query status); agents get jobs and run them
Near-linear speedup

Conclusions and Future Work
Key Points
– Dijkstra’s algorithm is I/O-inefficient on large data sets
– Terracost restructures the input grid to run I/O-efficiently
    But we can’t ignore CPU-complexity completely
– I/O-bottleneck increases with number of sources for MSSP
– Tiling in Terracost allows for parallelization
Future Work
– Determine the optimal tile size analytically
– Find I/O-efficient SSSP/MSSP without an increase in CPU-complexity

Thank you.