Digital Terrain Analysis for Massive Grids

Digital Terrain Analysis for Massive Grids
Lars Arge, Jeff Chase, Laura Toma, Jeff Vitter, Rajiv Wickremesinghe, in collaboration with Pat Halpin and Dean Urban
http://www.cs.duke.edu/geo*/terraflow

Modeling Flow
[Figure: Sierra Nevada DEM; flow direction; flow accumulation]

Modeling Flow
Flow direction: the direction in which water flows out of a cell.
Flow routing: compute the flow direction for every cell in the grid. Flat areas need special handling, and sinks are removed by flooding.
Flow accumulation value: the total area that flows through a cell, per unit width of contour.
Flow accumulation: compute the flow accumulation value for every cell in the terrain, distributing flow according to the flow directions.
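One standard way to compute single flow directions is the D8 rule: each cell sends its flow to the steepest downslope neighbour among its eight surrounding cells. The sketch below is a minimal in-memory version over a list-of-lists grid. The direction encoding (an index into a clockwise list of offsets) and the function name are hypothetical choices for illustration, and unlike TerraFlow the sketch makes no attempt at I/O efficiency.

```python
import math

# D8 neighbour offsets (drow, dcol), clockwise from east.
# Hypothetical encoding: a flow direction is an index into this list.
D8_OFFSETS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def d8_flow_directions(elev, cell_size=1.0):
    """Return, for each cell, the index into D8_OFFSETS of its steepest
    downslope neighbour, or None if no neighbour is lower (sink or flat)."""
    rows, cols = len(elev), len(elev[0])
    dirs = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope, best_dir = 0.0, None
            for k, (dr, dc) in enumerate(D8_OFFSETS):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbours are sqrt(2) cell sizes away.
                    dist = cell_size * math.hypot(dr, dc)
                    slope = (elev[r][c] - elev[nr][nc]) / dist
                    if slope > best_slope:  # strict: ties keep the first direction
                        best_slope, best_dir = slope, k
            dirs[r][c] = best_dir
    return dirs


dirs = d8_flow_directions([[5, 4, 3], [4, 3, 2], [3, 2, 1]])
```

Cells left as None (no lower neighbour) are exactly the flat areas and sinks that flow routing has to treat separately.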

Applications
Automatic estimation of terrain parameters (watersheds, drainage networks, topographic index), surface saturation, soil water content, erosion and deposition, forest structure, species diversity, and sediment transport.

Massive Data
Remote sensing data available today: USGS (entire US at 10m resolution) and NASA-SRTM (whole Earth, 5TB at 30m resolution). Higher-resolution data are also available.
Example: the Appalachian Mountains dataset is 500MB at 100m resolution, 5.5GB at 30m, 50GB at 10m, and 5TB at 1m.
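The Appalachian figures follow from a simple scaling rule: for the same area, dividing the cell spacing by k multiplies the number of cells by k squared. A quick check, assuming a fixed number of bytes per cell (the function name is hypothetical):

```python
def scaled_size_mb(base_size_mb, base_res_m, target_res_m):
    """Estimate grid size at a new resolution, assuming the same covered
    area and the same bytes per cell: cell count scales with
    (base_res / target_res) ** 2."""
    return base_size_mb * (base_res_m / target_res_m) ** 2


for res in (100, 30, 10, 1):
    print(f"{res:>3} m: {scaled_size_mb(500, 100, res):,.0f} MB")
```

Starting from 500MB at 100m, this gives roughly 5.5GB at 30m, 50GB at 10m, and 5TB (5,000,000MB) at 1m, matching the slide.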

Problems with Existing Software
GRASS (r.watershed): killed after 17 days on a 50MB dataset.
TARDEM (flood, d8, aread8): handles the 50MB dataset, but was killed after running for 20 days on a 130MB dataset (CPU utilization 5%, 3GB swap file).
ArcInfo (flowdirection, flowaccumulation): handles the 130MB dataset, but does not work for files larger than 2GB.

Our Results: TerraFlow
A collection of programs for flow routing and flow accumulation on massive grids.
Theoretical results: flow routing and flow accumulation are modeled as graph problems and solved within optimal bounds.
Practical results: efficient (2 to 1000 times faster than existing software on massive grids); scalable (1 billion elements, more than 2GB of data); flexible; outputs are similar to those of ArcInfo flowdirection and flowaccumulation.

Scalability: Why? How?
Massive data do not fit in memory, so the OS places the data on disk and moves them in and out of memory in blocks. Accessing disk is about 1000 times slower than accessing main memory, so disk I/O is the bottleneck. What matters is local data accesses vs. scattered data accesses.

Local Accesses vs. Scattered Accesses
Example: reading an array from disk. Array size N = 10 elements; disk block size B = 2 elements; memory size = 4 elements (2 blocks).
Local accesses: consecutive elements share a block (on disk: [1 2] [5 6] [9 10] [3 4] [7 8]), so reading the array in order loads 5 blocks.
Scattered accesses: consecutive elements lie in different blocks, so reading the array in order loads 10 blocks.
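The block counts in this example can be checked with a small simulation. The sketch assumes least-recently-used (LRU) replacement, which the slides do not specify, uses the local layout from the example, and pairs it with a hypothetical scattered layout in which each block holds elements five positions apart. The function name is also hypothetical.

```python
from collections import OrderedDict


def count_block_loads(layout, access_order, memory_blocks):
    """Count block transfers when elements are read in `access_order`.

    `layout` lists the disk blocks (tuples of element ids). Memory holds
    `memory_blocks` blocks and evicts the least recently used one (LRU).
    """
    block_of = {e: i for i, block in enumerate(layout) for e in block}
    cache = OrderedDict()  # block index -> None, ordered by recency of use
    loads = 0
    for e in access_order:
        b = block_of[e]
        if b in cache:
            cache.move_to_end(b)  # hit: mark block as most recently used
        else:
            loads += 1  # miss: transfer the block from disk
            if len(cache) == memory_blocks:
                cache.popitem(last=False)  # evict least recently used block
            cache[b] = None
    return loads


order = range(1, 11)  # read elements 1..10 in order
local = [(1, 2), (5, 6), (9, 10), (3, 4), (7, 8)]  # layout from the example
scattered = [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]  # hypothetical layout
print(count_block_loads(local, order, 2), count_block_loads(scattered, order, 2))  # prints: 5 10
```

With two blocks of memory, every access to the scattered layout misses, so the scan costs N = 10 transfers instead of N/B = 5.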

Scalability: Why? How?
Local data accesses need N/B block transfers, far fewer than the N transfers that scattered data accesses can require. However good the OS, it cannot change the data access pattern of the program!

TerraFlow Approach
Improve locality by redesigning the algorithms (I/O-efficient algorithms). Blocks are at least 8KB (often 32KB or 64KB). Compute on a whole block while it is in memory, instead of loading a block each time an element is needed. The potential speedup is a factor of the block size, i.e. the number of elements per block.
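"Compute on a whole block while it is in memory" is the basic pattern behind I/O-efficient scans. As a minimal sketch, assuming the grid is stored as a headerless file of native 32-bit floats (a hypothetical format, not TerraFlow's), the elevation range can be found by reading the file sequentially in fixed-size blocks and using every value in a block before the next read. The function name is hypothetical.

```python
from array import array


def elevation_range(path, block_bytes=64 * 1024):
    """Return (min, max) of a headerless file of native 32-bit floats,
    reading it sequentially, one block at a time."""
    itemsize = array("f").itemsize  # 4 bytes on common platforms
    if block_bytes % itemsize:
        raise ValueError("block_bytes must be a multiple of the value size")
    lo, hi = float("inf"), float("-inf")
    with open(path, "rb") as f:
        while chunk := f.read(block_bytes):  # one block per iteration
            values = array("f")
            values.frombytes(chunk)
            # Use every value in the block before reading the next one.
            lo, hi = min(lo, min(values)), max(hi, max(values))
    return lo, hi
```

With 64KB blocks and 4-byte values, each read delivers 16,384 elevations, which is the sense in which the speedup over one transfer per element is a factor of the block size.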

Related Work
TerraFlow's emphasis is on the computational aspects of flow modeling, not on the modeling itself.
Flow modeling: D8 method for flow accumulation [O'Callaghan and Mark 1984]; general technique of flooding [Jenson and Domingue 1988].
Existing software: ArcInfo, GRASS, TARDEM, Topaz, Tapes-G, RiverTools.

Flow Routing on Flat Areas
On flat areas there is no obvious flow direction.
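One simple way to route flow across a flat area is a breadth-first search that starts from the cells that already drain and spreads across neighbours of equal elevation, so each flat cell points toward the nearest outlet. This is a common approach sketched for illustration; the slides do not say how TerraFlow assigns directions on flats. The sketch assumes single flow directions are already computed, with undirected cells marked None and directions encoded as indices into a hypothetical offset list.

```python
from collections import deque

# D8 neighbour offsets (drow, dcol); a direction is an index into this list
# (hypothetical encoding).
OFFSETS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def route_flats(elev, dirs):
    """Give a direction to undirected cells (dirs[r][c] is None) in flat areas
    that have an outlet, pointing each toward the nearest draining cell of the
    same elevation. Modifies dirs in place; cells in flats with no draining
    cell (true sinks) keep None."""
    rows, cols = len(elev), len(elev[0])
    # Multi-source BFS: every cell that already drains is a seed at distance 0.
    queue = deque((r, c) for r in range(rows) for c in range(cols) if dirs[r][c] is not None)
    while queue:
        r, c = queue.popleft()
        for dr, dc in OFFSETS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and dirs[nr][nc] is None and elev[nr][nc] == elev[r][c]):
                # Point the flat neighbour back at the cell it was reached from.
                dirs[nr][nc] = OFFSETS.index((r - nr, c - nc))
                queue.append((nr, nc))
    return dirs
```

Because the search only crosses cells of equal elevation, flow is never routed uphill; flats without any draining cell are left for the flooding step.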

TerraFlow Outline
Flow routing: flood the terrain to eliminate sinks; identify watersheds and construct the watershed graph; collapse the watershed graph and raise sinks.
Flow accumulation: sweep the terrain top-down to distribute flow.
All these steps can be solved I/O-efficiently.
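The flooding step removes sinks by raising each depression to the level at which it would spill over. TerraFlow does this I/O-efficiently through the watershed graph; as a simpler in-memory stand-in, the sketch below uses the well-known priority-flood technique, which grows inward from the grid boundary using a min-heap. The function name and the assumption that every boundary cell drains off the grid are choices made for illustration.

```python
import heapq


def priority_flood_fill(elev):
    """Return a copy of elev with every depression raised to its spill
    elevation, so that every cell can drain to the grid boundary."""
    rows, cols = len(elev), len(elev[0])
    filled = [row[:] for row in elev]
    seen = [[False] * cols for _ in range(rows)]
    heap = []
    # Assumption: boundary cells drain off the grid, so they seed the flood.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r][c], r, c))
                seen[r][c] = True
    while heap:
        z, r, c = heapq.heappop(heap)  # lowest cell on the current flood front
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]:
                    seen[nr][nc] = True
                    # A cell lower than the spill level reaching it is raised.
                    filled[nr][nc] = max(filled[nr][nc], z)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled
```

Filling turns each depression into a flat area at its spill level, which is one reason flow routing must also handle flats.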

Datasets
Sierra Nevada: 3750 x 2672 grid, 9.5 million cells (19MB)
Hawaii: 6784 x 4369 grid, 28 million cells (54MB)
East-Coast USA: 13500 x 18200 grid, 246 million cells (500MB)
Mid-West USA: 11000 x 25500 grid, 280 million cells (560MB)
Washington State: 33454 x 31866 grid, 1 billion cells (2GB)

TerraFlow vs. ArcInfo

TerraFlow: Performance
Significant speedup over ArcInfo for large grids.
East-Coast dataset: ArcInfo 78 hours; TerraFlow 8.7 hours.
Washington State dataset: TerraFlow 63 hours; ArcInfo cannot process files larger than 2GB.

TerraFlow Features
Flow directions and flow accumulation can each use SFD (single flow direction) or MFD (multiple flow directions).
Flow accumulation can also use MFD and switch to SFD once the flow value exceeds a user-defined threshold.
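The flow accumulation sweep with an MFD-to-SFD switch can be sketched in memory as follows. Assumptions, all for illustration: D8 neighbours, MFD weights proportional to slope (one common choice; the slides do not give TerraFlow's weighting), sinks and flats already removed by earlier steps, and a hypothetical function name. The sketch accumulates upslope area in units of cell area and does not divide by contour width. Processing cells in decreasing elevation guarantees that each cell has received all of its upslope flow before passing it on.

```python
import math

# D8 neighbour offsets (drow, dcol).
OFFSETS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def flow_accumulation(elev, threshold, cell_area=1.0):
    """Sweep cells from highest to lowest, passing each cell's flow downhill.

    Below `threshold`, flow is split among all lower neighbours in proportion
    to slope (MFD); at or above it, all flow goes to the steepest one (SFD).
    """
    rows, cols = len(elev), len(elev[0])
    acc = [[cell_area] * cols for _ in range(rows)]  # each cell contributes its own area
    cells = sorted(((elev[r][c], r, c) for r in range(rows) for c in range(cols)), reverse=True)
    for z, r, c in cells:
        down = []  # (slope, nr, nc) for every strictly lower neighbour
        for dr, dc in OFFSETS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and elev[nr][nc] < z:
                down.append(((z - elev[nr][nc]) / math.hypot(dr, dc), nr, nc))
        if not down:
            continue  # sink or edge outlet: flow stays here in this sketch
        if acc[r][c] >= threshold:
            _, nr, nc = max(down)  # SFD: steepest neighbour takes everything
            acc[nr][nc] += acc[r][c]
        else:
            total = sum(s for s, _, _ in down)
            for s, nr, nc in down:  # MFD: split in proportion to slope
                acc[nr][nc] += acc[r][c] * s / total
    return acc
```

On a ridge cell with two equally steep sides, a high threshold splits its flow evenly (MFD), while a low threshold sends it all to one side (SFD).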

TerraFlow: Result Samples

Conclusions / Future Work
TerraFlow: flow modeling on massive grids.
Future work: more features, modeling, new applications.