Efficient Handling of Massive (Terrain) Datasets. Lars Arge, Aarhus Universitet, Department of Computer Science.

Presentation transcript:

Lars Arge 1/13 Efficient Handling of Massive (Terrain) Datasets
Lars Arge
Aarhus Universitet, Department of Computer Science

Lars Arge 2/13 Massive Data Algorithmics
• Massive data is being acquired and used everywhere
• Storage management software is a billion-dollar industry
• Science is increasingly about mining massive data (Nature 2/06)
Examples (2002):
• Phone: AT&T 20 TB phone call database, wireless tracking
• Consumer: WalMart 70 TB database, buying patterns
• Web: Google indexes 8 billion web pages
• Geography: NASA satellites generate terabytes each day

Lars Arge 3/13 Terrain Data
• New technologies: much easier and cheaper to collect detailed data
• Previous "manual" or radar-based methods
  − Often 30 meters between data points
  − Sometimes 10 meter data available
• New laser scanning methods (LIDAR)
  − Less than 1 meter between data points
  − Centimeter accuracy (previously meter accuracy)
Denmark:
• ~2 million points at 30 meters (<<1 GB)
• ~18 billion points at 1 meter (>>1 TB)
• COWI (and others) now scanning DK
• NC scanned after Hurricane Floyd in 1999

Lars Arge 4/13 Massive data = I/O-Bottleneck
• I/O is often the bottleneck when handling massive datasets
• Disk access is 10^6 times slower than main memory access!
• Disk systems try to amortize the large access time by transferring large contiguous blocks of data
• Need to store and access data so as to take advantage of blocks!
"The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one's desk or by taking an airplane to the other side of the world and using a sharpener on someone else's desk." (D. Comer)
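A rough sketch of why block access matters, assuming the standard external-memory cost model in which one I/O moves a block of consecutive items. The function names and the example sizes are illustrative assumptions and do not come from the slides.

```python
def scan_ios(n_items: int, block_size: int) -> int:
    """I/Os needed to read n_items stored contiguously, one block per I/O."""
    return (n_items + block_size - 1) // block_size  # integer ceiling


def unblocked_ios(n_items: int) -> int:
    """Worst case: every item access lands in a different block."""
    return n_items


# Hypothetical sizes: one billion items, ten thousand items per block.
n, b = 10**9, 10**4
print(scan_ios(n, b))    # 100000
print(unblocked_ios(n))  # 1000000000
```

Under these assumed sizes, a layout that lets the algorithm consume whole blocks needs four orders of magnitude fewer disk accesses than one that touches a new block for every item.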

Lars Arge 5/13 I/O-efficient Algorithms
• Taking advantage of block access is important
• Traditionally, algorithms are developed without block considerations
• I/O-efficient algorithms lead to large runtime improvements
[Figure: running time versus data size for a normal algorithm and an I/O-efficient algorithm, with the main memory size marked on the data-size axis]
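The effect can be illustrated with a small simulation, offered here as a sketch rather than anything taken from the slides. It assumes a square grid stored in row-major order, a fixed block size, and a memory that holds a limited number of blocks with least-recently-used replacement; all names and sizes are hypothetical.

```python
from collections import OrderedDict


def count_block_faults(accesses, block_size, cache_blocks):
    """Count block transfers for a sequence of item indices under LRU."""
    cache = OrderedDict()  # block id -> True, ordered oldest to newest
    faults = 0
    for index in accesses:
        block = index // block_size
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            faults += 1                    # block must be fetched
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return faults


def row_order(n):
    """Visit an n x n row-major grid row by row (matches the layout)."""
    return (r * n + c for r in range(n) for c in range(n))


def column_order(n):
    """Visit the same grid column by column (fights the layout)."""
    return (r * n + c for c in range(n) for r in range(n))


n, block, small_mem, large_mem = 64, 8, 4, 64
print(count_block_faults(row_order(n), block, small_mem))     # 512
print(count_block_faults(column_order(n), block, small_mem))  # 4096
print(count_block_faults(column_order(n), block, large_mem))  # 512
```

The column-order traversal is harmless while its working set fits in memory and becomes eight times more expensive once it does not, which is the shape of the "normal algorithm" curve described on the slide.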

Lars Arge 6/13 Scalability: Hierarchical Memory
• Block access is not only important on the disk level
• Machines have a complicated memory hierarchy
• Levels get larger and slower
• Block transfers on all levels
[Figure: memory hierarchy with levels L1, L2, and RAM; plot of running time versus data size]

Lars Arge 7/13 My Research Work
• Theoretical work on I/O- (and cache-) efficient algorithms
• Data structures, computational geometry, graph theory, …
• Focus on Geographic Information Systems problems
• Algorithm engineering work, e.g.
• TPIE system for simple, efficient, and portable implementation of I/O-efficient algorithms
• Software for terrain data processing
• LIDAR data handling
• Terrain flow computations

Lars Arge 8/13 Example: Terrain Flow
• Terrain water flow has many important applications
• Predict location of streams, areas susceptible to floods, …
• Conceptually, flow is modeled using two basic attributes
• Flow direction: the direction water flows at a point
• Flow accumulation: the amount of water flowing through a point
• Flow accumulation is used to compute other hydrological attributes: drainage network, topographic convergence index, …
[Figure: two images labeled 7 am and 3 pm]
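The two attributes can be sketched on a tiny in-memory grid. This is an illustration only: it assumes the common single-flow-direction rule (each cell sends all its water to its steepest downslope neighbor among the eight surrounding cells) and one unit of water per cell, neither of which the slides specify, and it ignores the I/O-efficiency concerns the talk is about.

```python
import math

# Offsets of the eight neighbors of a grid cell.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_directions(elev):
    """Map each cell to its steepest strictly-downslope neighbor (or None)."""
    rows, cols = len(elev), len(elev[0])
    direction = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Drop divided by distance (diagonals are longer).
                    slope = (elev[r][c] - elev[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best, best_slope = (nr, nc), slope
            direction[(r, c)] = best
    return direction


def flow_accumulation(elev):
    """Each cell starts with one unit and passes its total downslope."""
    direction = flow_directions(elev)
    acc = {cell: 1 for cell in direction}
    # Highest cells first, so a cell's total is final before it is passed on.
    by_height = sorted(direction, key=lambda rc: elev[rc[0]][rc[1]],
                       reverse=True)
    for cell in by_height:
        target = direction[cell]
        if target is not None:
            acc[target] += acc[cell]
    return acc


elevation = [[9, 8, 7],
             [8, 6, 4],
             [7, 4, 1]]
acc = flow_accumulation(elevation)
print(acc[(2, 2)])  # 9: every cell drains to the lowest corner
```

Processing cells in decreasing elevation order works because water only moves strictly downhill in this model, so every upstream contribution has arrived before a cell forwards its total.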

Lars Arge 9/13 Terrain Flow Accumulation
• Collaboration with environmental researchers at Duke University
• Appalachian Mountains dataset:
• 800x800 km at 100 m resolution
• a few gigabytes
• On ½ GB machine:
• ArcGIS:
• Performance somewhat unpredictable
• Days on a few gigabytes of data
• Many gigabytes of data…
• Appalachian dataset would be terabyte-sized at 1 m resolution
[Callout: 14 days!!]
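A quick check of how the grid grows with resolution, using the 800x800 km extent from the slide. The helper name is hypothetical, and the 4 bytes per elevation value is an assumption for raw storage only; it does not account for any additional per-cell data a flow computation may need.

```python
def grid_cells(extent_m: int, resolution_m: int) -> int:
    """Number of cells in a square grid of the given side length."""
    side = extent_m // resolution_m
    return side * side


EXTENT_M = 800_000          # 800 km, from the slide
BYTES_PER_CELL = 4          # assumed size of one elevation value

cells_100m = grid_cells(EXTENT_M, 100)
cells_1m = grid_cells(EXTENT_M, 1)

print(cells_100m)                 # 64000000
print(cells_1m)                   # 640000000000
print(cells_1m // cells_100m)     # 10000
print(cells_1m * BYTES_PER_CELL)  # 2560000000000 bytes, about 2.56 TB
```

Going from 100 m to 1 m resolution multiplies the cell count by 10,000, which is consistent with the slide's statement that the dataset would be terabyte-sized at 1 m.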

Lars Arge 10/13 Terrain Flow Accumulation: TerraFlow
• We developed theoretically I/O-optimal algorithms
• TPIE implementation was very efficient
• Appalachian Mountains flow accumulation in 3 hours!
• Developed into a comprehensive software package for flow computation on massive terrains: TerraFlow
• Efficient: […] times faster than existing software
• Scalable: >1 billion elements!
• Flexible: flexible flow modeling (direction) methods
• Extension to ArcGIS

Lars Arge 11/13 LIDAR Terrain Data Work: TerraStream
• Now TerraStream software "pipeline" for handling terrain data
• Points to DEM (incl. breaklines)
• DEM flow modeling (incl. "flooding", "flat" routing, "noise" reduction)
• DEM flow accumulation (incl. river extraction)
• DEM hierarchical watershed computation
• All work for both grid and TIN DEMs
• Capable of handling massive datasets
• Test dataset: 400M-point Neuse river basin (1/3 of NC) (>17 GB)

Lars Arge 12/13 Examples of Ongoing Terrain Work
• Terrain modeling, e.g.
• "Raw" LIDAR to point conversion (LIDAR point classification), incl. detection and removal of features such as bridges
• Further improved flow and erosion modeling (e.g. carving)
• Contour line extraction (incl. smoothing and simplification)
• Terrain (and other) data fusion (incl. format conversion)
• Terrain analysis, e.g.
• Choke point, navigation, visibility, change detection, …
• Major grand goal:
• Construction of a hierarchical (simplified) DEM in which derived features (water flow, drainage, choke points) are preserved and consistent

Lars Arge 13/13 Thanks
Lars Arge
Work supported in part by
• US National Science Foundation ESS grant EIA– , RI grant EIA– , CAREER grant EIA– , and ITR grant EIA–
• US Army Research Office grants W911NF and DAAD
• Ole Rømer Scholarship from Danish Science Research Council
• NABIIT grant from the Danish Strategic Research Council
• Danish National Research Foundation (MADALGO center)