Efficient Handling of Massive (Terrain) Datasets: Professor Lars Arge, Department of Computer Science, AARHUS UNIVERSITET

Presentation transcript:

Slide 1/14: Efficient Handling of Massive (Terrain) Datasets
Professor Lars Arge
AARHUS UNIVERSITET, Department of Computer Science
University of Aarhus and Duke University
September 22, 2006

Slide 2/14: Who am I?
• PhD in Computer Science 1996 from AU
• Research career at Duke University in US
  − Postdoc 1996 … Professor 2006
• Ole Rømer Professorship AU 2004
• Research:
  − Efficient algorithms and data structures for massive datasets
  − Lots of theoretical work…
  − … but also practical work, especially on GIS and terrain data
• Funded by e.g. US ARO and NSF, Danish basic and strategic research councils, private companies (COWI, DHI)

Slide 3/14: Massive Data Algorithmics
• Massive data being acquired/used everywhere
• Storage management software is a billion-$ industry
• Science is increasingly about mining massive data (Nature 2/06)
Examples (2002):
• Phone: AT&T 20TB phone call database, wireless tracking
• Consumer: WalMart 70TB database, buying patterns
• Web: Google indexes 8 billion web pages
• Geography: NASA satellites generate terabytes each day

Slide 4/14: Terrain Data
• New technologies: Much easier/cheaper to collect detailed data
• Previous ‘manual’ or radar-based methods
  − Often 30 meters between data points
  − Sometimes 10 meter data available
• New laser scanning methods (LIDAR)
  − Less than 1 meter between data points
  − Centimeter accuracy (previously meter)
Denmark:
• ~2 million points at 30 meter (<<1GB)
• ~18 billion points at 1 meter (>>1TB)
• COWI A/S in the process of scanning DK
• NC scanned after Hurricane Floyd in 1999

Slide 5/14: Massive data = I/O-Bottleneck
• I/O is often the bottleneck when handling massive datasets
• Disk access is 10^6 times slower than main memory access!
• Disk systems try to amortize the large access time by transferring large contiguous blocks of data
• Need to store and access data to take advantage of blocks!
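To make the block-transfer point concrete, here is a minimal sketch that counts block transfers in the standard external-memory (I/O) model. The slides do not spell this model out, so the function names and the example parameters (N items, memory of M items, blocks of B items) are illustrative assumptions, not figures from the talk.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)

def scan_ios(n: int, b: int) -> int:
    """Block transfers needed to read n contiguous items with block size b."""
    return ceil_div(n, b)

def random_access_ios(n: int) -> int:
    """Worst case: every one of n item accesses touches a different block."""
    return n

def merge_sort_ios(n: int, m: int, b: int) -> int:
    """Block transfers for a textbook external merge sort.

    One pass forms sorted runs of m items; each later pass merges up to
    (m // b - 1) runs at a time. Every pass reads and writes all blocks.
    """
    blocks = ceil_div(n, b)
    runs = ceil_div(n, m)
    fan_in = max(2, m // b - 1)  # one input buffer per run, one output buffer
    passes = 1                   # the run-formation pass
    while runs > 1:
        runs = ceil_div(runs, fan_in)
        passes += 1
    return 2 * blocks * passes

# Hypothetical sizes: 10^9 items, memory for 10^7 items, blocks of 10^4 items.
N, M, B = 10**9, 10**7, 10**4
print(scan_ios(N, B))           # 100000
print(merge_sort_ios(N, M, B))  # 400000
print(random_access_ios(N))     # 1000000000
```

With these assumed sizes, sorting the whole dataset with blocked access costs far fewer transfers than touching each item once in an unblocked, random order.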

Slide 6/14: I/O-efficient Algorithms
• Taking advantage of block access is very important
• Traditionally, algorithms were developed without block considerations
• I/O-efficient algorithms lead to large runtime improvements
[Figure: running time vs. data size for a normal algorithm and an I/O-efficient algorithm, with main memory size marked]
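The effect of ignoring blocks can be illustrated with a small simulation. The sketch below is not from the talk: it assumes a grid stored in row-major order, a tiny LRU cache of blocks standing in for main memory, and hypothetical sizes chosen so the grid does not fit in the cache. It counts how many blocks must be loaded when the same grid is traversed row by row versus column by column.

```python
from collections import OrderedDict

def count_block_loads(accesses, block_size, cache_blocks):
    """Count block loads for a sequence of item addresses under an LRU cache."""
    cache = OrderedDict()  # block id -> True, ordered from least to most recent
    loads = 0
    for addr in accesses:
        block = addr // block_size
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            loads += 1                     # block must be fetched
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used block
    return loads

def row_major_order(rows, cols):
    """Addresses visited when scanning a row-major grid row by row."""
    return (r * cols + c for r in range(rows) for c in range(cols))

def column_major_order(rows, cols):
    """Addresses visited when scanning the same grid column by column."""
    return (r * cols + c for c in range(cols) for r in range(rows))

# Hypothetical sizes: 64x64 grid, 16 items per block, room for 8 blocks.
rows = cols = 64
block_size, cache_blocks = 16, 8
row_loads = count_block_loads(row_major_order(rows, cols), block_size, cache_blocks)
col_loads = count_block_loads(column_major_order(rows, cols), block_size, cache_blocks)
print(row_loads)  # 256
print(col_loads)  # 4096
```

Both traversals do the same amount of computation, but the column-wise one loads a block for every single item because it never reuses a block before it is evicted.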

Slide 7/14: Scalability: Hierarchical Memory
• Block access not only important on disk level
• Machines have complicated memory hierarchy
  − Levels get larger and slower
  − Block transfers on all levels
[Figure: memory hierarchy (L1, L2, RAM); plot of running time vs. data size]

Slide 8/14: My Research Work
• Theoretical work on I/O- (and cache-) efficient algorithms
  − Data structures, computational geometry, graph theory, …
  − Focus on Geographic Information Systems problems
• Algorithm engineering work, e.g.
  − TPIE system for simple, efficient, and portable implementation of I/O-efficient algorithms
  − Software for terrain data processing
    − LIDAR data handling
    − Terrain flow computations

Slide 9/14: Example: Terrain Flow
• Terrain water flow has many important applications
  − Predict location of streams, areas susceptible to floods…
• Conceptually flow is modeled using two basic attributes
  − Flow direction: The direction water flows at a point
  − Flow accumulation: Amount of water flowing through a point
• Flow accumulation used to compute other hydrological attributes: drainage network, topographic convergence index…
[Figure: images labeled 7 am and 3 pm]
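The two attributes can be sketched on a tiny grid. This is an in-memory illustration only, not the I/O-efficient method from the talk. It assumes a single-flow-direction rule (each cell sends all its water to its steepest downslope neighbor among the 8 surrounding cells), one unit of water per cell, and a made-up 3x3 elevation grid.

```python
def flow_directions(elev):
    """Map each cell to its steepest-descent neighbor, or None if it has none."""
    rows, cols = len(elev), len(elev[0])
    direction = {}
    for r in range(rows):
        for c in range(cols):
            best, best_drop = None, 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        dist = (dr * dr + dc * dc) ** 0.5  # diagonal is longer
                        drop = (elev[r][c] - elev[nr][nc]) / dist
                        if drop > best_drop:
                            best, best_drop = (nr, nc), drop
            direction[(r, c)] = best
    return direction

def flow_accumulation(elev, direction):
    """Each cell starts with 1 unit and passes its total to its downslope cell."""
    rows, cols = len(elev), len(elev[0])
    acc = [[1] * cols for _ in range(rows)]
    # Visit cells from highest to lowest so every upstream cell is done first.
    cells = sorted(direction, key=lambda rc: elev[rc[0]][rc[1]], reverse=True)
    for r, c in cells:
        target = direction[(r, c)]
        if target is not None:
            acc[target[0]][target[1]] += acc[r][c]
    return acc

# Hypothetical elevations sloping towards the bottom-right corner.
elev = [
    [9, 8, 7],
    [8, 6, 5],
    [7, 5, 1],
]
direction = flow_directions(elev)
acc = flow_accumulation(elev, direction)
for row in acc:
    print(row)
# [1, 1, 1]
# [1, 2, 3]
# [1, 3, 9]
```

The bottom-right cell collects the water of all nine cells, which is the kind of high-accumulation cell that would be reported as part of a stream.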

Slide 10/14: Terrain Flow Accumulation
• Collaboration with environmental researchers at Duke University
• Appalachian mountains dataset:
  − 800x800km at 100m resolution
  − a few Gigabytes
• On ½GB machine:
  − ArcGIS:
    − Performance somewhat unpredictable
    − Days on few gigabytes of data
    − Many gigabytes of data…..
• Appalachian dataset would be Terabytes sized at 1m resolution
[Callout: 14 days!!]
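The jump from 100m to 1m resolution can be checked with a short calculation. The sketch below only counts grid cells and estimates the raw elevation storage at 1m; the 4 bytes per cell is an assumption, and any working data used during a flow computation is not counted.

```python
def grid_cells(width_km: int, height_km: int, resolution_m: int) -> int:
    """Number of cells in a regular grid covering the given area."""
    cols = width_km * 1000 // resolution_m
    rows = height_km * 1000 // resolution_m
    return rows * cols

BYTES_PER_CELL = 4  # assumption: one 32-bit elevation value per cell

cells_100m = grid_cells(800, 800, 100)
cells_1m = grid_cells(800, 800, 1)
print(cells_100m)                         # 64000000
print(cells_1m)                           # 640000000000
print(cells_1m // cells_100m)             # 10000
print(cells_1m * BYTES_PER_CELL / 1e12)   # 2.56 (terabytes, raw elevations only)
```

A 100-fold finer resolution in each direction gives 10,000 times as many cells, which is why the same area moves from gigabytes to terabytes.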

Slide 11/14: Terrain Flow Accumulation: TerraFlow
• We developed theoretically I/O-optimal algorithms
• TPIE implementation was very efficient
  − Appalachian Mountains flow accumulation in 3 hours!
• Developed into comprehensive software package for flow computation on massive terrains: TerraFlow
  − Efficient: times faster than existing software
  − Scalable: >1 billion elements!
  − Flexible: Flexible flow modeling (direction) methods
• Extension to ArcGIS

Slide 12/14: LIDAR Terrain Data Work
• Now “pipeline” of software for handling terrain (LIDAR) data
  − Points to DEM (incl. breaklines)
  − DEM flow modeling (incl. “flooding”, “flat” routing, “noise” reduction)
  − DEM flow accumulation (incl. river extraction)
  − DEM hierarchical watershed computation
• All work for both grid and TIN DEMs
• Capable of handling massive datasets
• Test dataset: 400M point Neuse river basin (1/3 NC) (>17GB)
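As an illustration of one pipeline step, the sketch below assumes that “flooding” refers to sink filling: raising each cell to the lowest level from which water can reach the edge of the grid, so that flow routing does not get stuck in pits. It uses a simple in-memory priority-queue approach (often called priority-flood), which is a stand-in and not the I/O-efficient method used in the software described here. The elevation grid is made up.

```python
import heapq

def fill_sinks(elev):
    """Raise each cell to the lowest level that still drains to the grid edge."""
    rows, cols = len(elev), len(elev[0])
    filled = [row[:] for row in elev]
    visited = [[False] * cols for _ in range(rows)]
    heap = []
    # Water can always leave through boundary cells, so start from them.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (elev[r][c], r, c))
                visited[r][c] = True
    # Grow inwards from the lowest known outlet level.
    while heap:
        level, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                    visited[nr][nc] = True
                    # A cell below the current spill level is part of a sink.
                    filled[nr][nc] = max(elev[nr][nc], level)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled

# Hypothetical grid with a pit in the middle and an outlet of height 4.
elev = [
    [5, 5, 5, 5],
    [5, 1, 2, 5],
    [5, 2, 3, 4],
    [5, 5, 5, 5],
]
for row in fill_sinks(elev):
    print(row)
# [5, 5, 5, 5]
# [5, 4, 4, 5]
# [5, 4, 4, 4]
# [5, 5, 5, 5]
```

After filling, every interior cell sits at the outlet level of 4, so no cell is lower than all of its neighbors except at the boundary.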

Slide 13/14: Examples of Ongoing Terrain Work
• Terrain modeling, e.g.
  − “Raw” LIDAR to point conversion (LIDAR point classification) (incl. feature, e.g. bridge, detection/removal)
  − Further improved flow and erosion modeling (e.g. carving)
  − Contour line extraction (incl. smoothing and simplification)
  − Terrain (and other) data fusion (incl. format conversion)
• Terrain analysis, e.g.
  − Choke point, navigation, visibility, change detection, …
• Major grand goal:
  − Construction of hierarchical (simplified) DEM where derived features (water flow, drainage, choke points) are preserved/consistent

Slide 14/14: Thanks
Lars Arge
Work supported in part by
• US National Science Foundation ESS grant EIA– , RI grant EIA– , CAREER grant EIA– , and ITR grant EIA–
• US Army Research Office grants W911NF and DAAD
• Ole Rømer Scholarship from Danish Science Research Council
• NABIIT grant from the Danish Strategic Research Council