Modeling & Analyzing Massive Terrain Data Sets

Slides:



Advertisements
Similar presentations
Efficient access to TIN Regular square grid TIN Efficient access to TIN Let q := (x, y) be a point. We want to estimate an elevation at a point q: 1. should.
Advertisements

I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
Lars Arge 1/43 Big Terrain Data Analysis Algorithms in the Field Workshop SoCG June 19, 2012 Lars Arge.
Lars Arge 1/13 Efficient Handling of Massive (Terrain) Datasets Lars Arge A A R H U S U N I V E R S I T E T Department of Computer Science.
Terrain for the Lower Colorado River Flood Damage Evaluation Project Erin Atkinson, Halff Associates, Inc. Rick Diaz, Lower Colorado River Authority Symposium.
Drainage Paths derived from TIN-based Digital Elevation Models Graduate student: Henrique Rennó de Azeredo Freitas Advisors: Sergio Rosim João Ricardo.
Modeling & Analyzing Massive Terrain Data Sets (STREAM Project) Pankaj K. Agarwal Workshop on Algorithms for Modern Massive Data Sets.
. Computing Contour Maps & Answering Contour Queries Pankaj K. Agarwal Joint work with Lars Arge ThomasMolhave Thomas Molhave Bardia Sadri.
Lasse Deleuran 1/37 Homotopic Polygonal Line Simplification Lasse Deleuran PhD student.
I/O-Algorithms Lars Arge January 31, Lars Arge I/O-algorithms 2 Random Access Machine Model Standard theoretical model of computation: –Infinite.
I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis Pankaj K. Agarwal, Lars Arge, Ke Yi Duke University University of Aarhus.
I/O-Efficient Construction of Constrained Delaunay Triangulations Pankaj K. Agarwal, Lars Arge, and Ke Yi Duke University.
I/O-Algorithms Lars Arge Aarhus University February 27, 2007.
Celso Ferreira¹, Francisco Olivera², Dean Djokic³ ¹ PH.D. Student, Civil Engineering, Texas A&M University ( ² Associate.
I/O-Algorithms Lars Arge Spring 2009 February 2, 2009.
Streaming Computation of Delaunay Triangulations Jack Snoeyink UNC Chapel Hill Jonathan Shewchuk UC Berkeley Yuanxin Liu UNC Chapel Hill Martin Isenburg.
I/O-Algorithms Lars Arge Spring 2009 January 27, 2009.
I/O-Algorithms Lars Arge Spring 2007 January 30, 2007.
I/O-Algorithms Lars Arge Aarhus University February 16, 2006.
I/O-Algorithms Lars Arge Aarhus University February 7, 2005.
I/O-Algorithms Lars Arge University of Aarhus March 1, 2005.
I/O-Algorithms Lars Arge Spring 2009 March 3, 2009.
I/O-Algorithms Lars Arge Aarhus University February 6, 2007.
I/O-Algorithms Lars Arge Spring 2006 February 2, 2006.
Lars Arge 1/14 A A R H U S U N I V E R S I T E T Department of Computer Science Efficient Handling of Massive (Terrain) Datasets Professor Lars Arge University.
I/O-Algorithms Lars Arge Aarhus University February 14, 2008.
I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis Pankaj K. Agarwal, Lars Arge, and Ke Yi Duke University University of Aarhus.
reconstruction process, RANSAC, primitive shapes, alpha-shapes
Generating Raster DEM from Mass Points via TIN Streaming Jack Snoeyink UNC Chapel Hill Jonathan Shewchuk UC Berkeley Tim Thirion UNC Chapel Hill Martin.
Flow modeling on grid terrains. Why GIS?  How it all started.. Duke Environmental researchers: computing flow accumulation for Appalachian Mountains.
Some Potential Terrain Analysis Tools for ArcGIS David G. Tarboton
From Elevation Data to Watershed Hierarchies Pankaj K. Agarwal Duke University Supported by ARO W911NF
I/O-Algorithms Lars Arge Spring 2008 January 31, 2008.
Processing Terrain Data in the River Proximity Arc Hydro River Workshop December 1, 2010 Erin Atkinson, PE, CFM, GISP Halff Associates, Inc.
Chapter 3: Cluster Analysis  3.1 Basic Concepts of Clustering  3.2 Partitioning Methods  3.3 Hierarchical Methods The Principle Agglomerative.
Terrain Mapping and Analysis
TerraStream: From Elevation Data to Watershed Hierarchies Thursday, 08 November 2007 Andrew Danner (Swarthmore), T. Moelhave (Aarhus), K. Yi (HKUST), P.
I/O-Algorithms Lars Arge Fall 2014 August 28, 2014.
Heavily based on slides by Lars Arge I/O-Algorithms Thomas Mølhave Spring 2012 February 9, 2012.
Multi-Scale Dual Morse Complexes for Representing Terrain Morphology E. Danovaro Free University of Bolzano, Italy L. De Floriani University of Genova,
Bin Yao Spring 2014 (Slides were made available by Feifei Li) Advanced Topics in Data Management.
Intro. To GIS Lecture 9 Terrain Analysis April 24 th, 2013.
Creating Watersheds and Stream Networks
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis Pankaj K. Agarwal, Lars Arge, Ke Yi Duke University University of Aarhus.
Esri UC 2014 | Technical Workshop | Creating Watersheds, Stream Networks and Hydrologically Conditioned DEMS Steve Kopp Dean Djokic.
Generating Realistic Terrains with Higher-Order Delaunay Triangulations Thierry de Kok Marc van Kreveld Maarten Löffler Center for Geometry, Imaging and.
References [1] Pankaj Agarwal, Lars Arge and Andrew Danner. From Point Cloud to Grid DEM: A Scalable Approach. In Proc. 12th Intl. Symp. on Spatial Data.
Flow Modeling on Massive Grids Laura Toma, Rajiv Wickremesinghe with Lars Arge, Jeff Chase, Jeff Vitter Pat Halpin, Dean Urban in collaboration with.
Introduction Terrain Level set and Contour tree Problem Maintaining the contour tree of a terrain under the following operation: ChangeHeight(v, r) : Change.
Statistical Surfaces, part II GEOG370 Instructor: Christine Erlien.
Lecture 1: Basic Operators in Large Data CS 6931 Database Seminar.
TerraSTREAM: Terrain Processing Pipeline MADALGO – Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation What TerraSTREAM.
Digital Elevation Models (DEMs)Constructing Grid DEMConstructing Contour Map Constructing TIN DEM Grid DEM Quality Metric A Digital Elevation Model (DEM)
Definition In scientific literature there is no universal agreement about the usage of the terms: digital elevation model (DEM) digital terrain model (DTM)
INTRODUCTION TO GEOGRAPHICAL INFORMATION SYSTEM
Database Management System
Terrain modelling: the basics
Digital Terrain Analysis for Massive Grids
Chapter 12: Query Processing
Digital Elevation Model Based Watershed and Stream Network Delineation
Spatial Analysis & Modeling
Advanced Topics in Data Management
May 18, 2016 Spring 2016 Institute of Space Technology
Lecture 2- Query Processing (continued)
Data Sources for GIS in Water Resources
Creating Watersheds and Stream Networks
CS 685: Special Topics in Data Mining Jinze Liu
Presentation transcript:

Modeling & Analyzing Massive Terrain Data Sets Pankaj K. Agarwal Duke University

STREAM Project Scalable Techniques for hi-Resolution Elevation data Analysis & Modeling In collaboration with: Lars Arge, Helena Mitasova Students: Andrew Danner,Thomas Molhave,Ke Yi

Diverse elevation point data: density, distribution, accuracy Photogrammetry 0.76m v. accuracy (5ft contours) 1974 1995 1998 Lidar 0.15m v. accuracy; altitude 700m and 2300m RTK-GPS 0.10m v. accuracy 1999 2001 2004

Constructing Digital Elevation Models Grid, TIN, Contour DEMs

Natural feature extraction Maps of topo parameters: computed simultaneously with interpolation using partial derivatives of the RST function, terrain features: combined thresholds of topo parameters Ma Extracted foredunes 1999-2004 using profile curvature and elevation threshold

Tracking evolving features slip faces: slope > 30deg 1995 new slip face in 1999 2001 dune crests: profile curvature threshold how to automatically track features that change some of their properties over time?

Hydrology Analysis Flow Analysis Watershed hierarchy

Challenge: Massive Data Sets LIDAR NC Coastline: 200 million points – over 7 GB Neuse River basin (NC): 500 million points – over 17 GB Output is also large 10ft grid: 10GB 5ft grid: 40GB Storage is not the problem. Processing is. Need good transition

Approximation Algorithms Exact computation expensive Many practical problems are intractable Multiple & often conflicting optimization criteria Suffices to find a near-optimal solution Tunable Approximation algorithms Tradeoff between accuracy and efficiency User specifies tolerance

I/O-Bottleneck Data resides in secondary memory Disk access is 106 times slower than main memory access Maximize useful data transferred with each access Amortize large access time by transferring large contiguous blocks of data I/O – not CPU – is often the bottleneck for processing massive data sets track magnetic surface read/write arm read/write head

I/O-efficient Algorithms [AV88] Traditional algorithms optimize CPU computation Not sensitive to penalty of disk access I/O model Memory is finite Data is transferred in blocks Complexity measured in disk blocks transferred Disk RAM CPU B M Virtual memory helps sometime, but not always. Depends on replacement policy, working set size. B ~ 2K-8K

Elevation Data to Watershed Hierarchies: A Pipeline Approach

Point Cloud to Grid DEM Given a point cloud sampled from a bivariate function z=F(x,y) (elevation data) Goal: construct a grid DEM for z DEM Applications Hydrology Ecology Viewsheds … Many algorithms only work on grid data

Sub-meter resolution: improved structures representation 2004 lidar-based 30cm resolution DSM 1999 lidar 2m resolution DSM 2004 lidar: 0.5m resolution binning interpolated DSM Binning (within cell average) sufficient for ~1-3m DEMs but interpolation produces improved geometry representation and allows us to create higher resolution DEMs/DSMs

General Approach: Interpolation Kriging, IDW, Splines, … Interpolate O(N3) z=F(x,y) N Points Modern data sets can have millions of points Direct interpolation does not scale

Improving Interpolation with Quad Trees Segment initial point set using a quad tree Each quad tree leaf has a <K points (10-100) Interpolate each segment separately N/K segments O(K3) interpolation per segment O(K2N) total running time Linear in N Boundary smoothness? Use points from nearby segments

Overview of the Algorithm Construct quad-tree on disk using bulk loading [AAPV 01] For each quad-tree node, find points in neighboring segments Arrange points on disk for sequential interpolation Use the algorithm by Mitasova, Mitas, Harmon Can use any other interpolation method

Building an External Quad Tree Build top few levels in memory Buffer points in lower levels Write buffers to disk when full K=2 Block I/O

Building a TIN TIN: Triangulated Irregular Network Constrained Delaunay Triangulation Developed an I/O- efficient algorithm Builds TIN for Neuse River Basin (14GB) in 7.5 hours

Future Work: Hierarchical DEMs Considerable work in graphics and GIS communities on multiresolution hierarchies for terrains Focuses on visualization Most algorithms perform random memory access Out-of-core simplification I/O-efficient algorithms for terrain simplification (edge contraction method) Feature preserving simplification Preserves main river networks Preserves visibility for most of the points

Coping with Noisy Data Nearly all flow models assume [O’Callaghan and Mark 84, de Berg et al. 96] water flows downwards stops at a minimum

Denoising via Flooding [Jenson and Domingue 88, Arge et al. 03] Pour water uniformly on terrain Water level rises until steady-state

River Network After Flooding Sort(N) I/Os

Denoising via Flooding [Jenson and Domingue 88, Arge et al. 03] Pour water uniformly on terrain Water level rises until steady-state Use topological persistence (Edelsbrunner et al.) as importance measure to measure the importance of a minimum Flood low-persistence minima, preserve high-persistence minima

Topological Persistence [Edelsbrunner et al. 02]

The Union-Find Problem A universe of N elements: x1, x2, …, xN Initially N singleton sets: {x1}, {x2 }, …, {xN} Each set has a representative Maintain the partition under Union(xi, xj) : Joins the sets containing xi and xj Find(xi) : Returns the representative of the set containing xi

Formulated as Batched Union-Find On a terrain represented as a TIN When reach A minimum or maximum: do nothing A regular point u: Issue union(u,v) for a lower neighbor v A saddle u: let v and w be nodes from u’s two connected pieces in its lower link Issue: find(v), find(w), union(u,v), union(u,w) Sequence of union/find operations can be computed in advance lower link

Classical Union-Find Algorithm representatives d h i p b j a f l z s r c k e g m n Union(d, h) : Find(n) : h h d f l d f l m n b j a m b j a link-by-rank path compression e g n e g

Complexity O(N α(N)) for a sequence of N union and find operations [Tarjan 75] α(•) : Inverse Ackermann function (very slow!) Optimal in the worst case [Tarjan79, Fredman and Saks 89] Batched (Off-line) version Entire sequence known in advance Can be improved to linear on RAM [Gabow and Tarjan 85] Not possible on a pointer machine [Tarjan79]

Our Results An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O(N/B logM/B(N/B)) I/Os Same as sorting optimal in the worst case A simpler algorithm using O(sort(N) log(N/M)) I/Os Implemented Apply to persistence computation, flooding O(sort(N)) time Other applications

I/O-Efficient Batched Union-Find Two-stage algorithm Convert to interval union-find Compute an order on the elements s.t. each union joins two adjacent sets Solve batched interval union-find Each stage formulated as a geometric problem and solved using sort(N) I/Os

Traditional algorithm good as long as data fits in memory Experimental Results Traditional algorithm good as long as data fits in memory

Experimental Results:Neuse River Basin 0.5 billion points (14GB raw data), sampled to obtain smaller data sets On the entire data set: EM takes 5 hours, IM fails

Contour Trees

Previous Results Directly maintain contours O(N log N) time [van Kreveld et al. 97] Needs union-split-find for circular lists Do not extend to higher dimensions Two sweeps by maintaining components, then merge O(N log N) time [Carr et al. 03] Extend to arbitrary dimensions

Join Tree and Split Tree [Carr et al. 03] Qualified nodes 9 9 9 9 8 8 8 8 7 7 7 7 6 6 6 6 5 5 5 5 4 4 4 4 3 3 3 3 2 2 1 1 1 1 Join tree Split tree Join tree Split tree

Final Contour Tree Hard to BATCH! Join tree Split tree Contour tree 9 8 8 8 7 7 7 6 6 6 5 5 5 4 4 4 3 3 3 2 2 2 1 1 1 Join tree Split tree Contour tree

Another Characterization Let w be the highest node that is a descendant of v in join tree and ancestor of u in split tree, (u, w) is a contour tree edge 9 8 7 6 5 4 3 2 1 Join tree Split tree Contour tree u v w Now can BATCH!

Experimental Results On the Neuse River Basin data set

Coping with Noise: Future Work Constructing hydrologically conditioned DEMs Formulate as an optimization problem Using persistent homology Integrating topology and geometry Handling flat areas Scalability remains a big issue Handling anthropogenic features

Back to Pipeline: Flow Modeling

From Elevation to River Networks Where does water go? From higher elevation to lower elevation Single flow directions forms a tree 80 90 100 95 85 110 Sea

Drainage Area How much area is upstream of each node? 1 How much area is upstream of each node? Each node has initial drainage area (1) Drainage area of internal nodes depends on drainage area of children 2 3 3 3 7 17 25 14 11 9 5

Drainage Area example (Panama)

IFSARE and SRTM based stream analysis Stream and watershed boundary extraction for Panama for on-going water quality research to provide more detailed and up-to-date data. SRTM - 7400x3600 DSM at 90m resolution for entire Panama, IFSARE - 10800x11300 DEM at 10m resolution for the Panama canal section Using r.terraflow - officially released in GRASS6.2 in 2006 - streams can be extracted in 3-4 hours, tile-based approach takes days SRTM DEM for entire Panama with main rivers extracted from DEM in 3 hr

Hierarchical Watershed Decomposition

Watershed Hierarchies Decompose a river network into a hierarchy of hydrological units All water in HU flows to a common outlet Hierarchy provides tunable level of detail Method used: Pfafstetter [VV99] Want a solution scalable to large modern hi-res terrains

Pfafstetter 8 9 7 4 5 3 1 2 6 Find main river 17 25 14 11 9 5 Find main river Find four largest tributaries Label basins/interbasins Recurse until single path 8 6 4 2 7 5 3 1 9

Recurse 71 75 73 74 72 7 2 26 25 24 23 21 22 27

Example Watershed Boundaries 1 2 3 4 5 6 7 8 9 91 92 93 94 95 96 97 98 99 991 992 993 994 995 996 997 998 999

A Complete Pipeline

Implementation TPIE: C++ primitives for I/O-efficient algorithms GRASS: Open Source GIS Interpolation: Regularized spline with tension (in GRASS) Data: North Carolina LIDAR Neuse river basin: 400 million points (NC Floodmaps) Outer banks coastal data : 128 million points (NOAA CSC) USGS 30m NED

Grid Construction Results Resolution (ft) 40 20 10 Grid cells x106 221 885 3542 Points x106 205 340 415 Total time 12h32m 14h46m 26h52m Time spent(%)   Build quad tree 8.9 7.1 5.7 Find neighbors 31.6 32.4 29.1 Interpolate 58.8 58.5 59.3 Write output 0.7 2 5.9 Neuse Point Thinning Adjusted interpolation parameters

Sample Watershed Results size (MB) 150 713 5819 size (mln cells) 30.8 147 396.5 total time 10m29s 58m10s 187m43s Time spent… % importing data 8 7 16 sorting by flow 15 13 building river list 31 35 29 sorting river list 19 20 computing labels 6 sort by grid order 14 12 exporting data 5 4

Future Work Terrain Modeling Terrain Analysis

Multivalent terrain models multiple surfaces hybrid raster and 3D vector for structured environments (urban) voxel for unstructured environments (forested areas) Multiple surface representation: DSM and DEM

Terrain analysis on DEMs with structures Investiagte multivalent elevation field representations that preserve road and stream connectivity

Dynamic Data Continuous data streams from various sensors Maintain smart summaries of data Communication, energy efficient algorithms for sensor networks Coping with moving/deformable objects Tracking evolving terrain features Kinetic algorithms that update the structure locally

Terrain Analysis Flow modeling Tracking evolving features Erosion analysis Soil moisture modeling (collaboration with ecologists) Tracking evolving features Visibility based navigation Finding paths from which most of the terrain is visible Computing well concealed paths Covering terrains from a few points

Acknowledgments Collaborators Sponsored by Lars Arge, Helena Mitasova Andrew Danner, Thomas Molhave, Ke Yi Sponsored by Army Research Office W911NF-04-1-0278

Relevant Publications Danner, Yi, Molhave, A., Arge, Mitasova, TerraStream: From Elevation Data to Watershed Hierarchies, submitted for publication, 2007. A., Arge, Danner, From point cloud to grid DEM: A scalable approach. In Proc. 12th SSDH, 771–788, 2006. A., Arge, Yi, I/O-efficient construction of constrained Delaunay triangulations. In Proc. ESA, 355–366, 2005. A., Arge, Yi. I/O-efficient batched union-find and its applications to terrain analysis. In Proc. 22nd SoCG, 2006. Arge, Danner, Haverkort, Zeh. I/O-efficient hierarchical watershed decomposition of grid terrain models. In Proc. 12th SSDH, 825–844, 2006. Danner. I/O Efficient Algorithms and Applications in Geographic Information Systems. PhD thesis, Duke University, 2006.

Thanks!

Future Directions – Noise Removal Bridge detection/removal Other flow routing methods Flow routing on flat surfaces Comparing flow networks

Mapping LIDAR point density point density at the beginning of the project current industry standard no. of points/2m grid cell 1996 0.2 1997 0.9 ATM II NASA 1998 0.4 1999 1.4 2001 0.2 NCflood Leica 2003 2.0 EARL NASA 2004 15.0 SHOALS USACE 2005 6.0 SHOALS USACE 1998 2004 pt / 2m grid cell 2001: NC Flood 2004: SHOALS

Building an External Quad Tree levels in one scan I/Os total After first scan of data: Write top of tree to disk Flush all buffers Recurse on buffers What is in memory what is on disk

Coping with Noisy Data Identifying minima likely due to noise Don’t want to remove real features Persistent homology [ELZ 02] Computed in Sort(N) I/Os

Interpolation Scan through list of points with neighbors Interpolate each leaf separately – Use any interpolation method Evaluate at grid cells Write grid cell values as (i,j,z) as they are computed Sort grid cells by raster order Evaluate Sort Interpolate (i, j, z)