Modeling & Analyzing Massive Terrain Data Sets
Pankaj K. Agarwal Duke University

STREAM Project
Scalable Techniques for hi-Resolution Elevation data Analysis & Modeling
In collaboration with: Lars Arge, Helena Mitasova
Students: Andrew Danner, Thomas Molhave, Ke Yi

Diverse elevation point data: density, distribution, accuracy
Photogrammetry: 0.76 m vertical accuracy (5 ft contours)
Lidar: 0.15 m vertical accuracy; altitude 700 m and 2300 m
RTK-GPS: 0.10 m vertical accuracy
Survey years shown: 1974, 1995, 1998, 1999, 2001, 2004

Constructing Digital Elevation Models
Grid, TIN, Contour DEMs

Natural feature extraction
Maps of topographic parameters: computed simultaneously with interpolation using partial derivatives of the RST function
Terrain features: combined thresholds of topographic parameters
Example: foredunes extracted using profile curvature and an elevation threshold

Tracking evolving features
Slip faces: slope > 30 deg
New slip face in dune crests: profile curvature threshold
How can we automatically track features that change some of their properties over time?

Hydrology Analysis
Flow analysis
Watershed hierarchy

Challenge: Massive Data Sets
LIDAR
  NC coastline: 200 million points, over 7 GB
  Neuse River basin (NC): 500 million points, over 17 GB
Output is also large
  10 ft grid: 10 GB
  5 ft grid: 40 GB
Storage is not the problem. Processing is.

Approximation Algorithms
Exact computation is expensive
Many practical problems are intractable
Multiple and often conflicting optimization criteria
It suffices to find a near-optimal solution
Tunable approximation algorithms
Tradeoff between accuracy and efficiency
User specifies tolerance

I/O-Bottleneck
Data resides in secondary memory
Disk access is 10^6 times slower than main memory access
Maximize useful data transferred with each access
Amortize large access time by transferring large contiguous blocks of data
I/O, not CPU, is often the bottleneck for processing massive data sets
[Figure: disk drive showing track, magnetic surface, read/write arm, read/write head]

I/O-efficient Algorithms [AV88]
Traditional algorithms optimize CPU computation
Not sensitive to the penalty of disk access
I/O model
  Memory is finite (size M)
  Data is transferred in blocks (size B)
  Complexity is measured in disk blocks transferred
[Figure: disk, RAM (size M), and CPU; data moves between disk and RAM in blocks of size B]
Virtual memory helps sometimes, but not always; it depends on the replacement policy and the working set size.
B ~ 2K-8K

Elevation Data to Watershed Hierarchies: A Pipeline Approach

Point Cloud to Grid DEM
Given a point cloud sampled from a bivariate function z = F(x, y) (elevation data)
Goal: construct a grid DEM for z
DEM applications: hydrology, ecology, viewsheds, …
Many algorithms only work on grid data

Sub-meter resolution: improved structures representation
2004 lidar-based 30 cm resolution DSM
1999 lidar: 2 m resolution DSM
2004 lidar: 0.5 m resolution binning; interpolated DSM
Binning (within-cell average) is sufficient for ~1-3 m DEMs, but interpolation produces an improved geometry representation and allows us to create higher-resolution DEMs/DSMs
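Binning as described above (within-cell average) can be sketched in a few lines. This is a minimal in-memory illustration, not the pipeline's implementation; the function name `bin_points` and the `(x, y, z)` tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def bin_points(points, cell_size):
    """Return {(col, row): mean z} for every grid cell that contains points.

    points is an iterable of (x, y, z) tuples (hypothetical layout);
    cell_size is the grid resolution in the same units as x and y.
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [sum of z, point count]
    for x, y, z in points:
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell][0] += z
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}
```

Cells with no points simply have no entry, which is one reason interpolation is needed once the grid resolution approaches the point spacing.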

General Approach: Interpolation
Kriging, IDW, splines, …
Interpolating z = F(x, y) from N points takes O(N^3) time
Modern data sets can have millions of points
Direct interpolation does not scale

Improving Interpolation with Quad Trees
Segment the initial point set using a quad tree
Each quad tree leaf has < K points (K = 10-100)
Interpolate each segment separately
N/K segments
O(K^3) interpolation per segment
O(K^2 N) total running time: linear in N
Boundary smoothness? Use points from nearby segments
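The segmentation step can be illustrated with a small recursive sketch. It is an in-memory toy under stated assumptions, not the external-memory construction used in the talk: the function name `quadtree_leaves`, the `max_depth` guard against duplicate points, and the `(xmin, ymin, xmax, ymax)` bounds layout are all hypothetical.

```python
def quadtree_leaves(points, bounds, k, max_depth=32):
    """Split bounds = (xmin, ymin, xmax, ymax) until each leaf holds < k points.

    Returns a list of (bounds, points) pairs for the non-empty leaves.
    max_depth stops the recursion when many points coincide.
    """
    if len(points) < k or max_depth == 0:
        return [(bounds, points)] if points else []
    xmin, ymin, xmax, ymax = bounds
    xmid, ymid = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    children = {
        (False, False): (xmin, ymin, xmid, ymid),
        (True, False): (xmid, ymin, xmax, ymid),
        (False, True): (xmin, ymid, xmid, ymax),
        (True, True): (xmid, ymid, xmax, ymax),
    }
    buckets = {key: [] for key in children}
    for p in points:
        # Route each point to the quadrant that contains it.
        buckets[(p[0] >= xmid, p[1] >= ymid)].append(p)
    leaves = []
    for key, child_bounds in children.items():
        leaves.extend(quadtree_leaves(buckets[key], child_bounds, k, max_depth - 1))
    return leaves
```

With roughly N/K leaves and an O(K^3) solve per leaf, the total interpolation work is about (N/K) * K^3 = K^2 N, which matches the bound on the slide.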

Overview of the Algorithm
Construct quad-tree on disk using bulk loading [AAPV 01]
For each quad-tree node, find points in neighboring segments
Arrange points on disk for sequential interpolation
Use the algorithm by Mitasova, Mitas, Harmon
Can use any other interpolation method

Building an External Quad Tree
Build top few levels in memory
Buffer points in lower levels
Write buffers to disk when full
[Figure labels: K = 2, block I/O]

Building a TIN
TIN: Triangulated Irregular Network
Constrained Delaunay triangulation
Developed an I/O-efficient algorithm
Builds TIN for Neuse River Basin (14 GB) in 7.5 hours

Future Work: Hierarchical DEMs
Considerable work in graphics and GIS communities on multiresolution hierarchies for terrains
Focuses on visualization
Most algorithms perform random memory access
Out-of-core simplification
I/O-efficient algorithms for terrain simplification (edge contraction method)
Feature-preserving simplification
Preserves main river networks
Preserves visibility for most of the points

Coping with Noisy Data
Nearly all flow models assume [O’Callaghan and Mark 84, de Berg et al. 96]:
  water flows downwards
  water stops at a minimum

Denoising via Flooding [Jenson and Domingue 88, Arge et al. 03]
Pour water uniformly on terrain
Water level rises until steady state
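The steady state of flooding can be illustrated on a small grid: every cell ends up at the lowest level from which water can still drain away. The sketch below uses an in-memory priority queue (a priority-flood style fill), which is a swapped-in technique for illustration and not the I/O-efficient algorithm cited on the slide. It assumes that water leaves the terrain at the grid boundary and that cells are 4-connected; the function name `flood` is hypothetical.

```python
import heapq


def flood(elev):
    """Return a copy of the grid with every sink filled to its spill level.

    elev is a list of equal-length rows of heights. Boundary cells are
    treated as outlets (an assumption of this sketch).
    """
    rows, cols = len(elev), len(elev[0])
    filled = [row[:] for row in elev]
    visited = [[False] * cols for _ in range(rows)]
    heap = []
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (elev[r][c], r, c))
                visited[r][c] = True
    while heap:
        level, r, c = heapq.heappop(heap)  # lowest water front so far
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                visited[nr][nc] = True
                # A neighbor below the current front is raised to the front.
                filled[nr][nc] = max(elev[nr][nc], level)
                heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled
```

After filling, every cell has a non-ascending path to the boundary, so a flow model that stops at minima no longer gets trapped in small pits.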

River Network After Flooding
Sort(N) I/Os

Denoising via Flooding [Jenson and Domingue 88, Arge et al. 03]
Pour water uniformly on terrain
Water level rises until steady state
Use topological persistence (Edelsbrunner et al.) to measure the importance of a minimum
Flood low-persistence minima, preserve high-persistence minima

Topological Persistence [Edelsbrunner et al. 02]

The Union-Find Problem
A universe of N elements: x1, x2, …, xN
Initially N singleton sets: {x1}, {x2}, …, {xN}
Each set has a representative
Maintain the partition under
  Union(xi, xj): joins the sets containing xi and xj
  Find(xi): returns the representative of the set containing xi

Formulated as Batched Union-Find
On a terrain represented as a TIN, on reaching
  a minimum or maximum: do nothing
  a regular point u: issue union(u, v) for a lower neighbor v
  a saddle u: let v and w be nodes from the two connected pieces of u’s lower link; issue find(v), find(w), union(u, v), union(u, w)
The sequence of union/find operations can be computed in advance
[Figure: lower link of a vertex]

Classical Union-Find Algorithm
[Figure: trees of set elements rooted at their representatives; Union(d, h) illustrates link-by-rank and Find(n) illustrates path compression]
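For reference, the classical in-memory structure with link-by-rank and path compression can be written as follows. This is the textbook algorithm, shown as a minimal sketch; it is not the I/O-efficient batched algorithm presented later in the talk, and the class name `UnionFind` is chosen for the example.

```python
class UnionFind:
    """Disjoint sets with link-by-rank and path compression."""

    def __init__(self, elements):
        self.parent = {x: x for x in elements}  # each element starts as its own root
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        """Return the representative of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        """Join the sets containing x and y; return the new representative."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        # Link-by-rank: hang the shallower tree under the deeper one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx
```

Both operations chase pointers through memory in an unpredictable order, which is why this structure performs poorly once the data no longer fits in main memory.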

Complexity
O(N α(N)) for a sequence of N union and find operations [Tarjan 75]
  α(·): inverse Ackermann function (grows very slowly)
  Optimal in the worst case [Tarjan 79, Fredman and Saks 89]
Batched (off-line) version
  Entire sequence known in advance
  Can be improved to linear on RAM [Gabow and Tarjan 85]
  Not possible on a pointer machine [Tarjan 79]

Our Results
An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os
  Same as sorting; optimal in the worst case
A simpler algorithm using O(sort(N) log(N/M)) I/Os
  Implemented
Applied to persistence computation and flooding in O(sort(N)) I/Os
Other applications
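To get a feel for the sort(N) bound, the following sketch evaluates (N/B) log_{M/B}(N/B) with constants dropped and compares it with one I/O per element. The function names and the example sizes are assumptions for illustration; real I/O counts depend on constants and on the system.

```python
import math


def scan_ios(n, b):
    """Blocks read when scanning n items with b items per block."""
    return math.ceil(n / b)


def sort_ios(n, m, b):
    """Approximate sort(N) = (N/B) * log_{M/B}(N/B), constants dropped.

    n: number of items, m: items that fit in memory, b: items per block.
    The logarithm is clamped to 1 so that small inputs cost one pass.
    """
    blocks = n / b
    return blocks * max(1.0, math.log(blocks, m / b))


if __name__ == "__main__":
    n, m, b = 2**30, 2**20, 2**10  # hypothetical sizes
    print("scan:", scan_ios(n, b))
    print("sort:", round(sort_ios(n, m, b)))
    print("one I/O per element:", n)
```

For these sizes the sorting bound is only about two scans of the data, while an algorithm that touches disk once per element needs several hundred times more I/Os.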

I/O-Efficient Batched Union-Find
Two-stage algorithm
  Convert to interval union-find: compute an order on the elements such that each union joins two adjacent sets
  Solve batched interval union-find
Each stage is formulated as a geometric problem and solved using sort(N) I/Os

Experimental Results
Traditional algorithm is good as long as the data fits in memory

Experimental Results:Neuse River Basin
0.5 billion points (14 GB raw data), sampled to obtain smaller data sets
On the entire data set: the external-memory (EM) algorithm takes 5 hours; the internal-memory (IM) algorithm fails

Contour Trees

Previous Results
Directly maintain contours
  O(N log N) time [van Kreveld et al. 97]
  Needs union-split-find for circular lists
  Does not extend to higher dimensions
Two sweeps maintaining components, then merge
  O(N log N) time [Carr et al. 03]
  Extends to arbitrary dimensions

Join Tree and Split Tree [Carr et al. 03]
[Figure: join tree and split tree on nodes 1-9, with qualified nodes marked]

Final Contour Tree
Hard to BATCH!
[Figure: join tree, split tree, and resulting contour tree on nodes 1-9]

Another Characterization
Let w be the highest node that is a descendant of v in the join tree and an ancestor of u in the split tree; then (u, w) is a contour tree edge
Now can BATCH!
[Figure: join tree, split tree, and contour tree on nodes 1-9, with u, v, w marked]

Experimental Results On the Neuse River Basin data set

Coping with Noise: Future Work
Constructing hydrologically conditioned DEMs
Formulate as an optimization problem
Using persistent homology
Integrating topology and geometry
Handling flat areas
Scalability remains a big issue
Handling anthropogenic features

Back to Pipeline: Flow Modeling

From Elevation to River Networks
Where does water go?
From higher elevation to lower elevation
Single flow directions form a tree
[Figure: nodes with elevations 80, 90, 100, 95, 85, 110 draining to the sea]
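A single-flow-direction assignment can be sketched on a small grid. The version below sends each cell to its lowest strictly lower neighbor among the eight surrounding cells; that rule, the function name `flow_directions`, and the dictionary output are assumptions of this sketch (steepest-slope rules that account for diagonal distance are a common alternative).

```python
def flow_directions(elev):
    """Map each cell (r, c) to the neighbor it drains to, or None for a minimum.

    elev is a list of equal-length rows of heights. Each cell points to its
    lowest strictly lower 8-neighbor, so the pointers cannot form a cycle.
    """
    rows, cols = len(elev), len(elev[0])
    down = {}
    for r in range(rows):
        for c in range(cols):
            best = None
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if elev[nr][nc] < elev[r][c] and (
                        best is None or elev[nr][nc] < elev[best[0]][best[1]]
                    ):
                        best = (nr, nc)
            down[(r, c)] = best
    return down
```

Because every pointer goes to a strictly lower cell, following the pointers always terminates at a minimum, which gives the forest (a tree per outlet) mentioned on the slide.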

Drainage Area
How much area is upstream of each node?
Each node has initial drainage area (1)
Drainage area of internal nodes depends on drainage area of children
[Figure: flow tree annotated with drainage areas 1, 2, 3, 3, 3, 7, 17, 25, 14, 11, 9, 5]
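The recurrence above (each node starts with area 1 and adds the areas of its children) can be evaluated by visiting nodes from highest to lowest, so that every node is complete before it is pushed downstream. This is a minimal in-memory sketch; the names `drainage_area`, `elev`, and `down` are hypothetical, and the I/O-efficient versions replace the dictionary lookups with sorting-based passes.

```python
def drainage_area(elev, down):
    """Return {node: number of nodes draining through it, itself included}.

    elev maps node -> height; down maps node -> downstream node or None.
    Assumes every downstream node is strictly lower than its source.
    """
    area = {node: 1 for node in down}  # initial drainage area of each node
    for node in sorted(down, key=lambda n: elev[n], reverse=True):
        target = down[node]
        if target is not None:
            area[target] += area[node]  # node is final by now; pass it on
    return area


if __name__ == "__main__":
    elev = {"a": 100, "b": 95, "c": 90, "e": 85, "d": 80}
    down = {"a": "c", "b": "c", "c": "d", "e": "d", "d": None}
    print(drainage_area(elev, down))
```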

Drainage Area example (Panama)

IFSARE and SRTM based stream analysis
Stream and watershed boundary extraction for Panama for ongoing water quality research, to provide more detailed and up-to-date data.
SRTM x3600 DSM at 90 m resolution for the entire Panama; IFSARE x11300 DEM at 10 m resolution for the Panama Canal section
Using r.terraflow (officially released in GRASS 6.2), streams can be extracted in 3-4 hours; a tile-based approach takes days
SRTM DEM for the entire Panama with main rivers extracted from the DEM in 3 hr

Hierarchical Watershed Decomposition

Watershed Hierarchies
Decompose a river network into a hierarchy of hydrological units (HUs)
All water in an HU flows to a common outlet
Hierarchy provides tunable level of detail
Method used: Pfafstetter [VV99]
Want a solution scalable to large modern hi-res terrains

Pfafstetter
Find main river
Find four largest tributaries
Label basins/interbasins
Recurse until single path
[Figure: river network with drainage areas 17, 25, 14, 11, 9, 5 and basins labeled 1-9]
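One level of the labeling steps above can be sketched as follows. The input representation is an assumption made for this example: the tributaries joining the main river are given as a list of drainage areas ordered from the outlet upstream. The four largest receive the even digits 2, 4, 6, 8 going upstream, and every other tributary receives the odd digit of the interbasin it joins (1 below the first large tributary, 9 above the last). The function name `pfafstetter_digits` is hypothetical, and recursion into each basin is left out.

```python
def pfafstetter_digits(tributary_areas):
    """Return one Pfafstetter digit per tributary along the main river.

    tributary_areas lists drainage areas ordered from the outlet upstream.
    Even digits mark the four largest tributaries (basins); odd digits mark
    the interbasin stretch of the main river that a small tributary joins.
    """
    order = sorted(
        range(len(tributary_areas)),
        key=lambda i: tributary_areas[i],
        reverse=True,
    )
    basins = sorted(order[:4])  # positions of the four largest, outlet first
    digits = []
    seen = 0  # number of large tributaries passed so far
    for i in range(len(tributary_areas)):
        if seen < len(basins) and i == basins[seen]:
            seen += 1
            digits.append(2 * seen)  # basin: 2, 4, 6, 8
        else:
            digits.append(2 * seen + 1)  # interbasin: 1, 3, 5, 7, 9
    return digits
```

Applying the same rule inside each labeled basin and appending the new digit to the existing label yields multi-digit codes, which is the recursion the slide refers to.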

Recurse
[Figure: recursive labeling; basin 7 is subdivided into 71-75 and basin 2 into 21-27]

Example Watershed Boundaries
[Figure: watershed boundaries labeled 1-9, with basin 9 subdivided into 91-99 and basin 99 into 991-999]

A Complete Pipeline

Implementation
TPIE: C++ primitives for I/O-efficient algorithms
GRASS: open source GIS
Interpolation: regularized spline with tension (in GRASS)
Data: North Carolina LIDAR
  Neuse River basin: 400 million points (NC Floodmaps)
  Outer Banks coastal data: 128 million points (NOAA CSC)
  USGS 30 m NED

Grid Construction Results
Resolution (ft)          40       20       10
Grid cells (x10^6)       221      885      3542
Points (x10^6)           205      340      415
Total time               12h32m   14h46m   26h52m
Time spent (%)
  Build quad tree        8.9      7.1      5.7
  Find neighbors         31.6     32.4     29.1
  Interpolate            58.8     58.5     59.3
  Write output           0.7      2        5.9
Neuse
Point thinning
Adjusted interpolation parameters

Sample Watershed Results
size (MB): 150, 713, 5819
size (mln cells): 30.8, 147, 396.5
total time: 10m29s, 58m10s, 187m43s
Time spent (%)
  importing data: 8, 7, 16
  sorting by flow: 15, 13
  building river list: 31, 35, 29
  sorting river list: 19, 20
  computing labels: 6
  sort by grid order: 14, 12
  exporting data: 5, 4

Future Work
Terrain Modeling
Terrain Analysis

Multivalent terrain models
Multiple surfaces
Hybrid raster and 3D vector for structured environments (urban)
Voxel for unstructured environments (forested areas)
Multiple surface representation: DSM and DEM

Terrain analysis on DEMs with structures
Investigate multivalent elevation field representations that preserve road and stream connectivity

Dynamic Data
Continuous data streams from various sensors
Maintain smart summaries of data
Communication- and energy-efficient algorithms for sensor networks
Coping with moving/deformable objects
Tracking evolving terrain features
Kinetic algorithms that update the structure locally

Terrain Analysis
Flow modeling
  Erosion analysis
  Soil moisture modeling (collaboration with ecologists)
Tracking evolving features
Visibility-based navigation
  Finding paths from which most of the terrain is visible
  Computing well-concealed paths
  Covering terrains from a few points

Acknowledgments
Collaborators: Lars Arge, Helena Mitasova, Andrew Danner, Thomas Molhave, Ke Yi
Sponsored by Army Research Office W911NF

Relevant Publications
Danner, Yi, Molhave, A., Arge, Mitasova. TerraStream: From elevation data to watershed hierarchies. Submitted for publication, 2007.
A., Arge, Danner. From point cloud to grid DEM: A scalable approach. In Proc. 12th SSDH, 771–788, 2006.
A., Arge, Yi. I/O-efficient construction of constrained Delaunay triangulations. In Proc. ESA, 355–366, 2005.
A., Arge, Yi. I/O-efficient batched union-find and its applications to terrain analysis. In Proc. 22nd SoCG, 2006.
Arge, Danner, Haverkort, Zeh. I/O-efficient hierarchical watershed decomposition of grid terrain models. In Proc. 12th SSDH, 825–844, 2006.
Danner. I/O Efficient Algorithms and Applications in Geographic Information Systems. PhD thesis, Duke University, 2006.

Thanks!

Future Directions – Noise Removal
Bridge detection/removal
Other flow routing methods
Flow routing on flat surfaces
Comparing flow networks

Mapping LIDAR point density
[Figure: number of points per 2 m grid cell for surveys from 1998 to 2004 (ATM II NASA, NC Flood, Leica, EARL NASA, SHOALS USACE), comparing point density at the beginning of the project with the current industry standard; panels 2001: NC Flood and 2004: SHOALS]

Building an External Quad Tree
levels in one scan
I/Os total
After first scan of data:
  Write top of tree to disk
  Flush all buffers
  Recurse on buffers
[Figure: what is in memory and what is on disk]

Coping with Noisy Data
Identifying minima likely due to noise
Don’t want to remove real features
Persistent homology [ELZ 02]
Computed in Sort(N) I/Os

Interpolation
Scan through list of points with neighbors
Interpolate each leaf separately (use any interpolation method)
Evaluate at grid cells
Write grid cell values as (i, j, z) as they are computed
Sort grid cells by raster order
[Figure: Interpolate, Evaluate (i, j, z), Sort]
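The evaluate-then-sort step can be sketched as follows. Leaves are processed in whatever order they arrive, each grid cell is emitted as an (i, j, z) triple, and a final sort restores raster (row-major) order. IDW is used here only as a stand-in interpolation method; the function names, the half-open cell ranges per leaf, and the cell-center convention are assumptions of this sketch.

```python
def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, pz) points."""
    num = den = 0.0
    for px, py, pz in points:
        d2 = (px - x) ** 2 + (py - y) ** 2
        if d2 == 0.0:
            return pz  # query coincides with a sample point
        w = 1.0 / d2 ** (power / 2.0)
        num += w * pz
        den += w
    return num / den


def rasterize(leaves, cell_size):
    """Evaluate every leaf at its grid cells and return triples in raster order.

    leaves is a list of ((i0, j0, i1, j1), points): rows i0..i1-1 and columns
    j0..j1-1 are evaluated using that leaf's points (and its neighbors' points,
    if the caller included them).
    """
    triples = []
    for (i0, j0, i1, j1), pts in leaves:
        for i in range(i0, i1):
            for j in range(j0, j1):
                x, y = (j + 0.5) * cell_size, (i + 0.5) * cell_size
                triples.append((i, j, idw(pts, x, y)))
    triples.sort()  # row-major order; an external sort at scale
    return triples
```

At full scale the list of triples does not fit in memory, so the final step would be an external sort costing sort(N) I/Os rather than the in-memory `sort()` used here.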
