Download presentation
Presentation is loading. Please wait.
1
Modeling & Analyzing Massive Terrain Data Sets
Pankaj K. Agarwal Duke University
2
STREAM Project Scalable Techniques for hi-Resolution Elevation data Analysis & Modeling In collaboration with: Lars Arge, Helena Mitasova Students: Andrew Danner,Thomas Molhave,Ke Yi
3
Diverse elevation point data: density, distribution, accuracy
Photogrammetry 0.76m v. accuracy (5ft contours) 1974 1995 1998 Lidar 0.15m v. accuracy; altitude 700m and 2300m RTK-GPS 0.10m v. accuracy 1999 2001 2004
4
Constructing Digital Elevation Models
Grid, TIN, Contour DEMs
5
Natural feature extraction
Maps of topo parameters: computed simultaneously with interpolation using partial derivatives of the RST function, terrain features: combined thresholds of topo parameters Ma Extracted foredunes using profile curvature and elevation threshold
6
Tracking evolving features
slip faces: slope > 30deg new slip face in dune crests: profile curvature threshold how to automatically track features that change some of their properties over time?
7
Hydrology Analysis Flow Analysis Watershed hierarchy
8
Challenge: Massive Data Sets
LIDAR NC Coastline: 200 million points – over 7 GB Neuse River basin (NC): 500 million points – over 17 GB Output is also large 10ft grid: 10GB 5ft grid: 40GB Storage is not the problem. Processing is. Need good transition
9
Approximation Algorithms
Exact computation expensive Many practical problems are intractable Multiple & often conflicting optimization criteria Suffices to find a near-optimal solution Tunable Approximation algorithms Tradeoff between accuracy and efficiency User specifies tolerance
10
I/O-Bottleneck Data resides in secondary memory Disk access is 106 times slower than main memory access Maximize useful data transferred with each access Amortize large access time by transferring large contiguous blocks of data I/O – not CPU – is often the bottleneck for processing massive data sets track magnetic surface read/write arm read/write head
11
I/O-efficient Algorithms [AV88]
Traditional algorithms optimize CPU computation Not sensitive to penalty of disk access I/O model Memory is finite Data is transferred in blocks Complexity measured in disk blocks transferred Disk RAM CPU B M Virtual memory helps sometime, but not always. Depends on replacement policy, working set size. B ~ 2K-8K
12
Elevation Data to Watershed Hierarchies: A Pipeline Approach
13
Point Cloud to Grid DEM Given a point cloud sampled from a bivariate function z=F(x,y) (elevation data) Goal: construct a grid DEM for z DEM Applications Hydrology Ecology Viewsheds … Many algorithms only work on grid data
14
Sub-meter resolution: improved structures representation
2004 lidar-based 30cm resolution DSM 1999 lidar 2m resolution DSM 2004 lidar: 0.5m resolution binning interpolated DSM Binning (within cell average) sufficient for ~1-3m DEMs but interpolation produces improved geometry representation and allows us to create higher resolution DEMs/DSMs
15
General Approach: Interpolation
Kriging, IDW, Splines, … Interpolate O(N3) z=F(x,y) N Points Modern data sets can have millions of points Direct interpolation does not scale
16
Improving Interpolation with Quad Trees
Segment initial point set using a quad tree Each quad tree leaf has a <K points (10-100) Interpolate each segment separately N/K segments O(K3) interpolation per segment O(K2N) total running time Linear in N Boundary smoothness? Use points from nearby segments
17
Overview of the Algorithm
Construct quad-tree on disk using bulk loading [AAPV 01] For each quad-tree node, find points in neighboring segments Arrange points on disk for sequential interpolation Use the algorithm by Mitasova, Mitas, Harmon Can use any other interpolation method
18
Building an External Quad Tree
Build top few levels in memory Buffer points in lower levels Write buffers to disk when full K=2 Block I/O
19
Building a TIN TIN: Triangulated Irregular Network
Constrained Delaunay Triangulation Developed an I/O- efficient algorithm Builds TIN for Neuse River Basin (14GB) in 7.5 hours
20
Future Work: Hierarchical DEMs
Considerable work in graphics and GIS communities on multiresolution hierarchies for terrains Focuses on visualization Most algorithms perform random memory access Out-of-core simplification I/O-efficient algorithms for terrain simplification (edge contraction method) Feature preserving simplification Preserves main river networks Preserves visibility for most of the points
21
Coping with Noisy Data Nearly all flow models assume [O’Callaghan and Mark 84, de Berg et al. 96] water flows downwards stops at a minimum
22
Denoising via Flooding [Jenson and Domingue 88, Arge et al. 03]
Pour water uniformly on terrain Water level rises until steady-state
23
River Network After Flooding
Sort(N) I/Os
24
Denoising via Flooding [Jenson and Domingue 88, Arge et al. 03]
Pour water uniformly on terrain Water level rises until steady-state Use topological persistence (Edelsbrunner et al.) as importance measure to measure the importance of a minimum Flood low-persistence minima, preserve high-persistence minima
25
Topological Persistence [Edelsbrunner et al. 02]
26
The Union-Find Problem
A universe of N elements: x1, x2, …, xN Initially N singleton sets: {x1}, {x2 }, …, {xN} Each set has a representative Maintain the partition under Union(xi, xj) : Joins the sets containing xi and xj Find(xi) : Returns the representative of the set containing xi
27
Formulated as Batched Union-Find
On a terrain represented as a TIN When reach A minimum or maximum: do nothing A regular point u: Issue union(u,v) for a lower neighbor v A saddle u: let v and w be nodes from u’s two connected pieces in its lower link Issue: find(v), find(w), union(u,v), union(u,w) Sequence of union/find operations can be computed in advance lower link
28
Classical Union-Find Algorithm
representatives d h i p b j a f l z s r c k e g m n Union(d, h) : Find(n) : h h d f l d f l m n b j a m b j a link-by-rank path compression e g n e g
29
Complexity O(N α(N)) for a sequence of N union and find operations [Tarjan 75] α(•) : Inverse Ackermann function (very slow!) Optimal in the worst case [Tarjan79, Fredman and Saks 89] Batched (Off-line) version Entire sequence known in advance Can be improved to linear on RAM [Gabow and Tarjan 85] Not possible on a pointer machine [Tarjan79]
30
Our Results An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O(N/B logM/B(N/B)) I/Os Same as sorting optimal in the worst case A simpler algorithm using O(sort(N) log(N/M)) I/Os Implemented Apply to persistence computation, flooding O(sort(N)) time Other applications
31
I/O-Efficient Batched Union-Find
Two-stage algorithm Convert to interval union-find Compute an order on the elements s.t. each union joins two adjacent sets Solve batched interval union-find Each stage formulated as a geometric problem and solved using sort(N) I/Os
32
Traditional algorithm good as long as data fits in memory
Experimental Results Traditional algorithm good as long as data fits in memory
33
Experimental Results:Neuse River Basin
0.5 billion points (14GB raw data), sampled to obtain smaller data sets On the entire data set: EM takes 5 hours, IM fails
34
Contour Trees
35
Previous Results Directly maintain contours
O(N log N) time [van Kreveld et al. 97] Needs union-split-find for circular lists Do not extend to higher dimensions Two sweeps by maintaining components, then merge O(N log N) time [Carr et al. 03] Extend to arbitrary dimensions
36
Join Tree and Split Tree [Carr et al. 03]
Qualified nodes 9 9 9 9 8 8 8 8 7 7 7 7 6 6 6 6 5 5 5 5 4 4 4 4 3 3 3 3 2 2 1 1 1 1 Join tree Split tree Join tree Split tree
37
Final Contour Tree Hard to BATCH! Join tree Split tree Contour tree 9
8 8 8 7 7 7 6 6 6 5 5 5 4 4 4 3 3 3 2 2 2 1 1 1 Join tree Split tree Contour tree
38
Another Characterization
Let w be the highest node that is a descendant of v in join tree and ancestor of u in split tree, (u, w) is a contour tree edge 9 8 7 6 5 4 3 2 1 Join tree Split tree Contour tree u v w Now can BATCH!
39
Experimental Results On the Neuse River Basin data set
40
Coping with Noise: Future Work
Constructing hydrologically conditioned DEMs Formulate as an optimization problem Using persistent homology Integrating topology and geometry Handling flat areas Scalability remains a big issue Handling anthropogenic features
41
Back to Pipeline: Flow Modeling
42
From Elevation to River Networks
Where does water go? From higher elevation to lower elevation Single flow directions forms a tree 80 90 100 95 85 110 Sea
43
Drainage Area How much area is upstream of each node?
1 How much area is upstream of each node? Each node has initial drainage area (1) Drainage area of internal nodes depends on drainage area of children 2 3 3 3 7 17 25 14 11 9 5
44
Drainage Area example (Panama)
45
IFSARE and SRTM based stream analysis
Stream and watershed boundary extraction for Panama for on-going water quality research to provide more detailed and up-to-date data. SRTM x3600 DSM at 90m resolution for entire Panama, IFSARE x11300 DEM at 10m resolution for the Panama canal section Using r.terraflow - officially released in GRASS6.2 in streams can be extracted in 3-4 hours, tile-based approach takes days SRTM DEM for entire Panama with main rivers extracted from DEM in 3 hr
46
Hierarchical Watershed Decomposition
47
Watershed Hierarchies
Decompose a river network into a hierarchy of hydrological units All water in HU flows to a common outlet Hierarchy provides tunable level of detail Method used: Pfafstetter [VV99] Want a solution scalable to large modern hi-res terrains
48
Pfafstetter 8 9 7 4 5 3 1 2 6 Find main river
17 25 14 11 9 5 Find main river Find four largest tributaries Label basins/interbasins Recurse until single path 8 6 4 2 7 5 3 1 9
49
Recurse 71 75 73 74 72 7 2 26 25 24 23 21 22 27
50
Example Watershed Boundaries
1 2 3 4 5 6 7 8 9 91 92 93 94 95 96 97 98 99 991 992 993 994 995 996 997 998 999
51
A Complete Pipeline
52
Implementation TPIE: C++ primitives for I/O-efficient algorithms
GRASS: Open Source GIS Interpolation: Regularized spline with tension (in GRASS) Data: North Carolina LIDAR Neuse river basin: 400 million points (NC Floodmaps) Outer banks coastal data : 128 million points (NOAA CSC) USGS 30m NED
53
Grid Construction Results
Resolution (ft) 40 20 10 Grid cells x106 221 885 3542 Points x106 205 340 415 Total time 12h32m 14h46m 26h52m Time spent(%) Build quad tree 8.9 7.1 5.7 Find neighbors 31.6 32.4 29.1 Interpolate 58.8 58.5 59.3 Write output 0.7 2 5.9 Neuse Point Thinning Adjusted interpolation parameters
54
Sample Watershed Results
size (MB) 150 713 5819 size (mln cells) 30.8 147 396.5 total time 10m29s 58m10s 187m43s Time spent… % importing data 8 7 16 sorting by flow 15 13 building river list 31 35 29 sorting river list 19 20 computing labels 6 sort by grid order 14 12 exporting data 5 4
55
Future Work Terrain Modeling Terrain Analysis
56
Multivalent terrain models
multiple surfaces hybrid raster and 3D vector for structured environments (urban) voxel for unstructured environments (forested areas) Multiple surface representation: DSM and DEM
57
Terrain analysis on DEMs with structures
Investiagte multivalent elevation field representations that preserve road and stream connectivity
58
Dynamic Data Continuous data streams from various sensors
Maintain smart summaries of data Communication, energy efficient algorithms for sensor networks Coping with moving/deformable objects Tracking evolving terrain features Kinetic algorithms that update the structure locally
59
Terrain Analysis Flow modeling Tracking evolving features
Erosion analysis Soil moisture modeling (collaboration with ecologists) Tracking evolving features Visibility based navigation Finding paths from which most of the terrain is visible Computing well concealed paths Covering terrains from a few points
60
Acknowledgments Collaborators Sponsored by Lars Arge, Helena Mitasova
Andrew Danner, Thomas Molhave, Ke Yi Sponsored by Army Research Office W911NF
61
Relevant Publications
Danner, Yi, Molhave, A., Arge, Mitasova, TerraStream: From Elevation Data to Watershed Hierarchies, submitted for publication, 2007. A., Arge, Danner, From point cloud to grid DEM: A scalable approach. In Proc. 12th SSDH, 771–788, 2006. A., Arge, Yi, I/O-efficient construction of constrained Delaunay triangulations. In Proc. ESA, 355–366, 2005. A., Arge, Yi. I/O-efficient batched union-find and its applications to terrain analysis. In Proc. 22nd SoCG, 2006. Arge, Danner, Haverkort, Zeh. I/O-efficient hierarchical watershed decomposition of grid terrain models. In Proc. 12th SSDH, 825–844, 2006. Danner. I/O Efficient Algorithms and Applications in Geographic Information Systems. PhD thesis, Duke University, 2006.
62
Thanks!
63
Future Directions – Noise Removal
Bridge detection/removal Other flow routing methods Flow routing on flat surfaces Comparing flow networks
64
Mapping LIDAR point density
point density at the beginning of the project current industry standard no. of points/2m grid cell ATM II NASA NCflood Leica EARL NASA SHOALS USACE SHOALS USACE 1998 2004 pt / 2m grid cell 2001: NC Flood 2004: SHOALS
65
Building an External Quad Tree
levels in one scan I/Os total After first scan of data: Write top of tree to disk Flush all buffers Recurse on buffers What is in memory what is on disk
66
Coping with Noisy Data Identifying minima likely due to noise
Don’t want to remove real features Persistent homology [ELZ 02] Computed in Sort(N) I/Os
67
Interpolation Scan through list of points with neighbors
Interpolate each leaf separately – Use any interpolation method Evaluate at grid cells Write grid cell values as (i,j,z) as they are computed Sort grid cells by raster order Evaluate Sort Interpolate (i, j, z)
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.