Lars Arge 1/43 Big Terrain Data Analysis Algorithms in the Field Workshop SoCG June 19, 2012 Lars Arge.

Slides:



Advertisements
Similar presentations
I/O-Algorithms Lars Arge Spring 2012 April 17, 2012.
Advertisements

Computational Geometry
Spatial Join Queries. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
Polygon overlay in double precision arithmetic One example of why robust geometric code is hard to write Jack Snoeyink & Andrea Mantler Computer Science,
I/O-Algorithms Lars Arge Fall 2014 September 25, 2014.
Lars Arge 1/13 Efficient Handling of Massive (Terrain) Datasets Lars Arge A A R H U S U N I V E R S I T E T Department of Computer Science.
Convex Hull obstacle start end Convex Hull Convex Hull
External Memory Geometric Data Structures
Query Processing in Databases Dr. M. Gavrilova.  Introduction  I/O algorithms for large databases  Complex geometric operations in graphical querying.
I/O-Algorithms Lars Arge January 31, Lars Arge I/O-algorithms 2 Random Access Machine Model Standard theoretical model of computation: –Infinite.
I/O-Efficient Batched Union-Find and Its Applications to Terrain Analysis Pankaj K. Agarwal, Lars Arge, Ke Yi Duke University University of Aarhus.
I/O-Algorithms Lars Arge University of Aarhus February 21, 2005.
I/O-Algorithms Lars Arge Aarhus University February 27, 2007.
I/O-Algorithms Lars Arge Spring 2011 March 8, 2011.
Lecture 12 : Special Case of Hidden-Line-Elimination Computational Geometry Prof. Dr. Th. Ottmann 1 Special Cases of the Hidden Line Elimination Problem.
I/O-Algorithms Lars Arge Aarhus University February 13, 2007.
I/O-Algorithms Lars Arge Aarhus University March 16, 2006.
I/O-Algorithms Lars Arge Spring 2009 February 2, 2009.
Special Cases of the Hidden Line Elimination Problem Computational Geometry, WS 2007/08 Lecture 16 Prof. Dr. Thomas Ottmann Algorithmen & Datenstrukturen,
I/O-Algorithms Lars Arge Spring 2009 January 27, 2009.
I/O-Algorithms Lars Arge Spring 2007 January 30, 2007.
I/O-Algorithms Lars Arge Aarhus University February 16, 2006.
I/O-Algorithms Lars Arge Aarhus University February 7, 2005.
I/O-Algorithms Lars Arge University of Aarhus February 13, 2005.
I/O-Algorithms Lars Arge Spring 2009 April 28, 2009.
I/O-Algorithms Lars Arge University of Aarhus March 1, 2005.
I/O-Algorithms Lars Arge Spring 2009 March 3, 2009.
I/O-Algorithms Lars Arge Aarhus University February 6, 2007.
I/O-Algorithms Lars Arge Spring 2006 February 2, 2006.
Lars Arge 1/14 A A R H U S U N I V E R S I T E T Department of Computer Science Efficient Handling of Massive (Terrain) Datasets Professor Lars Arge University.
I/O-Algorithms Lars Arge Aarhus University March 5, 2008.
I/O-Algorithms Lars Arge Aarhus University April 16, 2008.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Disk Storage, Basic File Structures, and Hashing by Pinar Senkul resources: mostly froom.
I/O-Algorithms Lars Arge Aarhus University February 9, 2006.
I/O-Algorithms Lars Arge Aarhus University March 9, 2006.
I/O-Algorithms Lars Arge Aarhus University February 14, 2008.
Massive Data Algorithmics Faglig Dag, January 17, 2008 Gerth Stølting Brodal University of Aarhus Department of Computer Science.
I/O-Algorithms Lars Arge Aarhus University March 6, 2007.
I/O-Algorithms Lars Arge University of Aarhus March 7, 2005.
Flow modeling on grid terrains. Why GIS?  How it all started.. Duke Environmental researchers: computing flow accumulation for Appalachian Mountains.
From Elevation Data to Watershed Hierarchies Pankaj K. Agarwal Duke University Supported by ARO W911NF
Lars Arge 1/12 Lars Arge. 2/12  Pervasive use of computers and sensors  Increased ability to acquire/store/process data → Massive data collected everywhere.
I/O-Algorithms Lars Arge Spring 2008 January 31, 2008.
1 Geometric Intersection Determining if there are intersections between graphical objects Finding all intersecting pairs Brute Force Algorithm Plane Sweep.
I/O-Algorithms Lars Arge Fall 2014 August 28, 2014.
Heavily based on slides by Lars Arge I/O-Algorithms Thomas Mølhave Spring 2012 February 9, 2012.
Bin Yao Spring 2014 (Slides were made available by Feifei Li) Advanced Topics in Data Management.
External Memory Algorithms for Geometric Problems Piotr Indyk (slides partially by Lars Arge and Jeff Vitter)
B-trees and kd-trees Piotr Indyk (slides partially by Lars Arge from Duke U)
CSE332: Data Abstractions Lecture 8: Memory Hierarchy Tyler Robison Summer
Lars Arge Presented by Or Ozery. I/O Model Previously defined: N = # of elements in input M = # of elements that fit into memory B = # of elements per.
MotivationFundamental ProblemsProblems on Graphs Parallel processors are becoming common place. Each core of a multi-core processor consists of a CPU and.
Lecture 2: External Memory Indexing Structures CS6931 Database Seminar.
INTRODUCTION TO GIS  Used to describe computer facilities which are used to handle data referenced to the spatial domain.  Has the ability to inter-
Flow Modeling on Massive Grids Laura Toma, Rajiv Wickremesinghe with Lars Arge, Jeff Chase, Jeff Vitter Pat Halpin, Dean Urban in collaboration with.
External Memory Geometric Data Structures Lars Arge Duke University June 27, 2002 Summer School on Massive Datasets.
1 Data Structures CSCI 132, Spring 2014 Lecture 1 Big Ideas in Data Structures Course website:
FALL 2005CENG 351 Data Management and File Structures1 External Sorting Reference: Chapter 8.
Lecture 1: Basic Operators in Large Data CS 6931 Database Seminar.
Internal Memory Pointer MachineRandom Access MachineStatic Setting Data resides in records (nodes) that can be accessed via pointers (links). The priority.
Copyright © 2009 Curt Hill Look Ups A Recurring Theme.
TerraSTREAM: Terrain Processing Pipeline MADALGO – Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation What TerraSTREAM.
INTRODUCTION TO GEOGRAPHICAL INFORMATION SYSTEM
Digital Terrain Analysis for Massive Grids
Real-Time Ray Tracing Stefan Popov.
Query Processing in Databases Dr. M. Gavrilova
Advanced Topics in Data Management
Advanced Topics in Data Management
Introduction to Data Structures
Presentation transcript:

Lars Arge 1/43 Big Terrain Data Analysis Algorithms in the Field Workshop SoCG June 19, 2012 Lars Arge

Lars Arge 2/43 Outline of Talk 1.Big terrain data 2.I/O-efficient algorithms 3.I/O-efficient big terrain data algorithms  Surface water flow modeling 4.SCALGO

Lars Arge 3/43 Big Terrain Data

Lars Arge 4/43 Big Terrain Data  Shuttle Radar Topography Mission (SRTM):  11 day mission in 2000  Interferometric Synthetic Aperture Radar (IFSAR)  Near global dataset (60 ○ N - 53 ○ S)  3-arc seconds (90-meter at equator) raster  ~60 billion cells in roughly files (tiles)

Lars Arge 5/43 Big and Detailed Terrain Data  LiDAR delivers detailed and accurate data  Denmark (~ km 2 ):  Previously: Typically 30 meter resolution  ~46 million points (<1GB)  Now: LiDAR data of 1,6 meter resolution  ~26 billion points (>1TB)  NC ~ km 2  US ~ km 2

Lars Arge 6/43 Detailed Terrain Data Essential Sea-level rise (2 meter effect on Mandø) 90 meter terrain model 1,6 meter terrain model

Lars Arge 7/43 1,6 meter terrain model90 meter terrain model Detailed Terrain Data Essential Drainage network (flow accumulation)

Lars Arge 8/43 Massive Terrain Data Hard to Handle  Most commercial systems cannot handle truly massive datasets  and certainly not all of Denmark at 1,6-meter resolution  Typical workarounds  Tiling: Break terrain into pieces processed individually  Not really possible for flow computations  Leads to cumbersome workflows  Simplification: Reduce data size  Leads to unreliable results

Lars Arge 9/43 I/O-Efficient Algorithms

Lars Arge 10/43 “The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer)  I/O is often bottleneck when handling massive datasets  Disk access is 10 6 times slower than main memory access!  Disk systems try to amortize large access time transferring large contiguous blocks of data  I/O-efficient algorithms: Store and access data to use blocks! I/O-Efficient Algorithms data size running time I/O-efficient algorithm Main memory size Normal algorithm

Lars Arge 11/43 I/O-model of Computation  Model parameters [AV88]  N= # of items in the problem instance  B = # of items per disk block  M = # of items that fit in main memory  Goal: Minimize I/O  Move of B consecutive elements between memory and disk D P M Block I/O B

Lars Arge 12/43 I/O-Efficient Algorithms Matter  Example: Traversing linked list (List ranking)  Array size N = 10 elements  Disk block size B = 2 elements  Main memory size M = 4 elements (2 blocks)  Difference between N and N/B large since block size is large  Example: N = 256 x 10 6, B = 8000, 1ms disk access time  N I/Os take 256 x 10 3 sec = 4266 min = 71 hr  N/B I/Os take 256/8 sec = 32 sec Algorithm 2: N/B=5 I/OsAlgorithm 1: N=10 I/Os

Lars Arge 13/43 I/O-model of Computation  Scanning: O(N/B)  Sorting: O(Sort(N))=  Searching: O(log B N)  Note: Not sort optimally with search tree  Priority queue: O(Sort(N)/N) amortized  Many O(Sort(N)) computational geometry results  Many graph algorithms results D P M Block I/O B Surveys [A02,V06]

Lars Arge 14/43 External Plane Sweeping  Plane sweeping powerful technique for solving geometric problems  Example: Orthogonal line segment intersection  Sweep line top-down while maintaining search tree T on vertical segments crossing sweep line (by x-coordinates)  Top endpoint of vertical segment: Insert in T  Bottom endpoint of vertical segment: Delete from T  Horizontal segment: Perform range query with x-interval on T

Lars Arge 15/43 External Plane Sweeping  Plane sweeping powerful technique for solving geometric problems  Example: Orthogonal line segment intersection  In internal memory algorithm runs in optimal O(N log N+T) time  In external memory algorithm performs badly (>N I/Os) if |T|>M  Even if we implements T as B-tree  O(N log B N+T/B) I/Os

Lars Arge 16/43 Distribution Sweeping  Divide plane into M/B slabs with N/(M/B) endpoints each  Sweep plane top-down while reporting intersections between  Vertical and part of horizontal segments spanning slab(s)  Distribute data to M/B slabs  Vertical and non-spanning parts of horizontal segments  Recurse in each slab

Lars Arge 17/43 Distribution Sweeping  Sweep performed in O(N/B+T’/B) I/Os  I/Os  Maintain active list of vertical segments for each slab (<B in MM)  Top endpoint of vertical segment: Insert in active list  Horizontal segment: Scan through all relevant active lists  Removing “expired” vertical segments  Reporting intersections with “non-expired” vertical segments

Lars Arge 18/43 (Some) I/O-Efficient Big Terrain Data Algorithms

Lars Arge 19/43 I/O-model Terrain Results Model construction  O(Sort(N)) Triangular irregular network (TIN):  Build planar triangulation on input points and lift to 3d  Delaunay triangulation [GTVV93]  Constrained Delaunay triangulation [AAY05]  O(Sort(N)) Raster (Grid):  By Delaunay or spline interpolation [AAD06] Point data cleaning  O(Sort(N)) removal of noise (outliers) in sonar data  Planar graph (Delaunay) connected components  Under some realistic assumptions

Lars Arge 20/43 I/O-model Terrain Results Contours from model (TIN/Raster)  Easy to construct contour segments in O(N/B+T/B)  Hard to order segments along individual contours  Tracing contours  O(T) I/Os  Sorting/List ranking contours  O(Sort(T))  O(Sort(N)+T/B)) ordered contour map and nesting info [AAMS08]  Order TIN triangles such that partial connected contours are nested like balanced parenthesis

Lars Arge 21/43 I/O-model Terrain Results Model Simplification  LiDAR  many “insignificant” depressions  e.g. ugly contours  Removal of insignificant depressions:  Score each depression (persistent homology)  Remove low score depressions by “flooding”  Sweep terrain: Depression Score = death – birth time  Minimum: Component is born  Saddle: Components merge; later birth time component die  O(Sort(N)) simplification (persistent homology) [AAY06]  Using batched Union-Find solved using distribution sweeping Significant Insignificant

Lars Arge 22/43 I/O-model Terrain Results Contour simplification  Removal of insignificant depressions removes insignificant contours  But simplification of individual curves still needed  Simplifying individual contours (e.g. Douglass-Peucker) may result in non- homotopic and intersecting contours  Recently, O(Sort(T)) practical algorithm maintaining homotopy and guaranteeing non-intersecting contours [ADMRT12]  Under some realistic assumptions  Simplify model directly?  Open to do so I/O-efficiently!

Lars Arge 23/43 Flood Modeling

Lars Arge 24/43 Flood Risk Analysis Important  Increasingly important to predict areas susceptible to floods  Due to e.g extreme rain or rising sea-level  Hurricane Floyd Sep. 15, am 3pm

Lars Arge 25/43 Basic Raster Terrain Flow Modeling  Rising sea-level: Sea-level mapping  The minimal sea-level height each cell is flooded  Extreme rain: Surface water flow  Flow direction: The direction water flows at each cell  Flow accumulation: Amount of water flowing through each cell  Flow accumulation of cell = size of “upstream area”  Drainage network = cells with high flow accumulation

Lars Arge 26/43 Detailed Terrain Data Essential Sea-level rise (2 meter effect on Mandø) 90 meter terrain model 1,6 meter terrain model

Lars Arge 27/43 1,6 meter terrain model90 meter terrain model Detailed Terrain Data Essential Drainage network (flow accumulation)

Lars Arge 28/43 Rising Sea-Level  The minimal sea-level height each cell is flooded  New grid where terrain below h is flooded when water rise to h  No commercial software seemed to be able to process all of Denmark at 1,6-meter resolution  I/O-efficient algorithm:  Result is simply simplification with score threshold ∞ [AKY06]  Denmark in a day on standard 4GB desktop!

Lars Arge 29/43 Flow Accumulation  Initially one unit of water in each grid cell  Water (initial and received) distributed from each cell to lowest lower neighbor cell (if existing)  Flow accumulation of cell is total flow through it

Lars Arge 30/43 Flow Accumulation Algorithm  Sweep cells in decreasing height order. At each cell:  Flow from flow grid and neighbor heights from height grid  Update flow (flow grid) for downslope neighbors  Problem: Cells of same height distributed over the terrain  Scattered access to flow grid and height grid  Ω(N) I/Os  Natural to try “tiling”: But different tiles not independent!  Performance of commercial systems are often unpredictable  And cannot handle Denmark at 1,6-meter resolution

Lars Arge 31/43 I/O-Efficient Flow Accumulation  Eliminating height grid scattered accesses:  Augment each cell with height of 8 neighbors  Eliminating flow grid scattered accesses: Note: Flow to neighbor only needed when reaching its elevation  Distribute flow by inserting element in priority queue Priority equal to neighbor’s height (and grid position)  Flow of cell obtained using DeleteMin operations  Turns O(N) grid accesses into O(N) priority queue operations  O(Sort(N)) algorithm [ACHTUVW03]  Denmark 1,6-meter model in two days on standard 4GB desktop!  Really “Time-forward processing” technique [CGGTVV95]

Lars Arge 32/43 Flash Flood Mapping  Models how surface water gathers in depressions as it rains  Water from watershed of depression gathers in the depression  Depressions fill, leading to (potentially dramatic) increase in neighbor depression watershed size  Flash Flood Mapping:  Amount of rain before any given raster cell is below water Watershed area Volume Watershed area Volume

Lars Arge 33/43 Flash Flood Mapping  Relatively easily solved in O(N log N) internal memory time [LS05]  Using priority queue and Union-Find structure  Algorithm runs in O(N (N))+Sort(N)) I/Os since no I/O-efficient online union-find structure known  Recently solved in O(Sort(N) log (N/M)) I/Os [ARZ10]  e.g. using time-forward processing and batched Union-Find Watershed area Volume

Lars Arge 34/43 SCALGO

Lars Arge 35/43 SCALGO  Established in 2009 to commercialize I/O-efficient terrain processing technology  Founders:  Lars Arge  Pankaj Agarwal  Morten Revsbæk  Thomas Mølhave

Lars Arge 36/43 SCALGO Products Embedded SCALGO S-CAN Software Packages SCALGO Model SCALGO Hydrology SCALGO Simplify Computation Services SCALGO Flash Flood Mapping SCALGO Contour Maps SCALGO I/O-Efficient Technology

Lars Arge 37/43 SCALGO Software Products  Embedded software: SCALGO S-CAN for terrain (sonar) data cleaning within EIVA NaviModel product  Software packages: SCALGO Model constructs and simplifies massive terrain models SCALGO Hydrology performs basic hydrological analysis on massive raster terrain models SCALGO Simplify adds massive raster terrain model simplification functionality to SCALGO Hydrology

Lars Arge 38/43 SCALGO Model Success Stories  LiDAR scan of Denmark ( km 2 )  Whole country  26 billion points in roughly 14,000 files  On standard workstation with 4GB memory  National 2-meter 26 billon cell raster model in 2 days  Without thinning or tiling  Simplification of 26 billon cell raster model full model in ½ day

Lars Arge 39/43 SCALGO Hydrology Success Stories  Sea-level rise:  National sea-level rise tool launched on Danish Ministry of the Environment climate change portal (klimaportalen.dk)  One weekend hits = normal 6 months  Flow accumulation:  Shuttle Radar Topography Mission (SRTM):  90-meter near global grid (~60G cells)  Large USGS Hydrosheds project produced “hydrological conditioned” grid  But upscaled to 500-meter to compute flow accumulation  SCALGO flow accumulation in 1½ day on standard workstation

Lars Arge 40/43 SCALGO Computation Services  SCALGO performs computation consulting work:  On big terrains using SCALGO Model and SCALGO Hydrology  Using advanced in-house tool, such as tools for  Production of realistic contours  Flash Flood Mapping

Lars Arge 41/43 Flash Flood Mapping Success Stories  The major European (Danish) engineering consulting company COWI has launched product in Denmark (”Skybrudskort®“)Skybrudskort®  Sold to over 10 local government and one of 5 regions ( km²)  Now being produced for entire country  The major Florida engineering consulting company Jones Edmunds recently compared Flash Flood Mapping to result of advanced dynamic model (ICPR) for Marion County, Florida  Results very close  Significantly more detailed  Cost under 5% (significantly reduced production time)

Lars Arge 42/43 Demo  Demo  Sea-level rise, flow accumulation and Flash Flood Mapping  Computed I/O-efficient using SCALGO Software  On 1,6-meter resolution Denmark raster (~26 Billion cells)  Build from LiDAR 26 Billion LiDAR points in its entirety  On 90-meter resolution near global raster (~60 Billion cells)

Lars Arge 43/43 Summary/Conclusion  Big terrain data available, essential and hard to handle  I/O often bottleneck when handling big terrain data  I/O-efficient algorithms  Many terrain data algorithms developed based on CG technology  solutions to important practical problems  Technology commercialized by SCALGO  Open problem examples:  Model simplification  Dynamic flow models  Incorporation of further data (soil type, ground water, infrastructure, …) in flow models