1
Distributed Geospatial Indexing
Rich Fecher Kent Miller
2
What is GeoWave? GeoWave…
Bridges the gaps between popular geospatial projects and distributed processing/analytics frameworks.
Leverages the scalability of distributed key-value stores for effective storage, retrieval, and analysis of massive geospatial datasets.
Is an open source LocationTech project from the National Geospatial-Intelligence Agency (NGA) in collaboration with RadiantBlue Technologies and Booz Allen Hamilton.
3
Core Problem
How should GeoWave index multi-dimensional (i.e. spatial) data in a 1-dimensional, sorted key-value store?
4
Dimensionality Reduction
Use a Space Filling Curve (SFC) to impose a one-dimensional ordering on multi-dimensional data.
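As a rough illustration of the idea, the sketch below maps a longitude/latitude pair to a single sortable integer using a Z-order (Morton) curve, the simplest of the curves compared on the next slide. It is not GeoWave's implementation; the function names, the 16-bit default resolution, and the quantization scheme are assumptions chosen for the example.

```python
from typing import Tuple


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) index of grid cell (x, y)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z


def lonlat_to_cell(lon: float, lat: float, bits: int = 16) -> Tuple[int, int]:
    """Quantize a longitude/latitude pair onto a 2^bits x 2^bits grid."""
    n = 1 << bits
    # Clamp so that lon = 180 and lat = 90 fall into the last cell.
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def sfc_key(lon: float, lat: float, bits: int = 16) -> int:
    """One-dimensional key that can be stored in a sorted key-value store."""
    x, y = lonlat_to_cell(lon, lat, bits)
    return interleave_bits(x, y, bits)
```

Points that share a grid cell get the same key, and cells that share high-order bits sort near each other, which is what makes range scans over the sorted store useful for spatial queries.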
5
Space Filling Curve Selection
Curves compared: Z-Order, Hilbert, H-order, Peano, AR2W2, BΩ
WL∞: ∞ 6 4 8 5.40 5.00
WL2: 6.04
WL1: 9 10.66 12.00 9.00
WBA: 2.40 3.00 2.00 3.05 2.22
ABA: 2.86 1.41 1.69 1.42 1.47 1.40
WL∞, WL2, WL1: Worst Case Dilation
WBA: Worst Case Bounding Box Area Ratio
ABA: Average Total Bounding Box Area
Source: Haverkort, Walderveen, Locality and Bounding-Box Quality of Two-Dimensional Space-Filling Curves, 2008, arXiv: v2
6
Query-Time Aggregation
Computes customized summaries on query
Runs distributed on data nodes
Returns user-specified aggregate values
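The pattern can be sketched as partial aggregates computed next to the data and merged by the caller. The example below is an assumption-laden illustration, not GeoWave's API: `Summary`, `aggregate_on_node`, and `aggregate_query` are hypothetical names, and each "data node" is simply a Python list of points.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # min_x, min_y, max_x, max_y


@dataclass
class Summary:
    """Partial aggregate: match count plus the envelope of matched points."""
    count: int = 0
    min_x: float = float("inf")
    min_y: float = float("inf")
    max_x: float = float("-inf")
    max_y: float = float("-inf")

    def add(self, x: float, y: float) -> None:
        self.count += 1
        self.min_x, self.max_x = min(self.min_x, x), max(self.max_x, x)
        self.min_y, self.max_y = min(self.min_y, y), max(self.max_y, y)

    def merge(self, other: "Summary") -> None:
        self.count += other.count
        self.min_x, self.max_x = min(self.min_x, other.min_x), max(self.max_x, other.max_x)
        self.min_y, self.max_y = min(self.min_y, other.min_y), max(self.max_y, other.max_y)


def aggregate_on_node(rows: Iterable[Point], query: Box) -> Summary:
    """Runs where the data lives; only the small Summary leaves the node."""
    summary = Summary()
    for x, y in rows:
        if query[0] <= x <= query[2] and query[1] <= y <= query[3]:
            summary.add(x, y)
    return summary


def aggregate_query(nodes: List[List[Point]], query: Box) -> Summary:
    """Client side: merge the per-node partial results into one answer."""
    total = Summary()
    for rows in nodes:
        total.merge(aggregate_on_node(rows, query))
    return total
```

The design point is that the merge step only handles one small object per node, so the amount of data returned does not grow with the number of matching entries.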
7
Map Occlusion Culling: Subsample At Pixel Resolution
At a given zoom level, each pixel represents a range in degrees. When scanning the data, only one entry is needed within each pixel range; the rest of the entries can be skipped. The block identified in red represents many data points, but is rendered by only 9 pixels.
8
Pixel-based Subsampling: Apply at Data Node on Keys
The Accumulo iterator starts at the first pixel, scans until it hits a geometry, then skips to the next pixel.
Figure labels (Database Data vs. Displayed Pixels): Scan to the first pixel. Seek to the beginning of the next pixel. Points that were all skipped. The rendering engine received only these points.
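A minimal sketch of the scan-then-seek pattern, under a simplifying assumption: each pixel corresponds to one contiguous block of `keys_per_pixel` key values in the sorted store. The function name and parameter are hypothetical, and a sorted Python list with `bisect` stands in for the data node's iterator and its seek operation.

```python
from bisect import bisect_left
from typing import List


def subsample(sorted_keys: List[int], keys_per_pixel: int) -> List[int]:
    """Keep the first key found in each pixel's key range and skip the rest."""
    kept = []
    i = 0
    while i < len(sorted_keys):
        key = sorted_keys[i]
        kept.append(key)  # first entry hit inside this pixel
        # Start of the next pixel's key range.
        next_pixel_start = (key // keys_per_pixel + 1) * keys_per_pixel
        # Seek directly to it instead of reading every entry in between.
        i = bisect_left(sorted_keys, next_pixel_start, i + 1)
    return kept
```

With `keys_per_pixel = 4`, the keys `[0, 1, 2, 5, 9, 10, 11, 30]` reduce to `[0, 5, 9, 30]`: one entry per occupied pixel, with the other four never handed to the renderer.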
9
Distributed Rendering
Figure flow: Map Request (Layer, Style) -> GeoServer (GeoWave Plugin) -> GeoWave Data Nodes -> Rendered Map -> Map Response
Each scan result is an image with the data in the range.
All resultant images are composited together.
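The compositing step can be pictured with a toy example in which every data node rasterizes only its own points and the partial images are overlaid afterwards. This is an illustrative sketch, not the GeoServer or GeoWave rendering code: images are plain nested lists of 0/1 values, and `render_partial` and `composite` are hypothetical helpers.

```python
from typing import Iterable, List, Tuple

Image = List[List[int]]


def render_partial(points: Iterable[Tuple[int, int]], width: int, height: int) -> Image:
    """Rasterize one node's points (already in pixel coordinates)."""
    image = [[0] * width for _ in range(height)]
    for px, py in points:
        if 0 <= px < width and 0 <= py < height:
            image[py][px] = 1
    return image


def composite(images: List[Image]) -> Image:
    """Overlay partial images; a pixel is set if any partial image set it."""
    height, width = len(images[0]), len(images[0][0])
    out = [[0] * width for _ in range(height)]
    for image in images:
        for y in range(height):
            for x in range(width):
                out[y][x] = max(out[y][x], image[y][x])
    return out
```

Only fixed-size images cross the network in this scheme, regardless of how many features each node had to draw.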
10
Built On
11
Additional Core Features in GeoWave
Command Line Utilities & RESTful Web Services: Local/HDFS Ingest, Kafka Streaming Ingest, Stats Generation, OSM Utilities, Landsat8 Utilities
Base Analytics: KDE Heat Map, DBSCAN, K-Means Clustering
Open Geospatial Consortium standard services via GeoServer
Spatial-Temporal indexing
Integrated with other geospatial frameworks: PDAL, Mapnik
HBase Support
Raster datasets
Rapid Deployment via AWS EMR
GeoWave MapReduce input/output formats
Statistics: Attribute ranges/histograms, Enveloping bounding box over all geometries, Counts of # of stored items, Counts of discrete attribute values
Support for a variety of ingest types: Shapefile, GeoJSON, PostGIS, ArcGrid, GeoTIFF, GPX, T-Drive
12
GeoWave Where can I find it?
GitHub Source Examples Documentation This Demo
13
Thanks! Rich Fecher rfecher@radiantblue.com Kent Miller
14
Backup Slides
15
OSM – Planet GPX
Every track ever uploaded to OpenStreetMap
Complete data attribution
2.9 billion spatial entities (points)
16
Global View Entire Pointset Visualized
17
Zoomed In
19
Kernel Density Estimate Global View
20
Kernel Density Estimate Zoomed In
21
Tiered Indexing
Tiers: Tier 0 (1x1), Tier 1 (2x2), Tier 2 (4x4), Tier 3 (8x8)
Table columns (Point, Polygon): Tier, Duplicates, Cell(s)
Table values as extracted: 1 4 2 14 6 3 56 21-24 222 9 35-42
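One way to think about tier selection is to pick the finest grid on which a geometry's bounding box still overlaps only a few cells, since each overlapped cell means one stored duplicate. The sketch below works on the unit square and uses a hypothetical `max_cells` threshold; it illustrates the trade-off and is not GeoWave's actual tier-selection logic.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # min_x, min_y, max_x, max_y in [0, 1]


def cells_overlapped(bbox: Box, tier: int) -> int:
    """Number of grid cells a bounding box touches on a 2^tier x 2^tier grid."""
    min_x, min_y, max_x, max_y = bbox
    n = 1 << tier

    def cell(v: float) -> int:
        # Clamp so that a coordinate of exactly 1.0 falls into the last cell.
        return min(int(v * n), n - 1)

    cols = cell(max_x) - cell(min_x) + 1
    rows = cell(max_y) - cell(min_y) + 1
    return cols * rows


def choose_tier(bbox: Box, max_tier: int = 3, max_cells: int = 4) -> int:
    """Finest tier at which the box overlaps at most max_cells cells."""
    for tier in range(max_tier, -1, -1):
        if cells_overlapped(bbox, tier) <= max_cells:
            return tier
    return 0  # Tier 0 is a single cell, so it always fits.
```

A point lands on the finest tier (it only ever touches one cell), while a polygon covering a quarter of the extent is pushed to a coarser tier to keep its duplicate count down.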
22
Space Filling Curve Granularity
8x8 Grid: Polygons overlap few cells. Many points per cell.
64x64 Grid: Polygons overlap many cells. Fewer points per cell.
Which is better? Why?
23
Good to Know Space Filling Curve Range Decomposition
Intersects Bounding Box Query (71 -> 98)
Range of cells from 92-99
Range of cells from 70-75
Range of cells from
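To illustrate how a bounding box turns into a handful of contiguous key ranges, the sketch below computes Hilbert indices with the widely published iterative xy-to-d algorithm, then merges consecutive indices into ranges. It enumerates every cell in the box, which is only practical for small grids; production implementations decompose recursively instead. The function names are hypothetical, and the cell numbering will not necessarily match the numbers in the slide's figure.

```python
from typing import List, Tuple


def hilbert_index(n: int, x: int, y: int) -> int:
    """Hilbert curve index of cell (x, y) on an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def decompose(n: int, min_x: int, min_y: int, max_x: int, max_y: int) -> List[Tuple[int, int]]:
    """Inclusive (start, end) index ranges covering every cell in the box."""
    indices = sorted(
        hilbert_index(n, x, y)
        for x in range(min_x, max_x + 1)
        for y in range(min_y, max_y + 1)
    )
    ranges = []
    start = prev = indices[0]
    for d in indices[1:]:
        if d != prev + 1:          # gap in the curve: close the current range
            ranges.append((start, prev))
            start = d
        prev = d
    ranges.append((start, prev))
    return ranges
```

Each returned range becomes one scan against the sorted store, so fewer and longer ranges generally mean less seek overhead for the same query box.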
24
Microsoft - GeoLife
Microsoft Research has made available a trajectory data set that contains the GPS coordinates of 182 users over a period of more than five years (April 2007 to August 2012). There are 17,621 trajectories in this data set.
25
GeoLife Original Track Data
26
Kernel Density Estimate Gaussian Kernel
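For readers unfamiliar with the technique, a Gaussian kernel density estimate places a small Gaussian "bump" on every input point and sums the bumps at each output pixel. The single-machine sketch below shows only that computation; the function name, the pixel-coordinate inputs, and the `bandwidth` parameter are assumptions, and it says nothing about how GeoWave distributes the work.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_kde_grid(
    points: Sequence[Tuple[float, float]],
    width: int,
    height: int,
    bandwidth: float,
) -> List[List[float]]:
    """Evaluate a 2D Gaussian KDE at every integer grid position."""
    grid = [[0.0] * width for _ in range(height)]
    if not points:
        return grid
    # Normalization for an isotropic 2D Gaussian, averaged over all points.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    for gy in range(height):
        for gx in range(width):
            total = 0.0
            for px, py in points:
                dist_sq = (gx - px) ** 2 + (gy - py) ** 2
                total += math.exp(-dist_sq / (2.0 * bandwidth ** 2))
            grid[gy][gx] = norm * total
    return grid
```

A larger `bandwidth` spreads each point's contribution over more pixels and gives a smoother heat map; a smaller one keeps hot spots tight around the tracks.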
27
Zoomed In Original Track Data
28
Kernel Density Estimate Zoomed In
29
Good to Know GeoWave Key Structure
Accumulo key: Row ID, Column (Family, Qualifier, Visibility), Time-Stamp
Value
GeoWave fields: Index ID, Adapter ID, Data ID, Adapter ID Length, Data ID Length, # of Duplicates, Field ID, Field Value
Index fields: Tier, Bin, Hilbert
Legend: GeoWave Feature Metadata; Feature Attribute Field/Value Pair; Tier Index; Hilbert SFC Index
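A composite row ID of this kind can be sketched as variable-length parts followed by a fixed-size trailer that records their lengths, so the key still sorts by its leading index bytes and can be parsed from the end. The field order below follows the slide's listing, but the 4-byte big-endian widths and both function names are assumptions made for illustration, not GeoWave's actual byte layout.

```python
import struct
from typing import Tuple

TRAILER = ">III"  # adapter ID length, data ID length, number of duplicates
TRAILER_SIZE = struct.calcsize(TRAILER)


def encode_row_id(index_id: bytes, adapter_id: bytes, data_id: bytes, duplicates: int) -> bytes:
    """Concatenate the parts; the index ID comes first so it drives sort order."""
    trailer = struct.pack(TRAILER, len(adapter_id), len(data_id), duplicates)
    return index_id + adapter_id + data_id + trailer


def decode_row_id(row_id: bytes) -> Tuple[bytes, bytes, bytes, int]:
    """Read the trailer first, then slice the variable-length parts."""
    adapter_len, data_len, duplicates = struct.unpack(TRAILER, row_id[-TRAILER_SIZE:])
    body = row_id[:-TRAILER_SIZE]
    data_start = len(body) - data_len
    adapter_start = data_start - adapter_len
    index_id = body[:adapter_start]
    adapter_id = body[adapter_start:data_start]
    data_id = body[data_start:]
    return index_id, adapter_id, data_id, duplicates
```

Because the space-filling-curve bytes lead the key, a range scan over index values never needs to know the adapter or data IDs in advance.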
30
Ingest Example: Ingest a Line
Initially 2 Points, 2 Polygons
Tier 31: Many Duplicates
Tier 3: 18 Duplicates
Tier 2: 8 Duplicates
Tier 1: 3 Duplicates
Tier 0: 1 Duplicate
Final State
31
Query Example: Intersects Query (Magenta Bounding Box)
Tier 1: 1 cell, 0 filtered, 0 intersect
Tier 2: 2 cells, 2 filtered, 0 intersect
Tier 3: 4 cells, 1 filtered, 0 intersect
Tier 4: 6 cells, 0 filtered, 1 intersect
Tier 31: 0 filtered, 1 intersect
Final Result: 1 Point, 1 Polygon Returned