Distributed Geospatial Indexing


Distributed Geospatial Indexing Rich Fecher rfecher@radiantblue.com Kent Miller kmiller@radiantblue.com

What is GeoWave? GeoWave bridges the gaps between popular geospatial projects and distributed processing/analytics frameworks. It leverages the scalability of distributed key-value stores for effective storage, retrieval, and analysis of massive geospatial datasets. It is an open source LocationTech project from the National Geospatial-Intelligence Agency (NGA) in collaboration with RadiantBlue Technologies and Booz Allen Hamilton.

Core Problem How should GeoWave index multi-dimensional (i.e. spatial) data in a 1-dimensional, sorted key-value store?

Dimensionality Reduction: Use a Space Filling Curve (SFC) to impose a one-dimensional ordering on multi-dimensional data.
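To make the idea concrete, here is a minimal sketch of mapping a longitude/latitude pair onto a single sortable integer. It uses a Z-order (Morton) curve because the bit interleaving is short to write down; the later slides indicate that GeoWave's own index uses a Hilbert curve. The function names, the 16-bit default precision, and the clamping behaviour are assumptions made for illustration, not GeoWave's API.

```python
def quantize(value, lo, hi, bits):
    """Map a coordinate in [lo, hi] to an integer cell in [0, 2**bits - 1]."""
    cells = 1 << bits
    cell = int((value - lo) / (hi - lo) * cells)
    # Clamp so that value == hi falls into the last cell instead of overflowing.
    return min(max(cell, 0), cells - 1)


def z_order(x, y, bits):
    """Interleave the bits of x and y (x in even positions, y in odd positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def spatial_key(lon, lat, bits=16):
    """Hypothetical helper: turn a lon/lat pair into one sortable integer key."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return z_order(x, y, bits)
```

Sorting records by `spatial_key` keeps many spatially close points close together in a sorted key-value store, which is the property the index relies on.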

Space Filling Curve Selection

Curves compared: Z-Order, Hilbert, H-order, Peano, AR2W2, BΩ

WL∞: ∞ 6 4 8 5.40 5.00
WL2: 6.04
WL1: 9 10.66 12.00 9.00
WBA: 2.40 3.00 2.00 3.05 2.22
ABA: 2.86 1.41 1.69 1.42 1.47 1.40

Measures: Worst Case Dilation (WL∞, WL2, WL1); Worst Case Bounding Box Area Ratio (WBA); Average Total Bounding Box Area (ABA)

Source: Haverkort, Walderveen, Locality and Bounding-Box Quality of Two-Dimensional Space-Filling Curves, 2008, arXiv:0806.4787v2

Query-Time Aggregation: Computes customized summaries on query. Runs distributed on data nodes. Returns user-specified aggregate values.
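The pattern can be sketched as follows: each data node reduces its own rows to a small partial result, and only those partial results are merged by the caller. The `BoundsCount` class and the two functions are hypothetical names for this illustration and do not reflect GeoWave's aggregation interface; lists of points stand in for the rows held by each node.

```python
from dataclasses import dataclass


@dataclass
class BoundsCount:
    """A mergeable summary: point count plus enclosing bounding box."""
    count: int = 0
    min_x: float = float("inf")
    min_y: float = float("inf")
    max_x: float = float("-inf")
    max_y: float = float("-inf")

    def add(self, x, y):
        self.count += 1
        self.min_x, self.max_x = min(self.min_x, x), max(self.max_x, x)
        self.min_y, self.max_y = min(self.min_y, y), max(self.max_y, y)

    def merge(self, other):
        self.count += other.count
        self.min_x, self.max_x = min(self.min_x, other.min_x), max(self.max_x, other.max_x)
        self.min_y, self.max_y = min(self.min_y, other.min_y), max(self.max_y, other.max_y)


def aggregate_on_node(points):
    """Runs where the data lives; returns one small summary instead of all rows."""
    partial = BoundsCount()
    for x, y in points:
        partial.add(x, y)
    return partial


def aggregate_query(nodes):
    """Client side: merge the per-node summaries into the final answer."""
    total = BoundsCount()
    for points in nodes:
        total.merge(aggregate_on_node(points))
    return total
```

The design point is that the summary must be mergeable, so the amount of data returned to the client does not grow with the number of rows scanned.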

Map Occlusion Culling: Subsample At Pixel Resolution. At a given zoom level, each pixel represents a range in degrees. When scanning the data, only one entry is needed within each pixel range; the rest of the entries can be skipped. The block identified in red represents many data points but is rendered by only 9 pixels.

Pixel-based Subsampling: Apply at Data Node on Keys. The Accumulo iterator starts at the first pixel, scans until it hits a geometry, then skips to the next pixel. Figure labels: Database Data; Displayed Pixels; Scan to the first pixel; Seek to the beginning of the next pixel; Points that were all skipped; The rendering engine received only these points.
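A rough sketch of the scan-then-seek loop is shown below. It makes a simplifying assumption that each pixel corresponds to one contiguous block of `keys_per_pixel` key values; a sorted Python list stands in for the sorted table, and `bisect_left` stands in for the iterator's seek. None of these names come from Accumulo or GeoWave.

```python
from bisect import bisect_left


def subsample_by_pixel(sorted_keys, keys_per_pixel):
    """Return at most one key per pixel from a sorted list of integer keys."""
    kept = []
    pos = 0
    while pos < len(sorted_keys):
        key = sorted_keys[pos]
        kept.append(key)  # first entry found inside this pixel
        # Start of the next pixel's key range (assumed contiguous per pixel).
        next_pixel_start = (key // keys_per_pixel + 1) * keys_per_pixel
        # "Seek": jump past every remaining entry in the current pixel.
        pos = bisect_left(sorted_keys, next_pixel_start, pos + 1)
    return kept
```

With this approach the work done is proportional to the number of occupied pixels rather than the number of stored points, which is why it helps most on dense data.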

Distributed Rendering. GeoServer (GeoWave Plugin) receives a Map Request with a Layer and Style and passes it to the GeoWave Data Nodes. Each scan result is an image with the data in the range. All resultant images are composited together into the Rendered Map, which is returned as the Map Response.
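The compositing step can be illustrated with a small sketch. Here each partial image is a list of rows, `None` marks a transparent pixel, and later images are drawn over earlier ones. The representation and the `composite` function are assumptions for illustration; the actual plugin works with real raster images and styling.

```python
def composite(images):
    """Overlay equally sized partial images; None means 'nothing drawn here'."""
    height = len(images[0])
    width = len(images[0][0])
    result = [[None] * width for _ in range(height)]
    for image in images:
        for r, row in enumerate(image):
            for c, pixel in enumerate(row):
                if pixel is not None:
                    result[r][c] = pixel  # later images paint over earlier ones
    return result
```

Because each data node only returns an image of fixed size, the amount of data sent back per request is bounded by the map dimensions rather than by the number of features scanned.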

Built On

Additional Core Features in GeoWave

Command Line Utilities & RESTful Web Services: Local/HDFS Ingest, Kafka Streaming Ingest, Stats Generation, OSM Utilities, Landsat8 Utilities
Base Analytics: KDE Heat Map, DBSCAN, K-Means Clustering
Open Geospatial Consortium standard services via GeoServer
Spatial-Temporal indexing
Integrated with other geospatial frameworks: PDAL, Mapnik
HBase Support
Raster datasets
Rapid Deployment via AWS EMR
GeoWave MapReduce input/output formats
Statistics: Attribute ranges/histograms, Enveloping bounding box over all geometries, Counts of # of stored items, Counts of discrete attribute values
Support for a variety of ingest types: Shapefile, GeoJSON, PostGIS, ArcGrid, GeoTIFF, GPX, T-Drive

GeoWave: Where can I find it? GitHub Source: https://github.com/ngageoint/geowave Examples: https://github.com/ngageoint/geowave/tree/master/examples Documentation: http://ngageoint.github.io/geowave This Demo: https://github.com/radiantbluetechnologies/geowave-demos/tree/master/nyc-taxi

Thanks! Rich Fecher rfecher@radiantblue.com Kent Miller kmiller@radiantblue.com

Backup Slides

OSM – Planet GPX: Every track ever uploaded to OpenStreetMap. Complete data attribution. 2.9 billion spatial entities (points). https://blog.openstreetmap.org/2013/04/12/bulk-gpx-track-data/

Global View Entire Pointset Visualized

Zoomed In

Kernel Density Estimate Global View

Kernel Density Estimate Zoomed In

Tiered Indexing. Tiers shown: Tier 0 (1x1), Tier 1 (2x2), Tier 2 (4x4), Tier 3 (8x8). Accompanying table for a Point and a Polygon with columns Tier, Duplicates, Cell(s).
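One way to read the tiers is that a geometry is stored at the finest tier where it does not need too many duplicate entries. The sketch below assumes coordinates normalized to the unit square, a tier `t` grid of 2^t by 2^t cells as on the slide, and a hypothetical `max_duplicates` threshold of 4; GeoWave's actual tier selection policy may differ.

```python
def cells_covered(bbox, tier):
    """Count grid cells a bounding box touches at a tier (grid is 2**tier per side).

    bbox is (min_x, min_y, max_x, max_y) with coordinates in [0, 1].
    """
    n = 1 << tier
    min_x, min_y, max_x, max_y = bbox

    def cell(v):
        return min(int(v * n), n - 1)  # clamp v == 1.0 into the last cell

    cols = cell(max_x) - cell(min_x) + 1
    rows = cell(max_y) - cell(min_y) + 1
    return cols * rows


def choose_tier(bbox, max_tier=3, max_duplicates=4):
    """Pick the finest tier whose cell count stays within the duplicate budget."""
    for tier in range(max_tier, -1, -1):
        if cells_covered(bbox, tier) <= max_duplicates:
            return tier
    return 0  # not reached: tier 0 always covers everything with one cell
```

Under these assumptions a point always lands in the finest tier, while a large polygon falls back to a coarser tier so that it is written only a handful of times.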

Space Filling Curve Granularity. 8x8 Grid: polygons overlap few cells; many points per cell. 64x64 Grid: polygons overlap many cells; fewer points per cell. Which is better? Why?

Good to Know: Space Filling Curve Range Decomposition. Intersects Bounding Box Query (71 -> 98). Range of cells from 92-99; range of cells from 70-75; range of cells from 116-121.
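Range decomposition can be sketched by brute force: compute the curve position of every cell inside the query box, sort the positions, and merge consecutive values into ranges. The Hilbert mapping below is the widely published iterative bit-manipulation formulation for an n x n grid with n a power of two; enumerating every cell is an assumption made for clarity, since a production index would decompose recursively instead. The cell numbering it produces will not necessarily match the numbering on the slide.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve over an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has a consistent orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def decompose(n, min_x, min_y, max_x, max_y):
    """Turn a cell-aligned bounding box into sorted (start, end) curve ranges."""
    values = sorted(
        hilbert_index(n, x, y)
        for x in range(min_x, max_x + 1)
        for y in range(min_y, max_y + 1)
    )
    ranges = []
    start = prev = values[0]
    for v in values[1:]:
        if v != prev + 1:  # gap on the curve: close the current range
            ranges.append((start, prev))
            start = v
        prev = v
    ranges.append((start, prev))
    return ranges
```

Each returned range becomes one scan against the sorted key-value store, so fewer and longer ranges generally mean less seek overhead for the same query box.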

Microsoft - GeoLife: Microsoft Research has made available a trajectory data set that contains the GPS coordinates of 182 users over a period of more than five years (April 2007 to August 2012). There are 17,621 trajectories in this data set. http://research.microsoft.com/jump/131675

GeoLife Original Track Data

Kernel Density Estimate Gaussian Kernel

Zoomed In Original Track Data

Kernel Density Estimate Zoomed In

Good to Know: GeoWave Key Structure. Key-value entry parts: Row ID; Column (Family, Qualifier, Visibility); Time-Stamp; Value. GeoWave fields shown: Index ID, Adapter ID, Data ID, Adapter ID Length, Data ID Length, # of Duplicates, Field ID, Field Value. Index components: Tier, Bin, Hilbert. Figure labels: GeoWave Feature Metadata; Feature Attribute Field/Value Pair; Tier Index; Hilbert SFC Index.
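As a rough illustration of why the index components lead the key, the sketch below packs a row ID from the fields named on the slide. The field order follows the order in which the slide lists them, and the byte widths (one byte for the tier, eight for the Hilbert value, four for each length and the duplicate count) are assumptions for this example only, not GeoWave's documented binary layout.

```python
import struct


def build_row_id(tier, bin_id, hilbert_value, adapter_id, data_id, duplicates):
    """Pack a hypothetical row ID; all widths and ordering are illustrative."""
    # Index portion first, so a byte-wise sort groups rows by tier, bin,
    # and then by position along the space filling curve.
    index_part = bytes([tier]) + bin_id + struct.pack(">Q", hilbert_value)
    # Lengths at the end let a reader split adapter ID from data ID.
    trailer = struct.pack(">iii", len(adapter_id), len(data_id), duplicates)
    return index_part + adapter_id + data_id + trailer
```

Big-endian packing is used so that the lexicographic byte order of the keys matches the numeric order of the curve values, which is what a sorted key-value store needs for range scans.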

Ingest Example: Ingest a Line. Initially 2 Points, 2 Polygons. Tier 31: Many Duplicates. Tier 3: 18 Duplicates. Tier 2: 8 Duplicates. Tier 1: 3 Duplicates. Tier 0: 1 Duplicate. Final State.

Query Example: Intersects Query (Magenta Bounding Box). Tier 1: 1 cell, 0 filtered, 0 intersect. Tier 2: 2 cells, 2 filtered, 0 intersect. Tier 3: 4 cells, 1 filtered, 0 intersect. Tier 4: 6 cells, 0 filtered, 1 intersect. Tier 31: 0 filtered, 1 intersect. Final Result: 1 Point, 1 Polygon Returned.