Supporting Web-based Visual Exploration of Large-Scale Raster Geospatial Data Using Binned Min-Max Quadtree Jianting Zhang 12, Simin You 2 City College.

Presentation transcript:

Supporting Web-based Visual Exploration of Large-Scale Raster Geospatial Data Using Binned Min-Max Quadtree
Jianting Zhang (1,2), Simin You (2)
(1) City College and (2) Graduate Center of The City University of New York

Outline
Motivation and Introduction
Background and Related Work
Binned Min-Max Quadtree
–Index Construction
–Query Processing
System Architecture
Experiments and Evaluation
Conclusion and Future Work

Motivation/Introduction
If you load your own data in Google Earth, wouldn't it be nicer if you could query your data and highlight the query results, in addition to simple display, zoom in/out and pan?

Global 30s Precipitation Data from WorldClim (Interpolated)
Coloring Schema: Green: 0 mm, Red: 100 mm, Linear Interpolation
Undergraduate Project: Generate Dynamic KML Files for Interactive Visualization in Google Earth (C. Dasrat/CCNY)
[Figure panels: Jan, July]
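The coloring schema on this slide can be sketched in a few lines. This is only an illustration, not the project's code: the function names are hypothetical, and clamping values above 100 mm to pure red is an assumption, since the slide only gives the two end colors. The `aabbggrr` byte order is the one KML uses for color strings.

```python
def precipitation_color(mm, lo=0.0, hi=100.0):
    """Linearly interpolate from green (at lo mm) to red (at hi mm)."""
    # Assumption: values outside [lo, hi] are clamped to the end colors.
    t = min(max((mm - lo) / (hi - lo), 0.0), 1.0)
    green, red = (0, 255, 0), (255, 0, 0)
    return tuple(round(g + t * (r - g)) for g, r in zip(green, red))


def kml_color(rgb, alpha=255):
    """Format an (R, G, B) tuple as a KML color string (aabbggrr hex order)."""
    r, g, b = rgb
    return f"{alpha:02x}{b:02x}{g:02x}{r:02x}"


print(precipitation_color(0))               # pure green
print(precipitation_color(100))             # pure red
print(kml_color(precipitation_color(100)))  # opaque red in KML byte order
```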

Motivation/Introduction
Task: Find/show regions where the precipitation amount in January is in the range [p1,p2).
Intuitive Solution
–Loop through all the raster cells and return all the matching cell locations.
–Problem: long evaluation time, and query results are difficult to visualize in Web browsers for practical reasons.
Our Solution:
–Backend: Index the raster data, perform the query in main memory and return a set of quadrants (SSDBM'10)
–Middleware: Dynamically generate tiled images on demand based on the user's current view and cache the tiled images as necessary (Com.Geo'10)
–Ongoing work: massively parallel indexing using GPGPU (20X speedup)

Background & Related Work
Spectral, spatial and temporal resolutions of raster geospatial data are getting increasingly finer → larger data volumes
–The next generation GOES-R satellite will provide global coverage at the km resolution every 5 minutes (16 bands)
–Numerous derived products from satellite images
–Large-scale model simulation results (e.g. WRF)

Background & Related Work
Manually examining all the data through visual display is no longer possible
–Human eyes can only effectively distinguish a limited number of colors at a time
–Studies show that screen resolution beyond 4000 by 4000 pixels is not effective
Querying the data and highlighting the results (Regions of Interest) for further analysis becomes preferable

Background & Related Work
Query Driven Visual Exploration of Scientific Data
–Wu et al 2003, Stockinger et al 2005, Rubel et al 2008
–Glatter et al 2006, Kendall et al 2009, Fuchs et al 2009
Indexing and Query Processing in Spatial Databases
–Overview: Gaede and Gunther 1998, Samet 2005
–Vector data: R-Tree, Quad-Tree
–Raster data: very limited (except tiling/pyramid)

Background & Related Work
Managing Multi-dimensional Array Data
–Array query definition language: Baumann et al 1997, Marathe and Salem 1999, Baumann 2009
–Physical data layout: Sarawagi and Stonebraker 1994, Otoo and Rotem 2006, Kim and Jaja 2007, Otoo et al 2007
Information Visualization/Visual Exploration
–Desktop Systems: Prefuse, GeoVista, GeoDa, IDV
–Web-based: Wood et al 2007, Dork et al 2008
–Main-memory based, no database backend support
–Scalability problem → integrating high-performance database engines with information visualization/visual exploration modules

Binned Min-Max Quadtree (BMMQ-Tree)
Designed to support ROI-finding queries
Given: a set of rasters representing environmental variables {F_i | 0<i<n} over a spatial domain D
An ROI-finding query Q identifies regions in D whose cells C_j satisfy a compound condition over the variables
–op can be either conjunctive or disjunctive, 0<k<n
–each variable i has a lower and an upper bound in query Q
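As a reference point for what such a query means, here is a brute-force sketch that scans every cell. It is not the indexed method of the talk. The half-open interval test `lo <= value < hi` follows the [p1,p2) notation used earlier in the deck, and the function name, the dictionary layout and the sample values are all hypothetical.

```python
def roi_cells(rasters, bounds, op="and"):
    """Return (row, col) cells satisfying the compound range condition.

    rasters: dict mapping a variable name to a 2D list of values
    bounds:  dict mapping a variable name to its (lower, upper) bounds
    op:      "and" for a conjunctive query, "or" for a disjunctive one
    """
    combine = all if op == "and" else any
    names = list(bounds)
    first = rasters[names[0]]
    rows, cols = len(first), len(first[0])
    return [
        (r, c)
        for r in range(rows)
        for c in range(cols)
        # Assumed per-variable test: lower <= F_i(cell) < upper.
        if combine(bounds[n][0] <= rasters[n][r][c] < bounds[n][1] for n in names)
    ]


rasters = {
    "precip": [[10, 95], [120, 300]],
    "temp": [[5, 25], [15, 30]],
}
bounds = {"precip": (90, 300), "temp": (10, 20)}
print(roi_cells(rasters, bounds, op="and"))
print(roi_cells(rasters, bounds, op="or"))
```

Every cell of every queried raster is visited, which is the cost the tree-based index is meant to avoid.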

Binned Min-Max Quadtree
Why Tree-based indexing?
–A ROI query is a global operation on rasters
–Without indices, scanning whole rasters is required
 Disk IOs are the most expensive operations along the storage hierarchy
 Performance is limited by disk IOs
–With tree-based indexing
 Quickly prune irrelevant branches, which reduces disk IOs
 Access disk files only when necessary
 Answer a large portion of queries directly without incurring disk IOs
 Indices with a small memory footprint can be main-memory resident

Binned Min-Max Quadtree
Why Binned Min-Max Quadtree?
–Associate min/max values with each quadtree node to help ROI-based queries (popular in 3D graphics for generating iso-surfaces and tracing rays)
–First law of geography: "Everything is related to everything else, but near things are more related than distant things" (Tobler 1970)
–However, neighboring cell values are often slightly different
–Binning improves quadrant uniformity and reduces quadtree complexity
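The effect of binning on quadrant uniformity can be illustrated with a small, made-up raster. The sketch below counts how many 2*2 blocks become uniform once values are binned. Two things are assumptions here: the sample values, and the reading of "bin size" as a bin width applied by integer division (the slides do not say whether it is a width or a bin count).

```python
def uniform_quadrants(raster, bin_size):
    """Count 2x2 blocks whose four binned values are identical."""
    n = len(raster)
    count = 0
    for y in range(0, n, 2):
        for x in range(0, n, 2):
            # Assumption: a value's bin is value // bin_size.
            bins = {raster[y + dy][x + dx] // bin_size
                    for dy in (0, 1) for dx in (0, 1)}
            count += len(bins) == 1
    return count


# Hypothetical 4x4 raster: neighbors differ slightly, as the slide notes.
RASTER = [
    [10, 11, 12, 13],
    [11, 12, 13, 14],
    [40, 41, 10, 12],
    [42, 43, 11, 13],
]
for size in (1, 4, 8):
    print(size, uniform_quadrants(RASTER, size))
```

A uniform block can be stored as a single quadtree node, so more uniform blocks mean a smaller tree.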

Index Construction

Query Processing – Arbitrary Spatial Window

Query Processing – Tile Based (Parallelization possibility)
Tile size N*N, k = log2 N
[Figure example: value range [1,3] under tile (0,1,1)]
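The relation k = log2 N can be made concrete with a little integer arithmetic. The helper names below are hypothetical. The idea that one full-resolution tile corresponds to the quadtree node k levels above the cells is an assumption about how tiles map onto the tree, not something the slide states.

```python
def quadtree_depth(width, height):
    """Smallest L such that a 2^L x 2^L grid covers the raster."""
    return (max(width, height) - 1).bit_length()


def tile_exponent(tile_size):
    """k = log2(N) for an N x N tile; N must be a power of two."""
    if tile_size <= 0 or tile_size & (tile_size - 1):
        raise ValueError("tile size must be a power of two")
    return tile_size.bit_length() - 1


def tile_root_level(depth, tile_size):
    """Assumed level of the node covering one full-resolution tile."""
    return depth - tile_exponent(tile_size)


depth = quadtree_depth(43200, 21600)
print(depth, tile_exponent(256), tile_root_level(depth, 256))
```

With the raster size and tile size reported in the experiments (43200*21600 cells, 256*256 tiles), this gives a depth of 16 and k=8, matching the figures quoted there.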

Binned Min-Max Quadtree
BMMQ-Tree integrates features of Binned Bitmap Indexing and Min-Max kd-trees and octrees
A BMMQ-Tree query result is a set of quadrants that can be expressed as (X,Y,L) tuples, which is suitable for data communication between clients and servers
A BMMQ-Tree query can terminate when the spatial extent that a quadtree node represents is less than a screen pixel (Less-than-Single-Pixel stopping policy)
May result in false positives, which is NOT necessarily bad for visual exploration
–Identifying Regions of Interest is the primary goal
–Details on demand for further examination
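The following is a minimal in-memory sketch of how a binned min-max quadtree could be built and queried so that the result is a list of (X,Y,L) tuples. It is an illustration under stated assumptions, not the authors' implementation: the raster is square with a power-of-two side, bins are computed as `value // bin_size`, the query is expressed in bin numbers, and `max_level` stands in for the Less-than-Single-Pixel stopping policy. All names are hypothetical.

```python
def build(raster, bin_size, qx=0, qy=0, level=0):
    """Return a node (qx, qy, level, min_bin, max_bin, children)."""
    size = len(raster) >> level          # side length of this quadrant in cells
    if size == 1:
        b = raster[qy][qx] // bin_size   # assumed binning rule
        return (qx, qy, level, b, b, [])
    kids = [build(raster, bin_size, 2 * qx + dx, 2 * qy + dy, level + 1)
            for dy in (0, 1) for dx in (0, 1)]
    lo = min(k[3] for k in kids)
    hi = max(k[4] for k in kids)
    # A uniform quadrant (single bin) is collapsed into one leaf.
    return (qx, qy, level, lo, hi, kids if lo != hi else [])


def query(node, lo_bin, hi_bin, max_level, out=None):
    """Collect (X, Y, L) quadrants whose bins fall in [lo_bin, hi_bin]."""
    out = [] if out is None else out
    qx, qy, level, lo, hi, kids = node
    if hi < lo_bin or lo > hi_bin:
        return out                       # prune: no cell below can match
    if lo_bin <= lo and hi <= hi_bin:
        out.append((qx, qy, level))      # every cell below matches
    elif level >= max_level:
        out.append((qx, qy, level))      # early stop: may be a false positive
    else:
        for kid in kids:
            query(kid, lo_bin, hi_bin, max_level, out)
    return out


RASTER = [
    [10, 11, 12, 13],
    [11, 12, 13, 14],
    [40, 41, 10, 12],
    [42, 43, 11, 13],
]
tree = build(RASTER, bin_size=8)
print(query(tree, 5, 5, max_level=2))   # exact answer at full depth
print(query(tree, 5, 5, max_level=0))   # stops at the root: coarse answer
```

Stopping at a shallower level returns fewer, larger quadrants that may cover non-matching cells, which is the false-positive trade-off the slide describes.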

Prototype System
Original design
–Rendering quadrants as vector objects using Flex RIA APIs at the client side
–Powerful and flexible: control rendering at the pixel level in Web browsers
–Performance is poor when the number of quadrants exceeds a few thousand
–We consider the results as "lessons" rather than "achievements"
Current design (COM.GEO'10)
–Support tile-based queries
–Render resulting quadrants as binary images in the middleware
–Client is responsible for formulating tiles, submitting queries and visualizing query results
–Significantly better performance

Prototype System Architecture

Online demo:

Experiments and Evaluation
Data: WorldClim January Precipitation Data at 30s resolution (43200*21600)
–Value range [0,1003]
–Quadtree level=16
Query processing server: Dell T5400
Ad-hoc queries (arbitrary parameters)
–Three bin sizes: 8, 16, 32
–Query value range [90,300)
–Eight spatial query windows of sizes around 65 degrees (lon) by 55 degrees (lat)
Tile-based queries (more systematic)
–Bin size=32
–Tile size: 256*256 (k=8)
–For query value range [0,1003]: 6848 tiles
–For query value range [90,300): 1197 tiles

Results of Ad-hoc Queries
Less-Than-Single-Pixel stopping policy NOT applied (Max Level=16, results in milliseconds)
[Table: one row per query, columns B=8, B=16 and B=32; the values were not preserved in the transcript]

Results of End-to-End Performance using OLD Design
Less-Than-Single-Pixel stopping rule applied
Max Level=12 for query window sizes 65*55 degrees
Bin size=32

Results of End-to-End Performance using New Design
Estimating End-to-End time
–Assume available network bandwidth=300k Bps → TT=10ms
–Assume client display area 1024*1024 → 16 tiles (Parallelizable)
–Assume no server/client side caching (cold start)
–Assume rendering times for small images in Web browsers are negligible
Estimated time: (QT+GT+TT)*16 = 1120 ms
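The estimate can be reproduced with a short calculation, reading QT, GT and TT as query, tile generation and transfer times (an interpretation, since the transcript does not expand them). Only the total of 1120 ms, the 16 tiles and TT=10 ms come from the slide. The 60 ms of combined server time per tile is back-calculated as 1120/16 - 10, the 3000-byte tile size is back-calculated from 10 ms at 300,000 bytes per second, and the `workers` parameter is a hypothetical, idealized model of parallel tile requests.

```python
import math


def tiles_for_view(width_px, height_px, tile_px=256):
    """Number of tiles needed to cover the client display area."""
    return math.ceil(width_px / tile_px) * math.ceil(height_px / tile_px)


def transfer_ms(tile_bytes, bandwidth_bytes_per_s):
    """Time to ship one tile image over the network, in milliseconds."""
    return 1000.0 * tile_bytes / bandwidth_bytes_per_s


def end_to_end_ms(server_ms_per_tile, tt_ms, tiles, workers=1):
    """Sequential estimate; workers > 1 assumes perfectly parallel rounds."""
    rounds = math.ceil(tiles / workers)
    return (server_ms_per_tile + tt_ms) * rounds


tiles = tiles_for_view(1024, 1024)
tt = transfer_ms(3000, 300_000)            # back-calculated tile size
print(tiles, tt)
print(end_to_end_ms(60, tt, tiles))        # 60 ms = QT+GT, back-calculated
print(end_to_end_ms(60, tt, tiles, workers=4))
```

Under these assumptions, four perfectly parallel workers would cut the estimate to a quarter, which is the direction the conclusions point to when they mention parallel tile-based processing.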

Conclusions
The proposed BMMQ-Tree data structure can be used to efficiently process ROI-finding queries on large-scale raster geospatial data. Queries can be processed in fractions of a second for large query windows.
Tile-based querying combined with dynamic tile image generation (middleware) and rendering (client) is more suitable for visualizing complex query results than client-side rendering.
New experimental results have shown that we are able to achieve sub-second end-to-end performance for a 1024*1024 pixel display area using 16 tiles. The performance can be further improved by parallel tile-based processing.

Additional Information
GPU-based indexing
–Nvidia Quadro FX3700 GPU card with 112 cores and 512M device memory
–Raster size is limited to 4096*4096 due to device memory constraints → 11*5 blocks
–20X speedup (8.7s vs. 0.4s)
–We expect to index the same global data on an SGI Octane III 2-node mini-cluster with 4 GPU cards in about 1-5 seconds after fine-tuning our current codebase → real time indexing

Relationship with the Big Picture: Visual Explorations of Global Biodiversity Patterns
[Diagram labels]
Environment: Area, Water-Energy, Latitude, Altitude, Productivity, Environmental Gradient
Species
Taxonomic (Linnaean ranks): Kingdom, Phylum, Class, Order, Family, Genus, Species, SubSpecies
Community – Ecosystem – Biome – Biosphere
ACMGIS'08, GeoInfo'09, ACMGIS'09
Com.Geo'10, SSDBM'10: Scale-up online query processing through offline indexing
GPGPU-based Indexing: Scale-up offline indexing through parallelization