On-the-fly Visualization of Scientific Geospatial Data Using Wavelets (GeoDA). Cyrus Shahabi, Farnoush Banaei-Kashani, Kai Song. Introduce self (postdoc, USC), the paper (title and GeoDA), and the coauthors; presenting on behalf of Kai, so bear with me.
Outline: Motivation and Problem Definition; Our Solution: GeoDA (Summary; Underlying Technology: Background: Discrete Wavelet Transform, WOLAP; Prototype System Development); Summary; Future Work
USC-JPL SURP Project: Earth Science Data Visualization. A USC-JPL collaboration under a SURP project. Cyrus Shahabi and Farnoush Banaei-Kashani, Information Laboratory (InfoLab), University of Southern California (USC), Los Angeles, CA 90089, [shahabi,banaeika]@usc.edu, http://infolab.usc.edu. Yi Chao and Peggy Li, Climate, Oceans, and Solid Earth Science Section, Jet Propulsion Laboratory (JPL), Pasadena, CA 91109, [yi.chao,peggy.li]@jpl.nasa.gov, http://science.jpl.nasa.gov/COSE/
Earth Science Data Visualization. [Figure panels: Range Selection Without Re-scaling; Range Selection With Re-scaling.] Example: sea surface temperature (SST) as the data (dimensions plus a measure attribute) and average as the query; this generalizes to other data and aggregates. Color-coded map: depending on the resolution, each pixel is an aggregate query over the data in the corresponding area. Interactive: range selection.
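As a minimal sketch of the color-coded map, assuming a 2-D grid of SST values with None for null cells, each pixel below is the average (the example aggregate) over its block of cells. The function name `render_averages` is hypothetical, and this naive version scans every cell of every block, which is the per-query cost that a fast range-aggregate technique is meant to avoid.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def render_averages(grid: Grid, pixel_rows: int, pixel_cols: int) -> Grid:
    """Map a 2-D grid to a pixel_rows x pixel_cols image of block averages.

    Each pixel answers one range-aggregate (AVG) query over its block.
    Assumes the grid dimensions are divisible by the pixel dimensions;
    None cells (nulls) are skipped, and an all-null block yields None.
    """
    rows, cols = len(grid), len(grid[0])
    bh, bw = rows // pixel_rows, cols // pixel_cols  # block height / width
    image: Grid = []
    for pr in range(pixel_rows):
        row: List[Optional[float]] = []
        for pc in range(pixel_cols):
            # Naive evaluation: visit every cell in the block.
            vals = [
                grid[r][c]
                for r in range(pr * bh, (pr + 1) * bh)
                for c in range(pc * bw, (pc + 1) * bw)
                if grid[r][c] is not None
            ]
            row.append(sum(vals) / len(vals) if vals else None)
        image.append(row)
    return image


sst = [
    [10, 20, 30, 40],
    [30, 40, None, 50],
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
print(render_averages(sst, 2, 2))  # [[25.0, 40.0], [3.5, 5.5]]
```

Re-scaling then amounts to calling the same function with a different number of pixels, i.e., a different block size per query.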
Earth Science Data Visualization. [Figure panels: Range Selection; Range Re-scaling.] Aggregate queries over latitude, longitude, and/or time.
Off-line vs. On-the-fly Visualization. Off-line visualization: pre-selected range (and resolution); visualization by query pre-computation. On-the-fly visualization: range (and resolution) selected on the fly; visualization by on-the-fly query computation, which also supports dynamic data.
Our Solution: GeoDA. Give an overview of the solution that enables on-the-fly visualization here.
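Discrete Wavelet Transform. Example with a one-dimensional dataset a = (80, 70, 60, 90, 37, 67, 50, 50), using the filters {1/2, 1/2} (averaging) and {1/2, -1/2} (differencing)*. Averaging adjacent pairs gives (75, 75, 52, 50), then (75, 51), then 63; differencing gives the detail coefficients (5, -15, -15, 0), then (0, 1), then 12. Hence â = DWT(a) = Wa = (63, 12, 0, 1, 5, -15, -15, 0). The transform generates data with interesting properties, which we describe by example: how is â generated, and what are its properties? The coefficients summarize the data and can be ordered by their energy, which also enables compression. Multi-resolution view and compression: dropping the small coefficient 1 and inverting the transform yields an approximation a′ of a (e.g., the pair 37, 67 is reconstructed as 36, 66). * For simplicity, we use {1/2, 1/2} and {1/2, -1/2} as filters instead of the orthonormal Haar filters {1/√2, 1/√2} and {1/√2, -1/√2}.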
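The averaging-and-differencing steps above can be sketched in Python, assuming the slide's simplified filters {1/2, 1/2} and {1/2, -1/2} and an input length that is a power of two. The function names are hypothetical; the sketch reproduces â for the example and reconstructs a′ after dropping the coefficient 1.

```python
def haar_avg_diff(a):
    """Simplified Haar DWT with filters {1/2, 1/2} and {1/2, -1/2}.

    Returns [overall average, coarsest detail, ..., finest details].
    Assumes len(a) is a power of two.
    """
    level = list(a)
    details = []
    while len(level) > 1:
        pairs = range(0, len(level), 2)
        avgs = [(level[i] + level[i + 1]) / 2 for i in pairs]
        diffs = [(level[i] - level[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser details go in front of finer ones
        level = avgs
    return level + details


def inverse_haar_avg_diff(c):
    """Invert haar_avg_diff: each (avg, diff) expands to (avg + diff, avg - diff)."""
    level = c[:1]
    pos = 1
    while pos < len(c):
        diffs = c[pos:pos + len(level)]
        level = [v for avg, d in zip(level, diffs) for v in (avg + d, avg - d)]
        pos += len(diffs)
    return level


a = [80, 70, 60, 90, 37, 67, 50, 50]
a_hat = haar_avg_diff(a)
print(a_hat)  # [63.0, 12.0, 0.0, 1.0, 5.0, -15.0, -15.0, 0.0]

# Compression: zero out the small coefficient 1, then reconstruct a'.
approx = list(a_hat)
approx[3] = 0.0
print(inverse_haar_avg_diff(approx))  # [80.0, 70.0, 60.0, 90.0, 36.0, 66.0, 51.0, 51.0]
```

Keeping all coefficients reconstructs a exactly; each dropped coefficient trades accuracy for space.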
Wavelets in Databases. Others' work¹: data compression. Reason: save space? Implicit reason: queries deal with smaller datasets and are hence faster. Problems: only approximate results; highly data-dependent; different error rates for different queries. Our work (WOLAP)²: query compression. Reason: fast response time. Define a range-sum query as the dot product of a query vector and the data vector; at query time, we know what is important to the pending query. More opportunities: progressive results; data-independent approximation. ¹ See Vitter-CIKM'98, Vitter-SIGMOD'99, Agrawal-CIKM'00, Garofalakis-VLDB'00. ² See Schmidt-PODS'02, Schmidt-EDBT'02, Jahangiri-SIGMOD'05.
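To make the dot-product formulation concrete, here is a minimal sketch on the 8-value example from the DWT slide. `range_query_vector` and `dot` are hypothetical helpers; AVG is obtained as the ratio of a SUM query and a COUNT query, both of which are dot products.

```python
from typing import List, Sequence


def range_query_vector(n: int, lo: int, hi: int) -> List[int]:
    """Indicator vector: 1 inside [lo, hi] (0-based, inclusive), 0 elsewhere."""
    return [1 if lo <= i <= hi else 0 for i in range(n)]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


a = [80, 70, 60, 90, 37, 67, 50, 50]
q = range_query_vector(len(a), 2, 6)  # selects 60, 90, 37, 67, 50

range_sum = dot(q, a)                 # SUM as a dot product
range_count = dot(q, [1] * len(a))    # COUNT uses an all-ones measure
print(range_sum, range_count, range_sum / range_count)  # 304 5 60.8
```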
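WOLAP Example. Data a = (80, 70, 60, 90, 37, 67, 50, 50) and its wavelet transform* â = (178.19, 33.94, 0, 2, 7.07, -21.21, -21.21, 0). Query 1, SUM over all of a: in the original domain, Result = 504; in the wavelet domain, the query vector becomes (2.83, 0, 0, 0, 0, 0, 0, 0), so Result = 178.19*2.83 = 504 (Parseval's theorem). Query 2, SUM over 60, 90, 37, 67, 50: in the original domain, Result = 304; in the wavelet domain, the query vector becomes (1.77, -0.35, -1, 0.5, 0, 0, 0, 0.71), so Result = 178.19*1.77 + 33.94*(-0.35) + 2*0.5 = 304 (Parseval's theorem; coefficients shown rounded). Using only the first two terms gives ~303 (99% accuracy!). Cost: O(log N) in the wavelet domain vs. O(N) in the original domain. * Here we assume the actual (orthonormal) Haar filters {1/√2, 1/√2} and {1/√2, -1/√2}.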
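The example can be reproduced with a short sketch of the orthonormal Haar transform (an assumed, straightforward O(n) implementation, not WOLAP's own code): transforming both the data and the query vector preserves their dot product, and truncating to the two coarsest terms gives the ~303 approximation.

```python
import math

SQRT2 = math.sqrt(2)


def haar_orthonormal(x):
    """Orthonormal Haar DWT, filters {1/sqrt2, 1/sqrt2} and {1/sqrt2, -1/sqrt2}.

    Output order: [scaling coeff, coarsest detail, ..., finest details].
    Assumes len(x) is a power of two.
    """
    level = [float(v) for v in x]
    details = []
    while len(level) > 1:
        pairs = range(0, len(level), 2)
        smooth = [(level[i] + level[i + 1]) / SQRT2 for i in pairs]
        detail = [(level[i] - level[i + 1]) / SQRT2 for i in pairs]
        details = detail + details  # coarser levels go first
        level = smooth
    return level + details


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


a = [80, 70, 60, 90, 37, 67, 50, 50]
a_hat = haar_orthonormal(a)  # ~[178.19, 33.94, 0, 2, 7.07, -21.21, -21.21, 0]

# Query 1: SUM over the whole range.
q1_hat = haar_orthonormal([1] * 8)  # ~[2.83, 0, 0, 0, 0, 0, 0, 0]
print(round(dot(q1_hat, a_hat), 6))  # 504.0

# Query 2: SUM over positions 2..6 (0-based).
q2 = [0, 0, 1, 1, 1, 1, 1, 0]
q2_hat = haar_orthonormal(q2)  # ~[1.77, -0.35, -1, 0.5, 0, 0, 0, 0.71]
exact = dot(q2_hat, a_hat)           # Parseval: equals dot(q2, a) = 304
approx = dot(q2_hat[:2], a_hat[:2])  # two coarsest terms only
print(round(exact, 6), round(approx, 6))  # 304.0 303.0
```

Only the non-zero query coefficients contribute, which is why the query's sparsity, not the data size, drives the cost.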
WOLAP Query Complexity: O(log N). [Figure: transforming a range query vector into the wavelet domain.] To summarize WOLAP, assuming that the query is of size N: Theorem 1: Using a "lazy wavelet transform" (computing only at the boundaries of the selected range), any polynomial range-aggregate query can be transformed to the wavelet domain in O(log N). Theorem 2: The transformed query has O(log N) non-zero coefficients in the wavelet domain.
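For the simplest polynomial range-aggregate (a plain SUM, i.e., degree 0) in one dimension, the boundary-only idea can be checked directly. This is a sketch under those assumptions, not WOLAP's general lazy transform: the hypothetical `sparse_range_query` computes only the coefficients whose support straddles a range boundary (at most two per level), and a dense O(n) transform serves as a reference.

```python
import math


def haar_orthonormal(x):
    """Dense orthonormal Haar DWT (O(n)), used here only as a reference."""
    level = [float(v) for v in x]
    details = []
    while len(level) > 1:
        pairs = range(0, len(level), 2)
        smooth = [(level[i] + level[i + 1]) / math.sqrt(2) for i in pairs]
        detail = [(level[i] - level[i + 1]) / math.sqrt(2) for i in pairs]
        details = detail + details
        level = smooth
    return level + details


def sparse_range_query(n, lo, hi):
    """Non-zero Haar coefficients of the SUM query over [lo, hi] (0-based, inclusive).

    A detail coefficient can be non-zero only if its support straddles a range
    boundary, so at most two supports per level are visited: O(log n) work.
    Returns {index in haar_orthonormal's output order: value}.
    Assumes n is a power of two.
    """
    def overlap(start, end):  # length of [start, end) intersected with [lo, hi + 1)
        return max(0, min(end, hi + 1) - max(start, lo))

    coeffs = {0: (hi - lo + 1) / math.sqrt(n)}  # scaling coefficient
    size = n
    while size > 1:
        half = size // 2
        first_index = n // size  # coefficients with this support size start here
        for boundary in (lo, hi):
            k = boundary // size  # support containing the boundary
            start = k * size
            diff = overlap(start, start + half) - overlap(start + half, start + size)
            if diff != 0:
                coeffs[first_index + k] = diff / math.sqrt(size)
        size = half
    return coeffs


n = 1024
sparse = sparse_range_query(n, 100, 900)
print(len(sparse), "non-zero coefficients out of", n)
```

With at most two coefficients per level plus the scaling coefficient, the query never has more than 2·log2(n) + 1 non-zero terms.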
Related Work. [Table: query and update costs of Agrawal-SIGMOD'97, Abbadi-ICDE'99, and Abbadi-DaWaK'00, as reported in the literature; N = domain size of each dimension, d = number of dimensions.] Support not only for queries but also for updates, and more, e.g., progressiveness.
GeoDA Architecture. Presentation tier: Google Maps mashup. Query tier: WOLAP query engine (ProDA) and plotting tools. Data tier: wavelet datacubes, text files, and NC (netCDF) files.
Helene Data. Helene dataset: 10+ dimensions (longitude and latitude selected); 100+ variables (SST selected); 1 km by 1 km resolution, daily samples, worldwide; 36,000 × 18,000 data points per sample (~1/3 of which are null). Helene datacube: dimensions latitude and longitude; variable SST.
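A quick back-of-the-envelope check of the datacube size, using only the numbers on the slide (the ~1/3 null fraction is approximate). The level counts assume each dimension is padded to the next power of two for the Haar transform, which is our assumption rather than a stated detail of GeoDA.

```python
import math

LON_CELLS = 36_000  # longitude samples per daily snapshot (from the slide)
LAT_CELLS = 18_000  # latitude samples per daily snapshot (from the slide)

cells = LON_CELLS * LAT_CELLS
non_null = cells - cells // 3  # "~1/3 of which are null" (approximate)

# Haar decomposition levels per dimension, i.e. the log N factor, assuming
# padding to the next power of two (an assumption, not stated on the slide).
lon_levels = math.ceil(math.log2(LON_CELLS))
lat_levels = math.ceil(math.log2(LAT_CELLS))

print(f"{cells:,} cells per sample, ~{non_null:,} non-null")
print(f"levels: longitude={lon_levels}, latitude={lat_levels}")
```

Per dimension, an O(log N) cost is on the order of 16 levels, compared with an O(N) scan over 36,000 cells.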
Presentation Tier Implementation. Progressive visualization; cross-language development (JavaScript, C#, ASP.NET AJAX); multi-threaded programming. Demo of GeoDA: interface, flexible ranges, re-scaling, progressive visualization.
Summary We devised a framework for on-the-fly visualization of large-scale scientific datasets. We designed and exploited a fast range-aggregate query processing technique, WOLAP, that enables on-the-fly visualization. WOLAP supports the family of polynomial range-aggregate queries. We developed a prototype system, GeoDA, as a proof-of-concept based on the designed visualization framework and query processing technique.
Future Work. Supporting dynamic datasets by extending WOLAP to handle appends from data streams in the wavelet domain. Enhancing WOLAP with caching to support group/batch aggregate queries.
Q & A