1
Techniques for Visualizing Massive Data Sets
Leilani Battle, Mike Stonebraker
2
Context
[Diagram: Visualization System, Database; arrows labeled query and result]
Have a database with lots of data
Want a visual overview (i.e., statistical plots such as scatterplots, heatmaps, etc.)
Want visualizations to be interactive (i.e., pan and zoom)
3
Problem
Performance: vis systems don’t scale well for big data, or are turning into databases
Over-plotting: makes visualizations unreadable; waste of time/resources
4
Solution: Resolution Reduction
[Diagram: Visualization System, Resolution Reduction Layer, Database; arrows labeled query, modified query, queryplan query, reduced result, queryplan result]
When a query will return too much data, reduce it
Aggregate, sample, filter, etc.
“Too big” means: it will slow down the vis system, or it will cause over-plotting
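A minimal sketch of the reduction step, assuming the result is already in memory as a list of (x, y) points. The function name, the `max_points` threshold, and the two methods shown are illustrative assumptions, not ScalaR's code; per the slide, the real layer decides from the query plan and sends a modified query to the database.

```python
import random


def reduce_resolution(points, max_points, method="sample", seed=0):
    """Return points unchanged if small enough, otherwise a reduced copy.

    Hypothetical helper: `max_points` stands in for the "too big" threshold.
    """
    if len(points) <= max_points:
        return points

    if method == "sample":
        # Uniform random sample without replacement.
        return random.Random(seed).sample(points, max_points)

    if method == "aggregate":
        # Average consecutive groups so at most max_points values remain.
        group = -(-len(points) // max_points)  # ceiling division
        reduced = []
        for start in range(0, len(points), group):
            chunk = points[start:start + group]
            mean_x = sum(x for x, _ in chunk) / len(chunk)
            mean_y = sum(y for _, y in chunk) / len(chunk)
            reduced.append((mean_x, mean_y))
        return reduced

    raise ValueError(f"unknown method: {method}")
```

Sampling keeps real data points; aggregation smooths them. Which is preferable depends on the plot type.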
5
ScalaR Scalable vis system for data exploration
Web front-end
Uses SciDB
Visualizes query results
Performs Resolution Reduction
Speaker note: advertise SciDB (open source array-oriented database system, great for scientific applications and scalable machine learning); give the URL scidb.org
6
Demo of ScalaR
7
Array Browser
Collaboration with:
Brown: Justin DeBrabant, Stan Zdonik, Ugur Cetintemel
Stanford: Zhicheng Liu, Jeff Heer
Google Maps-style exploration experience
Fetches subsets of the data (aka data tiles)
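A small sketch of how a viewport might map to data tiles, assuming a 2-D array cut into square tiles of a fixed size and keyed by (column, row). The function name and the tiling scheme are assumptions for illustration; the slides do not specify how Array Browser lays out its tiles.

```python
def tiles_for_viewport(x0, y0, x1, y1, tile_size):
    """List (col, row) keys of tiles overlapping the viewport.

    The viewport is the half-open cell range [x0, x1) x [y0, y1),
    with non-negative integer coordinates.
    """
    cols = range(x0 // tile_size, (x1 - 1) // tile_size + 1)
    rows = range(y0 // tile_size, (y1 - 1) // tile_size + 1)
    return [(col, row) for row in rows for col in cols]
```

Panning changes the viewport by a small offset, so most of the returned keys repeat from one request to the next and only the new ones need to be fetched.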
8
Array Browser Example
9
Array Browser Architecture
10
Demo of Array Browser
11
Future Work: Prefetching
Goal: Reduce user wait time by prefetching tiles
Cache tiles in the tile buffer
Need algorithms to decide what to prefetch
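A sketch of a tile buffer with prefetching, assuming plain LRU eviction and a caller-supplied `fetch` function. The class and method names are hypothetical, and LRU is only one candidate policy; the slides leave the eviction policy open.

```python
from collections import OrderedDict


class TileBuffer:
    """Fixed-capacity tile cache with least-recently-used eviction."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch          # callable: tile key -> tile data
        self.tiles = OrderedDict()  # oldest entry first
        self.misses = 0             # user requests that had to wait

    def _load(self, key):
        """Return (tile, was_cached), fetching and evicting as needed."""
        if key in self.tiles:
            self.tiles.move_to_end(key)  # mark as recently used
            return self.tiles[key], True
        tile = self.fetch(key)
        self.tiles[key] = tile
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)  # drop least recently used
        return tile, False

    def get(self, key):
        """Serve a user request; count a miss if the tile was not cached."""
        tile, was_cached = self._load(key)
        if not was_cached:
            self.misses += 1
        return tile

    def prefetch(self, keys):
        """Load predicted tiles ahead of time without counting misses."""
        for key in keys:
            self._load(key)
```

If a predictor guesses the next tile correctly, the later `get` call is served from the buffer and the user does not wait.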
12
User Behavior Predictor (Seer)
Learn common query sequences from user traces
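One simple way to learn common sequences from traces is a first-order Markov model over user moves. This is an assumed stand-in for illustration, not a description of how Seer works; the class name and the move labels are hypothetical.

```python
from collections import Counter, defaultdict


class MoveSequencePredictor:
    """Predict the next move from how often it followed the last one."""

    def __init__(self):
        # counts[prev][nxt] = number of times nxt followed prev in traces
        self.counts = defaultdict(Counter)

    def train(self, traces):
        for trace in traces:
            for prev, nxt in zip(trace, trace[1:]):
                self.counts[prev][nxt] += 1

    def predict(self, last_move):
        """Return the most frequent follower of last_move, or None."""
        followers = self.counts.get(last_move)
        if not followers:
            return None
        return followers.most_common(1)[0][0]
```

The predicted move can then be translated into the tile keys it would bring into view, which are the ones to prefetch.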
13
Statistical Analysis Predictor
Look for statistical similarities in tiles
Try to guess what’s important based on patterns
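A sketch of one possible similarity measure, assuming each tile is a non-empty list of numeric values and that "similar" means a similar value distribution. The histogram comparison, the function names, and the default value range are all assumptions; the slides do not say which statistics the predictor uses.

```python
def histogram(values, bins, lo, hi):
    """Normalized histogram of values over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for value in values:
        index = int((value - lo) / width)
        index = max(0, min(index, bins - 1))  # clamp out-of-range values
        counts[index] += 1
    return [count / len(values) for count in counts]


def distance(hist_a, hist_b):
    """L1 distance between two histograms (0 means identical)."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def rank_candidates(current, candidates, bins=4, lo=0.0, hi=1.0):
    """Order candidate tile keys from most to least similar to current."""
    reference = histogram(current, bins, lo, hi)
    return sorted(
        candidates,
        key=lambda key: distance(reference, histogram(candidates[key], bins, lo, hi)),
    )
```

The top-ranked candidates would be the first ones handed to the prefetcher.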
14
Using Multiple Predictors
Run multiple predictors (or experts) in parallel
Compare predictions to user’s actual behavior
Use predictions from best performing expert
May change over time based on user’s goals
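A sketch of choosing among experts, assuming each expert is a function from the move history to a predicted next move and that "best performing" means the highest hit rate over a recent window. The class name and the window size are hypothetical; other scoring schemes would fit the slide equally well.

```python
from collections import deque


class ExpertSelector:
    """Track each expert's recent accuracy and follow the current best."""

    def __init__(self, experts, window=20):
        self.experts = experts  # name -> callable(history) -> predicted move
        # Sliding window of hits, so the best expert can change over time.
        self.hits = {name: deque(maxlen=window) for name in experts}

    def _score(self, name):
        hits = self.hits[name]
        return sum(hits) / len(hits) if hits else 0.0

    def best(self):
        """Name of the expert with the highest recent hit rate."""
        return max(self.experts, key=self._score)

    def predict(self, history):
        return self.experts[self.best()](history)

    def observe(self, history, actual):
        """Score every expert against what the user actually did."""
        for name, expert in self.experts.items():
            self.hits[name].append(expert(history) == actual)
```

Because only the last `window` observations count, an expert that suited the user's earlier goal loses its lead once the user's behavior shifts.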
15
Other Challenges
Lots of interesting problems left to address
Best eviction policy for the tile buffer?
How to share data between multiple users?
More predictors?
Speaker note: explain these bullets. LRU and weighted LRU: lots of work done on these, but not clear how LRU works with multiple predictors
16
Questions?
18
[Figure labels: Gemini, Sagittarius, Dogs, Cats]
20
Prefetching Experts
User behavior predictor (Seer)
Learn common query sequences from user traces
Stats analysis predictor
Look for statistical similarities in tiles
Try to guess what’s important based on patterns
Speaker note (2 min): describe movement patterns through the data set, and explain how Seer is great for this