Techniques for Visualizing Massive Data Sets


Techniques for Visualizing Massive Data Sets Leilani Battle, Mike Stonebraker

Context
Diagram labels: Visualization System, query, result, Database.
Have a database with lots of data.
Want a visual overview (e.g. statistical plots such as scatterplots and heatmaps).
Want visualizations to be interactive (e.g. pan and zoom).

Problem
Performance: vis systems don't scale well for big data, or are turning into databases.
Over-plotting: makes visualizations unreadable; a waste of time and resources.

Solution: Resolution Reduction
Diagram labels: Visualization System, Resolution Reduction Layer, Database; query, query plan, modified query; result, reduced result.
When a query will return too much data, reduce it: aggregate, sample, filter, etc.
"Too big" means the result will slow down the vis system or will cause over-plotting.
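As a rough illustration of the aggregation option, the sketch below shrinks a 2-D array until it fits a cell budget by averaging square blocks. It is a minimal sketch under stated assumptions, not ScalaR's implementation: the names `aggregate_grid`, `reduce_resolution`, and `max_cells` are hypothetical, and the size check runs on in-memory data, whereas the slide describes deciding from the query plan before the result is produced.

```python
import math


def aggregate_grid(grid, factor):
    """Average each factor x factor block of a 2-D list of numbers."""
    rows, cols = len(grid), len(grid[0])
    reduced = []
    for r0 in range(0, rows, factor):
        reduced_row = []
        for c0 in range(0, cols, factor):
            # Edge blocks may be smaller when the size is not a multiple of factor.
            block = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
            ]
            reduced_row.append(sum(block) / len(block))
        reduced.append(reduced_row)
    return reduced


def reduce_resolution(grid, max_cells):
    """Return grid unchanged if it fits the budget, else an aggregated copy.

    max_cells (assumed >= 1) stands in for "too big": the largest result the
    front-end can draw without slowing down or over-plotting.
    """
    rows, cols = len(grid), len(grid[0])
    if rows * cols <= max_cells:
        return grid
    factor = max(2, math.ceil(math.sqrt(rows * cols / max_cells)))
    # Grow the block size until the reduced grid fits the budget.
    while math.ceil(rows / factor) * math.ceil(cols / factor) > max_cells:
        factor += 1
    return aggregate_grid(grid, factor)


grid = [[r * 4 + c for c in range(4)] for r in range(4)]
small = reduce_resolution(grid, max_cells=4)
print(small)  # [[2.5, 4.5], [10.5, 12.5]]
```

Sampling or filtering could replace `aggregate_grid` behind the same size check; which reduction suits a given plot is a design choice the slide leaves open.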

ScalaR
Scalable vis system for data exploration.
Web front-end.
Uses SciDB (www.scidb.org).
Visualizes query results.
Performs Resolution Reduction.
Speaker notes: Advertise SciDB (open-source array-oriented database system, great for scientific applications and scalable machine learning); give the URL scidb.org.

Demo of ScalaR

Array Browser
Collaboration with Brown (Justin DeBrabant, Stan Zdonik, Ugur Cetintemel) and Stanford (Zhicheng Liu, Jeff Heer).
Google Maps-style exploration experience.
Fetches subsets of the data (aka data tiles).
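To make the data-tile idea concrete, the sketch below maps a viewport in array coordinates to the tiles that overlap it. The tiling scheme is an assumption for illustration (fixed-size square tiles anchored at the origin, integer coordinates); the slides do not specify how the Array Browser partitions an array, and `tiles_for_viewport` is a hypothetical name.

```python
def tiles_for_viewport(x_min, y_min, x_max, y_max, tile_size):
    """List (col, row) indices of tiles overlapping a half-open viewport.

    The viewport covers x_min <= x < x_max and y_min <= y < y_max in integer
    array coordinates; tiles are tile_size x tile_size squares anchored at (0, 0).
    """
    first_col, last_col = x_min // tile_size, (x_max - 1) // tile_size
    first_row, last_row = y_min // tile_size, (y_max - 1) // tile_size
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


visible = tiles_for_viewport(100, 0, 300, 256, tile_size=256)
print(visible)  # [(0, 0), (1, 0)]
```

Panning changes the viewport bounds, so only tiles that newly enter the list need to be fetched.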

Array Browser Example

Array Browser Architecture

Demo of Array Browser

Future Work: Prefetching
Goal: reduce user wait time by prefetching tiles.
Cache tiles in the tile buffer.
Need algorithms to decide what to prefetch.
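The interplay between the tile buffer and prefetching can be sketched as follows. This is a minimal sketch under assumptions: the buffer is a plain dictionary, `fetch` is a hypothetical callable standing in for a database round trip, and `budget` is an invented knob limiting how many tiles are prefetched per step. Where the predicted keys come from is left open, as on the slide.

```python
def get_tile(buffer, key, fetch):
    """Return (tile, was_cached); on a miss, fetch the tile and cache it."""
    if key in buffer:
        return buffer[key], True
    buffer[key] = fetch(key)
    return buffer[key], False


def prefetch(buffer, predicted_keys, fetch, budget):
    """Fetch up to `budget` predicted tiles that are not cached yet."""
    fetched = []
    for key in predicted_keys:
        if len(fetched) >= budget:
            break
        if key not in buffer:
            buffer[key] = fetch(key)
            fetched.append(key)
    return fetched


def fake_fetch(key):
    # Stand-in for a database round trip (hypothetical).
    return f"tile{key}"


buffer = {}
get_tile(buffer, (0, 0), fake_fetch)  # the user views tile (0, 0)
done = prefetch(buffer, [(1, 0), (0, 1), (0, 0), (1, 1)], fake_fetch, budget=2)
tile, was_cached = get_tile(buffer, (1, 0), fake_fetch)  # the user pans right
print(done, was_cached)  # [(1, 0), (0, 1)] True
```

The user's next request is served from the buffer only when the prediction was right, which is why the choice of predictor matters.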

User Behavior Predictor (Seer)
Learn common query sequences from user traces.
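One simple way to learn common sequences from traces is a first-order Markov model over user moves, sketched below. This is an assumption for illustration only: the slides do not say how Seer models sequences, and the move names and functions `learn_transitions` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def learn_transitions(traces):
    """Count how often each move is followed by each other move."""
    transitions = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, last_move):
    """Return the most frequent follower of last_move, or None if unseen."""
    followers = transitions.get(last_move)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical traces: each is the sequence of moves from one user session.
traces = [
    ["zoom_in", "pan_right", "pan_right", "zoom_out"],
    ["zoom_in", "pan_right", "pan_right", "pan_right"],
    ["pan_left", "zoom_in", "pan_right"],
]
model = learn_transitions(traces)
print(predict_next(model, "zoom_in"))  # pan_right
```

The predicted move can then be turned into the tile keys the user would land on, which become the prefetch candidates.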

Statistical Analysis Predictor
Look for statistical similarities in tiles.
Try to guess what's important based on patterns.
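A possible reading of "statistical similarities" is to summarize each tile with a few statistics and rank candidate tiles by how close they are to the tile the user is viewing. The sketch below assumes a (mean, standard deviation) signature and Euclidean distance; both choices, and the names `signature` and `rank_by_similarity`, are illustrative assumptions and not taken from the slides.

```python
import math
import statistics


def signature(tile_values):
    """Summarize a tile as (mean, population standard deviation)."""
    return (statistics.fmean(tile_values), statistics.pstdev(tile_values))


def rank_by_similarity(reference, candidates):
    """Sort candidate tile keys by closeness to the reference tile's signature."""
    ref = signature(reference)
    return sorted(
        candidates,
        key=lambda key: math.dist(ref, signature(candidates[key])),
    )


current_tile = [10, 12, 11, 13]
# Hypothetical neighbouring tiles, flattened to lists of cell values.
neighbours = {
    "north": [0, 0, 1, 0],
    "east": [11, 12, 10, 13],
    "south": [50, 60, 55, 65],
}
ranking = rank_by_similarity(current_tile, neighbours)
print(ranking)  # ['east', 'north', 'south']
```

Tiles at the front of the ranking would be prefetched first, on the guess that a user studying one pattern will want to see more of it.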

Using Multiple Predictors
Run multiple predictors (or experts) in parallel.
Compare predictions to the user's actual behavior.
Use predictions from the best-performing expert.
The best expert may change over time based on the user's goals.
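The selection loop described above can be sketched with a score per expert that rewards correct predictions and decays over time, so the preferred expert can change as the user's goals shift. The decayed score is an assumption chosen for illustration; the slides do not specify how performance is measured, and `ExpertSelector` and its `decay` parameter are hypothetical.

```python
class ExpertSelector:
    """Track recent accuracy per predictor and pick the current best one."""

    def __init__(self, names, decay=0.5):
        self.decay = decay
        self.scores = {name: 0.0 for name in names}

    def record(self, predictions, actual):
        """Update each score: decayed old score plus 1 for a correct guess."""
        for name, predicted in predictions.items():
            hit = 1.0 if predicted == actual else 0.0
            self.scores[name] = self.decay * self.scores[name] + hit

    def best(self):
        """Return the highest-scoring predictor (first listed wins ties)."""
        return max(self.scores, key=self.scores.get)


selector = ExpertSelector(["seer", "stats"])
selector.record({"seer": "tile_a", "stats": "tile_b"}, actual="tile_a")
first_choice = selector.best()
selector.record({"seer": "tile_a", "stats": "tile_b"}, actual="tile_b")
second_choice = selector.best()
print(first_choice, second_choice)  # seer stats
```

A smaller `decay` makes the selector react faster to a change in behavior, at the cost of switching experts on a single lucky guess.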

Other Challenges
Lots of interesting problems left to address:
Best eviction policy for the tile buffer?
How to share data between multiple users?
More predictors?
Speaker notes: Explain these bullets. LRU and weighted LRU: lots of work has been done on these, but it is not clear how LRU works with multiple predictors.
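For reference, plain LRU eviction for a tile buffer can be sketched as below. This shows only the baseline policy named in the notes; it makes no attempt to answer the open question of how eviction should interact with multiple predictors, and `LRUTileBuffer` is a hypothetical name.

```python
from collections import OrderedDict


class LRUTileBuffer:
    """Fixed-capacity tile buffer that evicts the least recently used tile."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._tiles = OrderedDict()

    def get(self, key):
        """Return the cached tile and mark it as recently used, or None."""
        if key not in self._tiles:
            return None
        self._tiles.move_to_end(key)
        return self._tiles[key]

    def put(self, key, tile):
        """Insert or refresh a tile, evicting the oldest entry when full."""
        self._tiles[key] = tile
        self._tiles.move_to_end(key)
        if len(self._tiles) > self.capacity:
            self._tiles.popitem(last=False)

    def keys(self):
        """Return cached keys from least to most recently used."""
        return list(self._tiles)


tile_buffer = LRUTileBuffer(capacity=2)
tile_buffer.put("a", "tile a")
tile_buffer.put("b", "tile b")
tile_buffer.get("a")            # "a" becomes the most recently used tile
tile_buffer.put("c", "tile c")  # evicts "b", the least recently used
print(tile_buffer.keys())  # ['a', 'c']
```

One difficulty with applying this directly is that a prefetched tile counts as "recent" even if the user never views it, so wrong predictions can push out tiles the user actually needs.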

Questions?

Gemini Sagittarius Dogs Cats

Prefetching Experts
User behavior predictor (Seer): learn common query sequences from user traces.
Stats analysis predictor: look for statistical similarities in tiles; try to guess what's important based on patterns.
Speaker notes (2 min): Describe movement patterns through the data set, and explain how Seer is great for this.