1/30Remco Chang – SEAri Workshop 15 Big Data Visual Analytics: A User Centric Approach Remco Chang Assistant Professor Tufts University.

Presentation transcript:

1/30Remco Chang – SEAri Workshop 15 Big Data Visual Analytics: A User Centric Approach Remco Chang Assistant Professor Tufts University

2/30Remco Chang – SEAri Workshop 15 Human + Computer Human vs. Artificial Intelligence Garry Kasparov vs. Deep Blue (1997) – Computer takes a “brute force” approach without analysis – “As for how many moves ahead a grandmaster sees,” Kasparov concludes: “Just one, the best one” Artificial vs. Augmented Intelligence Hydra vs. Cyborgs (2005) – Grandmaster + 1 chess program > Hydra (equiv. of Deep Blue) – Amateur + 3 chess programs > Grandmaster + 1 chess program

3/30Remco Chang – SEAri Workshop 15 Example: What Does (Wire) Fraud Look Like? Financial institutions like Bank of America have legal responsibilities to report all suspicious wire transaction activities (money laundering, supporting terrorist activities, etc.) Data size: approximately 200,000 transactions per day (73 million transactions per year) Problems: – Automated approaches can only detect known patterns – Bad guys are smart: patterns are constantly changing – Data is messy: lack of international standards results in ambiguous data Current methods: – 10 analysts monitoring and analyzing all transactions – Using SQL queries and spreadsheet-like interfaces – Limited time scale (2 weeks)

4/30Remco Chang – SEAri Workshop 15 WireVis: Financial Fraud Analysis In collaboration with Bank of America – Develop a visual analytics tool (WireVis) – Visualizes 7 million transactions over 1 year – Beta-deployed at WireWatch A great problem for visual analytics: – Ill-defined problem (how does one define fraud?) – Limited or no training data (patterns keep changing) – Requires human judgment in the end (involves law enforcement agencies) Design philosophy: “combating human intelligence requires better (augmented) human intelligence” R. Chang et al., Scalable and interactive visual analysis of financial wire transactions for fraud detection. Information Visualization, 2008. R. Chang et al., WireVis: Visualization of categorical, time-varying data from financial transactions. IEEE VAST, 2007.

5/30Remco Chang – SEAri Workshop 15 WireVis: A Visual Analytics Approach Heatmap View (Accounts to Keywords Relationship) Strings and Beads (Relationships over Time) Search by Example (Find Similar Accounts) Keyword Network (Keyword Relationships)

6/30Remco Chang – SEAri Workshop 15 Visual Analytics = Human + Computer Visual analytics is “the science of analytical reasoning facilitated by visual interactive interfaces.” 1 By design, it is a collaboration between human and computer to solve hard problems. 1. Thomas and Cook, “Illuminating the Path”, 2005.

7/30Remco Chang – SEAri Workshop 15 “The computer is incredibly fast, accurate, and stupid. Man is unbelievably slow, inaccurate, and brilliant. The marriage of the two is a force beyond calculation.” -Leo Cherne, 1977 (often attributed to Albert Einstein)

8/30Remco Chang – SEAri Workshop 15 Which Marriage?

9/30Remco Chang – SEAri Workshop 15 Which Marriage?

10/30Remco Chang – SEAri Workshop 15 Applications of Visual Analytics Political Simulation – Agent-based analysis – With DARPA Global Terrorism Database – With DHS Bridge Maintenance – With US DOT – Exploring inspection reports Biomechanical Motion – Interactive motion comparison R. Chang et al., Two Visualization Tools for Analysis of Agent-Based Simulations in Political Science. IEEE CG&A, 2012

11/30Remco Chang – SEAri Workshop 15 Applications of Visual Analytics Political Simulation – Agent-based analysis – With DARPA Global Terrorism Database – With DHS Bridge Maintenance – With US DOT – Exploring inspection reports Biomechanical Motion – Interactive motion comparison [Figure labels: Where, When, Who, What, Original Data, Evidence Box] R. Chang et al., Investigative Visual Analysis of Global Terrorism, Journal of Computer Graphics Forum,

12/30Remco Chang – SEAri Workshop 15 Applications of Visual Analytics R. Chang et al., An Interactive Visual Analytics System for Bridge Management, Journal of Computer Graphics Forum, To Appear. Political Simulation – Agent-based analysis – With DARPA Global Terrorism Database – With DHS Bridge Maintenance – With US DOT – Exploring inspection reports Biomechanical Motion – Interactive motion comparison

13/30Remco Chang – SEAri Workshop 15 Applications of Visual Analytics R. Chang et al., Interactive Coordinated Multiple-View Visualization of Biomechanical Motion Data, IEEE Vis (TVCG) Political Simulation – Agent-based analysis – With DARPA Global Terrorism Database – With DHS Bridge Maintenance – With US DOT – Exploring inspection reports Biomechanical Motion – Interactive motion comparison

14/30Remco Chang – SEAri Workshop 15 Future of Visual Analytics Current Approach: – One command, one response (not quite a collaboration) Assumptions: – A user’s mouse and keyboard actions with a visualization reflect the user’s reasoning process – If the computer knows what the user’s reasoning process is, it can better support (collaborate with) the user Goals: – Can we extract higher-level information about the user by analyzing the user’s interactions? – How will the computer utilize such information? [Figure: interaction loop between Visualization and Human. Input: keyboard, mouse. Output: images (visualizations).]

15/30Remco Chang – SEAri Workshop 15 Extracting User Model from Interactions 1. Learning about a User in Real-Time Who is the user, and what is she doing?

16/30Remco Chang – SEAri Workshop 15 Experiment: Finding Waldo Google-Maps style interface – Left, Right, Up, Down, Zoom In, Zoom Out, Found

17/30Remco Chang – SEAri Workshop 15 Pilot Visualization – Completion Time [Figure: two panels, fast completion time and slow completion time] Eli Brown et al., Where’s Waldo. IEEE VAST 2014.

18/30Remco Chang – SEAri Workshop 15 Post-hoc Analysis Results
Mean Split (50% Fast, 50% Slow)
  Data Representation    Classification Accuracy    Method
  State Space            72%                        SVM
  Edge Space             63%                        SVM
  Action Sequence        77%                        Decision Tree
  Mouse Event            62%                        SVM
Fast vs. Slow Split (Mean+0.5σ=Fast, Mean-0.5σ=Slow)
  Data Representation    Classification Accuracy    Method
  State Space            96%                        SVM
  Edge Space             83%                        SVM
  Action Sequence        79%                        Decision Tree
  Mouse Event            79%                        SVM
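
Editor's sketch: the second split keeps only users who fall more than half a standard deviation from the mean and drops the middle band. The snippet below is one way to compute such a split. The function name and the sample values are hypothetical, and the population standard deviation is an assumption. Because the slide does not say whether the measured quantity is a time or a speed, the two groups are returned under the neutral names "low" and "high" rather than "fast" and "slow".

```python
from statistics import mean, pstdev

def split_by_band(values, k=0.5):
    """Return (low, high): indices below mean - k*sigma and above mean + k*sigma.

    Values inside the band are left out of both groups.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    low = [i for i, v in enumerate(values) if v < mu - k * sigma]
    high = [i for i, v in enumerate(values) if v > mu + k * sigma]
    return low, high

# Hypothetical measurements, for illustration only (not data from the study).
measurements = [40, 55, 60, 62, 65, 70, 90, 150]
low, high = split_by_band(measurements)
print("low group:", low, "high group:", high)
```

Dropping the middle band makes the two classes easier to separate, which is consistent with the higher accuracies in the second table.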

19/30Remco Chang – SEAri Workshop 15 “Real-Time” Prediction (Limited Time Observation) State-Based Linear SVM Accuracy: ~70% Interaction Sequences N-Gram + Decision Tree Accuracy: ~80%
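
Editor's sketch: the "Interaction Sequences" approach turns a user's action log into n-gram counts before a classifier sees it. The snippet below shows only that feature-extraction step, using the action names from the Waldo interface. The function name and the sample log are hypothetical, and the decision tree itself is not shown.

```python
from collections import Counter

def ngram_counts(actions, n=2):
    """Count every contiguous run of n actions in one user's interaction log."""
    grams = zip(*(actions[i:] for i in range(n)))
    return Counter(grams)

# Hypothetical log using the interface's action vocabulary.
log = ["Zoom In", "Left", "Left", "Zoom In", "Left", "Found"]
features = ngram_counts(log, n=2)
for gram, count in features.most_common():
    print(gram, count)
```

Each distinct n-gram becomes one feature column, so users with similar navigation habits end up with similar count vectors.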

20/30Remco Chang – SEAri Workshop 15 Predicting a User’s Personality External Locus of Control Internal Locus of Control Ottley et al., How locus of control influences compatibility with visualization style. IEEE VAST, Ottley et al., Understanding visualization by understanding individual users. IEEE CG&A, 2012.

21/30Remco Chang – SEAri Workshop 15 Predicting Users’ Personality Traits Noisy data, but can detect the users’ individual traits “Extraversion”, “Neuroticism”, and “Locus of Control” at ~60% accuracy by analyzing the user’s interactions alone. Predicting user’s “Extraversion” Linear SVM Accuracy: ~60%

22/30Remco Chang – SEAri Workshop 15 User-Model Adaptive Databases 2. What Can a System Do If It Knows Something About Its User?

23/30Remco Chang – SEAri Workshop 15 Problem Domain: Big Data Exploration Visualization on Commodity Hardware Large Data in a Data Warehouse

24/30Remco Chang – SEAri Workshop 15 Problem Statement Constraint: Data is too big to fit into the memory or hard drive of the personal computer – Note: Ignoring various database technologies (OLAP, Column-Store, No-SQL, Array-Based, etc.) Goal: Guarantee a result set to a user’s query within X number of seconds. – Based on HCI research, the absolute upper bound for X is 10 seconds – Ideally, we would like to get it down to 1 second or less In CS talk: trading accuracy for speed, optimizing to minimize latency (user wait time).

25/30Remco Chang – SEAri Workshop 15 Our Approach: Predictive Pre-Computation and Pre-Fetching In collaboration with MIT and Brown – Models the user based on their past interaction histories – “Guesses” a set of the user’s possible next moves – Pre-computes and pre-fetches the necessary data chunks – If the guesses are right, the user would experience no wait time
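
Editor's sketch: the idea on this slide can be illustrated with a small cache that, after serving each request, fetches whatever a predictor guesses the user will ask for next. Everything here is hypothetical (class name, the four-neighbour predictor, the stand-in fetch function). A real system would prefetch in the background and bound the cache size, and neither is done here.

```python
class PrefetchingCache:
    """Serve tile requests from a cache that is filled ahead of time by a predictor."""

    def __init__(self, fetch, predict):
        self.fetch = fetch      # slow call to the backing store
        self.predict = predict  # tile -> tiles the user might request next
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def request(self, tile):
        if tile in self.cache:
            self.hits += 1      # guessed right: no wait for the user
            data = self.cache[tile]
        else:
            self.misses += 1    # guessed wrong: the user waits for the fetch
            data = self.fetch(tile)
            self.cache[tile] = data
        # Pre-fetch the guesses for the next move (done synchronously in this sketch).
        for guess in self.predict(tile):
            if guess not in self.cache:
                self.cache[guess] = self.fetch(guess)
        return data

def neighbours(tile):
    """Hypothetical predictor: the user pans to one of the four adjacent tiles."""
    x, y = tile
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

cache = PrefetchingCache(fetch=lambda t: f"data for {t}", predict=neighbours)
for tile in [(0, 0), (1, 0), (1, 1), (5, 5)]:
    cache.request(tile)
print("hits:", cache.hits, "misses:", cache.misses)
```

The first request and the jump to a distant tile are misses, while the two panning steps are served from pre-fetched data.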

26/30Remco Chang – SEAri Workshop 15 Interactive Visualization System client middleware database Predictive Engine Caching and Query Execution Recommender Cooked Tile Cache Semi-Cooked Tile Cache Server

27/30Remco Chang – SEAri Workshop 15 Preliminary System and Evaluation Using a simple Waldo-like interface, 18 users explored the NASA MODIS dataset – Users were in WA – Database in Boston Tasks include “find 4 areas in Europe that have a snow coverage index above 0.5” What happens if the guesses are “wrong”?

28/30Remco Chang – SEAri Workshop 15 Summary

29/30Remco Chang – SEAri Workshop 15 Wrap Up: Visual Analytics Theory and Practice Visual analytics offers tremendous opportunities to combine “human + computer” as a collaborative computational unit “Increasing the input bandwidth” is a critical challenge. There is a lot of “signal” about the user’s reasoning process and analysis behaviors that can be extracted from analyzing their (past) interactions. By modeling the user based on their past interactions, we can design very complex (adaptive) systems to better support the user. The example of “big data” is just one of many potentially rich and impactful examples.

30/30Remco Chang – SEAri Workshop 15 Questions?

31/30Remco Chang – SEAri Workshop 15 Backup

32/30Remco Chang – SEAri Workshop 15 Prediction Algorithms General Idea: – Lots of “experts” who recommend chunks of data to pre-fetch / pre-compute – One “manager” who listens to the experts and chooses which experts’ advice to follow – Each “expert” gets more of their recommendations accepted if they keep guessing correctly
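
Editor's sketch: one way to read the "manager" idea is as a weighted allocation of a fixed number of pre-fetch slots among the experts. The slide does not give the update rule, so the version below is an assumption: each expert starts with weight 1, its weight is multiplied by a reward factor whenever the block the user actually requested was among its recommendations, and slots are shared out roughly in proportion to weight. All names are hypothetical.

```python
class Manager:
    """Share pre-fetch slots among experts in proportion to their track record."""

    def __init__(self, experts, slots=4, reward=2.0):
        self.experts = experts  # name -> function(history) -> ranked list of block ids
        self.weights = {name: 1.0 for name in experts}
        self.slots = slots
        self.reward = reward    # multiplier for a correct guess (assumed rule)
        self.last = {}          # most recent recommendations per expert

    def recommend(self, history):
        total = sum(self.weights.values())
        chosen = []
        self.last = {}
        # Consult the most trusted experts first.
        for name in sorted(self.weights, key=self.weights.get, reverse=True):
            ranked = self.experts[name](history)
            self.last[name] = ranked
            share = max(1, round(self.slots * self.weights[name] / total))
            for block in ranked[:share]:
                if block not in chosen and len(chosen) < self.slots:
                    chosen.append(block)
        return chosen

    def observe(self, actual):
        """Reward every expert whose last recommendations contained the real request."""
        for name, ranked in self.last.items():
            if actual in ranked:
                self.weights[name] *= self.reward

# Two toy experts over numbered data blocks.
experts = {
    "next": lambda h: [h[-1] + 1, h[-1] + 2],
    "prev": lambda h: [h[-1] - 1, h[-1] - 2],
}
manager = Manager(experts, slots=2)
history = [13]
for actual in [14, 15]:
    manager.recommend(history)
    manager.observe(actual)
    history.append(actual)
print(manager.weights)
print(manager.recommend(history))
```

After two correct guesses the "next" expert holds most of the weight and fills both slots, which matches the slide's description of experts earning more accepted recommendations.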

33/30Remco Chang – SEAri Workshop Iteration: 0

34/30Remco Chang – SEAri Workshop Iteration: 0

35/30Remco Chang – SEAri Workshop Iteration: 0 User Requests Data Block 13

36/30Remco Chang – SEAri Workshop Iteration: 0 User Requests Data Block 13

37/30Remco Chang – SEAri Workshop Iteration: 0 User Requests Data Block 13

38/30Remco Chang – SEAri Workshop Iteration: 1

39/30Remco Chang – SEAri Workshop 15 Training Instead of training the manager in real-time, this process can be done offline – Using past user interaction logs This approach is similar to how databases are currently tuned – Instead of a DBA manually tuning the performance of a database – Past SQL logs are used to automatically tune the database for an organization’s specific needs (e.g. read-mostly, write-often, etc.)

40/30Remco Chang – SEAri Workshop 15 How to Determine the “Experts”? More detail on this later Some obvious ones include: – Momentum-based – Data similarity-based – Frequency (hot-spot)-based – Past action sequence-based Generally speaking, given the “manager” approach, we want as many different types of “experts” as possible
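
Editor's sketch: two of the "obvious" experts can be written in a few lines each. Both functions are hypothetical illustrations, assuming the interaction history is a list of (x, y) tile coordinates. The momentum expert extrapolates the last step, and the hot-spot expert returns the most frequently visited tiles.

```python
from collections import Counter

def momentum_expert(history):
    """Guess that the user keeps moving in the direction of their last step."""
    if len(history) < 2:
        return []
    (x0, y0), (x1, y1) = history[-2], history[-1]
    dx, dy = x1 - x0, y1 - y0
    return [(x1 + dx, y1 + dy)]

def hotspot_expert(history, top=2):
    """Guess the tiles that have been visited most often so far."""
    return [tile for tile, _ in Counter(history).most_common(top)]

history = [(0, 0), (1, 0), (2, 0), (1, 0)]
print("momentum:", momentum_expert(history))
print("hot spots:", hotspot_expert(history))
```

The two experts disagree whenever the user reverses direction or revisits a region, which is the kind of diversity the "manager" approach benefits from.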

41/30Remco Chang – SEAri Workshop 15 Preliminary Results Using a simple Google Maps-like interface, 18 users explored the NASA MODIS dataset Tasks include “find 4 areas in Europe that have a snow coverage index above 0.5”

42/30Remco Chang – SEAri Workshop User Requests Data Block 52 Worst Case Scenario: Cache Miss

43/30Remco Chang – SEAri Workshop 15 Cache Miss How to guarantee response time when there’s a cache miss? Trick: the ‘EXPLAIN’ command Usage: explain select * from myTable; Not standard SQL, but implemented in most commercial databases
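
Editor's sketch: the snippet below shows the general pattern of asking a database for its plan instead of running the query. It uses SQLite only because it ships with Python, and the table contents are made up. SQLite's variant is spelled EXPLAIN QUERY PLAN, and its output describes the access strategy without the size estimates that the talk relies on, so this illustrates the mechanism rather than the authors' setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, magnitude REAL)")
conn.executemany(
    "INSERT INTO myTable (magnitude) VALUES (?)",
    [(m / 10,) for m in range(100)],
)

# Ask for the plan; the SELECT itself is not executed.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM myTable").fetchall()
for row in plan:
    print(row[-1])  # last column holds the human-readable description of the step
conn.close()
```

Because the plan comes back without touching the data, a middleware layer can inspect it cheaply before deciding whether the full query is affordable.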

44/30Remco Chang – SEAri Workshop 15 Example EXPLAIN Output from SciDB Example SciDB output of (a query similar to) EXPLAIN SELECT * FROM earthquake: [("[pPlan]: schema earthquake <datetime:datetime NULL DEFAULT null, magnitude:double NULL DEFAULT null, latitude:double NULL DEFAULT null, longitude:double NULL DEFAULT null> [x=1:6381,6381,0,y=1:6543,6543,0] bound start {1, 1} end {6381, 6543} density 1 cells chunks 1 est_bytes e+09 ")] Notes: – The schema lists the four attributes in the table ‘earthquake’ – The dimensions of this array (table) are 6381 x 6543 – This query will touch data elements from (1, 1) to (6381, 6543), totaling 41,750,883 cells – Estimated size of the returned data is e+09 bytes (~8GB)
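
Editor's sketch: the cell count follows directly from the dimension bounds in the plan. The snippet below parses the bracketed dimension list and multiplies the extents. It assumes each dimension is written as name=low:high,chunk_size,overlap, which matches the string shown on the slide. The regular expression is illustrative and not a full parser for SciDB schemas.

```python
import re

# Dimension list copied from the EXPLAIN output on the slide.
schema = "[x=1:6381,6381,0,y=1:6543,6543,0]"

# Assumed layout per dimension: name=low:high,chunk_size,overlap
dims = re.findall(r"(\w+)=(\d+):(\d+),\d+,\d+", schema)

cells = 1
for name, low, high in dims:
    extent = int(high) - int(low) + 1
    print(f"dimension {name}: {extent} positions")
    cells *= extent

print(f"cells touched by a full scan: {cells:,}")
```

A middleware layer could compare this count with the number of pixels available before deciding whether the query needs to be reduced.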

45/30Remco Chang – SEAri Workshop 15 Other Examples Oracle 11g Release 1 (11.1)

46/30Remco Chang – SEAri Workshop 15 Other Examples MySQL 5.0

47/30Remco Chang – SEAri Workshop 15 Other Examples PostgreSQL 7.3.4

48/30Remco Chang – SEAri Workshop 15 Query Modification Based on the resulting query plan, our system chooses one of three strategies to reduce results from the query – Can be based on the literal resolution of the visualization (number of pixels) – Or desired data size

49/30Remco Chang – SEAri Workshop 15 Reduction Strategies Aggregation: – In SciDB, this operation is carried out as regrid (scale_factorX, scale_factorY) Sampling: – In SciDB, uniform sampling is carried out as bernoulli (query, percentage, randseed) Filtering: – Currently, the filtering criterion is user-specified: where (clause)
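
Editor's sketch: the parameters for the first two strategies can be derived from the array size and the target size. The helper names and the 800 x 600 viewport are hypothetical, and the snippet only computes the numbers. It does not build SciDB queries.

```python
import math

def regrid_factors(array_w, array_h, view_w, view_h):
    """Scale factors so that the aggregated array is no larger than the viewport."""
    fx = max(1, math.ceil(array_w / view_w))
    fy = max(1, math.ceil(array_h / view_h))
    return fx, fy

def sampling_fraction(total_cells, target_cells):
    """Fraction of cells to keep so that roughly target_cells remain."""
    return min(1.0, target_cells / total_cells)

# The 6381 x 6543 earthquake array shown on a hypothetical 800 x 600 pixel view.
fx, fy = regrid_factors(6381, 6543, 800, 600)
fraction = sampling_fraction(6381 * 6543, 1_000_000)
print("regrid scale factors:", fx, fy)
print(f"sampling fraction: {fraction:.4f}")
```

Aggregation ties the result size to the literal resolution of the visualization, while sampling ties it to a desired data size. These are the two criteria named on the Query Modification slide.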

50/30Remco Chang – SEAri Workshop 15 Quick Summary Key Components: 1. Pre-computation and pre-fetching 2. Three-tiered system 3. Pre-fetching based on “expert-manager” approach 4. Use the “explain” trick to handle cache misses 5. Guarantees response time, but not data quality

51/30Remco Chang – SEAri Workshop 15 Future Work: Streaming Integrate Streaming [Fisher et al. CHI 2012] t = 1 second t = 5 minutes Fisher et al., Trust Me, I'm Partially Right: Incremental Visualization Lets Analysts Explore Large Datasets Faster. CHI 2012

52/30Remco Chang – SEAri Workshop 15 Designing “Experts” How much can a user’s past interactions tell us about: – The user’s future analysis behaviors? – The user’s analysis style? – The user’s analysis intent? – The user’s mental model of the data and problem? Fundamental question in Visualization and HCI…

53/30Remco Chang – SEAri Workshop 15 Project Outline “Reverse engineer” the human cognitive black box (by analyzing user interactions) A. Data Modeling – Interactive Metric Learning B. User Modeling – Predict Analysis Behavior C. Interactive Big Data Databases – Adaptive Pre-fetching and computation R. Chang et al., Science of Interaction, Information Visualization, 2009.

54/30Remco Chang – SEAri Workshop 15 Data Modeling 1. Interactive Metric Learning Quantifying a User’s Knowledge about Data

55/30Remco Chang – SEAri Workshop Richard Heuer. Psychology of Intelligence Analysis, (pp 53-57)

56/30Remco Chang – SEAri Workshop 15 Exploring High-Dimensional Space: iPCA Jeong et al., iPCA: An Interactive System for PCA-based Visual Analytics. Eurovis 2009.

57/30Remco Chang – SEAri Workshop 15 Metric Learning Finding the weights to a linear distance function Instead of having the user manually specify the weights, can we learn them implicitly through their interactions?
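
Editor's sketch: a distance function with one weight per dimension might look like the snippet below. The exact form used in the talk is not given on this slide, so the weighted sum of squared differences is an assumption, and the function name and sample points are hypothetical. The point is only that changing the weights changes which points count as "close".

```python
def weighted_distance(a, b, weights):
    """Sum of per-dimension squared differences, each scaled by its weight."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))

# Two hypothetical 3-dimensional points that differ mostly in the second dimension.
a = [1.0, 5.0, 0.2]
b = [2.0, 1.0, 0.2]

equal = weighted_distance(a, b, [1 / 3, 1 / 3, 1 / 3])
ignoring_dim2 = weighted_distance(a, b, [0.5, 0.0, 0.5])
print(f"equal weights: {equal:.3f}")
print(f"second dimension ignored: {ignoring_dim2:.3f}")
```

Learning the weights from interaction means searching for the weight vector under which the distances agree with the way the user has arranged the points.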

58/30Remco Chang – SEAri Workshop 15 Metric Learning In a projection space (e.g., MDS), the user directly moves points on the 2D plane that don’t “look right”… Until the expert is happy (or the visualization cannot be improved further) The system learns the weights (importance) of each of the original k dimensions Short Video

59/30Remco Chang – SEAri Workshop 15 Dis-Function Brown et al., Find Distance Function, Hide Model Inference. IEEE VAST Poster 2011 Brown et al., Dis-function: Learning Distance Functions Interactively. IEEE VAST Optimization:

60/30Remco Chang – SEAri Workshop 15 Results Used the “Wine” dataset (13 dimensions, 3 clusters) Added 10 extra dimensions, and filled them with random values Blue: original data dimension Red: randomly added dimensions X-axis: dimension number Y-axis: final weights of the distance function