SizeIntroDefinitionComplexityTuftsWrap-up 1/54 Big Data Visual Analytics: Challenges and Opportunities Remco Chang Tufts University.

Presentation transcript:

SizeIntroDefinitionComplexityTuftsWrap-up 1/54 Big Data Visual Analytics: Challenges and Opportunities Remco Chang Tufts University

SizeIntroDefinitionComplexityTuftsWrap-up 2/54 Human + Computer Human vs. Artificial Intelligence Garry Kasparov vs. Deep Blue (1997) – Computer takes a “brute force” approach without analysis – “As for how many moves ahead a grandmaster sees,” Kasparov concludes: “Just one, the best one” Artificial vs. Augmented Intelligence Hydra vs. Cyborgs (2005) – Grandmaster + 1 chess program > Hydra (equiv. of Deep Blue) – Amateur + 3 chess programs > Grandmaster + 1 chess program

SizeIntroDefinitionComplexityTuftsWrap-up 3/54 Visual Analytics = Human + Computer Visual analytics is “the science of analytical reasoning facilitated by interactive visual interfaces” [1]. By definition, it is a collaboration between human and computer to solve problems. [1] Thomas and Cook, “Illuminating the Path”, 2005.

SizeIntroDefinitionComplexityTuftsWrap-up 4/54 Example: What Does (Wire) Fraud Look Like? Financial Institutions like Bank of America have legal responsibilities to report all suspicious wire transaction activities (money laundering, supporting terrorist activities, etc) Data size: approximately 200,000 transactions per day (73 million transactions per year) Problems: – Automated approach can only detect known patterns – Bad guys are smart: patterns are constantly changing – Data is messy: lack of international standards resulting in ambiguous data Current methods: – 10 analysts monitoring and analyzing all transactions – Using SQL queries and spreadsheet-like interfaces – Limited time scale (2 weeks)

SizeIntroDefinitionComplexityTuftsWrap-up 5/54 WireVis: Financial Fraud Analysis In collaboration with Bank of America – Develop a visual analytical tool (WireVis) – Visualizes 7 million transactions over 1 year – Beta-deployed at WireWatch A great problem for visual analytics: – Ill-defined problem (how does one define fraud?) – Limited or no training data (patterns keep changing) – Requires human judgment in the end (involves law enforcement agencies) Design philosophy: “combating human intelligence requires better (augmented) human intelligence” R. Chang et al., Scalable and interactive visual analysis of financial wire transactions for fraud detection. Information Visualization,2008. R. Chang et al., Wirevis: Visualization of categorical, time-varying data from financial transactions. IEEE VAST, 2007.

SizeIntroDefinitionComplexityTuftsWrap-up 6/54 WireVis: A Visual Analytics Approach Heatmap View (Accounts to Keywords Relationship) Strings and Beads (Relationships over Time) Search by Example (Find Similar Accounts) Keyword Network (Keyword Relationships)

SizeIntroDefinitionComplexityTuftsWrap-up 7/54 Applications of Visual Analytics Political Simulation – Agent-based analysis – With DARPA Global Terrorism Database – With DHS Bridge Maintenance – With US DOT – Exploring inspection reports Biomechanical Motion – Interactive motion comparison R. Chang et al., Two Visualization Tools for Analysis of Agent-Based Simulations in Political Science. IEEE CG&A, 2012

SizeIntroDefinitionComplexityTuftsWrap-up 8/54 Applications of Visual Analytics [Figure: Global Terrorism Database tool with panels labeled Where, When, Who, What, Original Data, and Evidence Box] R. Chang et al., Investigative Visual Analysis of Global Terrorism, Journal of Computer Graphics Forum. Political Simulation – Agent-based analysis – With DARPA Global Terrorism Database – With DHS Bridge Maintenance – With US DOT – Exploring inspection reports Biomechanical Motion – Interactive motion comparison

SizeIntroDefinitionComplexityTuftsWrap-up 9/54 Applications of Visual Analytics R. Chang et al., An Interactive Visual Analytics System for Bridge Management, Journal of Computer Graphics Forum, To Appear. Political Simulation – Agent-based analysis – With DARPA Global Terrorism Database – With DHS Bridge Maintenance – With US DOT – Exploring inspection reports Biomechanical Motion – Interactive motion comparison

SizeIntroDefinitionComplexityTuftsWrap-up 10/54 Applications of Visual Analytics R. Chang et al., Interactive Coordinated Multiple-View Visualization of Biomechanical Motion Data, IEEE Vis (TVCG) Political Simulation – Agent-based analysis – With DARPA Global Terrorism Database – With DHS Bridge Maintenance – With US DOT – Exploring inspection reports Biomechanical Motion – Interactive motion comparison

SizeIntroDefinitionComplexityTuftsWrap-up 11/54 Talk Outline Visual Analytics + Big Data: 1.What is Big Data Visual Analytics? Definition and Problem Statement 2.How to Visualize High Dimensional Data? 3.How to Visualize Large Amounts of Data? 4.Research at Tufts

SizeIntroDefinitionComplexityTuftsWrap-up 12/54 1. What is Big Data Visual Analytics? A Definition and Problem Statement

SizeIntroDefinitionComplexityTuftsWrap-up 13/54 Recall Bank of America Project Financial Institutions like Bank of America have legal responsibilities to report all suspicious wire transaction activities (money laundering, supporting terrorist activities, etc) Data size: approximately 200,000 transactions per day (73 million transactions per year) Question: How many people think this is Big Data?

SizeIntroDefinitionComplexityTuftsWrap-up 14/54 Defining Big Data for Visual Analytics Let’s say that I have a billion data items, is that Big Data? What if: – These data items only have two attributes (e.g., latitude, longitude)? – If I transpose this dataset such that I have two rows of data, but with a billion attributes?

SizeIntroDefinitionComplexityTuftsWrap-up 15/54 Defining Big Data for Visual Analytics Big Data is NOT just about the size of your data. For the purpose of this talk, let’s talk about Big Data in the following way: – Complexity: the number of attributes (k); assume k > 2 – Size: the number of rows (n); assume the data cannot fit into a desktop computer’s memory
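The two criteria above can be checked separately with a little arithmetic. The sketch below is illustrative only: the function names, the 8 bytes per value, and the 8 GiB memory budget are assumptions, not figures from the talk.

```python
def estimated_bytes(n_rows: int, k_attrs: int, bytes_per_value: int = 8) -> int:
    """Rough in-memory size of a dense n x k table of fixed-width values."""
    return n_rows * k_attrs * bytes_per_value


def is_big(n_rows: int, k_attrs: int, memory_bytes: int,
           bytes_per_value: int = 8) -> dict:
    """Apply the two criteria independently:
    complexity (k > 2) and size (raw values exceed available memory)."""
    return {
        "complex": k_attrs > 2,
        "large": estimated_bytes(n_rows, k_attrs, bytes_per_value) > memory_bytes,
    }


GIB = 1024 ** 3  # assumed desktop memory unit for the example

# A billion (latitude, longitude) pairs: many rows, but only k = 2.
print(is_big(1_000_000_000, 2, memory_bytes=8 * GIB))

# The transposed table: two rows with a billion attributes each.
print(is_big(2, 1_000_000_000, memory_bytes=8 * GIB))
```

Both tables hold the same number of values, yet only the transposed one counts as complex under this definition, which is the point of the transposition question on the previous slide.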

SizeIntroDefinitionComplexityTuftsWrap-up 16/54 Problem Statements Considering the two together is too difficult, so we’ll tackle the two issues independently for now. Our goal is to visualize complex or large data sets while: – Maintaining interactivity: rendering at 10 fps – Allowing for operations on the data (zoom, pivot, etc.)

SizeIntroDefinitionComplexityTuftsWrap-up 17/54 2. How to Visualize Complex (High-Dimensional) Data?

SizeIntroDefinitionComplexityTuftsWrap-up 18/54 Why is This Problem Hard? You can only see 2D because your monitor is 2D. In other words: you can show at most 2-dimensional data. Everything else is a hack.

SizeIntroDefinitionComplexityTuftsWrap-up 19/54 Ways to Visualize k-Dimensional Data Two primary ways to do this “hack”: – Divide up the 2D screen into multiple 2D regions (showing no correlation between dimensions; showing k-1 correlations; showing all pair-wise correlations) – Project k-dimensional data into 2D (3D to 2D; k-D projection)

SizeIntroDefinitionComplexityTuftsWrap-up 20/54 Ways to Visualize k-Dimensional Data Divide up the 2D screen into multiple 2D regions – Showing no correlation between dimensions – Showing k-1 correlations – Showing all pair-wise correlations Project k-Dimensional Data into 2D – 3D to 2D – k-D projection

SizeIntroDefinitionComplexityTuftsWrap-up 21/54 Ways to Visualize k-Dimensional Data Divide up the 2D screen into multiple 2D regions – Showing no correlation between dimensions – Showing k-1 correlations – Showing all pair-wise correlations Project k-Dimensional Data into 2D – 3D to 2D – k-D projection Parallel Coordinates

SizeIntroDefinitionComplexityTuftsWrap-up 22/54 Ways to Visualize k-Dimensional Data Divide up the 2D screen into multiple 2D regions – Showing no correlation between dimensions – Showing k-1 correlations – Showing all pair-wise correlations Project k-Dimensional Data into 2D – 3D to 2D – k-D projection Scatterplot Matrix
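To make the cost of the “divide the screen” options concrete, the sketch below counts the 2D relationships each layout has to draw. It assumes parallel coordinates show only adjacent axis pairs (k-1 of them) and a scatterplot matrix shows every unordered pair; the function names and the attribute names are made up for illustration.

```python
from itertools import combinations


def parallel_coordinate_pairs(dims):
    """Adjacent axis pairs drawn by parallel coordinates: k - 1 of them."""
    return list(zip(dims, dims[1:]))


def scatterplot_matrix_pairs(dims):
    """Every unordered pair of dimensions: k * (k - 1) / 2 of them."""
    return list(combinations(dims, 2))


# Hypothetical attribute names, for illustration only.
dims = ["amount", "hour", "country", "keyword_count"]
print(len(parallel_coordinate_pairs(dims)))  # k - 1 pairs
print(len(scatterplot_matrix_pairs(dims)))   # k * (k - 1) / 2 pairs
```

The pair count for the scatterplot matrix grows quadratically in k, so the screen area left for each panel shrinks quickly as attributes are added.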

SizeIntroDefinitionComplexityTuftsWrap-up 23/54 Ways to Visualize k-Dimensional Data Divide up the 2D screen into multiple 2D regions – Showing no correlation between dimensions – Showing k-1 correlations – Showing all pair-wise correlations Project k-Dimensional Data into 2D – 3D to 2D – k-D projection

SizeIntroDefinitionComplexityTuftsWrap-up 24/54 Ways to Visualize k-Dimensional Data Divide up the 2D screen into multiple 2D regions – Showing no correlation between dimensions – Showing k-1 correlations – Showing all pair-wise correlations Project k-Dimensional Data into 2D – 3D to 2D – k-D projection

SizeIntroDefinitionComplexityTuftsWrap-up 25/54 Ways to Visualize k-Dimensional Data Divide up the 2D screen into multiple 2D regions – Showing no correlation between dimensions – Showing k-1 correlations – Showing all pair-wise correlations Project k-Dimensional Data into 2D – 3D to 2D – k-D projection Example projection (dimension reduction) methods: PCA, MDS, LDA, LLE, and many others. Usually, they try to preserve in 2D the distances as they exist in k-D.
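As one concrete instance of a projection method, here is a minimal pure-Python sketch of PCA that maps k-D rows to 2D. It is not the iPCA system from the talk: it finds the two leading principal directions by power iteration on the covariance matrix, and all function names, the iteration count, and the 1e-12 stopping threshold are assumptions chosen for the example.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def remove_component(v, unit):
    """Subtract from v its projection onto the unit vector `unit`."""
    scale = dot(v, unit)
    return [x - scale * u for x, u in zip(v, unit)]


def centered(rows):
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(k)]
    return [[r[j] - means[j] for j in range(k)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, k = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(k)]
            for a in range(k)]


def leading_direction(matrix, orthogonal_to=None, iterations=500, seed=0):
    """Power iteration; optionally restricted to directions orthogonal
    to a previously found unit vector."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    if orthogonal_to is not None:
        v = remove_component(v, orthogonal_to)
    v = normalize(v)
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        if orthogonal_to is not None:
            w = remove_component(w, orthogonal_to)
        if dot(w, w) ** 0.5 < 1e-12:  # no variance left in this subspace
            break
        v = normalize(w)
    return v


def pca_2d(rows):
    """Project k-D rows onto their first two principal directions."""
    x = centered(rows)
    cov = covariance(x)
    first = leading_direction(cov)
    second = leading_direction(cov, orthogonal_to=first, seed=1)
    return [(dot(r, first), dot(r, second)) for r in x]


# Points that lie on a line in 3D collapse onto the first 2D axis.
line = [[t, 2 * t, 3 * t] for t in range(10)]
for point in pca_2d(line)[:3]:
    print(point)
```

PCA preserves variance rather than pairwise distances directly; MDS is the method in the list above that targets distances explicitly.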

SizeIntroDefinitionComplexityTuftsWrap-up 26/54 What We Have Done (at Tufts) We like projection methods because they are more scalable than the “divide the screen” methods. iPCA – does interaction help in understanding high-dimensional data? – Demo. Dis-Function – are interactions in 2D meaningful (recoverable) in k-D?

SizeIntroDefinitionComplexityTuftsWrap-up 27/54 Dis-Function: Direct Manipulation of Visualization The user directly moves points on the 2D plane that don’t “look right”… until the expert is happy (or the visualization cannot be improved further). The system learns the weights (importance) of each of the original k dimensions.

SizeIntroDefinitionComplexityTuftsWrap-up 28/54 Dis-Function This iterative metric learning process finds the weights of the k-dimensions over a series of 2D interactions R. Chang et al., Find Distance Function, Hide Model Inference. IEEE VAST Poster 2011 R. Chang et al., Dis-function: Learning Distance Functions Interactively, IEEE VAST To Appear

SizeIntroDefinitionComplexityTuftsWrap-up 29/54 Dis-Function: Implementation Linear distance function: Optimization:
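The general idea of learning per-dimension weights from user feedback can be sketched as follows. This is an assumed, generic formulation and not necessarily the one used in Dis-Function: the distance is taken to be a weighted sum of squared per-dimension differences, and the weights are fitted by gradient descent so that selected pairs approach target distances supplied by the user. The function names, learning rate, and step count are hypothetical.

```python
def weighted_distance(x, y, weights):
    """Assumed form: sum over dimensions of weight_j * (x_j - y_j)^2."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))


def update_weights(weights, pairs, learning_rate=0.01, steps=200):
    """Gradient descent on sum((distance - target)^2) over feedback pairs.

    `pairs` is a list of (x, y, target_distance) tuples standing in for the
    user's 2D moves. Weights are clipped at zero and renormalized to sum
    to 1 after every step.
    """
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y, target in pairs:
            error = weighted_distance(x, y, w) - target
            for j, (a, b) in enumerate(zip(x, y)):
                grad[j] += 2 * error * (a - b) ** 2
        w = [max(0.0, wj - learning_rate * g) for wj, g in zip(w, grad)]
        total = sum(w)
        if total > 0:
            w = [wj / total for wj in w]
    return w


# Feedback: points differing in dimension 0 should be far apart,
# points differing only in dimension 1 should be close together.
feedback = [((0, 0), (1, 0), 1.0), ((0, 0), (0, 1), 0.0)]
print(update_weights([0.5, 0.5], feedback))
```

After fitting, nearly all of the weight sits on dimension 0, which is how a learned weight vector can be read as the importance of each original dimension.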

SizeIntroDefinitionComplexityTuftsWrap-up 30/54 Open Questions in High-Dimensional Data Visualization When to use what? – Projection methods scale better, but are harder to understand. What happens when the data attributes are not all numeric, but contain categorical or text data? – Use multiple coordinated views. But what if k gets to be really large and the types are mixed? – Uh…

SizeIntroDefinitionComplexityTuftsWrap-up 31/54 3. How to Visualize Large Amounts of Data?

SizeIntroDefinitionComplexityTuftsWrap-up 32/54 Problem Statement [Figure: a visualization on commodity hardware connected to large data in a data warehouse]

SizeIntroDefinitionComplexityTuftsWrap-up 33/54 Problem Statement Constraint: Data is too big to fit into the memory or hard drive of the personal computer – Note: Ignoring various database technologies (OLAP, Column-Store, No-SQL, Array-Based, etc) Classic Computer Science Problem… What are some previous techniques? – Truncate (sample, filter) – Resolution reduction (“blurring”, image zooming) – Stream (think Netflix, Hulu) – Pre-fetch (think open world 3D video games)

SizeIntroDefinitionComplexityTuftsWrap-up 34/54 Pros and Cons: Truncate Truncate (sample, filter) – Pros: Easy to implement; efficient; scalable – Cons: Sampling is often data- or task-dependent Sampling Algorithm
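One standard way to truncate data that does not fit in memory is reservoir sampling, which draws a uniform fixed-size sample in a single pass. The slide does not name a specific sampling algorithm, so this choice is an assumption, as are the function name and parameters.

```python
import random


def reservoir_sample(stream, sample_size, seed=None):
    """Uniform random sample of `sample_size` items from an iterable of
    unknown length, using O(sample_size) memory (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < sample_size:
            reservoir.append(item)
        else:
            # Keep the new item with probability sample_size / (index + 1).
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < sample_size:
                reservoir[slot] = item
    return reservoir


# Stand-in for a table scan: the rows are just integers here.
sample = reservoir_sample(range(1_000_000), sample_size=5, seed=42)
print(sample)
```

A uniform sample like this treats every row alike, which is exactly why the slide calls sampling data- or task-dependent: rare but important rows, such as fraudulent transactions, are unlikely to survive it.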

SizeIntroDefinitionComplexityTuftsWrap-up 35/54 Pros and Cons: Resolution Reduction Resolution reduction (“blurring”) – Pros: Allows hierarchical navigations – Cons: Fine details are often lost, not all data types can be easily blurred (order-invariant data)
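For 2D point data, resolution reduction can be sketched as binning points into a grid of counts and then merging cells to build coarser levels. The grid scheme, the 2 x 2 merge factor, and the function names are assumptions made for this illustration.

```python
from collections import Counter


def bin_points(points, cell_size):
    """Count points per grid cell; each cell is keyed by integer (col, row)."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts


def coarsen(counts):
    """Merge each 2 x 2 block of cells into one cell at the next level up."""
    coarse = Counter()
    for (cx, cy), count in counts.items():
        coarse[(cx // 2, cy // 2)] += count
    return coarse


points = [(0.5, 0.5), (1.5, 0.5), (2.5, 2.5), (3.5, 3.5)]
fine = bin_points(points, cell_size=1.0)
print(dict(fine))           # four cells with one point each
print(dict(coarsen(fine)))  # two cells with two points each
```

Each coarser level is a smaller table that can be sent to the client first, with finer levels fetched on zoom. The exact positions inside a cell are gone after binning, which is the loss of fine detail noted in the cons.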

SizeIntroDefinitionComplexityTuftsWrap-up 36/54 Pros and Cons: Streaming Stream [Fisher et al., CHI 2012] – Pros: Query can be terminated at any time – Cons: It is inefficient on the database end [Figure: the same visualization at t = 1 second and at t = 5 minutes] Fisher et al., Trust Me, I'm Partially Right: Incremental Visualization Lets Analysts Explore Large Datasets Faster. CHI 2012
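The “terminate at any time” property can be illustrated with an aggregate that reports a running estimate plus an uncertainty bound as rows arrive. This is a generic sketch, not the system from the cited paper: it assumes rows arrive in random order, uses a normal-approximation 95% interval, and the function names and reporting interval are invented for the example.

```python
import math


def half_width(count, m2):
    """Half-width of an approximate 95% confidence interval for the mean."""
    if count < 2:
        return float("inf")
    variance = m2 / (count - 1)
    return 1.96 * math.sqrt(variance / count)


def incremental_mean(values, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) as rows stream in.

    Uses Welford's one-pass update, so no rows need to be stored.
    The consumer can stop iterating whenever the bound is tight enough.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for value in values:
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
        if count % report_every == 0:
            yield count, mean, half_width(count, m2)
    if count % report_every != 0:
        yield count, mean, half_width(count, m2)


rows = (float(i % 10) for i in range(5000))  # stand-in for a table scan
for seen, estimate, bound in incremental_mean(rows):
    print(seen, round(estimate, 3), round(bound, 3))
```

The bound narrows roughly with the square root of the rows seen, so an analyst can often stop a query early once the estimate is good enough for the decision at hand.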

SizeIntroDefinitionComplexityTuftsWrap-up 37/54 Pros and Cons: Pre-Fetch Pre-fetch – Pros: Seamless to the user – Cons: Predicting the future is kind of hard Possible in 3D games because of limited degrees of freedom

SizeIntroDefinitionComplexityTuftsWrap-up 38/54 Pros and Cons: Pre-Fetch Pre-fetch in Visual Analytics [Chan et al., IEEE VAST 2008] – Limits the types of operations a user can do – Allows interactive analysis of over a billion data points Chan et al., Maintaining Interactivity While Exploring Massive Time Series. IEEE VAST 2008

SizeIntroDefinitionComplexityTuftsWrap-up 39/54 Quick Summary Most of the time, a combination of techniques is used in a given system. For example, streaming and sampling. Pre-fetching is very interesting because: – The success metric is quantitative (cache misses) – Multiple approaches for prediction Feature-based (what data features is the user interested in?) Momentum-based (has the user been panning to the right?) Probabilistic models (what is the user likely going to do?) Profile-based (what type of user is it?) etc
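A momentum-based predictor and its cache-miss metric can be simulated in a few lines. The setup is deliberately simplified and entirely assumed: the data is split into integer-indexed tiles along one axis, the cache holds only the current tile plus one prefetched tile, and the prediction is that the user repeats the last move.

```python
def count_cache_misses(requests, prefetch):
    """Replay a sequence of requested tile indices and count cache misses.

    With `prefetch` enabled, after each request the tile one step further
    in the same direction as the last move is loaded ahead of time.
    """
    cache = set()
    misses = 0
    previous = None
    for tile in requests:
        if tile not in cache:
            misses += 1
        cache = {tile}
        if prefetch and previous is not None:
            cache.add(tile + (tile - previous))  # momentum prediction
        previous = tile
    return misses


pan_right = list(range(10))
print(count_cache_misses(pan_right, prefetch=False))  # every request misses
print(count_cache_misses(pan_right, prefetch=True))   # only the first two miss

back_and_forth = [0, 1, 2, 1, 0]
print(count_cache_misses(back_and_forth, prefetch=True))
```

Because the metric is just a count, the other predictors in the list (feature-based, probabilistic, profile-based) could be compared by swapping the prediction line and replaying the same interaction log.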

SizeIntroDefinitionComplexityTuftsWrap-up 40/54 4. Research at Tufts: Visual Analytics of Large Amounts of Data Joint work with Caroline Ziemkiewicz, Alvitta Ottley

SizeIntroDefinitionComplexityTuftsWrap-up 41/54 Motivation

SizeIntroDefinitionComplexityTuftsWrap-up 42/54 Individual Differences and Interaction Pattern Existing research shows that all the following factors affect how someone uses a visualization: – Spatial Ability – Cognitive Workload/Mental Demand – Personality – Experience (novice vs. expert) – Emotional State – Perceptual Speed – … and more

SizeIntroDefinitionComplexityTuftsWrap-up 43/54 Preliminary Study – Novice vs. Expert Comparing novice and expert financial analysts’ use of the WireVis system when searching for fraud: – Novices exhibited “breadth-first-search” behaviors – Experts exhibited “depth-first-search” behaviors Our next step is to use machine learning methods to distinguish users by analyzing their interactions in real time

SizeIntroDefinitionComplexityTuftsWrap-up 44/54 Preliminary Study – Locus of Control Identified the personality factor, Locus of Control (LOC), as a predictor for how a user interacts with the following visualizations:

SizeIntroDefinitionComplexityTuftsWrap-up 45/54 Results With the list view compared to the containment view, internal LOC users are: – faster (by 70%) – more accurate (by 34%) This holds only for complex (inferential) tasks. The speed improvement is about 2 minutes (116 seconds). R. Chang et al., How Locus of Control Influences Compatibility with Visualization Style, IEEE VAST. R. Chang et al., How Visualization Layout Relates to Locus of Control and Other Personality Factors. TVCG, To Appear.

SizeIntroDefinitionComplexityTuftsWrap-up 46/54 Preliminary Study – Cognitive Priming

SizeIntroDefinitionComplexityTuftsWrap-up 47/54 Results: Averages Primed More Internal [Figure: performance (poor to good) by visual form (list view vs. containment) for internal LOC, external LOC, and average LOC users, including average LOC users primed toward internal (“Average -> Internal”)] R. Chang et al., LOC it Down: Manipulating and Controlling for Personality Effects on Visualization Tasks. (In Submission to CHI)

SizeIntroDefinitionComplexityTuftsWrap-up 48/54 Preliminary Study – Using Brain Sensing (fNIRS) Functional Near-Infrared Spectroscopy (fNIRS): – a lightweight brain sensing technique – measures mental demand (working memory) R. Chang et al., Using fNIRS Brain Sensing to Evaluate Information Visualization Interfaces (In submission at CHI)

SizeIntroDefinitionComplexityTuftsWrap-up 49/54 This is Your Brain on Bar graphs and Pie Charts

SizeIntroDefinitionComplexityTuftsWrap-up 50/54 Make the Computer Aware of the User!

SizeIntroDefinitionComplexityTuftsWrap-up 51/54 Summary

SizeIntroDefinitionComplexityTuftsWrap-up 52/54 Summary Visual Analytics + Big Data is a critically important problem that isn’t going to go away Thinking of Big Data as problems of data complexity and size can lead to clearer research paths I propose that one research area that has largely been unexplored is in the understanding of the human user.

SizeIntroDefinitionComplexityTuftsWrap-up 53/54 Summary Visual Analytics + Big Data: 1.What is Big Data Visual Analytics? Definition and Problem Statement 2.How to Visualize High Dimensional Data? 3.How to Visualize Large Amounts of Data? 4.Research at Tufts

SizeIntroDefinitionComplexityTuftsWrap-up 54/54