Visual Causality Analysis Made Practical

Presentation transcript:

Visual Causality Analysis Made Practical
Jun Wang and Klaus Mueller
Computer Science Department, Stony Brook University

Causality Analysis
- Goal: recover causal relations from observations
- Advantages
  - More explicit than correlation analysis: "A causes B" vs. "A and B may be associated"
  - More practical than controlled experiments (consider the experiment needed to test "smoking causes cancer")

Visual Causality Analysis
- Why take a visual analytics approach?
  - Automated algorithms are not reliable
  - Get users involved with their domain knowledge
  - Make analysis more manageable

Previous Work: Visual Causality Analyst
- Operates on a single model
- Force-directed graph
- Model refinement with statistical tables
- Naïve method for processing heterogeneous data

Current Work: Causal Structure Investigator
- Visualizing causal flows
- Visual model refinement
- Interface for data subdivisions
- Managing and pooling the multiple models learned from data subdivisions

Causal Flows
- Laid out by a breadth-first spanning tree
- Causal relations drawn as paths flowing mostly from left to right
- Path color encodes relation type
- Path width encodes the strength of the relation
[Figure: Causal graph of the AutoMPG dataset]
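
The breadth-first layout idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the graph has at least one node without parents, the function name `bfs_layers` is hypothetical, and the edge list is made up for illustration (it is not the model learned from the AutoMPG data). Each node gets the column of its first breadth-first visit, so most edges point to the right, while an edge into an already placed node may point backwards.

```python
from collections import deque

def bfs_layers(edges):
    """Assign each node a column index by breadth-first search from the source nodes."""
    nodes = {n for edge in edges for n in edge}
    children = {n: [] for n in nodes}
    has_parent = set()
    for src, dst in edges:
        children[src].append(dst)
        has_parent.add(dst)

    roots = sorted(nodes - has_parent)   # nodes with no incoming edge go in column 0
    layer = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in layer:       # the first visit defines the spanning-tree edge
                layer[child] = layer[node] + 1
                queue.append(child)
    return layer

# Hypothetical causal edges (cause, effect), for illustration only.
edges = [
    ("cylinders", "displacement"),
    ("displacement", "horsepower"),
    ("horsepower", "mpg"),
    ("cylinders", "mpg"),
]
layers = bfs_layers(edges)
# "mpg" is reached first through "cylinders", so the edge horsepower -> mpg
# runs from column 2 back to column 1: flows are mostly, not always, left to right.
```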

Visual Model Refinement
- Measure model goodness with the Bayesian Information Criterion (BIC)
- An extra step in parameterization
- The heuristic: removing a good relation will lower the quality of the model
[Figure: Causal graph of the AutoMPG dataset]
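
The refinement heuristic can be illustrated with a small sketch. It assumes a linear Gaussian model for a single effect variable with at most one cause, and the common convention BIC = k ln(n) - 2 ln(L), where lower is better; the slides do not state which convention or model family the tool uses. The helper `gaussian_bic` and the synthetic variables are hypothetical.

```python
import math
import random

def gaussian_bic(y, x=None):
    """BIC of a Gaussian model for y: intercept only, or simple linear regression on x."""
    n = len(y)
    mean_y = sum(y) / n
    if x is None:
        rss = sum((v - mean_y) ** 2 for v in y)
        k = 2                                   # intercept and noise variance
    else:
        mean_x = sum(x) / n
        sxx = sum((u - mean_x) ** 2 for u in x)
        sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        rss = sum((v - intercept - slope * u) ** 2 for u, v in zip(x, y))
        k = 3                                   # intercept, slope, and noise variance
    # Maximized Gaussian log-likelihood expressed through the residual sum of squares.
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return k * math.log(n) - 2 * log_lik

# Synthetic data in which the cause strongly drives the effect (not the AutoMPG data).
rng = random.Random(0)
cause = [rng.gauss(0, 1) for _ in range(200)]
effect = [-0.8 * c + rng.gauss(0, 0.5) for c in cause]

bic_with_edge = gaussian_bic(effect, cause)
bic_without_edge = gaussian_bic(effect)
# A positive difference means the model gets worse when the relation is removed.
delta = bic_without_edge - bic_with_edge
```

Under this convention, a large positive `delta` marks a relation worth keeping, while a value near zero or below suggests the edge adds little beyond its complexity penalty.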

Handling Heterogeneous Data
- Global Mapping (GM) strategy (previous method)
- GM + Un-binning (UB) strategy
  - Random sample in the range to simulate a continuous domain
- Experimental evaluation against Binning
  - 100 random DAGs and the corresponding data
  - Measure the rebuilding error in Structural Hamming Distance (SHD), True Positive Rate (TPR), and True Discovery Rate (TDR)
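
A minimal sketch of the un-binning step, assuming the values have already been mapped to ordered bin indices and that the random sample is drawn uniformly within each bin's range. The slide only says "random sample in the range", so the uniform distribution, the function name `unbin`, and the bin edges below are assumptions.

```python
import random

def unbin(bin_indices, bin_edges, rng):
    """Replace each bin index with a uniform random draw from that bin's value range."""
    return [rng.uniform(bin_edges[b], bin_edges[b + 1]) for b in bin_indices]

rng = random.Random(42)
bin_edges = [0.0, 10.0, 20.0, 30.0]   # hypothetical edges defining three bins
binned = [0, 2, 1, 1, 0]              # hypothetical binned observations
continuous = unbin(binned, bin_edges, rng)
```

Repeated bin values become distinct continuous values, which lets methods that expect continuous variables run on the data.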

Handling Heterogeneous Data
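
The three evaluation metrics can be sketched on directed edge sets. Definitions vary across the literature: this version counts a missing, an extra, and a reversed edge as one SHD error each, and computes TPR and TDR on directed edges, which may differ from the exact definitions used in the experiment. The function name and the example graphs are hypothetical.

```python
def structure_metrics(true_edges, learned_edges):
    """Return (SHD, TPR, TDR) for two sets of directed edges (cause, effect)."""
    true_set, learned_set = set(true_edges), set(learned_edges)
    true_skeleton = {frozenset(e) for e in true_set}
    learned_skeleton = {frozenset(e) for e in learned_set}

    missing = len(true_skeleton - learned_skeleton)   # adjacencies not recovered
    extra = len(learned_skeleton - true_skeleton)     # adjacencies wrongly added
    reversed_count = sum(
        1 for (a, b) in learned_set
        if (b, a) in true_set and (a, b) not in true_set
    )
    shd = missing + extra + reversed_count

    true_positives = len(true_set & learned_set)
    tpr = true_positives / len(true_set) if true_set else 0.0
    tdr = true_positives / len(learned_set) if learned_set else 0.0
    return shd, tpr, tdr

true_dag = [("A", "B"), ("B", "C"), ("C", "D")]
learned_dag = [("A", "B"), ("C", "B"), ("A", "D")]
shd, tpr, tdr = structure_metrics(true_dag, learned_dag)
# One missing adjacency (C-D), one extra (A-D), one reversed (B-C): SHD is 3.
```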

Data Subdivisions
- Simpson's Paradox: a relation found in the overall data may not hold in certain subdivisions
- Subdivide data via the parallel coordinates interface
  - Manual brushing
  - By values of dimensions
  - By clustering
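
Simpson's Paradox is easy to reproduce with a toy example. The data below are made up for illustration: within each subgroup the two variables are negatively correlated, yet the pooled data show a positive correlation.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Two hypothetical subdivisions of (x, y) observations.
group_a = [(1, 10), (2, 9), (3, 8)]
group_b = [(6, 20), (7, 19), (8, 18)]
pooled = group_a + group_b

r_a = pearson(*zip(*group_a))        # negative within subdivision A
r_b = pearson(*zip(*group_b))        # negative within subdivision B
r_pooled = pearson(*zip(*pooled))    # positive over the whole data
```

A model learned from the pooled data alone would report the opposite sign from what holds inside each subdivision, which is the motivation for learning one model per subdivision.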

Data Subdivisions: Multiple Models
[Figure: Analyzing the Sales Campaign Dataset. Panels: (a); (b) Blue Cluster; (c) Yellow Cluster; (d) Red Cluster]

Model Pooling
- Purposes
  - Recognize the possible grouping of causal models
    - Pooling by clustering
  - Summarize the common relations from multiple models
    - Pooling by frequency
    - Pooling by credibility
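
Pooling by frequency can be sketched as counting how many models contain each relation and keeping the relations that appear often enough. The slides do not specify the selection rule, so the `min_count` threshold, the function name, and the example models are assumptions; pooling by credibility is not covered here because its scoring is not described.

```python
from collections import Counter

def pool_by_frequency(models, min_count):
    """Keep the directed edges that occur in at least min_count of the given models."""
    counts = Counter(edge for model in models for edge in set(model))
    return {edge for edge, count in counts.items() if count >= min_count}

# Three hypothetical models learned from different data subdivisions,
# each given as a set of (cause, effect) edges.
models = [
    {("price", "sales"), ("ads", "sales")},
    {("price", "sales"), ("ads", "sales"), ("season", "sales")},
    {("price", "sales")},
]
pooled_model = pool_by_frequency(models, min_count=2)
```

Raising `min_count` yields a smaller pooled model containing only the relations shared by most subdivisions.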

Model Pooling
[Figure: Analyzing the Ocean Chlorophyll Dataset. Panels (a) through (h), labeled: Red Cluster Medoid, Red Cluster Pooled, Blue Cluster Medoid, Blue Cluster Pooled, Green Cluster Medoid, Green Cluster Pooled]

One More Use Case The presidential election dataset – county level statistics

Thanks for attending!