A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data. Jinwook Seo, Ben Shneiderman, University of Maryland. Hyun Young Song.

Presentation transcript:

1 A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data. Jinwook Seo, Ben Shneiderman (University of Maryland). Hyun Young Song, Maryam Farboodi. Feb,

2 HCE 3.0. HCE (Hierarchical Clustering Explorer). Main idea: GRID principles (Graphics, Ranking and Interaction for Discovery). Feature. Application. User manual: manual/hce3_manual.html. Dataset examples: examples.html

3 Axis-Parallel vs. Non Axis-Parallel Approach. Definition, for 3 dimensions X, Y and Z: an axis-parallel projection shows X & Y, X & Z, or Y & Z; a non axis-parallel projection can show a linear combination such as a·X + b·Y against Z. Simplicity vs. power. Users.
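The distinction on this slide can be illustrated with a minimal sketch. The function names and sample points below are hypothetical and are not taken from HCE; the sketch only shows that an axis-parallel projection keeps two original dimensions unchanged, while a non axis-parallel projection mixes dimensions linearly.

```python
# Illustrative only: points are (x, y, z) tuples; all names are hypothetical.

def project_axis_parallel(points, i, j):
    """Keep two original dimensions i and j (e.g. 0 and 1 for X & Y)."""
    return [(p[i], p[j]) for p in points]


def project_linear(points, a, b):
    """Non axis-parallel view: plot a*X + b*Y against Z."""
    return [(a * p[0] + b * p[1], p[2]) for p in points]


points = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
xy = project_axis_parallel(points, 0, 1)   # X & Y, values untouched
mixed = project_linear(points, 0.5, 0.5)   # (0.5*X + 0.5*Y) & Z
```

The axis-parallel view is easier to read because each plotted axis is an original dimension; the linear combination can expose structure that no pair of original axes shows, at the cost of interpretability.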

4 Related Work. Axis-parallel: machine learning, info. vis., pattern recognition. Subset of dimensions to find specific patterns. Machine learning and data mining: supervised/unsupervised classification; subspace-based clustering analysis; projections naturally partitioning the data set. Information visualization: permutation matrix; parallel coordinates (dimension ordering); conditional entropy.

5 Related Work (cont.). Non axis-parallel: statisticians. Two-dimensional projection; SOM (Self-Organizing Maps); XGobi: grand tour, projection pursuit; no ranking. HD-Eye: interactive hierarchical clustering; OptiGrid (partitioning clustering algorithm).

6 Major Contributions. GRID (Graphics, Ranking and Interaction for Discovery): study 1D, study 2D, then find features; ranking guides insight, statistics confirm. Visualization techniques: overview; coordination (multiple windows); dynamic query (item slider).

7 General Overview. Menu; toolbar (overviews, color setting); dendrogram (binary tree), scatterplot; 7 tabs: color mosaic, table view, histogram ordering, scatterplot ordering, profile search, gene ontology, K-means.

8 General Overview

9 Load/Transform Data. Natural log; standardization; normalization (to the first column, to the median); linear scaling.
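The slide only names the transformations, so the following is a rough sketch under common textbook definitions. How HCE actually defines each option (for example, whether normalization divides or subtracts) is not stated in the transcript and is an assumption here, as are all function names.

```python
import math
import statistics


def natural_log(values):
    """Natural log of each value; every value must be positive."""
    return [math.log(v) for v in values]


def standardize(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def normalize_to_first(row):
    """Assumed meaning: express each value relative to the first column."""
    return [v / row[0] for v in row]


def normalize_to_median(values):
    """Assumed meaning: express each value relative to the median."""
    med = statistics.median(values)
    return [v / med for v in values]


def linear_scale(values, lo=0.0, hi=1.0):
    """Map the observed minimum to lo and the observed maximum to hi."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]
```

For example, `linear_scale([2, 4, 6])` maps the three values onto the ends and midpoint of the unit interval.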

10 Clustering Algorithm. 1. Initially, each data item is a cluster by itself. 2. Merge the pair of clusters with the highest similarity value. 3. Update the similarity values. 4. Repeat steps 2 and 3 n - 1 times to reach one cluster of size n. No predefined number of clusters is needed.
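The four steps above can be sketched as follows. This is a deliberately naive illustration, not HCE's implementation: it recomputes cluster similarities on every pass instead of updating them incrementally, it uses average linkage as the cluster similarity, and the function name and sample data are hypothetical.

```python
from itertools import combinations


def agglomerate(points, similarity):
    """Return the merges in order; clusters are tuples of point indices."""
    # Step 1: every data item starts as its own cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def cluster_similarity(a, b):
        # Average linkage: mean similarity over all cross-cluster pairs.
        total = sum(similarity(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    # Step 4: n - 1 merges leave a single cluster of size n.
    while len(clusters) > 1:
        # Step 2: pick the most similar pair of clusters.
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: cluster_similarity(*pair))
        # Step 3: replace the pair by the merged cluster
        # (similarities are recomputed on the next pass).
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges


# Similarity here is negative distance, so closer points are more similar.
merges = agglomerate([0.0, 1.0, 10.0, 12.0], lambda p, q: -abs(p - q))
```

The recorded merge order is what a dendrogram draws: early merges sit near the leaves and the final merge is the root.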

11 Choosing Algorithm Parameters

12 Linkage Method. Average linkage; average group linkage; complete linkage; single linkage; Shneiderman's 1-by-1 linkage, which tries to grow the newly merged cluster of the last iteration first.
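As a hedged illustration of three of the listed methods, the sketch below uses the standard definitions expressed as distances between clusters (the slides speak of similarity, which is the same idea with the ordering reversed). Average group linkage and the 1-by-1 linkage are not sketched because the transcript does not define them. Function names are illustrative.

```python
def single_linkage(a, b, dist):
    """Distance between the closest pair of items across the two clusters."""
    return min(dist(p, q) for p in a for q in b)


def complete_linkage(a, b, dist):
    """Distance between the farthest pair of items across the two clusters."""
    return max(dist(p, q) for p in a for q in b)


def average_linkage(a, b, dist):
    """Mean distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def abs_diff(p, q):
    return abs(p - q)


left, right = [0.0, 1.0], [3.0, 5.0]
d_single = single_linkage(left, right, abs_diff)      # closest pair: 1.0 and 3.0
d_complete = complete_linkage(left, right, abs_diff)  # farthest pair: 0.0 and 5.0
d_average = average_linkage(left, right, abs_diff)    # mean of 3, 5, 2, 4
```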

13 Dendrogram View

14 7 Tabs

15 1D Histogram Interface. Interface description: control panel, score overview, ordered list, histogram browser.

16 1D Histogram Ordering. Ranking criteria: normality of the distribution (0~∞), with s: skewness and k: kurtosis; uniformity of the distribution (0~∞); number of potential outliers (0~n), with IQR = Q3 - Q1 and d: item value, distinguishing suspected outliers from extreme outliers; number of unique values (0~n); size of the biggest gap (0~max. dim. range), with mf: max frequency and t: tolerance.
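The formulas for these criteria did not survive in the transcript, so the sketch below relies on widely used conventions and should be read as an assumption rather than the slide's definitions: outliers are counted with Tukey-style fences (beyond 1.5 × IQR is suspected, beyond 3 × IQR is extreme), and uniformity is scored as the entropy of histogram bin counts. Function names are illustrative.

```python
import math
import statistics


def count_outliers(values):
    """Return (suspected, extreme) counts using IQR fences.

    Assumption: suspected = more than 1.5*IQR outside [Q1, Q3],
    extreme = more than 3*IQR outside [Q1, Q3].
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    suspected = extreme = 0
    for d in values:
        outside = max(q1 - d, d - q3)  # how far d lies outside [Q1, Q3]
        if outside > 3 * iqr:
            extreme += 1
        elif outside > 1.5 * iqr:
            suspected += 1
    return suspected, extreme


def count_unique(values):
    """Number of distinct values in the dimension."""
    return len(set(values))


def entropy(bin_counts):
    """Entropy (in bits) of a histogram; higher means closer to uniform."""
    total = sum(bin_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in bin_counts if c > 0)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 19, 30]
outliers = count_outliers(sample)
```

Each function yields one score per dimension, so sorting the dimensions by any one of them produces the kind of ordered list the interface slide describes.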

17 2D Scatterplot Interface. Interface description: control panel, score overview, ordered list, scatterplot browser.

18 2D Scatterplot Ordering. Ranking criteria. Statistical relationship: correlation coefficient (-1~1), Pearson's coefficient; least square error for curvilinear regression (0~1); quadracy (-∞~∞). Distribution characteristics: number of potential outliers (0~n), LOF-based (density-based outlier detection); number of items in area of interest (0~n); uniformity (0~∞).
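To illustrate how one of these criteria produces an ordering, here is a minimal sketch that scores every pair of dimensions with Pearson's correlation coefficient and sorts the pairs by score. The column names, data, and function names are made up for the example; the other criteria would plug into the same loop with a different scoring function.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_pairs(columns):
    """Score every pair of dimensions and sort from highest to lowest."""
    scores = {(a, b): pearson(columns[a], columns[b])
              for a, b in combinations(columns, 2)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


columns = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],   # perfectly correlated with "a"
    "c": [4, 3, 2, 1],   # perfectly anti-correlated with "a"
}
ranking = rank_pairs(columns)
```

The first entry of `ranking` is the most positively correlated pair and the last entries are the most negatively correlated, which mirrors browsing an ordered list of scatterplots from one end to the other.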

19 Demo

20 System Constraints. Computational complexity: for n data items in m-dimensional space, O(nm²), where O(n) is the scoring complexity and O(m²) is the number of dimension combinations. Display constraints: appropriate number of dimensions for the score overview component: 0~130; lack of sliders to adjust displacement.
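A quick back-of-the-envelope check of the O(nm²) figure, assuming one O(n) scoring pass per unordered pair of dimensions (the function names are illustrative):

```python
from itertools import combinations


def pair_count(m):
    """Number of unordered pairs of m dimensions: m(m-1)/2, i.e. O(m^2)."""
    return m * (m - 1) // 2


def scoring_work(n, m):
    """Rough operation count: one O(n) pass over the items per pair."""
    return n * pair_count(m)


pairs_at_limit = pair_count(130)       # pairs at the 130-dimension display limit
enumerated = len(list(combinations(range(5), 2)))  # same count by enumeration
```

At the 130-dimension limit mentioned for the score overview, that is 8,385 scatterplots to score, each requiring a pass over all n items.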

21 Evaluation of HCE 3.0. Linear color mapping (3 color or 1 color). Consistent layout of the components. Focus-context (F: dendrogram, C: rank-by-feature; F: ordered list, C: histogram, scatterplot). Item slider: dynamic query. Multi-window view: dynamic update of data selection in different windows.
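A 3-color linear mapping can be sketched as two linear ramps that meet at a middle value. The green/black/red palette, the clamping behavior, and the function names below are assumptions chosen for illustration; the transcript does not say which colors or ranges HCE uses.

```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB tuples, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def three_color(value, lo, mid, hi,
                colors=((0, 255, 0), (0, 0, 0), (255, 0, 0))):
    """Map value to a color: lo -> colors[0], mid -> colors[1], hi -> colors[2]."""
    value = min(max(value, lo), hi)  # clamp out-of-range values to the ends
    if value <= mid:
        return lerp_color(colors[0], colors[1], (value - lo) / (mid - lo))
    return lerp_color(colors[1], colors[2], (value - mid) / (hi - mid))


low_color = three_color(-1.0, -1.0, 0.0, 1.0)
mid_color = three_color(0.0, -1.0, 0.0, 1.0)
high_color = three_color(1.0, -1.0, 0.0, 1.0)
```

A 1-color mapping is the special case with a single ramp, for example from black at `lo` to one hue at `hi`.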

22 Future Work. User study. Various statistical tools and data mining algorithms. HCE 3.0 (HCE 3.5) → HCE 4.0 ??: 1D, 2D axis-parallel projection → 3D projection; numerical data format → numerical + categorical, binary, nominal; limited number of applicable datasets (US cities, cereal, netscan …) → more meaningful datasets to demonstrate the power of each ranking criterion; 1D: 5 ranking criteria, 2D: 6 ranking criteria → incorporate more criteria into the rank-by-feature framework.

23 Thank you! Questions?