1
1 A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data. Jinwook Seo, Ben Shneiderman, University of Maryland. Hyun Young Song (hsong@cs.umd.edu), Maryam Farboodi (farboodi@cs.umd.edu). Feb 09, 2006
2
2 HCE 3.0: HCE (Hierarchical Clustering Explorer). Main idea: GRID principles (Graphics, Ranking and Interaction for Discovery). Feature. Application: http://www.cs.umd.edu/hcil/hce/ User manual: http://www.cs.umd.edu/hcil/hce/hce3-manual/hce3_manual.html Dataset: http://www.cs.umd.edu/hcil/hce/examples/application_examples.html
3
3 Axis-Parallel vs. Non-Axis-Parallel Approach. Definition, for 3 dimensions X, Y and Z: an axis-parallel projection is onto X & Y, X & Z, or Y & Z; a non-axis-parallel projection can be onto a·X + b·Y and Z. Simplicity vs. power. Users.
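To make the distinction on this slide concrete, here is a minimal sketch in Python. The function names and the weights `a` and `b` are illustrative assumptions, not part of HCE. An axis-parallel view keeps two original coordinates untouched, while a non-axis-parallel view plots a linear combination of dimensions on one axis.

```python
def project_axis_parallel(point, i, j):
    """Axis-parallel view: keep coordinates i and j exactly as they are."""
    return (point[i], point[j])


def project_linear(point, a, b):
    """Non-axis-parallel view: horizontal axis is a*X + b*Y, vertical axis is Z."""
    x, y, z = point
    return (a * x + b * y, z)


# Two hypothetical 3D items (X, Y, Z).
points = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]

xy_view = [project_axis_parallel(p, 0, 1) for p in points]   # X & Y
mixed_view = [project_linear(p, 0.5, 0.5) for p in points]   # 0.5*X + 0.5*Y & Z
```

The axis-parallel view is easier to read because each axis is a named dimension of the data; the linear combination can expose structure that no pair of original dimensions shows, at the cost of an axis with no direct meaning.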
4
4 Related Work. Axis-parallel: machine learning, information visualization, pattern recognition. Subset of dimensions to find specific patterns. Machine learning and data mining: supervised/unsupervised classification; subspace-based clustering analysis; projections naturally partitioning the data set. Information visualization: permutation matrix; parallel coordinates (dimension ordering); conditional entropy.
5
5 Related Work (cont.) Non-axis-parallel: statisticians. Two-dimensional projection. SOM (Self-Organizing Maps). XGobi: grand tour, projection pursuit; no ranking. HD-Eye: interactive hierarchical clustering. OptiGrid (partitioning clustering algorithm).
6
6 Major Contributions GRID (Graphics, Ranking and Interaction for Discovery) Study 1D, study 2D, then find features Ranking guides insight, statistics confirm Visualization Techniques Overview Coordination (multiple windows) Dynamic query (item slider)
7
7 General Overview Menu Toolbar Overviews, Color setting Dendrogram (binary tree), scatterplot 7 tabs Color mosaic, Table view, Histogram Ordering, Scatterplot ordering, Profile search, Gene ontology, K-means
8
8 General Overview
9
9 Load/Transformation Data. Natural log. Standardization. Normalization: to the first column, median, linear scaling.
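The slide only names the transformations, so the following Python sketch fills in common textbook definitions as assumptions: standardization as a z-score using the population standard deviation, normalization as division by a reference value (the first column or the row median), and linear scaling as min-max scaling to [0, 1]. HCE's exact definitions may differ, and all function names here are hypothetical.

```python
import math
import statistics


def natural_log(values):
    """Natural log transform; assumes every value is strictly positive."""
    return [math.log(v) for v in values]


def standardize(values):
    """Z-score (assumed definition): subtract the mean, divide by the std. dev."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumes the values are not all identical
    return [(v - mean) / sd for v in values]


def normalize_to_first(row):
    """Assumed definition: express each value relative to the first column."""
    return [v / row[0] for v in row]


def normalize_to_median(row):
    """Assumed definition: express each value relative to the row median."""
    med = statistics.median(row)
    return [v / med for v in row]


def linear_scale(values, lo=0.0, hi=1.0):
    """Min-max scaling of the values onto the interval [lo, hi]."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]
```

For example, `linear_scale([2.0, 4.0, 6.0])` maps the smallest value to 0.0 and the largest to 1.0, which makes dimensions with different units comparable before clustering.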
10
10 Clustering Algorithm
1. Initially, each data item is a cluster by itself.
2. Merge the pair of clusters with the highest similarity value.
3. Update the similarity values.
4. Repeat steps 2 and 3 n - 1 times to reach one cluster of size n.
No predefined number of clusters is needed.
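The four steps on this slide can be sketched as a naive agglomerative loop in Python. This is a minimal illustration, not HCE's implementation: it assumes average linkage, recomputes similarities from scratch on every pass instead of updating a similarity matrix, and uses a hypothetical `closeness` function as the similarity measure.

```python
from itertools import combinations


def average_similarity(a, b, sim):
    """Average pairwise similarity between the items of clusters a and b."""
    return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))


def agglomerate(items, sim):
    """Return the list of merges, in order, as pairs of clusters (tuples)."""
    clusters = [(item,) for item in items]        # step 1: singleton clusters
    merges = []
    while len(clusters) > 1:                      # step 4: runs n - 1 times
        # Step 2: pick the pair of clusters with the highest similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_similarity(clusters[ij[0]], clusters[ij[1]], sim),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Step 3: similarities involving the new cluster are recomputed next pass.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def closeness(x, y):
    """Hypothetical similarity for numbers: larger means more alike."""
    return -abs(x - y)


merges = agglomerate([1.0, 1.1, 5.0, 5.2], closeness)
```

The ordered list of merges is exactly the information a dendrogram draws: each merge becomes an internal node of the binary tree, so the number of clusters can be chosen afterwards by cutting the tree.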
11
11 Choosing Algorithm Parameters
12
12 Linkage Method: Average Linkage; Average Group Linkage; Complete Linkage; Single Linkage; Shneiderman's 1-by-1 Linkage (tries to grow the newly merged cluster of the last iteration first).
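As a rough illustration of how the first four linkage methods differ, the Python sketch below computes the distance between two clusters of one-dimensional values. The definitions are the usual textbook ones and are assumptions here; in particular, average group linkage is taken to mean the distance between the two cluster means. Shneiderman's 1-by-1 linkage is not sketched because the slide describes it only informally.

```python
def single_linkage(a, b, dist):
    """Distance between the two closest members of clusters a and b."""
    return min(dist(x, y) for x in a for y in b)


def complete_linkage(a, b, dist):
    """Distance between the two farthest members of clusters a and b."""
    return max(dist(x, y) for x in a for y in b)


def average_linkage(a, b, dist):
    """Mean of all pairwise distances between members of a and b."""
    return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))


def average_group_linkage(a, b, dist):
    """Assumed definition: distance between the two cluster means (1D values)."""
    return dist(sum(a) / len(a), sum(b) / len(b))


def abs_diff(x, y):
    """Hypothetical distance for one-dimensional values."""
    return abs(x - y)


a, b = [1.0, 5.0], [2.0, 4.0]
results = {
    "single": single_linkage(a, b, abs_diff),
    "complete": complete_linkage(a, b, abs_diff),
    "average": average_linkage(a, b, abs_diff),
    "average_group": average_group_linkage(a, b, abs_diff),
}
```

The two example clusters interleave, so the four methods disagree: the closest pair is 1 apart, the farthest pair is 3 apart, the mean pairwise distance is 2, and the cluster means coincide.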
13
13 Dendrogram View
14
14 7 Tabs
15
15 1D Histogram Interface Interface description Control panel, Score overview, Ordered list, Histogram browser
16
16 1D Histogram Ordering
Ranking criteria:
  Normality of the distribution (0~∞), computed from s (skewness) and k (kurtosis)
  Uniformity of the distribution (0~∞)
  Number of potential outliers (0~n), with IQR = Q3 - Q1 and d an item value
    Suspected outlier: d < Q1 - 1.5·IQR or d > Q3 + 1.5·IQR
    Extreme outlier: d < Q1 - 3·IQR or d > Q3 + 3·IQR
  Number of unique values (0~n)
  Size of the biggest gap (0~max. dim. range), defined using mf (max frequency) and t (tolerance)
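The slide lists the inputs to each 1D score, so the Python sketch below uses standard definitions as assumptions: moment-based skewness and kurtosis combined as |s| + |k - 3| for normality (0 for a perfectly normal shape), Shannon entropy of a fixed-width histogram for uniformity, and IQR fences for outliers. The bin count, the quartile method, and all function names are illustrative choices; the biggest-gap score is not sketched.

```python
import math
import statistics
from collections import Counter


def skewness_kurtosis(values):
    """Moment-based skewness s and kurtosis k; assumes non-constant data."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def normality_score(values):
    """Assumed combination: |s| + |k - 3|; smaller means closer to normal."""
    s, k = skewness_kurtosis(values)
    return abs(s) + abs(k - 3)


def uniformity_score(values, bins=10):
    """Shannon entropy (bits) of a fixed-width histogram; higher is more uniform."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0            # avoid zero width for constant data
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def count_outliers(values, fence=1.5):
    """Items outside [Q1 - fence*IQR, Q3 + fence*IQR]; fence=3.0 for extreme ones."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return sum(1 for d in values if d < q1 - fence * iqr or d > q3 + fence * iqr)


def count_unique(values):
    """Number of distinct values in the dimension."""
    return len(set(values))
```

Each function maps one dimension to a single number, which is what lets a rank-by-feature interface sort all histograms by the chosen criterion.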
17
17 2D Scatterplot Interface Interface description Control panel, Score overview, Ordered list, Scatterplot browser
18
18 2D Scatterplot Ordering
Ranking criteria:
  Statistical relationship
    Correlation coefficient (-1~1): Pearson's coefficient
    Least square error for curvilinear regression (0~1)
    Quadracity (-∞~∞)
  Distribution characteristics
    Number of potential outliers (0~n): LOF-based, density-based outlier detection
    Number of items in area of interest (0~n)
    Uniformity (0~∞)
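Two of the 2D criteria are simple enough to sketch directly. The Python below is a minimal illustration with hypothetical function names: Pearson's correlation coefficient for a pair of dimensions, and a count of items inside a rectangular area of interest (the rectangle shape is an assumption). The regression-based and LOF-based scores are not sketched.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient; assumes neither dimension is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def items_in_area(xs, ys, x_range, y_range):
    """Count the items whose (x, y) falls inside the rectangle of interest."""
    return sum(
        1
        for x, y in zip(xs, ys)
        if x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]
    )
```

For instance, `pearson([1, 2, 3], [2, 4, 6])` is at the top of the -1~1 range because the second dimension is an exact positive multiple of the first.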
19
19 Demo
20
20 System Constraints
Computational complexity: n data items in m-dimensional space: O(nm²)
  O(n): scoring complexity
  O(m²): combinations of dimensions
Display constraints
  Appropriate number of dimensions for the score overview component: 0~130
  Lack of sliders to adjust displacement
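The O(nm²) figure follows from scoring every pair of dimensions once: there are m(m - 1)/2 pairs, and each score reads the n items of both dimensions. A minimal Python sketch of that loop is below; `rank_pairs` and the `closeness` score are hypothetical stand-ins for any O(n) ranking criterion.

```python
from itertools import combinations


def rank_pairs(columns, score):
    """Score every unordered pair of dimensions and sort best-first.

    columns maps a dimension name to its list of n values.
    With m dimensions this evaluates score m*(m-1)/2 times.
    """
    scored = [
        ((a, b), score(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def closeness(xs, ys):
    """Hypothetical O(n) score: negative mean absolute difference."""
    return -sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


columns = {"a": [1, 2, 3], "b": [1, 2, 4], "c": [10, 20, 30]}
ranking = rank_pairs(columns, closeness)
```

With 130 dimensions, the upper end of the display range on this slide, the loop evaluates 130 × 129 / 2 = 8385 pairs, which is also the number of cells the score overview has to show.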
21
21 Evaluation of HCE 3.0: linear color mapping (3 colors or 1 color); consistent layout of the components; focus-context (F: dendrogram, C: rank-by-feature; F: ordered list, C: histogram, scatterplot); item slider (dynamic query); multi-window view (dynamic update of data selection in different windows).
22
22 Future Work
User study
Various statistical tools and data mining algorithms
HCE 3.0 (HCE 3.5) -> HCE 4.0 ??
  1D, 2D axis-parallel projection -> 3D projection
  Numerical data format -> numerical + categorical, binary, nominal
  Limited number of applicable datasets (US cities, cereal, netscan …) -> more meaningful datasets to demonstrate the power of each ranking criterion
  1D: 5 ranking criteria; 2D: 6 ranking criteria -> incorporate more criteria into the rank-by-feature framework
23
23 Thank you! Questions?