Information Visualization Design for Multidimensional Data: Integrating the Rank-by-Feature Framework with Hierarchical Clustering Dissertation Defense.

1 Information Visualization Design for Multidimensional Data: Integrating the Rank-by-Feature Framework with Hierarchical Clustering Dissertation Defense Human-Computer Interaction Lab & Dept. of Computer Science Jinwook Seo

2 Outline Research Problems Clustering Result Visualization in HCE GRID Principles Rank-by-Feature Framework Evaluation –Case studies –User survey via emails Contributions and Future work

3 Exploration of Multidimensional Data To understand the story that the data tells To find features in the data set To generate hypotheses Lost in multidimensional space Tools and techniques are available in many areas Strategy and interface to organize them to guide discovery

4 Constrained by Conventions Multidimensional Data Statistical Methods Data Mining Algorithms User/Researcher Conventional Tools

5 Boosting Information Bandwidth Multidimensional Data Statistical Methods Data Mining Algorithms Information Visualization Interfaces User/Researcher

6 Contributions Graphics, Ranking, and Interaction for Discovery (GRID) principles Rank-by-Feature Framework The design and implementation of the Hierarchical Clustering Explorer (HCE) Validation through case studies and user surveys

7 Hierarchical Clustering Explorer: Understanding Clusters Through Interactive Exploration Overview of the entire clustering results → compressed overview The right number of clusters → minimum similarity bar Overall pattern of each cluster (aggregation) → detail cutoff bar Compare two results → brushing and linking using pair-tree
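
The idea behind the minimum similarity bar can be sketched in a few lines: a dendrogram is cut at a similarity threshold, and every subtree whose root similarity meets the threshold becomes one cluster. This is a minimal illustration, not HCE's C++ implementation. The nested-tuple dendrogram encoding and the names `leaves` and `cut` are assumptions made for this example, and the sketch assumes similarity never increases toward the root.

```python
# Hypothetical dendrogram encoding (not HCE's data structure):
# a leaf is a string label; an internal node is a tuple
# (similarity, left_subtree, right_subtree).

def leaves(node):
    """Return all leaf labels under a node, left to right."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def cut(node, min_similarity):
    """Split a dendrogram into clusters at a similarity threshold.

    A subtree stays together as one cluster when its root similarity is
    at least min_similarity; otherwise it is split and each child is cut
    in turn. Single leaves always form their own cluster.
    """
    if isinstance(node, str) or node[0] >= min_similarity:
        return [leaves(node)]
    _, left, right = node
    return cut(left, min_similarity) + cut(right, min_similarity)


# Toy dendrogram over five items.
tree = (0.2, (0.9, "a", "b"), (0.6, "c", (0.8, "d", "e")))

clusters_low = cut(tree, 0.1)   # threshold below the root: one cluster
clusters_mid = cut(tree, 0.5)   # two clusters
clusters_high = cut(tree, 0.7)  # three clusters
```

Raising the threshold, like dragging the bar toward higher similarity, yields more and tighter clusters; lowering it merges them.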

8 HCE History Document-View Architecture 72,274 lines of C++ code, 76 C++ classes About 2,500 downloads since April 2002 Commercial license to a biotech company (www.vialactia.com) Freely downloadable at www.cs.umd.edu/hcil/hce

9 Goal: Find Interesting Features in Multidimensional Data Finding clusters, outliers, correlations, gaps, … is difficult in multidimensional data –Cognitive difficulties in >3D Therefore utilize low-dimensional projections –Perceptual efficiency in 1D and 2D –Orderly process to guide discovery

10 Do you see anything interesting?

11 Do you see any interesting feature?

12 Correlation…What else?

13 Outliers He Rn

14 GRID Principles Graphics, Ranking, and Interaction for Discovery in Multidimensional Data (1) study 1D, study 2D, then find features (2) ranking guides insight, statistics confirm

15

16 Rank-by-Feature Framework Based on the GRID principles 1D → 2D –1D : Histogram + Boxplot –2D : Scatterplot Ranking Criteria –statistical methods –data mining algorithms Graphical Overview Rapid & Interactive Browsing
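
As a rough illustration of ranking 2D projections by a statistical criterion, the sketch below scores every pair of columns with the Pearson correlation coefficient and sorts the pairs from most positive to most negative. It is not taken from HCE; the function names `pearson` and `rank_pairs` and the toy data are assumptions, and the sketch assumes no column is constant (which would make the coefficient undefined).

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_pairs(columns):
    """Score every pair of columns and sort by descending correlation.

    columns maps a column name to its list of values.
    Returns a list of ((name_a, name_b), score) tuples.
    """
    scored = [
        ((a, b), pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: y rises with x, z falls with x.
columns = {
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [4, 3, 2, 1],
}
ranking = rank_pairs(columns)
```

Any other score that maps a pair of columns to a number could replace `pearson` here; the ranked list is what drives the ordered browsing of scatterplots.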

17 A Ranking Example: 3138 U.S. counties with 17 attributes Pearson correlation (0.996, 0.31, 0.01, -0.69) Uniformness (entropy) (6.7, 6.1, 4.5, 1.5)
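
One plausible reading of the uniformness criterion is the Shannon entropy of a column's histogram: values spread evenly over the bins give high entropy, values piled into a few bins give low entropy. The sketch below follows that reading. The function name `histogram_entropy`, the equal-width binning, and the bin count are assumptions for illustration; the slides do not state how HCE bins the data.

```python
from math import log2


def histogram_entropy(values, bins=8):
    """Shannon entropy (in bits) of an equal-width histogram of values.

    The maximum, log2(bins), is reached when every bin holds the same
    number of values; a constant column has entropy 0.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return 0.0
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts if c)


uniform_score = histogram_entropy(list(range(8)), bins=8)
skewed_score = histogram_entropy([0, 0, 0, 0, 0, 0, 0, 7], bins=8)
```

Sorting columns by this score puts the most evenly spread 1D distributions first and the most concentrated ones last.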

18 Categorical Variables in RFF New ranking criteria –Chi-square, ANOVA Significance and Strength –How strong is a relationship? –How significant is a relationship? Partitioning and Comparison –partition by a column (categorical variable) –partition by a row (class info for columns) –compare clustering results for partitions

19 color: Contingency coefficient C, size: Chi-square p-value; color: Quadracity, size: Least-squares error
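
The strength measure named here can be sketched from its textbook definition: Pearson's contingency coefficient is C = sqrt(chi2 / (chi2 + n)), where chi2 is the chi-square statistic of the contingency table and n is the total count. The code below is an illustration under that definition, not HCE's implementation. The function names are assumptions, every row and column total is assumed to be nonzero, and the p-value is left out because it needs the chi-square distribution, which the Python standard library does not provide.

```python
from math import sqrt


def chi_square(table):
    """Chi-square statistic and total count of a contingency table.

    table is a list of rows of observed counts. Expected counts come
    from the row and column totals under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    stat, n = chi_square(table)
    return sqrt(stat / (stat + n))


independent = [[10, 10], [10, 10]]   # no association
dependent = [[20, 0], [0, 20]]       # perfect association

c_independent = contingency_coefficient(independent)
c_dependent = contingency_coefficient(dependent)
```

C is 0 for independent variables and grows with the strength of the association, which makes it usable as a color scale, while a p-value answers the separate question of significance.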

20 Categorical Variables in RFF New ranking criteria –Chi-square, ANOVA Significance and Strength –How strong is a relationship? –How significant is a relationship? Partitioning and Comparison –partition by a column (categorical variable) –partition by a row (class info for columns) –compare clustering results for partitions

21 Partitioning and Comparison Columns: s1 s2 s3 s4 s5 s6 s7 FieldType: integer real integer categorical Rows: i1 M, i2 M, i3 M, …, i(n-1) F, i(n) F Compare two column-clustering results

22 Partitioning and Comparison Columns: s1 s2 s3 s4 s5 s6 CID: 1 1 1 2 2 2 FieldType: integer real integer Rows: i1, i2, i3, …, i(n-1), i(n) Compare two row-clustering results

23 Qualitative Evaluation Case studies –30-minute weekly meeting for 6 weeks individually –observe how participants use HCE –improve HCE according to their requirements –1 molecular biologist (Acute lung injuries in mice) –1 biostatistician (FAMuSS Study data) –1 meteorologist (Aerosol measurement)

24 Lessons Learned Rank-by-Feature Framework –Enables systematic/orderly exploration –Prevents users from missing important features –Helps confirm known features –Helps identify unknown features –Reveals outliers as signal/noise More work needed –Transformation of variables –More ranking criteria –More interactions

25 User Survey via Email 1500 user survey emails 13 questions on HCE and RFF 60% successfully sent out 85 users replied 60 users answered a majority of questions 25 users were just curious

26 Which features have you used? Do you think HCE improved the way you analyze your data set?

27 Future Work Integrating RFF with Other Tools –More ranking criteria –GRID principles available in other tools Scaling-up –Selection/Filtering to handle large number of dimensions Interaction in RFF Further Evaluation

28

29 Future Work Integrating RFF with Other Tools –More ranking criteria –GRID principles available in other tools Scaling-up –Selection/Filtering to handle large number of dimensions Interaction in RFF Further Evaluation

30 Contributions Graphics, Ranking, and Interaction for Discovery (GRID) principles Rank-by-Feature Framework The design and implementation of the Hierarchical Clustering Explorer (HCE) Validation through case studies and user surveys

31

