
Interactive Exploration of Multidimensional Data
By: Sanket Sinha, Nitin Madnani


1. Interactive Exploration of Multidimensional Data
By: Sanket Sinha, Nitin Madnani

2. Is It Really That Common?
- You bet:
  - Demographics
  - Economics
  - Census
  - Microarray gene expression
  - Engineering
  - Psychology
  - Health

3. I Can’t See It, I Tell Ya!
- Visualization challenges for data with three or more dimensions:
  - Relationship comprehension is difficult
  - Discovering outliers, clusters and gaps is almost impossible
  - Orderly exploration is not possible with standard visualization systems
  - Navigation is cognitively onerous and disorienting (3D)
  - Occlusion (3D)

4. Standard Solution
- Can you say “Pro-jek-shun”?
- Use lower-dimensional projections of data:
  - 1D: histograms
  - 2D: scatterplots

5. But There Are So Many!
- For 13 dimensions (columns):
  - Number of histograms = 13
  - Number of scatterplots = C(13,2) = 78
- Must examine a series of these to gain insights
- Unsystematic == inefficient
- Must have order!
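The counts on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the column names `dim1` to `dim13` are made up, and the only assumption is that each column yields one histogram and each unordered pair of columns yields one scatterplot.

```python
from itertools import combinations
from math import comb

# 13 hypothetical column names standing in for the dimensions of a dataset.
columns = [f"dim{i}" for i in range(1, 14)]

# One 1D projection (histogram) per column.
histograms = list(columns)

# One 2D projection (scatterplot) per unordered pair of columns.
scatterplots = list(combinations(columns, 2))

num_histograms = len(histograms)
num_scatterplots = len(scatterplots)

# The enumeration agrees with the closed form C(13, 2).
assert num_scatterplots == comb(len(columns), 2)
print(num_histograms, num_scatterplots)
```

The number of scatterplots grows quadratically with the number of columns, which is why an ordering over them becomes useful.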

6. Introducing Rank-by-Feature
- Allows projections to be examined in an orderly fashion
- A powerful framework for interactive detection of:
  - Inter-dimension relationships
  - Gaps
  - Outliers
  - Patterns

7. How Does It Work?
- Framework defines ranking criteria for 1D and 2D projections
- User selects criterion of interest
- All projections are scored on the criterion and ranked
- User examines projections in the order recommended
- Eureka*!!

*Disclaimer: Not all users may be able to make life-altering discoveries
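The score-then-rank loop described on this slide might be sketched as follows. This is not the tool's actual implementation: the function names, the dictionary-of-columns data layout, the sample values, and the choice of "biggest gap" as the selected criterion are all assumptions made for illustration.

```python
from itertools import combinations

def rank_projections(data, criterion, dims=1):
    """Score every 1D or 2D projection with `criterion`, highest score first.

    data: dict mapping a column name to a list of numbers (assumed layout).
    criterion: function taking `dims` lists of numbers and returning a score.
    """
    names = sorted(data)
    scored = []
    for projection in combinations(names, dims):
        score = criterion(*(data[name] for name in projection))
        scored.append((score, projection))
    # Sort on the score only, so ties never compare the name tuples.
    return sorted(scored, key=lambda item: item[0], reverse=True)

def biggest_gap(values):
    """Example 1D criterion: largest distance between neighbouring values."""
    ordered = sorted(values)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0)

# Made-up sample data with three columns.
data = {
    "age": [21, 22, 23, 24, 25],
    "income": [10, 11, 12, 50, 51],
    "score": [1, 3, 5, 7, 20],
}

ranking = rank_projections(data, biggest_gap, dims=1)
for score, projection in ranking:
    print(projection, score)
```

Passing `dims=2` together with a two-argument criterion would rank scatterplots in the same way.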

8. Ranking Criteria - 1D
- Normality: Indicative of how “Gaussian” the dataset is
- Uniformity: How “uniform” is the dataset? (How high is the entropy?)
- Outliers: The number of potential outliers in the dataset
- Gap: The size of the biggest gap
- Uniqueness: Number of unique data points
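Four of the 1D criteria above could be scored roughly as in the sketch below. The slide does not give exact definitions, so the choices here are assumptions: uniformity is taken as the Shannon entropy of an equal-width histogram, and an outlier is any value more than 1.5 times the interquartile range outside the quartiles. Normality is left out because the slide does not say which test it refers to.

```python
import math
from collections import Counter
from statistics import quantiles

def histogram_entropy(values, bins=4):
    """Uniformity: entropy (in bits) of an equal-width histogram of the values."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return 0.0  # every value falls into a single bin
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def count_outliers(values):
    """Outliers: values beyond 1.5 * IQR from the quartiles (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return sum(1 for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr)

def biggest_gap(values):
    """Gap: largest distance between neighbouring sorted values."""
    ordered = sorted(values)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0)

def uniqueness(values):
    """Uniqueness: number of distinct values."""
    return len(set(values))

# Made-up sample column with one obvious straggler.
values = [1, 2, 2, 3, 3, 4, 4, 5, 30]
scores = {
    "uniformity": histogram_entropy(values),
    "outliers": count_outliers(values),
    "gap": biggest_gap(values),
    "uniqueness": uniqueness(values),
}
print(scores)
```

Each function returns a single number per column, which is all a ranking needs to order the histograms.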

9. Ranking Criteria - 2D
- Linear Correlation: Pearson’s correlation coefficient
- LSE: Least square error from the optimal quadratic curve fit
- Quadracity: Coefficient of the quadratic term in the fitted curve equation
- Uniformity: Joint entropy
- ROI: Number of items in a Region Of Interest
- Outliers: Number of potential outliers
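Three of the 2D criteria above (linear correlation, LSE, and quadracity) can be illustrated with a small, dependency-free sketch. It assumes the quadratic curve has the form y = a*x^2 + b*x + c and is fitted by ordinary least squares; the helper names and sample points are made up, and the tool itself may compute these scores differently.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient (undefined if either column is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def quadratic_fit(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c.

    Returns (a, b, c, sse): `a` is the quadracity score and `sse`, the sum of
    squared residuals, is the LSE score. Needs at least 3 distinct x values.
    """
    s = [sum(x ** k for x in xs) for k in range(5)]                  # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^k
    # Normal equations for the unknowns (a, b, c).
    A = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(A)
    coeffs = []
    for col in range(3):  # Cramer's rule: swap one column for the right-hand side
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    a, b, c = coeffs
    sse = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    return a, b, c, sse

# A perfect parabola: no linear correlation, but a strong quadratic pattern.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
r = pearson(xs, ys)
a, b, c, sse = quadratic_fit(xs, ys)
print(r, a, sse)
```

The parabola example shows why the criteria complement each other: a scatterplot can rank low on linear correlation while ranking high on quadracity with a small LSE.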

10. Put a Demo Where Your Mouth Is!

11. HCE Overview

12. The Input Dialog Box
- Perform filtering and normalization

13. Histogram Ordering

14. Scatterplot Ordering

15. Tabular View of Data
- Select specific data records and annotate if needed

16. Questions/Critiques
- What does “outlierness” mean?
- Cannot identify datapoints in histogram or scatterplot browser without switching to table view
  - Especially in ROI
- How to intuitively interpret:
  - Outliers in 2D
  - LSE
  - Quadracity

