
1 IIIT Hyderabad
Interactive Visualization and Tuning of Multi-Dimensional Clusters for Indexing
Dasari Pavan Kumar (MS by Research Thesis)
Centre for Visual Information Technology

2 Overview
- Provide a framework to generate better clusters for high-dimensional data points
- Provide a fast cluster analysis/generation tool

3 Data, Data, Data!
- Digital data is being created at an unprecedented rate.
- Data is collected to extract or search for “valuable” information, which is a difficult task.
- Data generated in the previous decade was mostly textual and could be indexed with inverted indices, suffix trees, N-grams, etc.

4 More data!
- Flickr, YouTube, etc. changed the game:
  - non-textual information (images)
  - huge amounts of data
- New methods are needed (Content-Based Image Retrieval), although the underlying processes remain similar.
- Why image search? Copyright infringement, offensive content, education, etc.

5 Multi-dimensional, multi-variate data
- Examples: stock markets, weather/climate, business
- Huge datasets with many dimensions
- Finding “insights” can't be fully automated.

6 Data Visualization
- Human intelligence and cognition are still unmatched by computers.
- Cluster analysis is a form of descriptive modeling.
- Information visualizations support the analysis by helping identify important features and patterns.

7 Past Attempts
What if you have millions of high-dimensional data points?
- XmdvTool (M. Ward): scatter-plot matrix, parallel coordinate plot
- Cluster tree (Stuetzle)
- Cone trees (Robertson et al.)

8 Indexing images/videos
- Extract feature vectors from images.
- Apply clustering to compute a bag of (visual) words.
- Generate a feature histogram per image and apply machine-learning methods (e.g. classification).
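A minimal sketch of the histogram step follows. It assumes the visual words are k-means centers and that each descriptor votes for its nearest center under Euclidean distance. `bow_histogram` is a hypothetical name and is not part of the described tool.

```python
import numpy as np


def bow_histogram(descriptors: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Quantize descriptors to their nearest visual word and return an
    L1-normalized histogram with one bin per visual word.

    Assumption: visual words are cluster centers (e.g. from k-means) and
    assignment uses plain Euclidean distance.
    """
    # Squared Euclidean distance from every descriptor to every center: shape (n, k)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)  # avoid dividing by zero for an empty image


centers = np.array([[0.0, 0.0], [10.0, 10.0]])   # two visual words in 2-D
desc = np.array([[1.0, 0.0], [9.0, 10.0], [0.0, 2.0]])
print(bow_histogram(desc, centers))  # [0.66666667 0.33333333]
```

With millions of descriptors, the (n, k, d) broadcast above would not fit in memory. A practical version would process descriptors in chunks.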

9 Indexing images/videos

10 Using SIFT features
- The fundamental problem is the sheer volume of data:
  - number of dimensions: 128
  - number of data points: millions
- Other low-level image features exist: GLOH, steerable filters, spin images.

11 Clusters + visualization
- The problem: choosing the right bag of words (clusters)
- Better visual words lead to better classification.

12 Cluster analysis
Provide a framework for the user to:
- identify better subspaces
- compute clusters efficiently and quickly
- compare clustering schemas

13 Framework
Extracted low-level image descriptors → statistical sampling → sample of manageable size (still high-dimensional) → priority/weight assignment to features → clustering (visual words) → visualization system → verification
- Weights come from automatic weight recommendations 1 to N, or from user-defined weight re-assignment.
- Good result: cluster the entire set and output the schema.
- Bad result: reassign the weights and repeat.
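The framework uses statistical sampling to reduce the descriptor set to a manageable size before clustering. The slides do not say which sampling scheme is used. The sketch below assumes uniform random sampling without replacement, and `sample_descriptors` is a hypothetical helper name.

```python
import numpy as np


def sample_descriptors(descriptors: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Return a uniform random subset of rows, drawn without replacement.

    Assumption: plain uniform sampling; the framework's actual sampling
    scheme is not specified in the slides.
    """
    n = descriptors.shape[0]
    if n_samples >= n:
        return descriptors.copy()          # nothing to reduce
    rng = np.random.default_rng(seed)      # fixed seed => reproducible sample
    idx = rng.choice(n, size=n_samples, replace=False)
    return descriptors[idx]


X = np.random.default_rng(1).random((10_000, 128), dtype=np.float32)
S = sample_descriptors(X, 1_000)
print(S.shape)  # (1000, 128)
```

Fixing the seed makes a sample reproducible. This matters when several weightings are compared on the same subset.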

14 Tool

15 Framework

16 Why prioritize dimensions?
Dimensionality reduction:
- feature transformation
- feature selection

17 Why not feature transformation?
- Dimensions can be redundant or irrelevant, so PCA can't be applied trivially.
- Clusters can get lost in a cloud of dimensions (curse of dimensionality).
- Transformed dimensions are combinations of the original ones, which makes them difficult to interpret.

18 Feature selection
- Wrapper model: “wraps” the selection process around the mining algorithm. The two go hand in hand, leaving the user little control.
- Filter model: examines intrinsic properties of the data.

19 “Interesting” dimensions
- Without any ranking: analyze the density distribution using grids. The results are difficult to compare, since they depend heavily on the density parameter.
- Ranking dimensions based on the distribution of their data:
  - uniformity (entropy)
  - number of outliers, where a value $d$ is an outlier if $d > Q_3 + 1.5\,\mathrm{IQR}$ or $d < Q_1 - 1.5\,\mathrm{IQR}$
  - number of unique values
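The three ranking statistics can be computed per dimension roughly as shown below. This is a sketch under stated assumptions: entropy is measured on a fixed 32-bin histogram, and `dimension_stats` is a hypothetical helper. The slides do not say how the statistics are combined into a single rank.

```python
import numpy as np


def dimension_stats(X: np.ndarray, bins: int = 32) -> list[tuple[float, int, int]]:
    """Per-dimension statistics for an (n, d) array:
    (histogram entropy in bits, IQR-rule outlier count, number of unique values).

    Assumption: entropy uses a fixed-bin histogram; the bin count is arbitrary.
    """
    stats = []
    for j in range(X.shape[1]):
        col = X[:, j]
        counts, _ = np.histogram(col, bins=bins)
        p = counts[counts > 0] / counts.sum()
        entropy = float(-(p * np.log2(p)).sum())  # high = close to uniform
        q1, q3 = np.percentile(col, [25, 75])
        iqr = q3 - q1
        # Outlier rule from the slide: d > Q3 + 1.5*IQR or d < Q1 - 1.5*IQR
        outliers = int(((col > q3 + 1.5 * iqr) | (col < q1 - 1.5 * iqr)).sum())
        unique = int(np.unique(col).size)
        stats.append((entropy, outliers, unique))
    return stats


rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0.0, 1.0, 1000),                  # spread-out dimension
    np.r_[np.zeros(990), np.full(10, 50.0)],      # nearly constant, with outliers
])
for j, (h, n_out, n_uniq) in enumerate(dimension_stats(X)):
    print(f"dim {j}: entropy={h:.2f} bits, outliers={n_out}, unique={n_uniq}")
```

In the nearly constant dimension the IQR is zero, so every non-zero value counts as an outlier. That is the expected behaviour of the IQR rule on such data.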

20 Ranked dimensions
- Assign weights based on the amount of “interestingness”, judged from:
  - the 1D histogram of each dimension's distribution
  - 2D correlations in the parallel coordinate plot (PCP)
- How do we assign weights? Manually, aided by automatic suggestions.

21 Glyph view
- Standard SIFT glyph
- Bar chart: length encodes rank, color encodes weight (colormap)
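The sketch below shows one way to place each of the 128 SIFT dimensions in the glyph. It assumes a layout of 4 × 4 spatial cells with 8 orientation bins (multiples of 45°) per cell, with the orientation index varying fastest. Descriptor extractors differ in how they order dimensions, so this mapping is an assumption and not the tool's code. `sift_dim_to_glyph` and `is_corner_cell` are hypothetical helpers.

```python
def sift_dim_to_glyph(dim: int) -> tuple[int, int, int]:
    """Map a 0-based SIFT dimension to (cell_row, cell_col, orientation_deg).

    Assumption: 4x4 spatial cells x 8 orientation bins, orientation index
    varying fastest. Other extractors may order dimensions differently.
    """
    if not 0 <= dim < 128:
        raise ValueError("SIFT descriptors have 128 dimensions (4 x 4 x 8)")
    cell, ori = divmod(dim, 8)   # 8 orientation bins per spatial cell
    row, col = divmod(cell, 4)   # 4 x 4 grid of cells
    return row, col, ori * 45


def is_corner_cell(dim: int) -> bool:
    """True if the dimension belongs to one of the four corner cells."""
    row, col, _ = sift_dim_to_glyph(dim)
    return row in (0, 3) and col in (0, 3)


print(sift_dim_to_glyph(0))    # (0, 0, 0)
print(sift_dim_to_glyph(127))  # (3, 3, 315)
```

A mapping like this lets per-dimension weights be drawn as bars inside the matching cell and orientation of the glyph.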

22 Framework

23 Data clustering
- Sample data set: 1.3 million points with 128 dimensions
- Clustering such data on a commodity PC is almost impossible.

24 Data clustering
- A plug-in for any clustering technique; k-means on the GPU is currently used.
- Currently 200 iterations for 1.3 million SIFT vectors, at 12 s per iteration for 1000 clusters (about 40 minutes in total).
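The weighted clustering (the EW/IW variants in the results) can be sketched as Lloyd's k-means on a rescaled copy of the data. This CPU illustration assumes that weights act as per-dimension multipliers on squared distances. The tool itself runs k-means on the GPU, and `weighted_kmeans` is a hypothetical name.

```python
import numpy as np


def weighted_kmeans(X: np.ndarray, k: int, weights, n_iter: int = 20, seed: int = 0):
    """Lloyd's k-means under a per-dimension weighted squared Euclidean distance,
    sum_j w_j * (x_j - c_j)^2.

    Assumption: this is how weights enter the distance. Scaling column j by
    sqrt(w_j) turns it into a plain squared Euclidean distance, so a weight
    of 0 removes a dimension entirely. Returns (labels, centers); centers
    are in the scaled space.
    """
    rng = np.random.default_rng(seed)
    Xw = X * np.sqrt(np.asarray(weights, dtype=float))        # scaled copy, (n, d)
    centers = Xw[rng.choice(len(Xw), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # Assignment step: nearest center for every point (n x k x d work)
        d2 = ((Xw[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = Xw[labels == c]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[c] = members.mean(axis=0)
    return labels, centers


rng = np.random.default_rng(0)
a = np.column_stack([rng.normal(0.0, 0.1, 50), rng.normal(0.0, 5.0, 50)])
b = np.column_stack([rng.normal(3.0, 0.1, 50), rng.normal(0.0, 5.0, 50)])
X = np.vstack([a, b])
# Dimension 1 is pure noise; giving it weight 0 lets dimension 0 decide
labels, _ = weighted_kmeans(X, 2, weights=[1.0, 0.0])
```

Each iteration costs n × k × d distance terms. For the setting above that is about 1.3 million × 1000 × 128 ≈ 1.7 × 10^11, which is why a GPU implementation is needed.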

25 Framework

26 Cluster Visualization
- Visualizing clusters over 128 dimensions is not feasible.
- Re-project them into 2D space, which requires some kind of layout.
- Any graph-drawing method can be plugged in; a 2D force-based layout is currently used.

27 Graph representation
- Compute a cluster tree of the nearest-neighbor density:
  - similar nodes must be close
  - this can be estimated using an MST
- Generate a minimum spanning tree (MST) of the cluster centers:
  - corresponds to a single-linkage dendrogram
  - built with Prim's method
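Prim's method can be sketched over the complete graph of cluster centers, with Euclidean edge weights. It runs in O(k²) time, which is fine for about 1000 centers. `prim_mst` is a hypothetical helper name.

```python
import numpy as np


def prim_mst(centers: np.ndarray) -> list[tuple[int, int, float]]:
    """Minimum spanning tree of the complete graph over cluster centers
    using Prim's method. Assumption: Euclidean edge weights.
    Returns edges as (parent, child, weight) in the order they were added."""
    k = len(centers)
    in_tree = np.zeros(k, dtype=bool)
    best = np.full(k, np.inf)   # cheapest known edge from the tree to each node
    parent = np.full(k, -1)
    best[0] = 0.0               # start the tree at node 0
    edges = []
    for _ in range(k):
        # Pick the cheapest node not yet in the tree
        u = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((int(parent[u]), u, float(best[u])))
        # Relax edges from u to nodes still outside the tree
        dist = np.linalg.norm(centers - centers[u], axis=1)
        closer = (~in_tree) & (dist < best)
        best[closer] = dist[closer]
        parent[closer] = u
    return edges


centers = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [5.0, 1.0]])
print(prim_mst(centers))  # [(0, 1, 1.0), (1, 2, 4.0), (2, 3, 1.0)]
```

Removing the longest MST edges yields the single-linkage clusters, which is why the MST and the single-linkage dendrogram carry the same information.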

28 Graph drawing
- A GPU implementation of force-based graph layout takes 0.2 s for 1000 nodes.
- Drill down into a “visual word” to see the actual SIFT interest points and understand their similarity.
(Figures: MST without layout; MST with layout.)
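The tool uses a GPU force-based layout. As a CPU illustration of the same idea, the sketch below implements a simplified Fruchterman-Reingold scheme. The force laws, the cooling schedule and the name `force_layout` are assumptions chosen for illustration.

```python
import numpy as np


def force_layout(n: int, edges, iters: int = 200, seed: int = 0) -> np.ndarray:
    """Simplified Fruchterman-Reingold layout (CPU sketch); returns (n, 2) positions.

    Assumption: all node pairs repel with magnitude k^2/d, edges attract with
    magnitude d^2/k, and each step is capped by a cooling 'temperature'.
    """
    rng = np.random.default_rng(seed)
    pos = rng.random((n, 2))                  # random start in the unit square
    k = np.sqrt(1.0 / n)                      # ideal edge length
    src = np.array([a for a, _ in edges], dtype=int)
    dst = np.array([b for _, b in edges], dtype=int)
    t = 0.1                                   # maximum step length
    for _ in range(iters):
        delta = pos[:, None, :] - pos[None, :, :]              # (n, n, 2)
        dist = np.linalg.norm(delta, axis=2) + 1e-9
        # Repulsion: (k^2 / d) * unit vector = (k^2 / d^2) * delta; self-pairs add 0
        disp = ((k * k / dist**2)[:, :, None] * delta).sum(axis=1)
        # Attraction along edges: (d^2 / k) * unit vector = (d / k) * edge vector
        vec = pos[src] - pos[dst]
        length = np.linalg.norm(vec, axis=1) + 1e-9
        pull = (length / k)[:, None] * vec
        np.add.at(disp, src, -pull)           # src moves toward dst
        np.add.at(disp, dst, pull)            # dst moves toward src
        # Move each node along its displacement, capped at the temperature
        mag = np.linalg.norm(disp, axis=1) + 1e-9
        pos += disp / mag[:, None] * np.minimum(mag, t)[:, None]
        t *= 0.98                             # cool down
    return pos


path = [(0, 1), (1, 2), (2, 3)]   # e.g. a small MST over four cluster centers
p = force_layout(4, path)
```

The all-pairs repulsion is O(n²) per iteration. This cost is what makes a GPU version attractive at 1000 nodes and beyond.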

29 Similar-looking regions are clustered into the same ID.

30 Cluster validation
- Comparing two clustering schemas visually is not feasible.
- Three basic strategies:
  - Internal: compare schema C with the proximity matrix.
  - External: build an independent partition according to our intuition, and compare it with schema C or the proximity matrix.
  - Relative: choose the schema that best fits.
- Internal and external validation are computationally not feasible here.

31 Relative validity
- Some indices: RS (R-squared), Davies-Bouldin index, SD index
- About 1 minute per schema C on the CPU; the GPU implementation takes 1 second.
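Davies-Bouldin, one of the listed indices, is compact enough to sketch directly. The code follows the standard definition with Euclidean distances. `davies_bouldin` is a hypothetical helper and not the tool's GPU implementation.

```python
import numpy as np


def davies_bouldin(X: np.ndarray, labels: np.ndarray) -> float:
    """Davies-Bouldin index (lower is better), standard Euclidean definition:
    S_i  = mean distance of cluster i's points to its centroid,
    M_ij = distance between centroids i and j,
    DB   = mean over i of max_{j != i} (S_i + S_j) / M_ij.
    """
    ids = np.unique(labels)
    cents = np.array([X[labels == c].mean(axis=0) for c in ids])
    S = np.array([np.linalg.norm(X[labels == c] - cents[i], axis=1).mean()
                  for i, c in enumerate(ids)])
    M = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(M, np.inf)      # exclude j == i (its ratio becomes 0)
    R = (S[:, None] + S[None, :]) / M
    return float(R.max(axis=1).mean())


X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
print(davies_bouldin(X, labels))  # 0.2
```

One way to choose a schema is to compute the index for each candidate clustering and keep the one with the smallest value.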

32 Validity indices
- Indices are plotted as a line graph.
- The minimum/maximum of the graph gives the optimal number of clusters $N_c$.
(Figure: line graph of index value over iterations.)

33 Framework

34 Automatic weight recommendation
- Only a suggestive process; the final decision is left to the user.

35 Results on UIUC image collection
- 4485 images in 15 categories
- Mean classification accuracy of 57.6% for SIFT with DoG

36 Interesting observation
- Orientations 135°, 215° and 270° were assigned lower weights by the automatic schemas; the same holds for the corner cells.
- Ds = {4, 12, 22, 43, 44, 54, 55, 71, 78, 79, 83, 84, 110, 116}
(Figure: 1D histograms of dimensions (a) 84, (b) 110, (c) 124.)

37 Results on UIUC image collection
- More clusters do not mean better classification.
- Fei-Fei et al. report a mean accuracy of 52.5%.
- Legend: VW = number of visual words; EW = k-means using uniform weights; IW = k-means with weights adjusted interactively; IW-Ds = k-means with the Ds dimensions given weight zero and the weights of the other dimensions adjusted interactively.

38 Results on UIUC image collection
- More clusters do not necessarily mean better classification.
- Fei-Fei et al. report a mean accuracy of 52.5%.

39 Summary
- A framework for better cluster generation
- A fast cluster analysis/generation tool for a commodity PC with a GPU
- Distributions can be analyzed across dimensions, which revealed redundant dimensions
- Higher classification accuracy achieved with relative ease

40 Publications
- Dasari Pavan Kumar and P. J. Narayanan, “Interactive Visualization and Tuning of SIFT Indexing”, Vision, Modelling and Visualization, Siegen, Germany, 2010.

41 Limitations
- Limited by GPU and CPU memory
- Users need to get familiar with the tool
- Visual decoding of the data is sometimes difficult
- Cluster generation still depends on parameters such as K (number of clusters)

42 Future Work
- Provide a brush for the PCP view
- Support subspace clustering
- Conduct experiments with wrapper-based clustering methods

43 Thank you

