1
Interactive Visualization and Tuning of Multi-Dimensional Clusters for Indexing
Dasari Pavan Kumar (MS by Research thesis)
Centre for Visual Information Technology, IIIT Hyderabad
2
Overview
Provide a framework to generate better clusters for high-dimensional data points
Provide a fast cluster analysis/generation tool
3
Data, Data, Data!
Digital data is being created at an unprecedented rate
Data is collected to extract/search "valuable" information – a difficult task, however!
Data generated in the previous decade consisted mostly of textual information – inverted indexes, suffix trees, N-grams, etc.
4
More data!
Flickr, YouTube, etc. changed the game
– Non-textual information (images)
– Huge amounts of data!
New methods: Content-Based Image Retrieval
– The underlying processes remain similar
Why image search?
– Copyright infringement, offensive content, education, etc.
5
Multi-dimensional, multi-variate data
Stock markets, weather/climate, business
Huge datasets with multiple dimensions
Finding "insights" cannot be fully automated
6
Data Visualization
Human intelligence/cognition is still unmatched by computers
Cluster analysis – descriptive modeling
Information visualization to support analysis
– Identify important features/patterns
7
Past attempts: what if you have millions of high-dimensional data points?
XMDV tool (M. Ward)
– Scatter-plot matrix
– Parallel coordinate plot
Cluster tree (Stuetzle)
Cone trees (Robertson et al.)
8
Indexing images/videos
Extract feature vectors from images
Apply clustering to compute a bag of visual words
Generate feature histograms and apply machine-learning methods (a minimal sketch of this pipeline follows)
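To make the pipeline concrete, here is a minimal bag-of-visual-words sketch. It assumes 128-dimensional SIFT descriptors have already been extracted per image and uses scikit-learn's KMeans purely for illustration (the thesis itself uses a GPU k-means); the function names `build_vocabulary` and `image_histogram` are hypothetical.

```python
# Minimal bag-of-visual-words sketch (illustrative, not the thesis implementation).
# Assumes `descriptors_per_image` is a list of (n_i, 128) SIFT descriptor arrays.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors_per_image, n_words=1000, seed=0):
    """Cluster all descriptors into a visual vocabulary (the 'bag of words')."""
    all_desc = np.vstack(descriptors_per_image)           # (N, 128)
    kmeans = KMeans(n_clusters=n_words, random_state=seed, n_init=4)
    kmeans.fit(all_desc)
    return kmeans

def image_histogram(kmeans, descriptors):
    """Quantize one image's descriptors and build a normalized word histogram."""
    words = kmeans.predict(descriptors)                   # visual-word id per descriptor
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```

The resulting per-image histograms are what a downstream classifier (for example an SVM) would be trained on.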
10
Using SIFT features
The fundamental problem – sheer volume of data
– Number of dimensions: 128
– Number of data points: in the millions
Other low-level image features exist – GLOH, steerable filters, spin images
11
Clusters + visualization
The problem – choosing the right bag of words (clusters)
Better visual words lead to better classification
12
Cluster analysis
Provide a framework for the user to
– Identify better subspaces
– Efficiently/quickly compute clusters
– Compare clustering schemas
13
Framework
Extracted low-level image descriptors are reduced by statistical sampling to a manageable (still high-dimensional) set. Priorities/weights are assigned to the features, either from automatic weight recommendations (1…N) or by user-defined weight re-assignment. The weighted data is clustered into visual words and shown in the visualization system for verification: if the schema looks bad, the weights are re-adjusted; if it looks good, the entire set is clustered and the schema is output.
14
Tool
15
Framework (overview diagram repeated)
16
Why prioritize dimensions?
Dimensionality reduction!
– Feature transformation
– Feature selection
17
Why not feature transformation?
Dimensions can be redundant/irrelevant
– Hence PCA cannot be trivially applied
Clusters could be lost in a cloud of dimensions (curse of dimensionality)
Transformed combinations of dimensions are difficult to interpret
18
Feature selection
Wrapper model
– "Wraps" the selection process around the mining algorithm
– The two go hand in hand, giving little control
Filter model
– Examines intrinsic properties of the data
19
"Interesting" dimensions
Without any rank
– Analyze the density distribution based on grids
– Difficult to compare since it is highly dependent on the density parameter
Rank dimensions based on the distribution of the data
– Uniformity (entropy)
– Number of outliers: a value d is an outlier if d > Q3 + 1.5*IQR or d < Q1 - 1.5*IQR
– Number of unique values
(a sketch of these per-dimension measures follows)
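A plain NumPy sketch of the three per-dimension measures listed above: entropy of a 1D histogram, outlier count via the IQR rule, and unique-value count. The bin count and the function name are assumptions, not the thesis settings.

```python
# Per-dimension "interestingness" measures: histogram entropy, IQR outliers, unique values.
import numpy as np

def dimension_stats(X, bins=32):
    """X: (n_points, n_dims) array. Returns per-dimension entropy, outlier count, unique count."""
    n, d = X.shape
    entropy = np.empty(d)
    outliers = np.empty(d, dtype=int)
    uniques = np.empty(d, dtype=int)
    for j in range(d):
        col = X[:, j]
        hist, _ = np.histogram(col, bins=bins)
        p = hist / n
        p = p[p > 0]
        entropy[j] = -np.sum(p * np.log2(p))              # uniformity of the distribution
        q1, q3 = np.percentile(col, [25, 75])
        iqr = q3 - q1
        outliers[j] = np.sum((col > q3 + 1.5 * iqr) | (col < q1 - 1.5 * iqr))
        uniques[j] = np.unique(col).size
    return entropy, outliers, uniques
```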
20
Ranked dimensions
Assign weights based on the amount of "interestingness"
– 1D histograms of the distribution
– 2D correlations – parallel coordinate plot (PCP)
How do we assign weights? Manually, or from automatic suggestions (one possible mapping from measures to weights is sketched below)
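One possible way to turn the per-dimension measures into suggested weights is to normalize each measure to [0, 1] and combine them. The particular combination below is an illustrative assumption, and the suggestion is exactly the kind of output the user is expected to override interactively.

```python
# Illustrative mapping from per-dimension measures to suggested weights (an assumption,
# not the thesis's recommendation scheme). Uses the outputs of dimension_stats() above.
import numpy as np

def suggest_weights(entropy, outliers, uniques):
    def norm(v):
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.ones_like(v)
    # Here, lower entropy (more structured) and fewer outliers are treated as more
    # interesting, and more unique values as more informative; this is only one choice.
    score = (1.0 - norm(entropy)) + (1.0 - norm(outliers)) + norm(uniques)
    return score / score.max()                             # suggested weights in (0, 1]
```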
21
Glyph view
Standard SIFT glyph
Bar chart
– Length encodes rank
– Color encodes weight (via a colormap)
(a plotting sketch of the bar-chart glyph follows)
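A rough matplotlib sketch of the bar-chart glyph only: bar length encodes a dimension's rank and bar color encodes its current weight through a colormap. The colormap choice and figure layout are assumptions, not the tool's actual rendering.

```python
# Illustrative bar-chart glyph: length = rank, color = weight (via a colormap).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import Normalize

def plot_dimension_glyph(ranks, weights):
    """ranks, weights: length-d arrays; weights are assumed to lie in [0, 1]."""
    ranks = np.asarray(ranks, dtype=float)
    weights = np.asarray(weights, dtype=float)
    colors = cm.viridis(weights)                           # weight -> color
    fig, ax = plt.subplots(figsize=(10, 2))
    ax.bar(np.arange(len(ranks)), ranks, color=colors, width=1.0)
    ax.set_xlabel("dimension")
    ax.set_ylabel("rank")
    fig.colorbar(cm.ScalarMappable(norm=Normalize(0, 1), cmap="viridis"),
                 ax=ax, label="weight")
    plt.show()
```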
22
Framework (overview diagram repeated)
23
Data clustering
Sample data set – 1.3 million points with 128 dimensions
Clustering such data on a commodity PC – almost impossible
24
Data clustering
Any clustering technique can be plugged in – currently k-means on the GPU
Currently 200 iterations for 1.3 million SIFT vectors
– About 12 seconds per iteration for 1000 clusters
(a CPU sketch of a weighted k-means iteration follows)
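For reference, a plain NumPy sketch of a single Lloyd (k-means) iteration with the per-dimension weights applied by scaling the feature space. The thesis uses a GPU implementation; treating the weights as a per-dimension scaling before the distance computation is an assumption about how the weighted clustering is realized.

```python
# One Lloyd iteration of k-means with per-dimension weights (CPU sketch only).
import numpy as np

def kmeans_iteration(X, centers, weights):
    """X: (n, d), centers: (k, d), weights: (d,). Returns updated centers and labels."""
    Xw = X * weights                                       # scale dimensions by their weights
    Cw = centers * weights
    # squared distances via ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2
    d2 = (Xw * Xw).sum(1)[:, None] - 2.0 * Xw @ Cw.T + (Cw * Cw).sum(1)[None, :]
    labels = d2.argmin(axis=1)
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        members = X[labels == k]
        if len(members):                                   # keep the old center if a cluster empties
            new_centers[k] = members.mean(axis=0)
    return new_centers, labels
```

Repeating this step (200 iterations in the reported setup) converges to the visual-word centers.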
25
Framework (overview diagram repeated)
26
Cluster visualization
Visualizing clusters over 128 dimensions directly is not feasible
Re-project into 2D space – some sort of layout is necessary
Any graph-drawing method can be plugged in – currently a 2D force-based layout
27
Graph representation
Compute a cluster tree of nearest-neighbour density
– Similar nodes must be close
– Can be estimated using an MST
Generate the minimum spanning tree (MST) of the cluster centers
– Equivalent to a single-linkage dendrogram
– Built with Prim's method (a sketch follows)
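A NumPy sketch of Prim's method over the complete graph of cluster centers; the Euclidean metric and the edge-list output format are assumptions for illustration.

```python
# Prim's algorithm on the complete Euclidean graph of cluster centers.
import numpy as np

def prim_mst(centers):
    """centers: (k, d). Returns a list of (i, j, distance) edges forming the MST."""
    k = centers.shape[0]
    in_tree = np.zeros(k, dtype=bool)
    in_tree[0] = True
    best_dist = np.linalg.norm(centers - centers[0], axis=1)  # cheapest link to the tree
    best_dist[0] = np.inf
    best_from = np.zeros(k, dtype=int)
    edges = []
    for _ in range(k - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best_dist)))
        edges.append((int(best_from[j]), j, float(best_dist[j])))
        in_tree[j] = True
        dj = np.linalg.norm(centers - centers[j], axis=1)
        closer = (~in_tree) & (dj < best_dist)
        best_dist[closer] = dj[closer]
        best_from[closer] = j
    return edges
```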
28
Graph drawing
Use a GPU implementation of a force-based graph layout
– Takes 0.2 seconds for 1000 nodes
Drill down into a "visual word" to see the SIFT interest points it contains and understand the similarity
(Figures: MST without layout, MST with layout; a CPU sketch of the force-based layout follows)
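A simple CPU sketch of a Fruchterman-Reingold-style force-based layout applied to the MST nodes. The thesis uses a GPU implementation (about 0.2 s for 1000 nodes); the iteration count, cooling schedule, and O(n²) repulsion below are illustrative choices, not the tool's parameters.

```python
# Force-based 2D layout of the MST (Fruchterman-Reingold style, CPU sketch).
import numpy as np

def force_layout(n_nodes, edges, iters=200, area=1.0, seed=0):
    """edges: list of (i, j, w) from prim_mst(). Returns (n_nodes, 2) positions."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-0.5, 0.5, size=(n_nodes, 2))
    k = np.sqrt(area / n_nodes)                            # ideal edge length
    temp = 0.1
    for _ in range(iters):
        delta = pos[:, None, :] - pos[None, :, :]          # pairwise displacements (n, n, 2)
        dist = np.linalg.norm(delta, axis=-1) + 1e-9
        disp = (delta / dist[..., None] * (k * k / dist)[..., None]).sum(axis=1)  # repulsion
        for i, j, _w in edges:                             # attraction along MST edges
            d = pos[i] - pos[j]
            dn = np.linalg.norm(d) + 1e-9
            f = (d / dn) * (dn * dn / k)
            disp[i] -= f
            disp[j] += f
        step = np.linalg.norm(disp, axis=1, keepdims=True) + 1e-9
        pos += disp / step * np.minimum(step, temp)        # limit movement by temperature
        temp *= 0.97                                       # cool down
    return pos
```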
29
Similar-looking regions are clustered into the same id
30
Cluster validation
Two clustering schemas are visually not feasible to compare
Three basic strategies
– Internal: compare schema C with the proximity matrix
– External: build an independent partition according to our intuition and compare it with schema C or the proximity matrix
– Relative: choose the schema that best fits
Internal and external validation are computationally not feasible at this scale
31
Relative validity
Some indices
– RS value
– Davies-Bouldin index
– SD index
Around 1 minute per schema C on the CPU; the GPU implementation takes about 1 second
(a CPU sketch of the Davies-Bouldin index follows)
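A plain NumPy sketch of the Davies-Bouldin index (lower is better); the GPU version in the thesis computes the same quantity far faster. scikit-learn's `davies_bouldin_score` can serve as a cross-check.

```python
# Davies-Bouldin index (lower is better), CPU sketch.
import numpy as np

def davies_bouldin(X, labels, centers):
    """X: (n, d), labels: (n,), centers: (k, d)."""
    k = centers.shape[0]
    # average distance of each cluster's points to its center (cluster scatter S_i)
    scatter = np.array([
        np.linalg.norm(X[labels == i] - centers[i], axis=1).mean()
        if np.any(labels == i) else 0.0
        for i in range(k)
    ])
    center_dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    db = 0.0
    for i in range(k):
        ratios = [(scatter[i] + scatter[j]) / center_dist[i, j] for j in range(k) if j != i]
        db += max(ratios)
    return db / k
```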
32
Validity indices
The indices are plotted as a line graph against the number of clusters
– The minimum/maximum of the graph gives the optimal number of clusters Nc
(Figure: index value vs. Nc/iteration; a sketch of this selection follows)
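An illustrative sweep over candidate numbers of clusters: score each clustering with a relative validity index and pick the extremum (the minimum for Davies-Bouldin). The candidate values and the use of scikit-learn here are assumptions made for the sketch.

```python
# Sweep candidate vocabulary sizes and pick the Nc with the best (lowest) index value.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def pick_num_clusters(X, candidates=(200, 400, 600, 800, 1000)):
    scores = []
    for nc in candidates:
        km = KMeans(n_clusters=nc, n_init=2, random_state=0).fit(X)
        scores.append(davies_bouldin_score(X, km.labels_))
    best = candidates[int(np.argmin(scores))]              # lower Davies-Bouldin is better
    return best, scores
```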
33
Framework (overview diagram repeated)
34
Automatic weight recommendation
Only a suggestive process
Final decision left to the user
35
Results on the UIUC image collection
A total of 4485 images in 15 categories
Mean classification accuracy of 57.6% for SIFT with DoG
36
Interesting observation
Orientations 135°, 215°, 270° are assigned lower weights by the automatic schemas
The same holds for the corner cells
Ds = {4, 12, 22, 43, 44, 54, 55, 71, 78, 79, 83, 84, 110, 116}
(Figure: 1D histograms corresponding to dimensions (a) 84, (b) 110, (c) 124)
37
Results on the UIUC image collection
More clusters does not necessarily mean better classification
Fei-Fei et al. report a mean accuracy of 52.5%
(Table legend: VW = number of visual words, EW = k-means using uniform weights, IW = k-means with weights adjusted interactively, IW-Ds = k-means with the Ds dimensions given a weight of zero and the weights of the other dimensions adjusted interactively)
39
Summary
A framework for generating better clusters
A fast cluster analysis/generation tool for a commodity PC equipped with a GPU
Able to analyze distributions across dimensions
– Identified redundant dimensions
Able to achieve higher classification accuracy with relative ease
40
Publications
Interactive Visualization and Tuning of SIFT Indexing, Dasari Pavan Kumar and P. J. Narayanan, Vision, Modelling and Visualization, 2010, Siegen, Germany
41
Limitations
Limited by GPU and CPU memory
The user needs to become familiar with the tool
Visual decoding of the data is sometimes difficult
Cluster generation still depends on parameters such as K (the number of clusters)
42
Future Work
Provide a brush for the PCP view
Incorporate support for subspace clustering
Conduct experiments based on wrapper clustering methods
43
Thank you