Slide 1: Title
Unsupervised Image Clustering using Probabilistic Continuous Models and Information-Theoretic Principles
Shiri Gordon
Electrical Engineering – Systems, Faculty of Engineering, Tel-Aviv University
Under the supervision of Dr. Hayit Greenspan
Slide 2: Introduction – Content-Based Image Retrieval (CBIR)
- Interest in Content-Based Image Retrieval (CBIR) and efficient image-search algorithms has grown out of the need to manage large image databases.
- Most CBIR systems are based on search-by-query:
  - The user provides an example image.
  - The database is searched exhaustively for the images most similar to the query.
Slide 3: CBIR – Issues
- Image representation
- Distance measure between images
- Image search algorithms
Example systems: QBIC (IBM), Blobworld (Berkeley), Photobook (MIT), VisualSEEk (Columbia)
Slide 4: What Is Image Clustering?
- Performing a supervised or unsupervised mapping of the archive images into classes.
- The classes should provide the same information about the image archive as the entire image collection.
Slide 5: Why Do We Need Clustering?
- Faster search-by-query algorithms: the query image is first compared to the cluster centers, and only the images in the closest clusters are searched.
- A browsing environment.
- Image categorization.
Slide 8: GMM-IB System Block Diagram
Images → per-image GMM → clustering via the Information-Bottleneck (IB) method → image clusters, each represented by a cluster GMM.
Slide 9: Image Representation ["Blobworld": Belongie, Carson, Greenspan, Malik, PAMI 2002]
- Feature space: color (CIE-Lab) and spatial position (x, y).
- The feature vectors are grouped in the resulting 5-dimensional space (pixels → feature vectors → regions).
- Each image is modeled as a Gaussian mixture distribution in feature space.
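The per-pixel feature construction described above can be sketched as follows. This is a minimal illustration, assuming the color conversion to CIE-Lab has already been done elsewhere (the `lab` array below is a synthetic placeholder, not a real Lab image):

```python
import numpy as np

def image_to_features(lab_image):
    """Stack each pixel's (L, a, b) color with its (x, y) position
    into one 5-dimensional feature vector per pixel."""
    h, w, _ = lab_image.shape
    ys, xs = np.mgrid[0:h, 0:w]                      # pixel coordinates
    feats = np.concatenate(
        [lab_image.reshape(-1, 3),                   # color: L*, a*, b*
         xs.reshape(-1, 1), ys.reshape(-1, 1)],      # spatial: x, y
        axis=1)
    return feats.astype(float)

# toy 2x2 "Lab" image (placeholder values, not a real color conversion)
lab = np.arange(12, dtype=float).reshape(2, 2, 3)
features = image_to_features(lab)
print(features.shape)  # (4, 5): one 5-D vector per pixel
```

In practice one would normalize the color and spatial coordinates to comparable ranges before fitting a mixture model to these vectors.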
Slide 10: Image Representation via Gaussian Mixture Modeling (GMM)
- The Expectation-Maximization (EM) algorithm determines the maximum-likelihood parameter set of a mixture of k Gaussians over the feature space.
- The EM algorithm is initialized via K-means.
- Model selection (choosing k) is performed via the Minimum Description Length (MDL) criterion.
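A compact sketch of the EM procedure outlined above, assuming diagonal covariances for brevity (the slides do not specify the covariance structure). The K-means-style initialization stands in for the initialization step mentioned on the slide; in a full system one would repeat the fit for several k and keep the k minimizing the MDL score:

```python
import numpy as np

def em_gmm(X, k, iters=50, seed=0):
    """Fit a k-component diagonal-covariance GMM to X (n, d) with EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]          # seed means from data
    for _ in range(10):                              # crude K-means init
        lbl = ((X[:, None] - mu) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (lbl == j).any():
                mu[j] = X[lbl == j].mean(0)
    var = np.full((k, d), X.var(axis=0))             # initial variances
    pi = np.full(k, 1.0 / k)                         # mixing weights
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = p(component j | x_i)
        logp = (-0.5 * (((X[:, None] - mu) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(-1) + np.log(pi))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ X ** 2) / nk[:, None] - mu ** 2 + 1e-6
    return pi, mu, var

# two well-separated blobs: EM should place one mean near each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(10, 1, (200, 2))])
pi, mu, var = em_gmm(X, k=2)
```

Libraries such as scikit-learn provide a production-grade equivalent (`GaussianMixture`), where BIC is commonly used as a close relative of MDL for choosing k.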
Slide 11: GMM in the 5-dimensional feature space: color (L*a*b*) and spatial (x, y).
Slide 12: Category GMM
- A category model is built from the image models of the images in the category.
- It captures the variability in color per spatial location, and the variability in location per color.
Slide 13: GMM–KL Framework [Greenspan, Goldberger, Ridel, CVIU 2001]
- The Kullback-Leibler (KL) distance between distributions is used to compare an image distribution to a category distribution; it is approximated empirically from the feature set extracted from the image, and the approximation improves with the data-set size.
- KL distances between image models and category models:

  Image \ Category    (1)     (2)     (3)     (4)
  monkey  (1)         6.5    32.5    34.8    16.4
  snow    (2)        29.6    10.4    42.1    30.4
  sunset  (3)        30.2    36.3    14.2    27.7
  flowers (4)        14.4    28.7    29.1     8.5

  Each image is closest to its own category model: the smallest distance in each row lies on the diagonal.
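Because a KL distance between two Gaussian mixtures has no closed form, it is typically approximated from samples, in the spirit of the empirical approximation described above. A minimal Monte-Carlo sketch (diagonal-covariance mixtures; all parameter values are illustrative):

```python
import numpy as np

def gmm_logpdf(x, pi, mu, var):
    """Log-density of a diagonal-covariance GMM at points x (n, d)."""
    comp = (-0.5 * (((x[:, None] - mu) ** 2) / var
                    + np.log(2 * np.pi * var)).sum(-1) + np.log(pi))
    m = comp.max(axis=1, keepdims=True)              # log-sum-exp
    return (m + np.log(np.exp(comp - m).sum(axis=1, keepdims=True))).ravel()

def gmm_sample(n, pi, mu, var, rng):
    """Draw n samples from the mixture."""
    comps = rng.choice(len(pi), size=n, p=pi)
    return rng.normal(mu[comps], np.sqrt(var[comps]))

def kl_mc(p, q, n=20000, seed=0):
    """Monte-Carlo estimate of KL(p || q) = E_p[log p(x) - log q(x)]."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(n, *p, rng)
    return float(np.mean(gmm_logpdf(x, *p) - gmm_logpdf(x, *q)))

# two 1-D, two-component mixtures that differ in one mean
p = (np.array([0.5, 0.5]), np.array([[0.0], [5.0]]), np.array([[1.0], [1.0]]))
q = (np.array([0.5, 0.5]), np.array([[0.0], [6.0]]), np.array([[1.0], [1.0]]))
print(kl_mc(p, p), kl_mc(p, q))  # ~0 for identical models, > 0 otherwise
```

The estimate sharpens as the sample size grows, mirroring the slide's point that the approximation improves with the size of the feature set.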
Slide 14: Unsupervised Clustering using the Information-Bottleneck (IB) Principle [N. Slonim, N. Tishby, NIPS 1999]
- The desired clustering is the one that minimizes the loss of mutual information between the objects and the features extracted from them.
- The information the objects contain about the features is "squeezed" through a compact "bottleneck" of clusters.
Slide 15: Information-Bottleneck Principle – Motivation
[Diagram: the objects are compressed through a bottleneck of the required number of clusters while preserving information about the features.]
Slide 16: Information-Bottleneck Principle – Greedy Criterion
The minimization problem posed by the IB principle can be approximated by various algorithms using a greedy merging criterion, which combines the prior probabilities of the candidate clusters with the KL distance between their distributions.
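For discrete distributions, the agglomerative-IB merge cost takes a well-known form: the loss of I(C;Y) from merging two clusters equals their combined prior times the (prior-weighted) Jensen-Shannon divergence between their feature distributions, which is itself a weighted sum of KL distances to the merged distribution. A small sketch of that criterion (the example distributions are invented):

```python
import numpy as np

def kl(p, q):
    """KL divergence between discrete distributions p and q."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def merge_cost(p_i, p_j, py_i, py_j):
    """Greedy AIB criterion: loss of I(C;Y) when merging clusters i, j.
    Equals (p_i + p_j) times the weighted Jensen-Shannon divergence
    between the clusters' conditional feature distributions."""
    w_i, w_j = p_i / (p_i + p_j), p_j / (p_i + p_j)
    py_bar = w_i * py_i + w_j * py_j          # merged distribution p(y | i or j)
    js = w_i * kl(py_i, py_bar) + w_j * kl(py_j, py_bar)
    return (p_i + p_j) * js

a = np.array([0.7, 0.2, 0.1])
b = np.array([0.1, 0.2, 0.7])
print(merge_cost(0.3, 0.3, a, a))  # -> 0.0: merging identical clusters is free
print(merge_cost(0.3, 0.3, a, b))  # > 0: merging dissimilar clusters is costly
```

At every step an agglomerative algorithm merges the pair with the smallest such cost, so the cheapest merges happen first and the cumulative information loss can be tracked.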
Slide 17: GMM-IB Framework
[Diagram: images, represented by their feature vectors and GMMs, are merged into image clusters; the merge criterion combines each cluster's prior probability with the KL distance between cluster models.]
Slide 18: Example
[Figure omitted: clustering example.]
Slide 19: Results – AIB: Optimal Number of Clusters
The loss of mutual information during the clustering process indicates the optimal number of clusters.
Slide 20: Results – AIB: Generated Tree
Slide 21: Mutual Information as a Quality Measure
- Mutual information I(X;Y) is the reduction in the uncertainty of X given knowledge of Y: I(X;Y) = H(X) − H(X|Y).
- There is no closed-form expression for the mutual information of Gaussian mixture distributions.
- The greedy criterion derived from the IB principle provides a tool for approximating this measure.
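For discrete variables the definition above is computable directly from a joint probability table, which is how a quality score like I(C;Y) can be evaluated once cluster memberships are fixed. A minimal sketch (the joint tables are illustrative):

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), computed in bits from a joint table
    whose entries are p(x, y)."""
    px = joint.sum(axis=1, keepdims=True)     # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)     # marginal p(y)
    mask = joint > 0                          # skip zero-probability cells
    return float(np.sum(joint[mask] * np.log2((joint / (px * py))[mask])))

# independent variables carry no information about each other
indep = np.outer([0.5, 0.5], [0.25, 0.75])
print(mutual_information(indep))              # -> 0.0
# a deterministic one-to-one relation gives I(X;Y) = H(X)
determ = np.array([[0.5, 0.0], [0.0, 0.5]])
print(mutual_information(determ))             # -> 1.0 (bits)
```

For Gaussian mixtures, where this sum has no closed form, one falls back on the greedy IB criterion (or Monte-Carlo estimates) as the slide notes.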
Slide 22: Mutual Information as a Quality Measure – Example
[Figure: example clusterings into clusters C1, C2, C3.]
I(C;Y): 1.51, 1.32, 1.18; I(X;Y): 2.73, 2.72.
Slide 23: Results – Experimental Setup
- Image database of 1460 images, selectively hand-picked from the COREL database to create 16 labeled categories.
- A GMM model is built for each image.
- The various algorithms, using various image representations, are applied to the database.
Slide 24: Results – Retrieval Experiments
- Clustering for efficient retrieval.
- Comparison between clustering methodologies.
Slide 25: Results – Mutual Information as a Quality Measure
Comparison between clustering algorithms (and between image representations):

  Clustering method        I(C;Y)
  AIB                       1.63
  K-means + reduced GMM     1.68
  SIB + average GMM         1.67
Slide 26: Summary
- Image clustering is performed using the IB method.
- IB is applied to continuous representations of images and categories via Gaussian mixture models.
- From the AIB algorithm:
  - The optimal number of clusters in the database is inferred.
  - A "built-in" distance measure is obtained.
  - The database is arranged in a tree structure that provides a browsing environment and more efficient search algorithms.
  - The tree can be modified using algorithms such as SIB and K-means to achieve a more stable solution.
Slide 27: Future Work
- Making the current framework more feasible for large databases:
  - A simpler approximation of the KL distance.
  - Incorporating the reduced category GMM into the clustering algorithms.
- Performing relaxation on the hierarchical tree structure.
- Using the tree structure to create a "user-friendly" browsing environment.
- Extending the feature space.