Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007.

Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007

What is clustering? Given a collection of objects (characterized by feature vectors, or just a matrix of pair-wise similarities), detects the presence of distinct groups, and assign objects to groups.

Another example

Why should we care about clustering? Clustering is a basic step in most data mining procedures: Examples :  Clustering movie viewers for movie ranking.  Clustering proteins by their functionality.  Clustering text documents for content similarity.

Clustering is one of the most widely used tool for exploratory data analysis. Social Sciences Biology Astronomy Computer Science. All apply clustering to gain a first understanding of the structure of large data sets. The Theory-Practice Gap Yet, there exist distressingly little theoretical understanding of clustering

What questions should research address?  What is clustering?  What is “good” clustering?  Can clustering be carried out efficiently?  Can we distinguish “clusterable” from “structureless” data?  Many more …

“Clustering” is an ill defined problem  There are many different clustering tasks, leading to different clustering paradigms: There are Many Clustering Tasks

Some more examples

I would like to discuss the broad notion of clustering Independently of any particular algorithm, objective function or generative data model

What for?  Choosing a suitable algorithm for a given task.  Evaluating the quality of clustering methods.  Distinguishing significant structure from random fata morgana,  Distinguishing clusterable data from structure-less data.  Providing performance guarantees for clustering algorithms.  Much more …

The Basic Setting S  For a finite domain set S, a dissimilarity function (DF) is a symmetric mapping d:SxSR + d:SxS → R + such that d(x,y)=0x=y d(x,y)=0 iff x=y. S S  A clustering function takes a dissimilarity function on S and returns a partition of S.  We wish to define the properties that distinguish clustering functions (from any other functions that output domain partitions).

Kleinberg’s Axioms  Scale Invariance F(λd)=F(d)d λ F(λd)=F(d) for all d and all strictly positive λ.  Richness S For any finite domain S, {F(d): d S}={P:P S} {F(d): d is a DF over S}={P:P a partition of S}  Consistency d’d F(d) F(d)F(d)=F(d’). d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances (w.r.t. F(d) ), then F(d)=F(d’).

Note that any pair is realizable Consider Single-Linkage with different stopping criteria:  k connected components.  Distance r stopping.  Scale α stopping: add edges as long as their length is at most α(max-distance)

Kleinberg’s Impossibility result There exist no clustering function Proof: Scaling up Consistency

A Different Perspective  The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches could be classified by the different subsets of axioms they satisfy.  Axioms as a tool for classifying clustering paradigms

Axioms as a tool for a taxonomy of clustering paradigms The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches could be classified by the different subsets of axioms they satisfy. Scale Invariance RichnessLocal Consistency Full Consistency Single Linkage -+++ Center Based +++- Spectral +++- MDL ++- Rate Distortion ++- “Axioms”“Properties”

Ideal Theory  We would like to have a list of simple properties so that major clustering methods are distinguishable from each other using these properties.  We would like the axioms to be such that all methods satisfy all of them, and nothing that is clearly not a clustering satisfies all of them. (this is probably too much to hope for).  In the remainder of this talk, I would like to discuss some candidate “axioms” and “properties” to get a taste of what this theory-development program may involve.

Types of Axioms/Properties Richness requirements E.g., relaxations of Kelinberg’s richness, e.g., {F(d): d S}={P:P S k } {F(d): d is a DF over S}={P:P a partition of S into k sets} Invariance/Robustness/Stability requirements. E.g., Scale-Invariance, Consistency, robustness d F to perturbations of d (“smoothness” of F) or stability S w.r.t. sampling of S.

Relaxations of Consistency Local Consistency – C 1 …C k F(d). Let C 1, …C k be the clusters of F(d). λ 0 ≥ 1 λ 1,..λ k ≤1d’ For every λ 0 ≥ 1 and positive λ 1,..λ k ≤ 1, if d’ is defined by: λ i d(a,b)abC i λ i d(a,b) if a and b are in C id’(a,b)= λ 0 d(a,b)a,bF(d)- λ 0 d(a,b) if a,b are not in the same F(d)- cluster, F(d)=F(d’). then F(d)=F(d’). Is there any known clustering method for which it fails?

Other types of clustering  Culotta and McCallum’s “Clusterwise Similarity”  Edge-Detection (advantage to smooth contours)  Texture clustering  The professors example.

Conclusions and open questions  There is a place for developing an axiomatic framework for clustering.  The existing negative results do not rule out the possibility of useful axiomatization.  We should also develop a system of “clustering properties” for a taxonomy of clustering methods.  There are many possible routes to take and hidden subtleties in this project.

Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007.

Similar presentations

Presentation on theme: "Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007.

Similar presentations

Presentation on theme: "Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007."— Presentation transcript:

Similar presentations

About project

Feedback