1
Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007
2
What is clustering? Given a collection of objects (characterized by feature vectors, or just a matrix of pair-wise similarities), detect the presence of distinct groups and assign objects to groups.
3
Another example
4
Why should we care about clustering? Clustering is a basic step in most data mining procedures. Examples: Clustering movie viewers for movie ranking. Clustering proteins by their functionality. Clustering text documents for content similarity.
5
Clustering is one of the most widely used tools for exploratory data analysis. Social sciences, biology, astronomy, computer science: all apply clustering to gain a first understanding of the structure of large data sets. The Theory-Practice Gap: Yet, there exists distressingly little theoretical understanding of clustering.
6
What questions should research address? What is clustering? What is “good” clustering? Can clustering be carried out efficiently? Can we distinguish “clusterable” from “structureless” data? Many more …
7
There are Many Clustering Tasks. “Clustering” is an ill-defined problem: there are many different clustering tasks, leading to different clustering paradigms.
8
9
Some more examples
10
I would like to discuss the broad notion of clustering, independently of any particular algorithm, objective function, or generative data model.
11
What for? Choosing a suitable algorithm for a given task. Evaluating the quality of clustering methods. Distinguishing significant structure from random fata morgana. Distinguishing clusterable data from structure-less data. Providing performance guarantees for clustering algorithms. Much more …
12
The Basic Setting. For a finite domain set S, a dissimilarity function (DF) is a symmetric mapping d: S×S → R+ such that d(x,y) = 0 iff x = y. A clustering function takes a dissimilarity function on S and returns a partition of S. We wish to define the properties that distinguish clustering functions from any other functions that output domain partitions.
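To make the setting concrete, here is a minimal Python sketch (my own illustration, not part of the talk; the function name and matrix representation are assumptions) that checks whether a matrix is a valid dissimilarity function in the above sense.

```python
# A minimal sketch of the basic setting: a dissimilarity function over a
# finite domain S = {0, ..., n-1}, represented as an n x n matrix.
import numpy as np

def is_dissimilarity_function(d, tol=1e-12):
    """Check that d is symmetric and that d(x, y) = 0 iff x = y."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    symmetric = np.allclose(d, d.T, atol=tol)
    zero_diagonal = np.allclose(np.diag(d), 0.0, atol=tol)
    # Off-diagonal entries must be strictly positive (d(x, y) = 0 only when x = y).
    off_diagonal = d[~np.eye(n, dtype=bool)]
    return symmetric and zero_diagonal and bool(np.all(off_diagonal > 0))

# A clustering function F then maps such a matrix to a partition of S,
# e.g. a list of disjoint index sets whose union is {0, ..., n-1}.
```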
13
Kleinberg’s Axioms
Scale Invariance: F(λd) = F(d) for all d and all strictly positive λ.
Richness: For any finite domain S, {F(d) : d is a DF over S} = {P : P is a partition of S}.
Consistency: If d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances (w.r.t. F(d)), then F(d) = F(d’).
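As a concrete illustration (mine, not from the talk; the helper names are assumptions), Scale Invariance can be tested empirically for a candidate clustering function F by comparing its output on d and on λd.

```python
# A rough empirical check of the Scale Invariance axiom: F(lambda * d)
# should equal F(d) for every strictly positive lambda.
# `F` is any clustering function mapping a dissimilarity matrix to a
# partition, given as a list of index collections.
import numpy as np

def same_partition(p1, p2):
    """Compare two partitions, ignoring cluster order."""
    return {frozenset(c) for c in p1} == {frozenset(c) for c in p2}

def check_scale_invariance(F, d, lambdas=(0.5, 2.0, 10.0)):
    """Return True if F produces the same partition on d and on lambda * d."""
    d = np.asarray(d, dtype=float)
    base = F(d)
    return all(same_partition(base, F(lam * d)) for lam in lambdas)
```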
14
Note that any pair of the axioms is realizable. Consider Single-Linkage with different stopping criteria: k connected components. Distance r stopping. Scale α stopping: add edges as long as their length is at most α·(max-distance).
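A minimal sketch of these three Single-Linkage variants (my own illustration, not from the talk; the function and parameter names are assumptions): edges of the complete graph are added in order of increasing dissimilarity until the chosen stopping criterion fires.

```python
def single_linkage(d, k=None, r=None, alpha=None):
    """Single-Linkage on a symmetric dissimilarity matrix d (list of lists).

    Stopping criteria (use one):
      k     -- stop when k connected components remain,
      r     -- stop before adding any edge longer than r,
      alpha -- stop before adding any edge longer than alpha * (max distance).
    Returns the partition as a list of clusters (lists of indices).
    """
    n = len(d)
    parent = list(range(n))

    def find(x):
        # Union-find with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairs, sorted by increasing dissimilarity.
    edges = sorted((d[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    if alpha is not None:
        r = alpha * edges[-1][0]

    components = n
    for dist, i, j in edges:
        if r is not None and dist > r:
            break
        if k is not None and components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    clusters = {}
    for x in range(n):
        clusters.setdefault(find(x), []).append(x)
    return list(clusters.values())
```

In Kleinberg’s analysis, the k-components variant satisfies Scale Invariance and Consistency, the distance-r variant satisfies Richness and Consistency, and the scale-α variant satisfies Scale Invariance and Richness, which is how each pair of axioms is realized.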
15
Kleinberg’s Impossibility result: There exists no clustering function satisfying Scale Invariance, Richness, and Consistency simultaneously. Proof idea: scaling combined with Consistency shows that the range of any such function must be an antichain of partitions (no output partition refines another), which contradicts Richness.
16
A Different Perspective: Axioms as a tool for classifying clustering paradigms. The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches could be classified by the different subsets of axioms they satisfy.
17
Axioms as a tool for a taxonomy of clustering paradigms:

                  Scale Invariance   Richness   Local Consistency   Full Consistency
Single Linkage           -              +               +                  +
Center Based             +              +               +                  -
Spectral                 +              +               +                  -
MDL                      +              +               -
Rate Distortion          +              +               -

“Axioms” / “Properties”
18
Ideal Theory. We would like to have a list of simple properties so that major clustering methods are distinguishable from each other using these properties. We would like the axioms to be such that all clustering methods satisfy all of them, and nothing that is clearly not a clustering satisfies all of them (this is probably too much to hope for). In the remainder of this talk, I would like to discuss some candidate “axioms” and “properties” to get a taste of what this theory-development program may involve.
19
Types of Axioms/Properties
Richness requirements. E.g., relaxations of Kleinberg’s richness, such as k-richness: {F(d) : d is a DF over S} = {P : P is a partition of S into k sets}.
Invariance/Robustness/Stability requirements. E.g., Scale Invariance, Consistency, robustness of F to perturbations of d (“smoothness” of F), or stability w.r.t. sampling of S.
20
Relaxations of Consistency
Local Consistency: Let C_1, …, C_k be the clusters of F(d). For every λ_0 ≥ 1 and positive λ_1, …, λ_k ≤ 1, define d’ by
  d’(a,b) = λ_i · d(a,b)  if a and b are both in C_i,
  d’(a,b) = λ_0 · d(a,b)  if a and b are not in the same F(d)-cluster;
then Local Consistency requires F(d) = F(d’).
Is there any known clustering method for which it fails?
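For concreteness, here is a small sketch (my own illustration, not from the talk; names are assumptions) that constructs such a perturbation d’ from d, a partition, and the factors λ_0, λ_1, …, λ_k. Local Consistency asks that a clustering function return the same partition on d and on this d’.

```python
# Build the locally consistent perturbation d': distances within the i-th
# cluster are shrunk by lambdas[i] <= 1, while between-cluster distances
# are stretched by lambda_0 >= 1.
import numpy as np

def locally_consistent_perturbation(d, clusters, lambda_0, lambdas):
    """d: symmetric dissimilarity matrix; clusters: partition of {0,...,n-1}
    as a list of index collections; lambdas[i]: shrink factor for clusters[i]."""
    d = np.asarray(d, dtype=float)
    d_prime = lambda_0 * d  # default: treat every pair as between-cluster
    for cluster, lam in zip(clusters, lambdas):
        idx = np.array(sorted(cluster))
        # Overwrite within-cluster pairs with the shrunk distances.
        d_prime[np.ix_(idx, idx)] = lam * d[np.ix_(idx, idx)]
    np.fill_diagonal(d_prime, 0.0)  # keep d'(x, x) = 0
    return d_prime
```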
21
Other types of clustering Culotta and McCallum’s “Clusterwise Similarity” Edge-Detection (advantage to smooth contours) Texture clustering The professors example.
22
Conclusions and open questions There is a place for developing an axiomatic framework for clustering. The existing negative results do not rule out the possibility of useful axiomatization. We should also develop a system of “clustering properties” for a taxonomy of clustering methods. There are many possible routes to take and hidden subtleties in this project.