Towards Theoretical Foundations of Clustering
Margareta Ackerman, Caltech
Joint work with Shai Ben-David and David Loker
The Theory-Practice Gap
Clustering is one of the most widely used tools for exploratory data analysis. The social sciences, biology, astronomy, computer science, and many other fields apply clustering to gain a first understanding of the structure of large data sets.
The Theory-Practice Gap
“While the interest in and application of cluster analysis has been rising rapidly, the abstract nature of the tool is still poorly understood” (Wright, 1973).
“There has been relatively little work aimed at reasoning about clustering independently of any particular algorithm, objective function, or generative data model” (Kleinberg, 2002).
Both statements still apply today.
Inherent Obstacles: Clustering is Ill-Defined
Clustering aims to assign data into groups of similar items. Beyond that, there is very little consensus on the definition of clustering.
Inherent Obstacles
Clustering is inherently ambiguous:
– There may be multiple reasonable clusterings.
– There is usually no ground truth.
Differences in Input/Output Behavior of Clustering Algorithms
Outline
– Previous work
– Clustering algorithm selection
– Characterization of linkage-based clustering
– Conclusions and future work
Previous Work: Towards a General Theory
– Axioms of clustering: (Wright, ’73), (Meila, ACM ’05), (Pattern Recognition, ’00), (Kleinberg, NIPS ’02), (Ackerman & Ben-David, NIPS ’08).
– Clusterability: (Balcan, Blum, and Vempala, STOC ’08), (Balcan, Blum, and Gupta, SODA ’09), (Ackerman & Ben-David, AISTATS ’09).
Outline
– Previous work
– Clustering algorithm selection
– Characterization of linkage-based clustering
– Conclusions and future work
Selecting a Clustering Algorithm
There are a wide variety of clustering algorithms, which often produce very different clusterings. How should a user decide which algorithm to use for a given application?
Selecting a Clustering Algorithm
Users rely on cost-related considerations: running time, space usage, software purchasing costs, etc. There is inadequate emphasis on input-output behaviour.
Our Framework for Selecting a Clustering Algorithm
Identify properties that distinguish the input-output behaviour of different clustering paradigms. The properties should be:
1) Intuitive and “user-friendly”
2) Useful for distinguishing between clustering algorithms
Our Framework for Selecting a Clustering Algorithm
– Enables users to identify a suitable algorithm without the overhead of executing many algorithms.
– Helps users understand the behaviour of algorithms.
– The long-term goal is to construct a large property-based classification of many useful clustering algorithms.
Taxonomy of Partitional Algorithms (Ackerman, Ben-David, Loker, NIPS 2010)
Properties vs. Axioms
Characterization of Linkage-Based Clustering (Ackerman, Ben-David, Loker, COLT 2010)
Characterization of Linkage-Based Clustering (Ackerman, Ben-David, Loker, COLT 2010)
The 2010 characterization applies in the partitional setting, using the k-stopping criterion. It distinguishes linkage-based algorithms from other partitional algorithms.
Characterizing Linkage-Based Clustering in the Hierarchical Setting (Ackerman and Ben-David, IJCAI 2011)
– Propose two intuitive properties that uniquely identify hierarchical linkage-based clustering algorithms.
– Show that common hierarchical algorithms, including bisecting k-means, cannot be simulated by any linkage-based algorithm.
Outline
– Previous work
– Clustering algorithm selection
– Characterization of linkage-based clustering
– Conclusions and future work
Formal Setup: Dendrograms and Clusterings
C_i is a cluster in a dendrogram D if there exists a node in the dendrogram such that C_i is the set of its leaf descendants.
Formal Setup: Dendrograms and Clusterings
C = {C_1, …, C_k} is a clustering in a dendrogram D if
– C_i is a cluster in D for all 1 ≤ i ≤ k, and
– the clusters are disjoint.
Formal Setup: Hierarchical Clustering Algorithm
A hierarchical clustering algorithm A maps
Input: a data set X with a dissimilarity function d, denoted (X, d)
to
Output: a dendrogram of X
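To make the setup concrete, here is a minimal Python sketch of dendrograms and the two definitions above; the names Node, clusters_of, and is_clustering_in are illustrative choices, not notation from the talk.

```python
from itertools import combinations

class Node:
    """A dendrogram node: a leaf holds one data point; an inner node has children."""
    def __init__(self, point=None, children=()):
        self.children = list(children)
        # The cluster at this node is the set of its leaf descendants.
        if point is not None:
            self.cluster = frozenset([point])
        else:
            self.cluster = frozenset().union(*(c.cluster for c in children))

def clusters_of(dendrogram):
    """All clusters of a dendrogram D: one per node."""
    out, stack = [], [dendrogram]
    while stack:
        node = stack.pop()
        out.append(node.cluster)
        stack.extend(node.children)
    return out

def is_clustering_in(C, dendrogram):
    """C = {C_1, ..., C_k} is a clustering in D if every C_i is a cluster
    of D and the C_i are pairwise disjoint."""
    clusters = set(clusters_of(dendrogram))
    return all(Ci in clusters for Ci in C) and \
           all(A.isdisjoint(B) for A, B in combinations(C, 2))
```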
Linkage-Based Algorithm
– Create a leaf node for every element of X.
– Repeat the following until a single tree remains:
  – Consider the clusters represented by the remaining root nodes.
  – Merge the closest pair of clusters by assigning them a common parent node.
Example Linkage-Based Algorithms
The choice of linkage function distinguishes between different linkage-based algorithms. Examples of common linkage functions:
– Single linkage: shortest between-cluster distance
– Average linkage: average between-cluster distance
– Complete linkage: maximum between-cluster distance
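A sketch of the generic procedure and the three linkage functions above, reusing the Node class from the previous sketch; treating the dissimilarity d as a symmetric Python function is an assumption of this sketch.

```python
def single_linkage(A, B, d):
    return min(d(x, y) for x in A for y in B)    # shortest between-cluster distance

def average_linkage(A, B, d):
    return sum(d(x, y) for x in A for y in B) / (len(A) * len(B))

def complete_linkage(A, B, d):
    return max(d(x, y) for x in A for y in B)    # maximum between-cluster distance

def linkage_based(X, d, linkage):
    """Generic linkage-based algorithm: start from singleton leaves, then
    repeatedly merge the closest pair of root clusters under `linkage`."""
    roots = [Node(point=x) for x in X]           # a leaf node per element of X
    while len(roots) > 1:
        i, j = min(((i, j) for i in range(len(roots))
                    for j in range(i + 1, len(roots))),
                   key=lambda p: linkage(roots[p[0]].cluster, roots[p[1]].cluster, d))
        merged = Node(children=(roots[i], roots[j]))   # common parent node
        roots = [r for k, r in enumerate(roots) if k not in (i, j)]
        roots.append(merged)
    return roots[0]

# Example: single linkage on three points of a line.
D = linkage_based([0.0, 1.0, 5.0], lambda x, y: abs(x - y), single_linkage)
```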
Locality (Informal Definition)
If we select a set of disjoint clusters from a dendrogram and run the algorithm on the union of these clusters, we obtain a result that is consistent with the original dendrogram. For example, if X’ = {x_1, …, x_6} is the union of the selected clusters, then D’ = A(X’, d) agrees with D = A(X, d) on X’.
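Under one natural reading of “consistent with the original dendrogram”, locality can be spot-checked on a concrete instance. The sketch below reuses the earlier helpers; satisfies_locality_instance is a hypothetical name, not part of the framework.

```python
def satisfies_locality_instance(A, X, d, selected):
    """Check locality on one instance: `selected` is a set of disjoint clusters
    of D = A(X, d). Running A on their union X' must produce a dendrogram that,
    below each selected cluster, contains exactly the same clusters as D."""
    D = A(X, d)
    assert is_clustering_in(selected, D), "selected must be a clustering in D"
    X_prime = sorted(frozenset().union(*selected))
    D_prime = A(X_prime, d)   # d is a function, so it restricts automatically
    for Ci in selected:
        below_D = {c for c in clusters_of(D) if c <= Ci}
        below_D_prime = {c for c in clusters_of(D_prime) if c <= Ci}
        if below_D != below_D_prime:
            return False
    return True

# e.g. A = lambda X, d: linkage_based(X, d, single_linkage)
```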
Outer Consistency
An outer-consistent change of (X, d) with respect to a clustering C keeps within-cluster distances fixed while between-cluster distances may only increase; intuitively, the clusters of C move further apart. If A is outer-consistent and C is a clustering in A(X, d), then A(X, d’) will also include the clustering C.
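A minimal sketch of an outer-consistent change, assuming C is a partition of X and using an additive boost as one illustrative way to increase between-cluster distances:

```python
def outer_consistent_variant(d, C, boost=1.0):
    """Return d' agreeing with d inside each cluster of C, and larger between
    clusters -- an outer-consistent change of d with respect to C."""
    cluster_of = {x: i for i, Ci in enumerate(C) for x in Ci}
    def d_prime(x, y):
        if x == y:
            return 0.0
        if cluster_of[x] == cluster_of[y]:
            return d(x, y)          # within-cluster distances unchanged
        return d(x, y) + boost      # between-cluster distances only increase
    return d_prime
```

With this in hand, A is outer-consistent when every clustering C in A(X, d) remains a clustering in A(X, d’) for every outer-consistent variant d’ of d.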
Theorem (Ackerman & Ben-David, IJCAI 2011): A hierarchical clustering algorithm is linkage-based if and only if it is local and outer-consistent.
Easy Direction of Proof
Every linkage-based hierarchical clustering algorithm is local and outer-consistent. The proof is quite straightforward.
Interesting Direction of Proof
If A is local and outer-consistent, then A is linkage-based. To prove this direction, we first need to formalize linkage-based clustering by defining what a linkage function is.
What Do We Expect From Linkage Functions?
A linkage function is a function ℓ : {(X_1, X_2, d) : d is a distance function over X_1 ∪ X_2} → ℝ⁺ that satisfies the following:
– Representation independence: ℓ(X_1, X_2, d) does not change if we re-label the data.
– Monotonicity: if we increase the distances on edges that go between X_1 and X_2, then ℓ(X_1, X_2, d) does not decrease.
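Monotonicity lends itself to a randomized spot-check: bump some between-cluster distances upward and verify that ℓ does not decrease. A sketch under the same function-style d as before; check_monotonicity is a hypothetical helper, not part of the framework.

```python
import random

def check_monotonicity(linkage, X1, X2, d, trials=100, eps=0.5):
    """Spot-check monotonicity: increasing the distances on edges between
    X1 and X2 must not decrease the linkage value l(X1, X2, d)."""
    base = linkage(X1, X2, d)
    for _ in range(trials):
        bump = {(x, y): random.uniform(0.0, eps) for x in X1 for y in X2}
        def d_up(x, y):
            if (x, y) in bump:
                return d(x, y) + bump[(x, y)]
            if (y, x) in bump:
                return d(x, y) + bump[(y, x)]
            return d(x, y)          # edges within X1 or within X2 unchanged
        if linkage(X1, X2, d_up) < base:
            return False            # found a monotonicity violation
    return True

# All three linkage functions above pass, e.g.:
# check_monotonicity(average_linkage, {0.0, 1.0}, {5.0, 7.0}, lambda x, y: abs(x - y))
```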
Proof Sketch
Recall the direction: if A satisfies outer-consistency and locality, then A is linkage-based.
Goal: define a linkage function ℓ so that the linkage-based algorithm based on ℓ outputs A(X, d) (for every X and d).
Proof Sketch
Define an operator <_A: (X, Y, d_1) <_A (Z, W, d_2) if, when we run A on (X ∪ Y ∪ Z ∪ W, d), where d extends d_1 and d_2, the clusters X and Y are merged before Z and W.
– Prove that <_A can be extended to a partial ordering.
– Use the ordering to define ℓ.
Proof Sketch (continued): Show that <_A Is a Partial Ordering
We show that <_A is cycle-free.
Lemma: Given a hierarchical algorithm A that is local and outer-consistent, there exists no finite sequence such that (X_1, Y_1, d_1) <_A … <_A (X_n, Y_n, d_n) <_A (X_1, Y_1, d_1).
Proof Sketch (continued)
By the above lemma, the transitive closure of <_A is a partial ordering. This implies that there exists an order-preserving function ℓ that maps pairs of data sets to ℝ⁺. It can be shown that ℓ satisfies the properties of a linkage function.
Hierarchical but Not Linkage-Based
P-divisive algorithms construct dendrograms top-down, using a partitional 2-clustering algorithm P to split nodes (for example, k-means with k = 2); a sketch follows.
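A minimal sketch of the P-divisive scheme, reusing Node from the earlier sketch; two_clustering stands in for the partitional 2-clustering algorithm P and is assumed to return a 2-partition of its input.

```python
def p_divisive(X, d, two_clustering):
    """Build a dendrogram top-down: split each cluster into two parts with
    the partitional 2-clustering algorithm P, then recurse on each part."""
    X = list(X)
    if len(X) == 1:
        return Node(point=X[0])
    left, right = two_clustering(X, d)   # P returns a 2-partition of X
    return Node(children=(p_divisive(left, d, two_clustering),
                          p_divisive(right, d, two_clustering)))
```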
Hierarchical but Not Linkage-Based
A partitional 2-clustering algorithm P is context-sensitive if there exists a d’ extending d such that P({x, y, z}, d) = {{x}, {y, z}} and P({x, y, z, w}, d’) = {{x, y}, {z, w}}. Examples: k-means, min-sum, min-diameter.
Theorem [Ackerman & Ben-David, IJCAI ’11]: If P is context-sensitive, then the P-divisive algorithm fails the locality property.
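For intuition, min-diameter yields a concrete witness of context-sensitivity. The instance below, four points on a line at 0, 2, 3, 5, is an illustrative example (not taken from the paper); here d’ is simply |a − b| extended from the first three points to all four.

```python
from itertools import combinations

def min_diameter_2clustering(X, d):
    """Brute-force 2-clustering minimizing the larger of the two cluster diameters."""
    def diam(S):
        return max((d(x, y) for x, y in combinations(S, 2)), default=0.0)
    X, best = list(X), None
    for r in range(1, len(X)):
        for left in combinations(X, r):
            right = [x for x in X if x not in left]
            cost = max(diam(left), diam(right))
            if best is None or cost < best[0]:
                best = (cost, (set(left), set(right)))
    return best[1]

d = lambda a, b: abs(a - b)
print(min_diameter_2clustering([0, 2, 3], d))     # ({0}, {2, 3})
print(min_diameter_2clustering([0, 2, 3, 5], d))  # ({0, 2}, {3, 5})
```

On the triple, z = 3 joins y = 2; once w = 5 is added, the same algorithm pairs x = 0 with y = 2 instead, which is exactly the behaviour the definition asks for.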
Hierarchical but Not Linkage-Based
The input-output behaviour of some natural divisive algorithms is distinct from that of every linkage-based algorithm: the bisecting k-means algorithm, and other natural divisive algorithms, cannot be simulated by any linkage-based algorithm.
Conclusions
– We present a new framework for clustering algorithm selection.
– We provide a property-based classification of common clustering algorithms.
– We characterize linkage-based clustering in terms of two natural properties.
– We show that no linkage-based algorithm can simulate some natural divisive algorithms.
What’s Next?
– Apply our approach to specific clustering applications (Ackerman, Brown, and Loker, ICCABS ’12).
– Bridge the gap in other clustering settings: clustering with a “noise cluster”, algorithms for categorical data.
– Axioms of clustering algorithms.