Presentation transcript:

Towards Theoretical Foundations of Clustering
Margareta Ackerman, Caltech
Joint work with Shai Ben-David and David Loker

The Theory-Practice Gap

Clustering is one of the most widely used tools for exploratory data analysis. The social sciences, biology, astronomy, computer science, and many other fields all apply clustering to gain a first understanding of the structure of large data sets.

“While the interest in and application of cluster analysis has been rising rapidly, the abstract nature of the tool is still poorly understood” (Wright, 1973)

“There has been relatively little work aimed at reasoning about clustering independently of any particular algorithm, objective function, or generative data model” (Kleinberg, 2002)

Both statements still apply today.

Inherent Obstacles: Clustering is Ill-Defined

Clustering aims to assign data into groups of similar items. Beyond that, there is very little consensus on the definition of clustering.

Inherent Obstacles

Clustering is inherently ambiguous:
– There may be multiple reasonable clusterings.
– There is usually no ground truth.
There are many clustering algorithms with different (often implicit) objective functions.

Outline

– Previous work
– Clustering algorithm selection
– Characterization of linkage-based clustering
  – Sketch of proof
  – Hierarchical algorithms that are not linkage-based
– Conclusions and future work

Previous Work
Towards a General Theory: Axiomatizing Clustering

– Clustering in the weighted setting (Wright, ’73)
– Axioms of clustering distance functions (Meila, ACM ’05)
– Impossibility result (Kleinberg, NIPS ’02)
– Rebuttal to the impossibility result (Ackerman & Ben-David, NIPS ’08)

Previous Work
Towards a General Theory: Clusterability

– Conditions for efficiently uncovering the target clustering (Balcan, Blum, and Vempala, STOC ’08; Balcan, Blum, and Gupta, SODA ’09)
– A theoretical study of clusterability (Ackerman & Ben-David, AISTATS ’09):
  – Notions of clusterability are pairwise distinct.
  – Data sets that are more clusterable are computationally easier to cluster well.

Outline

– Previous work
– Clustering algorithm selection
– Characterization of linkage-based clustering
  – Sketch of proof
  – Hierarchical algorithms that are not linkage-based
– Conclusions and future work

Clustering Algorithm Selection

There is a wide variety of clustering algorithms, which often produce very different clusterings. How should a user decide which algorithm to use for a given application?

Clustering Algorithm Selection

Users rely on cost-related considerations: running times, space usage, software purchasing costs, etc. There is inadequate emphasis on input-output behaviour.

Radical Differences in Input/Output Behavior of Clustering Algorithms

[Figures.]

Our Framework for Clustering Algorithm Selection

We propose a framework that lets a user utilize prior knowledge to select an algorithm: identify properties that distinguish the input-output behaviour of different clustering paradigms. The properties should be:
1) Intuitive and “user-friendly”
2) Useful for distinguishing clustering algorithms

Our Framework for Clustering Algorithm Selection

The long-term goal is to construct a large property-based classification of many useful clustering algorithms. This would facilitate the application of prior knowledge, enabling users to identify a suitable algorithm without the overhead of executing many algorithms. The framework also helps in understanding the behaviour of existing and new algorithms.

Taxonomy of Partitional Algorithms (Ackerman, Ben-David, Loker, NIPS 2010)

[Table: single linkage, average linkage, complete linkage, k-means, k-median, min-sum, ratio-cut, and normalized cut, each classified by the properties Locality, Outer Consistency, Inner Consistency, Refinement Preservation, Order Invariance, Outer Richness, Scale Invariance, and Isomorphism Invariance.]

Axioms vs. Properties

[The same property/algorithm table, with the columns grouped under the labels “Properties” and “Axioms”.]

Characterization of Linkage-Based Clustering (Ackerman, Ben-David, Loker, COLT 2010)

[The same property/algorithm table.]

Characterization of Linkage-Based Clustering (Ackerman, Ben-David, Loker, COLT 2010)

The 2010 characterization applies in the partitional setting, using the k-stopping criterion. It distinguishes linkage-based algorithms from other partitional algorithms.

[Table rows: single linkage, average linkage, complete linkage.]

Characterizing Linkage-Based Clustering in the Hierarchical Setting (Ackerman and Ben-David, IJCAI 2011)

– We propose two intuitive properties that uniquely identify hierarchical linkage-based clustering algorithms.
– We show that common hierarchical algorithms, including bisecting k-means, cannot be simulated by any linkage-based algorithm.

Outline

– Previous work
– Clustering algorithm selection
– Characterization of linkage-based clustering
  – Sketch of proof
  – Hierarchical algorithms that are not linkage-based
– Conclusions and future work

Formal Setup: Dendrograms and Clusterings

C_i is a cluster in a dendrogram D if there exists a node in D whose set of leaf descendants is exactly C_i.

Formal Setup: Dendrograms and Clusterings

C = {C_1, …, C_k} is a clustering in a dendrogram D if
– C_i is a cluster in D for all 1 ≤ i ≤ k, and
– the clusters are pairwise disjoint.

Formal Setup: Hierarchical Clustering Algorithms

A hierarchical clustering algorithm A maps an input (X,d), a data set X with a dissimilarity function d, to a dendrogram of X.
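
To make this setup concrete, here is a minimal Python sketch of the definitions above (all names, such as Node and is_cluster, are hypothetical and not from the talk): a dendrogram is a binary tree, and a set is a cluster exactly when it equals the leaf descendants of some node.

```python
class Node:
    """A dendrogram node: a leaf holds one data point, an inner node has two children."""
    def __init__(self, point=None, left=None, right=None):
        self.point = point          # set for leaves only
        self.left = left
        self.right = right

    def leaves(self):
        """Return the set of leaf descendants of this node."""
        if self.point is not None:
            return {self.point}
        return self.left.leaves() | self.right.leaves()

    def nodes(self):
        """Yield every node in the subtree rooted at this node."""
        yield self
        if self.point is None:
            yield from self.left.nodes()
            yield from self.right.nodes()


def is_cluster(C, root):
    """C_i is a cluster in D iff some node's leaf descendants are exactly C_i."""
    return any(node.leaves() == set(C) for node in root.nodes())


def is_clustering(clusters, root):
    """{C_1, ..., C_k} is a clustering in D iff every C_i is a cluster in D
    and the C_i are pairwise disjoint."""
    sets = [set(C) for C in clusters]
    disjoint = all(a.isdisjoint(b) for i, a in enumerate(sets) for b in sets[i + 1:])
    return disjoint and all(is_cluster(s, root) for s in sets)


# Example: the dendrogram ((x1, x2), x3).
D = Node(left=Node(left=Node(point="x1"), right=Node(point="x2")),
         right=Node(point="x3"))
assert is_cluster({"x1", "x2"}, D) and not is_cluster({"x2", "x3"}, D)
assert is_clustering([{"x1", "x2"}, {"x3"}], D)
```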

Linkage-Based Algorithm

– Create a leaf node for every element of X.
– Repeat the following until a single tree remains:
  – Consider the clusters represented by the remaining root nodes.
  – Merge the closest pair of clusters by assigning them a common parent node.

Examples of Linkage-Based Algorithms

The choice of linkage function distinguishes between different linkage-based algorithms. Examples of common linkage functions:
– Single linkage: shortest between-cluster distance
– Average linkage: average between-cluster distance
– Complete linkage: maximum between-cluster distance
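
The generic procedure above fits in a few lines of Python. This is a hedged sketch rather than the talk's formal model, and all names (linkage_based, single_linkage, and so on) are hypothetical; the three linkage functions differ only in how they score a pair of clusters.

```python
import itertools

def single_linkage(A, B, d):
    """Shortest between-cluster distance."""
    return min(d(a, b) for a in A for b in B)

def average_linkage(A, B, d):
    """Average between-cluster distance."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))

def complete_linkage(A, B, d):
    """Maximum between-cluster distance."""
    return max(d(a, b) for a in A for b in B)

def linkage_based(X, d, linkage):
    """Build a dendrogram bottom-up: one leaf per element of X, then repeatedly
    give the closest pair of root clusters (per the linkage function) a common
    parent node, until a single tree remains. Ties are broken arbitrarily."""
    roots = [(frozenset([x]), x) for x in X]   # (cluster contents, tree) pairs
    while len(roots) > 1:
        i, j = min(itertools.combinations(range(len(roots)), 2),
                   key=lambda ij: linkage(roots[ij[0]][0], roots[ij[1]][0], d))
        (A, ta), (B, tb) = roots[i], roots[j]
        roots = [r for k, r in enumerate(roots) if k not in (i, j)]
        roots.append((A | B, (ta, tb)))        # merge under a common parent
    return roots[0][1]                          # nested tuples encode the dendrogram

# Example: four points on a line under |a - b|.
print(linkage_based([0.0, 1.0, 2.5, 6.0], lambda a, b: abs(a - b), single_linkage))
```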

Locality

Informal definition: if we select a set of disjoint clusters from a dendrogram and run the algorithm on the union of these clusters, we obtain a result that is consistent with the original dendrogram.

[Figure: D = A(X,d) and D' = A(X',d), where X' = {x_1, …, x_6}.]

Outer Consistency

An outer-consistent change of d with respect to a clustering C increases (or leaves unchanged) the distances between clusters of C, while leaving within-cluster distances unchanged. If A is outer-consistent and C appears in A(X,d), then for any such d', A(X,d') will also include the clustering C.

[Figure: C on data set (X,d) and on (X,d') after an outer-consistent change.]
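
A small sketch may help here (hypothetical names, and assuming a dissimilarity function rather than a metric): the function below builds one particular outer-consistent variant d' of d with respect to C, by scaling every between-cluster distance up while keeping within-cluster distances fixed. Outer consistency of A then says that C's appearance in the output survives any such change.

```python
def outer_consistent_variant(d, clustering, stretch=2.0):
    """Return d' agreeing with d inside each cluster of `clustering` and
    scaling distances between different clusters by `stretch` >= 1.
    This is one example of an outer-consistent change of d w.r.t. C."""
    cluster_of = {x: i for i, C in enumerate(clustering) for x in C}

    def d_prime(a, b):
        if cluster_of[a] == cluster_of[b]:
            return d(a, b)              # within-cluster distances unchanged
        return stretch * d(a, b)        # between-cluster distances only grow
    return d_prime

# Example: widen the gap between clusters {0, 1} and {2.5, 6}.
d = lambda a, b: abs(a - b)
d2 = outer_consistent_variant(d, [{0.0, 1.0}, {2.5, 6.0}], stretch=3.0)
assert d2(0.0, 1.0) == 1.0 and d2(1.0, 2.5) == 4.5
```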

Characterization of Linkage-Based Clustering

Theorem (Ackerman & Ben-David, IJCAI 2011): A hierarchical clustering algorithm is linkage-based if and only if it is local and outer-consistent.

Outline

– Previous work
– Clustering algorithm selection
– Characterization of linkage-based clustering
  – Sketch of proof
  – Hierarchical algorithms that are not linkage-based
– Conclusions and future work

Easy Direction of Proof

Every linkage-based hierarchical clustering algorithm is local and outer-consistent. The proof is quite straightforward.

Interesting Direction of Proof

If A is local and outer-consistent, then A is linkage-based. To prove this direction, we first need to formalize linkage-based clustering by formally defining what a linkage function is.

What Do We Expect From Linkage Functions?

A linkage function is a function

ℓ : {(X_1, X_2, d) : d is a distance function over X_1 ∪ X_2} → R^+

that satisfies the following:
– Representation independence: ℓ(X_1, X_2, d) does not change if we re-label the data.
– Monotonicity: if we increase the distances between X_1 and X_2, then ℓ(X_1, X_2, d) does not decrease.

Sketch of Proof

Recall the direction: if A satisfies outer consistency and locality, then A is linkage-based.

Goal: define a linkage function ℓ so that the linkage-based clustering based on ℓ outputs A(X,d), for every X and d.

Sketch of Proof

Define an operator <_A: we say (X, Y, d_1) <_A (Z, W, d_2) if, when we run A on (X ∪ Y ∪ Z ∪ W, d), where d extends d_1 and d_2, X and Y are merged before Z and W.

– Prove that <_A can be extended to a partial ordering.
– Use the ordering to define ℓ.

Sketch of Proof (continued): Show that <_A is a Partial Ordering

We show that <_A is cycle-free.

Lemma: Given a hierarchical algorithm A that is local and outer-consistent, there exists no finite sequence such that (X_1, Y_1, d_1) <_A … <_A (X_n, Y_n, d_n) <_A (X_1, Y_1, d_1).

Sketch of Proof (continued)

By the above lemma, the transitive closure of <_A is a partial ordering. This implies that there exists an order-preserving function ℓ that maps pairs of data sets to R^+. It can be shown that ℓ satisfies the properties of a linkage function.

Outline

– Previous work
– Clustering algorithm selection
– Characterization of linkage-based clustering
  – Sketch of proof
  – Hierarchical algorithms that are not linkage-based
– Conclusions and future work

Hierarchical but Not Linkage-Based

P-divisive algorithms construct dendrograms top-down, using a partitional 2-clustering algorithm P to split nodes (e.g., k-means with k = 2).
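
For concreteness, here is a hedged sketch of such a divisive procedure with P = 2-means, on one-dimensional data for brevity (two_means and bisecting_kmeans are hypothetical names; a real implementation would handle vectors and smarter initialization):

```python
import random

def two_means(points, iters=20, seed=0):
    """Plain Lloyd's algorithm with k = 2; a stand-in for the 2-clustering P."""
    rng = random.Random(seed)
    c1, c2 = rng.sample(points, 2)             # two starting centers
    for _ in range(iters):
        A = [p for p in points if abs(p - c1) <= abs(p - c2)]
        B = [p for p in points if abs(p - c1) > abs(p - c2)]
        if not A or not B:
            break
        c1, c2 = sum(A) / len(A), sum(B) / len(B)
    return A, B

def bisecting_kmeans(points):
    """Construct the dendrogram top-down by splitting each node with P."""
    if len(points) == 1:
        return points[0]                       # leaf node
    A, B = two_means(points)
    if not A or not B:                         # degenerate split: peel off a point
        A, B = points[:1], points[1:]
    return (bisecting_kmeans(A), bisecting_kmeans(B))

# Example: the top split separates the two groups of points.
print(bisecting_kmeans([0.0, 1.0, 2.5, 6.0, 7.0]))
```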

Hierarchical but Not Linkage-Based

A partitional 2-clustering algorithm P is context sensitive if there exist distance functions d ⊂ d’ such that P({x, y, z}, d) = {{x}, {y, z}} and P({x, y, z, w}, d’) = {{x, y}, {z, w}}.

Examples: k-means, min-sum, min-diameter.
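
The definition is easy to state as a predicate. The checker below is a hypothetical sketch (names and signatures are assumed, not from the talk): given P, a pair d ⊂ d’, and the labeled points, it verifies the two required outputs.

```python
def as_sets(clustering):
    """Order-insensitive form of a clustering, e.g. [{x}, {y, z}]."""
    return frozenset(frozenset(C) for C in clustering)

def witnesses_context_sensitivity(P, d, d_prime, x, y, z, w):
    """Check that (d, d') witnesses context sensitivity of P:
    d' extends d on {x, y, z}, P({x,y,z}, d) = {{x}, {y,z}},
    and P({x,y,z,w}, d') = {{x,y}, {z,w}}."""
    small, big = [x, y, z], [x, y, z, w]
    extends = all(d_prime(a, b) == d(a, b) for a in small for b in small)
    return (extends
            and as_sets(P(small, d)) == as_sets([{x}, {y, z}])
            and as_sets(P(big, d_prime)) == as_sets([{x, y}, {z, w}]))
```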

Hierarchical but Not Linkage-Based

The input-output behaviour of some natural divisive algorithms is distinct from that of every linkage-based algorithm: bisecting k-means, and other natural divisive algorithms, cannot be simulated by any linkage-based algorithm.

Conclusions

– We present a new framework for clustering algorithm selection.
– We provide a property-based classification of common clustering algorithms.
– We characterize linkage-based clustering in terms of two natural properties.
– We show that no linkage-based algorithm can simulate some natural divisive algorithms.

What’s Next?

– Our approach to selecting clustering algorithms can be applied to any clustering application (e.g., phylogeny).
– Classify applications in terms of their clustering needs:
  – Target research on common clustering needs or on specific applications.
  – Identify when results are relevant to specific applications.
– Bridge the gap in other clustering settings (e.g., clustering with a “noise cluster”).
– Axioms of clustering algorithms.