Characterization of Linkage-Based Algorithms
Margareta Ackerman
Joint work with Shai Ben-David and David Loker
University of Waterloo
To appear in COLT 2010



Motivation
There is a wide variety of clustering algorithms, which often produce very different clusterings. How can we distinguish between clustering algorithms? How should a user decide which algorithm to use for a given application?

Our approach to clustering algorithm selection
We propose a framework that lets a user apply prior knowledge to select an algorithm: identify properties that distinguish between different clustering paradigms. The properties should be:
1) Intuitive and “user-friendly”
2) Useful for classifying clustering algorithms

Previous work
Kleinberg proposes abstract properties (“axioms”) of clustering functions (NIPS, 2002). Bosagh Zadeh and Ben-David provide a set of properties that characterize single-linkage clustering (UAI, 2009).

Our contributions
- Propose a set of intuitive properties that uniquely identify linkage-based clustering algorithms
- Construct a taxonomy of clustering algorithms based on these properties

Outline
- Define linkage-based clustering
- Our new clustering properties
- Main result
- Sketch of proof
- A taxonomy of common clustering algorithms using clustering properties
- Conclusions

Formal setup
For a finite domain set X and a dissimilarity function d over the members of X, a clustering function F maps
Input: (X, d) and k > 0
to
Output: a k-partition (clustering) of X.

Linkage-based algorithm: an informal definition
- Start with the clustering of singletons
- Merge the closest pair of clusters
- Repeat until only k clusters remain
Examples: single linkage, average linkage, complete linkage.
Informally, a linkage function is an extension of the between-point distance that applies to subsets of the domain. The choice of linkage function distinguishes between different linkage-based algorithms.
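The merge loop above can be sketched in a few lines. This is a minimal illustration, not the paper's formalization: dissimilarities are assumed to be given as a dict keyed by frozenset point pairs, and the linkage function is the only part that varies between algorithms.

```python
def linkage_clustering(points, d, k, link):
    """Repeatedly merge the pair of clusters with minimal linkage value
    until only k clusters remain."""
    clusters = [frozenset([p]) for p in points]  # start with singletons
    while len(clusters) > k:
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda xy: link(xy[0], xy[1], d),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

def single_linkage(A, B, d):
    """Single linkage: minimum between-cluster distance."""
    return min(d[frozenset((p, q))] for p in A for q in B)

pts = [0, 1, 10, 11]
d = {frozenset((p, q)): abs(p - q) for p in pts for q in pts if p != q}
clusters = linkage_clustering(pts, d, 2, single_linkage)  # clusters {0,1} and {10,11}
```

Swapping in a different `link` (average, complete) yields a different linkage-based algorithm over the same merge loop.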

Outline
- Define linkage-based clustering
- Our new clustering properties
- Main result
- Sketch of proof
- A taxonomy of common clustering algorithms using our properties
- Conclusions

Hierarchical clustering
A clustering C is a refinement of clustering C’ if every cluster in C’ is a union of clusters in C. A clustering function F is hierarchical if for every 1 ≤ k ≤ k’ ≤ |X|, F(X,d,k’) is a refinement of F(X,d,k).
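The refinement relation is mechanical to check. A small sketch, with clusterings represented as collections of frozensets (an encoding assumed here for illustration):

```python
def is_refinement(C, C_prime):
    """True iff every cluster of C_prime is a union of clusters of C,
    i.e. C refines C_prime."""
    for big in C_prime:
        parts = [c for c in C if c <= big]  # clusters of C contained in big
        covered = frozenset().union(*parts) if parts else frozenset()
        if covered != big:
            return False
    return True

fine = [frozenset({0}), frozenset({1}), frozenset({2, 3})]
coarse = [frozenset({0, 1}), frozenset({2, 3})]
assert is_refinement(fine, coarse)       # the 3-clustering refines the 2-clustering
assert not is_refinement(coarse, fine)
```

A hierarchical function then satisfies `is_refinement(F(X,d,k'), F(X,d,k))` whenever k’ ≥ k.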

Locality
F is local if, for any clustering C = F(X,d,k) and any subset of clusters C’ ⊆ C, running F on the union of the clusters in C’ (with the induced dissimilarity and |C’| as the number of clusters) returns C’ itself: F(∪C’, d, |C’|) = C’.

Which paradigms satisfy locality?
Many clustering algorithms are local:
- K-means
- K-median
- Single linkage
- Average linkage
- Complete linkage
Notably, some clustering algorithms fail locality:
- Ratio cut
- Normalized cut

Outer consistency (based on Kleinberg, 2002)
If d’ equals d except for increased between-cluster distances, then F(X,d,k) = F(X,d’,k), for all X, d, and k.
[Figure: a clustering F(X,d,3) under d, and the same clustering F(X,d’,3) after the between-cluster distances increase.]
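Outer consistency can be checked concretely for single linkage. An illustrative sketch (not the paper's proof), again assuming a dict-based distance representation:

```python
def single_linkage_2(points, d):
    """2-clustering by repeatedly merging the closest pair of clusters
    (single linkage: between-cluster distance = minimum edge)."""
    clusters = [frozenset([p]) for p in points]
    while len(clusters) > 2:
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda xy: min(d[frozenset((p, q))] for p in xy[0] for q in xy[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)

pts = [0, 1, 10, 11]
d = {frozenset((p, q)): float(abs(p - q)) for p in pts for q in pts if p != q}
C = single_linkage_2(pts, d)

# d' equals d except that between-cluster distances grow.
d_prime = {k: (v if any(k <= c for c in C) else 3 * v) for k, v in d.items()}
assert single_linkage_2(pts, d_prime) == C  # output unchanged
```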

Which paradigms satisfy outer consistency?
- K-means
- K-median
- Single linkage
- Average linkage
- Complete linkage
Not all clustering algorithms are outer-consistent:
- Ratio cut
- Normalized cut

Extended Richness

Extended richness
F satisfies extended richness if for any set of domains (X1,d1), …, (Xk,dk), there is a distance function d over X1 ∪ … ∪ Xk that extends each of the di, so that F(X1 ∪ … ∪ Xk, d, k) = {X1, …, Xk}.
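One way to see why such a d exists for distance-respecting algorithms: keep each di and push all between-domain distances above every within-domain distance. A sketch of that construction (an illustration, not the paper's proof; the dict-keyed representation is an assumption of this sketch):

```python
def extend_distances(domains):
    """domains: list of (points, d_i) pairs, each d_i a dict over frozenset
    point pairs. Returns (all_points, d) where d extends every d_i and all
    between-domain distances exceed every within-domain distance."""
    all_points = [p for pts, _ in domains for p in pts]
    d = {}
    max_within = 0.0
    for _, d_i in domains:
        for pair, val in d_i.items():
            d[pair] = val                      # d extends d_i
            max_within = max(max_within, val)
    far = 10 * max_within + 1                  # larger than any within-domain edge
    for i, (pts_a, _) in enumerate(domains):
        for pts_b, _ in domains[i + 1:]:
            for p in pts_a:
                for q in pts_b:
                    d[frozenset((p, q))] = far
    return all_points, d

X1 = ([0, 1], {frozenset((0, 1)): 1.0})
X2 = ([2, 3], {frozenset((2, 3)): 2.0})
pts, d = extend_distances([X1, X2])
assert d[frozenset((0, 1))] == 1.0   # within-domain distances preserved
assert d[frozenset((0, 2))] > 2.0    # between-domain distances dominate
```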

Many clustering algorithms satisfy extended richness:
- K-means
- K-median
- Single linkage
- Average linkage
- Complete linkage
- Ratio cut
- Normalized cut

Outline
- Define linkage-based clustering
- Our new clustering properties
- Main result
- Sketch of proof
- A taxonomy of common clustering algorithms using our properties
- Conclusions

Our main result
Theorem: A clustering function is linkage-based if and only if it is hierarchical and satisfies outer consistency, locality, and extended richness.

Easy direction of proof
Every linkage-based clustering function is hierarchical, local, outer-consistent, and satisfies extended richness. The proof is quite straightforward.

Interesting direction of proof
If F is hierarchical and satisfies outer consistency, locality, and extended richness, then F is linkage-based. To prove this direction we first need to formalize linkage-based clustering, by defining formally what a linkage function is.

What do we expect from a linkage function?
A linkage function is a function l : {(X1, X2, d) : d is a distance function over X1 ∪ X2} → R+ that satisfies the following:
1) Representation independence: l doesn’t change if we re-label the data.
2) Monotonicity: if we increase edges that go between X1 and X2, then l doesn’t decrease.
3) Any pair of clusters can be made arbitrarily distant: by increasing the edges that go between X1 and X2, we can make l reach any value in the range of l.
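The three classic linkage functions fit this definition. A sketch, again with d as a dict over frozenset point pairs (an encoding assumed for illustration); each is representation independent, and increasing any between-cluster edge can only increase its value, matching the monotonicity requirement:

```python
def single(A, B, d):
    """Minimum between-cluster distance."""
    return min(d[frozenset((p, q))] for p in A for q in B)

def complete(A, B, d):
    """Maximum between-cluster distance."""
    return max(d[frozenset((p, q))] for p in A for q in B)

def average(A, B, d):
    """Mean between-cluster distance."""
    return sum(d[frozenset((p, q))] for p in A for q in B) / (len(A) * len(B))

A, B = {0, 1}, {2, 3}
d = {frozenset((0, 2)): 1.0, frozenset((0, 3)): 3.0,
     frozenset((1, 2)): 2.0, frozenset((1, 3)): 4.0}
assert single(A, B, d) == 1.0
assert complete(A, B, d) == 4.0
assert average(A, B, d) == 2.5
```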

Sketch of proof
Recall the direction: if F is a hierarchical function that satisfies outer consistency, locality, and extended richness, then F is linkage-based.
Goal: define a linkage function l so that the linkage-based clustering based on l outputs F(X,d,k) (for every X, d, and k).

Sketch of proof (continued…)
Define an operator <F: (A,B,d1) <F (C,D,d2) if, when we run F on (A ∪ B ∪ C ∪ D, d), where d extends d1 and d2, A and B are merged before C and D.


Sketch of proof (continued…)
- Prove that <F can be extended to a partial ordering
- Use the ordering to define l

Sketch of proof (continued): show that <F is a partial ordering
We show that <F is cycle-free.
Lemma: Given a function F that is hierarchical, local, outer-consistent, and satisfies extended richness, there are no (A1,B1,d1), …, (An,Bn,dn) so that (A1,B1,d1) <F (A2,B2,d2) <F … <F (An,Bn,dn) and (An,Bn,dn) <F (A1,B1,d1).

Sketch of proof (continued…)
By the above lemma, the transitive closure of <F is a partial ordering. This implies that there exists an order-preserving function l that maps pairs of data sets to R. It can be shown that l satisfies the properties of a linkage function.

Outline
- Define linkage-based clustering
- Our new clustering properties
- Main result
- Sketch of proof
- A taxonomy of common clustering algorithms using our properties
- Conclusions

Taxonomy of clustering algorithms
[Table: rows are single linkage, average linkage, complete linkage, K-means, K-median, Min-Sum, Ratio-cut, and Normalized-cut; columns are the properties Local, Outer Consistency, Inner Consistency, Hierarchical, Path Distance, Order Invariance, Extended Richness, Scale Invariance, and Isomorphism Invariance; each cell marks whether the algorithm satisfies the property.]

Characterization of Linkage-Based Algorithms
[The same property table, highlighting the characterizing properties: Hierarchical, Local, Outer Consistency, and Extended Richness.]

Characterization of Single Linkage, by Bosagh Zadeh and Ben-David (UAI, 09)
[The same property table as above.]

Distinguishing among Linkage-Based Algorithms
[The same property table as above.]

Distinguishing among Linkage-Based Algorithms
A function F is order invariant if for all d and d’ where, for all points p, q, r, s, d(p,q) < d(r,s) iff d’(p,q) < d’(r,s), we have F(X,d) = F(X,d’).
[Property table restricted to single linkage, average linkage, and complete linkage.]
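Order invariance can be seen in miniature: under single linkage, which pair merges next depends only on the ranking of distances, so any order-preserving transform of d leaves the run unchanged. An illustrative check (not a proof):

```python
def closest_pair(points, d):
    """The pair merged first by single linkage: the globally closest pair."""
    pairs = [frozenset((p, q)) for i, p in enumerate(points) for q in points[i + 1:]]
    return min(pairs, key=lambda pr: d[pr])

pts = [0, 1, 5, 12]
d = {frozenset((p, q)): abs(p - q) for i, p in enumerate(pts) for q in pts[i + 1:]}
d_mono = {pair: v ** 2 for pair, v in d.items()}  # order-preserving transform

assert closest_pair(pts, d) == closest_pair(pts, d_mono)
```

Average linkage, by contrast, averages distances, and averages are not determined by the ordering alone, which is why it fails order invariance.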

When “natural” properties are not satisfied
[The same property table as above.]

Properties vs. Axioms
[The same property table, with the columns grouped into properties and axioms.]

Advantages of the framework
- Using this framework, clustering users can apply prior knowledge to determine which properties make sense for their application
- The goal is to construct a property-based taxonomy for many useful clustering algorithms
- Using this approach, a user can find a suitable algorithm without the overhead of executing many clustering algorithms

Conclusions
- We introduced new properties of clustering algorithms
- We used these properties to provide a characterization of linkage-based algorithms
- We classified common clustering algorithms using these properties

Axioms of clustering
Kleinberg (NIPS, 02) proposed three “axioms” of clustering functions, which he showed to be inconsistent. Ackerman and Ben-David (NIPS, 08) showed that these properties are consistent in the setting of clustering-quality measures. Goal: find a consistent set of axioms of clustering functions.

Axioms vs. properties
- An axiom is a property that is satisfied by all members of a class
- A complete set of axioms of clustering functions would be satisfied by all clustering functions, and only by clustering functions
- Our goal is to find a complete set of axioms of clustering
- We use Kleinberg’s axioms as a starting point: if we fix k, Kleinberg’s axioms are consistent

Kleinberg’s axioms for fixed k
Scale invariance: the output of the function doesn’t change if the data is scaled uniformly. Satisfied by common clustering algorithms.
Richness: for every k-clustering C of X, there exists a distance function d over X so that F(X,d,k) = C. Richness is implied by extended richness. Satisfied by common clustering algorithms.
Consistency: if d’ equals d except for increased between-cluster distances, then F(X,d,k) = F(X,d’,k). Not satisfied by some common clustering algorithms. Relaxations of this property, inner consistency and outer consistency, are also not satisfied by some common algorithms.

Towards axioms of clustering functions
We propose using the following as axioms of clustering:
- Scale invariance
- Isomorphism invariance
- Extended richness
Are there natural clustering functions that fail any of these properties? Are these axioms sufficient?
