An Impossibility Theorem for Clustering, by Jon Kleinberg

Presentation transcript:

An Impossibility Theorem for Clustering
By Jon Kleinberg

Definitions
- Clustering function: a function f that takes a set S of n ≥ 2 points together with a distance function d on S and returns a partition Γ of S, i.e. f(S, d) = Γ.
- Distance function: d(i, j) ≥ 0 with d(i, j) = 0 if and only if i = j, and d(i, j) = d(j, i).
- The triangle inequality is not required.

Many different clustering criteria
- k-center
- k-median
- k-means
- Inter-Intra
- etc.

k-Center: choose k centers so as to minimize the maximum distance from a point to its nearest center.

k-Median: choose k centers so as to minimize the average (equivalently, total) distance from points to their nearest centers.
k-Means: minimize the sum of squared distances instead.

Inter-Intra: let T(C) be the total intra-cluster (within-cluster) distance and D(C) the total inter-cluster (between-cluster) distance; maximize D(C) − T(C).
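To make these criteria concrete, here is a minimal Python sketch (my own illustration, not from the talk). It assumes points are hashable, dist maps each unordered pair frozenset({i, j}) to d(i, j), centers is a collection of chosen data points, and partition is a list of sets; since centers are restricted to data points, the k-means function below is the squared-distance variant the slide describes rather than the usual mean-based formulation.

    def k_center_cost(points, dist, centers):
        # maximum distance from any point to its nearest center
        return max(min(dist[frozenset({p, c})] if p != c else 0.0 for c in centers)
                   for p in points)

    def k_median_cost(points, dist, centers):
        # total (equivalently, average) distance from points to their nearest centers
        return sum(min(dist[frozenset({p, c})] if p != c else 0.0 for c in centers)
                   for p in points)

    def k_means_cost(points, dist, centers):
        # sum of squared distances from points to their nearest centers
        return sum(min(dist[frozenset({p, c})] if p != c else 0.0 for c in centers) ** 2
                   for p in points)

    def inter_intra_score(dist, partition):
        # D(C) - T(C): total inter-cluster distance minus total intra-cluster distance
        cluster_of = {i: k for k, cluster in enumerate(partition) for i in cluster}
        def same_cluster(pair):
            return len({cluster_of[i] for i in pair}) == 1
        intra = sum(d for pair, d in dist.items() if same_cluster(pair))
        inter = sum(d for pair, d in dist.items() if not same_cluster(pair))
        return inter - intra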

Motivation
- Each criterion optimizes for different features.
- Is there one clustering criterion with phenomenal cosmic powers?

Method
- Give three intuitive axioms that any clustering criterion should satisfy.
- Surprise: it is not possible to satisfy all three.
- Reminiscent of Arrow's impossibility theorem, which shows that no voting rule can satisfy a similarly natural set of axioms for aggregating rankings.

Axiom 1 – Scale-Invariance
- For any distance function d and any β > 0, we have f(S, d) = f(S, β·d).

Axiom 2 – Richness
- Range(f) is equal to the set of all partitions of S.
- i.e., every possible clustering can be produced by choosing the right distance function.

Axiom 3 – Consistency
- Let d and d′ be two distance functions. Suppose f(d) = Γ, and d′ is such that d′(i, j) ≤ d(i, j) for all points i, j in the same cluster of Γ, while d′(i, j) ≥ d(i, j) for all points in different clusters. Then f(d′) = Γ.
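As a small illustration (mine, not the paper's), such a Γ-transformation can be produced by shrinking within-cluster distances and stretching between-cluster distances; Consistency says the clustering function must return the same partition on the transformed distances. The shrink/stretch factors and the dict-of-frozensets representation are assumptions of this sketch.

    def gamma_transform(dist, partition, shrink=0.5, stretch=2.0):
        # Return d' with d'(i, j) <= d(i, j) inside clusters of `partition`
        # and d'(i, j) >= d(i, j) across clusters (requires 0 < shrink <= 1 <= stretch).
        cluster_of = {i: k for k, cluster in enumerate(partition) for i in cluster}
        def same_cluster(pair):
            return len({cluster_of[i] for i in pair}) == 1
        return {pair: d * (shrink if same_cluster(pair) else stretch)
                for pair, d in dist.items()}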

Definition
- Anti-chain: a collection of partitions is an anti-chain if it does not contain two distinct partitions such that one is a refinement of the other.
- Since the set of all partitions of S (for |S| ≥ 2) is not an anti-chain, a clustering function whose range is an anti-chain cannot satisfy Richness.
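For concreteness, a quick sketch (again my own) of the refinement relation and the anti-chain test, with each partition given as a list of sets and the collection assumed to contain distinct partitions. For example, {{a}, {b, c}} refines {{a, b, c}}, which is why the set of all partitions of S is never an anti-chain once |S| ≥ 2.

    def refines(P, Q):
        # P refines Q if every cluster of P is contained in some cluster of Q.
        return all(any(p <= q for q in Q) for p in P)

    def is_antichain(partitions):
        # No two distinct partitions in the collection may refine one another.
        return not any(refines(P, Q) or refines(Q, P)
                       for i, P in enumerate(partitions)
                       for Q in partitions[i + 1:])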

Main Result
- For each n ≥ 2, there is no clustering function f that satisfies Scale-Invariance, Richness, and Consistency.
- Implied by the stronger result: if f satisfies Scale-Invariance and Consistency, then Range(f) is an anti-chain.

Reminder of Axioms
- Scale-Invariance: for any distance function d and any β > 0, f(d) = f(β·d).
- Richness: Range(f) is equal to the set of all partitions of S.
- Consistency: let d and d′ be two distance functions with f(d) = Γ. If d′(i, j) ≤ d(i, j) for points in the same cluster of Γ and d′(i, j) ≥ d(i, j) for points in different clusters, then f(d′) = Γ.

Single Linkage
- Cluster agglomeratively by repeatedly merging the closest pair of clusters (starting from singletons).

Any two axioms
- For every pair of axioms, there is a stopping condition for single linkage that satisfies exactly that pair (see the sketch below).
- Consistency + Richness (distance-r condition): only link points at distance less than r.
- Consistency + Scale-Invariance (k-cluster condition): stop when you have k connected components.
- Richness + Scale-Invariance (scale-β condition): if ρ is the maximum pairwise distance, only add edges of weight at most β·ρ for some fixed β < 1.
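The sketch below (my reconstruction, not code from the talk) implements single linkage with a pluggable stopping condition, plus the three conditions from this slide; dist maps unordered pairs frozenset({i, j}) to distances, and the scale-β condition takes the maximum pairwise distance as an explicit argument.

    def single_linkage(n, dist, stop):
        # Repeatedly merge the two closest clusters until `stop` fires.
        clusters = [{i} for i in range(n)]
        def cluster_dist(a, b):
            return min(dist[frozenset({i, j})] for i in a for j in b)
        while len(clusters) > 1:
            (a, b), d = min((((x, y), cluster_dist(x, y))
                             for i, x in enumerate(clusters)
                             for y in clusters[i + 1:]),
                            key=lambda t: t[1])
            if stop(clusters, d):
                break
            clusters.remove(a); clusters.remove(b); clusters.append(a | b)
        return clusters

    # k-cluster condition: Scale-Invariance + Consistency (but not Richness).
    def k_cluster(k):
        return lambda clusters, d: len(clusters) <= k

    # distance-r condition: Richness + Consistency (but not Scale-Invariance).
    def distance_r(r):
        return lambda clusters, d: d >= r

    # scale-beta condition: Richness + Scale-Invariance (but not Consistency).
    def scale_beta(beta, max_dist):
        return lambda clusters, d: d > beta * max_dist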

Centroid-Based Clustering
- (k, g)-centroid clustering function: choose T, a set of k centroid points, such that the sum over all points i of g(d(i, T)) is minimized, where d(i, T) is the distance from i to its nearest centroid; each point is then assigned to its nearest centroid.
- If g is the identity we get k-median, if g(x) = x² we get the k-means-style objective, etc.
- Result: for every k ≥ 2, every function g, and n significantly larger than k, the (k, g)-centroid clustering function does not satisfy Consistency.
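Read literally, the objective on this slide can be written as in the following sketch (same representation assumptions as before; the parameter names are mine):

    def centroid_cost(points, dist, T, g=lambda x: x):
        # Sum over points of g(distance to the nearest centroid in T).
        # g = identity gives k-median; g(x) = x**2 gives the k-means-style cost.
        return sum(g(min(dist[frozenset({p, c})] if p != c else 0.0 for c in T))
                   for p in points)

The (k, g)-centroid clustering function returns the partition induced by the cost-minimizing set T of k centroids.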

Proof: A contradiction
[Figure: a set X of m points with pairwise distances r, and a set Y of λm points with pairwise distances ε; every distance between a point of X and a point of Y is r + δ.]

A new distance function
[Figure: X is split into halves X0 and X1, each of size m/2. Distances within X0 and within X1 are reduced to r′ < r; distances between X0 and X1 remain r, distances within Y remain ε, and distances between X and Y remain r + δ.]

Wrapping Up
- If we pick λ, r, r′, ε and δ appropriately, then under the new distances it becomes cheaper to place the two centroids in X0 and X1 (serving Y at distance r + δ) than to keep one centroid in X and one in Y.
- But then our new centers are in X0 and X1, so the optimal clustering splits X.
- But our new distance function satisfies the conditions of Consistency with respect to {X, Y}, so it should still give us X and Y: a contradiction.
- This covers the case where k is 2 (a numeric check follows below).
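Here is a concrete numeric check of the argument with parameter values of my own choosing, for k = 2 and g the identity (the 2-median case). The shrink from d to d_new only reduces distances inside X and leaves everything else unchanged, which the Consistency axiom above permits, yet the optimal pair of centroids moves from one-in-X-one-in-Y to one-in-X0-one-in-X1.

    from itertools import combinations

    m, lam = 100, 0.1                            # |X| = m, |Y| = lam * m
    r, delta, eps, r_prime = 1.0, 0.1, 0.01, 0.1

    X0 = [('x0', i) for i in range(m // 2)]
    X1 = [('x1', i) for i in range(m // 2)]
    Y = [('y', i) for i in range(int(lam * m))]
    points = X0 + X1 + Y

    def d(p, q):                                 # original distances
        if p[0].startswith('x') and q[0].startswith('x'):
            return r
        if p[0] == 'y' and q[0] == 'y':
            return eps
        return r + delta

    def d_new(p, q):                             # shrink within X0 and within X1 only
        if p[0] == q[0] and p[0] != 'y':
            return r_prime
        return d(p, q)

    def two_median_cost(dist, c1, c2):
        return sum(min(dist(p, c1) if p != c1 else 0.0,
                       dist(p, c2) if p != c2 else 0.0) for p in points)

    def best_centers(dist):
        return min(combinations(points, 2),
                   key=lambda c: two_median_cost(dist, *c))

    print(best_centers(d))       # one center in X and one in Y -> clustering {X, Y}
    print(best_centers(d_new))   # both centers inside X        -> X gets split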

Discussion: Relaxing Axioms
- Refinement-Consistency: if d′ is an f(d)-transformation of d, then f(d′) is a refinement of f(d).
- Near-Richness: every partition except the trivial one can be obtained.
- With these two relaxations, there does exist a clustering function satisfying them (together with Scale-Invariance).
- What other relaxations could we have?

Discussion
- Does this mean there is a law of continuous employment for creators of clustering criteria?
- Is the clustering function properly defined? Should we allow overlaps? Allow outliers?
- Are these the right axioms? All partitions possible vs. the power set?
- Axioms for graph clustering?

Questions?