Theoretical Foundations of Clustering – CS497 Talk Shai Ben-David February 2007.

What is clustering? Given a collection of objects (characterized by feature vectors, or just a matrix of pair-wise similarities), detect the presence of distinct groups, and assign objects to groups.

Another example

Why should we care about clustering? Clustering is a basic step in most data mining procedures. Examples: clustering movie viewers for movie ranking; clustering proteins by their functionality; clustering text documents by content similarity.

The Theory-Practice Gap. Clustering is one of the most widely used tools for exploratory data analysis. Social sciences, biology, astronomy, computer science: all apply clustering to gain a first understanding of the structure of large data sets. Yet there exists distressingly little theoretical understanding of clustering.

What questions should research address? What is clustering? What is “good” clustering? Can clustering be carried out efficiently? Can we distinguish “clusterable” from “structureless” data? Many more…

There are Many Clustering Tasks. “Clustering” is an ill-defined problem: there are many different clustering tasks, leading to different clustering paradigms.

Some more examples

I would like to discuss the broad notion of clustering, independently of any particular algorithm, objective function, or generative data model.

What for? Choosing a suitable algorithm for a given task. Evaluating the quality of clustering methods. Distinguishing significant structure from random fata morgana. Distinguishing clusterable data from structure-less data. Providing performance guarantees for clustering algorithms. Much more…

The Basic Setting. For a finite domain set S, a dissimilarity function (DF) is a symmetric mapping d: S×S → R+ such that d(x,y) = 0 iff x = y. A clustering function takes a dissimilarity function on S and returns a partition of S. We wish to define the properties that distinguish clustering functions from any other functions that output domain partitions.
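
As a quick concrete sketch of this setting (my own illustration, not code from the talk), the DF conditions can be checked directly over a small finite domain:

```python
from itertools import combinations

def is_dissimilarity(S, d):
    """Check that d is a DF over S: symmetric, and d(x, y) = 0 iff x = y
    (so distinct points get strictly positive values)."""
    for x in S:
        if d(x, x) != 0:
            return False
    for x, y in combinations(S, 2):
        if d(x, y) != d(y, x) or d(x, y) <= 0:
            return False
    return True

# A toy DF over a small domain: points on a line.
S = [0.0, 1.0, 1.5, 8.0]
d = lambda x, y: abs(x - y)
print(is_dissimilarity(S, d))  # True
```

A clustering function in this framework is then any map from such a d to a partition of S; the question the slide raises is which such maps deserve the name.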

Kleinberg’s Axioms.
Scale Invariance: F(λd) = F(d) for all d and all strictly positive λ.
Richness: for any finite domain S, {F(d): d is a DF over S} = {P: P a partition of S}.
Consistency: if d′ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances (w.r.t. F(d)), then F(d) = F(d′).
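
To see Scale Invariance in action (a toy example of my own, not from the talk): a clustering function with a fixed distance threshold violates the axiom, because scaling all distances changes its output:

```python
def threshold_clustering(S, d, r=2.0):
    """Toy clustering: connected components of the graph linking points at
    distance <= r. The threshold r is fixed, so this is NOT scale invariant."""
    clusters = []
    for x in S:
        # clusters that x links to at distance <= r
        home = [C for C in clusters if any(d(x, y) <= r for y in C)]
        merged = set().union({x}, *home)
        clusters = [C for C in clusters if C not in home] + [merged]
    return sorted(map(sorted, clusters))

S = [0.0, 1.0, 8.0, 9.0]
d = lambda x, y: abs(x - y)
lam = 3.0
F_d = threshold_clustering(S, d)                            # [[0.0, 1.0], [8.0, 9.0]]
F_ld = threshold_clustering(S, lambda x, y: lam * d(x, y))  # singletons
print(F_d == F_ld)  # False: scaling by 3 breaks the fixed threshold
```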

Note that any pair of the axioms is realizable. Consider single linkage with different stopping criteria: stop at k connected components; stop at distance r; or scale-α stopping, adding edges as long as their length is at most α·(max distance).
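
The stopping-criteria idea can be sketched as one single-linkage routine parameterized by a stopping rule (an illustrative toy implementation with names of my own choosing, not code from the talk):

```python
from itertools import combinations

def single_linkage(S, d, stop):
    """Single linkage: repeatedly merge the two closest clusters.
    stop(clusters, next_dist) decides when to halt."""
    clusters = [{x} for x in S]
    while len(clusters) > 1:
        # pair of clusters at minimum single-link (closest-pair) distance
        (i, j), dist = min(
            (((i, j), min(d(a, b) for a in clusters[i] for b in clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1])
        if stop(clusters, dist):
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return sorted(map(sorted, clusters))

S = [0.0, 1.0, 1.5, 8.0, 9.0]
d = lambda x, y: abs(x - y)

# k-connected-components stopping: halt once k clusters remain
out_k = single_linkage(S, d, lambda cs, _: len(cs) <= 2)
print(out_k)  # [[0.0, 1.0, 1.5], [8.0, 9.0]]

# distance-r stopping: halt before adding an edge longer than r
out_r = single_linkage(S, d, lambda cs, dist: dist > 2.0)
```

Scale-α stopping would be the same routine with the rule `dist > alpha * max_dist`, where `max_dist` is precomputed from d; that version regains scale invariance at the cost of Consistency.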

Kleinberg’s Impossibility Result: there exists no clustering function satisfying all three axioms. Proof idea: scaling up, combined with Consistency, leads to a contradiction.

A Different Perspective: axioms as a tool for classifying clustering paradigms. The goal is to generate a variety of axioms (or properties) over a fixed framework, so that different clustering approaches can be classified by the different subsets of axioms they satisfy.

Axioms as a tool for a taxonomy of clustering paradigms. An example classification (the original slide grouped some columns as “axioms” and others as “properties”):

                   Scale Invariance   Richness   Local Consistency   Full Consistency
  Single Linkage          -              +               +                  +
  Center Based            +              +               +                  -
  Spectral                +              +               +                  -
  MDL                     +              +               -
  Rate Distortion         +              +               -

Ideal Theory. We would like to have a list of simple properties so that major clustering methods are distinguishable from each other using these properties. We would like the axioms to be such that all clustering methods satisfy all of them, and nothing that is clearly not a clustering satisfies all of them (this is probably too much to hope for). In the remainder of this talk, I would like to discuss some candidate “axioms” and “properties” to get a taste of what this theory-development program may involve.

Types of Axioms/Properties.
Richness requirements: e.g., relaxations of Kleinberg’s richness such as {F(d): d is a DF over S} = {P: P a partition of S into k sets}.
Invariance/robustness/stability requirements: e.g., scale invariance, consistency, robustness of F to perturbations of d (“smoothness” of F), or stability w.r.t. sampling of S.

Relaxations of Consistency.
Local Consistency: let C1, …, Ck be the clusters of F(d). For every λ0 ≥ 1 and positive λ1, …, λk ≤ 1, if d′ is defined by
  d′(a,b) = λi·d(a,b) if a and b are both in Ci,
  d′(a,b) = λ0·d(a,b) if a and b are not in the same F(d)-cluster,
then F(d) = F(d′). Is there any known clustering method for which it fails?
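
The d′ construction in the definition can be sketched directly (an illustrative implementation of my own; the λ values in the example are arbitrary):

```python
def locally_consistent_variant(d, clusters, lam0, lams):
    """Build d' from d per the Local Consistency definition: shrink
    within-cluster distances by lam_i <= 1 and stretch between-cluster
    distances by lam0 >= 1."""
    assert lam0 >= 1 and all(0 < l <= 1 for l in lams)
    idx = {x: i for i, C in enumerate(clusters) for x in C}
    def d_prime(a, b):
        if a == b:
            return 0.0
        if idx[a] == idx[b]:           # same cluster: shrink
            return lams[idx[a]] * d(a, b)
        return lam0 * d(a, b)          # different clusters: stretch
    return d_prime

d = lambda x, y: abs(x - y)
clusters = [{0.0, 1.0}, {8.0, 9.0}]
d2 = locally_consistent_variant(d, clusters, lam0=2.0, lams=[0.5, 0.5])
print(d2(0.0, 1.0), d2(0.0, 8.0))  # 0.5 16.0
```

Local Consistency asks that any clustering function mapping d to `clusters` also map every such d2 to the same partition.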

Other types of clustering: Culotta and McCallum’s “clusterwise similarity”; edge detection (advantage to smooth contours); texture clustering; the professors example.

Conclusions and open questions. There is a place for developing an axiomatic framework for clustering. The existing negative results do not rule out the possibility of a useful axiomatization. We should also develop a system of “clustering properties” for a taxonomy of clustering methods. There are many possible routes to take, and hidden subtleties, in this project.