Hierarchical Clustering David Kauchak cs160 Fall 2009 some slides adapted from:
Administrative: project schedule; Ethics in IR lecture.
Hierarchical Clustering: recursive partitioning/merging of a data set. [Figure: a data set partitioned into nested clusterings at successive levels.]
Dendrogram: represents all partitionings of the data. We can get a K-clustering by looking at the connected components at any given level. Dendrograms are frequently binary, but n-ary dendrograms are generally easy to obtain with minor changes to the algorithms.
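As a concrete sketch (using SciPy, which is not part of the slides), a K-clustering can be read off a dendrogram by cutting it where exactly K connected components remain; the data and K here are made up:

```python
# Sketch (not from the slides): reading a K-clustering off a dendrogram with
# SciPy by keeping the connected components at one level.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.random.rand(10, 2)            # 10 points in 2 dimensions
Z = linkage(points, method="single")      # binary dendrogram (merge history)

K = 3                                     # cut so exactly K components remain
labels = fcluster(Z, t=K, criterion="maxclust")
print(labels)                             # cluster id (1..K) for each point
```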
Advantages of hierarchical clustering: don't need to specify the number of clusters; good for data visualization (see how the data points interact at many levels, view the data at multiple levels of granularity, understand how all points interact); specifies all of the K clusterings/partitions.
Hierarchical Clustering: common in many domains. Biologists and social scientists (gene expression data); document/web page organization (DMOZ, Yahoo directories). [Example figure: an animal taxonomy with vertebrate (fish, reptile, amphibian, mammal) and invertebrate (worm, insect, crustacean) branches.] Two main approaches…
Divisive hierarchical clustering: finding the best partitioning of the data is generally exponential in time. Common approach: let C be a set of clusters; initialize C to be the one-clustering of the data; while there exists a cluster c in C (with more than one point): remove c from C, partition c into 2 clusters c1 and c2 using a flat clustering algorithm, and add c1 and c2 to C. With k-means as the flat algorithm, this is bisecting k-means.
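A minimal sketch of this procedure, using scikit-learn's k-means with k = 2 as the flat clustering step; the function name, the stopping rule (split until every cluster is a single point), and the data are illustrative assumptions, not from the slides:

```python
# Sketch of divisive (bisecting k-means) clustering as outlined above.
# Splitting continues until every cluster is a single point; other stopping
# rules (e.g., a target number of clusters) work the same way.
import numpy as np
from sklearn.cluster import KMeans

def divisive_cluster(points):
    C = [list(range(len(points)))]          # C starts as the one-clustering
    history = []                            # record of splits (the hierarchy)
    while any(len(c) > 1 for c in C):       # while some cluster can be split
        c = next(c for c in C if len(c) > 1)
        C.remove(c)
        # partition c into 2 clusters using a flat clustering algorithm
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(points[c])
        c1 = [i for i, l in zip(c, labels) if l == 0]
        c2 = [i for i, l in zip(c, labels) if l == 1]
        C.extend([c1, c2])
        history.append((c, c1, c2))
    return history

points = np.random.rand(8, 2)
for parent, left, right in divisive_cluster(points):
    print(parent, "->", left, right)
```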
[Divisive clustering illustration: start with one cluster containing all of the data, then repeatedly split a cluster using flat clustering.] Note, there is a "temporal" component (the order of the splits) not seen here.
Hierarchical Agglomerative Clustering (HAC): let C be a set of clusters; initialize C to all points/docs as separate clusters. While C contains more than one cluster: find c1 and c2 in C that are closest together; remove c1 and c2 from C; merge c1 and c2 and add the resulting cluster to C. The history of merging forms a binary tree or hierarchy. How do we measure the distance between clusters?
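A naive sketch of this loop (not the efficient implementation discussed later); single-link distance is assumed as the cluster distance, and all names are illustrative:

```python
# Naive sketch of the HAC loop above. Cluster distance here is single-link
# (closest pair); swapping in another linkage only changes cluster_dist.
import numpy as np

def cluster_dist(points, c1, c2):
    # single-link: distance of the closest pair across the two clusters
    return min(np.linalg.norm(points[i] - points[j]) for i in c1 for j in c2)

def hac(points):
    C = [[i] for i in range(len(points))]    # every point its own cluster
    merges = []                              # merge history = the hierarchy
    while len(C) > 1:
        # find the two closest clusters
        a, b = min(((a, b) for a in range(len(C)) for b in range(a + 1, len(C))),
                   key=lambda ab: cluster_dist(points, C[ab[0]], C[ab[1]]))
        c1, c2 = C[a], C[b]
        C = [c for k, c in enumerate(C) if k not in (a, b)]
        C.append(c1 + c2)                    # merge and add back
        merges.append((c1, c2))
    return merges

points = np.random.rand(6, 2)
for c1, c2 in hac(points):
    print("merge", c1, "+", c2)
```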
Distance between clusters, single-link: similarity of the most similar pair of points (one from each cluster).
Distance between clusters, complete-link: similarity of the "furthest" points, i.e., the least similar pair. Why are these "local" methods used? Efficiency.
Distance between clusters, centroid: clusters whose centroids (centers of gravity) are the most similar. Ward's method (merge the pair of clusters that gives the smallest increase in total within-cluster squared error): what does this do? It encourages similar-sized clusters.
Distance between clusters, average-link: average similarity between all pairs of elements (one from each cluster).
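The cluster-distance definitions above, sketched as plain functions; Euclidean distance is assumed, clusters are lists of row indices into a data matrix, and the Ward's formula is one common way to write it:

```python
# The cluster-distance definitions above written out as plain functions.
import numpy as np
from itertools import product

def d(points, i, j):
    return np.linalg.norm(points[i] - points[j])

def single_link(points, c1, c2):     # distance of the most similar (closest) pair
    return min(d(points, i, j) for i, j in product(c1, c2))

def complete_link(points, c1, c2):   # distance of the least similar (furthest) pair
    return max(d(points, i, j) for i, j in product(c1, c2))

def average_link(points, c1, c2):    # average over all cross-cluster pairs
    return sum(d(points, i, j) for i, j in product(c1, c2)) / (len(c1) * len(c2))

def centroid_link(points, c1, c2):   # distance between the centers of gravity
    return np.linalg.norm(points[c1].mean(axis=0) - points[c2].mean(axis=0))

def wards(points, c1, c2):           # increase in within-cluster squared error
    n1, n2 = len(c1), len(c2)        # if c1 and c2 were merged
    return (n1 * n2) / (n1 + n2) * centroid_link(points, c1, c2) ** 2
```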
Single Link Example
Complete Link Example
Computational complexity: for n documents/points in m dimensions. How many iterations? n−1. The first iteration needs to compute the similarity of all pairs of n points/documents: O(n²m). Each of the remaining n−2 iterations computes the distance between the most recently created cluster and all other existing clusters: O(nm) (this does depend on the cluster-similarity approach). Overall run-time: O(n²m), generally slower than flat clustering!
[Figure: single linkage vs. complete linkage clustering results.]
Problems with hierarchical clustering
Locally greedy: once a merge decision has been made it cannot be changed. Single-linkage: chaining effect. [Figure: points 1 … j, j+1 … n spaced so that single-link chains them into one long cluster.]
State space search approach: view the hierarchical clustering problem as a state space search problem. Each hierarchical clustering represents a state; the goal is to find a state that minimizes some criterion function. This avoids the problem of traditional greedy methods.
Basic state space search algorithm: start at some initial state; repeat: list all next states, evaluate all next states using some criterion function, pick a choice based on some search method/criterion.
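A generic sketch of this loop; next_states, criterion, and the initial state are hypothetical placeholders, to be filled in with the hierarchy representation, the swap/graft moves, and the weighted k-means criterion described on the following slides:

```python
# Generic sketch of the state-space search loop above.
def state_space_search(initial_state, next_states, criterion, n_iters=100):
    state = initial_state
    for _ in range(n_iters):
        candidates = next_states(state)          # list all next states
        state = min(candidates, key=criterion)   # evaluate them and pick one
                                                 # (greedy shown; other selection
                                                 # strategies appear below)
    return state
```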
State space search components: the state space (what is a state? how do we move between states?) and the search (a state criterion function, and a method to choose the next state).
State space: each state is a hierarchical clustering of the n points, with its n−1 sub-clusters labeled with a temporal component (i.e., split order, or inverse merge order). Huge state space! [Figure: example hierarchy over points A, B, C, D, E.]
Moving between states: a move should be simple/straightforward, well motivated, and able to traverse the entire state space (state space complete). Ideas? Two options: node swap and node graft. We also include a temporal swap.
Swap without temporal constraints, example 1: in the hierarchy over A B C D E, swapping B and D gives A D C B E with no change to the structure.
Swap without temporal constraints, example 2: swapping (D,E) and C gives A B D E C; the structure changed!
Swap with temporal constraints: move split numbers along with sub-clusters (nodes). Some swap moves don't result in legal hierarchies: the split number of the parent must be less than the split number of the child. What would be an illegal swap? [Figure: hierarchy over A–E in which split numbers 2 and 4 cannot be swapped.]
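A small sketch of this legality rule; the Node class and field names are my own minimal representation, not from the slides:

```python
# Minimal (made-up) representation of a hierarchy node carrying a split
# number, and the legality rule above: every parent's split number must be
# smaller than its children's. A swap or graft is legal only if the
# resulting tree still satisfies this.
class Node:
    def __init__(self, split, children=()):
        self.split = split              # temporal component (None for leaves)
        self.children = list(children)

def is_legal(node):
    for child in node.children:
        if child.children and not (node.split < child.split):
            return False                # parent split must come before child's
        if not is_legal(child):
            return False
    return True

# Example: root split 1 with internal children split 2 and 3 is legal.
leaf = lambda: Node(None)
root = Node(1, [Node(2, [leaf(), leaf()]), Node(3, [leaf(), leaf()])])
print(is_legal(root))   # True
```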
Graft without temporal constraints: detach a sub-cluster and re-attach it at another point in the hierarchy. [Figure: graft examples over points A–E.]
Graft with temporal constraints: move the split number with the sub-cluster. As with swaps, only allow grafts that satisfy parent < child.
Swap using grafts: e.g., emulate swapping (A,B) and (D,E) by grafting (A,B) and then grafting (D,E). [Figure: intermediate hierarchies over A–E.]
Graft using swaps: e.g., emulate grafting (A,B) to above (D,E) by swapping C and (A,B,D,E), then swapping the sibling of one with the other: swap C with (D,E). [Figure: intermediate hierarchies over A–E.]
Temporal swap: swap the split numbers of two sub-clusters. Must obey the parent/child constraints. In general we could swap any two split numbers that satisfy the constraints; here we only consider adjacent numbers (i.e., 2, 3 or 3, 4). [Figure: hierarchy over A–E.]
Evaluating states: for a given k-clustering, the k-means criterion function is the sum of squared differences between each point and its assigned center, over all points and all centers.
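In standard notation (symbols assumed here, not taken from the slide's figure), for clusters c_1, …, c_k with centers μ_1, …, μ_k:

```latex
J(k) \;=\; \sum_{j=1}^{k} \sum_{x \in c_j} \lVert x - \mu_j \rVert^2
```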
Leveraging the k-means criterion: for a hierarchical clustering, calculate a weighted sum of the k-means criterion function over all n−1 clusterings represented by the hierarchy.
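Written out under the same notation, where J(k) is the k-means cost of the k-clustering read off hierarchy H and the weights w_k are whatever per-level weighting is used (the slide does not specify them):

```latex
J(H) \;=\; \sum_{k=1}^{n-1} w_k \, J(k)
```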
Calculating the criterion function: how long does it take to calculate the k-means cost? O(nm). How long then for the overall cost? n−1 clusterings: O(n²m). We can do better! Using a few tricks… O(nm) time.
How to pick the next state: Greedy: pick the best choice. ε-greedy: pick the best choice with probability 1−ε, otherwise choose randomly. Soft-max: choose an option with probability proportional to its (exponentiated) score. Simulated annealing: vary the parameters of the above methods over time.
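A sketch of these selection strategies; `scored` is assumed to be a list of (criterion value, state) pairs with lower values better, and epsilon and the temperature are arbitrary illustrative values:

```python
# Sketch of the selection strategies above.
import math, random

def greedy(scored):
    return min(scored, key=lambda vs: vs[0])[1]          # pick best choice

def epsilon_greedy(scored, eps=0.1):
    if random.random() < eps:                            # occasionally explore
        return random.choice(scored)[1]
    return greedy(scored)                                # otherwise exploit

def soft_max(scored, temperature=1.0):
    # choose a state with probability proportional to exp(-value / T)
    weights = [math.exp(-v / temperature) for v, _ in scored]
    return random.choices([s for _, s in scored], weights=weights, k=1)[0]

# Simulated annealing: run one of the above while lowering the temperature
# (or eps) over the iterations.
```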
Overall run-time: list all next states (how many next states are there? all combinations of the n data points and the n−1 sub-clusters: O(n²)); evaluate all next states using the criterion function (O(nm) each); pick a choice based on some search method/criterion. O(n³) per iteration.
Bad case for single linkage: examined n = 8. Greedy method vs. simulated annealing: using simulated annealing, the "best" clustering was found 3 out of 10 times; the lowest criterion value is the "best" clustering (3304). [Figure: points d1–d8 and the resulting hierarchies, with criterion values 3304 and 3469.]
SS-Hierarchical vs. Ward's: [Table: iteration counts for SS-Hierarchical (greedy, initialized with Ward's) vs. Ward's at 20 points and larger subset sizes; yeast gene expression data set.]
SS-Hierarchical vs. Ward’s: Individual clusters
What is a good clustering? Internal criterion: a good clustering will produce high-quality clusters in which the intra-class (that is, intra-cluster) similarity is high and the inter-class similarity is low. How would you evaluate clustering?
Common approach: use labeled data. Use data with known classes (for example, document classification data) and measure how well the clustering algorithm reproduces the class partitions. Purity: the proportion of the dominant class in the cluster. Good for comparing two algorithms, but not for understanding how well a single algorithm is doing. Why? Increasing the number of clusters increases purity. An alternative is the average entropy of the classes in the clusters, which, for example, prefers a 50/50 class split over a 50/25/25 split (purity scores the two the same).
Purity example (class counts per cluster): Cluster I: purity = (1/6)·max(5, 1, 0) = 5/6. Cluster II: purity = (1/6)·max(1, 4, 1) = 4/6. Cluster III: purity = (1/5)·max(2, 0, 3) = 3/5.
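Recomputing the example above (and the average-entropy alternative from the previous slide); the counts are read off the example table:

```python
# Recomputing the purity example, plus per-cluster class entropy.
# counts[i][j] = number of documents of class j in cluster i.
import math

counts = [[5, 1, 0],   # Cluster I
          [1, 4, 1],   # Cluster II
          [2, 0, 3]]   # Cluster III

for name, row in zip(["I", "II", "III"], counts):
    n = sum(row)
    purity = max(row) / n
    entropy = -sum(c / n * math.log2(c / n) for c in row if c > 0)
    print(f"Cluster {name}: purity = {max(row)}/{n} = {purity:.2f}, "
          f"entropy = {entropy:.2f} bits")
```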
Googlenomics: the article mentions the "quality score" as an important ingredient in the search. How is it important/useful? What are the drawbacks of this algorithm? /nep_googlenomics