
Taylor Rassmann

 Hierarchical clustering groups data objects into a tree of clusters, using a distance matrix as the clustering criterion
 Two Hierarchical Clustering Categories:
 Agglomerative (bottom-up): merges clusters iteratively
 Divisive (top-down): splits clusters iteratively

 AGNES (Agglomerative Nesting) Method:
 Merges the nodes that have the least dissimilarity
 Proceeds in non-descending order of dissimilarity
 At the end, all nodes belong to the same cluster
 DIANA (Divisive Analysis) Method:
 Inverse order of the AGNES method
 At the end, each node is in its own cluster
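The agglomerative (AGNES) path can be sketched in a few lines with SciPy's hierarchical-clustering routines. This is a minimal illustration on made-up 2-D data, not the presenter's code; the "average" linkage choice is an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Two well-separated toy blobs of 2-D points.
points = np.vstack([rng.normal(0.0, 1.0, (10, 2)),
                    rng.normal(6.0, 1.0, (10, 2))])

# linkage() starts with every point as its own cluster and repeatedly
# merges the pair of clusters with the least dissimilarity (bottom-up,
# as in AGNES); merge distances come out in non-descending order.
merges = linkage(points, method="average", metric="euclidean")

# Cut the resulting tree into two flat clusters for inspection.
labels = fcluster(merges, t=2, criterion="maxclust")
print(labels)
```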

 AGNES and DIANA Weaknesses:
 Time complexity of O(N²)
 Split or merge decisions are final and cannot be undone, which may lead to lower-quality clusters
 New technique proposed: BIRCH

 BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
 Incrementally builds a CF (clustering feature) tree as a summarized cluster representation, working in multiple phases:
 Phase 1: scan the database to build an initial in-memory CF tree
 Phase 2: use an arbitrary clustering algorithm, such as k-means, to cluster the leaf nodes
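A minimal sketch of the two phases, using scikit-learn's Birch estimator; the toy data and the parameter values (threshold, branching_factor) are illustrative assumptions. Note that scikit-learn's global step uses agglomerative clustering rather than the k-means named on the slide, which is fine since BIRCH allows an arbitrary choice there.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# Three made-up Gaussian blobs standing in for a large database.
X = np.vstack([rng.normal(c, 0.5, (200, 2)) for c in (0.0, 4.0, 8.0)])

# Phase 1: fit() scans the data once and incrementally builds the
# in-memory CF tree; each leaf entry is a summarized subcluster.
# Phase 2: because n_clusters is set, the leaf entries are then
# grouped by a global clustering step over the CF summaries.
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=3)
labels = birch.fit_predict(X)

print(len(birch.subcluster_centers_), "CF leaf entries reduced to 3 clusters")
```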

 BIRCH Advantages:
 Scales linearly: finds a good clustering with a single scan and can improve it with a few additional scans
 Time complexity of O(N)
 BIRCH Weaknesses:
 Handles only numeric data and is sensitive to the order of the data records
 Favors clusters with a spherical shape

 Look at a confusion matrix of the UCF50 dataset (Dollar features)
 Idea of a structured tree SVM
 Different configurations produce different paired groupings
 Train an SVM specifically on these paired groupings (see the sketch below)
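One way to read "train an SVM specifically on these paired groupings" is the sketch below: pick the most confused pair of classes off the confusion matrix and fit a dedicated two-class SVM on just those classes. The UCF50/Dollar-feature pipeline is not shown here, and the linear kernel is an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

def most_confused_pair(y_true, y_pred):
    """Return the two class labels that are confused with each other most."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    cm = confusion_matrix(y_true, y_pred, labels=classes)
    np.fill_diagonal(cm, 0)              # ignore correct predictions
    sym = cm + cm.T                      # count confusion in either direction
    i, j = np.unravel_index(np.argmax(sym), sym.shape)
    return classes[i], classes[j]

def train_pair_svm(features, labels, a, b):
    """Train an SVM on only the two confused classes a and b."""
    mask = np.isin(labels, (a, b))
    return SVC(kernel="linear").fit(features[mask], labels[mask])
```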

 Retrain and test with one less label than the previous iteration
 Repeat for multiple levels
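The level-by-level loop might look like the following sketch, assuming integer label arrays and a merge-the-most-confused-pair policy for dropping one label per level; the slides evaluate several different pairing configurations, so this fixed policy is only one interpretation.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.svm import SVC

def run_levels(train_X, train_y, test_X, test_y, n_levels):
    """Retrain with one less label per level by merging the most
    confused pair of classes, recording accuracy at each level."""
    train_y, test_y = train_y.copy(), test_y.copy()
    accuracies = []
    for _ in range(n_levels):
        clf = SVC(kernel="linear").fit(train_X, train_y)
        pred = clf.predict(test_X)
        accuracies.append(accuracy_score(test_y, pred))
        # Merge the most confused pair so the next level has one less label.
        classes = np.unique(np.concatenate([test_y, pred]))
        cm = confusion_matrix(test_y, pred, labels=classes)
        np.fill_diagonal(cm, 0)
        i, j = np.unravel_index(np.argmax(cm + cm.T), cm.shape)
        a, b = classes[i], classes[j]
        train_y[train_y == b] = a
        test_y[test_y == b] = a
    return accuracies
```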

Example confusion matrix (rows: actual class, columns: predicted class):

                    Biking   Horseback Riding
Biking                70            30
Horseback Riding      20            80

 Configuration: least confused class paired with the most confused
 25 levels deep after selection, training, and then testing
 [Accuracy table, Initial through Level 25; Level 25 Acc = 0.6418]

 Configuration: most confused pairs grouped together
 25 levels deep after selection, training, and then testing
 [Accuracy table, Initial through Level 25; Level 25 Acc = 0.6866]

 Configuration: least confused class paired with the most confused (least confused not taken out)
 49 levels deep after selection
 Training and testing still need to be completed
 [Accuracy table, Levels 1 through 49: values pending]

 Continue research into different hierarchical clustering methods
 Finish training and testing with method three of the hierarchical SVMs
 Write final report