POSTER TEMPLATE BY: www.PosterPresentations.com Note: in high dimensions, the data are sphered prior to distance matrix calculation. Three Groups Example;

Slides:



Advertisements
Similar presentations
Clustering II.
Advertisements

SEEM Tutorial 4 – Clustering. 2 What is Cluster Analysis?  Finding groups of objects such that the objects in a group will be similar (or.
Clustering.
Hierarchical Clustering
Cluster Analysis: Basic Concepts and Algorithms
1 CSE 980: Data Mining Lecture 16: Hierarchical Clustering.
Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
Hierarchical Clustering, DBSCAN The EM Algorithm
Clustering Paolo Ferragina Dipartimento di Informatica Università di Pisa This is a mix of slides taken from several presentations, plus my touch !
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Clustering Clustering of data is a method by which large sets of data is grouped into clusters of smaller sets of similar data. The example below demonstrates.
Introduction to Bioinformatics
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ What is Cluster Analysis? l Finding groups of objects such that the objects in a group will.
Clustering II.
Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Today Unsupervised Learning Clustering K-means. EE3J2 Data Mining Lecture 18 K-means and Agglomerative Algorithms Ali Al-Shahib.
Segmentation CSE P 576 Larry Zitnick Many slides courtesy of Steve Seitz.
Cluster Analysis.  What is Cluster Analysis?  Types of Data in Cluster Analysis  A Categorization of Major Clustering Methods  Partitioning Methods.
Visualization of Clusters with a Density-Based Similarity Measure Rebecca Nugent Department of Statistics, Carnegie Mellon University Joint with: Werner.
Cluster Analysis: Basic Concepts and Algorithms
An Interval MST Procedure Rebecca Nugent w/ Werner Stuetzle November 16, 2004.
What is Cluster Analysis?
Visualization of Clusters with a Density-Based Similarity Measure Rebecca Nugent Department of Statistics, Carnegie Mellon University June 9, 2007 Joint.
What is Cluster Analysis?
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
K-means Clustering. What is clustering? Why would we want to cluster? How would you determine clusters? How can you do this efficiently?
Clustering Ram Akella Lecture 6 February 23, & 280I University of California Berkeley Silicon Valley Center/SC.
Using Statistical Methods to Improve Disease Classification Ryan Sieberg Advisor: Rebecca Nugent Abstract In this research project, we combined statistical.
POSTER TEMPLATE BY: Cluster-Based Modeling: Exploring the Linear Regression Model Space Student: XiaYi(Sandy) Shen Advisor:
COMMON EVALUATION FINAL PROJECT Vira Oleksyuk ECE 8110: Introduction to machine Learning and Pattern Recognition.
Partitional and Hierarchical Based clustering Lecture 22 Based on Slides of Dr. Ikle & chapter 8 of Tan, Steinbach, Kumar.
CSE 185 Introduction to Computer Vision Pattern Recognition 2.
1 Motivation Web query is usually two or three words long. –Prone to ambiguity –Example “keyboard” –Input device of computer –Musical instruments How can.
Technological Educational Institute Of Crete Department Of Applied Informatics and Multimedia Intelligent Systems Laboratory 1 CLUSTERS Prof. George Papadourakis,
Chapter 14 – Cluster Analysis © Galit Shmueli and Peter Bruce 2010 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
CSE5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides.
Clustering I. 2 The Task Input: Collection of instances –No special class label attribute! Output: Clusters (Groups) of instances where members of a cluster.
Clustering.
Cluster Analysis Potyó László. Cluster: a collection of data objects Similar to one another within the same cluster Similar to one another within the.
CS 8751 ML & KDDData Clustering1 Clustering Unsupervised learning Generating “classes” Distance/similarity measures Agglomerative methods Divisive methods.
Ch. Eick: Introduction to Hierarchical Clustering and DBSCAN 1 Remaining Lectures in Advanced Clustering and Outlier Detection 2.Advanced Classification.
Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree like diagram that.
Slide 1 EE3J2 Data Mining Lecture 18 K-means and Agglomerative Algorithms.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Cluster Analysis This lecture node is modified based on Lecture Notes for Chapter.
CZ5211 Topics in Computational Biology Lecture 4: Clustering Analysis for Microarray Data II Prof. Chen Yu Zong Tel:
Clustering Wei Wang. Outline What is clustering Partitioning methods Hierarchical methods Density-based methods Grid-based methods Model-based clustering.
Given a set of data points as input Randomly assign each point to one of the k clusters Repeat until convergence – Calculate model of each of the k clusters.
Cluster Analysis What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods.
DATA MINING: CLUSTER ANALYSIS Instructor: Dr. Chun Yu School of Statistics Jiangxi University of Finance and Economics Fall 2015.
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
CSE4334/5334 Data Mining Clustering. What is Cluster Analysis? Finding groups of objects such that the objects in a group will be similar (or related)
CLUSTER ANALYSIS. Cluster Analysis  Cluster analysis is a major technique for classifying a ‘mountain’ of information into manageable meaningful piles.
Unsupervised Learning
Data Mining: Basic Cluster Analysis
Clustering CSC 600: Data Mining Class 21.
Chapter 15 – Cluster Analysis
CSE 5243 Intro. to Data Mining
K-means and Hierarchical Clustering
John Nicholas Owen Sarah Smith
Multivariate Statistical Methods
Announcements Project 4 questions Evaluations.
Cluster Analysis.
Text Categorization Berlin Chen 2003 Reference:
SEEM4630 Tutorial 3 – Clustering.
Hierarchical Clustering
Unsupervised Learning
Presentation transcript:

POSTER TEMPLATE BY: Note: in high dimensions, the data are sphered prior to distance matrix calculation. Three Groups Example; Complete linkage Low Minimum Density: Two observations are far apart. C1C2C3C4C5 BM BF OM OF C1C2C3C4C5 BM BF OM OF C1C2C3 S S S C1C2C3 S14910 S S Clustering with a Density-Based Similarity Measure Student: Xiaoyi Fei, Advisor: Rebecca Nugent Carnegie Mellon University, Pittsburgh, Pennsylvania Introduction Algorithm In Practice In Practice (Continued) Performance Comment Clustering Based on Density How does it work? Minimum-Density Clustering “Distance” For Further Information Iris Data: 50 specimens each of 3 types of iris. Sepal.length and width, petal length and width. The algorithm handles different size of groups The shape of the data set has minimal effect on the performance of the algorithm. The bottom of the tree are the modes of the data The top points tend to be outliers May need to prune more groups in order to extract substantial clusters. New Ideas: Prune the tree from the bottom instead of from the top. Will find modes, outliers will need to be assigned. What is Clustering? The goal of clustering is to identify distinct groups in a data set and assign a group label to each observation. Calculate Distance Matrix: Use Gaussian Kernel Density Estimate to estimate the true density of the data. The bandwidth of the density estimate is chosen by least squares cross validation. Use grid search to estimate the minimum density between pairs of observations. Please contact Xiaoyi Fei at For suggestions, questions and comments on this research Contact Information: Xiaoyi Fei Phone: (703) A Desirable Clustering Partitions Observations Such That: Observations in the same cluster are very similar to each other Observations in different clusters are very different from each other. One Commonly used Algorithm: K-means user asks for k clusters, the algorithm finds k cluster centers that minimize the within-cluster sum of squared Euclidean distances from each cluster’s points to its center Many popular clustering algorithms depend on Euclidean distance. The ideal clustering algorithm would be able to identify clusters of different sizes and shapes. High Density Area: Area where observations occur frequently Low Density Area: Area where observations occur rarely Clusters are then identified as connected high density areas separated by low density areas. K-means only works well with spherical data Observations in same cluster have high d(i,j) Observations in different cluster have low d(i, j) Define the distance between two observations to be minimum of the density along the line segment connecting them. Acknowledgement Special thanks for Dr. Rebecca Nugent. Partition Data into Clusters Use hierarchical clustering/linkage method on distance matrix. Start with all observations, eventually combine into one cluster. Steps: 1.Start with each observation as its own group. 2.The two closest groups are merged into a new group. (linked) 3.Update the distance among all groups. 4.Repeat 2) and 3) until left with one group. Pruning the Tree : extract k clusters from the tree by cutting the tree at the height where there are k branches. Crabs Data : 50 crabs each of two species (orange/blue) and both genders. 200 total. 5 morphological measurements. Spirometry Data : Sets of results from three pulmonary function tests run on 50 internal medicine patients; the type of pulmonary disease is classified by physicians as Normal, Obstructive, Restrictive and Mixed. (Data are standardized for age, gender ethnicity and BMI.) Euclidean Distance Minimum Density Euclidean Distance Minimum Density Euclidean Distance High Minimum Density: Two observations are close Low Minimum Density: Two observations are far.