Jagdish Gangolly State University of New York at Albany

Clustering
Acc 522 Fall, 2006 (Jagdish S. Gangolly) 9/17/2018

Clustering
Clustering in S-Plus
Objectives of Clustering
Methods:
  Hierarchical
  Partitioning (iterative relocation)
  Model-based

Clustering in S-Plus
Load the S-Plus cluster library first: library(cluster)
The data can be supplied in one of two forms:
  an n×p matrix of measurements on each of the p variables for each of the n objects, or
  an n×n matrix of dissimilarities, where d(i,j) is the dissimilarity between object i and object j.
The function daisy in the cluster library constructs the dissimilarity matrix.
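The two input forms are related, because an n×p measurement matrix can always be turned into an n×n dissimilarity matrix. The Python sketch below shows that conversion for numeric data, using Euclidean distance only. It is not daisy itself (daisy also handles other variable types and dissimilarity measures). The function name `dissimilarity_matrix` is hypothetical.

```python
import math


def dissimilarity_matrix(data):
    """Convert an n x p data matrix (a list of n rows of p numbers) into an
    n x n matrix of Euclidean dissimilarities, where d[i][j] compares
    object i with object j."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]  # the diagonal d(i,i) stays 0
    for i in range(n):
        for j in range(i + 1, n):
            dij = math.sqrt(sum((a - b) ** 2 for a, b in zip(data[i], data[j])))
            d[i][j] = d[j][i] = dij  # dissimilarities are symmetric
    return d


# Three objects (n = 3) measured on two variables (p = 2).
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = dissimilarity_matrix(data)
```

Only the upper triangle is computed. The lower triangle is filled by symmetry.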

Objectives of Clustering
To classify a data set into groups that are internally cohesive and externally isolated (loosely coupled).
Inputs:
  dataset (matrix, data frame)
  distance measure
  optimisation criterion
  number of clusters (partitioning methods)
  shape of clusters, probability distribution (model-based methods)
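One simple way to make "internally cohesive and externally isolated" concrete is to compare two averages:
  the average dissimilarity between objects in the same cluster, and
  the average dissimilarity between objects in different clusters.
The sketch below is an illustrative check only. It is not one of the optimisation criteria used by the S-Plus functions, and `cohesion_and_isolation` is a hypothetical name.

```python
from itertools import combinations


def cohesion_and_isolation(d, labels):
    """Return (mean within-cluster, mean between-cluster) dissimilarity.

    d: n x n dissimilarity matrix; labels: cluster label of each object.
    A cohesive, well-isolated partition has a small first value and a
    large second value.
    """
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            within.append(d[i][j])
        else:
            between.append(d[i][j])

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Four objects on one variable, split into two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
d = [[abs(p - q) for q in points] for p in points]
w, b = cohesion_and_isolation(d, [0, 0, 1, 1])
```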

Distance Measures
See the data mining text, Ch. 2, slides 47-56.
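The text's slides are not reproduced here. The two metrics offered by the metric argument of these functions (euclidean and manhattan) are both special cases of the Minkowski distance of order r:

d_r(x, y) = (Σ_k |x_k − y_k|^r)^(1/r)

Setting r = 1 gives the Manhattan distance, and r = 2 gives the Euclidean distance. Below is a minimal Python sketch; all function names are hypothetical.

```python
def minkowski(x, y, r):
    """Minkowski distance of order r >= 1 between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


def manhattan(x, y):
    """City-block distance: sum of absolute differences (r = 1)."""
    return minkowski(x, y, 1)


def euclidean(x, y):
    """Straight-line distance (r = 2)."""
    return minkowski(x, y, 2)


p, q = [1.0, 2.0], [4.0, 6.0]
print(manhattan(p, q), euclidean(p, q))
```

Both measures depend on the units of each variable. This is why several of the functions in these slides take a stand argument, which standardises the variables first.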

Clustering methods: Hierarchical I
Hierarchical methods:
  Agglomerative methods: start with each observation forming a separate group. Observations close to each other are successively merged. The results are displayed in the form of a dendrogram.
  Divisive methods: start with a single cluster containing the whole dataset. Clusters are successively split into two smaller clusters until each cluster contains exactly one object.
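To make the agglomerative procedure concrete, here is a naive Python sketch. It uses single linkage (one of the agnes methods) and records each merge together with its height, which is the information a dendrogram draws. It rescans every pair of clusters at each step, so it is only suitable for small examples. `agglomerate` is a hypothetical name; this is not the agnes implementation.

```python
def agglomerate(d):
    """Naive agglomerative clustering with single linkage.

    d: n x n dissimilarity matrix. Returns a list of merges
    (cluster_a, cluster_b, height), where clusters are frozensets of
    object indices and height is the dissimilarity at which they joined.
    """
    clusters = [frozenset([i]) for i in range(len(d))]  # every object starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: the closest pair of members across the two clusters.
                h = min(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append((clusters[a], clusters[b], h))
        merged = clusters[a] | clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


points = [0.0, 1.0, 5.0, 6.0, 20.0]
d = [[abs(p - q) for q in points] for p in points]
for left, right, height in agglomerate(d):
    print(sorted(left), sorted(right), height)
```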

Clustering methods: Hierarchical II
Agglomerative Nesting: agnes(x, diss, metric, stand, method, …)
Methods:
  average: group average
  single: single linkage (nearest neighbour method)
  complete: complete linkage (furthest neighbour method)
  ward: Ward's method
  weighted: weighted average linkage
Evaluation criterion: agglomeration coefficient (AC)
Results display: dendrogram, banner plot
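The linkage methods differ only in how they measure the dissimilarity between two clusters. The Python sketch below covers single, complete and average linkage; Ward's method and weighted average linkage are not covered.

It also computes the agglomeration coefficient, following the description given for agnes in the cluster library documentation:
  for each object i, m(i) is the height at which i is first merged, divided by the height of the final merge;
  AC is the mean of 1 − m(i).
Both function names are hypothetical.

```python
def linkage_distance(d, ca, cb, method):
    """Dissimilarity between clusters ca and cb (iterables of object indices)."""
    pairs = [d[i][j] for i in ca for j in cb]
    if method == "single":     # nearest neighbour
        return min(pairs)
    if method == "complete":   # furthest neighbour
        return max(pairs)
    if method == "average":    # group average
        return sum(pairs) / len(pairs)
    raise ValueError(f"unsupported method: {method}")


def agglomeration_coefficient(first_merge_heights, final_height):
    """Mean of 1 - m(i), where m(i) = (height at which object i is first
    merged) / (height of the final merge)."""
    ms = [h / final_height for h in first_merge_heights]
    return sum(1 - m for m in ms) / len(ms)


points = [0.0, 1.0, 5.0, 6.0, 20.0]
d = [[abs(p - q) for q in points] for p in points]
a, b = [0, 1], [2, 3]
for method in ("single", "complete", "average"):
    print(method, linkage_distance(d, a, b, method))

# Under single linkage on these points, objects 0-3 are first merged at
# height 1, while object 4 only joins in the final merge at height 14.
ac = agglomeration_coefficient([1.0, 1.0, 1.0, 1.0, 14.0], 14.0)
```

An AC close to 1 indicates strong clustering structure. Here the outlying object 4 contributes 0 to the mean.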

Clustering methods: Hierarchical III
hclust: hierarchical clustering
hclust(dist, method, sim)
  dist: distances
  method: compact (complete linkage), average, or connected (single linkage)
Results are displayed using plclust.

Clustering methods: Hierarchical IV
Divisive Analysis: diana(x, diss, metric, stand, …)
  Evaluation criterion: divisive coefficient (DC)
  Results display: dendrogram, banner plot
Monothetic Analysis: mona(x)
  For a binary data matrix. At each split, mona uses a single (well-chosen) variable.
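Divisive analysis is commonly described in terms of a "splinter group":
  the object with the largest average dissimilarity to the rest of the cluster leaves first;
  other objects follow while they are, on average, closer to the splinter group than to the objects left behind.
The Python sketch below performs one such split. The full method repeats it, typically on the cluster with the largest diameter, until every object stands alone. `diana_split` is a hypothetical name, and this is not the diana source.

```python
def diana_split(d, cluster):
    """Split one cluster into two using the splinter-group idea.

    d: n x n dissimilarity matrix; cluster: list of object indices (len >= 2).
    Returns (splinter, remainder) as sorted lists.
    """
    remainder = list(cluster)

    def avg(i, group):
        """Average dissimilarity of object i to the other members of group."""
        others = [j for j in group if j != i]
        return sum(d[i][j] for j in others) / len(others)

    # Seed the splinter group with the most "disaffected" object.
    seed = max(remainder, key=lambda i: avg(i, remainder))
    splinter = [seed]
    remainder.remove(seed)

    while len(remainder) > 1:
        # A positive gain means i is closer, on average, to the splinter group.
        gains = {i: avg(i, remainder) - avg(i, splinter) for i in remainder}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        splinter.append(best)
        remainder.remove(best)
    return sorted(splinter), sorted(remainder)


points = [0.0, 1.0, 2.0, 10.0, 11.0]
d = [[abs(p - q) for q in points] for p in points]
print(diana_split(d, [0, 1, 2, 3, 4]))
```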

Clustering methods: Partitioning Methods I
Partitioning methods divide the set of objects into k clusters; k must be specified by the user.
Methods:
  k-means
  Partitioning Around Medoids (PAM): accepts a dissimilarity matrix and minimises the sum of dissimilarities (rather than the sum of squared distances used by k-means), which makes it more robust. It displays a silhouette plot.
pam(data, k, diss, metric, stand, …)
  data: matrix or data frame
  diss: T or F (T if data is already a dissimilarity matrix)
  metric: euclidean or manhattan
  stand: T or F (T to standardise the variables before computing dissimilarities)
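PAM is usually described in two phases:
  BUILD chooses initial medoids greedily;
  SWAP exchanges a medoid with a non-medoid whenever that lowers the total dissimilarity.
The Python sketch below is a simplified version of that idea. It recomputes the full cost for every candidate swap, which the real algorithm avoids, so it is only suitable for small examples. The function names `total_cost` and `pam` here are hypothetical and do not reproduce the S-Plus pam.

```python
def total_cost(d, medoids):
    """Sum over all objects of the dissimilarity to the nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))


def pam(d, k):
    """Simplified k-medoids on an n x n dissimilarity matrix d.

    Returns (medoids, labels, cost), where labels[i] is the medoid
    that object i is assigned to.
    """
    n = len(d)
    # BUILD: add medoids one at a time, each time picking the object
    # that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(d, medoids + [c])))
    cost = total_cost(d, medoids)
    # SWAP: exchange a medoid with a non-medoid whenever that lowers the cost,
    # until a full pass makes no improvement.
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:pos] + [o] + medoids[pos + 1:]
                trial_cost = total_cost(d, trial)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
    labels = [min(medoids, key=lambda m: d[i][m]) for i in range(n)]
    return medoids, labels, cost


points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
d = [[abs(p - q) for q in points] for p in points]
medoids, labels, cost = pam(d, 2)
```

On this data the greedy BUILD phase picks objects 2 and 4, and a single swap then improves the solution to medoids 1 and 4. The objective is a plain sum of dissimilarities rather than squared distances, so a single extreme object pulls the solution less than it would under k-means.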

Clustering methods: Partitioning Methods II
Clustering Large Applications (CLARA): considers data subsets of fixed size so that very large datasets can be clustered.
  clara(x, k, metric, stand, samples, sampsize, …)
Fanny: fuzzy clustering.
  fanny(x, k, diss, metric, stand, …)
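The CLARA idea can be sketched in a few steps:
  draw several random subsets of fixed size;
  find medoids for each subset;
  score each set of medoids against the whole dataset;
  keep the best-scoring set.
In the Python sketch below, an exhaustive k-medoid search on each small sample stands in for running PAM. The parameters `samples` and `sampsize` mirror the clara arguments in name only. The function names, the default one-dimensional distance and the `seed` argument are assumptions for illustration.

```python
import random
from itertools import combinations


def cost(points, medoids, dist):
    """Total dissimilarity from every point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def best_medoids(points, k, dist):
    """Exhaustive k-medoid search; affordable only because the sample is
    small. It stands in for running PAM on the sample."""
    return min(combinations(points, k), key=lambda ms: cost(points, ms, dist))


def clara(points, k, samples=5, sampsize=6, dist=lambda a, b: abs(a - b), seed=0):
    """CLARA-style clustering: fit medoids on random subsets and keep the
    set with the lowest cost on the full dataset."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(samples):
        sample = rng.sample(points, min(sampsize, len(points)))
        medoids = best_medoids(sample, k, dist)
        c = cost(points, medoids, dist)  # judged on ALL points, not just the sample
        if c < best_cost:
            best, best_cost = medoids, c
    return sorted(best), best_cost


points = [0.0, 1.0, 2.0, 3.0, 4.0, 100.0, 101.0, 102.0, 103.0, 104.0]
medoids, total = clara(points, k=2, samples=5, sampsize=6)
```

Only the final scoring pass touches every object, which is what lets the approach scale. The price is that the result can miss the optimum if no sample contains good medoids.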

Clustering: Displays of Results
Plots (dendrograms and banner plots): plot.agnes(), plot.diana(), plot.mona()
Printed summaries: print.agnes(), print.diana(), print.mona(), print.pam(), print.fanny(), print.clara()