Jagdish Gangolly State University of New York at Albany

Presentation transcript:

Clustering
  Jagdish Gangolly
  State University of New York at Albany
  Acc 522 Fall 2001 (Jagdish S. Gangolly) 12/29/2018

Clustering
  Clustering in S-Plus
  Objectives of Clustering
  Methods
    Hierarchical
    Partitioning (iterative-relocation)
    Model-based methods

Clustering in S-Plus
  Load the S-Plus cluster library first: library(cluster)
  Data can be either
    an n×p matrix of measurements on each of the p variables for each of the n objects, or
    an n×n matrix of dissimilarities, where d(i,j) is the dissimilarity between object i and object j.
  daisy, in the cluster library, constructs the dissimilarity matrix.
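To make the two input forms concrete, here is a minimal plain-Python sketch that turns an n×p data matrix into an n×n dissimilarity matrix. It assumes plain Euclidean distance on numeric variables; the function name and the sample data are hypothetical, and this is not a description of how daisy itself works.

```python
import math


def dissimilarity_matrix(data):
    """Return the n x n matrix of Euclidean distances between the rows of data."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(data[i], data[j])))
            d[i][j] = d[j][i] = dist  # the matrix is symmetric
    return d


# Hypothetical 4 x 2 data matrix: 4 objects measured on 2 variables.
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]
d = dissimilarity_matrix(data)
for row in d:
    print(row)
```

Either form carries the information a distance-based method needs: the n×p matrix plus a metric, or the n×n matrix directly.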

Objectives of Clustering
  To classify a data set into groups that are internally cohesive and externally isolated (loosely coupled)
    dataset (matrix, dataframe)
    distance measure
    optimisation criterion
    number of clusters (partitioning)
    shape of clusters, probability distribution (model-based)

Clustering methods: Hierarchical I
  Hierarchical Methods:
    Agglomerative methods: Start with each observation forming a separate group. Observations close to each other are successively merged. The results are displayed in the form of a dendrogram.
    Divisive methods: Start with a single cluster containing the whole dataset. This is successively split into two smaller clusters until each cluster contains exactly one object.
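The agglomerative procedure can be sketched in a few lines of plain Python. This is an illustration only: it assumes single linkage (distance between groups is the smallest pairwise dissimilarity), and the function name and the sample points are hypothetical. The merge history it returns is the information a dendrogram displays.

```python
def agglomerate(d):
    """Merge clusters bottom-up with single linkage; return the merge history."""
    clusters = [[i] for i in range(len(d))]  # every observation starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members across the two groups.
                dist = min(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), dist))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Hypothetical data: four objects on a line at positions 0, 1, 5 and 7.
points = [0.0, 1.0, 5.0, 7.0]
d = [[abs(p - q) for q in points] for p in points]
merges = agglomerate(d)
for left, right, height in merges:
    print(left, "+", right, "merged at height", height)
```

A divisive method runs the other way, starting from one cluster and splitting until every object is alone.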

Clustering methods: Hierarchical II
  Agglomerative Nesting: agnes(x, diss, metric, stand, method,…)
  Methods:
    average (group average)
    single (single linkage), nearest neighbour method
    complete (complete linkage), furthest neighbour method
    ward (Ward's method)
    weighted (weighted average linkage)
  Evaluation criterion: Agglomerative coefficient (AC)
  Results display: Dendrogram, Banner plot
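The linkage methods differ only in how the distance between two groups is derived from the pairwise dissimilarities. The sketch below, in plain Python, covers the three simplest rules (single, complete, group average); Ward's method and weighted average linkage are left out. The function name and the sample data are hypothetical and are not part of the agnes interface.

```python
def linkage_distance(d, group_a, group_b, method):
    """Distance between two groups under a given linkage rule."""
    pairs = [d[i][j] for i in group_a for j in group_b]
    if method == "single":    # nearest neighbour
        return min(pairs)
    if method == "complete":  # furthest neighbour
        return max(pairs)
    if method == "average":   # group average
        return sum(pairs) / len(pairs)
    raise ValueError("unsupported method: " + method)


# Hypothetical data: four objects on a line at positions 0, 1, 5 and 7.
points = [0.0, 1.0, 5.0, 7.0]
d = [[abs(p - q) for q in points] for p in points]

for method in ("single", "complete", "average"):
    print(method, linkage_distance(d, [0, 1], [2, 3], method))
```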

Clustering methods: Hierarchical III
  hclust: hierarchical clustering
    hclust(dist, method, sim)
      dist: distances
      method:
        compact (complete linkage)
        average
        connected (single linkage)
  Results are displayed using plclust

Clustering methods: Hierarchical IV
  Divisive Analysis: diana(x, diss, metric, stand, …)
    Evaluation criterion: Divisive coefficient (DC)
    Results display: Dendrogram, Banner plot
  Monothetic Analysis: mona(x)
    For a binary data matrix. For each split, mona uses a single (well-chosen) variable.

Clustering methods: Partitioning Methods I
  Methods for dividing the set of objects into k clusters; k must be specified by the user.
  k-means:
  Partitioning Around Medoids (PAM): accepts a dissimilarity matrix, minimises a sum of dissimilarities (rather than a sum of squared Euclidean distances) and so is more robust, and displays a silhouette plot
    pam(data, k, diss, metric, stand,…)
      data: matrix or dataframe
      diss: T or F
      metric: euclidean or manhattan
      stand: T or F
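The idea of partitioning around medoids can be illustrated with a small plain-Python sketch: pick k objects as medoids, then keep swapping a medoid for a non-medoid whenever that lowers the total dissimilarity of every object to its nearest medoid. This is a simplified stand-in, not the algorithm inside pam: the starting medoids are naively taken as the first k objects, and the function names and sample data are hypothetical.

```python
def total_cost(d, medoids):
    """Sum of dissimilarities from each object to its nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))


def k_medoids(d, k):
    """Greedy swap search for k medoids on a dissimilarity matrix d."""
    medoids = list(range(k))  # naive start (an assumption of this sketch)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(len(d)):
                if candidate in medoids:
                    continue
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                if total_cost(d, trial) < total_cost(d, medoids):
                    medoids = trial  # keep any swap that lowers the cost
                    improved = True
    # Label each object with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: d[i][m]) for i in range(len(d))]
    return sorted(medoids), labels


# Hypothetical data: two well-separated groups on a line.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
d = [[abs(p - q) for q in points] for p in points]
medoids, labels = k_medoids(d, 2)
print("medoids:", medoids)
print("labels:", labels)
print("cost:", total_cost(d, medoids))
```

Because the search only ever reads entries of d, it works on any dissimilarity matrix, not just Euclidean distances, which is the property the slide highlights.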

Clustering methods: Partitioning Methods II
  Clustering Large Applications: considers data subsets of fixed size in order to cluster very large datasets
    clara(x, k, metric, stand, samples, sampsize, …)
  Fanny: fuzzy clustering
    fanny(x, k, diss, metric, stand,…)
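The sampling idea behind clustering large applications can be sketched in plain Python: draw several fixed-size subsets, find good medoids within each subset, score each set of medoids against the whole dataset, and keep the best. This is an illustration under stated assumptions, not the clara implementation: an exhaustive search over the subset stands in for the medoid search, the parameter names samples and sampsize are borrowed from the call above, and the function names, the seed argument and the data are hypothetical.

```python
import itertools
import random


def total_cost(d, medoids, objects):
    """Sum of dissimilarities from each listed object to its nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in objects)


def clara_sketch(d, k, samples, sampsize, seed=0):
    """Choose medoids from random subsets; judge each choice on all objects."""
    rng = random.Random(seed)
    everyone = range(len(d))
    best_medoids, best_cost = None, float("inf")
    for _ in range(samples):
        subset = rng.sample(everyone, sampsize)
        # Exhaustive search within the subset (stand-in for a medoid search).
        medoids = min(itertools.combinations(subset, k),
                      key=lambda ms: total_cost(d, ms, subset))
        cost = total_cost(d, medoids, everyone)  # scored on the full dataset
        if cost < best_cost:
            best_medoids, best_cost = sorted(medoids), cost
    return best_medoids, best_cost


# Hypothetical data: two well-separated groups on a line.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
d = [[abs(p - q) for q in points] for p in points]
medoids, cost = clara_sketch(d, k=2, samples=5, sampsize=4)
print("medoids:", medoids, "cost on full data:", cost)
```

The expensive medoid search only ever sees sampsize objects at a time, which is what makes the approach practical for large datasets; the price is that the result depends on which subsets happen to be drawn.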

Clustering: Displays of Results
  Dendrograms:
    plot.agnes()
    plot.diana()
    plot.mona()
  Print:
    print.agnes()
    print.diana()
    print.mona()
    print.pam()
    print.fanny()
    print.clara()