Jagdish Gangolly State University of New York at Albany

Presentation transcript:

Clustering
Acc 522, Fall 2001 (Jagdish S. Gangolly) 12/29/2018

Clustering
  Clustering in S-Plus
  Objectives of Clustering
  Methods
    Hierarchical
    Partitioning (iterative-relocation)
    Model-based methods

Clustering in S-Plus
  You need to load the S-Plus cluster library: library(cluster)
  Data can be either
    an n×p matrix of measurements on each of the p variables for each of the n objects, or
    an n×n matrix of dissimilarities, where d(i,j) is the dissimilarity between object i and object j.
  daisy, in the cluster library, constructs the dissimilarity matrix.
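To make the two input formats concrete, here is a minimal Python sketch that turns an n×p measurement matrix into an n×n dissimilarity matrix. It is an illustration only, not the S-Plus daisy function: the name dissimilarity_matrix and the sample data are assumptions made for this example, and only the euclidean and manhattan metrics are covered.

```python
import math

def dissimilarity_matrix(data, metric="euclidean"):
    """Return an n x n list of lists; entry [i][j] is the dissimilarity of rows i and j."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diffs = [abs(a - b) for a, b in zip(data[i], data[j])]
            if metric == "euclidean":
                dist = math.sqrt(sum(x * x for x in diffs))
            elif metric == "manhattan":
                dist = sum(diffs)
            else:
                raise ValueError("unknown metric: " + metric)
            # A dissimilarity matrix is symmetric with a zero diagonal.
            d[i][j] = d[j][i] = dist
    return d

# Hypothetical data: n = 3 objects measured on p = 2 variables.
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
d = dissimilarity_matrix(data)
```

The result has one row and one column per object, so it grows as n squared, which is one reason sampling-based methods are used for very large datasets.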

Objectives of Clustering
  To classify a data set into groups that are internally cohesive and externally isolated (loosely coupled)
    dataset (matrix, dataframe)
    distance measure
    optimisation criterion
    number of clusters (partitioning)
    shape of clusters, probability distribution (model-based)

Clustering methods: Hierarchical I
  Hierarchical methods:
    Agglomerative methods: Start with each observation forming a separate group. Observations close to each other are successively merged. The results are displayed in the form of a dendrogram.
    Divisive methods: Start with a single cluster containing the whole dataset. This is successively split into two smaller clusters until each cluster contains exactly one object.

Clustering methods: Hierarchical II
  Agglomerative Nesting: agnes(x, diss, metric, stand, method, …)
  Methods:
    average (group average)
    single (single linkage, nearest neighbour method)
    complete (complete linkage, furthest neighbour method)
    ward (Ward's method)
    weighted (weighted average linkage)
  Evaluation criterion: Agglomerative coefficient (AC)
  Results display: Dendrogram, Banner plot
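The difference between the linkage methods is easiest to see on a tiny example. The sketch below is a plain Python illustration of agglomerative merging with single, complete and group-average linkage; it is not the agnes implementation, it omits the ward and weighted methods, and the names group_distance and agglomerate as well as the four-object dissimilarity matrix are assumptions made for this example.

```python
def group_distance(d, a, b, method="average"):
    """Dissimilarity between clusters a and b (lists of object indices)."""
    pairs = [d[i][j] for i in a for j in b]
    if method == "single":      # nearest neighbour
        return min(pairs)
    if method == "complete":    # furthest neighbour
        return max(pairs)
    if method == "average":     # group average
        return sum(pairs) / len(pairs)
    raise ValueError("unknown method: " + method)

def agglomerate(d, method="average"):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[i] for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                h = group_distance(d, clusters[x], clusters[y], method)
                if best is None or h < best[0]:
                    best = (h, x, y)
        h, x, y = best
        merges.append((sorted(clusters[x]), sorted(clusters[y]), h))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

# Hypothetical dissimilarities for four objects lying at 0, 1, 5 and 7 on a line.
d = [[0, 1, 5, 7],
     [1, 0, 4, 6],
     [5, 4, 0, 2],
     [7, 6, 2, 0]]

for method in ("single", "complete", "average"):
    print(method, agglomerate(d, method))
```

All three methods merge the same pairs here, but the height of the final merge differs (4, 7 and 5.5 respectively), and those heights are what a dendrogram draws.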

Clustering methods: Hierarchical III
  hclust: hierarchical clustering
  hclust(dist, method, sim)
    dist: distances
    method:
      compact (complete linkage)
      average
      connected (single linkage)
  Results are displayed using plclust

Clustering methods: Hierarchical IV
  Divisive Analysis: diana(x, diss, metric, stand, …)
    Evaluation criterion: Divisive coefficient (DC)
    Results display: Dendrogram, Banner plot
  Monothetic Analysis: mona(x)
    For a binary data matrix. For each split, mona uses a single (well-chosen) variable.
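A single divisive split can be sketched as follows. This is a hedged illustration of the "splinter group" idea commonly described for divisive analysis: the object that is on average most dissimilar to the rest starts a new group, and other objects join it while they are closer to it than to the remainder. Whether diana in S-Plus follows exactly these steps is an assumption, and the names avg_dissim and split_once and the data are hypothetical.

```python
def avg_dissim(d, i, group):
    """Average dissimilarity between object i and the other members of group."""
    others = [j for j in group if j != i]
    return sum(d[i][j] for j in others) / len(others) if others else 0.0

def split_once(d, cluster):
    """Split one cluster into a splinter group and the remainder."""
    rest = list(cluster)
    # Seed the splinter group with the object least similar to the others.
    seed = max(rest, key=lambda i: avg_dissim(d, i, rest))
    splinter = [seed]
    rest.remove(seed)
    moved = True
    while moved and len(rest) > 1:
        moved = False
        # Positive gain: the object is closer to the splinter group than to the remainder.
        gains = {i: avg_dissim(d, i, rest) - avg_dissim(d, i, splinter) for i in rest}
        best = max(gains, key=gains.get)
        if gains[best] > 0:
            rest.remove(best)
            splinter.append(best)
            moved = True
    return sorted(splinter), sorted(rest)

# Hypothetical dissimilarities for four objects lying at 0, 1, 5 and 7 on a line.
d = [[0, 1, 5, 7],
     [1, 0, 4, 6],
     [5, 4, 0, 2],
     [7, 6, 2, 0]]

print(split_once(d, [0, 1, 2, 3]))
```

Repeating the split on the resulting clusters until each holds a single object gives the full divisive hierarchy.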

Clustering methods: Partitioning Methods I
  Methods for dividing the set of objects into k clusters; k needs to be specified by the user.
  k-means:
  Partitioning Around Medoids: accepts a dissimilarity matrix, minimises the sum of dissimilarities (rather than squared Euclidean distances, as k-means does) and so is more robust, and displays a silhouette plot.
  pam(data, k, diss, metric, stand, …)
    data: matrix or dataframe
    diss: T or F
    metric: euclidean or manhattan
    stand: T or F
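The objective that a medoid-based partition minimises can be written in a few lines. The sketch below is a simplified Python illustration, not the pam implementation: it starts from the first k objects (an assumption; the real procedure chooses its starting medoids more carefully) and then swaps a medoid for a non-medoid whenever that lowers the total dissimilarity. The names total_cost and k_medoids and the data are hypothetical.

```python
def total_cost(d, medoids):
    """Sum over all objects of the dissimilarity to the nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))

def k_medoids(d, k):
    """Greedy swap search on a dissimilarity matrix d; returns (medoids, labels)."""
    n = len(d)
    medoids = list(range(k))  # naive starting medoids (assumption for this sketch)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                # Accept the swap only if it strictly lowers the objective.
                if total_cost(d, trial) < total_cost(d, medoids):
                    medoids = trial
                    improved = True
    medoids = sorted(medoids)
    # Each object is labelled with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: d[i][m]) for i in range(n)]
    return medoids, labels

# Hypothetical dissimilarities for four objects lying at 0, 1, 5 and 7 on a line.
d = [[0, 1, 5, 7],
     [1, 0, 4, 6],
     [5, 4, 0, 2],
     [7, 6, 2, 0]]

medoids, labels = k_medoids(d, 2)
print(medoids, labels, total_cost(d, medoids))
```

Because only entries of d are used, the same code works for any dissimilarity matrix, not just one derived from Euclidean distances; and because each medoid is an actual object, one extreme observation cannot drag a cluster centre away as it can with a mean.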

Clustering methods: Partitioning Methods II
  Clustering large applications: Considers data subsets of fixed size to cluster very large datasets
    clara(x, k, metric, stand, samples, sampsize, …)
  Fanny: Fuzzy clustering
    fanny(x, k, diss, metric, stand, …)
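The sampling idea behind clustering large applications can be sketched as follows: draw several fixed-size subsets, find good medoids on each subset only, then score those medoids against the whole dataset and keep the best set. This is a hedged Python illustration, not the clara implementation. The parameter names samples and sampsize are borrowed from the signature above; the function names, the one-dimensional data, the fixed random seed, and the exhaustive search used in place of a medoid partitioning step are all assumptions made for this example.

```python
import itertools
import random

def cost(points, medoids):
    """Sum of absolute distances from each point to its nearest medoid."""
    return sum(min(abs(p - m) for m in medoids) for p in points)

def clara_sketch(points, k, samples=5, sampsize=6, seed=0):
    """Return (cost on the full data, medoids) for the best of several subsets."""
    rng = random.Random(seed)
    best = None
    for _ in range(samples):
        sample = rng.sample(points, sampsize)
        # Exhaustive search over the small sample stands in for a medoid partitioning step.
        medoids = min(itertools.combinations(sample, k),
                      key=lambda ms: cost(sample, ms))
        # Medoids are judged on the whole dataset, not just on the sample.
        full_cost = cost(points, medoids)
        if best is None or full_cost < best[0]:
            best = (full_cost, sorted(medoids))
    return best

# Hypothetical one-dimensional data with a few loose groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0, 30.0]
best_cost, medoids = clara_sketch(points, k=3)
print(best_cost, medoids)
```

The expensive search only ever sees sampsize objects at a time, so the work grows with the number and size of the samples rather than with the square of the full dataset size.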

Clustering: Displays of Results
  Dendrograms:
    plot.agnes()
    plot.diana()
    plot.mona()
  Print:
    print.agnes()
    print.diana()
    print.mona()
    print.pam()
    print.fanny()
    print.clara()