Clustering: based on the book chapter "Cluster Analysis" in Multivariate Data Analysis by Hair, Anderson, Tatham, and Black.



Cluster analysis: what is it? We group objects based on the characteristics they possess. Also called numerical taxonomy or typology construction. Often atheoretical: no statistical basis, lots of heuristics.

Intuitive basis: ah, yes, the Gestalt laws of perceptual grouping.

Clustering methods: nonhierarchical (vector quantization); hierarchical (agglomerative, divisive); fuzzy (probabilistic mixture models?).

Objectives: exploratory or confirmatory analysis; taxonomy description (e.g. biology); data simplification (e.g. segmentation).

Select the variables: abracadabra, explicit theories, past research, suppositions, hopes, deadlines, ...

Research design: detect and remove outliers; choose a similarity measure (Householder norm, usually Euclidean; Mahalanobis distance; correlation); standardize the data, either by variable or within case.
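The two standardization options on this slide can be sketched in plain Python. This is only an illustration under assumed conventions: the helper names (`zscore`, `standardize_by_variable`, `standardize_within_case`) and the small data set are hypothetical, and z-scores with the population standard deviation are one common choice, not necessarily the one the slides had in mind.

```python
from statistics import mean, pstdev

def zscore(values):
    """Rescale a list of numbers to mean 0 and population std 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]  # constant input: nothing to scale
    return [(v - m) / s for v in values]

def standardize_by_variable(data):
    """Standardize each column (variable) across all cases."""
    columns = [zscore(list(col)) for col in zip(*data)]
    return [list(row) for row in zip(*columns)]

def standardize_within_case(data):
    """Standardize each row (case) across its own variables."""
    return [zscore(row) for row in data]

# Hypothetical data: rows are cases, columns are variables.
data = [[170.0, 60.0, 30.0],
        [180.0, 80.0, 50.0],
        [160.0, 70.0, 40.0]]
by_variable = standardize_by_variable(data)
within_case = standardize_within_case(data)
```

Standardizing by variable removes differences in measurement scale between variables; standardizing within case removes each respondent's overall level and keeps only the shape of the profile.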

Similarity measures
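As a rough sketch of the three measures named in the research-design slide, the following plain-Python functions compute a Euclidean distance, a Pearson correlation, and a Mahalanobis distance. The function names are illustrative, and the Mahalanobis version is deliberately limited to two variables (an assumption made here so the 2x2 covariance matrix can be inverted by hand).

```python
import math

def euclidean(x, y):
    """Straight-line distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Correlation between two profiles: compares shape, not level."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance for two variables; cov is a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - y[0], x[1] - y[1]]
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(q)

# Two profiles with the same shape but very different levels.
low, high = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
d_euclid = euclidean(low, high)            # large: the levels differ
r = pearson_correlation(low, high)         # close to 1: the shapes agree

# With an identity covariance matrix, Mahalanobis reduces to Euclidean.
d_identity = mahalanobis_2d([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
# A variable with variance 4 counts for less: 2 units become 1.
d_scaled = mahalanobis_2d([2.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
```

The example profiles illustrate why the choice matters: a distance measure treats `low` and `high` as far apart, while a correlation measure treats them as identical in pattern.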

Research design: representativeness of the sample (cf. outliers); multicollinearity?

How is this done? Abracadabra, explicit theories, past research, suppositions, hopes, deadlines, ...

Clustering procedure: single linkage; complete linkage; average linkage; centroid method; Ward’s method.

Single linkage: easily produces snake-like (chained) clusters even where none exist.

Complete linkage: eliminates the snake formation; otherwise a big question mark.

Average linkage: joins the clusters with the smallest average distance; not so sensitive to outliers; tends to form clusters with small within-cluster variation; biased toward clusters with approximately the same variance, etc.
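The single, complete, and average linkage rules differ only in how they turn the pairwise distances between two clusters into one number. A minimal sketch in plain Python (function names and example points are illustrative; `math.dist` needs Python 3.8 or later):

```python
import math
from itertools import product

def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of members."""
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest pair of members."""
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Mean distance over all between-cluster pairs."""
    dists = [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]
    return sum(dists) / len(dists)

# Hypothetical clusters on a line; pairwise distances are 3, 5, 2, 4.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (5.0, 0.0)]
d_single = single_linkage(A, B)      # 2.0
d_complete = complete_linkage(A, B)  # 5.0
d_average = average_linkage(A, B)    # 3.5
```

Because single linkage looks only at the nearest pair, one bridging point is enough to join two groups, which is where the snake-like chains come from; complete linkage looks at the farthest pair and so resists chaining.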

Centroid method

Centroid method: the most robust to outliers; can produce confusing situations: the distance between centroids may become smaller than the distance between an already joined pair, which messes up the dendrogram.
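The confusing situation can be reproduced with three points. In this small sketch (the points and the helper name `centroid` are made up for illustration), the first merge happens at distance 2.0, but the next merge, measured between centroids, happens at a smaller distance:

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Three points forming a nearly equilateral triangle.
a, b, c = (0.0, 0.0), (2.0, 0.0), (1.0, 1.8)

# a and b are the closest pair (a-c and b-c are about 2.06 apart),
# so they are merged first, at height 2.0.
first_merge = math.dist(a, b)

# The new cluster is represented by its centroid (1.0, 0.0),
# which is only about 1.8 away from c.
ab = centroid([a, b])
second_merge = math.dist(ab, c)

inversion = second_merge < first_merge  # True: the dendrogram would fold back
```

A dendrogram assumes merge heights never decrease, so a later merge at a lower height than an earlier one cannot be drawn as a proper tree.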

Ward’s method: the distance between two clusters is a sum of squares (the increase in the within-cluster sum of squares that merging them would cause); tends to combine clusters with a small number of objects; biased toward clusters with approximately equal numbers of objects.
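One common way to state Ward's criterion is as the increase in the total within-cluster sum of squares caused by a merge, which has the closed form n_a * n_b / (n_a + n_b) times the squared distance between centroids. The sketch below assumes that formulation (the slides do not spell it out) and checks it against a direct computation; all names are illustrative.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def ward_distance(cluster_a, cluster_b):
    """Increase in total SSE if the two clusters were merged."""
    na, nb = len(cluster_a), len(cluster_b)
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * squared

# Hypothetical clusters with centroids (1, 0) and (6, 0).
A = [(0.0, 0.0), (2.0, 0.0)]
B = [(5.0, 0.0), (7.0, 0.0)]
closed_form = ward_distance(A, B)          # 1 * 25 = 25.0
direct = sse(A + B) - sse(A) - sse(B)      # 29 - 2 - 2 = 25.0
```

The size factor n_a * n_b / (n_a + n_b) is small when either cluster is small, which is consistent with the slide's remark that the method tends to combine clusters with few objects.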

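Nonhierarchical: heuristic methods (sequential threshold, parallel threshold); objective-function-based methods (VQs, e.g. the K-means procedure). Complexity: hierarchical is at least O(N^2), since all pairwise distances are needed; K-means is O(KN) per iteration.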
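A minimal sketch of the K-means procedure (Lloyd's iteration) in plain Python follows. It is an illustration, not the slides' own implementation: the function name, the requirement that the caller supplies the starting centroids, and the rule for empty clusters are all assumptions made here.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until nothing changes."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: K distance evaluations for each of N points.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = kmeans(points, [points[0], points[3]])
```

The assignment step is where the O(KN) cost per iteration comes from. The result depends on the starting centroids, so in practice the procedure is usually run from several starts.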

How many clusters? An open question; practical limits (it would be nice to have 3-6 clusters); dendrogram based (look for a large increase in cluster distances).
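The dendrogram-based rule can be sketched as follows: given the distances at which successive merges happened, stop just before the largest jump. The function name and the merge heights below are hypothetical, and this is a heuristic rather than a definitive stopping rule.

```python
def suggest_n_clusters(merge_heights):
    """Suggest a cluster count from ascending agglomeration distances.

    merge_heights[i] is the distance at which merge number i+1 happened;
    N objects give N-1 merges. At least two merges are required.
    """
    n_objects = len(merge_heights) + 1
    gaps = [later - earlier
            for earlier, later in zip(merge_heights, merge_heights[1:])]
    # Index of the largest jump between consecutive merge distances.
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # Stop after merge i+1, i.e. just before the expensive merge.
    return n_objects - (i + 1)

# Hypothetical heights for 6 objects: four cheap merges, then a big jump.
heights = [0.5, 0.7, 1.0, 1.2, 6.0]
n_clusters = suggest_n_clusters(heights)  # 2
```

The suggestion still has to be weighed against the practical limits mentioned above; a large jump only says that the next merge would join clusters that are far apart.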

Validation: exogenous variables; indexes, e.g. the Davies-Bouldin measure. [Figure: example cases labeled with exogenous variables such as age (14, 15, 20) and whether the person likes Donald Duck (DD).]
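As an illustration of an index-based check, here is a sketch of the Davies-Bouldin measure in its usual form: for each cluster, take the worst ratio of combined within-cluster scatter to between-centroid distance, then average over clusters. Lower values indicate more compact, better separated clusters. The helper names and example data are assumptions for this sketch.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def scatter(points):
    """Average distance of a cluster's members to its centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

def davies_bouldin(clusters):
    """Mean over clusters of the worst (s_i + s_j) / d_ij ratio."""
    cents = [centroid(c) for c in clusters]
    scat = [scatter(c) for c in clusters]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scat[i] + scat[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# The same four points, partitioned sensibly and badly.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
db_good = davies_bouldin(good)  # (1 + 1) / 10 = 0.2
db_bad = davies_bouldin(bad)    # (5 + 5) / 2 = 5.0
```

An internal index like this only measures geometric compactness; checking the clusters against exogenous variables that were not used in the clustering is a separate and complementary validation step.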

Key issues: the similarity or dissimilarity measure ... and data standardization.