

1 Cluster Analysis – 2 Approaches: K-Means (traditional) and Latent Class Analysis (new). By Jay Magidson, Statistical Innovations; based in part on a presentation by Wagner Kamakura at Statistical Modeling Week 2004.

2 K-Means clustering
- Partitioning algorithm: partitions a data set into a pre-determined number of groups (clusters) that are homogeneous in terms of selected continuous variables Y1, Y2, ...
How it works:
- The user chooses the number K of clusters and selects the variables that define them
- Step 1: the algorithm randomly positions each cluster at a point in the variable space
- Step 2: each case is assigned to the nearest of the K clusters using Euclidean distance
- Step 3: within-cluster means are computed and each cluster is repositioned at this centroid point
- Steps 2 and 3 are iterated until convergence
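The three steps above can be sketched in a few lines of plain Python. This is a minimal toy illustration, not code from the presentation; the data and function names are invented:

```python
# Minimal K-Means sketch (pure Python, no external libraries).
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster 2-D points into k groups by iterated assignment and update."""
    rng = random.Random(seed)
    # Step 1: randomly position each cluster in the variable space
    # (here: at k randomly chosen data points).
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign each case to the nearest cluster (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # Step 3: reposition each cluster at its within-cluster mean (centroid).
        new_centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # convergence: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters
```

Note that the stopping rule here is "no centroid moved"; real implementations typically also cap the number of iterations, as `max_iter` does above.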

3 Illustration of K-Means clustering – Step 1 [scatterplot of Y1 vs. Y2 showing the initial random positions of Cluster 1, Cluster 2 and Cluster 3]

4 Step 2: Cases Assigned to Clusters [scatterplot of Y1 vs. Y2]

5 Step 3: Clusters are Repositioned [scatterplot of Y1 vs. Y2]

6 Steps 2 and 3 are Repeated [scatterplot of Y1 vs. Y2]

7 Cases are Assigned to Repositioned Clusters, and the Algorithm Continues [scatterplot of Y1 vs. Y2]

8 Problems with K-Means Approach
- Needs a metric for similarity or distance between pairs of respondents
- No statistical criterion to choose the number of clusters
- Solution is not unique – it depends on the random start
- Use of Euclidean distance implies that within each cluster, the variance of Y1 equals the variance of Y2
- Cannot classify respondents with missing data
[figure: distance is undefined between nominal variables such as Gender (Female/Male) and Marital status (Single/Married/Divorced)]
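The equal-variance point can be made concrete: with raw Euclidean distance, which cluster is "nearest" depends on the scale of each variable, so rescaling one variable can flip an assignment. A small sketch, with all numbers invented for illustration:

```python
# Sketch: Euclidean distance implicitly assumes comparable variable scales,
# so the nearest centroid can change when one variable is rescaled.
import math

point = (2.0, 100.0)                    # one respondent's (Y1, Y2)
c1, c2 = (0.0, 120.0), (10.0, 100.0)    # two cluster centroids

def nearest(p, a, b):
    """Return 1 or 2 for whichever centroid is closer in Euclidean distance."""
    return 1 if math.dist(p, a) < math.dist(p, b) else 2

before = nearest(point, c1, c2)  # raw scale: the large Y2 axis dominates

# Rescale Y2 by 1/100, as if it had been measured in different units:
scale = lambda p: (p[0], p[1] / 100.0)
after = nearest(scale(point), scale(c1), scale(c2))  # the assignment flips
```

Standardizing variables beforehand mitigates this, but the choice of standardization is itself arbitrary, which is part of the criticism on this slide.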

9 Latent Class Cluster Models – differences from the traditional clustering approach
- Based on similarity in response patterns, rather than distance between respondents
- Maximum likelihood and posterior mode parameter estimation use the EM algorithm, which corresponds to a probabilistic extension of the K-Means algorithm
- Can be applied to variables of different scale types (discrete or continuous)
- Statistical tests are available to compare different models
- Random sets of starting points help avoid local solutions
- Missing data are not a major problem
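The sense in which EM is a probabilistic extension of K-Means can be sketched for a two-component Gaussian mixture in one dimension: the E-step replaces hard nearest-cluster assignment with posterior membership probabilities, and the M-step replaces centroid updates with probability-weighted maximum-likelihood updates. This is an illustrative toy implementation; the data and starting values are made up:

```python
# EM for a two-component 1-D Gaussian mixture (illustrative sketch).
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm(data, mu, var, pi, n_iter=200):
    """mu, var: length-2 lists of component means/variances; pi: weight of component 0."""
    for _ in range(n_iter):
        # E-step: posterior probability that each case belongs to component 0
        # (the "soft" counterpart of K-Means' hard assignment).
        resp = []
        for x in data:
            p0 = pi * normal_pdf(x, mu[0], var[0])
            p1 = (1 - pi) * normal_pdf(x, mu[1], var[1])
            resp.append(p0 / (p0 + p1))
        # M-step: maximum-likelihood updates, weighted by the posteriors
        # (the "soft" counterpart of recomputing centroids).
        n0 = sum(resp)
        n1 = len(data) - n0
        mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
        mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / n1
        var[0] = sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0 + 1e-9
        var[1] = sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1 + 1e-9
        pi = n0 / len(data)
    return mu, var, pi

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mu, var, pi = em_gmm(data, mu=[0.0, 6.0], var=[1.0, 1.0], pi=0.5)
```

If the two variances are forced to be equal and the posteriors are rounded to 0/1, this procedure reduces to K-Means, which is the correspondence the slide refers to.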

10 Latent Class Cluster Analysis
- Instead of using distances to classify cases into segments, it uses probabilities
- Can handle nominal, ordinal and continuous variables (any combination of these)
- Is not as sensitive to missing data as traditional cluster analysis techniques: it is easier to classify a respondent into a segment when some of the data are not available

Reference: Magidson and Vermunt, "Latent class models for clustering: A comparison with K-means", Canadian Journal of Marketing Research, Vol. 20.1, 2002, pp.
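How probability-based classification copes with a missing value can be sketched as follows: the posterior class probabilities are computed from whichever indicators are observed, and the likelihood term for a missing indicator is simply dropped. All class parameters and variable names below are hypothetical, chosen only to illustrate the mechanics:

```python
# Sketch: posterior (modal) classification in a latent class model,
# with a missing indicator handled by omitting its likelihood term.
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Two latent classes, each with a prior, a nominal indicator (gender)
# and a continuous indicator (income). Hypothetical parameters.
classes = [
    {"prior": 0.6, "p_gender": {"F": 0.7, "M": 0.3}, "mu": 30.0, "var": 25.0},
    {"prior": 0.4, "p_gender": {"F": 0.2, "M": 0.8}, "mu": 60.0, "var": 25.0},
]

def posterior(gender=None, income=None):
    """P(class | observed data); None marks a missing value."""
    joint = []
    for c in classes:
        lik = c["prior"]
        if gender is not None:          # include only observed indicators
            lik *= c["p_gender"][gender]
        if income is not None:
            lik *= normal_pdf(income, c["mu"], c["var"])
        joint.append(lik)
    total = sum(joint)
    return [j / total for j in joint]

full = posterior(gender="M", income=58.0)   # both indicators observed
partial = posterior(gender="M")             # income missing: still classifiable
```

The respondent is then assigned to the class with the highest posterior probability (modal assignment), whether or not every indicator was observed, which is why missing data pose less of a problem here than for distance-based K-Means.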