Cluster Analysis. Prepared by: Prof. Neha Yadav.


1 Cluster Analysis Prepared by: Prof. Neha Yadav

2 Application Areas
Segmenting the market: for example, segmenting on the basis of benefits sought, or on the basis of demographics, geo-demographics, psychographics, or buyer behavior (quality consciousness and price sensitivity).
Understanding buyer behavior: identifying homogeneous groups of buyers.

3 Application Areas
Identifying new product opportunities: by clustering brands and products, competitive sets within the market can be determined. Brands in the same cluster compete more fiercely with each other than with brands in other clusters. A company can compare its current offerings with those of its competitors to identify potential new product opportunities.
Selecting test markets: by grouping cities into homogeneous clusters, it is possible to select comparable cities in which to test various marketing strategies.

4 Application Areas
Reducing data: cluster analysis is used as a general data reduction technique to develop clusters or subgroups of data. For example, to describe differences in consumers' product usage behavior, consumers must first be clustered into groups. The differences between the groups can then be examined using multiple discriminant analysis.

5 SPSS Commands (Stage 1)
Click on "Analyze".
Click on "Classify", then "Hierarchical Cluster".
In the dialogue box that appears, select all the variables required for the cluster analysis by clicking the right arrow to transfer them from the variable list on the left to the "Variables" box on the right.

6 SPSS Commands (Stage 1)
Under the small section called "Cluster", select "Cases", because you will be clustering cases (rows of data, normally respondents or objects, that are grouped into clusters).
In the box called "Display", select "Statistics" and "Plots".
Click on "Method". In the dialogue box that opens, choose "Ward's method" as the clustering method. In the box titled "Measure", choose "Squared Euclidean distance". Click "Continue" to return to the main dialogue box.

7 SPSS Commands (Stage 1)
Click "Statistics" on the main dialogue box and choose "Agglomeration schedule" so that it appears in the final output. Click "Continue".
Click "Plots" on the main dialogue box and choose "Dendrogram". Then, in the box called "Icicle", choose "All clusters" and "Vertical". This gives you all the required plots in the output. Click "Continue" to return to the main dialogue box.
Click "OK" on the main dialogue box to obtain the output of the hierarchical cluster analysis.
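For readers working outside SPSS, the Stage 1 steps above can be reproduced in Python with scipy. This is a minimal sketch, not the SPSS procedure itself; the file name survey.csv and the column names V1 to V6 are placeholders for whatever data you are clustering.

```python
# Hierarchical clustering with Ward's method, roughly mirroring SPSS Stage 1.
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

data = pd.read_csv("survey.csv")                     # hypothetical input file
X = data[["V1", "V2", "V3", "V4", "V5", "V6"]].to_numpy()

# scipy's Ward linkage works from Euclidean distances internally, while SPSS
# reports squared Euclidean coefficients, so the agglomeration coefficients
# are on a different scale even though the cluster hierarchy should agree.
Z = linkage(X, method="ward")

# Z plays the role of the agglomeration schedule: each row records the two
# clusters combined, the distance (coefficient) and the new cluster's size.
print(Z)

# The counterpart of the SPSS "Dendrogram" plot option.
dendrogram(Z, labels=data.index.to_numpy())
plt.show()
```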

8 SPSS Commands (Stage 2)
After the number of clusters has been identified using the hierarchical clustering method, you can proceed to the second stage of the cluster analysis. This second stage is called "non-hierarchical" or "K-means" clustering. It generally provides a stable solution and is used when you know how many clusters you want. It is also called "quick clustering".

9 SPSS Commands (Stage 2)
Click "Classify", followed by "K-Means Cluster". Fill in the number of clusters you identified in Stage 1.
Click "Options" on the main dialogue box. In the box labeled "Statistics", select "Initial cluster centers", "ANOVA table" and "Cluster information for each case". Click "Continue" to return to the main dialogue box.
Click "OK" to obtain the output, which contains the final cluster centers from the K-means clustering method.
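A comparable Stage 2 run in Python, using scikit-learn rather than SPSS. Here k = 3 is only an assumption standing in for whatever number of clusters Stage 1 suggested, and X is the same respondents-by-variables array as in the sketch above.

```python
# K-means ("quick clustering"), roughly mirroring SPSS Stage 2.
from sklearn.cluster import KMeans

k = 3                                      # number of clusters from Stage 1
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)                 # final cluster centers (centroids)
print(km.labels_)                          # cluster membership for each case
```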

10 Conducting Cluster Analysis
1. Formulate the problem.
2. Select a distance measure.
3. Select a clustering procedure.
4. Decide on the number of clusters.
5. Interpret and profile the clusters.
6. Assess the validity of the clusters.

11 Clustering Procedures
Clustering procedures fall into two broad families:
Hierarchical: agglomerative or divisive.
Non-hierarchical (K-means): sequential threshold, parallel threshold, or optimizing partitioning.

12 Clustering Procedures
Agglomerative procedures subdivide into:
Linkage methods: single linkage, complete linkage, average linkage.
Variance methods: Ward's method.
Centroid methods.

13 Clustering Procedures
Hierarchical clustering: characterized by the development of a hierarchy or tree-like structure. Hierarchical methods can be agglomerative or divisive.
Agglomerative clustering: starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters, and the process continues until all objects are members of a single cluster. Agglomerative methods are the ones most commonly used in market research; they consist of linkage methods, variance methods and centroid methods.
Divisive clustering: starts with all objects grouped in a single cluster. Clusters are divided or split until each object is in a separate cluster.

14 Clustering Procedures (Agglomerative: Linkage)
a. Single linkage method: based on the minimum distance, or the nearest-neighbor rule. The first two objects clustered are those with the smallest distance between them. The next shortest distance is then identified, and either the third object is clustered with the first two or a new two-object cluster is formed.
[Diagram: single linkage as the minimum distance between Cluster 1 and Cluster 2.]

15 Clustering Procedures (Agglomerative: Linkage)
b. Complete linkage method: based on the maximum distance, or the furthest-neighbor approach. The distance between two clusters is calculated as the distance between their two farthest points.
[Diagram: complete linkage as the maximum distance between Cluster 1 and Cluster 2.]

16 Clustering Procedures (Agglomerative: Linkage)
c. Average linkage method: the distance between two clusters is defined as the average of the distances between all pairs of objects, where one member of each pair is taken from each cluster. Because the average linkage method uses information on all pairs of distances, not merely the minimum or maximum, it is usually the preferred method.
[Diagram: average linkage as the average distance between Cluster 1 and Cluster 2.]
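The three linkage rules differ only in how they summarise the pairwise distances between two clusters. A small illustration with made-up points (not data from the slides):

```python
# Single, complete and average linkage distances between two small clusters.
import numpy as np
from scipy.spatial.distance import cdist

cluster_1 = np.array([[1.0, 2.0], [2.0, 1.5]])            # made-up points
cluster_2 = np.array([[6.0, 5.0], [7.0, 6.5], [6.5, 5.5]])

d = cdist(cluster_1, cluster_2)        # all pairwise Euclidean distances

print("single   (minimum):", d.min())      # nearest-neighbor rule
print("complete (maximum):", d.max())      # furthest-neighbor rule
print("average  (mean):   ", d.mean())     # mean over all pairs
```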

17 Clustering Procedures (Agglomerative: Variance)
Variance methods: attempt to generate clusters that minimize the within-cluster variance. The most commonly used variance method is Ward's procedure.
[Diagram: Ward's method.]
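The quantity Ward's procedure works with is the within-cluster error sum of squares (ESS): at each step it merges the pair of clusters whose union produces the smallest increase in total ESS. A minimal sketch with made-up points:

```python
import numpy as np

def within_cluster_ss(cluster):
    """Sum of squared Euclidean distances of the points to their centroid."""
    centroid = cluster.mean(axis=0)
    return ((cluster - centroid) ** 2).sum()

a = np.array([[1.0, 2.0], [2.0, 1.5]])        # made-up clusters
b = np.array([[6.0, 5.0], [7.0, 6.5]])

merged = np.vstack([a, b])
increase = within_cluster_ss(merged) - (within_cluster_ss(a) + within_cluster_ss(b))
print("Increase in ESS if the two clusters are merged:", increase)
```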

18 Clustering Procedures (Agglomerative: Centroid)
Centroid method: the distance between two clusters is the distance between their centroids (the means of all the variables).
[Diagram: the centroid method as the distance between the centroids of Cluster 1 and Cluster 2.]
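In code, the centroid method reduces to a single comparison: the Euclidean distance between the two cluster means (again with made-up points):

```python
import numpy as np

cluster_1 = np.array([[1.0, 2.0], [2.0, 1.5]])
cluster_2 = np.array([[6.0, 5.0], [7.0, 6.5], [6.5, 5.5]])

centroid_distance = np.linalg.norm(cluster_1.mean(axis=0) - cluster_2.mean(axis=0))
print(centroid_distance)
```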

19 Clustering Procedures (Non-hierarchical or K-means)
Sequential threshold method: a non-hierarchical (K-means) clustering method in which a cluster center is first selected and all objects within a specified threshold distance of that center are grouped together. A new cluster center, or seed, is then selected, and the process is repeated for the unclustered points. Once an object is clustered with a seed, it is no longer considered for clustering with subsequent seeds.
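A rough sketch of the sequential threshold idea (not SPSS code). Choosing the next unclustered point as the seed is an assumption made purely for illustration; real implementations pick seeds differently.

```python
import numpy as np

def sequential_threshold(points, threshold):
    labels = np.full(len(points), -1)                    # -1 = not yet clustered
    cluster_id = 0
    while (labels == -1).any():
        seed = points[np.flatnonzero(labels == -1)[0]]   # next seed (assumption)
        dists = np.linalg.norm(points - seed, axis=1)
        # only unclustered points within the threshold join this cluster
        labels[(labels == -1) & (dists <= threshold)] = cluster_id
        cluster_id += 1
    return labels

points = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
print(sequential_threshold(points, threshold=1.0))       # -> [0 0 1 1]
```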

20 Clustering Procedures (Non-hierarchical or K-means)
Parallel threshold method: operates like the sequential threshold method, except that several cluster centers are selected simultaneously and objects within the threshold distance are grouped with the nearest center.
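A corresponding sketch of the parallel threshold method, where the seeds are fixed up front and a point that is not within the threshold of any seed is simply left unclustered:

```python
import numpy as np

def parallel_threshold(points, seeds, threshold):
    # distances from every point to every seed: shape (n_points, n_seeds)
    dists = np.linalg.norm(points[:, None, :] - seeds[None, :, :], axis=2)
    labels = dists.argmin(axis=1)                  # nearest seed for each point
    labels[dists.min(axis=1) > threshold] = -1     # -1 = left unclustered
    return labels

points = np.array([[0.0, 0.0], [0.3, 0.2], [5.0, 5.0], [9.0, 9.0]])
seeds = np.array([[0.0, 0.0], [5.0, 5.0]])
print(parallel_threshold(points, seeds, threshold=1.0))   # -> [ 0  0  1 -1]
```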

21 Clustering Procedures (Non-hierarchical or K-means)
Optimizing partitioning method: differs from the two threshold methods in that objects can later be reassigned to clusters in order to optimize an overall criterion, such as the average within-cluster distance for a given number of clusters.
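The reassignment idea behind the optimizing partitioning method is essentially the K-means update loop. A bare-bones sketch (it assumes every cluster keeps at least one case, so empty clusters are not handled):

```python
import numpy as np

def reassign(points, centers, n_iter=10):
    for _ in range(n_iter):
        # move every case to its nearest current center ...
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # ... then recompute the centers from the new memberships
        centers = np.vstack([points[labels == c].mean(axis=0)
                             for c in range(len(centers))])
    return labels, centers
```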

22 Clustering Procedure (two-stage approach)
Step 1: an initial clustering solution is obtained using a hierarchical procedure, such as average linkage or Ward's method.
Step 2: the number of clusters and the cluster centroids so obtained are used as inputs to the optimizing partitioning (non-hierarchical) procedure.
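A sketch of this two-step procedure in Python, combining scipy and scikit-learn. The file name, the column names and k = 3 are assumptions; the point is that the hierarchical solution supplies both the number of clusters and the starting centroids for K-means.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

# Same hypothetical input as the Stage 1 sketch.
X = pd.read_csv("survey.csv")[["V1", "V2", "V3", "V4", "V5", "V6"]].to_numpy()

k = 3                                    # number of clusters chosen in Step 1
Z = linkage(X, method="ward")            # Step 1: hierarchical (Ward) solution
hier_labels = fcluster(Z, t=k, criterion="maxclust")       # cluster ids 1..k

# Centroids of the hierarchical clusters become the K-means starting seeds.
seeds = np.vstack([X[hier_labels == c].mean(axis=0) for c in range(1, k + 1)])

# Step 2: optimizing partitioning (K-means) seeded with those centroids.
km = KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)
print(km.cluster_centers_)
```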

23 Example (clustering shoppers on the basis of attitudes towards shopping)
Consumers express their agreement or disagreement (1 = disagree, 7 = agree) with the following statements:
V1: Shopping is fun.
V2: Shopping is bad for your budget.
V3: I combine shopping with eating out.
V4: I try to get the best buys when shopping.
V5: I don't care about shopping.
V6: You save a lot of money by comparing prices.

24 Example (clustering shoppers on the basis of attitudes towards shopping)
Sample size: 20 respondents.
Note: in real-life situations a sample of at least 100 respondents should be used.

25 Output for the cluster analysis

26 Ward Linkage

27 Ward Linkage
In the agglomeration schedule, respondents 14 and 16 are combined at stage 1, as shown in the columns labeled "Clusters Combined". The squared Euclidean distance between these two respondents is given in the column labeled "Coefficients". The last column, "Next Stage", indicates the stage at which another case (respondent) or cluster is combined with this one.
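For comparison, the scipy linkage matrix from the earlier sketch can be printed in the style of this agglomeration schedule. One caveat: SPSS labels a merged cluster by the lowest case number it contains, whereas scipy gives the cluster formed at stage s a new id of n + s, so the ids in later rows look different even though the schedule is the same.

```python
def print_agglomeration_schedule(Z):
    n = len(Z) + 1                          # a linkage matrix has n - 1 rows
    print("Stage | Clusters Combined | Coefficient | Next Stage")
    for s, (i, j, coeff, _) in enumerate(Z, start=1):
        new_id = n + s - 1                  # scipy id of the cluster formed here
        later = [r + 1 for r in range(s, len(Z)) if new_id in (Z[r, 0], Z[r, 1])]
        next_stage = later[0] if later else 0
        print(f"{s:>5} | {int(i):>7} {int(j):>7} | {coeff:11.3f} | {next_stage:>10}")

print_agglomeration_schedule(Z)             # Z from the hierarchical sketch above
```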

28 Dendrogram
[Dendrogram using Ward's method, plotted on a rescaled distance (cluster combine) scale. Reading bottom-up, cases 14, 16, 10, 4, 19 and 18 join into one branch; cases 2, 13, 5, 11, 9 and 20 into a second; and cases 3, 8, 6, 7, 12, 1, 17 and 15 into a third, before the branches merge at the largest distances.]

29 Dendrogram
The dendrogram is read from left to right. Vertical lines represent clusters that are joined together. The position of a line on the scale indicates the distance at which the clusters were joined. This information is helpful in deciding on the number of clusters.
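Once the dendrogram has suggested a number of clusters, the tree can also be cut programmatically. A short sketch using the Z matrix from the earlier hierarchical sketch; cutting into three groups reflects the example that follows.

```python
from scipy.cluster.hierarchy import fcluster

membership = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
print(membership)                                     # cluster id (1-3) for each case
```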

30 Cluster Centroids
[Table: cluster centroids (1 = disagree, 7 = agree). Rows: clusters 1 to 3, numbered bottom-up from the dendrogram. Columns: the mean of each variable within the cluster, for V1 (shopping is fun), V2 (bad for budget), V3 (shopping and eating out), V4 (get best buys), V5 (don't care about shopping) and V6 (save money by comparing prices).]

31 Interpretation of Clusters
Cluster 1: fun-loving shoppers (respondents 1, 3, 6, 7, 8, 12, 15, 17).
Cluster 2: apathetic shoppers (respondents 2, 5, 9, 11, 13, 20).
Cluster 3: economical shoppers (respondents 4, 10, 14, 16, 18, 19).

32 Final cluster centers (centroids)

33 Final clusters

34 Tests for Reliability and Validity of the Clusters
These tests are too complex to cover here and are therefore omitted. Many practitioners feel that the ANOVA output is not required.

35 Thank you!