1 Cluster Analysis. Prepared by: Prof Neha Yadav

2 Application Areas. Segmenting the market: for example, segment the market on the basis of benefits sought, or on the basis of demographics, geo-demographics, psychographics, or buyer behavior (quality consciousness and price sensitivity). Understanding buyer behavior: identify homogeneous groups of buyers.

3 Application Areas. Identifying new product opportunities: by clustering brands and products, competitive sets within the market can be determined; brands in the same cluster compete more fiercely with each other than with brands in other clusters. A company can compare its current offerings with those of its competitors to identify potential new product opportunities. Selecting test markets: by grouping cities into homogeneous clusters, it is possible to select comparable cities for testing various marketing strategies.

4 Application Areas. Reducing data: cluster analysis is used as a general data-reduction technique to develop clusters or subgroups of data. For example, to describe differences in consumers' product usage behavior, consumers must first be clustered into groups; the differences between the groups can then be examined using multiple discriminant analysis.

5 SPSS Commands (Stage 1). Click "Analyze", then "Classify", then "Hierarchical Cluster". In the dialogue box that appears, select all the variables required for the cluster analysis and click the right arrow to transfer them from the variable list on the left to the Variables box on the right.

6 SPSS Commands (Stage 1). Under the small section called "Cluster", select "Cases", because you will be clustering cases (rows of data, which are normally the respondents or objects to be grouped into clusters). In the small box called "Display", select "Statistics" and "Plots". Click "Method"; a dialogue box will open. Choose "Ward Linkage" as the clustering method, and in the box titled "Measure" choose "Squared Euclidean Distance". Click "Continue" to return to the main dialogue box.

7 SPSS Commands (Stage 1). Click "Statistics" on the main dialogue box and choose "Agglomeration Schedule" so that it appears in the final output, then click "Continue". Click "Plots" on the main dialogue box and choose "Dendrogram". Then, in the box called "Icicle", choose "All Clusters" and "Vertical"; this will give you all the required plots in the output. Click "Continue" to return to the main dialogue box, and click "OK" to get the output of the hierarchical cluster analysis.
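Outside SPSS, Stage 1 can be approximated in Python with scipy; this is only a sketch, assuming the six attitude ratings are held in a NumPy array X with one row per respondent (the random numbers below merely stand in for real survey data).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Stand-in data: 20 respondents rating 6 attitude statements on a 1-7 scale.
X = np.random.default_rng(0).integers(1, 8, size=(20, 6)).astype(float)

# Ward's method on Euclidean distances; this should give the same merge order as
# SPSS's Ward linkage with squared Euclidean distance, though the coefficients
# are scaled differently.
Z = linkage(X, method="ward")

# Z plays the role of the agglomeration schedule: each row lists the two clusters
# combined at that stage and the distance (coefficient) at which they merged.
print(Z)

dendrogram(Z, labels=list(range(1, 21)))
plt.show()
```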

8 SPSS Commands (Stage 2). After the number of clusters has been identified using the hierarchical clustering method, you can proceed to the second stage of the cluster analysis. This second stage is called "non-hierarchical" or "K-means" clustering. It generally provides a stable solution and is used when you know how many clusters you want. It is also called "quick clustering".

9 SPSS Commands (Stage 2). Click "Classify", followed by "K-Means Cluster". Fill in the number of clusters you identified in Stage 1. Click "Options" on the main dialogue box and, in the box labeled "Statistics", select "Initial cluster centers", "ANOVA table" and "Cluster information for each case". Click "Continue" to return to the main dialogue box, and click "OK" to get the output, which contains the final cluster centers from the K-means clustering method.
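A rough scikit-learn counterpart of the Stage 2 (K-means) run, assuming X is the same respondents-by-variables array and that Stage 1 suggested three clusters:

```python
from sklearn.cluster import KMeans

# Number of clusters carried over from the hierarchical stage (assumed to be 3 here).
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(km.cluster_centers_)   # final cluster centers, analogous to the SPSS output
print(km.labels_)            # cluster membership for each respondent
```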

10 Conducting Cluster Analysis
1. Formulate the problem.
2. Select a distance measure.
3. Select a clustering procedure.
4. Decide on the number of clusters.
5. Interpret and profile the clusters.
6. Assess the validity of the clusters.

11 Clustering Procedures. Hierarchical: agglomerative or divisive. Non-hierarchical (K-means): sequential threshold, parallel threshold, or optimizing partitioning.

12 Clustering Procedures. Agglomerative methods: linkage methods (single linkage, complete linkage, average linkage), variance methods (Ward's method), and centroid methods.

13 Clustering Procedures. Hierarchical clustering: characterized by the development of a hierarchy or tree-like structure. Hierarchical methods can be agglomerative or divisive. Agglomerative clustering: starts with each object in a separate cluster; clusters are formed by grouping objects into bigger and bigger clusters, and the process continues until all the objects are members of a single cluster. Agglomerative methods are commonly used in market research and consist of linkage methods, variance methods, and centroid methods. Divisive clustering: starts with all the objects grouped into a single cluster; clusters are divided or split until each object is in a separate cluster.

14 Clustering Procedures (agglomerative – linkage). a. Single linkage method: based on the minimum distance, or the nearest-neighbor rule. The first two objects clustered are those that have the smallest distance between them. The next shortest distance is identified, and either a third object is clustered with the first two, or a new two-object cluster is formed. [Figure: single linkage – minimum distance between Cluster 1 and Cluster 2]

15 Clustering Procedures (agglomerative – linkage). b. Complete linkage method: based on the maximum distance, or the furthest-neighbor approach. In this method, the distance between two clusters is calculated as the distance between their two farthest points. [Figure: complete linkage – maximum distance between Cluster 1 and Cluster 2]

16 Clustering Procedures (agglomerative – linkage). c. Average linkage method: the distance between two clusters is defined as the average of the distances between all pairs of objects, where one member of each pair comes from each cluster. Because the average linkage method uses information on all pairs of distances, not merely the minimum or maximum distance, it is usually the preferred method. [Figure: average linkage – average distance between Cluster 1 and Cluster 2]
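To make the three linkage rules concrete, the sketch below computes the single, complete, and average linkage distances between two small made-up clusters; the points in a and b are purely illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two hypothetical clusters of objects in the variable space.
a = np.array([[1.0, 2.0], [2.0, 1.5]])
b = np.array([[6.0, 5.0], [7.0, 6.5], [6.5, 5.5]])

d = cdist(a, b)                  # all pairwise distances between the two clusters

print("single  :", d.min())      # nearest-neighbor rule (minimum distance)
print("complete:", d.max())      # furthest-neighbor rule (maximum distance)
print("average :", d.mean())     # mean of all pairwise distances
```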

17 Clustering Procedures (agglomerative – variance). Variance method: attempts to generate clusters that minimize the within-cluster variance. The most commonly used variance method is Ward's procedure. [Figure: Ward's method]

18 Clustering Procedures (agglomerative – centroid). Centroid method: the distance between two clusters is the distance between their centroids (the means of all the variables). [Figure: centroid method – distance between the centroids of Cluster 1 and Cluster 2]

19 Clustering Procedures (non-hierarchical or K-means). Sequential threshold method: a non-hierarchical or K-means clustering method in which a cluster center is first selected and all the objects within a specified threshold value of the center are grouped together. A new cluster center or seed is then selected, and the process is repeated for the unclustered points. Once an object is clustered with a seed, it is no longer considered for clustering with subsequent seeds.
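A naive Python sketch of the sequential threshold idea; the seed-selection rule (first unclustered point) and the threshold value are arbitrary illustrative choices, not part of the SPSS procedure.

```python
import numpy as np

def sequential_threshold(X, threshold):
    """Group objects around one seed at a time; -1 marks points not yet clustered."""
    labels = np.full(len(X), -1)
    cluster = 0
    while (labels == -1).any():
        seed = X[labels == -1][0]                       # pick a seed among the unclustered points
        near = np.linalg.norm(X - seed, axis=1) <= threshold
        labels[(labels == -1) & near] = cluster         # once clustered, a point is never revisited
        cluster += 1
    return labels

# Example: labels = sequential_threshold(X, threshold=2.0)
```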

20 Clustering Procedures (non-hierarchical or K-means). Parallel threshold method: operates similarly to the sequential threshold method, except that several cluster centers are selected simultaneously and objects within the threshold level are grouped with the nearest center.

21 Clustering Procedures (non-hierarchical or K-means). Optimizing partitioning method: differs from the two threshold methods in that objects can later be reassigned to clusters in order to optimize an overall criterion, such as the average within-cluster distance for a given number of clusters.

22 Clustering Procedure. Step 1: an initial clustering solution is obtained using a hierarchical procedure, such as average linkage or Ward's method. Step 2: the number of clusters and the cluster centroids so obtained are used as inputs to the optimizing partitioning method (a non-hierarchical procedure).
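A hedged sketch of this two-stage idea in Python, chaining scipy's hierarchical step into scikit-learn's K-means by using the hierarchical cluster centroids as starting seeds; X is the respondents-by-variables array from the earlier sketches and k = 3 is only an assumed choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

k = 3
Z = linkage(X, method="ward")                       # Step 1: hierarchical solution
initial = fcluster(Z, t=k, criterion="maxclust")    # membership for a k-cluster cut

# Centroids of the hierarchical clusters become the seeds for the K-means step.
seeds = np.vstack([X[initial == c].mean(axis=0) for c in range(1, k + 1)])

km = KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)   # Step 2: optimizing partitioning
```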

23 Example (cluster shoppers on the basis of attitude towards shopping). Consumers express their agreement or disagreement (1 = disagree, 7 = agree) with six statements:
V1: Shopping is fun.
V2: Shopping is bad for your budget.
V3: I combine shopping with eating out.
V4: I try to get the best buys when shopping.
V5: I don't care about shopping.
V6: You can save a lot of money by comparing prices.

24 Example (cluster shoppers on the basis of attitude towards shopping). Sample size: 20 respondents. Note: a sample of at least 100 respondents should be used in real-life situations.

25 Output for cluster analysis.

26 Ward Linkage

27 Ward Linkage. In the agglomeration schedule, respondents 14 and 16 are combined at stage 1, as shown in the column labeled "Clusters Combined". The squared Euclidean distance between these two respondents is given in the column labeled "Coefficients". The last column, "Next Stage", indicates the stage at which another case (respondent) or cluster is combined with this one.

28 Dendrogram. [Figure: hierarchical cluster analysis dendrogram using Ward's method – rescaled distance at which clusters combine (scale 0 to 25), plotted for the 20 cases]

29 Dendrogram. The dendrogram is read from left to right. Vertical lines represent clusters that are joined together, and the position of a line on the scale indicates the distance at which the clusters were joined. This information is helpful in deciding on the number of clusters.
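Once the dendrogram has suggested a number of clusters, the tree can be cut at that level; a minimal sketch using scipy's fcluster, assuming Z is the Ward linkage matrix from the earlier sketch.

```python
from scipy.cluster.hierarchy import fcluster

# Cut the Ward tree into 3 clusters, matching a three-cluster reading of the dendrogram.
membership = fcluster(Z, t=3, criterion="maxclust")
for cluster_id in (1, 2, 3):
    print(cluster_id, [i + 1 for i, m in enumerate(membership) if m == cluster_id])
```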

30 Cluster Centroids. TABLE – Cluster Centroids (1 = disagree, 7 = agree): means of the variables by cluster, with clusters numbered from the bottom of the dendrogram upward.
Cluster 1: V1 = 5.750, V2 = 3.625, V3 = 6.000, V4 = 3.125, V5 = 1.750, V6 = 3.875
Cluster 2: V1 = 1.667, V2 = 3.000, V3 = 1.833, V4 = 3.500, V5 = 5.500, V6 = 3.333
Cluster 3: V1 = 3.500, V2 = 5.833, V3 = 3.333, V4 = 6.000, V5 = 3.500, V6 = 6.000
(V1 = shopping is fun, V2 = bad for budget, V3 = shopping and eating out, V4 = get best buys, V5 = don't care about shopping, V6 = save money by comparing prices)
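A centroid table like the one above can be reproduced from the cluster memberships; a small sketch, assuming X and membership come from the earlier snippets.

```python
# Mean of each attitude variable within each cluster (the cluster centroids).
for c in (1, 2, 3):
    print("Cluster", c, X[membership == c].mean(axis=0).round(3))
```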

31 Interpretation of Clusters. Cluster 1: fun-loving shoppers (respondents 1, 3, 6, 7, 8, 12, 15, 17). Cluster 2: apathetic shoppers (respondents 2, 5, 9, 11, 13, 20). Cluster 3: economical shoppers (respondents 4, 10, 14, 16, 18, 19).

32 Final cluster centers (centroids).

33 Final Clusters.

34 Tests for Reliability and Validity of the Clusters. These are too complex to cover here and are omitted. Many practitioners feel that the ANOVA table is not required.

35 Thank you!

