Published by Kathryn McCormick. Modified over 8 years ago.
1
CLUSTER ANALYSIS
2
What is cluster analysis? Cluster analysis is a technique for grouping objects, cases or entities on the basis of multiple variables. One advantage of the technique is that it is applicable to both metric and non-metric data. A second is that the grouping can be done post hoc, i.e. after the primary data survey is over. The technique has wide applications in all branches of management, but it is most often used for market segmentation analysis.
3
Cluster analysis: basic tenets. Can be used to cluster objects, individuals and entities. Similarity is based on multiple variables. Measures the proximity between the objects under study across those variables. Objects that are grouped in one cluster are homogeneous compared with objects in other clusters. Can be conducted on metric, non-metric and mixed data.
4
Usage of cluster analysis Market segmentation – customers/potential customers can be split into smaller, more homogeneous groups by using the method. Examples: ACORN (A Classification Of Residential Neighbourhoods), based on house, car, employment etc.; PRIZM (Potential Rating Index for Zip Markets), based on education, family life cycle, race, ethnicity etc. Segmenting industries – the same grouping principle can be applied to industrial consumers.
5
Usage of cluster analysis Segmenting markets – cities or regions with similar or common traits can be grouped on the basis of climatic or socio-economic conditions. Career planning and training analysis – for human resource planning, people can be grouped into clusters on the basis of their education, experience, aptitude and aspirations. Segmenting financial sectors/instruments – factors such as raw material cost, financial allocations and seasonality are used to group sectors together in order to understand the growth and performance of a group of industries.
6
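Statistics associated with cluster analysis Metric data analysis For metric data, the proximity between two objects is commonly measured by the Euclidean distance: $d_{ij} = \sqrt{\sum_{k} (x_{ik} - x_{jk})^2}$ where $d_{ij}$ = distance between persons $i$ and $j$; $k$ = variable (interval/ratio scaled); $i$, $j$ = the two objects being compared; $x_{ik}$ = value of object $i$ on variable $k$.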
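The distance between two objects measured on metric variables can be sketched as follows, assuming the Euclidean distance is the measure intended. The function name and the two respondents are hypothetical and serve only as an illustration.

```python
import math


def euclidean_distance(x_i, x_j):
    """Distance d_ij between objects i and j, summed over all variables k."""
    if len(x_i) != len(x_j):
        raise ValueError("both objects need a value on every variable")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


# Hypothetical respondents rated on three interval-scaled variables.
person_i = [5, 3, 1]
person_j = [2, 7, 1]

d_ij = euclidean_distance(person_i, person_j)
print(d_ij)  # sqrt(3^2 + 4^2 + 0^2) = 5.0
```

A smaller value means the two objects are more alike. Variables on very different scales are usually standardised first, otherwise the variable with the largest range dominates the sum.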
7
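Statistics associated with cluster analysis Non-metric data Simple matching coefficient = $\dfrac{P + N}{P + N + M}$ Jaccard coefficient = $\dfrac{P}{P + M}$ where $P$ = positive matches, $N$ = negative matches, $M$ = mismatches. The Jaccard coefficient ignores negative matches, so two objects are not treated as similar merely because both lack an attribute.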
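A minimal sketch of the two coefficients for binary (0/1) attribute profiles, using their standard definitions in terms of P, N and M. The function names and the two profiles are hypothetical.

```python
def match_counts(a, b):
    """Return (P, N, M) for two equal-length 0/1 profiles."""
    p = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # positive matches
    n = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)  # negative matches
    m = sum(1 for x, y in zip(a, b) if x != y)             # mismatches
    return p, n, m


def simple_matching(a, b):
    p, n, m = match_counts(a, b)
    return (p + n) / (p + n + m)


def jaccard(a, b):
    p, _, m = match_counts(a, b)
    # Defined as 0 here when there are no positive matches and no mismatches.
    return p / (p + m) if (p + m) else 0.0


# Hypothetical respondents: 1 = owns the product, 0 = does not.
resp_a = [1, 1, 0, 0, 1]
resp_b = [1, 0, 0, 1, 1]

print(match_counts(resp_a, resp_b))     # (2, 1, 2)
print(simple_matching(resp_a, resp_b))  # 0.6
print(jaccard(resp_a, resp_b))          # 0.5
```

The gap between 0.6 and 0.5 comes entirely from the one negative match, which the simple matching coefficient counts and the Jaccard coefficient drops.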
8
Key concepts in cluster analysis Agglomeration schedule: An output of hierarchical methods that lists the objects combined at each stage, starting with the most similar pair and then showing, stage by stage, which object or cluster joins next. ANOVA table: The univariate or one-way ANOVA statistics for each clustering variable. The higher the ANOVA value, the greater the difference between the clusters on that variable. Cluster variate: The variables or parameters representing the objects to be clustered and used to calculate the similarity between objects. Cluster centroid: The average values of the objects in a cluster on all the variables in the cluster variate.
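How an agglomeration schedule is built up can be sketched with a small pure-Python routine. Single linkage (nearest neighbour) and Euclidean distance are assumptions made for this illustration; the slides do not name a linkage rule, and statistical packages offer several. The function name and the four objects are hypothetical.

```python
import math
from itertools import combinations


def agglomeration_schedule(points):
    """Return one (stage, cluster_a, cluster_b, distance) row per merge."""
    # Each object starts as its own cluster, labelled by its index.
    clusters = {i: [i] for i in range(len(points))}
    schedule = []
    stage = 1
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Single linkage: distance between the closest pair of members.
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # The merged cluster keeps the lower label, as in typical package output.
        clusters[a] = clusters[a] + clusters.pop(b)
        schedule.append((stage, a, b, d))
        stage += 1
    return schedule


# Hypothetical objects measured on two variables.
objects = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 7.0)]
schedule = agglomeration_schedule(objects)
for row in schedule:
    print(row)
```

With these four objects the most similar pair (objects 0 and 1) merges first, then objects 2 and 3, and the two resulting clusters join at the last stage. A large jump in the distance column between consecutive stages is a common cue for where to stop merging.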
9
Key concepts in cluster analysis Cluster seeds: The initial cluster centres in non-hierarchical clustering, i.e. the starting points around which the clusters are then created. Cluster membership: This indicates the address, i.e. the cluster to which a particular person/object belongs. Dendrogram: A tree-like diagram that is used to present the cluster results graphically. The vertical axis represents the objects and the horizontal axis represents the inter-respondent distance. The figure is to be read from left to right. Distances between final cluster centres: These are the distances between the individual pairs of clusters. A robust solution, one that is able to demarcate the groups distinctly, is one where the inter-cluster distance is large; the larger the distance, the more distinct the clusters.
10
Key concepts in cluster analysis Entropy group: The individuals or small groups that do not seem to fit into any cluster. Final cluster centres: The mean value of the cluster on each of the variables that is a part of the cluster variate. Hierarchical methods: A step-wise process that starts with the most similar pair and builds a tree-like structure composed of separate clusters. Non-hierarchical methods: Cluster seeds or centres are the starting points, and individual clusters are built around them based on some pre-specified distance from the seeds.
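The non-hierarchical idea of seeds, cluster membership and final cluster centres can be illustrated with k-means. This is a swapped-in technique: it assigns each object to its nearest centre rather than applying a pre-specified distance threshold, and it is only one of several non-hierarchical procedures. The function name, the seeds and the data are hypothetical.

```python
import math


def k_means(points, seeds, max_iter=100):
    """Return (membership, final_centres) starting from the given seeds."""
    centres = [tuple(s) for s in seeds]
    membership = None
    for _ in range(max_iter):
        # Assign every object to the nearest current centre.
        new_membership = [
            min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        if new_membership == membership:
            break  # no object changed cluster, so the solution is stable
        membership = new_membership
        # Recompute each centre as the mean of its members on every variable.
        for c in range(len(centres)):
            members = [p for p, m in zip(points, membership) if m == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return membership, centres


# Hypothetical objects on two variables, with two hand-picked seeds.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
seeds = [(1, 1), (8, 8)]

membership, final_centres = k_means(points, seeds)
cluster_sizes = [membership.count(c) for c in range(len(seeds))]
centre_distance = math.dist(final_centres[0], final_centres[1])

print(membership)      # cluster membership of each object
print(final_centres)   # final cluster centres
print(cluster_sizes)   # number of cases in each cluster
print(centre_distance) # distance between the final cluster centres
```

The four printed items correspond to the cluster membership, final cluster centres, case summary and inter-centre distance described in the key concepts. Different seeds can lead to different solutions, which is why the choice of seeds matters.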
11
Key concepts in cluster analysis Proximity matrix: A data matrix that consists of pair-wise distances/similarities between the objects. It is an N x N matrix, where N is the number of objects being clustered. Summary: The number of cases in each cluster, as reported by the non-hierarchical clustering method. Vertical icicle diagram: Quite similar to the dendrogram, it is a graphical method to demonstrate the composition of the clusters. The objects are individually displayed at the top. At any given stage the columns correspond to the objects being clustered, and the rows correspond to the number of clusters. An icicle diagram is read from bottom to top.
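A proximity matrix can be sketched in a few lines, assuming Euclidean distance as the proximity measure. The function name and the three objects are hypothetical.

```python
import math


def proximity_matrix(objects):
    """Return the N x N matrix of pair-wise Euclidean distances."""
    n = len(objects)
    return [[math.dist(objects[i], objects[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical objects measured on two variables.
objects = [(0, 0), (3, 4), (6, 8)]
matrix = proximity_matrix(objects)

for row in matrix:
    print([round(d, 2) for d in row])
```

The matrix is symmetric with zeros on the diagonal, so only the N(N-1)/2 entries below the diagonal carry information; hierarchical procedures start from exactly these pair-wise values.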