Cluster Analysis.

1 Cluster Analysis

4 Variable selection:
Theoretical, conceptual, and practical considerations.
Cluster analysis (CA) cannot determine which variables are relevant and which are not.
Inappropriate variables can distort the analysis.
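
To illustrate how an inappropriate variable can distort the analysis, here is a minimal sketch. The objects A, B, C and their values are hypothetical, and Euclidean distance is an assumption, since the slides do not name a distance measure. Adding one irrelevant variable on a large scale changes which object is closest to A.

```python
import math

def nearest(name, points):
    """Return the name of the point closest (Euclidean distance) to `name`."""
    target = points[name]
    others = (k for k in points if k != name)
    return min(others, key=lambda k: math.dist(target, points[k]))

# Hypothetical data: a single relevant variable.
relevant_only = {"A": (0.0,), "B": (1.0,), "C": (10.0,)}

# Same objects plus a made-up irrelevant variable on a much larger scale.
with_irrelevant = {"A": (0.0, 0.0), "B": (1.0, 50.0), "C": (10.0, 2.0)}

print(nearest("A", relevant_only))    # B
print(nearest("A", with_irrelevant))  # C
```

Because the clustering technique treats every variable it is given as meaningful, the choice of variables has to come from theory and practical judgment before the analysis is run.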

5 What is clustering?
A grouping of data objects such that the objects within a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
Inter-cluster distances are maximized.
Intra-cluster distances are minimized.
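
The two distance goals can be checked numerically. The sketch below assumes Euclidean distance and uses two hypothetical groups of 2-D points; the function names `mean_intra` and `mean_inter` are illustrative, not taken from the slides.

```python
import math
from itertools import combinations, product

def mean_intra(cluster):
    """Average distance between all pairs of objects inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter(c1, c2):
    """Average distance between objects taken from two different clusters."""
    pairs = list(product(c1, c2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Hypothetical, well-separated groups.
group_1 = [(0, 0), (0, 1), (1, 0)]
group_2 = [(10, 10), (10, 11), (11, 10)]

print(mean_intra(group_1), mean_intra(group_2))
print(mean_inter(group_1, group_2))
```

For a good grouping, the intra-cluster averages should be small relative to the inter-cluster average.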

6 Why do we cluster?
Clustering: given a collection of data objects, group them so that they are
similar to one another within the same cluster, and
dissimilar to the objects in other clusters.
Clustering results are used:
As a stand-alone tool to get insight into the data distribution. Visualization of clusters may unveil important information.
As a preprocessing step for other algorithms. Efficient indexing or compression often relies on clustering.

7 Applications of clustering
Image processing: cluster images based on their visual content.
Web: cluster groups of users based on their access patterns on webpages; cluster webpages based on their content.
Bioinformatics: cluster similar proteins together (similarity with respect to chemical structure and/or functionality, etc.).
Many more…

8 The clustering task
Group observations so that observations in the same group are similar, whereas observations in different groups are different.
Basic questions:
What does “similar” mean?
What is a good partition of the objects? That is, how is the quality of a solution measured?
How do we find a good partition of the observations?

9 Observations to cluster
Real-valued attributes/variables, e.g., salary, height.
Binary attributes, e.g., gender (M/F), has_cancer (T/F).
Nominal (categorical) attributes, e.g., religion (Christian, Muslim, Buddhist, Hindu, etc.).
Ordinal/ranked attributes, e.g., military rank (soldier, sergeant, lieutenant, captain, etc.).
Variables of mixed types: multiple attributes with various types.
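
One way to compare observations with mixed attribute types is to score each attribute on a 0 to 1 scale and average the scores, in the spirit of Gower's general dissimilarity coefficient. The slides do not name a measure, so this is only a sketch under that assumption; the `schema` layout, the salary range, and the two records are hypothetical.

```python
def mixed_dissimilarity(a, b, schema):
    """Average per-attribute dissimilarity, each scaled to the range [0, 1]."""
    total = 0.0
    for key, spec in schema.items():
        kind = spec["kind"]
        if kind == "numeric":
            low, high = spec["range"]            # assumed known value range
            total += abs(a[key] - b[key]) / (high - low)
        elif kind in ("binary", "nominal"):
            total += 0.0 if a[key] == b[key] else 1.0   # simple mismatch
        elif kind == "ordinal":
            levels = spec["levels"]              # ordered from lowest to highest
            gap = abs(levels.index(a[key]) - levels.index(b[key]))
            total += gap / (len(levels) - 1)
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
    return total / len(schema)

# Hypothetical schema built from the attribute examples on the slide.
schema = {
    "salary": {"kind": "numeric", "range": (0, 100_000)},
    "has_cancer": {"kind": "binary"},
    "religion": {"kind": "nominal"},
    "rank": {"kind": "ordinal",
             "levels": ["soldier", "sergeant", "lieutenant", "captain"]},
}

p = {"salary": 30_000, "has_cancer": False, "religion": "Hindu", "rank": "soldier"}
q = {"salary": 50_000, "has_cancer": False, "religion": "Muslim", "rank": "lieutenant"}

print(mixed_dissimilarity(p, q, schema))
```

Scaling every attribute to the same range keeps a single wide-ranging variable such as salary from dominating the result.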

11 Hierarchical algorithms

12 Agglomerative

13 Nearest neighbor

14 Example

17 Hierarchical joining algorithms
Single (nearest-neighbour): distance between two clusters = distance between the two closest members of the two clusters.
Complete (furthest-neighbour): distance between two clusters = distance between the two most distant cluster members.
Centroid: distance between two clusters = distance between the multivariate means of each cluster.
[Figure: Cluster 1, Cluster 2, and Cluster 3, with the single, complete, and centroid distances marked.]
2001 Bio 8100s Applied Multivariate Biostatistics
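
The three definitions above translate directly into code. This is a minimal sketch assuming Euclidean distance; the two small clusters are hypothetical and the function names are illustrative.

```python
import math
from itertools import product

def single_linkage(c1, c2):
    """Distance between the two closest members of the two clusters."""
    return min(math.dist(p, q) for p, q in product(c1, c2))

def complete_linkage(c1, c2):
    """Distance between the two most distant members of the two clusters."""
    return max(math.dist(p, q) for p, q in product(c1, c2))

def centroid(cluster):
    """Multivariate mean of a cluster (mean of each coordinate)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def centroid_linkage(c1, c2):
    """Distance between the multivariate means of the two clusters."""
    return math.dist(centroid(c1), centroid(c2))

# Hypothetical clusters lying on a line, so the results are easy to verify.
cluster_1 = [(0.0, 0.0), (1.0, 0.0)]
cluster_2 = [(4.0, 0.0), (6.0, 0.0)]

print(single_linkage(cluster_1, cluster_2))    # 3.0
print(complete_linkage(cluster_1, cluster_2))  # 6.0
print(centroid_linkage(cluster_1, cluster_2))  # 4.5
```

In an agglomerative procedure, the pair of clusters with the smallest such distance would be joined at each step, so the choice of rule changes which clusters merge first.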

18 Hierarchical joining algorithms (cont’d)
Average: distance between two clusters = average distance between all pairs of members, one from each cluster.
Median: distance between two clusters = median distance between all pairs of members, one from each cluster.
Ward: distance between two clusters = average distance between all members of the two clusters with adjustment for covariances. (In the usual formulation, Ward's method joins the pair of clusters whose merger gives the smallest increase in the within-cluster sum of squares.)
[Figure: Cluster 1, Cluster 2, and Cluster 3, with the mean/median/adjusted mean of all pairwise distances marked.]
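
A short sketch of these rules, assuming Euclidean distance and hypothetical data. The average and median rules follow the slide's wording. For Ward, the sketch uses the common textbook merge cost, the increase in within-cluster sum of squares, which is an assumption about what the slide's "adjustment" refers to rather than the slide's own formula.

```python
import math
from itertools import product
from statistics import mean, median

def pairwise(c1, c2):
    """All distances between one member of c1 and one member of c2."""
    return [math.dist(p, q) for p, q in product(c1, c2)]

def average_linkage(c1, c2):
    return mean(pairwise(c1, c2))

def median_linkage(c1, c2):
    return median(pairwise(c1, c2))

def centroid(cluster):
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def ward_cost(c1, c2):
    """Increase in within-cluster sum of squares if c1 and c2 are merged.

    Textbook identity (assumed here): n1 * n2 / (n1 + n2) * ||m1 - m2||^2.
    """
    n1, n2 = len(c1), len(c2)
    return n1 * n2 / (n1 + n2) * math.dist(centroid(c1), centroid(c2)) ** 2

# Hypothetical clusters on a line.
cluster_1 = [(0.0, 0.0), (1.0, 0.0)]
cluster_2 = [(4.0, 0.0), (6.0, 0.0), (11.0, 0.0)]

print(average_linkage(cluster_1, cluster_2))  # 6.5
print(median_linkage(cluster_1, cluster_2))   # 5.5
print(ward_cost(cluster_1, cluster_2))
```

The outlying member at 11 pulls the average above the median, which shows why the two rules can rank candidate merges differently.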

19 Partitioned clustering: the procedure
Choose k “seed” cases that are spread as far as possible from the center of all objects.
Assign all remaining objects to the nearest seed.
Reassign objects so that the within-group sum of squares (SSwithin) is reduced…
…and continue to do so until SSwithin is minimized.
[Figure: objects plotted on axes X1 and X2, with Seed 1, Seed 2, Seed 3, and the object center marked.]
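
The procedure above can be sketched as follows. Several details are assumptions, because the slide does not specify them: seeds are picked with a farthest-first rule, distances are Euclidean, reassignment is done k-means style (recompute group means, then reassign every object to the nearest mean), and the loop stops when SSwithin no longer decreases, which gives a local rather than a guaranteed global minimum. The data points are hypothetical.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def choose_seeds(points, k):
    """Assumed rule: first seed is the object farthest from the overall
    center; each further seed is the object farthest from the seeds so far."""
    center = centroid(points)
    seeds = [max(points, key=lambda p: math.dist(p, center))]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds

def assign(points, centers):
    """Put every object in the group of its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        i = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        groups[i].append(p)
    return groups

def ss_within(groups):
    """Sum of squared distances from each object to its group mean."""
    total = 0.0
    for g in groups:
        if g:
            c = centroid(g)
            total += sum(math.dist(p, c) ** 2 for p in g)
    return total

def partition(points, k, max_iter=100):
    centers = choose_seeds(points, k)
    groups = assign(points, centers)
    best = ss_within(groups)
    for _ in range(max_iter):
        # Keep the old center if a group happens to be empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
        new_groups = assign(points, centers)
        score = ss_within(new_groups)
        if score >= best:        # no further reduction: stop
            break
        groups, best = new_groups, score
    return groups, best

# Hypothetical objects forming three visible groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (0, 10), (1, 10), (0, 11)]
groups, ss = partition(points, k=3)
print(groups)
print(ss)
```

Because the result depends on the starting seeds, running the procedure from several seed choices and keeping the partition with the lowest SSwithin is a common safeguard.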
