Clustering and Multidimensional Scaling

Presentation transcript:

Clustering and Multidimensional Scaling Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and Multimedia

Clustering
- Searching data for a structure of "natural" groupings
- An exploratory technique
- Provides a means for assessing dimensionality, identifying outliers, and suggesting interesting hypotheses concerning relationships

Classification vs. Clustering
- Classification: the number of groups is known; assign new observations to one of these groups
- Cluster analysis: no assumptions on the number of groups or the group structure; groups are formed on the basis of similarities or distances (dissimilarities)

Difficulty in Natural Grouping

Choice of Similarity Measure
- Nature of the variables: discrete, continuous, binary
- Scale of measurement: nominal, ordinal, interval, ratio
- Subject-matter knowledge
- Items: proximity indicated by some sort of distance
- Variables: grouped on the basis of correlation coefficients or measures of association

Some Well-known Distances
- Euclidean distance
- Statistical distance
- Minkowski metric
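For reference, the standard formulas behind these three names (the slide's own equations did not survive transcription), with x and y being p-dimensional observations, S the sample covariance matrix, and m >= 1:

```latex
d(\mathbf{x},\mathbf{y}) = \sqrt{\sum_{i=1}^{p}(x_i - y_i)^2}
\qquad \text{(Euclidean)}

d(\mathbf{x},\mathbf{y}) = \sqrt{(\mathbf{x}-\mathbf{y})' S^{-1} (\mathbf{x}-\mathbf{y})}
\qquad \text{(statistical)}

d(\mathbf{x},\mathbf{y}) = \left[ \sum_{i=1}^{p} |x_i - y_i|^m \right]^{1/m}
\qquad \text{(Minkowski)}
```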

Two Popular Measures of Distance for Nonnegative Variables
- Canberra metric
- Czekanowski coefficient
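Their standard definitions for nonnegative variables (again reconstructing the formulas the slide referred to):

```latex
d(\mathbf{x},\mathbf{y}) = \sum_{i=1}^{p} \frac{|x_i - y_i|}{x_i + y_i}
\qquad \text{(Canberra metric)}

d(\mathbf{x},\mathbf{y}) = 1 - \frac{2\sum_{i=1}^{p} \min(x_i, y_i)}{\sum_{i=1}^{p} (x_i + y_i)}
\qquad \text{(Czekanowski coefficient)}
```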

A Caveat
- Use "true" distances when possible, i.e., distances satisfying the distance properties
- Most clustering algorithms will accept subjectively assigned distance numbers that may not satisfy, for example, the triangle inequality

Example of Binary Variables
[table scoring items i and j on five binary variables]

Squared Euclidean Distance for Binary Variables
- Equals the number of mismatches between the two items
- Suffers from weighting the 1-1 and 0-0 matches equally
- e.g., that two people both read ancient Greek is stronger evidence of similarity than the shared absence of this capability

Contingency Table (matches and mismatches of items i and k on p binary variables)

                  Item k
                  1        0        Totals
  Item i    1     a        b        a + b
            0     c        d        c + d
  Totals          a + c    b + d    p = a + b + c + d

Some Binary Similarity Coefficients
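A few standard coefficients built from the contingency-table counts a, b, c, d (with p = a + b + c + d); the original slide likely listed more:

```latex
\frac{a+d}{p} \quad \text{(simple matching: 1-1 and 0-0 weighted equally)}
\qquad
\frac{a}{a+b+c} \quad \text{(Jaccard: 0-0 matches ignored)}
\qquad
\frac{2a}{2a+b+c} \quad \text{(Dice: double weight on 1-1 matches)}
```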

Example 12.1

Example 12.1: Similarity Matrix with Coefficient 1

Conversion of Similarities and Distances
- Similarities from distances, e.g., s_ik = 1/(1 + d_ik)
- "True" distances from similarities, e.g., d_ik = sqrt(2(1 - s_ik)); this requires the matrix of similarities to be nonnegative definite
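A minimal Python sketch of these conversions; the two monotone transformations shown are the common textbook choices, one option among many:

```python
import math

def similarity_from_distance(d):
    # monotone: distance 0 -> similarity 1; large distance -> similarity near 0
    return 1.0 / (1.0 + d)

def distance_from_similarity(s):
    # yields a "true" (metric) distance when the full similarity
    # matrix is nonnegative definite
    return math.sqrt(2.0 * (1.0 - s))

print(similarity_from_distance(0.0))   # 1.0
print(distance_from_similarity(1.0))   # 0.0
```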

Contingency Table (matches and mismatches of variables i and k over n items)

                      Variable k
                      1        0        Totals
  Variable i    1     a        b        a + b
                0     c        d        c + d
  Totals              a + c    b + d    n = a + b + c + d

Product Moment Correlation as a Measure of Similarity
- Related to the chi-square statistic (r² = χ²/n) for testing the independence of two binary variables
- For n fixed, a large similarity (correlation) is consistent with the presence of dependence
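For two binary variables summarized by the contingency-table counts, the product moment (phi) correlation takes the standard form:

```latex
r = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}},
\qquad r^2 = \chi^2 / n
```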

Example 12.2 Similarities of 11 Languages

Hierarchical Clustering: Agglomerative Methods
- Initially there are as many clusters as objects
- The most similar objects are grouped first
- Initial groups are merged according to their similarities
- Eventually, all subgroups are fused into a single cluster

Hierarchical Clustering: Divisive Methods
- An initial single group is divided into two subgroups such that objects in one subgroup are "far from" objects in the other
- These subgroups are then further divided into dissimilar subgroups
- The process continues until there are as many subgroups as objects

Inter-cluster Distance for Linkage Methods
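The three common inter-cluster distances (single, complete, average linkage) and the agglomerative loop can be sketched in Python; the function names and the 5×5 distance matrix below are illustrative assumptions, not the book's data:

```python
def single_link(d, A, B):
    # distance between the nearest members of clusters A and B
    return min(d[i][j] for i in A for j in B)

def complete_link(d, A, B):
    # distance between the farthest members
    return max(d[i][j] for i in A for j in B)

def average_link(d, A, B):
    # average of all between-cluster pairwise distances
    return sum(d[i][j] for i in A for j in B) / (len(A) * len(B))

def agglomerate(d, link):
    # repeatedly merge the two closest clusters; record the merge heights
    clusters = [(i,) for i in range(len(d))]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = link(d, clusters[i], clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        heights.append(dist)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return heights

# small symmetric distance matrix for five items
d = [[0, 9, 3, 6, 11],
     [9, 0, 7, 5, 10],
     [3, 7, 0, 9, 2],
     [6, 5, 9, 0, 8],
     [11, 10, 2, 8, 0]]
print(agglomerate(d, single_link))   # merge heights in increasing order
```

The merge heights returned here are what the dendrogram plots on its vertical axis.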

Example 12.3: Single Linkage

Example 12.3: Single Linkage Resultant Dendrogram

Example 12.4 Single Linkage of 11 Languages

Pros and Cons of Single Linkage

Example 12.5: Complete Linkage

Example 12.6 Complete Linkage of 11 Languages

Example 12.7 Clustering Variables

Example 12.7 Correlations of Variables

Example 12.7: Complete Linkage Dendrogram

Average Linkage

Example 12.8 Average Linkage of 11 Languages

Example 12.9 Average Linkage of Public Utilities

Ward's Hierarchical Clustering Method
- For a given cluster k, let ESS_k be the sum of squared deviations of every item in the cluster from the cluster mean
- At each step, the union of every possible pair of clusters is considered
- The two clusters whose merger results in the smallest increase in the sum of the ESS_k are joined
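In symbols (standard form), with the bar denoting the mean of cluster k:

```latex
\mathrm{ESS}_k = \sum_{j \in \text{cluster } k}
    (\mathbf{x}_j - \bar{\mathbf{x}}_k)'(\mathbf{x}_j - \bar{\mathbf{x}}_k),
\qquad
\text{merge the pair of clusters minimizing the increase in } \sum_k \mathrm{ESS}_k
```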

Example 12.10 Ward's Clustering of Pure Malt Scotch Whiskies

Final Comments
- Sensitive to outliers, or "noise points"
- No reallocation of objects that may have been "incorrectly" grouped at an early stage
- It is a good idea to try several methods and check whether the results are roughly consistent
- Check stability by perturbing the data

Inversion

Nonhierarchical Clustering: K-means Method
- Partition the items into K initial clusters
- Proceed through the list of items, assigning each item to the cluster whose centroid is nearest
- Recalculate the centroid for the cluster receiving the new item and for the cluster losing it
- Repeat until no more reassignments occur
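A batch version of these steps can be sketched in Python on the data of Example 12.11. Note one assumption: the textbook procedure updates centroids after each single reassignment, while this sketch reassigns all items at once per pass; on this data both reach the same final grouping:

```python
def kmeans(points, labels, iters=100):
    # points: list of coordinate tuples; labels: initial cluster label per point
    for _ in range(iters):
        # centroid of each current cluster
        cents = {}
        for lab in set(labels):
            members = [p for p, l in zip(points, labels) if l == lab]
            cents[lab] = tuple(sum(c) / len(members) for c in zip(*members))
        # reassign every point to the nearest centroid (squared distance)
        new = [min(cents, key=lambda lab: sum((a - b) ** 2
                   for a, b in zip(p, cents[lab]))) for p in points]
        if new == labels:          # converged: no reassignments
            break
        labels = new
    return labels, cents

# items A, B, C, D of Example 12.11, starting from clusters (AB) and (CD)
pts = [(5, 3), (-1, 1), (1, -2), (-3, -2)]
labels, cents = kmeans(pts, [0, 0, 1, 1])
print(labels)   # A ends up alone; B, C, D form the other cluster
```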

Example 12.11 K-means Method
Observations

  Item    x1    x2
  A        5     3
  B       -1     1
  C        1    -2
  D       -3    -2

Example 12.11 K-means Method
Coordinates of the initial centroids

  Cluster    x1                     x2
  (AB)       (5 + (-1))/2 = 2       (3 + 1)/2 = 2
  (CD)       (1 + (-3))/2 = -1      (-2 + (-2))/2 = -2

Example 12.11 K-means Method

Example 12.11 Final Clusters
Squared distances to group centroids

  Cluster    A     B     C     D
  A          0    40    41    89
  (BCD)     52     4     5     5

F Score

Normal Mixture Model

Likelihood
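The model behind these two slides, in its standard form: a K-component normal mixture with mixing proportions p_k summing to one, and the likelihood taken over the N observations:

```latex
f(\mathbf{x}) = \sum_{k=1}^{K} p_k \, N_p(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad \sum_{k=1}^{K} p_k = 1

L = \prod_{j=1}^{N} f(\mathbf{x}_j)
  = \prod_{j=1}^{N} \sum_{k=1}^{K} p_k \, N_p(\mathbf{x}_j \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)
```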

Statistical Approach

BIC for Special Structures

Software Package MCLUST
- Combines hierarchical clustering, the EM algorithm, and BIC
- In the E step of EM, a matrix is created whose jth row contains the estimates of the conditional probabilities that observation x_j belongs to cluster 1, 2, ..., K
- At convergence, x_j is assigned to the cluster k for which the conditional probability of membership is largest
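The E-step matrix described above can be sketched for one-dimensional data; this is a hand-rolled illustration, not MCLUST's actual code (the function names and the two-component setup are assumptions):

```python
import math

def normal_pdf(x, mu, sigma):
    # univariate normal density
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def e_step(data, comps):
    # comps: list of (weight, mean, sd) per cluster
    # returns the matrix whose jth row holds P(cluster k | x_j) for each k
    resp = []
    for x in data:
        dens = [w * normal_pdf(x, mu, s) for (w, mu, s) in comps]
        total = sum(dens)
        resp.append([v / total for v in dens])
    return resp

def hard_assign(resp):
    # at convergence, assign x_j to the cluster with the largest membership probability
    return [row.index(max(row)) for row in resp]

comps = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]   # two well-separated components
resp = e_step([0.1, 4.8], comps)
print(hard_assign(resp))
```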

Example 12.13 Clustering of Iris Data

Multidimensional Scaling (MDS)
- Displays (transformed) multivariate data in a low-dimensional space
- Different from plots based on principal components
- The primary objective is to "fit" the original data into a low-dimensional coordinate system so that the distortion caused by the reduction of dimensionality is minimized
- Distortion is measured on the similarities or dissimilarities among the data

Multidimensional Scaling
- Given a set of similarities (or distances) between every pair of N items
- Find a representation of the items in few dimensions
- such that the inter-item proximities "nearly" match the original similarities (or distances)

Non-metric and Metric MDS
- Non-metric MDS: uses only the rank orders of the N(N-1)/2 original similarities, not their magnitudes
- Metric MDS: the actual magnitudes of the original similarities are used; also known as principal coordinate analysis

Objective

Kruskal’s Stress

Takane’s Stress
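The two criteria in their standard forms, where the d are the inter-item distances in the q-dimensional configuration and the d-hat are the monotone-regressed reference numbers; Takane's version (often called SStress) works with squared distances:

```latex
\text{Stress}(q) =
\left[ \frac{\sum_{i<k} \left( d_{ik}^{(q)} - \hat d_{ik}^{(q)} \right)^2}
            {\sum_{i<k} \left( d_{ik}^{(q)} \right)^2} \right]^{1/2}

\text{SStress} =
\left[ \frac{\sum_{i<k} \left( d_{ik}^2 - \hat d_{ik}^2 \right)^2}
            {\sum_{i<k} d_{ik}^4} \right]^{1/2}
```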

Basic Algorithm
- Obtain and order the M = N(N-1)/2 pairs of similarities
- Try a configuration in q dimensions
- Determine the inter-item distances and reference numbers
- Minimize Kruskal's or Takane's stress
- Move the points around to obtain an improved configuration
- Repeat until the minimum stress is obtained
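A crude Python sketch of the configuration-improvement loop: it uses fixed target distances in place of the monotone-regressed reference numbers and greedy coordinate moves in place of a real gradient method, so everything here is an illustrative assumption rather than any package's algorithm:

```python
import math

def dist(x, i, j):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x[i], x[j])))

def stress(x, target):
    # target: {(i, j): desired distance}; Kruskal-style normalized stress,
    # with fixed target distances standing in for the reference numbers
    num = sum((dist(x, i, j) - t) ** 2 for (i, j), t in target.items())
    den = sum(dist(x, i, j) ** 2 for (i, j) in target)
    return math.sqrt(num / den)

def improve(x, target, step=0.05, iters=200):
    # greedy coordinate search: nudge each coordinate by +/- step if stress drops
    x = [list(p) for p in x]
    for _ in range(iters):
        for i in range(len(x)):
            for c in range(len(x[i])):
                base = stress(x, target)
                x[i][c] += step
                if stress(x, target) < base:
                    continue
                x[i][c] -= 2 * step
                if stress(x, target) < base:
                    continue
                x[i][c] += step   # neither direction helped; restore
    return x

# three items whose pairwise target distances form an equilateral triangle
target = {(0, 1): 2.0, (0, 2): 2.0, (1, 2): 2.0}
start = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
better = improve(start, target)
print(stress(start, target), stress(better, target))   # stress decreases
```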

Example 12.14 MDS of U.S. Cities

Example 12.15 MDS of Public Utilities

Example 12.16 MDS of Universities

Example 12.16 Metric MDS of Universities

Example 12.16 Non-metric MDS of Universities