1
Spectral Clustering
Jianping Fan, Dept. of CS, UNC-Charlotte
2
Key issues for Data Clustering
Similarity or distance function; inter-cluster similarity or distance; intra-cluster similarity or distance; number of clusters K; decision for data clustering. Objective function: inter-cluster distances are maximized and intra-cluster distances are minimized.
3
Summary of K-means
Problems of K-means: locations of centers; number of clusters K; sensitivity to outliers; data manifolds (shapes of data distributions). Experiences: for centers, random and density scan; for K, start from a small K and split iteratively, or start from a large K and merge sequentially; for outliers, [missing in source].
4
Problems of K-means
Objective: inter-cluster distances are maximized and intra-cluster distances are minimized. Distance function: geometric distance. Assignment step: assign each point to its nearest cluster center. Optimization step: move each center to the mean of the points assigned to it.
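The assignment and optimization steps named on this slide alternate until the centers stop moving. The following is a minimal pure-Python sketch, assuming Euclidean distance and caller-supplied initial centers; the function name `kmeans` and the toy points are illustrative and do not come from the slides.

```python
import math


def kmeans(points, centers, n_iter=20):
    """Alternate the assignment and optimization steps (Lloyd's iteration)."""
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Optimization step: each center moves to the mean of its points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers


# Toy data (hypothetical): two well-separated blobs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (5.0, 5.0)])
print(labels, centers)
```

Because every point is assigned by distance to a center, the result depends on where the centers start and on the clusters being roughly compact, which is the weakness the following slides illustrate.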
5
Problems of K-means
The similarity function cannot handle special data manifolds effectively! Intra-cluster similarity and inter-cluster similarity are not optimized jointly! Pre-selected locations of cluster centers may not be acceptable!
6
K-Means Clustering
[Figure: expected vs. achieved clustering] Why does K-means fail?
7
Why K-Means Clustering Fails?
[Figure: expected vs. achieved clustering] Similarity or distance function; inter-cluster similarity or distance; intra-cluster similarity or distance; number of clusters K; decision for data clustering; objective function.
8
Why K-Means Clustering Fails?
[Figure: achieved vs. expected clustering] The number of clusters K may not be the issue here. Is it the objective function?
9
Why K-Means Clustering Fails?
[Figure: expected vs. achieved clustering] Data manifold: relationship rather than distance. Distance function and decision for data clustering.
10
Key issues for Data Clustering
Inter-cluster similarity or distance; intra-cluster similarity or distance; number of clusters K; decision for data clustering; similarity or distance function.
11
Lecture Outline
Motivation; graph overview and construction; spectral clustering; cool implementations
12
Spectral Clustering Example – 2 Spirals
The dataset exhibits complex cluster shapes. K-means performs very poorly in this space due to its bias toward dense spherical clusters. Relationship vs. geometric distance: in the embedded space given by the two leading eigenvectors, the clusters are trivial to separate.
13
Spectral Clustering
Relationship (similarity) representation; inter-cluster similarity; intra-cluster similarity; number of clusters K; decision for clustering; relationship objective function.
14
Graph-Based Similarity Representation: considering the data manifold
Geometric distance vs. relationship
15
Spectral Clustering Example
Why k-means fails? Geometry vs. Manifold
16
Graph-Based Similarity Representation
Distance vs. Relationship
17
Graph-Based Similarity Representation
Distance vs. Relationship
18
Graph-Based Similarity Representation
Distance vs. Relationship
19
Graph-Based Similarity Representation
Number of clusters matters
20
Lecture Outline
Motivation; graph overview and construction; spectral clustering; cool implementations
21
Graph-based Representation of Data Similarity (Relationship)
22
Graph-based Representation of Data Similarity (Relationship)
23
Graph-based Representation of Data Relationship
24
Manifold (Shape of Data Distribution)
25
Graph-based Representation of Data Relationships
Manifold
26
Graph-based Representation of Data Relationships
27
Graph-based Representation of Data Relationships
How to generate such graph for data relationship representation?
28
Data Graph Construction
29
Graph-based Representation of Data Relationships
30
Graph-based Representation of Data Relationships
32
Graph-based Representation of Data Relationships
33
Graph-based Representation of Data Relationships
34
Graph Cut
35
Lecture Outline
Motivation; graph overview and construction; spectral clustering (considering intra-cluster similarity and inter-cluster similarity jointly!); cool implementations
36
Key issues for Spectral Clustering
Relationship function for graph construction; inter-cluster similarity or distance; intra-cluster similarity or distance; number of clusters K; decision for data clustering; objective function.
37
How to Do Graph Partitioning?
Citation Group Identification
38
How to Do Graph Partitioning?
Social Group Identification
39
How to Do Graph Partitioning?
Hot Topic Detection
40
Graph-based Representation of Data Relationships
41
Intra-cluster similarity
42
Spectral Clustering cut Intra-Cluster Similarity:
Inter-Cluster Similarity:
43
Spectral Clustering Graphcut
Objective function for spectral clustering: 1. Maximize intra-cluster similarity. 2. Minimize inter-cluster similarity.
44
Objective Function for Spectral Clustering
Graphcut Objective Function for Spectral Clustering Min
45
Spectral Clustering Graphcut
Clustering via Graph Cut on weak connection points: Minimize inter-cluster similarity
46
Inter-cluster similarity
47
Inter-cluster similarity
51
Graph-based Representation of Data Relationships
52
Graph Cut
57
Eigenvectors & Eigenvalues
60
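Normalized Cut
A graph $G = (V, E)$ can be partitioned into two disjoint sets $A$ and $B$, with $A \cup B = V$ and $A \cap B = \emptyset$. The cut is defined as $\mathrm{cut}(A, B) = \sum_{u \in A,\, v \in B} w(u, v)$, where $w(u, v)$ is the weight (similarity) of the edge between nodes $u$ and $v$. The optimal partition of the graph $G$ is achieved by minimizing the cut: $\min_{A, B} \mathrm{cut}(A, B)$.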
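To make the cut concrete, here is a small sketch that evaluates the cut for every bipartition of a toy graph and keeps the smallest. It assumes the standard definition of the cut as the total weight of edges crossing from A to B; the helper names and the six-node graph (two triangles joined by one weak edge) are invented for illustration, and brute force is only feasible for tiny graphs.

```python
from itertools import combinations


def build_weights(n, edges):
    """Symmetric weight matrix W from (u, v, weight) triples."""
    W = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        W[u][v] = W[v][u] = w
    return W


def cut_value(W, A, B):
    """Total weight of the edges that cross from set A to set B."""
    return sum(W[u][v] for u in A for v in B)


def min_cut_bruteforce(W):
    """Try every bipartition (A, B) and return the one with the smallest cut."""
    nodes = range(len(W))
    best = None
    for size in range(1, len(W) // 2 + 1):
        for A in combinations(nodes, size):
            B = tuple(v for v in nodes if v not in A)
            c = cut_value(W, A, B)
            if best is None or c < best[0]:
                best = (c, A, B)
    return best


# Hypothetical graph: triangles {0,1,2} and {3,4,5}, weak link 2-3.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
W = build_weights(6, edges)
best_cut, A, B = min_cut_bruteforce(W)
print(best_cut, A, B)
```

On this graph the minimum cut removes only the weak link, which is the "cut on weak connection points" idea from the earlier slides.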
61
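Normalized Cut
Normalized cut: $\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}$. Association between a partition set and the whole graph: $\mathrm{assoc}(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$, where $w(u, t)$ is the edge weight between nodes $u$ and $t$.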
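The sketch below compares the plain cut with the normalized cut on a toy graph. It assumes the usual form Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), where assoc(A, V) sums the weights from nodes in A to all nodes in the graph; the function names and the five-node graph are made up for the example.

```python
def build_weights(n, edges):
    """Symmetric weight matrix W from (u, v, weight) triples."""
    W = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        W[u][v] = W[v][u] = w
    return W


def cut_value(W, A, B):
    """Total weight of the edges that cross from set A to set B."""
    return sum(W[u][v] for u in A for v in B)


def assoc(W, A):
    """Association of A with the whole graph: weights from A to every node."""
    return sum(W[u][t] for u in A for t in range(len(W)))


def ncut(W, A, B):
    """Normalized cut: the cut measured relative to each side's association."""
    c = cut_value(W, A, B)
    return c / assoc(W, A) + c / assoc(W, B)


# Hypothetical graph: pairs {0,1} and {2,3} joined by weight 0.3,
# plus node 4 hanging off node 3 by a weaker edge of weight 0.2.
edges = [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 0.3), (3, 4, 0.2)]
W = build_weights(5, edges)
lonely = ((4,), (0, 1, 2, 3))    # isolates the loosely attached node
balanced = ((0, 1), (2, 3, 4))   # splits the graph into two groups

for name, (A, B) in [("isolate node 4", lonely), ("balanced split", balanced)]:
    print(name, "cut =", cut_value(W, A, B), "Ncut =", round(ncut(W, A, B), 4))
```

Here the plain cut prefers to slice off the single weakly attached node, while the normalized cut prefers the balanced split, because dividing by the association penalizes very small partitions.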
62
Normalized Cut
63
Normalized Cut
64
Normalized Cut
65
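Normalized Cut
Normalized cut becomes $\min_{y} \frac{y^{T} (D - W)\, y}{y^{T} D\, y}$ subject to $y^{T} D \mathbf{1} = 0$, where $W$ is the weight matrix with $W_{ij} = w(i, j)$, $D$ is the diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$, and $y$ is the partition indicator vector.
Relaxing $y$ to real values, normalized cut can be solved by the generalized eigenvalue equation: $(D - W)\, y = \lambda D\, y$. The eigenvector with the second-smallest eigenvalue gives the bipartition.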
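A sketch of the eigenvalue route follows. The slide's equation did not survive extraction, so this assumes the commonly cited Shi and Malik form (D - W) y = λ D y, solved here through the equivalent symmetric problem so that NumPy's `numpy.linalg.eigh` can be used. Thresholding the eigenvector at zero is one simple choice among several; the function name and the toy graph are illustrative.

```python
import numpy as np


def ncut_bipartition(W):
    """Relaxed normalized cut via (D - W) y = lambda * D y.

    Assumes W is symmetric with strictly positive row sums.
    """
    d = W.sum(axis=1)                      # node degrees, the diagonal of D
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Equivalent symmetric problem: D^{-1/2} (D - W) D^{-1/2} z = lambda z,
    # with y recovered as y = D^{-1/2} z.
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]             # second-smallest eigenvalue
    return y > 0, eigvals[1]                   # split at zero (an assumption)


# Hypothetical graph: triangles {0,1,2} and {3,4,5}, weak link 2-3.
W = np.zeros((6, 6))
for u, v, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1)]:
    W[u, v] = W[v, u] = w

side, lam = ncut_bipartition(W)
print(side, lam)
```

The sign of an eigenvector is arbitrary, so which triangle ends up labeled `True` can vary; only the grouping is meaningful.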
66
Extending Binary Normalized Cut to Multi-Class
67
K-way Min-Max Cut
Intra-cluster similarity; inter-cluster similarity. Decision function for spectral clustering: minimize inter-cluster similarity while maximizing intra-cluster similarity.
68
Mathematical Description of Spectral Clustering
Refined decision function for spectral clustering We can further define:
69
Refined decision function for spectral clustering
This decision function can be solved as
70
Spectral Clustering Algorithm: Ng, Jordan, and Weiss
Motivation: given a set of points $S = \{s_1, \ldots, s_n\}$ in $\mathbb{R}^{l}$, we would like to cluster them into $k$ subsets.
71
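Algorithm
Form the affinity matrix $A \in \mathbb{R}^{n \times n}$. Define $A_{ij} = \exp\left(-\|s_i - s_j\|^{2} / (2\sigma^{2})\right)$ if $i \neq j$, and $A_{ii} = 0$. The scaling parameter $\sigma$ is chosen by the user. Define $D$ as the diagonal matrix whose $(i, i)$ element is the sum of $A$'s row $i$.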
72
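Algorithm
Form the matrix $L = D^{-1/2} A D^{-1/2}$. Find $x_1, x_2, \ldots, x_k$, the $k$ largest eigenvectors of $L$ (those with the $k$ largest eigenvalues). These form the columns of the new matrix $X \in \mathbb{R}^{n \times k}$. Note: the dimension has been reduced from $n \times n$ to $n \times k$.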
73
Algorithm
Form the matrix $Y$ by renormalizing each of $X$'s rows to have unit length: $Y_{ij} = X_{ij} / \left(\sum_{j} X_{ij}^{2}\right)^{1/2}$. Treat each row of $Y$ as a point in $\mathbb{R}^{k}$. Cluster the rows into $k$ clusters via K-means.
74
Algorithm: Final Cluster Assignment
Assign the original point $s_i$ to cluster $j$ iff row $i$ of $Y$ was assigned to cluster $j$.
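The steps on the preceding "Algorithm" slides can be sketched end to end with NumPy. This is an illustration under stated assumptions, not the authors' code: the affinity is taken to be the Gaussian kernel of the Ng, Jordan, and Weiss paper with a zero diagonal, and the small K-means helper with farthest-point initialization is my own choice, since the slides say only "K-means". All function names and the two-ring dataset are hypothetical.

```python
import numpy as np


def kmeans(Y, k, n_iter=50):
    """Plain K-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def njw_spectral_clustering(S, k, sigma):
    # Affinity matrix: Gaussian kernel on pairwise distances, zero diagonal.
    sq_dists = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # D holds the row sums of A; L = D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # X: eigenvectors of the k largest eigenvalues (eigh sorts ascending).
    _, eigvecs = np.linalg.eigh(L)
    X = eigvecs[:, -k:]
    # Y: rows of X rescaled to unit length.
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Cluster the rows of Y; point i inherits the label of row i.
    return kmeans(Y, k)


def ring(radius, n):
    """n evenly spaced points on a circle of the given radius."""
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])


# Two concentric rings: a non-convex layout of the kind the slides discuss.
S = np.vstack([ring(1.0, 40), ring(4.0, 40)])
labels = njw_spectral_clustering(S, k=2, sigma=0.5)
print(labels)
```

The value `sigma=0.5` was picked by hand for this toy data so that neighbors on the same ring are strongly connected while the two rings are almost disconnected; on real data it needs tuning.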
75
Why?
If we eventually use K-means, why not just apply K-means to the original data? This method allows us to cluster non-convex regions.
76
Some Examples
85
User’s Prerogative
Affinity matrix construction. Choice of scaling factor $\sigma$: realistically, search over $\sigma^{2}$ and pick the value that gives the tightest clusters. Choice of $k$, the number of clusters. Choice of clustering method.
86
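How to select k?
Eigengap: the difference between two consecutive eigenvalues. The most stable clustering is generally given by the value $k$ that maximises the expression $\Delta_k = |\lambda_k - \lambda_{k-1}|$. [Figure: largest eigenvalues of Cisi/Medline data] The largest gap is between $\lambda_1$ and $\lambda_2$, so choose $k = 2$.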
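The eigengap rule can be sketched in a few lines. The slide's formula was lost in extraction, so this assumes the gap is indexed as Δ_k = |λ_k − λ_(k−1)| over eigenvalues sorted in decreasing order, which is the reading consistent with the slide's "λ1 λ2, choose k=2" example; other references index the gap as λ_k − λ_(k+1), which shifts the answer by one. The function name and the eigenvalues below are made up and are not the Cisi/Medline values.

```python
def choose_k_by_eigengap(eigvals):
    """Return the k >= 2 that maximizes |lambda_k - lambda_(k-1)|.

    Eigenvalues are sorted in decreasing order and indexed from 1,
    so lambda_k is lam[k - 1] in Python's 0-based indexing.
    """
    lam = sorted(eigvals, reverse=True)
    gaps = {k: abs(lam[k - 1] - lam[k - 2]) for k in range(2, len(lam) + 1)}
    return max(gaps, key=gaps.get)


# Illustrative eigenvalues (hypothetical): one large drop after lambda_1.
eigvals = [1.0, 0.4, 0.35, 0.3]
print(choose_k_by_eigengap(eigvals))
```

With these values the largest gap lies between the first and second eigenvalues, so the function returns 2 under the assumed indexing.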
87
Recap – The bottom line
88
Summary
Spectral clustering can help us in hard clustering problems. The technique is simple to understand. The solution comes from solving a simple algebra problem that is not hard to implement. Great care should be taken in choosing the “starting conditions”.
89
Problems for Spectral Clustering
Number of clusters K; objective function optimization; better similarity (relationship) functions.
90
What’s Visual Analytics?
Initial Clustering Result & Visualization
91
What’s Visual Analytics?
Initial clustering result & visualization. Similarity-preserving data projection: from the high-dimensional space used for data representation to a 2D space for visualization. Data layout. Mistakes induced by data projection.
92
What’s Visual Analytics?
Human Advising via HCI
93
What’s Visual Analytics?
Computer interpretation of human advice. Must-link vs. not-link. Data clustering with constraints.