Jianping Fan Dept of CS UNC-Charlotte


1 Spectral Clustering
Jianping Fan, Dept of CS, UNC-Charlotte

2 Key issues for Data Clustering
Similarity or distance function
Inter-cluster similarity or distance
Intra-cluster similarity or distance
Number of clusters K
Decision for data clustering
Objective function: inter-cluster distances are maximized; intra-cluster distances are minimized

3 Summary of K-means
Problems of K-means: locations of centers; number of clusters K; sensitivity to outliers; data manifolds (shapes of data distributions)
Experiences
Centers: random selection and density scan
K: start from a small K and split iteratively, or start from a large K and merge sequentially
Outliers:

4 Problems of K-means
Objective: inter-cluster distances are maximized; intra-cluster distances are minimized
Distance function: geometric distance
Optimization step:
Assignment step:
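
The formulas for the two steps did not survive in this transcript. As a rough sketch, assuming the slide refers to the usual Lloyd iteration with Euclidean distance, the assignment and optimization steps could look like the code below. The function name `kmeans`, its parameters, and the toy data are illustrative and not taken from the slides.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration with Euclidean (geometric) distance."""
    rng = np.random.default_rng(seed)
    # Random initial centers drawn from the data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Optimization step: each center moves to the mean of its points.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two compact, well-separated groups: the easy case for K-means.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centers = kmeans(X, k=2)
```

Because every point is compared only with the cluster centers, the result depends on where the centers start and on the clusters being roughly compact, which is the limitation the following slides discuss.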

5 Problems of K-means
The similarity function cannot handle special data manifolds effectively.
Intra-cluster similarity and inter-cluster similarity are not optimized jointly.
Pre-selected locations of cluster centers may not be acceptable.

6 K-Means Clustering
Expected vs. achieved clustering
Why does K-means fail?

7 Why K-Means Clustering Fails?
Expected vs. achieved clustering
Similarity or distance function
Inter-cluster similarity or distance
Intra-cluster similarity or distance
Number of clusters K
Decision for data clustering
Objective function

8 Why K-Means Clustering Fails?
Expected vs. achieved clustering
The number of clusters K may not be the issue here
Objective function?

9 Why K-Means Clustering Fails?
Expected vs. achieved clustering
Data manifold: relationship rather than distance
Distance function and decision for data clustering

10 Key issues for Data Clustering
Inter-cluster similarity or distance
Intra-cluster similarity or distance
Number of clusters K
Decision for data clustering
Similarity or distance function

11 Lecture Outline
Motivation
Graph overview and construction
Spectral clustering
Cool implementations

12 Spectral Clustering Example – 2 Spirals
The dataset exhibits complex cluster shapes.
K-means performs very poorly in this space due to its bias toward dense spherical clusters.
Relationship vs. geometric distance
In the embedded space given by the two leading eigenvectors, the clusters are trivial to separate.

13 Spectral Clustering
Similarity representation: relationship
Inter-cluster similarity
Intra-cluster similarity
Number of clusters K
Decision for clustering
Objective function

14 Graph-Based Similarity Representation: considering the data manifold
Geometric distance vs. relationship

15 Spectral Clustering Example
Why does K-means fail? Geometry vs. manifold

16 Graph-Based Similarity Representation
Distance vs. Relationship

17 Graph-Based Similarity Representation
Distance vs. Relationship

18 Graph-Based Similarity Representation
Distance vs. Relationship

19 Graph-Based Similarity Representation
Number of clusters matters

20 Lecture Outline
Motivation
Graph overview and construction
Spectral clustering
Cool implementations

21 Graph-based Representation of Data Similarity(Relationship)

22 Graph-based Representation of Data Similarity(Relationship)

23 Graph-based Representation of Data Relationship

24 Manifold (Shape of Data Distribution)

25 Graph-based Representation of Data Relationships
Manifold

26 Graph-based Representation of Data Relationships

27 Graph-based Representation of Data Relationships
How to generate such graph for data relationship representation?

28 Data Graph Construction
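
The construction details on this slide were not captured in the transcript. The sketch below shows two common ways to turn points into a weighted graph, offered as assumptions rather than as the slide's method: a fully connected Gaussian affinity with a user-chosen scale (the form used in the Ng, Jordan, and Weiss algorithm later in the deck), and a k-nearest-neighbor sparsification of it. The names `gaussian_affinity`, `knn_graph`, `sigma`, and `n_neighbors` are illustrative.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Fully connected graph: W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def knn_graph(X, n_neighbors=2, sigma=1.0):
    """Keep only the edges to each point's strongest n_neighbors affinities."""
    W = gaussian_affinity(X, sigma)
    n = len(X)
    # Column indices of the n_neighbors largest affinities in each row.
    nearest = np.argsort(-W, axis=1)[:, :n_neighbors]
    keep = np.zeros((n, n), dtype=bool)
    keep[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = True
    keep = keep | keep.T  # symmetrize: keep an edge if either endpoint chose it
    return np.where(keep, W, 0.0)

# Two groups of three points on a line, far apart from each other.
X = np.array([[0.0, 0.0], [0.2, 0.0], [0.4, 0.0],
              [5.0, 0.0], [5.2, 0.0], [5.4, 0.0]])
W = knn_graph(X, n_neighbors=2, sigma=0.5)
```

In the k-nearest-neighbor variant, edges follow local neighborhoods only, so two points are related through chains of nearby points rather than through their straight-line distance.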

29 Graph-based Representation of Data Relationships

30 Graph-based Representation of Data Relationships

32 Graph-based Representation of Data Relationships

33 Graph-based Representation of Data Relationships

34 Graph Cut

35 Lecture Outline
Motivation
Graph overview and construction
Spectral clustering: considering intra-cluster similarity and inter-cluster similarity jointly
Cool implementations

36 Key issues for Spectral Clustering
Relationship function for graph construction
Inter-cluster similarity or distance
Intra-cluster similarity or distance
Number of clusters K
Decision for data clustering
Objective function

37 How to Do Graph Partitioning?
Citation Group Identification

38 How to Do Graph Partitioning?
Social Group Identification

39 How to Do Graph Partitioning?
Hot Topic Detection

40 Graph-based Representation of Data Relationships

41 Intra-cluster similarity

42 Spectral Clustering
Graph cut
Intra-cluster similarity:
Inter-cluster similarity:

43 Spectral Clustering
Graph cut
Objective function for spectral clustering:
1. Maximize intra-cluster similarity
2. Minimize inter-cluster similarity

44 Objective Function for Spectral Clustering
Graph cut
Min

45 Spectral Clustering Graphcut
Clustering via Graph Cut on weak connection points: Minimize inter-cluster similarity

46 Inter-cluster similarity

47 Inter-cluster similarity

51 Graph-based Representation of Data Relationships

52 Graph Cut

57 Eigenvectors & Eigenvalues

60 Normalized Cut
A graph G(V, E) can be partitioned into two disjoint sets A and B.
The cut is defined as the total weight of the edges that connect A and B.
The optimal partition of the graph G is achieved by minimizing the cut: min cut(A, B)
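
The formula images on this slide were not captured in the transcript. As a sketch, assuming the slide follows the usual graph-cut formulation, and writing $w(u, v)$ for the weight of the edge between nodes $u$ and $v$ (the symbol is an assumption), the cut and its minimization can be written as:

```latex
\mathrm{cut}(A, B) = \sum_{u \in A,\; v \in B} w(u, v),
\qquad A \cup B = V, \quad A \cap B = \emptyset

\min_{A,\,B} \; \mathrm{cut}(A, B)
```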

61 Normalized Cut
Association between a partition set and the whole graph
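
The definitions on this slide were lost in extraction. The sketch below assumes the standard normalized-cut definitions: assoc(A, V) is the total weight from the nodes of A to all nodes, and Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V). The toy graph and the function names are illustrative. The example shows why the normalization matters: the plain cut prefers to split off a weakly attached single node, while the normalized cut prefers the balanced split.

```python
import numpy as np

def cut(W, a, b):
    """Total weight of edges running between node sets a and b."""
    return W[np.ix_(a, b)].sum()

def assoc(W, a):
    """Total weight from the nodes in a to every node in the graph."""
    return W[a, :].sum()

def ncut(W, a, b):
    """Normalized cut: the cut scaled by each side's association."""
    c = cut(W, a, b)
    return c / assoc(W, a) + c / assoc(W, b)

# Two triangles joined by a weak bridge (2-3), plus a dangling node 6.
W = np.zeros((7, 7))
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.3), (5, 6, 0.2)]
for i, j, w in edges:
    W[i, j] = W[j, i] = w

balanced = ([0, 1, 2], [3, 4, 5, 6])
isolating = ([6], [0, 1, 2, 3, 4, 5])

cut_balanced, cut_isolating = cut(W, *balanced), cut(W, *isolating)
ncut_balanced, ncut_isolating = ncut(W, *balanced), ncut(W, *isolating)
```

Here the isolating split has the smaller cut (0.2 against 0.3), but its normalized cut is larger than 1 because the single node has almost no association with the graph, so minimizing the normalized cut selects the balanced split.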

62 Normalized Cut

63 Normalized Cut

64 Normalized Cut

65 Normalized Cut
Normalized cut becomes
Normalized cut can be solved by the eigenvalue equation:
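
The equation itself is missing from the transcript. In the standard relaxation of the normalized cut, it is the generalized eigenproblem (D − W) y = λ D y, where W is the weight matrix and D the diagonal degree matrix. The sketch below assumes that form, solves it through the equivalent symmetric problem with NumPy, and splits the nodes by the sign of the second-smallest eigenvector. Thresholding at zero is one simple choice among several, and the function name is illustrative.

```python
import numpy as np

def ncut_bipartition(W):
    """Two-way split from the second-smallest solution of (D - W) y = lambda D y."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Equivalent symmetric problem: D^{-1/2} (D - W) D^{-1/2} z = lambda z,
    # with y = D^{-1/2} z.
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]            # second-smallest eigenvector
    return y > 0, eigvals

# Two triangles connected by one weak edge between nodes 2 and 3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

side, eigvals = ncut_bipartition(W)
```

The smallest eigenvalue is zero and its eigenvector carries no partition information, which is why the split is read from the second one. Which triangle ends up on the `True` side depends on the arbitrary sign of the eigenvector.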

66 Extending Binary Normalized Cut to Multi-Class

67 K-way Min-Max Cut
Intra-cluster similarity
Inter-cluster similarity
Decision function for spectral clustering: minimize inter-cluster similarity while maximizing intra-cluster similarity

68 Mathematical Description of Spectral Clustering
Refined decision function for spectral clustering We can further define:

69 Refined decision function for spectral clustering
This decision function can be solved as

70 Spectral Clustering Algorithm (Ng, Jordan, and Weiss)
Motivation
Given a set of points $S = \{s_1, \ldots, s_n\}$ in $\mathbb{R}^l$, we would like to cluster them into k subsets.

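71 Algorithm
Form the affinity matrix $A \in \mathbb{R}^{n \times n}$.
Define $A_{ij} = \exp(-\|s_i - s_j\|^2 / 2\sigma^2)$ if $i \neq j$, and $A_{ii} = 0$.
The scaling parameter $\sigma^2$ is chosen by the user.
Define $D$, a diagonal matrix whose $(i,i)$ element is the sum of $A$'s row $i$.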

72 Algorithm
Form the matrix $L = D^{-1/2} A D^{-1/2}$.
Find $x_1, x_2, \ldots, x_k$, the eigenvectors of $L$ with the k largest eigenvalues.
These form the columns of the new matrix $X$.
Note: the dimension has been reduced from n×n to n×k.

73 Algorithm
Form the matrix $Y$: renormalize each of $X$'s rows to have unit length, $Y_{ij} = X_{ij} / (\sum_j X_{ij}^2)^{1/2}$.
Treat each row of $Y$ as a point in $\mathbb{R}^k$.
Cluster the rows into k clusters via K-means.

74 Algorithm
Final cluster assignment
Assign point $s_i$ to cluster j iff row i of $Y$ was assigned to cluster j.
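
The steps on slides 71 to 74 can be sketched end to end with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the affinity uses the Gaussian form from the Ng, Jordan, and Weiss paper, the K-means stage uses a deterministic farthest-first start (a choice made here so the example is repeatable), and the names `njw_spectral_clustering` and `ring` and the two-ring data are hypothetical.

```python
import numpy as np

def njw_spectral_clustering(S, k, sigma, n_iter=100):
    # Affinity matrix A with zero diagonal, scale sigma chosen by the user.
    d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # L = D^{-1/2} A D^{-1/2}, where D holds the row sums of A.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # X: eigenvectors for the k largest eigenvalues (eigh sorts ascending).
    _, vecs = np.linalg.eigh(L)
    X = vecs[:, -k:]
    # Y: rows of X renormalized to unit length.
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    # K-means on the rows of Y, started from mutually distant rows.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    # Point s_i gets the cluster assigned to row i of Y.
    return labels

def ring(radius, n):
    """n evenly spaced points on a circle around the origin."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t)])

# Two concentric rings: a non-convex layout with a shared center.
S = np.vstack([ring(1.0, 20), ring(4.0, 40)])
labels = njw_spectral_clustering(S, k=2, sigma=0.5)
```

With this scale, affinities between neighbors on the same ring are large and affinities across rings are close to zero, so the rows of Y collapse to two nearly orthogonal directions and the final K-means step is easy. K-means applied directly to the coordinates cannot separate the rings, because with two centers its decision boundary is a straight line.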

75 Why?
If we eventually use K-means, why not just apply K-means to the original data?
This method allows us to cluster non-convex regions.

76 Some Examples

85 User’s Prerogative
Affinity matrix construction
Choice of scaling factor: realistically, search over $\sigma^2$ and pick the value that gives the tightest clusters
Choice of k, the number of clusters
Choice of clustering method
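
One way to read "tightest clusters" is the K-means distortion of the rows of Y: the sum of squared distances from each row to its cluster center. The sketch below searches a small grid of scales and keeps the one with the smallest distortion. It is a hedged illustration: the grid, the helper names (`embed`, `kmeans_distortion`, `pick_sigma`), the farthest-first K-means start, and the two-ring data are all assumptions made for the example.

```python
import numpy as np

def embed(S, k, sigma):
    """Unit-length rows of the top-k eigenvectors of D^{-1/2} A D^{-1/2}."""
    d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    s = 1.0 / np.sqrt(A.sum(axis=1))
    X = np.linalg.eigh(s[:, None] * A * s[None, :])[1][:, -k:]
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def kmeans_distortion(Y, k, n_iter=100):
    """K-means on the rows of Y; returns (labels, sum of squared distances)."""
    centers = [Y[0]]  # farthest-first start, an illustrative choice
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), float(d2.min(axis=1).sum())

def pick_sigma(S, k, sigmas):
    """Return the scale with the smallest distortion, plus all distortions."""
    distortions = [kmeans_distortion(embed(S, k, s), k)[1] for s in sigmas]
    return sigmas[int(np.argmin(distortions))], distortions

def ring(radius, n):
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t)])

S = np.vstack([ring(1.0, 20), ring(4.0, 40)])
sigmas = [0.25, 0.5, 1.0, 2.0]
best_sigma, distortions = pick_sigma(S, k=2, sigmas=sigmas)
```

The search only compares the candidate values it is given, so the grid should bracket the typical distance between neighboring points in the data.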

86 How to select k?
Eigengap: the difference between two consecutive eigenvalues.
The most stable clustering is generally given by the value of k that maximises the eigengap.
Example: largest eigenvalues of the Cisi/Medline data, λ1 and λ2. Choose k=2.
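
The expression on this slide was not captured in the transcript. The sketch below uses one common convention, stated here as an assumption: sort the eigenvalues of D^{-1/2} A D^{-1/2} in descending order and choose k as the number of eigenvalues that come before the largest drop. The function name, the `k_max` parameter, and the toy affinity matrix are illustrative.

```python
import numpy as np

def choose_k_by_eigengap(A, k_max=None):
    """Pick k as the count of leading eigenvalues before the largest gap."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)[::-1]  # descending order
    if k_max is None:
        k_max = len(eigvals) - 1
    # gaps[i] is the drop from the (i+1)-th to the (i+2)-th largest eigenvalue.
    gaps = eigvals[:k_max] - eigvals[1:k_max + 1]
    return int(np.argmax(gaps)) + 1, eigvals

# Three groups of four nodes: strong links inside a group, weak links across.
n, size, eps = 12, 4, 0.01
A = np.full((n, n), eps)
for start in range(0, n, size):
    A[start:start + size, start:start + size] = 1.0
np.fill_diagonal(A, 0.0)

k, eigvals = choose_k_by_eigengap(A)
```

For this matrix the three largest eigenvalues are close to 1 and the rest are negative, so the largest drop comes after the third eigenvalue and the heuristic returns k = 3.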

87 Recap – The bottom line

88 Summary
Spectral clustering can help us in hard clustering problems.
The technique is simple to understand.
The solution comes from solving a simple algebra problem which is not hard to implement.
Great care should be taken in choosing the “starting conditions”.

89 Problems for Spectral Clustering
Number of clusters K
Objective function optimization
Better similarity (relationship) functions

90 What’s Visual Analytics?
Initial Clustering Result & Visualization

91 What’s Visual Analytics?
Initial Clustering Result & Visualization
Similarity-preserving data projection: from the high-dimensional space used for data representation to a 2D space for visualization
Data layout
Mistakes induced by data projection

92 What’s Visual Analytics?
Human Advising via HCI

93 What’s Visual Analytics?
Computer Interpretation of Human Advice
Must-Link vs. Not-Link
Data Clustering with Constraints

