1 Spectral Clustering. Jianping Fan, Dept of Computer Science, UNC Charlotte

2 Lecture Outline: Motivation; Graph overview and construction; Spectral Clustering; Cool implementations

3 Semantic interpretations of clusters

4 Spectral Clustering Example – Two Spirals. The dataset exhibits complex cluster shapes, so K-means performs very poorly in this space due to its bias toward dense, spherical clusters. In the embedded space given by the two leading eigenvectors, the clusters are trivial to separate.
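
To make slide 4's claim concrete, here is a minimal sketch that builds a two-spirals dataset and runs both K-means and spectral clustering on it. The use of scikit-learn and all parameter values are illustrative assumptions, not choices from the slides.

```python
# Two-spirals demo: K-means vs. spectral clustering.
# A minimal sketch; scikit-learn and the parameter choices are assumptions.
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering

# Generate two interleaved spirals with a little noise.
rng = np.random.default_rng(0)
n = 200
theta = np.sqrt(rng.uniform(0, 1, n)) * 3 * np.pi
spiral1 = np.c_[theta * np.cos(theta), theta * np.sin(theta)]
spiral2 = -spiral1                      # second spiral, rotated 180 degrees
X = np.vstack([spiral1, spiral2]) + rng.normal(scale=0.1, size=(2 * n, 2))

# K-means: biased toward dense, spherical clusters, so it splits the plane
# roughly in half and mixes the two spirals.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering: embeds the points via the leading eigenvectors of an
# affinity graph, where the two spirals become easy to separate.
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(X)
```

Plotting `X` colored by `km_labels` versus `sc_labels` reproduces the qualitative picture described on slides 4 and 5.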

5 Spectral Clustering Example: original points vs. K-means (2 clusters). Why does K-means fail for these two examples? Geometry vs. manifold.

6 Lecture Outline: Motivation; Graph overview and construction; Spectral Clustering; Cool implementations

7 Graph-based Representation of Data Similarity

8 Similarity

9 Graph-based Representation of Data Relationships

10 Manifold

11 Graph-based Representation of Data Relationships: Manifold

12 Graph-based Representation of Data Relationships

13 Data Graph Construction

14 Graph-based Representation of Data Relationships

15 Graph-based Representation of Data Relationships

17 Graph-based Representation of Data Relationships

18 Graph-based Representation of Data Relationships

19 Graph Cut

20 Lecture Outline: Motivation; Graph overview and construction; Spectral Clustering; Cool implementations

21 Graph-based Representation of Data Relationships

23 Graph Cut

28 Graph-based Representation of Data Relationships

29 Graph Cut

34 Eigenvectors & Eigenvalues
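
As a reminder of the operation the following slides rely on, here is a tiny numpy sketch of an eigen-decomposition; the example matrix is an arbitrary choice.

```python
# Eigenvalues and eigenvectors of a small symmetric matrix with numpy.
# Spectral clustering rests on this operation: M v = lambda v.
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # an arbitrary symmetric example

# eigh handles symmetric matrices; eigenvalues return in ascending order,
# with the matching eigenvectors as the columns of V.
vals, V = np.linalg.eigh(M)
print(vals)                   # [1. 3.]
print(V[:, -1])               # eigenvector of the largest eigenvalue
assert np.allclose(M @ V[:, -1], vals[-1] * V[:, -1])
```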

37 Normalized Cut. A graph G(V, E) can be partitioned into two disjoint sets A, B with $A \cup B = V$ and $A \cap B = \emptyset$. The cut is defined as $\mathrm{cut}(A, B) = \sum_{u \in A, v \in B} w(u, v)$, and the optimal partition of G is the one that minimizes $\mathrm{cut}(A, B)$.
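
A small numpy sketch of this definition; the toy weight matrix is an invented example.

```python
# cut(A, B): total weight of the edges crossing the partition.
import numpy as np

def cut_value(W, in_A):
    """W: symmetric (n, n) weight matrix; in_A: boolean mask of set A."""
    return W[np.ix_(in_A, ~in_A)].sum()

# Two tight pairs {0, 1} and {2, 3} joined by a single weak edge 0-3.
W = np.array([[0, 5, 0, 1],
              [5, 0, 0, 0],
              [0, 0, 0, 5],
              [1, 0, 5, 0]], dtype=float)
print(cut_value(W, np.array([True, True, False, False])))  # 1.0
```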

38 Normalized Cut. The association between a partition set and the whole graph is $\mathrm{assoc}(A, V) = \sum_{u \in A, t \in V} w(u, t)$, the total connection from nodes in A to all nodes in the graph.

39 Normalized Cut: $\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}$

40 Normalized Cut

41 Normalized Cut

42 Normalized Cut. With an indicator vector $y$ over the nodes, the normalized cut objective becomes $\min_y \frac{y^T (D - W) y}{y^T D y}$, and the relaxed normalized cut can be solved by the generalized eigenvalue equation $(D - W) y = \lambda D y$.
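
A sketch of solving this generalized eigenvalue equation with scipy; thresholding the second-smallest eigenvector at zero is one common convention for reading off the bipartition, and the toy graph is an invented example.

```python
# Solve the relaxed normalized cut via (D - W) y = lambda D y.
import numpy as np
from scipy.linalg import eigh

# Toy graph: edges 0-1 and 2-3 are strong (5), edges 0-3 and 1-2 weak (1).
W = np.array([[0, 5, 0, 1],
              [5, 0, 1, 0],
              [0, 1, 0, 5],
              [1, 0, 5, 0]], dtype=float)
D = np.diag(W.sum(axis=1))

# eigh(A, B) solves the generalized symmetric problem A y = lambda B y,
# returning eigenvalues in ascending order.
vals, vecs = eigh(D - W, D)
fiedler = vecs[:, 1]            # eigenvector of the second-smallest value
print(fiedler > 0)              # sign pattern splits {0, 1} from {2, 3}
```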

43 K-way Min-Max Cut: maximize the intra-cluster similarity while minimizing the inter-cluster similarity; this trade-off gives the decision function for spectral clustering.

44 Mathematical Description of Spectral Clustering. Refined decision function for spectral clustering. We can further define:

45 Refined decision function for spectral clustering. This decision function can be solved as an eigenvalue problem.

46 Spectral Clustering Algorithm (Ng, Jordan, and Weiss). Motivation: given a set of points $S = \{s_1, \dots, s_n\}$, we would like to cluster them into $k$ subsets.

47 Algorithm. Form the affinity matrix $A$: define $A_{ij} = \exp(-\|s_i - s_j\|^2 / 2\sigma^2)$ if $i \neq j$ and $A_{ii} = 0$, where the scaling parameter $\sigma$ is chosen by the user. Define $D$ as the diagonal matrix whose $(i, i)$ element is the sum of $A$'s row $i$.

48 Algorithm. Form the matrix $L = D^{-1/2} A D^{-1/2}$. Find $x_1, \dots, x_k$, the eigenvectors of $L$ corresponding to its $k$ largest eigenvalues; these form the columns of the new matrix $X$. Note: we have reduced the dimension from $n \times n$ to $n \times k$.

49 Algorithm. Form the matrix $Y$ by renormalizing each of $X$'s rows to have unit length: $Y_{ij} = X_{ij} / (\sum_j X_{ij}^2)^{1/2}$. Treat each row of $Y$ as a point in $\mathbb{R}^k$ and cluster the rows into $k$ clusters via K-means.

50 Algorithm. Final cluster assignment: assign point $s_i$ to cluster $j$ iff row $i$ of $Y$ was assigned to cluster $j$.
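
Slides 46-50 assemble into a short implementation. Below is a minimal numpy/scikit-learn sketch of the procedure as stated; the default $\sigma$, the K-means settings, and the function name are illustrative assumptions.

```python
# Ng-Jordan-Weiss spectral clustering, following slides 46-50.
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def njw_spectral_clustering(S, k, sigma=1.0):
    # Slide 47: affinity A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)), A_ii = 0.
    A = np.exp(-cdist(S, S, "sqeuclidean") / (2.0 * sigma**2))
    np.fill_diagonal(A, 0.0)

    # Slide 47: D is diagonal with the row sums of A.
    # Slide 48: L = D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * np.outer(d_inv_sqrt, d_inv_sqrt)

    # Slide 48: X holds the k largest eigenvectors of L as columns (n x k).
    _, vecs = eigh(L)          # symmetric solver, ascending eigenvalues
    X = vecs[:, -k:]

    # Slide 49: Y renormalizes each row of X to unit length.
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)

    # Slides 49-50: K-means on the rows of Y; s_i joins the cluster of row i.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)
```

Run as `labels = njw_spectral_clustering(X, k=2, sigma=0.5)` on the two-spirals data from the earlier sketch; in practice $\sigma$ would be tuned, as slide 61 notes.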

51 Why? If we eventually use K-means, why not just apply K-means to the original data? This method allows us to cluster non-convex regions.

52 Some Examples

61 User's Prerogative: affinity matrix construction; choice of the scaling factor $\sigma$ (realistically, search over $\sigma$ and pick the value that gives the tightest clusters); choice of $k$, the number of clusters; choice of clustering method.
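
One way to make the $\sigma$ search concrete, as a sketch: recompute the embedding for each candidate $\sigma$ and keep the value whose K-means clusters on the rows of $Y$ are tightest. Measuring tightness by K-means inertia, the candidate grid, and the helper names are all illustrative assumptions.

```python
# Sigma search: keep the sigma whose embedded-space clusters are tightest.
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def embedding_inertia(S, k, sigma):
    # Same NJW embedding as in the sketch after slide 50.
    A = np.exp(-cdist(S, S, "sqeuclidean") / (2.0 * sigma**2))
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    _, vecs = eigh(A * np.outer(d_inv_sqrt, d_inv_sqrt))
    Y = vecs[:, -k:]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Lower inertia = tighter clusters in the embedded space.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(Y).inertia_

def pick_sigma(S, k, candidates=(0.05, 0.1, 0.2, 0.5, 1.0, 2.0)):
    return min(candidates, key=lambda s: embedding_inertia(S, k, s))
```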

62 How to select $k$? Eigengap: the difference between two consecutive eigenvalues. The most stable clustering is generally given by the value of $k$ that maximizes the eigengap $\Delta_k = |\lambda_k - \lambda_{k+1}|$. For the largest eigenvalues of the Cisi/Medline data, the eigengap is maximized at $k = 2$, so choose $k = 2$.
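
A minimal sketch of the eigengap rule; it assumes the symmetric matrix $L$ from slide 48 has already been built, and the function name is illustrative.

```python
# Eigengap heuristic for choosing k: pick the k that maximizes
# |lambda_k - lambda_{k+1}| over the eigenvalues of L, largest first.
import numpy as np

def pick_k_by_eigengap(L, k_max=10):
    lam = np.sort(np.linalg.eigvalsh(L))[::-1]   # descending eigenvalues
    k_max = min(k_max, len(lam) - 1)
    gaps = np.abs(lam[:k_max] - lam[1:k_max + 1])
    # gaps[i] = |lambda_{i+1} - lambda_{i+2}| in 1-based terms, so k = i + 1.
    return int(np.argmax(gaps)) + 1
```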

63 Recap – The bottom line

64 Summary. Spectral clustering can help us with hard clustering problems. The technique is simple to understand, and the solution comes from solving a simple linear-algebra problem that is not hard to implement. Great care should be taken in choosing the “starting conditions”.

65 Spectral Clustering
