Slide 1: Partitioning Algorithms: Basic Concepts
- Partition n objects into k clusters
- Optimize the chosen partitioning criterion
- Example: minimize the squared error
  - Squared error of a cluster: Error(C_i) = sum over p in C_i of |d(p, m_i)|^2, where m_i is the mean (centroid) of C_i
  - Squared error of a clustering: E = sum over i = 1..k of Error(C_i)
Slide 2: Example of the Squared Error of a Cluster
(Figure: points P1, P2, P3 and centroid m_i plotted on a 10 x 10 grid.)
- C_i = {P1, P2, P3}, with P1 = (3, 7), P2 = (2, 3), P3 = (7, 5)
- Centroid: m_i = (4, 5)
- |d(P1, m_i)|^2 = (3-4)^2 + (7-5)^2 = 5
- |d(P2, m_i)|^2 = (2-4)^2 + (3-5)^2 = 8
- |d(P3, m_i)|^2 = (7-4)^2 + (5-5)^2 = 9
- Error(C_i) = 5 + 8 + 9 = 22
Slide 3: Example of the Squared Error of a Cluster
(Figure: points P4, P5, P6 and centroid m_j plotted on the same 10 x 10 grid.)
- C_j = {P4, P5, P6}, with P4 = (4, 6), P5 = (5, 5), P6 = (3, 4)
- Centroid: m_j = (4, 5)
- |d(P4, m_j)|^2 = (4-4)^2 + (6-5)^2 = 1
- |d(P5, m_j)|^2 = (5-4)^2 + (5-5)^2 = 1
- |d(P6, m_j)|^2 = (3-4)^2 + (4-5)^2 = 2
- Error(C_j) = 1 + 1 + 2 = 4, much smaller than Error(C_i) = 22 for the looser cluster of slide 2
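A minimal sketch (assuming NumPy and Euclidean distance) that recomputes the squared errors of the two example clusters above; the point values come from slides 2 and 3.

```python
import numpy as np

def squared_error(points, centroid):
    """Sum of squared Euclidean distances from each point to the centroid."""
    pts = np.asarray(points, dtype=float)
    return float(((pts - np.asarray(centroid, dtype=float)) ** 2).sum())

# Cluster C_i from slide 2
C_i = [(3, 7), (2, 3), (7, 5)]
m_i = np.mean(C_i, axis=0)          # -> (4, 5)
print(squared_error(C_i, m_i))      # -> 22.0

# Cluster C_j from slide 3
C_j = [(4, 6), (5, 5), (3, 4)]
m_j = np.mean(C_j, axis=0)          # -> (4, 5)
print(squared_error(C_j, m_j))      # -> 4.0
```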
Slide 4: Partitioning Algorithms: Basic Concepts
- Global optimum: examine all possible partitions
  - Roughly k^n possible partitions, too expensive!
- Heuristic methods: k-means and k-medoids
  - k-means (MacQueen '67): each cluster is represented by the center (mean) of the cluster
  - k-medoids (Kaufman & Rousseeuw '87): each cluster is represented by one of the objects (the medoid) in the cluster
Slide 5: K-means
- Initialization: arbitrarily choose k objects as the initial cluster centers (centroids)
- Iterate until no change:
  - For each object O_i:
    - Calculate the distances between O_i and the k centroids
    - (Re)assign O_i to the cluster whose centroid is closest to O_i
  - Update the cluster centroids based on the current assignment
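A minimal k-means sketch of the loop above, assuming NumPy and Euclidean distance; the function and variable names are illustrative, not taken from the slides.

```python
import numpy as np

def kmeans(X, init_centroids, max_iter=100):
    """Plain k-means: assign each object to its nearest centroid, then recompute the means."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float)
    labels = None
    for _ in range(max_iter):
        # Distance of every object to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no object changed cluster: converged
        labels = new_labels
        # Recompute each centroid as the mean of its current members
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```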
Slide 6: The k-Means Clustering Method
(Figure: current clusters with their cluster means; objects relocated to the nearest mean form the new clusters.)
Slide 7: Example
- For simplicity, one-dimensional objects and k = 2. Objects: 1, 2, 5, 6, 7
- K-means: randomly select 5 and 6 as the initial centroids
  - => Two clusters {1, 2, 5} and {6, 7}; mean(C1) = 8/3, mean(C2) = 6.5
  - => {1, 2} and {5, 6, 7}; mean(C1) = 1.5, mean(C2) = 6 => no change
- Aggregate dissimilarity = 0.5^2 + 0.5^2 + 1^2 + 1^2 = 2.5 (the object 6 sits exactly on its centroid and contributes 0)
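As a cross-check, the same run can be reproduced with scikit-learn's KMeans (assuming scikit-learn is installed), seeding it with the initial centroids 5 and 6 from the slide; inertia_ is scikit-learn's name for the aggregate dissimilarity.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.], [2.], [5.], [6.], [7.]])
km = KMeans(n_clusters=2, init=np.array([[5.], [6.]]), n_init=1).fit(X)

print(km.cluster_centers_.ravel())  # -> [1.5, 6.0]
print(km.labels_)                   # 1 and 2 in one cluster; 5, 6, 7 in the other
print(km.inertia_)                  # -> 2.5, the aggregate dissimilarity from the slide
```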
Slide 8: Variations of the k-Means Method
- Aspects in which variants of k-means differ:
  - Selection of the initial k centroids (e.g., choose the k farthest points)
  - Dissimilarity calculations (e.g., use Manhattan distance)
  - Strategies to calculate cluster means (e.g., update the means incrementally)
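One way to read "choose the k farthest points" is a farthest-first traversal; the sketch below (assuming NumPy) implements that heuristic, which may not be the exact variant the slide has in mind.

```python
import numpy as np

def farthest_first_init(X, k, seed=0):
    """Pick k initial centroids: a random first point, then repeatedly the point
    whose distance to its nearest already-chosen centroid is largest."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d_to_nearest = np.min(
            np.linalg.norm(X[:, None, :] - np.asarray(centroids)[None, :, :], axis=2),
            axis=1,
        )
        centroids.append(X[d_to_nearest.argmax()])
    return np.asarray(centroids)
```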
Slide 9: Strengths of the k-Means Method
- Relatively efficient for large data sets
  - O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations; normally k, t << n
- Often terminates at a local optimum
  - The global optimum may be found using techniques such as deterministic annealing and genetic algorithms
Slide 10: Weaknesses of the k-Means Method
- Applicable only when the mean is defined; what about categorical data?
  - Remedy: the k-modes algorithm
- Unable to handle noisy data and outliers
  - Remedy: the k-medoids algorithm
- Need to specify k, the number of clusters, in advance
  - Remedy: hierarchical algorithms, density-based algorithms
Slide 11: The k-modes Algorithm
- Handling categorical data: k-modes (Huang '98)
- Replaces the means of clusters with modes
  - Given n records in a cluster, the mode is the record made up of the most frequent attribute values
  - In the example cluster, mode = (<=30, medium, yes, fair)
- Uses new dissimilarity measures to deal with categorical objects
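A small sketch of computing the mode record of a categorical cluster, attribute by attribute; the records below are made up for illustration and merely reproduce the mode quoted on the slide.

```python
from collections import Counter

def cluster_mode(records):
    """Mode of a categorical cluster: for each attribute, take its most frequent value."""
    n_attrs = len(records[0])
    return tuple(
        Counter(rec[i] for rec in records).most_common(1)[0][0]
        for i in range(n_attrs)
    )

# Hypothetical cluster of (age, income, student, credit_rating) records
cluster = [
    ("<=30",   "medium", "yes", "fair"),
    ("<=30",   "high",   "yes", "fair"),
    ("31..40", "medium", "yes", "excellent"),
    ("<=30",   "medium", "no",  "fair"),
]
print(cluster_mode(cluster))  # -> ('<=30', 'medium', 'yes', 'fair')
```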
Slide 12: A Problem of k-Means
- Sensitive to outliers
  - Outlier: an object with extremely large (or small) values
  - May substantially distort the distribution of the data
(Figure: a cluster mean pulled toward a distant outlier.)
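A tiny illustration of this sensitivity on made-up one-dimensional values: a single outlier drags the mean far from the bulk of the data, while the medoid (an actual data object) barely moves.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # 100.0 is the outlier

mean = values.mean()
# Medoid: the object with the smallest total distance to all other objects
medoid = values[np.abs(values[:, None] - values[None, :]).sum(axis=1).argmin()]

print(mean)    # -> 22.0, pulled far to the right by the outlier
print(medoid)  # -> 3.0, still inside the bulk of the data
```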
Slide 13: The k-Medoids Clustering Method
- k-medoids: find k representative objects, called medoids
- PAM (Partitioning Around Medoids, 1987)
- CLARA (Kaufmann & Rousseeuw, 1990)
- CLARANS (Ng & Han, 1994): randomized sampling
(Figure: the same data set partitioned by k-means vs. k-medoids.)
Slide 14: PAM (Partitioning Around Medoids) (1987)
- PAM (Kaufman and Rousseeuw, 1987)
- Arbitrarily choose k objects as the initial medoids
- Until no change, do:
  - (Re)assign each object to the cluster with the nearest medoid
  - Improve the quality of the k medoids: randomly select a non-medoid object O_random and compute the total cost of swapping a medoid with O_random
- Works for small data sets (e.g., 100 objects in 5 clusters)
- Not efficient for medium and large data sets
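A brute-force PAM-style sketch of the loop above, assuming NumPy and using the squared-error criterion introduced on slide 15; the function names and the random initialization are illustrative.

```python
import numpy as np

def total_cost(D, medoids):
    """Total cost: each object contributes its (squared) distance to the nearest medoid."""
    return D[:, medoids].min(axis=1).sum()

def pam(X, k, seed=0):
    X = np.asarray(X, dtype=float)
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)    # pairwise squared Euclidean distances
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(X), size=k, replace=False))  # arbitrary initial medoids
    improved = True
    while improved:                                            # until no change
        improved = False
        for i in range(k):
            for h in range(len(X)):
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                # Swap medoid i with the non-medoid h only if it lowers the total cost
                if total_cost(D, candidate) < total_cost(D, medoids):
                    medoids, improved = candidate, True
    labels = D[:, medoids].argmin(axis=1)   # final (re)assignment to the nearest medoid
    return medoids, labels
```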
Slide 15: Swapping Cost
- For each pair of a medoid m and a non-medoid object h, measure whether h is better than m as a medoid
- Use the squared-error criterion
  - Compute E_h - E_m, where E_h (E_m) is the squared error when h (m) serves as the medoid
  - Negative: swapping brings a benefit
- Choose the swap with the minimum swapping cost
Slide 16: Four Swapping Cases
- When a medoid m is to be swapped with a non-medoid object h, check each of the other non-medoid objects j
- j is currently in the cluster of m (j must be reassigned):
  - Case 1: j is closer to some other medoid k than to h; after swapping m and h, j relocates to the cluster represented by k
  - Case 2: j is closer to h than to any other medoid k; after swapping m and h, j joins the cluster represented by h
- j is currently in the cluster of some other medoid k, not m (compare k with h):
  - Case 3: j is closer to k than to h; after swapping m and h, j remains in the cluster represented by k
  - Case 4: j is closer to h than to k; after swapping m and h, j moves to the cluster represented by h
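A sketch of the per-object contribution C_jmh implied by these four cases, assuming d_jm, d_jh, and d_jk are j's distances to the outgoing medoid m, the candidate h, and j's closest medoid other than m; the function name and signature are illustrative.

```python
def swap_contribution(d_jm, d_jh, d_jk, j_belongs_to_m):
    """Change in j's cost if the medoid m is replaced by the non-medoid h.

    d_jm: distance from j to the medoid m being swapped out
    d_jh: distance from j to the candidate object h
    d_jk: distance from j to its closest medoid other than m
    """
    if j_belongs_to_m:
        # Cases 1 and 2: j must leave m; it goes to whichever of k and h is closer
        return min(d_jk, d_jh) - d_jm
    # Cases 3 and 4: j stays with k unless h is strictly closer
    return min(d_jh - d_jk, 0.0)

# The total swapping cost TC_mh is the sum of these contributions over all objects j.
```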
Slide 17: PAM Clustering: Total Swapping Cost
- TC_mh = sum over all non-medoid objects j of C_jmh
(Figure: the four cases of slide 16, showing j, m, h, and k; for example, in Case 4, C_jmh = d(j, h) - d(j, k) < 0.)
Slide 18: Complexity of PAM
- Arbitrarily choose k objects as the initial medoids: O(1)
- Until no change, do (cost per iteration: O((n-k)^2 * k)):
  - (Re)assign each object to the cluster with the nearest medoid: O((n-k) * k)
  - Improve the quality of the k medoids: O((n-k)^2 * k)
    - For each pair of a medoid m and a non-medoid object h: (n-k) * k pairs
    - Calculate the swapping cost TC_mh = sum over j of C_jmh: O(n-k)
Slide 19: Strengths and Weaknesses of PAM
- PAM is more robust than k-means in the presence of outliers, because a medoid is less influenced by outliers or other extreme values than a mean
- PAM works efficiently for small data sets but does not scale well to large data sets
  - O(k(n-k)^2) per iteration, where n is the number of data objects and k is the number of clusters
- Can we find the medoids faster?
Slide 20: CLARA (Clustering LARge Applications) (1990)
- CLARA (Kaufmann and Rousseeuw, 1990)
- Built into statistical analysis packages, such as S+
- Draws multiple samples of the data set, applies PAM to each sample, and returns the best clustering as the output
- Handles larger data sets than PAM (e.g., 1,000 objects in 10 clusters)
- Efficiency and effectiveness depend on the sampling
Slide 21: CLARA - Algorithm
Set mincost to MAXIMUM;
Repeat q times                      // draw q samples
    Create S by drawing s objects randomly from D;
    Generate the set of medoids M from S by applying the PAM algorithm;
    Compute cost(M, D);
    If cost(M, D) < mincost
        mincost = cost(M, D);
        bestset = M;
    Endif;
Endrepeat;
Return bestset;
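A CLARA-style sketch built on the same brute-force PAM idea as the slide-14 sketch, assuming NumPy. The defaults q = 5 and s = 40 + 2k follow the commonly cited rule of thumb for CLARA; for readability the sketch precomputes the full distance matrix, which a real implementation on a large data set would avoid.

```python
import numpy as np

def cost(D, medoids):
    """Total cost of assigning every object to its nearest medoid (D: squared-distance matrix)."""
    return D[:, medoids].min(axis=1).sum()

def pam_medoids(D, k, rng):
    """Brute-force PAM on a (sub)matrix of distances: keep swapping while the cost improves."""
    medoids = list(rng.choice(D.shape[0], size=k, replace=False))
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(D.shape[0]):
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                if cost(D, candidate) < cost(D, medoids):
                    medoids, improved = candidate, True
    return medoids

def clara(X, k, q=5, s=None, seed=0):
    """Run PAM on q random samples of size s and keep the medoid set that is
    cheapest on the whole data set (the 'bestset' of the slide)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    s = s if s is not None else min(len(X), 40 + 2 * k)      # rule-of-thumb sample size
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    best_set, min_cost = None, np.inf
    for _ in range(q):
        sample = rng.choice(len(X), size=s, replace=False)
        local = pam_medoids(D[np.ix_(sample, sample)], k, rng)
        medoids = [int(sample[i]) for i in local]            # map sample indices back to D
        if cost(D, medoids) < min_cost:
            min_cost, best_set = cost(D, medoids), medoids
    return best_set, min_cost
```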
Slide 22: Complexity of CLARA (per sample)
- Create S by drawing s objects randomly from D: O(1)
- Generate the set of medoids M from S by applying the PAM algorithm: O((s-k)^2 * k)
- Compute cost(M, D): O((n-k) * k)
- Update mincost and bestset if cost(M, D) < mincost: O(1)
- Total per sample: O((s-k)^2 * k + (n-k) * k)
Slide 23: Strengths and Weaknesses of CLARA
- Strength: handles larger data sets than PAM (e.g., 1,000 objects in 10 clusters)
- Weaknesses:
  - Efficiency depends on the sample size
  - A good clustering of the samples will not necessarily represent a good clustering of the whole data set if the samples are biased
Slide 24: CLARANS ("Randomized" CLARA) (1994)
- CLARANS (A Clustering Algorithm based on RANdomized Search) (Ng and Han '94)
- CLARANS draws a sample of the solution space dynamically
  - A solution is a set of k medoids
  - The solution space contains C(n, k) = n! / (k!(n-k)!) solutions in total
- The solution space can be represented by a graph where every node is a potential solution, i.e., a set of k medoids
Slide 25: Graph Abstraction
- Every node is a potential solution (a set of k medoids)
- Every node is associated with a squared error
- Two nodes are adjacent if they differ by exactly one medoid
- Every node has k(n-k) adjacent nodes
  - Replacing one fixed medoid gives n-k neighbors, e.g., {O_1, O_2, ..., O_k} is adjacent to {O_(k+1), O_2, ..., O_k}, ..., {O_n, O_2, ..., O_k}
  - Over the k medoids, this gives k(n-k) neighbors per node
Slide 26: Graph Abstraction: CLARANS
- Start with a randomly selected node and check at most m neighbors, chosen randomly
- If a better adjacent node is found, move to that node and continue; otherwise, the current node is a local optimum; restart from another randomly selected node to search for another local optimum
- When h local optima have been found, return the best of them as the overall result
Slide 27: CLARANS
(Figure: from a current node C, at most maxneighbor randomly chosen neighbors N are compared; each descent ends in a local minimum, the search is restarted numlocal times, and the best node found is returned.)
Slide 28: CLARANS - Algorithm
Set mincost to MAXIMUM;
For i = 1 to h do                       // find h local optima
    Randomly select a node as the current node C in the graph;
    J = 1;                              // counter of neighbors examined
    Repeat
        Randomly select a neighbor N of C;
        If Cost(N, D) < Cost(C, D)
            Assign N as the current node C;
            J = 1;
        Else
            J++;
        Endif;
    Until J > m;
    Update mincost and bestnode with Cost(C, D) and C if applicable;
Endfor;
Return bestnode;
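A minimal CLARANS sketch following the pseudocode above, assuming NumPy; numlocal corresponds to h, maxneighbor to m, and the cost is the same nearest-medoid total used in the earlier sketches.

```python
import numpy as np

def cost(D, medoids):
    """Total cost of assigning every object to its nearest medoid."""
    return D[:, medoids].min(axis=1).sum()

def clarans(X, k, numlocal=2, maxneighbor=20, seed=0):
    X = np.asarray(X, dtype=float)
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # squared Euclidean distances
    rng = np.random.default_rng(seed)
    n = len(X)
    best, mincost = None, np.inf
    for _ in range(numlocal):                 # find numlocal local optima
        current = list(rng.choice(n, size=k, replace=False))  # random starting node
        j = 1
        while j <= maxneighbor:
            # A random neighbor differs from the current node in exactly one medoid
            i = rng.integers(k)
            h = int(rng.choice([o for o in range(n) if o not in current]))
            neighbor = current[:i] + [h] + current[i + 1:]
            if cost(D, neighbor) < cost(D, current):
                current, j = neighbor, 1      # move to the better node, reset the counter
            else:
                j += 1
        if cost(D, current) < mincost:        # keep the best local optimum found so far
            mincost, best = cost(D, current), current
    labels = D[:, best].argmin(axis=1)
    return best, labels, mincost
```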
Slide 29: Graph Abstraction (k-means, k-modes, k-medoids)
- Each vertex is a set of k representative objects (means, modes, or medoids)
- Each iteration produces a new set of k representative objects with lower overall dissimilarity
- The iterations correspond to a hill-descent process in a landscape (graph) of vertices
Slide 30: Comparison with PAM
- PAM searches for a minimum in the graph (landscape)
  - At each step, all adjacent vertices are examined; the one giving the deepest descent is chosen as the next set of k medoids
  - The search continues until a minimum is reached
  - For large n and k (e.g., n = 1,000, k = 10), examining all k(n-k) adjacent vertices is time-consuming; inefficient for large data sets
- CLARANS vs. PAM
  - For large and medium data sets, CLARANS is clearly much more efficient than PAM
  - Even for small data sets, CLARANS outperforms PAM significantly
Slide 31: When n = 80, CLARANS is 5 times faster than PAM, while the cluster quality is the same.
Slide 32: Comparison with CLARA
- CLARANS vs. CLARA
  - CLARANS is always able to find clusterings of better quality than those found by CLARA
  - CLARANS may use much more time than CLARA
  - When the time used is the same, CLARANS is still better than CLARA
Slide 34: Hierarchies of Co-expressed Genes and Coherent Patterns
- The interpretation of co-expressed genes and coherent patterns depends mainly on the domain knowledge
Slide 35: A Subtle Situation
- To split or not to split? It is a question.
(Figure: group A and its candidate subgroups A1 and A2.)