Clustering
Revision of Yesterday's Algorithm
K-Means Algorithm
Each cluster is represented by the mean value of the objects in the cluster.
Input: a set of n objects, the number of clusters k
Output: a set of k clusters
Algorithm:
Randomly select k samples and mark them as the initial clusters.
Repeat:
  Assign (or reassign) each sample to the cluster to which it is most similar, based on the cluster’s mean.
  Update each cluster’s mean.
Until no change.
K-Means (graph)
Step 1: Choose k centroids at random.
Step 2: Calculate the distance between each centroid and each object, using the Euclidean distance: $d(A,B) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
Step 3: Assign each object to the cluster whose centroid is at minimum distance.
Step 4: Calculate the centroid of each cluster: $C = \left( \frac{x_1 + x_2 + \dots + x_n}{n}, \frac{y_1 + y_2 + \dots + y_n}{n} \right)$
Go to Step 2. Repeat until the centroids no longer change.
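The four steps above can be sketched in Python as follows. This is a minimal illustration for 2-D points, not a reference implementation: the function names, the `seed` and `max_iter` parameters, the handling of empty clusters, and the sample points are all assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2)


def centroid(points):
    """Mean of the x values and mean of the y values."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # Step 1: k random objects as initial centroids
    clusters = []
    for _ in range(max_iter):
        # Steps 2-3: assign each object to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recompute centroids (assumption: an empty cluster keeps its old centroid)
        new_centroids = [centroid(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # stop when no centroid changes
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sample data: two well-separated groups of three points
sample = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, final_clusters = k_means(sample, k=2)
```

Because the starting centroids are random, different seeds can lead to different results on less clearly separated data.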
K-Medoid (PAM)
Also called Partitioning Around Medoids.
Step 1: Choose k medoids.
Step 2: Assign every point to its closest medoid.
Step 3: Form the distance matrix for each cluster and choose the next best medoid, i.e., the point closest to all other points in the cluster.
Go to Step 2. Repeat until no medoid changes.
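A rough Python sketch of these three steps is given below. It follows the alternating update exactly as listed here, which is simpler than the swap-based search usually associated with PAM. The function names, the `max_iter` cap, and the sample points are assumptions for illustration only.

```python
import math


def total_distance(candidate, cluster):
    """Sum of distances from one candidate point to every point in the cluster."""
    return sum(math.dist(candidate, p) for p in cluster)


def k_medoids(points, initial_medoids, max_iter=100):
    medoids = list(initial_medoids)  # Step 1: choose k medoids
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign every point to its closest medoid
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)),
                          key=lambda i: math.dist(p, medoids[i]))
            clusters[nearest].append(p)
        # Step 3: in each cluster, pick the point closest to all the others
        new_medoids = [
            min(c, key=lambda cand: total_distance(cand, c)) if c else medoids[i]
            for i, c in enumerate(clusters)
        ]
        if new_medoids == medoids:  # stop when no medoid changes
            break
        medoids = new_medoids
    return medoids, clusters


# Hypothetical sample data with a deliberately poor starting choice
sample = [(1, 1), (2, 1), (3, 1), (8, 8), (9, 8), (10, 8)]
medoids, clusters = k_medoids(sample, initial_medoids=[(1, 1), (2, 1)])
print(medoids)  # [(2, 1), (9, 8)]
```

Unlike a K-Means centroid, each medoid is always one of the original data points.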
What are Agglomerative Algorithms?
Bottom-up approach
Simple
Outputs a hierarchy
Structure is more informative
Need not specify the number of clusters
Dendrogram
Euclidean Distance
Distance Matrix
Agglomerative Algorithm
Step 1: Make each object its own cluster.
Step 2: Calculate the Euclidean distance from every point to every other point, i.e., construct a distance matrix.
Step 3: Identify the two clusters with the shortest distance and merge them.
Go to Step 2. Repeat until all objects are in one cluster.
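The loop above can be sketched in Python as shown below. The steps do not say how to measure the distance between clusters that already contain several points, so this sketch assumes the single-link rule (shortest distance between any two members). The function names and sample points are also assumptions.

```python
import math


def cluster_distance(c1, c2):
    """Single-link distance: shortest distance between any pair of members."""
    return min(math.dist(a, b) for a in c1 for b in c2)


def agglomerative(points):
    clusters = [[p] for p in points]  # Step 1: every object is its own cluster
    merges = []                       # one (cluster_a, cluster_b, distance) per merge
    while len(clusters) > 1:
        # Step 2: distances between every pair of current clusters
        pairs = [
            (cluster_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        # Step 3: merge the two closest clusters
        d, i, j = min(pairs)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical sample data: four points on a line
merges = agglomerative([(0, 0), (1, 0), (3, 0), (7, 0)])
for a, b, d in merges:
    print(a, "+", b, "at distance", d)
```

The recorded merge distances are the heights at which branches join in the dendrogram.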
Agglomerative Algorithm Approaches
Single Link
Complete Link
Average Link
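The three approaches differ only in how the distance between two clusters is measured. A small Python sketch using the usual definitions (minimum, maximum, and mean of the pairwise distances) is shown below; the function names and the two sample clusters are assumptions for illustration.

```python
import math


def single_link(c1, c2):
    """Shortest distance between any member of c1 and any member of c2."""
    return min(math.dist(a, b) for a in c1 for b in c2)


def complete_link(c1, c2):
    """Longest distance between any member of c1 and any member of c2."""
    return max(math.dist(a, b) for a in c1 for b in c2)


def average_link(c1, c2):
    """Mean of all pairwise distances between members of c1 and c2."""
    dists = [math.dist(a, b) for a in c1 for b in c2]
    return sum(dists) / len(dists)


# Hypothetical clusters on a line; pairwise distances are 3, 5, 2 and 4
left = [(0, 0), (1, 0)]
right = [(3, 0), (5, 0)]
print(single_link(left, right))    # 2.0
print(complete_link(left, right))  # 5.0
print(average_link(left, right))   # 3.5
```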
Simple Example
Item   E  A  C  B  D
       1  2  3  5  6
Another Example
Use the single link technique to find clusters in the given database.
Point  X     Y
1      0.4   0.53
2      0.22  0.38
3      0.35  0.32
4      0.26  0.19
5      0.08  0.41
6      0.45  0.3
Plot given data
Construct a distance matrix
     1     2     3     4     5     6
1    0
2    0.24  0
3    0.22  0.15  0
4    0.37  0.2   0.15  0
5    0.34  0.14  0.28  0.29  0
6    0.23  0.25  0.11  0.22  0.39  0
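As a cross-check, the matrix can be recomputed from the six coordinates given in the example. The sketch below is illustrative only; the dictionary layout and the function name are assumptions. Since the coordinates are given to two decimal places, a recomputed entry may differ from the tabulated one in the last digit.

```python
import math

# Coordinates of the six points from the example table
points = {
    1: (0.40, 0.53),
    2: (0.22, 0.38),
    3: (0.35, 0.32),
    4: (0.26, 0.19),
    5: (0.08, 0.41),
    6: (0.45, 0.30),
}


def distance_matrix(pts):
    """Lower-triangular Euclidean distances keyed by (row, column), row > column."""
    return {(i, j): math.dist(pts[i], pts[j]) for i in pts for j in pts if i > j}


matrix = distance_matrix(points)

# The pair with the smallest entry is the first one merged
closest = min(matrix, key=matrix.get)

for (i, j), d in sorted(matrix.items()):
    print(f"d({i},{j}) = {d:.2f}")
print("closest pair:", closest)
```

The smallest entry identifies the two nearest clusters, which is where the merging starts.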
Identify two nearest clusters
Repeat process until all objects in same cluster
Average link Average distance matrix
Use the data below to draw single link, complete link and average link dendrograms. Object X Y A 2 B 3 C 1 D E 1.5 0.5