Unsupervised Learning and Clustering k-means clustering Sum-of-Squared Errors Competitive Learning SOM Pre-processing and Post-processing techniques
K-means clustering This is an elementary but very popular method for clustering. Our goal is to find the k mean vectors, or “cluster centers”.
Initialize k, m1, m2, …, mk
Repeat
  Classify each sample according to its nearest mi
  Recompute each mi as the mean of the samples assigned to it
Until there is no change in any mi
Return m1, m2, …, mk
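The loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `max_iter` safety cap, and the rule that an empty cluster keeps its previous mean are assumptions that the slides do not specify.

```python
import numpy as np

def kmeans(X, init_means, max_iter=100):
    """Basic k-means. X has shape (n, d); init_means has shape (k, d)."""
    means = np.asarray(init_means, dtype=float).copy()
    for _ in range(max_iter):  # assumed cap on the number of iterations
        # Classify each sample according to its nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each mean; an empty cluster keeps its old mean (assumption).
        new_means = means.copy()
        for i in range(len(means)):
            members = X[labels == i]
            if len(members) > 0:
                new_means[i] = members.mean(axis=0)
        # Stop when there is no change in the means.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels

# Toy data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
means, labels = kmeans(X, init_means=X[[0, 3]])
```

Here the initial means are simply two of the samples, one from each group; other initializations are possible.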
Complexity The computational complexity of the algorithm is O( n d c T ), where d is the number of features, n is the number of examples, c is the number of clusters (k above), and T is the number of iterations. In practice, the number of iterations is normally much smaller than the number of examples.
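To get a feel for the bound, here is the product worked out for one set of values. The numbers are purely illustrative assumptions (n = 10,000 examples, d = 10 features, c = 5 clusters, T = 20 iterations), not figures from the slides.

```latex
n \cdot d \cdot c \cdot T = 10^{4} \cdot 10 \cdot 5 \cdot 20 = 10^{7}
```

With these assumed values, T = 20 is far smaller than n = 10,000, so the cost grows roughly linearly in the number of examples.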
Figure 10.3
K-means clustering Disadvantage 1: Prone to falling into local minima. This can be mitigated, at the cost of more computation, by running the algorithm many times with different initial means and keeping the best result. Disadvantage 2: Susceptible to outliers. One solution is to replace the mean with the median.
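The first remedy (many runs with different initial means) could look like the sketch below. Several choices here are assumptions: the names `kmeans_once` and `kmeans_restarts`, initializing from randomly chosen samples, and ranking runs by their sum of squared distances to the assigned means.

```python
import numpy as np

def kmeans_once(X, k, rng, max_iter=100):
    """One k-means run from k randomly chosen samples; returns (means, labels, sse)."""
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute the means; an empty cluster keeps its old mean (assumption).
        new_means = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else means[i]
            for i in range(k)
        ])
        if np.allclose(new_means, means):
            break
        means = new_means
    # Score the run by its sum of squared distances to the assigned means.
    sse = float(np.sum((X - means[labels]) ** 2))
    return means, labels, sse

def kmeans_restarts(X, k, n_restarts=10, seed=0):
    """Run k-means several times and keep the run with the lowest score."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(X, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
best_means, best_labels, best_sse = kmeans_restarts(X, k=2)
```

For the second remedy, one would replace `mean(axis=0)` with a per-feature median; that variant is not shown here.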
K-means clustering Hugo Steinhaus. Born on January 14, 1887 (Austria-Hungary). Professor at the University of Wroclaw, Notre Dame, and Sussex. Authored over 170 works in mathematics. The first to use k-means clustering.
The Sum-of-Squared Error We can now define the goal of clustering. Goal: to divide a dataset of examples into c disjoint subsets D1, D2, …, Dc, so that the distance between examples within the same partition is small compared to the distance between examples in different partitions. To achieve this, we choose the c means so as to minimize a metric.
Metric Let mi be the mean of the examples in partition Di: $m_i = \frac{1}{n_i} \sum_{x \in D_i} x$, where ni is the number of examples in Di. Then the metric to minimize is the sum-of-squared errors: $J_e = \sum_{i=1}^{c} \sum_{x \in D_i} \| x - m_i \|^2$, where the index i runs over the clusters.
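As a small numerical check of the metric, the sketch below computes Je for a given partition. The function name `sum_of_squared_errors` and the toy data are illustrative assumptions.

```python
import numpy as np

def sum_of_squared_errors(X, labels):
    """Je: sum over clusters of squared distances to the cluster mean."""
    total = 0.0
    for i in np.unique(labels):
        members = X[labels == i]          # examples in partition D_i
        m_i = members.mean(axis=0)        # mean of partition D_i
        total += np.sum((members - m_i) ** 2)
    return float(total)

# Four points on a line, split into two partitions.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [14.0, 0.0]])
labels = np.array([0, 0, 1, 1])
J_e = sum_of_squared_errors(X, labels)  # means are (1, 0) and (12, 0): 1 + 1 + 4 + 4 = 10
```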
Figure 10.10
Others Hierarchical clustering: clusters have subclusters, which in turn have subclusters, and so on. Online clustering: as time goes on, new information may call for restructuring the clusters (plasticity), but we don’t want this to happen very often (stability).
Figure 10.11
Vector Quantisation Data will be represented with prototype vectors.
Feature Mapping [Figure: input nodes x = [ x1, x2, x3, x4 ]^T connected to a map neuron through the weights w = [ w1, w2, w3, w4 ]^T]
Feature Mapping The weight vector w = [ w1, w2, w3, w4 ]^T has the same dimension as the input x = [ x1, x2, x3, x4 ]^T, so it can be mapped into the feature space.
SOM Algorithm Initialization: Select the number of neurons in the map. Choose random values for all weights. Learning: Repeat: for each example x, find the neuron whose weight vector w is closest to it: min || x - w ||
SOM Algorithm Winner takes all: update the weights of the winner only (and of its neighbors).
SOM Algorithm Update Weights Update the weights of the closest neuron and its neighbors: $w_{t+1} = w_t + \eta \, A(x, w_t) \, (x - w_t)$, where $\eta$ is the learning rate and A is a neighborhood function.
SOM Algorithm The neighboring function A:
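A training loop along these lines might look like the sketch below. Much of it is assumed rather than taken from the slides: the Gaussian form of the neighborhood function A (a common choice, computed from grid distance to the winner), the linear shrinking of its width `sigma`, the learning-rate name `eta`, and the function name `train_som`.

```python
import numpy as np

def train_som(X, grid_shape=(5, 5), n_epochs=20, eta=0.5, sigma0=2.0, seed=0):
    """Train a 2-D self-organizing map; returns (weights, grid coordinates)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Position of every neuron on the map grid, shape (rows*cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialization: random values for all weights.
    W = rng.random((rows * cols, X.shape[1]))
    for epoch in range(n_epochs):
        # Assumption: the neighborhood width shrinks linearly over the epochs.
        sigma = sigma0 * (1.0 - epoch / n_epochs) + 1e-3
        for x in X[rng.permutation(len(X))]:
            # Winner: the neuron whose weight vector is closest to x.
            winner = np.argmin(np.linalg.norm(W - x, axis=1))
            # Assumed Gaussian neighborhood, based on grid distance to the winner.
            grid_dist2 = np.sum((grid - grid[winner]) ** 2, axis=1)
            A = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Update: w <- w + eta * A * (x - w), for the winner and its neighbors.
            W += eta * A[:, None] * (x - W)
    return W, grid

X = np.random.default_rng(1).random((200, 2))  # toy data in the unit square
W, grid = train_som(X)
```

Because each update moves a weight vector part of the way toward an example, weights that start inside the unit square stay inside it for this toy data.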
SOM Algorithm Usage For every test point Select the closest neuron using minimum Euclidean distance: min || x - w ||
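Looking up the closest neuron for a test point is a one-line computation. In this sketch, the function name `closest_neuron` and the three weight vectors are hypothetical stand-ins for a trained map.

```python
import numpy as np

def closest_neuron(x, W):
    """Index of the neuron whose weight vector has minimum Euclidean distance to x."""
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))

# Hypothetical trained weights: one row per neuron.
W = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
idx = closest_neuron(np.array([0.9, 1.2]), W)  # the second neuron (index 1) is closest
```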
Mapping a Grid to a Grid
SOM Algorithm Comments: Neighborhoods should be large at the beginning and shrink as the nodes gain a specific ordering. Global ordering comes naturally (complexity theory). Architecture of the map: too few nodes leads to underfitting; too many nodes leads to overfitting.
Teuvo Kohonen. Born in 1934, Finland. He has written several books and over 300 papers. His most famous work is on Self-Organizing Maps. Member of the Academy of Finland. Awards: IEEE Neural Networks Council Pioneer Award, 1991; Technical Achievement Award of IEEE, 1995; Frank Rosenblatt Technical Field Award, 2008.
Cluster Tendency Cluster tendency is a preprocessing step that indicates whether data objects exhibit a clustering structure; it rules out clustering when the data appear to be randomly generated under a uniform distribution over a sample window of interest in the attribute space.
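The slides do not name a specific test. One statistic commonly used for this purpose is the Hopkins statistic, sketched below as an assumed choice. It compares nearest-neighbor distances from uniformly generated points to the data against nearest-neighbor distances within the data. Under the convention used here, values near 0.5 suggest uniformly random data and values near 1 suggest clustering structure. Using the bounding box of the data as the sampling window is also an assumption.

```python
import numpy as np

def hopkins_statistic(X, m=None, seed=0):
    """Hopkins statistic of X (shape (n, d)) using m sampled points."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = m or max(1, n // 10)
    # Assumption: the sampling window is the bounding box of the data.
    low, high = X.min(axis=0), X.max(axis=0)
    # u: distance from each uniform random point to its nearest data object.
    U = rng.uniform(low, high, size=(m, d))
    u = np.linalg.norm(U[:, None, :] - X[None, :, :], axis=2).min(axis=1)
    # w: distance from each sampled data object to its nearest other data object.
    idx = rng.choice(n, size=m, replace=False)
    D = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    D[np.arange(m), idx] = np.inf  # ignore the distance of a point to itself
    w = D.min(axis=1)
    return float(np.sum(u ** d) / (np.sum(u ** d) + np.sum(w ** d)))

rng = np.random.default_rng(42)
X_clustered = np.concatenate([rng.normal(0.0, 0.05, (100, 2)),
                              rng.normal(5.0, 0.05, (100, 2))])
X_uniform = rng.random((500, 2))
H_clustered = hopkins_statistic(X_clustered)
H_uniform = hopkins_statistic(X_uniform)
```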
Example Cluster Tendency Data with structure: clustering captures inherent data groups. Randomly generated data: clustering does not capture groups; the results come from random variations.
Example Cluster Tendency Problem: How do we choose the sampling window? Rule of thumb: Create a window centered at the mean that captures half the total number of examples.
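One simple way to realize this rule of thumb is sketched below. It assumes the window is a hypersphere (the slides do not fix its shape): centering it at the mean and setting its radius to the median distance from the mean captures half of the examples. The function name `half_sample_window` is hypothetical.

```python
import numpy as np

def half_sample_window(X):
    """Spherical window centered at the mean that captures half the examples."""
    center = X.mean(axis=0)
    dists = np.linalg.norm(X - center, axis=1)
    # The median distance splits the examples into two halves.
    radius = float(np.median(dists))
    inside = dists <= radius
    return center, radius, inside

X = np.random.default_rng(0).normal(size=(100, 2))
center, radius, inside = half_sample_window(X)
```

With an odd number of examples, the window contains the middle example as well, so it holds one more than half.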
Cluster Validation Cluster validation is used to assess the value of the output of a clustering algorithm. Internal: statistics are devised to capture the quality of the induced clusters using only the available data objects. External: the validation is performed by gathering statistics that compare the induced clusters against an external and independent classification of the objects.
Example Cluster Validation
Metrics Cluster Validation One type of statistical metric is defined in terms of a 2 x 2 table, where each entry counts the number of object pairs that agree or disagree in the class and cluster to which they belong:
                   Same cluster   Different cluster
Same class         E11            E12
Different class    E21            E22
Examples Metrics Cluster Validation Rand: [ E11 + E22 ] / [ E11 + E12 + E21 + E22 ] Jaccard: E11 / [ E11 + E12 + E21 ]
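Both metrics can be computed by counting object pairs, as in the sketch below. The assignment of the two disagreement counts is an assumption (E12 = same class, different cluster; E21 = different class, same cluster); both formulas treat E12 and E21 symmetrically, so the results do not depend on it. The function names and the toy labels are illustrative.

```python
from itertools import combinations

def pair_counts(classes, clusters):
    """Count object pairs by agreement of class and cluster: (E11, E12, E21, E22)."""
    e11 = e12 = e21 = e22 = 0
    for i, j in combinations(range(len(classes)), 2):
        same_class = classes[i] == classes[j]
        same_cluster = clusters[i] == clusters[j]
        if same_class and same_cluster:
            e11 += 1      # same class, same cluster
        elif same_class:
            e12 += 1      # same class, different cluster (assumed labeling)
        elif same_cluster:
            e21 += 1      # different class, same cluster (assumed labeling)
        else:
            e22 += 1      # different class, different cluster
    return e11, e12, e21, e22

def rand_index(e11, e12, e21, e22):
    return (e11 + e22) / (e11 + e12 + e21 + e22)

def jaccard(e11, e12, e21, e22):
    return e11 / (e11 + e12 + e21)

classes = [0, 0, 0, 1, 1, 1]    # external classification
clusters = [0, 0, 1, 1, 1, 1]   # induced clusters
counts = pair_counts(classes, clusters)   # (4, 2, 3, 6) over the 15 pairs
rand = rand_index(*counts)                # 10 / 15
jac = jaccard(*counts)                    # 4 / 9
```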