Machine Learning
Mehdi Ghayoumi
MSB rm 132
Office hours: Thur.
Announcements
– HW1: Feb 11; HW2: Mar 11
– Programming HW1: Feb 25; Programming HW2: Apr 1
– Quiz 1: Mar 11; Quiz 2: Apr 8
– First project report: Mar 16, 18
– Second project report: Apr 13, 15
– Apr 29: Exam review
– May 4: Final Exam
Supervised Learning
– Classification
– Regression
Unsupervised Learning
– Clustering
– ART and SOM
– PCA
Supervised vs. Unsupervised Learning
Hierarchical Clustering
– Hierarchical Agglomerative Clustering (HAC)
Non-Hierarchical Clustering
– K-means
– Mixtures of Gaussians and the EM Algorithm
Clustering: Partition unlabeled examples into disjoint subsets (clusters) such that:
– Examples within a cluster are similar
– Examples in different clusters are different
Hierarchical Clustering
Cluster based on similarities/distances.
Distance measures between instances $x^r$ and $x^s$:
– Minkowski ($L_p$), Euclidean for $p = 2$: $d(x^r, x^s) = \left( \sum_{j=1}^{d} |x_j^r - x_j^s|^p \right)^{1/p}$
– City-block distance ($p = 1$): $d_{cb}(x^r, x^s) = \sum_{j=1}^{d} |x_j^r - x_j^s|$
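A minimal sketch of these distance measures in NumPy (the helper name minkowski_distance and the toy vectors are mine, not from the slides):

```python
import numpy as np

def minkowski_distance(xr, xs, p=2):
    """L_p distance between instances x^r and x^s
    (Euclidean for p = 2, city-block for p = 1)."""
    return np.sum(np.abs(xr - xs) ** p) ** (1.0 / p)

xr = np.array([1.0, 2.0])
xs = np.array([4.0, 6.0])
print(minkowski_distance(xr, xs, p=2))  # Euclidean: 5.0
print(minkowski_distance(xr, xs, p=1))  # city-block: 7.0
```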
Agglomerative Clustering
Assume a similarity function sim(x, y) that determines the similarity of two instances.
How do we compute the similarity of two clusters, each possibly containing multiple instances?
– Single link: similarity of the two most similar members.
– Complete link: similarity of the two least similar members.
Single Link Agglomerative Clustering
Use the maximum similarity over pairs: $\text{sim}(c_i, c_j) = \max_{x \in c_i,\, y \in c_j} \text{sim}(x, y)$
Can result in "straggly" (long and thin) clusters due to the chaining effect.
Single Link Example (figure)
Complete Link Agglomerative Clustering
Use the minimum similarity over pairs: $\text{sim}(c_i, c_j) = \min_{x \in c_i,\, y \in c_j} \text{sim}(x, y)$
Makes "tighter," more spherical clusters that are typically preferable.
Complete Link Example (figure)
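As a sketch of both linkage criteria, SciPy's hierarchical clustering selects the rule via its method argument ('single' vs. 'complete'); the toy data below is made up for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated clumps of 2-D instances (made up for illustration).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])

# Single link: merges on the closest cross-cluster pair (prone to chaining).
single = fcluster(linkage(X, method='single'), t=2, criterion='maxclust')

# Complete link: merges on the farthest cross-cluster pair (tighter clusters).
complete = fcluster(linkage(X, method='complete'), t=2, criterion='maxclust')

print(single, complete)  # both recover the two clumps on this easy data
```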
Direct clustering methods require specifying the desired number of clusters, k.
– Randomly choose k instances as seeds, one per cluster.
– Form initial clusters based on these seeds.
– Iterate, repeatedly reallocating instances to different clusters to improve the overall clustering.
– Stop when the clustering converges or after a fixed number of iterations.
K-means assumes instances are real-valued vectors.
Clusters are based on centroids: the center of gravity, or mean, of the points in a cluster c, $\mu(c) = \frac{1}{|c|} \sum_{x \in c} x$.
Reassignment of instances to clusters is based on distance to the current cluster centroids.
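A bare-bones k-means sketch along these lines (the function name and random seeding are mine; it assumes no cluster ever ends up empty):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly choose k instances as seeds, one per cluster.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Reassign each instance to its nearest current centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points
        # (assumes every cluster keeps at least one point).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return labels, centroids
```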
Hard clustering gives each instance a "hard" assignment to exactly one cluster.
– Does not allow uncertainty in class membership, or an instance to belong to more than one cluster.
Soft clustering gives the probability that an instance belongs to each of a set of clusters.
– Allows uncertainty in class membership, and an instance may belong to more than one cluster.
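To see the contrast concretely, scikit-learn's GaussianMixture (used here purely as a sketch, with made-up 1-D data) exposes both a hard predict and a soft predict_proba that returns the per-cluster membership probabilities P(c_i | x):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.array([[0.0], [0.2], [0.1], [5.0], [5.2], [2.5]])  # made-up 1-D data

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gm.predict(X))        # hard: exactly one cluster per instance
print(gm.predict_proba(X))  # soft: P(c_i | x) for every cluster
```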
Probabilistic method for soft clustering.
Direct method that assumes k clusters: {c_1, c_2, …, c_k}.
A soft version of k-means.
Assumes a probabilistic model of categories that allows computing P(c_i | E) for each category c_i for a given example E.
For text, typically assume a naïve-Bayes category model.
– Parameters: $\theta = \{P(c_i),\ P(w_j \mid c_i) : i \in \{1, \ldots, k\},\ j \in \{1, \ldots, |V|\}\}$
Iterate the following two steps until convergence:
– Expectation (E-step): Compute P(c_i | E) for each example given the current model, and probabilistically relabel the examples based on these posterior probability estimates.
– Maximization (M-step): Re-estimate the model parameters, $\theta$, from the probabilistically relabeled data.
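The slides use a naïve-Bayes text model; as a simpler sketch of the same E/M alternation, here is EM for a one-dimensional mixture of two Gaussians (all data and initial values made up):

```python
import numpy as np
from scipy.stats import norm

X = np.array([0.0, 0.2, 0.1, 5.0, 5.2, 4.9])  # made-up 1-D examples
pi = np.array([0.5, 0.5])      # mixing weights P(c_i)
mu = np.array([0.0, 1.0])      # initial component means
sigma = np.array([1.0, 1.0])   # initial component std devs

for _ in range(50):
    # E-step: posterior P(c_i | x) for each example under the current model.
    lik = norm.pdf(X[:, None], loc=mu, scale=sigma) * pi
    resp = lik / lik.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the probabilistically labeled data.
    Nk = resp.sum(axis=0)
    pi = Nk / len(X)
    mu = (resp * X[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((resp * (X[:, None] - mu) ** 2).sum(axis=0) / Nk)

print(mu)  # the two means settle near the two clumps (about 0.1 and 5.0)
```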
Initialize: Assign random probabilistic labels to the unlabeled examples.
Initialize: Give the soft-labeled training data to a probabilistic learner.
Initialize: The learner produces a probabilistic classifier.
E-step: Relabel the unlabeled data using the trained classifier.
M-step: Retrain the classifier on the relabeled data; continue EM iterations until the probabilistic labels on the unlabeled data converge.
Open issues in clustering:
– How to evaluate clustering?
– Improving search to converge faster and avoid local minima.
– Overlapping clustering.
– Ensemble clustering.
– Clustering structured relational data.
– Semi-supervised methods other than EM.
Thank you!