1
Mehdi Ghayoumi, MSB Room 132, mghayoum@kent.edu, office hours: Thursday 11–12 a.m. Machine Learning
2
Announcements
HW1 Feb 11 and HW2 Mar 11
Programming HW1 Feb 25 and Programming HW2 Apr 1
Quiz 1 Mar 11 and Quiz 2 Apr 8
First project report Mar 16 and 18; second project report Apr 13 and 15
April 29 – exam review
May 4 – final exam
3
Machine Learning
4
Supervised Learning
– Classification
– Regression
Unsupervised Learning
– Clustering
– ART and SOM
– PCA
5
Machine Learning
Supervised vs. Unsupervised Learning
Hierarchical Clustering
– Hierarchical Agglomerative Clustering (HAC)
Non-Hierarchical Clustering
– K-means
– Mixtures of Gaussians and the EM Algorithm
6
Machine Learning
Clustering: Partition unlabeled examples into disjoint subsets (clusters) such that:
– Examples within a cluster are similar
– Examples in different clusters are different
7
Machine Learning
12
Hierarchical Clustering
Cluster based on similarities/distances. Distance measure between instances x^r and x^s:
– Minkowski (L_p) distance (Euclidean for p = 2): d(x^r, x^s) = (Σ_j |x_j^r − x_j^s|^p)^(1/p)
– City-block distance (p = 1): d(x^r, x^s) = Σ_j |x_j^r − x_j^s|
Machine Learning
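These distance measures are a few lines each; the sketch below is a minimal NumPy illustration (the function names and the example vectors are my own, not from the slides):

```python
import numpy as np

def minkowski(xr, xs, p=2):
    """Minkowski (L_p) distance; p = 2 gives the Euclidean distance."""
    return np.sum(np.abs(xr - xs) ** p) ** (1.0 / p)

def city_block(xr, xs):
    """City-block (Manhattan, L_1) distance."""
    return np.sum(np.abs(xr - xs))

xr = np.array([1.0, 2.0, 3.0])
xs = np.array([2.0, 0.0, 3.0])
print(minkowski(xr, xs))       # Euclidean: sqrt(1 + 4 + 0) ≈ 2.236
print(minkowski(xr, xs, p=1))  # same as city-block: 3.0
print(city_block(xr, xs))      # 3.0
```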
13
Agglomerative Clustering
Assume a similarity function that determines the similarity of two instances, sim(x, y). How do we compute the similarity of two clusters, each possibly containing multiple instances?
– Single Link: similarity of the two most similar members.
– Complete Link: similarity of the two least similar members.
Machine Learning
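As a concrete illustration of the two linkage rules, here is a small sketch in which sim(x, y) is taken to be negative Euclidean distance (that choice, and the example clusters, are assumptions on my part):

```python
import numpy as np

def sim(x, y):
    """Similarity of two instances; here, negative Euclidean distance."""
    return -np.linalg.norm(x - y)

def single_link_sim(ci, cj):
    """Single link: similarity of the two MOST similar members."""
    return max(sim(x, y) for x in ci for y in cj)

def complete_link_sim(ci, cj):
    """Complete link: similarity of the two LEAST similar members."""
    return min(sim(x, y) for x in ci for y in cj)

ci = [np.array([0.0, 0.0]), np.array([0.0, 1.0])]
cj = [np.array([3.0, 0.0]), np.array([9.0, 0.0])]
print(single_link_sim(ci, cj))    # -3.0  (closest pair)
print(complete_link_sim(ci, cj))  # ≈ -9.06 (farthest pair)
```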
14
Single Link Agglomerative Clustering
Use the maximum similarity over pairs: sim(c_i, c_j) = max over x ∈ c_i, y ∈ c_j of sim(x, y). Can result in “straggly” (long and thin) clusters due to a chaining effect. Machine Learning
15
Single Link Example Machine Learning
16
Complete Link Agglomerative Clustering
Use the minimum similarity over pairs: sim(c_i, c_j) = min over x ∈ c_i, y ∈ c_j of sim(x, y). Makes “tighter,” more spherical clusters that are typically preferable. Machine Learning
17
Complete Link Example Machine Learning
19
Direct clustering methods require a specification of the desired number of clusters, k. Randomly choose k instances as seeds, one per cluster. Form initial clusters based on these seeds. Iterate, repeatedly reallocating instances to different clusters to improve the overall clustering. Stop when the clustering converges or after a fixed number of iterations. Machine Learning
20
Assumes instances are real-valued vectors. Clusters are based on centroids, i.e. the center of gravity or mean of the points in a cluster c: μ(c) = (1/|c|) Σ over x ∈ c of x. Reassignment of instances to clusters is based on distance to the current cluster centroids. Machine Learning
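Putting the last two slides together, a minimal k-means sketch looks as follows; the random seeding, the convergence test, and the synthetic data are my own choices, not from the slides:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Basic k-means: pick k random instances as seeds, then alternate
    (1) assign each instance to its nearest centroid and
    (2) recompute each centroid as the mean of its assigned instances."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # distance of every instance to every current centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return labels, centroids

# two well-separated synthetic blobs
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5.0])
labels, centroids = kmeans(X, k=2)
print(centroids)
```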
21
Hard clustering typically assumes that each instance is given a “hard” assignment to exactly one cluster. It does not allow uncertainty in class membership or allow an instance to belong to more than one cluster. Soft clustering gives probabilities that an instance belongs to each of a set of clusters. It allows uncertainty in class membership and allows an instance to belong to more than one cluster. Machine Learning
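To make the distinction concrete, here is a hypothetical soft assignment and the hard assignment it collapses to (the probabilities are made-up numbers, purely for illustration):

```python
import numpy as np

# Soft clustering: a probability over clusters for each instance.
soft = np.array([
    [0.9, 0.1, 0.0],   # almost certainly cluster 0
    [0.4, 0.4, 0.2],   # genuinely ambiguous between clusters 0 and 1
])

# Hard clustering: exactly one cluster per instance; the ambiguity is lost.
hard = soft.argmax(axis=1)
print(hard)  # [0 0]
```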
22
Probabilistic method for soft clustering. Direct method that assumes k clusters: {c_1, c_2, …, c_k}. Soft version of k-means. Assumes a probabilistic model of categories that allows computing P(c_i | E) for each category c_i, for a given example E. For text, typically assume a naive-Bayes category model.
– Parameters θ = {P(c_i), P(w_j | c_i) : i ∈ {1, …, k}, j ∈ {1, …, |V|}}
Machine Learning
23
Iterate the following two steps until convergence:
– Expectation (E-step): Compute P(c_i | E) for each example given the current model, and probabilistically re-label the examples based on these posterior probability estimates.
– Maximization (M-step): Re-estimate the model parameters, θ, from the probabilistically re-labeled data.
Machine Learning
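A compact sketch of these two steps for the naive-Bayes text model mentioned above, operating on word-count vectors; the Laplace smoothing constant, the random initialization, and the function name are my assumptions, not part of the slides:

```python
import numpy as np

def nb_em(X, k, n_iters=50, alpha=1.0, seed=0):
    """EM for soft clustering with a multinomial naive-Bayes model.
    X: (n_docs, |V|) matrix of word counts.  Returns P(c_i | E) per document."""
    rng = np.random.default_rng(seed)
    # Initialize: random probabilistic labels for every document.
    post = rng.dirichlet(np.ones(k), size=len(X))             # P(c_i | E)
    for _ in range(n_iters):
        # M-step: re-estimate P(c_i) and P(w_j | c_i) from the soft labels.
        prior = post.mean(axis=0)                              # P(c_i)
        word_counts = post.T @ X                               # soft word counts per class
        word_probs = (word_counts + alpha) / (
            word_counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])
        # E-step: recompute P(c_i | E) under the current model (in log space).
        log_joint = np.log(prior) + X @ np.log(word_probs).T
        log_joint -= log_joint.max(axis=1, keepdims=True)      # numerical stability
        new_post = np.exp(log_joint)
        new_post /= new_post.sum(axis=1, keepdims=True)
        if np.allclose(new_post, post, atol=1e-6):             # labels converged
            break
        post = new_post
    return post

# tiny made-up term-count matrix: 4 documents over a 3-word vocabulary
X = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [0, 3, 3]])
print(nb_em(X, k=2).round(2))
```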
24
Initialize: Assign random probabilistic labels to the unlabeled examples. Machine Learning
25
Initialize: Give the soft-labeled training data to a probabilistic learner. Machine Learning
26
Initialize: Produce a probabilistic classifier. Machine Learning
27
E step: Relabel the unlabeled data using the trained classifier. Machine Learning
28
M step: Retrain the classifier on the relabeled data. Continue EM iterations until the probabilistic labels on the unlabeled data converge. Machine Learning
31
How to evaluate clustering? Improving search to converge faster and avoid local minima. Overlapping clustering. Ensemble clustering. Clustering structured relational data. Semi-supervised methods other than EM. Machine Learning
32
Thank you!