cs540 - Fall 2015 (Shavlik©), Lecture 25, Week 14


1 cs540 - Fall 2015 (Shavlik©), Lecture 25, Week 14
Today’s Topics
Learning Without a Teacher
K-Means Clustering
Hierarchical Clustering
Expectation-Maximization (using NB)
Auto-Association ANNs (formerly heavily used in Deep Neural Networks)
Read Section of Russell & Norvig plus the "Standard Algorithm" section of the Wikipedia article on K-Means Clustering

2 cs540 - Fall 2015 (Shavlik©), Lecture 25, Week 14
No teacher → Unsupervised ML
Data abounds but labels are hard to get; recall where labels come from:
  Experts (sometimes that is us)
  Time-will-tell can produce training labels (eg, patients who responded well to drug X)
  "The crowd" can label (eg, Amazon Mechanical Turk, hobbyists annotating sky images)
What might we do with unlabeled data?
  It tells us where future data will lie in feature space
  We might want algorithms to group/cluster the data

3 Clustering (given: points w/o labels)
Feature Space (the slide shows a scatter of unlabeled points drawn as dots)

4 Algorithm: K-Means Clustering
Choose K, the number of clusters
Randomly choose K examples; let these be the initial cluster centers
Repeat until nothing changes (or we run out of time):
  Assign each example to the closest cluster center
  Replace each cluster center with the centroid of its members (eg, use the mean value of each feature)
If we want to ‘label’ NEW data, assign it to the nearest cluster center
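A minimal sketch of these steps in Python/NumPy; the function name, random seed, and iteration cap are illustrative choices, not part of the slides.

```python
import numpy as np

def k_means(points, k, max_iters=100, seed=0):
    """Cluster the rows of `points` (an n x d array) into k groups."""
    rng = np.random.default_rng(seed)
    # randomly choose K examples as the initial cluster centers
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        # assign each example to the closest cluster center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # replace each center with the centroid (per-feature mean) of its members
        new_centers = np.array([points[assignment == j].mean(axis=0)
                                if np.any(assignment == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # nothing changed, so stop
            break
        centers = new_centers
    return centers, assignment

# 'labeling' a NEW example is just nearest-center assignment:
# label = np.linalg.norm(centers - new_point, axis=1).argmin()
```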

5 Some On-Line Visualizations
Very short video (13 seconds)
Short video (140 seconds)

6 Hierarchical Clustering
Find the two (unpaired) most similar examples, create a parent (pseudo) example using their average feature values
Repeat until one ‘super’ example is left
Only ‘look at’ the top portion of the result (depending on how many clusters one wants to inspect)
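A hedged sketch of this bottom-up procedure in Python/NumPy; the naive all-pairs search and the function name are my own choices, and practical implementations use faster linkage algorithms.

```python
import numpy as np

def agglomerate(points):
    """Repeatedly merge the two closest unpaired nodes into a parent
    pseudo-example whose features are their average."""
    # each active node is (feature vector, tree of original example indices)
    nodes = [(p, i) for i, p in enumerate(np.asarray(points, dtype=float))]
    while len(nodes) > 1:                       # repeat until one 'super' example is left
        best = None
        for i in range(len(nodes)):             # find the two most similar nodes
            for j in range(i + 1, len(nodes)):
                d = np.linalg.norm(nodes[i][0] - nodes[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # create the parent (pseudo) example using average feature values
        parent = ((nodes[i][0] + nodes[j][0]) / 2, (nodes[i][1], nodes[j][1]))
        nodes = [n for idx, n in enumerate(nodes) if idx not in (i, j)] + [parent]
    return nodes[0][1]   # nested tuples form the cluster tree; inspect its top for K clusters
```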

7 Hierarchical Clustering
If we want FOUR clusters (the slide shows where to cut the cluster tree so that four subtrees remain)

8 Expectation-Maximization (EM; Dempster et al. 1977)
Find the (locally) optimal settings for some parameters
Often used to find the most likely values for some missing feature values
E-step (reasoning): given a model, predict the most likely parameter values
M-step (learning): given parameter values, learn a new model (then go to the E-step unless the new model is the same as the old model)

9 cs540 - Fall 2015 (Shavlik©), Lecture 25, Week 14
EM: One Concrete Usage
Assume our missing parameter is the LABEL for all our examples (ie, unsupervised learning)
1. Initially guess labels for each example
2. Train Naïve Bayes using these now-labeled examples
3. Use the Naïve Bayes model to predict the labels for all the (originally) unlabeled ex’s
4. If any labels changed, go to Step 2
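A minimal sketch of this loop, using scikit-learn’s BernoulliNB as a stand-in for the Naïve Bayes learner covered earlier; the function name, the random initial labeling, and the round cap are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB   # assumed available as the NB learner

def em_label_unlabeled(X, seed=0, max_rounds=50):
    """X is an n x d array of Boolean features; returns a label per example."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=len(X))           # Step 1: guess a label for each example
    for _ in range(max_rounds):
        model = BernoulliNB(alpha=1.0).fit(X, labels)  # Step 2: train NB on the guessed labels (M-step)
        new_labels = model.predict(X)                  # Step 3: re-predict every label (E-step)
        if np.array_equal(new_labels, labels):         # Step 4: stop once no labels change
            break
        labels = new_labels
    return labels
```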

10 Worked Example (assume K=2, ie two categories)
(Table on the slide: examples with Boolean features A, B, C; a column of initial RANDOM assignments (T/F); and a column of new assignments made by using Model 1 to predict.)
Model 1 (use m=1 and 2 more T ex’s and 2 more F ex’s):
p(A|T)=        p(A|F)=0.33
p(B|T)=        p(B|F)=0.67
p(C|T)=        p(C|F)=
p(T)=          p(F)=0.38

11 Worked Example (assume K=2, ie two categories)
(The same table, with the new-assignment column now filled in with the T/F labels Model 1 predicts.)
In this tiny data set the predictions stabilized after the first round, so we are done
But in a larger dataset there are likely to be some label changes during the first few rounds
Model 1 (use m=1 and 2 more T ex’s and 2 more F ex’s):
p(A|T)=        p(A|F)=0.33
p(B|T)=        p(B|F)=0.67
p(C|T)=        p(C|F)=
p(T)=          p(F)=0.38
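The smoothed probabilities in Model 1 come from counting and then correcting with pseudo-counts. Below is a hedged sketch of one standard way to do this (the Mitchell-style m-estimate); whether this is exactly the pseudo-count scheme the slide’s “m=1 and 2 more T/F ex’s” refers to is an assumption, and the counts used are made-up placeholders.

```python
def m_estimate(n_match, n_class, m=1, prior=0.5):
    """Smoothed estimate of P(feature | class):
    n_match = examples of this class having the feature value,
    n_class = examples of this class, m = equivalent sample size."""
    return (n_match + m * prior) / (n_class + m)

# hypothetical counts: 2 of the 5 examples currently labeled T have feature B = true
p_B_given_T = m_estimate(2, 5, m=1, prior=0.5)   # (2 + 0.5) / (5 + 1) ≈ 0.42
```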

12 Worked Example TWO (assume K=2, ie two categories)
(Second worked-example table on the slide: the same features A, B, C with a different set of initial RANDOM assignments; the new assignments and all of Model 1b’s probabilities are left blank, to be worked out.)
Model 1b (use m=1 and 2 more T ex’s and 2 more F ex’s):
p(A|T)=        p(A|F)=
p(B|T)=        p(B|F)=
p(C|T)=        p(C|F)=
p(T)=          p(F)=

13 cs540 - Fall 2015 (Shavlik©), Lecture 25, Week 14
Some Notes on EM via NB
Could have BOTH labeled and unlabeled data – called semi-supervised ML
Could use the same idea to fill in missing feature values (a common use of EM): use that feature as the category to predict from the other features (might do this separately for POS and NEG examples) – a sketch follows this slide
More mathematical treatments of EM:
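A hedged sketch of that missing-feature idea: treat one Boolean column as the ‘category’, train Naïve Bayes on the other features, and iterate between re-training and re-predicting the missing entries. As before, scikit-learn’s BernoulliNB stands in for the NB learner, and the function and parameter names are illustrative only.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB   # assumed available, as in the earlier sketch

def fill_missing_feature(X, col, missing_mask, rounds=10):
    """Impute the entries of column `col` flagged in `missing_mask`
    by predicting that feature from all the other features."""
    others = [c for c in range(X.shape[1]) if c != col]
    X = X.copy()
    X[missing_mask, col] = 0                       # arbitrary initial guess for the missing values
    for _ in range(rounds):
        model = BernoulliNB(alpha=1.0).fit(X[:, others], X[:, col])   # learn a model (M-step)
        filled = model.predict(X[missing_mask][:, others])            # re-predict missing values (E-step)
        if np.array_equal(filled, X[missing_mask, col]):              # stop once nothing changes
            break
        X[missing_mask, col] = filled
    return X
```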

14 Recall: Auto-Association
Train an ANN to predict its own input
Another unsupervised method
(The slide shows a network diagram: the inputs are fed forward through a hidden layer that is trained to reproduce them at the output.)
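A small sketch of the auto-association idea: train a network whose target output is its own input, then read the (narrower) hidden layer as a learned encoding. scikit-learn’s MLPRegressor is used here for brevity, and the data and layer sizes are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor   # assumed available

X = np.random.rand(200, 10)                       # placeholder data: 200 examples, 10 features

# a hidden layer narrower than the input forces a compressed representation
auto = MLPRegressor(hidden_layer_sizes=(3,), activation='logistic',
                    max_iter=5000, random_state=0)
auto.fit(X, X)                                    # train the ANN to predict its own input

# the hidden-layer activations are a learned 3-number encoding of each example
hidden = 1.0 / (1.0 + np.exp(-(X @ auto.coefs_[0] + auto.intercepts_[0])))
```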

15 Wrapup on Unlabeled Data
Lots of unlabeled data exists; we want to make good use of it
Can cluster it
Can use it to augment the labeled data (semi-supervised ML)
A weakness of many algorithms that use unlabeled data is that the user must select the number of clusters
There are many equally good ways to cluster unlabeled data, so how do we judge which is best?
Before performing clustering, make sure you don’t really have a SUPERVISED ML task!

