Active Learning: An Example from Xu et al., “Training SpamAssassin with Active Semi-Supervised Learning”

2 Semi-Supervised and Active Learning
Semi-supervised learning: using a combination of labeled and unlabeled examples, or using partially labeled examples.
Active learning: having the learning system decide which examples to ask an oracle to label.
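To make the active-learning definition concrete, here is a minimal sketch of pool-based active learning with uncertainty sampling, a common query strategy. It is not the clustering-based strategy Xu et al. use (slide 4). It assumes a single numeric feature, a threshold classifier, separable labels, and an initial labeled set with at least one example of each class; `fit_threshold`, `active_learn` and the `oracle` callback are hypothetical names.

```python
from typing import Callable


def fit_threshold(labeled: dict[int, int]) -> float:
    """Place the boundary midway between the largest negative and smallest positive example."""
    largest_neg = max(x for x, y in labeled.items() if y == 0)
    smallest_pos = min(x for x, y in labeled.items() if y == 1)
    return (largest_neg + smallest_pos) / 2


def active_learn(pool: list[int], labeled: dict[int, int],
                 oracle: Callable[[int], int], budget: int) -> tuple[float, list[int]]:
    """Pool-based active learning: repeatedly query the point nearest the current boundary."""
    labeled = dict(labeled)  # don't mutate the caller's dict
    unlabeled = [x for x in pool if x not in labeled]
    queried = []
    for _ in range(budget):
        if not unlabeled:
            break
        boundary = fit_threshold(labeled)
        # Uncertainty sampling: the classifier is least sure about points near the boundary.
        # Ties are broken by pool order, since min() returns the first minimal item.
        x = min(unlabeled, key=lambda u: abs(u - boundary))
        labeled[x] = oracle(x)  # ask the oracle (e.g. the user) for this one label
        unlabeled.remove(x)
        queried.append(x)
    return fit_threshold(labeled), queried


# Usage: the true label is 1 for x > 63; start with one labeled example per class.
threshold, asked = active_learn(list(range(101)), {0: 0, 100: 1},
                                oracle=lambda x: int(x > 63), budget=7)
print(threshold, asked)  # 63.5 [50, 75, 62, 68, 65, 63, 64]
```

Seven well-chosen queries pin the boundary between 63 and 64, whereas labeling the pool blindly could take up to 99 labels. The learner, not the user, decides which examples are worth labeling.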

3 SpamAssassin
SpamAssassin asks users to label e-mail, but:
– users often don’t do it;
– users may not label the “most informative” examples.
SpamAssassin “self-training”:
– Train a classifier on a small number of labeled examples.
– Run the classifier on the unlabeled examples, and add the ones classified with high confidence to the original training set. (Problem: the examples classified with high confidence are not necessarily the most informative ones.)
– Retrain the classifier with the new, larger training set.
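The self-training loop can be sketched as below. This is a minimal sketch, assuming a multinomial Naive Bayes classifier (the classifier Xu et al. train, per slide 4), e-mails already tokenized into word lists, and a hypothetical confidence threshold of 0.8. `NaiveBayes` and `self_train` are illustrative names, not part of SpamAssassin.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict_proba(self, doc):
        """Posterior P(class | doc); tokens never seen in training are ignored."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for w in doc:
                if w in self.vocab:
                    s += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = s
        m = max(scores.values())  # log-sum-exp normalisation for numerical stability
        z = sum(math.exp(s - m) for s in scores.values())
        return {c: math.exp(s - m) / z for c, s in scores.items()}


def self_train(docs, labels, unlabeled, threshold=0.8, max_rounds=5):
    """Add high-confidence predictions to the training set and retrain, until none qualify."""
    docs, labels, pool = list(docs), list(labels), list(unlabeled)
    model = NaiveBayes().fit(docs, labels)
    for _ in range(max_rounds):
        still_unlabeled, added = [], 0
        for d in pool:
            probs = model.predict_proba(d)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:  # confident: adopt the predicted label
                docs.append(d)
                labels.append(best)
                added += 1
            else:
                still_unlabeled.append(d)
        pool = still_unlabeled
        if added == 0:
            break
        model = NaiveBayes().fit(docs, labels)  # retrain on the larger training set
    return model, pool


model, rest = self_train(
    [["cheap", "viagra"], ["meeting", "tomorrow"]], ["spam", "ham"],
    [["cheap", "cheap", "viagra"], ["meeting", "meeting", "tomorrow"], ["hello"]])
print(rest)  # [['hello']]
```

The two e-mails the model already agrees with are absorbed, while `["hello"]`, which gives the model no evidence either way, stays unlabeled. Only examples the current classifier is already confident about get added, which is the weakness the slide points out.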

4 Xu et al. paper: Method
Supervised learning: Train a Naive Bayes classifier on a small subset of (labeled) e-mails.
Semi-supervised learning: Then run SpamAssassin’s self-learning method, selecting a large number of new examples to add to the training set. Retrain the classifier.
Active learning:
– Cluster the remaining unlabeled e-mails using k-means (on term-frequency feature vectors) with Euclidean distance.
– Select q representative unlabeled e-mails, first from “pure” clusters, then from “impure” clusters, making sure that many clusters are sampled. The e-mails selected from each cluster are the ones closest to the cluster centroids.
– Ask the user to label these q examples.
– For each of these q examples, if the corresponding cluster is “pure”, propagate its label to a fraction p of that cluster.
– Add the newly labeled examples to the training set, and retrain the classifier.
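The active-learning step can be sketched as follows. This is a minimal sketch under assumptions the slide does not settle: vectors are plain lists of term counts, k-means uses farthest-first initialisation, a cluster counts as “pure” if the current classifier predicts one label for all its members, one representative is taken per cluster (so q ≤ k), and a label is propagated to the fraction p of the cluster nearest its centroid. `kmeans`, `active_round`, `predicted` and `ask_user` are hypothetical names.

```python
import math


def dist(a, b):
    """Euclidean distance between two term-frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20):
    """Lloyd's k-means; farthest-first initialisation is an assumption of this sketch."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def active_round(points, predicted, k, q, p, oracle):
    """One query-and-propagate round; returns {e-mail index: label} to add to training."""
    centroids, assign = kmeans(points, k)
    clusters = [[i for i, a in enumerate(assign) if a == c] for c in range(k)]
    # Members of each cluster, nearest to its centroid first.
    ranked = [sorted(m, key=lambda i: dist(points[i], centroids[c]))
              for c, m in enumerate(clusters)]
    # Hypothetical purity test: the current classifier predicts one label for every member.
    pure = [len({predicted[i] for i in m}) == 1 for m in clusters]
    # Pure clusters first, then impure ones; one representative per cluster.
    order = ([c for c in range(k) if clusters[c] and pure[c]]
             + [c for c in range(k) if clusters[c] and not pure[c]])
    new_labels = {}
    for c in order[:q]:
        rep = ranked[c][0]  # the e-mail closest to the centroid
        label = oracle(rep)  # ask the user
        new_labels[rep] = label
        if pure[c]:
            # Assumption: propagate to the fraction p of members nearest the centroid.
            for i in ranked[c][:int(p * len(ranked[c]))]:
                new_labels.setdefault(i, label)
    return new_labels


points = [[6, 1], [5, 1], [8, 1], [6, 2],  # counts of ("cheap", "meeting"): spam-like
          [1, 6], [1, 5], [1, 8], [2, 6],  # ham-like
          [3, 3], [4, 4], [3, 4]]          # mixed
predicted = ["spam"] * 4 + ["ham"] * 4 + ["spam", "ham", "spam"]


def ask_user(i):
    """Stands in for the human labeler in this toy example."""
    return "spam" if points[i][0] > points[i][1] else "ham"


new = active_round(points, predicted, k=3, q=3, p=0.5, oracle=ask_user)
print(new)  # {0: 'spam', 3: 'spam', 4: 'ham', 7: 'ham', 10: 'ham'}
```

Each pure cluster contributes the user’s label plus one propagated label, while the impure middle cluster only gets the label for its representative. With `q=2`, only the two pure clusters would be queried. The returned labels would then be added to the training set before retraining.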

5 Xu et al. paper: Results
The method was run on a large corpus (75K) of e-mails.