Active Learning to Classify Email (4/22/05)

What’s the problem? How will I ever sort all these new emails?

What's the problem? To keep track of the mail I receive, I need to sort these new messages into folders. A great solution would let me sort just a few by hand while my computer sorts the rest for me. To make it really accurate, the assistant could even pick which messages I should sort manually, so that it learns to do the best job possible. (This is active learning.)

What's the solution? We need a way to choose the most informative training examples, which requires some way of ranking emails by how much labeling them would help the classifier. So, where do we start?

Email Classification So, what do we know about email classification?
- SVM and Naïve Bayes significantly outperform many other methods (Brutlag 2000, Kiritchenko 2001).
- Both SVM and Naïve Bayes support the incremental ("online") learning that this problem requires (Cauwenberghs 2000).
- Classifier accuracy varies more between users than between algorithms (Kiritchenko 2001).
- SVM performs better for users with more email in each folder (Brutlag 2000).
- Users with a lot of email, as in our example problem, tend to have more email in each folder than other users (Klimt 2004).
Thus, we have chosen SVM as the basis for this research.

“Bag-of-Words” Model The slide's diagram shows the pipeline: email data -> “bag of words” representation -> SVM -> classification decision.
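
A minimal sketch of this pipeline using scikit-learn (the example emails and folder labels are made up; this is not the authors' original code):

```python
# email data -> "bag of words" -> SVM -> classification decision
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

emails = ["Meeting moved to 3pm tomorrow",
          "Cheap meds, click here now",
          "Agenda for tomorrow's meeting attached"]
folders = ["work", "spam", "work"]          # hypothetical folder labels

vectorizer = CountVectorizer()              # bag of words: token counts, order discarded
X = vectorizer.fit_transform(emails)        # sparse document-term matrix
clf = LinearSVC().fit(X, folders)           # linear SVM over the word counts

print(clf.predict(vectorizer.transform(["meeting rescheduled to 4pm"])))
```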

Multiple SVMs Use a separate SVM for each section. The slide's diagram shows the pipeline: email data -> per-section SVMs -> LLSF -> classification decision.
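
A minimal sketch of this layout, assuming the "sections" are email fields such as the subject and body (the slide does not say which), and with fixed weights standing in for the LLSF (linear least-squares fit) step, which would learn the combination weights from data:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# hypothetical training data: one text per section, plus binary folder labels
subjects = ["budget Q3", "lunch?", "budget review"]
bodies = ["spreadsheet attached", "pizza on friday", "numbers look fine"]
labels = np.array([1, 0, 1])                   # 1 = "finance", 0 = "social"

models = []                                    # one bag-of-words SVM per section
for texts in (subjects, bodies):
    vec = CountVectorizer().fit(texts)
    clf = LinearSVC().fit(vec.transform(texts), labels)
    models.append((vec, clf))

weights = [0.6, 0.4]                           # stand-in for learned LLSF weights
def score(subject, body):
    return sum(w * clf.decision_function(vec.transform([part]))[0]
               for w, (vec, clf), part in zip(weights, models, (subject, body)))

print(score("budget meeting", "see attached spreadsheet"))  # positive suggests "finance"
```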

Active Learning with SVM In general, unlabeled examples closer to the decision hyperplane will, once labeled, displace that boundary more (Schohn and Cohn 2000, Tong 2001). Those are the examples worth asking the user about.
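
A minimal sketch of that selection rule with a linear SVM, using made-up 2-D feature vectors as stand-ins for vectorized emails:

```python
import numpy as np
from sklearn.svm import LinearSVC

# toy labeled pool and unlabeled pool (stand-ins for bag-of-words vectors)
X_labeled = np.array([[0.0, 1.0], [0.1, 0.8], [1.0, 0.0], [0.9, 0.1]])
y_labeled = np.array([0, 0, 1, 1])
X_unlabeled = np.array([[0.5, 0.5], [0.0, 0.9], [1.0, 0.05]])

clf = LinearSVC().fit(X_labeled, y_labeled)
margins = np.abs(clf.decision_function(X_unlabeled))  # proportional to distance from hyperplane
query_idx = int(np.argmin(margins))                   # most uncertain example
print(query_idx)   # ask the user to label this email first (here, likely index 0)
```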

What if our prediction is right? (The slide's figures contrast the boundary movement from labeling the closer example vs. labeling the farther example.)

And if our prediction is wrong? (Again shown in figures: picking the closer example vs. picking the farther example.)

Incorporating Diversity In the example pictured on the slide, the instance near the top is intuitively more likely to be informative. Preferring query batches whose members differ from one another is known as “diversity” (Brinker 2003).
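
A hedged sketch of a Brinker-style batch selector: it trades closeness to the hyperplane off against the maximum cosine similarity to examples already chosen for the batch. The parameter `lam` and all the names are assumptions, not Brinker's notation:

```python
import numpy as np

def select_batch(margins, X_pool, batch_size, lam=0.5):
    margins = np.asarray(margins, dtype=float)
    X = X_pool / np.linalg.norm(X_pool, axis=1, keepdims=True)  # unit vectors
    chosen = [int(np.argmin(margins))]             # seed with the most uncertain
    while len(chosen) < batch_size:
        sim = np.abs(X @ X[chosen].T).max(axis=1)  # max |cosine| to the chosen set
        score = lam * margins + (1 - lam) * sim    # low = uncertain AND diverse
        score[chosen] = np.inf                     # never re-pick an example
        chosen.append(int(np.argmin(score)))
    return chosen

# two near-duplicates close to the boundary, plus one distinct direction
X_pool = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
print(select_batch([0.10, 0.12, 0.30], X_pool, batch_size=2))   # -> [0, 2]
```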

Active Learning with SVM But what about when you have multiple SVMs (like one-vs-rest)? (Yan 2003)
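
A hedged sketch of one common heuristic for this case (not necessarily Yan 2003's exact method): query the email whose best and second-best one-vs-rest scores are closest together, i.e. the email the classifier is most torn about:

```python
import numpy as np

def bvsb_query(scores):
    """scores: (n_emails, n_folders) one-vs-rest decision_function outputs."""
    top2 = np.sort(scores, axis=1)[:, -2:]           # two highest scores per email
    return int(np.argmin(top2[:, 1] - top2[:, 0]))   # smallest gap = most confused

scores = np.array([[0.90, -0.20, -0.70],   # confidently folder 0
                   [0.40,  0.35, -0.50]])  # torn between folders 0 and 1
print(bvsb_query(scores))                  # -> 1
```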

The Enron Corpus: 150+ users, 200,000 emails.

Initial Results Trained on 10%, tested on 90%. (Results chart omitted from this transcript.)

Chrono-Diverse Algorithm The way a user sorts email changes over time. Pick training data that are maximally different from previous data with respect to time.
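
A minimal sketch of that rule, assuming each email carries a numeric timestamp (all names here are made up):

```python
import numpy as np

def chrono_diverse_query(timestamps, train_idx):
    t = np.asarray(timestamps, dtype=float)   # e.g. Unix times, one per email
    # each email's time gap to its nearest already-labeled neighbor
    gaps = np.abs(t[:, None] - t[train_idx][None, :]).min(axis=1)
    gaps[train_idx] = -np.inf                 # exclude already-labeled emails
    return int(np.argmax(gaps))               # the maximally time-distant email

timestamps = [0, 10, 500, 990, 1000]          # toy mailbox timeline
print(chrono_diverse_query(timestamps, train_idx=[0, 4]))   # -> 2 (the middle)
```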

Combination Algorithm Combine strengths of Standard and Chrono-Diverse. Take a weighted combination of their results. Adjust weighting with parameter lambda.
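
A minimal sketch of the blend, guessing at the details the slide leaves out (here both criteria are min-max normalized before weighting):

```python
import numpy as np

def combined_query(margins, gaps, lam=0.5):
    def norm(v):
        v = np.asarray(v, dtype=float)
        rng = v.max() - v.min()
        return (v - v.min()) / rng if rng else np.zeros_like(v)
    # small margin is good (uncertain); large time gap is good (diverse)
    score = lam * norm(margins) - (1 - lam) * norm(gaps)
    return int(np.argmin(score))   # lam=1: pure Standard; lam=0: pure Chrono-Diverse

print(combined_query(margins=[0.1, 0.5, 0.4], gaps=[0, 300, 900]))   # -> 2
```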

Results Trained on 10%, tested on 90%. (Results chart omitted from this transcript.)

Parameter Tuning (Chart omitted from this transcript.)

Conclusions The state-of-the-art algorithm for active learning with text classification performs horribly on email data! Choosing emails for time diversity works very well, and combining the two approaches works best.

Future Work
- Improve the efficiency of SVM, or find a better alternative.
- Determine when chronological diversity performs best and worst.
- Adapt the algorithm to online classification.