1
Text Categorization With Support Vector Machines: Learning With Many Relevant Features By Thorsten Joachims Presented By Meghneel Gore
2
Goal of Text Categorization Classify documents into a number of predefined categories. Documents can be in multiple categories Documents can be in none of the categories
3
Applications of Text Categorization Categorization of news stories for online retrieval Finding interesting information from the WWW Guiding a user's search through hypertext
4
Representation of Text Removal of stop words Reduction of words to their stems Preparation of the feature vector
5
Representation of Text Each document is mapped to a vector of word-stem counts, e.g. Comput: 2, Process: 1, Buy: 2, Memory: 3, ... This is a document vector
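As a rough illustration of the representation step above, here is a minimal Python sketch that removes stop words, reduces words to their stems, and counts the remaining stems into a document vector. The stop-word list, the choice of NLTK's PorterStemmer, and the sample sentence are assumptions added for illustration, not part of the original slides.

# Minimal sketch of the text-representation step: stop-word removal,
# stemming, and a term-frequency document vector.
# The stop-word list, stemmer, and sample text are illustrative assumptions.
import re
from collections import Counter

from nltk.stem import PorterStemmer  # assumed stemmer; any stemmer would do

STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "more"}  # toy list

def document_vector(text):
    # Map raw text to a bag-of-stems count vector.
    tokens = re.findall(r"[a-z]+", text.lower())          # tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop words
    stemmer = PorterStemmer()
    stems = [stemmer.stem(t) for t in tokens]             # reduce words to stems
    return Counter(stems)                                 # term-frequency vector

print(document_vector("Buying more memory speeds up the computing process."))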
6
What's Next... Appropriateness of support vector machines for this application Support vector machine theory Conventional learning methods Experiments Results Conclusions
7
Why SVMs? High-dimensional input space Few irrelevant features Sparse document vectors Most text categorization problems are linearly separable
8
Support Vector Machines Visualization of a Support Vector Machine
9
Support Vector Machines Structural risk minimization
10
Support Vector Machines We define a nested structure of hypothesis spaces H_i such that their respective VC dimensions d_i increase
11
Support Vector Machines Lemma [Vapnik, 1982]: Consider hyperplanes h(x) = sign(w · x + b) as hypotheses.
12
Support Vector Machines If all example vectors x_i are contained in a hypersphere of radius R and it is required that |w · x_i + b| >= 1 for all examples,
13
Support Vector Machines then this set of hyperplanes has a VC dimension d bounded by d <= min(R^2 ||w||^2, n) + 1, where n is the dimensionality of the input space
14
Support Vector Machines Minimize (1/2) ||w||^2 such that y_i (w · x_i + b) >= 1 for every training example (x_i, y_i)
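To make the optimization on this slide concrete, the following sketch trains a maximal-margin linear classifier on TF-IDF document vectors. scikit-learn's LinearSVC and the tiny hand-made corpus are stand-in assumptions for illustration; they are not the SVM implementation or the data used in the paper.

# Sketch: linear SVM text classifier on TF-IDF document vectors.
# LinearSVC solves the soft-margin form of the optimization on the slide.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train_docs = [
    "stock prices rose on strong earnings",    # finance (label 1)
    "the central bank raised interest rates",  # finance (label 1)
    "the team won the championship game",      # sports  (label 0)
    "the striker scored twice in the final",   # sports  (label 0)
]
train_labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer(stop_words="english")  # stop-word removal + weighting
X_train = vectorizer.fit_transform(train_docs)      # sparse document vectors

clf = LinearSVC(C=1.0)                              # maximal-margin hyperplane
clf.fit(X_train, train_labels)

X_test = vectorizer.transform(["interest rates and stock markets fell"])
print(clf.predict(X_test))                          # expect the finance label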
15
Conventional Learning Methods Naïve Bayes classifier Rocchio algorithm K-nearest Neighbors Decision tree classifier
16
Naïve Bayes Classifier Consider a document vector with attributes a_1, a_2, ..., a_n and a set of possible target values v. The Bayesian approach picks the most probable value: v_MAP = argmax_v P(v | a_1, a_2, ..., a_n)
17
Naïve Bayes Classifier Using Bayes' theorem, and dropping the denominator P(a_1, ..., a_n), which is the same for every v, this can be rewritten as v_MAP = argmax_v P(a_1, a_2, ..., a_n | v) P(v)
18
Naïve Bayes Classifier The Naïve Bayes method assumes that the attributes are conditionally independent given the target value, so P(a_1, ..., a_n | v) = Π_i P(a_i | v) and v_NB = argmax_v P(v) Π_i P(a_i | v)
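The independence assumption reduces the classifier to a product of per-attribute probabilities. Below is a minimal hand-rolled multinomial Naïve Bayes sketch of that rule; the add-one (Laplace) smoothing and the toy training data are assumptions added for illustration.

# Minimal multinomial Naive Bayes sketch: v_NB = argmax_v P(v) * prod_i P(a_i | v).
# Log-probabilities avoid underflow; add-one smoothing is an assumed choice.
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    class_counts = Counter(labels)          # for the prior P(v)
    word_counts = defaultdict(Counter)      # word counts per class, for P(a_i | v)
    vocab = set()
    for tokens, v in zip(docs, labels):
        word_counts[v].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify_nb(tokens, class_counts, word_counts, vocab):
    n_docs = sum(class_counts.values())
    best_v, best_score = None, float("-inf")
    for v in class_counts:
        total = sum(word_counts[v].values())
        score = math.log(class_counts[v] / n_docs)   # log P(v)
        for a in tokens:                             # independence assumption
            score += math.log((word_counts[v][a] + 1) / (total + len(vocab)))
        if score > best_score:
            best_v, best_score = v, score
    return best_v

docs = [["stock", "market", "rates"], ["game", "score", "team"]]
labels = ["finance", "sports"]
model = train_nb(docs, labels)
print(classify_nb(["market", "rates"], *model))      # expect "finance"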
19
Experiments Datasets Performance measures Results
20
Datasets Reuters-21578 dataset: 9,603 training documents, 3,299 test documents Ohsumed corpus: 10,000 training documents, 10,000 test documents
21
Performance Measures Precision: probability that a document predicted to be in class 'x' truly belongs to that class Recall: probability that a document belonging to class 'x' is classified into that class Precision/recall break-even point: the value at which precision equals recall as the decision threshold is varied
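A small sketch of how the break-even point can be computed, assuming the classifier returns a real-valued score per document and the threshold is swept over the ranked list; the scores and labels below are made up for illustration.

# Sketch: precision/recall break-even point from real-valued classifier scores.
# Rank documents by score, sweep the cutoff, and report where precision ~= recall.
def breakeven_point(scores, labels):
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)                    # documents truly in class x
    best, tp = None, 0
    for k, (_, y) in enumerate(ranked, start=1):
        tp += y
        precision = tp / k                 # predicted-in-class that truly are
        recall = tp / n_pos                # truly-in-class that were predicted
        if best is None or abs(precision - recall) < abs(best[0] - best[1]):
            best = (precision, recall)
    return (best[0] + best[1]) / 2         # the (near-)equal value

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]    # classifier confidence per document
labels = [1, 1, 0, 1, 0, 0]                # 1 = document truly in class x
print(breakeven_point(scores, labels))     # prints about 0.67 for this toy example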
22
Results Precision/recall break-even point on Ohsumed dataset
23
Results Precision/recall break-even point on Reuters dataset
24
Conclusions Introduces SVMs for text categorization Theoretical and empirical evidence that SVMs are well suited for text categorization Consistent improvement in accuracy over other methods