Text Categorization With Support Vector Machines: Learning With Many Relevant Features, by Thorsten Joachims. Presented by Meghneel Gore.


1 Text Categorization With Support Vector Machines: Learning With Many Relevant Features, by Thorsten Joachims. Presented by Meghneel Gore.

2 Goal of Text Categorization: Classify documents into a number of predefined categories. A document can belong to multiple categories or to none of them.

3 Applications of Text Categorization: Categorization of news stories for online retrieval; finding interesting information on the WWW; guiding a user's search through hypertext.

4 Representation of Text: Removal of stop words; reduction of each word to its stem; preparation of the feature vector.

5 Representation of Text: [Figure: a document mapped to a document vector; entries shown: 2 Comput, 1 Process, 2 Buy, 3 Memory, ...]
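The three steps on slide 4 (stop-word removal, stemming, feature vector) can be sketched as follows. This is a minimal illustration, not the preprocessing used in the paper: the stop-word list, the suffix-stripping rule `crude_stem`, and the sample sentence are all assumptions made up for the example, and a real system would use a full stop-word list and a proper stemmer.

```python
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "for", "with", "more"}

# Crude suffix stripping standing in for a real stemmer (hypothetical rule set).
SUFFIXES = ("ing", "ers", "er", "es", "ed", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def document_vector(text):
    """Map a document to {stem: count} after stop-word removal."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    stems = [crude_stem(t) for t in tokens if t and t not in STOP_WORDS]
    return Counter(stems)


doc = "Computers and computing: buying a computer with more memory."
vector = document_vector(doc)
print(sorted(vector.items()))
```

Each distinct stem becomes one feature, so the vector is high-dimensional over a whole corpus but sparse for any single document.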

6 What's Next: Appropriateness of support vector machines for this application; support vector machine theory; conventional learning methods; experiments; results; conclusions.

7 Why SVMs? High-dimensional input space; few irrelevant features; sparse document vectors; most text categorization problems are linearly separable.

8 Support Vector Machines: [Figure: visualization of a support vector machine]

9 Support Vector Machines: Structural risk minimization.

10 Support Vector Machines: We define a structure of hypothesis spaces $H_i$ such that their respective VC dimensions $d_i$ increase.

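11 Support Vector Machines: Lemma [Vapnik, 1982]. Consider hyperplanes $h(\vec{d}) = \mathrm{sign}\{\vec{w} \cdot \vec{d} + b\}$ as hypotheses.

12 Support Vector Machines: If all example vectors $\vec{d}_i$ are contained in a hypersphere of radius $R$ and it is required that $|\vec{w} \cdot \vec{d}_i + b| \ge 1$ for all examples, with $\|\vec{w}\| = A$,

13 Support Vector Machines: then this set of hyperplanes has a VC dimension $d$ bounded by $d \le \min([R^2 A^2], n) + 1$, where $n$ is the dimensionality of the input space.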

14 Support Vector Machines: Minimize $\|\vec{w}\|$ such that $\forall i: y_i (\vec{w} \cdot \vec{d}_i + b) \ge 1$.
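In Joachims's paper the optimization on slide 14 reads: minimize $\|\vec{w}\|$ subject to $y_i (\vec{w} \cdot \vec{d}_i + b) \ge 1$ for every training example. The sketch below does not solve that problem; it only checks the constraint for candidate hyperplanes and compares their margins, to show why a smaller norm is preferred. The function names and the toy data are assumptions for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_margin(w, b, examples):
    """True if every (d_i, y_i) satisfies y_i * (w . d_i + b) >= 1."""
    return all(y * (dot(w, d) + b) >= 1 for d, y in examples)


def geometric_margin(w):
    """Distance from the hyperplane to points where |w . d + b| = 1, i.e. 1/||w||."""
    return 1.0 / math.sqrt(dot(w, w))


# Toy, linearly separable data: (vector, label) with labels in {+1, -1}.
examples = [
    ((2.0, 2.0), 1),
    ((3.0, 3.0), 1),
    ((0.0, 0.0), -1),
    ((-1.0, 0.0), -1),
]

wide = ((0.5, 0.5), -1.0)      # feasible, smaller norm
narrow = ((1.0, 1.0), -2.0)    # feasible, larger norm
invalid = ((0.25, 0.25), -0.5)  # violates the constraint for (2, 2)

for w, b in (wide, narrow, invalid):
    print(w, b, satisfies_margin(w, b, examples), round(geometric_margin(w), 3))
```

Among the hyperplanes that satisfy the constraint, the one with the smallest $\|\vec{w}\|$ has the widest margin, which is what the minimization selects.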

15 Conventional Learning Methods: Naïve Bayes classifier; Rocchio algorithm; k-nearest neighbors; decision tree classifier.

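16 Naïve Bayes Classifier: Consider a document vector with attributes $a_1, a_2, \ldots, a_n$ and target values $v_j \in V$. Bayesian approach: $v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n)$.

17 Naïve Bayes Classifier: We can rewrite that using Bayes' theorem as $v_{MAP} = \arg\max_{v_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)}{P(a_1, a_2, \ldots, a_n)} = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)$.

18 Naïve Bayes Classifier: The naïve Bayes method assumes that the attributes are conditionally independent given the target value, so $v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)$.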
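Under the independence assumption, the naïve Bayes decision rule picks the target value $v$ that maximizes $P(v) \prod_i P(a_i \mid v)$. A minimal sketch for word attributes follows. The Laplace (add-one) smoothing, the choice to skip unseen words, the function names, and the toy training set are assumptions for illustration and are not taken from the slides.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents: list of (list_of_words, label). Returns priors, counts, vocabulary."""
    label_counts = Counter(label for _, label in documents)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in documents:
        word_counts[label].update(words)
        vocabulary.update(words)
    priors = {v: n / len(documents) for v, n in label_counts.items()}
    return priors, word_counts, vocabulary


def classify(words, priors, word_counts, vocabulary):
    """Return argmax over v of log P(v) + sum_i log P(a_i | v)."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for word in words:
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing: an assumption, not from the slides.
            p = (word_counts[label][word] + 1) / (total + len(vocabulary))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train = [
    (["wheat", "grain", "export"], "grain"),
    (["grain", "harvest", "wheat"], "grain"),
    (["profit", "shares", "dividend"], "earn"),
    (["quarter", "profit", "net"], "earn"),
]
model = train_naive_bayes(train)
print(classify(["wheat", "harvest"], *model))
print(classify(["profit", "net"], *model))
```

Summing log-probabilities instead of multiplying probabilities avoids numeric underflow on long documents; the argmax is unchanged.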

19 Experiments: Datasets; performance measures; results.

20 Datasets: Reuters-21578 dataset (9603 training documents, 3299 test documents); Ohsumed corpus (10000 training documents, 10000 test documents).

21 Performance Measures: Precision is the probability that a document predicted to be in class 'x' truly belongs to that class. Recall is the probability that a document belonging to class 'x' is classified into that class. The precision/recall breakeven point is the value at which precision and recall are equal.
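The two measures on slide 21 can be computed directly from sets of document ids. For the breakeven point, one common way (assumed here, not necessarily the procedure used in the paper) is to rank documents by classifier score and predict exactly as many positives as there are true members of the class, at which point precision and recall coincide. The scores and ids below are invented for the example.

```python
def precision_recall(predicted, actual):
    """predicted, actual: sets of document ids assigned to / truly in class x."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def breakeven_point(scores, actual):
    """scores: {doc_id: score} for all documents. Predict the top-k, k = |actual|."""
    k = len(actual)
    ranked = sorted(scores, key=scores.get, reverse=True)
    predicted = set(ranked[:k])
    precision, _recall = precision_recall(predicted, actual)
    return precision  # equals recall because |predicted| == |actual|


# Hypothetical classifier scores and ground truth for one class.
scores = {"d1": 0.9, "d2": 0.8, "d3": 0.4, "d4": 0.3, "d5": 0.1}
actual = {"d1", "d3", "d4"}

p, r = precision_recall({"d1", "d2"}, actual)
bep = breakeven_point(scores, actual)
print(p, r, bep)
```

Raising the decision threshold typically trades recall for precision, which is why a single breakeven value is a convenient summary for comparing classifiers.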

22 Results: Precision/recall breakeven point on the Ohsumed dataset.

23 Results: Precision/recall breakeven point on the Reuters dataset.

24 Conclusions: The paper introduces SVMs for text categorization; it gives theoretical and empirical evidence that SVMs are well suited to the task; SVMs show a consistent improvement in accuracy over the other methods.

