1
Text Categorization With Support Vector Machines: Learning With Many Relevant Features By Thorsten Joachims Presented By Meghneel Gore
2
Goal of Text Categorization Classify documents into a number of predefined categories. Documents can be in multiple categories Documents can be in none of the categories
3
Applications of Text Categorization Categorization of news stories for online retrieval Finding interesting information from the WWW Guiding a user's search through hypertext
4
Representation of Text Removal of stop words Reduction of words to their stems Preparation of the feature vector
5
Representation of Text Each document is mapped to a vector of word-stem counts, e.g. Comput: 2, Process: 1, Buy: 2, Memory: 3, ... This is a document vector
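As a rough illustration of the representation step above, here is a minimal Python sketch that removes stop words, reduces words to their stems, and counts the remaining stems into a document vector. The stop-word list, the choice of NLTK's PorterStemmer, and the sample sentence are assumptions added for illustration, not part of the original slides.

# Minimal sketch of the text-representation step: stop-word removal,
# stemming, and a term-frequency document vector.
# The stop-word list, stemmer, and sample text are illustrative assumptions.
import re
from collections import Counter

from nltk.stem import PorterStemmer  # assumed stemmer; any stemmer would do

STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "more"}  # toy list

def document_vector(text):
    # Map raw text to a bag-of-stems count vector.
    tokens = re.findall(r"[a-z]+", text.lower())          # tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop words
    stemmer = PorterStemmer()
    stems = [stemmer.stem(t) for t in tokens]             # reduce words to stems
    return Counter(stems)                                 # term-frequency vector

print(document_vector("Buying more memory speeds up the computing process."))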
6
What's Next... Appropriateness of support vector machines for this application Support vector machine theory Conventional learning methods Experiments Results Conclusions
7
Why SVMs? High-dimensional input space Few irrelevant features Sparse document vectors Most text categorization problems are linearly separable
8
Support Vector Machines Visualization of a Support Vector Machine
9
Support Vector Machines Structural risk minimization
10
Support Vector Machines We define a nested structure of hypothesis spaces H_i such that their respective VC dimensions d_i increase
11
Support Vector Machines Lemma [Vapnik, 1982]: Consider hyperplanes h(x) = sign(w · x + b) as hypotheses.
12
Support Vector Machines If all example vectors x_i are contained in a hypersphere of radius R and it is required that |w · x_i + b| >= 1 for all examples,
13
Support Vector Machines then this set of hyperplanes has a VC dimension d bounded by d <= min(R^2 ||w||^2, n) + 1, where n is the dimensionality of the input space
14
Support Vector Machines Minimize (1/2) ||w||^2 such that y_i (w · x_i + b) >= 1 for every training example (x_i, y_i)
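To make the optimization on this slide concrete, the following sketch trains a maximal-margin linear classifier on TF-IDF document vectors. scikit-learn's LinearSVC and the tiny hand-made corpus are stand-in assumptions for illustration; they are not the SVM implementation or the data used in the paper.

# Sketch: linear SVM text classifier on TF-IDF document vectors.
# LinearSVC solves the soft-margin form of the optimization on the slide.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train_docs = [
    "stock prices rose on strong earnings",    # finance (label 1)
    "the central bank raised interest rates",  # finance (label 1)
    "the team won the championship game",      # sports  (label 0)
    "the striker scored twice in the final",   # sports  (label 0)
]
train_labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer(stop_words="english")  # stop-word removal + weighting
X_train = vectorizer.fit_transform(train_docs)      # sparse document vectors

clf = LinearSVC(C=1.0)                              # maximal-margin hyperplane
clf.fit(X_train, train_labels)

X_test = vectorizer.transform(["interest rates and stock markets fell"])
print(clf.predict(X_test))                          # expect the finance label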
15
Conventional Learning Methods Naïve Bayes classifier Rocchio algorithm K-nearest Neighbors Decision tree classifier
16
Naïve Bayes Classifier Consider a document vector with attributes a_1, a_2, ..., a_n and a set of possible target values v. The Bayesian approach picks the most probable value: v_MAP = argmax_v P(v | a_1, a_2, ..., a_n)
17
Naïve Bayes Classifier Using Bayes' theorem, and dropping the denominator P(a_1, ..., a_n), which is the same for every v, this can be rewritten as v_MAP = argmax_v P(a_1, a_2, ..., a_n | v) P(v)
18
Naïve Bayes Classifier The Naïve Bayes method assumes that the attributes are conditionally independent given the target value, so P(a_1, ..., a_n | v) = Π_i P(a_i | v) and v_NB = argmax_v P(v) Π_i P(a_i | v)
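The independence assumption reduces the classifier to a product of per-attribute probabilities. Below is a minimal hand-rolled multinomial Naïve Bayes sketch of that rule; the add-one (Laplace) smoothing and the toy training data are assumptions added for illustration.

# Minimal multinomial Naive Bayes sketch: v_NB = argmax_v P(v) * prod_i P(a_i | v).
# Log-probabilities avoid underflow; add-one smoothing is an assumed choice.
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    class_counts = Counter(labels)          # for the prior P(v)
    word_counts = defaultdict(Counter)      # word counts per class, for P(a_i | v)
    vocab = set()
    for tokens, v in zip(docs, labels):
        word_counts[v].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify_nb(tokens, class_counts, word_counts, vocab):
    n_docs = sum(class_counts.values())
    best_v, best_score = None, float("-inf")
    for v in class_counts:
        total = sum(word_counts[v].values())
        score = math.log(class_counts[v] / n_docs)   # log P(v)
        for a in tokens:                             # independence assumption
            score += math.log((word_counts[v][a] + 1) / (total + len(vocab)))
        if score > best_score:
            best_v, best_score = v, score
    return best_v

docs = [["stock", "market", "rates"], ["game", "score", "team"]]
labels = ["finance", "sports"]
model = train_nb(docs, labels)
print(classify_nb(["market", "rates"], *model))      # expect "finance"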
19
Experiments Datasets Performance measures Results
20
Datasets Reuters-21578 dataset: 9,603 training documents, 3,299 test documents Ohsumed corpus: 10,000 training documents, 10,000 test documents
21
Performance Measures Precision: probability that a document predicted to be in class 'x' truly belongs to that class Recall: probability that a document belonging to class 'x' is classified into that class Precision/recall break-even point: the value at which precision equals recall as the decision threshold is varied
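A small sketch of how the break-even point can be computed, assuming the classifier returns a real-valued score per document and the threshold is swept over the ranked list; the scores and labels below are made up for illustration.

# Sketch: precision/recall break-even point from real-valued classifier scores.
# Rank documents by score, sweep the cutoff, and report where precision ~= recall.
def breakeven_point(scores, labels):
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)                    # documents truly in class x
    best, tp = None, 0
    for k, (_, y) in enumerate(ranked, start=1):
        tp += y
        precision = tp / k                 # predicted-in-class that truly are
        recall = tp / n_pos                # truly-in-class that were predicted
        if best is None or abs(precision - recall) < abs(best[0] - best[1]):
            best = (precision, recall)
    return (best[0] + best[1]) / 2         # the (near-)equal value

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]    # classifier confidence per document
labels = [1, 1, 0, 1, 0, 0]                # 1 = document truly in class x
print(breakeven_point(scores, labels))     # prints about 0.67 for this toy example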
22
Results Precision/recall break-even point on Ohsumed dataset
23
Results Precision/recall break-even point on Reuters dataset
24
Conclusions Introduces SVMs for text categorization Theoretical and empirical evidence that SVMs are well suited for text categorization Consistent improvement in accuracy over other methods