Nov 23rd, 2001. Copyright © 2001, 2003, Andrew W. Moore. Linear Document Classifier.

1 Linear Document Classifier. Nov 23rd, 2001. Copyright © 2001, 2003, Andrew W. Moore.

2 Support Vector Machines: Slide 2. Copyright © 2001, 2003, Andrew W. Moore. Linear Classifiers. Binary classification: y = +1 for the positive class, y = -1 for the negative class. Documents are represented as vectors. [Figure: documents plotted over word axes a and b; one marker denotes +1, the other denotes -1.] How would you classify this data?

3 Linear Classifiers. Binary classification: y = +1 for the positive class, y = -1 for the negative class. Documents are represented as vectors. [Figure: the same plot with a candidate separating line f(d); one marker denotes +1, the other denotes -1.] How would you classify this data?

4 Decision Boundary. [Figure: documents d1, d2, d3, d4 plotted over word axes a and b, with the line f(d).] w_a and w_b are the weights for words a and b. 1. How to classify documents using f(d)? 2. How to find the line f(d)?

5 How to Classify Documents? [Figure: documents d1, d2, d3, d4 plotted over word axes a and b, with the line f(d).] w_a and w_b are the weights for words a and b.
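The formula for f(d) is not preserved in this transcript. A minimal sketch of one common choice, assuming f(d) is the weighted sum of word counts plus an optional bias, and that a document is assigned to the class given by the sign of f(d). The names `f`, `classify`, `bias`, and the example weights and documents are hypothetical.

```python
def f(doc, weights, bias=0.0):
    """Linear score: sum of weight * count over the words in the document."""
    return sum(weights.get(word, 0.0) * count for word, count in doc.items()) + bias


def classify(doc, weights, bias=0.0):
    """Return +1 when the score is positive, otherwise -1."""
    return 1 if f(doc, weights, bias) > 0 else -1


# Hypothetical weights w_a and w_b for words a and b.
weights = {"a": 1.0, "b": -1.0}

# Two hypothetical documents given as word -> count.
d1 = {"a": 3, "b": 1}   # score 3 - 1 = 2
d2 = {"a": 1, "b": 4}   # score 1 - 4 = -3

label_d1 = classify(d1, weights)
label_d2 = classify(d2, weights)
```

Under these assumed weights, d1 falls on the positive side of the line and d2 on the negative side.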

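7 Perceptron Algorithm. Initialize the weights. Repeat: receive a labeled document (d, y), where y = +1 or -1; check whether d is classified correctly, i.e. whether y f(d) > 0. Yes: do nothing. No: update the weights [update formula not preserved in the transcript]. [Figure: documents d1, d2, d3, d4 plotted over word axes a and b, with the line f(d).]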
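The update step on the slide did not survive extraction. The sketch below uses the standard textbook perceptron rule (add y times the document vector to the weights, and y to the bias, on each mistake), which may differ in detail from the slide. The function names, the bias term, the epoch limit, and the toy data are assumptions.

```python
def score(w, b, d):
    """f(d): weighted sum of word counts plus bias."""
    return sum(w.get(word, 0.0) * c for word, c in d.items()) + b


def train_perceptron(data, epochs=10):
    """Textbook perceptron: update only when y * f(d) is not positive."""
    w, b = {}, 0.0
    for _ in range(epochs):
        mistakes = 0
        for d, y in data:
            if y * score(w, b, d) > 0:
                continue                      # classified correctly: do nothing
            for word, c in d.items():         # mistake: move the line toward d
                w[word] = w.get(word, 0.0) + y * c
            b += y
            mistakes += 1
        if mistakes == 0:                     # a full pass without errors
            break
    return w, b


# Hypothetical, linearly separable toy documents (word -> count, label).
data = [
    ({"a": 3, "b": 1}, 1),
    ({"a": 4, "b": 2}, 1),
    ({"a": 1, "b": 3}, -1),
    ({"a": 2, "b": 4}, -1),
]

w, b = train_perceptron(data)
```

On data that is not linearly separable the loop simply stops after `epochs` passes, so the returned weights may still misclassify some documents.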

10 Geometrical Interpretation

11 Linear Classifiers. [Figure: data points, one marker denoting +1 and the other -1, with a candidate separating line f(d).] How would you classify this data?

13 Linear Classifiers. [Figure legend: one marker denotes +1, the other denotes -1.] Any of these would be fine... but which is best?

14 Classifier Margin. [Figure legend: one marker denotes +1, the other denotes -1.] Define the margin of a linear classifier as the width by which the boundary could be widened before hitting a data point.
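A small sketch of how this margin could be computed, assuming the boundary w·x + b = 0 is widened symmetrically on both sides, so the margin is twice the distance from the boundary to the nearest data point. The function name and the toy points are hypothetical.

```python
import math


def margin(w, b, points):
    """Width of the widest band centered on w.x + b = 0 that touches no point."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    nearest = min(
        abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x in points
    )
    return 2.0 * nearest


# Hypothetical 2-D points; labels are not needed to measure the width.
points = [(3.0, 0.0), (5.0, 1.0), (-3.0, 0.0), (-4.0, 2.0)]

wide = margin((1.0, 0.0), 0.0, points)     # vertical boundary x1 = 0
narrow = margin((1.0, 1.0), 0.0, points)   # diagonal boundary x1 + x2 = 0
```

Both boundaries separate the left points from the right points, but the vertical one leaves a wider band, which is the sense in which one separating line can be better than another.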

16 Maximum Margin. [Figure legend: one marker denotes +1, the other denotes -1.] The maximum margin linear classifier is the linear classifier with the maximum margin. It is called a linear Support Vector Machine (SVM).
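The slides state the idea in words only. For reference, the usual textbook way to write the maximum-margin problem for separable data is shown below; it is not taken from the slides, and the symbols (weight vector w, offset b, training documents d_i with labels y_i, and n training examples) are assumptions.

```latex
\min_{\mathbf{w},\, b} \ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w} \cdot \mathbf{d}_i + b \right) \ge 1,
\qquad i = 1, \dots, n
```

With this scaling the margin width is 2 / ||w||, so minimizing ||w|| is the same as maximizing the margin.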

17 Empirical Studies with Text Categorization. 10 categories from Reuters-21578. For a few categories, the SVM method significantly outperforms the KNN approach. Classification accuracy (%):

Category   KNN    SVM
earn       97.3   98.0
acq        92.0   93.6
money-fx   78.2   74.5
grain      82.2   94.6
crude      85.7   88.9
trade      77.4   75.9
interest   74.0   77.7
ship       79.2   85.6
wheat      76.6   91.8
corn       77.9   90.3
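To make the comparison concrete, the per-category difference can be computed directly from the numbers in the table. This is only arithmetic on the reported accuracies; the variable names are hypothetical.

```python
# (KNN, SVM) classification accuracy per category, copied from the table.
accuracy = {
    "earn": (97.3, 98.0),
    "acq": (92.0, 93.6),
    "money-fx": (78.2, 74.5),
    "grain": (82.2, 94.6),
    "crude": (85.7, 88.9),
    "trade": (77.4, 75.9),
    "interest": (74.0, 77.7),
    "ship": (79.2, 85.6),
    "wheat": (76.6, 91.8),
    "corn": (77.9, 90.3),
}

# SVM minus KNN, rounded to one decimal to match the table's precision.
gains = {cat: round(svm - knn, 1) for cat, (knn, svm) in accuracy.items()}

svm_wins = [cat for cat, g in gains.items() if g > 0]
knn_wins = sorted(cat for cat, g in gains.items() if g < 0)
biggest_gain = max(gains, key=gains.get)
```

The gains are concentrated in a few categories, while KNN is ahead on two of the ten.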

19 Doing multi-class classification. SVMs can only handle two-class outputs (i.e. a categorical output variable with arity 2). How to handle multiple classes? E.g., classify documents into three categories: sports, business, politics. Answer: one-vs-all, learn N SVMs. SVM 1 learns “Output == 1” vs. “Output != 1”; SVM 2 learns “Output == 2” vs. “Output != 2”; ...; SVM N learns “Output == N” vs. “Output != N”.
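The relabeling step behind one-vs-all can be sketched as follows: for each class, build a binary label list in which that class is +1 and every other class is -1, then train one binary SVM per list. The function name and the example labels are hypothetical, and the training call itself is left out.

```python
def one_vs_all_labels(labels):
    """Map each class to a +1/-1 label list: that class vs. all the others."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


# Hypothetical class labels for four training documents.
labels = ["sports", "business", "politics", "sports"]

binary = one_vs_all_labels(labels)
# binary["sports"] is the training target for the "sports vs. rest" SVM, etc.
```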

24 One-vs-All. Red vs. the other classes: red(d). Yellow vs. the other classes: yellow(d). Cyan vs. the other classes: cyan(d). Given a test document d, how to decide its color? Assign d to the color whose function gives the largest score.
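The decision rule is an argmax over the per-class scores. A minimal sketch, assuming each class has a scoring function such as red(d); the three linear scorers below are made up for illustration and are not trained SVMs.

```python
def predict_one_vs_all(doc, scorers):
    """Return the class whose scoring function gives the largest value."""
    return max(scorers, key=lambda name: scorers[name](doc))


# Hypothetical scoring functions standing in for red(d), yellow(d), cyan(d).
scorers = {
    "red": lambda d: 1.0 * d["a"] - 1.0 * d["b"],
    "yellow": lambda d: -1.0 * d["a"] + 1.0 * d["b"],
    "cyan": lambda d: 0.5 - 0.1 * (d["a"] + d["b"]),
}

color_1 = predict_one_vs_all({"a": 3, "b": 1}, scorers)
color_2 = predict_one_vs_all({"a": 1, "b": 4}, scorers)
color_3 = predict_one_vs_all({"a": 1, "b": 1}, scorers)
```

Taking the largest score also resolves cases where several classifiers, or none of them, return a positive value.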

25 Suppose we're in 1 dimension. [Figure: data points on the x-axis, with x = 0 marked.] What would SVMs do with this data?

26 Suppose we're in 1 dimension. [Figure: the same points on the x-axis with the separating boundary drawn; x = 0 marked.] Not a big surprise.

27 Harder 1-dimensional dataset. [Figure: data points on the x-axis, with x = 0 marked.] What can be done about this?

29 Harder 1-dimensional dataset. Expand from a one-dimensional space to a two-dimensional space. [Figure: the points plotted with axes x and x^2; x = 0 marked.] Kernel trick: expand the dimensionality by a kernel function.
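A small sketch of the expansion from x to (x, x^2). The data points are hypothetical, chosen so that the negative class sits between the positive points and no single threshold on x separates them. This shows the explicit feature map only; a kernel function would compute inner products in the expanded space without building it.

```python
def expand(x):
    """Map a 1-D point to the 2-D point (x, x^2)."""
    return (x, x * x)


# Hypothetical "harder" dataset: negatives near 0, positives on both sides.
data = [(-3.0, 1), (-2.0, 1), (-0.5, -1), (0.0, -1), (0.5, -1), (2.0, 1), (3.0, 1)]


def separable_by_threshold_1d(data):
    """True if one threshold on x puts all +1 on one side and all -1 on the other."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == -1]
    return max(neg) < min(pos) or max(pos) < min(neg)


def f(x, threshold=2.0):
    """Linear function of (x, x^2): weights (0, 1) and bias -threshold."""
    _, x_squared = expand(x)
    return x_squared - threshold


separable_before = separable_by_threshold_1d(data)
separable_after = all(y * f(x) > 0 for x, y in data)
```

In the expanded space the horizontal line x^2 = 2 separates the two classes, even though no point on the original axis does.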

30 Nonlinear Kernel (I)

31 Nonlinear Kernel (II)

32 Software for SVM. SVMlight (http://svmlight.joachims.org/). Libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/); it is faster than SVMlight. Sparse data representation: the occurrences of most words in a document are zero, so only the nonzero counts are written. Each line has the form: class label, followed by word-id:word-occurrence pairs.

33 Software for SVM. SVMlight (http://svmlight.joachims.org/). Libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/); it is faster than SVMlight. Sparse data representation: the occurrences of most words in a document are zero. Example: D = ('hello': 2, 'world': 3), a negative document. The word-id for 'hello' is 100 and the word-id for 'world' is 54, so D is written as: -1 54:3 100:2 (both tools expect the word-ids on a line in increasing order).
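A minimal sketch of writing one document in this sparse format. The helper name and the `word_ids` dictionary are hypothetical. Pairs are sorted by word-id because SVMlight and Libsvm expect feature indices in increasing order.

```python
def to_sparse_line(label, counts, word_ids):
    """Format one document as '<label> <word-id>:<count> ...' with ids ascending."""
    pairs = sorted((word_ids[w], c) for w, c in counts.items() if c != 0)
    return " ".join([f"{label:+d}"] + [f"{i}:{c}" for i, c in pairs])


# Word-ids from the example on the slide.
word_ids = {"hello": 100, "world": 54}

# D = ('hello': 2, 'world': 3), a negative document.
line = to_sparse_line(-1, {"hello": 2, "world": 3}, word_ids)
print(line)  # -1 54:3 100:2
```

Words with a zero count are skipped, which is what keeps the representation sparse.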

