The Disputed Federalist Papers: SVM Feature Selection via Concave Minimization
Glenn Fung and Olvi L. Mangasarian
CSNA 2002, June 13-16, 2002, Madison, Wisconsin

Outline of Talk
- Support Vector Machines (SVM) Introduction
  - Standard Quadratic Programming Formulation
  - 1-norm Linear SVMs
- SVM Feature Selection
  - Successive Linearization Algorithm (SLA)
- The Disputed Federalist Papers
  - Description of the Classification Problem
  - Description of Previous Work
- Results
  - Separating Hyperplane in Three Dimensions Only
  - Classification Agrees with Previous Results

What is a Support Vector Machine?
- An optimally defined surface
- Typically nonlinear in the input space
- Linear in a higher dimensional space
- Implicitly defined by a kernel function

What are Support Vector Machines Used For?
- Classification
- Regression & Data Fitting
- Supervised & Unsupervised Learning
(This talk concentrates on classification.)

Geometry of the Classification Problem: 2-Category Linearly Separable Case
[Figure: the two point sets A+ and A-]

Algebra of the Classification Problem: 2-Category Linearly Separable Case
- Given m points in n-dimensional space
- Represented by an m-by-n matrix A
- Membership of each point $A_i$ in class +1 or -1 specified by an m-by-m diagonal matrix D with +1 & -1 entries
- Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$: $A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$
- More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones.
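The succinct condition D(Aw - e*gamma) >= e can be checked row by row. The following is a minimal sketch in plain Python; the function name and the toy data are illustrative assumptions, not part of the talk.

```python
def satisfies_bounding_planes(A, d, w, gamma):
    """Return True if d_i * (A_i . w - gamma) >= 1 for every row i.

    A: list of rows (points), d: list of +1/-1 labels (the diagonal of D),
    w: normal vector, gamma: plane offset.
    """
    for row, label in zip(A, d):
        activation = sum(a * wj for a, wj in zip(row, w))
        if label * (activation - gamma) < 1:
            return False
    return True


# Hypothetical toy data: two points per class, separable along the first axis.
A = [[3, 1], [4, 0], [-2, 1], [-3, -1]]
d = [1, 1, -1, -1]
print(satisfies_bounding_planes(A, d, w=[1, 0], gamma=0))
```

A point that violates the condition lies between the two bounding planes or on the wrong side of them.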

Support Vector Machines: Maximizing the Margin between Bounding Planes
[Figure: point sets A+ and A- with the support vectors marked]

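Support Vector Machines: Quadratic Programming Formulation
- Solve the following quadratic program:
  $\min_{w,\gamma,y} \; \nu e'y + \tfrac{1}{2} w'w$  s.t.  $D(Aw - e\gamma) + y \ge e,\; y \ge 0$,
  where $\nu > 0$ is the weight of the training error $e'y$
- Maximize the margin $2/\|w\|_2$ by minimizing $\tfrac{1}{2} w'w$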
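The link between the margin and the quadratic term can be spelled out in a short derivation. It assumes the bounding planes are written as $x'w = \gamma \pm 1$, the usual convention in this line of work; the slide text itself does not preserve the formula.

```latex
% Distance between the parallel planes x'w = \gamma + 1 and x'w = \gamma - 1
\[
  \text{margin}
  = \frac{\lvert (\gamma + 1) - (\gamma - 1) \rvert}{\lVert w \rVert_2}
  = \frac{2}{\lVert w \rVert_2}.
\]
% Maximizing 2 / \|w\|_2 is the same as minimizing \|w\|_2, and hence
\[
  \max_{w} \frac{2}{\lVert w \rVert_2}
  \quad\Longleftrightarrow\quad
  \min_{w} \tfrac{1}{2}\, w'w .
\]
```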

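Support Vector Machines: Linear Programming Formulation
- Use the 1-norm instead of the 2-norm:
  $\min_{w,\gamma,y} \; \nu e'y + \|w\|_1$  s.t.  $D(Aw - e\gamma) + y \ge e,\; y \ge 0$
- This is equivalent to the following linear program:
  $\min_{w,\gamma,y,v} \; \nu e'y + e'v$  s.t.  $D(Aw - e\gamma) + y \ge e,\; -v \le w \le v,\; y \ge 0$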
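As an illustration of the linear program above, here is a minimal sketch using NumPy and `scipy.optimize.linprog`. The variable ordering, the function name `one_norm_svm`, the default `nu`, and the toy data are assumptions made for this example; the talk does not say which LP solver was used.

```python
import numpy as np
from scipy.optimize import linprog


def one_norm_svm(A, d, nu=1.0):
    """Solve min nu*e'y + e'v  s.t.  D(Aw - e*gamma) + y >= e, -v <= w <= v, y >= 0."""
    m, n = A.shape
    DA = d[:, None] * A  # rows of A scaled by their +1/-1 labels
    # Variable order in the LP: w (n), gamma (1), y (m), v (n).
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), np.ones(n)])
    # -D*A*w + d*gamma - y <= -e  is the margin constraint in "<=" form.
    margin = np.hstack([-DA, d[:, None], -np.eye(m), np.zeros((m, n))])
    upper = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])   # w - v <= 0
    lower = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])  # -w - v <= 0
    A_ub = np.vstack([margin, upper, lower])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[n]


# Hypothetical toy data: only the first feature separates the classes.
A = np.array([[2.0, 0.3], [3.0, -0.2], [-2.0, 0.1], [-3.0, -0.4]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma = one_norm_svm(A, d)
print(w, gamma)
```

The 1-norm penalty already tends to drive some components of w to zero, which is why it serves as the starting point for feature selection.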

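Feature Selection and SVMs
- Use the step function to suppress components of the normal w to the separating hyperplane:
  $\min_{w,\gamma,y,v} \; \nu e'y + e'v_*$  s.t.  $D(Aw - e\gamma) + y \ge e,\; -v \le w \le v,\; y \ge 0$
  where $(v_*)_i = 1$ if $v_i > 0$ and $(v_*)_i = 0$ if $v_i = 0$.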

Smooth Approximation of the Step Function

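SVM Formulation with Feature Selection
- For $v \ge 0$, we use the approximation of the step vector $v_*$ by the concave exponential: $v_* \approx e - \varepsilon^{-\alpha v},\; \alpha > 0$
- Here $\varepsilon$ is the base of natural logarithms. This leads to:
  $\min_{w,\gamma,y,v} \; \nu e'y + e'(e - \varepsilon^{-\alpha v})$  s.t.  $D(Aw - e\gamma) + y \ge e,\; -v \le w \le v,\; y \ge 0$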
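To see how the concave exponential approximates the step function, the sketch below evaluates 1 - exp(-alpha*v) componentwise. The function name and the value alpha = 5 are illustrative assumptions.

```python
import math


def smooth_step(v, alpha=5.0):
    """Componentwise concave approximation 1 - exp(-alpha * v_i) of the step function, for v_i >= 0."""
    return [1.0 - math.exp(-alpha * vi) for vi in v]


# The value is exactly 0 at v_i = 0 and rises quickly toward 1 as v_i grows.
for vi, approx in zip([0.0, 0.1, 0.5, 1.0], smooth_step([0.0, 0.1, 0.5, 1.0])):
    print(vi, round(approx, 4))
```

Larger values of alpha give a sharper approximation of the step, at the price of a more strongly nonlinear objective.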

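Successive Linearization Algorithm (SLA) for Feature Selection
- Choose $\nu, \alpha > 0$. Start with some $(w^0, \gamma^0, y^0, v^0)$. Having $(w^i, \gamma^i, y^i, v^i)$, determine the next iterate by solving the LP:
  $\min_{w,\gamma,y,v} \; \nu e'y + \alpha(\varepsilon^{-\alpha v^i})'(v - v^i)$  s.t.  $D(Aw - e\gamma) + y \ge e,\; -v \le w \le v,\; y \ge 0$
- Stop when: $\nu e'(y^{i+1} - y^i) + \alpha(\varepsilon^{-\alpha v^i})'(v^{i+1} - v^i) = 0$
- Proposition: The algorithm terminates in a finite number of steps (typically 5 to 7) at a stationary point.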
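The iteration can be sketched as follows, assuming the objective nu*e'y + e'(e - exp(-alpha*v)) over the constraints D(Aw - e*gamma) + y >= e, -v <= w <= v, y >= 0. The deterministic feasible start, the tolerance, the use of `scipy.optimize.linprog`, and the toy data are all assumptions of this example rather than details taken from the talk.

```python
import numpy as np
from scipy.optimize import linprog


def sla_feature_selection(A, d, nu=1.0, alpha=5.0, max_iter=20, tol=1e-8):
    """Successive linearization: solve one LP per iteration until the linearized objective stops decreasing."""
    m, n = A.shape
    DA = d[:, None] * A
    # Variable order in every LP: w (n), gamma (1), y (m), v (n).
    margin = np.hstack([-DA, d[:, None], -np.eye(m), np.zeros((m, n))])
    upper = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])   # w - v <= 0
    lower = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])  # -w - v <= 0
    A_ub = np.vstack([margin, upper, lower])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)

    # Assumed feasible starting point: w = 0, gamma = 0, y = e, v = 0.
    w, gamma, y, v = np.zeros(n), 0.0, np.ones(m), np.zeros(n)
    for iteration in range(1, max_iter + 1):
        # Gradient of e'(e - exp(-alpha*v)) with respect to v at the current iterate.
        grad = alpha * np.exp(-alpha * v)
        c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), grad])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        if not res.success:
            raise RuntimeError(res.message)
        w_new, gamma_new = res.x[:n], res.x[n]
        y_new, v_new = res.x[n + 1:n + 1 + m], res.x[n + 1 + m:]
        # Change in the linearized objective between consecutive iterates.
        change = nu * (y_new.sum() - y.sum()) + grad @ (v_new - v)
        w, gamma, y, v = w_new, gamma_new, y_new, v_new
        if abs(change) <= tol:
            break
    return w, gamma, iteration


# Hypothetical toy data: only the first feature is needed to separate the classes.
A = np.array([[2.0, 0.3], [3.0, -0.2], [-2.0, 0.1], [-3.0, -0.4]])
d = np.array([1.0, 1.0, -1.0, -1.0])
w, gamma, iterations = sla_feature_selection(A, d)
print(w, gamma, iterations)
```

Features whose weight in the returned w is zero are the ones suppressed by the procedure; the remaining features define the classifier.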

The Federalist Papers
- Written in 1787-1788 by Alexander Hamilton, John Jay and James Madison to persuade the citizens of New York to ratify the Constitution.
- The papers are short essays, 900 to 3500 words in length.
- The authorship of 12 of those papers has been in dispute (Madison or Hamilton). These papers are referred to as the disputed Federalist papers.

Previous Work
- Mosteller and Wallace (1964)
  - Using statistical inference, determined the authorship of the 12 disputed papers.
- Bosch and Smith (1998)
  - Using linear programming techniques and the evaluation of every possible combination of one, two and three features, obtained a separating hyperplane using only three words.

Description of the data
- For every paper:
  - Machine-readable text was created using a scanner.
  - Relative frequencies were computed for 70 words that Mosteller and Wallace identified as good candidates for author-attribution studies.
  - Each document is represented as a vector containing the 70 real numbers corresponding to the 70 word frequencies.
- The dataset consists of 118 papers:
  - 50 Madison papers
  - 56 Hamilton papers
  - 12 disputed papers
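The feature vectors described above can be sketched in a few lines of Python. The tokenization rule, the scaling to occurrences per 1000 words, and the three-word vocabulary are assumptions for illustration; the talk only states that relative frequencies of 70 function words were used.

```python
import re
from collections import Counter


def word_rates(text, vocabulary, per=1000):
    """Return the rate of each vocabulary word per `per` tokens of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return [0.0] * len(vocabulary)
    counts = Counter(tokens)
    return [per * counts[word] / len(tokens) for word in vocabulary]


# Hypothetical sentence and a three-word vocabulary instead of the full 70 words.
sample = "Upon reflection, it would seem to be upon us to act."
print(word_rates(sample, ["to", "upon", "would"]))
```

Applying the same function with the full word list to every paper would give one 70-dimensional row per document.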

Function Words Based on Relative Frequencies

SLA Feature Selection for Classifying the Disputed Federalist Papers
- Apply the successive linearization algorithm to:
  - Train on the 106 Federalist papers with known authors
  - Find a classification hyperplane that uses as few words as possible
  - Use the hyperplane to classify the 12 disputed papers
- The parameter $\nu$ was obtained by a tuning procedure.

Hyperplane Classifier Using 3 Words
- A hyperplane depending on three words was found:
  0.5368 to + 24.6634 upon + 2.9532 would = 66.6159
- All disputed papers ended up on the Madison side of the plane
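The three-word hyperplane can be applied to a new frequency vector as sketched below. The coefficients come from the slide; everything else is an assumption: the slide does not say which side of the plane is Madison's, so the sign convention here relies on the well-known observation that Hamilton used "upon" far more often than Madison, and the input frequencies are hypothetical values assumed to be in the same units as the training data.

```python
# Coefficients and threshold as given on the slide.
WEIGHTS = {"to": 0.5368, "upon": 24.6634, "would": 2.9532}
THRESHOLD = 66.6159


def decision_value(freqs):
    """Signed value of the hyperplane expression minus the threshold."""
    return sum(WEIGHTS[word] * freqs[word] for word in WEIGHTS) - THRESHOLD


def attribute(freqs):
    """Assumed convention (not stated in the slides): negative side is Madison."""
    return "Madison" if decision_value(freqs) < 0 else "Hamilton"


# Hypothetical word frequencies for a single paper.
paper = {"to": 40.0, "upon": 0.2, "would": 6.0}
print(attribute(paper))
```

Because the coefficient of "upon" dominates, small changes in that one frequency move a paper across the plane much faster than changes in "to" or "would".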

Results: 3D plot of resulting hyperplane

Comparison with Previous Work & Conclusion
- Bosch and Smith (1998) calculated all the possible sets of one, two and three words to find a separating hyperplane. They solved 118,895 linear programs.
- Our SLA algorithm for feature selection required the solution of only 6 linear programs.
- Our classification of the disputed Federalist papers agrees with that of Mosteller-Wallace and Bosch-Smith.

More on SVMs:
- My web page: www.cs.wisc.edu/~gfung
- Olvi Mangasarian's web page: www.cs.wisc.edu/~olvi
