Published by Camron Potter. Modified over 9 years ago.
1
Spam Detection
Ethan Grefe
December 13, 2013
2
Motivation
Spam email is constantly cluttering inboxes
Commonly removed using rule-based filters
Spam often has very similar characteristics
This allows it to be detected using machine learning
    Naïve Bayes Classifiers
    Support Vector Machines
3
SVM Solution
Used training data from the CSDMC2010 SPAM corpus
4327 labeled emails
    2949 non-spam messages (HAM)
    1378 spam messages (SPAM)
Extracted features from the subject and body of emails
Used resulting feature vectors to train an SVM classifier in Matlab
4
Email Features
Features were determined by research and observation
Best results were obtained with the following features
    Percentage of letters that are capitalized
    Types of punctuation used
    Average length of a word
    Amount of HTML in the email
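The four features above can be sketched in code. This is a minimal Python illustration, not the presenter's Matlab implementation: the function name `extract_features`, the regular expressions, the choice of punctuation marks, and the definition of "amount of HTML" as the share of characters inside tags are all assumptions made for the example.

```python
import re

# Assumption: anything between "<" and ">" counts as an HTML tag.
HTML_TAG = re.compile(r"<[^>]+>")

# Assumption: these punctuation marks are tracked; the slides do not list them.
PUNCTUATION = "!?$"


def extract_features(subject: str, body: str) -> dict:
    """Return a small feature dictionary for one email (hypothetical helper)."""
    text = subject + "\n" + body

    # Amount of HTML: share of characters that sit inside tags.
    html_chars = sum(len(tag) for tag in HTML_TAG.findall(text))
    html_ratio = html_chars / len(text)

    # Remove tags so markup does not distort the text statistics.
    plain = HTML_TAG.sub(" ", text)

    # Percentage of letters that are capitalized.
    letters = [c for c in plain if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    pct_caps = upper / len(letters) if letters else 0.0

    # Average length of a word (words are runs of ASCII letters here).
    words = re.findall(r"[A-Za-z]+", plain)
    avg_word_len = sum(len(w) for w in words) / len(words) if words else 0.0

    features = {
        "pct_caps": pct_caps,
        "avg_word_len": avg_word_len,
        "html_ratio": html_ratio,
    }
    # Types of punctuation used: one count per tracked mark.
    for mark in PUNCTUATION:
        features["punct_" + mark] = plain.count(mark)
    return features


if __name__ == "__main__":
    print(extract_features("FREE MONEY", "<b>Win</b> now!!"))
```

Each dictionary can be flattened into a fixed-order numeric vector before being passed to a classifier.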
5
Classifier Results
Trained on a random 35% of emails
Tested SVM classifier on remaining 65%
Trained SVM using three different kernel functions

Kernel Function   Spam Classification Rate   Ham Classification Rate   Total Classification Rate
RBF               80.06%                     92.33%                    86.20%
Linear            78.69%                     80.66%                    79.67%
Quadratic         82.75%                     84.85%                    83.80%
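The evaluation procedure can be sketched as follows. This Python example is an illustration under stated assumptions, not the presenter's Matlab code: `split_indices` and `classification_rates` are hypothetical helpers, labels are encoded as 1 for spam and 0 for ham, and the total rate is taken as the unweighted mean of the two per-class rates. The slides do not say how the total was computed, but the reported totals are consistent with that mean up to rounding.

```python
import random


def split_indices(n, train_fraction=0.35, seed=0):
    """Randomly split range(n) into train and test index lists."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


def classification_rates(y_true, y_pred):
    """Return (spam_rate, ham_rate, total_rate) for labels 1=spam, 0=ham.

    Each per-class rate is the fraction of that class labeled correctly.
    Assumption: total_rate is the unweighted mean of the two rates.
    """
    spam_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    ham_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not spam_preds or not ham_preds:
        raise ValueError("need at least one spam and one ham example")
    spam_rate = sum(1 for p in spam_preds if p == 1) / len(spam_preds)
    ham_rate = sum(1 for p in ham_preds if p == 0) / len(ham_preds)
    return spam_rate, ham_rate, (spam_rate + ham_rate) / 2


if __name__ == "__main__":
    train, test = split_indices(4327)
    print(len(train), len(test))
    # Made-up labels and predictions, only to show the call.
    print(classification_rates([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

Reporting the spam and ham rates separately matters here because the corpus is imbalanced: a classifier that labels everything as ham would still be right on most messages.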
6
Possible Improvements
Use Naïve Bayes to classify emails using word frequency
Obtain a wider variety of input features
Test other types of learning algorithms
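As a rough idea of what the first improvement could look like, here is a minimal multinomial Naïve Bayes sketch over word counts with add-one (Laplace) smoothing. It was not part of the presented work: the class name `WordNaiveBayes`, the tokenizer, and the four toy messages are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


class WordNaiveBayes:
    """Multinomial Naive Bayes on word frequencies (illustrative sketch)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / n_docs)
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[c][word]
                score += math.log(
                    (count + 1) / (self.total_words[c] + vocab_size)
                )
            if score > best_score:
                best_class, best_score = c, score
        return best_class


if __name__ == "__main__":
    # Toy training data, made up for this example.
    texts = [
        "win free money now",
        "free prize claim now",
        "meeting at noon tomorrow",
        "lunch with the team tomorrow",
    ]
    labels = ["spam", "spam", "ham", "ham"]
    model = WordNaiveBayes().fit(texts, labels)
    print(model.predict("claim your free money"))
    print(model.predict("team meeting tomorrow"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing keeps a word seen in only one class from driving the other class's probability to zero.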