Spam Email Detection Ethan Grefe December 13, 2013.

Presentation transcript:

1 Spam Detection Ethan Grefe December 13, 2013

2 Motivation
- Spam email constantly clutters inboxes
- Commonly removed using rule-based filters
- Spam messages often share very similar characteristics
- This allows them to be detected using machine learning
  - Naïve Bayes classifiers
  - Support vector machines (SVMs)

3 SVM Solution
- Used training data from the CSDMC2010 SPAM corpus
  - 4327 labeled emails
  - 2949 non-spam messages (HAM)
  - 1378 spam messages (SPAM)
- Extracted features from the subject and body of emails
- Used the resulting feature vectors to train an SVM classifier in Matlab

4 Email Features
- Features were determined by research and observation
- Best results were obtained with the following features
  - Percentage of letters that are capitalized
  - Types of punctuation used
  - Average length of a word
  - Amount of HTML in the email
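
The four features listed above could be computed along these lines. This is a minimal Python sketch, not the Matlab code behind the slides: the function name `extract_features`, the choice of punctuation characters, and the definition of "amount of HTML" as the share of body characters inside tags are all assumptions made for illustration.

```python
import re

# Hypothetical punctuation set; the slides do not say which characters were tracked.
TRACKED_PUNCTUATION = "!?$"


def extract_features(subject: str, body: str) -> dict:
    """Return a small feature dictionary for one email (illustrative only)."""
    text = subject + "\n" + body

    # Percentage of letters that are capitalized.
    letters = [c for c in text if c.isalpha()]
    capital_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0

    # Types of punctuation used, approximated as one count per tracked character.
    punctuation = {f"count_{p}": text.count(p) for p in TRACKED_PUNCTUATION}

    # Average length of a word (runs of ASCII letters; tag names are counted too).
    words = re.findall(r"[A-Za-z]+", text)
    avg_word_length = sum(len(w) for w in words) / len(words) if words else 0.0

    # Amount of HTML, assumed here to mean the fraction of body characters inside tags.
    tags = re.findall(r"<[^>]+>", body)
    html_ratio = sum(len(t) for t in tags) / len(body) if body else 0.0

    features = {
        "capital_ratio": capital_ratio,
        "avg_word_length": avg_word_length,
        "html_ratio": html_ratio,
    }
    features.update(punctuation)
    return features
```

Each dictionary can then be flattened into a fixed-order numeric vector before being handed to a classifier.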

5 Classifier Results
- Trained on a random 35% of emails
- Tested SVM classifier on remaining 65%
- Trained SVM using three different kernel functions

Kernel Function   Spam Classification Rate   Ham Classification Rate   Total Classification Rate
RBF               80.06%                     92.33%                    86.20%
Linear            78.69%                     80.66%                    79.67%
Quadratic         82.75%                     84.85%                    83.80%
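
The evaluation procedure (a random 35% / 65% split, then per-class classification rates) might be sketched as follows. The helper names `split_train_test` and `classification_rates` are hypothetical, and the overall rate is assumed here to be the fraction of all test emails classified correctly; the slides do not state how the "Total" column was computed.

```python
import random


def split_train_test(items, train_fraction=0.35, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def classification_rates(true_labels, predicted_labels):
    """Return (spam_rate, ham_rate, overall_rate) for labels 'spam' and 'ham'."""
    correct = {"spam": 0, "ham": 0}
    total = {"spam": 0, "ham": 0}
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    def rate(hits, count):
        return hits / count if count else 0.0

    spam_rate = rate(correct["spam"], total["spam"])
    ham_rate = rate(correct["ham"], total["ham"])
    # Assumption: overall rate = all correct predictions / all test emails.
    overall = rate(sum(correct.values()), sum(total.values()))
    return spam_rate, ham_rate, overall
```

With 4327 emails and `train_fraction=0.35`, this split yields 1514 training and 2813 test messages.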

6 Possible Improvements
- Use Naïve Bayes to classify emails using word frequency
- Obtain a wider variety of input features
- Test other types of learning algorithms
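
The first improvement, Naïve Bayes over word frequencies, could look roughly like the sketch below. It is a generic multinomial Naïve Bayes with add-one (Laplace) smoothing written for illustration; the class name `NaiveBayesSketch` and the tokenizer are assumptions, and nothing here comes from the presentation's own code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSketch:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)  # number of emails per label
        self.word_counts = {}              # label -> Counter of word frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when an email contains many words.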

