Machine Learning Basics with Applications to Email Spam Detection. UGR Project - Haoyu Li, Brittany Edwards, Wei Zhang, under Xiaoxiao Xu and Arye Nehorai.

General background information about the process of machine learning

The process of email detection: motivation of this project; pre-processing of data; classifier models; evaluation of classifiers.

Motivation of this project: spam email plagues every personal email account; 60% of emails sent in January 2004 were spam; fraud and phishing; spam vs. ham email.

Our Goal

Spam Email example

Ham Email example

Pre-processing of data: convert capital letters to lowercase; remove numbers and extra white space; remove punctuation; remove stop-words; delete terms with length greater than 20.
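The pre-processing steps listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name `preprocess`, the tiny `STOP_WORDS` set, and the constant `MAX_TERM_LENGTH` are assumptions made for the example, and a real run would use a much fuller stop-word list.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list (assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "for", "you", "your"}

# Terms longer than this are deleted, following the slide's "greater than 20" rule.
MAX_TERM_LENGTH = 20


def preprocess(text):
    """Return the list of cleaned terms for one email body."""
    text = text.lower()                      # convert capital letters to lowercase
    text = re.sub(r"\d+", " ", text)         # remove numbers
    # Replace every punctuation character with a space.
    text = text.translate(
        str.maketrans(string.punctuation, " " * len(string.punctuation))
    )
    terms = text.split()                     # splitting also drops extra white space
    return [
        t for t in terms
        if t not in STOP_WORDS and len(t) <= MAX_TERM_LENGTH
    ]


if __name__ == "__main__":
    print(preprocess("WIN $1000 Now!!!  Claim your FREE prize, the offer is real."))
```

Punctuation is replaced by a space rather than deleted so that words joined by a comma or slash do not merge into one term.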

Pre-processing of data: original email

Pre-processing of data: after pre-processing

Pre-processing of data: extract terms

Pre-processing of data: reduce terms (keep word length < 20)

Different classification methods: K Nearest Neighbor (KNN); Naive Bayes classifier; logistic regression; decision tree analysis.

What is K Nearest Neighbor: use the k "closest" samples (nearest neighbors) to perform classification.

What is K Nearest Neighbor
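A small sketch may make the idea concrete: each email is represented as a vector of term counts, and a new email receives the majority label of its k closest training emails. The slides do not say which distance measure or value of k was used, so the Euclidean distance, the default `k=3`, and the names `euclidean` and `knn_predict` are assumptions for illustration only.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length term-count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Pair every training label with its distance to the query, nearest first.
    neighbors = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    top_k_labels = [label for _, label in neighbors[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data (made up): counts of the terms ["free", "meeting"] per email.
    vectors = [[3, 0], [2, 0], [4, 1], [0, 2], [0, 3], [1, 4]]
    labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
    print(knn_predict(vectors, labels, [3, 1]))
    print(knn_predict(vectors, labels, [0, 2]))
```

Every prediction computes a distance to every training email over the full term list, which is one reason a large term list is costly for this classifier.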

Initial outcome and strategies for improvement: KNN accuracy was ~64%, which is very low; the KNN classifier does not fit our project; the term list is still too large; try a different classification method and see whether its evaluation results are better than the KNN results; continue to reduce the size of the term list by removing terms that are not meaningful.

Steps for improvement: remove sparsity; reduce the length threshold; create a hashtable; use an alternative classifier (Naive Bayes classifier).
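One common reading of "remove sparsity" is to drop terms that occur in too few emails to be informative. The slides do not give the exact rule or threshold, so the sketch below is an assumption throughout: the function name `remove_sparse_terms` and the parameter `min_doc_fraction` are hypothetical.

```python
from collections import Counter


def remove_sparse_terms(documents, min_doc_fraction=0.05):
    """Drop terms that appear in fewer than `min_doc_fraction` of the documents.

    `documents` is a list of term lists, one per email.
    Returns the filtered documents and the set of kept terms.
    """
    doc_freq = Counter()
    for terms in documents:
        doc_freq.update(set(terms))          # count each term once per document

    min_docs = min_doc_fraction * len(documents)
    kept = {term for term, n in doc_freq.items() if n >= min_docs}
    filtered = [[t for t in terms if t in kept] for terms in documents]
    return filtered, kept


if __name__ == "__main__":
    docs = [["free", "win"], ["free", "prize"], ["free", "meeting"], ["meeting", "agenda"]]
    filtered, kept = remove_sparse_terms(docs, min_doc_fraction=0.5)
    print(sorted(kept))
    print(filtered)
```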

Hashtable: calculate a hash key for each term in the term list. When a collision occurs, resolve it with separate chaining.
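A hash table with separate chaining can be sketched as follows: each bucket holds a small list (the chain), and terms whose hash keys collide are appended to the same chain. The class name `ChainedHashTable`, the polynomial hash function, and the bucket count are assumptions for this illustration; the slides do not say which hash function the project used.

```python
class ChainedHashTable:
    """Term -> count table that resolves collisions by separate chaining."""

    def __init__(self, num_buckets=8):
        # One chain (a list of [term, count] entries) per bucket.
        self.buckets = [[] for _ in range(num_buckets)]

    def _hash_key(self, term):
        # Simple polynomial hash (an assumption, chosen because it is deterministic).
        h = 0
        for ch in term:
            h = (h * 31 + ord(ch)) % len(self.buckets)
        return h

    def add(self, term, count=1):
        """Increase the count of `term`, inserting it if it is new."""
        chain = self.buckets[self._hash_key(term)]
        for entry in chain:
            if entry[0] == term:
                entry[1] += count
                return
        chain.append([term, count])          # new term (possibly a collision)

    def get(self, term):
        """Return the stored count for `term`, or 0 if it is absent."""
        for key, value in self.buckets[self._hash_key(term)]:
            if key == term:
                return value
        return 0


if __name__ == "__main__":
    table = ChainedHashTable(num_buckets=1)  # one bucket forces every term to collide
    for term in ["free", "money", "free"]:
        table.add(term)
    print(table.get("free"), table.get("money"), table.get("absent"))
```

With a reasonable number of buckets the chains stay short, so looking up a term takes roughly constant time instead of a scan over the whole term list.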

Naive Bayes classifier
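As a rough illustration of how a Naive Bayes classifier labels an email, the sketch below scores each class by its log prior plus the log likelihood of every term, treating terms as independent given the class. The slides do not state which variant was used; the multinomial model with add-one (Laplace) smoothing and the names `train_naive_bayes` and `predict_naive_bayes` are assumptions here.

```python
import math
from collections import Counter


def train_naive_bayes(documents, labels):
    """Count documents per class and term occurrences per class."""
    class_counts = Counter(labels)
    term_counts = {label: Counter() for label in class_counts}
    for terms, label in zip(documents, labels):
        term_counts[label].update(terms)
    vocabulary = {t for counts in term_counts.values() for t in counts}
    return class_counts, term_counts, vocabulary


def predict_naive_bayes(model, terms):
    """Return the class with the highest log posterior score for `terms`."""
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)           # log prior
        total_terms = sum(term_counts[label].values())
        for term in terms:
            if term not in vocabulary:
                continue                                   # ignore unseen terms
            # Add-one smoothing keeps the likelihood of a rare term above zero.
            likelihood = (term_counts[label][term] + 1) / (total_terms + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy training data (made up for the example).
    docs = [["free", "prize", "win"], ["win", "money", "free"],
            ["meeting", "tomorrow", "agenda"], ["project", "meeting", "notes"]]
    labels = ["spam", "spam", "ham", "ham"]
    model = train_naive_bayes(docs, labels)
    print(predict_naive_bayes(model, ["free", "money"]))
    print(predict_naive_bayes(model, ["meeting", "agenda"]))
```

Working in log space avoids numerical underflow when an email contains many terms.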

Secondary results: correctness increased from 62% to 82.36%.
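If "correctness" here means plain classification accuracy (an assumption; the slides do not define the metric), it is the fraction of test emails whose predicted label matches the true label. The function name `accuracy` and the toy labels below are made up for illustration and are not the project's data.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty test set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


if __name__ == "__main__":
    truth = ["spam", "ham", "spam", "ham"]
    predicted = ["spam", "ham", "ham", "ham"]
    print(f"{accuracy(truth, predicted):.2%}")
```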

Suggestions for further improvement: revise pre-processing; apply additional classifiers.

Thank you. Questions?
