A COMPARISON OF ANN, NAÏVE BAYES, AND DECISION TREE FOR THE PURPOSE OF SPAM FILTERING
KAASHYAPEE JHA
ECE/CS 539

NAÏVE BAYES CLASSIFIER
Bayes Theorem:
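For reference, the standard statement of Bayes' theorem and the form usually applied to spam filtering. The notation is an assumption, not taken from the slides: S = spam, H = ham, and w_1, ..., w_n are the words of a message. The "naïve" step is treating the words as conditionally independent given the class.

```latex
% Bayes' theorem
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

% Applied to a message with words w_1, ..., w_n, assuming word independence
P(S \mid w_1, \dots, w_n)
  = \frac{P(S) \prod_{i=1}^{n} P(w_i \mid S)}
         {P(S) \prod_{i=1}^{n} P(w_i \mid S) + P(H) \prod_{i=1}^{n} P(w_i \mid H)}
```

A message is labeled spam when this posterior exceeds the corresponding posterior for ham.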

PREPROCESSING
- Stop list: do not take into account trivial words like {or, and, but, a, an, the, is, in, for}
- Do not take into account words that are very uncommon
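A minimal sketch of the two preprocessing steps above, not the author's implementation. The stop list is the one on the slide; the tokenizer, the `min_count` threshold for "very uncommon", and the sample documents are all assumptions made for illustration.

```python
import re
from collections import Counter

# Stop list from the slide.
STOP_WORDS = {"or", "and", "but", "a", "an", "the", "is", "in", "for"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents, min_count=2):
    """Keep words that are not stop words and occur at least min_count times.

    min_count is a hypothetical cutoff for "very uncommon" words.
    """
    counts = Counter()
    for doc in documents:
        counts.update(w for w in tokenize(doc) if w not in STOP_WORDS)
    return {word for word, count in counts.items() if count >= min_count}


def preprocess(text, vocabulary):
    """Reduce a message to the tokens that survive both filters."""
    return [w for w in tokenize(text) if w in vocabulary]


# Made-up documents, for illustration only.
docs = [
    "Win a free prize in the lottery",
    "Free prize for the lucky winner",
    "Is the meeting in the morning or the afternoon",
]
vocab = build_vocabulary(docs, min_count=2)
print(sorted(vocab))
print(preprocess(docs[0], vocab))
```

Building the vocabulary from the training documents only keeps information about the test set out of the feature selection.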

NAÏVE BAYES CLASSIFIER RESULTS

Columns: # of spam documents / # of ham documents / False positive rate / Accuracy

Trial #1
  Training Set: [missing] / [missing] / [missing]% / 98.9%
  Testing Set: 82476 (spam and ham counts fused)

Trial #2
  Training Set: [missing] / [missing] / [missing]% / 98.1%
  Testing Set: 68451 (spam and ham counts fused)

Trial #3
  Training Set: [missing] / [missing] / [missing]% / 96.9%
  Testing Set: 57232 (spam and ham counts fused)
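The slides do not say how the two reported metrics were computed. A sketch under the usual definitions, assuming spam is the positive class: the false positive rate is the share of ham flagged as spam, and accuracy is the share of all messages labeled correctly. The labels below are invented for illustration and are not the author's data.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives, treating `positive` as spam."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def false_positive_rate(fp, tn):
    """Fraction of ham messages wrongly flagged as spam."""
    ham_total = fp + tn
    return fp / ham_total if ham_total else 0.0


def accuracy(tp, fp, tn, fn):
    """Fraction of all messages labeled correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical labels, for illustration only.
y_true = ["spam", "spam", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "ham"]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
fpr = false_positive_rate(fp, tn)
acc = accuracy(tp, fp, tn, fn)
print(f"false positive rate = {fpr:.3f}, accuracy = {acc:.3f}")
```

For spam filtering the false positive rate is often the more important number, since a legitimate message sent to the spam folder usually costs more than a spam message that gets through.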

SVM RESULTS

Columns: # of spam documents / # of ham documents / False positive rate / Accuracy

Trial #1
  Training Set: [missing] / [missing] / [missing]% / 99.6%
  Testing Set: 82476 (spam and ham counts fused)

Trial #2
  Training Set: [missing] / [missing] / [missing]% / 99.3%
  Testing Set: 68451 (spam and ham counts fused)

Trial #3
  Training Set: [missing] / [missing] / [missing]% / 97.6%
  Testing Set: 57232 (spam and ham counts fused)

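WEAKNESS OF NAÏVE BAYES CLASSIFIER
- Example: "hey man are you interested in sports? then email me at ..."
- Spammers can avoid using words that are more likely to appear in spam email
- A message built only from everyday words gives a word-based classifier no evidence of spam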
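The weakness can be reproduced with a toy classifier. This is a minimal multinomial naïve Bayes sketch with add-one (Laplace) smoothing, not the author's implementation. The class `NaiveBayes`, the choice to skip words unseen in training, and the six training messages are assumptions invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, documents, labels):
        self.doc_counts = Counter(labels)   # messages per class
        self.word_counts = {}               # class -> word frequencies
        for doc, label in zip(documents, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(doc))
        self.vocabulary = set()
        for counts in self.word_counts.values():
            self.vocabulary.update(counts)
        return self

    def log_score(self, text, label):
        """Log prior plus summed log likelihoods of the known words."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            if word in self.vocabulary:     # words unseen in training are skipped
                score += math.log((counts[word] + 1) / (total + len(self.vocabulary)))
        return score

    def predict(self, text):
        return max(self.word_counts, key=lambda label: self.log_score(text, label))


# Invented training messages, for illustration only.
train = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("are you coming to the game", "ham"),
    ("interested in sports tonight man", "ham"),
    ("hey are you free for lunch", "ham"),
]
model = NaiveBayes().fit([text for text, _ in train], [label for _, label in train])

print(model.predict("claim free money now"))
print(model.predict("hey man are you interested in sports?"))
```

Every word in the second message appears only in the ham training examples, so its ham score dominates even if the sender is a spammer.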

WORK AHEAD
- Finish implementing and testing Decision Tree
- More preprocessing of the data
- Perform more trials with different ratios of training set and testing set