A False Positive Safe Neural Network for Spam Detection Alexandru Catalin Cosoi



Does this look familiar?

Anatrim

Oh boy, it’s getting worse!!!

Bad Bad Spammer!!!
Databases:
– D: random legitimate text
– D1: different rephrasings of a certain spam phrase
– D2: different rephrasings of another spam phrase
– …
– Dn: different rephrasings of another spam phrase
Create spam message script:
– Choose a random phrase from D1
– Choose random text from D
– Choose a random phrase from D2
– Choose random text from D
– …
– Choose a random phrase from Dn
– Send message
Example:
– 40 samples of different subjects
– 50 samples of different titles
– 30 samples of different titles (part II)
– Combined, these yield many different combinations
Appeared as a consequence of botnets
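The message-building script described above can be sketched in a few lines of Python. This is only an illustration of the interleaving idea, not the spammers' actual tooling: the function name `build_spam_message` and the sample databases are hypothetical.

```python
import random


def build_spam_message(legit_texts, phrase_databases, rng):
    """Interleave one random variant from each D_i with random text from D.

    legit_texts      -- database D (random legitimate text)
    phrase_databases -- list [D_1, ..., D_n], each a list of rephrasings
    rng              -- a random.Random instance, passed in for reproducibility
    """
    parts = []
    last = len(phrase_databases) - 1
    for index, variants in enumerate(phrase_databases):
        # One rephrasing of the i-th spam phrase.
        parts.append(rng.choice(variants))
        # Legitimate filler goes between spam phrases, not after the last one.
        if index < last:
            parts.append(rng.choice(legit_texts))
    return "\n".join(parts)


# Hypothetical sample data, for illustration only.
D = ["The weather was fine that day.", "She opened the old book slowly."]
D_SPAM = [
    ["Lose weight fast", "Drop the pounds quickly"],
    ["Try Anatrim today", "Anatrim works for you"],
    ["Order now", "Buy it now"],
]

message = build_spam_message(D, D_SPAM, random.Random(0))
```

Because every slot is drawn independently, the number of distinct messages grows as the product of the database sizes, which is why a handful of samples per slot is enough to defeat exact-match signatures.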

Features
– Larger time frame: KeyWord!!!!
– Weak features
  – Words like “Anatrim”, “Viagra”, “Xanax”, “Stock”
  – Simple word combinations like “Stock alert”, “Strong buy”
  – Simple header heuristics (for both spam and ham) like: valid reply, weird message ID, forged headers
– Example:
  – Top 500 spammy words from a Bayesian dictionary
  – Some simple header heuristics from SpamAssassin’s SARE Ninjas
  – Trainer’s personal flavour
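One way to picture these weak features is as a binary vector with one slot per word, word combination, or header heuristic. The sketch below is an assumption about the encoding, not the presented system: the lists `KEYWORDS` and `PHRASES`, the function `extract_features`, and the single "weird message ID" check are all hypothetical stand-ins.

```python
import re

# Hypothetical weak features taken from the examples on the slide.
KEYWORDS = ["anatrim", "viagra", "xanax", "stock"]
PHRASES = ["stock alert", "strong buy"]

# Assumed heuristic: a well-formed Message-ID looks like <local@domain>.
MESSAGE_ID_RE = re.compile(r"^<[^<>@\s]+@[^<>@\s]+>$")


def extract_features(subject, body, headers):
    """Return a list of 0/1 flags: keywords, phrases, then header heuristics."""
    text = (subject + " " + body).lower()
    words = set(re.findall(r"[a-z0-9']+", text))

    vector = [int(keyword in words) for keyword in KEYWORDS]
    vector += [int(phrase in text) for phrase in PHRASES]

    # "Weird message id": 1 when the header is missing or malformed.
    message_id = headers.get("Message-ID", "")
    vector.append(int(MESSAGE_ID_RE.match(message_id) is None))
    return vector


spam_vector = extract_features(
    "Stock alert", "Strong buy now on Anatrim", {"Message-ID": "<abc@example.com>"}
)
ham_vector = extract_features("Lunch", "See you at noon", {})
```

Each flag is weak on its own (the ham example trips the message-ID heuristic), which is why the features are fed to a classifier rather than used as rules.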

Why ART?
– In a conventional neural network, training occurs by modifying the weights of each neuron
– For large amounts of data, forgetting important details might actually happen
– ART solves the stability-plasticity dilemma
– Based on template detection
– An unlimited number of templates allows an unlimited number of patterns
– 2 self-organizing neural networks + a mapping module = a supervised self-organizing neural network

Adaptive Resonance Theory
– Similar to a clustering algorithm (as many clusters as needed)
– ARTMAP = ARTa + ARTb + MapField

ART Vigilance
– A big value (fragmented): accepts only small errors; many small clusters; high precision
– A small value (imprecise): accepts large errors; a few big clusters; errors can appear
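The effect of the vigilance parameter can be seen in a small clustering sketch. The slides do not say which ART variant was used, so this example assumes Fuzzy ART with complement coding and fast learning; the function names and the toy data are illustrative only.

```python
def complement_code(x):
    """Append 1 - v for each feature so every coded input has the same norm."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND used by Fuzzy ART."""
    return [min(u, v) for u, v in zip(a, b)]


def fuzzy_art(samples, vigilance, alpha=0.001, beta=1.0):
    """Cluster samples in [0, 1]^d; returns (templates, cluster label per sample)."""
    templates, labels = [], []
    for x in samples:
        coded = complement_code(x)
        norm = sum(coded)
        # Try templates in decreasing order of the choice function.
        order = sorted(
            range(len(templates)),
            key=lambda j: -sum(fuzzy_and(coded, templates[j])) / (alpha + sum(templates[j])),
        )
        chosen = None
        for j in order:
            # Vigilance test: how much of the input does the template cover?
            if sum(fuzzy_and(coded, templates[j])) / norm >= vigilance:
                chosen = j
                break
        if chosen is None:
            # No template is close enough: create a new cluster.
            templates.append(coded)
            chosen = len(templates) - 1
        else:
            # Move the winning template toward the input (beta=1: fast learning).
            old = templates[chosen]
            overlap = fuzzy_and(coded, old)
            templates[chosen] = [beta * o + (1.0 - beta) * w for o, w in zip(overlap, old)]
        labels.append(chosen)
    return templates, labels


points = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (0.85, 0.95)]
clusters_low = len(fuzzy_art(points, vigilance=0.1)[0])
clusters_mid = len(fuzzy_art(points, vigilance=0.9)[0])
clusters_high = len(fuzzy_art(points, vigilance=0.99)[0])
```

On these four points a low vigilance merges everything into one cluster, a moderate value recovers the two visible groups, and a very high value gives every point its own cluster, matching the "imprecise" versus "fragmented" trade-off.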

ART ++

Algorithm

Corpus
– 2.5 million spam messages (sampled from waves with a high degree of variation)
– Around 1000 simple, low-relevance text heuristics (not counting the standard header heuristics): the first 1000 words (ordered by discrimination, but with a minimum of one hundred occurrences) from a Bayesian dictionary trained on this corpus, plus the standard header heuristics
– Almost 1 million legitimate messages
– 75% of the message corpus was used for training the neural network and 25% for testing it
– 1.5 days to train!!!!
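Selecting the first words "ordered by discrimination, but with a minimum of one hundred occurrences" could look roughly like the sketch below. The slides do not define the discrimination score, so the distance of a word's spam probability from 0.5 is used here as an assumption; the function name and the toy counts are hypothetical.

```python
def top_discriminating_words(spam_counts, ham_counts, n_spam, n_ham,
                             top_n=1000, min_occurrences=100):
    """Rank words by how far their spam probability is from 0.5.

    spam_counts / ham_counts -- dicts mapping word -> number of occurrences
    n_spam / n_ham           -- number of messages in each class
    """
    scored = []
    for word in set(spam_counts) | set(ham_counts):
        spam_hits = spam_counts.get(word, 0)
        ham_hits = ham_counts.get(word, 0)
        total = spam_hits + ham_hits
        # Skip rare words: their statistics are too noisy to trust.
        if total == 0 or total < min_occurrences:
            continue
        spam_rate = spam_hits / n_spam
        ham_rate = ham_hits / n_ham
        spamminess = spam_rate / (spam_rate + ham_rate)
        scored.append((abs(spamminess - 0.5), word))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [word for _, word in scored[:top_n]]


# Toy counts, for illustration only.
spam_counts = {"viagra": 500, "stock": 300, "hello": 200, "rare": 5}
ham_counts = {"hello": 250, "stock": 20, "meeting": 400}
selected = top_discriminating_words(spam_counts, ham_counts, 1000, 1000, top_n=3)
```

Note that this score also ranks strongly "hammy" words such as "meeting" highly; a filter that wants only spammy words would rank by `spamminess` itself instead.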

Results
– FP: 1% → 0.0001%
– FN: 4% → 20%
– On some corpora (TREC 2006) we had … not so great results (but with current heuristics):
  – FN: 35%
  – FP: 2 messages!
– At least, just a few false positives!
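For reference, the two rates quoted above are usually computed per class: false positives over all legitimate messages, false negatives over all spam. The sketch below assumes that convention and uses made-up labels; it does not reproduce the presented numbers.

```python
def error_rates(true_labels, predicted_labels):
    """Return (false positive rate, false negative rate).

    A false positive is a legitimate ("ham") message flagged as spam;
    a false negative is a spam message that was let through.
    """
    pairs = list(zip(true_labels, predicted_labels))
    ham_total = sum(1 for truth, _ in pairs if truth == "ham")
    spam_total = sum(1 for truth, _ in pairs if truth == "spam")
    false_positives = sum(1 for truth, pred in pairs if truth == "ham" and pred == "spam")
    false_negatives = sum(1 for truth, pred in pairs if truth == "spam" and pred == "ham")
    fp_rate = false_positives / ham_total if ham_total else 0.0
    fn_rate = false_negatives / spam_total if spam_total else 0.0
    return fp_rate, fn_rate


# Hypothetical labels, for illustration only.
truth = ["ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam", "spam"]
guess = ["ham", "ham", "ham", "spam", "spam", "spam", "ham", "ham", "spam"]
fp_rate, fn_rate = error_rates(truth, guess)
```

A filter tuned to be "false positive safe" accepts a higher `fn_rate` in exchange for driving `fp_rate` toward zero, which is the trade visible in the results above.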

Conclusions
– ART + Simple Features + Spam = Love
– ART + False Positives + Spam = OMG!!!
– (ART++) = Heuristic Filter + ARTMAP
– Must use a lot of messages
– It is very difficult to find representative samples for individual waves
– Can also be applied to other neural networks
– Interesting PowerPoint template…

Thanks! Questions?