6/1/2015 Email Spam Filtering - Muthiyalu Jothir 1 Email Spam Filtering Computer Security Seminar N.Muthiyalu Jothir – 271120 Media Informatics.

Presentation transcript:

6/1/2015 Email Spam Filtering - Muthiyalu Jothir
1 Email Spam Filtering
Computer Security Seminar
N. Muthiyalu Jothir – Media Informatics

2 Agenda
What is Spam?
Statistics
Who benefits from it?
Spam filtering techniques
Combining filters
Conclusion

3 What is Spam?
Spam → unsolicited email: sending identical or nearly identical messages to thousands (or millions) of recipients.
Caution! “SPAM” (“Spiced Ham”) is also a popular American canned meat brand.

4 Problem
With a tiny investment, a spammer can send over 100,000 bulk emails per hour.
Junk mail wastes storage and transmission bandwidth.
ISP’s investment → a cost we absorb as the ISP’s customers.
Spam is a problem because the cost is forced onto us, the recipients.

5 Statistics
Email considered spam: 40% of all email
Daily spam emails sent: 12.4 billion
Daily spam received per person: 6
Annual spam received per person: 2,200
Spam cost to all non-corporate Internet users: $255 million
Spam cost to all U.S. corporations in 2002: $8.9 billion
Estimated Spam increase by %
Users who reply to spam: 28%
Users who purchased from spam: 8%
Wasted corporate time per spam: 4-5 seconds

6 Who benefits from Spam?
Financial firms, e.g. mortgage
Lead generators (gain 2% of loan value per customer data)
Spammers (share the profit with lead generators)
Recipient
Information about interested customers
Recipient replies here

7 Spam Control Techniques
Fight-back techniques: reporting spam to the ISP; fight-back filters; slow senders; law ???; etc.
Filtering techniques: challenge-response filtering; blacklists and white lists; content-based filters (rule-based, Bayesian filters).

8 Reporting Spam To ISPs
The original spam solution.
Legitimate ISPs respond to such complaints, and spammers are kicked off.
Disadvantages: spammers disguise themselves; naïve users cannot interpret the headers.

9 Filters that Fight Back (FFB)
The majority of spam contains links to web pages.
Spam filters could automatically retrieve the URLs and crawl those pages, which would increase the load on the server hosting them.
If all spam recipients do this at the same time, the server might crash, so the cost of spamming increases.
Caution! FFB usually works with blacklists (of malicious servers) in order to avoid attacking innocent servers.

10 Filtering Techniques

11 Spam Vs Ham
Care to be taken in any spam filtering technique: “All the spam could be allowed to pass through; but not even a single legitimate mail should be filtered.”
False positive: legitimate mail classified as spam. The lowest possible false positive rate is desired.
Caution: check your junk folder before deleting. Don’t believe your spam filter.
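
The false positive rate mentioned on this slide can be measured on a labeled sample of mail. The following is a minimal sketch; the function name and the sample labels are made up for illustration and are not from the presentation.

```python
def false_positive_rate(labels, predictions):
    """Fraction of legitimate (ham) messages that were wrongly flagged as spam."""
    ham_total = sum(1 for y in labels if y == "ham")
    false_positives = sum(
        1 for y, p in zip(labels, predictions) if y == "ham" and p == "spam"
    )
    # With no ham in the sample there is nothing to misclassify.
    return false_positives / ham_total if ham_total else 0.0


# Hypothetical sample: true labels and what a filter decided.
labels = ["ham", "ham", "spam", "ham", "spam"]
predictions = ["ham", "spam", "spam", "ham", "ham"]
fpr = false_positive_rate(labels, predictions)  # one of three ham mails was flagged
```

Missed spam (the last message in the sample) does not affect this number, which matches the slide's priority: letting spam through is tolerable, losing legitimate mail is not.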

12 Challenge-Response Filtering
Emails from unknown senders receive an auto-reply message asking the senders to verify themselves.
Senders are “challenged” to type in a word that is hidden within a graphic or a sound file.
Mail is forwarded to the receiver’s inbox only after a successful “response”.
This technique filters almost all spam: no spammer would be interested in taking the extra effort to prove himself or herself.
Commercial product: “spamarrest”.
Disadvantages: the technique is rude; sometimes senders don’t reply, or forget to reply, to the challenge.
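
The challenge-response flow could be sketched roughly as follows. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, the "hidden word" is a random token rather than a graphic or sound file, and no mail is actually sent.

```python
import secrets


class ChallengeResponseFilter:
    """Holds mail from unknown senders until they answer a challenge."""

    def __init__(self, known_senders):
        self.known_senders = set(known_senders)
        self.pending = {}  # sender -> (expected word, list of held messages)

    def receive(self, sender, message):
        if sender in self.known_senders:
            return "inbox"
        if sender not in self.pending:
            # In a real system this word would be hidden in an image or sound file.
            self.pending[sender] = (secrets.token_hex(3), [])
        self.pending[sender][1].append(message)
        return "challenged"

    def respond(self, sender, answer):
        """Return the held messages if the answer is correct, else nothing."""
        if sender not in self.pending:
            return []
        expected, held = self.pending[sender]
        if answer != expected:
            return []
        del self.pending[sender]
        self.known_senders.add(sender)  # verified senders are not challenged again
        return held
```

A sender who never responds simply stays in `pending`, which is the disadvantage the slide points out.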

13 Blacklists and White lists
Blacklists are lists of misbehaving servers or known spammers, collected by several sites. The sender ID in the email is compared with the blacklist.
White lists are complementary to blacklists and contain addresses of trusted contacts.
Use blacklists and white lists for first-level filtering (before applying content checks), not as the only tool for making the decision.
Disadvantage: prone to misconfiguration, with legitimate servers unable to get off a list on which they were incorrectly placed.
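
Using the lists as a first-level filter might look like the sketch below. The function name, the three return values, and the choice to match both full addresses and bare domains are assumptions made for illustration.

```python
def first_level_filter(sender, whitelist, blacklist):
    """Decide from the sender alone; unknown senders go on to content checks."""
    sender = sender.lower()
    domain = sender.rsplit("@", 1)[-1]
    if sender in whitelist:
        return "accept"
    if sender in blacklist or domain in blacklist:
        return "reject"
    # Neither list knows the sender, so the lists must not make the final decision.
    return "content-check"


# Hypothetical lists.
whitelist = {"colleague@example.org"}
blacklist = {"spam-server.example", "bulk@example.net"}
```

Checking the white list first means a trusted contact still gets through even when their server has been placed on a blacklist by mistake.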

14 Content based filters
It is not a good idea to filter mail based on blacklists alone.
Wiser decision → consider the actual content of the email.
Almost all successful spam filters use this technique.
Major types: rule-based and Bayesian.

15 Rule Based Filters
Rule-based filters use static rules to decide whether a mail is spam or not.
Rules could be based on: words and phrases; lots of uppercase characters; exclamation points; special characters; web links; HTML messages; background colors; crazy Subject lines; etc.

16 Rule based filters
Rules are given scores based on importance.
Incoming mails are parsed and checked for known malicious patterns.
A total score is calculated for the triggered rules.
If the final score > threshold, classify as spam; otherwise, classify as legitimate mail.
The threshold is decided by the user.
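
The scoring scheme on this slide can be sketched in a few lines. The rule names, patterns, scores and the default threshold below are invented for illustration; they are not taken from any real filter's rule set.

```python
import re

# Each rule: (name, pattern, score). All values here are hypothetical.
RULES = [
    ("SPAMMY_PHRASE", re.compile(r"\b(free money|act now)\b", re.I), 2.0),
    ("MANY_EXCLAMATIONS", re.compile(r"!{3,}"), 1.0),
    ("UPPERCASE_RUN", re.compile(r"\b[A-Z]{5,}\b"), 1.0),
    ("WEB_LINK", re.compile(r"https?://", re.I), 0.5),
]


def score_message(text):
    """Return the total score and the names of the rules that triggered."""
    triggered = [(name, score) for name, pattern, score in RULES if pattern.search(text)]
    return sum(score for _, score in triggered), [name for name, _ in triggered]


def classify(text, threshold=3.0):
    """Spam if the total score exceeds the user-chosen threshold."""
    total, _ = score_message(text)
    return "spam" if total > threshold else "ham"


spam_example = "FREE MONEY!!! Act now at http://example.com"
ham_example = "Lunch at noon tomorrow?"
```

Raising the threshold trades more missed spam for fewer false positives, which is why the slide leaves that choice to the user.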

17 Rule Based Filters
“SpamAssassin”, a popular spam filtering product, uses rule-based filtering.
Perl regex (regular expressions) is used for pattern checking.
Example rules:
header __LOCAL_FROM_NEWS From
body __LOCAL_SALES_FIGURES /\bMonthly Sales Figures\b/
score LOCAL_NEWS_SALES_FIGURES 0.8

18 Rule Based Filters
Advantages: easy to implement; no training required.
Disadvantages: static rules are too general; spammers find new ways to deceive the rules.

19 Bayesian Filters
Bayesian filters are the latest in spam filtering technology and the most successful.
Bayes classifiers have been used extensively in the field of pattern recognition.
Given an unlabeled example, the classifier calculates the most likely classification, together with its probability.

20 Bayesian Filters
Steps in Bayes filtering: training, validation, implementation.
Training starts with two collections of mail: one of spam and one of legitimate mail.
For every word in these emails, the filter calculates a spam probability based on the proportion of spam occurrences.
Bayesian filters are quite accurate and adapt automatically as spam evolves.
Bayesian filtering minimizes false positives because it considers evidence of innocence as well as evidence of spam.
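
The training step can be illustrated with a small sketch. The slide does not give the exact formula, so the per-word estimate used here (the share of a word's occurrence rate that comes from spam) is an assumption, as are the function name and the toy mails; both collections are assumed to be non-empty.

```python
from collections import Counter


def train_word_probabilities(spam_mails, ham_mails):
    """Estimate, for each word, how strongly its presence points to spam (0..1)."""
    # Count in how many mails of each collection a word appears.
    spam_counts = Counter(w for mail in spam_mails for w in set(mail.lower().split()))
    ham_counts = Counter(w for mail in ham_mails for w in set(mail.lower().split()))
    probabilities = {}
    for word in set(spam_counts) | set(ham_counts):
        spam_rate = spam_counts[word] / len(spam_mails)
        ham_rate = ham_counts[word] / len(ham_mails)
        probabilities[word] = spam_rate / (spam_rate + ham_rate)
    return probabilities


# Toy training collections (hypothetical).
spam_mails = ["cheap loans now", "cheap pills"]
ham_mails = ["meeting now", "lunch plans"]
probs = train_word_probabilities(spam_mails, ham_mails)
```

Words seen only in legitimate mail get a probability of 0, which is the "evidence of innocence" the slide refers to. Retraining on newer mail is what lets the filter adapt as spam evolves.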

21 Bayesian Filtering
Bayes probability:
$$\Pr(\text{spam} \mid \text{words}) = \frac{\Pr(\text{spam}) \, \Pr(\text{words} \mid \text{spam})}{\Pr(\text{words})}$$
A probability closer to 1 is classified as spam and one closer to 0 as ham; 0.5 is set as the threshold.
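
A worked instance of the formula may help. In this sketch Pr(words) is expanded with the law of total probability over the two classes, and all input numbers are invented for illustration; the function names are hypothetical.

```python
def posterior_spam(p_spam, p_words_given_spam, p_words_given_ham):
    """Pr(spam | words) by Bayes' rule."""
    # Pr(words) = Pr(spam) Pr(words | spam) + Pr(ham) Pr(words | ham)
    p_words = p_spam * p_words_given_spam + (1 - p_spam) * p_words_given_ham
    return p_spam * p_words_given_spam / p_words


def label_from_probability(p, threshold=0.5):
    """Apply the 0.5 threshold from the slide."""
    return "spam" if p > threshold else "ham"


# Illustrative numbers: 40% of mail is spam; the words appear in 50% of spam
# and in 5% of legitimate mail.
p = posterior_spam(0.4, 0.5, 0.05)  # 0.2 / 0.23, roughly 0.87
verdict = label_from_probability(p)
```

With these numbers the posterior is well above 0.5, so the mail is classified as spam; swapping the two likelihoods would push it well below the threshold.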

22 Neural Network for Training
[Figure: Neural Network Structure]

23 Neural Networks for Training
Neural networks are used to train the spam filter (rule-based or Bayesian); the network itself is not a filter.
Input → words, rules, etc.
The network is trained over multiple samples of the user’s mails (both spam and ham).
The weights of the links are altered until the desired output is obtained.

24 Supervised Learning
Supervised learning → training with a “teacher” signal.
Train the system until the edge weights are optimized and no longer change.
Caution! Take care not to overtrain the network.
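
The idea of altering weights until the desired output is obtained can be shown with a single perceptron, which is a deliberate simplification of the multi-layer network in the slides. The features, samples, and parameter values are hypothetical.

```python
def train_perceptron(samples, n_features, learning_rate=0.1, max_epochs=100):
    """Adjust weights from labeled samples until no sample is misclassified."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(max_epochs):
        changed = False
        for features, target in samples:
            activation = bias + sum(w * x for w, x in zip(weights, features))
            output = 1 if activation > 0 else 0
            error = target - output  # the "teacher" signal
            if error:
                changed = True
                bias += learning_rate * error
                weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
        if not changed:  # weights stayed unaltered for a full pass
            break
    return weights, bias


def predict(weights, bias, features):
    return 1 if bias + sum(w * x for w, x in zip(weights, features)) > 0 else 0


# Features: [contains a web link, contains a spammy phrase]; target 1 = spam.
samples = [([1, 1], 1), ([0, 1], 1), ([1, 0], 0), ([0, 0], 0)]
weights, bias = train_perceptron(samples, n_features=2)
```

The `max_epochs` cap only bounds training time. Guarding against overtraining, as the slide warns, would normally also involve checking accuracy on mail that was held out of training.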

25 Combining Spam Filters
Goal → the combined filter aims to improve on the individual filters’ performance.
Combined Filter = Original Filter (OF) + Received Filter (RF)
Max gain → the received filter contains some feature sets not found in the original filter.
E.g. Original filter = {“Share Market”, “Higher Studies”}; Received filter = {“Share Market”, “Job Alerts”}
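
Using the slide's own example, the gain from the received filter can be seen with plain set operations. Treating a filter as a set of feature names, and reading "+" as set union, are assumptions made only for this illustration.

```python
original_filter = {"Share Market", "Higher Studies"}
received_filter = {"Share Market", "Job Alerts"}

# Features the received filter knows about and the original one does not.
new_features = received_filter - original_filter

# Everything the two filters cover together.
combined_features = original_filter | received_filter
```

Here the only gain is “Job Alerts”; if the received filter contained nothing beyond “Share Market”, combining would add no coverage.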

26 Challenges
Decisions (spam / ham) are made by both filters individually.
Decisions agree → no problem.
Disagreement → due to the difference in feature sets.
Challenges: “How do we select the correct decision or filter?” “Who selects it?”

27 Filter Selector (FS)
Training phase → FS predicts the unique features (e.g. words) of RF.
Parse the emails of the training set and extract the features.
‘Bag’ of (predicted) features for RF.
Text similarity comparison between the current email’s features and the feature sets of the filters.

28 Algorithm Flowchart
1. Training Phase
2. Final Verdict

29 TF – IDF Similarity Measure
Commonly used in information retrieval applications.
More frequent words would be key to accurate classification of emails.
The feature set predicted by FS is unique.
“Query – Document” retrieval procedure: the 2 documents are the feature sets; the query is the current email.
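
The query-document procedure could be sketched as below: the two feature sets act as documents, the current email acts as the query, and the filter with the more similar feature set is selected. The exact weighting used in the presentation is not given, so the smoothed IDF, the cosine similarity, the inclusion of the query when computing IDF, and all names are assumptions.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """One TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF keeps every weight positive, even for common terms.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_filter(email_terms, filter_features):
    """Pick the filter whose feature set is most similar to the email."""
    names = list(filter_features)
    vectors = tf_idf_vectors([filter_features[n] for n in names] + [email_terms])
    query = vectors[-1]
    scores = {name: cosine(query, vec) for name, vec in zip(names, vectors)}
    return max(scores, key=scores.get), scores


# Feature sets from the earlier slide, tokenized into lowercase words.
filter_features = {
    "OF": ["share", "market", "higher", "studies"],
    "RF": ["share", "market", "job", "alerts"],
}
best, scores = select_filter(["new", "job", "alerts", "for", "you"], filter_features)
```

In this toy case the email shares terms only with the received filter's features, so RF's verdict would be the one to trust when the two filters disagree.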

30 Experiments & Results

31 Conclusion
We discussed the techniques to “kill” spam.
Comparison between various techniques.
So far, Bayesian seems to be reliable.
Discussed a new approach to combine filters.
Future work: learning techniques for the Filter Selector; better similarity measures.

32 Thank You