Bayesian Filtering (Team Glyph: Debbie Bridygham, Pravesvuth Uparanukraw, Ronald Ko, Rihui Luo, Thuong Luu)

Presentation transcript:

Bayesian Filtering
Team Glyph: Debbie Bridygham, Pravesvuth Uparanukraw, Ronald Ko, Rihui Luo, Thuong Luu

Background
- A strong need exists to identify “bad” items in a population and remove them. Examples: spam, unsolicited IMs, etc.
- Filtering often results in an “arms race” that requires rapid response.
- An “arms race” favors inherently adaptive methods over others.

Benefits of Filters
- Less unwanted traffic, thus less wasted space on clients & servers
- Greater use of internet services due to reduced customer frustration
- Provide some protection against dangerous traffic: scams, phishing attacks, viruses, etc.

Downsides of Filtering
- Excluding even one legitimate item (a false positive) is less desirable than letting 10 or more illegitimate items pass.
- Reducing the percentage of undesirable traffic often causes legitimate traffic to be excluded as well.

Cost of Filtering
- Manual filtering has become prohibitively expensive
- Maintenance of static filters costs time & money
- Time spent maintaining keywords or updating software delays response
- The “arms race” often results in ever-escalating costs

Methodologies
- Manual filtering is prohibitive in terms of time
- Static filtering based on heuristics and keywords does not adapt except via manual updates
- Bayesian filtering is dynamic, adapting with each new item scanned and/or marked

What is Bayesian Filtering?
- Uses a Naïve Bayes classifier, which applies Bayes' theorem
- The classifier allows items to be adaptively categorized using probabilities and has a low rate of false positives
- Best-known use is in spam filtering; often credited to initial work by Paul Graham (“A Plan for Spam”) in 2002

Naïve Bayes Classifier
- Uses Bayes' theorem with the assumption that the probabilities are independent (rarely true), thus “naïve”
- The classifier can start with initial assumptions, i.e., probabilities that words occur in legitimate or illegitimate messages
- Is trained over time and adapts
- If the final probability reaches some threshold, an item is rejected
- Superior to keyword filtering
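
The threshold step can be sketched in Python. This is a minimal illustration, not the method of any particular product: the function names, the 0.9 threshold, the 0.01/0.99 clamp, and the equal-prior assumption are all choices made here for the example. It treats the per-word probabilities as independent (the "naïve" assumption) and sums log-odds rather than multiplying raw probabilities, so long messages do not underflow.

```python
import math

def _sigmoid(x):
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def combine_naive(word_probs):
    """Combine per-word spam probabilities, assuming independence
    and equal prior probability of spam and non-spam (assumption)."""
    log_odds = 0.0
    for p in word_probs:
        # Assumed clamp so that no single word yields infinite log-odds.
        p = min(max(p, 0.01), 0.99)
        log_odds += math.log(p) - math.log(1.0 - p)
    return _sigmoid(log_odds)

def is_rejected(word_probs, threshold=0.9):
    # The threshold value is hypothetical; real filters tune it.
    return combine_naive(word_probs) >= threshold
```

For example, `is_rejected([0.98, 0.9])` is true, while `is_rejected([0.2, 0.3])` is false. Summing log-odds is algebraically the same as computing the product of the probabilities divided by that product plus the product of their complements.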

Bayes Theorem
- First presented in 1763, based on work by mathematician Thomas Bayes
- Pr(A|B) = Pr(B|A) · Pr(A) / Pr(B)
- Specifies relationships between conditional probabilities
- Currently has practical use in many fields
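
As a worked illustration, the theorem can be applied to a single word in a message. The symbols are introduced here for the example and are not from the slides: S is the event "message is spam", H is "message is legitimate", and w is "message contains the word". Expanding Pr(w) with the law of total probability, and then assuming equal priors for S and H, gives:

```latex
\begin{align*}
\Pr(S \mid w) &= \frac{\Pr(w \mid S)\,\Pr(S)}{\Pr(w)} \\
              &= \frac{\Pr(w \mid S)\,\Pr(S)}{\Pr(w \mid S)\,\Pr(S) + \Pr(w \mid H)\,\Pr(H)} \\
              &= \frac{\Pr(w \mid S)}{\Pr(w \mid S) + \Pr(w \mid H)}
                 \qquad \text{if } \Pr(S) = \Pr(H) = \tfrac{1}{2}.
\end{align*}
```

The last line holds only under the equal-prior assumption; with unequal priors the second line is the one to use.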

Bayesian Filtering Usage
- Uses user input to develop individual statistics
- Probability matrix changes over time based on scanned messages and user decisions
- Matrix is used to calculate the probability that a message is unwanted
- Matrix adapts quickly to new input, producing surprisingly good results
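
One way to picture the adaptive "matrix" is as a pair of per-word counters that are updated each time the user marks a message. The sketch below is an assumption-laden illustration: the class name `AdaptiveFilter`, its methods, the whitespace tokenizer, and the neutral value 0.5 for unseen words are all hypothetical choices, not taken from the slides or from any product.

```python
from collections import Counter

class AdaptiveFilter:
    """Per-user word statistics that change with every marked message."""

    def __init__(self):
        self.spam_counts = Counter()  # messages marked spam containing each word
        self.ham_counts = Counter()   # messages marked legitimate containing each word
        self.spam_total = 0
        self.ham_total = 0

    def train(self, message, is_spam):
        # Count each word once per message (naive whitespace tokenizer).
        words = set(message.lower().split())
        if is_spam:
            self.spam_counts.update(words)
            self.spam_total += 1
        else:
            self.ham_counts.update(words)
            self.ham_total += 1

    def word_probability(self, word):
        # Return a neutral 0.5 when there is no evidence either way (assumption).
        if self.spam_total == 0 or self.ham_total == 0:
            return 0.5
        spam_rate = self.spam_counts[word] / self.spam_total
        ham_rate = self.ham_counts[word] / self.ham_total
        if spam_rate + ham_rate == 0:
            return 0.5
        return spam_rate / (spam_rate + ham_rate)
```

After training on one spam message containing "guarantee" and one legitimate message without it, `word_probability("guarantee")` is 1.0. Marking a second, legitimate message that contains "guarantee" lowers it to about 0.67, which shows the statistics following the user's decisions.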

Example Matrix

Example
- Suppose the word “guarantee” occurs in 500 of 2000 spam e-mails, but in only 5 of 1000 non-spam e-mails.
- The probability of spam for this word is then (500 / 2000) / ((500 / 2000) + (5 / 1000)) ≈ 0.98.
- This probability is combined with those of the other words in the message to compute the probability that the entire message is spam.
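
The arithmetic above can be checked with a few lines of Python. The function name and parameter names are made up for this sketch. The second call uses a hypothetical count (50 instead of 5 non-spam occurrences) to show how sensitive the score is to the word's frequency in legitimate mail.

```python
def word_spam_probability(spam_hits, spam_total, ham_hits, ham_total):
    """Spam probability for one word from its occurrence counts."""
    spam_rate = spam_hits / spam_total  # fraction of spam containing the word
    ham_rate = ham_hits / ham_total     # fraction of non-spam containing the word
    return spam_rate / (spam_rate + ham_rate)

# Counts from the example: 500 of 2000 spam, 5 of 1000 non-spam.
p_guarantee = word_spam_probability(500, 2000, 5, 1000)

# Hypothetical variation: the word is ten times more common in non-spam.
p_common = word_spam_probability(500, 2000, 50, 1000)
```

Here `p_guarantee` is 0.25 / 0.255, about 0.98, and `p_common` is 0.25 / 0.30, about 0.83.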

Bayesian Poisoning
- Attempts to fool Bayesian filtering (BF) systems by adding irrelevant words (often hidden)
- Type I attacks attempt to get messages through the filter. They can be active or passive; active attacks produce feedback to the sender via a “Web Bug” or other means
- Type II attacks attempt to cause false positives, i.e., force desirable messages to be rejected
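
A Type I attack can be illustrated numerically. The sketch below is a toy model under stated assumptions: per-word probabilities are combined with the equal-prior naive Bayes product rule, and the word probabilities, the padding size, and the 0.9 threshold are all invented for the example. It suggests why padding helps the sender only if the added words are ones this particular filter already associates with legitimate mail; words the filter regards as neutral (0.5) change nothing.

```python
import math

def combined_spam_probability(word_probs):
    # Equal-prior naive Bayes combination of per-word spam probabilities.
    spam_product = math.prod(word_probs)
    ham_product = math.prod(1.0 - p for p in word_probs)
    return spam_product / (spam_product + ham_product)

THRESHOLD = 0.9  # hypothetical rejection threshold

spam_words = [0.98, 0.95]    # words the filter considers strongly spammy
ham_padding = [0.2] * 5      # padding words the filter links to legitimate mail
neutral_padding = [0.5] * 5  # padding words the filter has no opinion on

before = combined_spam_probability(spam_words)
after_ham = combined_spam_probability(spam_words + ham_padding)
after_neutral = combined_spam_probability(spam_words + neutral_padding)
```

With these numbers, `before` is above the threshold, `after_ham` falls below it, and `after_neutral` equals `before` up to rounding.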

Poisoning Effectiveness
- Passive attacks are rarely effective because filters are individual and the sender gets no feedback
- Active attacks can be highly effective initially, if systems access “Web Bugs”
- All attacks lose effectiveness as the filter adjusts to incoming traffic

Products that use Bayesian Filtering
AlienCamel, DSPAM, Eudora, eXpurgate, Junk-Out, Mozilla, Pegasus Mail, POPFile, Postini, SeaMonkey, SpamAssassin, SpamBayes, SpamProbe, Thunderbird

Summary
- BF adapts to individual needs
- BF is highly effective
- BF adapts more quickly than other solutions
- BF is resistant to “poisoning”

References
[1] Sahami, M., et al. “A Bayesian Approach to Filtering Junk E-Mail”, 1998
[2] Graham, Paul. “A Plan for Spam”, 2002
[3] Graham-Cumming, John. “Does Bayesian poisoning exist?”, 2006

References, cont.
[4] Naive Bayes Classifier, Wikipedia, 2007
[5] Bayes Theorem, Wikipedia, 2007