
1 Bayesian Spam Filters Key Concepts –Conditional Probability –Independence –Bayes' Theorem

2 Spam or Ham? FROM: Terry Delaney [removed] TO: (removed) Subject: FDA approved on-line pharmacies! click here (removed) here (removed) Chose your product and site below: Canadian pharmacy (removed) - Cialis Soft Tabs - $5.78, Viagra Professional - $4.07, Soma - $1.38, Human Growth Hormone - $43.37, Meridia - $3.32, Tramadol - $2.17, Levitra - $11.97.

3 Quick Reminders Conditional Probability: For events E, F with $P(F) > 0$, $$P(E \mid F) = \frac{P(E \cap F)}{P(F)}.$$ Independence: E and F are independent if and only if $$P(E \cap F) = P(E)\,P(F).$$
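As a concrete instance of these definitions (a worked example of our own, not from the slides), roll a fair six-sided die and let E be "the roll is even" and F be "the roll is greater than 3", so $E \cap F = \{4, 6\}$:

```latex
\[
P(E \mid F) = \frac{P(E \cap F)}{P(F)} = \frac{2/6}{3/6} = \frac{2}{3},
\qquad
P(E \cap F) = \frac{1}{3} \neq \frac{1}{4} = P(E)\,P(F),
\]
% so E and F are not independent: learning that the roll exceeded 3
% raises the probability that it is even from 1/2 to 2/3.
```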

4 Bayes' Theorem: A Quick Proof For events E, F with $P(E) > 0$ and $P(F) > 0$: $$P(F \mid E) = \frac{P(E \mid F)\,P(F)}{P(E \mid F)\,P(F) + P(E \mid F^C)\,P(F^C)}.$$ By the definition of conditional probability, $P(F \mid E) = \frac{P(E \cap F)}{P(E)}$, and likewise $P(E \cap F) = P(E \mid F)\,P(F)$.

5 Proof cont. Since $E = (E \cap F) \cup (E \cap F^C)$ and the two pieces are disjoint, $P(E) = P(E \mid F)\,P(F) + P(E \mid F^C)\,P(F^C)$. Substituting this into the expression for $P(F \mid E)$ gives Bayes' Theorem.
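A quick numeric check of the theorem in Python; the probabilities below are illustrative assumptions of ours, not slide data:

```python
# Verify Bayes' theorem numerically with made-up probabilities.
p_F = 0.5             # P(F)
p_E_given_F = 0.125   # P(E | F)
p_E_given_Fc = 0.005  # P(E | F^C)

# Law of total probability: P(E) = P(E|F)P(F) + P(E|F^C)P(F^C)
p_E = p_E_given_F * p_F + p_E_given_Fc * (1 - p_F)

# Bayes' theorem: P(F|E) = P(E|F)P(F) / P(E)
print(p_E_given_F * p_F / p_E)  # ~0.9615
```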

6 Applying Bayes' Theorem Let our sample space be the set of email messages. Let S be the event that a message is spam; hence $S^C$ is the event that a message is not spam. Let E be the event that a message contains a word w.

7 Estimations Bayes' Theorem then gives $$P(S \mid E) = \frac{P(E \mid S)\,P(S)}{P(E \mid S)\,P(S) + P(E \mid S^C)\,P(S^C)}.$$ We estimate $P(E \mid S)$ by $p(w)$, the fraction of known spam messages containing w, and $P(E \mid S^C)$ by $q(w)$, the fraction of known non-spam messages containing w.

8 Estimation Continued Lacking a better prior, we assume an incoming message is equally likely to be spam or not: $P(S) = P(S^C) = 1/2$. The two factors of $1/2$ cancel, leaving the estimate $$r(w) = \frac{p(w)}{p(w) + q(w)} \approx P(S \mid E).$$
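In code, the single-word estimate is a few lines of arithmetic. A minimal sketch, assuming all four counts are nonzero; the function name and signature are our own:

```python
def single_word_spam_estimate(spam_with_word, spam_total,
                              ham_with_word, ham_total):
    """Estimate P(S | E) for one word w, assuming P(S) = P(S^C) = 1/2."""
    p = spam_with_word / spam_total  # p(w), estimates P(E | S)
    q = ham_with_word / ham_total    # q(w), estimates P(E | S^C)
    return p / (p + q)
```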

9 Spam based on single words? Probabilities based on single words: Bad Idea –False positives AND false negatives aplenty Instead, calculate based on n words $w_1, \dots, w_n$: let $E_i$ be the event that the message contains $w_i$, and assume the events $E_i$ are independent given $S$ (and given $S^C$), with $P(S) = P(S^C)$.

10 Final Approximation Under these assumptions, Bayes' Theorem reduces to $$r(w_1, \dots, w_n) = \frac{\prod_{i=1}^{n} p(w_i)}{\prod_{i=1}^{n} p(w_i) + \prod_{i=1}^{n} q(w_i)} \approx P(S \mid E_1 \cap \dots \cap E_n).$$

11 How do we use this? The user must train the filter on messages in his/her inbox to estimate the probabilities p(w) and q(w). The program or user must define a threshold probability r: if the estimate $r(w_1, \dots, w_n)$ is at least r, the message is considered spam.
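To make the whole procedure concrete, here is a minimal Python sketch. The function names and the dictionary layout for training counts are illustrative assumptions, not part of the slides, and there is no smoothing, so it assumes every tested word appeared at least once in training:

```python
def multi_word_spam_estimate(words, spam_counts, ham_counts,
                             spam_total, ham_total):
    """Final approximation: prod(p) / (prod(p) + prod(q)), assuming the
    words occur independently given the class and P(S) = P(S^C) = 1/2."""
    prod_p = prod_q = 1.0
    for w in words:
        prod_p *= spam_counts.get(w, 0) / spam_total  # p(w_i)
        prod_q *= ham_counts.get(w, 0) / ham_total    # q(w_i)
    return prod_p / (prod_p + prod_q)

def is_spam(words, spam_counts, ham_counts, spam_total, ham_total,
            threshold=0.9):
    """Flag a message as spam when the estimate meets the threshold r."""
    r = multi_word_spam_estimate(words, spam_counts, ham_counts,
                                 spam_total, ham_total)
    return r >= threshold
```

For large n the products underflow toward zero, so real implementations typically sum log p(w_i) and log q(w_i) instead of multiplying.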

12 Example Suppose the filter has the following data –Threshold Probability: 0.9 –"Viagra" occurs in 250 of 2000 spam messages –"Viagra" occurs in only 5 of 1000 non-spam messages Let's try to estimate the probability, using the process we just defined.

13 Example Cont. Step 1: Find the probability that a spam message contains the word "Viagra". –p(Viagra) = 250 / 2000 = 0.125 Step 2: Find the probability that a non-spam message contains the word "Viagra". –q(Viagra) = 5 / 1000 = 0.005

14 Example Cont. Since we are assuming that it is equally likely that an incoming message is or is not spam, we can estimate the probability with this equation: $$r(\text{Viagra}) = \frac{p(\text{Viagra})}{p(\text{Viagra}) + q(\text{Viagra})}$$

= = Since r(Viagra) is greater than the threshold of 0.9, we can reject this message as spam. Example Cont.

16 Harder Stuff Single-word detection can lead to a lot of false positives and false negatives. To counter this, most spam filters look for the presence of multiple words.

17 Another Example –2000 spam messages; 1000 real messages –"Viagra" appears in 400 spam messages –"Viagra" appears in 60 real messages –"Cialis" appears in 200 spam and 25 real messages –Threshold Probability: 0.9 Let's calculate the probability that a message containing both words is spam.

18 Example Cont. Step 1: Find the probability that a spam message contains the word "Viagra". –p(Viagra) = 400 / 2000 = 0.2 Step 2: Find the probability that a non-spam message contains the word "Viagra". –q(Viagra) = 60 / 1000 = 0.06

19 Example Cont. Step 3: Find the probability that a spam message contains the word "Cialis". –p(Cialis) = 200 / 2000 = 0.1 Step 4: Find the probability that a non-spam message contains the word "Cialis". –q(Cialis) = 25 / 1000 = 0.025

20 Example Cont. Using our approximation, we have: $$r(\text{Viagra}, \text{Cialis}) = \frac{p(\text{Viagra})\,p(\text{Cialis})}{p(\text{Viagra})\,p(\text{Cialis}) + q(\text{Viagra})\,q(\text{Cialis})}$$

21 Example Cont. $$r(\text{Viagra}, \text{Cialis}) = \frac{(0.2)(0.1)}{(0.2)(0.1) + (0.06)(0.025)} = \frac{0.02}{0.0215} \approx 0.930$$ Since 0.930 exceeds the threshold probability of 0.9, this message will be rejected as spam.
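The multi-word sketch from earlier reproduces this arithmetic:

```python
spam_counts = {"viagra": 400, "cialis": 200}  # word -> spam messages containing it
ham_counts  = {"viagra": 60,  "cialis": 25}   # word -> real messages containing it
r = multi_word_spam_estimate(["viagra", "cialis"], spam_counts, ham_counts,
                             spam_total=2000, ham_total=1000)
print(round(r, 3))  # 0.93, which meets the 0.9 threshold, so is_spam(...) flags it
```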

22 Questions?