Slide 1: Reflections on Bayesian Spam Filtering
Computing Science, University of Aberdeen
• Tutorial nr. 10 of CS2013 is based on Rosen, 6th Ed., Chapter 6 & exercises
• The following notes discuss why the Bayesian approach was useful in this case, and what might have been done differently

Slide 2: Bayesian Spam filtering
• Task: predict whether a given email is spam
• But what does that mean?
  » an unseen email?
  » an email in a specific corpus?
• The standard distinction in statistics between
  » a sample space (the entire “population” of interest)
  » a particular sample

Slide 3: Bayesian Spam filtering
• Task: predict whether a given email is spam
• First: focus on one word w (e.g., w = “friend”)
• Define P(E) = probability that a random email contains w at least once
• Define P(S) = probability that a random email is spam
• Task: estimate P(S|E)
  » I read the word “friend” in an email. Is this spam?

Slide 4: Task: estimate P(S|E)
• One task: you have a large corpus C of emails. Your task is to estimate P(S|E) over C
  » Variant task: estimate P(S|E) for an unseen email
• Assume all emails in your corpus have been marked as Good (G, no spam) or Bad (B, spam)
  » Variant assumption: you have only had resources to mark part of the corpus
• Direct approach: compute P(S|E) as a relative frequency: |{emails in C that are spam and contain w}| / |{emails in C that contain w}|
  » have to compute a different set for each w
  » C may not be representative (i.e., may not tell you much about an unseen message)
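To make the direct approach concrete, here is a minimal Python sketch (not from the slides: the corpus representation, the toy data and the function name are illustrative assumptions):

```python
# Direct estimate of P(S|E): among the emails in the corpus that contain w,
# what fraction are marked as spam? The (text, is_spam) representation and
# the toy corpus below are illustrative, not part of the tutorial.

def direct_estimate(corpus, w):
    containing = [is_spam for (text, is_spam) in corpus if w in text.lower().split()]
    if not containing:
        return None  # w never occurs: the frequency estimate is undefined
    return sum(containing) / len(containing)

corpus = [
    ("dear friend send money now", True),
    ("hi friend see you at the tutorial", False),
    ("notes on bayes theorem", False),
]
print(direct_estimate(corpus, "friend"))  # 1 of the 2 emails containing "friend" is spam -> 0.5
```

Note that every call rescans the whole corpus, which is exactly the objection above: a different set has to be computed for every word w.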

Slide 5: Task: estimate P(S|E)
• Promising approach: use Bayes’ Theorem
  P(S|E) = P(E|S) * P(S) / P(E)
• Data:
  |C| = 1,000,000,000
  |B| = 20,000, |G| = 10,000 (only a small part of C!)
  |G ∩ contain w| = 50
  |B ∩ contain w| = 2,500
• Let’s first use the approach suggested in the Practical, then explore alternatives

Slide 6: Task: estimate P(S|E)
P(S|E) = P(E|S) * P(S) / P(E)
|C| = 1,000,000,000
|B| = 20,000, |G| = 10,000 (only a small part!)
|G ∩ contain w| = 50
|B ∩ contain w| = 2,500
P(E|S) = 2,500 / 20,000 = 0.125
Assume P(S) = 0.1 (why not compute P(S) = 20,000/30,000 ≈ 0.67?)
How to estimate P(E)?

Slide 7: Task: estimate P(S|E)
P(S|E) = P(E|S) * P(S) / P(E)
|C| = 1,000,000,000
|B| = 20,000, |G| = 10,000 (only a small part!)
|G ∩ contain w| = 50
|B ∩ contain w| = 2,500
P(E|S) = 2,500 / 20,000 = 0.125
Assume P(S) = 0.1
How to estimate P(E)? (Assume we do not know |C ∩ contain w|)

Slide 8: Task: estimate P(S|E)
P(E|S) = 0.125, P(S) = 0.1. How to estimate P(E)?
P(E) =[Marginalisation]= P(E,S) + P(E,not-S)
     =[Product Rule]= P(S)*P(E|S) + P(not-S)*P(E|not-S)
     = (0.1 * 0.125) + (0.9 * 50/10,000)
     = 0.0125 + (0.9 * 0.005)
     = 0.0125 + 0.0045
     = 0.0170

Slide 9: Task: estimate P(S|E)
We can now estimate P(S|E) using Bayes’ Theorem:
P(S|E) = P(E|S) * P(S) / P(E)
P(E|S) = 0.125, P(S) = 0.1, P(E) = 0.0170
Prediction: P(S|E) = 0.125 * 0.1 / 0.0170 ≈ 0.74
(Given the threshold used in the Tutorial, the message is not classified as Spam)
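The arithmetic on slides 5 to 9 can be checked with a few lines of Python; this simply plugs the counts from the slides into the marginalisation step and Bayes' Theorem (the variable names are mine):

```python
# Counts from the marked part of the corpus (slides 5-7)
B, G = 20_000, 10_000        # marked spam (Bad) and non-spam (Good) emails
B_w, G_w = 2_500, 50         # how many of each contain the word w

p_E_given_S = B_w / B        # P(E|S)     = 0.125
p_E_given_notS = G_w / G     # P(E|not-S) = 0.005
p_S = 0.1                    # assumed prior probability of spam

# Marginalisation + product rule: P(E) = P(S)P(E|S) + P(not-S)P(E|not-S)
p_E = p_S * p_E_given_S + (1 - p_S) * p_E_given_notS

# Bayes' Theorem: P(S|E) = P(E|S)P(S) / P(E)
p_S_given_E = p_E_given_S * p_S / p_E

print(round(p_E, 4), round(p_S_given_E, 3))   # 0.017 0.735
```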

Slide 10: Task: estimate P(S|E)
P(S|E) = P(E|S) * P(S) / P(E)
Some alternatives:
1. (see Practical) You know nothing about P(S). S is Boolean, hence estimate P(S) = 0.5. This is higher than our 0.1, hence you would be over-estimating the probability P(S|E).
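For comparison (this recalculation is not on the slides): with the same counts but P(S) = 0.5, marginalisation gives P(E) = 0.5*0.125 + 0.5*0.005 = 0.065, so P(S|E) = 0.0625 / 0.065 ≈ 0.96, well above the ≈ 0.74 obtained with P(S) = 0.1.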

Slide 11: Task: estimate P(S|E)
P(S|E) = P(E|S) * P(S) / P(E)
Some alternatives:
2. You are able to obtain P(E) directly from |C ∩ contain w| / |C|. Great! If C is large and representative enough, this may give you a better assessment of P(E) (based on much more data than the estimate above, which estimated P(E|S) and P(E|not-S) on the basis of only 20,000 and 10,000 messages).

Slide 12: Task: estimate P(S|E)
P(S|E) = P(E|S) * P(S) / P(E)
Some alternatives:
3. You have the resources to look at more words. Great! This can give you a much more reliable estimate.

Slide 13: Using k words instead of 1!
Def: P(E_i) is the probability that word i occurs in an email (etc.)
Assume all the E_i are independent of each other (etc.)
P(S | w_1, w_2, …, w_k) = Π_{i=1..k} P(E_i|S) * P(S) / P(E_1, …, E_k)
where, by marginalisation and the independence assumption,
P(E_1, …, E_k) = Π_{i=1..k} P(E_i|S) * P(S) + Π_{i=1..k} P(E_i|not-S) * P(not-S)
(As in the Tutorial, this may be simplified if further assumptions are made)
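A minimal Python sketch of this k-word combination, under the same independence assumption; the per-word conditional probabilities are assumed to have been estimated beforehand (e.g. from word counts in the marked Bad and Good emails), and the function name, arguments and example numbers are illustrative:

```python
from math import prod

def p_spam_given_words(p_w_given_S, p_w_given_notS, p_S=0.1):
    """P(S | w_1..w_k), with the words assumed independent of each other.

    p_w_given_S / p_w_given_notS: lists holding P(E_i|S) and P(E_i|not-S)
    for each word i found in the message.
    """
    numerator = prod(p_w_given_S) * p_S
    denominator = numerator + prod(p_w_given_notS) * (1 - p_S)
    return numerator / denominator

# Two words, e.g. "friend" plus a second spam-indicating word,
# with made-up conditional probabilities:
print(p_spam_given_words([0.125, 0.30], [0.005, 0.01]))   # ~0.99
```

With k = 1 and the numbers from slide 9 (0.125 and 0.005), this reduces to the ≈ 0.74 computed earlier.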

Slide 14: Task: estimate P(S|E)
P(S|E) = P(E|S) * P(S) / P(E)
Other alternatives:
• synonyms: if “friend” indicates spam, then maybe “soul mate” does too?
• sequences of n words (Rosen: compare “enhance performance” vs “operatic performance”)
• take into account how often a word occurs
• etc.

Slide 15: Spam filtering
• Another example of Bayesian reasoning
  » used for constructing a classifier (much like mushroom classification in the Lectures)
• Spammers second-guess spam filters by adding “good” text to their messages (harvested from real non-spam messages, newspapers, etc.)
• Spam filters second-guess spammers doing this …
• An arms race!
