Naïve Bayes Model

Outline Independence and Conditional Independence Naïve Bayes Model Application: Spam Detection

Independence: Intuition Events are independent if one has nothing whatever to do with the other. Therefore, for two independent events, knowing that one has happened does not change the probability of the other happening. One toss of a coin is independent of another toss (assuming it is a fair coin). The price of tea in England is independent of the result of a general election in Canada.

Independent or Dependent? Getting a cold and having a cat allergy. Miles per gallon and acceleration. The size of a person's vocabulary and the person's shoe size.

Independence: Definition Events A and B are independent iff P(A, B) = P(A) × P(B), which is equivalent to P(A|B) = P(A) and P(B|A) = P(B) when P(A) > 0 and P(B) > 0. Example: T1: the first toss lands heads; T2: the second toss lands tails. Then P(T2|T1) = P(T2).
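
As a quick illustration (not part of the original slides), the short Python sketch below simulates two fair coin tosses many times and checks that the estimated P(T2|T1) matches the estimated P(T2).

```python
import random

random.seed(0)
N = 100_000
# Each trial: (first toss, second toss), each "H" or "T" with probability 1/2.
trials = [(random.choice("HT"), random.choice("HT")) for _ in range(N)]

p_t2 = sum(1 for a, b in trials if b == "T") / N                # estimate of P(T2)
t1_trials = [(a, b) for a, b in trials if a == "H"]             # condition on T1
p_t2_given_t1 = sum(1 for a, b in t1_trials if b == "T") / len(t1_trials)

print(f"P(T2)      ~ {p_t2:.3f}")           # close to 0.5
print(f"P(T2 | T1) ~ {p_t2_given_t1:.3f}")  # also close to 0.5: T1 tells us nothing about T2
```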

Conditional Independence Dependent events can become independent given certain other events. Example: shoe size and vocabulary size are dependent in the general population, but become independent once age is given. Two events A, B are conditionally independent given a third event C iff P(A|B, C) = P(A|C).

Conditional Independence: Definition Let E1 and E2 be two events. They are conditionally independent given E iff P(E1|E, E2) = P(E1|E); that is, the probability of E1 is not changed by knowing E2, once E is known to be true. Equivalent formulations: P(E1, E2|E) = P(E1|E) P(E2|E) and P(E2|E, E1) = P(E2|E).
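
To make the definition concrete, here is an illustrative sketch with made-up numbers (all parameter values are assumptions for the example): within each age group, shoe size and vocabulary size are generated independently, so they are conditionally independent given age, yet dependent once age is marginalized out.

```python
from itertools import product

# Hypothetical parameters: within each age group, big shoes (E1) and a big
# vocabulary (E2) occur independently; across groups they do not.
p_age = {"child": 0.4, "adult": 0.6}
p_big_shoes = {"child": 0.2, "adult": 0.9}   # P(E1 | age)
p_big_vocab = {"child": 0.3, "adult": 0.8}   # P(E2 | age)

# Full joint distribution P(age, E1, E2) built from the factorization above.
joint = {}
for age, shoes, vocab in product(p_age, [True, False], [True, False]):
    p1 = p_big_shoes[age] if shoes else 1 - p_big_shoes[age]
    p2 = p_big_vocab[age] if vocab else 1 - p_big_vocab[age]
    joint[(age, shoes, vocab)] = p_age[age] * p1 * p2

def prob(pred):
    return sum(p for outcome, p in joint.items() if pred(outcome))

# Conditionally independent given age: P(E1, E2 | adult) = P(E1 | adult) P(E2 | adult)
p_e = prob(lambda o: o[0] == "adult")
lhs = prob(lambda o: o[0] == "adult" and o[1] and o[2]) / p_e
rhs = (prob(lambda o: o[0] == "adult" and o[1]) / p_e) * \
      (prob(lambda o: o[0] == "adult" and o[2]) / p_e)
print(lhs, rhs)   # equal

# ...but dependent once age is marginalized out: P(E1, E2) != P(E1) P(E2)
print(prob(lambda o: o[1] and o[2]), prob(lambda o: o[1]) * prob(lambda o: o[2]))
```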

Example: Play Tennis? Predict whether to play tennis when the outlook is sunny, the temperature is cool, the humidity is high, and the wind is strong. What probability should be used to make the prediction? How do we compute that probability?

Probabilities of Individual Attributes Given the training set, we can compute the class priors: P(+) = 9/14 and P(−) = 5/14.
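
As a sketch of where such numbers come from (the slides' full 14-example training table is not reproduced here, so the rows below are placeholder examples), the priors and per-attribute conditional probabilities can be estimated by simple counting:

```python
from collections import Counter, defaultdict

# A few hypothetical (outlook, temperature, humidity, wind, play?) examples;
# the real slide uses the standard 14-example PlayTennis table.
data = [
    ("sunny",    "hot",  "high",   "weak",   "-"),
    ("sunny",    "cool", "normal", "weak",   "+"),
    ("rain",     "mild", "high",   "strong", "-"),
    ("overcast", "cool", "normal", "strong", "+"),
]

labels = Counter(row[-1] for row in data)
n = len(data)
prior = {c: labels[c] / n for c in labels}     # e.g. P(+) = 9/14 on the full table

# P(attribute value | class), estimated by relative frequency within each class.
cond = defaultdict(Counter)
for *attrs, c in data:
    for i, v in enumerate(attrs):
        cond[c][(i, v)] += 1

def p_value_given_class(attr_index, value, c):
    return cond[c][(attr_index, value)] / labels[c]

print(prior)
print(p_value_given_class(0, "sunny", "+"))    # P(outlook = sunny | +)
```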

Naïve Bayes Method The knowledge base contains: a set of hypotheses; a set of evidences; the probability of each evidence given a hypothesis. Given: a subset of the evidences known to be present in a situation. Find: the hypothesis with the highest posterior probability P(H|E1, E2, …, Ek). The value of the probability itself does not matter so much; what matters is which hypothesis maximizes it.

Naïve Bayes Method Assumptions: Hypotheses are exhaustive and mutually exclusive: H1 ∨ H2 ∨ … ∨ Hk, and ¬(Hi ∧ Hj) for any i ≠ j. Evidences are conditionally independent given a hypothesis: P(E1, E2, …, Ek|H) = P(E1|H) … P(Ek|H). Then P(H|E1, E2, …, Ek) = P(E1, E2, …, Ek, H) / P(E1, E2, …, Ek) = P(E1, E2, …, Ek|H) P(H) / P(E1, E2, …, Ek).

Naïve Bayes Method The goal is to find the H that maximizes P(H|E1, E2, …, Ek). Since P(H|E1, E2, …, Ek) = P(E1, E2, …, Ek|H) P(H) / P(E1, E2, …, Ek), and P(E1, E2, …, Ek) is the same for all hypotheses, maximizing P(H|E1, E2, …, Ek) is equivalent to maximizing P(E1, E2, …, Ek|H) P(H) = P(E1|H) … P(Ek|H) P(H). Naïve Bayes Method: find the hypothesis that maximizes P(E1|H) … P(Ek|H) P(H).

Example: Play Tennis Compare P(+|sunny, cool, high, strong) with P(−|sunny, cool, high, strong), i.e., compare P(sunny|+) P(cool|+) P(high|+) P(strong|+) P(+) with P(sunny|−) P(cool|−) P(high|−) P(strong|−) P(−).
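
Putting the rule from the previous slide into code, the sketch below scores each hypothesis by P(E1|H)…P(Ek|H)P(H) and picks the larger one. The probability values are assumed, being the ones commonly derived from the standard 14-example PlayTennis table rather than quoted from these slides.

```python
# Class priors and conditional probabilities; assumed values, typically
# estimated from the standard 14-example PlayTennis training set.
prior = {"+": 9 / 14, "-": 5 / 14}
likelihood = {
    "+": {"sunny": 2 / 9, "cool": 3 / 9, "high": 3 / 9, "strong": 3 / 9},
    "-": {"sunny": 3 / 5, "cool": 1 / 5, "high": 4 / 5, "strong": 3 / 5},
}

def naive_bayes_score(hypothesis, evidence):
    """P(E1|H)...P(Ek|H)P(H), which is proportional to the posterior P(H|E1,...,Ek)."""
    score = prior[hypothesis]
    for e in evidence:
        score *= likelihood[hypothesis][e]
    return score

evidence = ["sunny", "cool", "high", "strong"]
scores = {h: naive_bayes_score(h, evidence) for h in prior}
print(scores)                       # roughly {'+': 0.0053, '-': 0.0206}
print(max(scores, key=scores.get))  # '-' : predict "do not play tennis"
```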

Application: Spam Detection Spam: "Dear sir, We want to transfer to overseas ($126,000,000.00 USD) (One hundred and twenty six million United States Dollars) from a Bank in Africa. I want to ask you to quietly look for a reliable and honest person who will be capable and fit to provide either an existing ……" Legitimate mail is called Ham, for lack of a better name.

Hypotheses: {Spam, Ham}. Evidence: a document, treated as a set (or bag) of words. Knowledge: P(Spam): the prior probability that a message is spam. How do we estimate this probability? P(w|Spam): the probability that a word is w, given that it is drawn from a spam message. How do we estimate this probability?
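
One minimal way to implement this (an illustrative sketch, not an implementation given in the slides) is to estimate P(Spam) as the fraction of training messages labelled spam and P(w|Spam) from word counts with add-one (Laplace) smoothing, then compare the two hypotheses in log space to avoid numerical underflow. The tiny training set below is hypothetical.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label) pairs with label in {'spam', 'ham'}."""
    doc_counts = Counter(label for _, label in messages)
    word_counts = {"spam": Counter(), "ham": Counter()}
    for text, label in messages:
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return doc_counts, word_counts, vocab

def log_posterior(text, label, doc_counts, word_counts, vocab):
    total_docs = sum(doc_counts.values())
    log_p = math.log(doc_counts[label] / total_docs)           # log P(label)
    total_words = sum(word_counts[label].values())
    for w in text.lower().split():
        # Add-one (Laplace) smoothing so an unseen word does not zero the product.
        p_w = (word_counts[label][w] + 1) / (total_words + len(vocab))
        log_p += math.log(p_w)                                  # + log P(w | label)
    return log_p

def classify(text, model):
    doc_counts, word_counts, vocab = model
    return max(("spam", "ham"),
               key=lambda lbl: log_posterior(text, lbl, doc_counts, word_counts, vocab))

# Tiny hypothetical training set.
model = train([
    ("transfer million dollars bank africa", "spam"),
    ("urgent reliable person transfer funds", "spam"),
    ("meeting notes for tomorrow", "ham"),
    ("lunch tomorrow at noon", "ham"),
])
print(classify("reliable bank transfer of million dollars", model))  # likely 'spam'
```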

Limitations of Naïve Bayes Cannot handle composite hypotheses well. Suppose the hypotheses are independent of each other. Consider a composite hypothesis formed as the conjunction of two of them. How do we compute its posterior probability given the evidence?

Using Bayes' Theorem

but this is a very unreasonable assumption. We need a better representation and a better assumption. E: earthquake; B: burglar; A: alarm set off. E and B are independent, but when A is given they become (adversely) dependent, because they are competing explanations for A: P(B|A, E) << P(B|A). E explains A away.
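
A small numerical illustration of explaining away, with assumed parameters (not taken from the slides): enumerating the joint distribution of B, E, and A shows that observing the earthquake sharply lowers the probability of a burglary once the alarm is already known to have gone off.

```python
from itertools import product

# Hypothetical parameters, in the spirit of the classic burglary/earthquake/alarm example.
p_b, p_e = 0.01, 0.02                                    # independent priors for B and E
p_alarm = {(True, True): 0.95, (True, False): 0.94,      # P(A = true | B, E)
           (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a):
    pb = p_b if b else 1 - p_b
    pe = p_e if e else 1 - p_e
    pa = p_alarm[(b, e)] if a else 1 - p_alarm[(b, e)]
    return pb * pe * pa

def p_burglar_given(**observed):
    """P(B = true | observed), computed by enumerating the full joint distribution."""
    worlds = [dict(B=b, E=e, A=a) for b, e, a in product([True, False], repeat=3)]
    consistent = [w for w in worlds if all(w[k] == v for k, v in observed.items())]
    num = sum(joint(w["B"], w["E"], w["A"]) for w in consistent if w["B"])
    den = sum(joint(w["B"], w["E"], w["A"]) for w in consistent)
    return num / den

print(p_burglar_given(A=True))          # about 0.58 with these numbers
print(p_burglar_given(A=True, E=True))  # about 0.03: E explains the alarm away
```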

Cannot handle causal chaining Example. A: weather of the year; B: cotton production of the year; C: cotton price of next year. Observation: A influences C, but the influence is not direct (A → B → C). P(C|B, A) = P(C|B): instantiating B blocks the influence of A on C.