CHAPTER 6 Naive Bayes Models for Classification

Bayes’ Rule in Bayes Nets

Combining Evidence

General Naïve Bayes

Modeling with Naïve Bayes

What to do with Naïve Bayes?
1. Create a Naïve Bayes model:
   - We need local probability estimates
   - Could elicit them from a human
   - Better: estimate them from observations! This is called parameter estimation, or more generally learning
2. Use a Naïve Bayes model to estimate the probability of causes given observations of effects:
   - This is a specific kind of probabilistic inference
   - Requires just a simple computation (next slide)
   - From this we can also get the most likely cause, which is called prediction, or classification
These are the basic tasks of machine learning!
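To make task 2 concrete, here is a minimal Python sketch of inference and classification with an already-specified Naïve Bayes model. The spam/ham labels, the word features, and all probability values are invented for illustration; they are not estimates from any real data.

```python
# Minimal Naive Bayes inference sketch (illustrative numbers, not learned ones).
# Model: P(C) prior over classes, P(w | C) for each word feature w.

prior = {"spam": 0.4, "ham": 0.6}                      # P(C), assumed values
likelihood = {                                          # P(word | C), assumed values
    "spam": {"free": 0.30, "money": 0.25, "meeting": 0.02},
    "ham":  {"free": 0.05, "money": 0.04, "meeting": 0.20},
}

def posterior(words, prior, likelihood):
    """Return P(C | words) by normalizing P(C) * prod_i P(w_i | C)."""
    scores = {}
    for c in prior:
        score = prior[c]
        for w in words:
            score *= likelihood[c][w]
        scores[c] = score
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

def classify(words, prior, likelihood):
    """Prediction = the most likely cause given the observed effects."""
    post = posterior(words, prior, likelihood)
    return max(post, key=post.get)

print(posterior(["free", "money"], prior, likelihood))  # heavily skewed toward spam
print(classify(["meeting"], prior, likelihood))         # 'ham'
```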

Building Naïve Bayes Models
What do we need to specify a Bayesian network?
- A directed acyclic graph (DAG)
- Conditional probability tables (CPTs)
How do we build a Naïve Bayes model?
- We know the graph structure already (why?)
- We need estimates of the local conditional probability tables (CPTs):
  - P(C), the prior over causes
  - P(E|C) for each evidence variable
- These typically come from observed data
- These probabilities are collectively called the parameters of the model and denoted by θ
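As a sketch of where those parameter estimates come from, the snippet below computes relative-frequency (maximum likelihood) estimates of P(C) and P(E|C) from a toy labeled dataset. The data and feature names are invented, and smoothing is deliberately omitted here since it is covered later in the chapter.

```python
from collections import Counter, defaultdict

# Toy training data: (label, set of binary features that are "on"); invented for illustration.
data = [
    ("spam", {"free", "money"}),
    ("spam", {"free"}),
    ("ham",  {"meeting"}),
    ("ham",  {"meeting", "free"}),
]
vocab = {"free", "money", "meeting"}

# P(C): relative frequency of each class label in the training set.
label_counts = Counter(label for label, _ in data)
n = len(data)
prior = {c: label_counts[c] / n for c in label_counts}

# P(E=1 | C): fraction of class-c examples in which feature e is on (unsmoothed ML estimate).
feature_counts = defaultdict(Counter)
for label, feats in data:
    for e in feats:
        feature_counts[label][e] += 1
likelihood = {
    c: {e: feature_counts[c][e] / label_counts[c] for e in vocab}
    for c in label_counts
}

print(prior)        # {'spam': 0.5, 'ham': 0.5}
print(likelihood)   # e.g. P(free=1 | spam) = 1.0, P(money=1 | ham) = 0.0
```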

Review: Parameter Estimation

A Spam Filter

Baselines
First task: get a baseline
- Baselines are very simple "straw man" procedures
- Help determine how hard the task is
- Help know what a "good" accuracy is
Weak baseline: most frequent label classifier
- Gives all test instances whatever label was most common in the training set
- E.g. for spam filtering, might label everything as ham
- Accuracy might be very high if the problem is skewed
For real research, usually use previous work as a (strong) baseline
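The weak baseline is simple enough to show in a few lines. The sketch below, using invented labels, classifies every test instance with the most common training label and reports accuracy; with a skewed label distribution the number can look deceptively high.

```python
from collections import Counter

def most_frequent_label_baseline(train_labels, test_labels):
    """Weak baseline: predict the most common training label for every test instance."""
    prediction = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == prediction)
    return prediction, correct / len(test_labels)

# Invented, skewed example: mostly ham in both splits.
train = ["ham"] * 90 + ["spam"] * 10
test  = ["ham"] * 85 + ["spam"] * 15

label, accuracy = most_frequent_label_baseline(train, test)
print(label, accuracy)   # 'ham' 0.85: high accuracy without learning anything
```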

Naïve Bayes for Text

Example: Spam Filtering
Raw probabilities don't affect the posteriors; relative probabilities (odds ratios) do.
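A sketch of what "odds ratios" means here: for each word we can compare P(word | spam) to P(word | ham), and words with a large ratio push the posterior toward spam regardless of how small their raw probabilities are. The word likelihoods below are invented for illustration.

```python
# Illustrative word likelihoods (invented values); what matters for the posterior
# is the ratio P(w | spam) / P(w | ham), not the raw magnitudes.
p_word_given_spam = {"free": 0.009, "viagra": 0.004, "meeting": 0.0002}
p_word_given_ham  = {"free": 0.001, "viagra": 0.00001, "meeting": 0.004}

odds = {w: p_word_given_spam[w] / p_word_given_ham[w] for w in p_word_given_spam}
for w, r in sorted(odds.items(), key=lambda kv: -kv[1]):
    print(f"{w:10s} odds ratio {r:8.1f}")
# 'viagra' has the largest ratio, so it is the strongest spam indicator here,
# even though its raw probability under spam is tiny.
```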

Generalization and Overfitting
Relative frequency parameters will overfit the training data!
- Unlikely that every occurrence of "minute" is 100% spam
- Unlikely that every occurrence of "seriously" is 100% ham
- What about words that don't occur in the training set? In general, we can't go around giving unseen events zero probability
- As an extreme case, imagine using the entire email as the only feature:
  - Would get the training data perfect (if the labeling is deterministic)
  - Wouldn't generalize at all
- Just making the bag-of-words assumption gives us some generalization, but it isn't enough
To generalize better: we need to smooth or regularize the estimates
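The zero-probability problem is easy to see numerically: a single unseen word with an unsmoothed estimate of 0 wipes out the whole product, no matter how spammy the rest of the message looks. A tiny sketch with made-up likelihoods:

```python
# Made-up likelihoods; 'recently' never appeared in spam training mail,
# so its unsmoothed relative-frequency estimate is exactly 0.
p_given_spam = {"money": 0.3, "free": 0.3, "recently": 0.0}
p_given_ham  = {"money": 0.01, "free": 0.02, "recently": 0.05}

message = ["money", "free", "recently"]
score_spam = 1.0
score_ham = 1.0
for w in message:
    score_spam *= p_given_spam[w]
    score_ham  *= p_given_ham[w]

print(score_spam, score_ham)  # 0.0 vs 1e-05: one zero estimate forces P(spam | message) = 0
```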

Estimation: Smoothing
Problems with maximum likelihood estimates:
- If I flip a coin once, and it's heads, what's the estimate for P(heads)?
- What if I flip it 50 times with 27 heads?
- What if I flip 10M times with 8M heads?
Basic idea:
- We have some prior expectation about parameters (here, the probability of heads)
- Given little evidence, we should skew towards the prior
- Given a lot of evidence, we should listen to the data
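To make the coin example concrete, here is a sketch comparing the maximum likelihood estimate with a Laplace (add-one) smoothed estimate for the three cases above: with one flip the smoothed estimate stays near the uniform prior, and with 10M flips the two estimates are essentially identical.

```python
def mle(heads, flips):
    """Maximum likelihood estimate of P(heads)."""
    return heads / flips

def laplace(heads, flips, k=1):
    """Add-k smoothed estimate: pretend we saw k extra heads and k extra tails."""
    return (heads + k) / (flips + 2 * k)

for heads, flips in [(1, 1), (27, 50), (8_000_000, 10_000_000)]:
    print(f"{flips:>10d} flips: MLE = {mle(heads, flips):.4f}, "
          f"Laplace = {laplace(heads, flips):.4f}")
# 1 flip:    MLE = 1.0000, Laplace = 0.6667  (little evidence -> stay near the prior)
# 50 flips:  MLE = 0.5400, Laplace = 0.5385
# 10M flips: both ~0.8000                    (lots of evidence -> listen to the data)
```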

Estimation: Smoothing

Estimation: Laplace Smoothing
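The body of this slide carries only the title here; for reference, the standard Laplace (add-k) estimate for a variable X with |X| possible values, given count c(x) out of N observations, is

$$P_{\mathrm{LAP},k}(x) = \frac{c(x) + k}{N + k\,|X|}$$

Setting k = 0 recovers the maximum likelihood estimate, k = 1 is classic add-one smoothing, and larger k pulls the estimate further toward the uniform distribution. For the Naïve Bayes likelihoods, the conditional version replaces c(x) with c(x, y) and N with the count of the conditioning class y.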

Estimation: Linear Interpolation
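This slide also carries only its title here; based on that title, the usual linear-interpolation smoothing mixes the conditional maximum likelihood estimate with the unconditional one:

$$P_{\mathrm{LIN}}(x \mid y) = \alpha\,\hat{P}_{\mathrm{ML}}(x \mid y) + (1 - \alpha)\,\hat{P}_{\mathrm{ML}}(x)$$

When there is little data for class y, the unconditional term keeps the estimate sensible; the weight α is typically tuned on held-out data.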

Real NB: Smoothing

Spam Example