Copyright © 2004, Andrew W. Moore Naïve Bayes Classifiers Andrew W. Moore Professor School of Computer Science Carnegie Mellon University www.cs.cmu.edu/~awm.

Presentation transcript:

Slide 1 Copyright © 2004, Andrew W. Moore Naïve Bayes Classifiers Andrew W. Moore Professor School of Computer Science Carnegie Mellon University. Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message or the link to the source repository of Andrew's tutorials. Comments and corrections gratefully received. For a more in-depth introduction to Naïve Bayes Classifiers and the theory surrounding them, please see Andrew's lecture on Probability for Data Miners. These notes assume you have already met Bayesian Networks.

Slide 3 Copyright © 2004, Andrew W. Moore A simple Bayes Net. [Diagram: node J with children R, Z, C.] J = Person is a Junior; C = Brought Coat to Classroom; Z = Live in zipcode; R = Saw "Return of the King" more than once.

Slide 3 Copyright © 2004, Andrew W. Moore A simple Bayes Net. [Diagram: node J with children R, Z, C.] J = Person is a Junior; C = Brought Coat to Classroom; Z = Live in zipcode; R = Saw "Return of the King" more than once. What parameters are stored in the CPTs of this Bayes Net?

Slide 4 Copyright © 2004, Andrew W. Moore A simple Bayes Net. [Diagram: node J with children R, Z, C.] J = Person is a Junior; C = Brought Coat to Classroom; Z = Live in zipcode; R = Saw "Return of the King" more than once. The CPT entries to fill in: P(J); P(C|J) and P(C|~J); P(R|J) and P(R|~J); P(Z|J) and P(Z|~J). Suppose we have a database from 20 people who attended a lecture. How could we use that to estimate the values in this CPT?
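The counting procedure the slide asks about can be sketched as follows. The 20-person dataset below is hypothetical, invented only to illustrate the maximum-likelihood estimates; the slide itself leaves the values blank.

```python
# Hypothetical records from 20 lecture attendees, as tuples (J, C, Z, R):
# J = Junior, C = brought coat, Z = lives in the zipcode, R = saw the film >1 time.
records = [
    (1, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0), (1, 0, 0, 1),
    (1, 1, 1, 1), (1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 0, 0), (0, 0, 0, 1),
    (0, 0, 1, 0), (0, 1, 0, 0), (0, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 1),
    (0, 0, 0, 0), (0, 0, 0, 1), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 0),
]

n = len(records)
n_j = sum(r[0] for r in records)  # number of records with J = 1

def cond_prob(attr_index, given_j):
    """Fraction of records with J = given_j in which the attribute is 1."""
    matching = [r for r in records if r[0] == given_j]
    return sum(r[attr_index] for r in matching) / len(matching)

p_j = n_j / n                    # P(J)   = fraction of Juniors
p_c_given_j = cond_prob(1, 1)    # P(C|J) = fraction of Juniors who brought a coat
p_c_given_notj = cond_prob(1, 0) # P(C|~J)
print(p_j, p_c_given_j, p_c_given_notj)
```

The remaining entries (P(R|J), P(R|~J), P(Z|J), P(Z|~J)) come from the same `cond_prob` call with the other attribute indices.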

Slide 5 Copyright © 2004, Andrew W. Moore A simple Bayes Net. [Diagram: node J with children R, Z, C.] J = Person is a Junior; C = Brought Coat to Classroom; Z = Live in zipcode; R = Saw "Return of the King" more than once. The CPT entries to fill in: P(J); P(C|J) and P(C|~J); P(R|J) and P(R|~J); P(Z|J) and P(Z|~J). Suppose we have a database from 20 people who attended a lecture. How could we use that to estimate the values in this CPT?

Slide 6 Copyright © 2004, Andrew W. Moore A Naïve Bayes Classifier. [Diagram: node J with children R, Z, C.] J = Walked to School (output attribute); C = Brought Coat to Classroom; Z = Live in zipcode; R = Saw "Return of the King" more than once (input attributes). CPT entries: P(J); P(C|J) and P(C|~J); P(R|J) and P(R|~J); P(Z|J) and P(Z|~J). A new person shows up at class wearing an "I live right above the Manor Theater where I saw all the Lord of The Rings Movies every night" overcoat. What is the probability that they are a Junior?

Slide 7 Copyright © 2004, Andrew W. Moore Naïve Bayes Classifier Inference. [Diagram: node J with children R, Z, C; the inference derivation on this slide was an image and is not in the transcript.]
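The inference step for the example above can be sketched numerically. The CPT values below are made up purely to show the arithmetic (the slides leave them blank): given the overcoat, we observe C = 1, Z = 1, R = 1 and apply Bayes' rule over the two values of J.

```python
# Hypothetical CPT values for the J -> {C, Z, R} net; invented for illustration.
p_j = 0.4
p = {  # p[(attr, j)] = P(attr = 1 | J = j)
    ("C", 1): 0.6, ("C", 0): 0.3,
    ("Z", 1): 0.7, ("Z", 0): 0.1,
    ("R", 1): 0.5, ("R", 0): 0.2,
}

# Observed evidence: C = 1, Z = 1, R = 1.
num = p_j * p[("C", 1)] * p[("Z", 1)] * p[("R", 1)]                # ∝ P(J, C, Z, R)
den = num + (1 - p_j) * p[("C", 0)] * p[("Z", 0)] * p[("R", 0)]    # + P(~J, C, Z, R)
posterior = num / den                                              # P(J | C, Z, R)
print(round(posterior, 3))  # ≈ 0.959 with these invented numbers
```

Note that only one multiplication per attribute is needed on each side; that cheapness is the point of the naïve independence assumption.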

Slide 8 Copyright © 2004, Andrew W. Moore The General Case. [Diagram: class node Y with children X1, X2, …, Xm.]
1. Estimate P(Y=v) as the fraction of records with Y=v.
2. Estimate P(Xi=u | Y=v) as the fraction of "Y=v" records that also have Xi=u.
3. To predict the Y value given observations of all the Xi values, compute Ypredict = argmax over v of P(Y=v) · ∏i P(Xi=ui | Y=v).
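The three steps above can be sketched directly as counting plus an argmax; this is a minimal illustration with a tiny invented dataset, not a production implementation.

```python
from collections import defaultdict

def fit_naive_bayes(X, y):
    """Steps 1-2: estimate P(Y=v) and P(Xi=u | Y=v) by counting fractions."""
    n = len(y)
    class_count = defaultdict(int)   # class_count[v] = #records with Y = v
    joint_count = defaultdict(int)   # joint_count[(i, u, v)] = #records with Xi=u, Y=v
    for row, v in zip(X, y):
        class_count[v] += 1
        for i, u in enumerate(row):
            joint_count[(i, u, v)] += 1
    prior = {v: c / n for v, c in class_count.items()}
    cond = {(i, u, v): c / class_count[v] for (i, u, v), c in joint_count.items()}
    return prior, cond

def predict(prior, cond, row):
    """Step 3: argmax over v of P(Y=v) * prod_i P(Xi=ui | Y=v)."""
    best_v, best_score = None, -1.0
    for v, pv in prior.items():
        score = pv
        for i, u in enumerate(row):
            score *= cond.get((i, u, v), 0.0)  # unseen (i, u, v) counts as zero
        if score > best_score:
            best_v, best_score = v, score
    return best_v

# Toy dataset: two binary attributes, two classes.
X = [(1, 1), (1, 0), (0, 1), (0, 0)]
y = ["a", "a", "b", "b"]
prior, cond = fit_naive_bayes(X, y)
print(predict(prior, cond, (1, 1)))  # -> "a"
```

The `cond.get(..., 0.0)` fallback is exactly the zero-probability pain the later slide mentions; smoothing (below) is the usual fix.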

Slide 9 Copyright © 2004, Andrew W. Moore Naïve Bayes Classifier. Because of the structure of the Bayes Net, the joint distribution factorizes into the class prior times a product of per-attribute conditionals.
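The equation this slide presents was an image lost from the transcript; for the J → {C, Z, R} net above, the standard factorization it refers to reads:

\[
P(J \mid C, Z, R)
  = \frac{P(J)\,P(C \mid J)\,P(Z \mid J)\,P(R \mid J)}
         {\sum_{j \in \{J,\,\neg J\}} P(j)\,P(C \mid j)\,P(Z \mid j)\,P(R \mid j)}
\]

The numerator is the joint probability written as the product the net's structure licenses; the denominator normalizes over both values of the class.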

Slide 10 Copyright © 2004, Andrew W. Moore More Facts About Naïve Bayes Classifiers.
Naïve Bayes Classifiers can be built with real-valued inputs.*
Rather technical complaint: Bayes Classifiers don't try to be maximally discriminative; they merely try to honestly model what's going on.*
Zero probabilities are painful for Joint and Naïve. A hack (justifiable with the magic words "Dirichlet Prior") can help.*
Naïve Bayes is wonderfully cheap, and survives 10,000 attributes cheerfully!
*See future Andrew lectures.
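The zero-probability "hack" the slide alludes to is commonly realized as additive (Laplace) smoothing, which is the posterior mean under a symmetric Dirichlet prior; a minimal sketch:

```python
def smoothed_cond_prob(count_u_and_v, count_v, n_values, alpha=1.0):
    """Estimate P(Xi = u | Y = v) with a symmetric Dirichlet prior.

    count_u_and_v: number of records with Xi = u and Y = v
    count_v:       number of records with Y = v
    n_values:      number of distinct values Xi can take
    alpha:         prior pseudo-count (alpha = 1 gives Laplace smoothing)
    """
    return (count_u_and_v + alpha) / (count_v + alpha * n_values)

# A value never seen with class v no longer gets probability zero:
print(smoothed_cond_prob(0, 10, 2))  # 1/12 instead of 0
print(smoothed_cond_prob(7, 10, 2))  # 8/12 instead of 7/10
```

With alpha = 0 this reduces to the raw counting estimate from the earlier slides, so smoothing is a strict generalization.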

Slide 11 Copyright © 2004, Andrew W. Moore What you should know:
How to build a Bayes Classifier.
How to predict with a Bayes Classifier.