Oliver Schulte, Machine Learning 726. Bayes Net Classifiers: The Naïve Bayes Model.

2/13 Classification. Suppose we have a target node V such that all queries of interest are of the form P(V = v | values for all other variables). Example: predict whether a patient has bronchitis given values for all other nodes. Because we know the form of the query, we can optimize the Bayes net. V is called the class variable, v is called the class label, and the other variables are called features.

3/13 Optimizing the Structure. Some nodes are irrelevant to a target node, given the others. Examples: can you guess the pattern? The Markov blanket of a node contains: its neighbors (parents and children) and its spouses (co-parents of its children).

4/13 The Markov Blanket. The Markov blanket of a node contains: its neighbors (parents and children) and its spouses (co-parents, i.e. the other parents of its children).
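A minimal Python sketch of how a node's Markov blanket can be read off a Bayes net structure; the function name, the dictionary encoding of the DAG, and the toy network are illustrative assumptions, not from the slides:

def markov_blanket(node, parents):
    # parents: dict mapping each node to the list of its parents in the DAG.
    # The Markov blanket = the node's parents, its children,
    # and its children's other parents ("spouses"/co-parents).
    children = {n for n, ps in parents.items() if node in ps}
    spouses = {p for c in children for p in parents[c] if p != node}
    return set(parents[node]) | children | spouses

# Toy DAG: A -> C, B -> C, C -> D
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
print(sorted(markov_blanket("C", parents)))  # ['A', 'B', 'D']
print(sorted(markov_blanket("A", parents)))  # ['B', 'C'] -- B is a co-parent via C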

5/13 How to Build a Bayes Net Classifier. Eliminate nodes not in the Markov blanket of the class variable → feature selection. Learn the parameters → fewer dimensions!

6/13 The Naïve Bayes Model

7/13 Classification Models. A Bayes net is a very general probability model. Sometimes we want to use more specific models: 1. they are more intelligible for some users; 2. models make assumptions, and if the assumptions are correct, learning is better. A widely used Bayes-net-type classifier is Naïve Bayes.

8/13 The Naïve Bayes Model. Given the class label, the features are independent. Intuition: the only way in which features interact is through the class label. Also: we don't care about correlations among the features. [Figure: Naïve Bayes network with class node PlayTennis and feature nodes Outlook, Temperature, Wind, Humidity.]
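In formulas: for a class value c and feature values x_1, ..., x_n, the Naïve Bayes assumption says P(x_1, ..., x_n | c) = P(x_1 | c) × P(x_2 | c) × ... × P(x_n | c), so the whole joint distribution factors as P(c, x_1, ..., x_n) = P(c) × P(x_1 | c) × ... × P(x_n | c).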

9/13 The Naive Bayes Classification Model. Exercise: use the Naive Bayes assumption to find a simple expression for P(PlayTennis=yes | o, t, w, h). Solution: 1. multiply the numbers in each column; 2. divide by P(o, t, w, h).

Prior        Outlook        Temperature    Wind           Humidity
P(PT=yes)    P(o|PT=yes)    P(t|PT=yes)    P(w|PT=yes)    P(h|PT=yes)
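Written out, the solution is P(PlayTennis=yes | o, t, w, h) = P(PT=yes) · P(o|PT=yes) · P(t|PT=yes) · P(w|PT=yes) · P(h|PT=yes) / P(o, t, w, h), where the denominator is the same kind of product summed over both class labels (yes and no), so it only normalizes the result.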

10/13 Example.

Prior        Outlook            Temperature       Wind               Humidity          Product
P(PT=yes)    P(sunny|PT=yes)    P(cool|PT=yes)    P(strong|PT=yes)   P(high|PT=yes)
9/14         2/9                3/9               3/9                3/9               ≈ 0.0053

Prior        Outlook            Temperature       Wind               Humidity          Product
P(PT=no)     P(sunny|PT=no)     P(cool|PT=no)     P(strong|PT=no)    P(high|PT=no)
5/14         3/5                1/5               3/5                4/5               ≈ 0.0206

Normalization: P(PT=yes | features) = 0.0053 / (0.0053 + 0.0206) ≈ 20.5%.
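A minimal Python sketch of this computation; the fractions are the standard PlayTennis estimates shown in the table above, which reproduce the slide's 20.5% result, and the variable names are illustrative:

# Naive Bayes scores for the instance (Outlook=sunny, Temperature=cool, Wind=strong, Humidity=high)
p_yes = (9/14) * (2/9) * (3/9) * (3/9) * (3/9)   # ≈ 0.0053
p_no  = (5/14) * (3/5) * (1/5) * (3/5) * (4/5)   # ≈ 0.0206

# Normalize to get the posterior P(PlayTennis = yes | features)
print(round(p_yes / (p_yes + p_no), 3))          # 0.205, i.e. about 20.5%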

11/13 Naive Bayes Learning. Use maximum likelihood estimates, i.e. observed frequencies. Linear number of parameters! Example: see the previous slide. Weka's NaiveBayesSimple uses Laplace estimation. As another refinement, we can perform feature selection first. We can also apply boosting to Naive Bayes learning, which is very competitive. [Figure: Naïve Bayes network over PlayTennis, Outlook, Temperature, Wind, Humidity.]
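A minimal sketch contrasting the maximum likelihood estimate with Laplace estimation for a single conditional probability; the function is an illustrative assumption, and the counts are taken from the PlayTennis example (2 of the 9 "yes" days have Outlook=sunny, and Outlook is assumed to have the standard 3 values sunny/overcast/rain):

def cpt_estimate(n_value_and_class, n_class, n_values, laplace=True):
    # Maximum likelihood: observed frequency of the value within the class.
    # Laplace: add one pseudo-count per possible feature value.
    if laplace:
        return (n_value_and_class + 1) / (n_class + n_values)
    return n_value_and_class / n_class

print(cpt_estimate(2, 9, 3, laplace=False))  # 0.222...  (ML estimate, 2/9)
print(cpt_estimate(2, 9, 3, laplace=True))   # 0.25      (Laplace, 3/12)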

12/13 Ratio/Odds Classification Formula. If we only care about classification, we can ignore the normalization constant. Ratios of feature probabilities give more numeric stability. Exercise: use the Naive Bayes assumption to find a simple expression for the posterior odds P(class=yes | features) / P(class=no | features).

Prior                 Outlook             Temperature         Wind                Humidity
P(PT=yes)/P(PT=no)    P(o|yes)/P(o|no)    P(t|yes)/P(t|no)    P(w|yes)/P(w|no)    P(h|yes)/P(h|no)

Product = 0.26 (see examples.xlsx). Positive or negative?
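Solution to the exercise: the normalization constant P(features) appears in both posteriors and cancels, leaving P(yes | features) / P(no | features) = [P(yes)/P(no)] × ∏_i [P(x_i|yes)/P(x_i|no)]. With the numbers from slide 10, the ratios are (9/14)/(5/14) = 1.80, (2/9)/(3/5) ≈ 0.37, (3/9)/(1/5) ≈ 1.67, (3/9)/(3/5) ≈ 0.56, and (3/9)/(4/5) ≈ 0.42; their product is ≈ 0.26. Since the odds are below 1, the instance is classified as negative (PlayTennis = no).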

13/13 Log-Odds Formula. For even more numeric stability, use logs. Intuitive interpretation: each feature "votes" for a class, then we add up the votes.

Prior                      Outlook                  Temperature              Wind                     Humidity
log[P(PT=yes)/P(PT=no)]    log[P(o|yes)/P(o|no)]    log[P(t|yes)/P(t|no)]    log[P(w|yes)/P(w|no)]    log[P(h|yes)/P(h|no)]

Sum = -1.36 (see examples.xlsx). Positive or negative? Linear discriminant: add up the feature terms, accept (classify as positive) if the sum is > 0.
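A minimal Python sketch of the same decision made via log-odds (natural logarithms); the probability pairs are the ones from slide 10, and the variable names are illustrative:

import math

# (P(term | yes), P(term | no)) for the prior and for each observed feature value
terms = [
    (9/14, 5/14),  # prior P(PlayTennis = yes) vs. P(PlayTennis = no)
    (2/9, 3/5),    # Outlook = sunny
    (3/9, 1/5),    # Temperature = cool
    (3/9, 3/5),    # Wind = strong
    (3/9, 4/5),    # Humidity = high
]

log_odds = sum(math.log(p_yes / p_no) for p_yes, p_no in terms)
print(round(log_odds, 2))               # -1.36
print("yes" if log_odds > 0 else "no")  # no: the sum of the "votes" is negative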