Naïve Bayes Classifier

Classifiers Generative vs. discriminative classifiers: a discriminative classifier models the classification rule directly from the training data, while a generative classifier builds a probabilistic model of the data within each class. For example, when identifying the language of a text, a discriminative approach would learn only the differences between languages, without learning the languages themselves, and use those differences to classify the input; a generative classifier would learn a model of each language and then classify the input using that knowledge. Discriminative classifiers include: k-NN, decision trees, SVMs. Generative classifiers include: naïve Bayes, model-based classifiers. Both families can be used for supervised classification, but the way they separate the positives from the negatives is very different. A generative classifier builds one model of the positives and one model of the negatives, i.e. a characterization of each population: what kind of positives you are likely (and unlikely) to see, and likewise for negatives. The decision boundary then lies where one model becomes more likely than the other, and since all of this rests on probabilities, generative classifiers are typically probabilistic.

Classifiers contd. A generative model learns the joint probability distribution P(x, y); a discriminative model learns the conditional probability distribution P(y | x). Consider the following data in the form (x, y): (1, 0), (1, 0), (2, 0), (2, 1).

P(x, y):
        Y = 0   Y = 1
X = 1   1/2     0
X = 2   1/4     1/4

P(y | x):
        Y = 0   Y = 1
X = 1   1       0
X = 2   1/2     1/2
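As a quick check, here is a minimal Python sketch (not part of the original slides) that estimates both distributions from these four samples:

```python
from collections import Counter

# The four (x, y) samples from the slide
data = [(1, 0), (1, 0), (2, 0), (2, 1)]
n = len(data)

pair_counts = Counter(data)
x_counts = Counter(x for x, _ in data)

# Generative view: the joint distribution P(x, y)
joint = {pair: count / n for pair, count in pair_counts.items()}
print(joint)        # {(1, 0): 0.5, (2, 0): 0.25, (2, 1): 0.25}

# Discriminative view: the conditional distribution P(y | x)
conditional = {(x, y): count / x_counts[x] for (x, y), count in pair_counts.items()}
print(conditional)  # {(1, 0): 1.0, (2, 0): 0.5, (2, 1): 0.5}
```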

Bayes Theorem Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event: Posterior = Likelihood * Prior / Evidence, i.e. P(c | x) = P(x | c) * P(c) / P(x). The left-hand side P(c | x) is what a discriminative model estimates directly; the right-hand side quantities P(x | c) and P(c) are what a generative model estimates.
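To make the formula concrete, here is a small Python sketch with made-up numbers (assumptions for illustration only, not from the slides): a disease with 1% prevalence and a test with 90% sensitivity and a 5% false-positive rate.

```python
# Hypothetical numbers, chosen only to illustrate Bayes' theorem
p_c = 0.01                # prior P(c): patient has the disease
p_x_given_c = 0.90        # likelihood P(x | c): positive test given disease
p_x_given_not_c = 0.05    # P(x | not c): positive test given no disease

# evidence P(x): total probability of a positive test
p_x = p_x_given_c * p_c + p_x_given_not_c * (1 - p_c)

# posterior P(c | x) = P(x | c) * P(c) / P(x)
p_c_given_x = p_x_given_c * p_c / p_x
print(round(p_c_given_x, 3))   # 0.154: even after a positive test, the disease is still unlikely
```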

Probabilistic model for discriminative classifier To train a discriminative classifier, all training examples of the different classes are used jointly to build a single classifier. For example, for a coin toss there are two classes, head and tail, and exactly one of them occurs; for a language classifier there is one class per available language and the input is assigned to one of them. A probabilistic classifier outputs, for each class, the probability that the input belongs to it, and the input is assigned to the class with the highest probability; a non-probabilistic classifier outputs only the predicted label. P(c | x), for c = c1, …, cN and x = (x1, …, xm). So how can we use concepts from probability to create a classifier?

Probabilistic model for generative classifier For a generative classifier, one model per class is trained independently, using only the examples with that class label. Given an input, the N class models output N probabilities. P(x | c), for c = c1, …, cN and x = (x1, …, xm).

Maximum A Posteriori (MAP) classification For an input x = (x1, …, xm), a discriminative probabilistic classifier computes P(c | x) directly, and we assign x to the class with the largest probability. For a generative classifier, we use Bayes' rule to convert the class-conditional models into posterior probabilities: P(c | x) = P(x | c) * P(c) / P(x) ∝ P(x | c) * P(c) (for c = c1, …, cN). (P(x) is the same for every class, so it can be dropped from the comparison.) Posterior = Prior * Likelihood / Evidence.
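A minimal sketch of the MAP rule in Python (the function name and the priors/likelihood interface are assumptions, not from the slides):

```python
# MAP classification: pick the class that maximizes P(x | c) * P(c).
# `priors[c]` holds P(c) and `likelihood(x, c)` returns P(x | c); both are
# assumed to come from an already-trained generative model.
def map_classify(x, classes, priors, likelihood):
    # P(x) is identical for every class, so it can be dropped from the comparison
    return max(classes, key=lambda c: likelihood(x, c) * priors[c])
```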

Naïve Bayes Start from the MAP score P(x | c) * P(c) (for c = c1, …, cN). This is equivalent to the joint probability model: P(x | c) * P(c) = P(x1, …, xm, c). Using the chain rule, the joint can be rewritten as: P(x1, …, xm, c) = P(x1 | x2, …, xm, c) * P(x2 | x3, …, xm, c) * … * P(xm | c) * P(c). Now we make the assumption that all features in x are mutually independent, conditional on the class c.

This assumption leads to the approximation: P(xi | xi+1, …, xm, c) ≈ P(xi | c). Thus the joint model can be expressed as: P(c | x) ∝ P(x1, …, xm, c) = P(c) * P(x1 | c) * P(x2 | c) * … * P(xm | c).
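Under this assumption the likelihood needed by the MAP rule above factorizes into per-feature terms. A small sketch, assuming the conditional probabilities are stored in a lookup table cond[c][i][value] (a hypothetical structure, not from the slides):

```python
import math

# Naive Bayes likelihood: P(x | c) = product over i of P(x_i | c)
def naive_bayes_likelihood(x, c, cond):
    return math.prod(cond[c][i][value] for i, value in enumerate(x))

# Combined with the prior it gives the naive Bayes score P(c) * prod_i P(x_i | c)
def naive_bayes_score(x, c, prior, cond):
    return prior[c] * naive_bayes_likelihood(x, c, cond)
```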

Example To better understand naïve Bayes classification, let's take an example: the classic play-tennis data set, with class priors P(Play = Yes) = 9/14 and P(Play = No) = 5/14.

Example Training the model on the original data gives the following conditional probabilities:

OUTLOOK     Yes   No
Sunny       2/9   3/5
Overcast    4/9   0/5
Rainy       3/9   2/5
Total       9/9   5/5

Example

TEMPERATURE   Yes   No
Hot           2/9   2/5
Mild          4/9   2/5
Cool          3/9   1/5
Total         9/9   5/5

HUMIDITY   Yes   No
High       3/9   4/5
Low        6/9   1/5
Total      9/9   5/5

WIND     Yes   No
Strong   3/9   3/5
Weak     6/9   2/5
Total    9/9   5/5

Example Testing the classifier on x = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong). We now apply the formula derived above: P(c | x) = P(x | c) * P(c) / P(x) ∝ P(x | c) * P(c). The scenario can be classified as either 'Yes' or 'No'. For the 'Yes' class: P(Outlook = Sunny | Play = Yes) = 2/9, P(Temperature = Cool | Play = Yes) = 3/9, P(Humidity = High | Play = Yes) = 3/9, P(Wind = Strong | Play = Yes) = 3/9, P(Play = Yes) = 9/14.

For the 'No' class: P(Outlook = Sunny | Play = No) = 3/5, P(Temperature = Cool | Play = No) = 1/5, P(Humidity = High | Play = No) = 4/5, P(Wind = Strong | Play = No) = 3/5, P(Play = No) = 5/14. Classification using the MAP rule: P(Yes | x) ∝ [P(Sunny | Yes) P(Cool | Yes) P(High | Yes) P(Strong | Yes)] P(Play = Yes) = (2/9)(3/9)(3/9)(3/9)(9/14) ≈ 0.0053 and P(No | x) ∝ [P(Sunny | No) P(Cool | No) P(High | No) P(Strong | No)] P(Play = No) = (3/5)(1/5)(4/5)(3/5)(5/14) ≈ 0.0206. Since P(No | x) > P(Yes | x), we classify x as Play = No.
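The arithmetic is easy to reproduce in a few lines of Python (a check of the slide's numbers, using the tables above):

```python
prior = {"Yes": 9/14, "No": 5/14}
cond = {
    "Yes": {"Sunny": 2/9, "Cool": 3/9, "High": 3/9, "Strong": 3/9},
    "No":  {"Sunny": 3/5, "Cool": 1/5, "High": 4/5, "Strong": 3/5},
}
x = ["Sunny", "Cool", "High", "Strong"]

for c in ("Yes", "No"):
    score = prior[c]
    for value in x:
        score *= cond[c][value]
    print(c, round(score, 4))   # Yes 0.0053, No 0.0206 -> classify as Play = No
```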

Given data on symptoms and on whether each patient had the flu, classify the following patient: x = (chills = Y, runny nose = N, headache = Mild, fever = N)

CHILLS   Flu = Yes   Flu = No
Yes      3/5         1/3
No       2/5         2/3

RUNNY NOSE   Flu = Yes   Flu = No
Yes          4/5         1/3
No           1/5         2/3

HEADACHE   Flu = Yes   Flu = No
Strong     2/5         1/3
Mild       2/5         1/3
No         1/5         1/3

FEVER   Flu = Yes   Flu = No
Yes     4/5         1/3
No      1/5         2/3

P(Flu = Yes) = 5/8, P(Flu = No) = 3/8
P(chills = Y | Yes) = 3/5, P(chills = Y | No) = 1/3
P(runny nose = N | Yes) = 1/5, P(runny nose = N | No) = 2/3
P(headache = Mild | Yes) = 2/5, P(headache = Mild | No) = 1/3
P(fever = N | Yes) = 1/5, P(fever = N | No) = 2/3
P(Yes | x) ∝ [P(chills = Y | Yes) P(runny nose = N | Yes) P(headache = Mild | Yes) P(fever = N | Yes)] P(Flu = Yes) = (3/5)(1/5)(2/5)(1/5)(5/8) = 0.006
P(No | x) ∝ [P(chills = Y | No) P(runny nose = N | No) P(headache = Mild | No) P(fever = N | No)] P(Flu = No) = (1/3)(2/3)(1/3)(2/3)(3/8) ≈ 0.018
Since P(No | x) > P(Yes | x), the patient is classified as Flu = No.
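Again, the computation can be verified with a short Python snippet (using the tables above):

```python
prior = {"Yes": 5/8, "No": 3/8}
cond = {
    "Yes": {"chills=Y": 3/5, "runny nose=N": 1/5, "headache=Mild": 2/5, "fever=N": 1/5},
    "No":  {"chills=Y": 1/3, "runny nose=N": 2/3, "headache=Mild": 1/3, "fever=N": 2/3},
}

for c in ("Yes", "No"):
    score = prior[c]
    for p in cond[c].values():
        score *= p
    print(c, round(score, 4))   # Yes 0.006, No 0.0185 -> classify as Flu = No
```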

Pros of Naïve Bayes
- It is easy and fast to use for predicting the class of a test data set.
- It can perform well compared with more complex models such as logistic regression, and it needs less training data.
- It performs well with categorical input variables, compared to numerical ones.
- It is highly scalable: it scales linearly with the number of predictors and data points.
- It handles both continuous and discrete data.
- It is relatively insensitive to irrelevant features.

Cons of Naïve Bayes
- If a categorical variable has a value in the test data that was never observed in the training data, the model assigns it zero probability and cannot make a sensible prediction. This is known as the "zero-frequency" problem and can be solved with a smoothing technique, for example Laplace estimation (see the sketch below).
- Naïve Bayes assumes that the predictors are conditionally independent. In real life it is often hard to find a set of predictors that are completely independent.
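A minimal sketch of Laplace (add-one) smoothing, assuming the raw training counts are known (the function name and arguments are illustrative, not from the slides):

```python
# Smoothed estimate of P(value | class): add k to every count so that a value
# unseen for a class in training still gets a small, non-zero probability.
def laplace_prob(count_value_in_class, count_class, n_distinct_values, k=1):
    return (count_value_in_class + k) / (count_class + k * n_distinct_values)

# Example: Outlook = Overcast never occurs with Play = No in the tennis data,
# so the raw estimate would be 0/5; with add-one smoothing it stays positive.
print(laplace_prob(0, 5, 3))   # 0.125 instead of 0.0
```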

Applications of Naïve Bayes Algorithms
- Naïve Bayes is fast, so it can be used for real-time predictions.
- It can predict the probability of multiple classes of the target variable.
- It is widely used for text classification, spam filtering, and sentiment analysis (a toy sketch follows below).
- Combined with collaborative filtering, naïve Bayes can help build recommender systems that filter unseen information and predict whether a user would like a given resource.
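As a toy illustration of the text-classification use case, here is a spam-filter sketch using scikit-learn's multinomial naïve Bayes (the example messages are made up; a real system would train on a much larger corpus):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "limited offer click now",
         "meeting at noon tomorrow", "lunch with the team"]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)       # bag-of-words counts

model = MultinomialNB()                   # uses Laplace smoothing (alpha=1.0) by default
model.fit(X, labels)

print(model.predict(vectorizer.transform(["free prize for the team"])))
```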