1
NAÏVE BAYES CLASSIFIER
ACM Student Chapter, Heritage Institute of Technology
10th February, 2012
SIGKDD Presentation by Anirban Ghose, Parami Roy, Sourav Dutta
2
CLASSIFICATION
● What is it? Assigning a given piece of input data to one of a given number of categories.
e.g. classifying kitchen items: separating cups from saucers.
3
CLASSIFICATION
● Why do we need it? Separating like things from unlike things.
e.g. categorizing different kinds of livestock, such as cows and goats.
4
CLASSIFICATION
Looking for identifiable patterns.
e.g. predicting whether an e-mail is spam or non-spam from patterns observed in previous mails; automatic categorization of online articles.
5
Classification
Allowing extrapolation: given the red dots, predicting the value at the blue box.
6
Classification Techniques
● Decision tree based methods
● Rule-based methods
● Memory based methods
● Neural networks
● Naïve Bayes classifier
● Support vector machines
7
Problem Statement
Play Tennis: Training Examples

Day  Outlook   Temperature  Humidity  Wind    Play Tennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
8
Problem Statement
Domain space: the set of values an attribute can have. Domain space of the previous example:
o Outlook – {Sunny, Overcast, Rain}
o Temperature – {Hot, Mild, Cool}
o Humidity – {High, Normal}
o Wind – {Strong, Weak}
o Play Tennis – {Yes, No}
9
Problem Statement
Instances (X): the set of items over which the concept is defined; here, the set of all possible days described by the attributes Outlook, Temperature, Humidity and Wind.
Target concept (c): the concept or function to be learned.
c : X → {0,1}
c(x) = 1 : Play Tennis = Yes
c(x) = 0 : Play Tennis = No
10
Problem Statement
Hypothesis (H): a statement that is assumed to be true for the sake of argument; here, a conjunction of constraints on the attributes.
h : X → {0,1}
For each attribute the constraint in a hypothesis can be:
? – any value is acceptable
a specific value – a single required value (e.g. Wind = Strong)
Ø – no value is acceptable
11
Problem Statement
Training examples – prior knowledge: a set of input vectors (instances), each with a label (outcome).
Input vector: Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Weak
Label: Play Tennis = No
12
Problem Statement
Training examples can be:
Positive example: the instance satisfies all the constraints of the hypothesis, h(x) = 1.
Negative example: the instance does not satisfy one or more constraints of the hypothesis, h(x) = 0.
13
Learning Algorithm
Naïve Bayes Classifier – supervised learning.
Supervised learning: the machine learning task of inferring a function from supervised (labelled) training data,
g : X → Y
X : input space
Y : output space
14
A Quick Recap
Conditional probability: P(A|B) = P(A ∩ B) / P(B)
Multiplication rule: P(A ∩ B) = P(A|B)·P(B) = P(B|A)·P(A)
Independent events: P(A ∩ B) = P(A)·P(B)
Total probability: P(B) = Σᵢ P(B|Aᵢ)·P(Aᵢ), where A₁, A₂, … partition the sample space
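Taken together, these identities yield Bayes' theorem, which the next slides build on; a brief derivation in LaTeX, using the same notation as above:

```latex
% From the multiplication rule, P(A and B) = P(A|B) P(B) = P(B|A) P(A).
% Dividing by P(B) gives Bayes' theorem; total probability expands the denominator.
\begin{align*}
  P(A \mid B) &= \frac{P(B \mid A)\, P(A)}{P(B)}
               = \frac{P(B \mid A)\, P(A)}{\sum_i P(B \mid A_i)\, P(A_i)}
\end{align*}
```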
15
Few Important Definitions
o Prior probability: for an uncertain quantity p, the prior probability is the probability distribution expressing one's uncertainty about p before the data are taken into account.
o Posterior probability: the posterior probability of a random event or an uncertain proposition is the conditional probability assigned after the relevant evidence is taken into account.
16
Bayes' Theorem
P(h|D) = P(D|h) P(h) / P(D)
P(h) – prior probability of hypothesis h
P(D) – prior probability of the data D
P(D|h) – probability of observing D given h (the likelihood)
P(h|D) – posterior probability of h given D
17
MAP HYPOTHESIS
The maximum a posteriori (MAP) hypothesis is the most probable hypothesis h from a set H given the observed data D:
h_MAP = argmax_{h ∈ H} P(h|D) = argmax_{h ∈ H} P(D|h) P(h) / P(D) = argmax_{h ∈ H} P(D|h) P(h)
The denominator P(D) is the same for every hypothesis, so it can be dropped from the maximisation.
18
Example
A medical diagnosis problem with two alternative hypotheses:
1) The patient has a particular form of cancer.
2) The patient does not have that particular form of cancer.
19
Example – Bayes' Theorem
Test outcomes:
a) + (positive – the test indicates the rare disease)
b) – (negative – the test does not indicate the rare disease)
Prior knowledge:
P(cancer) = 0.008      P(~cancer) = 0.992
P(+|cancer) = 0.98     P(-|cancer) = 0.02
P(+|~cancer) = 0.03    P(-|~cancer) = 0.97
20
Example – Bayes' Theorem
Suppose we now observe a new patient for whom the lab test returns a positive result. Should we diagnose the patient as having cancer or not?
21
Solution
P(+|cancer) P(cancer) = 0.98 × 0.008 ≈ 0.0078
P(+|~cancer) P(~cancer) = 0.03 × 0.992 ≈ 0.0298
Since 0.0298 > 0.0078, h_MAP = ~cancer: even with a positive test, the MAP hypothesis is that the patient does not have cancer.
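A minimal Python sketch of this calculation (the variable names are illustrative; the numbers come from the prior-knowledge slide above):

```python
# MAP decision for the cancer-diagnosis example (numbers from the slide above).
p_cancer, p_not_cancer = 0.008, 0.992            # priors P(cancer), P(~cancer)
p_pos_given_cancer = 0.98                        # P(+ | cancer)
p_pos_given_not_cancer = 0.03                    # P(+ | ~cancer)

# Unnormalised posteriors P(D|h) * P(h) for a positive test result.
score_cancer = p_pos_given_cancer * p_cancer                # 0.98 * 0.008 ~ 0.0078
score_not_cancer = p_pos_given_not_cancer * p_not_cancer    # 0.03 * 0.992 ~ 0.0298

# The MAP hypothesis is whichever score is larger.
print("cancer" if score_cancer > score_not_cancer else "no cancer")   # -> no cancer

# Normalising gives the actual posterior: still only about a 21% chance of cancer.
print(score_cancer / (score_cancer + score_not_cancer))               # ~0.21
```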
22
Naïve Bayes Classifier
A supervised learning technique built on Bayes' theorem and the MAP hypothesis.
23
Naïve Bayes Classifier
Prior knowledge:
A training data set
A new instance of data
Objective: classify the new instance of data.
Find P(v_j | a_1, a_2, …, a_n) for every possible classification v_j, then pick the classification with the maximum probability.
24
Naïve Bayes Classifier
Applying the MAP rule to classification: the target value for a new instance with attribute values a_1, a_2, …, a_n is
v_MAP = argmax_{v_j ∈ V} P(v_j | a_1, a_2, …, a_n) = argmax_{v_j ∈ V} P(a_1, a_2, …, a_n | v_j) P(v_j)
Estimating P(a_1, a_2, …, a_n | v_j) directly from the training data is impractical, which motivates the naïve assumption on the next slide.
25
Naïve Bayes Classifier
Why naïve? Assume all attributes are conditionally independent given the classification:
P(a_1, a_2, …, a_n | v_j) = ∏_{i=1..n} P(a_i | v_j)
v_NB = argmax_{v_j ∈ V} P(v_j) ∏_{i=1..n} P(a_i | v_j)
26
Play Tennis: Training Examples (see the table on slide 7).
New instance: Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong
27
Probability Estimate
We define our probability estimate to be the frequency of data combinations within the training examples:
P(v_j) = fraction of training examples in which v_j occurs
P(a_i | v_j) = fraction of the examples classified as v_j in which a_i occurs
28
Example
Let's calculate P(Overcast | Yes).
Number of training examples classified as Yes = 9
Number of those with Outlook = Overcast = 4
Hence, P(Overcast | Yes) = 4/9
29
Prior Probability
P(Yes) = 9/14, i.e. P(playing tennis)
P(No) = 5/14, i.e. P(not playing tennis)
The conditional probabilities P(a_i | v_j) can be tabulated once as look-up tables.
30
P(Yes) P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes) = 9/14 × 2/9 × 3/9 × 3/9 × 3/9 ≈ 0.0053
P(No) P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No) = 5/14 × 3/5 × 1/5 × 4/5 × 3/5 ≈ 0.0206
Since 0.0206 > 0.0053, v_NB = No: we can't play tennis given the weather conditions.
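The whole calculation can be sketched in a few lines of Python (a minimal illustration, not part of the slides; the `examples` list re-encodes the Play Tennis table and `classify` is an illustrative helper name):

```python
from collections import Counter

# Training rows re-encoding the Play Tennis table above:
# attribute order (Outlook, Temperature, Humidity, Wind) -> Play Tennis
examples = [
    (("Sunny", "Hot", "High", "Weak"), "No"),
    (("Sunny", "Hot", "High", "Strong"), "No"),
    (("Overcast", "Hot", "High", "Weak"), "Yes"),
    (("Rain", "Mild", "High", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Strong"), "No"),
    (("Overcast", "Cool", "Normal", "Strong"), "Yes"),
    (("Sunny", "Mild", "High", "Weak"), "No"),
    (("Sunny", "Cool", "Normal", "Weak"), "Yes"),
    (("Rain", "Mild", "Normal", "Weak"), "Yes"),
    (("Sunny", "Mild", "Normal", "Strong"), "Yes"),
    (("Overcast", "Mild", "High", "Strong"), "Yes"),
    (("Overcast", "Hot", "Normal", "Weak"), "Yes"),
    (("Rain", "Mild", "High", "Strong"), "No"),
]

def classify(instance):
    """Return the classification v that maximises P(v) * prod_i P(a_i | v)."""
    class_counts = Counter(label for _, label in examples)     # {"No": 5, "Yes": 9}
    best_label, best_score = None, -1.0
    for v, n_v in class_counts.items():
        score = n_v / len(examples)                            # prior P(v)
        for i, a in enumerate(instance):
            # frequency estimate of P(a_i | v) from the training rows
            n_match = sum(1 for attrs, label in examples
                          if label == v and attrs[i] == a)
            score *= n_match / n_v
        print(v, round(score, 4))                              # No 0.0206, Yes 0.0053
        if score > best_score:
            best_label, best_score = v, score
    return best_label

# New instance: Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong
print(classify(("Sunny", "Cool", "High", "Strong")))           # -> No
```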
31
Drawback of the Estimate
What happens if the probability estimate is zero? The estimate is zero when a particular attribute value never occurs in the training data set for a given classification. Such a zero ultimately dominates the product term v_NB for that classification, forcing it to zero.
32
Example
Suppose that, in a new training set, the attribute Outlook never takes the value Overcast when the example is labelled Yes. Then
P(Overcast | Yes) = 0
v_NB = P(Yes) × P(Overcast | Yes) × P(Cool | Yes) × … = 0
33
Solution
Use the m-estimate of probability instead of the raw frequency:
P(a_i | v_j) = (n_c + m·p) / (n + m)
where n = number of training examples with classification v_j, n_c = number of those in which the attribute takes value a_i, p = a prior estimate of the probability (typically uniform, p = 1/k for k possible attribute values), and m = the equivalent sample size. Since p > 0, the estimate is never zero.
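A rough sketch of how the m-estimate plugs into the frequency counts used earlier (`smoothed_estimate` is an illustrative helper, not from the slides):

```python
def smoothed_estimate(n_match, n_v, n_values, m=1.0):
    """m-estimate of P(a_i | v_j): (n_c + m*p) / (n + m), with a uniform prior p = 1/n_values.

    n_match  - examples of class v_j in which the attribute takes the value a_i (n_c)
    n_v      - total number of examples of class v_j (n)
    n_values - number of distinct values the attribute can take
    m        - equivalent sample size
    """
    p = 1.0 / n_values                       # uniform prior over the attribute's values
    return (n_match + m * p) / (n_v + m)

# Even if Outlook = Overcast never occurs together with Play Tennis = Yes,
# the smoothed estimate stays strictly positive instead of zeroing out v_NB:
print(smoothed_estimate(0, 9, 3))            # -> 0.0333..., not 0.0
```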
34
Disadvantages of the Naïve Bayes Classifier
1) It requires initial knowledge of many probabilities.
2) Significant computational cost is needed to determine the Bayes optimal hypothesis.
35
Conclusion
Naïve Bayes is based on the independence assumption:
o Training is very easy and fast.
o Testing is straightforward: just looking up tables or calculating conditional probabilities (e.g. with normal distributions for continuous attributes).
A popular generative model:
o Performance is competitive with most state-of-the-art classifiers, even when the independence assumption is violated.
o Many successful applications, e.g. spam mail filtering.
o A good candidate as a base learner in ensemble learning.