1 Bayesian Classifier

2 Review: Decision Tree (predicting buys_computer?)
  Age?
    <=30  -> Student? (no -> NO, yes -> YES)
    31…40 -> YES
    >40   -> Credit?  (fair -> YES, excellent -> NO)

3 Bayesian Classification
  Bayesian classifier vs. decision tree
    Decision tree: predicts the class label
    Bayesian classifier: a statistical classifier; predicts class membership probabilities
  Based on Bayes' theorem: estimates the posterior probability of each class
  Naïve Bayesian classifier:
    a simple classifier that assumes attribute independence
    fast when applied to large databases
    comparable in performance to decision trees
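A minimal sketch (assuming scikit-learn is available) of what "predicts class membership probabilities" looks like in practice: CategoricalNB is one off-the-shelf naïve Bayes implementation for categorical attributes, trained here on the buys_computer examples introduced on the next slides. Note that it applies Laplace smoothing by default, so its probabilities differ slightly from the hand calculations later in the deck.

```python
# Off-the-shelf naive Bayes: outputs class membership probabilities, not just a label.
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Training data from the buys_computer example (age, income, student, credit).
X_raw = [
    ["31...40", "high",   "no",  "fair"],
    ["<=30",    "high",   "no",  "excellent"],
    ["31...40", "high",   "no",  "fair"],
    [">40",     "medium", "no",  "fair"],
    [">40",     "low",    "yes", "fair"],
    [">40",     "low",    "yes", "excellent"],
    ["31...40", "low",    "yes", "excellent"],
    ["<=30",    "medium", "no",  "fair"],
    ["<=30",    "low",    "yes", "fair"],
    [">40",     "medium", "yes", "fair"],
]
y = ["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes"]

encoder = OrdinalEncoder()              # map category strings to integer codes
X = encoder.fit_transform(X_raw)

clf = CategoricalNB()                   # uses Laplace smoothing (alpha=1.0) by default
clf.fit(X, y)

query = encoder.transform([["31...40", "medium", "yes", "fair"]])
print(clf.classes_)                     # e.g. ['no' 'yes']
print(clf.predict_proba(query))         # class membership probabilities
print(clf.predict(query))               # most probable class label
```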

4 Bayes' Theorem
  Let X be a data sample whose class label is unknown
  Let Hi be the hypothesis that X belongs to a particular class Ci
  P(Hi) is the class prior probability that X belongs to class Ci
    Can be estimated as ni/n from the training data samples
    n is the total number of training data samples
    ni is the number of training data samples of class Ci
  Bayes' theorem: P(Hi|X) = P(X|Hi) P(Hi) / P(X)

5 Training data (buys_computer) and class priors

        Age      Income   Student  Credit     Buys_computer
  P1    31…40    high     no       fair       no
  P2    <=30     high     no       excellent  no
  P3    31…40    high     no       fair       yes
  P4    >40      medium   no       fair       yes
  P5    >40      low      yes      fair       yes
  P6    >40      low      yes      excellent  no
  P7    31…40    low      yes      excellent  yes
  P8    <=30     medium   no       fair       no
  P9    <=30     low      yes      fair       yes
  P10   >40      medium   yes      fair       yes

  H1: Buys_computer = yes   P(H1) = 6/10 = 0.6
  H0: Buys_computer = no    P(H0) = 4/10 = 0.4
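A small Python sketch (variable names are mine, not from the slides) that reproduces the class priors above by simple counting:

```python
# Estimate the class priors P(H1), P(H0) by counting labels, as on this slide.
from collections import Counter

# (age, income, student, credit, buys_computer) for examples P1..P10
data = [
    ("31...40", "high",   "no",  "fair",      "no"),
    ("<=30",    "high",   "no",  "excellent", "no"),
    ("31...40", "high",   "no",  "fair",      "yes"),
    (">40",     "medium", "no",  "fair",      "yes"),
    (">40",     "low",    "yes", "fair",      "yes"),
    (">40",     "low",    "yes", "excellent", "no"),
    ("31...40", "low",    "yes", "excellent", "yes"),
    ("<=30",    "medium", "no",  "fair",      "no"),
    ("<=30",    "low",    "yes", "fair",      "yes"),
    (">40",     "medium", "yes", "fair",      "yes"),
]

n = len(data)
label_counts = Counter(row[-1] for row in data)          # {'yes': 6, 'no': 4}
priors = {label: count / n for label, count in label_counts.items()}
print(priors)                                            # {'no': 0.4, 'yes': 0.6}
```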

6 Bayes' Theorem
  P(Hi|X) is the class posterior probability (of Hi conditioned on X)
    Probability that data example X belongs to class Ci, given the attribute values of X
    e.g., given X = (age: 31…40, income: medium, student: yes, credit: fair), what is the probability that X buys a computer?
  To classify, determine the highest P(Hi|X) among all classes C1, …, Cm
    If P(H1|X) > P(H0|X), then X buys a computer
    If P(H0|X) > P(H1|X), then X does not buy a computer
  Calculate P(Hi|X) using Bayes' theorem

7 Bayes' Theorem
  P(X) is the descriptor prior probability of X
    Probability of observing the attribute values of X
    Suppose X = (x1, x2, …, xn) and the attributes are independent; then P(X) = P(x1) P(x2) … P(xn)
    P(xj) = nj/n, where nj is the number of training examples having value xj for attribute Aj and n is the total number of training examples
    P(X) is constant for all classes

8 Example: descriptor prior P(X) (training data as in slide 5)
  X = (age: 31…40, income: medium, student: yes, credit: fair)
  P(age=31…40) = 3/10   P(income=medium) = 3/10   P(student=yes) = 5/10   P(credit=fair) = 7/10
  P(X) = P(age=31…40) × P(income=medium) × P(student=yes) × P(credit=fair)
       = 0.3 × 0.3 × 0.5 × 0.7 = 0.0315
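The same product can be checked in a few lines; the counts in the comments are read off the slide 5 table:

```python
# Descriptor prior P(X) under the independence assumption,
# using value frequencies from the training table (counts out of n = 10).
p_age     = 3 / 10   # age = 31...40 in P1, P3, P7
p_income  = 3 / 10   # income = medium in P4, P8, P10
p_student = 5 / 10   # student = yes in P5, P6, P7, P9, P10
p_credit  = 7 / 10   # credit = fair in P1, P3, P4, P5, P8, P9, P10
p_x = p_age * p_income * p_student * p_credit
print(p_x)           # ~0.0315
```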

9 Bayes' Theorem
  P(X|Hi) is the descriptor posterior probability
    Probability of observing X within class Ci
    Assume X = (x1, x2, …, xn) and the attributes are independent; then P(X|Hi) = P(x1|Hi) P(x2|Hi) … P(xn|Hi)
    P(xj|Hi) = ni,j/ni, where ni,j is the number of training examples in class Ci having value xj for attribute Aj, and ni is the number of training examples in Ci

10 Example: P(X|H1) (training data as in slide 5)
  X = (age: 31…40, income: medium, student: yes, credit: fair)
  H1 = X buys a computer; n1 = 6
  Counts within class yes: n(age=31…40) = 2, n(income=medium) = 2, n(student=yes) = 4, n(credit=fair) = 5
  P(X|H1) = 2/6 × 2/6 × 4/6 × 5/6 ≈ 0.062

11 Example: P(X|H0) (training data as in slide 5)
  X = (age: 31…40, income: medium, student: yes, credit: fair)
  H0 = X does not buy a computer; n0 = 4
  Counts within class no: n(age=31…40) = 1, n(income=medium) = 1, n(student=yes) = 1, n(credit=fair) = 2
  P(X|H0) = 1/4 × 1/4 × 1/4 × 2/4 ≈ 0.0078
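The two class-conditional products from slides 10 and 11 can be verified directly from the per-class counts:

```python
# Descriptor posteriors for X = (31...40, medium, student=yes, credit=fair),
# using the per-class counts above (6 "yes" rows, 4 "no" rows).
p_x_given_yes = (2 / 6) * (2 / 6) * (4 / 6) * (5 / 6)   # P(X|H1)
p_x_given_no  = (1 / 4) * (1 / 4) * (1 / 4) * (2 / 4)   # P(X|H0)
print(round(p_x_given_yes, 4))   # ~0.0617
print(round(p_x_given_no, 4))    # ~0.0078
```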

12 Bayesian Classifier – Basic Equation
  P(Hi|X) = P(X|Hi) P(Hi) / P(X)
    P(Hi|X): class posterior probability
    P(X|Hi): descriptor posterior probability
    P(Hi):   class prior probability
    P(X):    descriptor prior probability
  To classify, determine the highest P(Hi|X) among all classes C1, …, Cm
  P(X) is constant for all classes
  So it is enough to compare P(Hi) P(X|Hi)
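Putting the pieces together, here is a minimal end-to-end sketch (function and variable names are mine) that classifies X by comparing P(Hi) P(X|Hi) for each class:

```python
# Naive Bayes classification by comparing P(Hi) * P(X|Hi) across classes.
from collections import Counter

# (age, income, student, credit) -> buys_computer, examples P1..P10
data = [
    (("31...40", "high",   "no",  "fair"),      "no"),
    (("<=30",    "high",   "no",  "excellent"), "no"),
    (("31...40", "high",   "no",  "fair"),      "yes"),
    ((">40",     "medium", "no",  "fair"),      "yes"),
    ((">40",     "low",    "yes", "fair"),      "yes"),
    ((">40",     "low",    "yes", "excellent"), "no"),
    (("31...40", "low",    "yes", "excellent"), "yes"),
    (("<=30",    "medium", "no",  "fair"),      "no"),
    (("<=30",    "low",    "yes", "fair"),      "yes"),
    ((">40",     "medium", "yes", "fair"),      "yes"),
]

def classify(x, data):
    class_counts = Counter(label for _, label in data)     # ni for each class Ci
    scores = {}
    for label, n_i in class_counts.items():
        score = n_i / len(data)                             # prior P(Hi) = ni / n
        rows = [attrs for attrs, lab in data if lab == label]
        for j, value in enumerate(x):
            n_ij = sum(1 for attrs in rows if attrs[j] == value)
            score *= n_ij / n_i                              # P(xj|Hi) = ni,j / ni
        scores[label] = score                                # P(Hi) * P(X|Hi)
    return max(scores, key=scores.get), scores

x = ("31...40", "medium", "yes", "fair")
label, scores = classify(x, data)
print(scores)   # {'no': ~0.0031, 'yes': ~0.0370}
print(label)    # 'yes' -> X is predicted to buy a computer
```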

13 Weather dataset example
  X = (rain, hot, high, false)

14 Weather dataset example
  Given a training set, we can compute the probabilities:
    class priors P(Hi):  P(p) = 9/14,  P(n) = 5/14
    conditional probabilities P(xj|Hi) for each attribute value within each class (e.g. P(rain|p) = 3/9, P(rain|n) = 2/5)

15 Weather dataset example: classifying X
  An unseen sample X = (rain, hot, high, false)
  P(p) P(X|p) = P(p) P(rain|p) P(hot|p) P(high|p) P(false|p)
              = 9/14 × 3/9 × 2/9 × 3/9 × 6/9 = 0.010582
  P(n) P(X|n) = P(n) P(rain|n) P(hot|n) P(high|n) P(false|n)
              = 5/14 × 2/5 × 2/5 × 4/5 × 2/5 = 0.018286
  Since P(n) P(X|n) > P(p) P(X|p), sample X is classified in class n (don't play)
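A quick check of the two products above, taking the conditional probabilities as given on the previous slide:

```python
# Compare P(p) * P(X|p) and P(n) * P(X|n) for X = (rain, hot, high, false).
score_play      = (9 / 14) * (3 / 9) * (2 / 9) * (3 / 9) * (6 / 9)
score_dont_play = (5 / 14) * (2 / 5) * (2 / 5) * (4 / 5) * (2 / 5)
print(round(score_play, 6))        # 0.010582
print(round(score_dont_play, 6))   # 0.018286
print("n (don't play)" if score_dont_play > score_play else "p (play)")
```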

16 The independence hypothesis…
  … makes the computation possible
  … yields optimal classifiers when it is satisfied
  … but is seldom satisfied in practice, as attributes (variables) are often correlated
  Attempts to overcome this limitation:
    Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
    Decision trees, which reason on one attribute at a time, considering the most important attributes first

