
1. Bayes Net Classifiers: The Naïve Bayes Model. Oliver Schulte, Machine Learning 726.

2. Classification. Suppose we have a target node V such that all queries of interest have the form P(V = v | values for all other variables). Example: predict whether a patient has bronchitis given values for all other nodes. Because we know the form of the query, we can optimize the Bayes net. V is called the class variable, v is called the class label, and the other variables are called features.

3. Optimizing the Structure. Some nodes are irrelevant to a target node, given the others. Examples: can you guess the pattern? (Answer: the relevant nodes are exactly the target's Markov blanket, defined on the next slide.)

4. The Markov Blanket. The Markov blanket of a node contains: its neighbors (parents and children), and its spouses (co-parents, i.e., the other parents of its children). Given its Markov blanket, a node is independent of all remaining nodes in the network.
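To make the definition concrete, here is a minimal Python sketch, not from the slides, that computes a node's Markov blanket from a parent map; the `parents` dictionary and node names are assumptions chosen to match the PlayTennis example used later.

```python
# Sketch: compute the Markov blanket of a node in a Bayes net.
# The net is represented as a map from each node to its list of parents.

def markov_blanket(node, parents):
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node])                 # the node's parents
    blanket.update(children)                     # the node's children
    for child in children:                       # the spouses (co-parents)
        blanket.update(p for p in parents[child] if p != node)
    return blanket

# Hypothetical network: PlayTennis is the parent of every feature node.
parents = {
    "PlayTennis": [],
    "Outlook": ["PlayTennis"],
    "Temperature": ["PlayTennis"],
    "Wind": ["PlayTennis"],
    "Humidity": ["PlayTennis"],
}
print(sorted(markov_blanket("PlayTennis", parents)))
# ['Humidity', 'Outlook', 'Temperature', 'Wind']
```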

5. How to Build a Bayes Net Classifier. Eliminate nodes that are not in the Markov blanket of the class variable: this is feature selection. Then learn parameters for the remaining nodes: fewer dimensions!

6. The Naïve Bayes Model

7. Classification Models. A Bayes net is a very general probability model. Sometimes we want to use more specific models: 1. They are more intelligible for some users. 2. Models make assumptions; if the assumptions are correct, learning works better. A widely used Bayes net-type classifier is Naïve Bayes.

8. The Naïve Bayes Model. Given the class label, the features are independent. Intuition: the only way features interact is through the class label. Also: we do not care about correlations among features. (Diagram: the class node PlayTennis is the sole parent of the feature nodes Outlook, Temperature, Wind, and Humidity.)

9. The Naive Bayes Classification Model. Exercise: use the Naive Bayes assumption to find a simple expression for P(PlayTennis = yes | o, t, w, h). Solution: (1) multiply the numbers in the table, one from each column; (2) divide by P(o, t, w, h).

Prior        Outlook       Temperature   Wind          Humidity
P(PT=yes)    P(o|PT=yes)   P(t|PT=yes)   P(w|PT=yes)   P(h|PT=yes)
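A minimal sketch of this computation, assuming the table entries are supplied per class value; the `columns` layout and the function name are my own, not the slides':

```python
# Sketch: Naive Bayes posterior from per-class table columns.
# columns[c] = [P(c), P(o|c), P(t|c), P(w|c), P(h|c)] for class value c.

def posterior(columns, target="yes"):
    products = {}
    for c, col in columns.items():
        p = 1.0
        for q in col:
            p *= q                  # step 1: multiply one number per column
        products[c] = p
    # step 2: dividing by P(o,t,w,h) is the same as normalizing over classes
    return products[target] / sum(products.values())
```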

10. Example.

Prior       Outlook           Temperature      Wind               Humidity         Product
P(PT=yes)   P(sunny|PT=yes)   P(cool|PT=yes)   P(strong|PT=yes)   P(high|PT=yes)
9/14        2/9               1/3              1/3                1/3              0.0053

Prior       Outlook           Temperature      Wind               Humidity         Product
P(PT=no)    P(sunny|PT=no)    P(cool|PT=no)    P(strong|PT=no)    P(high|PT=no)
5/14        3/5               1/5              3/5                4/5              0.0206

Normalization: P(PT=yes | features) = 0.0053 / (0.0053 + 0.0206) ≈ 20.5%.
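The slide's arithmetic can be checked with exact fractions (a sketch; the probabilities are the table values above):

```python
from fractions import Fraction as F

# Products of the two table rows, kept exact.
yes = F(9, 14) * F(2, 9) * F(1, 3) * F(1, 3) * F(1, 3)
no  = F(5, 14) * F(3, 5) * F(1, 5) * F(3, 5) * F(4, 5)
print(float(yes), float(no))      # 0.00529... and 0.02057...
print(float(yes / (yes + no)))    # 0.2046..., i.e. about 20.5%
```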

11. Naive Bayes Learning. Use maximum likelihood estimates, i.e., observed frequencies. Linear number of parameters! Example: see the previous slide. Weka's NaiveBayesSimple uses Laplace estimation. As another refinement, feature selection can be performed first. Boosting can also be applied to Naive Bayes learning and is very competitive. (Diagram: the same Naive Bayes network as on slide 8.)
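A minimal sketch of this learning rule, assuming training data arrives as (feature-dict, label) pairs; the data layout, function name, and `n_values` argument are my own. Setting k = 0 gives maximum likelihood estimates, k = 1 gives Laplace estimation:

```python
from collections import Counter, defaultdict

def learn_naive_bayes(data, n_values, k=1):
    """data: list of (features, label); n_values[f]: number of values of f.
    k = 0: observed frequencies (MLE); k = 1: Laplace estimation."""
    label_counts = Counter(label for _, label in data)
    value_counts = defaultdict(Counter)    # (feature, label) -> value counts
    for features, label in data:
        for name, value in features.items():
            value_counts[(name, label)][value] += 1

    prior = {lab: c / len(data) for lab, c in label_counts.items()}
    cond = {}   # one parameter per (feature, value, label): linear, not exponential
    for (name, label), counts in value_counts.items():
        for value, c in counts.items():
            cond[(name, value, label)] = (c + k) / (label_counts[label] + k * n_values[name])
    # A (feature, value, label) triple never seen in the data would get
    # probability k / (label_counts[label] + k * n_values[feature]).
    return prior, cond
```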

12. Ratio/Odds Classification Formula. If we only care about classification, we can ignore the normalization constant. Ratios of feature probabilities give more numeric stability. Exercise: use the Naive Bayes assumption to find a simple expression for the posterior odds P(class = yes | features) / P(class = no | features).

Prior                Outlook            Temperature        Wind               Humidity
P(PT=yes)/P(PT=no)   P(o|yes)/P(o|no)   P(t|yes)/P(t|no)   P(w|yes)/P(w|no)   P(h|yes)/P(h|no)
1.80                 0.37               1.67               0.56               0.42

Product = 0.26 (see examples.xlsx). Greater or less than 1?
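Checking the product with the slide's rounded ratios (a sketch):

```python
# Posterior odds = product of the per-column probability ratios.
ratios = [1.80, 0.37, 1.67, 0.56, 0.42]   # prior, outlook, temp, wind, humidity
odds = 1.0
for r in ratios:
    odds *= r
print(round(odds, 2))   # 0.26: less than 1, so PT = no is more probable
```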

13. Log-Odds Formula. For even more numeric stability, use logs. Intuitive interpretation: each feature "votes" for a class, then we add up the votes.

Prior                    Outlook                 Temperature             Wind                    Humidity
ln[P(PT=yes)/P(PT=no)]   ln[P(o|yes)/P(o|no)]    ln[P(t|yes)/P(t|no)]    ln[P(w|yes)/P(w|no)]    ln[P(h|yes)/P(h|no)]
0.59                     -0.99                   0.51                    -0.59                   -0.88

Sum = -1.36 (see examples.xlsx). Positive or negative? This is a linear discriminant: add up the feature terms and accept the class if the sum is > 0.
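The same computation in log space; in this sketch the ratios are rebuilt as exact fractions from the example's counts so that the rounded logs match the slide:

```python
import math
from fractions import Fraction as F

# Each feature's "vote" is the log of its probability ratio.
ratios = [F(9, 14) / F(5, 14),   # prior:       1.80
          F(2, 9)  / F(3, 5),    # outlook:     0.37
          F(1, 3)  / F(1, 5),    # temperature: 1.67
          F(1, 3)  / F(3, 5),    # wind:        0.56
          F(1, 3)  / F(4, 5)]    # humidity:    0.42
votes = [math.log(r) for r in ratios]
print([round(v, 2) for v in votes])   # [0.59, -0.99, 0.51, -0.59, -0.88]
print(round(sum(votes), 2))           # -1.36: negative, so predict PT = no
```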

