Oliver Schulte
Machine Learning 726
Bayes Net Classifiers: The Naïve Bayes Model
2/13 Classification
Suppose we have a target node V such that all queries of interest are of the form P(V = v | values for all other variables). Example: predict whether a patient has bronchitis given values for all other nodes.
Because we know the form of the query, we can optimize the Bayes net.
V is called the class variable, v is called the class label, and the other variables are called features.
3/13 Optimizing the Structure
Some nodes are irrelevant to a target node, given the others. Examples: can you guess the pattern? (The pattern is the Markov blanket, defined on the next slide.)
4/13 The Markov Blanket
The Markov blanket of a node contains:
- its neighbors (parents and children), and
- the spouses (co-parents: the other parents of its children).
Given its Markov blanket, a node is conditionally independent of all other nodes in the network.
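As an illustration (not from the slides), here is a minimal Python sketch that computes the Markov blanket of a node from a parent-list representation of a DAG; the graph and node names are hypothetical.

```python
# Minimal sketch: compute the Markov blanket of a node in a DAG
# given as a dict mapping each node to the list of its parents.

def markov_blanket(parents, node):
    """Return the Markov blanket of `node`: its parents, its children,
    and its children's other parents (the 'spouses'/co-parents)."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node])   # the node's parents
    blanket.update(children)       # the node's children
    for c in children:             # co-parents of each child
        blanket.update(parents[c])
    blanket.discard(node)          # the node itself is excluded
    return blanket

# Hypothetical example: Bronchitis has parent Smoking and child Cough;
# Cough also has parent Flu, so Flu is a co-parent.
parents = {"Smoking": [], "Flu": [], "Bronchitis": ["Smoking"],
           "Cough": ["Bronchitis", "Flu"]}
print(markov_blanket(parents, "Bronchitis"))  # {'Smoking', 'Cough', 'Flu'}
```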
5/13 How to Build a Bayes Net Classifier
1. Eliminate nodes not in the Markov blanket of the class variable (feature selection).
2. Learn the parameters.
Fewer dimensions!
6/13 The Naïve Bayes Model
7/13 Classification Models
A Bayes net is a very general probability model. Sometimes we want to use more specific models:
1. More intelligible for some users.
2. Models make assumptions: if the assumptions are correct → better learning.
A widely used Bayes net-type classifier: Naïve Bayes.
8/13 The Naïve Bayes Model
Given the class label, the features are independent. Intuition: the only way in which the features interact is through the class label. Also: we don't care about correlations among the features.
[Diagram: class node PlayTennis with feature children Outlook, Temperature, Wind, Humidity]
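In symbols, for the PlayTennis network above, the Naïve Bayes assumption says the feature likelihood factorizes given the class (a standard statement of the model, written out here for clarity):

```latex
P(o, t, w, h \mid PT) = P(o \mid PT)\, P(t \mid PT)\, P(w \mid PT)\, P(h \mid PT)
```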
9/13 The Naive Bayes Classification Model
Exercise: use the Naive Bayes assumption to find a simple expression for P(PlayTennis=yes | o, t, w, h).
Solution:
1. Multiply the numbers in each column.
2. Divide by P(o, t, w, h).

Prior     | Outlook       | Temperature   | Wind          | Humidity
P(PT=yes) | P(o | PT=yes) | P(t | PT=yes) | P(w | PT=yes) | P(h | PT=yes)
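Written out, the solution is (restating the two steps above as one formula):

```latex
P(PT{=}\mathrm{yes} \mid o, t, w, h)
  = \frac{P(PT{=}\mathrm{yes})\, P(o \mid \mathrm{yes})\, P(t \mid \mathrm{yes})\, P(w \mid \mathrm{yes})\, P(h \mid \mathrm{yes})}
         {P(o, t, w, h)}
```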
10/13 Example
Query: P(PT=yes | sunny, cool, strong, high).

Prior     | Outlook           | Temperature      | Wind               | Humidity         | Product
P(PT=yes) | P(sunny | PT=yes) | P(cool | PT=yes) | P(strong | PT=yes) | P(high | PT=yes) |
9/14      | 2/9               | 3/9              | 3/9                | 3/9              | ≈ 0.0053

Prior     | Outlook          | Temperature     | Wind              | Humidity        | Product
P(PT=no)  | P(sunny | PT=no) | P(cool | PT=no) | P(strong | PT=no) | P(high | PT=no) |
5/14      | 3/5              | 1/5             | 3/5               | 4/5             | ≈ 0.0206

Normalization: P(PT=yes | features) = 0.0053 / (0.0053 + 0.0206) ≈ 20.5%.
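A minimal Python sketch (added; not part of the slides) that reproduces this arithmetic:

```python
# Reproduce the slide's computation of
# P(PlayTennis=yes | Outlook=sunny, Temp=cool, Wind=strong, Humidity=high).
from fractions import Fraction as F

# Prior and per-feature likelihoods for each class, from the tables above.
yes_factors = [F(9, 14), F(2, 9), F(3, 9), F(3, 9), F(3, 9)]
no_factors  = [F(5, 14), F(3, 5), F(1, 5), F(3, 5), F(4, 5)]

def product(factors):
    result = F(1)
    for f in factors:
        result *= f
    return result

score_yes = product(yes_factors)  # proportional to P(yes, features), ~0.0053
score_no  = product(no_factors)   # proportional to P(no, features),  ~0.0206

# Normalize by P(features) = score_yes + score_no.
posterior_yes = score_yes / (score_yes + score_no)
print(float(posterior_yes))       # ~0.205, i.e. 20.5%
```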
11/13 Naive Bayes Learning
Use maximum likelihood estimates, i.e., observed frequencies. Linear number of parameters! Example: see the previous slide.
Weka.NaiveBayesSimple uses Laplace estimation. As another refinement, feature selection can be performed first. Boosting can also be applied to Naive Bayes learning; the result is very competitive.
[Diagram: the same Naive Bayes network as on slide 8]
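Laplace estimation adds one pseudo-count per feature value, so no conditional probability is estimated as zero. A minimal sketch (added for illustration; Weka's internals may differ):

```python
# Laplace-smoothed estimate of P(feature=value | class) from training counts.

def laplace_estimate(count_class_value, count_class, num_values):
    """P(v | c) = (count(c, v) + 1) / (count(c) + number of feature values)."""
    return (count_class_value + 1) / (count_class + num_values)

# Hypothetical numbers: Outlook has 3 values (sunny, overcast, rain);
# 9 training examples with PT=yes, 2 of which have Outlook=sunny.
print(laplace_estimate(2, 9, 3))  # (2+1)/(9+3) = 0.25, vs. the MLE 2/9 ≈ 0.22
```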
12/13 Ratio/Odds Classification Formula
If we only care about classification, we can ignore the normalization constant. Ratios of feature probabilities give more numeric stability.
Exercise: use the Naive Bayes assumption to find a simple expression for the posterior odds P(class=yes | features) / P(class=no | features).

Prior                | Outlook            | Temperature        | Wind               | Humidity
P(PT=yes) / P(PT=no) | P(o|yes) / P(o|no) | P(t|yes) / P(t|no) | P(w|yes) / P(w|no) | P(h|yes) / P(h|no)

Product = 0.26 (see examples.xlsx). Positive or negative?
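Spelled out, the posterior odds are a product of per-feature ratios; the normalization constant P(o, t, w, h) cancels:

```latex
\frac{P(\mathrm{yes} \mid o,t,w,h)}{P(\mathrm{no} \mid o,t,w,h)}
  = \frac{P(\mathrm{yes})}{P(\mathrm{no})}
    \cdot \frac{P(o \mid \mathrm{yes})}{P(o \mid \mathrm{no})}
    \cdot \frac{P(t \mid \mathrm{yes})}{P(t \mid \mathrm{no})}
    \cdot \frac{P(w \mid \mathrm{yes})}{P(w \mid \mathrm{no})}
    \cdot \frac{P(h \mid \mathrm{yes})}{P(h \mid \mathrm{no})}
```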
13/13 Log-Odds Formula
For even more numeric stability, use logs. Intuitive interpretation: each feature "votes" for a class, then we add up the votes.

Prior                     | Outlook                 | Temperature             | Wind                    | Humidity
log[P(PT=yes) / P(PT=no)] | log[P(o|yes) / P(o|no)] | log[P(t|yes) / P(t|no)] | log[P(w|yes) / P(w|no)] | log[P(h|yes) / P(h|no)]

Sum = -1.36 (see examples.xlsx). Positive or negative?
Linear discriminant: add up the feature terms, accept (classify as yes) if the sum is > 0.
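A minimal sketch of the log-odds discriminant (added; the numbers reuse the worked example from slide 10, and the natural log is assumed, which matches the slide's sum of -1.36):

```python
# Classify by summing log-odds terms: each feature casts one "vote";
# predict yes iff the total is positive.
import math
from fractions import Fraction as F

yes_factors = [F(9, 14), F(2, 9), F(3, 9), F(3, 9), F(3, 9)]
no_factors  = [F(5, 14), F(3, 5), F(1, 5), F(3, 5), F(4, 5)]

log_odds = sum(math.log(y / n) for y, n in zip(yes_factors, no_factors))
print(round(log_odds, 2))               # -1.36
print("yes" if log_odds > 0 else "no")  # negative sum, so predict no
```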