Slide 1: Classification – Bayesian Classifiers
Jeff Howbert, Introduction to Machine Learning, Winter 2012
Slide 2: Bayesian classification

- A probabilistic framework for solving classification problems.
  – Used where class assignment is not deterministic, i.e. a particular set of attribute values will sometimes be associated with one class, sometimes with another.
  – Requires estimation of the posterior probability p( Ci | x ) for each class Ci, given a set of attribute values x.
  – Then use decision theory to make predictions for a new sample x.
Slide 3: Bayesian classification

- Conditional probability:
    p( C, x ) = p( C | x ) p( x ) = p( x | C ) p( C )
- Bayes theorem:
    p( C | x ) = p( x | C ) p( C ) / p( x )
  where p( C | x ) is the posterior probability, p( x | C ) the likelihood, p( C ) the prior probability, and p( x ) the evidence.
Slide 4: Example of Bayes theorem

- Given:
  – A doctor knows that meningitis causes stiff neck 50% of the time.
  – The prior probability of any patient having meningitis is 1/50,000.
  – The prior probability of any patient having stiff neck is 1/20.
- If a patient has a stiff neck, what is the probability he/she has meningitis?
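Applying Bayes theorem with S = stiff neck and M = meningitis, using only the numbers given above:

    p( M | S ) = p( S | M ) p( M ) / p( S ) = ( 0.5 × 1/50,000 ) / ( 1/20 ) = 0.0002

So even with a stiff neck, meningitis is still very unlikely (about 1 in 5,000).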
Slide 5: Bayesian classifiers

- Treat each attribute and the class label as random variables.
- Given a sample x with attributes ( x1, x2, …, xn ):
  – The goal is to predict class C.
  – Specifically, we want to find the value of Ci that maximizes p( Ci | x1, x2, …, xn ).
- Can we estimate p( Ci | x1, x2, …, xn ) directly from data?
Slide 6: Bayesian classifiers

Approach:
- Compute the posterior probability p( Ci | x1, x2, …, xn ) for each value of Ci using Bayes theorem:
    p( Ci | x1, x2, …, xn ) = p( x1, x2, …, xn | Ci ) p( Ci ) / p( x1, x2, …, xn )
- Choose the value of Ci that maximizes p( Ci | x1, x2, …, xn ).
- Equivalent to choosing the value of Ci that maximizes p( x1, x2, …, xn | Ci ) p( Ci ). (We can ignore the denominator – why? It is the same for every class, so it does not change which Ci wins.)
- Easy to estimate the priors p( Ci ) from data. (How?)
- The real challenge: how to estimate p( x1, x2, …, xn | Ci )?
Slide 7: Bayesian classifiers

- How to estimate p( x1, x2, …, xn | Ci )?
- In the general case, where the attributes xj have dependencies, this requires estimating the full joint distribution p( x1, x2, …, xn | Ci ) for each class Ci – on the order of 2^n probabilities per class even when all attributes are binary.
- There is almost never enough data to confidently make such estimates.
Slide 8: Naïve Bayes classifier

- Assume independence among the attributes xj when the class is given:
    p( x1, x2, …, xn | Ci ) = p( x1 | Ci ) p( x2 | Ci ) … p( xn | Ci )
- Usually straightforward and practical to estimate p( xj | Ci ) for all xj and Ci.
- A new sample is classified to the Ci for which p( Ci ) × Πj p( xj | Ci ) is maximal (see the sketch below).
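A minimal Python sketch of this decision rule, assuming the class priors and per-attribute conditional probability tables have already been estimated; the dictionary layout and names below are illustrative assumptions, not part of the original slides.

def classify(sample, priors, cond_prob):
    """Return the class Ci maximizing p(Ci) * product_j p(xj | Ci).

    sample:    dict mapping attribute name -> observed value
    priors:    dict mapping class -> p(Ci)
    cond_prob: dict mapping (class, attribute, value) -> p(xj | Ci)
    """
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for attr, value in sample.items():
            # Unseen (class, attribute, value) combinations get probability 0
            # here; slide 13 discusses how to smooth this away.
            score *= cond_prob.get((c, attr, value), 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical usage with values in the style of slide 9:
# priors = {"No": 7/10, "Yes": 3/10}
# cond_prob = {("No", "Status", "Married"): 4/7, ("Yes", "Refund", "Yes"): 0.0, ...}
# classify({"Status": "Married", "Refund": "No"}, priors, cond_prob)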
Slide 9: How to estimate p( xj | Ci ) from data?

- Class priors: p( Ci ) = Ni / N
    p( No ) = 7/10
    p( Yes ) = 3/10
- For discrete attributes: p( xj | Ci ) = |xji| / Ni
  where |xji| is the number of instances in class Ci having attribute value xj.
  Examples:
    p( Status = Married | No ) = 4/7
    p( Refund = Yes | Yes ) = 0
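These counting estimates can be computed in a single pass over the training data. A minimal Python sketch follows; the record format is a hypothetical stand-in for the 10-record training table shown on the original slide, which is not reproduced here.

from collections import Counter, defaultdict

def estimate(records, class_attr):
    """records: list of dicts, each mapping attribute name -> value for one training sample."""
    N = len(records)
    class_counts = Counter(r[class_attr] for r in records)
    priors = {c: n / N for c, n in class_counts.items()}           # p(Ci) = Ni / N
    value_counts = defaultdict(Counter)                            # (class, attribute) -> value counts
    for r in records:
        c = r[class_attr]
        for attr, value in r.items():
            if attr != class_attr:
                value_counts[(c, attr)][value] += 1
    cond_prob = {(c, attr, v): cnt / class_counts[c]               # p(xj | Ci) = |xji| / Ni
                 for (c, attr), counter in value_counts.items()
                 for v, cnt in counter.items()}
    return priors, cond_prob

# Hypothetical usage:
# records = [{"Refund": "Yes", "Status": "Single", "Evade": "No"}, ...]
# priors, cond_prob = estimate(records, class_attr="Evade")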
Slide 10: How to estimate p( xj | Ci ) from data?

- For continuous attributes:
  – Discretize the range into bins: replace the attribute with an ordinal attribute.
  – Two-way split ( xj < v versus xj ≥ v ): replace the attribute with a binary attribute.
  – Probability density estimation:
      assume the attribute follows some standard parametric probability distribution (usually a Gaussian);
      use the data to estimate the parameters of the distribution (e.g. mean and variance);
      once the distribution is known, use it to estimate the conditional probability p( xj | Ci ).
Slide 11: How to estimate p( xj | Ci ) from data?

- Gaussian distribution:
    p( xj | Ci ) = 1 / sqrt( 2π σji² ) × exp( -( xj - μji )² / ( 2 σji² ) )
  – one pair of parameters ( μji, σji² ) for each ( xj, Ci ) pair
- For ( Income | Class = No ):
  – sample mean μ = 110
  – sample variance σ² = 2975
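Plugging in the value Income = 120 used on the next slide gives the conditional density quoted there:

    p( Income = 120 | No ) = 1 / sqrt( 2π × 2975 ) × exp( -( 120 - 110 )² / ( 2 × 2975 ) ) ≈ 0.0072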
Slide 12: Example of using the naïve Bayes classifier

Given a test record x: Refund = No, Marital Status = Married, Income = 120K.

- p( x | Class = No ) = p( Refund = No | Class = No ) × p( Married | Class = No ) × p( Income = 120K | Class = No )
                      = 4/7 × 4/7 × 0.0072 = 0.0024
- p( x | Class = Yes ) = p( Refund = No | Class = Yes ) × p( Married | Class = Yes ) × p( Income = 120K | Class = Yes )
                      = 1 × 0 × 1.2×10^-9 = 0
- Since p( x | No ) p( No ) = 0.0024 × 7/10 > 0 = p( x | Yes ) p( Yes ), we have p( No | x ) > p( Yes | x ) => Class = No.
Slide 13: Naïve Bayes classifier

- Problem: if one of the conditional probabilities is zero, then the entire product becomes zero.
- This is a significant practical problem, especially when training samples are limited.
- Ways to improve the probability estimates (with |xji| and Ni as on slide 9, c = number of classes, p = prior probability, m = parameter):
  – Original:   p( xj | Ci ) = |xji| / Ni
  – Laplace:    p( xj | Ci ) = ( |xji| + 1 ) / ( Ni + c )
  – m-estimate: p( xj | Ci ) = ( |xji| + m p ) / ( Ni + m )
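A minimal Python sketch of the two smoothed estimates (the function and argument names are illustrative):

def laplace_estimate(n_value_in_class, n_class, n_classes):
    # Laplace correction: p(xj | Ci) = (|xji| + 1) / (Ni + c)
    return (n_value_in_class + 1) / (n_class + n_classes)

def m_estimate(n_value_in_class, n_class, prior, m):
    # m-estimate: p(xj | Ci) = (|xji| + m*p) / (Ni + m)
    return (n_value_in_class + m * prior) / (n_class + m)

# For example, the zero from slide 9, p( Refund = Yes | Yes ) = 0/3,
# becomes (0 + 1) / (3 + 2) = 0.2 under the Laplace correction (c = 2 classes).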
Slide 14: Example of naïve Bayes classifier (mammal vs. non-mammal)

- X: attributes of the test animal; M: class = mammal; N: class = non-mammal.
- (The per-attribute probabilities come from the animal data table on the original slide, which is not reproduced here.)
- p( X | M ) p( M ) > p( X | N ) p( N ) => classify as mammal.
Slide 15: Summary of naïve Bayes

- Robust to isolated noise samples.
- Handles missing values by ignoring the sample during probability estimate calculations.
- Robust to irrelevant attributes.
- NOT robust to redundant attributes.
  – The independence assumption does not hold in this case.
  – Use other techniques such as Bayesian Belief Networks (BBN).