1
Chapter 8 – Naïve Bayes
DM for Business Intelligence
2
Naïve Bayes Characteristics
- Theorem named after Reverend Thomas Bayes (1702 – 1761)
- Data-driven, not model-driven (i.e., lets the data “create” the training model)
- Supervised learning (i.e., there is a defined output variable)
- “Naïve” because it assumes that each of its inputs is independent of the others (an assumption that rarely holds true in real life)
- Example: an illness may be diagnosed as flu if a patient has a virus, a fever, and congestion, even though the virus may actually be causing the other two symptoms. Also naïve because it ignores the existence of other symptoms.
3
Naïve Bayes: The Basic Idea
- For a given new record to be classified, find other records like it (i.e., with the same values for the input variables)
- What is the prevalent class (i.e., output variable value) among those records?
- Assign that class to your new record
4
Usage
- Requires categorical input variables; numerical input variables must be binned and converted to categorical variables (see the binning sketch below)
- Can be used with very large data sets
- “Exact” Bayes requires records in your dataset that exactly match the record you are trying to classify
- Naïve Bayes can be “modified” to calculate the probability of class membership without requiring exact matches
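As a concrete illustration of the binning step, here is a minimal sketch using pandas. The column names (Revenue, Revenue_bin), the bin labels, and the values are hypothetical, not taken from the slides.

```python
import pandas as pd

# Hypothetical firm data with one numerical predictor (illustrative values only)
df = pd.DataFrame({"Revenue": [1.2, 3.5, 0.8, 7.9, 4.4, 2.1]})

# Bin the numerical variable into three equal-width categories so that
# Naive Bayes can treat it as a categorical input
df["Revenue_bin"] = pd.cut(df["Revenue"], bins=3, labels=["low", "medium", "high"])

print(df[["Revenue", "Revenue_bin"]])
```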
5
Exact Bayes Classifier
- Relies on finding other records that share the same input variable values as the record-to-be-classified
- Want to find the “probability of belonging to class C, given specified values of the predictors/input variables”
- Even with large data sets, it may be hard to find other records that exactly match your record in terms of predictor values
6
Solution – Modified Naïve Bayes
- Assume independence of the predictor variables (within each class)
- Use the multiplication rule
- Find the same probability that the record belongs to class C, given the predictor values, without limiting the calculation to records that share all of those same values
7
Modified Bayes Calculations
1. Take a record, and note its predictor values
2. Find the probabilities of those predictor values occurring across all records in C1
3. Multiply them together, then multiply by the proportion of records belonging to C1
4. Do the same for C2, C3, etc.
5. The probability of belonging to C1 is the value from step (3) divided by the sum of all such values across C1 … Cn
6. Establish & adjust a “cutoff” probability for the class of interest
(A from-scratch sketch of steps 1–5 follows below.)
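A minimal from-scratch sketch of steps 1–5 for categorical predictors. The function name naive_bayes_scores and the data layout (a list of predictor-dict/class pairs) are my own choices, not from the text; this is a sketch of the calculation, not a production implementation.

```python
from collections import Counter

def naive_bayes_scores(train, new_record):
    """Score each class for a record with categorical predictors (steps 1-5).

    train:      list of (predictor_dict, class_label) pairs
    new_record: dict of predictor values for the record to classify
    """
    n = len(train)
    class_counts = Counter(label for _, label in train)

    raw = {}
    for c, n_c in class_counts.items():
        rows = [x for x, label in train if label == c]
        # Steps 2-3: multiply the within-class proportions of each predictor
        # value, then multiply by the overall proportion of class c
        p = n_c / n
        for var, value in new_record.items():
            p *= sum(1 for x in rows if x[var] == value) / n_c
        raw[c] = p

    # Step 5: normalize so the class scores sum to 1
    # (assumes at least one class has a nonzero score)
    total = sum(raw.values())
    return {c: p / total for c, p in raw.items()}
```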
8
Example: Financial Fraud
- Target variable: Audit finds “Fraud” or “No fraud”
- Predictors: Prior pending legal charges (yes/no); Size of firm (small/large)
10
Exact Bayes Calculations
- Goal: classify (as “fraudulent” or as “truthful”) a small firm with charges filed
- There are 2 firms like that in the dataset, one fraudulent and the other truthful
- There is no “majority” class, so... P(fraud | prior charges = y, size = small) = 1/2 = 0.50
- Note: the calculation is limited to the two firms matching those exact characteristics (see the pandas sketch below)
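For illustration, here is how the exact Bayes calculation could be done with pandas. The 10-firm table below is not reproduced from the book; it is a hypothetical reconstruction chosen to be consistent with the counts quoted on these slides (4 frauds, 6 truthfuls, and exactly two small firms with prior charges).

```python
import pandas as pd

# Hypothetical 10-firm dataset, consistent with the proportions on the slides
firms = pd.DataFrame({
    "prior_charges": ["y", "y", "y", "n", "y", "n", "n", "n", "n", "n"],
    "size":          ["small", "large", "large", "large",
                      "small", "small", "small", "small", "large", "large"],
    "status":        ["fraud"] * 4 + ["truthful"] * 6,
})

# Exact Bayes: restrict attention to records that match the new firm exactly
match = firms[(firms["prior_charges"] == "y") & (firms["size"] == "small")]
p_fraud = (match["status"] == "fraud").mean()
print(len(match), p_fraud)   # 2 matching firms, P(fraud | y, small) = 0.5
```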
11
Modified Naïve Bayes Calculations
The formula (page 151):

P(C_i \mid x_1, \ldots, x_p) = \frac{P(x_1, \ldots, x_p \mid C_i)\, P(C_i)}{\sum_{j=1}^{m} P(x_1, \ldots, x_p \mid C_j)\, P(C_j)}

In words, for the fraud example:
P(Fraud | Prior Charges & Small Firm) = Proportion of “Prior Charges,” “Small,” & “Fraud” records among all records, divided by... {same} + Proportion of “Prior Charges,” “Small,” & “Truthful” records among all records
12
Modified Naïve Bayes Calculations
- Same goal as before
- Compute 2 quantities:
  - Proportion of “prior charges = y” among frauds, times proportion of “small” among frauds, times proportion of frauds = 3/4 * 1/4 * 4/10 = 0.075
  - Proportion of “prior charges = y” among truthfuls, times proportion of “small” among truthfuls, times proportion of truthfuls = 1/6 * 4/6 * 6/10 ≈ 0.067
- P(fraud | prior charges, small) = 0.075 / (0.075 + 0.067) = 0.53
(This arithmetic is reproduced in the short sketch below.)
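A few lines of plain Python reproducing the slide's arithmetic, to make the normalization in the final step explicit (no data is needed, since the proportions are given above):

```python
# Numerator for each class: product of within-class proportions times class proportion
score_fraud    = (3/4) * (1/4) * (4/10)   # = 0.075
score_truthful = (1/6) * (4/6) * (6/10)   # ~= 0.067

# Normalize to get P(fraud | prior charges = y, size = small)
p_fraud = score_fraud / (score_fraud + score_truthful)
print(round(score_fraud, 3), round(score_truthful, 3), round(p_fraud, 2))  # 0.075 0.067 0.53
```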
13
Modified Naïve Bayes, cont.
- Note that the probability estimate does not differ greatly from the exact result of 0.50
- But ALL records are used in the calculations, not just those matching the predictor values
- This makes the calculation practical in most circumstances
- Relies on the assumption of independence between predictor variables within each class
14
Independence Assumption
- Not strictly justified (variables are often correlated with one another)
- Often “good enough”
15
Advantages
- Handles purely categorical data well
- Works well with very large data sets
- Simple & computationally efficient (see the scikit-learn sketch below)
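In practice these calculations are usually delegated to a library. Here is a sketch using scikit-learn's CategoricalNB on the same hypothetical 10-firm reconstruction used earlier; the near-zero alpha effectively disables the class's default smoothing so the output matches the raw proportions on the slides.

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical 10-firm dataset consistent with the slides' proportions
X = [["y", "small"], ["y", "large"], ["y", "large"], ["n", "large"],
     ["y", "small"], ["n", "small"], ["n", "small"], ["n", "small"],
     ["n", "large"], ["n", "large"]]
y = ["fraud"] * 4 + ["truthful"] * 6

# CategoricalNB expects integer-coded categories
enc = OrdinalEncoder()
X_enc = enc.fit_transform(X)

# alpha near zero: use raw proportions, as in the slides' calculation
clf = CategoricalNB(alpha=1e-10).fit(X_enc, y)

new_firm = enc.transform([["y", "small"]])
print(clf.classes_)                  # ['fraud' 'truthful']
print(clf.predict_proba(new_firm))   # approximately [[0.53, 0.47]]
```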
16
Shortcomings
- Requires a large number of records
- Problematic when a predictor category is not present in the training data
- In that case it assigns 0 probability of response, ignoring the information in the other variables (illustrated in the short sketch below)
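A tiny illustration of the zero-frequency problem described above; the counts are hypothetical.

```python
# Suppose "prior charges = y" never occurs among the 4 frauds in the training data.
# The within-class proportion is then 0, so the whole product collapses to 0
# for that class, no matter what the other predictors say.
score_fraud = (0/4) * (1/4) * (4/10)
print(score_fraud)   # 0.0 -> the class of interest gets zero probability
```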
17
On the other hand…
- The probability rankings (ordering) are more accurate than the probability estimates themselves
- Good for applications using lift (e.g., response to a mailing), less so for applications requiring exact probabilities (e.g., credit scoring)
18
Summary
- No statistical models involved
- Naïve Bayes (like k-NN) pays attention to complex interactions and local structure
- Computational challenges remain