Chapter 8 – Naïve Bayes (Data Mining for Business Intelligence)


Chapter 8 – Naïve Bayes DM for Business Intelligence

Naïve Bayes: Characteristics

- The theorem is named after Reverend Thomas Bayes (1702–1761)
- Data-driven, not model-driven (i.e., it lets the data "create" the training model)
- Supervised learning (i.e., there is a defined output variable)
- "Naive" because it assumes that each input is independent of the others, an assumption that rarely holds true in real life. Example: an illness may be diagnosed as flu if a patient has a virus, a fever, and congestion, even though the virus may actually be causing the other two symptoms.
- Also naïve because it ignores the existence of other symptoms

Naïve Bayes: The Basic Idea

- For a given new record to be classified, find other records like it (i.e., with the same values for the input variables)
- Find the prevalent class (i.e., output variable value) among those records
- Assign that class to the new record

Usage

- Requires categorical input variables; numerical input variables must be binned and converted to categorical variables (a minimal binning sketch follows this list)
- Can be used with very large data sets
- "Exact" Bayes requires records in the dataset that exactly match the record you are trying to classify
- Naïve Bayes "modifies" the calculation so that a match probability can be computed from the whole dataset
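A minimal sketch of the binning step using pandas; the column name, cut points, and labels below are illustrative assumptions, not from the slides:

```python
import pandas as pd

# Hypothetical numerical predictor to be converted to a categorical one.
df = pd.DataFrame({"income": [22_000, 48_000, 95_000, 130_000, 61_000]})

# Bin the numerical variable into a 3-level categorical variable.
df["income_bin"] = pd.cut(
    df["income"],
    bins=[0, 50_000, 100_000, float("inf")],
    labels=["low", "medium", "high"],
)
print(df)
```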

Exact Bayes Classifier

- Relies on finding other records that share the same input variable values as the record to be classified
- Goal: find the "probability of belonging to class C, given the specified values of the predictors (input variables)"
- Even with large data sets, it may be hard to find other records that exactly match the new record's predictor values (see the sketch below)
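As a sketch, exact Bayes amounts to filtering the training data down to the records that match the new record on every predictor and reading off the class proportions. The function name and list-of-dicts data layout are assumptions:

```python
from collections import Counter

def exact_bayes(train, new_record, class_col):
    """Exact Bayes: class proportions among training records that match
    new_record on every predictor value."""
    matches = [r for r in train
               if all(r[k] == v for k, v in new_record.items())]
    if not matches:
        return {}  # the practical problem: often there is no exact match
    counts = Counter(r[class_col] for r in matches)
    return {c: n / len(matches) for c, n in counts.items()}
```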

Solution – Modified Naïve Bayes

- Assume independence of the predictor variables (within each class)
- Use the multiplication rule
- Find the same probability that the record belongs to class C, given its predictor values, without limiting the calculation to the records that share all of those values

Modified Bayes Calculations

1. Take a record, and note its predictor values
2. Find the probabilities of those predictor values occurring across all records in class C1
3. Multiply them together, then multiply by the proportion of records belonging to C1
4. Repeat for C2, C3, etc.
5. The probability of belonging to C1 is the value from step 3 divided by the sum of all such values for C1 … Cn
6. Establish and adjust a "cutoff" probability for the class of interest
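A minimal sketch of steps 1–5 in Python; the function name and data layout are assumptions, and step 6 (the cutoff) is left to the caller:

```python
from collections import defaultdict

def naive_bayes_probs(train, new_record, class_col):
    """Steps 1-5: per class, multiply the proportions of the new record's
    predictor values, weight by the class proportion, and normalize."""
    by_class = defaultdict(list)
    for r in train:                       # group the training records by class
        by_class[r[class_col]].append(r)

    scores = {}
    for c, rows in by_class.items():
        score = len(rows) / len(train)    # proportion of records in class c
        for k, v in new_record.items():   # steps 2-3: multiply P(x_k = v | c)
            score *= sum(r[k] == v for r in rows) / len(rows)
        scores[c] = score

    total = sum(scores.values())          # step 5: divide by the sum over classes
    return {c: s / total for c, s in scores.items()}
```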

Example: Financial Fraud

Target variable: audit finds "fraud" or "no fraud"
Predictors:
- Prior pending legal charges (yes/no)
- Size of firm (small/large)

Exact Bayes Calculations

- Goal: classify (as "fraudulent" or "truthful") a small firm with charges filed
- There are 2 firms like that in the dataset: one fraudulent, the other truthful
- There is no "majority" class, so: P(fraud | prior charges = y, size = small) = 1/2 = 0.50
- Note: the calculation is limited to the two firms matching those exact characteristics
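The proportions quoted on these slides pin down a consistent 10-firm toy dataset. The exact rows below are a reconstruction (an assumption), but any dataset with the same proportions gives the same answers; feeding it to the `exact_bayes` sketch above reproduces the 0.50:

```python
# A 10-firm reconstruction consistent with the slides' proportions:
# 4 frauds (3 with prior charges, 1 small), 6 truthfuls (1 with prior
# charges, 4 small), and exactly 2 small firms with charges filed.
firms = [
    {"prior": "y", "size": "small", "verdict": "fraud"},
    {"prior": "y", "size": "large", "verdict": "fraud"},
    {"prior": "y", "size": "large", "verdict": "fraud"},
    {"prior": "n", "size": "large", "verdict": "fraud"},
    {"prior": "y", "size": "small", "verdict": "truthful"},
    {"prior": "n", "size": "small", "verdict": "truthful"},
    {"prior": "n", "size": "small", "verdict": "truthful"},
    {"prior": "n", "size": "small", "verdict": "truthful"},
    {"prior": "n", "size": "large", "verdict": "truthful"},
    {"prior": "n", "size": "large", "verdict": "truthful"},
]

print(exact_bayes(firms, {"prior": "y", "size": "small"}, "verdict"))
# -> {'fraud': 0.5, 'truthful': 0.5}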

Modified Naïve Bayes Calculations

The formula (p. 151):

\[
P(C_i \mid x_1, \ldots, x_p) = \frac{P(x_1, \ldots, x_p \mid C_i)\, P(C_i)}{P(x_1, \ldots, x_p \mid C_1)\, P(C_1) + \cdots + P(x_1, \ldots, x_p \mid C_m)\, P(C_m)}
\]

In words:

P(fraud | prior charges & small firm) =
(proportion of "prior charges," "small," & "fraud" records to total records)
divided by
{same} + (proportion of "prior charges," "small," & "truthful" records to total records)

Modified Naïve Bayes Calculations

Same goal as before. Compute 2 quantities:

- Proportion of "prior charges = y" among frauds, times proportion of "small" among frauds, times proportion of frauds: 3/4 × 1/4 × 4/10 = 0.075
- Proportion of "prior charges = y" among truthfuls, times proportion of "small" among truthfuls, times proportion of truthfuls: 1/6 × 4/6 × 6/10 = 0.067

P(fraud | prior charges = y, size = small) = 0.075 / (0.075 + 0.067) = 0.53
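Running the `naive_bayes_probs` sketch on the reconstructed `firms` dataset from above reproduces these numbers:

```python
probs = naive_bayes_probs(firms, {"prior": "y", "size": "small"}, "verdict")
print(round(probs["fraud"], 2))  # 0.53  (= 0.075 / (0.075 + 0.067))
```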

Modified Naïve Bayes, cont.

- Note that the probability estimate (0.53) does not differ greatly from the exact result of 0.50
- But ALL records are used in the calculation, not just those matching the predictor values
- This makes the calculation practical in most circumstances
- Relies on the assumption of independence between predictor variables, within each class

Independence Assumption

- Not strictly justified (variables are often correlated with one another)
- Often "good enough"

Advantages

- Handles purely categorical data well
- Works well with very large data sets
- Simple and computationally efficient

Shortcomings

- Requires a large number of records
- Problematic when a predictor category is not present in the training data: the classifier assigns zero probability to the response, ignoring the information in the other variables (a common remedy is sketched below)
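A standard remedy, not covered on this slide, is Laplace smoothing: add a small pseudocount so that a category unseen in a class contributes a small nonzero probability instead of zeroing out the whole product. A minimal sketch with assumed names:

```python
def smoothed_conditional(rows, key, value, n_categories, alpha=1.0):
    """P(x_key = value | class) with Laplace smoothing: a category unseen
    in this class still gets alpha / (len(rows) + alpha * n_categories)."""
    count = sum(r[key] == value for r in rows)
    return (count + alpha) / (len(rows) + alpha * n_categories)

# E.g., if "medium" firms never appeared among the frauds (hypothetical
# fraud_rows with 4 records and a 3-level size variable):
#   smoothed_conditional(fraud_rows, "size", "medium", n_categories=3)
#   -> (0 + 1) / (4 + 1 * 3) ≈ 0.14 instead of 0
```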

On the other hand…

- The rank ordering of records by probability is more accurate than the probability estimates themselves
- Good for applications using lift (e.g., response to a mailing), less so for applications requiring exact probabilities (e.g., credit scoring)

Summary

- No statistical models involved
- Naïve Bayes (like k-NN) pays attention to complex interactions and local structure
- Computational challenges remain