Classification: Bayesian Classifiers
Jeff Howbert, Introduction to Machine Learning, Winter 2012



Bayesian classification

- A probabilistic framework for solving classification problems.
  - Used where class assignment is not deterministic, i.e. a particular set of attribute values will sometimes be associated with one class, sometimes with another.
  - Requires estimation of the posterior probability p( Ci | x1, x2, …, xn ) for each class Ci, given a set of attribute values.
  - Decision theory is then used to make predictions for a new sample x.

Bayesian classification

- Conditional probability:
  p( C | A ) = p( A, C ) / p( A )
  p( A | C ) = p( A, C ) / p( C )
- Bayes theorem:
  p( C | A ) = p( A | C ) p( C ) / p( A )
  where p( C | A ) is the posterior probability, p( A | C ) is the likelihood, p( C ) is the prior probability, and p( A ) is the evidence.

Example of Bayes theorem

- Given:
  - A doctor knows that meningitis causes stiff neck 50% of the time.
  - The prior probability of any patient having meningitis is 1/50,000.
  - The prior probability of any patient having stiff neck is 1/20.
- If a patient has a stiff neck, what is the probability he/she has meningitis?
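Applying Bayes theorem to the numbers above gives the answer; the worked calculation is not in the transcript, but it follows directly from the slide's three quantities:

```latex
P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)}
            = \frac{0.5 \times (1/50000)}{1/20}
            = 0.0002
```

So even given a stiff neck, the probability of meningitis is only 0.02%, because the prior probability of meningitis is so small.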

Bayesian classifiers

- Treat each attribute and the class label as random variables.
- Given a sample x with attributes ( x1, x2, …, xn ):
  - The goal is to predict class C.
  - Specifically, we want to find the value of Ci that maximizes p( Ci | x1, x2, …, xn ).
- Can we estimate p( Ci | x1, x2, …, xn ) directly from data?

Bayesian classifiers

Approach:
- Compute the posterior probability p( Ci | x1, x2, …, xn ) for each value of Ci using Bayes theorem:
  p( Ci | x1, x2, …, xn ) = p( x1, x2, …, xn | Ci ) p( Ci ) / p( x1, x2, …, xn )
- Choose the value of Ci that maximizes p( Ci | x1, x2, …, xn ).
- This is equivalent to choosing the value of Ci that maximizes p( x1, x2, …, xn | Ci ) p( Ci ). (We can ignore the denominator; why?)
- It is easy to estimate the priors p( Ci ) from data. (How?)
- The real challenge: how to estimate p( x1, x2, …, xn | Ci )?

Bayesian classifiers

- How to estimate p( x1, x2, …, xn | Ci )?
- In the general case, where the attributes xj have dependencies, this requires estimating the full joint distribution p( x1, x2, …, xn ) for each class Ci.
- There is almost never enough data to confidently make such estimates.

Naïve Bayes classifier

- Assume independence among the attributes xj when the class is given:
  p( x1, x2, …, xn | Ci ) = p( x1 | Ci ) p( x2 | Ci ) … p( xn | Ci )
- It is usually straightforward and practical to estimate p( xj | Ci ) for all xj and Ci.
- A new sample is classified to Ci if p( Ci ) Πj p( xj | Ci ) is maximal.
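A minimal sketch of this decision rule in Python. The priors and conditional probabilities below are taken from the worked example later in the deck ( p( No ) = 7/10, p( Refund = No | No ) = 4/7, and so on ); in practice they would be estimated from training data as described on the next slides:

```python
# Pre-estimated model: class priors and per-attribute conditionals
# p(x_j = value | C_i), stored as plain dictionaries (values taken from the
# deck's worked example; the dictionary layout itself is just one possible choice).
priors = {"No": 7 / 10, "Yes": 3 / 10}
cond = {
    ("Refund", "No"): {"No": 4 / 7, "Yes": 1.0},
    ("Status", "Married"): {"No": 4 / 7, "Yes": 0.0},
}

def score(sample, c):
    """p(C_i) * prod_j p(x_j | C_i); the evidence term is ignored."""
    s = priors[c]
    for attr_value in sample.items():
        s *= cond[attr_value][c]
    return s

def classify(sample):
    """Choose the class whose score is maximal."""
    return max(priors, key=lambda c: score(sample, c))

print(classify({"Refund": "No", "Status": "Married"}))  # -> "No"
```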

How to estimate p( xj | Ci ) from data?

- Class priors: p( Ci ) = Ni / N, where Ni is the number of training samples in class Ci and N is the total number of training samples.
  - p( No ) = 7/10
  - p( Yes ) = 3/10
- For discrete attributes: p( xj | Ci ) = | xji | / Ni, where | xji | is the number of instances in class Ci having attribute value xj.
  - Examples: p( Status = Married | No ) = 4/7, p( Refund = Yes | Yes ) = 0

[The counts refer to the 10-record training table shown on the slide, which is not reproduced in the transcript.]
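A short sketch of these counting estimates in Python, using a small hypothetical list of records in place of the slide's training table:

```python
from collections import Counter, defaultdict

# Hypothetical training records (attribute dict, class label); the slide's
# actual 10-record table is not reproduced in the transcript.
records = [
    ({"Refund": "Yes", "Status": "Single"}, "No"),
    ({"Refund": "No", "Status": "Married"}, "No"),
    ({"Refund": "No", "Status": "Single"}, "Yes"),
]

# Class priors: p(C_i) = N_i / N
class_counts = Counter(label for _, label in records)
N = len(records)
priors = {c: n / N for c, n in class_counts.items()}

# Discrete conditionals: p(x_j = v | C_i) = |x_ji| / N_i
value_counts = defaultdict(Counter)  # key: (attribute, class) -> Counter of values
for attrs, label in records:
    for attr, value in attrs.items():
        value_counts[(attr, label)][value] += 1

def cond_prob(attr, value, c):
    return value_counts[(attr, c)][value] / class_counts[c]

print(priors)                                # {'No': 0.666..., 'Yes': 0.333...}
print(cond_prob("Status", "Married", "No"))  # 0.5 with these toy records
```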

How to estimate p( xj | Ci ) from data?

- For continuous attributes:
  - Discretize the range into bins, then replace the attribute with an ordinal attribute.
  - Two-way split: ( xi < v ) or ( xi ≥ v ), then replace the attribute with a binary attribute.
  - Probability density estimation:
    - Assume the attribute follows some standard parametric probability distribution (usually a Gaussian).
    - Use the data to estimate the parameters of the distribution (e.g. mean and variance).
    - Once the distribution is known, it can be used to estimate the conditional probability p( xj | Ci ).

How to estimate p( xj | Ci ) from data?

- Gaussian distribution:
  p( xj | Ci ) = ( 1 / sqrt( 2π σji² ) ) exp( −( xj − μji )² / ( 2 σji² ) )
  - one distribution (i.e. one pair of parameters μji, σji²) for each ( xj, Ci ) pair
- For ( Income | Class = No ):
  - sample mean = 110
  - sample variance = 2975
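Plugging the slide's parameters into the Gaussian density gives the class-conditional value used in the worked example on the next slide; a quick check in Python (the function name is mine, not from the slides):

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian density used as the class-conditional estimate p(x_j | C_i)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# p(Income = 120K | Class = No), with sample mean 110 and sample variance 2975
print(gaussian_pdf(120, 110, 2975))  # ~0.0072
```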

Example of using naïve Bayes classifier

Given a test record x = ( Refund = No, Status = Married, Income = 120K ):

- p( x | Class = No ) = p( Refund = No | Class = No ) × p( Married | Class = No ) × p( Income = 120K | Class = No )
  = 4/7 × 4/7 × 0.0072 = 0.0024
- p( x | Class = Yes ) = p( Refund = No | Class = Yes ) × p( Married | Class = Yes ) × p( Income = 120K | Class = Yes )
  = 1 × 0 × 1.2 × 10⁻⁹ = 0
- Since p( x | No ) p( No ) > p( x | Yes ) p( Yes ), it follows that p( No | x ) > p( Yes | x ), so the prediction is Class = No.

Naïve Bayes classifier

- Problem: if one of the conditional probabilities is zero, then the entire expression becomes zero.
- This is a significant practical problem, especially when training samples are limited.
- Ways to improve probability estimation:
  - Original estimate: p( xj | Ci ) = | xji | / Ni
  - Laplace estimate: p( xj | Ci ) = ( | xji | + 1 ) / ( Ni + c )
  - m-estimate: p( xj | Ci ) = ( | xji | + m p ) / ( Ni + m )
  where c is the number of classes, p is a prior probability, and m is a parameter.
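A small Python sketch of the two smoothed estimates, applied to the deck's count for p( Refund = Yes | Yes ), which is 0/3 without smoothing (the particular values of m and p below are hypothetical):

```python
def laplace_estimate(n_ij, n_i, c):
    """Laplace estimate of p(x_j | C_i): add 1 to the count and c to the
    denominator so no conditional probability is ever exactly zero."""
    return (n_ij + 1) / (n_i + c)

def m_estimate(n_ij, n_i, m, p):
    """m-estimate: blend the raw count with a prior p, weighted by m."""
    return (n_ij + m * p) / (n_i + m)

# p(Refund = Yes | Yes) with 0 of the 3 "Yes" samples having Refund = Yes:
print(laplace_estimate(0, 3, 2))      # 0.2 instead of 0
print(m_estimate(0, 3, m=3, p=0.1))   # 0.05, with hypothetical m and p
```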

Example of naïve Bayes classifier

- X: attributes
- M: class = mammal
- N: class = non-mammal
- Since p( X | M ) p( M ) > p( X | N ) p( N ), the test record is classified as a mammal.

[The slide's table of animal training data and the test record are not reproduced in the transcript.]

Summary of naïve Bayes

- Robust to isolated noise samples.
- Handles missing values by ignoring the sample during probability estimate calculations.
- Robust to irrelevant attributes.
- NOT robust to redundant attributes.
  - The independence assumption does not hold in this case.
  - Use other techniques such as Bayesian Belief Networks (BBN).