Bayesian classifiers

Bayesian Classification: Why?
- Probabilistic learning: calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
- Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities.
- Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured.

Bayes Theorem
Given training data D, the posterior probability of a hypothesis h, P(h|D), follows Bayes' theorem:
  P(h|D) = P(D|h)·P(h) / P(D)
The MAP (maximum a posteriori) hypothesis is the one that maximizes the posterior:
  h_MAP = argmax_h P(h|D) = argmax_h P(D|h)·P(h)
Practical difficulty: requires initial knowledge of many probabilities, and significant computational cost.
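
A minimal sketch of the MAP rule in Python: since P(D) is the same for every hypothesis, it is enough to compare P(D|h)·P(h). The priors and likelihoods below are hypothetical numbers, not values from the slides.

```python
# Minimal sketch of MAP hypothesis selection via Bayes' theorem.
# The priors and likelihoods are made-up illustrative numbers.

def map_hypothesis(priors, likelihoods):
    """Return the hypothesis maximizing P(D|h) * P(h).

    priors:      dict hypothesis -> P(h)
    likelihoods: dict hypothesis -> P(D|h) for the observed data D
    """
    # P(D) is constant across hypotheses, so it can be dropped for the argmax.
    scores = {h: likelihoods[h] * priors[h] for h in priors}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    priors = {"h1": 0.7, "h2": 0.3}        # hypothetical P(h)
    likelihoods = {"h1": 0.2, "h2": 0.9}   # hypothetical P(D|h)
    best, scores = map_hypothesis(priors, likelihoods)
    print(best, scores)                    # 'h2', since 0.3*0.9 > 0.7*0.2
```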

Naïve Bayes Classifier (I)
A simplifying assumption: attributes are conditionally independent given the class:
  P(X|C) = P(x1|C) · P(x2|C) · … · P(xn|C)
This greatly reduces the computational cost: only the class distributions and per-attribute, per-class counts need to be estimated.

Naïve Bayesian Classification
- If the i-th attribute is categorical: P(di|C) is estimated as the relative frequency of samples having value di for the i-th attribute in class C.
- If the i-th attribute is continuous: P(di|C) is estimated through a Gaussian density function.
- Computationally easy in both cases (see the sketch below).
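
A minimal sketch of both estimators, assuming a small made-up sample of attribute values for one class (the function names and data are illustrative, not from the slides):

```python
# Per-attribute estimators used by naive Bayes: relative frequency for
# categorical attributes, Gaussian density for continuous ones.
import math
from collections import Counter

def categorical_prob(value, values_in_class):
    """P(d_i | C) as the relative frequency of `value` within class C."""
    counts = Counter(values_in_class)
    return counts[value] / len(values_in_class)

def gaussian_prob(value, values_in_class):
    """P(d_i | C) from a Gaussian fitted to the attribute values in class C."""
    n = len(values_in_class)
    mean = sum(values_in_class) / n
    var = sum((v - mean) ** 2 for v in values_in_class) / (n - 1)
    return math.exp(-(value - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Example: humidity as a categorical value vs. as a raw measurement.
print(categorical_prob("high", ["high", "high", "normal", "high"]))  # 0.75
print(gaussian_prob(80.0, [85.0, 90.0, 78.0, 96.0]))
```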

Play-tennis example: estimating P(xi|C)

outlook:       P(sunny|p)    = 2/9    P(sunny|n)    = 3/5
               P(overcast|p) = 4/9    P(overcast|n) = 0
               P(rain|p)     = 3/9    P(rain|n)     = 2/5
temperature:   P(hot|p)      = 2/9    P(hot|n)      = 2/5
               P(mild|p)     = 4/9    P(mild|n)     = 2/5
               P(cool|p)     = 3/9    P(cool|n)     = 1/5
humidity:      P(high|p)     = 3/9    P(high|n)     = 4/5
               P(normal|p)   = 6/9    P(normal|n)   = 1/5
windy:         P(true|p)     = 3/9    P(true|n)     = 3/5
               P(false|p)    = 6/9    P(false|n)    = 2/5
class priors:  P(p) = 9/14            P(n) = 5/14

Naïve Bayesian Classifier (II)
Given a training set, we can compute all of these probabilities directly from the attribute-value counts within each class, as in the table above.

Play-tennis example: classifying X
An unseen sample: X = <rain, hot, high, false>
P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
Sample X is classified into class n (don't play).
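
The same computation in a short Python sketch, using the conditional probabilities from the table on the previous slide (only the values needed for X are included):

```python
# Reproduces the play-tennis classification of X = <rain, hot, high, false>.
# Attribute order: <outlook, temperature, humidity, windy>.

cond = {
    "p": {"rain": 3/9, "hot": 2/9, "high": 3/9, "false": 6/9},
    "n": {"rain": 2/5, "hot": 2/5, "high": 4/5, "false": 2/5},
}
prior = {"p": 9/14, "n": 5/14}

x = ["rain", "hot", "high", "false"]

scores = {}
for c in prior:
    score = prior[c]
    for value in x:
        score *= cond[c][value]    # naive independence: multiply P(x_i | C)
    scores[c] = score

print(scores)                      # {'p': ~0.010582, 'n': ~0.018286}
print(max(scores, key=scores.get)) # 'n' -> don't play
```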

The independence hypothesis…
… makes the computation possible,
… yields optimal classifiers when it is satisfied,
… but is seldom satisfied in practice, as attributes (variables) are often correlated.
One attempt to overcome this limitation: Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes.

Bayesian Belief Networks (I)
[Network diagram: nodes FamilyH, Age, Diabetes, Mass, Insulin, Glucose; FamilyH and Age are the parents of Mass.]

The conditional probability table for the variable Mass (M), given FamilyH (FH) and Age (A):

       (FH, A)   (FH, ~A)   (~FH, A)   (~FH, ~A)
M       0.8       0.5        0.7        0.1
~M      0.2       0.5        0.3        0.9
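
A minimal sketch of how this table can be stored and queried, with booleans standing in for FH/~FH and A/~A (the function name is illustrative):

```python
# Conditional probability table for Mass from the slide, indexed by the
# states of its parents (FamilyH, Age).

# P(Mass = M | FamilyH, Age); P(~M | ...) is the complement.
p_mass_given_parents = {
    (True,  True):  0.8,   # (FH, A)
    (True,  False): 0.5,   # (FH, ~A)
    (False, True):  0.7,   # (~FH, A)
    (False, False): 0.1,   # (~FH, ~A)
}

def p_mass(m, family_history, age):
    """Return P(Mass = m | FamilyH = family_history, Age = age)."""
    p = p_mass_given_parents[(family_history, age)]
    return p if m else 1.0 - p

print(p_mass(True, True, False))   # 0.5
print(p_mass(False, False, True))  # 0.3
```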

Applying Bayesian nets
When all but one variable are known, the network can be used to compute the conditional probability of the remaining variable, e.g. P(D|A,F,M,G,I).
From Jiawei Han's slides

Bayesian belief network
Find the joint probability over a set of variables by making use of conditional independence whenever it is known.
[Figure: example network over variables a, b, c, d, e with their conditional probability tables; in the example, variable e is independent of d given b.]
From Jiawei Han's slides
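
A minimal sketch of such a factorization, assuming a hypothetical structure in which b has parents a and d while e has parent b only (so e is independent of d given b); all probability values below are made up for illustration:

```python
# Hypothetical network: a and d are root variables, b has parents (a, d),
# and e has parent b only. The joint then factorizes as
#   P(a, d, b, e) = P(a) * P(d) * P(b | a, d) * P(e | b)
# All numbers are illustrative, not taken from the slide.

p_a = {True: 0.6, False: 0.4}
p_d = {True: 0.3, False: 0.7}
p_b_given_ad = {   # P(b = True | a, d); complement gives b = False
    (True, True): 0.1, (True, False): 0.2,
    (False, True): 0.3, (False, False): 0.4,
}
p_e_given_b = {True: 0.9, False: 0.2}   # P(e = True | b)

def joint(a, d, b, e):
    pb = p_b_given_ad[(a, d)]
    pb = pb if b else 1.0 - pb
    pe = p_e_given_b[b]
    pe = pe if e else 1.0 - pe
    return p_a[a] * p_d[d] * pb * pe

print(joint(True, False, True, True))  # 0.6 * 0.7 * 0.2 * 0.9 = 0.0756
```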

Bayesian Belief Networks (II)
- A Bayesian belief network allows a subset of the variables to be conditionally independent.
- It is a graphical model of causal relationships.
- Several cases of learning Bayesian belief networks:
  - Given both the network structure and all the variables: easy.
  - Given the network structure but only some of the variables: use gradient descent / EM algorithms.
  - When the network structure is not known in advance: learning the structure of the network is harder.

The k-Nearest Neighbor Algorithm
- All instances correspond to points in the n-dimensional space.
- The nearest neighbors are defined in terms of Euclidean distance.
- The target function can be discrete- or real-valued.
- For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to xq.
- Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples.
From Jiawei Han's slides
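
A minimal sketch of discrete-valued k-NN with Euclidean distance and a majority vote; the training points and query below are made up:

```python
# k-NN classifier sketch: Euclidean distance, majority vote among the
# k nearest training examples.
import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(xq, training, k=3):
    """training: list of (point, label); returns the majority label of the
    k points closest to the query point xq."""
    neighbors = sorted(training, key=lambda pl: euclidean(pl[0], xq))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"),
            ((4.0, 4.0), "-"), ((4.2, 3.9), "-"), ((3.8, 4.1), "-")]
print(knn_classify((1.1, 1.0), training, k=3))  # '+'
```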

Discussion on the k-NN Algorithm
- The k-NN algorithm for continuous-valued target functions: return the mean value of the k nearest neighbors.
- Distance-weighted nearest neighbor algorithm: weight the contribution of each of the k neighbors according to its distance to the query point xq, giving greater weight to closer neighbors (e.g. w = 1 / d(xq, xi)^2); the same idea applies to real-valued target functions (see the sketch below).
- Robust to noisy data, since predictions average over the k nearest neighbors.
- Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes. To overcome it, stretch the axes or eliminate the least relevant attributes.
From Jiawei Han's slides
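
A minimal sketch of the distance-weighted variant for a real-valued target, assuming the common 1/d² weighting; the training data are illustrative:

```python
# Distance-weighted k-NN regression: each of the k nearest neighbors
# contributes with weight 1 / d^2.
import math

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_regress(xq, training, k=3):
    """training: list of (point, target value); returns the weighted mean
    of the k nearest targets, with weights 1 / distance^2."""
    neighbors = sorted(training, key=lambda pt: euclidean(pt[0], xq))[:k]
    num, den = 0.0, 0.0
    for point, y in neighbors:
        d = euclidean(point, xq)
        if d == 0.0:
            return y              # exact match: return its target directly
        w = 1.0 / (d * d)
        num += w * y
        den += w
    return num / den

training = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((5.0,), 10.0)]
print(knn_regress((0.9,), training, k=3))  # close to 2.0, dominated by x=1.0
```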