Data Mining Lecture 11

Course Syllabus Classification Techniques (Week 7 - Week 8 - Week 9): Inductive Learning, Decision Tree Learning, Association Rules, Neural Networks, Regression, Probabilistic Reasoning, Bayesian Learning. Case Study 4: Working on and experiencing the properties of the classification infrastructure of the Propensity Score Card System for Retail Banking (Assignment 4), Week 9.

Bayesian Learning Bayes theorem is the cornerstone of Bayesian learning methods because it provides a way to calculate the posterior probability P(h|D) from the prior probability P(h), together with P(D) and P(D|h).
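(The equation on this slide was not captured in the transcript; the standard statement of the theorem, as in Mitchell, Chapter 6, is:)
$$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$$
Here P(h) is the prior probability of hypothesis h, P(D) the prior probability of observing data D, P(D|h) the likelihood of D given h, and P(h|D) the posterior probability of h given D.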

Bayesian Learning We are interested in finding the most probable hypothesis h ∈ H given the observed data D (or at least one of the maximally probable ones, if there are several). Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis. We can determine the MAP hypotheses by using Bayes theorem to calculate the posterior probability of each candidate hypothesis. More precisely, we say that h_MAP is a MAP hypothesis provided (in the last line we drop the term P(D) because it is a constant independent of h):
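(The derivation referred to above was not captured; it is presumably the standard one:)
$$h_{MAP} \equiv \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$$
If, in addition, every hypothesis in H is assumed equally probable a priori, the P(h) term can also be dropped, giving the maximum likelihood (ML) hypothesis $h_{ML} = \arg\max_{h \in H} P(D \mid h)$.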

Bayesian Learning

Probability Rules
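(The slide body was not captured; the basic probability rules usually listed at this point, as in Mitchell, Chapter 6, are:)
$$P(A \wedge B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \quad \text{(product rule)}$$
$$P(A \vee B) = P(A) + P(B) - P(A \wedge B) \quad \text{(sum rule)}$$
$$P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i) \quad \text{(total probability, for mutually exclusive } A_1,\dots,A_n \text{ with } \textstyle\sum_i P(A_i) = 1\text{)}$$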

Bayesian Theorem and Concept Learning

Bayesian Theorem and Concept Learning Here let us choose the prior P(h) and the likelihood P(D|h) to be consistent with the following assumptions. Assumptions 2 and 3 imply that:
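(The assumption list and the equation were not captured; in Mitchell's brute-force MAP analysis the assumptions are: 1. the training data D is noise free, 2. the target concept c is contained in H, and 3. there is no a priori reason to prefer any hypothesis over another. Assumptions 2 and 3 then give a uniform prior:)
$$P(h) = \frac{1}{|H|} \quad \text{for all } h \in H$$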

Bayesian Theorem and Concept Learning Here let us choose P(h) and P(D|h) to be consistent with the following assumptions. Assumption 1 implies that:
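(Again assuming the standard brute-force MAP setting: the noise-free-data assumption fixes the likelihood, and the resulting posterior concentrates on the version space VS_{H,D}, the set of hypotheses consistent with D:)
$$P(D \mid h) = \begin{cases} 1 & \text{if } d_i = h(x_i) \text{ for all } d_i \in D \\ 0 & \text{otherwise} \end{cases}$$
$$P(h \mid D) = \begin{cases} \dfrac{1}{|VS_{H,D}|} & \text{if } h \text{ is consistent with } D \\ 0 & \text{otherwise} \end{cases}$$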

Bayesian Theorem and Concept Learning

Bayesian Theorem and Concept Learning

Bayesian Theorem and Concept Learning

Bayesian Theorem and Concept Learning

Bayesian Theorem and Concept Learning A straightforward Bayesian analysis shows that, under certain assumptions, any learning algorithm that minimizes the squared error between the output hypothesis predictions and the training data will output a maximum likelihood hypothesis. The significance of this result is that it provides a Bayesian justification (under those assumptions) for the many neural network and other curve-fitting methods that attempt to minimize the sum of squared errors over the training data.
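(The derivation on the following slides was not captured; the standard result, assuming target values $d_i = f(x_i) + e_i$ with zero-mean Gaussian noise $e_i$, is:)
$$h_{ML} = \arg\max_{h \in H} p(D \mid h) = \arg\min_{h \in H} \sum_{i=1}^{m} \bigl(d_i - h(x_i)\bigr)^2$$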

Bayesian Theorem and Concept Learning

Bayesian Theorem and Concept Learning Normal Distribution
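(The density presumably shown on this slide is the usual Normal (Gaussian) distribution:)
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$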

Bayesian Theorem and Concept Learning

Bayesian Theorem and Concept Learning Cross Entropy. Note the similarity between the above equation and the general form of the entropy function.
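(The equations here were not captured; for a Boolean-valued target with probabilistic predictions, the maximum likelihood hypothesis maximizes the cross-entropy term, and the general entropy function it resembles is shown below:)
$$h_{ML} = \arg\max_{h \in H} \sum_{i=1}^{m} d_i \ln h(x_i) + (1 - d_i) \ln\bigl(1 - h(x_i)\bigr)$$
$$\text{Entropy:} \quad -\sum_{i} p_i \log_2 p_i$$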

Gradient Search to Maximize Likelihood in a Neural Net

Gradient Search to Maximize Likelihood in a Neural Net Cross Entropy Rule Backpropagation Rule
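(The two update rules were not captured in the transcript; they are presumably the ones from Mitchell, Section 6.6: maximizing the cross-entropy likelihood for a sigmoid-output network uses the weight update
$$w_{jk} \leftarrow w_{jk} + \eta \sum_{i=1}^{m} \bigl(d_i - h(x_i)\bigr)\, x_{ijk}$$
whereas the standard backpropagation rule for minimizing the sum of squared errors contains an additional $h(x_i)\bigl(1 - h(x_i)\bigr)$ factor inside the sum.)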

Minimum Description Length Principle
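(The slide equations were not captured; the principle is usually stated as choosing the hypothesis that minimizes the total description length of the hypothesis plus the data given the hypothesis, under encodings $C_1$ and $C_2$:)
$$h_{MDL} = \arg\min_{h \in H}\; L_{C_1}(h) + L_{C_2}(D \mid h)$$
This can be read as a restatement of the MAP criterion, since $h_{MAP} = \arg\min_{h}\,\bigl(-\log_2 P(D \mid h) - \log_2 P(h)\bigr)$ and optimal codes assign code lengths $-\log_2 P(\cdot)$.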

Minimum Description Length Principle

Minimum Description Length Principle

Bayes Optimal Classifier So far we have considered the question "what is the most probable hypothesis given the training data?" In fact, the question that is often of most significance is the closely related question "what is the most probable classification of the new instance given the training data?" Although it may seem that this second question can be answered by simply applying the MAP hypothesis to the new instance, in fact it is possible to do better.
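(The classification rule on the following slides was not captured; it is the standard Bayes optimal classification, which combines the predictions of all hypotheses weighted by their posterior probabilities:)
$$\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D)$$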

Bayes Optimal Classifier

Bayes Optimal Classifier

Gibbs Algorithm Surprisingly, it can be shown that under certain conditions the expected misclassification error for the Gibbs algorithm is at most twice the expected error of the Bayes optimal classifier
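(For context, the Gibbs algorithm itself: 1. choose a hypothesis h from H at random, according to the posterior probability distribution over H; 2. use h to predict the classification of the next instance x. The error bound above assumes the target concepts are drawn at random according to the learner's prior.)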

Naive Bayes Classifier
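(The defining equations were not captured in the transcript; the standard formulation assumes the attribute values are conditionally independent given the target value, so that)
$$P(a_1, \dots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)$$
$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$$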

Naive Bayes Classifier – An Example New Instance

Naive Bayes Classifier – An Example New Instance

Naive Bayes Classifier – Detailed Look What is wrong with the above formula? What about zero numerator terms (attribute values never observed with a given class), and the multiplication of many small probabilities?
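The usual remedy for zero counts is the m-estimate of probability, and the usual remedy for underflow is to sum log-probabilities rather than multiply probabilities:
$$\hat{P}(a_i \mid v_j) = \frac{n_c + m\,p}{n + m}$$
where n is the number of training examples with class $v_j$, $n_c$ the number of those that also have attribute value $a_i$, p a prior estimate (e.g. uniform over the attribute's values), and m the equivalent sample size. Below is a minimal, purely illustrative Python sketch of both fixes; the data, function names, and parameter choices are hypothetical, not taken from the lecture.
```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples, labels, m=1.0):
    """examples: list of attribute tuples; labels: list of class labels."""
    n = len(examples)
    class_counts = Counter(labels)
    priors = {c: count / n for c, count in class_counts.items()}
    # cond_counts[c][i][v]: number of class-c examples whose i-th attribute equals v
    cond_counts = defaultdict(lambda: defaultdict(Counter))
    attr_values = defaultdict(set)   # distinct values seen for each attribute position
    for x, c in zip(examples, labels):
        for i, v in enumerate(x):
            cond_counts[c][i][v] += 1
            attr_values[i].add(v)
    return priors, cond_counts, class_counts, attr_values, m

def classify(x, model):
    priors, cond_counts, class_counts, attr_values, m = model
    best_class, best_score = None, float("-inf")
    for c in priors:
        # Work in log space: log P(c) + sum_i log P(a_i | c) avoids underflow
        score = math.log(priors[c])
        for i, v in enumerate(x):
            n_c = cond_counts[c][i][v]        # count of (class c, attribute i = v)
            p = 1.0 / len(attr_values[i])     # uniform prior estimate for the m-estimate
            score += math.log((n_c + m * p) / (class_counts[c] + m))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Tiny illustrative usage (made-up data, not from the lecture):
X = [("sunny", "hot"), ("rain", "cool"), ("overcast", "hot"), ("rain", "mild")]
y = ["no", "no", "yes", "yes"]
model = train_naive_bayes(X, y)
print(classify(("sunny", "cool"), model))
```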

Naive Bayes Classifier – Remarks Simple but very effective strategy. Assumes conditional independence between the attributes of an instance; clearly, in most cases this assumption is erroneous. Especially for the text classification task it is powerful. It is an entry point for Bayesian Belief Networks.

Bayesian Belief Networks
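(The defining equation was not captured in the transcript; a Bayesian belief network represents the joint probability over variables $Y_1, \dots, Y_n$ as the factored product)
$$P(y_1, \dots, y_n) = \prod_{i=1}^{n} P\bigl(y_i \mid \mathrm{Parents}(Y_i)\bigr)$$
where Parents(Y_i) denotes the immediate predecessors of $Y_i$ in the network, and each node stores a conditional probability table for $P\bigl(Y_i \mid \mathrm{Parents}(Y_i)\bigr)$.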

Bayesian Belief Networks

Bayesian Belief Networks

Bayesian Belief Networks

Bayesian Belief Networks

Bayesian Belief Networks-Learning Can we devise an effective learning algorithm for Bayesian Belief Networks? Two different parameters we must care about: the network structure, and whether the variables are observable or unobservable. When the network structure is unknown, learning is very difficult. When the network structure is known and all the variables are observable, it is straightforward: just apply the Naive Bayes procedure (estimate the conditional probability table entries from observed frequencies). When the network structure is known but some variables are unobservable, the problem is analogous to learning the weights of the hidden units in an artificial neural network, where the input and output node values are given but the hidden unit values are left unspecified by the training examples.

Bayesian Belief Networks-Gradient Ascent Learning We need a gradient ascent procedure that searches through a space of hypotheses corresponding to the set of all possible entries for the conditional probability tables. The objective function maximized during gradient ascent is the probability P(D|h) of the observed training data D given the hypothesis h. By definition, this corresponds to searching for the maximum likelihood hypothesis for the table entries.
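(The notation on the following slides was not captured; the usual convention is to write $w_{ijk}$ for a single conditional probability table entry, $w_{ijk} = P(Y_i = y_{ij} \mid U_i = u_{ik})$, where $U_i$ denotes the parents of $Y_i$, and to maximize $\ln P(D \mid h)$ with respect to these $w_{ijk}$.)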

Bayesian Belief Networks-Gradient Ascent Learning Let's use the shorthand notation instead of the full conditional probability expression, for clarity.

Bayesian Belief Networks-Gradient Ascent Learning Assuming the training examples d in the data set D are drawn independently, we write this derivative as
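(The derivative and the resulting update rule were not captured; the standard forms are:)
$$\frac{\partial \ln P(D \mid h)}{\partial w_{ijk}} = \sum_{d \in D} \frac{P_h(y_{ij},\, u_{ik} \mid d)}{w_{ijk}}$$
$$w_{ijk} \leftarrow w_{ijk} + \eta \sum_{d \in D} \frac{P_h(y_{ij},\, u_{ik} \mid d)}{w_{ijk}}$$
followed by renormalizing the weights so that $\sum_j w_{ijk} = 1$ and $0 \le w_{ijk} \le 1$ for every i and k.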

Bayesian Belief Networks-Gradient Ascent Learning

Bayesian Belief Networks-Gradient Ascent Learning

Bayesian Belief Networks-Gradient Ascent Learning

EM Algorithm – Basis of Unsupervised Learning Algorithms

EM Algorithm – Basis of Unsupervised Learning Algorithms

EM Algorithm – Basis of Unsupervised Learning Algorithms

EM Algorithm – Basis of Unsupervised Learning Algorithms Step 1 is easy:
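(The Step 1 formula was not captured; for the two-means mixture example used in Mitchell, the E-step computes the expected value of each hidden variable $z_{ij}$, i.e. the probability that example $x_i$ was generated by the j-th Gaussian:)
$$E[z_{ij}] = \frac{e^{-\frac{1}{2\sigma^2}(x_i - \mu_j)^2}}{\sum_{n=1}^{2} e^{-\frac{1}{2\sigma^2}(x_i - \mu_n)^2}}$$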

EM Algorithm – Basis of Unsupervised Learning Algorithms Let's try to understand the formula for Step 2:
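(The Step 2 formula was not captured; it is the M-step, which re-estimates each mean as the expectation-weighted average of the data:)
$$\mu_j \leftarrow \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]}$$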

EM Algorithm – Basis of Unsupervised Learning Algorithms For any function f(z) that is a linear function of z, the following equality holds:
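(The equality itself was not captured; for linear f it is simply that the expectation can be moved inside the function:)
$$E[f(z)] = f\bigl(E[z]\bigr)$$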

EM Algorithm – Basis of Unsupervised Learning Algorithms

EM Algorithm – Basis of Unsupervised Learning Algorithms

End of Lecture Read Chapter 6 of the course text book. Read Chapter 6 of the supplementary text book, "Machine Learning" by Tom Mitchell.