
1 Bayesian Learning

2 Bayesian Reasoning
Basic assumption
–The quantities of interest are governed by probability distributions
–These probabilities + observed data ==> reasoning ==> optimal decisions
Significance
–Foundation of algorithms that manipulate probabilities directly, e.g., the naïve Bayes classifier
–A framework for analyzing algorithms that do not manipulate probabilities explicitly, e.g., cross entropy, the inductive bias of decision trees, the MDL principle

3 Features & Limitations
Features of Bayesian learning
–Each observed training example can incrementally increase or decrease the estimated probability of a hypothesis
–Prior knowledge can be incorporated: P(h), P(D|h)
–Hypotheses can make probabilistic predictions
–New instances can be classified by combining the predictions of multiple hypotheses
Limitations
–Requires initial knowledge of many probabilities
–Significant computational cost

4 Bayes Theorem
Terms
–P(h) : prior probability of h
–P(D) : prior probability that D will be observed
–P(D|h) : probability of observing D given h (encodes prior knowledge)
–P(h|D) : posterior probability of h, given D
Theorem
–P(h|D) = P(D|h) P(h) / P(D)
Machine learning: the process of finding the most probable hypothesis given the observed data

5 Example: Medical Diagnosis
–P(cancer) = 0.008, P(~cancer) = 0.992
–P(+|cancer) = 0.98, P(-|cancer) = 0.02
–P(+|~cancer) = 0.03, P(-|~cancer) = 0.97
–P(cancer|+) ∝ P(+|cancer) P(cancer) = 0.98 × 0.008 = 0.0078
–P(~cancer|+) ∝ P(+|~cancer) P(~cancer) = 0.03 × 0.992 = 0.0298
–h_MAP = ~cancer
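The arithmetic on this slide can be checked in a few lines; the sketch below (Python, with purely illustrative variable names) computes the unnormalized posteriors P(+|h)P(h) from the slide's numbers and picks the MAP hypothesis.

```python
# A minimal sketch of the slide's medical-diagnosis example.
priors = {"cancer": 0.008, "no_cancer": 0.992}         # P(h)
likelihood_pos = {"cancer": 0.98, "no_cancer": 0.03}   # P(+ | h)

# Unnormalized posteriors P(+|h) P(h); Bayes' theorem only adds a common factor 1/P(+).
posterior = {h: likelihood_pos[h] * priors[h] for h in priors}
print(posterior)                      # ≈ {'cancer': 0.0078, 'no_cancer': 0.0298}

# MAP hypothesis: the h maximizing P(+|h) P(h)
h_map = max(posterior, key=posterior.get)
print(h_map)                          # 'no_cancer'

# Normalized posterior P(cancer|+), if needed
print(round(posterior["cancer"] / sum(posterior.values()), 3))   # ≈ 0.209
```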

6 MAP Hypothesis
Maximum a posteriori (MAP) hypothesis
–h_MAP = argmax_{h in H} P(h|D) = argmax_{h in H} P(D|h) P(h) / P(D) = argmax_{h in H} P(D|h) P(h)
–P(D) can be dropped because it does not depend on h

7 ML Hypothesis
Maximum likelihood (ML) hypothesis
–Basic assumption: every hypothesis is equally probable a priori, so h_ML = argmax_{h in H} P(D|h)
Basic formula (product rule)
–P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)

8 Bayes Theorem and Concept Learning
Brute-force MAP learning
–For each h in H, calculate P(h|D)
–Output the hypothesis h_MAP with the highest posterior probability
Assumptions
–Noise-free training data D
–The target concept c is contained in the hypothesis space H
–Every hypothesis is equally probable a priori
Result: every hypothesis consistent with D is a MAP hypothesis
–P(h|D) = 1 / |VS_{H,D}| if h is consistent with D (VS_{H,D} is the version space)
–P(h|D) = 0 otherwise
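A brute-force MAP learner is easy to sketch for a tiny, finite hypothesis space. The threshold hypotheses and toy data below are invented for illustration; the point is that, under a uniform prior and noise-free data, every consistent hypothesis ends up with posterior 1/|VS_{H,D}| and every other hypothesis gets 0.

```python
# A minimal sketch of brute-force MAP learning over a tiny finite hypothesis space,
# under the slide's assumptions (noise-free data, uniform prior over H).
def make_threshold(t):
    return lambda x: x >= t

H = {f"x>={t}": make_threshold(t) for t in range(6)}     # illustrative hypothesis space
D = [(1, False), (3, True), (4, True)]                   # illustrative (x, label) data

def consistent(h, data):
    return all(h(x) == y for x, y in data)

version_space = [name for name, h in H.items() if consistent(h, D)]

# P(h|D) = 1/|VS_{H,D}| for consistent h, 0 otherwise
posterior = {name: (1 / len(version_space) if name in version_space else 0.0)
             for name in H}
print(version_space)   # ['x>=2', 'x>=3']
print(posterior)       # each consistent hypothesis gets 0.5, the rest 0.0
```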

10 Consistent Learner
Definition: a learning algorithm that outputs a hypothesis committing zero errors over the training examples
Result: every hypothesis output by a consistent learner is a MAP hypothesis, provided
–there is a uniform prior probability distribution over H, and
–the training data are deterministic and noise-free

11 ML and Least-Squared-Error Hypothesis
Least-squared-error hypothesis
–Setting: neural network training, curve fitting, linear regression
–Continuous-valued target function
Task: learn f from training values d_i = f(x_i) + e_i, where e_i is zero-mean Gaussian noise
Preliminaries
–Probability densities, the Normal distribution
–Target values are generated independently
Result
–h_ML = argmin_{h in H} sum_i (d_i - h(x_i))^2
Limitation: assumes noise only in the target values, not in the attributes
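A short numerical sketch of this equivalence, assuming zero-mean Gaussian noise and an illustrative class of linear hypotheses: the least-squares fit is the ML fit because the negative log-likelihood is, up to constants, proportional to the sum of squared errors.

```python
import numpy as np

# A minimal sketch: under zero-mean Gaussian noise e_i, maximizing the likelihood of
# d_i = f(x_i) + e_i is equivalent to minimizing the sum of squared errors.
# The linear hypothesis class d = a*x + b is an illustrative choice.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
d = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)   # noisy targets

# Least-squares fit == ML fit under the Gaussian-noise assumption
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, d, rcond=None)
print(a, b)   # close to the true 2.0 and 1.0

# Any hypothesis with lower squared error also has higher likelihood.
sse = np.sum((d - (a * x + b)) ** 2)
print(sse)
```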

13 ML Hypothesis for Predicting Probabilities
Task: learn a nondeterministic target, g(x) = P(f(x) = 1)
Question: what criterion should we optimize in order to find an ML hypothesis for g?
Result: maximize the cross-entropy criterion
–h_ML = argmax_{h in H} sum_i [ d_i ln h(x_i) + (1 - d_i) ln(1 - h(x_i)) ]
–The negative of this quantity is the cross entropy, a generalization of the entropy function -sum_i p_i log2 p_i

15 Gradient Search for the ML Hypothesis in Neural Networks
Let G(h, D) be the cross-entropy criterion above; backpropagation can be modified to maximize it
By gradient ascent (for a sigmoid output unit)
–w_jk <- w_jk + eta sum_i (d_i - h(x_i)) x_ijk
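The gradient-ascent rule can be sketched for the simplest case, a single sigmoid unit (i.e., logistic regression); the data and learning rate below are illustrative.

```python
import numpy as np

# A minimal sketch of gradient ascent on the cross-entropy criterion G(h, D)
# for a single sigmoid unit.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
d = (X[:, 0] + X[:, 1] > 0).astype(float)      # illustrative binary targets

w = np.zeros(2)
eta = 0.1
for _ in range(500):
    g = sigmoid(X @ w)                          # g(x) = estimated P(d=1|x)
    # Gradient of sum_i [d_i ln g_i + (1-d_i) ln(1-g_i)] w.r.t. w is X^T (d - g)
    w += eta * X.T @ (d - g) / len(d)
print(w)                                        # both weights positive, roughly equal
```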

17 MDL Principle
Goal: interpret the inductive bias of MAP learning in terms of the minimum description length (MDL) principle
Shannon and Weaver: the optimal code assigns -log2 p(i) bits to a message of probability p(i)
–h_MAP = argmax_h P(D|h) P(h) = argmin_h [ -log2 P(D|h) - log2 P(h) ]
–h_MDL = argmin_h [ L_C1(h) + L_C2(D|h) ], i.e., minimize the description length of the hypothesis plus the description length of the data given the hypothesis

18 Bayes Optimal Classifier
Motivation: the classification of a new instance can be optimized by combining the predictions of all hypotheses
Task: find the most probable classification of the new instance given the training data
Answer: combine the predictions of all hypotheses, weighted by their posterior probabilities
Bayes optimal classification
–argmax_{v_j in V} sum_{h_i in H} P(v_j|h_i) P(h_i|D)
Limitation: significant computational cost ==> Gibbs algorithm
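A sketch of the combination rule, using three hypothetical hypotheses with posteriors 0.4, 0.3, and 0.3 (values invented to mirror the kind of example usually shown on the next slide): although the single MAP hypothesis predicts +, the Bayes optimal classification is -.

```python
# A minimal sketch of Bayes optimal classification: combine the predictions of all
# hypotheses, weighted by their posteriors. Numbers are illustrative.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}            # P(h|D)
predictions = {"h1": "+", "h2": "-", "h3": "-"}           # each h's classification of x

def bayes_optimal(posteriors, predictions, labels=("+", "-")):
    # argmax over v of sum_h P(v|h) P(h|D); here P(v|h) is 1 if h predicts v, else 0
    scores = {v: sum(p for h, p in posteriors.items() if predictions[h] == v)
              for v in labels}
    return max(scores, key=scores.get), scores

print(bayes_optimal(posteriors, predictions))   # ('-', {'+': 0.4, '-': 0.6})
```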

19 Bayes optimal classifier example

20 Gibbs Algorithm
Algorithm
–1. Choose a hypothesis h from H at random, according to the posterior probability distribution over H
–2. Use h to predict the classification of the next instance x
Usefulness of the Gibbs algorithm
–Haussler et al., 1994
–E[error(Gibbs algorithm)] <= 2 × E[error(Bayes optimal classifier)], under certain assumptions about the prior
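For contrast with the Bayes optimal classifier above, a sketch of the Gibbs procedure on the same hypothetical posteriors: sample one hypothesis according to P(h|D) and classify with it alone.

```python
import random

# A minimal sketch of the Gibbs algorithm: instead of averaging over all hypotheses,
# draw a single hypothesis h ~ P(h|D) and use it to classify the instance.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}            # P(h|D), as above
predictions = {"h1": "+", "h2": "-", "h3": "-"}

def gibbs_classify(posteriors, predictions, rng=random):
    hs, ps = zip(*posteriors.items())
    h = rng.choices(hs, weights=ps, k=1)[0]               # sample h according to P(h|D)
    return predictions[h]

print(gibbs_classify(posteriors, predictions))            # '+' about 40% of the time, '-' otherwise
```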

21 Naïve Bayes Classifier
–v_NB = argmax_{v_j in V} P(v_j) prod_i P(a_i|v_j)
Differences from other learners
–No explicit search through H
–The hypothesis is formed simply by counting the frequencies of attribute values in the training examples
m-estimate of probability
–(n_c + m p) / (n + m)
–n: number of training examples with the given class, n_c: number of those with the given attribute value, m: equivalent sample size, p: prior estimate of the probability

22 Example
Classify the new instance (outlook=sunny, temperature=cool, humidity=high, wind=strong)
–P(wind=strong|PlayTennis=yes) = 3/9 = 0.33
–P(wind=strong|PlayTennis=no) = 3/5 = 0.60
–P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes) = 0.0053
–P(no) P(sunny|no) P(cool|no) P(high|no) P(strong|no) = 0.0206
–v_NB = no
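The slide's numbers can be reproduced with a direct implementation of the naïve Bayes rule. The conditional probabilities below are the counting estimates from the standard 14-example PlayTennis table (assumed here; only the P(strong|·) values appear on the slide explicitly).

```python
# A minimal sketch reproducing the slide's naïve Bayes computation for the query
# (outlook=sunny, temperature=cool, humidity=high, wind=strong).
priors = {"yes": 9 / 14, "no": 5 / 14}
cond = {
    "yes": {"sunny": 2 / 9, "cool": 3 / 9, "high": 3 / 9, "strong": 3 / 9},
    "no":  {"sunny": 3 / 5, "cool": 1 / 5, "high": 4 / 5, "strong": 3 / 5},
}
query = ["sunny", "cool", "high", "strong"]

# v_NB = argmax_v P(v) * prod_i P(a_i | v)
scores = {}
for v in priors:
    score = priors[v]
    for a in query:
        score *= cond[v][a]
    scores[v] = score

print({v: round(s, 4) for v, s in scores.items()})   # {'yes': 0.0053, 'no': 0.0206}
print(max(scores, key=scores.get))                    # 'no'
```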

23 Bayesian Belief Networks
Definition
–Describe the joint probability distribution for a set of variables
–Do not require that all variables be conditionally independent of one another (unlike naïve Bayes)
–Express partial dependence relationships among subsets of the variables as conditional probabilities
Representation
–A directed acyclic graph plus a conditional probability table for each variable given its immediate parents; the joint probability is P(y_1, ..., y_n) = prod_i P(y_i | Parents(Y_i))
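A sketch of how the factorization works for a hypothetical three-node chain Storm -> Lightning -> Thunder; the network and its CPT numbers are invented for illustration. The joint probability of any assignment is simply the product of each variable's conditional probability given its parents.

```python
from itertools import product

# A minimal sketch of a Bayesian belief network as a product of local conditionals.
# All probability values are illustrative.
p_storm = {True: 0.1, False: 0.9}                            # P(S)
p_lightning_given_storm = {True: {True: 0.7, False: 0.3},    # P(L | S)
                           False: {True: 0.01, False: 0.99}}
p_thunder_given_lightning = {True: {True: 0.95, False: 0.05},  # P(T | L)
                             False: {True: 0.02, False: 0.98}}

def joint(s, l, t):
    # P(S, L, T) = P(S) P(L|S) P(T|L): each variable conditioned only on its parents
    return p_storm[s] * p_lightning_given_storm[s][l] * p_thunder_given_lightning[l][t]

total = sum(joint(s, l, t) for s, l, t in product([True, False], repeat=3))
print(round(total, 10))          # 1.0 -- the factorization defines a valid distribution
print(joint(True, True, True))   # P(storm, lightning, thunder) = 0.1 * 0.7 * 0.95
```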

24 Bayesian Belief Networks

25 Inference
Task: infer the probability distribution of the target variable(s) given the observed values of the other variables
Methods
–Exact inference: NP-hard in general
–Approximate inference: also NP-hard in theory, but useful in practice (e.g., Monte Carlo methods)

26 Learning
Settings
–Structure known + fully observable data: easy; estimate the conditional probability tables as in the naïve Bayes classifier
–Structure known + partially observable data: gradient ascent procedure (Russell et al., 1995), analogous to searching for the ML hypothesis that maximizes P(D|h)
–Structure unknown: see the next slide

27 Learning (2)
Structure unknown
–Bayesian scoring metric (Cooper & Herskovits, 1992)
–K2 algorithm (Cooper & Herskovits, 1992)
  heuristic greedy search
  requires fully observed data
–Constraint-based approach (Spirtes et al., 1993)
  infer dependency and independency relationships from the data
  construct the network structure from these relationships

29 EM Algorithm
EM: Expectation-Maximization (an estimation step followed by a maximization step)
Setting
–Learning in the presence of unobserved variables
–The form of the probability distribution is known
Applications
–Training Bayesian belief networks
–Training radial basis function networks
–Basis for many unsupervised clustering algorithms
–Basis for the Baum-Welch forward-backward algorithm for learning hidden Markov models

30 K-Means Algorithm
Setting: the data are generated at random from a mixture of k Normal distributions
Task: find the mean value of each distribution
For each instance x_i:
–If the generating distribution z_i is known: estimate each mean by the sample mean of the instances it generated (the ML estimate)
–Otherwise: use the EM algorithm

31 K-Means Algorithm (2)
–Initialize the hypothesis h = (mu_1, ..., mu_k) arbitrarily
–Calculate E[z_ij], the expected probability that instance x_i was generated by the j-th distribution, under the current hypothesis:
  E[z_ij] = exp(-(x_i - mu_j)^2 / 2 sigma^2) / sum_{n=1..k} exp(-(x_i - mu_n)^2 / 2 sigma^2)
–Calculate a new ML hypothesis: mu_j <- sum_i E[z_ij] x_i / sum_i E[z_ij]
–Iterating these two steps converges to a local ML hypothesis
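A sketch of this two-step procedure for a mixture of two Gaussians with known, equal variance; the data, k, and sigma below are illustrative. The first step computes E[z_ij], the second recomputes the means as weighted sample means.

```python
import numpy as np

# A minimal sketch of the EM procedure for k Gaussians with known, equal variance.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(5.0, 1.0, 100)])
k, sigma2 = 2, 1.0
mu = rng.choice(x, size=k, replace=False)          # initial hypothesis h = (mu_1, ..., mu_k)

for _ in range(50):
    # E-step: E[z_ij] = P(x_i was generated by distribution j), given the current means
    dens = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma2))
    z = dens / dens.sum(axis=1, keepdims=True)
    # M-step: new ML means, weighted by the expected memberships
    mu = (z * x[:, None]).sum(axis=0) / z.sum(axis=0)

print(np.sort(mu))    # approximately [0., 5.] -- a local ML hypothesis
```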

32 General Statement of the EM Algorithm
Terms
–Θ : the parameters of the underlying probability distribution
–X : the observed data
–Z : the unobserved (hidden) data
–Y = X ∪ Z : the full data
–h : the current hypothesis for Θ
–h' : the revised hypothesis
Task: estimate Θ from X

33 Guideline
Search for the h' that maximizes E[ln P(Y|h')]
–Y is only partially observed, so take the expectation over Z using the distribution implied by the current hypothesis h (treated as the true Θ) and the observed data X
–Define the function Q(h'|h) = E[ ln P(Y|h') | h, X ]

34 EM Algorithm
Estimation (E) step: using the current hypothesis h and the observed data X, calculate Q
–Q(h'|h) <- E[ ln P(Y|h') | h, X ]
Maximization (M) step: replace h by the h' that maximizes Q
–h <- argmax_{h'} Q(h'|h)
Iterating the two steps converges to a local maximum of the likelihood