1 Bayesian Learning
2 Bayesian Reasoning
Basic assumption
–The quantities of interest are governed by probability distributions
–These probabilities + observed data ==> reasoning ==> optimal decisions
Significance
–Foundation for algorithms that manipulate probabilities directly, e.g. the naïve Bayes classifier
–Framework for analyzing algorithms that do not handle probabilities explicitly, e.g. cross entropy, the inductive bias of decision trees, the MDL principle
3 Features & Limitations
Features of Bayesian learning
–Each observed training example incrementally increases or decreases the estimated probability of a hypothesis
–Prior knowledge: P(h), P(D|h)
–Applicable to probabilistic prediction
–Prediction by combining multiple hypotheses
Limitations
–Requires initial knowledge of many probabilities
–Significant computational cost
4 Bayes Theorem
Terms
–P(h) : prior probability of h
–P(D) : prior probability that D will be observed
–P(D|h) : probability of observing D given h (likelihood)
–P(h|D) : posterior probability of h, given D
Theorem
–P(h|D) = P(D|h)P(h) / P(D)
Machine learning: the process of finding the most probable hypothesis given the observed data
5 Example: Medical diagnosis
–P(cancer)=0.008, P(~cancer)=0.992
–P(+|cancer)=0.98, P(-|cancer)=0.02
–P(+|~cancer)=0.03, P(-|~cancer)=0.97
–P(cancer|+) ∝ P(+|cancer)P(cancer) = 0.0078
–P(~cancer|+) ∝ P(+|~cancer)P(~cancer) = 0.0298
–h_MAP = ~cancer
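A minimal Python sketch of this calculation, using the slide's probabilities; normalizing the two products gives the actual posterior values.

```python
# Probabilities from the slide (assumed values for the diagnosis scenario).
p_cancer, p_not_cancer = 0.008, 0.992
p_pos_given_cancer, p_pos_given_not_cancer = 0.98, 0.03

# Unnormalized posteriors P(+|h)P(h) after a positive test result.
score_cancer = p_pos_given_cancer * p_cancer              # 0.0078
score_not_cancer = p_pos_given_not_cancer * p_not_cancer  # ~0.0298

# Normalizing gives the true posteriors; h_MAP is whichever score is larger.
total = score_cancer + score_not_cancer
print("P(cancer|+)  =", score_cancer / total)       # ~0.21
print("P(~cancer|+) =", score_not_cancer / total)   # ~0.79
print("h_MAP =", "cancer" if score_cancer > score_not_cancer else "~cancer")
```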
6 MAP hypothesis
Maximum a posteriori (MAP) hypothesis
–h_MAP = argmax_{h∈H} P(h|D) = argmax_{h∈H} P(D|h)P(h)/P(D) = argmax_{h∈H} P(D|h)P(h)
7 ML hypothesis
Maximum likelihood (ML) hypothesis
–basic assumption : every hypothesis is equally probable a priori
–h_ML = argmax_{h∈H} P(D|h)
Basic formula
–P(A ∧ B) = P(A|B)P(B) = P(B|A)P(A)
8 Bayes Theorem and Concept Learning
Brute-force MAP learning
–for each h in H, calculate P(h|D)
–output the h with the highest P(h|D) as h_MAP
Assumptions
–noise-free data D
–the target concept c is contained in the hypothesis space H
–every hypothesis is equally probable a priori
Result: every consistent hypothesis is a MAP hypothesis (see the sketch below)
–P(h|D) = 1/|VS_H,D| if h is consistent with D
–P(h|D) = 0 otherwise
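A minimal sketch of brute-force MAP learning over a tiny hypothetical hypothesis space of threshold classifiers; with a uniform prior and noise-free data, every consistent hypothesis gets posterior 1/|VS_H,D| and every other hypothesis gets 0.

```python
# Hypothetical training data (x, label) and hypothesis space: one threshold
# classifier per candidate threshold t, predicting 1 when x >= t.
data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
hypotheses = {t: (lambda x, t=t: int(x >= t)) for t in (0.5, 1.5, 2.5, 3.5, 4.5)}

def consistent(h):
    return all(h(x) == y for x, y in data)

# Noise-free data: P(D|h) is 1 if h is consistent with D, else 0.  With a
# uniform prior this gives P(h|D) = 1/|VS_H,D| for every consistent h.
version_space = [t for t, h in hypotheses.items() if consistent(h)]
posterior = {t: (1.0 / len(version_space) if t in version_space else 0.0)
             for t in hypotheses}
print(posterior)   # only the threshold(s) consistent with D get nonzero mass
```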
10 Consistent learner
Definition: a learning algorithm that outputs a hypothesis committing zero errors over the training examples
Result
–every hypothesis output by a consistent learner is a MAP hypothesis, provided that
–there is a uniform prior probability distribution over H
–the training data are deterministic and noise-free
11 ML and LSE hypothesis
Least-squared-error hypothesis
–NN, curve fitting, linear regression
–continuous-valued target function
Task: learn f when the observed targets are d_i = f(x_i) + e_i, with e_i Gaussian noise
Preliminaries
–probability densities, Normal distribution
–the target values d_i are mutually independent given h
Result: h_ML = argmin_{h∈H} Σ_i (d_i - h(x_i))², i.e. the ML hypothesis minimizes the sum of squared errors (see the sketch below)
Limitation: noise only in the target value
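A minimal sketch of this result under its assumptions: targets d_i = f(x_i) + e_i with Gaussian noise e_i, where the hypothetical target f is a straight line; the least-squares fit is then the ML hypothesis.

```python
import numpy as np

# Hypothetical target f(x) = 2x + 1 observed with Gaussian noise e_i.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
d = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# The ML hypothesis minimizes the sum of squared errors: a least-squares fit.
A = np.stack([x, np.ones_like(x)], axis=1)
w, *_ = np.linalg.lstsq(A, d, rcond=None)
print("fitted slope, intercept:", w)                  # close to (2.0, 1.0)
print("sum of squared errors:", np.sum((A @ w - d) ** 2))
```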
13 ML hypothesis for predicting probabilities
Task: learn a nondeterministic target, i.e. find g with g(x) = P(f(x)=1)
Question: what criterion should we optimize in order to find an ML hypothesis for g?
Result: maximize the cross-entropy term
–h_ML = argmax_{h∈H} Σ_i d_i ln h(x_i) + (1 - d_i) ln(1 - h(x_i))
–cf. the entropy function -Σ_i p_i log2 p_i
15 Gradient search to ML in NN
Let G(h,D) be the cross entropy defined on the previous slide
Maximize G(h,D) by gradient ascent (cf. backpropagation)
–weight update rule: w_jk ← w_jk + η Σ_i (d_i - h(x_i)) x_ijk  (see the sketch below)
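A minimal sketch of gradient ascent on the cross entropy G(h,D) for a single sigmoid unit; the data set, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

# Hypothetical training data with 0/1 targets d_i.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
d = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, eta = np.zeros(2), 0.1
for _ in range(200):
    h = sigmoid(X @ w)
    # Gradient of the cross entropy:  dG/dw = sum_i (d_i - h(x_i)) x_i,
    # so gradient *ascent* adds eta times this sum to the weights.
    w += eta * X.T @ (d - h)

print("learned weights:", w)
```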
17 MDL principle
Purpose: interpret inductive bias and the MDL principle from a Bayesian viewpoint
Shannon and Weaver's optimal code length: -log2 P(i) bits to encode message i
–h_MDL = argmin_{h∈H} L_C1(h) + L_C2(D|h), which equals h_MAP under the optimal encodings
18 Bayes optimal classifier
Motivation: the classification of a new instance is made optimal by combining the predictions of all hypotheses
Task: find the most probable classification of the new instance given the training data
Answer: combine the predictions of all hypotheses, weighted by their posterior probabilities
Bayes optimal classification
–argmax_{v_j∈V} Σ_{h_i∈H} P(v_j|h_i) P(h_i|D)
Limitation: significant computational cost ==> Gibbs algorithm
19 Bayes optimal classifier example
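A minimal sketch of Bayes optimal classification with illustrative posteriors (assumed values): h1 predicts + with P(h1|D)=0.4, while h2 and h3 each predict - with posterior 0.3.

```python
# Assumed posteriors and per-hypothesis predictions, for illustration only.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictions = {"h1": "+", "h2": "-", "h3": "-"}

# P(v|D) = sum_h P(v|h) P(h|D); here P(v|h) is 1 for h's own prediction.
scores = {}
for h, p_h in posteriors.items():
    v = predictions[h]
    scores[v] = scores.get(v, 0.0) + p_h

print(scores)                                                        # {'+': 0.4, '-': 0.6}
print("Bayes optimal classification:", max(scores, key=scores.get))  # '-'
```

Note that the Bayes optimal classification (-) can differ from the prediction of the single MAP hypothesis (h1, which predicts +).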
20 Gibbs algorithm
Algorithm (see the sketch below)
–1. Choose h from H at random, according to the posterior probability distribution over H
–2. Use h to predict the classification of the new instance x
Usefulness of the Gibbs algorithm
–Haussler, 1994
–E[error(Gibbs algorithm)] ≤ 2 · E[error(Bayes optimal classifier)]
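A minimal sketch of the two steps above, reusing the same illustrative posteriors (assumed values); the single sampled hypothesis makes the prediction on its own.

```python
import random

random.seed(0)
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}    # assumed posterior P(h|D)
predictions = {"h1": "+", "h2": "-", "h3": "-"}   # each hypothesis's prediction for x

# Step 1: choose h at random according to the posterior distribution over H.
h = random.choices(list(posteriors), weights=list(posteriors.values()))[0]
# Step 2: use h alone to predict the classification of x.
print("sampled hypothesis:", h, "-> prediction:", predictions[h])
```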
21 Naïve Bayes classifier
Classification rule: v_NB = argmax_{v_j∈V} P(v_j) Π_i P(a_i|v_j)
Difference from other learners
–no explicit search through H
–probabilities are estimated by counting frequencies in the training examples
m-estimate of probability = (n_c + m·p) / (n + m)  (see the sketch below)
–m : equivalent sample size, p : prior estimate of the probability
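A minimal sketch of the m-estimate; the numbers plugged in below (a uniform prior p = 0.5 and m = 1) are illustrative assumptions.

```python
def m_estimate(n_c, n, p, m):
    """Smoothed estimate (n_c + m*p) / (n + m): n_c occurrences out of n
    examples, pulled toward the prior estimate p by an equivalent sample
    size of m virtual examples."""
    return (n_c + m * p) / (n + m)

# e.g. P(wind=strong | PlayTennis=yes) with n_c=3, n=9, uniform prior p=0.5, m=1
print(m_estimate(3, 9, 0.5, 1))   # 0.35 instead of the raw 3/9 = 0.33
```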
22 Example
New instance: (outlook=sunny, temperature=cool, humidity=high, wind=strong)
P(wind=strong|PlayTennis=yes) = 3/9 = .33
P(wind=strong|PlayTennis=no) = 3/5 = .60
P(yes)P(sunny|yes)P(cool|yes)P(high|yes)P(strong|yes) = .0053
P(no)P(sunny|no)P(cool|no)P(high|no)P(strong|no) = .0206
v_NB = no
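A minimal sketch of this computation; the wind probabilities are from the slide, and the remaining conditionals are the usual PlayTennis frequency estimates (assumed here, since the slide shows only the final products).

```python
# Priors and conditional probabilities for the instance (sunny, cool, high,
# strong); the wind values are from the slide, the rest are the standard
# PlayTennis counts (assumed).
p_yes = {"prior": 9/14, "sunny": 2/9, "cool": 3/9, "high": 3/9, "strong": 3/9}
p_no  = {"prior": 5/14, "sunny": 3/5, "cool": 1/5, "high": 4/5, "strong": 3/5}

def nb_score(table):
    score = table["prior"]
    for attribute in ("sunny", "cool", "high", "strong"):
        score *= table[attribute]
    return score

print("yes:", round(nb_score(p_yes), 4))   # ~0.0053
print("no: ", round(nb_score(p_no), 4))    # ~0.0206
print("v_NB =", "yes" if nb_score(p_yes) > nb_score(p_no) else "no")
```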
23 Bayes Belief Networks
Definition
–describes the joint probability distribution for a set of variables
–does not require all variables to be conditionally independent
–expresses partial dependencies among the variables as probabilities
Representation (see the sketch below)
–P(y_1, ..., y_n) = Π_i P(y_i | Parents(Y_i))
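A minimal sketch of the factorization for a small hypothetical network in which Rain and Sprinkler are independent parents of WetGrass; all conditional probability table values are assumptions for illustration.

```python
# Hypothetical conditional probability tables.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: 0.1, False: 0.9}
p_wet_given = {(True, True): 0.99, (True, False): 0.90,   # P(WetGrass | Rain, Sprinkler)
               (False, True): 0.80, (False, False): 0.05}

def joint(rain, sprinkler, wet):
    """P(Rain, Sprinkler, WetGrass) = P(Rain) P(Sprinkler) P(WetGrass | Rain, Sprinkler)."""
    p_w = p_wet_given[(rain, sprinkler)]
    return p_rain[rain] * p_sprinkler[sprinkler] * (p_w if wet else 1.0 - p_w)

print(joint(rain=True, sprinkler=False, wet=True))   # 0.2 * 0.9 * 0.9 = 0.162
```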
24 Bayesian Belief Networks
25 Inference
Task: infer the probability distribution for the target variables
Methods
–exact inference : NP-hard
–approximate inference : theoretically still NP-hard, but practically useful, e.g. Monte Carlo methods (see the sketch below)
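A minimal sketch of one Monte Carlo method, rejection sampling, on a hypothetical two-node network Rain -> WetGrass; the probabilities and sample count are assumptions.

```python
import random

random.seed(0)
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}   # P(WetGrass | Rain)

def sample_world():
    rain = random.random() < P_RAIN
    wet = random.random() < P_WET_GIVEN_RAIN[rain]
    return rain, wet

# Estimate P(Rain | WetGrass) by keeping only samples that match the evidence.
kept, rain_and_wet = 0, 0
for _ in range(100_000):
    rain, wet = sample_world()
    if wet:
        kept += 1
        rain_and_wet += rain

print("P(Rain | WetGrass) ~", rain_and_wet / kept)   # exact value: 0.18/0.26 ~ 0.69
```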
26 Learning
Settings
–structure known + fully observable data
easy: estimate the conditional probability tables as in the naïve Bayes classifier
–structure known + partially observable data
gradient ascent procedure (Russell et al., 1995)
analogous to searching for the ML hypothesis, i.e. maximizing P(D|h)
–structure unknown (next slide)
27 Learning (2)
Structure unknown
–Bayesian scoring metric (Cooper and Herskovits, 1992)
–K2 algorithm (Cooper and Herskovits, 1992)
heuristic greedy search
assumes fully observed data
–constraint-based approach (Spirtes et al., 1993)
infer dependence and independence relationships
construct the structure from these relationships
29 EM algorithm
EM : Expectation-Maximization
Setting
–learning in the presence of unobserved variables
–the form of the probability distribution is known
Applications
–training Bayesian belief networks
–training radial basis function networks
–basis for many unsupervised clustering algorithms
–basis for the Baum-Welch forward-backward algorithm
30 K-means algorithm
Setting: data are generated at random from a mixture of k Normal distributions
Task: find the mean value of each distribution
Each full instance is <x_i, z_i1, ..., z_ik>, where z_ij indicates which distribution generated x_i
–if z is known : use the ML estimate, the sample mean of each distribution's points
–else : use the EM algorithm
31 K-means algorithm
–Initialize h = <μ_1, ..., μ_k>
–Step 1: calculate E[z_ij] = exp(-(x_i - μ_j)²/2σ²) / Σ_{n=1..k} exp(-(x_i - μ_n)²/2σ²)
–Step 2: calculate a new ML hypothesis μ_j ← Σ_i E[z_ij] x_i / Σ_i E[z_ij]
–Repeat ==> converges to a local ML hypothesis (see the sketch below)
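A minimal sketch of this loop for k = 2 Normal distributions with known, equal variance; the data, initial means, and iteration count are assumptions.

```python
import numpy as np

# Hypothetical data drawn from two Normal distributions with means 0 and 5.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])

mu = np.array([1.0, 4.0])    # initial hypothesis h = <mu_1, mu_2>
sigma2 = 1.0                 # known variance

for _ in range(50):
    # Step 1 (E): E[z_ij] proportional to exp(-(x_i - mu_j)^2 / (2 sigma^2)).
    w = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2.0 * sigma2))
    w /= w.sum(axis=1, keepdims=True)
    # Step 2 (M): each new mean is the E[z]-weighted average of the data.
    mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)

print("estimated means:", mu)   # close to the true (0.0, 5.0)
```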
32 General statement of the EM algorithm
Terms
–θ : parameters of the underlying probability distribution
–X : observed data drawn from the distributions
–Z : unobserved data
–Y = X ∪ Z : the full data
–h : current hypothesis of θ
–h' : revised hypothesis
Task: estimate θ from X
33 Guideline
–Search for the h' that maximizes E[ln P(Y|h')], taking the expectation over the distribution of Y governed by the current hypothesis h and the observed X
–Calculate the function Q(h'|h) = E[ln P(Y|h') | h, X]
34 EM algorithm
–Estimation (E) step: calculate Q(h'|h) = E[ln P(Y|h') | h, X] using the current hypothesis h and the observed data X
–Maximization (M) step: replace h by the h' that maximizes Q, i.e. h ← argmax_{h'} Q(h'|h)
–Repeat ==> converges to a local maximum