Learning Bayesian Networks (presentation transcript)

1 Learning Bayesian Networks
CSE 473

2 Last Time
Basic notions
Bayesian networks
Statistical learning
Parameter learning (MAP, ML, Bayesian L)
Naïve Bayes
Issues
Structure Learning
Expectation Maximization (EM)
Dynamic Bayesian networks (DBNs)
Markov decision processes (MDPs)
© Daniel S. Weld

3 … Simple Spam Model
[Network diagram: Spam is the parent of the attribute nodes Nigeria, Sex, Nude]
Naïve Bayes assumes the attributes are independent given the class (the parent node). Correct?
© Daniel S. Weld

4 Naïve Bayes Incorrect, but Easy!
P(spam | X1 … Xn) = ∏_i P(spam | Xi)
How compute P(spam | Xi)?
How compute P(Xi | spam)?
© Daniel S. Weld
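A minimal sketch of the naïve Bayes combination the slide is pointing at, written (as the last question suggests) in terms of the class prior and the per-attribute likelihoods P(Xi | class) and then normalized over the two classes. All numbers and function names here are hypothetical illustrations, not from the slides.

```python
def naive_bayes_spam_posterior(p_spam, p_x_given_spam, p_x_given_ham):
    """Combine the class prior with P(Xi | class) for every attribute,
    then normalize over the two classes (spam vs. not spam)."""
    spam, ham = p_spam, 1.0 - p_spam
    for ps, ph in zip(p_x_given_spam, p_x_given_ham):
        spam *= ps
        ham *= ph
    return spam / (spam + ham)

# Hypothetical likelihoods for a three-attribute message:
print(naive_bayes_spam_posterior(0.4, [0.8, 0.6, 0.7], [0.1, 0.3, 0.2]))  # ~0.97
```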

5 Smooth with a Prior
P(Xi | S) = (#(Xi ∧ S) + m·p) / (#S + m)
Note that if m = 10 in the above, it is like saying "I have seen 10 samples that make me believe that P(Xi | S) = p"
Hence, m is referred to as the equivalent sample size
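A minimal sketch of the m-estimate above. The counts and the prior p below are made-up numbers for illustration; only the formula comes from the slide.

```python
def m_estimate(count_xi_and_s, count_s, p, m):
    """m-estimate of P(Xi | S): observed counts blended with a prior guess p
    that is worth m virtual samples (the equivalent sample size)."""
    return (count_xi_and_s + m * p) / (count_s + m)

# e.g. 3 of 20 spam messages contain attribute Xi; prior guess p = 0.1 with m = 10:
print(m_estimate(3, 20, p=0.1, m=10))   # (3 + 1) / (20 + 10) ≈ 0.133
```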

6 Probabilities: Important Detail!
P(spam | X1 … Xn) = ∏_i P(spam | Xi)
Is there any potential problem here?
We are multiplying lots of small numbers. Danger of underflow!
0.5^57 ≈ 7 × 10^-18
Solution? Use logs and add!
p1 · p2 = e^(log(p1) + log(p2))
Always keep in log form
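A small sketch of the log trick, assuming Python; the 57 factors of 0.5 mirror the slide's example.

```python
import math

probs = [0.5] * 57                 # 57 small factors, as in the slide's example

product = 1.0
for p in probs:
    product *= p                   # ~7e-18 here; long enough products eventually underflow

log_sum = sum(math.log(p) for p in probs)   # add logs instead of multiplying
print(product, math.exp(log_sum))           # both ~7e-18; keep the log form throughout
```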

7 P(S | X)
Easy to compute from data if X is discrete
[Table: five training instances with a boolean attribute X and Spam? labels]
P(S | X) = ¼, ignoring smoothing…
© Daniel S. Weld

8 P(S | X)
What if X is real valued?
[Table: five training instances with real-valued X between 0.01 and 0.05 and Spam? labels, annotated with <T / >T for a candidate threshold T]
What now?
© Daniel S. Weld

9 Anything Else?
[Histogram: counts of the training instances over attribute values .01–.05]
P(S | X = .04)?
© Daniel S. Weld

10 Smooth with Gaussian then sum “Kernel Density Estimation”
[Plot: a Gaussian kernel placed on each training instance and summed into a smooth density over X]
© Daniel S. Weld
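A minimal kernel-density sketch of the idea above. The table's labels for some instances are not fully recoverable from the transcript, so the data points and the bandwidth sigma below are hypothetical stand-ins.

```python
import math

# Hypothetical training data: (attribute value, spam?)
data = [(0.01, False), (0.015, False), (0.02, False), (0.03, True), (0.05, True)]

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def kde(x, points, sigma=0.005):
    """Kernel density estimate: put a Gaussian bump on every point, then average."""
    return sum(gaussian(x, xi, sigma) for xi in points) / len(points)

spam_x = [x for x, s in data if s]
ham_x  = [x for x, s in data if not s]
x = 0.04
print(kde(x, spam_x), kde(x, ham_x))   # compare the two class densities at x
```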

11 Spam?
[Plot: the two smoothed class densities, evaluated at X = .023 to read off P(S | X = .023)]
What's with the shape?
© Daniel S. Weld

12 Analysis
[Plot: the resulting estimates shown as a function of the attribute value]
© Daniel S. Weld

13 Recap
Given a BN structure (with discrete or continuous variables), we can learn the parameters of the conditional probability tables.
[Network diagrams: the Earthquake/Burglary/Alarm/N1/N2 network and the Spam/Nigeria/Sex/Nude network]
© Daniel S. Weld

14 What if we don’t know structure?

15 Outline
Statistical learning
Parameter learning (MAP, ML, Bayesian L)
Naïve Bayes
Issues
Structure Learning
Expectation Maximization (EM)
Dynamic Bayesian networks (DBNs)
Markov decision processes (MDPs)
© Daniel S. Weld

16 Learning The Structure of Bayesian Networks
Search through the space of possible network structures! (For now, assume we observe all variables.)
For each structure, learn parameters
Pick the one that fits observed data best
Caveat: won't we end up fully connected?
When scoring, add a penalty proportional to model complexity (one concrete option is sketched below)
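The slide does not commit to a particular penalty; one common choice is the BIC score, a sketch of which follows (log-likelihood of the data minus a term that grows with the number of free parameters). The example numbers are hypothetical.

```python
import math

def bic_score(log_likelihood, num_free_params, num_samples):
    """Penalized structure score: data fit minus complexity; higher is better.
    A fully connected network fits slightly better but pays a large penalty."""
    return log_likelihood - 0.5 * num_free_params * math.log(num_samples)

# Hypothetical comparison of a sparse vs. a fully connected structure on 1000 samples:
print(bic_score(-5400.0, 25, 1000), bic_score(-5380.0, 300, 1000))  # sparse wins
```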

17 Learning The Structure of Bayesian Networks
Search through the space
For each structure, learn parameters
Pick the one that fits observed data best
Problem? Exponential number of networks! And we need to learn parameters for each!
Exhaustive search is out of the question! So what now?

18 Learning The Structure of Bayesian Networks
Local search!
Start with some network structure
Try to make a change (add, delete, or reverse an edge; see the sketch after this slide)
See if the new network is any better
What should be the initial state?
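A minimal sketch of the local search loop described above. The `neighbors` function (all single-edge additions, deletions, and reversals) and the `score` function (fit parameters, then compute a penalized score) are placeholders to be supplied; they are not defined on the slides.

```python
def hill_climb(initial_structure, neighbors, score):
    """Greedy local search over structures: repeatedly apply the single edge
    change that most improves the score, and stop when no change helps."""
    current, current_score = initial_structure, score(initial_structure)
    while True:
        best, best_score = None, current_score
        for candidate in neighbors(current):      # add / delete / reverse one edge
            s = score(candidate)                  # learn parameters, then score
            if s > best_score:
                best, best_score = candidate, s
        if best is None:
            return current                        # local optimum reached
        current, current_score = best, best_score
```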

19 Initial Network Structure?
Uniform prior over random networks?
Network which reflects expert knowledge?

20 Learning BN Structure © Daniel S. Weld

21 The Big Picture
We described how to do MAP (and ML) learning of a Bayes net (including structure)
How would Bayesian learning (of BNs) differ?
Find all possible networks
Calculate their posteriors
When doing inference, return a weighted combination of predictions from all networks!
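As a small sketch of that weighted combination (the `predict` method and the way posteriors are obtained are hypothetical here):

```python
def bayesian_model_average(networks, posteriors, query):
    """Weight each candidate network's prediction for the query by the
    posterior probability of that network, then sum."""
    return sum(p * net.predict(query) for net, p in zip(networks, posteriors))
```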

22 Outline
Statistical learning
Parameter learning (MAP, ML, Bayesian L)
Naïve Bayes
Issues
Structure Learning
Expectation Maximization (EM)
Dynamic Bayesian networks (DBNs)
Markov decision processes (MDPs)
© Daniel S. Weld

23 Hidden Variables
But we can't observe the disease variable
Can't we learn without it?
© Daniel S. Weld

24 We could…
But we'd get a fully-connected network
With 708 parameters (vs. 78)
Much harder to learn!
© Daniel S. Weld

25 Chicken & Egg Problem
If we knew that a training instance (patient) had the disease…
It would be easy to learn P(symptom | disease)
But we can't observe disease, so we don't.
If we knew the params, e.g. P(symptom | disease)…
Then it'd be easy to estimate if the patient had the disease.
But we don't know these parameters.
© Daniel S. Weld

26 Expectation Maximization (EM) (high-level version)
Pretend we do know the parameters: initialize randomly
[E step] Compute probability of instance having each possible value of the hidden variable
[M step] Treating each instance as fractionally having both values, compute the new parameter values
Iterate until convergence!
© Daniel S. Weld

27 Candy Problem
Given a bowl of mixed candies
Can we learn the contents of each bag, and
What percentage of each is in the bowl?
© Daniel S. Weld

28 Naïve Candy
Flavor: cherry / lime
Wrapper: red / green
Hole: 0 / 1
© Daniel S. Weld

29 Simpler Problem
Mixture of two distributions
Know: the form of the distributions, their variance, and the mixing proportion
Just need the mean of each
© Daniel S. Weld

30 ML Mean of Single Gaussian
u_ML = argmin_u Σ_i (x_i − u)²
© Daniel S. Weld
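Setting the derivative of the squared-error sum to zero gives the sample mean, so the ML estimate is just an average; a one-liner for completeness:

```python
def ml_mean(xs):
    """argmin_u sum_i (x_i - u)^2 is the sample mean of the data."""
    return sum(xs) / len(xs)
```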

31 EM
Randomly initialize: θ, θ_F1, θ_W1, θ_H1, θ_F2, θ_W2, θ_H2
[E Step] In the observable case, we would estimate θ from observed counts of candies in bags 1 and 2
But bag is hidden, so calculate expected amounts instead:
θ^(1) = Σ_j P(bag=1 | flavor_j, wrapper_j, holes_j) / N
Etc. for the other parameters
[M Step] Recompute
Iterate
© Daniel S. Weld

32 Generate Data
Assume the (true) model is θ = 0.5, θ_F1 = θ_W1 = θ_H1 = 0.8, …

             W=red            W=green
             H=1     H=0      H=1     H=0
F=cherry     273     93       104     90
F=lime       79      100      94      167

© Daniel S. Weld
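Below is a minimal sketch of the E and M steps from slide 31 run on the count table above, assuming the naive candy model (flavor, wrapper, and hole independent given the bag). The starting parameter values are arbitrary ("initialize randomly"), and the code structure and names are mine, not the slides'.

```python
# Counts from the table above: (flavor, wrapper, holes) -> number of candies.
counts = {
    ("cherry", "red",   1): 273, ("cherry", "red",   0): 93,
    ("cherry", "green", 1): 104, ("cherry", "green", 0): 90,
    ("lime",   "red",   1): 79,  ("lime",   "red",   0): 100,
    ("lime",   "green", 1): 94,  ("lime",   "green", 0): 167,
}
N = sum(counts.values())   # 1000 candies in the bowl

def lik(params, f, w, h):
    """P(flavor, wrapper, holes | bag) under the naive candy model."""
    pF, pW, pH = params    # P(cherry | bag), P(red | bag), P(hole | bag)
    return ((pF if f == "cherry" else 1 - pF) *
            (pW if w == "red" else 1 - pW) *
            (pH if h == 1 else 1 - pH))

# Arbitrary starting guesses for theta = P(bag 1) and the two bags' parameters.
theta, bag1, bag2 = 0.6, (0.6, 0.6, 0.6), (0.4, 0.4, 0.4)

for _ in range(100):                          # iterate until (near) convergence
    # E step: expected membership in bag 1 for each candy type.
    w1 = {}
    for key in counts:
        p1 = theta * lik(bag1, *key)
        p2 = (1 - theta) * lik(bag2, *key)
        w1[key] = p1 / (p1 + p2)
    w2 = {k: 1.0 - w1[k] for k in counts}

    # M step: re-estimate parameters from the expected (fractional) counts.
    def frac(weights, pred):
        total = sum(weights[k] * counts[k] for k in counts)
        return sum(weights[k] * counts[k] for k in counts if pred(k)) / total

    theta = sum(w1[k] * counts[k] for k in counts) / N
    bag1 = (frac(w1, lambda k: k[0] == "cherry"),
            frac(w1, lambda k: k[1] == "red"),
            frac(w1, lambda k: k[2] == 1))
    bag2 = (frac(w2, lambda k: k[0] == "cherry"),
            frac(w2, lambda k: k[1] == "red"),
            frac(w2, lambda k: k[2] == 1))

print(theta, bag1, bag2)   # estimated mixing weight and per-bag parameters
```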

