Slide 1: Learning Bayesian networks
Slides by Nir Friedman
Slide 2: Learning Bayesian networks
[Figure: an Inducer takes Data + Prior information and outputs a Bayesian network over E, R, B, A, C (the earthquake/burglary/alarm example) together with its conditional probability tables, e.g. P(A | E, B) with entries such as .9/.1, .7/.3, .8/.2, .99/.01 for the four (E, B) configurations.]
Slide 3: Known Structure -- Incomplete Data
[Figure: the Inducer receives data over E, B, A in which some values are missing ("?") and outputs the network with its CPT P(A | E, B).]
- Network structure is specified
- Data contains missing values
  - We consider assignments to the missing values of E, B, A
Slide 4: Known Structure / Complete Data
- Given a network structure G and a choice of parametric family for P(X_i | Pa_i)
- Learn the parameters of the network from complete data
Goal: construct a network that is "closest" to the probability distribution that generated the data.
Slide 5: Maximum Likelihood Estimation in Binomial Data
- Applying the MLE principle we get θ = N_H / (N_H + N_T), which coincides with what one would expect.
- Example: (N_H, N_T) = (3, 2); the MLE estimate is 3/5 = 0.6.
[Plot: the likelihood L(θ : D) as a function of θ over [0, 1].]
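A minimal sketch of this estimate in Python (not from the slides; function and variable names are illustrative): the binomial MLE is just the empirical fraction of heads.

```python
def binomial_mle(n_heads: int, n_tails: int) -> float:
    """MLE of P(heads): the empirical fraction of heads among all tosses."""
    return n_heads / (n_heads + n_tails)

# Matches the slide's example: (N_H, N_T) = (3, 2) gives 3/5 = 0.6.
print(binomial_mle(3, 2))
```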
Slide 6: Learning Parameters for a Bayesian Network
[Network: E → A ← B, A → C]
- Training data has the form D = { ⟨E[1], B[1], A[1], C[1]⟩, …, ⟨E[M], B[M], A[M], C[M]⟩ }
Slide 7: Learning Parameters for a Bayesian Network
- Since we assume i.i.d. samples, the likelihood function is
  L(Θ:D) = ∏_m P(E[m], B[m], A[m], C[m] : Θ)
Slide 8: Learning Parameters for a Bayesian Network
- By the definition of the network, we get
  L(Θ:D) = ∏_m P(E[m] : Θ) · P(B[m] : Θ) · P(A[m] | B[m], E[m] : Θ) · P(C[m] | A[m] : Θ)
Slide 9: Learning Parameters for a Bayesian Network
- Rewriting terms, we get
  L(Θ:D) = [∏_m P(E[m] : Θ)] · [∏_m P(B[m] : Θ)] · [∏_m P(A[m] | B[m], E[m] : Θ)] · [∏_m P(C[m] | A[m] : Θ)]
  i.e., a separate product for each variable in the network.
Slide 10: General Bayesian Networks
Generalizing to any Bayesian network:
  L(Θ:D) = ∏_m P(x_1[m], …, x_n[m] : Θ)           (i.i.d. samples)
         = ∏_m ∏_i P(x_i[m] | pa_i[m] : Θ_i)       (network factorization)
         = ∏_i [ ∏_m P(x_i[m] | pa_i[m] : Θ_i) ]
- The likelihood decomposes according to the structure of the network.
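A minimal sketch of this decomposition in Python (not from the slides; the data layout, the `families` dictionary, and the CPT encoding are illustrative assumptions): for complete data the log-likelihood is a sum over data cases and families of log P(x_i[m] | pa_i[m]).

```python
import math

# Illustrative complete data over the alarm-style network E -> A <- B, A -> C.
data = [
    {"E": 0, "B": 1, "A": 1, "C": 1},
    {"E": 0, "B": 0, "A": 0, "C": 0},
    {"E": 1, "B": 0, "A": 1, "C": 0},
]
families = {"E": [], "B": [], "A": ["E", "B"], "C": ["A"]}  # child -> parent list

def log_likelihood(data, families, cpts):
    """log L(Theta:D) = sum_m sum_i log P(x_i[m] | pa_i[m]); cpts[child][(x, pa_tuple)]."""
    total = 0.0
    for row in data:                                 # product over samples m ...
        for child, parents in families.items():     # ... and over families i
            pa = tuple(row[p] for p in parents)
            total += math.log(cpts[child][(row[child], pa)])
    return total
```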
Slide 11: General Bayesian Networks (cont.)
Complete data ⇒ decomposition ⇒ independent estimation problems.
- If the parameters for each family are not related, then they can be estimated independently of each other (not true in genetic linkage analysis); see the counting sketch below.
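Because the likelihood splits into one term per family, the MLE can be computed by counting each family separately. A minimal counting sketch under the same illustrative assumptions (toy data and the alarm-style structure are not from the slides):

```python
from collections import Counter

data = [
    {"E": 0, "B": 1, "A": 1, "C": 1},
    {"E": 0, "B": 0, "A": 0, "C": 0},
    {"E": 1, "B": 0, "A": 1, "C": 0},
    {"E": 0, "B": 0, "A": 0, "C": 1},
]
families = {"E": [], "B": [], "A": ["E", "B"], "C": ["A"]}  # child -> parent list

def mle_cpts(data, families):
    """Estimate theta_{x_i | pa_i} = N(x_i, pa_i) / N(pa_i) independently for each family."""
    cpts = {}
    for child, parents in families.items():
        joint, marg = Counter(), Counter()        # N(x_i, pa_i) and N(pa_i)
        for row in data:
            pa = tuple(row[p] for p in parents)
            joint[(row[child], pa)] += 1
            marg[pa] += 1
        cpts[child] = {key: n / marg[key[1]] for key, n in joint.items()}
    return cpts

print(mle_cpts(data, families)["A"])   # e.g. P(A=1 | E=0, B=1) = 1.0 on this toy data
```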
Slide 12: Learning Parameters: Summary
- For multinomials we collect sufficient statistics, which are simply the counts N(x_i, pa_i).
- Parameter estimation:
  MLE: θ_{x_i | pa_i} = N(x_i, pa_i) / N(pa_i)
  Bayesian (Dirichlet prior): θ_{x_i | pa_i} = (N(x_i, pa_i) + α(x_i, pa_i)) / (N(pa_i) + α(pa_i))
- Bayesian methods also require a choice of priors.
- Both MLE and Bayesian estimates are asymptotically equivalent and consistent.
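A minimal sketch contrasting the two estimators from the same sufficient statistics (the counts and pseudo-counts below are illustrative choices, not from the slides):

```python
def mle(n_x_pa: float, n_pa: float) -> float:
    """theta = N(x_i, pa_i) / N(pa_i)"""
    return n_x_pa / n_pa

def dirichlet_estimate(n_x_pa: float, n_pa: float, alpha_x_pa: float, alpha_pa: float) -> float:
    """theta = (N(x_i, pa_i) + alpha(x_i, pa_i)) / (N(pa_i) + alpha(pa_i))"""
    return (n_x_pa + alpha_x_pa) / (n_pa + alpha_pa)

# With counts N(x, pa) = 3, N(pa) = 5 and a uniform Dirichlet(1, 1) prior over a binary x:
print(mle(3, 5))                       # 0.6
print(dirichlet_estimate(3, 5, 1, 2))  # ~0.571; the prior pulls the estimate toward 0.5
```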
Slide 13: Known Structure -- Incomplete Data (revisited)
- Network structure is specified
- Data contains missing values
  - We consider assignments to the missing values of E, B, A
Slide 14: Learning Parameters from Incomplete Data
With incomplete data:
- Posterior distributions over the parameters can become interdependent.
- Consequence:
  - ML parameters cannot be computed separately for each multinomial.
  - The posterior is not a product of independent posteriors.
[Figure: plate model with parameters θ_X, θ_{Y|X=H}, θ_{Y|X=T} and data nodes X[m], Y[m].]
Slide 15: Learning Parameters from Incomplete Data (cont.)
- In the presence of incomplete data, the likelihood can have multiple global maxima.
- Example (network H → Y with H hidden):
  - We can rename the values of the hidden variable H.
  - If H has two values, the likelihood has two global maxima.
- Similarly, local maxima are also replicated.
- With many hidden variables this becomes a serious problem.
Slide 16: MLE from Incomplete Data
- Finding the MLE parameters is a nonlinear optimization problem.
- Gradient Ascent: follow the gradient of the likelihood L(Θ | D) with respect to the parameters.
Slide 17: MLE from Incomplete Data
- Finding the MLE parameters is a nonlinear optimization problem.
- Expectation Maximization (EM): use the "current point" to construct an alternative function (which is "nice").
- Guarantee: the maximum of the new function scores better than the current point.
Slide 18: MLE from Incomplete Data
Both ideas:
- Find local maxima only.
- Require multiple restarts to find an approximation to the global maximum.
Slide 19: Gradient Ascent
Main result (Theorem GA):
  ∂ log P(D | Θ) / ∂ θ_{x_i, pa_i} = (1 / θ_{x_i, pa_i}) · Σ_m P(x_i, pa_i | o[m], Θ)
- Requires computing P(x_i, pa_i | o[m], Θ) for all i, m.
- Inference replaces taking derivatives.
Slide 20: Gradient Ascent (cont.)
Proof:
  ∂ log P(D | Θ) / ∂ θ_{x_i, pa_i} = Σ_m ∂ log P(o[m] | Θ) / ∂ θ_{x_i, pa_i}
                                   = Σ_m (1 / P(o[m] | Θ)) · ∂ P(o[m] | Θ) / ∂ θ_{x_i, pa_i}
How do we compute ∂ P(o[m] | Θ) / ∂ θ_{x_i, pa_i}?
Slide 21: Gradient Ascent (cont.)
Since
  P(o | Θ) = Σ_{x_i', pa_i'} P(x_i', pa_i', o | Θ) = Σ_{x_i', pa_i'} P(o | x_i', pa_i', Θ) · P(x_i' | pa_i', Θ) · P(pa_i' | Θ),
and only the factor P(x_i' | pa_i', Θ) = θ_{x_i', pa_i'} depends on θ_{x_i, pa_i}, only the term with x_i' = x_i and pa_i' = pa_i survives differentiation:
  ∂ P(o | Θ) / ∂ θ_{x_i, pa_i} = P(o | x_i, pa_i, Θ) · P(pa_i | Θ) = P(x_i, pa_i, o | Θ) / θ_{x_i, pa_i}
Slide 22: Gradient Ascent (cont.)
- Putting it all together we get
  ∂ log P(D | Θ) / ∂ θ_{x_i, pa_i} = Σ_m P(x_i, pa_i, o[m] | Θ) / (P(o[m] | Θ) · θ_{x_i, pa_i}) = (1 / θ_{x_i, pa_i}) · Σ_m P(x_i, pa_i | o[m], Θ)
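A minimal sketch of how Theorem GA turns the gradient into inference calls (the parameter encoding and the `posterior` helper are assumptions for illustration; `posterior` stands in for any exact inference routine, e.g. variable elimination):

```python
def loglik_gradient(theta, data, posterior):
    """grad[(i, x_i, pa_i)] = sum_m P(x_i, pa_i | o[m], theta) / theta[(i, x_i, pa_i)].

    theta     : dict mapping (i, x_i, pa_i) -> current parameter value
    data      : list of (possibly partial) observations o[m]
    posterior : callable (i, x_i, pa_i, o, theta) -> P(x_i, pa_i | o, theta)
    """
    grad = {}
    for key, value in theta.items():
        i, x_i, pa_i = key
        grad[key] = sum(posterior(i, x_i, pa_i, o, theta) for o in data) / value
    return grad
```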
Slide 23: Expectation Maximization (EM)
- A general-purpose method for learning from incomplete data.
Intuition:
- If we had access to the counts, we could estimate the parameters.
- However, missing values do not allow us to perform the counts.
- "Complete" the counts using the current parameter assignment.
Slide 24: Expectation Maximization (EM)
[Illustration: data over X, Y, Z in which some Y values are missing ("?"), the current model over X, Y, Z, and the resulting expected counts N(X, Y), e.g. 1.3, 0.4, 1.7, 1.6.]
- Missing entries are completed in expectation under the current model, e.g. P(Y=H | X=H, Z=T, Θ) = 0.3 and P(Y=H | X=T, Z=T, Θ) = 0.4.
- These numbers are placed for illustration; they have not been computed.
Slide 25: EM (cont.)
- Initial network (G, Θ_0): hidden variable H with parents X_1, X_2, X_3 and children Y_1, Y_2, Y_3.
- E-step: from the training data and the current network, compute the expected counts N(X_1), N(X_2), N(X_3), N(H, X_1, X_2, X_3), N(Y_1, H), N(Y_2, H), N(Y_3, H).
- M-step: reparameterize to obtain the updated network (G, Θ_1).
- Reiterate.
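A minimal runnable EM sketch for the simplest incomplete-data case, a two-node network X → Y with some Y values missing (the data, starting parameters, and names are illustrative, not from the slides). The E-step completes the counts with posteriors under the current parameters; the M-step re-estimates as if the expected counts were real counts.

```python
def em_x_to_y(data, p_y1_given_x, iters=20):
    """data: list of (x, y) with x in {0, 1} and y in {0, 1} or None (missing).
    p_y1_given_x: dict {x: P(Y=1 | X=x)} used as the starting point Theta_0."""
    theta = dict(p_y1_given_x)
    for _ in range(iters):
        # E-step: expected counts N(Y=1, X=x) and N(X=x). Since Y has no children
        # here, the posterior for a missing Y is just the current CPT entry theta[x].
        exp_y1 = {0: 0.0, 1: 0.0}
        n_x = {0: 0.0, 1: 0.0}
        for x, y in data:
            n_x[x] += 1
            exp_y1[x] += theta[x] if y is None else y
        # M-step: reparameterize using the expected counts as if they were observed.
        theta = {x: exp_y1[x] / n_x[x] for x in (0, 1)}
    return theta

data = [(0, 1), (0, None), (0, 0), (1, 1), (1, None), (1, 1)]
print(em_x_to_y(data, {0: 0.5, 1: 0.5}))  # converges to {0: 0.5, 1: ~1.0}
```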
Slide 26: Expectation Maximization (EM)
- In practice, EM converges quickly at the start but slows down near the (possibly local) maximum.
- Hence, EM is often run for a few iterations and then Gradient Ascent steps are applied.
Slide 27: Final Homework
Question 1: Develop an algorithm that, given a pedigree as input, provides the most probable haplotype of each individual in the pedigree. Use the Bayesian network model of Superlink to formulate the problem exactly as a query. Specify the algorithm at length, discussing as many details as you can. Analyze its efficiency. Devote time to illuminating notation and presentation.
Question 2: Specialize the formula given in Theorem GA for θ in genetic linkage analysis. In particular, assume exactly 3 loci: Marker 1, Disease 2, Marker 3, with θ being the recombination fraction between loci 2 and 1 and 0.1 - θ being the recombination fraction between loci 3 and 2.
1. Specify the formula for a pedigree with two parents and two children.
2. Extend the formula to arbitrary pedigrees. Note that θ is the same in many local probability tables.