
Slide 1: Fast Exact Bayes Net Structure Learning
Daniel Eaton
Tuesday, Oct 31, 2006
("fast" relatively speaking)

Slide 2: What structure learning is
Structure learning (SL) maps data over variables (here A and B) into a space of models: DAGs over those variables. For two nodes there are three DAGs (A and B disconnected, A -> B, A <- B), and we can either optimize over the space or integrate over it, e.g. obtaining a posterior p(G1) = .1, p(G2) = .2, p(G3) = .7 across the three DAGs. Parameters are handled with a conjugate model, e.g. dirichlet-multinomial.

Slide 3: What SL is for this talk
Have complete and exchangeable data
– M cases, N variables
– Continuous or discrete
Model space = DAGs with a CPD for each node (e.g. multinomial or linear-Gaussian), plus a prior on the parameters p(θ|G) and on DAGs p(G)

Slide 4: What SL is for this talk (cont.)
To determine the posterior over DAGs given data:
– For each DAG G, compute the marginal likelihood p(data|G) by integrating out the parameters θ; this is easy for conjugate-prior models, e.g. dirichlet-multinomial
– Then p(G|data) ∝ p(data|G)·p(G)
Unfortunately...
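As a concrete illustration of the conjugate computation above, here is a minimal sketch (not the talk's code; the function and variable names are hypothetical) that scores each 2-node DAG from slide 2 with the dirichlet-multinomial marginal likelihood and normalizes under a uniform p(G):

```python
# Sketch: dirichlet-multinomial marginal likelihood p(data | G) for binary
# variables, and the posterior over the three 2-node DAGs. alpha=1 is an
# illustrative choice of Dirichlet hyperparameter.
import numpy as np
from itertools import product
from scipy.special import gammaln

def family_loglik(data, child, parents, alpha=1.0, arity=2):
    """log p(x_child | x_parents) with a Dirichlet(alpha) prior per parent config."""
    ll = 0.0
    for config in product(range(arity), repeat=len(parents)):
        mask = (np.all(data[:, parents] == config, axis=1)
                if parents else np.ones(len(data), bool))
        counts = np.bincount(data[mask, child], minlength=arity)
        ll += gammaln(arity * alpha) - gammaln(arity * alpha + counts.sum())
        ll += np.sum(gammaln(alpha + counts) - gammaln(alpha))
    return ll

rng = np.random.default_rng(0)
A = rng.integers(0, 2, 500)
B = (A ^ (rng.random(500) < 0.1)).astype(int)   # B is a noisy copy of A
data = np.column_stack([A, B])

# The three DAGs on {A, B}, as (parents of A, parents of B).
dags = {"A  B": ([], []), "A->B": ([], [0]), "B->A": ([1], [])}
logml = {name: family_loglik(data, 0, pa) + family_loglik(data, 1, pb)
         for name, (pa, pb) in dags.items()}
z = np.logaddexp.reduce(list(logml.values()))
for name, l in logml.items():
    print(name, np.exp(l - z))   # p(G | data) under a uniform p(G)
```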

Slide 5: Why SL is hard
The number of DAGs, G(n), is super-exponential in the number of nodes:

n   G(n)
1   1
2   3
3   25
4   543
5   29,281
6   3,781,503
7   1.1e9
...

Just the 7-node DAGs:
– Even with a tight graph representation (1 bit/edge), storing them takes 6.3 GB
– Storing a posterior probability for each takes 8.2 GB
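The G(n) column can be reproduced with Robinson's recurrence for the number of labeled DAGs; a small sketch:

```python
# Robinson's recurrence: a(n) = sum_k (-1)^(k+1) C(n,k) 2^(k(n-k)) a(n-k).
from math import comb

def num_dags(n, memo={0: 1}):        # mutable-default memo: fine for a sketch
    if n not in memo:
        memo[n] = sum((-1) ** (k + 1) * comb(n, k)
                      * 2 ** (k * (n - k)) * num_dags(n - k)
                      for k in range(1, n + 1))
    return memo[n]

for n in range(1, 8):
    print(n, num_dags(n))   # 1, 3, 25, 543, 29281, 3781503, 1138779265
```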

Slide 6: Possible resolutions
Random-walk MCMC on DAG space
– The "posterior landscape" is too big and bumpy = bad mixing; doesn't work for N > 10
Never represent DAGs explicitly: condition on node orderings (Buntine 1991)
– Compute marginal probabilities of "graph features"
– Friedman & Koller 2003 -> MCMC over orders
– Koivisto & Sood 2004 -> exact!

Slide 7: Graph features
A graph feature is just an indicator function f(G) on DAGs: 1 iff a particular structure (e.g. an edge) exists in the graph, 0 otherwise. Suppose N = 3 and we want the marginal probability that the edge A -> B exists: the slide's three example graphs have f_ab = 1, 0, 1 according to whether they contain that edge, and the answer is the sum of f_ab(G)·p(G|data) over all DAGs. Of course, naively, this sum is difficult for n > 5 (a brute-force version is sketched below)...
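A minimal sketch of the naive computation, enumerating all 25 three-node DAGs and averaging the indicator; the uniform "posterior" is only a placeholder standing in for p(G|data):

```python
# Enumerate every directed graph on 3 nodes, keep the acyclic ones, and
# compute the marginal probability of the feature f_ab under a placeholder
# posterior.
from itertools import permutations, product

def is_dag(adj, n):
    # Acyclic iff some ordering of the nodes has no backward edges.
    return any(all(not adj[order[j]][order[i]]
                   for i in range(n) for j in range(i + 1, n))
               for order in permutations(range(n)))

dags = []
for bits in product([0, 1], repeat=6):        # 2^6 directed graphs on 3 nodes
    adj = [[0] * 3 for _ in range(3)]
    it = iter(bits)
    for u in range(3):
        for v in range(3):
            if u != v:
                adj[u][v] = next(it)
    if is_dag(adj, 3):
        dags.append(adj)

print(len(dags))                               # 25, matching slide 5's table
posterior = [1 / len(dags)] * len(dags)        # placeholder: uniform p(G | data)
print(sum(p for G, p in zip(dags, posterior) if G[0][1]))   # p(f_ab = 1 | data)
```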

Slide 8: Conditioning on orders (the order-trick)
An order on the variables is just a permutation of the index set
– e.g. for a 3-node graph: (1,2,3), (2,1,3), (3,2,1), ...
– Only N! of these!
For a fixed order, the marginal likelihood factorizes into a product over nodes, each node summing over parent sets drawn from its predecessors in the order.
Intuition: each node can be considered independently of the others, since acyclicity is ensured by the ordering (see the sketch below).
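A sketch of that factorization, assuming a hypothetical table `score[i][parents]` of precomputed family scores p(D_i|G_i)·q_i(G_i) (e.g. filled in by the dirichlet-multinomial code above):

```python
# For a fixed order, the marginal likelihood is a product of per-node sums
# over parent sets restricted to the node's predecessors.
from itertools import combinations

def subsets(s):
    return (frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r))

def order_likelihood(order, score):
    total = 1.0
    for pos, i in enumerate(order):
        preds = order[:pos]                      # parents only from predecessors:
        total *= sum(score[i][pa]                # the order rules out cycles
                     for pa in subsets(preds))
    return total
```

Summing `order_likelihood` over all N! permutations reproduces the naive sum of slide 7; the dynamic program on slides 13-15 is exactly a way to avoid that enumeration.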

Slide 9: Order-trick feature probability
Ultimately we want to determine p(f | data). For now, assume the order is known, so we go for p(f | data, order) = p(f, data | order) / p(data | order)
– we know how to do the denominator (previous slide)
– the numerator depends on the feature f; for f a single edge (A -> B), it's easy
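Reconstructed in symbols (the slide's equation images did not survive; the notation follows Koivisto & Sood):

```latex
p(f \mid D, \prec)
  = \frac{p(f, D \mid \prec)}{p(D \mid \prec)}
  = \frac{\prod_{i=1}^{N} \sum_{G_i \subseteq U_i^{\prec}} f_i(G_i)\, q_i(G_i)\, p(D_i \mid G_i)}
         {\prod_{i=1}^{N} \sum_{G_i \subseteq U_i^{\prec}} q_i(G_i)\, p(D_i \mid G_i)}
```

Here U_i^≺ is the set of predecessors of node i in the order ≺ and q_i is the modular parent-set prior. For the single-edge feature A -> B, f_i ≡ 1 for every node except B, where f_B(G_B) = 1 iff A ∈ G_B, so only node B's factor changes between numerator and denominator.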

Slide 10: How to use the order-trick in practice
Except in very special (perhaps biological) cases, we don't know the node order a priori
– Must sum over N! orders
– Ouch, still super-exponential!
1. Sample orders with MCMC (Friedman & Koller)
– The sampler mixes much better than in DAG space
– Order space is smaller and less bumpy
2. Do it exactly with dynamic programming!
– Koivisto & Sood (2004)

Slide 11: Koivisto
Recognize that although there are N! orders to sum over, there is much redundant computation.
This will let us compute the marginal probability of a particular feature in O(N·2^N + N·2^(N-1)·C(M)) time, where C(M) is the cost of scoring one family on M data cases
– All edge marginal probabilities in O(N^3·2^N) time (naively)
– Or in O(N·2^N) time with a recent (2006) extension that I won't cover today

Slide 12: Koivisto (cont.)
Consider p(f | D) = p(f, D) / p(D): we just need a way to evaluate the joint p(f, D), since the normalizer p(D) can be computed by setting f = 1.

Slide 13: Derivation
To simplify the derivation, assume a uniform prior over orders. Please accept (proof later, if requested) that p(f, D) can be written as a sum over orders of a product of per-node terms; introducing α_i(S) as the sum of node i's term over all parent sets G_i ⊆ S makes each factor depend only on the node and its predecessors. The key: each term is modular.
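The missing equations, reconstructed to match Koivisto & Sood (2004):

```latex
\beta_i(G_i) = f_i(G_i)\, q_i(G_i)\, p(D_i \mid G_i), \qquad
\alpha_i(S) = \sum_{G_i \subseteq S} \beta_i(G_i)

g(S) = \sum_{i \in S} \alpha_i\left(S \setminus \{i\}\right)\, g\left(S \setminus \{i\}\right),
\qquad g(\emptyset) = 1, \qquad p(f, D) \propto g(V)
```

Each β_i depends only on node i and its parent set (modularity), which is what lets the sum over all N! orders collapse into a recursion over subsets.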

Slide 14: From brute force to a DP recurrence
Brute force sums over ordered sets, growing the order one node at a time: (), then (1), (2), (3), then (1,2), (1,3), (2,1), (2,3), (3,1), (3,2), down to the N! full orders (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1): O(N·N!) work. But the terms factor across orders; e.g. the orders (1,2,3) and (2,1,3) contribute
α_1({})·α_2({1})·α_3({1,2}) + α_2({})·α_1({2})·α_3({1,2}) = α_3({1,2}) × ( α_1({})·α_2({1}) + α_2({})·α_1({2}) )
so the shared final factor α_3({1,2}) need only be computed once.

Slide 15: DP
Merging that shared work collapses the tree of ordered sets into the lattice of unordered subsets: {} -> {1}, {2}, {3} -> {1,2}, {1,3}, {2,3} -> {1,2,3}. There are only 2^N subsets, giving O(N·2^N) work; a sketch follows.
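A minimal sketch of that subset DP in Python, with subsets encoded as bitmasks and the alphas assumed precomputed (next slide):

```python
# Subset DP: g(S) = sum over "last" nodes i in S of alpha_i(S \ {i}) g(S \ {i}),
# aggregating every order's contribution without enumerating orders.
def subset_dp(alpha, n):
    g = [0.0] * (1 << n)
    g[0] = 1.0                                  # g(empty set) = 1
    for mask in range(1, 1 << n):
        for i in range(n):
            if mask & (1 << i):                 # treat i as the last node of the
                prev = mask ^ (1 << i)          # order restricted to `mask`
                g[mask] += alpha[i][prev] * g[prev]
    return g[(1 << n) - 1]                      # g(V), proportional to p(f, D)
```

Running this once with the f-weighted alphas and once with the trivial (f = 1) alphas gives the numerator g_f(V) and normalizer g_1(V) used on slide 17.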

Slide 16: Computing the alphas
We assumed the alphas were easy to compute; in fact, for each node i there are 2^N values α_i(·), and evaluating each one as a literal sum over subsets costs O(N·3^N) in total (this subset-sum is a "Möbius transform"). There exists a clever algorithm to compute each α_i in O(2^N) time (so O(N·2^N) overall). Intuition: sweep the subset lattice one coordinate at a time, reusing partial sums; the slide's 2-node and 3-node hypercube diagrams (omitted here) illustrate the precomputation.
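A sketch of that fast zeta ("Möbius") transform pass for a single node; `beta` is indexed by parent-set bitmask, and the loop folds in one element at a time rather than enumerating subsets of subsets:

```python
# Fast subset-sum transform: turn beta_i over parent sets into
# alpha_i(S) = sum of beta_i over all subsets of S.
def fast_zeta(beta, n):
    alpha = list(beta)                 # alpha[mask] starts as beta_i(mask)
    for j in range(n):                 # fold in element j
        for mask in range(1 << n):
            if mask & (1 << j):
                alpha[mask] += alpha[mask ^ (1 << j)]
    return alpha                       # alpha[mask] = sum over submasks of mask
```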

Slide 17: Koivisto summary
1. For each node i and each possible parent set G_i, compute ML(i, G_i), f_i(G_i), and p(G_i) (each an N x 2^N array): O(N·2^(N-1)·C(M))
2. Compute the betas:
a) β_f = ML .* f .* p(G)
b) β_1 = ML .* p(G) (the trivial feature f = 1, used for normalization)
3. Compute the alphas: for each node i, compute α_{f,i}(:) and α_{1,i}(:) (1 x 2^N vectors): O(N·2^N)
4. Compute g_f(V) using α_f and g_1(V) using α_1; the answer is p(f|D) = g_f(V) / g_1(V)
Total: O(N·2^N + N·2^(N-1)·C(M))

Slide 18: Timing

G(n)        n    n·2^n        Timing
1           1    2
3           2    8
25          3    24
543         4    64
29,281      5    160
3,781,503   6    384
1.1e9       7    896          30s
?           15   491,520      30m
??          20   20,971,520   ~day

Slide 19: Order-trick limitations
1. The graph-structure prior must be modular
– Breaks Markov equivalence
– Cannot have arbitrary priors (e.g. a uniform prior over DAGs is impossible)
2. Cannot query arbitrary features (f must be modular)
– e.g. "directed path between nodes A and B"
Resolution: use MCMC with a proposal based on sampling from the marginal edge probabilities
– An independence sampler (see the sketch below)
– Works well for N = 5
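A rough sketch of that independence sampler, in the spirit of the fix-up described here rather than the actual code: propose a DAG by switching each edge on with its exact marginal probability, reject cyclic proposals, and apply a Metropolis-Hastings correction. `log_post` (an unnormalized DAG log-posterior) and `edge_marg[u][v]` are assumed given, with marginals strictly inside (0, 1).

```python
import math, random

def is_dag(adj, n):
    # Kahn's algorithm: keep peeling off in-degree-0 nodes.
    indeg = [sum(adj[u][v] for u in range(n)) for v in range(n)]
    stack, seen = [v for v in range(n) if indeg[v] == 0], 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in range(n):
            if adj[u][v]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    stack.append(v)
    return seen == n

def propose(edge_marg, n):
    # Independence proposal: edges on/off from the exact marginals,
    # rejection-sampled to DAGs (the DAG normalizer cancels in the MH ratio).
    while True:
        adj, logq = [[0] * n for _ in range(n)], 0.0
        for u in range(n):
            for v in range(n):
                if u != v:
                    p = edge_marg[u][v]
                    on = random.random() < p
                    adj[u][v] = int(on)
                    logq += math.log(p if on else 1 - p)
        if is_dag(adj, n):
            return adj, logq

def mh_step(adj, logq, log_post, edge_marg, n):
    adj2, logq2 = propose(edge_marg, n)
    # Accept with min(1, p(G') q(G) / (p(G) q(G'))).
    if math.log(random.random()) < (log_post(adj2) - log_post(adj)) + (logq - logq2):
        return adj2, logq2
    return adj, logq
```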

Slide 20: Improvement by MCMC
[Figure: results on the "cancer" network]

Slide 21: Improvement by MCMC (cont.)
[Figure]

Slide 22: Follow-up
To read: "Exact Bayesian structure learning from uncertain interventions", Kevin & Daniel
– Submitted to AIStats06
To run:
– Code in CVS: Aline/koivisto
– Probably best to wait till Dec to use
Happy Halloween

