1 Learning Bayesian Network Structure from Massive Datasets: The “Sparse Candidate” Algorithm
Nir Friedman, Iftach Nachman, and Dana Peer
Presenter: Kyu-Baek Hwang

2 Abstract
Learning a Bayesian network can be posed as an optimization problem (the machine-learning view) or as constraint satisfaction (the statistics view).
The space of candidate structures is extremely large, and the search procedure spends most of its time examining highly unreasonable candidate structures.
If the search space can be reduced, faster learning becomes possible: each variable is restricted to a small set of candidate parent variables.
Motivating domain: bioinformatics, where datasets have many variables and many examples.

3 Learning Bayesian Network Structures
Constraint-satisfaction view: test conditional independencies in the data, e.g., with the χ²-test (a sketch of such a check follows below).
Optimization view: score structures with BDe or MDL; learning means finding the structure that maximizes the score.
Search techniques: finding the highest-scoring structure is in general NP-hard, so heuristics such as greedy hill-climbing and simulated annealing are used; each step considers O(n²) local changes.
When both the number of examples and the number of attributes are large, this computational cost makes learning intractable in practice.
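In the constraint-satisfaction view, each candidate independence is checked with a statistical test. A minimal sketch of such a check using SciPy's chi-square test of independence on a contingency table of counts; the table values and the 0.05 threshold are illustrative, not from the slides:

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table of counts for two discrete variables X and Y.
counts = np.array([[30, 10],
                   [12, 48]])

chi2, p_value, dof, expected = chi2_contingency(counts)
# A small p-value rejects independence, i.e., X and Y appear dependent in the data.
print(f"chi2={chi2:.2f}, p={p_value:.4f}, dependent={p_value < 0.05}")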

4 Combining Statistical Properties
Most of the candidates considered during the search can be eliminated in advance based on our statistical understanding of the domain.
If X and Y are almost independent in the data (e.g., their mutual information is low), we may decide not to consider Y as a parent of X.
Restrict the possible parents of each variable to a small candidate set of size k, with k << n - 1 (a sketch of this selection step follows below).
The key idea: use the network structure found in the previous stage to find better candidate parents in the next one.
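A minimal sketch of the initial candidate-selection step: estimate pairwise mutual information from discrete data and keep the k highest-scoring variables as candidate parents of each variable. Function and variable names are illustrative, and the data is assumed to be a 2-D NumPy array of discrete values:

import numpy as np
from collections import Counter
from math import log

def mutual_information(x, y):
    # Empirical mutual information I(X;Y) between two discrete data columns.
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # c/n * log( (c/n) / ((px[a]/n) * (py[b]/n)) )
        mi += (c / n) * log(c * n / (px[a] * py[b]))
    return mi

def candidate_parents(data, k):
    # For each variable i, return the k variables with the highest I(X_i; X_j).
    n_vars = data.shape[1]
    candidates = {}
    for i in range(n_vars):
        scores = [(mutual_information(data[:, i], data[:, j]), j)
                  for j in range(n_vars) if j != i]
        scores.sort(reverse=True)
        candidates[i] = [j for _, j in scores[:k]]
    return candidates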

5 Background
A Bayesian network over X = {X1, X2, …, Xn} is a pair B = <G, Θ>, where G is a DAG and Θ its parameters.
The learning problem: given a training set D = {x1, x2, …, xN} of instances, find the network B that best matches D (scored by BDe or MDL).
The score decomposes by family: Score(G : D) = Σi Score(Xi | Pa(Xi) : N(Xi, Pa(Xi))), where N(Xi, Pa(Xi)) are the sufficient statistics (counts) for Xi and its parents.
Greedy hill-climbing search: at each step, all possible local changes (edge additions, deletions, and reversals) are examined, and the change that yields the largest score gain is applied; a sketch follows below.
Computing the sufficient statistics is the computational bottleneck.
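A minimal sketch of the decomposable scoring and greedy search described above, using a simple MDL-style family score (log-likelihood minus a parameter penalty). This is an illustrative implementation, not the authors' code; it assumes discrete data in a NumPy array and, for brevity, only considers edge additions and omits the acyclicity check:

import numpy as np
from collections import Counter
from math import log

def family_score(data, child, parents):
    # Sufficient statistics are the counts of (parent configuration, child value).
    n = data.shape[0]
    child_col = data[:, child]
    pa_cols = [tuple(row) for row in data[:, parents]] if parents else [()] * n
    joint = Counter(zip(pa_cols, child_col))
    marg = Counter(pa_cols)
    loglik = sum(c * log(c / marg[pa]) for (pa, _), c in joint.items())
    # MDL-style penalty: 0.5 * log(N) per free parameter of the family.
    n_params = (len(set(child_col)) - 1) * max(len(marg), 1)
    return loglik - 0.5 * log(n) * n_params

def greedy_step(data, parents_of):
    # Examine every single-edge addition and apply the one with the largest gain.
    n_vars = data.shape[1]
    best_gain, best_move = 0.0, None
    for child in range(n_vars):
        base = family_score(data, child, parents_of[child])
        for cand in range(n_vars):
            if cand == child or cand in parents_of[child]:
                continue
            gain = family_score(data, child, parents_of[child] + [cand]) - base
            if gain > best_gain:
                best_gain, best_move = gain, (cand, child)
    if best_move:
        cand, child = best_move
        parents_of[child].append(cand)
    return best_move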

6 Simple Intuitions
Use mutual information or correlation: if the true structure is X -> Y -> Z, then I(X;Y) > 0, I(Y;Z) > 0, and I(X;Z) > 0, but I(X;Z|Y) = 0.
Basic idea of the “Sparse Candidate” algorithm: for each variable X, find a set of variables Y1, Y2, …, Yk that are the most promising candidate parents for X; this gives a much smaller search space.
Main drawback of this idea: a mistake in the initial stage can lead to an inferior-scoring network.
Remedy: iterate the basic procedure, using the previously constructed network to reconsider the candidate parents.

7 Outline of the Sparse Candidate Algorithm
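The outline figure from this slide is not reproduced in the transcript. A minimal sketch of the loop it describes, following the paper: alternate a Restrict step (choose candidate parent sets from the data and the current network) with a Maximize step (search for a high-scoring network whose parents come from those sets) until the score stops improving. The helper functions are placeholders for the steps covered on the surrounding slides:

def sparse_candidate(data, k, select_candidates, learn_network, score, initial_network):
    # B_0: an initial network, e.g., the empty graph.
    network = initial_network
    prev_score = float("-inf")
    while True:
        # Restrict: choose a candidate parent set C_i (|C_i| <= k) for each X_i,
        # based on the data and the current network B_{n-1}.
        candidates = select_candidates(data, network, k)
        # Maximize: find a high-scoring network with Pa(X_i) contained in C_i.
        network = learn_network(data, candidates)
        current = score(network, data)
        # Stop when the score no longer improves (see the convergence slide).
        if current <= prev_score:
            return network
        prev_score = current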

8 Convergence Properties of the Sparse Candidate Algorithm
We require that, in the Restrict step, the selected candidates for Xi's parents include Xi's current parents: Pa_Gn(Xi) ⊆ Ci^(n+1).
This requirement implies that the winning network Bn remains a legal structure in iteration n + 1, so the Maximize step can only improve on it: Score(Bn+1 | D) ≥ Score(Bn | D).
Stopping criterion: Score(Bn) = Score(Bn-1).

9 Mutual Information
Empirical mutual information: I(X;Y) = Σx,y P̂(x,y) log [ P̂(x,y) / (P̂(x) P̂(y)) ].
Example (for the small network over A, B, C, D shown on the slide): I(A;C) > I(A;D) > I(A;B), so ranking by mutual information alone may place an indirectly related variable such as D ahead of a weakly dependent direct neighbor such as B; this is what motivates revising the candidate sets in later iterations.

10 Discrepancy Test The initial iteration ranks candidates by mutual information; subsequent iterations use the discrepancy between the data and the current network (a sketch follows below).
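The discrepancy measure compares the empirical joint distribution of a pair of variables with the joint distribution the current network B assigns to them; pairs the network explains badly become candidate parents in the next Restrict step. A minimal sketch, assuming both joints are given as dictionaries mapping value pairs to probabilities (names and data layout are illustrative):

from math import log

def discrepancy(empirical_joint, model_joint):
    # KL divergence D( P_hat(X, Y) || P_B(X, Y) ): how badly the current network B
    # models the observed joint behaviour of the pair (X, Y).
    # Assumes model_joint assigns nonzero probability to every observed pair.
    kl = 0.0
    for xy, p_hat in empirical_joint.items():
        if p_hat > 0.0:
            kl += p_hat * log(p_hat / model_joint[xy])
    return kl

Pairs with the largest discrepancy, together with the current parents (to preserve the monotone-convergence guarantee above), form the next candidate set.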

11 Other Tests
Conditional mutual information: measure I(Xi ; Xj | Pa(Xi)), i.e., how much information Xj adds about Xi beyond Xi's current parents (a sketch follows below).
Score-based measure: penalize structures with more parameters, e.g., by comparing candidate families with an MDL/BDe-style score.
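A minimal sketch of the conditional mutual information test mentioned above, estimated directly from discrete data columns; variable names are illustrative and no smoothing is applied:

from collections import Counter
from math import log

def conditional_mutual_information(x, y, z_cols):
    # Empirical I(X ; Y | Z) for discrete columns; Z is a list of conditioning columns.
    n = len(x)
    z = [tuple(vals) for vals in zip(*z_cols)] if z_cols else [()] * n
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    cmi = 0.0
    for (a, b, c), cnt in n_xyz.items():
        # p(x,y,z) * log [ p(x,y|z) / ( p(x|z) p(y|z) ) ]
        cmi += (cnt / n) * log(cnt * n_z[c] / (n_xz[(a, c)] * n_yz[(b, c)]))
    return cmi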

12 Learning with Small Candidate Sets
Standard heuristics (e.g., hill-climbing):
Unconstrained search: parent sets per variable O(C(n, k)); candidate moves per step O(n²).
Constrained by small candidate sets: parent sets per variable O(2^k); candidate moves per step O(k·n) (a sketch of this follows below).
Divide-and-conquer heuristics: exploit the structure of the candidate graph H (the digraph with an edge Y -> X for every candidate parent Y of X) to decompose the Maximize step, as described on the next slides.
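A minimal sketch of why the constrained step is cheaper: with candidate sets of size k, each variable has only its k candidates to consider as new parents, so a hill-climbing step scans O(k·n) edge additions instead of O(n²). Names are illustrative, and family_gain stands for whatever decomposable score is in use:

def constrained_moves(candidates, parents_of):
    # Enumerate legal edge additions when each variable X_i may only take
    # parents from its candidate set C_i: at most k moves per variable.
    for child, cand_set in candidates.items():
        for cand in cand_set:
            if cand not in parents_of[child]:
                yield (cand, child)

def best_constrained_move(candidates, parents_of, family_gain):
    # O(k * n) move evaluations per step instead of O(n^2).
    return max(constrained_moves(candidates, parents_of),
               key=lambda move: family_gain(move),
               default=None)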

13 Strongly Connected Components
Decomposing the candidate graph H into strongly connected components takes linear time; any cycle of the learned network must lie within a single component, so each component can be handled separately (a sketch follows below).
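A minimal sketch of this decomposition using NetworkX (the library choice is an assumption; the paper does not prescribe one). H is built from the candidate sets, and the SCC decomposition runs in time linear in the size of H:

import networkx as nx

def candidate_graph(candidates):
    # H contains an edge Y -> X for every candidate parent Y of X.
    H = nx.DiGraph()
    for child, cand_set in candidates.items():
        H.add_node(child)
        for cand in cand_set:
            H.add_edge(cand, child)
    return H

def strongly_connected_components(candidates):
    H = candidate_graph(candidates)
    # Linear-time decomposition; cycles of the learned network can only appear
    # inside one component, so each component can be searched independently.
    return list(nx.strongly_connected_components(H))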

14 Separator Decomposition
(Slide figure: a separator S splits the candidate graph into components H1 and H2, with reduced subgraphs H'1 and H'2 and boundary variables such as X and Y.)
The bottleneck is S: we can fix an ordering of the variables in S that disallows any cycle in H1 ∪ H2, and then search the two sides independently.

15 Experiments on Synthetic Data

16 Experiments on Real-Life Data

17 Conclusions Sparse candidate sets enable an efficient search for good structures. A better criterion for selecting candidate parents is still needed. The authors applied these techniques to Spellman's cell-cycle gene-expression data. Exploiting the structure of the candidate graph H during the search still needs improvement.

