1
Learning Bayesian Network Structure from Massive Datasets: The “Sparse Candidate” Algorithm
Nir Friedman, Iftach Nachman, and Dana Pe'er
Presenter: Kyu-Baek Hwang
2
Abstract
Learning Bayesian networks can be viewed as an optimization problem (the machine-learning view) or as a constraint-satisfaction problem (the statistics view).
The search space is extremely large, and the search procedure spends most of its time examining extremely unreasonable candidate structures.
If we can reduce the search space, faster learning becomes possible: restrict, for each variable, the set of candidate parent variables.
Motivation: bioinformatics applications.
3
Learning Bayesian Network Structures
Constraint-satisfaction view: independence tests such as the χ² test.
Optimization view: scoring metrics such as BDe or MDL; learning means finding the structure that maximizes the score.
Search is generally NP-hard, so heuristics such as greedy hill-climbing or simulated annealing are used, examining O(n²) candidate changes per step.
When both the number of examples and the number of attributes are large, the computational cost becomes too high to obtain results in reasonable time.
4
Combining Statistical Properties
Most of the candidates considered during the search can be eliminated in advance, based on simple statistical properties of the data: if X and Y are almost independent in the data, we might decide not to consider Y as a parent of X.
Mutual information is one such measure.
Restrict the possible parents of each variable to a candidate set of size k, with k << n − 1.
The key idea is to use the network structure found in the previous stage to choose better candidate parents.
5
Background
A Bayesian network over X = {X1, X2, …, Xn} is a pair B = <G, Θ>, where G is a directed acyclic graph over the variables and Θ its parameters.
The problem of learning a Bayesian network: given a training set D of N instances, find a B that best matches D according to a score such as BDe or MDL.
These scores decompose over families: Score(G : D) = Σi Score(Xi | Pa(Xi) : N(Xi, Pa(Xi))), where N(Xi, Pa(Xi)) are the sufficient statistics (counts) for Xi and its parents.
Greedy hill-climbing search: at each step, all possible local changes are examined and the change that brings the maximal gain in score is selected.
Computing the sufficient statistics is the computational bottleneck.
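A minimal sketch of this score-plus-search scheme, assuming discrete data in a NumPy integer array and an MDL-style local score standing in for BDe; the function names and the omission of edge reversals are choices made for this illustration, not the authors' implementation:

```python
import itertools

import numpy as np

def local_score(data, child, parents):
    """MDL-style local score for one family: maximum log-likelihood of the
    child given its parents, minus a penalty for the number of parameters.
    `data` is an (N, n) integer array of discrete values starting at 0."""
    N = data.shape[0]
    child_vals = data[:, child]
    r = int(child_vals.max()) + 1                  # cardinality of the child
    keys = [tuple(row) for row in data[:, parents]] if parents else [()] * N
    counts = {}
    for key, c in zip(keys, child_vals):
        counts.setdefault(key, np.zeros(r))[c] += 1
    loglik, q = 0.0, 0
    for vec in counts.values():                    # one entry per parent config
        q += 1
        nz = vec[vec > 0]
        loglik += float((nz * np.log(nz / vec.sum())).sum())
    return loglik - 0.5 * np.log(N) * q * (r - 1)  # MDL parameter penalty

def _creates_cycle(parents, new_parent, child):
    """Would adding the edge new_parent -> child create a directed cycle?"""
    stack, seen = [new_parent], set()
    while stack:
        v = stack.pop()
        if v == child:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def greedy_hill_climb(data, max_parents=3):
    """Greedy hill-climbing over single-edge additions and deletions
    (reversals omitted for brevity).  Returns {variable: list of parents}."""
    n = data.shape[1]
    parents = {i: [] for i in range(n)}
    score = {i: local_score(data, i, []) for i in range(n)}
    while True:
        best = None
        for child, par in itertools.permutations(range(n), 2):
            if par in parents[child]:              # candidate deletion
                new_pa = [p for p in parents[child] if p != par]
            elif (len(parents[child]) < max_parents
                  and not _creates_cycle(parents, par, child)):
                new_pa = parents[child] + [par]    # candidate addition
            else:
                continue
            gain = local_score(data, child, new_pa) - score[child]
            if gain > 1e-9 and (best is None or gain > best[0]):
                best = (gain, child, new_pa)
        if best is None:                           # no improving move: stop
            return parents
        gain, child, new_pa = best
        parents[child] = new_pa
        score[child] += gain
```

Because the score decomposes over families, each candidate move only requires re-scoring the single family it changes, which is what makes hill-climbing feasible at this scale.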
6
Simple Intuitions Using mutual information or correlation
If the true structure is X → Y → Z, then I(X;Z) > 0, I(Y;Z) > 0, I(X;Y) > 0, while I(X;Z | Y) = 0.
Basic idea of the “Sparse Candidate” algorithm: for each variable X, find a set of variables Y1, Y2, …, Yk that are the most promising candidate parents for X. This gives us a much smaller search space.
The main drawback of this idea: a mistake in the initial stage can lead us to an inferior-scoring network.
The remedy is to iterate the basic procedure, using the previously constructed network to reconsider the candidate parents.
7
Outline of the Sparse Candidate Algorithm
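The slide presents the algorithm outline as a figure. A rough sketch of the loop, following the Restrict/Maximize structure described in the paper (the network representation and the helper callables here are placeholders for illustration, not the authors' code):

```python
def sparse_candidate(data, k, select_candidates, search_with_candidates, score_fn):
    """Outer loop of the Sparse Candidate algorithm (illustrative sketch).

    data: (N, n) array of discrete values.
    A network is represented here simply as {variable: list of parents}.
    select_candidates, search_with_candidates and score_fn stand in for the
    Restrict step, the constrained Maximize step, and the scoring metric
    (e.g. BDe or MDL)."""
    n = data.shape[1]
    net = {i: [] for i in range(n)}            # start from the empty network
    prev_score = float("-inf")
    while True:
        # Restrict: choose candidate parent sets C_i (|C_i| <= k) for each X_i,
        # based on the data and on the network from the previous iteration.
        # Keeping each variable's current parents in its candidate set
        # guarantees that the score never decreases.
        candidates = select_candidates(data, net, k)
        # Maximize: any structure search (e.g. greedy hill-climbing),
        # restricted so that Pa(X_i) is a subset of candidates[i].
        net = search_with_candidates(data, candidates)
        new_score = score_fn(net, data)
        if new_score <= prev_score:            # stopping criterion: no gain
            return net
        prev_score = new_score
```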
8
Convergence Properties of the Sparse Candidate Algorithm
We require that in the Restrict step, the selected candidates for Xi's parents include Xi's current parents: Pa_Gn(Xi) ⊆ Ci^(n+1).
This requirement implies that the winning network Bn is still a legal structure in iteration n + 1, so Score(Bn+1 | D) ≥ Score(Bn | D).
Stopping criterion: Score(Bn) = Score(Bn−1).
9
Mutual Information
I(X;Y) = Σx,y P̂(x,y) log [ P̂(x,y) / (P̂(x) P̂(y)) ], estimated from the data.
Example: I(A;C) > I(A;D) > I(A;B).
[Figure: an example network over the variables A, B, C, D.]
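A small sketch of how the first Restrict step might rank candidates by empirical mutual information (the function names and the NumPy data representation are assumptions for this example, not the paper's code):

```python
import numpy as np

def empirical_mutual_information(x, y):
    """Mutual information I(X;Y) estimated from two discrete integer columns."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def initial_candidates(data, k):
    """First Restrict step: for each variable, keep the k variables with the
    highest pairwise mutual information as its candidate parents."""
    n = data.shape[1]
    cands = {}
    for i in range(n):
        mi = [(empirical_mutual_information(data[:, i], data[:, j]), j)
              for j in range(n) if j != i]
        cands[i] = [j for _, j in sorted(mi, reverse=True)[:k]]
    return cands
```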
10
Discrepancy Test
The initial iteration ranks candidate parents by mutual information; subsequent iterations use a discrepancy measure that compares the empirical distribution with the distribution predicted by the current network.
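Written out (the notation here is a reconstruction and should be checked against the paper), the discrepancy between the empirical pairwise distribution P̂ and the pairwise distribution P_B induced by the current network B is a KL divergence:

```latex
M_{\mathrm{disc}}(X_i, X_j \mid B)
  = D_{\mathrm{KL}}\!\bigl(\hat{P}(X_i, X_j) \,\|\, P_B(X_i, X_j)\bigr)
  = \sum_{x_i, x_j} \hat{P}(x_i, x_j)\,
      \log \frac{\hat{P}(x_i, x_j)}{P_B(x_i, x_j)}
```

For the empty network, P_B(Xi, Xj) is just the product of the marginals, so the measure reduces to mutual information; in later iterations it highlights pairs whose observed dependence the current network fails to explain.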
11
Other tests
Conditional mutual information: measure the dependence between Xi and a candidate Xj given Xi's current parents, so that dependencies already explained by the network are discounted.
Penalizing structures with more parameters (score-based measures).
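For reference, conditional mutual information given Xi's current parents in B takes the standard form (the exact measure used in the paper may differ in details):

```latex
I\bigl(X_i ; X_j \mid \mathbf{Pa}_B(X_i)\bigr)
  = \sum_{x_i,\, x_j,\, \mathbf{pa}}
      \hat{P}(x_i, x_j, \mathbf{pa})\,
      \log \frac{\hat{P}(x_i, x_j \mid \mathbf{pa})}
                {\hat{P}(x_i \mid \mathbf{pa})\,\hat{P}(x_j \mid \mathbf{pa})}
```

Conditioning on the current parents “shields” Xi from candidates whose apparent correlation is already accounted for.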
12
Learning with Small Candidate Sets
Standard heuristics (greedy hill-climbing):
Unconstrained: roughly C(n, k) possible parent sets per variable; O(n²) candidate edge changes per search step.
Constrained by small candidate sets: 2^k possible parent sets per variable; O(kn) candidate edge changes per step.
Divide-and-conquer heuristics (next slides).
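To get a feel for the scale of the savings (the values n = 800 and k = 10 below are purely illustrative, not taken from the slides):

```python
from math import comb

n, k = 800, 10          # e.g. hundreds of variables, small candidate sets
print(comb(n - 1, k))   # unconstrained: ~2.8e22 size-k parent sets per variable
print(2 ** k)           # with a k-variable candidate set: 1024 parent sets
print(n * n, k * n)     # edge changes per hill-climbing step: 640000 vs. 8000
```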
13
Strongly Connected Components
H is the directed graph that has an edge Y → X whenever Y is a candidate parent of X. Decomposing H into strongly connected components takes linear time, and the search can then be organized around these components.
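A standard linear-time SCC routine (Kosaraju's algorithm; the slides do not say which algorithm is used), applied to a candidate graph given as a dict from each variable to its candidate parents — that representation is an assumption of this sketch:

```python
def strongly_connected_components(candidates):
    """Kosaraju's linear-time SCC decomposition of the candidate graph H,
    where candidates[x] is the list of candidate parents of x
    (i.e. an edge y -> x for every y in candidates[x])."""
    nodes = list(candidates)
    children = {v: [] for v in nodes}          # forward edges y -> x
    for x, pars in candidates.items():
        for y in pars:
            children[y].append(x)

    order, seen = [], set()
    def dfs_forward(v):                        # first pass: finishing order
        seen.add(v)
        for w in children[v]:
            if w not in seen:
                dfs_forward(w)
        order.append(v)
    for v in nodes:
        if v not in seen:
            dfs_forward(v)

    comps, assigned = [], set()
    def dfs_reverse(v, comp):                  # second pass on reversed edges
        assigned.add(v)
        comp.append(v)
        for w in candidates[v]:                # reverse of y -> x is x -> y
            if w not in assigned:
                dfs_reverse(w, comp)
    for v in reversed(order):
        if v not in assigned:
            comp = []
            dfs_reverse(v, comp)
            comps.append(comp)
    return comps

# Example: candidate edges 0 <-> 1 and 2 -> 3
print(strongly_connected_components({0: [1], 1: [0], 2: [], 3: [2]}))
# -> [[2], [3], [0, 1]]   ({0, 1} is the only non-trivial component)
```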
14
Separator Decomposition
[Figure: the candidate graph H split by a separator S into subgraphs H1 and H2.]
The bottleneck is the separator S: by fixing an ordering of the variables in S, we can disallow any cycle in H1 ∪ H2.
15
Experiments on Synthetic Data
16
Experiments on Real-Life Data
17
Conclusions
Sparse candidate sets enable us to search for good structures efficiently.
A better candidate-selection criterion is still needed.
The authors applied these techniques to Spellman's cell-cycle data.
Exploiting the structure of the candidate graph H during search needs to be improved.