An Algorithm for Bayesian Network Construction from Data


1 An Algorithm for Bayesian Network Construction from Data
by: Jie Cheng, David A. Bell, Weiru Liu (University of Ulster, UK)
Presented by: Jian Xu

2 Outline
Introduction
Some basic concepts
The proposed algorithm for BN construction
Experiment results
Discussions & comments
2/4/2019 Machine Learning

3 What is a Bayesian Network?
[Figure: Cancer BN example. Nodes: Metastatic Cancer (M), Serum Calcium (S), Brain Tumor (B), Coma (C), Headaches (H). Probability tables: P(M) = .20, P(S | M), P(B | M), P(C | S, B), P(H | B).]

4 Bayesian Network (BN)
A Bayesian network is a compact graphical representation of a probability distribution over a set of domain random variables X = {X1, X2, …, Xn}.
Two components:
  Structure: a directed acyclic graph (DAG) over the nodes, which exploits causal relations in the domain
  CPD: each node has a conditional probability distribution associated with it
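
The compactness comes from the way the structure lets the joint distribution factor into the per-node CPDs. The block below states the standard chain-rule factorization for Bayesian networks; the notation Pa(X_i) for the parents of X_i in the DAG is introduced here and does not appear on the slides. The second line applies it to the cancer example, assuming the arcs implied by the probability-table headers on slide 3.

```latex
\begin{align}
P(X_1, \dots, X_n) &= \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr) \\
P(M, S, B, C, H) &= P(M)\, P(S \mid M)\, P(B \mid M)\, P(C \mid S, B)\, P(H \mid B)
\end{align}
```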

5 BN Learning
Structure learning
  To identify the topology of the network
  Score-based methods
  Dependency analysis methods
Parameter learning
  To learn the conditional probabilities for a given network topology
  MLE, Bayesian approach, etc.
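
As a rough illustration of parameter learning by MLE, the sketch below estimates one CPD by relative frequencies. It assumes complete, discrete records stored as Python dicts; the function name `mle_cpt` and the toy records are made up for illustration and are not from the presentation.

```python
from collections import Counter, defaultdict

def mle_cpt(records, child, parents):
    """Estimate P(child | parents) by relative frequencies (maximum likelihood)."""
    joint = Counter()   # counts of (parent configuration, child value)
    margin = Counter()  # counts of each parent configuration alone
    for rec in records:
        pa = tuple(rec[p] for p in parents)
        joint[(pa, rec[child])] += 1
        margin[pa] += 1
    cpt = defaultdict(dict)
    for (pa, value), n in joint.items():
        cpt[pa][value] = n / margin[pa]
    return dict(cpt)

# Toy records, invented for this example only.
records = [
    {"M": 1, "S": 1}, {"M": 1, "S": 1}, {"M": 1, "S": 0}, {"M": 1, "S": 1},
    {"M": 0, "S": 0}, {"M": 0, "S": 0}, {"M": 0, "S": 1}, {"M": 0, "S": 0},
]
cpt = mle_cpt(records, child="S", parents=["M"])
print(cpt[(1,)][1])  # 0.75
```

Parent configurations that never occur in the data get no entry at all, which is one reason a Bayesian approach with a prior is often preferred on small datasets.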

6 BN Structure Learning
Search & scoring methods:
  Search for the structure most likely to have generated the data
  Use a heuristic search method to construct a model and evaluate it with a scoring method such as MDL or a Bayesian score
  May not find the best solution
  Random restarts: to avoid getting stuck in a local maximum
  Lower time complexity in the worst case, i.e., when the underlying DAG is fully connected

7 BN Learning Algorithms (Cont’d)
Dependency analysis methods:
  Use conditional independence (CI) tests to analyze dependency relationships among nodes
  Usually asymptotically correct when the data is DAG-faithful
  Work efficiently when the underlying network is sparse
  CI tests with large condition sets may be unreliable unless the volume of data is enormous
  Used in the proposed algorithm

8 Basic Concepts
D-separation: two nodes X and Y are d-separated given C if and only if there exists no adjacency path P between X and Y such that:
  every collider on P is in C or has a descendant in C, and
  no other node on P is in C.
C is called a condition set.
Collider node: a node on a path at which two arcs of the path meet head to head.
Non-collider node: any other node on the path.
Open path: a path between X and Y is open if every node on the path is active (a collider is active when it is in C or has a descendant in C; a non-collider is active when it is not in C).
Closed path: a path is closed if any node on it is inactive.
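
A d-separation query can be checked mechanically. The sketch below does not enumerate paths as in the definition above; it uses the equivalent moralized-ancestral-graph criterion instead, which is shorter to code. The function names are hypothetical, and the arc list for the cancer network is an assumption read off the probability-table headers on slide 3.

```python
def ancestors_of(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(arcs, x, y, cond=()):
    """True if x and y are d-separated given cond in the DAG defined by arcs."""
    cond = set(cond)
    parents = {}
    for u, v in arcs:
        parents.setdefault(v, set()).add(u)
    keep = ancestors_of({x, y} | cond, parents)
    # Moralize the ancestral subgraph: link each node to its parents
    # and link every pair of parents that share a child.
    adj = {v: set() for v in keep}
    for v in keep:
        pa = sorted(parents.get(v, ()))
        for i, p in enumerate(pa):
            adj[v].add(p)
            adj[p].add(v)
            for q in pa[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # x and y are d-separated iff cond cuts every undirected path between them.
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        if v in seen or v in cond:
            continue
        seen.add(v)
        stack.extend(adj[v])
    return True

# Assumed arcs of the cancer example: M->S, M->B, S->C, B->C, B->H.
cancer = [("M", "S"), ("M", "B"), ("S", "C"), ("B", "C"), ("B", "H")]
print(d_separated(cancer, "S", "B", {"M"}))       # True
print(d_separated(cancer, "S", "B", {"M", "C"}))  # False: collider C is in the condition set
```

The second query shows the collider rule at work: conditioning on C opens the path S -> C <- B even though M is blocked.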

9 Basic Concepts (Cont’d)
DAG-faithful: a distribution is DAG-faithful when there exists a DAG that can represent all of its conditional independence relations.
D-map: a graph G is a dependency map (D-map) of M if every independence relationship in M is true in G (e.g., a BN with no edges).
I-map: a graph G is an independency map (I-map) of M if every independence relationship in G is true in M (e.g., a fully connected BN).
Minimal I-map: a graph G that is an I-map of M such that the removal of any arc from G yields a graph that is not an I-map of M.
P-map: a graph G is a perfect map of M if it is both a D-map and an I-map of M.

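10 Mutual Information
The mutual information of two nodes Xi, Xj is defined as:
  I(X_i, X_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i)\, P(x_j)}
The conditional mutual information is defined as:
  I(X_i, X_j \mid C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\, P(x_j \mid c)}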
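
Both quantities can be estimated from data by replacing the probabilities with relative frequencies. The sketch below is one way to do that, assuming complete discrete records stored as dicts; the function name, the use of natural logarithms, and the toy records are choices made for this example. In the algorithm, a pair would be treated as independent when the estimate falls below the threshold ε.

```python
from collections import Counter
from math import log

def mutual_information(records, x, y, cond=()):
    """Empirical I(x, y | cond) in nats; cond=() gives plain mutual information."""
    n = len(records)
    cxyz, cxz, cyz, cz = Counter(), Counter(), Counter(), Counter()
    for rec in records:
        z = tuple(rec[c] for c in cond)
        cxyz[(rec[x], rec[y], z)] += 1
        cxz[(rec[x], z)] += 1
        cyz[(rec[y], z)] += 1
        cz[z] += 1
    total = 0.0
    for (a, b, z), nxyz in cxyz.items():
        # P(a,b|z) / (P(a|z) P(b|z)) equals n(a,b,z) n(z) / (n(a,z) n(b,z)).
        total += (nxyz / n) * log(nxyz * cz[z] / (cxz[(a, z)] * cyz[(b, z)]))
    return total

# Toy data, invented for this example: X and Y are both copies of Z.
records = [{"X": 0, "Y": 0, "Z": 0}] * 2 + [{"X": 1, "Y": 1, "Z": 1}] * 2
print(mutual_information(records, "X", "Y"))         # log(2), about 0.693
print(mutual_information(records, "X", "Y", ["Z"]))  # 0.0
```

In the toy data X and Y are strongly dependent marginally but become independent once Z is known, which is exactly the pattern a CI test is meant to detect.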

11 Assumptions
All attributes are discrete
No missing values in any record
All the records are drawn independently from a single probability model
The dataset is large enough for reliable CI tests
The ordering of the attributes is available before the network construction

12 An Algorithm for BN Construction
Drafting: computes the mutual information of each pair of nodes and creates a draft of the model
Thickening: adds arcs when pairs of nodes cannot be d-separated, yielding an I-map of the model
Thinning: examines each arc of the I-map using CI tests and removes it if the two nodes of the arc are conditionally independent

13 Drafting Phase
1. Initialize a graph G(V, E) where V = {all nodes}, E = { }. Initialize two empty ordered sets S and R.
2. For each pair of nodes (vi, vj), i ≠ j, compute I(vi, vj). Sort all pairs with I(vi, vj) ≥ ε from large to small, and put the corresponding pairs of nodes into S.
3. Get the first two pairs of nodes in S and remove them from S. Add the corresponding arcs to E. (The direction of the arcs is determined by the available node ordering.)
4. Get the first pair of nodes remaining in S and remove it from S. If there is no open path between the two nodes (they are d-separated given the empty set), add the corresponding arc to E. Otherwise add the pair of nodes to the end of R.
5. Repeat step 4 until S is empty.

14 Drafting Example
Figure (a) is the underlying BN structure.
I(B,D) ≥ I(C,E) ≥ I(B,E) ≥ I(A,B) ≥ I(B,C) ≥ I(C,D) ≥ I(D,E) ≥ I(A,D) ≥ I(A,E) ≥ I(A,C) ≥ ε
Figure (b) is the draft graph.
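
The drafting steps can be traced on this ranking with a short sketch. It assumes the node ordering A, B, C, D, E (the slides do not state the ordering), takes the ranked pairs as given rather than computing mutual information, and uses hypothetical function names. With an empty condition set, a path is open exactly when it contains no collider, so the open-path test reduces to asking whether the two nodes share an ancestor.

```python
def ancestors_of(node, parents):
    """Return node together with all of its ancestors."""
    seen, stack = set(), [node]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def has_open_path(arcs, x, y):
    """Open path given the empty set: x and y share an ancestor,
    or one of them is an ancestor of the other."""
    parents = {}
    for u, v in arcs:
        parents.setdefault(v, set()).add(u)
    return bool(ancestors_of(x, parents) & ancestors_of(y, parents))

def draft(ranked_pairs, order):
    """Phase I. ranked_pairs: pairs with I >= epsilon, sorted by decreasing I."""
    rank = {v: i for i, v in enumerate(order)}

    def orient(a, b):
        # Arc direction follows the given node ordering.
        return (a, b) if rank[a] < rank[b] else (b, a)

    S = list(ranked_pairs)
    E = [orient(a, b) for a, b in S[:2]]  # step 3: first two pairs become arcs
    R = []
    for a, b in S[2:]:                    # steps 4 and 5
        if has_open_path(E, a, b):
            R.append((a, b))              # deferred to the thickening phase
        else:
            E.append(orient(a, b))
    return E, R

ranked = [("B", "D"), ("C", "E"), ("B", "E"), ("A", "B"), ("B", "C"),
          ("C", "D"), ("D", "E"), ("A", "D"), ("A", "E"), ("A", "C")]
E, R = draft(ranked, order="ABCDE")
print(E)  # [('B', 'D'), ('C', 'E'), ('B', 'E'), ('A', 'B'), ('B', 'C')]
print(R)  # [('C', 'D'), ('D', 'E'), ('A', 'D'), ('A', 'E'), ('A', 'C')]
```

Under these assumptions the draft contains the arc (B,E) and leaves (D,E) and (A,C) in R, which matches the pairs examined in the thickening and thinning examples.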

15 Thickening Phase
6. Get the first pair of nodes in R and remove it from R.
7. Find a block set, containing as few nodes as possible, that blocks every open path between these two nodes. Conduct a CI test; if the two nodes are still dependent on each other given the block set, connect them by an arc.
8. Go to step 6 until R is empty.

16 Thickening Example
Figure (b) is the draft graph.
Examine the pair (D,E): the minimum set that blocks all the open paths between D and E is {B}.
The CI test reveals that D and E are dependent given {B}, so arc (D,E) is added.
(A,C) is not added because A and C are independent given {B}.

17 Thinning Phase
9. For each arc in E, if there are open paths between the two nodes besides this arc, remove the arc from E temporarily and call procedure find_block_set(current graph, node1, node2). Conduct a CI test conditioned on the block set. If the two nodes are dependent, add the arc back to E; otherwise remove the arc permanently.

18 Thinning Example
Figure (c) is the I-map of the underlying BN.
Arc (B,E) is removed because B and E are independent of each other given {C,D}.
Figure (d) is the perfect map of the underlying dependency model (a).

19 Finding Minimum Block Set

20 Complexity Analysis
For a dataset with N attributes, each with at most r possible values and at most k parents:
Phase I: N^2 mutual information computations, each requiring O(r^2) basic operations: O(N^2 r^2)
Phase II: at most N^2 CI tests, each requiring at most O(r^{k+2}) basic operations: O(N^2 r^{k+2}); worst case O(N^2 r^N)
Phase III: same as Phase II.
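
The exponents can be read off by counting value combinations. The block below is a sketch of that reasoning rather than a derivation taken from the slides: a CI test on a pair with a condition set of at most k nodes sums over joint configurations of k + 2 variables, and the worst case arises when the condition set contains all N - 2 remaining nodes.

```latex
\begin{align}
\text{Phase I:}\quad & \underbrace{N^2}_{\text{pairs}} \cdot \underbrace{O(r^2)}_{\text{values of } (x_i, x_j)} = O(N^2 r^2) \\
\text{Phase II:}\quad & \underbrace{N^2}_{\text{CI tests}} \cdot \underbrace{O(r^{k+2})}_{\text{values of } (x_i, x_j, c)} = O(N^2 r^{k+2}) \\
\text{Worst case } (k = N - 2):\quad & O(N^2 r^{(N-2)+2}) = O(N^2 r^{N})
\end{align}
```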

21 ALARM Network Structure

22 Experiment setup
ALARM BN (A Logical Alarm Reduction Mechanism): a medical diagnosis system for patient monitoring
37 nodes, 46 arcs
3 versions: same structure, different CPDs
10000 cases for each dataset
The conditional mutual information calculation is modified to take the variables' degrees of freedom into consideration, making CI tests more reliable
ε = 0.003

23 Result on ALARM BN

24 Discussions & Comments
About the assumptions:
  All attributes are discrete
  No missing values in any record
  The dataset is large enough for reliable CI tests
  The ordering of the attributes is available before the network construction

25 Discussions & Comments
Threshold ε
  ε = 0.003
  How do we pick an appropriate ε?
  How does the choice of ε affect accuracy and running time?
Modification in the experiment part
  The conditional mutual information calculation is modified to take the variables' degrees of freedom into consideration, making CI tests more reliable
  Does this modification affect the result in any way other than increasing the accuracy?

26 Thank you!

