An Algorithm for Bayesian Network Construction from Data


By: Jie Cheng, David A. Bell, Weiru Liu (University of Ulster, UK)
Presented by: Jian Xu

Outline
  Introduction
  Some basic concepts
  The proposed algorithm for BN construction
  Experiment results
  Discussions & comments
2/4/2019 Machine Learning

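What is a Bayesian Network?
Cancer BN example
Nodes: Metastatic Cancer (M), Serum Calcium (S), Brain Tumor (B), Coma (C), Headaches (H)
Arcs, as implied by the conditioning variables in the tables: M → S, M → B, S → C, B → C, B → H

P(M) = .20

M | P(S)
+ | .80
- | .20

M | P(B)
+ | .20
- | .05

S B | P(C)
+ + | .80
+ - | .80
- + | .80
- - | .05

B | P(H)
+ | .80
- | .60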

Bayesian Network (BN)
A Bayesian network is a compact graphical representation of a probability distribution over a set of domain random variables X = {X1, X2, …, Xn}.
Two components:
  Structure: a directed acyclic graph (DAG) over the nodes, which exploits causal relations in the domain
  CPD: each node has an associated conditional probability distribution given its parents in the DAG
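As a rough illustration of how the two components combine, the sketch below multiplies the CPD entries of the Cancer BN example to obtain a joint probability. The arc structure (M → S, M → B, S → C, B → C, B → H) is read off the conditioning variables in the tables, and the reading of each table entry as the probability of the "+" state is an assumption; the names `bern` and `joint` are hypothetical helpers.

```python
from itertools import product

# CPD entries from the Cancer BN example. True stands for "+", False for "-".
# Assumption: each entry is the probability of the "+" state.
P_M = 0.20                                    # P(M=+)
P_S = {True: 0.80, False: 0.20}               # P(S=+ | M)
P_B = {True: 0.20, False: 0.05}               # P(B=+ | M)
P_C = {(True, True): 0.80, (True, False): 0.80,
       (False, True): 0.80, (False, False): 0.05}   # P(C=+ | S, B)
P_H = {True: 0.80, False: 0.60}               # P(H=+ | B)


def bern(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(m, s, b, c, h):
    """P(M, S, B, C, H) as the product of each node's CPD given its parents."""
    return (bern(P_M, m)
            * bern(P_S[m], s)
            * bern(P_B[m], b)
            * bern(P_C[(s, b)], c)
            * bern(P_H[b], h))


# Summing the joint over all 2^5 assignments should give 1.
total = sum(joint(*values) for values in product([True, False], repeat=5))

# Marginal probability of Coma, obtained by summing out the other variables.
p_coma = sum(joint(m, s, b, True, h)
             for m, s, b, h in product([True, False], repeat=4))
```

The five tables hold 11 numbers in total, whereas a full joint table over five binary variables needs 31, which is the sense in which the representation is compact.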

BN Learning
Structure learning: to identify the topology of the network
  Score-based methods
  Dependency analysis methods
Parameter learning: to learn the conditional probabilities for a given network topology
  MLE, Bayesian approach, etc.

BN Structure Learning
Search & scoring methods:
  Search for the structure most likely to have generated the data
  Use a heuristic search method to construct a model and evaluate it with a scoring method, such as MDL or a Bayesian score
  May not find the best solution, because the search is heuristic
  Random restarts help avoid getting stuck in a local maximum
  Lower time complexity in the worst case, i.e., when the underlying DAG is fully connected

BN Learning Algorithms (Cont’d)
Dependency analysis methods:
  Use conditional independence (CI) tests to analyze dependency relationships among nodes
  Usually asymptotically correct when the data is DAG-faithful
  Work efficiently when the underlying network is sparse
  CI tests with large condition-sets may be unreliable unless the volume of data is enormous
  Used in the proposed algorithm

Basic Concepts
D-separation: two nodes X and Y are d-separated given C if and only if there exists no adjacency path P between X and Y such that:
  every collider on P is in C or has a descendant in C, and
  no other node on P is in C.
C is called a condition-set.
Active node: a collider is active if it is in C or has a descendant in C; a non-collider is active if it is not in C.
Open path: a path between X and Y is open if every node on the path is active.
Closed path: a path is closed if any node on it is inactive.
Figure: collider node and non-collider node
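A d-separation check can be sketched in a few lines. Note that the code below does not enumerate paths as in the definition above; it uses the equivalent moralized ancestral graph criterion (restrict the DAG to the ancestors of X, Y, and C, connect parents that share a child, drop directions, delete C, and test reachability). The function names and the `parents` dictionary layout are assumptions made for illustration.

```python
def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, x, y, cond):
    """True if x and y are d-separated given the condition-set cond.

    parents maps each node to an iterable of its parents; nodes without
    parents may be omitted from the mapping.
    """
    cond = set(cond)
    keep = ancestors(parents, {x, y} | cond)

    # Build the moral graph of the ancestral subgraph (undirected).
    adj = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:                       # parent-child links
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(ps):         # "marry" parents of a common child
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)

    # x and y are d-separated iff y is unreachable from x once cond is removed.
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for nxt in adj[node]:
            if nxt not in seen and nxt not in cond:
                seen.add(nxt)
                stack.append(nxt)
    return True


# Structure of the Cancer BN example, read off its CPD tables.
cancer_parents = {"S": ["M"], "B": ["M"], "C": ["S", "B"], "H": ["B"]}
```

For the Cancer BN, `d_separated(cancer_parents, "S", "B", {"M"})` holds because M is a non-collider in the condition-set, while adding the collider C to the condition-set reopens the path through C.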

Basic Concepts (Cont’d)
DAG-faithful: a dependency model M is DAG-faithful when there exists a DAG that can represent all the conditional independence relations of the underlying distribution.
D-map: a graph G is a dependency map (D-map) of M if every independence relationship in M is true in G. (A BN with no edges is trivially a D-map.)
I-map: a graph G is an independency map (I-map) of M if every independence relationship in G is true in M. (A fully connected BN is trivially an I-map.)
Minimal I-map: a graph G is a minimal I-map of M if it is an I-map of M and the removal of any arc from G yields a graph that is not an I-map of M.
P-map: a graph G is a perfect map (P-map) of M if it is both a D-map and an I-map of M.

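Mutual Information
The mutual information of two nodes Xi, Xj is defined as:
  I(X_i, X_j) = \sum_{x_i, x_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i) P(x_j)}
The conditional mutual information given a condition-set C is defined as:
  I(X_i, X_j | C) = \sum_{x_i, x_j, c} P(x_i, x_j, c) \log \frac{P(x_i, x_j | c)}{P(x_i | c) P(x_j | c)}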
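A minimal sketch of how both quantities might be estimated from discrete data, using relative frequencies as probability estimates. The function names and the column-list data layout are assumptions, and the natural logarithm is used; this is the plain estimator, not the degrees-of-freedom-adjusted variant used in the experiments.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Estimate I(X, Y) = sum P(x, y) log[P(x, y) / (P(x) P(y))] from samples.

    xs and ys are equal-length sequences of discrete values, one per record.
    """
    n = len(xs)
    n_xy = Counter(zip(xs, ys))
    n_x = Counter(xs)
    n_y = Counter(ys)
    return sum((k / n) * log((k / n) / ((n_x[a] / n) * (n_y[b] / n)))
               for (a, b), k in n_xy.items())


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X, Y | C) from samples.

    cs holds one hashable value per record; to condition on several
    variables, pass a tuple of their values for each record.
    """
    n = len(xs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    # P(x, y | c) / (P(x | c) P(y | c)) simplifies to
    # n_xyc * n_c / (n_xc * n_yc) in terms of counts.
    return sum((k / n) * log((k * n_c[c]) / (n_xc[(a, c)] * n_yc[(b, c)]))
               for (a, b, c), k in n_xyc.items())
```

In the algorithm, a pair of nodes is treated as dependent when the estimate is at least the threshold ε, and as (conditionally) independent otherwise.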

Assumptions
  All attributes are discrete
  No missing values in any record
  All records are drawn independently from a single probability model
  The dataset is large enough for reliable CI tests
  The ordering of the attributes is available before the network construction

An Algorithm for BN Construction
Drafting: compute the mutual information of each pair of nodes and create a draft of the model
Thickening: add arcs when pairs of nodes cannot be d-separated, yielding an I-map of the model
Thinning: examine each arc of the I-map using CI tests and remove it if the two nodes of the arc are conditionally independent

Drafting Phase
1. Initialize a graph G(V, E) where V = {all nodes}, E = { }. Initialize two empty ordered lists S and R.
2. For each pair of nodes (vi, vj), i ≠ j, compute I(vi, vj). Sort all pairs with I(vi, vj) ≥ ε from largest to smallest and put them into the ordered set S.
3. Get the first two pairs of nodes in S and remove them from S. Add the corresponding arcs to E. (The direction of each arc is determined by the available node ordering.)
4. Get the first pair of nodes remaining in S and remove it from S. If there is no open path between the two nodes (they are d-separated given the empty set), add the corresponding arc to E. Otherwise, add the pair of nodes to the end of the ordered set R.
5. Repeat step 4 until S is empty.
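Steps 1 to 5 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `drafting`, its parameters, and the `mi` dictionary of precomputed mutual information values are hypothetical. The open-path check relies on the fact that, given the empty condition-set, two nodes in a DAG have an open path exactly when they share a common ancestor (counting each node as its own ancestor).

```python
def drafting(nodes, mi, epsilon):
    """Sketch of the drafting phase.

    nodes:   list of nodes in the given node ordering (earlier = possible parent)
    mi:      dict mapping a pair (u, v) to its mutual information I(u, v)
    epsilon: threshold below which a pair is ignored
    Returns (edges, R): the draft arcs and the pairs deferred to thickening.
    """
    rank = {v: i for i, v in enumerate(nodes)}
    parents = {v: set() for v in nodes}
    edges, deferred = [], []

    def ancestors_incl(v):
        """v together with all of its ancestors in the current draft."""
        seen, stack = {v}, [v]
        while stack:
            for p in parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def add_arc(u, v):
        """Add an arc between u and v, directed by the node ordering."""
        if rank[u] > rank[v]:
            u, v = v, u
        parents[v].add(u)
        edges.append((u, v))

    # Step 2: keep pairs with I >= epsilon, sorted from largest to smallest.
    ordered = sorted((pair for pair, value in mi.items() if value >= epsilon),
                     key=lambda pair: mi[pair], reverse=True)

    for index, (u, v) in enumerate(ordered):
        # Step 3: the first two pairs are always connected.
        # Step 4: later pairs are connected only if no open path exists,
        # i.e. the two nodes share no common ancestor in the current draft.
        if index < 2 or not (ancestors_incl(u) & ancestors_incl(v)):
            add_arc(u, v)
        else:
            deferred.append((u, v))
    return edges, deferred
```

Given the mutual information ranking from the drafting example and an assumed node ordering A, B, C, D, E, this sketch keeps the arcs B→D, C→E, B→E, A→B, and B→C, and defers (C,D), (D,E), (A,D), (A,E), and (A,C) to R.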

Drafting Example
Figure (a) is the underlying BN structure.
I(B,D) ≥ I(C,E) ≥ I(B,E) ≥ I(A,B) ≥ I(B,C) ≥ I(C,D) ≥ I(D,E) ≥ I(A,D) ≥ I(A,E) ≥ I(A,C) ≥ ε
Figure (b) is the draft graph.

Thickening Phase
6. Get the first pair of nodes in R and remove it from R.
7. Find a block set, containing as few nodes as possible, that blocks every open path between these two nodes. Conduct a CI test. If the two nodes are still dependent on each other given the block set, connect them by an arc.
8. Go to step 6 until R is empty.

Thickening Example
Figure (b) is the draft graph.
Examine the pair (D,E): the minimum set that blocks all the open paths between D and E is {B}.
The CI test reveals that D and E are dependent given {B}, so arc (D,E) is added.
Arc (A,C) is not added because A and C are independent given {B}.

Thinning Phase
9. For each arc in E, if there are open paths between the two nodes besides this arc, remove this arc from E temporarily and call procedure find_block_set(current graph, node1, node2). Conduct a CI test conditioned on the block set. If the two nodes are dependent, add this arc back to E; otherwise remove the arc permanently.
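The control flow of step 9 might look like the sketch below. Only the loop is shown: the three callables passed in are hypothetical hooks standing in for the open-path check, the find_block_set procedure, and the CI test, none of which are implemented here, and their signatures are assumptions.

```python
def thinning(edges, has_other_open_path, find_block_set, ci_dependent):
    """Sketch of the thinning loop.

    edges: list of arcs (u, v) forming the I-map from the thickening phase.
    Hypothetical hooks supplied by the caller:
      has_other_open_path(edges, u, v) -> bool  open path besides the arc itself
      find_block_set(edges, u, v)      -> set   block set in the graph without the arc
      ci_dependent(u, v, block)        -> bool  True if u, v are dependent given block
    Returns a new list of arcs; the input list is left unchanged.
    """
    kept = list(edges)
    for arc in list(edges):
        u, v = arc
        # Remove the arc temporarily by working on a copy without it.
        trial = [e for e in kept if e != arc]
        if not has_other_open_path(trial, u, v):
            continue  # the arc is the only connection, so it stays
        block = find_block_set(trial, u, v)
        if not ci_dependent(u, v, block):
            kept = trial  # independent given the block set: remove permanently
        # Otherwise the arc is "added back" simply by leaving kept unchanged.
    return kept
```

Passing the checks in as functions keeps the loop independent of how d-separation and the CI test are computed, which is a design choice of this sketch rather than something stated on the slides.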

Thinning Example
Figure (c) is the I-map of the underlying BN.
Arc (B,E) is removed because B and E are independent of each other given {C,D}.
Figure (d) is the perfect map (P-map) of the underlying dependency model (a).

Finding Minimum Block Set

Complexity Analysis
For a dataset with N attributes, each with at most r possible values, and at most k parents per node:
  Phase I: N^2 mutual information computations, each requiring O(r^2) basic operations, for a total of O(N^2 r^2)
  Phase II: at most N^2 CI tests, each requiring at most O(r^(k+2)) basic operations, for a total of O(N^2 r^(k+2)); worst case O(N^2 r^N)
  Phase III: same as Phase II

ALARM Network Structure

Experiment Setup
ALARM BN (A Logical Alarm Reduction Mechanism): a medical diagnosis system for patient monitoring
  37 nodes, 46 arcs
  3 versions: same structure, different CPDs
  10000 cases for each dataset
Modified conditional mutual information calculation: takes the variables' degrees of freedom into account to make CI tests more reliable
ε = 0.003

Result on ALARM BN

Discussions & Comments
About the assumptions:
  All attributes are discrete
  No missing values in any record
  The dataset is large enough for reliable CI tests
  The ordering of the attributes is available before the network construction

Discussions & Comments
Threshold ε
  ε = 0.003
  How do we pick an appropriate ε?
  How does the choice of ε affect accuracy and running time?
Modification in the experiment part
  Uses a modified conditional mutual information calculation that takes the variables' degrees of freedom into account to make CI tests more reliable
  Does this modification affect the result in any way other than increasing the accuracy?

Thank you!