S3-Seminar on Data Mining: Bayesian Networks
B. Inference
Master Universitario en Inteligencia Artificial
Concha Bielza, Pedro Larrañaga
Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid
C.Bielza, P.Larrañaga -UPM- 2
Inference in Bayesian networks
Basic concepts:
- Types of queries
Exact inference:
- Brute-force computation
- Variable elimination algorithm
- Message passing algorithm
Approximate inference:
- Probabilistic logic sampling
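Queries: posterior probabilities
Given some evidence e (observations), compute the posterior probability of a target variable (or set of variables) X, $P(X \mid e) = P(X, e) / P(e)$; the answer is a vector with one entry per state of X. Other names: probability propagation, belief updating or belief revision. In general, inference answers queries about the joint distribution P encoded by the network.
[Figure: network with nodes Burglary, Earthquake, Alarm, WCalls and News; the target node is marked "?".]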
Semantically, inference supports any kind of reasoning:
- Predictive or deductive reasoning (causal inference): predict effects, e.g. P(Symptoms | Disease). The target variable is usually a descendant of the evidence.
- Diagnostic reasoning (diagnostic inference): diagnose the causes, e.g. P(Disease | Symptoms). The target variable is usually an ancestor of the evidence.
[Figure: the Burglary/Earthquake/Alarm/WCalls/News network with a "?" target, once for each kind of reasoning.]
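To make the two directions concrete, here is a minimal sketch on a hypothetical two-node network Disease → Symptom; the names and all numbers are assumptions, not taken from the slides. Predictive reasoning pushes probability from cause to effect by marginalization; diagnostic reasoning goes from effect to cause with Bayes' rule, $P(d \mid s) = P(d, s) / P(s)$.

```python
# Hypothetical two-node BN: Disease -> Symptom (binary, 1 = present).
# All numbers are illustrative assumptions, not taken from the slides.
P_D = {1: 0.01, 0: 0.99}                      # prior P(Disease)
P_S_given_D = {1: {1: 0.90, 0: 0.10},         # P(Symptom | Disease=1)
               0: {1: 0.05, 0: 0.95}}         # P(Symptom | Disease=0)


def predictive(s):
    """P(Symptom=s): push the prior forward through the CPT (cause -> effect)."""
    return sum(P_D[d] * P_S_given_D[d][s] for d in (0, 1))


def diagnostic(s):
    """P(Disease | Symptom=s) by Bayes' rule: P(d, s) / P(s) (effect -> cause)."""
    joint = {d: P_D[d] * P_S_given_D[d][s] for d in (0, 1)}
    z = sum(joint.values())                   # P(s), the normalizing constant
    return {d: p / z for d, p in joint.items()}


print(predictive(1))      # P(S=1) ≈ 0.0585
print(diagnostic(1)[1])   # P(D=1 | S=1) ≈ 0.154 (= 2/13)
```

Even with a highly sensitive symptom, the diagnostic posterior stays low here because the prior of the disease is small.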
More queries: maximum a posteriori (MAP)
Most likely configurations (abductive inference): the event that best explains the evidence.
- Total abduction: search for $\arg\max_{u} P(u \mid e)$, where $U$ is the set of all the unobserved variables.
- Partial abduction: search for $\arg\max_{x_S} P(x_S \mid e)$, where $X_S$ is a subset of the unobserved variables (the explanation set).
- We may also search for the K most likely explanations.
In general, the MAP configuration cannot be computed component-wise, i.e. by taking $\max_{x_i} P(x_i \mid e)$ for each variable separately.
[Figure: the Burglary/Earthquake/Alarm/WCalls/News network with the unobserved nodes marked "?", for total and partial abduction.]
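The warning that MAP cannot be obtained component-wise can be checked on a tiny hypothetical posterior over two binary unobserved variables A and B (numbers chosen for illustration only). The sketch compares total abduction, i.e. the most probable joint configuration, with taking the argmax of each posterior marginal separately.

```python
# Hypothetical posterior P(A, B | e) over two binary unobserved variables.
# Illustrative numbers, chosen so that the two criteria disagree.
posterior = {(0, 0): 0.00, (0, 1): 0.36, (1, 0): 0.34, (1, 1): 0.30}

# Total abduction: the single most probable joint configuration.
map_config = max(posterior, key=posterior.get)


def marginal(i):
    """Posterior marginal P(x_i | e), obtained by summing out the other variable."""
    m = {0: 0.0, 1: 0.0}
    for config, p in posterior.items():
        m[config[i]] += p
    return m


# Component-wise alternative: argmax of each marginal, taken separately.
componentwise = tuple(max((0, 1), key=marginal(i).get) for i in range(2))

print(map_config, posterior[map_config])          # (0, 1) 0.36
print(componentwise, posterior[componentwise])    # (1, 1) 0.3
```

Each component of (1, 1) maximizes its own marginal, yet the configuration as a whole is less probable than the MAP configuration (0, 1).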
More queries: maximum a posteriori (MAP)
Use MAP for:
- Classification: find the most likely label, given the evidence.
- Explanation: find the most likely scenario, given the evidence.
More queries: decision-making
Optimal decisions (of maximum expected utility), with influence diagrams.
Brute-force computation of P(X | e)
First, consider $P(X_i)$, without observed evidence e. This is conceptually simple but computationally complex. For a BN with n variables, each with its CPT $P(X_j \mid Pa(X_j))$:
$P(x_i) = \sum_{x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n} \prod_{j=1}^{n} P(x_j \mid pa(x_j))$
But this amounts to computing the JPD, which is often very inefficient and even computationally intractable.
CHALLENGE: without computing the JPD, exploit the factorization encoded by the BN and the distributive law (local computations). Exact inference [Pearl’88; Lauritzen & Spiegelhalter’88].
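A minimal sketch of the brute-force formula above, assuming a hypothetical three-node network Burglary → Alarm ← Earthquake with illustrative CPT numbers. The dictionary layout of `bn` and the helper names are assumptions made for this example; the point is that `brute_force_marginal` visits all $2^n$ configurations of the JPD.

```python
from itertools import product

# Hypothetical BN (binary variables, 1 = true), illustrative numbers only:
#   Burglary -> Alarm <- Earthquake
# Each entry: variable -> (parents, P(variable = 1 | parent values)).
bn = {
    "B": ((), {(): 0.01}),
    "E": ((), {(): 0.02}),
    "A": (("B", "E"), {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}),
}


def joint(assignment):
    """P(x_1, ..., x_n) as the product of the CPT entries P(x_j | pa(x_j))."""
    p = 1.0
    for var, (parents, cpt) in bn.items():
        p1 = cpt[tuple(assignment[q] for q in parents)]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p


def brute_force_marginal(target):
    """P(target): sum the joint over every configuration (2**n terms)."""
    names = list(bn)
    result = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        result[assignment[target]] += joint(assignment)
    return result


print(brute_force_marginal("A"))   # P(A=1) ≈ 0.0161142
```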
Improving brute-force
Use the JPD factorization and the distributive law. The JPD alone is a table with 32 entries (if the variables are binary).
Improving brute-force
Arrange the computations efficiently by moving some additions inward: first the sums over $X_5$ and $X_3$, then the sum over $X_4$. The biggest table now has 8 entries, the same size as the largest tables in the BN.
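The effect of moving additions inward can be checked numerically. The sketch below uses a hypothetical chain X1 → X2 → X3 → X4 → X5 (the slide's own network structure is not reproduced here, and the CPT numbers are made up) and computes $P(X_5)$ twice: by summing the 32-entry JPD, and by summing out one variable at a time so that only 2-entry intermediate tables are built.

```python
from itertools import product

# Hypothetical chain X1 -> X2 -> X3 -> X4 -> X5 of binary variables.
# Structure and numbers are illustrative assumptions.
p1 = [0.6, 0.4]                                   # P(X1)
# T[k][a][b] = P(X_{k+2} = b | X_{k+1} = a), for k = 0..3
T = [[[0.7, 0.3], [0.2, 0.8]],
     [[0.9, 0.1], [0.4, 0.6]],
     [[0.5, 0.5], [0.1, 0.9]],
     [[0.8, 0.2], [0.3, 0.7]]]


def brute_force_p5():
    """Sum the full JPD: 2**5 = 32 products of five CPT entries each."""
    p5 = [0.0, 0.0]
    for x in product((0, 1), repeat=5):
        p = p1[x[0]]
        for k in range(4):
            p *= T[k][x[k]][x[k + 1]]
        p5[x[4]] += p
    return p5


def pushed_sums_p5():
    """Distributive law: sum out X1, then X2, ..., keeping 2-entry tables only."""
    msg = p1[:]                                   # current marginal of X_{k+1}
    for k in range(4):
        msg = [sum(msg[a] * T[k][a][b] for a in (0, 1)) for b in (0, 1)]
    return msg


print(brute_force_p5(), pushed_sums_p5())         # equal up to rounding
```

Both functions return the same distribution; the second never materializes the JPD.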
Variable elimination algorithm (computes ONE variable)
Wanted: $P(X_i)$.
Input: a list with all the functions (CPTs) of the problem.
Select an elimination order σ of all variables (except $X_i$).
For each $X_k$ in σ, if F is the set of functions that involve $X_k$:
- Delete F from the list.
- Eliminate $X_k$: combine (multiply) all the functions in F and marginalize out $X_k$, giving $f' = \sum_{x_k} \prod_{f \in F} f$.
- Add f' to the list.
Output: the combination (multiplication) of all the functions in the current list.
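The algorithm above can be sketched in Python as follows. The factor representation (a tuple of variable names plus a dictionary table), the restriction to binary variables and the function names are assumptions of this sketch, not part of the original material. Evidence is not handled; one common way is to drop the table rows inconsistent with e before eliminating and to normalize the output.

```python
from itertools import product

# A factor is a pair (variables, table): `table` maps a tuple of 0/1 values,
# one per variable in `variables`, to a non-negative number.
# Binary variables are assumed to keep the sketch short.


def multiply(f, g):
    """Combine two factors: pointwise product over the union of their scopes."""
    fv, ft = f
    gv, gt = g
    vs = tuple(dict.fromkeys(fv + gv))            # ordered union of scopes
    table = {}
    for vals in product((0, 1), repeat=len(vs)):
        a = dict(zip(vs, vals))
        table[vals] = ft[tuple(a[v] for v in fv)] * gt[tuple(a[v] for v in gv)]
    return vs, table


def sum_out(f, var):
    """Marginalize `var` out of factor f."""
    fv, ft = f
    i = fv.index(var)
    table = {}
    for vals, p in ft.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return fv[:i] + fv[i + 1:], table


def variable_elimination(factors, order):
    """Eliminate the variables in `order`, then combine the remaining factors."""
    factors = list(factors)
    for var in order:
        F = [f for f in factors if var in f[0]]             # functions with X_k
        if not F:
            continue
        factors = [f for f in factors if var not in f[0]]   # delete F
        combined = F[0]
        for f in F[1:]:
            combined = multiply(combined, f)
        factors.append(sum_out(combined, var))              # add f'
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result


# Usage on a hypothetical network Burglary -> Alarm <- Earthquake
# (illustrative numbers): P(A) with elimination order B, E.
pa1 = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
fB = (("B",), {(0,): 0.99, (1,): 0.01})
fE = (("E",), {(0,): 0.98, (1,): 0.02})
fA = (("A", "B", "E"), {(a, b, e): pa1[(b, e)] if a else 1 - pa1[(b, e)]
                        for a in (0, 1) for b in (0, 1) for e in (0, 1)})
scope, table = variable_elimination([fB, fE, fA], order=["B", "E"])
print(scope, table[(1,)])   # ('A',) and P(A=1) ≈ 0.0161142
```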
Variable elimination algorithm
Repeat the algorithm for each target variable.
Example with the Asia network
Nodes: Visit to Asia (A), Smoking (S), Tuberculosis (T), Lung cancer (L), Tuberculosis or lung cancer (E), Bronchitis (B), X-ray (X), Dyspnea (D).
Arcs: A → T, S → L, S → B, T → E, L → E, E → X, E → D, B → D.
Brute-force approach
Compute P(D) by brute force:
$P(d) = \sum_{a,s,t,l,e,b,x} P(a)\,P(s)\,P(t \mid a)\,P(l \mid s)\,P(b \mid s)\,P(e \mid t,l)\,P(x \mid e)\,P(d \mid e,b)$
Complexity is exponential in the size of the graph: with n variables of r states each, the JPD has $r^n$ entries.
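For comparison with the elimination steps that follow, here is a brute-force computation of $P(D)$ on the Asia network. The CPT values below are the ones commonly distributed with this benchmark network; they do not appear in the slides and should be treated as an assumption. The loop visits all $2^8 = 256$ joint configurations.

```python
from itertools import product

# Asia network, binary variables (1 = yes). CPT values: those commonly
# distributed with this benchmark (an assumption; the slides give none).


def p(value, p_yes):
    """P(X = value) for a binary X with P(X = 1) = p_yes."""
    return p_yes if value == 1 else 1.0 - p_yes


def joint(a, s, t, l, e, b, x, d):
    """Product of the eight CPT entries for one full configuration."""
    if e != (t or l):                  # E = T or L (deterministic node)
        return 0.0
    return (p(a, 0.01) *
            p(s, 0.5) *
            p(t, 0.05 if a else 0.01) *
            p(l, 0.1 if s else 0.01) *
            p(b, 0.6 if s else 0.3) *
            p(x, 0.98 if e else 0.05) *
            p(d, {(1, 1): 0.9, (1, 0): 0.7, (0, 1): 0.8, (0, 0): 0.1}[(e, b)]))


# Brute force: P(D) = sum over the other 7 variables of the full product.
p_d = {0: 0.0, 1: 0.0}
for a, s, t, l, e, b, x, d in product((0, 1), repeat=8):
    p_d[d] += joint(a, s, t, l, e, b, x, d)

print(p_d[1])   # ≈ 0.436
```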
Note: the intermediate functions produced when summing out variables are not necessarily probability terms.
Variable elimination algorithm
The biggest table built has size 8. Computations are local (thanks to moving the additions inward).
The elimination ordering matters, but finding an optimal (minimum-cost) ordering is NP-hard [Arnborg et al.’87]; heuristics are used to find good sequences.
Complexity is exponential in the maximum number of variables in the factors of the summation.
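The influence of the elimination ordering can be seen without any numbers, since only the scopes of the factors matter. The sketch below simulates VE on the scopes of the Asia CPTs and reports the largest number of variables in any combined factor; the two orderings (both keep D as the query) are hypothetical examples chosen for contrast, not heuristics from the slides.

```python
# Scopes of the Asia CPTs: P(A), P(S), P(T|A), P(L|S), P(B|S), P(E|T,L),
# P(X|E), P(D|E,B). Only scopes matter for this cost estimate.
ASIA_SCOPES = [{"A"}, {"S"}, {"T", "A"}, {"L", "S"}, {"B", "S"},
               {"E", "T", "L"}, {"X", "E"}, {"D", "E", "B"}]


def largest_factor(scopes, order):
    """Simulate VE symbolically; return the largest number of variables
    in any combined factor built along the way."""
    scopes = [set(s) for s in scopes]
    largest = max(len(s) for s in scopes)
    for var in order:
        involved = [s for s in scopes if var in s]
        scopes = [s for s in scopes if var not in s]
        combined = set().union(*involved)        # scope of the product
        largest = max(largest, len(combined))
        scopes.append(combined - {var})          # scope after summing out
    return largest


good = ["A", "X", "T", "L", "S", "E", "B"]       # hypothetical orderings
bad = ["S", "E", "A", "T", "L", "X", "B"]
print(largest_factor(ASIA_SCOPES, good), largest_factor(ASIA_SCOPES, bad))  # 3 6
```

With binary variables, the first ordering never builds a table larger than $2^3 = 8$ entries, while the second builds one with $2^6 = 64$ entries: eliminating S first links L and B, and eliminating E next joins six variables.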
Message passing algorithm
Operates by passing messages among the nodes of the network. Nodes act as processors that receive, calculate and send information. Such algorithms are called propagation algorithms.
Clique tree propagation is based on the same principle as VE, but with a sophisticated caching strategy that:
- makes it possible to compute the posterior probability distributions of all variables in twice the time it takes to compute that of a single variable;
- works in an intuitively appealing fashion, namely message propagation.
Basic operations for a node
Ask-info(i,j): target node i asks node j for information. It does so for all its neighbors j, and they do the same until there are no nodes left to ask.
Send-message(i,j): each node sends a message to the node that asked it for information, and so on until the target node is reached. A message is defined over the intersection of the domains of $f_i$ and $f_j$. It is computed as:
$m_{j \to i} = \sum_{\mathrm{dom}(f_j) \setminus \mathrm{dom}(f_i)} f_j \prod_{k \in ne(j) \setminus \{i\}} m_{k \to j}$
where ne(j) denotes the neighbors of j. Finally, we calculate locally at each node i:
$P(X_i) \propto \sum_{\mathrm{dom}(f_i) \setminus \{X_i\}} f_i \prod_{j \in ne(i)} m_{j \to i}$
i.e., the target combines all the information it received with its own and marginalizes onto the target variable.
Procedure for $X_2$: CollectEvidence (Ask).
$P(X_2)$ as a message passing algorithm
VE as a message passing algorithm
Direct correspondence between the VE steps and the messages.
[Figure: VE computations side by side with the corresponding messages.]
Computing the probabilities $P(X_i \mid e)$ of all (unobserved) variables i at once
We could perform the previous process for each node, but many messages would be repeated. Instead, we can use two rounds of messages:
1. Select a node as the root (or pivot).
2. Ask for, or collect, evidence from the leaves toward the root (messages in the downward direction), as in VE.
3. Distribute evidence from the root toward the leaves (messages in the upward direction).
4. Calculate the marginal distribution at each node by local computation, i.e. using its incoming messages.
This algorithm never constructs tables larger than those in the BN.
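The two rounds can be sketched on a hypothetical tree-shaped BN, X1 → X2, X1 → X3, X3 → X4. Because each node has at most one parent, each CPT $P(X_v \mid X_u)$ can be treated as a potential on the edge u-v and $P(X_1)$ as the root's own potential. The node names, the numbers and the helper functions are assumptions of this sketch. `collect` implements the first sweep toward the root, `distribute` the second, and `marginal` is the local computation at each node.

```python
# Hypothetical tree-shaped BN (each node has at most one parent), binary vars:
#   X1 -> X2,  X1 -> X3,  X3 -> X4.   Numbers are illustrative only.
prior = {"X1": [0.3, 0.7]}                        # P(X1)
cpt = {("X1", "X2"): [[0.9, 0.1], [0.4, 0.6]],    # cpt[(u, v)][a][b] = P(v=b | u=a)
       ("X1", "X3"): [[0.2, 0.8], [0.5, 0.5]],
       ("X3", "X4"): [[0.6, 0.4], [0.1, 0.9]]}
nodes = ["X1", "X2", "X3", "X4"]
neighbors = {n: [] for n in nodes}
for u, v in cpt:
    neighbors[u].append(v)
    neighbors[v].append(u)

messages = {}                                     # (sender, receiver) -> list


def local(n):
    """Node potential: the prior for the root node, all ones otherwise."""
    return prior.get(n, [1.0, 1.0])


def edge(u, v, a, b):
    """Edge potential psi(u=a, v=b): the CPT entry, whichever way the arc points."""
    return cpt[(u, v)][a][b] if (u, v) in cpt else cpt[(v, u)][b][a]


def send(j, i):
    """m_{j->i}(x_i) = sum_{x_j} local_j(x_j) psi(x_j, x_i) prod_{k != i} m_{k->j}(x_j)."""
    out = []
    for b in (0, 1):
        total = 0.0
        for a in (0, 1):
            term = local(j)[a] * edge(j, i, a, b)
            for k in neighbors[j]:
                if k != i:
                    term *= messages[(k, j)][a]
            total += term
        out.append(total)
    messages[(j, i)] = out


def collect(i, parent=None):        # first sweep: leaves -> root
    for j in neighbors[i]:
        if j != parent:
            collect(j, i)
            send(j, i)


def distribute(i, parent=None):     # second sweep: root -> leaves
    for j in neighbors[i]:
        if j != parent:
            send(i, j)
            distribute(j, i)


def marginal(i):
    """Combine the node's own potential with all incoming messages; normalize."""
    vals = list(local(i))
    for k in neighbors[i]:
        vals = [v * messages[(k, i)][a] for a, v in enumerate(vals)]
    z = sum(vals)
    return [v / z for v in vals]


collect("X1")
distribute("X1")
for n in nodes:
    print(n, marginal(n))
```

To condition on evidence, multiply the local potential of an observed node by an indicator (for instance `[0.0, 1.0]` for an observed value 1) before the two sweeps; the normalization in `marginal` then yields $P(X_i \mid e)$.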
Message passing algorithm
First sweep: CollectEvidence, toward the root node.
Second sweep: DistributeEvidence, away from the root node.
Networks with loops
If the network is not a polytree, the algorithm does not work. The independence assumptions applied in the algorithm cannot be used here: the property "any node separates the graph into two unconnected parts (polytrees)" no longer holds. Requests and messages go around a cycle indefinitely (information travels along two paths and is counted twice).
Alternatives?
Complexity
The complexity of propagation algorithms in polytrees (i.e., networks without loops, that is, without cycles in the underlying undirected graph) is linear in the size (nodes + arcs) of the network [brute force is exponential].
Exact inference in multiply connected BNs is an NP-complete problem [Cooper 1990].
Alternative: clustering methods [Lauritzen & Spiegelhalter’88]
This is the method implemented in the main BN software packages. Transform the BN into a probabilistically equivalent polytree by merging nodes, removing the multiple paths between two nodes.
Example: metastatic cancer (M) is a possible cause of brain tumors (B) and an explanation for increased total serum calcium (S). In turn, either of these could explain a patient falling into a coma (C). Severe headache (H) is also associated with brain tumors.
[Figure: original network M → S, M → B, S → C, B → C, B → H.]
Create a new node Z that combines S and B:
[Figure: clustered network M → Z, Z → C, Z → H, with Z = (S, B).]
- States of Z: {tt, ft, tf, ff}.
- P(Z | M) = P(S | M) P(B | M), since S and B are conditionally independent given M.
- P(H | Z) = P(H | B), since H is conditionally independent of S given B.
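The clustering step can be checked numerically. In this sketch the CPT numbers for M, S, B, C and H are illustrative assumptions (the slides give none); the code builds $P(Z \mid M) = P(S \mid M)P(B \mid M)$ for the compound node Z = (S, B) and confirms that the polytree M → Z → {C, H} gives the same marginals for C and H as the original network.

```python
from itertools import product

# Metastatic-cancer network, binary variables (1 = true).
# All CPT numbers are illustrative assumptions (the slides give none).
P_M1 = 0.2                                     # P(M=1)
P_S1 = {1: 0.8, 0: 0.2}                        # P(S=1 | M)
P_B1 = {1: 0.2, 0: 0.05}                       # P(B=1 | M)
P_C1 = {(1, 1): 0.8, (1, 0): 0.8, (0, 1): 0.8, (0, 0): 0.05}  # P(C=1 | S, B)
P_H1 = {1: 0.8, 0: 0.6}                        # P(H=1 | B)


def bern(v, p1):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1.0 - p1


# Compound node Z = (S, B); states tt, ft, tf, ff written as (s, b) pairs.
Z_STATES = [(1, 1), (0, 1), (1, 0), (0, 0)]
P_Z = {m: {z: bern(z[0], P_S1[m]) * bern(z[1], P_B1[m]) for z in Z_STATES}
       for m in (0, 1)}                        # P(Z | M) = P(S | M) P(B | M)


def p_c1_original():
    return sum(bern(m, P_M1) * bern(s, P_S1[m]) * bern(b, P_B1[m]) * P_C1[(s, b)]
               for m, s, b in product((0, 1), repeat=3))


def p_c1_clustered():
    # P(C | Z) is P(C | S, B) read with z = (s, b)
    return sum(bern(m, P_M1) * P_Z[m][z] * P_C1[z] for m in (0, 1) for z in Z_STATES)


def p_h1_original():
    return sum(bern(m, P_M1) * bern(b, P_B1[m]) * P_H1[b]
               for m, b in product((0, 1), repeat=2))


def p_h1_clustered():
    # P(H | Z) = P(H | B): only the b component of z is used
    return sum(bern(m, P_M1) * P_Z[m][z] * P_H1[z[1]] for m in (0, 1) for z in Z_STATES)


print(p_c1_original(), p_c1_clustered())
print(p_h1_original(), p_h1_clustered())
```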
Alternative: clustering methods
Steps of the JUNCTION TREE CLUSTERING ALGORITHM:
1. Moralize the BN.
2. Triangulate the moral graph and obtain the cliques.
3. Create the junction tree and its separators.
4. Compute the new parameters.
5. Run the message passing algorithm.
COMPILATION (steps 1 to 4) transforms the BN into a polytree: it is slow and needs much memory if the network is dense, but it is done only once. Belief updating (step 5) is fast.
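Steps 1 and 2 can be sketched on the metastatic-cancer example above. The function names and the elimination order used for triangulation are assumptions of this sketch (finding a good order is itself a hard problem, handled heuristically in practice); steps 3 to 5 are not shown.

```python
from itertools import combinations

# DAG of the metastatic-cancer example: node -> list of parents.
dag = {"M": [], "S": ["M"], "B": ["M"], "C": ["S", "B"], "H": ["B"]}


def moralize(dag):
    """Step 1: link ("marry") the parents of each child, then drop directions."""
    adj = {v: set() for v in dag}
    for child, parents in dag.items():
        for p in parents:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(parents, 2):
            adj[p].add(q)
            adj[q].add(p)
    return adj


def triangulate_and_cliques(adj, order):
    """Step 2: eliminate nodes in `order`, adding fill-in edges among each
    node's remaining neighbors; the node plus those neighbors is a candidate
    clique. Only the maximal candidates are returned."""
    adj = {v: set(n) for v, n in adj.items()}
    candidates = []
    for v in order:
        nbrs = adj[v]
        candidates.append(frozenset(nbrs | {v}))
        for p, q in combinations(nbrs, 2):        # fill-in edges
            adj[p].add(q)
            adj[q].add(p)
        for n in nbrs:
            adj[n].discard(v)
        del adj[v]
    return [c for c in candidates if not any(c < other for other in candidates)]


moral = moralize(dag)
print(sorted(moral["S"]))                         # ['B', 'C', 'M']
cliques = triangulate_and_cliques(moral, ["H", "C", "M", "S", "B"])
print([sorted(c) for c in cliques])               # [['B', 'H'], ['B', 'C', 'S'], ['B', 'M', 'S']]
```

Moralization adds the edge S-B (both are parents of C); the resulting cliques {M, S, B}, {S, B, C} and {B, H} match the compound node Z = (S, B) used above.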
Approximate inference
Why? Because exact inference is intractable (NP-complete) for large (40+ variables) and densely connected BNs: the cliques of the junction tree algorithm, or the intermediate factors of the VE algorithm, grow in size, generating an exponential blowup in the number of computations performed.
Both deterministic and stochastic simulation methods are used to find approximate answers.
Stochastic simulation
Use the network to generate a large number of cases (full instantiations) from the network distribution. $P(X_i \mid e)$ is then estimated from these cases by counting observed frequencies in the samples. By the Law of Large Numbers, the estimate converges to the exact probability as more cases are generated.
Approximate propagation in BNs within an arbitrary tolerance or accuracy is still an NP-complete problem. In practice, if e is not too unlikely, convergence is quick.
Probabilistic logic sampling [Henrion’88]
A forward sampling algorithm. Given an ancestral ordering of the nodes (parents before children), generate a value for X once values have been generated for its parents (i.e., from the root nodes down to the leaves), using the conditional probability given the known values of the parents. When all the nodes have been visited, we have a case: an instantiation of all the nodes in the BN. Repeat, and use the observed frequencies among the cases consistent with the evidence e to estimate $P(X_i \mid e)$.
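A minimal sketch of probabilistic logic sampling on the Asia network. The CPT values are the ones commonly distributed with this benchmark (an assumption; the slides give none), and the function names are hypothetical. Evidence is handled by discarding the cases that disagree with e, which is why the method slows down when e is unlikely.

```python
import random

# Asia network, binary variables (1 = yes). CPT values: those commonly
# distributed with this benchmark (assumed). The insertion order of CPT is
# an ancestral ordering: every parent appears before its children.
CPT = {  # variable -> P(variable = 1 | parents), read from the partial case c
    "A": lambda c: 0.01,
    "S": lambda c: 0.5,
    "T": lambda c: 0.05 if c["A"] else 0.01,
    "L": lambda c: 0.1 if c["S"] else 0.01,
    "B": lambda c: 0.6 if c["S"] else 0.3,
    "E": lambda c: 1.0 if (c["T"] or c["L"]) else 0.0,   # E = T or L
    "X": lambda c: 0.98 if c["E"] else 0.05,
    "D": lambda c: {(1, 1): 0.9, (1, 0): 0.7,
                    (0, 1): 0.8, (0, 0): 0.1}[(c["E"], c["B"])],
}


def sample_case(rng):
    """One forward pass, roots to leaves: a full instantiation of the BN."""
    case = {}
    for var, p_yes in CPT.items():
        case[var] = 1 if rng.random() < p_yes(case) else 0
    return case


def logic_sampling(target, evidence, n, seed=0):
    """Estimate P(target = 1 | evidence) from the cases consistent with it."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        case = sample_case(rng)
        if all(case[v] == val for v, val in evidence.items()):
            kept += 1
            hits += case[target]
    return hits / kept if kept else float("nan")


print(logic_sampling("S", {"D": 1}, 100_000))   # should be near 0.634
```

Under these CPTs the exact value of $P(S = \text{yes} \mid D = \text{yes})$ is about 0.634, so the estimate can be compared directly with an exact method; roughly 56% of the generated cases are discarded because they have D = no.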
Software
- genie.sis.pitt.edu
- http.cs.berkeley.edu/~murphyk/
- leo.ugr.es/elvira