Bayesian Networks. Speaker: 虞台文, Intelligent Multimedia Lab, Graduate Institute of Computer Science and Engineering, Tatung University (大同大學).

Contents: Introduction; Probability Theory (skipped); Inference; Clique Tree Propagation; Building the Clique Tree; Inference by Propagation.

Bayesian Networks — Introduction

What Is a Bayesian Network? A Bayesian network is a directed acyclic graph (DAG) with an associated set of probability tables. The nodes are random variables, and the topology of the graph induces certain independence relations among them.

Why Use a Bayesian Network? To deal with uncertainty in inference via probability (Bayes' rule); to handle incomplete data sets, e.g., in classification and regression; and to model domain knowledge, e.g., causal relationships.

Example: use a DAG to model causality. (Figure: a DAG with nodes Train Strike, Norman Oversleep, Martin Oversleep, Martin Late, Norman Late, Boss Failure-in-Love, Project Delay, Office Dirty, and Boss Angry.)

Example: attach prior probability tables to all root nodes. P(Martin Oversleep = T) = 0.01, P(Train Strike = T) = 0.1, P(Norman Oversleep = T) = 0.2, P(Boss Failure-in-Love = T) = 0.01 (with the complementary F entries 0.99, 0.9, 0.8, 0.99).

Example: attach conditional probability tables (CPTs) to non-root nodes; each column sums to 1. E.g., P(Norman Untidy = T | Norman Oversleep = T) = 0.6 and P(Norman Untidy = T | Norman Oversleep = F) = 0.2. The Martin Late CPT conditions on both Train Strike and Martin Oversleep (entries shown on the slide: 0.95, 0.8, 0.7 for Late = T and 0.05, 0.2, 0.3 for Late = F across the parent configurations).

Example: attach conditional probability tables to non-root nodes; each column sums to 1. Boss Angry takes the values very/mid/little/no and conditions on Boss Failure-in-Love, Project Delay, and Office Dirty (partial entries shown: very 0.98, 0.85, 0.6, 0.5, 0.3, 0.2, 0.01; mid 0.02, 0.15, 0.25; little 0.1, 0.7, 0.07; no 0.9). Discussion: what is the difference between probability and fuzzy measurements?

Medical Knowledge Example

Definition of Bayesian Networks. A Bayesian network is a directed acyclic graph with the following properties: each node represents a random variable, and each node representing a variable A with parent nodes representing variables B1, B2, ..., Bn is assigned a conditional probability table (CPT) specifying P(A | B1, B2, ..., Bn).

Problems: How to do inference? How to learn the probabilities from data? How to learn the structure from data? What applications may we have? Bad news: all of these are NP-hard in general.

Bayesian Networks — Inference

Inference

Example. A three-node network: Train Strike → Martin Late and Train Strike → Norman Late, with P(Train Strike = T) = 0.1; P(Martin Late = T | Train Strike = T) = 0.6, P(Martin Late = T | Train Strike = F) = 0.5; P(Norman Late = T | Train Strike = T) = 0.8, P(Norman Late = T | Train Strike = F) = 0.1. Questions: P(Martin Late, Norman Late, Train Strike) = ? (joint distribution); P(Martin Late) = ? (marginal distribution); P(Martin Late | Norman Late) = ? (conditional distribution).

Example (Demo): the joint distribution. With A = Martin Late, B = Norman Late, C = Train Strike, P(A, B, C) = P(C) P(A | C) P(B | C); e.g., P(A = T, B = T, C = T) = 0.1 × 0.6 × 0.8 = 0.048. The joint table's entries are 0.048, 0.012, 0.032, 0.008 (for C = T) and 0.045, 0.405, 0.045, 0.405 (for C = F).

Example (Demo): the marginal distribution P(Martin Late, Norman Late), obtained by summing the joint table over Train Strike: P(A = T, B = T) = 0.048 + 0.045 = 0.093, P(A = F, B = T) = 0.032 + 0.045 = 0.077, P(A = T, B = F) = 0.012 + 0.405 = 0.417, P(A = F, B = F) = 0.008 + 0.405 = 0.413.

Example (Demo): the marginal distribution P(Martin Late), obtained by summing further over Norman Late: P(A = T) = 0.093 + 0.417 = 0.51, P(A = F) = 0.077 + 0.413 = 0.49.

Example (Demo): the conditional distribution P(Martin Late | Norman Late). From the marginal P(B = T) = 0.093 + 0.077 = 0.17 and P(B = F) = 0.83, e.g. P(A = T | B = T) = 0.093 / 0.17 ≈ 0.547. All of these numbers can be checked by brute-force enumeration, as in the sketch below.
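
The sketch below (Python) is not from the slides; it simply enumerates the joint table of this three-node example from its CPTs and reproduces the joint, marginal, and conditional numbers above. Variable names are illustrative.

```python
# Brute-force inference on the Train Strike / Martin Late / Norman Late example.
from itertools import product

p_ts = {True: 0.1, False: 0.9}                        # P(Train Strike)
p_ml = {True: {True: 0.6, False: 0.4},                # P(Martin Late | Train Strike)
        False: {True: 0.5, False: 0.5}}
p_nl = {True: {True: 0.8, False: 0.2},                # P(Norman Late | Train Strike)
        False: {True: 0.1, False: 0.9}}

# Joint distribution P(ts, ml, nl) = P(ts) P(ml | ts) P(nl | ts)
joint = {(ts, ml, nl): p_ts[ts] * p_ml[ts][ml] * p_nl[ts][nl]
         for ts, ml, nl in product([True, False], repeat=3)}

p_ml_marg = sum(v for (ts, ml, nl), v in joint.items() if ml)           # P(ML = T)
p_nl_marg = sum(v for (ts, ml, nl), v in joint.items() if nl)           # P(NL = T)
p_ml_and_nl = sum(v for (ts, ml, nl), v in joint.items() if ml and nl)  # P(ML = T, NL = T)

print(joint[(True, True, True)])          # 0.048
print(round(p_ml_marg, 3))                # 0.51
print(round(p_nl_marg, 3))                # 0.17
print(round(p_ml_and_nl / p_nl_marg, 3))  # P(ML = T | NL = T) ≈ 0.547
```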

Inference Methods. Exact algorithms: probability propagation, variable elimination, cutset conditioning, dynamic programming. Approximation algorithms: variational methods, sampling (Monte Carlo) methods, loopy belief propagation, bounded cutset conditioning, parametric approximation methods.

Independence Assertions. Bayesian networks have built-in independence assertions. An independence assertion is a statement of the form "X and Y are independent given Z"; we then say that X and Y are d-separated by Z. That is, P(X | Y, Z) = P(X | Z), or equivalently P(X, Y | Z) = P(X | Z) P(Y | Z). The given (conditioning) variables are called evidence.

d-Separation. (Figure: an example network with nodes Y1–Y4, Z, X1–X3, W1, W2, used in the next slides.)

Types of Connections through a node Z (referring to the figure on the previous slide): serial connections (e.g., Yi – Z – Xj), converging connections (e.g., Y1/Y2 – Z – Y3/Y4), and diverging connections (e.g., Xi – Z – Xj).

d-Separation. (Figure: the three canonical connections between X and Y through Z — serial, converging, and diverging.)

Joint Distribution. By the chain rule, P(X1, ..., Xn) = ∏ᵢ P(Xi | X1, ..., Xi−1); by the independence assertions this reduces to P(X1, ..., Xn) = ∏ᵢ P(Xi | πᵢ), where πᵢ denotes the parents of Xi. With this factorization we can compute all probabilities. JPT: joint probability table; CPT: conditional probability table. (Figure: an 11-node example network X1–X11.) For binary random variables, storing the JPT of all variables takes 2ⁿ − 1 table entries; how many entries do the CPTs take?

Joint Distribution (continued). For n binary random variables, storing the full JPT takes 2ⁿ − 1 table entries; storing the CPTs takes how many? (Figure: the example network again.)

Joint Distribution (continued). For the 11-node binary example, the JPT takes 2¹¹ − 1 = 2047 entries, while the CPT of a node Xi needs only 2^|πᵢ| entries (the figure annotates nodes with counts such as 1, 2, 4, and 8), so the CPTs together need far fewer entries. A count is sketched below.
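
A quick back-of-the-envelope check of the storage counts. The parent sets below are hypothetical (the slide's 11-node figure is not reproduced here); the only point is that CPT storage grows with 2^|πᵢ| per node rather than 2ⁿ overall.

```python
# Storage needed for binary variables: full JPT vs. per-node CPTs.
# The parent structure below is hypothetical, used only to illustrate the count.
parents = {
    "X1": [], "X2": ["X1"], "X3": ["X1"], "X4": ["X2"],
    "X5": ["X2", "X3"], "X6": ["X4"], "X7": ["X5"],
    "X8": ["X6"], "X9": ["X5", "X6"], "X10": ["X7"], "X11": ["X9", "X10"],
}

n = len(parents)
jpt_entries = 2 ** n - 1                                   # independent entries of the joint table
cpt_entries = sum(2 ** len(p) for p in parents.values())   # one free entry per parent configuration

print(jpt_entries)   # 2047
print(cpt_entries)   # 27 for this hypothetical structure -- far smaller
```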

More on d-Separation. A path from X to Y is d-connecting w.r.t. evidence nodes E if every interior node N on the path has the property that either (1) it is serial (linear) or diverging and not a member of E, or (2) it is converging and either N or one of its descendants is in E.

More on d-Separation. Exercise: identify the d-connecting and non-d-connecting paths from X to Y in the example network, given the evidence nodes E.

More on d-Separation. Two nodes are d-separated if there is no d-connecting path between them. Exercise: remove the minimum number of edges so that X and Y are d-separated (see the figure).

More on d-Separation. Two sets of nodes, say X = {X1, ..., Xm} and Y = {Y1, ..., Yn}, are d-separated w.r.t. evidence nodes E if every pair Xi, Yj is d-separated w.r.t. E. In this case we have P(X, Y | E) = P(X | E) P(Y | E). A mechanical check is sketched below.
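
The path-based definition above can be tested mechanically. The sketch below uses a standard equivalent criterion (not the slides' path enumeration): X and Y are d-separated by E iff they are disconnected in the moralized graph of the ancestral subgraph of X ∪ Y ∪ E. The DAG in the usage example is hypothetical.

```python
# d-separation test via the moralized ancestral graph (a standard equivalent criterion).
# A DAG is encoded as {node: list of parents}.
from itertools import combinations

def ancestors(dag, nodes):
    """All nodes in `nodes` plus their ancestors."""
    result, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in result:
            result.add(v)
            stack.extend(dag[v])
    return result

def d_separated(dag, xs, ys, es):
    """True iff every node in xs is d-separated from every node in ys given evidence es."""
    keep = ancestors(dag, set(xs) | set(ys) | set(es))
    # Moralize the ancestral subgraph: undirected parent-child and parent-parent edges.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in dag[v] if p in keep]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    # Remove the evidence nodes, then test reachability from xs to ys.
    blocked = set(es)
    frontier, seen = [x for x in xs if x not in blocked], set()
    while frontier:
        v = frontier.pop()
        if v in ys:
            return False
        if v in seen or v in blocked:
            continue
        seen.add(v)
        frontier.extend(adj[v] - seen - blocked)
    return True

# Hypothetical DAG: A -> C, B -> C, C -> D
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
print(d_separated(dag, {"A"}, {"B"}, set()))   # True  (converging node C unobserved)
print(d_separated(dag, {"A"}, {"B"}, {"D"}))   # False (a descendant of C is observed)
```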

Bayesian Networks — Clique Tree Propagation

References. Developed by Lauritzen and Spiegelhalter and refined by Jensen et al. Lauritzen, S. L., and Spiegelhalter, D. J., Local computations with probabilities on graphical structures and their application to expert systems, J. Roy. Stat. Soc. B, 50, 157–224, 1988. Jensen, F. V., Lauritzen, S. L., and Olesen, K. G., Bayesian updating in causal probabilistic networks by local computations, Comp. Stat. Quart., 4, 269–282, 1990. Shenoy, P., and Shafer, G., Axioms for probability and belief-function propagation, in Uncertainty in Artificial Intelligence, Vol. 4 (R. D. Shachter, T. Levitt, J. F. Lemmer and L. N. Kanal, Eds.), Elsevier, North-Holland, Amsterdam, 169–198, 1990.

Clique Tree Propagation (CTP). Given a Bayesian network, build a secondary structure called a clique tree (an undirected tree). Inference is done by propagating belief potentials among the tree nodes. It is an exact algorithm.

Notation. Random variables: uppercase when uninstantiated (A, B, C), lowercase when instantiated (a, b, c). Random vectors: boldface uppercase when uninstantiated (X, Y, Z), boldface lowercase when instantiated (x, y, z).

Definition: Family of a Node. The family of a node V, denoted F_V, is defined as F_V = {V} ∪ π_V, i.e., V together with its parents. Examples refer to the eight-node network (A–H) used in the following slides.

Potentials and Distributions. We model the probability tables as potential functions; all of these tables map an instantiation of a set of random variables to a real value. Examples: P(a) is a prior probability and a function of a (e.g., P(a = on) = 0.5); P(b | a) is a conditional probability and a function of a and b (entries 0.7, 0.2, 0.3, 0.8); P(f | d, e) is a conditional probability and a function of d, e, and f (entries shown: 0.95, 0.8, 0.7, ..., 0.05, 0.2, 0.3, ...).

Potentials. A potential is used to implement matrices or tables. Two operations are needed: (1) marginalization — given a potential φ_Y over a set Y and X ⊆ Y, φ_X = Σ_{Y∖X} φ_Y; (2) multiplication — φ_Z = φ_X · φ_Y over Z = X ∪ Y, where each product is taken over instantiations that are consistent with each other.

Marginalization example: starting from φ_ABC (the joint table with entries 0.048, 0.032, 0.012, 0.008, 0.045, 0.405, ...), summing out C gives φ_AB (0.093, 0.077, 0.417, 0.413), and summing out B gives φ_A (A = T: 0.51, A = F: 0.49).

Multiplication example: φ_ABC(a, b, c) = φ_AB(a, b) · φ_BC(b, c), where the instantiations (a, b) and (b, c) are consistent with (a, b, c). With φ_AB = (0.093, 0.077, 0.417, 0.413) and φ_BC = (0.08, 0.02, 0.09, 0.91), the products are 0.093×0.08 = 0.00744, 0.077×0.08 = 0.00616, 0.417×0.02 = 0.00834, 0.413×0.02 = 0.00826, 0.093×0.09 = 0.00837, 0.077×0.09 = 0.00693, 0.417×0.91 = 0.37947, 0.413×0.91 = 0.37583. The resulting potential does not necessarily sum to one.
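
Both potential operations can be implemented directly on tables keyed by variable instantiations. A minimal dictionary-based sketch (not from the slides; the column ordering of φ_BC is my reading of the flattened table above) that reproduces the marginalization and multiplication numbers:

```python
# Potentials as tables: keys are tuples of (variable, value) pairs.
from itertools import product

def potential(vars_, values, table):
    """Build a potential over `vars_`; `table` lists entries in product order of `values`."""
    keys = [tuple(zip(vars_, combo)) for combo in product(values, repeat=len(vars_))]
    return dict(zip(keys, table))

def marginalize(phi, keep):
    """Sum a potential down onto the variables in `keep`."""
    out = {}
    for key, val in phi.items():
        sub = tuple((v, x) for v, x in key if v in keep)
        out[sub] = out.get(sub, 0.0) + val
    return out

def multiply(phi1, phi2):
    """Pointwise product over consistent instantiations (spans the union of the variables)."""
    out = {}
    for k1, v1 in phi1.items():
        for k2, v2 in phi2.items():
            merged, ok = dict(k1), True
            for var, val in k2:
                if var in merged and merged[var] != val:
                    ok = False
                    break
                merged[var] = val
            if ok:
                key = tuple(sorted(merged.items()))
                out[key] = out.get(key, 0.0) + v1 * v2
    return out

# Reproduce the slides' numbers (A = Martin Late, B = Norman Late, C = Train Strike).
phi_abc = potential(["A", "B", "C"], [True, False],
                    [0.048, 0.045, 0.012, 0.405, 0.032, 0.045, 0.008, 0.405])
phi_ab = marginalize(phi_abc, {"A", "B"})    # 0.093, 0.417, 0.077, 0.413
phi_a  = marginalize(phi_ab, {"A"})          # 0.51, 0.49
phi_bc = potential(["B", "C"], [True, False], [0.08, 0.09, 0.02, 0.91])
phi_abc2 = multiply(phi_ab, phi_bc)          # e.g. entry (A=T, B=T, C=T) = 0.093 * 0.08 = 0.00744
print(phi_a)
```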

The Secondary Structure. Given a Bayesian network over a set of variables U = {V1, ..., Vn}, its secondary structure contains a graphical and a numerical component. Graphical component: an undirected clique tree that satisfies the join tree property. Numerical component: belief potentials on its nodes and edges.

The Clique Tree T. How do we build a clique tree? The clique tree T for a belief network over a set of variables U = {V1, ..., Vn} satisfies the following properties: (1) each node in T is a cluster (clique), i.e., a nonempty set of variables; (2) the clusters satisfy the join tree property: given two clusters X and Y in T, all clusters on the path between X and Y contain X ∩ Y; (3) for each variable V ∈ U, the family F_V is included in at least one cluster. Sepsets: each edge in T is labeled with the intersection of the adjacent clusters. (Example for the eight-node network A–H: clusters ABD, ADE, ACE, CEG, DEF, EGH with sepsets AD, AE, CE, DE, EG.)

The Numeric Component. How do we assign belief functions? Clusters and sepsets are attached with belief potentials. Local consistency: for each cluster X and neighboring sepset S, Σ_{X∖S} φ_X = φ_S. Global consistency: the potentials jointly encode the distribution, ∏_clusters φ_X / ∏_sepsets φ_S = P(U).

The Numeric Component (continued). The key step to satisfying these constraints is to let each CPT P(V | π_V) be multiplied into exactly one cluster containing F_V and to set every sepset potential to 1. If so, ∏_clusters φ_X / ∏_sepsets φ_S = ∏_V P(V | π_V) = P(U).

Bayesian Networks — Building the Clique Tree

The Steps: Belief Network → Moral Graph → Triangulated Graph → Clique Set → Join Tree.

Moral Graph (step: Belief Network → Moral Graph). Convert the directed graph to an undirected one, and connect each pair of parent nodes of every node. (Figure: the eight-node network A–H and its moral graph.)
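
Moralization is mechanical: drop edge directions and marry co-parents. A minimal sketch; the parent sets in the usage example are a reconstruction of the slides' eight-node A–H network.

```python
# Moral graph: undirected version of the DAG plus an edge between every pair of co-parents.
from itertools import combinations

def moral_graph(dag):
    """`dag` maps each node to the list of its parents; returns an undirected adjacency dict."""
    adj = {v: set() for v in dag}
    for child, ps in dag.items():
        for p in ps:                          # keep every original edge, now undirected
            adj[child].add(p); adj[p].add(child)
        for p, q in combinations(ps, 2):      # "marry" each pair of parents
            adj[p].add(q); adj[q].add(p)
    return adj

# Reconstruction of the eight-node A-H example from the slides.
dag = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["C"],
       "F": ["D", "E"], "G": ["C"], "H": ["E", "G"]}
print(moral_graph(dag))   # adds the marriage edges D-E (parents of F) and E-G (parents of H)
```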

Triangulation (step: Moral Graph → Triangulated Graph). Triangulate the graph so that every cycle of length four or more has a chord. There are many ways to do this; in practice this step is carried out together with the next step (clique selection). (Figure: the moral graph and one triangulation of it.)

Select Clique Set (step: Triangulated Graph → Clique Set). Copy the moral graph GM to GM′. While GM′ is not empty: select a node V from GM′ according to a criterion (next slide); V and its neighbors form a cluster; connect all nodes in the cluster; for each edge added to GM′, add the same edge to GM; then remove V from GM′.

Select Clique Set — criterion for choosing V: the weight of a node V is the number of values of V, and the weight of a cluster is the product of the weights of its constituent nodes. Choose the node whose elimination causes the fewest edges to be added, breaking ties by choosing the node that induces the cluster with the smallest weight.

Select Clique Set — elimination order for the example (eliminated vertex / induced cluster / edges added): H / EGH / none; G / CEG / none; F / DEF / none; C / ACE / {A, E}; B / ABD / {A, D}; D / ADE / none; E / AE / none; A / A / none.
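
The clique-selection loop translates almost line for line into code. A sketch under the stated criterion (fewest fill-in edges, ties broken by smallest cluster weight); since ties may be broken arbitrarily, the usage example passes the slide's elimination order explicitly so that it reproduces the table above. Clusters that are subsets of an already-saved cluster (AE and A in the example) are dropped, which leaves the six maximal cliques.

```python
# Triangulate by node elimination and collect the induced (maximal) clusters.
from itertools import combinations

def eliminate_cliques(moral, weights, order=None):
    """`moral`: undirected adjacency dict; `weights`: number of values per variable;
    `order`: optional explicit elimination order (otherwise the greedy criterion is used)."""
    g = {v: set(nbrs) for v, nbrs in moral.items()}     # working copy GM'
    cliques = []
    while g:
        def fill_in(v):          # edges that eliminating v would add
            return [(p, q) for p, q in combinations(g[v], 2) if q not in g[p]]
        def cluster_weight(v):   # product of weights over {v} | neighbors(v)
            w = weights[v]
            for u in g[v]:
                w *= weights[u]
            return w
        if order:                # follow a given elimination order
            v = next(x for x in order if x in g)
        else:                    # greedy: fewest fill-ins, then smallest cluster weight
            v = min(g, key=lambda x: (len(fill_in(x)), cluster_weight(x)))
        cluster = {v} | g[v]
        if not any(cluster <= c for c in cliques):      # save only maximal clusters
            cliques.append(cluster)
        for p, q in fill_in(v):                         # add fill-in edges
            g[p].add(q); g[q].add(p)
        for u in g[v]:                                  # remove v from GM'
            g[u].discard(v)
        del g[v]
    return cliques

# The A-H moral graph (all variables binary); the order H,G,F,C,B,D,E,A reproduces
# the clusters EGH, CEG, DEF, ACE, ABD, ADE from the table above.
moral = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "E", "G"},
         "D": {"B", "E", "F"}, "E": {"C", "D", "F", "G", "H"},
         "F": {"D", "E"}, "G": {"C", "E", "H"}, "H": {"E", "G"}}
print(eliminate_cliques(moral, {v: 2 for v in moral}, order=list("HGFCBDEA")))
```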

Building an Optimal Join Tree (step: Clique Set → Join Tree). We need a minimal number of edges to connect the cliques into a tree: given n cliques, n − 1 edges (sepsets) are required. There are many ways to connect them — how do we achieve optimality?

Building an Optimal Join Tree. Begin with a forest of n trees, each consisting of a single clique, and an empty set S. For each distinct pair of cliques X and Y, create a candidate sepset S_XY = X ∩ Y with backpointers to X and Y, and insert it into S. Then repeat until n − 1 sepsets have been inserted into the forest: select a sepset S_XY from S according to the criterion on the next slide, delete it from S, and insert it between cliques X and Y only if X and Y are in different trees of the forest.

Building an Optimal Join Tree — criterion for choosing S_XY: the mass of S_XY is the number of variables in X ∩ Y; the cost of S_XY is the weight of X plus the weight of Y, where the weight of a node V is the number of values of V and the weight of a set of nodes is the product of its constituent nodes' weights. Choose the sepset with the largest mass, breaking ties by choosing the sepset with the smallest cost.

Building an Optimal Join Tree — result for the example (graphical transformation): clusters ABD, ADE, ACE, CEG, DEF, EGH connected by sepsets AD, AE, CE, DE, EG.
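
The sepset-selection procedure is essentially Kruskal's algorithm with a (largest mass, smallest cost) priority. A sketch (not from the slides) that connects the six example cliques into the join tree shown above:

```python
# Build a join tree by inserting candidate sepsets in (largest mass, smallest cost) order,
# skipping any sepset whose two cliques already lie in the same tree (union-find).
from itertools import combinations

def build_join_tree(cliques, weights):
    parent = list(range(len(cliques)))            # union-find over clique indices
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    def weight(s):
        w = 1
        for v in s:
            w *= weights[v]
        return w
    candidates = [(i, j, cliques[i] & cliques[j])
                  for i, j in combinations(range(len(cliques)), 2)]
    # mass = |X ∩ Y| (larger first); cost = weight(X) + weight(Y) (smaller first)
    candidates.sort(key=lambda c: (-len(c[2]), weight(cliques[c[0]]) + weight(cliques[c[1]])))
    edges = []
    for i, j, sep in candidates:
        if len(edges) == len(cliques) - 1:
            break
        ri, rj = find(i), find(j)
        if ri != rj:                              # insert only between different trees
            parent[ri] = rj
            edges.append((cliques[i], sep, cliques[j]))
    return edges

cliques = [{"A","B","D"}, {"A","D","E"}, {"A","C","E"}, {"C","E","G"}, {"D","E","F"}, {"E","G","H"}]
for x, sep, y in build_join_tree(cliques, {v: 2 for v in "ABCDEFGH"}):
    print(sorted(x), sorted(sep), sorted(y))   # the five sepsets AD, AE, DE, CE, EG
```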

Bayesian Networks — Inference by Propagation

Inference. PPTC: Probability Propagation in Trees of Cliques. Two cases: inference without evidence, and inference with evidence.

Inference without Evidence (Demo on the office example network from the Introduction).

Procedure for PPTC without Evidence: Belief Network → (graphical transformation; building the graphic component) → Join Tree Structure → (initialization; building the numeric component) → Inconsistent Join Tree → (propagation) → Consistent Join Tree → (marginalization).

Initialization. For each cluster and sepset X, set φ_X(x) to 1 for every instantiation x. Then, for each variable V: assign V a cluster X that contains F_V (call X the parent cluster of F_V) and multiply φ_X by P(V | π_V).

Initialization — example. The cluster ACE is the parent cluster of both F_C and F_E, so its potential starts at 1 and is multiplied by P(c | a) (0.7, 0.3 for a = on; 0.2, 0.8 for a = off) and by P(e | c) (0.3, 0.7 for c = on; 0.6, 0.4 for c = off), giving φ_ACE(a, c, e) = 0.21, 0.49, 0.18, 0.12, 0.06, 0.14, 0.48, 0.32 over the instantiations of (a, c, e).

Initialization (continued). With N clusters and Q variables, the independence assertions give ∏_{i=1..N} φ_{Xi} = ∏_{j=1..Q} P(Vj | π_{Vj}) = P(U), since every CPT is multiplied into exactly one cluster and every sepset potential is still 1.

Initialization (continued). Hence, after initialization the join tree already encodes P(U): global consistency is satisfied, but local consistency is not.
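
Initialization only needs a containment test: each family F_V is multiplied into one cluster that contains it. The sketch below performs just the assignment step (the actual table multiplication works as in the potential sketch earlier); the parents and clusters follow the A–H example.

```python
# Assign each variable's family F_V = {V} | parents(V) to one cluster that contains it;
# that cluster's potential (initialized to all ones) then absorbs the CPT P(V | parents).
def assign_parent_clusters(parents, clusters):
    assignment = {}
    for v, ps in parents.items():
        family = {v} | set(ps)
        assignment[v] = next(c for c in clusters if family <= c)
    return assignment

parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B"], "E": ["C"],
           "F": ["D", "E"], "G": ["C"], "H": ["E", "G"]}
clusters = [frozenset("ABD"), frozenset("ADE"), frozenset("ACE"),
            frozenset("CEG"), frozenset("DEF"), frozenset("EGH")]
for v, c in assign_parent_clusters(parents, clusters).items():
    print(v, "->", sorted(c))   # e.g. both C and E map to ACE, as in the worked example above
```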

Global Propagation. Global propagation is used to achieve local consistency. Let us consider a single message pass from a cluster X to a neighboring cluster Y through sepset R first. Projection on the sepset: save the old sepset potential and set φ_R^new = Σ_{X∖R} φ_X. Absorption on the receiving cluster: φ_Y ← φ_Y · φ_R^new / φ_R^old.

The Effect of a Single Message Pass: projection on the sepset followed by absorption on the receiving cluster (formulas as above). After the pass, the sender X and the sepset R are consistent, and the encoded joint distribution ∏_clusters / ∏_sepsets is unchanged.

Global Propagation Choose an arbitrary cluster X. Unmark all clusters. Call Ingoing-Propagation(X). Unmark all clusters. Call Outgoing-Propagation(X).

Global Propagation (continued). Ingoing-Propagation(X): mark X; call Ingoing-Propagation recursively on X's unmarked neighboring clusters, if any; then pass a message from X to the cluster that invoked Ingoing-Propagation(X). Outgoing-Propagation(X): mark X; pass a message from X to each of its unmarked neighboring clusters, if any; then call Outgoing-Propagation recursively on X's unmarked neighboring clusters. (Figure: the example join tree with the ten message passes numbered.) After global propagation, the clique tree is both globally and locally consistent. A message-passing sketch follows below.
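
A single message pass is just the two formulas above: project the sender onto the sepset, then absorb the ratio into the receiver. The sketch below is a minimal illustration on dictionary potentials, not the full PPTC implementation; the two-cluster tree (AC, BC with sepset C, for the Train Strike example) is my own small usage example and does not appear on the slides. The full two-phase schedule would call `pass_message` along the tree edges in the order numbered on the slide.

```python
# One message pass from cluster X to cluster Y through sepset S:
#   projection: phi_S_new(s) = sum over x consistent with s of phi_X(x)
#   absorption: phi_Y(y) *= phi_S_new(s) / phi_S_old(s)   (0/0 treated as 0)
# Potentials are dicts mapping frozenset({(var, value), ...}) -> float.

def project(phi_x, sep_vars):
    out = {}
    for inst, val in phi_x.items():
        s = frozenset((v, x) for v, x in inst if v in sep_vars)
        out[s] = out.get(s, 0.0) + val
    return out

def pass_message(phi_x, phi_s_old, phi_y, sep_vars):
    phi_s_new = project(phi_x, sep_vars)
    phi_y_new = {}
    for inst, val in phi_y.items():
        s = frozenset((v, x) for v, x in inst if v in sep_vars)
        old = phi_s_old.get(s, 0.0)
        ratio = 0.0 if old == 0.0 else phi_s_new.get(s, 0.0) / old
        phi_y_new[inst] = val * ratio
    return phi_s_new, phi_y_new

# Two-cluster illustration: clusters AC and BC, sepset {C}
# (A = Martin Late, B = Norman Late, C = Train Strike).
def tbl(vars_, rows):   # helper: rows map value-tuples to numbers
    return {frozenset(zip(vars_, k)): v for k, v in rows.items()}

phi_ac = tbl(("A", "C"), {(True, True): 0.06, (False, True): 0.04,    # P(C) P(A|C)
                          (True, False): 0.45, (False, False): 0.45})
phi_bc = tbl(("B", "C"), {(True, True): 0.8, (False, True): 0.2,      # P(B|C)
                          (True, False): 0.1, (False, False): 0.9})
phi_c = {frozenset({("C", True)}): 1.0, frozenset({("C", False)}): 1.0}

phi_c, phi_bc = pass_message(phi_ac, phi_c, phi_bc, {"C"})
print(project(phi_bc, {"B"}))   # P(B): B=T 0.17, B=F 0.83 -- matches the earlier slides
```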

Marginalization ABD = Consistent Join Tree a b d ABD(abd) on off .225 .025 .125 .180 .020 .150 ABD = a P (a) on off .225 + .025 + .125 + .125 = .500 .180 + .020 + .150 + .150 = .500 d P (d) on off .225 + .125 + .180 + .150 = .680 .025 + .125 + .020 + .150 = .320 ABD ADE ACE CEG DEF EGH AD AE CE DE EG Consistent Join Tree

Review — Procedure for PPTC without Evidence: Belief Network → (graphical transformation) → Join Tree Structure → (initialization) → Inconsistent Join Tree → (propagation) → Consistent Join Tree → (marginalization).

Inference with Evidence (Demo on the office example network from the Introduction).

Observations. Observations are the simplest form of evidence. An observation is a statement of the form V = v. A collection of observations may be denoted by E = e, an instantiation of a set of variables E. Observations are referred to as hard evidence.

Likelihoods. Given E = e, the likelihood of V, denoted Λ_V, is defined as: Λ_V(v) = 1 if V ∉ E; Λ_V(v) = 1 if V ∈ E and v is consistent with e; Λ_V(v) = 0 otherwise.

Likelihoods — example. (Table: Λ_V(v) for each variable of the A–H network under a set of observations; unobserved variables get 1 for both values, observed variables get 1 only for the observed value.)

Procedure for PPTC with Evidence: Belief Network → (graphical transformation) → Join Tree Structure → (initialization and observation entry) → Inconsistent Join Tree → (propagation) → Consistent Join Tree → (marginalization and normalization).

Initialization with Observations. For each cluster and sepset X, set φ_X(x) to 1. For each variable V: assign V a cluster X that contains F_V (the parent cluster of F_V); multiply φ_X by P(V | π_V); and set each likelihood element Λ_V(v) to 1.

Observation Entry. Encode the observation V = v as a new likelihood Λ_V^new with Λ_V^new(v) = 1 and 0 elsewhere. Identify a cluster X that contains V, then update: φ_X ← φ_X · Λ_V^new and Λ_V ← Λ_V^new.

Marginalization. After global propagation with the evidence entered, each cluster potential encodes φ_X = P(X, e); marginalizing any cluster that contains V therefore gives P(V, e).

Normalization. After global propagation, P(V | e) = P(V, e) / P(e), where P(e) = Σ_v P(v, e). A sketch of observation entry and normalization follows below.
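
Entering an observation is just multiplying a 0/1 likelihood into one cluster containing the observed variable; after propagation, dividing a marginal by its sum gives the conditional probabilities. A minimal sketch of these two small steps (same dictionary potentials as in the propagation sketch; the cluster and the observed value are illustrative only):

```python
# Observation entry: encode V = v as a 0/1 likelihood and multiply it into a cluster containing V.
def enter_observation(phi_x, var, observed_value):
    return {inst: (val if (var, observed_value) in inst else 0.0)
            for inst, val in phi_x.items()}

# After global propagation, cluster potentials hold P(x, e);
# dividing by their sum P(e) yields conditional probabilities.
def normalize(phi):
    total = sum(phi.values())
    return {inst: val / total for inst, val in phi.items()}

# Example with the AC cluster from the propagation sketch (A = Martin Late, C = Train Strike).
phi_ac = {frozenset({("A", True), ("C", True)}): 0.06,
          frozenset({("A", False), ("C", True)}): 0.04,
          frozenset({("A", True), ("C", False)}): 0.45,
          frozenset({("A", False), ("C", False)}): 0.45}
phi_ac = enter_observation(phi_ac, "C", True)   # observe Train Strike = T
print(normalize(phi_ac))                        # P(A | C=T): A=T 0.6, A=F 0.4 (other entries 0)
```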

Handling Dynamic Observations. Suppose the join tree is now consistent for evidence e1. How do we restore consistency if the observation is changed to e2?

Observation States. Going from e1 to e2, a variable V can be in one of three observation states: no change; update (V goes from unobserved to observed); or retraction (V goes from observed to unobserved, or V = v1 changes to V = v2 with v1 ≠ v2).

Handling Dynamic Observations. The PPTC-with-evidence pipeline (graphical transformation → initialization and observation entry → propagation → marginalization and normalization) is augmented with a global update step and a global retraction step — when is each of them needed?