Bayesian networks, introduction

Graphical models: nodes (vertices) and links (edges)

A graph can be
  disconnected or connected;
  undirected or directed (the edges are one-directed arrows);
  cyclic (it is possible to start in one node and “come back”) or acyclic.

Examples:
  Transport routes (nodes S, I1, I2A, I2B, F): acyclic, but not completely directed.
  Junction trees: a graph with the nodes A, B, C, D, E, F, G, H is turned into a tree with the nodes ABD, DFG, BDG, BEG, BCE, EGH. From 8 nodes to 6 nodes (Source: Wikipedia)

Markov random field: Given the light blue nodes, the middle blue node is conditionally independent of all other nodes (the white nodes).

Bayesian (belief) networks

A Bayesian network is a connected directed acyclic graph (DAG) in which
  the nodes represent random variables
  the links represent direct relevance relationships among variables

Example: X → Y
This small network has two nodes representing the random variables X and Y. The directed link gives a relevance relationship between the two variables, which means
  Pr(Y = y | X = x, I) ≠ Pr(Y = y | I)

Example: X, Y and Z, with the links X → Y and X → Z
This network has three nodes representing the random variables X, Y and Z. The directed links give relevance relationships, which means
  Pr(Y = y | X = x, I) ≠ Pr(Y = y | I)
  Pr(Z = z | X = x, I) ≠ Pr(Z = z | I)
but also (as will be seen below)
  Pr(Z = z | Y = y, X = x, I) = Pr(Z = z | X = x, I)

Structures in a Bayesian network

There are two classifications for nodes: parent nodes and child nodes. A link points from a parent node to a child node; a parent node may have several child nodes, and a child node may have several parent nodes.

Thus, a node can be solely a parent node, solely a child node or both!

Probability “tables”

Each node represents a random variable. This random variable has either assigned probabilities (nominal scale or discrete) or an assigned probability density function (continuous scale) for its states.

For a node that is solely a parent node:
  The assigned probabilities or density function are conditional on background information only (may be expressed as unconditional).
For a node that is a child node (solely or joint parent/child):
  The assigned probabilities or density function are conditional on the states of its parent nodes (and on background information).

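Example: Dyes on banknotes (from previous lectures)

Two nodes, A? and B?, each with two states (state labels not recoverable).

Probability table for A?:
  first state     0.001
  second state    0.999

Probability table for B? given A?:
                       A?: first state    A?: second state
  B?: first state      0.99               0.02
  B?: second state     0.01               0.98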
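As a small illustration of how a parent table and a child table combine, the sketch below computes a marginal and a posterior probability from the two tables. The state names a1, a2, b1, b2 are placeholders (the slide's own labels are not available), and the reading of the child table as "columns condition on the parent" is an assumption supported only by the columns summing to 1.

```python
# Probability tables as read from the slide. State names are placeholders.
p_a = {"a1": 0.001, "a2": 0.999}          # Pr(parent state)
p_b_given_a = {                            # Pr(child state | parent state)
    "a1": {"b1": 0.99, "b2": 0.01},
    "a2": {"b1": 0.02, "b2": 0.98},
}

# Marginal probability of the child state b1: sum over the parent's states.
p_b1 = sum(p_a[a] * p_b_given_a[a]["b1"] for a in p_a)

# Posterior probability of the parent state a1 once b1 is observed (Bayes' theorem).
p_a1_given_b1 = p_a["a1"] * p_b_given_a["a1"]["b1"] / p_b1

print(f"Pr(b1)      = {p_b1:.5f}")
print(f"Pr(a1 | b1) = {p_a1_given_b1:.4f}")
```

This is the same calculation a Bayesian network program performs when the child node is instantiated.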

More about the structure…

Ancestors and descendants: A node X is an ancestor of a node Y, and Y is in turn a descendant of X, if there is a unidirectional path from X to Y.

Example (graph with the nodes A, B, C, D, E, F, G, H, I):

  Ancestor    Descendants
  A           D, E, G, I
  B           E, F, G, H, I
  C           F, H, I
  E           G, I
  F           H, I
  G           I
  H           I
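The descendant sets can be found mechanically by following directed links. The sketch below does this for a hypothetical edge list: the slide's figure is not available, so the links are an assumption, chosen only so that they are consistent with the ancestor/descendant table.

```python
# Assumed directed links (parent, child); consistent with the table, not taken from the figure.
edges = [("A", "D"), ("A", "E"), ("B", "E"), ("B", "F"), ("C", "F"),
         ("E", "G"), ("F", "H"), ("G", "I"), ("H", "I")]

# Map each node to the list of its child nodes.
children = {}
for parent, child in edges:
    children.setdefault(parent, []).append(child)


def descendants(node):
    """Return every node reachable from `node` along a unidirectional path."""
    found = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


for node in "ABCEFGH":
    print(node, sorted(descendants(node)))
```

The ancestors of a node can be obtained the same way by following the links backwards.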

Different connections:
  diverging connection: B ← A → C
  serial connection: A → B → C
  converging connection: A → C ← B

Conditional independence and d-separation

1) Diverging connection: B ← A → C

There is a path between B and C even if it is not unidirectional ⇒ B may be relevant for C (and vice versa).
However, if the state of A is known this relevance is lost: the path is blocked ⇒ B and C are conditionally independent given A.

Example: Assume the old rat Willie was caught in a trap. We have also found a sack of wheat grains with a small hole where grains have leaked out, and we suspect that Willie made this hole. Examining the sack and Willie we find
  traces of wheat grain in the jaw of Willie
  traces of saliva at the damage on the sack that match the DNA of Willie.

The whole scenario can be described with three random variables:
  A with the two states
    A1: “Willie made the hole in the sack”
    A2: “Willie has not been near the sack”,
  B with the two states
    B1: “Traces of wheat grain found in Willie’s jaw”
    B2: “No traces of wheat grain found in Willie’s jaw”,
  C with the two states
    C1: “Match between saliva DNA and Willie’s DNA”
    C2: “No match in DNA between saliva and Willie”

Note that the states of B and C are actually given, but the description gives a complete model.

First we assume none of the states are given: Is B relevant for C? Yes, because if B1 is true, i.e. we have found wheat grains in Willie’s jaw, the conditional probability of obtaining a match in DNA would be different from the corresponding conditional probability if B2 were true.

Now assume for A that the state A2 is given, i.e. Willie was never near the sack. Under this condition B can no longer be relevant for C, as whether or not we find a match in DNA between the saliva trace and Willie can have nothing to do with the grains we have found in Willie’s jaw.

Now assume for A that the state A1 is given, i.e. Willie made the hole. Under this condition it is tempting to think that B is relevant for C, but the relevance is actually lost. Whether we find a match in DNA or not cannot have any impact on whether we find grains in the jaw or not once we have stated that Willie made the hole.

The scenario can be described with the Bayesian network B ← A → C, i.e. a diverging connection.

When a state of a node is assumed to be given we say the node is instantiated.

In the example, once A is instantiated the relevance relationship between B and C is lost. B and C are thus conditionally independent given a state of A.
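The behaviour of a diverging connection can be checked numerically. The sketch below is only an illustration: all probabilities are hypothetical (the slides give no numbers for the Willie case), and the helper names are made up for this example. It enumerates the joint distribution Pr(A) Pr(B | A) Pr(C | A) and compares the probability of C1 with and without knowledge of B, first with A unknown and then with A instantiated.

```python
from itertools import product

# Hypothetical numbers for the network B <- A -> C (states coded 1 and 2).
p_a = {1: 0.5, 2: 0.5}              # Pr(A)
p_b1_given_a = {1: 0.9, 2: 0.1}     # Pr(B = B1 | A)
p_c1_given_a = {1: 0.95, 2: 0.01}   # Pr(C = C1 | A)


def joint(a, b, c):
    """Pr(A = a, B = b, C = c) for the diverging connection."""
    pb = p_b1_given_a[a] if b == 1 else 1 - p_b1_given_a[a]
    pc = p_c1_given_a[a] if c == 1 else 1 - p_c1_given_a[a]
    return p_a[a] * pb * pc


def prob(query, given=None):
    """Pr(query | given); both arguments are dicts such as {"c": 1}."""
    given = given or {}
    num = den = 0.0
    for a, b, c in product((1, 2), repeat=3):
        state = {"a": a, "b": b, "c": c}
        if all(state[k] == v for k, v in given.items()):
            p = joint(a, b, c)
            den += p
            if all(state[k] == v for k, v in query.items()):
                num += p
    return num / den


print("Pr(C1)          =", round(prob({"c": 1}), 4))
print("Pr(C1 | B1)     =", round(prob({"c": 1}, {"b": 1}), 4))
print("Pr(C1 | A1)     =", round(prob({"c": 1}, {"a": 1}), 4))
print("Pr(C1 | A1, B1) =", round(prob({"c": 1}, {"a": 1, "b": 1}), 4))
```

With A unknown the first two values differ (B is relevant for C); with A instantiated the last two coincide (the path is blocked).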

2) Serial connection: A → B → C

There is a path between A and C (unidirectional from A to C) ⇒ A may be relevant for C (and vice versa).
If the state of B is known this relevance is lost: the path is blocked ⇒ A and C are conditionally independent given (a state of) B.

Example: The Willie case with another description

Let
  A be a random variable with states
    A1: “Willie made the hole in the sack”
    A2: “Willie did not make the hole”,
  B be a random variable with states
    B1: “Willie left saliva on the damage”
    B2: “Willie left no saliva”,
  C be a random variable with states
    C1: “There is a match in DNA”
    C2: “There is no match”

Assuming no state is given, there is a relevance relationship between A and C.

Now assume that state B1 of B is given, i.e. we assume there was a contact between Willie’s jaw and the damage. A can no longer be relevant for C, as once we have stated that Willie left saliva it does not matter for C whether he made the hole or not.

The scenario can be described with the Bayesian network A → B → C.

Once B is instantiated the relevance relationship between A and C is lost. A and C are conditionally independent given a state of B.
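A numerical check of the serial connection, under stated assumptions: the probabilities are hypothetical (none are given in the slides) and the function names are invented for this sketch. The joint distribution is Pr(A) Pr(B | A) Pr(C | B).

```python
from itertools import product

# Hypothetical numbers for the network A -> B -> C (states coded 1 and 2).
p_a1 = 0.5                           # Pr(A = A1)
p_b1_given_a = {1: 0.8, 2: 0.05}     # Pr(B = B1 | A)
p_c1_given_b = {1: 0.99, 2: 0.001}   # Pr(C = C1 | B)


def pick(p_first, state):
    """Probability of `state` when state 1 has probability `p_first`."""
    return p_first if state == 1 else 1.0 - p_first


# Full joint distribution over (a, b, c).
joint = {
    (a, b, c): pick(p_a1, a) * pick(p_b1_given_a[a], b) * pick(p_c1_given_b[b], c)
    for a, b, c in product((1, 2), repeat=3)
}


def p_c1(a, b=None):
    """Pr(C = C1 | A = a), or Pr(C = C1 | A = a, B = b) when b is set."""
    keep = [k for k in joint if k[0] == a and (b is None or k[1] == b)]
    return sum(joint[k] for k in keep if k[2] == 1) / sum(joint[k] for k in keep)


print("B unknown: ", round(p_c1(1), 5), "vs", round(p_c1(2), 5))
print("B = B1:    ", round(p_c1(1, 1), 5), "vs", round(p_c1(2, 1), 5))
```

With B unknown the two values differ, so A is relevant for C; with B instantiated to B1 they are equal, so the path is blocked.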

3) Converging connection: A → C ← B

There is a path between A and B (not unidirectional) ⇒ A may be relevant for B (and vice versa).
If the state of C is (completely) unknown this relevance does not exist.
If the state of C is known (exactly or by a modification of the state probabilities) the path is opened ⇒ A and B are conditionally dependent given information about the state of C, otherwise they are (conditionally) independent.

Example: Paternity testing: child, mother and the true father

Let
  A be a random variable representing the mother’s genotype in a specific locus
  B be a random variable representing the true father’s genotype in the same locus
  C be a random variable representing the child’s genotype in that locus

Each genotype has two components: A = (A1, A2), B = (B1, B2), C = (C1, C2).

If we know nothing about C (C1 and C2 are both unknown), then information about A cannot have any impact on B and vice versa.

If we on the other hand know the genotype of the child (C1 and C2 are both known, or one of them is), then knowledge of the genotype of the mother has an impact on the probabilities of the different genotypes that can be possessed by the true father, since the child must have inherited half of the genotype from the mother and the other half from the father.

Bayesian network: A → C ← B
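The opening of a converging connection can be illustrated with a toy version of the paternity case. This is a sketch under assumptions that are not in the slides: a locus with only two alleles "a" and "b", hypothetical genotype probabilities, and a simple inheritance model in which each parent passes on one of its two alleles with probability 1/2.

```python
from itertools import product

# Hypothetical genotype probabilities, used for both mother (A) and father (B).
prior = {"aa": 0.49, "ab": 0.42, "bb": 0.09}


def p_child(child, mother, father):
    """Pr(C = child | A = mother, B = father): one allele from each parent."""
    total = 0.0
    for m, f in product(mother, father):
        if "".join(sorted(m + f)) == child:
            total += 0.25
    return total


def prob_father(genotype, child=None, mother=None):
    """Pr(B = genotype | optional knowledge of C and/or A), by enumeration."""
    num = den = 0.0
    for a, b, c in product(prior, repeat=3):
        if (child is None or c == child) and (mother is None or a == mother):
            p = prior[a] * prior[b] * p_child(c, a, b)
            den += p
            if b == genotype:
                num += p
    return num / den


print("C unknown:  ", prob_father("bb"), "vs", prob_father("bb", mother="aa"))
print("C = ab known:", prob_father("bb", child="ab"),
      "vs", prob_father("bb", child="ab", mother="aa"))
```

With C unknown, learning the mother's genotype leaves the father's probabilities unchanged; once C is instantiated, the mother's genotype changes them.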

d-separation

In a directed acyclic graph (DAG) the concept of d-separation is defined as follows:

Let S_X, S_Y and S_Z be three disjoint subsets of variables included in the DAG.

The sets S_X and S_Y are d-separated given S_Z if every path between a variable X in S_X and a variable Y in S_Y contains either
  a serial connection through a variable Z in S_Z, or
  a diverging connection diverging from a variable Z in S_Z, or
  a converging connection converging to a variable W that is not in S_Z and of which no descendants belong to S_Z.

Figure: a DAG with the nodes X1, X2, Y1, Y2, Y3, Z1, Z2, Z3, W1, W2, divided into coloured areas.
  No direct link from the red area to the blue area or vice versa.
  No convergence from the blue area and the red area to the green area.
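Checking every path by hand becomes tedious for larger graphs. The sketch below does not enumerate paths as in the definition; it uses a different but equivalent standard test: keep only the variables in the three sets and their ancestors, connect parents that share a child and drop the directions (moralisation), remove the conditioning set, and see whether the two sets are still connected. The function name and the dictionary representation of the DAG are choices made for this example.

```python
def d_separated(parents, xs, ys, zs):
    """True if the sets xs and ys are d-separated given zs.

    `parents` maps each node to the set of its parent nodes.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)

    # 1. Ancestral set: the nodes in xs, ys, zs together with all their ancestors.
    relevant = set()
    stack = list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralise: undirected parent-child links plus links between co-parents.
    neighbours = {node: set() for node in relevant}
    for child in relevant:
        ps = list(parents.get(child, ()))
        for p in ps:
            neighbours[p].add(child)
            neighbours[child].add(p)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                neighbours[p].add(q)
                neighbours[q].add(p)

    # 3. Remove zs and search for any remaining connection from xs to ys.
    seen = set()
    stack = [x for x in xs if x not in zs]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in ys:
            return False
        stack.extend(n for n in neighbours[node] if n not in zs and n not in seen)
    return True


diverging = {"B": {"A"}, "C": {"A"}}     # B <- A -> C
serial = {"B": {"A"}, "C": {"B"}}        # A -> B -> C
converging = {"C": {"A", "B"}}           # A -> C <- B

print(d_separated(diverging, {"B"}, {"C"}, {"A"}))    # blocked by A
print(d_separated(serial, {"A"}, {"C"}, {"B"}))       # blocked by B
print(d_separated(converging, {"A"}, {"B"}, set()))   # blocked while C is unknown
print(d_separated(converging, {"A"}, {"B"}, {"C"}))   # opened by C
```

The four calls reproduce the three connection types discussed above: the first three return True and the last returns False.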

The Markov property - formal definition of a Bayesian network

Consider a variable X in a DAG.
Let PA(X) be the set of all parents of X and DE(X) be the set of all descendants of X.
Let S_Y be a set of variables that does not include any variables in DE(X), i.e. none of them are descendants of X.

Then, the DAG is a Bayesian network if and only if
  Pr(X | PA(X), S_Y) = Pr(X | PA(X))
i.e. X is conditionally independent of S_Y given PA(X).

This is also known as the Markov property.

Note: by Pr(X | … ) we mean the probability of X having a particular state.

Example (figure): a DAG with the node X and nodes labelled D1, D3, Y1, Y2, Y3, Y4, W1 and W2.

Software

GeNIe (Graphical network Interface)
  Free-of-charge software
  Powerful for building complex networks and running with moderately large probability tables
  Download from http://genie.sis.pitt.edu/

HUGIN
  Commercial software
  Probably today’s most powerful software for Bayesian networks
  A demo version (less powerful than GeNIe) can be downloaded from www.hugin.com

Example

A burglary was committed in a shop. On the shop floor the police have secured a shoeprint. In the home of a suspect a shoe is found with a sole pattern that matches that of the shoeprint. In a compiled database of shoeprints it is found that the particular pattern is present on 3 out of 657 prints.

Hypotheses (usually called propositions in forensic literature):
  Hp: “The shoeprint was made by the found shoe”
  Hd: “The shoeprint was made by some other shoe”
“p” in Hp stands for “Prosecutor” (incriminating proposition); “d” in Hd stands for “Defence” (alternative to the incriminating proposition).

Evidence:
  E: “There is a match in pattern between shoeprint and the found shoe”

Default settings

Table automatically set from table of H

Setting the probability table for node E

If proposition Hp (Shoeprint was made by found shoe) is true:
  Pr(E | Hp) = 1
If proposition Hd (Shoeprint was made by another shoe) is true:
  Pr(E | Hd) = γ
where γ is the proportion of shoes in the (relevant) population of shoes having the observed pattern.

The proportion γ is unknown, but an estimate from the database can be used: 3/657 ≈ 0.0046.

Run the network

Instantiate the match

On the other hand, we could have computed the likelihood ratio directly,
  LR = Pr(E | Hp) / Pr(E | Hd) = 1 / (3/657) = 219
which is more accurate.
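The two routes to the likelihood ratio can be compared in a few lines. The sketch below assumes equal prior probabilities for Hp and Hd (taken here to be what the software's default settings amount to, which is an assumption) and uses exact fractions. The last lines show one possible reason why the direct calculation is more accurate: a posterior read off a display rounded to three decimals no longer gives back the exact ratio. That rounding is an illustrative assumption, not something stated in the slides.

```python
from fractions import Fraction

gamma_hat = Fraction(3, 657)   # database estimate of the pattern proportion
prior_hp = Fraction(1, 2)      # assumed equal prior probabilities

p_e_given_hp = Fraction(1)     # a match is certain if the found shoe made the print
p_e_given_hd = gamma_hat       # a match by chance if some other shoe made it

# Posterior probability of Hp after instantiating the match (Bayes' theorem).
posterior_hp = (prior_hp * p_e_given_hp) / (
    prior_hp * p_e_given_hp + (1 - prior_hp) * p_e_given_hd
)

# Likelihood ratio, directly and recovered from posterior odds / prior odds.
lr_direct = p_e_given_hp / p_e_given_hd
lr_from_posterior = (posterior_hp / (1 - posterior_hp)) / (prior_hp / (1 - prior_hp))

# Same recovery from a posterior rounded to three decimals (assumed display precision).
rounded = round(float(posterior_hp), 3)
lr_from_rounded = rounded / (1 - rounded)

print("posterior Pr(Hp | E) =", posterior_hp, "=", float(posterior_hp))
print("LR direct            =", lr_direct)
print("LR from posterior    =", lr_from_posterior)
print("LR from rounded      =", round(lr_from_rounded, 1))
```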

Alternative network

Propositions (as before):
  Hp: “The shoeprint was made by the found shoe”
  Hd: “The shoeprint was made by some other shoe”

Evidence:
  X: Sole pattern of the found shoe. States: q (the observed pattern), non-q
  Y: Pattern of the shoeprint. States: q, non-q

in a network…

Nodes H, X and Y; Y has the parent nodes H and X.

Probability table for Y:
  H:          Hp      Hp       Hd
  X:          q       non-q    q
  Y = q       1       0        3/657
  Y = non-q   0       1        654/657

Probability table for X:
  X        Probability
  q        γ
  non-q    1 – γ

Note! We need to give a probability table for X to make the software work. However, we do not know γ, but it does not matter what value we set here.

Instantiate nodes X and Y both to q.
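That the value entered for X does not matter can be verified by enumeration. The sketch below assumes equal priors for Hp and Hd and uses only the columns of the table for Y in which X = q, since both X and Y are instantiated to q. The function name is invented for this example.

```python
def posterior_hp(p_x_q, prior_hp=0.5, gamma_hat=3 / 657):
    """Pr(Hp | X = q, Y = q) when the table for X assigns probability p_x_q to q."""
    p_h = {"Hp": prior_hp, "Hd": 1 - prior_hp}
    # Pr(Y = q | H, X = q), read from the probability table for Y.
    p_y_q_given_h = {"Hp": 1.0, "Hd": gamma_hat}
    # Joint probability of (H = h, X = q, Y = q) for each proposition.
    weight = {h: p_h[h] * p_x_q * p_y_q_given_h[h] for h in p_h}
    return weight["Hp"] / sum(weight.values())


for value in (0.001, 3 / 657, 0.2, 0.9):
    print(f"Pr(X = q) = {value:.4f}  ->  Pr(Hp | X = q, Y = q) = {posterior_hp(value):.5f}")
```

The factor Pr(X = q) appears in both the numerator and the denominator and cancels, so every row prints the same posterior as the simpler network with the single evidence node E.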

Example (more complex): In the head of the experienced examiner

Assume there is a question whether an individual has a specific disease A or another disease B.

What is observed is
  The individual has an increased level of substance 1
  The individual has recurrent fever attacks
  The individual has light recurrent pain in the stomach

The experience of the examining physician says
  1. If disease A is present it is quite common to have an increased level of substance 1.
  2. If disease B is present it is less common to have an increased level of substance 1.
  3. If disease A is present it is not generally common to have recurrent fever attacks, but if there is also an increased level of substance 1 such events are very common.
  4. Recurrent fever attacks are quite common when disease B is present, regardless of the level of substance 1.
  5. Recurrent pain in the stomach is generally more common when disease B is present than when disease A is present, regardless of the level of substance 1 and of whether fever attacks are present or not.
  6. If a patient has disease A, increased levels of substance 1 and recurrent fever attacks, he/she would almost certainly have recurrent pain in the stomach. Otherwise, if disease A is present, recurrent pain in the stomach is equally common.

Can we put this up in a network?

Let the “disease node” be H, with states A and B.

Let the “evidence” nodes be
  X with states
    x1: “The individual has an increased level of substance 1”
    x2: “The individual has a normal level of substance 1”
  Y with states
    y1: “The individual has recurrent fever attacks”
    y2: “The individual has no fever attacks”
  Z with states
    z1: “The individual has light recurrent pain in the stomach”
    z2: “The individual has no pain in the stomach”

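Network with the nodes H, X, Y and Z: X has the parent H, Y has the parents H and X, and Z has the parents H, X and Y.

Probability table for X:
  H:    A        B
  x1    α        β
  x2    1 – α    1 – β

Probability table for Y:
  H:    A        A        B        B
  X:    x1       x2       x1       x2
  y1    γ        δ        ε        ε
  y2    1 – γ    1 – δ    1 – ε    1 – ε

Probability table for Z:
  H:    A     A        A        A        B        B        B        B
  X:    x1    x1       x2       x2       x1       x1       x2       x2
  Y:    y1    y2       y1       y2       y1       y2       y1       y2
  z1    1     η        η        η        ζ        ζ        ζ        ζ
  z2    0     1 – η    1 – η    1 – η    1 – ζ    1 – ζ    1 – ζ    1 – ζ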

The probabilities set out in the tables take into account some of the experience listed (e.g. that some probabilities are equal).

However, we need to estimate numbers for α, β, γ, δ, ε, ζ and η, where α = Pr(x1 | A), β = Pr(x1 | B), γ = Pr(y1 | A, x1), δ = Pr(y1 | A, x2), ε = Pr(y1 | B), ζ = Pr(z1 | B) and η = Pr(z1 | A) in all cases other than (x1, y1).
  Experience 1 & 2 ⇒ α >> β. Assume α = 0.8 and β = 0.2
  Experience 3 ⇒ γ high and δ < 0.5. Assume γ = 0.9, δ = 0.3
  Experience 4 ⇒ Assume ε = 0.8
  Experience 5 & 6 ⇒ ζ > η. Assume ζ = 0.6 and η = 0.4

Run network

Instantiate the nodes X, Y and Z

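The likelihood ratio of the evidence becomes
  LR = Pr(x1, y1, z1 | A) / Pr(x1, y1, z1 | B) = (0.8 · 0.9 · 1) / (0.2 · 0.8 · 0.6) = 0.72 / 0.096 = 7.5
Thus the three observations combined are 7.5 times more probable if disease A is present than if disease B is present.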
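The same result can be reproduced without network software by multiplying the entries of the three probability tables. The sketch below uses the assumed values 0.8, 0.2, 0.9, 0.3, 0.8, 0.6 and 0.4 from the example; the equal prior probabilities used for the posterior at the end are an additional assumption, and the function names are invented for this illustration.

```python
from itertools import product

# Pr(X = x1 | H)
p_x1 = {"A": 0.8, "B": 0.2}
# Pr(Y = y1 | H, X), with X coded 1 (increased level) or 2 (normal level)
p_y1 = {("A", 1): 0.9, ("A", 2): 0.3, ("B", 1): 0.8, ("B", 2): 0.8}


def p_z1(h, x, y):
    """Pr(Z = z1 | H, X, Y) as set out in the probability table for Z."""
    if h == "B":
        return 0.6
    return 1.0 if (x == 1 and y == 1) else 0.4


def pick(p_first, state):
    """Probability of `state` when state 1 has probability `p_first`."""
    return p_first if state == 1 else 1.0 - p_first


def likelihood(h, x, y, z):
    """Pr(X = x, Y = y, Z = z | H = h) from the chain X -> Y -> Z with parent H."""
    return pick(p_x1[h], x) * pick(p_y1[(h, x)], y) * pick(p_z1(h, x, y), z)


lr = likelihood("A", 1, 1, 1) / likelihood("B", 1, 1, 1)
posterior_a = lr / (1 + lr)   # assumes Pr(A) = Pr(B) = 0.5 before the observations

print(f"LR for (x1, y1, z1)    = {lr:.2f}")
print(f"Pr(A | x1, y1, z1)     = {posterior_a:.3f}")

# Likelihood ratios for every possible combination of observations.
for x, y, z in product((1, 2), repeat=3):
    ratio = likelihood("A", x, y, z) / likelihood("B", x, y, z)
    print(f"x{x}, y{y}, z{z}: LR = {ratio:.3f}")
```

The final loop shows how strongly the conclusion depends on which combination of symptoms is observed.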
