Christopher M. Bishop, Pattern Recognition and Machine Learning


Outline
- Introduction
- Directed Graphs
- Undirected Graphs
- Factor Graphs
- Summary

Introduction
- A graph consists of nodes (vertices) connected by edges (links, arcs)
- Graphs provide a simple and clear way to visualize the structure of a probabilistic model
- Complex computations can be expressed in terms of graphical manipulations

Probabilistic Graphical Models
- There are two main types: directed and undirected graphical models
- Each node represents a random variable, and the edges represent probabilistic relationships between these variables
[Figure: a directed graph and an undirected graph]

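Directed Graphical Models
- An example: [Figure: a directed graph over the nodes a, b, c]
- Definition: for a graph with $K$ nodes, the joint distribution is given by
  $p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)$
  where $\mathrm{pa}_k$ denotes the set of parents of $x_k$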
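The factorization can be checked numerically. The sketch below assumes a hypothetical graph with links a → b, a → c, b → c over binary variables and made-up conditional probabilities (none of these numbers come from the slides); it evaluates $p(\mathbf{x}) = \prod_k p(x_k \mid \mathrm{pa}_k)$ and confirms that the result sums to one.

```python
from itertools import product

# Hypothetical binary DAG a -> b, a -> c, b -> c with made-up numbers.
parents = {"a": (), "b": ("a",), "c": ("a", "b")}
# P(node = 1 | parent values), keyed by the tuple of parent values.
p_one = {
    "a": {(): 0.4},
    "b": {(0,): 0.3, (1,): 0.8},
    "c": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9},
}

def joint(x):
    """p(x) = prod_k p(x_k | pa_k) for an assignment dict such as {"a": 1, "b": 0, "c": 1}."""
    prob = 1.0
    for node, pa in parents.items():
        q = p_one[node][tuple(x[v] for v in pa)]
        prob *= q if x[node] == 1 else 1.0 - q
    return prob

# Each conditional is normalized, so the product sums to 1 over all 2^3 assignments.
total = sum(joint(dict(zip("abc", vals))) for vals in product((0, 1), repeat=3))
print(round(total, 12))  # 1.0
```

Because every conditional distribution is normalized, no global normalization constant is needed here, unlike the undirected case.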

An Example
[Figure: a directed acyclic graph over the variables x1, ..., x7]

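Conditional Independence (1)
- $a$ is conditionally independent of $b$ given $c$: $p(a \mid b, c) = p(a \mid c)$, or equivalently $p(a, b \mid c) = p(a \mid c)\, p(b \mid c)$
- A shorthand notation: $a \perp\!\!\!\perp b \mid c$
- There are three basic configurations that determine conditional independence in directed graphs: tail-to-tail, head-to-tail, and head-to-head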

Conditional Independence (2)
[Figure: tail-to-tail node c with arrows c → a and c → b, shown unobserved and observed; observing c blocks the path]

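Conditional Independence (3)
- Definition: d-separation ("directed separation") is the notion of separation on a directed graph. A path from A to B is blocked by a set C if it contains a node where either the arrows meet head-to-tail or tail-to-tail and the node is in C, or the arrows meet head-to-head and neither the node nor any of its descendants is in C. If all paths from A to B are blocked, A is d-separated from B by C, and $A \perp\!\!\!\perp B \mid C$
[Figure: head-to-tail configuration a → c → b, which observing c blocks; head-to-head configuration a → c ← b, where observing c introduces a dependence between a and b]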
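A small numerical check of two of these configurations, as a minimal sketch with hypothetical probabilities: for tail-to-tail (c → a, c → b), a and b are dependent until c is observed; for head-to-head (a → c ← b), a and b are independent until c is observed. The helper names `marginal`, `independent`, and `bern` are our own.

```python
from itertools import product

def marginal(joint, keep):
    """Sum a joint table {(a, b, c): p} down to the variable positions in `keep`."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def independent(joint, cond=None):
    """Check a ⊥ b (cond=None) or a ⊥ b | c = cond on a joint table over (a, b, c)."""
    if cond is not None:
        sub = {x: p for x, p in joint.items() if x[2] == cond}
        z = sum(sub.values())
        joint = {x: p / z for x, p in sub.items()}
    pab, pa, pb = marginal(joint, (0, 1)), marginal(joint, (0,)), marginal(joint, (1,))
    return all(abs(pab[(a, b)] - pa[(a,)] * pb[(b,)]) < 1e-12
               for a, b in product((0, 1), repeat=2))

def bern(x, p1):
    """P(X = x) for a binary variable with P(X = 1) = p1."""
    return p1 if x == 1 else 1.0 - p1

states = list(product((0, 1), repeat=3))  # tuples (a, b, c)
# Tail-to-tail: c -> a, c -> b (made-up numbers).
tt = {(a, b, c): 0.5 * bern(a, 0.9 if c else 0.2) * bern(b, 0.7 if c else 0.1)
      for a, b, c in states}
# Head-to-head: a -> c <- b (made-up numbers).
hh = {(a, b, c): bern(a, 0.5) * bern(b, 0.5) * bern(c, 0.9 if (a or b) else 0.1)
      for a, b, c in states}

print(independent(tt), independent(tt, cond=1))  # False True
print(independent(hh), independent(hh, cond=1))  # True False
```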

D-separation: an example
[Figure: a directed graph over the nodes a, b, c, e, f]
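One way to test d-separation mechanically, sketched here as an assumption rather than the method on the slides, uses a known equivalent criterion: A and B are d-separated by C exactly when they are separated in the moral graph of the ancestral set of A ∪ B ∪ C after the nodes of C are removed. The edges used below (a → e ← f, e → c, f → b) are our hypothetical reading of the example figure.

```python
from collections import deque
from itertools import combinations

def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, A, B, C):
    """True if A and B are d-separated by C (ancestral moral graph criterion)."""
    keep = ancestors(parents, set(A) | set(B) | set(C))
    adj = {v: set() for v in keep}
    for v in keep:
        pa = list(parents.get(v, ()))     # parents of an ancestral node are in `keep`
        for p in pa:                      # undirected version of each arrow
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(pa, 2):  # "marry" the parents
            adj[p].add(q)
            adj[q].add(p)
    # Breadth-first search from A that is not allowed to enter C.
    frontier = deque(a for a in A if a not in C)
    visited = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in B:
            return False
        for w in adj[v] - visited - set(C):
            visited.add(w)
            frontier.append(w)
    return True

# Assumed edges (hypothetical reading of the figure): a -> e <- f, e -> c, f -> b.
parents = {"e": ("a", "f"), "c": ("e",), "b": ("f",)}
print(d_separated(parents, {"a"}, {"b"}, {"c"}))  # False
print(d_separated(parents, {"a"}, {"b"}, {"f"}))  # True
```

With c observed, the head-to-head node e has an observed descendant, so the path a → e ← f → b is open; with f observed, the tail-to-tail node f blocks it.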

Application: an Example
- Hidden Markov model
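A hidden Markov model is a directed graph in which each latent state $z_n$ has parent $z_{n-1}$ and each observation $x_n$ has parent $z_n$, so the joint distribution factorizes as $p(z_1) \prod_{n=2}^{N} p(z_n \mid z_{n-1}) \prod_{n=1}^{N} p(x_n \mid z_n)$. The sketch below uses a hypothetical two-state model with made-up parameters `pi`, `A`, `E`; it evaluates this product and draws a sample by ancestral sampling (each node sampled after its parents).

```python
import random

# Hypothetical 2-state HMM with binary observations (made-up numbers).
pi = [0.6, 0.4]               # p(z_1)
A = [[0.7, 0.3], [0.2, 0.8]]  # A[i][j] = p(z_n = j | z_{n-1} = i)
E = [[0.9, 0.1], [0.3, 0.7]]  # E[i][k] = p(x_n = k | z_n = i)

def joint(z, x):
    """p(x, z) = p(z_1) prod_{n>=2} p(z_n | z_{n-1}) prod_n p(x_n | z_n)."""
    p = pi[z[0]] * E[z[0]][x[0]]
    for n in range(1, len(z)):
        p *= A[z[n - 1]][z[n]] * E[z[n]][x[n]]
    return p

def sample(n_steps, rng):
    """Ancestral sampling: every node is drawn after its parents."""
    z = [rng.choices((0, 1), weights=pi)[0]]
    for _ in range(n_steps - 1):
        z.append(rng.choices((0, 1), weights=A[z[-1]])[0])
    x = [rng.choices((0, 1), weights=E[s])[0] for s in z]
    return z, x

z, x = sample(5, random.Random(0))
print(z, x, joint(z, x))
```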

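Undirected Graphical Models
- If every path from a node in set A to a node in set B passes through at least one node in a third set C, then A and B are separated by C
- In that case, A and B are conditionally independent given C: $A \perp\!\!\!\perp B \mid C$
[Figure: node sets A and B separated by the set C]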

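Conditional Independence
- Example: computers C1, C2 and hubs H1, H2. The computers can infect each other via the hubs, and the hubs can infect each other via the computers
- Graph separation then gives $C_1 \perp\!\!\!\perp C_2 \mid \{H_1, H_2\}$ and $H_1 \perp\!\!\!\perp H_2 \mid \{C_1, C_2\}$
[Figure: undirected graph linking each computer C1, C2 to each hub H1, H2]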
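Separation in an undirected graph is a plain reachability question. This minimal sketch runs a breadth-first search that is not allowed to enter C; the graph, in which every computer is linked to every hub, is our assumed reading of the figure.

```python
from collections import deque

def separated(adj, A, B, C):
    """True if every path from a node in A to a node in B passes through C."""
    blocked = set(C)
    frontier = deque(a for a in A if a not in blocked)
    visited = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in B:
            return False  # found a path from A to B that avoids C
        for w in adj[v]:
            if w not in visited and w not in blocked:
                visited.add(w)
                frontier.append(w)
    return True

# Assumed graph: each computer is linked to each hub.
adj = {"C1": {"H1", "H2"}, "C2": {"H1", "H2"},
       "H1": {"C1", "C2"}, "H2": {"C1", "C2"}}
print(separated(adj, {"C1"}, {"C2"}, {"H1", "H2"}))  # True
print(separated(adj, {"C1"}, {"C2"}, {"H1"}))        # False
```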

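Cliques
- Definition: a clique is a subset of the nodes in which every pair of nodes is connected by a link (the subset is fully connected)
- A maximal clique is a clique to which no other node can be added without it ceasing to be a clique
- We can define the factors in the decomposition of the joint distribution as functions of the variables in the cliques
[Figure: an undirected graph over x1, x2, x3, x4]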
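Maximal cliques can be enumerated with the Bron-Kerbosch recursion, a standard algorithm that is not mentioned on the slides. The edge list below (x1-x2, x1-x3, x2-x3, x2-x4, x3-x4) is an assumption about the four-node figure.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-processed nodes
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

# Assumed graph for the four-node example.
edges = [("x1", "x2"), ("x1", "x3"), ("x2", "x3"), ("x2", "x4"), ("x3", "x4")]
adj = {v: set() for e in edges for v in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(sorted(sorted(c) for c in maximal_cliques(adj)))
# [['x1', 'x2', 'x3'], ['x2', 'x3', 'x4']]
```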

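Undirected Factorization
- Consider factorizations of the form
  $p(\mathbf{x}) = \frac{1}{Z} \prod_{C} \psi_C(\mathbf{x}_C), \qquad Z = \sum_{\mathbf{x}} \prod_{C} \psi_C(\mathbf{x}_C)$
  where $\psi_C(\mathbf{x}_C)$ is a non-negative potential function of the maximal clique $C$ and $Z$ is the normalization constant (partition function)
- An example: [Figure: an undirected graph over x1, x2, x3, x4]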
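The sketch below builds a distribution of this form for an assumed four-node graph with maximal cliques {x1, x2, x3} and {x2, x3, x4}; the potential functions `psi_123` and `psi_234` are hypothetical. It shows that, unlike the directed case, the product of potentials must be divided by the partition function $Z$, computed here by brute force over all $2^4$ states.

```python
import math
from itertools import product

# Hypothetical strictly positive potentials on the two maximal cliques (binary x).
def psi_123(x1, x2, x3):
    # Favours agreement among the three variables.
    return math.exp(0.5 * ((x1 == x2) + (x2 == x3) + (x1 == x3)))

def psi_234(x2, x3, x4):
    # Favours x4 == x2 and penalizes x3 = 1.
    return math.exp(1.0 * (x4 == x2) - 0.3 * x3)

def unnormalized(x):
    x1, x2, x3, x4 = x
    return psi_123(x1, x2, x3) * psi_234(x2, x3, x4)

states = list(product((0, 1), repeat=4))
Z = sum(unnormalized(x) for x in states)      # partition function
p = {x: unnormalized(x) / Z for x in states}  # p(x) = (1/Z) prod_C psi_C(x_C)
print(round(sum(p.values()), 12))  # 1.0
```

The brute-force sum over all states is exponential in the number of variables, which is why computing $Z$ is a major difficulty for undirected models.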

An Example
- Markov random field

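Directed versus Undirected (1)
- To convert a directed graph into an undirected one, add links between all pairs of parents of each node and then drop the arrow directions; this is called moralization, and the result is the moral graph
- We have to discard some conditional independence properties to complete this conversion
[Figure: a directed graph over x1, ..., x4 and its moral graph]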
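Moralization itself is a short graph transformation. In this sketch the input maps each node to its parents; the example assumes a graph in which x1, x2, x3 are all parents of x4. In that case the moral graph is fully connected, and the single clique potential is $p(x_1)\,p(x_2)\,p(x_3)\,p(x_4 \mid x_1, x_2, x_3)$.

```python
from itertools import combinations

def moralize(parents):
    """Moral graph: marry all parents of each node, then drop edge directions."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for child, pa in parents.items():
        adj.setdefault(child, set())
        for p in pa:                      # keep each original link, undirected
            link(p, child)
        for p, q in combinations(pa, 2):  # link every pair of co-parents
            link(p, q)
    return adj

# Assumed DAG: x1, x2, x3 are all parents of x4.
moral = moralize({"x4": ("x1", "x2", "x3")})
print(sorted((u, v) for u in moral for v in moral[u] if u < v))
# [('x1', 'x2'), ('x1', 'x3'), ('x1', 'x4'), ('x2', 'x3'), ('x2', 'x4'), ('x3', 'x4')]
```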

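Directed versus Undirected (2)
- P: the set of all distributions over a given set of variables
- D and U: the distributions that can be represented as a perfect map by a directed graph and by an undirected graph, respectively
[Figure: Venn diagram of P containing the overlapping subsets D and U]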

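Factor Graphs (1)
- A factor graph is a more general representation: in addition to a node for each variable, it has a node for each factor in the joint distribution
- This allows us to be more explicit about the details of the factorization
- An example: [Figure: variable nodes x1, x2, x3 and factor nodes fa, fb, fc, fd]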

Factor Graphs (2)
- Definition: given a factor graph, the joint probability distribution is given by
  $p(\mathbf{x}) = \prod_{s} f_s(\mathbf{x}_s)$
  where $\mathbf{x}_s$ denotes the subset of the variables connected to the factor $f_s$
- Each factor $f_s$ is a function of its corresponding set of variables $\mathbf{x}_s$
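As a sketch, a factor graph can be stored as a mapping from factor names to (scope, function) pairs. The factors below, fa(x1, x2), fb(x1, x2), fc(x2, x3), fd(x3), reuse the node names from the example figure, but their scopes and values are hypothetical; since they are not normalized, the sketch divides by their sum, which amounts to one extra constant factor.

```python
from itertools import product

# Hypothetical factor graph over binary x1, x2, x3 (made-up factor values).
factors = {
    "fa": (("x1", "x2"), lambda x1, x2: 2.0 if x1 == x2 else 1.0),
    "fb": (("x1", "x2"), lambda x1, x2: 1.0 + x1 * x2),
    "fc": (("x2", "x3"), lambda x2, x3: 3.0 if x2 != x3 else 1.0),
    "fd": (("x3",), lambda x3: 0.5 + x3),
}
variables = ("x1", "x2", "x3")

def product_of_factors(assign):
    """prod_s f_s(x_s), where x_s are the variables attached to factor f_s."""
    value = 1.0
    for scope, f in factors.values():
        value *= f(*(assign[v] for v in scope))
    return value

table = {vals: product_of_factors(dict(zip(variables, vals)))
         for vals in product((0, 1), repeat=len(variables))}
Z = sum(table.values())  # equals 1 when the factors are already normalized
joint = {vals: v / Z for vals, v in table.items()}
print(table[(1, 1, 0)], round(sum(joint.values()), 12))  # 6.0 1.0
```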

Factor Graphs (3)

Factor Graphs (4)
- Both directed and undirected graphs can be expressed as factor graphs

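Sum-Product Algorithm (1)
- Goal: obtain an efficient, exact inference algorithm for finding marginals
- Allow computations to be shared efficiently when several marginals are needed
- By definition, the marginal is $p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})$, where the sum runs over all variables except $x$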

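Sum-Product Algorithm (2)
- The joint distribution can be written as $p(\mathbf{x}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s)$, where:
- $\mathrm{ne}(x)$: the set of factor nodes that are neighbors of $x$
- $X_s$: all variables in the subtree connected to $x$ via the factor node $f_s$
- $F_s(x, X_s)$: the product of all the factors in the group associated with factor $f_s$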

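Sum-Product Algorithm (3)
- Substituting into the marginal gives $p(x) = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$, with $\mu_{f_s \to x}(x) \equiv \sum_{X_s} F_s(x, X_s)$
- $\mu_{f_s \to x}(x)$ can be viewed as a message from the factor node $f_s$ to the variable node $x$
- $F_s(x, X_s)$, which describes a factor sub-graph, can itself be factorized: $F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\, G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM})$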

Sum-Product Algorithm (4)
[Figure: factor node f_s connected to x and to x1, ..., xM; the subtree beyond x1 is labeled G1(x1, Xs1)]

Sum-Product Algorithm (5)

Sum-Product Algorithm (6)

Sum-Product Algorithm (7)
- Messages:
- Variable node → factor node: take the product of the incoming messages along all of the other links
- Factor node → variable node: take the product of the incoming messages along all of the other links, multiply by the factor, and sum (marginalize) over all of the variables associated with the incoming messages
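The two message rules can be written as small functions over discrete tables. This is a minimal sketch assuming binary variables; the names `var_to_factor` and `factor_to_var` are our own, and messages are dictionaries mapping each state to a value.

```python
from itertools import product

STATES = (0, 1)  # every variable is assumed binary in this sketch

def var_to_factor(incoming):
    """Variable -> factor: product of the messages arriving from all *other* factors."""
    msg = {s: 1.0 for s in STATES}
    for m in incoming:  # each m maps state -> value
        for s in STATES:
            msg[s] *= m[s]
    return msg

def factor_to_var(f, scope, target, incoming):
    """Factor -> variable: multiply f by the messages from the other variables,
    then sum out every variable except `target`.

    `scope` lists f's argument names; `incoming` maps each non-target name in
    `scope` to its variable-to-factor message."""
    msg = {s: 0.0 for s in STATES}
    for vals in product(STATES, repeat=len(scope)):
        assign = dict(zip(scope, vals))
        term = f(*vals)
        for name, m in incoming.items():
            term *= m[assign[name]]
        msg[assign[target]] += term
    return msg

def f(x, y):
    """Hypothetical pairwise factor f(x, y) given as a 2x2 table."""
    return [[1.0, 2.0], [3.0, 4.0]][x][y]

leaf = var_to_factor([])  # a leaf variable node sends a message of all ones
print(factor_to_var(f, ("x", "y"), "x", {"y": leaf}))  # {0: 3.0, 1: 7.0}
```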

Sum-Product Algorithm (8)
- By eliminating the variable-to-factor messages, the sum-product algorithm can be viewed purely in terms of messages sent out by factor nodes to other factor nodes

Sum-Product Algorithm: an Example
[Figure: a tree-structured factor graph with variable nodes x1, x2, x3, x4, factor nodes fa, fb, fc, and a designated root node]
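The whole computation for an example of this shape can be sketched recursively. We assume the factors fa(x1, x2), fb(x2, x3), fc(x2, x4) with made-up values; the marginal of each variable is the normalized product of the factor-to-variable messages arriving at it, and the sketch checks the result against brute-force summation. For simplicity it recomputes messages on demand instead of storing them in one leaf-to-root and one root-to-leaf pass, so it does not show the computation sharing that makes the algorithm efficient.

```python
from itertools import product

STATES = (0, 1)
# Assumed tree: fa(x1, x2), fb(x2, x3), fc(x2, x4); factor values are made up.
factors = {
    "fa": (("x1", "x2"), lambda a, b: [[1.0, 2.0], [3.0, 1.0]][a][b]),
    "fb": (("x2", "x3"), lambda a, b: [[2.0, 1.0], [1.0, 4.0]][a][b]),
    "fc": (("x2", "x4"), lambda a, b: [[1.0, 3.0], [2.0, 1.0]][a][b]),
}
variables = ("x1", "x2", "x3", "x4")

def neighbours(var):
    return [name for name, (scope, _) in factors.items() if var in scope]

def msg_v2f(var, fac):
    """mu_{x -> f}(x): product of the messages from x's other neighbouring factors."""
    out = {s: 1.0 for s in STATES}
    for g in neighbours(var):
        if g != fac:
            m = msg_f2v(g, var)
            for s in STATES:
                out[s] *= m[s]
    return out

def msg_f2v(fac, var):
    """mu_{f -> x}(x): sum over f's other variables of f times their messages."""
    scope, f = factors[fac]
    incoming = {v: msg_v2f(v, fac) for v in scope if v != var}
    out = {s: 0.0 for s in STATES}
    for vals in product(STATES, repeat=len(scope)):
        assign = dict(zip(scope, vals))
        term = f(*vals)
        for v, m in incoming.items():
            term *= m[assign[v]]
        out[assign[var]] += term
    return out

def marginal(var):
    """p(x) is proportional to the product of all factor-to-variable messages into x."""
    p = {s: 1.0 for s in STATES}
    for g in neighbours(var):
        m = msg_f2v(g, var)
        for s in STATES:
            p[s] *= m[s]
    z = sum(p.values())
    return {s: v / z for s, v in p.items()}

def brute_force(var):
    """The same marginal by summing the full product over all 2^4 configurations."""
    p = {s: 0.0 for s in STATES}
    for vals in product(STATES, repeat=len(variables)):
        assign = dict(zip(variables, vals))
        term = 1.0
        for scope, f in factors.values():
            term *= f(*(assign[v] for v in scope))
        p[assign[var]] += term
    z = sum(p.values())
    return {s: v / z for s, v in p.items()}

print(marginal("x3"), brute_force("x3"))
```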

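Max-Sum Algorithm (1)
- Goal: find a setting of the variables that has the largest probability, $\mathbf{x}^{\max} = \arg\max_{\mathbf{x}} p(\mathbf{x})$
- Find the value of that probability, $p(\mathbf{x}^{\max}) = \max_{\mathbf{x}} p(\mathbf{x})$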

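Max-Sum Algorithm (2)
- The maximum can be written as $\max_{\mathbf{x}} p(\mathbf{x}) = \max_{x_1} \cdots \max_{x_M} p(\mathbf{x})$
- Compare this with the marginal $p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x})$: the resulting algorithm is similar to the sum-product algorithm except that the summations are replaced by maximizations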

Max-Sum Algorithm (3)
- The max-product algorithm

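Max-Sum Algorithm (4)
- It is convenient to work with the logarithm of the joint distribution: because the logarithm is monotonic, $\ln\left(\max_{\mathbf{x}} p(\mathbf{x})\right) = \max_{\mathbf{x}} \ln p(\mathbf{x})$, and products of many small probabilities become sums, which avoids numerical underflow
- The max-sum algorithm:
  $\mu_{f \to x}(x) = \max_{x_1, \ldots, x_M} \left[ \ln f(x, x_1, \ldots, x_M) + \sum_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m) \right]$
  $\mu_{x \to f}(x) = \sum_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x)$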
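The underflow problem that motivates the logarithm is easy to reproduce. The 2000 factor values of 0.01 below are arbitrary illustrative numbers.

```python
import math

# 2000 hypothetical factor values of 0.01: the product 1e-4000 underflows to 0.0,
# while the sum of logarithms stays an ordinary float.
values = [0.01] * 2000
prod = 1.0
for v in values:
    prod *= v
log_sum = sum(math.log(v) for v in values)
print(prod)     # 0.0
print(log_sum)  # about -9210.34
```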

Max-Sum Algorithm (5)
- We can find the maximum by propagating messages from the leaves to a root node
- Next, we want to find the configuration of the variables for which the joint distribution attains this maximum value

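Max-Sum Algorithm (6)
- An example: a chain of nodes $x_1, \ldots, x_N$ linked by the factors $f_{1,2}, f_{2,3}, \ldots, f_{N-1,N}$
[Figure: chain x1 - f1,2 - x2 - f2,3 - x3 ... fN-1,N - xN]
- Once we know $x_N^{\max}$, we can propagate a message back down the chain using
  $x_{n-1}^{\max} = \phi(x_n^{\max})$, where $\phi(x_n) = \arg\max_{x_{n-1}} \left[ \ln f_{n-1,n}(x_{n-1}, x_n) + \mu_{x_{n-1} \to f_{n-1,n}}(x_{n-1}) \right]$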

Max-Sum Algorithm (7)
- This is known as back-tracking
- It can be extended to a general tree-structured factor graph
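Max-sum with back-tracking on a chain can be sketched in a few lines. We assume a chain with only the pairwise factors $f_{n-1,n}$ (no single-variable factors), given as tables of log values; the hypothetical function `max_sum_chain` records the maximizing previous state $\phi(x_n)$ for every table during the forward pass and then follows those pointers back from $x_N^{\max}$. The log-factor values are made up, and the result is checked against exhaustive search.

```python
import math
from itertools import product

def max_sum_chain(log_f, K):
    """Max-sum with back-tracking on a chain x_1 - f_{1,2} - ... - x_N.

    log_f is a list of N-1 tables, log_f[n][i][j] = ln f(x_n = i, x_{n+1} = j)
    (0-based n). Returns (max of the summed log-factors, maximizing configuration)."""
    msg = [0.0] * K  # message out of x_1: it has no other neighbouring factors
    back = []        # back[n][j]: best state of the earlier variable given state j
    for table in log_f:
        # scores[j][i] = ln f(i, j) + incoming message at state i
        scores = [[table[i][j] + msg[i] for i in range(K)] for j in range(K)]
        back.append([max(range(K), key=lambda i: scores[j][i]) for j in range(K)])
        msg = [max(scores[j]) for j in range(K)]
    x_last = max(range(K), key=lambda j: msg[j])
    config = [x_last]
    for phi in reversed(back):  # back-tracking: x_{n-1}^max = phi(x_n^max)
        config.append(phi[config[-1]])
    return msg[x_last], config[::-1]

# Hypothetical chain with N = 4 ternary variables and made-up log-factors.
log_f = [[[math.log(1 + (3 * n + 2 * i + j) % 5) for j in range(3)] for i in range(3)]
         for n in range(3)]
best, config = max_sum_chain(log_f, 3)

def score(x):
    """Sum of log-factors for one full configuration."""
    return sum(log_f[n][x[n]][x[n + 1]] for n in range(len(x) - 1))

brute = max(score(x) for x in product(range(3), repeat=4))
print(abs(best - brute) < 1e-12, abs(score(config) - best) < 1e-12)  # True True
```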

Examples
- A Markov chain
- A hidden Markov model
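For the hidden Markov model, running sum-product along the chain of latent variables gives the familiar forward (alpha) recursion. This sketch uses a hypothetical two-state model with made-up parameters; it computes the likelihood $p(x_1, \ldots, x_N)$ and compares it with summing over all hidden paths. Replacing the sum over previous states by a maximization (in log space) would turn it into the Viterbi algorithm, the max-sum counterpart.

```python
from itertools import product

# Hypothetical 2-state HMM with binary observations (made-up numbers).
pi = [0.6, 0.4]               # p(z_1)
A = [[0.7, 0.3], [0.2, 0.8]]  # A[i][j] = p(z_n = j | z_{n-1} = i)
E = [[0.9, 0.1], [0.3, 0.7]]  # E[i][k] = p(x_n = k | z_n = i)

def forward_likelihood(x):
    """p(x_1..x_N) via the alpha recursion: sum-product messages along the chain."""
    K = len(pi)
    alpha = [pi[k] * E[k][x[0]] for k in range(K)]
    for obs in x[1:]:
        alpha = [E[j][obs] * sum(alpha[i] * A[i][j] for i in range(K)) for j in range(K)]
    return sum(alpha)

def brute_likelihood(x):
    """The same quantity by summing p(x, z) over all K^N hidden paths."""
    total = 0.0
    for z in product(range(len(pi)), repeat=len(x)):
        p = pi[z[0]] * E[z[0]][x[0]]
        for n in range(1, len(x)):
            p *= A[z[n - 1]][z[n]] * E[z[n]][x[n]]
        total += p
    return total

obs = [0, 1, 1, 0, 1]
print(forward_likelihood(obs), brute_likelihood(obs))  # agree up to rounding
```

The forward recursion costs $O(NK^2)$ operations, whereas the brute-force sum grows as $K^N$.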

Summary
- Three types of probabilistic graphical models were introduced: directed graphs, undirected graphs, and factor graphs
- Graphical models combine probability theory and graph theory
- The key idea is to factorize a complicated joint distribution into simpler local components
