General Gibbs Distribution
Probabilistic Graphical Models: Representation, Markov Networks
Consider a fully connected pairwise Markov network over X1,…,Xn where each Xi has d values. How many parameters does the network have?
O(d^n)
O(n^d)
O(n^2 d^2)
O(nd)
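The count can be checked with a short sketch. It assumes that the parameters are the entries of one table per edge and that there are no separate node potentials (adding them would contribute only n·d extra entries). The function names are illustrative, not from the slides.

```python
from math import comb

def pairwise_param_count(n: int, d: int) -> int:
    """Table entries across all edge factors of a fully connected pairwise network."""
    num_edges = comb(n, 2)      # one factor per pair of variables
    entries_per_edge = d * d    # each pairwise factor is a d-by-d table
    return num_edges * entries_per_edge

def full_joint_param_count(n: int, d: int) -> int:
    """Entries in an unrestricted joint table over n variables with d values each."""
    return d ** n

print(pairwise_param_count(10, 3))    # 45 edges * 9 entries = 405
print(full_joint_param_count(10, 3))  # 3**10 = 59049
```

Under these assumptions the pairwise count grows as O(n^2 d^2), far below the d^n entries of an unrestricted joint distribution.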
Gibbs Distribution
Parameters:

A   B   C   value
a1  b1  c1  0.25
a1  b1  c2  0.35
a1  b2  c1  0.08
a1  b2  c2  0.16
a2  b1  c1  0.05
a2  b1  c2  0.07
a2  b2  c1  …
a2  b2  c2  …
a3  b1  c1  0.15
a3  b1  c2  0.21
a3  b2  c1  0.09
a3  b2  c2  0.18
Gibbs Distribution
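For reference, the standard textbook definition of a Gibbs distribution can be written as follows. The symbols (Φ for the factor set, φ_i for a factor with scope D_i, Z for the partition function) are assumed notation, since no formula accompanies this heading.

```latex
\Phi = \{\phi_1(D_1), \ldots, \phi_k(D_k)\}
\qquad
\tilde{P}_\Phi(X_1, \ldots, X_n) = \prod_{i=1}^{k} \phi_i(D_i)
\qquad
Z_\Phi = \sum_{X_1, \ldots, X_n} \tilde{P}_\Phi(X_1, \ldots, X_n)
\qquad
P_\Phi(X_1, \ldots, X_n) = \frac{1}{Z_\Phi}\, \tilde{P}_\Phi(X_1, \ldots, X_n)
```

The unnormalized measure is the product of the factors, and dividing by Z_Φ turns it into a probability distribution.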
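Markov Network Representation
P factorizes over H if P can be written as a normalized product of factors and H is the Markov network induced by those factors, meaning that H connects every pair of variables that appear together in some factor.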
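A minimal sketch of how factor scopes determine the graph: every pair of variables that share a factor gets an edge. The scopes used here are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def induced_markov_network(scopes):
    """Return the undirected edge set induced by a list of factor scopes."""
    edges = set()
    for scope in scopes:
        # Connect every pair of variables that appear in the same factor.
        for u, v in combinations(sorted(scope), 2):
            edges.add((u, v))
    return edges

scopes = [("A", "B", "C"), ("C", "D")]  # hypothetical factor scopes
print(sorted(induced_markov_network(scopes)))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```

Each edge is stored as a sorted tuple so that (A, B) and (B, A) count as the same undirected edge.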
Separation in Undirected Graph H
A trail between X and Y is active given Z if no node along the trail is in Z.
X and Y are separated in H given Z if there is no active trail in H between X and Y given Z.
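Separation can be tested with a graph search that refuses to step onto observed nodes. The sketch below assumes X and Y are single nodes that are not themselves in Z, and the diamond graph is a hypothetical example.

```python
from collections import deque

def is_separated(edges, x, y, z):
    """True if every trail between x and y passes through a node in z."""
    blocked = set(z)
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    seen = {x}
    queue = deque([x])
    while queue:
        node = queue.popleft()
        if node == y:
            return False  # reached y along a trail that avoids z
        for nxt in neighbors.get(node, ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Hypothetical diamond graph: A - B - C - D - A
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(is_separated(edges, "A", "C", {"B", "D"}))  # True
print(is_separated(edges, "A", "C", {"B"}))       # False: the trail A - D - C is still active
```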
Independence Assumptions in H
The independencies implied by H: I(H) = {(X ⊥ Y | Z) : X and Y are separated in H given Z}
We say that H is an I-map (independence map) of P if P satisfies I(H), that is, I(H) ⊆ I(P).
Factorization ⇒ Independence
Theorem: If P factorizes over H, then H is an I-map for P.
Independence ⇒ Factorization
Hammersley-Clifford Theorem: If H is an I-map for P, and P is positive, then P factorizes over H.
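Which parameterization of P factorizes over the graph H?
[Figure: graph H; only the node labels D, B, C survive, and the candidate parameterizations are not recoverable]
All of the above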
Graph Structure & Factorization
The factorization is not unique: different sets of factors can induce the same graph, and they then imply the same independencies.
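As an illustration, two different sets of factor scopes can induce exactly the same graph. The scopes below are hypothetical examples over four variables A, B, C, D.

```python
from itertools import combinations

def induced_edges(scopes):
    """Undirected edges implied by a collection of factor scopes."""
    return {frozenset(pair) for scope in scopes for pair in combinations(scope, 2)}

# Two hypothetical factorizations over A, B, C, D
triple_factors = [("A", "B", "D"), ("B", "C", "D")]
pairwise_factors = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"), ("B", "D")]

same_graph = induced_edges(triple_factors) == induced_edges(pairwise_factors)
print(same_graph)  # True
```

Both factorizations yield the edges A-B, A-D, B-D, B-C and C-D, so the graph alone cannot tell them apart even though the pairwise form has fewer parameters.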
Summary
A Gibbs distribution represents a distribution as a product of factors.
The associated Markov network connects every pair of nodes that are in the same factor.
We can read independencies that must hold in P from separation in the Markov network.
The Markov network structure doesn’t fully specify the factorization of P.
The Chain Rule for Bayesian Nets
[Figure: Bayesian network over Difficulty (D), Intelligence (I), Grade (G), SAT (S) and Letter (L), with a conditional probability table for each node]
P(D,I,G,S,L) = P(D) P(I) P(G | I,D) P(L | G) P(S | I)
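The chain rule can be sketched in a few lines. The conditional probability tables below are assumed values for illustration, taken from the commonly used version of this student example, because the tables in the text cannot be read off reliably. Only the factorization itself comes from the text.

```python
from itertools import product

# Assumed CPD values for illustration only.
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {  # P(G | I, D)
    ("i0", "d0"): {"g1": 0.3, "g2": 0.4, "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5, "g2": 0.3, "g3": 0.2},
}
P_L = {  # P(L | G)
    "g1": {"l0": 0.1, "l1": 0.9},
    "g2": {"l0": 0.4, "l1": 0.6},
    "g3": {"l0": 0.99, "l1": 0.01},
}
P_S = {  # P(S | I)
    "i0": {"s0": 0.95, "s1": 0.05},
    "i1": {"s0": 0.2, "s1": 0.8},
}

def joint(d, i, g, s, l):
    """P(D,I,G,S,L) = P(D) P(I) P(G | I,D) P(L | G) P(S | I)."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_L[g][l] * P_S[i][s]

# Summing the product of CPDs over every assignment should give 1 (up to rounding).
total = sum(
    joint(d, i, g, s, l)
    for d, i, g, s, l in product(P_D, P_I, ["g1", "g2", "g3"], ["s0", "s1"], ["l0", "l1"])
)
print(joint("d0", "i1", "g3", "s1", "l1"))  # 0.6 * 0.3 * 0.02 * 0.01 * 0.8, about 2.88e-05
print(total)                                # about 1.0
```

Because each table is a proper conditional distribution, the product needs no normalizing constant, unlike the product of factors in a Gibbs distribution.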
Suppose θ is at a local minimum of a function J(θ). What will one iteration of gradient descent do?
Leave θ unchanged.
Change θ in a random direction.
Move θ towards the global minimum of J(θ).
Decrease θ.
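A small numerical check of this question, assuming the usual update rule (new value = old value minus the learning rate times the derivative). The function J below is a hypothetical example with a local minimum at -1 and a lower, global minimum at 2. The parameter is written `theta` in the code.

```python
def gradient_descent_step(theta, grad, alpha):
    """One gradient descent update: theta - alpha * dJ/dtheta."""
    return theta - alpha * grad(theta)

def J(theta):
    # Hypothetical objective with minima at theta = -1 (local) and theta = 2 (global).
    return theta**4 / 4 - theta**3 / 3 - theta**2

def dJ(theta):
    # Derivative of J: theta * (theta - 2) * (theta + 1)
    return theta**3 - theta**2 - 2 * theta

theta_local = -1.0
theta_next = gradient_descent_step(theta_local, dJ, alpha=0.1)
print(theta_next == theta_local)  # True: the derivative is zero, so the update is zero
print(J(2.0) < J(-1.0))           # True: -1 is a local minimum, not the global one
```

The derivative vanishes at any local minimum, so the step leaves the parameter unchanged regardless of the learning rate.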
Fig. A corresponds to α=0.01, Fig. B to α=0.1, Fig. C to α=1.