Genome evolution: a sequence-centric approach Lecture 6: Belief propagation.

Probabilistic models – inference – parameter estimation. Genome structure – mutations – population – inferring selection. (Probability, calculus/matrix theory, some graph theory, some statistics.)
Models: simple tree models; HMMs and variants; PhyloHMM, DBN; context-aware MM; factor graphs.
Inference: DP; sampling; variational approximations; EM; generalized EM (optimizing the free energy).
Refs: HMM, simple tree: Durbin et al. Basic BNs: Heckerman. Sampling: MacKay's book. Variational: Jojic et al. paper. LBP: Yedidia, Freeman, Weiss.

Simple Tree: Inference as message passing. [Figure: a tree of hidden nodes over the observed DATA at the leaves; each node tells its neighbor "you are P(H | our data)" and, after combining all incoming messages, concludes "I am P(H | all data)".]

Belief propagation in a factor graph. Remember, a factor graph is defined given a set of random variables (indexed i, j, k, ...) and a set of factors on groups of variables (indexed a, b, ...):

p(x) = \frac{1}{Z} \prod_a f_a(x_a)

where x_a refers to an assignment of values to the inputs of the factor a, and Z is the partition function (which is hard to compute). The BP algorithm is constructed by computing and updating messages, each a function from the values attainable by x_i to the real numbers:

Messages from factors to variables: m_{a \to i}(x_i)
Messages from variables to factors: m_{i \to a}(x_i)

Think of messages as transmitting beliefs:
a → i: "given my other input variables, and ignoring your message, you are x."
i → a: "given my other input factors and my potential, and ignoring your message, you are x."
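
To make these objects concrete, here is a minimal sketch in Python (not the lecture's code; the variable names, scopes and factor tables are made up) that stores factors as numpy tables and computes Z by brute force. This is feasible only for a toy graph, which is exactly why the partition function is hard in general.

import itertools
import numpy as np

# A toy factor graph: each factor is a numpy table over its variable scope.
# (Hypothetical example -- names and potentials are illustrative only.)
factors = {
    "f12": (("h1", "h2"), np.array([[0.8, 0.2], [0.2, 0.8]])),  # f_a(x_a)
    "f23": (("h2", "h3"), np.array([[0.6, 0.4], [0.4, 0.6]])),
}
variables = ["h1", "h2", "h3"]

# Brute-force partition function: Z = sum over all assignments of prod_a f_a(x_a).
# The sum is exponential in the number of variables, hence intractable in general.
Z = 0.0
for assignment in itertools.product([0, 1], repeat=len(variables)):
    x = dict(zip(variables, assignment))
    weight = 1.0
    for scope, table in factors.values():
        weight *= table[tuple(x[v] for v in scope)]
    Z += weight
print("Z =", Z)  # p(x) = (1/Z) * prod_a f_a(x_a)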

Message update rules:

Messages from variables to factors: m_{i \to a}(x_i) = \prod_{b \in N(i) \setminus a} m_{b \to i}(x_i)

Messages from factors to variables: m_{a \to i}(x_i) = \sum_{x_a \setminus x_i} f_a(x_a) \prod_{j \in N(a) \setminus i} m_{j \to a}(x_j)

The algorithm proceeds by updating messages. Define the beliefs as approximations of the single-variable posteriors p(h_i | s):

b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)

Algorithm:
  Initialize all messages to uniform.
  Iterate until no message changes:
    Update factor-to-variable messages.
    Update variable-to-factor messages.
Why is this different from the mean field algorithm?
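
As a concrete illustration of the schedule above, here is a minimal, self-contained loopy BP sketch in Python. It is not the lecture's code: the chain h1-h2-h3, the factor tables and the evidence factor are made-up placeholders (and on this tree-shaped toy the result is exact). Messages start uniform, factor-to-variable and variable-to-factor updates alternate until nothing changes, and the variable beliefs are read off at the end.

import numpy as np

# Toy model: three binary variables h1 - h2 - h3 plus an evidence factor on h2.
factors = {
    "f12": (("h1", "h2"), np.array([[0.8, 0.2], [0.2, 0.8]])),  # favors agreement
    "f23": (("h2", "h3"), np.array([[0.6, 0.4], [0.4, 0.6]])),
    "e2":  (("h2",),      np.array([0.9, 0.1])),                # evidence on h2
}
variables = {"h1": 2, "h2": 2, "h3": 2}
neighbors = {v: [a for a, (scope, _) in factors.items() if v in scope] for v in variables}

# Initialize all messages to uniform.
msg_f2v = {(a, v): np.ones(variables[v]) / variables[v]
           for a, (scope, _) in factors.items() for v in scope}
msg_v2f = {(v, a): np.ones(variables[v]) / variables[v] for (a, v) in msg_f2v}

for iteration in range(100):
    delta = 0.0
    # Factor-to-variable messages: sum out the factor's other variables,
    # weighting by the incoming variable-to-factor messages.
    for a, (scope, table) in factors.items():
        for i, v in enumerate(scope):
            t = table.copy()
            for j, u in enumerate(scope):
                if u != v:
                    shape = [1] * len(scope)
                    shape[j] = variables[u]
                    t = t * msg_v2f[(u, a)].reshape(shape)
            axes = tuple(j for j in range(len(scope)) if j != i)
            new = t.sum(axis=axes) if axes else t
            new = new / new.sum()
            delta = max(delta, np.abs(new - msg_f2v[(a, v)]).max())
            msg_f2v[(a, v)] = new
    # Variable-to-factor messages: product of messages from all *other* factors.
    for (v, a) in msg_v2f:
        new = np.ones(variables[v])
        for b in neighbors[v]:
            if b != a:
                new = new * msg_f2v[(b, v)]
        new = new / new.sum()
        delta = max(delta, np.abs(new - msg_v2f[(v, a)]).max())
        msg_v2f[(v, a)] = new
    if delta < 1e-8:  # stop when no message changes
        break

# Beliefs approximate the single-variable posteriors p(h_i | s).
for v in variables:
    b = np.ones(variables[v])
    for a in neighbors[v]:
        b = b * msg_f2v[(a, v)]
    print(v, b / b.sum())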

Beliefs on factor inputs: b_a(x_a) \propto f_a(x_a) \prod_{i \in N(a)} m_{i \to a}(x_i). This is far from mean field, since, for example, the factor belief b_a is a joint distribution over the factor's inputs rather than a product of independent marginals. The update rules can be viewed as derived from:
1. the requirement on the variable beliefs (b_i),
2. the requirement on the factor beliefs (b_a),
3. the marginalization requirement: \sum_{x_a \setminus x_i} b_a(x_a) = b_i(x_i).
Here's how:

BP on a tree = the up-down algorithm. [Figure: a small tree with hidden variables h1–h3, observed variables s1–s4, and factors a–e; the numbered messages (1, 2, 3) trace the upward and downward passes.]

Loopy BP is not guaranteed to converge. [Figure: two variables X and Y coupled through a loop.] This is not a hypothetical scenario – it frequently happens when there is too much symmetry. For example, most mutational effects are double-stranded and hence symmetric, which can result in loops.

The Bethe Free Energy (H. Bethe). LBP was introduced in several domains (Bayesian networks, coding) and is considered very practical in many cases... but unlike the variational approaches we studied before, it is not clear how it approximates the likelihood/partition function, even when it converges. In the early 2000s, Yedidia, Freeman and Weiss discovered a connection between the LBP algorithm and the Bethe free energy, developed by Hans Bethe to approximate the free energy in crystal field theory back in the 40's/50's. Compare to the variational free energy F(q) = \sum_x q(x)E(x) + \sum_x q(x)\ln q(x) \ge -\ln Z (with E(x) = -\sum_a \ln f_a(x_a)): the Bethe free energy has the same energy-minus-entropy form, but is written in terms of factor and variable beliefs instead of a single global q:

F_{Bethe}(\{b_a\},\{b_i\}) = \sum_a \sum_{x_a} b_a(x_a)\ln\frac{b_a(x_a)}{f_a(x_a)} - \sum_i (d_i - 1) \sum_{x_i} b_i(x_i)\ln b_i(x_i)

(d_i is the number of factors touching variable i.)

Theorem: beliefs are LBP fixed points if and only if they are locally optimal for the Bethe free energy.

Generalization: region-based free energy. Start with a factor graph (X, A). Introduce regions (X_R, A_R) and multipliers c_R. We will work with valid region graphs, where we require that every factor and every variable is counted exactly once overall:

\sum_{R: a \in A_R} c_R = 1 for every factor a, and \sum_{R: i \in X_R} c_R = 1 for every variable i.

Region energy: E_R(x_R) = -\sum_{a \in A_R} \ln f_a(x_a)
Region average energy: U_R(b_R) = \sum_{x_R} b_R(x_R) E_R(x_R)
Region entropy: H_R(b_R) = -\sum_{x_R} b_R(x_R) \ln b_R(x_R)
Region free energy: F_R = U_R - H_R
Region-based average energy: U = \sum_R c_R U_R
Region-based entropy: H = \sum_R c_R H_R
Region-based free energy: F = \sum_R c_R F_R = U - H
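
The definitions above translate almost literally into code. A small sketch in Python (assuming the caller supplies, for each region, its counting number c_R, a belief table b_R and an energy table E_R; strictly positive beliefs are assumed so the logarithm is defined; nothing here comes from the lecture's code):

import numpy as np

def region_free_energy(regions):
    # regions: iterable of (c_R, b_R, E_R), where b_R and E_R are numpy arrays
    # of the same shape over the region's assignments x_R.
    F = 0.0
    for c_R, b_R, E_R in regions:
        U_R = float(np.sum(b_R * E_R))           # region average energy
        H_R = float(-np.sum(b_R * np.log(b_R)))  # region entropy
        F += c_R * (U_R - H_R)                   # weighted region free energy
    return F

# Example: one binary region with a uniform belief and zero energy contributes -ln 2.
print(region_free_energy([(1.0, np.array([0.5, 0.5]), np.zeros(2))]))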

Bethe's regions are the factors with their neighboring variables (large regions) and single-variable regions: [Figure: a small factor graph with factors a, b, c and regions R_a, R_ac, R_bc.] We compensate for the multiple counting of variables using the multiplicity constants: c_R = 1 for every large region, and c_i = 1 - d_i for every single-variable region (d_i = the number of factors containing variable i). We can add larger regions, as long as we update the multipliers so the counting stays valid.
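
A quick sanity check of these counting numbers on a made-up triangle of pairwise factors (the scopes below are illustrative, not from the lecture): with c = 1 for each large region and c_i = 1 - d_i for each variable, the multipliers of all regions containing any given variable sum to one.

factor_scopes = {"a": ("x1", "x2"), "b": ("x2", "x3"), "c": ("x1", "x3")}
variables = ["x1", "x2", "x3"]

c_large = {a: 1 for a in factor_scopes}                                # one large region per factor
degree = {i: sum(i in s for s in factor_scopes.values()) for i in variables}
c_small = {i: 1 - degree[i] for i in variables}                        # c_i = 1 - d_i

for i in variables:
    total = sum(c_large[a] for a, s in factor_scopes.items() if i in s) + c_small[i]
    assert total == 1                                                  # validity condition holds
print(c_small)  # {'x1': -1, 'x2': -1, 'x3': -1}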

Multipliers compensate on average energy, not on entropy.
Claim: if the regions' beliefs are exact, then the region-based average energy is exact.
We cannot guarantee as much for the region-based entropy:
Claim: the region-based entropy is exact when the model is the uniform distribution. Proof: exercise. This means that the entropy counts the correct number of degrees of freedom – e.g. for N binary variables, H = N ln 2.
Definition: a region-based free energy approximation is said to be max-ent normal if its region-based entropy is maximized when the beliefs are uniform.
A non-max-ent approximation can minimize the region free energy by selecting erroneously high-entropy beliefs!

Bethe's regions are max-ent normal.
Claim: the Bethe regions give a max-ent normal approximation (i.e. the region-based entropy is maximized by the uniform beliefs). Write the Bethe entropy as

H_{Bethe} = \sum_a H_a + \sum_i (1 - d_i) H_i = \sum_i H_i - \sum_a \Big( \sum_{i \in N(a)} H_i - H_a \Big)

The first sum is an entropy term (maximal on uniform beliefs); each bracketed term is an information term (non-negative, and 0 – its minimum – on uniform beliefs). So the region-based entropy is maximized by the uniform beliefs.

Example: a non-max-ent approximation. Start with a complete graph over six binary variables with pairwise factors. Add all variable triplets, pairs and singletons as regions. Generate multipliers (which guarantee consistency): triplets = 1 (20 overall), pairs = -3 (15 overall), singletons = 6 (6 overall). Look at the consistent beliefs that put probability 1/2 on the all-zeros assignment and 1/2 on the all-ones assignment of every region: the region entropy (for any region) = ln 2, so the total region-based entropy is (20·1 + 15·(-3) + 6·6) ln 2 = 11 ln 2. We claimed before that the entropy of the uniform distribution is computed exactly: 6 ln 2. Since 11 ln 2 > 6 ln 2, the region-based entropy is not maximized by the uniform beliefs.
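
The counting is easy to reproduce (the beliefs are those described above: half the mass on all-zeros and half on all-ones in every region, so each region's entropy is ln 2):

import numpy as np

region_based = (20 * 1 + 15 * (-3) + 6 * 6) * np.log(2)    # = 11 * ln 2 for these beliefs
exact_uniform = 6 * np.log(2)                               # exact entropy of the uniform distribution
print(region_based / np.log(2), exact_uniform / np.log(2))  # 11.0 6.0 -> not max-ent normal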

Inference as minimization of region-based free energy. We basically solve a variational problem: minimize the region-based free energy over the regions' beliefs, while enforcing constraints on those beliefs (normalization and consistency). Unlike the structured variational approximation we discussed before, and although the beliefs are (pairwise) compatible, we can have cases with locally optimal beliefs that do not represent any true global posterior distribution. [Figure: three variables A, B, C connected in a cycle.] Example: the optimal region beliefs are identical to the factors. This is pairwise consistent, but cannot be the result of any joint distribution on the three variables (we have a negative feedback loop here).
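
A brute-force check of the unrealizability claim, assuming (as the negative feedback loop suggests) that each pairwise belief puts all of its mass on disagreeing assignments: any joint distribution with such pairwise marginals could only put mass on assignments in which all three pairs disagree, and for binary variables there are none.

import itertools

# Assignments of (A, B, C) in which every pair disagrees:
all_disagree = [x for x in itertools.product([0, 1], repeat=3)
                if x[0] != x[1] and x[1] != x[2] and x[2] != x[0]]
print(all_disagree)  # [] -- empty, so no joint distribution can produce these pairwise beliefs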

Inference as minimization of region-based free energy.
Claim: when it converges, LBP finds a minimum of the Bethe free energy.
Proof idea: we have an optimization problem (minimum energy) with constraints (beliefs are consistent and add up to 1). We write down a Lagrangian that expresses both the minimization goal and the constraints, and show that it is minimized when the LBP update rules hold.
Important technical point: we shall assume that at the fixed point all beliefs are non-zero. This can be shown to hold if all factors are "soft" (do not contain zero values for any assignment).

The Bethe Lagrangian:

L = F_{Bethe} + \sum_a \gamma_a \Big(1 - \sum_{x_a} b_a(x_a)\Big) + \sum_i \gamma_i \Big(1 - \sum_{x_i} b_i(x_i)\Big) + \sum_a \sum_{i \in N(a)} \sum_{x_i} \lambda_{ai}(x_i) \Big( b_i(x_i) - \sum_{x_a \setminus x_i} b_a(x_a) \Big)

The \gamma_a terms enforce that the large-region beliefs are normalized, the \gamma_i terms that the variable-region beliefs are normalized, and the \lambda_{ai}(x_i) terms enforce marginalization.

The Bethe Lagrangian, continued. Take the derivatives with respect to each b_a and b_i:

\frac{\partial L}{\partial b_a(x_a)} = \ln b_a(x_a) + 1 - \ln f_a(x_a) - \sum_{i \in N(a)} \lambda_{ai}(x_i) - \gamma_a = 0

\frac{\partial L}{\partial b_i(x_i)} = -(d_i - 1)\big(\ln b_i(x_i) + 1\big) + \sum_{a \in N(i)} \lambda_{ai}(x_i) - \gamma_i = 0

Bethe minima are LBP fixed points. So here are the conditions (setting the derivatives to zero):

b_a(x_a) \propto f_a(x_a)\exp\Big(\sum_{i \in N(a)} \lambda_{ai}(x_i)\Big), \qquad b_i(x_i) \propto \exp\Big(\frac{1}{d_i - 1}\sum_{a \in N(i)} \lambda_{ai}(x_i)\Big)

And we can solve them if we set \lambda_{ai}(x_i) = \ln \prod_{b \in N(i) \setminus a} m_{b \to i}(x_i), giving us

b_a(x_a) \propto f_a(x_a)\prod_{i \in N(a)} m_{i \to a}(x_i), \qquad b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)

We saw before that these conditions, together with the marginalization constraint, generate the update rules! So (L minimum) → (LBP fixed point) is proven. The other direction is quite direct – see the exercise. LBP is in fact computing the Lagrange multipliers – a very powerful observation.

Generalizing LBP for region graphs. A region graph is a directed acyclic graph over subsets of nodes in the factor graph – regions (X_R, A_R) with valid multipliers c_R, as defined above. For a region R, P(R) denotes the parents of R and D(R) its descendants. Parent-to-child beliefs: the belief of a region multiplies its own factors, the messages from its parents, and the messages entering its descendants from regions outside D(R) and R:

b_R(x_R) \propto \prod_{a \in A_R} f_a(x_a) \prod_{P \in P(R)} m_{P \to R}(x_R) \prod_{D \in D(R)} \; \prod_{P' \in P(D) \setminus (D(R) \cup \{R\})} m_{P' \to D}(x_D)

Generalizing LBP for region graphs: the parent-to-child algorithm. [Figure: a region graph with regions I, J, P, R, marking the sets D(P)+P and D(R)+R.] Write D(R)+R for a region R together with its descendants, and define two message sets for a parent P and child R:

N(P, R) = { messages I → J such that I is not in D(P)+P, and J is in D(P)+P but not in D(R)+R }
D(P, R) = { messages I → J such that I is in D(P)+P but not in D(R)+R, and J is in D(R)+R }

The parent-to-child update then sums over the variables of P that are not in R, using the factors of P that are not in R together with the messages in N(P, R), and divides by the messages in D(P, R):

m_{P \to R}(x_R) \propto \frac{\sum_{x_P \setminus x_R} \prod_{a \in A_P \setminus A_R} f_a(x_a) \prod_{(I,J) \in N(P,R)} m_{I \to J}(x_J)}{\prod_{(I,J) \in D(P,R)} m_{I \to J}(x_J)}

GLBP in practice:
LBP is very attractive for users: really simple to implement, very fast.
LBP performance is limited by the size of the region assignments x_a, which can grow rapidly with the factors' degrees or the size of large regions.
GLBP will be powerful when large regions can capture significant dependencies that are not captured by individual factors – think of a small positive loop or other symmetric effects.
LBP messages can be computed synchronously (factors → variables → factors …); other scheduling options may boost performance considerably.
LBP is just one (quite indirect) way by which Bethe energies can be minimized. Other approaches are possible, some of which can be guaranteed to converge.
The Bethe/region energy minimization can be further constrained to force the beliefs to be realizable. This gives rise to the concept of the Wainwright–Jordan marginal polytope and convex algorithms over it.