Linear Time Methods for Propagating Beliefs: Min Convolution, Distance Transforms and Box Sums
Daniel Huttenlocher, Computer Science Department, December 2004

2 Problem Formulation
• Find good assignment of labels x_i to sites i
  – Set L of k labels
  – Set S of n sites
  – Neighborhood system N ⊆ S × S between sites
• Undirected graphical model
  – Graph G = (S, N)
  – Hidden Markov Model (HMM): chain
  – Markov Random Field (MRF): arbitrary graph
  – Consider first-order models: maximal cliques in G of size 2

3 Problems We Consider
• Labels x = (x_1, …, x_n), observations (o_1, …, o_n)
• Posterior distribution P(x|o) factors
  P(x|o) ∝ ∏_{i∈S} Ψ_i(x_i) ∏_{(i,j)∈N} Ψ_ij(x_i, x_j)
• Sum over labelings
  ∑_x ( ∏_{i∈S} Ψ_i(x_i) ∏_{(i,j)∈N} Ψ_ij(x_i, x_j) )
• Min cost labeling
  min_x ( ∑_{i∈S} Ψ'_i(x_i) + ∑_{(i,j)∈N} Ψ'_ij(x_i, x_j) )
  – Where Ψ'_i = -ln(Ψ_i) and Ψ'_ij = -ln(Ψ_ij)
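To make the min-cost objective concrete, here is a minimal sketch (mine, not from the slides) that evaluates the cost of one labeling on a chain-structured model; the unary table and the pairwise function are hypothetical placeholders for Ψ'_i and Ψ'_ij.

    #include <vector>

    // Cost of one labeling x on a chain: sum of unary costs Psi'_i(x_i)
    // plus pairwise costs Psi'_ij(x_i, x_{i+1}) over consecutive sites.
    // unary[i][l] holds -ln Psi_i(l); pairwise(a, b) returns -ln Psi_ij(a, b).
    double labelingCost(const std::vector<int>& x,
                        const std::vector<std::vector<double> >& unary,
                        double (*pairwise)(int, int)) {
        double cost = 0.0;
        for (size_t i = 0; i < x.size(); ++i)
            cost += unary[i][x[i]];
        for (size_t i = 0; i + 1 < x.size(); ++i)
            cost += pairwise(x[i], x[i + 1]);
        return cost;
    }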

4 Computational Limitation
• Not feasible to directly compute clique potentials for a large label set
  – Computation over Ψ_ij(x_i, x_j) requires O(k²) time
  – Issue both for exact HMM methods and heuristic MRF methods
• Restricts applicability of combinatorial optimization techniques
  – Use variational or other approaches instead
• However, often can do better
  – Problems where the pairwise potential is based on differences between labels: Ψ_ij(x_i, x_j) = Ψ_ij(x_i - x_j)

5 Applications
• Pairwise potentials based on the difference between labels
  – Low-level computer vision problems such as stereo and image restoration
    Labels are disparities or true intensities
  – Event sequences such as web downloads
    Labels are time-varying probabilities

6 Fast Algorithms
• Summing the posterior (sum product)
  – Express as a convolution
  – O(k log k) algorithm using the FFT
  – Better: linear-time approximation algorithms for Gaussian models
• Minimizing the negative log probability cost function (corresponds to max product)
  – Express as a min convolution
  – Linear-time algorithms for common models using distance transforms and lower envelopes

7 Message Passing Formulation
• For concreteness, consider local message update algorithms
  – Techniques apply equally well to recurrence formulations (e.g., Viterbi)
• Iterative local update schemes
  – Every site in parallel computes local estimates
    Based on Ψ and neighboring estimates from the previous iteration
  – Exact (correct) for graphs without loops
  – Also applied as a heuristic to graphs with cycles (loopy belief propagation)

8 Message Passing Updates
• At each step, node j sends neighbor i a message
  – Node j's "view" of i's labels
• Sum product
  m_ji(x_i) = ∑_{x_j} ( Ψ_j(x_j) Ψ_ji(x_j - x_i) ∏_{k∈N(j)\i} m_kj(x_j) )
• Max product (negative log)
  m'_ji(x_i) = min_{x_j} ( Ψ'_j(x_j) + Ψ'_ji(x_j - x_i) + ∑_{k∈N(j)\i} m'_kj(x_j) )
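To see why large label sets are a problem, here is a minimal sketch (mine, not the authors' code) of the direct O(k²) max-product update in the negative-log domain; h is assumed to already combine Ψ'_j with the incoming messages, and pairwiseCost is a hypothetical placeholder for Ψ'_ji.

    #include <vector>
    #include <algorithm>
    #include <limits>

    // Direct O(k^2) max-product (negative log) message update:
    // m'_ji(x_i) = min over x_j of ( pairwiseCost(x_j - x_i) + h(x_j) ).
    std::vector<double> naiveMessage(const std::vector<double>& h,
                                     double (*pairwiseCost)(int)) {
        const int k = static_cast<int>(h.size());
        std::vector<double> m(k, std::numeric_limits<double>::infinity());
        for (int xi = 0; xi < k; ++xi)        // each output label
            for (int xj = 0; xj < k; ++xj)    // each candidate label at j
                m[xi] = std::min(m[xi], pairwiseCost(xj - xi) + h[xj]);
        return m;
    }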

9 Sum Product Message Passing
• Can write the message update as a convolution
  m_ji(x_i) = ∑_{x_j} ( Ψ_ji(x_j - x_i) h(x_j) ) = Ψ_ji ⊗ h
  – Where h(x_j) = Ψ_j(x_j) ∏_{k∈N(j)\i} m_kj(x_j)
• Thus the FFT can be used to compute it in O(k log k) time for k values
  – Can be somewhat slow in practice
• For Ψ_ji a (mixture of) Gaussian(s), can do faster

10 Fast Gaussian Convolution
• A box filter has value 1 in some range
  b_w(x) = 1 if 0 ≤ x ≤ w, 0 otherwise
• A Gaussian can be approximated by repeated convolutions with a box filter
  – Application of the central limit theorem: convolving pdfs tends to a Gaussian
  – In practice, 4 convolutions [Wells, PAMI 86]
    b_w1(x) ⊗ b_w2(x) ⊗ b_w3(x) ⊗ b_w4(x) ≈ G_σ(x)
  – Choose widths w_i such that ∑_i (w_i² - 1)/12 ≈ σ²
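A rough sketch (my own heuristic, not the rule from [Wells, PAMI 86]) of one way to pick four odd box widths whose summed variances (w_i² - 1)/12 approximate a target σ²:

    #include <cmath>
    #include <array>

    // Pick 4 odd box widths w_i so that sum_i (w_i^2 - 1)/12 ~= sigma^2.
    // Heuristic sketch: start from the "ideal" common width, keep widths odd,
    // then grow individual boxes by 2 while the total variance is still short.
    std::array<int, 4> chooseBoxWidths(double sigma) {
        double idealW = std::sqrt(12.0 * sigma * sigma / 4.0 + 1.0);
        int w = static_cast<int>(idealW);
        if (w % 2 == 0) w -= 1;                  // keep widths odd (symmetric boxes)
        std::array<int, 4> widths = {w, w, w, w};
        for (int i = 0; i < 4; ++i) {
            double var = 0.0;
            for (int wi : widths) var += (wi * wi - 1) / 12.0;
            if (var < sigma * sigma) widths[i] += 2;
        }
        return widths;
    }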

11 Fast Convolution Using Box Sum
• Thus can approximate G_σ(x) ⊗ h(x) by a cascade of box filters
  b_w1(x) ⊗ (b_w2(x) ⊗ (b_w3(x) ⊗ (b_w4(x) ⊗ h(x))))
• Compute each b_w(x) ⊗ f(x) in time independent of the box width w: a sliding sum
  – Each successive shift of b_w(x) w.r.t. f(x) requires just one addition and one subtraction
• Overall computation is just 4 add/sub operations per label, O(k) with a very low constant
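A minimal sketch of the sliding-sum box filter described above (boundary handling by zero padding is my assumption): each output reuses the previous window sum with one addition and one subtraction.

    #include <vector>

    // Convolve f with a box of width w using a sliding sum: O(n), independent of w.
    // Values outside [0, n) are treated as zero in this sketch.
    std::vector<float> boxFilter(const std::vector<float>& f, int w) {
        const int n = static_cast<int>(f.size());
        std::vector<float> out(n, 0.0f);
        float windowSum = 0.0f;
        for (int x = 0; x < w && x < n; ++x)      // initial window [0, w)
            windowSum += f[x];
        for (int x = 0; x < n; ++x) {
            out[x] = windowSum;
            if (x + w < n) windowSum += f[x + w]; // one addition: element entering window
            windowSum -= f[x];                    // one subtraction: element leaving window
        }
        return out;
    }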

12 Fast Sum Product Methods
• Efficient computation without assuming a parametric form of the distributions
  – O(k log k) message updates for arbitrary discrete distributions over k labels
    Likelihood, prior and messages
  – Requires the prior to be based on differences between labels rather than their identities
• For (mixture of) Gaussian clique potentials, a linear-time method that in practice is both fast and simple to implement
  – Box sum technique

13 Max Product Message Passing
• Can write the message update as
  m'_ji(x_i) = min_{x_j} ( Ψ'_ji(x_j - x_i) + h'(x_j) )
  – Where h'(x_j) = Ψ'_j(x_j) + ∑_{k∈N(j)\i} m'_kj(x_j)
  – Formulation using minimization of costs, proportional to negative log probabilities
• Convolution-like operation over (min, +) rather than (∑, ×) [FH00, FHK03]
  – No general fast algorithm like the FFT
  – Certain important special cases in linear time

14 Commonly Used Pairwise Costs
• Potts model: Ψ'(x) = 0 if x = 0, d otherwise
• Linear model: Ψ'(x) = c|x|
• Quadratic model: Ψ'(x) = cx²
• Truncated models
  – Truncated linear: Ψ'(x) = min(d, c|x|)
  – Truncated quadratic: Ψ'(x) = min(d, cx²)
• The min convolution can be computed in linear time for any of these cost functions
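For reference, the cost functions above written as plain C++ helpers (a sketch; c and d are the model parameters, and x is the label difference x_j - x_i):

    #include <cstdlib>
    #include <algorithm>

    // Pairwise costs on the label difference x = x_j - x_i.
    double pottsCost(int x, double d)                   { return x == 0 ? 0.0 : d; }
    double linearCost(int x, double c)                  { return c * std::abs(x); }
    double quadraticCost(int x, double c)               { return c * double(x) * double(x); }
    double truncatedLinear(int x, double c, double d)   { return std::min(d, linearCost(x, c)); }
    double truncatedQuadratic(int x, double c, double d){ return std::min(d, quadraticCost(x, c)); }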

15 Potts Model
• Substituting into the min convolution
  m'_ji(x_i) = min_{x_j} ( Ψ'_ji(x_j - x_i) + h'(x_j) )
  can be written as
  m'_ji(x_i) = min( h'(x_i), min_{x_j} h'(x_j) + d )
• No need to compare pairs x_i, x_j
  – Compute the min over x_j once, then compare the result with each x_i
• O(k) time for k labels
  – No special algorithm, just rewrite the expression to make the alternative computation clear
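A minimal sketch of the O(k) Potts-model message computation just described (not the authors' code): one pass to find the global minimum of h', then an elementwise min against that minimum plus d.

    #include <vector>
    #include <algorithm>

    // O(k) Potts message: m(x_i) = min( h(x_i), (min over x_j of h(x_j)) + d ).
    std::vector<double> pottsMessage(const std::vector<double>& h, double d) {
        double globalMin = *std::min_element(h.begin(), h.end());
        std::vector<double> m(h.size());
        for (size_t i = 0; i < h.size(); ++i)
            m[i] = std::min(h[i], globalMin + d);
        return m;
    }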

16 Linear Model
• Substituting into the min convolution yields
  m'_ji(x_i) = min_{x_j} ( c|x_j - x_i| + h'(x_j) )
• Similar form to the L1 distance transform
  min_{x_j} ( |x_j - x_i| + 1_P(x_j) )
  – Where 1_P(x) = 0 when x ∈ P, ∞ otherwise, is an indicator function for membership in P
• The distance transform measures the L1 distance to the nearest point of P
  – Can think of the computation as the lower envelope of cones, one for each element of P

17 Using the L1 Distance Transform
• Linear-time algorithm
  – Traditionally used for indicator functions, but applies to any sampled function
• Forward pass
  – For x_j from 1 to k-1: m(x_j) ← min( m(x_j), m(x_j - 1) + c )
• Backward pass
  – For x_j from k-2 down to 0: m(x_j) ← min( m(x_j), m(x_j + 1) + c )
• Example, c = 1
  – (3,1,4,2) becomes (3,1,2,2) after the forward pass, then (2,1,2,2) after the backward pass
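The forward/backward passes above, written out as a small routine (a sketch of the standard two-pass computation, operating in place on the message array):

    #include <vector>
    #include <algorithm>

    // In-place 1D L1 distance transform of a sampled function:
    // m(x) <- min over y of ( c*|x - y| + m(y) ), computed in two linear passes.
    void l1DistanceTransform(std::vector<float>& m, float c) {
        const int k = static_cast<int>(m.size());
        for (int x = 1; x < k; ++x)          // forward pass: propagate from the left
            m[x] = std::min(m[x], m[x - 1] + c);
        for (int x = k - 2; x >= 0; --x)     // backward pass: propagate from the right
            m[x] = std::min(m[x], m[x + 1] + c);
    }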

18 Quadratic Model
• Substituting into the min convolution yields
  m'_ji(x_i) = min_{x_j} ( c(x_j - x_i)² + h'(x_j) )
• Again similar in form to a distance transform
  – However, algorithms for the L2 (Euclidean) distance do not directly apply as they did in the L1 case
• Compute the lower envelope of parabolas
  – Each value of x_j defines a quadratic constraint: a parabola rooted at (x_j, h'(x_j))
  – Computational geometry gives O(k log k), but here the parabolas are ordered

19 Lower Envelope of Parabolas
• Quadratics ordered x_1 < x_2 < … < x_n
• At step j, consider adding the j-th one to the lower envelope (LE)
  – Maintain two ordered lists
    Quadratics currently visible on the LE
    Intersections currently visible on the LE
  – Compute the intersection of the j-th quadratic with the rightmost quadratic visible on the LE
    If it is right of the rightmost intersection, add the quadratic and the intersection
    If not, this quadratic hides at least the rightmost quadratic; remove it and try again

20 Running Time of Lower Envelope
• Consider adding each quadratic just once
  – Intersection and comparison: constant time
  – Adding to lists: constant time
  – Removing from lists: constant time
    But then need to try again
• Simple amortized analysis
  – Total number of removals is O(k)
    Each quadratic, once removed, is never considered for removal again
• Thus the overall running time is O(k)

21 Overall Algorithm (1D)

    // 1D min convolution with quadratic cost c*(x_j - x_i)^2, via the lower
    // envelope of parabolas.  INF, square() and the constant c are assumed to
    // be defined elsewhere, e.g.:
    //   #define INF 1E20
    //   static inline float square(float a) { return a * a; }
    //   static const float c = 1.0f;
    static float *dt(float *f, int n) {
      float *d = new float[n];        // output values
      float *z = new float[n + 1];    // intersections between visible parabolas
      int *v = new int[n];            // locations of visible parabolas
      int k = 0;                      // index of rightmost visible parabola
      v[0] = 0;
      z[0] = -INF;
      z[1] = +INF;
      for (int q = 1; q <= n - 1; q++) {
        // intersection of the parabola rooted at q with the rightmost visible one
        float s = ((f[q] + c * square(q)) - (f[v[k]] + c * square(v[k])))
                  / (2 * c * q - 2 * c * v[k]);
        while (s <= z[k]) {           // new parabola hides the rightmost one
          k--;
          s = ((f[q] + c * square(q)) - (f[v[k]] + c * square(v[k])))
              / (2 * c * q - 2 * c * v[k]);
        }
        k++;
        v[k] = q;
        z[k] = s;
        z[k + 1] = +INF;
      }
      k = 0;
      for (int q = 0; q <= n - 1; q++) {  // read values off the lower envelope
        while (z[k + 1] < q) k++;
        d[q] = c * square(q - v[k]) + f[v[k]];
      }
      delete [] v;
      delete [] z;
      return d;
    }
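A brief usage sketch (hypothetical): computing a quadratic-cost message from the combined costs h'(x_j) of slide 18.

    // Hypothetical usage, assuming h holds h'(x_j) for k labels:
    // float *m = dt(h, k);   // m[x_i] = min over x_j of ( c*(x_j - x_i)^2 + h[x_j] )
    // ... use the message m ...
    // delete [] m;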

22 Combined Models
• Truncated models
  – Compute the un-truncated message m'
  – Truncate using a Potts-like computation on m' and the original function h'
    min( m'(x_i), min_{x_j} h'(x_j) + d )
• More general combinations
  – Min of any constant number of linear and quadratic functions, with or without truncation
    E.g., multiple "segments"
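A minimal sketch (not from the slides) of that truncation step: run the un-truncated min convolution first, then clamp each entry against the shared minimum of h' plus d.

    #include <vector>
    #include <algorithm>

    // Truncated-model message: given the un-truncated message m and the original
    // combined costs h, apply the Potts-like truncation with threshold d.
    std::vector<double> truncateMessage(const std::vector<double>& m,
                                        const std::vector<double>& h, double d) {
        double hMin = *std::min_element(h.begin(), h.end());
        std::vector<double> out(m.size());
        for (size_t i = 0; i < m.size(); ++i)
            out[i] = std::min(m[i], hMin + d);
        return out;
    }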

23 Illustrative Results
• Image restoration using an MRF formulation with truncated quadratic clique potentials
  – Simply not practical with conventional techniques (quadratic-time message updates)
• Fast quadratic min convolution technique makes it feasible
  – A multi-grid technique can speed it up further
• A powerful formulation that had largely been abandoned for such problems

24 Illustrative Results
• Pose detection and object recognition
  – Sites are parts of an articulated object, such as the limbs of a person
  – Labels are locations of each part in the image
    Millions of labels; conventional quadratic-time methods do not apply
  – Compatibilities are spring-like

25 Summary
• Linear-time methods for propagating beliefs
  – Combinatorial approach
  – Applies to problems with a discrete label space where the potential function is based on differences between pairs of labels
• Exact methods, not heuristic pruning or variational techniques
  – Except the linear-time Gaussian convolution, which has a small fixed approximation error
• Fast in practice, simple to implement

26 Readings
• P. Felzenszwalb and D. Huttenlocher, Efficient Belief Propagation for Early Vision, Proceedings of IEEE CVPR, Vol. 1.
• P. Felzenszwalb and D. Huttenlocher, Distance Transforms of Sampled Functions, Cornell CIS Technical Report.
• P. Felzenszwalb and D. Huttenlocher, Pictorial Structures for Object Recognition, Intl. Journal of Computer Vision, 61(1).
• P. Felzenszwalb, D. Huttenlocher and J. Kleinberg, Fast Algorithms for Large State Space HMMs with Applications to Web Usage Analysis, NIPS 16, 2003.