Sublinear Computational Time Modeling in Statistical Machine Learning Theory for Markov Random Fields
Kazuyuki Tanaka, GSIS, Tohoku University, Sendai, Japan
http://www.smapip.is.tohoku.ac.jp/~kazu/
Collaborators: Muneki Yasuda (Yamagata University, Japan), Masayuki Ohzeki (Kyoto University, Japan), Shun Kataoka (Tohoku University, Japan)
24 September, 2015, University of Roma, La Sapienza
This talk shows how to accelerate Bayesian image segmentation by using the real-space renormalization group transformation of statistical mechanics. Our Bayesian image segmentation model is based on Markov random fields and loopy belief propagation.
Markov Random Fields and Loopy Belief Propagation
Slide labels: Markov random fields = classical spin systems; loopy belief propagation = Bethe approximation; probabilistic information processing; probabilistic models and statistical machine learning; Bayes formulas, maximum likelihood, KL divergence; message between nodes i and j.
V: set of all the nodes (vertices) in graph G. E: set of all the links (edges) in graph G.
The probability distribution of a Markov random field can be expressed as a product of pairwise weights over all neighbouring pairs of pixels. On this slide, a_i is the state variable at pixel i on a square grid graph. In loopy belief propagation, some statistical quantities are approximately expressed in terms of messages between neighbouring pixels. The messages are determined so as to satisfy the message passing rules, which are regarded as simultaneous fixed-point equations. Practical algorithms of loopy belief propagation are realized as iterative methods for solving these simultaneous fixed-point equations for the messages.
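The message passing iteration described in the notes can be sketched as follows. This is a minimal illustration, not the code used in the talk: the function names `loopy_bp` and `exact_marginals`, the pair weight exp(K * [a_i == a_j]), and the per-node weights `unary` are assumptions introduced here. The brute-force routine is only there to check the result on a tiny graph.

```python
import itertools
import numpy as np

def loopy_bp(n_nodes, edges, unary, K, n_iter=200, tol=1e-12):
    """Sum-product belief propagation for an assumed pairwise Potts model:
    P(a) ~ prod_i unary[i, a_i] * prod_{(i,j) in edges} exp(K * [a_i == a_j]).
    Returns approximate one-body marginals of shape (n_nodes, q)."""
    unary = np.asarray(unary, dtype=float)
    q = unary.shape[1]
    pair = np.ones((q, q)) + (np.exp(K) - 1.0) * np.eye(q)  # Potts pair weight
    neighbours = {i: [] for i in range(n_nodes)}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    # msg[(i, j)] is the message from node i to node j (a vector over a_j).
    msg = {(i, j): np.full(q, 1.0 / q) for i in neighbours for j in neighbours[i]}
    for _ in range(n_iter):
        new_msg = {}
        for (i, j) in msg:
            # Local weight times all incoming messages except the one from j.
            prod = unary[i].copy()
            for k in neighbours[i]:
                if k != j:
                    prod *= msg[(k, i)]
            m = pair.T @ prod              # sum over the state a_i
            new_msg[(i, j)] = m / m.sum()  # normalise for numerical stability
        diff = max((np.abs(new_msg[e] - msg[e]).max() for e in msg), default=0.0)
        msg = new_msg
        if diff < tol:                     # fixed point of the message equations
            break
    beliefs = unary.copy()
    for i in range(n_nodes):
        for k in neighbours[i]:
            beliefs[i] *= msg[(k, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

def exact_marginals(n_nodes, edges, unary, K):
    """Brute-force marginals by enumerating all q**n_nodes states (tiny graphs only)."""
    unary = np.asarray(unary, dtype=float)
    q = unary.shape[1]
    marg = np.zeros((n_nodes, q))
    for a in itertools.product(range(q), repeat=n_nodes):
        w = np.prod([unary[i, a[i]] for i in range(n_nodes)])
        w *= np.exp(K * sum(a[i] == a[j] for i, j in edges))
        for i in range(n_nodes):
            marg[i, a[i]] += w
    return marg / marg.sum(axis=1, keepdims=True)
```

On a graph without loops the fixed point reproduces the exact marginals; on the square grid the same iteration gives only approximate marginals, and convergence is not guaranteed. Taking `beliefs.argmax(axis=1)` on the returned array gives a per-node label.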
Bayesian Modeling for Image Segmentation
Image segmentation using MRF (JPSJ, 2014): segment image data into regions using belief propagation and the EM algorithm.
Slide labels: data d; parameter a_i = 0, 1, …, q-1; posterior probability distribution; data generative model; Potts prior; 12q+1 hyperparameters; likelihood of hyperparameters; maximization of posterior marginal (MPM).
Now we consider Bayesian modelling of image segmentation for color images. Segmentation can be regarded as clustering the pixels of one observed color image. On this slide, d is the observed color image, which is the data in our problem, and a_i is the label at pixel i. It takes integer values from 0 to q-1, so q is the number of possible labels.
The posterior probability distribution in our Bayesian model is expressed as a Markov random field with two factors. The first factor is the data generative model, a product of three-dimensional Gaussian distributions over all pixels. The second factor is the prior probability distribution, a Potts model with spatially uniform ferromagnetic interactions between all neighbouring pixels on the square grid graph. The posterior can therefore be regarded as a ferromagnetic Potts model with random external fields, in which d plays the role of the external fields and a is the state variable of the classical spin system.
This model includes 12q+1 hyperparameters, which are determined in the maximum likelihood framework: the probability of the data d given the hyperparameters is regarded as a likelihood function of the 12q+1 hyperparameters, and the hyperparameters are chosen so as to maximize it. After the hyperparameters are determined, the label a_i at each pixel is chosen so as to maximize the one-body marginal of the posterior probability distribution at that pixel (MPM).
Test image: Berkeley Segmentation Data Set 500 (BSDS500), http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/; P. Arbelaez, M. Maire, C. Fowlkes and J. Malik: IEEE Trans. PAMI, 33, 898, 2011.
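The slide's formulas are not preserved in the text, so the following is only a sketch of what the notes describe, in assumed notation: the symbols alpha (interaction), m(xi) and C(xi) (mean and covariance of label xi), and theta (all hyperparameters) are introduced here for illustration. The count 12q+1 is from the talk; reading it as 3 mean components plus 9 covariance entries per label, plus one interaction, is an assumption.

```latex
% Posterior: Gaussian data-generative factors times a ferromagnetic Potts prior
P(a \mid d, \theta) \;\propto\;
  \prod_{i \in V} \mathcal{N}\!\bigl(d_i \,;\, m(a_i),\, C(a_i)\bigr)
  \prod_{\{i,j\} \in E} \exp\!\bigl(\alpha\, \delta_{a_i, a_j}\bigr),
  \qquad a_i \in \{0, 1, \dots, q-1\}

% Likelihood of the hyperparameters (marginal over all labelings)
P(d \mid \theta) \;=\; \sum_{a} P(d \mid a, \theta)\, P(a \mid \alpha),
  \qquad \hat{\theta} \;=\; \arg\max_{\theta} P(d \mid \theta)

% Maximization of the posterior marginal (MPM) at each pixel
\hat{a}_i \;=\; \arg\max_{a_i} P(a_i \mid d, \hat{\theta}),
  \qquad P(a_i \mid d, \hat{\theta}) \;=\; \sum_{a \setminus a_i} P(a \mid d, \hat{\theta})
```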
Bayesian Modeling for Image Segmentation
Slide labels: likelihood of hyperparameters; extremum conditions; deterministic equations of hyperparameters; posterior probability distribution; Potts prior.
The maximization of the likelihood can be reduced to its extremum conditions with respect to the hyperparameters. We determine the estimates of the hyperparameters by solving these extremum conditions. The posterior marginals and the prior marginals that appear in the deterministic equations are computed by loopy belief propagation.
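As an illustration of how such deterministic equations use the posterior marginals, here is a minimal sketch of the standard marginal-weighted update for the Gaussian hyperparameters. It assumes the extremum conditions for the means and covariances take the usual EM form; the talk's actual equations are not preserved in the text, and the function name and array layout are hypothetical. The condition for the Potts interaction, which also involves the prior marginals, is not shown.

```python
import numpy as np

def update_gaussian_hyperparameters(d, marginals):
    """Marginal-weighted mean and covariance per label (assumed EM-style update).

    d         : (N, 3) array of colour vectors, one row per pixel
    marginals : (N, q) array of posterior one-body marginals P(a_i = k | d)
    Returns means of shape (q, 3) and covariances of shape (q, 3, 3).
    """
    d = np.asarray(d, dtype=float)
    marginals = np.asarray(marginals, dtype=float)
    weight = marginals.sum(axis=0)                 # effective pixel count per label
    means = (marginals.T @ d) / weight[:, None]    # weighted average colour
    q, dim = marginals.shape[1], d.shape[1]
    covs = np.empty((q, dim, dim))
    for k in range(q):
        centred = d - means[k]
        # Weighted sum of outer products of the centred colour vectors.
        covs[k] = (marginals[:, k, None] * centred).T @ centred / weight[k]
    return means, covs
```

In an iterative scheme, the marginals would be recomputed by loopy belief propagation after each such update until the hyperparameters stop changing.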
Practical Algorithm based on Potts Prior
V: set of all the vertices. E: set of all the edges.
Slide labels: posterior marginals; Potts prior marginals; repeat until hyperparameters converge.
Our algorithm has three parts. The first part corresponds to the E-step of the expectation-maximization algorithm and mainly computes statistical quantities of the posterior probability distribution of our Markov random field. The second part computes statistical quantities of our Potts prior probability distribution. The third part maximizes the marginal of the posterior probability distribution at each pixel. In each step we have to compute, approximately and by loopy belief propagation, the one-body and two-body marginals of the posterior probability distribution of the Markov random field and of the prior probability distribution of the q-state Potts model.
Bayesian Image Segmentation
Hyperparameter estimation in the maximum likelihood framework and maximization of posterior marginal (MPM) estimation with loopy belief propagation for the observed color image d (481 x 321), q = 8.
Slide labels: Potts prior hyperparameter; Intel® Core™ i7-4600U CPU with a memory of 8 GB.
This is one of our numerical experiments. The number of labels is 8. The computation takes about 30 minutes.
Image source: Berkeley Segmentation Data Set 500 (BSDS500), http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/; P. Arbelaez, M. Maire, C. Fowlkes and J. Malik: IEEE Trans. PAMI, 33, 898, 2011.
Coarse Graining
[Figure: a chain of nodes 1 to 7 coupled by the interaction K. Step 1: summing over the even-numbered nodes leaves nodes 1, 3, 5, 7 coupled by K(1). Step 2: the remaining nodes are renumbered 1, 2, 3, 4.]
In order to realize sublinear computational time modeling for our image segmentation algorithm, we apply a coarse graining technique to our Potts prior. For simplicity, we first consider a Potts prior on a one-dimensional chain graph. We sum over the states of all the even-numbered nodes. This procedure generates another Potts prior on the remaining nodes, and its new interaction K(1) is expressed in terms of the previous interaction K.
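The decimation step on the chain can be worked out explicitly. Assuming the pair weight is exp(K * [a == b]) (the talk's exact parametrization is not preserved in the text), summing over a middle node b between a and c gives e^{2K} + q - 1 when a = c and 2e^{K} + q - 2 otherwise, so the new interaction satisfies exp(K(1)) = (e^{2K} + q - 1) / (2e^{K} + q - 2). The sketch below implements this under that assumption; the function names are hypothetical.

```python
import math

def decimate(K, q):
    """New interaction K(1) after summing out every other node of a q-state
    Potts chain, assuming the pair weight exp(K * [a == b])."""
    x = math.exp(K)
    return math.log((x * x + q - 1) / (2 * x + q - 2))

def summed_weight(a, c, K, q):
    """Brute-force sum over the middle node b between end states a and c."""
    return sum(math.exp(K * ((a == b) + (b == c))) for b in range(q))
```

The ratio `summed_weight(0, 0, K, q) / summed_weight(0, 1, K, q)` equals `exp(decimate(K, q))`, which is a quick way to check the closed form. For K > 0 the new interaction is positive but weaker than K.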
Coarse Graining
[Plot with axes x and y.] If K(2) is given, the original value of K can be estimated by iterating the inverse transformation.
By repeating the same procedure, we obtain a coarse graining procedure, which is referred to as a real-space renormalization group transformation in statistical physics. If we can estimate the interaction parameter of the coarse-grained Potts prior, we can estimate the interaction parameter of the original Potts prior by applying the inverse transformation. We regard this as an inverse real-space renormalization group transformation for the Potts prior on a one-dimensional chain graph.
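For the one-dimensional chain the inverse step can be sketched in closed form. Assuming the pair weight exp(K * [a == b]) and writing x = e^{K}, y = e^{K(1)}, the decimation relation y = (x^2 + q - 1) / (2x + q - 2) is a quadratic in x whose ferromagnetic root is x = y + sqrt((y - 1)(y + q - 1)). The slide's own iteration formula is not preserved in the text, so this closed form is a stand-in, and all function names are hypothetical.

```python
import math

def decimate(K, q):
    """Forward step: K -> K(1) for a q-state Potts chain, pair weight exp(K * [a == b])."""
    x = math.exp(K)
    return math.log((x * x + q - 1) / (2 * x + q - 2))

def inverse_decimate(K1, q):
    """Inverse step: recover K from K(1), valid for ferromagnetic K(1) >= 0."""
    if K1 < 0:
        raise ValueError("this sketch assumes a ferromagnetic interaction, K1 >= 0")
    y = math.exp(K1)
    x = y + math.sqrt((y - 1.0) * (y + q - 1.0))  # larger root of the quadratic
    return math.log(x)

def inverse_rg(K_coarse, q, n_steps):
    """Apply the inverse step n_steps times, e.g. from K(2) back to K with n_steps=2."""
    K = K_coarse
    for _ in range(n_steps):
        K = inverse_decimate(K, q)
    return K
```

Applying `decimate` twice to some K and then calling `inverse_rg(..., n_steps=2)` returns the starting value up to rounding. This covers the chain only; an approximate square-grid version is not reproduced here.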
Coarse Graining
q = 8
On the square grid graph, a similar coarse graining procedure can be carried out approximately. This scheme can be regarded as an inverse real-space renormalization group transformation in Bayesian modeling of image segmentation problems.
Bayesian Image Segmentation
Slide labels: original image (481 x 321); coarse graining procedures (r = 8); coarse-grained image (30 x 20); hyperparameter estimation in the maximum likelihood framework with belief propagation for the original image; hyperparameter estimation in the maximum likelihood framework with belief propagation after the coarse graining procedures; Potts prior hyperparameter; segmentation by belief propagation for the original image (MPM with LBP); labeled image; q = 8; Intel® Core™ i7-4600U CPU with a memory of 8 GB.
This is one of our numerical experiments. First we generate a small image from the original image. By applying our EM algorithm with belief propagation to the small image, we estimate the hyperparameter of the coarse-grained Potts prior. By applying the inverse real-space renormalization group transformation, we then estimate the hyperparameter of the original Potts prior. Our method takes less than 2 minutes, whereas our original EM algorithm takes about 30 minutes.
Image source: Berkeley Segmentation Data Set 500 (BSDS500), http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/
Summary
Bayesian image segmentation modeling by loopy belief propagation and the EM algorithm.
By introducing inverse real-space renormalization group transformations, the computational time can be reduced.
[Figures: observed image; ground truth; labeled image by our proposed algorithm.]
In the first part of the talk, we showed Bayesian image segmentation modeling by loopy belief propagation and the EM algorithm in statistical machine learning theory. In the second part, we showed that sublinear computational time modeling can be realized by introducing inverse real-space renormalization group transformations into our problem. It is expected that the prior can be learned from a database of ground truths.
Image source: Berkeley Segmentation Data Set 500 (BSDS500), http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/; P. Arbelaez, M. Maire, C. Fowlkes and J. Malik: IEEE Trans. PAMI, 33, 898, 2011.
Markov Random Fields
[Figures: original image, degraded image and restored image, for noise reduction and for image inpainting with a missing rate of 90.0%.]
Sublinear Time Computational Modeling in Big Data Sciences from Statistical-Mechanical Point of View
Slide labels: difficulty of Big Data (# of data points, dimension of data point); computational time versus system size N, with sublinear time computational modeling as our target; statistical-mechanical informatics, computational theory, statistical sciences.
Before closing my talk, I introduce our new project on Big Data Sciences, which is promoted mainly by researchers in computational theory. I am one of the members. The project started in the middle of 2014, aims to create innovative algorithms for Big Data, and is supported by the Japan Science and Technology Agency.
We consider that the difficulty of Big Data lies in the huge number of data points and the high dimensionality of each data point. Previous targets of statistical-mechanical informatics have mainly been how to treat massive statistical models with high-dimensional data points. We consider that novel algorithms for Big Data, with a huge number of high-dimensional data points, can be created by combining our statistical-mechanical informatics with computational theory and statistical sciences.
One of the key words is sublinear time computational modeling, which means that the computational time should be reduced to less than the order of the system size N. One of the key points in realizing sublinear computational time modeling is how to coarse-grain the observed data, and we expect renormalization group theory to be one of the powerful technologies for this.
References
K. Tanaka, M. Yasuda and D. M. Titterington: Bayesian image modeling by means of generalized sparse prior and loopy belief propagation, Journal of the Physical Society of Japan, vol.81, no.11, article no.114802, November 2012.
K. Tanaka, S. Kataoka, M. Yasuda, Y. Waizumi and C.-T. Hsu: Bayesian image segmentations by Potts prior and loopy belief propagation, Journal of the Physical Society of Japan, vol.83, no.12, article no.124002, December 2014.
K. Tanaka, S. Kataoka, M. Yasuda and M. Ohzeki: Inverse renormalization group transformation in Bayesian image segmentations, Journal of the Physical Society of Japan, vol.84, no.4, article no.045001, April 2015.