Statistics of natural images. May 30, 2010. Ofer Bartal, Alon Faktor.

Outline: motivation; classical statistical models; a new MRF-based approach; learning the models; applications and results.

Motivation: natural images vary enormously in appearance. Can we even dream of modeling this?

Motivation. Main questions: Do all natural images obey some common “rules”? How can one find these “rules”? How can the “rules” be used for computer vision tasks?

Motivation: Why bother to model at all? Because of “noise” and uncertainty, many answers are possible, and a natural image model helps choose the “best” one. Let's see some examples.

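Noise-blur removal: Consider the classical deconvolution problem. It can be formulated as a linear set of equations, $y = Hx + n$, where $x$ is the unknown sharp image, $H$ is the blur matrix, $n$ is noise and $y$ is the observed image.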

Noise-blur removal (figure: example of the observation model).

Inpainting: here $H$ is the identity matrix with the rows of the missing pixels removed, which gives an under-determined system.

Motivation. Problems: the noise is unknown; $H$ may be singular (deconvolution); $H$ may be under-determined (inpainting). So there can be many solutions. How can we find the “right” one?
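A small 1D illustration of both problems (our own sketch, not from the slides). It assumes circular boundary conditions and a 3-tap box blur, and the helper names `blur_matrix` and `inpainting_matrix` are hypothetical. It shows that two different signals can produce exactly the same observation $Hx$, so the data alone cannot pick the answer.

```python
import numpy as np

def blur_matrix(n, taps=(1 / 3, 1 / 3, 1 / 3)):
    """n x n matrix of a circular 1D convolution with a centred 3-tap kernel."""
    H = np.zeros((n, n))
    for i in range(n):
        for offset, w in zip((-1, 0, 1), taps):
            H[i, (i + offset) % n] += w
    return H

def inpainting_matrix(n, missing):
    """Identity matrix with the rows of the missing pixels removed."""
    missing = set(missing)
    keep = [i for i in range(n) if i not in missing]
    return np.eye(n)[keep]

n = 6
H_blur = blur_matrix(n)
H_inp = inpainting_matrix(n, missing=[2, 3])
x = np.array([0.0, 1.0, 2.0, 2.0, 1.0, 0.0])

# Deconvolution: H_blur is singular, so adding a null-space vector leaves Hx unchanged.
v = np.cos(2 * np.pi * np.arange(n) / 3)          # period-3 pattern; every 3-sample average is 0
print(np.linalg.matrix_rank(H_blur))               # 4 (out of 6)
print(np.allclose(H_blur @ (x + v), H_blur @ x))   # True

# Inpainting: 4 equations for 6 unknowns; the missing pixels can take any value.
e = np.zeros(n)
e[[2, 3]] = 5.0
print(H_inp.shape, np.allclose(H_inp @ (x + e), H_inp @ x))  # (4, 6) True
```

In both cases a prior over $x$ is what breaks the tie between $x$ and $x + v$ (or $x + e$).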

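Motivation. Goal: estimate $x$. Assume a prior model of natural images, $p(x)$, and a prior model of the noise, which gives the likelihood $p(y \mid x)$. Use the MAP estimator to find $x$: $\hat{x} = \arg\max_x p(x \mid y) = \arg\max_x p(y \mid x)\,p(x)$.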

Energy minimization problem: the MAP problem can be reformulated as $\hat{x} = \arg\min_x \left[-\log p(y \mid x) - \log p(x)\right]$.

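Proof: by Bayes' rule, $p(x \mid y) = p(y \mid x)\,p(x)/p(y)$, and $p(y)$ does not depend on $x$. Since $-\log$ is strictly decreasing, maximizing $p(y \mid x)\,p(x)$ is equivalent to minimizing $-\log p(y \mid x) - \log p(x)$.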

Classical models: a smoothness prior (a model of image gradients), either a Gaussian prior (a least-squares problem) or an L1 or sparse prior (an IRLS problem).

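Gaussian priors: assume Gaussian priors on the gradients of $x$ and Gaussian noise. Under these assumptions the energy is quadratic and MAP estimation becomes a least-squares problem: $\hat{x} = \arg\min_x \frac{1}{2\sigma_n^2}\|y - Hx\|^2 + \frac{1}{2\sigma_g^2}\left(\|D_h x\|^2 + \|D_v x\|^2\right)$, where $D_h$ and $D_v$ are the horizontal and vertical derivative operators.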
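A minimal 1D sketch of the Gaussian-prior case. It assumes a circular 3-tap blur, a single forward-difference operator $D$ and $\lambda = \sigma_n^2/\sigma_g^2$; these choices and the helper names are hypothetical, for illustration only. Minimizing $\|y - Hx\|^2 + \lambda\|Dx\|^2$ reduces to the normal equations $(H^T H + \lambda D^T D)\,x = H^T y$, a single linear solve.

```python
import numpy as np

def circular_blur(n, taps=(0.25, 0.5, 0.25)):
    """n x n matrix of a circular 1D blur with a centred 3-tap kernel."""
    H = np.zeros((n, n))
    for i in range(n):
        for offset, w in zip((-1, 0, 1), taps):
            H[i, (i + offset) % n] += w
    return H

def forward_difference(n):
    """Circular derivative operator: (D x)_i = x_{i+1} - x_i."""
    return np.roll(np.eye(n), 1, axis=1) - np.eye(n)

def gaussian_prior_map(y, H, lam):
    """Minimise ||y - H x||^2 + lam * ||D x||^2 via the normal equations."""
    D = forward_difference(len(y))
    A = H.T @ H + lam * D.T @ D
    return np.linalg.solve(A, H.T @ y)

def objective(x, y, H, lam):
    D = forward_difference(len(y))
    return np.sum((y - H @ x) ** 2) + lam * np.sum((D @ x) ** 2)

rng = np.random.default_rng(0)
n = 64
x_true = np.where(np.arange(n) < n // 2, 0.0, 1.0)     # a step edge
H = circular_blur(n)
y = H @ x_true + 0.02 * rng.standard_normal(n)
x_hat = gaussian_prior_map(y, H, lam=0.05)
x_smooth = gaussian_prior_map(y, H, lam=1.0)           # stronger prior: smoother result
D = forward_difference(n)
print(np.sum((D @ x_hat) ** 2) > np.sum((D @ x_smooth) ** 2))  # True
```

The system is solvable even though $H$ itself is singular here: $H$ loses the highest frequency, $D$ loses only the constant component, and $\lambda > 0$ combines them into a positive-definite matrix. The same quadratic penalty also blurs the step edge, which is the bias toward smooth images criticized later in the talk.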

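Non-Gaussian priors: empirically, image gradients have a non-Gaussian, heavy-tailed distribution. We therefore assume an L1 or sparse prior, $-\log p(x) = \sum_i \left(|(D_h x)_i|^{\gamma} + |(D_v x)_i|^{\gamma}\right) + \text{const}$, with $\gamma = 1$ (L1) or $\gamma < 1$ (sparse), and solve the problem by IRLS (iteratively reweighted least squares).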
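A rough IRLS sketch for 1D denoising ($H = I$) with the sparse penalty $\lambda\sum_i |(Dx)_i|^{\gamma}$. Each iteration replaces $|t|^{\gamma}$ by a weighted quadratic fitted at the current estimate and solves the resulting least-squares problem. The values $\lambda = 0.5$, $\gamma = 0.8$ and the small constant `eps` that keeps the weights finite are assumptions, not taken from the slides.

```python
import numpy as np

def irls_denoise(y, lam=0.5, gamma=0.8, iters=30, eps=1e-3):
    """Approximately minimise ||y - x||^2 + lam * sum_i |(D x)_i|^gamma by IRLS."""
    n = len(y)
    D = np.roll(np.eye(n), 1, axis=1) - np.eye(n)       # circular forward difference
    x = y.copy()
    for _ in range(iters):
        # |t|^gamma is replaced by (gamma / 2) * w * t^2 with w = |t_prev|^(gamma - 2): a quadratic
        # upper bound that touches |t|^gamma at the current estimate (gamma <= 2, ignoring eps).
        w = np.maximum(np.abs(D @ x), eps) ** (gamma - 2)
        A = np.eye(n) + 0.5 * lam * gamma * D.T @ (w[:, None] * D)
        x = np.linalg.solve(A, y)                        # weighted least-squares step
    return x

rng = np.random.default_rng(0)
clean = np.where(np.arange(64) < 32, 0.0, 1.0)          # piecewise-constant signal
noisy = clean + 0.1 * rng.standard_normal(64)
x_hat = irls_denoise(noisy)
print(np.mean((noisy - clean) ** 2), np.mean((x_hat - clean) ** 2))
```

Small differences (noise) receive large weights and are smoothed strongly, while the unit-height edges receive small weights and survive, which is exactly what a heavy-tailed prior is meant to do.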

Deconvolution results (figure: blurred image; Gaussian prior; sparse prior). Good results on simple images.

Denoising results (figure: noisy image; denoising result). Poor results on real natural images.

Classical models, pros and cons. Advantages: simple and easy to implement. Disadvantages: too heuristic; only one property (smoothness); biased toward totally smooth images.

Going beyond classical models

Modern approach: the model is based on image properties, chosen using an image dataset. Questions: 1. What types of properties? Responses to linear filters. 2. How do we find good properties? Either use a predetermined filter bank or learn them from data. 3. How should the properties be combined into one distribution? We will see how.

Mathematical framework: we want a model $p(I)$ of the real distribution $f(I)$. This is computationally hard: a 100x100 pixel image has 10,000 variables, so we can explicitly model only a few dimensions at a time. (Figure: each arrow is a viewpoint onto a few dimensions.)

Mathematical framework: a viewpoint is the response to a linear filter. The distribution over these responses is a marginal of the real distribution $f(I)$ (a marginal is a distribution over a subset of variables). (Figure: each arrow is a marginal of $f(I)$.)

Mathematical framework: if $p(I)$ and $f(I)$ have the same marginal distributions for all linear filters, then $p(I) = f(I)$ (a proposition by Zhu and Mumford). The “hope”: if we choose $K$ “good” filters, then $p(I)$ and $f(I)$ will be “close”. How do we measure “close”?

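Distance between distributions: the Kullback-Leibler divergence, $KL(f \,\|\, p) = \int f(I)\log\frac{f(I)}{p(I)}\,dI = E_f[\log f(I)] - E_f[\log p(I)]$. Problem: $f(I)$ is unknown. Proposition: since the first term does not depend on $p$, maximize instead the average log-likelihood of the observed images, $\frac{1}{N}\sum_{n=1}^{N}\log p(I_n)$, which measures the fit of the model to the observations.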
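A toy discrete check of this proposition, with made-up distributions: because $KL(f\|p) = E_f[\log f] - E_f[\log p]$ and the first term is the same for every model, ranking candidate models by the average log-likelihood of samples from $f$ gives the same order as ranking them by KL divergence, even when $f$ itself is only available through samples.

```python
import numpy as np

def kl(f, p):
    """KL(f || p) for discrete distributions with full support."""
    return float(np.sum(f * np.log(f / p)))

def avg_log_likelihood(samples, p):
    """(1/N) sum_n log p(I_n): needs only samples from f, not f itself."""
    return float(np.mean(np.log(p[samples])))

f = np.array([0.5, 0.25, 0.15, 0.10])                   # "true" distribution, unknown in practice
models = {
    "close": np.array([0.45, 0.30, 0.15, 0.10]),
    "far": np.array([0.25, 0.25, 0.25, 0.25]),
}
rng = np.random.default_rng(0)
samples = rng.choice(len(f), size=20000, p=f)           # the "observations"
for name, p in models.items():
    print(name, round(kl(f, p), 4), round(avg_log_likelihood(samples, p), 4))
```

The model with the smaller KL divergence (about 0.007 versus 0.178 here) is also the one with the higher average log-likelihood.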

Illustration (figure).

Getting synthesized images: synthesized images are obtained by sampling the learned model with Markov chain Monte Carlo (MCMC). Drawback: the learning process is slow.

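Our model $p(I)$: an MRF (Markov random field). An MRF is based on a graph $G = (V, E)$, where $V$ are the pixels and $E$ connects pixels that affect each other. Our distribution is the MRF $p(I) = \frac{1}{Z}\prod_{c \in C}\Phi_c(I_c)$, where $C$ is the set of cliques of $G$ and $Z$ is the partition function.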

Simple grid MRF: here the cliques are edges, and every pixel belongs to 4 cliques.

MRF: we limit ourselves to cliques of fixed size (overlapping patches), with the same potential for all cliques. We get $p(I) = \frac{1}{Z}\prod_{c \in C}\Phi(I_c)$.

MRF simulation (figure).

Histogram simulation (figure: histogram of a marginal).

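MRF in terms of convolutions: denote the set of filters by $\{F_k\}_{k=1}^{K}$ and the set of potential functions by $\{\lambda_k\}_{k=1}^{K}$. Then $p(I) = \frac{1}{Z}\exp\left(-\sum_{k=1}^{K}\sum_{x,y}\lambda_k\big((F_k * I)(x,y)\big)\right)$.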
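A sketch of this energy, assuming the potentials $\lambda_k$ are stored as values on histogram bins (piecewise-constant functions, as in non-parametric learning). The inner sum over pixels then becomes an inner product between $\lambda_k$ and the histogram of the responses to $F_k$. The two difference filters and the quadratic potentials below are hypothetical stand-ins, and `filter_response` computes a correlation, which equals convolution with the flipped kernel.

```python
import numpy as np

def filter_response(image, kernel):
    """'Valid' 2D correlation with a small kernel (equals convolution with the flipped kernel)."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for a in range(kh):
        for b in range(kw):
            out += kernel[a, b] * image[a:a + h, b:b + w]
    return out

def frame_energy(image, filters, potentials, edges):
    """E(I) = sum_k sum_{x,y} lambda_k(response); p(I) is proportional to exp(-E(I)).

    Each lambda_k is piecewise constant on the bins given by `edges`, so the inner
    sum equals <lambda_k, H_k(I)> with H_k the unnormalised response histogram."""
    energy = 0.0
    for kernel, lam in zip(filters, potentials):
        counts, _ = np.histogram(filter_response(image, kernel), bins=edges)
        energy += float(np.dot(lam, counts))
    return energy

edges = np.linspace(-1.05, 1.05, 22)                    # 21 bins of width 0.1, one centred on 0
centres = 0.5 * (edges[:-1] + edges[1:])
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]   # horizontal / vertical differences
potentials = [centres ** 2, centres ** 2]                # hypothetical potentials penalising large responses

flat = np.full((8, 8), 0.5)
rng = np.random.default_rng(0)
noisy = flat + rng.uniform(-0.15, 0.15, size=flat.shape)
print(frame_energy(flat, filters, potentials, edges))    # about 0: every response falls in the 0 bin
print(frame_energy(noisy, filters, potentials, edges))   # larger, so the noisy image is less probable
```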

MRF, a simple example: cliques of size 1. The pixels are i.i.d. and distributed according to the grayscale histogram. Drawback: the cliques are too small.

MRF, another simple example: the clique is the whole image. Result: a uniform distribution over the images in the dataset. Drawback: the cliques are too big.

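Formulation as a Gibbs model: with a single 1x1 intensity filter, all pixels are i.i.d. and $p(I) = \frac{1}{Z}\exp\left(-\sum_{x,y}\lambda(I(x,y))\right) = \prod_{x,y}\frac{e^{-\lambda(I(x,y))}}{z}$, where $z = \sum_{v} e^{-\lambda(v)}$ sums over the gray levels.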

Formulation as a Gibbs model: a uniform distribution on the image dataset.

Revisiting classical models: the classical model is actually a pairwise MRF. It has cliques of size 2 and only 2 linear filters (the horizontal and vertical derivatives), hence only 2 marginals, so there is no guarantee that $p(I)$ will be close to $f(I)$.

Comparison between models (figure: classical vs. linear MRF).

Zhu and Mumford’s approach (1997): we want to find $K$ “good” filters. Strategy: start with a bank $B$ of possible filters; choose the subset that minimizes the distance between $p(I)$ and $f(I)$; for computational reasons, choose the filters one by one using a greedy method.

MRF simulation (figure).

Choosing the next filter: AIG (average information gain) is the difference between the model $p(I)$ and the data from the viewpoint of a marginal; AIF (average information fluctuation) is the difference between different images in the dataset from the viewpoint of a marginal.
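A simplified sketch of one greedy selection step. As an assumption, it scores each candidate filter by the L1 distance between the response histograms of the observed images and of images sampled from the current model, rather than by the exact information-gain criterion. The toy data (vertical stripes versus i.i.d. noise) is made up so that the answer is predictable.

```python
import numpy as np

def filter_response(image, kernel):
    """'Valid' 2D correlation of an image with a small kernel."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for a in range(kh):
        for b in range(kw):
            out += kernel[a, b] * image[a:a + h, b:b + w]
    return out

def response_histogram(images, kernel, edges):
    """Normalised histogram of the filter responses pooled over a set of images."""
    counts = sum(np.histogram(filter_response(im, kernel), bins=edges)[0] for im in images)
    return counts / counts.sum()

def select_next_filter(bank, observed, synthesized, edges):
    """Pick the filter whose marginal differs most between the data and the model samples."""
    gaps = {}
    for name, kernel in bank.items():
        h_obs = response_histogram(observed, kernel, edges)
        h_syn = response_histogram(synthesized, kernel, edges)
        gaps[name] = float(np.abs(h_obs - h_syn).sum())      # L1 stand-in for the gain criterion
    return max(gaps, key=gaps.get), gaps

rng = np.random.default_rng(0)
# Toy "observed" images: vertical stripes (each column constant). Toy model samples: i.i.d. noise.
observed = [np.tile(rng.uniform(0, 1, size=(1, 16)), (16, 1)) for _ in range(20)]
synthesized = [rng.uniform(0, 1, size=(16, 16)) for _ in range(20)]
bank = {
    "intensity": np.array([[1.0]]),
    "dx": np.array([[1.0, -1.0]]),
    "dy": np.array([[1.0], [-1.0]]),
}
edges = np.linspace(-1.05, 1.05, 22)
best, gaps = select_next_filter(bank, observed, synthesized, edges)
print(best)   # dy: vertical differences vanish on the stripes but not on the noise
```

The intensity and horizontal-difference marginals already agree (up to sampling noise), so adding those filters would teach the model little; the vertical-difference marginal is where the model is most wrong.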

Algorithm, filter selection (figure: bank of filters, IC, model).

Algorithm (figure).

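Learning the potentials (using maximum entropy on $p$): initialize the potentials, then repeatedly synthesize images from the current model and update $\lambda_k \leftarrow \lambda_k + \eta\,(H_k^{syn} - H_k^{obs})$, where $H_k^{syn}$ and $H_k^{obs}$ are the histograms of the responses to filter $F_k$ on the synthesized and observed images.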
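A sketch of the potential update $\lambda_k \leftarrow \lambda_k + \eta\,(H_k^{syn} - H_k^{obs})$ for the one case where it can be checked without MCMC: a single 1x1 intensity filter, where pixels are i.i.d. with $p(v) \propto e^{-\lambda(v)}$ and the model histogram can be computed exactly. In general $H_k^{syn}$ would come from MCMC samples. The toy 5-bin histogram and the step size are assumptions.

```python
import numpy as np

def model_histogram(lam):
    """For one 1x1 filter, pixels are i.i.d. with p(v) proportional to exp(-lambda(v))."""
    p = np.exp(-(lam - lam.min()))                  # shift for numerical stability
    return p / p.sum()

def learn_potential(h_obs, eta=1.0, iters=500):
    lam = np.zeros_like(h_obs)                      # init: flat potential, i.e. a uniform model
    for _ in range(iters):
        h_syn = model_histogram(lam)                # in general: histogram of MCMC samples
        lam = lam + eta * (h_syn - h_obs)           # raise lambda where the model over-predicts
    return lam

h_obs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])         # toy observed grayscale histogram (5 bins)
lam = learn_potential(h_obs)
print(np.round(model_histogram(lam), 4))            # [0.1 0.2 0.4 0.2 0.1]
```

The update is gradient ascent on the log-likelihood, and its fixed point is exactly where the model marginal equals the observed marginal, which is the constraint the maximum-entropy model is built around.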

The bank of filters. Filter types: an intensity filter (1x1); isotropic filters: Laplacian of Gaussian (LG); directional filters: Gabor (Gcos, Gsin). The filters are computed at different scales using an image pyramid. (Figure: Laplacian of Gaussian and Gabor filters.)
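A sketch of how such a bank might be generated. The kernel sizes, $\sigma$ values, wavelengths and four orientations are illustrative assumptions, not the parameters used by Zhu and Mumford; coarser scales could equally be handled by filtering a downsampled image pyramid with the same small kernels.

```python
import numpy as np

def _grid(size):
    r = np.arange(size) - (size - 1) / 2.0
    return np.meshgrid(r, r)

def laplacian_of_gaussian(size, sigma):
    """Isotropic LoG kernel (up to scale), shifted to have zero mean."""
    X, Y = _grid(size)
    q = (X ** 2 + Y ** 2) / (2 * sigma ** 2)
    k = (q - 1.0) * np.exp(-q)                      # proportional to the Laplacian of a Gaussian
    return k - k.mean()

def gabor_pair(size, sigma, wavelength, theta):
    """Even (cosine) and odd (sine) Gabor kernels with orientation theta."""
    X, Y = _grid(size)
    u = X * np.cos(theta) + Y * np.sin(theta)       # coordinate across the stripes
    envelope = np.exp(-(X ** 2 + Y ** 2) / (2 * sigma ** 2))
    g_cos = envelope * np.cos(2 * np.pi * u / wavelength)
    g_sin = envelope * np.sin(2 * np.pi * u / wavelength)
    return g_cos - g_cos.mean(), g_sin              # remove the DC response of the even filter

bank = {"intensity": np.ones((1, 1))}
for size, sigma in [(5, 1.0), (9, 2.0)]:
    bank[f"LG_{size}"] = laplacian_of_gaussian(size, sigma)
    for i, theta in enumerate(np.arange(4) * np.pi / 4):
        g_cos, g_sin = gabor_pair(size, sigma, wavelength=2 * sigma + 2, theta=theta)
        bank[f"Gcos_{size}_{i}"] = g_cos
        bank[f"Gsin_{size}_{i}"] = g_sin
print(len(bank))  # 1 + 2 * (1 + 4 * 2) = 19
```

Every filter except the intensity filter has zero mean, so it responds to structure (edges, bars, blobs) rather than to the overall brightness.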

Running example of the algorithm, Experiment I: only small filters are used.

Results: all learned potentials have a diffusive nature.

Running example of the algorithm, Experiment II: only gradient filters, at different scales. Small filters give diffusive potentials (as expected); surprisingly, large filters give reactive potentials. (Figure: diffusive and reactive potentials.)

The discovery of reactive potentials (figure).

Examples of synthesized images (figure: Experiment I, Experiment II). The Experiment II image is more “natural” because it has some regions with sharp boundaries.

Outline. We have seen: MRF models; selection of filters from a bank; learning potentials. Now: data-driven filters; analytic results for simple potentials; making sense of the results; applications.

Roth and Black’s approach: Zhu and Mumford choose the filters from a bank and learn the potentials non-parametrically; Roth and Black learn the filters from data and the potentials parametrically, and they learn both together.

Motivation, a model of natural patches: why learn filters from data? Inspiration comes from models of natural patches: sparse coding; component analysis; products of experts.

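Motivation, sparse coding of patches. Goal: find a set of filters (basis functions) $\{J_i\}$ such that every patch $x$ can be written as $x \approx \sum_i a_i J_i$ with only a few nonzero coefficients $a_i$. The set is learned from a database of natural patches, so only a few filters should fire on a given patch.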

Motivation, component analysis: learn the filters by component analysis (PCA or ICA). This results in filter-like components: the first PCA components look like contrast filters, and ICA components look like Gabor filters.

PCA results (figure: components ordered from high to low variance).

ICA results: independent filters. A model for patches can be derived: $p(x) \propto \prod_i p_i(J_i^T x)$, with one independent component per filter.

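Motivation, products of experts (PoE): a more sophisticated model for natural patches, $p(x) = \frac{1}{Z(\Theta)}\prod_{k=1}^{K}\phi(J_k^T x; \alpha_k)$. Maximum-likelihood training yields “intuitive” filters (figure: texture and contrast filters).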

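Extension of PoE to FoE: the Field of Experts (FoE) applies the same experts to every clique (overlapping patch) $x_{(i)}$ of the image, $p(x) = \frac{1}{Z(\Theta)}\prod_{i}\prod_{k=1}^{K}\phi(J_k^T x_{(i)}; \alpha_k)$. Roth S., Black M. J., Fields of Experts, IJCV, 2009.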

The experts: Student-t experts, $\phi(J_k^T x_{(i)}; \alpha_k) = \left(1 + \frac{1}{2}(J_k^T x_{(i)})^2\right)^{-\alpha_k}$.

Meaning of $\alpha_k$: a higher $\alpha_k$ punishes high responses of filter $J_k$ more severely, i.e. it gives that filter a higher weight.
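A short sketch of the Student-t expert, its penalty $-\log\phi$, and its log-derivative $\psi(y) = \partial_y \log\phi(y;\alpha) = -\alpha y/(1 + y^2/2)$, which reappears in the inference gradient. The function names are our own. Because the penalty is $\alpha\log(1 + y^2/2)$, doubling $\alpha$ doubles the cost of every response; and because it grows only logarithmically, large responses are penalized far less than under a Gaussian.

```python
import numpy as np

def student_t_expert(y, alpha):
    """phi(y; alpha) = (1 + y^2 / 2)^(-alpha)."""
    return (1.0 + 0.5 * y ** 2) ** (-alpha)

def student_t_penalty(y, alpha):
    """-log phi(y; alpha): the energy contributed by one filter response."""
    return alpha * np.log1p(0.5 * y ** 2)

def student_t_psi(y, alpha):
    """psi(y) = d/dy log phi(y; alpha) = -alpha * y / (1 + y^2 / 2)."""
    return -alpha * y / (1.0 + 0.5 * y ** 2)

y = 3.0
for alpha in (0.5, 2.0):
    print(alpha, round(student_t_penalty(y, alpha), 3))
# 0.5 0.852
# 2.0 3.409
```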

Learning the model (figure: model; random initialization; MCMC).

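Learning the model: finding the maximum-likelihood parameters $\Theta = \{J_k, \alpha_k\}$. With the energy $E(x; \Theta) = -\sum_i\sum_k \log\phi(J_k^T x_{(i)}; \alpha_k)$ and training data $X$, the update rule for each parameter $\theta \in \Theta$ is $\delta\theta = \eta\left[\left\langle \frac{\partial E}{\partial\theta}\right\rangle_{p} - \left\langle \frac{\partial E}{\partial\theta}\right\rangle_{X}\right]$. The expectation under $p$ requires MCMC, which is very slow.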

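Contrastive divergence (CD) algorithm (Hinton, 2002): start the Markov chain from a “good” initial guess, the data $X$; run MCMC for only $j$ steps; the resulting samples $X_j$ are taken to be close enough to the model distribution. New update rule: $\delta\theta = \eta\left[\left\langle \frac{\partial E}{\partial\theta}\right\rangle_{X_j} - \left\langle \frac{\partial E}{\partial\theta}\right\rangle_{X}\right]$.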
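A toy sketch of contrastive divergence. It assumes a one-parameter model $E(x;\theta) = \theta x^2/2$ (a zero-mean Gaussian with precision $\theta$) instead of a full FoE, and a random-walk Metropolis sampler; all parameter values are illustrative. The chains start at the data, run only $j$ steps, and the update is $\delta\theta = \eta\,[\langle \partial E/\partial\theta\rangle_{X_j} - \langle \partial E/\partial\theta\rangle_{X}]$.

```python
import numpy as np

def metropolis(x, theta, steps, rng, step_size=1.0):
    """Run `steps` random-walk Metropolis updates in parallel chains for p(x) ∝ exp(-theta x^2 / 2)."""
    for _ in range(steps):
        proposal = x + step_size * rng.standard_normal(x.shape)
        log_accept = -0.5 * theta * (proposal ** 2 - x ** 2)     # log p(proposal) - log p(x)
        accept = np.log(rng.random(x.shape)) < log_accept
        x = np.where(accept, proposal, x)
    return x

def contrastive_divergence(data, theta=1.0, lr=0.05, j=5, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        samples = metropolis(data, theta, j, rng)                # chains start at the data X
        # dE/dtheta = x^2 / 2 for E(x; theta) = theta * x^2 / 2
        grad = np.mean(samples ** 2 / 2) - np.mean(data ** 2 / 2)
        theta = max(theta + lr * grad, 1e-3)                     # keep the precision positive
    return theta

rng = np.random.default_rng(1)
data = 2.0 * rng.standard_normal(2000)          # true precision 1 / 2^2 = 0.25
theta_hat = contrastive_divergence(data)
print(round(theta_hat, 2))                      # close to 0.25
```

If the model is too narrow, the short chains pull the samples inward, the sampled energy gradient is smaller than the data one, and $\theta$ decreases (and vice versa). The update stops only when a few MCMC steps no longer move the data distribution, even though no chain ever runs to equilibrium.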

Results of learning FoE: the filters aren't “intuitive”.

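Basis for representing the filters: instead of learning the filters directly, we can learn them in a rotated basis obtained from the eigendecomposition $\Sigma = U\Lambda U^T$, where $\Sigma$ is the covariance matrix of natural image patches. Two options: a whitened basis (scaled by $\Lambda^{-1/2}$) or an “inverse” whitened basis (scaled by $\Lambda^{1/2}$).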
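A sketch of the two bases, computed from the eigendecomposition $\Sigma = U\Lambda U^T$ of a patch covariance. The parameterization $J = A^T\tilde{J}$ and the synthetic correlated 3x3 patches are assumptions for illustration. With the whitening basis $A = \Lambda^{-1/2}U^T$, a unit step in $\tilde{J}$ moves the filter most along low-variance directions (for natural patches, typically high frequencies); the inverse whitened basis $A = \Lambda^{1/2}U^T$ does the opposite.

```python
import numpy as np

def patch_bases(patches):
    """Return the whitening and 'inverse whitening' matrices of a set of vectorised patches."""
    X = patches - patches.mean(axis=0)
    cov = X.T @ X / len(X)
    eigvals, U = np.linalg.eigh(cov)                  # cov = U diag(eigvals) U^T
    whiten = np.diag(eigvals ** -0.5) @ U.T           # A = Lambda^(-1/2) U^T
    inverse_whiten = np.diag(eigvals ** 0.5) @ U.T    # A = Lambda^(1/2) U^T
    return whiten, inverse_whiten, cov

rng = np.random.default_rng(0)
mixing = np.eye(9) + 0.5 * np.ones((9, 9))            # stand-in correlation between the 9 pixels of a 3x3 patch
patches = rng.standard_normal((5000, 9)) @ mixing
W, A_inv, cov = patch_bases(patches)

j_tilde = np.zeros(9)
j_tilde[0] = 1.0                                       # one coefficient vector in the new basis
filter_from_whitened = W.T @ j_tilde                   # J = A^T j~ with A = W
filter_from_inverse = A_inv.T @ j_tilde                # J = A^T j~ with A = A_inv
```

The whitening matrix maps the patch covariance to the identity, while the inverse whitened matrix satisfies $A^T A = \Sigma$.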

So far, Zhu and Mumford: filters chosen from a bank, potentials learned non-parametrically. Small filters give diffusive potentials; large filters give non-intuitive reactive potentials.

So far, Roth and Black: filters learned from a database, potentials learned parametrically. The learned filters are non-intuitive.

What now? Revisiting PoE and FoE with Gaussian potentials; the relation to non-Gaussian potentials; making sense of the previous results. Weiss Y., Freeman W. T., What makes a good model of natural images?, CVPR,

Gaussian PoE.

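Gaussian PoE. Claim: $Z$ is constant for any set of $K$ orthonormal filters. Maximizing the likelihood therefore has an analytic solution: the $K$ minor components of the data, i.e. the eigenvectors of the patch covariance with the smallest eigenvalues.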
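A sketch of this analytic solution: the $K$ minor components are the eigenvectors of the patch covariance with the smallest eigenvalues. The stand-in data (8-sample windows of a random walk, a crude 1D proxy for image rows) and the helper names are assumptions; the data is enough to show that the minor component oscillates rapidly while the principal component is nearly constant.

```python
import numpy as np

def minor_components(patches, K):
    """The K eigenvectors of the patch covariance with the smallest eigenvalues (one per row)."""
    X = patches - patches.mean(axis=0)
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    return eigvecs[:, :K].T, eigvals[:K]

def sign_changes(v):
    return int(np.count_nonzero(np.diff(np.sign(v)) != 0))

rng = np.random.default_rng(0)
signal = np.cumsum(rng.standard_normal(20000))         # random walk: a crude stand-in for an image row
patches = np.array([signal[i:i + 8] for i in range(0, len(signal) - 8, 4)])
filters, variances = minor_components(patches, K=2)
X = patches - patches.mean(axis=0)
top = np.linalg.eigh(X.T @ X / len(X))[1][:, -1]       # principal (largest-variance) component
print(sign_changes(top), sign_changes(filters[0]))     # few vs. many: the minor component oscillates
```

Since the minor components are the directions along which smooth data varies least, they are high-frequency filters: exactly the “non-intuitive” kind of filter the learned models produce.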

Non-intuitive high-frequency filters. Reminder: the PCA results (components ordered from high to low variance), compared with an example of learned filters.

Gaussian FoE: for a Gaussian FoE, the optimal filters have high frequencies.

Non-Gaussian potentials can be modeled by a Gaussian scale mixture (GSM), and the properties of the Gaussian FoE also hold for GSMs.

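Revisiting FoE: the Student-t expert can be fitted by a GSM, so the same property should hold, and indeed Roth and Black's learned filters are high-frequency filters (figure: natural image, Roth and Black filters, high-frequency filters).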

Learning FoE with fixed filters: the algorithm prefers high-frequency filters.

Conclusion: for Gaussian potentials and GSMs, learning leads to high-frequency filters, and there is experimental evidence for this phenomenon. Maybe there is a “logic” behind this non-intuitive result?

Making sense of the results: a criterion for “good” filters for patches is that they rarely fire on natural images and fire frequently on all other images (figure: histograms of filter responses on patches from natural images and on white noise).
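A toy version of this criterion. It uses a synthetic piecewise-smooth image (a ramp with two constant rectangles) as a stand-in for a natural image, and white noise with the same mean and standard deviation. The horizontal derivative filter and the threshold 0.05 for “firing” are hypothetical choices; the point is only that the filter rarely fires on the smooth image and often on the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64] / 64.0
natural = 0.5 * xx + 0.2 * yy              # smooth ramp ...
natural[10:30, 20:50] += 0.4               # ... plus two constant regions with sharp edges
natural[40:60, 5:25] -= 0.3
noise = natural.mean() + natural.std() * rng.standard_normal(natural.shape)

def horizontal_derivative(img):
    return img[:, 1:] - img[:, :-1]

def firing_rate(img, threshold=0.05):
    """Fraction of positions where |filter response| exceeds the threshold."""
    return float(np.mean(np.abs(horizontal_derivative(img)) > threshold))

print(firing_rate(natural), firing_rate(noise))   # rare (about 0.02) vs. frequent
```

On the smooth image the filter fires only on the rectangle borders, so a potential that penalizes large responses assigns it a low energy, while almost every position of the noise image is penalized.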

Making sense of the results: an image is modeled by what you don't expect to find in it. The classical prior of smooth gradients satisfies this. But why limit ourselves to intuitive filters? Maybe non-intuitive filters can do better.

Revisiting diffusive and reactive potentials (figure: reactive and diffusive potentials, each with response histograms for white noise and for patches from natural images).

Inference: we have learned a model, and we can use it for inference problems with corrupted or missing information: loopy belief propagation (exact only on trees, approximate otherwise) or approximate inference by gradient-based optimization.

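Belief propagation: the observed data is incorporated into the model through local evidence potentials $\phi(x_i, y_i)$ that link each hidden pixel $x_i$ to its observation $y_i$.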

Belief propagation: a message-passing algorithm. It is exact only on tree-structured MRFs and efficient only on pairwise MRFs.
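A minimal sketch of sum-product belief propagation on a chain, the simplest tree, where it is exact. The unary potentials play the role of the observed-data (local evidence) terms; the random potentials and the brute-force check are illustrative only.

```python
import itertools
import numpy as np

def chain_marginals(unary, pairwise):
    """Exact sum-product BP on a chain.

    unary: (n, S) nonnegative local potentials; pairwise: (S, S) potential shared by all edges,
    indexed as pairwise[x_i, x_{i+1}]."""
    n, S = unary.shape
    fwd = np.ones((n, S))   # fwd[i]: message into node i from its left neighbour
    bwd = np.ones((n, S))   # bwd[i]: message into node i from its right neighbour
    for i in range(1, n):
        m = (fwd[i - 1] * unary[i - 1]) @ pairwise          # sum over x_{i-1}
        fwd[i] = m / m.sum()                                # normalise to avoid under/overflow
    for i in range(n - 2, -1, -1):
        m = pairwise @ (bwd[i + 1] * unary[i + 1])          # sum over x_{i+1}
        bwd[i] = m / m.sum()
    beliefs = unary * fwd * bwd
    return beliefs / beliefs.sum(axis=1, keepdims=True)

def brute_force_marginals(unary, pairwise):
    """Marginals by enumerating every joint labelling (only feasible for tiny chains)."""
    n, S = unary.shape
    marg = np.zeros((n, S))
    for states in itertools.product(range(S), repeat=n):
        w = np.prod([unary[i, s] for i, s in enumerate(states)])
        w *= np.prod([pairwise[states[i], states[i + 1]] for i in range(n - 1)])
        for i, s in enumerate(states):
            marg[i, s] += w
    return marg / marg.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
unary = rng.uniform(0.1, 1.0, size=(6, 3))      # local evidence for 6 pixels with 3 labels
pairwise = rng.uniform(0.1, 1.0, size=(3, 3))   # compatibility between neighbouring labels
bp = chain_marginals(unary, pairwise)
print(np.allclose(bp, brute_force_marginals(unary, pairwise)))   # True: BP is exact on a chain
```

Each message is a sum over one neighbouring label, so the cost is linear in the chain length; on an image grid the same messages circulate around loops and the result is only approximate.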

Alternative by Roth and Black: approximate inference by gradient-based optimization of the posterior $p(x \mid y) \propto p(y \mid x)\,p(x)$, which combines the uncertainty (noise) model and the learned model. Advantage: low computational cost. Drawback: it finds only a local optimum if the problem is not convex.

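Partition function: $Z(\Theta)$ does not depend on $x$, so there is no need to estimate it. We get $\nabla_x \log p(x \mid y) = \nabla_x \log p(y \mid x) + \nabla_x \log p(x)$, where the gradient of the prior involves only the unnormalized FoE product.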

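The gradient step: how do we differentiate the second term? By a mathematical “trick” we get $\nabla_x \log p(x) = \sum_{k=1}^{K} J_k^- * \psi(J_k * x; \alpha_k)$, where $\psi(y; \alpha) = \frac{\partial}{\partial y}\log\phi(y; \alpha)$ is applied elementwise and $J_k^-$ is the filter $J_k$ mirrored around its center.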

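Denoising: assume Gaussian noise with standard deviation $\sigma$. Then the gradient step is $x^{(t+1)} = x^{(t)} + \eta\left[\sum_{k=1}^{K} J_k^- * \psi(J_k * x^{(t)}; \alpha_k) + \frac{\lambda}{\sigma^2}\left(y - x^{(t)}\right)\right]$, where $\lambda$ weights the likelihood against the prior.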
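A minimal sketch of this denoising iteration, $x \leftarrow x + \eta\,[\sum_k J_k^- * \psi(J_k * x) + \frac{\lambda}{\sigma^2}(y - x)]$. As assumptions for illustration, it uses two fixed derivative filters instead of learned FoE filters, $\alpha_k = 1$, a step size small enough for stability, and intensities in $[0, 1]$. `correlate_valid` computes all clique responses $J_k^T x_{(i)}$, and `correlate_adjoint` applies the transposed operator, which is what convolution with the mirrored filter $J_k^-$ implements.

```python
import numpy as np

def correlate_valid(x, k):
    """All clique responses J^T x_(i): 'valid' 2D correlation of x with kernel k."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * x[a:a + h, b:b + w]
    return out

def correlate_adjoint(r, k, shape):
    """Transpose of correlate_valid: the role played by convolution with the mirrored filter."""
    kh, kw = k.shape
    h, w = r.shape
    g = np.zeros(shape)
    for a in range(kh):
        for b in range(kw):
            g[a:a + h, b:b + w] += k[a, b] * r
    return g

def psi(y, alpha):
    """Log-derivative of the Student-t expert (1 + y^2/2)^(-alpha)."""
    return -alpha * y / (1.0 + 0.5 * y ** 2)

def foe_denoise(y, filters, alphas, sigma, lam, eta=0.1, iters=50):
    """Gradient ascent on log p(y|x) + log p(x) for Gaussian noise and a Student-t FoE prior."""
    x = y.copy()
    for _ in range(iters):
        grad = (lam / sigma ** 2) * (y - x)                          # likelihood term
        for J, alpha in zip(filters, alphas):
            grad += correlate_adjoint(psi(correlate_valid(x, J), alpha), J, x.shape)  # prior term
        x = x + eta * grad
    return x

def mse(a, ref):
    return float(np.mean((a - ref) ** 2))

rng = np.random.default_rng(0)
clean = np.full((32, 32), 0.2)
clean[:, 16:] = 0.8                                     # a vertical step edge
sigma = 0.1
noisy = clean + sigma * rng.standard_normal(clean.shape)
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]
denoised = foe_denoise(noisy, filters, alphas=[1.0, 1.0], sigma=sigma, lam=0.01)
print(mse(noisy, clean), mse(denoised, clean))          # the second value should be clearly smaller
```

With $\lambda/\sigma^2 = 1$ and $\eta = 0.1$ the iteration stays stable; a larger data weight or step size would require a correspondingly smaller $\eta$.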

Results (figures).

Results: original; noisy (20.29 dB); FoE (28.72 dB); Portilla, wavelets (28.9 dB); non-local means (28.21 dB); standard non-linear diffusion (27.18 dB). Portilla's method represents the state of the art, while FoE is a general prior.
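The numbers above are PSNR values. A small sketch of the usual definition, assuming 8-bit images with peak value 255 (the slides do not state the convention):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((8, 8), 100.0)
print(round(psnr(ref, ref + 25.0), 2))  # 20.17 (an error of 25 gray levels everywhere, MSE = 625)
```

Because PSNR is logarithmic in the MSE, the gain from 20.29 dB to 28.72 dB corresponds to roughly a sevenfold reduction in MSE ($10^{0.843} \approx 7$).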

Results on the Berkeley database (figure: output PSNR versus input PSNR, from low to high noise, for the Wiener filter, non-linear diffusion, FoE, Portilla 1 and Portilla 2).

How many 3x3 filters should we take? (Figure: performance versus the number of 3x3 filters.) Performance starts saturating at 8 filters.

Dependence on the size and shape of the clique: what is the best filter?

Random and fixed filters (figure: FoE with learned filters, random filters and fixed filters).

Inpainting, reminder. Problem: pixels outside the mask can change. Solution: constrain them.

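Inpainting: assume the pixels outside the mask $M$ don't change. The gradient step is then $x^{(t+1)} = x^{(t)} + \eta\, M \odot \sum_{k=1}^{K} J_k^- * \psi(J_k * x^{(t)}; \alpha_k)$, where $M$ is 1 inside the region to be inpainted and 0 elsewhere. (Figure: mask; image we want to inpaint.) Advanced Topics In Computer Vision Course, Spring 2010.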
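A minimal sketch of masked inpainting: the FoE prior gradient, without a data term, applied only where the mask is 1, so known pixels never change. Two derivative filters with $\alpha_k = 1$, the zero initialization of the hole and the smooth test image are assumptions for illustration.

```python
import numpy as np

def correlate_valid(x, k):
    """All clique responses J^T x_(i): 'valid' 2D correlation of x with kernel k."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * x[a:a + h, b:b + w]
    return out

def correlate_adjoint(r, k, shape):
    """Transpose of correlate_valid (convolution with the mirrored filter)."""
    kh, kw = k.shape
    h, w = r.shape
    g = np.zeros(shape)
    for a in range(kh):
        for b in range(kw):
            g[a:a + h, b:b + w] += k[a, b] * r
    return g

def psi(y, alpha):
    """Log-derivative of the Student-t expert (1 + y^2/2)^(-alpha)."""
    return -alpha * y / (1.0 + 0.5 * y ** 2)

def foe_inpaint(image, mask, filters, alphas, eta=0.1, iters=500):
    """Update only the masked pixels with the FoE prior gradient; known pixels stay fixed."""
    x = np.where(mask, 0.0, image)                      # unknown pixels start at 0 (arbitrary)
    for _ in range(iters):
        grad = np.zeros_like(x)
        for J, alpha in zip(filters, alphas):
            grad += correlate_adjoint(psi(correlate_valid(x, J), alpha), J, x.shape)
        x = x + eta * mask * grad                        # M ⊙ gradient
    return x

yy, xx = np.mgrid[0:24, 0:24]
image = (xx + 2.0 * yy) / 69.0                           # smooth ramp in [0, 1]
mask = np.zeros(image.shape, dtype=bool)
mask[8:16, 8:16] = True                                  # 8x8 hole to fill
filters = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]
result = foe_inpaint(image, mask, filters, alphas=[1.0, 1.0])
err_before = float(np.abs(np.where(mask, 0.0, image) - image)[mask].mean())
err_after = float(np.abs(result - image)[mask].mean())
print(err_before, err_after)
```

On a linear ramp every interior response is constant, so the ramp is a fixed point of the update and the hole is filled almost exactly; on real images the same diffusion-like filling explains why small holes work well but large or textured regions do not.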

Results (figures).

Results (FoE vs. Bertalmio). PSNR: FoE 29.06 dB, Bertalmio 27.56 dB. The slide also compares SSIM.

Pros and cons: performs well on narrow strokes or small holes (even if they cover most of the image); cannot fill large holes; isn't designed to handle textures.

Thank you for listening…