1 Patch Complexity, Finite Pixel Correlations and Optimal Denoising Anat Levin, Boaz Nadler, Fredo Durand and Bill Freeman Weizmann Institute, MIT CSAIL.

Presentation transcript:

1 Patch Complexity, Finite Pixel Correlations and Optimal Denoising Anat Levin, Boaz Nadler, Fredo Durand and Bill Freeman Weizmann Institute, MIT CSAIL

2 Image denoising: many research efforts have been invested, and results are getting harder and harder to improve. Are we reaching saturation? What uncertainty is inherent in the problem? How much further can we improve results?

3 Denoising uncertainty: what is the volume of all clean images x that can explain a noisy image y?

4 Denoising uncertainty: what is the volume of all clean images x that can explain a noisy image y? Multiple clean images lie within the noise level.

5 Denoising limits: prior work
- Signal processing assumptions (Wiener filter, Gaussian priors)
- Limits on super-resolution: numerical arguments, no prior [Baker & Kanade 02]
- Sharp bounds for perfectly piecewise-constant images [Korostelev & Tsybakov 93, Polzehl & Spokoiny 03]
- Non-local means: asymptotically optimal for infinitely large images; no analysis of finite-size images [Buades, Coll & Morel 05]
- Natural image denoising limits, but with many assumptions which may not hold in practice and affect the conclusions [Chatterjee & Milanfar 10]

6 MMSE denoising bounds. The MMSE equals the conditional variance and is achieved by the conditional mean. The MMSE computed with the exact prior p(x) (and not with the heuristics used in practice) is, by definition, the best possible denoising. Using internal image statistics or class-specific information might provide practical benefits, but it cannot perform better than the MMSE, again by definition.
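In formulas (a standard restatement of the definitions above; the notation, with x_c the clean pixel of interest and y the noisy observation, is ours rather than the slides'):

```latex
% The optimal estimator is the conditional mean, and the resulting
% error is the expected conditional variance.
\hat{x}_c(y) = \mathbb{E}[x_c \mid y], \qquad
\mathrm{MMSE} = \mathbb{E}_{x,y}\big[(x_c - \hat{x}_c(y))^2\big]
             = \mathbb{E}_y\big[\mathrm{Var}(x_c \mid y)\big]
```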

7 MMSE with finite support. MMSE_d: the best possible result of any algorithm that can utilize a d = k x k window w_d around the pixel of interest, e.g. the spatial kernel size in the bilateral filter, or the patch size in non-parametric methods. Non-Local Means: effective support = the entire image.
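Written out (a restatement of the definition in words above; w_d(y) denotes the noisy d-pixel window around the pixel of interest x_c):

```latex
\mathrm{MMSE}_d = \mathbb{E}\big[\mathrm{Var}\!\left(x_c \mid w_d(y)\right)\big]
```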

8 Estimating denoising bounds in practice. Challenge: compute the MMSE without knowing p(x). The trick [Levin & Nadler CVPR11]: we don't know p(x), but we can sample from it, so the MMSE can be evaluated non-parametrically, replacing expectations over p(x) with sample means over a large database of clean patches (see the sketch below).
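A minimal numpy sketch of this kind of non-parametric evaluation, assuming clean patches are stored as flattened rows and the noise is i.i.d. Gaussian with known sigma; the function name and interface are illustrative, not from the paper:

```python
import numpy as np

def nonparametric_mmse(y, clean_patches, sigma):
    """Estimate E[x_c | y] and Var(x_c | y) for one noisy patch y by replacing
    the unknown prior p(x) with samples (rows of clean_patches) and weighting
    each sample by the Gaussian noise likelihood p(y | x_i)."""
    # log p(y | x_i) for i.i.d. Gaussian noise of std sigma (up to a constant)
    d2 = np.sum((clean_patches - y) ** 2, axis=1)
    logw = -d2 / (2.0 * sigma ** 2)
    logw -= logw.max()                      # for numerical stability
    w = np.exp(logw)
    w /= w.sum()
    c = clean_patches.shape[1] // 2         # index of the central pixel
    mu_c = np.sum(w * clean_patches[:, c])  # posterior-mean estimate of x_c
    var_c = np.sum(w * (clean_patches[:, c] - mu_c) ** 2)
    return mu_c, var_c

# Averaging var_c over many noisy patches y drawn from the same database gives
# a Monte-Carlo estimate of MMSE_d for the corresponding window size d.
```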

9 MMSE as a function of patch size [Levin & Nadler CVPR11]: for small patches / large noise, the non-parametric approach can accurately estimate the MMSE. (Plot: estimated MMSE vs. patch size.)

10 MMSE as a function of patch size: how much better can we do by increasing the window size? (Plot: MMSE vs. patch size.)

11 Towards denoising bounds. Questions:
- For non-parametric methods: how does the difficulty of finding nearest neighbors relate to the potential gain, and how can we make better use of a given database size?
- For any possible method: computational issues aside, what is the optimal possible restoration? Can we achieve zero error?

12 Patch Complexity ?

13 Patch Complexity ?

14 Patch Complexity ? Empty neighbors set

15 Patch Complexity ?

16 Patch complexity vs. PSNR gain. Law of diminishing returns: when an increase in patch width requires many more training samples, the performance gain is smaller. Smooth regions: easy to increase support, large gain. Textured regions: hard to increase support, small gain. This motivates adaptive patch-size selection in denoising algorithms (see paper).

17 Pixel Correlation and PSNR gain

18 Pixel correlation and PSNR gain. (Illustration: two extreme cases, independent vs. fully dependent pixels.)

19 Pixel correlation and PSNR gain. Fully dependent pixels: when the samples are correlated, going from 1D to 2D does not spread them, so many neighbors are available, and y_2 acts as a second observation of the same value, giving a factor-2 variance reduction. Independent pixels: the samples spread out, so few neighbors are available, and y_2 carries no information about the pixel of interest, so there is no gain.
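The two extremes in one line (a standard calculation, using the slide's notation y_1, y_2 for the two noisy observations and sigma for the noise level):

```latex
% Fully dependent: x_2 = x_1, so y_2 is a second noisy look at x_1.
\mathrm{Var}\!\left(\tfrac{y_1+y_2}{2} - x_1\right) = \tfrac{\sigma^2}{2}
\quad\text{(factor-2 reduction)}
% Independent: x_2 \perp x_1, so y_2 carries no information about x_1.
\mathbb{E}[x_1 \mid y_1, y_2] = \mathbb{E}[x_1 \mid y_1]
\quad\text{(no gain)}
```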

20 Towards denoising bounds. Questions:
- For non-parametric methods: how does the difficulty of finding nearest neighbors relate to the potential gain, and how can we make better use of a given database size?
- For any possible method: computational issues aside, what is the optimal possible restoration? Can we achieve zero error? What is the convergence rate as a function of patch size?

21 The dead leaves model (Matheron 68). Image = a random collection of finite-size piecewise-constant regions; region intensity = a random variable with a uniform distribution. (A small generative sketch follows below.)
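A minimal numpy sketch of how a dead-leaves image of this kind could be synthesized; the disc shape, the power-law radius distribution and all parameter values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def dead_leaves(size=256, n_discs=3000, r_min=2.0, r_max=64.0, alpha=3.0, seed=0):
    """Generate a dead-leaves image: discs with power-law radii and uniform
    random intensities are added front-to-back, so each new disc only fills
    pixels not yet covered (i.e. it falls behind the existing leaves)."""
    rng = np.random.default_rng(seed)
    img = np.full((size, size), np.nan)
    yy, xx = np.mgrid[0:size, 0:size]
    for _ in range(n_discs):
        # Sample a radius from p(r) ~ r^(-alpha) on [r_min, r_max] by inverse CDF.
        u = rng.random()
        r = (r_min ** (1 - alpha) + u * (r_max ** (1 - alpha) - r_min ** (1 - alpha))) ** (1.0 / (1 - alpha))
        cx, cy = rng.integers(0, size, 2)
        intensity = rng.random()            # uniform region intensity in [0, 1)
        mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
        img = np.where(np.isnan(img) & mask, intensity, img)
        if not np.isnan(img).any():         # the whole image is covered
            break
    return np.nan_to_num(img, nan=rng.random())
```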

22 Optimal denoising in the dead leaves model. Given a segmentation oracle, the best possible denoising is to average all observations within a segment. The expected reconstruction error then depends on s, the number of pixels in the segment.
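The averaging argument in one formula (a standard fact restating the slide, with sigma the noise standard deviation and s the segment size):

```latex
% Averaging s i.i.d. noisy observations of the same intensity
\mathrm{err}(s) = \frac{\sigma^2}{s}
```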

23 Optimal patch denoising & dead leaves. Let p(s) be the probability that a random pixel belongs to a segment of s pixels. For a given window size d: if the segment has size s smaller than d, we can only average over s pixels; otherwise, we can use all d pixels inside the window (but not the full segment). What is p(s)?
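One way to write that case split as a single expression for the window-d error (a sketch of the reasoning above, not copied verbatim from the paper):

```latex
\mathrm{MMSE}_d \;\approx\; \sigma^2\!\left(\sum_{s<d}\frac{p(s)}{s} \;+\; \sum_{s\ge d}\frac{p(s)}{d}\right)
```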

24 Scale invariance in natural images. Down-scaling natural images does not change their statistical properties [Ruderman, Field, etc.]. Theorem: in a scale-invariant distribution, the segment size distribution p(s) must follow a power law. (Plot: empirical segment size distribution and its fit, repeating Alvarez, Gousseau, and Morel.)

25 Optimal patch denoising & scale invariance: plugging the power-law p(s) into the two cases above (segment area < d and segment area > d). (Derivation shown on the slide; details in the paper.)
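The qualitative outcome of that substitution, in the form the next slides test empirically (the constants c and gamma are left unspecified here; this is a sketch of the claim, not the paper's exact expression):

```latex
\mathrm{MMSE}_d \;\approx\; \mathrm{MMSE}_\infty + c\, d^{-\gamma},
\qquad c,\gamma > 0
```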

26 Empirical PSNR vs. window size: good fit with a power law, poor fit with an exponential curve (the form implied by Markov models). (Plots: PSNR vs. window size under both fits.)
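A small scipy sketch of the kind of comparison described above; the PSNR numbers below are made-up placeholders used only to show the fitting step, not measurements from the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical best-achievable PSNR for increasing window area d (placeholders).
d = np.array([3, 5, 7, 9, 11, 13, 15], dtype=float) ** 2
psnr = np.array([27.1, 28.0, 28.5, 28.8, 29.0, 29.15, 29.25])

def power_law(d, p_inf, c, gamma):      # PSNR(d) = PSNR_inf - c * d^(-gamma)
    return p_inf - c * d ** (-gamma)

def exponential(d, p_inf, c, beta):     # PSNR(d) = PSNR_inf - c * exp(-beta * d)
    return p_inf - c * np.exp(-beta * d)

pow_p, _ = curve_fit(power_law, d, psnr, p0=(30.0, 5.0, 0.5), maxfev=20000)
exp_p, _ = curve_fit(exponential, d, psnr, p0=(30.0, 3.0, 0.02), maxfev=20000)

# The asymptote p_inf of the better-fitting curve is the extrapolated optimal PSNR.
print("power-law fit:   PSNR_inf = %.2f dB" % pow_p[0])
print("exponential fit: PSNR_inf = %.2f dB" % exp_p[0])
```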

27 Extrapolating optimal PSNR. Future sophisticated denoising algorithms appear to have modest room for improvement: ~ dB.

28 Summary: inherent uncertainty of denoising
Non-parametric methods: law of diminishing returns
- When increasing the patch size requires a significant increase in training data, the gain is low
- Correlation with new pixels makes it easier to find samples AND makes them more useful
- Adaptive denoising
For any method: optimal denoising as a function of window size follows a power-law convergence
- Scale invariance, dead leaves
- Extrapolation predicts denoising bounds

29 Scope and limitations
Scope: the MMSE is by definition the lowest possible MSE of any algorithm, including ones that use object recognition, depth estimation, multiple images of the same scene, internal image statistics, etc.
Limitations: the MSE criterion; our database; the power-law extrapolation is a conjecture for real images.