Non-local means: a look at non-local self-similarity of images

IT 530, Lecture Notes

Partial Differential Equations (PDEs): Heat Equation. Running several iterations of this PDE on a noisy image is equivalent to convolving the same image with a Gaussian. The "sigma" of the Gaussian grows with the number of time-steps of the PDE (it is proportional to the square root of the elapsed time). Inspired by thermodynamics. Drawback: blurs out edges.
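As a minimal 1-D sketch of this equivalence (illustrative only; the step size and iteration count are arbitrary choices, not from the lecture), iterating an explicit discretization of the heat equation progressively smooths a pure-noise signal, just as convolution with an ever-wider Gaussian would:

```python
import numpy as np

def heat_step(u, dt=0.2):
    # One explicit Euler step of the 1-D heat equation u_t = u_xx
    # (Laplacian via second-order central differences).
    lap = np.roll(u, -1) + np.roll(u, 1) - 2 * u
    lap[0] = lap[-1] = 0.0  # no diffusion across the boundary
    return u + dt * lap

rng = np.random.default_rng(1)
u = rng.standard_normal(256)  # a pure-noise "image"
for _ in range(50):
    u = heat_step(u)
# Repeated diffusion behaves like convolution with a widening Gaussian,
# so the sample variance of the noise shrinks dramatically.
print(np.var(u))
```

The explicit scheme is stable here because dt = 0.2 is below the 1-D stability limit of 0.5.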

PDEs: Anisotropic Diffusion. Diffusivity function "g": a decreasing function of the gradient magnitude. Preserves edges: diffuses along edges, not across them. Several papers: Perona and Malik [IEEE PAMI 1990], total variation [Rudin et al., 1992], Beltrami flow [Sochen et al., IEEE TIP 1998], etc.
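A minimal sketch of one Perona–Malik diffusion step, using the classic diffusivity g(d) = 1 / (1 + (d/K)^2); the parameters dt and K here are illustrative assumptions, not values from the slides:

```python
import numpy as np

def perona_malik_step(u, dt=0.1, K=0.1):
    # One explicit step of Perona-Malik anisotropic diffusion on a 2-D image.
    # The diffusivity g shrinks toward 0 where the local differences are
    # large (edges), so diffusion happens mostly in smooth regions.
    un = np.pad(u, 1, mode="edge")
    dN = un[:-2, 1:-1] - u   # differences to the four neighbours
    dS = un[2:, 1:-1] - u
    dW = un[1:-1, :-2] - u
    dE = un[1:-1, 2:] - u
    g = lambda d: 1.0 / (1.0 + (d / K) ** 2)
    return u + dt * (g(dN) * dN + g(dS) * dS + g(dW) * dW + g(dE) * dE)

# A step edge survives many iterations far better than under plain heat flow.
u = np.zeros((32, 32))
u[:, 16:] = 1.0
for _ in range(100):
    u = perona_malik_step(u)
print(u[16, 0], u[16, -1])
```

After 100 iterations the two sides of the step remain near 0 and 1, whereas the heat equation would have blurred them together.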

PDEs: Total Variation. Total variation denoising seeks to minimize the energy functional E(u) = ∫ |∇u| dx + (λ/2) ∫ (u − f)² dx, where f is the noisy image. Its Euler-Lagrange equation (a partial differential equation), −div(∇u/|∇u|) + λ(u − f) = 0, exhibits anisotropic behaviour due to the gradient-magnitude term in the denominator: diffusion is low across strong edges.
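A rough gradient-descent sketch of TV denoising with a smoothed gradient magnitude (the parameters lam, dt, eps, and the iteration count are illustrative assumptions, not from the slides):

```python
import numpy as np

def tv_denoise(f, lam=8.0, dt=0.01, iters=300, eps=0.01):
    # Gradient descent on E(u) = sum |grad u| + (lam/2) sum (u - f)^2,
    # with |grad u| smoothed as sqrt(|grad u|^2 + eps) to avoid
    # dividing by zero in flat regions.
    u = f.copy()
    for _ in range(iters):
        ux = np.roll(u, -1, axis=1) - u          # forward differences
        uy = np.roll(u, -1, axis=0) - u
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps)
        px, py = ux / mag, uy / mag              # normalized gradient field
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u = u + dt * (div - lam * (u - f))       # descend the TV energy
    return u

rng = np.random.default_rng(5)
clean = np.zeros((32, 32))
clean[:, 16:] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
den = tv_denoise(noisy)
print(np.mean((den - clean) ** 2), np.mean((noisy - clean) ** 2))
```

The flat regions are smoothed strongly (large diffusivity where |∇u| is small) while the step edge, where the denominator is large, is mostly left alone.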

[Figure: denoising results compared for the heat equation, the Perona-Malik PDE, and total variation.]

Neighborhood Filters for Denoising. A simple averaging filter will blur the edges and textures in the image.

Denoising with a neighborhood filter

Neighborhood Filters for Denoising: Lee Filter. Weigh the pixels in the neighborhood by factors that decrease with the spatial distance between the central pixel and the pixel being weighted. This is expressed as: u(x) = (1/C(x)) Σ_{y ∈ N(x)} v(y) exp(−|x − y|²/ρ²), where v is the noisy image and C(x) is a normalizing constant. More weight is given to nearby pixels.

Anisotropic Neighborhood Filter (Yaroslavsky Filter). Weigh the pixels in the neighborhood by factors that decrease with the difference between their intensity values and the intensity of the pixel being denoised. This is expressed as: u(x) = (1/C(x)) Σ_{y ∈ N(x)} v(y) exp(−(v(y) − v(x))²/h²). More weight goes to pixels with similar intensity values, giving better preservation of edges and boundaries.

Bilateral Filter (Lee + Yaroslavsky Filter). Weigh the pixels in the neighborhood by factors that decrease both with the difference between their intensity values and the intensity of the pixel being denoised, and with the difference in pixel locations. This is expressed as: u(x) = (1/C(x)) Σ_{y ∈ N(x)} v(y) exp(−|x − y|²/ρ² − (v(y) − v(x))²/h²). More weight goes to pixels that are both nearby and similar in intensity, giving better preservation of edges and boundaries.
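A brute-force sketch of the bilateral filter combining the two kinds of weights (the window radius and the sigma parameters are illustrative assumptions):

```python
import numpy as np

def bilateral(img, radius=3, sigma_s=2.0, sigma_r=0.2):
    # Brute-force bilateral filter: the weight of each neighbour combines a
    # spatial term (Lee-style) and an intensity-difference term
    # (Yaroslavsky-style).
    H, W = img.shape
    out = np.empty_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_s ** 2))
    pad = np.pad(img, radius, mode="edge")
    for i in range(H):
        for j in range(W):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            w = spatial * np.exp(-(win - img[i, j]) ** 2 / (2 * sigma_r ** 2))
            out[i, j] = np.sum(w * win) / np.sum(w)
    return out

rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
den = bilateral(noisy)
print(np.mean((den - clean) ** 2), np.mean((noisy - clean) ** 2))
```

Pixels on the far side of the step edge get a near-zero range weight, so the edge survives while the noise is averaged out.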

Comparative Results

Comparative Results. The anisotropic diffusion algorithm performs better than the others. In the Yaroslavsky/bilateral filter, the comparison between single intensity values is not very robust, which creates artifacts around the edges. The performance difference between the Yaroslavsky and bilateral filters is minor. All the aforementioned filters are based on the assumption of piecewise-constant image intensity.

Non-local self-similarity. Non-local self-similarity is very useful in denoising (and in almost everything else in image processing). For denoising, you could simply average all the patches that are "similar" to a given patch (modulo noise).

Non-local Means. Natural images have a great deal of redundancy: patches from different regions can be very similar. NL-Means is a non-local pixel-based method (Buades et al., 2005); related work includes Awate and Whitaker (PAMI 2007), Popat and Picard (TIP 1998), De Bonet (MIT tech report 1998), and Wang et al. (IEEE SPL 2003). The weights are driven by the difference between patches.

Non-local means: Basic Principle. Non-local means compares entire patches (not individual pixel intensity values) to compute the weights for denoising pixel intensities. Comparing entire patches is more robust: if two patches are similar in a noisy image, then with very high probability they are also similar in the underlying clean image. We will see this informally and prove it mathematically in due course.
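The principle can be sketched directly (a minimal pixel-wise NL-means; the patch half-width, the search-zone size, and h are illustrative assumptions, and the search is restricted to a window around each pixel for speed):

```python
import numpy as np

def nl_means(img, f=1, s=5, h=0.15):
    # NL-means with (2f+1)x(2f+1) patches and a (2s+1)x(2s+1) search zone.
    # Each pixel is replaced by a weighted average of pixels whose
    # surrounding PATCH resembles the reference patch.
    H, W = img.shape
    pad = np.pad(img, f, mode="edge")
    out = np.empty_like(img)
    for i in range(H):
        for j in range(W):
            P = pad[i:i + 2 * f + 1, j:j + 2 * f + 1]   # reference patch
            num = den = 0.0
            for a in range(max(0, i - s), min(H, i + s + 1)):
                for b in range(max(0, j - s), min(W, j + s + 1)):
                    Q = pad[a:a + 2 * f + 1, b:b + 2 * f + 1]
                    d2 = np.mean((P - Q) ** 2)           # patch distance
                    w = np.exp(-d2 / (h * h))
                    num += w * img[a, b]
                    den += w
            out[i, j] = num / den
    return out

rng = np.random.default_rng(2)
clean = np.zeros((24, 24))
clean[:, 12:] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
den_img = nl_means(noisy)
print(np.mean((den_img - clean) ** 2), np.mean((noisy - clean) ** 2))
```

Patches straddling the edge only match other patches aligned with the edge, so averaging never mixes the two sides.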

Non-local means: Variant. The Euclidean distance between two patches is weighted by a Gaussian with maximum weight at the centers of the two patches, decaying outwards.

Three principles to evaluate denoising algorithms (1): The residual image (also called “method noise”) – defined as the difference between the noisy image and the denoised image – should look like (and have all the properties of) a pure noise image. (2): A denoising algorithm should transform a pure noise image into another noise image (of lower variance). (3): A competent denoising algorithm should find for any pixel ‘i’, all and only those pixels ‘j’ that have the same model as ‘i’ (i.e. those pixels whose intensity would have most likely been the same as that of ‘i’, if there were no noise).

Principle 1: Residual Image

Principle 1: Residual Image

Principle 2: Noise to noise

Principle 3: Correct models? The pixels with high weight in anisotropic diffusion or bilateral filters do NOT always line up with our expectation. This is because noise corrupts the gradient computation and the single-intensity-driven weights. In NL-means, the comparison between patches is MUCH more robust to noise!

Non-local means: Implementation details. A drawback of the algorithm is its very high time complexity: O(N²) for an image with N pixels. Heuristic work-around: given a reference patch, restrict the search for similar patches to a window of size S x S (called the "search zone") centered on the reference patch.

Non-local means: Implementation details. The parameter sigma used to compute the weights depends on the noise variance; a common heuristic ties it linearly to the noise standard deviation. Patch size is a free parameter: usually something between 7 x 7 and 21 x 21 is chosen. A larger patch size gives better discrimination of the truly similar patches, but is more expensive and (over)smooths more; a smaller patch size smooths less.

Patch-size selection Patch-size too small: mottling effect (fake edges/patterns in constant intensity regions) Patch-size too large: oversmoothing of subtle textures and edges Ref: Duval and Gousseau, “A bias-variance approach for the non-local means”

[Figure: a gray region (containing patch P), a black region (containing patch Q), and a noisy gray region (containing patch U(x)).] Ref: Duval and Gousseau, "A bias-variance approach for the non-local means".

Assume the patch size is s x s and the noise is i.i.d. N(0, 1). The difference between the noisy distances from U(x) to the gray patch P and to the black patch Q, suitably normalized, is a zero-mean Gaussian random variable with variance 1, so the probability of mistaking Q for P is given by an erfc term whose argument grows with s. By definition of erfc, this probability decreases as 's' increases: discriminability improves as the patch size increases! This explains why NL-means outperforms single-pixel neighborhood filters.
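A quick Monte-Carlo check of this claim (a toy experiment with hypothetical patch intensities, not the paper's analysis): estimate how often a noisy observation of the gray patch looks closer to a noisy copy of the black patch than to a noisy copy of the gray one, for increasing patch sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def misrank_rate(s, contrast=1.0, trials=2000):
    # Clean patches: P (gray) and Q (black), differing by `contrast` per pixel.
    P = np.zeros((s, s))
    Q = np.full((s, s), contrast)
    errors = 0
    for _ in range(trials):
        U = P + rng.standard_normal((s, s))    # noisy observation of P
        Pn = P + rng.standard_normal((s, s))   # independent noisy copy of P
        Qn = Q + rng.standard_normal((s, s))   # noisy copy of Q
        # Mis-ranking: U looks closer to the noisy Q than to the noisy P.
        if np.sum((U - Qn) ** 2) < np.sum((U - Pn) ** 2):
            errors += 1
    return errors / trials

for s in (3, 7, 15):
    print(s, misrank_rate(s))
```

The mis-ranking rate drops rapidly as s grows, exactly as the erfc argument predicts.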

Extension to Video denoising. For video denoising, simply denoising each frame independently ignores temporal similarity or redundancy. Most video denoising algorithms first perform a motion compensation step: (1) estimate the motion between consecutive frames, and (2) align each frame to its predecessor. Motion estimation is typically performed by exploiting the "brightness constancy assumption", i.e. that the intensity of any physical point is unchanged throughout the video.

Extension to Video denoising The most popular motion compensation algorithms also assume that the motion of nearby pixels is similar (motion smoothness assumption). You will study this in more detail in computer vision: optical flow. Denoising is done after motion compensation (assuming that pixels at the same coordinate in successive frames will have same/similar intensities).

Extension to Video denoising. There are some problems with motion estimation, even more so if the video is noisy. One such issue is the aperture problem: for any block in one frame, there are many matching blocks in the next frame.

Extension to video denoising The motion smoothness assumption is one way to alleviate the aperture problem (again, you will study this in more detail in computer vision). On the next slide, we will see the performance of the Lee filter and the Yaroslavsky filter, with and without motion compensation.

NL-means performs much better!

NL-Means for video denoising. Video data has tremendous redundancy (more than individual frames). Any reference patch in one frame will have many similar patches in other frames, so the aperture problem is NO problem for video denoising! So forget about motion compensation: run NL-means on each frame, using similar patches from that frame as well as from nearby frames. Advantages: this avoids all the inevitable errors of motion estimation, AND saves computational cost!

An information-theoretic (and iterated) variant of NL-Means: UINTA. UINTA = Unsupervised INformation-Theoretic Adaptive filter. UINTA is again based on the principle of non-local similarity. It uses tools from information theory (conditional entropy) and kernel density estimation, and rests on a simple observation about the entropy of natural images. Ref: Awate and Whitaker, "Higher-order image statistics for unsupervised, information-theoretic, adaptive image filtering".

Principle of UINTA. The conditional entropy of the intensity of a central pixel X given its neighbors (y1, y2, ..., y24 in a 5 x 5 neighborhood) is low in a "clean" natural image. As noise is added, this entropy increases. To denoise, you can minimize the conditional entropy of X given its neighborhood at each pixel.
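A 1-D toy check of this principle (hypothetical numbers; a Parzen-window entropy estimate of the kind UINTA builds on): adding noise widens the intensity distribution and raises its estimated entropy.

```python
import numpy as np

def parzen_logpdf(x, samples, sigma=0.1):
    # Parzen-window (kernel) density estimate with Gaussian kernels:
    # p(x) ~ (1/N) sum_j G_sigma(x - z_j), evaluated in log space.
    d2 = (x - samples) ** 2
    return np.log(np.mean(np.exp(-d2 / (2 * sigma ** 2))
                          / (np.sqrt(2 * np.pi) * sigma)))

def entropy_estimate(samples, sigma=0.1):
    # Sample-mean entropy estimate: h(Z) ~ -(1/N) sum_i log p(z_i).
    return -np.mean([parzen_logpdf(z, samples, sigma) for z in samples])

rng = np.random.default_rng(3)
narrow = rng.normal(0.0, 0.1, 500)   # "clean": tightly clustered intensities
wide = rng.normal(0.0, 0.5, 500)     # "noisy": the same intensities plus noise
print(entropy_estimate(narrow), entropy_estimate(wide))
```

The wider (noisier) sample has the higher estimated entropy, which is exactly what UINTA's gradient descent pushes back down.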

Overview of the UINTA algorithm. At each pixel location i, we seek to minimize the conditional entropy of the central pixel given its neighborhood, by gradient descent (at each location) until convergence.

Mathematical details. For image neighborhoods with n pixels, we first need to estimate probability density functions of random variables with n (or n − 1) dimensions. Denote a neighborhood, stacked into a vector, by Z = (X, Y), where X is the central pixel and Y its neighbors. The PDF of Z is estimated with a Parzen-window (kernel) density estimate: p(z) ≈ (1/|A|) Σ_{j ∈ A} G_Ψ(z − z_j), where the z_j are sample neighborhoods drawn from the image and G_Ψ is a Gaussian kernel.

Mathematical Details. The entropy is estimated by the sample mean of the negative log density: h(Z) ≈ −(1/|A|) Σ_{i ∈ A} log p(z_i). The gradient descent is given on the following slide.

The gradient of the conditional entropy h(X | Y) = h(X, Y) − h(Y) with respect to the value of the central pixel (the pixel to be denoised) is obtained via the chain rule: terms independent of the central value x drop out, and a projection vector extracts only the dimension corresponding to the central pixel from the gradient of the joint density.

Note! If you set the derivative of the conditional entropy to zero (which you do, since you want to minimize it) and rearrange the terms, you get the NL-means update for denoising. So UINTA can be considered an iterated form of NL-means!

Earlier work on non-local similarity. A technique similar (in principle) to UINTA was developed by Popat and Picard in 1997. A training set of clean and degraded images was used to learn the joint probability density of degraded neighborhoods and clean central pixels. Given a noisy image, each pixel value is restored using a MAP estimate. Unlike UINTA, this method requires prior training.

Texture synthesis or completion: another use of non-local similarity Ref: Efros and Leung, “Texture Synthesis by Non-parametric sampling” Remember: a texture image contains very high repetition of “similar” patches all over!

Method: for every pixel (x, y) that needs to be filled, collect the valid neighboring intensity values, search throughout the image for "similar" neighborhoods, and assign the intensity at (x, y) as a weighted combination of the central pixel values of those neighborhoods. Free parameters: the size of the neighborhood and the definition of "similar" neighborhoods. For pseudo-code, see http://graphics.cs.cmu.edu/people/efros/research/EfrosLeung.html
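A 1-D analogue of this method can be sketched as follows (a toy, not the Efros-Leung pseudo-code: the "texture" is a hypothetical periodic sequence, neighborhoods are the last n values, and ties among best matches are broken at random):

```python
import numpy as np

rng = np.random.default_rng(4)

def synthesize(sample, out_len=60, n=4):
    # Grow a sequence one value at a time, copying the continuation of the
    # best-matching length-n context found anywhere in the sample texture.
    out = list(sample[:n])                      # seed with the sample's start
    for _ in range(out_len - n):
        ctx = np.array(out[-n:])
        best_d, cands = None, []
        for i in range(len(sample) - n):
            d = np.sum((np.array(sample[i:i + n]) - ctx) ** 2)
            if best_d is None or d < best_d:
                best_d, cands = d, [sample[i + n]]
            elif d == best_d:
                cands.append(sample[i + n])     # keep all equally good matches
        out.append(cands[rng.integers(len(cands))])
    return out

texture = [0, 0, 1, 1] * 8                      # a simple periodic "texture"
result = synthesize(texture)
print(result[:16])
```

Because the length-4 context uniquely determines the next value in this periodic texture, the synthesis reproduces the pattern indefinitely.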

Some more results

Something similar in Natural Language Processing. Collect sequences of n consecutive words (or letters) from a large corpus of English text (e.g., newspapers, books). Compute the probability of occurrence of the (n+1)-th word given a preceding sequence of n words. Sampling from such a conditional probability table allows the construction of plausible English-like text. Ref: Shannon, "A mathematical theory of communication", 1948.
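The scheme above can be sketched with a tiny bigram model (the corpus here is a made-up toy sentence, and the fallback for an unseen context is an arbitrary choice):

```python
import random
from collections import defaultdict

corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat saw the dog").split()

# Build the conditional table: continuations of each length-n word context.
n = 2
table = defaultdict(list)
for i in range(len(corpus) - n):
    table[tuple(corpus[i:i + n])].append(corpus[i + n])

# Sample from the conditional table to generate English-like text.
random.seed(0)
state = tuple(corpus[:n])
out = list(state)
for _ in range(12):
    # Fall back to a uniformly random corpus word if the context is unseen.
    nxt = random.choice(table.get(tuple(out[-n:]), corpus))
    out.append(nxt)
print(" ".join(out))
```

Every generated word follows an observed continuation of its two-word context, so locally the output always looks like the corpus.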