Lecture 26: Single-Image Super-Resolution CAP 5415

Announcements: Projects are due Dec 2. If you have nice results and would like to make a 20-minute presentation on your project in class on Wednesday, let me know. Bonus: a reduction in the page requirements for the writeup.

The Goal of Super-Resolution: We have a low-resolution version of an image, and we want to create a higher-resolution version.

Why not just use Photoshop? Standard interpolation (bicubic or bilinear) creates an image with more pixels, but doesn't introduce new frequency content. [Figure: a small, sharp picture becomes a large, smooth picture.]

View in the Spatial Frequency Domain: [Figure: the magnitude of the DFT of the original image, the image doubled in size, and the image quadrupled in size.] Interpolation cannot introduce high-frequency content into the high-res image. Our goal is to create a system that can.
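
To make the point concrete, here is a minimal NumPy/SciPy sketch (my illustration, not code from the lecture; the test image, zoom factor, and band sizes are arbitrary choices): upsampling by interpolation leaves essentially all of the spectrum's energy inside the original image's frequency band.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
small = ndimage.gaussian_filter(rng.standard_normal((64, 64)), 1.0)

big = ndimage.zoom(small, 4, order=3)      # cubic-spline upsampling, 64 -> 256
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(big)))

# Compare energy inside the original 64x64 frequency band to the rest.
c = big.shape[0] // 2
inner = spectrum[c - 32:c + 32, c - 32:c + 32].sum()
outer = spectrum.sum() - inner
print(f"energy inside original band: {inner:.0f}, outside: {outer:.0f}")
```

Running this shows the outer band holds only a tiny fraction of the energy: interpolation adds pixels, not detail.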

Today: I will begin by talking about one of the most frequently cited super-resolution papers, then talk about my own extensions.

Our approach: We take a probabilistic approach. Create a distribution P(h | l), where h is the high-resolution image and l is the observed low-resolution image. Find the high-resolution image by finding the h that maximizes P(h | l).
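
Restated as a formula, using the P(h | l) notation that the rest of the lecture relies on:

```latex
\hat{h} \;=\; \operatorname*{arg\,max}_{h} \; P(h \mid l)
```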

Other Approaches: level-set approaches (Morse and Schwartzwald 2001); non-linear enhancement (Greenspan 2000); deblurring by inverting the convolution filter (unable to introduce new high-frequency components).

The Real Problem: We can now find the high-resolution image by choosing the right patch to put in at each point. Pick the patch from an image database.

Basic Setup (images from Freeman et al.): [Figure: the low-resolution observation, a bilinearly interpolated version, and the actual high-res image.] Store examples of corresponding patches.

The Real Problem: We can now find the high-resolution image by choosing the right patch to put in at each point, picking the patch from an image database. Problem: that's a lot of patches! (Partial) Solution 1: filter out low frequencies, which reduces variability. (Partial) Solution 2: contrast-normalize, which takes out scale variability.
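
A minimal sketch of the two tricks above (my illustration; the blur width `sigma` and the stabilizer `eps` are assumed values, not from the lecture):

```python
import numpy as np
from scipy import ndimage

def normalize_patch(patch, sigma=1.0, eps=1e-4):
    """Solution 1: keep only high frequencies. Solution 2: remove contrast scale."""
    high = patch - ndimage.gaussian_filter(patch, sigma)  # filter out lows
    return high / (np.abs(high).mean() + eps)             # contrast-normalize
```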

What is really being stored: store these high-frequency patch pairs.

Getting a new image: Look at a low-resolution patch in the image, find the most similar low-res patch in the database, and fill in the corresponding high-res patch. Does this work? [Figure: the answer vs. the correct high-res image.]
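
A minimal sketch of this lookup (my illustration; `best_match` and `fill_in` are hypothetical helpers, and SSD is one reasonable similarity measure):

```python
import numpy as np

def best_match(patch, db_low):
    """Index of the database low-res patch with the smallest SSD to `patch`."""
    diffs = db_low - patch.ravel()          # db_low: (N, d) flattened patches
    return int(np.argmin(np.einsum('ij,ij->i', diffs, diffs)))

def fill_in(low_patches, db_low, db_high):
    """Independently replace each low-res patch with its match's high-res partner."""
    return [db_high[best_match(p, db_low)] for p in low_patches]
```

The following slides show why this independent, patch-by-patch choice is not enough.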

What Happened?

To Do Better: This method treats each patch independently; we need to consider patches together. For example, edges tend to continue over long distances.

Modeling: For a 256x256 image with 10 candidate patches, we could have as many as 10^(256x256) numbers in the distribution. Need to simplify our model! Strategy for simplification: 1. Simplify the relationships between pixels by using a Markov Random Field.

Strategy: Markov Random Field. Assume that, given the red pixels, the blue pixel is conditionally independent of the green pixels. Alternate explanation: given the red pixels, the green pixels contain no additional information about what the blue pixel should be.

Strategy: Markov Random Field. Assume that a patch depends only on its four nearest neighbors. Naively, we then need only 10,000 numbers (a big reduction!).

Strategy #1: Markov Random Field. We can represent a distribution like this as a graph. Each node corresponds to one variable (a pixel in this example). Edges denote assumptions about conditional independence: in this graph, each node, given its four neighbors, is conditionally independent of all other nodes.

Our Model: Divide the high-res image into 7x7 patches. Each patch is one node in the graph; equivalently, each patch is one random vector in the distribution. Conditioned on its four nearest neighbors, each patch is conditionally independent of the rest of the patches.

Another View: Intuitively, we want to create a high-resolution image out of a collection of patches. (Figure from Freeman.)

Finishing the Model: In a pairwise MRF such as the lattice we're using, there is one compatibility function per edge in the graph. The distribution represented by this graph is P(h | l) ∝ ∏ ψ(x_i, x_j), with one factor for every neighboring pair connected by an edge; ψ is the compatibility function we have to decide, and x_i and x_j are the states of two neighboring candidate patches.

Choosing ψ: The compatibility between patches is determined by how similar their borders are. [Figure: Patch 1 and Patch 2 with their shared border.]
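
One common way to score border agreement is a Gaussian in the border SSD; this sketch is my illustration (the overlap width and `sigma` are assumed values), not Freeman's exact formula:

```python
import numpy as np

def compatibility(left_patch, right_patch, overlap=2, sigma=10.0):
    """psi = exp(-SSD / (2 sigma^2)) over the columns where the borders meet."""
    border1 = left_patch[:, -overlap:]   # right border of Patch 1
    border2 = right_patch[:, :overlap]   # left border of Patch 2
    ssd = np.sum((border1 - border2) ** 2)
    return np.exp(-ssd / (2.0 * sigma ** 2))
```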

Results. First test: how well does the method match the high-frequency statistics, even if they're weird?

Results Training Images:

Results

Failure

Revisiting the Issue of States: In this algorithm, the state of each node is which candidate patch to use. The tradeoff: with too many states, maximizing P(h|l) is intractable; with too few states, we are unable to represent the high-res image well with a small number of discrete states. Solution: use the observed low-resolution image to choose a small number of patches that are likely to represent the high-res image well.

Choosing Candidates: With a database, we have to store the database and we have to search the database. A different approach: generate the candidates directly from the low-res observation.

1-D Example: Learn a set of interpolators. Each interpolator creates a candidate high-resolution signal from the low-resolution input.

2-D Example: N_i(L), a 9x1 vector taken from a patch of the input image, is multiplied by each of the 16x9 matrices M_1, M_2, ..., M_S to produce the candidate 4x4 high-res patches.
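
In code, the candidate-generation step looks like this (a sketch using the shapes from the slide; the matrices `M` come from the training procedure on the next slide):

```python
import numpy as np

def candidate_patches(neighborhood, M):
    """neighborhood: (3, 3) patch of the input image; M: (S, 16, 9) interpolators.
    Returns S candidate 4x4 high-res patches."""
    v = neighborhood.reshape(9)          # the 9x1 vector N_i(L)
    return (M @ v).reshape(-1, 4, 4)     # one 4x4 candidate per 16x9 matrix
```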

Where does M_S come from? Use a training database of high-resolution/low-resolution pairs to choose the best interpolators. Find the interpolators using a clustering algorithm (sketched below): 1. Cluster the pairs using k-means clustering. 2. For each cluster, find the interpolator that best predicts the high-resolution patches from the low-resolution patches. 3. For each pair, reassign it to the cluster whose interpolator best predicts the high-resolution patch from the low-resolution patch. 4. Repeat steps 2 and 3 until convergence.
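
A sketch of this clustering procedure (my illustration: random assignments stand in for the k-means initialization of step 1, and the (9, 16) patch shapes follow the 2-D example slide):

```python
import numpy as np

def learn_interpolators(L, H, S=64, iters=50, seed=0):
    """L: (N, 9) low-res vectors; H: (N, 16) flattened 4x4 high-res patches.
    Returns S interpolators, each a (16, 9) matrix."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, S, size=len(L))      # stand-in for k-means (step 1)
    M = np.zeros((S, H.shape[1], L.shape[1]))
    for _ in range(iters):
        for s in range(S):                        # step 2: least-squares fit
            idx = labels == s
            if idx.any():
                M[s] = np.linalg.lstsq(L[idx], H[idx], rcond=None)[0].T
        preds = np.einsum('smk,nk->nsm', M, L)    # every interpolator's prediction
        errs = ((preds - H[:, None, :]) ** 2).sum(axis=2)
        new_labels = errs.argmin(axis=1)          # step 3: reassign by error
        if np.array_equal(new_labels, labels):    # step 4: repeat to convergence
            break
        labels = new_labels
    return M
```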

How Many Interpolators Are Needed? We can use the training set to estimate a lower bound on the error incurred using the interpolators. For super-resolution, we used 64.

Finishing the Model: We've decided the states of the model, but we still need to decide the actual distribution. The distribution of an MRF has a unique form: a product of compatibility functions over the graph. [Figure: an example graph with nodes A, B, C, D.] The functions ψ are known as compatibility functions or clique potentials. In a pairwise MRF such as the lattice we're using, there is one compatibility function per edge in the graph. These functions measure how compatible the states of two neighboring nodes are: a high number means very compatible, a low number means not compatible.

Finishing the Model: So, as before, the distribution represented by our lattice is P(h | l) ∝ ∏ ψ(x_i, x_j), with one factor for every neighboring pair connected by an edge; we still have to decide ψ, whose arguments are the states of two neighboring candidate patches.

Choosing ψ: We use the image derivatives of neighboring patches to compute ψ: each factor penalizes the difference between a pair of adjacent red and blue pixels, raised to a power α < 1. [Figure: Patch 1 and Patch 2; red pixels are the border of Patch 1, blue pixels are the border of Patch 2.]

Justifying ψ: We can justify this choice in two ways: image statistics and image "sharpness".

Image Statistics and α: The distribution of the derivatives d of a natural image is modeled well by p(d) ∝ exp(-|d/s|^α), where typically 0.7 < α < 1.2. Using |·|^α in the compatibility function attempts to respect these statistics.

Image Sharpness: Consider the problem of interpolating y_1 from y_0 and y_2, with y_0 = 1 and y_2 = 2. Model the distribution of y_1 as p(y_1) ∝ exp(-(|y_1 - y_0|^α + |y_2 - y_1|^α)). If α > 1, then the most likely value of y_1 is 1.5: a "blurry" edge. If α < 1, then the most likely value of y_1 is either 1 or 2: a sharp edge. So α < 1 acts as a sharpness prior; it prefers sharp edges.
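
The claim is easy to verify numerically (y_0 = 1 and y_2 = 2 are the values assumed in the example; minimizing the energy is equivalent to maximizing p):

```python
import numpy as np

y0, y2 = 1.0, 2.0
y1 = np.linspace(0.5, 2.5, 2001)

for alpha in (2.0, 0.7):
    energy = np.abs(y1 - y0) ** alpha + np.abs(y2 - y1) ** alpha
    print(f"alpha = {alpha}: most likely y1 = {y1[energy.argmin()]:.2f}")
# alpha = 2.0 -> 1.50 (blurry ramp); alpha = 0.7 -> 1.00 (tied with 2.00: sharp step)
```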

Image Sharpness: The compatibility function works the same way. It favors setting the patches so that there are as few derivatives as possible. If there must be image derivatives, then: if α > 1, the compatibility function favors many small derivatives over a few large derivatives; if α < 1, it favors a few large derivatives over many small derivatives. So α < 1 favors a high-res image with sharp edges.

Summary so far: An MRF model of the high-resolution image. Each node corresponds to a patch of the high-resolution image; the state of each node corresponds to an interpolator that produces a high-resolution patch from the low-resolution input image. The compatibility functions between patches are based on image derivatives. The model also includes a reconstruction constraint.

Finding a high-resolution image: Now that we have a distribution, we can find a high-resolution image by maximizing P(h | l). One problem: with more than 2 states per node, maximizing is NP-complete, and we're using 64 states per node!

Maximizing P(h | l): Maximizing P(h | l) is intractable, so we have to use an approximate technique: one not guaranteed to find the best value of h, but that we hope finds a pretty good value. Two popular techniques: 1. Loopy Belief Propagation (Pearl 88, Weiss 98, Freeman 2000). 2. Graph Cuts (Boykov, Veksler, Zabih 2000).

Loopy Belief Propagation: Assume P(h | l) can be represented by a graph with no loops. Then we can maximize P(h | l) using an algorithm that passes messages between nodes in the graph (Pearl 88, Weiss 98). Messages encode beliefs about the state of each node.
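
On a loop-free chain, the message-passing idea reduces to the familiar Viterbi recursion; here is a minimal max-product sketch (my illustration with toy log-potentials, not the lecture's actual grid code):

```python
import numpy as np

def chain_map(unary, pair):
    """unary: (T, K) log-potentials; pair: (K, K) log-compatibilities.
    Returns the MAP state sequence for a chain of T nodes with K states."""
    T, K = unary.shape
    msg = np.zeros((T, K))                 # msg[t]: message from node t-1 to t
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = (unary[t - 1] + msg[t - 1])[:, None] + pair   # (K, K)
        msg[t] = scores.max(axis=0)
        back[t] = scores.argmax(axis=0)
    states = np.zeros(T, dtype=int)
    states[-1] = int((unary[-1] + msg[-1]).argmax())
    for t in range(T - 1, 0, -1):          # trace the argmaxes back
        states[t - 1] = back[t, states[t]]
    return states
```

On the lattice, the same messages are passed between neighboring patches and iterated, which is what makes it "loopy".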

Doesn't Our Graph Have Loops? Belief propagation is not guaranteed to work if there are loops in the graph. Empirically, many have found that it works well anyway, and there is some theoretical justification (Weiss and Freeman).

Graph Cuts: Start with an initial labeling of the graph, and denote the probability of this labeling P(h_0 | l). How can we relabel the graph and increase the probability above P(h_0 | l)?

Graph Cuts: Perform a swap: for two states s_1 and s_2, change some nodes from s_1 to s_2 and vice versa. The optimal swap can be found in polynomial time.

Graph Cuts: Keep performing swaps between states until you can no longer increase P(h | l) with a single swap. P(h | l) never decreases, so in practice convergence is guaranteed.
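
The overall strategy is a greedy loop over pairs of states; this outline is my sketch, with `optimal_swap` standing in as a hypothetical helper for the polynomial-time min-cut step and `energy` for -log P(h | l):

```python
from itertools import combinations

def swap_until_converged(labels, states, optimal_swap, energy):
    """Apply the best s1<->s2 swap whenever it lowers the energy."""
    improved = True
    while improved:
        improved = False
        for s1, s2 in combinations(states, 2):
            candidate = optimal_swap(labels, s1, s2)   # min-cut inside
            if energy(candidate) < energy(labels):     # i.e., P(h|l) increased
                labels, improved = candidate, True
    return labels
```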

Which should I use? Belief Propagation: no guarantees about maximizing P(h | l); not guaranteed to converge; can handle arbitrary graphs; can estimate the marginal probabilities of P(h | l) (caveat: these marginals will be wrong). Graph Cuts: no guarantees about maximizing P(h | l); guaranteed to converge; not sure about arbitrary graphs; cannot produce marginals; tends to find better solutions than BP (Tappen 2003).

Which did we use? Our model also includes a reconstruction constraint that forces the recovered high-resolution image to match the low-resolution image when it is down-sampled. That peculiar graph structure led us to use belief propagation (we already had BP code, too). The BP code is available at [link].
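
A sketch of that reconstruction constraint (my illustration; the anti-alias blur and plain decimation are assumed choices for the down-sampling operator):

```python
import numpy as np
from scipy import ndimage

def reconstruction_error(high, low, factor=2, sigma=1.0):
    """SSD between the down-sampled recovered image and the observation."""
    blurred = ndimage.gaussian_filter(high, sigma)   # anti-alias first
    down = blurred[::factor, ::factor]               # then decimate
    return float(np.sum((down - low) ** 2))
```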

Results: [Figure: the actual high-res image, pixel-replicated low-res, bicubic interpolation, Greenspan et al., and our method.]

Results: [Figure: the same comparison on a second example: actual high-res, pixel-replicated low-res, bicubic interpolation, Greenspan et al., and our method.]

CCD Demosaicing: This approach is flexible enough to be applied to other image-processing problems. We also applied it to CCD demosaicing.

Common Artifacts

Results