Published by Dalton Topliff. Modified over 9 years ago.
Visual Dictionaries
George Papandreou, Toyota Technological Institute at Chicago
CVPR 2014 Tutorial on BASIS
Additive Image Patch Modeling
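The patch-based image modeling approach. How to span the space of all 8x8 image patches?
[Figure: a patch written as a weighted sum (Σ) of dictionary atoms with coefficients α1, α2, α3, ...; the dictionary has K atoms of dimension D.]
Atoms shown correspond to the real part of the filters (source: the SPM DT-CWT review article).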
Two Modeling Goals
Image reconstruction: use the dictionary to build an image prior. Tasks: compression, denoising, deblurring, inpainting, …
Image interpretation: use the dictionary for feature extraction. Tasks: classification, recognition, …
Three Modeling Regimes
Two inter-related properties:
Over-completeness: how big is the dictionary?
Sparsity: how many non-zero components?
The three regimes: PCA, sparse coding, clustering.
Where Does the Dictionary Come From?
(1) Dictionary is fixed, e.g., basis or union of bases JPEG image compression DCT Wavelets
Where Does the Dictionary Come From?
(2) Learn generic dictionary from a collection of images Many algorithms possible (see later)
Where Does the Dictionary Come From?
(3) Learn an image-specific (image-adapted) dictionary Many algorithms possible (see later)
Where Does the Dictionary Come From?
(4) Non-parametric: Dictionary is the set of all overlapping image patches (one or many images) Non-local means, patch transform, etc.
Beyond Bases: Hierarchical Dictionaries
(1) Multi-scale image modeling: apply the same dictionary to the image at different scales. Gaussian and Laplacian pyramids, wavelets, …
(2) Recursive hierarchical models: build recursive dictionaries. Deep learning.
Key Problems
Coding: find the expansion coefficients given the dictionary.
Dictionary learning: given data, learn a dictionary.
Hierarchical modeling.
Image Coding Problem: Least Squares
Least squares criterion. Equivalent formulations: Solution (Tikhonov regularization, Wiener filtering): Columns of V are the dual filters (dual dictionary). Fast processing (inner products). Yields dense code.
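One standard way to write the regularized least squares coding problem and its closed-form solution is sketched below. It assumes a patch x, a D x K dictionary W (the symbol used later in the tutorial), a code α, and a regularization weight λ; the symbol λ and the exact form of the penalty are assumptions, not taken from the slide.

```latex
\hat{\alpha}
  = \arg\min_{\alpha} \; \|x - W\alpha\|_2^2 + \lambda \|\alpha\|_2^2
  = (W^\top W + \lambda I)^{-1} W^\top x
  = V^\top x,
\qquad
V = W (W^\top W + \lambda I)^{-1}.
```

Under these assumptions each coefficient is a single inner product between x and one column of V, which is why the code is fast to compute and generally dense.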
Image Coding Problem: Vector Quantization
Equivalent formulations: Solution:
Exact, O(DK): one inner product for each basis vector.
Approximate, O(D log K): ANN search.
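A minimal sketch of the exact O(DK) search, assuming the dictionary is stored as a D x K NumPy array `W` with unit-norm columns and that the code keeps a signed gain for the selected atom. The function name `vq_code` and the gain-shape variant are illustrative assumptions.

```python
import numpy as np

def vq_code(x, W):
    """Code x with a single atom of W (columns assumed unit-norm).

    Returns the index of the selected atom and its coefficient.
    """
    scores = W.T @ x                      # K inner products, O(DK) in total
    k = int(np.argmax(np.abs(scores)))    # atom with the largest correlation
    return k, float(scores[k])
```

For unit-norm atoms, maximizing the absolute inner product is the same as minimizing the squared reconstruction error with one active atom.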
Sparse Coding Problem
Assume only L non-zero coefficients.
This is a much harder combinatorial problem. In the worst case there are (K choose L) possible active sets. If we knew the active set of coefficients, the problem would reduce to least squares. Two very effective families of approximate algorithms: greedy algorithms and relaxation algorithms.
Greedy Sparse Coding: Matching Pursuit
Greedily add T terms one by one.
Algorithm (Basic Matching Pursuit):
1. Initialize the residual r = x.
2. Find the atom that best explains the residual.
3. Update the residual.
4. Return if the stopping criterion is met, otherwise go to 2.
VQ problem at each iteration. Many variants (e.g., OMP). Efficient implementations.
Mallat (2009); SPAMS
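The four steps of basic matching pursuit can be sketched as follows, assuming a D x K dictionary `W` with unit-norm columns. The function name, the `tol` stopping threshold, and the choice to stop after `n_terms` iterations are illustrative assumptions.

```python
import numpy as np

def matching_pursuit(x, W, n_terms, tol=1e-10):
    """Basic matching pursuit with a unit-norm dictionary W (D x K).

    Returns the coefficient vector alpha and the final residual r,
    which always satisfy x = W @ alpha + r.
    """
    alpha = np.zeros(W.shape[1])
    r = x.astype(float).copy()            # step 1: residual starts at x
    for _ in range(n_terms):
        scores = W.T @ r                  # step 2: one VQ problem per iteration
        k = int(np.argmax(np.abs(scores)))
        alpha[k] += scores[k]
        r = r - scores[k] * W[:, k]       # step 3: remove the explained part
        if r @ r <= tol:                  # step 4: stopping criterion
            break
    return alpha, r
```

The same atom may be selected more than once, which is one difference from OMP, where the coefficients of all selected atoms are refit at every step.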
Basic Matching Pursuit Convergence Analysis
Exponential convergence (recall VQ analysis), with a rate governed by the dictionary coherence.
Note that the coherence is positive if the dictionary spans the patch space.
Basic matching pursuit costs T times more than VQ.
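A sketch of the standard form of this bound, following the analysis in Mallat (2009) and assuming unit-norm atoms w_k, a residual r_T after T iterations, and a coherence μ(W) defined as below. The notation is an assumption and may differ from the slide.

```latex
\|r_T\|_2^2 \;\le\; \bigl(1 - \mu^2(W)\bigr)^{T} \, \|x\|_2^2,
\qquad
\mu(W) \;=\; \min_{r \neq 0} \; \max_{k} \; \frac{|\langle w_k, r \rangle|}{\|r\|_2}.
```

Each iteration removes the energy |⟨w_k, r⟩|² of the best atom from the residual, so the residual energy shrinks by at least the factor 1 - μ²(W) per step. When the atoms span the whole D-dimensional space, no non-zero residual is orthogonal to all of them, hence μ(W) > 0.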
Relaxed Sparse Coding
Continuous relaxation of the combinatorial problem. Prominent case: p = 1 (L1 convex optimization).
Basis Pursuit Coding
L1-penalized problem (a.k.a. basis pursuit, LASSO).
Global optimum (convex optimization).
Huge literature:
Algorithms for large-scale problems
Recovery guarantees: compressed sensing
Extensions: TV minimization, ADMM
Extensions: re-weighted L1
Non-convex relaxations: 0 < p < 1
Mallat (2009), Elad (2010); SPAMS
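As one concrete example of an algorithm for the L1-penalized problem, here is a minimal sketch of iterative soft thresholding (ISTA). The slide does not name this algorithm; it is swapped in here for illustration. The objective assumed is 0.5 * ||x - W a||^2 + lam * ||a||_1, and the names `ista`, `lam`, and `n_iter` are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft thresholding: shrink magnitudes by t, clip at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, W, lam, n_iter=200):
    """Minimize 0.5 * ||x - W a||_2^2 + lam * ||a||_1 by proximal gradient."""
    # Step size 1/L, where L = ||W||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(W, 2) ** 2
    alpha = np.zeros(W.shape[1])
    for _ in range(n_iter):
        grad = W.T @ (W @ alpha - x)      # gradient of the quadratic term
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha
```

Because the problem is convex, the iterates approach a global optimum regardless of the starting point; faster variants exist in the large-scale literature.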
Thresholding Algorithms
Lp-optimization with an orthonormal basis.
Decompose into separable problems: the L2 norm is invariant to rotation, and the Lp norm is separable.
1-D optimization (look-up table):
L0 / L1: hard / soft thresholding
L2: linear shrinkage
Elad (2010)
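A minimal sketch of the three closed-form 1-D rules, assuming an orthonormal D x D dictionary `W` and the objective 0.5 * ||x - W a||^2 + lam * rho(a), with rho equal to the L0 count, the L1 norm, or half the squared L2 norm. The scaling of the penalty and the function name are assumptions; other scalings only change the threshold values.

```python
import numpy as np

def code_orthonormal(x, W, lam, p):
    """Solve 0.5 * ||x - W a||^2 + lam * rho_p(a) for orthonormal W.

    rho_0(a) = number of non-zeros, rho_1(a) = ||a||_1, rho_2(a) = 0.5 * ||a||_2^2.
    """
    c = W.T @ x                                   # rotate; the L2 error is unchanged
    if p == 0:                                    # hard thresholding
        return np.where(np.abs(c) > np.sqrt(2.0 * lam), c, 0.0)
    if p == 1:                                    # soft thresholding
        return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
    if p == 2:                                    # linear shrinkage
        return c / (1.0 + lam)
    raise ValueError("p must be 0, 1, or 2")
```

After the rotation, every coefficient is handled independently, so the cost is one matrix-vector product plus an elementwise operation.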
Recap: (Sparse) Coding
Problem: find the expansion coefficients given the dictionary.
Exact methods:
p = 2 (Fourier, PCA, etc.): linear system
p = 0 and L = 1 (VQ): fast search
Orthonormal dictionary: separable 1-D optimization
Approximate methods for sparse coding:
p = 0: greedy matching pursuit
p = 1: convex relaxation
Dictionary Learning
Find a dictionary W that best fits a dataset.
Exact solution for the L2 norm via the SVD (PCA). For sparse norms this is a hard non-convex problem, even if the coding problem is convex. Main approach: alternating minimization. Recent advances in theory.
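For the L2 case, a minimal sketch of the SVD-based solution, assuming the data are stored as a D x N matrix `X` with one patch per column. Mean-centering before the SVD and the function name `pca_dictionary` are assumptions added for illustration.

```python
import numpy as np

def pca_dictionary(X, K):
    """Return the K leading principal directions of X (D x N) and the data mean.

    The columns of the returned D x K matrix are orthonormal.
    """
    mean = X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return U[:, :K], mean
```

A patch x is then coded as `W.T @ (x - mean)` and reconstructed as `mean + W @ code`, which is the best rank-K linear approximation in the least squares sense.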
Alternating Minimization Methods
Update codes, given dictionary: use any greedy / relaxation sparse coding algorithm.
Update dictionary, given codes: least squares.
Method converges to a local minimum.
K-SVD: updates dictionary atoms sequentially.
Online version is much faster for large datasets.
Olshausen & Field (1996); Engan+ (1999); Aharon+ (2006); Mairal+ (2010)
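The two alternating steps can be sketched as follows, in the spirit of the least squares dictionary update of Engan+ (1999). This is an illustrative sketch, not the K-SVD or online algorithm: it assumes a D x N data matrix `X`, basic matching pursuit for the coding step, random initialization from data columns, and hypothetical names `sparse_codes` and `learn_dictionary`.

```python
import numpy as np

def sparse_codes(X, W, n_terms):
    """Basic matching pursuit applied to every column of X at once."""
    A = np.zeros((W.shape[1], X.shape[1]))
    R = X.astype(float).copy()
    cols = np.arange(X.shape[1])
    for _ in range(n_terms):
        S = W.T @ R                               # K x N correlations
        k = np.argmax(np.abs(S), axis=0)          # best atom per column
        g = S[k, cols]
        A[k, cols] += g
        R -= W[:, k] * g                          # update all residuals
    return A

def learn_dictionary(X, K, n_terms=3, n_iter=10, seed=0):
    """Alternate sparse coding and a least squares dictionary update."""
    rng = np.random.default_rng(seed)
    W = X[:, rng.choice(X.shape[1], K, replace=False)].astype(float)
    W /= np.linalg.norm(W, axis=0) + 1e-12
    for _ in range(n_iter):
        A = sparse_codes(X, W, n_terms)           # update codes, given dictionary
        # Update dictionary, given codes: solve min_W ||X - W A||_F^2.
        W_new = np.linalg.lstsq(A.T, X.T, rcond=None)[0].T
        norms = np.linalg.norm(W_new, axis=0)
        used = norms > 1e-12                      # keep old atoms that were never used
        W[:, used] = W_new[:, used] / norms[used]
    return W, sparse_codes(X, W, n_terms)
```

Since the coding step here is only approximate, this sketch does not guarantee a monotonically decreasing objective; it is meant to show the structure of the alternation.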
K-Means as Dictionary Learning Method
Update codes, given dictionary, such that each code selects a single atom.
Update dictionary, given codes.
Special case of K-SVD using OMP-1 for coding. Extremely fast.
Aharon+ (2006); Coates, Lee, Ng (2011)
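A minimal sketch of K-means written as the same two-step alternation, assuming a D x N data matrix `X`, plain Euclidean assignment, and random initialization from data columns. The function name `kmeans_dictionary` is an assumption.

```python
import numpy as np

def kmeans_dictionary(X, K, n_iter=20, seed=0):
    """Plain K-means on the columns of X (D x N); returns atoms and labels."""
    rng = np.random.default_rng(seed)
    W = X[:, rng.choice(X.shape[1], K, replace=False)].astype(float)

    def assign(W):
        # Squared distances between every atom and every data point (K x N).
        d2 = (W ** 2).sum(0)[:, None] - 2.0 * W.T @ X + (X ** 2).sum(0)[None, :]
        return np.argmin(d2, axis=0)

    for _ in range(n_iter):
        labels = assign(W)                        # update codes: one atom per point
        for k in range(K):
            members = X[:, labels == k]
            if members.shape[1] > 0:              # leave empty clusters unchanged
                W[:, k] = members.mean(axis=1)    # update dictionary: cluster mean
    return W, assign(W)
```

Each point is coded with a single atom, so the coding step costs one nearest-neighbor search and the dictionary update is a plain average.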
Learned Dictionaries
[Figure panels: Generic K-SVD; Barbara K-SVD.] Aharon+ (2006)
Learned Dictionaries
[Figure panels: Generic K-SVD; Generic K-Means.]
Aharon+ (2006), Coates+ (2011), Papandreou+ (2014)
Image Denoising with Learned Dictionaries
Noisy: 22.1 dB. Denoised with K-SVD: 30.8 dB. Aharon+ (2006)
Image Inpainting with Learned Dictionaries
Joint dictionary learning and image inpainting Mairal+ (2010)
K-SVD vs K-Means Dictionaries in Denoising
Replace K-SVD with K-Means in the dictionary learning step of the denoising algorithm.
[Figure panels, each labeled with a PSNR in dB: Noisy; K-SVD (OMP-32, 84 sec); K-Means (OMP-1, 22 sec).]
Recap: Dictionary Learning
Find a dictionary W that best fits a dataset Non-convex problem Greedy alternating optimization methods The K-means algorithm is very fast and works well for small image patches
Image Patch Dictionaries in Visual Recognition
SIFT-based Bag-of-Words classification pipeline: Patches, SIFT, Dictionary (>10K words), Classifier.
Patch Dictionaries in Image Classification
Image classification without SIFT.
Key insights:
K-means works well
Whitening is crucial
Using larger dictionaries boosts recognition rate
Encoding has a huge effect on performance
Promising results on CIFAR but not on large image datasets.
Varma, Zisserman (2003); Coates+ (2011)
Histograms of Sparse Codes for Object Detection
Key idea: Build a HOG-like descriptor on top of K-SVD learned patch dictionary instead of gradients, then DPM Ren, Ramanan (2013); Also see Dikmen, Hoiem, Huang (2012)
Hierarchical Modeling and Dictionary Learning
So far: modeling the appearance of small image patches, say 8x8 pixels. How about dictionaries of larger visual patterns?
Multiscale modeling: work with image pyramids.
Hierarchical modeling: model higher order statistics of feature responses; recursively compose complex visual patterns; use unsupervised or supervised objectives.
Hierarchical Models of Objects
Fidler & Leonardis (2007); Zhu+ (2010)
Hierarchical Matching Pursuit (K-SVD)
Bo, Ren, Fox (2013)
Deep Convolutional Networks
LeCun+ (1998); Krizhevsky+ (2012)
Transformation Aware Dictionaries
Sources of Redundancy in Patch Dictionaries
Same pattern, different position
Same pattern, opposite polarity (x2 redundancy)
Same pattern, different contrast
How to build less redundant dictionaries?
The Epitome Data Structure
[Figure labels: patch; epitome.] Epitomes: Jojic, Frey, Kannan, ICCV-03
Generating Patches from an Epitome
A single epitome is essentially a large collection of translated copies of a visual pattern.
Position and Appearance Transformations
Epitomic Image Matching
Epitomes: Jojic, Frey, Kannan, ICCV-03
Dictionary of Mini-Epitomes
Papandreou, Chen, Yuille, CVPR-14
Coding and Learning with Epitomic Dictionaries
Patch coding in epitomic dictionaries: an epitomic dictionary is equivalent to a standard dictionary with patches at all possible positions in the epitome.
Dictionary learning:
Variational inference on GMM model (Jojic+ '01)
Sparse dictionary learning (Aharon, Elad '08; Mairal+ '11)
Epitomic K-Means (Papandreou+ '14)
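The equivalence between an epitome and a standard dictionary can be made concrete with a small sketch, assuming a single square 2-D epitome stored as a NumPy array and square patches. The function name `epitome_to_dictionary` is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def epitome_to_dictionary(epitome, patch_size):
    """Expand an epitome into a standard dictionary.

    Every patch_size x patch_size sub-window of the epitome becomes one
    column (atom) of the returned D x P matrix, with D = patch_size ** 2.
    """
    windows = sliding_window_view(epitome, (patch_size, patch_size))
    return windows.reshape(-1, patch_size * patch_size).T
```

For a 12x12 epitome and 8x8 patches this yields 25 atoms of dimension 64, all of which share their parameters through the 144 epitome pixels.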
K-Means for the Mini-Epitome Model
Generative model:
1. Select mini-epitome k according to its prior probability.
2. Select position p within the epitome uniformly.
3. Generate the patch.
Epitomic K-means (hard EM):
Epitomic matching (hard assignment)
Epitome update
Diverse initialization with K-means++ (optional)
Papandreou, Chen, Yuille, CVPR-14
K-Means for the Mini-Epitome Model
Generative model:
1. Select mini-epitome k according to its prior probability.
2. Select position p within the epitome uniformly.
3. Generate the patch.
Max likelihood, hard EM: essentially an epitomic adaptation of K-Means. Faster convergence using diverse initialization of mini-epitomes by an epitomic adaptation of K-Means++.
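A rough sketch of the hard-EM loop, under simplifying assumptions that are not from the slides: patches are matched by plain squared error with no gain, mean, or whitening, the prior over mini-epitomes is ignored, and initialization is random rather than K-means++. All function and parameter names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def epitomic_match(patch, epitomes):
    """Best (epitome index, position, squared error) for one patch."""
    h, w = patch.shape
    windows = sliding_window_view(epitomes, (h, w), axis=(1, 2))
    err = ((windows - patch) ** 2).sum(axis=(-2, -1))
    k, i, j = np.unravel_index(np.argmin(err), err.shape)
    return int(k), (int(i), int(j)), float(err[k, i, j])

def epitomic_kmeans(patches, K, epitome_size, n_iter=10, seed=0):
    """Hard-EM for K mini-epitomes; returns epitomes and per-iteration error."""
    N, h, w = patches.shape
    rng = np.random.default_rng(seed)
    E = rng.standard_normal((K, epitome_size, epitome_size))
    history = []
    for _ in range(n_iter):
        num = np.zeros_like(E)
        cnt = np.zeros_like(E)
        total = 0.0
        for x in patches:
            k, (i, j), e = epitomic_match(x, E)    # hard assignment
            num[k, i:i + h, j:j + w] += x
            cnt[k, i:i + h, j:j + w] += 1
            total += e
        history.append(total)
        # Epitome update: each pixel is the mean of the patch pixels mapped to it.
        E = np.where(cnt > 0, num / np.maximum(cnt, 1), E)
    return E, history
```

Both steps minimize the same squared-error objective, so the recorded error does not increase from one iteration to the next, mirroring the behavior of ordinary K-means.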
A Generic Mini-Epitome Dictionary
Epitomic dictionary: 256 mini-epitomes (16x16).
Non-epitomic dictionary: 1024 elements (8x8).
Both trained on 10,000 Pascal images.
Evaluation on Image Reconstruction
[Figure panels: original image; epitome reconstruction (PSNR: 29.2 dB); improvement over non-epitome.]
Evaluation on VOC-07 Image Classification
Max-Pooling vs. Epitomic Convolution
Deep Epitomic Convolutional Nets
[Figure: convolution + max-pooling vs. epitomic convolution.]
ImageNet top-5 error: 14.2 (max-pool), 13.6 (epitome). Papandreou arXiv-14
Epitomic Patch Matching
1. We have K mini-epitomes (say the patch size is 8x8 pixels and the mini-epitome size is 12x12 pixels).
2. For each patch in the image and each mini-epitome k = 1:K, find the patch at position p in the epitome which minimizes the reconstruction error (whitening omitted). There are (12-8+1)^2 = 25 candidate positions per epitome in this example.
3. Algorithms: exact search (GPU, <0.5 sec/image), ANN, or a dynamic programming algorithm.
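The exact search in step 2 can be sketched on the CPU as follows. This is a simplified illustration that assumes plain squared error between the patch and each epitome sub-window, with no gain, mean, or whitening terms; the function name and array layout are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def epitomic_match(patch, epitomes):
    """Exhaustive epitomic matching for a single patch.

    patch: (h, w) array; epitomes: (K, H, W) array.
    Returns the best epitome index, the (row, col) position, and the error.
    """
    h, w = patch.shape
    # windows[k, i, j] is the h x w sub-window of epitome k at position (i, j).
    windows = sliding_window_view(epitomes, (h, w), axis=(1, 2))
    err = ((windows - patch) ** 2).sum(axis=(-2, -1))   # shape (K, H-h+1, W-w+1)
    k, i, j = np.unravel_index(np.argmin(err), err.shape)
    return int(k), (int(i), int(j)), float(err[k, i, j])
```

With 12x12 mini-epitomes and 8x8 patches the error array has 5 x 5 = 25 entries per epitome, matching the count in the example.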
Epitomic Match vs. Max Pooling
1. Position search is equivalent to epitomic convolution.
2. Epitomic convolution is an image-centric alternative to convolution followed by "max-pooling":
* It is much easier to define image probability models based on epitomic convolution (EC) than on max-pooling (MP).
* Evaluation in discriminative tasks is underway.
Recap: Transformation Aware Dictionaries
Reduce dictionary redundancy by explicitly modeling nuisance variables.
Compact dictionaries for image reconstruction and recognition.
Epitomes as a translation-aware data structure.
Epitomic convolution as an alternative to a pair of consecutive convolution and max-pooling layers in deep networks.