Sparse and Overcomplete Data Representation
Michael Elad, The CS Department, The Technion – Israel Institute of Technology, Haifa 32000, Israel
Israel Statistical Association 2005 Annual Meeting, Tel-Aviv University (Dan David bldg.), May 17th, 2005
Agenda
1. A Visit to Sparseland – Motivating Sparsity & Overcompleteness
2. Answering the 4 Questions – How & why should this work?
Data Synthesis in Sparseland
A fixed dictionary D of size N×K (with K > N) is given. Every column in D is a prototype data vector (an atom). A sparse & random vector α of length K is generated with few non-zeros in random locations and with random values. The data vector is obtained by multiplying: x = Dα.
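To make the synthesis concrete, here is a minimal sketch of the Sparseland generation process in Python; the specific sizes (N, K, the sparsity level S) and the Gaussian atoms are illustrative assumptions, not values from the talk.

```python
# A minimal sketch of Sparseland data synthesis (illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)

N, K, S = 20, 50, 3                  # signal length, number of atoms, sparsity level
D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)       # normalize the atoms (columns) to unit length

# A sparse & random vector: S non-zeros in random locations, random values
alpha = np.zeros(K)
support = rng.choice(K, size=S, replace=False)
alpha[support] = rng.standard_normal(S)

x = D @ alpha                        # the generated Sparseland data vector
```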
Sparseland Data is Special
Simple: every generated vector x = Dα is built as a linear combination of few atoms from our dictionary D.
Rich: a general model – the obtained vectors form a special type of mixture-of-Gaussians (or Laplacians).
Transforms in Sparseland
Assume that x is known to emerge from the model M. We desire a transform offering simplicity, independence, and expressiveness. How about: "Given x, find the α that generated it in M"?
Difficulties with the Transform
The generation direction (multiply a sparse & random vector by D) is easy; inverting it raises 4 major questions:
1. Is the recovered α the one that generated x? Under which conditions?
2. Are there practical ways to get α?
3. How effective are those ways?
4. How would we get D?
Why Is It Interesting?
Several recent trends in signal/image processing are worth looking at:
- From JPEG to JPEG2000: from the (L2-norm) KLT to wavelets and non-linear approximation – Sparsity.
- From Wiener to robust restoration: from the L2-norm (Fourier) to L1 (e.g., TV, Beltrami, wavelet shrinkage …) – Sparsity.
- From unitary to richer representations: frames, shift-invariance, bilateral, steerable, curvelets – Overcompleteness.
- Approximation theory: non-linear approximation – Sparsity & Overcompleteness.
- ICA and related models – Independence and Sparsity.
Sparseland is HERE.
Agenda
1. A Visit to Sparseland – Motivating Sparsity & Overcompleteness
2. Answering the 4 Questions – How & why should this work?
Question 1 – Uniqueness?
Suppose we can solve exactly the problem (P0): min ||α||_0 subject to Dα = x. Why should we necessarily get back the α that generated x? It might happen that the solution α̂ ≠ α, since another representation, just as sparse or even sparser, could also satisfy Dα̂ = x.
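For intuition on what "solving (P0) exactly" entails, a brute-force solver would enumerate supports of growing size, which is combinatorial in K. The sketch below (the function name and tolerance are mine, for illustration only) is feasible only for tiny problems.

```python
# Brute-force (P0) solver: exhaustively search over supports of size 1, 2, ...
# Exponential in K -- shown only to illustrate why (P0) is hard in general.
from itertools import combinations
import numpy as np

def solve_p0(D, x, max_sparsity, tol=1e-10):
    K = D.shape[1]
    for s in range(1, max_sparsity + 1):
        for support in combinations(range(K), s):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(D[:, cols], x, rcond=None)
            if np.linalg.norm(x - D[:, cols] @ coef) < tol:  # exact fit found
                alpha = np.zeros(K)
                alpha[cols] = coef
                return alpha
    return None  # no representation with at most max_sparsity atoms
```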
Matrix "Spark" – Donoho & Elad ('02)
Definition: Given a matrix D, σ = Spark{D} is the smallest number of columns from D that are linearly dependent.
Example: a matrix can have Spark = 3 while its Rank = 4 – unlike the rank, which looks at the largest independent set of columns, the spark looks at the smallest dependent one.
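Since the spark is defined combinatorially, it can be computed by exhaustive search only for tiny matrices. The following hypothetical helper (assuming numpy) illustrates the definition directly.

```python
# Toy computation of Spark{D}: find the smallest linearly dependent column set.
# Combinatorial cost -- feasible only for very small matrices.
from itertools import combinations
import numpy as np

def spark(D, tol=1e-10):
    K = D.shape[1]
    for s in range(1, K + 1):
        for cols in combinations(range(K), s):
            # s columns are linearly dependent iff their submatrix is rank-deficient
            if np.linalg.matrix_rank(D[:, list(cols)], tol=tol) < s:
                return s
    return K + 1  # full column rank: no dependent subset exists
```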
Uniqueness Rule – Donoho & Elad ('02)
Suppose the problem (P0) has been solved somehow. If we found a representation that satisfies ||α||_0 < Spark{D}/2, then necessarily it is unique (the sparsest).
This result implies that if M generates vectors using "sparse enough" α, the solution of (P0) will find it exactly.
Question 2 – Practical P0 Solver?
Are there reasonable (practical) ways to find the sparse α behind x = Dα?
Matching Pursuit (MP) – Mallat & Zhang (1993)
MP is a greedy algorithm that finds one atom at a time. Step 1: find the single atom that best matches the signal. Next steps: given the previously found atoms, find the next one that best fits the residual. The Orthogonal MP (OMP) is an improved version that re-evaluates all the coefficients after each round, as sketched below.
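A minimal OMP sketch in Python (assuming numpy and unit-norm atoms; the function signature is an illustrative choice) makes the greedy selection and the coefficient re-evaluation explicit.

```python
# Orthogonal Matching Pursuit: greedy atom selection + least-squares refit.
import numpy as np

def omp(D, x, n_atoms):
    residual = x.copy()
    support = []
    for _ in range(n_atoms):
        # Greedy step: pick the atom best matching the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Orthogonal step: re-evaluate ALL coefficients on the chosen support
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha
```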
Basis Pursuit (BP) – Chen, Donoho, & Saunders (1995)
Instead of solving (P0): min ||α||_0 s.t. Dα = x, solve (P1): min ||α||_1 s.t. Dα = x. The newly defined problem is convex and has a Linear Programming structure, so very efficient solvers can be deployed:
- Interior point methods [Chen, Donoho, & Saunders ('95)],
- Sequential shrinkage for a union of ortho-bases [Bruce et al. ('98)],
- Methods based on shrinkage, when computing Dα and D^T x is fast [Elad ('05)].
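To illustrate the LP structure, here is a sketch casting (P1) as a linear program via the standard split α = u − v with u, v ≥ 0, using scipy's generic linprog (not one of the specialized solvers cited above).

```python
# Basis Pursuit as a linear program: min 1^T(u+v) s.t. [D, -D][u; v] = x, u, v >= 0.
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, x):
    N, K = D.shape
    c = np.ones(2 * K)                       # objective: sum(u) + sum(v) = ||alpha||_1
    A_eq = np.hstack([D, -D])                # equality constraint D(u - v) = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * K))
    u, v = res.x[:K], res.x[K:]
    return u - v                             # recover alpha from the split
```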
Question 3 – Approximation Quality?
How effective are MP/BP in finding the sparse α?
Mutual Coherence
Assume normalized columns and compute the Gram matrix D^T D. The Mutual Coherence µ is the largest entry in absolute value outside the main diagonal of D^T D. The Mutual Coherence is a property of the dictionary (just like the Spark). The smaller it is, the better the dictionary.
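Computing µ is straightforward; a short sketch assuming numpy:

```python
# Mutual Coherence: largest off-diagonal entry of |D^T D| for unit-norm columns.
import numpy as np

def mutual_coherence(D):
    D = D / np.linalg.norm(D, axis=0)   # ensure normalized columns
    G = np.abs(D.T @ D)                 # Gram matrix in absolute value
    np.fill_diagonal(G, 0.0)            # ignore the main diagonal (all ones)
    return G.max()
```

From this quantity, the equivalence bound of the next slide, (1 + 1/µ)/2, can be evaluated directly for any given dictionary.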
BP and MP Equivalence – Donoho & Elad ('02), Gribonval & Nielsen ('03), Tropp ('03), Temlyakov ('03)
Given a vector x with a representation x = Dα, and assuming that ||α||_0 < (1 + 1/µ)/2, BP and MP are guaranteed to find the sparsest solution.
MP is typically inferior to BP! The above result corresponds to the worst case. Average-performance results are available too, showing much better bounds [Donoho ('04), Candes et al. ('04), Elad and Zibulevsky ('04)].
Question 4 – Finding D?
Given P examples generated by multiplying sparse vectors by a fixed dictionary D of size [N×K]:
- Is D unique? (Yes.)
- How to find D? Train!! The K-SVD algorithm.
Training: The Objective
Gather the P examples as the columns of a matrix X, and seek a dictionary D and a coefficient matrix A minimizing ||X − DA||_F^2, such that each example is a linear combination of atoms from D and each example has a sparse representation with no more than L atoms.
The K-SVD Algorithm – Aharon, Elad, & Bruckstein ('04)
1. Initialize D.
2. Sparse coding: use MP or BP to represent every example in X.
3. Dictionary update: column-by-column by an SVD computation.
Repeat steps 2–3; a compact sketch follows.
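A compact K-SVD sketch, reusing the omp() helper above; the simplifications here (fixed iteration count, no replacement of unused atoms) are mine and not part of the original algorithm.

```python
# K-SVD sketch: alternate sparse coding (OMP) and per-atom SVD updates.
import numpy as np

def ksvd(X, K, L, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    N, P = X.shape
    D = rng.standard_normal((N, K))
    D /= np.linalg.norm(D, axis=0)                 # initialize with normalized atoms
    for _ in range(n_iter):
        # Sparse coding stage: represent every example with at most L atoms
        A = np.column_stack([omp(D, X[:, i], L) for i in range(P)])
        # Dictionary update stage: column-by-column via a rank-1 SVD
        for k in range(K):
            users = np.nonzero(A[k, :])[0]         # examples that use atom k
            if users.size == 0:
                continue                            # simplification: skip unused atoms
            # Residual of the users, excluding atom k's own contribution
            E = X[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                      # best rank-1 update of the atom
            A[k, users] = s[0] * Vt[0, :]          # and of its coefficients
    return D, A
```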
Today We Discussed
1. A Visit to Sparseland – Motivating Sparsity & Overcompleteness
2. Answering the 4 Questions – How & why should this work?
Summary
Sparsity and overcompleteness are important ideas that can be used in designing better tools in data/signal/image processing, but there are difficulties in using them! We are working on resolving those difficulties: performance of pursuit algorithms, speedup of those methods, training the dictionary, demonstrating applications, …
The dream? Future transforms and regularizations will be data-driven, non-linear, overcomplete, and promoting sparsity.