Compressed Sensing
Compressive Sampling
Daniel Weller
June 30, 2015
Introduction to Sampling
Sampling is composed of two operations:
  Discretization: continuous-time to discrete-time
  Quantization: discrete-time to digital
Discretization can be written as: $y[m] = \int x(t)\, s_m(t)\, dt$
Usually, we assume shift invariance: $s_m(t) = s(t - t_m)$
Question: What is s(-t) in classical DSP?
[Figure: x(t) is filtered by s(-t) and sampled at t = t_m to give y[m].]
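Sampling and Reconstruction
Given a set of samples {y[m]}, how do we reconstruct x(t)?
Classical approach:
  Sampling period T
  Interpolate with a filter h(t)
Another interpretation: {h(t-nT)} is a basis for x_r(t).
Inverse problem form with coefficients {α_n}: $x_r(t) = \sum_n \alpha_n h(t - nT)$
Substituting x_r(t) into the sampling model gives a linear system $y = A\alpha$ with M samples and N coefficients.
We will mainly consider the finite M, N case.
[Figure: y[m] weights an impulse train, which is filtered by h(t) to give x_r(t).]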
Classical Approximation Theory
Least squares problem: $\min_\alpha \|y - A\alpha\|_2^2$; normal equations are $A'A\alpha = A'y$
For A'A to be positive definite, A must have at least N rows: need M ≥ N
For infinite length signals, Shannon sampling theory: bandlimited signals
Solution minimizes error power
How about signals with structure?
How many samples do we need to reconstruct a sine wave?
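As a small numerical illustration of the least squares step, the sketch below solves the normal equations $A'A\alpha = A'y$ for a random M×N matrix and shows that A'A loses rank when M < N. The matrix, the coefficient values, and all variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 8, 3                          # M >= N, so A'A can be positive definite
A = rng.standard_normal((M, N))      # hypothetical M x N system matrix
alpha_true = np.array([1.0, -2.0, 0.5])
y = A @ alpha_true                   # noiseless samples

# Solve the normal equations A'A alpha = A'y directly.
alpha_ne = np.linalg.solve(A.T @ A, A.T @ y)

# np.linalg.lstsq solves the same least squares problem without forming A'A.
alpha_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

# With fewer rows than columns (M = 2 < N = 3), A'A is rank deficient.
A_short = A[:2, :]
rank_short = np.linalg.matrix_rank(A_short.T @ A_short)
```

Forming A'A squares the condition number, so a solver such as `np.linalg.lstsq` is usually the safer choice in practice.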
Compression and Approximation
The more we know about a signal, the fewer samples we need to approximate it!
This basic idea underlies signal compression methods like MP3, JPEG2000, and others.
[Figure: Peppers (uncompressed); Peppers with JPEG compression.]
Compressible Signals
What kinds of structure are useful to us?
  Low dimensionality: $x = \Phi\alpha$, where Φ is N×K and K << N
  Union of subspaces: x lies in one of Φ1, Φ2, …, each of which is a subspace
  Sparsity: $X_K = \{x : \|x\|_0 \le K\}$ is the set of all K-sparse signals in the N-dimensional space (not the same as Grassmann…)
  Others like finite rate of innovation are also possible…
What signals are compressible?
A signal is K-sparse if $\|x\|_0 \le K$.
A signal is approximately K-sparse if $\sigma_K(x)_p = \min_{\hat{x} \in X_K} \|x - \hat{x}\|_p$ is small enough.
[Figure: Peppers (grayscale); Peppers (10% coefficients). 10% coefficients, 2% error.]
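A minimal sketch of measuring approximate sparsity, assuming the signal is already expressed in the basis where it is compressible: keep the K largest-magnitude coefficients and report the norm of what was discarded (the best K-term approximation error, written σ_K(x)_p here). The function names and the test vector are illustrative only.

```python
import numpy as np

def best_k_term(x, K):
    """Keep the K largest-magnitude entries of x and zero out the rest."""
    x = np.asarray(x, dtype=float)
    x_hat = np.zeros_like(x)
    if K > 0:
        keep = np.argsort(np.abs(x))[-K:]   # indices of the K largest |x[n]|
        x_hat[keep] = x[keep]
    return x_hat

def sigma_k(x, K, p=2):
    """p-norm error of the best K-term approximation of x."""
    x = np.asarray(x, dtype=float)
    return np.linalg.norm(x - best_k_term(x, K), ord=p)

x = np.array([5.0, -0.1, 0.0, 3.0, 0.05, -4.0])   # made-up coefficient vector
x3 = best_k_term(x, 3)                             # 3-sparse approximation
rel_err = sigma_k(x, 3) / np.linalg.norm(x)        # relative 2-norm error
```

For an image such as Peppers, the same thresholding would be applied to transform coefficients rather than to pixel values.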
Sparsity
Set of signals with p-norm ≤ K is called K-ball: $\{x : \|x\|_p \le K\}$
Suppose we have a measurement y1 = a1'x. Where is the x that minimizes the p-norm?
[Figure: panels for p = 1, p = 2, p = ½, p = ¼.]
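The single-measurement question can be checked numerically in two dimensions. The sketch below, with a made-up a1 and y1, compares the closed-form minimum 2-norm solution (the projection onto a1) with a minimum 1-norm solution (all weight on the coordinate where |a1| is largest). Both satisfy y1 = a1'x, but only the 1-norm solution is sparse.

```python
import numpy as np

a1 = np.array([1.0, 2.0])   # hypothetical measurement vector
y1 = 2.0                    # hypothetical measurement, y1 = a1' x

# p = 2: the minimum-norm solution is a scaled copy of a1, generally dense.
x_l2 = a1 * y1 / (a1 @ a1)

# p = 1: a minimizer puts all weight on the coordinate with the largest |a1[j]|.
j = int(np.argmax(np.abs(a1)))
x_l1 = np.zeros_like(a1)
x_l1[j] = y1 / a1[j]

norm1 = (np.abs(x_l1).sum(), np.abs(x_l2).sum())             # 1-norms of each
norm2 = (np.linalg.norm(x_l1), np.linalg.norm(x_l2))         # 2-norms of each
```

For p < 1 the ball is even more pointed along the axes, so the minimizer is sparse as well, but the problem is no longer convex.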
The Compressed Sensing Problem
Finite measurements y are generated by $y = Ax$ with an M×N sensing matrix A, where M < N.
If we know x is K-sparse (K << N), when is x determined uniquely by y?
Null space condition: $\mathcal{N}(A) \cap X_{2K} = \{0\}$
That is, x is determined uniquely when the null space of A contains no nonzero 2K-sparse vectors. Why?
More formal conditions follow.
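One way to answer the "Why?" is to suppose two K-sparse signals x and x' (x' is a name introduced here) produce the same measurements and look at their difference:

```latex
Ax = Ax' = y, \quad \|x\|_0 \le K, \quad \|x'\|_0 \le K
\;\Longrightarrow\;
A(x - x') = 0, \quad \|x - x'\|_0 \le 2K .
```

The difference x - x' is a 2K-sparse vector in the null space of A. If the only such vector is zero, then x = x', so y determines x uniquely. Conversely, any nonzero 2K-sparse null vector can be split into two distinct K-sparse signals with identical measurements.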
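A Spark for Compressed Sensing
Spark: the minimum number of columns of A that are linearly dependent
Theorem: every K-sparse x is uniquely determined by y = Ax iff spark(A) > 2K.
What if x is compressible instead?
We need to modify our condition to ensure the null space of A is not too compressible (the null space property).
This condition is related to the recovery guarantee, which bounds the reconstruction error by the best K-term approximation error of x.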
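For very small matrices the spark can be computed by brute force, which makes the uniqueness condition spark(A) > 2K concrete. The sketch below is illustrative only: the search is combinatorial in the number of columns, and returning N + 1 when all columns are independent is a convention chosen here (some texts use infinity).

```python
import numpy as np
from itertools import combinations

def spark(A, tol=1e-10):
    """Smallest number of linearly dependent columns of A (brute-force search)."""
    A = np.asarray(A, dtype=float)
    N = A.shape[1]
    for k in range(1, N + 1):
        for cols in combinations(range(N), k):
            # k columns are dependent exactly when their rank is below k.
            if np.linalg.matrix_rank(A[:, list(cols)], tol=tol) < k:
                return k
    return N + 1   # no dependent subset exists (convention chosen for this sketch)

# Hypothetical 2 x 3 example: any two columns are independent, all three are not.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
s = spark(A)   # spark(A) > 2K then holds only for K = 1
```

The exhaustive search is the reason spark is rarely computed in practice and why coherence and RIP are used as surrogates.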
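R.I.P., Compressed Sensing
We assumed y = Ax, exactly, with no noise.
The restricted isometry property (RIP) extends the idea to noisy recovery.
An isometry preserves the norm, so RIP states that A approximately preserves the norm across K-sparse x: $(1 - \delta_K)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_K)\|x\|_2^2$
RIP is necessary for stability with noise e: $\|\hat{x} - x\|_2 \le C\|e\|_2$, where $\hat{x}$ is reconstructed from y = Ax + e
Here, $\delta_{2K} \ge 1 - 1/C^2$.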
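For tiny problems the restricted isometry constant can be found exhaustively, assuming the usual definition $(1 - \delta_K)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_K)\|x\|_2^2$ for all K-sparse x. The sketch below checks every support of size K and takes the worst deviation of the eigenvalues of the Gram submatrix from 1. The function name and the example matrix are assumptions for illustration, and the cost grows combinatorially with N.

```python
import numpy as np
from itertools import combinations

def rip_constant(A, K):
    """Smallest delta such that A satisfies RIP of order K (exhaustive search)."""
    A = np.asarray(A, dtype=float)
    N = A.shape[1]
    delta = 0.0
    for cols in combinations(range(N), K):
        sub = A[:, list(cols)]
        eig = np.linalg.eigvalsh(sub.T @ sub)   # eigenvalues in ascending order
        # ||Ax||^2 / ||x||^2 on this support ranges over [eig[0], eig[-1]].
        delta = max(delta, abs(eig[-1] - 1.0), abs(1.0 - eig[0]))
    return delta

rng = np.random.default_rng(0)
M, N = 10, 12
A = rng.standard_normal((M, N)) / np.sqrt(M)   # iid Gaussian, scaled so E||Ax||^2 = ||x||^2
delta_2 = rip_constant(A, 2)
```

Only supports of size exactly K need to be checked, because the extreme eigenvalues over smaller supports are bracketed by those of the larger ones.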
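RIP Measurement Bounds
If A satisfies RIP-2K with constant δ2K ≤ ½, then $M \ge c\,K \log(N/K)$ for some constant c > 0.
Proof sketch: First, construct a subset X of X_K whose points are bounded in norm and well separated. Since A satisfies RIP-2K, the images Ax of these points are also bounded and well separated. These bounds allow us to state the result via sphere-packing.
Details are in MA Davenport et al., Compressed Sensing: Theory and Applications, YC Eldar and G Kutyniok, eds., Cambridge, 2015, pp. 45-47.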
Mutual (In-)Coherence
Coherence of a matrix A is the largest normalized inner product, in magnitude, between two different columns: $\mu(A) = \max_{i \ne j} |a_i'a_j| / (\|a_i\|_2 \|a_j\|_2)$
It is possible to show spark(A) ≥ 1 + 1/μ(A).
Thus, we have a coherence bound for exact recovery of K-sparse signals: $K < \tfrac{1}{2}(1 + 1/\mu(A))$
Also, A with unit-norm columns satisfies RIP-K with δK = (K-1)μ(A) for all K < 1/μ(A).
Thus, the less coherent A is, the better its RIP constant is.
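Unlike spark, coherence is cheap to compute from the Gram matrix of the normalized columns. The sketch below assumes the normalized definition of μ(A) and evaluates it for the Mercedes-Benz frame (three unit vectors in the plane, 120 degrees apart), together with the resulting bound K < ½(1 + 1/μ(A)). The function and variable names are illustrative.

```python
import numpy as np

def mutual_coherence(A):
    """Largest |<a_i, a_j>| over distinct columns, after normalizing each column."""
    A = np.asarray(A, dtype=float)
    An = A / np.linalg.norm(A, axis=0)     # unit-norm columns
    G = np.abs(An.T @ An)                  # magnitudes of all pairwise inner products
    np.fill_diagonal(G, 0.0)               # ignore each column paired with itself
    return G.max()

# Mercedes-Benz frame: three unit vectors in R^2 spaced 120 degrees apart.
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
A_mb = np.vstack([np.cos(angles), np.sin(angles)])

mu = mutual_coherence(A_mb)
k_bound = 0.5 * (1 + 1 / mu)               # exact recovery is guaranteed for K < k_bound
```

Here every pair of columns meets at the same angle, so the coherence equals |cos 120°| and the bound only guarantees recovery of 1-sparse signals.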
Matrices for CS
Some deterministic matrices have properties like minimum coherence or maximum spark.
  Equiangular tight frames (ETF)
  Vandermonde matrices
Random matrices can satisfy these properties without the limitations of deterministic constructions.
  iid Gaussian and Bernoulli matrices satisfy RIP with high probability.
Such constructions are universal, in that RIP is satisfied irrespective of the signal basis.
[Figure: Mercedes-Benz ETF.]
Matrices for CS
For image processing, an iid random A may be extremely large.
Instead, we can randomly subsample a deterministic sensing matrix.
  The Fourier transform is used for MRI and some optical systems.
The coherence and RIP bounds are not quite as good, and not universal.
Some work (e.g., SparseMRI) empirically verifies the incoherence of a random sensing matrix.
We can also construct dictionaries from data.
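A randomly subsampled Fourier matrix can be built explicitly for small N to check its incoherence empirically. The sketch below keeps M randomly chosen rows of the unitary DFT matrix and computes the coherence of the result. The sizes and seed are arbitrary, and in practice A would be applied through an FFT rather than stored.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 64, 16                                   # hypothetical signal length and sample count

F = np.fft.fft(np.eye(N)) / np.sqrt(N)          # unitary N x N DFT matrix
rows = np.sort(rng.choice(N, size=M, replace=False))
A = F[rows, :]                                  # keep M randomly chosen frequencies

# Coherence of the subsampled matrix (columns normalized first).
An = A / np.linalg.norm(A, axis=0)
G = np.abs(An.conj().T @ An)
np.fill_diagonal(G, 0.0)
mu = G.max()
```

Repeating this over many random row selections gives an empirical picture of how incoherent the sampling pattern tends to be.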
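CS Reconstruction Formulation
Consider the sparse recovery problem:
  Exact: $\min_x \|x\|_0$ subject to $y = Ax$
  Noisy: $\min_x \|x\|_0$ subject to $\|y - Ax\|_2 \le \varepsilon$
The convex relaxation yields a sparse solution: $\min_x \|x\|_1$ subject to $\|y - Ax\|_2 \le \varepsilon$
An unconstrained version also is popular: $\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$
The matrix A may include a dictionary Φ.
We will describe several standard approaches.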
(Orthogonal) Matching Pursuit
At each step, include the next column of A that best correlates with the residual. The columns of A must have unit norm.
The ith step chooses an atom λ_i according to: $\lambda_i = \arg\max_j |a_j' r_{i-1}|$, with $r_0 = y$
If A_i is the collection of atoms λ_1, …, λ_i, the new signal is then $x_i = \arg\min_x \|y - A_i x\|_2$
The new residual is $r_i = y - A_i x_i$
This residual is orthogonal to the columns of A_i.
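The steps of orthogonal matching pursuit can be sketched in a few lines, assuming unit-norm columns, a known sparsity level K, and a real-valued A. The test matrix (an identity next to a scaled 16×16 Hadamard matrix, which has coherence 1/4) and the sparse signal are chosen here for illustration.

```python
import numpy as np

def omp(A, y, K, tol=1e-12):
    """Orthogonal matching pursuit; assumes the columns of A have unit norm."""
    support = []                        # selected atoms lambda_1, ..., lambda_i
    coef = np.zeros(0)
    r = np.asarray(y, dtype=float).copy()   # r_0 = y
    for _ in range(K):
        if np.linalg.norm(r) <= tol:    # nothing left to explain
            break
        lam = int(np.argmax(np.abs(A.T @ r)))   # atom best correlated with residual
        support.append(lam)
        # Least squares fit of y on the selected atoms A_i.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ coef    # new residual, orthogonal to A_i
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x, r, support

# Hypothetical test problem: A = [I, H/4] with a 16 x 16 Hadamard matrix H.
H = np.array([[1.0]])
for _ in range(4):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
A = np.hstack([np.eye(16), H / 4.0])

x_true = np.zeros(32)
x_true[3], x_true[20] = 2.0, -1.5       # 2-sparse signal
y = A @ x_true
x_hat, r, support = omp(A, y, K=2)
```

Plain matching pursuit would skip the least squares refit and only subtract the newly chosen atom's contribution from the residual.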
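Iterative Hard Thresholding
Another approach repeatedly thresholds the non-sparse coefficients.
At each step, the normal residual A'(y - Ax_{i-1}) is added to x_{i-1}, and thresholded to form x_i: $x_i = H_K(x_{i-1} + A'(y - Ax_{i-1}))$, where H_K keeps the K largest-magnitude entries and zeroes the rest
We can also view IHT as thresholding the minimizer of a separable quadratic approximation to $\|y - Ax\|_2^2$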
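A minimal sketch of the IHT iteration, assuming the update $x_i = H_K(x_{i-1} + A'(y - Ax_{i-1}))$ with unit step size and a matrix scaled so that its spectral norm is below 1 (under that assumption the data-fit cost does not increase from one iteration to the next). The problem sizes, seed, and names are illustrative, and recovery of the true signal is not guaranteed in general.

```python
import numpy as np

def hard_threshold(z, K):
    """H_K: keep the K largest-magnitude entries of z, zero the rest."""
    out = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-K:]
    out[keep] = z[keep]
    return out

def iht(A, y, K, n_iter=100):
    """Iterative hard thresholding with unit step; assumes ||A||_2 <= 1."""
    x = np.zeros(A.shape[1])
    costs = []
    for _ in range(n_iter):
        x = hard_threshold(x + A.T @ (y - A @ x), K)   # gradient step, then threshold
        costs.append(np.linalg.norm(y - A @ x) ** 2)   # track the data-fit cost
    return x, costs

rng = np.random.default_rng(0)
G = rng.standard_normal((20, 40))
A = G / (1.01 * np.linalg.norm(G, 2))   # rescale so the spectral norm is below 1
x_true = np.zeros(40)
x_true[[4, 17, 30]] = [1.0, -2.0, 1.5]
y = A @ x_true
x_hat, costs = iht(A, y, K=3)
```

When A is not scaled this way, a step size (or a normalized variant of the update) is needed to keep the iteration stable.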
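Convex Optimization
The unconstrained version is $\min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$
Iterative soft thresholding is similar to IHT: $x_i = S_\lambda(x_{i-1} + A'(y - Ax_{i-1}))$, where $S_\lambda(z) = \mathrm{sign}(z)\max(|z| - \lambda, 0)$ elementwise
Split Bregman iteration also is popular; this method is the same as ADMM (1 split).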
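Iterative soft thresholding differs from IHT only in the thresholding rule. The sketch below assumes the objective $\tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$, a unit step size, and a matrix scaled to spectral norm below 1, in which case the objective does not increase between iterations. The parameter value `lam=0.01` and the test problem are arbitrary choices for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink each entry of z toward zero by t; entries within [-t, t] become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, y, lam, n_iter=200):
    """Iterative soft thresholding with unit step; assumes ||A||_2 <= 1."""
    x = np.zeros(A.shape[1])
    costs = []
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (y - A @ x), lam)   # gradient step, then shrink
        costs.append(0.5 * np.linalg.norm(y - A @ x) ** 2 + lam * np.abs(x).sum())
    return x, costs

rng = np.random.default_rng(0)
G = rng.standard_normal((20, 40))
A = G / (1.01 * np.linalg.norm(G, 2))   # rescale so the spectral norm is below 1
x_true = np.zeros(40)
x_true[[4, 17, 30]] = [1.0, -2.0, 1.5]
y = A @ x_true
x_hat, costs = ista(A, y, lam=0.01)
```

Because the soft threshold shrinks the retained coefficients as well, the solution is biased toward zero by an amount that depends on λ.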
Convex Optimization
The constrained version (BPDN) is $\min_x \|x\|_1$ subject to $\|y - Ax\|_2 \le \varepsilon$
When ε = 0 and x is real, the problem is a linear program (LP).
SPGL1 solves the related LASSO problem, $\min_x \|y - Ax\|_2$ subject to $\|x\|_1 \le \tau$, and, by finding the τ that maps to ε, solves BPDN.
SPGL1 uses a linesearch-based projected gradient approach to solve the LASSO.
Details are in E van den Berg and MP Friedlander, SIAM J. Sci. Comput., 31(2), pp. 890-912, 2008.
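To see why the ε = 0, real-valued case is a linear program, one standard reformulation splits x into nonnegative parts. The variables u and v are introduced here for illustration and do not appear in the slides.

```latex
\min_{x} \|x\|_1 \ \ \text{s.t.}\ \ Ax = y
\quad\Longleftrightarrow\quad
\min_{u \ge 0,\ v \ge 0} \mathbf{1}'(u + v) \ \ \text{s.t.}\ \ A(u - v) = y,
\qquad x = u - v .
```

At an optimum, u and v are never both nonzero in the same entry, so 1'(u + v) equals the 1-norm of x, and the objective and constraints are all linear in (u, v).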