Compressed Sensing in Multimedia Coding/Processing Trac D. Tran ECE Department The Johns Hopkins University Baltimore, MD 21218
Outline. Compressed Sensing: Quick Overview: motivations and toy examples; incoherent bases and the Restricted Isometry Property; decoding strategy (L0 versus L1 versus L2); Basis Pursuit and Matching Pursuit. Compressed Sensing in Image/Video Processing: 2D Separable Measurement Ensemble (SME); face recognition; distributed compressed video sensing (DISCOS); layered compressed sensing for robust video transmission.
Compressed Sensing History. Emmanuel Candès and Terence Tao, "Decoding by linear programming," IEEE Trans. on Information Theory, 51(12), pp. 4203-4215, Dec. 2005. Emmanuel Candès, Justin Romberg, and Terence Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Trans. on Information Theory, 52(2), pp. 489-509, Feb. 2006. David Donoho, "Compressed sensing," IEEE Trans. on Information Theory, 52(4), pp. 1289-1306, Apr. 2006. Emmanuel Candès and Michael Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, 25(2), pp. 21-30, Mar. 2008.
Traditional Data Acquisition: Sampling. Shannon Sampling Theorem: for a band-limited signal x(t) with maximum frequency f_max to be reconstructed perfectly, it must be sampled at a rate f_s >= 2 f_max (the Nyquist rate).
Traditional Compression Paradigm. Sample, then compress, then transmit/store, then receive and decompress (MP3, JPEG, JPEG2000, MPEG...). Acquire all N samples, then keep only the largest transform coefficients. Sample first and then worry about compression later!
Sparse Signals. A signal is expanded over basis functions, and only the largest transform coefficients matter. Digital signals in practice are often sparse: audio (MP3, AAC, ~10:1 compression), images (JPEG, JPEG2000, ~20:1 compression), video sequences (MPEG-2, MPEG-4, ~40:1 compression).
Sparse Signals II. An N-pixel image x can be written as x = Ψα, where the columns of Ψ are basis functions and α collects the transform coefficients. If α has only K nonzero entries, x is a K-sparse signal in the Ψ domain even though x itself lives in a non-sparse domain.
Definition & Notation. N = the length of the signal x. K = the sparsity level of x (x is called K-sparse). M = the number of measurements (samples) taken at the encoder.
Compressed Sensing Framework. Encoding: obtain M compressed measurements y from a linear projection onto an incoherent basis, y = Φx, where the sensing matrix Φ is M x N with M << N. Decoding: reconstruct x from the measurements y via nonlinear optimization with a sparsity prior (x, in its sparsifying domain, has only K nonzero entries).
At Encoder: Signal Sensing. Each measurement in y contains a little information about every sample of x. y is not sparse; it looks i.i.d. Random projection works well! The sensing and sparsifying matrices must be incoherent.
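The encoding step above can be sketched in a few lines of NumPy. This is a minimal sketch with arbitrary sizes (N = 256, M = 64, K = 8, all assumptions for the demo): a K-sparse signal is sensed by a random Gaussian matrix, and the resulting measurement vector is dense even though the signal is sparse.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 256, 64, 8

# Build a K-sparse signal x (sparse in the identity basis for simplicity)
x = np.zeros(N)
support = rng.choice(N, K, replace=False)
x[support] = rng.standard_normal(K)

# Random Gaussian sensing matrix, incoherent with the identity basis w.h.p.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Each measurement is a random weighted sum of ALL samples of x
y = Phi @ x
```

Every entry of `y` mixes contributions from every nonzero sample of `x`, which is why `y` itself looks like i.i.d. noise rather than a sparse vector.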
At Decoder: Signal Reconstruction. Recover x from the set of measurements y. Without the sparseness assumption, the problem is ill-posed. With the sparseness assumption, the L0-norm minimization problem is well-posed but computationally intractable. With the sparseness assumption, the L1-norm minimization can be solved via linear programming: Basis Pursuit!
Incoherent Bases: Definition. Suppose the signal is sparse in an orthonormal transform domain Ψ, and we take M measurements with an orthonormal sensing matrix Φ. Definition: the coherence between Φ and Ψ is μ(Φ, Ψ) = sqrt(N) · max_{i,j} |<φ_i, ψ_j>|, the largest correlation between any row of Φ and any column of Ψ.
Incoherent Bases: Properties. Bound on the coherence: 1 <= μ(Φ, Ψ) <= sqrt(N). When μ is small, we call the two bases incoherent. Intuition: when the two bases are incoherent, the entries of the matrix ΦΨ are spread out, so each measurement contains more information about the signal; we hope for a small μ. Some pairs of incoherent bases: the DFT and the identity matrix (μ = 1, maximally incoherent); a Gaussian (or Bernoulli) random matrix and any other basis.
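The coherence bound above is easy to check numerically. A minimal sketch (the `coherence` helper is a name introduced here for illustration): for the orthonormal DFT and identity bases, the maximal inner product is 1/sqrt(N), so μ = sqrt(N) · 1/sqrt(N) = 1, the smallest possible value.

```python
import numpy as np

N = 16
F = np.fft.fft(np.eye(N)) / np.sqrt(N)   # orthonormal DFT basis (rows)
I = np.eye(N)                            # identity sparsifying basis (columns)

def coherence(Phi, Psi):
    # mu(Phi, Psi) = sqrt(N) * max_{i,j} |<phi_i, psi_j>| for orthonormal bases
    G = np.abs(Phi.conj() @ Psi)
    return np.sqrt(Phi.shape[1]) * G.max()

mu = coherence(F, I)
```

Since every DFT row has entries of identical magnitude 1/sqrt(N), `mu` comes out exactly 1: each frequency measurement touches every pixel equally.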
Universality of Incoherent Bases. A random Gaussian white-noise basis is incoherent with any fixed orthonormal basis with high probability. If the signal is sparse in frequency, the sparsifying matrix Ψ is the DFT matrix. The product ΦΨ of a Gaussian matrix with any orthonormal basis is still Gaussian white noise!
Restricted Isometry Property. A sufficient condition for exact recovery: all sub-matrices formed from K columns of Φ are nearly orthogonal, i.e., (1 - δ_K) ||x||_2^2 <= ||Φx||_2^2 <= (1 + δ_K) ||x||_2^2 for every vector x with K non-zero entries.
L0- and L1-norm Reconstruction. L0-norm reconstruction takes advantage of the sparsity prior: min ||x||_0 subject to y = Φx finds the sparsest solution, but the problem requires combinatorial search and exhaustive computation. L1-norm reconstruction, the compressed sensing framework, solves min ||x||_1 subject to y = Φx. This is a convex optimization problem that can be solved with linear programming, and it also finds the sparsest x, which turns out to be the exact solution.
L2-norm Reconstruction: the classical approach, min ||x||_2 subject to y = Φx, has the closed-form solution x* = Φ^T (Φ Φ^T)^{-1} y. We find the x with the smallest energy. Unfortunately, this method almost never finds the sparsest, correct answer.
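The closed-form minimum-energy solution above can be evaluated directly, and the demo below (sizes and support chosen arbitrarily for illustration) shows its failure mode: the L2 solution satisfies the measurements exactly, yet spreads energy over far more than K entries.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 10, 30, 2
A = rng.standard_normal((M, N))

x0 = np.zeros(N)
x0[[3, 17]] = [1.0, -2.0]        # the true 2-sparse signal
y = A @ x0

# Closed-form minimum-L2-norm solution: x = A^T (A A^T)^{-1} y
x_l2 = A.T @ np.linalg.solve(A @ A.T, y)
```

`x_l2` is feasible (it reproduces `y` exactly) but dense: minimizing energy favors many small entries over a few large ones, which is precisely why L2 is the wrong prior for sparse signals.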
L1-Minimization. Problem: min ||x||_1 subject to y = Φx. Writing x = u - v with u, v >= 0 turns this into a standard LP. Many techniques are available: simplex, primal-dual interior-point, log-barrier...
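The u - v splitting described above can be handed to any LP solver. A minimal sketch assuming SciPy is available (problem sizes are arbitrary): minimize 1ᵀu + 1ᵀv subject to [A -A][u; v] = y with u, v >= 0, then read off x = u - v.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
M, N = 12, 24
A = rng.standard_normal((M, N))
x0 = np.zeros(N)
x0[[2, 9, 20]] = [1.5, -1.0, 0.7]   # 3-sparse ground truth
y = A @ x0

# min ||x||_1  s.t. Ax = y, with x = u - v and u, v >= 0:
#   min 1^T u + 1^T v   s.t.  [A  -A][u; v] = y,  u, v >= 0
c = np.ones(2 * N)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_l1 = res.x[:N] - res.x[N:]
```

The recovered `x_l1` is feasible and its L1 norm can be no larger than that of the true sparse `x0`, since `x0` is itself a feasible point of the LP.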
Why Is L1 Better Than L2? The constraint line y = Φx intersects the L2 ball (a circle) at a non-sparse point: a bad point. The same line intersects the L1 ball (a diamond) at a sparse point on a coordinate axis: a unique and exact solution.
CS Reconstruction: Matching Pursuit. Problem: recover x from y = Ax. Besides Basis Pursuit there are greedy pursuits, i.e., iterative algorithms: at each iteration, try to identify the columns of A (atoms) that are associated with the non-zero entries of x, the significant atoms.
Matching Pursuit. MP: at each iteration, MP attempts to identify the most significant atom; after K iterations, MP will hopefully identify the signal! 1. t = 1; set the residual vector r_0 = y and the selected index set Λ_0 = ∅. 2. Find the index λ_t yielding the maximal correlation with the residue: λ_t = argmax_j |<r_{t-1}, a_j>|. 3. Augment the selected index set: Λ_t = Λ_{t-1} ∪ {λ_t}. 4. Update the residue: r_t = r_{t-1} - <r_{t-1}, a_{λ_t}> a_{λ_t}. 5. t = t + 1, and stop when t = K.
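The numbered steps above translate almost line-for-line into NumPy. A minimal sketch assuming unit-norm atoms; the 8x8 orthonormal dictionary (built via QR purely for the demo) guarantees that K iterations recover the signal exactly.

```python
import numpy as np

def matching_pursuit(A, y, K):
    """Greedy MP: pick the atom most correlated with the residue each round."""
    x = np.zeros(A.shape[1])
    r = y.copy()
    for _ in range(K):
        c = A.T @ r                    # correlations with every atom
        j = int(np.argmax(np.abs(c)))  # most significant atom
        x[j] += c[j]                   # atoms assumed unit-norm
        r -= c[j] * A[:, j]            # update the residue
    return x

rng = np.random.default_rng(3)
# Orthonormal dictionary => MP recovers a K-sparse signal in exactly K steps
A, _ = np.linalg.qr(rng.standard_normal((8, 8)))
x0 = np.zeros(8)
x0[[1, 5]] = [2.0, -1.0]
y = A @ x0
x_hat = matching_pursuit(A, y, K=2)
```

With a general (non-orthogonal) dictionary the same atom can be revisited and the residue only shrinks gradually, which is what motivates the orthogonalized variant on the next slide.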
Orthogonal Matching Pursuit. OMP guarantees that the residue is orthogonal to all previously chosen atoms, so no atom is selected twice! 1. t = 1; set the residual vector r_0 = y and the index set Λ_0 = ∅. 2. Find the index λ_t that yields the maximal correlation with the residue. 3. Augment Λ_t = Λ_{t-1} ∪ {λ_t}. 4. Find the new signal estimate by solving the least-squares problem over the selected atoms. 5. Set the new residual: r_t = y - A_{Λ_t} x_t. 6. t = t + 1, and stop when t = K.
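The least-squares re-fit in Step 4 is what distinguishes OMP from MP; a minimal sketch (again using an orthonormal demo dictionary so that exact recovery is guaranteed):

```python
import numpy as np

def omp(A, y, K):
    """OMP: re-fit ALL chosen atoms by least squares at every iteration."""
    idx = []
    r = y.copy()
    for _ in range(K):
        j = int(np.argmax(np.abs(A.T @ r)))   # atom most correlated with residue
        idx.append(j)
        coef, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
        r = y - A[:, idx] @ coef              # residue orthogonal to chosen atoms
    x = np.zeros(A.shape[1])
    x[idx] = coef
    return x

rng = np.random.default_rng(4)
A, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # orthonormal demo dictionary
x0 = np.zeros(8)
x0[[0, 6]] = [1.0, 3.0]
y = A @ x0
x_hat = omp(A, y, K=2)
```

Because the residue is orthogonal to every selected column, the correlations of already-chosen atoms are exactly zero at the next iteration, so no atom is ever selected twice.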
Subspace Pursuit. SP pursues the entire subspace that the signal lives in at every iteration and adds a backtracking mechanism! 1. Initialization: select the K atoms most correlated with y. 2. Selected set: merge the current set with the K atoms most correlated with the residue, then backtrack to the best K. 3. Signal estimate: least-squares fit on the selected set. 4. Residue: subtract the estimate from y. 5. Go to Step 2; stop when the residue energy no longer decreases.
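The merge-then-backtrack loop above can be sketched compactly; this is a minimal sketch (iteration cap and tolerances are arbitrary choices, and the orthonormal demo dictionary makes the first estimate already exact):

```python
import numpy as np

def subspace_pursuit(A, y, K, max_iter=10):
    """SP: keep a K-atom candidate set, refine it with backtracking."""
    # Initialization: the K atoms most correlated with y
    T = np.argsort(np.abs(A.T @ y))[-K:]
    x_T, *_ = np.linalg.lstsq(A[:, T], y, rcond=None)
    r = y - A[:, T] @ x_T
    for _ in range(max_iter):
        if np.linalg.norm(r) < 1e-10:
            break
        # Merge with K new candidates, then backtrack to the best K
        cand = np.union1d(T, np.argsort(np.abs(A.T @ r))[-K:])
        b, *_ = np.linalg.lstsq(A[:, cand], y, rcond=None)
        T = cand[np.argsort(np.abs(b))[-K:]]
        x_T, *_ = np.linalg.lstsq(A[:, T], y, rcond=None)
        r_new = y - A[:, T] @ x_T
        if np.linalg.norm(r_new) >= np.linalg.norm(r):
            break                      # residue energy stopped decreasing
        r = r_new
    x = np.zeros(A.shape[1])
    x[T] = x_T
    return x

rng = np.random.default_rng(3)
A, _ = np.linalg.qr(rng.standard_normal((8, 8)))
x0 = np.zeros(8)
x0[[1, 5]] = [2.0, -1.0]
y = A @ x0
x_sp = subspace_pursuit(A, y, K=2)
```

Unlike MP/OMP, a wrongly chosen atom can be evicted in a later iteration, which is the backtracking advantage the slide refers to.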
How Many Measurements Are Enough? Theorem (Candès, Romberg, and Tao): suppose x has support T, and the M rows of the matrix F are selected uniformly at random from the N rows of the N x N DFT matrix. Then if M obeys M >= C · |T| · log N, minimizing the L1 norm recovers x exactly with extremely high probability. In practice, M on the order of a few times K is enough to perfectly recover x.
CS in Multimedia Coding/Processing. Practical compressed sensing and sparse signal processing: one-pixel camera; 2D separable measurement ensemble for image/video; face recognition; speech recognition; distributed compressed video sensing; layered compressed-sensing robust video transmission; video denoising; video super-resolution; multiple description coding; MRI applications.
One-Pixel Compressed Sensing Camera Courtesy of Richard Baraniuk & Kevin Kelly @ Rice
CS Analog-to-Digital Converter
Modulated Wideband Converter
Common Sensing Limitations. Treating every source signal as a 1D signal (performing the sensing operation on a vectorized signal) significantly increases complexity at both the encoder and the decoder, and it is inappropriate for some compressive imaging applications such as compressive image sensors: the physical structure of image sensor arrays is 2D; dense sensing matrices are costly to implement due to wide dynamic-range issues; and block-diagonal sensing matrices result in low performance due to incoherence degradation.
2D Separable Measurement Ensembles. Y = D1 F1 P1 S1 X S2 P2 F2 D2, where S1, S2 randomly flip the signs of rows and columns (entries in {-1, 0, 1}), P1, P2 randomly permute rows and columns, F1, F2 are block-diagonal fast transforms (WHT), and D1, D2 randomly subsample rows and columns.
2D Separable Measurement Ensembles. In the algorithm: randomly flip the signs of rows and columns of the source image; randomly permute rows and columns of the sign-flipped image; transform the randomized image by a Walsh-Hadamard block-diagonal matrix; randomly subsample rows and columns of the transformed image. All operations are performed on the rows and columns of the source image separately.
Underlying Principles. Preprocess the source image before subsampling its rows and columns: the preprocessing spreads out the energy along rows and columns, which guarantees (with high probability) energy preservation for a subset of measurements (a submatrix) when coupled with a suitable scale factor.
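The separable pipeline can be sketched in NumPy. This is a minimal sketch under simplifying assumptions: a single full-frame orthonormal WHT stands in for the block-diagonal transform, and the frame size and subsampling rate are arbitrary. Note that all operators act on rows and columns separately, never on the vectorized image.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of the Hadamard matrix, n a power of two
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

rng = np.random.default_rng(5)
n = 32
X = rng.standard_normal((n, n))     # stand-in for an image block

H = hadamard(n) / np.sqrt(n)        # orthonormal WHT

# Preprocessing: sign-flip rows/columns, permute rows/columns, transform
d1 = rng.choice([-1.0, 1.0], n)
d2 = rng.choice([-1.0, 1.0], n)
p1, p2 = rng.permutation(n), rng.permutation(n)
Z = H @ ((d1[:, None] * X * d2[None, :])[p1][:, p2]) @ H.T

# Measurements: random subsampling of rows and columns
m = n // 2
rows = rng.choice(n, m, replace=False)
cols = rng.choice(n, m, replace=False)
Y = Z[np.ix_(rows, cols)]
```

Because sign flips, permutations, and the orthonormal WHT are all energy-preserving, the transformed frame `Z` has exactly the energy of `X`, spread evenly, which is why a random submatrix of it captures a proportional share of the energy.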
Performance Comparison. 512x512 Lena; 1D non-separable ME (SRM [Do]) versus 2D separable ME using block size 32x32. Reconstruction from 25% measurements with the GPSR algorithm [Figueiredo]: (a) SRM, PSNR = 29.4 dB; (b) 2D-SME, PSNR = 28 dB.
Performance Comparison. 1024x1024 Man; 1D non-separable ME (SRM [Do]) versus 2D separable ME using block size 32x32. Reconstruction from 35% measurements with the GPSR algorithm [Figueiredo]: (a) SRM, PSNR = 29.3 dB; (b) 2D-SME, PSNR = 28 dB.
Application in Face Recognition Face-subspace model: faces under varying lighting and expression lie on a subspace A new test sample y of object i approximately lies in the linear span of the training samples associated with i The test sample y is then a sparse linear combination of all training samples Sparse representation encodes the membership i of y John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma, “Robust Face Recognition via Sparse Representation”, IEEE Trans. PAMI, Feb. 2009
Sparse Classification. The classification problem can be solved by min ||x||_1 subject to y = Ax, provided the solution is sparse enough. The test data y is then classified based on the residual r_i(y) = ||y - A δ_i(x)||_2, where δ_i(x) keeps only the entries of x that are associated with class i (all other entries set to zero).
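The residual rule above is easy to demonstrate in isolation. A minimal sketch with toy data (class counts, dictionary sizes, and the known sparse coefficient vector are all illustrative assumptions; a real system would obtain `x` from an L1 solver): the class whose retained coefficients best explain y wins.

```python
import numpy as np

rng = np.random.default_rng(6)
M, per_class, C = 20, 5, 3
# Dictionary columns grouped by class: A = [A_0 | A_1 | A_2]
A = rng.standard_normal((M, per_class * C))

# Test sample built from class 1, with a known sparse coefficient vector
x = np.zeros(per_class * C)
x[per_class + 1] = 1.0
x[per_class + 3] = -0.5
y = A @ x

def classify(A, x, y, per_class, C):
    # delta_i(x): keep only class i's coefficients, then measure the residual
    residuals = []
    for i in range(C):
        d = np.zeros_like(x)
        sl = slice(i * per_class, (i + 1) * per_class)
        d[sl] = x[sl]
        residuals.append(np.linalg.norm(y - A @ d))
    return int(np.argmin(residuals)), residuals

label, res = classify(A, x, y, per_class, C)
```

Since y was synthesized entirely from class-1 atoms, the class-1 residual is zero while the others equal the full energy of y, so the argmin picks class 1.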
Example I. (Figure: recovered sparse coefficients and per-class residual energy.)
Robustness to Corruption. The test image y can be corrupted or occluded. If the occlusion e covers less than ~50% of the image, the sparsest solution to y = B w is the true generator w_0. If w_0 is sparse enough, it can be found by L1 minimization, and y is classified based on the residual.
Example II Set-Up Top left: face image is occluded by a disguise Bottom left: 50% of random pixels replaced by random values Right: training images
Example II Result Test images: sum of a sparse linear combination of the training images and a sparse error due to occlusion or corruption
Robust Local Face Recognition. In the previous scheme, faces must be aligned; it is not robust to registration errors. Local sparsity model: a block y_b in the test data y of object i approximately lies in the linear span of the neighboring blocks in the training samples associated with i, so y_b is a sparse linear combination of neighboring blocks in all training samples. (Illustration: a block in the test data and its neighboring blocks in the training data; only one training sample is shown.)
Illustration of Our Approach. Select a block from the misaligned test data; its neighboring blocks in each training sample generate the atoms of the dictionary.
Our Classification. Solve the sparse recovery problem to find the sparse coefficient, then compute the residuals and determine the class.
Example I: Translation Aligned data 32x28 Shifted by 5 pixels in each direction Class 38Class 24Class 27 Randomly choose 16x16 blocks four times Final classification result: Class 27 (correct)
Example II: Rotation Aligned data 32x28 Class 19Class 27 Class 29 Rotated by 10 degrees Randomly choose 16x16 blocks four times Final classification result: Class 27 (correct)
Example III: Random Corruption Original data 32x28 Class 27Class 25Class 27 Final classification result: Class 27 (correct) 40% of the pixels are randomly corrupted Randomly choose 16x16 blocks four times
Inter-frame Video Coding. Examples: the video compression standards MPEG/H.26x. High complexity at the encoder and low complexity at the decoder: an inter-frame encoder sends X to an inter-frame decoder, which uses side information to output X'.
Distributed Video Coding (DVC). Examples: PRISM at Berkeley, turbo coding with a feedback channel at Stanford, etc.; all based on Wyner-Ziv coding techniques. Low complexity at the encoder and high complexity at the decoder: an intra-frame encoder sends X to an inter-frame decoder, which uses side information to output X'.
Low-Complexity Video Coding and Decoding. Conventional MPEG/H.26x pairs inter-frame coding with MPEG/H.26x decoding; distributed video coding (DVC) pairs intra-frame coding with inter-frame decoding.
Distributed Compressed Video Sensing: The Encoder. Intra-code key frames periodically using conventional video compression standards (MPEG/H.26x). Acquire local block-based and global frame-based measurements of the CS frames. The input video is split into key frames (MPEG/H.26x intra-coding) and CS frames (block-based and frame-based measurement acquisition), and everything is transmitted to the decoder.
Distributed Compressed Video Sensing: The Decoder. Decode the key frames using conventional image/video compression standards. Perform sparsity-constrained block prediction for motion estimation and compensation. Perform sparse recovery with decoder side information for prediction-error reconstruction. Add the reconstructed prediction error to the block-based prediction frame for the final frame reconstruction. (Pipeline: decoded key frames feed optimal block-based prediction using the inter-frame sparsity model; the side-information measurements are generated and subtracted from the received local block and global frame measurements; sparse recovery then reconstructs the prediction error.)
Distributed Compressed Video Sensing: end-to-end diagram combining the encoder (key-frame MPEG/H.26x intra-coding plus block-based and frame-based measurement acquisition in the analog domain) and the decoder (block prediction using the inter-frame sparsity model, measurement generation and subtraction, and sparse recovery of the prediction error with side information).
Inter-frame Sparsity Model. A block x_B in a CS frame can be sparsely represented as a linear combination of a few temporally neighboring macro-blocks in nearby I-frames: x_B = D_B α_B, where the dictionary D_B collects the neighboring macro-blocks and α_B has few non-zero entries. This is a generalized model of block motion.
Inter-frame Sparsity Model: half-pixel motion compensation. A half-pixel-interpolated block is itself a linear combination of full-pixel blocks (e.g., of the neighbors b1, b2, b3, b4), so it is covered by the same model x_B = D_B α_B.
Sparsity-Constrained Block Prediction. Find the block that has the sparsest representation in a dictionary of temporally neighboring blocks: α*_B = argmin ||α_B||_1 subject to y_B = Φ_B D_B α_B, where y_B are the received local block measurements, Φ_B is the block sensing matrix, and D_B is the dictionary of temporal neighboring blocks. The block prediction is x*_B = D_B α*_B. This is a generalized prediction algorithm covering both full-pixel and sub-pixel best-matching-block search.
Sparse Recovery with Decoder SI. The prediction error is often very sparse, so it can be recovered with higher accuracy: generate measurements from the frame prediction (the side information), subtract them from the received measurements, and run sparse recovery on the difference to reconstruct the prediction error.
Simulation Results. Performance comparison between DISCOS and CS-based intra-coding and intra-decoding (baseline). Reconstruction of frame 41 from 25% measurements: baseline 27.9 dB, DISCOS 38.7 dB.
Simulation Results. Performance comparison between DISCOS and the baseline. Reconstruction of frame 21 from 25% measurements: baseline 24.3 dB, DISCOS 32.9 dB.
Error-Resilient Data Transmission. An enhancement-layer encoder/decoder pair protects data sent over a packet-loss channel. Wyner-Ziv-based approaches: forward error correction, systematic lossy error protection (Stanford), layered Wyner-Ziv video coding (Texas A&M), and PRISM (Berkeley); versus the compressive sensing approach.
Previous Approaches. FEC employs well-known channel codes (Reed-Solomon, turbo codes, LDPC, etc.); the decoded video quality degrades significantly when the packet-loss rate exceeds the error-correction capacity of the channel code (the cliff effect); all are based on coding techniques over a finite field. Recent approaches: Wyner-Ziv-based techniques such as SLEP (Stanford) and layered Wyner-Ziv video coding (Texas A&M), and distributed video coding to mitigate the error propagation of predictive video coding, such as PRISM (Berkeley).
Compressive Sensing Approach. Borrow principles from compressive sensing to effectively mitigate the cliff effect, thanks to the soft-decoding feature of the sparse recovery algorithm. This eliminates the all-or-nothing behavior of coding techniques over a finite field: it is a new channel-coding technique over the REAL field.
Principle of Block Motion Estimation. Partition the current video frame into small non-overlapping blocks called macro-blocks (MBs). For each block, find the motion vector (displacement) within a search window of the reference frame that minimizes a pre-defined mismatch error. For each block, the motion vector and the prediction error (residue) are encoded.
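Full-search block matching as described above fits in a short function. A minimal sketch using the sum of absolute differences (SAD) as the mismatch error, with frame size, block size, and search range chosen arbitrarily for the demo:

```python
import numpy as np

def full_search_bme(ref, cur, top, left, bsize, search):
    """Exhaustive block matching: minimize SAD over a +/- search window."""
    block = cur[top:top + bsize, left:left + bsize]
    best, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + bsize > ref.shape[0] or c + bsize > ref.shape[1]:
                continue                      # candidate falls outside the frame
            sad = np.abs(ref[r:r + bsize, c:c + bsize] - block).sum()
            if sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv, best

rng = np.random.default_rng(7)
ref = rng.standard_normal((32, 32))
# Current frame: reference shifted down by 2 pixels and left by 1 pixel
cur = np.roll(np.roll(ref, 2, axis=0), -1, axis=1)
mv, sad = full_search_bme(ref, cur, top=8, left=8, bsize=8, search=4)
```

For the interior block tested, the search finds the displacement (-2, +1) back into the reference frame with zero mismatch, i.e., the exact inverse of the applied shift.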
BME/BMC: Example I. (Figure: previous frame, current frame, motion vector field, frame difference, and motion-compensated difference.)
BME/BMC: Example II. (Figure: previous frame, current frame, motion vector field, frame difference, and motion-compensated difference.)
Compressive Sensing Overview. Measurement vector y, sensing matrix Φ, sparsifying matrix Ψ, input signal x, sparse transform coefficients α. Sparse recovery: α* = argmin ||α||_1 subject to y = ΦΨα, then x* = Ψα*.
Layered Compressive Sensing Video Codec. At the encoder, MPEG/H.26x encoding produces the quantized transformed prediction error plus motion vectors and mode decisions, which are entropy-coded (E) and sent over the packet-loss channel; in parallel, measurement acquisition on the prediction error, rounding (R), and entropy coding produce the enhancement layer. At the receiver, entropy decoding (E^-1), measurement generation from the side information, sparse recovery with decoder side information, and MPEG/H.26x decoding reconstruct the video.
Base Layer Coding. Conventionally encoded by the video compression standards MPEG/H.26x. Slices of a prediction-error frame are entropy-coded and packetized before being transmitted over error-prone channels without any error-correcting code.
Enhancement Layer Coding. Measurements are acquired across the slices of a prediction-error frame, rounded to integers, entropy-coded, and sent to the decoder (along with the motion vectors and mode decisions).
LACOS Decoder. Entropy-decode the corrupted base layer (regarded as side information). Feed the SI and the cross-slice measurements received from the enhancement layer into sparse recovery with decoder SI to recover the lost slices/packets. Add the recovered slices/packets back to the corrupted base layer for the final reconstruction of the prediction-error frames. Feed the reconstructed prediction-error frames into a regular MPEG/H.26x decoder for the final video frame reconstruction.
Example of Coding & Decoding. Encoder: y = Φx. Decoder side information: y_SI = Φx_SI. Observed difference: u = y - y_SI = Φ(x - x_SI) = Φv, where v = x - x_SI is sparse. Recover v* = argmin ||v||_1 subject to u = Φv, then reconstruct x = x_SI + v*.
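The decoding step above hinges entirely on the linearity of the measurements; a minimal NumPy sketch (vector lengths and the corrupted indices are arbitrary demo choices) verifies that subtracting the side-information measurements yields exactly the measurements of the sparse difference:

```python
import numpy as np

rng = np.random.default_rng(8)
M, N = 16, 64
Phi = rng.standard_normal((M, N))

x = rng.standard_normal(N)                 # true prediction-error frame (vectorized)
x_si = x.copy()
x_si[[5, 40]] += rng.standard_normal(2)    # side info differs in a few entries

y = Phi @ x                                # measurements received from the encoder
y_si = Phi @ x_si                          # measurements generated from the SI

u = y - y_si                               # = Phi @ (x - x_si) by linearity
v = x - x_si                               # sparse difference to be recovered
```

Because `v` has only a handful of nonzero entries, `u = Phi @ v` is a standard compressed-sensing problem, and any sparse recovery routine (e.g., the pursuit algorithms earlier in the deck) can reconstruct `v` and hence `x = x_si + v`.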
Sparse Recovery Algorithm for LACOS: the Sparsity Adaptive Matching Pursuit algorithm (SAMP) (D. & Tran). It follows the divide-and-conquer principle, estimating the sparsity K and the true support set stage by stage. At each stage, a fixed-size finalist of significant non-zero coefficients is iteratively refined via the Final Test; when the energy of the current residue exceeds that of the previous iteration's residue, the algorithm shifts to a new stage and expands the finalist by a step size s. This gives an optimal performance guarantee without prior knowledge of the sparsity K. (Pipeline: the Preliminary Test forms a candidate set C_k from the previous finalist F_{k-1} and residue r_{k-1}; the Final Test refines the finalist F_k; the residue r_k is updated; both |C_k| and |F_k| are adaptive.)
Simulation Results. Performance comparison of LACOS, FEC, and error concealment on the Football sequence; the base layer is encoded at 2.97 Mbps. Reconstruction of frame 27 with 13.3% packet loss: FEC 29 dB, LACOS 30.7 dB.
Simulation Results. Performance comparison of LACOS, FEC, and error concealment on the CIF sequence Mobile; the base layer is encoded at 3.79 Mbps. Reconstruction of frame 34 with 13.9% packet loss: FEC 31.3 dB, LACOS 33 dB.
Some Remarks. WZ-based approaches (e.g., FEC) work perfectly when the packet-loss rate is lower than the error-correction capacity of the channel code, but fall back on error concealment when the channel decoder fails, which results in low performance (the cliff effect). LACOS retains the soft-decoding feature of the sparse recovery algorithm, so it mitigates the cliff effect effectively: the decoded video quality degrades gradually as the amount of packet loss increases. Its efficient sensing and fast recovery enable it to work well in real-time scenarios.
Video De-noising. A noisy block x_B in a non-key frame is modeled from a clean key frame as x_B = D_B α_B + e_B, where D_B is a dictionary of neighboring blocks from the clean key frame and e_B is noise.
Video Denoising: Sparse Model. x_B = D_B α_B + e_B = [D_B | I] [α_B; e_B] = W_B β_B, where I is the identity matrix and x_B the noisy block. Sparsity-constrained block prediction: β*_B = argmin ||β_B||_1 subject to x_B = W_B β_B, and the denoised block is x*_B = D_B α*_B.
Video Super-Resolution. Given high-resolution key frames and noisy low-resolution non-key frames, reconstruct the unobservable high-resolution non-key frame. Typical relationship between LR and HR patches: x_B = S_B H_B y_B + e_B, where y_B is the HR patch, H_B is a blurring operator, S_B is a subsampling operator, and e_B is noise.
Video Super-Resolution: Sparse Model. x_B = S_B H_B y_B + e_B = S_B H_B D_B α_B + e_B = [S_B H_B D_B | I] [α_B; e_B] = W_B β_B, where D_B is a dictionary of neighboring blocks. Sparsity-constrained block prediction: β*_B = argmin ||β_B||_1 subject to x_B = W_B β_B, and the high-resolution block approximation is y*_B = D_B α*_B.
Blocking-Effect Elimination. Average multiple approximations obtained from shifted, overlapping grids.
Compressed Sensing in Medical Imaging. Goal so far: achieve faster MR imaging while maintaining reconstruction quality. Methods: undersample the discrete Fourier space using pseudo-random patterns, then reconstruct using L1 minimization (Lustig) or homotopic L0 minimization (Trzasko).
Sparsity of MR Images. Brain MR images and dynamic cardiac MR images are sparse in the wavelet and DCT domains, not in the spatial or finite-difference domains, and can be reconstructed with good quality from 5-10% of the coefficients. Angiogram images are sparse in both the finite-difference domain and the spatial domain (the edges of blood vessels occupy only about 5% of the space), which allows very good compressed sensing performance.
Sampling Methods. Use smooth k-space trajectories (Cartesian, radial, or spiral scanning) with full sampling in each read-out. Undersample by: Cartesian grid, undersampling in the phase-encoding direction (uniformly or non-uniformly); radial, angular undersampling (using fewer angles); spiral, using fewer spirals and randomly perturbing the spiral trajectories.
Sampling Patterns: Spiral & Radial. Spiral scanning with uniform density, varying density, and perturbed spiral trajectories. A new algorithm (FOCUSS) allows reconstruction without angular aliasing artifacts.
Reconstruction Methods. Lustig's: L1 minimization with the non-linear conjugate gradient method. Trzasko's: homotopic L0 minimization.
Reconstruction Results (2DFT). Multi-slice 2DFT fast spin echo, CS at 2.4x acceleration.
Results (3DFT). Contrast-enhanced 3D angiography reconstruction results as a function of acceleration. Left column: acceleration by LR; note the diffused boundaries with acceleration. Middle column: ZF-w/dc reconstruction; note the increase of apparent noise with acceleration. Right column: CS reconstruction with a TV penalty from randomly undersampled k-space.
Results: Radial Scan, FOCUSS Reconstruction. Reconstruction results from a full scan with uniform angular sampling between 0° and 360°. Row 1: reference reconstruction from 190 views. Row 2: reconstruction from 51 views using LINFBP. Row 3: reconstruction from 51 views using CG-ALONE. Row 4: reconstruction from 51 views using PR-FOCUSS.
Results: Spiral. (a) Sagittal T2-weighted image of the spine; (b) simulated k-space trajectory (multishot Cartesian spiral, 83% undersampling); (c) minimum-energy solution via zero-filling; (d) reconstruction by L1 minimization; (e) reconstruction by homotopic L0 minimization using ρ(|∇u|, ε) = |∇u| / (|∇u| + ε); (f) line profile across C6; (g-j) enlargements of (a, c-e), respectively.
Conclusion. Compressed sensing: a different paradigm for data acquisition; sample less and compute more; simple encoding, with most of the computation at the decoder; exploits a priori signal sparsity; universality and robustness. Compressed sensing applications for multimedia: 2D separable measurement ensemble for image/video; face/speech recognition; distributed compressed video sensing; layered robust video transmission; image/video de-noising and concealment; MRI applications.
References. http://www.dsp.ece.rice.edu/cs/ and http://nuit-blanche.blogspot.com/search/label/compressed%20sensing