Pyramid Vector Quantization

What is Pyramid Vector Quantization?
A vector quantizer that:
- has a simple algebraic structure
- performs gain-shape quantization

Motivation

Why Vector Quantization?
Three classic advantages (Lookabaugh et al. 1989):
- Space-filling advantage: VQ codepoints tile space more efficiently
  - Example: 2-D, squares vs. hexagons
  - Maximum possible gain for large dimension: 1.53 dB
- Shape advantage: VQ can use more points where the PDF is higher
  - 1.14 dB gain for a 2-D Gaussian, 2.81 dB at high dimension
- Memory advantage: exploit statistical dependence between vector components

Why Vector Quantization?
Three classic advantages (Lookabaugh et al. 1989):
- Space-filling advantage: VQ codepoints tile space more efficiently
- Shape advantage: VQ can use more points where the PDF is higher
  - Can be mitigated with entropy coding
- Memory advantage: exploit statistical dependence between vector components
  - Transform coefficients are not strongly correlated

Why Vector Quantization?
- Important: the space-filling advantage applies even when values are totally uncorrelated
- Another important advantage: codebooks can use less than 1 bit per dimension

Why Algebraic VQ?
- Trained VQ is impractical at high rates and large dimensions
  - High dimension → large LUTs, lots of memory
  - Exponential in bitrate
  - No codebook structure → slow search
- "Algebraic" VQ solves these problems
  - Structured codebook: no LUTs, fast search
  - The best space-filling lattice for arbitrary dimension is unknown: have to approximate
  - PVQ is asymptotically optimal for Laplacian sources

Why Gain-Shape Quantization?
- Separate the "gain" (energy) from the "shape" (spectrum)
  - Vector = magnitude × unit vector (point on a sphere)
- Potential advantages:
  - Can give each piece a different rate allocation
  - Preserves energy (contrast) instead of low-passing
    - A scalar quantizer can only add energy by coding ±1's
  - Implicit activity masking
    - Quantization resolution can be derived from the explicitly coded energy
  - Better representation of coefficients
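
A minimal sketch of the gain-shape split itself (illustrative Python; not the codec's actual code):

    import numpy as np

    def gain_shape_split(x):
        """Split a coefficient vector into gain (L2 norm) and shape (unit vector)."""
        g = np.linalg.norm(x)
        shape = x / g if g > 0 else np.zeros_like(x, dtype=float)
        return g, shape

    # The decoder reconstructs by scaling the quantized shape by the quantized
    # gain: x_hat = g_hat * shape_hat, so the coded energy is preserved exactly.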

How it Works (High-Level)

Simple Case: PVQ without a Predictor
- Scalar quantize the gain
- Place K unit pulses in N dimensions
  - Up to N = 1024 dimensions for large blocks
  - The shape only has N − 1 degrees of freedom
- Normalize to unit norm
- K is derived implicitly from the gain
  - Can also code K and derive the gain
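
A sketch of the pulse search, assuming a simple greedy placement that maximizes cosine similarity with the input at each step (illustrative Python; production encoders use a projection plus greedy refinement, but the idea is the same):

    import numpy as np

    def pvq_quantize(x, K):
        """Place K unit pulses: returns an integer vector y with
        sum(|y_i|) == K whose direction approximates x."""
        ax = np.abs(x)
        y = np.zeros(len(x), dtype=int)   # pulse counts (magnitude domain)
        corr = 0.0                        # running <|x|, y>
        energy = 0.0                      # running <y, y>
        for _ in range(K):
            # Adding a pulse at i gives correlation corr + ax[i] and squared
            # norm energy + 2*y[i] + 1; pick the i maximizing the new cosine.
            i = int(np.argmax((corr + ax) / np.sqrt(energy + 2 * y + 1)))
            y[i] += 1
            corr += ax[i]
            energy += 2 * y[i] - 1
        return np.sign(x).astype(int) * y

    # Decoder side: the shape is y normalized to unit norm, scaled by the gain:
    # x_hat = g_hat * y / np.linalg.norm(y)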

Codebook for N=3 and different K
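
The size of that codebook, V(N, K) (the number of integer vectors of dimension N with L1 norm K), satisfies a simple recurrence that is standard for PVQ, going back to Fischer's 1986 paper; a small counting sketch in Python:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def pvq_codebook_size(n, k):
        """V(n, k) = V(n-1, k) + V(n, k-1) + V(n-1, k-1)."""
        if k == 0:
            return 1                      # only the zero vector
        if n == 0:
            return 0                      # no dimensions left to hold pulses
        return (pvq_codebook_size(n - 1, k)
                + pvq_codebook_size(n, k - 1)
                + pvq_codebook_size(n - 1, k - 1))

    # e.g. pvq_codebook_size(3, 2) == 18 codepoints on the N = 3, K = 2 pyramid;
    # enumerating codewords directly costs log2(V(N, K)) bits.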

PVQ vs. Scalar Quantization

PVQ with a Predictor
- Video provides us with useful predictors
- We want to treat vectors in the direction of the prediction as "special"
  - They are much more likely!
- Subtracting the prediction and coding the residual would lose energy preservation
- Solution: align the codebook axes with the prediction, and treat one dimension differently

2-D Projection Example
[Figure: input and prediction vectors in 2-D, with the angle θ between the input and the reflected prediction axis]
- Input + prediction
- Compute Householder reflection
- Apply reflection
- Compute & code angle θ
- Code the other dimensions
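
A sketch of the reflection step (illustrative Python; the axis and sign conventions here are simplified relative to the real codec):

    import numpy as np

    def householder_align(x, r):
        """Reflect x so that the prediction r lands on a coordinate axis.

        Returns the reflected vector z and the axis index m. The reflection is
        its own inverse, so applying it again after quantization undoes it.
        """
        m = int(np.argmax(np.abs(r)))    # axis of largest |r| component
        s = 1.0 if r[m] >= 0 else -1.0
        v = r.astype(float)              # Householder vector v = r + s*||r||*e_m
        v[m] += s * np.linalg.norm(r)
        vv = np.dot(v, v)
        if vv == 0:                      # zero predictor: nothing to align
            return x.astype(float), m
        return x - 2.0 * v * np.dot(v, x) / vv, m

    # After this, the prediction direction is exactly -s * e_m, so "agreement
    # with the prediction" reduces to a single angle theta in the new basis.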

What does this accomplish?
- Creates another "intuitive" parameter, θ: "How much like the predictor are we?"
  - θ = 0 → use the predictor exactly
- θ determines how many pulses go in the "prediction" direction
  - K (and thus the bitrate) for the remaining N − 1 dimensions is adjusted down
- The remaining N − 1 dimensions have N − 2 degrees of freedom (no redundancy)
- Can repeat for more predictors
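
A sketch of the reconstruction this parameterization implies, assuming the reflected-domain form x̂ = ĝ(−cos θ̂ · e_m + sin θ̂ · u), where u is the unit PVQ shape over the other dimensions (illustrative Python; the sign convention follows the Householder sketch above):

    import numpy as np

    def pvq_reconstruct(g_hat, theta_hat, y, m):
        """Rebuild the reflected-domain vector from gain, angle, and the PVQ
        pulse vector y, which carries no pulses on the prediction axis m."""
        y = np.asarray(y, dtype=float)
        n = np.linalg.norm(y)
        z = (np.sin(theta_hat) * g_hat / n) * y if n > 0 else np.zeros_like(y)
        z[m] = -np.cos(theta_hat) * g_hat   # energy kept on the prediction axis
        return z                            # undo the reflection to get x_hat

    # ||z|| == g_hat regardless of theta_hat, so energy is still preserved.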

Details...

Band Structure
- DC is coded separately with scalar quantization
- AC coefficients are grouped into bands
  - Gain, theta, etc., are signaled separately for each band
- Layout is ad hoc for now
- Scan order within each band is optimized for decreasing average variance

Band Structure
[Figure: band layouts and scan orders for 4x4, 8x8, and 16x16 blocks]
- Scan order is possibly over-fit...

To Predict or Not to Predict...
- θ ≥ π/2 → the prediction is not helping
  - Could code large θ's, but it doesn't seem that useful
  - Need to handle zero predictors anyway
- Current approach: code a "noref" flag
  - Currently jointly code up to 4 flags at once, with a fixed order-0 probability per band (5% of keyframe rate)
- Patches in review cut this down a lot:
  - Force noref = 1 when the predictor is zero in keyframes
  - Separate probabilities for each block size
  - Adapt the probabilities

Quantization Matrix
- Simple approach (what we're doing now):
  - Separate quantization resolution for each band
  - Keep flat quantization within bands
- Advanced approach?
  - Scaling after normalization is complicated
    - Unit pulses are no longer "unit" (how do they sum to K?)
    - The Householder reflection scrambles things further
  - Better(?): pre-scale the vector by the quantization factors
    - Effects on energy preservation?
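
A sketch of the pre-scaling idea from the last bullet (illustrative Python; whether this preserves energy in a useful sense is exactly the open question above, and `quantize` is a stand-in for the whole gain/shape/PVQ pipeline):

    import numpy as np

    def prescale_pvq(x, qm, quantize):
        """Apply a quantization matrix by warping before PVQ, so that pulses
        remain 'unit' pulses in the domain where quantization happens."""
        xw = x / qm              # coefficients with coarser steps shrink,
                                 # so they attract fewer pulses
        x_hat_w = quantize(xw)   # gain-shape + PVQ in the flat, warped domain
        return x_hat_w * qm      # unwarp the reconstruction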

Quantization Matrix Example
[Figure: flat quantizer (base Q = 35) vs. adjusted per-band (base Q = 23)]
- Metrics: +15% PSNR, +12% SSIM, -18% PSNR-HVS

Activity Masking
- Goal: use better resolution in flat areas
  - Low contrast → low energy (gain)
- Derivations in doc/video_pvq.lyx, doc/theoretical_results.lyx
  - Currently wrong/incomplete, will fix
- Step 1: Compand the gain g
  - Goal: Q ∝ g^(2α) (x264 uses α = 0.173)
  - Quantize ĝ = Q_g·ĥ^β, encode the integer ĥ
    - β = 1/(1 − 2α)
    - Q_g = (Q/β)^β
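
A sketch of the companded gain quantizer as the formulas above describe it (illustrative Python; the slide itself flags the derivation as in flux, so treat the constants as provisional):

    ALPHA = 0.173                      # activity-masking strength (x264 value)
    BETA = 1.0 / (1.0 - 2.0 * ALPHA)

    def quantize_gain(g, Q):
        """Compand the gain so the effective step size grows like g^(2*ALPHA)."""
        Qg = (Q / BETA) ** BETA
        h_hat = int(round((g / Qg) ** (1.0 / BETA)))   # integer index, encoded
        g_hat = Qg * h_hat ** BETA                     # reconstructed gain
        return h_hat, g_hat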

Activity Masking contd.
- Step 2: Choose the θ resolution
  - D = (g − ĝ)² + g·ĝ·(D_θ + sin θ · sin θ̂ · D_pvq)
    - D_θ = 2 − 2cos(θ − θ̂): distortion due to θ quantization
    - D_pvq: distortion due to PVQ
  - Assume g = ĝ and ignore D_pvq...
  - Q_θ = (dĝ/dĥ)/ĝ = β/ĥ
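
A sketch of the resulting angle quantizer, using the step Q_θ = β/ĥ derived above (illustrative Python):

    def quantize_theta(theta, h_hat, beta):
        """Quantize theta with a step tied to the gain index h_hat: a larger
        coded gain means finer angular resolution."""
        q_theta = beta / max(h_hat, 1)       # Q_theta = beta / h_hat
        t_hat = int(round(theta / q_theta))  # integer index to encode
        return t_hat, t_hat * q_theta        # (index, reconstructed angle)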