
1 Pyramid Vector Quantization

2 What is Pyramid Vector Quantization?
A vector quantizer that:
- Has a simple algebraic structure
- Performs gain-shape quantization

3 Motivation

4 Why Vector Quantization?
3 classic advantages (Lookabaugh & Gray 1989):
- Space-filling advantage: VQ codepoints tile space more efficiently
  - Example in 2-D: squares vs. hexagons
  - Maximum possible gain for large dimension: 1.53 dB
- Shape advantage: VQ can use more points where the PDF is higher
  - 1.14 dB gain for a 2-D Gaussian, 2.81 dB for high dimension
- Memory advantage: exploit statistical dependence between vector components
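The 1.53 dB figure is the classic asymptotic space-filling gain: the ratio of the normalized second moment of the scalar quantizer's cubic cell (1/12) to that of an ideal sphere-like cell as dimension grows (1/(2πe)). A one-line Python check, illustrative only:

    import math

    # Asymptotic space-filling gain of an ideal sphere-like cell over a cube:
    # 10*log10((1/12) / (1/(2*pi*e))) = 10*log10(2*pi*e/12).
    gain_db = 10 * math.log10(2 * math.pi * math.e / 12)
    print(f"{gain_db:.2f} dB")  # -> 1.53 dB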

5 Why Vector Quantization?
3 classic advantages (Lookabaugh & Gray 1989):
- Space-filling advantage: VQ codepoints tile space more efficiently
  - Example in 2-D: squares vs. hexagons
  - Maximum possible gain for large dimension: 1.53 dB
- Shape advantage: VQ can use more points where the PDF is higher
  - Can be mitigated with entropy coding
- Memory advantage: exploit statistical dependence between vector components
  - Transform coefficients are not strongly correlated

6 Why Vector Quantization?
Important: the space-filling advantage applies even when values are totally uncorrelated
Another important advantage:
- Can have codebooks with less than 1 bit per dimension
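Fractional rates fall out of the codebook structure directly. A Python sketch using the standard PVQ codebook-size recurrence V(N, K) = V(N−1, K) + V(N, K−1) + V(N−1, K−1) (Fischer 1986); the N and K values below are arbitrary examples:

    from functools import lru_cache
    from math import log2

    @lru_cache(maxsize=None)
    def pvq_codebook_size(n, k):
        """Count integer vectors of dimension n whose absolute values sum to k."""
        if k == 0:
            return 1
        if n == 0:
            return 0
        return (pvq_codebook_size(n - 1, k) + pvq_codebook_size(n, k - 1)
                + pvq_codebook_size(n - 1, k - 1))

    # 64 dimensions with only 2 pulses: well under 1 bit per dimension.
    n, k = 64, 2
    bits = log2(pvq_codebook_size(n, k))
    print(f"V({n},{k}) = {pvq_codebook_size(n, k)}, {bits / n:.2f} bits/dim")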

7 Why Algebraic VQ?
Trained VQ is impractical for high rates and large dimensions
- High dimension → large LUTs, lots of memory
  - Exponential in bitrate
- No codebook structure → slow search
“Algebraic” VQ solves these problems
- Structured codebook: no LUTs, fast search
- The optimal space-filling lattice for arbitrary dimension is unknown: have to approximate
- PVQ is asymptotically optimal for Laplacian sources

8 Why Gain-Shape Quantization?
Separate the “gain” (energy) from the “shape” (spectrum)
- Vector = magnitude × unit vector (a point on the sphere)
Potential advantages:
- Can give each piece a different rate allocation
- Preserves energy (contrast) instead of low-passing
  - A scalar quantizer can only add energy by coding ±1’s
- Implicit activity masking
  - Can derive the quantization resolution from the explicitly coded energy
- Better representation of coefficients

9 How it Works (High-Level)

10 Simple Case: PVQ without a Predictor
- Scalar quantize the gain
- Place K unit pulses in N dimensions (a sketch follows below)
  - Up to N = 1024 dimensions for large blocks
  - Only has N−1 degrees of freedom
- Normalize to unit norm
- K is derived implicitly from the gain
  - Can also code K and derive the gain
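A minimal Python sketch of this encode/decode path, assuming a naive greedy pulse search (function names are invented for illustration; real implementations use a faster projection-based search):

    import numpy as np

    def pvq_quantize(x, k):
        """Return integer y with sum(|y|) == k that best matches the direction of x."""
        s = np.sign(x)
        ax = np.abs(x).astype(float)
        y = np.zeros(len(ax), dtype=int)
        for _ in range(k):  # place k unit pulses one at a time
            best_i, best_corr = 0, -1.0
            for i in range(len(ax)):
                y[i] += 1
                corr = np.dot(ax, y) / np.linalg.norm(y)  # match after normalization
                if corr > best_corr:
                    best_i, best_corr = i, corr
                y[i] -= 1
            y[best_i] += 1
        return (s * y).astype(int)

    def pvq_dequantize(y, gain):
        """Decoder side: normalize the pulse vector and scale by the coded gain."""
        y = np.asarray(y, dtype=float)
        return gain * y / np.linalg.norm(y)

    x = np.array([0.8, -0.5, 0.3])
    gain = np.linalg.norm(x)   # scalar-quantized in a real codec
    y = pvq_quantize(x, k=4)   # K would be derived from the coded gain
    print(y, pvq_dequantize(y, gain))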

11 Codebook for N=3 and different K

12 PVQ vs. Scalar Quantization

13 PVQ with a Predictor
Video provides us with useful predictors
- We want to treat vectors in the direction of the prediction as “special”
  - They are much more likely!
- Subtracting the prediction and coding the residual would lose energy preservation
- Solution: align the codebook axes with the prediction, and treat one dimension differently

14–19 2-D Projection Example (built up across six slides)
- Input + Prediction
- Compute the Householder reflection
- Apply the reflection
- Compute & code the angle θ
- Code the other dimensions
[Figure: 2-D diagram of the input and prediction vectors, with θ the angle between them after reflection]
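A small Python sketch of the reflection step, assuming the prediction is mapped onto the first coordinate axis (a real codec reflects onto the axis of the prediction's largest component to avoid cancellation):

    import numpy as np

    def householder_reflect(x, pred):
        """Reflect x with the Householder plane that maps pred onto +e_0."""
        r = pred / np.linalg.norm(pred)
        e = np.zeros_like(r)
        e[0] = 1.0
        v = r - e                       # normal of the reflection plane
        if np.linalg.norm(v) < 1e-12:   # prediction already lies on the axis
            return x.copy()
        v /= np.linalg.norm(v)
        return x - 2.0 * np.dot(v, x) * v

    x = np.array([0.9, 0.4])
    pred = np.array([1.0, 0.2])
    z = householder_reflect(x, pred)
    # The prediction now lies along e_0, so the angle to code is:
    theta = np.arccos(np.clip(z[0] / np.linalg.norm(z), -1.0, 1.0))
    print(z, np.degrees(theta))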

20 What does this accomplish?
- Creates another “intuitive” parameter, θ: “how much like the predictor are we?”
  - θ = 0 → use the predictor exactly
- θ determines how many pulses go in the “prediction” direction
  - K (and thus the bitrate) for the remaining N−1 dimensions is adjusted down
- The remaining N−1 dimensions have N−2 degrees of freedom (no redundancy)
- Can repeat for more predictors

21 Details...

22 Band Structure
- DC is coded separately with scalar quantization
- AC coefficients are grouped into bands
  - Gain, theta, etc. are signaled separately for each band
  - The layout is ad-hoc for now
- Scan order within each band is optimized for decreasing average variance
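As a toy illustration of per-band coding, here is a Python sketch; the band boundaries below are invented for the example (the slide notes the real layout is ad-hoc):

    import numpy as np

    # Hypothetical partition of an 8x8 block's 63 AC coefficients, indexed in
    # scan order; these boundaries are made up for illustration.
    BANDS_8x8 = [(0, 15), (15, 31), (31, 63)]

    def split_bands(ac_scan, bands=BANDS_8x8):
        """Each band would get its own gain/theta/PVQ coding."""
        return [ac_scan[a:b] for a, b in bands]

    ac = np.arange(63)  # stand-in for scanned AC coefficients
    for i, band in enumerate(split_bands(ac)):
        print(f"band {i}: {len(band)} coefficients")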

23 Band Structure
[Figure: band layouts and scan orders for 4x4, 8x8, and 16x16 blocks]
The scan order is possibly over-fit...

24 To Predict or Not to Predict...
- θ ≥ π/2 → prediction is not helping
  - Could code large θ’s, but it doesn’t seem that useful
  - Need to handle zero predictors anyway
- Current approach: code a “noref” flag
  - Currently jointly code up to 4 flags at once, with a fixed order-0 probability per band (5% of keyframe rate)
- Patches in review cut this down a lot
  - Force noref = 1 when the predictor is zero in keyframes
  - Separate probabilities for each block size
  - Adapt the probabilities
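A tiny Python sketch of the θ ≥ π/2 test; use_predictor is a hypothetical helper, not the codec's actual decision logic (which also has to weigh rate):

    import numpy as np

    def use_predictor(x, pred, eps=1e-12):
        """False (noref) when the predictor is zero or points away (theta >= pi/2)."""
        norm_pred = np.linalg.norm(pred)
        if norm_pred < eps:
            return False  # force noref for a zero predictor
        cos_theta = np.dot(x, pred) / (np.linalg.norm(x) * norm_pred)
        return cos_theta > 0.0  # theta < pi/2

    print(use_predictor(np.array([1.0, 0.5]), np.array([-1.0, 0.0])))  # False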

25 Quantization Matrix
Simple approach (what we’re doing now):
- Separate quantization resolution for each band
- Keep quantization flat within bands
Advanced approach?
- Scaling after normalization is complicated
  - Unit pulses are no longer “unit” (how do they sum to K?)
  - The Householder reflection scrambles things further
- Better(?): pre-scale the vector by the quantization factors
  - Effects on energy preservation?

26–27 Quantization Matrix Example
[Figure: flat quantizer (base Q=35) vs. adjusted per-band (base Q=23)]
Metrics: +15% PSNR, +12% SSIM, −18% PSNR-HVS

28 Activity Masking
Goal: use better resolution in flat areas
- Low contrast → low energy (gain)
- Derivations are in doc/video_pvq.lyx and doc/theoretical_results.lyx
  - Currently wrong/incomplete, will fix
Step 1: compand the gain g
- Goal: Q ∝ g^(2α) (x264 uses α = 0.173)
- Quantize ĝ = (Q_g·ĥ)^β, encode ĥ
  - β = 1/(1 − 2α)
  - Q_g = (Q/β)^β
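A Python sketch of the companding formulas exactly as written on the slide (which the slide itself flags as possibly wrong/incomplete); the Q value is arbitrary:

    ALPHA = 0.173                          # x264's value, per the slide
    BETA = 1.0 / (1.0 - 2.0 * ALPHA)       # beta = 1/(1 - 2*alpha)

    def compand_quantize_gain(g, q):
        """Quantize the gain so the step size grows roughly as g^(2*alpha)."""
        qg = (q / BETA) ** BETA            # Q_g = (Q/beta)^beta
        h = round(g ** (1.0 / BETA) / qg)  # integer h is what gets encoded
        g_hat = (qg * h) ** BETA           # decoder: g_hat = (Q_g * h)^beta
        return h, g_hat

    for g in (1.0, 10.0, 100.0):
        print(g, compand_quantize_gain(g, q=0.5))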

29 Activity Masking, contd.
Step 2: choose the θ resolution
- D = (g − ĝ)² + g·ĝ·(D_θ + sin θ · sin θ̂ · D_pvq)
  - D_θ = 2 − 2·cos(θ − θ̂) = distortion due to θ quantization
  - D_pvq = distortion due to PVQ
- Assume g = ĝ, ignore D_pvq...
- Q_θ = (dĝ/dĥ)/ĝ = β/ĥ
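Continuing the companding sketch above, the slide's Q_θ = β/ĥ makes the angle steps coarser at low coded gains, which is the masking effect (a direct transcription of the formula, under the same caveat that the derivation is flagged as incomplete):

    ALPHA = 0.173
    BETA = 1.0 / (1.0 - 2.0 * ALPHA)

    def theta_step(h_hat):
        """Angle resolution from the slide: Q_theta = (d g_hat/d h)/g_hat = beta/h."""
        return BETA / h_hat

    # Small coded gain index -> coarse theta steps; large -> fine steps.
    for h_hat in (1, 4, 16):
        print(h_hat, round(theta_step(h_hat), 3))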

