Pyramid Vector Quantization
What is Pyramid Vector Quantization?
- A vector quantizer with a simple algebraic structure
- Used to perform gain-shape quantization
Motivation
Why Vector Quantization? 3 classic advantages (Lookabaugh et al. 1989):
- Space-filling advantage: VQ codepoints tile space more efficiently
  - Example in 2-D: squares vs. hexagons
  - Maximum possible gain for large dimension: 1.53 dB
- Shape advantage: VQ can use more points where the PDF is higher
  - 1.14 dB gain for a 2-D Gaussian, 2.81 dB for high dimension
  - Can be mitigated with entropy coding
- Memory advantage: exploit statistical dependence between vector components
  - Transform coefficients are not strongly correlated
Why Vector Quantization?
- Important: the space-filling advantage applies even when values are totally uncorrelated
- Another important advantage: codebooks can have less than 1 bit per dimension
Why Algebraic VQ?
- Trained VQ is impractical for high rates and large dimensions
  - High dimension → large lookup tables, lots of memory
  - Exponential in bitrate
  - No codebook structure → slow search
- "Algebraic" VQ solves these problems
  - Structured codebook: no lookup tables, fast search
  - The optimal space-filling lattice for arbitrary dimension is unknown: have to approximate
  - PVQ is asymptotically optimal for Laplacian sources
Why Gain-Shape Quantization?
- Separate the "gain" (energy) from the "shape" (spectrum)
  - Vector = magnitude × unit vector (point on a sphere)
- Potential advantages:
  - Can give each piece a different rate allocation
  - Preserve energy (contrast) instead of low-passing
    - A scalar quantizer can only add energy by coding ±1's
  - Implicit activity masking
    - Quantization resolution can be derived from the explicitly coded energy
  - Better representation of coefficients
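The gain-shape split itself is just a norm and a normalization. A minimal Python illustration (not Daala's C implementation):

```python
import math

def gain_shape(x):
    """Split a vector into a gain (energy) and a unit-norm shape."""
    g = math.sqrt(sum(v * v for v in x))  # gain: L2 norm
    if g == 0:
        return 0.0, [0.0] * len(x)
    return g, [v / g for v in x]          # shape: point on the unit sphere

g, s = gain_shape([3.0, 4.0])
# g == 5.0, s == [0.6, 0.8]; reconstruct as [g * v for v in s]
```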
How it Works (High-Level)
Simple Case: PVQ without a Predictor
- Scalar quantize the gain
- Place K unit pulses in N dimensions
  - Up to N = 1024 dimensions for large blocks
  - Only N−1 degrees of freedom
  - Normalize to unit norm
- K is derived implicitly from the gain
  - Can also code K and derive the gain
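The pulse placement can be done with a greedy search that adds one pulse at a time wherever it most increases correlation with the input. A sketch of the idea, not Daala's optimized implementation:

```python
import math

def pvq_quantize(x, k):
    """Greedily place k unit pulses in len(x) dimensions so the pulse
    vector y maximizes correlation with x, then normalize to unit norm."""
    n = len(x)
    y = [0] * n     # pulse counts (signs applied at the end)
    yy = 0.0        # ||y||^2
    xy = 0.0        # <|x|, y>
    for _ in range(k):
        best_i, best_score = 0, -1.0
        for i in range(n):
            # squared correlation if one more pulse goes in dimension i
            num = xy + abs(x[i])
            den = yy + 2.0 * y[i] + 1.0
            score = num * num / den
            if score > best_score:
                best_i, best_score = i, score
        xy += abs(x[best_i])
        yy += 2.0 * y[best_i] + 1.0
        y[best_i] += 1
    norm = math.sqrt(yy)
    return [math.copysign(y[i], x[i]) / norm if norm else 0.0
            for i in range(n)]
```

For x = [3, −4] and K = 5 this places 2 and 3 pulses respectively, giving a unit vector close to the input's shape.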
Codebook for N=3 and different K
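The size of this codebook (integer vectors whose absolute values sum to K) follows Fischer's classic recurrence. A small sketch; `pvq_codebook_size` is an illustrative name:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def pvq_codebook_size(n, k):
    """Number of PVQ codewords: integer vectors y of dimension n
    with sum(|y_i|) == k (Fischer's recurrence)."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return (pvq_codebook_size(n - 1, k)
            + pvq_codebook_size(n, k - 1)
            + pvq_codebook_size(n - 1, k - 1))

# e.g. N=3: K=1 -> 6 codewords, K=2 -> 18, K=5 -> 102
```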
PVQ vs. Scalar Quantization
PVQ with a Predictor
- Video provides us with useful predictors
- We want to treat vectors in the direction of the prediction as "special"
  - They are much more likely!
- Subtracting the prediction and coding the residual would lose energy preservation
- Solution: align the codebook axes with the prediction, and treat one dimension differently
2-D Projection Example
[Figure: input and prediction vectors in 2-D]
- Input + prediction
- Compute Householder reflection
- Apply reflection
- Compute & code angle θ
- Code other dimensions
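The reflect-and-measure-angle steps can be sketched in plain Python; `householder_reflect` is a hypothetical helper, not Daala's implementation:

```python
import math

def householder_reflect(x, r):
    """Align the codebook with the prediction r via a Householder
    reflection H = I - 2*v*v^T/||v||^2, then compute the angle theta
    between the input x and the prediction."""
    n = len(r)
    # reflect r onto the signed axis e_m of its largest component (stability)
    m = max(range(n), key=lambda i: abs(r[i]))
    s = 1.0 if r[m] >= 0 else -1.0
    v = list(r)                                   # v = r + s*||r||*e_m
    v[m] += s * math.sqrt(sum(c * c for c in r))
    vv = sum(c * c for c in v)
    proj = sum(v[i] * x[i] for i in range(n))
    z = [x[i] - 2.0 * proj / vv * v[i] for i in range(n)]  # z = H x
    # H maps r to -s*||r||*e_m, so the angle to the prediction is:
    gx = math.sqrt(sum(c * c for c in x))
    cos_t = max(-1.0, min(1.0, -s * z[m] / gx)) if gx else 1.0
    return z, math.acos(cos_t)
```

For x = [1, 1] and prediction r = [1, 0], the input is reflected to [−1, 1] and θ comes out as π/4, the 45° angle between the two.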
What does this accomplish?
- Creates another "intuitive" parameter, θ: "How much like the predictor are we?"
  - θ = 0 → use the predictor exactly
- θ determines how many pulses go in the "prediction" direction
  - K (and thus bitrate) for the remaining N−1 dimensions is adjusted down
- The remaining N−1 dimensions have N−2 degrees of freedom (no redundancy)
- Can repeat for more predictors
Details...
Band Structure
- DC coded separately with scalar quantization
- AC coefficients grouped into bands
  - Gain, theta, etc. signaled separately for each band
  - Layout is ad hoc for now
- Scan order in each band optimized for decreasing average variance
Band Structure
[Figure: band layouts for 4x4, 8x8, and 16x16 blocks]
- Scan order is possibly over-fit...
To Predict or Not to Predict...
- θ ≥ π/2 → prediction is not helping
  - Could code large θ's, but it doesn't seem that useful
  - Need to handle zero predictors anyway
- Current approach: code a "noref" flag
  - Currently jointly code up to 4 flags at once, with a fixed order-0 probability per band (5% of keyframe rate)
- Patches in review cut this down a lot:
  - Force noref=1 when the predictor is zero in keyframes
  - Separate probabilities for each block size
  - Adapt the probabilities
Quantization Matrix
- Simple approach (what we're doing now):
  - Separate quantization resolution for each band
  - Keep flat quantization within bands
- Advanced approach?
  - Scaling after normalization is complicated
    - Unit pulses are no longer "unit" (how do they sum to K?)
    - Householder reflection scrambles things further
  - Better(?): pre-scale the vector by the quantization factors
    - Effects on energy preservation?
Quantization Matrix Example
- Flat quantizer (base Q=35) vs. adjusted per-band (base Q=23)
- Metrics: +15% PSNR, +12% SSIM, −18% PSNR-HVS
Activity Masking
- Goal: use better resolution in flat areas
  - Low contrast → low energy (gain)
- Derivations in doc/video_pvq.lyx, doc/theoretical_results.lyx
  - Currently wrong/incomplete, will fix
- Step 1: Compand the gain g
  - Goal: Q ∝ g^(2α) (x264 uses α = 0.173)
  - Quantize ĝ = Q_g·ĥ^β, encode ĥ
  - β = 1/(1 − 2α)
  - Q_g = (Q/β)^β
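Step 1 can be sketched directly from one consistent reading of the slide's companding (ĝ = Q_g·ĥ^β, β = 1/(1 − 2α), Q_g = (Q/β)^β); `compand_gain` is an illustrative helper, not Daala code:

```python
def compand_gain(g, q, alpha=0.173):
    """Quantize gain g with step size proportional to q * g**(2*alpha)
    by companding: encode the integer index h, reconstruct
    g_hat = qg * h**beta."""
    beta = 1.0 / (1.0 - 2.0 * alpha)
    qg = (q / beta) ** beta
    h = round((g / qg) ** (1.0 / beta))   # companded index to encode
    g_hat = qg * h ** beta                # decoder reconstruction
    return h, g_hat
```

For g = 100 at q = 10 this codes a small index (h = 3) yet reconstructs well within the local step size q·g^(2α) ≈ 49.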
Activity Masking (cont'd)
- Step 2: Choose the θ resolution
  - D = (g − ĝ)² + g·ĝ·(D_θ + sin θ · sin θ̂ · D_pvq)
  - D_θ = 2 − 2·cos(θ − θ̂): distortion due to θ quantization
  - D_pvq: distortion due to PVQ
- Assume g = ĝ, ignore D_pvq...
  - Q_θ = (dĝ/dĥ)/ĝ = β/ĥ
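Under the same simplifications (g = ĝ, D_pvq ignored), Q_θ = β/ĥ implies how finely θ ∈ [0, π/2] can be coded for a given gain index. A rough sketch; `theta_steps` is a hypothetical helper, not Daala's actual rule:

```python
import math

def theta_steps(h, beta):
    """Number of theta quantization levels in [0, pi/2] when the
    step size is Q_theta = beta / h (h is the coded gain index).
    Larger coded gain -> finer angle resolution."""
    if h == 0:
        return 0                      # zero gain -> no angle to code
    q_theta = beta / h
    return int(math.floor((math.pi / 2) / q_theta)) + 1
```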