Transform Coding Heejune AHN Embedded Communications Laboratory Seoul National Univ. of Technology Fall 2013 Last updated 2013. 9. 30
Agenda
- Transform Coding Concept
- Transform Theory Review
- DCT (Discrete Cosine Transform)
- DCT in Video Coding
- DCT Implementation & Fast Algorithms
- Appendix: KL Transform
1. Transform Coding
- X1 = lum(2n), X2 = lum(2n+1): neighboring pixels
  - X1 ~ U(0, 255), X2 ~ U(0, 255)
  - X1 and X2 are strongly cross-correlated, so quantizing them separately encodes almost the same data twice
- Y1, Y2: 45-degree rotation of (X1, X2), up to scaling
  - Y1 = (X1 + X2) / 2: average, or DC value
  - Y2 = (X2 - X1) / 2: difference, or AC value
  - Y1 ~ F(0, 255), Y2 ~ F(-127.5, 127.5)
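The effect of the sum/difference transform can be checked numerically. The sketch below is only an illustration: the test signal (a slowly drifting synthetic scan line), its noise level, and all variable names are assumptions, not part of the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical test signal: a slowly varying scan line of luminance values.
# Each pixel is the previous one plus small noise, clipped to [0, 255].
line = [128.0]
for _ in range(1999):
    nxt = line[-1] + random.gauss(0.0, 4.0)
    line.append(min(255.0, max(0.0, nxt)))

# Pair up neighbors: X1 = lum(2n), X2 = lum(2n+1).
x1 = line[0::2]
x2 = line[1::2]

# Sum/difference transform from the slide.
y1 = [(a + b) / 2 for a, b in zip(x1, x2)]   # average (DC)
y2 = [(b - a) / 2 for a, b in zip(x1, x2)]   # difference (AC)

var_x1 = statistics.pvariance(x1)
var_x2 = statistics.pvariance(x2)
var_y1 = statistics.pvariance(y1)
var_y2 = statistics.pvariance(y2)

print(f"var(X1)={var_x1:.1f}  var(X2)={var_x2:.1f}")
print(f"var(Y1)={var_y1:.1f}  var(Y2)={var_y2:.1f}")
```

For a correlated input like this one, almost all of the variance ends up in Y1, so Y2 can be quantized with far fewer levels than X1 or X2.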
Which ones are easier to encode (quantize)?
[Figure: distributions f(X1), f(X2) versus f(Y1), f(Y2)]
Origins of Transform Coding Benefits
- Signal theory
  - Makes the representation easier to manipulate
  - Energy concentration
- Image and HVS (human visual system) properties
  - The HVS is more sensitive to low frequencies
  - Use a finer quantizer for low frequencies
[Photo: Vilfredo Pareto, economist, 1848-1923]
2. Transform Theory Review
- Definition of a transform
  - M-to-N mapping: [Y1, Y2, ..., YN] = F[X1, X2, ..., XM]
- Linear transform (cf. non-linear transform)
  - If [Y11, Y12] = F[X11, X12] and [Y21, Y22] = F[X21, X22],
    then [Y11 + Y21, Y12 + Y22] = F[X11 + X21, X12 + X22]
- Matrix representation of a linear transform
  - Forward: $y = T x$
  - Inverse: $x = T^{-1} y$
  - y: N transform coefficients, arranged as a vector
  - T: transform matrix of size N x N
  - x: input signal block of size N, arranged as a vector
Basis Vectors
- Basis vectors V1, V2, ..., VN (the rows of T)
  - Orthogonal: Vl . Vm = 0 for l != m, i.e. the basis vectors are mutually perpendicular
  - Orthonormal: additionally ||Vl|| = 1 for every basis vector
- For an orthonormal transform, $T^{-1} = T^T$, so $x = T^{-1} y = T^T y$
- Parseval's theorem: signal energy is conserved between the signal domain and the transform domain
  - $\|y\|^2 = y^T y = x^T T^T T x = \|x\|^2$
Example of Orthonormal transform
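As one possible example, the sketch below uses a 2-point rotation by 45 degrees. The matrix, the pixel values, and the helper names are assumptions chosen for illustration; the example originally shown on this slide is not reproduced here.

```python
import math

# 2-point orthonormal transform: a 45-degree rotation (assumed example).
s = 1 / math.sqrt(2)
T = [[s, s],
     [-s, s]]

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def energy(v):
    """Squared Euclidean norm ||v||^2."""
    return sum(x * x for x in v)

x = [100.0, 104.0]                 # two neighboring pixel values (made up)
y = matvec(T, x)                   # forward: y = T x
x_rec = matvec(transpose(T), y)    # inverse: x = T^T y, since T^-1 = T^T

print("y      =", y)
print("x_rec  =", x_rec)
print("energy =", energy(x), energy(y))
```

The two energies agree up to rounding, which is Parseval's theorem for this T, and applying the transpose recovers the input.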
2D Transform
- Data
  - 2D pixel value matrix, 2D transform coefficient matrix
  - 2D matrix => 1D vector
- Forward transform: $y = T x$
- Inverse transform: $x = T^{-1} y$
  - y: N x N transform coefficients, arranged as a vector
  - T: transform matrix of size N^2 x N^2
  - x: input signal block of size N x N, arranged as a vector
3. Transforms
Various transforms in image compression
- DFT (Discrete Fourier Transform)
- DCT (Discrete Cosine Transform)
- DST (Discrete Sine Transform)
- Hadamard Transform
- Discrete Wavelet Transform
- and more (Haar, etc.)
Hadamard Transform
- Core matrix
- 1-dimensional
- N-dimensional
- 2-dimensional transform
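A minimal sketch of how a Hadamard matrix can be grown from a 2x2 core matrix, assuming the usual Sylvester construction with core [[1, 1], [1, -1]] and no normalization. The function names and the 4x4 test block are hypothetical.

```python
def hadamard(n):
    """Return the n x n Hadamard matrix (n a power of two), unnormalized."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = [[1]]
    while len(H) < n:
        # Sylvester step: H_2m = [[H_m, H_m], [H_m, -H_m]]
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

H4 = hadamard(4)
for row in H4:
    print(row)

# 2-D transform of a 4x4 block X: Y = (H X H) / 4.
# H is symmetric, and the factor 1/4 makes the 2-D transform orthonormal for N = 4.
X = [[r * 4 + c for c in range(4)] for r in range(4)]
Y = [[v / 4 for v in row] for row in matmul(matmul(H4, X), H4)]

# The inverse uses the same matrix and scaling.
X_rec = [[v / 4 for v in row] for row in matmul(matmul(H4, Y), H4)]
print("DC coefficient:", Y[0][0])
```

Because every entry is +1 or -1, the transform needs only additions and subtractions apart from the final scaling.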
DCT Transform
- 1D forward DCT (pixel domain to frequency domain)
- 1D inverse DCT (frequency domain to pixel domain)
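For reference, one common way to write the two transforms is the orthonormal DCT-II and its inverse. This form is an assumption; the symbols $x_n$, $Y_k$, and $c_k$ are chosen here for illustration, and the slide's own normalization may differ.

```latex
% Forward 1D DCT: pixels x_n -> coefficients Y_k
Y_k = c_k \sum_{n=0}^{N-1} x_n \cos\frac{(2n+1)k\pi}{2N}, \qquad k = 0, 1, \dots, N-1

% Inverse 1D DCT: coefficients Y_k -> pixels x_n
x_n = \sum_{k=0}^{N-1} c_k \, Y_k \cos\frac{(2n+1)k\pi}{2N}, \qquad n = 0, 1, \dots, N-1

% Normalization that makes the transform orthonormal
c_0 = \sqrt{1/N}, \qquad c_k = \sqrt{2/N} \quad (k \ge 1)
```

With this normalization the transform matrix satisfies $T^{-1} = T^T$, so the inverse reuses the same cosine kernel.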
2D DCT
- 2D DCT basis functions
- Coefficient distribution: DC ~ uniform distribution, AC ~ Laplacian distribution
Properties
- Orthonormal transform
- Separable transform
- Real-valued coefficients
- DCT performance
  - Closely resembles the KLT for image input
  - Image input model (first-order Markov chain): $x_{n+1} = \rho x_n + e_n$
- DCT complexity
  - 2D DCT = 1D DCT in the vertical direction followed by 1D DCT in the horizontal direction
  - Not extended to 3D (because of delay and memory size)
  - DCT size (4x4, 8x8, 16x16, 32x32, ...)
  - Larger: better performance, but blocking artifacts (?) and higher HW complexity
Coding Performance of DCT
Comparison of 1-D basis functions for block size N = 8:
- Karhunen-Loève transform [1948/1960]
- Haar transform [1910]
- Walsh-Hadamard transform [1923]
- Slant transform [Enomoto, Shibata, 1971]
- Discrete Cosine Transform (DCT) [Ahmed, Natarajan, Rao, 1974]
Energy concentration
- Performance measured for typical natural images, block size 1x32
- KLT is optimum
- DCT performs only slightly worse than KLT
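The claim can be probed on a model source instead of real images. The sketch below assumes a unit-variance first-order Markov source with correlation 0.95 and block size 8 (both values are assumptions), and compares the energy captured by the first DCT coefficient with that of the first KLT coefficient, obtained by power iteration.

```python
import math

N, RHO = 8, 0.95   # block size and correlation coefficient (assumed values)

# Covariance of a unit-variance first-order Markov source: R[i][j] = rho^|i-j|
R = [[RHO ** abs(i - j) for j in range(N)] for i in range(N)]

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    return [[math.sqrt((1 if k == 0 else 2) / n) *
             math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
            for k in range(n)]

def quad_form(v, M):
    """Variance v^T M v of the coefficient produced by basis vector v."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

def largest_eigenvalue(M, iters=500):
    """Power iteration; the result is the variance of the first KLT coefficient."""
    v = [1.0] * len(M)
    for _ in range(iters):
        w = [sum(m * x for m, x in zip(row, v)) for row in M]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return quad_form(v, M)

dct_vars = [quad_form(row, R) for row in dct_matrix(N)]
klt_first = largest_eigenvalue(R)
total = sum(R[i][i] for i in range(N))   # total energy = trace of R

print("DCT coefficient variances:", [round(v, 3) for v in dct_vars])
print(f"energy share of first DCT coefficient: {dct_vars[0] / total:.4f}")
print(f"energy share of first KLT coefficient: {klt_first / total:.4f}")
```

On this model the first KLT coefficient can never carry less energy than the first DCT coefficient, and the two shares come out close to each other.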
Complexity Performance of DCT
- Separation of the 2D DCT
  - Cascade of 1-D DCTs: N row-wise N-point transforms, then N column-wise N-point transforms
  - NxN block of pixels => NxN block of transform coefficients
  - Reduces the number of multiplications from O(N^4) to O(N^3)
- 8x8 DCT
  - Direct: 64 multiplications for each of the 64 coefficients (64 x 64 = 4096)
  - Separable: 2 passes x 64 coefficients x 8 multiplications (2 x 64 x 8 = 1024)
  - Can you derive this?
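The multiplication counts can be derived by writing both forms out. The sketch below assumes the orthonormal DCT-II kernel and a made-up 8x8 pixel block; it checks that the row-column cascade gives the same coefficients as the direct double sum, and prints the two counts.

```python
import math

N = 8

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    return [[math.sqrt((1 if k == 0 else 2) / n) *
             math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
            for k in range(n)]

C = dct_matrix(N)

def dct_1d(v):
    """N-point transform of one row or column: N * N multiplications."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]

def dct_2d_separable(block):
    rows_done = [dct_1d(r) for r in block]                  # N row transforms
    cols_done = [dct_1d(col) for col in zip(*rows_done)]    # N column transforms
    return [list(r) for r in zip(*cols_done)]               # undo the transpose

def dct_2d_direct(block):
    """Y[u][v] = sum over i, j of C[u][i] * C[v][j] * X[i][j]."""
    return [[sum(C[u][i] * C[v][j] * block[i][j]
                 for i in range(N) for j in range(N))
             for v in range(N)] for u in range(N)]

block = [[(7 * i + 13 * j) % 256 for j in range(N)] for i in range(N)]  # made-up pixels
Y_sep = dct_2d_separable(block)
Y_dir = dct_2d_direct(block)
max_diff = max(abs(a - b) for ra, rb in zip(Y_sep, Y_dir) for a, b in zip(ra, rb))

# Counts assume the 2-D kernel C[u][i] * C[v][j] is precomputed for the direct form.
mults_direct = N ** 4          # N*N coefficients, N*N products each
mults_separable = 2 * N ** 3   # 2N one-dimensional transforms, N*N products each
print(max_diff, mults_direct, mults_separable)
```

For N = 8 this is 4096 multiplications against 1024, before any fast 1-D algorithm is applied.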
4. Transform in Image Coding
Transform coding procedure
- Transform T(x): usually invertible
- Quantization: not invertible, introduces distortion
- Combination of entropy encoder and decoder: lossless
DCT in Image Coding
- Encoder: original 8x8 block => DCT => transformed 8x8 block => Q => zig-zag scan => run-level coding => transmission
- Decoder: run-level decoding => inverse zig-zag scan => scaling and inverse DCT => reconstructed 8x8 block
DCT in Image Coding
- Uniform deadzone quantizer
  - Transform coefficients that fall below a threshold are discarded.
- Entropy coding
  - Positions of non-zero transform coefficients are transmitted in addition to their amplitude values.
  - Efficient encoding of the positions of non-zero transform coefficients: zig-zag scan + run-level coding
- Baseline JPEG does not use a deadzone
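The quantize, scan, and run-level steps can be sketched as follows. This is a toy illustration under stated assumptions: the deadzone is modeled by truncation toward zero, the block is 4x4 with made-up coefficient values, a single step size is used for all coefficients, and the "EOB" end-of-block marker is a hypothetical stand-in for whatever a real codec signals.

```python
def deadzone_quantize(coef, step):
    """Uniform deadzone quantizer: truncate toward zero, so |coef| < step maps to 0."""
    return int(coef / step)

def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                       # s = row + col picks one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 else diag[::-1])  # alternate the direction
    return order

def run_level(seq):
    """Encode a list as (run of zeros, non-zero level) pairs plus an end-of-block marker."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")                              # trailing zeros are implied
    return pairs

# Made-up 4x4 block of transform coefficients, largest values at low frequencies.
coefs = [[620.0, -48.0, 12.0,  3.0],
         [ 35.0, -27.0,  4.0, -2.0],
         [ -9.0,   6.0,  1.0,  0.5],
         [  2.0,  -1.0,  0.3,  0.1]]
STEP = 10

levels = [[deadzone_quantize(c, STEP) for c in row] for row in coefs]
scanned = [levels[r][c] for r, c in zigzag_order(4)]
pairs = run_level(scanned)

print(scanned)
print(pairs)
```

The zig-zag scan groups the non-zero levels at the front of the sequence, so the long tail of zeros collapses into the single end-of-block marker.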
DCT Examples
Note that only a few coefficients have sizable values.
DCT coding with increasingly coarse quantization, block size 8x8
[Figure panels: quantizer step size for AC coefficients = 25, 100, 200]
5. Implementation
- Implementation issue: HW or SW
  - Computational cost, speed, implementation size
  - Performance, cost, implementation complexity
- SW implementation decision factors
  - Computational cost of multiplication
  - Fixed-point or floating-point operation (esp. multiplication)
  - Special coprocessors and instruction sets (e.g. MMX)
Fast DCT Algorithm
- Original DCT/IDCT
  - Computation load: 8 multiplications + 8 (7) additions per coefficient (from the equation), i.e. 64 multiplications + 64 additions for an 8-point transform
  - Scaling: input range [0, 255] => output range [-2024, 2024]
- Fast DCT
  - Similar to the fast DFT
  - Shares the same computations between nodes
  - O(N^2) => O(N log2 N), where N is the width (number of coefficients) and log2 N is the number of steps of the algorithm
  - Several versions: Chen, Lee, Arai, etc.
Chen’s FDCT See Code at http://www.cmlab.csie.ntu.edu.tw/~chenhsiu/tech/fastdct.cpp
How does the fast algorithm work?
- It exploits the symmetry of the cosine function.
[Figure: Step 1 and Step 2 of the decomposition]
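The symmetry argument can be made concrete with a single decomposition step. The sketch below is not Chen's algorithm; it is only an assumed first butterfly stage for an unnormalized DCT-II, with hypothetical function names and made-up input samples. Sums of mirrored samples feed the even coefficients (an N/2-point DCT), and differences feed the odd coefficients.

```python
import math

def dct_direct(x):
    """Unnormalized DCT-II straight from the definition: N * N multiplications."""
    n = len(x)
    return [sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n))
            for k in range(n)]

def dct_one_stage(x):
    """Same coefficients after one symmetry (butterfly) step; len(x) must be even."""
    n = len(x)
    half = n // 2
    s = [x[i] + x[n - 1 - i] for i in range(half)]   # cos is symmetric for even k
    d = [x[i] - x[n - 1 - i] for i in range(half)]   # cos is antisymmetric for odd k
    out = [0.0] * n
    even = dct_direct(s)                             # N/2-point DCT of the sums
    for m in range(half):
        out[2 * m] = even[m]
    for k in range(1, n, 2):                         # odd coefficients from the differences
        out[k] = sum(d[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(half))
    return out

x = [52, 55, 61, 66, 70, 61, 64, 73]                 # made-up samples
ref = dct_direct(x)
fast = dct_one_stage(x)
max_diff = max(abs(a - b) for a, b in zip(ref, fast))
print(max_diff)
```

For N = 8 this one step already halves the multiplications (16 for the even part and 16 for the odd part instead of 64); repeating the idea on the smaller transforms is what leads toward the N log2 N behavior.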
HW Implementation
- 2D DCT using a 1D DCT function block
[Block diagram: input samples, MUX, 1-D DCT, 8x8 RAM (row-order input, column-order output), output coefficients]
Distributed Arithmetic DCT
- Multiplier-less architecture
- Lookup, shift, and accumulators only
[Block diagram: 4 bits from u input => LUT (ROM) => add or subtract => accumulator with shift (2^-1) => output coefficient Fx]
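A software model can show why lookup, shift, and accumulate are enough for an inner product with fixed coefficients. The sketch below is a simplification under stated assumptions: inputs are unsigned 8-bit integers, the four integer coefficients are hypothetical fixed-point values, and bit planes are processed LSB first with left shifts, whereas hardware typically shifts the accumulator right and subtracts the sign-bit plane for signed inputs.

```python
def build_lut(coefs):
    """LUT[p] = sum of the coefficients selected by the set bits of pattern p."""
    n = len(coefs)
    return [sum(c for i, c in enumerate(coefs) if (p >> i) & 1)
            for p in range(1 << n)]

def da_inner_product(lut, xs, bits=8):
    """Compute sum(c_i * x_i) for unsigned `bits`-bit integers xs without multiplying."""
    acc = 0
    for b in range(bits):                        # one bit plane per cycle, LSB first
        pattern = 0
        for i, x in enumerate(xs):
            pattern |= ((x >> b) & 1) << i       # bit b of every input forms the address
        acc += lut[pattern] << b                 # shift and accumulate
    return acc

coefs = [91, 75, 50, 18]        # hypothetical fixed-point coefficients
xs = [200, 17, 255, 3]          # made-up unsigned 8-bit inputs

lut = build_lut(coefs)          # 2^4 = 16 entries, computed once (the ROM contents)
result = da_inner_product(lut, xs)
expected = sum(c * x for c, x in zip(coefs, xs))
print(result, expected)
```

The table grows as 2^n with the number of inputs n, which is one reason for feeding only a few input bits (4 here) to each ROM.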
IDCT Mismatch
- DCT x IDCT = I ?
  - The DCT is defined in "floating point" and "direct form."
  - An integer implementation induces an error after the inverse DCT.
  - Different fast DCT implementations produce different errors.
- DCT mismatch in MC-DCT
  - The encoder and the decoder end up with different reference images.
  - The error is very small, but it accumulates.
[Diagram: encoder orgE => DCT => Q => VLC, with local reconstruction IQ => IDCT_E => recE; decoder VLD => IQ => IDCT_D => recD; recE and recD should be equal, but they mismatch]
IDCT Mismatch Control
- Minimum accuracy of the DCT algorithm is defined in the SPEC.
- H.261/3, MPEG-1/2
  - Restrict the sum of coefficient values
  - Oddification rule on the sum of all DCT coefficients: adjust the LSB of F[63], the last coefficient
  - Decoder checks and corrects the values
  - Adds random error cancellation
- H.264
  - A (modified) integer DCT is used
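The sum-based oddification rule can be sketched as below. This is an assumed reading of the rule on the slide (force the sum of the 64 dequantized coefficients to be odd by toggling the LSB of the last one); the function name and test blocks are hypothetical, and the exact procedure differs between standards.

```python
def mismatch_control(coefs):
    """Return a copy of the 64 coefficients whose sum is odd.

    If the sum is even, the LSB of the last coefficient F[63] is toggled:
    an odd value is decreased by 1, an even value is increased by 1.
    """
    out = list(coefs)
    if sum(out) % 2 == 0:
        if out[63] % 2:       # F[63] is odd
            out[63] -= 1
        else:                 # F[63] is even
            out[63] += 1
    return out

# Made-up dequantized block in scan order: the sum (106) is even.
block_even = [0] * 64
block_even[0], block_even[1] = 120, -14

# Same block with an odd sum (107): it must pass through unchanged.
block_odd = list(block_even)
block_odd[2] = 1

fixed_even = mismatch_control(block_even)
fixed_odd = mismatch_control(block_odd)
print(sum(fixed_even), fixed_even[63], sum(fixed_odd), fixed_odd[63])
```

Encoder and decoder both apply the same rule before the inverse DCT, so the input to the two IDCT implementations stays identical.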
Appendix: KL Transform, the Optimal Transform
Optimal Transform
- Optimality
  - (No) redundancy in the input signal => (no) redundant quantization result
  - No cross-correlation between different components (coefficients)
- K-L (Karhunen-Loeve) transform
  - Assumption: the input covariance is given
  - Problem definition: find a transform ($Y = T X$) such that $R_{Y,Y} = T R_{X,X} T^T$ is a diagonal matrix (i.e., completely uncorrelated Y)
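Optimal Transform
- Solution
  - Build T with the eigenvectors of $R_{X,X}$ as basis vectors (the rows of T)
  - Then, by the definition of the eigenvectors $v_i$ and eigenvalues $\lambda_i$ of $R_{X,X}$: $R_{X,X} v_i = \lambda_i v_i$
  - So $R_{Y,Y} = T R_{X,X} T^T = \mathrm{diag}(\lambda_1, \dots, \lambda_N)$
- Issues in KLT
  - $R_{X,X}$ varies from image to image: a new T must be calculated and transmitted to the decoder
  - Not separable (vertical, horizontal)
  - But good for benchmarking the performance of other transforms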
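A two-pixel case makes the diagonalization easy to verify by hand. The sketch below assumes two unit-variance neighboring pixels with correlation 0.9 (an assumed value); for that covariance the eigenvectors are known in closed form, so no eigenvalue solver is needed.

```python
import math

RHO = 0.9   # assumed correlation between the two neighboring pixels

# Covariance of X = (X1, X2) with unit variances.
R_x = [[1.0, RHO],
       [RHO, 1.0]]

# Eigenvectors of R_x are (1, 1)/sqrt(2) and (-1, 1)/sqrt(2); use them as the rows of T.
s = 1 / math.sqrt(2)
T = [[s, s],
     [-s, s]]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Covariance of Y = T X.
R_y = matmul(matmul(T, R_x), transpose(T))
for row in R_y:
    print([round(v, 6) for v in row])
```

The off-diagonal entries vanish and the diagonal holds the eigenvalues 1 + rho and 1 - rho, so the KLT for this source is the same 45-degree rotation used for the sum and difference of neighboring pixels.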