Video Processing in Compressed Domain
By Prof. Jayanta Mukhopadhyay
Video Resizing
MPEG Introduction
Frame sequence: INTRA, motion-compensated inter frames, INTRA
Encoding: DCT, Quant., Motion Estimation & Compensation, VLC
Decoding: IDCT, IQuant., Inverse Motion Compensation, VLC
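Compressed Domain (DCT) Processing
Compressed domain: MPEG video → VLC Decoder → Inverse Quantization → 8 x 8 DCT blocks → Processing Box → 8 x 8 DCT blocks → Quantization → VLC Encoder → MPEG video
Spatial domain: the same chain, with an IDCT before and a DCT after the processing stage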
Video Downscaling Approaches
Applications: browsing remote video databases, PIP, video conferencing, transcoding, etc.
Approaches:
- Spatial Domain Technique
- Hybrid (Spatial + DCT) Technique
- Pure DCT Domain Technique
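Spatial Domain Technique
[Block diagram] Decoder side: Input Data, VLC Decoder, Buffer, IDCT + Motion Compensation with Frame Memory, followed by Spatial Downscaling. Encoder side: Frame Memory, prediction subtraction (+/-), DCT, Q, VLC Encoder, Buffer, Output, with a reconstruction loop through Q-1, IDCT, Frame Memory and Motion Estimation & Compensation.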
Computation Complexity for P frames from CIF resolution to QCIF resolution

| Function | Complexity | Mults. | Adds | Shifts |
|---|---|---|---|---|
| Inverse Quant. + IDCT | 144m, 464a per 8x8 block | 228096 | 734976 | |
| Inverse Motion Compensation | 256a per 16x16 block | | 101376 | |
| Downscale by 2 | 3a, 1s per pixel | | 76032 | 25344 |
| Full Search ME | ± 15 pels, 738048a per 16x16 block | | 73066752 | |
| Motion Compensation | 256a per 16x16 block | | 25344 | |
| DCT + Quant. | 144m, 464a per 8x8 block | 57024 | 183744 | |
| Total | | 285120 | 74188224 | 25344 |

Total operation count (Add = 1 op, Shift = 1 op, Mult. = 3 ops) = 75068928
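The totals in the table can be reproduced from the per-block and per-pixel costs. The sketch below assumes the standard frame sizes (CIF = 352x288, QCIF = 176x144) and assumes that the decoder-side steps run on the CIF frame and the encoder-side steps on the QCIF frame; the function and variable names are illustrative only.

```python
# Frame sizes assumed here: CIF = 352x288 (input), QCIF = 176x144 (output).
CIF = (352, 288)
QCIF = (176, 144)


def block_count(frame, size):
    """Number of size x size blocks in a frame given as (width, height)."""
    width, height = frame
    return (width // size) * (height // size)


def spatial_domain_cost():
    """Return (mults, adds, shifts, weighted_total) for one P frame."""
    cif_8, cif_16 = block_count(CIF, 8), block_count(CIF, 16)
    qcif_8, qcif_16 = block_count(QCIF, 8), block_count(QCIF, 16)
    qcif_pixels = QCIF[0] * QCIF[1]
    rows = [
        # (function, mults, adds, shifts)
        ("Inverse Quant. + IDCT", 144 * cif_8, 464 * cif_8, 0),
        ("Inverse Motion Compensation", 0, 256 * cif_16, 0),
        ("Downscale by 2", 0, 3 * qcif_pixels, 1 * qcif_pixels),
        ("Full Search ME", 0, 738048 * qcif_16, 0),
        ("Motion Compensation", 0, 256 * qcif_16, 0),
        ("DCT + Quant.", 144 * qcif_8, 464 * qcif_8, 0),
    ]
    mults = sum(r[1] for r in rows)
    adds = sum(r[2] for r in rows)
    shifts = sum(r[3] for r in rows)
    # Weighting from the slide: Add = 1 op, Shift = 1 op, Mult. = 3 ops.
    return mults, adds, shifts, 3 * mults + adds + shifts


if __name__ == "__main__":
    print(spatial_domain_cost())
```

Almost all of the cost comes from the full-search motion estimation row, which is the step the compressed-domain schemes aim to avoid.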
DCT based Video Resizing
[Flow diagram] Intra path: DCT intra frame, intra DCT blocks, DCT based downscale, downscaled DCT intra frames. Inter path: DCT inter frame and motion vectors, DCT based inverse motion compensation, intra DCT blocks, DCT based downscale, DCT based motion estimation & compensation, downscaled DCT intra & inter frames.
DCT Domain based down-sampling of Intra Frames by a factor of two
Compressed bitstream → Huffman Decoder & Dequantizer → four 8x8 DCT blocks (x1, x2, x3, x4) → one 8x8 DCT block x → Huffman Encoder & Quantizer → compressed bitstream (downscaled)
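Downscaling Technique for an Intra frame (DCT Domain)
[Diagram] Two adjacent blocks b1, b2 of 8 samples each have 8-point DCTs B1, B2. A 4-point IDCT of each gives 4 samples per block; the combined 8 samples form the downscaled block, whose 8-point DCT is B̂. The diagram also shows an upsampling path that produces the upsampled frame.
Computational complexity of downscaling: 1.25m + 1.25a per pixel of the original frame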
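A one-dimensional sketch of this downscaling step is given below. It assumes that the 4-point IDCT is applied to the four low-frequency coefficients of each 8-point DCT block, and that an orthonormal DCT is used, which requires a gain correction of 1/sqrt(2). Both are assumptions about the diagram rather than details stated on the slide, and all function names are illustrative.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix (one basis vector per row)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows


def dct(samples):
    n = len(samples)
    m = dct_matrix(n)
    return [sum(m[k][j] * samples[j] for j in range(n)) for k in range(n)]


def idct(coeffs):
    n = len(coeffs)
    m = dct_matrix(n)
    return [sum(m[k][j] * coeffs[k] for k in range(n)) for j in range(n)]


def downscale_pair(big_b1, big_b2):
    """8-point DCTs of two adjacent blocks -> 8-point DCT of the half-size block."""
    samples = []
    for block in (big_b1, big_b2):
        # Keep the 4 low-frequency coefficients; 1/sqrt(2) fixes the gain
        # when moving from an 8-point to a 4-point orthonormal transform.
        low = [c / math.sqrt(2.0) for c in block[:4]]
        samples.extend(idct(low))  # 4 spatial samples per input block
    return dct(samples)            # B-hat: DCT of the 8 downscaled samples


if __name__ == "__main__":
    flat = dct([10.0] * 8)
    print(idct(downscale_pair(flat, flat)))  # a flat signal stays flat
```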
DCT Domain based Inverse Motion Compensation (IMC)
[Block diagram] Huffman Decoder and Dequantizer → 8x8 DCT error blocks; DCT Domain Inverse Motion Compensation on the previous frame's DCT domain data → 8x8 DCT blocks; error blocks + predicted blocks → 8x8 DCT intra blocks → Huffman Encoder and Quantizer
DCT Domain based Inverse Motion Compensation (Neri Merhav's Scheme)
The predicted block x̂ of an inter block overlaps four adjacent 8x8 blocks x1, x2, x3, x4 of the reference (intra) frame, with overlap height h and width w; E is the prediction error block.

$$\hat{x} = \sum_{i=1}^{4} c_{i1}\, x_i\, c_{i2} \qquad (1)$$

where $c_{ij}$, $i = 1, \ldots, 4$, $j = 1, 2$ are sparse 8x8 matrices of zeros and ones.

Since $S^t S = I$, where $S$ is the 8-point DCT matrix, expression (1) can be written as

$$S \hat{x} S^t = \sum_{i=1}^{4} S c_{i1} S^t \; S x_i S^t \; S c_{i2} S^t \qquad (2)$$

$S$ can be factorized as $S = D P B_1 B_2 M A_1 A_2 A_3$. Expression (2) can further be written as

$$\hat{X} = S \left[ J_h B_2^t B_1^t P^t D \left( X_1 D P B_1 B_2 J_w^t + X_2 D P B_1 B_2 K_{8-w}^t \right) + K_{8-h} B_2^t B_1^t P^t D \left( X_3 D P B_1 B_2 J_w^t + X_4 D P B_1 B_2 K_{8-w}^t \right) \right] S^t$$

where $J_i = U_i (M A_1 A_2 A_3)^t$ and $K_i = L_i (M A_1 A_2 A_3)^t$, $i = 1, 2, \ldots, 8$.
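Expression (1) can be illustrated in the spatial domain with explicit 0/1 selector matrices. The sketch below assumes that x1, x2, x3, x4 are the top-left, top-right, bottom-left and bottom-right blocks, and that the predicted block covers the last h rows and last w columns of x1. The helper names `upper` and `lower` are illustrative stand-ins for the U and L matrices, not definitions taken from the slides.

```python
SIZE = 8


def upper(n):
    """U_n: copies the last n rows of a block into the top n rows."""
    return [[1 if i < n and j == SIZE - n + i else 0 for j in range(SIZE)]
            for i in range(SIZE)]


def lower(n):
    """L_n: copies the first n rows of a block into the bottom n rows."""
    return [[1 if i >= SIZE - n and j == i - (SIZE - n) else 0
             for j in range(SIZE)] for i in range(SIZE)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def predicted_block(x1, x2, x3, x4, h, w):
    """x_hat = sum over i of c_i1 * x_i * c_i2, using 0/1 selector matrices."""
    u_h, l_h = upper(h), lower(SIZE - h)
    u_w_t, l_w_t = transpose(upper(w)), transpose(lower(SIZE - w))
    terms = [
        matmul(matmul(u_h, x1), u_w_t),  # bottom-right corner of x1
        matmul(matmul(u_h, x2), l_w_t),  # bottom-left corner of x2
        matmul(matmul(l_h, x3), u_w_t),  # top-right corner of x3
        matmul(matmul(l_h, x4), l_w_t),  # top-left corner of x4
    ]
    return [[sum(t[i][j] for t in terms) for j in range(SIZE)]
            for i in range(SIZE)]


if __name__ == "__main__":
    image = [[16 * r + c for c in range(16)] for r in range(16)]

    def block(r0, c0):
        return [row[c0:c0 + SIZE] for row in image[r0:r0 + SIZE]]

    print(predicted_block(block(0, 0), block(0, 8), block(8, 0), block(8, 8), 3, 5)[0])
```

The DCT-domain scheme performs the same selection on DCT blocks; the factorization of S is what makes the transformed selector matrices cheap to apply.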
Computation Complexity of Neri Merhav's IMC

| Matrix | Computations/column |
|---|---|
| J1 | 3m + 6a |
| J2 | 4m + 10a |
| J3 | 5m + 16a |
| J4 | 5m + 19a |
| J5 | 5m + 20a |
| J6 | 5m + 22a |
| J7 | 5m + 24a |
| J8 | 5m + 28a |
| B1/B1t, B2/B2t | 4a |
| S/St | 5m + 29a |

Let w = h = 4. Total computations:
- Six multiplications by B1 or B1t: 6 x 8 x 4 = 192a
- Six multiplications by B2 or B2t: 6 x 8 x 4 = 192a
- Two multiplications by Jw and K8-w, and one by Jh and K8-h: 8 x (3 x (5m + 19a + 5m + 19a)) = 240m + 912a
- One 2D DCT operation: 2 x (8 x (5m + 29a)) = 80m + 464a
Total operations = 320m + 1760a (per 8x8 block)
Operations per pixel = 5m + 27.5a
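Modified IMC technique (MBIMC)
The 16x16 predicted macroblock m' of an inter macroblock (with prediction error E) is taken from a 24x24 region of the reference (intra) frame formed by nine 8x8 blocks x1, …, x9, at row offset r and column offset c.

$$m' = c_r \begin{bmatrix} x_1 & x_2 & x_3 \\ x_4 & x_5 & x_6 \\ x_7 & x_8 & x_9 \end{bmatrix} c_c, \qquad 1 \le r \le 8,\ 1 \le c \le 8$$

where $c_r$ and $c_c$ are row and column selector matrices of size 16x24 and 24x16:

$$c_r = \begin{bmatrix} 0_{16 \times (r-1)} & I_{16} & 0_{16 \times (8-r+1)} \end{bmatrix}$$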
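Macroblock wise IMC in DCT domain

$$M' = \begin{bmatrix} S & 0 \\ 0 & S \end{bmatrix} c_r \begin{bmatrix} S^t & 0 & 0 \\ 0 & S^t & 0 \\ 0 & 0 & S^t \end{bmatrix} \begin{bmatrix} X_1 & X_2 & X_3 \\ X_4 & X_5 & X_6 \\ X_7 & X_8 & X_9 \end{bmatrix} \begin{bmatrix} S & 0 & 0 \\ 0 & S & 0 \\ 0 & 0 & S \end{bmatrix} c_c \begin{bmatrix} S^t & 0 \\ 0 & S^t \end{bmatrix} \qquad (A)$$

where $S$ is the 8-point DCT matrix, which can be factorized as $S = D P B_1 B_2 M A_1 A_2 A_3$.

Using the 8-point DCT matrix factorization, we can represent

$$\begin{bmatrix} S^t & 0 & 0 \\ 0 & S^t & 0 \\ 0 & 0 & S^t \end{bmatrix} = \begin{bmatrix} (M A_1 A_2 A_3)^t & 0 & 0 \\ 0 & (M A_1 A_2 A_3)^t & 0 \\ 0 & 0 & (M A_1 A_2 A_3)^t \end{bmatrix} \begin{bmatrix} B_2^t & 0 & 0 \\ 0 & B_2^t & 0 \\ 0 & 0 & B_2^t \end{bmatrix} \begin{bmatrix} B_1^t & 0 & 0 \\ 0 & B_1^t & 0 \\ 0 & 0 & B_1^t \end{bmatrix} \begin{bmatrix} P^t & 0 & 0 \\ 0 & P^t & 0 \\ 0 & 0 & P^t \end{bmatrix} \begin{bmatrix} D^t & 0 & 0 \\ 0 & D^t & 0 \\ 0 & 0 & D^t \end{bmatrix} = Q^t B_2^t B_1^t P^t D^t$$

where, in the last expression, each factor denotes the corresponding block-diagonal matrix.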
The expression (A) can be written as

$$M' = \begin{bmatrix} S & 0 \\ 0 & S \end{bmatrix} c_r\, Q^t B_2^t B_1^t P^t D^t \begin{bmatrix} X_1 & X_2 & X_3 \\ X_4 & X_5 & X_6 \\ X_7 & X_8 & X_9 \end{bmatrix} D P B_1 B_2 Q\, c_c \begin{bmatrix} S^t & 0 \\ 0 & S^t \end{bmatrix}$$

Let $J_r = c_r Q^t$ and $K_c = Q c_c$, with 1 ≤ r ≤ 8 and 1 ≤ c ≤ 8.
Jr and Kc have similar complexities because they have a similar structure. Multiplication by Jr and Kc can also be implemented efficiently by extending Neri Merhav's approach.
Computation Complexity of the Modified IMC scheme

| Matrix | Computations/column |
|---|---|
| J1 | 10m + 56a |
| J2 | 13m + 58a |
| J3 | 14m + 60a |
| J4 | 15m + 64a |
| J5 | 15m + 66a |
| J6 | |
| J7 | |
| J8 | |
| B1/B1t, B2/B2t | 12a |
| S/St | 5m + 29a |

Let r = c = 5. Total computations:
- Two multiplications of B1 type: 2 x 24 x 12 = 576a
- Two multiplications of B2 type: 2 x 24 x 12 = 576a
- One multiplication each by Jr and Kc: 2 x 24 x (15m + 66a) = 720m + 3168a
- Four 2D DCT operations: 4 x (8 x (5m + 29a)) = 160m + 928a
Total computations = 880m + 5248a (per 16x16 block)
Operations per pixel = 3.43m + 20.5a
27% improvement on Neri Merhav's approach
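The 27% figure can be checked from the two per-block totals. The sketch below assumes that a multiplication is weighted as 3 operations and an addition as 1, the convention used in the system-level complexity tables; the slide does not state which weighting it used for this comparison, so treat that as an assumption.

```python
MULT_WEIGHT = 3  # assumed weighting: Mult. = 3 ops, Add = 1 op


def weighted_ops_per_pixel(mults, adds, pixels):
    """Weighted operation count per pixel for a block of `pixels` pixels."""
    return (MULT_WEIGHT * mults + adds) / pixels


# Totals quoted on the slides.
merhav = weighted_ops_per_pixel(320, 1760, 8 * 8)    # per 8x8 block, w = h = 4
mbimc = weighted_ops_per_pixel(880, 5248, 16 * 16)   # per 16x16 block, r = c = 5

improvement = 1.0 - mbimc / merhav

if __name__ == "__main__":
    print(f"Merhav: {merhav} ops/pixel, MBIMC: {mbimc} ops/pixel")
    print(f"improvement: {100 * improvement:.1f}%")
```

Under this weighting the two schemes cost 42.5 and 30.8125 weighted operations per pixel, which matches the quoted improvement of roughly 27%.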
PSNR difference between Spatial and MBIMC technique
[Plots: Video: flower; Video: susi]
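Integrated Scheme for (IMC + Downscaling): Downsampling Filter
If x1, x2, x3, x4 are adjacent 8x8 spatial domain blocks, the downsampled block x can be computed as

$$x = d \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix} d^t \qquad (B)$$

where d is the 8x16 downsampling filter

$$d = 0.5 \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix}_{8 \times 16}$$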
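Applying d on the left and its transpose on the right averages each 2x2 group of pixels, since each side contributes a factor of 0.5. A minimal spatial-domain sketch of expression (B) follows; the function names are illustrative.

```python
def downsampling_filter(out_size=8):
    """Build d: an out_size x (2*out_size) matrix with 0.5 at columns 2i, 2i+1."""
    return [[0.5 if j in (2 * i, 2 * i + 1) else 0.0
             for j in range(2 * out_size)] for i in range(out_size)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def downsample(block16):
    """x = d * [x1 x2; x3 x4] * d^t for a 16x16 block given as a list of rows."""
    d = downsampling_filter()
    half_rows = matmul(d, block16)           # 8x16: adjacent rows averaged
    return matmul(half_rows, transpose(d))   # 8x8: adjacent columns averaged


if __name__ == "__main__":
    block = [[16 * r + c for c in range(16)] for r in range(16)]
    print(downsample(block)[0])
```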
Using expressions (A) and (B), we can write

$$M'_{8 \times 8} = \mathrm{DCT}\!\left( d \left[\, c_r\, Q^t B_2^t B_1^t P^t D^t \, X \, D P B_1 B_2 Q\, c_c \right]_{16 \times 16} d^t \right)$$

where X is the 3x3 array of DCT blocks X1, X2, X3, … of the reference region.
Let $J_r = d\, c_r Q^t$ and $K_c = Q\, c_c\, d^t$, with 1 ≤ r ≤ 8 and 1 ≤ c ≤ 8.
Jr and Kc have a similar structure and similar computational complexity, but the structure differs depending on whether r (or c) is even or odd.
Computation Complexity of the Integrated Scheme (IMC + Downsampling)

| Matrix | Computations per column |
|---|---|
| J1 | 10m + 34a |
| J2 | 17m + 44a |
| J3 | 14m + 38a |
| J4 | 16m + 43a |
| J5 | 15m + 41a |
| J6 | |
| J7 | |
| J8 | 15m + 44a |

Let r = c = 6. Total computations:
- Two multiplications by B1/B1t: 2 x 12 x 24 = 576a
- Two multiplications by B2/B2t: 2 x 12 x 24 = 576a
- One multiplication by J6: 24 x (17m + 44a) = 408m + 1056a
- One multiplication by K6: 24 x (17m + 44a) = 408m + 1056a
- One 2D-DCT operation: 8 x (5m + 29a) = 40m + 232a
Total computations = 856m + 3496a (per 16x16 block)
Operations per pixel = 3.34m + 13.65a
40% improvement on Neri Merhav's approach
PSNR difference between Spatial and Integrated Scheme
[Plots: Video: Flower; Video: Mobile]
Average PSNR Comparison Chart
(Videos are downscaled from CIF (1.15 Mbps) to QCIF (512 Kbps))
Schemes compared: Spatial Domain Scheme, Neri Merhav Scheme, MBIMC Scheme, Integrated Scheme (column sub-headers: I, P)

| Video | PSNR values, in slide order |
|---|---|
| Susi | 32.32, 32.58, 35.90, 35.85, 33.92 |
| Tennis | 24.04, 23.90, 25.95, 25.53, 23.89 |
| Mobile | 21.03, 22.50, 22.62, 24.30, 22.88 |
| Flower | 21.99, 23.20, 24.07, 25.62, 23.85 |
Video Resizing Motion Vector Re-estimation
Algorithms for Motion Vector Re-estimation
- Adaptive Motion Vector Resampling Technique (AMVR)
- Maximum Average Correlation (MAC)
- Median Method
- Non-Linear Motion Vector Resampling Technique (NLMR)
- And many more…
Video Resizing Comparison of Motion Vector Re-estimation Methods Video : Coastguard Frames : 300 From : CIF (1.15 Mbps) To : QCIF (500 Kbps)
Video Resizing Comparison of Motion Vector Re-estimation Methods Video : Container Frames : 300 From : CIF (1.15 Mbps) To : QCIF (500 Kbps)
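Pure DCT Domain based Proposed System
[Block diagram] Input Data, VLC Decoder, Buffer, Q-1, giving DCT blocks and motion vectors. The motion vectors go to AMVR; the DCT blocks go through the MBIMC scheme to give a DCT frame of intra DCT blocks, followed by DCT downscaling. The downscaled DCT frame is re-encoded with prediction subtraction (+/-), Q (with Q step size), VLC Encoder, Buffer and Output, with a loop through Q-1, MTSS, DCT based Motion Compensation and Frame Memory.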
Computational Complexity of Proposed System
(Conversion of P frame from CIF to QCIF)

| Function | Complexity | Mults. | Adds | Shifts |
|---|---|---|---|---|
| Inverse Quant. | 64m per 8x8 block | 101376 | | |
| MBIMC | 3.43m, 20.5a per pixel | 347720 | 2078208 | |
| DCT downscale by 2 | 1.25m, 1.25a per pixel | 126720 | 126720 | |
| AMVR | 9m, 30a, 1 shift per 16x16 block | 891 | 2970 | 99 |
| DCT domain MC | 3.43m, 20.5a per pixel | 86930 | 519552 | |
| Quant. | 64m per 8x8 block | 25344 | | |
| Total | | 688981 | 2727450 | 99 |

Total operation count (Add = 1 op, Shift = 1 op, Mult. = 3 ops) = 4794492
16 times faster than the Spatial Domain Method
Comparison of Pure DCT and Hybrid System
Avg. PSNR (Y, U, V): Hybrid (24.2630, 32.4528, 32.4523); Pure DCT (25.7454, 42.2119, 43.1655)
Comparison of Pure DCT and Spatial System
Avg. PSNR (Y, U, V): Spatial (25.1723, 32.5405, 32.5595); Pure DCT (25.7454, 42.2119, 43.1655)
Optimization to Pure DCT based proposed system (Utilizing the sparseness of DCT blocks)
(Conversion of P frame from CIF to QCIF)

| Function | Complexity | Mults. | Adds | Shifts |
|---|---|---|---|---|
| Inverse Quant. | 64m per 8x8 block | 101376 | | |
| MBIMC (assuming only 16 non-zero coeff.) | 0.9m, 6.8a per pixel | 91238 | 689357 | |
| DCT downscale by 2 | 1.25m, 1.25a per pixel | 126720 | 126720 | |
| AMVR | 9m, 30a, 1 shift per 16x16 block | 891 | 2970 | 99 |
| DCT domain MC | 0.9m, 6.8a per pixel | 22810 | 172339 | |
| Quant. | 64m per 8x8 block | 25344 | | |
| Total | | 368379 | 991386 | 99 |

Total operation count (Add = 1 op, Shift = 1 op, Mult. = 3 ops) = 2096622
36 times faster than the Spatial Domain Method
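The quoted speed-up factors follow from the weighted totals of the three systems. The sketch below uses only the Total rows reported on the slides (spatial domain, pure DCT, optimized pure DCT); the dictionary and function names are illustrative.

```python
def weighted_total(mults, adds, shifts):
    """Operation count with Add = 1 op, Shift = 1 op, Mult. = 3 ops."""
    return 3 * mults + adds + shifts


# (mults, adds, shifts) from the Total row of each complexity table.
TOTALS = {
    "spatial": (285120, 74188224, 25344),
    "pure_dct": (688981, 2727450, 99),
    "optimized_pure_dct": (368379, 991386, 99),
}

ops = {name: weighted_total(*row) for name, row in TOTALS.items()}
speedup = {name: ops["spatial"] / count for name, count in ops.items()}

if __name__ == "__main__":
    for name in ops:
        print(f"{name}: {ops[name]} ops, {speedup[name]:.2f}x vs spatial")
```

The ratios come out at about 15.7 and 35.8, which the slides round to 16 and 36.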
Comparison of Optimized Pure DCT and Hybrid System
Avg. PSNR (Y, U, V): Hybrid (24.2630, 32.4528, 32.4523); Optimized Pure DCT (25.1310, 42.2768, 43.2303)
Comparison of Optimized Pure DCT and Spatial System
Avg. PSNR (Y, U, V): Spatial (25.1723, 32.5405, 32.5595); Optimized Pure DCT (25.1310, 42.2768, 43.2303)
Average PSNR Comparison Chart
(Videos are downscaled from CIF (1.15 Mbps) to QCIF (512 Kbps))

| Video | Spatial Domain Method (Y / U / V) | Hybrid Domain Method (Y / U / V) | DCT Domain Method, assuming 16 non-zero coeff. (Y / U / V) |
|---|---|---|---|
| Coastguard | 25.17 / 32.54 / 32.55 | 24.26 / 32.45 / 32.45 | 25.13 / 42.22 / 43.16 |
| Foreman | 28.61 / 32.20 / 32.08 | 28.29 / 32.00 / 31.92 | 29.42 / 40.09 / 41.09 |
| Susi | 33.90 / 32.94 / 32.44 | 33.86 / 32.93 / 32.43 | 36.83 / 40.92 / 40.73 |
| Tennis | 24.98 / 32.36 / 31.58 | 24.97 / 31.57 (one value missing) | 26.49 / 41.60 / 41.95 |
Conclusion
- The modified IMC (MBIMC) scheme provides a 27% improvement over the existing IMC technique.
- The integrated (IMC + downscaling) scheme provides a 40% improvement.
- Our proposed DCT domain based video downscaling system is 36 times faster than the spatial domain method.
- Our proposed DCT domain based video downscaling system produces approx. 1.5 dB better output than the hybrid and spatial domain systems.
H.264 Resizing
Relation between Integer DCT and Real DCT
The real 4x4 DCT can be factorized into a core transform (matrix C) followed by a post-scaling matrix E. To simplify the implementation, d is approximated by 0.5. To ensure that the transform remains orthogonal, b also needs to be modified.
The 2nd and 4th rows of matrix C and the 2nd and 4th columns of matrix CT are scaled by a factor of 2, and the post-scaling matrix E is scaled down to compensate, giving Ef. This transform is an approximation to the 4x4 DCT but is not equal to it. The forward transform and the inverse transform are not the same.
The inverse transform uses the scaling matrix Ei. The forward and inverse transforms are orthogonal, i.e. T-1(T(X)) = X. Ef and Ei are scaling matrices that can be incorporated into the quantizer. Hence:
Real forward DCT = the input is transformed by the integer forward transform and then scaled by Ef.
Real inverse DCT = the input is scaled by Ei and then transformed by the integer inverse transform.
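The relation between the integer transform and the real transform can be checked numerically. The matrices below are the standard H.264 4x4 core transform values (a = 1/2, b = sqrt(2/5), d = 1/2). They are not present in the extracted slide text, so they are supplied here as an assumption about what the slide equations showed, and the names `CF`, `CI`, `EF`, `EI` are illustrative.

```python
import math

A = 0.5
B = math.sqrt(2.0 / 5.0)

# Integer forward core transform Cf and inverse core transform Ci.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
CI = [[1, 1, 1, 1], [1, 0.5, -0.5, -1], [1, -1, -1, 1], [0.5, -1, 1, -0.5]]


def outer(v):
    return [[p * q for q in v] for p in v]


EF = outer([A, B / 2, A, B / 2])  # forward post-scaling matrix Ef
EI = outer([A, B, A, B])          # inverse pre-scaling matrix Ei


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def scale(m, e):
    """Element-wise product of two 4x4 matrices."""
    return [[m[i][j] * e[i][j] for j in range(4)] for i in range(4)]


def forward_dct(x):
    """Real forward DCT = integer forward transform, then scaling by Ef."""
    return scale(matmul(matmul(CF, x), transpose(CF)), EF)


def inverse_dct(y):
    """Real inverse DCT = scaling by Ei, then integer inverse transform."""
    return matmul(matmul(transpose(CI), scale(y, EI)), CI)


if __name__ == "__main__":
    block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    print(inverse_dct(forward_dct(block)))  # reproduces block up to rounding
```

In a codec the multiplications by Ef and Ei are folded into quantization and rescaling, so only the integer matrices are applied as a transform.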
Conversion of an H.264 P frame to an I frame
A macroblock can be partitioned into any of seven types: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4.
For each macroblock partition type there may be 10 prediction types:
- Full pel prediction
- Horizontal only: half pel or quarter pel
- Vertical only: half pel or quarter pel
- Horizontal and then vertical: half pel or quarter pel
- Vertical and then horizontal: half pel or quarter pel
- Diagonal prediction: half pel or quarter pel
What is Transcoding?
Transcoding: a process in which a coded bit stream is converted into another one of a different bit rate or a different format.
Pre-encoded bit stream → Transcoder → bit stream of different bit rate or different format
Pixel Domain Transcoder (PDT)
H.264 video file → H.264 Decoder → YUV frames → MPEG-2 Encoder → MPEG-2 video file
MPEG-2 encoder vs. PDT
[Figure: MPEG-2 frame and Pixel Domain Transcoder frame]
MPEG-2 encoder vs. PDT contd.
[Figure: MPEG-2 Encoder vs. Pixel Domain Transcoder]
DCT Domain Transcoder
[Block diagram; components: VLD, IQ, adder, Q2, VLC, and a feedback path with IQ, subtractor, MC-DCT and MEMORY. Motion vectors and block types are passed alongside the coefficient path.]
Enhancement of PDT
[Block diagram; components: VLD, IQ, IDCT, DCT, Q2, VLC, and a prediction path (+/-) with MC and MEMORY. Motion vectors and block types are passed alongside the coefficient path.]
Adaptive Motion Vector Re-estimation (AMVR)
Weighted average approach:
- Align to the best prediction error vector. Criterion: the object boundary blocks have lower prediction error than the background blocks.
- Align to the worst prediction error vector. Criterion: the object boundary blocks have higher prediction error than the background blocks.
AMVR contd.
MVi is the motion vector of block i of H.264. Ai denotes the activity measurement of the block.
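The AMVR formula itself did not survive in the extracted text. The sketch below shows a generic activity-weighted average of motion vectors, which is one common form of such a scheme and is offered only as an assumption about what the slide showed. The `scale` parameter is hypothetical (for example 0.5 when the frame is downscaled by two), and the fallback for zero total activity is a choice made here.

```python
def amvr(motion_vectors, activities, scale=1.0):
    """Activity-weighted average of motion vectors.

    motion_vectors: list of (dx, dy) tuples, one per block (MV_i)
    activities:     list of non-negative activity measures (A_i)
    scale:          hypothetical rescaling factor for the output vector
    """
    if len(motion_vectors) != len(activities) or not motion_vectors:
        raise ValueError("need one activity value per motion vector")
    total = sum(activities)
    if total == 0:
        # No activity anywhere: fall back to the plain average.
        activities = [1.0] * len(motion_vectors)
        total = float(len(motion_vectors))
    dx = sum(a * mv[0] for a, mv in zip(activities, motion_vectors)) / total
    dy = sum(a * mv[1] for a, mv in zip(activities, motion_vectors)) / total
    return (scale * dx, scale * dy)


if __name__ == "__main__":
    print(amvr([(4, 0), (0, 4)], [1, 3]))
    print(amvr([(4, 0), (0, 4)], [1, 3], scale=0.5))
```

Blocks with higher activity pull the result toward their own motion vector, which corresponds to the "align to the worst prediction error vector" behaviour when activity tracks residual energy.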
Median Method Extracts the motion vector situated in the middle of the rest of the motion vectors
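One way to read "situated in the middle" is as a vector median: the candidate whose summed distance to all the other candidates is smallest. The sketch below assumes Euclidean distance, which the slide does not specify; the function name is illustrative.

```python
import math


def median_vector(motion_vectors):
    """Return the motion vector with the least total distance to the others."""
    if not motion_vectors:
        raise ValueError("need at least one motion vector")

    def total_distance(v):
        return sum(math.hypot(v[0] - u[0], v[1] - u[1]) for u in motion_vectors)

    # min() keeps the first candidate when several share the same total.
    return min(motion_vectors, key=total_distance)


if __name__ == "__main__":
    print(median_vector([(0, 0), (1, 1), (2, 2), (3, 3), (20, 20)]))
```

Unlike an average, the result is always one of the input vectors, so a single outlier such as (20, 20) cannot drag it away from the main cluster.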
Non-Linear Motion Vector Re-estimation
The motion vector at minimum distance from the optimal one is the best matching motion vector.
Four parameters are defined for each block:
- A: activity measurement
- C: cluster of motion vector
- Q: quantization step size
- M: magnitude of motion vector
NLMR contd.
For each referenced block i, a likelihood score Li is defined (Li is the likelihood that the block matches the optimal motion vector).
Li is incremented whenever Ai, Ci or Qi is the highest, or Mi is the lowest, among the 16 blocks.
The motion vector corresponding to the highest L is the best matching motion vector.
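The scoring rule above can be sketched as follows. The sketch assumes that each of the four parameters is available as a single number per block (in particular, that C is a numeric cluster measure), that ties all receive the increment, and that the first block wins when scores are equal. None of these details are given on the slide, and the dictionary keys are illustrative.

```python
def nlmr(blocks):
    """Pick the motion vector of the block with the highest likelihood score.

    blocks: list of dicts with keys "mv", "A", "C", "Q", "M".
    Returns (best_motion_vector, scores).
    """
    if not blocks:
        raise ValueError("need at least one block")
    scores = [0] * len(blocks)
    # A, C and Q score when highest; M scores when lowest.
    for key, pick in (("A", max), ("C", max), ("Q", max), ("M", min)):
        best = pick(b[key] for b in blocks)
        for i, b in enumerate(blocks):
            if b[key] == best:
                scores[i] += 1
    winner = max(range(len(blocks)), key=lambda i: scores[i])
    return blocks[winner]["mv"], scores


if __name__ == "__main__":
    candidates = [
        {"mv": (1, 0), "A": 5, "C": 2, "Q": 8, "M": 1.0},
        {"mv": (4, 4), "A": 9, "C": 7, "Q": 8, "M": 5.7},
        {"mv": (0, 2), "A": 3, "C": 1, "Q": 4, "M": 2.0},
    ]
    print(nlmr(candidates))
```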
Computations Required

| Method | Additions and Subtractions | Multiplications and Divisions | Shifts and Comparisons | Total | Saving Per Frame |
|---|---|---|---|---|---|
| Full Search | 9669s + 2733a | 22m | 262s + 836c | 23622 | 0% |
| AMVR | 50a | 34m + 2d | 1c | 87 | 99.6% of ME time |
| Median | 242a | 2d | 16c | 260 | 98.89% of ME time |
| NLMR | 512s + 22a | 24m + 2d | 352c | 912 | 96.13% of ME time |
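The savings column can be recomputed from the Total column, taking the full-search total as the baseline. The sketch below uses only the totals reported in the table and treats every operation as one unit, which is how the table's totals appear to be formed for the three re-estimation methods; the names are illustrative.

```python
FULL_SEARCH_TOTAL = 23622  # total operations for full search, from the table

# Per-method operation counts from the table:
# (additions and subtractions, multiplications and divisions, shifts and comparisons)
METHODS = {
    "AMVR": (50, 34 + 2, 1),
    "Median": (242, 2, 16),
    "NLMR": (512 + 22, 24 + 2, 352),
}


def saving_percent(total, baseline=FULL_SEARCH_TOTAL):
    """Share of the full-search operation count that a method avoids."""
    return 100.0 * (1.0 - total / baseline)


totals = {name: sum(parts) for name, parts in METHODS.items()}
savings = {name: saving_percent(total) for name, total in totals.items()}

if __name__ == "__main__":
    for name in METHODS:
        print(f"{name}: {totals[name]} ops, saving {savings[name]:.2f}%")
```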