EARLY TERMINATION FOR TZ SEARCH IN HEVC MOTION ESTIMATION
PRESENTED BY: Rajath Shivananda ( )
EE 5359 Multimedia Processing Individual Project
List of Acronyms and Abbreviations
AVC: Advanced Video Coding
AMVP: Advanced Motion Vector Prediction
AP: Above Predictor
ARP: Above-Right Predictor
B-frame: Bi-predictive frame
BMA: Block Matching Algorithm
CABAC: Context Adaptive Binary Arithmetic Coding
CTB: Coding Tree Block
CTU: Coding Tree Unit
CU: Coding Unit
CB: Coding Block
DCT: Discrete Cosine Transform
HDTV: High Definition Television
HEVC: High Efficiency Video Coding
HM: HEVC Test Model
I-frame: Intra-coded frame
ICASSP: International Conference on Acoustics, Speech and Signal Processing
JCT: Joint Collaborative Team
JCT-VC: Joint Collaborative Team on Video Coding
JM: H.264 Test Model
JPEG: Joint Photographic Experts Group
KBPS: Kilobits Per Second
LCU: Largest Coding Unit
LDSP: Large Diamond Search Pattern
LP: Left Predictor
MV: Motion Vector
MP: Median Predictor
MC: Motion Compensation
ME: Motion Estimation
MPEG: Moving Picture Experts Group
P-frame: Predicted frame
PC: Prediction Chunking
PU: Prediction Unit
PB: Prediction Block
PSNR: Peak Signal to Noise Ratio
SAD: Sum of Absolute Differences
SCU: Smallest Coding Unit
SSD: Sum of Squared Differences
QP: Quantization Parameter
RD: Rate Distortion
TB: Transform Block
TU: Transform Unit
TZSearch: Test Zone Search
Abstract [14]
The TZSearch algorithm was adopted in the High Efficiency Video Coding reference software HM as a fast Motion Estimation (ME) algorithm because it reduces ME time while maintaining comparable Rate Distortion (RD) performance. However, the multiple initial search point decision and the hybrid block matching search still contribute a relatively high computational complexity to TZSearch. Two early terminations for TZSearch are proposed. They are based on a statistical analysis of the probability that the median predictor is selected as the final best point in large Coding Units (CUs) (64x64, 32x32) and small CUs (16x16, 8x8), and on the center-biased characteristic of the final best search point in the ME process. Experimental results show that 38.96% of the encoding time is saved, while the RD performance degradation is quite acceptable [16].
Objective of the project
In this project, TZSearch is used as the block matching algorithm in preference to the full search algorithm. The proposed algorithm terminates TZSearch early to reduce the computational time by % [16]. The median predictor is selected as the best initial search point in about 67.41% of cases on average. Experiments are conducted using three configuration profiles: random access, low delay and a custom configuration profile. The results for these configuration profiles are compared and the best one is selected. Lastly, different video sequences are encoded and their PSNR and bitrate are compared.
Basic Concepts of Video Coding
Color Spaces:
RGB color space – Each pixel is represented by three numbers indicating the relative proportions of red, green and blue.
YCrCb color space – Y is the luminance component, a monochrome version of the color image. Y is a weighted average of R, G and B: Y = k_r R + k_g G + k_b B, where k_r, k_g and k_b are the weighting factors.
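The weighted average for Y can be tried out with a short sketch. The BT.601 weights used here (0.299, 0.587, 0.114) are an assumption, since the slide only names the factors k_r, k_g and k_b, and the scaled colour-difference form for Cb and Cr is the common textbook one rather than something stated on the slide.

```python
# Assumed BT.601 weighting factors; the slide does not give numeric values.
KR, KG, KB = 0.299, 0.587, 0.114


def luma(r, g, b):
    """Weighted average of R, G and B: Y = kr*R + kg*G + kb*B."""
    return KR * r + KG * g + KB * b


def rgb_to_ycbcr(r, g, b):
    """Return (Y, Cb, Cr) with chroma as scaled colour differences."""
    y = luma(r, g, b)
    cb = 0.5 * (b - y) / (1.0 - KB)  # blue-difference chroma
    cr = 0.5 * (r - y) / (1.0 - KR)  # red-difference chroma
    return y, cb, cr


y, cb, cr = rgb_to_ycbcr(128, 128, 128)  # a grey pixel carries no chroma
```

For a grey pixel the three inputs are equal, so Y equals the input value and both chroma components are zero up to rounding.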
The popular patterns of sampling [4] are:
4:4:4 – The three components Y, Cr and Cb have the same resolution, that is, for every 4 luminance samples there are 4 Cr and 4 Cb samples.
4:2:2 – For every 4 luminance samples in the horizontal direction, there are 2 Cr and 2 Cb samples. This representation is used for high quality video color reproduction.
4:2:0 – Cr and Cb each have half the horizontal and vertical resolution of Y. This is popularly used in applications such as video conferencing, digital television and DVD storage.
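The effect of each sampling pattern on the plane sizes can be sketched as follows. The function names are illustrative, and even frame dimensions are assumed so that the divisions are exact.

```python
# (horizontal divisor, vertical divisor) of each chroma plane relative to luma
CHROMA_DIVISORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def plane_sizes(width, height, fmt):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one frame."""
    dx, dy = CHROMA_DIVISORS[fmt]
    return (width, height), (width // dx, height // dy)


def samples_per_frame(width, height, fmt):
    """Total sample count: one luma plane plus two chroma planes."""
    (lw, lh), (cw, ch) = plane_sizes(width, height, fmt)
    return lw * lh + 2 * cw * ch


for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    print(fmt, plane_sizes(832, 480, fmt), samples_per_frame(832, 480, fmt))
```

For an 832x480 frame, 4:2:0 carries half as many samples as 4:4:4, which is why it is preferred where bandwidth matters.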
Figure 1. 4:2:0 sub-sampling pattern [4].
Figure 2. 4:2:2 and 4:4:4 sub-sampling and sampling patterns [4].
H.265 / High Efficiency Video Coding
High Efficiency Video Coding (HEVC) [5] is an international standard for video compression developed jointly by ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T (International Telecommunication Union) VCEG (Video Coding Experts Group). The main goal of the HEVC standard is to significantly improve compression performance compared to existing standards (such as H.264/Advanced Video Coding [6]), in the range of 50% bit rate reduction at similar visual quality [7]. The macroblocks and blocks of H.264 are replaced by the CTU, CU, TU and PU in H.265/HEVC.
Figure 3. Block Diagram of HEVC CODEC [7].
HEVC Encoder and Decoder
Figure 4. Block Diagram of the HEVC Encoder [2].
Figure 5. Block Diagram of the HEVC Decoder [8].
Each picture is split into block-shaped regions, with the exact block partitioning being conveyed to the decoder. The first picture of a video sequence is coded using only intra-picture prediction. For the remaining pictures, the encoder and decoder generate identical inter-picture prediction signals by applying motion compensation (MC) using the MV and mode decision data, which are transmitted as side information. The residual signal, which is the difference between the original block and its prediction, is transformed by a linear spatial transform. The transform coefficients are then scaled, quantized, entropy coded, and transmitted together with the prediction information. The encoder duplicates the decoder processing loop so that both generate identical predictions for subsequent data: the quantized transform coefficients are reconstructed by inverse scaling and are then inverse transformed to duplicate the decoded approximation of the residual signal. The residual is then added to the prediction, and the result of that addition may be fed into one or two loop filters to smooth out artifacts induced by block-wise processing and quantization. The resulting duplicate of the decoder output is stored in a decoded picture buffer to be used for the prediction of subsequent pictures.
Coding tree units and coding tree block (CTB) structure:
Figure 6. 64x64 CTBs split into CBs [9]
Coding units (CUs) and coding blocks (CBs):
One luma CB and ordinarily two chroma CBs, together with the associated syntax, form a coding unit (CU), as shown in Figure 7. A CTB may contain only one CU or may be split to form multiple CUs, and each CU has an associated partitioning into prediction units (PUs) and a tree of transform units (TUs).
Figure 7. CUs split into CBs [9]
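The splitting of a CTB into CUs can be illustrated with a small recursive sketch. The split rule `demo_rule` is purely hypothetical (a real encoder decides by comparing RD costs), and the minimum CU size of 8 is taken from the 64x64 to 8x8 range mentioned in the abstract.

```python
def split_ctb(x, y, size, should_split, min_size=8):
    """Return the list of (x, y, size) CUs that cover one CTB."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        # Visit the four quadrants in z-order: top-left, top-right,
        # bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(split_ctb(x + dx, y + dy, half, should_split, min_size))
        return cus
    return [(x, y, size)]


def demo_rule(x, y, size):
    """Hypothetical rule: keep splitting only the block at the top-left corner."""
    return x == 0 and y == 0


cus = split_ctb(0, 0, 64, demo_rule)
for cu in cus:
    print(cu)
```

With this rule the 64x64 CTB ends up as four 8x8, three 16x16 and three 32x32 CUs, which together still cover the whole CTB.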
Prediction units and prediction blocks (PBs):
Figure 8. Partitioning of Prediction Blocks from Coding Blocks [9]
TUs and transform blocks:
Figure 9. Partitioning of Transform Blocks from Coding Blocks [9]
INTER PREDICTION [3]
Inter prediction takes advantage of the temporal redundancies that exist among successive frames and derives a motion-compensated prediction (MCP) for a block of image samples. Inter prediction comprises Motion Estimation and Motion Compensation.
Figure 10. Prediction from reference blocks [48].
MOTION ESTIMATION
Figure 11. Motion Vectors [47]
BLOCK DIAGRAM OF HEVC INTER-PICTURE PREDICTION
Grey parts represent the bi-prediction path.
Figure 12. HEVC Inter-picture prediction [48]
INTER-PICTURE PREDICTION BLOCK MERGING
The HEVC quad-tree partitioning has the disadvantage of over-segmenting the image. This leads to redundant signaling and ineffective borders. This drawback can be overcome using the block merging technique.
Figure 13. Image of a pendulum using quad-tree partitioning and block merging, respectively [48].
FRACTIONAL SAMPLE INTERPOLATION
Interpolation is applied on the reference pictures to derive the prediction signal when the corresponding motion vector has fractional sample accuracy. HEVC supports motion vectors with quarter-pixel accuracy for the luma component and one-eighth pixel accuracy for chroma components.
Figure 14. Interpolation [45]
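A minimal sketch of what quarter-pixel and one-eighth pixel accuracy mean for a stored motion vector component. It assumes the MV is kept as an integer in quarter-sample luma units and that the video is 4:2:0, so the same integer read in one-eighth units addresses the half-resolution chroma plane; the function names are illustrative only.

```python
def split_luma_mv(mv_q):
    """Split a quarter-sample MV component into (integer part, fraction in 1/4)."""
    return mv_q >> 2, mv_q & 3


def split_chroma_mv(mv_q):
    """Read the same value in 1/8 units for a half-resolution chroma plane."""
    return mv_q >> 3, mv_q & 7


# 9 quarter-samples = 2 full luma samples + 1/4; in chroma, 1 sample + 1/8.
print(split_luma_mv(9), split_chroma_mv(9))
# Negative values floor toward minus infinity: -5/4 = -2 + 3/4.
print(split_luma_mv(-5))
```

A non-zero fraction is what triggers interpolation; when the fraction is zero the prediction is read directly from the reference picture.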
BLOCK MATCHING ALGORITHM [12]
Divide the current frame into blocks.
Find one displacement vector for each block.
Within a search range, find a “best match” that minimizes an error measure.
Intelligent search strategies can reduce computation.
Figure 15. Block matching algorithm [44].
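The steps listed above can be sketched as an exhaustive (full) search for one block. This is a minimal illustration that assumes SAD as the error measure and frames stored as lists of rows; it is not the HM implementation, and the synthetic gradient frames are made up for the demonstration.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement in the search range and keep the best match."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, bx, by, n)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Synthetic 16x16 reference frame; the current frame shows the same content
# displaced so that each block matches the reference at offset (1, 2).
ref = [[7 * x + 13 * y for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 2, 15)][min(x + 1, 15)] for x in range(16)] for y in range(16)]

mv, cost = full_search(cur, ref, bx=4, by=4, n=4, search_range=3)
print(mv, cost)
```

The search visits (2 x 3 + 1)^2 = 49 candidates for a range of 3, which is the cost that the faster search strategies try to avoid.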
BLOCK MATCHING ALGORITHM [CONT.] [12]
Figure 16. Block matching algorithm [46]
I- Frame Figure 17. I-Frame from BQMall video sequence.
B-Frame Figure 18. B-Frame from BQMall video sequence
B-Frame (Continued) Figure 19. B-Frame from BQMall video sequence.
B-Frame (Continued) Figure 20. B-Frame from BQMall video sequence.
Residual from I-Frame Figure 20. Residual of I-Frame of BQMall video sequence
Residual from B-Frame Figure 21. Residual of B-Frame of BQMall video sequence
BLOCK MATCHING: SEARCH ALGORITHMS
The various BMAs that are used to determine the best matched MV:
Full Search
Four Step Search
Three Step Search
Diamond Search
Test Zone Search
Exhaustive Search
Figure 22. Search Algorithm [18].
DIAMOND SEARCH [12]
Figure 23. Diamond Search [12].
SAD (SUM OF ABSOLUTE DIFFERENCES) [19]
The SAD architecture is used to calculate the minimum cost between the current block and the reference block. Motion Estimation (ME) is the most time-consuming part of the HEVC encoder; its goal is to reduce the residual information that is sent to the transform block. The rate-distortion cost J is shown in equation (1):
\[ J = D + \lambda R \tag{1} \]
where \lambda is the Lagrange multiplier, R is the number of bits required to encode the motion vector difference, and D is the distortion function. One of the most widely used distortion functions for motion estimation is SAD, which can be defined as shown in equation (2):
\[ SAD = \sum_{i=1}^{M} \sum_{j=1}^{N} \left| CB(i,j) - RB(i,j) \right| \tag{2} \]
where CB denotes the current block pixels, RB denotes the reference block pixels, and MxN is the size of the current PU block. The candidate with the minimum J is selected.
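The trade-off between distortion and rate can be sketched as below. SAD is computed as described above, while `mvd_bits` is a hypothetical stand-in for the rate term R: it only mimics the way the bit count grows with the size of the motion vector difference and is not the entropy coder used by HM. The pixel values and the value of λ are made up for the illustration.

```python
def sad_block(cb, rb):
    """Sum of absolute differences between current and reference block."""
    return sum(abs(c - r) for row_c, row_r in zip(cb, rb) for c, r in zip(row_c, row_r))


def mvd_bits(mv, pred):
    """Hypothetical rate model: more bits for a larger MV difference."""
    return sum(2 * abs(m - p).bit_length() + 1 for m, p in zip(mv, pred))


def rd_cost(cb, rb, mv, pred, lam):
    """J = D + lambda * R with D = SAD and R = bits for the MV difference."""
    return sad_block(cb, rb) + lam * mvd_bits(mv, pred)


cb = [[10, 12], [14, 16]]            # current block
rb_near = [[10, 12], [14, 17]]       # close match at the predicted MV
rb_far = [[10, 12], [14, 16]]        # perfect match, but far from the predictor

j_near = rd_cost(cb, rb_near, mv=(0, 0), pred=(0, 0), lam=1.0)
j_far = rd_cost(cb, rb_far, mv=(8, 0), pred=(0, 0), lam=1.0)
print(j_near, j_far)
```

In this toy case the perfect match loses because its motion vector costs more bits than the distortion it saves, which is the reason J rather than SAD alone is minimized.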
Median Predictors
ME with flexible block size representation contributes the largest proportion of the encoding time of the HEVC encoder. The search procedure of TZSearch includes two steps: the initial search point decision and the block matching search. The first step determines the initial search point using a set of predictors that includes the Median Predictor (MP) [10], Left Predictor (LP), Above Predictor (AP), Above-Right Predictor (ARP) and (0, 0). LP, AP and ARP correspond to the Motion Vectors (MVs) of the left, top and top-right blocks of the current block, respectively. After the initial search point is determined, the hybrid block matching search, which includes multiple diamond/square searches and a raster search, is used to locate the best matching block, that is, the one with the minimum RD cost. However, the computational complexity of the multiple initial search point decision and the hybrid block matching search is still relatively high. If these two processes can be simplified, much more encoding time can be saved.
Figure 24. Median Predictors [30]
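The initial search point decision can be sketched as follows. The sketch assumes that the median predictor is the component-wise median of LP, AP and ARP, as in H.264-style MV prediction; the cost function passed in is a toy stand-in for the RD cost, and the function names are illustrative rather than taken from HM.

```python
def median_predictor(lp, ap, arp):
    """Component-wise median of the left, above and above-right MVs."""
    xs = sorted((lp[0], ap[0], arp[0]))
    ys = sorted((lp[1], ap[1], arp[1]))
    return xs[1], ys[1]


def initial_search_point(lp, ap, arp, cost):
    """Pick the lowest-cost predictor among MP, LP, AP, ARP and (0, 0)."""
    candidates = [median_predictor(lp, ap, arp), lp, ap, arp, (0, 0)]
    # min() returns the first minimum, so MP wins ties.
    return min(candidates, key=cost)


lp, ap, arp = (4, 0), (6, -2), (5, 8)
mp = median_predictor(lp, ap, arp)


def toy_cost(mv):
    """Toy cost: distance to an assumed true motion of (5, 1)."""
    return abs(mv[0] - 5) + abs(mv[1] - 1)


start = initial_search_point(lp, ap, arp, toy_cost)
print(mp, start)
```

When the median predictor is also the lowest-cost candidate, as here, evaluating the remaining predictors adds cost without changing the starting point, which is the kind of redundancy an early termination can exploit.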
Proposed Early Termination Algorithm [1]
Test Sequences Figure 25. A frame from Mobisode2 video sequence. Resolution – 416x240 Figure 26. A frame from BQMall video sequence. Resolution – 832x480.
Test Sequences (Continued) Figure 27. A frame from Johnny video sequence. Resolution – 1280x720. Figure 28. A frame from Park Scene video sequence. Resolution – 1920x1080.
Experimental Results
Table 1 shows how often (in percent) the median predictor is selected as the best initial search point. It can be seen from Table 1 that the median predictor is selected about % of the time on average.
Table 1. Probability of selecting the median predictor as the final best search point, unit (%).
Columns: QP, Sequence (BQMall, Johnny, Mobisode2, ParkScene), Average. The last row gives the average over QP.
Experimental Results (Continued)
Test Conditions
Table 2. Test conditions.
Frame rate: 30
Total number of frames: 60
GOP: 8
Search range: 64
CU size / depth: 64/4
Inter frames intervals: 32
QP: 24, 28, 32, 36
Experimental Results (Continued)
Comparison of the low delay, random access and random access early profiles for different QP values and different video sequences. Video sequence: Mobisode2. Each table has one row per profile (Low_delay, Random_access, Random_access_early) and one column per QP value.
Table 3. PSNR in dB for the low delay, random access and random access early profiles.
Table 4. Bitrate in kbps for the low delay, random access and random access early profiles.
Table 5. Encoding time in seconds for the low delay, random access and random access early profiles, with the encoding time saved (%).
Experimental Results (Continued)
Comparison of the low delay, random access and random access early profiles for different QP values and different video sequences. Video sequence: BQMall. Each table has one row per profile (Low_delay, Random_access, Random_access_early) and one column per QP value.
Table 6. PSNR in dB for the low delay, random access and random access early profiles.
Table 7. Bitrate in kbps for the low delay, random access and random access early profiles.
Table 8. Encoding time in seconds for the low delay, random access and random access early profiles, with the encoding time saved (%).
Experimental Results (Continued)
Comparison of the low delay, random access and random access early profiles for different QP values and different video sequences. Video sequence: Johnny. Each table has one row per profile (Low_delay, Random_access, Random_access_early) and one column per QP value.
Table 9. PSNR in dB for the low delay, random access and random access early profiles.
Table 10. Bitrate in kbps for the low delay, random access and random access early profiles.
Table 11. Encoding time in seconds for the low delay, random access and random access early profiles, with the encoding time saved (%).
Experimental Results (Continued)
Comparison of the low delay, random access and random access early profiles for different QP values and different video sequences. Video sequence: ParkScene. Each table has one row per profile (Low_delay, Random_access, Random_access_early) and one column per QP value.
Table 12. PSNR in dB for the low delay, random access and random access early profiles.
Table 13. Bitrate in kbps for the low delay, random access and random access early profiles.
Table 14. Encoding time in seconds for the low delay, random access and random access early profiles, with the encoding time saved (%).
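The "encoding time saved (%)" figure in these tables is presumably the relative reduction of the early-termination run against its reference run. A minimal sketch under that assumption follows; the two times in the example are hypothetical placeholders, not measured results.

```python
def time_saved_percent(t_reference, t_early):
    """Relative encoding time reduction of the early-termination run, in percent."""
    return 100.0 * (t_reference - t_early) / t_reference


# Hypothetical encoding times in seconds, for illustration only.
saved = time_saved_percent(t_reference=200.0, t_early=122.0)
print(saved)
```

A positive value means the early-termination profile encoded faster than the reference profile; a negative value would indicate a slowdown.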
Conclusion
The results in Table 1 show that the median predictor is selected as the best search point for about 60% of the motion vectors. Terminating TZSearch early in these cases saves about 38% of the encoding time, as can be seen from Table 3 through Table 14.
References
[1] HEVC overview
[2] D. Marpe et al, “The H.264/MPEG4 advanced video coding standard and its applications”, IEEE Communications Magazine, vol. 44, pp , Aug
[3] B. Bross et al, “High Efficiency Video Coding (HEVC) Text Specification Draft 10”, Document JCTVC-L1003, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC), Mar
[4] I.E.G. Richardson, “Video Codec Design: Developing Image and Video Compression Systems”, Wiley, 2002
[5] HEVC white paper - Ateme:
[6] G.J. Sullivan et al, “Standardized Extensions of High Efficiency Video Coding (HEVC)”, IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp , Dec
[7] HEVC tutorial by I.E.G. Richardson:
[8] C. Fogg, “Suggested figures for the HEVC specification”, ITU-T / ISO-IEC Document: JCTVC J0292r1, July
[9] U.S.M. Dayananda, “Study and Performance comparison of HEVC and H.264 video codecs”, Final project report, EE Dept., UTA, Arlington, TX, Dec available on
[10] HM Software Manual -
[11] Visual studio:
[12] Tortoise SVN:
[13] Multimedia processing course website:
[14] C. E. Rhee et al, “A Survey of Fast Mode Decision Algorithms for Inter-Prediction and Their Applications to High Efficiency Video Coding”, IEEE Transactions on Consumer Electronics, vol. 58, no. 4, pp , Dec
[15] R. Li, B. Zeng, and M.L. Liou, “A new three-step search algorithm for block motion estimation”, IEEE Trans. Circuits and Systems for Video Technology, vol. 4, no. 4, pp , August
[16] Z. Pan et al, “Early termination for TZSearch in HEVC motion estimation”, IEEE ICASSP 2013, pp , June 2013.
[17] X. Zhang, S. Wang, S. Ma, “Early termination of coding unit splitting for HEVC”, Signal & Information Processing Association Annual Summit and Conference, pp. 1-4, December
[18] A. Asghar, M. Atiq, R. A. Khan, N. A. Khan, “Motion Estimation and Inter Prediction Mode Selection in HEVC”, Recent Researches in Telecommunications, Informatics, Electronics and Signal Processing, pp. 351–357, December
[19] G. Sullivan et al, “Overview of the high efficiency video coding (HEVC) standard”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp , December
[20] HEVC tutorial
[21] V. Sze, M. Budagavi and G.J. Sullivan (editors), “High Efficiency Video Coding (HEVC) Algorithms and Architectures”, Springer,
[22] L.C. Manikandan et al, “A new survey on Block Matching Algorithms in Video Coding”, International Journal of Engineering Research, vol. 3, pp , February
[23] M. Jakubowski and G. Pastuszak, “Block-based motion estimation algorithms-a survey”, Journal of Opto-Electronics Review, vol. 21, pp , March
[24] Maria Santamaria and Maria Trujilo, “A comparison of block-matching motion estimation algorithms”, on October 4th
[25] L. Xufeng, et al, “Fast motion estimation for HEVC”, 2014 IEEE International Symposium on Broadband Multimedia and Broadcasting (BMSB), IEEE, December
[26] N. Purnachand, et al, “Fast motion estimation algorithm for HEVC”, 2012 IEEE International Conference on Consumer Electronics - Berlin (ICCE-Berlin), Berlin, pp. 34–37, IEEE, March
[27] X.-l. Tang, et al, “An analysis of TZ search algorithm in JMVC”, 2010 International Conference on Green Circuits and Systems (ICGCS), Shanghai, pp. 516–520, IEEE, September
[28] L.N.A. Alves, and A. Navarro, “Fast Motion Estimation Algorithm for HEVC”, Proc IEEE International Conf. on Consumer Electronics - vol. 11, pp , ICCE Berlin, Germany, September 2012
[29] M. Jakubowski and G. Pastuszak, “Block-based motion estimation algorithms-a survey”, Journal of Opto-Electronics Review, vol. 21, pp , March 2013.
[30] Z. Pan, S. Kwong, L. Xu, Y. Zhang, T. Zhao, “Predictive and distribution-oriented fast motion estimation for H.264/AVC”, Journal of Real-Time Image Processing, vol. 9, pp. 597–607, December
[31] P. Nalluri et al, “High Speed SAD Architectures for variable block size estimation in HEVC video coding”, IEEE International Conference on Image Processing (ICIP), pp. 1233–1237, October
[32] M. A. B. Ayed, et al, “TZ Search pattern search improvement for HEVC motion estimation modules”, Advanced Technologies for Signal and Image Processing (ATSIP), pp. 95–99, March
[33] Introduction to Motion estimation and Motion compensation--->
[34] HM Software Manual -
[35] Visual studio:
[36] Tortoise SVN:
[37] Tutorials---> N. Ling, “High efficiency video coding and its 3D extension: A research perspective”, Keynote Speech, IEEE Conference on Industrial Electronics and Applications, Singapore, July
[38] Tutorials---> X. Wang et al, “Paralleling variable block size motion estimation of HEVC on CPU plus GPU platform”, IEEE International Conference on Multimedia and Expo workshop,
[39] Tutorials---> H.R. Tohidpour, et al, “Content adaptive complexity reduction scheme for quality/fidelity scalable HEVC”, IEEE International Conference on Image Processing, pp , June
[40] HEVC tutorial 2014 ISCAS --->
[41] Video Lecture on Digital Voice and Picture Communication by Prof. S. Sengupta, Department of Electronics and Electrical Communication Engineering, IIT Kharagpur ->
[42] Lecture on video coding standards ->
[43] YUV format figures ->
[44] Slideshare--->
[45] Video Compression Image--->
[46] Motion Estimation for Video Coding--->
[47] Different block matching algorithms--->
[48] V. Sze, M. Budagavi and G.J. Sullivan (editors), “High efficiency video coding (HEVC): algorithms and architectures”, Springer, 2014.