Parallelization of HEVC Deblocking filters using CUDA GPU
A PROJECT PROPOSAL UNDER THE GUIDANCE OF DR. K. R. RAO
COURSE: EE MULTIMEDIA PROCESSING, SPRING 2016
SUBMISSION DATE: 14th April 2016
SUBMITTED BY ARPITA YAGNIK
UT ARLINGTON ID:
TABLE OF CONTENTS
1. Acronyms
2. Objective and Action Plan
3. Timing Analysis for encoding
4. Introduction to CUDA GPU
5. Overview of CUDA memory
6. Image mapping to memory grid
7. GPU grid computing
8. Steps of programming GPU
9. Proposed algorithm for parallel deblocking
10. Challenges
11. References
Acronyms
AVC: Advanced Video Coding
BS: Boundary Strength
CODEC: COder/DECoder
Chroma: Chrominance
CPU: Central Processing Unit
CTU: Coding Tree Unit
CU: Coding Unit
CUDA: Compute Unified Device Architecture
DBF: Deblocking Filter
DCT: Discrete Cosine Transform
DFT: Discrete Fourier Transform
GPU: Graphics Processing Unit
HEVC: High Efficiency Video Coding
IEC: International Electrotechnical Commission
ISO: International Organization for Standardization
ITU-T: International Telecommunication Union (Telecommunication Standardization Sector)
JBIG: Joint Bi-level Image Experts Group
JPEG: Joint Photographic Experts Group
JCT-VC: Joint Collaborative Team on Video Coding
LOT: Lapped Orthogonal Transform
Luma: Luminance
MB: Macro Block
MPEG: Moving Picture Experts Group
OBMC: Overlapped Block Motion Compensation
PU: Prediction Unit
QP: Quantization Parameter
SAO: Sample Adaptive Offset
TU: Transform Unit
Objective and Action Plan
Objective: To implement the parallelization of the Deblocking filter for the HEVC CODEC using CUDA GPU.
1. Determine a way to program the CUDA GPU.
2. Implement an algorithm for parallel processing of the Deblocking filter operation.
3. Find the appropriate location in the HM code at which to integrate that algorithm.
4. Execute the algorithm in both CPU-only and CPU+GPU modes to compare the processing time and quality parameters.
5. Devise a novel algorithm for the same.
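Step 4 of the action plan compares processing time between the two modes. A minimal C++ sketch of such a comparison is given below. The names meanMilliseconds and speedup are hypothetical helpers, not part of HM, and the callable stands in for one deblocking pass in either mode. If the GPU call is asynchronous, the callable must wait for the device to finish before it returns, otherwise the measured time is too short.

```cpp
#include <chrono>
#include <functional>

// Runs `work` `repeats` times and returns the mean wall-clock time per run
// in milliseconds. `work` is a placeholder for one deblocking pass
// (CPU-only or CPU+GPU); it is an assumption of this sketch.
double meanMilliseconds(const std::function<void()>& work, int repeats) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < repeats; ++i) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / repeats;
}

// Speed-up of the accelerated mode relative to the baseline
// (a value above 1 means the accelerated mode is faster).
double speedup(double baselineMs, double acceleratedMs) {
    return baselineMs / acceleratedMs;
}
```

Timing alone does not cover the quality parameters named in step 4; those would be compared separately on the reconstructed frames.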
Background study
Analysis of the time taken by the Deblocking filter
Timing Analysis for encoding Comparison of encoding timings for different sequences[52]
Timing without Deblock filtering
Sequence: waterfall_cif.yuv, 352x288
FPS: 25
No. of frames: 50
QP: 32
Configuration: Random access main
Timing with Deblocking filter ON
Step 1 Way to program CUDA GPU
Introduction to CUDA GPU
CUDA® (Compute Unified Device Architecture) is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU) [51]. CUDA is widely used in general-purpose computation, such as astronomical calculation, computational fluid dynamics simulation, image processing and video coding.
Contd… CUDA Programming model[52]
Contd… Thread view[54]
Overview of CUDA memory Memory interface[54]
Image to grid mapping [53] (figure labels: 1x128, 8)
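The mapping of an image onto a grid of thread blocks can be sketched on the CPU as follows. This is only an illustration under assumptions: the 16x16 block size used in the usage note is a free choice and is not taken from [53], and Dim2, ceilDiv, gridFor, globalIndex and pixelOffset are hypothetical names. The index arithmetic mirrors the usual CUDA expression blockIdx * blockDim + threadIdx.

```cpp
struct Dim2 {
    int x;
    int y;
};

// Number of blocks needed to cover `extent` samples with `blockSize` samples per block.
inline int ceilDiv(int extent, int blockSize) {
    return (extent + blockSize - 1) / blockSize;
}

// Grid size (in blocks) that covers a width x height frame.
inline Dim2 gridFor(int width, int height, Dim2 block) {
    return Dim2{ceilDiv(width, block.x), ceilDiv(height, block.y)};
}

// Global coordinate of a thread along one axis, as computed inside a kernel.
inline int globalIndex(int blockIdx, int blockDim, int threadIdx) {
    return blockIdx * blockDim + threadIdx;
}

// Row-major offset of sample (x, y), or -1 when the thread falls outside the
// frame (this happens when the frame size is not a multiple of the block size).
inline long pixelOffset(int x, int y, int width, int height) {
    return (x < width && y < height) ? static_cast<long>(y) * width + x : -1;
}
```

For the 352x288 CIF sequence used in the timing analysis, gridFor(352, 288, Dim2{16, 16}) gives a grid of 22 x 18 blocks with no partially filled block.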
Steps of programming GPU [54]
GPU Grid Computing [53]
Contd… [53]
Step 2 Algorithm proposed for the parallel processing of Deblocking Filter
The proposed Algorithm for parallel deblocking [55]
Contd… [55]
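The ordering constraint that any parallel HEVC deblocking scheme has to respect can be sketched as below. This is not the algorithm of [55]; it only illustrates two properties of the standard: candidate edges lie on an 8x8 sample grid, and all vertical edges of a picture are filtered before the horizontal edges. Edge, listEdges and deblockTwoPass are hypothetical names, and filterEdge is a placeholder for the boundary-strength decision and the actual filter. On a GPU, each loop iteration within a pass would become one independent thread.

```cpp
#include <vector>

// One 8-sample edge segment on the 8x8 deblocking grid.
struct Edge {
    int x;          // left sample column of the segment
    int y;          // top sample row of the segment
    bool vertical;  // true: vertical edge, false: horizontal edge
};

// Lists the candidate edge segments of one direction, excluding the picture boundary.
std::vector<Edge> listEdges(int width, int height, bool vertical) {
    std::vector<Edge> edges;
    const int grid = 8;
    for (int y = 0; y < height; y += grid) {
        for (int x = 0; x < width; x += grid) {
            if (vertical && x > 0) edges.push_back({x, y, true});
            if (!vertical && y > 0) edges.push_back({x, y, false});
        }
    }
    return edges;
}

// Pass 1 handles every vertical edge, pass 2 every horizontal edge.
// Within a pass the segments do not depend on each other, so the loop
// body is the unit of work that could be mapped to one GPU thread.
template <typename FilterFn>
void deblockTwoPass(int width, int height, FilterFn filterEdge) {
    for (const Edge& e : listEdges(width, height, true)) filterEdge(e);
    for (const Edge& e : listEdges(width, height, false)) filterEdge(e);
}
```

For a 352x288 frame this yields 1548 vertical and 1540 horizontal candidate segments. In the standard only those segments that lie on a TU or PU boundary with non-zero boundary strength are actually filtered.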
Step 4 Incorporating GPU code with HM code
Challenges
The challenge is to integrate CUDA C code with the HM C++ code. CUDA can be programmed from C/C++ and from Fortran, but the variant used here is CUDA C.
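One common way to keep the two code bases apart is a narrow function interface with C linkage: the HM C++ code sees only a declaration, and the definition lives in a separate CUDA source file. The sketch below shows such a boundary under assumptions. The name deblockPictureGpu and its parameters are hypothetical and do not exist in HM, and the body is a CPU stand-in so that the sketch compiles without the CUDA toolkit.

```cpp
#include <cstdint>

// Declaration as it would appear in a header shared by HM and the CUDA file.
// Returns 0 on success and -1 on invalid arguments (hypothetical convention).
extern "C" int deblockPictureGpu(std::uint8_t* luma, int width, int height, int stride);

// CPU stand-in for the definition. In a real build this body would sit in a
// CUDA source file, copy the plane to the device, launch the deblocking
// kernels and copy the result back.
extern "C" int deblockPictureGpu(std::uint8_t* luma, int width, int height, int stride) {
    if (luma == nullptr || width <= 0 || height <= 0 || stride < width) {
        return -1;
    }
    return 0;
}
```

With this split, the HM side needs no CUDA headers, and only the one source file that holds the kernels has to be built with the CUDA compiler.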
References
1. Andrey Norkin et al., “HEVC Deblocking Filter”, IEEE Transactions on CSVT, vol. 22, no. 12, pp , Dec
2. Wei-Yi Wei, “Deblocking Algorithms in Video and Image Compression Coding”, National Taiwan University, Taipei, Taiwan, ROC
3. B. Bross, et al., High Efficiency Video Coding (HEVC) Text Specification Draft 8, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 document JCTVC-J1003, Joint Collaborative Team on Video Coding (JCTVC), Stockholm, Sweden, Jul
4. ITU-T and ISO/IEC JTC 1, Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 and ISO/IEC (AVC), May 2003 (and subsequent editions).
5. T. Wedi and H. G. Musmann, “Motion and aliasing compensated prediction for hybrid video coding,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 577–586, Jul
6. P. List, et al., “Adaptive deblocking filter,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 614–619, Jul
7. K. Ugur, K. R. Andersson, and A. Fuldseth, Video Coding Technology Proposal by Tandberg, Nokia, and Ericsson, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 document JCTVC-A119, Joint Collaborative Team on Video Coding (JCTVC), Dresden, Germany, Apr
8. A. Norkin, et al., CE12: Ericsson’s and MediaTek’s Deblocking Filter, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 document JCTVC-F118, Joint Collaborative Team on Video Coding (JCTVC), Turin, Italy, Jul
9. M. Ikeda and T. Suzuki, Non-CE10: Introduction of Strong Filter Clipping in Deblocking Filter, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 document JCTVC-H0275, Joint Collaborative Team on Video Coding (JCTVC), San Jose, CA, Feb
10. M. Ikeda, J. Tanaka, and T. Suzuki, CE12 Subset2: Parallel Deblocking Filter, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 document JCTVC-E181, Joint Collaborative Team on Video Coding (JCTVC), Geneva, Switzerland, Mar
Contd…
11. M. Narroschke, S. Esenlik, and T. Wedi, CE12 Subtest 1: Results for Modified Decisions for Deblocking, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 document JCTVC-G590, Joint Collaborative Team on Video Coding (JCTVC), Geneva, Switzerland, Nov
12. A. Norkin, CE10.3: Deblocking Filter Simplifications: Bs Computation and Strong Filtering Decision, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 document JCTVC-H0473, Joint Collaborative Team on Video Coding (JCTVC), San Jose, CA, Feb
13. A. Fuldseth, et al., Tiles, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 document JCTVC-F335, Joint Collaborative Team on Video Coding (JCTVC), Turin, Italy, Jul
14. T. Yamakage, et al., CE12: Deblocking Filter Parameter Adjustment in Slice Level, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 document JCTVC-G174, Joint Collaborative Team on Video Coding (JCTVC), Geneva, Switzerland, Nov
15. G. Van der Auwera, et al. (Panasonic), Support of Varying QP in Deblocking, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 document JCTVC-G1031, Joint Collaborative Team on Video Coding (JCTVC), Geneva, Switzerland, Nov
16. M. Zhou, O. Sezer, and V. Sze, CE12 Subset 2: Test Results and Architectural Study on De-Blocking Filter Without Parallel on/off Filter Decision, ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 document JCTVC-G088, Joint Collaborative Team on Video Coding (JCTVC), Geneva, Switzerland, Nov
17. G. Bjontegaard, Calculation of Average PSNR Differences Between RD-Curves, ITU-T SG16 document VCEG-M33, Joint Collaborative Team on Video Coding (JCTVC),
18. F. Bossen, Common Test Conditions, JCTVC-H1100, Joint Collaborative Team on Video Coding (JCTVC), San Jose, CA,
19. Po-Kai Hsu and Chung-An Shen, The VLSI Architecture of a Highly Efficient Deblocking Filter for HEVC Systems, DOI /TCSVT , IEEE Transactions on Circuits and Systems for Video Technology
20. HEVC presentation:
21. Overview of H.264/AVC:
22. Detailed overview of HEVC/H.265:
Contd…
23. I.E.G. Richardson, “Video Codec Design: Developing Image and Video Compression Systems”, Wiley,
24. I.E.G. Richardson, “The H.264 advanced video compression standard”, 2nd Edition, Hoboken, NJ, Wiley,
25. K. Sayood, “Introduction to Data compression”, Third Edition, Morgan Kaufmann Series in Multimedia Information and Systems, San Francisco, CA,
26. V. Sze and M. Budagavi, “Design and Implementation of Next Generation Video Coding Systems (H.265/HEVC Tutorial)”, IEEE International Symposium on Circuits and Systems (ISCAS), Melbourne, Australia, June
27. V. Sze, M. Budagavi and G.J. Sullivan (Editors), “High Efficiency Video Coding (HEVC): Algorithms and Architectures”, Springer,
28. G. J. Sullivan et al, “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 22, No. 12, pp , Dec
29. G. J. Sullivan et al, “Standardized Extensions of High Efficiency Video Coding (HEVC)”, IEEE Journal of selected topics in Signal Processing, vol. 7, pp , Dec
30. K.R. Rao, D.N. Kim and J.J. Hwang, “Video Coding Standards: AVS China, H.264/MPEG-4 Part 10, HEVC, VP6, DIRAC and VC-1”, Springer,
31. D. Grois, B. Bross and D. Marpe, “HEVC/H.265 Video Coding Standard (Version 2) including the Range Extensions, Scalable Extensions, and Multiview Extensions,” (Tutorial, Sunday 27 Sept 2015, 9:00 am to 12:30 pm), IEEE ICIP, Quebec City, Canada, 27 – 30 Sept
32. Generic quadtree based approach for block partitioning: www.hhi.fraunhofer.de/fields-of-competence/image-processing/research-groups/image-video-coding/hevchigh-efficiency-video-coding/generic-quadtree-based-approach-for-block-partitioning.html
33. The tutorial below is for personal use only [Password: a2FazmgNK ] Please find the links to YouTube videos on the tutorial - HEVC/H.265 Video Coding Standard including the Range Extensions Scalable Extensions and Multiview Extensions below:
34. HEVC tutorial by I.E.G. Richardson:
35. “Special issue on HEVC extensions and efficient HEVC implementations”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 26, pp , Jan
36. K.R. Rao and J.J. Hwang, “Techniques and standards for image/video/audio coding”, Prentice Hall, 1996.
Contd…
37. Video lectures from IITs and IISC:
38. Image and video processing courses at UT Arlington (EE 5351, EE 5355, EE 5356 and EE 5359):
39. HEVC chapter 1:
40. Online course on fundamentals of digital image and video processing from Coursera:
41. Access to HM 16.0 Software Manual:
42. Test Sequences: ftp://ftp.kw.bbc.co.uk/hevc/hm-11.0-anchors/bitstreams/
43. HEVC white paper-Ittiam Systems:
44. HEVC white paper-Elemental Technologies:
45. Access to HM 16.0 Reference Software:
46. Han W-J, et al. (2010), “Improved video compression efficiency through flexible unit representation and corresponding extension of coding tools”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 20, no. 12, pp , Dec
47. Norkin A (2012) Non-CE1: non-normative improvement to deblocking filtering, Joint Collaborative Team on Video Coding (JCT-VC), Document JCTVC-K0289, Shanghai, Oct
48. Norkin A, Andersson K, Fuldseth A, Bjøntegaard G (2012) HEVC deblocking filtering and decisions. In: Proc. SPIE. 8499, Applications of Digital Image Processing XXXV, no , Oct
49. Norkin A, Andersson K, Kulyk V (2013) “Two HEVC encoder methods for block artifact reduction”. In: Proceedings of the IEEE international conference on visual communications and image processing (VCIP) 2013, Kuching, Sarawak, pp. 1–6, Nov
50. Norkin A, Andersson K, Sjöberg R (2013) AHG6: on deblocking filter and parameters signaling, Joint Collaborative Team on Video Coding (JCT-VC), Document JCTVC-L0232, Geneva, Jan
51. Information on GPU accelerated computing:
52. Xiaoou Sun et al, “Accelerating IEEE 1857 Deblocking Filter on GPU using CUDA”, IEEE International Conference on Multimedia Big Data, pp , Apr
Contd…
53. Course on parallel computing:
54. Course on heterogeneous parallel programming
55. Anand Meher Kotra et al, “Comparison of different parallel implementations for deblocking filter of HEVC”, IEEE International conference on Acoustics, speech and signal processing, pp , Mar
56. Test sequence:
Thank you