Evaluation of Data-Parallel Splitting Approaches for H.264 Decoding

Evaluation of Data-Parallel Splitting Approaches for H.264 Decoding Florian H. Seitner, Michael Bleyer, Ralf M. Schreier, Margrit Gelautz International Conference on Advances in Mobile & Multimedia (MoMM 2008)

Outline: Introduction; Parallel H.264 Decoding; Evaluated Methods; Experimental Results; Conclusions

Introduction: The H.264 video standard is currently used in a wide range of video-related areas, such as video content distribution and television broadcasting. Its high coding efficiency comes from quarter-pixel (Qpel) motion estimation, variable block sizes and multiple reference frames, at the cost of significantly increased CPU and memory loads.

Introduction: Multi-core systems can be used to increase system performance. How should the H.264 decoding algorithm be distributed among multiple processing units? The decoding load should be distributed equally. Data dependency issues: inter-communication and synchronization.

Introduction: The aim of this work is to evaluate the behavior of different decoding approaches in terms of run-time complexity, efficient core usage and data transfers.

Parallel H.264 Decoding: Functional and data-parallel splitting. In a functionally partitioned decoding system, decoding tasks are assigned to individual processing cores. Each processing unit can be optimized for a certain task, but the workload distribution is unequal and inter-communication requires a high transfer rate.

Parallel H.264 Decoding: Functional and data-parallel splitting. In a data-parallel decoding system, macroblocks (MBs) are distributed among multiple processing units. Data dependencies between different cores must be minimized, and the MB distribution onto the processing cores must achieve an equal workload balance.

Parallel H.264 Decoding: The H.264 decoder. [Figure: the H.264 decoding process. Parser: encoded bitstream, stream parsing, entropy decoder. Reconstructor (data-parallel processing): inverse quantization, inverse DCT, spatial prediction, motion compensation, reference frames, deblocking.]

Parallel H.264 Decoding: Macroblock dependencies. Data-parallel splitting of the decoder’s reconstruction module is challenging due to spatial and temporal dependencies: intra prediction, deblocking and inter prediction.

Evaluated Methods: Overview. We compare the performance of five different approaches to data-parallel splitting of the decoder’s reconstructor module: the single-row approach, the multi-column approach, the blocking slice-parallel method, the non-blocking slice-parallel method and the diagonal approach.

Evaluated Methods: Single-row (SR) approach. [Figure: assignment of MBs to processors for 2, 4 and 8 cores.] N is the number of processors. Processor i ( i = 0, 1, …, N - 1 ) is responsible for decoding the yth row of MBs if ( y mod N ) = i.
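
The single-row rule can be sketched in a few lines of Python. The function name and the 8-row frame are illustrative assumptions, not taken from the slides; rows are numbered from 0 so that they line up with processor indices 0 … N - 1.

```python
def single_row_owner(y: int, n_cores: int) -> int:
    """Processor that decodes MB row y under the rule (y mod N) = i."""
    return y % n_cores


# Hypothetical frame with 8 MB rows, shown for 2, 4 and 8 cores.
ROWS = 8
for n_cores in (2, 4, 8):
    owners = [single_row_owner(y, n_cores) for y in range(ROWS)]
    print(f"{n_cores} cores: {owners}")
```

Consecutive rows always land on different processors, which is why every row border is also a processor border in this scheme.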

Evaluated Methods: Single-row approach. An example of the SR approach ( 2 cores ). Processing a macroblock takes a constant 1 unit of time. [Figure: decoding progress at T = 2, 3, 8, 10 and 34.]

Evaluated Methods: Single-row approach. Advantages: simplicity; only a small start delay. Disadvantage: many dependencies across processor assignment borders.

Evaluated Methods: Multi-column (MC) approach. [Figure: assignment of MBs to processors for 2, 4 and 8 cores.] w is the width of a multi-column. Processor i ( i = 0, 1, …, N - 1 ) is responsible for decoding an MB of the xth column if iw ≤ x < ( i + 1 )w.
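
A minimal sketch of the multi-column rule, reading the column range as half-open (iw ≤ x < (i + 1)w) so that every column has exactly one owner. The function name and the 8-column frame are assumptions made for illustration, and columns are numbered from 0.

```python
def multi_column_owner(x: int, w: int) -> int:
    """Processor i such that i*w <= x < (i + 1)*w."""
    return x // w


# Hypothetical frame with 8 MB columns split between 2 cores (w = 4).
COLUMNS, N_CORES = 8, 2
w = COLUMNS // N_CORES
owners = [multi_column_owner(x, w) for x in range(COLUMNS)]
print(owners)

# Every column is covered by exactly one processor range.
for x in range(COLUMNS):
    matches = [i for i in range(N_CORES) if i * w <= x < (i + 1) * w]
    assert matches == [owners[x]]
```

With strict inequalities on both sides, columns 0, w, 2w, … would belong to no processor, which is why the lower bound has to be inclusive.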

Evaluated Methods: Multi-column approach. An example of the MC approach ( 2 cores ). Advantage: fewer dependencies across processors; a processor has to wait for results only at the column boundaries. [Figure: decoding progress at T = 4, 5, 8 and 36.]

Evaluated Methods: Slice-parallel (SP) approach. [Figure: assignment of MBs to processors for 2, 4 and 8 cores.] h is the height of a slice. Processor i ( i = 0, 1, …, N - 1 ) is responsible for decoding an MB of the yth row if ih ≤ y < ( i + 1 )h.

Evaluated Methods: Slice-parallel approach. An example of the SP approach in the blocking version ( 2 cores ). Disadvantages: long delay; idle CPUs and therefore lower core usage. [Figure: decoding progress at T = 26, 32 and 58.]

Evaluated Methods: Slice-parallel approach. An example of the SP approach in the non-blocking version ( NBSP, 2 cores ). No dependencies are considered across slice boundaries (the slices are completely independent). NBSP requires full control over the encoder. [Figure: decoding progress at T = 1 and T = 32.]
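
The timing examples for the single-row, multi-column and slice-parallel approaches can be reproduced with a small scheduling sketch. Several things here are assumptions rather than statements from the slides: the frame is taken to be 8 x 8 MBs, each core decodes its own MBs in raster order, and an MB waits for its left, top-left, top and top-right neighbours. All function and variable names are illustrative.

```python
from typing import Callable, Dict, Optional, Tuple


def makespan(width: int, height: int, n_cores: int,
             owner: Callable[[int, int], int],
             slice_height: Optional[int] = None) -> int:
    """Time at which the last MB finishes, at 1 time unit per MB.

    Assumed model: raster order per core; dependencies on the left,
    top-left, top and top-right MBs. If slice_height is given, dependencies
    crossing a slice boundary are ignored (non-blocking variant).
    """
    finish: Dict[Tuple[int, int], int] = {}
    core_free = [0] * n_cores
    for y in range(height):
        for x in range(width):
            core = owner(x, y)
            ready = core_free[core]
            for dx, dy in ((x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)):
                if not (0 <= dx < width and dy >= 0):
                    continue  # neighbour lies outside the frame
                if slice_height is not None and dy // slice_height != y // slice_height:
                    continue  # neighbour belongs to another, independent slice
                ready = max(ready, finish[(dx, dy)])
            finish[(x, y)] = ready + 1
            core_free[core] = ready + 1
    return max(core_free)


W, H, N = 8, 8, 2  # assumed frame size in MBs, and 2 cores


def single_row(x: int, y: int) -> int:
    return y % N


def multi_column(x: int, y: int) -> int:
    return x // (W // N)


def slice_parallel(x: int, y: int) -> int:
    return y // (H // N)


results = {
    "SR": makespan(W, H, N, single_row),
    "MC": makespan(W, H, N, multi_column),
    "BSP": makespan(W, H, N, slice_parallel),
    "NBSP": makespan(W, H, N, slice_parallel, slice_height=H // N),
}
print(results)
```

Under these assumptions the sketch yields 34, 36, 58 and 32 time units, which agree with the final T values of the four 2-core examples. A real decoder has varying per-MB cost, so the model only illustrates how partition shape affects waiting time.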

Evaluated Methods: Diagonal (DG) approach. Assignment of MBs to processors: the first line of MBs is divided into equally sized columns, and the assignment for each subsequent line is derived by left-shifting the MB assignment of the line above. [Figure: assignment for 2, 4 and 8 cores.]
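
One possible reading of the diagonal rule is sketched below. The slides do not say what enters at the right edge when a line is shifted left, so the sketch offers two assumed behaviours: wrapping the assignment around, or extending the last processor's region. The function name, the one-MB shift per line and the 8-column frame are also assumptions.

```python
def diagonal_owner(x: int, y: int, width: int, n_cores: int, wrap: bool = True) -> int:
    """Processor for MB (x, y) when each line is the line above shifted left by one MB.

    Assumes width is divisible by n_cores. With wrap=True the assignment wraps
    around at the right edge; with wrap=False the last core's region is extended.
    """
    w = width // n_cores           # column width on the first line
    shifted = x + y                # position on line 0 that (x, y) inherits from
    if wrap:
        shifted %= width
    return min(shifted // w, n_cores - 1)


WIDTH, N_CORES = 8, 2  # hypothetical frame width in MBs
for y in range(3):
    print([diagonal_owner(x, y, WIDTH, N_CORES) for x in range(WIDTH)])
```

In both variants the border between two neighbouring processors moves one MB to the left per line, so the top-right neighbour of an MB on the left side of that border belongs to the same processor or to the one on its left.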

Evaluated Methods: Diagonal approach. An example of the DG approach. [Figure: decoding progress at T = 4, 10, 12, 13, 16, 18, 20, 23, 24 and 43.]

Evaluated Methods: Diagonal approach. Comparing the inter-processor dependencies introduced by the DG and MC approaches. Diagonal approach: dependencies for CPU 2 originate solely from MBs assigned to CPU 1. Multi-column approach: MBs assigned to CPU 2 are also dependent on CPU 3.

Experimental Results: Overview. Test sequences. Parameters: GOP size = 14; search range = ±16 pixels; 5 reference frames.

Experimental Results: Run-time complexity. Two major indicators of the efficiency of a multi-core decoding system: (1) the decoder’s run-time, where a low run-time indicates a high system decoding performance; (2) the number of data-dependency stalls occurring during the decoding process, which provides an estimate of how efficiently the system’s computational resources are used.

Experimental Results: Run-time complexity. [Figure: speed-up in run-time.] The speed increase of each parallelization approach is given in multiples of the single-core performance.
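
Written as a formula, and using $T_1$ for the single-core run-time and $T_N$ for the run-time on $N$ cores (symbols introduced here, not taken from the slides), the speed-up is presumably the usual ratio:

```latex
S(N) = \frac{T_1}{T_N}
```

As a purely hypothetical illustration, a decoder that needs 64 time units for a frame on one core and 34 time units on two cores would have $S(2) = 64 / 34 \approx 1.88$.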

Experimental Results: Run-time complexity. [Figure: stall cycles caused by data dependencies between the cores.]

Experimental Results: Inter-communication. Memory transfers to and from the external DRAM and between the cores’ local memories are expensive in terms of power consumption and transfer time. Core inter-communication: loading reference data and deblocking pixels.

Experimental Results: Inter-communication. [Figure: data transfer volume for reference data and deblocking information.]

Conclusions: In this study, we have evaluated five data-parallel approaches for the H.264 decoder. The run-time of each parallelization approach is influenced by the sizes and shapes of the frame partitions. Large, dependency-minimizing partitions cause less inter-communication between cores.