Thread-Parallel MPEG-2, MPEG-4 and H.264 Video Encoders for SoC Multi-Processor Architecture
Tom R. Jacobs, Vassilios A. Chouliaras, and David J. Mulvaney

1 Thread-Parallel MPEG-2, MPEG-4 and H.264 Video Encoders for SoC Multi-Processor Architecture
Tom R. Jacobs, Vassilios A. Chouliaras, and David J. Mulvaney
IEEE Transactions on Consumer Electronics

2 Outline
Introduction (background knowledge, main purpose); Previous work; Methodology; Experimental results; Conclusions

3 Introduction: Background Knowledge (1/5)
A number of lossy video compression standards have been developed: MPEG-1, MPEG-2, MPEG-4 Part 2 and H.264. Each maintains image quality while reducing bit-rate, at the cost of additional computation and power consumption.

4 Introduction: Background Knowledge (2/5)
Such processing-intensive consumer applications are generally implemented on System-on-Chip (SoC) devices, which can exploit two forms of parallelism: DLP (Data-Level Parallelism) and TLP (Thread-Level Parallelism).

5 Introduction: Background Knowledge (3/5)
Data-Level Parallelism (DLP): distributing the data across different parallel processing nodes. Every node runs the same program on its own portion of the data, for example:

    program:
        ...
        if CPU = "a" then
            low_limit = 1; upper_limit = 5
        else if CPU = "b" then
            low_limit = 6; upper_limit = 10
        end if
        do i = low_limit, upper_limit
            Task on d(i)
        end do
        ...
    end program
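The same split can be sketched in Python. This is a minimal illustration, assuming an array `d` of 10 elements and two nodes; `index_range`, `task`, `run_node` and `data_parallel` are hypothetical names, and `task` merely squares each element in place of the slide's unspecified "Task on d(i)". In CPython the global interpreter lock limits real speedup for pure-Python work, so the threads show the partitioning, not performance.

```python
from concurrent.futures import ThreadPoolExecutor


def index_range(node, num_nodes, n):
    """1-based inclusive [low, high] range of an n-element array owned by
    `node` (0-based) when the array is split into equal contiguous chunks."""
    chunk = (n + num_nodes - 1) // num_nodes
    low = node * chunk + 1
    high = min(n, (node + 1) * chunk)
    return low, high


def task(x):
    # Hypothetical stand-in for "Task on d(i)"; here it just squares the element.
    return x * x


def run_node(node, num_nodes, d):
    low, high = index_range(node, num_nodes, len(d))
    # Python lists are 0-based, so shift the 1-based limits by one.
    return [task(d[i - 1]) for i in range(low, high + 1)]


def data_parallel(d, num_nodes=2):
    # Every node executes the same code; only its index range differs.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        parts = pool.map(lambda k: run_node(k, num_nodes, d), range(num_nodes))
        return [y for part in parts for y in part]  # map() keeps node order


# With 10 elements and 2 nodes, node "a" gets 1..5 and node "b" gets 6..10.
print(index_range(0, 2, 10), index_range(1, 2, 10))  # (1, 5) (6, 10)
```

Computing the limits from the node index, instead of hard-coding one `if` branch per CPU, lets the same code scale to any number of nodes.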

6 Introduction: Background Knowledge (4/5)
[Figure: data array D of size 10 divided between processing nodes]

7 Introduction: Background Knowledge (5/5)
Thread-Level Parallelism (TLP) is the parallelism inherent in an application that runs multiple threads at once. Benefit: the workload of a single high-performance processor can be distributed among a number of slower and simpler processor cores.
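As a contrast to the static DLP split, the sketch below distributes work dynamically: worker threads pull macroblock rows from a shared queue until it is empty. It assumes the unit of work is one MB row and that `encode_row` is a hypothetical per-row kernel supplied by the caller; the function name `tlp_encode_rows` is likewise illustrative.

```python
import queue
import threading


def tlp_encode_rows(num_rows, num_threads, encode_row):
    """Hand out MB rows to worker threads through a shared queue.
    encode_row(row) is a hypothetical per-row kernel; results are stored by row index."""
    jobs = queue.Queue()
    for row in range(num_rows):
        jobs.put(row)
    results = [None] * num_rows
    rows_done = [0] * num_threads  # per-thread workload, to inspect the distribution

    def worker(tid):
        while True:
            try:
                row = jobs.get_nowait()
            except queue.Empty:
                return  # no work left
            results[row] = encode_row(row)  # each row index is written by one thread only
            rows_done[tid] += 1             # each thread updates only its own counter

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, rows_done


# A QCIF frame (176 x 144) has 9 rows of 16 x 16 macroblocks.
results, rows_done = tlp_encode_rows(9, 4, lambda r: r * 10)
```

Because faster threads simply take more rows, a queue tolerates uneven per-row cost better than fixed index ranges; `rows_done` shows how the rows ended up spread across threads.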

8 Introduction: Main Purpose (1/2)
Use Thread-Level Parallelism (TLP) techniques to improve video encoding performance, measured as a reduction in DIC (Dynamic Instruction Count). The means: distributing the workload among a number of processors executing in parallel.

9 Introduction: Main Purpose (2/2)
The results presented demonstrate that reductions in dynamic instruction count can be achieved.

10 Previous Work
Most of this research focuses on coarse-grained TLP, most commonly distributing the workload at the GOP (Group of Pictures) level. GOP-level multithreading requires little inter-node communication.
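A minimal sketch of GOP-level distribution, assuming closed GOPs of fixed length so that each GOP can be encoded without reference to any other. `split_into_gops`, `encode_gop` and `encode_sequence` are hypothetical names, and `encode_gop` returns a placeholder string rather than a real bitstream.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_gops(num_frames, gop_size):
    """Return [start, end) frame ranges, one per GOP."""
    return [(s, min(s + gop_size, num_frames)) for s in range(0, num_frames, gop_size)]


def encode_gop(gop):
    # Hypothetical stand-in for encoding one closed GOP. It touches only its own
    # frames, which is why GOP-level threads need little inter-node communication.
    start, end = gop
    return "GOP[%d:%d]" % (start, end)  # placeholder for the GOP's bitstream


def encode_sequence(num_frames, gop_size, num_workers):
    gops = split_into_gops(num_frames, gop_size)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in GOP order, so the outputs concatenate correctly.
        return list(pool.map(encode_gop, gops))


print(encode_sequence(30, 12, 2))  # ['GOP[0:12]', 'GOP[12:24]', 'GOP[24:30]']
```

The price of this coarse granularity is buffering: each worker needs a whole GOP of frames before it can start, which raises memory use and latency.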

11 Previous Work
In 1995, K. Shen, L. A. Rowe, and E. J. Delp implemented parallel MPEG-1 at the GOP level. In 1996, S. Bozoki, S. J. P. Westen, R. L. Lagendijk and J. Biemond compared GOP-level and slice-level parallelism for MPEG-1.

12 Previous Work
In 1997, A. Bilas, J. Fritts and J. P. Singh evaluated the performance of MPEG-2 decoders on a shared-memory system. Akramullah, Ahmad and Liou implemented a threaded MPEG-2 encoder at the MB level using local memory.

13 Methodology: Overview
The threaded MPEG-2, MPEG-4 and H.264 implementations were compiled for and run on a multi-context instruction set simulator (MT-ISS) based on the SimpleScalar infrastructure. The most important issue is data dependencies between processors, which must be handled to avoid race hazards.

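14 Methodology: Race hazards
Example: two threads (Thread 1 and Thread 2) each increment a shared integer i.
Expected condition: the read-increment-write sequences of the two threads do not overlap, so both increments take effect.
Error condition: both threads read i before either writes it back, so one increment is lost.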
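The race on the slide can be reproduced with a shared counter. In this sketch (the function name `increment_shared` and its parameters are hypothetical), the statement `counter["i"] += 1` is a read-modify-write that is not atomic, so without the lock two threads may read the same old value and one update is lost; holding a `threading.Lock` makes each update exclusive.

```python
import threading


def increment_shared(num_threads=2, iterations=100_000, use_lock=True):
    """Each thread adds 1 to a shared integer `iterations` times.
    With use_lock=False the read-modify-write can interleave and lose updates."""
    counter = {"i": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(iterations):
            if use_lock:
                with lock:  # only one thread at a time reads, increments and writes i
                    counter["i"] += 1
            else:
                counter["i"] += 1  # unprotected: the race hazard from the slide

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["i"]


print(increment_shared(2, 10_000, use_lock=True))  # 20000
```

The unlocked variant may return less than `num_threads * iterations`; whether it does on a given run depends on how the interpreter schedules the threads, which is exactly what makes such hazards hard to spot in testing.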

15 Methodology: Thread-parallel MPEG-2 (1/5)
The Test Model 5 (TM5) MPEG-2 encoder is used. Computation analysis (QCIF): DIST1 accounts for 52% to 73% of the total DIC for search windows of 6 to 62 pels, respectively. FullSearch accounts for 3.5% to 23.2% of the total DIC; this could be reduced with a less complex motion estimation algorithm (such as 3-step, 4-step or diamond search). FDCT and IDCT account for 2.1% to 21% of the total DIC.
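To show why DIST1 dominates, here is a simplified analogue of a block-distance function and an exhaustive search. It is a sketch, not TM5's code: it assumes integer-pel motion only, uses the sum of absolute differences (SAD) as the distance, frames stored as lists of rows, and the hypothetical names `sad` and `full_search`.

```python
def sad(cur, ref, bx, by, dx, dy, n=16):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        crow = cur[by + y]
        rrow = ref[by + dy + y]
        for x in range(n):
            total += abs(crow[bx + x] - rrow[bx + dx + x])
    return total


def full_search(cur, ref, bx, by, search_range, n=16):
    """Exhaustive integer-pel search over +/- search_range.
    Returns (best_sad, (dx, dy))."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)  # the DIST1-like inner kernel
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Synthetic test: `cur` is `ref` shifted so the true vector is (3, -2).
ref = [[(x * 7 + y * 13) % 256 for x in range(48)] for y in range(48)]
cur = [[((x + 3) * 7 + (y - 2) * 13) % 256 for x in range(48)] for y in range(48)]
print(full_search(cur, ref, 16, 16, 4))  # (0, (3, -2))
```

The number of distance evaluations grows as (2R + 1)^2 with the search range R, consistent with DIST1's growing share of the DIC as the window widens; a 3-step or diamond search evaluates far fewer candidates.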

16 Methodology: Thread-parallel MPEG-2 (2/5)

17 Methodology: Thread-parallel MPEG-2 (3/5)
Motion estimation: the kernel implementation can take advantage of data-parallel techniques. The results are stored in the mbinfo structure for use by motion compensation. All variables must remain exclusive to their thread during the parallel sections.
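The exclusivity requirement can be met by giving each task its own slots in the result structure. In this sketch, `mbinfo` mirrors the name on the slide but is just a 2-D list of `(dx, dy)` vectors; the row-per-task split, the SAD search and the function name `motion_estimate_frame` are assumptions for illustration, not the TM5 implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def motion_estimate_frame(cur, ref, mb_size, search_range, num_threads):
    """Row-parallel motion estimation. mbinfo[r][c] holds the (dx, dy) vector of
    macroblock (r, c); each row is written by exactly one task and the frames are
    only read, so the parallel section shares no writable variables."""
    h, w = len(cur), len(cur[0])
    rows, cols = h // mb_size, w // mb_size
    mbinfo = [[None] * cols for _ in range(rows)]

    def block_sad(bx, by, dx, dy):
        return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
                   for y in range(mb_size) for x in range(mb_size))

    def estimate_row(r):
        by = r * mb_size
        for c in range(cols):
            bx = c * mb_size
            best = None  # local to this task
            for dy in range(-search_range, search_range + 1):
                for dx in range(-search_range, search_range + 1):
                    if 0 <= bx + dx <= w - mb_size and 0 <= by + dy <= h - mb_size:
                        cost = block_sad(bx, by, dx, dy)
                        if best is None or cost < best[0]:
                            best = (cost, (dx, dy))
            mbinfo[r][c] = best[1]  # written only to this MB's own slot

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(estimate_row, range(rows)))  # consume to wait and surface errors
    return mbinfo
```

Motion compensation can then read `mbinfo` after all tasks have finished, which the `list(pool.map(...))` call guarantees.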

18 Methodology: Thread-parallel MPEG-2 (4/5)
Forward transform: FDCT first scans the MBs row by row and processes the MBs in each row individually, determining the prediction error and applying the DCT to each block. The thread-parallel transform function can therefore operate at block level.
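A block-level sketch of the forward path: each task forms the prediction error of one 8 x 8 block and applies a 2-D DCT to it. It assumes an orthonormal DCT-II written in its direct O(N^4) form for clarity (TM5 uses its own fast routine), blocks given as lists of rows, and the hypothetical names `fdct_8x8`, `transform_block` and `transform_blocks_parallel`.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8


def fdct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8 x 8 block (direct form, for clarity)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
        for v in range(N):
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out


def transform_block(args):
    cur_block, pred_block = args
    # Prediction error for this block only; no other block is read or written.
    err = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur_block, pred_block)]
    return fdct_8x8(err)


def transform_blocks_parallel(cur_blocks, pred_blocks, num_threads=4):
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(transform_block, zip(cur_blocks, pred_blocks)))


# One 4:2:0 macroblock has 6 blocks (4 luma + 2 chroma); a constant error of 4
# gives a DC coefficient of 8 * 4 = 32 and (numerically) zero AC coefficients.
cur = [[[10] * N for _ in range(N)] for _ in range(6)]
pred = [[[6] * N for _ in range(N)] for _ in range(6)]
coeffs = transform_blocks_parallel(cur, pred)
```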

19 Methodology: Thread-parallel MPEG-2 (5/5)
Inverse transform: IDCT scans the MBs row by row and then block by block. Because there are no data dependencies between blocks, the blocks can be processed in parallel.
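The inverse path parallelizes the same way. This sketch assumes the orthonormal 8 x 8 DCT-II convention (so the inverse is the DCT-III), adds the prediction back to form the reconstruction, and omits the rounding and clipping a real encoder applies; `idct_8x8` and `reconstruct_mb` are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8


def idct_8x8(coef):
    """Inverse of the orthonormal 8 x 8 DCT-II (direct form, for clarity)."""
    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            s = 0.0
            for u in range(N):
                cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
                for v in range(N):
                    cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
                    s += (cu * cv * coef[u][v]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[y][x] = s
    return out


def reconstruct_mb(coef_blocks, pred_blocks, num_threads=4):
    """IDCT every block of a macroblock in parallel and add the prediction back.
    Blocks do not depend on each other, so completion order does not matter;
    map() still returns them in block order."""
    def one(args):
        coef, pred = args
        residual = idct_8x8(coef)
        return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(pred, residual)]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(one, zip(coef_blocks, pred_blocks)))


# Block i carries only a DC coefficient of 8 * i, i.e. a constant residual of i.
coefs = [[[8.0 * i if (u, v) == (0, 0) else 0.0 for v in range(N)] for u in range(N)]
         for i in range(4)]
preds = [[[6] * N for _ in range(N)] for _ in range(4)]
recon = reconstruct_mb(coefs, preds)  # block i is approximately 6 + i everywhere
```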

20 Methodology: Thread-parallel MPEG-4 (1/8)
The implementation is based on the XviD project using the Advanced Simple Profile (ASP), with the following features: bidirectional frames, quarter-pel motion compensation, global motion compensation, trellis quantization and custom quantization matrices.

21 Methodology: Thread-parallel MPEG-4 (2/8)
Computation analysis (QCIF)

22 Methodology: Thread-parallel MPEG-4 (3/8)
The XviD encoder has two main paths: intra-frame encoding and inter-frame encoding.

23 Methodology: Thread-parallel MPEG-4 (4/8)
Intra-frame encoding: FrameCodeI encodes the MBs row by row, and the loop that encodes the MBs in a row of the image is parallelized. MB data is held in pMB, a shared memory array. The function with the highest DIC in FrameCodeI is MBTransQuantIntra.
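The row loop described above can be sketched as follows. `pMB` mirrors the slide's shared array, while `frame_code_i` and the per-MB callable `code_mb` are hypothetical; the point is that rows are visited in order, MBs within a row run in parallel, and every task writes only its own element of the shared array.

```python
from concurrent.futures import ThreadPoolExecutor


def frame_code_i(mb_rows, mb_cols, code_mb, num_threads=4):
    """Hypothetical FrameCodeI-style loop: rows in order, the MBs of one row in
    parallel. pMB is shared, but each task writes only pMB[row * mb_cols + col]."""
    pMB = [None] * (mb_rows * mb_cols)

    def code_one(pos):
        row, col = pos
        pMB[row * mb_cols + col] = code_mb(row, col)  # disjoint element per task

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for row in range(mb_rows):
            # list() waits for every MB of this row before the next row starts.
            list(pool.map(code_one, [(row, col) for col in range(mb_cols)]))
    return pMB


# QCIF: 11 x 9 macroblocks.
pMB = frame_code_i(9, 11, lambda r, c: (r, c))
```

Joining at the end of each row keeps the row-by-row order of the serial loop, so anything that runs after a row (for example, code that looks at the row above) still sees completed data.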

24 Methodology: Thread-parallel MPEG-4 (5/8)
MBTransQuantIntra performs forward transformation, quantization and inverse transformation. It updates the shared data structure pEnc, which includes a count of quantization values, so that update must be a serial code section. Apart from this, the pixel data of each MB is transformed into the frequency domain independently. MBPrediction and MBCoding are responsible for VLC and for writing to the bitstream.
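The split between parallel and serial work can be sketched like this. All names are hypothetical: `trans_quant(mb)` stands in for the per-MB transform and quantization and returns `(coeffs, quant)`, `vlc_code` stands in for the VLC step, and `quant_counts` plays the role of the shared quantizer statistics kept in pEnc. The statistics update is guarded by a lock, and the bitstream is written serially in MB order.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def encode_row_intra(mbs, trans_quant, vlc_code, num_threads=4):
    """Parallel transform/quantization per MB, a locked update of shared quantizer
    statistics, then serial VLC coding because all MBs append to one bitstream."""
    quant_counts = {}
    lock = threading.Lock()

    def parallel_part(mb):
        coeffs, quant = trans_quant(mb)  # independent per macroblock
        with lock:  # serial section: the shared count of quantization values
            quant_counts[quant] = quant_counts.get(quant, 0) + 1
        return coeffs

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        all_coeffs = list(pool.map(parallel_part, mbs))  # results in MB order

    bitstream = []
    for coeffs in all_coeffs:  # serial: bitstream order must follow MB order
        bitstream.append(vlc_code(coeffs))
    return bitstream, quant_counts


bitstream, counts = encode_row_intra(
    list(range(6)),
    lambda mb: ([mb * 2], 4 if mb % 2 else 6),
    lambda c: "v%d" % c[0],
)
```

An alternative to the lock is for each thread to keep private counts and merge them after the join; either way the shared update no longer races.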

25 Methodology: Thread-parallel MPEG-4 (6/8)
Inter-frame encoding: FrameCodeP has two parts. Part 1: motion estimation. Part 2: transformation, quantization and motion compensation.
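One way to run the two parts of a FrameCodeP-style loop in parallel is to separate them with a barrier. This is a sketch under assumptions: `frame_code_p`, `motion_estimate` and `code_mb` are hypothetical, MBs are assigned round-robin to threads, and the barrier is there because part 2 of one MB may read the vectors of neighbouring MBs (for example for MV prediction), which another thread may have produced.

```python
import threading


def frame_code_p(num_mb, num_threads, motion_estimate, code_mb):
    """Part 1: motion estimation for every MB (into mvs).
    Part 2: MC, transform and quantization, which may read any MB's vector.
    The barrier keeps every thread out of part 2 until part 1 is complete."""
    mvs = [None] * num_mb
    out = [None] * num_mb
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        mine = range(tid, num_mb, num_threads)  # round-robin MB assignment
        for i in mine:
            mvs[i] = motion_estimate(i)
        barrier.wait()  # all vectors exist past this point
        for i in mine:
            out[i] = code_mb(i, mvs)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


# code_mb reads the previous MB's vector, which another thread computed.
out = frame_code_p(12, 3, lambda i: (i, -i), lambda i, mvs: mvs[i - 1] if i else (0, 0))
```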

26 Methodology: Thread-parallel MPEG-4 (7/8)
Motion estimation determines an MV for every MB and applies certain criteria to decide when intra coding should be used. MBs are scanned in raster order. The process has two kinds of step: motion prediction from the current frame, and ME relative to the reference frames.

27 Methodology: Thread-parallel MPEG-4 (8/8)
Motion prediction examines the MVs of neighbouring MBs to determine an initial estimate for ME.
[Figure: neighbouring-MB patterns used for motion prediction: ideal pattern, typical pattern, TLP pattern]
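The sketch below shows how the set of usable neighbours affects the initial estimate. It uses the median of the left, above and above-right vectors (the predictor MPEG-4 Part 2 uses for single-vector MBs) and treats unavailable neighbours as (0, 0), which simplifies the standard's edge rules. The `same_thread_available` rule, in which rows are interleaved over threads and a thread only trusts vectors it computed itself, is a hypothetical example of a TLP pattern, not necessarily the one in the paper.

```python
def median_mv_predictor(mvs, row, col, available):
    """Component-wise median of the left, above and above-right vectors.
    Unavailable candidates count as (0, 0) (simplified edge handling)."""
    cands = []
    for r, c in ((row, col - 1), (row - 1, col), (row - 1, col + 1)):
        cands.append(mvs[r][c] if available(r, c) else (0, 0))
    xs = sorted(v[0] for v in cands)
    ys = sorted(v[1] for v in cands)
    return (xs[1], ys[1])


def raster_available(rows, cols, row, col):
    """Serial raster scan: an in-frame MB is ready if it precedes (row, col)."""
    def available(r, c):
        in_frame = 0 <= r < rows and 0 <= c < cols
        earlier = r < row or (r == row and c < col)
        return in_frame and earlier
    return available


def same_thread_available(rows, cols, row, col, num_threads):
    """Hypothetical TLP rule: rows are interleaved over threads (row % num_threads)
    and a thread only uses vectors from rows it processed itself."""
    base = raster_available(rows, cols, row, col)
    return lambda r, c: base(r, c) and r % num_threads == row % num_threads


mvs = [[(0, 0)] * 3 for _ in range(3)]
mvs[1][0], mvs[0][1], mvs[0][2] = (2, 0), (4, 2), (6, -2)
print(median_mv_predictor(mvs, 1, 1, raster_available(3, 3, 1, 1)))          # (4, 0)
print(median_mv_predictor(mvs, 1, 1, same_thread_available(3, 3, 1, 1, 2)))  # (0, 0)
```

With fewer usable neighbours the initial estimate degrades, so the search may start further from the true vector; this is the kind of trade-off the TLP pattern introduces.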

28 Methodology: H.264 (1/6)
The implementation uses x264, parallelized by frame slicing. The main problems with MB-level parallelism are the wide variation in processor workload and the need to modify the prediction algorithm.

29 Methodology: H.264 (2/6)
Slices in H.264: a slice is a group of MBs in a frame that can be encoded or decoded separately from the remainder of the frame, because prediction (e.g. of motion vectors) is not allowed across slice boundaries. Drawback: the required bit-rate increases.
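A sketch of slice-based partitioning, assuming each slice is a band of whole MB rows and that a slice is the unit handed to a thread. The names `make_slices`, `slice_of`, `neighbour_usable`, `encode_frame` and the per-slice callable `encode_slice` are hypothetical; they are not x264's API.

```python
from concurrent.futures import ThreadPoolExecutor


def make_slices(mb_rows, num_slices):
    """Split a frame's MB rows into contiguous slices of near-equal height.
    Returns a list of (first_row, end_row_exclusive)."""
    base, extra = divmod(mb_rows, num_slices)
    slices, start = [], 0
    for s in range(num_slices):
        end = start + base + (1 if s < extra else 0)
        slices.append((start, end))
        start = end
    return slices


def slice_of(row, slices):
    for idx, (a, b) in enumerate(slices):
        if a <= row < b:
            return idx
    raise ValueError("row outside frame")


def neighbour_usable(row, neighbour_row, slices):
    """Prediction may only use a neighbouring MB from the same slice: this is what
    lets each slice be encoded by its own thread, and what costs extra bits."""
    return slice_of(row, slices) == slice_of(neighbour_row, slices)


def encode_frame(mb_rows, num_slices, encode_slice):
    slices = make_slices(mb_rows, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        return list(pool.map(encode_slice, slices))  # per-slice output, in slice order


print(make_slices(9, 4))  # [(0, 3), (3, 5), (5, 7), (7, 9)]
```

Rows on either side of a boundary, such as rows 2 and 3 in the QCIF example above, can no longer predict from each other; more slices mean more such boundaries, which is consistent with the bit-rate penalty noted on the slide.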

30 Methodology: H.264 (3/6)
Comparison of different numbers of slices

31 Methodology: H.264 (4/6)
Comparison of different numbers of slices

32 Methodology: H.264 (5/6)
Different resolutions with 4 slices

33 Methodology: H.264 (6/6)
Computation analysis

34 Experimental Results: MPEG-2, search range

35 Experimental Results: MPEG-4, quality setting

36 Experimental Results: H.264, quantization parameter

37 Experimental Results: Comparative results

38 Conclusions
TLP can greatly reduce the DIC metric of MPEG-2, MPEG-4 and H.264 encoders. For HD sequences, the improvement is around 84%, 92% and 96%, respectively. TLP has become more significant with each new generation of video encoders.
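If these percentages are read as reductions r in the dynamic instruction count of the thread-parallel encoder relative to the single-threaded one (an interpretation, since the slide does not define "improvement"), they correspond to the following equivalent instruction-count ratios S:

```latex
S = \frac{\mathrm{DIC}_{\text{serial}}}{\mathrm{DIC}_{\text{parallel}}} = \frac{1}{1 - r},
\qquad
r = 0.84 \Rightarrow S = 6.25, \quad
r = 0.92 \Rightarrow S = 12.5, \quad
r = 0.96 \Rightarrow S = 25
```

This is arithmetic on the reported percentages only, not a measured wall-clock speedup.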