Parallel Scalability and Efficiency of HEVC Parallelization Approaches

Presentation transcript:

Parallel Scalability and Efficiency of HEVC Parallelization Approaches Chi Ching Chi, Mauricio Alvarez-Mesa, Ben Juurlink, Gordon Clare, Félix Henry, Stéphane Pateux and Thomas Schierl IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY

Outline Introduction Video codec parallelization approaches Coding efficiency analysis Experimental evaluation Conclusions

Introduction While a single-core processor can decode a 1080p H.264/AVC video in real time, it is very unlikely that single-core performance will be enough to decode a 2160p50 HEVC video in real time. To obtain real-time HEVC decoding performance, parallelism is no longer an option but a necessity.

Introduction H.264/AVC supports slice parallelization, but a decoder may not achieve real-time performance if it receives a video with only one or a few slices per frame. The main parallelization approaches currently included in the HEVC draft are Tiles and Wavefront Parallel Processing (WPP). This paper presents an approach called Overlapped Wavefront (OWF).

Previous parallelization strategies Frame-level parallelism Slice-level parallelism Macroblock-level parallelism

Frame-level parallelism Frame-level parallelism consists of processing multiple frames at the same time. It is sufficient for multicore systems with just a few cores. If motion vectors are long due to fast motion, however, the dependencies between frames leave little parallelism to exploit.

Slice-level Parallelism Each frame can be partitioned into one or more slices. Slices in a frame are completely independent from each other and can therefore be processed in parallel. This helps when a frame contains several slices, but not when there is only one slice per frame.
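The limit described on this slide can be illustrated with a small calculation. The following is a minimal sketch, assuming every slice costs the same to decode and that slices are fully independent; the function name `slice_speedup_bound` is hypothetical and does not come from the presentation.

```python
import math


def slice_speedup_bound(num_slices: int, num_cores: int) -> float:
    """Upper bound on speedup from slice-level parallelism.

    Assumption (not from the slides): all slices have equal cost, so the
    frame is decoded in ceil(num_slices / num_cores) rounds of one slice
    per core.
    """
    if num_slices < 1 or num_cores < 1:
        raise ValueError("num_slices and num_cores must be positive")
    rounds = math.ceil(num_slices / num_cores)
    return num_slices / rounds


if __name__ == "__main__":
    # With one slice per frame the bound stays at 1.0 regardless of cores.
    for slices in (1, 2, 4, 8):
        print(slices, slice_speedup_bound(slices, num_cores=4))
```

Under these assumptions the bound never exceeds the number of slices, which is why a stream encoded with one slice per frame gains nothing from additional cores.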

Macroblock-level Parallelism

Parallelization Strategies in HEVC Tiles Wavefront Parallel Processing (WPP) Overlapped Wavefront (OWF)

Tiles

Tiles The number of tiles and the location of their boundaries can be defined for the entire sequence or changed from picture to picture. Compared to slices, tiles have better coding efficiency, because tiles allow picture partition shapes that contain samples with potentially higher correlation than slices. The rate-distortion loss increases with the number of tiles.
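To make the idea of tile boundaries concrete, here is a minimal sketch that splits a picture into a grid of tiles measured in CTBs, using an even integer split. The 64x64 CTB size, the function names, and the choice of an even split are assumptions made for illustration; the slides only state that the number and location of boundaries are configurable.

```python
def uniform_sizes(total_ctbs: int, parts: int) -> list[int]:
    """Split total_ctbs into `parts` near-equal integer sizes."""
    if parts < 1 or parts > total_ctbs:
        raise ValueError("parts must be between 1 and total_ctbs")
    return [
        ((i + 1) * total_ctbs) // parts - (i * total_ctbs) // parts
        for i in range(parts)
    ]


def tile_grid(width_px: int, height_px: int, cols: int, rows: int, ctb: int = 64):
    """Return (column widths, row heights) of the tile grid, in CTBs.

    ctb=64 is an assumed CTB size, not a value taken from the slides.
    """
    width_ctbs = -(-width_px // ctb)    # ceiling division
    height_ctbs = -(-height_px // ctb)
    return uniform_sizes(width_ctbs, cols), uniform_sizes(height_ctbs, rows)


if __name__ == "__main__":
    col_widths, row_heights = tile_grid(1920, 1080, cols=3, rows=2)
    print("tile column widths (CTBs):", col_widths)
    print("tile row heights (CTBs):", row_heights)
```

Each tile in such a grid can be handed to a separate thread. More tiles expose more parallelism, at the cost of the growing rate-distortion loss noted on the slide.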

Wavefront Parallel Processing (WPP)

Overlapped Wavefront (OWF) When a thread has finished a CTB row in the current picture and no more rows are available, it can start processing the next picture instead of waiting for the current picture to finish. To support this approach, the motion vector is constrained to ¼ of the picture height.
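The difference between waiting for a picture to finish and overlapping pictures can be sketched with a toy scheduler. This is an illustrative model, not the authors' decoder. It assumes that every CTB takes one time unit, that a CTB row may start two CTBs behind the row above it, and that a row of the next picture may start once the row lying ¼ of the picture height further down in the previous picture has finished. The function name `decode_time` and all parameters are hypothetical.

```python
import heapq
import math


def decode_time(pictures: int, rows: int, cols: int, threads: int, overlap: bool) -> int:
    """Time units to decode `pictures` pictures of rows x cols CTBs.

    overlap=False: a new picture starts only after the previous one is done.
    overlap=True:  rows of the next picture start as soon as their assumed
                   reference rows in the previous picture are finished.
    """
    free = [0] * threads                  # time at which each thread is free
    heapq.heapify(free)
    mv_rows = math.ceil(rows / 4)         # assumed reach of a motion vector
    prev_finish = None                    # row finish times, previous picture
    for _ in range(pictures):
        prev_row_start = None
        finish = []
        for r in range(rows):
            ready = heapq.heappop(free)   # earliest available thread
            if prev_row_start is not None:
                # Wavefront: stay two CTBs behind the row above.
                ready = max(ready, prev_row_start + 2)
            if prev_finish is not None:
                if overlap:
                    ref_row = min(rows - 1, r + mv_rows)
                    ready = max(ready, prev_finish[ref_row])
                else:
                    ready = max(ready, max(prev_finish))
            prev_row_start = ready
            finish.append(ready + cols)
            heapq.heappush(free, ready + cols)
        prev_finish = finish
    return max(prev_finish)


if __name__ == "__main__":
    for mode in (False, True):
        print("overlap" if mode else "no overlap",
              decode_time(pictures=2, rows=4, cols=8, threads=4, overlap=mode))
```

In this model the non-overlapped schedule leaves threads idle while the wavefront ramps down at the end of each picture and ramps up again at the start of the next one. The overlapped schedule fills those gaps with rows from the next picture.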

Coding efficiency analysis

Experimental evaluation Environment

Experimental evaluation

Conclusions We present a detailed performance comparison of the main parallelization approaches, namely WPP, Tiles and OWF. At 12 cores, Tiles performs 7% better than WPP on average, and the proposed OWF performs 28% better than Tiles on average. Real-time performance is achieved for 1080p50 videos, but "only" 25.4 fps for 2160p.