Complexity Model Based Load-balancing Algorithm For Parallel Tools Of HEVC Yong-Jo Ahn, Tae-Jin Hwang, Dong-Gyu Sim, and Woo-Jin Han 2013 IEEE International.


Complexity Model Based Load-balancing Algorithm For Parallel Tools Of HEVC
Yong-Jo Ahn, Tae-Jin Hwang, Dong-Gyu Sim, and Woo-Jin Han
2013 IEEE International Conference on Visual Communications and Image Processing (VCIP)

Outline
– Introduction
– Related Work
– Proposed Method
– Experimental Results
– Conclusion

Introduction
Demand for new video coding standards has been increasing due to the recent expansion of digital broadcasting services and the advent of various multimedia devices. The newly supported coding tools bring not only high coding efficiency but also high computational complexity, which stems from the decision process over the diverse modes.

Cont.
Studies on parallel processing methods, as well as on fast mode decision algorithms, are considered key parts of the ongoing work on fast HEVC encoders. This paper introduces parallel processing methods that use the slice and tile tools supported by HEVC, and proposes a load-balancing algorithm that enhances slice and tile parallel processing.

Related Work
A few parallel tools are adopted in the HEVC main profile; the key tools for parallel processing are tiles [5] and wavefront parallel processing (WPP) [6].
Parallel methods:
– Tile
– Entropy slice
– WPP (wavefront parallel processing)
[5] A. Fuldseth, M. Horowitz, S. Xu, A. Gegall, and M. Zhou, "Tiles," ITU-T/ISO/IEC JCT-VC doc., JCTVC-E196, Mar
[6] F. Henry and S. Pateux, "Wavefront parallel processing," ITU-T/ISO/IEC JCT-VC doc., JCTVC-E196, Mar

Cont.
[Figure: (a) Tile, (b) Entropy slice, (c) WPP]

Cont.
To select suitable parallel options, several factors should be considered: encoding time saving, coding efficiency loss, and scalability with the number of processing cores. Coding efficiency loss is one of the most important factors when adopting parallel processing.

Cont.
Data-level parallelism can be applied at the frame, slice, tile, or coding unit level, depending on the parallelization method. The number of non-referenced B frames in IBBP coding structures significantly affects coding efficiency and limits scalability with the number of processing cores.

Cont.
WPP offers the highest scalability with the number of processing cores and the smallest coding efficiency loss. However, a large encoding time saving is hard to expect with WPP because of the data dependencies it must respect. In general, increasing the number of slices and tiles affects the bitrate considerably for low-resolution sequences, but has little influence on the bitrate for high-resolution sequences.

Proposed Method
To address the high computational complexity of the HEVC encoder, various technical contributions on early termination methods and fast mode decision algorithms have been adopted into the reference software [7][8]. However, it is not easy to achieve a real-time encoder with the fast algorithms alone. The computational load should also be balanced among cores.
[7] R. H. Gweon, Y.-L. Lee, and J. Lim, "Early termination of CU encoding to reduce HEVC complexity," ITU-T/ISO/IEC JCT-VC doc., JCTVC-F045, July
[8] K. Choi and E. S. Jang, "Coding tree pruning based CU early termination," ITU-T/ISO/IEC JCT-VC doc., JCTVC-F092, July

Complexity Model For HEVC Encoder
For the slice and tile tools, the number of CTUs in each slice or tile must be determined before actual encoding, using complexity prediction.
[Equation (1)]

Cont.
R(s, m): complexity per unit.
r(s, m): complexity ratio of each CU size and mode.
w(s): width of CU size.
NF: a normalization factor for fixed-point operation.

Cont.
The proposed complexity model for the HEVC encoder is evaluated using the Pearson product-moment correlation, on the HEVC common test sequences under the HEVC common test conditions.

Cont.
The Pearson product-moment correlation coefficient is a measure of the linear correlation between two variables X and Y. It takes a value between +1 and −1 inclusive, where +1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.
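The coefficient described on this slide can be computed directly from its definition. The sketch below is only an illustration: the function name `pearson` and the sample values are hypothetical and are not taken from the paper's experiments.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations (unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: predicted complexity vs. measured encoding time per frame.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [2.1, 3.9, 6.2, 7.8]
r = pearson(predicted, measured)
```

A value of `r` close to +1 would indicate that the predicted complexity tracks the measured encoding time almost linearly, which is the property the evaluation is checking for.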

Complexity Model Based Load-balancing Algorithm For Parallel Tools Of HEVC
Number of CTUs for each slice at a given temporal level:
L(k): the number of CTUs assigned to the k-th slice.
i: frame index.
j: temporal layer id.
k: slice number.
N: the number of slices in a frame.
CTU_inFrame: the number of CTUs in the frame.
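The transcript does not preserve the slice formula itself, so the following is not the authors' equation. It is a minimal sketch of the general idea, assuming a per-CTU predicted cost is already available: CTUs in raster order are cut into N contiguous slices whose predicted costs are roughly equal. The function name `balance_slices` and the cost values are hypothetical.

```python
def balance_slices(ctu_cost, num_slices):
    """Split CTUs (raster order) into contiguous slices of similar predicted cost.

    Returns a list L where L[k] is the number of CTUs assigned to slice k.
    This is an illustrative greedy rule, not the formula from the paper.
    """
    n = len(ctu_cost)
    if not 1 <= num_slices <= n:
        raise ValueError("num_slices must be between 1 and the number of CTUs")
    total = sum(ctu_cost)
    counts, start, running = [], 0, 0.0
    for k in range(num_slices - 1):
        slices_left = num_slices - k - 1
        # Cumulative cost that the first k + 1 slices should ideally reach.
        target = total * (k + 1) / num_slices
        # Every slice takes at least one CTU.
        running += ctu_cost[start]
        end = start + 1
        # Add a CTU while doing so brings the cumulative cost closer to the
        # target, and while one CTU remains for each later slice.
        while end < n - slices_left and running + ctu_cost[end] / 2 <= target:
            running += ctu_cost[end]
            end += 1
        counts.append(end - start)
        start = end
    counts.append(n - start)  # the last slice takes whatever is left
    return counts


# Hypothetical frame: four expensive CTUs followed by eight cheap ones.
costs = [4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1]
slice_sizes = balance_slices(costs, 2)
```

With these costs the split is 3 and 9 CTUs, giving both slices a predicted cost of 12, whereas an even split of 6 and 6 CTUs would give 18 and 6.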

Cont.
For tile-level parallel processing, CTUs are assigned to each tile of a temporal layer by applying column and row offsets for load balancing.
L(k): the number of CTUs assigned to the k-th tile.
i: frame index.
j: temporal layer id.
k: tile number.
N_inWidth and N_height: number of tiles composing a frame in the horizontal and vertical directions.
CTU_inWidth and CTU_height: number of CTUs of a tile in the horizontal and vertical directions.

Cont.
Controlling the complexity balance is harder for tile-level parallelism than for slice-level parallelism, because the size of a tile is determined only by the tile width and height, not by the per-CTU offset used in load balancing for slice-level parallelism.
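The coarser control available for tiles can be seen in a small sketch. It assumes a hypothetical per-CTU cost map and a simple separable rule (column boundaries balanced on column sums, row boundaries on row sums); neither the rule nor the names `split_1d`, `balance_tiles`, and `tile_loads` come from the paper.

```python
def split_1d(costs, parts):
    """Greedy contiguous split of `costs` into `parts` groups of similar total."""
    n = len(costs)
    if not 1 <= parts <= n:
        raise ValueError("parts must be between 1 and len(costs)")
    total = sum(costs)
    sizes, start, running = [], 0, 0.0
    for k in range(parts - 1):
        groups_left = parts - k - 1
        target = total * (k + 1) / parts
        running += costs[start]  # each group takes at least one element
        end = start + 1
        while end < n - groups_left and running + costs[end] / 2 <= target:
            running += costs[end]
            end += 1
        sizes.append(end - start)
        start = end
    sizes.append(n - start)
    return sizes


def balance_tiles(cost, tile_cols, tile_rows):
    """Return (column widths, row heights) in CTUs for a tile grid."""
    col_sums = [sum(row[c] for row in cost) for c in range(len(cost[0]))]
    row_sums = [sum(row) for row in cost]
    return split_1d(col_sums, tile_cols), split_1d(row_sums, tile_rows)


def tile_loads(cost, widths, heights):
    """Total predicted cost of each tile, listed row by row."""
    loads, y = [], 0
    for h in heights:
        x = 0
        for w in widths:
            loads.append(sum(cost[r][c]
                             for r in range(y, y + h)
                             for c in range(x, x + w)))
            x += w
        y += h
    return loads


# Hypothetical 4x4 CTU cost map with an expensive top-left corner.
cost = [
    [8, 8, 1, 1],
    [8, 8, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
widths, heights = balance_tiles(cost, 2, 2)
balanced = tile_loads(cost, widths, heights)
uniform = tile_loads(cost, [2, 2], [2, 2])
```

In this example the uniform 2x2 grid gives tile loads of 32, 4, 4, and 4, while the adjusted grid gives 8, 10, 10, and 16. The spread shrinks but does not vanish, because a whole column or row of CTUs must move at once; a slice boundary, by contrast, can move one CTU at a time.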

Experimental Results
The HM 11.0 reference software is used. A PC equipped with an Intel® Core™ i7-3930K CPU and 16 GB of memory was used for this evaluation. The Intel® C bit compiler XE 13.0 was used on the Windows 7 64-bit operating system. A frame is partitioned into four slices or four tiles for a fair evaluation. Two fast encoding algorithms adopted for HM, CFM [7] and ECU [8], are employed to evaluate the proposed load-balanced parallelization.
[7] R. H. Gweon, Y.-L. Lee, and J. Lim, "Early termination of CU encoding to reduce HEVC complexity," ITU-T/ISO/IEC JCT-VC doc., JCTVC-F045, July
[8] K. Choi and E. S. Jang, "Coding tree pruning based CU early termination," ITU-T/ISO/IEC JCT-VC doc., JCTVC-F092, July 2011.


Conclusion
To maximize the encoding time gain of parallel processing for the HEVC encoder, load-balancing algorithms based on a complexity prediction model are proposed. The average ATS gain of slice-level parallel processing is 12.05%, achieved by adaptively adjusting the number of CTUs. The average ATS gain of tile-level parallel processing is 3.81%. The ATS gain obtained by the load-balancing algorithm is higher for slice-level than for tile-level parallelism.