Shaobo Zhang, Xiaoyun Zhang, Zhiyong Gao


Implementation and Improvement of Wavefront Parallel Processing for HEVC Encoding on Many-core Platform. Shaobo Zhang, Xiaoyun Zhang, Zhiyong Gao. 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW).

Outline: Introduction; Proposed Method; Experimental Results; Conclusion.

Introduction In HEVC, two parallel tools, Tile and WPP, are provided to facilitate high-level parallel processing. Compared with slice and Tile, WPP neither changes the regular raster scan order nor breaks coding dependencies at row boundaries. As a result, WPP often provides better compression performance and avoids some visual artifacts that may be induced by Tile and slice parallelism.

Introduction (Cont.) Several related works focus on improving the parallelism of HEVC. Chi [4] presents a novel approach called Overlapped Wavefront (OWF) to enhance the parallel efficiency of WPP. Yan [5] utilizes the data dependencies among neighboring CTUs and PU regions to exploit the implicit parallelism.
[4] C. C. Chi et al., “Parallel scalability and efficiency of HEVC parallelization approaches,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, pp. 1827–1838, Dec. 2012.
[5] C. Yan et al., “Highly parallel framework for HEVC motion estimation on many-core platform,” Proc. DCC, pp. 63–72, Mar. 2013.

Introduction (Cont.) WPP and its applications still have some shortcomings. The HEVC test model (HM) is a single-core codec, so the serial realization of WPP in HM is not suitable for HEVC encoding on a many-core platform. Because of the wavefront dependencies, WPP introduces parallelization inefficiencies, which become worse when a high number of processors is utilized.

Proposed Method Except for the first row of a slice, WPP requires control signaling to indicate whether the top-right CTU in the previous row has been encoded before a CTU is processed. Additional memory is required to store the side information and the CABAC probabilities needed by the following rows.

Proposed Method (Cont.) A try-and-wait mechanism is presented to apply WPP to an HEVC encoder on a many-core platform. The control signals are stored CTU by CTU, one byte per CTU, so W × H bytes are required for a frame of W × H CTUs. Before processing the current CTU, a core checks whether the top-right CTU in the previous row has been completed. If not, the corresponding core waits and tries again.
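
The try-and-wait check can be sketched as follows. This is a minimal illustration, not the authors' implementation: Python threads stand in for cores, and the function name `wpp_encode`, the callback `encode_ctu`, and the static assignment of rows to cores are assumptions made for the example. For the last CTU of a row, the sketch checks the CTU directly above, since no top-right neighbor exists.

```python
import threading
import time

def wpp_encode(W, H, num_cores, encode_ctu):
    """Process a W x H grid of CTUs under the WPP dependency rule (hypothetical helper)."""
    done = bytearray(W * H)  # one control byte per CTU: W * H bytes in total

    def worker(core_id):
        # Assumed static schedule: core c handles rows c, c + num_cores, ...
        for row in range(core_id, H, num_cores):
            for col in range(W):
                if row > 0:
                    # Top-right CTU of the previous row (clamped at the last column).
                    dep = (row - 1) * W + min(col + 1, W - 1)
                    while not done[dep]:   # try ...
                        time.sleep(0)      # ... wait, then attempt again
                encode_ctu(row, col)
                done[row * W + col] = 1    # signal completion of this CTU

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

if __name__ == "__main__":
    flags = wpp_encode(6, 4, 3, lambda row, col: None)
    print(all(flags))  # every CTU was processed
```

A real encoder would replace the polling loop with whatever synchronization primitive the target platform offers; the polling form is kept here because it mirrors the "check, wait, attempt again" wording of the slide.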

Ping-pong storage is utilized to reduce the memory needed for side information.

A data reuse structure is also utilized for storing the CABAC probabilities. Once the probabilities of the previous row have been used, they are no longer needed, so they can be overwritten by the newest probabilities. The data reuse structure reduces probability storage by 88%. Based on the above methods, WPP is realized efficiently for a real-time HEVC encoder on a many-core platform.

Proposed Method (Cont.) Parallel scalability model of WPP. When the encoding speed ceases to increase as more cores are added, the encoder has reached its Maximum Parallel Scalability (MPS). k: number of cores. n: number of CTU units (rows, Tiles or slices) in one frame.

Proposed Method (Cont.) α: remaining rows. u = ceil(H/k), v = (H − 1) mod k.
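
As a worked example, the two terms u and v can be evaluated directly. The sketch below assumes that H is the number of CTU rows in a frame and k the number of cores; the function name `wpp_model_terms` is hypothetical, and only the two expressions given above are computed.

```python
import math

def wpp_model_terms(H, k):
    """Return (u, v) with u = ceil(H / k) and v = (H - 1) mod k."""
    u = math.ceil(H / k)
    v = (H - 1) % k
    return u, v

if __name__ == "__main__":
    # Assumed example: 17 CTU rows (1080 lines with 64x64 CTUs) on 8 cores.
    print(wpp_model_terms(17, 8))   # (3, 0)
    # Assumed example: 34 CTU rows (32x32 CTUs) on 36 cores.
    print(wpp_model_terms(34, 36))  # (1, 33)
```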

Proposed Method (Cont.) Improvement of parallel scalability for WPP: reduce the CTU size; combine WPP with slice-level parallelism; combine WPP with frame-level parallelism.

Proposed Method (Cont.) Reduce CTU size. Reducing the CTU size is an efficient way to increase the number of CTU rows in a frame and thereby improve the parallel scalability.
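
The effect of the CTU size on the number of rows available to WPP can be checked with simple arithmetic. The sketch below assumes a 1080-line frame purely for illustration; `ctu_rows` is a hypothetical helper name.

```python
import math

def ctu_rows(frame_height, ctu_size):
    """Number of CTU rows needed to cover frame_height luma lines."""
    return math.ceil(frame_height / ctu_size)

if __name__ == "__main__":
    for size in (64, 32, 16):
        print(size, ctu_rows(1080, size))
    # 64 17
    # 32 34
    # 16 68
```

Halving the CTU size doubles the number of rows that can be processed as a wavefront, which is where the extra parallelism comes from.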

Proposed Method (Cont.) Although reducing the CTU size effectively increases the parallel scalability of WPP, it decreases the coding efficiency. Kim [6] reports a performance loss of about 3.4% to 14.4% in BD-rate when the CTU size decreases from 32×32 to 16×16. A CTU size of 32×32 is therefore preferable to balance parallelism against performance loss.
[6] Kim et al., “Block partitioning structure in the HEVC standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, pp. 1649–1668, Dec. 2012.

Proposed Method (Cont.) Combine WPP with slice-level parallelism. Slice-level parallelism, such as slice and Tile, can break some dependencies among rows, so the parallel scalability can be enhanced when it is combined with WPP. Clare [7] implements two types of combinations of Tile and WPP, which divide a frame into two independent or dependent side-by-side Tiles, each of which is wavefront processed.
[7] G. Clare et al., “Wavefront parallel processing for HEVC encoding and decoding,” JCTVC-F0274, July 2011.

Proposed Method (Cont.) Combining 2 to 4 slices with WPP under a 32×32 CTU size brings promising parallel scalability while keeping the performance loss minor. m: number of slices or tiles. H_m = H/m. v' = (H_m − 1) mod floor(k/m).
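
The per-slice terms can be evaluated in the same way as the single-slice ones. The sketch below assumes that H (CTU rows per frame) is divisible by m and that at least one core is available per slice; the function name `slice_wpp_terms` is hypothetical.

```python
def slice_wpp_terms(H, k, m):
    """Return (H_m, v') with H_m = H / m and v' = (H_m - 1) mod floor(k / m)."""
    if H % m != 0:
        raise ValueError("this sketch assumes H is divisible by m")
    cores_per_slice = k // m
    if cores_per_slice < 1:
        raise ValueError("need at least one core per slice")
    H_m = H // m
    v_prime = (H_m - 1) % cores_per_slice
    return H_m, v_prime

if __name__ == "__main__":
    # Assumed example: 34 CTU rows, 36 cores, 2 slices.
    print(slice_wpp_terms(34, 36, 2))  # (17, 16)
    # Assumed example: 68 CTU rows, 36 cores, 4 slices.
    print(slice_wpp_terms(68, 36, 4))  # (17, 7)
```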


Proposed Method (Cont.) Combine WPP with frame-level parallelism. Two GOP structures, IPpP and IPpp, are introduced to improve parallelism, where I and P frames can be used as reference frames while p frames (disposable frames) cannot. When a row has been encoded and no more tasks are available in the current picture, WPP combined with frame-level parallelism starts encoding the next 1 to 3 frames simultaneously.

Proposed Method (Cont.) It can be inferred that H − 2 cores are enough for parallel encoding. The start time of the Nth picture can be deduced as NW + 2Nr + 1, and its finish moment as (N + 2)W + 2Nr + 2. r: maximum vertical search range. N: index of the picture.

Proposed Method (Cont.) The finishing moment of the Nth frame is (α + 2)W + 2αr + 2, and (p + 1)(H − r) cores are enough to attain the MPS. r: maximum vertical search range. p: number of disposable frames. α = ceil[N/(p + 1)].
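
The timing expressions for frame-level parallelism can be evaluated numerically. The sketch below assumes that W is the frame width in CTUs, that r is expressed in CTU rows, and that time is counted in units of one CTU encoding time; these units and the function names are assumptions for illustration. With p = 0, α equals N and the finishing moment reduces to (N + 2)W + 2Nr + 2.

```python
import math

def start_moment(N, W, r):
    """Start time of the Nth picture without disposable frames: N*W + 2*N*r + 1."""
    return N * W + 2 * N * r + 1

def finish_moment(N, W, r, p=0):
    """Finishing moment of the Nth frame: (alpha + 2)*W + 2*alpha*r + 2."""
    alpha = math.ceil(N / (p + 1))
    return (alpha + 2) * W + 2 * alpha * r + 2

if __name__ == "__main__":
    W, r, N = 60, 2, 3                  # assumed example values
    print(start_moment(N, W, r))        # 193
    print(finish_moment(N, W, r))       # 314 (p = 0)
    print(finish_moment(N, W, r, p=2))  # 186 (two disposable frames)
```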

Experimental Results Test sequences and encoding environment. We adopt an encoder named FHM10.0, migrated from the HEVC reference software HM10.0. The input videos in our experiments are standard test sequences of 100 frames each, and the motion search range is set to 64. The Main profile is selected, with the default encoding test conditions specified in [8]. The experimental platform is the GX36, a member of the TILERA many-core processor family that contains 36 processing cores.
[8] F. Bossen, “Common test conditions and software reference configurations,” JCTVC-I1100, Apr. 2012.

Experimental Results Parallel scalability analysis

Conclusion Several effective methods, namely a try-and-wait data interface, ping-pong storage and a data reuse structure, are presented to realize WPP in parallel on an HEVC encoder. Three effective methods are presented to improve the parallel scalability of WPP. Experimental results show that the proposed methods improve the maximum parallel scalability by more than 40% compared with WPP alone.