Published by John Parks. Modified over 9 years ago.
Complexity Model Based Load-Balancing Algorithm for Parallel Tools of HEVC
Yong-Jo Ahn, Tae-Jin Hwang, Dong-Gyu Sim, and Woo-Jin Han
2013 IEEE International Conference on Visual Communications and Image Processing (VCIP)
Outline
Introduction
Related Work
Proposed Method
Experimental Results
Conclusion
Introduction
Demand for new video coding standards has been increasing due to the recent expansion of digital broadcasting services and the advent of various multimedia devices. The newly supported coding tools provide high coding efficiency, but they also bring high computational complexity, caused by the decision process among the diverse coding modes.
Cont.
Parallel processing methods, as well as fast mode decision algorithms, are considered a key part of ongoing work on fast HEVC encoders. This paper introduces parallel processing methods using the slice and tile tools supported by HEVC, and proposes a load-balancing algorithm that improves slice- and tile-level parallel processing.
Related Work
A few parallel tools are adopted in the HEVC Main profile; the key tools for parallel processing are tiles [5] and wavefront parallel processing (WPP) [6].
Parallel methods: tile, entropy slice, and WPP (wavefront parallel processing).
[5] A. Fuldseth, M. Horowitz, S. Xu, A. Segall, and M. Zhou, "Tiles," ITU-T/ISO/IEC JCT-VC doc., JCTVC-E408, Mar. 2011.
[6] F. Henry and S. Pateux, "Wavefront parallel processing," ITU-T/ISO/IEC JCT-VC doc., JCTVC-E196, Mar. 2011.
Cont.
Figure: (a) Tile, (b) Entropy slice, (c) WPP.
Cont.
To select suitable parallel options, several factors should be considered, such as encoding time saving, coding efficiency loss, and scalability with the number of processing cores. Coding efficiency loss is one of the most important factors in adopting parallel processing.
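The encoding-time and core-scalability factors can be expressed as a few simple ratios. The sketch below assumes the common definitions of time saving, speedup, and parallel efficiency; the function names and the timings are hypothetical, and coding-efficiency loss (usually reported as a BD-rate) is not covered.

```python
def time_saving(t_sequential: float, t_parallel: float) -> float:
    """Encoding-time saving in percent relative to the sequential encoder."""
    return 100.0 * (t_sequential - t_parallel) / t_sequential


def speedup(t_sequential: float, t_parallel: float) -> float:
    """How many times faster the parallel encoder runs."""
    return t_sequential / t_parallel


def parallel_efficiency(t_sequential: float, t_parallel: float, cores: int) -> float:
    """Speedup per core; 1.0 means perfect scaling with the number of cores."""
    return speedup(t_sequential, t_parallel) / cores


# Hypothetical timings in seconds for one sequence (not measured results).
t_seq, t_par, cores = 100.0, 40.0, 4
print(f"time saving: {time_saving(t_seq, t_par):.1f}%")                # 60.0%
print(f"speedup:     {speedup(t_seq, t_par):.2f}x")                    # 2.50x
print(f"efficiency:  {parallel_efficiency(t_seq, t_par, cores):.3f}")  # 0.625
```

An efficiency well below 1.0 indicates that adding cores is not paying off, which is what the scalability factor tracks.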
Cont.
Data-level parallelism can be applied at the frame, slice, tile, or coding unit (CU) level, depending on the parallelization method. In IBBP coding structures, the number of non-referenced B frames significantly affects coding efficiency and limits scalability with the number of processing cores.
Cont.
WPP offers the highest scalability with the number of processing cores and the smallest coding efficiency loss. However, a large encoding time saving is hard to achieve with WPP because of its restrictive data dependency between CTU rows. In general, increasing the number of slices and tiles affects the bitrate considerably for low-resolution sequences, but only slightly for high-resolution sequences. Because prediction does not cross slice and tile boundaries, the loss grows when each partition covers only a few CTUs: with 64×64 CTUs, a 416×240 frame contains only 7×4 = 28 CTUs (7 per partition with four partitions), whereas a 1920×1080 frame contains 30×17 = 510 CTUs.
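The limited gain from WPP follows from its wavefront dependency. A minimal sketch, assuming every CTU takes one time step and the usual two-CTU lag between consecutive CTU rows (the helper names are hypothetical): even with unlimited cores, a 1920×1080 frame with 64×64 CTUs needs 62 steps instead of 510, at most 15 CTUs are ever in flight at once, and the first and last steps have only a single CTU available.

```python
from collections import Counter


def wpp_start_steps(width_ctus: int, height_ctus: int) -> list[list[int]]:
    """Earliest step at which each CTU can start under WPP.

    Assumes every CTU takes one time step and that CTU (x, y) must wait for
    its left neighbour (x-1, y) and its top-right neighbour (x+1, y-1),
    i.e. each CTU row lags the row above by two CTUs.
    """
    return [[x + 2 * y for x in range(width_ctus)] for y in range(height_ctus)]


def wpp_summary(width_ctus: int, height_ctus: int) -> tuple[int, int, float]:
    steps = wpp_start_steps(width_ctus, height_ctus)
    total_steps = steps[-1][-1] + 1                 # the last CTU finishes here
    per_step = Counter(s for row in steps for s in row)
    peak_parallelism = max(per_step.values())       # most CTUs running at once
    ideal_speedup = width_ctus * height_ctus / total_steps
    return total_steps, peak_parallelism, ideal_speedup


# 1920x1080 with 64x64 CTUs -> 30 x 17 CTUs
total, peak, gain = wpp_summary(30, 17)
print(total, peak, round(gain, 2))   # 62 15 8.23
```

Real CTUs take unequal time, so a row can stall while waiting for the row above, which lowers the achievable gain further.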
Proposed Method
To reduce the high computational complexity of the HEVC encoder, various early termination methods and fast mode decision algorithms have been adopted in the reference software [7][8]. However, it is not easy to achieve a real-time encoder with these fast algorithms alone, so parallel processing is also needed, and the computational load should be balanced among the cores.
[7] R. H. Gweon, Y.-L. Lee, and J. Lim, "Early termination of CU encoding to reduce HEVC complexity," ITU-T/ISO/IEC JCT-VC doc., JCTVC-F045, July 2011.
[8] K. Choi and E. S. Jang, "Coding tree pruning based CU early termination," ITU-T/ISO/IEC JCT-VC doc., JCTVC-F092, July 2011.
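Why the load must be balanced can be seen from a simple model in which each core encodes one slice or tile and the frame is complete only when the slowest partition is complete. The sketch assumes the partition complexities are known and ignores synchronization overhead; the names and numbers are illustrative.

```python
def parallel_time(partition_loads: list[float]) -> float:
    """With one partition per core and no work stealing, the frame is done
    only when the most heavily loaded partition is done."""
    return max(partition_loads)


def speedup(partition_loads: list[float]) -> float:
    """Speedup over encoding all partitions one after another on one core."""
    return sum(partition_loads) / parallel_time(partition_loads)


# Hypothetical per-slice complexities for four cores (same total of 100).
unbalanced = [40.0, 20.0, 20.0, 20.0]
balanced = [25.0, 25.0, 25.0, 25.0]
print(speedup(unbalanced))  # 2.5
print(speedup(balanced))    # 4.0
```

The same total work yields a 2.5× speedup when one partition carries 40% of the load, versus 4× when the load is split evenly.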
Complexity Model for HEVC Encoder
For the slice and tile tools, the number of CTUs in each slice or tile must be determined before actual encoding, using complexity prediction.
[Equation (1)]
Cont.
R(s, m): complexity per unit.
r(s, m): complexity ratio of each CU size s and mode m.
w(s): width of a CU of size s.
NF: normalization factor for fixed-point operation.
Cont.
The proposed complexity model for the HEVC encoder is evaluated using the Pearson product-moment correlation on the HEVC common test sequences under the HEVC common test conditions.
Cont.
The Pearson product-moment correlation coefficient is a measure of the linear correlation between two variables X and Y. It takes a value between −1 and +1 inclusive, where +1 is total positive correlation, 0 is no linear correlation, and −1 is total negative correlation.
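The coefficient is the covariance of the two samples divided by the product of their standard deviations. A minimal sketch of that computation (the function name is illustrative):

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson([1, 2, 3], [3, 2, 1]))  # -1.0
```

To evaluate a complexity model this way, one sample would hold predicted complexities and the other measured encoding times; those data are not reproduced here. Python 3.10 and later also provide `statistics.correlation` for the same quantity.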
Complexity Model Based Load-Balancing Algorithm for Parallel Tools of HEVC
Number of CTUs for each slice at each temporal level. Notation:
L(k): number of CTUs assigned to the k-th slice.
i: frame index.
j: temporal layer ID.
k: slice number.
N: number of slices in a frame.
CTU_inFrame: number of CTUs in the frame.
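Given a predicted complexity for each CTU in raster order, the per-slice counts L(k) can be chosen so that each of the N slices receives roughly 1/N of the predicted total. The sketch below does this with prefix sums; it is an illustration of the idea, not the paper's equation, and it assumes non-negative per-CTU costs produced by whatever complexity model is in use. The function name and cost values are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def balance_slices(ctu_costs: list[float], num_slices: int) -> list[int]:
    """Return L(k): the number of consecutive raster-order CTUs per slice."""
    n = len(ctu_costs)
    if not 1 <= num_slices <= n:
        raise ValueError("need 1 <= num_slices <= number of CTUs")
    prefix = list(accumulate(ctu_costs))   # prefix[i] = cost of CTUs 0..i
    total = prefix[-1]
    bounds = [0]
    for k in range(1, num_slices):
        target = total * k / num_slices
        # Slice k ends right after the first CTU whose prefix reaches the target.
        end = bisect_left(prefix, target) + 1
        # Keep at least one CTU in this slice and in every remaining slice.
        end = max(end, bounds[-1] + 1)
        end = min(end, n - (num_slices - k))
        bounds.append(end)
    bounds.append(n)
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical frame of 16 CTUs where the first CTU is five times as costly.
print(balance_slices([5] + [1] * 15, 4))   # [1, 5, 5, 5]
```

Each slice in the example carries a predicted cost of 5, whereas an equal split of 4 CTUs per slice would give the first slice a cost of 8.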
Cont.
For tile-level parallel processing, the number of CTUs assigned to each tile for a temporal layer is adjusted through column and row offsets to balance the load. Notation:
L(k): number of CTUs assigned to the k-th tile.
i: frame index.
j: temporal layer ID.
k: tile number.
N_inWidth and N_height: number of tiles composing a frame in the horizontal and vertical directions.
CTU_inWidth and CTU_height: number of CTUs of a tile in the horizontal and vertical directions.
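Cont.
Balancing complexity is harder for tile-level parallelism than for slice-level parallelism, because the size of a tile is determined only by the tile column widths and row heights, not by the CTU offset used for slice-level load balancing. Since tile boundaries span the whole frame, moving one column boundary changes the size of every tile in the two adjacent tile columns.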
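The difference can be illustrated on a small, hypothetical 4×4 CTU cost map containing one expensive CTU (total cost 20, ideal 5 per partition). The sketch evaluates the loads of a 2×2 tile grid and brute-forces all column and row boundaries; `uniform_spacing` follows the HEVC uniform-spacing rule for tile column widths and row heights, while the other helper names are illustrative. Even the best grid leaves a tile with cost 6, whereas splitting the same 16 CTUs in raster order into slices of 1, 5, 5, and 5 CTUs gives every slice a cost of exactly 5.

```python
from itertools import combinations


def uniform_spacing(num_ctus: int, num_tiles: int) -> list[int]:
    """Tile column widths (or row heights) in CTUs under HEVC uniform spacing."""
    return [(i + 1) * num_ctus // num_tiles - i * num_ctus // num_tiles
            for i in range(num_tiles)]


def tile_loads(cost, col_widths, row_heights):
    """Sum the per-CTU cost inside every tile of a column/row grid."""
    loads = []
    y0 = 0
    for h in row_heights:
        x0, row = 0, []
        for w in col_widths:
            row.append(sum(cost[y][x]
                           for y in range(y0, y0 + h)
                           for x in range(x0, x0 + w)))
            x0 += w
        loads.append(row)
        y0 += h
    return loads


def sizes(cuts, total):
    """Convert sorted boundary positions into segment sizes."""
    edges = (0, *cuts, total)
    return [b - a for a, b in zip(edges, edges[1:])]


def best_tile_grid(cost, n_cols, n_rows):
    """Exhaustively search column/row boundaries minimising the heaviest tile."""
    height, width = len(cost), len(cost[0])
    best = None
    for col_cuts in combinations(range(1, width), n_cols - 1):
        for row_cuts in combinations(range(1, height), n_rows - 1):
            widths, heights = sizes(col_cuts, width), sizes(row_cuts, height)
            peak = max(max(r) for r in tile_loads(cost, widths, heights))
            if best is None or peak < best[0]:
                best = (peak, widths, heights)
    return best


# Hypothetical 4x4 CTU cost map with one expensive CTU.
cost = [[5, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]
print(uniform_spacing(30, 4))            # [7, 8, 7, 8]
print(tile_loads(cost, [2, 2], [2, 2]))  # [[8, 4], [4, 4]]
print(best_tile_grid(cost, 2, 2))        # (6, [1, 3], [2, 2])
```

Brute force is only practical for tiny grids; it is used here to show that the imbalance is a property of the tile geometry, not of a poor search.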
Experimental Results
The HM 11.0 reference software is used. The evaluation was performed on a PC equipped with an Intel® Core™ i7-3930K CPU and 16 GB of memory, using the Intel® C++ 64-bit Compiler XE 13.0 on Windows 7 64-bit. For a fair evaluation, each frame is partitioned into four slices or four tiles. Two fast encoding algorithms adopted in HM, CFM [7] and ECU [8], are employed to evaluate the proposed load-balanced parallelization.
Conclusion
To maximize the encoding time gain of parallel processing in an HEVC encoder, load-balancing algorithms based on a complexity prediction model are proposed. Slice-level parallel processing achieves an average ATS gain of 12.05% by adaptively adjusting the number of CTUs per slice, while tile-level parallel processing achieves an average ATS gain of 3.81%. The ATS gain obtained by the load-balancing algorithm is therefore higher for slice-level than for tile-level parallelism.