1
Adaptive slice-level parallelism for H.264/AVC encoding using pre macroblock mode selection
Bongsoo Jung, Byeungwoo Jeon
Journal of Visual Communication and Image Representation, 2008
2
Outline
  Introduction
  Complexity Analysis
  Method
    Pre Macroblock Mode Selection
    Adaptive Slice-level Parallelism
  Experimental Results
  Conclusions
3
Introduction
H.264/AVC achieves high coding efficiency
  Variable block sizes, multiple reference frames, quarter-pel motion vector accuracy, etc.
High computational complexity
  Complexity reduction algorithms
  Parallel processing
4
Introduction
Levels of parallelism
  GOP level: simple, but high latency
  Frame level: keeps coding efficiency, but the dependence among frames limits thread scalability
  Slice level: slices are encoded independently, but coding efficiency is lower
  Macroblock level: high dependency
5
Introduction
MBs in a slice may not have similar computational complexity.
This causes unnecessary extra waiting time in some threads.
[Figure: encoding time of slices 0 to 7 on processor units PU0 to PU7]
6
Main Purpose
Objective
  Use a parallel algorithm to speed up the H.264/AVC encoder
  Maximize parallelism efficiency by distributing the workload equally
Method
  Pre-processing: fast MB mode selection
  Adaptive slice-level parallelism
7
Complexity Analysis
Inter prediction modes of MBs in H.264
Intra prediction modes: 4*4, 16*16
8
Complexity Analysis
Run-time complexity of the H.264/AVC encoder
Test conditions: Pentium IV 2.4 GHz, Foreman_CIF with IPPP structure
9
Pre Macroblock Mode Selection: Overview
Why?
  ME with variable block sizes has high computational complexity
  Remove unnecessary ME block sizes and the RD calculation of intra prediction modes
This removal leads to
  Complexity reduction
  Workload balancing among slices
10
Pre Macroblock Mode Selection: Inter MB mode selection
MC block sizes in a video sequence
  Foreground region: 8*8 or smaller
  Non-moving region: 16*16
High temporal correlation
  Check the consistency history of block size 16*16 and zero MV
Two measurements
  Zero motion consistency (ZMC)
  Large block consistency (LBC)
11
Pre Macroblock Mode Selection: Inter MB mode selection
Zero Motion Consistency (ZMC)
  Indicates for how many consecutive frames a given block has had a zero MV
  When a block is encoded in intra mode, ZMC is set to 0
  t: frame index; ZMC_0 = 0; (n,m;i,j) denotes the 4*4 block at (n,m) within MB (i,j)
  A high ZMC value means a high probability of belonging to the background region
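The ZMC update equation was an image on the slide and is not part of this transcript. A minimal Python sketch of the counter as described, assuming it grows by one for each frame with a zero MV and resets to 0 on any non-zero MV or intra-coded block. The name update_zmc and the use of None for an intra-coded block are hypothetical.

```python
from typing import Optional, Tuple

MotionVector = Tuple[int, int]


def update_zmc(prev_zmc: int, mv: Optional[MotionVector]) -> int:
    """Return ZMC_t for one 4*4 block from ZMC_{t-1} and the block's MV.

    mv is None when the block was intra coded (assumed representation).
    """
    if mv is None:        # intra-coded block: ZMC is set to 0
        return 0
    if mv == (0, 0):      # zero MV: extend the consecutive run
        return prev_zmc + 1
    return 0              # assumed: a non-zero MV breaks the run


# One 4*4 block followed over five frames, starting from ZMC_0 = 0.
zmc = 0
history = []
for mv in [(0, 0), (0, 0), (1, 0), (0, 0), None]:
    zmc = update_zmc(zmc, mv)
    history.append(zmc)
print(history)
```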
12
Pre Macroblock Mode Selection: Inter MB mode selection
Zero Motion Consistency Score (ZMCS)
  Indicates how likely an MB is to be a stationary region
  T_MOTION: a threshold value
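The scoring rule itself is not in this transcript. One plausible reading, sketched below, is that an MB scores High only when all sixteen of its 4*4 blocks have a ZMC of at least the threshold. That min-based rule is an assumption, as is the name zmc_score; the default threshold of 4 is the value quoted on a later slide.

```python
from typing import Sequence

T_MOTION = 4  # threshold value quoted on a later slide


def zmc_score(zmc_blocks: Sequence[int], t_motion: int = T_MOTION) -> str:
    """Classify an MB as 'High' or 'Low' from its sixteen 4*4-block ZMC values."""
    if len(zmc_blocks) != 16:
        raise ValueError("expected 16 ZMC values, one per 4*4 block")
    # Assumed rule: the MB counts as stationary only if its least
    # consistent block has been static for at least t_motion frames.
    return "High" if min(zmc_blocks) >= t_motion else "Low"


print(zmc_score([4] * 16))        # every block static long enough
print(zmc_score([5] * 15 + [3]))  # one block moved recently
```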
13
Pre Macroblock Mode Selection: Inter MB mode selection
Large Block Consistency (LBC)
  Indicates the number of consecutive frames having a 16*16 MC block size at the (i,j)-th MB
  When a block is encoded in intra mode, LBC is set to 0
  bestMode_t(i,j): the best MB mode of the (i,j)-th MB in the t-th frame
  LBC_0 = 0
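As with ZMC, the LBC equation did not survive extraction. A sketch under the assumption that LBC grows by one whenever the best mode uses a 16*16 MC block and resets to 0 otherwise. The mode labels and the choice to count SKIP as a 16*16 mode are assumptions made for illustration.

```python
LARGE_BLOCK_MODES = {"SKIP", "P16x16"}  # assumed set of 16*16 MC modes


def update_lbc(prev_lbc: int, best_mode: str) -> int:
    """Return LBC_t for one MB from LBC_{t-1} and bestMode_t."""
    if best_mode in LARGE_BLOCK_MODES:
        return prev_lbc + 1
    return 0  # intra modes and smaller partitions reset the count


# One MB followed over five frames, starting from LBC_0 = 0.
lbc = 0
lbc_history = []
for mode in ["P16x16", "SKIP", "P8x8", "P16x16", "I4x4"]:
    lbc = update_lbc(lbc, mode)
    lbc_history.append(lbc)
print(lbc_history)
```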
14
Pre Macroblock Mode Selection: Inter MB mode selection
Large Block Consistency Score (LBCS)
  Indicates how likely an MB is to be partitioned as 16*16
  T_MODE1, T_MODE2: threshold values used to assess the LBC
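The slides name three LBCS levels (High, Medium, Low) and two thresholds, but the mapping is not in the transcript. The sketch below assumes a simple banding of LBC by the two thresholds, with the values 1 and 4 taken from a later slide. The banding direction and the name lbc_score are assumptions.

```python
T_MODE1 = 1  # lower threshold, value quoted on a later slide
T_MODE2 = 4  # upper threshold, value quoted on a later slide


def lbc_score(lbc: int, t_mode1: int = T_MODE1, t_mode2: int = T_MODE2) -> str:
    """Map an LBC count to 'High', 'Medium' or 'Low' (assumed banding)."""
    if lbc >= t_mode2:
        return "High"
    if lbc >= t_mode1:
        return "Medium"
    return "Low"


print([lbc_score(v) for v in range(6)])
```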
15
Pre Macroblock Mode Selection: Inter MB mode selection
An illustration of LBCS
16
Pre Macroblock Mode Selection: Inter MB mode selection
Conditional probability of MB modes given ZMCS = High
  The other block sizes are very unlikely to appear (probability less than about 0.04)
  SKIP and P16*16 modes can be detected early
  T_MOTION = 4
17
Pre Macroblock Mode Selection: Inter MB mode selection
Joint conditional probability of MB modes given LBCS with ZMCS = Low
  T_MODE1 = 1, T_MODE2 = 4
  A: LBCS = High, B: LBCS = Medium, C: LBCS = Low
18
Pre Macroblock Mode Selection: Pre selective intra mode selection
Computing the RD costs of the intra modes is a high computational load
Compare the temporal correlation with the spatial correlation of the current MB prior to frame coding
19
Pre Macroblock Mode Selection: Selective intra mode selection
Mean Absolute Temporal Difference (MATD)
Mean Absolute Spatial Difference (MASD)
  c_{x,y}: pixel value at location (x,y) of the MB in the current frame
  r_{x,y}: pixel value at location (x,y) of the MB in the previous frame
  X, Y: horizontal and vertical dimensions of an MB
  MASD_H: the MASD between horizontally neighboring pixels
  MASD_V: the MASD between vertically neighboring pixels
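The MATD and MASD formulas were images and are missing here. The sketch below follows the symbol definitions on the slide: MATD averages |c - r| over the MB, and the two MASD terms average absolute differences between neighboring pixels of the current MB. The normalization by the number of neighbor pairs and the function names are assumptions.

```python
from typing import List

Block = List[List[int]]  # rows of pixel values, indexed block[y][x]


def matd(cur: Block, ref: Block) -> float:
    """Mean absolute difference between co-located pixels of two MBs."""
    rows, cols = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - ref[y][x]) for y in range(rows) for x in range(cols))
    return total / (rows * cols)


def masd_h(cur: Block) -> float:
    """Mean absolute difference between horizontally neighboring pixels."""
    rows, cols = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - cur[y][x + 1]) for y in range(rows) for x in range(cols - 1))
    return total / (rows * (cols - 1))


def masd_v(cur: Block) -> float:
    """Mean absolute difference between vertically neighboring pixels."""
    rows, cols = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - cur[y + 1][x]) for y in range(rows - 1) for x in range(cols))
    return total / ((rows - 1) * cols)


# Tiny 2*2 blocks keep the arithmetic easy to follow; a real MB is 16*16.
current = [[10, 20], [30, 40]]
previous = [[10, 10], [10, 10]]
print(matd(current, previous), masd_h(current), masd_v(current))
```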
20
Pre Macroblock Mode Selection: Selective intra mode selection
Compare MATD and MASD to determine whether the RD costs of the intra modes should be calculated for the current MB
  The intra mode search is skipped when the MB is more temporally correlated than spatially correlated
  w: weighting factor, currently set to 0.6
  A larger w makes skipping the intra mode search easier
  A smaller QP will incur more intra modes than a larger QP
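The exact inequality is not in the transcript. The sketch below assumes the intra search is skipped when MATD < w * MASD, which is at least consistent with the note that a larger w makes skipping easier. How MASD_H and MASD_V are combined into MASD and how QP enters the test are not recoverable from the slide, so the function takes a single MASD value and ignores QP.

```python
def should_check_intra(matd_value: float, masd_value: float, w: float = 0.6) -> bool:
    """Return True when the RD costs of the intra modes should be computed.

    Assumed rule: skip the intra search when MATD < w * MASD, i.e. when the
    MB is better predicted temporally than spatially.
    """
    return matd_value >= w * masd_value


print(should_check_intra(5.0, 10.0))          # 5 < 6: intra search skipped
print(should_check_intra(7.0, 10.0))          # 7 >= 6: intra search performed
print(should_check_intra(7.0, 10.0, w=0.8))   # larger w: skipped again
```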
21
Pre Macroblock Mode Selection: MB mode classification
Decision table of candidate MB modes
Block diagram of MB mode selection
22
Adaptive Slice-level Parallelism: Overview
Characteristics of slice-level parallelism
  Easy to implement
  Low communication overhead among processor units
  Good scalability
  Increases the bitrate
Slice boundaries are defined on the basis of a fixed number of MBs or a fixed number of bits
It is hard to decide a slice boundary prior to encoding
23
Adaptive Slice-level Parallelism: Fixed MB assignment
The number of consecutive MBs in each slice
  L: the number of processor units on a multi-core system
  M: the total number of MBs in a frame
  i: slice index
Example: L = 8 processor units, CIF resolution (352*288), M = 22*18 = 396
  About 49 MBs are assigned to each slice
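The assignment formula is missing from the transcript, and 396 / 8 = 49.5 does not divide evenly, which is why the slide says "about 49". A sketch of one way to realize a fixed split, assuming the remainder MBs go to the first slices; the slide does not say where they go, and the name fixed_slice_sizes is hypothetical.

```python
from typing import List


def fixed_slice_sizes(total_mbs: int, num_slices: int) -> List[int]:
    """Split total_mbs consecutive MBs into num_slices nearly equal slices."""
    base, extra = divmod(total_mbs, num_slices)
    # Assumed tie-break: the first `extra` slices take one additional MB.
    return [base + 1 if i < extra else base for i in range(num_slices)]


sizes = fixed_slice_sizes(22 * 18, 8)  # CIF frame, eight processor units
print(sizes, sum(sizes))
```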
24
Adaptive Slice-level Parallelism: Fixed MB assignment
The scheduling of slice-level parallelism on eight processor units
[Figure: encoding time of slices 0 to 7 on PU0 to PU7 in the ideal case and in the practical case, where slice 5 on PU5 is the bottleneck]
25
Adaptive Slice-level Parallelism: Fixed MB assignment
The imbalance of computational load distribution
[Figure panels: Exhaustive Search Method; Fast ME / Fast Mode Search]
26
Adaptive Slice-level Parallelism: Fixed MB assignment
Computational load for encoding one frame with slice-level parallelism
Computational load of the t-th frame on a single processor system
  C_t^slice(i): the computational load of the i-th slice in the t-th frame
  L: number of slices in a frame
27
Adaptive Slice-level Parallelism: Fixed MB assignment
The speedup of a multiprocessor system over a single processor system
To achieve the maximum speedup
  The computational loads of the slices should be as similar as possible
  Adaptive slice partition method
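The speedup formula is not in the transcript. The sketch below assumes the usual reading: a single processor needs the sum of all slice loads, while the parallel encoder needs only as long as its slowest slice, so speedup = sum / max. This matches the bottleneck picture on the scheduling slide, but the formula and the name slice_speedup are assumptions.

```python
from typing import Sequence


def slice_speedup(slice_loads: Sequence[float]) -> float:
    """Speedup of one-slice-per-processor encoding over a single processor."""
    single_processor = sum(slice_loads)   # all slices encoded back to back
    multi_processor = max(slice_loads)    # frame is done when the slowest slice is
    return single_processor / multi_processor


balanced = [10, 10, 10, 10, 10, 10, 10, 10]
bottleneck = [10, 10, 10, 10, 10, 20, 10, 10]  # slice 5 takes twice as long
print(slice_speedup(balanced), slice_speedup(bottleneck))
```

With eight equal slices the speedup reaches the number of processor units; a single slow slice nearly halves it in this toy example.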
28
Adaptive Slice-level Parallelism: Complexity estimation model
A simple estimation method that uses the result of fast MB mode selection
Define a group value g corresponding to the candidate MB modes
29
Adaptive Slice-level Parallelism: Complexity estimation model
Complexity model
  C_{k,CHKintra}(g): complexity cost of the k-th MB
  g: group index
  e_inter: estimated complexity cost of inter mode in g = 1
  e_intra: complexity cost according to the intra mode check in g = 1
  α1, α2, α3, β1, β2, β3: weighting values of complexity cost
30
Adaptive Slice-level Parallelism: Complexity estimation model
Relative computational load
  CHKintra = 0: assume e_inter = 1, e_intra = 0; α1 = 2.42, α2 = 3.12, α3 = 5.28
  CHKintra = 1: assume e_inter = 1, e_intra = 3.97; β1 = 0.82, β2 = 0.83, β3 = 0.84
31
Adaptive Slice-level Parallelism: Adaptive MB assignment
The total computational load of the t-th frame
Ideal computational load of each slice for uniform workload distribution
32
Adaptive Slice-level Parallelism: Adaptive MB assignment
MB assignment of each slice
The workload balance is much better than with fixed MB assignment in each slice
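The assignment rule is not in the transcript. The sketch below shows one greedy way to use per-MB complexity estimates: compute the ideal per-slice load as total / L, then close a slice as soon as the running load reaches the next multiple of that ideal. The greedy rule, the guard that keeps every slice non-empty, and the name adaptive_slice_sizes are assumptions, not the authors' stated procedure.

```python
from typing import List, Sequence


def adaptive_slice_sizes(mb_costs: Sequence[float], num_slices: int) -> List[int]:
    """Return the number of consecutive MBs per slice, balancing estimated cost."""
    if num_slices < 1 or len(mb_costs) < num_slices:
        raise ValueError("need at least one MB per slice")
    ideal = sum(mb_costs) / num_slices  # ideal load of each slice
    sizes: List[int] = []
    count = 0
    cumulative = 0.0
    for k, cost in enumerate(mb_costs):
        cumulative += cost
        count += 1
        mbs_left = len(mb_costs) - (k + 1)
        slices_left = num_slices - len(sizes) - 1  # slices still to open
        target_reached = cumulative >= ideal * (len(sizes) + 1)
        # Close the slice at the load target, or early if the remaining MBs
        # are only just enough to give each remaining slice one MB.
        if slices_left > 0 and (target_reached or mbs_left == slices_left):
            sizes.append(count)
            count = 0
    sizes.append(count)  # the last slice takes whatever remains
    return sizes


# Twelve MBs with two expensive ones in the middle, split across two slices.
costs = [1, 1, 1, 1, 1, 1, 5, 5, 1, 1, 1, 1]
adaptive = adaptive_slice_sizes(costs, 2)
print(adaptive, sum(costs[:adaptive[0]]), sum(costs[adaptive[0]:]))
```

In this toy case a fixed 6/6 split gives slice loads of 6 and 14, while the adaptive split gives 11 and 9.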
33
Adaptive Slice-level Parallelism: Adaptive MB assignment
Entire block diagram
34
Experimental Results: Overview
Performance comparison between the proposed MB mode decision and the conventional method
Comparison of adaptive slice-level parallelism with fixed slice-level parallelism
35
Experimental Results: MB mode selection
Average encoding time saving: AST [%]
BDPSNR and BDBR are used to measure the performance against FULL_1Slice
  FULL_1Slice: exhaustive method
  FMD_1Slice: fast MB mode search method
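The AST formula is not in the transcript. A common definition, assumed here, is the relative reduction in encoding time against the reference encoder, averaged over test runs. The function names and the timing values in the example are made up for illustration and are not results from the paper.

```python
from typing import Sequence, Tuple


def time_saving_percent(t_reference: float, t_proposed: float) -> float:
    """Encoding time saved by the proposed method, in percent of the reference."""
    return 100.0 * (t_reference - t_proposed) / t_reference


def average_time_saving(runs: Sequence[Tuple[float, float]]) -> float:
    """Average the per-run saving over (reference, proposed) time pairs."""
    return sum(time_saving_percent(ref, prop) for ref, prop in runs) / len(runs)


# Illustrative timings only (seconds): (FULL_1Slice, FMD_1Slice).
example_runs = [(100.0, 60.0), (200.0, 100.0)]
print(average_time_saving(example_runs))
```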
36
Experimental Results: Rate distortion curves
37
Experimental Results: R-D performance compared to one slice per frame (FMD_1Slice)
38
Experimental Results: Rate distortion curves
39
Experimental Results: Slice-level parallelism
Comparing adaptive and fixed slice-level parallelism
Speedup, defined in terms of
  The encoding time of one slice per frame on a single processor system
  The longest encoding time of a slice using fixed mode
  The longest encoding time of a slice using adaptive mode
40
Experimental Results: Speedup
41
Conclusions
  Proposed a fast MB mode selection using the consistency history of the 16*16 block size and zero MV
  Proposed an intra mode selection that compares temporal and spatial correlation
  Using these two schemes, the authors proposed a new adaptive slice-level parallelism to speed up the H.264/AVC encoder