1 Adaptive slice-level parallelism for H.264/AVC encoding using pre macroblock mode selection
Bongsoo Jung, Byeungwoo Jeon
Journal of Visual Communication and Image Representation, 2008
2 Outline
- Introduction
- Complexity Analysis
- Method
  - Pre Macroblock Mode Selection
  - Adaptive Slice-level Parallelism
- Experimental Results
- Conclusions
3 Introduction
- H.264/AVC achieves high coding efficiency through variable block sizes, multiple reference frames, quarter-pel motion vector accuracy, etc.
- The price is high computational complexity, which can be addressed by
  - Complexity reduction algorithms
  - Parallel processing
4 Introduction
- GOP level: simple, but high latency
- Frame level: keeps coding efficiency, but the dependence among frames limits thread scalability
- Slice level: slices are encoded independently, but coding efficiency is lower
- Macroblock level: high dependency among MBs
5 Introduction
- MBs in a slice may not have similar computational complexity.
- This causes unnecessary extra waiting time in some threads.
[Figure: encoding time of slices 0-7 on processing units PU0-PU7]
6 Main Purpose
Objective
- Use a parallel algorithm to speed up the H.264/AVC encoder
- Maximize parallel efficiency by distributing the workload equally
Method
- Preprocessing: fast MB mode selection
- Adaptive slice-level parallelism
7 Complexity Analysis
- Inter prediction modes of MBs in H.264
- Intra prediction modes: 4x4, 16x16
8 Complexity Analysis
The run-time complexity of the H.264/AVC encoder
- Platform: Pentium IV 2.4 GHz
- Sequence: Foreman (CIF) with IPPP structure
9 Pre Macroblock Mode Selection
Overview
Why?
- Motion estimation (ME) with variable block sizes has high computational complexity
- Remove unnecessary ME block sizes and RD cost calculations of intra prediction modes
This removal leads to
- Complexity reduction
- Workload balancing among slices
10 Pre Macroblock Mode Selection
Inter MB mode selection
- MC block sizes in a video sequence
  - Foreground regions: 8x8 or smaller
  - Non-moving regions: 16x16
- High temporal correlation: check the consistency history of the 16x16 block size and zero MV
- Two measurements
  - Zero motion consistency (ZMC)
  - Large block consistency (LBC)
11 Pre Macroblock Mode Selection
Inter MB mode selection: Zero Motion Consistency (ZMC)
- Indicates for how many consecutive frames a given block has had a zero MV
- When a block is encoded in intra mode, ZMC is set to 0
- Notation: t is the frame index, ZMC_0 = 0, and (n,m;i,j) denotes the 4x4 block at (n,m) within MB (i,j)
- A high ZMC value means a high probability of belonging to a background region
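The ZMC equation itself did not survive in this transcript, so the following is only a minimal sketch of the counter as the bullets describe it: increment while the 4x4 block keeps a zero MV, reset on a non-zero MV or an intra-coded block. The function name `update_zmc` and the `(dx, dy)` tuple representation of a motion vector are assumptions made for illustration.

```python
def update_zmc(prev_zmc, mv, is_intra):
    """Return ZMC_t for one 4x4 block, given ZMC_{t-1}.

    Assumption: mv is a (dx, dy) tuple; the slide's exact equation is not preserved.
    """
    if is_intra or mv != (0, 0):
        return 0            # reset on intra coding or any non-zero motion
    return prev_zmc + 1     # one more consecutive zero-MV frame


# Hypothetical per-frame history of one block: (motion vector, intra flag).
history = [((0, 0), False), ((0, 0), False), ((1, 0), False),
           ((0, 0), False), ((0, 0), True)]

zmc = 0                     # ZMC_0 = 0
zmc_trace = []
for mv, is_intra in history:
    zmc = update_zmc(zmc, mv, is_intra)
    zmc_trace.append(zmc)

print(zmc_trace)
```

The counter only grows while the block stays still, which is why a large value suggests background.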
12 Pre Macroblock Mode Selection
Inter MB mode selection: Zero Motion Consistency Score (ZMCS)
- Indicates how likely an MB is to be a stationary region
- T_MOTION: a threshold value
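The scoring formula is not preserved here either. One plausible reading, shown only as a sketch, is that an MB scores High when the ZMC of every one of its sixteen 4x4 blocks has reached T_MOTION, and Low otherwise. Aggregating with `min` and using `>=` are assumptions; the default of 4 is the T_MOTION value the slides report for the experiments.

```python
T_MOTION = 4  # threshold value reported in the slides


def zmcs(block_zmcs, t_motion=T_MOTION):
    """Classify an MB as 'High' or 'Low' from the ZMC values of its 4x4 blocks.

    Assumption: the MB is stationary only if its least consistent block
    has reached the threshold (min aggregation, inclusive comparison).
    """
    return "High" if min(block_zmcs) >= t_motion else "Low"


still_mb = [6] * 16             # every 4x4 block has been still for 6 frames
moving_mb = [6] * 15 + [1]      # one block moved recently

print(zmcs(still_mb), zmcs(moving_mb))
```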
13 Pre Macroblock Mode Selection
Inter MB mode selection: Large Block Consistency (LBC)
- Indicates the number of consecutive frames with a 16x16 MC block size at the (i,j)-th MB
- When the MB is encoded in intra mode, LBC is set to 0
- bestMode_t(i,j): the best MB mode of the (i,j)-th MB in the t-th frame
- LBC_0 = 0
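As with ZMC, the LBC equation is missing from the transcript. A minimal sketch of the counter described above follows; treating both SKIP and P16x16 as "16x16" modes is an assumption, and the mode strings are hypothetical labels rather than encoder identifiers.

```python
# Assumption: these best modes count as a 16x16 MC block size.
LARGE_BLOCK_MODES = {"SKIP", "P16x16"}


def update_lbc(prev_lbc, best_mode):
    """Return LBC_t for one MB, given LBC_{t-1} and bestMode_t."""
    if best_mode in LARGE_BLOCK_MODES:
        return prev_lbc + 1   # another consecutive large-block frame
    return 0                  # smaller partition or intra mode resets the run


lbc = 0                       # LBC_0 = 0
lbc_trace = []
for mode in ["P16x16", "SKIP", "P8x8", "P16x16", "I4x4"]:
    lbc = update_lbc(lbc, mode)
    lbc_trace.append(lbc)

print(lbc_trace)
```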
14 Pre Macroblock Mode Selection
Inter MB mode selection: Large Block Consistency Score (LBCS)
- Indicates how likely an MB is to be partitioned as 16x16
- T_MODE1, T_MODE2: threshold values used to assess the LBC
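Two thresholds naturally split the LBC into three levels, which matches the High, Medium and Low scores used later in the slides. The sketch below assumes inclusive lower bounds, since the original formula is not preserved; the defaults are the T_MODE1 = 1 and T_MODE2 = 4 values reported for the experiments.

```python
def lbcs(lbc, t_mode1=1, t_mode2=4):
    """Map an LBC count to 'High', 'Medium' or 'Low'.

    Assumption: boundaries are inclusive (lbc >= threshold).
    """
    if lbc >= t_mode2:
        return "High"
    if lbc >= t_mode1:
        return "Medium"
    return "Low"


scores = [lbcs(n) for n in range(6)]
print(scores)
```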
15 Pre Macroblock Mode Selection
Inter MB mode selection
- An illustration of LBCS
16 Pre Macroblock Mode Selection
Inter MB mode selection
- Conditional probability of MB modes given ZMCS = High
- Block sizes other than SKIP and P16x16 are very unlikely to appear (probability below about 0.04)
- SKIP and P16x16 modes can therefore be detected early
- T_MOTION = 4
17 Pre Macroblock Mode Selection
Inter MB mode selection
- Joint conditional probability of MB modes given LBCS, with ZMCS = Low
- A: LBCS = High, B: LBCS = Medium, C: LBCS = Low
- T_MODE1 = 1, T_MODE2 = 4
18 Pre Macroblock Mode Selection
Selective intra mode selection
- Computing the RD costs of intra modes carries a high computational load
- Compare the temporal correlation with the spatial correlation of the current MB prior to frame coding
19 Pre Macroblock Mode Selection
Selective intra mode selection
- Mean Absolute Temporal Difference (MATD)
- Mean Absolute Spatial Difference (MASD)
- c_{x,y}: pixel value at location (x,y) of the MB in the current frame
- r_{x,y}: pixel value at location (x,y) of the MB in the previous frame
- X, Y: horizontal and vertical dimensions of an MB
- MASD_H: the MASD between horizontally neighboring pixels
- MASD_V: the MASD between vertically neighboring pixels
20 Pre Macroblock Mode Selection
Selective intra mode selection
- MATD and MASD are compared to decide whether the RD costs of intra modes should be calculated for the current MB
- The intra mode search is skipped when the MB is more temporally correlated than spatially correlated
- w: weighting factor, currently set to 0.6
- A larger w makes it easier to skip the intra mode search
- A smaller QP will incur more intra modes than a larger QP
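The MATD and MASD equations and the exact comparison were images on the slides and are not preserved. The sketch below illustrates the idea under stated assumptions: MATD is the mean absolute difference between co-located pixels of the current and previous frame, MASD is the average of the horizontal and vertical neighbor differences, and the intra search is skipped when MATD < w * MASD (which is consistent with "a larger w makes skipping easier"). The function names and the list-of-rows block layout are hypothetical.

```python
def matd(cur, ref):
    """Mean absolute temporal difference between co-located pixels."""
    height, width = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - ref[y][x])
                for y in range(height) for x in range(width))
    return total / (height * width)


def masd(cur):
    """Mean absolute spatial difference.

    Assumption: plain average of the horizontal (MASD_H) and vertical (MASD_V) parts.
    """
    height, width = len(cur), len(cur[0])
    masd_h = sum(abs(cur[y][x] - cur[y][x + 1])
                 for y in range(height) for x in range(width - 1)) / (height * (width - 1))
    masd_v = sum(abs(cur[y][x] - cur[y + 1][x])
                 for y in range(height - 1) for x in range(width)) / ((height - 1) * width)
    return (masd_h + masd_v) / 2


def skip_intra_search(cur, ref, weight=0.6):
    """True when the MB looks more temporally than spatially correlated (assumed rule)."""
    return matd(cur, ref) < weight * masd(cur)


# Textured but unchanged block: strong temporal correlation, weak spatial correlation.
checker = [[100 * ((x + y) % 2) for x in range(16)] for y in range(16)]
# Flat block whose brightness jumped between frames: the opposite situation.
flat_now = [[50] * 16 for _ in range(16)]
flat_before = [[90] * 16 for _ in range(16)]

print(skip_intra_search(checker, checker))        # intra search can be skipped
print(skip_intra_search(flat_now, flat_before))   # intra modes should be checked
```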
21 Pre Macroblock Mode Selection
MB mode classification
- Decision table of candidate MB modes
- A block diagram of MB mode selection
22 Adaptive Slice-level Parallelism
Overview
Characteristics of slice-level parallelism
- Easy to implement
- Low communication overhead among processor units
- Good scalability
- Increases the bitrate
Slice boundaries
- Defined on the basis of a fixed number of MBs or a fixed number of bits
- Hard to decide prior to encoding
23 Adaptive Slice-level Parallelism
Fixed MB assignment
- The number of consecutive MBs in each slice
  - L: the number of processor units in a multi-core system
  - M: the total number of MBs in a frame
  - i: slice index
- Example: with L = 8 processing units and a CIF sequence (352x288), M = 22 x 18 = 396, so about 49 MBs are assigned to each slice
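The slide's formula for the per-slice MB count is not preserved, but the CIF example can be reproduced with integer division. The sketch below assumes the leftover MBs (396 is not a multiple of 8) are handed to the first slices; the original rule for the remainder is unknown, and `fixed_slice_sizes` is a hypothetical helper name.

```python
def fixed_slice_sizes(total_mbs, num_units):
    """Split total_mbs consecutive MBs into num_units slices of near-equal size.

    Assumption: the remainder is spread over the first slices.
    """
    base, extra = divmod(total_mbs, num_units)
    return [base + 1 if i < extra else base for i in range(num_units)]


mbs_per_frame = (352 // 16) * (288 // 16)   # CIF: 22 x 18 = 396 MBs
sizes = fixed_slice_sizes(mbs_per_frame, 8)
print(mbs_per_frame, sizes)
```

Every slice ends up with 49 or 50 MBs regardless of how expensive those MBs are to encode, which is the weakness the adaptive scheme targets.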
24 Adaptive Slice-level Parallelism
Fixed MB assignment
- The scheduling of slice-level parallelism on eight processor units
[Figure: encoding time of slices 0-7 on PU0-PU7, in two panels: ideal case and practical case. The practical case is marked with a bottleneck.]
25 Adaptive Slice-level Parallelism
Fixed MB assignment
- The imbalance of the computational load distribution
[Figure panels: Exhaustive Search Method; Fast ME / Fast Mode Search]
26 Adaptive Slice-level Parallelism
Fixed MB assignment
- Computational load for encoding one frame with slice-level parallelism
- Computational load of the t-th frame on a single-processor system
- C^t_slice(i): the computational load of the i-th slice in the t-th frame
- L: the number of slices in a frame
27 Adaptive Slice-level Parallelism
Fixed MB assignment
- The speedup of a multiprocessor system over a single-processor system
- To achieve the maximum speedup, the computational loads of the slices should be as similar as possible
- This motivates an adaptive slice partition method
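The speedup equation is missing from the transcript. A common way to express it, used here as an assumption, is that a single processor needs the sum of all slice loads while the parallel encoder is limited by the most loaded slice. The sketch shows why equal loads give the maximum speedup; the load numbers are made up for illustration.

```python
def slice_speedup(slice_loads):
    """Speedup of L parallel units over one unit for a single frame.

    Assumption: serial time = sum of slice loads, parallel time = largest slice load.
    """
    return sum(slice_loads) / max(slice_loads)


balanced = [10] * 8                 # ideal case: every slice costs the same
unbalanced = [10] * 7 + [30]        # practical case: one slice is a bottleneck

balanced_speedup = slice_speedup(balanced)
unbalanced_speedup = slice_speedup(unbalanced)
print(balanced_speedup, round(unbalanced_speedup, 2))
```

With eight units the balanced case reaches the full factor of 8, while a single heavy slice pulls the speedup down to roughly 3.3.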
28 Adaptive Slice-level Parallelism
Complexity estimation model
- A simple estimation method that uses the result of fast MB mode selection
- Define a group value g corresponding to the candidate MB modes
29 Adaptive Slice-level Parallelism
Complexity estimation model
- C_{k,CHKIntra}(g): complexity cost of the k-th MB
- g: group index
- e_inter: estimated complexity cost of inter mode in g = 1
- e_intra: complexity cost of the intra mode check in g = 1
- α_1, α_2, α_3, β_1, β_2, β_3: weighting values of the complexity cost
30 Adaptive Slice-level Parallelism
Complexity estimation model: relative computational load
- CHK_intra = 0: assume e_inter = 1, e_intra = 0
- CHK_intra = 1: assume e_inter = 1, e_intra = 3.97
- α_1 = 2.42, α_2 = 3.12, α_3 = 5.28
- β_1 = 0.82, β_2 = 0.83, β_3 = 0.84
31 Adaptive Slice-level Parallelism
Adaptive MB assignment
- The total computational load of the t-th frame
- The ideal computational load of each slice for a uniform workload distribution
32 Adaptive Slice-level Parallelism
Adaptive MB assignment
- MB assignment of each slice
- Balances the load much better than assigning a fixed number of MBs to each slice
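The assignment rule itself is not preserved in the transcript. The sketch below is one simple way to realize the idea, not necessarily the authors' procedure: sum the estimated per-MB costs, divide by the number of slices to get the ideal load, and walk the MBs in raster order, closing a slice whenever the running total reaches the next multiple of the ideal load. The function name and the example costs are hypothetical, and at least as many MBs as slices are assumed.

```python
def adaptive_slice_sizes(mb_costs, num_slices):
    """Return the number of consecutive MBs per slice so that loads are roughly equal.

    Assumption: greedy cut at cumulative multiples of the ideal per-slice load,
    with a guard that keeps at least one MB for every remaining slice.
    """
    ideal = sum(mb_costs) / num_slices      # ideal load of one slice
    sizes, start, running = [], 0, 0.0
    for k, cost in enumerate(mb_costs):
        running += cost
        done = len(sizes)
        if done == num_slices - 1:
            continue                        # last slice takes everything left
        mbs_left = len(mb_costs) - (k + 1)
        slices_left = num_slices - done - 1
        if running >= ideal * (done + 1) or mbs_left == slices_left:
            sizes.append(k + 1 - start)     # close the current slice after MB k
            start = k + 1
    sizes.append(len(mb_costs) - start)
    return sizes


# Hypothetical estimated costs: two expensive MBs in the middle of the frame.
costs = [1] * 6 + [3] * 2 + [1] * 6
sizes = adaptive_slice_sizes(costs, 3)

loads, pos = [], 0
for n in sizes:
    loads.append(sum(costs[pos:pos + n]))
    pos += n
print(sizes, loads)
```

The middle slice gets only two MBs because they are expensive, so all three slices carry the same estimated load.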
33 Adaptive Slice-level Parallelism
Adaptive MB assignment
- Entire block diagram
34 Experimental Results
Overview
- Performance comparison between the proposed MB mode decision and the conventional method
- Comparison of adaptive slice-level parallelism with fixed slice-level parallelism
35 Experimental Results
MB mode selection
- Average encoding time saving: AST [%]
- BDPSNR and BDBR are used to measure the performance against FULL_1Slice
- FULL_1Slice: exhaustive method
- FMD_1Slice: fast MB mode search method
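The AST formula is not preserved in the transcript. A usual definition, assumed here, is the relative time saved against the exhaustive encoder, averaged over the test points. The timings below are made-up numbers for illustration only, not results from the paper.

```python
def average_time_saving(full_times, fast_times):
    """AST in percent.

    Assumption: mean of 100 * (T_full - T_fast) / T_full over matching test points.
    """
    savings = [100 * (full - fast) / full
               for full, fast in zip(full_times, fast_times)]
    return sum(savings) / len(savings)


# Hypothetical encoding times in seconds for two test points.
full_1slice = [100, 200]
fmd_1slice = [40, 100]
ast = average_time_saving(full_1slice, fmd_1slice)
print(ast)
```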
36 Experimental Results
- Rate-distortion curves
37 Experimental Results
- R-D performance compared to one slice per frame (FMD_1Slice)
38 Experimental Results
- Rate-distortion curves
39 Experimental Results
Slice-level parallelism
- Comparing adaptive and fixed slice-level parallelism
- Speedup is computed from
  - The encoding time of one slice per frame on a single-processor system
  - The longest encoding time of a slice using fixed mode
  - The longest encoding time of a slice using adaptive mode
40 Experimental Results
- Speedup
41 Conclusions
- Proposed a fast MB mode selection using the consistency history of the 16x16 block size and zero MV
- Proposed an intra mode selection that compares temporal and spatial correlation
- Using these two schemes, the authors proposed a new adaptive slice-level parallelism to speed up the H.264/AVC encoder
42 References
- Z. Chen, P. Zhou, Y. He, Fast motion estimation for JVT, JVT Doc. JVT-G016, March 2003.
- B. Jeon, J. Lee, Fast mode decision for H.264, JVT-J003, ISO/IEC MPEG and ITU-T VCEG Joint Video Team (Waikoloa, HI), December 2003.
- I. Choi, J. Lee, B. Jeon, Fast coding mode selection with rate-distortion optimization for MPEG-4 Part-10 AVC/H.264, IEEE Trans. Circuits Syst. Video Technol. 16 (12) (2006) 1557–1561.