
1 Adaptive slice-level parallelism for H.264/AVC encoding using pre macroblock mode selection
Bongsoo Jung, Byeungwoo Jeon
Journal of Visual Communication and Image Representation, 2008

2 Outline
• Introduction
• Complexity Analysis
• Method
  - Pre Macroblock Mode Selection
  - Adaptive Slice-level Parallelism
• Experimental Results
• Conclusions

3 Introduction
• H.264/AVC achieves high coding efficiency
  - Variable block sizes, multiple reference frames, quarter-pel motion vector accuracy, etc.
• High computational complexity
  - Complexity reduction algorithms
  - Parallel processing

4 Introduction
• GOP level
  - Simple, but high latency
• Frame level
  - Keeps coding efficiency, but the dependence among frames limits thread scalability
• Slice level
  - Slices are encoded independently, but coding efficiency is lower
• Macroblock level
  - High dependency

5 Introduction
• MBs in a slice may not have similar computational complexity.
  - Some threads incur unnecessary extra waiting time.
[Figure: encoding time of slices 0-7 on processing units PU0-PU7]

6 Main Purpose
• Objective
  - Use a parallel algorithm to speed up the H.264/AVC encoder
  - Maximize parallel efficiency by distributing the workload equally
• Method
  - Preprocessing: fast MB mode selection
  - Adaptive slice-level parallelism

7 Complexity Analysis
• Inter prediction modes of MBs in H.264
• Intra prediction modes: 4*4, 16*16

8 Complexity Analysis
• The run-time complexity of the H.264/AVC encoder
  - Pentium IV 2.4 GHz
  - Foreman_CIF with IPPP structure

9 Pre Macroblock Mode Selection: Overview
• Why?
  - ME with variable block sizes has high computational complexity
  - Remove unnecessary ME block sizes and RD calculations of intra prediction modes
• This removal leads to
  - Complexity reduction
  - Workload balancing among slices

10 Pre Macroblock Mode Selection: Inter MB mode selection
• MC block sizes in a video sequence
  - Foreground region: 8*8 or smaller
  - Non-moving region: 16*16
• High temporal correlation
  - Check the consistency history of the 16*16 block size and the zero MV
• Two measurements
  - Zero motion consistency (ZMC)
  - Large block consistency (LBC)

11 Pre Macroblock Mode Selection: Inter MB mode selection
• Zero Motion Consistency (ZMC)
  - Indicates how long a specified block has had a zero MV consecutively
  - When a block is encoded in intra mode, ZMC is set to 0
  - t: frame index; ZMC_0 = 0; (n,m;i,j) indicates the 4*4 block at (n,m) within MB (i,j)
  - A high ZMC value implies a high probability of belonging to a background region

12 Pre Macroblock Mode Selection: Inter MB mode selection
• Zero Motion Consistency Score (ZMCS)
  - Indicates how likely an MB is to be a stationary region
  - T_MOTION: a threshold value
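
The ZMC bookkeeping and the thresholded score could be sketched in Python as follows. The slide formulas are not reproduced in this transcript, so the sketch rests on two assumptions: the counter restarts on any non-zero MV (as "consecutively" suggests), and an MB scores High only when every one of its 4*4 blocks has a ZMC at or above T_MOTION. The function names and the "High"/"Low" strings are hypothetical.

```python
T_MOTION = 4  # threshold value quoted on a later slide


def update_zmc(prev_zmc, mv, is_intra):
    """Return the ZMC of one 4*4 block after coding the current frame.

    Assumption: the counter restarts whenever the block is intra coded
    or its motion vector is non-zero; otherwise it grows by one.
    """
    if is_intra or mv != (0, 0):
        return 0
    return prev_zmc + 1


def zmc_score(block_zmcs, threshold=T_MOTION):
    """Classify an MB as 'High' or 'Low' from the ZMC values of its 4*4 blocks.

    Assumption: the MB is 'High' only if every block reaches the threshold.
    """
    return "High" if min(block_zmcs) >= threshold else "Low"


# One block with a zero MV in four consecutive frames, then a moving frame.
zmc = 0
history = []
for mv in [(0, 0), (0, 0), (0, 0), (0, 0), (1, 0)]:
    zmc = update_zmc(zmc, mv, is_intra=False)
    history.append(zmc)
print(history)  # [1, 2, 3, 4, 0]
```

A single block that moved recently is enough to pull the MB score down to Low under this reading, which errs on the side of running the full mode search.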

13 Pre Macroblock Mode Selection: Inter MB mode selection
• Large Block Consistency (LBC)
  - Indicates the number of consecutive frames having a 16*16 MC block size at the (i,j)th MB
  - When a block is encoded in intra mode, LBC is set to 0
  - bestMode_t(i,j): the best MB mode of the (i,j)th MB in the t-th frame
  - LBC_0 = 0

14 Pre Macroblock Mode Selection: Inter MB mode selection
• Large Block Consistency Score (LBCS)
  - Indicates how likely an MB is to be partitioned as 16*16
  - T_MODE1, T_MODE2: threshold values used to assess the LBC
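
A minimal Python sketch of the LBC counter and its three-level score is given below. The exact formulas are missing from this transcript, so several points are assumptions: SKIP is counted as a 16*16 mode, the score is High at or above T_MODE2, Medium from T_MODE1 up to T_MODE2, and Low otherwise. The mode strings and function names are hypothetical.

```python
T_MODE1 = 1  # threshold values quoted on a later slide
T_MODE2 = 4

# Assumption: both SKIP and P16x16 count as a 16*16 MC block size.
LARGE_BLOCK_MODES = {"SKIP", "P16x16"}


def update_lbc(prev_lbc, best_mode):
    """Return the LBC of one MB after coding the current frame.

    Any mode outside LARGE_BLOCK_MODES (including intra modes) resets the counter.
    """
    if best_mode in LARGE_BLOCK_MODES:
        return prev_lbc + 1
    return 0


def lbc_score(lbc, t_mode1=T_MODE1, t_mode2=T_MODE2):
    """Map an LBC value to 'High', 'Medium' or 'Low' (assumed threshold rule)."""
    if lbc >= t_mode2:
        return "High"
    if lbc >= t_mode1:
        return "Medium"
    return "Low"


# Track one MB over five frames (hypothetical best modes).
lbc = 0
scores = []
for mode in ["P16x16", "SKIP", "P16x16", "P16x16", "P8x8"]:
    lbc = update_lbc(lbc, mode)
    scores.append(lbc_score(lbc))
print(scores)  # ['Medium', 'Medium', 'Medium', 'High', 'Low']
```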

15 Pre Macroblock Mode Selection: Inter MB mode selection
• An illustration of LBCS

16 Pre Macroblock Mode Selection: Inter MB mode selection
• Conditional probability of MB modes given ZMCS = High
  - The other block sizes are very unlikely to appear (probability less than about 0.04)
  - SKIP and P16*16 modes can be detected early
  - T_MOTION = 4

17 Pre Macroblock Mode Selection: Inter MB mode selection
• Joint conditional probability of MB modes given LBCS, with ZMCS = Low
  - A: LBCS = High, B: LBCS = Medium, C: LBCS = Low
  - T_MODE1 = 1, T_MODE2 = 4

18 Pre Macroblock Mode Selection: Pre selective intra mode selection
• Computing the RD costs of intra modes imposes a high computational load
• Compare the temporal correlation with the spatial correlation of the current MB prior to frame coding

19 Pre Macroblock Mode Selection: Selective intra mode selection
• Mean Absolute Temporal Difference (MATD)
• Mean Absolute Spatial Difference (MASD)
  - c_{x,y}: pixel value at location (x,y) of the MB in the current frame
  - r_{x,y}: pixel value at location (x,y) of the MB in the previous frame
  - X, Y: horizontal and vertical dimensions of an MB
  - MASD_H: the MASD between horizontally neighboring pixels
  - MASD_V: the MASD between vertically neighboring pixels
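
The two measures can be illustrated with a short Python sketch. The slide equations are not in this transcript, so the normalization is an assumption: each measure is taken as the plain mean of the absolute differences it sums. Blocks are lists of pixel rows, and the function names are hypothetical.

```python
def matd(cur, ref):
    """Mean absolute difference between co-located pixels of the current
    MB (cur) and the previous-frame MB (ref)."""
    rows, cols = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - ref[y][x])
                for y in range(rows) for x in range(cols))
    return total / (rows * cols)


def masd_h(cur):
    """Mean absolute difference between horizontally neighboring pixels."""
    rows, cols = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - cur[y][x + 1])
                for y in range(rows) for x in range(cols - 1))
    return total / (rows * (cols - 1))


def masd_v(cur):
    """Mean absolute difference between vertically neighboring pixels."""
    rows, cols = len(cur), len(cur[0])
    total = sum(abs(cur[y][x] - cur[y + 1][x])
                for y in range(rows - 1) for x in range(cols))
    return total / ((rows - 1) * cols)


# Tiny 2*2 example block instead of a real 16*16 MB.
cur = [[10, 20],
       [30, 40]]
ref = [[10, 10],
       [10, 10]]
print(matd(cur, ref), masd_h(cur), masd_v(cur))  # 15.0 10.0 20.0
```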

20 Pre Macroblock Mode Selection: Selective intra mode selection
• Compare MATD and MASD to determine whether the RD costs of intra modes should be calculated for the current MB
  - The intra mode search is skipped when the MB is more temporally correlated than spatially correlated
  - w: weighting factor, currently set to 0.6
  - A larger w makes skipping the intra mode search easier
  - A smaller QP incurs more intra modes than a larger QP
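
One possible form of this test, written in Python, is shown below. The inequality is an assumption chosen to agree with the slide's remark that a larger w makes skipping easier: the intra search is skipped when MATD < w * MASD. How MASD_H and MASD_V are combined into a single MASD is not stated in this transcript, so the sketch takes MASD as an already combined input. The function name is hypothetical.

```python
W = 0.6  # weighting factor quoted on the slide


def should_check_intra(matd_value, masd_value, w=W):
    """Return True if the RD costs of intra modes should be computed.

    Assumed rule: skip the intra search (return False) when the MB is more
    temporally than spatially correlated, i.e. MATD < w * MASD.
    """
    return matd_value >= w * masd_value


print(should_check_intra(5.0, 10.0))         # False: 5 < 0.6 * 10, intra skipped
print(should_check_intra(7.0, 10.0))         # True: 7 >= 0.6 * 10, intra checked
print(should_check_intra(7.0, 10.0, w=0.8))  # False: a larger w skips more often
```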

21 Pre Macroblock Mode Selection: MB mode classification
• Decision table of candidate MB modes
• A block diagram of MB mode selection

22 Adaptive Slice-level Parallelism: Overview
• Characteristics
  - Easy to implement
  - Low communication overhead among processor units
  - Good scalability
  - Increased bitrate
• A slice boundary is defined on the basis of a fixed number of MBs or a fixed number of bits
  - It is hard to decide a slice boundary prior to encoding

23 Adaptive Slice-level Parallelism: Fixed MB assignment
• The number of consecutive MBs in each slice
  - L: the number of processor units in a multi-core system
  - M: the total number of MBs in a frame
  - i: slice index
• Example: number of processing units L = 8, sequence resolution CIF (352*288), M = 22*18 = 396
  - About 49 MBs are assigned to each slice (396 / 8 = 49.5)
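
The CIF example can be reproduced with a few lines of Python. The slide's own per-slice formula is not in this transcript; the sketch assumes the remainder of M / L is spread over the first slices so that every MB is assigned. The function name is hypothetical.

```python
def fixed_mb_assignment(total_mbs, num_units):
    """Split total_mbs consecutive MBs into num_units slices of near-equal size.

    Assumption: the first (total_mbs % num_units) slices get one extra MB.
    """
    base, extra = divmod(total_mbs, num_units)
    return [base + 1 if i < extra else base for i in range(num_units)]


# CIF frame: 22 * 18 = 396 MBs on 8 processing units.
sizes = fixed_mb_assignment(22 * 18, 8)
print(sizes)       # [50, 50, 50, 50, 49, 49, 49, 49]
print(sum(sizes))  # 396
```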

24 Adaptive Slice-level Parallelism: Fixed MB assignment
• The scheduling of slice-level parallelism on eight processor units
[Figure: encoding time of slices 0-7 on PU0-PU7 in the ideal case and in the practical case, with the bottleneck marked in the practical case]

25 Adaptive Slice-level Parallelism: Fixed MB assignment
• The imbalance of computational load distribution
  - Exhaustive search method
  - Fast ME / fast mode search

26 Adaptive Slice-level Parallelism: Fixed MB assignment
• Computational load for encoding one frame with slice-level parallelism
• Computational load of the t-th frame on a single-processor system
  - C^t_slice(i): the computational load of the i-th slice in the t-th frame
  - L: the number of slices in a frame

27 Adaptive Slice-level Parallelism: Fixed MB assignment
• The speedup of a multiprocessor system over a single-processor system
• To achieve the maximum speedup
  - The computational loads of the slices should be as similar as possible
• Adaptive slice partition method
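
Why balanced slices maximize the speedup can be seen in a small Python sketch. It assumes that the single-processor load of a frame is the sum of its slice loads and that the parallel encoding time is set by the slowest slice, ignoring synchronization overhead. The load values are invented for illustration, and the function name is hypothetical.

```python
def slice_parallel_speedup(slice_loads):
    """Speedup of slice-level parallelism over a single processor.

    Assumptions: one slice per processing unit, single-processor load equal
    to the sum of slice loads, parallel load equal to the largest slice load.
    """
    return sum(slice_loads) / max(slice_loads)


balanced = [10] * 8           # every slice costs the same
unbalanced = [10] * 7 + [20]  # one slice is twice as expensive

print(slice_parallel_speedup(balanced))    # 8.0
print(slice_parallel_speedup(unbalanced))  # 4.5
```

With eight units the speedup reaches 8 only when all slice loads are equal; a single slow slice holds every other unit idle until it finishes.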

28 Adaptive Slice-level Parallelism: Complexity estimation model
• A simple estimation method that utilizes the result of the fast MB mode selection
• Define the group value g corresponding to the candidate MB modes

29 Adaptive Slice-level Parallelism: Complexity estimation model
• Complexity model
  - C_{k,CHKIntra}(g): complexity cost of the k-th MB
  - g: group index
  - e_inter: estimated complexity cost of inter mode in g = 1
  - e_intra: complexity cost according to the intra mode check in g = 1
  - α1, α2, α3, β1, β2, β3: weighting values of the complexity cost

30 Adaptive Slice-level Parallelism: Complexity estimation model
• Relative computational load for CHK_intra = 0 and CHK_intra = 1
  - Assume e_inter = 1, e_intra = 0
  - α1 = 2.42, α2 = 3.12, α3 = 5.28
  - β1 = 0.82, β2 = 0.83, β3 = 0.84
  - Assume e_inter = 1, e_intra = 3.97

31 Adaptive Slice-level Parallelism: Adaptive MB assignment
• The total computational load of the t-th frame
• The ideal computational load of each slice for a uniform workload distribution

32 Adaptive Slice-level Parallelism: Adaptive MB assignment
• MB assignment of each slice
• Balances the workload much better than assigning a fixed number of MBs to each slice
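
The idea of the adaptive assignment can be sketched in Python as a greedy partition. The slide's exact assignment formula is not in this transcript, so the closing rule is an assumption: MBs are taken in coding order and a slice is closed once the cumulative estimated load reaches the next multiple of the ideal per-slice load. The per-MB costs below are invented for illustration, and the function name is hypothetical.

```python
def adaptive_mb_assignment(mb_costs, num_slices):
    """Return the number of consecutive MBs assigned to each slice.

    mb_costs holds the estimated complexity cost of every MB in coding order.
    Assumed rule: close slice i when the running load reaches (i + 1) * ideal,
    where ideal = total load / num_slices. The last slice takes the remainder
    and may be empty if the costs are extremely skewed.
    """
    ideal = sum(mb_costs) / num_slices
    counts = []
    start = 0
    cumulative = 0.0
    for k, cost in enumerate(mb_costs):
        cumulative += cost
        closed = len(counts)
        if closed < num_slices - 1 and cumulative >= ideal * (closed + 1):
            counts.append(k + 1 - start)
            start = k + 1
    counts.append(len(mb_costs) - start)
    return counts


# Six cheap MBs followed by four expensive ones, split across two slices.
costs = [1, 1, 1, 1, 1, 1, 3, 3, 3, 3]
print(adaptive_mb_assignment(costs, 2))  # [7, 3]
```

In this example a fixed split of five MBs per slice would give slice loads of 5 and 13, while the adaptive split gives 9 and 9.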

33 Adaptive Slice-level Parallelism: Adaptive MB assignment
• Entire block diagram

34 Experimental Results: Overview
• Performance comparison between the proposed MB mode decision and the conventional method
• Comparison of adaptive slice-level parallelism with fixed slice-level parallelism

35 Experimental Results: MB mode selection
• Average encoding time saving, AST [%]
• BDPSNR and BDBR are used to measure the performance against FULL_1Slice
  - FULL_1Slice: exhaustive method
  - FMD_1Slice: fast MB mode search method
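
A time-saving figure of this kind is commonly computed as the relative reduction in encoding time, averaged over the test points. The slide's own AST formula is not in this transcript, so the Python sketch below is only an assumed form; the timing values are invented for illustration and are not results from the paper, and the function names are hypothetical.

```python
def time_saving_percent(t_reference, t_proposed):
    """Relative encoding time saving of the proposed method, in percent."""
    return (t_reference - t_proposed) / t_reference * 100.0


def average_time_saving(timing_pairs):
    """Average the per-test-point savings, e.g. over several QP values.

    timing_pairs is a list of (reference_time, proposed_time) tuples.
    """
    savings = [time_saving_percent(ref, new) for ref, new in timing_pairs]
    return sum(savings) / len(savings)


# Invented (FULL_1Slice, FMD_1Slice) encoding times in seconds.
timings = [(8.0, 4.0), (4.0, 3.0)]
print(average_time_saving(timings))  # 37.5
```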

36 Experimental Results: Rate distortion curves

37 Experimental Results
• R-D performance compared to one slice per frame (FMD_1Slice)

38 Experimental Results: Rate distortion curves

39 Experimental Results: Slice-level parallelism
• Comparing adaptive and fixed slice-level parallelism
• Speedup
  - Encoding time of one slice per frame on a single-processor system
  - The longest encoding time of a slice using fixed mode
  - The longest encoding time of a slice using adaptive mode

40 Experimental Results: Speedup

41 Conclusions
• Proposed a fast MB mode selection using the consistency history of the block size and the zero MV
• Proposed an intra mode selection that compares temporal and spatial correlation
• Using these two schemes, the authors proposed a new adaptive slice-level parallelism to speed up the H.264/AVC encoder


