
High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing (Chinese title, translated: Distributed Video Coding with High Efficiency and Parallelized Design for Cloud Architectures). CMLab, CSIE, NTU. Cheng, Han-Ping (程瀚平).


1 High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing. Cheng, Han-Ping (程瀚平). Advisor: Prof. Wu, Ja-Ling (吳家麟). 2010/6/2

2 Outline
- Introduction
- DISPAC video codec
- RD performance of DISPAC
- Parallelizing DISPAC decoder
- Decoding speed of DISPAC
- Conclusions and future work

3 Trends of Cloud Computing
- Cloud computing makes clients slimmer and thinner.

4 Video Coding in Cloud Computing
- Clients only need a low-complexity encoder and a low-complexity decoder.
- Conventional video coding (e.g. H.264)
  - Encode once, decode many times
  - Low-complexity decoder
- Distributed video coding (DVC)
  - e.g. video surveillance, wireless sensor networks
  - Low-complexity encoder

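5 Distributed Video Coding
- Slepian-Wolf theorem (1973)
- Wyner-Ziv theorem (1976)
(Figure: coding two statistically dependent sources X and Y.)
- Conventional video coding paradigm: a joint encoder exploits the statistical dependency and a joint decoder reconstructs X and Y, so $R_X + R_Y \ge H(X, Y)$.
- Separate encoders (encoder X, encoder Y) with a joint decoder: the dependency exists but is not exploited at the encoders. Coding each source on its own needs $R_X \ge H(X)$ and $R_Y \ge H(Y)$, so what is the minimum of $R_X + R_Y$? Slepian and Wolf: $R_X + R_Y \ge H(X, Y)$, the same as with joint encoding!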
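
For reference, the complete Slepian-Wolf achievable rate region for two correlated discrete sources coded by separate encoders and a joint decoder is the standard result below (stated from the theorem itself, not transcribed from the slide):

```latex
\begin{aligned}
R_X &\ge H(X \mid Y),\\
R_Y &\ge H(Y \mid X),\\
R_X + R_Y &\ge H(X, Y).
\end{aligned}
```

The corner point $R_Y = H(Y)$, $R_X = H(X \mid Y)$ is the case DVC builds on: Y is coded conventionally and serves as side information when decoding X, and by the chain rule $R_X + R_Y = H(Y) + H(X \mid Y) = H(X, Y)$.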

6 Distributed Video Coding
- Wyner-Ziv theorem (1976)
  - Extends the Slepian-Wolf result to lossy coding
- DVC is also called Wyner-Ziv (WZ) video coding.
(Figure: source Y is coded with a conventional source encoder and decoder at $R_Y \ge H(Y)$ and is used to estimate side information at the decoder. Source X is quantized and channel-encoded, and only the parity P is sent. The decoder treats the side information X' as a copy of X received through a virtual noisy channel and corrects it with the channel decoder, just as an error control code recovers X + P from a noisy copy (X + P)'. The correlation is thus exploited at the decoder, with $R_X \ge H(X \mid Y)$.)

7 Video Coding in Cloud Computing
- WZ to H.264 video transcoder
(Figure: WZ encoder (low complexity) → WZ-encoded bitstream → WZ-to-H.264 transcoder running on cloud computational resources → H.264-encoded bitstream → H.264 decoder (low complexity).)

8 Motivation
- There is still a performance gap between Wyner-Ziv video coding and conventional video coding (e.g. H.264/AVC).
- Most reported WZ codecs have a long decoding delay.
- Parallel computing is a growing trend (e.g. multi-core CPUs, GPUs).
  - Parallelizability of the decoder is therefore essential.

9 DISPAC Video Codec
- DIStributed video coding with PArallelized design for Cloud computing (DISPAC)
- To improve rate-distortion (RD) performance:
  - combine coding tools developed in the recent literature with some newly developed modules.
- To reduce the decoding delay:
  - a highly parallelized decoder.

10 Outline
- Introduction
- DISPAC video codec
- RD performance of DISPAC
- Parallelizing DISPAC decoder
- Decoding speed of DISPAC
- Conclusions and future work

11 DISPAC Video Codec
- Combines the coding tools of two state-of-the-art WZ codecs:
  - DISCOVER codec (DIStributed COding for Video sERvices): X. Artigas et al., "The DISCOVER codec: architecture, techniques and evaluation", PCS, 2007
  - MLWZ codec (motion-learning-based Wyner-Ziv video coding): R. Martin et al., "Statistical motion learning for improved transform domain Wyner-Ziv video coding", IET Image Processing, 2010

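12 DISCOVER Video Codec
(Figure: DISCOVER codec architecture, ref. X. Artigas et al., PCS, 2007. GOP structures: with GOP size 2, key frames and WZ frames alternate; with GOP size 4, three WZ frames lie between consecutive key frames.)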
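
As a small illustration of the GOP structure, the sketch below labels each frame as a key frame or a WZ frame, assuming (as the GOP 2 and GOP 4 examples suggest) that every GOP starts with one key frame followed by GOP size minus 1 WZ frames. The function name is hypothetical.

```python
def frame_types(num_frames, gop_size):
    """Label each frame 'K' (key frame) or 'WZ' (Wyner-Ziv frame).

    Assumption: the first frame of every GOP is a key frame and the
    remaining gop_size - 1 frames of the GOP are WZ frames.
    """
    if gop_size < 1:
        raise ValueError("gop_size must be >= 1")
    return ["K" if i % gop_size == 0 else "WZ" for i in range(num_frames)]


print(frame_types(5, 2))  # ['K', 'WZ', 'K', 'WZ', 'K']
```

Larger GOP sizes mean fewer (expensive) key frames but WZ frames farther from their references, which is where side information quality starts to suffer.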

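13 Quantization
- Eight quantization matrices, Q1 to Q8. Each entry is the number of quantization levels of the corresponding DCT coefficient band in a 4x4 block (rows separated by semicolons):
  Q1 = [16 8 0 0; 8 0 0 0; 0 0 0 0; 0 0 0 0]
  Q2 = [32 8 0 0; 8 0 0 0; 0 0 0 0; 0 0 0 0]
  Q3 = [32 8 4 0; 8 4 0 0; 4 0 0 0; 0 0 0 0]
  Q4 = [32 16 8 4; 16 8 4 0; 8 4 0 0; 4 0 0 0]
  Q5 = [32 16 8 4; 16 8 4 4; 8 4 4 0; 4 4 0 0]
  Q6 = [64 16 8 8; 16 8 8 4; 8 8 4 4; 8 4 4 0]
  Q7 = [64 32 16 8; 32 16 8 4; 16 8 4 4; 8 4 4 0]
  Q8 = [128 64 32 16; 64 32 16 8; 32 16 8 4; 16 8 4 0]
- 32 = 2^5 → 5 bits; 8 = 2^3 → 3 bits; 0 → 0 bits (the band is not transmitted).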
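
The level-to-bit rule can be checked directly in code. The sketch below transcribes the eight matrices (row-major, one entry per band position) and computes the bits spent per band and per 4x4 block; it assumes every nonzero level count is a power of two, which holds for all eight matrices. The names are hypothetical.

```python
# Quantization levels per band position of a 4x4 block (row-major), Q1..Q8.
QUANT_MATRICES = {
    "Q1": [[16, 8, 0, 0], [8, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    "Q2": [[32, 8, 0, 0], [8, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    "Q3": [[32, 8, 4, 0], [8, 4, 0, 0], [4, 0, 0, 0], [0, 0, 0, 0]],
    "Q4": [[32, 16, 8, 4], [16, 8, 4, 0], [8, 4, 0, 0], [4, 0, 0, 0]],
    "Q5": [[32, 16, 8, 4], [16, 8, 4, 4], [8, 4, 4, 0], [4, 4, 0, 0]],
    "Q6": [[64, 16, 8, 8], [16, 8, 8, 4], [8, 8, 4, 4], [8, 4, 4, 0]],
    "Q7": [[64, 32, 16, 8], [32, 16, 8, 4], [16, 8, 4, 4], [8, 4, 4, 0]],
    "Q8": [[128, 64, 32, 16], [64, 32, 16, 8], [32, 16, 8, 4], [16, 8, 4, 0]],
}


def bits_per_band(levels):
    """Bits (bit planes) needed for `levels` bins; 0 levels means the band is not sent."""
    # For a power of two 2^b, bit_length() is b + 1, so this returns b exactly.
    return 0 if levels == 0 else levels.bit_length() - 1


def bits_per_block(name):
    """Total bits per 4x4 block under quantization matrix `name`."""
    return sum(bits_per_band(v) for row in QUANT_MATRICES[name] for v in row)


for name in QUANT_MATRICES:
    print(name, bits_per_block(name))
```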

14 Quantization
- DCT coefficient bands: block n (n = 1, ..., N) holds 16 coefficients $S^n_1, \dots, S^n_{16}$, indexed in zig-zag order:
  $S^n_1\ S^n_2\ S^n_6\ S^n_7$ / $S^n_3\ S^n_5\ S^n_8\ S^n_{13}$ / $S^n_4\ S^n_9\ S^n_{12}\ S^n_{14}$ / $S^n_{10}\ S^n_{11}\ S^n_{15}\ S^n_{16}$
- DCT coefficient band $b_k = \{S^1_k, S^2_k, S^3_k, \dots, S^N_k\}$ collects the k-th coefficient of every block, e.g. $b_1 = \{S^1_1, \dots, S^N_1\}$ and $b_{16} = \{S^1_{16}, \dots, S^N_{16}\}$.
- $b_1$ is the DC band; $b_2, \dots, b_{16}$ are the AC bands.
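
The band grouping can be sketched as follows. The ZIGZAG table is read off the coefficient layout on this slide ($S_1$ at the top-left, $S_{16}$ at the bottom-right); the function name is hypothetical.

```python
# ZIGZAG[k] is the (row, col) position of coefficient S_{k+1} in a 4x4 block.
ZIGZAG = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
          (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]


def group_into_bands(blocks):
    """Return 16 bands; band k holds the k-th zig-zag coefficient of every block.

    blocks: list of N 4x4 DCT coefficient blocks (lists of rows).
    bands[0] is the DC band b1, bands[1:] are the AC bands b2..b16.
    """
    return [[block[r][c] for block in blocks] for (r, c) in ZIGZAG]


# Two toy blocks whose entries encode (block, row-major position).
blocks = [[[100 * n + 4 * r + c for c in range(4)] for r in range(4)] for n in range(2)]
bands = group_into_bands(blocks)
print(bands[0], bands[2])  # [0, 100] [4, 104]
```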

15 Bit-Plane Extraction
- For each DCT coefficient band, the quantized symbols are split into bit planes from the MSB to the LSB, and each bit plane is channel-encoded with LDPCA.
- Example: under Q4 = [32 16 8 4; 16 8 4 0; 8 4 0 0; 4 0 0 0], the DC band has 32 levels and therefore yields bit planes 1 to 5.
(Figure: example quantized DC-band symbols and their five bit planes.)
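
A minimal sketch of bit-plane extraction for one band, MSB first; the LDPCA encoding of each plane is not shown, and the example symbols are arbitrary values chosen for illustration (not read from the figure).

```python
def extract_bit_planes(symbols, num_bits):
    """Split quantized symbols into bit planes, MSB first.

    planes[0] is bit plane 1 (MSB) and planes[-1] is the LSB plane; each
    plane would then be passed to the LDPCA encoder (not shown).
    """
    return [[(s >> b) & 1 for s in symbols] for b in range(num_bits - 1, -1, -1)]


# A DC band under Q4 has 32 levels, hence 5 bit planes.
for i, plane in enumerate(extract_bit_planes([4, 6, 7, 0, 1], 5), start=1):
    print(f"Bit plane {i}: {plane}")
```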

16 DISCOVER Video Codec
(Figure: DISCOVER architecture, ref. X. Artigas et al., PCS, 2007 (白育姍), mapped onto the Wyner-Ziv setup of slide 6: quantizer and channel encoder for X at the encoder; side information estimation (Y) and channel decoder at the decoder; parity P; $R_Y \ge H(Y)$, $R_X \ge H(X \mid Y)$.)

17 Side Information Creation
(Figure: backward key frame $X_B$ and forward key frame $X_F$.)
- Low-pass filter both key frames (3x3 mean filter).
- Divide the frame into 16x16 non-overlapping blocks.
- Motion estimation (search window: ±32).
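
A minimal sketch of the first two steps, assuming frames are lists of rows of luma values, borders are handled by averaging only the neighbors that exist, and blocks are matched by sum of absolute differences (SAD). The slide does not state DISCOVER's border handling or matching criterion, so those choices, and all function names, are assumptions.

```python
def mean_filter_3x3(frame):
    """3x3 mean (low-pass) filter; at borders, average only the existing neighbors."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += frame[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def block_sad(cur, ref, by, bx, dy, dx, bs):
    """SAD between cur's block at (by, bx) and ref's block at (by + dy, bx + dx)."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(bs) for j in range(bs))


def full_search(cur, ref, by, bx, bs=16, search=32):
    """Full-search block matching within +/-search pixels; returns the best (dy, dx)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements that would move the block outside the reference frame.
            if not (0 <= by + dy and by + dy + bs <= h and 0 <= bx + dx and bx + dx + bs <= w):
                continue
            sad = block_sad(cur, ref, by, bx, dy, dx, bs)
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best[1], best[2]
```

In the flow on the slide, both key frames would be filtered first and the matching run between the filtered $X_B$ and $X_F$; with a ±32 window, each 16x16 block evaluates up to 65 x 65 displacements.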

18 Side Information Creation
(Figure: $X_B$ and $X_F$.)

19 Side Information Creation
(Figure: $X_B$ and $X_F$.)
- Adaptive search range (labels: N; neighboring motion vectors $(x_L, y_L)$, $(x_U, y_U)$, $(x_R, y_R)$, $(x_B, y_B)$).

20 Side Information Creation
(Figure: $X_B$ and $X_F$.)
- Half-pixel motion estimation

21 Side Information Creation
(Figure: $X_B$ and $X_F$; candidate motion vectors $x_1, \dots, x_9$.)
- Spatial motion smoothing with a weighted vector median filter over the candidate motion vectors $x_1, \dots, x_9$.

22 Side Information Creation
(Figure: $X_B$ and $X_F$.)
- Weighted vector median filter: the matching errors $\mathrm{MSE}_1$ and $\mathrm{MSE}_2$ are computed for candidates $x_1$ and $x_2$.

23 Side Information Creation
(Figure: $X_B$ and $X_F$.)
- Weighted vector median filter: candidate $x_1$

24 Side Information Creation
(Figure: $X_B$ and $X_F$.)
- Weighted vector median filter: the result for $x_6$ is the minimum, so $x_{wvmf} = x_6$ (final motion vector!).
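
A minimal sketch of a weighted vector median filter, using the usual definition $x_{wvmf} = \arg\min_{x_j} \sum_i w_i \lVert x_j - x_i \rVert_2$ over the candidate vectors. The slides show MSE values feeding the filter but not the exact weighting rule, so here the weights are simply passed in by the caller (an assumption); the function name is hypothetical.

```python
import math


def weighted_vector_median(candidates, weights):
    """Return the candidate x_j minimizing sum_i w_i * ||x_j - x_i||_2.

    candidates: motion vectors (dy, dx), e.g. x_1..x_9.
    weights:    one non-negative weight per candidate; how they are derived
                (e.g. from matching MSEs) is left to the caller.
    """
    if len(candidates) != len(weights):
        raise ValueError("need one weight per candidate")

    def cost(xj):
        return sum(w * math.dist(xj, xi) for xi, w in zip(candidates, weights))

    # min() keeps the first candidate on ties.
    return min(candidates, key=cost)


mvs = [(0, 0), (1, 0), (0, 1), (10, 10), (1, 1)]
print(weighted_vector_median(mvs, [1.0] * len(mvs)))  # (1, 1)
```

Because the output is always one of the input vectors, an outlier such as (10, 10) is rejected unless its weight dominates.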

25 Side Information Creation
(Figure: $X_B$ and $X_F$.)
- Bidirectional motion compensation: block interpolation ($0.75 \cdot X_B + 0.25 \cdot X_F$).

26 DISCOVER Video Codec
(Figure: DISCOVER architecture, ref. X. Artigas et al., PCS, 2007 (白育姍); label: Laplacian distribution.)

27 CNM Parameter Estimation
(Figure: $X_B$ and $X_F$.)
- Residual frame generation

28 CNM Parameter Estimation
- 4x4 DCT transform of the residual frame
(Figure: example residual DCT coefficients.)

29 CNM Parameter Estimation
- CNM parameter computation
(Figure: example residual DCT coefficients.)
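
If the correlation noise of a DCT band is modeled as a zero-mean Laplacian $f(r) = \frac{\alpha}{2} e^{-\alpha |r|}$, whose variance is $2/\alpha^2$, the parameter can be estimated by moment matching as $\alpha = \sqrt{2/\hat\sigma^2}$. The sketch below applies this to the residual DCT coefficients of one band; it illustrates moment matching under a zero-mean assumption and is not necessarily DISCOVER's exact estimator, which may compute the variance differently or per coefficient. The function name is hypothetical.

```python
import math


def laplacian_alpha(residual_coeffs):
    """Moment-matching estimate of the Laplacian parameter alpha for one band.

    A zero-mean Laplacian has variance 2 / alpha**2, so alpha = sqrt(2 / var).
    The variance is estimated as the mean of r**2 (zero-mean assumption).
    """
    n = len(residual_coeffs)
    if n == 0:
        raise ValueError("empty band")
    var = sum(r * r for r in residual_coeffs) / n
    if var == 0:
        return math.inf  # degenerate case: the residual is identically zero
    return math.sqrt(2.0 / var)


print(laplacian_alpha([1, -1, 1, -1]))  # sqrt(2)
```

A larger α means a more peaked noise model, i.e. side information the decoder can trust more for that band.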

30 DISCOVER Video Codec
(Figure: DISCOVER architecture, ref. X. Artigas et al., PCS, 2007 (白育姍).)

31 Correlation Noise Distribution Modeling
(Figure: the CNM parameter and the side information define a Laplacian distribution for the WZ coefficient.)

32 DISCOVER Video Codec
(Figure: DISCOVER architecture, ref. X. Artigas et al., PCS, 2007 (白育姍).)

33 Conditional Bit Probability Computation
- $P(b_k = 1 \mid Y, b_1, \dots, b_{k-1})$: the probability that the k-th bit is 1, given the side information Y and the previous k-1 decoded bits.
(Figure: 176/4 x 144/4 WZ blocks; Laplacian pdf of X - Y.)
- Example: assume a quantization step size of 32. Once the first four bits (0011) are decoded, the symbol lies between 0011000 (24) and 0011111 (31), so (31 - 24 + 1) x 32 = 256 probabilities must be summed.
- Ref. R.P. Westerlaken et al., "Analyzing symbol and bit plane-based LDPC in distributed video coding", ICIP, 2007.
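
To make the computation concrete, here is a brute-force sketch of $P(b_k = 1 \mid Y, b_1, \dots, b_{k-1})$ under stated assumptions: X is a non-negative integer coefficient, symbol q covers the integers in $[q\Delta, (q+1)\Delta)$ for step size $\Delta$, and X - Y is Laplacian with parameter α. The quantizer layout and function name are hypothetical; the per-value summation mirrors the 256-term example above.

```python
import math


def bit_one_probability(y, alpha, prefix, n_bits, step):
    """P(k-th bit = 1 | side info y, decoded bits `prefix`), with k = len(prefix) + 1.

    Assumptions (illustrative only): X is a non-negative integer, symbol q
    covers X in [q*step, (q+1)*step), X - Y is Laplacian(alpha), and bits
    are decoded MSB first.
    """
    k = len(prefix) + 1
    num = den = 0.0
    for q in range(2 ** n_bits):
        bits = [(q >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]
        if bits[:k - 1] != list(prefix):
            continue  # symbol inconsistent with the bits decoded so far
        # Probability mass of every integer coefficient value in this bin.
        mass = sum(0.5 * alpha * math.exp(-alpha * abs(x - y))
                   for x in range(q * step, (q + 1) * step))
        den += mass
        if bits[k - 1] == 1:
            num += mass
    if den == 0.0:
        return 0.5  # all masses underflowed: no information either way
    return num / den
```

With n_bits=7, step=32 and prefix [0, 0, 1, 1], the denominator runs over exactly the integers 768 to 1023, i.e. the 256 values counted on the slide; every bit of every band repeats this work, which is why the step is worth parallelizing.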

34 DISCOVER Video Codec
(Figure: DISCOVER architecture, ref. X. Artigas et al., PCS, 2007 (白育姍).)

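35 Reconstruction
- Channel decoding (LDPCA) recovers the bit planes of each band; the decoded symbols are then placed back into the blocks in zig-zag order.
- Example, bit planes of the DC band:
  Bit plane 1: 0 0 0 1
  Bit plane 2: 0 0 0 1
  Bit plane 3: 1 0 0 1
  Bit plane 4: 0 0 0 1
  Bit plane 5: 0 1 0 0
(Figure: decoded DC values placed into the blocks.)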

36 Reconstruction
- Optimal reconstruction: D. Kubasov et al., "Optimal reconstruction in Wyner-Ziv video coding with multiple side information", IEEE Workshop on MMSP, 2007.
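
Optimal reconstruction in the minimum-mean-squared-error sense takes the conditional mean of X inside the decoded quantization bin, given the side information. The sketch below computes that mean by direct summation over integer values, assuming Laplacian correlation noise; it is a numerical illustration, not the derivation in the cited paper, and the function name is hypothetical.

```python
import math


def mmse_reconstruct(y, alpha, low, high):
    """E[X | low <= X < high, Y = y] for integer X with Laplacian noise X - Y.

    The Laplacian normalizing constant alpha / 2 cancels, so only the
    exponential weights are needed.
    """
    weights = [(x, math.exp(-alpha * abs(x - y))) for x in range(low, high)]
    total = sum(w for _, w in weights)
    if total == 0.0:
        # All weights underflowed: fall back to clipping y into the bin.
        return min(max(y, low), high - 1)
    return sum(x * w for x, w in weights) / total
```

If y lies well inside the decoded bin the reconstruction stays close to y; if y falls outside it, the result is pulled toward the nearest bin edge rather than simply clipped.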

37 DISCOVER Video Codec
(Figure: DISCOVER architecture, ref. X. Artigas et al., PCS, 2007 (白育姍).)
- Poor RD performance for sequences with high motion and large GOP sizes

38 DISCOVER Video Codec
(Figure: DISCOVER architecture, ref. X. Artigas et al., PCS, 2007 (白育姍).)
- Room for improvement

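39 MLWZ Video Codec (ref. R. Martin et al., IET Image Processing, 2010) (白育姍)
(Figure: for a block of the partially decoded WZ frame (R), each candidate position within the search range of the side information (Y) carries an SMF value, e.g. $\mathrm{SMF}_1 = 0.1$, $\mathrm{SMF}_2 = 0.02$, ..., $\mathrm{SMF}_{81} = 0.1$.)
- Update the SMF.
- Normalize the SMF.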

40 MLWZ Video Codec (ref. R. Martin et al., IET Image Processing, 2010)
- Side information re-estimation from the candidate blocks within the search range of the SI.
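
The slides do not spell out the SMF update rule, so the sketch below is hypothetical. Each candidate's SMF value is multiplied by a likelihood $e^{-\lambda \cdot \mathrm{SAD}}$ computed against the partially decoded WZ block R, the values are normalized to sum to 1, and the side information is re-estimated as the SMF-weighted average of the candidate blocks. λ, the SAD-based likelihood, and the function names are assumptions for illustration, not the MLWZ method itself.

```python
import math


def update_smf(smf, sads, lam):
    """Hypothetical SMF update: reweight by exp(-lam * SAD), then normalize.

    smf:  current probabilities of the candidate positions (e.g. 81 values)
    sads: matching error between the partially decoded WZ block R and each
          SI candidate block (assumed likelihood model)
    """
    updated = [p * math.exp(-lam * d) for p, d in zip(smf, sads)]
    total = sum(updated)
    if total == 0.0:
        return list(smf)  # no usable evidence; keep the previous SMF
    return [u / total for u in updated]  # normalize so the SMF sums to 1


def reestimate_si(smf, candidate_blocks):
    """Soft SI: SMF-weighted pixelwise average of the candidate blocks."""
    n_pix = len(candidate_blocks[0])
    return [sum(p * blk[i] for p, blk in zip(smf, candidate_blocks))
            for i in range(n_pix)]


smf = update_smf([0.5, 0.5], [0, 10], 1.0)
si = reestimate_si(smf, [[0, 0], [10, 10]])
```

Repeating the update after each decoded band lets the motion estimate sharpen as more of the WZ frame becomes known.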

41 MLWZ Video Codec (ref. R. Martin et al., IET Image Processing, 2010) (白育姍)
- Correlation noise distribution modeling: the DCT coefficient of the WZ frame is modeled around the DCT coefficients of the SI candidates by Laplacian distributions with a Laplacian parameter, and the model is a sum of Laplacians!
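
The sum-of-Laplacians model can be written as $f(x) = \sum_i w_i \frac{\alpha}{2} e^{-\alpha |x - y_i|}$, where $y_i$ is the DCT coefficient of SI candidate i and $w_i$ its probability. A minimal sketch, assuming the weights are normalized SMF values and a single α per band (both assumptions; the function name is hypothetical):

```python
import math


def sum_of_laplacians_pdf(x, si_values, weights, alpha):
    """Weighted sum of Laplacian densities centered on the SI candidates.

    si_values: DCT coefficient of each SI candidate at this band position
    weights:   candidate probabilities (e.g. the normalized SMF), summing to 1
    alpha:     Laplacian parameter of the band
    """
    return sum(w * 0.5 * alpha * math.exp(-alpha * abs(x - y))
               for y, w in zip(si_values, weights))
```

This density can replace the single Laplacian wherever the decoder sums probabilities, for example in the conditional bit probability or the reconstruction step; it is also the quantity the PCNM slides evaluate at 1024 integer values of X - Y.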

42 MLWZ Video Codec (ref. R. Martin et al., IET Image Processing, 2010) (白育姍)
- Improves RD performance for sequences with high motion and large GOP sizes
- Room for improvement

43 DISPAC Video Codec (邱柏叡, 白育姍)
(Figure: DISPAC architecture, with annotations: half-pixel motion estimation; reduce decoding time and improve RD performance; improve subjective quality; improve SI for motion learning; for low-motion parts; for high-motion parts; improve initial SI and motion learning.)

44 DISPAC Video Codec
(Figure: DISPAC architecture; 邱柏叡, 白育姍, 程瀚平.)

45 Outline
- Introduction
- DISPAC video codec
- RD performance of DISPAC
- Parallelizing DISPAC decoder
- Decoding speed of DISPAC
- Conclusions and future work

46 RD Performance of DISPAC
- Test sequences: Soccer, Foreman, Coastguard, and Hall Monitor (ordered from high to low motion)
  - QCIF, 15 Hz, all frames (150 for Soccer, Foreman, and Coastguard; 164 for Hall Monitor)
- GOP sizes: 2, 4, 8
- Bitrate and PSNR: luminance component only
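
Since only the luminance component is measured, the PSNR of a decoded frame follows the standard 8-bit definition $\mathrm{PSNR} = 10 \log_{10}(255^2 / \mathrm{MSE})$ computed on the Y plane. A minimal sketch with frames as lists of rows (the function name is hypothetical; how per-frame values are averaged over a sequence is not stated on the slide):

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two 8-bit luminance (Y) frames given as lists of rows."""
    diffs = [(a - b) ** 2
             for ra, rb in zip(reference, decoded)
             for a, b in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


print(round(psnr([[10, 10], [10, 10]], [[11, 11], [11, 11]]), 4))  # 48.1308
```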

47 RD Performance (GOP=2)

48 RD Performance (GOP=4)

49 RD Performance (GOP=8)
(Figure annotations: PSNR gains of 3.6, 3.1, 0.9, 2.6, 3.1, 1.6, 0.2, and 2.6 dB.)

50 Outline
- Introduction
- DISPAC video codec
- RD performance of DISPAC
- Parallelizing DISPAC decoder
- Decoding speed of DISPAC
- Conclusions and future work

51 Parallelizing DISPAC Decoder
(Figure: decoder modules parallelized with OpenMP and CUDA; 白育姍, 邱柏叡.)

52 Side Information Re-Creation
- Assume a QCIF sequence with 800 4x4 WZ blocks and 1024 search candidates within the search range.
(Figure: candidates processed 128 at a time (first iteration, second iteration, ...); texture memory.)

53 Side Information Re-Creation
- Reduction algorithm: Mark Harris, "Optimizing parallel reduction in CUDA", NVIDIA Developer Technology, 2007.
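
The reduction pattern can be emulated on a CPU to check its logic. The sketch below applies the sequential-addressing scheme from Harris's reduction tutorial to an argmin, for example picking the best of the 1024 search candidates; the inner Python loop over tid stands in for parallel GPU threads, and the function name is hypothetical. Unlike a real CUDA kernel, it does not model shared memory, warps, or multiple thread blocks.

```python
def tree_argmin(values):
    """Emulate a tree reduction with sequential addressing; returns (min, index).

    In each pass, "thread" tid (tid < stride) keeps the smaller of elements
    tid and tid + stride, so the active range halves every pass. Threads only
    write below `stride` and read at or above it, so a pass has no races.
    len(values) must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    val = list(values)
    idx = list(range(n))
    stride = n // 2
    while stride > 0:
        for tid in range(stride):  # on a GPU these iterations run in parallel
            if val[tid + stride] < val[tid]:
                val[tid] = val[tid + stride]
                idx[tid] = idx[tid + stride]
        stride //= 2
    return val[0], idx[0]
```

For 1024 candidates this takes log2(1024) = 10 passes instead of 1023 sequential comparisons, which is the source of the speed-up when each pass runs in parallel.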

54 Parallelizing DISPAC Decoder
(Figure: CUDA; 白育姍, 邱柏叡.)

55 Correlation Noise Distribution Modeling
- Assume a QCIF sequence with 800 4x4 WZ blocks and 1024 possible integer values of X - Y for DCT coefficient band 2.
(Figure: 176/4 x 144/4 blocks labeled WZ, skip, or intra; PCNM evaluates the sum-of-Laplacian pdf at all 1024 integer values of X - Y.)

56 Correlation Noise Distribution Modeling

57 Conditional Bit Probability Computation
- $P(b_k = 1 \mid Y, b_1, \dots, b_{k-1})$: the probability that the k-th bit is 1, given the side information Y and the previous k-1 decoded bits.
(Figure: 176/4 x 144/4 blocks labeled WZ, skip, or intra; PCNM; sum-of-Laplacian pdf of X - Y.)
- Example: assume a quantization step size of 32. Once the first four bits (0011) are decoded, the symbol lies between 0011000 (24) and 0011111 (31), so (31 - 24 + 1) x 32 = 256 probabilities must be summed.
- Ref. R.P. Westerlaken et al., "Analyzing symbol and bit plane-based LDPC in distributed video coding", ICIP, 2007.

58 Conditional Bit Probability Computation

59 Outline
- Introduction
- DISPAC video codec
- RD performance of DISPAC
- Parallelizing DISPAC decoder
- Decoding speed of DISPAC
- Conclusions and future work

60 Decoding Speed of DISPAC
- A workstation with an Intel Xeon E5530 CPU at 2.4 GHz and an NVIDIA Tesla C1060 graphics card emulates the basic unit of a cloud computing environment.
- Operating system: Debian squeeze/sid with the 2.6.32-5-amd64 kernel.
- QCIF, 15 Hz, whole sequences, GOP size 8, quantization matrix Q8.

61 Decoding Speed of DISPAC
- Bottleneck analysis (sequential decoding); CNM: correlation noise modeling

62 Decoding Speed of DISPAC

63 Decoding Speed of DISPAC
- Average decoding time per frame (sec.)

64 Decoding Speed of DISPAC
- Speed-up ratio (compared to DISCOVER)

65 Outline
- Introduction
- DISPAC video codec
- RD performance of DISPAC
- Parallelizing DISPAC decoder
- Decoding speed of DISPAC
- Conclusions and future work

66 Conclusions
- DISPAC combines coding tools developed in the recent literature (e.g. the MLWZ codec) with some newly developed modules (block mode selection, SI re-creation, and an adaptive deblocking filter).
  - Up to 3.6 dB gain in RD performance
- The decoding modules can be highly parallelized.
  - Up to 61 times faster than a state-of-the-art DVC codec

67 Future Work
- Update the correlation noise model parameters during the decoding process.
  - For RD performance
- Improve the parallelizability of the parallel LDPCA decoding algorithm for small parity-check matrices.
  - For decoding speed
- WZ to H.264 video transcoder.
  - For a real demo system

68 Thank You

