1
High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing (Chinese title: Distributed Video Coding with High Efficiency and Parallelized Design for Cloud Architectures). CMLab, CSIE, NTU. Cheng, Han-Ping (程瀚平). Advisor: Prof. Wu, Ja-Ling (吳家麟). 2010/6/2
2
Outline: Introduction; DISPAC video codec; RD performance of DISPAC; Parallelizing DISPAC decoder; Decoding speed of DISPAC; Conclusions and future work
3
Trends of Cloud Computing: cloud computing makes clients slimmer and thinner.
4
Video Coding in Cloud Computing: only low-complexity encoders and decoders are needed at the client side. Conventional video coding (e.g. H.264): encode once, decode many times; low-complexity decoder. Distributed Video Coding (DVC) (e.g. video surveillance, wireless sensor networks): low-complexity encoder.
5
Distributed Video Coding: Slepian-Wolf theorem (1973), Wyner-Ziv theorem (1976). Conventional video coding paradigm: a joint encoder and a joint decoder exploit the statistical dependency between sources X and Y, so $R_X + R_Y \geq H(X, Y)$. Distributed coding: X and Y are encoded by separate encoders (the dependency exists but is not exploited at the encoders) and decoded by a joint decoder. Independent coding alone needs $R_X \geq H(X)$ and $R_Y \geq H(Y)$; what is the bound on $R_X + R_Y$? Slepian & Wolf: still $H(X, Y)$!!
6
Distributed Video Coding: the Wyner-Ziv theorem (1976) extends this result to lossy coding; DVC is therefore also called Wyner-Ziv (WZ) video coding. The encoders of X and Y do not exploit their dependency. Y is coded with a conventional source encoder and decoder, $R_Y \geq H(Y)$. X is quantized and channel-encoded, and only the parity P is sent. At the joint decoder, side information estimation produces X', a noisy version of X seen through a virtual channel, and the channel decoder corrects X' using P. The correlation is thus exploited at the decoder: $R_X \geq H(X|Y)$, and for $R_X + R_Y$, Wyner & Ziv: $H(X, Y)$! Channel coding analogy (error control code): X is sent with its parity P as X+P over a noisy channel, the receiver gets (X+P)', and the channel decoder recovers X.
7
Video Coding in Cloud Computing: WZ to H.264 video transcoder. A low-complexity WZ encoder produces a WZ-encoded bitstream; the WZ to H.264 transcoder runs on cloud computational resources and outputs an H.264-encoded bitstream for a low-complexity H.264 decoder.
8
Motivation: there is still a coding performance gap between Wyner-Ziv video coding and conventional video coding (e.g. H.264/AVC). Most reported WZ codecs have a long decoding delay. Parallel computing is a clear trend (e.g. multi-core CPUs, GPUs), so the parallelizability of the decoder is essential.
9
DISPAC Video Codec: DIStributed video coding with PArallelized design for Cloud computing (DISPAC). To improve rate-distortion (RD) performance: combine coding tools developed in recent literature with some newly developed modules. To reduce decoding delay: a highly parallelized decoder.
10
Outline: Introduction; DISPAC video codec; RD performance of DISPAC; Parallelizing DISPAC decoder; Decoding speed of DISPAC; Conclusions and future work
11
DISPAC Video Codec: combines the coding tools of two state-of-the-art WZ codecs: the DISCOVER codec (Distributed Coding for Video Services), X. Artigas et al., “The DISCOVER codec: architecture, techniques and evaluation”, PCS, 2007; and the MLWZ codec (Motion-Learning based Wyner-Ziv video coding), R. Martin et al., “Statistical motion learning for improved transform domain Wyner-Ziv video coding”, IET Image Processing, 2010.
12
DISCOVER Video Codec (Ref. X. Artigas et al., PCS, 2007). [Figure: DISCOVER architecture and GOP structures. GOP 2: Key, WZ, Key, WZ, ...; GOP 4: Key, WZ, WZ, WZ, Key, ...]
13
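Quantization: eight quantization matrices. Each entry is the number of quantization levels for the corresponding 4x4 DCT band (rows separated by semicolons):
Q1: [16 8 0 0; 8 0 0 0; 0 0 0 0; 0 0 0 0]
Q2: [32 8 0 0; 8 0 0 0; 0 0 0 0; 0 0 0 0]
Q3: [32 8 4 0; 8 4 0 0; 4 0 0 0; 0 0 0 0]
Q4: [32 16 8 4; 16 8 4 0; 8 4 0 0; 4 0 0 0]
Q5: [32 16 8 4; 16 8 4 4; 8 4 4 0; 4 4 0 0]
Q6: [64 16 8 8; 16 8 8 4; 8 8 4 4; 8 4 4 0]
Q7: [64 32 16 8; 32 16 8 4; 16 8 4 4; 8 4 4 0]
Q8: [128 64 32 16; 64 32 16 8; 32 16 8 4; 16 8 4 0]
32 = 2^5 => 5 bits; 8 = 2^3 => 3 bits; 0 => 0 bits (band not transmitted).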
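A minimal Python sketch of how the matrices translate into bit-plane counts, assuming (as the slide states) that every nonzero entry is a power-of-two number of levels and that a band with 0 levels is not transmitted. The names `QM`, `bits_per_band` and `bit_plane_map` are hypothetical.

```python
# Quantization matrices Q1..Q8 as listed above: number of quantization
# levels per 4x4 DCT band (row-major layout of the 4x4 band grid).
QM = {
    1: [[16, 8, 0, 0], [8, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    2: [[32, 8, 0, 0], [8, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    3: [[32, 8, 4, 0], [8, 4, 0, 0], [4, 0, 0, 0], [0, 0, 0, 0]],
    4: [[32, 16, 8, 4], [16, 8, 4, 0], [8, 4, 0, 0], [4, 0, 0, 0]],
    5: [[32, 16, 8, 4], [16, 8, 4, 4], [8, 4, 4, 0], [4, 4, 0, 0]],
    6: [[64, 16, 8, 8], [16, 8, 8, 4], [8, 8, 4, 4], [8, 4, 4, 0]],
    7: [[64, 32, 16, 8], [32, 16, 8, 4], [16, 8, 4, 4], [8, 4, 4, 0]],
    8: [[128, 64, 32, 16], [64, 32, 16, 8], [32, 16, 8, 4], [16, 8, 4, 0]],
}


def bits_per_band(levels: int) -> int:
    """Bit planes for a band with levels = 2**b; 0 levels means the band is not sent."""
    return 0 if levels == 0 else levels.bit_length() - 1


def bit_plane_map(qm):
    """Apply bits_per_band to every entry of a 4x4 quantization matrix."""
    return [[bits_per_band(v) for v in row] for row in qm]


print(bit_plane_map(QM[4]))  # [[5, 4, 3, 2], [4, 3, 2, 0], [3, 2, 0, 0], [2, 0, 0, 0]]
```

Using `bit_length()` keeps the computation in integers, so no floating-point `log2` rounding is involved.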
14
Quantization: DCT coefficient bands. The frame is divided into 4x4 blocks (Block 1, Block 2, Block 3, ..., Block N); $S^n_k$ is the k-th DCT coefficient of block n, with k following the zig-zag order [1 2 6 7; 3 5 8 13; 4 9 12 14; 10 11 15 16]. DCT coefficient band $b_k$ collects the k-th coefficient of every block: $b_1 = \{S^1_1, S^2_1, S^3_1, \ldots, S^N_1\}$ (DC band), $b_2 = \{S^1_2, S^2_2, S^3_2, \ldots, S^N_2\}$, ..., $b_{16} = \{S^1_{16}, S^2_{16}, S^3_{16}, \ldots, S^N_{16}\}$ ($b_2$ to $b_{16}$: AC bands).
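The regrouping of block coefficients into bands can be sketched as below, assuming each block is a 4x4 list of DCT coefficients in row-major order and using the zig-zag numbering drawn on the slide. `ZIGZAG_4X4` and `group_into_bands` are hypothetical names.

```python
# Zig-zag index of each position in a 4x4 block, as laid out on the slide.
ZIGZAG_4X4 = [
    [1, 2, 6, 7],
    [3, 5, 8, 13],
    [4, 9, 12, 14],
    [10, 11, 15, 16],
]


def group_into_bands(blocks):
    """Regroup N 4x4 DCT blocks into 16 bands: band k = [S^1_k, S^2_k, ..., S^N_k]."""
    bands = {k: [] for k in range(1, 17)}
    for block in blocks:  # block n, in frame order
        for r in range(4):
            for c in range(4):
                bands[ZIGZAG_4X4[r][c]].append(block[r][c])
    return bands


# Two toy blocks: block 1 holds 0..15 and block 2 holds 100..115 (row-major).
blocks = [[[100 * n + 4 * r + c for c in range(4)] for r in range(4)] for n in range(2)]
bands = group_into_bands(blocks)
print(bands[1], bands[3])  # [0, 100] [4, 104]
```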
15
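Bit Plane Extraction: for each DCT coefficient band, the quantized symbols are split into bit planes from MSB to LSB, and each bit plane is channel-encoded with LDPCA. With Q4 (first row [32 16 8 4]), the DC band has 32 levels, i.e. 5 bit planes. Example: DC symbols 4, 1, 0, 30 are 00100, 00001, 00000, 11110, giving bit plane 1 (MSB): 0 0 0 1; bit plane 2: 0 0 0 1; bit plane 3: 1 0 0 1; bit plane 4: 0 0 0 1; bit plane 5 (LSB): 0 1 0 0.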
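A short sketch of the bit-plane split, assuming symbols are non-negative integers below `2**num_bits`; `extract_bit_planes` is a hypothetical name, and the LDPCA encoding of each plane is not shown.

```python
def extract_bit_planes(symbols, num_bits):
    """Split quantized symbols of one band into bit planes; plane 1 (index 0) is the MSB."""
    return [[(s >> (num_bits - 1 - p)) & 1 for s in symbols] for p in range(num_bits)]


# Example from the slide: DC symbols 4, 1, 0, 30 with 5 bit planes (32 levels).
for i, plane in enumerate(extract_bit_planes([4, 1, 0, 30], 5), start=1):
    print(f"Bit plane {i}:", *plane)
```

This prints the five planes listed above, from `Bit plane 1: 0 0 0 1` to `Bit plane 5: 0 1 0 0`.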
16
DISCOVER Video Codec (Ref. X. Artigas et al., PCS, 2007; 白育姍). [Figure: the DISCOVER architecture mapped onto the Wyner-Ziv setup: quantizer and channel encoder for X at the encoder; side information estimation (X') and channel decoder at the joint decoder; Y coded by a conventional source encoder and decoder; parity P sent over the virtual channel; $R_Y \geq H(Y)$, $R_X \geq H(X|Y)$.]
17
Side Information Creation: the key frames $X_F$ and $X_B$ are low-pass filtered (3x3 mean filter); the frame is divided into non-overlapping 16x16 blocks; motion estimation uses a search window of ±32.
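The two steps on this slide can be sketched as follows, assuming frames are lists of lists of luma values, borders handled by clamping, and the sum of absolute differences (SAD) as matching criterion; the slide does not state the criterion or border handling, and which of $X_F$ and $X_B$ plays `cur` or `ref` is left generic. All function names are hypothetical.

```python
def mean_filter_3x3(frame):
    """3x3 mean (low-pass) filter; border pixels use clamped coordinates (an assumption)."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += frame[yy][xx]
            out[y][x] = acc / 9.0
    return out


def block_motion_search(cur, ref, bx, by, block=16, search=32):
    """Full-search block matching: (dx, dy) minimizing SAD for the block at (bx, by)."""
    h, w = len(ref), len(ref[0])
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements whose reference block falls outside the frame.
            if by + dy < 0 or by + dy + block > h or bx + dx < 0 or bx + dx + block > w:
                continue
            sad = sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
                      for j in range(block) for i in range(block))
            if best is None or sad < best:
                best, best_mv = sad, (dx, dy)
    return best_mv
```

In use, both key frames would first go through `mean_filter_3x3`, and `block_motion_search` would run once per 16x16 block with `block=16, search=32`.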
18
Side Information Creation: [Figure: key frames $X_F$ and $X_B$.]
19
Side Information Creation ($X_F$, $X_B$): adaptive search range. [Figure: search range defined by the points $(x_L, y_L)$, $(x_U, y_U)$, $(x_R, y_R)$, $(x_B, y_B)$, with margin N on each side.]
20
Side Information Creation ($X_F$, $X_B$): half-pixel motion estimation.
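Half-pixel motion estimation searches on an interpolated grid. A minimal sketch, assuming simple bilinear interpolation (the slide does not name the filter; H.264, for instance, uses a 6-tap filter for half-pel luma samples). `upsample_half_pel` is a hypothetical name.

```python
def upsample_half_pel(frame):
    """Bilinear half-pixel grid: out[2y][2x] = frame[y][x]; odd positions average neighbors."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for Y in range(2 * h - 1):
        for X in range(2 * w - 1):
            # Even coordinate -> one source row/column; odd -> the two around it.
            ys = {Y // 2, (Y + 1) // 2}
            xs = {X // 2, (X + 1) // 2}
            vals = [frame[y][x] for y in ys for x in xs]
            out[Y][X] = sum(vals) / len(vals)
    return out


print(upsample_half_pel([[0, 2], [4, 6]]))
```

A displacement of half a pixel in the original frame is then a displacement of one sample on this grid, so an integer search over the upsampled frame covers half-pixel positions.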
21
Side Information Creation ($X_F$, $X_B$): spatial motion smoothing with a weighted vector median filter over the candidate motion vectors $x_1, \ldots, x_9$ (the current block and its eight neighbors).
22
Side Information Creation ($X_F$, $X_B$): weighted vector median filter: each candidate, e.g. $x_1$ and $x_2$, is evaluated with its matching error ($MSE_1$, $MSE_2$).
23
Side Information Creation ($X_F$, $X_B$): weighted vector median filter, evaluating candidate $x_1$.
24
Side Information Creation ($X_F$, $X_B$): weighted vector median filter: the result for $x_6$ is the minimum, so $x_{wvmf} = x_6$ (the final motion vector!).
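A sketch of the weighted vector median selection: the output is the candidate $x_j$ minimizing $\sum_i w_i \lVert x_j - x_i \rVert$. The weight rule $w_i = MSE(x_c) / MSE(x_i)$, with $x_c$ the current block's vector, is an assumption taken from the spatial motion smoothing literature that DISCOVER builds on; the slide only shows that MSE values enter the weights. `weighted_vector_median` is a hypothetical name.

```python
import math


def weighted_vector_median(candidates, mses, center_index=0):
    """candidates: list of (dx, dy); mses: matching error of each candidate.

    Assumed weights: w_i = MSE(x_c) / MSE(x_i). Returns the candidate x_j
    minimizing sum_i w_i * ||x_j - x_i|| (Euclidean norm).
    """
    eps = 1e-9  # guard against a zero MSE
    mse_c = mses[center_index]
    weights = [mse_c / (m + eps) for m in mses]
    best_j, best_cost = 0, float("inf")
    for j, xj in enumerate(candidates):
        cost = sum(w * math.dist(xj, xi) for w, xi in zip(weights, candidates))
        if cost < best_cost:
            best_j, best_cost = j, cost
    return candidates[best_j]


# An outlier vector (10, 10) is rejected in favor of the consistent neighborhood.
print(weighted_vector_median([(10, 10), (1, 0), (1, 1), (0, 1), (1, 0)], [1.0] * 5))
```

Because the output is always one of the input vectors, the filter smooths outliers without inventing new motion vectors.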
25
Side Information Creation ($X_F$, $X_B$): bidirectional motion compensation with block interpolation ($0.75 \cdot X_B + 0.25 \cdot X_F$).
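A sketch of the block interpolation with the weights from the slide as defaults. It assumes motion vectors are integer `(dx, dy)` offsets into each key frame and that the displaced blocks lie inside the frames; `interpolate_block` is a hypothetical name.

```python
def interpolate_block(xb, xf, bx, by, mv_b, mv_f, size=4, wb=0.75, wf=0.25):
    """Side-information block at (bx, by): wb * X_B(p + mv_b) + wf * X_F(p + mv_f)."""
    return [[wb * xb[by + j + mv_b[1]][bx + i + mv_b[0]]
             + wf * xf[by + j + mv_f[1]][bx + i + mv_f[0]]
             for i in range(size)]
            for j in range(size)]


xb = [[8] * 6 for _ in range(6)]
xf = [[4] * 6 for _ in range(6)]
print(interpolate_block(xb, xf, 0, 0, (0, 0), (0, 0), size=2))  # [[7.0, 7.0], [7.0, 7.0]]
```

Keeping `wb` and `wf` as parameters makes it easy to switch to plain averaging (0.5, 0.5) for comparison.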
26
DISCOVER Video Codec (Ref. X. Artigas et al., PCS, 2007; 白育姍): the correlation noise is modeled by a Laplacian distribution.
27
CNM Parameter Estimation ($X_F$, $X_B$): residual frame generation.
28
CNM Parameter Estimation: 4x4 DCT of the residual frame. [Figure: example block of residual DCT coefficients.]
29
CNM Parameter Estimation: CNM parameter computation from the residual DCT coefficients. [Figure: example block of residual DCT coefficients.]
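The three CNM steps can be sketched as below. Assumptions, none stated on the slides: the residual is $R = (X_F - X_B)/2$ on the motion-compensated blocks, a floating-point orthonormal DCT-II stands in for the codec's 4x4 transform, and the band-level variance is estimated as $E[T^2] - (E|T|)^2$. The only firm fact used is that a Laplacian with parameter $\alpha$ has variance $2/\alpha^2$, hence $\alpha = \sqrt{2/\sigma^2}$. Function names are hypothetical.

```python
import math


def residual_block(xf_mc, xb_mc):
    """Residual of two motion-compensated blocks: R = (X_F - X_B) / 2 (assumed form)."""
    return [[(f - b) / 2 for f, b in zip(rf, rb)] for rf, rb in zip(xf_mc, xb_mc)]


def dct4x4(block):
    """Orthonormal 2-D DCT-II of a 4x4 block (stand-in for the codec's 4x4 transform)."""
    n = 4
    c = [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
          * math.cos((2 * m + 1) * k * math.pi / (2 * n)) for m in range(n)]
         for k in range(n)]
    # Y = C X C^T, computed as (C X) then multiplied by C^T.
    tmp = [[sum(c[k][m] * block[m][j] for m in range(n)) for j in range(n)] for k in range(n)]
    return [[sum(tmp[k][j] * c[l][j] for j in range(n)) for l in range(n)] for k in range(n)]


def band_alpha(coeffs):
    """Band-level Laplacian parameter from one band's residual DCT coefficients.

    Assumed estimator: sigma^2 = E[T^2] - (E|T|)^2; then alpha = sqrt(2 / sigma^2).
    """
    n = len(coeffs)
    e_sq = sum(t * t for t in coeffs) / n
    e_abs = sum(abs(t) for t in coeffs) / n
    var = e_sq - e_abs ** 2
    return float("inf") if var <= 0 else math.sqrt(2 / var)
```

In use, `band_alpha` would receive coefficient k of every transformed residual block, i.e. one DCT band of the residual frame.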
30
DISCOVER Video Codec (Ref. X. Artigas et al., PCS, 2007; 白育姍)
31
Correlation Noise Distribution Modeling: the WZ coefficient is modeled by a Laplacian distribution centered on the side information, with the estimated CNM parameter.
32
DISCOVER Video Codec (Ref. X. Artigas et al., PCS, 2007; 白育姍)
33
Conditional Bit Probability Computation: the probability that the k-th bit is 1, given the side information (Y) and the previous k-1 decoded bits. [Figure: Laplacian pdf of X-Y for each WZ block; the QCIF frame has (176/4) x (144/4) 4x4 blocks.] This requires summing 256 probabilities: assuming a quantization step size of 32 and decoded leading bits 0011, the candidate indices run from 0011000 (24) to 0011111 (31), i.e. (31 - 24 + 1) x 32 = 256 integer values. R.P. Westerlaken et al., “Analyzing symbol and bit plane-based LDPC in distributed video coding”, ICIP, 2007.
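The 256-term sum above can be sketched directly. Assumptions: X takes non-negative integer values, index q covers the integers $[q \cdot step, (q+1) \cdot step)$, and the correlation noise is Laplacian with parameter alpha (its normalizing constant cancels in the ratio). `bit_one_probability` is a hypothetical name.

```python
import math


def bit_one_probability(y, alpha, prefix_bits, num_bits, step):
    """P(bit k = 1 | Y = y, bits 1..k-1 = prefix_bits), bits ordered MSB first.

    Index q (num_bits bits) covers the integers [q*step, (q+1)*step).
    The noise X - Y is modeled as Laplacian; its constant alpha/2 cancels.
    """
    k = len(prefix_bits)
    free = num_bits - k  # bits not decoded yet, bit k included
    assert free >= 1, "all bits already decoded"
    prefix = 0
    for b in prefix_bits:
        prefix = (prefix << 1) | b
    q_lo = prefix << free             # first index consistent with the prefix
    q_hi = q_lo + (1 << free)         # one past the last consistent index
    q_one = q_lo + (1 << (free - 1))  # first index whose k-th bit is 1

    def mass(q_start, q_end):
        # Unnormalized Laplacian summed over every integer value in the bins.
        return sum(math.exp(-alpha * abs(x - y)) for x in range(q_start * step, q_end * step))

    total = mass(q_lo, q_hi)  # 8 bins x 32 values = 256 terms for prefix 0011
    return mass(q_one, q_hi) / total if total > 0 else 0.5


# Slide example: 7-bit index, decoded prefix 0011, step 32 -> indices 24..31.
print(bit_one_probability(y=900, alpha=0.05, prefix_bits=[0, 0, 1, 1], num_bits=7, step=32))
```

Each bit of each coefficient needs its own sum of this size, which is why this step becomes a decoding bottleneck and a candidate for parallelization.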
34
DISCOVER Video Codec (Ref. X. Artigas et al., PCS, 2007; 白育姍)
35
Reconstruction: channel decoding (LDPCA) recovers the bit planes of the DC band (bit plane 1: 0 0 0 1; bit plane 2: 0 0 0 1; bit plane 3: 1 0 0 1; bit plane 4: 0 0 0 1; bit plane 5: 0 1 0 0), which are reassembled into the quantized symbols 4, 1, 0, 30 and placed back in zig-zag order.
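Reassembling decoded bit planes into symbols is the inverse of the extraction step; a sketch assuming the planes arrive MSB first, with the hypothetical name `assemble_symbols`:

```python
def assemble_symbols(planes):
    """Inverse of bit-plane extraction: planes[0] is the MSB plane."""
    symbols = [0] * len(planes[0])
    for plane in planes:
        # Shift in one more bit per plane, from MSB to LSB.
        symbols = [(s << 1) | b for s, b in zip(symbols, plane)]
    return symbols


print(assemble_symbols([[0, 0, 0, 1], [0, 0, 0, 1], [1, 0, 0, 1], [0, 0, 0, 1], [0, 1, 0, 0]]))
# [4, 1, 0, 30]
```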
36
Reconstruction: D. Kubasov et al., “Optimal reconstruction in Wyner–Ziv video coding with multiple side information”, IEEE Workshop on MMSP, 2007.
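Reconstruction picks a value inside the decoded quantization bin given the side information. A minimal sketch of the conditional-mean (MMSE) estimate $E[X \mid l \le X < u, Y = y]$ under a Laplacian model; Kubasov et al. derive closed forms for the continuous case, whereas this sketch assumes a discrete sum over integer values, as on the conditional-probability slide. `mmse_reconstruct` is a hypothetical name.

```python
import math


def mmse_reconstruct(y, alpha, lo, hi):
    """E[X | lo <= X < hi, Y = y] over integers, with p(x | y) ~ exp(-alpha * |x - y|)."""
    xs = range(lo, hi)
    w = [math.exp(-alpha * abs(x - y)) for x in xs]
    total = sum(w)
    if total == 0.0:  # all weights underflowed: fall back to the bin midpoint
        return (lo + hi - 1) / 2
    return sum(x * wx for x, wx in zip(xs, w)) / total


print(mmse_reconstruct(5, 50.0, 0, 8))     # close to y = 5: SI lies inside the bin
print(mmse_reconstruct(-100, 2.0, 0, 8))   # pulled toward the lower bin edge
```

When the side information falls inside the bin the estimate stays near it; when it falls outside, the estimate is pulled to the nearest bin edge.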
37
DISCOVER Video Codec (Ref. X. Artigas et al., PCS, 2007; 白育姍): poor RD performance for sequences with high motion and large GOP sizes.
38
DISCOVER Video Codec (Ref. X. Artigas et al., PCS, 2007; 白育姍): room for improvement.
39
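MLWZ Video Codec (Ref. R. Martin et al., IET Image Processing, 2010; 白育姍): side information SI (Y) and WZ frame (R); each of the 81 candidate positions in the search range has an SMF value, e.g. $SMF_1 = 0.1$, $SMF_2 = 0.02$, ..., $SMF_{81} = 0.1$. Update SMF, then normalize SMF.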
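The slide shows an update followed by a normalization, but not the update rule itself. The sketch below assumes a Bayesian-style update: each SMF value is multiplied by a Laplacian likelihood of the (partially) decoded WZ block given the SI candidate at that position, then the values are normalized to sum to 1. This is an illustrative assumption, not necessarily MLWZ's exact rule; `update_smf` is a hypothetical name and blocks are flattened lists.

```python
import math


def update_smf(smf, wz_block, si_candidates, alpha):
    """smf[i]: current weight of candidate position i; si_candidates[i]: SI block there.

    Assumed update: smf[i] *= exp(-alpha * sum|R - Y_i|), then normalize to sum 1.
    At least one smf[i] must be positive.
    """
    log_post = []
    for p_i, cand in zip(smf, si_candidates):
        err = sum(abs(r - y) for r, y in zip(wz_block, cand))
        log_post.append(math.log(p_i) - alpha * err if p_i > 0 else float("-inf"))
    m = max(log_post)
    w = [math.exp(v - m) for v in log_post]  # subtract the max for numerical stability
    s = sum(w)
    return [v / s for v in w]


print(update_smf([0.5, 0.5], [1, 2], [[1, 2], [5, 5]], alpha=1.0))
```

Working in the log domain avoids underflow when a 4x4 block contributes many large differences.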
40
MLWZ Video Codec (Ref. R. Martin et al., IET Image Processing, 2010): side information re-estimation from the SI candidates within the search range.
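One natural way to re-estimate the SI from the candidates is the SMF-weighted average $\hat{Y} = \sum_i SMF_i \cdot Y_i$. This is an assumption for illustration (the slide's formula is not recoverable; a codec could also pick the most probable candidate). `reestimate_si` is a hypothetical name.

```python
def reestimate_si(smf, si_candidates):
    """Probability-weighted side information: Y_new[k] = sum_i smf[i] * Y_i[k]."""
    n = len(si_candidates[0])
    return [sum(p * cand[k] for p, cand in zip(smf, si_candidates)) for k in range(n)]


print(reestimate_si([0.25, 0.75], [[0, 4], [8, 0]]))  # [6.0, 1.0]
```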
41
MLWZ Video Codec (Ref. R. Martin et al., IET Image Processing, 2010; 白育姍): correlation noise distribution modeling: the DCT coefficient of the WZ frame is modeled by Laplacian distributions (with a Laplacian parameter) centered on the DCT coefficients of the SI candidates, i.e. a sum of Laplacians!
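The "sum of Laplacians" can be written as a mixture $f(x) = \sum_i SMF_i \cdot \frac{\alpha}{2} e^{-\alpha |x - y_i|}$, where $y_i$ is the SI DCT coefficient for candidate i. The sketch assumes one shared $\alpha$ for all candidates (the slide mentions a single Laplacian parameter) and SMF values that sum to 1; `sum_of_laplacians_pdf` is a hypothetical name.

```python
import math


def sum_of_laplacians_pdf(x, smf, si_values, alpha):
    """Mixture density f(x) = sum_i smf[i] * (alpha / 2) * exp(-alpha * |x - y_i|)."""
    return sum(p * 0.5 * alpha * math.exp(-alpha * abs(x - y)) for p, y in zip(smf, si_values))


print(sum_of_laplacians_pdf(0.0, [1.0], [0.0], 2.0))  # 1.0, a single Laplacian at its peak
```

With a single candidate of weight 1 this reduces to the DISCOVER-style single-Laplacian model, so the mixture generalizes it.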
42
MLWZ Video Codec (Ref. R. Martin et al., IET Image Processing, 2010; 白育姍): improves RD performance for sequences with high motion and large GOP sizes. Room for improvement.
43
DISPAC Video Codec (邱柏叡, 白育姍): half-pixel motion estimation. [Figure: DISPAC improvements, annotated: reduce decoding time and improve RD performance; improve subjective quality; improve SI for motion learning; for low-motion parts; for high-motion parts; improve initial SI and motion learning.]
44
DISPAC Video Codec [Figure: DISPAC architecture, annotated with 邱柏叡, 白育姍, 程瀚平]
45
Outline: Introduction; DISPAC video codec; RD performance of DISPAC; Parallelizing DISPAC decoder; Decoding speed of DISPAC; Conclusions and future work
46
RD Performance of DISPAC. Test sequences: QCIF, 15 Hz, all frames (150 for Soccer, Foreman and Coastguard; 164 for Hall Monitor), ordered from high to low motion: Soccer, Foreman, Coastguard, Hall Monitor. GOP size: 2, 4, 8. Bitrate and PSNR: luminance component only.
47
RD Performance (GOP=2)
48
RD Performance (GOP=4)
49
RD Performance (GOP=8). [Figure: RD curves annotated with PSNR gains of 3.6 dB, 3.1 dB, 0.9 dB, 2.6 dB, 3.1 dB, 1.6 dB, 0.2 dB and 2.6 dB.]
50
Outline: Introduction; DISPAC video codec; RD performance of DISPAC; Parallelizing DISPAC decoder; Decoding speed of DISPAC; Conclusions and future work
51
Parallelizing DISPAC Decoder: OpenMP and CUDA (白育姍, 邱柏叡).
52
Side Information Re-Creation: assume a QCIF sequence with 800 4x4 WZ blocks and 1024 search candidates within the search range. First iteration: 128 candidates; second iteration: 128 candidates; CUDA texture memory.
53
Side Information Re-Creation: reduction algorithm (Mark Harris, “Optimizing parallel reduction in CUDA”, NVIDIA Developer Technology, 2007).
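Harris describes several reduction kernels; one uses sequential addressing, where at each step the threads with `tid < s` combine element `tid` with element `tid + s`, and `s` halves until one value remains. The Python sketch below only emulates that access pattern sequentially (one loop pass per `__syncthreads()`-separated step); it is not CUDA code. How DISPAC maps its 128-candidate iterations onto thread blocks is not stated, so using the pattern for a minimum-cost search, as in the usage line, is an assumption. `tree_reduce` is a hypothetical name.

```python
def tree_reduce(values, op):
    """Emulate a sequential-addressing tree reduction over a power-of-two array."""
    buf = list(values)
    n = len(buf)
    assert n > 0 and n & (n - 1) == 0, "power-of-two length assumed"
    s = n // 2
    while s > 0:
        # On a GPU, threads tid < s run this body in parallel, then __syncthreads().
        for tid in range(s):
            buf[tid] = op(buf[tid], buf[tid + s])
        s //= 2
    return buf[0]


# Sum of 128 values, and argmin over candidate costs via (cost, index) pairs.
print(tree_reduce(range(128), lambda a, b: a + b))                    # 8128
print(tree_reduce([(c, i) for i, c in enumerate([5, 3, 9, 3])], min))  # (3, 1)
```

With n values the loop takes log2(n) steps, e.g. 7 steps for 128 candidates, which is where the parallel speedup comes from.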
54
Parallelizing DISPAC Decoder: CUDA (白育姍, 邱柏叡).
55
Correlation Noise Distribution Modeling: assume a QCIF sequence with 800 4x4 WZ blocks and 1024 possible integer values of X-Y for DCT coefficient band 2. [Figure: the (176/4) x (144/4) blocks are labeled WZ, Skip or Intra; for each WZ block, the PCNM (a sum of Laplacian pdfs) is evaluated at the 1024 integer values of X-Y.]
56
Correlation Noise Distribution Modeling
57
Conditional Bit Probability Computation: the probability that the k-th bit is 1, given the side information (Y) and the previous k-1 decoded bits. [Figure: the (176/4) x (144/4) blocks are labeled WZ, Skip or Intra; for each WZ block, the PCNM (a sum of Laplacian pdfs) over X-Y.] Again 256 probabilities must be summed: with a quantization step size of 32 and candidate indices 0011000 (24) to 0011111 (31), (31 - 24 + 1) x 32 = 256. R.P. Westerlaken et al., “Analyzing symbol and bit plane-based LDPC in distributed video coding”, ICIP, 2007.
58
Conditional Bit Probability Computation
59
Outline: Introduction; DISPAC video codec; RD performance of DISPAC; Parallelizing DISPAC decoder; Decoding speed of DISPAC; Conclusions and future work
60
Decoding speed of DISPAC: a workstation equipped with an Intel Xeon E5530 CPU at 2.4 GHz and an NVIDIA Tesla C1060 graphics card is used to emulate the basic unit of a cloud computing environment. Operating system: Debian squeeze/sid with the 2.6.32-5-amd64 kernel. Test conditions: QCIF, 15 Hz, whole sequence, GOP size 8, quantization matrix 8 (Q8).
61
Decoding speed of DISPAC: bottleneck analysis (sequential decoding). CNM: correlation noise modeling.
62
Decoding speed of DISPAC
63
Decoding speed of DISPAC: average decoding time per frame (sec.)
64
Decoding speed of DISPAC: speedup ratio (compared to DISCOVER)
65
Outline: Introduction; DISPAC video codec; RD performance of DISPAC; Parallelizing DISPAC decoder; Decoding speed of DISPAC; Conclusions and future work
66
Conclusions: DISPAC combines coding tools developed in recent literature (e.g. the MLWZ codec) with some newly developed modules (block mode selection, SI re-creation and an adaptive deblocking filter), achieving up to 3.6 dB gain in RD performance. The decoding modules can be highly parallelized, making the decoder up to 61 times faster than a state-of-the-art DVC codec.
67
Future Work: update the correlation noise model parameters during the decoding process (for RD performance). Improve the parallelizability of the parallel LDPCA decoding algorithm for small parity-check matrices (for decoding speed). Build a WZ to H.264 video transcoder (for a real demo system).
68
Thank You