1
Analysis, Fast Algorithm, and VLSI Architecture Design for H.264/AVC Intra Frame Coder
Yu-Wen Huang, Bing-Yu Hsieh, Tung-Chien Chen, and Liang-Gee Chen
IEEE Transactions on Circuits and Systems for Video Technology, 2005
2
Outline
Introduction
H.264/AVC Intra Coding
Computation Reduction
Hardware Architecture
3
Introduction
[Figure: H.264/AVC coder block diagram. The input video signal is split into macroblocks of 16x16 pixels. Blocks: coder control, transform/scaling/quantization, scaling and inverse transform, intra-frame prediction, motion compensation, motion estimation, de-blocking filter, entropy coding. Signals: control data, quantized transform coefficients, motion data, output video signal. Annotation: multiple reference frames and variable block sizes.]
4
Introduction
Source → Prediction → Transform → Quantization → Entropy Coding → Compressed Data
Prediction: 4×4/16×16 luma, 8×8 chroma
Transform: 4×4 DCT
Quantization: scalar nonuniform Q
Entropy coding: CAVLC, CABAC
Lossy, lossless (bits per pixel)
5
Introduction
H.264/AVC I-Frame Coder (CAVLC) vs. JPEG 2000 (DWT 5/3)
Computational complexity
Block-based coding vs. frame-based coding (DWT 5/3)
Hardware-friendly
Memory-wasting
6
Introduction
Comparison between different image coding standards
[Figure: JPEG, JPEG 2000 (DWT 5/3), and H.264 I-frame (CAVLC) at 0.225 bpp]
7
Introduction
Two solutions for platform-based design of an H.264/AVC intra frame coder:
Fast algorithm for software implementation: reduces complexity by 45%, PSNR drop 0.3 dB
Hardware accelerator: max. clock rate 55 MHz, 31 fps for 4:2:0 SDTV (all intra frames)
8
H.264/AVC Intra Coding
Intra Prediction
I4MB (4×4): directional modes + DC
I16MB (16×16): directional modes + DC + Plane
[Figure: prediction directions around the current block, labeled by mode number]
9
H.264/AVC Intra Coding
Mode Decision
Low complexity mode: SATD (original pels - predictors), rate (bits of mode information)
High complexity mode: MSE (original pels - reconstructed pels), rate (mode information + residual)
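The low complexity cost can be illustrated with a minimal sketch. It assumes the cost is formed as SATD plus a multiplier times the mode bits; the function names, the `lam` parameter, and the division of the SATD by 2 are assumptions made for illustration and are not taken from the slides.

```python
# Hypothetical sketch of a low-complexity mode cost for one 4x4 block.
# Assumptions (not from the slides): cost = SATD + lam * mode_bits,
# and the SATD is normalised by an integer division by 2.

H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def hadamard4(block):
    """Return H4 * block * H4^T for a 4x4 integer block."""
    tmp = [[sum(H4[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(orig, pred):
    """Sum of absolute Hadamard-transformed differences (original - predictor)."""
    diff = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    coeffs = hadamard4(diff)
    return sum(abs(c) for row in coeffs for c in row) // 2


def low_complexity_cost(orig, pred, mode_bits, lam):
    """Distortion term (SATD) plus a weighted rate term (bits of mode information)."""
    return satd4x4(orig, pred) + lam * mode_bits
```

The mode with the smallest cost among the evaluated candidates would then be selected for the block.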
10
H.264/AVC Intra Coding
Transform and Quantization
4×4 integer transform
Hadamard transform
DCT-based integer transform
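As a small illustration of the 4×4 DCT-based integer transform, the sketch below applies the H.264/AVC forward core matrix to a block as Y = C · X · Cᵀ. The per-coefficient scaling that the standard folds into quantization is left out, and the function name is chosen here for illustration.

```python
# Forward 4x4 core transform of H.264/AVC, written as plain matrix products.
# The scaling that normally accompanies quantization is deliberately omitted.

CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def forward_core_transform(block):
    """Return CF * block * CF^T for a 4x4 integer residual block."""
    tmp = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]
```

A constant block produces a single nonzero coefficient at position [0][0], which is the expected behaviour of a DCT-like transform.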
11
H.264/AVC Intra Coding
Entropy Coding
Context-Based Adaptive Binary Arithmetic Coding (CABAC)
Context-Based Adaptive Variable Length Coding (CAVLC)
12
H.264/AVC Intra Coding
Run-time percentage (720×480, 4:2:0, 30 fps)
10829 MIPS
13
Computation Reduction
Intra Prediction Table look-up Cost generation Sub-sampling
14
Computation Reduction
Fast Intra Prediction
The smaller the mode number, the more likely the mode is to occur.
Global statistics cannot reflect the correlation of local modes.
Local statistics of neighboring blocks are applied instead.
15
Computation Reduction
Fast Intra Prediction Skip unlikely candidates
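One way to picture skipping unlikely candidates is sketched below. The ordering rule is purely an assumption for illustration: the modes of the left and upper neighbors come first, then DC, then the remaining modes in increasing mode number. The slides do not give the authors' exact selection rule, and `candidate_modes`, `left_mode`, `up_mode`, and `num_candidates` are hypothetical names.

```python
# Hypothetical candidate selection for a 4x4 luma block (I4MB has 9 modes, 0..8).
# Assumed rule, not the authors' exact method: neighbor modes first, then DC,
# then the rest by increasing mode number; keep only the first few.

NUM_I4MB_MODES = 9
DC_MODE = 2  # mode number of DC prediction for 4x4 luma blocks in H.264/AVC


def candidate_modes(left_mode, up_mode, num_candidates):
    """Return the modes to search; None marks an unavailable neighbor."""
    ordered = []
    for mode in (left_mode, up_mode, DC_MODE):
        if mode is not None and mode not in ordered:
            ordered.append(mode)
    for mode in range(NUM_I4MB_MODES):
        if mode not in ordered:
            ordered.append(mode)
    return ordered[:num_candidates]
```

With `num_candidates` set to 9 the function degenerates to a full search, so the same code path can serve both partially and fully searched blocks.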
16
Computation Reduction
[Figure: rate-distortion under different numbers of local-searched I4MB modes, without insertion of full-search blocks. Legend: all modes, 6, 4, 2, 1, DC.]
17
Computation Reduction
Fast Intra Prediction
Prevention of error propagation:
Periodic insertion of full-search 4x4 blocks
Adaptive threshold on the distortion for a MB
If min SATD of P > THMinSATD, then search all modes.
THMinSATD = (min SATD of F) = 2.0
Block pattern within a MB (F: full-search block, P: partial-search block): F P F P P P P P F P F P P P P P
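The two safeguards can be sketched as follows. The sketch assumes the sixteen F/P labels describe a 4×4 raster layout of the 4x4 blocks inside a macroblock, and it takes the threshold as a caller-supplied value rather than deriving it; `is_full_search_block`, `search_block`, and `satd_of_mode` are hypothetical names.

```python
# Hypothetical sketch: periodic full-search blocks plus an SATD threshold fallback.
# Assumptions: the F/P pattern is a 4x4 raster grid (F at even row and even column),
# and the threshold is provided by the caller.

ALL_MODES = range(9)  # the nine I4MB prediction modes


def is_full_search_block(bx, by):
    """True for blocks marked F in the assumed raster pattern F P F P / P P P P / ..."""
    return bx % 2 == 0 and by % 2 == 0


def search_block(satd_of_mode, candidates, full_search, threshold):
    """Return (best_mode, best_satd) for one 4x4 block.

    satd_of_mode: callable mapping a mode number to its SATD cost.
    candidates:   reduced mode list used for partially searched (P) blocks.
    """
    modes = ALL_MODES if full_search else candidates
    best = min(modes, key=satd_of_mode)
    if not full_search and satd_of_mode(best) > threshold:
        # The partial search looks unreliable, so fall back to all modes.
        best = min(ALL_MODES, key=satd_of_mode)
    return best, satd_of_mode(best)
```

The fallback only triggers for P blocks, so a well-predicted P block still costs just the reduced number of mode evaluations.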
18
Computation Reduction
Subsampling Patterns
19
Computation Reduction
Saved Computation and PSNR Drop
PSNR drop < 0.3 dB
Global: subsampling + partial search using global statistics
Local: subsampling + partial search
Proposed: subsampling + partial search + periodic insertion of full search + adaptive SATD threshold
20
Hardware Architecture
Assumptions
A RISC can execute one instruction per cycle, except multiplication, which requires two.
A processing element (PE) can generate the predictor of one pixel per cycle.
21
Hardware Architecture
Solutions
[Table: luma and chroma at 30 fps, comparing "produce all modes per cycle" with "produce one mode per cycle". Columns: number of modes, average cycles per predictor.]
22
Hardware Architecture
Comparisons in different degrees of parallelism
23
Hardware Architecture
[Figure: neighboring pixels M, A to H, and I to L of a 4x4 block, with DRAM and register storage]
24
Hardware Architecture
Four-Parallel Reconfigurable Intra Prediction Generator
[Figure: datapath built from 8-bit adders and 9-bit adders]
25
Hardware Architecture
[Figure: neighboring pixels M, A to H, and I to L feeding the Intra Prediction Generator]
26
Hardware Architecture
I16MB DC Prediction Mode
(T: top neighboring pixels, L: left neighboring pixels)
PE0: Cycle 1: T0+T4+T8+T12; Cycle 2: +L0+L4+L8; Cycle 3: +L12
PE1: Cycle 1: T1+T5+T9+T13; Cycle 2: +L1+L5+L9; Cycle 3: +L13
PE2: Cycle 1: T2+T6+T10+T14; Cycle 2: +L2+L6+L10; Cycle 3: +L14
PE3: Cycle 1: T3+T7+T11+T15; Cycle 2: +L3+L7+L11; Cycle 3: +L15
Cycle 4: the four PE partial sums are added together.
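The four-cycle schedule can be mimicked in software as a sanity check. This is a behavioral sketch only, assuming both top and left neighbors are available and using the rounding `(sum + 16) >> 5` that the H.264/AVC standard specifies for that case; the function name is hypothetical.

```python
# Behavioral model of the four-PE accumulation schedule for I16MB DC prediction.
# Assumes both neighbor rows are available; this is not a hardware description.

def i16mb_dc_four_pe(top, left):
    """top, left: the 16 neighboring pixels above and to the left of the macroblock."""
    # Cycle 1: each PE sums four top pixels with a stride of 4.
    acc = [top[pe] + top[pe + 4] + top[pe + 8] + top[pe + 12] for pe in range(4)]
    # Cycle 2: each PE adds three left pixels with a stride of 4.
    acc = [acc[pe] + left[pe] + left[pe + 4] + left[pe + 8] for pe in range(4)]
    # Cycle 3: each PE adds one of the last four left pixels.
    acc = [acc[pe] + left[12 + pe] for pe in range(4)]
    # Cycle 4: the four partial sums are combined, then rounded and divided by 32.
    total = acc[0] + acc[1] + acc[2] + acc[3]
    return (total + 16) >> 5
```

Every one of the 32 neighbors is added exactly once, so the result matches a direct sum over both neighbor arrays.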
27
Hardware Architecture
I16MB Plane Prediction Mode
$$\mathrm{Pred}[y, x] = \mathrm{Clip1}\big((a + b\,(x - 7) + c\,(y - 7) + 16) \gg 5\big)$$
$$a = 16\,(p[-1, 15] + p[15, -1])$$
$$b = (5H + 32) \gg 6, \qquad c = (5V + 32) \gg 6$$
$$H = \sum_{x'=0}^{7} (x' + 1)\,(p[-1, 8 + x'] - p[-1, 6 - x'])$$
$$V = \sum_{y'=0}^{7} (y' + 1)\,(p[8 + y', -1] - p[6 - y', -1])$$
[Figure labels: Pred[0,0], Pred[0,4], Pred[0,8], Pred[0,12]; A0, A1, A2, A3]
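The plane-mode formulas can be checked with a short reference sketch. It follows the [row, column] indexing used on the slide, includes the rounding offset of 16 before the final shift as specified in the H.264/AVC standard, and assumes 8-bit samples for Clip1; the function and argument names are chosen here for illustration.

```python
# Reference sketch of I16MB plane prediction (8-bit samples assumed).
# top[x] = p[-1, x], left[y] = p[y, -1], corner = p[-1, -1].

def clip1(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))


def i16mb_plane(top, left, corner):
    """Return the 16x16 predictor as pred[y][x]."""
    def t(x):
        return corner if x < 0 else top[x]

    def l(y):
        return corner if y < 0 else left[y]

    h = sum((i + 1) * (t(8 + i) - t(6 - i)) for i in range(8))
    v = sum((i + 1) * (l(8 + i) - l(6 - i)) for i in range(8))
    a = 16 * (top[15] + left[15])
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    return [[clip1((a + b * (x - 7) + c * (y - 7) + 16) >> 5) for x in range(16)]
            for y in range(16)]
```

For neighbors that lie on a horizontal ramp, the predictor reproduces the same ramp inside the macroblock, which is a quick way to confirm the gradient terms.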
28
Hardware Architecture
29
Hardware Architecture
30
Hardware Architecture
Transform (implemented by shifters and adders)
DCT, iDCT, Hadamard
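To show why shifters and adders suffice, the sketch below computes one 1-D stage of the H.264/AVC forward 4-point core transform as a butterfly with no multiplications. It is a software illustration of the arithmetic, not the architecture from the slides, and the function name is hypothetical.

```python
# One 1-D stage of the forward 4-point core transform using only add, subtract, shift.
# Equivalent to multiplying by [[1,1,1,1],[2,1,-1,-2],[1,-1,-1,1],[1,-2,2,-1]].

def forward_butterfly_1d(x0, x1, x2, x3):
    """Return the four transform outputs (y0, y1, y2, y3)."""
    s0 = x0 + x3          # first butterfly stage: sums
    s1 = x1 + x2
    d0 = x0 - x3          # first butterfly stage: differences
    d1 = x1 - x2
    y0 = s0 + s1
    y2 = s0 - s1
    y1 = (d0 << 1) + d1   # multiplication by 2 done as a left shift
    y3 = d0 - (d1 << 1)
    return y0, y1, y2, y3
```

Applying this stage to every row and then to every column of a 4x4 block gives the full 2-D transform with 8 additions and 2 shifts per 1-D stage.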
31
Hardware Architecture