Published by Francis McCoy. Modified over 9 years ago
1
H.264 Deblocking Filter
Irfan Ullah
Department of Information and Communication Engineering
Myongji University, Yongin, South Korea
Copyright © solarlits.com
2
Outline
- Introduction
- H.264 encoder and decoder
- Overview of the DBF algorithm
- Hardware architecture of the DBF
- Comparison with previous architectures
3
Introduction
- Video compression: H.264/MPEG-4
- The H.264 encoder and decoder both include a deblocking filter (DBF)
- The DBF improves the visual quality of decoded frames by reducing blocking artifacts and discontinuities
- The DBF algorithm in H.264 is more complex than in older standards
4
Introduction: steps
- The filter is applied to each edge of all the 4×4 luma and chroma blocks in a macroblock
- Up to 3 pixels are updated on each side of an edge
- Whether deblocking is applied depends on the current and neighboring 4×4 blocks
- DBF 16×16 hardware has less area and consumes less power than DBF 4×4 hardware, which addresses the hardware-cost issue
- "A macroblock is a 16×16 pixel array"
5
Objective
- Hardware implementation of the H.264 deblocking filter with 4×4-block and 16×16-block processing
- Target: portable devices, where hardware cost is an issue
- 4×4 for high performance
- 16×16 for low power consumption
6
H.264 encoder block diagram
7
H.264 decoder block diagram
8
Edge filtering order
- The DBF removes visually disturbing block boundaries
- Filtering is applied to the edges of 4×4 luma and chroma blocks
- Vertical block edges are filtered before horizontal edges
- Up to 3 pixels on each side of an edge can be changed
- DBF steps (adaptivity at three levels):
  - Edge level (boundary strength)
  - Sample level (α and β threshold values)
  - Slice level (offset parameters)
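The edge order on this slide can be sketched as a list of 4-sample edge segments for one macroblock. This is a minimal model, assuming 4×4 transforms only (no 8×8 transform) and 4:2:0 chroma; the function name `edge_order` is hypothetical.

```python
def edge_order(mb_size=16, blk=4):
    """List the edge segments of one macroblock component in the standard
    H.264 order: every vertical edge (left to right), then every horizontal
    edge (top to bottom). Each edge is split into 4-sample segments, one per
    4x4 block along it. Offset 0 is the macroblock boundary itself."""
    order = []
    for direction in ("vertical", "horizontal"):
        for offset in range(0, mb_size, blk):      # edge position: 0, 4, 8, 12
            for segment in range(mb_size // blk):  # 4x4 block along the edge
                order.append((direction, offset, segment))
    return order


luma = edge_order(16)   # 16x16 luma block
chroma = edge_order(8)  # one 8x8 chroma block (4:2:0)
print(len(luma), len(chroma))  # 32 8
print(luma[0], luma[-1])
```

Counting both chroma components, this gives 32 + 2 × 8 = 48 edge segments per macroblock.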
9
Edge-level adaptivity of the filter
- A boundary strength (Bs) parameter is assigned to every 4×4 block edge
- The Bs conditions are evaluated from top to bottom
- Bs determines the strength of the filtering:
  - 4: strongest filter
  - 1 to 3: standard filter
  - 0: no filtering
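A simplified sketch of the top-to-bottom Bs decision follows. It assumes frame coding, one motion vector and one reference picture per 4×4 block, and no SP/SI slices; the `BlockInfo` fields are hypothetical names, and the standard's full rule set (field coding, differing numbers of motion vectors) is omitted.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    """Hypothetical side information for one 4x4 block."""
    intra: bool           # block belongs to an intra-coded macroblock
    nonzero_coeffs: bool  # block has non-zero transform coefficients
    ref_pic: int          # identifier of the reference picture used
    mv: tuple             # (mvx, mvy) in quarter-sample units


def boundary_strength(p: BlockInfo, q: BlockInfo, mb_edge: bool) -> int:
    """Bs for the edge between blocks p and q. Conditions are tested from
    top to bottom; the first one that holds sets Bs."""
    if (p.intra or q.intra) and mb_edge:
        return 4  # intra block on a macroblock edge: strongest filter
    if p.intra or q.intra:
        return 3  # intra block on an internal edge
    if p.nonzero_coeffs or q.nonzero_coeffs:
        return 2  # coded residual on either side
    if p.ref_pic != q.ref_pic:
        return 1  # different reference pictures
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1  # motion differs by at least one integer sample
    return 0      # no filtering


inter = BlockInfo(False, False, 0, (0, 0))
moved = BlockInfo(False, False, 0, (4, 0))
print(boundary_strength(inter, moved, mb_edge=False))  # 1
```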
10
Sample-level adaptivity of the filter
- Distinguishes true edges from those created by quantization
- True edges should be left unfiltered, while artificial edges are filtered
- The decision uses quantization-dependent parameters (α, β): filter or not?
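The "filter or not?" decision for one line of samples across an edge can be sketched as below. The α and β values in the usage are illustrative only and are not taken from the standard's tables; `filter_samples_flag` is a hypothetical name.

```python
def filter_samples_flag(bs, p, q, alpha, beta):
    """Decide whether one line of samples across an edge is filtered.

    p = (p0, p1, ...) and q = (q0, q1, ...) are the samples on each side,
    nearest to the edge first. alpha and beta are the QP-dependent
    thresholds; a large step (|p0 - q0| >= alpha) is treated as a true
    edge and left unfiltered.
    """
    return (bs != 0
            and abs(p[0] - q[0]) < alpha
            and abs(p[1] - p[0]) < beta
            and abs(q[1] - q[0]) < beta)


print(filter_samples_flag(2, (100, 100), (110, 110), alpha=15, beta=4))  # True
print(filter_samples_flag(2, (100, 100), (140, 140), alpha=15, beta=4))  # False
```

A small step between two flat areas is treated as a blocking artifact, while a large step is kept as a real edge.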
11
Slice-level adaptivity of the filter
- The encoder selects offsets to adjust α and β
- True edges should be left unfiltered, while artificial edges are filtered
- The encoder controls the deblocking filter's behavior by transmitting nonzero offsets:
  - Negative offsets reduce the amount of filtering
  - Positive offsets increase the amount of filtering
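A sketch of how the slice offsets shift the α/β table indices follows. The parameter names mirror the H.264 slice-header fields `slice_alpha_c0_offset_div2` and `slice_beta_offset_div2`; the α/β table lookup itself is left out.

```python
def clip3(lo, hi, x):
    """Clamp x to [lo, hi], as Clip3 in the H.264 specification."""
    return max(lo, min(hi, x))


def threshold_indices(qp_p, qp_q, alpha_offset_div2=0, beta_offset_div2=0):
    """Return (indexA, indexB), the table indices used to look up alpha and beta.

    The offsets are transmitted halved (each in -6..6), so they are doubled.
    """
    qp_av = (qp_p + qp_q + 1) >> 1  # rounded average QP of the two blocks
    index_a = clip3(0, 51, qp_av + (alpha_offset_div2 << 1))
    index_b = clip3(0, 51, qp_av + (beta_offset_div2 << 1))
    return index_a, index_b


print(threshold_indices(28, 29))         # (29, 29)
print(threshold_indices(28, 29, -2, 3))  # (25, 35)
```

The α and β tables do not decrease as the index grows. A negative offset therefore lowers the thresholds and filters less. A positive offset raises them and filters more.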
12
H.264 deblocking filter algorithm
- Filters small changes in intensity
- Clipping limits Δ to avoid blurring
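The luma filtering equations for one line of samples can be sketched as follows, assuming the sample-level decision has already returned "filter" and 8-bit video. `tc0` would normally come from the standard's table (indexed by indexA and Bs); here it is simply a parameter. Chroma filtering differs and is not shown.

```python
def clip3(lo, hi, x):
    """Clamp x to [lo, hi]."""
    return max(lo, min(hi, x))


def clip1(x, bit_depth=8):
    """Clamp a sample to [0, 2^bit_depth - 1]."""
    return clip3(0, (1 << bit_depth) - 1, x)


def luma_normal_filter(p, q, tc0, beta):
    """Bs 1..3 luma filter. p = [p0, p1, p2], q = [q0, q1, q2]."""
    p0, p1, p2 = p
    q0, q1, q2 = q
    ap, aq = abs(p2 - p0), abs(q2 - q0)
    tc = tc0 + (ap < beta) + (aq < beta)
    # The correction is clipped to [-tc, tc]: small steps are smoothed,
    # but the change is bounded so real detail is not blurred.
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p = [clip1(p0 + delta), p1, p2]
    new_q = [clip1(q0 - delta), q1, q2]
    avg = (p0 + q0 + 1) >> 1
    if ap < beta:  # p1 is also adjusted, limited to [-tc0, tc0]
        new_p[1] = p1 + clip3(-tc0, tc0, (p2 + avg - (p1 << 1)) >> 1)
    if aq < beta:
        new_q[1] = q1 + clip3(-tc0, tc0, (q2 + avg - (q1 << 1)) >> 1)
    return new_p, new_q


def luma_strong_filter(p, q, alpha, beta):
    """Bs == 4 luma filter. p = [p0, p1, p2, p3], q = [q0, q1, q2, q3]."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    ap, aq = abs(p2 - p0), abs(q2 - q0)
    flat = abs(p0 - q0) < ((alpha >> 2) + 2)
    if ap < beta and flat:  # smooth side: modify up to 3 samples
        new_p = [(p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3,
                 (p2 + p1 + p0 + q0 + 2) >> 2,
                 (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3, p3]
    else:                   # otherwise only p0 changes
        new_p = [(2 * p1 + p0 + q1 + 2) >> 2, p1, p2, p3]
    if aq < beta and flat:
        new_q = [(p1 + 2 * p0 + 2 * q0 + 2 * q1 + q2 + 4) >> 3,
                 (p0 + q0 + q1 + q2 + 2) >> 2,
                 (2 * q3 + 3 * q2 + q1 + q0 + p0 + 4) >> 3, q3]
    else:
        new_q = [(2 * q1 + q0 + p1 + 2) >> 2, q1, q2, q3]
    return new_p, new_q


print(luma_normal_filter([100, 100, 100], [110, 110, 110], tc0=2, beta=10))
```

For a step of 10 between two flat areas, the normal filter turns 100 | 110 into 100, 102, 104 | 106, 108, 110.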
13
Hardware architecture
- IBUF stores one reconstructed MB (256 luma pixels + 128 chroma pixels)
- SPAD and SRAMs store partially filtered pixels
- A single DATAPATH serves both DBF 4×4 and DBF 16×16
14
Hardware architecture (cont’d)
- Two-stage pipeline
- 1st stage: a 12-bit adder and two shifters
- 2nd stage: a 12-bit comparator, several two's-complement units, and multiplexers
- Operations covered: conditional-branch results, multiplication, and addition
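The slide does not say why the datapath is 12 bits wide. One plausible reason, which is our assumption, is that the intermediate sums in the luma filter equations fit in 12-bit two's complement when samples are 8 bits. Each expression below is linear with every sample appearing once, so its extremes come from setting each sample to 0 or 255.

```python
def value_range(coeffs, const=0, lo=0, hi=255):
    """Min/max of const + sum(c * x_i), each x_i independently in [lo, hi]."""
    vmin = const + sum(c * (lo if c > 0 else hi) for c in coeffs)
    vmax = const + sum(c * (hi if c > 0 else lo) for c in coeffs)
    return vmin, vmax


def fits_signed(bits, vmin, vmax):
    """True if [vmin, vmax] fits in a `bits`-wide two's-complement value."""
    return -(1 << (bits - 1)) <= vmin and vmax <= (1 << (bits - 1)) - 1


# Intermediate sums from the luma filter (multiplications by 2, 3 and 4
# can be built from shifts and adds).
EXPRESSIONS = {
    "delta: 4*q0 - 4*p0 + p1 - q1 + 4": ([4, -4, 1, -1], 4),
    "strong p0': p2 + 2p1 + 2p0 + 2q0 + q1 + 4": ([1, 2, 2, 2, 1], 4),
    "strong p2': 2p3 + 3p2 + p1 + p0 + q0 + 4": ([2, 3, 1, 1, 1], 4),
}

for name, (coeffs, const) in EXPRESSIONS.items():
    vmin, vmax = value_range(coeffs, const)
    print(f"{name}: [{vmin}, {vmax}] 11-bit: {fits_signed(11, vmin, vmax)}"
          f" 12-bit: {fits_signed(12, vmin, vmax)}")
```

In this model, 11-bit signed arithmetic overflows on the Δ term, while 12 bits covers every expression.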
15
Hardware architecture (cont’d)
- The 4×4 DBF starts filtering as soon as a new 4×4 block is ready
- The 16×16 DBF waits for IBUF to be filled by the IT/IQ (inverse transform/inverse quantization) module, then starts filtering once a new block is ready
- Figures: processing order of 4×4 blocks by the IT/IQ module; hybrid edge filtering order; standard sequential filtering order
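Why can a 4×4 DBF start as soon as a block arrives? The sketch below assumes the IT/IQ module emits luma 4×4 blocks in the standard H.264 decoding order: four 8×8 quadrants in raster order, and four 4×4 blocks in raster order within each quadrant. It checks that each block's left and upper neighbors inside the same MB are always produced earlier. Neighbors outside the MB are assumed to be held in SPAD/SRAM. This checks data availability only and does not reproduce the authors' hybrid order.

```python
def blk4x4_position(idx):
    """(x, y) of luma 4x4 block idx (0..15) inside a macroblock."""
    quad, sub = divmod(idx, 4)
    x = 8 * (quad % 2) + 4 * (sub % 2)
    y = 8 * (quad // 2) + 4 * (sub // 2)
    return x, y


ORDER = [blk4x4_position(i) for i in range(16)]


def neighbors_ready(idx):
    """True if the in-MB left and upper neighbors of block idx arrive before it."""
    x, y = ORDER[idx]
    earlier = set(ORDER[:idx])
    left_ok = x == 0 or (x - 4, y) in earlier  # x == 0: neighbor is in the left MB
    up_ok = y == 0 or (x, y - 4) in earlier    # y == 0: neighbor is in the upper MB
    return left_ok and up_ok


print(ORDER[:6])
print(all(neighbors_ready(i) for i in range(16)))  # True
```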
16
Hardware architecture (cont’d)
- Neighbors must be available in local on-chip memory
- Left 4×4 blocks are stored in SPAD
- Upper 4×4 blocks are stored in the LUMA and CHRM SRAMs
- For CIF (352×288) video:
  - Upper 4×4 luma blocks: 4×352×8 = 1408×8 bits
  - Upper 4×4 chroma blocks: 4×88×8 + 4×88×8 = 704×8 bits
- "Previously, off-chip memory was used, but on-chip memory consumes less power"
- "No need to transpose pixel arrays", which removes irregularity
17
Implementation
- A video frame is loaded into SRAM and used as the input to the DBF running on the FPGA
- The DBF hardware applies the H.264 DBF algorithm and writes the frame back to SRAM
- The resulting frame is shown on the LCD
- 200 MHz, 30 VGA (640×480) frames/second
- Synthesized to 7.4K and 5.3K gates
- Xilinx Virtex-II FPGA; power estimated with the Xilinx XPower tool
- ARM Versatile PB926EJ-S development board
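The slide does not state a per-macroblock cycle count. The sketch below estimates the budget implied by its figures, assuming the 200 MHz clock and 30 VGA frames/second describe the same operating point. `cycles_per_mb` is a hypothetical helper.

```python
def cycles_per_mb(clock_hz, fps, width, height, mb=16):
    """Return (clock cycles available per macroblock, macroblocks per frame)."""
    mbs_per_frame = (width // mb) * (height // mb)
    return clock_hz // (fps * mbs_per_frame), mbs_per_frame


budget, mbs = cycles_per_mb(200_000_000, 30, 640, 480)
print(mbs, budget)  # 1200 5555
```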
18
Performance
- 36% less power consumption for reading unfiltered MBs from, and writing filtered MBs to, the SRAM
19
Comparison [20]
- Standard-cell methodology is a method of designing application-specific integrated circuits (ASICs) with digital-logic features
- A standard-cell library is a collection of low-level electronic logic functions such as AND, OR, INVERT, flip-flops, latches, and buffers
21
Low-Power H.264 Deblocking Filter with Hybrid Filtering
Presented by: Irfan Ullah
22
Outline
- Introduction to edge filtering
- Architecture of the DBF
- Transposition buffer
- Comparison with previous architectures
23
Edge filtering order for a 16×16 macroblock
24
Edge filtering order for a 16×16 macroblock (cont’d)
26
Low-power deblocking filter (DF) architecture
27
Hybrid architecture: Horizontal Edge Skip Processing Architecture (HESPA)
28
Transposition buffer usage
- For QCIF (176×144) video:
  - Left-neighbor SRAM: 32×32 bits
  - Upper-neighbor SRAM: 352×32 bits
  - Transposition buffer: 640 bits
- The transposition buffer operates on the 4×4 blocks of the current MB
- The 32-bit data bus accesses 4 samples at a time
- Each filtered output needs 4 clock cycles
- Total: 4 × 48 edges + 4 = 196 cycles
- Data are arranged correctly using separate SRAMs
- HESPA saves 100 clock cycles per MB
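The QCIF memory sizes and the 196-cycle total can be reproduced with the arithmetic below. The breakdown is our assumption, chosen because it matches the slide's numbers: 4:2:0 video, 8-bit samples, 4 rows (upper) or 4 columns (left) of neighbor samples kept per component, and 4×4 transforms only. The constant 4 in the cycle total is taken from the slide as given.

```python
def neighbor_sram_words(frame_width, keep=4, sample_bits=8, word_bits=32):
    """Return (upper_words, left_words): upper storage for one MB row and
    left storage for one MB, in 32-bit words."""
    per_word = word_bits // sample_bits                          # 4 samples per access
    upper = keep * frame_width + 2 * keep * (frame_width // 2)   # Y + Cb + Cr
    left = keep * 16 + 2 * keep * 8                              # Y (16 tall) + Cb + Cr (8 tall)
    return upper // per_word, left // per_word


def edges_per_mb():
    """Count 4-sample edge segments filtered in one MB."""
    luma = 2 * 4 * 4          # 2 directions x 4 edges x 4 segments
    chroma = 2 * (2 * 2 * 2)  # 2 components x 2 directions x 2 edges x 2 segments
    return luma + chroma


upper_words, left_words = neighbor_sram_words(176)  # QCIF width
edges = edges_per_mb()
print(upper_words, left_words, edges, 4 * edges + 4)  # 352 32 48 196
```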
29
Transposition buffer usage (cont’d)
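What a transposition buffer does can be modeled in software as follows. A 4×4 block is stored as four 32-bit row words, and each word holds 4 samples. Samples across a vertical edge lie in one row, but samples across a horizontal edge lie in one column. Transposing the block lets one 32-bit access fetch a column. The byte order (first sample in the most significant byte) and the helper names are assumptions. This model is not the authors' RTL.

```python
def pack_row(samples):
    """Pack four 8-bit samples into one 32-bit word, first sample in the MSB byte."""
    word = 0
    for s in samples:
        word = (word << 8) | (s & 0xFF)
    return word


def unpack_word(word):
    """Split a 32-bit word back into four 8-bit samples."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def transpose_block(words):
    """Turn four row words of a 4x4 block into four column words."""
    rows = [unpack_word(w) for w in words]
    return [pack_row([rows[r][c] for r in range(4)]) for c in range(4)]


block = [[4 * r + c for c in range(4)] for r in range(4)]
columns = transpose_block([pack_row(row) for row in block])
print([unpack_word(w) for w in columns])
```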
30
Comparison
32
3D Image Processing VLSI System
Irfan Ullah
Department of Information and Communication Engineering
Myongji University, Yongin, South Korea
33
Introduction
- Image processing applications: vision systems, multimedia processing, consumer electronics
- Requirements: fast computational speed, small chip size, low power
- Read/write operations, signal control, data flow
34
3D image system
- 3D VLSI divides the image chip into several layers stacked vertically
- The layers are connected by through-silicon vias (TSVs)
- Used to:
  - Avoid multi-layer pipeline delay
  - Improve system operation
  - Provide reconfigurable memory
  - Increase bandwidth
  - Decrease size
- Example: IBM's silicon carrier packaging technology
35
3D image system (cont’d)
- Single instruction, multiple data (SIMD)
- Multiple instruction, multiple data (MIMD)
36
3D image system (cont’d)
37
Chip architecture of the 3D image system
38
Process control
- The 3D image processor system can control the image memory configuration and the pipeline data flow
39
Network on chip
40
3D image system (cont’d)
- A common robust-design method for repairing VLSI system errors is reconfigurable re-healing technology