1
High-Speed Hardware Implementation of an H.264 Quantizer
Alex Braun, Shruti Lakdawala
2
H.264 Video Compression Standard
Video compression is the process of compacting data into a smaller number of bits. It is achieved by:
- removing redundancy between consecutive frames
- transforming the data into a different domain
- quantization
- reordering the data and encoding it as compactly as possible
3
H.264 Encoder block diagram
4
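Quantization
Quantization scales the data down to a smaller range of values, thereby reducing the number of bits. Each coefficient is divided by a quantizer step size, Qstep, and the result is rounded to an integer: $Z_{ij} = \mathrm{round}(Y_{ij}/Q_{step})$. There are 52 values of Qstep, selected by the quantization parameter QP (0 to 51).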
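Because Qstep doubles for every increase of 6 in QP, the 52 step sizes can be generated from six base values. The Python sketch below assumes the base values tabulated by Richardson (0.625 to 1.125 for QP 0 to 5); the names `qstep` and `quantize_reference` are illustrative, not part of the presented design. It performs the floating-point division that the hardware avoids, so it is only a reference model.

```python
# Qstep for QP = 0..5 (values as tabulated in Richardson's book); each further
# increase of 6 in QP doubles the step size.
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantizer step size for a quantization parameter QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize_reference(y: float, qp: int) -> int:
    """Divide by Qstep and round the magnitude to the nearest integer.

    This is the floating-point operation the hardware approximates with a
    multiply and shift; the sign of y is restored after rounding.
    """
    magnitude = int(abs(y) / qstep(qp) + 0.5)
    return -magnitude if y < 0 else magnitude


if __name__ == "__main__":
    steps = [qstep(qp) for qp in range(52)]
    print(len(steps), steps[0], steps[51])  # 52 0.625 224.0
    print(quantize_reference(-100, 28))     # Qstep(28) = 16, so -6
```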
5
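Quantization - 2
To reduce the complexity of the quantization block, the division is implemented by multiplying each coefficient by a multiplication factor (MF), adding a rounding offset f and applying a binary right shift:
$$|Z_{ij}| = \left(|Y_{ij}| \cdot MF + f\right) \gg qbits, \qquad \operatorname{sign}(Z_{ij}) = \operatorname{sign}(Y_{ij})$$
where $qbits = 15 + \lfloor QP/6 \rfloor$, $f = 2^{qbits}/3$ for intra blocks and $f = 2^{qbits}/6$ for inter blocks. MF is chosen so that multiplying by $MF/2^{qbits}$ approximates division by Qstep together with the position-dependent post-scaling of the integer transform.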
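A bit-accurate software model of this multiply-add-shift is a convenient reference for checking hardware outputs. The Python sketch below assumes the MF table and the offsets f = 2^qbits/3 (intra) and 2^qbits/6 (inter) given by Richardson and Malvar et al.; the position classes and function names are our own illustrative choices.

```python
# MF[QP % 6] = (positions (0,0),(0,2),(2,0),(2,2);
#               positions (1,1),(1,3),(3,1),(3,3);
#               all other positions)
MF_TABLE = (
    (13107, 5243, 8066),
    (11916, 4660, 7490),
    (10082, 4194, 6554),
    (9362, 3647, 5825),
    (8192, 3355, 5243),
    (7282, 2893, 4559),
)


def mf_for(row: int, col: int, qp: int) -> int:
    """Look up MF for coefficient (row, col) of a 4 x 4 block at this QP."""
    if row % 2 == 0 and col % 2 == 0:
        cls = 0
    elif row % 2 == 1 and col % 2 == 1:
        cls = 1
    else:
        cls = 2
    return MF_TABLE[qp % 6][cls]


def quantize(y: int, row: int, col: int, qp: int, intra: bool = True) -> int:
    """|Z| = (|Y| * MF + f) >> qbits, with the sign of Y restored afterwards."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // 3 if intra else (1 << qbits) // 6
    magnitude = (abs(y) * mf_for(row, col, qp) + f) >> qbits
    return -magnitude if y < 0 else magnitude


if __name__ == "__main__":
    # QP = 10: qbits = 16, MF(0,0) = 8192, f = 65536 // 3 = 21845
    print(quantize(100, 0, 0, 10))  # (819200 + 21845) >> 16 = 12
```

Only integer multiply, add and shift operations appear in `quantize`, which is what makes the equation suitable for a hardware data path.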
6
Implementation: Quantization Equation and Architecture
7
Quantization on Three Arrays
H.264 performs quantization on three arrays:
- 4 x 4 array of residual coefficients
- 4 x 4 array of luma DC coefficients
- 2 x 2 array of chroma DC coefficients
A mode select input is used to quantize the three arrays differently, because the quantization equation is slightly different for each array.
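The mode select can be modelled as choosing the MF, rounding offset and shift amount for each array. The sketch below assumes the DC formulation given by Richardson, in which the luma DC and 2 x 2 chroma DC arrays use MF at position (0,0), an offset of 2f and a shift of qbits + 1; the mode encodings and `quantize_mode` are hypothetical.

```python
# Hypothetical encodings of the mode select input.
MODE_4X4 = 0        # 4 x 4 residual coefficients
MODE_LUMA_DC = 1    # 4 x 4 luma DC coefficients
MODE_CHROMA_DC = 2  # 2 x 2 chroma DC coefficients

# MF for QP % 6 = 0..5, split by position class.
MF_EVEN_EVEN = (13107, 11916, 10082, 9362, 8192, 7282)  # (0,0),(0,2),(2,0),(2,2)
MF_ODD_ODD = (5243, 4660, 4194, 3647, 3355, 2893)       # (1,1),(1,3),(3,1),(3,3)
MF_MIXED = (8066, 7490, 6554, 5825, 5243, 4559)         # all other positions


def quantize_mode(y: int, row: int, col: int, qp: int, mode: int,
                  intra: bool = True) -> int:
    """Quantize one coefficient, choosing MF, offset and shift from the mode."""
    size = 2 if mode == MODE_CHROMA_DC else 4
    if not (0 <= row < size and 0 <= col < size):
        raise ValueError("coefficient position outside the array")
    qbits = 15 + qp // 6
    f = (1 << qbits) // 3 if intra else (1 << qbits) // 6
    if mode == MODE_4X4:
        if row % 2 == 0 and col % 2 == 0:
            mf = MF_EVEN_EVEN[qp % 6]
        elif row % 2 == 1 and col % 2 == 1:
            mf = MF_ODD_ODD[qp % 6]
        else:
            mf = MF_MIXED[qp % 6]
        offset, shift = f, qbits
    elif mode in (MODE_LUMA_DC, MODE_CHROMA_DC):
        # DC arrays: MF of position (0,0) for every element, a doubled
        # rounding offset and one extra bit of right shift.
        mf, offset, shift = MF_EVEN_EVEN[qp % 6], 2 * f, qbits + 1
    else:
        raise ValueError("unknown mode")
    magnitude = (abs(y) * mf + offset) >> shift
    return -magnitude if y < 0 else magnitude
```

In this model the three modes share one multiply-add-shift and differ only in the operands fed to it, which mirrors the idea of a single data path steered by a mode input.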
8
New Architecture
Pipelining is used for a fast implementation.
[Block diagram: LUT and Data Path blocks; signals Y, mode, QP, f, MF, QP_div_6 and Z]
9
Look Up Table
The multiplication factor (MF) depends on the position of each element in the array and on the quantization step; qbits depends on the quantization step only. Look up tables (LUTs) supply the precalculated MF and qbits values.
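The LUT contents can be generated offline. This Python sketch produces, for every QP and every position of a 4 x 4 block, the three values the block diagram shows leaving the LUT (MF, f and QP_div_6); the table layout, the `pos = 4 * row + col` indexing and the function names are hypothetical, and the MF values and offsets are assumed from Richardson.

```python
# MF for QP % 6 = 0..5 as (even/even, odd/odd, mixed) position classes.
MF_BY_CLASS = (
    (13107, 5243, 8066),
    (11916, 4660, 7490),
    (10082, 4194, 6554),
    (9362, 3647, 5825),
    (8192, 3355, 5243),
    (7282, 2893, 4559),
)


def position_class(row: int, col: int) -> int:
    """0: both indices even, 1: both odd, 2: mixed."""
    if row % 2 == 0 and col % 2 == 0:
        return 0
    if row % 2 == 1 and col % 2 == 1:
        return 1
    return 2


def build_lut(intra: bool = True):
    """Precompute lut[qp][pos] = (MF, f, QP_div_6) for QP 0..51.

    pos = 4 * row + col indexes the 16 positions of a 4 x 4 block; the data
    path can recover qbits as 15 + QP_div_6.
    """
    lut = []
    for qp in range(52):
        qp_div_6, qp_mod_6 = divmod(qp, 6)
        qbits = 15 + qp_div_6
        f = (1 << qbits) // (3 if intra else 6)
        entries = []
        for pos in range(16):
            row, col = divmod(pos, 4)
            mf = MF_BY_CLASS[qp_mod_6][position_class(row, col)]
            entries.append((mf, f, qp_div_6))
        lut.append(tuple(entries))
    return tuple(lut)


if __name__ == "__main__":
    lut = build_lut()
    print(lut[10][1])  # position (0,1) at QP 10: (5243, 21845, 1)
```

Because MF depends on QP only through QP mod 6, a smaller table of 6 x 3 MF values plus logic deriving QP_div_6 and f from QP would hold the same information; the full 52 x 16 table here simply mirrors a direct lookup.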
10
Data Path
[Figure: data path with inputs Y, f, QP_div_6 and MF; a 6-stage multiplier, adders, a +1 incrementer with carry-out (CO) and a right shift producing Z]
- Six-stage Booth-recoded Wallace tree multiplier
- Add and shift broken into two stages:
  - Two 15-bit fast carry-lookahead adders
  - One 16-bit fast carry-lookahead incrementer and right-shift block
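The two ideas behind the multiplier, radix-4 Booth recoding to halve the number of partial products and a Wallace tree of 3:2 carry-save adders to sum them without carry propagation, can be checked with a functional model. This Python sketch rests on our own assumptions: it recodes MF (assumed to fit in 14 bits, since the largest standard MF is 13107), it ignores the six-stage pipelining, and the function names are hypothetical.

```python
def booth_radix4_digits(multiplier: int, width: int) -> list:
    """Radix-4 Booth recoding of a non-negative `width`-bit multiplier.

    Digit i is -2*b[2i+1] + b[2i] + b[2i-1] (with b[-1] = 0), so every digit
    lies in {-2, -1, 0, 1, 2} and sum(d_i * 4**i) equals the multiplier.
    """
    if multiplier < 0 or multiplier >> width:
        raise ValueError("multiplier must fit in `width` unsigned bits")
    digits = []
    prev_bit = 0  # b[2i-1]
    # Scan one bit past the top so the unsigned value recodes as positive.
    for i in range(0, width + 1, 2):
        b0 = (multiplier >> i) & 1
        b1 = (multiplier >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev_bit)
        prev_bit = b1
    return digits


def carry_save_reduce(operands: list) -> tuple:
    """Reduce operands to a sum word and a carry word with 3:2 compressors.

    Each level compresses groups of three operands, as the layers of a
    Wallace tree do; Python's unbounded integers stand in for the sign
    extension a hardware tree needs for negative partial products.
    """
    ops = list(operands)
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3
        nxt = []
        for k in range(0, full, 3):
            a, b, c = ops[k], ops[k + 1], ops[k + 2]
            nxt.append(a ^ b ^ c)                           # sum bits
            nxt.append(((a & b) | (a & c) | (b & c)) << 1)  # carry bits
        nxt.extend(ops[full:])  # leftover operands pass to the next level
        ops = nxt
    ops += [0] * (2 - len(ops))
    return ops[0], ops[1]


def booth_wallace_multiply(y: int, mf: int, mf_width: int = 14) -> int:
    """Multiply |Y| by MF using Booth partial products and CSA reduction."""
    digits = booth_radix4_digits(mf, mf_width)
    partials = [(d * y) << (2 * i) for i, d in enumerate(digits)]
    s, c = carry_save_reduce(partials)
    return s + c  # final carry-propagate addition


if __name__ == "__main__":
    print(booth_wallace_multiply(100, 8192))  # 819200
```

A 14-bit multiplier recodes into 8 Booth digits instead of 14 bits, so the tree sums 8 partial products; only the final addition needs a carry-propagate adder.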
11
Performance
- Latency: 9 clock cycles as tested; 8 clock cycles if the LUT is implemented in parallel with the last stage of the transform block
- Throughput: 1 result per clock cycle
- Frequency: 309 MHz as implemented; 355 MHz maximum for the data path without area constraints
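Latency and throughput are separate figures because a pipeline holds several results in flight at once. The cycle-level Python model below is a generic sketch, not the presented RTL: `simulate_pipeline` and its `stage_func` hook are hypothetical, and every stage is assumed to accept a new value on every clock.

```python
def simulate_pipeline(inputs, depth, stage_func=lambda v: v):
    """Model a `depth`-cycle pipeline that accepts one new input per clock.

    Returns (cycle, result) pairs, where cycle 0 is the clock in which the
    first input is presented. stage_func stands in for the data path logic.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    registers = [None] * depth  # registers[-1] feeds the output
    results = []
    # Append `depth` empty slots (bubbles) so the pipeline drains at the end.
    stream = list(inputs) + [None] * depth
    for cycle, value in enumerate(stream):
        if registers[-1] is not None:
            results.append((cycle, stage_func(registers[-1])))
        # Clock edge: every value moves one register forward.
        registers = [value] + registers[:-1]
    return results


if __name__ == "__main__":
    print(simulate_pipeline(range(5), 9))
    # [(9, 0), (10, 1), (11, 2), (12, 3), (13, 4)]
```

With 9 stages, the first result appears 9 clocks after the first input, and after that one result leaves the pipeline every clock. At 309 MHz this corresponds to 309 million results per second.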
12
Area (gates)
- Data Path: 58037
- High Speed Data Path (not used in final design): 60845
- LUTs: 10385
- Total System: 938977
The total corresponds to 16 data paths plus the LUTs: 16 × 58037 + 10385 = 938977 gates.
13
Comparison to Another Implementation
| | Pipelined | Combinational |
|---|---|---|
| Technology | TSMC 0.25 µm | Xilinx Virtex-2 Pro (0.15 µm) |
| Latency | 8-9 clocks | 1 clock |
| Frequency | 309 MHz | 94 MHz |
| Area LUT (gates) | 10385 | 10320 |
| Area Quantizer (gates) | 928592 | 119040 |
| Area System (gates) | 938977 | 129360 |
| Critical Path Delay | 3.23 ns | 10.6 ns |
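The frequencies in the table are consistent with the critical-path delays, since the clock period cannot be shorter than the critical path:

```latex
f_{\max} = \frac{1}{t_{\text{crit}}}, \qquad
\frac{1}{3.23\ \text{ns}} \approx 309.6\ \text{MHz}, \qquad
\frac{1}{10.6\ \text{ns}} \approx 94.3\ \text{MHz}
```

On these figures the pipelined design reaches roughly 3.3 times the clock frequency of the combinational one (309 MHz / 94 MHz).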
14
Areas for Improvement
- Implement the LUTs as ROMs to reduce area
- Pipeline the LUTs and use the faster data path implementation for a ~15% improvement (355 MHz vs. 309 MHz)
- Implement in a smaller technology
- Gate the clocks to the 12 unused data paths in 2x2 DC chroma mode (a 2x2 array needs only 4 of the 16 data paths)
15
References
- Richardson, I. E. G. H.264 and MPEG-4 Video Compression. John Wiley & Sons Ltd., England, 2003.
- H.264/MPEG-4 Part 10 Tutorials. http://www.vcodex.com/h264.html
- Kordasiewicz, R., Shirani, S. "Hardware Implementation of the Optimized Transform and Quantization Blocks of H.264." Canadian Conference on Electrical and Computer Engineering, 2004, Vol. 2, 2-5 May 2004, pp. 943-946.
- Malvar, H., Hallapuro, A., Karczewicz, M., Kerofsky, L. "Low-Complexity Transform and Quantization in H.264/AVC." IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, Issue 7, July 2003, pp. 598-603.
- Malvar, H. S. "Low-Complexity Length-4 Transform and Quantization with 16-Bit Arithmetic." ITU-T SG16, Sept. 2001, Doc. VCEG-N44.
- Kerofsky, L., Lei, S. "Reduced Bit-Depth Quantization." Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Sept. 2001, Doc. VCEG-N20.
- Kerofsky, L. "H.26L Transform/Quantization Complexity Reduction Ad Hoc Report." Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Nov. 2001, Doc. VCEG-O09.