12/5/2018.

Hardware Optimized DCT-IDCT Implementation on Verilog HDL
Rahul Srikumar
ECE734: VLSI Array Structures for DSP
/10/13

Contents: Algorithm, Implementations, Performance, Results, Conclusion, Future Work

Algorithm
8-point DCT
2D DCT = C * X * Transpose(C)
C: coefficient matrix

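Algorithm (Cont'd)
1D DCT = C * X
2D DCT = 1D DCT * Transpose(C)
1D IDCT = Transpose(C) * 2D DCT
2D IDCT = 1D IDCT * C
Equivalently, in the transposed form that lets the second pass reuse the first-pass multiplier after a transpose:
Transpose(2D DCT) = C * Transpose(1D DCT)
Transpose(2D IDCT) = Transpose(C) * Transpose(1D IDCT)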
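
As a quick check of the two-pass decomposition, the sketch below builds an 8-point coefficient matrix and runs a forward and inverse 2D transform in floating point. It assumes the orthonormal DCT-II scaling for C and an arbitrary 8x8 test block X; the slides specify neither. The helper names (dct_matrix, matmul, transpose, dct_2d, idct_2d) are illustrative only. The second pass is grouped as a right-multiplication so that the result matches 2D DCT = C * X * Transpose(C).

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II coefficient matrix C (assumed scaling)."""
    c = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n)])
    return c

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct_2d(x, c):
    one_d = matmul(c, x)                 # first pass: 1D DCT = C * X
    return matmul(one_d, transpose(c))   # second pass gives C * X * Transpose(C)

def idct_2d(y, c):
    one_d = matmul(transpose(c), y)      # first pass: Transpose(C) * Y
    return matmul(one_d, c)              # second pass gives Transpose(C) * Y * C

C = dct_matrix()
# Arbitrary signed 8-bit test block (made-up data, not from the slides).
X = [[(3 * r + 5 * col) % 256 - 128 for col in range(N)] for r in range(N)]
Y = dct_2d(X, C)
X_rec = idct_2d(Y, C)
max_err = max(abs(X[r][col] - X_rec[r][col])
              for r in range(N) for col in range(N))

# Transposed form: multiply by C again after transposing the first pass.
Y_t = matmul(C, transpose(matmul(C, X)))
```

Here max_err stays at floating-point rounding level, and Y_t equals Transpose(Y), which is why a transpose between passes lets the same multiply-by-C datapath be reused.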

Implementations Part 1
Input word length: 8 bits
1D DCT internal word length: 11 bits
2D DCT output word length: 9 bits
2D IDCT output word length: 8 bits
Four implementations were evaluated:
Serial In (SI): 1 pixel at a time
2 Parallel In (2PI): 2 pixels at a time
4 Parallel In (4PI): 4 pixels at a time
8 Parallel In (8PI): 8 pixels at a time
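
The word lengths and input widths above can be explored with a small sketch. It makes two assumptions the slides do not state: that pixels are two's-complement values, and that loading one row of 8 pixels takes 8/k cycles when k pixels arrive per cycle. The observation that a sum of eight 8-bit values fits in 11 bits is plain arithmetic; whether it is the reason for the 11-bit internal width is a guess. All function names are hypothetical.

```python
def sign_extend(word, from_bits, to_bits):
    """Widen a two's-complement word from from_bits to to_bits."""
    word &= (1 << from_bits) - 1
    if word & (1 << (from_bits - 1)):      # sign bit set: value is negative
        word -= 1 << from_bits
    return word & ((1 << to_bits) - 1)

def to_signed(word, bits):
    """Interpret a bits-wide word as a signed integer."""
    return word - (1 << bits) if word & (1 << (bits - 1)) else word

def cycles_to_load_row(pixels_per_cycle, row_length=8):
    """Cycles needed to read one row (assumed model: ceiling division)."""
    return -(-row_length // pixels_per_cycle)

# Worst-case sums of eight signed 8-bit inputs.
min_sum = 8 * -128
max_sum = 8 * 127
fits_in_11_bits = -(1 << 10) <= min_sum and max_sum <= (1 << 10) - 1

# Row load time for SI, 2PI, 4PI, 8PI under the assumed model.
load_cycles = {k: cycles_to_load_row(k) for k in (1, 2, 4, 8)}
```

Under this model the Serial In case needs 8 cycles per row and the 8 Parallel In case needs 1.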

Implementations Part 2
Coefficient storage uses 8 registers of 8 bits each, which is very efficient compared with the 64 registers required for an 8*8 DCT/IDCT computation.
Two RAMs of 64 locations each (8 bits wide) are used.
The RAMs are enabled in the order en_ram1_write -> (en_ram1_read, en_ram2_write) -> en_ram2_read.
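
The three-phase enable order can be modelled in software as follows. This is a behavioural sketch only: it assumes RAM1 is written row by row and read column by column (so that it acts as a transpose buffer), and that an optional processing stage sits between the two RAMs. The slides give only the enable order, so the addressing scheme, the Ram64 class and run_schedule are hypothetical.

```python
N = 8

class Ram64:
    """Toy model of one 64-location RAM (hypothetical helper)."""
    def __init__(self):
        self.mem = [0] * (N * N)

    def write(self, addr, data):
        self.mem[addr] = data

    def read(self, addr):
        return self.mem[addr]

def run_schedule(block, stage=lambda v: v):
    """Push one 8x8 block through RAM1 and RAM2 in the stated enable order."""
    ram1, ram2 = Ram64(), Ram64()
    trace = []

    # Phase 1: en_ram1_write. Store the block row by row.
    trace.append(("en_ram1_write",))
    for r in range(N):
        for c in range(N):
            ram1.write(r * N + c, block[r][c])

    # Phase 2: en_ram1_read and en_ram2_write are active together.
    # Reading RAM1 column by column delivers the transposed block.
    trace.append(("en_ram1_read", "en_ram2_write"))
    for c in range(N):
        for r in range(N):
            ram2.write(c * N + r, stage(ram1.read(r * N + c)))

    # Phase 3: en_ram2_read. Stream the result out in address order.
    trace.append(("en_ram2_read",))
    out = [[ram2.read(r * N + c) for c in range(N)] for r in range(N)]
    return out, trace

block = [[r * N + c for c in range(N)] for r in range(N)]
result, trace = run_schedule(block)
```

With the default identity stage, result is the transpose of block, which is the reordering a row-column 2D transform needs between its two passes.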

Performance 1
Serial In (1 pixel at a time)
Read 8 inputs = 8 cycles
Register 8 inputs + sign extension = 1 cycle
Add/Sub = 1 cycle
Absolute value = 1 cycle
Multiplication = 1 cycle
Final addition = 2 cycles
Total = 14 cycles
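
The stage list above is consistent with an even/odd (butterfly) form of the 8-point DCT, sketched below. This reading is an assumption: the slides name the stages but not the datapath. In the sketch, Add/Sub forms x[i] + x[7-i] and x[i] - x[7-i], the absolute-value stage feeds a magnitude-only multiplier, and the four products per output are summed in a two-level adder tree, which would account for the 2-cycle final addition. The coefficient scaling is the orthonormal DCT-II (also assumed), and all names are illustrative.

```python
import math

# Cycle budget per stage, copied from the slide.
STAGE_CYCLES = {
    "read 8 inputs": 8,
    "register + sign extension": 1,
    "add/sub": 1,
    "absolute value": 1,
    "multiplication": 1,
    "final addition": 2,
}
total_cycles = sum(STAGE_CYCLES.values())

def coeff(k, i, n=8):
    """Orthonormal DCT-II coefficient C[k][i] (assumed scaling)."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))

def sign_magnitude_mul(a, b):
    """Multiply magnitudes, then restore the sign (models the abs stage)."""
    sign = -1 if (a < 0) != (b < 0) else 1
    return sign * (abs(a) * abs(b))

def dct8_butterfly(x):
    sums = [x[i] + x[7 - i] for i in range(4)]    # feeds even outputs
    diffs = [x[i] - x[7 - i] for i in range(4)]   # feeds odd outputs
    y = []
    for k in range(8):
        operands = sums if k % 2 == 0 else diffs
        p = [sign_magnitude_mul(operands[i], coeff(k, i)) for i in range(4)]
        level1 = (p[0] + p[1], p[2] + p[3])       # adder tree, level 1
        y.append(level1[0] + level1[1])           # adder tree, level 2
    return y

def dct8_direct(x):
    """Reference: full 8-term dot product per output."""
    return [sum(coeff(k, i) * x[i] for i in range(8)) for k in range(8)]

x = [-128, 127, 5, -60, 33, 0, -1, 90]            # made-up signed 8-bit row
y_fast = dct8_butterfly(x)
y_ref = dct8_direct(x)
```

The butterfly form needs only 4 multiplications per output instead of 8, because C[k][7-i] equals C[k][i] for even k and -C[k][i] for odd k.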

Performance 2
2 Parallel In (2 pixels at a time)

Synthesis
Target platform: Altera Cyclone IV GX FPGA
Tool used: Quartus II
Language used: Verilog

Results 1
Serial In has the lowest synthesized combinational area because it needs the fewest wires to feed in the data.

Results 2
Serial In has the lowest synthesized area because it requires the fewest storage elements and counters to process the data.

Results 3
8 Parallel In takes 236 cycles, compared with 246 cycles for Serial In.
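
The relative cost of the 10 extra cycles can be worked out directly from the two counts above. The sketch computes it against both baselines, since the slides do not say which one is used; the function name is illustrative.

```python
SERIAL_IN_CYCLES = 246
PARALLEL8_CYCLES = 236

def relative_difference(slow, fast, baseline):
    """Extra cycles of the slower design as a fraction of the baseline."""
    return (slow - fast) / baseline

# Relative to the Serial In count: about 4.1%.
vs_serial = relative_difference(SERIAL_IN_CYCLES, PARALLEL8_CYCLES,
                                SERIAL_IN_CYCLES)
# Relative to the 8 Parallel In count: about 4.2%.
vs_parallel = relative_difference(SERIAL_IN_CYCLES, PARALLEL8_CYCLES,
                                  PARALLEL8_CYCLES)
```

Either way, Serial In is roughly 4% slower than 8 Parallel In.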

Conclusion
Serial In occupies ~6% less area than 8 Parallel In, at a comparatively smaller performance cost (~4%).

