12/5/2018.

1 12/5/2018

2 Hardware Optimized DCT-IDCT Implementation on Verilog HDL
RAHUL SRIKUMAR
ECE734: VLSI ARRAY STRUCTURES FOR DSP
/10/13

3 Contents
Algorithm
Implementations
Performance
Results
Conclusion
Future Work

4 Algorithm
8 point DCT
2D DCT = C*X*Transpose(C)
C – coefficient matrix

5 Algorithm (Cont'd)
1D DCT = C*X
2D DCT = (1D DCT)*Transpose(C) = Transpose(C*Transpose(1D DCT))
1D IDCT = Transpose(C)*(2D DCT)
2D IDCT = (1D IDCT)*C = Transpose(Transpose(C)*Transpose(1D IDCT))
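
The two-pass computation can be checked numerically with a short sketch. It assumes the standard orthonormal 8-point DCT-II matrix for C (the slides do not give the normalization) and uses floating point rather than the fixed-point hardware arithmetic. The helper names and the test block X are hypothetical. The second pass is written as a right-multiplication so that the result equals C*X*Transpose(C).

```python
import math

N = 8  # 8-point DCT, as on the Algorithm slide


def dct_matrix(n=N):
    """Orthonormal DCT-II coefficient matrix C (assumed normalization)."""
    c = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n)])
    return c


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def dct_2d(x, c):
    stage1 = matmul(c, x)                # 1D DCT = C*X
    return matmul(stage1, transpose(c))  # 2D DCT = C*X*Transpose(C)


def idct_2d(y, c):
    stage1 = matmul(transpose(c), y)     # 1D IDCT = Transpose(C)*(2D DCT)
    return matmul(stage1, c)             # 2D IDCT = Transpose(C)*Y*C


C = dct_matrix()
# Hypothetical signed 8-bit test block (values -128 .. -65).
X = [[(r * N + q) - 128 for q in range(N)] for r in range(N)]
Y = dct_2d(X, C)
X_rec = idct_2d(Y, C)
max_err = max(abs(X[r][q] - X_rec[r][q]) for r in range(N) for q in range(N))
```

With an orthonormal C, the inverse recovers X up to floating-point rounding, so max_err should be negligibly small.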

6 Implementations Part 1
Input word length – 8 bits
1D DCT internal word length – 11 bits
2D DCT output word length – 9 bits
2D IDCT output word length – 8 bits
4 implementations were evaluated:
Serial In (SI) – 1 pixel at a time
2 Parallel In (2PI) – 2 pixels at a time
4 Parallel In (4PI) – 4 pixels at a time
8 Parallel In (8PI) – 8 pixels at a time
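
One plausible reading of the 11-bit internal word length (an assumption, not something the slides state) is that accumulating eight signed 8-bit samples without scaling needs 3 extra bits. The sketch below only checks that arithmetic. It assumes two's-complement inputs, and the helper names are hypothetical.

```python
def signed_range(bits):
    """Inclusive (min, max) of a two's-complement word of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits_signed(value, bits):
    lo, hi = signed_range(bits)
    return lo <= value <= hi


INPUT_BITS = 8      # from the slide
INTERNAL_BITS = 11  # from the slide
TAPS = 8            # 8-point transform

in_lo, in_hi = signed_range(INPUT_BITS)
worst_neg = TAPS * in_lo  # most negative unscaled sum of 8 inputs
worst_pos = TAPS * in_hi  # most positive unscaled sum of 8 inputs

fits_11 = fits_signed(worst_neg, INTERNAL_BITS) and fits_signed(worst_pos, INTERNAL_BITS)
fits_10 = fits_signed(worst_neg, INTERNAL_BITS - 1)
```

Under these assumptions the worst-case sums fit in 11 bits but not in 10.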

7 Implementations Part 2
8 registers of 8 bits each are used for coefficient storage, which is very efficient compared to the 64 registers required for an 8*8 DCT/IDCT computation.
2 RAMs, each with 64 locations (8 bits wide), are used.
RAMs are enabled in the order en_ram1_write -> (en_ram1_read, en_ram2_write) -> en_ram2_read
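
A small sketch suggests why a handful of coefficient registers can suffice. Assuming the standard orthonormal 8-point DCT-II matrix (the slides do not give the actual coefficient values or their 8-bit quantization), the 64 entries take only a few distinct magnitudes, with the signs handled separately. The names below are hypothetical.

```python
import math

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II coefficient matrix (assumed normalization)."""
    c = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n)])
    return c


C = dct_matrix()
# Collapse entries that differ only in sign; round to absorb float noise.
magnitudes = sorted({round(abs(v), 6) for row in C for v in row})
num_distinct = len(magnitudes)  # 7 distinct magnitudes among 64 entries
```

Seven distinct magnitudes fit in the 8 registers mentioned on the slide. How the design actually maps coefficients to registers is not stated in the source.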

8 Performance 1
Serial In (1 pixel at a time)
Read 8 inputs = 8 cycles
Register 8 inputs + sign extension = 1 cycle
Add/Sub = 1 cycle
Absolute value = 1 cycle
Multiplication = 1 cycle
Final addition = 2 cycles
Total = 14 cycles

9 Performance 2
2 Parallel In (2 pixels at a time)
Register 8 inputs + sign extension = 4 cycles
Add/Sub = 1 cycle
Absolute value = 1 cycle
Multiplication = 1 cycle
Final addition = 2 cycles
Total = 9 cycles

10 Performance 3
4 Parallel In (4 pixels at a time)
Register 8 inputs + sign extension = 2 cycles
Add/Sub = 1 cycle
Absolute value = 1 cycle
Multiplication = 1 cycle
Final addition = 2 cycles
Total = 7 cycles

11 Performance 4
8 Parallel In (8 pixels at a time)
Register 8 inputs + sign extension = 1 cycle
Add/Sub = 1 cycle
Absolute value = 1 cycle
Multiplication = 1 cycle
Final addition = 2 cycles
Total = 6 cycles
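
The per-slide totals for the four input widths can be reproduced with a small cycle-count model. This is a sketch that tabulates the numbers on the Performance slides and is not derived from the Verilog source. The function and dictionary names are hypothetical.

```python
# Stage latencies common to all four implementations (from the slides).
COMPUTE_STAGES = {
    "add_sub": 1,
    "absolute_value": 1,
    "multiplication": 1,
    "final_addition": 2,
}


def load_cycles(pixels_per_cycle):
    """Cycles to bring 8 inputs into the registers, per the slides."""
    if pixels_per_cycle == 1:
        # Serial In: 8 read cycles plus 1 register/sign-extension cycle.
        return 8 + 1
    # Parallel In: registering and sign extension take 8 / width cycles.
    return 8 // pixels_per_cycle


def total_cycles(pixels_per_cycle):
    return load_cycles(pixels_per_cycle) + sum(COMPUTE_STAGES.values())


totals = {p: total_cycles(p) for p in (1, 2, 4, 8)}
# totals == {1: 14, 2: 9, 4: 7, 8: 6}
```

The compute stages cost 5 cycles in every variant, so the input width only changes the load phase.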

12 Synthesis
Target Platform: ALTERA Cyclone IV GX FPGA
Tool Used: Quartus II
Language Used: Verilog

13 Results 1
Serial In has the lowest synthesized combinational area because it needs the fewest wires to feed in the data.

14 Results 2
Serial In has the lowest synthesized area because it requires the fewest storage elements and counters to process the data.

15 Results 3
8 Parallel In takes 236 cycles, in contrast to 246 for Serial In.

16 Conclusion
Serial In occupies ~6% less area than 8 Parallel In, with a comparatively lower performance degradation (~4%).
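
The ~4% figure is consistent with the cycle counts reported in Results 3, as a quick check shows. It assumes the degradation is measured relative to 8 Parallel In. Measuring against Serial In instead gives about 4.1%, which rounds the same way. The variable names are hypothetical.

```python
cycles_serial = 246  # Serial In, from Results 3
cycles_8pi = 236     # 8 Parallel In, from Results 3

extra_cycles = cycles_serial - cycles_8pi
# Slowdown of Serial In relative to 8 Parallel In (assumed baseline).
slowdown_pct = 100.0 * extra_cycles / cycles_8pi
```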


