Slide 1: Automatic Generation of Customized Discrete Fourier Transform IPs
Grace Nordin, Peter A. Milder, James C. Hoe, Markus Püschel
Carnegie Mellon University
This project is supported in part by NSF awards ITR/NGS-0325687 and SYS-0310941 and a DARPA DESA program.
www.spiral.net
Slide 2: The Paradox of Reusable IPs
Boon to productivity:
- zero effort required
- zero knowledge required
- zero chance to introduce new bugs
- Why repeat what has already been done?
Bane to optimality:
- finding the right functionality with the right interface
- design tradeoffs: performance, area, power, accuracy, ...
- Are you getting what you really wanted?
Solution: parameterized automatic IP generators
- zero effort, knowledge, or bugs
- allow application-specific customization
- facilitate design exploration
Slide 3: Our Work: Discrete Fourier Transform IPs
Discrete Fourier Transform (DFT):
- an important building block in DSP applications
- numerous design “cores” available
Current IP libraries support:
- various sizes, number formats, and data orderings
- only a small number of microarchitecture choices (Xilinx LogiCore DFT offers 3)
We generate IPs with custom design tradeoffs:
- degree of parallelism in the microarchitecture (from minimum to maximum)
- resource preference (e.g., BRAM vs. slices in FPGAs)
Extensible to other common linear DSP transforms
Slide 4: Outline
- Introduction
- Formula-Driven Design Generation
- Microarchitecture Parameterization
- Generator User Interface
- Experimental Results
- Conclusions
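Slide 5: Transforms as Formulas [www.spiral.net]
- Transform computation is represented as a matrix-vector multiplication, which takes $O(n^2)$ operations.
- “Fast” algorithms factor the transform into a product of structured sparse matrices, which takes $O(n \log n)$ operations.
- DFT: $\mathrm{DFT}_n = [\omega_n^{kl}]_{0 \le k,l < n}$, with $\omega_n = e^{-2\pi i/n}$
- FFT (Cooley-Tukey): $\mathrm{DFT}_{mn} = (\mathrm{DFT}_m \otimes I_n)\, T^{mn}_n\, (I_m \otimes \mathrm{DFT}_n)\, L^{mn}_m$, where $T^{mn}_n$ is a diagonal matrix of twiddle factors and $L^{mn}_m$ is a stride permutation
- The datapath is easily formed from the factorized formula.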
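To make the $O(n^2)$ vs. $O(n \log n)$ contrast concrete, here is a minimal Python sketch (an illustration, not SPIRAL's generator) that computes the DFT once as a dense matrix-vector product and once with a textbook recursive radix-2 Cooley-Tukey FFT. The function names are hypothetical, and the multiplication counts include trivial twiddle factors and ignore additions.

```python
import cmath

def dft_direct(x):
    """DFT as a dense matrix-vector product: n*n complex multiplications."""
    n = len(x)
    return [sum(cmath.exp(-2j * cmath.pi * k * l / n) * x[l] for l in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle multiplication
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def mults_direct(n):
    """Complex multiplications of the dense matrix-vector product."""
    return n * n

def mults_fft(n):
    """n/2 twiddle multiplications per level, log2(n) levels."""
    return (n // 2) * (n.bit_length() - 1)

x = [complex(i) for i in range(8)]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(x), dft_direct(x)))
print(mults_direct(1024), mults_fft(1024))  # 1048576 5120
```

For $n = 1024$ the dense product needs about 200 times more multiplications than the factored form, which is why the generator starts from a factorization rather than from the matrix.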
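Slide 6: Formula to Datapath
Given a formula, each construct maps to a datapath element:
- $A \cdot B$: apply $B$, then $A$
- $I_m \otimes A$: apply $A$, $m$ times in parallel
- $P$, where $P$ is a permutation: permute
- $D$, where $D$ is a diagonal: scale
[Figure: example datapath blocks for each construct]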
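A minimal Python sketch of the four constructs as functions on vectors, composed into a 4-point Cooley-Tukey FFT. The helpers `product`, `parallel`, `permute`, and `scale` are hypothetical names for this illustration, not SPIRAL's API. Because the slide lists only $I_m \otimes A$, the factor $\mathrm{DFT}_2 \otimes I_2$ is built here by conjugating $I_2 \otimes \mathrm{DFT}_2$ with the stride permutation $[0, 2, 1, 3]$, which is its own inverse for $n = 4$.

```python
import cmath

def product(*factors):
    """Formula F1 . F2 . ... . Fk: apply Fk first and F1 last (right to left)."""
    def apply(x):
        for f in reversed(factors):
            x = f(x)
        return x
    return apply

def parallel(m, a, size):
    """I_m (x) A: apply A independently to m contiguous blocks of length `size`."""
    def apply(x):
        out = []
        for i in range(m):
            out += a(x[i * size:(i + 1) * size])
        return out
    return apply

def permute(perm):
    """Permutation: output element i is input element perm[i]."""
    return lambda x: [x[j] for j in perm]

def scale(d):
    """Diagonal: multiply element i by d[i]."""
    return lambda x: [di * xi for di, xi in zip(d, x)]

def dft2(v):
    """2-point butterfly, DFT_2."""
    return [v[0] + v[1], v[0] - v[1]]

P = permute([0, 2, 1, 3])         # stride permutation for n = 4 (self-inverse)
T = scale([1, 1, 1, -1j])         # twiddle factors: diag(1, 1, 1, omega_4), omega_4 = -i
I2_dft2 = parallel(2, dft2, 2)    # I_2 (x) DFT_2
dft2_I2 = product(P, I2_dft2, P)  # DFT_2 (x) I_2, obtained by conjugation with P

# DFT_4 = (DFT_2 (x) I_2) . T . (I_2 (x) DFT_2) . P
dft4 = product(dft2_I2, T, I2_dft2, P)

expected = [10, -2 + 2j, -2, -2 - 2j]  # DFT of [1, 2, 3, 4]
assert all(abs(a - b) < 1e-12 for a, b in zip(dft4([1, 2, 3, 4]), expected))
```

Reading `product(...)` left to right mirrors the written formula, while the data flows right to left, exactly as “apply $B$, then $A$” states.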
Slide 7: Outline (same as Slide 4)
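Slide 8: Simple Regular Structure Embodied in the Formula
Example: Pease DFT for $n = 2^k$
- $k$ stages, each consisting of a diagonal (twiddle factors), $n/2$ parallel 2-point butterflies ($I_{n/2} \otimes \mathrm{DFT}_2$), and a permutation
[Figure: datapath with stages 1, 2, and 3]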
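For reference, one factorization with exactly this stage structure is the constant-geometry, decimation-in-frequency form below. It is given as an illustration and may differ in detail from the Pease formula on the slide; the symbols $S_n$ (perfect shuffle), $D_s$ (per-stage diagonal), $M_s$ (one stage), and $R_n$ (bit reversal) are notation assumed for this sketch. All $k$ stages share the same permutation and butterflies; only $D_s$ changes.

```latex
\begin{aligned}
\mathrm{DFT}_n &= R_n\, M_{k-1} \cdots M_1 M_0, \qquad n = 2^k, \\
M_s &= D_s\,(I_{n/2} \otimes \mathrm{DFT}_2)\, S_n, \qquad s = 0, \dots, k-1, \\
(S_n x)_{2q} &= x_q, \quad (S_n x)_{2q+1} = x_{q+n/2}, \qquad 0 \le q < n/2, \\
(D_s)_{2q,2q} &= 1, \quad (D_s)_{2q+1,2q+1} = \omega_n^{\,2^s \lfloor q/2^s \rfloor}, \qquad \omega_n = e^{-2\pi i/n}, \\
(R_n y)_j &= y_{\mathrm{rev}_k(j)}, \quad \mathrm{rev}_k(j) = j \text{ with its } k \text{ bits reversed}.
\end{aligned}
```

Dropping $R_n$ leaves the result in bit-reversed order, which is the output ordering used in the evaluation.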
Slide 9: Pease DFT Example: $\mathrm{DFT}_8$
[Figure: three-stage datapath (stage 1, stage 2, stage 3); x marks the twiddle-factor multipliers]
- The formula is applied from right to left; the datapath is built from left to right.
- Repeating column structure: hardware reuse without performance penalty
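A minimal Python sketch of a constant-geometry radix-2 FFT in the spirit of the Pease structure (a decimation-in-frequency variant chosen for this illustration, not the generator's exact formula). Every stage has the same read pattern (q, q + n/2), the same butterfly, and the same write pattern (2q, 2q + 1); only the twiddle factors change with the stage index s. That is the repeating column structure that lets one stage of hardware be reused. The function names are hypothetical, and the output is in bit-reversed order.

```python
import cmath

def bit_reverse(i, k):
    """Reverse the k low-order bits of i."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def stage(x, s):
    """One constant-geometry stage s (0-based): read pairs, butterfly, twiddle, write."""
    n = len(x)
    half = n // 2
    y = [0j] * n
    for q in range(half):
        a, b = x[q], x[q + half]                              # fixed read stride n/2
        w = cmath.exp(-2j * cmath.pi * ((q >> s) << s) / n)   # diagonal (twiddle factor)
        y[2 * q] = a + b                                      # 2-point butterfly
        y[2 * q + 1] = (a - b) * w
    return y

def pease_like_fft(x):
    """Apply the same stage k = log2(n) times; the result is bit-reversed."""
    n = len(x)
    k = n.bit_length() - 1
    assert n == 1 << k, "length must be a power of two"
    for s in range(k):
        x = stage(x, s)
    return x

def dft_direct(x):
    """Reference DFT by the definition, in natural order."""
    n = len(x)
    return [sum(cmath.exp(-2j * cmath.pi * j * m / n) * x[j] for j in range(n))
            for m in range(n)]

x = [complex(v) for v in range(8)]
y = pease_like_fft(x)
ref = dft_direct(x)
assert all(abs(y[i] - ref[bit_reverse(i, 3)]) < 1e-9 for i in range(8))
```

In hardware terms, the loop over `s` is what horizontal folding turns into reuse of a single stage.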
Slide 10: Horizontal Folding
[Figure: the repeated stage implemented once and reused for all stages; labels: input, bypass, register]
- Our baseline design
- Degree of freedom: vertical parallelism, parameter $p$
Slide 11: Vertical (V-)Folding According to $p$
[Figure: latency vs. cost as $p$ varies]
- Fine-grained control over the cost/latency tradeoff
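A rough first-order model of the folding tradeoff, under simplifying assumptions that are not taken from the slides: horizontal folding reuses one stage for all $k = \log_2 n$ stages, each of the $p$ radix-2 butterfly units processes one butterfly per cycle, and pipeline fill, memory, and permutation overheads are ignored. The function name and dictionary fields are hypothetical.

```python
def folded_schedule(n, p):
    """First-order cycle estimate for a horizontally and vertically folded radix-2 DFT.

    Assumptions (illustrative only): one stage of hardware is reused for all
    k = log2(n) stages, p butterfly units each handle one butterfly per cycle,
    and pipeline fill, memory, and permutation costs are ignored.
    """
    k = n.bit_length() - 1
    if n != 1 << k or k == 0:
        raise ValueError("n must be a power of two, n >= 2")
    if p < 1 or p > n // 2 or p & (p - 1):
        raise ValueError("p must be a power of two with 1 <= p <= n/2")
    cycles_per_stage = (n // 2) // p  # n/2 butterflies per stage, p per cycle
    return {
        "butterfly_units": p,
        "stages": k,
        "cycles_per_stage": cycles_per_stage,
        "cycles_per_transform": k * cycles_per_stage,
    }

for p in (1, 2, 4, 8, 16, 32):
    print(p, folded_schedule(1024, p)["cycles_per_transform"])
```

In this model, each doubling of $p$ doubles the butterfly hardware and halves the cycles per transform, so powers of two between 1 and $n/2$ give a ladder of cost/latency design points.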
Slide 12: Outline (same as Slide 4)
Slide 13: User Interface
http://www.spiral.net/hardware/dftgen.html
[Screenshot: common DFT options; customization options]
Slide 14: Outline (same as Slide 4)
Slide 15: Evaluation
We compare Xilinx’s fixed design against our variable generated designs.
Baseline: Xilinx LogiCore DFT ver. 3.1, radix-4, burst I/O interface

                            Xilinx                           SPIRAL
datapath                    fixed, one radix-4 basic block   variable, p radix-2 basic blocks
cost-performance tradeoff   fixed                            user-controlled, varies with p

Comparison setup:
- DFT sizes n = {64, 1024, 2048}; data width = 16 bits; bit-reversed output
- Xilinx ISE ver. 6.1, Xilinx Virtex-II Pro XC2VP100-6
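Slide 16: $\mathrm{DFT}_{1024}$ Relative to Xilinx
[Figure: logic, storage, and performance of the generated designs, normalized to the Xilinx design]
Xilinx = 1.0: logic = 1955 slices; storage = 7 BRAMs; performance = 1 transform / 5.6 µs
- Performance and resources scale with $p$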
Slide 17: Resource Usage Preferences
[Figure: three panels plotted against p = 1, 2, 4, 8, 16, 32: relative slices, relative BRAMs, and speedup; Xilinx = 1.0, same baseline as Slide 16]
Slide 18: Resource Usage Preferences (continued)
- Can control the tradeoff between slices and BRAMs: exchange BRAMs for slices with very little change in performance
[Figure: logic, storage, and performance normalized to Xilinx, same baseline as Slide 16]
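Slide 19: $\mathrm{DFT}_{64}$ and $\mathrm{DFT}_{2048}$
[Figure: logic, storage, and performance of the generated designs, normalized to Xilinx]
- $\mathrm{DFT}_{2048}$, Xilinx = 1.0: 2140 slices; 7 BRAMs; 1 transform / 24.578 µs
- $\mathrm{DFT}_{64}$, Xilinx = 1.0: 1743 slices; 8 BRAMs; 1 transform / 0.648 µs
- Trends hold for sizes 64 and 2048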
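The plots normalize every generated design to the Xilinx LogiCore design (1.0), with performance meaning throughput rather than time. A small Python sketch of that normalization, using the baseline figures listed on Slides 16 and 19; it assumes the 64-point baseline is 1743 slices, 8 BRAMs, and 0.648 µs per transform, and the helper names are hypothetical.

```python
# Xilinx LogiCore DFT baselines as listed on the slides:
# n -> (slices, BRAMs, microseconds per transform)
XILINX_BASELINE = {
    64: (1743, 8, 0.648),
    1024: (1955, 7, 5.6),
    2048: (2140, 7, 24.578),
}

def relative(n, slices, brams, us_per_transform):
    """Normalize a design to the Xilinx baseline for size n (baseline = 1.0).

    Performance is throughput, so it is the baseline time divided by the design's time.
    """
    base_slices, base_brams, base_us = XILINX_BASELINE[n]
    return {
        "slices": slices / base_slices,
        "brams": brams / base_brams,
        "performance": base_us / us_per_transform,
    }

def msamples_per_second(n, us_per_transform):
    """Throughput in millions of samples per second (n samples per transform)."""
    return n / us_per_transform

for n, (_, _, us) in sorted(XILINX_BASELINE.items()):
    print(n, round(msamples_per_second(n, us), 1))
```

A design that finishes a 1024-point transform in half the baseline time therefore plots at 2.0 on the performance axis, regardless of how many slices or BRAMs it spends.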
Slide 20: Related Work
Kumhom, Johnson, Nagvajara, ASIC/SOC 2000:
- universal FFT processor
- microarchitecture based on processing elements interconnected by an on-chip reconfigurable network
- microarchitecture is scalable in the number of elements
- supports both Cooley-Tukey and Pease
Choi, Scrofano, Prasanna, Jang, FPGA 2003:
- mapped the radix-4 Cooley-Tukey algorithm onto $\log_2(n)/2$ $\mathrm{DFT}_4$ primitives
- scalable datapath processing between 1 and 4 elements at a time
- show energy and performance improvements from scaling
Slide 21: Conclusions
Parameterized DFT IP generator:
- formula-driven: matrix-formula-driven synthesis
- performance/cost tradeoff: fine-grained control over resources vs. latency
- resource usage preference: can balance the tradeoff between slices and BRAMs
Key results:
- efficient: the Xilinx design point can be matched
- customizable: design tradeoffs are directly controllable
- easy to use: simple yet powerful web interface
Slide 22: Web Generator
http://www.spiral.net/hardware/dftgen.html
This work is part of the SPIRAL project, which aims to push the limits of automation in software and hardware development for DSP algorithms. For more information, visit www.spiral.net.
Slide 23: V-Folding According to $p$ (continued)
[Figure: index sequences 0 1 2 3 4 5 6 7 and 0 4 1 5 2 6 3 7 (n = 8), with the sequences 6 4 2 0 and 7 5 3 1]
- $p$ ranges from $p_{min} = 1$ to $p_{max} = n/2$
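The second index sequence in the figure is the perfect shuffle of the first: the halves 0 to 3 and 4 to 7 interleaved. A small Python sketch of that permutation, plus a hypothetical streaming model in which a V-folded datapath with parameter $p$ receives $2p$ words per cycle, word $i$ on lane $i \bmod 2p$. Under that assumption, $p = 1$ yields the lane streams 0 2 4 6 and 1 3 5 7, which is one plausible reading of the figure's 6 4 2 0 and 7 5 3 1 if time runs right to left. The function names are illustrative only.

```python
def perfect_shuffle(seq):
    """Interleave the first and second halves: [0..7] -> [0, 4, 1, 5, 2, 6, 3, 7]."""
    h = len(seq) // 2
    out = []
    for a, b in zip(seq[:h], seq[h:]):
        out += [a, b]
    return out

def stream_lanes(seq, p):
    """Hypothetical streaming model: 2p lanes, word i on lane i % (2p) at cycle i // (2p)."""
    lanes = 2 * p
    if len(seq) % lanes:
        raise ValueError("sequence length must be a multiple of 2p")
    return [seq[lane::lanes] for lane in range(lanes)]

n = 8
print(perfect_shuffle(list(range(n))))       # [0, 4, 1, 5, 2, 6, 3, 7]
print(stream_lanes(list(range(n)), 1))       # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(stream_lanes(list(range(n)), n // 2))  # p_max = n/2: all 8 words in a single cycle
```

At $p_{min} = 1$ a permutation must be realized across time using memory; at $p_{max} = n/2$ it is pure wiring, which is why permutations need their own V-folding treatment.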
Slide 24: V-Folding of Permutations [Takala et al., ICASSP 2001]