Automatic Generation of Customized Discrete Fourier Transform IPs Grace Nordin, Peter A. Milder, James C. Hoe, Markus Püschel Carnegie Mellon University.

This project is supported in part by NSF awards ITR/NGS-0325687 and SYS-0310941 and a DARPA DESA program.
www.spiral.net

The Paradox of Reusable IPs
- Boon to productivity: why repeat what has already been done?
  - zero effort required
  - zero knowledge required
  - zero chance to introduce new bugs
- Bane to optimality: are you getting what you really wanted?
  - finding the right functionality with the right interface
  - design tradeoffs: performance, area, power, accuracy, ...
- Solution: parameterized automatic IP generators
  - zero effort, knowledge, or bugs
  - allows application-specific customization
  - facilitates design exploration

Our Work: Discrete Fourier Transform IPs
- Discrete Fourier Transform (DFT)
  - important building block in DSP applications
  - numerous design “cores” available
- Current IP libraries support:
  - various sizes, number formats, data orderings
  - only a small number of microarchitecture choices (Xilinx LogiCore DFT gives 3 choices)
- We generate IPs with custom design tradeoffs
  - degree of parallelism in microarchitecture (min to max)
  - resource preference (e.g., BRAM vs. slices in FPGAs)
- Extensible to other common linear DSP transforms

Outline
- Introduction
- Formula-Driven Design Generation
- Microarchitecture Parameterization
- Generator User Interface
- Experimental Results
- Conclusions

Transforms as Formulas [www.spiral.net]
- Transform computation is represented as matrix-vector multiplication
  - matrix-vector multiplication is O(n^2) operations
- “Fast” algorithms factor the transform into a sequence of structured sparse matrices
  - O(n log n) operations
- [Formulas on slide: the DFT definition and its FFT factorization]
- Datapath easily formed from factorized formulas
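The operation counts above can be made concrete with a minimal sketch. It assumes the usual DFT definition with w = exp(-2*pi*i/n) and a radix-2 FFT with (n/2) log2(n) butterflies; the function names are illustrative and not part of the generator.

```python
import numpy as np

def dft_matrix(n):
    """Dense n-by-n DFT matrix with entries w**(k*l), w = exp(-2*pi*i/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def dft_by_definition(x):
    """Transform as a plain matrix-vector product: n*n complex multiplications."""
    x = np.asarray(x, dtype=complex)
    return dft_matrix(len(x)) @ x

def op_counts(n):
    """Rough counts for n a power of two: (dense multiplications, radix-2 butterflies)."""
    stages = n.bit_length() - 1            # log2(n)
    return n * n, (n // 2) * stages

x = np.arange(8, dtype=float)
# The dense product agrees with a library FFT; only the amount of work differs.
assert np.allclose(dft_by_definition(x), np.fft.fft(x))
print(op_counts(1024))                     # (1048576, 5120)
```

For n = 1024 the dense product needs about a million multiplications, while the factorized form needs 5120 butterflies, which is what makes the factorization attractive as a datapath.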

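Formula to Datapath
Given a transform formula (the slide's matrix symbols were images and did not survive extraction), each construct maps to hardware:
- a product of two matrices: apply one, then the other
- a permutation: permute
- a block replicated in parallel: apply the block that many times in parallel
- a diagonal: scale
[Figure: example datapaths built from blocks A and B and constant multipliers ×4, ×2, ×7, ×8]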
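The mapping from formula constructs to datapath pieces can be sketched as small composable functions. This is an illustration only: the helper names (`compose`, `permute`, `parallel`, `scale`) are hypothetical, and the parallel construct is assumed to be the tensor product of an identity with a block, checked here against `np.kron`.

```python
import numpy as np

F2 = np.array([[1, 1], [1, -1]], dtype=complex)    # DFT_2 butterfly

def compose(A, B):
    """Product A*B: apply B first, then A."""
    return lambda x: A(B(x))

def permute(perm):
    """Permutation: output i takes input perm[i] (wiring only, no arithmetic)."""
    perm = np.asarray(perm)
    return lambda x: x[perm]

def parallel(A, m):
    """I_m tensor A: m independent copies of block A working side by side."""
    rows = A.shape[0]
    return lambda x: (x.reshape(m, rows) @ A.T).reshape(-1)

def scale(d):
    """Diagonal matrix: multiply element i by the constant d[i]."""
    d = np.asarray(d)
    return lambda x: d * x

# Example: (I_4 tensor DFT_2) * D * P on 8 points, with arbitrary constants in D.
shuffle = [0, 4, 1, 5, 2, 6, 3, 7]
d = np.exp(-2j * np.pi * np.arange(8) / 8)
datapath = compose(parallel(F2, 4), compose(scale(d), permute(shuffle)))

# Same computation written as explicit matrices, for comparison.
matrix = np.kron(np.eye(4), F2) @ np.diag(d) @ np.eye(8)[shuffle]
x = np.arange(8, dtype=complex)
assert np.allclose(datapath(x), matrix @ x)
```

Reading the composition from the inside out mirrors building the datapath from left to right: permutation wiring first, then the multipliers, then the parallel butterflies.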

Simple regular structure embodied in formula
Example: Pease DFT
[Formula annotations: diagonal, permutation, parallel butterflies, k stages]
[Figure labels: stage 1, stage 2, stage 3]

Pease DFT Example: DFT 8
[Figure: DFT 8 datapath with three columns of butterflies (x): stage 1, stage 2, stage 3]
- formula is applied from right to left
- datapath is built left to right
- Repeating column structure: hardware reuse without performance penalty
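The repeating column structure can be illustrated in software with a constant-geometry radix-2 FFT, where every stage uses the same addressing and only the twiddle constants change. This is a sketch of one decimation-in-frequency variant with natural-order input and bit-reversed output; the ordering of factors inside a stage and the twiddle indexing are assumptions and may differ from the formula on the slide.

```python
import numpy as np

def pease_stage(z, s):
    """One column: pair (i, i + n/2), butterfly, scale, write results to (2i, 2i + 1)."""
    n = len(z)
    half = n // 2
    i = np.arange(half)
    a, b = z[:half], z[half:]                       # same addressing in every stage
    w = np.exp(-2j * np.pi * ((i >> s) << s) / n)   # only the constants depend on s
    out = np.empty(n, dtype=complex)
    out[0::2] = a + b
    out[1::2] = (a - b) * w
    return out

def bit_reverse(n):
    """Index table that reverses the log2(n)-bit binary representation of each index."""
    k = n.bit_length() - 1
    return np.array([int(format(i, f"0{k}b")[::-1], 2) for i in range(n)])

def pease_fft(x):
    """Apply the same stage log2(n) times; the result is in bit-reversed order."""
    z = np.asarray(x, dtype=complex)
    for s in range(len(z).bit_length() - 1):
        z = pease_stage(z, s)
    return z

x = np.arange(8, dtype=float)
assert np.allclose(pease_fft(x)[bit_reverse(8)], np.fft.fft(x))
```

Because `pease_stage` is identical for every `s` apart from its constants, a single hardware column can be reused for all stages, which is the idea behind the folding discussed next.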

Horizontal folding
- our baseline design
- degree of freedom: vertical parallelism p
[Figure: a single column of butterflies (x) reused for all stages, with input, bypass, and register; annotated with parameter p]

Vertical (V-)folding according to p
- Fine-grained control over cost/latency tradeoff
[Figure: folded datapaths for different values of p, arranged along latency and cost axes]
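A rough software model of vertical folding is sketched below: one reused stage computes only p butterflies per "cycle", so a stage takes (n/2)/p cycles. The cycle count is an idealized assumption (no pipelining, memory, or control overhead) and is not a timing figure from the generator; the FFT variant is the same constant-geometry form assumed earlier.

```python
import numpy as np

def folded_pease_fft(x, p):
    """Constant-geometry radix-2 FFT with p butterflies per modeled cycle.

    Returns (spectrum in bit-reversed order, number of modeled cycles).
    """
    z = np.asarray(x, dtype=complex)
    n = len(z)
    half, k = n // 2, n.bit_length() - 1
    assert n == 1 << k and 1 <= p <= half and half % p == 0
    cycles = 0
    for s in range(k):                              # the same column, reused k times
        out = np.empty(n, dtype=complex)
        for start in range(0, half, p):             # p butterflies at a time
            i = np.arange(start, start + p)
            a, b = z[i], z[i + half]
            w = np.exp(-2j * np.pi * ((i >> s) << s) / n)
            out[2 * i] = a + b
            out[2 * i + 1] = (a - b) * w
            cycles += 1
        z = out
    return z, cycles

x = np.arange(8, dtype=float)
z1, c1 = folded_pease_fft(x, 1)    # p min = 1: one butterfly, most cycles
z4, c4 = folded_pease_fft(x, 4)    # p max = n/2: a full column, fewest cycles
assert np.allclose(z1, z4)
print(c1, c4)                      # 12 3
```

In this model the butterfly count grows linearly with p while the cycle count shrinks as 1/p, which is the cost/latency tradeoff the parameter exposes.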

User Interface
http://www.spiral.net/hardware/dftgen.html
[Screenshot of the web generator form, with two groups of inputs: common DFT options and customization options]

Evaluation
We compare Xilinx’s fixed design against our variable generated designs.
- We compare against Xilinx LogiCore DFT Ver. 3.1
  - radix-4 burst I/O interface

                             Xilinx                            SPIRAL
  datapath                   fixed, one radix-4 basic block    variable, p radix-2 basic blocks
  cost-performance tradeoff  fixed                             user-controlled, varies with p

- Comparison
  - DFT n = {64, 1024, 2048}; width = 16; bit-reversed output
  - Xilinx ISE ver. 6.1, Xilinx Virtex2-Pro XC2VP100-6

DFT 1024 relative to Xilinx
- Performance and resources scale with p
- Xilinx baseline (1.0): logic = 1955 slices; storage = 7 BRAMs; performance = 1 / 5.6 µsec
[Plots: logic, storage, and performance relative to Xilinx as p varies]

Resource usage preferences
[Plots vs. p = 1, 2, 4, 8, 16, 32: relative slices (logic), relative BRAMs (storage), and speedup (performance), each with the Xilinx design marked]
- Xilinx baseline (1.0): logic = 1955 slices; storage = 7 BRAMs; performance = 1 / 5.6 µsec

Resource usage preferences
- Can control tradeoff between slices and BRAMs
- Exchange BRAM for slices: very little change in performance
- Xilinx baseline (1.0): logic = 1955 slices; storage = 7 BRAMs; performance = 1 / 5.6 µsec

DFT 64 and DFT 2048
- Trends hold for sizes 64, 2048
- DFT 2048, Xilinx baseline (1.0): 2140 slices; 7 BRAMs; 1 transform / 24.578 µsec
- DFT 64, Xilinx baseline (1.0): 1743 slices; 8 BRAMs; 1 transform / 0.648 µsec

Related Work
- Kumhom, Johnson, Nagvajara, ASIC/SOC 2000
  - universal FFT processor microarchitecture based on processing elements interconnected by an on-chip reconfigurable network
  - microarchitecture is scalable in the number of elements
  - supports both Cooley-Tukey and Pease
- Choi, Scrofano, Prasanna, Jang, FPGA’2003
  - mapped radix-4 Cooley-Tukey algorithm onto log_2(n)/2 DFT_4 primitives
  - scalable datapath between 1 element and 4 elements at a time
  - show energy and performance improvements from scaling

Conclusions
- Parameterized DFT IP generator
  - matrix formula-driven synthesis
  - performance/cost tradeoff: fine-grained control over resources vs. latency
  - resource usage preference: can balance tradeoff between slices and BRAM
- Key results
  - efficient: the Xilinx design point can be matched
  - customizable: design tradeoffs directly controllable
  - easy to use: simple yet powerful web interface

Web Generator
http://www.spiral.net/hardware/dftgen.html
SPIRAL
This work is part of the SPIRAL project, which aims to push the limits of automation in software and hardware development for DSP algorithms. For more information visit: www.spiral.net

V-folding according to p (continued)
[Figure: the shuffle permutation 0 1 2 3 4 5 6 7 -> 0 4 1 5 2 6 3 7, shown unfolded and as streams (6 4 2 0 and 7 5 3 1), for degrees of folding from p max = n/2 down to p min = 1]
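The index pattern in this figure is the perfect shuffle, and a small sketch shows why folding it needs storage: when only p words move per cycle, the words that leave together did not arrive together. The function names are hypothetical, natural-order input streaming is an assumption, and this is not the construction from the cited work on folded permutations.

```python
def perfect_shuffle(n):
    """Output position i reads input (i % 2) * n/2 + i // 2, i.e. 0, n/2, 1, n/2 + 1, ..."""
    half = n // 2
    return [(i % 2) * half + i // 2 for i in range(n)]

def streamed(perm, p):
    """Split the output order into groups of p words, one group per cycle."""
    return [perm[c:c + p] for c in range(0, len(perm), p)]

def input_cycles_needed(n, p):
    """For each output cycle, the input cycles its words arrived in (inputs stream p per cycle)."""
    return [sorted({j // p for j in group}) for group in streamed(perfect_shuffle(n), p)]

print(perfect_shuffle(8))            # [0, 4, 1, 5, 2, 6, 3, 7]
print(streamed(perfect_shuffle(8), 2))
# [[0, 4], [1, 5], [2, 6], [3, 7]]
print(input_cycles_needed(8, 2))
# [[0, 2], [0, 2], [1, 3], [1, 3]]
```

With p = 2, the first output pair needs words from input cycles 0 and 2, so earlier words have to be buffered until their partners arrive; with p = n/2 the two halves are available within two cycles and far less reordering across time is required.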

V-Folding of Permutations [Takala, et al. ICASSP’2001]
[Formula and its definitions were images on the slide and are not preserved]
