A Computing Origami: Folding Streams in FPGAs S. M. Farhad PhD Student University of Sydney DAC 2009, California, USA.


1 A Computing Origami: Folding Streams in FPGAs
S. M. Farhad, PhD Student, University of Sydney
DAC 2009, California, USA

2 Outline
- Motivation
  - Stream programming
  - FPGA
  - Problem
- Stream Folding
- Results
- Conclusion

3 Stream Programming Paradigm
- Programs are expressed as stream graphs
  - Streams: sequences of data elements
  - Actors: functions applied to streams
- Independent actors with explicit communication
- Regular and repeating computation
[Figure: stream graph labeled "Actor/Filter" and "Streams"]
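To make the stream-graph model concrete, here is a minimal Python sketch that computes how many times each actor fires in one steady-state iteration of a linear pipeline. It assumes synchronous-dataflow-style fixed pop/push rates per actor (StreamIt-like), which the slide does not state, and the function name `steady_state_runs` is hypothetical. Per-actor run counts of this kind are what the folding algorithm later refers to as S.runs(f).

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def steady_state_runs(rates):
    """Smallest firing counts for a linear pipeline of actors (assumed SDF model).

    rates: list of (pop, push) pairs, one per actor, in pipeline order.
    Each channel must balance: runs[i] * push[i] == runs[i + 1] * pop[i + 1].
    """
    runs = [Fraction(1)]
    for (_, push), (pop, _) in zip(rates, rates[1:]):
        runs.append(runs[-1] * push / pop)
    scale = lcm(*(r.denominator for r in runs))  # clear every denominator
    ints = [int(r * scale) for r in runs]
    g = reduce(gcd, ints)                        # keep the smallest solution
    return [x // g for x in ints]


# Hypothetical pipeline: source pushes 2, filter pops 3 and pushes 1, sink pops 2.
print(steady_state_runs([(0, 2), (3, 1), (2, 0)]))  # [3, 2, 1]
```

Here the source fires 3 times (6 tokens), the filter 2 times (consuming 6, producing 2) and the sink once (consuming 2), so every channel is balanced.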

4 FPGA
- FPGAs are widely available as programmable coprocessors
- Opportunities to exploit FPGA-based acceleration:
  - Multimedia, networking, graphics, and security codes

5 Problem
- Maximizing throughput subject to
  - Area and latency constraints
- Resolving bottleneck actors
  - The replicated filters do not require resynthesis
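One way to write this problem as an optimization is sketched below. The notation is ours, not the talk's: $r_f$ is the number of replicas of filter $f$ in the stream graph $S$, $a_f$ is its area, AREA and LAT are the area and latency budgets, and stateful filters are assumed to be non-replicable (consistent with the `f.hasState` limit in the folding algorithm).

```latex
\max_{\mathbf{r} \in \mathbb{Z}_{\ge 1}^{|S|}} \; \mathrm{Throughput}(\mathbf{r})
\quad \text{s.t.} \quad
\sum_{f \in S} a_f \, r_f \le \mathrm{AREA}, \qquad
\mathrm{Latency}(\mathbf{r}) \le \mathrm{LAT}, \qquad
r_f = 1 \ \text{if } f \text{ has state}.
```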

6 Motivating Example

7 (no text in transcript)

8 (no text in transcript)

9 Outline
- Motivation
  - Stream programming
  - FPGA
  - Problem
- Stream Folding
- Results
- Conclusion

10 Area/Throughput Design Folding
foreach Filter f in S do
    workFactor[f] = f.latency · S.runs(f);
    designPointArea += f.area · workFactor[f];
scaleLimit = min{ 1/workFactor[f] : f in S, f.hasState };
scaling = min(AREA / designPointArea, scaleLimit);
foreach Filter f in S do
    replication[f] = workFactor[f] · scaling;
while area(replication) > AREA do
    replication = reduceThroughput(replication);
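A runnable Python rendering of this pseudocode, offered as a sketch. The `Filter` fields and units, and in particular the `reduce_throughput` policy (drop one replica from the filter whose per-replica work stays smallest after removal), are assumptions: the slide does not say how reduceThroughput chooses. Replica counts are rounded up to integers, which is one reason the final area check can fire, and stateful filters are pinned to a single instance.

```python
import math
from dataclasses import dataclass


@dataclass
class Filter:
    name: str
    latency: int             # cycles per firing (assumed unit)
    area: int                # LUTs per instance (assumed unit)
    runs: int                # firings per steady-state iteration, i.e. S.runs(f)
    has_state: bool = False


def total_area(filters, rep):
    return sum(f.area * rep[f.name] for f in filters)


def reduce_throughput(filters, rep, work):
    # Hypothetical policy: remove one replica from the filter whose
    # per-replica work after the removal is smallest.
    shrinkable = [f for f in filters if rep[f.name] > 1]
    if not shrinkable:
        raise ValueError("AREA cannot fit one instance of every filter")
    f = min(shrinkable, key=lambda g: work[g.name] / (rep[g.name] - 1))
    return {**rep, f.name: rep[f.name] - 1}


def fold(filters, AREA):
    work = {f.name: f.latency * f.runs for f in filters}            # workFactor
    design_point_area = sum(f.area * work[f.name] for f in filters)
    scale_limit = min((1 / work[f.name] for f in filters if f.has_state),
                      default=math.inf)                            # stateful cap
    scaling = min(AREA / design_point_area, scale_limit)
    rep = {f.name: 1 if f.has_state else max(1, math.ceil(work[f.name] * scaling))
           for f in filters}
    while total_area(filters, rep) > AREA:
        rep = reduce_throughput(filters, rep, work)
    return rep


filters = [Filter("A", 2, 100, 1), Filter("B", 8, 50, 1),
           Filter("C", 1, 200, 2, has_state=True)]
print(fold(filters, 700))  # {'A': 1, 'B': 4, 'C': 1}
print(fold(filters, 400))  # {'A': 1, 'B': 2, 'C': 1}
```

With AREA = 700 the stateful filter C caps the scaling at 1/2, so B (the heaviest filter) gets four replicas and all three filters end up with the same per-replica work. With AREA = 400 the rounded-up design overshoots, and the loop trims B until the budget is met.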

11 Calculating Throughput
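The slide gives no formula, so the following is only an assumed model that is common for pipelined stream graphs. Each filter's per-replica work (its workFactor divided by its replica count) bounds how often a new iteration can start, the largest such value is the initiation interval, and throughput is its reciprocal. The function names are hypothetical.

```python
from fractions import Fraction


def initiation_interval(work, rep):
    # Assumed model: the slowest (bottleneck) filter sets the cycles per iteration.
    return max(Fraction(work[f], rep[f]) for f in work)


def throughput(work, rep):
    # Steady-state iterations per cycle under the same model.
    return 1 / initiation_interval(work, rep)


def bottleneck(work, rep):
    # Filter that limits the initiation interval.
    return max(work, key=lambda f: Fraction(work[f], rep[f]))


work = {"A": 2, "B": 8, "C": 2}                              # workFactor per filter
print(initiation_interval(work, {"A": 1, "B": 1, "C": 1}))  # 8
print(bottleneck(work, {"A": 1, "B": 1, "C": 1}))           # B
print(throughput(work, {"A": 1, "B": 4, "C": 1}))           # 1/2
```

Replicating the bottleneck B four times brings the interval down from 8 to 2 cycles, which is the "resolving bottleneck actors" idea from the Problem slide expressed in this toy model.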

12 Calculating Latency
- FPGAs coupled to host processors
- Initiation interval (DMA)
- Replication improves throughput, but it often increases latency!
- Major factors in latency variation:
  - Non-periodic data arrival
  - Data-token reordering
  - Local congestion
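The reordering effect can be illustrated with a small toy model of our own, not taken from the talk. Tokens are split round-robin across r replicas whose service times vary, and a join stage must emit tokens in their original order, so a fast token waits behind a slower earlier one. The arrival times, service times and round-robin split are assumptions chosen for illustration.

```python
def in_order_latencies(arrivals, service, r):
    """Per-token latency with a round-robin split over r replicas
    and an in-order join (toy model)."""
    free = [0] * r               # time at which each replica next becomes idle
    finish = []
    for i, (a, s) in enumerate(zip(arrivals, service)):
        k = i % r                # round-robin split
        start = max(a, free[k])  # wait if that replica is still busy
        free[k] = start + s
        finish.append(free[k])
    latencies, released = [], 0
    for a, d in zip(arrivals, finish):
        released = max(released, d)  # token i leaves only after all tokens < i
        latencies.append(released - a)
    return latencies


# One slow token (service 5) followed by fast ones (service 1), two replicas.
print(in_order_latencies([0, 1, 2, 3], [5, 1, 1, 1], r=2))  # [5, 4, 4, 3]
```

Tokens 1 to 3 need only one cycle of service, yet their latency is 3 to 4 cycles: token 1 waits for token 0 at the join, token 2 queues behind token 0 on replica 0 (local congestion), and token 3 waits for token 2.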

13 Latency-Constrained Design Folding
latConf = null; T = ∞;
while throughput(thrConf) ≤ T do
    if feasibleImprovement(thrConf) then
        candidates = simAnnealing(thrConf, T);
        foreach candidate in candidates do
            if throughput(candidate) < T then
                latConf = candidate;
                T = throughput(latConf);
    thrConf = reduceThroughput(thrConf);
return latConf
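A Python sketch of this loop, with every helper passed in as a function because the slide does not define them. It assumes `throughput` returns a cost where lower is better (such as the initiation interval), which is the reading under which starting from T = ∞ and accepting only `throughput(candidate) < T` makes sense. The stopping guard when `reduce_throughput` returns None and the toy helpers in the usage example (including a plain filter standing in for simulated annealing) are our assumptions.

```python
import math


def latency_constrained_fold(thr_conf, throughput, feasible_improvement,
                             sim_annealing, reduce_throughput):
    lat_conf, T = None, math.inf
    while throughput(thr_conf) <= T:
        if feasible_improvement(thr_conf):
            for candidate in sim_annealing(thr_conf, T):
                if throughput(candidate) < T:   # strictly better than best so far
                    lat_conf = candidate
                    T = throughput(lat_conf)
        thr_conf = reduce_throughput(thr_conf)
        if thr_conf is None:                    # assumed guard: nothing left to reduce
            break
    return lat_conf


# Toy model (hypothetical): a configuration is the replica count r of one
# bottleneck filter with work 12; II = 12 / r; latency = 10 + 3 * r; LAT = 20.
def ii(r):
    return 12 / r


def search(r, T):
    # Stand-in for simAnnealing: every latency-feasible configuration up to r.
    return [c for c in range(1, r + 1) if 10 + 3 * c <= 20]


def shrink(r):
    return r - 1 if r > 1 else None


print(latency_constrained_fold(6, ii, lambda r: True, search, shrink))  # 3
```

Starting from six replicas (II = 2, but latency 28), the search settles on three replicas, the largest count whose latency (19) stays within the assumed bound of 20.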

14 Results
Benchmark | Minimum area, best throughput and constrained design, each as LUTs / Latency / II (II = initiation interval; digits run together in the transcript) | Constraint | Run time
MatrixMult | 1498480197618185345581757 | Latency ≤ 175 | 1.14 s
Serpent | 3028102743878773230539014 | Latency ≤ 910 | 0.73 s
FFT | 23761011993433707642395308687 | AREA ≤ 40000 | 34.7 s
FMRadio | 374583713987564371136251137120 | AREA ≤ 65000 | 1.01 s
DCT | 4575234931372563491915043492 | AREA ≤ 120000 | 0.73 s
BitonicSort | 4392010423131760104214740012822 | AREA ≤ 50000 | 18.3 s
Synthetic | 350309135159905042149030947 | AREA ≤ 1500 | 0.43 s

15 Questions?

