High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov.


1 High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware Adrian Ludwin, Vaughn Betz & Ketan Padalia FPGA Seminar Presentation Nov 10, 2009

2 Overview: Motivation; review of simulated annealing; approaches; summary.

3 Motivation

4 Simulated Annealing Placement: Probabilistic approach to finding an optimal solution. Behavior: moves through the solution space both greedily and randomly. The balance between greediness and randomness is controlled by a temperature. The temperature evolves over time according to a cooling schedule.

5 Simulated Annealing Placement: For a single move, compute the change in cost, ΔC. Accept the move if ΔC < 0; if ΔC > 0, accept it with probability e^(-ΔC/T). Repeat while gradually decreasing T and the window size.
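The acceptance rule on slide 5 can be sketched in a few lines of Python. This is only an illustration on a toy one-dimensional placement, not the authors' placer: the names `accept_move`, `anneal`, `wirelength`, and `swap_two`, the geometric cooling factor `alpha`, and the temperature bounds are all assumptions, and the shrinking move window is left out for brevity.

```python
import math
import random


def accept_move(delta_cost, temperature, rng):
    """Accept improving moves always; accept a worsening move with
    probability exp(-delta_cost / temperature)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


def wirelength(order, nets):
    """Toy cost: total distance between connected cells placed on a line."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def swap_two(order, rng):
    """Toy move: swap the cells at two distinct random positions."""
    i, j = rng.sample(range(len(order)), 2)
    new_order = list(order)
    new_order[i], new_order[j] = new_order[j], new_order[i]
    return new_order


def anneal(cost, propose, state, t_start=10.0, t_end=1e-3, alpha=0.9,
           moves_per_temp=100, seed=0):
    """Annealing loop with a geometric cooling schedule (an assumed schedule)."""
    rng = random.Random(seed)  # a fixed seed keeps the run repeatable
    current_cost = cost(state)
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            candidate = propose(state, rng)
            delta = cost(candidate) - current_cost
            if accept_move(delta, t, rng):
                state, current_cost = candidate, current_cost + delta
        t *= alpha  # cool down: later moves become greedier
    return state, current_cost
```

A call such as `anneal(lambda o: wirelength(o, nets), swap_two, ["c", "a", "d", "b"])` with `nets = [("a", "b"), ("b", "c"), ("c", "d")]` returns the final ordering and its cost. At high `t` most worsening moves are accepted (random behavior); as `t` falls, the loop becomes nearly greedy.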

6 Constraints: Runs on commodity hardware. Good quality of results: robust. Determinism: bug reporting, consistent regression results.

7 Selected Previous Work: Closely related: move acceleration, parallel moves. Other methods: independent sets, partitioned placements, speculative.

8 Algorithm #1

9 Algorithm #2

10 Objective: Determine efficacy. Analyze runtime and categorize it: memory, synchronization, infrastructure, evaluation, proposal.

11 Methodology: Parallel-equivalent flow: a serial flow that mimics the parallel flow. It emulates the behavior of the multithreaded application using only one thread/core. Useful for comparison: accounts for infrastructure overhead.

12 Methodology: Attributing runtime. Two types of measurements: bottom-up (bu) measures each component of a move; end-to-end (e2e) measures the runtime of the entire run.
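The two measurement styles can be illustrated with a small Python sketch. It is not the instrumentation used in the presentation: `MoveProfiler`, `run`, and the component labels passed to `timed` are hypothetical names, and only two components (proposal and evaluation) are timed here.

```python
import time
from collections import defaultdict


class MoveProfiler:
    """Accumulates bottom-up time per component across many moves."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.totals = defaultdict(float)

    def timed(self, component, fn, *args):
        """Run fn(*args) and charge its elapsed time to `component`."""
        start = self.clock()
        result = fn(*args)
        self.totals[component] += self.clock() - start
        return result


def run(num_moves, propose, evaluate, profiler):
    """Return (end-to-end time, bottom-up sum, unattributed time)."""
    start = profiler.clock()
    for i in range(num_moves):
        move = profiler.timed("proposal", propose, i)
        profiler.timed("evaluation", evaluate, move)
    e2e = profiler.clock() - start
    bottom_up = sum(profiler.totals.values())
    # Whatever the bottom-up timers did not capture (loop and timer
    # overhead here) shows up as the gap between the two measurements.
    return e2e, bottom_up, e2e - bottom_up
```

Comparing the bottom-up sum with the end-to-end figure shows how much runtime is not explained by the instrumented components, which is one way to read the attribution charts on the following slides.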

13 Methodology

14 Test sets: Set of 11 Stratix® II FPGA benchmark designs: IP and customer circuits, 10k to 100k logic cells. Also tested on 40 Stratix II FPGA circuits, with similar results.

15 Results for Algorithm #1

16 Moves attribution

17 Overhead analysis

18 Observations: Theoretical speedup: 1.7x; measured: 1.3x (best). Increase in evaluation runtime, due to reduced cache locality. Proposal time is “hidden”.

19 Analysis: Time spent stalled is negligible. Evaluation accounts for most of the overhead. Little to gain by removing determinism: serial equivalency costs less than 3% of runtime.

20 Summary for Algorithm #1: Speedup: 1x to 1.3x. Memory inefficiency is the biggest bottleneck. In theory the algorithm should scale; however, it is difficult to partition and balance two stages.
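Why stage balance matters can be shown with a short calculation. The sketch below assumes that Algorithm #1 is the pipelined-moves approach contrasted with parallel moves in the later summary, and that each stage runs on its own core. The slides do not say how the 1.7x theoretical figure was derived, so the 41%/59% stage split in the usage note is a hypothetical value chosen only because it gives a similar number; `pipeline_speedup` and `parallel_efficiency` are illustrative names.

```python
def pipeline_speedup(stage_times):
    """Ideal steady-state speedup of a pipeline with one core per stage:
    serial time per move divided by the time of the slowest stage."""
    return sum(stage_times) / max(stage_times)


def parallel_efficiency(measured, theoretical):
    """Fraction of the theoretical speedup that was actually achieved."""
    return measured / theoretical
```

For example, `pipeline_speedup([0.5, 0.5])` gives the ideal 2x for two perfectly balanced stages, while a hypothetical split of `[0.41, 0.59]` gives roughly 1.7x because the slower stage sets the pace. With the figures on the slides, `parallel_efficiency(1.3, 1.7)` is about 0.76, so roughly a quarter of the theoretical gain is lost to overhead.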

21 Speedups for Algorithm #2

22 Attribution on 2 cores

23

24 Attribution on 4 cores

25 Attribution on 4 cores

26 Observations: Memory latency due to inter-processor communication; worsens with more cores.

27 Summary for Algorithm #2: Parallel moves scale better than pipelined moves. The bottleneck is still memory. Again, serial equivalency costs little.

28 Take Home Messages Memory is important Good algorithms are even more important

