1
High-Quality, Deterministic Parallel Placement for FPGAs on Commodity Hardware
Adrian Ludwin, Vaughn Betz & Ketan Padalia
FPGA Seminar Presentation, Nov 10, 2009
2
Overview
- Motivation
- Review of simulated annealing
- Approaches
- Summary
3
Motivation
4
Simulated Annealing Placement
- Probabilistic approach to finding an optimal solution
- Behavior: moves through the solution space both greedily and randomly
- The balance between greediness and randomness is controlled by a temperature
- The temperature evolves over time based on a cooling schedule
5
Simulated Annealing Placement
- For a single move, compute the change in cost ΔC
- Accept the move if ΔC < 0, or if ΔC > 0 with probability e^(−ΔC/T)
- Repeat while gradually decreasing T and the window size
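To make the accept/reject rule and cooling loop above concrete, here is a minimal, self-contained C++ sketch. The cost model, move generator, initial temperature, and cooling constant are placeholders chosen for illustration, not the placer's actual values.

```cpp
// Minimal sketch of the simulated-annealing accept/reject rule (illustrative only).
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(42);                                    // fixed seed keeps the run deterministic
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::normal_distribution<double> delta_cost(0.0, 1.0);   // stand-in for a real ΔC computation

    double temperature = 10.0;                               // initial T (placeholder)
    const double cooling_rate = 0.95;                        // simple geometric cooling schedule (assumption)
    double cost = 100.0;

    while (temperature > 0.01) {
        for (int move = 0; move < 1000; ++move) {
            double dC = delta_cost(rng);                     // ΔC for a proposed swap of two blocks
            // Accept if the move improves cost, or probabilistically if it worsens it.
            if (dC < 0.0 || uniform(rng) < std::exp(-dC / temperature)) {
                cost += dC;                                  // commit the move
            }
        }
        temperature *= cooling_rate;                         // gradually decrease T
        // (a real placer would also shrink the move window here)
    }
    std::printf("final cost: %f\n", cost);
    return 0;
}
```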
6
Constraints
- Runs on commodity hardware
- Good quality of results
- Robust
- Determinism
  - Bug reporting
  - Consistent regression results
7
Selected Previous Work
- Closely related
  - Move acceleration
  - Parallel moves
- Other methods
  - Independent sets
  - Partitioned placements
  - Speculative
8
Algorithm #1
9
Algorithm #2
10
Objective
- Determine efficacy
- Analyze runtime and categorize it:
  - Memory
  - Synchronization
  - Infrastructure
  - Evaluation
  - Proposal
11
Methodology
- Parallel-equivalent flow: a serial flow that mimics the parallel flow
- Emulates the behavior of the multithreaded application using only one thread/core
- Useful for comparison, since it accounts for infrastructure overhead
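The following hedged C++ sketch shows one way to interpret the parallel-equivalent flow: the same per-worker queues and bookkeeping the multithreaded placer would use are serviced round-robin from a single thread. The names (WorkItem, kVirtualThreads) are illustrative assumptions, not the tool's API.

```cpp
// Sketch of a "parallel equivalent" serial flow: parallel-style bookkeeping, one thread.
#include <cstdio>
#include <queue>
#include <vector>

struct WorkItem { int move_id; };

int main() {
    const int kVirtualThreads = 4;                      // pretend we have 4 worker threads
    std::vector<std::queue<WorkItem>> queues(kVirtualThreads);

    // Fill each virtual thread's queue exactly as the parallel flow would.
    for (int m = 0; m < 100; ++m) {
        queues[m % kVirtualThreads].push(WorkItem{m});
    }

    // Service the queues round-robin on one thread/core. All queueing and
    // ordering overhead is still paid, so comparing against the true serial
    // placer isolates the infrastructure cost of the parallel flow.
    int processed = 0;
    bool work_left = true;
    while (work_left) {
        work_left = false;
        for (auto& q : queues) {
            if (!q.empty()) {
                q.pop();                                // a real flow would propose/evaluate this move
                ++processed;
                work_left = true;
            }
        }
    }
    std::printf("processed %d moves on one thread\n", processed);
    return 0;
}
```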
12
Methodology: attributing runtime
- Two types of measurements:
  - Bottom-up (bu): measure each component of a move
  - End-to-end (e2e): measure the runtime of the entire run
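The two measurement styles can be illustrated with a short C++ sketch: bottom-up timers wrap each component of a move, while an end-to-end timer wraps the whole run. The stage names and dummy workloads are placeholders, not the placer's real stages.

```cpp
// Bottom-up (per-component) vs. end-to-end (whole-run) timing, illustrative only.
#include <chrono>
#include <cstdio>

using Clock = std::chrono::steady_clock;

static volatile long sink = 0;                    // keeps the dummy work from being optimized away
static void do_work(long iters) { for (long i = 0; i < iters; ++i) sink += i; }

int main() {
    double propose_s = 0.0, evaluate_s = 0.0;

    auto run_start = Clock::now();                // end-to-end (e2e) timer for the entire run
    for (int move = 0; move < 1000; ++move) {
        auto t0 = Clock::now();
        do_work(100);                             // "propose" stage
        auto t1 = Clock::now();
        do_work(400);                             // "evaluate" stage
        auto t2 = Clock::now();

        // bottom-up (bu): accumulate per-component times
        propose_s  += std::chrono::duration<double>(t1 - t0).count();
        evaluate_s += std::chrono::duration<double>(t2 - t1).count();
    }
    double e2e_s = std::chrono::duration<double>(Clock::now() - run_start).count();

    // Any gap between e2e and the sum of the bu components is unattributed
    // overhead (timer cost, loop bookkeeping, cache effects, etc.).
    std::printf("bu: propose %.4fs, evaluate %.4fs, sum %.4fs; e2e %.4fs\n",
                propose_s, evaluate_s, propose_s + evaluate_s, e2e_s);
    return 0;
}
```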
13
Methodology
14
Test sets
- Set of 11 Stratix® II FPGA benchmark designs
  - IP and customer circuits
  - 10k to 100k logic cells
- Also tested on 40 Stratix II FPGA circuits; obtained similar results
15
Results for Algorithm #1
16
Move attribution
17
Overhead analysis
18
Observations
- Theoretical speedup: 1.7x; measured: 1.3x (best)
- Increase in evaluation runtime, due to reduced cache locality
- Proposal time is “hidden”
19
Analysis
- Time spent on stalls is negligible
- Evaluation accounts for most of the overhead
- Little to gain by removing determinism: serial equivalency is less than 3% of runtime
20
Summary for Algorithm #1
- Speedup: 1 – 1.3x
- Memory inefficiency is the biggest bottleneck
- Theoretically, the algorithm should scale; however, it is difficult to partition and balance the two stages
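Algorithm #1 is characterized here as a pipelined, two-stage scheme (proposal and evaluation, with proposal time "hidden"). The C++ sketch below shows one way such a pipeline could be wired up with a shared queue; the stage split, queue, and dummy accept test are assumptions for illustration, not the actual implementation.

```cpp
// Two-stage pipelined-move sketch: one thread proposes, another evaluates (illustrative only).
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

struct Proposal { int from; int to; };

std::queue<Proposal> g_queue;
std::mutex g_mu;
std::condition_variable g_cv;
bool g_done = false;

void proposer(int num_moves) {
    for (int m = 0; m < num_moves; ++m) {
        Proposal p{m, m + 1};                          // stand-in for picking two blocks to swap
        {
            std::lock_guard<std::mutex> lock(g_mu);
            g_queue.push(p);
        }
        g_cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(g_mu);
        g_done = true;
    }
    g_cv.notify_one();
}

void evaluator(long* accepted) {
    for (;;) {
        std::unique_lock<std::mutex> lock(g_mu);
        g_cv.wait(lock, [] { return !g_queue.empty() || g_done; });
        if (g_queue.empty() && g_done) return;
        Proposal p = g_queue.front();
        g_queue.pop();
        lock.unlock();
        if ((p.from + p.to) % 2 == 0) ++(*accepted);   // stand-in for ΔC evaluation + accept test
    }
}

int main() {
    long accepted = 0;
    std::thread t1(proposer, 10000);
    std::thread t2(evaluator, &accepted);
    t1.join();
    t2.join();
    std::printf("accepted %ld of 10000 proposals\n", accepted);
    return 0;
}
```

In a two-stage pipeline like this, the overall speedup is limited by how evenly the work splits between the stages, which matches the summary's point that the stages are difficult to partition and balance.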
21
Speedups for Algorithm #2
22
Attribution on 2 cores
24
Attribution on 4 cores
25
Attribution on 4 cores
26
Observations
- Memory latency due to inter-processor communication
- Worsens with more cores
27
Summary for Algorithm #2
- Parallel moves have better scalability than pipelined moves
- The bottleneck is still memory
- Again, serial equivalency costs little
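To illustrate why serial equivalency can cost little with parallel moves, here is a hedged C++ sketch: ΔC for a batch of proposed moves is evaluated concurrently, but the accept/reject decisions are applied in a fixed order with a deterministic random stream. Conflict detection between moves is omitted, and the cost function and PRNG are placeholders; this is an illustration of the idea, not the paper's algorithm.

```cpp
// Parallel move evaluation with a deterministic, serial commit order (illustrative only).
#include <cmath>
#include <cstdio>
#include <future>
#include <vector>

static double evaluate_delta_cost(int move_id) {
    // Placeholder for a real incremental cost computation.
    return std::sin(static_cast<double>(move_id));
}

int main() {
    const int kBatch = 8;
    double temperature = 1.0;
    double cost = 100.0;
    unsigned rng_state = 12345;                            // simple deterministic PRNG state

    // Evaluate a batch of proposals concurrently.
    std::vector<std::future<double>> deltas;
    for (int m = 0; m < kBatch; ++m) {
        deltas.push_back(std::async(std::launch::async, evaluate_delta_cost, m));
    }

    // Commit in move-id order, drawing random numbers in that same order,
    // so the result is independent of which evaluation finished first.
    for (int m = 0; m < kBatch; ++m) {
        double dC = deltas[m].get();
        rng_state = rng_state * 1664525u + 1013904223u;    // LCG step
        double u = (rng_state >> 8) / 16777216.0;          // uniform in [0, 1)
        if (dC < 0.0 || u < std::exp(-dC / temperature)) {
            cost += dC;
        }
    }
    std::printf("cost after batch: %f\n", cost);
    return 0;
}
```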
28
Take-Home Messages
- Memory is important
- Good algorithms are even more important