QUIZ 1

Question 1) According to the study on “Simultaneous Timing Driven Clustering and Placement for FPGAs”, what is a fragment level move and which drawbacks of the traditional FPGA CAD flow are targeted with the fragment level moves?

BSPlace: A BLE Swapping Technique for Placement
  04.11.2014
  Minsik Hong, George Hwang, Hemayamini Kurra, Minjun Seo

Outline
  SCPlace
    Introduction
    Algorithm flowchart
    Net Counting Algorithm
    Results
  BSPlace
    Algorithm
    Demo
  Backup Slides (if you guys ask minimal questions we can cover more)
    Net Weighting
    VPR Datastructures

Rajavel, Senthilkumar Thoravi, and Ali Akoglu. "MO-Pack: Many-objective clustering for FPGA CAD." Proceedings of the 48th Design Automation Conference. ACM, 2011.

Chen, Gang, and Jason Cong. "Simultaneous timing driven clustering and placement for FPGAs." Field Programmable Logic and Application. Springer Berlin Heidelberg, 2004. 158-167.

Key concept
  Fragment level move: BLE to a new CLB
    Check for a valid CLB configuration
    Feasibility (number of BLEs and input pins)
    Update the cost function
  Block level move: CLB to CLB
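The feasibility check for a fragment level move can be sketched as follows. This is only an illustration, not the SCPlace implementation: the `BLE` and `CLB` classes, the function names, and the parameters `n_max` (BLEs per CLB) and `i_max` (CLB input pins) are hypothetical, and the sketch assumes that a net produced inside the CLB does not consume an input pin. The cost function update is left out.

```python
from dataclasses import dataclass, field


@dataclass
class BLE:
    name: str
    inputs: frozenset  # nets read by this BLE
    output: str        # net driven by this BLE


@dataclass
class CLB:
    bles: list = field(default_factory=list)


def external_inputs(bles):
    """Nets that must enter the CLB through input pins.

    Assumption: nets generated inside the CLB are routed locally
    and therefore do not use a CLB input pin.
    """
    produced = {b.output for b in bles}
    used = set().union(*(b.inputs for b in bles))
    return used - produced


def is_feasible(clb, ble, n_max, i_max):
    """Check the BLE count and input pin limits after adding `ble`."""
    candidate = clb.bles + [ble]
    return len(candidate) <= n_max and len(external_inputs(candidate)) <= i_max


def fragment_move(src, dst, ble, n_max, i_max):
    """Move `ble` from `src` to `dst` if the target stays a valid CLB."""
    if not is_feasible(dst, ble, n_max, i_max):
        return False
    src.bles.remove(ble)
    dst.bles.append(ble)
    return True
```

With N = 2 and I = 3, moving a BLE that reads the output of the BLE already in the target CLB is accepted as long as the two BLEs together need at most three outside nets; a third BLE is rejected by the BLE count limit.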

BLE Level Swapping
  Advantages
    Fixes packing issues during simulated annealing
    Better congestion mitigation
    Better routability
  Disadvantages
    Speed
    Complexity

SCPlace Algorithm

Additional feature of the journal version of SCPlace

Use novel net weighting

Kong, Tim Tianming. "A novel net weighting algorithm for timing-driven placement." Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design. ACM, 2002.

Accurate All Path Counting

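Calculate F(t) (a = 2, T: the longest path delay = 13)
  Edge delays: a→c = 7, b→c = 5, c→d = 1, d→e = 5, d→f = 3
  ARR/REQ: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13
  Fs(a, c) = ARR(c) – ARR(a) – d(a, c) = 7 – 0 – 7 = 0
  Fs(b, c) = ARR(c) – ARR(b) – d(b, c) = 7 – 0 – 5 = 2
  Fs of all other edges = 0
  Initial values: F(a) = F(b) = 1, all other nodes 0
  F(c) = F(c) + D{Fs(a, c), T} x F(a) + D{Fs(b, c), T} x F(b) = 0 + 1x1 + 0.88x1 = 1.88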

Calculate B(s) (a = 2, T: the longest path delay = 13)
  Edge delays: a→c = 7, b→c = 5, c→d = 1, d→e = 5, d→f = 3
  ARR/REQ: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13
  Bs(d, e) = 13 – 5 – 8 = 0
  Bs(d, f) = 13 – 3 – 8 = 2
  Bs of all other edges = 0
  Initial values: B(e) = B(f) = 1, all other nodes 0
  D{Bs(a, c), T} = D{0,13} = 1
  D{Bs(b, c), T} = D{0,13} = 1
  D{Bs(c, d), T} = D{0,13} = 1
  D{Bs(d, e), T} = D{0,13} = 1
  D{Bs(d, f), T} = D{2,13} = 0.88
  B(d) = B(d) + D{Bs(d, e), T} x B(e) + D{Bs(d, f), T} x B(f) = 0 + 1x1 + 0.88x1 = 1.88

Calculate AP(s, t) (a=2)
  D{slack(a, c), T} = D{0,13} = 1
  D{slack(b, c), T} = D{2,13} = 0.88
  D{slack(c, d), T} = D{0,13} = 1
  D{slack(d, e), T} = D{0,13} = 1
  D{slack(d, f), T} = D{2,13} = 0.88
  F(s)/B(t) per node: a = 1/1.88, b = 1/1.88, c = 1.88/1.88, d = 1.88/1.88, e = 1.88/1, f = 1.88/1
  Slack per edge: (a, c) = 0, (b, c) = 2, (c, d) = 0, (d, e) = 0, (d, f) = 2
  AP(a,c) = F(a) x B(c) x D{slack(a, c), T} = 1 x 1.88 x 1 = 1.88
  AP(b,c) = F(b) x B(c) x D{slack(b, c), T} = 1 x 1.88 x 0.88 = 1.65
  Resulting edge weights: AP(a, c) = 1.88, AP(b, c) = 1.65, AP(c, d) = 3.53, AP(d, e) = 1.88, AP(d, f) = 1.65
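The three steps (F, then B, then AP) can be replayed in a few lines of Python. This is a sketch under stated assumptions, not Kong's implementation: the edge delays are inferred from the ARR/REQ values on the slides, the edge slack is taken as REQ(t) – ARR(s) – d(s, t), and `discount` is a placeholder that only looks up the two values the slides quote (D{0,13} = 1 and D{2,13} = 0.88), because the closed form of the discount function is not reproduced in this transcript.

```python
# Example circuit; delays inferred from the slide's ARR/REQ values.
EDGES = {("a", "c"): 7, ("b", "c"): 5, ("c", "d"): 1, ("d", "e"): 5, ("d", "f"): 3}
ORDER = ["a", "b", "c", "d", "e", "f"]  # a topological order of the nodes
ARR = {"a": 0, "b": 0, "c": 7, "d": 8, "e": 13, "f": 11}
REQ = {"a": 0, "b": 2, "c": 7, "d": 8, "e": 13, "f": 13}
T = 13  # the longest path delay


def discount(x, t):
    """Placeholder: table of the values quoted on the slides (a = 2)."""
    return {(0, 13): 1.0, (2, 13): 0.88}[(x, t)]


def forward_counts():
    """F(t) = sum over incoming edges of D{Fs(s, t), T} x F(s)."""
    F = {n: 0.0 for n in ORDER}
    F["a"] = F["b"] = 1.0  # primary inputs
    for t in ORDER:
        for (s, u), d in EDGES.items():
            if u == t:
                F[t] += discount(ARR[t] - ARR[s] - d, T) * F[s]
    return F


def backward_counts():
    """B(s) = sum over outgoing edges of D{Bs(s, t), T} x B(t)."""
    B = {n: 0.0 for n in ORDER}
    B["e"] = B["f"] = 1.0  # primary outputs
    for s in reversed(ORDER):
        for (u, t), d in EDGES.items():
            if u == s:
                B[s] += discount(REQ[t] - REQ[s] - d, T) * B[t]
    return B


def all_path_counts():
    """AP(s, t) = F(s) x B(t) x D{slack(s, t), T} for every edge."""
    F, B = forward_counts(), backward_counts()
    return {
        (s, t): F[s] * B[t] * discount(REQ[t] - ARR[s] - d, T)
        for (s, t), d in EDGES.items()
    }
```

Rounded to two decimals, `all_path_counts()` gives 1.88, 1.65, 3.53, 1.88, and 1.65 for the edges (a, c), (b, c), (c, d), (d, e), and (d, f), matching the slide.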

Results (BLE swapping only)
  CLB = 4

Results (BLE swapping only)

Results (BLE + CLB swapping)

Results (BLE + CLB swapping)
  T-VPack+VPR vs SCPlace (α=0.5)

BSPlace

BSPlace: BLE Level Swapping within Simulated Annealing with Rent’s Rule
  Advantages
    Fixes packing issues as they occur.
    Potentially better routability.
    Potentially better congestion due to the combination of placement and packing.
  Disadvantages
    Execution time: memory must be allocated and deallocated for every BLE swap.
    Code complexity: VPR is complex. We spend a lot of time on debugging and testing instead of on algorithms.

Rent’s Rule Threshold Value
  Calculate the k value to get the threshold
  Enter the simulated annealing process
    Outer loop process
      Inner loop process
        Choose a random CLB to move from its current position to another position
        Check the Rent’s Rule threshold
          If we get a better result for the swap: queue a BLE swap
          Otherwise: do CLB swapping (use T-VPlace)
      Loop through the queued BLE swaps
        Do each BLE swap after checking whether it overlaps with a previous swap
      Re-allocate memory and return to the outer loop
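The last step of the outer loop, applying queued BLE swaps while skipping those that overlap with an earlier one, might look like the sketch below. It is not taken from the BSPlace code: the `placement` dictionary, the `(ble_a, ble_b)` queue entries, and the function name are hypothetical, and "overlap" is assumed to mean that a swap touches a BLE already moved by an earlier swap in the same batch.

```python
def apply_queued_swaps(placement, queued):
    """Apply queued BLE swaps in order, skipping overlapping ones.

    placement: dict mapping BLE name -> CLB id (modified in place).
    queued:    list of (ble_a, ble_b) pairs collected in the inner loop.
    Returns the list of swaps that were actually applied.
    """
    touched = set()  # BLEs already moved in this batch
    applied = []
    for ble_a, ble_b in queued:
        if ble_a in touched or ble_b in touched:
            continue  # assumption: sharing a BLE counts as an overlap
        placement[ble_a], placement[ble_b] = placement[ble_b], placement[ble_a]
        touched.update((ble_a, ble_b))
        applied.append((ble_a, ble_b))
    return applied
```

Deferring the swaps to the end of the inner loop keeps the memory re-allocation the slide mentions to one pass per outer iteration, at the cost of dropping swaps that conflict with an earlier one.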

Current Status
  Code
    Created our own BLE swapping mechanism using VPR data structures.
    We have a whole suite of test fixtures to test the code.
    Testing is still continuing, but we are finding minimal issues.
    We have done a swap within placement.
    We have started to integrate our cost function.
  Validation
    We intend to run the VPR benchmarks.
    Our BLE swapping solution should be better than or the same as T-VPlace.
    Our VPR benchmark results should also be comparable to IRAC.

Demo
  The circuit below abstracts the MUX, switchboxes, and connection boxes. The connections represent the direct connections between BLEs in CLBs. Optimize this circuit by performing one BLE swap. Explain why your optimization will result in better performance.
  Architecture parameters: K = 2, I = 3, N = 2
  Measurement: critical path delay = 1.182 ns

Demo
  http://www.screenr.com/gJdN

Demo

Thanks.

Backup Slides

Impact of duplication on placement
  Delay = 2
  Delay = 1

Kong, Tim Tianming. "A novel net weighting algorithm for timing-driven placement." Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design. ACM, 2002.

A Novel Net Weighting Algorithm
  Accurate path counting algorithm
    The first known accurate path counting algorithm that considers all paths
    Due to the exponential number of paths present in a circuit, accurate all path counting has been considered very difficult.
  Significant performance improvement
  Little loss in total wirelength
  No runtime overhead

A Novel Net Weighting Algorithm
  Consider the path sharing effect
    If two critical paths share a common segment, the edges in the common segment should receive higher weights.
  Define two variables
    Forward path F(p): the number of different critical paths starting from PI elements and terminating at p.
    Backward path B(p): the number of different critical paths starting from PO elements and terminating at p, if we reverse all signal flow directions.

Background

Background

Example: timing of a circuit
  Edge delays: a→c = 7, b→c = 5, c→d = 1, d→e = 5, d→f = 3
  ARR(t): a = 0, b = 0, c = 7, d = 8, e = 13, f = 11
  REQ(s): a = 0, b = 2, c = 7, d = 8, e = 13, f = 13
  The longest path delay (T) = 13

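Example: Slack(s, t)
  Edge delays: a→c = 7, b→c = 5, c→d = 1, d→e = 5, d→f = 3
  ARR/REQ: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13
  Slack per node (REQ – ARR): a = 0, b = 2, c = 0, d = 0, e = 0, f = 2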
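The ARR, REQ, and slack values of this example can be reproduced with a forward and a backward pass over the graph. The sketch below assumes the edge delays inferred from the slide values, primary inputs arriving at time 0, and every primary output required at the longest path delay T; the edge slack is computed as REQ(t) – ARR(s) – d(s, t). All function names are illustrative.

```python
EDGES = {("a", "c"): 7, ("b", "c"): 5, ("c", "d"): 1, ("d", "e"): 5, ("d", "f"): 3}
ORDER = ["a", "b", "c", "d", "e", "f"]  # a topological order of the nodes


def arrival_times():
    """Forward pass: ARR(t) = max over incoming edges of ARR(s) + d(s, t)."""
    arr = {n: 0 for n in ORDER}  # assumption: primary inputs arrive at 0
    for t in ORDER:
        for (s, u), d in EDGES.items():
            if u == t:
                arr[t] = max(arr[t], arr[s] + d)
    return arr


def required_times(arr):
    """Backward pass: REQ(s) = min over outgoing edges of REQ(t) - d(s, t)."""
    longest = max(arr.values())  # T, the longest path delay
    req = {n: longest for n in ORDER}  # outputs are required at T
    for s in reversed(ORDER):
        for (u, t), d in EDGES.items():
            if u == s:
                req[s] = min(req[s], req[t] - d)
    return req


def edge_slacks(arr, req):
    """slack(s, t) = REQ(t) - ARR(s) - d(s, t)."""
    return {(s, t): req[t] - arr[s] - d for (s, t), d in EDGES.items()}


arr = arrival_times()
req = required_times(arr)
slack = edge_slacks(arr, req)
```

Only the edges (b, c) and (d, f) end up with a nonzero slack of 2; every other edge lies on the longest path a→c→d→e.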

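Example: path delay and slack
  Path a→c→d→e (delays 7, 1, 5): d(π) = 13, slack(π) = 0
  Path b→c→d→f (delays 5, 1, 3): d(π) = 9, slack(π) = 4
  Path a→c→d→f (delays 7, 1, 3): d(π) = 11, slack(π) = 2
  Path b→c→d→e (delays 5, 1, 5): d(π) = 11, slack(π) = 2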

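Critical Path counting

Calculate F(p)
  Step 1 (initialize): F = 0 for all nodes
  Step 2 (primary inputs): F(a) = 1, F(b) = 1, all other nodes 0
  Step 3 (propagate forward): F(a) = 1, F(b) = 1, F(c) = 2, F(d) = 2, F(e) = 2, F(f) = 2

Calculate B(p)
  Step 1 (initialize): B = 0 for all nodes
  Step 2 (primary outputs): B(e) = 1, B(f) = 1, all other nodes 0
  Step 3 (propagate backward): B(a) = 2, B(b) = 2, B(c) = 2, B(d) = 2, B(e) = 1, B(f) = 1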

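Calculate GP(s,t)
  B(p): a = 2, b = 2, c = 2, d = 2, e = 1, f = 1
  F(p): a = 1, b = 1, c = 2, d = 2, e = 2, f = 2
  GP(s, t) = F(s) x B(t): GP(a, c) = 2, GP(b, c) = 2, GP(c, d) = 4, GP(d, e) = 2, GP(d, f) = 2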
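The counts above can be checked with a short script. It is a sketch that assumes every edge of the example counts as critical (which is what the slide's numbers imply) and that GP(s, t) is the product F(s) x B(t); the names are illustrative.

```python
EDGES = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f")]
ORDER = ["a", "b", "c", "d", "e", "f"]  # a topological order of the nodes
PRIMARY_INPUTS = {"a", "b"}
PRIMARY_OUTPUTS = {"e", "f"}


def forward_path_counts():
    """F(p): number of paths from any primary input that end at p."""
    F = {n: (1 if n in PRIMARY_INPUTS else 0) for n in ORDER}
    for t in ORDER:
        for s, u in EDGES:
            if u == t:
                F[t] += F[s]
    return F


def backward_path_counts():
    """B(p): number of paths from p to any primary output."""
    B = {n: (1 if n in PRIMARY_OUTPUTS else 0) for n in ORDER}
    for s in reversed(ORDER):
        for u, t in EDGES:
            if u == s:
                B[s] += B[t]
    return B


def gpath_counts():
    """GP(s, t) = F(s) x B(t): number of paths through edge (s, t)."""
    F, B = forward_path_counts(), backward_path_counts()
    return {(s, t): F[s] * B[t] for s, t in EDGES}
```

Edge (c, d) gets the count 4 because all four input-to-output paths of the example pass through it, while every other edge lies on exactly two paths.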

Accurate All Path Counting
  Use a discount function D(x, y) to get an accurate counting result
    ‘a’ is a positive constant
    x is Fs(s,t) = ARR(t) – ARR(s) – d(s,t) or Bs(s,t) = REQ(t) – REQ(s) – d(s,t)
    y is the longest path delay (T)

Accurate All Path Counting

Ex. Calculate F(t) (a=2)
  Edge delays: a→c = 7, b→c = 5, c→d = 1, d→e = 5, d→f = 3
  ARR/REQ: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13
  D{Fs(a, c), T} = D{0,13} = 1
  D{Fs(b, c), T} = D{2,13} = 0.88
  D{Fs(c, d), T} = D{0,13} = 1
  D{Fs(d, e), T} = D{0,13} = 1
  D{Fs(d, f), T} = D{0,13} = 1
  Result: F(a) = 1, F(b) = 1, F(c) = 1 + 0.88 = 1.88, F(d) = 1.88

Ex. Calculate B(s) (a=2)
  Edge delays: a→c = 7, b→c = 5, c→d = 1, d→e = 5, d→f = 3
  ARR/REQ: a = 0/0, b = 0/2, c = 7/7, d = 8/8, e = 13/13, f = 11/13
  D{Bs(a, c), T} = D{0,13} = 1
  D{Bs(b, c), T} = D{0,13} = 1
  D{Bs(c, d), T} = D{0,13} = 1
  D{Bs(d, e), T} = D{0,13} = 1
  D{Bs(d, f), T} = D{2,13} = 0.88
  Result: B(e) = 1, B(f) = 1, B(d) = 1 + 0.88 = 1.88, B(c) = 1.88

Ex. Calculate AP(s,t) (a=2)
  F: F(a) = 1, F(b) = 1, F(c) = 1 + 0.88 = 1.88, F(d) = 1.88
  B: B(c) = 1.88, B(d) = 1.88, B(e) = 1, B(f) = 1
  D{slack(a, c), T} = D{0,13} = 1
  D{slack(b, c), T} = D{2,13} = 0.88
  D{slack(c, d), T} = D{0,13} = 1
  D{slack(d, e), T} = D{0,13} = 1
  D{slack(d, f), T} = D{2,13} = 0.88
  AP(a, c) = 1*1.88*1 = 1.88
  AP(b, c) = 1*1.88*0.88 = 1.65
  AP(c, d) = 1.88*1.88*1 = 3.53
  AP(d, e) = 1.88*1*1 = 1.88
  AP(d, f) = 1.88*1*0.88 = 1.65

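Compare results
  Accurate all path counting (AP): (a, c) = 1.88, (b, c) = 1.65, (c, d) = 3.53, (d, e) = 1.88, (d, f) = 1.65
  Critical path counting (GP): (a, c) = 2, (b, c) = 2, (c, d) = 4, (d, e) = 2, (d, f) = 2
  With the critical path counting method (GPATH), it is difficult to get an accurate result. With the proposed algorithm, we can get a more accurate result.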

VPR Datastructures
  Resource Routing Graph
  Physical Block Graph
  Netlist
    Global CLB Netlist
    Global Atom Netlist
  Blocks

Blocks
  Contains the CLB
  Contains the input/output
  Contains the Resource Routing Graph
  Contains the Physical Blocks
    Physical Blocks represent the BLE
    Physical Blocks represent the flip-flop
    Physical Blocks also contain the LUTs

Resource Routing Graph
  Nodes are pins
  Edges are architectural connections
  Each pin is associated with a net num
  Prev nodes and edges represent the actual connections per BLE.

Global Netlist

Atom Netlist
