ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement.

Placement
°VLSI design flow
 Objective:
 -Minimize total chip area.
 -Keep the circuit routable within the timing budget.
°FPGA flow (area is fixed)
 Objective:
 -Assign the LUTs in the netlist to available logic blocks in the array, within utilization and performance (interconnect) constraints.
 -Locate functional blocks so that the interconnect required to route the signals between them is minimized.
°The target architecture determines the cost function.

Placement algorithm
°Two basic inputs:
 -A netlist of functional blocks and the connections between them
 -A device map (architecture)
°The algorithm selects a legal location for each block such that the circuit wiring is optimized.

Significance of Placement
°Good placement is extremely important.
 -It sets the constraints for routability.
 -Even if the circuit does route, a poor placement still leads to a lower maximum operating speed and increased power consumption.
°Finding a good placement is challenging.
 -A large commercial FPGA contains over 500,000 functional blocks, giving on the order of 500,000! (factorial) possible placements. Exhaustive evaluation is therefore impossible.
 -Placement is a computationally hard problem: no known algorithm produces optimal results in practical CPU time.
 -Development of fast and effective heuristic placement algorithms is a critical research area.
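To give a feel for the size of 500,000!, the sketch below estimates how many decimal digits it has. It works in logarithms via `math.lgamma`, because the factorial itself is far too large to be worth computing. The names `log10_factorial` and `num_blocks` are illustrative only and do not come from the lecture.

```python
import math

def log10_factorial(n: int) -> float:
    """Return log10(n!) without ever forming n! itself."""
    # math.lgamma(n + 1) equals ln(n!); dividing by ln(10) converts to base 10.
    return math.lgamma(n + 1) / math.log(10)

num_blocks = 500_000  # block count quoted on the slide
digits = math.floor(log10_factorial(num_blocks)) + 1
print(f"{num_blocks:,}! has roughly {digits:,} decimal digits")
```

The result is a number with a few million digits, which is why placement tools rely on heuristics rather than enumeration.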

Device Legality Constraints
°All resources in an FPGA are prefabricated, which leads to a variety of placement legality constraints.
°A legal placement must place each functional block only in a location on the chip that can accommodate it.
 -A RAM block must be placed in a RAM location, and a lookup table (LUT) must be placed in a LUT location.
°Some groups of functional blocks must be placed in a specific relative orientation to make use of special, dedicated routing resources.
 -Arithmetic logic cells: to use the dedicated carry-chain hardware, the logic cells forming a carry chain must be placed adjacent to each other, in the sequence required by the carry structure.
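The legality rules above can be sketched as a simple checker. This is a minimal illustration, not any vendor tool's implementation. The data model is hypothetical: a placement is a dict from block name to an (x, y) site, and two things are assumed that the slide does not state: carry chains run vertically upward (each cell sits directly above the previous one), and no two blocks may share a site.

```python
from typing import Dict, List, Tuple

Site = Tuple[int, int]  # (column, row) on a hypothetical device grid

def is_legal(placement: Dict[str, Site],
             block_type: Dict[str, str],
             site_type: Dict[Site, str],
             carry_chains: List[List[str]]) -> bool:
    """Return True if the placement satisfies the three rules below."""
    # Assumed rule: two blocks cannot occupy the same site.
    if len(set(placement.values())) != len(placement):
        return False
    # Type rule from the slide: RAM goes to a RAM site, LUT to a LUT site.
    for block, site in placement.items():
        if site_type.get(site) != block_type[block]:
            return False
    # Carry-chain rule: consecutive cells must be adjacent and in order.
    # ASSUMPTION: the chain runs upward within a single column.
    for chain in carry_chains:
        for lower, upper in zip(chain, chain[1:]):
            (x0, y0), (x1, y1) = placement[lower], placement[upper]
            if x1 != x0 or y1 != y0 + 1:
                return False
    return True

site_type = {(0, 0): "LUT", (0, 1): "LUT", (1, 0): "RAM"}
block_type = {"add0": "LUT", "add1": "LUT", "mem": "RAM"}
chains = [["add0", "add1"]]
good = {"add0": (0, 0), "add1": (0, 1), "mem": (1, 0)}
reversed_chain = {"add0": (0, 1), "add1": (0, 0), "mem": (1, 0)}
print(is_legal(good, block_type, site_type, chains))
print(is_legal(reversed_chain, block_type, site_type, chains))
```

The second placement uses only legal site types but reverses the carry chain, so it is rejected.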

FPGA Placement Constraints
°FPGA interconnect is prefabricated.
 -The amount of interconnect in each region of a device is fixed.
°Routing congestion
 -Occurs when the interconnect demand approaches or exceeds the fabricated wiring capacity in some part of the FPGA.
 -A placement that requires more interconnect in a device region than that region contains cannot be routed.

FPGA Placement Constraints
°Stratix-II is an island-style FPGA that contains routing segments that span 4, 16, and 24 logic blocks.
 -Programmable switches allow routing segments in the same direction (horizontal or vertical) to be connected at their endpoints to create longer routes.
 -Other programmable switches allow some horizontal routing segments to connect to vertical routing segments where they cross, and vice versa.
[Figure: X and Y routing channels with segments of length 4, length 2, and length 1]

Placement Objective – Routability Driven
°Create a placement that minimizes the total interconnect required.
°Increase the probability of successful routing.
°Consequently, some routability-driven placement algorithms minimize not only the total wiring required by the design but also the amount of routing congestion.

Placement Objective – Timing Driven
°In addition to optimizing for routability, timing-driven algorithms use timing analysis to identify critical paths and/or connections and optimize the delay of those connections.
°Most delays in an FPGA are due to the programmable interconnect, so timing-driven placement can achieve a large improvement in circuit speed over routability-driven approaches.

Level of Control on Placement
°Commercial FPGA placement tools allow designers to control the placement.
°Common types of placement directives:
°1) Exact location of a block
 -The most restrictive directive.
 -Typical uses: locking down the design I/Os at the locations required by the circuit board, or locking down the elements of a performance-critical intellectual property (IP) core.
°2) Area specific
 -Less restrictive; forces blocks to go into a specific 2D area.
 -Allows a designer to guide the placement tool.

Level of Control on Placement
°3) Relative location
 -Specifies the relative location of several blocks; the placement tool chooses exactly where to locate the block group.
 -Typical use: library components for which a designer knows a good placement of the component blocks relative to each other.
°4) Floating region
 -Specifies that some logic should be placed within a tight region; the placement tool can choose where that region should be on the device.

Placement Algorithms
Constructive methods:
 -Begin from the netlist and generate an initial placement.
 -Partitioning method: min-cut
  –First address the placement of each partition individually, which significantly reduces the search space.
  –Then address the placement of the partitions relative to each other.
 -Not well suited to FPGAs
  –Especially island-style FPGAs with limited routing resources
  –The method postpones the impact of inter-partition connections.
  –This leads to increased demand on routing tracks.

Placement
Placement has a set of competing goals.
Local and global objectives cannot be optimized simultaneously.
Use heuristic approaches to evaluate quality.
[Figure: example placement of LUT1 and LUT2 with signals A, B, C, D, E, and F]

Getting Stuck with Local Minima
°Pick a random starting point.
°Repeatedly swap blocks: if the new state has a lower cost, it is accepted; otherwise the current state is retained.
°This greedily accepts good moves.
°Problem: there is a large number of local minima.
 -The circuit placed as shown at left is in a local minimum: no swap of logic or I/O functions will reduce the total wirelength.
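The greedy procedure described above can be sketched as follows. The model is a deliberately simplified assumption: blocks sit in a single row of slots, every net has two terminals, and cost is the sum of slot distances. The function names and the example netlist are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Net = Tuple[str, str]  # two-terminal net, an assumption made for brevity

def total_wirelength(slot_of: Dict[str, int], nets: List[Net]) -> int:
    """Sum of slot distances over all nets (1-D stand-in for wirelength)."""
    return sum(abs(slot_of[a] - slot_of[b]) for a, b in nets)

def greedy_swap(start: Dict[str, int], nets: List[Net]):
    """Swap block pairs, keeping a swap only if it lowers the cost."""
    slot_of = dict(start)
    cost = total_wirelength(slot_of, nets)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(sorted(slot_of), 2):
            slot_of[a], slot_of[b] = slot_of[b], slot_of[a]
            new_cost = total_wirelength(slot_of, nets)
            if new_cost < cost:
                cost = new_cost      # good move: accept it
                improved = True
            else:
                slot_of[a], slot_of[b] = slot_of[b], slot_of[a]  # undo
    return slot_of, cost

start = {"A": 0, "B": 1, "C": 2, "D": 3}
nets = [("A", "C"), ("B", "D"), ("A", "D")]
final, cost = greedy_swap(start, nets)
print(final, cost)
```

The loop stops as soon as a full pass finds no improving swap. That guarantees only that no single swap helps, which is exactly the local-minimum condition the slide warns about; it says nothing about whether a better placement exists.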

Technology Mapping to Placement Mapping onto 5-LUT

Technology Mapping to Placement

Iterative Placement Algorithms
°Iterative improvement
 -Begin with a random or constructive placement, then iterate to improve it.
 -Pairwise interchange
°Hill climbing
 -To avoid getting trapped in local minima, consider a “hill-climbing” approach.
 -Need to accept worse solutions, or make “bad” moves, to reach the global minimum.
 -Acceptance is probabilistic: cost-increasing moves are accepted only some of the time.

Iterative Placement Algorithms
°Methods
 Force-directed methods (classical mechanics)
 -A force vector is computed on each module from all of its nets.
 -Solve a set of non-linear differential equations.
  –FD relaxation
  –FD pairwise exchange
 Simulated annealing (statistical mechanics)
 -Models a physical annealing process, which minimizes energy.
 -Similar to heat-treating metal: slow, controlled cooling rather than rapid “quenching”.
 -Generates the best results.
 -Can be time consuming.
 Macro-based approaches
 -Genetic algorithms

Physical Annealing
°Take a metal and heat it to a high temperature.
°Allow it to cool slowly; the metal is annealed to a low temperature.
°Atoms in the metal are at lower energy states after annealing.
°The higher the initial temperature and the slower the cooling, the tougher the metal becomes.
°Atoms transition to high energy states and then move to low energy states.

Simulated Annealing
°Optimization strategy based on the physical annealing process.
°Generate random moves.
 -Initially, accept both moves that decrease cost and moves that increase cost.
°As the temperature decreases, the probability of accepting bad moves decreases.
°Eventually, default to a greedy algorithm.
 -Only accept moves that reduce cost.
°Determine when to terminate.
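The temperature-dependent acceptance described above is commonly implemented with the Metropolis rule: accept a cost-increasing move with probability exp(-delta/T). The slide does not name the rule, so treat its use here as an assumption. `accept_move` and `acceptance_fraction` are illustrative names.

```python
import math
import random

def accept_move(delta_cost: float, temperature: float,
                rng: random.Random) -> bool:
    """Metropolis-style acceptance test (assumed, not stated in the slides)."""
    if delta_cost <= 0:
        return True               # improving or neutral moves are always taken
    if temperature <= 0:
        return False              # at T = 0 the rule degenerates to greedy
    return rng.random() < math.exp(-delta_cost / temperature)

def acceptance_fraction(delta_cost: float, temperature: float,
                        trials: int = 1000, seed: int = 0) -> float:
    """Fraction of identical cost-increasing moves accepted at one temperature."""
    rng = random.Random(seed)
    hits = sum(accept_move(delta_cost, temperature, rng) for _ in range(trials))
    return hits / trials

hot = acceptance_fraction(delta_cost=1.0, temperature=100.0)
cold = acceptance_fraction(delta_cost=1.0, temperature=0.1)
print(f"accepted when hot: {hot:.3f}, when cold: {cold:.3f}")
```

At the high temperature nearly every bad move is accepted, while at the low temperature almost none are, which matches the drift toward greedy behaviour described on the slide.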

Simulated Annealing

Bounding Box and Cost Function
°The bounding box underestimates wirelength for nets with more than three terminals.
 q(n) is a compensation factor:
 -q is 1 for 2- and 3-terminal nets
 -q increases to 2.79 for 50-terminal nets
°Cav is the average channel capacity (tracks) in the x and y directions over the bounding box of net n.
 -Penalizes placements that require more routing in areas of the FPGA that have narrower channels.
 -However, Cav is constant when the channel width is uniform, as in an island-style FPGA.
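The cost formula itself did not survive in this transcript. A form consistent with the symbols above, and with the linear congestion cost usually attributed to the VPR placer, is cost = sum over nets of q(n) * (bb_x(n) / Cav_x(n)^beta + bb_y(n) / Cav_y(n)^beta). The sketch below assumes that form. Only two q values are given in the slide, so the linear interpolation between them in `q_factor` is an assumption made purely for illustration, as are all function names.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def q_factor(num_terminals: int) -> float:
    """Compensation factor q(n) for a net with the given terminal count."""
    if num_terminals <= 3:
        return 1.0                 # from the slide: q = 1 for 2 or 3 terminals
    if num_terminals >= 50:
        return 2.79                # from the slide: q = 2.79 at 50 terminals
    # ASSUMPTION: straight-line interpolation between the two known points.
    return 1.0 + (2.79 - 1.0) * (num_terminals - 3) / (50 - 3)

def net_cost(terminals: List[Point], cav_x: float = 1.0,
             cav_y: float = 1.0, beta: float = 1.0) -> float:
    """Bounding-box cost of one net under the assumed formula."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    bb_x = max(xs) - min(xs)       # horizontal span of the bounding box
    bb_y = max(ys) - min(ys)       # vertical span of the bounding box
    return q_factor(len(terminals)) * (bb_x / cav_x ** beta + bb_y / cav_y ** beta)

def placement_cost(nets: List[List[Point]]) -> float:
    """Total cost with uniform channel capacity (Cav = 1 everywhere)."""
    return sum(net_cost(terminals) for terminals in nets)

nets = [[(0, 0), (3, 4)], [(0, 0), (2, 1), (1, 5)]]
print(placement_cost(nets))
```

With uniform channels the Cav terms only rescale the total, which is the point the slide makes about island-style devices.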

Placement Flow

Wire length measures
°Estimate wire length by the distance between components.
°Possible distance measures, for components separated by x horizontally and y vertically:
 -Euclidean distance: sqrt(x^2 + y^2)
 -Manhattan distance: x + y
°Multi-point nets must be broken up into trees for good estimates.
[Figure: Euclidean versus Manhattan distance]
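The two distance measures can be compared directly with a small sketch. The function names are illustrative, and points are assumed to be (x, y) grid coordinates.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def euclidean(p: Point, q: Point) -> float:
    """Straight-line distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def manhattan(p: Point, q: Point) -> float:
    """Distance along horizontal and vertical tracks only."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

a, b = (0, 0), (3, 4)
print(euclidean(a, b), manhattan(a, b))  # 5.0 7
```

Because FPGA wires run horizontally and vertically, the Manhattan figure is usually the closer of the two to the routed length.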

Weighted Graph -> Distance Table
°Geometric distance is NOT accurate.
°Need a weighted graph that models the cost of routing resources.
°Finding the shortest path at each step of annealing is costly.
 -Hence the need for a lookup table.

Simulated Annealing – Moves per iteration
 Moves_per_iteration = B * N^(4/3)
 N = number of logic blocks and I/O pads
 B = scaling factor
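As a quick worked example of the formula, the sketch below evaluates B * N^(4/3) for a given block count. The default B = 10 is an assumption chosen for illustration; the slide leaves B unspecified.

```python
def moves_per_iteration(num_blocks: int, scale: float = 10.0) -> int:
    """Number of moves attempted at each temperature: B * N^(4/3)."""
    # round() avoids an off-by-one from floating-point error in the power.
    return round(scale * num_blocks ** (4.0 / 3.0))

for n in (8, 1000):
    print(n, moves_per_iteration(n))
```

The 4/3 exponent means the work per temperature grows faster than the design size, which is one reason annealing run time becomes a concern on large devices.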

Simulated Annealing – Swapping Range
°The swap distance is also adjusted based on the acceptance rate.
°Initially set to span the entire FPGA.
°As T drops, the distance drops.
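The slide does not give the update rule for the swap distance. One rule described in the published VPR placer scales the limit by (1 - 0.44 + acceptance rate) and clamps it between 1 and the device size, which steers the acceptance rate toward 0.44. The sketch below assumes that rule; the function name and parameters are hypothetical.

```python
def update_range_limit(range_limit: float, acceptance_rate: float,
                       max_range: float) -> float:
    """Shrink or grow the swap distance based on the acceptance rate."""
    # ASSUMPTION: VPR-style rule with a target acceptance rate of 0.44.
    scaled = range_limit * (1.0 - 0.44 + acceptance_rate)
    # Never allow less than one block of movement or more than the device.
    return min(max(scaled, 1.0), float(max_range))

limit = 20.0                       # start at the full (hypothetical) device width
for rate in (0.9, 0.6, 0.3, 0.1):  # acceptance rate falling as T drops
    limit = update_range_limit(limit, rate, max_range=20)
    print(f"acceptance {rate:.1f} -> range limit {limit:.2f}")
```

While most moves are accepted the limit stays pinned at the device size; once the acceptance rate falls below the target, the limit contracts so that moves become local refinements.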

Simulated Annealing
°The new T depends on the fraction of attempted moves that were accepted.
 -T is reduced rapidly when the acceptance rate is high.
°When the temperature is less than a small fraction of the average cost of a net, it is unlikely that any move that results in a cost increase will be accepted, so we terminate the anneal.
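An acceptance-rate-driven schedule of this kind can be sketched as a small lookup. The thresholds and multipliers below follow the schedule published for the VPR placer; they are not stated in the lecture, so treat them as an assumption. `next_temperature` is an illustrative name.

```python
def next_temperature(temperature: float, acceptance_rate: float) -> float:
    """Return the next temperature given the last acceptance rate."""
    # ASSUMPTION: multipliers taken from the published VPR schedule.
    if acceptance_rate > 0.96:
        alpha = 0.5    # almost everything accepted: cool quickly
    elif acceptance_rate > 0.8:
        alpha = 0.9
    elif acceptance_rate > 0.15:
        alpha = 0.95   # most productive range: cool slowly
    else:
        alpha = 0.8    # few moves accepted: little left to gain
    return alpha * temperature

t = 100.0
for rate in (0.99, 0.9, 0.5, 0.1):
    t = next_temperature(t, rate)
    print(f"acceptance {rate:.2f} -> T = {t:.3f}")
```

The design intent is to spend most of the run time at temperatures where a moderate share of moves is accepted, and to pass quickly through the regimes where the anneal is either nearly random or nearly frozen.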

Annealing Criteria
Contemporary FPGA packages use the following parameters:
1. Starting temperature – 20 * std_dev(cost over N swaps)
2. Cost function – weighted sum of wire length and delay
3. Inner loop – B * N^(4/3) moves per iteration (Beta cost function)
4. Stopping criterion – T < 0.005 * Cost / N_nets
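The starting temperature and stopping criterion listed above can be sketched as two small helpers. The slide does not say whether the standard deviation is the population or the sample form; the population form (`statistics.pstdev`) is assumed here. The sampled costs in the example are made-up numbers for illustration only.

```python
import statistics
from typing import Sequence

def initial_temperature(sampled_costs: Sequence[float]) -> float:
    """20 times the standard deviation of the cost seen over N trial swaps."""
    # ASSUMPTION: population standard deviation; the slide does not specify.
    return 20.0 * statistics.pstdev(sampled_costs)

def should_stop(temperature: float, cost: float, num_nets: int) -> bool:
    """Stop once T falls below 0.005 times the average cost per net."""
    return temperature < 0.005 * cost / num_nets

costs_after_swaps = [10.0, 12.0, 14.0, 16.0]   # hypothetical samples
t0 = initial_temperature(costs_after_swaps)
print(f"starting temperature: {t0:.2f}")
print(should_stop(temperature=0.01, cost=1000.0, num_nets=100))
print(should_stop(temperature=1.0, cost=1000.0, num_nets=100))
```

Tying both values to the observed cost makes the schedule adapt to the design rather than relying on fixed constants.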

Strengths of SA making it suitable for FPGA
°Can enforce all the legality constraints imposed by the FPGA architecture fairly directly:
 -By forbidding the creation of illegal placements in the move generator
 -By adding a penalty cost to illegal placements
°Can directly model the impact of the FPGA routing architecture on circuit delay and routing congestion:
 -By creating an appropriate cost function
