HeAP: Heterogeneous Analytical Placement for FPGAs

Slides:



Advertisements
Similar presentations
Constraint Driven I/O Planning and Placement for Chip-package Co-design Jinjun Xiong, Yiuchung Wong, Egino Sarto, Lei He University of California, Los.
Advertisements

3D-STAF: Scalable Temperature and Leakage Aware Floorplanning for Three-Dimensional Integrated Circuits Pingqiang Zhou, Yuchun Ma, Zhouyuan Li, Robert.
Natarajan Viswanathan Min Pan Chris Chu Iowa State University International Symposium on Physical Design April 6, 2005 FastPlace: An Analytical Placer.
A Size Scaling Approach for Mixed-size Placement Kalliopi Tsota, Cheng-Kok Koh, Venkataramanan Balakrishnan School of Electrical and Computer Engineering.
Ripple: An Effective Routability-Driven Placer by Iterative Cell Movement Xu He, Tao Huang, Linfu Xiao, Haitong Tian, Guxin Cui and Evangeline F.Y. Young.
SimPL: An Effective Placement Algorithm Myung-Chul Kim, Dong-Jin Lee and Igor L. Markov Dept. of EECS, University of Michigan 1ICCAD 2010, Myung-Chul Kim,
1 Physical Hierarchy Generation with Routing Congestion Control Chin-Chih Chang *, Jason Cong *, Zhigang (David) Pan +, and Xin Yuan * * UCLA Computer.
FastPlace: Efficient Analytical Placement using Cell Shifting, Iterative Local Refinement and a Hybrid Net Model FastPlace: Efficient Analytical Placement.
Routability-Driven Blockage-Aware Macro Placement Yi-Fang Chen, Chau-Chin Huang, Chien-Hsiung Chiou, Yao-Wen Chang, Chang-Jen Wang.
ISQED’2015: D. Seemuth, A. Davoodi, K. Morrow 1 Automatic Die Placement and Flexible I/O Assignment in 2.5D IC Design Daniel P. Seemuth Prof. Azadeh Davoodi.
On Legalization of Row-Based Placements Andrew B. KahngSherief Reda CSE & ECE Departments University of CA, San Diego La Jolla, CA 92093
Can Recursive Bisection Alone Produce Routable Placements? Andrew E. Caldwell Andrew B. Kahng Igor L. Markov Supported by Cadence.
POLAR 2.0: An Effective Routability-Driven Placer Chris Chu Tao Lin.
ECE 506 Reconfigurable Computing Lecture 8 FPGA Placement.
Octavo: An FPGA-Centric Processor Architecture Charles Eric LaForest J. Gregory Steffan ECE, University of Toronto FPGA 2012, February 24.
CSE 242A Integrated Circuit Layout Automation Lecture 5: Placement Winter 2009 Chung-Kuan Cheng.
Confidentiality Preserving Integer Programming for Global Routing Hamid Shojaei, Azadeh Davoodi, Parmesh Ramanathan Department of Electrical and Computer.
Titan: Large and Complex Benchmarks in Academic CAD
Are Floorplan Representations Important in Digital Design? H. H. Chan, S. N. Adya, I. L. Markov The University of Michigan.
Un/DoPack: Re-Clustering of Large System-on-Chip Designs with Interconnect Variation for Low-Cost FPGAs Marvin Tom* Xilinx Inc.
Horizontal Benchmark Extension for Improved Assessment of Physical CAD Research Andrew B. Kahng, Hyein Lee and Jiajia Li UC San Diego VLSI CAD Laboratory.
Solving Hard Instances of FPGA Routing with a Congestion-Optimal Restrained-Norm Path Search Space Keith So School of Computer Science and Engineering.
10/11/ VLSI Physical Design Automation Prof. David Pan Office: ACES Placement (2)
March 20, 2007 ISPD An Effective Clustering Algorithm for Mixed-size Placement Jianhua Li, Laleh Behjat, and Jie Huang Jianhua Li, Laleh Behjat,
Archer: A History-Driven Global Routing Algorithm Mustafa Ozdal Intel Corporation Martin D. F. Wong Univ. of Illinois at Urbana-Champaign Mustafa Ozdal.
UC San Diego / VLSI CAD Laboratory Incremental Multiple-Scan Chain Ordering for ECO Flip-Flop Insertion Andrew B. Kahng, Ilgweon Kang and Siddhartha Nath.
Seeing the Forest and the Trees: Steiner Wirelength Optimization in Placement Jarrod A. Roy, James F. Lu and Igor L. Markov University of Michigan Ann.
BSG-Route: A Length-Matching Router for General Topology T. Yan and M. D. F. Wong University of Illinois at Urbana-Champaign ICCAD 2008.
Analytic Placement. Layout Project:  Sending the RTL file: −Thursday, 27 Farvardin  Final deadline: −Tuesday, 22 Ordibehesht  New Project: −Soon 2.
Regularity-Constrained Floorplanning for Multi-Core Processors Xi Chen and Jiang Hu (Department of ECE Texas A&M University), Ning Xu (College of CST Wuhan.
Multilevel Generalized Force-directed Method for Circuit Placement Tony Chan 1, Jason Cong 2, Kenton Sze 1 1 UCLA Mathematics Department 2 UCLA Computer.
Kwangsoo Han, Andrew B. Kahng, Hyein Lee and Lutong Wang
Congestion Estimation and Localization in FPGAs: A Visual Tool for Interconnect Prediction David Yeager Darius Chiu Guy Lemieux The University of British.
Incremental Placement Algorithm for Field Programmable Gate Arrays David Leong Advisor: Guy Lemieux University of British Columbia Department of Electrical.
Quadratic VLSI Placement Manolis Pantelias. General Various types of VLSI placement  Simulated-Annealing  Quadratic or Force-Directed  Min-Cut  Nonlinear.
I N V E N T I V EI N V E N T I V E A Morphing Approach To Address Placement Stability Philip Chong Christian Szegedy.
Clock-Tree Aware Placement Based on Dynamic Clock-Tree Building Yanfeng Wang, Qiang Zhou, Xianlong Hong, and Yici Cai Department of Computer Science and.
Maze Routing Algorithms with Exact Matching Constraints for Analog and Mixed Signal Designs M. M. Ozdal and R. F. Hentschke Intel Corporation ICCAD 2012.
© 2010 Altera Corporation - Public Lutiac – Small Soft Processors for Small Programs David Galloway and David Lewis November 18, 2010.
FPGA CAD 10-MAR-2003.
Self-Hosted Placement for Massively Parallel Processor Arrays (MPPAs) Graeme Smecher, Steve Wilton, Guy Lemieux Thursday, December 10, 2009 FPT 2009.
© Ram Ramanan 2/22/2016 Commercial Codes 1 ME 7337 Notes Computational Fluid Dynamics for Engineers Lecture 4: Commercial Codes.
FPGA Logic Cluster Design Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
May Mike Drob Grant Furgiuele Ben Winters Advisor: Dr. Chris Chu Client: IBM IBM Contact – Karl Erickson.
Effective Linear Programming-Based Placement Techniques Sherief Reda UC San Diego Amit Chowdhary Intel Corporation.
SEMI-SYNTHETIC CIRCUIT GENERATION FOR TESTING INCREMENTAL PLACE AND ROUTE TOOLS David GrantGuy Lemieux University of British Columbia Vancouver, BC.
Congestion-Driven Re-Clustering for Low-cost FPGAs MASc Examination Darius Chiu Supervisor: Dr. Guy Lemieux University of British Columbia Department of.
Interconnect Characteristics of 2.5-D System Integration Scheme Yangdong (Steven) Deng & Wojciech P. Maly
Placement and Routing Algorithms. 2 FPGA Placement & Routing.
Oleg Petelin and Vaughn Betz FPL 2016
Runtime-Quality Tradeoff in Partitioning Based Multithreaded Packing
Placement study at ESA Filomena Decuzzi David Merodio Codinachs
Floating-Point FPGA (FPFPGA)
Exploiting Fast Carry Chains of FPGAs for Designing Compressor Trees
Incremental Placement Algorithm for Field Programmable Gate Arrays
APLACE: A General and Extensible Large-Scale Placer
Hyunchul Park, Kevin Fan, Manjunath Kudlur,Scott Mahlke
Ice Investigation with PPC
Chin Hau Hoo, Akash Kumar
A Novel FPGA Logic Block for Improved Arithmetic Performance
EE5780 Advanced VLSI Computer-Aided Design
mPL 5 Overview ISPD 2005 Placement Contest Entry
The performance requirements for DSP applications continue to grow and the traditional solutions do not adequately address this new challenge Paradigm.
EDA Lab., Tsinghua University
Multi-Commodity Flow-Based Spreading in a Commercial Analytic Placer
Measuring the Gap between FPGAs and ASICs
Ph.D. Thesis Numerical Solution of PDEs and Their Object-oriented Parallel Implementations Xing Cai October 26, 1998.
Chapter 3b Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction Prof. Lei He Electrical Engineering Department.
Xilinx Alliance Series
Under a Concurrent and Hierarchical Scheme
Presentation transcript:

HeAP: Heterogeneous Analytical Placement for FPGAs FPL 2012 Marcel Gort and Jason Anderson

Motivation CAD for FPGAs takes too long (up to a day). FPGA placement contributes to a large proportion of overall CAD time. In ASIC domain, where millions of cells are handled by placement, fast analytical methods dominate. Recent work [1] adapted FastPlace to homogeneous FPGAs. vs. VPR: ~13x speedup with 20% worse wirelength. Can we use analytical methods to place cells onto a realistic heterogeneous FPGA? 1. Bian H., Ling A., Choong A., Zhu J. Towards scalable placement for FPGAs. in FPGA 2010

Analytical Placement (AP) Objective function: Half-Perimeter Wirelength (HPWL) Minimizing objective function: Solve system of linear equations generated from connections between cells. Convert multi-pin nets to 2-pin nets. Create system of linear equations to solve weighted sum of squared distances between cells. Solve system using off-the-shelf linear systems solver.

Φ = Min [ (XB – XA)2 + (XC – XB)2 + (XA – 0)2 + (3 – XC)2 ] IO A B C IO 1 2 3 Φ = Min [ (XB – XA)2 + (XC – XB)2 + (XA – 0)2 + (3 – XC)2 ]

IO A B C IO 1 2 3 2 -1 0 -1 2 -1 0 -1 2 XA XB XC 3 X =

2 -1 0 -1 2 -1 0 -1 2 XA XB XC 3 X = XA = 0.75, XB=1.5, XC=2.25 1 2 3 IO A B C IO 1 2 3 2 -1 0 -1 2 -1 0 -1 2 XA XB XC 3 X = XA = 0.75, XB=1.5, XC=2.25

2 -1 0 -1 2 -1 0 -1 2 XA XB XC 3 X = XA = 0.75, XB=1.5, XC=2.25 1 2 3 IO A B C IO 1 2 3 2 -1 0 -1 2 -1 0 -1 2 XA XB XC 3 X = XA = 0.75, XB=1.5, XC=2.25

Actual Solved Solution Mention that IO is on the bottom left Need to spread cells!

Generate X and Y matrices based AP Overview Generate X and Y matrices based on netlist Solve Perturb formulation based on spreading Spread cells to legalize solution Good enough? No Yes Refinement

Legalized Solution

Spreading Adapted from SimPL [2]. Find over-utilized area. 2. Myung-Chul K., Dong-Jin L., Markov I. SimPL: An Effective Placement Algorithm. in ICCAD 2010

Spreading Adapted from SimPL [2]. Find over-utilized area. Find a larger surrounding area that can accommodate all cells within it.

Spreading Adapted from SimPL [2]. Find over-utilized area. Find a larger surrounding area that can accommodate all cells within it. Split the cells into two sets.

Spreading Assign an area to each cell which is proportional to the total area of the cells.

Spreading Assign an area to each cell which is proportional to the total area of the cells. Spread each set of cells separately, within the area assigned to it.

Spreading Assign an area to each cell which is proportional to the total area of the cells. Spread each set of cells separately, within the area assigned to it. Alternate x and y spreading directions until solution is legal.

Spreading Assign an area to each cell which is proportional to the total area of the cells. Spread each set of cells separately, within the area assigned to it. Alternate x and y spreading directions until solution is legal.

Spreading Assign an area to each cell which is proportional to the total area of the cells. Spread each set of cells separately, within the area assigned to it. Alternate x and y spreading directions until solution is legal.

Generate X and Y matrices based AP Overview Generate X and Y matrices based on netlist Solve Perturb formulation based on spreading Spread cells to legalize solution Good enough? No Yes Refinement

Pseudo-connections Solved solution Legalized solution 1 2 3 1 2 3 A IO B C 1 2 3 A IO IO Legalized solution B C 1 2 3

Pseudo-connections Solved solution Legalized solution 1 2 3 1 2 3 A IO B C 1 2 3 A IO IO Legalized solution B C 1 2 3

Pseudo-connections Solved solution Weighted pseudo-connections between A Solved solution IO IO B C 1 2 3 Weighted pseudo-connections between cells and their legalized placements.

AP Overview Re-generate X and Y matrices based on perturbed formulation Solve Perturb formulation based on spreading Spread cells to legalize solution Good enough? No Yes Refinement

HeAP Analytical placement framework that targets commercial FPGAs. Currently supports Cyclone II FPGAs. Supports RAMs, DSPs, LABs, hard macros (carry chains). Uses Quartus University Interface Program (QUIP) to replace the placer in Quartus II.

AP for FPGAs FPGA ASIC

AP for FPGAs FPGA ASIC

AP for FPGAs FPGA ASIC

AP for FPGAs FPGA ASIC Snap to Grid – cut generation in spreading is multi-objective.

AP for FPGAs FPGA ASIC Snap to Grid – cut generation in spreading is multi-objective. Carry chains – relative cell placements must be maintained.

AP for FPGAs FPGA ASIC Snap to Grid – cut generation in spreading is multi-objective. Carry chains – relative cell placements must be maintained. Cell type can only fit in correct slot type (e.g. RAMs in RAM slots).

AP for FPGAs – Spreading Spread each cell type separately. Maintain legality.

AP for FPGAs – Spreading Spread each cell type separately. Maintain legality.

AP for FPGAs – Spreading Spread each cell type separately. Maintain legality.

AP for FPGAs – Spreading Spread each cell type separately. Maintain legality.

AP for FPGAs – Spreading Spread each cell type separately. Maintain legality.

AP for FPGAs – Spreading Spread each cell type separately. Maintain legality.

AP for FPGAs – Spreading Spread each cell type separately. Handle legality in spreading.

AP for FPGAs – Solving Solver doesn’t know about resource constraints but… We can encourage solver to place a cell close to a legal slot by adding a pseudo-connection between that cell and the slot. After enough iterations, solver will begin to generate solutions that are close to legal.

AP for FPGAs – Solving Solver doesn’t know about resource constraints but… We can encourage solver to place a cell close to a legal slot by adding a pseudo-connection between that cell and the slot. After enough iterations, solver will begin to generate solutions that are close to legal.

AP for FPGAs – Solving Solver doesn’t know about resource constraints but… We can encourage solver to place a cell close to a legal slot by adding a pseudo-connection between that cell and the slot. After enough iterations, solver will begin to generate solutions that are close to legal.

AP for FPGAs – Solving Solver doesn’t know about resource constraints but… We can encourage solver to place a cell close to a legal slot by adding a pseudo-connection between that cell and the slot. After enough iterations, solver will begin to generate solutions that are close to legal. Solve different types of cells together or separately?

Solving Orders All Initial Placement Solve all Spread DSPs Spread RAMs Spread LABs Solve all Spread DSPs Spread RAMs Spread LABs

Solving Orders All Rotate Initial Placement Initial Placement Solve all Solve DSPs Spread DSPs Spread DSPs Spread RAMs Solve RAMs Spread LABs Spread RAMs Solve all Solve LABs Spread DSPs Spread LABs Spread RAMs Solve DSPs Spread LABs Spread DSPs

Solving Orders All + Rotate Initial Placement Solve all Solve DSPs Spread DSPs Spread DSPs Spread RAMs Solve RAMs Spread LABs Spread RAMs Solve LABs Spread LABs Solve all

Solving Order Convergence Rate Legalized Solved

Experimental Methodology Used Altera Quartus II to generate: I/O placement Cell packing Run HeAP, targeting smallest of three Cyclone II FPGAs on which benchmark will fit. Quartus II verifies legality and generates post-routing results.

Experimental Methodology vs. Quartus II: All 4 cores of Intel core i5 using various Quartus effort levels. HeAP parallelization During solving, x and y directions solved simultaneously. During refinement, “move suggestion” phase parallelized. vs. Simulated Annealing (SA): Code from non-timing driven VPR placement (“fast”). 10 largest benchmarks from CHStone, QUIP, and MCNC.

Placement Run-time 10x speedup 4x speedup

HPWL Comparison 7% reduction

Wirelength vs. Run-time Landscape Varied Quartus Placement effort from 0.1, 0.5, 1.0, 2.0. Varied fitter effort level between auto and standard. Ran in timing and non-timing driven modes.

Wirelength vs. P&R Run-time Landscape

Wirelength vs. P&R Run-time Landscape

Wirelength vs. P&R Run-time Landscape

FMax vs. P&R Run-time Landscape Effort = 0.1 Effort = 0.5

FMax vs. P&R Run-time Landscape Effort = 0.1 Effort = 0.5

Conclusions AP can work well for Heterogeneous FPGAs. Add pseudo connections to legal cell locations. To obtain high quality and high speed, mix solving for each cell type separately and solving for cell types together. HeAP offers better run-time and wirelength than some Quartus II low effort placement runs. HeAP is unable to equal high-effort, high-quality results obtained using Quartus II. Compared to Quartus II run with default effort levels: 4x faster than non-timing driven flow with 5% worse wirelength. 11x faster than timing-driven flow with 4% worse wirelength and 9% worse FMax.

Future Work Timing-driven AP Quality-driven spreading 9% worse FMax when ignoring timing in placement. Can we improve on that? Quality-driven spreading Spreading objective function does not consider wirelength.

Questions?