DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs. Deming Chen, Jason Cong, Computer Science Department, UCLA.

DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs
Deming Chen, Jason Cong, Computer Science Department, UCLA
Presented by Shikang Xu

Outline
Introduction
Related Works
Definitions and Problem Formulation
Algorithm Description
Discussion of Techniques

Introduction
The LUT-based FPGA architecture dominates the existing programmable chip industry.
FPGA technology mapping converts a given Boolean circuit into a functionally equivalent network comprised only of LUTs.

Related Works
Area Minimization
– Chortle-crf [Francis et al., DAC’91]
– MIS-pga [Murgai et al., ICCAD’91]
– Praetor [Cong et al., FPGA’99]
– Anti-fuse FPGA Mapper [Kang et al., ASPDAC’04]
Delay Minimization
– DAG-Map [Chen et al., DTC’92]
– FlowMap [Cong et al., ICCAD’92]
– Edge-map [Yang et al., ICCAD’94]
Power Minimization
– PowerMinMap [Li et al., ASPDAC’03]
– Emap [Lamoureux et al., ICCAD’03]
– DVmap [Chen et al., FPGA’04]
Simultaneous Delay and Area Minimization (Area Minimization under Timing Constraints)
– FlowMap-r [Cong et al., TVLSI’94]
– CutMap [Cong et al., FPGA’95]
– BoolMap-D [Legl et al., DAC’96]
Adapted from Deming Chen, Jason Cong, Computer Science Department, UCLA

Definitions
Cone (O_v): A subnetwork of the original network, consisting of v and some of its predecessors, such that for any node w in O_v there is a path from w to v in O_v.
Fanin cone (F_v): The maximum cone of v, consisting of all PI predecessors of v.
Input(O_v): The set of distinct nodes outside O_v that supply inputs to the gates in O_v.
Cut: A partitioning (X, X’) of a cone O_v such that X’ is a cone of v.
Cut-set: Written V(X, X’); it consists of input(X’).

Definitions
Cutsize: The cardinality of the cut-set. A cut is said to be K-feasible if its cutsize is <= K.
Level: The level of a node v is the length of the longest path from any PI to v.
Depth: The depth of a network is the largest node level in the network.
Mapping Depth: The largest optimal delay of the mapped circuit.
[Figure: fanin cone F_v of node v over PIs a, b, c, d, e, with a 3-feasible cone C_v and a delay of 2. Picture adapted from Deming Chen, Jason Cong, Computer Science Department, UCLA.]
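The level, depth and K-feasibility definitions above can be made concrete with a short sketch. The network below and the helper names `node_levels` and `is_k_feasible` are illustrative assumptions, not part of the presentation.

```python
# Hypothetical network: each node maps to its list of fanins; PIs have none.
network = {
    "a": [], "b": [], "c": [],
    "g1": ["a", "b"],
    "g2": ["g1", "c"],
    "g3": ["g2", "a"],
}

def node_levels(fanins):
    """Return {node: level}: length of the longest path from any PI to the node."""
    levels = {}

    def level(v):
        if v not in levels:
            preds = fanins[v]
            levels[v] = 0 if not preds else 1 + max(level(u) for u in preds)
        return levels[v]

    for v in fanins:
        level(v)
    return levels

def is_k_feasible(cut_set, k):
    """A cut is K-feasible when its cut-set holds at most K nodes."""
    return len(cut_set) <= k

levels = node_levels(network)
depth = max(levels.values())  # depth of the network = largest node level
print(levels, depth, is_k_feasible({"g1", "c", "a"}, 3))
```

The recursion is memoized in `levels`, so each node is evaluated once even when it has several fanouts.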

Problem Formulation
Area Minimization under Timing Constraint
Given: a Boolean network and the unit delay model (each LUT contributes one unit of delay).
Goal: cover the network with K-feasible cones (K-LUTs) such that
– the optimal mapping depth is guaranteed
– the area (number of LUTs) is minimized

Algorithm Description
A cut-enumeration-based method consisting of cut generation and cut selection.
Cut generation traverses the network from the PIs to the POs and combines subcuts on the fanin nodes of a target node to generate all the cuts on the target node.
After the cuts are generated, the network is traversed from the POs to the PIs, and cuts are selected to produce the LUT mapping result.

Cut Enumeration
Cut enumeration means generating all K-feasible cuts of a cone for a given node effectively.
f(K, v) represents all the K-feasible cuts rooted at node v. The operator + is Boolean OR, and the operator $\otimes^K$ is Boolean AND on its operands, but filters out all resulting p-terms with more than K variables.
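A sketch of the recurrence that these operator definitions describe, written under two assumptions: input(v) denotes the fanin nodes of v, and $\otimes^K$ is the notation for the filtered AND. The second line expands it for a hypothetical node s with fanins q and r, where the only cut of q is {m, n} and the only cut of r is {o, p}.

```latex
f(K, v) = \bigotimes^{K}_{u \in \mathrm{input}(v)} \bigl[\, u + f(K, u) \,\bigr]

f(K, s) = (q + mn) \otimes^{K} (r + op) = qr + q\,op + mn\,r + mn\,op
```

Each p-term is one cut of s. With K = 4 all four terms survive; with K = 3 the term mnop has four variables and is filtered out.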

Cut Enumeration: Example
All the cuts rooted on node s can be generated by combining the cuts rooted on its fanin nodes q and r. The cuts on the fanin nodes are called subcuts.
Combining C1 with C2 forms a new cut Cs = {m, n, o, p} rooted on s. If the number of inputs of the new cut exceeds K, the cut is discarded.
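The subcut-combination step can be sketched as follows. This is a minimal illustration rather than the DAOmap implementation: the fanin structure (q fed by m and n, r fed by o and p) is assumed from the example's node names, and `enumerate_cuts` is a hypothetical helper.

```python
from itertools import product

def enumerate_cuts(fanins, k):
    """Return {node: set of frozenset cuts} holding every K-feasible cut per node."""
    cuts = {}

    def visit(v):
        if v in cuts:
            return cuts[v]
        if not fanins[v]:
            # A primary input only has the trivial cut made of itself.
            cuts[v] = {frozenset([v])}
            return cuts[v]
        # Each fanin u contributes either itself or one of its own cuts (subcuts).
        choices = [visit(u) | {frozenset([u])} for u in fanins[v]]
        merged = set()
        for combo in product(*choices):
            cut = frozenset().union(*combo)
            if len(cut) <= k:  # discard cuts with more than K inputs
                merged.add(cut)
        cuts[v] = merged
        return merged

    for v in fanins:
        visit(v)
    return cuts

# Assumed structure for the example: s <- (q, r), q <- (m, n), r <- (o, p).
network = {"m": [], "n": [], "o": [], "p": [],
           "q": ["m", "n"], "r": ["o", "p"], "s": ["q", "r"]}

cuts_k4 = enumerate_cuts(network, 4)
cuts_k3 = enumerate_cuts(network, 3)
print(sorted(sorted(c) for c in cuts_k4["s"]))
```

With K = 4 the cut {m, n, o, p} is kept alongside {q, r}, {m, n, r} and {q, o, p}; with K = 3 it is discarded because it has four inputs.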

Cut Enumeration: Time Propagation
The arrival time propagates through each cut; each cut represents a LUT and hence a unit delay.
The minimum arrival time at a node v is $Arr_v = \min_{C \text{ on } v} \bigl[ \max_{i \in input(C)} Arr_i + 1 \bigr]$, where C ranges over every cut generated for v through cut enumeration and $Arr_i$ is the minimum arrival time on input signal i of C.
Several cuts may achieve the minimum arrival time $Arr_v$; they form a set $X_v$.
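A minimal sketch of this arrival-time propagation under the unit delay model. The cut lists are written out by hand for a small hypothetical network, and `arrival_times` is an illustrative name; PIs are assumed to arrive at time 0.

```python
def arrival_times(pis, cuts_by_node):
    """Propagate minimum arrival times.

    cuts_by_node maps each internal node to its cuts and must be listed in
    topological order (Python dicts preserve insertion order).
    Returns (arr, min_delay_cuts); min_delay_cuts[v] plays the role of X_v.
    """
    arr = {pi: 0 for pi in pis}  # assumption: PIs arrive at time 0
    min_delay_cuts = {}
    for v, cuts in cuts_by_node.items():
        # A cut is one LUT: its output arrives one unit after its latest input.
        cut_arr = {c: 1 + max(arr[i] for i in c) for c in cuts}
        arr[v] = min(cut_arr.values())
        min_delay_cuts[v] = {c for c, t in cut_arr.items() if t == arr[v]}
    return arr, min_delay_cuts

cuts_by_node = {
    "q": [frozenset("mn")],
    "r": [frozenset("op")],
    "s": [frozenset("mnop"), frozenset("mnr"), frozenset("qop"), frozenset("qr")],
}
arr, x = arrival_times(list("mnop"), cuts_by_node)
print(arr["s"], x["s"])
```

Here only the cut {m, n, o, p} reaches s in one LUT level, so it is the single member of the minimum-delay set for s.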

Cut Enumeration: Area Propagation
Similar to the arrival time, the area can also be propagated.
The area of a cut C is calculated as $A_C = \sum_{i \in input(C)} \bigl[ A_i / f(i) \bigr] + U_C$, where $U_C$ is the area contributed by the cut C, $A_i$ is the estimated area of the cone rooted on signal i, and f(i) is the fanout number of signal i.
This means that the area on i is shared and distributed among the fanout nodes of i.
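The fanout-sharing arithmetic can be illustrated with a few lines of code. The numbers are made up for illustration, and the sketch assumes that a cut contributes one LUT of area (`unit_area=1.0`).

```python
def cut_area(cut, node_area, fanout, unit_area=1.0):
    """Estimated area of a cut: its own LUT plus each input's area split by fanout."""
    return unit_area + sum(node_area[i] / fanout[i] for i in cut)

# Hypothetical values: q feeds two nodes, so only half of its area is charged here.
node_area = {"q": 1.0, "r": 1.0}
fanout = {"q": 2, "r": 1}

area_qr = cut_area(("q", "r"), node_area, fanout)
print(area_qr)
```

Input q is charged 1.0 / 2 and input r is charged 1.0 / 1, so the estimate for the cut {q, r} is 1.0 + 0.5 + 1.0.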

Delay and Area Propagation
[Figure: example network with nodes a, b, c, d, e, f, g, w, x, y, z, annotated with the delay and area of each cut and the optimal delay and area of each node. Adapted from Deming Chen, Jason Cong, Computer Science Department, UCLA.]
The propagation process visits cuts and nodes iteratively.
The longest best delay on the POs is the optimal mapping delay.

Area Propagation under Timing Constraints
To guarantee optimal mapping depth, the estimated area must be propagated together with the minimum arrival time.
$A_v = \min_{C \in X_v} A_C$ represents the best achievable area under the constraint that it also yields the optimal mapping delay up to the point of v.
With these formulae, the areas of cuts and nodes are iteratively calculated until the enumeration process reaches the POs.
During the cut selection process, when v is known not to be on a critical path, a cut C that does not belong to X_v can be chosen as long as it does not violate the timing constraint.
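A sketch of propagating delay and area together, so that a node's area is taken only over its minimum-delay cuts. The network, the fanout counts and the name `propagate` are assumptions for illustration; each LUT is assumed to cost one unit of area and PIs to have arrival time 0 and area 0.

```python
def propagate(pis, cuts_by_node, fanout):
    """Return (arr, area): best area per node restricted to its minimum-delay cuts.

    cuts_by_node must list internal nodes in topological order.
    """
    arr = {pi: 0 for pi in pis}
    area = {pi: 0.0 for pi in pis}
    for v, cuts in cuts_by_node.items():
        scored = []
        for c in cuts:
            t = 1 + max(arr[i] for i in c)                   # unit delay per LUT
            a = 1.0 + sum(area[i] / fanout[i] for i in c)    # shared input area
            scored.append((t, a))
        arr[v] = min(t for t, _ in scored)
        # Only cuts that achieve the minimum arrival time may define the node area.
        area[v] = min(a for t, a in scored if t == arr[v])
    return arr, area

pis = ["a", "b", "c", "d", "e"]
cuts_by_node = {
    "q": [frozenset("abc")],
    "r": [frozenset("qd")],
    "v": [frozenset("re"), frozenset("qde")],
}
# Hypothetical fanout counts; r is heavily shared, which makes the cut {r, e} cheap.
fanout = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1, "q": 1, "r": 4}

arr, area = propagate(pis, cuts_by_node, fanout)
print(arr["v"], area["v"])
```

In this example the cut {r, e} has the smaller estimated area (1.5) but a delay of 3, so the node area of v comes from {q, d, e}, which meets the minimum delay of 2 with an area of 2.0.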

Cost Function of a Cut
Some key parameters:
I_C: cutsize of C
N_C: number of nodes covered by C
f(v): fanout number of the root node v
R_C: number of reconvergent paths

Example of Cost Function
In the example, C1 and C2 have the same cutsize, but C2 is better:
– C2 covers two sets of reconvergent paths.
– Having a cut rooted at node 5 will reduce potential duplications.

Global Duplication Cost Adjustment
Consider potential node duplications.
Check the subcuts for multiple fanouts.
Propagate the adjusted cost globally.

Cut Selection
From POs to PIs.
Critical paths: optimal delay + best area available.
Non-critical paths: relaxed delay + better area.
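The PO-to-PI traversal with relaxed timing on non-critical paths can be sketched as below. This is a simplified illustration, not DAOmap's selection procedure: `select_cuts` is a hypothetical helper, and the `cost` values are made-up placeholders standing in for the cut cost function.

```python
def select_cuts(pis, pos, cuts_by_node, cost):
    """Pick one cut per needed node, walking from the POs back to the PIs.

    cuts_by_node lists internal nodes in topological order; cost maps a cut to
    a placeholder cost (default 1.0). Returns {node: chosen cut}, one LUT each.
    """
    arr = {pi: 0 for pi in pis}
    for v, cuts in cuts_by_node.items():
        arr[v] = min(1 + max(arr[i] for i in c) for c in cuts)
    depth = max(arr[po] for po in pos)        # optimal mapping depth
    required = {po: depth for po in pos}      # required time at each needed node
    mapping = {}
    for v in reversed(list(cuts_by_node)):    # reverse topological order
        if v not in required:
            continue                          # v is absorbed inside another LUT
        # Any cut that meets the required time is allowed, not only the fastest.
        feasible = [c for c in cuts_by_node[v]
                    if 1 + max(arr[i] for i in c) <= required[v]]
        cut = min(feasible, key=lambda c: cost.get(c, 1.0))
        mapping[v] = cut
        for i in cut:
            if i in cuts_by_node:             # internal node: needs its own LUT
                required[i] = min(required.get(i, depth), required[v] - 1)
    return mapping

pis = ["a", "b", "c", "d", "e"]
cuts_by_node = {
    "q": [frozenset("ab")],
    "p": [frozenset("cde")],
    "v": [frozenset("qp")],                       # critical PO, arrival time 2
    "w": [frozenset("abe"), frozenset("qe")],     # non-critical PO, arrival time 1
}
# Made-up costs: reusing q is assumed cheaper than duplicating its logic.
cost = {frozenset("abe"): 1.5, frozenset("qe"): 1.0}

mapping = select_cuts(pis, ["v", "w"], cuts_by_node, cost)
print(mapping)
```

Node v sits on the critical path and keeps its only, minimum-delay cut. Node w has one unit of slack, so the slower but cheaper cut {q, e} is accepted without increasing the mapping depth of 2.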

Cut Selection
Greedily picking the cuts with the smallest costs forfeits some optimization opportunities for reducing duplication locally.
Use heuristics to guide the selection procedure:
– Iterative Cut Selection Procedure
– Local Cost Adjustment: Input Sharing, Slack Distribution, Cut Probing

Efficiency
With DAOmap, the researchers report better area values with a lower runtime when compared to CutMap.
The impact of the various techniques on the final area values is shown here.

Efficiency: Impact of Techniques
Input sharing proves to be the most important technique for reducing area because it reduces the number of edges and node duplications.
The min-cost propagation experiment evaluates how accurate the cost estimation model is.
Global duplication cost adjustment offers the next largest gain, which shows that duplication of nodes adds to the area cost.

Summary
A cut-enumeration-based cut generation and cut selection process for LUT mapping.
Novel techniques give DAOmap a significant area and runtime reduction over CutMap, a state-of-the-art algorithm.