Improvements in FPGA Technology Mapping


Satrajit Chatterjee, Alan Mishchenko and Robert Brayton, U. C. Berkeley

Outline
- Review of Technology Mapping
- More Efficient Cut Computation
- Lossless Synthesis
- Area Recovery

Technology Mapping
Input: a Boolean network.
Output: a netlist of k-LUTs that implements the Boolean network while optimizing some cost function.
[Figure: the subject graph (inputs a, b, c, d, e; output f) and the mapped netlist that technology mapping produces from it]

Basic Mapping Algorithm
Cut-based mapping by dynamic programming on a DAG, which gives delay optimality.
Input: And-Inverter Graph
Step 1: Compute k-feasible cuts for each node.
Step 2: Compute the best arrival time at each node, in topological order (from PI to PO), assuming that each cut maps to a k-LUT and that each k-LUT has unit delay.
Step 3: Choose the best cover, in reverse topological order (from PO to PI).
Output: Mapped netlist
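
To make the three steps concrete, here is a minimal Python sketch on the small graph r = AND(p, q), p = AND(a, b), q = AND(b, c) used on the cut slides. It is an illustration under stated assumptions, not the authors' implementation: the cut sets are supplied by hand, every LUT has unit delay as on the slide, and ties between equally fast cuts are broken by cut size and then by leaf names (a hypothetical choice). `map_for_delay` and `cut` are illustrative names.

```python
# Toy subject graph (hypothetical data): r = AND(p, q), p = AND(a, b), q = AND(b, c).

def cut(*names):
    return frozenset(names)

def map_for_delay(order, pis, cuts, pos):
    """Unit-delay, cut-based mapping.

    order: all nodes in topological order; cuts[n]: cut set of internal node n.
    Returns (arrival times, cover), where cover maps each LUT root to its leaves.
    """
    arrival = {n: 0 for n in pis}                 # primary inputs arrive at time 0
    best = {}
    for n in order:                               # Step 2: from PI to PO
        if n in pis:
            continue
        # Every non-trivial cut can become one k-LUT with unit delay.
        candidates = [c for c in cuts[n] if c != cut(n)]
        best[n] = min(candidates,
                      key=lambda c: (1 + max(arrival[leaf] for leaf in c), len(c), sorted(c)))
        arrival[n] = 1 + max(arrival[leaf] for leaf in best[n])
    needed, cover = set(pos), {}
    for n in reversed(order):                     # Step 3: from PO to PI
        if n in needed and n not in pis:
            cover[n] = best[n]
            needed.update(best[n])
    return arrival, cover

ORDER = ["a", "b", "c", "p", "q", "r"]
PIS = {"a", "b", "c"}
CUTS = {
    "p": {cut("p"), cut("a", "b")},
    "q": {cut("q"), cut("b", "c")},
    "r": {cut("r"), cut("p", "q"), cut("p", "b", "c"), cut("a", "b", "q"), cut("a", "b", "c")},
}

arrival, cover = map_for_delay(ORDER, PIS, CUTS, ["r"])
print(arrival["r"], sorted(cover["r"]))   # 1 ['a', 'b', 'c']
```

With the 3-feasible cuts, r is covered by a single LUT over {a, b, c} with arrival time 1; keeping only cuts of size 2 would force three LUTs and an arrival time of 2.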

k-feasible Cuts (rough definitions)
A cut of a node n is a set of nodes in its transitive fan-in such that assigning values to those nodes fixes the value of n. A cut is k-feasible if it has k or fewer nodes.
[Figure: node r with fan-ins p and q; p has fan-ins a and b; q has fan-ins b and c]
The set {p, b, c} is a 3-feasible cut of node r. (It is also a 5-feasible cut.)
k-feasible cuts are important in FPGA mapping because the logic between a node and the nodes in its cut can be replaced by a k-LUT.
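
Structurally, the definition amounts to requiring that every path from a primary input to n passes through the cut. A small Python check of that reading, on the graph of this slide (the `fanins` dictionary and the function names are hypothetical):

```python
def is_cut(fanins, node, leaves):
    """Structural check: every path from a primary input up to `node` crosses `leaves`.

    fanins maps each internal node to its fan-in nodes; primary inputs have no entry.
    Leaves outside the transitive fan-in of `node` are not rejected (rough definition).
    """
    stack, seen = [node], set()
    while stack:
        n = stack.pop()
        if n in leaves or n in seen:
            continue
        seen.add(n)
        children = fanins.get(n, ())
        if not children:          # reached a primary input without crossing the cut
            return False
        stack.extend(children)
    return True

def is_k_feasible_cut(fanins, node, leaves, k):
    return len(leaves) <= k and is_cut(fanins, node, leaves)

FANINS = {"r": ("p", "q"), "p": ("a", "b"), "q": ("b", "c")}
print(is_k_feasible_cut(FANINS, "r", {"p", "b", "c"}, 3))  # True
print(is_cut(FANINS, "r", {"p", "b"}))                      # False: path c -> q -> r
```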

k-feasible Cut Computation
The set of cuts of a node is a 'cross product' of the sets of cuts of its children. Computation is done bottom-up:
Cuts(a) = { {a} }, Cuts(b) = { {b} }, Cuts(c) = { {c} }
Cuts(p) = { {p}, {a, b} }, Cuts(q) = { {q}, {b, c} }
Cuts(r) = { {r}, {p, q}, {p, b, c}, {a, b, q}, {a, b, c} }
Any cut of size greater than k is discarded (Pan ’98, Cong ’99).
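
For a node n with fan-ins n1 and n2, the cross-product rule reads Cuts(n) = { {n} } ∪ { u ∪ v : u ∈ Cuts(n1), v ∈ Cuts(n2), |u ∪ v| ≤ k }. A Python sketch of this bottom-up enumeration on a two-input graph follows. It is illustrative only (no dominance filtering, no memory management), and the names are hypothetical.

```python
from itertools import product

def enumerate_cuts(fanins, order, k):
    """Bottom-up k-feasible cut enumeration on a DAG of two-input nodes (e.g. an AIG).

    The cuts of an internal node are its trivial cut plus every pairwise union
    ('cross product') of its fan-ins' cuts that has at most k leaves.
    """
    cuts = {}
    for n in order:                               # topological order: fan-ins first
        trivial = frozenset([n])
        if n not in fanins:                       # primary input
            cuts[n] = {trivial}
            continue
        left, right = fanins[n]
        merged = {u | v for u, v in product(cuts[left], cuts[right])}
        cuts[n] = {trivial} | {c for c in merged if len(c) <= k}
    return cuts

FANINS = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q")}
ORDER = ["a", "b", "c", "p", "q", "r"]
cuts = enumerate_cuts(FANINS, ORDER, k=3)
print(sorted(sorted(c) for c in cuts["r"]))
# [['a', 'b', 'c'], ['a', 'b', 'q'], ['b', 'c', 'p'], ['p', 'q'], ['r']]
```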

Outline
- Review of Technology Mapping
- More Efficient Cut Computation
  - Cut Dropping
  - Cut Dominance
- Lossless Synthesis
- Area Recovery

Cut Dropping
During bottom-up computation of cuts, the set of cuts of a node can be freed once all its fan-outs have been processed.
[Figure: the cut sets Cuts(r) = { {r}, {p, q}, {p, b, c}, {a, b, q}, {a, b, c} }, Cuts(p) = { {p}, {a, b} } and Cuts(q) = { {q}, {b, c} }, computed bottom-up; the cuts of q can be deleted once node r is done]
Once the cuts of node r are computed, the cuts of q are no longer needed. However, the cuts of node p cannot be discarded, since not all fan-outs of p have been processed.
This dramatically reduces peak memory consumption on large designs.
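
A Python sketch of cut dropping layered on plain bottom-up enumeration: count the fan-outs of every node up front, decrement as each fan-out is processed, and delete the cut set when the count reaches zero. The node s = AND(p, d) is a hypothetical second fan-out of p, added so that p's cuts must survive past r. In a real mapper the best cut and arrival time of a node would presumably be kept even after its cut set is freed; the names here are illustrative.

```python
from itertools import product

def cuts_with_dropping(fanins, order, k):
    """Bottom-up k-feasible cut enumeration that frees a node's cut set as soon
    as all of its fan-outs have been processed.

    Returns (cut sets still alive, dict: node -> nodes freed right after it).
    """
    fanouts_left = {n: 0 for n in order}
    for n in order:
        for f in fanins.get(n, ()):
            fanouts_left[f] += 1
    cuts, freed_after = {}, {}
    for n in order:
        trivial = frozenset([n])
        if n in fanins:
            left, right = fanins[n]
            merged = {u | v for u, v in product(cuts[left], cuts[right])}
            cuts[n] = {trivial} | {c for c in merged if len(c) <= k}
        else:                                   # primary input
            cuts[n] = {trivial}
        freed_after[n] = []
        for f in fanins.get(n, ()):
            fanouts_left[f] -= 1
            if fanouts_left[f] == 0:            # last fan-out processed
                del cuts[f]
                freed_after[n].append(f)
    return cuts, freed_after

# s = AND(p, d) is a hypothetical second fan-out of p.
FANINS = {"p": ("a", "b"), "q": ("b", "c"), "r": ("p", "q"), "s": ("p", "d")}
ORDER = ["a", "b", "c", "d", "p", "q", "r", "s"]
alive, freed_after = cuts_with_dropping(FANINS, ORDER, k=3)
print(freed_after["r"])   # ['q']  (p still has fan-out s)
print(sorted(alive))      # ['r', 's']
```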

Cuts Behaving Badly
Bottom-up cut computation in the presence of reconvergence might produce dominated cuts.
[Figure: node x = ~a + a.b + ~b.c built over inputs a, b, c through internal nodes d, e and f; Cuts(f) include {d, b, c} and {a, b, c}, and Cuts(x) include {a, d, b, c} and {a, b, c}]
Cut {a, b, c} dominates cut {a, d, b, c}.
The "good" cut {a, b, c} is there, so this is not a quality issue. But the "bad" cut {a, d, b, c} may be propagated further, so it is a run-time issue.
We want to discard dominated cuts quickly.
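
One way to keep cut sets free of dominated cuts is to filter on insertion: reject a new cut if some stored cut is a subset of it, and evict stored cuts that the new cut is a subset of. A Python sketch with plain subset tests follows; the cut list and the name `add_cut` are hypothetical.

```python
def add_cut(cut_set, new_cut):
    """Insert new_cut into a list of cuts, keeping only non-dominated cuts.

    A cut dominates another if it is a subset of it. The dominating cut has
    fewer leaves, and its arrival time and area-flow are no worse.
    """
    if any(c <= new_cut for c in cut_set):           # new_cut is dominated (or a duplicate)
        return cut_set
    kept = [c for c in cut_set if not new_cut <= c]  # evict cuts that new_cut dominates
    kept.append(new_cut)
    return kept

cuts = []
for c in (frozenset({"a", "d", "b", "c"}), frozenset({"a", "b", "c"}), frozenset({"d", "e"})):
    cuts = add_cut(cuts, c)
print(sorted(sorted(c) for c in cuts))  # [['a', 'b', 'c'], ['d', 'e']]
```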

Signature-based Dominance
Problem: given two cuts, how can we quickly determine whether one is a subset of the other?
Define the signature of a cut c as
$\mathrm{sig}(c) = \sum_{n \in c} 2^{\mathrm{ID}(n) \bmod 32}$
where Σ denotes bitwise OR and ID(n) is the integer ID of node n.
Observation: if cut c1 dominates cut c2 (c1 ⊆ c2), then sig(c1) OR sig(c2) = sig(c2).
This is a cheap test for the common case in which a cut does not dominate another: if the equality fails, c1 cannot dominate c2. Only if it holds is an actual set comparison made.
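
A Python sketch of the signature filter, using Python integers as 32-bit masks and the node IDs from the example (a = 1, b = 2, c = 3, d = 4). The function names are hypothetical, and a real mapper would store the signature with each cut rather than recompute it.

```python
def signature(cut, node_id):
    """sig(c): bitwise OR of 2**(ID(n) mod 32) over the nodes n of the cut."""
    sig = 0
    for n in cut:
        sig |= 1 << (node_id[n] % 32)
    return sig

def dominates(c1, c2, node_id):
    """True if c1 is a subset of c2; the signature test filters most non-subsets."""
    s1, s2 = signature(c1, node_id), signature(c2, node_id)
    if s1 | s2 != s2:        # c1 has a bit that c2 lacks, so c1 cannot be a subset
        return False
    return c1 <= c2          # bits can alias (IDs mod 32), so confirm exactly

NODE_ID = {"a": 1, "b": 2, "c": 3, "d": 4}   # IDs from the example slide
C1, C2 = frozenset({"a", "b", "c"}), frozenset({"a", "d", "b", "c"})
print(bin(signature(C1, NODE_ID)), bin(signature(C2, NODE_ID)))  # 0b1110 0b11110
print(dominates(C1, C2, NODE_ID), dominates(C2, C1, NODE_ID))    # True False
```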

Example
Let the node IDs be a = 1, b = 2, c = 3, d = 4, and let c1 = {a, b, c} and c2 = {a, d, b, c}.
sig(c1) = 2^1 OR 2^2 OR 2^3 = 00010 OR 00100 OR 01000 = 01110
sig(c2) = 2^1 OR 2^4 OR 2^2 OR 2^3 = 00010 OR 10000 OR 00100 OR 01000 = 11110
As sig(c1) OR sig(c2) ≠ sig(c1), c2 does not dominate c1.
But sig(c1) OR sig(c2) = sig(c2), so c1 may dominate c2.

Other Uses of Signatures
Signatures can be used as quick negative tests for equality of cuts and for k-feasibility.
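
Both uses follow from the same bit masks. Different signatures prove that two cuts differ. Each set bit of the OR of two signatures needs at least one distinct node, so more than k set bits prove that the merged cut is not k-feasible. Neither test can prove the positive case, because IDs that agree modulo 32 share a bit. A Python sketch (hypothetical names):

```python
def signature(cut, node_id):
    sig = 0
    for n in cut:
        sig |= 1 << (node_id[n] % 32)
    return sig

def popcount(x):
    return bin(x).count("1")

def definitely_different(sig1, sig2):
    """Cuts with different signatures cannot be equal."""
    return sig1 != sig2

def definitely_too_big(sig1, sig2, k):
    """Each set bit of sig1 | sig2 needs at least one distinct node, so more than
    k set bits proves that the merged cut has more than k leaves."""
    return popcount(sig1 | sig2) > k

NODE_ID = {"a": 1, "b": 2, "c": 3, "d": 4}
s_ab = signature({"a", "b"}, NODE_ID)          # 0b110
s_cd = signature({"c", "d"}, NODE_ID)          # 0b11000
print(definitely_too_big(s_ab, s_cd, 3))       # True: {a, b, c, d} has 4 leaves
# IDs 1 and 33 share bit 1, so this pair cannot be rejected even for k = 1.
print(definitely_too_big(1 << 1, 1 << (33 % 32), 1))  # False
```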

Run-time of k-feasible cut computation

Peak Memory in Mb with Cut Dropping

Outline
- Review of Technology Mapping
- More Efficient Cut Computation
- Lossless Synthesis
- Area Recovery

Structural Bias
The mapped netlist very closely resembles the subject graph.
[Figure: a subject graph over inputs a, b, c, d, e with internal nodes m and p and output f, and the mapped netlist that technology mapping produces from it]
Every input of every LUT in the mapped netlist must be present in the subject graph; otherwise technology mapping will not find the match.

The Problem of Structural Bias
A better match may not be found.
[Figure: the subject graph for f (internal nodes m and p), the match that is found, and a better match that uses a point q; the better match is not found]
Since the point q is not present in the subject graph, the match on the extreme right will not be found.

The Problem of Structural Bias
The match would be found with a different subject graph.
[Figure: a different subject graph for the same function f that contains the point q, so the better match is found]

Traditional Synthesis
Only the network at the end of technology-independent synthesis is used for mapping.
[Figure: Boolean network → technology-independent synthesis (sweep, eliminate, resub, simplify, fx, resub, sweep, eliminate, sweep, full simplify) → technology mapping → mapped netlist]
There is no guarantee of optimality, since each synthesis step is heuristic. But because of structural bias, the mapped netlist depends heavily on the final network.

Lossless Synthesis
Idea: merge the intermediate networks into a single network with choices, which is used for mapping.
[Figure: the same synthesis flow, with a choice operator that collects the intermediate networks and passes the combined network to technology mapping]
Technology mapping is not any harder with choices (Lehman-Watanabe ’95, Chen and Cong ’01).

Lossless Synthesis
The results of different technology-independent optimization scripts can be combined.
[Figure: the Boolean network is processed by a script that optimizes area (sweep, eliminate, resub, simplify, fx, resub, sweep, eliminate, sweep, full simplify) and by a script that optimizes delay (speed up, reduce depth); both feed technology mapping, which produces the mapped netlist]

Mapping with Choices
[Figure: the synthesis flow (sweep, eliminate, resub, simplify, fx, resub, sweep, eliminate, sweep, full simplify) whose intermediate networks are combined for technology mapping]
Question 1: How to implement an efficient choice operator?
Question 2: How to map quickly with choices?


Detecting Choices
Task: given two Boolean networks, create a network with choices.
Network 1: x = (a + b).c, y = b.c.d
Network 2: x = a.c + b.c, y = b.c.d
Step 1: Make an And-Inverter decomposition of the networks.
[Figure: And-Inverter graphs of Network 1 and Network 2, each with outputs x and y over inputs a, b, c, d]

Detecting Choices
Step 2: Use combinational equivalence checking to detect functionally equivalent nodes, up to complementation (Kuehlmann ’04, …):
- random simulation to detect possibly equivalent nodes;
- a SAT-based decision procedure to prove equivalence.
Network 1: x = (a + b).c, y = b.c.d
Network 2: x = a.c + b.c, y = b.c.d
[Figure: the And-Inverter graphs of the two networks]
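
The simulation half of Step 2 can be sketched in Python on the two networks of this slide, built as And-Inverter Graphs with complemented edges. The node names (`n1_*`, `n2_*`), the edge encoding and the 64-bit word size are assumptions for illustration. Nodes with matching words are only candidates; each candidate pair would still have to be proven equivalent by the SAT-based procedure, which is not shown.

```python
import random

MASK = (1 << 64) - 1            # 64 input patterns simulated in parallel

def simulate(pis, ands, rng):
    """Bit-parallel random simulation of an AIG.

    ands: (name, (fanin, complemented), (fanin, complemented)) in topological order.
    """
    val = {p: rng.getrandbits(64) for p in pis}
    for name, (f0, c0), (f1, c1) in ands:
        v0 = val[f0] ^ MASK if c0 else val[f0]
        v1 = val[f1] ^ MASK if c1 else val[f1]
        val[name] = v0 & v1
    return val

def candidate_classes(val):
    """Group nodes whose simulation words are equal up to complementation."""
    groups = {}
    for name, word in val.items():
        key = word ^ MASK if word & 1 else word   # normalize phase: bit 0 cleared
        groups.setdefault(key, []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

PIS = ["a", "b", "c", "d"]
ANDS = [
    # Network 1: x = (a + b).c, y = b.c.d
    ("n1_nor", ("a", True), ("b", True)),          # ~a.~b = ~(a + b)
    ("n1_x", ("n1_nor", True), ("c", False)),      # (a + b).c
    ("n1_bc", ("b", False), ("c", False)),
    ("n1_y", ("n1_bc", False), ("d", False)),
    # Network 2: x = a.c + b.c, y = b.c.d
    ("n2_ac", ("a", False), ("c", False)),
    ("n2_bc", ("b", False), ("c", False)),
    ("n2_nx", ("n2_ac", True), ("n2_bc", True)),   # ~(a.c + b.c), i.e. NOT x
    ("n2_y", ("n2_bc", False), ("d", False)),
]

print(candidate_classes(simulate(PIS, ANDS, random.Random(2024))))
```

The three true equivalences always land in the same class. Network 2's x node appears as its complement, because this AIG computes NOT x and the inversion sits on the output edge.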

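Detecting Choices
Step 3: Merge equivalent nodes with choice edges.
[Figure: the two And-Inverter graphs merged into one over inputs a, b, c, d, with choice edges connecting equivalent nodes; outputs x and y]
x now represents a class of nodes that are functionally equivalent up to complementation.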
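
Once candidate pairs are proven, the classes and their relative phases can be kept in a union-find structure. A minimal Python sketch, assuming each proven pair arrives as (node, node, complemented); the class name, the method names and the rule that the first node merged becomes the representative are hypothetical.

```python
class ChoiceClasses:
    """Union-find over AIG nodes that tracks phase: phase[n] is True when n is
    the complement of its parent."""

    def __init__(self):
        self.parent = {}
        self.phase = {}

    def find(self, n):
        """Return (representative of n's class, True if n == NOT representative)."""
        if n not in self.parent:
            self.parent[n], self.phase[n] = n, False
        p = self.parent[n]
        if p == n:
            return n, False
        root, p_phase = self.find(p)
        self.parent[n] = root                      # path compression
        self.phase[n] ^= p_phase                   # phase of n relative to root
        return root, self.phase[n]

    def merge(self, a, b, complemented):
        """Record a proven equivalence: a == b, or a == NOT b if complemented."""
        ra, pa = self.find(a)
        rb, pb = self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            # rb = b ^ pb, b = a ^ complemented, a = ra ^ pa
            self.phase[rb] = pa ^ pb ^ complemented

classes = ChoiceClasses()
classes.merge("n1_x", "n2_nx", complemented=True)
classes.merge("n1_y", "n2_y", complemented=False)
print(classes.find("n2_nx"), classes.find("n2_y"))  # ('n1_x', True) ('n1_y', False)
```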

Mapping with Choices
[Figure: the synthesis flow (sweep, eliminate, resub, simplify, fx, resub, sweep, eliminate, sweep, full simplify) whose intermediate networks are combined for technology mapping]
Question 1: How to implement an efficient choice operator?
Question 2: How to map quickly with choices?

Mapping with Choices
Only Step 1 requires modification.
Input: And-Inverter Graph with choices
Step 1: Compute k-feasible cuts with choices.
Step 2: Compute the best arrival time at each node, in topological order (from PI to PO), assuming that each cut maps to a k-LUT and that each k-LUT has unit delay.
Step 3: Choose the best cover, in reverse topological order (from PO to PI).
Output: Mapped netlist

Cut Computation with Choices
Cuts are now computed for equivalence classes of nodes.
[Figure: choice class x with members x1 and x2; internal nodes p, q, r; output y; inputs a, b, c, d]
Cuts(x1) = { {x1}, {p, r}, {p, b, c}, {a, c, r}, {a, b, c} }
Cuts(x2) = { {x2}, {q, c}, {a, b, c} }
Cuts(x) = Cuts(x1) ∪ Cuts(x2) = { {x1}, {p, r}, {p, b, c}, {a, c, r}, {a, b, c}, {x2}, {q, c} }
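
Computing the cut set of a choice class is then a set union over its members. A Python sketch with the cut sets of this slide (hypothetical helper names):

```python
def cuts_of_class(*member_cut_sets):
    """Cut set of a choice class: the union of the cut sets of its members.
    A cut shared by several members, such as {a, b, c}, appears only once."""
    result = set()
    for cut_set in member_cut_sets:
        result |= cut_set
    return result

def cuts(*groups):
    return {frozenset(g) for g in groups}

X1 = cuts({"x1"}, {"p", "r"}, {"p", "b", "c"}, {"a", "c", "r"}, {"a", "b", "c"})
X2 = cuts({"x2"}, {"q", "c"}, {"a", "b", "c"})
X = cuts_of_class(X1, X2)
print(len(X))  # 7
```

The union has seven cuts rather than eight because {a, b, c} is shared by x1 and x2.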

Mapping with Choices
After Step 1, everything else remains the same.
Input: And-Inverter Graph with choices
Step 1: Compute k-feasible cuts with choices.
Step 2: Compute the best arrival time at each node, in topological order (from PI to PO), assuming that each cut maps to a k-LUT and that each k-LUT has unit delay.
Step 3: Choose the best cover, in reverse topological order (from PO to PI).
Output: Mapped netlist

Outline
- Review of Technology Mapping
- More Efficient Cut Computation
- Lossless Synthesis
- Area Recovery
  - Area-flow
  - Exact Area

Overview of Area Recovery
The initial mapping is delay-oriented: it gets the best delay for all paths, with area-based tie-breaking.
Not all paths are critical. Area recovery tries to slow down non-critical paths to reduce area: for each node with positive slack, choose a different cut that reduces area.
Area recovery is done as subsequent passes after delay-oriented mapping.
Question: how should area be measured?
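
Slack needs required times, which can be propagated backwards over the delay-oriented cover. A Python sketch, assuming unit LUT delay and one global required time equal to the latest primary-output arrival; the cover, arrival times and names are hypothetical.

```python
INF = float("inf")

def slacks(cover, arrival, pos, order):
    """Slack of every node used by a LUT cover, assuming unit LUT delay and a
    single required time equal to the latest PO arrival (an assumption)."""
    target = max(arrival[o] for o in pos)
    required = {n: INF for n in order}
    for o in pos:
        required[o] = target
    for n in reversed(order):                    # reverse topological order
        for leaf in cover.get(n, ()):
            required[leaf] = min(required[leaf], required[n] - 1)
    return {n: required[n] - arrival[n] for n in order if required[n] != INF}

ORDER = ["a", "b", "c", "d", "e", "u", "v", "w"]
COVER = {"u": {"a", "b"}, "v": {"u", "c"}, "w": {"d", "e"}}   # LUT root -> leaves
ARRIVAL = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 0, "u": 1, "v": 2, "w": 1}
s = slacks(COVER, ARRIVAL, ["v", "w"], ORDER)
print(s["v"], s["w"])  # 0 1
```

Here the LUT w and its inputs have slack 1, so area recovery may pick a slower cut for w, while u and v lie on the critical path.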

How to Measure Area?
Naïve definition: Area(cut) = 1 + [ Σ area(fan-in) ]
[Figure: outputs x and y over internal nodes p, q, r and inputs a, b, c, d, e, f; the two candidate cuts are shown on two copies of the graph]
Area of cut {p, c, d} = 1 + [1 + 0 + 0] = 2
Area of cut {a, b, q} = 1 + [0 + 0 + 1] = 2
The naïve definition says both cuts are equally good in area, because it ignores sharing due to multiple fan-outs.

Area-flow
Area-flow "correctly" accounts for sharing:
Area-flow(cut) = 1 + [ Σ ( area-flow(fan-in) / fan-out(fan-in) ) ]
[Figure: the same graph as in the naïve-area example]
Area-flow of cut {p, c, d} = 1 + [1 + 0 + 0] = 2
Area-flow of cut {a, b, q} = 1 + [0/1 + 0/1 + 1/2] = 1.5
Area-flow recognizes that cut {a, b, q} is better (Cong ’99, Manohararajah ’04).
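
The naïve measure and area-flow can be compared directly in Python, using the values read off the figure: primary inputs cost 0, p and q each cost one LUT, and q has two fan-outs. The fan-out of the inputs is set to 1, which does not matter since their area is 0. Function names are hypothetical.

```python
def naive_area(cut, area):
    """Area(cut) = 1 + sum of the areas of the cut's leaves."""
    return 1 + sum(area[leaf] for leaf in cut)

def area_flow(cut, flow, fanout):
    """Area-flow(cut) = 1 + sum of flow(leaf) / fanout(leaf): the area of a
    shared leaf is split among its fan-outs."""
    return 1 + sum(flow[leaf] / fanout[leaf] for leaf in cut)

# Values read off the slide: inputs cost 0, p and q cost one LUT each,
# and q is shared by two fan-outs.
AREA = {"a": 0, "b": 0, "c": 0, "d": 0, "p": 1, "q": 1}
FANOUT = {"a": 1, "b": 1, "c": 1, "d": 1, "p": 1, "q": 2}
CUT_PCD, CUT_ABQ = frozenset({"p", "c", "d"}), frozenset({"a", "b", "q"})

print(naive_area(CUT_PCD, AREA), naive_area(CUT_ABQ, AREA))                # 2 2
print(area_flow(CUT_PCD, AREA, FANOUT), area_flow(CUT_ABQ, AREA, FANOUT))  # 2.0 1.5
```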

Area Recovery with Area-flow
- Do delay-optimal mapping.
- Compute the slack at each node.
- Do area recovery with area-flow, in topological order from PI to PO: among all the cuts that do not exceed the slack budget, choose the cut with the smallest area-flow. The fan-out of a node is estimated from the delay-optimal mapping.
We only do one pass; we saw only marginal improvement on subsequent passes.
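
The selection step can be sketched in Python as a filter on the slack budget followed by a minimum over the cost. The arrival times below are hypothetical, the area-flow values are those of the area-flow example, and `recover_area` is an illustrative name.

```python
def recover_area(node, cuts, arrival, required, cost):
    """Area-recovery choice at one node: among the cuts whose unit-delay LUT
    still meets the node's required time, return the one with the smallest
    cost (e.g. area-flow). Ties go to the earlier arrival, then leaf names."""
    def lut_arrival(c):
        return 1 + max(arrival[leaf] for leaf in c)
    feasible = [c for c in cuts if lut_arrival(c) <= required[node]]
    return min(feasible, key=lambda c: (cost(c), lut_arrival(c), sorted(c)))

ARRIVAL = {"a": 0, "b": 0, "c": 0, "d": 0, "p": 1, "q": 2}    # hypothetical
CUT_PCD, CUT_ABQ = frozenset({"p", "c", "d"}), frozenset({"a", "b", "q"})
FLOW = {CUT_PCD: 2.0, CUT_ABQ: 1.5}                            # area-flow values

tight = recover_area("y", [CUT_PCD, CUT_ABQ], ARRIVAL, {"y": 2}, FLOW.get)
loose = recover_area("y", [CUT_PCD, CUT_ABQ], ARRIVAL, {"y": 3}, FLOW.get)
print(sorted(tight), sorted(loose))   # ['c', 'd', 'p'] ['a', 'b', 'q']
```

With a required time of 2 only {p, c, d} meets the budget, so it is kept; with a required time of 3 the cheaper {a, b, q} is chosen. If no cut met the budget, `min` would raise `ValueError`; when required times come from the delay-optimal mapping, the delay-optimal cut meets its own budget.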

Exact Area
Exact-area(cut) = 1 + [ Σ exact-area(fan-in with no other fan-out) ]
[Figure: output f over internal nodes p, q, s, t and inputs a, b, c, d, e, f, shown twice with the two candidate cuts]
Cut {p, e, f}: area-flow = 1 + [(.25 + .25 + 3)/2] = 2.75; exact area = 1 + 0 = 1 (p is used elsewhere). Exact area will choose this cut.
Cut {s, t, q}: area-flow = 1 + [.25 + .25 + 1] = 2.5; exact area = 1 + 1 = 2 (due to q). Area-flow will choose this cut.
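
One common way to compute exact area is reference counting: temporarily reference the candidate cut, counting every LUT that becomes newly needed, then undo the references. A Python sketch under that assumption follows. The ref/deref scheme, the toy best cuts and reference counts, and all names are assumptions, chosen to match the slide's annotations (p, s and t are already used elsewhere; q is not).

```python
def ref_cut(cut, best_cut, refs, pis):
    """Reference the leaves of `cut`; return the LUTs this adds (1 + newly needed)."""
    area = 1
    for leaf in cut:
        if leaf in pis:
            continue
        refs[leaf] += 1
        if refs[leaf] == 1:                 # leaf's LUT was not used before
            area += ref_cut(best_cut[leaf], best_cut, refs, pis)
    return area

def deref_cut(cut, best_cut, refs, pis):
    """Undo ref_cut; return the LUTs that become unused."""
    area = 1
    for leaf in cut:
        if leaf in pis:
            continue
        refs[leaf] -= 1
        if refs[leaf] == 0:
            area += deref_cut(best_cut[leaf], best_cut, refs, pis)
    return area

def exact_area(cut, best_cut, refs, pis):
    """LUTs that choosing `cut` would add to the current cover; refs are restored."""
    area = ref_cut(cut, best_cut, refs, pis)
    deref_cut(cut, best_cut, refs, pis)
    return area

PIS = set("abcdef")                          # primary inputs a .. f
BEST_CUT = {"p": {"a", "b"}, "q": {"c", "d"}, "s": {"a", "c"}, "t": {"b", "d"}}
REFS = {"p": 1, "q": 0, "s": 1, "t": 1}      # p, s and t are already used elsewhere
print(exact_area({"p", "e", "f"}, BEST_CUT, REFS, PIS))   # 1
print(exact_area({"s", "t", "q"}, BEST_CUT, REFS, PIS))   # 2
```

This reproduces the slide's exact-area values: 1 for {p, e, f} and 2 for {s, t, q}.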

Area Recovery with Exact-area
- Do delay-optimal mapping.
- Compute the slack at each node.
- Do area recovery with area-flow.
- Do area recovery with exact-area, in topological order from PI to PO: among all the cuts that do not exceed the slack budget, choose the cut with the smallest exact-area.
Note: unlike area-flow, no estimation is involved.
We only do one pass; we saw only marginal improvement on subsequent passes.

Area Recovery Summary
Two-step area recovery:
- Area-flow has a global view.
- Exact area has a local view and ensures that a local minimum is reached.
The order in which nodes are processed in both steps is important, as is the order of the two passes.

Experimental Comparison
- Area recovery is compared with the state-of-the-art academic mapper DAOmap.
  - DAOmap uses many (~10) different area recovery heuristics, some more effective than others.
  - Just the two heuristics of area-flow and exact-area give better results on their benchmarks.
- A separate comparison uses choices obtained from the lossless synthesis flow:
  - six snapshots of MVSIS script.rugged (not the best FPGA optimization script);
  - this improves both area and delay.

Comparison with DAOmap

Summary
- Improvements to cut computation
  - Cut dropping
  - Signature-based dominance check
- Lossless synthesis
  - Map over multiple synthesis snapshots
- Simpler, faster and better area recovery
  - Global area-flow
  - Local exact area
  - Order of application is important
- Implemented in the abc system (Google: “abc berkeley logic synthesis”)