Global Delay Optimization using Structural Choices
Alan Mishchenko, Robert Brayton (UC Berkeley)
Stephen Jang (Xilinx Inc.)



2 Overview
- Motivation
- Timing criticality
- Restructuring for delay
- Algorithm
- Experimental results
- Conclusions
- Future work

3 Motivation
- An AIG is an And-Inverter Graph
- AIG-based combinational logic synthesis is fast and effective
- AIG-based synthesis is area-oriented (except balancing)
- Needed: delay optimization in AIG-based synthesis
- AIGs allow for accumulation of structural choices [Lehman et al, TCAD’97; Chatterjee et al, ICCAD’05]
- Can leverage an efficient technology mapper with choices
- Can lead to fast delay optimization (~10% of mapping time)

4 Distinctive Features
- Traditional approach
  - For all timing-critical areas
    - Perform timing analysis
    - Generate alternative structures
    - Evaluate the improvement and decide whether the transformation is accepted
- Proposed approach
  - Perform timing analysis only once
  - For all timing-critical areas
    - Generate and store structural choices
  - Use the technology mapper to pick and choose good structures
- Characteristics of the proposed approach
  - Fast, because there is no repeated timing analysis
  - Simple, because it leverages the AIG package and LUT mapper
  - Effective, because it makes decisions in the global space

5 Timing Criticality
- Critical nodes
  - Used by many traditional algorithms
- Critical edges
  - Used by our algorithm
- We pre-compute critical edges of critical nodes
  - Reduces computation
- An edge between critical nodes may not be critical
  - See illustration: edge 1 → 3
[Figure: example network drawn between primary inputs and primary outputs]
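The distinction between critical nodes and critical edges can be illustrated with a small sketch. It assumes a unit-delay model and a made-up four-node network whose node names only mirror the "edge 1 → 3" label on the slide; the function `critical_edges` and its arguments are hypothetical and are not taken from ABC.

```python
def critical_edges(fanins, outputs, window=0, delay=1):
    """Unit-delay timing analysis on a DAG.

    fanins maps each node to its list of fanins and must be listed in
    topological order (primary inputs first, with empty fanin lists).
    Returns (slack, critical_nodes, critical_edges).
    """
    order = list(fanins)

    # Forward pass: arrival time of every node.
    arrival = {}
    for n in order:
        ins = fanins[n]
        arrival[n] = 0 if not ins else delay + max(arrival[f] for f in ins)

    # Backward pass: required time of every node.
    max_delay = max(arrival[o] for o in outputs)
    required = {n: float("inf") for n in order}
    for o in outputs:
        required[o] = max_delay
    for n in reversed(order):
        for f in fanins[n]:
            required[f] = min(required[f], required[n] - delay)

    slack = {n: required[n] - arrival[n] for n in order}
    crit_nodes = {n for n in order if slack[n] <= window}

    # An edge f -> n is critical only if the path THROUGH that edge is tight.
    crit_edges = {
        (f, n)
        for n in crit_nodes
        for f in fanins[n]
        if f in crit_nodes and required[n] - delay - arrival[f] <= window
    }
    return slack, crit_nodes, crit_edges


# Hypothetical network: a -> 1 -> 2 -> 3, plus the shortcut edge 1 -> 3.
fanins = {"a": [], "1": ["a"], "2": ["1"], "3": ["1", "2"]}
slack, crit_nodes, crit_edges = critical_edges(fanins, outputs=["3"])

print(sorted(crit_nodes))   # all four nodes have zero slack
print(sorted(crit_edges))   # the shortcut ("1", "3") is absent
```

Nodes 1 and 3 are both on the longest path, yet the direct edge between them has one unit of slack because the longest path runs through node 2. Restructuring with respect to that edge would not shorten the critical path, which is why edges rather than nodes are tracked.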

6 Delay-Oriented Restructuring
- Using traditional MUX-restructuring
  - AKA generalized select transform
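A minimal sketch of why MUX-restructuring helps delay, under stated assumptions: every gate, including the MUX, costs one unit of delay, and the example function, the arrival times, and the helper names (`evaluate`, `cofactor`, `arrival`) are all made up for illustration. The late signal `x` is moved from the bottom of a gate chain to the select input of a single MUX over the two cofactors.

```python
from itertools import product

# Expressions are nested tuples:
#   ("var", name), ("const", 0 or 1),
#   ("and", a, b), ("or", a, b), ("mux", sel, then_branch, else_branch)

def evaluate(e, env):
    op = e[0]
    if op == "var":
        return env[e[1]]
    if op == "const":
        return e[1]
    if op == "and":
        return evaluate(e[1], env) & evaluate(e[2], env)
    if op == "or":
        return evaluate(e[1], env) | evaluate(e[2], env)
    if op == "mux":
        return evaluate(e[2], env) if evaluate(e[1], env) else evaluate(e[3], env)
    raise ValueError(op)

def cofactor(e, var, val):
    """Substitute var := val in an AND/OR expression and fold constants."""
    op = e[0]
    if op == "var":
        return ("const", val) if e[1] == var else e
    if op == "const":
        return e
    a, b = cofactor(e[1], var, val), cofactor(e[2], var, val)
    absorbing = 0 if op == "and" else 1      # 0 kills AND, 1 kills OR
    for x, y in ((a, b), (b, a)):
        if x[0] == "const":
            return x if x[1] == absorbing else y
    return (op, a, b)

def arrival(e, arr):
    """Arrival time with one unit of delay per gate (an assumption)."""
    if e[0] == "var":
        return arr[e[1]]
    if e[0] == "const":
        return 0
    return 1 + max(arrival(sub, arr) for sub in e[1:])

x, a, b, c, d = (("var", n) for n in "xabcd")
f = ("or", ("and", ("or", ("and", x, a), b), c), d)   # x enters at the bottom
arr = {"x": 5, "a": 0, "b": 0, "c": 0, "d": 0}        # x is the late signal

f1, f0 = cofactor(f, "x", 1), cofactor(f, "x", 0)
g = ("mux", x, f1, f0)                                # x ? f|x=1 : f|x=0

# The restructured cone computes the same function on all 32 input patterns.
equivalent = all(
    evaluate(f, dict(zip("xabcd", bits))) == evaluate(g, dict(zip("xabcd", bits)))
    for bits in product((0, 1), repeat=5)
)
print(arrival(f, arr), arrival(g, arr), equivalent)   # 9 6 True
```

The cofactors no longer depend on `x`, so they can be computed while `x` is still in flight; only the final MUX waits for it.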

7 Overall Algorithm

mapped netlist performSpeedup (
    subject graph S,   // S is an And-Inverter Graph
    mapped netlist M,  // M was previously derived by tech-mapping of S
    timing window w,   // w is used to detect the critical paths
    logic depth l,     // l is used to detect a logic cone rooted at a node
    edge count p )     // p limits the number of critical edges of the cone
{
    perform timing analysis of M with unit-delay or LUT-library model;
    pre-compute critical section of M as nodes n such that 0 ≤ slack(n) ≤ w;
    pre-compute timing-critical edges connecting these nodes;
    for each timing-critical node n {
        find cone C of M that extends l levels down from n;
        pick the set of timing-critical edges V feeding into C;
        if the number of edges in V exceeds p, continue;
        find logic cone C’ in S corresponding to C in M;
        find variables V’ in S corresponding to V in M;
        derive cofactors of the function of C’ w.r.t. variables in V’;
        build multiplexer tree C’’ of the cofactors using variables in V’;
        add structural choice C’ = C’’ to the subject graph S;
    }
    return mapped netlist M’ derived by mapping subject graph S with added choices;
}
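The cofactoring and multiplexer-tree steps of the loop body can be sketched on plain Boolean functions. This is an illustration only: cones are modelled as Python callables rather than AIG nodes, and `restrict`, `mux_tree`, and the example function are hypothetical names. The sketch also shows why the edge count p matters: k critical variables produce 2^k cofactors.

```python
from itertools import product

def restrict(f, fixed):
    """Cofactor of f: the variables in `fixed` are forced to constants."""
    return lambda env: f({**env, **fixed})

def mux_tree(f, critical):
    """Rebuild f as a tree of MUXes selected by the critical variables.

    Returns (g, number_of_cofactors). The leaves of the tree are the
    cofactors of f, which do not depend on any critical variable.
    """
    cofactors = {
        bits: restrict(f, dict(zip(critical, bits)))
        for bits in product((0, 1), repeat=len(critical))
    }

    def build(level, prefix):
        if level == len(critical):
            return cofactors[prefix]
        lo = build(level + 1, prefix + (0,))
        hi = build(level + 1, prefix + (1,))
        sel = critical[level]
        return lambda env: hi(env) if env[sel] else lo(env)

    return build(0, ()), len(cofactors)

# Hypothetical cone function with two late-arriving inputs x and y.
def cone(e):
    return (e["a"] & e["x"]) | (e["b"] & (e["y"] ^ e["c"]))

names = ["a", "b", "c", "x", "y"]
g, n_cofactors = mux_tree(cone, critical=["x", "y"])

# The new structure is functionally identical, so it can be recorded as a
# structural choice for the original cone.
same = all(
    cone(dict(zip(names, bits))) == g(dict(zip(names, bits)))
    for bits in product((0, 1), repeat=len(names))
)
print(n_cofactors, same)   # 4 True
```

Because both structures are kept as choices, nothing is committed at this point; the mapper later decides which one gives the better delay.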

8 Experimental Setup
- Implemented in ABC as command speedup
- Used FPGA technology mapper if
- Verified the results using CEC engine cec
- Experiments targeting 6-LUTs were run on an Intel Xeon 2-CPU 4-core computer with 8 GB RAM
- Experimentally compared the following scripts
  - Without delay optimization:
      (st; dchoice; if -C 16 -F 2)^8
  - With delay optimization:
      (st; dchoice; if -C 16 -F 2)^4
      (speedup; if -C 16 -F 2)^3
      (st; dchoice; if -C 16 -F 2)^4
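For readers who want to see the two scripts written out, the following sketch expands them into flat, semicolon-separated command strings. It assumes the number trailing each parenthesized group is a repetition count; the command names are copied from the slide as-is, and `expand` is a hypothetical helper, not part of ABC.

```python
def expand(script):
    """Flatten a list of (commands, repeat) pairs into one ';'-separated string."""
    return "; ".join(
        cmd
        for commands, repeat in script
        for _ in range(repeat)
        for cmd in commands
    )

# Command groups exactly as they appear on the slide.
MAP_PASS = ["st", "dchoice", "if -C 16 -F 2"]
SPEEDUP_PASS = ["speedup", "if -C 16 -F 2"]

baseline = expand([(MAP_PASS, 8)])
optimized = expand([(MAP_PASS, 4), (SPEEDUP_PASS, 3), (MAP_PASS, 4)])

print(baseline)
print(optimized)
```

Under this reading, each speedup call operates on the mapped network left by the preceding if call, and the mapping that follows it selects among the added choices.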

9 Examples of LUT Libraries
- A variable-pin-delay LUT library
- The unit-delay LUT library
- A variable-pin-delay LUT library with wire-delays
[Library listings not reproduced; columns: LUT size, LUT area, LUT pin delays]

10 Experimental Results
[Results table not reproduced; column legend follows]
- LUT: number of LUTs
- Lev: number of LUT levels
- Delay: delay using LUT library
- Total: total runtime of Baseline
- Time1: the runtime of AIG restructuring only
- Time2: the total runtime of Speedup
- Geomean: geometric averages of columns
- Ratios: ratios of geometric averages
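The Geomean and Ratios rows can be computed as in the sketch below. The delay values are invented purely to exercise the arithmetic and are not results from the presentation; `geomean` is a hypothetical helper.

```python
import math

def geomean(values):
    """Geometric mean of positive numbers, computed in the log domain."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Made-up per-benchmark delays (NOT data from the slides).
baseline_delay = [10.0, 20.0, 40.0]
speedup_delay = [9.0, 16.0, 36.0]

# "Ratios" row: ratio of the two column geomeans.
ratio = geomean(speedup_delay) / geomean(baseline_delay)

# The same number is obtained as the geomean of the per-benchmark ratios.
per_benchmark = [s / b for s, b in zip(speedup_delay, baseline_delay)]
ratio_alt = geomean(per_benchmark)

print(round(ratio, 3), round(ratio_alt, 3))
```

A geometric mean keeps one large benchmark from dominating the summary, and the ratio of geomeans equals the geomean of per-benchmark ratios, so the summary row does not depend on which way it is computed.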

11 Conclusions and Future Work
- Developed a method that is
  - Fast, because there is no repeated timing analysis
  - Simple, because it leverages the AIG package and LUT mapper
  - Effective, because it makes decisions in the global space
- Future work may include
  - measuring improvements after place-and-route
  - extending the algorithm to work for sequential circuits
  - applying similar optimization for cost functions other than delay (e.g. switching activity minimization)