Implementation of Industrial FPGA Synthesis Flow Revisited
Alan Mishchenko
UC Berkeley
Overview
- Introduction
  - Motivation
  - Structure of FPGA synthesis flow
  - Overview of the previous system
- Lessons learned while developing new system
  - Verilog parsing
  - Design representation
  - Netlist data structure
  - Integration of application packages
  - Customization
- Experimental results
- Future work
Motivation
- ABC is a logic synthesis and verification tool developed at Berkeley
- ABC has been in the public domain since 2005, but it does not meet all industrial requirements
- A new system is needed to fill the gap
- Magic was an industrial version of ABC, developed in 2010 and used by several companies
- A new system that enhances ABC and replaces Magic is currently being developed
- This presentation shares this experience
What Is Missing in ABC?
The baseline version of ABC is not applicable to industrial designs because it does not support:
- Complex flops
- Multiple clock domains
- Special objects (adders, RAMs, DSPs, etc.)
- Standard-cell libraries
FPGA Synthesis Flow
- Inputting the design
- Sequential synthesis
- Combinational synthesis with choices
- Retiming and resynthesis
- Technology mapping
- Outputting the design
- Verification
Magic: Synthesis Flow Based on ABC
- Verilog, EDIF, BLIF
- Programmable APIs
A. Mishchenko, N. Een, R. K. Brayton, S. Jang, M. Ciesielski, and T. Daniel, "Magic: An industrial-strength logic optimization, technology mapping, and formal verification tool", Proc. IWLS'10.
Case Study 1: Combinational Synthesis with Structural Choices
[Figure: traditional synthesis vs. synthesis with choices; design snapshots D1-D4 and the HAIG]
Perform synthesis and keep track of changes
- Iterate fast local AIG rewriting with a global view (via a hash table)
- Collect AIG snapshots and prove equivalences across them
- Use equivalences (choices) during technology mapping
Observations
- Leads to improved QoR after technology mapping
- Successfully applied to designs with 1M gates
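A minimal sketch of how equivalences across snapshots could be detected, assuming tiny AIGs given as tuples and using exhaustive truth-table simulation as a stand-in for the random simulation plus SAT proofs an industrial tool would need. The tuple format and the names node_truth_tables and find_choices are hypothetical.

```python
# Sketch: find equivalent nodes ("choices") across AIG snapshots by
# exhaustive truth-table simulation. Only practical for a few inputs; a real
# tool would filter candidates by random simulation and prove them with SAT.

def node_truth_tables(num_inputs, nodes):
    """nodes: (name, fanin0, compl0, fanin1, compl1) tuples in topological order.
    Inputs are named x0..x{n-1}. Returns {name: truth table as an int}."""
    size = 1 << num_inputs
    mask = (1 << size) - 1
    tt = {}
    for i in range(num_inputs):
        # bit m of the table holds the value of input i under minterm m
        tt[f"x{i}"] = sum(1 << m for m in range(size) if (m >> i) & 1)
    for name, f0, c0, f1, c1 in nodes:
        a = tt[f0] ^ (mask if c0 else 0)   # complemented edge = bitwise NOT
        b = tt[f1] ^ (mask if c1 else 0)
        tt[name] = a & b
    return tt

def find_choices(num_inputs, *snapshots):
    """Group AND nodes of all snapshots by function, up to complementation."""
    mask = (1 << (1 << num_inputs)) - 1
    classes = {}
    for idx, nodes in enumerate(snapshots):
        tt = node_truth_tables(num_inputs, nodes)
        for name, *_ in nodes:
            t = tt[name]
            key = min(t, t ^ mask)        # f and NOT f share one class
            classes.setdefault(key, []).append((idx, name))
    return [members for members in classes.values() if len(members) > 1]

# Two snapshots of x0 & x1 & x2 with different structure.
snap0 = [("n1", "x0", 0, "x1", 0), ("n2", "n1", 0, "x2", 0)]   # (x0 x1) x2
snap1 = [("m1", "x1", 0, "x2", 0), ("m2", "x0", 0, "m1", 0)]   # x0 (x1 x2)
print(find_choices(3, snap0, snap1))   # [[(0, 'n2'), (1, 'm2')]]
```

Keying each class by the smaller of f and NOT f lets nodes that differ only in phase end up in the same choice class.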
Case Study 2: Sequential Verification
Property checking
- Takes a design and a property and makes a miter (AIG)
Equivalence checking
- Takes two designs and makes a miter (AIG)
The goal is to transform the AIG until the output can be proved constant 0
Equivalence checking in Magic is based on the model checker that won the Hardware Model Checking Competition in 2008, 2010, ...
[Figure: equivalence-checking miter over D1 and D2, and property-checking miter over D1 and property p; each miter output must be proved 0]
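To make the miter idea concrete, here is a minimal combinational sketch. The slide concerns sequential miters, which require model-checking engines rather than enumeration. Designs are modeled as Python functions instead of AIGs, and make_miter and prove_const0 are hypothetical names.

```python
from itertools import product

def make_miter(design1, design2):
    """Single-output function that is 1 iff the two designs disagree."""
    def miter(inputs):
        outs1, outs2 = design1(inputs), design2(inputs)
        assert len(outs1) == len(outs2), "designs must have matching outputs"
        # XOR each pair of outputs, then OR the XORs together
        return int(any(o1 ^ o2 for o1, o2 in zip(outs1, outs2)))
    return miter

def prove_const0(miter, num_inputs):
    """Return None if the miter output is constant 0, else a counterexample."""
    for inputs in product((0, 1), repeat=num_inputs):
        if miter(inputs):
            return inputs
    return None

# Full adder from the fadd example, and a mux-based rewrite of its carry.
def fadd_ref(v):
    ci, a, b = v
    return (ci ^ a ^ b, (ci & a) | (ci & b) | (a & b))

def fadd_mux(v):
    ci, a, b = v
    p = a ^ b
    return (p ^ ci, ci if p else a)    # carry = p ? ci : a

def fadd_bad(v):
    ci, a, b = v
    return (ci ^ a ^ b, a & b)         # wrong: ignores ci in the carry

print(prove_const0(make_miter(fadd_ref, fadd_mux), 3))   # None (equivalent)
print(prove_const0(make_miter(fadd_ref, fadd_bad), 3))   # (1, 0, 1)
```

A property miter follows the same pattern, with the single output being the negation of the property.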
AIG: A Unifying Representation
- An underlying data structure for various computations
  - Representing both local and global functions
  - Used in rewriting, resubstitution, simulation, SAT sweeping, induction, etc.
- A unifying representation for the whole flow
  - Synthesis, mapping, and verification pass around AIGs
  - Multiple structures are stored for mapping (‘AIG with choices’)
- The main functional representation in ABC
  - Foundation of ‘contemporary’ logic synthesis
  - Source of ‘signature features’ (speed, scalability, etc.)
AIG: Definition and Examples
AIG is a Boolean network composed of two-input ANDs and inverters.
F(a,b,c,d) = ab + d(ac’+bc): 6 nodes, 4 levels
F(a,b,c,d) = ac’(b’d’)’ + bc(a’d’)’ = ac’(b+d) + bc(a+d): 7 nodes, 3 levels
[Figure: the two AIG structures implementing F]
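The node and level counts above can be reproduced with a small AIG package. This is a minimal sketch that assumes the literal encoding 2*node + complement bit (a common AIG convention, also used by AIGER) and structural hashing through a dictionary. The class and method names are hypothetical.

```python
from itertools import product

class Aig:
    """Two-input AND nodes with complemented edges and structural hashing.
    Literal = 2 * node_id + complement_bit; node 0 is constant 0."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.fanins = [None] * (num_inputs + 1)   # const + primary inputs
        self.level = [0] * (num_inputs + 1)
        self.strash = {}                          # (lit0, lit1) -> node id

    def pi(self, i):
        return 2 * (i + 1)

    def and_(self, a, b):
        a, b = min(a, b), max(a, b)               # canonical order for hashing
        if a == 0 or a == b ^ 1:                  # x & 0 = 0, x & !x = 0
            return 0
        if a == 1 or a == b:                      # x & 1 = x, x & x = x
            return b
        if (a, b) not in self.strash:             # reuse an identical node if any
            self.fanins.append((a, b))
            self.level.append(1 + max(self.level[a >> 1], self.level[b >> 1]))
            self.strash[(a, b)] = len(self.fanins) - 1
        return 2 * self.strash[(a, b)]

    def or_(self, a, b):                          # a + b = !(!a & !b)
        return self.and_(a ^ 1, b ^ 1) ^ 1

    def num_ands(self):
        return len(self.fanins) - self.num_inputs - 1

    def evaluate(self, lit, values):
        """Simulate in node order, which is topological by construction."""
        val = [0] + list(values)
        for a, b in self.fanins[self.num_inputs + 1:]:
            val.append((val[a >> 1] ^ (a & 1)) & (val[b >> 1] ^ (b & 1)))
        return val[lit >> 1] ^ (lit & 1)

def build(form):
    g = Aig(4)
    a, b, c, d = (g.pi(i) for i in range(4))
    if form == 1:   # F = ab + d(ac' + bc)
        f = g.or_(g.and_(a, b), g.and_(d, g.or_(g.and_(a, c ^ 1), g.and_(b, c))))
    else:           # F = ac'(b'd')' + bc(a'd')'
        f = g.or_(g.and_(g.and_(a, c ^ 1), g.and_(b ^ 1, d ^ 1) ^ 1),
                  g.and_(g.and_(b, c), g.and_(a ^ 1, d ^ 1) ^ 1))
    return g, f

for form in (1, 2):
    g, f = build(form)
    print(form, g.num_ands(), g.level[f >> 1])   # "1 6 4", then "2 7 3"
```

Because complemented edges are free, an OR costs exactly one AND node. The trade-off between the two forms of F is therefore one extra node in exchange for one fewer level.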
Historical Perspective
[Figure: representations used in logic synthesis over time (x-axis: time, years; y-axis: design size, gate count): truth tables, conjunctive normal forms, sum-of-products, binary decision diagrams, and-inverter graphs; tools shown: Espresso, MIS, SIS; SIS, VIS, MVSIS; ABC, Magic]
Magic 2: Lessons Learned
(1) Verilog parsing
- Limit Verilog to a structural subset
(2) Design representation
- Represent only relevant data and hide unnecessary details
(3) Netlist data structure
- Use a simple, compact netlist data structure
(4) Integration of application packages
- Make packages independent of the netlist and interface them using AIGs
(5) Customization
- Make the system user-independent
(1) Verilog Parsing
Verilog parsing is believed to be a difficult problem, and companies (e.g. Verific) offer industry-standard solutions. However, several simplifying assumptions can make Verilog parsing a one-person, one-month project:
- Consider only structural Verilog
- Read the file into memory and parse it in memory
- Remove preprocessor definitions, comments, line endings, etc.
- Split the text into statements separated by semicolons (;)
- Parse in two passes:
  - First, parse the statements defining module interfaces (module/endmodule, input/output/inout, etc.)
  - Second, parse the remaining statements, including instance definitions
- Connect all constructed objects using net/pin names
- Check the correctness of the connectivity information
Example
  // 2-bit ripple-carry adder built from two full adders
  module add2( A, B, S, CO );
    input [1:0] A, B;
    output [1:0] S;
    output CO;
    wire n1;
    fadd inst1 ( .ci(1'b0), .a(A[0]), .b(B[0]), .s(S[0]), .co(n1) );
    fadd inst2 ( .ci(n1), .a(A[1]), .b(B[1]), .s(S[1]), .co(CO) );
  endmodule

  // full adder: sum and carry
  module fadd( ci, a, b, s, co );
    input ci, a, b;
    output s, co;
    assign s = ci ^ a ^ b;
    assign co = (ci & a) | (ci & b) | (a & b);
  endmodule
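The parsing steps listed above can be sketched in Python for the structural subset used in this example. This is a minimal, hypothetical illustration, not the parser of the system described here. It relies on regular expressions and stores modules as dictionaries. It handles declarations, named port connections, and assign statements, which are kept as raw text.

```python
import re

def preprocess(text):
    """Drop comments and preprocessor lines, then collapse all whitespace."""
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.S)    # block comments
    text = re.sub(r"//[^\n]*", " ", text)                  # line comments
    text = re.sub(r"^\s*`[^\n]*", " ", text, flags=re.M)   # `define, `timescale, ...
    return " ".join(text.split())                          # remove line endings

def split_statements(text):
    """Split on ';'. 'endmodule' has no ';', so peel it off the next statement."""
    stmts = []
    for raw in text.split(";"):
        s = raw.strip()
        while re.match(r"endmodule\b", s):
            stmts.append("endmodule")
            s = s[len("endmodule"):].strip()
        if s:
            stmts.append(s)
    return stmts

def parse_decl(body):
    """'[1:0] A, B' -> (2, ['A', 'B']); 'n1' -> (1, ['n1'])."""
    r = re.match(r"\[(\d+):(\d+)\]\s*(.*)", body)
    width = abs(int(r.group(1)) - int(r.group(2))) + 1 if r else 1
    names = r.group(3) if r else body
    return width, [n.strip() for n in names.split(",")]

def parse_structural_verilog(text):
    stmts = split_statements(preprocess(text))
    modules, pending, cur = {}, [], None
    # Pass 1: module interfaces (headers and input/output/inout declarations).
    for s in stmts:
        head = s.split()[0]
        if s == "endmodule":
            cur = None
        elif head == "module":
            m = re.match(r"module\s+(\w+)\s*\((.*)\)$", s)
            cur = {"ports": [p.strip() for p in m.group(2).split(",")],
                   "dirs": {}, "wires": [], "assigns": [], "instances": []}
            modules[m.group(1)] = cur
        elif head in ("input", "output", "inout"):
            width, names = parse_decl(s[len(head):].strip())
            for name in names:
                cur["dirs"][name] = (head, width)
        else:
            pending.append((cur, s))            # everything else waits for pass 2
    # Pass 2: wires, assigns and instances. All interfaces are known now, so
    # pin names can be checked even for modules defined later in the file.
    for mod, s in pending:
        head = s.split()[0]
        if head == "wire":
            mod["wires"] += parse_decl(s[len(head):].strip())[1]
        elif head == "assign":
            mod["assigns"].append(s[len(head):].strip())   # kept as raw text
        else:
            m = re.match(r"(\w+)\s+(\w+)\s*\((.*)\)$", s)
            cell, inst = m.group(1), m.group(2)
            pins = dict(re.findall(r"\.(\w+)\s*\(\s*([^()]*?)\s*\)", m.group(3)))
            if cell in modules:                  # other cells: library primitives
                bad = set(pins) - set(modules[cell]["ports"])
                if bad:
                    raise ValueError(f"{inst}: no pins {sorted(bad)} on {cell}")
            mod["instances"].append((cell, inst, pins))
    return modules

SRC = """
`timescale 1ns/1ps
// 2-bit ripple-carry adder
module add2( A, B, S, CO );
  input [1:0] A, B;
  output [1:0] S;
  output CO;
  wire n1;
  fadd inst1 ( .ci(1'b0), .a(A[0]), .b(B[0]), .s(S[0]), .co(n1) );
  fadd inst2 ( .ci(n1), .a(A[1]), .b(B[1]), .s(S[1]), .co(CO) );
endmodule
module fadd( ci, a, b, s, co );
  input ci, a, b;
  output s, co;
  assign s = ci ^ a ^ b;
  assign co = (ci & a) | (ci & b) | (a & b);
endmodule
"""

print(parse_structural_verilog(SRC)["add2"]["instances"][1])
# ('fadd', 'inst2', {'ci': 'n1', 'a': 'A[1]', 'b': 'B[1]', 's': 'S[1]', 'co': 'CO'})
```

The second pass shows why two passes help. add2 instantiates fadd before fadd is defined, but its pins can still be checked because every interface was recorded in the first pass.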
(2) Design Representation
Structural information
- Inputs, outputs, wires, internal objects, etc.
- Hierarchy (to be flattened, to be kept, library cells, etc.)
Functional information
- Combinational: gates, LUTs
- Sequential: flip-flops, clocks
Additional structural information
- White/black/grey boxes: RAMs, DSPs, register files, etc.
- Multiple clock domains, clock network
- Tri-states, inouts, etc.
Handling Design Representation
The design representation should be comprehensive (represent complete information) but flexible (work only on what is necessary at each step). Examples:
- To flatten the hierarchy, only structural information is needed
- To perform combinational synthesis, only combinational logic is needed
In both cases, it should be possible to access and modify each type of information without changing the other types.
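As an illustration of the first example, flattening can be written so that it reads only module ports and instance connections. This is a minimal sketch that assumes modules are dictionaries with "ports" and "instances", where each instance is a (cell, instance name, pin-to-net map). The leaf cells XOR3 and MAJ3 are hypothetical library primitives. No functional data, such as assign statements or LUT contents, is read or modified.

```python
import re

def flatten(modules, top):
    """Return leaf instances of `top` as (cell, hierarchical name, {pin: net}).
    Reads only structural data: module ports and instance connections."""
    leaves = []

    def visit(mod_name, prefix, net_map):
        def resolve(net):
            if "'" in net:                       # constant such as 1'b0
                return net
            base, suffix = re.match(r"(\w+)(.*)", net).groups()
            if base in net_map:                  # port: use the parent's net
                return net_map[base] + suffix    # assumes whole-bus connections
            return prefix + net                  # internal net: make it unique
        for cell, inst, pins in modules[mod_name]["instances"]:
            conns = {pin: resolve(net) for pin, net in pins.items()}
            if cell in modules:                  # hierarchical instance: descend
                visit(cell, prefix + inst + "/", conns)
            else:                                # library cell: keep as a leaf
                leaves.append((cell, prefix + inst, conns))

    visit(top, "", {p: p for p in modules[top]["ports"]})
    return leaves

# Hypothetical hierarchy: add2 -> fadd -> library cells XOR3 and MAJ3.
MODULES = {
    "add2": {"ports": ["A", "B", "S", "CO"], "instances": [
        ("fadd", "inst1", {"ci": "1'b0", "a": "A[0]", "b": "B[0]", "s": "S[0]", "co": "n1"}),
        ("fadd", "inst2", {"ci": "n1", "a": "A[1]", "b": "B[1]", "s": "S[1]", "co": "CO"})]},
    "fadd": {"ports": ["ci", "a", "b", "s", "co"], "instances": [
        ("XOR3", "x", {"i0": "ci", "i1": "a", "i2": "b", "o": "s"}),
        ("MAJ3", "m", {"i0": "ci", "i1": "a", "i2": "b", "o": "co"})]},
}

for leaf in flatten(MODULES, "add2"):
    print(leaf)
```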
(3) Netlist Data Structure
Should be very simple and easy to construct
- Objects use as little memory as possible
  - Currently, a 4-LUT uses 28 bytes plus memory for attributes
- Object attributes are added/removed on demand
  - For example, fanout information is not needed in most cases
- Objects are stored in memory in topological order
  - Improves the runtime of iterative traversals
  - Makes the code much simpler
Limitation
- Each time the netlist is modified, it needs to be duplicated
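A minimal sketch of such a netlist, assuming 4-input LUTs with fixed fanin slots stored in flat Python arrays. The class LutNetlist and its methods are hypothetical. It illustrates two of the points above. Simulation is a single forward pass because objects are stored topologically. A modification, here merging two equivalent LUTs, is applied by duplicating the netlist.

```python
from array import array
from itertools import product

class LutNetlist:
    """Objects stored in topological order in flat arrays.
    Object i uses fanin slots fanins[4*i : 4*i+4] (-1 = unused, always at the end)
    and a 16-bit truth table truth[i]; objects 0..num_pis-1 are primary inputs."""

    def __init__(self, num_pis):
        self.num_pis = num_pis
        self.fanins = array("i", [-1] * (4 * num_pis))
        self.truth = array("H", [0] * num_pis)
        self.outputs = []

    def add_lut(self, fanins, tt):
        assert len(fanins) <= 4 and all(0 <= f < len(self.truth) for f in fanins), \
            "a LUT may only use up to 4 existing objects (keeps topological order)"
        self.fanins.extend(list(fanins) + [-1] * (4 - len(fanins)))
        self.truth.append(tt)
        return len(self.truth) - 1

    def lut_fanins(self, i):
        return [f for f in self.fanins[4 * i:4 * i + 4] if f >= 0]

    def simulate(self, pi_values):
        """A single forward pass works because fanins always precede fanouts."""
        val = list(pi_values) + [0] * (len(self.truth) - self.num_pis)
        for i in range(self.num_pis, len(self.truth)):
            idx = sum(val[f] << k for k, f in enumerate(self.lut_fanins(i)))
            val[i] = (self.truth[i] >> idx) & 1
        return [val[o] for o in self.outputs]

    def dup(self, subst=None):
        """Apply a modification by duplication: `subst` maps an object to an
        earlier object replacing it; dangling LUTs are dropped from the copy."""
        subst = subst or {}
        def rep(i):
            while i in subst:
                i = subst[i]
            return i
        live = [False] * len(self.truth)
        for o in self.outputs:
            live[rep(o)] = True
        for i in range(len(self.truth) - 1, self.num_pis - 1, -1):  # reverse order
            if live[i]:
                for f in self.lut_fanins(i):
                    live[rep(f)] = True
        new = LutNetlist(self.num_pis)
        remap = list(range(self.num_pis)) + [-1] * (len(self.truth) - self.num_pis)
        for i in range(self.num_pis, len(self.truth)):
            if live[i]:
                remap[i] = new.add_lut([remap[rep(f)] for f in self.lut_fanins(i)],
                                       self.truth[i])
        new.outputs = [remap[rep(o)] for o in self.outputs]
        return new

# (a & b) | c, with a redundant copy of a & b that a rewrite merges away.
net = LutNetlist(3)
ab = net.add_lut([0, 1], 0b1000)        # AND: only minterm 3 (a=1, b=1) is 1
ba = net.add_lut([1, 0], 0b1000)        # structurally different duplicate
net.outputs = [net.add_lut([ba, 2], 0b1110)]   # OR
small = net.dup({ba: ab})
print(len(small.truth) - small.num_pis)  # 2 LUTs remain
```

With array("i") typically using 4 bytes per entry, each LUT here has about 18 bytes of payload. This layout is only an illustration and is not meant to match the 28-byte figure quoted on the slide.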
(4) Integration of Application Packages
- Application packages interact with the design database
- Logic information is extracted and inserted in the form of AIGs
- Synthesis and verification are performed by ABC working on these AIGs
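One simple way to extract a LUT's function from the netlist as an AIG is Shannon expansion, f = x·f1 + x’·f0. This is a minimal sketch with a tiny self-contained AIG (TinyAig and shannon are hypothetical names). It is not necessarily how the system described here performs the conversion.

```python
class TinyAig:
    """Hash-consed AND nodes; literal = 2 * node_id + complement bit.
    Node 0 is constant 0, nodes 1..n are inputs, AND nodes follow."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.nodes = []                 # AND nodes as (lit0, lit1)
        self.table = {}                 # structural hash: (lit0, lit1) -> literal

    def pi(self, i):
        return 2 * (i + 1)

    def and_(self, a, b):
        a, b = min(a, b), max(a, b)
        if a == 0 or a == b ^ 1:        # x & 0 = 0, x & !x = 0
            return 0
        if a == 1 or a == b:            # x & 1 = x, x & x = x
            return b
        if (a, b) not in self.table:
            self.nodes.append((a, b))
            self.table[(a, b)] = 2 * (self.num_inputs + len(self.nodes))
        return self.table[(a, b)]

    def mux(self, s, t, e):             # s ? t : e  =  !(!(s & t) & !(!s & e))
        return 1 ^ self.and_(1 ^ self.and_(s, t), 1 ^ self.and_(1 ^ s, e))

    def evaluate(self, lit, values):
        val = [0] + list(values)
        for a, b in self.nodes:         # nodes were created in topological order
            val.append((val[a >> 1] ^ (a & 1)) & (val[b >> 1] ^ (b & 1)))
        return val[lit >> 1] ^ (lit & 1)

def shannon(aig, tt, lits):
    """AIG literal for truth table `tt` over `lits`; bit m of tt is the output
    for minterm m, where fanin k supplies bit k of m."""
    n = len(lits)
    full = (1 << (1 << n)) - 1
    tt &= full
    if tt == 0:
        return 0
    if tt == full:
        return 1
    half = 1 << (n - 1)                 # split on the last fanin
    f0 = tt & ((1 << half) - 1)         # cofactor with last fanin = 0
    f1 = tt >> half                     # cofactor with last fanin = 1
    return aig.mux(lits[-1], shannon(aig, f1, lits[:-1]), shannon(aig, f0, lits[:-1]))

# Example: a 4-LUT computing F = ab + d(ac' + bc), inputs a, b, c, d = bits 0..3.
aig = TinyAig(4)
lits = [aig.pi(i) for i in range(4)]
def F(a, b, c, d):
    return (a & b) | (d & ((a & (1 - c)) | (b & c)))
tt = sum(F(*[(m >> k) & 1 for k in range(4)]) << m for m in range(16))
out = shannon(aig, tt, lits)
print(len(aig.nodes))   # AND nodes created for this LUT
```

The resulting AIG can then be handed to ABC for optimization, and the optimized logic can be mapped back into LUTs and reinserted.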
(5) Customization
The system should be easily customizable
- The source code is the same for all users
- Configuration files differ
Currently, the user “owns” the following:
- The library of primitives (a Verilog file)
- Timing info for primitives (e.g. LUT pin delays)
- Timing models used for calculating data for boxes, complex flops, wires, etc.
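Because primitive timing lives in user-owned data rather than in the code, the same tool code can serve different users. This is a minimal sketch that assumes a hypothetical JSON configuration. The key names and delay values are illustrative only and are not taken from any real library. Arrival times are propagated through a netlist given in topological order.

```python
import json

# Hypothetical user-owned timing data; the key names and delay values
# (in ns) are illustrative, not taken from any real library.
USER_CONFIG = """
{ "LUT4": { "pin_delays": [0.45, 0.40, 0.35, 0.30] } }
"""

def arrival_times(netlist, config_text, pi_arrival=0.0):
    """netlist: (output name, cell type, [fanin names]) in topological order.
    Names that are never driven by a cell are treated as primary inputs."""
    cfg = json.loads(config_text)
    arrival = {}
    for name, cell, fanins in netlist:
        pin_delays = cfg[cell]["pin_delays"]      # per-pin delays owned by the user
        arrival[name] = max(arrival.get(f, pi_arrival) + pin_delays[k]
                            for k, f in enumerate(fanins))
    return arrival

NETLIST = [("n1", "LUT4", ["a", "b"]), ("n2", "LUT4", ["n1", "c", "d"])]
print(arrival_times(NETLIST, USER_CONFIG))   # n2 arrives at about 0.9 ns
```

Switching to another user's timing requires only a different configuration string, with no change to arrival_times.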
Experimental Setup
- Integrated Magic into an industrial FPGA synthesis flow
- Experimented with the full flow, including P&R
  - Did not use retiming
  - Did not use post-placement resynthesis
- Verified by running Magic and in-house simulation tools
- Experimented with 20 designs, from 175K to 648K LUT4s
- Two experimental runs:
  - “Reference” stands for the typical industrial flow without Magic
  - “Magic” stands for the new flow with Magic
Frontend: design entry, high-level synthesis, quick mapping
Backend: placement, routing, design rule checking, etc.
Magic: sequential and combinational synthesis, mapping, legalization
Experimental Results
Cumulative Improvement (retiming excluded)
Future Work
- Improve the integration
  - Simpler interfaces, better data consistency checking, etc.
- Improve application packages
  - AIG rewriting, technology mapping, sequential synthesis, etc.
- Integrate logic and physical synthesis
  - Synthesis/mapping/retiming before placement
  - Retiming/restructuring after placement
- Extend to work for various technologies
  - Standard cells
  - Macro cells
  - LUT structures
  - LUT/MUX structures
Abstract
This talk is inspired by recent experience gained while developing an industrial-strength system for FPGA synthesis and mapping. First, we review the design representation with "industrial stuff", such as black and white boxes, complex flops, multiple clock domains, tristates, inouts, etc., and discuss how to handle them in a tool whose primary strength is combinational synthesis and mapping. Next, we discuss several ideas for implementing a custom Verilog parser for hierarchical designs. Finally, we propose a low-memory netlist representation used to store the data and to interface various optimization engines.