CSE241 L5 Synthesis, Kahng & Cichy, UCSD ©2003
CSE241 VLSI Digital Circuits, UC San Diego, Winter 2003
Lecture 05: Logic Synthesis
Cho Moon, Cadence Design Systems
January 21, 2003

Outline
- Introduction
- Two-level Logic Synthesis
- Multi-level Logic Synthesis

Introduction
Cho Moon
- PhD from UC Berkeley 92
- Lattice Semiconductor (synthesis)
- Cadence Design Systems (synthesis, verification and timing analysis) 96 – present
Why logic synthesis?
- Ubiquitous: used almost everywhere VLSI is done
- Body of useful and general techniques: same solutions can be used for different problems
- Foundation for many applications such as
  - Formal verification
  - ATPG
  - Timing analysis
  - Sequential optimization

RTL Design Flow
[Figure: flow diagram with blocks HDL, RTL Synthesis, netlist, Logic Synthesis, netlist, Library, Physical Synthesis, layout, Module Generators, Manual Design; example circuit with signals a, b, s, d, clk, q]
Slide courtesy of Devadas, et al.

Logic Synthesis Problem
Given
- Initial gate-level netlist
- Design constraints
  - Input arrival times, output required times, power consumption, noise immunity, etc.
- Target technology libraries
Produce
- Smaller, faster or cooler gate-level netlist that meets constraints
Very hard optimization problem!

Combinational Logic Synthesis
[Figure: Logic Synthesis block (netlist in, netlist out, Library) expanded into tech-independent and tech-dependent stages; labels: 2-level logic opt, multilevel logic opt, Library]
Slide courtesy of Devadas, et al.

Outline
- Introduction
- Two-level Logic Synthesis
- Multi-level Logic Synthesis
- Sequential Logic Synthesis

Two-level Logic Synthesis Problem
Given an arbitrary logic function in two-level form, produce a smaller representation.
For sum-of-products (SOP) implementation on PLAs, fewer product terms and fewer inputs to each product term mean smaller area.
Example: F = A B + A B C  =>  F = A B
[Figure: labels I1, I2, O1, O2]

Boolean Functions
f(x) : B^n -> B
B = {0, 1}, x = (x_1, x_2, …, x_n)
- x_1, x_2, … are variables
- x_1, x_1’, x_2, x_2’, … are literals
- each vertex of B^n is mapped to 0 or 1
- the onset of f is the set of input values for which f(x) = 1
- the offset of f is the set of input values for which f(x) = 0

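The definitions above can be made concrete by enumerating every vertex of B^n. The following is a minimal sketch; the helper name `onset_offset` and the example function are illustrative assumptions, not part of the lecture.

```python
from itertools import product

def onset_offset(f, n):
    """Split the 2^n vertices of B^n into the on-set and off-set of f."""
    onset, offset = set(), set()
    for x in product((0, 1), repeat=n):
        (onset if f(*x) else offset).add(x)
    return onset, offset

# Hypothetical example function: f(x1, x2, x3) = x1 x2 + x3'
def f(x1, x2, x3):
    return (x1 and x2) or (not x3)

on, off = onset_offset(f, 3)
print(sorted(on))
print(sorted(off))
```

Enumeration is exponential in n, so this only illustrates the definitions; it is not how synthesis tools represent functions.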
Literals
[Figure slide; courtesy of Devadas, et al.]

Boolean Formulas
[Figure slide; courtesy of Devadas, et al.]

Logic Functions
[Figure slide; courtesy of Devadas, et al.]

Cube Representation
[Figure slide; courtesy of Devadas, et al.]

Operations on Logic Functions
(1) Complement: f -> f’ (interchange on-set and off-set)
(2) Product (or intersection or logical AND): h = f · g or h = f ∩ g
(3) Sum (or union or logical OR): h = f + g or h = f ∪ g

Sum-of-products (SOP)
A function can be represented by a sum of cubes (products): f = ab + ac + bc
Since each cube is a product of literals, this is a “sum of products” representation.
A SOP can be thought of as a set of cubes F: F = {ab, ac, bc} = C
A set of cubes that represents f is called a cover of f. F = {ab, ac, bc} is a cover of f = ab + ac + bc.

Prime Cover
A cube is prime if no other implicant of f (a cube lying entirely in the on-set) contains it (for example, b c is not a prime but b is).
A cover is prime iff all of its cubes are prime.
[Figure: cube diagram over a, b, c]

Irredundant Cube
A cube c of a cover C is irredundant if C fails to be a cover when c is dropped from C.
A cover is irredundant iff all its cubes are irredundant (for example, F = a b + a c + b c).
[Figure: cube diagram over a, b, c with the vertex left “Not covered” when a cube is dropped]

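A small sketch of the prime and irredundant tests, assuming cubes are encoded as strings over '0', '1', '-' (one character per variable, in the order a, b, c). The encoding and the helper names are assumptions made for illustration. Primality is checked by trying to drop one literal at a time: if no single-literal expansion stays inside the on-set, no larger implicant can contain the cube.

```python
from itertools import product

def minterms(cube):
    """All vertices covered by a cube such as '1-1' (a c)."""
    choices = [(c,) if c in "01" else ("0", "1") for c in cube]
    return {"".join(v) for v in product(*choices)}

def cover_minterms(cover):
    """Union of the vertices covered by every cube in the cover."""
    return set().union(*(minterms(c) for c in cover))

def is_prime(cube, onset):
    """Prime: inside the on-set, and no single literal can be dropped."""
    if not minterms(cube) <= onset:
        return False
    for i, c in enumerate(cube):
        if c != "-":
            bigger = cube[:i] + "-" + cube[i + 1:]
            if minterms(bigger) <= onset:
                return False
    return True

def is_irredundant(cube, cover):
    """Irredundant: dropping the cube leaves some minterm uncovered."""
    rest = [c for c in cover if c != cube]
    return not minterms(cube) <= cover_minterms(rest)

# F = ab + ac + bc over variables (a, b, c)
cover = ["11-", "1-1", "-11"]
onset = cover_minterms(cover)
print(all(is_prime(c, onset) for c in cover))
print(all(is_irredundant(c, cover) for c in cover))
```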
Quine-McCluskey Method
We want to find a minimum prime and irredundant cover for a given function.
- Prime cover leads to min number of inputs to each product term.
- Min irredundant cover leads to min number of product terms.
Quine-McCluskey (QM) method (1950’s) finds a minimum prime and irredundant cover.
- Step 1: List all minterms of on-set: O(2^n), n = #inputs
- Step 2: Find all primes: O(3^n), n = #inputs
- Step 3: Construct minterms vs primes table
- Step 4: Find a min set of primes that covers all the minterms: O(2^m), m = #primes

QM Example (Step 1)
F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c
List all on-set minterms.
Minterms: a’ b’ c’, a b’ c’, a b’ c, a b c, a’ b c

QM Example (Step 2)
F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c
Find all primes.
Primes: b’ c’, a b’, a c, b c

QM Example (Step 3)
F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c
Construct minterms vs primes table (prime implicant table) by determining which cube is contained in which prime. X at row i, column j means that the cube in row i is contained by the prime in column j.

             b’ c’   a b’    a c     b c
a’ b’ c’       X
a  b’ c’       X       X
a  b’ c                X       X
a  b  c                        X       X
a’ b  c                                X

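QM Example (Step 4)
F = a’ b’ c’ + a b’ c’ + a b’ c + a b c + a’ b c
Find a minimum set of primes that covers all the minterms (“minimum column covering problem”).

             b’ c’   a b’    a c     b c
a’ b’ c’       X
a  b’ c’       X       X
a  b’ c                X       X
a  b  c                        X       X
a’ b  c                                X

Essential primes: b’ c’ (the only prime covering a’ b’ c’) and b c (the only prime covering a’ b c). The one remaining minterm, a b’ c, is covered by either a b’ or a c, so a minimum cover is F = b’ c’ + b c + a b’ or F = b’ c’ + b c + a c.
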
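The four steps can be sketched in a few lines of Python for this example. This is an illustrative toy, not a production minimizer: cubes are assumed to be strings over '0', '1', '-' in variable order (a, b, c), and Step 4 is solved by brute force over subsets of primes, matching the O(2^m) bound quoted earlier.

```python
from itertools import combinations

def combine(a, b):
    """Merge two cubes that differ in exactly one specified bit, else None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None

def find_primes(onset):
    """Step 2: merge cubes level by level; cubes that never merge are prime."""
    current, result = set(onset), set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            c = combine(a, b)
            if c:
                merged.add(c)
                used.update((a, b))
        result |= current - used
        current = merged
    return result

def covers(prime, minterm):
    """Step 3: one entry of the prime implicant table."""
    return all(p in ("-", m) for p, m in zip(prime, minterm))

def min_cover(onset, primes):
    """Step 4: smallest subset of primes covering every minterm (brute force)."""
    ps = sorted(primes)
    for k in range(1, len(ps) + 1):
        for subset in combinations(ps, k):
            if all(any(covers(p, m) for p in subset) for m in onset):
                return set(subset)
    return set()

# Step 1: on-set minterms of F = a'b'c' + ab'c' + ab'c + abc + a'bc
onset = ["000", "100", "101", "111", "011"]
primes = find_primes(onset)
cover = min_cover(onset, primes)
print(sorted(primes))
print(sorted(cover))
```

Here '-00' stands for b’ c’, '10-' for a b’, '1-1' for a c, and '-11' for b c.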
ESPRESSO – Heuristic Minimizer
Quine-McCluskey gives a minimum solution but is only practical for functions with a small number of inputs (< 10).
ESPRESSO is a heuristic two-level minimizer that finds a “minimal” solution.
ESPRESSO(F) {
  do {
    reduce(F);
    expand(F);
    irredundant(F);
  } while (fewer terms in F);
  verify(F);
}

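To make the reduce/expand/irredundant loop tangible, here is a toy sketch that follows the loop structure on the slide. It is not the real ESPRESSO algorithm: it assumes no don't-cares, enumerates minterms explicitly (exponential in the number of inputs), and uses simple greedy versions of each step. The cube encoding (strings over '0', '1', '-') and all function names are assumptions for illustration.

```python
from itertools import product

def minterms(cube):
    choices = [(c,) if c in "01" else ("0", "1") for c in cube]
    return {"".join(v) for v in product(*choices)}

def covered(cubes):
    return set().union(*(minterms(c) for c in cubes))

def supercube(points):
    """Smallest cube containing every minterm in points."""
    return "".join(col[0] if len(set(col)) == 1 else "-" for col in zip(*points))

def reduce_cover(cover):
    """Shrink each cube, in turn, to the minterms that only it covers."""
    cover = list(cover)
    for i, cube in enumerate(cover):
        own = minterms(cube) - covered(cover[:i] + cover[i + 1:])
        if own:
            cover[i] = supercube(sorted(own))
    return cover

def expand(cover, onset):
    """Greedily drop literals while the cube stays inside the on-set."""
    out = []
    for cube in cover:
        for i in range(len(cube)):
            bigger = cube[:i] + "-" + cube[i + 1:]
            if minterms(bigger) <= onset:
                cube = bigger
        if cube not in out:
            out.append(cube)
    return out

def irredundant(cover):
    """Greedily remove cubes whose minterms are covered by the rest."""
    cover = list(cover)
    for cube in list(cover):
        rest = [c for c in cover if c != cube]
        if minterms(cube) <= covered(rest):
            cover = rest
    return cover

def espresso(cover, onset):
    while True:
        size = len(cover)
        cover = irredundant(expand(reduce_cover(cover), onset))
        if len(cover) >= size:       # stop when no terms were saved
            break
    assert covered(cover) == onset   # verify(F)
    return cover

# Same function as the QM example, starting from its minterms.
onset = {"000", "100", "101", "111", "011"}
result = espresso(sorted(onset), onset)
print(result)
```

On this input the loop settles on three cubes, the same size as the minimum cover found by Quine-McCluskey, although a heuristic of this kind gives no such guarantee in general.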
ESPRESSO Illustrated
[Figure: the Reduce, Expand and Irredundant steps shown on a cover]

Outline
- Introduction
- Two-level Logic Synthesis
- Multi-level Logic Synthesis

Multi-level Logic Synthesis
- Two-level logic synthesis is effective and mature
- Two-level logic synthesis is directly applicable to PLAs and PLDs
But…
- There are many functions that are too expensive to implement in two-level forms (too many product terms!)
- Two-level implementation constrains layout (AND-plane, OR-plane)
Rule of thumb:
- Two-level logic is good for control logic
- Multi-level logic is good for datapath or random logic

Representation: Boolean Network
Boolean network: directed acyclic graph (DAG)
- node: logic function representation f_j(x, y)
- node variable y_j: y_j = f_j(x, y)
- edge (i, j) if f_j depends explicitly on y_i
Inputs x = (x_1, x_2, …, x_n)
Outputs z = (z_1, z_2, …, z_p)
Slide courtesy of Brayton

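One possible in-memory form of such a network is a dictionary from node variable to its fanins and local function. The sketch below is an assumption for illustration (the node names and functions are made up); it evaluates the network in topological order using the standard-library `graphlib` module, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical network: node name -> (fanin names, local function f_j).
network = {
    "y1": (("x1", "x2"), lambda a, b: a & b),
    "y2": (("x3", "y1"), lambda c, y1: c | y1),
    "z1": (("y2", "x1"), lambda y2, a: y2 ^ a),
}

def evaluate(network, inputs):
    """Evaluate every node after its fanins (edges run fanin -> node)."""
    graph = {node: set(fanins) for node, (fanins, _) in network.items()}
    values = dict(inputs)
    for node in TopologicalSorter(graph).static_order():
        if node in network:          # primary inputs already have values
            fanins, fn = network[node]
            values[node] = fn(*(values[v] for v in fanins))
    return values

values = evaluate(network, {"x1": 1, "x2": 1, "x3": 0})
print(values["z1"])
```

`TopologicalSorter` raises `CycleError` if the graph is not acyclic, which matches the DAG requirement.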
Multi-level Logic Synthesis Problem
Given
- Initial Boolean network
- Design constraints
  - Arrival times, required times, power consumption, noise immunity, etc.
- Target technology libraries
Produce
- a minimum area netlist consisting of the gates from the target libraries such that design constraints are satisfied

Modern Approach to Logic Optimization
Divide logic optimization into two subproblems:
- Technology-independent optimization
  - determine overall logic structure
  - estimate costs (mostly) independent of technology
  - simplified cost modeling
- Technology-dependent optimization (technology mapping)
  - binding onto the gates in the library
  - detailed technology-specific cost model
Orchestration of various optimization/transformation techniques for each subproblem
Slide courtesy of Keutzer

Technology-Independent Optimization
Simplified cost models
Area = sum of factored form literals in all nodes
- Number of product terms is not a good measure of area in multi-level implementation
    f = ad+ae+bd+be+cd+ce (6 product terms)
    f’ = a’b’c’+d’e’ (2 product terms)
  The only difference between f and f’ is inversion
    f = (a+b+c)(d+e) (5 literals in factored form)
    f’ = a’b’c’ + d’e’ (5 literals in factored form)
Delay = levels of logic on critical paths

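The literal-count cost model is easy to check on the expressions above. This sketch assumes expressions are written as plain strings with single-letter variables and an ASCII apostrophe for complement; the helper names are illustrative.

```python
import re

def literal_count(expr):
    """Count literal occurrences: a letter, optionally followed by '."""
    return len(re.findall(r"[a-z]'?", expr))

def product_terms(sop):
    """Number of product terms in a flat sum-of-products string."""
    return len(sop.split("+"))

f_sop = "ad+ae+bd+be+cd+ce"
f_factored = "(a+b+c)(d+e)"
f_bar = "a'b'c'+d'e'"

print(product_terms(f_sop), literal_count(f_sop))
print(literal_count(f_factored))
print(product_terms(f_bar), literal_count(f_bar))
```

The SOP form of f has 6 terms and 12 literals, while its factored form and the complement both have 5 literals, which is why factored-form literals track multi-level area better than product-term count.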
Technology-Independent Optimization
Technology-independent optimization is a bag of tricks:
- Two-level minimization (also called simplify)
- Constant propagation (also called sweep)
    f = a b + c; b = 1  =>  f = a + c
- Decomposition (single function)
    f = abc+abd+a’c’d’+b’c’d’  =>  f = xy + x’y’; x = ab; y = c+d
- Extraction (multiple functions)
    f = (az+bz’)cd+e; g = (az+bz’)e’; h = cde
    =>  f = xy+e; g = xe’; h = ye; x = az+bz’; y = cd

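As one example from this bag of tricks, constant propagation (sweep) can be sketched on a tiny expression tree. The tuple encoding ("and"/"or"/"not" with string variable names and 0/1 constants) and the function name `sweep` are assumptions for illustration, not a real tool's data structure.

```python
def sweep(expr, consts):
    """Substitute known constants and simplify AND/OR/NOT bottom-up."""
    if isinstance(expr, str):
        return consts.get(expr, expr)
    if expr in (0, 1):
        return expr
    op, *args = expr
    args = [sweep(a, consts) for a in args]
    if op == "not":
        (a,) = args
        return 1 - a if a in (0, 1) else ("not", a)
    # Controlling value: 0 forces an AND to 0, 1 forces an OR to 1.
    control = 0 if op == "and" else 1
    if control in args:
        return control
    rest = [a for a in args if a not in (0, 1)]  # drop non-controlling constants
    if not rest:
        return 1 - control
    return rest[0] if len(rest) == 1 else (op, *rest)

# f = a b + c with b = 1 simplifies to a + c
f = ("or", ("and", "a", "b"), "c")
print(sweep(f, {"b": 1}))
```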
More Technology-Independent Optimization
More technology-independent optimization tricks:
- Substitution
    g = a+b; f = a+bc  =>  f = g(a+c)
- Collapsing (also called elimination)
    f = ga+g’b; g = c+d  =>  f = ac+ad+bc’d’; g = c+d
- Factoring (series-parallel decomposition)
    f = ac+ad+bc+bd+e  =>  f = (a+b)(c+d)+e

Summary of Typical Recipe for TI Optimization
- Propagate constants
- Simplify: two-level minimization at Boolean network node
- Decomposition
- Local “Boolean” optimizations
  - Boolean techniques exploit Boolean identities (e.g., a a’ = 0)
Consider f = a b’ + a c’ + b a’ + b c’ + c a’ + c b’
- Algebraic factorization produces f = a (b’ + c’) + a’ (b + c) + b c’ + c b’
- Boolean factorization produces f = (a + b + c) (a’ + b’ + c’)
Slide courtesy of Keutzer

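That the algebraic and Boolean factorizations above describe the same function can be confirmed with an exhaustive truth-table comparison. This is a small sketch; the helper `equivalent` and the three Python functions are illustrative transcriptions of the formulas.

```python
from itertools import product

def equivalent(f, g, n):
    """Exhaustively compare two n-input Boolean functions."""
    return all(bool(f(*x)) == bool(g(*x))
               for x in product((False, True), repeat=n))

def sop(a, b, c):
    return ((a and not b) or (a and not c) or (b and not a)
            or (b and not c) or (c and not a) or (c and not b))

def algebraic(a, b, c):
    return ((a and (not b or not c)) or (not a and (b or c))
            or (b and not c) or (c and not b))

def boolean(a, b, c):
    return (a or b or c) and (not a or not b or not c)

print(equivalent(sop, algebraic, 3), equivalent(sop, boolean, 3))
```

All three forms evaluate to 1 exactly when a, b and c are not all equal; the Boolean factorization reaches 6 literals because it may use identities such as a a’ = 0, which algebraic factorization does not.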
Technology-Dependent Optimization
Technology-dependent optimization consists of
- Technology mapping: maps Boolean network to a set of gates from technology libraries
- Local transformations
  - Discrete resizing
  - Cloning
  - Fanout optimization (buffering)
  - Logic restructuring
Slide courtesy of Keutzer

Technology Mapping
Input
1. Technology independent, optimized logic network
2. Description of the gates in the library with their cost
Output
- Netlist of gates (from library) which minimizes total cost
General Approach
- Construct a subject DAG for the network
- Represent each gate in the target library by pattern DAGs
- Find an optimal-cost covering of subject DAG using the collection of pattern DAGs
- Canonical form: 2-input NAND gates and inverters

DAG Covering
DAG covering is an NP-hard problem
Solve the sub-problem optimally
- Partition DAG into a forest of trees
- Solve each tree optimally using tree covering
- Stitch trees back together
Slide courtesy of Keutzer

Tree Covering Algorithm
Transform netlist and libraries into canonical forms
- 2-input NANDs and inverters
Visit each node in BFS order from inputs to outputs
- Find all candidate matches at each node N
  - Match is found by comparing topology only (no need to compare functions)
- Find the optimal match at N by computing the new cost
  - New cost = cost of match at node N + sum of the stored optimal costs at the inputs of the match
- Store the optimal match at node N with cost
Optimal solution is guaranteed if cost is area
Complexity = O(n) where n is the number of nodes in netlist

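The dynamic program can be sketched as follows. The area costs are the ones listed for the example library in this lecture (INV 2, NAND2 3, NAND3 4, NAND4 5, AOI21 4), but the pattern shapes are assumptions: they are one plausible NAND2/INV normal form for each gate, written as nested tuples, since the tree drawings are not reproduced in the text. The sketch uses memoized recursion rather than an explicit input-to-output sweep; both visit each subtree once.

```python
from functools import lru_cache

LEAF = "*"
# (gate name, area cost, assumed pattern tree in NAND2/INV normal form)
LIBRARY = [
    ("INV",   2, ("inv", LEAF)),
    ("NAND2", 3, ("nand", LEAF, LEAF)),
    ("NAND3", 4, ("nand", ("inv", ("nand", LEAF, LEAF)), LEAF)),
    ("NAND4", 5, ("nand", ("inv", ("nand", ("inv", ("nand", LEAF, LEAF)), LEAF)), LEAF)),
    ("AOI21", 4, ("inv", ("nand", ("nand", LEAF, LEAF), ("inv", LEAF)))),
]

def matches(pattern, node):
    """Yield lists of subject subtrees bound to the pattern's leaves."""
    if pattern == LEAF:
        yield [node]
    elif isinstance(node, tuple) and node[0] == pattern[0]:
        if node[0] == "inv":
            yield from matches(pattern[1], node[1])
        else:
            # NAND is commutative: try both input orders (topology only).
            for left, right in ((node[1], node[2]), (node[2], node[1])):
                for a in matches(pattern[1], left):
                    for b in matches(pattern[2], right):
                        yield a + b

@lru_cache(maxsize=None)
def best_cover(node):
    """Return (minimum area, gate chosen at the root) for this subtree."""
    if isinstance(node, str):        # primary input: nothing to cover
        return (0, None)
    best = None
    for name, area, pattern in LIBRARY:
        for leaves in matches(pattern, node):
            cost = area + sum(best_cover(leaf)[0] for leaf in leaves)
            if best is None or cost < best[0]:
                best = (cost, name)
    return best

# Hypothetical subject tree: INV(NAND(NAND(a, b), INV(c)))
subject = ("inv", ("nand", ("nand", "a", "b"), ("inv", "c")))
print(best_cover(subject))
```

For this subject tree the trivial cover (two NAND2 and two INV) costs 10, while a single AOI21 costs 4, so the root is labeled with AOI21.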
Tree Covering Example
Find an “optimal” (in area, delay, power) mapping of this circuit into the technology library (simple example below)
[Figure: example circuit]
Slide courtesy of Keutzer

Elements of a library - 1

Element     Area cost   Tree representation (normal form)
INVERTER    2           [figure]
NAND2       3           [figure]
NAND3       4           [figure]
NAND4       5           [figure]

Slide courtesy of Keutzer

Elements of a library - 2

Element     Area cost   Tree representation (normal form)
AOI21       4           [figure]
AOI22       5           [figure]

Slide courtesy of Keutzer

Trivial Covering
[Figure: subject DAG]
- 7 NAND2 (3) = 21
- 5 INV (2) = 10
Area cost 31
Can we do better with tree covering?
Slide courtesy of Keutzer

Optimal tree covering - 1
[Figure: subject tree]
Slide courtesy of Keutzer

Optimal tree covering - 2
[Figure: subject tree]
Slide courtesy of Keutzer

Optimal tree covering - 3
[Figure: subject tree]
Cover with ND2 or ND3?
- ND2: 1 NAND2 (3) + subtree (5) = area cost 8
- ND3: 1 NAND3 = area cost 4
Slide courtesy of Keutzer

Optimal tree covering – 3b
[Figure: subject tree]
Label the root of the sub-tree with optimal match and cost
Slide courtesy of Keutzer

Optimal tree covering - 4
[Figure: subject tree]
Cover with INV or AOI21?
- INV: 1 inverter (2) + subtree (13) = area cost 15
- AOI21: 1 AOI21 + two subtrees = area cost 9
Slide courtesy of Keutzer

Optimal tree covering – 4b
[Figure: subject tree]
Label the root of the sub-tree with optimal match and cost
Slide courtesy of Keutzer

Optimal tree covering - 5
[Figure: subject tree]
Cover with ND2 or ND3?
- ND2: subtree 1 (9) + subtree 2 (4) + 1 NAND2 (3) = area cost 16
- ND3: subtree 1 (8) + subtree 2 (2) + subtree 3 (4) + 1 NAND3 (4) = area cost 18
Slide courtesy of Keutzer

Optimal tree covering – 5b
[Figure: subject tree]
Label the root of the sub-tree with optimal match and cost
Slide courtesy of Keutzer

Optimal tree covering - 6
[Figure: subject tree with two candidate covers, INV and AOI21; recovered annotations: INV 2, subtree 1 13, area cost 18]
Cover with INV or AOI21?
Slide courtesy of Keutzer

Optimal tree covering – 6b
[Figure: subject tree]
Label the root of the sub-tree with optimal match and cost
Slide courtesy of Keutzer

Optimal tree covering - 7
[Figure: subject tree]
Cover with ND2 or ND3 or ND4?
Slide courtesy of Keutzer

Cover 1 - NAND2
[Figure: subject tree; recovered annotations: subtree 1 18, NAND2 3]
Cover with ND2?
Slide courtesy of Keutzer

Cover 2 - NAND3
[Figure: subject tree; recovered annotations: subtree 1 9, subtree 2 4, NAND3 4]
Cover with ND3?
Slide courtesy of Keutzer

Cover 3 - NAND4
[Figure: subject tree; recovered annotations: subtree 1 8, subtree 2 2, subtree 3 4]
Cover with ND4?
Area cost 19
Slide courtesy of Keutzer

Optimal Cover was Cover 2
[Figure: subject tree covered by AOI21, ND2, INV and ND3 gates]
- INV: 2
- ND2: 3
- 2 ND3: 8
- AOI21: 4
Area cost 17
Slide courtesy of Keutzer

Summary of Technology Mapping
DAG covering formulation
- Separated library issues from mapping algorithm (can’t do this with rule-based systems)
Tree covering approximation
- Very efficient (linear time)
- Applicable to wide range of libraries (std cells, gate arrays) and technologies (FPGAs, CPLDs)
Weaknesses
- Problems with DAG patterns (multiplexors, full adders, …)
- Large input gates lead to a large number of patterns

Local Transformations
Given
- Technology-mapped netlist
- Target technology libraries
- Some violations (timing, noise immunity, power, etc.)
Produce
- New netlist that corrects the given violations without introducing new violations
Approach: another bag of tricks
- Discrete resizing
- Cloning
- Buffering
- Logic restructuring
- More…

Discrete Resizing
[Figure: example with nets a, b, d, e, f and gate sizes A, C]
Note that some arrival and required times become invalid
Source: DAC-2002, Physical Chip Implementation

Cloning
[Figure: example with nets a, b, d, e, f, g, h (annotation 0.2) and gate copies A, B]
Note that loads at a, b increase
Source: DAC-2002, Physical Chip Implementation

Buffering
[Figure: example with nets a, b, d, e, f, g, h (annotation 0.2) and buffers B, B]
Source: DAC-2002, Physical Chip Implementation

Logic Restructuring 1
Nodes in critical section that fan out outside of critical section are duplicated
[Figure: network with nodes h, f and inputs a, b, c, d, e, marked with late input signals, shown before and after collapsing the critical section into one collapsed node]
Slides courtesy of Keutzer

Logic Restructuring 2
Collapse critical section, then re-extract factor k
[Figure: collapsed node f with inputs a, b, c, d, e; after re-extraction, divisor k with c, d close to output]
Place timing-critical nodes closer to output
- Make them pass through fewer gates
- After collapse, a divisor is selected such that substituting k into f places critical signals c and d closer to output
Slides courtesy of Keutzer

Summary of Local Transformations
Variety of methods for delay optimization
- No single technique dominates
The one with more tricks wins? No!
Need a good framework for evaluating and processing different transforms
- Accurate, fast timing engine with incremental analysis capability
  - don’t want to rerun timing analysis on the whole design for each local transform
- Simultaneous min and max delay analysis
  - How does fixing the setup violation affect the existing hold checks?