Iterative Layering: Optimizing Arithmetic Circuits by Structuring the Information Flow Ajay K. Verma 1, Philip Brisk 2, Paolo Ienne 1 International Conference.

Presentation transcript:

Iterative Layering: Optimizing Arithmetic Circuits by Structuring the Information Flow
Ajay K. Verma (1), Philip Brisk (2), Paolo Ienne (1)
International Conference on Computer-Aided Design, November 5
(1) Processor Architecture Laboratory, School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL)
(2) Department of Computer Science and Engineering, Bourns College of Engineering, University of California, Riverside

Logic Optimization Strategies
[Figure: Ripple-Carry Adder and Carry-Lookahead Adder]
Logic synthesis tools
–Local optimization via Boolean minimization
Architectural transformation
–Not with “traditional” logic synthesis

Leading Zero Detector
[Figure: Naïve Implementation and Optimized Implementation]
16% faster, 8% smaller [Oklobdzija, TVLSI 1994]

Outline
Decomposition Techniques
Progressive Decomposition and its Shortcomings
–[Verma et al., DAC 2007]
Iterative Layering Algorithm
Experimental Results
Conclusion

Decomposition
Optimize the red block locally
Recursively decompose the remaining circuit
Input condensation
–At each step, fewer input bits remain
–Imposes hierarchy on the circuit
The result is a well-structured hierarchical circuit

[Figure: Disjoint Decomposition vs. Non-disjoint Decomposition]

Disjoint Decomposition Example: 8:4 Parallel Counter
[Figure: counter built from full adders, each with a sum output s and a carry output c]
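
The 8:4 parallel counter can be modeled in a few lines. This is a minimal sketch, assuming the counter is built purely from full adders and that every signal feeds exactly one adder, which is what makes the decomposition disjoint. The function names and the column-by-column reduction order are illustrative choices, not taken from the slides.

```python
from itertools import product

def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry

def parallel_counter(bits):
    """Count the ones in `bits` using only full adders.

    columns[w] holds the bits of weight 2**w that still have to be added.
    Every bit is consumed by exactly one adder (disjoint blocks).
    """
    columns = {0: list(bits)}
    result = []
    w = 0
    while w in columns:
        col = columns[w]
        while len(col) > 1:
            a, b = col[0], col[1]
            c = col[2] if len(col) > 2 else 0   # half adder = full adder with c = 0
            s, carry = full_adder(a, b, c)
            col = col[3:] + [s]                 # the sum stays in this column
            columns.setdefault(w + 1, []).append(carry)
        result.append(col[0] if col else 0)
        w += 1
    return result  # least significant bit first

# Exhaustive check over all 2**8 input patterns.
counter_ok = all(
    sum(bit << i for i, bit in enumerate(parallel_counter(v))) == sum(v)
    for v in product((0, 1), repeat=8)
)
```

Eight input bits always yield four output bits here, matching the 8:4 name.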

4x4-bit Multiplier
[Figure: X (4 bits) and Y (4 bits) feed the partial product generator (PPG), which produces the 16 partial product bits $x_i y_j$ for $i, j = 0, \dots, 3$; these feed the reduction tree Σ]
Partial product reduction tree has a disjoint decomposition
The partial product generator requires a non-disjoint decomposition
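
A small model shows why the two halves of the multiplier behave differently. This is a sketch under simplifying assumptions: the reduction tree is modeled as a plain weighted sum rather than as gates, and all function names are illustrative.

```python
def to_bits(value, width):
    """Little-endian bit list of `value`."""
    return [(value >> k) & 1 for k in range(width)]

def partial_products(x_bits, y_bits):
    """PPG: one AND gate per pair (i, j), producing x_i AND y_j."""
    return {(i, j): xi & yj
            for i, xi in enumerate(x_bits)
            for j, yj in enumerate(y_bits)}

def reduce_partial_products(pp):
    """Reduction tree modeled as a sum; bit (i, j) has weight 2**(i + j).
    Each partial product bit is consumed exactly once (disjoint)."""
    return sum(bit << (i + j) for (i, j), bit in pp.items())

def multiply(x, y, width=4):
    pp = partial_products(to_bits(x, width), to_bits(y, width))
    return reduce_partial_products(pp)

# Every x_i is an input of `width` different AND gates, so the PPG blocks
# share inputs: no partition of the inputs separates them (non-disjoint).
all_ones = partial_products([1] * 4, [1] * 4)
x_fanout = {i: sum(1 for (pi, _) in all_ones if pi == i) for i in range(4)}

multiplier_ok = all(multiply(x, y) == x * y
                    for x in range(16) for y in range(16))
```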

Compound Circuits
[Figure: two versions of a compound circuit (labels: M1, M2, E1, E2, sign, neg, not, and, xor, s1, s2, out; g72x)]
12% faster, 55% larger

Progressive Decomposition [Verma et al., DAC 2007]
Successfully structured some arithmetic circuits
–Ripple-carry adder: inferred parallel prefix adder
–3-input ripple-carry adder: inferred carry-save adder
–Leading zero detector: inferred design of [Oklobdzija 1994]
–Various counters, majority functions, etc.: inferred carry-free structures based on carry-save addition

Disjoint decomposition
–Forget about multipliers
–Cannot handle compound arithmetic circuits
Entire algorithm based on Reed-Muller Form
–Rewrite ‘your’ optimizer, e.g., if you use AIGs or BDDs
–Exponential size for leading one detector
  –Leading zero detector remains polynomial

Iterative Layering
Non-disjoint decomposition
–Yields disjoint decompositions when appropriate
Not tied to any specific circuit representation
–Our implementation uses BDDs
SAT-based functional dependence test [Lee et al., ICCAD 2007]
–Requires efficient conversion to CNF
–Functional dependence is inherent to any decomposition

Iterative Layering Outline
Bricks
–Definition and algorithmic overview
–Evaluation metrics
Brick Enumeration
–Cofactor enumeration
–Generate bricks from cofactors
Brick Selection
–Problem formulation related to Set Cover

Bricks
A subcircuit with < k inputs and one output
–Any functional dependence may exist between a brick and the original expression
–Kernels and co-kernels are bricks
  –The dependence is disjunctive by definition
Example
–E = ac + ad + bc + bd (7 gates)
–Brick: p = a + b (1 gate)
–E = pc + pd (4 gates)
–E = p(c + d) (3 gates)
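
The brick example can be checked by exhaustive simulation. The sketch below assumes `+` on the slide means OR and juxtaposition means AND, and that the gate counts refer to two-input gates including the brick itself. The function names are illustrative.

```python
from itertools import product

def e_flat(a, b, c, d):
    # ac + ad + bc + bd: 4 AND gates + 3 OR gates = 7 gates
    return (a & c) | (a & d) | (b & c) | (b & d)

def brick_p(a, b):
    # p = a + b: 1 OR gate
    return a | b

def e_with_brick(a, b, c, d):
    # pc + pd: brick (1) + 2 AND gates + 1 OR gate = 4 gates
    p = brick_p(a, b)
    return (p & c) | (p & d)

def e_factored(a, b, c, d):
    # p(c + d): brick (1) + 1 OR gate + 1 AND gate = 3 gates
    p = brick_p(a, b)
    return p & (c | d)

# All three forms agree on every one of the 16 input combinations.
forms_agree = all(
    e_flat(*v) == e_with_brick(*v) == e_factored(*v)
    for v in product((0, 1), repeat=4)
)
```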

Iterative Layering Algorithm
Enumerate all bricks having < k inputs
–k=6 in our implementation
Evaluate all bricks based on a merit function
Select a subset of bricks
–The subset must contain all of the information about the circuit
–The subset should be optimal w.r.t. some optimization criteria
The selected bricks form a “layer”
Stack layers on top of one another to structure the circuit

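Information Fitness
Estimated gate reduction
–Size of BDD of input expression [Macii et al., GLS-VLSI 1999]
[Figure: f re-expressed as g with the brick p as one of its inputs]
$$\text{Info. Fitness} = \frac{\mathrm{Size}(\mathrm{BDD}_f)}{\mathrm{Size}(\mathrm{BDD}_g) + \mathrm{Size}(\mathrm{BDD}_p)}$$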
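
The ratio can be tried out on the earlier brick example, f = (a + b)(c + d) with p = a + b and g = p(c + d). This is a rough sketch, assuming that "size" means the number of internal nodes of a reduced ordered BDD under a fixed variable order; the slides do not say which size measure or ordering the authors use. The tiny truth-table-based BDD builder is a stand-in for a real BDD package and only works for small functions.

```python
from itertools import product

def truth_table(func, n):
    """Truth table of an n-input function; the first argument is the top variable."""
    return tuple(func(*bits) for bits in product((0, 1), repeat=n))

def bdd_size(tt):
    """Internal node count of the reduced ordered BDD of truth table `tt`
    (variable order = argument order)."""
    unique = {}            # (level, low child, high child) -> node id
    n_rows = len(tt)

    def build(sub):
        if all(v == sub[0] for v in sub):
            return ("const", sub[0])           # terminal node
        half = len(sub) // 2
        low, high = build(sub[:half]), build(sub[half:])
        if low == high:
            return low                         # redundant test: no node needed
        key = (n_rows // len(sub), low, high)  # level follows from sub-table length
        return unique.setdefault(key, ("node", len(unique)))

    build(tt)
    return len(unique)

size_f = bdd_size(truth_table(lambda a, b, c, d: (a | b) & (c | d), 4))
size_g = bdd_size(truth_table(lambda p, c, d: p & (c | d), 3))
size_p = bdd_size(truth_table(lambda a, b: a | b, 2))

info_fitness = size_f / (size_g + size_p)
```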

Information Coverage
E: expression to optimize
p: brick under consideration
$D = \text{on-set}(E) \times \text{off-set}(E)$
$N = \{(x, y) \in D \mid p(x) \neq p(y)\}$
$$\text{Info. Coverage} = \frac{|N|}{|D|}$$
Intuition: Attempt to quantify the functional dependency from p to E
Limitation: Requires completely specified truth table
–Size is exponential in the number of inputs
Our Approach: Randomly sample the truth table of E
–Theorem 1 in the paper includes some probabilistic justification
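
The coverage ratio is easy to compute on a small example, both exactly and from random samples. This sketch assumes D is the set of (on-set point, off-set point) pairs and N the pairs on which the brick takes different values. The sampling scheme shown (uniform random input points) is an assumption; the paper's Theorem 1 may rely on a different procedure.

```python
import random
from itertools import product

def coverage_from_points(E, p, points):
    """|N| / |D| restricted to the given input points.

    D pairs every on-set point with every off-set point;
    N keeps the pairs that the brick p tells apart.
    """
    on_set = [x for x in points if E(*x)]
    off_set = [x for x in points if not E(*x)]
    if not on_set or not off_set:
        return None                      # D is empty, coverage undefined
    distinguished = sum(1 for x in on_set for y in off_set if p(*x) != p(*y))
    return distinguished / (len(on_set) * len(off_set))

def E(a, b, c, d):
    return (a | b) & (c | d)

def p(a, b, c, d):
    return a | b                         # brick, written over all inputs of E

# Exact value from the full truth table (exponential in the input count).
exact = coverage_from_points(E, p, list(product((0, 1), repeat=4)))

# Estimate from randomly sampled rows of the truth table.
rng = random.Random(0)
samples = [tuple(rng.randint(0, 1) for _ in range(4)) for _ in range(200)]
estimate = coverage_from_points(E, p, samples)
```

For this E and p the exact value is 36/63: all 9 on-set points have p = 1, and 4 of the 7 off-set points have p = 0.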

Brute Force Cofactor Enumeration
Enumerate every combination of k input bits
$E = ab \oplus cd \oplus (a \oplus b)(c \oplus d)$
$B = \{a, b, c\}$, $R = \{d\}$
Enumerate the set of cofactors with respect to R
$S = \{E_{\bar{d}}, E_d\} = \{ab \oplus bc \oplus ac,\; ab \oplus bc \oplus ac \oplus a \oplus b \oplus c\}$
Problem: $|S| = 2^{|R|}$
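
The brute-force enumeration can be reproduced directly. The sketch assumes that the operators lost from the slide's expression are XORs, which is the reading that matches the two listed cofactors term by term. The helper `cofactors` and its argument names are illustrative.

```python
from itertools import product

def E(a, b, c, d):
    # Assumed reading of the slide: E = ab XOR cd XOR (a XOR b)(c XOR d)
    return (a & b) ^ (c & d) ^ ((a ^ b) & (c ^ d))

def cofactors(func, names, remaining):
    """Map each assignment of the `remaining` variables to the truth table
    of the resulting cofactor over the other variables."""
    kept = [v for v in names if v not in remaining]
    result = {}
    for r_vals in product((0, 1), repeat=len(remaining)):
        env = dict(zip(remaining, r_vals))
        table = []
        for b_vals in product((0, 1), repeat=len(kept)):
            env.update(zip(kept, b_vals))
            table.append(func(*(env[v] for v in names)))
        result[r_vals] = tuple(table)
    return result

NAMES = ("a", "b", "c", "d")

# B = {a, b, c}, R = {d}: two cofactors, one per value of d.
S = cofactors(E, NAMES, ("d",))

# One entry per assignment of R, so the table doubles with every extra variable.
S_cd = cofactors(E, NAMES, ("c", "d"))
```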

Cofactor Enumeration via Sampling and SAT-based Functional Dependence Testing
1. Generate an initial set of cofactors using random sampling
2. Test if E depends on the cofactors and any remaining variables [Lee et al., ICCAD 2007]
–SAT = FALSE implies a full dependence
–SAT = TRUE implies a partial dependence
–Satisfying assignment of input variables yields one missing cofactor
3. Repeat Step 2 until SAT = FALSE
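
The three-step loop can be sketched as follows. Two things here are assumptions rather than the authors' method: the SAT query is replaced by an exhaustive search (usable only for tiny functions), and "E depends on the cofactors and the remaining variables" is read as "no two input points agree on every known cofactor value and on R, yet differ on E". The expression E uses the same XOR reading as the brute-force example, with B = {a, b} and R = {c, d} chosen for illustration.

```python
import random
from itertools import product

def E(a, b, c, d):
    return (a & b) ^ (c & d) ^ ((a ^ b) & (c ^ d))

N_B, N_R = 2, 2    # B = {a, b}, R = {c, d}
B_POINTS = list(product((0, 1), repeat=N_B))

def cofactor(r):
    """Truth table over B of E with the R variables fixed to r."""
    return tuple(E(*b, *r) for b in B_POINTS)

def find_counterexample(known):
    """Exhaustive stand-in for the SAT-based dependence test.

    Returns an R assignment for which two B points look identical to all
    known cofactors but give different values of E, or None if no such
    pair exists (full dependence).
    """
    for r in product((0, 1), repeat=N_R):
        seen = {}                                  # signature -> value of E
        for i, b in enumerate(B_POINTS):
            signature = tuple(tt[i] for tt in known)
            value = E(*b, *r)
            if seen.setdefault(signature, value) != value:
                return r                           # cofactor(r) is missing
    return None

def enumerate_cofactors(rng, initial_samples=1):
    # Step 1: initial cofactors from randomly sampled R assignments.
    known = {cofactor(tuple(rng.randint(0, 1) for _ in range(N_R)))
             for _ in range(initial_samples)}
    # Steps 2 and 3: add one missing cofactor per failed dependence test.
    while True:
        r = find_counterexample(known)
        if r is None:
            return known
        known.add(cofactor(r))

found = enumerate_cofactors(random.Random(0))
```

Each counterexample yields a cofactor that separates two points no known cofactor separates, so the set grows by one per iteration and the loop ends after at most 2^|R| additions.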

Brick Computation: Summary
For every combination of at most k input bits
–Generate the cofactors of the remaining bits
  –Random sampling + SAT-based functional dependence testing
–Discard useless cofactors
  –Details are in the paper
–Recursively apply iterative layering with a smaller value of k to generate the bricks from the cofactors
That’s a lot of bricks! Which bricks do I really need?

Brick Selection: Overview
Goal: Find a minimal set of bricks that covers all points in on-set(E) × off-set(E)
Greedy heuristic based on [Johnson, JCSS 1974]
–Select a brick that maximizes Info.Fitness × Info.Coverage
–Update Info.Fitness and Info.Coverage for the remaining bricks
–Stop when E is functionally dependent on the chosen bricks [Lee et al., ICCAD 2007]
See the paper for details on the data structures used
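
The greedy loop can be sketched as a weighted set cover over (on-set, off-set) pairs. Several simplifications are assumed here: coverage is recomputed over the still-uncovered pairs while the fitness values stay fixed, the stopping test is done by exhaustive enumeration instead of SAT, and the fitness numbers in the example are hypothetical placeholders rather than BDD-derived values.

```python
from itertools import product

def greedy_brick_selection(E, bricks, fitness, n):
    """Pick bricks until every (on-set, off-set) pair is told apart.

    bricks:  dict name -> function of the n inputs of E
    fitness: dict name -> float (treated as fixed in this sketch)
    """
    points = list(product((0, 1), repeat=n))
    on_set = [x for x in points if E(*x)]
    off_set = [x for x in points if not E(*x)]
    uncovered = {(x, y) for x in on_set for y in off_set}
    total = len(uncovered)
    chosen = []

    def score(name):
        p = bricks[name]
        newly = sum(1 for x, y in uncovered if p(*x) != p(*y))
        return fitness[name] * newly / total

    # No uncovered pair left means E is a function of the chosen bricks.
    while uncovered:
        candidates = [name for name in bricks if name not in chosen]
        best = max(candidates, key=score, default=None)
        if best is None or score(best) == 0:
            raise ValueError("the bricks do not determine E")
        p = bricks[best]
        uncovered = {(x, y) for x, y in uncovered if p(*x) == p(*y)}
        chosen.append(best)
    return chosen

E = lambda a, b, c, d: (a | b) & (c | d)
bricks = {
    "p": lambda a, b, c, d: a | b,
    "q": lambda a, b, c, d: c | d,
    "a": lambda a, b, c, d: a,
    "c": lambda a, b, c, d: c,
}
fitness = {"p": 1.0, "q": 1.0, "a": 0.5, "c": 0.5}   # hypothetical values

chosen = greedy_brick_selection(E, bricks, fitness, 4)
```

With these inputs the two OR bricks are enough, since E = p AND q.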

Experimental Setup
[Figure: synthesis flow]
Circuit versions compared
–Circuit written by hand
–Known Arithmetic Circuits
–Progressive Decomposition [Verma et al., DAC 2007]
–Iterative Layering
Synopsys Design Compiler
–compile_ultra
–minimize delay
Artisan Standard Cells, UMC (90 nm)

Critical Path Delay
[Bar chart, delay in ns. Series: Original, Progressive Decomposition, Iterative Layering, Library/Manual Implementation. Annotations: “Optimized for Area, Not Delay”; “Progressive Decomposition Fails”]

Area
[Bar chart, area in μm². Series: Original, Progressive Decomposition, Iterative Layering, Library/Manual Implementation. Annotations: “Optimized for Area, Not Delay”; “Progressive Decomposition Fails”]

n-bit, k-input MAX Function
Pairwise Comparison of Inputs
–$\tfrac{1}{2}k(k - 1)$ comparators
–Delay: $O(\log n + \log k)$
–Area: $O(k^2 n)$
–0.21 ns, 3479 μm²
Binary Tree of Comparators
–$k - 1$ comparators
–Delay: $O(\log n \cdot \log k)$
–Area: $O(kn)$
–0.46 ns, 1755 μm²
Iterative Layering
–0.22 ns, 1331 μm²
(Circuit structure was unknown to us)
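
The two reference structures differ mainly in how many comparators they use and how those comparators are chained. The sketch below is a software model that only counts comparators; it does not model gate-level delay or area, and the tie-breaking rule (the lower index wins) is an assumption made so that exactly one input dominates.

```python
from itertools import combinations

def max_pairwise(values):
    """Compare every pair once; the winner beats all other inputs.
    All comparators are independent of one another."""
    k = len(values)
    beats = [[False] * k for _ in range(k)]
    comparators = 0
    for i, j in combinations(range(k), 2):
        ge = values[i] >= values[j]        # one comparator per pair
        comparators += 1
        beats[i][j] = ge                   # ties go to the lower index
        beats[j][i] = not ge
    for i in range(k):
        if all(beats[i][j] for j in range(k) if j != i):
            return values[i], comparators

def max_tree(values):
    """Binary tree of comparators: each level halves the candidates."""
    level = list(values)
    comparators = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] if level[i] >= level[i + 1] else level[i + 1])
            comparators += 1
        if len(level) % 2:
            nxt.append(level[-1])          # odd element passes through
        level = nxt
    return level[0], comparators

inputs = [22, 59, 62, 61]                  # the integers from the 4-input example
pairwise_result = max_pairwise(inputs)
tree_result = max_tree(inputs)
```

For k = 4 the pairwise scheme uses 6 comparators and the tree uses 3, in line with the ½k(k - 1) and k - 1 counts.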

8-bit, 4-input MAX Example
[Figure: three panels labeled Integer, Domination Table, and Count Leading 1’s, worked through for the inputs 22, 59, 62, and 61; annotation: “Replace any all-zero column with ones!”]

Conclusion
Iterative Layering structures arithmetic circuits
–Automatically infer well-known manual designs from arithmetic literature
–Fixes shortcomings of Progressive Decomposition
  –Non-disjoint decomposition
  –Usable with any circuit representation
[Table: PD and IL compared on ADD, 3-ADD, LZD, MUL, SHFT, MAX, and Compound Arithmetic Circuits]