Sabyasachi Das Synplicity Inc Sunil P. Khatri Texas A&M University

Presentation transcript:

A Timing-Driven Hybrid-Compression Algorithm for Faster Sum-of-Products
Sabyasachi Das, Synplicity Inc.
Sunil P. Khatri, Texas A&M University

What is a Sum-of-Products?
An IC block that performs addition of multiple product and sum terms.
Computationally intensive.
Widely used in DSP, graphics, and microprocessors.
Example (from the block diagram): p = a * b, q = c * d, z = p + q + e + f

Examples of Sum-of-Products Blocks
Multiplication: assign z = a * b
MAC: assign z = (a * b) + c
2-operand Addition: assign z = a + b
Squarer: assign z = a * a
Adder-Tree: assign z = a + b + c + d
Generalized SOP: assign z = (a * b) + (c * d) + (e * f) + g + h + k

Structure of Sum-of-Products
A Sum-of-Products block consists of three parts, listed in data-flow order from inputs to output:
1. Partial Product Generator (PPGen)
2. Partial Product Reduction Tree (PPRT)
3. Final Carry-Propagation Adder (CPA)
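As a rough illustration of the first stage, the sketch below generates the partial-product bits for a plain unsigned multiplication z = a * b and groups them by bit column. The function names `partial_products` and `column_value` are hypothetical, and the model is behavioral only (no timing, no signed operands). The PPRT and CPA stages would then add up these bits without changing the value they represent.

```python
def partial_products(a, b, width):
    """Return columns, where columns[i] holds the 1-bit partial
    products of weight 2**i for the unsigned product a * b."""
    columns = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            # One AND gate per pair of operand bits.
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            columns[i + j].append(bit)
    return columns


def column_value(columns):
    """Value represented by a set of bit columns."""
    return sum(sum(col) << i for i, col in enumerate(columns))
```

For a width-n multiplication, the middle columns hold the most bits (n of them), which is where the reduction tree has the most work to do.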

Partial Product Reduction Tree
In the Partial Product Reduction Tree (PPRT), the number of elements in each bit column is reduced to at most two.
The PPRT accounts for more than 50% of the delay of the SOP block.
Hence the performance of the PPRT is crucial to the performance of the SOP block.

Two Reduction Counters in PPRT
(2:2) Counter: reduces 2 inputs (a_i and b_i) to 2 outputs (S_i and C_{i+1}).
(3:2) Counter: reduces 3 inputs (a_i, b_i and c_i) to 2 outputs (S_i and C_{i+1}).
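A minimal behavioral sketch of the two counters, assuming the usual half-adder and full-adder logic (the slides do not give gate-level equations, and the function names are hypothetical). In both cases the outputs encode the number of 1s among the inputs, with S_i at weight 1 and C_{i+1} at weight 2.

```python
def counter_2_2(a, b):
    """(2:2) counter (half adder): returns (S_i, C_{i+1})."""
    return a ^ b, a & b


def counter_3_2(a, b, c):
    """(3:2) counter (full adder): returns (S_i, C_{i+1})."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)  # majority of the three inputs
    return s, carry
```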

4:3 Reduction Counter
(4:3) Counter: reduces 4 inputs (a_i, b_i, c_i and d_i) to 3 outputs (S_i, C_{i+1} and C_{i+2}).
The C_{i+2} output is simply a 4-input AND gate.
Gives faster reduction at the ith column.
Produces an element for the (i+2)th column at an earlier time.
Has a larger area than the other two counters.
The key idea is to use the (4:3) counter as much as possible, in conjunction with the (3:2) and (2:2) counters.
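The sketch below models a (4:3) counter behaviorally. Only the C_{i+2} equation (a 4-input AND) comes from the slides; the expressions for S_i and C_{i+1} are derived here from the requirement that the three outputs encode the count of 1s among the inputs, and the function name is hypothetical.

```python
def counter_4_3(a, b, c, d):
    """(4:3) counter: returns (S_i, C_{i+1}, C_{i+2}) such that
    S + 2*C1 + 4*C2 equals a + b + c + d."""
    s = a ^ b ^ c ^ d                      # parity of the four inputs
    c2 = a & b & c & d                     # 4-input AND, as stated on the slide
    at_least_two = (a & b) | (a & c) | (a & d) | (b & c) | (b & d) | (c & d)
    c1 = at_least_two & (c2 ^ 1)           # count is 2 or 3 (derived, not from the slides)
    return s, c1, c2
```

Because C_{i+2} depends on a single AND gate, it can be ready well before S_i and C_{i+1}, which is consistent with the claim that the (i+2)th column receives its element early.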

Explanation of our approach
Perform column-wise reduction (LSB to MSB).
For each column (or BitSlice/BitCluster):
1. Sort the inputs based on arrival time.
2. Is (2:2) reduction fast? If yes, instantiate that.
3. Else, is (3:2) reduction fast? If yes, instantiate that.
4. Else, instantiate (4:3) reduction.
5. After each reduction, re-sort the signals and continue.
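The loop above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slides do not define what makes a reduction "fast", so `choose_counter` uses a hypothetical rule (pick the smallest counter that would finish before the next signal in the column arrives), the `DELAY` values are made up, and all outputs of a counter are assumed to arrive at the same time. Each signal is an `(arrival_time, bit)` pair.

```python
# Hypothetical delays for the (2:2), (3:2) and (4:3) counters.
DELAY = {2: 1.0, 3: 2.0, 4: 3.0}


def choose_counter(col):
    """col is sorted by arrival time and has at least 3 signals.
    Hypothetical 'fast' test: use a counter if it finishes before
    the next signal in the column arrives."""
    t = [arrival for arrival, _ in col]
    if t[1] + DELAY[2] <= t[2]:
        return 2
    if len(col) == 3 or t[2] + DELAY[3] <= t[3]:
        return 3
    return 4


def apply_counter(signals):
    """Behavioral counter: outputs are the binary count of 1s,
    one output per weight (offset 0, 1 and, for 4 inputs, 2)."""
    k = len(signals)
    total = sum(bit for _, bit in signals)
    ready = max(arrival for arrival, _ in signals) + DELAY[k]
    n_out = 3 if k == 4 else 2
    return [(ready, (total >> w) & 1) for w in range(n_out)]


def reduce_columns(columns):
    """Reduce every column to at most two signals, LSB to MSB."""
    columns = [list(col) for col in columns]
    i = 0
    while i < len(columns):
        col = columns[i]
        while len(col) > 2:
            col.sort()                      # earliest arrival first
            k = choose_counter(col)
            outputs = apply_counter(col[:k])
            col[:] = col[k:] + outputs[:1]  # sum bit stays in column i
            for offset, sig in enumerate(outputs[1:], start=1):
                while len(columns) <= i + offset:
                    columns.append([])
                columns[i + offset].append(sig)  # carries move up
        i += 1
    return columns


def value(columns):
    """Value represented by the bits in all columns."""
    return sum(bit << i for i, col in enumerate(columns) for _, bit in col)
```

Whatever selection rule is used, the reduction must preserve the represented value and leave at most two signals per column for the final CPA; those two properties are easy to check on small inputs.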

An example of our approach
Partial-product bits (six rows, bit 7 down to bit 0):
P07 P06 P05 P04 P03 P02 P01 P00
P17 P16 P15 P14 P13 P12 P11 P10
P27 P26 P25 P24 P23 P22 P21 P20
P37 P36 P35 P34 P33 P32 P31 P30
P47 P46 P45 P44 P43 P42 P41 P40
P57 P56 P55 P54 P53 P52 P51 P50
Counter outputs labeled in the figure: C02 C01 S00 C11 S10 C03 S01 C13 C12 S02

Results
On average, our approach yields about a 3.5% speed improvement at the cost of a 4.3% area penalty.

Summary
A 4:3 reduction counter is designed.
It reduces the elements in a given column at a faster pace.
It produces an element for the (i+2)th column at an earlier time.
The 4:3 reduction counter is used extensively, in conjunction with the existing 3:2 and 2:2 counters.
A timing-driven algorithm selects which type of counter to instantiate.
On average, a 3.5% improvement in speed with a 4.3% area penalty.

Thank you