A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products. Sabyasachi Das (University of Colorado, Boulder), Sunil P. Khatri (Texas A&M University).

Presentation transcript:

1 A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products
Sabyasachi Das, University of Colorado, Boulder
Sunil P. Khatri, Texas A&M University

2 What is a Sum-of-Products (SOP)?
- An arithmetic Sum-of-Products (SOP) block consists of an arbitrary number of product terms and sum terms.
- General form of SOP:
  p = a * b
  q = c * d
  z = p + q + e + f
[Figure: dataflow of the general SOP, with inputs a, b, c, d, e, f, intermediate products p and q, and output z]

3 Examples of SOP Blocks
- Multiplier { assign z = a * b }: found in microprocessors
- Multiply-Accumulator { assign z = (a * b) + c }: found in cryptographic applications
- Squarer { assign z = a * a }: found in DSP processors
- Addition Tree { assign z = a + b + c + d }: found in ALUs and wireless applications
- Generalized SOP { assign z = (a * b) + (c * d) }: found in FIR and IIR filters

4 Synthesis of Sum-of-Products
- Synthesis of Sum-of-Products blocks is done in 3 steps (in the order of data flow):
  1. Creation of partial products
  2. Reduction of the partial products into 2 operands
  3. Computation of the final sum by adding the 2 operands
[Figure: Inputs -> Creation of Partial Products -> Reduction of Partial Products -> Computation of Final Sum -> Output]

5 Motivation and Problem Statement
- SOP blocks are widely used and computationally intensive.
- The final adder in an SOP accounts for about 30% to 40% of the delay of the SOP block. This paper focuses on the synthesis of an efficient final adder for an SOP expression.
- Stand-alone adder architectures do not work well in SOP blocks.

6 Stand-alone Adder Architectures
- Frequently used adder architectures:
  - Ripple-Carry
    - Area-efficient, but slow
    - Timing-efficient if the inputs have skewed arrival times
  - Parallel-Prefix (Brent-Kung, Kogge-Stone)
    - Faster architecture
    - Requires more area
  - Carry-Select
    - Large area overhead (often >100%)
    - Better delay if the $C_{in}$ signal arrives late
- None of these is very suitable in Sum-of-Products. Why?

7 Special Arrival-time Property
- The 2 operands of the final adder in an SOP exhibit a peculiar arrival-time pattern.
- As a result, traditional monolithic adders do not work well in SOP:
  - They are optimized for equal arrival times.
- Hence hybrid adders, which exploit this arrival-time pattern, are required.
- It is therefore critical to synthesize an efficient hybrid adder designed specifically for SOP blocks.

8 Proposed 4-Stage Hybrid Adder
[Figure: the hybrid adder, from LSB to MSB: SubAdder 1 (Ripple-Carry, width $w_1$), SubAdder 2 (Kogge-Stone, width $w_2$), SubAdder 3 (Carry-Select, width $w_3$), SubAdder 4 (Carry-Select, width $w_4$)]
- Ripple-Carry architecture near the LSB
- Fast Kogge-Stone architecture near the middle
- 2 Carry-Select adders (based on Brent-Kung) near the MSB
- GOAL: find $w_1$, $w_2$, $w_3$ and $w_4$ algorithmically

9 Notations
- We use the following notations:
  - The bit-width of SubAdder 1 (Ripple) is $w_1$ bits
  - The bit-width of SubAdder 2 (Kogge-Stone) is $w_2$ bits
  - The bit-width of SubAdder 3 (Carry-Select, Brent-Kung) is $w_3$ bits
  - The bit-width of SubAdder 4 (Carry-Select, Brent-Kung) is $w_4$ bits
  - $w_1 + w_2 + w_3 + w_4 = n$ (total width of the hybrid adder)
  - $T(a_i)$ = time when input signal $a_i$ is available
  - $T(S_i)$ = time when output signal $S_i$ (Sum$_i$) is available
  - $T(C_i)$ = time when output signal $C_i$ (Carry$_i$) is available

10 SubAdder 1 (Ripple-Carry)
- Most area-efficient architecture
- Very slow
- Timing-efficient if the input arrival times are skewed. We use it for a few bits near the LSB (which arrive earliest).
[Figure: chain of full adders (FA) with inputs $x_0, y_0, \dots, x_k, y_k$, sum outputs $z_0, \dots, z_k$ and final carry $z_{k+1}$]

11 Parallel-Prefix Adders (KS, BK)
- In a Parallel-Prefix adder, the carry for each bit is computed by an efficient tree structure (using the Generate and Propagate concept).
- For each bit i of the adder, Generate ($G_i$) indicates whether a carry is generated from that bit:
  $G_i = a_i \cdot b_i$
- For each bit i of the adder, Propagate ($P_i$) indicates whether a carry is propagated through that bit:
  $P_i = a_i \oplus b_i$
- The Generate and Propagate concept is extendable to blocks comprising multiple bits, as we discuss next.

12 Parallel-Prefix Adders (KS, BK)
- If two blocks (comprising one or more bits) have the GP value pairs $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$, then the combined block has the following GP values, where $+$ is OR and $\cdot$ is AND:
  $G_{left,right} = G_{left} + (P_{left} \cdot G_{right})$
  $P_{left,right} = P_{left} \cdot P_{right}$
- The above computation is performed by a carry-operator or "o"-operator.
- Once we obtain the carry for each bit, it is trivial to compute the sum output of each bit (XOR and NAND).
[Figure: "o"-operator taking $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$ and producing $(G_{left,right}, P_{left,right})$]
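As a rough illustration of the carry operator on this slide, the Python sketch below combines (G, P) pairs and derives every carry with a simple serial scan. It assumes that the per-bit Propagate is the XOR of the operand bits; the names bit_gp, combine and carries are hypothetical and not taken from the slides. A real Kogge-Stone or Brent-Kung adder applies the same operator in a tree rather than serially.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate), each 0 or 1


def bit_gp(a: int, b: int) -> GP:
    """Per-bit generate/propagate (assumption: propagate is XOR)."""
    return (a & b, a ^ b)


def combine(left: GP, right: GP) -> GP:
    """The "o"-operator: merge a more significant block (left)
    with the less significant block next to it (right)."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)


def carries(a_bits: List[int], b_bits: List[int], c_in: int = 0) -> List[int]:
    """Return [C_1, ..., C_n] for LSB-first bit lists.

    The carry-in is modelled as a block below bit 0 that
    generates c_in and propagates nothing.
    """
    acc: GP = (c_in, 0)
    out = []
    for a, b in zip(a_bits, b_bits):
        acc = combine(bit_gp(a, b), acc)
        out.append(acc[0])  # generate of bits 0..i is C_{i+1}
    return out
```

Because the operator is associative, the serial scan here and a tree-structured prefix network produce the same carries; only the delay and operator count differ.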

13 SubAdder 2 (Kogge-Stone)
- Kogge-Stone Parallel-Prefix architecture
- Delay: $\log_2 n$ levels of "o"-operators
- Area: $(n \log_2 n) - n + 1$ "o"-operators
[Figure: 8-bit Kogge-Stone prefix network with inputs GP0 to GP7 and carry outputs C1 to C8]
Kogge and Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Transactions on Computers, 1973

14 Brent-Kung (BK)
- Brent-Kung Parallel-Prefix architecture
- Delay: $(2 \log_2 n) - 2$ levels of "o"-operators
- Area: $2n - 2 - \log_2 n$ "o"-operators
[Figure: 8-bit Brent-Kung prefix network with inputs GP0 to GP7 and carry outputs C1 to C8]
Brent and Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, 1982
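To compare the two prefix architectures, the level and operator counts quoted on the Kogge-Stone and Brent-Kung slides can be tabulated with a small Python sketch. It assumes n is a power of two; the function names are illustrative only, and the formulas are copied from the slides rather than derived here.

```python
from typing import Tuple


def _log2_exact(n: int) -> int:
    """Return log2(n), requiring n to be a power of two (assumption)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    return n.bit_length() - 1


def kogge_stone_cost(n: int) -> Tuple[int, int]:
    """(levels, operators) using the Kogge-Stone slide's formulas."""
    k = _log2_exact(n)
    return k, n * k - n + 1


def brent_kung_cost(n: int) -> Tuple[int, int]:
    """(levels, operators) using the Brent-Kung slide's formulas."""
    k = _log2_exact(n)
    return 2 * k - 2, 2 * n - 2 - k
```

For n = 8 these formulas give 3 levels and 17 operators for Kogge-Stone against 4 levels and 11 operators for Brent-Kung, which is the speed-versus-area trade-off behind using Kogge-Stone only where the latest signals arrive.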

15 SubAdder 3 & SubAdder 4 (Carry-Select)
[Figure: carry-select adder: Adder 0 adds x and y with carry-in 1'b0 to give z0, Adder 1 adds x and y with carry-in 1'b1 to give z1, and a Mux controlled by $c_{in}$ selects the output z]
- Large area overhead
- Used as a special case, since $C_{in}$ arrives late
- Speed depends on the architecture of the two adders
- But these adders need not be KS (rather, we use BK)
- The arrival times of the inputs of SubAdder 3 and SubAdder 4 are earlier than those for SubAdder 2
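The carry-select idea on this slide can be sketched behaviourally in Python: both candidate sums are computed without waiting for the carry-in, which then only drives the final selection. This is a functional model under that reading of the figure, not a gate-level description, and the function name carry_select_add is hypothetical.

```python
from typing import Tuple


def carry_select_add(x: int, y: int, c_in: int, width: int) -> Tuple[int, int]:
    """Return (sum, carry_out) of a width-bit carry-select adder."""
    mask = (1 << width) - 1
    z0 = (x & mask) + (y & mask)      # Adder 0: carry-in tied to 0
    z1 = (x & mask) + (y & mask) + 1  # Adder 1: carry-in tied to 1
    z = z1 if c_in else z0            # Mux: the late c_in only selects
    return z & mask, z >> width
```

In hardware the two additions run in parallel while the carry-in is still being computed, so the delay after the carry-in arrives is roughly one multiplexer; the price is the duplicated adder.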

16 Determination of width of SubAdder 1
- Width of the Ripple adder (SubAdder 1):
  - At every bit i, compute $T(C_{i+1})$ and check whether
    - $T(C_{i+1}) \le T(a_{i+1})$
    - $T(C_{i+1}) \le T(b_{i+1})$
  - If the check passes, i = i+1
  - Else continue checking until 3 consecutive bits fail the check (Hill Climbing)
  - Return the value i as the Ripple adder width
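One possible reading of this procedure is sketched below in Python. The timing model is an assumption made only for illustration: each full adder adds a fixed d_carry to the latest of its two operands and its incoming carry. The function name ripple_width and its parameters are hypothetical, and the slides do not spell out how the three-failure lookahead interacts with the returned width, so this is a sketch rather than the authors' exact algorithm.

```python
from typing import Sequence


def ripple_width(t_a: Sequence[float], t_b: Sequence[float],
                 d_carry: float = 1.0, max_fails: int = 3,
                 t_cin: float = 0.0) -> int:
    """Estimate the ripple-carry width w_1 from operand arrival times.

    t_a[i], t_b[i] are T(a_i), T(b_i), LSB first.
    """
    width = 0
    fails = 0
    t_c = t_cin  # T(C_0)
    for i in range(len(t_a) - 1):
        # Assumed model for T(C_{i+1}): one carry delay after the
        # latest of a_i, b_i and the incoming carry.
        t_c = max(t_a[i], t_b[i], t_c) + d_carry
        if t_c <= t_a[i + 1] and t_c <= t_b[i + 1]:
            width = i + 1  # bits 0..i can ripple at no timing cost
            fails = 0
        else:
            fails += 1
            if fails == max_fails:  # hill climbing gives up
                break
    return width
```

Tolerating a few failed checks lets the search step over an isolated bit whose operands arrive slightly early, instead of stopping at the first local dip in the arrival-time profile.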

17 Determination of width of SubAdder 2
- Width of the Kogge-Stone adder (SubAdder 2):
  - The latest-arriving signals are part of this adder.
  - Hence keep this adder wide, while ensuring that this does not result in very narrow Carry-Select adders for SubAdder 3 and SubAdder 4.
  - We determine the width with the following equations:
    $w_2 = n - w_1$ if $(n - w_1) \le 8$
    $w_2 = 2^p$, where $p = \lfloor \log_2 (n - w_1) \rfloor$, if $(n - w_1) > 8$
  - Example: if $n = 32$ and $w_1 = 7$, then $n - w_1 = 25$, $p = 4$ and $w_2 = 16$
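The width rule on this slide is small enough to express directly. The Python sketch below assumes that the logarithm is rounded down, which is the reading that matches the slide's own example (n = 32, w_1 = 7 gives w_2 = 16); the function name kogge_stone_width is hypothetical.

```python
def kogge_stone_width(n: int, w1: int) -> int:
    """Width w_2 of the Kogge-Stone sub-adder per the slide's rule."""
    remaining = n - w1
    if remaining < 0:
        raise ValueError("w1 cannot exceed n")
    if remaining <= 8:
        return remaining
    # Largest power of two not exceeding the remaining bits
    # (assumption: log2 is rounded down).
    return 1 << (remaining.bit_length() - 1)
```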

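18 Delay of the Hybrid Adder
[Figure: the four sub-adders (SubAdder 1 Ripple-Carry, width $w_1$; SubAdder 2 Kogge-Stone, width $w_2$; SubAdder 3 Carry-Select, width $w_3$; SubAdder 4 Carry-Select, width $w_4$) annotated with the output times $T(S_2)$, $T(S_3)$, $T(S_4)$ and $T(C_4)$]
- $T_{hybrid} = \max(T(C_4), T(S_4), T(S_3), T(S_2))$
- In the figure, $T(S_2)$, $T(S_3)$ and $T(S_4)$ label the sum outputs of SubAdder 2, SubAdder 3 and SubAdder 4, and $T(C_4)$ labels the carry-out of SubAdder 4.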

19 Determination of widths of SubAdder 3 and SubAdder 4
- Width of the two Carry-Select adders:
  - Initial width configuration:
    $w_3 = (n - w_1 - w_2)/2$
    $w_4 = n - w_1 - w_2 - w_3$
  - With this initial configuration, estimate the delay of the overall hybrid adder (based on the previous slide).
  - Use an iterative approach to explore in the appropriate direction (similar to binary search) and converge on the smallest-delay configuration.
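The search described on this slide might look roughly like the Python sketch below. Several details are assumptions: the division in the initial split is taken as integer division, the delay estimator is passed in as a caller-supplied function because the slides do not give one, and the step-halving loop is just one way to realise a search "similar to binary search". The name split_carry_select is hypothetical.

```python
from typing import Callable, Tuple


def split_carry_select(n: int, w1: int, w2: int,
                       delay: Callable[[int, int], float]) -> Tuple[int, int]:
    """Choose (w3, w4) for the two carry-select sub-adders.

    delay(w3, w4) is a hypothetical estimator of the hybrid adder delay.
    """
    rest = n - w1 - w2
    w3 = rest // 2               # initial configuration from the slide
    best = delay(w3, rest - w3)
    step = max(rest // 4, 1)
    while step >= 1:
        moved = False
        for cand in (w3 - step, w3 + step):
            if 0 <= cand <= rest:
                d = delay(cand, rest - cand)
                if d < best:     # move towards the smaller delay
                    best, w3, moved = d, cand, True
                    break
        if not moved:
            step //= 2           # narrow the search, as in binary search
    return w3, rest - w3
```

With a delay curve that has a single minimum, the loop walks towards that minimum and then shrinks the step until no neighbouring split is faster; for a curve with several local minima it may stop at one of them.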

20 Experimental Setup
- To test our approach, we used:
  - Adders in several different types of SOP blocks (Multipliers, MAC, generalized SOP and Squarer)
  - Two process technologies (0.13µ and 0.09µ)
  - Two commercial library vendors
  - Two different arrival-time constraints
- We compared the results of our hybrid adder with the adder produced by a commercial datapath synthesis tool.

21 Results
- On average, our hybrid adder is 14.31% faster than the result of the commercial synthesis tool (with a 6.62% area penalty).

22 Summary
- The hybrid adder consists of 4 SubAdders:
  - SubAdder 1 has the Ripple-Carry architecture
  - SubAdder 2 has the Kogge-Stone architecture
  - SubAdder 3 and SubAdder 4 have the Carry-Select (based on Brent-Kung) architecture
- The widths of all SubAdders are computed based on a timing-driven analysis.
- On average, 14.31% faster (with a 6.62% area penalty).

23 Thank you