An Extra-Regular, Compact, Low-Power Multiplier Design Using Triple-Expansion Schemes and Borrow Parallel Counter Circuits Rong Lin Ronald B. Alonzo SUNY.

Presentation transcript:

An Extra-Regular, Compact, Low-Power Multiplier Design Using Triple-Expansion Schemes and Borrow Parallel Counter Circuits Rong Lin Ronald B. Alonzo SUNY at Geneseo University of Rochester ISCA-WCED, San Diego, CA, June 2003

The Focus of the Presentation: A complexity-reduced multiplier design approach with superior layout compactness, small area, low power, and high performance, and with potential for self-testability.

Contents: Background; Overview of the building block circuits; Overview of the intermediate block circuits; Overview of the triple expanded multiplier architecture; Experimental work; Concluding remarks.

1. Background

Traditional Approach
Stage 1: Generation of the large partial product bit matrix, usually with Booth recoding.
Stage 2: Reduction of the partial product matrix into two numbers, usually with binary CSA adders based on (3,2) and (4,2) counters.
Stage 3: Final addition (by a standard fast adder).
Recently proposed designs:
Rectangular-styled Wallace tree [Ref. 2] (Itoh et al., 2001): two groups of partial product bits.
Limited switch dynamic logic [Ref. 1] (Montoye et al., 2003): merging precharged dynamic logic into the input of every latch.

Our Approach
Stage 1: Generation of many small partial product bit matrices in parallel (81 for 54 x 54-b), without Booth recoding.
Stage 2: Reduction of the partial product matrices into two numbers with non-binary, 4-b 1-hot encoded counters (called borrow parallel counters), which are larger than the (3,2) and (4,2) binary counters.
Stage 3: Final addition (by a standard fast adder).
Complexity is reduced significantly: simple CMOS technology, smaller area, minimal custom design, repeatable and modular, self-testable, low power.
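The count of 81 small matrices in Stage 1 follows from cutting each 54-b operand into nine 6-b digits, which gives 9 x 9 block products. The sketch below checks only that arithmetic. The function names are hypothetical, ordinary integer multiplication stands in for each 6 x 6-b block, and the plain sum stands in for the counter-based reduction, so this is not a model of the authors' circuits.

```python
def split_digits(x, width=6, count=9):
    """Split x into `count` little-endian digits of `width` bits each."""
    mask = (1 << width) - 1
    return [(x >> (width * i)) & mask for i in range(count)]


def multiply_by_blocks(a, b, width=6, count=9):
    """Multiply two (width*count)-bit numbers via small block products.

    Returns the product and the number of block products used.
    """
    a_digits = split_digits(a, width, count)
    b_digits = split_digits(b, width, count)
    # One small product per pair of digits: 9 x 9 = 81 for 54 x 54-b.
    blocks = [(i, j, ai * bj)
              for i, ai in enumerate(a_digits)
              for j, bj in enumerate(b_digits)]
    # Each block product is weighted by the positions of its two digits.
    total = sum(p << (width * (i + j)) for i, j, p in blocks)
    return total, len(blocks)


if __name__ == "__main__":
    a = (1 << 54) - 1
    b = 0x123456789ABCD
    product, n_blocks = multiply_by_blocks(a, b)
    print(n_blocks, product == a * b)
```

Because every block product is independent of the others, all of them can be formed in parallel, which is the property Stage 1 relies on.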

2. The Circuits Of Building Blocks

The building block circuits: borrow parallel counters The 5_1 borrow parallel counter

About the large parallel counter 5_1: It receives 5 binary input bits, 1 of them weighted 2 (called the borrow bit) and the others weighted 1. It produces 2 output bits (and 3 in-stage carry-in and carry-out bits), so that the weighted sums of all input bits and all output bits are equal. It is a CMOS pass-transistor circuit processing 4-b 1-hot encoded signals, each representing an integer value ranging from 0 to 3.
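A 4-b 1-hot encoded signal can be pictured as four lines of which exactly one is high, the position of the high line giving the value 0 to 3. The sketch below is a generic behavioural illustration of that encoding, not the 5_1 circuit itself: the function names are hypothetical, and the "rotate the hot line by one position" rule for adding a binary bit is an assumption chosen because it keeps the weighted sum of inputs and outputs equal.

```python
def to_one_hot(value):
    """Encode an integer 0..3 as four lines with exactly one line high."""
    if not 0 <= value <= 3:
        raise ValueError("value must be in 0..3")
    return tuple(1 if i == value else 0 for i in range(4))


def from_one_hot(lines):
    """Decode a 4-line 1-hot signal back to an integer 0..3."""
    if len(lines) != 4 or sum(lines) != 1:
        raise ValueError("exactly one of four lines must be high")
    return lines.index(1)


def add_bit(lines, bit):
    """Add one binary bit of weight 1 to a 1-hot value.

    Returns (new_lines, carry). The hot line moves up one position when
    bit is 1; moving past position 3 wraps to 0 and emits a carry of
    weight 4, so value + bit == new_value + 4 * carry.
    """
    if bit == 0:
        return lines, 0
    carry = lines[3]
    rotated = (lines[3],) + lines[:3]
    return rotated, carry


if __name__ == "__main__":
    for v in range(4):
        new_lines, carry = add_bit(to_one_hot(v), 1)
        print(v, "+ 1 ->", from_one_hot(new_lines), "carry", carry)
```

In this picture only one line is high before and after each step, which is one way to read the low switching activity and small number of hot lines claimed for the counter.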

(1) Low switching activity (2) Fewer hot lines (data paths) (3) Low transistor count (78; equivalent to 3.3 FA’s)

(4) A very compact layout due to good transistor distribution and 4 identical paths processed in parallel (binary logic does not have these advantages)

The borrow bit (in red): (1) simplifies the logic and reduces the number of transistors; (2) reduces the number of cascaded pass transistors (no more than 4, including 1 within the input inverter); (3) rearranges and balances input bits for small multipliers

The embedded full adder adds two 4-b 1-hot encoded bits (s0 at column j+2, s1 at column j+1) and 1 binary bit (q at column j) directly: they have the same weight! No type conversion is needed.

3. The Circuits Of Intermediate Blocks

The 6 x 6-b borrow parallel multiplier
An array of borrow parallel counters (virtually eliminating all area needed for inter-counter connections).
Input: two 6-b numbers. Output: two numbers, p10 - p0 and q10 - q5 (CSA-style output, because it serves as an intermediate block).
Inheriting all advantages of borrow parallel counters.
Delay = a single counter delay. Height = a single counter height.
Extra compact: virtually no inter-counter connection. The height of the block is very small (important for triple expansion).
Ovals with the same color form an embedded FA (or HA or a binary bit): (3,2): 3 ovals; (2,2): 2 ovals; single bit: 1 oval.

Comparison of inter-block connections of 6 x 6 multipliers: traditional approach vs. borrow parallel approach. 30% area reduction!

4. The Triple Expanded Multipliers

The partial product bit matrix trisect-decomposition and first-level multiplier triple expansion: triple 6 x 6-b => 18 x 18-b multiplier

Second-level multiplier triple expansion: triple 18 x 18-b => 54 x 54-b multiplier
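The two expansion levels can be read as a recursion: each operand is cut into three equal parts, the nine part-by-part products are formed by the next smaller multiplier, and the results are added at their shifted positions. The sketch below checks that arithmetic only. The function name is hypothetical, integer multiplication stands in for the 6 x 6-b base block, and the CSA-style two-number outputs of the real blocks are not modelled.

```python
def triple_expand_multiply(a, b, width):
    """Multiply two `width`-bit numbers by trisecting both operands.

    width must be 6, 18 or 54. The base case stands in for one
    6 x 6-b multiplier block.
    """
    if width not in (6, 18, 54):
        raise ValueError("width must be 6, 18 or 54")
    if a >> width or b >> width:
        raise ValueError("operand does not fit in the given width")
    if width == 6:
        return a * b
    third = width // 3
    mask = (1 << third) - 1
    # Trisect each operand into low, middle and high parts.
    a_parts = [(a >> (third * i)) & mask for i in range(3)]
    b_parts = [(b >> (third * j)) & mask for j in range(3)]
    total = 0
    # Nine sub-products per level; each is weighted by its part positions.
    for i, ai in enumerate(a_parts):
        for j, bj in enumerate(b_parts):
            sub = triple_expand_multiply(ai, bj, third)
            total += sub << (third * (i + j))
    return total


if __name__ == "__main__":
    a = (1 << 54) - 12345
    b = (1 << 53) + 987654321
    print(triple_expand_multiply(a, b, 54) == a * b)
```

Two levels of nine sub-products each give 9 x 9 = 81 base blocks for the 54 x 54-b case, matching the count quoted for the approach.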

The typical simulation data

The summary of multipliers 0.70

The counterpart works in our study: (1) parallel counters: the existing and widely used binary-logic (3,2)s and (4,2)s; (2) small multipliers (the widely used 8 x 8-b); (3) IEEE floating-point multipliers (54 x 54-b)

5. The Experimental Work: Layout And Tests

The 5_1 borrow parallel counter (with output buffers):

The 6 x 6 multiplier - wiring at this level very simple - Manhattan cell structure

The 4X4 multiplier with counters (4,2), (3,2), and (2,2) - wiring very irregular

6. Concluding Remarks

Concluding Remarks: Complexity-reduced multiplier design with new arithmetic circuits and schemes, achieving low power and high performance through a novel logic approach which includes: (1) 4-b 1-hot data paths dominate (lower switching activity in each logic stage); (2) fewer hot lines generated in the logic process (power and leakage power); (3) lower transistor count; (4) higher circuit regularity and lower layout complexity; (5) lower complexity of component interconnection

Concluding Remarks (cont’d) (6) Utilizing borrow bits for simple circuit and high speed, more importantly, reducing pass-transistor path length (no more than 4) and rearranging and balancing input bits to each column of small multipliers. (7) Utilizing partial product bit matrix decomposition for component repetition and full self-testability, achieving high observability and controllability for component circuits (small multipliers are exhaustively testable)