Cost/Performance Tradeoffs: a case study

Slides:



Advertisements
Similar presentations
Part 4: combinational devices
Advertisements

Using Carry-Save Adders For Radix- 4, Can Be Used to Generate 3a – No Booth’s Slight Delay Penalty from CSA – 3 Gates.
Princess Sumaya Univ. Computer Engineering Dept. Chapter 3:
Princess Sumaya Univ. Computer Engineering Dept. Chapter 3: IT Students.
Parallel Adder Recap To add two n-bit numbers together, n full-adders should be cascaded. Each full-adder represents a column in the long addition. The.
EE 382 Processor DesignWinter 98/99Michael Flynn 1 AT Arithmetic Most concern has gone into creating fast implementation of (especially) FP Arith. Under.
CSE-221 Digital Logic Design (DLD)
EE 141 Project 2May 8, Outstanding Features of Design Maximize speed of one 8-bit Division by: i. Observing loop-holes in 8-bit division ii. Taking.
EECS Components and Design Techniques for Digital Systems Lec 18 – Arithmetic II (Multiplication) David Culler Electrical Engineering and Computer.
Chapter 6 Arithmetic. Addition Carry in Carry out
L10 – Multiplication Division 1 Comp 411 – Fall /19/2009 Binary Multipliers ×
UNIVERSITY OF MASSACHUSETTS Dept
Digital Design – Optimizations and Tradeoffs
Modern VLSI Design 2e: Chapter 6 Copyright  1998 Prentice Hall PTR Topics n Shifters. n Adders and ALUs.
Contemporary Logic Design Arithmetic Circuits © R.H. Katz Lecture #24: Arithmetic Circuits -1 Arithmetic Circuits (Part II) Randy H. Katz University of.
Digital Integrated Circuits© Prentice Hall 1995 Arithmetic Arithmetic Building Blocks.
Copyright 2008 Koren ECE666/Koren Part.6a.1 Israel Koren Spring 2008 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Digital Computer.
ENGIN112 L26: Shift Registers November 3, 2003 ENGIN 112 Intro to Electrical and Computer Engineering Lecture 26 Shift Registers.
Multiplication.
 Arithmetic circuit  Addition  Subtraction  Division  Multiplication.
ECE 4110– Sequential Logic Design
Chapter 6-2 Multiplier Multiplier Next Lecture Divider
Digital Integrated Circuits Chpt. 5Lec /29/2006 CSE477 VLSI Digital Circuits Fall 2002 Lecture 21: Multiplier Design Mary Jane Irwin (
IKI a-Combinatorial Components Bobby Nazief Semester-I The materials on these slides are adopted from those in CS231’s Lecture Notes.
Chapter # 5: Arithmetic Circuits
Digital Kommunikationselektronik TNE027 Lecture 2 1 FA x n –1 c n c n1- y n1– s n1– FA x 1 c 2 y 1 s 1 c 1 x 0 y 0 s 0 c 0 MSB positionLSB position Ripple-Carry.
Multi-operand Addition
EECS Components and Design Techniques for Digital Systems Lec 16 – Arithmetic II (Multiplication) David Culler Electrical Engineering and Computer.
EKT 221/4 DIGITAL ELECTRONICS II  Registers, Micro-operations and Implementations - Part3.
July 2005Computer Architecture, The Arithmetic/Logic UnitSlide 1 Part III The Arithmetic/Logic Unit.
Modern VLSI Design 4e: Chapter 6 Copyright  2008 Wayne Wolf Topics n Shifters. n Adders and ALUs.
Princess Sumaya Univ. Computer Engineering Dept. Chapter 3:
FPGA-Based System Design: Chapter 4 Copyright  2003 Prentice Hall PTR Topics n Number representation. n Shifters. n Adders and ALUs.
4. Computer Maths and Logic 4.2 Boolean Logic Logic Circuits.
Abdullah Said Alkalbani University of Buraimi
RTL Hardware Design by P. Chu Chapter Poor design practice and remedy 2. More counters 3. Register as fast temporary storage 4. Pipelined circuit.
EE2174: Digital Logic and Lab Professor Shiyan Hu Department of Electrical and Computer Engineering Michigan Technological University CHAPTER 8 Arithmetic.
Topics covered: Arithmetic CSE243: Introduction to Computer Architecture and Hardware/Software Interface.
Computer Organization CDA 3103 Dr. Hassan Foroosh Dept. of Computer Science UCF © Copyright Hassan Foroosh 2002.
Digital Integrated Circuits© Prentice Hall 1995 Arithmetic Arithmetic Building Blocks.
Wallace Tree Previous Example is 7 Input Wallace Tree
Computer Architecture Lecture 16 Fasih ur Rehman.
Full Tree Multipliers All k PPs Produced Simultaneously Input to k-input Multioperand Tree Multiples of a (Binary, High-Radix or Recoded) Formed at Top.
Addition and multiplication Arithmetic is the most basic thing you can do with a computer, but it’s not as easy as you might expect! These next few lectures.
ECE DIGITAL LOGIC LECTURE 15: COMBINATIONAL CIRCUITS Assistant Prof. Fareena Saqib Florida Institute of Technology Fall 2015, 10/20/2015.
CSE477 L21 Multiplier Design.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 21: Multiplier Design Mary Jane Irwin (
ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN Dr. Shi Dept. of Electrical and Computer Engineering.
Reconfigurable Computing - Options in Circuit Design John Morris Chung-Ang University The University of Auckland ‘Iolanthe’ at 13 knots on Cockburn Sound,
EEL 5722 FPGA Design Fall 2003 Digit-Serial DSP Functions Part I.
Reconfigurable Computing - Options in Circuit Design John Morris Chung-Ang University The University of Auckland ‘Iolanthe’ at 13 knots on Cockburn Sound,
RTL Hardware Design by P. Chu Chapter 9 – ECE420 (CSUN) Mirzaei 1 Sequential Circuit Design: Practice Shahnam Mirzaei, PhD Spring 2016 California State.
Topic: N-Bit parallel and Serial adder
Addition and multiplication1 Arithmetic is the most basic thing you can do with a computer, but it’s not as easy as you might expect! These next few lectures.
Multiplier Design [Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
CSE477 VLSI Digital Circuits Fall 2003 Lecture 21: Multiplier Design
Multipliers Multipliers play an important role in today’s digital signal processing and various other applications. The common multiplication method is.
Unsigned Multiplication
ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN
ARM implementation the design is divided into a data path section that is described in register transfer level (RTL) notation control section that is viewed.
Part III The Arithmetic/Logic Unit
Reading: Study Chapter (including Booth coding)
Addition and multiplication
Montek Singh Mon, Mar 28, 2011 Lecture 11
UNIVERSITY OF MASSACHUSETTS Dept
UNIVERSITY OF MASSACHUSETTS Dept
Addition and multiplication
ECE 352 Digital System Fundamentals
Lecture 9 Digital VLSI System Design Laboratory
UNIVERSITY OF MASSACHUSETTS Dept
UNIVERSITY OF MASSACHUSETTS Dept
Presentation transcript:

Cost/Performance Tradeoffs: a case study Digital Systems Architecture I. 6.004 –Fall 2002 10/08/02

Binary Multiplication n bits EASY PROBLEM: design combinational circuit to multiply tiny (1-, 2-, 3-bit) operands... n bits HARD PROBLEM design circuit to multiply BIG (32-bit, 64-bit) numbers 2n bits since (2n-1)2 < 22n We can make big multipliers out of little ones! Engineering Principle: Exploit STRUCTURE in problem. 6.004 –Fall 2002 10/08/02

Making a 2n-bit multiplier using n-bit multipliers Given n-bit multipliers: Synthesize 2n-bit multipliers: 6.004 –Fall 2002 10/08/02

Our Basis: n=1: minimalist starting point Multiplying two 1-bit numbers is pretty simple: Of course, we could start with optimized combinational multipliers for larger operands; e.g. the logic gets more complex, but some optimizations are possible... 2-bit Multiplier 6.004 –Fall 2002 10/08/02

Our induction step: 2n-bit by 2n-bit multiplication: Divide multiplicands into n-bit pieces Form 2n-bit partial products, using n-bit by n-bit multipliers. Align appropriately Add. REGROUP partial products – 2 additions rather than 3! Induction: we can use the same structuring principle to build a 4n-bit multiplier from our newly-constructed 2n-bit ones... 6.004 –Fall 2002 10/08/02

Brick Wall view of partial products Making 4n-bit multipliers from n-bit ones: 2 “induction steps” 6.004 –Fall 2002 10/08/02

Multiplier Cookbook: Chapter 1 Step 1: Form (& arrange) Given problem: Partial Products: Subassemblies: Partial Products Adders Step 2: Sum 6.004 –Fall 2002 10/08/02

Performance/Cost Analysis "Order Of" notation: Example: "g(n) is of order f(n)" since if there exist C2≥C1 >0, such that for all but finitely many integral n ≥0 "almost always" Partial Products: implies Things to Add: both inequalities; O(...) implies only the second. Adder Width: Hardware Cost: Latency: 6.004 –Fall 2002 10/08/02

Observations: partial products. full adders. Hmmm. 6.004 –Fall 2002 10/08/02

Repackaging Function Engineering Principle #2: partial products. Put the Solution where the Problem is. full adders. How about n2 blocks, each doing a little multiplication and a little addition? 6.004 –Fall 2002 10/08/02

Goal: Array of Identical Multiplier Cells Single "brick" of brick-wall array... • Forms partial product • Adds to accumulating sum along with carry Necessary Component: Full Adder Takes 2 addend bits plus carry bit. Produces sum and carry output bits. CASCADE to form an n-bit adder. 6.004 –Fall 2002 10/08/02

Design of 1-bit multiplier "Brick": Array Layout: •operand bits bused diagonally •Carry bits propagate right-to-left •Sum bits propagate down Brick design: •AND gate forms 1x1 product •2-bit sum propagates from top to bottom •Carry propagates to left 6.004 –Fall 2002 10/08/02

Latency revisited Here’s our combinational multiplier: What’s its propagation delay? Naive (but valid) bound: •O(n) addtions •O(n) time for each addition •Hence O(n2) time required On closer inspection: •Propagation only toward left, bottom •Hence longest path bounded by length + width of array: O(n+n) = O(n)! 6.004 –Fall 2002 10/08/02

Multiplier Cookbook: Chapter 2 Combinational Multiplier: Hardware for n by n bits: Latency: Throughput: 6.004 –Fall 2002 10/08/02

Combinational Multiplier: best bang for the buck? Suppose we have LOTS of multiplications. PIPELINING Can we do better from a cost/performance standpoint? 6.004 –Fall 2002 10/08/02

Stupid Pipeline Tricks gotta break that long carry chain! Stages: Clock Period: Hardware cost for n by n bits: Latency: Throughput: 6.004 –Fall 2002 10/08/02

The Pipelining Bandwagon... where do I get on? WE HAVE: • Pipeline rules –"well formed pipelines" • Plenty of registers • Demand for higher throughput. What do we do? Where do we define stages? 6.004 –Fall 2002 10/08/02

Even Stupider Pipeline Tricks WORSE idea: • Doesn’t break long combinational paths • NOT a well-formed pipeline... ... different register counts on alternative paths ... data crosses stage boundaries in both directions! Back to basics: what’s the point of pipelining, anyhow? 6.004 –Fall 2002 10/08/02

Breaking O(n) combinational paths LONG PATHS go down, to left: •Break array into diagonal slices •Segment every long combinational path GOAL: Θ(n) stages; Θ (1) clock period! 6.004 –Fall 2002 10/08/02

Multiplier Cookbook: Chapter 3 Stages: Clock Period: Hardware cost for n by n bits: Latency: Throughput: •Well-formed pipeline (careful!) • Constant (high!) throughput, independently of operand size. ... but suppose we don’t need the throughput? 6.004 –Fall 2002 10/08/02

Moving down the cost curve... Suppose we have INFREQUENT multiplications... pipelining doesn’t help us. Can we do better from a cost/performance standpoint? Hmmm, are all these extras really needed? 6.004 –Fall 2002 10/08/02

Multiplier Cookbook: Chapter 4 Sequential Multiplier: •Re-uses a single n-bit “slice” to emulate each pipeline stage •a operand entered serially •Lots of details to be filled in... Stages: Clock Period: (constant!) Hardware cost for n by n bits: Latency: Throughput: 6.004 –Fall 2002 10/08/02

(Ridiculous?) Extremes Dept... Cost minimization: how far can we go? Suppose we want to minimize hardware (at any cost)… •Consider bit-serial! •Form and add 1-bit partial product per clock •Reuse single “brick” for each bit bj of slice; •Re-use slice for each bit of a operand 6.004 –Fall 2002 10/08/02

Multiplier Cookbook: Chapter 5 Bit Serial multiplier: •Re-uses a single brick to emulate an n-bit slice •both operands entered serially •O(n2) clock cycles required •Needs additional storage (typically from existing registers) Stages: Clock Period: (constant!) Hardware cost for n by n bits: Latency: Throughput: 6.004 –Fall 2002 10/08/02

Summary: Scheme: $ Latency Thruput Combinational N-pipe Slice-serial Bit-serial Lots more multiplier technology: fast adders, Booth Encoding, column compression, ... 6.004 –Fall 2002 10/08/02