Lecture 18: Datapath Functional Units

Presentation transcript:

Lecture 18: Datapath Functional Units

Outline Comparators Shifters Multi-input Adders Multipliers 18: Datapath Functional Units

Comparators: 0’s detector (A = 00…000); 1’s detector (A = 11…111); equality comparator (A = B); magnitude comparator (A < B).

1’s & 0’s Detectors: A 1’s detector is an N-input AND gate. A 0’s detector is a 1’s detector with inverted inputs (equivalently, an N-input NOR). An asymmetric design favors the late-arriving inputs: here, the delay from the latest bit, A7, is a single gate.

1’s & 0’s Detectors: A fast detector can use a pseudo-nMOS or dynamic NOR, which works well for words up to about 16 bits. For larger words, the gates can be split into 8- to 16-bit chunks to reduce the parasitic delay.

Equality Comparator: Check whether each pair of bits is equal using XNOR (the equality gate), then apply a 1’s detector to the bitwise-equality result.
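The detectors and the equality comparator can be modeled behaviorally in a few lines. This is only a sketch: the function names and the explicit width parameter n are illustrative assumptions, not part of the slides.

```python
def ones_detect(a: int, n: int) -> bool:
    """True when all n bits of a are 1 (models an N-input AND)."""
    mask = (1 << n) - 1
    return (a & mask) == mask


def zeros_detect(a: int, n: int) -> bool:
    """True when all n bits of a are 0 (models an N-input NOR)."""
    mask = (1 << n) - 1
    return (a & mask) == 0


def equal(a: int, b: int, n: int) -> bool:
    """Bitwise XNOR of a and b, followed by a 1's detector."""
    mask = (1 << n) - 1
    xnor = ~(a ^ b) & mask
    return ones_detect(xnor, n)
```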

Magnitude Comparator: Compute B - A and look at the sign, using B - A = B + ~A + 1. For unsigned numbers, the carry-out serves as the sign bit: if there is a carry-out, A <= B; otherwise, A > B. A zero detector on the result indicates that the numbers are equal.
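A minimal behavioral sketch of the unsigned comparison, assuming n-bit operands. The name compare_unsigned and its return convention are hypothetical.

```python
def compare_unsigned(a: int, b: int, n: int) -> tuple[int, bool]:
    """Compute B - A as B + ~A + 1 on n bits.

    Returns (carry_out, zero): carry_out == 1 means A <= B,
    and zero is True when A == B.
    """
    mask = (1 << n) - 1
    total = (b & mask) + (~a & mask) + 1
    carry_out = total >> n          # bit n of the (n+1)-bit sum
    zero = (total & mask) == 0      # zero detector on the n-bit result
    return carry_out, zero
```

With these two outputs, A < B corresponds to carry_out == 1 and zero == False.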

Signed vs. Unsigned: For signed numbers, comparison is harder. The flags are: C, carry out; Z, zero (all bits of B - A are 0); N, negative (MSB of result); V, overflow (inputs had different signs, output sign ≠ sign of B); S = N xor V (sign of result).
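The flag definitions can be checked with a small model. This is a sketch under the assumption that a and b are passed as n-bit two’s complement bit patterns; the function name and the dictionary return format are illustrative.

```python
def signed_flags(a: int, b: int, n: int) -> dict[str, int]:
    """Flags for B - A on n-bit two's complement bit patterns."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    total = b + (~a & mask) + 1
    diff = total & mask
    c = total >> n                         # carry out
    z = int(diff == 0)                     # all result bits are 0
    neg = diff >> (n - 1)                  # MSB of the result
    sign_a, sign_b = a >> (n - 1), b >> (n - 1)
    # Overflow: inputs had different signs and the result sign differs from B's.
    v = int(sign_a != sign_b and neg != sign_b)
    s = neg ^ v                            # true sign of B - A
    return {"C": c, "Z": z, "N": neg, "V": v, "S": s}
```

In this model, S = 1 means B - A is negative, i.e., A > B as signed numbers.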

Counters: Features of counters: Resettable: the counter value is reset to 0 when RESET is asserted. Loadable: the counter value is loaded with an N-bit value when LOAD is asserted. Enabled: the counter counts only on clock cycles when EN is asserted. Reversible: the counter increments or decrements based on the UP/DOWN input. Terminal count: the TC output is asserted when the counter overflows.

Binary Counter: Asynchronous ripple-carry counter. The falling transition of each register clocks the subsequent register. Asynchronous circuits introduce a whole assortment of problems, and because the clock ripples through the registers one at a time, the delay is long.

Binary Counter: Synchronous up/down counter with reset, load, and enable. The cycle time is limited by the ripple-carry delay.

Fast Binary Counter: The speed of the previous counter is limited by the adder. This limit can be overcome by dividing the counter into two or more segments. For example, a 32-bit counter could be constructed from a 4-bit prescaler counter and a 28-bit counter.

Shifters: Logical shift: shifts the number left or right and fills with 0’s (1011 LSR 1 = 0101; 1011 LSL 1 = 0110). Arithmetic shift: shifts the number left or right; a right shift sign-extends (1011 ASR 1 = 1101; 1011 ASL 1 = 0110). Rotate: shifts the number left or right and fills with the lost bits (1011 ROR 1 = 1101; 1011 ROL 1 = 0111).
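The six shift types can be reproduced with a short behavioral model. This is a sketch assuming n-bit words and a shift amount 0 <= k < n. The function names mirror the mnemonics above but are otherwise hypothetical.

```python
def lsr(a: int, k: int, n: int) -> int:
    """Logical shift right: fill with 0's."""
    return (a & ((1 << n) - 1)) >> k


def lsl(a: int, k: int, n: int) -> int:
    """Logical shift left: fill with 0's (arithmetic left is identical)."""
    return (a << k) & ((1 << n) - 1)


def asr(a: int, k: int, n: int) -> int:
    """Arithmetic shift right: fill with copies of the sign bit."""
    mask = (1 << n) - 1
    a &= mask
    sign = a >> (n - 1)
    fill = (mask << (n - k)) & mask if sign else 0
    return (a >> k) | fill


def ror(a: int, k: int, n: int) -> int:
    """Rotate right: bits shifted out on the right re-enter on the left."""
    mask = (1 << n) - 1
    a &= mask
    return ((a >> k) | (a << (n - k))) & mask


def rol(a: int, k: int, n: int) -> int:
    """Rotate left: bits shifted out on the left re-enter on the right."""
    mask = (1 << n) - 1
    a &= mask
    return ((a << k) | (a >> (n - k))) & mask
```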

Funnel Shifter: A funnel shifter can do all six types of shifts. It creates a (2N - 1)-bit input word Z from A and/or the kill values, then selects an N-bit field Y from that (2N - 1)-bit input, shifting by k bits (0 ≤ k < N). Logically, it involves N N:1 multiplexers.

Funnel Source Generator: [Figure: source word selection for rotate right, logical right, arithmetic right, rotate left, and logical/arithmetic left.]
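A behavioral sketch of a funnel shifter follows. The source-word choices are one common arrangement and are an assumption, since the slide’s table did not survive in this transcript. The function name, the kind strings, and the offset convention (k for right shifts, N - 1 - k for left shifts) are likewise illustrative.

```python
def funnel_shift(a: int, k: int, n: int, kind: str) -> int:
    """Build a (2n-1)-bit word Z from A, then select the n-bit field
    Y = Z[n-1+offset : offset]. Requires 0 <= k < n."""
    mask = (1 << n) - 1
    a &= mask
    low = a & (mask >> 1)            # A[n-2:0]
    high = a >> 1                    # A[n-1:1]
    sign = a >> (n - 1)
    if kind == "lsr":                # upper n-1 bits of Z are 0
        z, offset = a, k
    elif kind == "asr":              # upper n-1 bits of Z are the sign
        upper = (mask >> 1) if sign else 0
        z, offset = (upper << n) | a, k
    elif kind == "ror":              # upper n-1 bits of Z are A[n-2:0]
        z, offset = (low << n) | a, k
    elif kind == "lsl":              # A on top, lower n-1 bits are 0
        z, offset = a << (n - 1), n - 1 - k
    elif kind == "rol":              # A on top, lower n-1 bits are A[n-1:1]
        z, offset = (a << (n - 1)) | high, n - 1 - k
    else:
        raise ValueError(f"unknown shift kind: {kind}")
    return (z >> offset) & mask
```

For left shifts the field is selected from the top of Z, so the offset decreases as k grows.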

Array Funnel Shifter: N N-input multiplexers. Use 1-of-N hot select signals for the shift amount. nMOS pass-transistor design (Vt drops!). Shown for N = 4.

Array Funnel Shifter: The source generator consists of a (2N - 1)-bit 5:1 multiplexer controlled by the shift type and direction.

Logarithmic Funnel Shifter: log2 N stages of 2-input muxes. No select decoding is needed.
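The reason no decoding is needed is that each bit of the shift amount k directly controls one mux stage. A minimal sketch for a logical right shift, assuming n is a power of two (the function name is hypothetical):

```python
def log_shift_right(a: int, k: int, n: int) -> int:
    """Logical right shift built from log2(n) conditional stages.

    Stage i either passes its input through or shifts it by 2**i,
    selected directly by bit i of k (a 2-input mux per output bit).
    """
    y = a & ((1 << n) - 1)
    stage = 0
    while (1 << stage) < n:
        if (k >> stage) & 1:
            y >>= 1 << stage
        stage += 1
    return y
```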

32-bit Logarithmic Funnel: The first stage performs a coarse shift right by 0, 8, 16, or 24 bits. The second stage performs a fine shift right by 0 to 7 bits. The mux decode block conditionally inverts k for left shifts.

Barrel Shifter: Barrel shifters perform right rotations using wrap-around wires. Left rotations are right rotations by N - k = ~k + 1 bits, where ~k is the bitwise complement of k. Shifts are rotations with the end bits masked off. Barrel shifter forms: array, and logarithmic (better for large shifts).
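The identity N - k = ~k + 1 holds modulo N when N is a power of two and k is a log2(N)-bit number. A small sketch under that assumption (all function names are illustrative):

```python
def rotate_right(a: int, k: int, n: int) -> int:
    """Right rotation of an n-bit word by k bits (0 <= k < n)."""
    mask = (1 << n) - 1
    a &= mask
    return ((a >> k) | (a << (n - k))) & mask


def rotate_left_via_right(a: int, k: int, n: int) -> int:
    """Left rotation by k done as a right rotation by ~k + 1 (mod n).

    Assumes n is a power of two, so k fits in log2(n) bits.
    """
    kmask = n - 1                        # log2(n) ones
    amount = ((~k & kmask) + 1) & kmask  # equals (n - k) mod n
    return rotate_right(a, amount, n)


def shift_right_via_rotate(a: int, k: int, n: int) -> int:
    """Logical right shift: rotate right, then mask off the wrapped bits."""
    mask = (1 << n) - 1
    return rotate_right(a, k, n) & (mask >> k)
```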

Logarithmic Barrel Shifter: [Figure: logarithmic barrel shifter variants labeled right rotate only, right/left shift & rotate, and right/left shift.]

32-bit Logarithmic Barrel: The datapath is never wider than 32 bits. The first stage preshifts by 1 to handle left shifts.

Multi-input Adders: Suppose we want to add k N-bit words, e.g., 0001 + 0111 + 1101 + 0010 = 10111. The straightforward solution uses k - 1 N-bit CPAs, which is large and slow.

Carry Save Addition: A full adder sums 3 inputs and produces 2 outputs. The carry output has twice the weight of the sum output (X + Y + Z = S + 2C). N full adders in parallel are called a carry-save adder (CSA); they produce N sums and N carry outs.

CSA Application: Use k - 2 stages of CSAs, keeping the result in carry-save redundant form. A final CPA computes the actual result.
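Carry-save addition and its use for multi-input sums can be sketched as follows. The function names are hypothetical, and Python’s unbounded integers stand in for wires that are wide enough to hold the result. The model chains the CSAs linearly, which uses k - 2 of them for k words.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """One row of full adders: returns (S, C) with X + Y + Z == S + 2*C."""
    s = x ^ y ^ z                        # per-bit sum
    c = (x & y) | (x & z) | (y & z)      # per-bit majority (carry)
    return s, c


def multi_input_add(words: list[int]) -> int:
    """Add k words with k - 2 CSAs followed by one carry-propagate add."""
    words = list(words)
    while len(words) > 2:
        s, c = carry_save_add(words[0], words[1], words[2])
        # The carry word has twice the weight, so it is shifted left by 1.
        words = [s, c << 1] + words[3:]
    return sum(words)                    # final CPA resolves the carries
```

For the four words in the example, multi_input_add([0b0001, 0b0111, 0b1101, 0b0010]) gives 0b10111.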

Multiplication: Multiplication is less common than addition, but a multiplier is essential for microprocessors, digital signal processors, and graphics engines.

Multiplication: Example: M × N-bit multiplication. Produce N M-bit partial products (PPs), then sum these to produce the (M + N)-bit product.

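General Form: Multiplicand: $Y = (y_{M-1}, y_{M-2}, \ldots, y_1, y_0)$. Multiplier: $X = (x_{N-1}, x_{N-2}, \ldots, x_1, x_0)$. Product: $P = \left(\sum_{j=0}^{M-1} y_j 2^j\right)\left(\sum_{i=0}^{N-1} x_i 2^i\right) = \sum_{i=0}^{N-1}\sum_{j=0}^{M-1} x_i y_j 2^{i+j}$.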
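The product is the sum of N shifted partial products, one per multiplier bit. A minimal sketch for unsigned operands (function names are illustrative):

```python
def partial_products(y: int, x: int, n: int) -> list[int]:
    """Partial product i is Y AND-ed with multiplier bit x_i,
    weighted by 2**i. y is the multiplicand, x the n-bit multiplier."""
    return [(y if (x >> i) & 1 else 0) << i for i in range(n)]


def multiply(y: int, x: int, n: int) -> int:
    """Unsigned product as the sum of the n partial products."""
    return sum(partial_products(y, x, n))
```

For example, partial_products(0b1100, 0b0101, 4) is [12, 0, 48, 0], which sums to 60 = 12 × 5.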

Dot Diagram: Large multiplications are illustrated using dot diagrams, in which each dot represents a bit.

Multiplier design: The design depends on latency, throughput, energy, area, and design complexity. Simple approach: use an (M + 1)-bit carry-propagate adder (CPA) to add the first two partial products, then another CPA to add the third partial product to the running sum, and so forth. This requires N - 1 CPAs and is slow. Better approach: use arrays.

Array Multiplier: Fast multipliers use carry-save adders (CSAs). The example is a 4 × 4 array multiplier for unsigned numbers built from an array of CSAs. Each cell contains a 2-input AND gate and a full adder (CSA). The first row converts the first PPs into carry-save (CS) form. Each other row uses CSAs to add a PP to the CS result of the previous row and generates a new CS result. The N LSB outputs come directly from the sum outputs of the CSAs. The MSB output bits arrive in CS form and require an M-bit carry-propagate adder (CPA) to convert them into binary form.

Rectangular Array: Squash the array to fit a rectangular floorplan (the parallelogram shape wastes space).

Improvement: The first row of CSAs adds the first partial product to a pair of 0s (Sin = 0 and Cin = 0). This leads to a regular structure but is inefficient. Instead, the first row of CSAs can be used to add the first three partial products together. This reduces the number of rows by two and reduces the adder propagation delay. Another improvement: replace the bottom row with a faster CPA, such as a lookahead or tree adder. The critical path of an array multiplier involves N - 2 CSAs and a CPA.

Two’s Complement Array Multiplication: The MSB of a two’s complement number has a negative weight. Two of the PPs therefore have negative weight and should be subtracted. Subtraction is handled by taking the two’s complement of the terms to be subtracted, i.e., inverting the bits and adding one.
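The negative-weight terms can be made concrete by summing the bit products with explicit signs. This sketch works at the level of individual bit products rather than modeling the array hardware, and the function name is hypothetical.

```python
def signed_product_by_weights(x: int, y: int, n: int) -> int:
    """Multiply two n-bit two's complement values by summing x_i * y_j * 2**(i+j).

    Terms pairing exactly one MSB (negative weight) with a non-MSB bit are
    subtracted; the MSB-by-MSB term is positive (negative times negative).
    """
    # For Python ints, (v >> i) & 1 yields the two's complement bit i.
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    msb = n - 1
    total = 0
    for i in range(n):
        for j in range(n):
            term = (xb[i] & yb[j]) << (i + j)
            if (i == msb) != (j == msb):   # exactly one factor is an MSB
                total -= term
            else:
                total += term
    return total
```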

Baugh-Wooley multiplier algorithm: The upper parallelogram: multiplication of all but the MSBs of the inputs. Next row: the product of the MSBs (X5Y5). Next two pairs of rows: the inversions of the terms to be subtracted. Each term has implicit leading and trailing zeros, which are inverted to leading and trailing ones. Extra ones must be added in the least significant column when taking the two’s complement.

Modified Baugh-Wooley multiplier: Reduces the number of partial products by precomputing the sums of the constant ones and pushing some of the terms upward into extra columns.

Squashed into a rectangle: Almost identical to the unsigned multiplier. AND gates are replaced by NAND gates in the hatched cells, and 1s are added in place of 0s at two of the unused inputs. Because the signed and unsigned arrays are so similar, a single array can be used for both: XOR gates conditionally invert some of the terms depending on the mode.

Fewer Partial Products: An array multiplier requires N partial products. If we looked at groups of r bits, we could form N/r partial products. Faster and smaller? This is called radix-2^r encoding. Ex: r = 2: look at pairs of bits and form partial products of: 00 → 0; 01 → Y; 10 → 2Y (simple shift); 11 → 3Y (hard multiple, requiring a slow carry-propagate addition of Y + 2Y). The first three are easy, but 3Y requires an adder.

Booth Encoding: Observe that 3Y = 4Y - Y and 2Y = 4Y - 2Y. 4Y in a radix-4 multiplier array is equivalent to Y in the next row.

Booth Encoding: Instead of 3Y, try -Y, then increment the next partial product to add 4Y. Similarly, for 2Y, try -2Y + 4Y in the next partial product.
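The recoding can be sketched with the common radix-4 (modified Booth) rule, in which each digit is x[2i] + x[2i-1] - 2·x[2i+1]. That rule is an assumption here, since the slide’s encoding table is not in the transcript, and the function names are hypothetical. The model takes an unsigned n-bit multiplier with n even, and uses one extra digit to absorb the final 4Y carry.

```python
def booth_radix4_digits(x: int, n: int) -> list[int]:
    """Recode an unsigned n-bit multiplier (n even) into radix-4 digits
    in {-2, -1, 0, 1, 2}, least significant digit first."""
    digits = []
    prev = 0                              # x[-1] is taken as 0
    for i in range(0, n + 2, 2):          # extra group covers leading zeros
        b0 = (x >> i) & 1
        b1 = (x >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(y: int, x: int, n: int) -> int:
    """Sum the Booth partial products d_i * Y * 4**i."""
    return sum(d * (y << (2 * i))
               for i, d in enumerate(booth_radix4_digits(x, n)))
```

For x = 3 the digits are [-1, 1], i.e., 3Y = 4Y - Y. For x = 2 they are [-2, 1], i.e., 2Y = 4Y - 2Y.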

Example

Booth Hardware: A Booth encoder generates control lines for each PP. Booth selectors choose the PP bits.

Sign Extension: Partial products can be negative, so they require sign extension, which is cumbersome and puts a high fanout on the most significant bit.

Simplified Sign Ext.: The sign bits are either all 0’s or all 1’s. Note that all 0’s is all 1’s + 1 in the proper column. Use this to reduce the loading on the MSB.

Even Simpler Sign Ext.: There is no need to add all the 1’s in hardware. Precompute the answer!

Advanced Multiplication: Signed vs. unsigned inputs; higher-radix Booth encoding; array vs. tree CSA networks.