Caltech CS184 Winter 2003 -- DeHon
CS184a: Computer Architecture (Structure and Organization)
Day 3: January 13, 2003
Arithmetic and Pipelining



Last Time
Boolean logic ⇒ computing any finite function
Sequential logic ⇒ computing any finite automaton
  –included some functions of unbounded size
Saw gates and registers
  –…and a few properties of logic

Today
Addition
  –organization
  –design space
  –area, time
Pipelining
Temporal Reuse
  –area-time tradeoffs

Example: Bit Level Addition
Addition (everyone knows how to do addition base 2, right?)
[Animated worked example: carry row C, operands A and B, and sum S, built up one bit at a time from the LSB. The partial sums shown are S: 0, 10, 110, 0110; the operand digits did not survive in this transcript.]

Addition Base 2
A = a_{n-1}*2^{n-1} + a_{n-2}*2^{n-2} + ... + a_1*2^1 + a_0*2^0 = \sum_i (a_i*2^i)
S = A + B
s_i = (xor carry_i (xor a_i b_i))
carry_i = ((a_{i-1} + b_{i-1} + carry_{i-1}) \geq 2)
        = (or (and a_{i-1} b_{i-1}) (and a_{i-1} carry_{i-1}) (and b_{i-1} carry_{i-1}))

Adder Bit
S = (xor a b carry)
t = (xor2 a b); s = (xor2 t carry)
xor2 = (and2 (not (and2 a b)) (not (and2 (not a) (not b))))
carry = (not (and2 (not (and2 a b)) (and2 (not (and2 b carry)) (not (and2 a carry)))))
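A minimal sketch of the adder bit built only from two-input AND and NOT, following the gate expressions on this slide. The Python names (`and2`, `not1`, `xor2`, `full_adder`) are illustrative, and bits are assumed to be the integers 0 and 1.

```python
def and2(a, b):
    """Two-input AND on single bits (0 or 1)."""
    return a & b


def not1(a):
    """Inverter on a single bit."""
    return a ^ 1


def xor2(a, b):
    """XOR built from and2/not: true unless both inputs are 1 or both are 0."""
    return and2(not1(and2(a, b)), not1(and2(not1(a), not1(b))))


def full_adder(a, b, carry):
    """One adder bit: returns (sum bit, carry out)."""
    t = xor2(a, b)
    s = xor2(t, carry)
    # Carry out is the majority of (a, b, carry), written with and2/not only.
    carry_out = not1(and2(not1(and2(a, b)),
                          and2(not1(and2(b, carry)), not1(and2(a, carry)))))
    return s, carry_out
```

Checking all eight input combinations against integer addition is a quick way to confirm the gate expressions.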

Ripple Carry Addition
Have shown the operation of each bit
Often convenient to define logic for each bit, then assemble:
  –bit slice

Ripple Carry Analysis
Area: O(n) [6n]
Delay: O(n) [2n]
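The bit-slice assembly can be sketched as a loop that chains one full adder per bit position. This is a behavioural model, not a gate netlist; the helper names and the LSB-first list representation are assumptions made for the example.

```python
def full_adder(a, b, carry):
    """One bit slice: returns (sum bit, carry out)."""
    s = a ^ b ^ carry
    carry_out = (a & b) | (a & carry) | (b & carry)
    return s, carry_out


def ripple_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists (LSB first); the carry ripples upward."""
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(x, n):
    """n-bit LSB-first bit list of a non-negative integer."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))
```

The loop runs once per bit and each iteration depends on the previous carry, which is where the O(n) delay comes from.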

Can we do better?

Important Observation
Do we have to wait for the carry to show up to begin doing useful work?
  –We do have to know the carry to get the right answer.
  –But, it can only take on two values

Idea
Compute both possible values and select correct result when we know the answer

Preliminary Analysis
DRA--Delay Ripple Adder
  DRA(n) = k*n
  DRA(n) = 2*DRA(n/2)
DP2A--Delay Predictive Adder
  DP2A = DRA(n/2) + D(mux2)
…almost half the delay!
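A sketch of the "compute both and select" idea: the upper half is added twice, once for each possible carry in, and the carry out of the lower half picks the result. The function names and the integer operand representation are assumptions for illustration.

```python
def ripple(a, b, carry_in, n):
    """n-bit ripple add on integers; returns (n-bit sum, carry out)."""
    carry, total = carry_in, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return total, carry


def predictive_add(a, b, n):
    """Split an n-bit add in half; precompute the upper half for carry 0 and 1."""
    h = n // 2
    low_mask = (1 << h) - 1
    lo, carry_lo = ripple(a & low_mask, b & low_mask, 0, h)
    # These two upper-half adds do not depend on carry_lo, so in hardware
    # they run in parallel with the lower half.
    hi0, cout0 = ripple(a >> h, b >> h, 0, n - h)
    hi1, cout1 = ripple(a >> h, b >> h, 1, n - h)
    # The mux2: select the precomputed result once the real carry is known.
    hi, carry_out = (hi1, cout1) if carry_lo else (hi0, cout0)
    return lo | (hi << h), carry_out
```

The cost is roughly 1.5 times the adder hardware plus the multiplexers, in exchange for the shorter critical path.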

Recurse
If something works once, do it again.
Use the predictive adder to implement the first half of the addition

Recurse
Redundant (can share)

Recurse
If something works once, do it again.
Use the predictive adder to implement the first half of the addition
DP4A(n) = DRA(n/4) + D(mux2) + D(mux2)
DP4A(n) = DRA(n/4) + 2*D(mux2)

Recurse
By now we realize we’ve been using the wrong recursion
  –should be using the DPA in the recursion
DPA(n) = DPA(n/2) + D(mux2)
DPA(n) = log_2(n)*D(mux2) + C
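The delay recurrences can be evaluated directly to see how quickly the gap opens up. The unit costs are assumptions: k = 2 gate delays per ripple bit (the [2n] figure from the ripple analysis), one delay unit per mux2, and a 1-bit base case costing the constant C = 2.

```python
def delay_ripple(n, k=2):
    """DRA(n) = k*n."""
    return k * n


def delay_dp2a(n, k=2, d_mux=1):
    """DP2A(n) = DRA(n/2) + D(mux2): one level of prediction."""
    return delay_ripple(n // 2, k) + d_mux


def delay_dpa(n, d_mux=1, base=2):
    """DPA(n) = DPA(n/2) + D(mux2), with DPA(1) = base (the constant C)."""
    if n == 1:
        return base
    return delay_dpa(n // 2, d_mux, base) + d_mux


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, delay_ripple(n), delay_dp2a(n), delay_dpa(n))
```

For power-of-two n the recursion unrolls to log_2(n)*D(mux2) + C, matching the closed form on the slide.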

Resulting RPA
[and a few more optimizations]

RPA Analysis
Delay: O(log(n))
Area: O(n)
  –maybe n log(n) when wiring is considered...
bounded fanout

Constructive RPA
Each block (I,J) may
  –propagate or squash a carry in
  –generate a carry out
  –can compute PG(I,J) in terms of PG(I,K) and PG(K,J) (I<K<J)
PG(I,J) + carry(I)
  –is enough to calculate carry(J)
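A sketch of the block propagate/generate composition. Each block is summarised as a (generate, propagate) pair; two adjacent blocks combine into one, and the pair plus carry(I) yields carry(J). The function names and the half-open bit range [i, j) are assumptions for the example.

```python
def bit_pg(a, b):
    """(generate, propagate) for a single bit position."""
    return a & b, a ^ b


def combine(low, high):
    """PG(I,J) from PG(I,K) (low block) and PG(K,J) (high block)."""
    g_low, p_low = low
    g_high, p_high = high
    # The high block generates on its own, or passes on a carry the low block generated.
    return g_high | (p_high & g_low), p_high & p_low


def block_pg(a_bits, b_bits, i, j):
    """PG over bits [i, j) of LSB-first bit lists, combined as a binary tree."""
    if j - i == 1:
        return bit_pg(a_bits[i], b_bits[i])
    k = (i + j) // 2
    return combine(block_pg(a_bits, b_bits, i, k), block_pg(a_bits, b_bits, k, j))


def carry_out(pg, carry_in):
    """carry(J) from PG(I,J) and carry(I)."""
    g, p = pg
    return g | (p & carry_in)


def to_bits(x, n):
    """n-bit LSB-first bit list of a non-negative integer."""
    return [(x >> i) & 1 for i in range(n)]
```

Because `combine` is associative, the tree has depth log2(n), which is the source of the logarithmic delay.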

Resulting RPA

Note: Constants Matter
Watch the constants
Asymptotically this RPA is great
For small adders can be smaller with
  –fast ripple carry
  –larger combining than 2-ary tree
  –mix of techniques
…will depend on the technology primitives and cost functions

Two’s Complement
Everyone seemed to know Two’s complement
2’s complement:
  –positive numbers in binary
  –negative numbers: subtract 1 and invert (or invert and add 1)

Two’s Complement
 2 = 010
 1 = 001
 0 = 000
-1 = 111
-2 = 110
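A small sketch of the encoding and of "invert and add 1", using 3-bit words as in the slides. The helper names and the bit-string representation are assumptions for the example.

```python
def to_twos(x, n=3):
    """n-bit two's complement bit string of a signed integer."""
    return format(x & ((1 << n) - 1), f"0{n}b")


def negate(bits):
    """Negate a two's complement bit string: invert every bit, then add 1."""
    n = len(bits)
    mask = (1 << n) - 1
    inverted = int(bits, 2) ^ mask
    # Masking drops the carry out of the top bit (e.g. when negating 000).
    return format((inverted + 1) & mask, f"0{n}b")
```

For example, `to_twos(-2)` gives `"110"` and `negate("001")` gives `"111"`.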

Addition of Negative Numbers?
…just works
A: 111   A: 110   A: 111   A: 111
B: 001   B: 001   B: 010   B: 110
S: 000   S: 111   S: 001   S: 101

Subtraction
Negate the subtracted input and use adder
  –which is: invert input and add 1
works for both positive and negative input
  –001 → 110 + 1 = 111
  –111 → 000 + 1 = 001
  –000 → 111 + 1 = 000
  –010 → 101 + 1 = 110
  –110 → 001 + 1 = 010

Subtraction (add/sub)
Note: you can use the “unused” carry input at the LSB to perform the “add 1”
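A sketch of the combined add/sub unit: a single control bit inverts the second operand and also drives the LSB carry input, which supplies the "add 1". The name `add_sub`, the integer operands, and the 3-bit default width are assumptions for the example.

```python
def add_sub(a, b, subtract, n=3):
    """n-bit add (subtract=False) or subtract (subtract=True) on one adder."""
    mask = (1 << n) - 1
    # In hardware: XOR every bit of b with the subtract control line.
    b_in = (b ^ mask) if subtract else b
    # The same control line feeds the otherwise unused carry in at the LSB.
    carry_in = 1 if subtract else 0
    return ((a & mask) + (b_in & mask) + carry_in) & mask
```

With 3-bit words, `add_sub(0b001, 0b010, True)` returns `0b111`, that is, 1 - 2 = -1.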

Overflow?
Overflow = (A.s == B.s) * (A.s != S.s)
A: 111   A: 110   A: 111   A: 111   A: 001   A: 011   A: 111
B: 001   B: 001   B: 010   B: 110   B: 001   B: 001   B: 100
S: 000   S: 111   S: 001   S: 101   S: 010   S: 100   S: 011
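The overflow rule can be checked against the slide's examples with a short sketch. Here `.s` is read as the sign (top) bit; the function name and the 3-bit default width are assumptions for the example.

```python
def add_with_overflow(a, b, n=3):
    """n-bit two's complement add; returns (sum, overflow flag)."""
    mask = (1 << n) - 1
    s = (a + b) & mask

    def sign(v):
        # Top bit of an n-bit word.
        return (v >> (n - 1)) & 1

    # Overflow only when both inputs share a sign and the sum's sign differs.
    overflow = (sign(a) == sign(b)) and (sign(a) != sign(s))
    return s, overflow
```

The last two columns of the slide (011 + 001 and 111 + 100) are the ones that set the flag.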

Reuse

Reuse
In general, we want to reuse our components in time
  –not disposable logic
How do we do that?
  –Wait until done, someone’s used output

Reuse: “Waiting” Discipline
Use registers and timing (or acknowledgements) for orderly progression of data

Example: 4b Ripple Adder
Recall 2 gates/FA
Latency: 8 gates to S3
Throughput: 1 result / 8 gate delays max

Can we do better?

Stagger Inputs
Correct if expecting A,B[3:2] to be staggered one cycle behind A,B[1:0]
…and succeeding stage expects S[3:2] staggered from S[1:0]

Align Data / Balance Paths
Good discipline to line up pipe stages in diagrams.

Example: 4b RA pipe 2
Recall 2 gates/FA
Latency: 8 gates to S3
Throughput: 1 result / 4 gate delays max
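The latency and throughput figures for the unpipelined and two-stage adders follow from a simple model, sketched here. It assumes evenly balanced stages and ignores register overhead; the function name is illustrative.

```python
def pipeline_timing(n_bits, stages, gates_per_fa=2):
    """Return (latency, cycle) in gate delays for a pipelined ripple adder.

    Assumes the full adders divide evenly among the stages and that
    registers add no delay of their own.
    """
    if n_bits % stages != 0:
        raise ValueError("stages must divide n_bits evenly in this model")
    latency = n_bits * gates_per_fa   # total logic depth is unchanged
    cycle = latency // stages         # one new result every `cycle` gate delays
    return latency, cycle
```

`pipeline_timing(4, 1)` gives (8, 8) and `pipeline_timing(4, 2)` gives (8, 4), matching the two examples: pipelining leaves latency alone and raises throughput.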

Deeper?
Can we do it again?
What’s our limit?
Why would we stop?

More Reuse
Saw that we could pipeline and reuse the FA more frequently
Suggests we’re wasting the FA part of the time in the non-pipelined design

More Reuse (cont.)
If we’re willing to take 8 gate-delay units, do we need 4 FAs?

Ripple Add (pipe view)
Can pipeline to FA.
If we don’t need the throughput, reuse FA on SAME addition.

Bit Serial Addition
Assumes LSB first ordering of input data.
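A behavioural sketch of bit-serial addition: one full adder plus a carry register, reused on successive bits of the same addition. The class and method names are assumptions for the example, and the carry register is assumed to be cleared before the LSB arrives.

```python
class BitSerialAdder:
    """One full adder and one carry register, clocked once per bit."""

    def __init__(self):
        self.carry = 0  # carry register, cleared before each new addition

    def clock(self, a, b):
        """Consume one bit of each operand (LSB first); return one sum bit."""
        s = a ^ b ^ self.carry
        self.carry = (a & b) | (self.carry & (a ^ b))
        return s


def serial_add(a_bits, b_bits):
    """Add two LSB-first bit lists one bit per cycle; returns (sum bits, final carry)."""
    adder = BitSerialAdder()
    sum_bits = [adder.clock(a, b) for a, b in zip(a_bits, b_bits)]
    return sum_bits, adder.carry
```

For 6 + 7 on four bits, `serial_add([0, 1, 1, 0], [1, 1, 1, 0])` returns `([1, 0, 1, 1], 0)`, which is 13 read LSB first.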

Bit Serial Addition: Pipelining
Latency: 8 gate delays
Throughput: 1 result / 10 gate delays
Can squash Cout[3] and do in 1 result / 8 gate delays
registers do have time overhead
  –setup, hold time, clock jitter

Multiplication
Can be defined in terms of addition
Ask you to play with implementations and tradeoffs in homework 2

Compute Function
Compute: y = A*x^2 + B*x + C
Assume
  –D(Mpy) > D(Add)
  –A(Mpy) > A(Add)

Spatial Quadratic
D(Quad) = 2*D(Mpy) + D(Add)
Throughput 1/(2*D(Mpy) + D(Add))
A(Quad) = 3*A(Mpy) + 2*A(Add)

Pipelined Spatial Quadratic
D(Quad) = 3*D(Mpy)
Throughput 1/D(Mpy)
A(Quad) = 3*A(Mpy) + 2*A(Add) + 6*A(Reg)

Bit Serial Quadratic
data width w; one bit per cycle
roughly 1/w-th the area of pipelined spatial
roughly 1/w-th the throughput
latency just a little larger than pipelined

Quadratic with Single Multiplier and Adder?
We’ve seen reuse to perform the same operation
  –pipelining
  –bit-serial, homogeneous datapath
We can also reuse a resource in time to perform a different role.
  –Here: x*x, A*(x*x), B*x
  –also: (Bx)+C, (A*x*x)+(Bx+C)

Quadratic Datapath
Start with one of each operation
(alternatives where build multiply from adds…e.g. homework)

Quadratic Datapath
Multiplier serves multiple roles
  –x*x
  –A*(x*x)
  –B*x
Will need to be able to steer data (switch interconnections)

Quadratic Datapath
Multiplier serves multiple roles
  –x*x
  –A*(x*x)
  –B*x
Multiplier input selections: x, x*x on one input; x, A, B on the other

Quadratic Datapath
Adder serves multiple roles
  –(Bx)+C
  –(A*x*x)+(Bx+C)
one input is always the mpy output; the other selects C or Bx+C

Quadratic Datapath

Quadratic Datapath
Add input register for x

Quadratic Control
Now, we just need to control the datapath
Control:
  –LD x
  –LD x*x
  –MA Select
  –MB Select
  –AB Select
  –LD Bx+C
  –LD Y

FSMD
FSMD = FSM + Datapath
Stylization for building controlled datapaths such as this
Of course, an FSMD is just an FSM
  –it’s often easier to think about as a datapath
  –synthesis, AP&R tools have been notoriously bad about discovering/exploiting datapath structure

Quadratic FSMD

Quadratic FSMD Control
S0: if (go) LD_X; goto S1
  –else goto S0
S1: MA_SEL=x, MB_SEL[1:0]=x, LD_x*x
  –goto S2
S2: MA_SEL=x, MB_SEL[1:0]=B
  –goto S3
S3: AB_SEL=C, MA_SEL=x*x, MB_SEL=A
  –goto S4
S4: AB_SEL=Bx+C, LD_Y
  –goto S0
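A behavioural sketch of an FSMD for the quadratic, with one multiplier and one adder shared across states. It is a simplified schedule and not the slide's exact control: it assumes a combinational multiplier, so Bx+C is loaded in the same state that computes B*x and the whole computation takes four states instead of five. Register and state names are illustrative.

```python
def quadratic_fsmd(x_in, A, B, C):
    """Compute A*x^2 + B*x + C with one multiplier and one adder; returns (y, cycles)."""
    regs = {"x": 0, "xx": 0, "bxc": 0, "y": 0}
    state, cycles = "S0", 0
    while True:
        cycles += 1
        if state == "S0":                  # LD x (assumes `go` is asserted)
            regs["x"] = x_in
            state = "S1"
        elif state == "S1":                # multiplier inputs: x, x
            mpy = regs["x"] * regs["x"]
            regs["xx"] = mpy               # LD x*x
            state = "S2"
        elif state == "S2":                # multiplier inputs: x, B; adder input: C
            mpy = regs["x"] * B
            regs["bxc"] = mpy + C          # LD Bx+C
            state = "S3"
        else:                              # S3: multiplier inputs: x*x, A; adder input: Bx+C
            mpy = regs["xx"] * A
            regs["y"] = mpy + regs["bxc"]  # LD Y
            return regs["y"], cycles
```

Each state only changes the mux selections and register load enables; the multiplier and adder themselves never change, which is the point of the FSMD style.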


Quadratic FSM
Latency: 5*(D(Mpy) + D(mux3))
Throughput: 1/Latency
Area: A(Mpy) + A(Add) + 5*A(Reg) + 2*A(Mux2) + A(Mux3) + A(QFSM)
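To compare the three quadratic implementations side by side, the area and delay formulas from the slides can be evaluated for a set of unit costs. The numeric costs in the usage note are made up for illustration, and the function and key names are assumptions.

```python
def quadratic_costs(d_mpy, d_add, d_mux3, a_mpy, a_add, a_reg, a_mux2, a_mux3, a_fsm):
    """Latency, initiation interval, and area for each design, per the slide formulas."""
    spatial_delay = 2 * d_mpy + d_add
    fsmd_delay = 5 * (d_mpy + d_mux3)
    return {
        "spatial": {
            "latency": spatial_delay,
            "interval": spatial_delay,          # throughput = 1/interval
            "area": 3 * a_mpy + 2 * a_add,
        },
        "pipelined": {
            "latency": 3 * d_mpy,
            "interval": d_mpy,
            "area": 3 * a_mpy + 2 * a_add + 6 * a_reg,
        },
        "fsmd": {
            "latency": fsmd_delay,
            "interval": fsmd_delay,
            "area": a_mpy + a_add + 5 * a_reg + 2 * a_mux2 + a_mux3 + a_fsm,
        },
    }
```

With hypothetical costs D(Mpy)=10, D(Add)=4, D(mux3)=1, A(Mpy)=100, A(Add)=10, A(Reg)=2, A(Mux2)=1, A(Mux3)=2, A(QFSM)=5, the FSMD design comes out at area 129 against 320 for the spatial one, at the price of latency 55 against 24.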

Big Ideas [MSB Ideas]
Can build arithmetic out of logic
Pipelining:
  –increases parallelism
  –allows reuse in time (same function)
Control and Sequencing
  –reuse in time for different functions
Can trade off Area and Time

Big Ideas [MSB-1 Ideas]
Area-Time Tradeoff in Adders
FSMD control style