Penn ESE680-002 Spring2007 -- DeHon 1 ESE680-002: Computer Organization Day 3: January 17, 2007 Arithmetic and Pipelining.

Slides:



Advertisements
Similar presentations
Introduction So far, we have studied the basic skills of designing combinational and sequential logic using schematic and Verilog-HDL Now, we are going.
Advertisements

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 14: March 19, 2014 Compute 2: Cascades, ALUs, PLAs.
Chapter 4 -- Modular Combinational Logic. Decoders.
L10 – Transistors Logic Math 1 Comp 411 – Spring /22/07 Arithmetic Circuits Didn’t I learn how to do addition in the second grade?
Comparator.
Datorteknik ArithmeticCircuits bild 1 Computer arithmetic Somet things you should know about digital arithmetic: Principles Architecture Design.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 24: November 4, 2011 Synchronous Circuits.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day3: October 2, 2000 Arithmetic and Pipelining.
Fast Adders See: P&H Chapter 3.1-3, C Goals: serial to parallel conversion time vs. space tradeoffs design choices.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
1 CS 140 Lecture 14 Standard Combinational Modules Professor CK Cheng CSE Dept. UC San Diego Some slides from Harris and Harris.
EECS Components and Design Techniques for Digital Systems Lec 18 – Arithmetic II (Multiplication) David Culler Electrical Engineering and Computer.
Chapter 6 Arithmetic. Addition Carry in Carry out
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 3: January 26, 2009 Scheduled Operator Sharing.
ECE C03 Lecture 61 Lecture 6 Arithmetic Logic Circuits Hai Zhou ECE 303 Advanced Digital Design Spring 2002.
Chapter # 5: Arithmetic Circuits Contemporary Logic Design Randy H
Lecture 8 Arithmetic Logic Circuits
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 5: January 24, 2007 ALUs, Virtualization…
Chapter 5 Arithmetic Logic Functions. Page 2 This Chapter..  We will be looking at multi-valued arithmetic and logic functions  Bitwise AND, OR, EXOR,
Lecture # 12 University of Tehran
1 Bits are just bits (no inherent meaning) — conventions define relationship between bits and numbers Binary numbers (base 2)
CALTECH CS137 Winter DeHon 1 CS137: Electronic Design Automation Day 9: January 30, 2006 Parallel Prefix.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 9: February 24, 2014 Operator Sharing, Virtualization, Programmable Architectures.
ECE2030 Introduction to Computer Engineering Lecture 12: Building Blocks for Combinational Logic (3) Adders/Subtractors, Parity Checkers Prof. Hsien-Hsin.
IKI a-Combinatorial Components Bobby Nazief Semester-I The materials on these slides are adopted from those in CS231’s Lecture Notes.
Chapter # 5: Arithmetic Circuits
Chapter 6-1 ALU, Adder and Subtractor
5-1 Programmable and Steering Logic Chapter # 5: Arithmetic Circuits.
Csci 136 Computer Architecture II – Constructing An Arithmetic Logic Unit Xiuzhen Cheng
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 7: February 9, 2015 Scheduled Operator Sharing.
EECS Components and Design Techniques for Digital Systems Lec 16 – Arithmetic II (Multiplication) David Culler Electrical Engineering and Computer.
EKT 221/4 DIGITAL ELECTRONICS II  Registers, Micro-operations and Implementations - Part3.
Important Components, Blocks and Methodologies. To remember 1.EXORS 2.Counters and Generalized Counters 3.State Machines (Moore, Mealy, Rabin-Scott) 4.Controllers.
Nov 10, 2008ECE 561 Lecture 151 Adders. Nov 10, 2008ECE 561 Lecture 152 Adders Basic Ripple Adders Faster Adders Sequential Adders.
Computing Systems Designing a basic ALU.
درس مدارهای منطقی دانشگاه قم مدارهای منطقی محاسباتی تهیه شده توسط حسین امیرخانی مبتنی بر اسلایدهای درس مدارهای منطقی دانشگاه.
Half Adder & Full Adder Patrick Marshall. Intro Adding binary digits Half adder Full adder Parallel adder (ripple carry) Arithmetic overflow.
1 Lecture 12 Time/space trade offs Adders. 2 Time vs. speed: Linear chain 8-input OR function with 2-input gates Gates: 7 Max delay: 7.
EE2174: Digital Logic and Lab Professor Shiyan Hu Department of Electrical and Computer Engineering Michigan Technological University CHAPTER 8 Arithmetic.
Cost/Performance Tradeoffs: a case study
1 Arithmetic I Instructor: Mozafar Bag-Mohammadi Ilam University.
COMP541 Arithmetic Circuits
Topics covered: Arithmetic CSE243: Introduction to Computer Architecture and Hardware/Software Interface.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 3: February 3, 2014 Arithmetic Work preclass exercise.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 3: January 10, 2005 Arithmetic and Pipelining.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 4: January 25, 2012 Sequential Logic (FSMs, Pipelining, FSMD)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 3: January 16, 2013 Scheduled Operator Sharing.
CS 151: Digital Design Chapter 4: Arithmetic Functions and Circuits
CSE 311 Foundations of Computing I Lecture 25 Circuits for FSMs, Carry-Look-Ahead Adders Autumn 2011 CSE 3111.
ECE/CS 552: Arithmetic I Instructor:Mikko H Lipasti Fall 2010 University of Wisconsin-Madison Lecture notes partially based on set created by Mark Hill.
Lecture #23: Arithmetic Circuits-1 Arithmetic Circuits (Part I) Randy H. Katz University of California, Berkeley Fall 2005.
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 3: January 25, 2010 Arithmetic Work preclass exercise.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 4: January 15, 2003 Memories, ALUs, Virtualization.
MicroProcessors Lec. 4 Dr. Tamer Samy Gaafar. Course Web Page —
Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 4: January 27, 2010 Sequential Logic (FSMs, Pipelining, FSMD)
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 24: November 5, 2012 Synchronous Circuits.
Caltech CS184 Winter DeHon CS184a: Computer Architecture (Structure and Organization) Day 3: January 13, 2003 Arithmetic and Pipelining.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 20: October 25, 2010 Pass Transistors.
CS2100 Computer Organisation
ESE534: Computer Organization
Computer Organization and Design Arithmetic & Logic Circuits
CSE Winter 2001 – Arithmetic Unit - 1
Arithmetic Circuits (Part I) Randy H
ESE534: Computer Organization
Arithmetic and Decisions
Instructor: Mozafar Bag-Mohammadi University of Ilam
ESE534: Computer Organization
ESE534: Computer Organization
ESE535: Electronic Design Automation
Presentation transcript:

Penn ESE Spring DeHon 1 ESE : Computer Organization Day 3: January 17, 2007 Arithmetic and Pipelining

Penn ESE Spring DeHon 2 Last Time Boolean logic  computing any finite function Sequential logic  computing any finite automata –included some functions of unbounded size Saw gates and registers –…and a few properties of logic

Penn ESE Spring DeHon 3 Today Addition –organization –design space –area, time Pipelining Temporal Reuse –area-time tradeoffs

Penn ESE Spring DeHon 4 Why? Start getting a handle on –Complexity Area and time Area-time tradeoffs –Parallelism –Regularity Arithmetic underlies much computation –grounds out complexity

Penn ESE Spring DeHon 5 C: 00 A: B: S: 0 C: 000 A: B: S: 10 C: 0000 A: B: S: 110 C: A: B: S: 0110 C: A: B: S: C: A: B: S: C: A: B: S: C: A: B: S: C: A: B: S: C: A: B: S: C: A: B: S: Example: Bit Level Addition Addition – (everyone knows how to do addition base 2, right?) A: B: S: C: 0 A: B: S:

Penn ESE Spring DeHon 6 Addition Base 2 A = a n-1 *2 (n-1) +a n-2 *2 (n-2) +... a 1 *2 1 + a 0 *2 0 =  (a i *2 i ) S=A+B What is the function for s i … carry i ? s i =(xor carry i (xor a i b i )) carry i = ( a i-1 + b i-1 + carry i-1 )  2 = (or (and a i-1 b i-1 ) (and a i-1 carry i-1 ) (and b i-1 carry i-1 ))

Penn ESE Spring DeHon 7 Adder Bit S=(xor a b carry) t=(xor2 a b); s=(xor2 t carry) xor2 = (and (not (and2 a b) (not (and2 (not a) (not b))) carry = (not (and2 (not (and2 a b)) (and2 (not (and2 b carry)) (not (and2 a carry)))))

Penn ESE Spring DeHon 8 Ripple Carry Addition Shown operation of each bit Often convenient to define logic for each bit, then assemble: –bit slice

Penn ESE Spring DeHon 9 Ripple Carry Analysis Area: O(N) [6n] Delay: O(N) [n+3] What is area and delay?

Penn ESE Spring DeHon 10 Can we do better?

Penn ESE Spring DeHon 11 Important Observation Do we have to wait for the carry to show up to begin doing useful work? –We do have to know the carry to get the right answer. –But, it can only take on two values

Penn ESE Spring DeHon 12 Idea Compute both possible values and select correct result when we know the answer

Penn ESE Spring DeHon 13 Preliminary Analysis Delay(RA) --Delay Ripple Adder Delay(RA(n)) = k*n Delay(RA(n)) = 2*(k*n/2)=2*DRA(n/2) Delay(P2A) -- Delay Predictive Adder Delay(P2A)=DRA(n/2)+D(mux2) …almost half the delay!

Penn ESE Spring DeHon 14 Recurse If something works once, do it again. Use the predictive adder to implement the first half of the addition

Penn ESE Spring DeHon 15 Recurse Redundant (can share)

Penn ESE Spring DeHon 16 Recurse If something works once, do it again. Use the predictive adder to implement the first half of the addition Delay(P4A(n))= Delay(RA(n/4)) + D(mux2) + D(mux2) Delay(P4A(n))=Delay(RA(n/4))+2*D(mux2)

Penn ESE Spring DeHon 17 Recurse By know we realize we’ve been using the wrong recursion –should be using the Predictive Adder in the recursion Delay(PA(n)) = Delay(PA(n/2)) + D(mux2) Every time cut in half…? How many times cut in half? Delay(PA(n))=log 2 (n)*D(mux2)+C

Penn ESE Spring DeHon 18 Another Way (Parallel Prefix)

Penn ESE Spring DeHon 19 CLA Think about each adder bit as a computing a function on the carry in –C[i]=g(c[i-1]) –Particular function f will depend on a[i], b[i] –G=f(a,b)

Penn ESE Spring DeHon 20 Functions What functions can g(c[i-1]) be? –g(x)=1 a[i]=b[i]=1 –g(x)=x a[i] xor b[i]=1 –g(x)=0 a[i]=b[i]=0

Penn ESE Spring DeHon 21 Functions What functions can g(c[i-1]) be? –g(x)=1 Generate a[i]=b[i]=1 –g(x)=x Propagate a[i] xor b[i]=1 –g(x)=0 Squash a[i]=b[i]=0

Penn ESE Spring DeHon 22 Combining Want to combine functions –Compute c[i]=g i (g i-1 (c[i-2])) –Compute compose of two functions What functions will the compose of two of these functions be? –Same as before Propagate, generate, squash

Penn ESE Spring DeHon 23 Compose Rules (LSB MSB) GG GP GS PG PP PS SG SP SS

Penn ESE Spring DeHon 24 Compose Rules (LSB MSB) GG = G GP = G GS = S PG = G PP = P PS = S SG = G SP = S SS = S

Penn ESE Spring DeHon 25 Combining Do it again… Combine g[i-3,i-2] and g[i-1,i] What do we get?

Penn ESE Spring DeHon 26 Reduce Tree

Penn ESE Spring DeHon 27 Prefix Tree

Penn ESE Spring DeHon 28 Parallel Prefix Important Pattern Applicable any time operation is associative Examples of associative functions? –Non-associative? Function Composition is always associative

Penn ESE Spring DeHon 29 Note: Constants Matter Watch the constants Asymptotically this Carry-Lookahead Adder (CLA) is great For small adders can be smaller with –fast ripple carry –larger combining than 2-ary tree –mix of techniques …will depend on the technology primitives and cost functions

Penn ESE Spring DeHon 30 Two’s Complement Everyone seemed to know Two’s complement 2’s complement: –positive numbers in binary –negative numbers subtract 1 and invert (or invert and add 1)

Penn ESE Spring DeHon 31 Two’s Complement 2 = = = = = 110

Penn ESE Spring DeHon 32 Addition of Negative Numbers? …just works A: 111 B: 001 S: 000 A: 110 B: 001 S: 111 A: 111 B: 010 S: 001 A: 111 B: 110 S: 101

Penn ESE Spring DeHon 33 Subtraction Negate the subtracted input and use adder –which is: invert input and add 1 works for both positive and negative input –001  = 111 –111  = 001 –000  = 000 –010  = 110 –110  = 010

Penn ESE Spring DeHon 34 Subtraction (add/sub) Note: you can use the “unused” carry input at the LSB to perform the “add 1”

Penn ESE Spring DeHon 35 Overflow? Overflow=(A.s==B.s)*(A.s!=S.s) A: 111 B: 001 S: 000 A: 110 B: 001 S: 111 A: 111 B: 010 S: 001 A: 111 B: 110 S: 101 A: 001 B: 001 S: 010 A: 011 B: 001 S: 100 A: 111 B: 100 S: 011

Penn ESE Spring DeHon 36 Reuse

Penn ESE Spring DeHon 37 Reuse In general, we want to reuse our components in time –not disposable logic How do we do that? –Wait until done, someone’s used output

Penn ESE Spring DeHon 38 Reuse: “Waiting” Discipline Use registers and timing (or acknowledgements) for orderly progression of data

Penn ESE Spring DeHon 39 Example: 4b Ripple Adder Recall 1 gates/FA Latency and throughput? Latency: 4 gates to S3 Throughput: 1 result / 4 gate delays max

Penn ESE Spring DeHon 40 Can we do better?

Penn ESE Spring DeHon 41 Stagger Inputs Correct if expecting A,B[3:2] to be staggered one cycle behind A,B[1:0] …and succeeding stage expects S[3:2] staggered from S[1:0]

Penn ESE Spring DeHon 42 Align Data / Balance Paths Good discipline to line up pipe stages in diagrams.

Penn ESE Spring DeHon 43 Example: 4b RA pipe 2 Recall 1 gates/FA Latency and Throughput? Latency: 4 gates to S3 Throughput: 1 result / 2 gate delays max

Penn ESE Spring DeHon 44 Deeper? Can we do it again? What’s our limit? Why would we stop?

Penn ESE Spring DeHon 45 More Reuse Saw could pipeline and reuse FA more frequently Suggests we’re wasting the FA part of the time in non-pipelined

Penn ESE Spring DeHon 46 More Reuse (cont.) If we’re willing to take 4 gate-delay units, do we need 4 FAs?

Penn ESE Spring DeHon 47 Ripple Add (pipe view) Can pipeline to FA. If don’t need throughput, reuse FA on SAME addition. What if don’t need the throughput?

Penn ESE Spring DeHon 48 Bit Serial Addition Assumes LSB first ordering of input data.

Penn ESE Spring DeHon 49 Bit Serial Addition: Pipelining Latency and throughput? Latency: 4 gate delays Throughput: 1 result / 5 gate delays Can squash Cout[3] and do in 1 result/4 gate delays registers do have time overhead –setup, hold time, clock jitter

Penn ESE Spring DeHon 50 Multiplication Can be defined in terms of addition Ask you to play with implementations and tradeoffs in homework 2 –Out today –Pickup from syllabus page on web

Penn ESE Spring DeHon 51 Compute Function Compute: y=Ax 2 +Bx +C Assume –D(Mpy) > D(Add) –A(Mpy) > A(Add)

Penn ESE Spring DeHon 52 Spatial Quadratic D(Quad) = 2*D(Mpy)+D(Add) Throughput 1/(2*D(Mpy)+D(Add)) A(Quad) = 3*A(Mpy) + 2*A(Add)

Penn ESE Spring DeHon 53 Pipelined Spatial Quadratic D(Quad) = 3*D(Mpy) Throughput 1/D(Mpy) A(Quad) = 3*A(Mpy) + 2*A(Add)+6A(Reg)

Penn ESE Spring DeHon 54 Bit Serial Quadratic data width w; one bit per cycle roughly 1/w-th the area of pipelined spatial roughly 1/w-th the throughput latency just a little larger than pipelined

Penn ESE Spring DeHon 55 Quadratic with Single Multiplier and Adder? We’ve seen reuse to perform the same operation –pipelining –bit-serial, homogeneous datapath We can also reuse a resource in time to perform a different role. –Here: x*x, A*(x*x), B*x –also: (Bx)+c, (A*x*x)+(Bx+c)

Penn ESE Spring DeHon 56 Quadratic Datapath Start with one of each operation (alternatives where build multiply from adds…e.g. homework)

Penn ESE Spring DeHon 57 Quadratic Datapath Multiplier servers multiple roles –x*x –A*(x*x) – B*x Will need to be able to steer data (switch interconnections)

Penn ESE Spring DeHon 58 Quadratic Datapath Multiplier servers multiple roles –x*x –A*(x*x) – B*x x, x*x x,A,B

Penn ESE Spring DeHon 59 Quadratic Datapath Multiplier servers multiple roles –x*x –A*(x*x) – B*x x, x*x x,A,B

Penn ESE Spring DeHon 60 Quadratic Datapath Adder servers multiple roles –(Bx)+c –(A*x*x)+(Bx+c) one always mpy output C, Bx+C

Penn ESE Spring DeHon 61 Quadratic Datapath

Penn ESE Spring DeHon 62 Quadratic Datapath Add input register for x

Penn ESE Spring DeHon 63 Quadratic Control Now, we just need to control the datapath What control? Control: –LD x –LD x*x –MA Select –MB Select –AB Select –LD Bx+C –LD Y

Penn ESE Spring DeHon 64 FSMD FSMD = FSM + Datapath Stylization for building controlled datapaths such as this (a pattern) Of course, an FSMD is just an FSM – it’s often easier to think about as a datapath –synthesis, AP&R tools have been notoriously bad about discovering/exploiting datapath structure

Penn ESE Spring DeHon 65 Quadratic FSMD

Penn ESE Spring DeHon 66 Quadratic FSMD Control S0: if (go) LD_X; goto S1 –else goto S0 S1: MA_SEL=x,MB_SEL[1:0]=x, LD_x*x –goto S2 S2: MA_SEL=x,MB_SEL[1:0]=B –goto S3 S3: AB_SEL=C,MA_SEL=x*x, MB_SEL=A –goto S4 S4: AB_SEL=Bx+C, LD_Y –goto S0

Penn ESE Spring DeHon 67 Quadratic FSMD Control S0: if (go) LD_X; goto S1 –else goto S0 S1: MA_SEL=x,MB_SEL[1:0]=x, LD_x*x –goto S2 S2: MA_SEL=x,MB_SEL[1:0]=B –goto S3 S3: AB_SEL=C,MA_SEL=x*x, MB_SEL=A –goto S4 S4: AB_SEL=Bx+C, LD_Y –goto S0

Penn ESE Spring DeHon 68 Quadratic FSM Latency/Throughput/Area? Latency: 5*(D(MPY)+D(mux3)) Throughput: 1/Latency Area: A(Mpy)+A(Add)+5*A(Reg) +2*A(Mux2)+A(Mux3)+A(QFSM)

Penn ESE Spring DeHon 69 Big Ideas [MSB Ideas] Can build arithmetic out of logic Pipelining: –increases parallelism –allows reuse in time (same function) Control and Sequencing –reuse in time for different functions Can tradeoff Area and Time

Penn ESE Spring DeHon 70 Big Ideas [MSB-1 Ideas] Area-Time Tradeoff in Adders Parallel Prefix FSMD control style