© Ran Ginosar Lecture 3: Handshake Ckt Implementations 1 VLSI Architectures Lecture 3 S&F Ch. 5: Handshake Ckt Implementations
Implementations: We consider only simple circuits here; more aggressive circuits will come later. First, a reminder on latches.
4- & 2-phase bundled data latches
4-phase dual rail – many bits
4-phase Fork, Join
4-phase Bundled-data Mux
4-phase Bundled-data Demux
4-phase Merge
4-phase Merge: The inputs are mutually exclusive; this is guaranteed elsewhere (more later). Assume X is active: the C-element then sees a glitch on its input. Relative timing: x-req < z-ack, which allows the C-element (CEL) to be simplified.
Asymmetric C Element: Useful when the relative timing is known. If b < a (b arrives before a), only a is needed to pull up. Only one pMOS, so it is faster.
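A minimal behavioral sketch of the two C-element flavors, written in Python rather than at transistor level. The class names are hypothetical, and the asymmetric variant rests on an assumption: "only a needed to pull up" is read here as "the output rises on a alone", relying on b < a. The polarity in the slide's transistor figure may differ.

```python
class CElement:
    """Symmetric Muller C-element: the output follows the inputs
    only when they agree, and holds its state otherwise."""

    def __init__(self, init=0):
        self.out = init

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


class AsymmetricCElement:
    """Assumed asymmetric variant: a alone causes the rising
    transition (b is trusted to be high already, since b < a),
    while the falling transition still needs both inputs low."""

    def __init__(self, init=0):
        self.out = init

    def update(self, a, b):
        if a == 1:
            self.out = 1
        elif a == 0 and b == 0:
            self.out = 0
        # a == 0 and b == 1: hold the previous state
        return self.out


def run(element, sequence):
    """Apply a list of (a, b) input pairs and collect the outputs."""
    return [element.update(a, b) for a, b in sequence]


# An input sequence that respects the relative timing b < a.
TIMED_SEQUENCE = [(0, 0), (0, 1), (1, 1), (0, 1), (0, 0)]
```

On a sequence that respects b < a, such as `TIMED_SEQUENCE`, both models produce the same outputs. They differ only when the timing assumption is violated, for example on input (1, 0).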
2-phase Merge: Try it at home… This is not an assignment!
Mutual Exclusion: MUTEX
Standard Gate MUTEXes: It is not fully guaranteed that the outputs are mutually exclusive (M/E), but it is highly probable! Very low threshold.
Arbiter
Arbitrating Merge
Function Blocks
We said “transparent”, but:
– A matched delay is needed for bundled data
– Completion must be generated for dual-rail
– Inputs must be joined and outputs forked
Transparency Revisited: Function blocks must not affect how the latches “shake hands” (except for timing).
Indication Revisited
FB(req_out) means:
– FB(req_in)
– Computation finished, data out ready
Simple “strong indication” for bundled data:
1: All DATA_IN valid
2: REQ_IN
3: Compute
4: All DATA_OUT valid
5: REQ_OUT
Strong vs. Weak Indication
Strong indication: all inputs must arrive before any output is allowed (“indicated”).
– Even if some outputs are ready earlier, there is no REQ_OUT, so they cannot be used
– Implies worst-case latency
Weak indication: some outputs are allowed even before all inputs have arrived.
– Only makes sense in dual-rail
Weak Indication: There is no REQ on dual-rail; each bit is “self-indicating”. This may lead to faster circuits. (The slide's figure shows an example chain of events.)
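To illustrate why a dual-rail bit is "self-indicating", here is a small sketch of dual-rail encoding with a completion check. The rail order (true rail first), the constant names, and the three-way "valid / empty / partial" classification are assumptions made for this example and are not taken from the slides.

```python
# Each bit is a pair (true_rail, false_rail). Assumed encoding:
EMPTY = (0, 0)   # no data (spacer)
ONE = (1, 0)     # valid logic 1
ZERO = (0, 1)    # valid logic 0
# (1, 1) is not a legal codeword.


def encode_word(value, width):
    """Encode an integer as a list of dual-rail bits, LSB first."""
    return [ONE if (value >> i) & 1 else ZERO for i in range(width)]


def bit_valid(codeword):
    """A bit indicates its own validity: exactly one rail is high."""
    if codeword == (1, 1):
        raise ValueError("illegal dual-rail codeword (1, 1)")
    return codeword != EMPTY


def completion(word):
    """Classify a word as 'valid', 'empty', or 'partial'.

    'partial' is the intermediate state in which some bits have
    arrived and others have not; no separate REQ wire is consulted.
    """
    flags = [bit_valid(cw) for cw in word]
    if all(flags):
        return "valid"
    if not any(flags):
        return "empty"
    return "partial"


def decode_word(word):
    """Decode a fully valid dual-rail word back to an integer."""
    if completion(word) != "valid":
        raise ValueError("word is not fully valid yet")
    return sum(cw[0] << i for i, cw in enumerate(word))
```

A weakly indicating block may act on the bits that are already valid while the word as a whole is still "partial"; a strongly indicating block waits for "valid".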
Composition of FBs
Legal composition:
– All inputs and outputs are connected
– No cycles
A legal composition of weakly indicating FBs is weakly indicating.
A legal composition of strongly indicating FBs is strongly indicating.
Example: Ripple-carry
Example: Ripple-carry
Full adder (a, b, c) = (s, d):
– s = a ⊕ b ⊕ c
– d = ab + ac + bc
Shortcuts for look-ahead (propagate, generate, kill):
– p = a ⊕ b, s = p ⊕ c
– g = ab, d = g + pc, OR d' = k + pc'
– k = a' b'
Sometimes d can be made valid without waiting for c.
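The full-adder equations and the propagate/generate/kill shortcuts can be checked exhaustively. The sketch below reads the sum and propagate terms as XOR (s = a XOR b XOR c, p = a XOR b); the function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    """Reference form: s = a xor b xor c, d = ab + ac + bc."""
    s = a ^ b ^ c
    d = (a & b) | (a & c) | (b & c)
    return s, d


def full_adder_pgk(a, b, c):
    """Shortcut form using propagate, generate, and kill."""
    p = a ^ b                   # propagate
    g = a & b                   # generate
    k = (1 - a) & (1 - b)       # kill
    s = p ^ c
    d = g | (p & c)             # carry out
    d_not = k | (p & (1 - c))   # complement of carry out
    return s, d, d_not


def carry_without_cin(a, b):
    """Return the carry out if a and b alone decide it, else None.

    When g = 1 the carry is 1, when k = 1 it is 0; only when
    p = 1 does the result depend on the carry in.
    """
    if a & b:
        return 1
    if (1 - a) & (1 - b):
        return 0
    return None


def shortcuts_agree():
    """Compare both forms over all eight input combinations."""
    for a, b, c in product((0, 1), repeat=3):
        s, d = full_adder(a, b, c)
        s2, d2, d_not = full_adder_pgk(a, b, c)
        if (s, d) != (s2, d2) or d_not != 1 - d:
            return False
    return True
```

`carry_without_cin` captures the last remark on the slide: for (a, b) = (1, 1) or (0, 0) the carry out is decided without c, and only the propagate case has to wait.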
Speculative / Strong Ripple Carry: 16-bit ripple-carry adder, bundled-data. The longest carry is 16 stages. But if p8 = 0, the longest carry is 8 stages, and if p12 = p8 = p4 = 0, the longest carry is 4 stages. This can be exploited if one is willing to trade area and power for speed.
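A small sketch of the idea behind the stage bounds, under stated assumptions: bits are indexed from 0, the "length" of a carry is approximated by the longest run of consecutive propagate stages (p = 1), and the condition for the 4-stage case is read as p4, p8, and p12 all being 0. The function names are hypothetical, and this models only the bound, not the matched-delay circuit itself.

```python
import random

WIDTH = 16


def propagate_bits(a, b, width=WIDTH):
    """Per-stage propagate signals p_i = a_i xor b_i, LSB first."""
    return [((a >> i) ^ (b >> i)) & 1 for i in range(width)]


def longest_propagate_run(p):
    """Longest run of consecutive stages through which a carry
    can ripple (a proxy for the longest carry chain)."""
    best = run = 0
    for bit in p:
        run = run + 1 if bit else 0
        best = max(best, run)
    return best


def speculative_bound(p):
    """Stage bound selected from p4, p8, p12 (16-bit adder only)."""
    if p[4] == 0 and p[8] == 0 and p[12] == 0:
        return 4
    if p[8] == 0:
        return 8
    return 16


def check_bounds(trials=1000, seed=0):
    """Check on random operands that the bound is never exceeded."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.getrandbits(WIDTH)
        b = rng.getrandbits(WIDTH)
        p = propagate_bits(a, b)
        if longest_propagate_run(p) > speculative_bound(p):
            return False
    return True
```

A stage with p = 0 either generates or kills the carry, so no chain can pass through it. That is why a single zero at the midpoint halves the bound, and zeros at all three quarter points bring it down to 4.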
Speculative / Strong Ripple Carry
ST-CL: Based on David, Ginosar, Yoeli, "An Efficient Implementation of Boolean Functions as Self-Timed Circuits," IEEE Trans. Computers, Jan. 1992.
Dual-Rail DIMS PLA Notation
Dual-Rail DIMS Adders: Still slow, because the forward latency for valid data equals that for empty: LF(V) = LF(E).
Transistor Level DIMS: Too many P transistors, which makes it slow. Some N paths can be shared.
Hybrid Adder
– Dual-rail carry (for flexible latency)
– Bundled-data data inputs and sum output (for lower area and power)
– Data-dependent data-forward (V) latency
– Constant empty-forward (E) latency
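One plausible behavioral reading of a hybrid carry cell, sketched in Python: a and b are bundled-data bits qualified by a request, while the carry in and carry out are dual-rail pairs. The function names, the (true_rail, false_rail) order, and the use of `req` to force the empty state are assumptions for illustration, not the circuits shown on the slides.

```python
EMPTY = (0, 0)  # assumed dual-rail spacer: (true_rail, false_rail)


def weak_carry(req, a, b, c_in):
    """Weakly indicating carry: the output may become valid before
    the carry in arrives, when a and b generate or kill the carry."""
    if not req:
        return EMPTY            # empty propagates regardless of data
    c_t, c_f = c_in
    p = a ^ b
    g = a & b
    k = (1 - a) & (1 - b)
    return (g | (p & c_t), k | (p & c_f))


def strong_carry(req, a, b, c_in):
    """Strongly indicating carry: the output stays empty until the
    carry in is valid, even if a and b already decide the result."""
    c_t, c_f = c_in
    if not req or not (c_t or c_f):
        return EMPTY
    return weak_carry(req, a, b, c_in)
```

In this model the valid (V) latency depends on the data, since a generate or kill stage answers without waiting for its carry in, whereas removing `req` empties every stage at once, which corresponds to a constant empty (E) latency.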
Hybrid Adder (figure labels: Dual-rail, Bundled-data)
Domino Logic Dual Rail: Req Out is generated either by (flexible) completion detection or by a matched (worst-case) delay.
Hybrid Adder: Sum Ckt
Hybrid Adder: Two Carry Ckts (figure labels: Weak Indication, Strong Indication, KILL, GEN)
Hybrid Adder: Two Carry Ckts (figure labels: WEAK CARRY, STRONG CARRY, STRONG CARRY, WEAK CARRY, STRONG CARRY, STRONG CARRY, …, CD). Slightly faster…