A 16-Bit Kogge-Stone PS-CMOS Adder with Signal Completion
Seng-Oon Toh, Daniel Huang, Jan Rabaey
May 9, 2005
EE241 Final Project
Motivation
Asynchronous designs give better throughput and higher efficiency
Larger circuits and smaller transistors are more susceptible to process variations
Process variation decreases circuit yield
Reach the optimum clock frequency for each block
Need for self-timed circuits with a completion signal, which also improves yield under process variation
Past Solution
GALS
DCDVSL
IEEE, 1998
PS-CMOS 16-bit Kogge-Stone Pipelined Adder
Adders
  –The adder is an integral part of the ALU
  –Pipelining large adders can increase clock frequency and throughput
PS-CMOS 16-bit Kogge-Stone Pipelined Adder
Adder Design
  –Kogge-Stone CLA
  –Four stages
    »Stage 1: Bit P and G
    »Stage 2: Dot 1, 2
    »Stage 3: Dot 3, 4, Cout
    »Stage 4: Sum
  –2-input gates only, no complex logic gates, for significant logic depth
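The four-stage split above can be illustrated with a small behavioral model. This is only a sketch of a textbook Kogge-Stone prefix adder in Python, not the project's circuit: the function names `dot` and `kogge_stone_add` are hypothetical, carry-in is assumed to be 0, and the pipeline registers between stages are not modeled.

```python
WIDTH = 16  # operand width used on the slides


def dot(hi, lo):
    """Prefix 'dot' operator on (generate, propagate) pairs."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def kogge_stone_add(a, b):
    """Return (sum, cout) for two WIDTH-bit operands, assuming carry-in = 0."""
    a_bits = [(a >> i) & 1 for i in range(WIDTH)]
    b_bits = [(b >> i) & 1 for i in range(WIDTH)]

    # Stage 1: bitwise propagate (XOR) and generate (AND).
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    gp = list(zip(g, p))

    # Stage 2 holds dot levels 1 and 2 (spans 1, 2);
    # Stage 3 holds dot levels 3 and 4 (spans 4, 8) and Cout.
    for span in (1, 2, 4, 8):
        gp = [dot(gp[i], gp[i - span]) if i >= span else gp[i]
              for i in range(WIDTH)]
    cout = gp[WIDTH - 1][0]

    # Stage 4: sum bit i = propagate i XOR carry into bit i.
    carries = [0] + [gp[i][0] for i in range(WIDTH - 1)]
    total = sum((p[i] ^ carries[i]) << i for i in range(WIDTH))
    return total, cout
```

Every node here is a 2-input AND, OR, or XOR, which matches the "2-input gates only" bullet; a call such as `kogge_stone_add(0xFFFF, 0x0001)` exercises the full-length carry chain.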
PS-CMOS 16-bit Kogge-Stone Pipelined Adder
PS-CMOS
  –Monotonic output transition
  –Noise immunity
  –Pseudo-dynamic, fast evaluate
PS-CMOS 16-bit Kogge-Stone Pipelined Adder
Completion Signal
  –Simple scheme that is compatible with PS-CMOS
    »DCDVSL
    »Dummy paths
  –Take advantage of monotonic output transition
  –Input the worst-case input vector at startup to find the clock frequency
  –Calibrate in situ
Completion Signal Scheme
[Timing diagram; labels: output signal, clock signal, delay, precharge, evaluate]
Completion Signal Scheme: Slow Clock
[Timing diagram; labels: output signal, clock signal, delay, precharge, evaluate]
Completion Signal Scheme: Fast Clock
[Timing diagram; labels: output signal, clock signal, delay, precharge, evaluate]
Completion Signal Circuitry
[Block diagram; labels: input, output, critical path, input check sum, check for delay (increase, decrease, stop counting), 8-bit counter, DAC, VCO, clock generation, Stage 3]
Results
Theoretical delay 714 ps, measured 850 ps
For in situ calibration, how often will the worst-case input vector appear?
  –Assuming perfectly random inputs
  –Worst-case input vector appears approximately once every 10^5 switches
  –Circuit runs at approximately 10^9 switches per second
  –So there are potentially 10^4 calibration updates every second
  –This sets the optimum clock speed for the calibration circuitry
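The update-rate estimate above is a one-line division, worked out here for reference. The two rates are the approximate figures from the slide; the variable names, the 1 ms observation window, and the independence assumption behind the last calculation are illustrative additions, not part of the original analysis.

```python
# Approximate figures taken from the slide.
SWITCHES_PER_WORST_CASE = 10**5   # one worst-case vector per ~10^5 switches
SWITCHES_PER_SECOND = 10**9       # circuit switching rate

# Potential calibration updates per second.
updates_per_second = SWITCHES_PER_SECOND // SWITCHES_PER_WORST_CASE

# Mean time between worst-case vectors, in seconds.
mean_interval_s = SWITCHES_PER_WORST_CASE / SWITCHES_PER_SECOND

# Assumption: each switch independently hits the worst case with
# probability 1e-5 (the "perfectly random inputs" case). Chance of seeing
# at least one worst-case vector in a hypothetical 1 ms window:
p_hit = 1 / SWITCHES_PER_WORST_CASE
switches_in_window = SWITCHES_PER_SECOND // 1000
p_at_least_one = 1 - (1 - p_hit) ** switches_in_window

print(updates_per_second)   # 10000
print(mean_interval_s)      # 0.0001
print(p_at_least_one)
```

Under these assumptions a worst-case vector arrives on average every 100 microseconds, so a calibration loop clocked near 10 kHz would rarely wait long for a usable measurement.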
Results
Counter can count up as well as down
  –Speed up and slow down based on conditions
Ability to calibrate for different supply voltages
Ability to test at startup and in situ
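The up/down counter behavior can be sketched as a behavioral loop. The 8-bit counter, DAC, and VCO come from the circuitry slide, and the 850 ps measured delay and 100 ps slack margin come from the Results and Discussion slides. Everything else is an assumption made for illustration: the linear code-to-period mapping, the 2000 ps slowest period, the 5 ps step, the convention that counting up speeds the clock, and the function names `period_ps`, `decide`, and `calibrate`.

```python
COUNTER_MAX = 255          # 8-bit counter
SLOWEST_PERIOD_PS = 2000.0 # hypothetical period at code 0
STEP_PS = 5.0              # hypothetical period change per counter step
MARGIN_PS = 100.0          # slack margin quoted on the slides


def period_ps(code):
    """Hypothetical DAC + VCO transfer: higher code gives a shorter period."""
    return SLOWEST_PERIOD_PS - code * STEP_PS


def decide(delay_ps, clk_ps):
    """Return +1 (count up), -1 (count down), or 0 (stop counting)."""
    slack = clk_ps - delay_ps
    if slack < MARGIN_PS:
        return -1   # output arrives too late: slow the clock down
    if slack >= MARGIN_PS + STEP_PS:
        return +1   # a full step of slack to spare: speed the clock up
    return 0        # within one step of the margin: hold


def calibrate(delay_ps, code=0, max_iters=1000):
    """Step the counter until the decision is 'stop' or the counter saturates."""
    for _ in range(max_iters):
        nxt = min(COUNTER_MAX, max(0, code + decide(delay_ps, period_ps(code))))
        if nxt == code:
            break
        code = nxt
    return code


code = calibrate(850.0)                   # startup calibration, measured delay
print(code, period_ps(code))              # 210 950.0
print(calibrate(1000.0, code=code))       # slower path, e.g. lower supply: 180
```

The one-step dead band in `decide` keeps the counter from toggling between two adjacent codes, which is one simple way to reflect the point that small frequency increments are needed for stability.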
Discussion
Ideal sensor has 0 capacitance
  –We have small capacitance
  –1 inverter, 1 latch
Circuit overhead is low
Probability of switching
  –Maximum clock frequency for test circuit
  –Calibration frequency is high
  –Multiple paths available for detection
Closes feedback path for DVS
Discussion
During clock change no evaluation is allowed
Slack margin: 100 ps built-in delay from detection
  –Nonexistent with registers, because registers intrinsically need delay
PS-CMOS
  –Difficult to implement XOR
  –Not straightforward logic
  –When used with latches, timing of precharge and evaluate is difficult
Frequency increments
  –Small time step necessary for stability
Future Improvements
Fix delay overhead of detection circuit
Fix problems from latch-based design
Circuitry for multiple-path detection
Super-pipelined 256-bit adder
Ability to run adder slower
  –Monitor precharge
Conclusion
Shown a simple completion signal scheme for a pipelined PS-CMOS adder
Small amount of overhead
Ability to adjust clock frequency during operation, not only at startup