A 64b Adder Using Self-Calibrating Differential Output Prediction Logic K. H. Chong and Larry McMurchie Dept. of Electrical Engineering University of Washington.

A 64b Adder Using Self-Calibrating Differential Output Prediction Logic
K. H. Chong and Larry McMurchie, Dept. of Electrical Engineering, University of Washington
Carl Sechen, Electrical Engineering Dept., University of Texas at Dallas
ISSCC 2006
Advanced VLSI, Fall 2006 class presentation by A. Jahanshahi, Dept. of Electrical and Computer Engineering, University of Tehran
Supervisor: M. Fakhraei

2 Outline
History of Output Prediction Logic
Introduction to Output Prediction Logic (OPL)
–Fastest digital logic technique
Self-calibrating Differential OPL (DOPL)
–Twice as fast and uses half the energy compared to domino logic
High-speed, power-efficient 64b adder architecture
–Valency-3 Kogge-Stone sparse tree
–DOPL-specific 3b carry-select units
–Only 5 logic levels
Measurement results

3 History
First introduced in 2000 [1]: speedup of 2X to 3X over optimized static CMOS, and 5X when applied to wide-input NOR gates [1].
Differential OPL (DOPL) is 5 times faster than optimized static CMOS and nearly 2 times faster than OPL and domino [2].
Several successful chips have been reported to date [3-7].

4 Output Prediction Logic
Both static and OPL gates are inherently inverting.
In the worst case for static CMOS, the output of every gate on a critical path must fully transition, from 1 to 0 or from 0 to 1.
OPL reduces the worst case by predicting that all outputs are 1.
On any critical path, only every other gate has to transition (pull down).
If consecutive gates pull down, the path is not critical, since the first pull-down event does not cause the second.
Critical path delay is reduced by at least 50%.
Speedup > 2X by skewing the gates for pull-down transitions.
How can inverting gates have both high inputs and high outputs?
[Figure: four-gate chain (Gate 1 to Gate 4) clocked by clk1 to clk4, annotated with node logic values]
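The "only every other gate pulls down" argument can be checked with a toy model. The sketch below is an illustration only, assuming the simplest possible critical path (a chain of ideal inverters); the function names are hypothetical and nothing here models timing or the actual OPL circuits.

```python
def chain_outputs(first_input, n_gates):
    """Settled outputs of a chain of ideal inverting gates."""
    outputs = []
    value = first_input
    for _ in range(n_gates):
        value = 1 - value          # every gate inverts its input
        outputs.append(value)
    return outputs


def static_worst_case_transitions(n_gates):
    """Static CMOS: when the chain input flips, every gate output flips."""
    before = chain_outputs(0, n_gates)
    after = chain_outputs(1, n_gates)
    return sum(1 for b, a in zip(before, after) if b != a)


def opl_pulldowns(first_input, n_gates):
    """OPL: all outputs start at the predicted value 1, so only gates
    whose settled output is 0 have to pull down."""
    return sum(1 for v in chain_outputs(first_input, n_gates) if v == 0)


if __name__ == "__main__":
    n = 10
    print("static transitions:", static_worst_case_transitions(n))
    print("OPL pull-downs:", max(opl_pulldowns(x, n) for x in (0, 1)))
```

In this model the static chain needs one full transition per gate, while the OPL chain needs a pull-down at only every other gate, which is where the "at least 50%" reduction in critical path delay comes from.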

5 Three Types of Single-Rail OPL Gates
OPL-static, OPL-pseudo, OPL-dynamic
In all 3 cases, when a gate is not enabled (clk = 0), its output is high even if its inputs are high.
Enable the gates (clk = 1) once the inputs have arrived.
[Figure: schematics of the three gate types with inputs a, b, c, clock clk, and output out]

6 OPL Clock Separations
Predicted 1’s are maintained by delaying the clocks.
One fast pull-down event every TWO adjacent clock separations!
–for ANY critical path
Separations are small, less than an inverter delay; they can be produced, for example, by reduced-swing logic [3].
[Figure: chain of Gate 1 to Gate 4 with clocks clk1 to clk4; adjacent clocks clk i-1, clk i, clk i+1 are spaced by clock separation t i]

7 Delay vs. Clock Separation
[Figure: delay (ns) vs. clock separation (ns) [2], with CLK, IN, and OUT waveforms for three regimes: clock blocking, delay optimal, clock too early]
Red: OPL-static NOR3 chain
Blue point: optimally sized static CMOS NOR3 chain
Robust: +/- 30% around the nominal separation of 0.14 gives > 2X speedup

8 OPL-Differential Gates
A drawback of true differential gates is that one side or the other will have a tall stack of devices.
In differential domino, in the worst case, every stack on a particular signal path will have to discharge.
In OPL-differential, at most every other stack on a critical signal path will have to discharge: 2X speedup.

9 Diff. Domino vs. OPL-Diff.
Delays (ns) for chains of 10 gates (FO of 4) in 0.18um TSMC
–static CMOS chains are optimally sized
–domino and OPL-differential use same size transistors

10 Improved OPL-Differential
S. Kio, L. McMurchie, and C. Sechen, “Application of Output Prediction Logic to Differential CMOS,” Proc. of IEEE Computer Society Annual Workshop on VLSI, Orlando, FL, April 19-20, 2001
t fall (OUT)  t fall (OUTB) needed for contention free evaluation
[Figure: two differential gate schematics with pMOS devices P1 to P4, clock clk, and outputs OUT and OUTB]

11 Self-Calibrating Differential OPL
Dual output gates: use a completion detector to produce a downstream clock
–Ideally should feed to the next level
–But, DOPL gates are too fast!
If a DOPL gate evaluates slower (faster) than expected, the downstream clock will be delayed (sped up) to compensate.
[Figure: clk_ref drives a T-gate and buffer tree; clk is distributed to the 1st through 5th DOPL levels]

12 Clock Skew Reduction
DOPL circuits are levelized.
Completion detector outputs for each level are tied together.
Static CMOS NAND2s cannot be used, due to contention.
[Figure: clk_ref, T-gate, and buffer tree feeding DOPL levels 1st through 5th]

13 pMOS Dynamic NAND2 Completion Detector
Minimizes crowbar current
Fast, monotonic rising clock edge
Power consumption is comparable to that of an inverter
[Figure: completion detector schematic with Reset, evaluate devices on out1 to out4 (from DOPL1 and DOPL2), and output clk]

14 Low Skew Inverter Generates Reset Signals
Crowbar current is minimized:
–Reset goes low slightly before DOPL evaluates
–Reset goes high slightly after DOPL pre-charges
[Figure: clk_ref, T-gate, and buffer tree feeding DOPL levels 1st through 5th, with Reset(1), Reset(2), Reset(3); inverter with inputs in1 to in4 driving Reset(n) and clk]

15 Self-Calibrating DOPL Floorplan

16 64b Adder Architecture
Valency-3 Kogge-Stone sparse carry tree
log_3 N levels for every 3rd carry, but the challenge is in efficiently producing the “missing” pairs of carries
[Figure: carry tree producing c1, c4, c7, c10, c13, c16]

17 3b Carry-Select Units
Valency-3 Kogge-Stone sparse tree quickly generates every 3rd carry
–log_3 N levels
Use a carry-select CLA adder to output sums when this “quick” carry arrives
[Figure: carries Cin, C2, C5, C8; sums precomputed for carry-in 0 and carry-in 1 feed MUXes selected by Cin, C2, and C5 to produce S1-S0, S4-S2, and S7-S5]
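The combination of a sparse valency-3 prefix tree and carry-select groups can be illustrated with a bit-level behavioral model. This is a sketch under stated assumptions, not the authors' circuit: the group boundaries (bits 0-1, then 3-bit groups starting at bit 2, following the Cin/C2/C5 labels on the slide), the handling of the final 2-bit group, and all function names are assumptions made for the example, and it models logic function only, not DOPL gates or timing.

```python
import random


def combine(hi, lo):
    """Prefix operator on (generate, propagate) pairs; hi is the more
    significant span, lo the less significant one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def group_gp(a, b, lo, hi):
    """(generate, propagate) of bits lo..hi-1 of a + b."""
    gp = (0, 1)  # identity element for combine
    for i in range(lo, hi):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        gp = combine((ai & bi, ai ^ bi), gp)
    return gp


def valency3_prefix(gp):
    """Kogge-Stone style prefix where each level merges three spans,
    so the covered span triples per level."""
    span = 1
    while span < len(gp):
        nxt = []
        for i in range(len(gp)):
            acc = gp[i]
            for k in (1, 2):
                j = i - k * span
                if j >= 0:
                    acc = combine(acc, gp[j])
            nxt.append(acc)
        gp = nxt
        span *= 3
    return gp


def add_sparse(a, b, cin=0, width=64):
    """Return (sum, carry_out) using group carries plus carry-select."""
    bounds = [0] + list(range(2, width, 3)) + [width]   # assumed grouping
    groups = list(zip(bounds[:-1], bounds[1:]))
    prefix = valency3_prefix([group_gp(a, b, lo, hi) for lo, hi in groups])
    total = 0
    for idx, (lo, hi) in enumerate(groups):
        if idx == 0:
            carry = cin
        else:
            g, p = prefix[idx - 1]
            carry = g | (p & cin)
        mask = (1 << (hi - lo)) - 1
        a_g, b_g = (a >> lo) & mask, (b >> lo) & mask
        sum0 = (a_g + b_g) & mask        # precomputed for carry-in 0
        sum1 = (a_g + b_g + 1) & mask    # precomputed for carry-in 1
        total |= (sum1 if carry else sum0) << lo   # the MUX
    g, p = prefix[-1]
    return total, g | (p & cin)


if __name__ == "__main__":
    a, b = random.getrandbits(64), random.getrandbits(64)
    s, cout = add_sparse(a, b)
    print(hex(s), cout, s == (a + b) & (2**64 - 1))
```

Both candidate sums of each group are ready before the group's carry arrives, so the late-arriving "quick" carry only has to drive a MUX select.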

18 64b Adder Layout and Photomicrograph
IBM 130nm 1.2V process (8RF): Area = 264um x 180um
Auto placed and routed

19 Energy Consumption
Energy per operation: 29.5 pJ for the IBM 130nm 1.2V process
Since energy is proportional to CV^2, we can conservatively estimate the energy consumption for a 90nm 1.1V process: 17.2 pJ
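The scaled figure can be reproduced with a short calculation. The slide states only that energy is proportional to CV^2; the sketch below additionally assumes that switched capacitance scales linearly with feature size (an assumption of this example, though it does reproduce the 17.2 pJ quoted in the summary). The function and parameter names are hypothetical.

```python
def scale_energy(energy_pj, node_from_nm, node_to_nm, vdd_from, vdd_to):
    """Scale energy per operation assuming E ~ C * V^2 with
    C proportional to feature size (assumption, not from the slides)."""
    cap_ratio = node_to_nm / node_from_nm
    volt_ratio = (vdd_to / vdd_from) ** 2
    return energy_pj * cap_ratio * volt_ratio


if __name__ == "__main__":
    e90 = scale_energy(29.5, 130, 90, 1.2, 1.1)
    print(f"{e90:.1f} pJ")   # 29.5 pJ at 130nm/1.2V scaled to 90nm/1.1V
```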

20 Measured Results
1. R. Zlatanovici and B. Nikolic, “Power-performance optimal 64-bit carry-lookahead adders,” Proc. ESSCIRC, Sep 2003, pp. 321 – 324.
2. S. Sun, Y. Han, X. Guo, K.H. Chong, L. McMurchie, and C. Sechen, “409ps 4.7 FO4 64b Adder Based on Output Prediction Logic in 0.18um CMOS,” Proc. IEEE Comp. Soc. Annual Symp. on VLSI (ISVLSI), 11-12 May 2005, pp. 52 – 58.
3. S. Perri, P. Corsonello, and G. Staino, “A Low Power Sub-Nanosecond Standard-Cells Based Adder,” Proc. IEEE ICECS 2003, pp. 296 – 299.

21 V DD vs. Delay Curve for 64b DOPL Adder

22 Summary
Developed self-calibrating differential output prediction logic (DOPL)
–Twice as fast as domino logic, and half the energy
Developed hybrid 64b adder architecture, consisting of a valency-3 Kogge-Stone sparse tree and DOPL-specific 3b carry-select units
64b adder implemented using 130nm 1.2V IBM process (8RF)
Nominal measured delay 238ps (3.9 FO4)
–Best measured delay 215ps (3.5 FO4)
Fastest 64b adder reported by nearly 2X
DOPL is a great candidate for scaling.
Energy: 29.5 pJ (conservatively scales to 17.2 pJ for a 90nm process)
–Competitive with fast static CMOS adders

23 References
[1] L. McMurchie, S. Kio, G. Yee, T. Thorp, and C. Sechen, “Output Prediction Logic: A High Performance CMOS Design Technique,” Proc. Int. Conf. on Computer Design (ICCD), September 17-20, 2000, Austin, TX.
[2] S. Kio, L. McMurchie, and C. Sechen, “Application of Output Prediction Logic to Differential CMOS,” Proc. IEEE Workshop on VLSI, pp. 57 - 65, April 2001.
[3] S. Sun, Y. Han, X. Guo, K.H. Chong, L. McMurchie, and C. Sechen, “409ps 4.7 FO4 64b Adder Based on Output Prediction Logic in 0.18um CMOS,” Proc. IEEE Comp. Soc. Annual Symp. on VLSI (ISVLSI), 11-12 May 2005, pp. 52 - 58.
[4] X. Guo and C. Sechen, “A High Throughput Divider Implementation,” Proc. IEEE CICC, Paper 15.2, Sept. 2005.
[5] R. Zlatanovici and B. Nikolic, “Power-Performance Optimal 64-bit Carry-Lookahead Adders,” Proc. ESSCIRC, pp. 321 - 324, Sept. 2003.
[6] S. Sun, L. McMurchie, and C. Sechen, “A High-Performance 64-bit Adder Implemented in Output Prediction Logic,” Proc. 2001 Conference on Advanced Research in VLSI (ARVLSI 2001), 14-16 March 2001, pp. 213 - 222.
[7] “High Speed Redundant Adder and Divider in Output Prediction Logic,” Proc. IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design, 2005.
