A 240ps 64b Carry-Lookahead Adder in 90nm CMOS. Faezeh Montazeri, Advanced VLSI Course Presentation, University of Tehran, December 2006. Based on: A 240ps 64b Carry-Lookahead Adder in 90nm CMOS, by Sean Kao, Radu Zlatanovici, Borivoje Nikolić, University of California, Berkeley.
2 What Is an Optimal Adder? Optimal adder: minimum delay for a given energy, or minimum energy for a given delay. [Figure: 64-bit adders on IEEE Xplore] [1]
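The optimality notion on this slide can be sketched as a Pareto filter. This is a minimal illustration, assuming each candidate design is summarised by a hypothetical (delay, energy) pair in arbitrary units; the function name and the numbers are made up for the example and are not taken from [1].

```python
def pareto_front(designs):
    """Return the (delay, energy) points that no other point beats on both axes.

    `designs` is a list of hypothetical (delay, energy) tuples.
    """
    front = []
    # Sorting by delay (then energy) means a point is kept only if it
    # needs strictly less energy than every faster point already kept.
    for delay, energy in sorted(designs):
        if not front or energy < front[-1][1]:
            front.append((delay, energy))
    return front


# Illustrative numbers only (arbitrary units).
candidates = [(1.0, 9.0), (1.2, 6.0), (1.3, 7.0), (1.6, 4.0), (1.0, 11.0)]
optimal = pareto_front(candidates)
```

Each point left in `optimal` has minimum energy for its delay and minimum delay for its energy, which is the sense of "optimal" used above.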
3 This Work. Multi-issue 64-bit microprocessor environment: optimize a set of representative 64-bit adders in the energy-delay space; analyze the design tradeoffs; implement the optimal adder in 1.0 V 90 nm GP CMOS.
4 Outline: energy-delay optimization; design tradeoffs for 64-bit adders; test chip implementation; measured results; summary.
5 Energy-Delay Optimization. Goal: obtain the energy-delay optimal adder. CAD tool: optimizes custom digital circuits in the energy-delay space [3]. [Figure: energy and delay of the static CLA adder and the domino CLA adder] [1]
6 Circuit Optimization Framework. [Block diagram. Inputs: models, netlist, optimization goal. Optimization core: optimizer (Matlab) and static timer (C++), exchanging design variables and delay/energy. Output: optimal design variables.] [1]
7 Adder Optimization Setup Minimize DELAY subject to Maximum ENERGY [1]
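A toy version of "minimize delay subject to maximum energy" may make the setup concrete. This sketch does not reproduce the tool described in [1] and [3]: it swaps in a brute-force grid search and a hypothetical three-inverter chain with a normalised delay model (each stage's delay is its load divided by its size) and energy taken as the sum of the sizes. All names and constants are assumptions for illustration.

```python
def size_chain(e_max, c_load=64.0, step=0.5, w_max=32.0):
    """Grid-search the sizes w2, w3 of a 3-inverter chain (first size fixed at 1).

    Returns (delay, w2, w3) with the smallest delay whose energy fits e_max,
    or None if no grid point is feasible.
    """
    best = None
    n = int(w_max / step)
    for i in range(1, n + 1):
        w2 = i * step
        for j in range(1, n + 1):
            w3 = j * step
            energy = 1.0 + w2 + w3          # toy energy: total gate size
            if energy > e_max:
                continue                    # violates the energy constraint
            delay = w2 / 1.0 + w3 / w2 + c_load / w3   # toy stage delays
            if best is None or delay < best[0]:
                best = (delay, w2, w3)
    return best


loose = size_chain(e_max=100.0)   # constraint inactive: classic 1-4-16 sizing
tight = size_chain(e_max=15.0)    # constraint active: slower, smaller gates
```

Sweeping `e_max` and recording the resulting delay traces out an energy-delay tradeoff curve for this toy circuit.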
8 CLA: Full Tree Comparison. Radix-2: 6 stages, moderate branching. Radix-4: 3 stages, larger branching. Radix-4 is closer to the optimum number of stages. [1]
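The stage counts follow from how many times the span covered by one carry-merge cell must be multiplied by the radix to reach 64 bits. A small sketch (the helper name is hypothetical):

```python
def tree_stages(bits, radix):
    """Number of carry-merge stages a full tree of the given radix needs."""
    stages, span = 0, 1
    while span < bits:
        span *= radix      # each stage merges `radix` groups into one
        stages += 1
    return stages


radix2_stages = tree_stages(64, 2)   # 6
radix4_stages = tree_stages(64, 4)   # 3
```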
9 CLA vs. Ling. Conventional CLA: higher stack in first stage; simple sum precompute. Ling CLA: lower stack in first stage; complex sum precompute; higher speed. [1] [2]
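The Ling formulation can be checked with a bit-level model. This is a sketch under stated assumptions: it uses the usual textbook notation (g = a AND b, t = a OR b, p = a XOR b, pseudo-carry H), none of which is spelled out on the slide, and it evaluates the recurrence serially rather than as a parallel tree. The relations used are H_i = g_i OR (t_{i-1} AND H_{i-1}) and true carry c_i = t_i AND H_i.

```python
import random


def ling_add(a, b, n=64):
    """n-bit addition (carry-in 0) computed through Ling pseudo-carries."""
    s = 0
    h_prev, t_prev = 0, 0              # H_{-1}, t_{-1}: no carry into bit 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, t, p = ai & bi, ai | bi, ai ^ bi
        c_in = t_prev & h_prev         # true carry c_{i-1} = t_{i-1} & H_{i-1}
        s |= (p ^ c_in) << i           # sum bit needs the extra AND with t
        h_prev = g | c_in              # H_i = g_i | (t_{i-1} & H_{i-1})
        t_prev = t
    return s


# Compare against ordinary integer addition on random 64-bit operands.
rng = random.Random(0)
pairs = [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(200)]
all_match = all(ling_add(a, b) == (a + b) & (2**64 - 1) for a, b in pairs)
```

The extra AND with t in the sum path is one way to see why the sum precompute is more complex in the Ling scheme, while the pseudo-carry itself drops a term from the first stage.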
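10 Full vs. Sparse Comparison. [Figure: Ling CLA, full (FULL) vs. sparse-2 (SP2) carry trees] [1]
11 Full vs. Sparse Comparison. [Figure: Ling CLA, FULL vs. SP2] Summary table, SP2: R2 +, R4 +. [1]
12 Full vs. Sparse Comparison. Sparseness benefits adders with large carry trees. [Figure: Ling CLA, FULL vs. SP2 vs. SP4] Summary table, SP2: R2 +, R4 +; SP4: R2 ++, R4 –. [1]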
13 Optimal Adder: Ling's equations; radix-4, sparse-2; domino carry tree; static sum-precompute. Delay of fastest adder: 7.3 FO4. [1]
14 Radix-4 Sparse-2 Carry Tree. Computes every other Ling pseudo-carry: H0, H2, H4, …; each output selects two sums. [1]
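The "each output selects two sums" idea can be modelled behaviourally. This sketch is a simplification and not the circuit in [1]: it uses conventional carries instead of Ling pseudo-carries, and it derives each group's carry serially where the real tree computes them in parallel. For every 2-bit group, both conditional sums are precomputed and the incoming carry picks one.

```python
import random


def sparse2_add(a, b, n=64):
    """n-bit addition where a carry is only needed at every second bit."""
    result, carry = 0, 0
    for i in range(0, n, 2):
        a2, b2 = (a >> i) & 3, (b >> i) & 3
        sum0 = a2 + b2           # precomputed assuming carry-in = 0
        sum1 = a2 + b2 + 1       # precomputed assuming carry-in = 1
        chosen = sum1 if carry else sum0   # one carry selects two sum bits
        result |= (chosen & 3) << i
        carry = chosen >> 2      # stands in for the next carry-tree output
    return result & ((1 << n) - 1)


rng = random.Random(1)
pairs = [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(200)]
all_match = all(sparse2_add(a, b) == (a + b) & (2**64 - 1) for a, b in pairs)
```

Because only 32 carries are needed for 64 bits, the carry tree has half as many outputs, at the cost of the conditional-sum logic in each 2-bit group.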
15 Adder Core Block Diagram. Critical paths implemented in clock-delayed domino; non-critical paths implemented in static; at-speed BIST. [1]
16 Timing Diagram. 20 ps margin on all edges; adjustable hard edges. Delay spread places precharge in critical path. [1]
17 Layout Floorplan. Bitslice height: 24 metal tracks. Aligned clock lines. Sum precompute occupies space freed by sparse carry tree. [1]
18 90 nm Test Chip. Die: 1.7 mm x 1.6 mm. 90 nm GP, 7M 1P, SVT transistors; VDD = 1 V. 8 adder cores + test circuitry. Core 1: this work. Cores 2-8: supply noise measurements and supply grid experiments [4]. Adder core size: 417 x 75 µm². [1]
20 Chip Packaging. Chip-on-board: bond wires 60% shorter; cleaner supply; 10 ps shorter delays. [Figure labels: Advance Program, Digest] [1]
21 Measured Results: Delay. Chip-on-board. VDD = 1 V: average 240 ps, fastest 226 ps. VDD = 1.3 V: average 180 ps. D_avg = 7.5 FO4. [1]
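As a quick arithmetic check, the two delay units can be related. This assumes that the 7.5 FO4 figure refers to the 240 ps average at VDD = 1 V, and that the same FO4 delay applies to the 7.3 FO4 figure quoted for the fastest optimized adder; the slides imply both but state neither explicitly.

```python
measured_avg_ps = 240.0    # average measured delay at VDD = 1 V (from the slide)
measured_avg_fo4 = 7.5     # the same delay in FO4 units (assumed correspondence)

fo4_ps = measured_avg_ps / measured_avg_fo4    # implied FO4 delay in ps
optimizer_best_ps = 7.3 * fo4_ps               # 7.3 FO4 expressed in ps (assumption)
gap_ps = measured_avg_ps - optimizer_best_ps   # measured average minus that figure
```

Under these assumptions the implied FO4 delay is 32 ps, and the measured average sits a few picoseconds above the 7.3 FO4 figure.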
22 Measured Results: Power. VDD = 1 V: Pmax = 260 mW. VDD = 1.3 V: Pmax = 606 mW. [Figure: power breakdown into adder core, clk gen, BIST, leakage] [1]
23 Conclusion. 90 nm GP, 7M 1P, SVT transistors; VDD = 1 V. 8 adder cores + test circuitry. Adder core size: 417 x 75 µm².
24 Summary. [Figure: 64-bit adders on IEEE Xplore] Ling radix-4 sparse-2 domino carry tree. 90nm GP CMOS: 240ps, [1]
25 References
[1] S. Kao, R. Zlatanovici, B. Nikolic, “A 240ps 64-bit Carry-Lookahead Adder in 90nm CMOS,” ISSCC 2006, Feb
[2] H. Ling, “High Speed Binary Adder,” IBM J. R&D, vol. 25, no. 3, pp , May,
[3] R. Zlatanovici, B. Nikolic, “Power – Performance Optimization for Custom Digital Circuits,” Proc. PATMOS, pp , Sept.,
[4] V. Abramzon, E. Alon, M. Horowitz, Stanford University