Circuit Design with Alternative Energy-Efficient Devices
Elad Alon
Collaborators: Hei Kam, Fred Chen (MIT), Tsu-Jae King-Liu, Vladimir Stojanovic (MIT), Dejan Markovic (UCLA), Mark Horowitz (Stanford)
Dept. of EECS, UC Berkeley
2 CMOS is Scaling, Power Cannot
[Figure (S. Borkar, Intel): power (W) by processor generation (486DX, Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4, Itanium, Itanium II), comparing predictions (ca. 2000) with reality (Core 2); axis tick at 2010.]
3 Supply and Threshold Voltages
kT/q doesn't scale, so lowering V_th increases leakage
Fixed V_th, V_dd → power density doesn't scale well
Ed Nowak, IBM
[Figure: drain current I_d vs. gate voltage V_g, showing V_th and V_dd scaling.]
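The role of kT/q on this slide can be made concrete with a little arithmetic. The sketch below assumes the textbook best case, in which a MOSFET's subthreshold swing is ln(10)·kT/q, and that I_off grows by one decade per swing of V_th reduction. The function names and the 100 mV example are illustrative and not from the slides.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge, C

def thermal_voltage(temp_k: float = 300.0) -> float:
    """Return kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON

def ideal_swing_mv_per_dec(temp_k: float = 300.0) -> float:
    """Best-case subthreshold swing, ln(10) * kT/q, in mV/decade."""
    return math.log(10) * thermal_voltage(temp_k) * 1e3

def leakage_increase(delta_vth_v: float, swing_mv_per_dec: float) -> float:
    """Factor by which I_off grows when V_th is lowered by delta_vth_v volts."""
    return 10 ** (delta_vth_v * 1e3 / swing_mv_per_dec)

if __name__ == "__main__":
    swing = ideal_swing_mv_per_dec()
    print(f"kT/q at 300 K: {thermal_voltage() * 1e3:.2f} mV")
    print(f"ideal swing:   {swing:.1f} mV/dec")
    print(f"lowering V_th by 100 mV raises I_off by {leakage_increase(0.1, swing):.0f}x")
```

Because the swing is pinned by temperature, every fixed step down in V_th costs a fixed multiplicative increase in leakage, which is why V_th (and with it V_dd) stops scaling.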
4 Alternative Devices to the Rescue?
[Figure: drain current I_d vs. gate voltage V_g for a MOSFET and a new device; slope = S^-1.]
Many new devices with subthreshold swing S < 60 mV/dec proposed
But many of these are slow (low I_on)
– And/or have other "weird" characteristics
Can these devices reduce energy? If so, at what performance?
– Need to look at the circuits
5 Outline
Energy-Performance Analysis
Circuit Design with Relays
Conclusions
6 Processor Power Breakdown
Most components track the performance vs. energy curves of logic
– Control, Datapath, Clock
Use a proxy circuit to examine tradeoffs
7 Proxy Circuit for Static Logic
[Figure: chain of logic stages from input to output, switching between 0 V and V_dd.]
L_d stages
Switching activity factor = α, gate capacitance per stage = C
t_delay = L_d C V_dd / (2 I_on)
E_dyn + E_leak = α L_d C V_dd^2 + L_d I_off V_dd t_delay
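The two proxy-circuit formulas on this slide translate directly into code. The sketch below is a plain transcription; the numeric values in the usage section (1 fF per stage, 1 V, 100 μA on-current, 100 nA off-current) are hypothetical and chosen only to exercise the functions, while L_d = 30 and α = 0.01 match the parameters quoted later in the deck.

```python
def proxy_delay(l_d: int, c: float, v_dd: float, i_on: float) -> float:
    """t_delay = L_d * C * V_dd / (2 * I_on), in seconds."""
    return l_d * c * v_dd / (2.0 * i_on)

def proxy_energy(l_d: int, alpha: float, c: float, v_dd: float,
                 i_on: float, i_off: float) -> tuple[float, float]:
    """Return (E_dyn, E_leak) per cycle in joules for the proxy chain."""
    t_delay = proxy_delay(l_d, c, v_dd, i_on)
    e_dyn = alpha * l_d * c * v_dd ** 2       # switched-capacitance energy
    e_leak = l_d * i_off * v_dd * t_delay     # leakage integrated over one cycle
    return e_dyn, e_leak

if __name__ == "__main__":
    # Hypothetical device numbers, for illustration only.
    l_d, alpha = 30, 0.01
    c, v_dd, i_on, i_off = 1e-15, 1.0, 100e-6, 100e-9
    t = proxy_delay(l_d, c, v_dd, i_on)
    e_dyn, e_leak = proxy_energy(l_d, alpha, c, v_dd, i_on, i_off)
    print(f"t_delay = {t:.3e} s, E_dyn = {e_dyn:.3e} J, E_leak = {e_leak:.3e} J")
```

Note that leakage energy depends on I_on through the cycle time: a slower device leaks for longer per operation.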
8 Simple Optimization Rule
Optimal I_on/I_off ~ L_d/α
– Derived in CMOS
– But holds for nearly all switching devices
P_leak/P_dyn ~ constant
– ~30-50% across wide range of parameters
Nose and Sakurai
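One way to see how the two statements on this slide fit together is to divide the leakage term by the dynamic term in the proxy-circuit energy from slide 7. With t_delay = L_d C V_dd / (2 I_on), the ratio reduces to E_leak/E_dyn = (L_d / 2α)·(I_off / I_on). This algebra is a reading of the slide-7 formulas, not the derivation cited on the slide, and the function names are illustrative.

```python
def leak_to_dyn_ratio(l_d: int, alpha: float, on_off_ratio: float) -> float:
    """E_leak / E_dyn for the proxy chain, given I_on / I_off."""
    return l_d / (2.0 * alpha * on_off_ratio)

def on_off_ratio_for(l_d: int, alpha: float, leak_fraction: float) -> float:
    """I_on / I_off that yields the requested E_leak / E_dyn."""
    return l_d / (2.0 * alpha * leak_fraction)

if __name__ == "__main__":
    l_d, alpha = 30, 0.01   # values quoted elsewhere in the deck
    for frac in (0.3, 0.5):
        ratio = on_off_ratio_for(l_d, alpha, frac)
        print(f"E_leak/E_dyn = {frac:.1f} -> I_on/I_off = {ratio:.0f} "
              f"(L_d/alpha = {l_d / alpha:.0f})")
```

Holding the leakage fraction in the 30-50% range pins I_on/I_off to within a small constant factor of L_d/α, which is the sense in which the rule holds.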
9 Using the Rule to Compare
Match I_off by adjusting "V_T"
New device wins if: I_on,new(V_dd) > I_on,MOS(V_dd)
[Figures: drain current I_d vs. gate voltage V_g for MOSFET and "new device"; energy vs. performance for MOSFET and "new device", with V_ddx marked.]
10 What Else Matters: Variability
Leakage:
– E(I_off) vs. E(V_th)
Delay:
– Finite L_d
– Cycle time set by worst case
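The leakage bullet can be illustrated under a common simplifying assumption that is not stated on the slide: V_th is normally distributed and I_off falls by one decade per swing S of V_th. Then the average leakage exceeds the leakage of the average device by exp((ln(10)·σ/S)²/2), the standard lognormal mean. The function names and the σ = 30 mV, S = 60 mV/dec example are hypothetical.

```python
import math
import random

def mean_leakage_penalty(sigma_vth_mv: float, swing_mv_per_dec: float) -> float:
    """E[I_off] divided by I_off at the mean V_th, for Gaussian V_th."""
    a = math.log(10) * sigma_vth_mv / swing_mv_per_dec
    return math.exp(0.5 * a * a)

def monte_carlo_penalty(sigma_vth_mv: float, swing_mv_per_dec: float,
                        n: int = 100_000, seed: int = 0) -> float:
    """Estimate the same ratio by sampling V_th deviations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        dv = rng.gauss(0.0, sigma_vth_mv)        # deviation from mean V_th, mV
        total += 10 ** (-dv / swing_mv_per_dec)  # relative leakage of this device
    return total / n

if __name__ == "__main__":
    sigma, swing = 30.0, 60.0   # hypothetical values
    print(f"closed form: {mean_leakage_penalty(sigma, swing):.3f}x")
    print(f"monte carlo: {monte_carlo_penalty(sigma, swing):.3f}x")
```

The low-V_th tail dominates the total, so chip leakage has to be budgeted from the mean of I_off rather than from I_off at the mean threshold.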
11 What Else Matters: Wires & Area
Devices don't drive just other devices
Need to look at extrinsic cap (wires) too
– Especially if device has area overhead
[Figure: proxy logic chain (input, output, 0 V, V_dd) with wire capacitances C_w added.]
12 Parallelism
If available, parallelism allows slower devices
– Extends energy benefit to higher performance
Serial: perf. ∝ f
Parallel: perf. ∝ 2f, E/op ~ const
[Figure: energy vs. performance for MOSFET and "new device".]
13 Minimum Energy
At low performance or high parallelism:
– Lowest V_dd for required I_on/I_off wins
V_dd,min ∝ S_eff, E_min ∝ S_eff^2
[Figures: normalized energy/cycle vs. V_dd (V) for lower S_eff; drain current I_d vs. gate voltage V_g with slope S_eff^-1.]
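The two proportionalities on this slide follow if S_eff is read as the average swing over the full 0 to V_dd range, so that reaching a required I_on/I_off takes V_dd = S_eff·log10(I_on/I_off), and if switching energy goes as C·V_dd² with C unchanged. Both readings are assumptions made for this sketch, and the function names are illustrative.

```python
import math

def vdd_min(s_eff_mv_per_dec: float, on_off_ratio: float) -> float:
    """Lowest V_dd (volts) that spans the required I_on/I_off at swing S_eff."""
    return s_eff_mv_per_dec * 1e-3 * math.log10(on_off_ratio)

def e_min_relative(s_eff_mv_per_dec: float, s_ref_mv_per_dec: float) -> float:
    """E_min relative to a reference device, assuming E ~ C * V_dd^2, same C."""
    return (s_eff_mv_per_dec / s_ref_mv_per_dec) ** 2

if __name__ == "__main__":
    ratio = 3000.0   # hypothetical required I_on/I_off
    for s_eff in (60.0, 30.0):
        print(f"S_eff = {s_eff:.0f} mV/dec: V_dd,min = {vdd_min(s_eff, ratio):.3f} V, "
              f"E_min = {e_min_relative(s_eff, 60.0):.2f}x of the 60 mV/dec case")
```

Halving S_eff halves V_dd,min and cuts E_min to a quarter in this model.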
14 Example: Tunneling FET
Band-to-band tunneling device
– Steep transition (<60 mV/dec) at low current
– Low I_on (<~100 μA)
Assume work function can be tuned
I_on ≈ A (V_gs + V_T) exp[-B / (V_gs + V_T)] [1]
[Figures: device cross-section labeled N, P, source, drain, gate; drain current I_d (A/μm) vs. gate voltage V_g (V).]
[1] J. Chen et al., IEEE Electron Device Lett., vol. EDL-8, no. 11, pp. 515–517, Nov
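The current expression on this slide already explains why the transition is steep only at low current. Writing x = V_gs + V_T, the local swing is ln(10)·x²/(x + B), which shrinks toward zero as x falls. The sketch below evaluates the slide's formula and that swing; the values of A, B, and V_T are hypothetical placeholders, not fitted device parameters.

```python
import math

def tfet_current(v_gs: float, a: float, b: float, v_t: float) -> float:
    """I ~ A * (V_gs + V_T) * exp(-B / (V_gs + V_T)); zero when the sum is not positive."""
    x = v_gs + v_t
    if x <= 0.0:
        return 0.0
    return a * x * math.exp(-b / x)

def tfet_swing_mv_per_dec(v_gs: float, b: float, v_t: float) -> float:
    """Local swing dV/d(log10 I) in mV/decade, valid for V_gs + V_T > 0."""
    x = v_gs + v_t
    # d(ln I)/dV = 1/x + B/x^2, so swing = ln(10) * x^2 / (x + B)
    return math.log(10) * x * x / (x + b) * 1e3

if __name__ == "__main__":
    a, b, v_t = 1e-3, 2.0, 0.0   # hypothetical parameters
    for v in (0.1, 0.2, 0.5):
        print(f"V_gs = {v:.1f} V: I = {tfet_current(v, a, b, v_t):.3e}, "
              f"swing = {tfet_swing_mv_per_dec(v, b, v_t):.1f} mV/dec")
```

With these placeholder values the swing is well under 60 mV/dec at low gate voltage and far above it near full drive, matching the slide's pairing of a steep transition with low I_on.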
15 Energy-Performance Tradeoff
[Figure: energy (J) vs. performance (GHz) for MOSFET and TFET; 30 stages, α = 0.01.]
Competitive with subthreshold CMOS
TFETs promising below ~100 MHz
16 Outline
Energy-Performance Analysis
Circuit Design with Relays
Conclusions
17 Nano-Electro-Mechanical Relay
Based on mechanically making and breaking contact
– No leakage, perfectly abrupt transition
Reliability is the key challenge
[Figure: conductance vs. gate voltage V_g [V], with G_on, V_pi, and V_rl marked.]
18 Circuit Design with Relays
CMOS delay set by electrical time constant
– Distribute logical/electrical effort over many stages
Relay: mechanical delay (~10 ns) >> electrical (~1 ps)
– Implement logic as a single complex gate
[Figure: CMOS implementation vs. relay implementation.]
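The case for a single complex gate can be checked with rough numbers. The sketch below assumes all relays in a gate actuate at the same time, takes the slide's ~1 ps as the RC product of one closed relay, and uses an Elmore estimate for the series stack. These modeling choices, and the function names, are assumptions made for illustration.

```python
def complex_gate_delay(n_series: int, t_mech: float, rc_per_relay: float) -> float:
    """One mechanical delay plus the Elmore delay of n_series closed relays."""
    # Uniform RC ladder: sum over nodes of (i * R) * C = R * C * n * (n + 1) / 2
    t_elec = rc_per_relay * n_series * (n_series + 1) / 2.0
    return t_mech + t_elec

def cascaded_delay(n_stages: int, t_mech: float, rc_per_relay: float) -> float:
    """CMOS-style cascade: every stage waits for the previous relay to move."""
    return n_stages * (t_mech + rc_per_relay)

if __name__ == "__main__":
    t_mech, rc = 10e-9, 1e-12   # order-of-magnitude values from the slide
    n = 30
    print(f"single complex gate: {complex_gate_delay(n, t_mech, rc) * 1e9:.3f} ns")
    print(f"cascade of {n} stages: {cascaded_delay(n, t_mech, rc) * 1e9:.3f} ns")
```

Even with the quadratic growth of the electrical term, 30 relays in series add well under a nanosecond, while cascading 30 stages multiplies the mechanical delay by 30.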
19 Relay Energy-Perf. Tradeoff
Stack of 30 series relays
No leakage
– V_dd,min set only by functionality (surface force)
How about real logic circuits?
[Figure: energy (J) vs. performance (GHz) for MOSFET, TFET, and relay.]
20 Relay-Based Adder
Manchester carry chain
Ripple carry
– Cascade full adder cells
N-bit adder still 1 mechanical delay
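A behavioral model helps show why an N-bit adder can still take one mechanical delay. In the sketch below, the propagate and generate signals depend only on the operand bits, so they can all be set in a single actuation step; the carry is then treated as a signal passing through contacts that are already closed. This is a logic-level illustration of a Manchester carry chain, not the circuit from the talk, and `manchester_add` is a hypothetical name.

```python
def manchester_add(a: int, b: int, n_bits: int, c_in: int = 0) -> tuple[int, int]:
    """Add two n_bits-wide unsigned integers; return (sum, carry_out)."""
    # Step 1 (one mechanical delay in this model): every control signal
    # is a function of the input bits alone, so all relays move together.
    prop = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n_bits)]
    gen = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n_bits)]

    # Step 2 (electrical only in this model): the carry ripples through
    # the chain; it passes through contacts and never actuates a relay.
    carry = c_in
    total = 0
    for i in range(n_bits):
        total |= (prop[i] ^ carry) << i
        carry = gen[i] | (prop[i] & carry)
    return total, carry

if __name__ == "__main__":
    s, c_out = manchester_add(0b1011, 0b0110, 4)
    print(f"sum = {s:04b}, carry out = {c_out}")
```

The ripple in step 2 costs only electrical delay per bit, which the relay's ~10 ns mechanical delay dwarfs.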
21 Adder Energy-Delay
Compare vs. optimal CMOS adder
~10-40x slower
– Low R_cont not critical
~10-100x lower E/op
– Lower C_g
– Fewer devices, all minimum size
– Lower V_dd,min
22 Parallelism and Area
If parallelism available, can trade area for throughput
Competing with sub-threshold CMOS
– Area overhead bounded
23 Power Breakdown Revisited
Better logic → "uncore" power dominant
Need to analyze (and leverage) devices for entire system…
– Relay DRAM or NVM (not SRAM)?
– Relay ADC/DACs?
24 Outline
Simple Energy-Performance Analysis
Circuit Design with Relays
Conclusions
25 Summary
New devices need circuit-level analysis
I_on/I_off set by logic depth, activity factor
Don't forget about variability, wires
Tailor circuit style to the device
If available, parallelism may allow slower (low I_on) devices
Don't forget about the rest of the system
26 Good News/Bad News
Parallelism still available in CMOS
But eventually limited by E_min
Opportunity for new devices…
At least in sub-100 MHz applications
[Figure annotations: "Today: Parallelism lowers E/op"; "Future: Parallelism doesn't help".]
27 Acknowledgements Berkeley Wireless Research Center NSF DARPA FCRP