Circuit Design with Alternative Energy-Efficient Devices
Elad Alon, Dept. of EECS, UC Berkeley
Collaborators: Hei Kam, Fred Chen (MIT), Tsu-Jae King-Liu, Vladimir Stojanovic (MIT), Dejan Markovic (UCLA), Mark Horowitz (Stanford)
CMOS Is Scaling, Power Cannot
[Figure: processor power (W, log scale from 0.1 to 1000) vs. year (1970 to 2010) for Intel processors from the 4004 through Itanium II, contrasting predictions (ca. 2000) with reality (Core 2). Source: S. Borkar, Intel.]
Supply and Threshold Voltages
[Figure: drain current Id vs. gate voltage Vg, illustrating Vth and Vdd scaling. Source: Ed Nowak, IBM.]
- kT/q does not scale, so lowering Vth increases leakage, independent of technology.
- Vth reached the value that optimally balances leakage and dynamic energy at roughly the 90nm node, so it can no longer be scaled.
- With Vth fixed, Vdd is also fixed to achieve a given performance, so power (and power density) no longer scales well with technology.
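The fixed kT/q sets the ideal subthreshold swing near 60 mV/dec at room temperature. The sketch below computes that limit and the leakage penalty for a Vth reduction. It assumes an ideality factor n = 1 and a pure exponential subthreshold current. The 100 mV step is a hypothetical example, not a value from the slide.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C

def ideal_swing(temp_k=300.0, n=1.0):
    """Subthreshold swing S^-1 in V/decade: n * (kT/q) * ln(10); n = 1 is the ideal limit."""
    return n * K_B * temp_k / Q_E * math.log(10)

def leakage_increase(delta_vth, swing):
    """Factor by which Ioff rises when Vth is lowered by delta_vth volts (pure exponential model)."""
    return 10 ** (delta_vth / swing)

s = ideal_swing()
print(f"S^-1 = {s * 1e3:.1f} mV/dec at 300 K")
print(f"Lowering Vth by 100 mV raises Ioff by ~{leakage_increase(0.1, s):.0f}x")
```

Because the swing depends only on temperature in this model, no technology change moves it. That is why Vth, and with it Vdd, stopped scaling.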
Alternative Devices to the Rescue?
[Figure: log drain current Id vs. gate voltage Vg for a MOSFET and a new device, with the subthreshold slope labeled S^-1.]
- Many new devices with subthreshold swing S^-1 < 60 mV/dec have been proposed.
- But many of these are slow (low Ion) and/or have other "weird" characteristics.
- Can these devices reduce energy? If so, at what performance? Answering this requires looking at the circuits.
Outline
- Energy-Performance Analysis
- Circuit Design with Relays
- Conclusions
Processor Power Breakdown
- Most components (control, datapath, clock) track the energy-performance curves of logic.
- Use a proxy circuit to examine the tradeoffs.
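Proxy Circuit for Static Logic
[Figure: chain of Ld logic stages from Input to Output, powered between 0V and Vdd.]
- Switching activity factor = α; gate capacitance per stage = C
- $t_{delay} = L_d C V_{dd} / (2 I_{on})$
- $E_{dyn} + E_{leak} = \alpha L_d C V_{dd}^2 + L_d I_{off} V_{dd} t_{delay}$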
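The two proxy-circuit expressions can be evaluated directly. The sketch below does this at one operating point. Every parameter value is hypothetical and chosen only for illustration. Ioff is set so that Ion/Ioff = Ld/α.

```python
def proxy_delay(ld, c, vdd, ion):
    """t_delay = Ld*C*Vdd / (2*Ion): delay through Ld stages of gate capacitance C."""
    return ld * c * vdd / (2 * ion)

def proxy_energy(alpha, ld, c, vdd, ion, ioff):
    """Return (E_dyn, E_leak) per cycle for the proxy chain, in joules."""
    e_dyn = alpha * ld * c * vdd ** 2                          # switched-capacitance energy
    e_leak = ld * ioff * vdd * proxy_delay(ld, c, vdd, ion)    # Ld stages leaking for t_delay
    return e_dyn, e_leak

# Hypothetical operating point: Ld = 30, alpha = 0.01, C = 1 fF, Vdd = 0.5 V, Ion = 100 uA
ld, alpha, c, vdd, ion = 30, 0.01, 1e-15, 0.5, 1e-4
ioff = ion / (ld / alpha)   # pick Ion/Ioff = Ld/alpha
e_dyn, e_leak = proxy_energy(alpha, ld, c, vdd, ion, ioff)
print(f"t_delay = {proxy_delay(ld, c, vdd, ion):.3e} s")
print(f"E_dyn = {e_dyn:.3e} J, E_leak = {e_leak:.3e} J")
```

The leakage term uses t_delay as the cycle time, as in the slide's formula. Cycle-level overheads such as clocking are not modeled.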
Simple Optimization Rule
- Optimal $I_{on}/I_{off} \propto L_d/\alpha$
- Derived for CMOS, but holds for nearly all switching devices.
- At the optimum, Pleak/Pdyn is roughly constant, ~30-50%, across a wide range of parameters (Nose and Sakurai).
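Substituting the proxy circuit's t_delay into its leakage term shows why the rule pins the leakage fraction. The constant k is an assumption introduced here; the slide does not give the proportionality constant. With k near 1, the ratio is about 50%. A ratio near 30% corresponds to k ≈ 1.7.

```latex
\begin{aligned}
E_{leak} &= L_d I_{off} V_{dd}\, t_{delay} = \frac{L_d^2 C V_{dd}^2 I_{off}}{2 I_{on}} \\
\frac{E_{leak}}{E_{dyn}} &= \frac{L_d^2 C V_{dd}^2 I_{off} / (2 I_{on})}{\alpha L_d C V_{dd}^2}
  = \frac{L_d/\alpha}{2\, I_{on}/I_{off}} \\
\frac{I_{on}}{I_{off}} = k\,\frac{L_d}{\alpha}
  &\;\Rightarrow\; \frac{P_{leak}}{P_{dyn}} = \frac{E_{leak}}{E_{dyn}} = \frac{1}{2k}
\end{aligned}
```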
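Using the Rule to Compare MOSFET and "New Device"
[Figure: left, energy vs. performance for the MOSFET and the "new device"; right, Id vs. Vg for both devices, with the supply Vddx marked.]
- The rule fixes the required Ion/Ioff, so match Ioff by adjusting each device's "VT".
- The new device then wins if $I_{on,new}(V_{dd}) > I_{on,MOS}(V_{dd})$.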
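The comparison procedure can be sketched numerically. First shift each device's threshold until both leak the same Ioff at Vg = 0. Then compare Ion at each Vdd. Both I-V curves use a toy EKV-style interpolation. The "new device" parameters (about 30 mV/dec with 1000x lower drive) are hypothetical and do not model any specific device.

```python
import math

V_THERMAL = 0.0259  # kT/q near 300 K, volts

def ekv_current(vg, vt, i_spec, n):
    """Toy EKV-style Id(Vg): exponential (swing ~ n*60 mV/dec) below vt, square law above."""
    x = (vg - vt) / (2 * n * V_THERMAL)
    return i_spec * math.log1p(math.exp(x)) ** 2

def vt_for_ioff(i_spec, n, ioff_target, lo=-1.0, hi=2.0):
    """Bisect for the threshold that makes Id(Vg = 0) equal ioff_target."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ekv_current(0.0, mid, i_spec, n) > ioff_target:
            lo = mid  # still too leaky: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ion_at(vdd, i_spec, n, ioff_target):
    """On-current at Vdd after matching Ioff by shifting the threshold."""
    vt = vt_for_ioff(i_spec, n, ioff_target)
    return ekv_current(vdd, vt, i_spec, n)

IOFF = 1e-9                            # common leakage target (A)
MOSFET = dict(i_spec=1e-4, n=1.0)      # ~60 mV/dec
NEW_DEVICE = dict(i_spec=1e-7, n=0.5)  # hypothetical: ~30 mV/dec but low drive

for vdd in (0.1, 0.2, 0.4, 0.8):
    new_wins = ion_at(vdd, **NEW_DEVICE, ioff_target=IOFF) > ion_at(vdd, **MOSFET, ioff_target=IOFF)
    print(f"Vdd = {vdd:.1f} V: {'new device' if new_wins else 'MOSFET'} wins")
```

With these toy numbers, the steep but weak device wins only at low Vdd, which corresponds to low performance.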
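What Else Matters: Variability
- Delay: with a finite logic depth Ld, variations do not fully average out, and the cycle time is set by the worst case.
- Leakage: what matters is the mean leakage E(Ioff), not the leakage at the mean threshold E(Vth). Because Ioff depends exponentially on Vth, E(Ioff) is larger.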
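Exponential leakage makes threshold variation expensive. If Vth is Gaussian, Ioff is lognormal, so E(Ioff) exceeds Ioff evaluated at E(Vth) by exp(s²/2), where s = σ·ln10/S^-1. The sketch below checks that closed form against a Monte Carlo estimate. The Gaussian Vth assumption, the 60 mV/dec swing, and the 0.30 V mean with 30 mV spread are all hypothetical. Under these assumptions the analytic penalty is about 1.94x.

```python
import math
import random

def ioff(vth, i0=1e-6, swing=0.060):
    """Toy subthreshold leakage: one decade per `swing` volts of Vth."""
    return i0 * 10 ** (-vth / swing)

def mean_leakage_mc(mu, sigma, n=200_000, seed=1):
    """Monte Carlo estimate of E(Ioff) for Vth ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    return sum(ioff(rng.gauss(mu, sigma)) for _ in range(n)) / n

def mean_leakage_analytic(mu, sigma, swing=0.060):
    """Ioff is lognormal, so E(Ioff) = Ioff(mu) * exp(s**2 / 2) with s = sigma*ln10/swing."""
    s = sigma * math.log(10) / swing
    return ioff(mu, swing=swing) * math.exp(s ** 2 / 2)

mu, sigma = 0.30, 0.03  # hypothetical Vth mean and spread (V)
mc = mean_leakage_mc(mu, sigma)
exact = mean_leakage_analytic(mu, sigma)
print(f"E(Ioff)/Ioff(E(Vth)): Monte Carlo {mc / ioff(mu):.2f}, analytic {exact / ioff(mu):.2f}")
```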
What Else Matters: Wires & Area
[Figure: the proxy chain with wire capacitance Cw at each stage.]
- Devices don't drive just other devices, so extrinsic (wire) capacitance must be included too.
- This matters especially if the device has area overhead.
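Wire capacitance dilutes any intrinsic capacitance advantage a new device has. The sketch below adds a per-stage Cw to the proxy's switched energy. It assumes a hypothetical device that halves the gate capacitance while Cw stays equal to the MOSFET's gate capacitance. If area overhead lengthens the wires, Cw grows and the benefit shrinks further. Delay, which also scales with C + Cw, is not modeled here.

```python
def dynamic_energy(alpha, ld, c_gate, c_wire, vdd):
    """Switched energy per cycle when each stage drives its gate load plus wire capacitance."""
    return alpha * ld * (c_gate + c_wire) * vdd ** 2

# Hypothetical: the new device halves C_gate at the same Vdd; each stage sees C_wire = 1 fF.
mos = dynamic_energy(0.01, 30, 1e-15, 1e-15, 0.5)
new = dynamic_energy(0.01, 30, 0.5e-15, 1e-15, 0.5)
print(f"energy ratio new/MOS = {new / mos:.2f}")  # 0.75 rather than the intrinsic 0.5
```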
Parallelism
[Figure: energy vs. performance for a MOSFET and a "new device"; a serial design runs at performance f, while a parallel design reaches 2f at roughly constant E/op.]
- If available, parallelism allows slower devices to be used.
- This extends the energy benefit of a new device to higher performance.
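Minimum Energy
[Figure: normalized energy/cycle vs. Vdd (V) for different Seff; inset: Id vs. Vg with effective swing Seff^-1.]
- At low performance or high parallelism, the device with the lowest Vdd for the required Ion/Ioff wins.
- Vdd must span log10(Ion/Ioff) decades of current at Seff^-1 per decade, so $V_{dd,min} \propto S_{eff}^{-1}$ and $E_{min} \propto S_{eff}^{-2}$.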
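The Vdd,min scaling can be made concrete with the proxy parameters used later in the deck (30 stages, α = 0.01). This sketch makes three assumptions. First, the device stays exponential up to Vdd. Second, Ion/Ioff = k·Ld/α holds with k = 1. Third, E_min tracks C·Vdd,min², which is true if leakage stays a fixed fraction of the total. The 30 mV/dec device is hypothetical.

```python
import math

def vdd_min(swing, ld, alpha, k=1.0):
    """Lowest Vdd that spans log10(Ion/Ioff) decades at `swing` V/dec.

    Assumes exponential I-V up to Vdd and Ion/Ioff = k * Ld / alpha (k is assumed, default 1).
    """
    return swing * math.log10(k * ld / alpha)

v60 = vdd_min(0.060, ld=30, alpha=0.01)  # ideal MOSFET swing
v30 = vdd_min(0.030, ld=30, alpha=0.01)  # hypothetical steeper device
print(f"Vdd,min: {v60 * 1e3:.0f} mV vs {v30 * 1e3:.0f} mV")
print(f"E_min ratio (proportional to Vdd,min^2): {(v60 / v30) ** 2:.1f}")
```

Halving the effective swing halves Vdd,min and, under these assumptions, cuts E_min by 4x.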
Example: Tunneling FET
[Figure: TFET structure (gate, source, drain) and drain current Id (A/μm) vs. gate voltage Vg (V).]
- $I_{on} \approx A (V_{gs}+V_T) \exp[-B/(V_{gs}+V_T)]$ [1]
- Band-to-band tunneling device.
- Steep transition (<60 mV/dec) at low current.
- Low Ion (below ~100 μA).
- Assume the work function can be tuned.
[1] J. Chen et al., IEEE Electron Device Lett., vol. EDL-8, no. 11, pp. 515–517, Nov. 1987.
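The tunneling expression explains why the transition is steep only at low current. Since d(ln I)/dV = 1/V + B/V² with V = Vgs + VT, the point swing is about ln10·V²/B for small V and grows as V increases. A cancels out of the swing. The value B = 5 V used below is hypothetical.

```python
import math

def tfet_current(v_eff, a=1.0, b=5.0):
    """BTBT form from the slide: I = A*V*exp(-B/V), with V = Vgs + VT > 0 (A, B hypothetical)."""
    return a * v_eff * math.exp(-b / v_eff)

def local_swing(v_eff, b=5.0):
    """Point swing dVg/dlog10(I) in V/dec: d(ln I)/dV = 1/V + B/V**2, so A drops out."""
    return math.log(10) / (1.0 / v_eff + b / v_eff ** 2)

for v in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"Vgs+VT = {v:.1f} V: swing = {local_swing(v) * 1e3:.1f} mV/dec")
```

With this B, the swing rises above 60 mV/dec at a few hundred millivolts of overdrive. This matches the qualitative picture of a steep transition only at low current.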
Energy-Performance Tradeoff
[Figure: energy (J) vs. performance (GHz) for TFET and MOSFET; 30 stages, α = 0.01.]
- TFETs are competitive with subthreshold CMOS.
- TFETs look promising below ~100 MHz.
Outline
- Energy-Performance Analysis
- Circuit Design with Relays
- Conclusions
Nano-Electro-Mechanical Relay
[Figure: conductance vs. gate voltage Vg (V), showing on-conductance Gon, pull-in voltage Vpi, and release voltage Vrl.]
- Based on mechanically making and breaking contact.
- No leakage and a perfectly abrupt transition.
- Reliability is the key challenge.
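The relay's switching behavior follows from its pull-in and release voltages. The contact closes at Vpi and opens only when the gate falls to Vrl < Vpi. The idealized sketch below captures that hysteresis and the zero off-state conductance. The class name and the parameter values are hypothetical. Contact resistance variation, bounce, and timing are not modeled.

```python
class NemRelay:
    """Idealized NEM relay with hysteresis (hypothetical parameter values).

    The contact closes once |Vg| reaches v_pi and reopens only when |Vg|
    falls to v_rl < v_pi. Off-state conductance is exactly zero (no leakage).
    """

    def __init__(self, v_pi=1.0, v_rl=0.5, g_on=1e-4):
        assert v_rl < v_pi, "release voltage must be below pull-in voltage"
        self.v_pi, self.v_rl, self.g_on = v_pi, v_rl, g_on
        self.closed = False

    def apply(self, vg):
        """Update the contact state for gate voltage vg; return conductance in siemens."""
        v = abs(vg)  # electrostatic force scales with Vg**2, so polarity does not matter
        if not self.closed and v >= self.v_pi:
            self.closed = True
        elif self.closed and v <= self.v_rl:
            self.closed = False
        return self.g_on if self.closed else 0.0

relay = NemRelay()
up = [relay.apply(v / 10) for v in range(0, 13)]         # sweep 0 V up to 1.2 V
down = [relay.apply(v / 10) for v in range(12, -1, -1)]  # sweep 1.2 V back down to 0 V
```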
Circuit Design with Relays
- CMOS: delay is set by the electrical time constant, so it is more delay-efficient to distribute logical/electrical effort over many buffered stages.
- Relay: mechanical delay (~10 ns) >> electrical τ (~1 ps), so implement logic as a single complex gate, trading electrical (logical) complexity for mechanical simplicity.
- This fundamental difference calls for a design style that departs from the one used with CMOS.
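A single complex gate pays off because the electrical delay of even a long relay stack is tiny next to one mechanical delay. The sketch compares two cases. In the first, all relays in a series stack actuate together, and charge then flows through an N-section RC ladder, estimated with the Elmore delay R·C·N(N+1)/2. In the second, the relays are arranged as CMOS-style stages whose mechanical delays add. The values r_on = 1 kΩ and c_node = 1 fF are hypothetical, picked so that RC matches the slide's ~1 ps.

```python
def relay_gate_delay(n_series, t_mech=10e-9, r_on=1e3, c_node=1e-15):
    """One complex gate: every relay actuates at once (one t_mech), then the output
    charges through the series stack; Elmore estimate of an N-section RC ladder."""
    return t_mech + r_on * c_node * n_series * (n_series + 1) / 2

def relay_staged_delay(n_stages, t_mech=10e-9):
    """CMOS-style staging: each stage waits for the previous output, so mechanical delays add."""
    return n_stages * t_mech

for n in (1, 10, 30):
    print(f"N = {n:2d}: single gate {relay_gate_delay(n) * 1e9:.3f} ns, "
          f"staged {relay_staged_delay(n) * 1e9:.0f} ns")
```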
Relay Energy-Performance Tradeoff
[Figure: energy (J) vs. performance (GHz) for a stack of 30 series relays, compared with TFET and MOSFET.]
- No leakage.
- Vdd,min is set only by functionality (surface forces).
- How about real logic circuits?
Relay-Based Adder
- Manchester carry chain (ripple carry), built by cascading full-adder cells.
- An N-bit adder still incurs only 1 mechanical delay.
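A ripple-carry relay adder needs only one mechanical delay if every relay's gate is driven by a primary input, a_i or b_i, while the carry travels only through the closed contacts. The behavioral sketch below shows that organization. It is not the authors' circuit. It assumes dual-rail carry signals (c, c_bar) so that the sum can be selected without gating any relay on the carry. All function names are hypothetical.

```python
def relay_adder(a_bits, b_bits, cin=0):
    """Behavioral model of a relay Manchester/ripple adder (LSB-first bit lists).

    Every relay gate is driven by a primary input a_i or b_i, so all relays can
    actuate simultaneously; the carry only passes through source/drain paths on
    dual rails (c, c_bar) and never drives a relay gate.
    """
    c, c_bar = cin, 1 - cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        propagate = a != b
        # Sum = propagate XOR c: pass c_bar when propagating, c otherwise.
        sums.append(c_bar if propagate else c)
        if not propagate:
            # a == b: generate (both 1) ties the carry high, kill (both 0) ties it low.
            c, c_bar = a, 1 - a
        # When propagating, c and c_bar simply pass through to the next bit.
    return sums, c

def to_bits(x, width):
    """Integer to LSB-first bit list."""
    return [(x >> i) & 1 for i in range(width)]

def from_bits(bits):
    """LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))

s, cout = relay_adder(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s) + (cout << 4))
```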
Adder Energy-Delay
Compared with the optimal CMOS adder:
- ~10-40x slower (low contact resistance Rcont is not critical)
- ~10-100x lower E/op: lower Cg; fewer devices, all minimum size; lower Vdd,min
Parallelism and Area
- If parallelism is available, area can be traded for throughput.
- When competing with sub-threshold CMOS, the area overhead is bounded.
Power Breakdown Revisited
- With better logic, "uncore" power becomes dominant.
- Devices need to be analyzed (and leveraged) for the entire system: relay DRAM or NVM (not SRAM)? Relay ADCs/DACs?
Outline
- Simple Energy-Performance Analysis
- Circuit Design with Relays
- Conclusions
Summary
- New devices need circuit-level analysis.
- Ion/Ioff is set by logic depth and activity factor.
- Don't forget about variability and wires.
- Tailor the circuit style to the device.
- If available, parallelism may allow slower (low-Ion) devices.
- Don't forget about the rest of the system.
Good News/Bad News
[Figure: energy/op vs. performance; today, parallelism lowers E/op; in the future, parallelism doesn't help.]
- Parallelism is still available in CMOS, but it is eventually limited by Emin.
- This is an opportunity for new devices, at least in sub-100 MHz applications.
Acknowledgements: Berkeley Wireless Research Center, NSF, DARPA, FCRP