Progress Update
Energy-Performance Characterization of CMOS/MTJ Hybrid Circuits
Fengbo Ren
05/28/2010

Modern MTJ
Bias voltage/current controlled variable resistance device
  –Low: R_P
  –High: R_AP
  –TMR = (R_AP - R_P) / R_P
Spin-transfer-torque (STT) switching
  –Switching is controlled by the direction of the writing current.
  –The writing current density has to exceed the switching thresholds.

Motivations for Hybrid Logic
Significant application in MRAM design.
Why logic?
  –CMOS-compatible
    ● Switching current: 200 µA to 2 mA
    ● 90 nm transistor: 1 mA/µm gate width
  –Non-volatility, high stability
    ● Introducing the MTJ's non-volatility into CMOS may suppress leakage in active mode and reduce leakage in idle mode to a minimum.
  –3D stacking
    ● Replacing CMOS with MTJs may increase density.

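The CMOS-compatibility point can be checked with simple arithmetic. The sketch below assumes drive current scales linearly with gate width at the quoted 1 mA/µm; the function name is illustrative only.

```python
# Illustrative sizing estimate; names are hypothetical, numbers are from the slide.
DRIVE_MA_PER_UM = 1.0  # 90 nm transistor: 1 mA per um of gate width


def required_gate_width_um(switching_current_ma, drive_ma_per_um=DRIVE_MA_PER_UM):
    """Gate width (um) needed to source a given MTJ switching current (mA),
    assuming drive current scales linearly with gate width."""
    return switching_current_ma / drive_ma_per_um


for i_ma in (0.2, 2.0):  # 200 uA and 2 mA, the quoted switching-current range
    width = required_gate_width_um(i_ma)
    print(f"{i_ma:.1f} mA -> {width:.1f} um gate width")
```

Under this assumption the quoted switching-current range maps to write transistors between 0.2 µm and 2 µm wide.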
Questions?
What architecture can best utilize the MTJ's non-volatility to improve energy efficiency?
Can an MTJ/CMOS hybrid circuit have a better energy-delay trade-off than a CMOS circuit?
How much leakage power can be saved by introducing MTJs to CMOS?
Any overhead? How much is the switching power of an MTJ?
What will be the trend of MTJ/CMOS hybrid circuits with technology scaling?

Logic-in-Memory MTJ (LIM-MTJ) Logic Style
LIM-MTJ
  –Uses differential MTJs in Dynamic Current-Mode Logic (DyCML)
    ● Outputs are evaluated from the resistance difference of the pull-down networks through X-coupled PMOS.
    ● Claimed to have lower dynamic and static power than SCMOS.
Figure: Schematic of LIM-MTJ 1-bit full adder.

Energy-Performance Characterization
vs. SCMOS & DyCML
  –LIM-MTJ has no energy-performance advantage over the equivalent CMOS implementation.
Figures: Schematic of SCMOS 1-bit full adder. Schematic of DyCML 1-bit full adder.

MTJ Switching Energy Analysis
Switching Energy
  –I_W = J_C · A
    ● J_C is the critical current density.
    ● A is the junction area. A = π·W·L = K·L², where L is the junction size.
  –R = δ/A
    ● δ is the resistance-area product, an intrinsic MTJ parameter. δ = 20 Ω·µm²
  –t is time.

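The quantities defined on this slide can be combined into one expression. As a sketch, assuming the switching energy is the Joule energy dissipated in the junction during the write pulse (the energy expression itself is not in the extracted text):

```latex
E = I_W^{2}\, R\, t
  = (J_C A)^{2} \cdot \frac{\delta}{A} \cdot t
  = J_C^{2}\, \delta\, A\, t
  = J_C^{2}\, \delta\, K L^{2}\, t
```

In this form the energy falls linearly with δ, quadratically with J_C, and quadratically with the junction size L.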
MTJ Switching Energy Analysis
J_C is a function of current pulse width.
  –Switching time is a function of current density.
    ● Δ is the thermal stability factor (Δ ≥ 40).
    ● t_0 is the intrinsic switching time, t_0 = 1 ns.
    ● J_C0 is the intrinsic critical current density, J_C0 = J_C at t = t_0.
  –Modern MTJs have been shown to have J_C0 = 2-7 MA/cm².

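The relation between J_C and pulse width is not present in the extracted text. A commonly used thermal-activation form that is consistent with the definitions above (it gives J_C = J_C0 at t = t_0) is shown here as an assumption, not as the slide's own equation:

```latex
J_C(t) = J_{C0}\left[\,1 - \frac{1}{\Delta}\,\ln\!\left(\frac{t}{t_0}\right)\right],
\qquad t \ge t_0
```

Under this form, longer pulses lower the required current density only logarithmically, and a larger Δ weakens the dependence further.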
MTJ Switching Energy Analysis
Switching Energy
  –Function of switching time (t) given J_C0, δ, L, Δ
  –Ref. MTJ
    ● J_C0 = 5 MA/cm², δ = 20 Ω·µm², L = 135 nm (W = 65 nm)
    ● R_P = 725 Ω, I_C, t = 1 ns
Switching energy > 1 pJ
  –CMOS/MTJ hybrid logic circuits that require frequent MTJ switching are hardly energy efficient.

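The reference-MTJ numbers can be reproduced with a short calculation. This is a minimal sketch that assumes E = I_W² · R · t with R taken as the parallel-state resistance δ/A, and J_C = J_C0 at t = t_0 = 1 ns; the function names are illustrative.

```python
import math


def junction_area_m2(w_m, l_m):
    """Junction area using the slides' convention A = pi * W * L."""
    return math.pi * w_m * l_m


def switching_energy_j(jc_a_per_m2, delta_ohm_m2, area_m2, t_s):
    """Assumed Joule-energy model: E = I_W^2 * R * t."""
    i_w = jc_a_per_m2 * area_m2   # I_W = J_C * A
    r = delta_ohm_m2 / area_m2    # R = delta / A
    return i_w ** 2 * r * t_s


# Reference MTJ from the slide, converted to SI units.
JC0 = 5e6 * 1e4        # 5 MA/cm^2 -> A/m^2
DELTA = 20 * 1e-12     # 20 ohm*um^2 -> ohm*m^2
AREA = junction_area_m2(65e-9, 135e-9)

r_p = DELTA / AREA
i_w = JC0 * AREA
energy = switching_energy_j(JC0, DELTA, AREA, 1e-9)

print(f"R_P = {r_p:.0f} ohm")
print(f"I_W = {i_w * 1e3:.2f} mA")
print(f"E   = {energy * 1e12:.2f} pJ")
```

With these inputs the resistance comes out near the 725 Ω quoted on the slide, and the energy lands somewhat above 1 pJ, in line with the slide's bound.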
Switching Energy with Scaling
  –Scaling parameters: δ, L, J_C0
fJ switching
  –δ ≤ 5 Ω·µm² & J_C0 ≤ 0.6 MA/cm² & L ≤ 33 nm

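The fJ-level corner can be estimated with the same Joule-energy assumption, written as E = J_C0² · δ · K · L² · t. The sketch below assumes a 1 ns pulse and takes K from the reference MTJ's aspect ratio (W = 65 nm, L = 135 nm); both are assumptions, as is the helper name.

```python
import math

# Assumption: keep the reference MTJ's aspect ratio, so K = pi * W / L.
K = math.pi * 65.0 / 135.0


def scaled_switching_energy_j(jc0_ma_per_cm2, delta_ohm_um2, l_nm, t_ns=1.0):
    """E = J_C0^2 * delta * K * L^2 * t, with inputs in the slides' units."""
    jc0 = jc0_ma_per_cm2 * 1e6 * 1e4   # MA/cm^2 -> A/m^2
    delta = delta_ohm_um2 * 1e-12      # ohm*um^2 -> ohm*m^2
    area = K * (l_nm * 1e-9) ** 2      # A = K * L^2 in m^2
    return jc0 ** 2 * delta * area * (t_ns * 1e-9)


e_ref = scaled_switching_energy_j(5.0, 20.0, 135.0)
e_scaled = scaled_switching_energy_j(0.6, 5.0, 33.0)

print(f"reference MTJ: {e_ref * 1e15:.1f} fJ")
print(f"scaled corner: {e_scaled * 1e15:.2f} fJ")
```

Under these assumptions the scaled corner sits below 1 fJ, more than three orders of magnitude under the reference device.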
LUT-based Logic
Stores the truth table in memory
Reads out the logic value based on input selection
  –Reconfigurable
  –Can implement all types of logic, e.g. FPGA
Replace the storage cells with MTJs
  –No MTJ switching during logic operation; the MTJs only need to be configured once.
  –Non-volatile, minimum standby power.
  –Instant boot-up.
Figure: Example of 3-input LUT.

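As a behavioral illustration of LUT-based logic, the sketch below models a 3-input LUT as an 8-entry truth table indexed by the inputs, and configures two such LUTs as the sum and carry outputs of a 1-bit full adder. It models only the logic function, not the MTJ storage or the read circuit; all names are illustrative.

```python
def make_lut(truth_table):
    """Return a 3-input LUT: the inputs select one of 8 stored bits."""
    if len(truth_table) != 8:
        raise ValueError("a 3-input LUT stores exactly 8 bits")
    table = tuple(truth_table)  # configured once, then only read

    def lut(a, b, c):
        return table[(a << 2) | (b << 1) | c]

    return lut


# Stored bits for a 1-bit full adder; index = (a, b, cin) read as a 3-bit number.
SUM_TABLE = [0, 1, 1, 0, 1, 0, 0, 1]
CARRY_TABLE = [0, 0, 0, 1, 0, 1, 1, 1]

sum_lut = make_lut(SUM_TABLE)
carry_lut = make_lut(CARRY_TABLE)


def full_adder(a, b, cin):
    """Evaluate both LUTs; no stored bit changes during operation."""
    return sum_lut(a, b, cin), carry_lut(a, b, cin)


for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            print(a, b, cin, "->", s, cout)
```

The two tables together hold 16 bits, which is why a full adder built this way needs 16 storage cells.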
MTJ Reading Circuit
Conventional current-mirror sense amplifier (SA) based reading circuit
  –Slow (2 stages)
  –Power hungry (DC current)
Figure labels: ∆V, VIP, VIN.

MTJ Reading Circuit
X-coupled inverter based reading circuit (XSA)
  –Fast
    ● ∆V is generated and amplified at the same time.
  –Power efficient
    ● No DC current, only charging and discharging of capacitance.
Figure labels: ∆V at evaluation phase; 1 MTJ and 1 R_ref accessed per read; amplified by X-coupled inverter.

Energy Performance Comparison

Instant Power

1-Bit Full Adder (CMOS_LUT)
Transistor count
  –16x EDFF
  –4x MUX4
  –2x MUX2
  –672 transistors

1-Bit Full Adder (MTJ_LUT1)
Transistor count
  –16x READ1XMTJ
  –4x MUX4
  –2x MUX2
  –2x WRTCKT
  –448 transistors
  –33% reduction
  –16 MTJs

READ1XMTJ
15T + 1 MTJ
Needs a writing circuit

1-Bit Full Adder (MTJ_LUT2)
Transistor count
  –2x READ8XMTJ
  –1x 9-WORD DECODER
  –2x MUX2
  –1x INV
  –1x WRTCKT
  –174 transistors
  –76% reduction
  –16 MTJs

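The reduction percentages follow from the listed totals. The sketch below recomputes them against the CMOS_LUT baseline; the dictionary and function names are illustrative.

```python
# Transistor totals as listed on the three full-adder slides.
COUNTS = {"CMOS_LUT": 672, "MTJ_LUT1": 448, "MTJ_LUT2": 174}


def reduction_pct(baseline, count):
    """Percentage of transistors saved relative to the baseline."""
    return 100.0 * (baseline - count) / baseline


for name in ("MTJ_LUT1", "MTJ_LUT2"):
    pct = reduction_pct(COUNTS["CMOS_LUT"], COUNTS[name])
    print(f"{name}: {pct:.1f}% fewer transistors than CMOS_LUT")
```

Computed this way, MTJ_LUT1 matches the quoted 33%. MTJ_LUT2 comes out near 74%, slightly below the quoted 76%, which may reflect a different count or rounding on the slide.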
READ8XMTJ
MTJs share the reading circuit
1 MTJ + 1 R_ref are accessed per read
1 MTJ is accessed per write
23T + 8 MTJ

Simulation Setup
Three LUT architectures are compared
  –CMOS-LUT
  –MTJ-LUT1: MTJ reading circuit + MUX
  –MTJ-LUT2: Shared MTJ reading circuit + decoder
Configured to implement a 1-bit full adder
  –Two 3-input LUTs
ASU Predictive Technology Model (PTM)
  –90 nm, 65 nm (bulk)
  –45 nm, 32 nm (SOI)
MTJ characteristics
  –R_P = 700 Ω, R_AP = 1400 Ω, TMR = 100%, I_C(AP→P) = 223 µA, I_C(P→AP) = 500 µA
  –Verilog-A MTJ model from Richard.

Configuration Power
CMOS-LUT
  –1 GHz
MTJ-LUT
  –250 MHz
  –750 µA writing current
  –About 3 ns writing time per MTJ
MTJ-based LUTs have 10x higher configuration power
  –Due to the switching energy of the 16 MTJs

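A rough bound on the MTJ write energy during configuration can be sketched from the quoted current and write time. This assumes the energy is I² · R · t dissipated in the junction alone, with R fixed at either the parallel or antiparallel value from the simulation setup (700 Ω and 1400 Ω); driver and wiring losses are ignored, and the names are illustrative.

```python
def write_energy_j(i_a, r_ohm, t_s):
    """Assumed Joule energy in the junction for one write pulse."""
    return i_a ** 2 * r_ohm * t_s


I_WRITE = 750e-6   # 750 uA writing current
T_WRITE = 3e-9     # about 3 ns writing time per MTJ
N_MTJ = 16         # MTJs configured for the full adder

results = {}
for label, r_ohm in (("R_P", 700.0), ("R_AP", 1400.0)):
    per_mtj = write_energy_j(I_WRITE, r_ohm, T_WRITE)
    results[label] = (per_mtj, N_MTJ * per_mtj)
    print(f"{label}: {per_mtj * 1e12:.2f} pJ per MTJ, "
          f"{N_MTJ * per_mtj * 1e12:.1f} pJ for {N_MTJ} MTJs")
```

Under these assumptions each write costs on the order of 1 to 2 pJ in the junction, so configuring all 16 MTJs costs tens of pJ.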
Delay
MTJ-LUT2 has a 2.5x longer delay.

Leakage Power
MTJ-LUT1 has slightly higher leakage power.
MTJ-LUT2 has about 5x lower total leakage power:
  –10x lower storage leakage (due to MTJ)
  –2x lower logic leakage (from MUX to decoder)

Energy (Operation Frequency: 100 MHz)
LUT2
  –4x lower total energy at 32 nm
    ● 1/10 leakage_storage, ½ leakage_logic, bigger dynamic_logic
    ● Dynamic_storage overhead decreases as the technology scales down.

Energy (Operation Frequency: 250 MHz)
LUT2
  –3x lower total energy at 32 nm
    ● 1/10 leakage_storage, ½ leakage_logic, ½ dynamic_logic
    ● Dynamic_storage overhead decreases as the technology scales down.

Energy (Operation Frequency: 500 MHz)
LUT2
  –2x lower total energy at 32 nm
    ● 1/10 leakage_storage, ½ leakage_logic, ½ dynamic_logic
    ● Dynamic_storage overhead decreases as the technology scales down.

Standby Power
Dynamic sleep transistor
  –50 mV voltage drop across the sleep transistor
5-20x reduction
Table: Standby power (µW) by structure (CMOS-LUT, MTJ-LUT, MTJ-LUT) and technology node (90 nm, 65 nm, 45 nm, 32 nm).

Conclusions
What architecture can best utilize the MTJ's non-volatility to improve energy efficiency?
  –LUT-based logic, which requires no MTJ switching.
Can an MTJ/CMOS hybrid circuit have a better energy-delay trade-off than a CMOS circuit?
  –Yes.
How much leakage power can be saved by introducing MTJs to CMOS?
  –About 10x reduction.
Any overhead? How much is the switching power of an MTJ?
  –Yes. MTJ reading energy is an overhead. The writing energy of a modern MTJ is around several pJ.
What will be the trend of MTJ/CMOS hybrid circuits with technology scaling?
  –They will play a significant role in suppressing leakage below 45 nm.