Published by Calvin Berry. Modified over 9 years ago.
1
Energy Reduction for STT-RAM Using Early Write Termination
Ping Zhou, Bo Zhao, Jun Yang, *Youtao Zhang
Electrical and Computer Engineering Department
*Department of Computer Science
University of Pittsburgh
ICCAD 2009
2
Introduction
Traditional SRAM Cache
– Limited by density, leakage and scalability
STT-RAM Cache?
– High density (~4x that of SRAM)
– High speed (same read speed as SRAM)
– Non-volatile
– No write endurance problem
3
STT-RAM: Cell
Magnetic Tunnel Junction (MTJ)
Relative magnetization direction
– Different resistances represent Logic 0 or 1
Write: spin-polarized current
– Much less write current than conventional MRAM
[Figure: MTJ with a Reference Layer and a Free Layer separated by MgO; High Resistance (Logic 1), Low Resistance (Logic 0)]
4
STT-RAM: Cell Array
Similar array structure to SRAM
Bidirectional write current
[Figure: cell array showing the write 0 and write 1 current directions; labels: MTJ, BL, SL, WL]
5
STT-RAM Cache: Challenge
High dynamic energy
– 6~14x more energy per write access [Dong et al. DAC 2008, Sun et al. HPCA 2009]
– Write contributes >74% of total dynamic energy
[Figure annotation: 74.2%]
Need to reduce write energy in STT-RAM cache!
6
Opportunity
Many bits are unchanged in a write access
– Redundant bit-writes [Zhou et al. ISCA 2009]
[Figure: Redundant bit-writes in 16MB STT-RAM cache; annotation: 88%]
How to exploit this opportunity?
7
Exploiting Redundant Bit-Writes
Need to know the old value…
Read & compare before write [Zhou et al. ISCA 2009]
Can we do better?
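As a rough illustration of the read-and-compare idea, the following Python sketch counts how many bit-writes in a word are redundant by XOR-ing the old and new values. The function names, the default 64-bit width, and the sample values are assumptions made for this example; they are not taken from the slides or from Zhou et al.

```python
def changed_bits_mask(old: int, new: int, width: int = 64) -> int:
    """Return a mask with a 1 at every bit position whose value must change."""
    word_mask = (1 << width) - 1
    return (old ^ new) & word_mask


def count_redundant_bit_writes(old: int, new: int, width: int = 64) -> int:
    """Count the bit positions where the new value equals the stored value."""
    changed = bin(changed_bits_mask(old, new, width)).count("1")
    return width - changed


# Hypothetical 8-bit example: only the two lowest bits differ.
old_value = 0b1010_1100
new_value = 0b1010_1111
redundant = count_redundant_bit_writes(old_value, new_value, width=8)
print(redundant, "of 8 bit-writes are redundant")  # 6 of 8 bit-writes are redundant
```

Only the bits set in the mask would need a write; the cost of this approach is the extra read that has to happen before the write starts.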
8
Observation
MTJ resistance changes abruptly near the end of the write cycle
– Cell still holds old value at early stage of write cycle
Read is much faster than write
[Figure source: Y. Chen et al. ISQED 2008]
Possible to sense the old value at early stage of write cycle
9
Early Write Termination: Idea
On a write access…
– Start write cycle like normal
– Sense the old value at early stage
– Terminate the write cycle if old value is same as new value
Does not require a preceding read & compare!
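The three steps on this slide can be sketched as a simple per-cell model. This is a behavioral sketch only, not the circuit from the slides: the function names and the timing parameters `t_sense_ns` and `t_write_ns` are placeholders chosen for illustration.

```python
from typing import List, Tuple


def write_cell(old_bit: int, new_bit: int,
               t_sense_ns: float, t_write_ns: float) -> Tuple[int, float]:
    """Model one cell write with early termination.

    Returns the stored bit after the write and how long the write
    current was driven. The cell is assumed to still hold old_bit
    when it is sensed at t_sense_ns.
    """
    if not 0 < t_sense_ns < t_write_ns:
        raise ValueError("sensing must happen inside the write cycle")
    # Step 1: the write cycle starts as usual.
    # Step 2: the old value is sensed at the early stage.
    sensed = old_bit
    # Step 3: terminate if the cell already holds the new value.
    if sensed == new_bit:
        return old_bit, t_sense_ns
    # Otherwise the write current stays on for the full cycle.
    return new_bit, t_write_ns


def write_word(old_bits: List[int], new_bits: List[int],
               t_sense_ns: float, t_write_ns: float) -> Tuple[List[int], List[float]]:
    """Apply write_cell to every bit of a word."""
    results = [write_cell(o, n, t_sense_ns, t_write_ns)
               for o, n in zip(old_bits, new_bits)]
    stored = [bit for bit, _ in results]
    durations = [t for _, t in results]
    return stored, durations


# Placeholder timings (not measured values): sense at 1 ns of a 10 ns write.
stored, durations = write_word([0, 1, 1, 0], [0, 1, 0, 1], 1.0, 10.0)
print(stored)     # [0, 1, 0, 1]
print(durations)  # [1.0, 1.0, 10.0, 10.0]
```

In this model the two unchanged cells stop at the sense point while the two changed cells run the full cycle, which is the per-bit behavior the slide describes.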
10
EWT Circuit
[Figure: EWT circuit; labels: MTJ, pass, write 0, write 1, conversion, Vin0, Vin1, Vsense0, Vsense1, Vref0, Vref1, Rwire, Sense-Amp, New value, Terminate?, SL, BL, WL]
Conversion circuit
- Basic differential amplifier
- Input lower → Output higher
- Input higher → Output lower
11
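How Does EWT Work?
[Figure: EWT circuit during a write 0; labels: MTJ, pass, write 0, conversion, Vin0, Vin1, Vsense0, Vsense1, Rwire, SL, BL, WL; signal levels marked low and high; timing annotation: 0.536ns]
Old Value   New Value   Vin0     Vsense0   SA output   Action
0           0           lower    higher    1           Terminate
1           0           higher   lower     0           Continue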
12
Advantages of EWT
No performance penalty!
– Carried out within a write cycle
– No need to read & compare before a write
– Write access may finish early → Slight speedup
Low energy overhead (3.23%)
Low complexity
Easy to integrate with existing designs
13
MODELING STT-RAM AND EWT
14
Latency Modeling
Cell
– Derived from recent works [Dong et al. DAC 2008]
Peripheral
– Derived from CACTI [Thoziyoor et al. ISCA 2008, Dong et al. DAC 2008]
15
Dynamic Energy Modeling
Baseline: Derived from recent works [Dong et al. DAC 2008]
EWT
– Read energy: same as baseline
– Write energy: variable
Write energy = Peripheral (derived from CACTI) + Extra energy introduced by EWT circuits (HSPICE) + N_changed × E_changed + N_unchanged × E_unchanged
where E_changed is the energy of a cell change and E_unchanged is the energy of a terminated cell change
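The per-write energy terms listed on this slide can be evaluated with a few lines of Python. This is a sketch under assumed inputs: the parameter names and all numeric values below are hypothetical placeholders, not figures from the slides, and treating the terms as a simple sum is a reading of the slide's layout.

```python
def ewt_write_energy(n_changed: int, n_unchanged: int,
                     e_changed: float, e_unchanged: float,
                     e_peripheral: float, e_ewt_circuit: float) -> float:
    """Sum the energy terms for one write access.

    n_changed / n_unchanged: number of cells that flip / are terminated early.
    e_changed / e_unchanged: energy of a full cell change / a terminated one.
    e_peripheral: peripheral circuit energy for the access.
    e_ewt_circuit: extra energy spent in the EWT circuits.
    """
    cell_energy = n_changed * e_changed + n_unchanged * e_unchanged
    return e_peripheral + e_ewt_circuit + cell_energy


# Hypothetical values in arbitrary energy units, chosen only to show the arithmetic.
total = ewt_write_energy(n_changed=8, n_unchanged=56,
                         e_changed=1.0, e_unchanged=0.125,
                         e_peripheral=4.0, e_ewt_circuit=0.5)
print(total)  # 19.5
```

Because the number of changed bits differs from one write to the next, the result varies per access, which is why the slide labels the EWT write energy as variable.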
16
Leakage Energy Modeling
STT-RAM is non-volatile
– Power gate the idle banks
– Assume 1ns delay to “wake up”
– Used in both baseline and EWT
17
Experimental Setup
Simics-based simulator
– 4-core CMP, 1GHz
– 32KB private L1 cache
– 16MB shared L2 cache using STT-RAM, 16 banks
– 4GB main memory
– Enhanced cache model: STT-RAM & EWT
18
Results: Performance
[Figure: Normalized Cycles Per Instruction (CPI); annotation: 1% speedup]
Slight performance improvement
19
Results: Write Energy
[Figure: Normalized write energy; annotation: 70% saving]
Up to 80% write energy reduction
20
Results: Dynamic Energy
[Figure: Normalized dynamic energy, EWT vs. Base; annotation: 52% reduction]
21
Results: Total Energy
[Figure: Normalized total energy; annotation: 33% reduction]
22
Results: Energy-Delay Product
[Figure: Normalized ED²; annotation: 34% reduction]
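For reference, an energy-delay-squared figure is energy multiplied by delay squared, with both normalized to the baseline. The sketch below plugs in the 33% total-energy reduction and the 1% speedup annotated on the earlier result slides. Combining those two annotations, and reading the 1% speedup as a normalized delay of 0.99, are assumptions of this example rather than something the slides state.

```python
def normalized_ed2(norm_energy: float, norm_delay: float) -> float:
    """Energy-delay-squared product, with both inputs normalized to a baseline of 1.0."""
    return norm_energy * norm_delay ** 2


# Assumed inputs: 33% lower total energy, 1% lower delay (see lead-in).
ed2 = normalized_ed2(norm_energy=0.67, norm_delay=0.99)
reduction_percent = (1.0 - ed2) * 100.0
print(f"{reduction_percent:.1f}% ED^2 reduction")  # 34.3% ED^2 reduction
```

Under these assumptions the result rounds to about 34%, which is in line with the reduction annotated on this slide.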
23
Conclusion
Address a key challenge to STT-RAM cache: dynamic energy
EWT: Exploit redundant bit-writes without performance penalty
– Low overhead and complexity
Modeling and evaluation
– Up to 80% write energy reduction
– 34% ED² reduction
24
THANK YOU!