Energy Reduction for STT-RAM Using Early Write Termination Ping Zhou, Bo Zhao, Jun Yang, *Youtao Zhang Electrical and Computer Engineering Department *Department.


Energy Reduction for STT-RAM Using Early Write Termination
Ping Zhou, Bo Zhao, Jun Yang, *Youtao Zhang
Electrical and Computer Engineering Department, *Department of Computer Science, University of Pittsburgh
ICCAD 2009

Introduction
Traditional SRAM cache
– Limited by density, leakage and scalability
STT-RAM cache?
– High density (~4x that of SRAM)
– High speed (same read speed as SRAM)
– Non-volatile
– No write endurance problem

STT-RAM: Cell
Magnetic Tunnel Junction (MTJ)
Relative magnetization direction
– Different resistances → Logic 0 or 1
Write: spin-polarized current
– Much less write current than conventional MRAM
[Figure: MTJ with a reference layer and a free layer separated by MgO; high resistance = Logic 1, low resistance = Logic 0]

STT-RAM: Cell Array
Similar array structure to SRAM
Bidirectional write current
[Figure: cell array with MTJ, word line (WL), bit line (BL) and source line (SL); write 0 and write 1 shown]

STT-RAM Cache: Challenge
High dynamic energy
– 6~14x more energy per write access [Dong et al. DAC 2008, Sun et al. HPCA 2009]
– Write contributes >74% of total dynamic energy
Need to reduce write energy in STT-RAM cache!

Opportunity
Many bits are unchanged in a write access
– Redundant bit-writes [Zhou et al. ISCA 2009]
[Figure: redundant bit-writes in 16MB STT-RAM cache; 88% callout]
How to exploit this opportunity?

Exploiting Redundant Bit-Writes
Need to know the old value…
Read & compare before write [Zhou et al. ISCA 2009]
Can we do better?
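The read & compare step above can be sketched in a few lines of Python. This is only an illustration of counting redundant bit-writes with XOR; the function name and the byte-string representation of a cache line are assumptions, not something taken from the slides.

```python
from typing import Tuple


def count_bit_writes(old: bytes, new: bytes) -> Tuple[int, int]:
    """Return (changed, unchanged) bit counts when `new` overwrites `old`.

    Hypothetical helper: a bit-write is redundant when the old and new
    bit are equal, i.e. when the XOR of the two bits is 0.
    """
    if len(old) != len(new):
        raise ValueError("old and new data must have the same length")
    # XOR marks differing bits with 1; count those across all bytes.
    changed = sum(bin(a ^ b).count("1") for a, b in zip(old, new))
    total_bits = 8 * len(old)
    return changed, total_bits - changed


if __name__ == "__main__":
    changed, unchanged = count_bit_writes(b"\x0f\xff", b"\x0e\xff")
    print(f"changed={changed}, redundant={unchanged}")
```

Only the changed bits actually need a write pulse; the rest are the redundant bit-writes the slide refers to.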

Observation
MTJ resistance changes abruptly toward the end of the write cycle
– Cell still holds old value at early stage of write cycle
Read is much faster than write
[Figure source: Y. Chen et al. ISQED 2008]
Possible to sense the old value at early stage of write cycle

Early Write Termination: Idea
On a write access…
– Start write cycle like normal
– Sense the old value at early stage
– Terminate the write cycle if old value is same as new value
Does not require a preceding read & compare!
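The three steps above can be sketched as a behavioral model in Python. This is a logical illustration only, not the circuit: the names `WriteResult`, `ewt_write_bit` and `ewt_write_word` are hypothetical, and the sketch assumes the early sensing always returns the old value correctly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WriteResult:
    stored: int        # value held by the cell after the write access
    terminated: bool   # True if the write pulse was cut short


def ewt_write_bit(old_bit: int, new_bit: int) -> WriteResult:
    # Step 1: the write cycle starts as usual (the pulse itself is not modeled).
    # Step 2: early in the cycle the cell still holds the old value, so it can be sensed.
    sensed = old_bit
    # Step 3: if the sensed value already equals the new value, stop the pulse.
    if sensed == new_bit:
        return WriteResult(stored=old_bit, terminated=True)
    # Otherwise the pulse runs to completion and the cell flips.
    return WriteResult(stored=new_bit, terminated=False)


def ewt_write_word(old_bits: List[int], new_bits: List[int]) -> Tuple[List[int], int]:
    """Write a word bit by bit; return the stored bits and the number of terminated writes."""
    results = [ewt_write_bit(o, n) for o, n in zip(old_bits, new_bits)]
    return [r.stored for r in results], sum(r.terminated for r in results)


if __name__ == "__main__":
    stored, terminated = ewt_write_word([0, 1, 1, 0], [0, 0, 1, 1])
    print(stored, terminated)
```

The sensing happens inside the write cycle itself, which is why no separate read access appears in the model.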

EWT Circuit
[Figure: EWT circuit. Labels: MTJ, WL, BL, SL, pass, Rwire, write 0, write 1, conversion, Vin0, Vin1, Vsense0, Vsense1, Vref0, Vref1, Sense-Amp, New value, Terminate?]
Conversion circuit
– Basic differential amplifier
– Input lower → Output higher
– Input higher → Output lower

How EWT Works?
[Figure: EWT circuit during a write of 0 (write 0 and Vsense0 path active); timing annotation: 0.536ns]
Old Value   New Value   Vin0     Vsense0   SA output   Action
0           0           lower    higher    1           Terminate
1           0           higher   lower     0           Continue

Advantages of EWT
No performance penalty!
– Carried within a write cycle
– No need to read & compare before a write
– Write access may finish early → Slight speedup
Low energy overhead (3.23%)
Low complexity
Easy to integrate with existing designs

MODELING STT-RAM AND EWT

Latency Modeling
Cell
– Derived from recent works [Dong et al. DAC 2008]
Peripheral
– Derived from CACTI [Thoziyoor et al. ISCA 2008, Dong et al. DAC 2008]

Dynamic Energy Modeling
Baseline: Derived from recent works [Dong et al. DAC 2008]
EWT
– Read energy: same as baseline
– Write energy: variable, made up of
  – Peripheral (derived from CACTI)
  – Extra energy introduced by EWT circuits (HSPICE)
  – Cell: N_changed × E_changed + N_unchanged × E_unchanged
    (E_changed: cell change; E_unchanged: terminated cell change)
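A minimal Python sketch of the EWT write-energy model follows. It assumes the three components listed on the slide simply add up, which the slide layout suggests but does not state explicitly. The function name, parameter names and the numbers in the usage example are placeholders, not values from the presentation.

```python
def ewt_write_energy(
    n_changed: int,
    n_unchanged: int,
    e_changed: float,
    e_unchanged: float,
    e_peripheral: float = 0.0,
    e_ewt_circuit: float = 0.0,
) -> float:
    """Energy of one write access under EWT (arbitrary but consistent units).

    Assumption: total = peripheral + EWT circuit overhead + cell energy, where
    cell energy = N_changed * E_changed + N_unchanged * E_unchanged.
    """
    cell_energy = n_changed * e_changed + n_unchanged * e_unchanged
    return e_peripheral + e_ewt_circuit + cell_energy


if __name__ == "__main__":
    # Placeholder values: 2 bits flip, 6 bit-writes are terminated early.
    energy = ewt_write_energy(
        n_changed=2, n_unchanged=6,
        e_changed=1.0, e_unchanged=0.25,
        e_peripheral=0.5, e_ewt_circuit=0.125,
    )
    print(energy)
```

Because E_unchanged covers only the short interval before termination, the write energy falls as the share of unchanged bits rises, which is why the slide calls it variable.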

Leakage Energy Modeling
STT-RAM is non-volatile
– Power gate the idle banks
– Assume 1ns delay to “wake up”
– Used in both baseline and EWT

Experimental Setup
Simics-based simulator
– 4-core CMP, 1GHz
– 32KB private L1 cache
– 16MB shared L2 cache using STT-RAM, 16 banks
– 4GB main memory
– Enhanced cache model: STT-RAM & EWT

Results: Performance
[Figure: normalized cycles per instruction (CPI); 1% speedup callout]
Slight performance improvement

Results: Write Energy
[Figure: normalized write energy; 70% saving callout]
Up to 80% write energy reduction

Results: Dynamic Energy
[Figure: normalized dynamic energy, EWT vs. Base; 52% reduction callout]

Results: Total Energy
[Figure: normalized total energy; 33% reduction callout]

Results: Energy-Delay Product
[Figure: normalized ED²; 34% reduction callout]
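As a rough sanity check, the energy-delay metric can be computed from the headline numbers on the earlier results slides. The sketch below assumes the metric is ED² = energy × delay² (inferred from the "ED 2" wording on the conclusion slide), that the 33% total energy reduction and the 1% speedup describe the same average, and that a 1% speedup means a delay ratio of 0.99. None of these assumptions are confirmed by the slides, and the function name is hypothetical.

```python
def normalized_ed2(energy_ratio: float, delay_ratio: float) -> float:
    """Normalized energy-delay-squared product: (E / E_base) * (D / D_base) ** 2."""
    return energy_ratio * delay_ratio ** 2


if __name__ == "__main__":
    # Assumed inputs: 33% total energy reduction, 1% speedup.
    ed2 = normalized_ed2(energy_ratio=0.67, delay_ratio=0.99)
    print(f"normalized ED^2 = {ed2:.3f}, reduction = {(1 - ed2) * 100:.1f}%")
```

Under these assumptions the result lands close to the 34% ED² reduction quoted in the conclusion, with most of the gain coming from the energy term since the delay barely changes.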

Conclusion
Address a key challenge to STT-RAM cache: dynamic energy
EWT: Exploit redundant bit-writes without performance penalty
– Low overhead and complexity
Modeling and evaluation
– Up to 80% write energy reduction
– 34% ED² reduction

THANK YOU!