CAFO: Cost Aware Flip Optimization for Asymmetric Memories RAKAN MADDAH *, SEYED MOHAMMAD SEYEDZADEH AND RAMI MELHEM COMPUTER SCIENCE DEPARTMENT UNIVERSITY.

Presentation transcript:

CAFO: Cost Aware Flip Optimization for Asymmetric Memories RAKAN MADDAH *, SEYED MOHAMMAD SEYEDZADEH AND RAMI MELHEM COMPUTER SCIENCE DEPARTMENT UNIVERSITY OF PITTSBURGH HPCA 2015

Introduction
• DRAM and NAND Flash face physical limitations that put their scalability into question
  ◦ DRAM: decrease in cell reliability and increase in power consumption
  ◦ NAND Flash: endurance degradation and increase in the number of transient and hard errors
• Phase-Change Memory (PCM) and Spin-Transfer Torque Random Access Memory (STT-RAM) are promising alternatives
  ◦ Scalability, low access latency, and close to zero leakage power
  ◦ Initial assessments and evaluations are encouraging

Challenges
• PCM and STT-RAM have a number of challenges that need to be dealt with before deployment in functional systems
  ◦ PCM suffers from limited endurance
  ◦ STT-RAM suffers from a high write bit error rate
• Solution: bit flip minimization
  ◦ Service write requests while flipping as few bits as possible
  ◦ Preserves PCM's endurance and improves STT-RAM's write reliability

Previous Work
• Differential Write: compares old data against new data and then flips only the differing cells
• Flip-N-Write: encodes write data into either its regular or inverted form and then picks the encoding that yields fewer flips in comparison against old data
• Flip-Min: encodes write data into a set of data vectors and then picks the vector that yields the fewest flips in comparison against old data
[Figure: old and new data words for each scheme; the three examples save 2, 3, and 4 bit flips.]
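The first two schemes can be illustrated with a minimal Python sketch. The function names and the 8-bit example words are hypothetical, chosen only for illustration, and the flip of the inversion flag itself is not counted here. Flip-Min is left out because its set of candidate vectors depends on a code that the slides do not specify.

```python
def count_flips(old, new):
    """Number of cell positions whose value differs between two words.

    This is the number of cells a differential write would actually flip.
    """
    return sum(o != n for o, n in zip(old, new))


def flip_n_write(old, new):
    """Pick the regular or inverted form of `new`, whichever flips fewer cells.

    Returns (word_to_store, inversion_flag, flips).
    Simplification (assumption): the cost of updating the flag is ignored.
    """
    inverted = [1 - bit for bit in new]
    regular_flips = count_flips(old, new)
    inverted_flips = count_flips(old, inverted)
    if inverted_flips < regular_flips:
        return inverted, 1, inverted_flips
    return list(new), 0, regular_flips


# Hypothetical example words, not taken from the slides.
old = [0, 0, 0, 0, 1, 1, 1, 1]
new = [1, 1, 1, 1, 0, 0, 0, 1]

print(count_flips(old, new))   # differential write: 7 flips
print(flip_n_write(old, new))  # inverted form needs only 1 flip
```

Both schemes count every flip equally, which is the limitation the following slides address.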

Write Asymmetries
• PCM: the RESET state is more detrimental to endurance than the SET state
• STT-RAM: anti-parallel magnetization is more prone to write errors than parallel magnetization
[Figure: PCM write pulses (power vs. time) for SET ("1") and RESET ("0"); STT-RAM cell stacks (free layer, oxide layer, reference layer) in parallel magnetization ("0") and anti-parallel magnetization ("1").]

Contribution
• Observation: existing schemes fail to exploit the write asymmetry
  ◦ Writing a "0" is 4 times more detrimental to endurance than writing a "1"
  ◦ The number of bit flips is oblivious to the write asymmetry!
[Figure: old and new data words under two encodings, one saving 1 bit flip and the other saving 3 bit flips.]

Contribution (cont.)
• Observation: existing schemes fail to exploit the write asymmetry
  ◦ Focusing solely on the number of bit flips is oblivious to the write asymmetry
• Proposal: move from the concept of "bit flip reduction" to "cost reduction"
• Cost Aware Flip Optimization (CAFO)
  ◦ Cost model: captures the write asymmetry and assigns a cost to a given write operation
  ◦ Coding engine: encodes the write data into a form that results in an overall cost reduction

Cost Model
• Compare the write data to the currently stored data and associate a cost with each cell
  ◦ a: 0 → 1, b: 1 → 0, c: 0 → 0, d: 1 → 1
• The costs "a", "b", "c" and "d" depend on the technology being modeled and the optimization objective (endurance, energy, error rate)
• With a write cost, we can define a gain among different encodings
[Figure: currently stored data, new data, and the per-cell cost of writing, labeled "acdbabdb".]

Gain Calculation
• Cell labels: a: 0 → 1, b: 1 → 0, c: 0 → 0, d: 1 → 1
• Costs: a = 1, b = 2, c = 0, d = 0
• Cost of writing the new data: C = 2a + 3b + 1c + 2d = 8
• Cost of writing the encoded data: C_encoded = 1a + 2b + 2c + 3d = 5
• Gain: G = C - C_encoded = 8 - 5 = 3
• A positive gain implies that it is less costly to write the data encoded
• How to encode the data?
[Figure: currently stored data, new data with cost of writing "acdbabdb", and encoded data with cost of writing "cbadcdad".]
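The cost and gain arithmetic can be sketched in a few lines of Python. The bit vectors below are an assumption: they are reconstructed so that the per-cell labels spell "acdbabdb", and the encoded word is assumed to be the bitwise inverse of the new data, which reproduces the totals of 8 and 5 given on the slide. The function names are hypothetical.

```python
# Label of a cell write, keyed by (stored bit, bit to write).
LABELS = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}

# Cost values used on the gain-calculation slide.
COSTS = {"a": 1, "b": 2, "c": 0, "d": 0}


def cell_labels(old, new):
    """Per-cell transition labels for writing `new` over `old`."""
    return "".join(LABELS[(o, n)] for o, n in zip(old, new))


def write_cost(old, new, costs=COSTS):
    """Total cost of writing `new` over `old` under the given cost table."""
    return sum(costs[label] for label in cell_labels(old, new))


def gain(old, new, encoded, costs=COSTS):
    """Cost saved by writing `encoded` instead of `new`."""
    return write_cost(old, new, costs) - write_cost(old, encoded, costs)


# Assumed bit vectors, reconstructed from the label string "acdbabdb".
old = [0, 0, 1, 1, 0, 1, 1, 1]
new = [1, 0, 1, 0, 1, 0, 1, 0]
encoded = [1 - bit for bit in new]  # assumption: encoding = full inversion

print(cell_labels(old, new))     # acdbabdb
print(write_cost(old, new))      # 8
print(write_cost(old, encoded))  # 5
print(gain(old, new, encoded))   # 3
```

Changing only the `COSTS` table is enough to model a different technology or objective, which is the point of separating the cost model from the coding engine.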

Encoding
• Auxiliary bits serve as inversion flags
• Coding steps:
  1. Compute the gain of every row
  2. Flip all rows with positive gain
  3. Compute the gain of every column
  4. Flip all columns with positive gain
  5. Repeat the process until all rows and columns show a zero or negative gain
• Alternating between row and column flips yields additional cost reduction
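The coding steps can be sketched as a short Python loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the block is a list of rows, each row and each column has one inversion flag, and the cost of updating the flags themselves is ignored. All names and the 3x3 example are hypothetical.

```python
def cell_cost(old_bit, new_bit, costs):
    """Cost of writing new_bit over old_bit (labels a, b, c, d)."""
    label = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}
    return costs[label[(old_bit, new_bit)]]


def line_gain(old_cells, cur_cells, costs):
    """Cost saved by inverting every cell of one row or one column."""
    return sum(cell_cost(o, c, costs) - cell_cost(o, 1 - c, costs)
               for o, c in zip(old_cells, cur_cells))


def cafo_encode(old, new, costs):
    """Alternate row and column flips until no line has a positive gain.

    Returns (encoded_block, row_flags, col_flags). Every flip strictly
    lowers the total cost, so the loop terminates.
    """
    rows, cols = len(new), len(new[0])
    cur = [row[:] for row in new]
    row_flags, col_flags = [0] * rows, [0] * cols
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            if line_gain(old[r], cur[r], costs) > 0:
                cur[r] = [1 - bit for bit in cur[r]]
                row_flags[r] ^= 1
                changed = True
        for c in range(cols):
            old_col = [old[r][c] for r in range(rows)]
            cur_col = [cur[r][c] for r in range(rows)]
            if line_gain(old_col, cur_col, costs) > 0:
                for r in range(rows):
                    cur[r][c] ^= 1
                col_flags[c] ^= 1
                changed = True
    return cur, row_flags, col_flags


# Hypothetical 3x3 block; with a = b = 1 the cost is the number of flips.
costs = {"a": 1, "b": 1, "c": 0, "d": 0}
old = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
new = [[1, 1, 1], [1, 1, 0], [0, 0, 1]]
print(cafo_encode(old, new, costs))
# ([[0, 0, 1], [0, 0, 0], [0, 0, 0]], [1, 1, 0], [0, 0, 1])
```

In this example a plain write costs 6 flips, row flips alone bring it to 2, and the following column flip brings it to 1.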

Encoding example
• Costs: a = 1, b = 1, c = 0, d = 0. A "1" represents a cell that is to be flipped, a "0" a cell that is not.
• Compute the gain of every row and column
• Flip rows with positive gain
• Flip columns with positive gain
• Encoding terminates as no row or column shows a positive gain
[Figure: the block of cells after each step; flip counts on the slide: "... flips", "33 flips".]

Row-only Inversion (FNW)
[Figure: the same block encoded with row-only inversion; flip counts on the slide: "... flips", "25 flips".]

Encoding example (cont.)
• Can we do better?

Encoding Optimization
• Write cost can be further reduced even if no row or column shows a positive gain
  ◦ Flip a row and a column together
• Flipping both a row and a column leaves their intersecting cell un-inverted
• The local gain g_{r+c} of the intersecting cell has to be subtracted from the gain G_r of the row and from the gain G_c of the column
• Gain is achieved if G_r + G_c - 2 g_{r+c} > 0
[Figure: example block with row gain G_r, column gain G_c, and intersecting-cell gain g_{r+c}; flip counts on the slide: "... flips", "3 flips".]
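The joint-flip condition can be checked with a small Python sketch. The function names and the 2x2 example are hypothetical, and auxiliary-bit costs are ignored. The example is chosen so that the row and the column each have zero gain on their own, yet flipping them together saves two flips.

```python
def cell_gain(old_bit, cur_bit, costs):
    """Cost saved by inverting a single cell (its local gain)."""
    label = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}
    return (costs[label[(old_bit, cur_bit)]]
            - costs[label[(old_bit, 1 - cur_bit)]])


def joint_flip_gain(old, cur, r, c, costs):
    """Gain of flipping row r and column c together.

    The intersecting cell is inverted twice and so stays un-inverted; its
    local gain is counted in both G_r and G_c and is subtracted twice.
    """
    g_row = sum(cell_gain(old[r][j], cur[r][j], costs)
                for j in range(len(cur[r])))
    g_col = sum(cell_gain(old[i][c], cur[i][c], costs)
                for i in range(len(cur)))
    g_cell = cell_gain(old[r][c], cur[r][c], costs)
    return g_row + g_col - 2 * g_cell


# Hypothetical 2x2 block; with a = b = 1 the cost is the number of flips.
costs = {"a": 1, "b": 1, "c": 0, "d": 0}
old = [[0, 0], [0, 0]]
cur = [[0, 1], [1, 0]]

# Row 0 and column 0 each have gain 0, the intersecting cell has gain -1.
print(joint_flip_gain(old, cur, 0, 0, costs))  # 0 + 0 - 2 * (-1) = 2
```

After the joint flip the block becomes all zeros, so the two flips of the plain write drop to none, matching the computed gain of 2.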

Encoding Optimization (cont.)
• Generalize to flipping 1 column together with multiple rows (and vice versa)
  ◦ Example: flip 2 rows and 2 columns together
[Figure: example block with gains; flip counts on the slide: "... flips", "4 flips".]

Aux. Bits Cost
• The cost of updating the auxiliary bits can be easily incorporated in the gain calculation
• Cell labels: a: 0 → 1, b: 1 → 0, c: 0 → 0, d: 1 → 1
• Costs: a = 1, b = 2, c = 0, d = 0
• Writing the data un-inverted (old aux bit has to be flipped to "0"): C = 2a + 3b + 2c + d = 8
• Writing the data inverted (old aux bit stays the same): C_inverted = 2a + 2b + 2c + 2d = 6
• Gain: G = C - C_inverted = 8 - 6 = 2
[Figure: currently stored data, new data with cost of writing "acdbabdc" plus aux bit "b", and inverted data with cost of writing "cabdcdba" plus aux bit "d".]
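A short Python sketch shows how the auxiliary bit enters the gain. The bit vectors are an assumption, reconstructed so that the data cells carry the labels "acdbabdc", and the old inversion flag is assumed to be "1". Under these assumptions the totals of 8 and 6 from the slide are reproduced. The function names are hypothetical.

```python
LABELS = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}
COSTS = {"a": 1, "b": 2, "c": 0, "d": 0}


def write_cost(old, new, costs=COSTS):
    """Total cost of writing `new` over `old`."""
    return sum(costs[LABELS[(o, n)]] for o, n in zip(old, new))


def cost_with_aux(old_data, old_flag, data, flag, costs=COSTS):
    """Cost of writing the data cells plus the cost of updating the flag."""
    return write_cost(old_data + [old_flag], data + [flag], costs)


def inversion_gain(old_data, old_flag, new_data, costs=COSTS):
    """Gain of storing the inverted word (flag = 1) over the plain word (flag = 0)."""
    inverted = [1 - bit for bit in new_data]
    plain_cost = cost_with_aux(old_data, old_flag, new_data, 0, costs)
    inverted_cost = cost_with_aux(old_data, old_flag, inverted, 1, costs)
    return plain_cost - inverted_cost


# Assumed bit vectors, reconstructed from the label string "acdbabdc".
old_data = [0, 0, 1, 1, 0, 1, 1, 0]
new_data = [1, 0, 1, 0, 1, 0, 1, 0]
old_flag = 1  # assumption: the stored word is currently marked as inverted

print(cost_with_aux(old_data, old_flag, new_data, 0))  # 8 (flag 1 -> 0 costs b)
print(inversion_gain(old_data, old_flag, new_data))    # 8 - 6 = 2
```

Here the data cells cost the same either way, so the whole gain comes from not having to reset the flag.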

Decoding
• Simple: XOR the corresponding vertical and horizontal aux bits
  ◦ Output of "1": read the cell value inverted
  ◦ Output of "0": read the cell value un-inverted
[Figure: example block being encoded and then decoded.]
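The decoding rule amounts to one XOR per cell, as in this minimal Python sketch. The function name and the example block with its flags are hypothetical.

```python
def cafo_decode(stored, row_flags, col_flags):
    """Recover the original block from stored cells and inversion flags.

    A cell is read inverted exactly when the XOR of its row flag and its
    column flag is 1.
    """
    return [[bit ^ (row_flag ^ col_flag)
             for bit, col_flag in zip(row, col_flags)]
            for row, row_flag in zip(stored, row_flags)]


# Hypothetical stored block with its row and column flags.
stored = [[0, 0, 1],
          [0, 0, 0],
          [0, 0, 0]]
row_flags = [1, 1, 0]
col_flags = [0, 0, 1]

print(cafo_decode(stored, row_flags, col_flags))
# [[1, 1, 1], [1, 1, 0], [0, 0, 1]]
```

A cell whose row and column were both flipped sees the two flags cancel out, which is why it is read un-inverted.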

Evaluation
• Compare against Flip-Min and Flip-N-Write (FNW)
• Experiment with various block sizes of matching space overhead
• Compute the average cost reduction achieved by every scheme relative to differential write
• Experiment with a random input stream and memory traces collected from various SPEC benchmark programs
• Model both PCM and STT-RAM by setting the cost labels to match the underlying technology

Cost Reduction vs. Cost-oblivious FNW and Flip-Min
[Figure: three panels, one per space overhead: 3.125%, 6.25%, and 12.5%.]

Cost Reduction vs. Cost-aware FNW and Flip-Min
• The cost model improves FNW and Flip-Min
[Figure: three panels, one per space overhead: 3.125%, 6.25%, and 12.5%.]

Cost Model Improvement

Optimization Isolation
• At least 15% cost reduction without the encoding optimization

STT-RAM Cost Reduction
• Costs: a = 1, b = 0, c = 0, d = 0
[Figure: three panels, one per space overhead: 3.125%, 6.25%, and 12.5%.]

Benchmark Data
• Costs: a = 1, b = 2, c = 0, d = 0
• Block size: 128B (6.25% overhead)

Conclusion
• Bit flip minimization techniques are oblivious to write asymmetries
• Move from the concept of bit flip minimization to cost reduction
• CAFO
  ◦ Cost model that captures the asymmetry in the write cost
  ◦ 2D encoder that minimizes the overall cost of write operations

Questions?