Project 1: DRAM Timing Violation due to PV
− Due to process variation (PV), the transistor and the capacitor of a DRAM cell may vary in their dimensions, causing the charging (write) time of a cell to vary.
− The situation is becoming worse at smaller technology nodes.
− This threatens yield.
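
To make this concrete, here is a minimal sketch (my own illustration, not from the slides) of how dimension variation turns into charging-time variation, using a simple RC model of a cell write; all device values below are assumed, illustrative numbers.

```python
import math
import random

# Simple RC model of writing a '1' into a DRAM cell: the cell capacitor
# charges through the access transistor's on-resistance. Time to charge the
# capacitor to a fraction `target` of VDD:
#   t = R * C * ln(1 / (1 - target))
# Process variation perturbs the effective R (transistor dimensions/Vth) and
# C (capacitor dimensions), so t varies from cell to cell.

NOMINAL_R = 20e3      # ohms, illustrative on-resistance of the access transistor
NOMINAL_C = 25e-15    # farads, illustrative cell capacitance
TARGET    = 0.9       # charge to 90% of VDD before the write is considered done
SIGMA     = 0.08      # assumed 8% relative variation in R and C due to PV

def charge_time(r, c, target=TARGET):
    """Time for an RC node to reach `target` fraction of its final voltage."""
    return r * c * math.log(1.0 / (1.0 - target))

random.seed(0)
times = []
for _ in range(100_000):
    r = random.gauss(NOMINAL_R, SIGMA * NOMINAL_R)
    c = random.gauss(NOMINAL_C, SIGMA * NOMINAL_C)
    times.append(charge_time(r, c))

nominal = charge_time(NOMINAL_R, NOMINAL_C)
worst = max(times)
print(f"nominal charge time: {nominal * 1e9:.2f} ns")
print(f"worst sampled cell : {worst * 1e9:.2f} ns ({worst / nominal:.2f}x nominal)")
# If the write timing is set from the nominal cell, the slow tail violates it;
# guard-banding for the tail slows every write and hurts yield/performance.
```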

Initial Study

Challenges
− Maintaining yield: how do we overcome slow-to-write cells?
− Naïve solution: there are both fast and slow cells, so can the fast cells balance out the slow ones?
(Figure: distribution data.)

Issues
− We cannot do this at cell granularity: the memory controller would not be able to handle a different write speed for every cell.
− A practical way is to handle it at the row level. The write speed of a row is then determined by its slowest cell. Is that good enough, or do we need a different granularity, say a chunk (sub-row or super-row)?
− Cost of fine granularity: the memory controller needs to bookkeep the speed information, a large hardware overhead.
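
To reason about this trade-off in concrete terms, the following sketch (my own, with made-up cell latencies) computes the effective write latency when a row is tracked at different chunk sizes, each chunk being limited by its slowest cell, together with the bookkeeping the controller would need.

```python
import random

random.seed(1)

ROW_BITS = 8192              # one 1KB DRAM row, illustrative
FAST_NS, SLOW_NS = 10, 15    # assumed write latencies of fast and slow cells
SLOW_FRACTION = 0.02         # assume 2% of cells are slow because of PV

# Ground-truth per-cell write latency for one row (the controller cannot
# realistically track speed at this granularity).
cells = [SLOW_NS if random.random() < SLOW_FRACTION else FAST_NS
         for _ in range(ROW_BITS)]

def effective_latency(cells, chunk_bits):
    """Split the row into chunks; each chunk must be written at the speed of
    its slowest cell. Returns (average chunk latency, entries to track)."""
    chunks = [cells[i:i + chunk_bits] for i in range(0, len(cells), chunk_bits)]
    per_chunk = [max(chunk) for chunk in chunks]
    return sum(per_chunk) / len(per_chunk), len(per_chunk)

for chunk_bits in (ROW_BITS, 1024, 256, 64, 8):
    avg, entries = effective_latency(cells, chunk_bits)
    print(f"chunk = {chunk_bits:5d} bits: avg write latency {avg:5.2f} ns, "
          f"{entries:4d} speed entries per row")

# Whole-row granularity collapses to the slow latency (one slow cell suffices);
# finer chunks recover most of the fast-cell benefit but multiply the metadata
# the memory controller must keep -- this is exactly the tension in Question 1.
```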

Issues (continued)
− Problem of coarse granularity: the whole unit is limited by its slowest cell, so we may not be able to exploit the fast cells.
− Question 1: what is the best granularity at which the memory controller should distinguish different write speeds?

Another Question
− Suggestion: use only a few distinct write times and put memory chunks into bins accordingly.
− Question 2: given a chunk size, how do we decide the bins (i.e., the few write times to use)? One simple binning approach is sketched below.
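
The sketch below is purely an illustration of one possible answer (nothing in the slides prescribes this method): profile each chunk's slowest cell, then cover the measured latencies with a small number k of write-time levels so that a chunk only needs log2(k) bits of state in the controller.

```python
import random

random.seed(2)

# Profiled worst-cell write latency (ns) for each chunk in a rank -- assumed
# numbers: most chunks are fast, with a slow tail caused by process variation.
chunk_latency = [random.gauss(10.0, 0.6)
                 + (random.random() < 0.05) * random.uniform(2.0, 6.0)
                 for _ in range(4096)]

def bin_latencies(latencies, k):
    """Assign k write-time levels. Chunks are sorted by latency and split into
    k equal-sized groups; each group's level is its own worst (max) latency,
    so every chunk is written with a timing safe for its slowest cell."""
    order = sorted(latencies)
    group = len(order) // k
    levels = [max(order[i * group:(i + 1) * group] or order[-1:]) for i in range(k)]
    levels[-1] = order[-1]                        # last level must cover the tail
    assigned = [min(l for l in levels if l >= lat) for lat in latencies]
    return levels, sum(assigned) / len(assigned)

single_level = max(chunk_latency)                 # one timing for everything
for k in (1, 2, 4, 8):
    levels, avg = bin_latencies(chunk_latency, k)
    print(f"k={k}: levels={[f'{l:.1f}' for l in levels]}, "
          f"avg write latency {avg:.2f} ns (vs {single_level:.2f} ns worst-case)")
# Even k=2 or k=4 recovers much of the gap between worst-case timing and
# per-chunk timing, at a cost of only log2(k) bits of state per chunk.
```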

A Reference Reading
Bo Zhao et al., "Variation-Tolerant Non-Uniform 3D Cache Management in Die Stacked Multicore Processor", in MICRO 2009.

Tools You May Need
− DRAMSim2: Paul Rosenfeld, Elliott Cooper-Balis, and Bruce Jacob. "DRAMSim2: A Cycle Accurate Memory System Simulator." IEEE Comput. Archit. Lett., 10(1):16–19, January 2011.
− VARIUS: S. Sarangi, B. Greskamp, R. Teodorescu, J. Nakano, A. Tiwari, and J. Torrellas. "VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects." IEEE Transactions on Semiconductor Manufacturing, 21(1):3–13, 2008.

WoM Encoding for PCM
Slow write operation:
− Writes block reads and cause slowdown.
− SET: long latency (~8x that of a read).
− RESET: short latency (about the same as a read).
(Figure: power over time for the RESET and SET pulses.)
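
Because a line write completes only when its slowest bit completes, a single SET is enough to make the whole write slow. A tiny sketch of that effect (the absolute latencies are assumed; only the roughly 8x SET-to-read ratio comes from the slide):

```python
# Assumed absolute latencies; only the ~8x SET-to-read ratio is from the slide.
READ_NS  = 50
RESET_NS = 50     # RESET is roughly as fast as a read
SET_NS   = 400    # SET is ~8x a read

def write_latency(old_bits, new_bits):
    """A PCM line write must program every bit that changes; the write
    completes when the slowest programmed bit completes, so one 0 -> 1 (SET)
    transition is enough to pay the full SET latency."""
    needs_set   = any(o == 0 and n == 1 for o, n in zip(old_bits, new_bits))
    needs_reset = any(o == 1 and n == 0 for o, n in zip(old_bits, new_bits))
    if needs_set:
        return SET_NS
    if needs_reset:
        return RESET_NS
    return 0

print(write_latency([1, 1, 0, 0], [1, 0, 0, 0]))  # only a RESET -> 50 ns
print(write_latency([1, 1, 0, 0], [1, 0, 1, 0]))  # one SET needed -> 400 ns
```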

PCM Memory PreSET Scheme [Qureshi et al., ISCA 2012]
− Exploits the asymmetry (slow SET vs. fast RESET) in write operations.
− Performs SET ahead of the actual write (proactive-SET): a dirty cache line is proactively SET while it sits in the DRAM cache.
− Only RESETs are performed when the line is actually written back to memory, so the write-back write is fast.
(Figure: a dirty line in the DRAM cache is proactively SET in PCM; the later eviction/write-back needs only RESETs, turning a slow write into a fast one.)
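
A minimal sketch of the PreSET flow as described on the slide (the code structure is my own simplification, not the paper's implementation): the PCM line is SET in the background while its dirty copy sits in the DRAM cache, so the eventual write-back needs only fast RESETs.

```python
LINE_BITS = 16  # tiny line for illustration; a real line is 1024 bits (128B)

class PcmLine:
    def __init__(self):
        self.cells = [0] * LINE_BITS

    def preset(self):
        """Proactive-SET: slowly SET every cell while the line's dirty copy
        is still held in the DRAM cache (off the critical path)."""
        self.cells = [1] * LINE_BITS

    def write_back(self, data):
        """Write-back after PreSET: every cell is already 1, so only fast
        RESETs (1 -> 0) are needed for the zero bits of the new data."""
        assert all(c == 1 for c in self.cells), "line must be PreSET first"
        resets = sum(1 for d in data if d == 0)
        self.cells = list(data)
        return resets  # number of (fast) RESET operations performed

line = PcmLine()
line.preset()                  # issued when the dirty line enters the DRAM cache
data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
print("RESETs at write-back:", line.write_back(data))  # no SETs on the critical path
```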

PreSET Increases the Number of Bit Changes
For a 128B line:
− The baseline sets 91 bits and resets 77 bits.
− PreSET sets 180 bits (a 1.98X increase) and also resets correspondingly more bits.
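
The increase is mechanical: the baseline is a differential write that programs only the bits that change, while PreSET SETs the whole line ahead of time and then RESETs every zero of the new data. A small sketch with random line contents (illustrative, not the workload data behind the 91/77/180 figures):

```python
import random

random.seed(3)
LINE_BITS = 1024   # 128B line

old = [random.randint(0, 1) for _ in range(LINE_BITS)]
new = [random.randint(0, 1) for _ in range(LINE_BITS)]

# Baseline differential write: program only the bits that actually change.
base_sets   = sum(1 for o, n in zip(old, new) if o == 0 and n == 1)
base_resets = sum(1 for o, n in zip(old, new) if o == 1 and n == 0)

# PreSET: SET every cell that is currently 0 (ahead of time), then RESET
# every cell that must hold 0 in the new data.
preset_sets   = sum(1 for o in old if o == 0)
preset_resets = sum(1 for n in new if n == 0)

print(f"baseline: {base_sets} SETs, {base_resets} RESETs")
print(f"PreSET  : {preset_sets} SETs, {preset_resets} RESETs "
      f"({preset_sets / base_sets:.2f}x the SETs)")
# With random data, PreSET performs roughly 2x the SETs of the baseline,
# in line with the ~1.98X increase reported on the slide.
```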

PreSET Overall Effects
Positive:
− Improves performance by 34%.
− Decreases the energy-delay product (EDP) by 24%.
Drawbacks:
− Greatly increases write power (by 225%) and system power (by 30%).
− Impairs the lifetime of PCM by ~60%.
Can we cut down PreSET's power consumption without losing performance?

Write-Once Memory (WoM) Codes
− First introduced for uni-directional write-once memories, where cells can only change 0 → 1 [Rivest & Shamir'82].
− Recently adopted in Flash [Jiang A.'07]:
  – Cuts the number of erasures by half.
  – Improves write performance and lifetime.
(Table: the original WoM code and a WoM code for PCM, each mapping 2-bit data to a 1st-write and a 2nd-write codeword; in the PCM version both writes need RESETs only.)
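
For reference, here is a sketch of the original Rivest-Shamir construction (2 data bits in 3 cells over two writes, cells may only go 0 → 1). The PCM-oriented code on the slide reworks the idea so that both writes need RESETs only; its exact codewords are not reproduced here.

```python
# Original Rivest-Shamir WoM code: 2 data bits stored in 3 memory cells that
# can only go 0 -> 1, yet the cells accept two writes before an erase.
FIRST_WRITE  = {(0, 0): (0, 0, 0), (0, 1): (0, 0, 1),
                (1, 0): (0, 1, 0), (1, 1): (1, 0, 0)}
SECOND_WRITE = {(0, 0): (1, 1, 1), (0, 1): (1, 1, 0),
                (1, 0): (1, 0, 1), (1, 1): (0, 1, 1)}

def decode(cells):
    """Weight <= 1 means a first-generation codeword, otherwise second."""
    table = FIRST_WRITE if sum(cells) <= 1 else SECOND_WRITE
    return next(d for d, cw in table.items() if cw == cells)

def wom_write(cells, data):
    """Return new cell contents; only 0 -> 1 transitions are ever required."""
    if decode(cells) == data:                 # same data: nothing to change
        return cells
    target = FIRST_WRITE[data] if cells == (0, 0, 0) else SECOND_WRITE[data]
    assert all(t >= c for c, t in zip(cells, target)), "would need an erase"
    return target

cells = (0, 0, 0)
cells = wom_write(cells, (1, 0)); print(cells, decode(cells))  # 1st write
cells = wom_write(cells, (0, 1)); print(cells, decode(cells))  # 2nd write, still 0 -> 1 only
```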

WoM-SET
− A proactive-SET based write scheme using a WoM code.
(Figure: bit changes on a memory line over time for the three schemes; Baseline: 3 RESETs, 1 SET; PreSET: 11 RESETs, 9 SETs; WoM-SET: 5 RESETs, 5 SETs.)

Questions to Solve
− What if we just apply WoM codes to the baseline (i.e., without PreSET)? How would that improve (or degrade) the baseline?
− After applying code 1 and code 2, how do we proceed on the third write of a cell?
  – Option 1: write code 1 directly.
  – Option 2: use PreSET and then code 1.
  Which one is better? (A rough cost comparison is sketched below.)
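
The third-write question can be framed as a simple cost comparison. The sketch below is my own framing with assumed per-bit costs (only the relative SET/RESET trends come from the earlier slides); it contrasts the two options for a line whose two WoM generations are used up.

```python
# Assumed per-bit costs: only the relative "SET is much slower and costlier
# than RESET" relation comes from the slides; the numbers are made up.
SET_LATENCY_NS, RESET_LATENCY_NS = 400, 50
SET_ENERGY, RESET_ENERGY = 10.0, 2.0          # arbitrary energy units per bit

def option1_direct_write(set_bits, reset_bits):
    """Option 1: after both WoM generations are consumed, write the
    first-generation code directly. SETs land on the demand (critical) path."""
    latency = SET_LATENCY_NS if set_bits else (RESET_LATENCY_NS if reset_bits else 0)
    energy = set_bits * SET_ENERGY + reset_bits * RESET_ENERGY
    return latency, energy

def option2_preset_then_code1(line_bits, zero_bits_in_new_data):
    """Option 2: PreSET the line off the critical path, then the demand write
    is RESET-only (fast), at the cost of SETting every cell of the line."""
    latency = RESET_LATENCY_NS                # only RESETs are demand-visible
    energy = line_bits * SET_ENERGY + zero_bits_in_new_data * RESET_ENERGY
    return latency, energy

# Illustrative 128B (1024-bit) line; 91 SETs / 77 RESETs echo the baseline
# bit-change numbers quoted on the earlier slide.
print("option 1 (direct)          :", option1_direct_write(91, 77))
print("option 2 (PreSET + code 1) :", option2_preset_then_code1(1024, 512))
# Option 1 keeps energy and power low but pays SET latency on the demand write;
# option 2 keeps the demand write fast but burns SET energy on every cell.
# Which wins depends on how often third writes occur and on the power budget.
```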