1 Lecture 5: Scheduling and Reliability Topics: scheduling policies, handling DRAM errors

2 PAR-BS (Mutlu and Moscibroda, ISCA'08)
• A batch of requests is formed per bank: each thread can contribute at most R requests to the batch, and batched requests have priority over non-batched requests
• Within a batch, priority goes first to row buffer hits, then to threads with a higher "rank", then to older requests
• Rank is computed from each thread's memory intensity; low-intensity threads get higher priority, which improves batch completion time and overall throughput
• Because the same rank is applied across all banks, a thread's requests to different banks tend to be serviced in parallel; hence, parallelism-aware batch scheduling
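A minimal Python sketch of this priority order (field and function names are illustrative, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class Request:
    in_batch: bool       # request is marked as part of the current batch
    row_hit: bool        # request would hit the currently open row
    thread_rank: int     # lower value = higher rank (less memory-intensive thread)
    arrival_time: int    # older requests win the final tie-break

def parbs_priority_key(req: Request):
    """Ordering from the slide: batched requests first, then row-buffer hits,
    then higher-ranked (lower-intensity) threads, then older requests.
    Python sorts ascending, so False and smaller numbers come first."""
    return (not req.in_batch, not req.row_hit, req.thread_rank, req.arrival_time)

# The scheduler issues the highest-priority pending request each time:
queue = [Request(True, False, 2, 10), Request(True, True, 3, 12), Request(False, True, 1, 5)]
print(min(queue, key=parbs_priority_key))  # the batched row-hit request wins
```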

3 TCM (Kim et al., MICRO 2010)
• Threads are organized into latency-sensitive and bandwidth-sensitive clusters based on memory intensity; the latency-sensitive cluster gets higher priority
• Within the bandwidth-sensitive cluster, priority is based on rank
• Rank is determined by each thread's "niceness", and ranks are periodically shuffled with insertion shuffling or random shuffling (the former is used when there is a big gap in niceness)
• Threads with low row buffer hit rates and high bank-level parallelism are considered "nice" to others
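A simplified sketch of the clustering step described above; the parameter names and the 10% bandwidth threshold are illustrative, not taken from the paper:

```python
def form_clusters(mpki, bw_use, cluster_thresh=0.10):
    """Walk threads from lowest to highest memory intensity (MPKI), placing them
    in the latency-sensitive cluster until that cluster consumes roughly
    cluster_thresh of total bandwidth; everyone else is bandwidth-sensitive and
    is ranked by niceness and periodically shuffled."""
    total_bw = sum(bw_use.values())
    latency_sensitive, bw_sensitive, used = [], [], 0.0
    for t in sorted(mpki, key=mpki.get):
        if used + bw_use[t] <= cluster_thresh * total_bw:
            latency_sensitive.append(t)
            used += bw_use[t]
        else:
            bw_sensitive.append(t)
    return latency_sensitive, bw_sensitive

# Illustrative inputs: MPKI and bandwidth share per thread.
mpki = {"A": 0.5, "B": 2.0, "C": 25.0, "D": 40.0}
bw   = {"A": 0.5, "B": 1.5, "C": 30.0, "D": 45.0}
print(form_clusters(mpki, bw))   # (['A', 'B'], ['C', 'D'])
```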

4 Minimalist Open-Page (Kaseridis et al., MICRO 2011)
• Place 4 consecutive cache lines in one bank, the next 4 in a different bank, and so on; this provides the best balance between row buffer locality and bank-level parallelism
• With this mapping, fairness is much less of a concern
• Scheduling first takes priority into account, where priority is determined by wait time, prefetch distance, and MLP within the thread
• A row is precharged after 50 ns, or immediately after a prefetch-dictated large burst
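A minimal sketch of the address interleaving described above, assuming an illustrative 8-bank configuration:

```python
LINES_PER_GROUP = 4   # from the slide: 4 consecutive cache lines share a bank
NUM_BANKS = 8         # illustrative bank count

def map_cache_line(line_addr):
    """Groups of 4 consecutive cache lines map to the same bank (and row);
    successive groups rotate across banks, so modest row-buffer locality is
    kept while sequential streams still spread across banks."""
    group = line_addr // LINES_PER_GROUP
    bank = group % NUM_BANKS
    row = group // NUM_BANKS
    col = line_addr % LINES_PER_GROUP
    return bank, row, col

# A 16-line sequential stream touches banks 0,0,0,0, 1,1,1,1, 2,2,2,2, 3,3,3,3.
print([map_cache_line(a)[0] for a in range(16)])
```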

5 Other Scheduling Ideas
• Using reinforcement learning: Ipek et al., ISCA 2008
• Coordinating across multiple memory controllers: Kim et al., HPCA 2010
• Coordinating requests from the GPU and CPU: Ausavarungnirun et al., ISCA 2012
• Several schedulers in the Memory Scheduling Championship at ISCA 2012
• Predicting the number of row buffer hits: Awasthi et al., PACT 2011

6 Basic Reliability
• Every 64-bit data transfer is accompanied by an 8-bit (Hamming) code, typically stored in an extra x8 DRAM chip
• Guaranteed to detect any two-bit error and recover from any single-bit error (SECDED)
• Such DIMMs are commodities and are sufficient for most applications that are inherently error-tolerant (e.g., search)
• 12.5% overhead in storage and energy
• For a BCH code, correcting t errors in k data bits requires an r-bit code, where r = t * ceil(log2 k) + 1
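A quick sanity check on the formula above; this minimal Python sketch (the function name and loop are illustrative) reproduces the standard SECDED overhead for a 64-bit transfer:

```python
import math

def bch_check_bits(t, k):
    """Rule of thumb from the slide: r = t * ceil(log2 k) + 1 check bits
    to correct t errors in k data bits."""
    return t * math.ceil(math.log2(k)) + 1

# SECDED on a 64-bit transfer: t = 1, k = 64 -> 7 + 1 = 8 check bits,
# i.e., one extra x8 chip and 8/64 = 12.5% storage overhead.
for t in (1, 2):
    r = bch_check_bits(t, 64)
    print(f"t={t}: r={r} bits, overhead={r/64:.1%}")
```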

7 Terminology
• Hard errors: caused by permanent device-level faults
• Soft errors: caused by particle strikes, noise, etc.
• SDC: silent data corruption (the error was never detected)
• DUE: detected unrecoverable error
• A DUE in memory caused by a hard error will typically lead to DIMM replacement
• Scrubbing: a background scan of memory (1 GB every 45 minutes) to detect and correct 1-bit errors

8 Field Studies (Schroeder et al., SIGMETRICS 2009)
• Memory errors are the top cause of hardware failures in servers, and DIMMs are the most frequently replaced server components
• The study examined Google servers using DDR1, DDR2, and FBDIMM parts

9 Field Studies (Schroeder et al., SIGMETRICS 2009) [figure slide]

10 Field Studies (Schroeder et al., SIGMETRICS 2009)
• A machine with past errors is more likely to have future errors
• 20% of DIMMs account for 94% of errors
• DIMMs in platforms C and D see higher uncorrectable-error rates because they do not have chipkill
• 65-80% of uncorrectable errors are preceded by a correctable error in the same month, but predicting a UE is still very difficult
• Chip/DIMM capacity does not strongly influence error rates
• Higher temperature by itself does not cause more errors, but higher system utilization does
• Error rates do increase with age; the increase is steep at first and then flattens out

11 Chipkill
• Chipkill-correct systems can withstand the failure of an entire DRAM chip
• For chipkill correctness, either:
  • the 72-bit word must be spread across 72 DRAM chips, or
  • a 13-bit word (8-bit data and 5-bit ECC) must be spread across 13 DRAM chips
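To see where the 8-bit and 5-bit ECC widths come from, here is a minimal sketch (illustrative, not from the lecture) of the Hamming/SECDED check-bit count; spreading each codeword one bit per chip is what makes a whole-chip failure look like a correctable single-bit error in every word:

```python
def secded_check_bits(data_bits):
    """Smallest Hamming r with 2**r >= data_bits + r + 1, plus one extra
    parity bit for double-error detection (SECDED)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1

# 64 data bits -> 8 check bits: the standard (72, 64) code, one bit per chip
# across 72 x1 chips, so a dead chip contributes one bit error per word.
# 8 data bits  -> 5 check bits: the (13, 8) organization on this slide.
for k in (64, 8):
    print(k, secded_check_bits(k))
```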

12 RAID-like DRAM Designs
• DRAM chips do not have built-in error detection
• Can employ a 9-chip rank with ECC to detect and recover from a single error; for a multi-bit error, rely on a second tier of error correction
• Can also maintain parity across DIMMs (needs an extra DIMM): ECC within a DIMM recovers from 1-bit errors, and parity across DIMMs recovers from multi-bit errors within one DIMM
• Reads are cheap (only 1 DIMM is accessed); writes are expensive (2 DIMMs must each be read and then written), as shown in the sketch below
• Used in some HP servers
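A minimal sketch of the small-write penalty, assuming a RAID-4/5-style parity update (names and values are illustrative): the controller must read the old data and old parity before it can write the new data and new parity.

```python
def raid_style_write(old_data, new_data, old_parity):
    """Compute the new parity for a RAID-4/5-style small write.
    The data DIMM and the parity DIMM are each read (old_data, old_parity)
    and then written (new_data, new parity) -- two DIMMs touched per write,
    versus only one DIMM per read."""
    return bytes(op ^ od ^ nd for op, od, nd in zip(old_parity, old_data, new_data))

# Example: updating one 8-byte word that sits behind a parity DIMM.
old = bytes.fromhex("0011223344556677")
new = bytes.fromhex("00112233445566ff")
parity = bytes(8)  # old parity of the other DIMMs, read from the parity DIMM
print(raid_style_write(old, new, parity).hex())
```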

13 RAID-like DRAM (Udipi et al., ISCA'10)
• Add a checksum to every row in DRAM, verified at the memory controller
• Adds area overhead, but provides self-contained error detection
• When a chip fails, its data can be reconstructed by examining a separate parity DRAM chip
• Overheads can be controlled by keeping one checksum per large row or one parity chip for many data chips
• Writes are again problematic
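A minimal sketch of the two steps, under simplified assumptions (CRC32 stands in for the row checksum and chip contents are modeled as byte strings): a checksum mismatch identifies the failed chip, and its data is rebuilt by XORing the surviving chips with the parity chip.

```python
import zlib
from functools import reduce

def read_with_recovery(chips, checksums, parity):
    """Detect a failed chip via a per-chip checksum, then rebuild its contents
    from the XOR of the parity chip and the surviving data chips."""
    repaired = list(chips)
    for i, chunk in enumerate(chips):
        if zlib.crc32(chunk) != checksums[i]:                 # detection tier
            survivors = [c for j, c in enumerate(chips) if j != i]
            repaired[i] = bytes(reduce(lambda a, b: a ^ b, col)
                                for col in zip(parity, *survivors))  # rebuild tier
    return repaired

# Illustrative 4-chip example: chip 1 returns garbage; parity was computed
# as the bitwise XOR of the original chip contents.
good = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*good))
checksums = [zlib.crc32(c) for c in good]
corrupted = [good[0], b"\xff\xff", good[2], good[3]]
print(read_with_recovery(corrupted, checksums, parity) == good)  # True
```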

14 Virtualized ECC (Yoon and Erez, ASPLOS'10)
• Also builds a two-tier error protection scheme, but implements the second tier in software
• The second-tier codes are stored in the regular physical address space (not in dedicated DRAM chips); software has flexibility in the types of codes used and the types of pages that are protected
• Reads are cheap; writes are expensive as usual; but the second-tier codes can now be cached, which greatly reduces the number of DRAM writes
• Requires a 144-bit datapath (increases overfetch)

15 LoT-ECC (Udipi et al., ISCA 2012)
• Uses checksums to detect errors and parity codes to correct them
• Requires access to only 9 DRAM chips per read, but the storage overhead grows to 26%
