Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation

Presentation transcript:

Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation
Xiangyao Yu (MIT), Christopher Hughes (Intel Labs), Nadathur Satish (Intel Labs), Onur Mutlu (ETH Zürich), Srinivas Devadas (MIT)

Motivation
In-package DRAM offers about 5x the bandwidth of off-package DRAM (> 400 GB/s vs. 90 GB/s) at similar latency, but its capacity is limited (up to 16 GB vs. 384 GB off-package; numbers from Intel Knights Landing). In-package DRAM can therefore be used as a cache for off-package DRAM.

Bandwidth Inefficiency in Existing DRAM Caches (DRAM cache traffic breakdown, coarse- vs. fine-granularity)
Drawback 1: Metadata traffic (e.g., tags, LRU bits, frequency counters).

Banshee Contribution
Bandwidth efficiency as a first-class design constraint: high bandwidth efficiency without degrading latency.

Idea 1: Efficient TLB Coherence for Page-Table-Based DRAM Caches
- Track DRAM cache contents using the page tables and TLBs: each page table entry (and TLB entry) carries, besides the VPN-to-PPN mapping, a Cached bit (1 bit) and a Way field (2 bits, assuming a 4-way set-associative DRAM cache).
- Maintain the latest mapping for recently remapped pages in a Tag Buffer in the memory controller.
- Enforce TLB coherence lazily, when the Tag Buffer fills, to amortize the cost; updating the page tables uses a reverse mapping (find all PTEs that map to a given PPN).

Idea 2: Bandwidth-Aware Cache Replacement
- DRAM cache replacement incurs significant DRAM traffic: replacement traffic plus metadata traffic.
- Limit the cache replacement rate: replace only when the incoming page's frequency counter exceeds the victim page's counter by a threshold.
- Reduce metadata traffic: access the frequency counters for only a randomly sampled fraction of memory accesses. The remaining DRAM traffic is cache hits (64 B), cache misses, limited cache replacements, and sampled frequency-counter accesses.
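Idea 2's replacement rule can be sketched in a few lines of Python. This is a minimal illustration, not Banshee's implementation: `SAMPLE_RATE`, `THRESHOLD`, and the function name `should_replace` are assumptions chosen for exposition; only the threshold test and the sampled counter updates come from the poster.

```python
import random

# Illustrative parameters -- not Banshee's actual values.
SAMPLE_RATE = 0.1   # fraction of accesses that touch the frequency counters
THRESHOLD = 2       # incoming page must be hotter than the victim by this margin

counters = {}       # page number -> sampled access-frequency counter

def should_replace(incoming_page, victim_page, rng=random.random):
    """On a DRAM cache miss, decide whether to replace the victim page.

    Returns True only when the incoming page's counter exceeds the
    victim's by THRESHOLD, which bounds page-sized replacement traffic.
    """
    # Sampled counter update: most misses skip the metadata access
    # entirely, reducing metadata traffic at the cost of approximate counts.
    if rng() < SAMPLE_RATE:
        counters[incoming_page] = counters.get(incoming_page, 0) + 1
    return counters.get(incoming_page, 0) > counters.get(victim_page, 0) + THRESHOLD
```

Under this rule a page that misses only occasionally never accumulates enough counter weight to evict a hot victim, so replacement traffic stays bounded even under a thrashing access pattern.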
Drawback 2 of existing designs: replacement traffic, especially for coarse-granularity (e.g., page-granularity) DRAM cache designs.

Evaluation
- Banshee improves performance by 15% on average over the best previous latency-optimized DRAM cache design (BEAR).
- Banshee reduces in-package DRAM traffic by 36% over the best previous design.
- Banshee reduces off-package DRAM traffic by 3% over the best previous design.
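To make Idea 1's page-table extension concrete, the per-page metadata can be sketched as below. This is a hedged illustration: the names `PTE` and `locate_page` are assumptions for exposition, not Banshee's hardware encoding; only the Cached bit and the 2-bit Way field (for the assumed 4-way set-associative DRAM cache) come from the poster.

```python
from dataclasses import dataclass

DRAM_CACHE_WAYS = 4  # the poster assumes a 4-way set-associative DRAM cache

@dataclass
class PTE:
    ppn: int        # physical page number (usual VPN -> PPN mapping)
    cached: bool    # Cached bit: is this page in the in-package DRAM cache?
    way: int        # Way field (2 bits): which way of its set holds the page

def locate_page(pte: PTE) -> str:
    """Route an access using only the extended PTE, with no tag probe."""
    if pte.cached:
        assert 0 <= pte.way < DRAM_CACHE_WAYS, "way field out of range"
        return f"in-package way {pte.way}"
    return "off-package"

print(locate_page(PTE(ppn=0x1234, cached=True, way=2)))   # in-package way 2
print(locate_page(PTE(ppn=0x1234, cached=False, way=0)))  # off-package
```

Because the mapping lives in the PTE/TLB entry itself, a hit needs no separate tag lookup in the DRAM cache, which is exactly the metadata traffic that Drawback 1 charges against prior designs.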