Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation

Presentation transcript:

Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation
Xiangyao Yu (MIT), Christopher Hughes (Intel Labs), Nadathur Satish (Intel Labs), Onur Mutlu (ETH Zürich), Srinivas Devadas (MIT)

Motivation
In-package DRAM offers about 5x the bandwidth of off-package DRAM (> 400 GB/s vs. 90 GB/s) at similar latency, but its capacity is limited (up to 16 GB vs. 384 GB off-package; numbers from Intel Knights Landing). In-package DRAM can therefore be used as a cache for off-package DRAM.

Bandwidth Inefficiency in Existing DRAM Caches (DRAM cache traffic breakdown, coarse- vs. fine-granularity)
Drawback 1: Metadata traffic (e.g., tags, LRU bits, frequency counters).

Banshee Contribution
Bandwidth efficiency as a first-class design constraint: high bandwidth efficiency without degrading latency.

Idea 1: Efficient TLB Coherence for Page-Table-Based DRAM Caches
- Track DRAM cache contents using the page tables and TLBs: each page table entry (and TLB entry) carries, besides the VPN-to-PPN mapping, a Cached bit (1 bit) and a Way field (2 bits, assuming a 4-way set-associative DRAM cache).
- Maintain the latest mapping for recently remapped pages in a Tag Buffer in the memory controller.
- Enforce TLB coherence lazily, when the Tag Buffer fills, to amortize the cost; updating the page tables uses a reverse mapping (find all PTEs that map to a given PPN).

Idea 2: Bandwidth-Aware Cache Replacement
- DRAM cache replacement incurs significant DRAM traffic: replacement traffic plus metadata traffic.
- Limit the cache replacement rate: replace only when the incoming page's frequency counter exceeds the victim page's counter by a threshold.
- Reduce metadata traffic: access the frequency counters for only a randomly sampled fraction of memory accesses. The remaining DRAM traffic is cache hits (64 B), cache misses, limited cache replacements, and sampled frequency-counter accesses.
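Idea 2's replacement rule can be sketched in a few lines of Python. This is a minimal illustration, not Banshee's implementation: `SAMPLE_RATE`, `THRESHOLD`, and the function name `should_replace` are assumptions chosen for exposition; only the threshold test and the sampled counter updates come from the poster.

```python
import random

# Illustrative parameters -- not Banshee's actual values.
SAMPLE_RATE = 0.1   # fraction of accesses that touch the frequency counters
THRESHOLD = 2       # incoming page must be hotter than the victim by this margin

counters = {}       # page number -> sampled access-frequency counter

def should_replace(incoming_page, victim_page, rng=random.random):
    """On a DRAM cache miss, decide whether to replace the victim page.

    Returns True only when the incoming page's counter exceeds the
    victim's by THRESHOLD, which bounds page-sized replacement traffic.
    """
    # Sampled counter update: most misses skip the metadata access
    # entirely, reducing metadata traffic at the cost of approximate counts.
    if rng() < SAMPLE_RATE:
        counters[incoming_page] = counters.get(incoming_page, 0) + 1
    return counters.get(incoming_page, 0) > counters.get(victim_page, 0) + THRESHOLD
```

Under this rule a page that misses only occasionally never accumulates enough counter weight to evict a hot victim, so replacement traffic stays bounded even under a thrashing access pattern.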
Drawback 2 of existing designs: replacement traffic, especially for coarse-granularity (e.g., page-granularity) DRAM cache designs.

Evaluation
- Banshee improves performance by 15% on average over the best previous latency-optimized DRAM cache design (BEAR).
- Banshee reduces in-package DRAM traffic by 36% over the best previous design.
- Banshee reduces off-package DRAM traffic by 3% over the best previous design.
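To make Idea 1's page-table extension concrete, the per-page metadata can be sketched as below. This is a hedged illustration: the names `PTE` and `locate_page` are assumptions for exposition, not Banshee's hardware encoding; only the Cached bit and the 2-bit Way field (for the assumed 4-way set-associative DRAM cache) come from the poster.

```python
from dataclasses import dataclass

DRAM_CACHE_WAYS = 4  # the poster assumes a 4-way set-associative DRAM cache

@dataclass
class PTE:
    ppn: int        # physical page number (usual VPN -> PPN mapping)
    cached: bool    # Cached bit: is this page in the in-package DRAM cache?
    way: int        # Way field (2 bits): which way of its set holds the page

def locate_page(pte: PTE) -> str:
    """Route an access using only the extended PTE, with no tag probe."""
    if pte.cached:
        assert 0 <= pte.way < DRAM_CACHE_WAYS, "way field out of range"
        return f"in-package way {pte.way}"
    return "off-package"

print(locate_page(PTE(ppn=0x1234, cached=True, way=2)))   # in-package way 2
print(locate_page(PTE(ppn=0x1234, cached=False, way=0)))  # off-package
```

Because the mapping lives in the PTE/TLB entry itself, a hit needs no separate tag lookup in the DRAM cache, which is exactly the metadata traffic that Drawback 1 charges against prior designs.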