Page Overlays: An Enhanced Virtual Memory Framework to Enable Fine-grained Memory Management
Vivek Seshadri et al.

Presentation transcript:

Page Overlays: An Enhanced Virtual Memory Framework to Enable Fine-grained Memory Management
Vivek Seshadri, Gennady Pekhimenko, Olatunji Ruwase, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry, Trishul Chilimbi

Executive Summary
Sub-page memory management has several applications
– More efficient capacity management, protection, metadata, …
Page-granularity virtual memory leads to inefficient implementations
– Low performance and high memory redundancy
Page Overlays: a new virtual memory framework
– Virtual Page → (Physical Page, Overlay)
– The overlay contains new versions of a subset of cache lines
– Efficiently stores pages with mostly similar data
Largely retains the existing virtual memory structure
– Low-cost implementation over existing frameworks
Powerful access semantics enable many applications
– E.g., overlay-on-write, efficient sparse data structure representation
Improves performance and reduces memory redundancy

Existing Virtual Memory Systems
Virtual memory enables many OS functionalities
– Flexible capacity management
– Inter-process data protection and sharing
– Copy-on-write, page flipping
(Figure: page tables map each 4KB virtual page to a 4KB physical page.)

Case Study: Copy-on-Write
(Figure: a virtual page mapped copy-on-write to a shared physical page; a write triggers the following steps.)
1. Allocate a new physical page
2. Copy the entire page
3. Change the mapping in the page tables
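To make the three steps concrete, here is a minimal sketch of a page-granularity copy-on-write fault handler. It is illustrative only: the toy pte structure and the use of aligned_alloc/memcpy stand in for real kernel page allocation and remapping, and are not from the paper.

#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Toy page-table entry: one virtual page mapped to one physical frame. */
struct pte {
    void *frame;     /* physical frame backing the virtual page */
    int   writable;  /* cleared while the page is shared copy-on-write */
};

/* On a write fault to a copy-on-write page, the OS must:
 *   1. allocate a new physical page,
 *   2. copy the entire 4KB page, and
 *   3. change the mapping (plus a TLB shootdown, not shown). */
void cow_write_fault(struct pte *pte)
{
    void *new_frame = aligned_alloc(PAGE_SIZE, PAGE_SIZE); /* 1. allocate */
    memcpy(new_frame, pte->frame, PAGE_SIZE);               /* 2. copy 4KB */
    pte->frame    = new_frame;                               /* 3. remap    */
    pte->writable = 1;
}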

Shortcomings of Page-granularity Management
(Same copy-on-write figure.) Copying the entire page causes high memory redundancy.

Shortcomings of Page-granularity Management
(Same figure.) The 4KB page copy incurs high latency.

Shortcomings of Page-granularity Management
(Same figure.) The TLB shootdown incurs high latency.
Wouldn't it be nice to map pages at a finer granularity (smaller than 4KB)?

Fine-grained Memory Management
– Fine-grained data protection (simpler programs)
– More efficient capacity management (avoid internal fragmentation, deduplication)
– Fine-grained metadata management (better security, efficient software debugging)
– Higher performance (e.g., more efficient copy-on-write)

Goal: Efficient Fine-grained Memory Management
The existing virtual memory framework suffers from low performance and high memory redundancy.
Goal: a new virtual memory framework, Virtual Page → (Physical Page, Overlay), that
– enables efficient fine-grained management
– has low implementation cost

Outline
– Shortcomings of the Existing Framework
– Page Overlays: Overview
– Implementation: Challenges and Solutions
– Applications and Evaluation
– Conclusion

The Page Overlay Framework
A virtual page maps to both a physical page and an overlay. The overlay contains only a subset of the cache lines of the virtual page, and maintains the newer version of each of those cache lines.
Access semantics: cache lines present in the overlay are accessed from the overlay; only cache lines not present in the overlay are accessed from the physical page.
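As a rough sketch of the resulting data structures (our own illustration; the field names and the uncompacted per-line storage are assumptions, not the paper's layout):

#include <stdint.h>

#define LINES_PER_PAGE 64   /* 4KB page / 64B cache lines */
#define LINE_SIZE      64

/* Overlay for one virtual page: a bit vector recording which cache lines
 * the overlay holds, plus storage for those lines (not compacted here). */
struct overlay {
    uint64_t bit_vector;                        /* bit i set => line i is in the overlay */
    uint8_t  lines[LINES_PER_PAGE][LINE_SIZE];
};

/* A virtual page now maps to a (physical page, overlay) pair. */
struct vpage_mapping {
    uint8_t        *physical_page;   /* regular 4KB physical page */
    struct overlay *overlay;         /* NULL if the page has no overlay */
};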

Overlay-on-Write: An Efficient Copy-on-Write
On a write to a copy-on-write page, create an overlay instead of copying the page.
– The overlay contains only the modified cache lines
– Does not require a full page copy
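A minimal sketch of the overlay-on-write path, to contrast with the copy-on-write handler above. It repeats the toy structures from the previous sketch so it stands alone; the function name and the uncompacted overlay storage are our assumptions.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE      4096u
#define LINE_SIZE      64u
#define LINES_PER_PAGE (PAGE_SIZE / LINE_SIZE)

struct overlay {
    uint64_t bit_vector;
    uint8_t  lines[LINES_PER_PAGE][LINE_SIZE];   /* toy: not compacted */
};

struct vpage_mapping {
    uint8_t        *physical_page;
    struct overlay *overlay;
};

/* Overlay-on-write: on a write to a shared page, copy only the written
 * 64B cache line into the overlay instead of copying the whole 4KB page. */
void overlay_on_write(struct vpage_mapping *m, unsigned line)
{
    if (m->overlay == NULL)
        m->overlay = calloc(1, sizeof *m->overlay);           /* allocate an empty overlay */
    memcpy(m->overlay->lines[line],
           m->physical_page + line * LINE_SIZE, LINE_SIZE);   /* copy just this line */
    m->overlay->bit_vector |= 1ull << line;                    /* newest version now in overlay */
}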

Outline (next: Implementation – Challenges and Solutions)

Implementation Overview
Main memory now holds overlays alongside regular physical pages; the virtual address space itself requires no changes. Three implementation challenges follow.

Implementation Challenges
1. Does a cache line belong to the overlay?
2. What is the address/tag of an overlay cache line?
3. How do we keep the TLBs coherent?

Identifying Overlay Cache Lines: Overlay Bit Vector
Challenge 1: does a cache line belong to the overlay?
Solution: an overlay bit vector indicates which cache lines belong to the overlay.
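A small sketch of how such a bit vector might be consulted. The presence check follows directly from the slide; the popcount-based slot computation assumes overlay lines are stored compactly in line order, which is one natural layout and only an assumption here.

#include <stdint.h>

/* Does cache line `line` (0..63 within a 4KB page) have its newest
 * version in the overlay? */
int line_is_in_overlay(uint64_t bit_vector, unsigned line)
{
    return (bit_vector >> line) & 1u;
}

/* If overlay lines are stored compactly in line order (an assumption),
 * the line's slot inside the overlay is the number of overlay lines
 * before it, i.e. the popcount of the lower bits of the bit vector. */
unsigned overlay_slot(uint64_t bit_vector, unsigned line)
{
    uint64_t below = bit_vector & ((1ull << line) - 1);
    return (unsigned)__builtin_popcountll(below);   /* GCC/Clang builtin */
}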

Addressing Overlay Cache Lines: Naïve Approach
Naïve approach: use the location of the overlay in main memory to tag overlay cache lines. Problems:
1. The processor must compute the address
2. Does not work with virtually-indexed caches
3. Complicates overlay cache line insertion

Addressing Overlay Cache Lines: Dual Address Design
Instead, tag overlay cache lines with addresses from an otherwise unused region of the physical address space: the overlay cache address space, the same size as the virtual address space.

Virtual-to-Overlay Mappings
A direct mapping converts a virtual address to its overlay cache address. The Overlay Mapping Table (OMT), maintained by the memory controller, maps overlay cache addresses to the overlays' locations in main memory.
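A sketch of how the two mappings could compose. The OVERLAY_BASE constant, the flat OMT array, and the absence of compaction are all our assumptions for illustration; they are not the paper's exact encoding.

#include <stdint.h>

#define PAGE_SHIFT   12
#define OVERLAY_BASE 0x0001000000000000ull   /* assumed start of the unused physical region */

/* Direct mapping: virtual address -> overlay cache address. */
uint64_t overlay_cache_address(uint64_t vaddr)
{
    return OVERLAY_BASE + vaddr;   /* same page offset, shifted into the overlay region */
}

/* One OMT entry per virtual page: where that page's overlay lives in DRAM. */
struct omt_entry {
    uint64_t overlay_mem_base;
    uint64_t bit_vector;
};

/* Memory-controller-side translation of an overlay cache address to DRAM. */
uint64_t omt_translate(const struct omt_entry *omt, uint64_t ov_cache_addr)
{
    uint64_t vpn    = (ov_cache_addr - OVERLAY_BASE) >> PAGE_SHIFT;
    uint64_t offset = ov_cache_addr & ((1ull << PAGE_SHIFT) - 1);
    return omt[vpn].overlay_mem_base + offset;   /* toy: ignores compaction of overlay lines */
}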

Keeping TLBs Coherent
Challenge 3: how do we keep the TLBs coherent?
Solution: use the cache coherence protocol to keep TLBs coherent.

Final Implementation
(Figure: CPU with a TLB that holds the overlay bit vectors; L1 and last-level caches; a memory controller with an OMT cache; and main memory containing regular physical pages, overlays, and the OMT.)

Other Details in the Paper
– Virtual-to-overlay mapping
– TLB and cache coherence
– OMT management (by the memory controller)
– Hardware cost: 94.5 KB of storage
– OS support

Outline (next: Applications and Evaluation)

Methodology
– Memsim memory system simulator [Seshadri+ PACT 2012]
– Core: 2.67 GHz, single core, out-of-order, 64-entry instruction window
– TLBs: 64-entry L1 TLB, 1024-entry L2 TLB
– Caches: 64KB L1, 512KB L2, 2MB L3
– Prefetcher: multi-entry stream prefetcher [Srinath+ HPCA 2007]
– Memory controller: open row, FR-FCFS, 64-entry write buffer, drain when full
– 64-entry OMT cache
– DRAM: DDR, 1 channel, 1 rank, 8 banks

Overlay-on-Write
(Figure: Copy-on-Write and Overlay-on-Write side by side on a write to a shared page.)
Compared to Copy-on-Write, Overlay-on-Write has lower memory redundancy and lower latency.

Fork Benchmark
The parent process forks, the child idles, and the parent then executes 300 million instructions that write to its data.
– Metrics: additional memory consumption and performance (cycles per instruction)
– Workloads: applications from SPEC CPU 2006 with varying write working sets
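A rough reconstruction of such a fork microbenchmark (our own sketch, not the authors' harness; the working-set size and the line-stride writes are assumptions):

#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define ARRAY_BYTES (64u * 1024 * 1024)   /* assumed write working-set size */

int main(void)
{
    char *data = malloc(ARRAY_BYTES);
    memset(data, 1, ARRAY_BYTES);          /* touch the pages before forking */

    pid_t pid = fork();                    /* all pages become shared, copy-on-write */
    if (pid == 0) {                        /* child idles, keeping the shared mapping alive */
        pause();
        _exit(0);
    }

    /* Parent writes over its working set: each first write to a page costs a
     * full 4KB copy under copy-on-write, but only a 64B line copy (into the
     * overlay) under overlay-on-write. */
    for (size_t i = 0; i < ARRAY_BYTES; i += 64)
        data[i]++;

    kill(pid, SIGKILL);                    /* clean up the idling child */
    wait(NULL);
    free(data);
    return 0;
}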

Overlay-on-Write vs. Copy-on-Write on Fork
Compared to Copy-on-Write, Overlay-on-Write reduces additional memory consumption by 53% and improves performance by 15% on average.

Conclusion
Sub-page memory management has several applications
– More efficient capacity management, protection, metadata, …
Page-granularity virtual memory leads to inefficient implementations
– Low performance and high memory redundancy
Page Overlays: a new virtual memory framework
– Virtual Page → (Physical Page, Overlay)
– The overlay contains new versions of a subset of cache lines
– Efficiently stores pages with mostly similar data
Largely retains the existing virtual memory structure
– Low-cost implementation over existing frameworks
Powerful access semantics enable many applications
– E.g., overlay-on-write, efficient sparse data structure representation
Improves performance and reduces memory redundancy

Page Overlays: An Enhanced Virtual Memory Framework to Enable Fine-grained Memory Management
Vivek Seshadri, Gennady Pekhimenko, Olatunji Ruwase, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry, Trishul Chilimbi