Redundant Memory Mappings for Fast Access to Large Memories

Slides:



Advertisements
Similar presentations
Memory.
Advertisements

Efficient Virtual Memory for Big Memory Servers U Wisc and HP Labs ISCA’13 Architecture Reading Club Summer'131.
4/14/2017 Discussed Earlier segmentation - the process address space is divided into logical pieces called segments. The following are the example of types.
May 7, A Real Problem  What if you wanted to run a program that needs more memory than you have?
OS Memory Addressing.
1 A Real Problem  What if you wanted to run a program that needs more memory than you have?
CS 153 Design of Operating Systems Spring 2015
CS 333 Introduction to Operating Systems Class 12 - Virtual Memory (2) Jonathan Walpole Computer Science Portland State University.
UQC152H3 Advanced OS Memory Management under Linux.
CS 333 Introduction to Operating Systems Class 11 – Virtual Memory (1)
Memory Management. 2 How to create a process? On Unix systems, executable read by loader Compiler: generates one object file per source file Linker: combines.
Chapter 3.2 : Virtual Memory
Chapter 3 Memory Management
Chapter 7: Main Memory CS 170, Fall Memory Management Background Swapping Contiguous Memory Allocation Paging Structure of the Page Table Segmentation.
Silberschatz, Galvin and Gagne  Operating System Concepts Segmentation Memory-management scheme that supports user view of memory. A program.
©UCB CS 161 Ch 7: Memory Hierarchy LECTURE 24 Instructor: L.N. Bhuyan
Chapter 8: Main Memory.
CS 346 – Chapter 8 Main memory –Addressing –Swapping –Allocation and fragmentation –Paging –Segmentation Commitment –Please finish chapter 8.
Address Translation Mechanism of 80386
Operating Systems Chapter 8
Chapter 8 Memory Management Dr. Yingwu Zhu. Outline Background Basic Concepts Memory Allocation.
Page Overlays An Enhanced Virtual Memory Framework to Enable Fine-grained Memory Management Vivek Seshadri Gennady Pekhimenko, Olatunji Ruwase, Onur Mutlu,
CoLT: Coalesced Large-Reach TLBs December 2012 Binh Pham §, Viswanathan Vaidyanathan §, Aamer Jaleel ǂ, Abhishek Bhattacharjee § § Rutgers University ǂ.
1 Chapter 3.2 : Virtual Memory What is virtual memory? What is virtual memory? Virtual memory management schemes Virtual memory management schemes Paging.
Revisiting Hardware-Assisted Page Walks for Virtualized Systems
Chapter 8 – Main Memory (Pgs ). Overview  Everything to do with memory is complicated by the fact that more than 1 program can be in memory.
Chapter 4 Memory Management Virtual Memory.
Accelerating Two-Dimensional Page Walks for Virtualized Systems Jun Ma.
CS399 New Beginnings Jonathan Walpole. Virtual Memory (1)
1 Some Real Problem  What if a program needs more memory than the machine has? —even if individual programs fit in memory, how can we run multiple programs?
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Virtual Memory Hardware.
Operating Systems Unit 7: – Virtual Memory organization Operating Systems.
CSC 360, Instructor Kui Wu Memory Management I: Main Memory.
Virtual Memory Review Goal: give illusion of a large memory Allow many processes to share single memory Strategy Break physical memory up into blocks (pages)
Memory Management Continued Questions answered in this lecture: What is paging? How can segmentation and paging be combined? How can one speed up address.
OS Memory Addressing. Architecture CPU – Processing units – Caches – Interrupt controllers – MMU Memory Interconnect North bridge South bridge PCI, etc.
CS203 – Advanced Computer Architecture Virtual Memory.
Chapter 7: Main Memory CS 170, Fall Program Execution & Memory Management Program execution Swapping Contiguous Memory Allocation Paging Structure.
W4118 Operating Systems Instructor: Junfeng Yang.
Agile Paging: Exceeding the Best of Nested and Shadow Paging
Translation Lookaside Buffer
Lecture 11 Virtual Memory
Virtual Memory Chapter 7.4.
Chang Hyun Park, Taekyung Heo, and Jaehyuk Huh
CSE 120 Principles of Operating
Lecture Topics: 11/19 Paging Page tables Memory protection, validation
Virtual Memory User memory model so far:
CS703 - Advanced Operating Systems
Section 9: Virtual Memory (VM)
143A: Principles of Operating Systems Lecture 6: Address translation (Paging) Anton Burtsev October, 2017.
Address Translation Mechanism of 80386
Mosaic: A GPU Memory Manager
Some Real Problem What if a program needs more memory than the machine has? even if individual programs fit in memory, how can we run multiple programs?
CS510 Operating System Foundations
Energy-Efficient Address Translation
Virtual Memory Partially Adapted from:
Executive Summary Problem: Overheads of virtual memory can be high
CSCI206 - Computer Organization & Programming
Reducing Memory Reference Energy with Opportunistic Virtual Caching
Virtual Memory فصل هشتم.
Translation Lookaside Buffer
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
Lecture 7: Flexible Address Translation
Memory Management CSE451 Andrew Whitaker.
Lecture 8: Efficient Address Translation
Paging and Segmentation
CS703 - Advanced Operating Systems
CSE 542: Operating Systems
CS 444/544 Operating Systems II Virtual Memory Translation
Presentation transcript:

Redundant Memory Mappings for Fast Access to Large Memories Vasileios Karakostas, Jayneel Gandhi, Furkan Ayar, Adrián Cristal, Mark D. Hill, Kathryn S. Mckinley, Mario Nemirovsky, Michael M. Swift, Osman S. Ünsal Provide Title and introduce everyone.

Executive Summary Proposal: Redundant Memory Mappings Result: Problem: Virtual memory overheads are high (up to 41%) Proposal: Redundant Memory Mappings Propose compact representation called range translation Range Translation – arbitrarily large contiguous mapping Effectively cache, manage and facilitate range translations Retain flexibility of 4KB paging Result: Reduces overheads of virtual memory to less than 1%

Outline Motivation  Design: Redundant Memory Mappings Results Virtual Memory Refresher + Key Technology Trends Previous Approaches Goals + Key Observation Design: Redundant Memory Mappings Results Conclusion

Virtual Memory Refresher Virtual Address Space Page Table Physical Memory Process 1 Challenge: How to reduce costly page walks? Process 2 TLB (Translation Lookaside Buffer)

Two Technology Trends TLB reach is limited Year Processor L1 DTLB entries 1999 Pent. III 72 2001 Pent. 4 64 2008 Nehalem 96 2012 IvyBridge 100 2015 Broadwell TLB reach is limited *Inflation-adjusted 2011 USD, from: jcmit.com

0. Page-based Translation Virtual Memory TLB VPN0 PFN0 Benefits + Most flexible + Only 4KB alignment required Disadvantage ─ Requires more TLB entries ─ Limited TLB reach Physical Memory

1. Multipage Mapping Sub-blocked TLB/CoLT Clustered TLB Virtual Memory VPN(0-3) PFN(0-3) Map Bitmap Benefits + Increased the TLB reach by 8X-32X + Clustering of pages allowed Disadvantages ─ Requires size alignment ─ Contiguity required Physical Memory [ASPLOS’94, MICRO’12 and HPCA’14]

2. Large Pages Large Page TLB Virtual Memory VPN0 PFN0 Benefits + Increases TLB reach + 2MB and 1GB page size in x86-64 Disadvantages ─ Size alignment restriction ─ Contiguity required Physical Memory [Transparent Huge Pages and libhugetlbfs]

3. Direct Segments Direct Segment BASE LIMIT Virtual Memory (BASE,LIMIT)  OFFSET OFFSET Benefits + Unlimited reach + No size alignment required Disadvantages ─ One direct segment per program ─ Not transparent to application ─ Applicable to big-memory workloads Physical Memory [ISCA’13 and MICRO’14]

Can we get best of many worlds? Multipage Mapping Large Pages Direct Segments Our Proposal Flexible alignment Arbitrary reach Multiple entries Transparent to applications Applicable to all workloads

Key Observation Virtual Memory Physical Memory

Key Observation Large contiguous regions of virtual memory Code Heap Stack Shared Lib. Virtual Memory Large contiguous regions of virtual memory Limited in number: only a few handful Physical Memory

Compact Representation: Range Translation BASE1 LIMIT1 Virtual Memory OFFSET1 Range Translation 1 Physical Memory Range Translation: is a mapping between contiguous virtual pages mapped to contiguous physical pages with uniform protection

Redundant Memory Mappings Range Translation 3 Virtual Memory Range Translation 2 Range Translation 4 Range Translation 1 Range Translation 5 Physical Memory Map most of process’s virtual address space redundantly with modest number of range translations in addition to page mappings

Outline Motivation Design: Redundant Memory Mappings  Results A. Caching Range Translations B. Managing Range Translations C. Facilitating Range Translations Results Conclusion

A. Caching Range Translations V47 …………. V12 L1 DTLB L2 DTLB Range TLB Enhanced Page Table Walker Page Table Walker P47 …………. P12

A. Caching Range Translations V47 …………. V12 Hit L1 DTLB L2 DTLB Range TLB Enhanced Page Table Walker P47 …………. P12

A. Caching Range Translations V47 …………. V12 Miss L1 DTLB Refill L2 DTLB Range TLB Hit Enhanced Page Table Walker P47 …………. P12

A. Caching Range Translations V47 …………. V12 Refill Miss L1 DTLB L2 DTLB Range TLB Hit Enhanced Page Table Walker P47 …………. P12

A. Caching Range Translations Miss V47 …………. V12 P47 …………. P12 L1 DTLB Range TLB L2 DTLB Hit Refill Entry 1 BASE 1 ≤ > LIMIT 1 OFFSET 1 Protection 1 Entry N BASE N ≤ > LIMIT N OFFSET N Protection N L1 TLB Entry Generator Logic: (Virtual Address + OFFSET) Protection

A. Caching Range Translations V47 …………. V12 Miss L1 DTLB L2 DTLB Range TLB Miss Miss Enhanced Page Table Walker P47 …………. P12

B. Managing Range Translations Stores all the range translations in a OS managed structure Per-process like page-table Range Table CR-RT RTC RTD RTF RTG RTA RTB RTE

B. Managing Range Translations On a L2+Range TLB miss, what structure to walk? A) Page Table B) Range Table C) Both A) and B) D) Either? Is a virtual page part of range? – Not known at a miss

B. Managing Range Translations Redundancy to the rescue One bit in page table entry denotes that page is part of a range 2 1 3 Page Table Walk Application resumes memory access Range Table Walk (Background) CR-3 CR-RT RTC RTD RTF RTG RTA RTB RTE Part of a range Insert into L1 TLB Insert into Range TLB

C. Facilitating Range Translations Demand Paging Virtual Memory Physical Memory Does not facilitate physical page contiguity for range creation

C. Facilitating Range Translations Eager Paging Virtual Memory Physical Memory Allocate physical pages when virtual memory is allocated Increases range sizes  Reduces number of ranges

Outline Motivation Design: Redundant Memory Mappings Results  Methodology Performance Results Virtual Contiguity Conclusion

Methodology Measure cost on page walks on real hardware Intel 12-core Sandy-bridge with 96GB memory 64-entry L1 TLB + 512-entry L2 TLB 4-way associative for 4KB pages 32-entry L1 TLB 4-way associative for 2MB pages Prototype Eager Paging and Emulator in Linux v3.15.5 BadgerTrap for online analysis of TLB misses and emulate Range TLB Linear model to predict performance Workloads Big-memory workloads, SPEC 2006, BioBench, PARSEC

Comparisons 4KB: Baseline using 4KB paging THP: Transparent Huge Pages using 2MB paging [Transparent Huge Pages] CTLB: Clustered TLB with cluster of 8 4KB entries [HPCA’14] DS: Direct Segments [ISCA’13 and MICRO’14] RMM: Our proposal: Redundant Memory Mappings [ISCA’15]

Performance Results Assumptions: CTLB: 512 entry fully-associative RMM: 32 entry fully-associative Both in parallel with L2 Measured using performance counters Modeled based on emulator 5/14 workloads Rest in paper Make this animated and add points to raise.

Overheads of using 4KB pages are very high Performance Results Overheads of using 4KB pages are very high Make this animated and add points to raise.

Clustered TLB works well, but limited by 8x reach Performance Results Clustered TLB works well, but limited by 8x reach Make this animated and add points to raise.

2MB page helps with 512x reach: Overheads not very low Performance Results 2MB page helps with 512x reach: Overheads not very low Make this animated and add points to raise.

Direct Segment perfect for some but not all workloads Performance Results Direct Segment perfect for some but not all workloads Make this animated and add points to raise.

RMM achieves low overheads robustly across all workloads Performance Results RMM achieves low overheads robustly across all workloads Make this animated and add points to raise.

Why low overheads? Virtual Contiguity Benchmark Paging Ideal RMM ranges 4KB + 2MB THP # of ranges #of ranges to cover more than 99% of memory cactusADM 1365 + 333 112 49 canneal 10016 + 359 77 4 graph500 8983 + 35725 86 3 mcf 1737 + 839 55 1 tigr 28299 + 235 16 Only 10s-100s of ranges per application 1000s of TLB entries required Only few ranges for 99% coverage

Summary Proposal: Redundant Memory Mappings Result: Problem: Virtual memory overheads are high Proposal: Redundant Memory Mappings Propose compact representation called range translation Range Translation – arbitrarily large contiguous mapping Effectively cache, manage and facilitate range translations Retain flexibility of 4KB paging Result: Reduces overheads of virtual memory to less than 1%

Questions ?