2012. 06. 13, Miseon Han
Thomas W. Barr, Alan L. Cox, Scott Rixner
Rice Computer Architecture Group, Rice University
ISCA, June 2011

Presentation transcript:

Motivation

Virtual memory
– Performance overhead of 5-14% for ‘typical’ applications [Bhargava08]
– 89% under virtualization [Bhargava08]
– Large pages are not always a good solution

What page size to pick?
– 4KB, 2MB, 1GB on x86
Can’t always use the largest size
– Wasted memory
– Increased I/O traffic
Dynamic page size selection

SpecTLB (Speculative TLB)
– A hardware/software system
Reservation-based physical memory allocator [Talluri94]
– Allocate small pages by default to maintain fine-grained control
Predict small page translations in hardware
– Performance of large pages, control of small pages

Background

Four-level radix-tree page table
Virtual address 0x5c8315cc2016
– Bit fields: [47:39] [38:30] [29:21] [20:12] [11:0]
– Field values: {0b9, 00c, 0ae, 0c2, 016}
– Physical address: {123, 016}
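The field split on this slide can be reproduced with a few shifts and masks. The following is a minimal sketch, assuming the x86-64 layout shown above (four 9-bit indices and a 12-bit page offset); the function name `split_virtual_address` is made up for illustration.

```python
# Field widths follow the slide: four 9-bit indices plus a 12-bit page offset.
INDEX_BITS = 9
OFFSET_BITS = 12
LEVELS = 4


def split_virtual_address(va):
    """Return (L4 index, L3 index, L2 index, L1 index, page offset)."""
    offset = va & ((1 << OFFSET_BITS) - 1)
    indices = []
    # Walk from the top level (bits 47:39) down to the leaf level (bits 20:12).
    for level in range(LEVELS - 1, -1, -1):
        shift = OFFSET_BITS + level * INDEX_BITS
        indices.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    return (*indices, offset)


fields = split_virtual_address(0x5C8315CC2016)
print([format(f, "03x") for f in fields])  # ['0b9', '00c', '0ae', '0c2', '016']
```

Each index selects one of 512 entries in the table at that level; the offset is carried over unchanged into the physical address, which is why {123, 016} ends in the same 016.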

Page table levels describe physical address space at different granularity
– 512GB, 1GB, 2MB, 4KB
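The four granularities follow directly from the table geometry: each level up multiplies the reach of one entry by 512. A short sketch of that arithmetic, assuming 4KB base pages and 512-entry tables (the helper name `coverage_per_entry` is illustrative):

```python
PAGE_SIZE = 4 * 1024      # 4KB base page
ENTRIES_PER_TABLE = 512   # 2**9 entries, one per 9-bit index value


def coverage_per_entry(levels=4):
    """Bytes mapped by one entry at each level, from the leaf (L1) up to the root (L4)."""
    sizes = []
    size = PAGE_SIZE
    for _ in range(levels):
        sizes.append(size)
        size *= ENTRIES_PER_TABLE
    return sizes


for level, size in enumerate(coverage_per_entry(), start=1):
    print(f"L{level} entry maps {size} bytes")
```

The four values are 4KB, 2MB, 1GB and 512GB, matching the slide; the 2MB and 1GB levels are the ones x86 can also use as large pages.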

Reservation-based memory allocation [Talluri94]
– Always allocate small pages at first, tracked in a book-keeping entry
– Place these small pages in a large-page ‘reservation’ if the handler decides that a reservation is needed
– Promote the reservation to a large page when all small pages in the reservation are allocated
– Extended and implemented in FreeBSD [Navarro02]
– Default memory allocator

Handler reserves a 2MB region of physical space

Reservation is ‘promoted’ into a large page.

Reservations may not be filled.
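The reserve, fill, promote sequence on these slides can be sketched as a small bookkeeping class. This is an illustration under assumptions, not the FreeBSD implementation: the class name `Reservation`, its fields, and the frame numbers are hypothetical, and the only policy modeled is "promote when all 512 small pages are allocated".

```python
SMALL_PAGES_PER_LARGE = 512  # 2MB reservation / 4KB small pages


class Reservation:
    """A 2MB-aligned physical region set aside for one 2MB virtual region (hypothetical)."""

    def __init__(self, base_frame):
        self.base_frame = base_frame   # first 4KB frame of the reserved region
        self.allocated = set()         # small-page slots handed out so far
        self.promoted = False

    def allocate(self, slot):
        """Back small page `slot` (0..511) with the matching frame in the region."""
        if not 0 <= slot < SMALL_PAGES_PER_LARGE:
            raise ValueError("slot outside reservation")
        self.allocated.add(slot)
        # Promote only once every small page in the reservation is allocated.
        if len(self.allocated) == SMALL_PAGES_PER_LARGE:
            self.promoted = True
        return self.base_frame + slot


res = Reservation(base_frame=0x8000)
frame = res.allocate(2)   # small page 2 lands in frame 0x8002
print(hex(frame), res.promoted)
```

Because slot `i` always maps to frame `base_frame + i`, a partly filled reservation already has the same layout a large page would have. That regularity is what makes the translation predictable before promotion.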

SpecTLB

TLB-like structure
– Tracks reservations, not actual mappings
– Detect reservations
– Predict translations
– Verify predictions

Virtual Address              Physical Address
{0b9, 00c, 0ae, 002, 313}    {8002, 313}
{0b9, 00c, 0ae, 000, 000}    {8000, 000}
Current Reservations: {8000, 000}

Virtual Address              Physical Address
{0b9, 00c, 0ae, 005, 313}    {8005, 313}?
{0b9, 00c, 0ae, 000, 000}    {8000, 000}
Current Reservations: {8000, 000}
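The guess {8005, 313} comes from adding the address's offset within the reservation to the reservation's physical base. A minimal sketch of that prediction, assuming 2MB reservations; the class `SpecTLBSketch` and its methods are hypothetical and model only the lookup, not the hardware structure described in the talk.

```python
RESERVATION_BITS = 21                       # 2MB reservation
RESERVATION_MASK = (1 << RESERVATION_BITS) - 1


class SpecTLBSketch:
    """Maps reservation virtual bases to physical bases (hypothetical structure)."""

    def __init__(self):
        self.entries = {}

    def track(self, virt_base, phys_base):
        """Record a reservation; both bases are truncated to 2MB alignment."""
        self.entries[virt_base & ~RESERVATION_MASK] = phys_base & ~RESERVATION_MASK

    def predict(self, va):
        """Return a predicted physical address, or None if no reservation covers va."""
        phys_base = self.entries.get(va & ~RESERVATION_MASK)
        if phys_base is None:
            return None
        return phys_base | (va & RESERVATION_MASK)


# Addresses built from the slide's fields: {0b9, 00c, 0ae, 005, 313} and {8000, 000}.
va = (0x0B9 << 39) | (0x00C << 30) | (0x0AE << 21) | (0x005 << 12) | 0x313
spec_tlb = SpecTLBSketch()
spec_tlb.track(virt_base=va & ~RESERVATION_MASK, phys_base=0x8000 << 12)
print(hex(spec_tlb.predict(va)))  # 0x8005313, i.e. {8005, 313}
```

The result is only a prediction: the small page at slot 005 may not have been allocated yet, which is why the slide marks it with a question mark.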

Provides predicted translations for pages within tracked reservations
Predictions may be incorrect
– Page table must still be walked
Page walk can occur in parallel
– Latency hidden
– Speculative translation can be used concurrently
Microarchitecture cancels speculative work if the page walk disproves the prediction
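The predict-then-verify flow can be written down as a small decision function. This is a sequential sketch under assumptions: in hardware the walk overlaps with speculative execution, whereas here the two steps simply run one after the other, and `translate_on_tlb_miss`, `predict`, and `walk_page_table` are hypothetical names.

```python
def translate_on_tlb_miss(va, predict, walk_page_table):
    """Return (physical address, outcome) for one TLB miss."""
    predicted = predict(va)        # available immediately from the reservation
    actual = walk_page_table(va)   # authoritative; overlaps with speculation in hardware
    if predicted is None:
        return actual, "no prediction"
    if predicted == actual:
        return actual, "speculation committed"
    # The walk disagreed, so work done with the predicted address is discarded.
    return actual, "speculation squashed"


# Stub predictor and page walker, for illustration only.
good = translate_on_tlb_miss(0x5313, lambda va: 0x8005313, lambda va: 0x8005313)
bad = translate_on_tlb_miss(0x5313, lambda va: 0x8005313, lambda va: 0x1234313)
print(good[1], "/", bad[1])
```

The walk result is always what gets returned, so a wrong prediction costs the discarded speculative work but never produces a wrong translation.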

Simulation & Results

Full system simulator, unmodified FreeBSD kernel
[Table columns: Benchmark; TLB miss rate (/1k DRAM accesses); Speculative prediction frequency; Prediction accuracy; DRAM accesses overlapped]
[Benchmarks: PostgreSQL, python, SPECjbb, bzip, gcc, mcf, dc.B, ep.C]

SpecTLB and TLB prefetching both hide the latency of TLB misses.
– SpecTLB: uses large-page reservations to handle the current TLB miss.
– TLB prefetcher: uses access patterns to anticipate future TLB misses.
Speculative work
– SpecTLB: instructions execute in parallel with confirmation of the translation.
– TLB prefetcher: prefetches page table entries.

TLB prefetcher generally hides fewer walks than SpecTLB
– Prefetcher does well with high access regularity
[Table columns: Benchmark; TLB miss rate; SpecTLB; TLB Prefetcher]
[Benchmarks: PostgreSQL, python, SPECjbb, bzip, gcc, mcf, dc.B, ep.C]

SpecTLB hides latency of TLB misses
– Predictions allow page walk to occur in parallel with speculative work
– >62% of TLB miss latencies hidden for majority of benchmarks