PRACTICAL, TRANSPARENT OPERATING SYSTEM SUPPORT FOR SUPERPAGES


PRACTICAL, TRANSPARENT OPERATING SYSTEM SUPPORT FOR SUPERPAGES J. Navarro Rice University/Universidad Católica de Chile S. Iyer, P. Druschel, A. Cox Rice University OSDI 2002

Paper Highlights Presents a general, efficient mechanism that lets the OS manage virtual memory pages of different sizes Superpages Without user intervention The main motivation is to address the limitations of existing translation lookaside buffers (TLBs)

THE PROBLEM

The translation lookaside buffer Small, high-speed memory Contains a fixed number of page table entries Content-addressable memory Each entry includes a page number and the matching page frame number

TLB organization Usually fully associative Not always true (see Intel Nehalem) Considerably fewer entries than an L1 cache Speed considerations

Realizations (I) TLB of UltraSPARC III Do not even attempt to memorize this! 64-bit addresses Maximum program size is 16 TB (2^44 bytes) Supported page sizes were 4 KB, 16 KB, 64 KB, and 4 MB ("superpages") External L2 cache had a maximum capacity of 8 MB

Realizations (II) TLB of UltraSPARC III Do not even attempt to memorize this! Dual direct-mapped TLB 64 entries for code pages 64 entries for data pages Each entry occupies 64 bits Page number and page frame number Context Valid bit, dirty bit, …

Realizations (III) Intel Nehalem architecture Do not even attempt to memorize this! Two-level TLB First level has two parts Data TLB has 64 entries for 4 KB pages or 32 for big pages (2 MB/4 MB) Instruction TLB has 128 entries for 4 KB pages and 7 for big pages

Realizations (IV) Second level: unified cache Do not even attempt to memorize this! Can store up to 512 entries Operates only with 4 KB pages

The main problem TLB sizes have not grown with the sizes of main memories Define TLB coverage as the amount of main memory that can be accessed without incurring TLB misses Typically a few megabytes or less Relative TLB coverage is the fraction of main memory that can be accessed without incurring TLB misses

Back to our examples UltraSPARC III With 4 KB pages: (64 + 64) × 4 KB = 512 KB With 16 KB pages: (64 + 64) × 16 KB = 2 MB

Back to our examples Intel Nehalem with 4 KB pages Level 1: (64 + 128) × 4 KB = 768 KB Level 2: 512 × 4 KB = 2 MB
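The coverage figures on these two slides follow directly from multiplying the number of TLB entries by the page size; a quick sketch to verify the arithmetic:

```python
KB = 1024

def tlb_coverage(entries, page_bytes):
    # TLB coverage = number of entries times the bytes each entry maps
    return entries * page_bytes

# UltraSPARC III: 64 code + 64 data entries
assert tlb_coverage(64 + 64, 4 * KB) == 512 * KB
assert tlb_coverage(64 + 64, 16 * KB) == 2 * 1024 * KB   # 2 MB

# Intel Nehalem, level 1: 64 data + 128 instruction entries, 4 KB pages
assert tlb_coverage(64 + 128, 4 * KB) == 768 * KB
# Level 2: 512 unified entries, 4 KB pages
assert tlb_coverage(512, 4 * KB) == 2 * 1024 * KB        # 2 MB
```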

Relative TLB coverage evolution

Consequences Processes with very large working sets incur too many TLB misses "Significant performance penalty" Some machines have L2 caches bigger than their TLB coverage Can have TLB misses for data in L2 cache!

Solutions (I) Increase TLB size: Would increase TLB access time Would slow down memory accesses Increase page sizes: Would increase memory fragmentation Poor utilization of main memory

Solutions (II) Use multiple page sizes: Keep a relatively small "base" page size Say 4 KB Let them coexist with much larger page sizes Superpages Intel Nehalem solution

Hardware limitations (I) Superpage sizes must be supported by the hardware: 4 KB, 16 KB, 64 KB, and 4 MB for UltraSPARC III 4 KB, 2 MB, and 4 MB for Intel Nehalem Ten possible page sizes from 4 KB to 256 MB for Intel Itanium

Hardware limitations (II) Superpages must be contiguous and properly aligned in both virtual and physical address spaces Single TLB entry for each superpage All its base pages must have Same protection attributes Same clean/dirty status Will cause problems
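The alignment constraint can be sketched in a few lines: a superpage of size S must start at an address that is a multiple of S in both the virtual and the physical address space. The function names and the power-of-two check are illustrative, not from the paper:

```python
def is_pow2(n):
    return n > 0 and n & (n - 1) == 0

def can_map_superpage(vaddr, paddr, size):
    # Hardware accepts the mapping only if the size is a supported
    # power of two and both addresses are size-aligned
    return is_pow2(size) and vaddr % size == 0 and paddr % size == 0

MB = 1 << 20
assert can_map_superpage(8 * MB, 4 * MB, 4 * MB)       # both 4 MB-aligned
assert not can_map_superpage(4 * MB, 2 * MB, 4 * MB)   # physical address misaligned
```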

ISSUES AND TRADE-OFFS

Allocation When we bring a page into main memory, we can Put it anywhere in RAM Will need to relocate it to a suitable place when we merge it into a superpage Put it in a location that would let us "grow" a superpage around it: Reservation-based allocation Must pick a maximum size for the potential superpage

Fragmentation control The OS must keep contiguous chunks of memory available at all times The OS will break previous reservation commitments if the superpage is unlikely to materialize Must "treat contiguity as a potentially contended resource"

Promotion Once a sufficient number of base pages within a potential superpage have been allocated, the OS may elect to promote them into a superpage This requires Updating the PTEs of all base pages in the new superpage Bringing the missing base pages into main memory

Promotion Promotion can be incremental Progressively larger and larger superpages (Diagram: a mix of in-use and free base pages being merged into a superpage)
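Incremental promotion can be sketched as stepping through the hardware-supported sizes and promoting only once the region is fully populated at the next-larger size. The sizes are the UltraSPARC III ones from earlier; the function name is illustrative:

```python
SIZES_KB = [4, 16, 64, 4096]  # supported page sizes, smallest to largest

def promote_incrementally(populated_kb):
    """Largest superpage size reachable given a fully populated,
    aligned region of `populated_kb` kilobytes."""
    size = SIZES_KB[0]
    for s in SIZES_KB[1:]:
        if populated_kb >= s:   # region fully populates the next size up
            size = s
        else:
            break
    return size

assert promote_incrementally(4) == 4        # nothing to promote yet
assert promote_incrementally(64) == 64      # promoted 4 -> 16 -> 64
assert promote_incrementally(100) == 64     # not enough yet for 4 MB
```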

Demotion The OS should disband or reduce the size of a superpage whenever some portions of it fall into disuse The main problem is that the OS can only track accesses at the level of the whole superpage

Eviction Not that different from expelling individual base pages Must flush out all base pages of any superpage containing dirty pages The OS cannot ascertain which base pages remain clean

RELATED APPROACHES Many OS kernels use superpages The focus here is on application memory

Reservations Talluri and Hill propose a reservation-based scheme Reservations can be preempted The emphasis is on partial subblocks HP-UX and IRIX Create superpages at page-fault time The user must specify a preferred page size per segment

Page relocation Relocation-based schemes Let base pages reside anywhere in main memory Migrate these pages to a contiguous region of main memory when they find out that superpages are "likely to be beneficial" Disadvantage: the cost of copying base pages Advantage: "more robust to fragmentation"

Hardware support Two proposals Having multiple valid bits in each TLB entry Would allow small superpages to contain missing base pages Partial subblocking (Talluri and Hill) Adding additional level of address translation in memory controller Would "eliminate the contiguity requirement for superpages" (Fang et al.)

DESIGN

Allocation Uses A reservation-based scheme for superpages Assumes a preferred superpage size for a given range of addresses A buddy system to manage main memory Think of the scheme used to manage block fragments in Unix FFS
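The buddy system mentioned above has a convenient property worth sketching: the buddy of a block of 2^k base pages is found by flipping bit k of its starting page number, which makes splitting and coalescing cheap:

```python
def buddy_of(start, order):
    # For a block of 2**order base pages beginning at page `start`
    # (which must itself be 2**order-aligned), the buddy block starts
    # at the page number obtained by flipping bit `order`
    return start ^ (1 << order)

assert buddy_of(0, 2) == 4    # pages [0..3] pair with pages [4..7]
assert buddy_of(4, 2) == 0
assert buddy_of(8, 0) == 9    # single pages 8 and 9 are buddies
```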

Preferred superpage size (I) For fixed-size memory objects, pick the largest aligned superpage that Contains the faulting base page Does not overlap with other superpages or tentative superpages Does not extend over the boundaries of the object

Preferred superpage size (II) For dynamically sized memory objects, pick the largest aligned superpage that Contains the faulting base page Does not overlap with other superpages or tentative superpages Does not exceed the current size of the object
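The size choice on these two slides can be sketched as scanning the supported sizes and keeping the largest one whose aligned region around the faulting page still fits inside the object. Overlap checks with existing reservations are omitted for brevity, and the names are illustrative:

```python
SIZES = [1, 4, 16, 1024]  # superpage sizes in base pages (illustrative)

def preferred_size(fault_page, object_pages):
    """Largest aligned superpage containing `fault_page` that does not
    extend past an object of `object_pages` base pages."""
    best = SIZES[0]
    for s in SIZES:
        start = (fault_page // s) * s   # s-aligned region around the fault
        if start + s <= object_pages:
            best = s
    return best

assert preferred_size(0, 16) == 16
assert preferred_size(5, 8) == 4    # region [4..7] fits, [0..15] does not
assert preferred_size(5, 6) == 1    # even region [4..7] would overrun the object
```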

Fragmentation control Mostly managed by buddy allocator Helped by page replacement daemon Modified BSD daemon is made "contiguity-aware"

Promotion Use incremental promotion Wait until superpage is fully populated Conservative approach

Demotion (I) Incremental demotion Required when A base page of a superpage is expelled from main memory Protection attributes of some base pages are changed

Demotion (II) Speculative demotion Could be done each time a superpage's referenced bit is reset Or when memory becomes scarce Lets the system know which parts of a superpage are still in use

Handling dirty superpages (I) Demote a superpage as soon as one of its base pages is modified Otherwise the whole superpage would have to be flushed out when it is expelled from main memory Because there is one single dirty bit per superpage

Handling dirty superpages (II) When a superpage is modified, the whole superpage is considered dirty We break up the superpage All other base pages remain clean
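The bookkeeping consequence can be sketched as follows: before demotion a single dirty bit covers the whole superpage, while after demoting on the first write only the written base page is dirty (the function name is illustrative):

```python
def demote_on_first_write(n_pages, written_page):
    # Replace the superpage's single dirty bit with per-base-page bits;
    # only the page that was actually written is marked dirty
    return [p == written_page for p in range(n_pages)]

bits = demote_on_first_write(8, 3)
assert sum(bits) == 1 and bits[3]   # one dirty page, the rest stay clean
```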

Multi-list reservation scheme Maintains a separate list for each superpage size supported by the hardware, except the largest Each list contains reserved frames that could still accommodate a superpage of that size Lists are sorted by the time of their most recent page frame allocation Oldest entries are preempted first
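A sketch of the multi-list bookkeeping, using one queue per size kept in order of most recent allocation so the front is always the preemption candidate (the structure and method names are illustrative, not from the paper):

```python
from collections import deque

class ReservationLists:
    """One list per supported size except the largest; reservations whose
    most recent allocation is oldest sit at the front."""
    def __init__(self, sizes):
        self.lists = {s: deque() for s in sizes}

    def note_allocation(self, size, reservation):
        # A frame was just allocated inside `reservation`: move it to the back
        q = self.lists[size]
        if reservation in q:
            q.remove(reservation)
        q.append(reservation)

    def preempt(self, size):
        # Break the reservation whose most recent allocation is oldest
        q = self.lists[size]
        return q.popleft() if q else None

r = ReservationLists([16, 64])
r.note_allocation(16, "A")
r.note_allocation(16, "B")
r.note_allocation(16, "A")      # A was allocated into again, so B is now oldest
assert r.preempt(16) == "B"
```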

Example Consider an area of 8 page frames reserved for a possible superpage Three frames are allocated, five are free Breaking the reservation will free space for A superpage with 4 base pages or Two superpages with two base pages each

Population maps One per memory object Keep track of allocated pages within each object
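A population map can be sketched as a per-object record of which base pages are resident; the paper uses a more compact radix-tree structure, but a simple set is enough to show the idea:

```python
class PopulationMap:
    def __init__(self):
        self.populated = set()   # base-page numbers already allocated

    def allocate(self, page):
        self.populated.add(page)

    def fully_populated(self, start, size):
        # True when every base page of the candidate superpage is resident,
        # i.e. the region is ready for promotion
        return all(p in self.populated for p in range(start, start + size))

m = PopulationMap()
for p in range(4):
    m.allocate(p)
assert m.fully_populated(0, 4)
assert not m.fully_populated(0, 8)
```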

EVALUATION

Benchmarks Thirty-five representative programs running on an Alpha processor Four page sizes: 8 KB, 64 KB, 512 KB and 4 MB Fully associative TLB with 128 entries for code and 128 for data 512 MB of RAM Separate 64 KB code and 64 KB data L1 caches 4 MB unified L2 cache

Results (I) Eighteen of the 35 benchmarks showed improvements over 5 percent Ten of the 35 showed improvements over 25 percent A single application showed a degradation of 1.5 percent The allocator does not distinguish zeroed-out pages from other free pages

Results (II) Different applications benefit most from different superpage sizes Should let system choose among multiple page sizes Contiguity-aware page replacement daemon can maintain enough contiguous regions Huge penalty for not demoting dirty superpages Overheads are small

CONCLUSION It works and does not require any changes to existing hardware