CS 443 Advanced OS – Fabián E. Bustamante, Spring 2005
Virtual Memory Primitives for User Programs
Andrew Appel and Kai Li, Princeton U. Appears in ASPLOS 1991
Presented by: Fabián E. Bustamante

2 Motivation
Uses of virtual memory
 – Traditional – extend the address space by keeping only the frequently-accessed subset in physical memory
 – IPC through shared pages
 – Guarantee re-entrancy by making instruction spaces read-only
 – Zeroed-on-demand or copy-on-write portions of memory
 – …
Modern OSs let user programs play such tricks by allowing handlers to be associated with protection violations
This work
 – Looks at examples of algorithms that use user-level page-protection techniques
 – Benchmarks today's (1991) OS support for such techniques
 – Draws some lessons for OS implementations

3 VM Primitives
OS VM services needed by some of these apps:
TRAP – handle page-fault traps in user mode
PROT1 – decrease the accessibility of 1 page
PROTN – same, for N pages at once
UNPROT – increase the accessibility of 1 page
DIRTY – return the list of pages dirtied since the previous call
MAP2 – map the same physical page at 2 different virtual addresses, with different levels of protection, in the same address space
PAGESIZE – the ability to change (reduce) the virtual page size
PROT1 & PROTN – some OSs may support only one, but most apps protect in batches (batching is not commonly needed for UNPROT)
DIRTY could be emulated in user mode with PROTN, TRAP and UNPROT, but the OS can do it more efficiently
Some apps want one thread to have access to a particular page while other threads fault on it; this can be done w/ MAP2
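A minimal C sketch of how TRAP, PROT1/PROTN and UNPROT can be expressed on a modern POSIX system with mprotect and a SIGSEGV handler; the paper's primitives are abstract OS services that predate these exact interfaces, and names such as handle_fault and protN are illustrative.

/* Sketch: realizing TRAP, PROT1/PROTN and UNPROT with POSIX calls. */
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdint.h>

static long page_size;                       /* PAGESIZE */

static void handle_fault(int sig, siginfo_t *si, void *ctx)
{
    /* TRAP: the faulting address arrives in si->si_addr. */
    uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(page_size - 1);
    /* UNPROT: make the single faulting page accessible again.
     * (mprotect is not formally async-signal-safe, but this is the
     * conventional user-level fault-handling idiom.) */
    mprotect((void *)page, page_size, PROT_READ | PROT_WRITE);
}

static void install_trap_handler(void)
{
    struct sigaction sa = {0};
    sa.sa_sigaction = handle_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
}

static void protN(void *start, size_t npages)    /* PROTN */
{
    mprotect(start, npages * page_size, PROT_NONE);
}

int main(void)
{
    page_size = sysconf(_SC_PAGESIZE);
    install_trap_handler();
    char *region = mmap(NULL, 16 * page_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    protN(region, 16);           /* batch-protect the region */
    region[3 * page_size] = 1;   /* faults; handler unprotects that page */
    return 0;
}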

4 VM Applications
A sample of applications that use VM primitives, from which to draw general conclusions about what user programs require from the OS & HW:
 – Concurrent garbage collection
 – Shared virtual memory
 – Concurrent checkpointing
 – Generational garbage collection
 – Persistent stores
 – Extending addressability
 – Data-compression paging
 – Heap overflow detection

5 VM Apps – Real-time, concurrent GC
Stop-and-copy GC
 – Memory is divided into 2 contiguous regions: from-space & to-space
 – At the beginning of a collection, all objects are in from-space
 – The collector, starting from registers & other global roots, traces out the graph of reachable objects, copying them into to-space
 – When done, what's left in from-space is garbage
 – At that point, the roles of from- & to-space are reversed (flip)
 – The mutator (the application program) then runs until to-space is full
Forwarding – examining a pointer into from-space, copying the referenced object if necessary & updating the pointer (sketched after this slide)
Obvious problem – long delay while the mutator is suspended
Real-time GC – the mutator is never interrupted for longer than a very small constant time
Baker's algorithm (real-time, sequential) – the mutator sees only to-space pointers
 – Checking every pointer fetch requires HW support to be efficient
 – Must be sequential to avoid conflicting access to objects
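A minimal sketch of the forwarding operation, assuming a conventional object layout in which a header word doubles as the forwarding pointer; the layout and names are illustrative, not the paper's.

/* Sketch of forwarding in a stop-and-copy collector. */
#include <stddef.h>
#include <string.h>

typedef struct object {
    struct object *forward;   /* NULL until the object has been copied */
    size_t         size;      /* total size in bytes, header included */
    /* ... fields ... */
} object;

static char *to_free;         /* allocation pointer in to-space */

static object *forward(object *p)
{
    if (p->forward == NULL) {             /* not yet copied */
        object *copy = (object *)to_free;
        memcpy(copy, p, p->size);         /* copy into to-space */
        to_free += p->size;
        p->forward = copy;                /* leave forwarding pointer behind */
    }
    return p->forward;                    /* always a to-space address */
}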

6 Real-time, concurrent GC
Instead of checking every pointer fetched from memory, the collector uses VM page protection
Pages in the unscanned area are set to "no access"
When the mutator tries to access an unscanned object, it gets a page-access trap
The collector fields the trap
 – Scans the objects in the page, copying from-space objects & forwarding pointers as necessary
 – Unprotects the page & resumes the mutator
The collector also runs concurrently, scanning pages & unprotecting them as it goes (to reduce the mutator's page-access traps)
Algorithm requires
 – TRAP – to detect fetches from the unscanned area
 – PROTN – to mark the entire to-space inaccessible at flip time
 – UNPROT – as each page is scanned
 – MAP2 – so the collector can scan a page while the mutator can't access it
 – A smaller PAGESIZE also helps, since the work per trap is proportional to page size
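A sketch of the mutator-side trap handler in such a collector, assuming POSIX signals; scan_page and gc_view (the MAP2 alias through which the collector reads the still-protected page) are hypothetical helpers.

/* Sketch of the trap handler in a concurrent copying GC. */
#include <signal.h>
#include <sys/mman.h>
#include <stdint.h>

extern long page_size;
extern void scan_page(void *gc_alias_of_page);   /* forwards every pointer */
extern void *gc_view(void *mutator_page);        /* MAP2: writable alias */

static void gc_fault(int sig, siginfo_t *si, void *ctx)
{
    void *page = (void *)((uintptr_t)si->si_addr &
                          ~(uintptr_t)(page_size - 1));
    scan_page(gc_view(page));   /* copy/forward objects on this page */
    /* UNPROT: the page now holds only to-space pointers; the mutator
     * resumes at the faulting instruction. */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);
}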

7 Shared virtual memory
Another use – implement shared VM on a network of computers
Basic idea – use the paging mechanism to control & maintain single-writer/multiple-reader coherence at page granularity
All nodes see a coherent shared memory address space, as big as the MMU allows
Read-only pages may be replicated on 1+ nodes; writable pages live on only one node
Each memory mapping manager views its local memory as a big cache of the SVM address space
A memory reference causes a page fault when the page is not in the node's physical memory
Algorithm requires
 – TRAP, PROT1, UNPROT, MAP2 and maybe PAGESIZE
[Figure: several nodes, each with a CPU, memory, and a mapping manager, sharing one shared virtual memory]
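A sketch of one node's read-fault path under this scheme, assuming POSIX mprotect and a SIGSEGV handler; owner_of and fetch_page_from_owner stand in for the coherence protocol's directory lookup and page-transfer messages.

/* Sketch of a read fault in a page-granularity SVM system. */
#include <signal.h>
#include <sys/mman.h>
#include <stdint.h>

extern long page_size;
extern int  owner_of(void *page);                         /* directory lookup */
extern void fetch_page_from_owner(int owner, void *page); /* copy bytes in */

static void svm_fault(int sig, siginfo_t *si, void *ctx)
{
    void *page = (void *)((uintptr_t)si->si_addr &
                          ~(uintptr_t)(page_size - 1));
    /* Temporarily allow local writes so the handler can install the copy. */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);
    fetch_page_from_owner(owner_of(page), page);
    /* Replicated readers keep the page read-only; a later write fault
     * would upgrade it through the write-invalidate protocol. */
    mprotect(page, page_size, PROT_READ);
}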

8 Concurrent checkpointing
Idea – use VM mechanisms to make checkpointing concurrent & real-time
Instead of saving the writable main memory to disk all at once:
 – Set the whole address space to read-only
 – Restart the program's threads
 – A copying thread sequentially copies pages to a separate virtual address space as it goes
 – When it is done copying, the address space is back to read/write
While the program makes read references – no problem
Write attempt to a not-yet-copied page
 – Page fault; the copying thread immediately copies the page, sets its access to read/write, & restarts the faulting thread
Algorithm requires
 – TRAP, PROT1, PROTN, UNPROT, and DIRTY; a medium PAGESIZE may be appropriate
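A sketch of the two halves of this scheme, protecting in one batch (PROTN) and copying on demand from the fault handler; save_page is a placeholder for appending a page to the checkpoint image.

/* Sketch of concurrent, copy-on-write-style checkpointing. */
#include <signal.h>
#include <sys/mman.h>
#include <stddef.h>
#include <stdint.h>

extern long page_size;
extern void save_page(void *page);            /* copy page to checkpoint area */

void start_checkpoint(void *heap, size_t npages)
{
    /* PROTN: make the whole writable region read-only in one batch. */
    mprotect(heap, npages * page_size, PROT_READ);
    /* ... a background thread now walks the region, calling save_page()
     *     and mprotect(page, page_size, PROT_READ|PROT_WRITE) as it goes. */
}

static void ckpt_fault(int sig, siginfo_t *si, void *ctx)
{
    void *page = (void *)((uintptr_t)si->si_addr &
                          ~(uintptr_t)(page_size - 1));
    save_page(page);                                       /* copy it out first */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);     /* UNPROT, resume */
}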

9 Generational GC
An efficient GC algorithm that depends on properties of dynamically allocated records in LISP & similar languages
 – Younger records are much more likely to die soon than older ones
 – Younger records tend to point to older records
Allocated records are kept in distinct areas G_i of memory (generations)
 – Records in G_i are older than records in G_i+1
Idea – use VM to detect pointers from older generations to newer ones
 – Use DIRTY if available (the GC examines dirty pages)
 – Otherwise, write-protect the older generations
 – The trap handler then saves the faulting address on a list for the GC
 – At GC time, the collector scans the listed pages for possible pointers into the youngest generation
Algorithm requires
 – DIRTY, or TRAP, PROTN, and UNPROT
 – A smaller PAGESIZE may be good, as the time spent scanning at GC time depends on page size
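A sketch of a page-protection write barrier that emulates DIRTY, assuming POSIX calls; remembered_pages is an illustrative data structure scanned by the collector at GC time.

/* Sketch of a page-protection write barrier for generational GC. */
#include <signal.h>
#include <sys/mman.h>
#include <stddef.h>
#include <stdint.h>

extern long page_size;

#define MAX_REMEMBERED 4096
static void *remembered_pages[MAX_REMEMBERED];   /* scanned at GC time */
static int   n_remembered;                       /* no overflow check: sketch */

void protect_old_generations(void *old_gen, size_t npages)
{
    mprotect(old_gen, npages * page_size, PROT_READ);       /* PROTN */
}

static void write_barrier_fault(int sig, siginfo_t *si, void *ctx)
{
    void *page = (void *)((uintptr_t)si->si_addr &
                          ~(uintptr_t)(page_size - 1));
    remembered_pages[n_remembered++] = page;                /* emulate DIRTY */
    mprotect(page, page_size, PROT_READ | PROT_WRITE);      /* UNPROT */
}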

10 Persistent stores
Persistent store – a dynamically allocated heap that persists between program invocations
A program execution may traverse the store, modify it, and commit and/or abort its modifications
Traversals should be as fast as for in-core data
Done through VM – the persistent store is roughly a memory-mapped disk file
However, the permanent image of a modified object must not change until the commit
GC can be used to improve efficiency – recovering pages, collocating related objects, etc.
Algorithm requires
 – TRAP and UNPROT, and file mapping with copy-on-write
 – If copy-on-write is not available, simulate it w/ PROTN, UNPROT and MAP2
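A sketch of the mapping side only, using a private (copy-on-write) file mapping so that uncommitted modifications never reach the permanent image; the commit/abort machinery is omitted, and the path and error handling are illustrative.

/* Sketch of opening a persistent store as a copy-on-write file mapping. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

void *open_store(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    /* MAP_PRIVATE: reads go to the file image, writes stay private to this
     * run until an explicit commit rewrites the file through a log. */
    void *base = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    *len_out = st.st_size;
    return base;
}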

11 Extending addressability
A persistent store might grow beyond 2^32 objects – a problem for a 32-bit machine
However, in any one run a program will probably access < 2^32 objects
Idea –
 – Use the disk as a second stage; disk pages use 64-bit addresses
 – When a disk page is brought into memory, translate its addresses from 64 to 32 bits w/ a translation table
 – One translation table per session
Algorithm requires
 – TRAP, UNPROT, PROT1 or PROTN
 – In a multithreaded environment, MAP2
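A sketch of the 64-to-32-bit translation performed when a disk page of pointers is brought into core; the table layout, the linear lookup, and reserve_core_range are illustrative placeholders.

/* Sketch of per-session pointer translation for extending addressability. */
#include <stdint.h>
#include <stddef.h>

typedef struct { uint64_t disk_addr; uint32_t core_addr; } xlate_entry;

extern xlate_entry table[];        /* per-session translation table */
extern size_t      table_len;
extern uint32_t    reserve_core_range(uint64_t disk_addr);  /* new entry */

static uint32_t to_core(uint64_t disk_addr)
{
    for (size_t i = 0; i < table_len; i++)      /* linear scan: sketch only */
        if (table[i].disk_addr == disk_addr)
            return table[i].core_addr;
    return reserve_core_range(disk_addr);       /* first reference this run */
}

/* Called by the fault handler after reading in a page of 64-bit pointers. */
void translate_page(const uint64_t *wide, uint32_t *narrow, size_t nptrs)
{
    for (size_t i = 0; i < nptrs; i++)
        narrow[i] = to_core(wide[i]);
}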

12 Data-compression paging
In a typical linked data structure, many words point to nearby objects and others are nil … basically, the average word has low entropy
A GC can reduce the entropy further by placing objects that point to each other close together
Idea – compress a page instead of paging it out (decompressing it later may be cheaper than paging it back in from disk)
Algorithm requires
 – TRAP, PROT1 (or PROTN), UNPROT
 – OS support (?) to determine which pages have not been recently used
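A sketch of compressing a page in place of paging it out; zlib is used purely for illustration (the original work predates it), and returning the physical frame to the OS is only indicated in a comment.

/* Sketch of "paging out" to a compressed in-memory copy. */
#include <sys/mman.h>
#include <stdlib.h>
#include <zlib.h>

extern long page_size;

struct compressed_page { void *vaddr; unsigned char *data; uLongf len; };

struct compressed_page page_out(void *page)
{
    struct compressed_page cp = { page, NULL, compressBound(page_size) };
    cp.data = malloc(cp.len);
    compress(cp.data, &cp.len, page, page_size);   /* keep only this copy */
    mprotect(page, page_size, PROT_NONE);          /* PROT1: evict the page */
    /* A real system would now release the physical frame,
     * e.g. with madvise(MADV_DONTNEED). */
    return cp;
}

void page_in(struct compressed_page *cp)           /* run from the TRAP handler */
{
    uLongf len = page_size;
    mprotect(cp->vaddr, page_size, PROT_READ | PROT_WRITE);   /* UNPROT */
    uncompress(cp->vaddr, &len, cp->data, cp->len);
    free(cp->data);
}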

13 Heap overflow detection
A process's or thread's stack requires protection against overflow
A well-known technique – mark the pages above the top of the stack as invalid → an access past the top causes a page fault
Most implementations of Unix use this to avoid allocating stack pages until they are used; it requires
 – TRAP, PROTN, UNPROT
A similar technique can be used to detect heap overflow in a garbage-collected system – here the allocated regions are commonly small, so the performance of the trap matters
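A sketch of guard-page overflow detection using a protected page just past the end of the heap; grow_heap is a placeholder for whatever the runtime does on overflow (e.g. trigger a GC or map more pages).

/* Sketch of guard-page heap-overflow detection. */
#include <signal.h>
#include <sys/mman.h>
#include <stdint.h>

extern long  page_size;
extern void  grow_heap(void);          /* e.g. collect or extend the heap */
static void *guard_page;

void install_guard(void *heap_end)
{
    guard_page = heap_end;
    mprotect(guard_page, page_size, PROT_NONE);
}

static void overflow_fault(int sig, siginfo_t *si, void *ctx)
{
    void *page = (void *)((uintptr_t)si->si_addr &
                          ~(uintptr_t)(page_size - 1));
    if (page == guard_page)
        grow_heap();                   /* extend the heap and move the guard */
    /* otherwise: a genuine fault; fall through to default handling */
}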

14 Usage of VM system services

Method                     TRAP  PROT1  PROTN  UNPROT  MAP2  DIRTY  PAGESIZE
Concurrent GC               √             √      √       √            √
SVM                         √      √             √       √            √
Concurrent checkpointing    √             √      √              ‡     √
Generational GC             √             √      √              ‡     √
Persistent store            √             √      √       √
Extending addressability    √      *      *      √       √            √
Data-compression paging     √      *      *      √                    √
Heap overflow detection     √             †      √

* Extending addressability and data-compression paging use PROT1 only to remove inactive pages; the batching technique described in Sec. 5 could be used instead
† VM-based heap-overflow detection can be used even w/o explicit memory-protection primitives, as long as there is a usable boundary between accessible and inaccessible memory
‡ Dirty-page bookkeeping can be simulated using PROTN, TRAP and UNPROT

15 VM primitive performance
Two classes of algorithms
 – Protect pages in large batches; upon each page-fault trap, unprotect 1 page
 – Protect a page and later unprotect it, one page at a time
Since PROTN (or PROT1), TRAP and UNPROT are always used together, they are measured together (if one is slow, everything is)
Two microbenchmarks
 – Sum of PROT1 + TRAP + UNPROT, x 100
    Access a random protected page
    In the fault handler, protect 1 other page & unprotect the faulting page
 – Sum of PROTN + TRAP + UNPROT, x 100
    Protect 100 pages, then access each in a random sequence
    In the fault handler, unprotect the faulting page
Other measurements
 – Time for a single instruction (ADD) – 20-instruction loops w/ 18 ADDs
 – Time for a trap handler that does not change memory protection
Three OSs: Ultrix, SunOS, Mach. Five architectures: Sun 3/60, SparcStn 1, DEC 3100, µVax 3, & i386 on iPSC/2
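A sketch of the prot1+trap+unprot microbenchmark loop, assuming POSIX mprotect and SIGSEGV; timing, randomization of the access sequence, and error handling are omitted.

/* Sketch of the PROT1 + TRAP + UNPROT microbenchmark: each touch of the
 * protected page pays for one trap, one unprotect and one protect. */
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdint.h>

#define NPAGES 100
static long         page_size;
static char        *region;
static volatile int victim;               /* index of the protected page */

static void bench_fault(int sig, siginfo_t *si, void *ctx)
{
    void *page = (void *)((uintptr_t)si->si_addr &
                          ~(uintptr_t)(page_size - 1));
    mprotect(page, page_size, PROT_READ | PROT_WRITE);            /* UNPROT */
    victim = (victim + 1) % NPAGES;                               /* pick next */
    mprotect(region + victim * page_size, page_size, PROT_NONE);  /* PROT1 */
}

int main(void)
{
    page_size = sysconf(_SC_PAGESIZE);
    struct sigaction sa = {0};
    sa.sa_sigaction = bench_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    region = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    mprotect(region + victim * page_size, page_size, PROT_NONE);
    for (int i = 0; i < 100; i++)
        region[victim * page_size] += 1;  /* touches the protected page: TRAP */
    return 0;
}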

16 Microbenchmarks results

Machine          OS                                                     MAP2  PAGESIZE
Sun 3/60         SunOS 4.0, SunOS 4.1, Mach 2.5(xp), Mach 2.5(exc)      Yes   8192
SparcStn 1       SunOS 4.0.3c, SunOS 4.1, Mach 2.5(xp), Mach 2.5(exc)   Yes   4096
DEC 3100         Ultrix 4.1, Mach 2.5(xp), Mach 2.5(exc)                No    4096
µVax 3           Ultrix                                                 No    1024
i386 on iPSC/2   NX/2                                                   Yes   4096

Per-machine/OS timings for ADD, TRAP, TRAP + PROT1 + UNPROT and TRAP + PROTN + UNPROT range from a best case to a worst case and are quite different between architectures.

17 System design issues
Lessons on hardware and OS design
TLB consistency
 – Many of the algorithms make memory less accessible in large batches & more accessible 1 page at a time
 – A good thing, especially on a multiprocessor
 – When a page is made less accessible, outdated information in TLBs is dangerous – the TLBs must be flushed (shootdown – interrupt the other processors & request a flush)
 – Software shootdown is expensive, but it can be batched
Optimal page size
 – Page sizes are traditionally big, given disk overhead, …
 – For user-handled faults processed entirely in the CPU, smaller is better
 – The effect of varying page size can be obtained on HW w/ a small page size – use the small size for PROT & UNPROT, and multi-page blocks for disk transfers

18 System design issues
Access to protected pages
 – Many algorithms need a way for a user-mode service routine to access a page while client threads have no access
 – Several ways to do this (illustrated w/ concurrent GC):
    Multiple mappings of the same page at different addresses (MAP2)
    A system call to copy memory to/from a protected area ($$$$ memory copies)
    Pages shared between processes, with the collector running as a different process ($$ context switches)
    The collector running inside the kernel – not the best place for a GC
 – Best option – multiple mapping, at some extra cost: two page-table entries per physical page and potential cache inconsistency (one mapping may be stale)
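A sketch of emulating MAP2 with two mappings of one shared-memory object: one view for the mutator (protection raised and lowered as needed) and one always-writable view for the collector; the object name /gc-heap is illustrative.

/* Sketch: two virtual views of the same physical pages (MAP2 emulation). */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

void map2(size_t len, void **mutator_view, void **collector_view)
{
    int fd = shm_open("/gc-heap", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, len);
    /* Two virtual ranges, one physical object, different protections. */
    *mutator_view   = mmap(NULL, len, PROT_NONE,
                           MAP_SHARED, fd, 0);
    *collector_view = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
    close(fd);
    shm_unlink("/gc-heap");
}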

19 System design issues
Is this too much to ask?
 – Synchronous fault handling may be problematic on highly pipelined machines (unless there is some hardware support)
    Instructions may be halfway done, with results already written into registers
    Possible addressing-mode side effects
 – However, all but the heap-overflow-detection algorithm are sufficiently asynchronous – from the machine's point of view they behave like a traditional disk pager
Other primitives
 – For a persistent store w/ transactions – pin a page
 – External-pager interface – the OS can tell the client which pages are least recently used and about to be paged out

20 Conclusions
VM is not just a tool for implementing large address spaces & protecting one user process from another
Several algorithms rely on VM primitives … but these primitives haven't received enough attention
Common traits of the surveyed algorithms
 – Memory is made less accessible in large batches and more accessible one page at a time
 – Fault handling is done almost entirely by the CPU & takes time proportional to the page size
 – A page fault results in the faulting page being made more accessible
 – The frequency of faults is inversely related to the locality of reference of the client program – so the algorithms should scale well
 – User-mode service routines need to access pages that are protected from user-mode client routines
 – But they don't need to examine the client's CPU state