Resource Management Policy and Mechanism
Jeff Chase, Duke University

The kernel
– Entry and exit: syscall trap/return, fault/return, interrupt/return.
– System call layer: files, processes, IPC, thread syscalls.
– Fault entry: VM page faults, signals, etc.
– Interrupts: I/O completions, timer ticks.
– Thread/CPU/core management: sleep and ready queues (sleep queue, ready queue).
– Memory management: block/page cache.
– Policy.

Separation of policy and mechanism
Every OS platform has mechanisms that enable it to mediate access to machine resources.
– Gain control of core by timer interrupts
– Fault on access to non-resident virtual memory
– I/O through system call traps
– Internal code and data structures to track resource usage and allocate resources
The mechanisms enable resource management policy.
But the mechanisms do not and must/should not determine the policy.
We might want to change the policy!
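
The separation can be sketched in a few lines of Java. This is only an illustration, not code from the course: the names `PolicyDemo`, `Task`, `Policy`, and `Dispatcher` are hypothetical. The dispatcher is the mechanism (it keeps the ready queue and hands out the core), while the choice of which task runs next is delegated to a policy object that can be swapped at run time.

```java
import java.util.ArrayList;
import java.util.List;

final class PolicyDemo {
    /** A ready thread, reduced to a name and a priority (illustrative fields). */
    static final class Task {
        final String name;
        final int priority;
        Task(String name, int priority) { this.name = name; this.priority = priority; }
    }

    /** Policy: chooses which ready task runs next. */
    interface Policy {
        int choose(List<Task> ready); // index of the chosen task
    }

    /** Mechanism: keeps the ready queue and dispatches; it never decides who runs. */
    static final class Dispatcher {
        private final List<Task> ready = new ArrayList<>();
        private Policy policy;

        Dispatcher(Policy policy) { this.policy = policy; }
        void setPolicy(Policy policy) { this.policy = policy; } // new policy, same mechanism
        void makeReady(Task t) { ready.add(t); }

        /** Called when the kernel regains the core, e.g. on a timer interrupt. */
        Task dispatch() {
            if (ready.isEmpty()) return null;
            return ready.remove(policy.choose(ready));
        }
    }

    /** Two interchangeable policies. */
    static final Policy FIFO = ready -> 0;
    static final Policy HIGHEST_PRIORITY = ready -> {
        int best = 0;
        for (int i = 1; i < ready.size(); i++) {
            if (ready.get(i).priority > ready.get(best).priority) best = i;
        }
        return best;
    };
}
```

Calling `setPolicy` changes which task `dispatch()` returns without touching the queue code, which is the point of keeping the two apart.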

Goals of policy
Share resources fairly.
Use machine resources efficiently.
Be responsive to user interaction.
But what do these things mean? How do we know if a policy is good or not?
What are the metrics? What do we assume about the workload?

Memory Allocation
How should an OS allocate its memory resources among contending demands?
– Virtual address spaces: fork, exec, sbrk, page fault.
– The kernel controls how many machine memory frames back the pages of each virtual address space.
– The kernel can take memory away from a VAS at any time.
– The kernel always gets control if a VAS (or rather a thread running within a VAS) asks for more.
– The kernel controls how much machine memory to use as a cache for data blocks whose home is on slow storage.
– Policy choices: which pages or blocks to keep in memory? And which ones to evict from memory to make room for others?

What is a Virtual Address Space?
Protection domain
– A “sandbox” for threads that limits what memory they can access for read/write/execute.
– Each thread is in exactly one sandbox, but many threads may play in the same sandbox.
Uniform name space
– Threads access their code and data items without caring where they are in physical memory, or even if they are resident in memory at all.
A set of V → P translations
– A level of indirection from virtual pages to physical frames.
– The OS kernel controls the translations in effect at any time.

Introduction to Virtual Addressing
[Figure: a virtual memory (big?) holding text, data, BSS, user stack, args/env, and kernel data is mapped by virtual-to-physical translations onto a physical memory (small?).]
Code addresses memory through virtual addresses.
The kernel and the machine collude to translate virtual addresses to physical addresses.
The kernel controls the virtual-physical translations in effect (space).
The machine does not allow a user process to access memory unless the kernel “says it’s OK”.
The specific mechanisms for implementing virtual address translation are machine-dependent.

Virtual Memory as a Cache
[Figure: the executable file on backing storage consists of program sections (header, text, data, idata, wdata, symbol table, etc.). These back the process segments (text, data, BSS, user stack, args/env, kernel data) in virtual memory (big). Virtual-to-physical translations map pages to page frames in physical memory (small). Page fetch brings pages in from backing storage; pageout/eviction pushes them back.]

Virtual Address Translation
Example: typical 32-bit architecture with 4KB pages.
[Figure: the virtual address is split into a VPN and a 12-bit offset (bits 0 through 11). Address translation replaces the VPN with a PFN; the PFN and the unchanged offset together form the physical address.]
Virtual address translation maps a virtual page number (VPN) to a physical page frame number (PFN): the rest is easy.
Deliver exception to OS if translation is not valid and accessible in requested mode.
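
The "rest is easy" part is plain bit arithmetic. A minimal sketch for the 32-bit, 4KB-page example follows; the class and method names are made up for illustration.

```java
final class AddressSplit {
    static final int OFFSET_BITS = 12;                      // 4KB pages: 2^12 bytes
    static final int OFFSET_MASK = (1 << OFFSET_BITS) - 1;  // low 12 bits

    /** Virtual page number: the upper 20 bits of a 32-bit virtual address. */
    static int vpn(int vaddr) { return vaddr >>> OFFSET_BITS; }

    /** Byte offset within the page: the lower 12 bits, unchanged by translation. */
    static int offset(int vaddr) { return vaddr & OFFSET_MASK; }

    /** Physical address: the PFN placed above the untouched offset. */
    static int physical(int pfn, int offset) { return (pfn << OFFSET_BITS) | offset; }
}
```

For example, virtual address `0x00403ABC` has VPN `0x403` and offset `0xABC`; if that page maps to PFN `0x7F`, the physical address is `0x7FABC`.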

Cartoon View
[Figure: a user virtual address (page #i, offset) indexes the process page table (map). Entry i yields PFN i, and PFN i + offset selects a location among the physical memory page frames (PFN 0, PFN 1, ..., PFN i).]
In this example, each VPN j maps to PFN j, but in practice any physical frame may be used for any virtual page.
Each process/VAS has its own page table. Virtual addresses are translated relative to the current page table.
The maps are themselves stored in memory; a protected register holds a pointer to the current map.

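Under the Hood
[Figure: flowchart of a memory access, split between the MMU and the OS.]
MMU (start here):
– Probe TLB. On a hit, access physical memory.
– On a TLB miss, probe page table. If the entry is valid, load TLB and access physical memory.
– Otherwise raise exception.
OS:
– Page fault? If the access is not legal, signal process.
– Otherwise allocate frame. Page on disk? If so, fetch from disk; if not, zero-fill.
– Load TLB and retry the access.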
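
The flow can be mimicked in software. The sketch below is a toy under stated assumptions, not a model of any real MMU: `TinyMmu` and its fields are hypothetical names, the TLB and the supply of frames are unbounded, every fault is treated as legal and satisfied by a fresh zero-filled frame, and fetching from disk and signaling the process are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

final class TinyMmu {
    static final int OFFSET_BITS = 12;                                // assumes 4KB pages
    private final Map<Integer, Integer> tlb = new HashMap<>();        // cached VPN -> PFN
    private final Map<Integer, Integer> pageTable = new HashMap<>();  // valid VPN -> PFN
    private int nextFreeFrame = 0;
    int tlbMisses = 0;
    int pageFaults = 0;

    /** Translate a virtual address, taking the slow paths only when needed. */
    int translate(int vaddr) {
        int vpn = vaddr >>> OFFSET_BITS;
        int offset = vaddr & ((1 << OFFSET_BITS) - 1);
        Integer pfn = tlb.get(vpn);               // probe TLB
        if (pfn == null) {
            tlbMisses++;
            pfn = pageTable.get(vpn);             // probe page table
            if (pfn == null) {
                pfn = handlePageFault(vpn);       // raise exception: the OS takes over
            }
            tlb.put(vpn, pfn);                    // load TLB
        }
        return (pfn << OFFSET_BITS) | offset;     // access physical memory at this address
    }

    /** OS side: allocate a frame and install the translation. */
    private int handlePageFault(int vpn) {
        pageFaults++;
        int pfn = nextFreeFrame++;                // allocate frame (treated as zero-filled)
        pageTable.put(vpn, pfn);
        return pfn;
    }
}
```

The first touch of a page takes both slow paths; later accesses to the same page are served from the TLB without involving the OS.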

Page/block maps
Idea: use a level of indirection through a map to assemble a storage object from “scraps” of storage in different locations.
The “scraps” can be fixed-size slots: that makes allocation easy because they are interchangeable.
Example: page tables that implement a VAS.

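Names and layers
[Figure: a stack of layers (User view, Application, File System, Disk Subsystem) and the names used at each boundary: notes in notebook file; notefile; fd, byte range*; device, block #; surface, cylinder, sector. Arrows are labeled bytes, fd, and block#.]
Add more layers as needed.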

Representing a File On Disk
[Figure: an “inode” holds the file attributes and a block map indexed by logical block number. Logical blocks 0, 1, and 2 hold the file data: “once upon a time\nin a land far far away,\nlived the wise and sage wizard.”]
File attributes: may include owner, access control list, time of create/modify/access, etc.
Physical block pointers in the block map are sector IDs or physical block numbers.
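
Indexing the block map by logical block number can be sketched as follows. The `Inode` class here is a toy with hypothetical fields, and the 512-byte block size is an assumption chosen for the example.

```java
final class BlockMapDemo {
    static final int BLOCK_SIZE = 512; // assumed block size in bytes

    /** Toy inode: one attribute (length) plus the block map. */
    static final class Inode {
        final int length;      // file length in bytes
        final int[] blockMap;  // logical block number -> physical block number
        Inode(int length, int[] blockMap) { this.length = length; this.blockMap = blockMap; }
    }

    /** Physical block that holds the given byte offset of the file. */
    static int physicalBlock(Inode inode, int byteOffset) {
        if (byteOffset < 0 || byteOffset >= inode.length) {
            throw new IndexOutOfBoundsException("offset " + byteOffset);
        }
        int logicalBlock = byteOffset / BLOCK_SIZE;  // index into the block map
        return inode.blockMap[logicalBlock];
    }

    /** Position of the byte within its block. */
    static int offsetInBlock(int byteOffset) { return byteOffset % BLOCK_SIZE; }
}
```

With a 1300-byte file whose block map is {32, 48, 18}, byte 600 lives in logical block 1, which is physical block 48, at offset 88 within that block.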

A filesystem on disk
[Figure: a toy on-disk layout with two inodes at fixed locations on disk. Inode 0 is the bitmap file, whose blocks hold the allocation bitmap (11100010 00101101 10111101 10011010 00110001 00010101 00101110 00011001 01000100). Inode 1 is the root directory, whose directory blocks hold the entries rain: 32, hail: 48, wind: 18, snow: 62, interleaved with entries marked 0. A regular file (inode) points to file blocks holding “once upon a time\nin a land far far away, lived th”.]
This is a toy example (Nachos).

The Buffer Cache
[Figure: a process (Proc) and the file cache in memory.]

File Buffer Cache
Avoid the disk for as many file operations as possible.
Cache acts as a filter for the requests seen by the disk: reads served best.
Delayed writeback will avoid going to disk at all for temp files.
[Figure: data moves between a process (Proc) and the file cache by copyin/copyout.]

Page/block cache internals
HASH(blockID)
Each frame/buffer of memory is described by a meta-object (header).
Resident pages or blocks are accessible through a global hash table.
An ordered list of eviction candidates winds through the hash chains.
Some frames/buffers are free (no valid data). These are on a free list.
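
Java's `LinkedHashMap` happens to combine the same two structures: a hash table for lookup and a linked list that threads through the entries in a chosen order. The sketch below uses it as a small block cache. The class name `BlockCache`, the 512-byte block size, and the use of LRU as the eviction order are assumptions for illustration; the slide does not say which ordering the list uses, and a real cache would also track busy and dirty buffers and keep a free list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BlockCache {
    private final int capacity;
    private final LinkedHashMap<Integer, byte[]> frames;
    int hits = 0;
    int misses = 0;

    BlockCache(int capacity) {
        this.capacity = capacity;
        // accessOrder=true keeps entries ordered from least to most recently used,
        // so the eldest entry is always the next eviction candidate.
        this.frames = new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                return size() > BlockCache.this.capacity;
            }
        };
    }

    /** Look up a block by ID, "reading" it on a miss (the read itself is not modeled). */
    byte[] get(int blockId) {
        byte[] buf = frames.get(blockId);  // hash lookup; also marks the entry recently used
        if (buf != null) {
            hits++;
            return buf;
        }
        misses++;
        buf = new byte[512];               // stand-in for fetching the block from disk
        frames.put(blockId, buf);          // may evict the least recently used block
        return buf;
    }

    /** Residency check that does not disturb the eviction order. */
    boolean isResident(int blockId) { return frames.containsKey(blockId); }
}
```

With a capacity of two, the sequence 1, 2, 1, 3 evicts block 2: block 1 was touched more recently, so it survives.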

VM page cache internals
HASH(segment, page offset)
1. Pages in active use are mapped through the page table of one or more processes.
2. On a fault, the global object/offset hash table in kernel finds pages brought into memory by other processes.
3. Several page queues wind through the set of active frames, keeping track of usage.
4. Pages selected for eviction are removed from all page tables first.

Replacement
– Think of physical memory as a cache
– What happens on a cache miss?
  – Page fault
  – Must decide what to evict
– Goal: reduce number of misses

Review of replacement algorithms
1. Random
  – Easy implementation, not great results
2. FIFO (first in, first out)
  – Replace page that came in longest ago
  – Popular pages often come in early
  – Problem: doesn’t consider last time used
3. OPT (optimal)
  – Replace the page that won’t be needed for longest time
  – Problem: requires knowledge of the future

Review of replacement algorithms
4. LRU (least-recently used)
  – Use past references to predict future
  – Exploit “temporal locality”
  – Problem: expensive to implement exactly
  – Why?
    – Either have to keep sorted list
    – Or maintain time stamps + scan on eviction
    – Update info on every access (ugh)

LRU
– LRU is just an approximation of OPT
– Could try approximating LRU instead
– Don’t have to replace oldest page
– Just replace an old page
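
One well-known way to "just replace an old page" is the clock (second-chance) algorithm. The slides do not name it at this point, so treat the sketch below as one possible approximation rather than the one the course has in mind; `ClockCache` and its fields are hypothetical names. Each frame carries a reference bit that is set on access; the hand sweeps the frames, clearing set bits and evicting the first page whose bit is already clear.

```java
import java.util.Arrays;

final class ClockCache {
    private final int[] page;            // page held by each frame, -1 if the frame is free
    private final boolean[] referenced;  // one reference bit per frame
    private int hand = 0;                // next frame the clock hand will examine
    int faults = 0;

    ClockCache(int frames) {
        page = new int[frames];
        referenced = new boolean[frames];
        Arrays.fill(page, -1);
    }

    void access(int p) {
        for (int i = 0; i < page.length; i++) {
            if (page[i] == p) {          // hit: just set the reference bit
                referenced[i] = true;
                return;
            }
        }
        faults++;
        // Sweep: a referenced frame gets a second chance (bit cleared, hand moves on);
        // the first free or unreferenced frame becomes the victim.
        while (page[hand] != -1 && referenced[hand]) {
            referenced[hand] = false;
            hand = (hand + 1) % page.length;
        }
        page[hand] = p;
        referenced[hand] = true;
        hand = (hand + 1) % page.length;
    }

    boolean isResident(int p) {
        for (int q : page) {
            if (q == p) return true;
        }
        return false;
    }
}
```

No list is reordered and no timestamp is written on a hit; the only per-access cost is setting one bit, which hardware can do on the OS's behalf.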

Locality (15-213, F’02)
Principle of Locality: Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves.
– Temporal locality: Recently referenced items are likely to be referenced in the near future.
– Spatial locality: Items with nearby addresses tend to be referenced close together in time.
Locality Example:
    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
Data
– Reference array elements in succession (stride-1 reference pattern): spatial locality
– Reference sum each iteration: temporal locality
Instructions
– Reference instructions in sequence: spatial locality
– Cycle through loop repeatedly: temporal locality

Memory Hierarchies
Some fundamental and enduring properties of hardware and software:
– Fast storage technologies cost more per byte and have less capacity.
– The gap between CPU and main memory speed is widening.
– Well-written programs tend to exhibit good locality.
These fundamental properties complement each other beautifully.
They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

An Example Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices are at the top; larger, slower, and cheaper (per byte) storage devices are at the bottom.
L0: registers. CPU registers hold words retrieved from L1 cache.
L1: on-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache memory.
L2: off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: remote secondary storage (distributed file systems, Web servers).

Caches
Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
Fundamental idea of a memory hierarchy:
– For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
Why do memory hierarchies work?
– Programs tend to access the data at level k more often than they access the data at level k+1.
– Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
– Net effect: A large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.

Caching in a Memory Hierarchy
Level k+1: Larger, slower, cheaper storage device at level k+1 is partitioned into blocks (0 through 15 in the figure).
Level k: Smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1 (8, 9, 14, and 3 in the figure).
Data is copied between levels in block-sized transfer units (the figure shows blocks 4 and 10 in transit).

General Caching Concepts
Program needs object d, which is stored in some block b.
Cache hit
– Program finds b in the cache at level k. E.g., block 14.
Cache miss
– b is not at level k, so level k cache must fetch it from level k+1. E.g., block 12.
– If level k cache is full, then some current block must be replaced (evicted). Which one is the “victim”?
– Placement policy: where can the new block go? E.g., b mod 4
– Replacement policy: which block should be evicted? E.g., LRU
[Figure: level k holds blocks 4*, 9, 14, 3 and level k+1 holds blocks 0 through 15. Request 14 hits at level k. Request 12 misses: block 12 is fetched from level k+1 and replaces block 4*.]
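
The "b mod 4" placement policy can be tried out directly. In this sketch (the class `DirectMappedCache` is a hypothetical name, and block numbers are assumed to be non-negative) each block has exactly one slot it may occupy, so the placement policy leaves the replacement policy no choice: the victim is whatever currently sits in that slot.

```java
import java.util.Arrays;

final class DirectMappedCache {
    private final int[] slots; // block number held in each slot, or -1 if empty

    DirectMappedCache(int numSlots) {
        slots = new int[numSlots];
        Arrays.fill(slots, -1);
    }

    /** Returns true on a hit; on a miss, fetches b and evicts whatever held its slot. */
    boolean request(int b) {
        int slot = b % slots.length;   // placement policy: b mod (number of slots)
        if (slots[slot] == b) {
            return true;               // cache hit
        }
        slots[slot] = b;               // cache miss: the old occupant is the victim
        return false;
    }

    int blockAt(int slot) { return slots[slot]; }
}
```

With four slots holding blocks 4, 9, 14, and 3, a request for block 14 hits in slot 2, while a request for block 12 maps to slot 0 and evicts block 4.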

A System with Virtual Memory
Examples: workstations, servers, modern PCs, etc.
Address Translation: Hardware converts virtual addresses to physical addresses via OS-managed lookup table (page table).
[Figure: the CPU issues virtual addresses; the page table maps each one either to a physical address in memory or to a location on disk.]

Page Faults (like “Cache Misses”)
What if an object is on disk rather than in memory?
– Page table entry indicates virtual address not in memory
– OS exception handler invoked to move data from disk into memory
  – current process suspends, others can resume
  – OS has full control over placement, etc.
[Figure: before the fault, the page table entry for the virtual address points to disk; after the fault, it points to a physical address in memory.]

Dynamic address translation
[Figure: the user process issues a virtual address; the translator (MMU) turns it into a physical address for physical memory.]
Will this allow us to provide protection?
Sure, as long as the translation is correct.

The Page Caching Problem
Each thread/process/job utters a stream of page references.
– reference string: e.g., abcabcdabce..
The OS tries to minimize the number of faults incurred.
– The set of pages (the working set) actively used by each job changes relatively slowly.
– Try to arrange for the resident set of pages for each active job to closely approximate its working set.
Replacement policy is the key.
– On each page fault, select a victim page to evict from memory; read the new page into the victim’s frame.
– Simple: replace the page whose next reference is furthest in the future (OPT).
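
To make the comparison concrete, the sketch below counts faults for FIFO, LRU, and OPT on a reference string, with each page named by one character. The class `ReplacementSim` and the choice of three frames are assumptions for illustration; OPT is only computable here because the simulator can look ahead in the string.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

final class ReplacementSim {
    /** FIFO: evict the page that has been resident longest. */
    static int fifo(String refs, int frames) {
        ArrayDeque<Character> queue = new ArrayDeque<>();
        int faults = 0;
        for (char p : refs.toCharArray()) {
            if (queue.contains(p)) continue;             // hit: FIFO order is unchanged
            faults++;
            if (queue.size() == frames) queue.removeFirst();
            queue.addLast(p);
        }
        return faults;
    }

    /** LRU: evict the page whose last use is furthest in the past. */
    static int lru(String refs, int frames) {
        List<Character> order = new ArrayList<>();       // least recently used first
        int faults = 0;
        for (char p : refs.toCharArray()) {
            if (order.remove((Character) p)) {           // hit: move to the recent end
                order.add(p);
                continue;
            }
            faults++;
            if (order.size() == frames) order.remove(0);
            order.add(p);
        }
        return faults;
    }

    /** OPT: evict the page whose next reference is furthest in the future. */
    static int opt(String refs, int frames) {
        List<Character> resident = new ArrayList<>();
        int faults = 0;
        for (int i = 0; i < refs.length(); i++) {
            char p = refs.charAt(i);
            if (resident.contains(p)) continue;
            faults++;
            if (resident.size() == frames) {
                int victim = 0;
                int furthest = -1;
                for (int j = 0; j < resident.size(); j++) {
                    char q = resident.get(j);
                    int next = refs.indexOf(q, i + 1);   // position of q's next use
                    if (next == -1) next = Integer.MAX_VALUE; // never used again
                    if (next > furthest) {
                        furthest = next;
                        victim = j;
                    }
                }
                resident.remove(victim);
            }
            resident.add(p);
        }
        return faults;
    }
}
```

On the slide's string `abcabcdabce` with three frames, FIFO and LRU each incur 8 faults while OPT incurs 6: once `d` arrives, the cyclic `abc` pattern makes both history-based policies evict exactly the page that is needed next.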

Managing the VM Page Cache
Managing a VM page cache is similar to a file block cache, but with some new twists.
Pages are typically referenced by page table (pmap) entries.
– Must invalidate mappings before reusing the frame.
Reads and writes are implicit; the TLB hides them from the OS.
– How can we tell if a page is dirty?
– How can we tell if a page is referenced?
Cache manager must run policies periodically, sampling page state.
– Continuously push dirty pages to disk to “launder” them.
– Continuously check references to judge how “hot” each page is.
– Balance accuracy with sampling overhead.

public interface IVirtualDisk {
    /* Read a block specified by the dBID into buffer */
    public void readBlock(int dBID, byte buffer[]) throws…;
    /* Write to block specified by the dBID from buffer */
    public void writeBlock(int dBID, byte buffer[]) throws…;
    /*
     * Start an asynchronous request to the device/disk.
     * -- operation is either READ or WRITE
     * -- callbackIdentifer is an identifier the caller may use to match the
     *    responses from the device (through a callback) with the requests. The
     *    device does not interpret the callbackIdentifer; it just passes
     *    it along with the callback.
     * -- blockID uniquely identifies the block to access
     * -- buffer[] is a byte array used for read/write operations
     */
    public void startRequest(DiskOperationType operation, int callbackIdentifer, int blockID, byte buffer[]) throws…;
}

public interface IDFS {
    /* creates a new DFile and returns the DFileID */
    public DFileID createDFile();
    /* destroys the file specified by the DFileID */
    public void destroyDFile(DFileID dFID);
    /*
     * reads the file specified by DFileID starting from the offset startOffset
     * up to the count specified into the buffer
     */
    public int read(DFileID dFID, byte[] buffer, int startOffset, int count);
    /*
     * writes to the file specified by DFileID from the buffer starting at
     * offset startOffset up to the count specified
     */
    public int write(DFileID dFID, byte[] buffer, int startOffset, int count);
    /* List all the existing DFileIDs in the associated volume _volName */
    public List listAllDFiles();
}

public abstract class DBufferCache implements VirtualDiskCallback {
    /*
     * Buffer allocation: Get locked buffer that can be used for block specified
     * by blockID
     */
    public abstract DBuffer getBlock(int dBID);
    /* Release the locked buffer so that others waiting on it can use it */
    public abstract void releaseBuffer(byte[] buffer);
    /*
     * sync() writes back all dirty blocks to DStore and forces DStore
     * to write back all contents to the disk device. The sync() method should
     * maintain clean block copies in DBufferCache.
     */
    public abstract void sync();
    /* Similar to sync() but invalidates all cached blocks unlike sync(). */
    public abstract void flush();
}

public abstract class DBuffer {
    /* If the block is not in cache, start a fetch from disk asynchronously */
    public abstract void startFetch();
    /* Push a buffer block to device/disk asynchronously */
    public abstract void startPush();
    /* Check whether the buffer is in use */
    public abstract boolean checkValid();
    /* Wait until the buffer is free */
    public abstract boolean waitValid();
    /* Check whether the buffer is dirty, i.e., written to memory but not written to the disk device yet */
    public abstract boolean checkClean();
    /* Wait until the buffer is clean */
    public abstract boolean waitClean();
}

public abstract class DBuffer {
    /*
     * reads into the buffer[] array the cache block specified by blockID from
     * the DBufferCache if it is in cache, otherwise reads the corresponding
     * disk block from the disk device. Upon an error, it should return -1,
     * otherwise return number of bytes read.
     */
    public abstract int read(int blockID, byte[] buffer, int startOffset, int count);
    /*
     * writes the buffer[] array contents to the cache block specified by
     * blockID from the DBufferCache if it is in cache, otherwise finds a free
     * cache block and writes the buffer[] contents on it. Upon an error, it
     * should return -1, otherwise return number of bytes written.
     */
    public abstract int write(int blockID, byte[] buffer, int startOffset, int count);
}

How it should be

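[Figure: the layers DFS, DBufferCache, DBuffer, and VirtualDisk, with the calls between them.]
– Into DFS: create, destroy, read, write a dfile; list() dfiles; sync() cache.
– Into DBufferCache: DBuffer = getBlock(blockID); releaseBlock(buf); sync().
– Into DBuffer: copy bytes to/from buffer; startFetch(), startPush(); waitValid(), waitClean().
– Into VirtualDisk: startRequest(r/w); the disk answers with ioComplete().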

DFS
    /* creates a new dfile and returns the DFileID */
    public DFileID createDFile();
    /* destroys the dfile named by the DFileID */
    public void destroyDFile(DFileID dFID);
    /*
     * reads contents of the dfile named by DFileID into the buffer
     * starting from buffer offset startOffset; at most count bytes are transferred
     */
    public int read(DFileID dFID, byte[] buffer, int startOffset, int count);
    /*
     * writes to the file specified by DFileID from the buffer
     * starting from buffer offset startOffset; at most count bytes are transferred
     */
    public int write(DFileID dFID, byte[] buffer, int startOffset, int count);
    /* List DFileIDs for all existing dfiles in the volume */
    public List listAllDFiles();

DBufferCache
    /*
     * Get buffer for block specified by blockID.
     * The buffer is “busy” until the caller releases it.
     */
    public DBuffer getBlock(int blockID);
    /* Release the buffer so that others waiting on it can use it */
    public void releaseBlock(DBuffer buf);
    /* Write back all dirty blocks to the volume, and wait for completion. */
    public void sync();

DBuffer
    /* Start an asynchronous fetch of associated block from the volume */
    public abstract void startFetch();
    /* Start an asynchronous write of buffer contents to block on volume */
    public abstract void startPush();
    /* Check whether the buffer has valid data */
    public abstract boolean checkValid();
    /* Wait until the buffer is free */
    public abstract boolean waitValid();
    /* Check whether the buffer is dirty, i.e., has modified data to be written back */
    public abstract boolean checkClean();
    /* Wait until the buffer is clean, i.e., until a push operation completes */
    public abstract boolean waitClean();
    /* Check if buffer is evictable: not evictable if I/O in progress, or buffer is held. */
    public abstract boolean isBusy();

DBuffer
    /*
     * reads into the buffer[] array from the contents of the DBuffer.
     * Check first that the DBuffer has a valid copy of the data!
     * startOffset and count are for the buffer array, not the DBuffer.
     */
    public int read(byte[] buffer, int startOffset, int count);
    /*
     * writes into the DBuffer from the contents of buffer[] array.
     * startOffset and count are for the buffer array, not the DBuffer.
     * Mark buffer dirty!
     */
    public int write(byte[] buffer, int startOffset, int count);
}

VirtualDisk
    /*
     * Start an asynchronous request to the device/disk.
     * Nature of the request is encoded in the state of the DBuffer
     * -- operation is either READ or WRITE
     * -- blockID uniquely identifies the block to access
     * -- buffer[] is a byte array used for read/write operations
     */
    public void startRequest(DBuffer buf) throws…;
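
One way the valid/dirty state and the wait methods could fit together is sketched below. This is not the reference solution: `SketchDBuffer` is a hypothetical stand-alone class rather than a subclass of `DBuffer`, a plain byte array stands in for the block's home on the volume, and a helper thread plays the role of the disk and its `ioComplete()` callback. Holding the buffer, error handling, and a write that races with an in-flight push are all left out.

```java
final class SketchDBuffer {
    private final byte[] disk;   // stand-in for the block on the volume (assumption)
    private final byte[] data;   // in-memory copy of the block
    private boolean valid = false;
    private boolean dirty = false;
    private boolean ioInProgress = false;

    SketchDBuffer(byte[] disk) {
        this.disk = disk;
        this.data = new byte[disk.length];
    }

    /** Start an asynchronous fetch; completion marks the buffer valid and wakes waiters. */
    synchronized void startFetch() {
        ioInProgress = true;
        new Thread(() -> {
            synchronized (this) {
                System.arraycopy(disk, 0, data, 0, data.length);
                valid = true;
                ioInProgress = false;
                notifyAll();
            }
        }).start();
    }

    /** Start an asynchronous push; completion marks the buffer clean and wakes waiters. */
    synchronized void startPush() {
        ioInProgress = true;
        new Thread(() -> {
            synchronized (this) {
                System.arraycopy(data, 0, disk, 0, data.length);
                dirty = false;
                ioInProgress = false;
                notifyAll();
            }
        }).start();
    }

    synchronized boolean checkValid() { return valid; }
    synchronized boolean checkClean() { return !dirty; }
    synchronized boolean isBusy() { return ioInProgress; }

    /** Block until a fetch (or a full overwrite) has made the contents valid. */
    synchronized boolean waitValid() {
        try {
            while (!valid) wait();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Block until a push completes and the buffer is clean. */
    synchronized boolean waitClean() {
        try {
            while (dirty) wait();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Copy out of the DBuffer; startOffset and count refer to the caller's array. */
    synchronized int read(byte[] buffer, int startOffset, int count) {
        if (!valid) return -1;
        int n = Math.min(count, data.length);
        System.arraycopy(data, 0, buffer, startOffset, n);
        return n;
    }

    /** Copy into the DBuffer and mark it dirty. */
    synchronized int write(byte[] buffer, int startOffset, int count) {
        int n = Math.min(count, data.length);
        if (!valid && n < data.length) return -1; // partial write needs a fetch first
        System.arraycopy(buffer, startOffset, data, 0, n);
        valid = true;
        dirty = true;
        notifyAll();
        return n;
    }
}
```

A caller in the DFS layer would follow the pattern from the interface comments: `startFetch()` then `waitValid()` before `read`, and `startPush()` then `waitClean()` after `write` when the data must reach the volume.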
