Memory Caches, TLB & Virtual Memory
Cache Memory Issues
Caching improves the performance of the memory system, but it introduces some issues for the operating system:
- Coherence
- Addressing mode (virtual vs. physical)
- Context switching
- O.S. code structuring & line size
- Frame allocation
Coherence Issues
A cache can be:
- combined: instructions + data
- split: instruction cache + data cache
Hardware may or may not maintain coherence between the two halves.
The structure of the cache affects correctness when code segments are loaded into memory.
Coherence & the Instruction Cache
Only an issue for split caches when hardware does not guarantee coherence:
- the I-cache is read-only, the D-cache is read-write.
1. The O.S. loads the user program's instructions through the data cache.
2. The user program then runs instructions off the I-cache, which may contain stale lines.
Therefore the OS must flush the instruction cache after loading code (see the sketch below).
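As a concrete illustration, a minimal user-space sketch of the flush step in C, assuming exec_buf is an executable mapping (e.g. obtained from mmap with PROT_EXEC); __builtin___clear_cache is a GCC/Clang builtin that compiles to a no-op on machines whose hardware already keeps the split caches coherent:

    #include <string.h>
    #include <stddef.h>

    /* Copy freshly loaded code into an executable buffer, then flush
     * the instruction cache so the CPU does not execute stale lines. */
    void load_code(void *exec_buf, const void *code, size_t len)
    {
        memcpy(exec_buf, code, len);      /* write goes through the D-cache */
        __builtin___clear_cache((char *)exec_buf,
                                (char *)exec_buf + len);  /* sync the I-cache */
    }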
Addressing Modes
A cache may be organized:
- by physical address (most common)
- by virtual address
Virtually addressable caches:
- No TLB lookup is needed when the line is present in the cache.
- To avoid flushing the entire cache on a context switch, we may add a pid tag to each line.
Virtually Addressable Caches
[Figure: each cache line carries a (pid, virtual address) tag]
By having the pid:
- No need to flush the entire cache on a context switch.
- The OS must supply the tag, usually through a privileged register or instruction.
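To make the tag check concrete, here is a toy software model of a virtually tagged, direct-mapped cache in C; the struct layout, sizes, and names are illustrative, not any real hardware:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    #define NUM_SETS   256
    #define LINE_SHIFT 5              /* 32-byte lines */

    struct cache_line {
        bool      valid;
        uint32_t  pid;                /* process-id tag */
        uintptr_t vtag;               /* high bits of the virtual address */
        uint8_t   data[1 << LINE_SHIFT];
    };

    static struct cache_line cache[NUM_SETS];

    /* A hit requires BOTH the address tag and the pid tag to match,
     * so lines from two processes can coexist without a flush. */
    bool lookup(uint32_t cur_pid, uintptr_t vaddr)
    {
        struct cache_line *l = &cache[(vaddr >> LINE_SHIFT) % NUM_SETS];
        return l->valid && l->pid == cur_pid
                        && l->vtag == (vaddr >> LINE_SHIFT) / NUM_SETS;
    }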
Aliasing
[Figure: (vaddr 1, pid1) and (vaddr 2, pid2) both mapping to the same frame]
If pid1 & pid2 are sharing a frame:
- The frame may be mapped in different cache lines.
- Modifications by one process are not seen by the other.
- The OS must detect the situation and flush the additional line.
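A sketch of the detection step, assuming a virtually indexed, direct-mapped cache with 32-byte lines and 256 sets; flush_line() stands in for a platform-specific cache-maintenance primitive and is only an assumed name:

    #include <stdint.h>
    #include <stddef.h>

    #define LINE_SHIFT 5     /* 32-byte lines */
    #define NUM_SETS   256

    static inline size_t cache_set(uintptr_t vaddr)
    {
        return (vaddr >> LINE_SHIFT) % NUM_SETS;
    }

    /* Called when the same frame is mapped at va1 (pid1) and va2 (pid2). */
    void check_alias(uintptr_t va1, uintptr_t va2)
    {
        if (cache_set(va1) != cache_set(va2)) {
            /* The shared frame occupies two different lines: a write
             * through one mapping is invisible through the other, so
             * the additional line must be flushed. */
            /* flush_line(va2);   assumed platform-specific primitive */
        }
    }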
Context Switching
It is often claimed that an O.S. can context switch in x microseconds.
- This should alert your bogosity sensors: such numbers are always quoted without cache effects.
The real penalty of a context switch:
- The cache must reload the active process's working set.
- TLB misses occur at the beginning of the interval.
Context Switching (continued)
At the beginning of the interval there are too many cache and TLB misses:
- If switches are frequent, performance goes down.
- This introduces variability in execution times.
- Some real-time O.S.'es disable the cache entirely.
- Some real-time O.S.'es partition the cache among processes (in software or hardware).
Caching & O.S. Code Structure
O.S. code does not:
- have many loops
- follow the principle of locality
- fit in a small space
It follows that:
- O.S. code has a very poor cache hit ratio.
- The O.S., when invoked, pollutes the cache: a penalty even when no context switch occurs.
Dependence on Cache Line Size
Some O.S.'es are tuned to a particular cache line size:
- e.g., MacOS assumes a 32-byte cache line.
This is a tradeoff between:
- performance
- portability of code
(A run-time alternative is sketched below.)
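One way to keep the performance without hard-coding the size is to discover it at run time; a minimal sketch using the Linux/glibc sysconf extension (other systems may not provide this name, hence the fallback):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* _SC_LEVEL1_DCACHE_LINESIZE is a Linux/glibc extension. */
        long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        if (line <= 0)
            line = 64;   /* conservative fallback assumption */
        printf("cache line size: %ld bytes\n", line);
        return 0;
    }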
Frame Allocation
Relevant to:
- physically addressed caches
The O.S. must allocate frames to a single process (or the kernel) such that:
- cache collisions are minimized.
Example
Direct-mapped caches use the physical address as a hashing key.
When a choice is possible, the OS must avoid allocations that result in poor cache performance.
[Figure: free frames 96, 80, 77; an allocated page in frame 88; lines are (frame no. % 8) + some offset, so free frames 96 and 80 would collide with 88 while 77 would not]
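This policy is commonly called page coloring. A sketch of the choice in C, assuming 8 cache bins as in the figure; the parameter arrays and helper names are illustrative:

    #include <stddef.h>

    #define NUM_COLORS 8   /* cache bins: line = frame no. % 8 */

    static unsigned color_of(unsigned frame) { return frame % NUM_COLORS; }

    /* Return a free frame whose color collides with none of the frames
     * the process already owns, or fall back to any free frame. */
    int pick_frame(const unsigned *free_frames, size_t nfree,
                   const unsigned *owned, size_t nowned)
    {
        for (size_t i = 0; i < nfree; i++) {
            int collides = 0;
            for (size_t j = 0; j < nowned; j++)
                if (color_of(free_frames[i]) == color_of(owned[j]))
                    collides = 1;
            if (!collides)
                return (int)free_frames[i];   /* e.g. frame 77 vs. 88 */
        }
        return nfree ? (int)free_frames[0] : -1;
    }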
Virtual Memory
Extends physical memory onto disk.
For each process:
- Keep only the needed pages in memory.
- Swap out the unneeded pages.
As a result:
- We can have more processes.
- Process size is no longer limited by memory size.
Demand Paging
[Figure: the virtual address spaces of Process 0 and Process 1 mapping into physical memory, with the remaining pages held in swap space]
Demand Paging is not Swapping
In demand paging:
- Can page out only the unneeded pages.
- Process size is limited by swap space.
- Address space growth requires only a new page.
- Fine-grained control.
In swapping:
- Must swap the entire process out.
- Process size is limited by physical memory.
- Partition growth is difficult.
- Cannot control portions of a partition.
Implementation
- Page table entries may overload the valid bit: an invalid entry can reuse the frame-number field to record where the page lives in swap space.
- Swap space must be managed.
- For each frame, we must track the inverse mapping back to the page table entries (there may be several when the frame is shared).
[Figure: page table entries of the form Frame No. | v | w | r | x | f | m]
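A sketch of such an entry as a C bitfield; the field widths are illustrative, and reading f/m as the referenced/modified bits is an assumption about the figure's notation:

    #include <stdint.h>

    /* The same 20-bit field serves two purposes: a physical frame
     * number while valid == 1, a swap-space slot while valid == 0. */
    struct pte {
        uint32_t frame_or_swap : 20;
        uint32_t valid : 1;   /* v */
        uint32_t write : 1;   /* w */
        uint32_t read  : 1;   /* r */
        uint32_t exec  : 1;   /* x */
        uint32_t refd  : 1;   /* f: referenced (assumed) */
        uint32_t mod   : 1;   /* m: modified / dirty */
    };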
On Page Faults
The operating system checks the access:
- If it is valid, it schedules a disk read to bring the required page in from swap space.
- Do we wait for the page, or do we context switch to another process?
- When the disk comes back with the data, the page table entry must be readjusted (entries, if the page is shared).
On Page Faults (continued)
When swapping in a page, we need:
- A free frame to read the page into.
- The free frame is locked while the disk reads the page (so that it does not get reallocated).
But what if we run out of free frames?
- Pick a victim frame.
- If it is dirty, page it out first; otherwise use it directly.
(A simulation of this fault path is sketched below.)
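A runnable user-space simulation of the fault path, with a trivial FIFO victim policy; the swap[] array, the frame count, and all names are illustrative stand-ins for kernel structures:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define NFRAMES 4
    #define PAGESZ  4096

    static uint8_t frames[NFRAMES][PAGESZ];
    static uint8_t swap[64][PAGESZ];      /* stands in for the disk */
    static int     frame_page[NFRAMES];   /* page in each frame (-1 = free) */
    static int     dirty[NFRAMES];
    static int     fifo_hand;

    static int find_free_frame(void)
    {
        for (int f = 0; f < NFRAMES; f++)
            if (frame_page[f] < 0) return f;
        return -1;
    }

    /* Bring `page` into memory, evicting a victim if necessary. */
    int handle_page_fault(int page)
    {
        int f = find_free_frame();
        if (f < 0) {                          /* out of free frames */
            f = fifo_hand++ % NFRAMES;        /* pick a victim (FIFO) */
            if (dirty[f])                     /* dirty: page it out first */
                memcpy(swap[frame_page[f]], frames[f], PAGESZ);
            frame_page[f] = -1;               /* invalidate the old mapping */
        }
        /* the frame is "locked" simply by being claimed before the read */
        memcpy(frames[f], swap[page], PAGESZ);  /* disk read from swap */
        frame_page[f] = page;                   /* readjust the mapping */
        dirty[f] = 0;
        return f;
    }

    int main(void)
    {
        memset(frame_page, -1, sizeof frame_page);
        for (int p = 0; p < 6; p++)   /* 6 pages, 4 frames: forces eviction */
            printf("page %d -> frame %d\n", p, handle_page_fault(p));
        return 0;
    }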
Properties
- The association between virtual and physical addresses changes over time.
- Page faults can be very expensive (up to two disk reads: one to write out a dirty victim, one to read the new page in).
- Requires instructions to be restartable.
Virtual Memory (outline)
- Page replacement
- Demand paging: lazy downloading on demand; pre-paging
- Working sets
- Thrashing
- Local vs. global allocation
- Page size: fragmentation, paging overhead, TLB coverage
- Locking: kernel pages, user pages, I/O interlocking
- Instruction set issues
- Paging daemons
- Page fault handling