
CS147 Lecture 17: Virtual Memory. Prof. Sin-Min Lee, Department of Computer Science.

Fixed (Static) Partitions An early attempt at multiprogramming used fixed partitions: one partition for each job, with partition sizes designated by reconfiguring the system. Partitions can't be too small or too large, and it is critical to protect each job's memory space. The entire program is stored contiguously in memory during its entire execution. Internal fragmentation is a problem.

Simplified Fixed Partition Memory Table (Table 2.1)

Table 2.1: Main memory use during fixed partition allocation. Job list: J1 30K, J2 50K, J3 30K, J4 25K.

Partition   Original size   After job entry
1           100K            Job 1 (30K)
2           25K             Job 4 (25K)
3           25K             (free)
4           50K             Job 2 (50K)

Job 3 (30K) must wait: the only free partition (25K) is too small.

Dynamic Partitions Available memory is kept in contiguous blocks, and jobs are given only as much memory as they request when loaded. This improves memory use over fixed partitions, but performance deteriorates as new jobs enter the system: fragments of free memory are created between blocks of allocated memory (external fragmentation).

Dynamic Partitioning of Main Memory & Fragmentation (Figure 2.2)

Dynamic Partition Allocation Schemes First-fit: allocate the first partition that is big enough. Free/busy lists are kept organized by memory location (low-order to high-order). Faster in making the allocation. Best-fit: allocate the smallest partition that is big enough. Free/busy lists are kept ordered by size (smallest to largest). Produces the smallest leftover partition, making best use of memory.
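A minimal Python sketch of the two schemes, assuming a hypothetical free list of (location, size-in-K) pairs; this is an illustration, not code from the slides:

    def first_fit(free_blocks, request):
        # Take the first block that is big enough (list kept in address order).
        for i, (loc, size) in enumerate(free_blocks):
            if size >= request:
                return i
        return None  # no block fits: the job must wait

    def best_fit(free_blocks, request):
        # Search the entire list for the smallest block that is big enough.
        best = None
        for i, (loc, size) in enumerate(free_blocks):
            if size >= request and (best is None or size < free_blocks[best][1]):
                best = i
        return best

    # The free blocks from the allocation examples below: 30K, 15K, 50K, 20K.
    blocks = [(10240, 30), (40960, 15), (56320, 50), (107520, 20)]
    print(first_fit(blocks, 10))  # -> 0: the 30K block, leaving 20K unused
    print(best_fit(blocks, 10))   # -> 1: the 15K block, leaving only 5K unused

Run on the job sizes in Tables 2.2 and 2.3, this reproduces why best-fit wastes less memory per allocated block.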

First-Fit Allocation Example (Table 2.2) Job list: J1 10K, J2 20K, J3 30K (must wait), J4 10K.

Memory location   Block size   Job   Job size   Status   Internal fragmentation
10240             30K          J1    10K        Busy     20K
40960             15K          J4    10K        Busy     5K
56320             50K          J2    20K        Busy     30K
107520            20K          -     -          Free     -

Total available: 115K. Total used: 40K.

Best-Fit Allocation Example (Table 2.3) Job list: J1 10K, J2 20K, J3 30K, J4 10K.

Memory location   Block size   Job   Job size   Status   Internal fragmentation
40960             15K          J1    10K        Busy     5K
107520            20K          J2    20K        Busy     None
10240             30K          J3    30K        Busy     None
56320             50K          J4    10K        Busy     40K

Total available: 115K. Total used: 70K.

First-Fit Memory Request Assume that a job of size 200 bytes is waiting to be loaded into memory.

Best-Fit Memory Request

Best-Fit vs. First-Fit First-fit: memory allocation takes less time, but it increases internal fragmentation and can discriminate against large jobs. Best-fit: a more complex algorithm that searches the entire table before allocating memory; it increases memory use but results in a smaller leftover "free" space (sliver), often too small to be useful.

Release of Memory Space: Deallocation Deallocation for fixed partitions is simple: the Memory Manager resets the status of the memory block to "free". Deallocation for dynamic partitions tries to combine free areas of memory whenever possible. Three cases: Is the block adjacent to another free block? Is the block between two free blocks? Is the block isolated from other free blocks?

Case 1: Joining 2 Free Blocks

Case 2: Joining 3 Free Blocks This slide has an error: the job finishing is at 7580 and has a size of 20 bytes, not 7600 and a size of 200. Assume that the job at 7600 is already free and has a size of 205. The "after" chart is correct as stated, assuming these changes to the "before deallocation" chart.

Case 3: Deallocating an Isolated Block

Relocatable Dynamic Partitions The Memory Manager relocates programs to gather all empty blocks and compact them into one contiguous block of memory. Memory compaction (also called garbage collection or defragmentation) is performed by the OS to reclaim fragmented sections of memory space. By compacting and relocating, the Memory Manager optimizes the use of memory and improves throughput.

Compaction Steps Relocate every program in memory so they're contiguous. Adjust every address, and every reference to an address, within each program to account for the program's new location in memory. Leave alone all other values within the program (e.g., data values).

Memory Before & After Compaction (Figure 2.5)

Contents of the relocation register and close-up of the Job 4 memory area (a) before relocation and (b) after relocation and compaction (Figure 2.6). Note that each job has its own relocation register value.
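A minimal sketch of how a relocation register works, with hypothetical load addresses (not the values from Figure 2.6):

    # Hypothetical addresses chosen for illustration only.
    original_load_address = 30720   # where the job was first loaded
    new_load_address = 18176        # where compaction moved it
    relocation_register = new_load_address - original_load_address   # -12544

    def translate(compiled_address):
        # Every address reference in the job is offset by its relocation register,
        # so the program itself never has to be rewritten.
        return compiled_address + relocation_register

    print(translate(30720 + 1024))  # a reference 1K into the job now resolves to 19200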

Virtual Memory Virtual memory (VM) = the ability of the CPU and operating system software to use the hard disk drive as additional RAM when needed (a safety net). Good: you no longer get "insufficient memory" errors. Bad: performance is very slow when accessing VM. Solution: more RAM.

Motivations for Virtual Memory Use physical DRAM as a cache for the disk: the address space of a process can exceed physical memory size, and the sum of the address spaces of multiple processes can exceed physical memory. Simplify memory management: multiple processes reside in main memory, each with its own address space; only "active" code and data are actually in memory, and more memory can be allocated to a process as needed. Provide protection: one process can't interfere with another, because they operate in different address spaces; a user process cannot access privileged information, because different sections of address spaces have different permissions.

Virtual Memory

Levels in Memory Hierarchy (registers, cache, main memory, disk; each level is larger, slower, and cheaper than the one above it):

Level       Size          Speed    $/MB        Transfer unit
Registers   32 B          1 ns     -           8 B (to cache)
Cache       32 KB-4 MB    2 ns     $100/MB     32 B (to memory)
Memory      128 MB        50 ns    $1.00/MB    4 KB (to disk)
Disk        20 GB         8 ms     $0.006/MB   -

DRAM vs. SRAM as a "Cache" DRAM vs. disk is more extreme than SRAM vs. DRAM. Access latencies: DRAM is ~10X slower than SRAM; disk is ~100,000X slower than DRAM. Importance of exploiting spatial locality: the first byte is ~100,000X slower than successive bytes on disk, vs. a ~4X improvement for page-mode vs. regular accesses to DRAM. Bottom line: design decisions for DRAM caches are driven by the enormous cost of misses.

Locating an Object in a "Cache" (cont.): the DRAM cache. Each allocated page of virtual memory has an entry in the page table, which maps virtual pages to physical pages (from uncached form to cached form). A page-table entry exists even if the page is not in memory: it then specifies the disk address, and the OS retrieves the page from there.

A System with Physical Memory Only Examples: most Cray machines, early PCs, nearly all embedded systems, etc. Addresses generated by the CPU point directly to bytes in physical memory.

A System with Virtual Memory Examples: workstations, servers, modern PCs, etc. Address translation: hardware converts virtual addresses to physical addresses via an OS-managed lookup table (the page table); pages not resident in physical memory live on disk.

Page Faults (Similar to "Cache Misses") What if an object is on disk rather than in memory? The page table entry indicates the virtual address is not in memory, and the OS exception handler is invoked to move the data from disk into memory: the current process suspends while others can resume, and the OS has full control over placement.

Terminology Cache: a small, fast "buffer" between the CPU and main memory that holds the most recently accessed data. Virtual memory: program and data are assigned addresses independent of the amount of physical main memory actually available and of the location from which the program will actually execute. Hit ratio: the probability that the next memory access is found in the cache. Miss ratio: 1.0 - hit ratio.

Importance of Hit Ratio Given: h = hit ratio, Ta = average effective memory access time, Tc = cache access time, Tm = main memory access time. Effective memory access time: Ta = h*Tc + (1 - h)*Tm. Speedup due to the cache: Sc = Tm / Ta. Example: assume a main memory access time of 100 ns, a cache access time of 10 ns, and a hit ratio of .9: Ta = .9(10 ns) + (1 - .9)(100 ns) = 19 ns, so Sc = 100 ns / 19 ns = 5.26. Same as above, only with a hit ratio of .95 instead: Ta = .95(10 ns) + (1 - .95)(100 ns) = 14.5 ns, so Sc = 100 ns / 14.5 ns = 6.9.
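A short Python check of these formulas, a sketch of the arithmetic above and nothing more:

    def effective_access_time(h, t_cache, t_mem):
        # Ta = h*Tc + (1 - h)*Tm
        return h * t_cache + (1 - h) * t_mem

    for h in (0.90, 0.95):
        ta = effective_access_time(h, 10, 100)   # access times in ns
        print(h, ta, round(100 / ta, 2))         # speedup Sc = Tm / Ta
    # 0.9  -> Ta = 19.0 ns, Sc = 5.26
    # 0.95 -> Ta = 14.5 ns, Sc = 6.9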

Cache vs. Virtual Memory The primary goal of a cache is to increase speed; the primary goal of virtual memory is to increase space.

Cache Replacement Algorithms The replacement algorithm determines which block in the cache is removed to make room. Two main policies are used today: Least Recently Used (LRU), which replaces the block unused for the longest time, and Random, which replaces a completely random block (a counter-intuitive approach).

LRU vs. Random A sample table comparing miss rates for LRU and Random:

Cache size   LRU miss rate   Random miss rate
16 KB        4.4%            5.0%
64 KB        1.4%            1.5%
256 KB       1.1%            1.1%

As the cache size increases there are more blocks to choose from, so the choice is less critical: the probability of replacing the block that's needed next is relatively low.

Virtual Memory Replacement Algorithms 1) Optimal 2) First In First Out (FIFO) 3) Least Recently Used (LRU)

Optimal Replace the page that will not be used for the longest (future) period of time. Reference string: 1 2 3 4 1 2 5 1 2 5 3 4 5, with 3 page frames (faults are shown in boxes on the slide; hits are not shown). 7 page faults occur.

Optimal A theoretically "best" page replacement algorithm for a given fixed size of VM: it produces the lowest possible page fault rate. It is impossible to implement, since it requires future knowledge of the reference string; it is used only to gauge the performance of real algorithms against the theoretical best.
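A minimal Python simulation of the Optimal policy on the reference string above (1 2 3 4 1 2 5 1 2 5 3 4 5); a sketch, but it reproduces the 7 faults from the example:

    def opt_faults(refs, n_frames):
        frames, faults = [], 0
        for i, page in enumerate(refs):
            if page in frames:
                continue                      # hit
            faults += 1
            if len(frames) < n_frames:
                frames.append(page)
            else:
                # Evict the resident page whose next use is farthest in the
                # future (or that is never used again).
                future = refs[i + 1:]
                victim = max(frames, key=lambda p: future.index(p)
                             if p in future else len(future))
                frames[frames.index(victim)] = page
        return faults

    refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 5, 3, 4, 5]
    print(opt_faults(refs, 3))  # -> 7, matching the slide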

FIFO When a page fault occurs, replace the page that was brought in first. Same reference string: 1 2 3 4 1 2 5 1 2 5 3 4 5, with 3 page frames (faults are shown in boxes; hits are not shown). 9 page faults occur.

FIFO The simplest page replacement algorithm. Problem: it can exhibit inconsistent behavior known as Belady's anomaly: the number of faults can increase if the job is given more physical memory, i.e., it is not predictable.

Example of FIFO Inconsistency Same reference string as before, only with 4 frames instead of 3 (faults are shown in boxes; hits are not shown). 10 page faults occur.
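A small FIFO fault counter, sketched in Python, that reproduces both runs and thereby Belady's anomaly:

    from collections import deque

    def fifo_faults(refs, n_frames):
        frames, faults = deque(), 0
        for page in refs:
            if page not in frames:
                faults += 1
                if len(frames) == n_frames:
                    frames.popleft()          # evict the page brought in first
                frames.append(page)
        return faults

    refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 5, 3, 4, 5]
    print(fifo_faults(refs, 3))  # -> 9 faults
    print(fifo_faults(refs, 4))  # -> 10 faults: more memory, more faults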

LRU Replace the page that has not been used for the longest period of time. Reference string: 1 2 3 4 1 2 5 1 2 5 3 4 5, with 3 page frames (faults are shown in boxes; hits only rearrange the stack). 9 page faults occur.

LRU More expensive to implement than FIFO, but more consistent: it does not exhibit Belady's anomaly. More overhead is needed, since the stack must be updated on each access.

Example of LRU Consistency Same reference string as before, only with 4 frames instead of 3 (faults are shown in boxes; hits only rearrange the stack). 7 page faults occur.
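The same experiment for LRU, sketched with an ordered dictionary standing in for the stack:

    from collections import OrderedDict

    def lru_faults(refs, n_frames):
        frames, faults = OrderedDict(), 0
        for page in refs:
            if page in frames:
                frames.move_to_end(page)      # a hit only rearranges the stack
                continue
            faults += 1
            if len(frames) == n_frames:
                frames.popitem(last=False)    # evict the least recently used page
            frames[page] = True
        return faults

    refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 5, 3, 4, 5]
    print(lru_faults(refs, 3))  # -> 9 faults
    print(lru_faults(refs, 4))  # -> 7 faults: extra frames never hurt LRU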

Servicing a Page Fault (1) Initiate block read: the processor signals the I/O controller to read a block of length P starting at disk address X and store it starting at memory address Y. (2) DMA transfer: the read occurs by direct memory access, under control of the I/O controller. (3) Read done: the I/O controller signals completion by interrupting the processor, and the OS resumes the suspended process.

Handling Page Faults A memory reference that misses causes a fault, called a page fault. A page fault can happen at any time and place: on an instruction fetch, or in the middle of an instruction's execution. The system must save all state, move the page from disk to memory, restore state, and restart the faulting instruction. Backing up the PC is not easy, since it is hard to find out by how much; hardware help is needed.

Page Fault The first reference to a non-resident page traps to the OS: a page fault. The hardware traps to the kernel and the general registers are saved. The OS determines which virtual page is needed, checks the validity of the address, and seeks a page frame. If the selected frame is dirty, it is written to disk. The OS schedules the new page to be read in from disk, and the page tables are updated. The faulting instruction is backed up to where it began and the faulting process is scheduled; the registers are restored, and the program continues.

What to Page In Demand paging brings in only the faulting page. To bring in additional pages, we would need to know the future; users don't really know the future, but some OSs have user-controlled pre-fetching. In real systems, demand paging starts with nothing: all page-table entries are invalid, and the process faults as code and data are needed (demanded). Some systems (e.g., Windows NT) will bring in additional neighboring pages (clustering).

VM Page Replacement If there is an unused page frame, use it. If no frames are available, select one (by some policy) and: if it is dirty (M == 1), write it to disk; invalidate its PTE and TLB entry; load the new page from disk; update the PTE and TLB entry; restart the faulting instruction. What is the cost of replacing a page? How does the OS select the page to be evicted?

Measuring Demand Paging Performance Page fault rate p, with 0 ≤ p ≤ 1.0 (from no page faults to every reference faulting). Page fault overhead = fault service overhead + time to read the page + process restart overhead; it is dominated by the time to read the page in. Effective access time = (1 - p) × (memory access time) + p × (page fault overhead).

Performance Example Memory access time = 100 ns; page fault overhead = 25 ms; page fault rate = 1/1000. EAT = (1 - p) × 100 + p × (25 ms) = (1 - p) × 100 + p × 25,000,000 = 100 + 24,999,900 × p = 100 + 24,999,900 × (1/1000) ≈ 25 microseconds! To keep degradation under 10%: 110 > 100 + 24,999,900 × p, so 10 > 24,999,900 × p, giving p < .0000004, or 1 fault in 2,500,000 accesses!
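The same computation as a Python sketch, using the slide's numbers:

    def eat_ns(p, mem_ns=100, fault_ns=25_000_000):   # 25 ms fault service time
        # EAT = (1 - p) * memory access + p * page fault overhead
        return (1 - p) * mem_ns + p * fault_ns

    print(eat_ns(1 / 1000))   # ~25,100 ns, i.e., about 25 microseconds
    print(10 / 24_999_900)    # p for 10% degradation: ~4e-7, 1 fault per 2.5M refs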

Page Replacement Algorithms We want the lowest page-fault rate. Evaluate an algorithm by running it on a particular string of memory references (a reference string) and computing the number of page faults on that string. A reference string is the ordered list of pages accessed as the process executes, e.g., A B C A B D A D B C B.

The Best Page to Replace The best page to replace is the one that will never be accessed again. The Optimal Algorithm (Belady's algorithm) gives the lowest fault rate for any reference string: basically, replace the page that will not be used for the longest time in the future. If you know the future, please see me after class! Belady's algorithm is a yardstick; we want to find close approximations.

Page Replacement - FIFO FIFO is simple to implement: when a page comes in, place its id at the end of a list; evict the page at the head of the list. Might that be good? The page to be evicted has been in memory the longest time. But maybe it is still being used; we just don't know. FIFO also suffers from Belady's anomaly: the fault rate may increase when there is more physical memory!

FIFO vs. Optimal Reference string: A B C A B D A D B C B, with 3 page frames. Optimal: 5 faults (on the reference to D, toss C; later, toss A or D). FIFO: 7 faults (on the reference to D, toss A; and so on).

Second Chance Maintain a FIFO page list. On a page fault, check the reference bit of the page at the head: if R == 1, move the page to the end of the list and clear R; if R == 0, evict the page.
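A minimal second-chance sketch in Python, assuming a queue of [page, R] pairs kept in FIFO order (hypothetical representation, not from the slides):

    from collections import deque

    def second_chance_evict(queue):
        while True:
            page, r = queue[0]
            if r:                      # referenced since last check:
                queue[0][1] = 0        # clear R...
                queue.rotate(-1)       # ...and move the page to the end
            else:
                queue.popleft()
                return page            # evict the first page with R == 0

    q = deque([['A', 1], ['B', 0], ['C', 1]])
    print(second_chance_evict(q))  # -> 'B'; 'A' got a second chance, R cleared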

Clock Replacement Create a circular list of PTEs in FIFO order. One-handed clock: the pointer (hand) starts at the oldest page. The algorithm is FIFO, but it checks the reference bit: if R == 1, set R = 0 and advance the hand; evict the first page with R == 0. It looks like a clock hand sweeping the PTE entries. Fast, but the worst case may take a lot of time. A two-handed clock adds a second hand n PTEs ahead; the second hand clears the reference bit.

Not Recently Used (NRU) Page Replacement Algorithm Each page has a Referenced bit and a Modified bit, set when the page is referenced or modified. Pages are classified: (0) not referenced, not modified; (1) not referenced, modified; (2) referenced, not modified; (3) referenced, modified. NRU removes a page at random from the lowest-numbered non-empty class.
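A sketch of NRU victim selection, with the classes numbered 0-3 as above (the dict-of-bits representation is an assumption for illustration):

    import random

    def nru_victim(pages):             # pages: dict page -> (R, M) bits
        classes = [[], [], [], []]
        for page, (r, m) in pages.items():
            classes[2 * r + m].append(page)   # class number 0..3
        for group in classes:
            if group:                  # lowest-numbered non-empty class
                return random.choice(group)

    print(nru_victim({'A': (1, 1), 'B': (0, 1), 'C': (1, 0)}))  # -> 'B' (class 1)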

Least Recently Used (LRU) Replace the page that has not been used for the longest time. Reference string: A B C A B D A D B C, with 3 page frames: LRU gives 5 faults.

LRU Past experience may indicate future behavior. Perfect LRU requires some form of timestamp to be associated with a PTE on every memory reference! Counter implementation: every page entry has a counter; every time the page is referenced through this entry, copy the clock into the counter; when a page needs to be replaced, look at the counters to find the least recently used page. Stack implementation: keep a stack of page numbers in doubly-linked form; when a page is referenced, move it to the top; no search is needed for replacement.

LRU Approximations: Aging and Clock Replacement Aging: keep a counter for each PTE and periodically check the reference bit. If R == 0, increment the counter (the page has not been used); if R == 1, clear the counter (the page has been used); then set R = 0. The counter contains the number of intervals since the last access; replace the page with the largest counter value.
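A sketch of the aging tick, assuming a hypothetical per-page record with an R bit and an interval counter:

    def age_tick(pages):
        # pages: dict page -> {'R': 0 or 1, 'counter': intervals since last use}
        for info in pages.values():
            if info['R']:
                info['counter'] = 0    # used this interval: reset the age
                info['R'] = 0          # and clear the reference bit
            else:
                info['counter'] += 1   # not used: one interval older

    pages = {'A': {'R': 1, 'counter': 3}, 'B': {'R': 0, 'counter': 1}}
    age_tick(pages)
    print(max(pages, key=lambda p: pages[p]['counter']))  # -> 'B' is the victim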

Contrast: Macintosh Memory Model (Mac OS 1-9) Does not use traditional virtual memory. All program objects are accessed through "handles": indirect references through a per-process pointer table. Objects are stored in a shared global address space.

Macintosh Memory Management Allocation/deallocation is similar to free-list management with malloc/free. Compaction: any object can be moved by just updating its (unique) pointer in the pointer table.

Mac vs. VM-Based Memory Management Allocating, deallocating, and moving memory can be accomplished by both techniques. Block sizes: Mac blocks are variable-sized and may be very small or very large; VM blocks are fixed-size, equal to one page (4 KB on x86 Linux systems). Allocating contiguous chunks of memory: the Mac requires contiguous allocation; VM can map a contiguous range of virtual addresses to disjoint ranges of physical addresses. Protection: on the Mac, a "wild write" by one process can corrupt another's data.

Mac OS X A "modern" operating system: virtual memory with protection, and preemptive multitasking (other versions of Mac OS required processes to voluntarily relinquish control). Based on the Mach OS, developed at CMU in the late 1980s.

Page Replacement Policy Working set: the set of pages used actively and heavily. It is kept in memory to reduce page faults and is found and maintained dynamically by the OS. Replacement: the OS tries to predict which page would have the least impact on the running program. Common replacement schemes: Least Recently Used (LRU) and First-In-First-Out (FIFO).

Page Replacement Policies Least Recently Used (LRU) generally works well. Trouble: when the working set is larger than main memory. Example: a working set of 9 pages executed in sequence (0 through 8, repeating) with fewer frames than that makes LRU fault on every reference: THRASHING.

Page Replacement Policies First-In-First-Out (FIFO) removes the least recently loaded page. It does not depend on use: the victim is determined by how long a page has been resident, measured by the page faults it has seen since it was loaded.

Page Replacement Policies Upon replacement, we need to know whether to write data back, so add a dirty bit: dirty bit = 0 means the page is clean, and no writing is needed; dirty bit = 1 means the page is dirty, and it must be written back.
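A last sketch of the dirty-bit decision, with a hypothetical frame record and a dict standing in for the disk:

    def evict(frame, disk):
        # frame: {'page': id, 'dirty': bool, 'data': ...}; disk: page -> data
        if frame['dirty']:
            disk[frame['page']] = frame['data']   # dirty page: write it back
        return frame['page']                      # clean page: nothing to write

    disk = {}
    evict({'page': 7, 'dirty': True, 'data': b'x'}, disk)   # disk now holds page 7
    evict({'page': 8, 'dirty': False, 'data': b'y'}, disk)  # no write-back needed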