Prof. Sin-Min Lee Department of Computer Science

Name: Prof. Sin-Min Lee Department of Computer Science
Uploaded: 2017-08-22T02:46:43+00:00
Duration: PTM38S16
Channel: Sasha Rostron
Description: Prof. Sin-Min Lee Department of Computer Science

CS147 Lecture 17 Virtual Memory Prof. Sin-Min Lee Department of Computer Science

Fixed (Static) Partitions
An attempt at multiprogramming using fixed partitions: one partition for each job. The size of each partition is designated by reconfiguring the system; partitions can’t be too small or too large. It is critical to protect each job’s memory space. The entire program is stored contiguously in memory during its entire execution. Internal fragmentation is a problem.

Simplified Fixed Partition Memory Table (Table 2.1)

Table 2.1 : Main memory use during fixed partition allocation of Table 2.1. Job 3 must wait.
Job list: J1 30K, J2 50K, J3 30K, J4 25K

Partition     Size    Original state   After job entry
Partition 1   100K    free             Job 1 (30K)
Partition 2   25K     free             Job 4 (25K)
Partition 3   25K     free             free
Partition 4   50K     free             Job 2 (50K)

Dynamic Partitions
Available memory is kept in contiguous blocks, and jobs are given only as much memory as they request when loaded. This improves memory use over fixed partitions. Performance deteriorates as new jobs enter the system: fragments of free memory are created between blocks of allocated memory (external fragmentation).

Dynamic Partitioning of Main Memory & Fragmentation (Figure 2.2)

Dynamic Partition Allocation Schemes
First-fit: Allocate the first partition that is big enough. Keep free/busy lists organized by memory location (low-order to high-order). Faster in making the allocation.
Best-fit: Allocate the smallest partition that is big enough. Keep free/busy lists ordered by size (smallest to largest). Produces the smallest leftover partition. Makes best use of memory.
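The two schemes can be compared with a minimal Python sketch. The function names and the free list below are hypothetical, and best-fit is done here by scanning an address-ordered list rather than by keeping a separate size-ordered list as the slide describes.

```python
def first_fit(free_blocks, request):
    """Return the index of the first free block that is big enough, or None."""
    # free_blocks is assumed to be ordered by memory location.
    for i, size in enumerate(free_blocks):
        if size >= request:
            return i
    return None


def best_fit(free_blocks, request):
    """Return the index of the smallest free block that is big enough, or None."""
    candidates = [(size, i) for i, size in enumerate(free_blocks) if size >= request]
    if not candidates:
        return None
    return min(candidates)[1]


# Hypothetical free list: block sizes in K, ordered by memory location.
blocks = [30, 15, 50, 20]
print(first_fit(blocks, 10))  # picks the 30K block (index 0), leaving 20K unused
print(best_fit(blocks, 10))   # picks the 15K block (index 1), leaving 5K unused
```

First-fit stops at the first match, which is why it allocates faster; best-fit has to look at every block but leaves the smallest remainder.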

First-Fit Allocation Example (Table 2.2)
Job list: J1 10K, J2 20K, J3 30K*, J4 10K
Table columns: memory location, memory block size, job number, job size, status, internal fragmentation.
Block 1: Busy, internal fragmentation 20K
Block 2: Busy, internal fragmentation 5K
Block 3: Busy, internal fragmentation 30K
Block 4: Free
Total available: 115K

Best-Fit Allocation Example (Table 2.3)
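Job list: J1 10K, J2 20K, J3 30K, J4 10K
Table columns: memory location, memory block size, job number, job size, status, internal fragmentation.
Block 1: Busy, internal fragmentation 5K
Block 2: Busy, internal fragmentation none
Block 3: Busy, internal fragmentation none
Block 4: Busy, internal fragmentation 40K
Total available: 115K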

First-Fit Memory Request
Assume that a job of size 200 bytes is waiting to be loaded into memory.

Best-Fit Memory Request

Best-Fit vs. First-Fit
First-Fit: Increases memory use. Memory allocation takes less time. Increases internal fragmentation. Discriminates against large jobs.
Best-Fit: More complex algorithm. Searches the entire table before allocating memory. Results in a smaller “free” space (sliver).

Release of Memory Space : Deallocation
Deallocation for fixed partitions is simple: the Memory Manager resets the status of the memory block to “free”. Deallocation for dynamic partitions tries to combine free areas of memory whenever possible. Three cases arise: Is the block adjacent to another free block? Is the block between 2 free blocks? Is the block isolated from other free blocks?

Case 1: Joining 2 Free Blocks

Case 2: Joining 3 Free Blocks
This slide has an error: The job finishing is at 7580 and has a size of 20 bytes, not 7600 and a size of Assume that the job at 7600 is already free and has a size of 205. The after chart is correct as stated assuming these changes to the before deallocation.

Case 3: Deallocating an Isolated Block
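The three deallocation cases can be handled by one merge pass over an address-ordered free list. This is a minimal sketch; the `release` function and the addresses are hypothetical and are not taken from the slides' figures.

```python
def release(free_list, start, size):
    """Return a new free list with the block [start, start + size) added,
    merged with any free neighbors. Entries are (start, size) tuples."""
    merged = []
    for s, n in sorted(free_list + [(start, size)]):
        if merged and merged[-1][0] + merged[-1][1] == s:
            # This block begins exactly where the previous free block ends: join them.
            prev_start, prev_size = merged.pop()
            merged.append((prev_start, prev_size + n))
        else:
            merged.append((s, n))
    return merged


# Case 1: the released block touches one free block.
print(release([(100, 50)], 150, 20))
# Case 2: the released block sits between two free blocks.
print(release([(100, 50), (200, 30)], 150, 50))
# Case 3: the released block is isolated.
print(release([(100, 50)], 300, 10))
```

Sorting on every release keeps the sketch short; a real memory manager would more likely keep the list ordered and only inspect the two neighbors.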

Relocatable Dynamic Partitions
Memory Manager relocates programs to gather all empty blocks and compact them to make 1 memory block. Memory compaction (garbage collection, defragmentation) performed by OS to reclaim fragmented sections of memory space. Memory Manager optimizes use of memory & improves throughput by compacting & relocating.

Compaction Steps
Relocate every program in memory so they’re contiguous. Adjust every address, and every reference to an address, within each program to account for the program’s new location in memory. Leave alone all other values within the program (e.g., data values).

Memory Before & After Compaction (Figure 2.5)

Contents of relocation register & close-up of Job 4 memory area (a) before relocation & (b) after relocation and compaction (Figure 2.6) Note each job will have its own relocation register value.

Virtual Memory
Virtual memory (VM) is the ability of the CPU and the operating system software to use the hard disk drive as additional RAM when needed (a safety net). Good: no longer get “insufficient memory” errors. Bad: performance is very slow when accessing VM. Solution: more RAM.

Motivations for Virtual Memory
Use physical DRAM as a cache for the disk: the address space of a process can exceed physical memory size, and the sum of the address spaces of multiple processes can exceed physical memory.
Simplify memory management: multiple processes are resident in main memory, each with its own address space. Only “active” code and data is actually in memory. More memory is allocated to a process as needed.
Provide protection: one process can’t interfere with another, because they operate in different address spaces. A user process cannot access privileged information, because different sections of address spaces have different permissions.

Virtual Memory

Levels in Memory Hierarchy
Level         Size         Speed    $/Mbyte      Line size
Register      32 B         1 ns                  8 B
Cache         32 KB-4MB    2 ns     $100/MB      32 B
Memory        128 MB       50 ns    $1.00/MB     4 KB
Disk memory   20 GB        8 ms     $0.006/MB
Moving down the hierarchy, each level is larger, slower, and cheaper.

DRAM vs. SRAM as a “Cache”
DRAM vs. disk is more extreme than SRAM vs. DRAM.
Access latencies: DRAM is ~10X slower than SRAM; disk is ~100,000X slower than DRAM.
Importance of exploiting spatial locality: the first byte is ~100,000X slower than successive bytes on disk, vs. a ~4X improvement for page-mode vs. regular accesses to DRAM.
Bottom line: design decisions made for DRAM caches are driven by the enormous cost of misses.

Locating an Object in a “Cache” (cont.)
DRAM cache: each allocated page of virtual memory has an entry in the page table. The page table maps virtual pages to physical pages, that is, from uncached form to cached form. A page table entry exists even if the page is not in memory; in that case it specifies the disk address, and the OS retrieves the information.
Figure: a page table mapping object names (D, J, X) either to a location in the “cache” or to “On Disk”.

A System with Physical Memory Only
Examples: most Cray machines, early PCs, nearly all embedded systems, etc. Addresses generated by the CPU point directly to bytes in physical memory (physical addresses 0 to N-1).

A System with Virtual Memory
Examples: workstations, servers, modern PCs, etc. Address translation: hardware converts virtual addresses (0 to N-1) to physical addresses (0 to P-1) via an OS-managed lookup table (the page table). Pages not in memory reside on disk.

Page Faults (Similar to “Cache Misses”)
What if an object is on disk rather than in memory? The page table entry indicates that the virtual address is not in memory. An OS exception handler is invoked to move data from disk into memory; the current process suspends, and others can resume. The OS has full control over placement, etc.

Terminology
Cache: a small, fast “buffer” that lies between the CPU and the main memory and holds the most recently accessed data.
Virtual memory: program and data are assigned addresses independent of the amount of physical main memory storage actually available and the location from which the program will actually be executed.
Hit ratio: probability that the next memory access is found in the cache.
Miss rate: (1.0 - hit ratio)

Importance of Hit Ratio
Given: h = hit ratio; Ta = average effective memory access time by CPU; Tc = cache access time; Tm = main memory access time.
Effective memory time is: Ta = hTc + (1 - h)Tm
Speedup due to the cache is: Sc = Tm / Ta
Example: Assume a main memory access time of 100ns, a cache access time of 10ns, and a hit ratio of .9.
Ta = .9(10ns) + (1 - .9)(100ns) = 19ns
Sc = 100ns / 19ns = 5.26
Same as above, only the hit ratio is now .95 instead:
Ta = .95(10ns) + (1 - .95)(100ns) = 14.5ns
Sc = 100ns / 14.5ns = 6.9
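The two worked examples can be checked with a few lines of Python. The function names are illustrative only; the formulas are the ones given on the slide.

```python
def effective_access_time(h, tc, tm):
    """Ta = h*Tc + (1 - h)*Tm, with all times in the same unit."""
    return h * tc + (1 - h) * tm


def speedup(h, tc, tm):
    """Sc = Tm / Ta."""
    return tm / effective_access_time(h, tc, tm)


# Slide values: Tc = 10 ns, Tm = 100 ns, hit ratios 0.9 and 0.95.
for h in (0.9, 0.95):
    ta = effective_access_time(h, tc=10, tm=100)
    sc = speedup(h, tc=10, tm=100)
    print(f"h = {h}: Ta = {ta:.1f} ns, Sc = {sc:.2f}")
```

Raising the hit ratio by only five points cuts the average access time from 19 ns to 14.5 ns, because every miss costs ten times as much as a hit.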

Cache vs Virtual Memory
Primary goal of cache: increase speed. Primary goal of virtual memory: increase space.

Cache Replacement Algorithms
The replacement algorithm determines which block in the cache is removed to make room. There are 2 main policies used today.
Least Recently Used (LRU): the block replaced is the one unused for the longest time.
Random: the block replaced is chosen completely at random, a counter-intuitive approach.

LRU vs Random
Below are sample miss rates for both LRU and Random. 16KB cache: 4.4% (LRU), 5.0% (Random). 64KB cache: 1.4% (LRU), 1.5% (Random). 256KB cache: 1.1%. As the cache size increases there are more blocks to choose from, so the choice is less critical: the probability of replacing the block that’s needed next is relatively low.

Virtual Memory Replacement Algorithms
1) Optimal 2) First In First Out (FIFO) 3) Least Recently Used (LRU)

Optimal
Replace the page which will not be used for the longest (future) period of time. Faults are shown in boxes; hits are not shown. 7 page faults occur.

Optimal
A theoretically “best” page replacement algorithm for a given fixed size of VM. Produces the lowest possible page fault rate. Impossible to implement, since it requires future knowledge of the reference string. Used only to gauge the performance of real algorithms against the theoretical best.

FIFO
When a page fault occurs, replace the page that was brought in first. Faults are shown in boxes; hits are not shown. 9 page faults occur.

FIFO
Simplest page replacement algorithm. Problem: it can exhibit inconsistent behavior known as Belady’s anomaly. The number of faults can increase if a job is given more physical memory, i.e., it is not predictable.

Example of FIFO Inconsistency
Same reference string as before, only with 4 frames instead of 3. Faults are shown in boxes; hits are not shown. 10 page faults occur.
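Belady’s anomaly is easy to reproduce in a short simulation. The reference string itself is not given here, so the sketch below assumes the classic textbook string 1 2 3 4 1 2 5 1 2 3 4 5, which yields the same FIFO fault counts quoted above (9 with 3 frames, 10 with 4). The function name is illustrative.

```python
from collections import deque


def fifo_faults(reference_string, num_frames):
    """Count page faults under FIFO replacement."""
    queue = deque()    # resident pages, oldest at the left
    resident = set()   # same pages, for fast membership tests
    faults = 0
    for page in reference_string:
        if page in resident:
            continue   # hit: FIFO order is not affected by use
        faults += 1
        if len(queue) == num_frames:
            resident.discard(queue.popleft())   # evict the page loaded first
        queue.append(page)
        resident.add(page)
    return faults


# Assumed reference string (not recovered from the slides).
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9
print(fifo_faults(refs, 4))  # 10: more memory, more faults
```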

LRU
Replace the page which has not been used for the longest period of time. Faults are shown in boxes; hits only rearrange the stack. 9 page faults occur.

LRU
More expensive to implement than FIFO, but more consistent. Does not exhibit Belady’s anomaly. More overhead is needed, since the stack must be updated on each access.

Example of LRU Consistency
Same reference string as before, only with 4 frames instead of 3. Faults are shown in boxes; hits only rearrange the stack. 7 page faults occur.

Servicing a Page Fault
(1) Initiate block read: the processor signals the controller to read a block of length P starting at disk address X and store it starting at memory address Y.
(2) DMA transfer: the read occurs by Direct Memory Access (DMA), under control of the I/O controller.
(3) Read done: the I/O controller signals completion by interrupting the processor, and the OS resumes the suspended process.

Handling Page Faults
A memory reference causes a fault, called a page fault. A page fault can happen at any time and place: during an instruction fetch, or in the middle of an instruction’s execution. The system must save all state, move the page from disk to memory, restart the faulting instruction, and restore state. Backing up the PC is not easy, because it is hard to find out by how much; hardware help is needed.

Page Fault
The first reference to a page traps to the OS: a page fault. The steps are:
1) Hardware traps to the kernel.
2) General registers are saved.
3) The OS determines which virtual page is needed.
4) The OS checks the validity of the address and seeks a page frame.
5) If the selected frame is dirty, it is written to disk.
6) The OS schedules the new page to be brought in from disk.
7) Page tables are updated.
8) The faulting instruction is backed up to when it began.
9) The faulting process is scheduled.
10) Registers are restored.
11) The program continues.

What to Page in
Demand paging brings in the faulting page. To bring in additional pages, we need to know the future. Users don’t really know the future, but some OSs have user-controlled pre-fetching. In real systems, load the initial page and start running. Some systems (e.g., WinNT) will bring in additional neighboring pages (clustering). Demand paging starts with nothing: all PTEs have I=0. Then, as the process executes, faults occur as code and data are needed (demanded) by the process.

VM Page Replacement
If there is an unused page, use it. If there are no pages available, select one (policy?) and:
1) If it is dirty (M == 1), write it to disk.
2) Invalidate its PTE and TLB entry.
3) Load in the new page from disk.
4) Update the PTE and TLB entry!
5) Restart the faulting instruction.
What is the cost of replacing a page? How does the OS select the page to be evicted?

Measuring Demand Paging Performance
Page fault rate p: 0 ≤ p ≤ 1.0 (from no page faults to every reference being a fault).
Page fault overhead = fault service overhead + read page + restart process overhead. It is dominated by the time to read the page in.
Effective Access Time = (1 - p) (memory access) + p (page fault overhead)

Performance Example
Memory access time = 100 nanoseconds. Page fault overhead = 25 millisec (msec). Page fault rate = 1/1000.
EAT = (1 - p) * 100 + p * (25 msec)
= (1 - p) * 100 + p * 25,000,000
= 100 + 24,999,900 * p
= 100 + 24,999,900 * 1/1000 = 25 microseconds!
Want less than 10% degradation:
110 > 100 + 24,999,900 * p
10 > 24,999,900 * p
p < 0.0000004, or 1 fault in 2,500,000 accesses!
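The arithmetic in this example can be reproduced with a short Python sketch. The function names and default arguments are illustrative; the numbers are the slide’s 100 ns memory access and 25 msec (25,000,000 ns) page fault overhead.

```python
def effective_access_time_ns(p, memory_ns=100, fault_ns=25_000_000):
    """EAT = (1 - p) * memory access + p * page fault overhead, in nanoseconds."""
    return (1 - p) * memory_ns + p * fault_ns


def max_fault_rate(degradation, memory_ns=100, fault_ns=25_000_000):
    """Largest p that keeps EAT below memory_ns * (1 + degradation)."""
    # From memory_ns * (1 + d) > memory_ns + (fault_ns - memory_ns) * p
    return degradation * memory_ns / (fault_ns - memory_ns)


eat = effective_access_time_ns(1 / 1000)
print(f"EAT at p = 1/1000: {eat / 1000:.1f} microseconds")

p_max = max_fault_rate(0.10)
print(f"p must stay below {p_max:.2e}, about 1 fault in {1 / p_max:,.0f} accesses")
```

One fault per thousand references already makes memory look roughly 250 times slower than its raw 100 ns access time.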

Page Replacement Algorithms
We want the lowest page-fault rate. Evaluate an algorithm by running it on a particular string of memory references (the reference string) and computing the number of page faults on that string. A reference string is the ordered list of pages accessed as a process executes. Example reference string: A B C A B D A D B C B

The Best Page to Replace
The best page to replace is the one that will never be accessed again. The optimal algorithm (Belady’s Algorithm) has the lowest fault rate for any reference string. Basically, replace the page that will not be used for the longest time in the future. If you know the future, please see me after class!! Belady’s Algorithm is a yardstick: we want to find close approximations.

Page Replacement - FIFO
FIFO is simple to implement: when a page comes in, place its page id on the end of the list; evict the page at the head of the list. Might be good? The page to be evicted has been in memory the longest time. But? Maybe it is being used; we just don’t know. FIFO suffers from Belady’s Anomaly: the fault rate may increase when there is more physical memory!

FIFO vs. Optimal
Reference string: the ordered list of pages accessed as a process executes. Example reference string: A B C A B D A D B C B. The system has 3 page frames.
OPTIMAL: 5 faults. On the fault for D, toss C; on the later fault for C, toss A or D.
FIFO: 7 faults. On the fault for D, toss A; which page is tossed next?
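Both fault counts can be reproduced with a small simulation of the reference string A B C A B D A D B C B on 3 frames. This is a minimal sketch with illustrative function names; the optimal policy is only computable here because the whole reference string is known in advance.

```python
def fifo_faults(refs, num_frames):
    """Count page faults under FIFO replacement."""
    frames = []   # oldest page first
    faults = 0
    for page in refs:
        if page in frames:
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.pop(0)   # evict the page that was loaded first
        frames.append(page)
    return faults


def optimal_faults(refs, num_frames):
    """Count page faults under Belady's optimal replacement."""
    frames = []
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == num_frames:
            future = refs[i + 1:]

            def next_use(p):
                # Distance to the next reference; pages never used again rank farthest.
                return future.index(p) if p in future else len(future)

            frames.remove(max(frames, key=next_use))
        frames.append(page)
    return faults


refs = list("ABCABDADBCB")
print("Optimal:", optimal_faults(refs, 3))  # 5
print("FIFO:   ", fifo_faults(refs, 3))     # 7
```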

Second Chance
Maintain a FIFO page list. On a page fault, check the reference bit of the page at the head of the list. If R == 1, move the page to the end of the list and clear R. If R == 0, evict the page.

Clock Replacement
Create a circular list of PTEs in FIFO order.
One-handed clock: the pointer starts at the oldest page. The algorithm is FIFO, but it checks the Reference bit: if R == 1, set R = 0 and advance the hand; evict the first page with R == 0. It looks like a clock hand sweeping PTE entries. Fast, but the worst case may take a lot of time.
Two-handed clock: add a 2nd hand that is n PTEs ahead. The 2nd hand clears the Reference bit.
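The one-handed clock can be sketched as a fixed array of frames with a hand index. This is a simplified model under stated assumptions: the reference bit is set in software on every access (real hardware sets it), a newly loaded page starts with R = 1, and the function name is illustrative.

```python
def clock_faults(refs, num_frames):
    """Count page faults under one-handed clock (second chance) replacement."""
    frames = [None] * num_frames   # page held in each frame
    ref_bit = [0] * num_frames     # R bit for each frame
    hand = 0
    faults = 0
    for page in refs:
        if page in frames:
            ref_bit[frames.index(page)] = 1   # access sets R
            continue
        faults += 1
        # Sweep: a page with R == 1 gets a second chance and has R cleared.
        while frames[hand] is not None and ref_bit[hand] == 1:
            ref_bit[hand] = 0
            hand = (hand + 1) % num_frames
        frames[hand] = page                   # empty frame or first page with R == 0
        ref_bit[hand] = 1
        hand = (hand + 1) % num_frames
    return faults


print(clock_faults(list("ABCABDADBCB"), 3))
```

The worst case mentioned above shows up when every resident page has R == 1: the hand has to clear all of them and come back around before it can evict anything.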

Not Recently Used Page Replacement Algorithm
Each page has a Reference bit and a Modified bit; the bits are set when the page is referenced or modified. Pages are classified, from the lowest class to the highest: not referenced, not modified; not referenced, modified; referenced, not modified; referenced, modified. NRU removes a page at random from the lowest-numbered non-empty class.

Least Recently Used (LRU)
Replace the page that has not been used for the longest time. With 3 page frames and the reference string A B C A B D A D B C, LRU incurs 5 faults.
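The 5-fault count can be checked with a short simulation. This is a minimal sketch that uses Python’s `collections.OrderedDict` to keep pages in recency order; the function name is illustrative.

```python
from collections import OrderedDict


def lru_faults(refs, num_frames):
    """Count page faults under exact LRU replacement."""
    frames = OrderedDict()   # least recently used page first
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)     # hit: mark as most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)   # evict the least recently used page
        frames[page] = True
    return faults


print(lru_faults(list("ABCABDADBC"), 3))  # 5
```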

LRU
Past experience may indicate future behavior. Perfect LRU requires some form of timestamp to be associated with a PTE on every memory reference!
Counter implementation: every page entry has a counter; every time the page is referenced through this entry, copy the clock into the counter. When a page needs to be replaced, look at the counters to determine which page to replace.
Stack implementation: keep a stack of page numbers in doubly linked form. When a page is referenced, move it to the top. No search is needed for replacement.

LRU Approximations
Two approximations: aging and clock replacement.
Aging: keep a counter for each PTE. Periodically check the Reference bit. If R == 0, increment the counter (the page has not been used). If R == 1, clear the counter (the page has been used). Then set R = 0. The counter contains the number of intervals since the last access. Replace the page with the largest counter value.
Clock replacement: the reference-bit sweep described under Clock Replacement.
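The aging scheme amounts to one periodic pass over the per-page counters. The sketch below models PTEs as two plain dictionaries; the function names, the dictionary layout, and the page names are all hypothetical.

```python
def age_pages(counters, ref_bits):
    """One periodic pass: update each page's counter from its R bit, then clear R."""
    for page in counters:
        if ref_bits[page]:
            counters[page] = 0    # page was used during the last interval
        else:
            counters[page] += 1   # one more interval without use
        ref_bits[page] = 0


def pick_victim(counters):
    """Replace the page with the largest counter (longest time since last use)."""
    return max(counters, key=counters.get)


counters = {"A": 0, "B": 0, "C": 0}
ref_bits = {"A": 1, "B": 0, "C": 0}
age_pages(counters, ref_bits)    # interval 1: only A was referenced
ref_bits["C"] = 1
age_pages(counters, ref_bits)    # interval 2: only C was referenced
print(counters, pick_victim(counters))
```

After two intervals B has gone unused the longest, so it is the page chosen for replacement.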

Contrast: Macintosh Memory Model
Mac OS 1-9 does not use traditional virtual memory. All program objects are accessed through “handles”: indirect references through a pointer table. Objects are stored in a shared global address space.
Figure: processes P1 and P2 each have a pointer table whose handles refer to objects A to E in the shared address space.

Macintosh Memory Management
Allocation / deallocation: similar to free-list management of malloc/free.
Compaction: can move any object and just update the (unique) pointer in the pointer table.

Mac vs. VM-Based Memory Mgmt
Allocating, deallocating, and moving memory: can be accomplished by both techniques.
Block sizes: Mac: variable-sized; may be very small or very large. VM: fixed-size; size is equal to one page (4KB on x86 Linux systems).
Allocating contiguous chunks of memory: Mac: contiguous allocation is required. VM: can map a contiguous range of virtual addresses to disjoint ranges of physical addresses.
Protection: Mac: a “wild write” by one process can corrupt another’s data.

MAC OS X
“Modern” operating system: virtual memory with protection and preemptive multitasking. Other versions of Mac OS require processes to voluntarily relinquish control. Based on the Mach OS, developed at CMU in the late 1980s.

Page Replacement Policy
Working set: the set of pages used actively and heavily. It is kept in memory to reduce page faults, and is found and maintained dynamically by the OS.
Replacement: the OS tries to predict which page would have the least impact on the running program.
Common replacement schemes: Least Recently Used (LRU) and First-In-First-Out (FIFO).

Page Replacement Policies
Least Recently Used (LRU)
Generally works well. TROUBLE: when the working set is larger than the main memory. Example: working set = 9 pages, with pages executed in sequence (0 to 8, then repeat). The result is THRASHING.
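The thrashing case can be demonstrated by cycling through 9 pages under LRU. The slide does not state the memory size, so the sketch assumes 8 frames, one fewer than the working set; the function name is illustrative.

```python
def lru_faults(refs, num_frames):
    """Count page faults under LRU, using a list ordered by recency."""
    frames = []   # most recently used page at the end
    faults = 0
    for page in refs:
        if page in frames:
            frames.remove(page)     # hit: re-appended below as most recent
        else:
            faults += 1
            if len(frames) == num_frames:
                frames.pop(0)       # evict the least recently used page
        frames.append(page)
    return faults


loop = list(range(9)) * 3     # pages 0..8 in sequence, three passes
print(lru_faults(loop, 8))    # 27: every single reference faults
print(lru_faults(loop, 9))    # 9: only the initial loads fault
```

With one frame too few, LRU always evicts exactly the page that the loop needs next, so adding a single frame drops the fault count from 27 to 9.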

First-In-First-Out (FIFO)
Removes the least recently loaded page. Does not depend on use. Determined by the number of page faults seen by a page.

Upon Replacement
Need to know whether to write data back, so add a dirty bit. Dirty bit = 0: page is clean; no writing. Dirty bit = 1: page is dirty; write back.
