Virtual Memory.

1 Virtual Memory

2 Announcements
TA swap: Tyler Steele is now a 415 TA; Rohit Doshi is now a 414 TA.
My office hours today: 3:25-4:25.

3 Review: Memory Hierarchy of a Modern Computer System
Take advantage of the principle of locality to:
- Present as much memory as is available in the cheapest technology
- Provide access at the speed offered by the fastest technology
The hierarchy, from fastest/smallest to slowest/largest:
- Registers (on the processor, with control and datapath): ~100s of bytes, ~1 ns
- On-chip and second-level (SRAM) caches: KBs-MBs, ~10s-100s ns
- Main memory (DRAM): MBs, ~100s ns
- Secondary storage (disk): GBs, ~10,000,000s ns (10s of ms)
- Tertiary storage (tape): TBs, ~10,000,000,000s ns (10s of sec)
The design goal is to present the user with as much memory as is available in the cheapest technology (the disk), while exploiting locality to provide an average access speed close to that offered by the fastest technology. (Covered in detail in the lecture on caches.)

4 Review: A Summary on Sources of Cache Misses
Compulsory (cold start): first reference to a block
- A "cold" fact of life: not a whole lot you can do about it
- Note: when running billions of instructions, compulsory misses are insignificant
Capacity: the cache cannot contain all the blocks accessed by the program
- Solution: increase cache size
Conflict (collision): multiple memory locations mapped to the same cache location
- Solutions: increase cache size, or increase associativity (e.g., a 2-way set associative cache instead of direct mapped)
Two others:
- Coherence (invalidation): another process (e.g., I/O) updates memory, so the cache must be flushed to avoid inconsistency between memory and cache
- Policy: due to a non-optimal replacement policy
Keep in mind that miss rate is only one part of the equation; you also have to worry about cache access time and miss penalty. Do NOT optimize miss rate alone.

5 Review: Where does a Block Get Placed in a Cache?
Example: block 12 of a 32-block address space placed in an 8-block cache:
- Direct mapped: block 12 can go only into cache block 4 (12 mod 8)
- Set associative (2-way, 4 sets): block 12 can go anywhere in set 0 (12 mod 4)
- Fully associative: block 12 can go anywhere

6 Review: Other Caching Questions
What line gets replaced on a cache miss?
- Easy for direct mapped: only one possibility
- Set associative or fully associative: random, or LRU (least recently used)
What happens on a write? (See the sketch below.)
- Write through: the information is written to both the cache and the block in the lower-level memory
- Write back: the information is written only to the block in the cache; a modified cache block is written to main memory only when it is replaced
- The question: is the block clean or dirty?
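
As a rough illustration of the two write policies, here is a minimal C sketch. The cache_line_t layout and the memory_write helper are invented for this example, not any real hardware interface:

    /* Illustrative write-policy sketch; cache_line_t and memory_write
       are invented for this example, not a real interface. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t tag;
        bool     valid;
        bool     dirty;          /* meaningful only for write-back */
        uint8_t  data[64];       /* one 64-byte cache line */
    } cache_line_t;

    void memory_write(uint32_t addr, uint8_t byte);  /* hypothetical */

    /* Write-through: update the cache line AND the lower level on every store. */
    void write_through(cache_line_t *line, uint32_t addr, uint8_t byte) {
        line->data[addr % 64] = byte;
        memory_write(addr, byte);        /* memory is always up to date */
    }

    /* Write-back: update only the cache and mark the line dirty;
       memory is written later, when the dirty line is evicted. */
    void write_back(cache_line_t *line, uint32_t addr, uint8_t byte) {
        line->data[addr % 64] = byte;
        line->dirty = true;
    }

    void evict(cache_line_t *line, uint32_t base_addr) {
        if (line->dirty) {               /* "is the block clean or dirty?" */
            for (int i = 0; i < 64; i++)
                memory_write(base_addr + i, line->data[i]);
            line->dirty = false;
        }
        line->valid = false;
    }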

7 Caching Applied to Address Translation
[Figure: the CPU sends a virtual address to the TLB; on a hit, the cached physical address is used directly; on a miss, the MMU translates via the page table and the result is saved in the TLB before the physical memory access.]
The question is one of page locality: does it exist?
- Instruction accesses spend a lot of time on the same page (since accesses are sequential)
- Stack accesses have definite locality of reference
- Data accesses have less page locality, but still some
Can we have a TLB hierarchy? Sure: multiple levels at different sizes/speeds.

8 What Actually Happens on a TLB Miss?
Hardware-traversed page tables:
- On a TLB miss, hardware in the MMU walks the current page table (possibly multiple levels) to fill the TLB
- If the PTE is valid, hardware fills the TLB and the processor never knows
- If the PTE is marked invalid, it causes a page fault, and the kernel decides what to do
Software-traversed page tables (like MIPS):
- On a TLB miss, the processor receives a TLB fault
- The kernel traverses the page table to find the PTE
- If the PTE is valid, it fills the TLB and returns from the fault
- If the PTE is marked invalid, it internally calls the page fault handler
Most chipsets provide hardware traversal. Modern operating systems tend to have more TLB faults since they use translation for many things, for example shared segments and user-level portions of the operating system. A sketch of the software path follows.
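
A minimal C sketch of the software-traversed path (on a real MIPS this runs in a reserved kernel trap path, often in assembly; the pte_t layout, page table, and the tlb_insert/page_fault_handler stubs are assumptions for illustration):

    /* Sketch of a software TLB miss handler; all structures are illustrative. */
    #include <stdio.h>

    typedef struct {
        unsigned frame : 20;   /* physical frame number */
        unsigned valid : 1;    /* is the page resident in memory? */
    } pte_t;

    static pte_t page_table[1024];          /* hypothetical per-process table */

    static void tlb_insert(unsigned vpn, unsigned pfn) {   /* stub */
        printf("TLB refill: vpn %u -> pfn %u\n", vpn, pfn);
    }
    static void page_fault_handler(unsigned vpn) {         /* stub */
        printf("page fault on vpn %u\n", vpn);
    }

    void tlb_miss_handler(unsigned faulting_vaddr) {
        unsigned vpn = (faulting_vaddr >> 12) % 1024;      /* assume 4 KB pages */
        pte_t pte = page_table[vpn];
        if (pte.valid)
            tlb_insert(vpn, pte.frame);   /* refill the TLB; processor retries */
        else
            page_fault_handler(vpn);      /* not resident: full fault path */
    }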

9 Goals for Today
- Virtual memory: how does it work?
- Page faults; resuming after page faults
- When to fetch?
- What to replace? Page replacement algorithms: FIFO, OPT, LRU (Clock)
- Page buffering
- Allocating pages to processes

10 What is virtual memory?
Each process has the illusion of a large address space: 2^32 bytes for 32-bit addressing. However, physical memory is much smaller. How do we give this illusion to multiple processes?
Virtual memory: some addresses reside on disk.
[Figure: a per-process page table maps virtual pages 0..N either to frames in physical memory or to locations on disk.]

11 Virtual Memory
Separates the user's logical memory from physical memory:
- Only part of the program needs to be in memory for execution
- The logical address space can therefore be much larger than the physical address space
- Allows address spaces to be shared by several processes
- Allows for more efficient process creation

12 Virtual Memory
Loading an entire process into memory (swapping), running it, and exiting:
- Is slow (for big processes)
- Is wasteful (the process might not require everything)
Solution: partial residency
- Paging: bring in individual pages, not all pages of the process
- Demand paging: bring in only the pages that are actually required
Where to fetch a page from? A contiguous space on disk: the swap file (e.g., pagefile.sys).

13 How does VM work?
Extend page table entries with another bit ("valid"):
- If the page is in memory, valid = 1; else valid = 0
- If the page is in memory, translation works as before
- If the page is not in memory, translation causes a page fault
[Figure: a page table whose valid entries point to frames in memory and whose invalid entries refer to pages on disk.]

14 Page Faults
On a page fault, the OS:
- Finds a free frame, or evicts one from memory (which one? ideally we'd want knowledge of the future)
- Issues a disk request to fetch the data for the page (what to fetch? just the requested page, or more?)
- Blocks the current process and context switches to a new process (how? the process might be in the middle of executing an instruction)
- When the disk completes, sets the valid bit to 1 and puts the faulting process back on the ready queue
A sketch of this path appears below.
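
A minimal sketch of that fault-service path in C. All helpers (frame_alloc, choose_victim, the disk I/O calls, block_and_switch) are hypothetical stand-ins, not a real kernel API:

    /* Sketch of the page-fault path described above; all helpers are stand-ins. */
    typedef struct { unsigned frame; int valid; int dirty; } pte_t;

    int  frame_alloc(void);                          /* a free frame, or -1 */
    unsigned choose_victim(void);                    /* policy: "which one?" */
    void disk_write(unsigned frame, unsigned vpn);   /* write victim to swap */
    void disk_read_async(unsigned frame, unsigned vpn); /* fetch faulting page */
    void block_and_switch(void);                     /* run another process */

    void handle_page_fault(pte_t *pt, unsigned vpn) {
        int frame = frame_alloc();
        if (frame < 0) {                             /* no free frame: evict one */
            unsigned victim = choose_victim();
            if (pt[victim].dirty)
                disk_write(pt[victim].frame, victim); /* only dirty pages written */
            pt[victim].valid = 0;
            frame = (int)pt[victim].frame;
        }
        disk_read_async((unsigned)frame, vpn);       /* just this page: demand paging */
        block_and_switch();                          /* faulting process blocks on I/O */
    }

    /* Called from the disk interrupt handler when the read completes. */
    void on_disk_complete(pte_t *pt, unsigned vpn, unsigned frame) {
        pt[vpn].frame = frame;
        pt[vpn].valid = 1;                           /* set the valid bit to 1 */
        /* ...and move the faulting process back to the ready queue. */
    }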

15 Steps in Handling a Page Fault

16 Resuming after a page fault
We should be able to restart the faulting instruction.
- For RISC processors this is simple: instructions are idempotent until their memory references are done
- More complicated for CISC: e.g., an instruction that moves 256 bytes from one location to another may fault partway through
Possible solution: ensure all needed pages are in memory before the instruction executes.

17 Page Fault (Cont.)
Restart-instruction complications:
- Block move instructions (the move may be partially done when the fault occurs)
- Auto increment/decrement addressing modes (a register has already been modified by the time of the fault)

18 When to fetch?
Ideally, just before the page is used! But that would require knowing the future.
- Demand paging: fetch a page when it faults
- Prepaging: fetch the faulting page plus some of its neighbors, or fetch all pages that were in use the last time the process was swapped out

19 Performance of Demand Paging
Page fault rate p, with 0 <= p <= 1.0:
- if p = 0, no page faults
- if p = 1, every reference is a fault
Effective access time (EAT):
EAT = (1 - p) x (memory access time) + p x (page fault overhead + swap page out + swap page in + restart overhead)

20 Demand Paging Example
Memory access time = 200 nanoseconds
Average page-fault service time = 8 milliseconds
EAT = (1 - p) x 200 + p x 8,000,000 ns = 200 + p x 7,999,800 ns
If one access out of 1,000 causes a page fault (p = 0.001), EAT = 8.2 microseconds. This is a slowdown by a factor of about 40!
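
Spelling out the arithmetic (same numbers as above; the 10%-slowdown bound at the end is a common follow-up question, not something stated on the slide):

    /* Effective access time for the demand-paging example above. */
    #include <stdio.h>

    int main(void) {
        double mem_ns   = 200.0;         /* memory access time */
        double fault_ns = 8e6;           /* 8 ms fault service time, in ns */
        double p        = 1.0 / 1000.0;  /* one fault per 1,000 accesses */

        double eat = (1 - p) * mem_ns + p * fault_ns;
        printf("EAT = %.1f ns (~%.1f us)\n", eat, eat / 1000.0); /* ~8199.8 ns */
        printf("slowdown = %.0fx\n", eat / mem_ns);              /* ~41x */

        /* For under 10% slowdown we need 200 + p * 7,999,800 < 220: */
        printf("p must be < %.2e\n", 20.0 / (fault_ns - mem_ns)); /* ~2.5e-6 */
        return 0;
    }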

21 What to replace?
What happens if there is no free frame? Find some page in memory that is not really in use and swap it out.
Page replacement:
- When a process has used up all the frames it is allowed to use, the OS must select a page to eject from memory to make room for the new page
- The page to eject is selected using a page replacement algorithm
- Goal: select the page that minimizes future page faults

22 Page Replacement
- Prevent over-allocation of memory by modifying the page-fault service routine to include page replacement
- Use the modify (dirty) bit to reduce the overhead of page transfers: only modified pages are written back to disk
- Page replacement completes the separation between logical memory and physical memory: a large virtual memory can be provided on top of a smaller physical memory

23 Page Replacement

24 Page Replacement Algorithms
- Random: pick any page to eject at random; used mainly for comparison
- FIFO: evict the page brought in earliest; ignores usage and suffers from Belady's anomaly (the fault rate can increase when the number of frames increases, e.g., with 3 vs. 4 frames below)
- OPT (Belady's algorithm): evict the page that will not be used for the longest time; optimal, but requires knowing the future
- LRU: evict the page that hasn't been used for the longest time; the past can be a good predictor of the future

25 First-In-First-Out (FIFO) Algorithm
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
- With 3 frames (3 pages can be in memory at a time per process): 9 page faults
- With 4 frames: 10 page faults
Belady's anomaly: more frames can mean more page faults. The simulation below reproduces these counts.
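
A small C simulation (illustrative, not from the slides) that reproduces both fault counts, and with them Belady's anomaly:

    /* FIFO page-replacement simulator: count faults for a given frame count. */
    #include <stdio.h>

    int fifo_faults(const int *refs, int n, int nframes) {
        int frames[16];                     /* assumes nframes <= 16 */
        int used = 0, next = 0, faults = 0;
        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int j = 0; j < used; j++)
                if (frames[j] == refs[i]) { hit = 1; break; }
            if (!hit) {
                faults++;
                if (used < nframes)
                    frames[used++] = refs[i];   /* still filling the frames */
                else {
                    frames[next] = refs[i];     /* evict the oldest resident */
                    next = (next + 1) % nframes;
                }
            }
        }
        return faults;
    }

    int main(void) {
        int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        int n = (int)(sizeof refs / sizeof refs[0]);
        printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));  /* 9  */
        printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));  /* 10 */
        return 0;
    }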

26 FIFO Illustrating Belady’s Anomaly

27 Optimal Algorithm
Replace the page that will not be used for the longest period of time.
Example with 4 frames and reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: 6 page faults.
How do you know the future? You don't; OPT is used for measuring how well your algorithm performs.

28 Least Recently Used (LRU) Algorithm
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
With 4 frames, LRU incurs 8 page faults on this string: better than FIFO (10), though not as good as OPT (6).

29 Implementing Perfect LRU
- On reference: timestamp each page
- On eviction: scan for the oldest frame
Problems: page lists are large, and timestamping every reference is costly.
So we approximate LRU; LRU is itself already an approximation (of OPT)!

30 LRU: Clock Algorithm
Each page has a reference bit, set by hardware on use and reset periodically by the OS.
Algorithm: FIFO + reference bit (keep pages in a circular list). Scan: if the ref bit is 1, set it to 0 and proceed; if the ref bit is 0, stop and evict. A sketch follows.
Problem: low accuracy for large memory (by the time the hand sweeps all the way around, every page may have its bit set again).
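
A minimal C sketch of the one-handed clock scan (frame_t and the fixed frame count are illustrative):

    /* One-handed clock: circular scan over frames with reference bits. */
    #define NFRAMES 8

    typedef struct {
        int page;    /* which virtual page occupies this frame */
        int ref;     /* reference bit, set by hardware on use */
    } frame_t;

    static frame_t frames[NFRAMES];
    static int hand = 0;                   /* the clock hand */

    /* Returns the index of the frame to evict. */
    int clock_evict(void) {
        for (;;) {
            if (frames[hand].ref) {
                frames[hand].ref = 0;      /* second chance: clear and move on */
                hand = (hand + 1) % NFRAMES;
            } else {
                int victim = hand;         /* ref bit is 0: evict this frame */
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
        }
    }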

31 LRU with large memory
Solution: add another hand.
- The leading edge clears reference bits
- The trailing edge evicts pages whose ref bit is still 0
What if the angle between the hands is small? Pages have little time to be re-referenced, so almost everything looks unused and the policy degenerates toward FIFO. What if the angle is big? Most pages get referenced again before the trailing hand arrives, and we are back to the one-handed clock's problem.

32 Clock Algorithm: Discussion
Sensitive to the sweeping interval:
- Too fast: lose usage information
- Too slow: all pages look used
Could use the (ref bit, modified bit) ordered pair for eviction decisions, but might have to scan all pages.
LFU: remove the page with the lowest use count
- Keeps no track of when the page was referenced
- Refinement: use multiple bits per page, shifted right by 1 at regular intervals (aging; see the sketch below)
MFU: remove the most frequently used page
LFU and MFU do not approximate OPT well.
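
The multiple-bit refinement mentioned above is usually called aging; here is a minimal C sketch (the array sizes and a software-visible refbit are assumptions for illustration):

    /* Aging: approximate LRU with an 8-bit counter per page. */
    #include <stdint.h>

    #define NPAGES 1024
    static uint8_t age[NPAGES];     /* larger value = used more recently */
    static uint8_t refbit[NPAGES];  /* set on access; illustrative */

    /* At each timer interval: shift right, OR the ref bit into the MSB. */
    void age_tick(void) {
        for (int p = 0; p < NPAGES; p++) {
            age[p] = (uint8_t)((age[p] >> 1) | (refbit[p] << 7));
            refbit[p] = 0;          /* reset for the next interval */
        }
    }

    /* Victim = page with the smallest age counter. */
    int pick_victim(void) {
        int victim = 0;
        for (int p = 1; p < NPAGES; p++)
            if (age[p] < age[victim]) victim = p;
        return victim;
    }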

33 Page Buffering
A cute, simple trick used in practice (Windows XP/2000, Mach, VMS):
- Keep a list of free pages, and track which page each free frame corresponds to (so a recently evicted page can be reclaimed without going back to disk)
- Keep evicted-but-modified pages on a modified list; periodically write them out and reset the modified bit, moving them to the unmodified free list (batched writes = speed)

34 Allocating Pages to Processes
Global replacement:
- Single memory pool for the entire system; on a page fault, evict the oldest page in the system, whichever process owns it
- Problem: protection; one process can steal frames from another
Local (per-process) replacement:
- A separate pool of pages for each process; a page fault in one process can only replace pages from that same process
- Problem: might leave resources idle

35 Allocation of Frames
Each process needs a minimum number of frames.
Example: the IBM 370 needs up to 6 frames to handle the SS MOVE instruction: the instruction is 6 bytes long and might span 2 pages, plus 2 pages for the 'from' operand and 2 pages for the 'to' operand.
Two major allocation schemes: fixed allocation and priority allocation. A sketch of proportional (fixed) allocation follows.
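
As one concrete instance of fixed allocation, here is the standard proportional scheme, which divides the m available frames among processes in proportion to their sizes; the code and the two-process example are illustrative:

    /* Proportional allocation: a_i = (s_i / S) * m frames for process i. */
    #include <stdio.h>

    void allocate(const int *size, int nproc, int m, int *alloc) {
        long total = 0;
        for (int i = 0; i < nproc; i++) total += size[i];
        for (int i = 0; i < nproc; i++)
            alloc[i] = (int)((long)size[i] * m / total);
        /* Truncation can leave a few frames unassigned; hand those out
           by any tie-breaking rule (e.g., to the largest process). */
    }

    int main(void) {
        int size[] = {10, 127};      /* pages needed by two processes */
        int alloc[2];
        allocate(size, 2, 62, alloc);                   /* 62 frames total */
        printf("P0: %d, P1: %d\n", alloc[0], alloc[1]); /* 4 and 57 */
        return 0;
    }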

36 Summary
Demand paging:
- Treat memory as a cache on disk; a cache miss means getting the page from disk
Transparent level of indirection:
- The user program is unaware of the activities of the OS behind the scenes
- Data can be moved without affecting application correctness
Replacement policies:
- FIFO: place pages on a queue; replace the page at the end
- OPT: replace the page that will be used farthest in the future
- LRU: replace the page that hasn't been used for the longest time
Clock algorithm (an approximation to LRU):
- Arrange all pages in a circular list
- Sweep through them, marking them as not "in use"
- If a page stays not "in use" for one full pass, it can be replaced

