Chapter Seven
Large and Fast: Exploiting Memory Hierarchy (Part II)
1998 Morgan Kaufmann Publishers

Virtual Memory: Motivations
To allow efficient and safe sharing of memory among multiple programs.
To remove the programming burdens of a small, limited amount of main memory.

Virtual Memory
Main memory can act as a cache for the secondary storage (disk).
Advantages:
–illusion of having more physical memory
–program relocation
–protection

Pages: virtual memory blocks
Page faults: the data is not in memory, retrieve it from disk
–huge miss penalty, thus pages should be fairly large (e.g., 4 KB)
–reducing page faults is important (LRU is worth the price)
–can handle the faults in software instead of hardware
–using write-through is too expensive so we use write-back
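
With fixed-size pages, a virtual address splits into a virtual page number and an offset within the page. The sketch below assumes the 4 KB page size used as the example above; the function name is illustrative only.

```python
PAGE_SIZE = 4096                           # 4 KB pages (the slide's example size)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 offset bits for a 4 KB page


def split_virtual_address(vaddr):
    """Return (virtual page number, page offset) for a virtual address."""
    vpn = vaddr >> OFFSET_BITS         # upper bits select the page
    offset = vaddr & (PAGE_SIZE - 1)   # lower bits select the byte in the page
    return vpn, offset


print(split_virtual_address(0x12345))   # (18, 837), i.e. page 0x12, offset 0x345
```

Larger pages mean more offset bits and fewer page-number bits, so one disk transfer brings in more data per fault.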

Placing a Page and Finding It Again
We want the ability to use a clever and flexible replacement scheme.
We want to reduce the page fault rate.
Fully associative placement serves both purposes.
But a full search is impractical, so we locate pages by using a full table that indexes the memory.
==> page table (resides in memory)
Each program has its own page table, which maps the virtual address space of that program to main memory.
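
A minimal sketch of page table lookup, assuming 4 KB pages and a page table modeled as a Python list with one (valid bit, physical page number) entry per virtual page. The names `translate` and `PageFault` are hypothetical. The point is that the virtual page number indexes the table directly, so no search is needed.

```python
PAGE_SIZE = 4096  # bytes per page (assumed)


class PageFault(Exception):
    """Raised when the referenced virtual page is not in main memory."""


def translate(page_table, vaddr):
    """Map a virtual address to a physical address through the page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn >= len(page_table):
        raise PageFault(vpn)
    valid, ppn = page_table[vpn]   # direct index by virtual page number
    if not valid:
        raise PageFault(vpn)
    return ppn * PAGE_SIZE + offset


# One entry per virtual page: (valid bit, physical page number).
page_table = [(True, 5), (False, None), (True, 2)]
print(hex(translate(page_table, 0x0010)))   # 0x5010
```

Referencing virtual page 1 in this table raises `PageFault`, because its valid bit is off.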

[Figure: page table register and page table]

Process
The page table, together with the program counter and the registers, specifies the state of a program.
If we want to allow another program to use the CPU, we must save this state.
We often refer to this state as a process.
A process is considered active when it is in possession of the CPU.

Dealing With Page Faults
When the valid bit for a virtual page is off, a page fault occurs.
The operating system takes over, and the transfer is done with the exception mechanism.
The OS must find the page in the next level of the hierarchy and decide where to place the requested page in main memory.
An LRU policy is often used.
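
The replacement decision can be sketched as a small simulation. This assumes exact LRU over a fixed number of physical frames (real operating systems usually approximate LRU); the function name and the reference string are made up for illustration.

```python
from collections import OrderedDict


def count_page_faults(references, num_frames):
    """Count page faults for a sequence of virtual page numbers under LRU."""
    resident = OrderedDict()   # resident pages, least recently used first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)         # mark as most recently used
        else:
            faults += 1                        # page fault: the OS takes over
            if len(resident) >= num_frames:
                resident.popitem(last=False)   # evict the LRU page
            resident[page] = True
    return faults


print(count_page_faults([1, 2, 3, 1, 4, 2], num_frames=3))   # 5
```

With four frames the same reference string causes only four faults, since nothing has to be evicted.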

Page Tables

What About Writes?
A write-back scheme (also known as copy-back) is used because write-through takes too much time.
To determine whether a page needs to be copied back when we choose to replace it, a dirty bit is added to the page table.
The dirty bit is set when any word in the page is written.
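
A minimal sketch of the dirty bit, assuming a hypothetical `PageTableEntry` class: a write sets the bit, and at replacement time only a dirty page needs to be copied back to disk.

```python
class PageTableEntry:
    """Hypothetical page table entry holding only the dirty bit."""

    def __init__(self):
        self.dirty = False


def write_word(entry):
    """Any write to a word in the page sets the dirty bit."""
    entry.dirty = True


def evict(entry):
    """Return True if the page must be copied back to disk before reuse."""
    must_copy_back = entry.dirty
    entry.dirty = False   # the frame starts out clean for its next page
    return must_copy_back


entry = PageTableEntry()
print(evict(entry))   # False: page was never written, no disk write needed
write_word(entry)
print(evict(entry))   # True: page was modified, copy it back
```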

Making Address Translation Fast
A cache for address translations: translation-lookaside buffer (TLB)
[Figure labels: physical page or disk address, physical memory, disk storage]
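
A TLB can be sketched as a small cache of recent translations that is consulted before the page table. The sketch below assumes a fully associative TLB with LRU replacement and a page table modeled as a dict from virtual to physical page numbers; the class and its fields are illustrative, not a description of any particular processor.

```python
from collections import OrderedDict


class TLB:
    """Small fully associative translation cache with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # vpn -> ppn, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        self.misses += 1
        ppn = page_table[vpn]          # TLB miss: read the page table in memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[vpn] = ppn        # keep the translation for next time
        return ppn


page_table = {0: 7, 1: 3, 2: 9}
tlb = TLB(capacity=2)
for vpn in (0, 0, 1, 2, 0):
    tlb.lookup(vpn, page_table)
print(tlb.hits, tlb.misses)   # 1 4
```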

Typical Values for TLB
TLB (also known as translation cache) size: entries
Block size: 1-2 page table entries (typically 4-8 bytes each)
Hit time: clock cycle
Miss penalty: clock cycles
Miss rate: 0.01%-1%
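
To see why a low TLB miss rate matters, the average cost of a translation can be estimated as hit time plus miss rate times miss penalty. In the sketch below the two miss rates are the ends of the range listed above, while the hit time and miss penalty are hypothetical placeholders chosen only for illustration; they are not the slide's values.

```python
def average_translation_cycles(hit_time, miss_rate, miss_penalty):
    """Average cycles per translation: hit time plus the expected miss cost."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical timing values, assumed for this example only.
HIT_TIME = 1        # clock cycles (assumption)
MISS_PENALTY = 30   # clock cycles (assumption)

for miss_rate in (0.0001, 0.01):   # 0.01% and 1%
    cycles = average_translation_cycles(HIT_TIME, miss_rate, MISS_PENALTY)
    print(f"miss rate {miss_rate:.2%}: {cycles:.3f} cycles per translation")
```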

Integrating VM, TLBs and Caches
[Figure labels: dirty, tag, TLB hit, physical page number, physical address tag, TLB, physical address]

TLBs and caches

Overall Operation of a Memory Hierarchy
Possible combinations of events in TLB, VM and Cache

TLB   Page Table  Cache  Possible? If so, under what circumstances?
Hit   Hit         Miss   Possible
Miss  Hit         Hit    TLB misses, but entry found in page table; after retry, data is found in cache
Miss  Hit         Miss   TLB misses, but entry found in page table; after retry, data misses in cache
Miss  Miss        Miss   TLB misses and is followed by a page fault; after retry, data must miss in cache
Hit   Miss        Miss   Impossible
Hit   Miss        Hit    Impossible
Miss  Miss        Hit    Impossible
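
The impossible combinations follow from two rules: the TLB holds translations only for pages that are present in memory, and the cache holds data only from pages that are present in memory. A short sketch that enumerates every combination under those two rules (the function name is illustrative; a page table "hit" here means the page is in memory):

```python
from itertools import product


def is_possible(tlb_hit, page_in_memory, cache_hit):
    """Check one TLB / page table / cache outcome against the two rules."""
    if tlb_hit and not page_in_memory:
        return False   # no translation in the TLB for a page not in memory
    if cache_hit and not page_in_memory:
        return False   # no data in the cache from a page not in memory
    return True


def label(hit):
    return "Hit" if hit else "Miss"


for tlb, page, cache in product((True, False), repeat=3):
    verdict = "Possible" if is_possible(tlb, page, cache) else "Impossible"
    print(f"{label(tlb):5}{label(page):5}{label(cache):5}{verdict}")
```

The enumeration also includes the all-hit case, which is the common case and possible by both rules.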

Implementing Protection with Virtual Memory
The OS takes care of this.
The hardware needs to provide at least three capabilities:
–support at least two modes that indicate whether the running process is a user process or an OS process (kernel process, supervisor process, executive process)
–provide a portion of the CPU state that a user process can read but not write
–provide mechanisms whereby the CPU can go from user mode to supervisor mode
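
The three capabilities can be illustrated with a toy model. Everything here is hypothetical (the `CPU` class, its method names, and the choice of the page table register as the protected state); it is a sketch of the idea, not of any real instruction set.

```python
class ProtectionError(Exception):
    """Raised when user-mode code tries to change protected CPU state."""


class CPU:
    def __init__(self):
        self.mode = "supervisor"         # capability 1: at least two modes
        self._page_table_register = 0    # capability 2: protected state

    def read_page_table_register(self):
        return self._page_table_register   # readable in either mode

    def write_page_table_register(self, value):
        if self.mode != "supervisor":
            raise ProtectionError("user mode cannot write this register")
        self._page_table_register = value

    def system_call(self):
        self.mode = "supervisor"   # capability 3: controlled entry to the OS

    def return_to_user(self):
        self.mode = "user"


cpu = CPU()
cpu.write_page_table_register(0x4000)   # the OS sets up the page table
cpu.return_to_user()
try:
    cpu.write_page_table_register(0x8000)
except ProtectionError as err:
    print("blocked:", err)
```

Because a user process cannot change the page table register, it cannot map another process's pages into its own address space.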

A Common Framework for Memory Hierarchies
Question 1: Where can a block be placed?
Question 2: How is a block found?
Question 3: Which block should be replaced on a cache miss?
Question 4: What happens on a write?

The Three Cs
Compulsory misses (cold-start misses)
Capacity misses
Conflict misses (collision misses)
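
One common way to separate the three kinds of miss is to run the same reference stream through the cache of interest and through a fully associative LRU cache of the same size. The sketch below assumes a direct-mapped cache and block-granularity addresses; the function name is illustrative. A first reference is compulsory, a miss the fully associative cache would also take is capacity, and the rest are conflict.

```python
from collections import OrderedDict


def classify_misses(block_addresses, num_blocks):
    """Classify direct-mapped cache misses as compulsory, capacity or conflict."""
    direct_mapped = {}            # cache index -> block address stored there
    fully_assoc = OrderedDict()   # same capacity, LRU order
    seen = set()
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for block in block_addresses:
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)
        else:
            if len(fully_assoc) >= num_blocks:
                fully_assoc.popitem(last=False)
            fully_assoc[block] = True
        index = block % num_blocks
        if direct_mapped.get(index) != block:   # miss in the real cache
            if block not in seen:
                counts["compulsory"] += 1
            elif not fa_hit:
                counts["capacity"] += 1
            else:
                counts["conflict"] += 1
            direct_mapped[index] = block
        seen.add(block)
    return counts


print(classify_misses([0, 4, 0, 4], num_blocks=4))
# {'compulsory': 2, 'capacity': 0, 'conflict': 2}
```

Blocks 0 and 4 map to the same index in a four-block direct-mapped cache, so they keep evicting each other even though the cache is mostly empty.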

Modern Systems: Intel P4 and AMD Opteron
Very complicated memory systems:

Some Issues
Processor speeds continue to increase very fast, much faster than either DRAM or disk access times.
Design challenge: dealing with this growing disparity
Trends:
–synchronous SRAMs (provide a burst of data)
–redesign DRAM chips to provide higher bandwidth or processing
–restructure code to increase locality
–use prefetching (make cache visible to ISA)
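
As an illustration of restructuring code to increase locality, the sketch below counts misses for two traversal orders of the same 64 x 64 array stored row by row. The cache model (direct-mapped, 64 blocks of 8 words) and all names are assumptions made for this example.

```python
def count_misses(addresses, num_blocks=64, block_words=8):
    """Count misses in a simple direct-mapped cache for a word address stream."""
    cache = {}   # cache index -> block number currently stored
    misses = 0
    for addr in addresses:
        block = addr // block_words
        index = block % num_blocks
        if cache.get(index) != block:
            misses += 1
            cache[index] = block
    return misses


N = 64   # N x N array of words, stored row by row
row_order = [i * N + j for i in range(N) for j in range(N)]      # inner loop over j
column_order = [i * N + j for j in range(N) for i in range(N)]   # inner loop over i

print(count_misses(row_order))      # 512: one miss per 8-word block
print(count_misses(column_order))   # 4096: every access misses
```

Interchanging the two loops changes nothing about the work done, but the row-order version uses every word of a block before moving on, so it takes one eighth as many misses in this model.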