Virtual Memory: Implementing Paging

1 Virtual Memory: Implementing Paging
CS/COE 1541 (term 2174) Jarrett Billingsley

2 Class Announcements
HW4 will be out tomorrow or Friday.
Statistics tracking in your project will be… tricky ;)
- For example: in a write-through, write-no-allocate data cache, where there is more than 1 word per block…
- Starting tomorrow evening I'll have a lot more time to work on my implementation and get you more accurate statistics.
My phone has an 8-core CPU with 4GB memory.
- 4 of the cores are in-order 2-way superscalar
- 4 of the cores are out-of-order 3-way superscalar
- All 8 have SIMD instructions, hardware OS virtualization, and L1 and L2 caches, all multiple-way set-associative
I just. I can't believe this.
3/1/2017 CS/COE 1541 term 2174

3 A small rant on the book
It says we're in the "Post-PC era," as if PCs are somehow on their way out and will be replaced by tablets and phones. (page 6)
- Yes, we've shifted a lot of tasks from PCs to phones, but…
- We're not gonna start making websites and programming and doing financial analyses on a damn touchscreen.
- Our "personal mobile devices" are mainly for communication and consumption, not production.
- The typewriter never made pens and pencils obsolete.
Also, the operating systems on our phones are just as sophisticated as any other.
- Virtual memory is not some "cloud computing" concept applicable only to virtual machines.
- I mean, eight cores in a fucking phone.
</rant>

4 Virtual Memory in Depth

5 But first, some terms
I'll be using VMem as shorthand for "virtual memory" and PMem as shorthand for "physical memory."
I'll be using VM as shorthand for "virtual machine." (confusing?)
There are some differences between the terms that CPU caches and VMem use for very similar concepts:

Cache Term | VMem Term
-----------|-----------
Block      | Page
Miss       | Page fault

Last, VA means "virtual address," and PA means "physical address." Okay? Okay.

6 The main goals of VMem
[Figure: two processes' code regions in memory, each spanning 0x8000–0xFFFF in its own address space]
Remember this?
- Probably the biggest is protection. If processes don't even know that other processes exist, they can't interfere with one another. One program can't crash another!
- A corollary is collaborative execution. Running multiple programs at once is useful!
- And the OS does that with relocation. Programs might be written to assume addresses start at 0x8000, but the OS can put them anywhere in PMem that's convenient!

7 Secondary goal: page-swapping ("paging")
In other words, using RAM as a cache for the disk. (Or, using the disk as nonvolatile backing for the RAM.)
- You can run more programs than memory supports. e.g. 100 processes all want 50 MB of RAM. That's 5 GB! But most of them are sleeping. So put the memory for the sleeping ones on disk, and bring it back in when needed.
- You can hibernate the system. Copy RAM out to the disk and shut down. Then you can resume later, like nothing happened. You could even move that RAM image to another computer!
- You can randomly access parts of large files. The VMem system can bring in only the pages that you access.

8 But what's wrong with spinning disks?
They're slow. To be precise, they have high latency, but also pretty good bandwidth! Does this sound familiar…?
- The latency is usually on the order of ~5 ms. With a 3 GHz CPU, that's 15M cycles.
- But the bandwidth can be decent! Nowhere near as fast as RAM, but still >100 MB/sec.
So this places a big design constraint on virtual memory paging:
- The miss penalty is huge, so we have to reduce the miss rate as much as possible.

9 Avoiding misses
What size of block (page)? Big. Usually 4KiB or larger with VMem.
What associativity? Fully-associative.
What block replacement scheme? LRU, or some approximation.
What write scheme? Write-back.
What else could reduce stalls?
- A write buffer. Most HDDs have one built in now.
- Smart programming. OSes and HDDs have complex access scheduling algorithms to optimize for throughput.
We'll talk more about paging next class.
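To make the 4KiB page size concrete: a sketch of how an address splits into a virtual page number and a page offset (the address value here is just an arbitrary example):

```python
# With 4 KiB pages, the low 12 bits of an address are the page offset
# (never translated); the remaining high bits are the virtual page number.
PAGE_SIZE = 4096                          # 4 KiB
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # log2(4096) = 12

def split_address(va):
    vpn = va >> OFFSET_BITS        # virtual page number
    offset = va & (PAGE_SIZE - 1)  # byte offset within the page
    return vpn, offset

vpn, offset = split_address(0x00402ABC)
print(hex(vpn), hex(offset))  # 0x402 0xabc
```

Only the VPN needs translating; the offset passes through unchanged, which is part of why bigger pages mean fewer translations to track.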

10 Address translation

11 The yellow pages of memory
To do address translation efficiently, we need a data structure. Pretty much all architectures today use a page table (PT).
- This is a "directory" which maps from VAs to PAs.
The page table has multiple page table entries (PTEs).
- Each has the mapping from VA to PA, of course
- Each has valid, dirty, and reference bits
- Last, each contains protection information: which process does this page belong to?
Given that each process has its own virtual memory space, can we use just one page table?
- No. Each process gets its own. We can switch between page tables by using a page table register (PTR) in the CPU. It just points to the current page table.
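As a sketch, a PTE with the fields listed above might look like this in software (field names are illustrative; real hardware packs these bits into a single word per entry):

```python
from dataclasses import dataclass

@dataclass
class PTE:
    ppn: int = 0              # physical page number this virtual page maps to
    valid: bool = False       # is the page present in physical memory?
    dirty: bool = False       # written since it was brought in? (write-back)
    referenced: bool = False  # recently used? (for LRU-ish replacement)
    # protection info (read/write/execute, owning process) would also go here

# One page table per process; the PTR points at whichever one is current.
page_table_p1 = {0x402: PTE(ppn=0x1F, valid=True)}
page_table_p2 = {0x402: PTE(ppn=0x73, valid=True)}  # same VPN, different PA!
```

Note the last two lines: both processes use the same virtual page number but land on different physical pages. That's relocation and protection falling out of the same mechanism.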

12 How it works
Say we have lw $t0, 16($s0). What do we need to do before we can access PMem? Translate from VA to PA.
- First we have to look up the PTE that holds this VA. We use the PTR as a base address to load the right PTE.
- Then we can check that it's valid. If not, this is a page fault.
- Then we can get the PA that corresponds to the VA.
- NOW we can access memory for the lw!
But there's a catch. Where's the PT located? In the CPU? No, it's also in memory. Should we cache it???

13 It just never ends
Why not have a cache dedicated to PTEs? We call this the translation lookaside buffer (TLB). I don't know why.
- Each block contains 1 or more PTEs.
- On a hit… Hey, instant VA -> PA translation!
- On a miss… Uh oh. Is it just that the PTE we need isn't cached? Or is it that it's a page fault? Which one is more likely?

14 TLB Performance
We can't even access L1 cache without doing translation. So how fast does the TLB have to be? Fast. Therefore:
- TLB is small (hundreds of blocks)
- Set associativity to reduce miss rate
- Random replacement for quick access (no complex LRU!)
- Write-back for less bus traffic

