Chapter 19
Translation Lookaside Buffer
Chien-Chung Shen
CIS, UD
Introduction
Paging has high performance overheads
– a large amount of mapping information (stored in memory)
– an extra memory access (to the page table) for each virtual address
Hardware support: the translation-lookaside buffer (TLB)
– part of the MMU
– a hardware cache of popular virtual-to-physical address translations; a better name would be address-translation cache
– upon each virtual memory reference, the hardware first checks the TLB to see if the desired translation is held there; if so, the translation is performed (quickly) without consulting the page table (which holds all translations)
TLB Algorithm
VPN = (VirtualAddress & VPN_MASK) >> SHIFT
(Success, TlbEntry) = TLB_Lookup(VPN)
if (Success == True)    // TLB Hit
    if (CanAccess(TlbEntry.ProtectBits) == True)
        Offset   = VirtualAddress & OFFSET_MASK
        PhysAddr = (TlbEntry.PFN << SHIFT) | Offset
        AccessMemory(PhysAddr)
    else
        RaiseException(PROTECTION_FAULT)
else                    // TLB Miss
    PTEAddr = PTBR + (VPN * sizeof(PTE))
    PTE = AccessMemory(PTEAddr)
    if (PTE.Valid == False)
        RaiseException(SEGMENTATION_FAULT)
    else if (CanAccess(PTE.ProtectBits) == False)
        RaiseException(PROTECTION_FAULT)
    else
        TLB_Insert(VPN, PTE.PFN, PTE.ProtectBits)
        RetryInstruction()
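Example: Access Array
8-bit virtual address space and 16-byte pages, so a virtual address has a 4-bit VPN and a 4-bit offset
An array a[] of 10 4-byte integers (40 bytes) laid out contiguously across three pages
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += a[i];
    }
TLB hit rate: 70% (3 misses, one on the first access to each page, and 7 hits)
– due to spatial locality: the array elements are packed tightly into pages
Any other way to improve the hit rate?
– larger pages (the array would span fewer pages, so there would be fewer misses)
– quick re-reference of memory in time: temporal locality (e.g., if the loop runs again soon, every access hits as long as the translations are still in the TLB)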
Caching and Locality
Caching is one of the most fundamental performance techniques in computer systems for making the common case fast
The idea behind caching is to take advantage of locality in instruction and data references
– temporal locality: an instruction or data item that has been accessed recently will likely be re-accessed soon (e.g., instructions in a loop)
– spatial locality: if a program accesses memory at address x, it will likely soon access memory near x (e.g., successive array elements)
Who Handles TLB Misses?
CISC (complex-instruction set computer) architectures: the hardware
– it walks the page table itself, locating it via the page-table base register
RISC (reduced-instruction set computer) architectures: software
– the hardware simply raises an exception and jumps to a trap handler in the OS
– advantages: flexibility (the OS may use any data structure to implement the page table) and simplicity (the hardware does little on a miss)
– return-from-trap resumes at the same instruction that caused the trap (not the next one, as after a system call), so the access is retried and now hits in the TLB
– the OS must avoid causing an infinite chain of TLB misses (a miss while running the miss handler itself):
  keep TLB miss handlers in physical memory (unmapped, not subject to address translation), or
  reserve some TLB entries for permanently valid translations and use some of those slots for the handler code itself
TLB Contents
Typically 32, 64, or 128 entries
Fully associative: any given translation can be anywhere in the TLB, and the hardware searches the entire TLB in parallel to find the desired translation
An entry looks like: VPN | PFN | other bits
– e.g., a valid bit and protection bits
– the TLB valid bit ≠ the page-table valid bit:
  in a page table, a PTE marked invalid means the page has not been allocated by the process
  a TLB valid bit refers to whether the TLB entry holds a valid translation at all
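Context Switch
The TLB contains virtual-to-physical translations that are valid only for the currently running process; they are not meaningful for other processes
What to do on a context switch?
– flush the TLB on each context switch by setting all valid bits to 0
– drawback: after every context switch, the next process incurs TLB misses as it touches its pages again; can we do better?
– add an address space identifier (ASID) field to each TLB entry: VPN | PFN | valid | prot | ASID
  with ASIDs, the TLB may hold translations from different processes at the same time (e.g., one valid rwx entry tagged ASID 1 and another tagged ASID 2, alongside invalid entries)
  a translation is used only if its ASID matches that of the currently running process
– sharing of pages: entries from different processes (different ASIDs) may map to the same PFN, e.g., when two processes share a code page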
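Replacement Policy
Cache replacement, with the goal of minimizing the miss rate
Policies
– evict the least-recently-used (LRU) entry
  how about a loop accessing n + 1 pages, a TLB of size n, and LRU replacement? Every access misses, because the entry LRU evicts is always the one the loop needs next
– evict a random entry
  simple, and avoids such corner-case behavior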
A Real TLB Entry
MIPS R4000, which has a software-managed TLB
Culler’s Law
The term random-access memory (RAM) implies that you can access any part of RAM just as quickly as any other. While it is generally good to think of RAM in this way, because of hardware/OS features such as the TLB, accessing a particular page of memory may be costly, particularly if that page isn’t currently mapped by the TLB. Thus, it is always good to remember the implementation tip: RAM isn’t always RAM. Sometimes randomly accessing your address space, particularly if the number of pages accessed exceeds the TLB coverage, can lead to severe performance penalties.
-- David Culler
The TLB is the source of many performance problems