Accelerating Two-Dimensional Page Walks for Virtualized Systems Jun Ma.

Introduction Native (non-virtualized) system: an OS runs on a physical machine and communicates with the hardware directly. Address mapping: Virtual Address (VA): the address used by the OS and application software. Physical Address (PA): the address in the physical machine. For a native system the translation is VA->PA.

Introduction Virtualization: multiple OSes can run simultaneously but separately on one physical system. Hypervisor: the underlying software layer that inserts abstractions into the virtualized system and mediates communication between each guest OS and the physical machine.

Introduction Virtualization: address mapping for a virtual machine. Guest OS: Guest Virtual Address (GVA), Guest Physical Address (GPA). Physical system: System Physical Address (SPA). Address translation: GVA->GPA->SPA.
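The two-layer translation above can be sketched in a few lines. This is a toy model, not a real page-table format: the dicts and every address in them are made-up illustrative values.

```python
# Toy sketch of the two translation layers in a virtualized system.
# All mappings here are illustrative values, not real page tables.
guest_page_table = {0x1000: 0x5000}    # GVA -> GPA, maintained by the guest OS
nested_page_table = {0x5000: 0x9000}   # GPA -> SPA, maintained below the guest

def translate(gva):
    """The full translation every guest memory access needs: GVA -> GPA -> SPA."""
    gpa = guest_page_table[gva]    # guest's own translation
    spa = nested_page_table[gpa]   # hypervisor/hardware translation
    return spa

assert translate(0x1000) == 0x9000
```

The point of the sketch is only that every guest access composes two lookups; the rest of the talk is about how expensive that composition is when each lookup is itself a multi-level page walk.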

Introduction Virtualization: the traditional approach leaves memory translation to the hypervisor. Drawbacks: the hypervisor intercepts the operation, exits the guest, emulates the operation, performs the memory translation, and then returns to the guest -> high overhead. Alternative: let hardware finish the translation, so the hypervisor is not needed on this path and the overhead is saved.

Background X86 Native Page Translation Page table: a hierarchy of address-translation tables that maps VA to PA. Page walk: an iterative process; to obtain the final PA from a VA, the hardware must perform a page walk that traverses every level of the page table hierarchy.

Background X86 Native Page Translation The walk proceeds from level 4 down to level 1. At each level, the physical address from the level above is used as the base address and a 9-bit field of the VA is used as the index. The TLB (translation look-aside buffer) caches the final physical address to reduce the frequency of page walks.
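A minimal sketch of that native 4-level walk, with the page table modeled as a dict from (table base, 9-bit index) to the next level's base; the frame numbers (100, 200..203) are made-up illustrative values, not real memory contents.

```python
PAGE_SHIFT = 12                       # 4 KB pages -> 12-bit page offset
INDEX_BITS = 9                        # 512 entries per table level
INDEX_MASK = (1 << INDEX_BITS) - 1

def indices(va):
    """Split a 48-bit VA into L4..L1 table indices plus the 12-bit page offset."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    idx = [(va >> (PAGE_SHIFT + INDEX_BITS * level)) & INDEX_MASK
           for level in (3, 2, 1, 0)]  # bit positions 39, 30, 21, 12 -> L4..L1
    return idx, offset

def page_walk(table, cr3, va, tlb):
    """Walk L4 down to L1 (4 memory references); the TLB caches the final
    virtual-page -> physical-frame mapping so repeat accesses skip the walk."""
    vpage = va >> PAGE_SHIFT
    if vpage not in tlb:
        base = cr3
        for i in indices(va)[0]:
            base = table[(base, i)]   # one memory reference per level
        tlb[vpage] = base             # last "base" is the physical frame
    return (tlb[vpage] << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))

# Build a toy table covering one VA.
va = 0x00007F1234567ABC
table, base = {}, 100                 # 100 plays the role of the CR3 base
for n, i in enumerate(indices(va)[0]):
    table[(base, i)] = 200 + n
    base = 200 + n

tlb = {}
pa = page_walk(table, 100, va, tlb)   # first access performs the full walk
assert pa == (203 << PAGE_SHIFT) | (va & 0xFFF)
```

The second call to `page_walk` for the same page hits the TLB and touches no table entries at all, which is exactly why TLB misses are what make page walks expensive.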

Background Memory Management for Virtualization Without hardware support, the hypervisor must handle this translation (using a shadow page table to map GVA to SPA), and this is one of the hypervisor's major overheads. Hardware mechanism: the same idea as the x86 page walk, extended to two dimensions (2D page walking). Nested paging: maps GPA to SPA.

Background Memory Management for Virtualization Traverse the guest page table to translate GVA to GPA. At each level, the guest entry's GPA must first be translated to an SPA by walking the nested page table before that guest-level (gL) entry can be read. The TLB caches the final SPA to reduce page walk overhead.
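The structure described above can be enumerated to see where the cost comes from: each of the 4 guest levels, plus the final guest physical address, needs a full 4-level nested walk, so a worst-case 2D walk makes 5 × 4 nested references plus 4 guest references = 24 memory references, versus 4 for a native walk.

```python
NESTED_LEVELS = 4

def two_d_walk_references():
    """Enumerate the page-entry references of one worst-case 2D page walk,
    as (kind, guest-column) pairs in walk order."""
    refs = []
    for g in ["gL4", "gL3", "gL2", "gL1", "gPA"]:
        # the GPA at this guest step is translated by a full nested walk
        for n in range(NESTED_LEVELS, 0, -1):
            refs.append((f"nL{n}", g))
        if g != "gPA":
            refs.append(("G", g))   # then read the guest page entry itself
    return refs

refs = two_d_walk_references()
print(len(refs))  # 24
```

Counting by kind: 20 of the 24 references are nested-table reads and only 4 are guest-table reads, which foreshadows the reuse results later in the talk.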

Background Large page size advantages: * Memory saving: with 4 KB pages, the OS must use an entire L1 table, which is 4 KB in size (512 entries). If those 512 4 KB pages are mapped as one 2 MB contiguous block, the L1 level can be skipped, saving the 4 KB used by the L1 table. * Reduction in TLB pressure: each large-page entry fits in a single TLB entry, while the corresponding regular page entries require 512 TLB entries to map the same 2 MB range of virtual addresses. * Shorter page walk: skipping the entire L1 level makes the page walk shorter and therefore saves some overhead.
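The arithmetic behind those three advantages, assuming standard x86-64 parameters (4 KB base pages, 2 MB large pages, 8-byte page-table entries):

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024
ENTRY_SIZE = 8                             # bytes per x86-64 page-table entry

# One 2 MB large page replaces this many 4 KB mappings:
entries_per_l1 = PAGE_2M // PAGE_4K        # 512
# Memory saving: the skipped L1 table is exactly one 4 KB page of entries.
l1_table_bytes = entries_per_l1 * ENTRY_SIZE
assert l1_table_bytes == PAGE_4K
# TLB pressure: 1 TLB entry instead of 512 for the same 2 MB region.
tlb_entries_saved = entries_per_l1 - 1
print(entries_per_l1, tlb_entries_saved)   # 512 511
```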

Page walk characterization Page walk cost Perfect-TLB opportunity is the performance improvement that could be achieved with a perfect TLB, one that eliminates cold (compulsory) misses as well as conflict and capacity misses.

Page walk characterization Page entry reuse

Page walk characterization Page entry reuse

Page walk characterization Page entry reuse Nested page tables show much higher reuse than guest page tables, partly due to the inherent redundancy of the nested page walk. There are many more nested accesses than guest accesses in a 2D page walk: every level of the nested hierarchy must be accessed for each guest level, so in many cases the same nested page entries are accessed multiple times within a single 2D walk (a high reuse rate).
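A small counting experiment shows why that redundancy translates into reuse: the upper nested levels cover huge address ranges (an nL4 entry spans 512 GB, nL3 1 GB, nL2 2 MB), so the five nested walks of one 2D walk usually land on the same upper-level entries. The GPAs below are made-up illustrative values chosen to sit under one nL4/nL3 subtree.

```python
from collections import Counter

# Bytes of guest-physical address space covered by one entry at each nested level.
NESTED_SPANS = {4: 1 << 39, 3: 1 << 30, 2: 1 << 21, 1: 1 << 12}

# Made-up GPAs: the 4 guest page entries plus the final guest data address.
gpas = [0x0040_1000, 0x0040_2000, 0x0040_3000, 0x0040_4000, 0x0112_3000]

nested_refs = Counter()
for gpa in gpas:                       # one full nested walk per guest step
    for level, span in NESTED_SPANS.items():
        nested_refs[(level, gpa // span)] += 1   # which entry this step touches

total = sum(nested_refs.values())      # 20 nested references in all...
unique = len(nested_refs)              # ...but far fewer distinct entries
print(total, unique)                   # 20 9
```

Here 20 nested references hit only 9 distinct entries; the nL4 and nL3 entries are each touched five times, which is exactly the kind of intra-walk reuse a page walk cache can exploit.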

Page walk characterization Page entry reuse {G,gL1} and {nL1,gPA} both have high unique page-entry counts because both map guest data into their respective address spaces: gL1 maps GVA->GPA and nL1 maps GPA->SPA. These two are therefore the most difficult to cache.

Page Walk Acceleration AMD Opteron Translation Caching: the page walk cache (PWC) stores page entries from all page-table levels except L1, whose translations are covered by the TLB. All page entries are initially brought into the L2 cache; on a PWC miss, the entry data may still reside in the L2 cache or the L3 cache (if present).

Page Walk Acceleration Translation caching for 2D page walks

Page Walk Acceleration Translation caching for 2D page walks One-Dimensional PWC (1D_PWC): only page-entry data from the guest dimension are stored in the PWC, and entries are tagged by their system physical address. The lowest-level guest page entry {G,gL1} is not cached in the PWC because of its low reuse rate. Two-Dimensional PWC (2D_PWC): extends 1D_PWC into the nested dimension of the 2D page walk, turning the 20 unconditional cache-hierarchy accesses into 16 likely PWC hits (dark-filled references in Figure 5(b)) and four possible PWC hits (checkered references). Like 1D_PWC, all page entries are tagged with their system physical address and {G,gL1} is not cached.
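The caching policy just described can be sketched as follows. This is a behavioral model only (unbounded storage, no replacement policy or set-associativity): entries are tagged by the SPA of the page entry, and {G,gL1} is deliberately never cached.

```python
class PageWalkCache:
    """Behavioral sketch of a 2D page walk cache (2D_PWC policy)."""

    def __init__(self):
        self.store = {}                  # SPA of page entry -> entry data
        self.hits = self.misses = 0

    def lookup(self, spa, kind, fetch_from_memory):
        """kind is a (dimension, guest-column) pair like ('nL4', 'gL3')."""
        if kind == ("G", "gL1"):         # policy: never cache {G,gL1} (low reuse)
            self.misses += 1
            return fetch_from_memory(spa)
        if spa in self.store:
            self.hits += 1
            return self.store[spa]
        self.misses += 1
        data = fetch_from_memory(spa)
        self.store[spa] = data           # fill on miss
        return data

# Toy memory with one made-up entry at SPA 0x10.
mem = {0x10: "nL4 entry data"}
pwc = PageWalkCache()
pwc.lookup(0x10, ("nL4", "gL4"), mem.get)   # first touch: miss, fills the PWC
pwc.lookup(0x10, ("nL4", "gL4"), mem.get)   # repeat within the walk: hit
```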

Page Walk Acceleration Translation caching for 2D page walks Two-Dimensional PWC with Nested Translations (2D_PWC+NT): augments 2D_PWC with a dedicated GPA-to-SPA translation buffer, the nested TLB (NTLB), which reduces the average number of page-entry references during a 2D page walk. The NTLB uses the guest physical address of a guest page entry to cache the corresponding nL1 entry. The page walk begins by accessing the NTLB with the guest physical address of {G,gL4}; on an NTLB hit it produces the data of {nL1,gL4}, allowing nested references 1-4 to be skipped, and the system physical address of {G,gL4} needed for the PWC access is calculated from the cached entry.
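A behavioral sketch of the NTLB idea: cache the completed GPA-to-nL1 translation keyed by the guest physical page, so that on a hit the whole 4-reference nested walk is skipped. The full nested walk is modeled here as a callback, and the frame value 0x99 is a made-up placeholder.

```python
class NestedTLB:
    """Caches completed GPA -> nL1-entry translations, keyed by GPA page."""

    def __init__(self):
        self.entries = {}

    def lookup(self, gpa):
        return self.entries.get(gpa >> 12)

    def fill(self, gpa, nl1_entry):
        self.entries[gpa >> 12] = nl1_entry

def translate_gpa(gpa, ntlb, full_nested_walk):
    """GPA -> SPA. On an NTLB hit the cached nL1 entry is used directly,
    so nested references 1-4 (nL4..nL1) are skipped entirely."""
    nl1 = ntlb.lookup(gpa)
    if nl1 is None:
        nl1 = full_nested_walk(gpa)      # miss: pay for the full nested walk
        ntlb.fill(gpa, nl1)
    return (nl1 << 12) | (gpa & 0xFFF)   # nL1 entry holds the SPA frame number

walks = []
def nested_walk(gpa):                    # stand-in for the real 4-reference walk
    walks.append(gpa)
    return 0x99                          # made-up SPA frame

ntlb = NestedTLB()
spa = translate_gpa(0x5123, ntlb, nested_walk)   # miss: one full nested walk
spa2 = translate_gpa(0x5123, ntlb, nested_walk)  # hit: the walk is skipped
assert spa == spa2 == (0x99 << 12) | 0x123
assert len(walks) == 1
```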

Result Benchmarks used in the following slides:

Result The three hardware-only page walk caching schemes improve performance by turning page-entry memory-hierarchy references into lower-latency PWC accesses and, in the case of 2D_PWC+NT, by skipping some page-entry references entirely.

Result Left side: the G column is not skipped, so it does not change; the same holds for the gPA row. The nested references for the gL1 column are skipped in 2D_PWC+NT even though gL1 has a low reuse rate, so the gL1 column occupies a shorter span under 2D_PWC+NT than under 2D_PWC. Right side: the NTLB eliminates many of the PWC accesses, but it does not eliminate a significant portion of the accesses that carry the highest penalty.

Result The first data column shows that L2 accesses incurred during a 2D page walk under the 2D_PWC+NT configuration generate considerably more L2 misses than the native page walk. The increase arises mainly because the native page walk has fewer hard-to-cache entries (L1 and sometimes L2) than the 2D page walk ({G,gL1}, {nL1,gPA} and sometimes {G,gL2}, {nL2,gPA}, {nL1,gL1}, and {nL2,gL1}). The second data column shows the L2 cache miss percentage due only to page entries from the 2D page walk. The miss percentages are relatively high because the PWC and NTLB have already filtered out the easy-to-cache accesses, leaving only accesses that are difficult to cache.

Result The 8096 w/(G,gL1) configuration is unique in that it also writes the gL1 guest page entry into the PWC.

Result Large pages allow the TLB to cover a larger data region with fewer translations, leading to fewer TLB misses (the nL1 references for the gPA, gL1, gL2, gL3, and gL4 levels are all eliminated). The ability to eliminate poor-locality references such as {nL1,gL1} and {nL1,gPA} reduces the number of L2 cache misses by 60%-64%.
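The reference-count arithmetic implied by that slide, assuming the 24-reference worst-case 2D walk structure from the background slides (5 nested walks of 4 levels plus 4 guest entries): with 2 MB nested pages, each nested walk stops at nL2, eliminating one nL1 reference per guest column.

```python
# Worst-case 2D page walk with 4 KB pages everywhere:
native_2d_refs = 5 * 4 + 4       # 5 nested walks x 4 levels + 4 guest entries = 24
# With 2 MB nested large pages, each nested walk skips its nL1 reference:
large_nested_refs = 5 * 3 + 4    # nested walks stop at nL2 -> 19 references
eliminated = native_2d_refs - large_nested_refs
print(eliminated)  # 5
```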

Conclusion Nested paging is a hardware technique that reduces the complexity of software memory management under system virtualization. Nested page tables combine with the guest page tables to map GPA to SPA, resulting in a two-dimensional (2D) page walk (accelerated by 2D_PWC and 2D_PWC+NT). The hypervisor no longer needs to trap on every guest page table update, eliminating significant virtualization overhead. However, nested paging introduces new overhead from the increased number of page-entry references. Overall, nested paging improves the performance of a virtualized system when the eliminated hypervisor memory-management overhead exceeds the new 2D page walk overhead.