Address Translation Case Studies
Questions answered in this lecture:
- Case studies: x86, PowerPC, and SPARC architectures
- How are page tables organized?
- What do PTEs contain?
- How can address translation be performed quickly?
UNIVERSITY of WISCONSIN-MADISON, Computer Sciences Department
CS 537: Introduction to Operating Systems
Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau

Generic Page Table
- Memory is divided into pages
- A page table is a collection of PTEs (page table entries)
- Maps a virtual page number to a physical page (or to another page table)
- If no virtual-to-physical mapping exists => segmentation fault or page fault
- Organization and content vary with architecture
[Figure: the page table maps a virtual page # to a physical page #, or signals a page fault]
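The idea can be pictured as a simple array lookup. Below is a minimal sketch of a linear (single-level) page table walk, assuming hypothetical types, 4 KB pages, and a toy table covering only a small address space; real tables are multi-level or hashed, as the case studies that follow show.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u              /* assumed 4 KB pages */

typedef struct {
    uint32_t ppn;                    /* physical page number */
    bool     valid;                  /* is there a mapping?  */
} pte_t;

/* Hypothetical linear page table: one PTE per virtual page. */
static pte_t page_table[1024];

/* Translate a virtual address, or return -1 to signal a fault. */
int64_t translate(uint32_t va)
{
    uint32_t vpn    = va / PAGE_SIZE;     /* virtual page number   */
    uint32_t offset = va % PAGE_SIZE;     /* byte within the page  */

    if (vpn >= sizeof(page_table) / sizeof(page_table[0]))
        return -1;                        /* outside this toy table's range */
    if (!page_table[vpn].valid)
        return -1;                        /* no mapping: seg fault or page fault */
    return (int64_t)page_table[vpn].ppn * PAGE_SIZE + offset;
}
```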

Generic PTE
- A PTE maps a virtual page to a physical page
- Includes some page properties: valid?, writable?, dirty?, cacheable?
[Figure: a virtual page # selects a PTE holding a physical page # plus property bits]
Terminology:
- PTE = page table entry
- PDE = page directory entry
- VA = virtual address
- PA = physical address
- VPN = virtual page number
- {R,P}PN = {real, physical} page number
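As a concrete illustration of "physical page number plus property bits", a 32-bit PTE might be packed as below; the field names and bit positions are hypothetical, not those of any particular ISA.

```c
#include <stdint.h>

/* One possible packing of a 32-bit PTE: property bits plus a 20-bit PPN.
 * Widths and ordering here are illustrative only. */
typedef struct {
    uint32_t present   : 1;   /* valid mapping?      */
    uint32_t writable  : 1;
    uint32_t dirty     : 1;
    uint32_t cacheable : 1;
    uint32_t unused    : 8;   /* room for more flags */
    uint32_t ppn       : 20;  /* physical page number */
} pte_bits_t;
```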

Designing Page Tables
Requirements/goals:
- Minimize memory use (page tables are pure overhead)
- Be fast (the table is logically accessed on every memory reference)
These requirements lead to:
- Compact data structures that support sparse address spaces
- O(1) access, e.g. an indexed lookup or a hash table
Case studies: x86 and PowerPC

Case Study 1: x86-32
Basic idea: page tables organized as a two-level tree
- "Outer" and "inner" page tables
- Efficient because the address space is sparse
- Each level of the tree is indexed using part of the VPN, for fast lookups
Hardware support:
- The CPU walks the architecturally-defined page tables to find translations
- The CPU updates accessed and dirty bits (useful for policies in future lectures)
How does the CPU know where the page tables live?
- The current set of page tables is pointed to by CR3
- One set of page tables per process
Page sizes: 4 KB or 4 MB (2 MB when PAE is enabled)

x86-32 Page Table Lookup
- Top 10 VA bits: index the page directory, which returns a PDE that points to a page table
- Next 10 VA bits: index that page table, which returns a PTE that points to a physical memory page
- Bottom 12 VA bits: offset of the byte within the physical page
Checks made at each step:
- Ensure the desired page is present in memory
- Ensure the process has sufficient rights to access the page
[Figure: a 32-bit virtual address splits into a 10-bit page directory index, a 10-bit page table index, and a 12-bit byte offset; the walk proceeds page directory -> page table -> physical memory]
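A minimal software model of that walk, assuming 4 KB pages, a hypothetical phys_read32() helper for reading physical memory, and ignoring 4 MB pages and permission checks (on real hardware the MMU does this walk in silicon):

```c
#include <stdint.h>

#define PRESENT 0x1u   /* bit 0 of a PDE/PTE */

/* Hypothetical helper: read a 32-bit word from a physical address. */
extern uint32_t phys_read32(uint32_t pa);

/* Software model of the x86-32 walk; cr3 holds the page directory base.
 * Returns the physical address, or -1 on a (simulated) page fault. */
int64_t x86_32_translate(uint32_t cr3, uint32_t va)
{
    uint32_t pd_index = (va >> 22) & 0x3FF;   /* top 10 bits    */
    uint32_t pt_index = (va >> 12) & 0x3FF;   /* next 10 bits   */
    uint32_t offset   =  va        & 0xFFF;   /* bottom 12 bits */

    uint32_t pde = phys_read32((cr3 & ~0xFFFu) + pd_index * 4);
    if (!(pde & PRESENT))
        return -1;                            /* page table not present */

    uint32_t pte = phys_read32((pde & ~0xFFFu) + pt_index * 4);
    if (!(pte & PRESENT))
        return -1;                            /* page not present */

    return (int64_t)(pte & ~0xFFFu) | offset;
}
```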

x86-32 PDE and PTE Details
Page Table Entry (PTE): a 20-bit page number of a physical memory page + 12 property bits
- Property bits: Present, Read/Write, User/Supervisor, Write-through, Cache Disabled, Accessed, Dirty, PAT, Global, Available (for OS use)
Page Directory Entry (PDE): a 20-bit page number of a page table (a page of PTEs) + 12 property bits
- Property bits: Present, Read/Write, User/Supervisor, Write-through, Cache Disabled, Accessed, Reserved, 4K or 4M Page, Global, Available
If a page is not present, all bits except bit 0 are available for the OS
(See the IA-32 Intel Architecture Software Developer's Manual, Volume 3)

x86-32 and PAE
Problem: how to handle larger physical memory sizes?
- How much physical memory can 32 bits address? (4 GB)
- The Pentium Pro adds support for up to 64 GB of physical memory; how many physical address bits are needed? (36)
Idea: map the 32-bit address space to a 36-bit physical space
- A single process's logical address space is still 32 bits
Physical Address Extension (PAE): two additions
- A new CPU mode, PAE: in PAE mode, 32-bit VAs map to 36-bit PAs
- Another layer in the page tables is needed
  - To hold the 24-bit PPN, PDEs and PTEs are expanded to 64 bits
  - How many PTEs fit in a 4 KB page now? (512; we still want page tables to fit in a page, so each index field shrinks to 9 bits)
  - The new top level is a 4-entry page-directory-pointer table (PDPT); each entry points to a page directory, and translation then proceeds as before
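A rough sketch of the resulting address split under PAE (field extraction only, no memory walk); the 2/9/9/12 breakdown follows from fitting 512 eight-byte entries in each 4 KB table:

```c
#include <stdint.h>

/* Field extraction for a 32-bit VA in PAE mode:
 * 2-bit PDPT index | 9-bit directory index | 9-bit table index | 12-bit offset. */
void pae_split(uint32_t va,
               uint32_t *pdpt_i, uint32_t *pd_i, uint32_t *pt_i, uint32_t *off)
{
    *pdpt_i = (va >> 30) & 0x3;    /* selects one of 4 PDPT entries */
    *pd_i   = (va >> 21) & 0x1FF;  /* page directory index          */
    *pt_i   = (va >> 12) & 0x1FF;  /* page table index              */
    *off    =  va        & 0xFFF;  /* byte offset within the page   */
}
```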

Case Study 2: PowerPC
Goal: handle large virtual address spaces without large page tables
- 80-bit virtual address with segments
- 62-bit physical ("real") address
Idea: inverted page table
- Only need entries for virtual pages that have physical mappings
- Searching the entire table would take too long, so the relevant PTE is found with a hash function
Details of the hash table (HTAB):
- The HTAB contains PTEs
- Each HTAB entry is a page table entry group (PTEG), and each PTEG holds 8 PTEs
- A hash function on the VPN gives the indexes of two PTEGs (the primary and secondary PTEGs)
  - The resulting 16 PTEs are searched for a VPN match
  - No match => page fault
Why is it good to have PTEGs?

PowerPC Segmentation
- The Segment Lookaside Buffer (SLB) is an "associative memory" with a small number of entries
- The top 36 bits of a process's 64-bit "effective" address are used as the "effective segment id" (ESID) tag
- The SLB is searched for that ESID tag value
  - If a match exists, the property bits (U/S, X, V) are validated for the access
  - A failed match causes a fault
- The matching entry's 52-bit virtual segment id (VSID) is concatenated with the remaining 28 address bits to form the 80-bit virtual address used for the page table lookup
- Segmentation is used to separate processes within the large virtual address space
[Figure: the 64-bit effective address splits into a 36-bit ESID and 28 address bits; an associative SLB lookup maps the ESID to a 52-bit VSID, which is concatenated with the 28 address bits to produce the 80-bit virtual address]

PowerPC Page Table Lookup
- Variable-size hash table; a processor register points to the hash table base and bounds
- An architecture-defined hash function on the VPN returns two possible hash table entries (a primary and a secondary hash index)
- Each of the 16 possible PTEs is checked for a VA match
- If there is no match, then page fault
- It is possible that a translation exists but cannot fit in the hash table; the OS must handle this case
[Figure: the hash function on the 80-bit virtual address yields a primary and a secondary hash index into the HTAB; the 16-byte PTEs in the primary and secondary PTEGs are compared, and no match means a page fault]
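A simplified software model of that lookup, assuming hypothetical entry types, a stand-in ppc_hash() function, and the convention that the secondary index is the complement of the primary; the real architecture defines the exact hash and PTE format:

```c
#include <stdint.h>
#include <stdbool.h>

#define PTES_PER_PTEG 8

typedef struct {
    uint64_t vpn;     /* (abbreviated) virtual page number tag */
    uint64_t rpn;     /* real page number                      */
    bool     valid;
} ppc_pte_t;

typedef struct { ppc_pte_t pte[PTES_PER_PTEG]; } pteg_t;

/* Hypothetical stand-ins for the architecture-defined pieces. */
extern pteg_t  *htab;          /* hash table base (from a processor register) */
extern uint64_t htab_mask;     /* bounds mask derived from the table size     */
extern uint64_t ppc_hash(uint64_t vpn);

/* Search the primary and secondary PTEGs (16 PTEs total) for the VPN. */
ppc_pte_t *htab_lookup(uint64_t vpn)
{
    uint64_t hash   = ppc_hash(vpn);
    uint64_t idx[2] = { hash & htab_mask, ~hash & htab_mask };  /* primary, secondary */

    for (int g = 0; g < 2; g++)
        for (int i = 0; i < PTES_PER_PTEG; i++) {
            ppc_pte_t *p = &htab[idx[g]].pte[i];
            if (p->valid && p->vpn == vpn)
                return p;      /* match */
        }
    return 0;                  /* no match => page fault */
}
```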

PowerPC PTE Details
Each 16-byte PTE contains: an abbreviated virtual page number, SW, H, V, the real page number, AC, R, C, WIMG, N, and PP fields
Key:
- SW = available for OS use
- H = hash function ID
- V = valid bit
- AC = address compare bit
- R = referenced bit
- C = changed bit
- WIMG = storage control bits
- N = no-execute bit
- PP = page protection bits
(See the PowerPC Operating Environment Architecture, Book III, Version 2.01)

TLBs: Making Translation Fast
Problem: the page table is logically accessed on every instruction
- With a two-level table, each process memory reference requires at least three actual memory references (two page-table reads plus the data access)
Observation: page table access has temporal locality
- Use a cache to speed up access
Translation Lookaside Buffer (TLB)
- A small cache of recent PTEs (about 64 entries)
- Huge impact on performance
- Various organizations, search strategies, and levels of OS involvement are possible

TLB Organization
A TLB entry holds a tag (the virtual page number) and a physical page number (the page table entry)
There are various ways to organize a 16-entry TLB: direct mapped, two-way set associative, four-way set associative, or fully associative
Lookup:
- Calculate the set (tag % num_sets)
- Search for the tag within the resulting set
Why not use the upper bits of the tag for the set index?
[Figure: the same 16 entries arranged as 16 sets of 1 (direct mapped), 8 sets of 2, 4 sets of 4, and 1 set of 16 (fully associative); the set index selects a set and the tag is compared against the entries in it]
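A small sketch of that set-indexed lookup with hypothetical structures; a real TLB selects the set and compares the tags in parallel hardware rather than in a loop:

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 16
#define TLB_WAYS     4                    /* 4-way set associative */
#define TLB_SETS    (TLB_ENTRIES / TLB_WAYS)

typedef struct {
    uint64_t tag;     /* virtual page number  */
    uint64_t ppn;     /* physical page number */
    bool     valid;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_SETS][TLB_WAYS];

/* Return the matching entry, or 0 on a TLB miss. */
tlb_entry_t *tlb_lookup(uint64_t vpn)
{
    uint64_t set = vpn % TLB_SETS;        /* calculate the set: tag % num_sets */
    for (int way = 0; way < TLB_WAYS; way++)
        if (tlb[set][way].valid && tlb[set][way].tag == vpn)
            return &tlb[set][way];        /* hit */
    return 0;                             /* miss */
}
```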

TLB Associativity Trade-offs
Higher associativity:
+ Better utilization, fewer collisions
- Slower
- More hardware
Lower associativity:
+ Fast
+ Simple, less hardware
- Greater chance of collisions
How does page size affect TLB performance? (TLB reach, or coverage)
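TLB reach is simply the number of entries times the page size; a quick illustrative calculation with made-up but typical numbers:

```c
#include <stdio.h>

/* TLB reach (coverage) = number of entries * page size.
 * Example numbers only: a 64-entry TLB with 4 KB vs. 4 MB pages. */
int main(void)
{
    unsigned long entries  = 64;
    unsigned long reach_4k = entries * 4;   /* in KB: 64 * 4 KB = 256 KB */
    unsigned long reach_4m = entries * 4;   /* in MB: 64 * 4 MB = 256 MB */
    printf("64 entries, 4 KB pages -> %lu KB of reach\n", reach_4k);
    printf("64 entries, 4 MB pages -> %lu MB of reach\n", reach_4m);
    return 0;
}
```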

Case Study: x86 TLB
TLB management is shared by the processor and the OS
Role of the CPU:
- Fills the TLB on demand from the page table (the OS is unaware of TLB misses)
- Evicts entries when a new entry is added and no free slots exist
Role of the OS:
- Ensures TLB/page-table consistency by flushing entries as needed when page tables are updated or switched (e.g. during a context switch)
- TLB entries are removed one at a time with the INVLPG instruction, or the entire TLB is flushed by writing a new value into CR3
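A sketch of those two flush mechanisms as kernel-mode GCC-style inline assembly; this only works at privilege level 0, and the exact idiom varies from kernel to kernel:

```c
#include <stdint.h>

/* Invalidate the single TLB entry for one virtual address. */
static inline void flush_one_page(void *va)
{
    __asm__ volatile("invlpg (%0)" : : "r"(va) : "memory");
}

/* Flush the entire (non-global) TLB by rewriting CR3 with its current value. */
static inline void flush_all(void)
{
    uintptr_t cr3;
    __asm__ volatile("mov %%cr3, %0" : "=r"(cr3));
    __asm__ volatile("mov %0, %%cr3" : : "r"(cr3) : "memory");
}
```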

Example: Pentium-M TLBs
Four different TLBs:
- Instruction TLB for 4K pages: 128 entries, 4-way set associative
- Instruction TLB for large pages: 2 entries, fully associative
- Data TLB for 4K pages: 128 entries, 4-way set associative
- Data TLB for large pages: 8 entries, 4-way set associative
All TLBs use an LRU replacement policy
Why have different TLBs for instructions and data?

Case Study: SPARC TLB
SPARC is a RISC CPU with a philosophy of "simpler is better"
TLB management:
- "Software-managed" TLB: a TLB miss causes a fault that is handled by the OS
- The page table format is not defined by the architecture!
Role of the OS:
- Explicitly adds entries to the TLB
- Free to organize its page tables in any way it wants, because the CPU does not use them (e.g. Linux uses multiple levels like x86, while Solaris uses a hash table)
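A sketch of what an OS TLB miss handler might look like on a software-managed TLB; the page table lookup and the TLB-write primitive are hypothetical placeholders for whatever structure the OS chooses and whatever privileged instruction the ISA provides:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint64_t ppn; bool valid; } sw_pte_t;

/* Hypothetical placeholders: the OS's own page table lookup (any structure
 * it likes: multi-level tree, hash table, ...) and the privileged
 * operation that writes one TLB entry. */
extern bool os_page_table_lookup(uint64_t vpn, sw_pte_t *out);
extern void tlb_write_entry(uint64_t vpn, uint64_t ppn);
extern void handle_page_fault(uint64_t vpn);

/* Invoked by the trap that a TLB miss raises on a software-managed TLB. */
void tlb_miss_handler(uint64_t faulting_va)
{
    uint64_t vpn = faulting_va >> 12;   /* assume 4 KB pages for the sketch */
    sw_pte_t pte;

    if (os_page_table_lookup(vpn, &pte) && pte.valid)
        tlb_write_entry(vpn, pte.ppn);  /* install mapping, then retry the access */
    else
        handle_page_fault(vpn);         /* no mapping: real page fault path */
}
```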

SPARC TLB Speedup #1: Minimizing TLB Flushes
Problem: TLB misses trap to the OS (slow), so we want to avoid TLB misses
Idea: retain TLB contents across context switches
- When a process is switched back onto the CPU, chances are that some of its TLB entries have been retained from last time
- SPARC TLB entries are enhanced with a context id
  - The context id allows entries with the same VPN to coexist in the TLB (e.g. entries from different process address spaces)
- Some TLB entries are shared (OS kernel memory): these are marked global, and the context id is ignored during matching
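A sketch of how the context id changes the TLB match rule, with a hypothetical entry layout; global kernel entries skip the context-id comparison:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t vpn;
    uint16_t ctx;       /* context id of the owning address space */
    bool     global;    /* shared kernel mapping: ignore ctx      */
    bool     valid;
} ctx_tlb_entry_t;

/* An entry matches only if the VPN matches AND either the entry is global
 * or its context id equals the running process's context id. */
bool tlb_entry_matches(const ctx_tlb_entry_t *e, uint64_t vpn, uint16_t cur_ctx)
{
    return e->valid && e->vpn == vpn && (e->global || e->ctx == cur_ctx);
}
```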

Example: UltraSPARC III TLBs
Supports multiple page sizes: 8K (the default), 64K, 512K, and 4M
Five different TLBs:
- 2 instruction TLBs:
  - 16 entries, fully associative (supports all page sizes)
  - 128 entries, 2-way set associative (8K pages only)
- 3 data TLBs:
  - 16 entries, fully associative (supports all page sizes)
  - 2 x 512 entries, 2-way set associative (each supports one page size per process)
13-bit context id => 8192 different concurrent address spaces
What happens if you have more than 8192 processes?

SPARC TLB Speedup #2: Faster TLB Miss Handling
Problem with software-managed TLBs: a huge amount of time can be spent handling TLB misses (2-50% in one study of the SuperSPARC and SunOS)
Idea: hardware-assisted TLB miss handling
SPARC solution: the Translation Storage Buffer (TSB)
- A large, virtually-indexed, direct-mapped, physically contiguous table of recently used TLB entries
- On a TLB miss, hardware calculates the offset of the matching entry in the TSB and supplies it to the software TLB miss handler
  - The handler only needs to compare the tag of that TSB entry, load it into the TLB, and return
  - If the access misses in the TSB, a slow software search of the page tables is required
- Context switch: the location of the TSB is loaded into the processor (implying one TSB per process)
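A sketch of that fast path: hardware hands the miss handler the candidate TSB entry, and software only compares the tag. The entry layout and helper names are hypothetical:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t tag;    /* virtual page number this entry caches   */
    uint64_t data;   /* physical page number plus property bits */
    bool     valid;
} tsb_entry_t;

/* Hypothetical primitives: the TSB entry precomputed by hardware on the
 * miss, the TLB fill operation, and the slow page-table walk fallback. */
extern tsb_entry_t *hw_tsb_pointer(void);       /* entry selected by hardware */
extern void         tlb_fill(uint64_t vpn, uint64_t data);
extern void         slow_page_table_walk(uint64_t vpn);

void tsb_miss_fast_path(uint64_t missing_vpn)
{
    tsb_entry_t *e = hw_tsb_pointer();          /* direct-mapped: one candidate */

    if (e->valid && e->tag == missing_vpn)
        tlb_fill(missing_vpn, e->data);         /* hit in TSB: quick refill     */
    else
        slow_page_table_walk(missing_vpn);      /* TSB miss: search page tables */
}
```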