ECE232: Hardware Organization and Design


ECE232: Hardware Organization and Design
Part 16: Virtual Memory (Chapter 7)
http://www.ecs.umass.edu/ece/ece232/

Virtual Memory - Objectives
- Allow programs to be written without memory constraints: a program can exceed the size of main memory
- Let many programs share DRAM so that context switches can occur
- Relocation: parts of a program can be placed at different locations in memory instead of one big chunk
- Virtual memory: main memory holds many programs running at the same time (processes), and main memory is used as a kind of "cache" for disk
[Figure: processor (registers, datapath, control) -> cache -> main memory (DRAM) -> disk]

Disk Technology in Brief
- Disk is mechanical memory: platters with tracks, addressed by sector, read and written by a moving R/W arm
- Rotation speeds: 3600 / 4200 / 5400 / 7200 / 10000 RPM
- Disk access time = seek time + rotational delay + transfer time, usually measured in milliseconds
- A "miss" to disk is extremely expensive: typical access time is millions of clock cycles

Virtual to Physical Memory Mapping
- Each process has its own private "virtual address space" (e.g., 2^32 bytes); the CPU actually generates "virtual addresses"
- Each computer has a "physical address space" (e.g., 128 MB of DRAM), also called "real memory"
- Address translation: mapping virtual addresses to physical addresses
- Virtual memory allows some chunks of virtual memory to reside on disk rather than in main memory, and allows multiple programs to use (different chunks of) physical memory at the same time
[Figure: processor issues a virtual address, which is translated before reaching physical memory]

Mapping Virtual Memory to Physical Memory
- Divide memory into equal-sized pages (say, 4 KB each)
- A page of virtual memory can be assigned to any page frame of physical memory
[Figure: a single process's 4 GB virtual address space (code, static data, heap, stack) mapped onto scattered page frames of a 128 MB physical memory]

How to Perform Address Translation?
- VM divides memory into equal-sized pages
- Address translation maps entire pages: offsets within the pages do not change
- If the page size is a power of two, the virtual address separates into two fields (like the cache index and offset fields): Virtual Page Number | Page Offset
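The field split described above takes only a couple of shift-and-mask operations. A minimal sketch, assuming the 4 KB page size used on later slides:

```python
# Split a 32-bit virtual address into its two fields, assuming a
# power-of-two page size of 4 KB (2**12 bytes).
PAGE_SIZE = 4096
OFFSET_BITS = 12                     # log2(PAGE_SIZE)

def split_virtual_address(va):
    vpn = va >> OFFSET_BITS          # virtual page number: gets translated
    offset = va & (PAGE_SIZE - 1)    # page offset: passes through unchanged
    return vpn, offset

vpn, offset = split_virtual_address(0x12345678)  # vpn == 0x12345, offset == 0x678
```

Because the page size is a power of two, no arithmetic beyond bit selection is needed, which is what makes the split practical in hardware.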

Mapping a Virtual to a Physical Address
- Virtual address (32 bits): bits 31..12 = Virtual Page Number, bits 11..0 = Page Offset (4 KB page size)
- Translation replaces the Virtual Page Number with a Physical Page Number; the Page Offset passes through unchanged
- Physical address (28 bits): bits 27..12 = Physical Page Number, bits 11..0 = Page Offset

Address Translation
- Want fully associative page placement, but how to locate the physical page? Searching is impractical (too many pages)
- A page table is a data structure that contains the mapping of virtual pages to physical pages
- There are several different ways, all up to the operating system, to keep and update this data
- Each process running in the system has its own page table
- Typical figures: hit time = 70-100 CPU cycles, miss rate = 1%, miss penalty = 10^6 cycles
[Figure: CPU issues a virtual address (virtual page no. + page offset); the page table maps it to a physical address (physical page no. + page offset) used to access memory]
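The figures quoted on this slide show why page faults dominate: even a 1% miss rate swamps the hit time. A quick check of the arithmetic:

```python
# Effective access time with the slide's figures: hit time ~100 cycles,
# miss (page fault) rate 1%, miss penalty 10**6 cycles.
def effective_access_time(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

avg = effective_access_time(100, 0.01, 10**6)  # 10100.0 cycles on average
```

A hundredfold slowdown from a 1% fault rate is why the OS works so hard to keep the fault rate far below 1%.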

Address Translation: Page Table
- The Page Table Register points to the page table, which is located in physical memory
- The virtual page number indexes into the page table; each entry holds a Valid bit, Access Rights, and a Physical Page Number
- If the entry is valid, the Physical Page Number plus the page offset forms the physical memory address; otherwise the entry refers to a page on disk
- Access rights: None, Read Only, Read/Write, Execute

Page Table

Handling Page Faults
- A page fault is like a cache miss: must find the page in a lower level of the hierarchy
- If the valid bit is zero, the Physical Page Number field points to a page on disk
- When the OS starts a new process, it creates space on disk for all the pages of the process, sets all valid bits in the page table to zero, and sets all Physical Page Numbers to point to disk
- This is called Demand Paging: pages of the process are loaded from disk only as needed
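The demand-paging behavior above can be sketched in a few lines. This is a toy model (all names hypothetical), not an OS implementation: every page table entry starts invalid and pointing at its disk block, and the first access "faults" and claims a free frame:

```python
# Toy demand-paging sketch: pages are loaded only on first access.
class PTE:
    def __init__(self, disk_block):
        self.valid = False          # OS starts the process with all pages invalid
        self.ppn = None             # physical page number once resident
        self.disk_block = disk_block

def access(page_table, vpn, free_frames):
    pte = page_table[vpn]
    if not pte.valid:               # page fault: load the page from disk
        pte.ppn = free_frames.pop()
        pte.valid = True
    return pte.ppn                  # translation now succeeds

table = {0: PTE(disk_block=17), 1: PTE(disk_block=42)}
frames = [3, 7]
first = access(table, 0, frames)    # faults, claims a frame
second = access(table, 0, frames)   # hits: no second fault
```

Note that page 1 never consumes a frame because it is never touched; that laziness is the whole point of demand paging.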

Comparing the Two Hierarchies

                Cache                                  Virtual Memory
Unit            Block or line                          Page
Miss            Miss                                   Page fault
Size            Block size: 32-64 B                    Page size: 4 KB-16 KB
Placement       Direct mapped, N-way set associative   Fully associative
Replacement     LRU or random                          LRU approximation
Write policy    Write-through or write-back            Write-back
How managed     Hardware                               Hardware + software (operating system)

Optimizing for Space
- The page table is too big! 4 GB virtual address space / 4 KB pages = 1 million page table entries, about 4 MB just for the page table of a single process!
- There is a variety of solutions that trade page table size for slower translation: multilevel page tables, paging the page tables, etc. (take an O/S class to learn more)
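The slide's arithmetic, spelled out (the 4-byte entry size is an assumption that makes the 4 MB figure come out):

```python
# 4 GB of virtual space / 4 KB pages = 1 M page table entries;
# at 4 bytes per entry that is 4 MB of page table per process.
entries = (4 * 2**30) // (4 * 2**10)   # 2**20 = 1,048,576 entries
table_size = entries * 4               # bytes, assuming 4-byte entries
size_mb = table_size // 2**20          # 4 MB
```

With hundreds of processes, flat page tables alone would consume gigabytes, which is why multilevel schemes exist.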

How to Translate Fast?
- Problem: virtual memory requires two memory accesses! One to translate the virtual address into a physical address (the page table lookup; the page table is in physical memory), and one to transfer the actual data (hopefully a cache hit)
- Observation: since there is locality in pages of data, there must be locality in the virtual addresses of those pages!
- Why not create a cache of virtual-to-physical address translations to make translation fast? (smaller is faster)
- For historical reasons, such a "page table cache" is called a Translation Lookaside Buffer, or TLB
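A minimal sketch of the idea, assuming a simple LRU-style policy standing in for the LRU approximations of real TLBs (the class and its methods are illustrative, not a hardware model):

```python
# A TLB modeled as a small cache of recent VPN -> PPN translations.
from collections import OrderedDict

class TLB:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()           # vpn -> ppn

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # mark as most recently used
            return self.entries[vpn]
        return None                            # miss: must walk the page table

    def insert(self, vpn, ppn):
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[vpn] = ppn

tlb = TLB(capacity=2)
tlb.insert(0x12345, 0x042)
hit = tlb.lookup(0x12345)    # 0x042
miss = tlb.lookup(0x99999)   # None
```

Because the TLB holds only tens to hundreds of entries, it can be searched in parallel in about a cache access time, which is what removes the extra memory access.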

Translation-Lookaside Buffer (TLB)
[Figure: a small TLB holds recent virtual-to-physical page translations; the translated entries point to physical pages 0 through N-1 in main memory. From H. Stone, "High-Performance Computer Architecture," AW 1993]

TLB and Page Table

Typical TLB Format

Virtual Page Number | Physical Page Number | Valid | Ref | Dirty | Access Rights
      ("tag")              ("data")

- The TLB is just a cache of the page table mappings
- Dirty: since write-back is used, need to know whether or not to write the page to disk when it is replaced
- Ref: used to approximate LRU on replacement. The reference bit is set when the page is accessed; the OS periodically sorts the referenced pages to the top and resets all Ref bits. A timer interrupt must be provided to update these LRU bits
- TLB access time is comparable to cache access time (much less than main memory access time)

Translation Look-Aside Buffers
- The TLB is usually small, typically 32-512 entries
- Like any other cache, the TLB can be fully associative, set associative, or direct mapped
[Figure: the processor presents a virtual address to the TLB; a TLB hit yields the physical address for the cache / main memory access, while a TLB miss goes to the page table; a page fault or protection violation invokes the OS fault handler, which may access disk]
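The decision flow on this slide, sketched end to end (a hedged illustration with hypothetical data structures: a dict for the TLB, a dict of entries for the page table, and an exception standing in for the trap to the OS fault handler):

```python
# Try the TLB first; walk the page table on a TLB miss; trap on a fault.
PAGE_BITS = 12                             # 4 KB pages, as used earlier

def translate(va, tlb, page_table):
    vpn, offset = va >> PAGE_BITS, va & ((1 << PAGE_BITS) - 1)
    if vpn in tlb:                         # TLB hit: fast path
        return (tlb[vpn] << PAGE_BITS) | offset
    pte = page_table.get(vpn)              # TLB miss: page table walk
    if pte is None or not pte["valid"]:
        raise RuntimeError("page fault: OS loads the page from disk")
    tlb[vpn] = pte["ppn"]                  # refill the TLB for next time
    return (pte["ppn"] << PAGE_BITS) | offset

tlb = {}
page_table = {0x1: {"valid": True, "ppn": 0x9}}
pa = translate(0x1ABC, tlb, page_table)    # TLB miss, walk succeeds: 0x9ABC
```

After the first access the translation is cached, so a repeated access to the same page never touches the page table.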

Steps in Memory Access - Example

DECStation 3100 / MIPS R2000
[Figure: the 32-bit virtual address splits into a 20-bit virtual page number (bits 31..12) and a 12-bit page offset (bits 11..0). The TLB (64 entries, fully associative) holds valid, dirty, tag, and physical page number fields; a TLB hit produces the physical page number, which is concatenated with the page offset to form the physical address. The physical address then splits into a 16-bit physical address tag, a 14-bit cache index, and a 2-bit byte offset for the cache (16K entries, direct mapped); a tag match on a valid entry is a cache hit and returns the data.]
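Reading the cache-side bit fields off the figure (an interpretation of the diagram, not production code): 16K entries imply a 14-bit index, one 4-byte word per block implies a 2-bit byte offset, and the tag takes the remaining 16 bits:

```python
# Field widths from the figure: tag 16 bits, index 14 bits, byte offset 2 bits.
def cache_fields(pa):
    byte_offset = pa & 0x3
    index = (pa >> 2) & 0x3FFF     # 14 bits selects one of 16K entries
    tag = (pa >> 16) & 0xFFFF      # compared against the stored tag
    return tag, index, byte_offset

tag, index, byte_offset = cache_fields(0xABCD1234)
```

The cache is indexed with the physical address, so on this machine the TLB lookup must complete before the cache access can begin.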

Real Stuff: Pentium Pro Memory Hierarchy
- Address size: 32 bits (virtual and physical)
- VM page size: 4 KB or 4 MB
- TLB organization: separate instruction and data TLBs (I-TLB: 32 entries, D-TLB: 64 entries), 4-way set associative, approximated LRU replacement; hardware handles misses
- L1 cache: 8 KB each, separate instruction and data caches, 4-way set associative, approximated LRU, 32-byte blocks, write-back
- L2 cache: 256 or 512 KB

Intel "Nehalem" Quad-Core Processor
- 13.5 x 19.6 mm die; 731 million transistors; two 128-bit memory channels
- Each core has private 32-KB instruction and 32-KB data caches and a 512-KB L2 cache
- The four cores share an 8-MB L3 cache
- Each core also has a two-level TLB

Comparing Intel's Nehalem to AMD's Opteron X4
- Virtual address: 48 bits; physical address: 44 bits; page sizes: 4 KB and 2/4 MB
- L1 TLB (per core):
    Intel Nehalem: I-TLB 128 entries, D-TLB 64 entries; both 4-way set associative, LRU replacement
    AMD Opteron X4: I-TLB 48 entries, D-TLB 48 entries; both fully associative, LRU replacement
- L2 TLB (per core):
    Intel Nehalem: single unified L2 TLB, 512 entries, 4-way set associative, LRU replacement
    AMD Opteron X4: L2 I-TLB 512 entries, L2 D-TLB 512 entries; both 4-way, round-robin LRU
- TLB misses: handled in hardware on both

Further Comparison
- L1 caches (per core):
    Intel Nehalem: I-cache 32 KB, 64-byte blocks, 4-way, approx. LRU, hit time n/a; D-cache 32 KB, 64-byte blocks, 8-way, approx. LRU, write-back/allocate, hit time n/a
    AMD Opteron X4: I-cache 32 KB, 64-byte blocks, 2-way, LRU, hit time 3 cycles; D-cache 32 KB, 64-byte blocks, 2-way, LRU, write-back/allocate, hit time 9 cycles
- L2 unified cache (per core):
    Intel Nehalem: 256 KB, 64-byte blocks, 8-way, approx. LRU, write-back/allocate, hit time n/a
    AMD Opteron X4: 512 KB, 64-byte blocks, 16-way, approx. LRU, write-back/allocate, hit time n/a
- L3 unified cache (shared):
    Intel Nehalem: 8 MB, 64-byte blocks, 16-way, write-back/allocate, hit time n/a
    AMD Opteron X4: 2 MB, 64-byte blocks, 32-way, write-back/allocate, hit time 32 cycles