Vm.1 EEL-4713 Computer Architecture Virtual Memory.

Slides:



Advertisements
Similar presentations
Virtual Memory In this lecture, slides from lecture 16 from the course Computer Architecture ECE 201 by Professor Mike Schulte are used with permission.
Advertisements

Virtual Memory. The Limits of Physical Addressing CPU Memory A0-A31 D0-D31 “Physical addresses” of memory locations Data All programs share one address.
OS Fall’02 Virtual Memory Operating Systems Fall 2002.
Computer Organization CS224 Fall 2012 Lesson 44. Virtual Memory  Use main memory as a “cache” for secondary (disk) storage l Managed jointly by CPU hardware.
Lecture 34: Chapter 5 Today’s topic –Virtual Memories 1.
CSIE30300 Computer Architecture Unit 10: Virtual Memory Hsin-Chou Chi [Adapted from material by and
Virtual Memory Hardware Support
Cs 325 virtualmemory.1 Accessing Caches in Virtual Memory Environment.
The Memory Hierarchy (Lectures #24) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer Organization.
Virtual Memory Adapted from lecture notes of Dr. Patterson and Dr. Kubiatowicz of UC Berkeley.
CSCE 212 Chapter 7 Memory Hierarchy Instructor: Jason D. Bakos.
Virtual Memory Adapted from lecture notes of Dr. Patterson and Dr. Kubiatowicz of UC Berkeley and Rabi Mahapatra & Hank Walker.
Translation Look-Aside Buffers TLBs usually small, typically entries Like any other cache, the TLB can be fully associative, set associative,
Virtual Memory.
S.1 Review: The Memory Hierarchy Increasing distance from the processor in access time L1$ L2$ Main Memory Secondary Memory Processor (Relative) size of.
Recap. The Memory Hierarchy Increasing distance from the processor in access time L1$ L2$ Main Memory Secondary Memory Processor (Relative) size of the.
Memory Management and Paging CSCI 3753 Operating Systems Spring 2005 Prof. Rick Han.
Paging and Virtual Memory. Memory management: Review  Fixed partitioning, dynamic partitioning  Problems Internal/external fragmentation A process can.
ECE 232 L27.Virtual.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 27 Virtual.
Translation Buffers (TLB’s)
1 Chapter 8 Virtual Memory Virtual memory is a storage allocation scheme in which secondary memory can be addressed as though it were part of main memory.
©UCB CS 162 Ch 7: Virtual Memory LECTURE 13 Instructor: L.N. Bhuyan
Memory: Virtual MemoryCSCE430/830 Memory Hierarchy: Virtual Memory CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Yifeng Zhu.
1  1998 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy (Part II)
Vm Computer Architecture Lecture 16: Virtual Memory.
CPE 442 vm.1 Introduction To Computer Architecture CpE 442 Virtual Memory.
CSE431 L22 TLBs.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 22. Virtual Memory Hardware Support Mary Jane Irwin (
Lecture 19: Virtual Memory
Lecture 15: Virtual Memory EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.
Operating Systems ECE344 Ding Yuan Paging Lecture 8: Paging.
July 30, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 8: Exploiting Memory Hierarchy: Virtual Memory * Jeremy R. Johnson Monday.
IT253: Computer Organization
CpE 442 Virtual Memory Start: X:40.
© 2004, D. J. Foreman 1 Virtual Memory. © 2004, D. J. Foreman 2 Objectives  Avoid copy/restore entire address space  Avoid unusable holes in memory.
Virtual Memory Part 1 Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology May 2, 2012L22-1
Virtual Memory. Virtual Memory: Topics Why virtual memory? Virtual to physical address translation Page Table Translation Lookaside Buffer (TLB)
University of Amsterdam Computer Systems – virtual memory Arnoud Visser 1 Computer Systems Virtual Memory.
Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.
1 Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4.
CS2100 Computer Organisation Virtual Memory – Own reading only (AY2015/6) Semester 1.
CS.305 Computer Architecture Memory: Virtual Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made available.
Virtual Memory Ch. 8 & 9 Silberschatz Operating Systems Book.
Constructive Computer Architecture Virtual Memory: From Address Translation to Demand Paging Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.
LECTURE 12 Virtual Memory. VIRTUAL MEMORY Just as a cache can provide fast, easy access to recently-used code and data, main memory acts as a “cache”
Virtual Memory 1 Computer Organization II © McQuain Virtual Memory Use main memory as a “cache” for secondary (disk) storage – Managed jointly.
Memory Management memory hierarchy programs exhibit locality of reference - non-uniform reference patterns temporal locality - a program that references.
Virtual Memory Slides adapted from Dr. Valeriu Bieu, (Washington State University, EE 334, Spring 2005) (originally adapted from Dr. David Patterson, UC.
Virtual Memory Chapter 8.
CS161 – Design and Architecture of Computer
Memory Hierarchy Ideal memory is fast, large, and inexpensive
ECE232: Hardware Organization and Design
Memory COMPUTER ARCHITECTURE
CS161 – Design and Architecture of Computer
CS352H: Computer Systems Architecture
Virtual Memory Provides illusion of very large memory
From Address Translation to Demand Paging
From Address Translation to Demand Paging
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Morgan Kaufmann Publishers
CSE 153 Design of Operating Systems Winter 2018
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Morgan Kaufmann Publishers Memory Hierarchy: Virtual Memory
Translation Buffers (TLB’s)
Translation Buffers (TLB’s)
CSC3050 – Computer Architecture
CSE 153 Design of Operating Systems Winter 2019
Virtual Memory Lecture notes from MKP and S. Yalamanchili.
Translation Buffers (TLBs)
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Review What are the advantages/disadvantages of pages versus segments?
Presentation transcript:

vm.1 EEL-4713 Computer Architecture Virtual Memory

vm.2 Outline °Recap of Memory Hierarchy °Virtual Memory °Page Tables and TLB °Protection

vm.3 Memory addressing - physical °So far we considered addresses of loads/stores go directly to caches/memory As in your project °This makes life complicated if a computer is multi-processed/multi-user How do you assign addresses within a program so that you know other users/programs will not conflict with them? Program A:Program B: store 0x100,1 store 0x100,5 load R1,0x100

vm.4 Virtual Memory? Provides illusion of very large memory – sum of the memory of many jobs greater than physical memory – address space of each job larger than physical memory Allows available (fast and expensive) physical memory to be efficiently utilized Simplifies memory management and programming Exploits memory hierarchy to keep average access time low. Involves at least two storage levels: main and secondary Main (DRAM): nanoseconds, M/GBytes Secondary (HD): miliseconds, G/TBytes Virtual Address -- address used by the programmer Virtual Address Space -- collection of such addresses Memory Address -- address of word in physical memory also known as “physical address” or “real address”

vm.5 Memory addressing - virtual Program A:Program B: store 0x100,1 store 0x100,5 load R1,0x100 Translation A:Translation B: 0x100 -> 0x x100 -> 0x Use software and hardware to guarantee no conflicts Operating system: keep software translation tables Hardware: cache recent translations

vm.6 Basic Issues in VM System Design size of information blocks (pages) that are transferred from secondary (disk) to main storage (Mem) Page brought into Mem, if Mem is full some page of Mem must be released to make room for the new page --> replacement policy missing page fetched from secondary memory only on the occurrence of a page fault --> fetch/load policy Paging Organization virtual and physical address space partitioned into blocks of equal size page frames pages reg cache mem disk frame

vm.7 Address Map V = {0, 1,..., n - 1} virtual address space M = {0, 1,..., m - 1} physical address space MAP: V --> M U {0} address mapping function n can be > m MAP(a) = a' if data at virtual address a is present in physical address a' in M = 0 if data at virtual address a is not present in M need to allocate address in M Processor Name Space V Addr Trans Mechanism fault handler Main Memory Secondary Memory a a a' 0 missing item fault physical address OS performs this transfer

vm.8 Paging Organization frame Phys Addr (PA) Physical Memory 1K Addr Trans MAP page K Page size: unit of mapping also unit of transfer from virtual to physical memory Virtual Memory Address Mapping VApage no.Page offset 10 Page Table index into page table Page Table Base Reg V Access Rights PA + table located in physical memory physical memory address (concatenation) Virt. Addr (VA) Page table stored in memory One page table per process Start of page table stored in page table base register V – Is page in memory or on disk

vm.9 Address Mapping Algorithm If V = 1 (where is page currently stored?) then page is in main memory at frame address stored in table else page is located in secondary memory (location determined at process creation) Access Rights R = Read-only, R/W = read/write, X = execute only If kind of access not compatible with specified access rights, then protection_violation_fault If valid bit not set then page fault Terms: Protection Fault: access rights violation; hardware raises exception, microcode, or software fault handler Page Fault: page not resident in physical memory, also causes a trap; usually accompanied by a context switch: current process suspended while page is fetched from secondary storage; page faults usually handled in software by OS because page fault to secondary memory takes million+ cycles

vm.10 *Hardware/software interface °What checks does the processor perform during a load/store memory access? Effective address computed in pipeline is virtual Before accessing memory, must perform virtual-physical mapping -At hardware speed, critical to performance If there is a valid mapping, load/store proceeds as usual; address sent to cache, DRAM is the mapped address (physical addressed) If there is no valid mapping, or if there is a protection violation, processor does not know how to handle it -Throw an exception –Save the PC of the instruction that caused the exception so that it can be retried later –Jump into an operating system exception handling routine -O/S handles exception using its specific policies (Linux, Windows will behave differently) -Once it finishes handling, issue “return from interrupt” instruction to recover PC and try instruction again

vm.11 Virtual Address and a Cache CPU Trans- lation Cache Main Memory VAPA miss hit data It takes an extra memory access to translate VA to PA This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible ASIDE: Why access cache with PA at all? VA caches have a problem! synonym problem: 1. two different virtual addresses map to same physical address => two different cache entries holding data for the same physical address! (data sharing between different processes) 2. two same virtual addresses (from different processes) map to different physical addresses

vm.12 TLBs A way to speed up translation is to use a special cache of recently used page table entries -- this has many names, but the most frequently used is Translation Lookaside Buffer or TLB Virtual Address Physical Address Dirty Ref Valid Access TLB access time comparable to, though shorter than, cache access time (still much less than main memory access time)

vm.13 Translation Look-Aside Buffers Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped TLBs are usually small, typically not more than hundreds of entries. This permits large/full associativity. CPU TLB Lookup Cache Main Memory VAPA miss hit data Trans- lation hit miss 20 t t 1/2 t Translation with a TLB

vm.14 Reducing Translation Time Machines with TLBs can go one step further to reduce # cycles/cache access They overlap the cache access with the TLB access Works because high order bits of the VA are used to look up in the TLB while low order bits are used as index into cache * Virtually indexed, physically tagged.

vm.15 Overlapped Cache & TLB Access TLB Cache bytes index 1 K page #disp assoc lookup 32 PA Hit/ Miss PA Data Hit/ Miss = IF cache hit AND (cache tag = PA) then deliver data to CPU ELSE IF [cache miss OR (cache tag != PA)] and TLB hit THEN access memory with the PA from the TLB ELSE do standard VA translation

vm.16 Problems With Overlapped TLB Access Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation This usually limits things to small caches, large page sizes, or high n-way set associative caches if you want a large cache (e.g., cache size and page size need to match) Example: suppose everything the same except that the cache is increased to 8 K bytes instead of 4 K: virt page #disp cache index This bit is changed by VA translation, but is needed for cache lookup Solutions: go to 8K byte page sizes go to 2 way set associative cache (would allow you to continue to use a 10 bit index) 1K way set assoc cache

vm.17 Hardware versus software TLB management °The TLB misses can be handled either by software or hardware Software: processor has instructions to modify TLB in the architecture; O/S handles replacement E.g. MIPS Hardware: processor handles replacement without need for instructions to store TLB entries E.g. x86 °Instructions that cause TLB “flushes” are needed in hardware case too

vm.18 Optimal Page Size Choose page that minimizes fragmentation large page size => internal fragmentation more severe (unused memory) BUT increase in the # of pages / name space => larger page tables In general, the trend is towards larger page sizes because Most machines at 4K-64K byte pages today, with page sizes likely to increase -- memories get larger as the price of RAM drops -- the gap between processor speed and disk speed grows wider Larger pages can exploit more spatial locality in transfers between disk and memory -- programmers desire larger virtual address spaces

vm.19 2-level page table Seg 0 Seg 1 Seg bytes 256 P0 P255 4 bytes 1 K PA D0 D1023 PA Root Page Tables Data Pages 4 K Second Level Page Table x x x = Allocated in User Virtual Space 1 Mbyte, but allocated in system virtual addr space 256K bytes in physical memory

vm.20 Example: two-level address translation in x86 CR3 directory table VA PA

vm.21 Example: Linux VM °Demand-paging Pages are brought into physical memory when referenced °Kernel keeps track of each process’ virtual address space using a mm_struct data structure Which contains pointers to list of “area” structures (vm_area_struct)

vm.22 vm_next Linux VM areas task_struct mm_struct mm mmap vm_area_struct vm_end vm_prot vm_start vm_end vm_prot vm_start vm_end vm_prot vm_next vm_start process virtual memory text data shared libraries Linked list of vm_area structures associated with process Start/end: -Bounds of each VM map Prot: -Protection info (r/w)

vm.23 VM “areas” °Linked vm_area_struct structures °A VM area: a part of the process virtual memory space that has a special rule for the page-fault handlers (i.e. a shared library, the executable area etc). °These are specified in a vm_area_struct Start and end VM address of area Protection information, flags

vm.24 Linux fault handling vm_area_struct vm_end r/o vm_next vm_start vm_end r/w/ vm_next vm_start vm_end r/o vm_next vm_start process virtual memory VMA3 VMA2 VMA1 0 °Exception triggers O/S handling if address out of bounds or protection violated °Traverse vm_area list, check for bounds If not mapped, it is a segmentation violation – signal to process °If mapped, check protection of vm_area_struct E.g. r/o, r/w Signal protection violation to process if access not allowed Otherwise, handle fault and bring page to memory

vm.25 Page Replacement Algorithms Just like cache block replacement! Least Recently Used: -- selects the least recently used page for replacement -- requires knowledge about past references, more difficult to implement -- good performance, recognizes principle of locality -- hard to keep track – update a structure on each memory reference?

vm.26 Page Replacement (Continued) Not Recently Used: Associated with each page is a reference flag such that ref flag = 1 if the page has been referenced in recent past = 0 otherwise -- if replacement is necessary, choose any page frame such that its reference bit is 0. This is a page that has not been referenced in the recent past -- an implementation of NRU: page table entry page table entry ref bit last replaced pointer (lrp) if replacement is to take place, advance lrp to next entry (mod table size) until one with a 0 bit is found; this is the target for replacement; As a side effect, all examined PTE's have their reference bits set to zero. 1 0 An optimization is to search for the a page that is both not recently referenced AND not dirty.

vm.27 Demand Paging and Prefetching Pages Fetch Policy when is the page brought into memory? if pages are loaded solely in response to page faults, then the policy is demand paging An alternative is prefetching: anticipate future references and load such pages before their actual use + reduces page transfer overhead - removes pages already in page frames, which could adversely affect the page fault rate - predicting future references usually difficult Most systems implement demand paging without prefetching (One way to obtain effect of prefetching behavior is increasing the page size

vm.28 Discussion – caches or no caches? °Caches help performance when there is locality but can be overhead if locality is not high Each hit/miss decision at each cache level requires a lookup Some applications can run better with fewer cache levels °Example: “Cell” processor No cache on attached processing units There is a small memory array next to each unit, but it is handled by software (not a cache controller)

vm.29

vm.30

vm.31 Summary °Virtual memory: a mechanism to provide much larger memory than physically available memory in the system °Placement, replacement and other policies can have significant impact on performance °Interaction of Virtual memory with physical memory hierarchy is complex and addresses translation mechanisms must be designed carefully for good performance.