ECE/CS 552: Memory Hierarchy
Instructor: Mikko H. Lipasti
Fall 2010, University of Wisconsin-Madison
Lecture notes based on notes by Mark Hill, updated by Mikko Lipasti

© Hill, Lipasti 2 Memory Hierarchy
[Figure: memory hierarchy pyramid – Registers, On-Chip SRAM, Off-Chip SRAM, DRAM, Disk; capacity grows while speed and cost per bit fall toward the bottom]

© Hill, Lipasti 3 Memory Hierarchy
[Figure: CPU → I & D L1 caches → shared L2 cache → main memory → disk]
Temporal locality: keep recently referenced items at higher levels, so future references are satisfied quickly
Spatial locality: bring neighbors of recently referenced items to higher levels, so future references are satisfied quickly

© Hill, Lipasti 4 Four Burning Questions
These are:
– Placement: where can a block of memory go?
– Identification: how do I find a block of memory?
– Replacement: how do I make space for new blocks?
– Write policy: how do I propagate changes?
Consider these for registers and main memory
– Main memory is usually DRAM

© Hill, Lipasti 5 Placement
Memory Type | Placement | Comments
Registers | Anywhere; Int, FP, SPR | Compiler/programmer manages
Cache (SRAM) | Fixed in H/W | Direct-mapped, set-associative, fully-associative
DRAM | Anywhere | O/S manages
Disk | Anywhere | O/S manages

© Hill, Lipasti 6 Register File
Registers managed by programmer/compiler
– Assign variables and temporaries to registers
– Limited name space matches available storage
– Learn more in CS536, CS701
Placement | Flexible (subject to data type)
Identification | Implicit (name == location)
Replacement | Spill code (store to stack frame)
Write policy | Write-back (store on replacement)

© Hill, Lipasti 7 Main Memory and Virtual Memory
Use of virtual memory
– Main memory becomes another level in the memory hierarchy
– Enables programs whose address space or working set exceeds physically available memory
  No need for the programmer to manage overlays, etc.
  Sparse use of a large address space is OK
– Allows multiple users or programs to timeshare a limited amount of physical memory and address space
Bottom line: efficient use of an expensive resource, and ease of programming

© Hill, Lipasti 8 Virtual Memory
Enables:
– Using more memory than the system physically has
– A program can behave as if it is the only one running
  No need to manage address-space usage across programs
  E.g. it can assume it always starts at address 0x0
– Memory protection: each program has a private VA space, so no one else can clobber it
– Better performance: start running a large program before all of it has been loaded from disk

© Hill, Lipasti 9 Virtual Memory – Placement
Main memory is managed in larger blocks
– Page size is typically 4KB – 16KB
Fully flexible placement; fully associative
– Operating system manages placement
– Indirection through a page table
– Maintains the mapping between:
  Virtual address (seen by the programmer)
  Physical address (seen by main memory)
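As a concrete illustration of the page-based mapping above, this minimal C sketch splits a 32-bit virtual address into a virtual page number and page offset, assuming 4KB pages (the constants and names are illustrative, not taken from the slides):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12u                 /* assumed 4KB pages: 2^12 bytes */
#define OFFSET_MASK ((1u << PAGE_SHIFT) - 1u)

int main(void) {
    uint32_t va     = 0x2000ABCDu;              /* example virtual address */
    uint32_t vpn    = va >> PAGE_SHIFT;         /* virtual page number     */
    uint32_t offset = va & OFFSET_MASK;         /* offset within the page  */
    printf("VA=0x%08x -> VPN=0x%05x offset=0x%03x\n", va, vpn, offset);
    return 0;
}
```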

© Hill, Lipasti 10 Virtual Memory – Placement
Fully associative implies an expensive lookup?
– In caches, yes: multiple tags are checked in parallel
In virtual memory, the expensive lookup is avoided by using a level of indirection
– A lookup table or hash table
– Called a page table

© Hill, Lipasti 11 Virtual Memory – Identification
Similar to a cache tag array
– A page table entry contains the VA, PA, and dirty bit
Virtual address:
– Matches the programmer's view; based on register values
– Can be the same for multiple programs sharing the same system, without conflicts
Physical address:
– Invisible to the programmer, managed by the O/S
– Created/deleted on demand, can change
Virtual Address | Physical Address | Dirty bit
0x… | 0x2000 | Y/N
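A page table entry can be pictured as a small record holding the translation plus status bits. The struct below is one hypothetical layout in C; the field names and widths are assumptions for illustration, not a real ISA's PTE format:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical page table entry: one per virtual page of a process. */
typedef struct {
    uint32_t pfn;        /* physical frame number (page part of the PA) */
    bool     valid;      /* is the page resident in physical memory?    */
    bool     dirty;      /* modified since it was brought in from disk? */
    bool     referenced; /* used by the O/S to approximate LRU          */
    uint8_t  prot;       /* read/write/execute permission bits          */
} pte_t;
```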

© Hill, Lipasti 12 Virtual Memory – Replacement
Similar to caches:
– FIFO
– LRU: overhead too high, so it is approximated with reference-bit checks (the clock algorithm)
– Random
The O/S decides and manages this – see CS537

© Hill, Lipasti 13 Virtual Memory – Write Policy
Write back
– Disks are too slow to write through
The page table maintains a dirty bit
– Hardware must set the dirty bit on the first write
– The O/S checks the dirty bit on eviction
– Dirty pages are written to the backing store (a disk write, 10+ ms)

© Hill, Lipasti 14 Virtual Memory Implementation
Caches have fixed policies, a hardware FSM for control, and stall the pipeline on a miss
VM has very different miss penalties
– Remember, disks are 10+ ms!
Hence it is engineered differently

© Hill, Lipasti 15 Page Faults
A virtual memory miss is a page fault
– The physical memory location does not exist
– An exception is raised and the PC is saved
– The OS page fault handler is invoked:
  Find a physical page (possibly evicting one)
  Initiate the fetch from disk
– Switch to another task that is ready to run
– Interrupt when the disk access completes
– Restart the original instruction
Why use the O/S and not a hardware FSM?
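To make the sequence concrete, here is a minimal, self-contained C sketch of the steps a page-fault handler performs; the frame table, disk stubs, and replacement choice are invented stand-ins, not any real kernel's API:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>

#define NFRAMES 64                       /* assumed tiny physical memory      */

typedef struct { uint32_t vpn; bool valid; bool dirty; } frame_t;
static frame_t frames[NFRAMES];          /* hypothetical frame table          */

/* Stubs standing in for real disk I/O (10+ ms each). */
static void disk_write(uint32_t vpn, int frame) { (void)vpn; (void)frame; }
static void disk_read (uint32_t vpn, int frame) { (void)vpn; (void)frame; }

/* Handle a fault on virtual page 'vpn'; returns the frame chosen for it. */
int handle_page_fault(uint32_t vpn) {
    /* 1. Find a physical frame, picking a victim if none is free.           */
    int victim = -1;
    for (int f = 0; f < NFRAMES; f++)
        if (!frames[f].valid) { victim = f; break; }
    if (victim < 0)
        victim = rand() % NFRAMES;       /* random replacement, as on slide 12 */

    /* 2. Write the victim back only if its dirty bit is set (slide 13).      */
    if (frames[victim].valid && frames[victim].dirty)
        disk_write(frames[victim].vpn, victim);

    /* 3. Fetch the faulting page from disk; another task runs meanwhile.     */
    disk_read(vpn, victim);

    /* 4. Install the new mapping; the faulting instruction then restarts.    */
    frames[victim] = (frame_t){ .vpn = vpn, .valid = true, .dirty = false };
    return victim;
}
```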

© Hill, Lipasti 16 Address Translation
The O/S and hardware communicate via PTEs
How do we find a PTE?
– &PTE = PTBR + page number * sizeof(PTE)
– The PTBR is private to each program; a context switch replaces the PTBR contents
VA | PA | Dirty | Ref | Protection
0x… | 0x2000 | Y/N | Y/N | Read/Write/Execute
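The indexing formula above maps directly onto code. This hedged C sketch walks a single-level page table held in memory; the minimal PTE layout and function name are assumptions for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12u                           /* assumed 4KB pages        */

typedef struct { uint32_t pfn; bool valid; } pte_t;   /* minimal PTE         */

/* Single-level translation: &PTE = PTBR + VPN * sizeof(PTE).
 * 'ptbr' is the base of this program's page table (replaced on a context
 * switch); returns false when the page is not resident (page fault).        */
bool translate(const pte_t *ptbr, uint32_t va, uint32_t *pa) {
    uint32_t vpn    = va >> PAGE_SHIFT;
    uint32_t offset = va & ((1u << PAGE_SHIFT) - 1u);
    const pte_t *pte = ptbr + vpn;               /* pointer math scales by sizeof(pte_t) */
    if (!pte->valid)
        return false;                            /* page fault: O/S takes over */
    *pa = (pte->pfn << PAGE_SHIFT) | offset;
    return true;
}
```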

© Hill, Lipasti 17 Address Translation
[Figure: the PTBR plus the virtual page number selects a PTE; the PTE supplies the PA and dirty bit, and the PA is combined with the page offset]

© Hill, Lipasti 18 Page Table Size
How big is the page table?
– 2^32 / 4KB * 4B = 4MB per program (!)
– Much worse for 64-bit machines
To make it smaller:
– Use limit register(s): if the VA exceeds the limit, invoke the O/S to grow the region
– Use a multilevel page table
– Make the page table pageable (use VM)

© Hill, Lipasti 19 Multilevel Page Table
[Figure: the PTBR and the upper VA bits index a first-level table, whose entries point to second-level tables indexed by the middle VA bits; the final PTE is combined with the page offset]
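A hedged C sketch of the two-level walk in the figure, assuming a 32-bit VA split 10/10/12; the field widths and type names are illustrative, not a specific architecture's format:

```c
#include <stdint.h>
#include <stdbool.h>

#define L1_SHIFT 22u            /* top 10 VA bits index the first level      */
#define L2_SHIFT 12u            /* next 10 bits index the second level       */
#define IDX_MASK 0x3FFu         /* 10-bit index                              */
#define OFF_MASK 0xFFFu         /* 12-bit page offset (4KB pages)            */

typedef struct { uint32_t pfn; bool valid; } pte_t;
typedef struct { pte_t *table; bool valid; } pde_t;   /* first-level entry   */

/* Walk a two-level table; only the second-level tables that are actually
 * used need to exist, which is what keeps the structure compact.            */
bool translate2(const pde_t *l1, uint32_t va, uint32_t *pa) {
    uint32_t i1  = (va >> L1_SHIFT) & IDX_MASK;
    uint32_t i2  = (va >> L2_SHIFT) & IDX_MASK;
    uint32_t off =  va & OFF_MASK;

    if (!l1[i1].valid) return false;             /* no second-level table    */
    const pte_t *pte = &l1[i1].table[i2];
    if (!pte->valid)   return false;             /* page fault               */

    *pa = (pte->pfn << L2_SHIFT) | off;
    return true;
}
```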

© Hill, Lipasti 20 Hashed Page Table
Use a hash table or inverted page table
– The PT contains an entry for each real (physical) page, instead of an entry for every virtual page
– An entry is found by hashing the VA
– Oversize the PT to reduce collisions: #PTE = 4 × (#physical pages)

© Hill, Lipasti 21 Hashed Page Table
[Figure: the virtual page number is hashed and, together with the PTBR, selects a group of entries (PTE0–PTE3); the matching PTE provides the translation, and the page offset is appended]
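A minimal C sketch of this lookup, using open addressing over a table sized at four entries per physical page; the hash function and names are assumptions for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

#define PHYS_PAGES 1024u
#define HPT_SIZE   (4u * PHYS_PAGES)    /* oversize by 4x to limit collisions */

typedef struct { uint32_t vpn; uint32_t pfn; bool valid; } hpte_t;

/* Toy hash of the virtual page number; real designs choose this carefully.  */
static uint32_t hash_vpn(uint32_t vpn) {
    return (vpn * 2654435761u) % HPT_SIZE;
}

/* Probe linearly from the hashed slot until the VPN is found or an empty
 * slot shows the page is not mapped (a page fault).                         */
bool hpt_lookup(const hpte_t *hpt, uint32_t vpn, uint32_t *pfn) {
    uint32_t slot = hash_vpn(vpn);
    for (uint32_t probe = 0; probe < HPT_SIZE; probe++) {
        const hpte_t *e = &hpt[(slot + probe) % HPT_SIZE];
        if (!e->valid)     return false;         /* miss                      */
        if (e->vpn == vpn) { *pfn = e->pfn; return true; }
    }
    return false;
}
```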

© Hill, Lipasti 22 High-Performance VM
VA translation
– Adds a memory reference to fetch the PTE
– Each instruction fetch/load/store now takes 2 memory references, or more with a multilevel table or hash collisions
– Even if PTEs are cached, it is still slow
Hence, use a special-purpose cache for PTEs
– Called a TLB (translation lookaside buffer)
– Caches PTE entries
– Exploits temporal and spatial locality (it's just a cache)
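The sketch below shows how a small, fully associative TLB would sit in front of the page-table walk. The entry format, size, replacement policy, and the stub walk function are all illustrative assumptions:

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 16u                /* assumed tiny, fully associative TLB */
#define PAGE_SHIFT  12u

typedef struct { uint32_t vpn; uint32_t pfn; bool valid; } tlb_entry_t;
static tlb_entry_t tlb[TLB_ENTRIES];
static uint32_t    tlb_next;           /* naive round-robin replacement       */

/* Stand-in for the page-table walk from the earlier sketches; it simply
 * identity-maps every page so this example is self-contained.                */
static bool page_table_walk(uint32_t vpn, uint32_t *pfn) { *pfn = vpn; return true; }

bool tlb_translate(uint32_t va, uint32_t *pa) {
    uint32_t vpn = va >> PAGE_SHIFT;
    uint32_t off = va & ((1u << PAGE_SHIFT) - 1u);

    /* TLB hit: translate without touching the page table in memory.          */
    for (uint32_t i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].pfn << PAGE_SHIFT) | off;
            return true;
        }

    /* TLB miss: walk the page table, then cache the PTE for next time.       */
    uint32_t pfn;
    if (!page_table_walk(vpn, &pfn))
        return false;                  /* page fault                           */
    tlb[tlb_next] = (tlb_entry_t){ .vpn = vpn, .pfn = pfn, .valid = true };
    tlb_next = (tlb_next + 1) % TLB_ENTRIES;
    *pa = (pfn << PAGE_SHIFT) | off;
    return true;
}
```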

© Hill, Lipasti 23 TLB

© Hill, Lipasti 24 Virtual Memory Protection
Each process/program has a private virtual address space
– Automatically protected from rogue programs
Sharing is possible, necessary, and desirable
– Avoids copying, staleness issues, etc.
Sharing in a controlled manner
– Grant specific permissions: read, write, execute, or any combination
– Store the permissions in the PTE and TLB

© Hill, Lipasti 25 VM Sharing
Share memory locations by:
– Mapping the shared physical location into both address spaces, e.g. PA 0xC00DA becomes:
  VA 0x2D000DA for process 0
  VA 0x4D000DA for process 1
– Either process can read/write the shared location
However, this causes the synonym problem

© Hill, Lipasti 26 VA Synonyms
Virtually-addressed caches are desirable
– No need to translate VA to PA before the cache lookup
– Faster hit time; translate only on misses
However, VA synonyms cause problems
– Can end up with two copies of the same physical line
Solutions:
– Flush caches/TLBs on a context switch
– Extend cache tags to include the PID and prevent duplicates
  Effectively a shared VA space (the PID becomes part of the address)

© Hill, Lipasti 27 Main Memory Design
Storage is built from commodity DRAM
How do we map these chips onto the logical cache organization?
– Block size
– Bus width
– Etc.

© Hill, Lipasti 28 Main Memory Design

© Hill, Lipasti 29 Main Memory Access
Each memory access:
– 1 cycle to send the address
– 5 cycles of DRAM access (really >> 10)
– 1 cycle to return data
4-word cache block, memory one word wide (a = address, d = DRAM delay, b = bus):
– a dddddb dddddb dddddb dddddb
– 1 + 4 × (5 + 1) = 25 cycles

© Hill, Lipasti 30 Main Memory Access
Four words wide:
– a dddddb
– 1 + 5 + 1 = 7 cycles
Interleaved (pipelined) banks, one word wide:
– a dddddb (banks access in parallel; the bus then returns one word per cycle)
– 1 + 5 + 4 × 1 = 10 cycles
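The three cases on slides 29–30 follow a simple pattern; the short C sketch below recomputes them under the stated assumptions (1-cycle address, 5-cycle DRAM, 1-cycle bus per word, 4-word block):

```c
#include <stdio.h>

int main(void) {
    const int addr = 1, dram = 5, bus = 1, words = 4;

    /* One word wide: each word pays the full DRAM + bus latency in series.   */
    int narrow = addr + words * (dram + bus);                 /* 25 cycles    */

    /* Four words wide: one DRAM access and one wide bus transfer.            */
    int wide = addr + dram + bus;                             /*  7 cycles    */

    /* Interleaved: banks overlap the DRAM access; bus transfers serialize.   */
    int interleaved = addr + dram + words * bus;              /* 10 cycles    */

    printf("narrow=%d wide=%d interleaved=%d\n", narrow, wide, interleaved);
    return 0;
}
```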

© Hill, Lipasti 31 Error Detection and Correction
Main memory stores a huge number of bits
– The probability of a bit flip becomes nontrivial
– Bit flips (called soft errors) are caused by slight manufacturing defects, gamma rays and alpha particles, interference, etc.
– Getting worse with smaller feature sizes
Reliable systems must be protected from soft errors via ECC (error-correcting codes)
– Even PCs support ECC these days

© Hill, Lipasti 32 Error Correcting Codes
Probabilities: P(no errors in a word) > P(single error) > P(two errors) >> P(more than two errors)
Detection – signal a problem
Correction – restore the data to its correct value
Most common:
– Parity: single-error detection
– SECDED: single-error correction, double-error detection
Supplemental reading on the course web page!

© Hill, Lipasti 33 1-bit ECC
Power | Codewords | # bits | Comments
Nothing | 0, 1 | 1 |
SED | 00, 11 | 2 | 01, 10 detect errors
SEC | 000, 111 | 3 | 001, 010, 100 => 0; 110, 101, 011 => 1
SECDED | 0000, 1111 | 4 | one 1 => 0; two 1's => error; three 1's => 1

© Hill, Lipasti 34 ECC
Hamming distance
– The number of bit changes needed to convert one codeword into another
– All legal SECDED codewords are at a Hamming distance of 4, i.e. in 1-bit SECDED all 4 bits flip to go from the representation of '0' to the representation of '1'
# of 1's | 0 | 1 | 2 | 3 | 4
Result | 0 | 0 | Err | 1 | 1

© Hill, Lipasti 35 ECC
Reduce the overhead by computing the code over a whole word, not each bit
# bits | SED overhead | SECDED overhead
1 | 1 (100%) | 3 (300%)
32 | 1 (3%) | 7 (22%)
64 | 1 (1.6%) | 8 (13%)
n | 1 (1/n) | 1 + log2 n + a little
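The "1 + log2 n + a little" entry follows from the Hamming condition that k check bits must distinguish n + k + 1 outcomes (no error, or an error in any data or check bit), plus one extra parity bit for double-error detection. This small C sketch, an illustration rather than course material, recomputes the SECDED column:

```c
#include <stdio.h>

/* Smallest k with 2^k >= n + k + 1 (Hamming check bits), plus 1 parity bit. */
static int secded_bits(int n) {
    int k = 1;
    while ((1 << k) < n + k + 1) k++;
    return k + 1;
}

int main(void) {
    int sizes[] = {1, 32, 64, 128};
    for (int i = 0; i < 4; i++)
        printf("%3d data bits -> %d SECDED check bits\n", sizes[i], secded_bits(sizes[i]));
    return 0;   /* prints 3, 7, 8, 9 -- matching the table for 1, 32, and 64 bits */
}
```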

© Hill, Lipasti 36 64-bit ECC
64 bits of data with 8 check bits: dddd…..d cccccccc
Use eight ×9 SIMMs = 72 bits
Intuition
– One check bit is overall parity
– The other check bits point to: an error in the data, or an error in the check bits, or no error

© Hill, Lipasti 37 ECC
To store (write)
– Use data0 to compute check0
– Store data0 and check0
To load
– Read data1 and check1
– Use data1 to compute check2
– Syndrome = check1 xor check2 (i.e. verify that the check bits are equal)

© Hill, Lipasti 38 ECC Syndrome
Syndrome | Parity | Implications
0 | OK | data1 == data0
n != 0 | Not OK | Flip bit n of data1 to get data0
n != 0 | OK | Signal an uncorrectable error

© Hill, Lipasti 39 4-bit SECDED Code
The Cn parity bits are chosen specifically to:
– Identify errors in bits whose position index has bit n equal to 1
– C1 checks all odd bit positions (where the index LSB = 1)
– C2 checks all positions where the middle index bit = 1
– C3 checks all positions where the index MSB = 1
Hence, a nonzero syndrome points to the faulty bit
Bit position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Codeword | C1 | C2 | b1 | C3 | b2 | b3 | b4 | P
C1 | X |   | X |   | X |   | X |
C2 |   | X | X |   |   | X | X |
C3 |   |   |   | X | X | X | X |
P | X | X | X | X | X | X | X | X

© Hill, Lipasti 40 4-bit SECDED Example
4 data bits, 3 check bits, 1 parity bit
The syndrome is the xor of the stored and recomputed check bits C1–C3
– If (syndrome == 0) and (parity OK) => no error
– If (syndrome != 0) and (parity not OK) => flip the bit position pointed to by the syndrome
– If (syndrome != 0) and (parity OK) => double-bit error
[Table in the original slide: codeword positions C1 C2 b1 C3 b2 b3 b4 P, with example rows for the original data, no corruption (syndrome 0, P ok), 1 bit corrupted (nonzero syndrome, P not ok), and 2 bits corrupted (nonzero syndrome, P ok); the example bit values did not survive extraction]
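Tying slides 37–40 together, here is a hedged, self-contained C sketch of an (8,4) SECDED code using the bit layout above (positions 1–8: C1 C2 b1 C3 b2 b3 b4 P). It is an illustrative implementation, not the course's reference code:

```c
#include <stdio.h>

/* Encode 4 data bits (d[0..3] = b1..b4) into codeword positions cw[1..8].    */
static void secded_encode(const int d[4], int cw[9]) {
    cw[3] = d[0]; cw[5] = d[1]; cw[6] = d[2]; cw[7] = d[3];
    cw[1] = cw[3] ^ cw[5] ^ cw[7];           /* C1: positions with index LSB = 1  */
    cw[2] = cw[3] ^ cw[6] ^ cw[7];           /* C2: positions with middle bit = 1 */
    cw[4] = cw[5] ^ cw[6] ^ cw[7];           /* C3: positions with index MSB = 1  */
    cw[8] = 0;
    for (int i = 1; i <= 7; i++) cw[8] ^= cw[i];   /* P: overall parity           */
}

/* Decode: returns 0 = no error, 1 = single error corrected, 2 = double error. */
static int secded_decode(int cw[9], int d[4]) {
    int s1 = cw[1] ^ cw[3] ^ cw[5] ^ cw[7];
    int s2 = cw[2] ^ cw[3] ^ cw[6] ^ cw[7];
    int s3 = cw[4] ^ cw[5] ^ cw[6] ^ cw[7];
    int syndrome = s1 | (s2 << 1) | (s3 << 2);     /* points at the faulty position */
    int parity = 0;
    for (int i = 1; i <= 8; i++) parity ^= cw[i];  /* 0 if overall parity still holds */

    int status = 0;
    if (syndrome != 0 && parity != 0) { cw[syndrome] ^= 1; status = 1; }  /* correct  */
    else if (syndrome != 0 && parity == 0) status = 2;       /* uncorrectable double  */
    else if (syndrome == 0 && parity != 0) { cw[8] ^= 1; status = 1; }    /* P flipped */

    d[0] = cw[3]; d[1] = cw[5]; d[2] = cw[6]; d[3] = cw[7];
    return status;
}

int main(void) {
    int d[4] = {1, 0, 1, 1}, cw[9], out[4];
    secded_encode(d, cw);
    cw[6] ^= 1;                                    /* inject a single-bit error */
    printf("status=%d data=%d%d%d%d\n", secded_decode(cw, out),
           out[0], out[1], out[2], out[3]);
    return 0;
}
```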

© Hill, Lipasti 41 Summary
Memory hierarchy: register file
– Under compiler/programmer control
– Complex register allocation algorithms to optimize utilization
Memory hierarchy: virtual memory
– Placement: fully flexible
– Identification: through the page table
– Replacement: approximate LRU or LFU
– Write policy: write-back (disks are too slow to write through)

© Hill, Lipasti 42 Summary
Page tables
– Forward page table: &PTE = PTBR + VPN * sizeof(PTE)
– Multilevel page table: the tree structure enables more compact storage for a sparsely populated address space
– Inverted or hashed page table: stores a PTE for each real (physical) page instead of each virtual page, so the HPT size scales with physical memory
– Also used for protection and sharing at page granularity

© Hill, Lipasti 43 Summary
TLB
– Special-purpose cache for PTEs
– Often accessed in parallel with the L1 cache
Main memory design
– Commodity DRAM chips
– Wide design space for minimizing cost and latency while maximizing bandwidth and storage
– Susceptible to soft errors
  Protect with ECC (SECDED)
  ECC is also widely used in on-chip memories and busses