Computer Architecture, Memory Hierarchy & Virtual Memory

Computer Architecture, Memory Hierarchy & Virtual Memory
Some diagrams are from Computer Organization and Architecture, 5th edition, by William Stallings.

“Memory Hierarchy Pyramid”

Level                   Access time                       Size              Cost       Transfer unit
CPU Registers           3-10 accesses/cycle               32-64 words       -          words
On-Chip Cache           1-2 accesses/cycle (5-10 ns)      1 KB - 2 MB       -          lines
Off-Chip Cache (SRAM)   5-20 cycles/access (10-40 ns)     1 MB - 16 MB      -          blocks
Main Memory (DRAM)      20-200 cycles/access (60-120 ns)  64 MB - many GB   $0.137/MB  pages
Disk or Network         1M-2M cycles/access               4 GB - many TB    $1.10/GB   -

Movement of Memory Technology

Machine        CPI    Clock (ns)   Main Memory (ns)   Miss Penalty (cycles)   Miss Penalty (cycles/instr.)
VAX 11/780     10     200          1200               6                       0.6
Alpha 21064    0.5    5            70                 14                      28
Alpha 21164    0.25   2            60                 30                      120
Pentium IV     ??     0.5          ~5                 ??                      ??

CPI: cycles per instruction. As clocks get faster and CPIs drop, each miss costs more cycles and more of an instruction's time.

Cache and Main Memory
Problem: main memory is slow compared to the CPU.
Solution: store the most commonly used data in a smaller, faster memory. This is a good trade-off between cost ($$) and performance.

Generalized Caches: “Cache/Main-Memory Structure”
At any time, some subset of main memory resides in the cache. If a word in a block of memory is read, that block is transferred to one of the lines of the cache.

Generalized Caches: “Cache Read Operation”
The CPU generates the address, RA, of the word it wants to read. If the word is in the cache, it is sent to the CPU. Otherwise, the block containing the word is loaded into the cache, and the word is then sent to the processor.
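To make the flow concrete, here is a minimal C sketch of that read operation (a toy one-line cache with sizes and names of my own invention; real hardware does this with parallel logic, not sequential code):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LINE_BYTES 16          /* assumed block size */
#define MEM_BYTES  256         /* toy main memory */

static uint8_t main_mem[MEM_BYTES];

/* One cache line: a valid bit, a tag identifying the cached block,
 * and a copy of that block's data. */
struct line { int valid; uint32_t tag; uint8_t data[LINE_BYTES]; };
static struct line the_line;   /* a one-line cache, just to show the flow */

static uint8_t read_byte(uint32_t ra) {
    uint32_t block  = ra / LINE_BYTES;   /* which block RA falls in   */
    uint32_t offset = ra % LINE_BYTES;   /* where the byte sits in it */
    if (!(the_line.valid && the_line.tag == block)) {
        /* Miss: transfer the whole containing block into the cache. */
        memcpy(the_line.data, &main_mem[block * LINE_BYTES], LINE_BYTES);
        the_line.tag   = block;
        the_line.valid = 1;
    }
    return the_line.data[offset];        /* hit path: send the word to the CPU */
}

int main(void) {
    main_mem[42] = 7;
    printf("%d\n", read_byte(42));   /* miss: block loaded, returns 7   */
    printf("%d\n", read_byte(43));   /* hit: same block is already cached */
    return 0;
}
```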

Elements of Cache Design
- Cache size
- Mapping function: direct, associative, set associative
- Replacement algorithm: least recently used (LRU), first in first out (FIFO), least frequently used (LFU), random
- Write policy: write through, write back
- Line size
- Number of caches: single or two level; unified or split

Cache Size
“Bigger is better” is the motto. The problem is that you can only fit so much onto the chip without making it too expensive to manufacture or sell for your intended market sector.

Mapping Functions
Since the cache is not as big as main memory, how do we determine where data is written to or read from in the cache? Mapping functions define how memory addresses are mapped onto cache locations.

Mapping Function: “Direct Mapping”
Each block of main memory maps into only one possible cache line.
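A minimal sketch of the direct-mapped address split, using a toy geometry I chose for illustration (64 lines of 32 bytes; these numbers are not from the slides):

```c
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 32   /* assumed line size */
#define NUM_LINES  64   /* assumed number of lines (a 2 KB cache) */

/* Direct mapping: the line index is fully determined by the address,
 * so each memory block has exactly one cache line it can occupy. */
int main(void) {
    uint32_t addr   = 0x12345678;
    uint32_t offset = addr % LINE_BYTES;               /* byte within the line  */
    uint32_t index  = (addr / LINE_BYTES) % NUM_LINES; /* the one possible line */
    uint32_t tag    = addr / (LINE_BYTES * NUM_LINES); /* distinguishes blocks
                                                          that share that line  */
    printf("offset=%u index=%u tag=0x%x\n", offset, index, tag);
    return 0;
}
```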

Mapping Function: “Fully Associative”
More flexible than direct mapping because it permits each main memory block to be loaded into any line of the cache. It makes the hardware much more complex, though, since every line's tag must be checked.

Mapping Function: “Two-Way Set Associative”
A compromise that has the advantages of both direct and associative mapping while reducing their disadvantages.
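A sketch of a two-way set-associative lookup, again with a toy geometry of my own (4 sets of 2 lines). The address picks one set, as in direct mapping, but the block may sit in either of that set's lines, as in associative mapping:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAYS     2      /* two lines per set                */
#define NUM_SETS 4      /* toy geometry, not from the slide */

struct line { bool valid; uint32_t tag; };
static struct line cache[NUM_SETS][WAYS];

/* Check both lines (ways) of the selected set for a matching tag. */
static bool lookup(uint32_t block_addr) {
    uint32_t set = block_addr % NUM_SETS;
    uint32_t tag = block_addr / NUM_SETS;
    for (int way = 0; way < WAYS; way++)
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;   /* hit in this way */
    return false;          /* miss: a replacement policy picks a victim */
}

int main(void) {
    cache[1][0] = (struct line){true, 0};   /* block 1 resides in set 1, way 0 */
    cache[1][1] = (struct line){true, 2};   /* block 9 also maps to set 1      */
    printf("%d %d %d\n", lookup(1), lookup(9), lookup(5));  /* prints 1 1 0 */
    return 0;
}
```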

Replacement Algorithms
Since the cache is not as big as main memory, you have to replace things in it. Think of the cache as your bedside table and main memory as the library. If you want more books from the library, you need to replace some books on your shelf.

Replacement Algorithms
- Least Recently Used (LRU) – probably the most effective. Replace the line that has been in the cache longest with no reference to it.
- First-In-First-Out (FIFO) – replace the line that has been in the cache longest. Easy to implement.
- Least Frequently Used (LFU) – replace the line that has had the fewest references. Requires a counter for each cache line.
- Random – just randomly replace a line in the cache. Studies show this gives only slightly worse performance than the policies above.
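A minimal sketch of LRU victim selection, assuming a 4-way set and a global reference counter (real hardware tracks recency with a few status bits per set, not a 64-bit counter per line):

```c
#include <stdint.h>
#include <stdio.h>

#define WAYS 4

/* Each line remembers when it was last referenced; the victim is the
 * line whose last reference is oldest. */
static uint64_t last_used[WAYS];
static uint64_t now = 0;

static void touch(int way) { last_used[way] = ++now; }

static int lru_victim(void) {
    int victim = 0;
    for (int way = 1; way < WAYS; way++)
        if (last_used[way] < last_used[victim])
            victim = way;
    return victim;
}

int main(void) {
    touch(0); touch(1); touch(2); touch(3);
    touch(0);                                 /* way 0 is now most recently used */
    printf("evict way %d\n", lru_victim());   /* way 1: oldest reference         */
    return 0;
}
```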

Write Policy
Before a block residing in the cache can be replaced, you need to determine whether it has been altered in the cache but not in main memory. If so, you must write the cache line back to main memory before replacing it.

Write Policy
- Write Through – the simplest technique. All write operations are made to main memory as well as to the cache, ensuring that memory is always up to date. Con: generates a lot of memory traffic.
- Write Back – minimizes memory writes. Updates are made only to the cache; a block is written back to main memory only when it is replaced. Con: I/O modules must go through the cache or risk reading stale memory.
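The two policies side by side, as a hedged C sketch over a single pretend cache line (the names and sizes are mine, not from the slides):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t main_mem[16];

struct line { bool dirty; uint32_t addr, data; };
static struct line line0;    /* pretend this address is already cached */

/* Write through: update the cache AND main memory on every store. */
static void write_through(uint32_t value) {
    line0.data = value;
    main_mem[line0.addr] = value;   /* extra memory traffic, but never stale */
}

/* Write back: update only the cache and mark the line dirty; memory is
 * updated later, when the line is evicted. */
static void write_back(uint32_t value) {
    line0.data  = value;
    line0.dirty = true;
}
static void evict(void) {
    if (line0.dirty) main_mem[line0.addr] = line0.data;  /* flush on replacement */
    line0.dirty = false;
}

int main(void) {
    line0.addr = 3;
    write_through(11);
    printf("after write-through: mem=%u\n", main_mem[3]);  /* 11       */
    write_back(22);
    printf("before evict: mem=%u\n", main_mem[3]);         /* still 11 */
    evict();
    printf("after evict: mem=%u\n", main_mem[3]);          /* 22       */
    return 0;
}
```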

Line Size / Number of Caches
Cache size is the number of lines times the size of a line. A line contains many words, so the longer the line, the more time it takes to locate a word within the line.
Number of caches: either a data cache and a separate instruction cache, or just one unified cache.

Cache Examples
- Intel Pentium II
- IBM/Motorola PowerPC G3
- DEC/Compaq/HP Alpha 21064

Example Cache Organizations: “Pentium II Block Diagram”
Cache structure
- Two L1 caches, one for data and one for instructions. The instruction cache is four-way set associative; the data cache is two-way set associative. Sizes range from 8KB to 16KB.
- The L2 cache is four-way set associative and ranges in size from 256KB to 1MB.
Processor core
- Fetch/decode unit: fetches program instructions in order from the L1 instruction cache, decodes them into micro-operations, and stores the results in the instruction pool.
- Instruction pool: the current set of instructions to execute.
- Dispatch/execute unit: schedules execution of micro-operations subject to data dependencies and resource availability.
- Retire unit: determines when to write values back to registers or to the L1 cache; removes instructions from the pool after committing the results.

Example Cache Organizations: “PowerPC G3 Block Diagram”
Cache structure
- The L1 caches are eight-way set associative.
- The L2 cache is two-way set associative, with 256KB, 512KB, or 1MB of memory.
Processor core
- Two integer arithmetic and logic units, which may execute in parallel.
- A floating-point unit with its own registers.
- The data cache feeds both the integer and floating-point operations via a load/store unit.

Cache Example: “Alpha 21064”
- 8 KB cache with 34-bit addressing.
- 256-bit lines (32 bytes).
- Block placement: direct mapped – one possible place for each address; multiple addresses for each possible place.
- Address layout (bit 33 down to bit 0): Tag | Cache Index | Offset.
- Each cache line includes a tag and the data.

Cache Example: 21064 (block diagram)

How the cache works for the 21064
Cache operation:
1. Send the address to the cache.
2. Parse the address into offset, index, and tag.
3. Decode the index to select a line of the cache, and prepare the cache for reading (precharge).
4. Read the line of the cache: valid bit, tag, data.
5. Compare the tag with the tag field of the address – miss if no match.
6. Select the word according to the byte offset and read or write it.
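A sketch of step 2 for the 21064 geometry given above: 8 KB direct-mapped with 32-byte lines and 34-bit addresses gives 256 lines, so 5 offset bits, 8 index bits, and 34 - 8 - 5 = 21 tag bits (the sample address is arbitrary):

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 5   /* 32-byte lines          */
#define INDEX_BITS  8   /* 8 KB / 32 B = 256 lines */

int main(void) {
    uint64_t addr   = 0x2ABCD1234ULL;   /* an arbitrary 34-bit address */
    uint64_t offset = addr & ((1u << OFFSET_BITS) - 1);
    uint64_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint64_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    /* The index selects the one line to read; the stored tag is then
     * compared against this tag, and a mismatch (or a clear valid bit)
     * is a miss. */
    printf("tag=0x%llx index=%llu offset=%llu\n",
           (unsigned long long)tag, (unsigned long long)index,
           (unsigned long long)offset);
    return 0;
}
```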

How the cache works for the 21064 (continued)
If there is a miss:
- Stall the processor while reading the line in from the next level of the memory hierarchy,
- which in turn may miss and read from main memory,
- which in turn may miss and read from disk.

Virtual Memory
Cache is relatively expensive, main memory is much cheaper, and disk drives are cheaper still. Virtual memory is the use of disk space as if it were more RAM.

Virtual Memory
- Block (page) size: 1KB-16KB
- Hit: 20-200 cycles (a DRAM access)
- Miss: 700,000-6,000,000 cycles (a page fault)
- Miss rate: 1 in 0.1-10 million accesses
Differences from cache:
- The miss strategy is implemented in software.
- The hit/miss time factor is 10,000+ (vs. 10-20 for cache).
- The critical concerns are fast address translation and a miss ratio as low as possible without ideal knowledge of future accesses.

Virtual Memory Characteristics
Fetch strategy:
- Swap pages on a task switch.
- May pre-fetch the next page if extra transfer time is the only issue; may include a disk cache.
Block placement:
- Anywhere – fully associative – random access is easily available, and the time to place a block well is tiny compared to the miss penalty.

Virtual Memory Characteristics
Finding a block – look in the page table:
- A list of VPNs (Virtual Page Numbers) and their physical addresses (or disk locations).
- Consider a 32-bit VA, a 30-bit PA, and 2 KB pages. The page table has 2^32 / 2^11 = 2^21 entries, for perhaps 2^25 bytes or 2^14 pages.
- The page table must itself be in virtual memory; the system page table must always be in memory.
Translation look-aside buffer (TLB):
- A cache of address translations.
- Hit in 1 cycle (no stalls in the pipeline).
- A miss results in a page table access (which could lead to a page fault): perhaps 10-100 OS instructions.
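A small C sketch of that entry-count arithmetic and of a linear page-table lookup. Only the 32-bit VA and 2 KB pages come from the slide; the mapping VPN 3 -> PPN 0x55 and the tiny 8-entry table are invented for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define VA_BITS   32
#define PAGE_BITS 11                                /* 2 KB pages   */
#define NUM_VPNS  (1u << (VA_BITS - PAGE_BITS))     /* 2^21 entries */

int main(void) {
    printf("page table entries: %u (= 2^%d)\n", NUM_VPNS, VA_BITS - PAGE_BITS);

    /* A linear page table is just an array indexed by VPN; each entry
     * holds the PPN (or a disk location if the page is not resident).
     * We model only the first few VPNs to keep the sketch small. */
    uint32_t page_table[8] = { [3] = 0x55 };        /* VPN 3 -> PPN 0x55 */

    uint32_t va  = (3u << PAGE_BITS) | 0x123;       /* VPN 3, offset 0x123 */
    uint32_t vpn = va >> PAGE_BITS;
    uint32_t off = va & ((1u << PAGE_BITS) - 1);
    uint32_t pa  = (page_table[vpn] << PAGE_BITS) | off;
    printf("va=0x%x -> pa=0x%x\n", va, pa);
    return 0;
}
```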

Virtual Memory Characteristics
Page replacement:
- LRU is used most often (really an approximation of LRU with a fixed time window).
- The TLB supports determining which translations have been used.
Write policy:
- Write through or write back?
- Write Through – data is written both to the block in the cache and to the block in lower-level memory.
- Write Back – data is written only to the block in the cache, and written to the lower level only when the block is replaced.

Virtual Memory Characteristics
Memory protection:
- Must index page table entries by PID.
- Flush the TLB on a task switch.
- Verify access to a page before loading it into the TLB.
- Provide the OS access to all memory, physical and virtual.
- Provide some untranslated addresses to the OS for I/O buffers.

Address Translation
Since address translations happen all the time, we cache them for faster access. We call this cache a translation look-aside buffer (TLB).
TLB properties (typically):
- 8-32 entries
- Set-associative or fully associative
- Random or LRU replacement
- Two or more ports (instruction and data)
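A fully associative TLB lookup as a hedged C sketch. The entry count sits inside the slide's typical 8-32 range; the VPN/PPN values are invented:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 8

struct tlb_entry { bool valid; uint32_t vpn, ppn; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Fully associative lookup: compare the VPN against every entry.
 * On a miss, the translation comes from the page table (possibly
 * causing a page fault) and is then installed in the TLB. */
static bool tlb_lookup(uint32_t vpn, uint32_t *ppn) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn) { *ppn = tlb[i].ppn; return true; }
    return false;
}

int main(void) {
    tlb[2] = (struct tlb_entry){true, 7, 0x42};
    uint32_t ppn;
    if (tlb_lookup(7, &ppn))
        printf("TLB hit: vpn 7 -> ppn 0x%x\n", ppn);   /* the 1-cycle path   */
    if (!tlb_lookup(8, &ppn))
        printf("TLB miss: walk the page table\n");     /* 10-100 OS instrs.  */
    return 0;
}
```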

Summary
- What is the deal with memory hierarchies? Why bother?
- Why are the caches so small? Why not make them larger?
- Do I have to worry about any of this when I am writing code?