Slide 1: ECE 313 - Computer Organization: Memory Hierarchy 2
Prof. John Nestor, ECE Department, Lafayette College, Easton, Pennsylvania 18042, nestorj@lafayette.edu
Feb 2005
Reading: 7.4-7.8, 7.9*
Homework: Look over 7.1-7.4, 7.9, 7.12 for discussion on Friday
Portions of these slides are derived from:
- Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
- Tod Amon's COD2e slides © 1998 Morgan Kaufmann Publishers, all rights reserved
- Dave Patterson's CS 152 slides, Fall 1997 © UCB
- Rob Rutenbar's 18-347 slides, Fall 1999, CMU
- Other sources as noted
Slide 2: Outline - Memory Systems
- Overview
  - Motivation
  - General Structure and Terminology
- Memory Technology
  - Static RAM
  - Dynamic RAM
  - Disks
- Cache Memory
- Virtual Memory
Slide 3: Four Key Cache Questions
1. Where can a block be placed in the cache? (block placement)
2. How can a block be found in the cache? Using a tag. (block identification)
3. Which block should be replaced on a miss? (block replacement)
4. What happens on a write? (write strategy)
Slide 4: Q1: Block Placement
Where can a block be placed in the cache?
- In one predetermined place: direct-mapped
  - Use a fragment of the address to calculate the block's location in the cache
  - Compare the cached block's tag with the address tag to test whether the block is present
- Anywhere in the cache: fully associative
  - Compare the tag against every block in the cache
- In a limited set of places: set-associative
  - Use an address fragment to calculate the set (like direct-mapped)
  - Place the block anywhere within that set
  - Compare the tag against every block in the set
  - A hybrid of direct-mapped and fully associative
Slide 5: Direct-Mapped Block Placement
[Figure: memory blocks at addresses 0x00-0x4C mapping into a 4-block cache; all blocks ending in *0, *4, *8, *C share one cache location each]
Address maps to block: location = (block address) MOD (# blocks in cache)
A sketch of this arithmetic follows below.
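A minimal C sketch of the placement arithmetic, assuming the figure's (hypothetical) geometry of 4-byte blocks and a 4-block cache:

    #include <stdio.h>

    #define BLOCK_SIZE 4   /* bytes per block (assumed) */
    #define NUM_BLOCKS 4   /* blocks in the cache (assumed) */

    /* location = (block address) MOD (# blocks in cache) */
    unsigned direct_mapped_index(unsigned addr) {
        unsigned block_addr = addr / BLOCK_SIZE; /* strip the byte offset */
        return block_addr % NUM_BLOCKS;
    }

    int main(void) {
        /* 0x04, 0x14, 0x24, 0x34, 0x44 all collide on cache block 1 */
        for (unsigned a = 0x04; a <= 0x44; a += 0x10)
            printf("addr 0x%02x -> cache block %u\n", a, direct_mapped_index(a));
        return 0;
    }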
Slide 6: Fully Associative Block Placement
[Figure: memory blocks at addresses 0x00-0x4C may be placed in any cache block]
Arbitrary block mapping: location = any
Slide 7: Set-Associative Block Placement
[Figure: memory blocks at addresses 0x00-0x4C mapping into a set-associative cache with Sets 0-3, two blocks per set]
Address maps to set: location = (block address) MOD (# sets in cache), at an arbitrary location within the set
A sketch of the set calculation follows below.
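The same arithmetic keyed to sets rather than blocks; the sizes are again hypothetical, matching the figure's four sets:

    #include <stdio.h>

    #define BLOCK_SIZE 4   /* bytes per block (assumed) */
    #define NUM_SETS   4   /* sets in the cache (assumed) */

    /* location = (block address) MOD (# sets in cache);
       any of the ways within the chosen set may hold the block */
    unsigned set_index(unsigned addr) {
        return (addr / BLOCK_SIZE) % NUM_SETS;
    }

    int main(void) {
        printf("addr 0x24 -> set %u\n", set_index(0x24)); /* block 9 -> set 1 */
        return 0;
    }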
Slide 8: Q2: Block Identification
- Every cache block has an address tag that identifies its location in memory
- Hit when the tag matches the address of the desired word (comparison done by hardware)
- Q: What happens when a cache block is empty?
- A: Mark that condition with a valid bit
Example entry: Valid = 1, Tag = 0x00001C0, Data = 0xff083c2d
A sketch of one entry and its hit test follows below.
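A C sketch of a single cache entry and its hit test, using the slide's example values (field widths assumed):

    #include <stdbool.h>
    #include <stdint.h>

    struct cache_line {
        bool     valid; /* does this entry hold real data? */
        uint32_t tag;   /* identifies which memory block is cached here */
        uint32_t data;  /* one word of cached data */
    };

    /* Hit when the entry is valid AND its tag matches the address tag */
    bool is_hit(const struct cache_line *line, uint32_t addr_tag) {
        return line->valid && line->tag == addr_tag;
    }

    /* Example from the slide: {1, 0x00001C0, 0xff083c2d} hits on tag 0x00001C0 */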
Slide 9: Direct-Mapped Cache Design
[Figure: a cache SRAM whose entries hold valid, tag, and data fields; the incoming address is split into tag, cache index, and byte offset; the index selects one entry, and HIT is asserted when that entry is valid and its stored tag equals the address tag]
A sketch of this lookup path follows below.
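A C sketch of the full direct-mapped lookup, with assumed field widths (4-byte blocks and a 1024-entry cache, giving 2 offset bits, 10 index bits, and the remaining upper bits as the tag):

    #include <stdbool.h>
    #include <stdint.h>

    #define INDEX_BITS 10
    #define NUM_LINES  (1u << INDEX_BITS)

    struct line { bool valid; uint32_t tag; uint32_t data; };
    static struct line cache[NUM_LINES];

    /* Returns true on a hit and fills *word with the cached data. */
    bool cache_read(uint32_t addr, uint32_t *word) {
        uint32_t index = (addr >> 2) & (NUM_LINES - 1); /* drop 2-bit byte offset */
        uint32_t tag   = addr >> (2 + INDEX_BITS);      /* remaining upper bits */
        if (cache[index].valid && cache[index].tag == tag) {
            *word = cache[index].data; /* HIT */
            return true;
        }
        return false; /* MISS: fetch the block from memory */
    }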
Slide 10: Set-Associative Cache Design
- Key idea: divide the cache into sets; allow a block to be placed anywhere within a set
- Advantage: better hit rate
- Disadvantages: more tag bits, more hardware, higher access time
[Figure 7.17: a four-way set-associative cache]
A sketch of the n-way lookup follows below.
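A C sketch of the n-way lookup: the index selects a set, and the hardware compares all ways in parallel (modeled here as a loop); sizes are assumed:

    #include <stdbool.h>
    #include <stdint.h>

    #define SET_BITS 8
    #define NUM_SETS (1u << SET_BITS)
    #define WAYS     4

    struct line { bool valid; uint32_t tag; uint32_t data; };
    static struct line cache[NUM_SETS][WAYS];

    bool set_assoc_read(uint32_t addr, uint32_t *word) {
        uint32_t set = (addr >> 2) & (NUM_SETS - 1);
        uint32_t tag = addr >> (2 + SET_BITS);
        for (int way = 0; way < WAYS; way++) { /* hardware: WAYS comparators at once */
            if (cache[set][way].valid && cache[set][way].tag == tag) {
                *word = cache[set][way].data;
                return true;
            }
        }
        return false;
    }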
Slide 11: Fully Associative Cache Design
- Key idea: the entire cache forms a single set, so a block may be placed in any location
- One comparator is required for each block
- No address decoding
- Practical only for small caches due to the hardware demands
[Figure: each entry's stored tag is compared in parallel against the incoming tag; the matching entry drives its data out]
Slide 12: Q3: Block Replacement
On a miss, data must be read from memory. So, where do we put the new data?
- Direct-mapped cache: must place it in one fixed location
- Set-associative, fully associative: can pick any block within the set
  - Random: replace an arbitrary block
  - Least recently used (LRU): replace the "least popular" block (usually the best hit rate)
    - Easy for 2-way set-associative: one bit per set (see the sketch below)
    - Harder for n-way set-associative: often approximated with "pseudo-LRU"
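A minimal sketch of the 2-way LRU bookkeeping, one bit per set as the slide describes (the data structure and cache geometry are assumed):

    #define NUM_SETS 256               /* assumed cache geometry */
    static unsigned lru_way[NUM_SETS]; /* per set: which way was used least recently */

    /* Call on every access that uses `way` (0 or 1) in `set`. */
    void touch(unsigned set, unsigned way) {
        lru_way[set] = way ^ 1u; /* the other way is now the LRU one */
    }

    /* On a miss, evict the least recently used way. */
    unsigned victim(unsigned set) {
        return lru_way[set];
    }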
Slide 13: Q4: Write Strategy
What happens on a write?
- Write through: write to memory, stalling the processor until the write completes
  - Write buffer: place the write in a buffer, allowing the pipeline to continue
- Write back: delay the write to memory until the block is replaced in the cache
- Special considerations apply when using DMA or multiprocessors (coherence between caches)
A sketch contrasting the two policies follows below.
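A C sketch contrasting the two policies on a write hit; write_memory() is a hypothetical stand-in for the memory interface:

    #include <stdbool.h>
    #include <stdint.h>

    struct line { bool valid, dirty; uint32_t tag, data; };

    void write_memory(uint32_t addr, uint32_t word); /* hypothetical stand-in */

    /* Write through: memory is updated on every store;
       a write buffer lets the pipeline continue instead of stalling. */
    void store_write_through(struct line *l, uint32_t addr, uint32_t word) {
        l->data = word;
        write_memory(addr, word);
    }

    /* Write back: only the cache is updated now; the dirty block is
       written to memory when it is eventually replaced. */
    void store_write_back(struct line *l, uint32_t word) {
        l->data  = word;
        l->dirty = true; /* flush to memory on eviction */
    }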
Slide 14: Example: DECStation 3100 Cache
- MIPS R2000 workstation, pipelined implementation
- Separate instruction and data caches to allow concurrent access
- Direct-mapped
- Cache size: 64 KB (16K words)
- 4-word write buffer
(Old Fig. 7.8)
Slide 15: Miss Rates - DECStation 3100 Cache

Program | Instr. Miss Rate | Data Miss Rate | Combined Miss Rate
--------|------------------|----------------|-------------------
gcc     | 6.1%             | 2.1%           | 5.4%
spice   | 1.2%             | 1.3%           | 1.2%

(Old Fig. 7.10)
Slide 16: Miss Rates vs. Block Size - DECStation 3100

Program | Block Size (words) | Instr. Miss Rate | Data Miss Rate | Combined Miss Rate
--------|--------------------|------------------|----------------|-------------------
gcc     | 1                  | 6.1%             | 2.1%           | 5.4%
gcc     | 4                  | 2.0%             | 1.7%           | 1.9%
spice   | 1                  | 1.2%             | 1.3%           | 1.2%
spice   | 4                  | 0.3%             | 0.6%           | 0.4%

(Fig. 7.11)
Slide 17: Variation: Larger Block Size
- Key advantage: takes advantage of spatial locality
- Disadvantages: blocks compete for cache space, and writes become more complicated
(Fig. 7.10)
Slide 18: Example - Intrinsity FastMATH Cache
- Separate 16 KB instruction and data caches
- 16-word blocks (see previous page)
- Results on the SPECINT2000 benchmark:

Instr. Miss Rate | Data Miss Rate | Combined Miss Rate
-----------------|----------------|-------------------
0.4%             | 11.4%          | 3.2%

(Fig. 7.10)
Slide 19: Miss Rates vs. Block Size
[Figure 7.12: miss rate as a function of block size]
Note that the miss rate increases for larger block sizes.
Slide 20: Example: Caches in the Pentium 4
- L1 Trace cache: holds decoded instructions
- L1 Data cache: 64-byte block size, write through
- L2 cache: 128-byte block size, write back
Source: "The Microarchitecture of the Pentium® 4 Processor", Intel Technology Journal, First Quarter 2001, http://developer.intel.com/technology/itj/q12001/articles/art_2.htm
Slide 21: Summary: Cache Memory
- Speeds up access by storing recently used data
- Cache structure has a strong impact on performance
- Modern microprocessors use on-chip caches (sometimes multilevel caches)
Slide 22: Outline - Memory Systems
- Overview
  - Motivation
  - General Structure and Terminology
- Memory Technology
  - Static RAM
  - Dynamic RAM
- Cache Memory
- Virtual Memory
Slide 23: Virtual Memory
- Key idea: simulate a larger physical memory than is actually available
- General approach:
  - Break the address space up into pages
  - Each program accesses a working set of pages
  - Store pages in physical memory as space permits, and on disk when no physical memory is left
  - Access pages using a virtual address
[Figure: a virtual address split into a page number and an offset, selecting among Pages 0, 1, 2, ...]
Slide 24: Virtual Memory
Why do this?
- So a program can run as if it had a larger memory
- So multiple programs can run in the same memory with protected address spaces
[Figure: address translation maps virtual addresses to physical addresses or disk addresses]
Slide 25: Virtual Memory
Mapping from virtual to physical address (Fig. 7.20):
- Virtual address (32 bits): virtual page number in bits 31-12, page offset in bits 11-0
- Physical address (30 bits): physical page number in bits 29-12, page offset in bits 11-0
- Translation replaces the virtual page number with a physical page number; the page offset passes through unchanged
A sketch of this split follows below.
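A C sketch of the bit manipulation, using the slide's widths (a 12-bit offset, i.e. 4 KB pages); page_table[] is a hypothetical stand-in for the VPN-to-PPN mapping:

    #include <stdint.h>

    #define OFFSET_BITS 12                        /* 4 KB pages */
    #define PAGE_MASK   ((1u << OFFSET_BITS) - 1)

    extern uint32_t page_table[]; /* hypothetical: indexed by VPN, holds the PPN */

    uint32_t translate(uint32_t vaddr) {
        uint32_t vpn    = vaddr >> OFFSET_BITS; /* bits 31..12 */
        uint32_t offset = vaddr & PAGE_MASK;    /* bits 11..0, passed through */
        uint32_t ppn    = page_table[vpn];      /* becomes bits 29..12 */
        return (ppn << OFFSET_BITS) | offset;
    }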
Slide 26: Virtual Address Translation
[Figure 7.21: translating a virtual address through the page table]
Slide 27: Virtual Address Translation
What happens during a memory access?
- Map the virtual address into a physical address using the page table
- If the page is in memory: access physical memory
- If the page is on disk: page fault
  - Suspend the program
  - Get the operating system to load the page from disk
- The page table itself is in memory, and this slows down every access!
- Translation lookaside buffer (TLB): a special cache of translated addresses (speeds access back up)
A sketch of this access sequence follows below.
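A C sketch of the sequence above: consult the TLB first, fall back to the in-memory page table, and fault when the page is on disk (all helper functions are hypothetical stand-ins for the hardware/OS mechanisms):

    #include <stdbool.h>
    #include <stdint.h>

    bool tlb_lookup(uint32_t vpn, uint32_t *ppn);        /* small, fast cache */
    bool page_table_lookup(uint32_t vpn, uint32_t *ppn); /* in memory: slow */
    void tlb_insert(uint32_t vpn, uint32_t ppn);
    void page_fault(uint32_t vpn);                       /* OS loads the page from disk */

    uint32_t translate_with_tlb(uint32_t vaddr) {
        uint32_t vpn = vaddr >> 12, ppn;
        if (!tlb_lookup(vpn, &ppn)) {             /* TLB miss */
            while (!page_table_lookup(vpn, &ppn))
                page_fault(vpn);                  /* page on disk: OS brings it in */
            tlb_insert(vpn, ppn);                 /* cache the translation */
        }
        return (ppn << 12) | (vaddr & 0xFFFu);
    }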
Slide 28: TLB Structure
[Figure 7.23: organization of the translation lookaside buffer]
Slide 29: TLB / Cache Interaction
[Figure 7.24: how the TLB and the cache work together on a memory access]
Slide 30: Virtual Memory and Protection
- An important function of virtual memory: protection
  - Allow sharing of a single main memory by multiple processes
  - Provide each process with its own address space
  - Protect each process from memory accesses by other processes
- Basic mechanism: two modes of operation
  - User mode: allows access only to the user address space
  - Supervisor (kernel) mode: allows access to the OS address space
  - System call: allows the processor to change mode
Slide 31: Summary - Virtual Memory
- The bottom level of the memory hierarchy for programs
- Used in all general-purpose architectures
- Relies heavily on the OS for support
Slide 32: Roadmap for the Term: Major Topics
- Overview / Abstractions and Technology
- Instruction sets
- Logic & arithmetic
- Performance
- Processor implementation
  - Single-cycle implementation
  - Multicycle implementation
  - Pipelined implementation
- Memory systems
- Input/Output