Slide 1: ECE 313 - Computer Organization: Memory Hierarchy 2
Prof. John Nestor, ECE Department, Lafayette College, Easton, Pennsylvania 18042, nestorj@lafayette.edu
Feb 2005
Reading: 7.4-7.8, 7.9*
Homework: Look over 7.1-7.4, 7.9, 7.12 for discussion on Friday
Portions of these slides are derived from:
- Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
- Tod Amon's COD2e slides © 1998 Morgan Kaufmann Publishers, all rights reserved
- Dave Patterson's CS 152 slides, Fall 1997 © UCB
- Rob Rutenbar's 18-347 slides, Fall 1999, CMU
- Other sources as noted
Slide 2: Outline - Memory Systems
- Overview
  - Motivation
  - General Structure and Terminology
- Memory Technology
  - Static RAM
  - Dynamic RAM
  - Disks
- Cache Memory
- Virtual Memory
Slide 3: Four Key Cache Questions
1. Where can a block be placed in the cache? (block placement)
2. How can a block be found in the cache? Using a tag. (block identification)
3. Which block should be replaced on a miss? (block replacement)
4. What happens on a write? (write strategy)
Slide 4: Q1: Block Placement
Where can a block be placed in the cache?
- In one predetermined place: direct-mapped
  - Use a fragment of the address to calculate the block's location in the cache
  - Compare the cached block's tag with the address tag to test whether the block is present
- Anywhere in the cache: fully associative
  - Compare the tag against every block in the cache
- In a limited set of places: set-associative
  - Use an address fragment to calculate the set (like direct-mapped)
  - Place the block anywhere within that set
  - Compare the tag against every block in the set
  - A hybrid of direct-mapped and fully associative
Slide 5: Direct-Mapped Block Placement
[Figure: memory blocks at addresses 0x00-0x4C mapping into a 4-block cache; all blocks ending in *0, *4, *8, *C share one cache location each]
Address maps to block: location = (block address) MOD (# blocks in cache)
A sketch of this arithmetic follows below.
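A minimal C sketch of the placement arithmetic, assuming the figure's (hypothetical) geometry of 4-byte blocks and a 4-block cache:

    #include <stdio.h>

    #define BLOCK_SIZE 4   /* bytes per block (assumed) */
    #define NUM_BLOCKS 4   /* blocks in the cache (assumed) */

    /* location = (block address) MOD (# blocks in cache) */
    unsigned direct_mapped_index(unsigned addr) {
        unsigned block_addr = addr / BLOCK_SIZE; /* strip the byte offset */
        return block_addr % NUM_BLOCKS;
    }

    int main(void) {
        /* 0x04, 0x14, 0x24, 0x34, 0x44 all collide on cache block 1 */
        for (unsigned a = 0x04; a <= 0x44; a += 0x10)
            printf("addr 0x%02x -> cache block %u\n", a, direct_mapped_index(a));
        return 0;
    }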
Slide 6: Fully Associative Block Placement
[Figure: memory blocks at addresses 0x00-0x4C may be placed in any cache block]
Arbitrary block mapping: location = any
Slide 7: Set-Associative Block Placement
[Figure: memory blocks at addresses 0x00-0x4C mapping into a set-associative cache with Sets 0-3, two blocks per set]
Address maps to set: location = (block address) MOD (# sets in cache), at an arbitrary location within the set
A sketch of the set calculation follows below.
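The same arithmetic keyed to sets rather than blocks; the sizes are again hypothetical, matching the figure's four sets:

    #include <stdio.h>

    #define BLOCK_SIZE 4   /* bytes per block (assumed) */
    #define NUM_SETS   4   /* sets in the cache (assumed) */

    /* location = (block address) MOD (# sets in cache);
       any of the ways within the chosen set may hold the block */
    unsigned set_index(unsigned addr) {
        return (addr / BLOCK_SIZE) % NUM_SETS;
    }

    int main(void) {
        printf("addr 0x24 -> set %u\n", set_index(0x24)); /* block 9 -> set 1 */
        return 0;
    }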
Slide 8: Q2: Block Identification
- Every cache block has an address tag that identifies its location in memory
- Hit when the tag matches the address of the desired word (comparison done by hardware)
- Q: What happens when a cache block is empty?
- A: Mark that condition with a valid bit
Example entry: Valid = 1, Tag = 0x00001C0, Data = 0xff083c2d
A sketch of one entry and its hit test follows below.
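A C sketch of a single cache entry and its hit test, using the slide's example values (field widths assumed):

    #include <stdbool.h>
    #include <stdint.h>

    struct cache_line {
        bool     valid; /* does this entry hold real data? */
        uint32_t tag;   /* identifies which memory block is cached here */
        uint32_t data;  /* one word of cached data */
    };

    /* Hit when the entry is valid AND its tag matches the address tag */
    bool is_hit(const struct cache_line *line, uint32_t addr_tag) {
        return line->valid && line->tag == addr_tag;
    }

    /* Example from the slide: {1, 0x00001C0, 0xff083c2d} hits on tag 0x00001C0 */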
Slide 9: Direct-Mapped Cache Design
[Figure: a cache SRAM whose entries hold valid, tag, and data fields; the incoming address is split into tag, cache index, and byte offset; the index selects one entry, and HIT is asserted when that entry is valid and its stored tag equals the address tag]
A sketch of this lookup path follows below.
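A C sketch of the full direct-mapped lookup, with assumed field widths (4-byte blocks and a 1024-entry cache, giving 2 offset bits, 10 index bits, and the remaining upper bits as the tag):

    #include <stdbool.h>
    #include <stdint.h>

    #define INDEX_BITS 10
    #define NUM_LINES  (1u << INDEX_BITS)

    struct line { bool valid; uint32_t tag; uint32_t data; };
    static struct line cache[NUM_LINES];

    /* Returns true on a hit and fills *word with the cached data. */
    bool cache_read(uint32_t addr, uint32_t *word) {
        uint32_t index = (addr >> 2) & (NUM_LINES - 1); /* drop 2-bit byte offset */
        uint32_t tag   = addr >> (2 + INDEX_BITS);      /* remaining upper bits */
        if (cache[index].valid && cache[index].tag == tag) {
            *word = cache[index].data; /* HIT */
            return true;
        }
        return false; /* MISS: fetch the block from memory */
    }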
Slide 10: Set-Associative Cache Design
- Key idea: divide the cache into sets; allow a block to be placed anywhere within a set
- Advantage: better hit rate
- Disadvantages: more tag bits, more hardware, higher access time
[Figure 7.17: a four-way set-associative cache]
A sketch of the n-way lookup follows below.
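A C sketch of the n-way lookup: the index selects a set, and the hardware compares all ways in parallel (modeled here as a loop); sizes are assumed:

    #include <stdbool.h>
    #include <stdint.h>

    #define SET_BITS 8
    #define NUM_SETS (1u << SET_BITS)
    #define WAYS     4

    struct line { bool valid; uint32_t tag; uint32_t data; };
    static struct line cache[NUM_SETS][WAYS];

    bool set_assoc_read(uint32_t addr, uint32_t *word) {
        uint32_t set = (addr >> 2) & (NUM_SETS - 1);
        uint32_t tag = addr >> (2 + SET_BITS);
        for (int way = 0; way < WAYS; way++) { /* hardware: WAYS comparators at once */
            if (cache[set][way].valid && cache[set][way].tag == tag) {
                *word = cache[set][way].data;
                return true;
            }
        }
        return false;
    }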
Slide 11: Fully Associative Cache Design
- Key idea: the entire cache forms a single set, so a block may be placed in any location
- One comparator is required for each block
- No address decoding
- Practical only for small caches due to the hardware demands
[Figure: each entry's stored tag is compared in parallel against the incoming tag; the matching entry drives its data out]
Slide 12: Q3: Block Replacement
On a miss, data must be read from memory. So, where do we put the new data?
- Direct-mapped cache: must place it in one fixed location
- Set-associative, fully associative: can pick any block within the set
  - Random: replace an arbitrary block
  - Least recently used (LRU): replace the "least popular" block (usually the best hit rate)
    - Easy for 2-way set-associative: one bit per set (see the sketch below)
    - Harder for n-way set-associative: often approximated with "pseudo-LRU"
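A minimal sketch of the 2-way LRU bookkeeping, one bit per set as the slide describes (the data structure and cache geometry are assumed):

    #define NUM_SETS 256               /* assumed cache geometry */
    static unsigned lru_way[NUM_SETS]; /* per set: which way was used least recently */

    /* Call on every access that uses `way` (0 or 1) in `set`. */
    void touch(unsigned set, unsigned way) {
        lru_way[set] = way ^ 1u; /* the other way is now the LRU one */
    }

    /* On a miss, evict the least recently used way. */
    unsigned victim(unsigned set) {
        return lru_way[set];
    }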
Slide 13: Q4: Write Strategy
What happens on a write?
- Write through: write to memory, stalling the processor until the write completes
  - Write buffer: place the write in a buffer, allowing the pipeline to continue
- Write back: delay the write to memory until the block is replaced in the cache
- Special considerations apply when using DMA or multiprocessors (coherence between caches)
A sketch contrasting the two policies follows below.
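A C sketch contrasting the two policies on a write hit; write_memory() is a hypothetical stand-in for the memory interface:

    #include <stdbool.h>
    #include <stdint.h>

    struct line { bool valid, dirty; uint32_t tag, data; };

    void write_memory(uint32_t addr, uint32_t word); /* hypothetical stand-in */

    /* Write through: memory is updated on every store;
       a write buffer lets the pipeline continue instead of stalling. */
    void store_write_through(struct line *l, uint32_t addr, uint32_t word) {
        l->data = word;
        write_memory(addr, word);
    }

    /* Write back: only the cache is updated now; the dirty block is
       written to memory when it is eventually replaced. */
    void store_write_back(struct line *l, uint32_t word) {
        l->data  = word;
        l->dirty = true; /* flush to memory on eviction */
    }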
Slide 14: Example: DECStation 3100 Cache
- MIPS R2000 workstation, pipelined implementation
- Separate instruction and data caches to allow concurrent access
- Direct-mapped
- Cache size: 64 KB (16K words)
- 4-word write buffer
(Old Fig. 7.8)
Slide 15: Miss Rates - DECStation 3100 Cache

Program | Instr. Miss Rate | Data Miss Rate | Combined Miss Rate
--------|------------------|----------------|-------------------
gcc     | 6.1%             | 2.1%           | 5.4%
spice   | 1.2%             | 1.3%           | 1.2%

(Old Fig. 7.10)
Slide 16: Miss Rates vs. Block Size - DECStation 3100

Program | Block Size (words) | Instr. Miss Rate | Data Miss Rate | Combined Miss Rate
--------|--------------------|------------------|----------------|-------------------
gcc     | 1                  | 6.1%             | 2.1%           | 5.4%
gcc     | 4                  | 2.0%             | 1.7%           | 1.9%
spice   | 1                  | 1.2%             | 1.3%           | 1.2%
spice   | 4                  | 0.3%             | 0.6%           | 0.4%

(Fig. 7.11)
Slide 17: Variation: Larger Block Size
- Key advantage: takes advantage of spatial locality
- Disadvantages: blocks compete for cache space, and writes become more complicated
(Fig. 7.10)
Slide 18: Example - Intrinsity FastMATH Cache
- Separate 16 KB instruction and data caches
- 16-word blocks (see previous page)
- Results on the SPECINT2000 benchmark:

Instr. Miss Rate | Data Miss Rate | Combined Miss Rate
-----------------|----------------|-------------------
0.4%             | 11.4%          | 3.2%

(Fig. 7.10)
Slide 19: Miss Rates vs. Block Size
[Figure 7.12: miss rate as a function of block size]
Note that the miss rate increases for larger block sizes.
Slide 20: Example: Caches in the Pentium 4
- L1 Trace cache: holds decoded instructions
- L1 Data cache: 64-byte block size, write through
- L2 cache: 128-byte block size, write back
Source: "The Microarchitecture of the Pentium® 4 Processor", Intel Technology Journal, First Quarter 2001, http://developer.intel.com/technology/itj/q12001/articles/art_2.htm
Slide 21: Summary: Cache Memory
- Speeds up access by storing recently used data
- Cache structure has a strong impact on performance
- Modern microprocessors use on-chip caches (sometimes multilevel caches)
Slide 22: Outline - Memory Systems
- Overview
  - Motivation
  - General Structure and Terminology
- Memory Technology
  - Static RAM
  - Dynamic RAM
- Cache Memory
- Virtual Memory
Slide 23: Virtual Memory
- Key idea: simulate a larger physical memory than is actually available
- General approach:
  - Break the address space up into pages
  - Each program accesses a working set of pages
  - Store pages in physical memory as space permits, and on disk when no physical memory is left
  - Access pages using a virtual address
[Figure: a virtual address split into a page number and an offset, selecting among Pages 0, 1, 2, ...]
Slide 24: Virtual Memory
Why do this?
- So a program can run as if it had a larger memory
- So multiple programs can run in the same memory with protected address spaces
[Figure: address translation maps virtual addresses to physical addresses or disk addresses]
Slide 25: Virtual Memory
Mapping from virtual to physical address (Fig. 7.20):
- Virtual address (32 bits): virtual page number in bits 31-12, page offset in bits 11-0
- Physical address (30 bits): physical page number in bits 29-12, page offset in bits 11-0
- Translation replaces the virtual page number with a physical page number; the page offset passes through unchanged
A sketch of this split follows below.
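A C sketch of the bit manipulation, using the slide's widths (a 12-bit offset, i.e. 4 KB pages); page_table[] is a hypothetical stand-in for the VPN-to-PPN mapping:

    #include <stdint.h>

    #define OFFSET_BITS 12                        /* 4 KB pages */
    #define PAGE_MASK   ((1u << OFFSET_BITS) - 1)

    extern uint32_t page_table[]; /* hypothetical: indexed by VPN, holds the PPN */

    uint32_t translate(uint32_t vaddr) {
        uint32_t vpn    = vaddr >> OFFSET_BITS; /* bits 31..12 */
        uint32_t offset = vaddr & PAGE_MASK;    /* bits 11..0, passed through */
        uint32_t ppn    = page_table[vpn];      /* becomes bits 29..12 */
        return (ppn << OFFSET_BITS) | offset;
    }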
Slide 26: Virtual Address Translation
[Figure 7.21: translating a virtual address through the page table]
Slide 27: Virtual Address Translation
What happens during a memory access?
- Map the virtual address into a physical address using the page table
- If the page is in memory: access physical memory
- If the page is on disk: page fault
  - Suspend the program
  - Get the operating system to load the page from disk
- The page table itself is in memory, and this slows down every access!
- Translation lookaside buffer (TLB): a special cache of translated addresses (speeds access back up)
A sketch of this access sequence follows below.
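A C sketch of the sequence above: consult the TLB first, fall back to the in-memory page table, and fault when the page is on disk (all helper functions are hypothetical stand-ins for the hardware/OS mechanisms):

    #include <stdbool.h>
    #include <stdint.h>

    bool tlb_lookup(uint32_t vpn, uint32_t *ppn);        /* small, fast cache */
    bool page_table_lookup(uint32_t vpn, uint32_t *ppn); /* in memory: slow */
    void tlb_insert(uint32_t vpn, uint32_t ppn);
    void page_fault(uint32_t vpn);                       /* OS loads the page from disk */

    uint32_t translate_with_tlb(uint32_t vaddr) {
        uint32_t vpn = vaddr >> 12, ppn;
        if (!tlb_lookup(vpn, &ppn)) {             /* TLB miss */
            while (!page_table_lookup(vpn, &ppn))
                page_fault(vpn);                  /* page on disk: OS brings it in */
            tlb_insert(vpn, ppn);                 /* cache the translation */
        }
        return (ppn << 12) | (vaddr & 0xFFFu);
    }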
Slide 28: TLB Structure
[Figure 7.23: organization of the translation lookaside buffer]
Slide 29: TLB / Cache Interaction
[Figure 7.24: how the TLB and the cache work together on a memory access]
Slide 30: Virtual Memory and Protection
- An important function of virtual memory: protection
  - Allow sharing of a single main memory by multiple processes
  - Provide each process with its own address space
  - Protect each process from memory accesses by other processes
- Basic mechanism: two modes of operation
  - User mode: allows access only to the user address space
  - Supervisor (kernel) mode: allows access to the OS address space
  - System call: allows the processor to change mode
Slide 31: Summary - Virtual Memory
- The bottom level of the memory hierarchy for programs
- Used in all general-purpose architectures
- Relies heavily on the OS for support
Slide 32: Roadmap for the Term: Major Topics
- Overview / Abstractions and Technology
- Instruction sets
- Logic & arithmetic
- Performance
- Processor implementation
  - Single-cycle implementation
  - Multicycle implementation
  - Pipelined implementation
- Memory systems
- Input/Output