
1 Recap: Memory Hierarchy

2 Memory Hierarchy - the Big Picture
Problem: memory is too slow and/or too small
Solution: memory hierarchy
[Figure: Processor (Control, Datapath, Registers), L1 On-Chip Cache, L2 Off-Chip Cache, Main Memory (DRAM), Secondary Storage (Disk). Speed: fastest to slowest. Size: smallest to biggest. Cost: highest to lowest.]

3 Why Hierarchy Works
The principle of locality
– Programs access a relatively small portion of the address space at any instant of time.
– Temporal locality: a recently accessed instruction or data item is likely to be used again soon.
– Spatial locality: instructions or data near a recently accessed instruction or data item are likely to be used soon.
Result: the illusion of large, fast memory
[Figure: probability of reference plotted across the address space, from 0 to 2^n - 1]

4 Example of Locality
int A[100], B[100], C[100], D;
int i;
for (i = 0; i < 100; i++) { C[i] = A[i] * B[i] + D; }
[Figure: memory layout of A[0]..A[99], B[0]..B[99], C[0]..C[99], and D, drawn four consecutive elements per row]
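A rough way to see why the loop on slide 4 has good spatial locality is to count how many cache blocks a sequential sweep over one array touches. The sketch below is illustrative only: the 16-byte block size, the function names block_of and blocks_touched, and the assumption that the array starts on a block boundary are not from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: 16-byte cache blocks. */
#define BLOCK_BYTES 16u

/* Block number holding a given byte offset, for an array that is
   assumed to start on a block boundary. */
size_t block_of(size_t byte_offset)
{
    return byte_offset / BLOCK_BYTES;
}

/* Number of distinct blocks touched by a sequential sweep over
   n elements of elem_size bytes each. */
size_t blocks_touched(size_t n, size_t elem_size)
{
    assert(elem_size > 0);
    if (n == 0) {
        return 0;
    }
    /* Block of the last byte of the last element, plus one. */
    return block_of(n * elem_size - 1) + 1;
}
```

Under these assumptions, with 4-byte elements the 100 accesses to A fall into 25 blocks, so at most one access in four needs to fetch a new block. D is reused on every iteration, which is the temporal-locality half of the example.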

5 Four Key Cache Questions:
1. Where can a block be placed in the cache? (block placement)
2. How can a block be found in the cache? …using a tag (block identification)
3. Which block should be replaced on a miss? (block replacement)
4. What happens on a write? (write strategy)

6 Q1: Block Placement
Where can a block be placed in the cache?
– In one predetermined place: direct-mapped
   Use a fragment of the address to calculate the block location in the cache
   Compare the tag stored with that cache block to test whether the block is present
– Anywhere in the cache: fully associative
   Compare the tag to every block in the cache
– In a limited set of places: set-associative
   Use an address fragment to calculate the set
   Place the block in any location in the set
   Compare the tag to every block in the set
   Hybrid of direct-mapped and fully associative

7 Direct Mapped Block Placement
[Figure: memory blocks with addresses ending in *0, *4, *8, *C, each mapping to a single cache block]
Address maps to block: location = (block address MOD number of blocks in cache)
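The placement rule on slide 7 can be written directly as a modulo operation. This is a minimal sketch; the function names are hypothetical, and the second variant assumes the number of cache blocks is a power of two, which is common but not stated on the slide.

```c
#include <assert.h>

/* Direct-mapped placement: every memory block has exactly one cache slot. */
unsigned direct_mapped_slot(unsigned block_address, unsigned num_cache_blocks)
{
    assert(num_cache_blocks > 0);
    return block_address % num_cache_blocks;
}

/* Same result when the block count is a power of two (assumption):
   the modulo reduces to keeping the low-order bits of the block address. */
unsigned direct_mapped_slot_pow2(unsigned block_address, unsigned num_cache_blocks)
{
    assert(num_cache_blocks > 0);
    assert((num_cache_blocks & (num_cache_blocks - 1u)) == 0);
    return block_address & (num_cache_blocks - 1u);
}
```

In an 8-block cache, block addresses 1 and 9 both land in slot 1, so they evict each other even if the rest of the cache is empty. That conflict is the cost of having only one candidate location.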

8 Direct Mapping
[Figure: cache with Tag, Index, and Data columns holding example values]
Direct mapping: A memory value can only be placed at a single corresponding location in the cache

9 Fully Associative Block Placement
[Figure: arbitrary block mapping between memory and cache]
location = any

10 Fully Associative Mapping
[Figure: cache with Tag and Data columns; example tags 0x0F, 0x55, 0xAA, 0xF0]
Fully-associative mapping: A memory value can be anywhere in the cache

11 Set-Associative Block Placement
[Figure: memory blocks with addresses ending in *0, *4, *8, *C mapping to Set 0, Set 1, Set 2, Set 3 of the cache]
Address maps to set: location = (block address MOD number of sets in cache), then an arbitrary location within the set
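The set-selection rule on slide 11 differs from the direct-mapped rule only in what the modulo is taken over. The sketch below uses a hypothetical function name and parameterizes the cache by total block count and associativity (ways), which is an assumption about how one might describe the cache rather than anything the slides specify.

```c
#include <assert.h>

/* Set-associative placement: the address selects a set; the block may go
   in any of the `ways` locations inside that set. */
unsigned set_index(unsigned block_address, unsigned num_blocks, unsigned ways)
{
    assert(ways > 0);
    assert(num_blocks >= ways && num_blocks % ways == 0);
    unsigned num_sets = num_blocks / ways;
    return block_address % num_sets;
}
```

With ways equal to 1 this is direct-mapped placement, and with ways equal to num_blocks there is a single set, which is the fully associative case. For an 8-block cache, block address 12 goes to slot 4 when direct-mapped, to set 0 when 2-way, and to the only set when 8-way.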

12 Set Associative Mapping (2-Way)
[Figure: cache with Tag, Index, and Data columns for Way 0 and Way 1; example tags 0x0F, 0x55, 0xAA, 0xF0]
Set-associative mapping: A memory value can be placed in any of a set of corresponding locations in the cache

13 Q2: Block Identification
Every cache block has an address tag which, together with the block's index, identifies its location in memory
Hit when the tag and index of the desired word match (comparison done by hardware)
Q: What happens when a cache block is empty?
A: Mark this condition with a valid bit
[Figure: example entry with Tag/index 0x00001C0, Valid 1, Data 0xff083c2d]

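14 Direct-Mapped Cache Design
[Figure: cache SRAM addressed by the Cache Index field of the address. Each entry holds data in DATA[31:0], a tag in DATA[58:32], and a valid bit V in DATA[59]. The ADDRESS is split into Tag, Cache Index, and Byte Offset; HIT is signaled when the stored tag equals the address tag and V = 1.]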
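The lookup path on slides 13 and 14 can be sketched in software as follows. The field widths are assumptions chosen to be consistent with the 27-bit tag field (DATA[58:32]) and 32-bit data word on slide 14: 2 byte-offset bits and 3 index bits, giving 8 one-word lines. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: 32-bit addresses, 4-byte blocks, 8 lines, 27-bit tag. */
#define OFFSET_BITS 2u
#define INDEX_BITS  3u
#define NUM_LINES   (1u << INDEX_BITS)

struct dm_line  { bool valid; uint32_t tag; uint32_t data; };
struct dm_cache { struct dm_line lines[NUM_LINES]; };

/* Cache Index field: the bits just above the byte offset. */
uint32_t addr_index(uint32_t addr)
{
    return (addr >> OFFSET_BITS) & (NUM_LINES - 1u);
}

/* Tag field: everything above the index. */
uint32_t addr_tag(uint32_t addr)
{
    return addr >> (OFFSET_BITS + INDEX_BITS);
}

/* Hit only if the indexed line is valid and its stored tag matches. */
bool dm_lookup(const struct dm_cache *c, uint32_t addr, uint32_t *data_out)
{
    assert(c != NULL && data_out != NULL);
    const struct dm_line *l = &c->lines[addr_index(addr)];
    if (l->valid && l->tag == addr_tag(addr)) {
        *data_out = l->data;
        return true;
    }
    return false;
}

/* On a miss, the fetched word overwrites whatever occupied the line. */
void dm_fill(struct dm_cache *c, uint32_t addr, uint32_t data)
{
    assert(c != NULL);
    struct dm_line *l = &c->lines[addr_index(addr)];
    l->valid = true;
    l->tag = addr_tag(addr);
    l->data = data;
}
```

A zero-initialized cache has every valid bit clear, so the first lookup of any address misses. Addresses 0x104 and 0x124 share index 1 but have different tags, so filling one makes a lookup of the other miss.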

15 Set Associative Cache Design
Key idea:
– Divide the cache into sets
– Allow a block anywhere in a set
Advantages:
– Better hit rate
Disadvantages:
– More tag bits
– More hardware
– Higher access time
A Four-Way Set-Associative Cache (Fig. 7.17)

16 Fully Associative Cache Design
Key idea: a single set that contains every block
– 1 comparator required for each block
– No address decoding
– Practical only for small caches due to hardware demands
[Figure: each cache entry holds a tag and data; one comparator (=) per entry compares the stored tag with the incoming tag ("tag in") and the matching entry drives "data out"]
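A software model of the fully associative lookup on slide 16 is a search over every entry. The sketch below is sequential, whereas the hardware performs all comparisons at once with one comparator per block; the 4-entry size and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size for illustration only. */
#define FA_ENTRIES 4u

struct fa_entry { bool valid; uint32_t tag; uint32_t data; };

/* Returns the index of the valid entry whose tag matches, or -1 on a miss.
   There is no index field: the whole block address serves as the tag. */
int fa_find(const struct fa_entry entries[FA_ENTRIES], uint32_t tag)
{
    assert(entries != NULL);
    for (unsigned i = 0; i < FA_ENTRIES; i++) {
        if (entries[i].valid && entries[i].tag == tag) {
            return (int)i;
        }
    }
    return -1;
}
```

The loop makes the hardware cost visible: the work grows with the number of entries, which is why the slide calls this design practical only for small caches.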

17 Cache Replacement Policy Random –Replace a randomly chosen line LRU (Least Recently Used) –Replace the least recently used line

18 LRU Policy
Blocks are listed in recency order: MRU, MRU-1, LRU+1, LRU
Initial:  A B C D
Access C: C A B D
Access D: D C A B
Access E: E D C A (MISS, replacement needed)
Access C: C E D A
Access G: G C E D (MISS, replacement needed)
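The LRU trace on slide 18 can be reproduced with a small recency list kept in MRU-to-LRU order. This is a sketch of one simple software model, with a hypothetical function name and blocks identified by single characters as on the slide; real hardware typically tracks recency with per-line state bits instead of physically moving entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define WAYS 4

/* order[0] is the MRU block and order[WAYS - 1] is the LRU block.
   Returns true on a hit. On a miss the LRU block is replaced. */
bool lru_access(char order[WAYS], char block)
{
    assert(order != NULL);
    int pos = WAYS - 1;   /* default victim position: the LRU slot */
    bool hit = false;
    for (int i = 0; i < WAYS; i++) {
        if (order[i] == block) {
            pos = i;
            hit = true;
            break;
        }
    }
    /* Blocks more recent than `pos` each age by one position. */
    memmove(&order[1], &order[0], (size_t)pos);
    order[0] = block;     /* the accessed block becomes MRU */
    return hit;
}
```

Starting from A B C D and accessing C, D, E, C, G yields the same sequence of states as the slide, with E evicting B and G evicting A.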

19 Cache Write Strategies
Need to keep the cache consistent with main memory
– Reads are easy: they require no modification
– Writes: when does the update occur?
1 Write through: Data is written to both the cache block and to a block of main memory.
– The lower level always has the most up-to-date data, an important feature for I/O and multiprocessing.
– Easier to implement than write back.
2 Write back: Data is written or updated only in the cache block. The modified (dirty) cache block is written to main memory when it is replaced in the cache.
– Writes occur at the speed of the cache.
– Uses less memory bandwidth than write through.

20 Write-through Policy
[Figure: Processor, Cache, and Memory, with example values 0x1234 and 0x5678]

21 Write-back Policy
[Figure: Processor, Cache, and Memory, with example values 0x1234, 0x5678, and 0x9ABC]
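The bandwidth difference between the two write strategies can be illustrated with a one-block model that counts writes to main memory. The struct, its fields, and the function names are hypothetical; the values 0x1234, 0x5678, and 0x9ABC are borrowed from the slide figures purely as sample data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One cache block and the memory word behind it (illustrative model). */
struct wcache {
    uint32_t cached;      /* value held in the cache block */
    uint32_t memory;      /* value held in main memory */
    bool     dirty;       /* used only by write back */
    unsigned mem_writes;  /* number of writes that reached memory */
};

/* Write through: update the cache and memory on every store. */
void write_through(struct wcache *c, uint32_t v)
{
    assert(c != NULL);
    c->cached = v;
    c->memory = v;
    c->mem_writes++;
}

/* Write back: update only the cache and mark the block dirty. */
void write_back(struct wcache *c, uint32_t v)
{
    assert(c != NULL);
    c->cached = v;
    c->dirty = true;
}

/* On replacement, a dirty block is written to memory exactly once. */
void evict(struct wcache *c)
{
    assert(c != NULL);
    if (c->dirty) {
        c->memory = c->cached;
        c->mem_writes++;
        c->dirty = false;
    }
}
```

Three stores to the same block cost three memory writes under write through but only one under write back, paid at eviction time. Until that eviction, memory holds stale data under write back, which is the consistency concern the slide raises for I/O and multiprocessing.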

22 Write Buffer for Write Through
A write buffer is needed between the cache and memory
– Processor: writes data into the cache and the write buffer
– Memory controller: writes the contents of the buffer to memory
The write buffer is just a FIFO:
– Typical number of entries: 4
– Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle
[Figure: Processor, Cache, Write Buffer, DRAM]
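The FIFO behavior described on slide 22 can be sketched as a 4-entry ring buffer with a producer side (the processor) and a consumer side (the memory controller). All names are hypothetical, and returning false from wb_push to signal a full buffer is an assumed way of modeling a processor stall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Slide 22 gives 4 as a typical number of entries. */
#define WB_ENTRIES 4u

struct wb_entry     { uint32_t addr; uint32_t data; };
struct write_buffer { struct wb_entry e[WB_ENTRIES]; unsigned head; unsigned count; };

/* Processor side: queue a store. Returns false when the buffer is full,
   which in this model means the processor has to wait. */
bool wb_push(struct write_buffer *b, uint32_t addr, uint32_t data)
{
    assert(b != NULL);
    if (b->count == WB_ENTRIES) {
        return false;
    }
    unsigned tail = (b->head + b->count) % WB_ENTRIES;
    b->e[tail].addr = addr;
    b->e[tail].data = data;
    b->count++;
    return true;
}

/* Memory controller side: retire the oldest store. Returns false if empty. */
bool wb_pop(struct write_buffer *b, struct wb_entry *out)
{
    assert(b != NULL && out != NULL);
    if (b->count == 0) {
        return false;
    }
    *out = b->e[b->head];
    b->head = (b->head + 1u) % WB_ENTRIES;
    b->count--;
    return true;
}
```

If stores arrive faster than the memory controller can drain them, wb_push starts returning false. That is the saturation case the slide's condition (store frequency much lower than 1 / DRAM write cycle) is meant to rule out.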

23 Unified vs. Separate Level 1 Cache
Unified Level 1 Cache (Princeton Memory Architecture): A single level 1 cache is used for both instructions and data.
Separate instruction/data Level 1 caches (Harvard Memory Architecture): The level 1 (L1) cache is split into two caches, one for instructions (instruction cache, L1 I-cache) and the other for data (data cache, L1 D-cache).
[Figure: two processors, each with Control, Datapath, and Registers. Left: Unified Level One Cache L1 (Princeton Memory Architecture). Right: separate L1 I-cache and L1 D-cache (Harvard Memory Architecture).]