
1 Computer Architecture Cache Memory

2 Today is brought to you by cache. What do we want? –Fast access to data from memory –A large memory –An acceptable memory system cost Where do we get it? –Interpose a smaller but faster memory, which holds recently accessed data, between the datapath and main memory

3 Cache Cache = to conceal or store, as in the earth; to hide in a secret place; n. a place for hiding or storing provisions, equipment etc., also the things stored or hidden [F. cacher, to hide]. Cache sounds like cash. Programs usually exhibit locality: –temporal locality: if an item is referenced, it will be referenced again with high probability –spatial locality: if an item is referenced, items near it have a high probability of being referenced
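As a small illustration (not from the original slides), the C loop below exhibits both kinds of locality: the array elements are touched in consecutive order (spatial locality), while the loop variable and the accumulator are reused on every iteration (temporal locality).

    #include <stdio.h>

    int main(void) {
        int a[1024];
        int sum = 0;                     /* 'sum' is reused every iteration: temporal locality */
        for (int i = 0; i < 1024; i++)
            a[i] = i;
        for (int i = 0; i < 1024; i++)
            sum += a[i];                 /* consecutive elements of 'a': spatial locality */
        printf("%d\n", sum);
        return 0;
    }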

4 Learning Objectives 1. Know the principle of cache implementation 2. Know the difference between direct-mapped, set-associative and fully associative caches and how they work 3. Know the terms: cache hit, cache miss, word size, block size, row or set size, cache rows, cache tag, cache index, direct-mapped cache, set-associative cache and fully associative cache.

5 Consider this. It is like caching information. You are in the library gathering books for an assignment: 1) The well-selected books you have gathered probably contain material that you had not expected but will likely use 2) You do not bring ALL the books in the library to your desk 3) It is quicker to look up information in a book on your desk than to go to the stacks again This is how cache principles are used in computing.

6 Cache Principle On a simply configured CPU-memory system with no cache, the time for a memory fetch or store depends on the memory access speed, and in general, for a given technology, access time increases with memory size. A cache is a mechanism that speeds up memory transfers by exploiting a locality principle: machine instructions and memory accesses are often "near" the previous and following accesses. By keeping recent transactions in a fast-access memory, and using a separate transfer process between main memory and the cache, the effective memory access time can be reduced, with consequent performance gains.

7 Why is caching effective in computing Spatial locality arises from –sequential access to program instructions –data structures, arrays Temporal locality arises from –loops Memory cost and speed –SRAM: 5-25 ns, $100-$250/MByte –DRAM: 60-120 ns, $5-$10/MByte –Magnetic disk: 10-20 ms, $0.10-$0.20/MByte

8 Memory access time and cost

9 Practical usage of memory types It is advantageous to build a hierarchy of memories: –fastest and most expensive: small and close to the processor –slowest and least expensive: large and further from the processor

10 Memory Hierarchy of a Modern Computer System By taking advantage of the principle of locality: –Present the user with as much memory as is available in the cheapest technology. –Provide access at the speed offered by the fastest technology. [Figure: the hierarchy from the processor (control, datapath, registers and on-chip cache) through the second-level cache (SRAM) and main memory (DRAM) to secondary and tertiary storage (disk); approximate speeds and sizes:]
    Registers: ~1 ns, 100s of bytes
    Second-level cache (SRAM): ~10 ns, KBytes
    Main memory (DRAM): ~100 ns, MBytes
    Secondary storage (disk): ~10,000,000 ns (10s of ms), GBytes
    Tertiary storage (disk): ~10,000,000,000 ns (10s of seconds), TBytes

11 The Art of Memory System Design [Figure: the processor issues a memory reference stream of operations (i-fetch, read, write) to the cache ($), which is backed by main memory (MEM)] Optimize the memory system organization to minimize the average memory access time for typical workloads (workload or benchmark programs)

12 Notation for accessing data and instructions in memory Define a BLOCK as the minimum-size unit of information transferred between two adjacent levels of the memory hierarchy. When a word of data is required, the whole block containing that word is transferred. There is a high probability that the next word required is also in the block, so the next word is obtained from FAST memory rather than SLOW memory.

13 Hits and misses Define a hit as the event in which the data requested by the processor is found in some block at the upper level of the memory hierarchy. A miss is the opposite case. The hit rate is a measure of success in accessing a cache.

14 More notation Hit rate: the fraction of accesses found in the upper level. Miss rate: 1 − hit rate. Hit time: the time to access the upper level. Miss penalty: the time to fetch from slow memory. Memory systems are critical to good performance.
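To make the notation concrete (the numbers here are hypothetical, chosen only for illustration), the average access time formula that appears later (slide 25) gives, for a 1-cycle hit time, a 5% miss rate and a 21-cycle miss penalty: Average Access Time = 1 × (1 − 0.05) + 21 × 0.05 = 0.95 + 1.05 = 2.0 cycles per access.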

15 Basics of caches How do we determine whether the data is in the cache? If the data is in the cache, how is it found? We only have information on: –the address of the data –how the cache is organized Direct-mapped cache: –the data can only be in one specific place

16 The data address is used to organize the cache storage strategy: –the byte bits locate a byte within a word –the block bits denote the word within a block –the index selects a row (location) in the cache –the tag identifies a block within a cache row The address bit fields, from most to least significant: Tag | Index | Block | Byte
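A minimal C sketch of how these fields could be carved out of an address (an illustration, not part of the original slides; the field widths are assumed values):

    #include <stdio.h>

    /* Hypothetical field widths: 2 byte bits (4-byte words), 2 block bits
       (4 words per block), 8 index bits (256 cache rows); the rest is tag. */
    enum { BYTE_BITS = 2, BLOCK_BITS = 2, INDEX_BITS = 8 };

    static unsigned field(unsigned addr, int shift, int width) {
        return (addr >> shift) & ((1u << width) - 1);   /* extract 'width' bits */
    }

    int main(void) {
        unsigned addr = 0x12345678;
        printf("byte  = %u\n",   field(addr, 0, BYTE_BITS));
        printf("block = %u\n",   field(addr, BYTE_BITS, BLOCK_BITS));
        printf("index = 0x%x\n", field(addr, BYTE_BITS + BLOCK_BITS, INDEX_BITS));
        printf("tag   = 0x%x\n", addr >> (BYTE_BITS + BLOCK_BITS + INDEX_BITS));
        return 0;
    }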

17 Example: a 24-bit address with an 8-byte block and 2048 blocks, giving a cache of 16,384 bytes (2048 × 8). The fields are then: byte offset within the block, 3 bits; index, 11 bits; tag, 24 − 11 − 3 = 10 bits.

18 Bit fields for a 4-byte word in a 32-bit address with 2^b words per block

    Field       | Address bits | Usage
    Word field  | 0 : 3        | address bits within the word being accessed
    Block field | 4 : 4+b-1    | identifies the word within the block; the field can be empty
    Set field   | no bits      |
    Tag field   | 4+b : 31     | unique identifier for a block on its row

19 Example of a direct-mapped cache The example shows address entries that map to the same location in the cache, for one byte per word, one word per block, and one block per row. [Figure: address fields Tag | Index | Block | Byte; an 8-entry cache indexed by the low-order address bits] With 8 cache entries, data is mapped by address modulo 8.
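The mapping can be checked with a few lines of C (illustrative only): addresses 1, 9, 17 and 25 differ only above the low three bits, so all of them land on cache row 1 and evict each other.

    #include <stdio.h>

    int main(void) {
        /* With 8 cache entries, index = address mod 8, so addresses
           1, 9, 17 and 25 all compete for cache row 1. */
        unsigned addrs[] = { 1, 9, 17, 25 };
        for (int i = 0; i < 4; i++)
            printf("address %2u -> cache index %u\n", addrs[i], addrs[i] % 8);
        return 0;
    }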

20 Contents of a direct-mapped cache Data == the cached block TAG == the most significant bits of the cached block's address, which distinguish the block in that cache row from other blocks that map to the same row VALID == a flag bit indicating that the cache content is valid
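These contents can be pictured as a C structure (a sketch, with a hypothetical 8-byte block size):

    /* One row of a direct-mapped cache (sizes hypothetical). */
    #define BLOCK_BYTES 8

    struct cache_row {
        int           valid;              /* VALID flag bit */
        unsigned      tag;                /* TAG: upper bits of the cached block's address */
        unsigned char data[BLOCK_BYTES];  /* Data: the cached block itself */
    };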

21 Direct-mapped cache Separate the address into fields: –byte offset within the word –index for the row of the cache –tag identifier of the block A cache of 2^n words, a block being one 4-byte word, has 2^n*(63-n) bits in total for a 32-bit address: #rows = 2^n #bits/row = 32 (data) + (32 − n − 2) (tag) + 1 (valid) = 63 − n
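A worked instance of this formula (not from the slides): for n = 10, the cache holds 2^10 = 1024 one-word blocks; each row stores 32 data bits + (32 − 10 − 2) = 20 tag bits + 1 valid bit = 53 bits, so the whole cache needs 2^10 × 53 = 54,272 bits, i.e. about 6.6 KB of storage to cache 4 KB of data.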

22 Reading: Hits and Misses A hit requires no special handling; the data is available. Instruction-fetch cache miss: –Stall the pipeline, send the PC to memory, and fetch the block. Re-fetch the instruction once the miss has been serviced. –Data fetches are handled the same way.

23 Multi-word Blocks [Figure: a direct-mapped cache with four-word (128-bit) blocks. The 32-bit address (showing bit positions) splits into a 16-bit tag, a 12-bit index selecting one of 4K entries, a 2-bit block offset, and a byte offset; each entry holds a valid bit (V), a 16-bit tag, and 128 bits of data, and a mux driven by the block offset selects the requested word. Hit is asserted when the stored tag matches and the valid bit is set.]

24 Miss Rates vs. Block Size

25 Block Size Tradeoff In general, a larger block size takes advantage of spatial locality, BUT: –a larger block size means a larger miss penalty: it takes longer to fill the block –if the block size is too big relative to the cache size, the miss rate goes up: too few cache blocks In general, Average Access Time = Hit Time × (1 − Miss Rate) + Miss Penalty × Miss Rate [Figure: three curves as a function of block size: the miss penalty grows with block size; the miss rate first falls, exploiting spatial locality, and then rises when too few blocks compromise temporal locality; the average access time therefore falls and then rises again with the increased miss penalty and miss rate]

26 Example: 1 KB Direct-Mapped Cache with 32-Byte Blocks For a 2^N-byte cache: –the uppermost (32 − N) bits are always the Cache Tag –the lowest M bits are the Byte Select (Block Size = 2^M) [Figure: the address splits into a Cache Tag (example: 0x50), a Cache Index (ex: 0x01) and a Byte Select (ex: 0x00), with field boundaries at bits 9 and 4; the tag is stored as part of the cache "state" alongside the valid bit, and each row holds one 32-byte block (bytes 0-31, 32-63, ..., 992-1023)]
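As a check on the example values in the figure (the address itself is constructed for illustration): here N = 10 and M = 5, so the tag occupies bits 31:10, the index bits 9:5 and the byte select bits 4:0. The address 0x14020 then splits into tag 0x50, cache index 0x01 and byte select 0x00, matching the values shown.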

27 Extreme Example: a single big line Cache Size = 4 bytes, Block Size = 4 bytes –Only ONE entry in the cache If an item is accessed, it is likely to be accessed again soon –But it is unlikely to be accessed again immediately!!! –The next access will likely be a miss again –The cache continually loads data but discards (forces out) it before it is used again The worst nightmare of a cache designer: the Ping-Pong Effect Conflict misses are misses caused by: –different memory locations mapped to the same cache index Solution 1: make the cache bigger Solution 2: multiple entries for the same cache index [Figure: a single cache line holding a valid bit, a cache tag, and data bytes 0-3]
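The ping-pong effect is easy to reproduce with a toy model (a sketch, not from the slides): a one-entry cache with a 4-byte line, accessed alternately at two different addresses, misses on every single access.

    #include <stdio.h>

    /* A one-entry (single-line) cache: every block competes for the same line. */
    static int      valid = 0;
    static unsigned cached_tag;

    static int access_cache(unsigned addr) {
        unsigned tag = addr / 4;              /* 4-byte line: tag = block address */
        int hit = valid && cached_tag == tag;
        valid = 1;
        cached_tag = tag;                     /* a miss evicts whatever was there */
        return hit;
    }

    int main(void) {
        for (int i = 0; i < 6; i++) {         /* alternate between two blocks */
            unsigned addr = (i % 2) ? 0x100 : 0x200;
            printf("access 0x%x: %s\n", addr, access_cache(addr) ? "hit" : "MISS");
        }
        return 0;
    }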

28 Another Extreme Example: Fully Associative A fully associative cache, N blocks of 32 bytes each: –Forget about the Cache Index –Compare the Cache Tags of all cache entries in parallel –Example: with 32-byte blocks, we need N 27-bit comparators By definition: Conflict Misses = 0 for a fully associative cache [Figure: every entry holds a valid bit, a 27-bit cache tag, and a 32-byte block (bytes 0-31, 32-63, ...); the incoming cache tag is compared against every stored tag at once, one comparator per entry, and only the byte select (ex: 0x01) is decoded from the low address bits]

29 A Two-way Set-Associative Cache N-way set associative: N entries for each cache index –N direct-mapped caches operating in parallel Example: a two-way set-associative cache –the cache index selects a "set" from the cache –the two tags in the set are compared in parallel –data is selected based on the tag comparison result [Figure: two banks of (valid, tag, data) entries share one cache index; both stored tags are compared against the address tag, the comparison results are ORed to form Hit, and a mux driven by the select signals (Sel0, Sel1) picks the cache block from the matching way]
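A toy C model of the lookup (a sketch under assumed parameters: 4 sets, 2 ways, 16-byte blocks, LRU replacement; not part of the original slides):

    #include <stdio.h>

    enum { SETS = 4, WAYS = 2, BLOCK_BYTES = 16 };   /* hypothetical geometry */

    struct line { int valid; unsigned tag; };
    struct set  { struct line way[WAYS]; int lru; }; /* lru = least-recently-used way */

    static struct set cache[SETS];

    static int access_cache(unsigned addr) {
        unsigned block = addr / BLOCK_BYTES;
        unsigned index = block % SETS;       /* the index bits select a set */
        unsigned tag   = block / SETS;       /* the remaining bits form the tag */
        struct set *s = &cache[index];

        for (int w = 0; w < WAYS; w++)       /* hardware compares both tags in parallel */
            if (s->way[w].valid && s->way[w].tag == tag) {
                s->lru = 1 - w;              /* the other way becomes LRU */
                return 1;                    /* hit */
            }

        int victim = s->lru;                 /* miss: replace the LRU way */
        s->way[victim].valid = 1;
        s->way[victim].tag   = tag;
        s->lru = 1 - victim;
        return 0;
    }

    int main(void) {
        /* 0x000 and 0x040 map to the same set but can now coexist. */
        unsigned addrs[] = { 0x000, 0x040, 0x000, 0x040 };
        for (int i = 0; i < 4; i++)
            printf("access 0x%03x: %s\n", addrs[i], access_cache(addrs[i]) ? "hit" : "MISS");
        return 0;
    }

Running it shows two blocks that would ping-pong in a direct-mapped cache coexisting in one set: the first two accesses miss, the last two hit.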

30 Disadvantage of Set-Associative Caches N-way set-associative cache versus direct-mapped cache: –N comparators vs. 1 –extra MUX delay for the data –the data arrives AFTER the hit/miss decision and set selection In a direct-mapped cache, the cache block is available BEFORE the hit/miss decision: –it is possible to assume a hit and continue, recovering later on a miss

31 Three Cs of Caches 1. Compulsory misses: cache misses caused by the first access to a block that has never been in the cache (also known as cold-start misses) 2. Capacity misses: cache misses caused when the cache cannot contain all the blocks needed during execution of a program; they occur when blocks are replaced and later retrieved 3. Conflict misses: cache misses that occur in set-associative or direct-mapped caches when multiple blocks compete for the same set; these are the misses in a direct-mapped or set-associative cache that would be eliminated in a fully associative cache of the same size (also called collision misses)

32 A Summary of the Sources of Cache Misses Compulsory (cold start or process migration; first reference): the first access to a block –a "cold" fact of life: not a whole lot you can do about it –note: if you are going to run billions of instructions, compulsory misses are insignificant Conflict (collision): –multiple memory locations mapped to the same cache location –Solution 1: increase the cache size –Solution 2: increase associativity Capacity: –the cache cannot contain all the blocks accessed by the program –Solution: increase the cache size Invalidation: another process (e.g., I/O) updates memory

33 Summary The Principle of Locality: –a program is likely to access a relatively small portion of the address space at any instant of time Temporal locality: locality in time Spatial locality: locality in space Three major categories of cache misses: –Compulsory misses: sad facts of life; example: cold-start misses –Conflict misses: increase cache size and/or associativity; nightmare scenario: the ping-pong effect! –Capacity misses: increase cache size Cache design space: –total size, block size, associativity –replacement policy –write-hit policy (write-through, write-back) –write-miss policy

34 Cache design parameters

    Design change          | Effect on miss rate                           | Possible negative performance effect
    Increase block size    | decreases miss rate due to compulsory misses  | may increase miss penalty
    Increase cache size    | decreases capacity misses                     | may increase access time
    Increase associativity | decreases miss rate due to conflict misses    | may increase access time