
Associative Mapping: A main memory block can load into any line of the cache. The memory address is interpreted as a tag and a word. The tag uniquely identifies a block of memory, so every line's tag is examined for a match. Cache searching gets expensive: every line's tag must be examined simultaneously.

Fully Associative Cache Organization

Associative Mapping Address Structure Example: a 24-bit address split into a 22-bit tag and a 2-bit word field. The 22-bit tag is stored with each 32-bit block of data. Compare the tag field of the address with each tag entry in the cache to check for a hit. The least significant 2 bits of the address identify which byte is required from the 32-bit data block.

Associative Mapping Summary: Address length = (s + w) bits. Number of addressable units = 2^(s+w) words or bytes. Block size = line size = 2^w words or bytes. Number of blocks in main memory = 2^(s+w) / 2^w = 2^s. Number of lines in cache = undetermined (not derived from the address). Size of tag = s bits.

Associative Mapping Example: Consider how an access to memory location (A035F014)₁₆ is mapped to the cache for a 2^32-word memory. The memory is divided into 2^27 blocks of 2^5 = 32 words per block, and the cache consists of 2^14 slots:

If the addressed word is in the cache, it will be found in word (14)₁₆ of a slot that has tag (501AF80)₁₆, which is made up of the 27 most significant bits of the address. If the addressed word is not in the cache, then the block corresponding to tag field (501AF80)₁₆ is brought into an available slot in the cache from main memory, and the memory reference is then satisfied from the cache.
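As a quick check, the field split above can be reproduced with shifts and masks. This is a sketch, not part of the original slides; the variable names are chosen here for illustration:

```python
# Sketch: decoding the example address under fully associative mapping.
# Assumes a 32-bit word address with a 5-bit word-in-block field and a 27-bit tag.
ADDRESS = 0xA035F014
WORD_BITS = 5

word = ADDRESS & ((1 << WORD_BITS) - 1)  # low 5 bits: word within the block
tag = ADDRESS >> WORD_BITS               # remaining 27 bits: the tag

print(hex(tag), hex(word))  # 0x501af80 0x14
```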

Associative Mapping Pros & Cons: Advantage – flexible (a block can go into any line). Disadvantages – cost, and the complex circuitry needed for simultaneous comparison of all tags.

Set Associative Mapping: A compromise between the previous two. The cache is divided into v sets of k lines each – m = v × k, where m is the number of lines – i = j mod v, where i is the cache set number and j is the memory block number. A given block maps to any line in a given set. This is a k-way set associative cache; 2-way and 4-way are common.

Set Associative Mapping Example: m = 16 lines, v = 8 sets, so k = 2 lines/set (2-way set associative mapping). Assume 32 blocks in memory, i = j mod v:
– Set 0: blocks 0, 8, 16, 24
– Set 1: blocks 1, 9, 17, 25
– …
– Set 7: blocks 7, 15, 23, 31
A given block can be in one of the 2 lines in only one set – e.g., block 17 can be assigned to either line 0 or line 1 in set 1.

Set Associative Mapping Address Structure: Tag (s − d) bits | Set d bits | Word w bits. The d set bits specify one of v = 2^d sets; the s bits of the tag and set fields together specify one of 2^s blocks. Use the set field to determine which cache set to look in, then compare the tag field simultaneously against the tags in that set to see if we have a hit.
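The three-field split can be sketched in Python. This is a hedged illustration: the helper name is invented here, and the field widths in the example are taken from the worked set-associative example later in these slides (13-bit set, 5-bit word):

```python
def split_address(addr, d_bits, w_bits):
    """Split an address into (tag, set, word) fields for set-associative mapping."""
    word = addr & ((1 << w_bits) - 1)                    # low w bits
    set_index = (addr >> w_bits) & ((1 << d_bits) - 1)   # next d bits
    tag = addr >> (w_bits + d_bits)                      # remaining (s - d) bits
    return tag, set_index, word

tag, set_index, word = split_address(0xA035F014, 13, 5)
print(hex(tag), hex(set_index), hex(word))  # 0x280d 0xf80 0x14
```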

k-Way Set Associative Cache Organization

Set Associative Mapping Summary: Address length = (s + w) bits. Number of addressable units = 2^(s+w) words or bytes. Block size = line size = 2^w words or bytes. Number of blocks in main memory = 2^s. Number of lines in a set = k. Number of sets = v = 2^d. Number of lines in cache = k × v = k × 2^d. Size of tag = (s − d) bits.

Remarks: Why is the simultaneous comparison cheaper here, compared to associative mapping? The tag is much smaller, and only the k tags within one set are compared. Relationship to the first two schemes: they are the extreme cases of set associative mapping – k = 1, v = m gives direct mapping (1 line/set); k = m, v = 1 gives associative mapping (one big set).

An Associative Mapping Scheme for a Cache Memory

Set-Associative Mapping Example: Consider how an access to memory location (A035F014)₁₆ is mapped to the cache for a 2^32-word memory. The memory is divided into 2^27 blocks of 2^5 = 32 words per block, there are two blocks per set, and the cache consists of 2^14 slots:

The leftmost 14 bits form the tag field, followed by 13 bits for the set field, followed by five bits for the word field.

Replacement Algorithms (1) – Direct Mapping: When a new block is brought into the cache, one of the existing blocks must be replaced. With direct mapping there is no choice: each block maps to exactly one line, so that line is replaced.

Replacement Policies: When there are no available slots in which to place a block, a replacement policy is implemented. The replacement policy governs the choice of which slot is freed up for the new block. Common policies: Least recently used (LRU); First-in/first-out (FIFO); Least frequently used (LFU); Random; Optimal (used for analysis only – look backward in time and reverse-engineer the best possible strategy for a particular sequence of memory references).

Replacement Algorithms (2) – Associative & Set Associative: The algorithm is implemented in hardware, for speed. Least recently used (LRU) – e.g., in a 2-way set associative cache, which of the 2 blocks is LRU? First-in first-out (FIFO) – replace the block that has been in the cache longest. Least frequently used (LFU) – replace the block that has had the fewest hits. Random.
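As an illustration (a Python sketch with invented names, not taken from the slides), LRU within a single set can be modeled with an ordered list whose front is the least recently used block:

```python
def lru_access(lru, block, ways):
    """Touch `block` in a set tracked as a list (front = least recently used).
    Returns the evicted block on a miss from a full set, else None."""
    if block in lru:          # hit: refresh recency
        lru.remove(block)
        lru.append(block)
        return None
    evicted = lru.pop(0) if len(lru) == ways else None  # evict LRU if full
    lru.append(block)
    return evicted

set_state = []
evictions = [lru_access(set_state, b, 2) for b in (1, 2, 1, 3, 2)]
print(evictions)  # [None, None, None, 2, 1]
```

FIFO differs only in that a hit does not refresh the block's position in the list.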

Write Policy: A cache block must not be overwritten unless main memory is up to date. Complications: multiple CPUs may have individual caches, and I/O may address main memory directly.

Write Through: All writes go to main memory as well as the cache, so both copies always agree. Multiple CPUs can monitor main memory traffic to keep their local caches up to date. Disadvantage – lots of memory traffic, which can become a bottleneck.

Write Back: Updates are initially made in the cache only. An update bit for the cache slot is set when an update occurs. If a block is to be replaced, it is written to main memory only if the update bit is set, i.e., only if at least one word in the cache line has been updated. Disadvantages – other caches get out of sync, and I/O must access main memory through the cache. N.B. About 15% of memory references are writes.

Block Size: Block size = line size. As block size increases from very small, the hit ratio increases because of the principle of locality. As block size becomes very large, the hit ratio decreases, because the number of blocks decreases and the probability of referencing all words in a block decreases.

Cache Read and Write Policies

Hit Ratios and Effective Access Times: Hit ratio and effective access time for a single-level cache:

Hit ratios and effective access time for a multi-level cache:

Performance Example (1): Assume a 2-level memory system. Level 1: access time T1. Level 2: access time T2. Hit ratio H: fraction of time a reference can be found in level 1. Average access time:
T_ave = prob(found in level 1) × T(found in level 1) + prob(not found in level 1) × T(not found in level 1)
     = H × T1 + (1 − H) × (T1 + T2)
     = T1 + (1 − H) × T2

Performance Example (2): Assume a 2-level memory system. Level 1: access time T1 = 1 μs. Level 2: access time T2 = 10 μs. Hit ratio H = 95%. Average access time T_ave = ?

Performance Example (2), solution: T_ave = H × T1 + (1 − H) × (T1 + T2) = 0.95 × 1 + (1 − 0.95) × (1 + 10) = 0.95 + 0.05 × 11 = 1.5 μs
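The slide's formula can be sketched as a one-line Python helper (the function name is chosen here for illustration):

```python
def average_access_time(t1, t2, hit_ratio):
    """T_ave = H*T1 + (1 - H)*(T1 + T2) = T1 + (1 - H)*T2."""
    return t1 + (1 - hit_ratio) * t2

print(average_access_time(1, 10, 0.95))  # ~1.5 microseconds
```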

Performance Example (3): A higher hit ratio gives better performance.

Exercise: A CPU has a level 1 cache and a level 2 cache, with access times of 5 ns and 10 ns respectively. The main memory access time is 50 ns. If 20% of the accesses are level 1 cache hits and 60% are level 2 cache hits, what is the average access time?

Exercise: Assuming a cache of 4K blocks, a four-word block size, and a 32-bit address, find the total number of sets and the total number of tag bits for caches that are direct mapped, two-way and four-way set associative, and fully associative.

Solution: Since there are 16 (= 2^4) bytes per block, a 32-bit address leaves 32 − 4 = 28 bits for index and tag. 1) Direct mapped: same number of sets as blocks, so 12 index bits (since 4K = 2^12); the total number of tag bits is (28 − 12) × 4K = 64 Kbits.

Continued: 2) Two-way set associative: there are 2K sets (11 index bits), and the total number of tag bits is (28 − 11) × 2 × 2K = 34 × 2K = 68 Kbits. 3) Four-way set associative: there are 1K sets (10 index bits); total tag bits: (28 − 10) × 4 × 1K = 72 Kbits. 4) Fully associative: only one set with 4K blocks, and the tag is 28 bits, so the total is 28 × 4K × 1 = 112 Kbits.
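These four cases follow one pattern, sketched below in Python (the helper name is invented here; the widths follow the exercise's assumptions):

```python
# Sketch: tag-bit totals for a 4K-block cache with 4-word blocks and
# 32-bit addresses (28 bits remain after the 4-bit block offset).
def total_tag_bits(blocks, ways, addr_bits=28):
    sets = blocks // ways
    index_bits = sets.bit_length() - 1   # log2(sets); sets is a power of two
    return (addr_bits - index_bits) * blocks

blocks = 4 * 1024
for ways in (1, 2, 4, blocks):  # direct, 2-way, 4-way, fully associative
    print(ways, total_tag_bits(blocks, ways))
```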

Exercise: Consider a 3-way set-associative write-through cache with an 8-byte block size, 128 sets, and random replacement. Assume a 32-bit address. How big is the cache (in bytes)? How many bits are there in the cache in total (i.e., for data, tags, etc.)?

Solution: The cache size is 8 × 3 × 128 = 3072 bytes. Each line has 8 bytes × 8 bits/byte = 64 bits of data. The tag is 22 bits (of the 32 address bits, 2 byte-offset bits are ignored, 1 selects the word within the block, and 7 form the index). Each line has a valid bit (no dirty bit, because the cache is write-through). Each line therefore has 64 + 22 + 1 = 87 bits. Each set has 3 × 87 = 261 bits. There are no extra bits for random replacement. The total number of bits is 128 × 261 = 33,408 bits.
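The arithmetic above can be checked with a short Python sketch (variable names chosen here for illustration):

```python
# Sketch: bit counts for the 3-way, 128-set, 8-byte-block write-through cache.
block_bytes, ways, sets, addr_bits = 8, 3, 128, 32

data_bits = block_bytes * 8                      # 64 data bits per line
offset_bits = (block_bytes - 1).bit_length()     # 3 block-offset bits
index_bits = (sets - 1).bit_length()             # 7 index bits
tag_bits = addr_bits - offset_bits - index_bits  # 22 tag bits
line_bits = data_bits + tag_bits + 1             # plus 1 valid bit = 87 bits/line
total_bits = line_bits * ways * sets

print(ways * sets * block_bytes, total_bits)  # 3072 bytes of data, 33408 bits
```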

Exercise: Assume there are three small caches, each consisting of four one-word blocks. One cache is direct mapped, a second is two-way set associative, and the third is fully associative. Find the number of misses for each cache organization given the following sequence of block addresses: 0, 8, 0, 6, 8.

Solution: Direct Mapped. Remember: i = j modulo m (m = 4), so block 0 → cache block (0 modulo 4) = 0, block 6 → (6 modulo 4) = 2, block 8 → (8 modulo 4) = 0.

Address | Hit or miss | Cache contents after the reference (blocks 0–3)
0       | miss        | 0: Memory[0]
8       | miss        | 0: Memory[8]
0       | miss        | 0: Memory[0]
6       | miss        | 0: Memory[0], 2: Memory[6]
8       | miss        | 0: Memory[8], 2: Memory[6]

The direct-mapped cache generates five misses for the five accesses.

Solution: 2-Way Set Associative. Remember: i = j modulo v (v = 2), so blocks 0, 6, and 8 all map to set (j modulo 2) = 0.

Address | Hit or miss | Cache contents after the reference
0       | miss        | set 0: Memory[0]
8       | miss        | set 0: Memory[0], Memory[8]
0       | hit         | set 0: Memory[0], Memory[8]
6       | miss        | set 0: Memory[0], Memory[6]
8       | miss        | set 0: Memory[8], Memory[6]

The 2-way set associative cache has 4 misses, one fewer than the direct-mapped cache.

Solution: Fully Associative. Remember: one single set of 4 blocks.

Address | Hit or miss | Cache contents after the reference (blocks 0–3)
0       | miss        | Memory[0]
8       | miss        | Memory[0], Memory[8]
0       | hit         | Memory[0], Memory[8]
6       | miss        | Memory[0], Memory[8], Memory[6]
8       | hit         | Memory[0], Memory[8], Memory[6]

The fully associative cache has the best performance, with only 3 misses.
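All three organizations can be simulated with one Python sketch (a hedged illustration; the function name is invented, and LRU replacement is assumed, matching the tables above):

```python
def count_misses(sequence, num_sets, ways):
    """Count misses for a cache of num_sets sets, `ways` lines per set, LRU."""
    sets = [[] for _ in range(num_sets)]  # per-set LRU lists (front = LRU)
    misses = 0
    for block in sequence:
        lru = sets[block % num_sets]      # i = j modulo v
        if block in lru:
            lru.remove(block)             # hit: refresh recency
        else:
            misses += 1
            if len(lru) == ways:
                lru.pop(0)                # evict the least recently used
        lru.append(block)
    return misses

seq = [0, 8, 0, 6, 8]
print(count_misses(seq, 4, 1),  # direct mapped: 5 misses
      count_misses(seq, 2, 2),  # 2-way set associative: 4 misses
      count_misses(seq, 1, 4))  # fully associative: 3 misses
```

Note that direct mapping falls out as the k = 1 case and fully associative as the v = 1 case, exactly as the earlier Remarks slide observes.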