Presentation on theme: "Associative Mapping" — Presentation transcript:

1 Associative Mapping
– A main memory block can load into any line of the cache
– The memory address is interpreted as a tag and a word field
– The tag uniquely identifies a block of main memory
– Every line's tag is examined for a match
– Cache searching gets expensive: every line's tag must be compared simultaneously
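The lookup described above can be sketched in a few lines of Python (illustrative names, not from the slides; the loop is a sequential stand-in for what the hardware does with parallel comparators):

```python
# Sketch of a fully associative cache lookup: the block can sit in any
# line, so every line's tag must be checked against the address's tag.
WORD_BITS = 2          # assumed 2-bit word field, as in the address-structure slide

def split_address(addr):
    """Split an address into (tag, word) for associative mapping."""
    return addr >> WORD_BITS, addr & ((1 << WORD_BITS) - 1)

def lookup(cache_lines, addr):
    """cache_lines: list of (tag, block) pairs. Returns block on hit, None on miss."""
    tag, _word = split_address(addr)
    for line_tag, block in cache_lines:   # sequential stand-in for parallel compare
        if line_tag == tag:
            return block
    return None

lines = [(0x3F, "block A"), (0x10, "block B")]
print(lookup(lines, (0x10 << 2) | 1))   # hit: prints "block B"
print(lookup(lines, (0x11 << 2) | 1))   # miss: prints "None"
```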

2 Fully Associative Cache Organization

3 Associative Mapping Address Structure
Address: | Tag (22 bits) | Word (2 bits) |
– Example: 24-bit address
– 22-bit tag stored with each 32-bit block of data
– Compare the address's tag field with each tag entry in the cache to check for a hit
– Least significant 2 bits of the address identify which byte is required from the 32-bit data block

4 Associative Mapping Summary
– Address length = (s + w) bits
– Number of addressable units = 2^(s+w) words or bytes
– Block size = line size = 2^w words or bytes
– Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
– Number of lines in cache = undetermined
– Size of tag = s bits

5 Associative Mapping Example
Consider how an access to memory location (A035F014)_16 is mapped to the cache for a 2^32-word memory. The memory is divided into 2^27 blocks of 2^5 = 32 words per block, and the cache consists of 2^14 slots.

6 If the addressed word is in the cache, it will be found in word (14)_16 of a slot that has tag (501AF80)_16, which is made up of the 27 most significant bits of the address. If the addressed word is not in the cache, the block corresponding to tag field (501AF80)_16 is brought into an available slot in the cache from main memory, and the memory reference is then satisfied from the cache.
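The slide's tag and word values can be recomputed directly from the address split (27-bit tag, 5-bit word field):

```python
# Recomputing the example: for a 2^32-word memory with 32-word blocks,
# the address splits into a 27-bit tag and a 5-bit word field.
addr = 0xA035F014
word = addr & 0x1F        # low 5 bits: word within the block
tag  = addr >> 5          # high 27 bits: block tag
print(hex(tag), hex(word))   # 0x501af80 0x14
```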

7 Associative Mapping Pros & Cons
Advantage
– Flexible: a block can go into any line
Disadvantages
– Cost
– Complex circuitry for the simultaneous tag comparison

8 Set Associative Mapping
Compromise between the previous two
– Cache is divided into v sets of k lines each
– m = v × k, where m = number of lines
– i = j mod v, where i = cache set number, j = memory block number
– A given block maps to any line in a given set
– k-way set associative cache: 2-way and 4-way are common

9 Set Associative Mapping Example
m = 16 lines, v = 8 sets → k = 2 lines/set (2-way set associative mapping)
Assume 32 blocks in memory, i = j mod v:
– set 0: blocks 0, 8, 16, 24
– set 1: blocks 1, 9, 17, 25
– :
– set 7: blocks 7, 15, 23, 31
A given block can be in one of the 2 lines of only one set; e.g., block 17 can be assigned to either line 0 or line 1 in set 1.
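The set table above follows directly from i = j mod v, as a one-liner shows:

```python
# The slide's mapping i = j mod v, with v = 8 sets and 32 memory blocks.
v = 8
sets = {i: [j for j in range(32) if j % v == i] for i in range(v)}
print(sets[1])   # [1, 9, 17, 25] -- block 17 lands in set 1
print(sets[7])   # [7, 15, 23, 31]
```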

10 Set Associative Mapping Address Structure
Address: | Tag (s − d bits) | Set (d bits) | Word (w bits) |
– d bits: v = 2^d, specify one of v sets
– s bits: specify one of 2^s blocks
– Use the set field to determine which cache set to look in
– Compare the tag fields simultaneously to see if we have a hit

11 k-Way Set Associative Cache Organization

12

13 Set Associative Mapping Summary
– Address length = (s + w) bits
– Number of addressable units = 2^(s+w) words or bytes
– Block size = line size = 2^w words or bytes
– Number of blocks in main memory = 2^s
– Number of lines per set = k
– Number of sets = v = 2^d
– Number of lines in cache = k × v = k × 2^d
– Size of tag = (s − d) bits

14 Remarks
Why is the simultaneous comparison cheaper here, compared to associative mapping?
– The tag is much smaller
– Only the k tags within one set are compared
Relationship to the first two schemes: both are extreme cases of set associative mapping
– k = 1 → v = m → direct mapping (1 line/set)
– k = m → v = 1 → associative mapping (one big set)

15 An Associative Mapping Scheme for a Cache Memory

16 Set-Associative Mapping Example
Consider how an access to memory location (A035F014)_16 is mapped to the cache for a 2^32-word memory. The memory is divided into 2^27 blocks of 2^5 = 32 words per block, there are two blocks per set, and the cache consists of 2^14 slots.

17 The leftmost 14 bits form the tag field, followed by 13 bits for the set field, followed by five bits for the word field.
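The 14/13/5 split (14 + 13 + 5 = 32) can be checked against the example address with a few bit operations:

```python
# Splitting the example address for the 2-way set associative cache:
# 14-bit tag, 13-bit set field, 5-bit word field.
addr = 0xA035F014
word = addr & 0x1F            # low 5 bits: word within the block
s    = (addr >> 5) & 0x1FFF   # next 13 bits: set number
tag  = addr >> 18             # top 14 bits: tag
print(hex(tag), hex(s), hex(word))   # 0x280d 0xf80 0x14
```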

18 Replacement Algorithms (1) — Direct Mapping
When a new block is brought into the cache, one of the existing blocks must be replaced.
Direct mapping leaves no choice:
– Each block maps to exactly one line
– Replace that line

19 Replacement Policies
When there are no available slots in which to place a block, a replacement policy is implemented. The replacement policy governs the choice of which slot is freed up for the new block.
– Least recently used (LRU)
– First-in/first-out (FIFO)
– Least frequently used (LFU)
– Random
– Optimal (used for analysis only: look backward in time and reverse-engineer the best possible strategy for a particular sequence of memory references)

20 Replacement Algorithms (2) — Associative & Set Associative
Hardware-implemented algorithms (for speed)
– Least recently used (LRU): e.g., in a 2-way set associative cache, which of the 2 blocks is least recently used?
– First-in/first-out (FIFO): replace the block that has been in the cache longest
– Least frequently used (LFU): replace the block that has had the fewest hits
– Random
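A minimal sketch of LRU within one k-way set (the class name and use of an ordered map are illustrative choices, not from the slides): hits move the line to the back of the order, and a miss in a full set evicts the front, least recently used, line.

```python
from collections import OrderedDict

# LRU within one k-way set: most recently used lines live at the end.
class LRUSet:
    def __init__(self, k):
        self.k = k
        self.lines = OrderedDict()   # tag -> block, least recently used first

    def access(self, tag):
        """Return True on hit, False on miss (filling/evicting on miss)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)     # refresh: now most recently used
            return True
        if len(self.lines) >= self.k:
            self.lines.popitem(last=False)  # evict the LRU line
        self.lines[tag] = object()          # placeholder for the block data
        return False

s = LRUSet(2)
hits = [s.access(t) for t in [0, 8, 0, 6, 8]]
print(hits)   # [False, False, True, False, False]
```

This reference stream is the one used in the exercise on slide 38, and the resulting 4 misses match the 2-way solution there.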

21 Write Policy
– Must not overwrite a cache block unless main memory is up to date
– Multiple CPUs may have individual caches
– I/O may address main memory directly

22 Write Through
– All writes go to main memory as well as the cache
– Both copies always agree
– Multiple CPUs can monitor main memory traffic to keep their local caches up to date
– Disadvantage: lots of memory traffic → bottleneck

23 Write Back
– Updates are initially made in the cache only
– An update (dirty) bit for the cache slot is set when an update occurs
– If a block is to be replaced, write it to main memory only if the update bit is set, i.e., only if at least one word in the cache line was updated
– Other caches can get out of sync
– I/O must access main memory through the cache
– N.B. about 15% of memory references are writes

24 Block Size
Block size = line size
– As block size increases from very small, the hit ratio increases because of the principle of locality
– As block size becomes very large, the hit ratio decreases, because the number of blocks decreases and the probability of referencing all words in a block decreases
– A block size of 4 to 8 addressable units is reasonable

25 Cache Read and Write Policies

26 Hit Ratios and Effective Access Times
Hit ratio and effective access time for a single-level cache:

27 Hit ratios and effective access time for a multi-level cache:

28 Performance Example (1)
Assume a 2-level memory system
– Level 1: access time T1
– Level 2: access time T2
– Hit ratio H: fraction of the time a reference can be found in level 1
Average access time:
T_ave = prob(found in level 1) × T(found in level 1) + prob(not found in level 1) × T(not found in level 1)
      = H × T1 + (1 − H) × (T1 + T2)
      = T1 + (1 − H) × T2

29 Performance Example (2)
Assume a 2-level memory system
– Level 1: access time T1 = 1 μs
– Level 2: access time T2 = 10 μs
– Hit ratio H = 95%
Average access time T_ave = ?

30 Performance Example (2)
Assume a 2-level memory system
– Level 1: access time T1 = 1 μs
– Level 2: access time T2 = 10 μs
– Hit ratio H = 95%
T_ave = H × T1 + (1 − H) × (T1 + T2) = 0.95 × 1 + (1 − 0.95) × (1 + 10) = 0.95 + 0.05 × 11 = 1.5 μs
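The slide's arithmetic, written out as a calculation (a miss is assumed to pay T1 + T2, since level 1 is checked first):

```python
# Two-level average access time from the slide's formula.
T1, T2, H = 1.0, 10.0, 0.95      # access times in microseconds, hit ratio
T_ave = H * T1 + (1 - H) * (T1 + T2)
print(T_ave)   # 1.5
```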

31 Performance Example (3)
Higher hit ratio → better performance

32 Exercise A CPU has level 1 cache and level 2 cache, with access times of 5 nsec and 10 nsec respectively. The main memory access time is 50 nsec. If 20% of the accesses are level 1 cache hits and 60% are level 2 cache hits, what is the average access time?
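The slides do not give a solution to this exercise; one possible reading (an assumption, consistent with the earlier formula where a miss pays the access times of every level tried) is that an L2 hit costs the L1 plus L2 times, and a full miss additionally pays the memory time:

```python
# Hedged worked answer: hierarchical access costs are assumed.
t_l1, t_l2, t_mem = 5, 10, 50        # access times in nsec
p_l1, p_l2, p_mem = 0.20, 0.60, 0.20 # 20% of accesses reach main memory
T_ave = (p_l1 * t_l1
         + p_l2 * (t_l1 + t_l2)
         + p_mem * (t_l1 + t_l2 + t_mem))
print(round(T_ave, 6))   # 23.0 nsec under this assumption
```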

33 Exercise Assuming a cache of 4K blocks, a four-word block size, and a 32-bit address, find the total number of sets and the total number of tag bits for caches that are direct mapped, two-way and four-way set associative, and fully associative.

34 Solution
Since there are 16 (= 2^4) bytes per block, a 32-bit address leaves 32 − 4 = 28 bits for the index and tag.
1) Direct-mapped: same number of sets as blocks, so 12 bits of index (since 4K = 2^12); the total number of tag bits is (28 − 12) × 4K = 64 Kbits.

35 Continued
2) Two-way set associative: there are 2K sets, and the total number of tag bits is (28 − 11) × 2 × 2K = 34 × 2K = 68 Kbits.
3) Four-way set associative: there are 1K sets; total number of tag bits: (28 − 10) × 4 × 1K = 72 Kbits.
4) Fully associative: only one set with 4K blocks, and the tag is 28 bits, so the total is 28 × 4K × 1 = 112 K tag bits.
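All four cases follow one formula (tag bits per line × lines), which a short helper (an illustrative name, not from the slides) makes explicit:

```python
import math

# Tag-bit totals for a 4K-block cache, 16-byte blocks, 32-bit address:
# 28 bits remain for index + tag; index width depends on the number of sets.
def tag_bits_total(blocks=4096, addr_bits=28, ways=1):
    sets = blocks // ways
    index_bits = int(math.log2(sets))
    return (addr_bits - index_bits) * ways * sets

for ways in (1, 2, 4, 4096):          # direct, 2-way, 4-way, fully associative
    print(ways, tag_bits_total(ways=ways))
# prints 65536, 69632, 73728, 114688 bits = 64K, 68K, 72K, 112K
```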

36 Exercise Consider a 3-way set-associative write-through cache with an 8-byte block size, 128 sets, and random replacement. Assume a 32-bit address. How big is the cache (in bytes)? How many bits total are there in the cache (i.e., for data, tags, etc.)?

37 Solution
The cache size is 8 × 3 × 128 = 3072 bytes. Each line has 8 bytes × 8 bits/byte = 64 bits of data. A tag is 22 bits (2 address bits select the byte within a word, 1 selects the word within the block, 7 form the index). Each line has a valid bit (no dirty bit, because the cache is write-through). Each line therefore has 64 + 22 + 1 = 87 bits, and each set has 3 × 87 = 261 bits. There are no extra bits for replacement, since replacement is random. The total number of bits is 128 × 261 = 33408 bits.
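The bit accounting above, recomputed step by step:

```python
# Bit count for the 3-way, 128-set, 8-byte-block write-through cache.
block_bits = 8 * 8            # 64 data bits per line
tag_bits   = 32 - 3 - 7       # 3-bit block offset (8 bytes), 7-bit index (128 sets)
valid_bit  = 1                # no dirty bit: write-through keeps memory current
line_bits  = block_bits + tag_bits + valid_bit
total_bits = 128 * 3 * line_bits
print(tag_bits, line_bits, total_bits)   # 22 87 33408
```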

38 Exercise Assume there are three small caches, each consisting of four one-word blocks. One cache is direct mapped, a second is two-way set associative, and the third is fully associative. Find the number of misses for each cache organization given the following sequence of block addresses: 0, 8, 0, 6, 8.

39 Solution: Direct-Mapped
Remember: i = j modulo m
Block address → cache block: 0 → (0 mod 4) = 0; 6 → (6 mod 4) = 2; 8 → (8 mod 4) = 0

Address | Hit/miss | Block 0   | Block 1 | Block 2   | Block 3
0       | miss     | Memory[0] |         |           |
8       | miss     | Memory[8] |         |           |
0       | miss     | Memory[0] |         |           |
6       | miss     | Memory[0] |         | Memory[6] |
8       | miss     | Memory[8] |         | Memory[6] |

The direct-mapped cache generates five misses for the five accesses.

40 Solution: 2-Way Set Associative
Remember: i = j modulo v
Block address → cache set: 0 → (0 mod 2) = 0; 6 → (6 mod 2) = 0; 8 → (8 mod 2) = 0

Address | Hit/miss | Set 0 (line 0) | Set 0 (line 1) | Set 1
0       | miss     | Memory[0]      |                |
8       | miss     | Memory[0]      | Memory[8]      |
0       | hit      | Memory[0]      | Memory[8]      |
6       | miss     | Memory[0]      | Memory[6]      |
8       | miss     | Memory[8]      | Memory[6]      |

The 2-way set associative cache has 4 misses, one fewer than the direct-mapped cache.

41 Solution: Fully Associative
Remember: one single set of 4 blocks

Address | Hit/miss | Block 0   | Block 1   | Block 2   | Block 3
0       | miss     | Memory[0] |           |           |
8       | miss     | Memory[0] | Memory[8] |           |
0       | hit      | Memory[0] | Memory[8] |           |
6       | miss     | Memory[0] | Memory[8] | Memory[6] |
8       | hit      | Memory[0] | Memory[8] | Memory[6] |

The fully associative cache has the best performance, with only 3 misses.
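All three solutions can be reproduced by one small simulator (a hedged sketch with illustrative names; LRU is assumed wherever a replacement choice exists, which matches the tables above):

```python
from collections import OrderedDict

# Count misses for the stream 0, 8, 0, 6, 8 under different organizations.
def misses(num_sets, ways, refs):
    sets = [OrderedDict() for _ in range(num_sets)]   # per-set LRU order
    count = 0
    for block in refs:
        s = sets[block % num_sets]        # i = j mod v
        if block in s:
            s.move_to_end(block)          # hit: refresh LRU order
        else:
            count += 1
            if len(s) >= ways:
                s.popitem(last=False)     # evict the least recently used line
            s[block] = True
    return count

refs = [0, 8, 0, 6, 8]
# direct-mapped (4 sets x 1), 2-way (2 sets x 2), fully associative (1 set x 4)
print(misses(4, 1, refs), misses(2, 2, refs), misses(1, 4, refs))   # 5 4 3
```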

