1 CMPE 421 Advanced Computer Architecture: Caching with Associativity, PART 2
2 Other Cache Organizations
Direct mapped: Address = Tag | Index | Block offset. The index selects exactly one entry (V | Tag | Data) out of the 16 shown (indexes 0-15), so each address has only one possible location.
Fully associative: Address = Tag | Block offset. There is no index; a block may live in any entry (V | Tag | Data).
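A minimal C sketch of the two lookups, assuming a 16-entry, one-word-block direct-mapped cache like the one in the diagram and a same-sized fully associative cache; the struct, the constants, and the function names are illustrative assumptions, not part of the slides.

#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES   16        /* 16 entries, as in the diagram             */
#define OFFSET_BITS 2         /* one-word (4-byte) blocks: 2 offset bits   */
#define INDEX_BITS  4         /* log2(16) index bits for the direct-mapped */

typedef struct { bool valid; uint32_t tag; uint32_t data; } Line;

/* Direct mapped: the index picks exactly one line, so one tag compare. */
bool dm_hit(Line cache[NUM_LINES], uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_LINES - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    return cache[index].valid && cache[index].tag == tag;
}

/* Fully associative: no index, so every line's tag must be compared. */
bool fa_hit(Line cache[NUM_LINES], uint32_t addr) {
    uint32_t tag = addr >> OFFSET_BITS;   /* tag is everything above the offset */
    for (int i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return true;
    return false;
}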
3 Fully Associative Cache
4 A Compromise
2-way set associative: Address = Tag | Index | Block offset. Each address has two possible locations with the same index (sets 0-7 in the diagram); one fewer index bit than direct mapped, so 1/2 the indexes.
4-way set associative: Address = Tag | Index | Block offset. Each address has four possible locations with the same index (sets 0-3 in the diagram); two fewer index bits, so 1/4 the indexes.
5 Range of Set Associative Caches
The address is divided into Tag | Index | Block offset | Byte offset: the tag is used for the tag compare, the index selects the set, and the block offset selects the word in the block.
Increasing associativity removes index bits and enlarges the tag, up to fully associative (only one set), where the tag is all the bits except the block and byte offset.
Decreasing associativity adds index bits and gives smaller tags, down to direct mapped (only one way).
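To make the shrinking index / growing tag concrete, here is a small C sketch that computes the field widths across the whole associativity range for a hypothetical cache (32-bit addresses, 4-byte words, 4-word blocks, 64 blocks total); all of these parameters are assumptions chosen for illustration.

#include <stdio.h>

int log2i(int n) { int b = 0; while (n > 1) { n >>= 1; b++; } return b; }

int main(void) {
    const int addr_bits = 32, total_blocks = 64;
    const int byte_offset_bits  = log2i(4);   /* 4 bytes per word  */
    const int block_offset_bits = log2i(4);   /* 4 words per block */

    /* ways = 1 is direct mapped; ways = total_blocks is fully associative */
    for (int ways = 1; ways <= total_blocks; ways *= 2) {
        int sets       = total_blocks / ways;
        int index_bits = log2i(sets);         /* selects the set */
        int tag_bits   = addr_bits - index_bits
                       - block_offset_bits - byte_offset_bits;
        printf("%2d-way: %2d sets, index %d bits, tag %d bits\n",
               ways, sets, index_bits, tag_bits);
    }
    return 0;
}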
6 Set Associative Cache (2 sets x 2 ways, one-word blocks)
Main memory word addresses run from 0000xx to 1111xx; the two low-order bits define the byte in the word (32-bit words). Each cache entry holds V | Tag | Data, arranged as Set 0 and Set 1, Way 0 and Way 1.
Q1: How do we find it? Use the next low-order memory address bit to determine which cache set, i.e. (block address) modulo (# of sets in the cache).
Q2: Is it there? Compare all the cache tags in the set to the high-order 3 memory address bits to tell if the memory block is in the cache.
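A sketch of Q1 and Q2's address arithmetic for the slide's two-set cache, assuming one-word blocks and the 4-bit block addresses 0000-1111 shown; the loop and printf are just for illustration.

#include <stdio.h>

int main(void) {
    const int num_sets = 2;                  /* 2 sets x 2 ways, one-word blocks */
    for (int block_addr = 0; block_addr < 16; block_addr++) {
        int set = block_addr % num_sets;     /* Q1: (block address) mod (# sets) */
        int tag = block_addr / num_sets;     /* Q2: the high-order 3 bits to compare */
        printf("block %2d -> set %d, tag %d\n", block_addr, set, tag);
    }
    return 0;
}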
7 Set Associative Cache Organization FIGURE 7.17 The implementation of a four-way set-associative cache requires four comparators and a 4-to-1 multiplexor. The comparators determine which element of the selected set (if any) matches the tag. The output of the comparators is used to select the data from one of the four blocks of the indexed set, using a multiplexor with a decoded select signal. In some implementations, the Output enable signals on the data portions of the cache RAMs can be used to select the entry in the set that drives the output. The Output enable signal comes from the comparators, causing the element that matches to drive the data outputs.
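In software terms the figure's datapath behaves roughly like the C sketch below: the loop stands in for the four comparators and the indexed read of the matching way stands in for the 4-to-1 multiplexor. The struct layout, the number of sets, and the block size are illustrative assumptions, not the book's implementation.

#include <stdbool.h>
#include <stdint.h>

#define WAYS 4
#define SETS 256                 /* assumed number of sets, for illustration */

typedef struct { bool valid; uint32_t tag; uint32_t data[8]; } Block;
typedef struct { Block way[WAYS]; } Set;

/* Returns true on a hit and copies out the selected word. */
bool lookup(Set cache[SETS], uint32_t index, uint32_t tag,
            uint32_t word_in_block, uint32_t *data_out) {
    Set *s = &cache[index];                     /* the index selects the set    */
    for (int w = 0; w < WAYS; w++) {            /* "four comparators"           */
        if (s->way[w].valid && s->way[w].tag == tag) {
            *data_out = s->way[w].data[word_in_block];  /* "4-to-1 multiplexor" */
            return true;
        }
    }
    return false;                               /* miss: no comparator matched  */
}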
8 Remember the Example for Direct Mapping (ping-pong effect)
Consider the main memory word reference string 0 4 0 4 0 4 0 4, starting with an empty cache (all blocks initially marked as not valid). Words 0 and 4 map to the same cache block, so every reference misses: the line alternately holds Mem(0) (tag 00) and Mem(4) (tag 01), each access evicting the other.
Ping-pong effect due to conflict misses: two memory locations that map into the same cache block. 8 requests, 8 misses.
9 Solution: Use a Set Associative Cache
For the same reference string 0 4 0 4 0 4 0 4, again starting with an empty cache (all blocks initially marked as not valid), a 2-way set associative cache misses only on the first access to word 0 (tag 000) and the first access to word 4 (tag 010); after that both blocks sit in the two ways of the same set and every later reference hits.
This solves the ping-pong effect of the direct mapped cache due to conflict misses, since two memory locations that map into the same cache set can now co-exist. 8 requests, 2 misses.
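The two slides above can be replayed in a few lines of C. This sketch assumes a four-block direct-mapped cache versus a 2-set x 2-way cache of the same total size, which is enough to reproduce the 8-miss and 2-miss counts for the reference string 0 4 0 4 0 4 0 4; it is a simulation sketch, not the lecture's own code.

#include <stdio.h>

int main(void) {
    int trace[] = {0, 4, 0, 4, 0, 4, 0, 4};       /* word reference string      */
    int n = 8;

    /* Direct mapped: 4 one-word blocks, index = address mod 4. */
    int dm_tag[4], dm_valid[4] = {0}, dm_miss = 0;
    for (int i = 0; i < n; i++) {
        int idx = trace[i] % 4, tag = trace[i] / 4;
        if (!dm_valid[idx] || dm_tag[idx] != tag) {    /* conflict miss: evict  */
            dm_miss++; dm_valid[idx] = 1; dm_tag[idx] = tag;
        }
    }

    /* 2-way set associative: 2 sets x 2 ways, index = address mod 2. */
    int sa_tag[2][2], sa_valid[2][2] = {{0}}, sa_miss = 0;
    for (int i = 0; i < n; i++) {
        int set = trace[i] % 2, tag = trace[i] / 2, hit = 0;
        for (int w = 0; w < 2; w++)
            if (sa_valid[set][w] && sa_tag[set][w] == tag) hit = 1;
        if (!hit) {                                /* fill the first free way    */
            int w = sa_valid[set][0] ? 1 : 0;      /* (sufficient for this trace) */
            sa_miss++; sa_valid[set][w] = 1; sa_tag[set][w] = tag;
        }
    }

    printf("direct mapped: %d misses, 2-way: %d misses (8 requests)\n",
           dm_miss, sa_miss);
    return 0;
}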
10 Set Associative Example
Address fields (10-bit addresses): Tag (3-5 bits) | Index (1-3 bits) | Block offset (2 bits) | Byte offset (2 bits).
Reference string: 0100111000, 1100110100, 0100111100, 0110110000, 1100111000.
Direct-mapped (8 indexes, 000-111): every reference maps to index 011, with tags 010, 110, 010, 011, 110, so the outcomes are miss, miss, miss, miss, miss.
2-way set associative (4 sets, 00-11): every reference maps to set 11, with tags 0100, 1100, 0100, 0110, 1100, so with LRU replacement the outcomes are miss, miss, hit, miss, miss.
4-way set associative (2 sets, 0-1): every reference maps to set 1, with tags 01001, 11001, 01001, 01101, 11001, so the outcomes are miss, miss, hit, miss, hit.
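The collisions in this example follow directly from how the 10-bit addresses split up. The C sketch below only decodes the five references for each organization (2-bit byte offset, 2-bit block offset, and 3/2/1 index bits for direct-mapped, 2-way, and 4-way); the hex constants are the slide's binary addresses rewritten, and the output format is an illustrative choice.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* The five 10-bit references from the slide, written in hex:            */
    /* 0100111000, 1100110100, 0100111100, 0110110000, 1100111000            */
    uint32_t refs[5] = {0x138, 0x334, 0x13C, 0x1B0, 0x338};
    int index_bits[3] = {3, 2, 1};              /* direct-mapped, 2-way, 4-way */
    const char *name[3] = {"direct-mapped", "2-way", "4-way"};

    for (int c = 0; c < 3; c++) {
        printf("%s:\n", name[c]);
        for (int i = 0; i < 5; i++) {
            uint32_t block_addr = refs[i] >> 4;            /* drop 2+2 offset bits */
            uint32_t index = block_addr & ((1u << index_bits[c]) - 1);
            uint32_t tag   = block_addr >> index_bits[c];
            printf("  addr 0x%03X -> index %u, tag 0x%X\n",
                   (unsigned)refs[i], (unsigned)index, (unsigned)tag);
        }
    }
    return 0;
}

Every reference lands in the same index within each organization, which is exactly why the direct-mapped cache thrashes while the wider sets absorb more of the conflicts.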
11 New Performance Numbers
Miss rates for the DEC 3100 (MIPS machine), separate 64KB instruction/data caches:
Benchmark  Associativity  Instruction miss rate  Data miss rate  Combined miss rate
gcc        Direct         2.0%                   1.7%            1.9%
gcc        2-way          1.6%                   1.4%            1.5%
gcc        4-way          1.6%                   1.4%            1.5%
spice      Direct         0.3%                   0.6%            0.4%
spice      2-way          0.3%                   0.6%            0.4%
spice      4-way          0.3%                   0.6%            0.4%
12 Benefits of Set Associative Caches
The choice between direct mapped and set associative depends on the cost of a miss versus the cost of implementation. The largest gains come from going from direct mapped to 2-way (a 20%+ reduction in miss rate). Data from Hennessy & Patterson, Computer Architecture, 2003.
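As a rough illustration of the 20%+ figure, using the DEC 3100 gcc numbers from the previous slide rather than the Hennessy & Patterson 2003 data the statement refers to: going from direct mapped to 2-way drops the combined miss rate from 1.9% to 1.5%, a relative reduction of (1.9 - 1.5) / 1.9, roughly 21%, while going from 2-way to 4-way gains nothing further for that workload.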
13 Virtual Memory (32-bit system): 8KB page size, 16MB physical memory
The page table holds one entry per virtual page: 4GB / 8KB = 512K entries (2^19 = 512K), indexed by the virtual page number, each holding a valid bit and either a physical page number or a disk address.
Virtual address (bits 31-0): the page offset is bits 12-0 (13 bits); the virtual page number is bits 31-13 (19 bits) and is used as the index into the page table.
Physical address (bits 23-0): the page offset is bits 12-0; the physical page number is bits 23-13 (11 bits, since 16MB of memory holds 2K pages of 8KB).
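A small C sketch of the arithmetic behind these numbers, using the slide's parameters (32-bit virtual addresses, 8KB pages, 16MB of physical memory); the helper log2i and the printf layout are illustrative.

#include <stdio.h>

int log2i(unsigned long long n) { int b = 0; while (n > 1) { n >>= 1; b++; } return b; }

int main(void) {
    const int va_bits = 32;
    const unsigned long long page_size = 8ULL << 10;       /* 8KB  */
    const unsigned long long phys_mem  = 16ULL << 20;      /* 16MB */

    int offset_bits = log2i(page_size);                    /* 13           */
    int vpn_bits    = va_bits - offset_bits;               /* 19           */
    int ppn_bits    = log2i(phys_mem) - offset_bits;       /* 24 - 13 = 11 */
    unsigned long long pt_entries = (1ULL << va_bits) / page_size;  /* 4GB / 8KB */

    printf("page offset: %d bits, VPN: %d bits, PPN: %d bits\n",
           offset_bits, vpn_bits, ppn_bits);
    printf("page table entries: %llu (= 2^%d = 512K)\n", pt_entries, vpn_bits);
    return 0;
}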
14 Virtual Memory Example
Page Table for a system with a 20-bit virtual address, 16KB pages, and 256KB of physical memory: the page offset takes 14 bits, leaving 6 bits for the V.P.N. and 4 bits for the P.P.N.
Virtual Page # (index)  Valid bit  Physical Page # / disk address
000000                  1          1001
000001                  0          sector 5000...
000010                  1          0010
000011                  0          sector 4323...
000100                  1          1011
000101                  1          1010
000110                  0          sector 1239...
000111                  1          0001
Access to 0000 1000 1100 1010 1010: V.P.N. = 000010 is valid with PPN = 0010, so the physical address is 00 1000 1100 1010 1010.
Access to 0001 1001 0011 1100 0000: V.P.N. = 000110 is not valid, so this is a page fault to sector 1239... Pick a page to "kick out" of memory (use LRU); assume the LRU page is VPN 000101 for this example. Read the data from sector 1239 into PPN 1010, mark VPN 000101 invalid (its entry now holds a disk address, sector xxxx...), and mark VPN 000110 valid with PPN 1010.
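A sketch of this translation and page-fault path in C, using the slide's field widths (6-bit VPN, 14-bit page offset, 4-bit PPN). The PTE struct, the lru_vpn parameter, and the read_from_disk stand-in are illustrative assumptions; filling page_table with the slide's eight entries is assumed to have been done already, and writing the victim page back to disk is omitted (the slide only shows its entry becoming "sector xxxx...").

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 14                 /* 16KB pages                    */
#define NUM_VPAGES  64                 /* 20-bit VA -> 6-bit VPN        */

typedef struct {
    bool     valid;
    uint32_t ppn;                      /* used when valid               */
    uint32_t disk_sector;              /* used when not valid           */
} PTE;

static PTE page_table[NUM_VPAGES];

/* Illustrative stand-in for the disk read performed on a page fault. */
static void read_from_disk(uint32_t sector, uint32_t ppn) {
    printf("read sector %u into PPN %u\n", sector, ppn);
}

/* Translate a 20-bit virtual address; lru_vpn names the page to evict on a fault. */
uint32_t translate(uint32_t va, uint32_t lru_vpn) {
    uint32_t vpn    = va >> OFFSET_BITS;
    uint32_t offset = va & ((1u << OFFSET_BITS) - 1);
    PTE *pte = &page_table[vpn];

    if (!pte->valid) {                              /* page fault               */
        PTE *victim = &page_table[lru_vpn];
        uint32_t free_ppn = victim->ppn;            /* reuse the victim's frame */
        victim->valid = false;                      /* victim now lives on disk */
        read_from_disk(pte->disk_sector, free_ppn);
        pte->valid = true;
        pte->ppn   = free_ppn;
    }
    return (pte->ppn << OFFSET_BITS) | offset;      /* PPN concatenated with offset */
}

For the slide's first access (VPN 000010, valid, PPN 0010) the function simply concatenates the PPN and the 14-bit offset; for the second access (VPN 000110, not valid) it takes the fault path, evicts the assumed-LRU page VPN 000101, reuses its frame PPN 1010, and reads sector 1239 in before returning the physical address.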