1 CMPE 421 Advanced Computer Architecture Caching with Associativity PART2
2 Other Cache organizations Direct Mapped 0: 1: 2 3: 4: 5: 6: 7: 8 9: 10: 11: 12: 13: 14: 15: VTagData Index Address = Tag | Index | Block offset Fully Associative No Index Address = Tag | Block offset Each address has only one possible location TagDataV
3 Fully Associative Cache
4 A Compromise 2-Way set associative Address = Tag | Index | Block offset 4-Way set associative Address = Tag | Index | Block offset 0: 1: 2: 3: 4: 5: 6: 7: VTagData Each address has two possible locations with the same index Each address has two possible locations with the same index One fewer index bit: 1/2 the indexes 0: 1: 2: 3: VTagData Each address has four possible locations with the same index Each address has four possible locations with the same index Two fewer index bits: 1/4 the indexes
5 Range of Set Associative Caches Block offsetByte offsetIndexTag Decreasing associativity Fully associative (only one set) Tag is all the bits except block and byte offset Direct mapped (only one way) Smaller tags Increasing associativity Selects the setUsed for tag compareSelects the word in the block
6 Set Associative Cache 0 Cache Main Memory Q1: How do we find it? Use next 1 low order memory address bit to determine which cache set (i.e., modulo the number of sets in the cache) TagData Q2: Is it there? Compare all the cache tags in the set to the high order 3 memory address bits to tell if the memory block is in the cache V 0000xx 0001xx 0010xx 0011xx 0100xx 0101xx 0110xx 0111xx 1000xx 1001xx 1010xx 1011xx 1100xx 1101xx 1110xx 1111xx Two low order bits define the byte in the word (32-b words) One word blocks Set Way 0 1 (block address) modulo (# set in the cache)
7 Set Associative Cache Organization FIGURE 7.17 The implementation of a four-way set-associative cache requires four comparators and a 4-to-1 multiplexor. The comparators determine which element of the selected set (if any) matches the tag. The output of the comparators is used to select the data from one of the four blocks of the indexed set, using a multiplexor with a decoded select signal. In some implementations, the Output enable signals on the data portions of the cache RAMs can be used to select the entry in the set that drives the output. The Output enable signal comes from the comparators, causing the element that matches to drive the data outputs.
8 Remember the Example for Direct Mapping (ping pong effect) Consider the main memory word reference string miss 00 Mem(0) Mem(4) Mem(0) Mem(0) Mem(0) Mem(4) Mem(4) 0 00 Start with an empty cache - all blocks initially marked as not valid Ping pong effect due to conflict misses - two memory locations that map into the same cache block l 8 requests, 8 misses
9 Solution: Use set associative cache 0404 Consider the main memory word reference string miss hit 000 Mem(0) Start with an empty cache - all blocks initially marked as not valid 010 Mem(4) 000 Mem(0) 010 Mem(4) Solves the ping pong effect in a direct mapped cache due to conflict misses since now two memory locations that map into the same cache set can co-exist! l 8 requests, 2 misses
10 Set Associative Example VTagData Index : 001: 010: 011: 100: 101: 110: 111: Miss Index VTagData : 01: 10: 11: VTagData Index : 1: Direct-Mapped2-Way Set Assoc.4-Way Set Assoc Miss Hit Miss Miss Hit Miss Hit Byte offset (2 bits) Block offset (2 bits) Index (1-3 bits) Tag (3-5 bits)
11 New Performance Numbers Miss rates for DEC 3100 (MIPS machine) spiceDirect0.3%0.6%0.4% gccDirect2.0%1.7%1.9% spice2-way0.3%0.6%0.4% gcc4-way1.6%1.4%1.5% BenchmarkAssociativityInstructionData missCombined ratemiss rate Separate 64KB Instruction/Data Caches gcc2-way1.6%1.4%1.5% spice4-way0.3%0.6%0.4%
12 Benefits of Set Associative Caches The choice of direct mapped or set associative depends on the cost of a miss versus the cost of implementation Data from Hennessy & Patterson, Computer Architecture, 2003 Largest gains are in going from direct mapped to 2-way (20%+ reduction in miss rate)
Virtual Memory (32-bit system): 8KB page size,16MB Mem Phys. Page # Disk Address Virt. Pg.# V K Index Virtual Address Page offset Physical Address 4GB / 8KB = 512K entries 2 19 =512K 11
Virtual memory example Virtual Page #ValidPhysical Page #/ (index)BitDisk address sector sector 4323… sector Page Table: System with 20-bit V.A., 16KB pages, 256KB of physical memory Page offset takes 14 bits, 6 bits for V.P.N. and 4 bits for P.P.N. Access to: PPN = 0010 Physical Address: Access to: PPN = Page Fault to sector Pick a page to “kick out” of memory (use LRU). Assume LRU is VPN for this example sector xxxx... Read data from sector 1239 into PPN 1010