Published by Augustus Riley. Modified over 9 years ago.
1
Cache Physical Implementation
Panayiotis Charalambous, Xi Research Group
2
Contents
- Cache Logical View
- Physical View
- Case Study: Power4 L2 Cache
3
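Logical Cache Structure
- n-way associative cache: n elements per set, 2^m sets
- Address (32 bits): Tag (32 - m - k bits) | Index (m bits) | Offset (k bits)
- The index selects a set; the address tag is compared (=) with each stored tag in that set, and the OR of the comparison results gives Hit. On a hit, the matching element supplies Data.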
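The address split on this slide can be sketched in a few lines. This is only an illustration: the function name `split_address` and the example values m = 7, k = 5 are assumptions, not taken from the slides.

```python
def split_address(addr: int, m: int, k: int, width: int = 32):
    """Split an address into (tag, index, offset).

    m = number of index bits (2^m sets), k = number of offset bits.
    The tag is whatever remains: width - m - k bits.
    """
    addr &= (1 << width) - 1              # keep only `width` address bits
    offset = addr & ((1 << k) - 1)        # lowest k bits
    index = (addr >> k) & ((1 << m) - 1)  # next m bits
    tag = addr >> (k + m)                 # remaining upper bits
    return tag, index, offset


# Hypothetical example: 128 sets (m = 7), 32-byte blocks (k = 5).
tag, index, offset = split_address(0x12345678, m=7, k=5)
print(hex(tag), index, offset)
```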
4
Cache Structure
5
Cache Access
Steps:
1. Decode the address
2. Enable the word line
3. Raise the bit lines to high
4. Get the tag value from the tag array
5. Check for a tag match
6. Select the data output
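The access steps can be mirrored in a small behavioral model. This is a sketch under assumptions: the class and method names are hypothetical, the electrical steps (word line, bit-line precharge) are reduced to selecting a row of a Python list, and `fill` uses a trivial placement policy that the slides do not specify.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: bytes = b""


class SetAssociativeCache:
    def __init__(self, num_sets: int, ways: int, block_size: int):
        self.num_sets, self.block_size = num_sets, block_size
        self.sets = [[Line() for _ in range(ways)] for _ in range(num_sets)]

    def _decode(self, addr: int):
        # Step 1: decode the address into tag, index and offset.
        block_addr, offset = divmod(addr, self.block_size)
        tag, index = divmod(block_addr, self.num_sets)
        return tag, index, offset

    def fill(self, addr: int, block: bytes) -> None:
        # Assumed policy: use the first invalid way, else overwrite way 0.
        tag, index, _ = self._decode(addr)
        ways = self.sets[index]
        victim = next((line for line in ways if not line.valid), ways[0])
        victim.valid, victim.tag, victim.data = True, tag, bytes(block)

    def read_byte(self, addr: int):
        tag, index, offset = self._decode(addr)
        ways = self.sets[index]            # steps 2-3: the word line selects one row
        for line in ways:                  # step 4: read the stored tags
            if line.valid and line.tag == tag:   # step 5: tag match
                return line.data[offset]   # step 6: select the data output
        return None                        # miss


cache = SetAssociativeCache(num_sets=4, ways=2, block_size=16)
cache.fill(0x40, bytes(range(16)))
print(cache.read_byte(0x43), cache.read_byte(0x80))
```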
6
Conventional Cache Organization Memory Cell
7
Read: Set bit and bit´ high. If the value in the cell is 1, then bit´ is discharged. If the value is 0, then bit is discharged.
Write: Set bit´ to 0. This forces a 1 into the latch. (Setting bit to 0 instead forces a 0.)
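A purely behavioral sketch of the read and write rules above. The class name `SramCell` and its interface are assumptions for illustration; no timing or analog behavior is modeled.

```python
class SramCell:
    """Behavioral model of one memory cell with complementary bit lines."""

    def __init__(self, value: int = 0):
        self.q = value  # value held in the latch

    def read(self):
        # Both bit lines start high; the cell discharges exactly one of them.
        bit, bit_bar = 1, 1
        if self.q == 1:
            bit_bar = 0   # stored 1 discharges bit'
        else:
            bit = 0       # stored 0 discharges bit
        return bit, bit_bar

    def write(self, bit: int, bit_bar: int) -> None:
        # Pulling one bit line low forces the latch to the matching state.
        if bit == 1 and bit_bar == 0:
            self.q = 1
        elif bit == 0 and bit_bar == 1:
            self.q = 0
        # Any other combination leaves the latch unchanged in this model.


cell = SramCell()
cell.write(bit=1, bit_bar=0)
print(cell.read())
```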
8
Decoder with Driver
9
Various Components
- Comparator: XOR logic
- Multiplexer hierarchy for the offset: first select the block (from the output driver), then the word, then the byte
- Output driver (inputs I0, I1, …, I7): at most one input bit is high; if an input is 0, its output is high-impedance
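The three components can be sketched as plain functions. This reflects one reading of the slide, and the assumptions are: the function names are hypothetical, the driver inputs I0..I7 are treated as one-hot enable signals, `None` stands in for a high-impedance output, and a 4-byte word is assumed.

```python
def tags_match(stored_tag: int, addr_tag: int) -> bool:
    # XOR comparator: the tags are equal when no bit differs.
    return (stored_tag ^ addr_tag) == 0


def output_driver(values, enables):
    """Drive the value whose enable is high; None models high impedance."""
    if sum(1 for e in enables if e) > 1:
        raise ValueError("at most one enable input may be high")
    for value, enable in zip(values, enables):
        if enable:
            return value
    return None  # no input high: nothing is driven


def select_byte(block: bytes, word_index: int, byte_index: int, word_size: int = 4) -> int:
    # Multiplexer hierarchy: block -> word -> byte.
    word = block[word_index * word_size:(word_index + 1) * word_size]
    return word[byte_index]


block = bytes(range(16))
print(tags_match(0x1A, 0x1A), output_driver([10, 20, 30], [0, 1, 0]), select_byte(block, 2, 1))
```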
10
Banking
- Idea: support multiple simultaneous cache accesses
- Solution: use multiporting on the bit cells (high cost), or divide the cache into independent banks
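To see why independent banks allow simultaneous accesses, the sketch below counts how many cycles a group of accesses would need when each bank serves one access per cycle. The bank mapping (block address modulo the number of banks) and the default sizes are assumptions for illustration, not something the slides state.

```python
from collections import Counter


def bank_of(addr: int, num_banks: int = 8, block_size: int = 128) -> int:
    # Assumed mapping: consecutive blocks go to consecutive banks.
    return (addr // block_size) % num_banks


def cycles_needed(addrs, num_banks: int = 8, block_size: int = 128) -> int:
    """Accesses to different banks overlap; accesses to the same bank serialize."""
    per_bank = Counter(bank_of(a, num_banks, block_size) for a in addrs)
    return max(per_bank.values(), default=0)


print(cycles_needed([0, 128, 256]))  # three different banks
print(cycles_needed([0, 1024]))      # both map to bank 0, so they conflict
```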
11
Cache Search
Steps:
1. Find the bank (bank index)
2. Find the set in the bank (index)
3. Check that the data is valid and in the cache (tag match)
4. If all is OK, return the data (block and byte offset); otherwise check the lower-level memory
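The search steps can be walked through with a minimal banked cache model. Everything named here is hypothetical: the class `BankedCache`, the bit ordering (offset lowest, then bank index, then set index, then tag), the dictionary-per-set storage, and the evict-oldest policy are assumptions chosen to keep the sketch short.

```python
class BankedCache:
    def __init__(self, num_banks, sets_per_bank, ways, block_size, lower_level):
        self.num_banks, self.sets_per_bank = num_banks, sets_per_bank
        self.ways, self.block_size = ways, block_size
        self.lower_level = lower_level  # callable: block base address -> bytes
        # banks[bank][set] maps tag -> block data (at most `ways` entries)
        self.banks = [[{} for _ in range(sets_per_bank)] for _ in range(num_banks)]
        self.hits = self.misses = 0

    def read_byte(self, addr: int) -> int:
        block_addr, offset = divmod(addr, self.block_size)
        rest, bank = divmod(block_addr, self.num_banks)      # step 1: find the bank
        tag, set_index = divmod(rest, self.sets_per_bank)    # step 2: find the set
        ways = self.banks[bank][set_index]
        block = ways.get(tag)                                # step 3: valid + tag match
        if block is None:
            # Step 4 (miss): fetch the block from the lower level and install it.
            self.misses += 1
            block = self.lower_level(block_addr * self.block_size)
            if len(ways) >= self.ways:
                ways.pop(next(iter(ways)))  # evict the oldest inserted entry
            ways[tag] = block
        else:
            self.hits += 1
        return block[offset]                                 # step 4 (hit): return data


memory = bytes(i % 256 for i in range(4096))
cache = BankedCache(2, 2, 2, 16, lambda base: memory[base:base + 16])
print(cache.read_byte(35), cache.read_byte(36), cache.hits, cache.misses)
```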
12
Case Study: Power4
- Dual-core, 64-bit processors
- 32 KB L1 D-cache (per processor): 2-way associative, 128-byte line
- 64 KB L1 I-cache (per processor): direct mapped, 128-byte line (4 sectors x 32 B)
- ~1.5 MB L2 cache: 8-way set associative, 128-byte line
13
Power4 Floorplan
14
Power4 L2 Logical View
- Cache split into 3 parts, 0.5 MB each
- Controlled by 4 coherency processors
- One 64 B store queue per processor
15
Power4 L2U
- ~512 KB
- 8 banks
- 128 B block size
- 8-way associative
- Figure labels: word lines, bit lines, decoders, address bus
16
Power4 L2 Cache Block
- Size C = 512 KB = 2^19 B
- Block size = 128 B = 2^7 B
- 8-way associative
- 8 banks per cache block
Therefore:
- Set size is 2^3 * 2^7 B = 2^10 B
- Sets in cache: 2^19 / 2^10 = 2^9 sets
- Sets per bank: 2^9 / 2^3 = 2^6 sets
Address (64-bit): tag | index (9 bits: bank index, 3 bits, and set index, 6 bits) | offset (7 bits)
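The arithmetic on this slide can be checked with a short calculation. The function name `cache_geometry` and the dictionary keys are assumptions for illustration; all sizes are expected to be powers of two.

```python
def log2_exact(n: int) -> int:
    """log2 of a power of two, raising an error for anything else."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def cache_geometry(size_bytes: int, block_bytes: int, ways: int, banks: int) -> dict:
    set_bytes = ways * block_bytes          # 2^3 * 2^7 = 2^10 B
    total_sets = size_bytes // set_bytes    # 2^19 / 2^10 = 2^9
    sets_per_bank = total_sets // banks     # 2^9 / 2^3 = 2^6
    return {
        "set_bytes": set_bytes,
        "total_sets": total_sets,
        "sets_per_bank": sets_per_bank,
        "offset_bits": log2_exact(block_bytes),
        "index_bits": log2_exact(total_sets),
        "bank_index_bits": log2_exact(banks),
        "set_index_bits": log2_exact(sets_per_bank),
    }


print(cache_geometry(512 * 1024, 128, ways=8, banks=8))
```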
17
Power4: CACTI Results

cacti 524288 128 8 0.8um 8
---------- CACTI version 3.2 ----------
Cache Parameters:
  Number of Subbanks: 8
  Total Cache Size: 524288
  Size in bytes of Subbank: 65536
  Number of sets: 64
  Associativity: 8
  Block Size (bytes): 128
  Read/Write Ports: 1
  Read Ports: 0
  Write Ports: 0
  Technology Size: 0.80um
  Vdd: 4.5V
Access Time (ns): 12.3473
Cycle Time (wave pipelined) (ns): 4.97337
Total Power all Banks (nJ): 418.337
Total Power Without Routing (nJ): 198.563
Total Routing Power (nJ): 219.774
Maximum Bank Power (nJ): 63.5175
Best Ndwl (L1): 16
Best Ndbl (L1): 1
Best Nspd (L1): 1
Best Ntwl (L1): 1
Best Ntbl (L1): 1
Best Ntspd (L1): 1
Nor inputs (data): 2
Nor inputs (tag): 2

cacti 524288 128 8 0.8um 16
---------- CACTI version 3.2 ----------
Cache Parameters:
  Number of Subbanks: 16
  Total Cache Size: 524288
  Size in bytes of Subbank: 32768
  Number of sets: 32
  Associativity: 8
  Block Size (bytes): 128
  Read/Write Ports: 1
  Read Ports: 0
  Write Ports: 0
  Technology Size: 0.80um
  Vdd: 4.5V
Access Time (ns): 12.434
Cycle Time (wave pipelined) (ns): 4.85483
Total Power all Banks (nJ): 793.381
Total Power Without Routing (nJ): 341.424
Total Routing Power (nJ): 451.957
Maximum Bank Power (nJ): 63.1382
Best Ndwl (L1): 16
Best Ndbl (L1): 1
Best Nspd (L1): 1
Best Ntwl (L1): 1
Best Ntbl (L1): 1
Best Ntspd (L1): 1
Nor inputs (data): 2
Nor inputs (tag): 2
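A quick way to compare the 8-subbank and 16-subbank runs is to compute ratios from the reported figures. The numbers below are copied from the CACTI output above; the dictionary layout and helper names are assumptions made for this sketch.

```python
BLOCK_BYTES, WAYS = 128, 8

# Figures copied from the two CACTI 3.2 runs, keyed by number of subbanks.
runs = {
    8: {"subbank_bytes": 65536, "sets": 64, "access_ns": 12.3473,
        "cycle_ns": 4.97337, "total_nj": 418.337, "routing_nj": 219.774},
    16: {"subbank_bytes": 32768, "sets": 32, "access_ns": 12.434,
         "cycle_ns": 4.85483, "total_nj": 793.381, "routing_nj": 451.957},
}


def sets_per_subbank(subbank_bytes: int) -> int:
    # Each set holds WAYS blocks of BLOCK_BYTES bytes.
    return subbank_bytes // (BLOCK_BYTES * WAYS)


def ratio(metric: str) -> float:
    """16-subbank value divided by the 8-subbank value."""
    return runs[16][metric] / runs[8][metric]


def routing_share(subbanks: int) -> float:
    return runs[subbanks]["routing_nj"] / runs[subbanks]["total_nj"]


for n, r in runs.items():
    # The reported set counts agree with subbank size / (associativity * block size).
    assert sets_per_subbank(r["subbank_bytes"]) == r["sets"]
    print(f"{n:2d} subbanks: routing share {routing_share(n):.1%}")

for metric in ("access_ns", "cycle_ns", "total_nj"):
    print(f"{metric}: x{ratio(metric):.3f}")
```

In these two runs, doubling the subbank count leaves access and cycle time nearly unchanged while the reported total roughly doubles, with routing accounting for more than half of it in both cases.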
18
CACTI
Data Array
- Ndwl: Word line split factor
- Ndbl: Bit line split factor
- Nspd: Number of sets mapped to a single word line (sectors)
Tag Array
- Ntwl: Word line split factor
- Ntbl: Bit line split factor
- Ntspd: Number of sets mapped to a single word line (sectors)
Increasing Ndbl, Nspd, Ntbl, or Ntspd requires more sense amplifiers.
Increasing Ndwl or Ntwl increases the number of word line drivers.
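To make the split factors concrete, the sketch below computes the data subarray shape. It assumes the array-organization relations described for the original CACTI model (rows = C / (B * A * Ndbl * Nspd), columns = 8 * B * A * Nspd / Ndwl, with Ndwl * Ndbl subarrays); whether CACTI 3.2 applies exactly these relations per subbank is an assumption here, and the function name is hypothetical.

```python
def data_subarray_shape(C: int, B: int, A: int, Ndwl: int, Ndbl: int, Nspd: int):
    """Return (rows, columns, number of subarrays) for the data array.

    C = capacity in bytes, B = block size in bytes, A = associativity.
    Assumed relations, see the note above; not verified against CACTI 3.2 source.
    """
    rows = C // (B * A * Ndbl * Nspd)   # cells along each bit line
    cols = (8 * B * A * Nspd) // Ndwl   # cells along each word line
    return rows, cols, Ndwl * Ndbl


# One 64 KB subbank from the 8-subbank run: Ndwl = 16, Ndbl = 1, Nspd = 1.
rows, cols, subarrays = data_subarray_shape(65536, 128, 8, Ndwl=16, Ndbl=1, Nspd=1)
print(rows, cols, subarrays)

# Sanity check: the subarrays together hold every bit of the subbank.
assert rows * cols * subarrays == 65536 * 8
```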
19
Thank You