Published by Triston Holton. Modified over 9 years ago.
1
University of Michigan Electrical Engineering and Computer Science
Archipelago: A Polymorphic Cache Design for Enabling Robust Near-Threshold Operation
HPCA-17, February 16, 2011
Amin Ansari, Shuguang Feng, Shantanu Gupta, and Scott Mahlke
University of Michigan, Ann Arbor
2
Matching Power Consumption and Utilization
[Roth et al.] More than 80% of the time idle
Large SRAM structures limit the minimum Vdd
Logic cells can operate close to Vth
More than 50% of all computers [Webber et al.]
DVS to improve battery life
Core i7 achieves a 37% power reduction in the idle state.
3
Bit-Error-Rate for an SRAM Cell
Extremely fast growth in failure rate with decreasing Vdd
4
Our Goal
Enabling DVS to push the core's Vdd down to
o the ultra-low voltage region (< 650 mV)
o while preserving correct functionality of on-chip caches
Proposing a highly flexible, fault-tolerant (FT) cache architecture that can efficiently tolerate these SRAM failures
Minimizing our overheads in high-power mode
5
Archipelago (AP)
[Figure: cache lines 1 to 8; legend: data chunk, sacrificial line; groups: Island 1, Island 2]
This particular cache has only a single functional line.
By forming autonomous islands, AP saves 6 out of 8 lines.
6
Baseline AP Architecture
[Figure: input address, memory map, fault map address, fault map, First Bank, Second Bank, MUXing layer, functional block; the data lines and sacrificial line (S) of group G3 are marked]
Added modules:
+ Memory map
+ Fault map
+ MUXing layer
Two types of lines:
+ data line
+ sacrificial line
Two lines collide if they have at least one faulty chunk in the same position (lines 10 and 15 are collision free).
There should be no collision between lines within a group [Group 3 (G3) contains lines 4, 10, and 15].
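The collision rule on this slide can be sketched in a few lines of Python. This is only an illustration, assuming each line's faults are stored as a bitmask with one bit per chunk; the mask values and the 8-chunk line width are hypothetical and are not taken from the slides.

```python
from itertools import combinations

def collide(fault_a: int, fault_b: int) -> bool:
    """Two lines collide if some chunk position is faulty in both."""
    return (fault_a & fault_b) != 0

def is_collision_free(group: list[int]) -> bool:
    """A group is valid only if no pair of its lines collides."""
    return all(not collide(a, b) for a, b in combinations(group, 2))

# Hypothetical fault masks (bit i set = chunk i is faulty), 8 chunks per line.
line_4 = 0b00000011
line_10 = 0b00110000
line_15 = 0b01000100

print(collide(line_10, line_15))                      # lines 10 and 15 do not collide
print(is_collision_free([line_4, line_10, line_15]))  # so the three can share a group
```

Because the check is pairwise, adding a line to a group only requires comparing its mask with the OR of the masks already in the group.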
7
AP with Relaxed Group Formation
Sacrificial lines do not contribute to the effective capacity
o We want to minimize the total number of groups
[Figure: two First Bank / Second Bank layouts with sacrificial lines (S) marked]
8
Semi-Sacrificial Lines
[Figure: First Bank, Second Bank, and MUXing layer, with a sacrificial line and a semi-sacrificial line marked]
A semi-sacrificial line guarantees parallel access.
In contrast to a sacrificial line, it also contributes to the effective cache capacity.
9
AP with Semi-Sacrificial Lines
[Figure: sacrificial line (S) marked]
10
AP Configuration
We model the problem as a graph:
o Each node is a line of the cache.
o An edge connects two nodes when there is no collision between them.
A collision-free group forms a clique
o Group formation: finding the cliques
To maximize the number of functional lines, we need to minimize the number of groups.
o This is the minimum clique cover (MCC) problem.
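The group-formation step can be illustrated with a simple greedy clique cover. This is a sketch under stated assumptions, not the authors' modified MCC algorithm: faults are assumed to be per-line bitmasks (one bit per chunk), the function name `greedy_clique_cover` and the example masks are hypothetical, and a greedy pass does not guarantee the minimum number of groups.

```python
def greedy_clique_cover(fault_masks: dict[int, int]) -> list[list[int]]:
    """Greedily partition lines into collision-free groups (cliques).

    fault_masks maps a line number to its fault bitmask (assumed encoding).
    """
    groups: list[list[int]] = []
    group_masks: list[int] = []  # OR of the fault masks already in each group

    # Heuristic: place the lines with the most faulty chunks first.
    order = sorted(fault_masks,
                   key=lambda line: bin(fault_masks[line]).count("1"),
                   reverse=True)
    for line in order:
        mask = fault_masks[line]
        for i, gmask in enumerate(group_masks):
            # Disjoint from the group's combined mask means no collision
            # with any member, so the line extends that clique.
            if gmask & mask == 0:
                groups[i].append(line)
                group_masks[i] |= mask
                break
        else:
            groups.append([line])
            group_masks.append(mask)
    return groups

# Hypothetical 4-chunk fault masks for four lines.
masks = {1: 0b0011, 2: 0b1100, 3: 0b0110, 4: 0b1001}
print(greedy_clique_cover(masks))
```

Fewer groups means fewer sacrificial lines, which is why the slide frames the objective as a minimum clique cover.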
11
AP Configuration Example
[Figure: graph of cache lines 1 to 10 and the resulting assignment across First Bank and Second Bank. Labels: G1(1), G1(2), G1(3), G1(S), G2(1), G2(2), G2(3), G2(4), G2(S), D. Legend: Island or Group 1, Island or Group 2, Disabled (D)]
12
Operation Modes
High-power mode (AP is turned off)
o There are no non-functional lines in this case
o Clock gating to reduce dynamic power of SRAM structures
Low-power mode
o During boot time in low-power mode:
  BIST scans the cache for potentially faulty cells
  Processor switches back to high-power mode
  Forms groups and configures the HW
13
Evaluation Methodology
Performance [DEC Alpha 21364]
o SimAlpha, which is based on SimpleScalar OoO [SPEC2K]
Delay, power, and area
o Wattch and HotLeakage for processor power
o Artisan memory compiler for our SRAM structures
o CACTI for baseline on-chip caches (64KB, 2MB)
o Synopsys Design Compiler and Power Compiler for miscellaneous logic (e.g., bypass MUXes and comparators)
Given a set of cache parameters (e.g., Vdd)
o Monte Carlo (with 1000 iterations) using our modified MCC
o Determining the disabled portion of caches (for 99% yield)
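The Monte Carlo step might look roughly like the sketch below. Everything here is an assumption made for illustration: faults are drawn independently per chunk with a hypothetical probability `p_chunk_fail`, grouping uses a simple greedy pass rather than the authors' modified MCC, each group of faulty lines is assumed to cost one line of capacity (its sacrificial line), and "99% yield" is read as the loss level that about 99% of simulated caches do not exceed.

```python
import random

def count_groups(masks: list[int]) -> int:
    """Greedy number of collision-free groups needed for the faulty lines."""
    group_masks: list[int] = []
    for mask in sorted(masks, key=lambda m: bin(m).count("1"), reverse=True):
        for i, gmask in enumerate(group_masks):
            if gmask & mask == 0:      # no shared faulty chunk position
                group_masks[i] |= mask
                break
        else:
            group_masks.append(mask)
    return len(group_masks)

def lost_fraction_at_yield(n_lines: int, n_chunks: int, p_chunk_fail: float,
                           iterations: int = 1000, yield_target: float = 0.99,
                           seed: int = 0) -> float:
    """Capacity loss that roughly `yield_target` of simulated caches stay under."""
    rng = random.Random(seed)
    losses = []
    for _ in range(iterations):
        faulty = []
        for _ in range(n_lines):
            mask = 0
            for chunk in range(n_chunks):
                if rng.random() < p_chunk_fail:
                    mask |= 1 << chunk
            if mask:                   # fault-free lines need no group
                faulty.append(mask)
        # Assumed cost model: one line lost per group of faulty lines.
        losses.append(count_groups(faulty) / n_lines)
    losses.sort()
    index = min(iterations - 1, int(yield_target * iterations))
    return losses[index]

print(lost_fraction_at_yield(n_lines=64, n_chunks=8, p_chunk_fail=0.05))
```

Sweeping `p_chunk_fail` (standing in for a given Vdd) would give a loss-versus-voltage curve under these assumptions.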
14
Minimum Achievable Vdd
15
Overheads
Overheads for L1 and L2 caches
o 10T SRAM cells are used to protect the fault map, tag array, and memory map
16
Performance Loss
One extra cycle of latency for L1 and 2 cycles for L2
17
Summary of Benefits
Larger leakage power savings for deeper technology nodes
18
Comparison with Alternative Methods
[Figure: comparison grouped into Conventional and Recently Proposed methods]
19
Conclusion
DVS is widely used to deal with high power dissipation
o Minimum achievable voltage is bounded by SRAM structures
We proposed a highly flexible cache architecture
o To tolerate failures when operating in the near-threshold region
Using our approach
o Vdd of the processor can be reduced to 375 mV
o 79% dynamic power saving and 51% leakage power saving
o < 10% area and performance overheads
20
Thank You
http://cccp.eecs.umich.edu