Published by Victor Hicks. Modified over 9 years ago.
1
Reducing Cache Power with Low-Cost, Multi-Bit Error-Correcting Codes
Chris Wilkerson, Alaa R. Alameldeen, Zeshan Chishti, Wei Wu, Dinesh Somasekhar, Shih-Lien Lu
Intel Labs
2
Overview
eDRAM refresh contributes to overall power
ECC can increase refresh time
  – High cost: storage/logic/latency
Hi-ECC: a practical multi-bit ECC
  – Addresses traditional obstacles to multi-bit ECC
    Reduces storage/logic overhead
    Minimizes latency
  – Reduces eDRAM refresh power by 93%
3
Why Refresh Power?
Typical usage: 27% of total power
[Figure: power breakdown, CPU idle vs. CPU active]
Sources: A. Naveh, et al., “Power and thermal management in the Intel® Core® Duo processor,” Intel Technology Journal, vol. 10, no. 2, May 2006; Intel® Centrino® 2 Processor Technology, www.intel.com
4
Reducing eDRAM Refresh Power
Option 1: Power gating eDRAM banks
  – 50% power gating -> 50% reduction in refresh power
  – Lose 64MB of cache state. Refetching 64MB from bulk DRAM takes ~2ms, far longer than a typical idle exit/enter (100-200us)
Option 2: Extend refresh time...
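The cost of Option 1 can be checked with a little arithmetic. The sketch below uses only the figures on this slide (64MB lost, ~2ms to refetch, 100-200us idle transitions); the variable names are illustrative, and the "implied bandwidth" is derived from those figures rather than stated in the slides.

```python
# Figures from the slide; names are illustrative only.
LOST_BYTES = 64 * 2**20          # 64MB of cache state lost by gating half the banks
REFETCH_SECONDS = 2e-3           # ~2ms to refetch from bulk DRAM
IDLE_TRANSITION_SECONDS = (100e-6, 200e-6)  # typical idle exit/enter window

# Bandwidth implied by "64MB in ~2ms" (derived, not stated in the slides).
implied_bandwidth_gb_s = LOST_BYTES / REFETCH_SECONDS / 1e9

# How many idle transitions fit inside one refetch.
ratio_low = REFETCH_SECONDS / IDLE_TRANSITION_SECONDS[1]   # vs. 200us
ratio_high = REFETCH_SECONDS / IDLE_TRANSITION_SECONDS[0]  # vs. 100us

print(f"implied refetch bandwidth: {implied_bandwidth_gb_s:.1f} GB/s")
print(f"refetch is {ratio_low:.0f}x to {ratio_high:.0f}x longer than an idle transition")
```

The refetch penalty is roughly 10x to 20x the idle transition time, which is why the deck moves on to Option 2.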
5
Problem w/ Extending Refresh Time
[Figure: curves for a single bit (eDRAM cell) and for the 128MB eDRAM cache; the target yield loss is met at ~30us, ~930mW]
Extending refresh time introduces random bit failures
Reference: Wong et al., ITC 2008
6
Ways to Extend Refresh Time
Use tests to identify/avoid weak bits:
  – RAPID: R. Venkatesan et al. (HPCA 06)
  – BitFix: C. Wilkerson et al. (ISCA 08)
Eliminate refresh when data is written/read:
  – Smart Refresh: M. Ghosh, H. Lee (Micro 07)
Use error-correcting codes (ECC) to eliminate refresh-related failures:
  – Rethinking Refresh: P. Emma et al., IEEE Micro, Nov 08
7
Impact of ECC on Refresh Time
Base (128MB eDRAM cache): 30us, 930mW
8
Impact of ECC on Refresh Time
Base (128MB eDRAM cache): 30us, 930mW
SECDED: 150us, 185mW
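The two data points on this slide are consistent with refresh power scaling inversely with refresh time. A minimal sketch of that relationship, assuming pure inverse proportionality (a simplification; the slides give only the data points, not a formula):

```python
BASE_REFRESH_US = 30.0   # baseline refresh time from the slide
BASE_POWER_MW = 930.0    # baseline refresh power from the slide

def refresh_power_mw(refresh_us: float) -> float:
    """Estimate refresh power, assuming power is proportional to 1 / refresh time."""
    return BASE_POWER_MW * BASE_REFRESH_US / refresh_us

secded_estimate = refresh_power_mw(150.0)  # slide reports 185mW for SECDED
doubled_estimate = refresh_power_mw(60.0)  # doubling refresh time halves power

print(secded_estimate, doubled_estimate)
```

Under this assumption the 150us point comes out at 186mW, close to the 185mW shown for SECDED.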
9
Impact of ECC on Refresh Time
Base (128MB eDRAM cache): 30us, 930mW
SECDED: 150us, 185mW
5EC6ED
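Why a stronger code tolerates a longer refresh time can be illustrated with a binomial model. The sketch below assumes independent bit failures with a per-bit failure probability `p_bit`; the value 1e-4 is purely hypothetical and is not taken from the slides or from Wong et al.

```python
from math import comb

def prob_more_than_t_failures(n_bits: int, p_bit: float, t: int) -> float:
    """Probability that a line of n_bits has more than t failing bits,
    assuming independent failures with probability p_bit per bit."""
    p_correctable = sum(
        comb(n_bits, k) * p_bit**k * (1.0 - p_bit) ** (n_bits - k)
        for k in range(t + 1)
    )
    return 1.0 - p_correctable

LINE_BITS = 1024 * 8   # one 1KB line (check bits ignored for simplicity)
P_BIT = 1e-4           # hypothetical per-bit failure probability

p_uncorrectable_sec = prob_more_than_t_failures(LINE_BITS, P_BIT, 1)  # 1-bit correction
p_uncorrectable_5ec = prob_more_than_t_failures(LINE_BITS, P_BIT, 5)  # 5-bit correction

print(p_uncorrectable_sec, p_uncorrectable_5ec)
```

At the same bit failure rate, the line is far less likely to be uncorrectable with 5-bit correction than with 1-bit correction, which is the headroom that can be spent on a longer refresh time.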
10
Reducing Cost of ECC Storage
[Figure: ECC latencies for 64B and 1KB lines; values shown: 1, 6, 11, 6, 11, 165, 170 cycles]
Reduced complexity at the cost of high latency.
Assume 1-xor = 10 DRAM bits
Partial (64B) reads/writes on 1KB lines
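The storage argument for larger lines can be made concrete with the standard BCH bound: a t-error-correcting binary BCH code needs at most t·m check bits, where m is the smallest integer with 2^m − 1 ≥ data bits + t·m, plus one parity bit for the extra detected error. This bound is an assumption brought in for illustration; the slides do not state the check-bit counts.

```python
def bch_check_bits(data_bits: int, t_correct: int) -> int:
    """Upper bound on check bits for a t-error-correcting, (t+1)-error-detecting
    binary BCH code over data_bits (standard bound, not from the slides)."""
    m = 1
    while (2**m - 1) < data_bits + t_correct * m:
        m += 1
    return t_correct * m + 1  # +1 parity bit for the extra detected error

def overhead_percent(data_bits: int, t_correct: int) -> float:
    return 100.0 * bch_check_bits(data_bits, t_correct) / data_bits

for label, bits, t in [
    ("SECDED on 64B", 64 * 8, 1),
    ("5EC6ED on 64B", 64 * 8, 5),
    ("5EC6ED on 1KB", 1024 * 8, 5),
]:
    print(f"{label}: {bch_check_bits(bits, t)} bits, {overhead_percent(bits, t):.2f}%")
```

Under this bound, 5EC6ED costs about 10% on a 64B line but under 1% on a 1KB line, which is less than SECDED on 64B.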
11
The Partial Read/Write Problem
Scenario 1: Segmented ECC, 1KB line (64B x 16) with ECC x 16
Scenario 2: Monolithic ECC, 1KB line (64B x 16) with one ECC
  – Read: entire 1KB line must be read to decode ECC
  – Write: 64B write requires read-modify-write
12
RALT Reduces Reads
After a line is read and checked, it is guaranteed to be error free for 30us.
The Recently Accessed Line Table (RALT) identifies lines referenced within the last 30us.
Lines identified by RALT don’t require ECC checking and don’t require 1KB reads.
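The RALT behavior described above can be sketched as a small lookup table. This is a simplified model, not the hardware design: it uses an unbounded dictionary and exact timestamps, whereas the real table is finite; the class and method names are hypothetical.

```python
class RecentlyAccessedLineTable:
    """Toy model of a RALT: tracks when each line was last fully read and ECC-checked."""

    def __init__(self, window_us: float = 30.0):
        self.window_us = window_us   # error-free guarantee window from the slide
        self.last_checked_us = {}    # line address -> time of last full check

    def access(self, line_addr: int, now_us: float) -> bool:
        """Return True on a RALT hit (partial 64B read, no ECC check needed).
        On a miss, record a full 1KB read + ECC check at now_us and return False."""
        checked_at = self.last_checked_us.get(line_addr)
        if checked_at is not None and now_us - checked_at < self.window_us:
            return True
        self.last_checked_us[line_addr] = now_us
        return False

ralt = RecentlyAccessedLineTable()
print(ralt.access(0x40, 0.0))    # first touch: miss, full read and check
print(ralt.access(0x40, 10.0))   # within 30us of the check: hit
print(ralt.access(0x40, 30.0))   # window expired: miss, checked again
```

In this model a hit does not extend the window; only a full read and check restarts it, which is a conservative assumption.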
13
Reducing the Cost of ECC Logic
[Figure: ECC latencies for 64B and 1KB lines, with Hi-ECC marked; values shown: 1, 6, 11, 6, 11, 165, 170 cycles]
Reduced complexity at the cost of high latency.
Assume 1-xor = 10 DRAM bits
14
Quick ECC Reduces Latency
[Diagram: the CPU sends an address to the TAG/ECC array and eDRAM; Quick ECC tests ">1 fail?"; on No the access completes, on Yes the line goes to high-latency ECC processing (~165 cycles)]
Functionally equivalent to full ECC.
Optimizes the common case of one error or fewer.
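The two-path flow can be modeled as a simple latency function. In this sketch the 165-cycle slow path is from the slide, while `quick_cycles` is a caller-supplied parameter because the slide gives no Quick ECC latency; the slow path is assumed to run after the quick check, and the 900-of-128K multi-bit fraction is the figure quoted elsewhere in the deck, applied here as if accesses were spread uniformly over lines.

```python
SLOW_ECC_CYCLES = 165  # high-latency ECC processing, from the slide

def ecc_latency_cycles(num_failing_bits: int, quick_cycles: int) -> int:
    """Latency for one line: quick path for <=1 failing bit, else quick + slow path."""
    if num_failing_bits <= 1:
        return quick_cycles
    return quick_cycles + SLOW_ECC_CYCLES

def mean_ecc_latency(multi_bit_fraction: float, quick_cycles: int) -> float:
    """Average latency when a given fraction of accesses hit multi-bit-failure lines."""
    return quick_cycles + multi_bit_fraction * SLOW_ECC_CYCLES

# 900 of 128K lines have multi-bit failures (figure quoted in the deck).
fraction = 900 / (128 * 1024)
print(mean_ecc_latency(fraction, quick_cycles=2))  # quick_cycles=2 is hypothetical
```

Because the slow path is taken so rarely, the average stays close to the quick-path latency, and disabling those lines removes the slow path entirely.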
15
Hi-ECC
Reduce storage costs of ECC:
  – Amortize ECC bits over larger lines
RALT:
  – Facilitates partial reads
Quick-ECC:
  – Minimizes latency for error-detect/single-bit correct
  – High latency (165 cycles) for lines with multi-bit failures
Disable lines with multi-bit failures to further reduce latency
  – 900 out of 128K (< 1%) lines have multi-bit failures
  – Total overhead (disabled lines + ECC code): ~1.6%
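The ~1.6% total can be roughly reproduced from the numbers above. The sketch assumes 71 ECC bits per 1KB line, which comes from the standard BCH bound for a 5EC6ED code and is not stated in the slides; the disabled-line count is from the slide.

```python
TOTAL_LINES = 128 * 1024     # 128K lines of 1KB each (128MB cache)
DISABLED_LINES = 900         # lines with multi-bit failures, from the slide
LINE_BITS = 1024 * 8         # data bits per 1KB line
ECC_BITS_PER_LINE = 71       # assumption: BCH bound for 5EC6ED on 8192 data bits

disabled_overhead = DISABLED_LINES / TOTAL_LINES
ecc_overhead = ECC_BITS_PER_LINE / LINE_BITS
total_overhead = disabled_overhead + ecc_overhead

print(f"disabled lines: {100 * disabled_overhead:.2f}%")
print(f"ECC code:       {100 * ecc_overhead:.2f}%")
print(f"total:          {100 * total_overhead:.2f}%")
```

Under these assumptions the two components are each under 1% and sum to about 1.55%, in line with the ~1.6% quoted on the slide.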
16
Evaluation
Intel® Core™ i7-like processor (2GHz)
  – 256KB L2 cache / 128MB eDRAM cache (40-cycle)
SD: SECDED
  – Reads/writes: 2-cycle latency
HE: Hi-ECC w/o RALT
  – Reads: 32-cycle latency; writes: read-modify-write
HER: Hi-ECC w/ RALT
  – RALT miss/hit: 32/2-cycle latency
Negligible performance impact: ~0.1 – 0.5%
17
Power Analysis
[Figure: cache power by workload category (ISPEC, FSPEC, GM, OFF, SERV, GMEAN) for SD: SECDED, HE: Hi-ECC, HER: Hi-ECC w/ RALT]
HER reduces cache power 92% vs BASE, 61% vs SECDED
18
Conclusions
Idle power is significant for low-power systems
eDRAM refresh is ~27% of total power
ECC can increase refresh time
  – High storage/logic/latency cost
Hi-ECC: a practical multi-bit ECC
  – Addresses traditional obstacles to multi-bit ECC
    Reduces storage/logic overhead
    Minimizes latency
93% reduction in eDRAM refresh power
Still have a problem with writes…
19
Backup
20
Power Analysis
[Figure legend: Base eDRAM; SD: SECDED; HE: Hi-ECC; HER: Hi-ECC w/ RALT]
21
CPU idle power
High-frequency idle enter/exits motivate low-cost transitions (100-200us)
Low average power motivates aggressive power management.
[Figure: bar chart comparing DRAM power and CPU power (active, idle), labeled ~0.5W and ~1.05W; power vs. refresh time for eDRAM]
Source: A. Naveh, et al., “Power and thermal management in the Intel® Core® Duo processor,” Intel Technology Journal, vol. 10, no. 2, May 2006.
22
eDRAM Idle Power
[Figure: refresh power for 128MB eDRAM compared with CPU idle power; doubling the refresh time gives a 50% reduction in refresh power]
Source: A. Naveh, et al., “Power and thermal management in the Intel® Core® Duo processor,” Intel Technology Journal, vol. 10, no. 2, May 2006.
23
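RALT Reduces Reads
[Diagram: RALT entries (0, 1, …, 63) with line addr, parity, valid, and 2-bit period fields; a recency counter with a ~30us period; an address lookup yields HIT or MISS; ECC processing on the 1KB line (64B x 16 plus ECC)]
First read requires 1KB read. Subsequent reads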
24
Remove Failing Bits w/ BitFix
[Diagram: read path: physical line and failure locations produce a repaired line, which goes to Quick ECC (">1 fail?": N); write path: generate ECC, then write the physical line]
25
Further Reducing Latency
[Diagram: read path: failing bits and failure locations produce a repaired line, which goes to Quick ECC (">1 fail?": Y), then high-latency ECC processing]
Of 128K lines, < 900 require high-latency ECC processing.
Disable lines with high failure rates.
If too many lines have multi-bit failures… ECC