1 Towards Scalable and Energy-Efficient Memory System Architectures Rajeev Balasubramonian School of Computing University of Utah
2 Main Memory Problems [Figure: processor connected to DIMMs] 1. Energy 2. High capacity at high bandwidth 3. Reliability
3 Motivation: Memory Energy Contributions of memory to overall system energy: 25-40%, IBM, Sun, and Google server data summarized by Meisner et al., ASPLOS’09 HP servers: 175 W out of ~785 W for 256 GB memory (HP power calculator) Intel SCC: memory controller contributes 19-69% of chip power, ISSCC’10
4 Motivation: Reliability DRAM data from Schroeder et al., SIGMETRICS’09: 25K-70K errors per billion device hours per Mbit 8% of DRAM DIMMs affected by errors every year DRAM error rates may get worse as scalability limits are reached; PCM (hard and soft) error rates expected to be high as well Primary concern: storage and energy overheads for error detection and correction ECC support is not too onerous; chip-kill is much worse
5 Motivation: Capacity, Bandwidth [Figure: processor connected to a DIMM]
6 Motivation: Capacity, Bandwidth [Figure: processor and DIMM] Cores are increasing, but pins are not
7 Motivation: Capacity, Bandwidth [Figure: processor and DIMM] Cores are increasing, but pins are not. High channel frequency → fewer DIMMs. Will eventually need disruptive shifts: NVM, optics. Can't have high capacity, high bandwidth, and low energy: pick 2 of the 3!
8 Memory System Basics [Figure: processor with memory controllers (M) driving DIMMs] Multiple on-chip memory controllers that handle multiple 64-bit channels
9 Memory System Basics: FB-DIMM [Figure: processor linked to a daisy-chain of buffered DIMMs] FB-DIMM: can boost capacity with narrow channels and buffering at each DIMM
10 What's a Rank? [Figure: memory controller, 64b channel, DIMM of x8 chips] Rank: the DRAM chips required to provide the 64b output expected by a JEDEC standard bus. For example: 8 x8 DRAM chips.
11 What's a Bank? [Figure: DIMM with one bank highlighted] Bank: a portion of a rank that is tied up when servicing a request; multiple banks in a rank enable parallel handling of multiple requests
12 What's an Array? [Figure: bank shown as a grid of arrays] Array: matrix of cells. One array provides 1 bit/cycle. Each array reads out an entire row. Large array → high density.
13 What's a Row Buffer? [Figure: array with wordline, bitlines, row buffer, RAS/CAS signals, and output pin]
14 Row Buffer Management Row buffer: collection of rows read out by arrays in a bank Row buffer hits incur low latency and low energy Bitlines must be precharged before a new row can be read Open page policy: delays the precharge until a different row is encountered Close page policy: issues the precharge immediately
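The open-page vs. close-page trade-off above can be made concrete with a toy latency model: open-page wins when consecutive accesses hit the same row, and loses on conflicts because the precharge is paid on the critical path. The latency constants below are illustrative placeholders, not datasheet values.

```python
# Toy comparison of open-page vs close-page row-buffer policies for one
# bank. Latency units are abstract placeholders, not real DRAM timings.
T_CAS = 1            # column access only (row-buffer hit)
T_RAS_CAS = 2        # activate + column access (bank precharged)
T_PRE_RAS_CAS = 3    # precharge + activate + column access (conflict)

def open_page_latency(row_accesses):
    """Open-page: the row stays in the buffer until a different row arrives."""
    total, open_row = 0, None
    for row in row_accesses:
        if row == open_row:
            total += T_CAS           # row-buffer hit
        elif open_row is None:
            total += T_RAS_CAS       # buffer empty: activate only
        else:
            total += T_PRE_RAS_CAS   # conflict: precharge first
        open_row = row
    return total

def close_page_latency(row_accesses):
    """Close-page: precharge immediately, so every access is activate + CAS."""
    return len(row_accesses) * T_RAS_CAS

stream = [5, 5, 5, 9]                          # locality favors open-page
assert open_page_latency(stream) == 7          # 2 + 1 + 1 + 3
assert close_page_latency(stream) == 8         # 4 * 2
```

With poor locality (every access a different row), the comparison flips: open-page pays the conflict penalty on each access, which is exactly the multi-core concern raised on the next slide.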
15 Primary Sources of Energy Inefficiency Overfetch: 8 KB of data read out for each cache line request Poor row buffer hit rates: diminished locality in multi-cores Electrical medium: bus speeds have been increasing Reliability measures: overhead in building a reliable system from inherently unreliable parts
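The overfetch figure on this slide is easy to verify with back-of-the-envelope arithmetic: an activation reads an entire 8 KB row, but a request needs only a 64 B cache line.

```python
# Overfetch arithmetic from the slide: 8 KB row activated per 64 B line.
ROW_BYTES = 8 * 1024
LINE_BYTES = 64

overfetch_factor = ROW_BYTES // LINE_BYTES
useful_fraction = LINE_BYTES / ROW_BYTES

assert overfetch_factor == 128       # 128 bytes activated per useful byte
assert useful_fraction == 1 / 128    # under 1% of the activated row is used
```

A single row-buffer hit amortizes part of this cost, which is why overfetch and poor row-buffer hit rates compound each other.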
16 SECDED Support [Figure: 64-bit data word + 8-bit ECC] One extra x8 chip per rank. Storage and energy overhead of 12.5%. Cannot handle complete failure in one chip.
17 Chipkill Support I [Figure: 64-bit data word + 8-bit ECC; at most one bit from each DRAM chip] Use 72 DRAM chips to read out 72 bits. Dramatic increase in activation energy and overfetch. Storage overhead is still 12.5%.
18 Chipkill Support II [Figure: 8-bit data word + 5-bit ECC; at most one bit from each DRAM chip] Use 13 DRAM chips to read out 13 bits. Storage and energy overhead: 62.5%. Other options exist; trade-off between energy and storage.
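The storage overheads quoted on slides 16-18 all come from the same ratio of check bits to data bits, which the snippet below recomputes directly from the slide numbers.

```python
# Storage overheads of the three ECC organizations from slides 16-18,
# computed as check bits over data bits.
def overhead(data_bits, ecc_bits):
    return ecc_bits / data_bits

secded     = overhead(64, 8)   # one extra x8 chip per 8-chip rank
chipkill_1 = overhead(64, 8)   # 72 chips, at most one bit per chip
chipkill_2 = overhead(8, 5)    # 13 chips, at most one bit per chip

assert secded == 0.125         # 12.5%
assert chipkill_1 == 0.125     # same storage, far higher activation energy
assert chipkill_2 == 0.625     # 62.5%: storage traded against energy
```

The arithmetic makes the trade-off explicit: chipkill I keeps 12.5% storage but must activate 72 chips per access, while chipkill II activates only 13 chips at the price of 62.5% storage overhead.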
19 Summary So Far We now understand… why memory energy is a problem (overfetch, row buffer miss rates); why reliability incurs high energy overheads (chipkill support requires high activation per useful bit); why capacity and bandwidth increases cost energy (need high frequency and buffering per hop)
20 Crucial Timing Disruptive changes may be compelling today… Increasing role of memory energy. Increasing role of memory errors. Impact of multi-core: high bandwidth needs, loss of locality. Emerging technologies (NVM, optics) will require a revamp of memory architecture: ideas can be easily applied to NVM; role of DRAM may change.
21 Attacking the Problem Find ways to maximize row buffer utility Find ways to reduce overfetch Treat reliability as a first-class design constraint Use photonics and 3D to boost capacity and bandwidth Solutions must be very cost-sensitive
22 Maximizing Row Buffer Locality Micro-pages (ASPLOS'10) Handling multiple memory controllers (PACT'10) Ongoing work: better write scheduling, better bank management (data mapping, row closure)
23 Micro-Pages Key observation: most accesses to a page are localized to a small region (micro-page)
24 Solution Identify hot micro-pages. Co-locate hot micro-pages in reserved DRAM rows. Memory controller keeps track of re-direction. Low overheads if applications have few hot micro-pages that account for most memory accesses. [Figure: processor, memory controller, DIMM]
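The first step above, identifying hot micro-pages, amounts to counting accesses at sub-page granularity and keeping the top few. A minimal sketch, assuming 4 KB OS pages split into 1 KB micro-pages (the granularity and threshold here are illustrative, not the paper's exact parameters):

```python
# Hedged sketch of hot micro-page identification. The 1 KB micro-page
# granularity and top-k selection are illustrative assumptions.
from collections import Counter

MICRO_PAGE = 1024  # bytes per micro-page (hypothetical granularity)

def hot_micro_pages(addresses, top_k=2):
    """Count accesses per micro-page and return the top_k hottest ones."""
    counts = Counter(addr // MICRO_PAGE for addr in addresses)
    return [mp for mp, _ in counts.most_common(top_k)]

# Accesses clustered in two small regions of a larger address space:
trace = [0x1000, 0x1040, 0x1080, 0x5400, 0x5440, 0x9000]
hot = hot_micro_pages(trace)
assert set(hot) == {0x1000 // MICRO_PAGE, 0x5400 // MICRO_PAGE}
```

Once identified, these hot micro-pages would be co-located in reserved DRAM rows so that one open row serves many accesses, which is where the locality gain comes from.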
25 Results Overall 9% improvement in performance and 15% reduction in energy
26 Handling Multiple Memory Controllers [Figure: several memory controllers (M), each with DIMMs] Data mapping across multiple memory controllers is key: Must equalize load and queuing delays. Must minimize "distance". Must maximize row buffer hit rates.
27 Solution Cost function to guide initial page placement Similar cost function to guide page migration Initial page placement improves performance by 7%, page migration by 9% Row buffer hit rates can be doubled
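A cost function of the kind described above can be sketched as a weighted sum over the three criteria from the previous slide. The linear form, the weights, and the controller names below are assumptions for illustration, not the paper's exact model.

```python
# Hedged sketch of a page-placement cost function: weigh controller
# load, distance, and expected row-buffer locality. Lower cost is better.
# The linear form and unit weights are illustrative assumptions.
def placement_cost(load, distance, row_hit_rate,
                   w_load=1.0, w_dist=1.0, w_hit=1.0):
    """High load/distance raise cost; high expected hit rate lowers it."""
    return w_load * load + w_dist * distance - w_hit * row_hit_rate

def best_controller(candidates):
    """candidates: list of (name, load, distance, row_hit_rate) tuples."""
    return min(candidates, key=lambda c: placement_cost(c[1], c[2], c[3]))[0]

mcs = [("MC0", 0.9, 1, 0.2),   # close but heavily loaded
       ("MC1", 0.2, 2, 0.6),   # lighter load, better expected locality
       ("MC2", 0.4, 3, 0.5)]   # too far away
assert best_controller(mcs) == "MC1"
```

The same scoring can drive migration: periodically re-evaluate placed pages and move those whose best controller has changed, which is how the 7% placement and 9% migration gains on this slide would be harvested.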
28 Reducing Overfetch Key idea: eliminate overfetch by employing smaller arrays and activating a single array in a single chip. Single Subarray Access (SSA), ISCA'10. Positive effects: minimizes activation energy; small activation footprint (more arrays can be asleep longer); enables higher parallelism and reduces queuing delays. Negative effects: longer transfer time; drop in density; no row buffer hits; vulnerable to chip failure; change to standards.
29 Energy Results Dynamic energy reduction of 6x In some cases, 3x reduction in leakage
30 Performance Results SSA better on half the programs (mem-intensive ones)
31 Support for Reliability Checksum support per row allows low-cost error detection. Can build a 2nd-tier error-correction scheme based on RAID. [Figure: data rows with per-row checksums across DRAM chips, plus a parity DRAM chip] Reads: single array read. Writes: two array reads and two array writes.
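The RAID-style second tier described above can be sketched with plain XOR parity: once the per-row checksum has identified which chip failed, its contents are rebuilt from the surviving chips plus the parity chip. The 8-byte per-chip payloads below are illustrative.

```python
# Sketch of the RAID-style 2nd tier: XOR parity across DRAM chips lets
# a fully failed chip be reconstructed. Payload sizes are illustrative.
def xor_parity(chunks):
    """XOR a list of equal-length byte strings together."""
    p = bytes(len(chunks[0]))
    for c in chunks:
        p = bytes(a ^ b for a, b in zip(p, c))
    return p

chips = [bytes([i] * 8) for i in range(1, 9)]   # data held by 8 chips
parity = xor_parity(chips)                      # stored on the parity chip

# Chip 3 fails (the per-row checksum flags it); rebuild from the rest:
survivors = chips[:3] + chips[4:]
rebuilt = xor_parity(survivors + [parity])
assert rebuilt == chips[3]
```

This also shows where the write cost on the slide comes from: updating one chip's data requires reading the old data and old parity, then writing new data and new parity (two reads, two writes).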
32 Capacity and Bandwidth Silicon photonics to break the pin barrier at the processor. But, several concerns at the DIMM: Breaking the DRAM pin barrier will impact cost! High capacity → daisy-chaining and loss of power. High static power for photonics; need high utilization. Scheduling for large capacities.
33 Exploiting 3D Stacks (ISCA'11) [Figure: processor with memory controller, waveguide to the DIMM; 3D stacks of DRAM chips on an interface die with a stack controller] Interface die for photonic penetration. Does not impact DRAM design. Few photonic hops; high utilization. Interface die schedules low-level operations.
34 Packet-Based Scheduling Protocol High capacity → high scheduling complexity. Move to a packet-based interface: Processor issues an address request. Processor reserves a slot for data return. Scheduling minutiae are handled by stack controller. Data is returned at the correct time. Back-up slot in case deadline is not met. Better plug'n'play. Reduced complexity at processor. Can handle heterogeneity.
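The reserved-slot handshake above can be modeled in a few lines: the stack controller delivers on the processor's reserved slot when service finishes in time, and falls back to the backup slot otherwise. The abstract time units and function name are illustrative, not part of any real protocol.

```python
# Toy model of the packet-based return protocol: the processor reserves
# a data-return slot; the stack controller meets it or uses the backup.
# Time units are abstract; the interface is a hypothetical illustration.
def schedule_return(request_time, service_latency, reserved_slot, backup_slot):
    """Return the slot on which the data actually comes back."""
    ready = request_time + service_latency
    if ready <= reserved_slot:
        return reserved_slot      # deadline met: primary reserved slot
    return backup_slot            # deadline missed: fall back

assert schedule_return(0, 10, 12, 20) == 12   # fast service: primary slot
assert schedule_return(0, 15, 12, 20) == 20   # slow service: backup slot
```

The design point this captures is that the processor never needs to know low-level DRAM timing: it only tracks two candidate slots, which is what makes the interface plug'n'play across heterogeneous stacks.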
35 Summary Treat reliability as a first-order constraint. Possible to use photonics to break the pin barrier and not disrupt memory chip design: boosts bandwidth and capacity! Can reduce memory chip energy by reducing overfetch and with better row buffer management.
36 Acks Terrific students in the Utah Arch group Prof. Al Davis (Utah) and collaborators at HP, Intel, IBM Funding from NSF, Intel, HP, University of Utah