Slide 1: Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scalability
36th International Symposium on Computer Architecture (ISCA 2009)
Brian Rogers†‡, Anil Krishna†‡, Gordon Bell‡, Ken Vu‡, Xiaowei Jiang†, Yan Solihin†
NC State University
Slide 2: As Process Technology Scales…
[Figure: die diagrams across successive technology generations; the number of cores (P) and cache slices ($) on the chip grows each generation, while the chip connects to the same off-chip DRAM.]
Slide 3: Problem
- Core growth >> memory bandwidth growth
  - Cores: roughly exponential growth (driven by Moore's Law)
  - Bandwidth: much slower growth (pin and power limitations)
- At each relative technology generation T: (# cores = 2^T) >> (bandwidth = B^T)
- Key questions (our contributions):
  - How constraining is the growing gap between the number of cores and available memory bandwidth?
  - How should future CMPs be designed; how should we allocate transistors between caches and cores?
  - Which techniques can best reduce memory traffic demand?
- Approach: build an analytical CMP memory bandwidth model
Slide 4: Agenda
- Background / Motivation
- Assumptions / Scope
- CMP Memory Traffic Model
- Alternate Views of Model
- Memory Traffic Reduction Techniques (Indirect, Direct, Dual)
- Conclusions
Slide 5: Assumptions / Scope
- Homogeneous, single-threaded cores (multi-threading adds to the problem)
- Co-scheduled sequential applications; multi-threaded apps with data sharing are evaluated separately
- Enough work to keep all cores busy
- Workloads held static across technology generations
- Equal amount of cache per core
- Power/energy constraints are outside the scope of this study
Slide 6: Agenda – CMP Memory Traffic Model
Slide 7: Cache Miss Rate vs. Cache Size
- The relationship follows the power law of Hartstein et al. (the "√2 rule"): M = M0 · R^(−α)
- R = new cache size / old cache size
- α = sensitivity of the workload to the cache size change (α = 0.5 gives the √2 rule)
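To make the power law concrete, here is a minimal sketch in Python (mine, not the authors'; M0 and α are illustrative placeholder values, not numbers from the paper):

```python
# Power-law miss-rate model (Hartstein et al.): M = M0 * R^(-alpha).
# M0 = 0.05 and alpha = 0.5 are illustrative defaults, not values from the paper.

def miss_rate(R, M0=0.05, alpha=0.5):
    """Miss rate after scaling the cache by R = new size / old size."""
    return M0 * R ** (-alpha)

# With alpha = 0.5 (the sqrt(2) rule), doubling the cache cuts misses by ~1.41x:
print(miss_rate(1.0) / miss_rate(2.0))  # ~1.414
```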
Slide 8: CMP Traffic Model
- Express chip area in Core Equivalent Areas (CEAs): one core = 1 CEA, one unit of cache = 1 CEA
- P = # cores, C = # cache CEAs, N = P + C, S = C/P (cache per core)
- Assume non-core, non-cache components require a constant fraction of the area
- Add a core-count term to obtain the CMP model: Traffic = P · M = P · M0 · S^(−α)
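A one-line extension of the sketch above gives the CMP model: the per-core power-law miss rate multiplied by the core count (again my illustration, not the authors' code):

```python
def cmp_traffic(P, C, M0=0.05, alpha=0.5):
    """Total off-chip traffic for P cores sharing C cache CEAs equally."""
    S = C / P                     # cache per core, in CEAs
    return P * M0 * S ** (-alpha)
```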
Slide 9: CMP Traffic Model (2)
- Going from CMP1 = (P1, C1) to CMP2 = (P2, C2): remove the common terms and express M2 in terms of M1:
  M2 / M1 = (P2 / P1) · (S2 / S1)^(−α)
- (P = # cores, C = # cache CEAs, N = P + C, S = C/P)
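The ratio form is convenient because M0 cancels; a sketch under the same assumptions:

```python
def traffic_growth(P1, C1, P2, C2, alpha=0.5):
    """M2 / M1 = (P2/P1) * (S2/S1)^(-alpha); M0 cancels out of the ratio."""
    S1, S2 = C1 / P1, C2 / P2
    return (P2 / P1) * (S2 / S1) ** (-alpha)

# Ideal scaling (doubling both cores and cache, so S is unchanged) doubles traffic:
print(traffic_growth(8, 8, 16, 16))  # 2.0
```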
Slide 10: One Generation of Scaling
- Baseline processor: 8 cores, 8 cache CEAs → N1 = 16, P1 = 8, C1 = 8, S1 = 1, with bandwidth ~fully utilized; α = 0.5
- How many cores are possible if 32 CEAs are now available?
- Ideal scaling = 2× the number of cores at each successive technology generation
- [Figure: ideal scaling vs. bandwidth-limited scaling]
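As a worked check of this slide (my arithmetic, using the ratio above): holding traffic constant (M2/M1 = 1) with S2 = (32 − P2)/P2 gives (P2/8) · ((32 − P2)/P2)^(−1/2) = 1, which rearranges to P2³ = 64 · (32 − P2) and solves to P2 ≈ 11. Bandwidth-limited scaling therefore supports roughly 11 cores where ideal scaling would give 16.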
Slide 11: Agenda – Alternate Views of Model
Slide 12: CMP Design Constraint
- If available off-chip bandwidth grows by a factor of B, total memory traffic may grow by at most a factor of B each generation: (P2/P1) · (S2/S1)^(−α) ≤ B
- Write S2 in terms of P2 and N2: S2 = (N2 − P2) / P2
- New technology: given N2 CEAs and bandwidth factor B, solve numerically for P2
- P2 is the number of cores that can be supported
- (P = # cores, C = # cache CEAs, N = P + C, S = C/P)
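The slide leaves the numerical solve implicit; below is a minimal bisection sketch (my code, with slide 10's baseline P1 = 8, S1 = 1, α = 0.5 baked in as defaults):

```python
def traffic_ratio(P2, N2, P1=8.0, S1=1.0, alpha=0.5):
    """M2 / M1 with S2 = (N2 - P2) / P2 substituted in."""
    S2 = (N2 - P2) / P2
    return (P2 / P1) * (S2 / S1) ** (-alpha)

def max_cores(N2, B=1.0):
    """Largest P2 whose traffic growth stays within B.

    The ratio rises monotonically with P2 (more cores leave less cache
    for each), so simple bisection over (1, N2 - 1) suffices.
    """
    lo, hi = 1.0, N2 - 1.0        # always keep at least one CEA of cache
    while hi - lo > 1e-9:
        mid = 0.5 * (lo + hi)
        if traffic_ratio(mid, N2) > B:
            hi = mid
        else:
            lo = mid
    return lo

print(max_cores(N2=32, B=1.0))    # ~11 cores, matching slide 10's example
```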
Slide 13: Scaling Under Area Constraints
- With an increasing number of CEAs available, how many cores can be supported at a constant bandwidth requirement?
- 2× die area → 1.4× cores
- 4× die area → 1.9× cores
- 8× die area → 2.4× cores
- 16× die area → 3.2× cores
- …
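These multipliers can be checked with the same bisection idea (my check; baseline P1 = 8, N1 = 16, α = 0.5, constant bandwidth B = 1):

```python
def max_cores(N2, P1=8.0, alpha=0.5, B=1.0):
    """Bisect for the core count whose traffic growth equals B (S1 = 1)."""
    lo, hi = 1.0, N2 - 1.0
    while hi - lo > 1e-9:
        mid = 0.5 * (lo + hi)
        if (mid / P1) * ((N2 - mid) / mid) ** (-alpha) > B:
            hi = mid
        else:
            lo = mid
    return lo

for area in (2, 4, 8, 16):
    print(f"{area}x die area -> {max_cores(16 * area) / 8:.2f}x cores")
# Prints roughly 1.38x, 1.83x, 2.39x, 3.07x, in line with the slide's
# 1.4x / 1.9x / 2.4x / 3.2x (small gaps presumably come from rounding or
# details of the paper's exact formulation).
```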
Slide 14: Agenda – Memory Traffic Reduction Techniques
Slide 15: Categories of Techniques
- Indirect (reduce the miss rate): cache compression, DRAM caches, 3D-stacked cache, unused-data filtering, smaller cores
- Direct (reduce traffic per miss): link compression, sectored caches
- Dual (both): cache + link compression, small cache lines, data sharing
Slide 16: Indirect – DRAM Cache
- F: effective cache-capacity factor, influenced by DRAM's increased density
- [Figure: scaling achieved with a DRAM cache vs. ideal scaling]
Slide 17: Direct – Link Compression
- R: traffic-reduction factor, influenced by the achievable compression ratio
- [Figure: scaling achieved with link compression vs. ideal scaling]
Slide 18: Dual – Small Cache Lines
- F, R: both influenced by the percentage of unused data in a cache line
- [Figure: scaling achieved with small cache lines vs. ideal scaling]
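A sketch of how the two knobs on slides 16-18 plausibly enter the traffic model (my extrapolation from the slides, not the paper's exact formulation): an indirect technique scales effective cache capacity by F, a direct technique divides traffic per miss by R, and a dual technique does both.

```python
def traffic(P, C, M0=1.0, alpha=0.5, F=1.0, R=1.0):
    """Off-chip traffic for P cores and C cache CEAs under the power law.

    F > 1: the cache behaves as if F times larger (e.g., DRAM-cache density).
    R > 1: each miss moves 1/R as much data (e.g., link compression).
    """
    S = C / P                        # cache per core (CEAs)
    return P * M0 * (F * S) ** (-alpha) / R

base = traffic(8, 8)
print(traffic(16, 16) / base)        # no technique: 2x traffic
print(traffic(16, 16, F=8) / base)   # illustrative 8x-denser DRAM cache: ~0.71x
print(traffic(16, 16, R=2) / base)   # illustrative 2x link compression: 1x
```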
Slide 19: Dual – Data Sharing
- Please see the paper for details on the modeling of sharing
- Data sharing is unlikely to provide a scalable solution
Slide 20: Summary of Individual Techniques
[Figure: scaling achieved by each individual technique, grouped into the Indirect, Direct, and Dual categories]
Slide 21: Summary of Combined Techniques
Slide 22: Conclusions
- Contributions:
  - A simple but powerful analytical model of CMP memory traffic
  - Quantifies the severity of the memory bandwidth wall: if the traffic requirement is held constant, only ~10% of chip area can go to cores within 4 generations
  - Guides the design (cores vs. cache) of future CMPs: given a fixed chip area and a bandwidth scaling rate, how many cores can be supported?
  - Evaluates memory traffic reduction techniques: combinations of them can enable ideal scaling for several generations
- We need bandwidth-efficient computing:
  - Hardware/architecture level: DRAM caches, cache/link compression, prefetching, smarter memory controllers, etc.
  - Technology level: 3D chips, optical interconnects, etc.
  - Application level: working-set reduction, locality enhancement, data vs. pipelined parallelism, computation vs. communication trade-offs, etc.
Slide 23: Questions? Thank You
Brian Rogers, bmrogers@ece.ncsu.edu