
1 Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scalability. 36th International Symposium on Computer Architecture (ISCA 2009). Brian Rogers†‡, Anil Krishna†‡, Gordon Bell‡, Ken Vu‡, Xiaowei Jiang†, Yan Solihin†. NC State University†‡

2 As Process Technology Scales … [Slide figure: a chip multiprocessor grows from a few cores (P) with caches ($) to many cores and caches per chip, all sharing the same DRAM interface.]

3 Problem
Core growth >> memory bandwidth growth
 Cores: ~exponential growth (driven by Moore's Law)
 Bandwidth: much slower growth (pin and power limitations)
At each relative technology generation T: (# cores = 2^T) >> (bandwidth = B^T, with B < 2)
Some key questions (our contributions):
 How constraining is the increasing gap between the number of cores and the available memory bandwidth?
 How should future CMPs be designed; how should we allocate transistors to caches vs. cores?
 What techniques can best reduce memory traffic demand?
Approach: build an analytical CMP memory bandwidth model.
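To make the gap concrete, here is a tiny illustrative calculation (not from the slides); the bandwidth growth factor B = 1.4 is an assumed value used only for the example:

```python
# Illustrative sketch of the core/bandwidth gap described on this slide.
# The bandwidth growth factor B = 1.4 is an assumed value, not from the paper.
B = 1.4

for T in range(5):                       # relative technology generations
    cores = 2 ** T                       # cores: ~2x per generation (Moore's Law)
    bandwidth = B ** T                   # bandwidth: ~B x per generation, B < 2
    print(f"T={T}: cores={cores:2d}, bandwidth={bandwidth:4.2f}, "
          f"per-core bandwidth={bandwidth / cores:4.2f}")
```

Per-core bandwidth shrinks by a factor of (B/2) every generation, which is the gap the rest of the talk quantifies.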

4 Agenda: Background / Motivation; Assumptions / Scope; CMP Memory Traffic Model; Alternate Views of Model; Memory Traffic Reduction Techniques (Indirect, Direct, Dual); Conclusions

5 Assumptions / Scope
Homogeneous cores
Single-threaded cores (multi-threading would add to the bandwidth problem)
Co-scheduled sequential applications
 Multi-threaded apps with data sharing are evaluated separately
Enough work to keep all cores busy
Workloads static across technology generations
Equal amount of cache per core
Power/energy constraints are outside the scope of this study

6 Agenda: Background / Motivation; Assumptions / Scope; CMP Memory Traffic Model; Alternate Views of Model; Memory Traffic Reduction Techniques (Indirect, Direct, Dual); Conclusions

7 Cache Miss Rate vs. Cache Size
The relationship follows the Power Law (Hartstein et al.; the "√2 rule"):
M = M₀ · R^(−α)
where M₀ is the miss rate at the original cache size, R = new cache size / old cache size, and α is the workload's sensitivity to cache size changes (the √2 rule corresponds to α ≈ 0.5: doubling the cache cuts the miss rate by a factor of √2).
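As a minimal sketch of this power law in code (the function name and the 10% baseline miss rate are illustrative assumptions, not values from the paper):

```python
def miss_rate(m0: float, r: float, alpha: float) -> float:
    """Power-law miss rate: M = M0 * R^(-alpha).

    m0    -- miss rate at the original cache size
    r     -- new cache size / old cache size
    alpha -- workload sensitivity to cache-size changes
    """
    return m0 * r ** (-alpha)

# Doubling the cache with alpha = 0.5 divides the miss rate by sqrt(2):
print(miss_rate(m0=0.10, r=2.0, alpha=0.5))   # ~0.0707
```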

8 CMP Traffic Model
Express chip area in terms of Core Equivalent Areas (CEAs):
 Core = 1 CEA; unit of cache = 1 CEA
 P = # cores, C = # cache CEAs, N = P + C, S = C/P
Assume that non-core and non-cache components require a constant fraction of the chip area.
Add a core-count term to obtain the CMP model: total memory traffic scales as P times the per-core miss rate, which follows the power law in the cache per core S.

9 CMP Traffic Model (2)
Going from CMP₁ (P₁ cores, C₁ cache CEAs) to CMP₂ (P₂ cores, C₂ cache CEAs): remove common terms and express the new traffic M₂ in terms of the old traffic M₁, giving M₂ = M₁ · (P₂ / P₁) · (S₂ / S₁)^(−α).
(Reminder: P = # cores, C = # cache CEAs, N = P + C, S = C/P.)
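A small sketch of this ratio in code; the function name and the example values are mine, not the paper's:

```python
def traffic_ratio(p1: float, s1: float, p2: float, s2: float, alpha: float) -> float:
    """M2 / M1, assuming total traffic scales as P * S^(-alpha).

    p1, p2 -- number of cores
    s1, s2 -- cache CEAs per core (S = C / P)
    alpha  -- power-law exponent of the workload
    """
    return (p2 / p1) * (s2 / s1) ** (-alpha)

# Doubling the cores at the same cache per core doubles the memory traffic:
print(traffic_ratio(p1=8, s1=1.0, p2=16, s2=1.0, alpha=0.5))   # 2.0
```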

10 One Generation of Scaling
Baseline processor: 8 cores, 8 cache CEAs
 N₁ = 16, P₁ = 8, C₁ = 8, S₁ = 1, with ~fully utilized bandwidth
 α = 0.5
How many cores are possible if 32 CEAs are now available?
 Ideal scaling = 2x the number of cores at each successive technology generation
[Slide figure: bar chart comparing ideal scaling with bandwidth-limited scaling.]
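The chart with the answer is not reproduced in this transcript. Under the traffic model above, a worked solve (my derivation; the ~1.4x outcome is consistent with the 2x-die-area row on slide 13) looks like this:

```latex
% Constant-bandwidth constraint, with N_2 = 32, P_1 = 8, S_1 = 1, \alpha = 0.5:
P_2 \, S_2^{-1/2} = P_1 \, S_1^{-1/2} = 8,
\qquad S_2 = \frac{32 - P_2}{P_2}
\;\Rightarrow\;
\frac{P_2^{3/2}}{\sqrt{32 - P_2}} = 8
\;\Rightarrow\;
P_2^{3} + 64\,P_2 - 2048 = 0
\;\Rightarrow\;
P_2 \approx 11.
```

So only about 11 cores (roughly 1.4x rather than the ideal 2x) fit at constant bandwidth.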

11 Agenda: Background / Motivation; Assumptions / Scope; CMP Memory Traffic Model; Alternate Views of Model; Memory Traffic Reduction Techniques (Indirect, Direct, Dual); Conclusions

12 CMP Design Constraint
If the available off-chip bandwidth grows by a factor of B, total memory traffic should grow by at most a factor of B each generation.
Write S₂ in terms of P₂ and N₂: S₂ = (N₂ − P₂) / P₂.
New technology provides N₂ CEAs and B× bandwidth => solve for P₂ numerically; P₂ is the number of cores that can be supported.
(Reminder: P = # cores, C = # cache CEAs, N = P + C, S = C/P.)
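As a sketch of the numerical solve this slide calls for (the solver, the bisection approach, and the baseline values reused from slide 10 are my assumptions), the loop at the bottom reproduces figures close to those quoted on the next slide; small differences are expected since the paper's exact constants are not reproduced in this transcript:

```python
def supportable_cores(n2: float, b: float, p1: float = 8.0, c1: float = 8.0,
                      alpha: float = 0.5) -> float:
    """Largest P2 (cores) such that memory traffic grows by at most a factor of b.

    Constraint, from the traffic-ratio model:
        (P2 / P1) * (S2 / S1)^(-alpha) <= b,   with S2 = (n2 - P2) / P2.
    The left-hand side grows monotonically with P2, so bisection suffices.
    """
    s1 = c1 / p1

    def traffic_growth(p2: float) -> float:
        s2 = (n2 - p2) / p2
        return (p2 / p1) * (s2 / s1) ** (-alpha)

    lo, hi = 1e-6, n2 - 1e-6              # keep at least a sliver of cache
    for _ in range(100):                  # bisection
        mid = (lo + hi) / 2
        if traffic_growth(mid) <= b:
            lo = mid
        else:
            hi = mid
    return lo

# Constant bandwidth (b = 1): how many cores fit at 2x, 4x, 8x, 16x the die area?
for area in (2, 4, 8, 16):
    p2 = supportable_cores(n2=16 * area, b=1.0)
    print(f"{area:2d}x die area -> {p2:5.1f} cores ({p2 / 8:.1f}x)")
```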

13 Scaling Under Area Constraints
With an increasing number of CEAs available, how many cores can be supported at a constant bandwidth requirement?
 2x die area: 1.4x cores
 4x die area: 1.9x cores
 8x die area: 2.4x cores
 16x die area: 3.2x cores
 …

14 Agenda: Background / Motivation; Assumptions / Scope; CMP Memory Traffic Model; Alternate Views of Model; Memory Traffic Reduction Techniques (Indirect, Direct, Dual); Conclusions

15 Categories of Techniques
Indirect: Cache Compression, DRAM Caches, 3D-stacked Cache, Unused Data Filter, Smaller Cores
Direct: Link Compression, Sectored Caches
Dual: Cache+Link Compression, Small Cache Lines, Data Sharing

16 Indirect – DRAM Cache
F: influenced by the increased density of a DRAM cache relative to SRAM.
[Slide figure: cores supported vs. technology generation, compared against ideal scaling.]

17 Direct – Link Compression
R: influenced by the compression ratio.
[Slide figure: cores supported vs. technology generation, compared against ideal scaling.]

18 Dual – Small Cache Lines
F, R: influenced by the percentage of unused data.
[Slide figure: cores supported vs. technology generation, compared against ideal scaling.]
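Slides 16-18 characterize techniques by F (an effective increase in cache capacity, e.g. from denser DRAM cache) and R (a direct reduction in off-chip traffic, e.g. from link compression); small cache lines touch both. The following is one plausible way to fold such factors into the traffic ratio above; it is my formulation for illustration, not necessarily the paper's exact model:

```python
def traffic_ratio_with_techniques(p1: float, s1: float, p2: float, s2: float,
                                  alpha: float, f: float = 1.0, r: float = 1.0) -> float:
    """Traffic ratio M2/M1 when the newer chip also applies reduction techniques.

    f -- factor by which effective cache capacity grows (e.g. DRAM-cache density)
    r -- factor by which off-chip traffic per miss shrinks (e.g. link compression)
    Assumes f boosts the power-law cache term and r divides the resulting traffic.
    """
    return (p2 / p1) * ((f * s2) / s1) ** (-alpha) / r

# Example: 2x effective cache (f = 2) plus a 1.5x link-compression ratio (r = 1.5)
# more than offsets a doubling of the cores at constant cache per core (~0.94 < 1).
print(traffic_ratio_with_techniques(p1=8, s1=1.0, p2=16, s2=1.0, alpha=0.5, f=2.0, r=1.5))
```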

19 Dual – Data Sharing
Please see the paper for details on the modeling of sharing.
Data sharing is unlikely to provide a scalable solution.

20 Summary of Individual Techniques
[Slide figure: three panels comparing core-count scaling for the Indirect, Direct, and Dual techniques.]

21 Summary of Combined Techniques
[Slide figure: core-count scaling when techniques are combined, compared against ideal scaling.]

22 Conclusions
Contributions:
 Simple, powerful analytical CMP memory traffic model
 Quantify the significance of the memory bandwidth wall problem: only ~10% of chip area can go to cores within 4 generations if the traffic requirement stays constant
 Guide the design (cores vs. cache) of future CMPs: given a fixed chip area and bandwidth scaling, how many cores can be supported?
 Evaluate memory traffic reduction techniques: combinations can enable ideal scaling for several generations
Need bandwidth-efficient computing:
 Hardware/architecture level: DRAM caches, cache/link compression, prefetching, smarter memory controllers, etc.
 Technology level: 3D chips, optical interconnects, etc.
 Application level: working set reduction, locality enhancement, data vs. pipelined parallelism, computation vs. communication, etc.

23 Questions? Thank you. Brian Rogers, bmrogers@ece.ncsu.edu

