Download presentation
Presentation is loading. Please wait.
Published byCalvin Houston Modified over 9 years ago
1
Karu Sankaralingam University of Wisconsin-Madison Collaborators: Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, and Doug Burger The Dark Silicon Implications for Microprocessors
2
Multicore Decade? 2 We have relied on multicore scaling for over five years. How much longer will it be our primary performance scaling technique? 2000200520102015 Pentium Extreme Dual-Core Core 2 Quad-Core i7 980x Hex-Core ?
3
Finding Optimal Multicore Designs 3 Comprehensive design space: Fixed area budget Fixed power budget Two sets of CMOS scaling projections Optimal core and diverse multicore organizations Parallel benchmarks For next 5 technology generations, we find the best performing multicore from a comprehensive design space search for each of the PARSEC benchmarks
4
Symmetric Multicore Projections 4 Symmetric multicores alone will not sustain the multicore era. 3.4x in 10 years 18x
5
Multicore Solutions 5 Asymmetric Topologies 3.5x
6
Multicore Solutions 6 Dynamic Topologies [Chakraborty (2008), Suleman et al (2009)] 3.5x
7
Multicore Solutions 7 Composed/Fused Topologies [Ipek et al (2007), Kim et al (2007)] 3.7x
8
Multicore Solutions 8 GPU-Style Cores 2.7x
9
Multicore Era Projections 9 The best designs speed up 14% per year rather than the recent trend of 34% per year 3.7x 18x
10
Why Diminishing Returns? 10 Transistor area is still scaling Voltage and capacitance scaling have slowed Result: designs are power, not area, limited
11
Devices Find the best case technology scaling Cores Find the best cores Multicores Find the best multicore organization Projections Predict best case multicore performance for each technology generation Overview 11
12
ConservativeOptimistic Area32x Power4.5x8.3x Frequency1.3x3.9x [Borkar 2007][ITRS 2010] Device Scaling Projections 12 From 45 nm to 8 nm:
13
Modeling Ideal Core Power/Perf. 13 Atom Nehalem Pareto Frontier includes all optimal power/performance points Repeat using core area for optimal area/performance points
14
Combining Device and Core Models 14 45 nm Frontier 32 nm Frontier Device Scaling
15
Devices Find the best case technology scaling Cores Find the best cores Multicores Find the best multicore organization Projections Predict best case multicore performance for each technology generation Overview 15
16
What belongs in multicore model? 16 Styles Applications Topologies Area & Power / Performance Tradeoffs Architectures PARSEC f parallel, Data Use Number of Threads, Cache Sizes Area & Power Budget Cache & memory latencies, memory bandwidth Pareto Frontiers
17
Multicore Speedup Model 17 Multicore Speedup 1 1-f parallel Serial Speedup f parallel Parallel Speedup = +
18
Multicore Performance Model 18 Performance is limited by: and Memory bandwidth BW max / (instructions per byte from memory) Computation N cores (core frequency/CPI exe ) core utilization [Guz et al, 2009]
19
Core Utilization Model 19 Core utilization is limited by: Fraction of Time Core is Ready to Issue Number of Threads in Core / Number of Threads to Keep Busy [Guz et al, 2009]
20
Multicore Model & Pareto Frontiers 20 100 points A(q), P(q) q
21
Translating from SPECmark 21 1. From q, find core’s SPECmark speedup 2. Frequency linearly distributed from Atom to Nehalem 3. Recall: model predicts benchmark performance as f(benchmark chars, frequency, CPI exe ) 4. Compute CPI exe such that Benchmark Speedup = SPECmark Speedup
22
Area and Power Constraints 22 N cores x A(q) ≤ Area Budget N cores x P(q) ≤ Power Budget Dark silicon = N cores / # of cores that fit in chip area
23
Devices Find the best case technology scaling Cores Find the best cores Multicores Find the best multicore organization Projections Predict best case multicore performance for each technology generation Overview 23
24
Dark Silicon 24 Sources of Dark Silicon: Power + Limited Parallelism At 22 nm: At 8 nm: 17% 26% 51% 71%
25
Overall Performance 25 Target f parallel = 0.99 18x 16x 8x 6x 3x
26
Conclusions Multicore performance gains are limited Need at least 18%-40% per generation from architecture alone without additional power 26 Unicore EraMulticore Era ?
27
27 Specialization Shrinking chips Pervasive Efficiency
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.