Presentation is loading. Please wait.

Presentation is loading. Please wait.

Karu Sankaralingam University of Wisconsin-Madison Collaborators: Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, and Doug Burger The Dark Silicon Implications.

Similar presentations


Presentation on theme: "Karu Sankaralingam University of Wisconsin-Madison Collaborators: Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, and Doug Burger The Dark Silicon Implications."— Presentation transcript:

1 Karu Sankaralingam University of Wisconsin-Madison Collaborators: Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, and Doug Burger The Dark Silicon Implications for Microprocessors

2 Multicore Decade? 2 We have relied on multicore scaling for over five years. How much longer will it be our primary performance scaling technique? 2000200520102015 Pentium Extreme Dual-Core Core 2 Quad-Core i7 980x Hex-Core ?

3 Finding Optimal Multicore Designs 3 Comprehensive design space: Fixed area budget Fixed power budget Two sets of CMOS scaling projections Optimal core and diverse multicore organizations Parallel benchmarks For next 5 technology generations, we find the best performing multicore from a comprehensive design space search for each of the PARSEC benchmarks

4 Symmetric Multicore Projections 4 Symmetric multicores alone will not sustain the multicore era. 3.4x in 10 years 18x

5 Multicore Solutions 5 Asymmetric Topologies 3.5x

6 Multicore Solutions 6 Dynamic Topologies [Chakraborty (2008), Suleman et al (2009)] 3.5x

7 Multicore Solutions 7 Composed/Fused Topologies [Ipek et al (2007), Kim et al (2007)] 3.7x

8 Multicore Solutions 8 GPU-Style Cores 2.7x

9 Multicore Era Projections 9 The best designs speed up 14% per year rather than the recent trend of 34% per year 3.7x 18x

10 Why Diminishing Returns? 10 Transistor area is still scaling Voltage and capacitance scaling have slowed Result: designs are power, not area, limited

11 Devices Find the best case technology scaling Cores Find the best cores Multicores Find the best multicore organization Projections Predict best case multicore performance for each technology generation Overview 11

12 ConservativeOptimistic Area32x Power4.5x8.3x Frequency1.3x3.9x [Borkar 2007][ITRS 2010] Device Scaling Projections 12 From 45 nm to 8 nm:

13 Modeling Ideal Core Power/Perf. 13 Atom Nehalem Pareto Frontier includes all optimal power/performance points Repeat using core area for optimal area/performance points

14 Combining Device and Core Models 14 45 nm Frontier 32 nm Frontier Device Scaling

15 Devices Find the best case technology scaling Cores Find the best cores Multicores Find the best multicore organization Projections Predict best case multicore performance for each technology generation Overview 15

16 What belongs in multicore model? 16 Styles Applications Topologies Area & Power / Performance Tradeoffs Architectures PARSEC f parallel, Data Use Number of Threads, Cache Sizes Area & Power Budget Cache & memory latencies, memory bandwidth Pareto Frontiers

17 Multicore Speedup Model 17 Multicore Speedup 1 1-f parallel Serial Speedup f parallel Parallel Speedup = +

18 Multicore Performance Model 18 Performance is limited by: and Memory bandwidth BW max / (instructions per byte from memory) Computation N cores  (core frequency/CPI exe )  core utilization [Guz et al, 2009]

19 Core Utilization Model 19 Core utilization is limited by: Fraction of Time Core is Ready to Issue Number of Threads in Core / Number of Threads to Keep Busy [Guz et al, 2009]

20 Multicore Model & Pareto Frontiers 20 100 points A(q), P(q) q

21 Translating from SPECmark 21 1. From q, find core’s SPECmark speedup 2. Frequency linearly distributed from Atom to Nehalem 3. Recall: model predicts benchmark performance as f(benchmark chars, frequency, CPI exe ) 4. Compute CPI exe such that Benchmark Speedup = SPECmark Speedup

22 Area and Power Constraints 22 N cores x A(q) ≤ Area Budget N cores x P(q) ≤ Power Budget Dark silicon = N cores / # of cores that fit in chip area

23 Devices Find the best case technology scaling Cores Find the best cores Multicores Find the best multicore organization Projections Predict best case multicore performance for each technology generation Overview 23

24 Dark Silicon 24 Sources of Dark Silicon: Power + Limited Parallelism At 22 nm: At 8 nm: 17% 26% 51% 71%

25 Overall Performance 25 Target f parallel = 0.99 18x 16x 8x 6x 3x

26 Conclusions Multicore performance gains are limited Need at least 18%-40% per generation from architecture alone without additional power 26 Unicore EraMulticore Era ?

27 27 Specialization Shrinking chips Pervasive Efficiency


Download ppt "Karu Sankaralingam University of Wisconsin-Madison Collaborators: Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, and Doug Burger The Dark Silicon Implications."

Similar presentations


Ads by Google