Amdahl’s Law in the Multicore Era
Mark D. Hill & Michael R. Marty, 2008
ECE 259 / CPS 221 Advanced Computer Architecture II
Presenter: Tae Jun Ham
Outline
Summary
- Amdahl’s law in the multicore era
- Symmetric MC case
- Asymmetric MC case
- Dynamic MC case
Review
- Strong points
- Negative points
- Possible questions
Problem
Multicore chip design has additional degrees of freedom:
- Total number of cores
- Complexity of the individual core
- Multicore chip design style (symmetric / asymmetric / dynamic)
Goal of this paper: to explore the design space of multicore chips and derive useful implications for computer architects
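Amdahl’s Law
Original: $\mathrm{Speedup}(f, S) = \dfrac{1}{(1 - f) + \dfrac{f}{S}}$, where f is the fraction of execution that is enhanced and S is the speedup of that fraction
Multicore (symmetric case): $\mathrm{Speedup}_{\mathrm{symmetric}}(f, n, r) = \dfrac{1}{\dfrac{1 - f}{\mathrm{perf}(r)} + \dfrac{f \cdot r}{\mathrm{perf}(r) \cdot n}}$, for a chip with n BCEs of resources built from cores of r BCEs each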
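The original form of Amdahl's law can be checked numerically with a few lines of Python. This is a minimal sketch; the function name `amdahl_speedup` and its parameter names are illustrative choices, not taken from the paper.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Classic Amdahl's law: overall speedup when a fraction f of the
    original execution time is sped up by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    # The unenhanced part (1 - f) runs at the old speed; the rest shrinks by s.
    return 1.0 / ((1.0 - f) + f / s)


if __name__ == "__main__":
    # With f = 0.9 the speedup approaches, but never reaches, 1 / (1 - f) = 10.
    for s in (2, 10, 100, 1_000_000):
        print(f"f=0.9, s={s}: speedup = {amdahl_speedup(0.9, s):.3f}")
```

However large s becomes, the speedup stays below 1 / (1 - f), which is why the serial fraction dominates the later slides.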
Basic Assumptions
- Limited resource: area
- Resource unit: BCE (base core equivalent)
- Simple core: consumes 1 BCE, performance 1
- Complex core: consumes r BCEs, performance perf(r) = sqrt(r)
Symmetric Multicore Model
- Resources: n BCEs
- Each core consumes r BCEs
- Total number of cores: n/r
- Serial performance: perf(r)
- Parallel performance: perf(r) * (n/r)
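Combining the serial and parallel performance figures of the symmetric model with Amdahl's law gives a speedup that can be evaluated directly. The sketch below assumes perf(r) = sqrt(r) as on the assumptions slide; the names `perf` and `speedup_symmetric` are illustrative.

```python
import math


def perf(r: float) -> float:
    """Assumed core performance model from the slides: perf(r) = sqrt(r)."""
    return math.sqrt(r)


def speedup_symmetric(f: float, n: float, r: float) -> float:
    """Speedup of a symmetric chip with n BCEs split into n/r cores of r BCEs.

    f is the parallelizable fraction of the workload.
    """
    serial_time = (1.0 - f) / perf(r)          # one core at perf(r)
    parallel_time = f / (perf(r) * (n / r))    # n/r cores, each at perf(r)
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    for r in (1, 2, 4, 8, 16):
        print(f"n=16, f=0.9, r={r}: {speedup_symmetric(0.9, 16, r):.2f}")
```

For n = 16 and f = 0.9, eight 2-BCE cores come out slightly ahead of sixteen 1-BCE cores under this model.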
Symmetric Multicore Analysis
- Parallelization is important
- r > 1 BCEs per core can be optimal (complex cores still matter despite diminishing returns in performance per area)
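The claim that more than one BCE per core can be optimal can be explored by sweeping r for a fixed n and f. This is a rough sketch that only tries power-of-two core sizes and assumes perf(r) = sqrt(r); `best_symmetric_r` is a hypothetical helper name.

```python
import math


def speedup_symmetric(f: float, n: float, r: float) -> float:
    """Symmetric multicore speedup, assuming perf(r) = sqrt(r)."""
    p = math.sqrt(r)
    return 1.0 / ((1.0 - f) / p + f * r / (p * n))


def best_symmetric_r(f: float, n: int) -> tuple[int, float]:
    """Return (r, speedup) for the best power-of-two core size r <= n."""
    candidates = []
    r = 1
    while r <= n:
        candidates.append(r)
        r *= 2
    best = max(candidates, key=lambda c: speedup_symmetric(f, n, c))
    return best, speedup_symmetric(f, n, best)


if __name__ == "__main__":
    for f in (0.5, 0.9, 0.975, 0.999):
        r, s = best_symmetric_r(f, 256)
        print(f"n=256, f={f}: best r={r}, speedup={s:.1f}")
```

Under these assumptions the preferred core size shrinks as f grows: a mostly serial workload favors one large core, while a highly parallel one favors many base cores.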
Asymmetric Multicore Model
- Resources: n BCEs
- One complex core consumes r BCEs
- Each of the other cores consumes 1 BCE
- Total number of cores: n - r + 1
- Serial performance: perf(r)
- Parallel performance: perf(1) * (n - r) + perf(r)
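The asymmetric model's serial and parallel performance figures translate into a speedup formula in the same way. A minimal sketch, again assuming perf(r) = sqrt(r); the function name `speedup_asymmetric` is illustrative.

```python
import math


def speedup_asymmetric(f: float, n: float, r: float) -> float:
    """Speedup of an asymmetric chip: one r-BCE core plus (n - r) base cores.

    Assumes perf(r) = sqrt(r) and perf(1) = 1, as in the slides.
    """
    p = math.sqrt(r)
    serial_time = (1.0 - f) / p            # serial work runs on the big core
    parallel_time = f / (p + (n - r))      # big core plus n - r base cores
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    for r in (1, 16, 64, 128, 256):
        print(f"n=256, f=0.975, r={r}: {speedup_asymmetric(0.975, 256, r):.1f}")
```

With r = 1 or r = n the chip degenerates to a symmetric design, so the formula agrees with the symmetric model at both ends.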
Asymmetric Multicore Analysis
- Asymmetric multicore allows better speedups than symmetric multicore
- For asymmetric multicore, a high-performance complex core is crucial
Dynamic Multicore Model
- Resources: n BCEs
- For sequential operation, r BCEs are combined into one complex core
- Each remaining core consumes 1 BCE
- Total number of cores: n (parallel mode) / n - r + 1 (serial mode)
- Serial performance: perf(r)
- Parallel performance: n * perf(1) = n
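For the dynamic model, the serial phase runs at perf(r) and the parallel phase at n, which gives the following sketch. It assumes perf(r) = sqrt(r) and ignores any cost of switching between modes; `speedup_dynamic` is an illustrative name.

```python
import math


def speedup_dynamic(f: float, n: float, r: float) -> float:
    """Speedup of a dynamic chip with n BCEs.

    Serial phase: r BCEs fused into one core with perf(r) = sqrt(r).
    Parallel phase: all n BCEs run as base cores with total performance n.
    Mode-switching overhead is assumed to be zero.
    """
    serial_time = (1.0 - f) / math.sqrt(r)
    parallel_time = f / n
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    for r in (1, 16, 64, 256):
        print(f"n=256, f=0.975, r={r}: {speedup_dynamic(0.975, 256, r):.1f}")
```

Because r no longer takes resources away from the parallel phase, the speedup in this model only increases with r, up to r = n.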
Dynamic Multicore Analysis
- Dynamic multicore provides better speedups than asymmetric multicore
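The ordering of the three designs can be illustrated by comparing their best speedups over power-of-two values of r. This is a sketch under the slides' assumptions (perf(r) = sqrt(r), no switching overhead for the dynamic chip); the names `SPEEDUPS` and `best_over_r` are hypothetical.

```python
import math


def _perf(r: float) -> float:
    return math.sqrt(r)  # assumed performance model from the slides


def symmetric(f, n, r):
    return 1.0 / ((1 - f) / _perf(r) + f * r / (_perf(r) * n))


def asymmetric(f, n, r):
    return 1.0 / ((1 - f) / _perf(r) + f / (_perf(r) + n - r))


def dynamic(f, n, r):
    return 1.0 / ((1 - f) / _perf(r) + f / n)


SPEEDUPS = {"symmetric": symmetric, "asymmetric": asymmetric, "dynamic": dynamic}


def best_over_r(model: str, f: float, n: int) -> tuple[int, float]:
    """Best (r, speedup) for the named model over power-of-two r <= n."""
    fn = SPEEDUPS[model]
    rs = []
    r = 1
    while r <= n:
        rs.append(r)
        r *= 2
    best = max(rs, key=lambda c: fn(f, n, c))
    return best, fn(f, n, best)


if __name__ == "__main__":
    for model in SPEEDUPS:
        r, s = best_over_r(model, 0.975, 256)
        print(f"{model:>10}: best r={r:>3}, speedup={s:.1f}")
```

For f = 0.975 and n = 256, the sweep ranks dynamic above asymmetric above symmetric, consistent with the analysis slides.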
Strengths
- Identified future research directions
  1. Increase parallelism
  2. Increase core performance
  3. Better asymmetric and dynamic multicore designs
- Derived corollaries to Amdahl’s law for the multicore cases
Limitations
- The model is not very accurate
  1. Limited resource: in practice a combination of power, area, and cost
  2. Performance model: perf(r) can differ from sqrt(r)
  3. Partially parallel portions need to be considered
- Skepticism
  1. Can Moore’s law continue to 256 cores per chip?
  2. Can we really achieve 99.9% parallelization?
- The optimal point depends heavily on the parallel portion. Because the parallel portion differs among applications, it is hard to determine the best hardware design.
Future Work / Discussion
- What would be appropriate ways to implement a dynamic multicore design in hardware?
- How do we develop a better analytical model for multicore performance?
- What would be the software challenges for asymmetric or dynamic multicore?
- Which of the three designs presented would be the most power efficient?