
1 Extending Amdahl’s Law in the Multicore Era Erlin Yao, Yungang Bao, Guangming Tan and Mingyu Chen Institute of Computing Technology, Chinese Academy of Sciences yaoerlin@gmail.com, {baoyg, tgm, cmy}@ncic.ac.cn

2 A Brief Intro of ICT, CAS ICT built the fastest HPC in China, Dawning 5000, which achieves 233.5 TFlops and ranks 10th in the Top500. ICT has also developed the Loongson CPU.

3 Outline I. Background and Related Works II. Model of Multicore Scalability III. Symmetrical Multicore Chips IV. Asymmetrical Multicore Chips V. Dynamic Multicore Chips VI. Conclusion and Future Work

4 We are in the Multi-Core Era The mainstream market is already dominated by multicore processors: Intel: 2-core Core Duo, 4-core i7; AMD: 2-core Athlon, 4-core Opteron; IBM: 2-core POWER6, 9-core Cell; Sun: 8-core T1/T2; ……

5 Many-Core is Coming Some processor vendors have announced or released their many-core processors: Tilera: 64-core; Intel: 80-core; GPGPU: hundreds of cores; ……

6 Revisiting Amdahl’s Law in the Multi/Many-Core Era Assume that a fraction f of a program’s execution time is infinitely parallelizable with no scheduling overhead, while the remaining fraction, 1 − f, is totally sequential. Use p processors to accelerate the parallel fraction. Fixed-size speedup: the amount of work to be executed is independent of the number of processors, giving Speedup(f, p) = 1 / ((1 − f) + f/p).
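This fixed-size speedup can be sketched directly in Python:

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Fixed-size Amdahl's-law speedup: a fraction f of the work is
    perfectly parallelizable over p processors; 1 - f stays sequential."""
    return 1.0 / ((1.0 - f) + f / p)

# Even as p grows without bound, the speedup stays below 1 / (1 - f).
```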

7 Implications of Amdahl’s Law Despite its simplicity, Amdahl’s law applies broadly and gives important insights such as: (i) Attack the common case: When f is small, optimization will have little effect. (ii) The aspects you ignore also limit speedup: Even if p approaches infinity, speedup is bounded by 1/(1−f).

8 Mark Hill et al.’s Insights Hill and Marty apply Amdahl’s law to multicore hardware by constructing a cost model for the number and performance of cores in one chip.  Obtaining optimal multicore performance requires further research both in extracting more parallelism and in making sequential cores faster. Woo and Lee have extended Hill’s work by taking power and energy into account.

9 Motivation of Our Work The revised Amdahl’s Law model provides a better understanding of multicore scalability. However, there is little work on theoretical analysis. This paper presents our investigations on theoretical analysis of multicore scalability and attempts to find the optimal results under different conditions.

10 Model of Multicore Scalability We adopt the cost model for multicore hardware proposed by Hill and Marty, which includes two assumptions: First, assume that a multicore chip of a given size and technology generation can contain at most n base core equivalents (BCEs). Second, assume that an individual core built from more resources (r BCEs) achieves better sequential performance: –1 < perf(r) < r The architecture of multicore chips can be classified into three types: –Symmetric –Asymmetric –Dynamic
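A common concrete instance of the second assumption, used later in these slides, is perf(r) = r^c with 0 < c < 1; a quick check that it satisfies 1 < perf(r) < r for r > 1:

```python
def perf(r: float, c: float = 0.5) -> float:
    """Sequential performance of one core built from r BCEs,
    modeled as perf(r) = r**c with performance index 0 < c < 1."""
    return r ** c

# For every r > 1 the model obeys the required bound 1 < perf(r) < r.
for r in (2, 4, 16, 64):
    assert 1 < perf(r) < r
```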

11 Model-Symmetrical A symmetric multicore chip requires that all its cores have the same cost. Example: given 16 BCEs: –r = 8 → 2 cores * 8 BCEs/core –r = 4 → 4 cores * 4 BCEs/core Given a resource budget of n BCEs, we have n/r cores, each with r BCEs, and the performance of each core is perf(r). Then we get Speedup_sym(f, n, r) = 1 / [(1 − f)/perf(r) + f·r/(perf(r)·n)]
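A direct transcription of the symmetric-chip speedup, using perf(r) = r**0.5 as an illustrative performance model:

```python
def perf(r: float, c: float = 0.5) -> float:
    return r ** c

def speedup_symmetric(f: float, n: int, r: int) -> float:
    """Symmetric chip: n/r cores of r BCEs each.
    The sequential part runs on one r-BCE core at perf(r);
    the parallel part runs on all n/r such cores together."""
    return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))
```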

12 Model-Asymmetrical In an asymmetric multicore chip, several cores are more powerful than the others. Example: given 16 BCEs: –1 four-BCE core and 12 base cores. –1 six-BCE core and 10 base cores. Given a resource budget of n BCEs, we have 1 + (n − r) cores: one large core with r BCEs and n − r base cores with 1 BCE each. Then we get Speedup_asym(f, n, r) = 1 / [(1 − f)/perf(r) + f/(perf(r) + n − r)]
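The asymmetric-chip speedup in the same style (perf(r) = r**0.5 again assumed for illustration):

```python
def perf(r: float, c: float = 0.5) -> float:
    return r ** c

def speedup_asymmetric(f: float, n: int, r: int) -> float:
    """Asymmetric chip: one large core of r BCEs plus n - r base cores.
    The sequential part runs on the large core alone; in the parallel
    part the large core (perf(r)) and the n - r base cores all work."""
    return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))
```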

13 Model-Dynamic A dynamic multicore chip can dynamically combine up to r cores into one core in order to boost sequential performance. –In sequential mode, it executes with performance perf(r) when the dynamic techniques use r BCEs. –In parallel mode, it obtains performance n by using all n base cores in parallel. Then we get Speedup_dyn(f, n, r) = 1 / [(1 − f)/perf(r) + f/n]
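The dynamic-chip speedup completes the trio (perf(r) = r**0.5 assumed, as before):

```python
def perf(r: float, c: float = 0.5) -> float:
    return r ** c

def speedup_dynamic(f: float, n: int, r: int) -> float:
    """Dynamic chip: the sequential part runs on r fused BCEs at perf(r);
    the parallel part uses all n base cores."""
    return 1.0 / ((1 - f) / perf(r) + f / n)
```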

14 Symmetrical Multicore Chips –For fixed n and r, the speedup is an increasing function of f. –For fixed f and r, the speedup is an increasing function of n. → Increasing both the parallel fraction (f) and the number of base cores (n) improves the speedup of a symmetric multicore chip. For fixed f and n, we have the following theorem:

15 Symmetrical Multicore Chips Assuming perf(r) = r^c, for any fixed f and c: –if f < c, the maximum speedup is achieved at r = n. –if f > c and n is not big, the maximum speedup is achieved at r = 1. –if f > c and n is big enough, the maximum speedup is achieved between the extremes: to obtain optimal multicore performance, the BCE resources should be organized into cores large enough to offer reasonable individual sequential performance.
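The theorem can be explored numerically by sweeping r over the divisors of n (an illustrative sketch, assuming perf(r) = r**c; the function name is ours):

```python
def best_symmetric_r(f: float, n: int, c: float) -> int:
    """Return the r (a divisor of n, so the chip tiles evenly)
    that maximizes the symmetric-chip speedup under perf(r) = r**c."""
    def speedup(r: int) -> float:
        p = r ** c
        return 1.0 / ((1 - f) / p + f * r / (p * n))
    divisors = [r for r in range(1, n + 1) if n % r == 0]
    return max(divisors, key=speedup)

# f < c favors one big core (r = n); f > c with small n favors base
# cores (r = 1); f > c with large n lands strictly between the extremes.
```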

16 Symmetrical Multicore Chips If n is big enough, will the maximum speedup always be achieved between the extremes for any perf(x) < x? Counterexamples: –(i) perf(x) = kx, for any 0 < k < 1; –(ii) perf(x) = x^c, for any f < c < 1.

17 Asymmetrical Multicore Chips Similarly, increasing both the parallel fraction (f) and the number of BCEs (n) improves the speedup of an asymmetric multicore chip. For fixed f and n, we have the following theorem:

18 Asymmetrical Multicore Chips If f > c and n is not big, the maximum speedup is achieved at r = 1. If f < c and n is not big, the maximum speedup is achieved at r = n. For any fixed f and c, if n is big enough, the maximum speedup is achieved at some 1 < r0 < n.

19 Asymmetrical Multicore Chips Note that the optimal r0 in Theorem 2 cannot be solved for analytically. r0 is linear in n, and if n is big enough, r0 comes arbitrarily close to n.
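Since r0 has no closed form, it can be located by brute force (a sketch, again assuming perf(r) = r**c; the function name is ours):

```python
def best_asymmetric_r(f: float, n: int, c: float) -> int:
    """Return the integer r in [1, n] maximizing the asymmetric-chip
    speedup under perf(r) = r**c: one r-BCE core plus n - r base cores."""
    def speedup(r: int) -> float:
        p = r ** c
        return 1.0 / ((1 - f) / p + f / (p + n - r))
    return max(range(1, n + 1), key=speedup)
```

For moderate f and n the optimum already sits strictly between the extremes, consistent with Theorem 2.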

20 Asymmetrical Multicore Chips If n is big enough, will the maximum speedup always be achieved between the extremes for any perf(x) < x? Counterexample: –perf(x) = kx, for any f < k < 1. For saturated functions, however, like perf(x) = x^c or perf(x) = k·x^c + m·x^{c'} + …, where c, c' < 1, the interior optimum does exist.

21 Asymmetrical Multicore Chips Under the simplistic assumptions of Amdahl’s law, it makes the most sense to devote extra resources to increasing only one core’s capability. In fact, we have the following theorem: although the architecture of an asymmetric multicore chip with one large core and many base cores was originally assumed for simplicity, it is indeed the optimal architecture in the sense of speedup.

22 Dynamic Multicore Chips We should increase both f and n to enhance the speedup of a dynamic multicore chip. For fixed f and n, –if perf(r) is an increasing function, the speedup is also an increasing function of r –→ the maximum speedup is always achieved at r = n. → Dynamic multicore chips can offer potential speedups that are greater than, and never worse than, those of symmetric or asymmetric multicore chips with identical perf(r) functions. So researchers should continue to investigate methods that approximate a dynamic multicore chip.
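The dominance of the dynamic design over the other two (for identical perf(r)) can be checked pointwise; a sketch with perf(r) = r**0.5:

```python
def perf(r: float, c: float = 0.5) -> float:
    return r ** c

def sym(f, n, r):
    return 1 / ((1 - f) / perf(r) + f * r / (perf(r) * n))

def asym(f, n, r):
    return 1 / ((1 - f) / perf(r) + f / (perf(r) + n - r))

def dyn(f, n, r):
    return 1 / ((1 - f) / perf(r) + f / n)

# Since perf(r) <= r, the dynamic chip's parallel term f/n is never
# larger than the others', so dynamic >= symmetric and >= asymmetric.
for f in (0.1, 0.5, 0.9):
    for r in (1, 2, 4, 8, 16):
        assert dyn(f, 16, r) >= sym(f, 16, r)
        assert dyn(f, 16, r) >= asym(f, 16, r)
```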

23 Potentials of Maximum Speedups Recall that in Amdahl’s law, even as the number of processors approaches infinity, the speedup is bounded by 1/(1 − f). Here, by contrast, increasing n improves the speedup continuously. Under the assumption perf(r) = r^c, as n approaches infinity the speedup can also approach infinity, even if the performance index c is small.
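For instance, a symmetric chip configured as one big core (r = n) under perf(r) = r**c reduces to a speedup of exactly n**c, which is unbounded in n even for a small index c (sketch; the function name is ours):

```python
def speedup_one_big_core(f: float, n: int, c: float = 0.25) -> float:
    """Symmetric chip with r = n: the chip is one n-BCE core, so both
    the sequential and parallel terms run at perf(n), and the speedup
    algebraically simplifies to perf(n) = n**c, independent of f."""
    p = n ** c
    return 1.0 / ((1 - f) / p + f * n / (p * n))

growth = [speedup_one_big_core(0.99, n) for n in (16, 256, 4096)]
assert growth[0] < growth[1] < growth[2]  # unbounded growth with n
```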

24 Implications and Results A theoretical analysis of multicore scalability is presented, and quantitative conditions are given for determining how to obtain optimal multicore performance. The theorems and corollary provide computer architects with a better understanding of multicore design types, enabling them to make more informed tradeoffs. However, our precise quantitative results are suspect, because the real world is much more complex: the model considered here ignores many important structures. This theoretical analysis is intended to provide insights for future work.

25 Future Work In real applications, the parallel fraction f is not infinitely parallelizable: the degree of parallelism can be bounded by some constant d, or can even be random in some circumstances. Introduce practical structures, such as the memory hierarchy, shared caches, etc. More cores might allow more parallelism for larger problem sizes, so fixed-time speedup, as in Gustafson’s law, should be considered. …

26 Acknowledgements We would like to thank Professor Mark Hill for his valuable comments and suggestions. We also appreciate the help of Dr. Mark Squillante and the MAMA organizers in arranging this video presentation.

27 Thanks Questions and comments are welcome.

