Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors
Matt DeVuyst, Rakesh Kumar, Dean Tullsen

1 Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors
Matt DeVuyst, Rakesh Kumar, Dean Tullsen

2 IPDPS: DeVuyst, Kumar, Tullsen 2 Some Definitions
- Balanced schedule: a schedule of threads to contexts such that the number of threads per core is equal
- Unbalanced schedule: a schedule of threads to contexts such that the number of threads per core is not equal
[Diagram: one example places Threads 1-6 on Cores 1-3; a second places Threads 1-7 on Cores 1-3. The per-core assignment is not recoverable from the transcript.]

3 Why a CMP of SMT cores?
- Chip makers are manufacturing more Chip Multiprocessors (CMP) with Simultaneous Multithreading (SMT)
  - Power5
  - Niagara
- Very little work has been done on thread scheduling for such an architecture
- Scheduling on this architecture is challenging

4 Application Diversity
- Different applications have different needs
- One way to effectively cope with application diversity is hardware heterogeneity [Kumar03]

5 Hardware Heterogeneity
[Diagram labels: Threads, Cores]

6 Application Diversity
- Different applications have different needs
- One way to effectively cope with application diversity is hardware heterogeneity
- Another way to deal with application diversity is soft heterogeneity

7 Soft Heterogeneity
[Diagram labels: Threads, SMT Cores]

8 Scheduling Complexity
- Given a 4-core CMP with 4 contexts per core and 12 threads:
  - There are 15,400 balanced schedules
  - There are 644,875 unbalanced schedules
[Diagram labels: Core, Context]
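
The balanced count can be reproduced with a short calculation, assuming that cores are interchangeable and that the order of threads within a core does not matter (neither assumption is stated on the slide, but together they give 15,400). The function name is hypothetical. The unbalanced count depends on conventions the slide does not spell out, so this sketch does not attempt it.

```python
from math import factorial


def balanced_schedule_count(threads: int, cores: int) -> int:
    """Count ways to split distinct threads evenly over interchangeable cores.

    Assumption: cores are identical and the context order within a core
    is ignored, so only the grouping of threads matters.
    """
    if threads % cores != 0:
        raise ValueError("threads must divide evenly across cores")
    per_core = threads // cores
    # Multinomial coefficient for labelled cores, divided by cores!
    # because the cores are assumed to be identical.
    return factorial(threads) // (factorial(per_core) ** cores * factorial(cores))


print(balanced_schedule_count(12, 4))  # 15400
```

With 12 threads on 4 cores this is 12! / ((3!)^4 * 4!), which matches the figure on the slide.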

9 Our Goals
- Find good scheduling policies
  - System-level scheduling → granularity is an OS time-slice
- Optimize for both power and performance
  - Performance
  - Power
  - Energy
  - Energy Delay Product (EDP) = Energy × Delay
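
For reference, the energy-delay product in its usual form multiplies the energy consumed by the execution time (delay), so lower is better. The symbols below are notation chosen here, not taken from the slides: E is energy, T is execution time, and P-bar is average power.

```latex
\mathrm{EDP} = E \cdot T = (\bar{P} \cdot T) \cdot T = \bar{P} \cdot T^{2}
```

Because delay enters squared, EDP penalizes a slowdown more heavily than energy alone does.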

10 Outline
- Architecture
- Methodology
- Scheduling Policies
- Conclusions

11 Architecture
- 4 SMT cores
- 4 contexts per core
- Shared L2, L3
- Cores can be power-gated
[Diagram: four cores, each with contexts (Ctx) and shared L1s, connected to the L2 and L3 caches]

12 Methodology
- Benchmarks
  - 12 SPEC 2k benchmarks
  - TLP varied over 4, 6, 8, 12, and 16 threads
  - 8 benchmark sets for each level of TLP
- Each benchmark is given fair coverage
- Dynamic scheduling policies seeded with the best static schedule
- A variant of SMTSIM and a CMP-aware version of Wattch

13 Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
  - Electron policies
- Conclusions

14 Naïve Balanced Scheduling Policy
- Main idea
  - Spreading threads evenly across cores results in good resource utilization
- How it works
  - Each thread is assigned to a context such that the resulting schedule is balanced.
  - The schedule is changed randomly over time.
- This was our baseline for comparison
  - Easy to implement
  - Most common
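
A minimal sketch of how such a baseline might pick a schedule, assuming a schedule is represented as one list of threads per core. The function name and representation are hypothetical. When the thread count does not divide evenly, this version spreads threads as evenly as possible, which goes slightly beyond the strict definition of "balanced" used in the slides.

```python
import random


def random_balanced_schedule(threads, num_cores, contexts_per_core, rng=random):
    """Return one thread list per core, with threads spread evenly at random."""
    threads = list(threads)
    if len(threads) > num_cores * contexts_per_core:
        raise ValueError("more threads than hardware contexts")
    rng.shuffle(threads)
    # Deal the shuffled threads round-robin so per-core counts differ by at most one.
    return [threads[core::num_cores] for core in range(num_cores)]


# Calling this again at a later time slice yields a new random balanced schedule.
schedule = random_balanced_schedule(range(12), num_cores=4, contexts_per_core=4)
```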

15 What We Learn From Static Schedules
- Baseline is Naïve Balanced Dynamic Policy

16 Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
  - Electron policies
- Conclusions

17 Sampling-based Policies
- Main idea
  - Try different schedules to find an effective one
  - Oblivious to underlying hardware
- How they work
  - Two alternating phases
    - Sampling phase: different schedules are sampled
    - Steady phase: best schedule from sampling phase is used
  - Steady phase is much longer than sampling phase
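
The two alternating phases can be sketched as a simple loop. Everything here is illustrative: `propose`, `measure`, and `run` are hypothetical callbacks standing in for schedule generation, a per-schedule metric where lower is better (for example EDP), and actually running a schedule for some number of time slices.

```python
from typing import Callable, Iterable, List, TypeVar

Schedule = TypeVar("Schedule")


def sampling_policy(
    propose: Callable[[], Iterable[Schedule]],
    measure: Callable[[Schedule], float],
    run: Callable[[Schedule, int], None],
    rounds: int,
    steady_slices: int,
) -> List[Schedule]:
    """Alternate sampling and steady phases; return the schedule chosen each round."""
    chosen = []
    for _ in range(rounds):
        # Sampling phase: score every proposed schedule and keep the best one.
        best = min(propose(), key=measure)
        # Steady phase: stay on the winner for many more time slices.
        run(best, steady_slices)
        chosen.append(best)
    return chosen
```

Keeping `steady_slices` large relative to the number of proposed schedules reflects the point that the steady phase is much longer than the sampling phase.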

18 Sampling-based Policies

19 Sampling-based Policies

20 Sampling-based Policies

21 Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
    - Symbiosis policies [Snavely02]
    - “Prefer Last” policies
  - Electron policies
- Conclusions

22 Symbiosis Policy
- Main idea
  - Some threads run well together, others do not
- How it works
  - Sampling phase: random schedules are created and their performance is sampled.
  - Steady phase: the schedule in which threads achieve the most symbiosis is run
  - Two versions:
    - Balanced: only balanced schedules considered
    - Unbalanced

23 Symbiosis Policy
- Baseline is Naïve Balanced

24 Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
    - Symbiosis policies
    - “Prefer Last” policies
  - Electron policies
- Conclusions

25 “Prefer Last” Policies
- Main idea
  - The current schedule has merit
  - A similar schedule might be a little better
- How they work
  - Create multiple permutations of the current schedule
  - Create a few random samples to avoid getting stuck in a local minimum
  - Sample schedules and pick the best
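
One way the candidate set for a sampling round could be built is sketched below. The slides do not say how a permutation is formed, so this sketch assumes a permutation means swapping one thread between two cores, and that random samples keep the current per-core thread counts. The function name and the list-of-lists schedule representation are hypothetical.

```python
import random


def prefer_last_candidates(current, num_neighbors, num_random, rng=random):
    """Return the current schedule, swap-neighbors of it, and random reshuffles."""
    candidates = [[list(core) for core in current]]
    occupied = [i for i, core in enumerate(current) if core]

    # Neighbors: swap one thread between two randomly chosen occupied cores.
    for _ in range(num_neighbors):
        neighbor = [list(core) for core in current]
        if len(occupied) >= 2:
            a, b = rng.sample(occupied, 2)
            i = rng.randrange(len(neighbor[a]))
            j = rng.randrange(len(neighbor[b]))
            neighbor[a][i], neighbor[b][j] = neighbor[b][j], neighbor[a][i]
        candidates.append(neighbor)

    # Random samples: reshuffle all threads but keep the per-core counts.
    sizes = [len(core) for core in current]
    threads = [t for core in current for t in core]
    for _ in range(num_random):
        shuffled = threads[:]
        rng.shuffle(shuffled)
        it = iter(shuffled)
        candidates.append([[next(it) for _ in range(n)] for n in sizes])
    return candidates
```

Each candidate would then be sampled for a time slice and the best one kept, as in the generic sampling loop.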

26 “Prefer Last” Policies

27 “Prefer Last” Policies

28 “Prefer Last” Policies

29 “Prefer Last” Policies

30 Sampling Based Policies

31 Sampling Based Policies

32 Issues With Sampling Based Policies
- Non-scalable
  - Search space grows → number of samples grows
- Overhead of sampling
  - Some schedules result in improvement
  - …but most just make things worse

33 Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
  - Electron policies
- Conclusions

34 Electron Policies
- Main idea
  - One core attracts a thread
  - Another core repels a thread
- How it works (EDP)
  - Highest EDP core identified
  - Lowest EDP core identified
  - A thread running on the low EDP core is moved to the high EDP core
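
A single migration step, following the three bullets above, might look like the sketch below. Several details are assumptions made here, not taken from the slides: which thread is moved (the last one on the source core), the restriction of the destination to cores with a free context, and the per-core EDP values being supplied as a list. The function name is hypothetical.

```python
def electron_step(schedule, core_edp, contexts_per_core):
    """Move one thread from the lowest-EDP core to the highest-EDP core.

    schedule: one list of threads per core.
    core_edp: measured EDP per core, indexed like schedule.
    Returns a new schedule; the input is left unchanged.
    """
    new = [list(core) for core in schedule]
    # Only cores that hold a thread can give one up.
    sources = [c for c, ts in enumerate(new) if ts]
    # Assumption: only cores with a free context can accept a thread.
    targets = [c for c, ts in enumerate(new) if len(ts) < contexts_per_core]
    if not sources or not targets:
        return new
    low = min(sources, key=lambda c: core_edp[c])
    high = max(targets, key=lambda c: core_edp[c])
    if low != high:
        # Assumption: the thread to migrate is chosen arbitrarily (the last one).
        new[high].append(new[low].pop())
    return new
```

Unlike the sampling-based policies, one step needs only the per-core measurements from the schedule already running, so no trial schedules have to be executed.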

35 Electron Policies
[Diagram: threads t1-t8 on Cores 1-4, with the highest-EDP core and the lowest-EDP core marked]

36 Electron Policy Results

37 Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
  - Electron policies
- Conclusions

38 Conclusions
- A good scheduling policy for a CMP of SMTs must consider unbalanced schedules to achieve the most efficiency.
- “Prefer Last” policies yield more energy savings than symbiotic scheduling policies and the naïve balanced policy.
- Electron policies have low overhead and are particularly effective when TLP is high.

