Published by Maia Tuft. Modified over 9 years ago.
1
Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors
Matt DeVuyst, Rakesh Kumar, Dean Tullsen
2
IPDPS: DeVuyst, Kumar, Tullsen
Some Definitions
- Balanced schedule: a schedule of threads to contexts such that the number of threads per core is equal
- Unbalanced schedule: a schedule of threads to contexts such that the number of threads per core is not equal
[Figure: two example schedules on Cores 1-3, one with Threads 1-6 and one with Threads 1-7]
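The two definitions can be checked mechanically. The sketch below is illustrative only: it assumes a schedule is represented as a mapping from thread id to core id, and the thread placements in the two examples are hypothetical, since the exact placement in the slide's figure is not recoverable.

```python
from collections import Counter

def is_balanced(schedule, num_cores):
    """Return True if every core runs the same number of threads.

    `schedule` maps thread id -> core id (a representation assumed here).
    Cores with no threads count as running zero threads.
    """
    counts = Counter(schedule.values())
    per_core = [counts.get(core, 0) for core in range(num_cores)]
    return len(set(per_core)) == 1

# Hypothetical placements: 6 threads spread 2/2/2, and 7 threads spread 3/2/2.
balanced = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
unbalanced = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 2, 7: 2}

print(is_balanced(balanced, 3), is_balanced(unbalanced, 3))
```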
3
Why a CMP of SMT cores?
- Chip makers are manufacturing more chip multiprocessors (CMPs) with simultaneous multithreading (SMT)
  - Power5
  - Niagara
- Very little work has been done on thread scheduling for such an architecture
- Scheduling on this architecture is challenging
4
Application Diversity
- Different applications have different needs
- One way to cope effectively with application diversity is hardware heterogeneity [Kumar03]
5
Hardware Heterogeneity
[Figure labels: Threads, Cores]
6
Application Diversity
- Different applications have different needs
- One way to cope effectively with application diversity is hardware heterogeneity
- Another way to deal with application diversity is soft heterogeneity
7
Soft Heterogeneity
[Figure labels: Threads, SMT Cores]
8
Scheduling Complexity
- Given a 4-core CMP with 4 contexts per core and 12 threads:
  - There are 15,400 balanced schedules
  - There are 644,875 unbalanced schedules
[Figure labels: Core, Context]
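The balanced count can be reproduced under one plausible counting convention, which is an assumption here and not stated on the slide: threads are distinct, while identical cores (and the contexts within a core) are interchangeable. Twelve threads split into four groups of three then give 12! / ((3!)^4 · 4!) schedules. The helper name `count_schedules` is hypothetical. The unbalanced figure depends on a convention the slide does not spell out, so this sketch does not try to reproduce it.

```python
from collections import Counter
from math import factorial

def count_schedules(group_sizes):
    """Count ways to split distinct threads into per-core groups of the given sizes.

    Assumes cores are interchangeable, so groups of equal size are not
    distinguished from one another, and ignores which context a thread uses.
    """
    ways = factorial(sum(group_sizes))
    for size in group_sizes:
        ways //= factorial(size)          # order within a core does not matter
    for multiplicity in Counter(group_sizes).values():
        ways //= factorial(multiplicity)  # equal-sized cores are interchangeable
    return ways

print(count_schedules([3, 3, 3, 3]))  # 15400
```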
9
Our Goals
- Find good scheduling policies
- System-level scheduling → granularity is an OS time slice
- Optimize for both power and performance
  - Performance
  - Power
  - Energy
  - Energy-Delay Product (EDP) = Energy × Delay (delay is execution time, the inverse of performance)
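As a small worked example of the last metric: EDP is conventionally energy multiplied by execution time, so lower values are better and a slowdown is penalized twice (once through energy, once through delay). The function name and the numbers below are hypothetical, chosen only to show that a higher-power but faster run can still win on EDP.

```python
def energy_delay_product(avg_power_watts, exec_time_s):
    """Return EDP in joule-seconds from average power and execution time."""
    energy_joules = avg_power_watts * exec_time_s
    return energy_joules * exec_time_s

# Hypothetical measurements for two schedules of the same workload.
slow_low_power = energy_delay_product(20.0, 2.0)   # 40 J over 2.0 s
fast_high_power = energy_delay_product(30.0, 1.5)  # 45 J over 1.5 s

print(slow_low_power, fast_high_power)  # 80.0 67.5
```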
10
Outline
- Architecture
- Methodology
- Scheduling Policies
- Conclusions
11
Architecture
- 4 SMT cores
- 4 contexts per core
- Shared L2, L3
- Cores can be power-gated
[Figure: four cores, each labeled with Ctx and Shared L1s, plus the L2 and L3 Caches]
12
Methodology
- Benchmarks
  - 12 SPEC 2k benchmarks
  - TLP varied over 4, 6, 8, 12, and 16 threads
  - 8 benchmark sets for each level of TLP
  - Each benchmark is given fair coverage
- Dynamic scheduling policies are seeded with the best static schedule
- Simulation: a variant of SMTSIM and a CMP-aware version of Wattch
13
Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
  - Electron policies
- Conclusions
14
Naïve Balanced Scheduling Policy
- Main idea
  - Spreading threads evenly across cores results in good resource utilization
- How it works
  - Each thread is assigned to a context such that the resulting schedule is balanced
  - The schedule is changed randomly over time
- This was our baseline for comparison
  - Easy to implement
  - Most common
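A minimal sketch of how such a baseline could be generated, assuming a schedule is a mapping from thread id to core id. The function name is hypothetical, and the shuffle-then-deal approach is just one simple way to get a random balanced placement; the slides do not say how the authors produced theirs. Calling it again at each rescheduling point gives the "changed randomly over time" behavior.

```python
import random

def random_balanced_schedule(threads, num_cores, rng=random):
    """Shuffle the threads, then deal them round-robin across the cores.

    Per-core thread counts are equal when len(threads) is a multiple of
    num_cores, and differ by at most one otherwise (an assumption for the
    case the slides do not cover).
    """
    shuffled = list(threads)
    rng.shuffle(shuffled)
    return {thread: i % num_cores for i, thread in enumerate(shuffled)}

schedule = random_balanced_schedule(range(12), num_cores=4, rng=random.Random(0))
print(schedule)
```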
15
What We Learn From Static Schedules
Baseline is Naïve Balanced Dynamic Policy
16
Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
  - Electron policies
- Conclusions
17
Sampling-based Policies
- Main idea
  - Try different schedules to find an effective one
  - Oblivious to underlying hardware
- How they work
  - Two alternating phases
    - Sampling phase: different schedules are sampled
    - Steady phase: best schedule from sampling phase is used
  - Steady phase is much longer than sampling phase
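The two-phase loop can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: `measure` is a hypothetical callback that runs a schedule for one time slice and returns a cost where lower is better (for example EDP), each candidate is sampled for exactly one slice, and the default steady-phase length is arbitrary. A real policy would call this repeatedly so the phases alternate.

```python
def sample_then_run(candidates, measure, steady_slices=50):
    """Run one sampling phase followed by one steady phase.

    Returns the chosen schedule and a timeline of (phase, schedule) entries,
    one entry per time slice.
    """
    timeline = []
    best, best_cost = None, float("inf")
    for schedule in candidates:            # sampling phase: one slice each
        cost = measure(schedule)
        timeline.append(("sample", schedule))
        if cost < best_cost:
            best, best_cost = schedule, cost
    # Steady phase: keep running the best sampled schedule.
    timeline.extend(("steady", best) for _ in range(steady_slices))
    return best, timeline

# Toy usage with made-up costs for three named schedules.
costs = {"A": 3.0, "B": 1.0, "C": 2.0}
best, timeline = sample_then_run(["A", "B", "C"], costs.get)
print(best, len(timeline))
```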
18
Sampling-based Policies
19
Sampling-based Policies
20
Sampling-based Policies
21
Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
    - Symbiosis policies [Snavely02]
    - “Prefer Last” policies
  - Electron policies
- Conclusions
22
Symbiosis Policy
- Main idea
  - Some threads run well together, others do not
- How it works
  - Sampling phase: random schedules are created and their performance is sampled
  - Steady phase: the schedule in which threads achieve the most symbiosis is run
- Two versions:
  - Balanced: only balanced schedules considered
  - Unbalanced
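The difference between the two versions lies only in which random schedules are allowed into the sampling phase. The sketch below assumes a thread-to-core mapping, uses rejection sampling to obtain balanced schedules, and takes a hypothetical `score` callback standing in for whatever symbiosis metric is measured (higher is better). None of these names or choices come from the slides.

```python
import random
from collections import Counter

def random_schedule(threads, num_cores, contexts_per_core, rng, balanced_only=False):
    """Draw a random thread -> core map that never exceeds the contexts per core."""
    threads = list(threads)
    if len(threads) > num_cores * contexts_per_core:
        raise ValueError("more threads than hardware contexts")
    if balanced_only and len(threads) % num_cores != 0:
        raise ValueError("no balanced schedule exists for this thread count")
    while True:
        # One slot per hardware context; shuffling picks a random placement.
        slots = [core for core in range(num_cores) for _ in range(contexts_per_core)]
        rng.shuffle(slots)
        schedule = dict(zip(threads, slots))
        counts = Counter(schedule.values())
        per_core = [counts.get(core, 0) for core in range(num_cores)]
        if not balanced_only or len(set(per_core)) == 1:
            return schedule

def pick_most_symbiotic(threads, num_cores, contexts_per_core, score,
                        num_samples, rng, balanced_only=False):
    """Sample random schedules and return the one with the highest score."""
    samples = [random_schedule(threads, num_cores, contexts_per_core, rng, balanced_only)
               for _ in range(num_samples)]
    return max(samples, key=score)
```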
23
Symbiosis Policy
Baseline is Naïve Balanced
24
Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
    - Symbiosis policies
    - “Prefer Last” policies
  - Electron policies
- Conclusions
25
“Prefer Last” Policies
- Main idea
  - The current schedule has merit
  - A similar schedule might be a little better
- How they work
  - Create multiple permutations of the current schedule
  - Create a few random samples to avoid getting stuck in local minima
  - Sample the schedules and pick the best
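One way to read these steps as code is shown below. Several details are assumptions made for the sketch: a "permutation" is taken to be a swap of two threads on different cores, the current schedule itself is kept as a candidate, `measure` is a hypothetical callback returning a cost where lower is better, and `random_schedule` is a hypothetical zero-argument callable supplying the random samples. Because swaps preserve per-core thread counts, a move-based perturbation would be needed to change how unbalanced the schedule is.

```python
import random

def swap_two(schedule, rng, max_tries=100):
    """Return a copy of `schedule` with two threads on different cores exchanged."""
    threads = list(schedule)
    for _ in range(max_tries):
        a, b = rng.sample(threads, 2)
        if schedule[a] != schedule[b]:
            swapped = dict(schedule)
            swapped[a], swapped[b] = schedule[b], schedule[a]
            return swapped
    return dict(schedule)  # no cross-core pair found; leave unchanged

def prefer_last_step(current, measure, random_schedule, rng,
                     num_neighbors=4, num_random=2):
    """Sample the current schedule, some near neighbors, and a few random ones."""
    candidates = [current]
    candidates += [swap_two(current, rng) for _ in range(num_neighbors)]
    candidates += [random_schedule() for _ in range(num_random)]
    # min() returns the first minimum, so ties favor the current schedule.
    return min(candidates, key=measure)
```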
26
“Prefer Last” Policies
27
“Prefer Last” Policies
28
“Prefer Last” Policies
29
“Prefer Last” Policies
30
Sampling Based Policies
31
Sampling Based Policies
32
Issues With Sampling Based Policies
- Non-scalable
  - Search space grows → number of samples grows
- Overhead of sampling
  - Some schedules result in improvement
  - …but most just make things worse
33
Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
  - Electron policies
- Conclusions
34
Electron Policies
- Main idea
  - One core attracts a thread
  - Another core repels a thread
- How it works (EDP)
  - Highest-EDP core identified
  - Lowest-EDP core identified
  - A thread running on the low-EDP core is moved to the high-EDP core
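A single step of the EDP variant might look like the sketch below, which follows the slide's wording for the direction of the move. The rest is assumed for illustration: `core_edp` is a hypothetical mapping from core id to the EDP measured over the last time slice, the first thread found on the low-EDP core is the one moved (the slide does not say how the thread is chosen), and the move is skipped if the destination core has no free context.

```python
def electron_step(schedule, core_edp, contexts_per_core):
    """Move one thread from the lowest-EDP core to the highest-EDP core.

    `schedule` maps thread id -> core id. Returns a new schedule, or an
    unchanged copy when no legal move exists.
    """
    high = max(core_edp, key=core_edp.get)
    low = min(core_edp, key=core_edp.get)
    on_high = [t for t, core in schedule.items() if core == high]
    on_low = [t for t, core in schedule.items() if core == low]
    if high == low or not on_low or len(on_high) >= contexts_per_core:
        return dict(schedule)
    moved = dict(schedule)
    moved[on_low[0]] = high  # arbitrary choice of thread (assumption)
    return moved

# Toy usage: core 0 has the higher EDP, so it receives a thread from core 1.
before = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
after = electron_step(before, {0: 5.0, 1: 2.0}, contexts_per_core=4)
print(after)
```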
35
Electron Policies
[Figure: threads t1-t8 on Cores 1-4, with the highest-EDP core and the lowest-EDP core marked]
36
Electron Policy Results
37
Outline
- Architecture
- Methodology
- Scheduling Policies
  - Naïve balanced scheduling policy
  - Sampling-based policies
  - Electron policies
- Conclusions
38
Conclusions
- A good scheduling policy for a CMP of SMTs must consider unbalanced schedules to achieve the most efficiency.
- “Prefer Last” policies yield more energy savings than symbiotic scheduling policies and the naïve balanced policy.
- Electron policies have low overhead and are particularly effective when TLP is high.