Optimal Power Allocation for Multiprogrammed Workloads on Single-chip Heterogeneous Processors Euijin Kwon 1,2 Jae Young Jang 2 Jae W. Lee 2 Nam Sung Kim.


1 Optimal Power Allocation for Multiprogrammed Workloads on Single-chip Heterogeneous Processors
Euijin Kwon 1,2, Jae Young Jang 2, Jae W. Lee 2, Nam Sung Kim 2,3

2 Single-chip heterogeneous processors
Compared to systems based on discrete components
-Lower communication overhead
-Lower power consumption
-Lower cost (less silicon)
-Emerging application friendly (sequential + parallel processing)
Examples: AMD’s Llano, Intel’s Sandy Bridge, Samsung’s Exynos
Sources: AMD, Intel, and Samsung

3 Challenges
Performance of a single-chip heterogeneous processor (SCHP): limited by power budget
-Total chip power budget
-CPU/GPU power budget
Multiprogrammed workload (MW)
-Workload-aware power allocation
-Considering characteristics and metrics
How can we optimize overall performance within a limited power budget?

4 Outline
Motivation
Target platform: SCHP + MW
Workload-aware power allocation
-Characteristics of programs
-Evaluation metrics
Methodology
-Power configuration
-Benchmark programs
Evaluation
Algorithm
Conclusion

5 Target platform: SCHP + MW
4-core CPU + 16-SM GPU
Multiple V/F domains -> DVFS
2 programs running
Hardware resources evenly divided
[Figure: block diagram of the target platform. Program 1 and Program 2 of the multiprogrammed workload run on CPU Core0 to Core3 (per-core CPU V/F domains), GPU0 and GPU1 (one V/F domain each), and the memory controllers (MCs V/F domain).]

6 Workload-aware power allocation
Characteristics of programs
-Non-uniform performance sensitivities
Evaluation metrics
-Throughput vs. energy efficiency
[Figure: normalized throughput vs. power allocation (using the same HW); annotation: allocating more power to mri-q.]

8 Methodology: shared power budget
The power budget can be changed for each of CPU 1, CPU 2, GPU 1, and GPU 2.
Power configuration (W)
-CPU 1: 17.4, 24.8, 34.2, 46.4, 62.8
-CPU 2: 17.4, 24.8, 34.2, 46.4, 62.8
-GPU 1: 11.2, 16.8, 22.4, 31.2, 41.6
-GPU 2: 11.2, 16.8, 22.4, 31.2, 41.6
Total chip power budget = 100 W
CPU power budget = 80 W
GPU power budget = 64 W
Baseline configuration
-Evenly divided (25 W for each CPU/GPU group)
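The power levels and budgets on this slide define a small search space, which can be enumerated directly. The following is a minimal sketch, not the authors' tooling: it assumes the CPU and GPU budgets cap the sum over the two CPU groups and the two GPU groups respectively, and that the total budget caps all four together. The names `is_valid` and `valid_configs` are hypothetical.

```python
from itertools import product

# Power levels in watts, as listed on the slide.
CPU_LEVELS = (17.4, 24.8, 34.2, 46.4, 62.8)
GPU_LEVELS = (11.2, 16.8, 22.4, 31.2, 41.6)

# Budgets in watts, as listed on the slide.
TOTAL_BUDGET = 100.0
CPU_BUDGET = 80.0
GPU_BUDGET = 64.0

EPS = 1e-9  # tolerance for floating-point rounding in the sums


def is_valid(cpu1, gpu1, cpu2, gpu2):
    """Assumed rule: each budget caps the sum of the groups it covers."""
    return (
        cpu1 + cpu2 <= CPU_BUDGET + EPS
        and gpu1 + gpu2 <= GPU_BUDGET + EPS
        and cpu1 + gpu1 + cpu2 + gpu2 <= TOTAL_BUDGET + EPS
    )


def valid_configs():
    """All (cpu1, gpu1, cpu2, gpu2) combinations that fit every budget."""
    return [
        cfg
        for cfg in product(CPU_LEVELS, GPU_LEVELS, CPU_LEVELS, GPU_LEVELS)
        if is_valid(*cfg)
    ]
```

Under this reading, a configuration such as (17.4, 41.6, 17.4, 22.4) fits all three budgets, while running both GPU groups at 41.6 W exceeds the 64 W GPU budget.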

9 Methodology: benchmark programs
Used 6 benchmark programs, divided into 3 groups depending on their characteristics.
Benchmark                     Acronym  Source       Characteristics
Magnetic Resonance Imaging Q  MRQ      Parboil      Compute-bound
Stream Cluster                SCL      Rodinia      Compute-bound
Hotspot                       HOT      Rodinia      Neutral
Sum of Absolute Difference    SAD      Parboil      Neutral
Stencil                       STN      Parboil      Memory-bound
Stream Copy                   SCP      CS Virginia  Memory-bound

11 Evaluation: case study 1 (compute- vs. memory-bound)
19% throughput improvement
32% energy efficiency improvement
Allocating more power to the compute-bound program
Optimal points vary depending on metrics.

12 Evaluation: case study 2 (memory- vs. memory-bound)
10% throughput improvement
32% energy efficiency improvement
Equally allocated power
Again, the optimal point depends on
-Evaluation metric
-Workload characteristics (compute- or memory-bound)

13 Evaluation: variation of optimal configuration
The optimal configuration depends on the programs’ characteristics and the evaluation metric.
All values in W. C = compute-bound, M = memory-bound, N = neutral.
                   Metric 1: throughput       Metric 2: energy efficiency
                   P1           P2            P1           P2
P1       P2        CPU   GPU    CPU   GPU     CPU   GPU    CPU   GPU
MRQ (C)  SCL (C)   17.4  31.2   17.4  31.2    17.4  16.8   17.4  16.8
SCP (M)  STN (M)   17.4  31.2   17.4  31.2    17.4  11.2   17.4  11.2
SAD (N)  HOT (N)   17.4  31.2   17.4  31.2    17.4  11.2   17.4  16.8
MRQ (C)  SCP (M)   17.4  41.6   17.4  22.4    17.4  22.4   17.4  16.8
SCL (C)  SCP (M)   17.4  41.6   17.4  22.4    17.4  11.2   17.4  11.2
HOT (N)  MRQ (C)   17.4  31.2   17.4  31.2    17.4  11.2   17.4  22.4
MRQ (C)  SAD (N)   17.4  31.2   17.4  31.2    17.4  16.8   17.4  22.4
SCL (C)  SAD (N)   17.4  31.2   17.4  31.2    17.4  16.8   17.4  11.2
HOT (N)  STN (M)   17.4  41.6   17.4  22.4    17.4  11.2   17.4  11.2
HOT (N)  SCP (M)   17.4  41.6   17.4  22.4    17.4  11.2   17.4  11.2
SAD (N)  SCP (M)   17.4  41.6   17.4  22.4    17.4  11.2   17.4  22.4

14 Evaluation: performance improvement from optimal power allocation
Achieved significant improvement
-Throughput: 12%
-Energy efficiency: 18%

15 Algorithm for throughput maximization
-calculate(slope) for each program, giving sp1 and sp2
-If abs(sp1 - sp2) < threshold: alloc(equally)
-Else if sp1 > sp2: alloc(p1_more)
-Otherwise: alloc(p2_more)
-wait(regular_time), then repeat
[Figure: normalized throughput vs. power allocation.]
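The decision step of this flowchart can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the slope is assumed to be the throughput gained per extra watt between two measured points, and the program with the steeper slope is assumed to receive more power. The function names, the sample measurements, and the threshold value are all hypothetical.

```python
def slope(power_lo, tput_lo, power_hi, tput_hi):
    """Assumed definition: throughput gained per extra watt."""
    return (tput_hi - tput_lo) / (power_hi - power_lo)


def decide(sp1, sp2, threshold):
    """Choose the allocation action for one control interval."""
    if abs(sp1 - sp2) < threshold:
        return "equally"
    # Assumption: the program that benefits more from power gets more of it.
    return "p1_more" if sp1 > sp2 else "p2_more"


# Hypothetical samples: (power in W, normalized throughput) per program.
sp1 = slope(22.4, 1.00, 31.2, 1.30)  # program 1 still scales with power
sp2 = slope(22.4, 1.00, 31.2, 1.02)  # program 2 has nearly saturated
action = decide(sp1, sp2, threshold=0.005)
```

In a run-time setting this decision would be re-evaluated after every wait(regular_time) interval, so the allocation can follow phase changes in either program.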

16 Algorithm for energy efficiency maximization
Gradient search from the minimum power allocation
-final = min_power
-MAX = max( EE(final), EE(final, p1++), EE(final, p2++) )
-If EE(final) == MAX: exit
-Else if EE(final, p1++) > EE(final, p2++): final = (final, p1++)
-Otherwise: final = (final, p2++)
-Repeat from the MAX computation
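The gradient search on this slide can be sketched as a loop over discrete power levels. This is a minimal sketch under stated assumptions: `ee` is a caller-supplied function returning the energy efficiency measured for a pair of power levels, "p1++" and "p2++" are read as moving one program to the next higher level, and a step past the highest level is treated as unavailable. The name `gradient_search` and the toy efficiency function are hypothetical.

```python
def gradient_search(levels, ee):
    """Climb from the minimum allocation until no single step improves EE."""
    i = j = 0  # indices of the current levels for program 1 and program 2
    while True:
        here = ee(levels[i], levels[j])
        # EE after raising one program by one level, if a higher level exists.
        up1 = ee(levels[i + 1], levels[j]) if i + 1 < len(levels) else float("-inf")
        up2 = ee(levels[i], levels[j + 1]) if j + 1 < len(levels) else float("-inf")
        if here >= max(up1, up2):
            return levels[i], levels[j]  # EE(final) == MAX, so exit
        if up1 > up2:
            i += 1  # final = (final, p1++)
        else:
            j += 1  # final = (final, p2++)


# Toy efficiency surface with a single peak at (22.4, 16.8); illustration only.
def toy_ee(p1, p2):
    return -((p1 - 22.4) ** 2) - ((p2 - 16.8) ** 2)


GPU_LEVELS = (11.2, 16.8, 22.4, 31.2, 41.6)
best = gradient_search(GPU_LEVELS, toy_ee)
```

Because each iteration raises exactly one index, the loop ends after at most 2 * (len(levels) - 1) steps. Like any greedy search, it stops at the first local maximum it reaches.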

17 Conclusion
We propose a solution for optimal power allocation
-Workload-aware power allocation
-By using program characteristics and evaluation metrics
Significant performance improvement achieved
-Throughput: 12%
-Energy efficiency: 18%
Run-time algorithms effectively find (near-)optimal power allocation

18 Backup slides

19 Simulator
Integrated CPU + GPU simulator
-H. Wang, V. Sathish, R. Singh, M. Schulte and N. Kim, "Workload and Power Budget Partitioning for Single-Chip Heterogeneous Processors," in PACT, 2012.
-http://cpu-gpu-sim.ece.wisc.edu/
-gem5 + GPGPU-Sim
Adaptive power allocation for multiprogrammed workload
-Per-core V/F domains for CPU
-2 V/F domains for GPU

