7/12/2018
Elastic Multi-Resource Fairness: Balancing Fairness and Efficiency in Coupled CPU-GPU Architectures
Shanjiang Tang1, Bingsheng He2, Shuhao Zhang2,4, Zhaojie Niu3
1School of Computer Science & Technology, Tianjin University
2School of Computing, National University of Singapore
3Interdisciplinary Graduate School, Nanyang Technological University
4SAP Research & Innovation, Singapore
Outline Motivation Elastic Multi-Resource Fairness Evaluation
Conclusion and Future Work
Coupled CPU-GPU Architectures (CCGA)
Integrate the CPU and GPU into a single chip by removing the PCIe bus (Intel Sandy Bridge, Ivy Bridge, AMD APU, etc.), which reduces the data-transfer cost between CPU and GPU. [Diagram: CPU and GPU sharing main memory on one chip]
Heterogeneous Computing Model
1. Co-run computation (job level).
2. Sharing computing devices (user level):
1) High resource utilization: a) a single user's workload often cannot fully utilize the high parallelism of a GPU; b) a user's workload has varying resource demands.
2) High utilization leads to high cost efficiency.
Fair Resource Allocation
Integrate the CPU and GPU resources of a CCGA in terms of GFLOPS: users care about the total allocated GFLOPS across all computing devices in the CCGA, rather than the fairness over the CPU or GPU separately. Different allocation policies yield different results for fairness and efficiency:
Proportional resource sharing
Dominant resource fairness (DRF)
Efficiency-oriented allocation
Motivation Example
[Slide figure: two users share a CCGA with capacity 800 GFLOPS (CPU) and 100 GFLOPS (GPU); unallocated capacity sits idle.]
Proportional Resource Sharing
[Slide figure: proportional sharing gives each user the same total of 200 GFLOPS. User 1: 180 GFLOPS CPU + 20 GFLOPS GPU; User 2: 120 GFLOPS CPU + 80 GFLOPS GPU; the remaining CPU is idle; efficiency shown: 37%.]
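The numbers on this slide can be reproduced with a short sketch. Assumptions (not stated explicitly on the slide): "proportional sharing" here means every user receives the same total GFLOPS, and each user's CPU:GPU demand ratio is inferred from the slide's figures (User 1 ≈ 9:1, User 2 ≈ 1.5:1).

```python
# Proportional sharing sketch: give every user the same total GFLOPS T,
# splitting each user's share by a fixed per-workload CPU:GPU demand ratio.
# Capacities are from the slide; the ratios are inferred assumptions.
CPU_CAP, GPU_CAP = 800.0, 100.0          # GFLOPS
ratios = {"user1": 9.0, "user2": 1.5}    # CPU demand per unit of GPU demand

def split(total, r):
    """Split `total` GFLOPS into (cpu, gpu) with cpu = r * gpu."""
    gpu = total / (1.0 + r)
    return r * gpu, gpu

# Largest equal total T such that neither device is oversubscribed:
# a user with ratio r consumes T*r/(1+r) CPU and T/(1+r) GPU.
cpu_per_T = sum(r / (1 + r) for r in ratios.values())
gpu_per_T = sum(1 / (1 + r) for r in ratios.values())
T = min(CPU_CAP / cpu_per_T, GPU_CAP / gpu_per_T)

alloc = {u: split(T, r) for u, r in ratios.items()}
print(T)       # 200.0 GFLOPS per user
print(alloc)   # user1: (180.0, 20.0), user2: (120.0, 80.0)
```

The GPU is the binding device here: it is fully used while 500 GFLOPS of CPU stay idle.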
Dominant Resource Fairness (DRF)
[Slide figure: DRF equalizes dominant shares at 52.9%. User 1: 423.5 GFLOPS CPU + 47.1 GFLOPS GPU (dominant resource: CPU); User 2: 79.4 GFLOPS CPU + 52.9 GFLOPS GPU (dominant resource: GPU); the remaining CPU is idle; efficiency shown: 63%.]
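The DRF allocation on this slide can be reproduced with a two-user water-fill that raises both users' dominant shares in lockstep until a device saturates. The per-user CPU:GPU demand ratios (9:1 and 1.5:1) are inferred from the slide's figures, not stated on it.

```python
# Two-user DRF sketch: raise every user's dominant share s together
# until some device's capacity binds.
CPU_CAP, GPU_CAP = 800.0, 100.0
ratios = {"user1": 9.0, "user2": 1.5}   # CPU demand per unit of GPU demand

def alloc_at(s, r):
    """(cpu, gpu) consumed by a user with demand ratio r at dominant share s."""
    if r / CPU_CAP >= 1.0 / GPU_CAP:    # CPU is this user's dominant resource
        cpu = s * CPU_CAP
        return cpu, cpu / r
    gpu = s * GPU_CAP                   # GPU is this user's dominant resource
    return r * gpu, gpu

# Both devices' usage grows linearly in s, so the feasible maximum is a min.
cpu_at_1 = sum(alloc_at(1.0, r)[0] for r in ratios.values())
gpu_at_1 = sum(alloc_at(1.0, r)[1] for r in ratios.values())
s = min(CPU_CAP / cpu_at_1, GPU_CAP / gpu_at_1)

alloc = {u: alloc_at(s, r) for u, r in ratios.items()}
print(round(s, 3))   # 0.529 -> both users end with a 52.9% dominant share
print({u: (round(c, 1), round(g, 1)) for u, (c, g) in alloc.items()})
# user1: (423.5, 47.1), user2: (79.4, 52.9) -- matching the slide
```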
Efficiency-oriented Allocation
[Slide figure: the efficiency-oriented allocation leaves nothing idle. User 1: 780 GFLOPS CPU + 86.7 GFLOPS GPU; User 2: 20 GFLOPS CPU + 13.3 GFLOPS GPU; both devices are fully utilized.]
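Maximizing total allocated GFLOPS is a linear program; for this two-user example with fixed demand ratios (again the inferred 9:1 and 1.5:1), the optimum uses both devices fully, so the LP collapses to a 2x2 linear system:

```python
# Efficiency-oriented sketch: with two users whose demand ratios straddle
# the capacity ratio, the optimum fills both devices, reducing the LP to
#   r1*g1 + r2*g2 = CPU_CAP   (CPU fully used)
#        g1 + g2 = GPU_CAP   (GPU fully used)
CPU_CAP, GPU_CAP = 800.0, 100.0
r1, r2 = 9.0, 1.5   # inferred demand ratios for User 1 and User 2

g1 = (CPU_CAP - r2 * GPU_CAP) / (r1 - r2)
g2 = GPU_CAP - g1
c1, c2 = r1 * g1, r2 * g2

print((round(c1, 1), round(g1, 1)))  # user 1: (780.0, 86.7)
print((round(c2, 1), round(g2, 1)))  # user 2: (20.0, 13.3)
print(round(c1 + c2 + g1 + g2, 6))   # 900.0 -> both devices fully utilized
```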
Fairness vs. Efficiency
There tends to be a tradeoff between fairness and efficiency: pursuing 100% fairness often results in poor efficiency, and vice versa. We need an allocation policy that can balance the two metrics as flexibly as users want.
Desirable Fair Sharing Properties
Sharing Incentive (SI): each user performs no worse than under the exclusive non-sharing case.
Envy Freeness (EF): no user envies the allocation of any other user.
Pareto Efficiency (PE): no user can increase its own allocation without decreasing the allocation of at least one other user.
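Sharing Incentive can be sanity-checked numerically (not proved) against the DRF example earlier in the deck. Assumption: "exclusive non-sharing" is taken to mean each of the two users owns half of each device, and the demand ratios are the inferred example values.

```python
# Numeric sanity check of Sharing Incentive for the slide's DRF example:
# each user's DRF total should be at least what it could extract alone
# from a half slice of each device (an assumed non-sharing baseline).
CPU_CAP, GPU_CAP = 800.0, 100.0
ratios = {"user1": 9.0, "user2": 1.5}                  # inferred demands
drf_totals = {"user1": 423.5 + 47.1, "user2": 79.4 + 52.9}  # slide values

def standalone_total(r, n_users=2):
    """Best total GFLOPS from a 1/n slice of each device, keeping the
    user's CPU usage at r times its GPU usage."""
    gpu = min(GPU_CAP / n_users, (CPU_CAP / n_users) / r)  # binding device
    return (r + 1.0) * gpu

for user, r in ratios.items():
    assert drf_totals[user] >= standalone_total(r), user
print("Sharing Incentive holds for this example")
```

User 1 alone gets at most 444.4 GFLOPS versus 470.6 under DRF; User 2 gets 125 versus 132.3.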
Our Work
Challenge: can we find an allocation policy that 1) allows users to flexibly tune and balance fairness and efficiency as they need, and 2) meanwhile satisfies the following properties: Sharing Incentive, Envy Freeness, and Pareto Efficiency?
Our solution: Elastic Multi-Resource Fairness (EMRF), a knob-based allocation policy built on top of DRF.
Outline Motivation Elastic Multi-Resource Fairness Evaluation
Conclusion and Future Work
Elastic Multi-Resource Fairness
Step 1: Fairness-oriented Allocation (FA)
[Slide figure: the FA step gives Users 1 and 2 a DRF-style fair allocation and leaves the remaining capacity idle; percentages shown on the slide: 50% and 31.5%.]
Step 2: Efficiency-oriented Allocation (EA)
Model it as a linear programming problem.
[Slide figure: the EA step assigns the capacity left idle after FA to the users, raising system efficiency from 31.5% to 88%.]
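The two-step idea can be sketched in code. Heavy caveats: this is an illustration, not the paper's exact formulation. It assumes the knob k fixes the fraction of each device handed out by the fairness step, uses the inferred example demand ratios, and replaces the efficiency LP with a simple greedy heuristic.

```python
# EMRF sketch (illustration only): knob k in (0, 1] fixes the fraction of
# each device allocated fairly (DRF-style, step 1); leftover capacity goes
# to an efficiency step (step 2) that greedily maximizes allocated GFLOPS.
CPU_CAP, GPU_CAP = 800.0, 100.0
RATIOS = {"user1": 9.0, "user2": 1.5}   # inferred CPU:GPU demand ratios

def drf_step(cpu_cap, gpu_cap):
    """Two-user DRF water-fill: equalize dominant shares."""
    def at(s, r):
        if r / cpu_cap >= 1.0 / gpu_cap:          # CPU-dominant user
            return s * cpu_cap, s * cpu_cap / r
        return r * s * gpu_cap, s * gpu_cap       # GPU-dominant user
    s = min(cpu_cap / sum(at(1.0, r)[0] for r in RATIOS.values()),
            gpu_cap / sum(at(1.0, r)[1] for r in RATIOS.values()))
    return {u: at(s, r) for u, r in RATIOS.items()}

def efficiency_step(cpu_free, gpu_free):
    """Greedy heuristic: serve users in order of GFLOPS gained per GPU unit."""
    extra = {}
    for u, r in sorted(RATIOS.items(), key=lambda kv: -kv[1]):
        g = min(gpu_free, cpu_free / r)           # what still fits
        extra[u] = (r * g, g)
        cpu_free, gpu_free = cpu_free - r * g, gpu_free - g
    return extra

def emrf(knob):
    fair = drf_step(knob * CPU_CAP, knob * GPU_CAP)
    cpu_used = sum(c for c, _ in fair.values())
    gpu_used = sum(g for _, g in fair.values())
    extra = efficiency_step(CPU_CAP - cpu_used, GPU_CAP - gpu_used)
    return {u: (fair[u][0] + extra[u][0], fair[u][1] + extra[u][1])
            for u in RATIOS}

alloc = emrf(0.5)
used = sum(c + g for c, g in alloc.values())
print({u: (round(c, 1), round(g, 1)) for u, (c, g) in alloc.items()})
print(round(100 * used / (CPU_CAP + GPU_CAP), 1))  # efficiency (%)
```

With knob = 1.0 the sketch degenerates to pure DRF; smaller knob values trade fairness for higher total utilization.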
System Implementation
Properties Analysis for EMRF
EMRF satisfies all three properties (SI, EF, PE); proof sketches are in the paper.
Outline Motivation Elastic Multi-Resource Fairness Evaluation
Conclusion and Future Work
Evaluation
Platform: AMD A8-3870K APU. Benchmarks and detailed setups are in the paper.
Throughput and System Efficiency
Performance evaluation points:
Static Partitioning: the exclusive non-sharing case for each queue/workload.
MLRF: memoryless resource fairness (the default YARN fair scheduler).
LTRF: our long-term resource fair scheduler (i.e., LTYARN).
Observation points: 1) The shared cases (MLRF and LTRF) can achieve better performance than the non-shared case; this finding is consistent with the Mesos paper (NSDI'11). 2) There is no conclusive result regarding which of MLRF and LTRF is absolutely better.
CPU and GPU Utilizations
EMRF can achieve high CPU and GPU utilizations.
Different Knobs
A larger knob value favors fairness but worsens throughput, and vice versa.
Different Number of Users
Throughput shows a decreasing trend as the number of users increases. A smaller knob value achieves better throughput for EMRF.
Outline Motivation Elastic Multi-Resource Fairness Evaluation
Conclusion and Future Work
Conclusions
There is a tradeoff between fairness and efficiency in resource allocation for coupled CPU-GPU architectures (CCGA). We argue that the CPU and GPU of a CCGA should be integrated as a whole when optimizing fairness and efficiency in resource allocation. We propose EMRF, a knob-based fairness-efficiency allocation policy for CCGA. We theoretically show that EMRF satisfies the SI, EF, and PE properties, and experimentally validate its effectiveness.
Future Work
Extend our allocation policy to distributed environments with multiple CCGAs. Extend EMRF by considering other resource types: CPU, memory, network I/O, etc.
Thanks! Questions?