
1 Phurti: Application and Network-Aware Flow Scheduling for Multi-Tenant MapReduce Clusters
Chris Cai, Shayan Saeed, Indranil Gupta, Roy Campbell, Franck Le. Systems Research Group and Distributed Protocols Research Group.

2 Outline
Introduction, System Architecture, Scheduling Algorithm, Evaluation, Summary

3 Multi-tenancy in MapReduce Clusters
Multiple users submit jobs to a shared MapReduce cluster: better ROI and higher utilization. How should resources be shared? The network is the primary bottleneck.

4 Problem Statement How to schedule network traffic to improve completion time for MapReduce jobs?

5 Application-Awareness in Scheduling
Two jobs share two links. Job 2 has a 6-unit flow on Link 1 and a 2-unit flow on Link 2; Job 1 has a single 3-unit flow on Link 2. Fair Sharing: Job 1 completes at time 5, Job 2 at time 6. Shortest Flow First: Job 1 completes at time 5, Job 2 at time 6. Application-Aware (Job 1's flow goes first on the shared Link 2): Job 1 completes at time 3, Job 2 still at time 6. Knowing which flows belong to which job lets the scheduler finish Job 1 earlier without delaying Job 2.
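To make the arithmetic behind these completion times concrete, here is a minimal Python sketch (not from the talk; the unit link capacity, flow ids, and helper names are illustrative assumptions) that replays the shared-link part of the example under fair sharing and under an application-aware ordering that lets Job 1's flow go first.

```python
# Minimal sketch: completion times for the slide's two-job, two-link example.
# Flow sizes are in abstract "units"; every link has capacity 1 unit per time step.

def fair_share_finish(flow_sizes):
    """Finish times when flows share one link equally (processor sharing)."""
    remaining = dict(enumerate(flow_sizes))
    now, finish = 0.0, {}
    while remaining:
        rate = 1.0 / len(remaining)                       # equal split of link capacity
        fid, size = min(remaining.items(), key=lambda kv: kv[1])
        dt = size / rate                                  # time until the smallest flow drains
        now += dt
        for f in remaining:
            remaining[f] -= rate * dt
        finish[fid] = now
        del remaining[fid]
    return finish

def sequential_finish(flow_sizes, order):
    """Finish times when flows run one after another in the given order."""
    now, finish = 0.0, {}
    for fid in order:
        now += flow_sizes[fid]
        finish[fid] = now
    return finish

# Link 2 carries Job 1's 3-unit flow (id 0) and Job 2's 2-unit flow (id 1);
# Job 2 also has a 6-unit flow alone on Link 1, so Job 2 always finishes at 6.
link2 = [3, 2]
fair = fair_share_finish(link2)
app_aware = sequential_finish(link2, order=[0, 1])        # Job 1's flow goes first

print("fair sharing:      Job1 =", fair[0], " Job2 =", max(fair[1], 6))
print("application-aware: Job1 =", app_aware[0], " Job2 =", max(app_aware[1], 6))
```

Shortest-flow-first on Link 2 would instead drain Job 2's 2-unit flow first, leaving Job 1 at 5 while Job 2 still waits on its 6-unit flow on Link 1, which is why it does not help here.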

6 Network-Awareness in Scheduling
Nodes N1-N4 are connected through switches S1 and S2. Path 1 (N1 to N3) and Path 2 (N2 to N4) both traverse the link between S1 and S2. Job 1 has 3 units of traffic on Path 1; Job 2 has 3 units on Path 2.

7 Network-Awareness in Scheduling
With the same traffic (Job 1: 3 units on Path 1; Job 2: 3 units on Path 2, where both paths cross the S1-S2 link): Network-agnostic scheduling runs both flows together, so Job 1 and Job 2 both complete at time 6. Network-aware scheduling runs Job 1's flow first, so Job 1 completes at time 3 while Job 2 still completes at time 6. Takeaway: Do not schedule interfering flows of concurrent jobs together.
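A network-aware scheduler needs to know when two flows would compete for the same link. The sketch below is my own representation (not the paper's code): it treats a path as a sequence of nodes and flags interference when two paths share a directed link, as Path 1 and Path 2 do on S1-S2.

```python
# Minimal sketch: detect flow interference from the network topology.
# Two flows interfere if their paths share a link, so a network-aware
# scheduler avoids running them at the same time.

def links(path):
    """Directed links (consecutive node pairs) along a node path."""
    return set(zip(path, path[1:]))

def interfere(path_a, path_b):
    return bool(links(path_a) & links(path_b))

# Topology from the slide: both paths traverse the S1-S2 core link.
path1 = ["N1", "S1", "S2", "N3"]   # Job 1's 3-unit flow
path2 = ["N2", "S1", "S2", "N4"]   # Job 2's 3-unit flow

print(interfere(path1, path2))     # True -> schedule one job's flow first,
                                   # giving Job 1 = 3 and Job 2 = 6 instead of 6 and 6
```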

8 Related Work
Traditional flow scheduling: PDQ [SIGCOMM '12], Hedera [NSDI '10]. These only improve network-level metrics.
Application- and network-aware task schedulers: Cross-Layer Scheduling [IC2E 2015], Tetris [SIGCOMM '14]. These schedule tasks instead of network traffic.
Application-aware traffic schedulers: Baraat [SIGCOMM '14], Varys [SIGCOMM '14]. These are unaware of network topology.

9 Phurti: Contributions
Improves Job Completion Time. Fairness and Starvation Protection. Scalable. API Compatibility. Hardware Compatibility.

10 Outline
Introduction, System Architecture, Scheduling Algorithm, Evaluation, Summary

11 Phurti Framework
Hadoop nodes (N1-N6) report application-level flow information to the Phurti scheduling framework through a northbound API; the framework configures the SDN switches (S1, S2) through a southbound API.
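The slide only names the two interfaces, so the following is a hypothetical sketch of how they could fit together: a controller object exposes northbound calls that instrumented Hadoop nodes would invoke when shuffle flows start and finish, and translates scheduling decisions into southbound switch configuration. All class, method, and parameter names here are assumptions, not Phurti's actual API.

```python
# Hypothetical glue code: Hadoop reports flows northbound, the framework pushes
# per-flow rates southbound. The scheduling policy itself is stubbed out here
# and sketched after the algorithm slides.

class DummySwitchAPI:
    """Stand-in for a southbound adapter (e.g., switch rules or rate limiters)."""
    def set_rate_limit(self, src, dst, rate):
        print(f"switch rule: {src} -> {dst} at rate {rate}")

def schedule(active_flows):
    # Placeholder policy: give every flow the full rate. The real policy
    # (priorities from maximum sequential traffic) comes later in the deck.
    return {fid: 1.0 for fid in active_flows}

class PhurtiController:
    def __init__(self, switch_api):
        self.switch_api = switch_api
        self.active_flows = {}            # flow_id -> dict(job, src, dst, size)

    # Northbound API: called by instrumented Hadoop nodes.
    def flow_started(self, flow_id, job_id, src, dst, size_bytes):
        self.active_flows[flow_id] = dict(job=job_id, src=src, dst=dst, size=size_bytes)
        self._reschedule()

    def flow_finished(self, flow_id):
        self.active_flows.pop(flow_id, None)
        self._reschedule()

    # Southbound side: apply the current scheduling decision to the switches.
    def _reschedule(self):
        for flow_id, rate in schedule(self.active_flows).items():
            f = self.active_flows[flow_id]
            self.switch_api.set_rate_limit(f["src"], f["dst"], rate)

controller = PhurtiController(DummySwitchAPI())
controller.flow_started("f1", job_id="J1", src="N1", dst="N4", size_bytes=2_000_000)
```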

12 Outline
Introduction, System Architecture, Scheduling Algorithm, Evaluation, Summary

13 Phurti Algorithm – Intuition
Job 1 and Job 2 each have several flows spread over paths P1 and P2. Job 1's maximum sequential traffic, the largest total traffic that any single path must carry for the job, is 4 units; Job 2's is 5 units. Running alone, Job 1 completes at time 4 and Job 2 at time 5. Takeaway: Job completion time is determined by maximum sequential traffic.
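A small sketch of the metric itself (the data layout and the individual flow sizes are my assumptions; only the per-job totals of 4 and 5 units come from the slide): a job's maximum sequential traffic is the largest amount of its own traffic that any single path must carry, so it lower-bounds the job's completion time.

```python
from collections import defaultdict

def max_sequential_traffic(flows):
    """flows: list of (path, size) pairs belonging to one job."""
    per_path = defaultdict(int)
    for path, size in flows:
        per_path[path] += size
    return max(per_path.values())

# Illustrative flow sizes chosen so the busiest-path totals match the slide:
job1 = [("P1", 1), ("P1", 3), ("P2", 2)]   # P1 must carry 4 units sequentially
job2 = [("P1", 2), ("P2", 4), ("P2", 1)]   # P2 must carry 5 units sequentially
print(max_sequential_traffic(job1), max_sequential_traffic(job2))   # 4 5
```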

14 Phurti Algorithm – Intuition (cont.)
Job 1 (maximum sequential traffic: 4 units) and Job 2 (maximum sequential traffic: 5 units) now compete for paths P1 and P2. If Job 1 is scheduled first, Job 1 completes at time 4 and Job 2 at time 8. If Job 2 is scheduled first, Job 2 completes at time 5 and Job 1 at time 8. Observation: It is better to schedule the job with the smaller maximum sequential traffic first.

15 Phurti Algorithm
1. Assign priorities to jobs based on maximum sequential traffic (smaller traffic means higher priority).
2. Let the flows of the highest-priority job transfer (latency improvement).
3. Let non-interfering flows of the lower-priority jobs transfer (throughput maximization).
4. Let the remaining lower-priority flows transfer at a small rate (starvation protection).
Example (nodes N1-N4 connected through switches s1, s2, s3): job J1 has flows N1->N4 and N4->N1 of 2 units each and gets LOW priority; job J2 has a 1-unit flow N2->N3 and gets HIGH priority.
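Below is a minimal sketch of one scheduling round in the spirit of these four steps. It is not the actual Phurti implementation: the trickle rate, the path representation (tuples of nodes, with interference meaning a shared directed link), and the example paths through s1, s2, s3 are all assumptions; only the flow sizes and resulting priorities come from the slide's example.

```python
from collections import defaultdict

TRICKLE_RATE = 0.05   # assumed small rate used for starvation protection

def links(path):
    """Directed links (consecutive node pairs) along a path."""
    return set(zip(path, path[1:]))

def max_sequential_traffic(flows):
    per_path = defaultdict(int)
    for path, size in flows:
        per_path[path] += size
    return max(per_path.values())

def schedule(jobs):
    """jobs: {job_id: [(path, size), ...]} -> {(job_id, flow_idx): rate}."""
    order = sorted(jobs, key=lambda j: max_sequential_traffic(jobs[j]))   # smaller first
    rates, claimed = {}, set()         # claimed: links already used by full-rate flows
    for rank, job in enumerate(order):
        for idx, (path, _) in enumerate(jobs[job]):
            if rank == 0 or not (links(path) & claimed):
                rates[(job, idx)] = 1.0             # highest priority or non-interfering
                claimed |= links(path)
            else:
                rates[(job, idx)] = TRICKLE_RATE    # interfering lower-priority flow
    return rates

# Slide's example: J2's single 1-unit flow has the smaller maximum sequential
# traffic, so it gets HIGH priority; J1's two 2-unit flows are LOW priority.
jobs = {
    "J1": [(("N1", "s1", "s2", "s3", "N4"), 2), (("N4", "s3", "s2", "s1", "N1"), 2)],
    "J2": [(("N2", "s1", "s2", "N3"), 1)],
}
print(schedule(jobs))
# J2's flow runs at full rate; J1's N1->N4 flow shares the s1->s2 link with it
# and is throttled to the trickle rate, while J1's N4->N1 flow does not interfere
# and also runs at full rate.
```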

16 Evaluation Baseline: Fair Sharing (Default in MapReduce)
Testbed: 6 nodes, 2 SDN switches. SWIM workload: generated from a Facebook Hadoop trace.
Job size bin | % of total jobs | % of total bytes in shuffled data
Small        | 62%             | 5.5%
Medium       | 16%             | 10.3%
Large        | 22%             | 84.2%

17 Job Completion Time 95% of jobs have a better job completion time under Phurti. Negative values in the plot mean Phurti performs better.

18 Job Completion Time A 13% improvement in 95th-percentile job completion time demonstrates starvation protection. Gains are much larger for smaller jobs, since they typically receive higher priority.

19 Flow Scheduling Overhead
Simulated a fat-tree topology with 128 hosts. Even in the unlikely event of 100 simultaneous incoming flows, the scheduling time is 4.5 ms, which is a negligible overhead.

20 Flow Scheduling Overhead
Scheduling time for a new flow with 10 ongoing flows in the network: the scheduling overhead grows much more slowly than linearly with the number of hosts, showing that Phurti is scalable.

21 Phurti vs Varys Simulated a 128-host fat-tree topology with the core network provisioned at 1x, 5x, and 10x the capacity of the access links. Phurti outperforms Varys significantly when the core has much less capacity (oversubscribed), and is better than Varys in every case.

22 Phurti: Contributions
Improves job completion time: better completion time for 95% of jobs and a 20% lower average completion time across all jobs. Fairness and starvation protection: improves tail (95th-percentile) job completion time by 13%. Scalable: shown to scale to 1024 hosts and 100 simultaneous flow arrivals. API compatibility. Hardware compatibility.

23 Backup slides

24 Effective Transmit Rate
80% of jobs have an effective transmit rate above 0.9, showing that throttling of lower-priority flows is minimal.

