Scheduling Jobs Across Geo-distributed Datacenters Chien-Chun Hung, Leana Golubchik, Minlan Yu Department of Computer Science University of Southern California.


Geo-distributed Jobs
Large-scale data-parallel jobs
– Data too big for full replication
– Data spread across geo-distributed datacenters
Conventional approach moves data to the computation
– Bandwidth cost, completion time, data access restrictions
Emerging trend moves computation to the data
– Bandwidth usage savings of up to 250x [Vulimiri-NSDI'15]
– Query completion time improved 3-19x [Pu-SIGCOMM'15]

Job Scheduling
Job scheduling is critical for distributed job execution.
– Global scheduler: computes the job order
– Datacenter scheduler: schedules tasks and reports progress
[Diagram: the global scheduler sends the job order to each datacenter scheduler, which reports progress back]

Challenges in Job Scheduling
Shortest Remaining Processing Time (SRPT) is a natural choice for reducing average job completion time.
– Greedily schedules the smallest job first
– Sub-optimal for distributed jobs due to the lack of coordination across datacenters
Scheduling distributed jobs is NP-hard [Garg-FSTTCS'07].
– We design two job-scheduling heuristics for reducing average job completion time.

Motivating Example

Job | #tasks in DC1 | #tasks in DC2 | #tasks in DC3 | Total #tasks
 A  |       1       |      10       |       1       |      12
 B  |       3       |       8       |       0       |      11
 C  |       7       |       0       |       6       |      13

SRPT order B → A → C: average job completion time 12.3
Optimal order C → B → A: average job completion time 11.7
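The averages on this slide can be reproduced with a small calculation. A minimal sketch, assuming unit-duration tasks, one task running at a time per datacenter, per-datacenter queues served in the global job order, and a job completing when its last task finishes at its bottleneck datacenter (the names `TASKS` and `avg_completion` are illustrative, not from the talk):

```python
# tasks[job] = (#tasks in DC1, #tasks in DC2, #tasks in DC3)
TASKS = {"A": (1, 10, 1), "B": (3, 8, 0), "C": (7, 0, 6)}

def avg_completion(order):
    """Average job completion time for a given global job order."""
    num_dcs = len(next(iter(TASKS.values())))
    elapsed = [0] * num_dcs          # work consumed so far at each DC
    completion = {}
    for job in order:
        finish = 0
        for dc in range(num_dcs):
            elapsed[dc] += TASKS[job][dc]
            if TASKS[job][dc] > 0:   # job finishes at its slowest DC
                finish = max(finish, elapsed[dc])
        completion[job] = finish
    return sum(completion.values()) / len(completion)

print(round(avg_completion("BAC"), 1))  # SRPT order B -> A -> C: 12.3
print(round(avg_completion("CBA"), 1))  # optimal order C -> B -> A: 11.7
```

Job A illustrates the coordination problem: under SRPT it finishes at time 18 because its 10 tasks at DC2 queue behind B's 8 tasks, even though its single tasks at DC1 and DC3 finish early.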

Reordering-based Approach
Design insights:
– Each job has a bottleneck datacenter.
– A job's tasks at other datacenters can be delayed until its bottleneck finishes.
Adjust the job order based on the bottleneck.
[Diagram: job A's bottleneck is its 10 tasks at DC 2]

Reordering Algorithm
Iteratively select a "delay-able" job:
1. Identify the last-scheduled job at the most-loaded datacenter.
2. Delay its tasks at the other datacenters; this won't hurt its job completion time.
3. Continue until all jobs are selected.
Light-weight add-on; never degrades any job's completion time.
Conservative improvements.
SRPT B → A → C, average: 12.3
Reordering B → C → A, average: 12
Optimal: 11.7
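The slide gives no pseudocode, so the following is one plausible reading of the three steps, built back-to-front: among the jobs not yet placed, find the most-loaded datacenter and move its last-scheduled job to the end of the order; that job's bottleneck is there anyway, so delaying its tasks elsewhere cannot hurt its completion time. The task counts are the example's; the function names are illustrative.

```python
# Hypothetical sketch of the Reordering heuristic applied to an SRPT order.
TASKS = {"A": (1, 10, 1), "B": (3, 8, 0), "C": (7, 0, 6)}

def reorder(srpt_order, tasks):
    num_dcs = len(next(iter(tasks.values())))
    remaining = list(srpt_order)
    suffix = []                      # final order, built back-to-front
    while remaining:
        # load each DC carries for the jobs not yet placed
        loads = [sum(tasks[j][d] for j in remaining) for d in range(num_dcs)]
        busiest = loads.index(max(loads))
        # last remaining job (in the current order) with tasks at that DC
        last = [j for j in remaining if tasks[j][busiest] > 0][-1]
        suffix.insert(0, last)       # delay it to the back
        remaining.remove(last)
    return suffix

print(reorder("BAC", TASKS))  # ['B', 'C', 'A']: average drops from 12.3 to 12
```

On the example, the first iteration finds DC2 most loaded (18 tasks) and pushes A, its last job, to the end, yielding B → C → A.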

Workload-Aware Greedy Scheduling (SWAG)
Design insights:
– Faster jobs first (as in SRPT)
– Uneven #tasks at each datacenter
– Uneven existing queue lengths at each datacenter
– The bottleneck datacenter determines a job's completion time.

SWAG Algorithm
Greedily select the "fastest" job:
– Minimum addition to the current queue lengths
– Fewest remaining tasks (tie-breaker)
– Continue until all jobs are selected
More computationally intensive than Reordering, but larger performance improvements.
First pick: C finishes at 7 (vs. 8 for B and 10 for A); next, A and B would both finish at 10, and B wins the tie with 11 remaining tasks vs. A's 12.
SWAG order C → B → A, average: 11.7 (same as Optimal)
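The greedy selection above can be sketched as follows, assuming (as in the slide's description) that each step picks the job whose tasks add the least to its bottleneck queue, breaking ties by fewest remaining tasks; names are illustrative, not from the talk.

```python
# Sketch of SWAG's greedy job selection on the motivating example.
TASKS = {"A": (1, 10, 1), "B": (3, 8, 0), "C": (7, 0, 6)}

def swag(tasks):
    num_dcs = len(next(iter(tasks.values())))
    queues = [0] * num_dcs           # current queue length at each DC
    order = []
    remaining = set(tasks)
    while remaining:
        def finish(j):
            # completion time if job j were scheduled next
            return max(queues[d] + tasks[j][d]
                       for d in range(num_dcs) if tasks[j][d] > 0)
        # fastest job first; tie-break on fewest remaining tasks
        job = min(remaining, key=lambda j: (finish(j), sum(tasks[j])))
        for d in range(num_dcs):     # commit its tasks to the queues
            queues[d] += tasks[job][d]
        order.append(job)
        remaining.remove(job)
    return order

print(swag(TASKS))  # ['C', 'B', 'A'], matching the optimal order here
```

With per-step work proportional to the number of pending jobs times datacenters, this is costlier than Reordering's single pass, which is the trade-off the talk highlights.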

Simulation Settings
Real job size information (Facebook, Google)
Poisson arrival process; 200 – 500 ms
Sensitivity experiments:
– System utilization: 40% – 80% (default: 70%)
– Task distribution: Zipf distribution (default parameter: 2)
– Number of datacenters (default: 30)
– Fairness across different job sizes
– System overhead
– Robustness to estimation accuracy

Performance Improvements
Average job completion time, normalized by FCFS.
More improvement as system utilization increases:
– At 80% utilization, Reordering improves on SRPT by 30%, and SWAG by 40%.
– Improvements reach up to 35% for Reordering and 50% for SWAG.
SWAG achieves performance within 5% of Optimal.
[Figure: normalized average completion time vs. system utilization]

Other Key Results
Sensitivity to task distribution:
– Most improvement in skewed settings, e.g., a Zipf distribution with parameter 2
– Similar performance in extreme scenarios, e.g., uniform (no skew) or single-DC
Fairness across job classes:
– All solutions perform similarly for small jobs.
– SWAG and Reordering outperform for large jobs.

Summary
The challenges of job scheduling:
– Shortcomings of SRPT-based approaches
Two heuristics for reducing average job completion time:
– Reordering: a light-weight add-on with conservative improvements
– SWAG: larger performance improvements at a reasonable cost
Simulation experiments show promising improvements:
– SWAG (up to 50%), Reordering (up to 35%)
– Larger improvements in heavily-loaded and skewed settings

Thank You!
Chien-Chun Hung, University of Southern California (USC)
Poster Presentation: August 28th, 1:30 pm

Appendix

Robustness to Inaccurate Info
Stale job-order info at the local scheduler:
– Continues to schedule based on the previous job order
Stale progress info at the global scheduler:
– Computes the job order based on the stale info
Inaccurate task-duration estimates:
– Performance gains are robust to estimation error.

Scheduling Decision Points and Scalability
The local scheduler computes when a slot becomes available.
– Runs the longest task from the first job in the order
The global scheduler computes upon job arrival/departure.
– No significant difference from also recomputing on task completion
– Less overhead; less than tens of ms

Job/Task Size
Job size is measured by the number of tasks.
– A common approach in existing work
Task duration is estimated from:
– Data processing rate (from historical records)
– Data size
Performance gains are robust to estimation error.

SWAG vs. Reordering Reordering – Light-weight add-on – Won’t degrade any job’s completion time SWAG – Higher computational complexity – More improvements

SWAG vs. Optimal

Sensitivity to Task Distribution
The most improvement occurs in partly-skewed scenarios.
– The gains first increase, then decrease, as skew grows.
All methods perform similarly in extreme scenarios.

Fairness Across Different Job Classes
Slowdown: time in system divided by service time.
All methods achieve their best slowdown for the small job class.
SWAG and Reordering achieve better slowdowns for the large job class.

Overhead

Sensitivity to #Datacenters

Sensitivity to Estimation Accuracy

System Architecture
Global scheduler:
– Collects system state from the datacenters
– Computes job orders
Local scheduler:
– Schedules tasks based on the job orders
– Reports progress to the controller
Scheduling decisions occur upon job arrivals and departures.

Materials

[Backup diagrams: the motivating example's task placement under SRPT, Reordering, SWAG, FCFS, Global-SRPT, and Optimal orders; FCFS average completion time: 13]