Scalable and Coordinated Scheduling for Cloud-Scale Computing


Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing (presented by 72150263 심윤석)

INDEX: Background, Goals & Challenges of Apollo, Apollo Framework, Evaluation, Conclusion

Background: SCOPE. A SCOPE job is compiled into a DAG (directed acyclic graph); a job consists of stages, and each stage consists of tasks.

Background

Goals & Challenges: minimize job latency and maximize cluster utilization. The main challenges are scale, heterogeneous workloads, and maximizing resource utilization.

Goals & Challenges: Scale. Jobs process GBs to PBs of data; 100,000 scheduling requests per second at peak; clusters contain over 20,000 servers and run up to 170,000 tasks in parallel.

Goals & Challenges: Heterogeneous workloads. Execution times range from seconds (short tasks) to hours (long tasks); tasks are I/O-bound or CPU-bound with varying resource requirements (e.g., memory, cores); long tasks favor data locality, while short tasks are sensitive to scheduling latency.

Goals & Challenges: Maximize utilization. The workload fluctuates regularly, so the scheduler must keep utilization, especially CPU utilization, high through the fluctuations.

Apollo Framework

Apollo Framework: a distributed and coordinated scheduler.

Apollo Framework: estimation-based scheduling.

Apollo Framework: wait-time updates.

Apollo Framework: Wait-Time Matrix. A lightweight representation of server load that gives the expected wait time for a task and the server's future resource availability.
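The wait-time matrix idea can be sketched as follows. This is a minimal illustration (the function and its structure are mine, not Apollo's actual code): for each possible CPU demand, the matrix records how long a task would wait before that many cores are free, derived from the server's running tasks.

```python
# Hypothetical sketch of a per-server wait-time matrix.
# running_tasks: list of (cores_held, seconds_until_release).
def wait_time_matrix(running_tasks, total_cores, max_demand):
    free = total_cores - sum(c for c, _ in running_tasks)
    # Consider core releases in order, soonest first.
    releases = sorted(running_tasks, key=lambda t: t[1])
    matrix = {}
    for demand in range(1, max_demand + 1):
        avail, wait = free, 0.0
        for cores, remaining in releases:
            if avail >= demand:
                break
            avail += cores          # these cores free up at `remaining`
            wait = remaining
        matrix[demand] = wait if avail >= demand else float("inf")
    return matrix
```

A server with 6 cores, two running tasks each holding 2 cores and finishing in 10 s and 30 s, reports zero wait for a 2-core task, a 10 s wait for a 4-core task, and a 30 s wait for a 6-core task.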

Apollo Framework: Estimation-Based Scheduling. To minimize task completion time, Apollo matches tasks to servers using a stable-matching algorithm over estimated completion times.
Task completion time: E = I + W + R, where E is the estimated task completion time, I the initialization time, W the wait time, and R the runtime.
Including server failure cost: C = P_succ * E + K * (1 - P_succ) * E, where C is the final estimated completion time, P_succ the success probability, and K the server failure penalty.
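The two slide equations can be written out directly. A hedged sketch (the function names and the greedy server pick are mine; the paper uses stable matching rather than a per-task greedy minimum):

```python
# E = I + W + R: estimated task completion time from initialization,
# wait, and runtime.
def estimated_completion(init, wait, runtime):
    return init + wait + runtime

# C = P_succ * E + K * (1 - P_succ) * E: expected cost, where a failure
# (probability 1 - P_succ) inflates the cost by the penalty factor K.
def expected_cost(E, p_succ, K):
    return p_succ * E + K * (1 - p_succ) * E

# Illustrative stand-in for the matcher: pick the server minimizing C.
# candidates maps server name -> (init, wait, runtime).
def best_server(candidates, p_succ, K):
    return min(candidates,
               key=lambda s: expected_cost(
                   estimated_completion(*candidates[s]), p_succ, K))
```

With equal failure parameters across servers, the ranking reduces to comparing E, so a server with a longer wait but a much shorter runtime can still win.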

Apollo Framework: Distributed and Coordinated Scheduling. One scheduler per job; each scheduler makes independent decisions based on global cluster status, so conflicts between schedulers can occur.

Apollo Framework: Correcting Conflicts (Correction Mechanisms). Apollo re-evaluates prior scheduling decisions, using duplicate scheduling, the confidence of estimates, widely scattered completion times, and randomization.
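The duplicate-scheduling correction can be sketched like this. The threshold and function are illustrative assumptions, not values from the paper: after dispatch, if fresher load information shows another server would finish the task much sooner, a duplicate is started there.

```python
# Hypothetical duplicate-scheduling check. current_est is the estimated
# completion time of the already-dispatched copy; new_estimates maps
# server -> fresh completion-time estimate.
def should_duplicate(current_est, new_estimates, ratio=0.5):
    """Return the server to duplicate on, or None.

    Duplicates only when some server's fresh estimate beats the current
    one by the given ratio (here: finishes in under half the time), so
    corrections stay rare and cheap.
    """
    server, best_est = min(new_estimates.items(), key=lambda kv: kv[1])
    return server if best_est < current_est * ratio else None
```

Keeping the trigger condition strict matches the evaluation slide later: corrections fire on well under 1% of tasks.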

Apollo Framework: Opportunistic Scheduling. Opportunistic tasks maximize utilization and are randomly scheduled for fairness. They consume only idle resources and run only when regular tasks do not need them; they can be preempted at any time and can be upgraded to regular tasks.
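The two-class policy above can be sketched as a toy server model. This is my own minimal structure, not Apollo's implementation: opportunistic tasks are admitted only onto idle cores and are evicted as soon as a regular task needs the capacity.

```python
# Toy model of a server running regular and opportunistic tasks.
class Server:
    def __init__(self, cores):
        self.cores = cores
        self.regular = {}        # task -> cores held
        self.opportunistic = {}  # task -> cores held

    def idle(self):
        used = sum(self.regular.values()) + sum(self.opportunistic.values())
        return self.cores - used

    def submit_opportunistic(self, task, need):
        # Opportunistic tasks may only consume currently idle resources.
        if need <= self.idle():
            self.opportunistic[task] = need
            return True
        return False

    def submit_regular(self, task, need):
        # Regular tasks preempt opportunistic ones until enough is free.
        for t in list(self.opportunistic):
            if self.idle() >= need:
                break
            del self.opportunistic[t]   # preempted
        if self.idle() >= need:
            self.regular[task] = need
            return True
        return False
```

On a 4-core server, an opportunistic task holding 3 cores is preempted the moment a 2-core regular task arrives, leaving 2 cores idle afterward.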

Evaluation: Apollo at scale; scheduling quality; evaluating completion-time estimates; correction effectiveness; stable-matching efficiency.

Evaluation: Apollo at Scale. Runs 170,000 tasks in parallel, tracks 14,000,000 pending tasks, and keeps the cluster well utilized on weekdays (90% median CPU utilization).

Evaluation: Scheduling Quality. About 80% of recurring jobs got faster, wait times improved significantly, and performance was close to an oracle scheduler (one with no scheduling latency, conflicts, or failures).

Evaluation: Evaluating Completion-Time Estimates.

Evaluation: Correction Effectiveness and Stable-Matching Efficiency. Corrections were triggered on fewer than 0.5% of tasks, with an 82% success rate, and stable matching proved efficient.

Conclusion: Apollo minimizes job latency and maximizes cluster utilization. A loosely coordinated distributed scheduler delivers high-quality scheduling, and opportunistic scheduling maximizes cluster utilization.

Reference
https://www.usenix.org/conference/osdi14/technical-sessions/presentation/boutin
https://www.usenix.org/sites/default/files/conference/protected-files/osdi14_slides_boutin.pdf