Multi-Resource Packing for Cluster Schedulers. Robert Grandl, Aditya Akella, Srikanth Kandula, Ganesh Ananthanarayanan, Sriram Rao.

Presentation transcript:

Multi-Resource Packing for Cluster Schedulers
Robert Grandl, Aditya Akella, Srikanth Kandula, Ganesh Ananthanarayanan, Sriram Rao

Diverse Resource Requirements
- Tasks need varying amounts of each resource, e.g., memory from 100 MB to 17 GB and CPU from 2% of a core to 6 cores.
- Tasks therefore need to be matched with machines based on their resource demands.
- Demands for different resources are not correlated: correlation coefficients across resource demands lie in [-0.11, 0.33].

Current Schedulers do not Pack
- Resources are allocated in terms of "slots", which leads to resource fragmentation (see the sketch below).
- [Figure: tasks T1 (2 GB), T2 (2 GB) and T3 (4 GB) placed on machines A and B, each with 4 GB of memory, under current schedulers vs. packer schedulers.]
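A toy sketch of the fragmentation effect, with placement policies assumed for illustration (the `place` helper and the two greedy orderings are not from the talk): spreading by memory slots strands the 4 GB task, while packing the tightest machine first fits all three.

def place(tasks, free, order_machines):
    """Greedily place (name, memory_gb) tasks; return the ones that do not fit."""
    unplaced = []
    for name, need in tasks:
        for m in order_machines(free):
            if free[m] >= need:
                free[m] -= need
                break
        else:
            unplaced.append(name)
    return unplaced

tasks = [("T1", 2), ("T2", 2), ("T3", 4)]  # memory demand in GB

# Slot-style spreading (most-free machine first) leaves T3 waiting: ['T3']
print(place(tasks, {"A": 4, "B": 4}, lambda f: sorted(f, key=f.get, reverse=True)))

# Packing (tightest machine first) schedules all three tasks: []
print(place(tasks, {"A": 4, "B": 4}, lambda f: sorted(f, key=f.get)))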

Current Schedulers do not Pack
- Resources are allocated in terms of "slots", which over-allocates resources the scheduler does not track, here inbound network bandwidth (a small check follows below).
- [Figure: machine A with 4 GB of memory and a 20 MB/s inbound network link; tasks T1, T2 and T3 each need 2 GB of memory and up to 20 MB/s of inbound network; placements under current schedulers vs. packer schedulers.]
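The over-allocation can be checked in a few lines; the numbers are read off the figure and the resource names (`mem_gb`, `net_mbps`) are illustrative, a minimal sketch rather than the talk's example.

machine = {"mem_gb": 4, "net_mbps": 20}   # machine A's capacity
t1 = {"mem_gb": 2, "net_mbps": 20}        # tasks' peak demands
t2 = {"mem_gb": 2, "net_mbps": 20}

def over_allocated(machine, tasks, tracked):
    # A resource is over-allocated if the tasks' summed demand exceeds capacity.
    return any(sum(t[r] for t in tasks) > machine[r] for r in tracked)

# Tracking only memory "slots", T1 + T2 look fine on machine A ...
print(over_allocated(machine, [t1, t2], ["mem_gb"]))              # False
# ... but tracking every resource reveals the NIC is oversubscribed (40 > 20 MB/s).
print(over_allocated(machine, [t1, t2], ["mem_gb", "net_mbps"]))  # True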

Current Schedulers do not Pack
- Slots are allocated purely on fairness considerations.
- Example: a cluster with [18 cores, 36 GB]; job A has 18 tasks of [1 core, 2 GB], job B has 6 tasks of [3 cores, 1 GB], and job C also has 6 tasks.
- Under DRF each job gets a 1/3 share: per wave the cluster runs 6 A tasks plus 2 B and 2 C tasks (18 cores, 16 GB), so A, B and C all finish at time 3t.
- A packer scheduler runs all 18 A tasks first (18 cores, 36 GB), then B's 6 tasks, then C's (18 cores, 6 GB per wave): A finishes at t, B at 2t, C at 3t.
- Average job duration improves by 33% (worked through below).
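A back-of-the-envelope check of this example. It assumes job C has the same task profile as B ([3 cores, 1 GB], 6 tasks), which the per-wave counts on the slide suggest but do not state; this is a sketch of the arithmetic, not the authors' simulation.

t = 1.0  # duration of every task

# DRF-style 1/3 shares: per wave, 6 A tasks + 2 B tasks + 2 C tasks
# (18 cores, 16 GB), so each job needs 3 waves.
drf = {"A": 3 * t, "B": 3 * t, "C": 3 * t}

# Packing: all 18 A tasks at once (18 cores, 36 GB), then B's 6 tasks,
# then C's 6 tasks (18 cores, 6 GB per wave).
packer = {"A": 1 * t, "B": 2 * t, "C": 3 * t}

avg = lambda d: sum(d.values()) / len(d)
print(avg(drf), avg(packer))  # 3.0 vs. 2.0: ~33% lower average completion time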

Is it all about packing?
- Multi-dimensional bin packing is NP-hard for two or more dimensions.
- Several heuristics have been proposed, but they do not apply here: the size of the balls, the contiguity of allocation, and resource demands that are elastic in time all differ from the classical setting.
- Even perfect packing would not suffice: cluster utilization, job completion time and fairness are competing objectives.

Intuition behind the solution
- We want something reasonably simple that can be applied in practice.
- Combine cluster efficiency with job completion time, and cluster efficiency with fairness.

Tetris
- Pack tasks along multiple resources: score a placement by the cosine similarity between the task's demand vector and the machine's free-resource vector.
- Multi-resource version of SRTF: favor jobs with small remaining duration and small remaining resource consumption.
- Incorporate fairness with a fairness knob f in (0, 1]: f close to 0 gives near-perfect fairness, f = 1 gives the most efficient (pure packing) schedule.

Scheduling procedure (simplified), where A(t, R) is the packing score and T(j) the remaining-work (SRTF) score:
  1: while (resources R are free)
  2:   among the f·J jobs furthest from their fair share
  3:     score(j) = max over tasks t in j with demand(t) ≤ R of  A(t, R) + T(j)
  4:   pick j*, t* = argmax score(j)
  5:   R = R − demand(t*)
  6: end while
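A minimal Python sketch of this scoring loop, assuming peak demands and free resources are plain vectors. The `epsilon` weight on the remaining-work term, the `fair_share_deficit` field, and all helper names are illustrative choices, not the authors' implementation.

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Task:
    job_id: str
    demand: List[float]        # peak demand per resource, e.g. [cpu, mem, disk, net]

@dataclass
class Job:
    job_id: str
    pending: List[Task]
    remaining_work: float      # estimated remaining duration x resources
    fair_share_deficit: float  # how far below its fair share the job currently is

def fits(demand: List[float], free: List[float]) -> bool:
    return all(d <= r for d, r in zip(demand, free))

def alignment(demand: List[float], free: List[float]) -> float:
    # Packing score A(t, R): dot product of the task's demand with the
    # machine's free resources (an unnormalized stand-in for the slide's
    # cosine-similarity alignment).
    return sum(d * r for d, r in zip(demand, free))

def pick_task(jobs: List[Job], free: List[float],
              f: float = 0.75, epsilon: float = 0.1) -> Optional[Task]:
    # Fairness knob: only the f*J jobs furthest from their fair share are
    # eligible (f -> 0 approaches fair sharing, f = 1 packs greedily).
    eligible = sorted(jobs, key=lambda j: j.fair_share_deficit, reverse=True)
    eligible = eligible[:max(1, int(round(f * len(jobs))))]

    best: Tuple[float, Optional[Task]] = (float("-inf"), None)
    for job in eligible:
        # SRTF-like term standing in for T(j): favor jobs with little remaining work.
        srtf_bonus = epsilon / max(job.remaining_work, 1e-9)
        for task in job.pending:
            if fits(task.demand, free):
                score = alignment(task.demand, free) + srtf_bonus
                if score > best[0]:
                    best = (score, task)
    return best[1]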

Task Requirements and Resource Usages
- Learning task requirements: estimate peak demands from tasks that have already finished in the same phase (the coefficient of variation across tasks lies in [0.022, 0.41]), and collect statistics from recurring jobs (a sketch follows below).
- Resource Tracker: measures actual resource usage, enforces allocations, and is aware of cluster activity other than task assignment, such as data ingest and evacuation.
- [Figure: a machine's inbound network usage (MB/s) over time, showing inbound network used, inbound network free, and tasks' inbound-network estimates from the Resource Tracker.]
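A minimal sketch of how peak-demand estimates might be derived from tasks already finished in the same phase; `finished_peaks` is a hypothetical list of observed peak-usage vectors for completed tasks, and the averaging rule is an illustrative assumption rather than the authors' estimator.

from statistics import mean, stdev
from typing import List

def estimate_phase_demand(finished_peaks: List[List[float]]) -> List[float]:
    """Per-resource mean of observed peaks, usable as the demand estimate
    for the phase's remaining tasks."""
    n_resources = len(finished_peaks[0])
    return [mean(p[r] for p in finished_peaks) for r in range(n_resources)]

def coefficient_of_variation(values: List[float]) -> float:
    # A low CoV (the talk reports roughly 0.02-0.41) means finished tasks
    # predict the demands of a phase's remaining tasks well.
    return stdev(values) / mean(values) if len(values) > 1 else 0.0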

Evaluation
- Prototype atop Hadoop 2.3: Tetris as a pluggable scheduler in the Resource Manager, the Resource Tracker implemented as a Node Manager service, and a modified AM/RM resource allocation protocol.
- Large-scale evaluation: a 250-node cluster running a 4-hour synthetic workload of 60 jobs with complementary task demands.
- [Figure: CDF of the reduction (%) in job duration relative to DRF.]
- Reduces average job duration by up to 40% and makespan by 39%.

Evaluation
- Trace-driven simulation over Facebook production traces.
- Fairness knob: fewer than 6% of jobs slow down, and by no more than 8% on average.
- A knob value of 0.75 offers nearly the best possible efficiency with little unfairness.
- [Figures: job slowdown (%) vs. the fairness knob, measured relative to a fair scheduler and relative to DRF.]

Conclusion
- Identified the importance of scheduling all relevant resources in a cluster: slot-based allocation causes resource fragmentation, over-allocation and interference.
- Tetris is a new scheduler that packs tasks along multiple resources, reducing both makespan and job completion time.
- A fairness knob enables a trade-off between packing efficiency and fairness.
- Come and see our poster!