1 TUD-PDS A Periodic Portfolio Scheduler for Scientific Computing in the Data Center Kefeng Deng, Ruben Verboon, Kaijun Ren, and Alexandru Iosup Parallel.

Slides:

Advertisements

Similar presentations

A Workflow Engine with Multi-Level Parallelism Supports Qifeng Huang and Yan Huang School of Computer Science Cardiff University

Advertisements

Evaluating the Cost-Benefit of Using Cloud Computing to Extend the Capacity of Clusters Presenter: Xiaoyu Sun.

SLA-Oriented Resource Provisioning for Cloud Computing

Locality-Aware Dynamic VM Reconfiguration on MapReduce Clouds Jongse Park, Daewoo Lee, Bokyeong Kim, Jaehyuk Huh, Seungryoul Maeng.

Towards Provision of Quality of Service Guarantees in Job Scheduling Mohammad IslamPavan Balaji P. SadayappanD. K. Panda Computer Science and Engineering.

19 November 2013 Exploring Portfolio Scheduling for Long-term Execution of Scientific Workloads in IaaS Clouds Alexandru Iosup Delft University of Technology.

Cloud Computing Resource provisioning Keke Chen. Outline  For Web applications statistical Learning and automatic control for datacenters  For data.

Cloud SUT proposal OSGcloud group. Objective To fill in the Research the group about the thinking within the OSG working group To solicit new ideas/proposals.

Green Cloud Computing Hadi Salimi Distributed Systems Lab, School of Computer Engineering, Iran University of Science and Technology,

Proactive Prediction Models for Web Application Resource Provisioning in the Cloud _______________________________ Samuel A. Ajila & Bankole A. Akindele.

Silberschatz, Galvin and Gagne  2002 Modified for CSCI 399, Royden, Operating System Concepts Operating Systems Lecture 19 Scheduling IV.

All Hands Meeting, 2006 Title: Grid Workflow Scheduling in WOSE (Workflow Optimisation Services for e- Science Applications) Authors: Yash Patel, Andrew.

Efficient Autoscaling in the Cloud using Predictive Models for Workload Forecasting Roy, N., A. Dubey, and A. Gokhale 4th IEEE International Conference.

The Performance of Bags-Of-Tasks in Large-Scale Distributed Computing Systems Alexandru Iosup, Ozan Sonmez, Shanny Anoep, and Dick Epema ACM/IEEE Int’l.

DAS-3/Grid’5000 meeting: 4th December The KOALA Grid Scheduler over DAS-3 and Grid’5000 Processor and data co-allocation in grids Dick Epema, Alexandru.

1 A Performance Study of Grid Workflow Engines Alexandru Iosup and Dick Epema PDS Group Delft University of Technology The Netherlands Corina Stratan Parallel.

1 Trace-Based Characteristics of Grid Workflows Alexandru Iosup and Dick Epema PDS Group Delft University of Technology The Netherlands Simon Ostermann,

Security-Driven Heuristics and A Fast Genetic Algorithm for Trusted Grid Job Scheduling Shanshan Song, Ricky Kwok, and Kai Hwang University of Southern.

1 Validation and Verification of Simulation Models.

June 28, Resource and Test Management in Grids Rapid Prototyping in e-Science VL-e Workshop, Amsterdam, NL Dick Epema, Catalin Dumitrescu, Hashim.

July 13, “How are Real Grids Used?” The Analysis of Four Grid Traces and Its Implications IEEE Grid 2006 Alexandru Iosup, Catalin Dumitrescu, and.

Euro-Par 2008, Las Palmas, 27 August DGSim : Comparing Grid Resource Management Architectures Through Trace-Based Simulation Alexandru Iosup, Ozan.

Is 99% Utilization of a Supercomputer a Good Thing? Scheduling in Context: User Utility Functions Cynthia Bailey Lee Department of Computer Science and.

CloudCmp: Shopping for a Cloud Made Easy Ang Li Xiaowei Yang Duke University Srikanth Kandula Ming Zhang Microsoft Research 6/22/2010HotCloud 2010, Boston1.

New Challenges in Cloud Datacenter Monitoring and Management

1 Efficient Management of Data Center Resources for Massively Multiplayer Online Games V. Nae, A. Iosup, S. Podlipnig, R. Prodan, D. Epema, T. Fahringer,

August 28, Performance Analysis of Cloud Computing Services for Many-Tasks Scientific Computing Berkeley, CA, USA Alexandru Iosup, Nezih Yigitbasi,

Euro-Par 2007, Rennes, 29th August 1 The Characteristics and Performance of Groups of Jobs in Grids Alexandru Iosup, Mathieu Jan *, Ozan Sonmez and Dick.

1 Cloud Computing Research at TU Delft – A. Iosup Alexandru Iosup Parallel and Distributed Systems Group Delft University of Technology The Netherlands.

Verification & Validation

1 EuroPar 2009 – POGGI: Puzzle-Based Online Games on Grid Infrastructures POGGI: Puzzle-Based Online Games on Grid Infrastructures Alexandru Iosup Parallel.

Marcos Dias de Assunção 1,2, Alexandre di Costanzo 1 and Rajkumar Buyya 1 1 Department of Computer Science and Software Engineering 2 National ICT Australia.

November , 2009SERVICE COMPUTATION 2009 Analysis of Energy Efficiency in Clouds H. AbdelSalamK. Maly R. MukkamalaM. Zubair Department.

Resource Provisioning based on Lease Preemption in InterGrid Mohsen Amini Salehi, Bahman Javadi, Rajkumar Buyya Cloud Computing and Distributed Systems.

A Broker for Cost-efficient QoS aware Resource Allocation in EC2. Kurt Vermeersch Coordinator: Kurt Vanmechelen.

Fault-Tolerant Workflow Scheduling Using Spot Instances on Clouds Deepak Poola, Kotagiri Ramamohanarao, and Rajkumar Buyya Cloud Computing and Distributed.

Young Suk Moon Chair: Dr. Hans-Peter Bischof Reader: Dr. Gregor von Laszewski Observer: Dr. Minseok Kwon 1.

© 2009 IBM Corporation 1 Improving Consolidation of Virtual Machines with Risk-aware Bandwidth Oversubscription in Compute Clouds Amir Epstein Joint work.

Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms.

1 Performance and Availability Models for IaaS Cloud and Their Applications Rahul Ghosh Duke High Availability Assurance Lab Dept. of Electrical and Computer.

1 Challenge the future KOALA-C: A Task Allocator for Integrated Multicluster and Multicloud Environments Presenter: Lipu Fei Authors: Lipu Fei, Bogdan.

Euro-Par, A Resource Allocation Approach for Supporting Time-Critical Applications in Grid Environments Qian Zhu and Gagan Agrawal Department of.

1 ROIA 2009 – CAMEO: Continuous Analytics for Massively Multiplayer Online Games CAMEO: Continuous Analytics for Massively Multiplayer Online Games Alexandru.

The Owner Share scheduler for a distributed system 2009 International Conference on Parallel Processing Workshops Reporter: 李長霖.

CPU Scheduling CSCI 444/544 Operating Systems Fall 2008.

A Node and Load Allocation Algorithm for Resilient CPSs under Energy-Exhaustion Attack Tam Chantem and Ryan M. Gerdes Electrical and Computer Engineering.

Dynamic Load Balancing and Job Replication in a Global-Scale Grid Environment: A Comparison IEEE Transactions on Parallel and Distributed Systems, Vol.

CS Spring 2011 CS 414 – Multimedia Systems Design Lecture 31 – Multimedia OS (Part 1) Klara Nahrstedt Spring 2011.

VGreen: A System for Energy Efficient Manager in Virtualized Environments G. Dhiman, G Marchetti, T Rosing ISLPED 2009.

November 29, Our team: Undergrad Thomas de Ruiter, Anand Sawant, Ruben Verboon, … Grad Siqi Shen, Guo Yong, Nezih Yigitbasi Staff Henk Sips, Dick.

George Goulas, Christos Gogos, Panayiotis Alefragis, Efthymios Housos Computer Systems Laboratory, Electrical & Computer Engineering Dept., University.

MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads School of Computer Engineering Nanyang Technological University 30 th Aug 2013.

Performance Analysis of Preemption-aware Scheduling in Multi-Cluster Grid Environments Mohsen Amini Salehi, Bahman Javadi, Rajkumar Buyya Cloud Computing.

June 30 - July 2, 2009AIMS 2009 Towards Energy Efficient Change Management in A Cloud Computing Environment: A Pro-Active Approach H. AbdelSalamK. Maly.

QoPS: A QoS based Scheme for Parallel Job Scheduling M. IslamP. Balaji P. Sadayappan and D. K. Panda Computer and Information Science The Ohio State University.

Author Utility-Based Scheduling for Bulk Data Transfers between Distributed Computing Facilities Xin Wang, Wei Tang, Raj Kettimuthu,

Dynamic Placement of Virtual Machines for Managing SLA Violations NORMAN BOBROFF, ANDRZEJ KOCHUT, KIRK BEATY SOME SLIDE CONTENT ADAPTED FROM ALEXANDER.

Ensieea Rizwani An energy-efficient management mechanism for large-scale server clusters By: Zhenghua Xue, Dong, Ma, Fan, Mei 1.

CS Spring 2010 CS 414 – Multimedia Systems Design Lecture 32 – Multimedia OS Klara Nahrstedt Spring 2010.

Chapter 4 CPU Scheduling. 2 Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation.

Lecture 4 CPU scheduling. Basic Concepts Single Process  one process at a time Maximum CPU utilization obtained with multiprogramming CPU idle :waiting.

1 Performance Impact of Resource Provisioning on Workflows Gurmeet Singh, Carl Kesselman and Ewa Deelman Information Science Institute University of Southern.

Basic Concepts Maximum CPU utilization obtained with multiprogramming

Jacob R. Lorch Microsoft Research

Introduction | Model | Solution | Evaluation

Abstract Major Cloud computing companies have started to integrate frameworks for parallel data processing in their product portfolio, making it easy for.

Dynamically Negotiating Capacity Between On-demand and Batch Clusters

The Performance of Big Data Workloads in Cloud Datacenters

Operating System , Fall 2000 EA101 W 9:00-10:00 F 9:00-11:00

MagnaData: Scheduling Complex Workflows with Non-Functional Requirements in Datacenters Laurens Versluis Massivizing Computer research.

Presentation transcript:

1 TUD-PDS A Periodic Portfolio Scheduler for Scientific Computing in the Data Center Kefeng Deng, Ruben Verboon, Kaijun Ren, and Alexandru Iosup Parallel and Distributed Systems Group JSSSP, IPDPS WS, Boston, MA, USA May 24, 2013

2 PDS Group, TUD Warm-Up Question: (2 minutes think-time + 2 minutes open discussion) Think about own experience Convince your partner before proposing an answer Tell everyone the answer Q: What are the major issues of scheduling various types of workloads in current data centers?

3 PDS Group, TUD Data centers increasingly popular Constant deployment since mid-1990s Users moving their computation to IaaS clouds Consolidation efforts in mid- and large-scale companies Old scheduling aspects Hundreds of approaches, each targeting specific conditions—which? No one-size-fits-all policy New scheduling aspects New workloads New data center architectures New cost models Developing a scheduling policy is risky and ephemeral Selecting a scheduling policy for your data center is difficult Why Portfolio Scheduling?

4 PDS Group, TUD What is Portfolio Scheduling? In a Nutshell, for Data Centers Create a set of scheduling policies Resource provisioning and allocation policies, in this work Online selection of the active policy, at important moments Periodic selection, in this work Same principle for other changes: pricing model, system, …

5 PDS Group, TUD Agenda 1.Why portfolio scheduling? 2.What is portfolio scheduling? In a nutshell… 3.Our periodic portfolio scheduler for the data center 1.Operational model 2.A portfolio scheduler architecture 3.The creation and selection components 4.Other design decisions 4.Experimental results How useful is our portfolio scheduler? How does it work in practice? 5.Our ongoing work on portfolio scheduling 6.How novel is our portfolio scheduler? A comparison with related work 7.Conclusion

6 PDS Group, TUD Background Information Operational Model Single data center VM pool per user Provisioning and allocation of resources via policies Issues orthogonal to this model: failures, pre-emption, migration, … Which policy? Which resources?

7 PDS Group, TUD Portfolio Scheduling The Process CreationSelection ReflectionApplication Which policies to include?Which policy to activate? Which resources? What to log?Which changes to the portfolio?

8 PDS Group, TUD Portfolio Scheduling Components Creation Scheduling policy = (provisioning, job selection) tuple We assume in this work all VMs are equal and exclusively used (no VM selection policy—we study these in other work) Provisioning policies Start-Up: all resources available from start to finish of execution (classic) On-Demand, Single VM (ODS): one new VM for each queued job On-Demand, Geometric (ODG): grow-shrink exponentially On-Demand, Execution Time (ODE): lease according to estimation of queued runtime (uses historical information and a predictor) On-Demand, Wait Time (ODW): leases only for jobs with high wait times On-Demand, XFactor (ODX): tries to ensure constant slowdown, via observed wait time and estimated run time Job selection policies FCFS, SJF (assumes known or well-estimated run-times) Deng, Song, Ren, and Iosup. Exploring Portfolio Scheduling for Long-term Execution of Scientific Workloads in IaaS Clouds. Submitted to SC|13.

9 PDS Group, TUD Portfolio Scheduling Components Selection Periodic execution Simulation-based selection Utility function Alternatives simulator Expert human knowledge Running workload sample in similar environment, under different policies mathematical analysis Alternatives utility function Well-known and exotic functions Agmon Ben-Yehuda, Schuster, Sharov, Silberstein, Iosup. ExPERT: pareto-efficient task replication on grids and a cloud. IPDPS’12. α=β=1 Κ=100 R J : Total Runtime of Jobs R V : Total Runtime of VMs S: Slowdown

10 PDS Group, TUD Putting it all together Our Portfolio Scheduler (1) Creation (2) Selection (3) Application (4) Reflection

11 PDS Group, TUD Agenda 1.Why portfolio scheduling? 2.What is portfolio scheduling? In a nutshell… 3.Our periodic portfolio scheduler for the data center 4.Experimental results 1.Experimental Setup 2.How useful is our portfolio scheduler? 3.How does it work in practice? 5.Our ongoing work on portfolio scheduling 6.How novel is our portfolio scheduler? A comparison with related work 7.Conclusion

12 PDS Group, TUD Experimental Setup Simulator and Metrics The DGSim simulator Since 2007 Scheduling in single- and multi-cluster grids Scheduling in IaaS clouds Metrics Average Job Wait-Time Average Job Slowdown Resource utilization Charged Cost Utility Iosup, Sonmez, Epema. DGSim: Comparing Grid Resource Management Architectures through Trace-Based Simulation. Euro-Par 2008.

13 PDS Group, TUD Experimental Setup Synthetic and Real Traces Synthetic Workloads: 5 arrival patterns Real Trace: ANL Intrepid months 68,936 jobs

14 PDS Group, TUD Experimental Results, Synthetic Workloads Resource Utilization + Workload Utility POrtfolio leads to better utility Start-Up leads to poor utility POrtfolio leads to high utilization Start-Up leads to poor utilization

15 PDS Group, TUD Experimental Results, ANL Intrepid Workload Cost + Utilization + Utility POrtfolio not best for each metric POrtfolio leads to low cost POrtfolio leads to high utilization POrtfolio leads to high utility (slowdown-utilization compound)

16 PDS Group, TUD Experimental Results Operation of the Portfolio Scheduler Policy change follows arrival pattern ANL-Intrepid between Steady and Periodic

17 PDS Group, TUD Experimental Results Operation of the Portfolio Scheduler No single policy is always selected for the same workload Different workloads, different top-3 policies

18 PDS Group, TUD Agenda 1.Why portfolio scheduling? 2.What is portfolio scheduling? In a nutshell… 3.Our periodic portfolio scheduler for the data center 4.Experimental results 5.Our ongoing work on portfolio scheduling 1.Application also to online gaming, more complex scheduling policies 2.Algorithm for selection under limited selection time, utility functions 6.How novel is our portfolio scheduler? A comparison with related work 7.Conclusion

19 PDS Group, TUD Portfolio Scheduling for Online Gaming (also for Scientific Workloads) CoH = Cloud-based, online, Hybrid scheduling Intuition: keep rental cost low by finding good mix of machine configurations and billing options Main idea: portfolio scheduler = run both solver of an Integer Programming Problem and various heuristics, then pick best schedule at deadline Additional feature: Can use reserved cloud instances Promising early results, for Gaming (and scientific) workloads Shen, Deng, Iosup, and Epema. Scheduling Jobs in the Cloud Using On-demand and Reserved Instances, EuroPar’13.

20 PDS Group, TUD Ongoing Work (1)Job run time estimation (2)Different traces (4) Maximum simulation time (5) Time interval for portfolio simulation (3) Different selection criteria Deng, Song, Ren, and Iosup. Exploring Portfolio Scheduling for Long-term Execution of Scientific Workloads in IaaS Clouds. Submitted to SC|13.

21 PDS Group, TUD Agenda 1.Why portfolio scheduling? 2.What is portfolio scheduling? In a nutshell… 3.Our periodic portfolio scheduler for the data center 4.Experimental results 5.Our ongoing work on portfolio scheduling 6.How novel is our portfolio scheduler? A comparison with related work 7.Conclusion

22 PDS Group, TUD Related Work Computational portfolio design Huberman’97, Streeter et al.’07 ’12, Bougeret’09, Goldman’12, Gagliolo et al.’06 ’11, Feitelson et al. JSSPP’13 (next talk!) We focus on dynamic, scientific workloads We use an utility function that combines slowdown and utilization Modern portfolio theory in finance Markowitz’52, Magill and Constantinides’76, Black and Scholes’76 Dynamic problem set vs fixed problem set Different workloads and utility functions Selection and Application very different Historical simulation General scheduling

23 PDS Group, TUD Agenda 1.Why portfolio scheduling? 2.What is portfolio scheduling? In a nutshell… 3.Our periodic portfolio scheduler for the data center 4.Experimental results 5.Our ongoing work on portfolio scheduling 6.How novel is our portfolio scheduler? A comparison with related work 7.Conclusion

24 PDS Group, TUD Conclusion Take-Home Message Portfolio Scheduling = set of scheduling policies, online selection Creation, Selection, Application, Reflection Periodic portfolio scheduler for data centers Explored Creation and, especially, Selection Good results for synthetic and real traces Easy to setup, easy to trust JSSPP’13, EuroPar’13, SC’13 (?) Reality Check (future work): we will apply it in the DAS Alexandru Iosup