
Workflow scheduling and optimization on clouds


1 Workflow scheduling and optimization on clouds
Maciej Malawski
AGH University of Science and Technology, Department of Computer Science, and Academic Computer Centre CYFRONET, Kraków, Poland
University of Notre Dame, Center for Research Computing, Indiana, USA

2 Problem space and selected areas
Applications: scientific workflows, workflow ensembles (multiple workflows), bag-of-task applications
Problems: resource provisioning (creating resources on demand), task scheduling (assigning tasks to resources), interplay between autoscaling systems and schedulers
Infrastructure: IaaS clouds (Amazon, Google, Azure), single cloud, multiple clouds, private clouds (OpenStack), alternative/emerging infrastructures (Google App Engine, AWS Lambda, EC2 burstable instances (T2))
Algorithms: static planning, dynamic scheduling, mathematical programming, adaptive
Optimization objectives and constraints: cost optimization under a deadline constraint, maximization of completed workflows under budget and deadline constraints
Interesting problems: single cloud – delays, caching; uncertainty of estimations; task granularity vs. resource billing frequency; cloud storage aspects in scheduling; performance modeling; multiple clouds – inter-cloud storage and transfer; evaluation of clouds; application benchmarking

3 Workflow Ensembles - Problem Description
Typical research question: how much computation can we complete given the limited time and budget of our research project?
Constraints: budget and deadline
Goal: given budget and deadline, maximize the number of prioritized workflows completed in an ensemble
Workflow = DAG of tasks
[Figure: provisioning plan with the number of VMs vs. time; the deadline is marked on the time axis and the budget corresponds to the area of the rectangle, so N_VM = Budget / Time. A sketch of this calculation follows below.]
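Below is a minimal sketch, assuming a single VM type with an hourly price, of how the initial number of VMs can be derived from the budget and the deadline; the function and parameter names are illustrative, not taken from the original algorithms.

```python
import math

def initial_vm_count(budget, deadline_hours, vm_price_per_hour):
    """VMs that can run for the whole deadline without exceeding the budget
    (Budget = N_VM * deadline * price, i.e. the 'area' on the slide)."""
    cost_of_one_vm = deadline_hours * vm_price_per_hour
    return math.floor(budget / cost_of_one_vm)

# Matches the example on the next slide: $18 budget, 6 h deadline, $1/h VMs -> 3 VMs
print(initial_vm_count(budget=18, deadline_hours=6, vm_price_per_hour=1.0))
```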

4 Algorithms compared: Dynamic (DPDS), Workflow-Aware (WA-DPDS), Static (SPSS)
[Figure: example schedules for an ensemble of three prioritized workflows (a, b, c) with task runtimes in minutes, a budget of $18 (= 3 VMs × 6 h) and a deadline of 360 minutes. Three Gantt charts show how DPDS, WA-DPDS and SPSS place tasks onto VMs over the 360-minute horizon; the SPSS chart also lists the per-workflow task deadlines it computes.]
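As a rough illustration of the dynamic (DPDS-style) approach, the sketch below shows one utilization-based provisioning decision; the thresholds, data structures and the budget check are simplifying assumptions, not the published algorithm.

```python
def autoscaling_step(vms, busy_vms, budget_left, vm_price_per_hour,
                     hours_to_deadline, upper=0.9, lower=0.7):
    """One provisioning decision of a DPDS-style dynamic algorithm (sketch).

    Scale up when utilization is high and the remaining budget can pay for
    another VM until the deadline; scale down when utilization drops and
    some VM is idle. Thresholds are illustrative.
    """
    utilization = len(busy_vms) / max(len(vms), 1)
    can_afford_new_vm = budget_left >= vm_price_per_hour * hours_to_deadline

    if utilization > upper and can_afford_new_vm:
        return "start_vm"
    if utilization < lower and len(vms) > len(busy_vms):
        return "terminate_idle_vm"
    return "no_change"
```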

5 Evaluation
Simulation: enables us to explore a large parameter space; the simulator (CloudWorkflowSimulator) uses the CloudSim framework.
Ensembles: synthetic workflows generated using parameters from real applications (Montage, CyberShake, LIGO, SIPHT, Epigenomics), randomized using different distributions and priorities.
Experiments: determine relative performance; measure the effect of low-quality estimates and delays.
M. Malawski, G. Juve, E. Deelman, J. Nabrzyski: Algorithms for Cost- and Deadline-Constrained Provisioning for Scientific Workflow Ensembles in IaaS Clouds. Future Generation Computer Systems, vol. 48, July 2015.

6 Model of storage and data access in clouds
Problem: most scheduling algorithms assume p2p communication between nodes, while scientific workflows are data-intensive (high communication-to-computation ratio) and rely on existing cloud storage technologies.
We assume: 1..N replicas; bandwidth limited at the VM and at the replica endpoint; latency; fair sharing of bandwidth (a transfer-time sketch follows below).
We can model: in-memory storage (memcache), cloud storage (Amazon S3), shared filesystems (NFS).
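A minimal sketch of the resulting transfer-time estimate, assuming a single file read from one replica; parameter names and units are illustrative.

```python
def transfer_time(file_size_mb, vm_bandwidth_mb_s, replica_bandwidth_mb_s,
                  concurrent_readers, latency_s):
    """Estimated time to read one file from a storage replica (sketch).

    Bandwidth is limited both at the VM and at the replica endpoint, and the
    replica bandwidth is shared fairly among concurrent readers.
    """
    replica_share = replica_bandwidth_mb_s / max(concurrent_readers, 1)
    effective_bandwidth = min(vm_bandwidth_mb_s, replica_share)
    return latency_s + file_size_mb / effective_bandwidth
```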

7 Storage and locality-aware algorithms
Include data transfer estimates in task runtimes: Storage-Aware DPDS (SA-DPDS), Storage- and Workflow-Aware DPDS (SWA-DPDS), Storage-Aware SPSS (SA-SPSS).
New scheduling algorithms that take advantage of caches and file locality to improve performance: Dynamic Provisioning Locality-Aware Scheduling (DPLS), Storage- and Workflow-Aware DPLS (SWA-DPLS).

8 Locality-Aware Scheduling
Examines the virtual machines' caches at the time of task submission.
Chooses the virtual machine on which the task is predicted to finish earliest (see the sketch below).
Uses both runtime and file transfer time estimates.
[Figure legend: tasks using the cache, tasks without a cache hit, low-priority tasks.]
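A minimal sketch of that selection step; the VM attributes (ready_time, cache) and the estimator callbacks are assumed helpers, not the actual simulator API.

```python
def select_vm(task, vms, runtime_estimate, transfer_estimate):
    """Pick the VM on which the task is predicted to finish earliest (sketch).

    Predicted finish time = the VM's ready time + transfers of input files
    that are not already in that VM's cache + the task runtime estimate.
    """
    def predicted_finish(vm):
        transfers = sum(transfer_estimate(f, vm)
                        for f in task.input_files if f not in vm.cache)
        return vm.ready_time + transfers + runtime_estimate(task, vm)

    return min(vms, key=predicted_finish)
```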

9 Selected results for in-memory storage (memcache)

10 Parallel transfer and cache hit ratio
Applications with a high degree of parallelism can benefit from parallel transfers. Caching can significantly improve performance when using cloud storage.

11 Inaccurate Runtime Estimate Results
[Figure: box plots of Cost / Budget and Makespan / Deadline.]
Box plots show the ratio of the ensemble cost to the budget and of the ensemble makespan to the deadline. Whiskers indicate maximum and minimum values. The ratio indicates whether the value (for example, the simulated cost) exceeded the constraint (the budget); values greater than 1 indicate that the constraint was exceeded.

12 Task granularity
Workflows with many short tasks are much easier to schedule using simple dynamic algorithms. When task runtimes are closer to the cloud billing cycle (e.g. 1 hour), the static planning algorithms have an advantage.
[Figure: Montage with artificially stretched tasks.]

13 Cost optimization of applications on multiple clouds
Infrastructure model: multiple compute and storage clouds, heterogeneous instance types.
Application model: bag of tasks, multi-level workflows.
Mathematical modeling with AMPL and CMPL: cost optimization under deadline constraints, mixed integer programming, Bonmin and CPLEX solvers, models for fine-grained and coarse-grained workflows (a simplified sketch follows below).
Adaptive scheduling model: static scheduling level-by-level.
M. Malawski, K. Figiela, J. Nabrzyski: Cost minimization for computational applications on hybrid cloud infrastructures. Future Generation Computer Systems, Volume 29, Issue 7, September 2013.
M. Malawski, K. Figiela, M. Bubak, E. Deelman, J. Nabrzyski: Scheduling multi-level deadline-constrained scientific workflows on clouds based on cost optimization. Scientific Programming (2015).
T. Dziok, K. Figiela, M. Malawski: Adaptive Multi-level Workflow Scheduling with Uncertain Task Estimates. PPAM 2015 (accepted).
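For illustration only, here is a much-simplified sketch of the cost-minimization idea expressed as a mixed integer program in Python/PuLP; the original models are written in AMPL/CMPL and are considerably richer (storage and transfer costs, multi-level workflows), and all numbers and names below are made up.

```python
from pulp import LpProblem, LpMinimize, LpVariable, LpInteger, lpSum, value

instance_types = {          # price [$/h], throughput [tasks/h] (illustrative)
    "small": {"price": 0.10, "tasks_per_hour": 10},
    "large": {"price": 0.40, "tasks_per_hour": 45},
}
total_tasks = 1000
deadline_hours = 4

prob = LpProblem("bag_of_tasks_cost_min", LpMinimize)
n = {t: LpVariable(f"n_{t}", lowBound=0, cat=LpInteger) for t in instance_types}

# Objective: total cost of running the selected VMs until the deadline
prob += lpSum(n[t] * instance_types[t]["price"] * deadline_hours
              for t in instance_types)

# Constraint: enough aggregate throughput to finish all tasks before the deadline
prob += lpSum(n[t] * instance_types[t]["tasks_per_hour"] * deadline_hours
              for t in instance_types) >= total_tasks

prob.solve()
print({t: int(value(n[t])) for t in instance_types})
```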

14 PaaSage – Deployment and Execution of Scientific Workflows in a Model-based Cloud Platform
Motivation: provisioning of multi-cloud resources for scientific workflows; loosely coupled integration with cloud management platforms; leveraging cloud elasticity for autoscaling of scientific workflows driven by the workflow execution stage.
Objectives: integrate the HyperFlow workflow runtime environment with the PaaSage cloud platform; application-agnostic interplay of an application-specific workflow scheduler with the generic provisioning and autoscaling components of PaaSage.
Novelty: on-demand deployment of the workflow runtime environment as part of the workflow application; the workflow engine as another application component driving the execution of other components; avoidance of tight coupling to a particular cloud infrastructure and middleware.
PaaSage platform: an open and integrated platform to support model-driven development, deployment and adaptive execution of multi-cloud applications.
Integration with PaaSage: the CAMEL application model is automatically generated from the HyperFlow workflow description and includes the initial deployment plan and the scalability rules which control autoscaling behavior; monitoring information sent from the task scheduler and VM workers to the PaaSage Executionware triggers the scalability rules and automatic scaling of the workflow application.
Bartosz Baliś, Marian Bubak, Kamil Figiela, Maciej Malawski, Maciej Pawlik: Towards Deployment and Autoscaling of Scientific Workflows with HyperFlow and PaaSage. CGW'14.

15 Levee Monitoring Application – ISMOP project
Levee breach threat due to a passing wave
High water levels lasting for up to 2 weeks
Large areas of levees affected (100+ km)

16 ISMOP threat level assessment workflow
Implemented in the HyperFlow workflow engine

17 ISMOP resource provisioning model
Cost optimization under a deadline
Bag-of-tasks model, selection of dominating tasks, uniform task runtimes
Performance model: T = f(v, d, s, ...), where T is the total computing time, v the number of VMs, d the time window in days, and s the number of tasks (sections):
T = a * (s * d) / v + b * v + c   (1)
Parameters a, b, c to be determined experimentally
Solve Eq. (1) to compute the number of VMs given a deadline (a sketch follows below)
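A minimal sketch of inverting Eq. (1) numerically: scan candidate VM counts and return the smallest one whose predicted runtime meets the deadline; the scan bound and all parameter values are illustrative.

```python
def predicted_time(v, s, d, a, b, c):
    """Performance model from Eq. (1): T = a*(s*d)/v + b*v + c."""
    return a * (s * d) / v + b * v + c

def min_vms_for_deadline(deadline, s, d, a, b, c, v_max=1024):
    """Smallest number of VMs whose predicted runtime meets the deadline,
    or None if no pool size up to v_max is sufficient (the b*v overhead
    term makes very large pools counter-productive)."""
    for v in range(1, v_max + 1):
        if predicted_time(v, s, d, a, b, c) <= deadline:
            return v
    return None
```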

18 ISMOP Experiments
Setup: private cloud infrastructure; a node with 8 cores (Xeon E5-2650); virtual machines with 1 VCPU and 512 MB RAM; data for the simulated scenarios (244 MB total) on local disks.
Test runs combined 128 or 1024 sections, time windows of 1 or 16 days, and 16 VMs, plus warmup tasks.

19 Analysis of results
Warmup tasks clearly separated as outliers
Linear functions; parameters a, b, c determined using a non-linear fit; the model fits the data well (a fitting sketch follows below).
Bartosz Baliś, Marek Kasztelnik, Maciej Malawski, Piotr Nowakowski, Bartosz Wilk, Maciej Pawlik, Marian Bubak: Execution Management and Efficient Resource Provisioning for Flood Decision Support. Procedia Computer Science, Volume 51, 2015.
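A minimal sketch of this fitting step using scipy's non-linear least squares; the measurement arrays below are placeholders, not the experimental data from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    s, d, v = x                      # sections, days, number of VMs
    return a * (s * d) / v + b * v + c

# Placeholder measurements: (sections, days, VMs) -> total time in seconds
s = np.array([128, 128, 1024, 128], dtype=float)
d = np.array([16, 1, 1, 1], dtype=float)
v = np.array([16, 16, 16, 4], dtype=float)
t = np.array([5200.0, 410.0, 2900.0, 1500.0])   # made-up runtimes

(a, b, c), _ = curve_fit(model, (s, d, v), t)
print(f"a={a:.3f} b={b:.3f} c={c:.3f}")
```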

20 Cloud performance evaluation
Performance of VM deployment times; virtualization overhead
Evaluation of open source cloud stacks (Eucalyptus, OpenNebula, OpenStack)
Survey of European public cloud providers
Performance evaluation of top cloud providers (EC2, RackSpace, SoftLayer); a grant from Amazon has been obtained
M. Bubak, M. Kasztelnik, M. Malawski, J. Meizner, P. Nowakowski, S. Varma: Evaluation of Cloud Providers for VPH Applications. Poster at CCGrid 2013, 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013.

21 Cloud and Big Data Benchmarking and Verification Methodology
Methodology for the evaluation of systems and applications: qualitative metrics (architectures, functionality); quantitative metrics (performance, stability, cost); test scenarios, test cases and parameters; experiment planning and analysis of results.
Selection of benchmarks: a portfolio of standard benchmarks; design of application-specific scenarios.
Target platforms: IaaS clouds (public, private); hybrid clouds with cloud bursting; real-time Big Data processing systems (Hadoop, Spark, ElasticSearch).
Collaboration with Samsung R&D Polska: methodology applied to the cloud infrastructure at the industrial partner; consultancy on the analysis of results and the development of a Testing-as-a-Service (TaaS) system.
K. Zieliński, M. Malawski, M. Jarząb, S. Zieliński, K. Grzegorczyk, T. Szepieniec, M. Zyśk: Evaluation Methodology of Converged Cloud Environments. In: K. Wiatr, J. Kitowski, M. Bubak (Eds.) Proceedings of the Seventh ACC Cyfronet AGH Users' Conference, ACC CYFRONET AGH, Kraków, 2014.

22 Thank you!
DICE Team at AGH & Cyfronet: Marian Bubak, Piotr Nowakowski, Bartosz Baliś, Maciej Pawlik, Marek Kasztelnik, Bartosz Wilk, Tomasz Bartyński, Jan Meizner, Daniel Harężlak
PhD Student: Kamil Figiela
MSc Students: Piotr Bryk, Tomasz Dziok
Notre Dame: Jarek Nabrzyski
USC/ISI: Ewa Deelman, Gideon Juve
Projects & Grants: EU FP7 VPH-Share, PL-Grid, EU FP7 PaaSage, ISMOP (PL)
References: CloudWorkflowSimulator, HyperFlow, DICE Team

