Scientific Applications with HyperFlow and Scalarm on the PaaSage Platform
Marian Bubak
Department of Computer Science, AGH University of Science and Technology, Krakow, Poland
http://dice.cyfronet.pl
ESOCC 2016 Conference, Vienna, September 5-7, 2016

Coauthors
Maciej Malawski, Bartosz Balis, Kamil Figiela, Maciej Pawlik, Dariusz Krol, Renata Slota, Michal Orzechowski, Jacek Kitowski (dice.cyfronet.pl)
Dennis Hoppe (www.hlrs.de)

Outline
Motivation: scientific applications vs cloud resources
Cloud platform: PaaSage
Application deployment and execution modeling with CAMEL
Workflows on clouds with HyperFlow
Scalability and scheduling of workflows
Parameter study on clouds with Scalarm
Sample results
Conclusions

Scientific applications vs cloud resources
Cloud = a complex ecosystem: users, advanced middleware services, security requirements, resource usage quotas, etc.
Challenge: a solution for deploying and executing scientific applications on clouds in an automated, cost- and performance-effective way:
adaptable to diverse cloud infrastructures
transparent to the user
lightweight: minimizing the user's effort (setup, configuration)
maintainable: easy to integrate and fast to update
leveraging cloud elasticity for autoscaling of scientific workflows
loosely coupled integration with cloud management platforms
deployment description in a cloud-independent way
automated deployment in cross-cloud environments
cost- and performance-based deployment optimization
scaling out/in based on application-specific metrics
Maciej Malawski, Bartosz Balis, Kamil Figiela, Maciej Pawlik, Marian Bubak: Support for Scientific Workflows in a Model-Based Cloud Platform. UCC 2015: 412-413

HLRS molecular dynamics
Compute-intensive workflow: Initialization → Preprocessing → MD simulation → Postprocessing → End
The docking task is a parallel MPI job running either on a single 32+ core VM or on multiple VMs (a virtual cluster)
Example: on 16 cores, with 8M molecules and simulation time 0.05, the run time is 50 minutes

Montage (astronomy) workflow
Very large (100K+) pipelines (DAGs) of resource-intensive tasks
Characteristics:
all tasks are "dataflow": all consume and produce files
each task is fired only once
tasks are I/O intensive
Execution model:
HyperFlow passes the command to be invoked on a VM for each task
the Executor fetches tasks ready for execution from a queue, executes the command, and puts a message back on the queue (see the sketch below)
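A minimal sketch of such an executor loop, assuming RabbitMQ accessed through the Node.js amqplib package; the queue names and message layout are illustrative assumptions, not the actual HyperFlow protocol:

    // Consume task messages, run the command, publish a completion message.
    const amqp = require('amqplib');
    const { exec } = require('child_process');

    async function runExecutor() {
      const conn = await amqp.connect('amqp://localhost');
      const ch = await conn.createChannel();
      await ch.assertQueue('tasks');
      await ch.assertQueue('results');
      ch.consume('tasks', (msg) => {
        const task = JSON.parse(msg.content.toString()); // e.g. { id, cmd }
        exec(task.cmd, (err) => {
          ch.sendToQueue('results', Buffer.from(
            JSON.stringify({ id: task.id, status: err ? 'failed' : 'done' })));
          ch.ack(msg);
        });
      });
    }

    runExecutor().catch(console.error);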

eScience workflows on the PaaSage platform
PaaSage community: e.g. molecular dynamics from HLRS
PL-Grid community:
bioinformatics (genomics, proteomics)
metals engineering (complex metallurgical processes)
Virtual Physiological Human (Taverna and DataFluo workflows)
multiscale applications: fusion (Kepler workflows)
military mission planning support (EDA)
astronomy (Pegasus workflows)
Results:
HyperFlow: a workflow execution engine based on the REST paradigm
Scalarm: a massively self-scalable platform for data farming

PaaSage platform (http://www.paasage.eu)
An open and integrated platform, with an accompanying methodology, that allows model-based development, configuration, optimisation, and deployment of cloud applications independently of the underlying cloud infrastructures.

Model-based development and deployment of cloud applications
PaaSage cloud platform:
CAMEL: Cloud Application Modeling and Execution Language
deployment model: components, connections
requirements model
scalability model
multi-cloud application deployment
autoscaling, adaptation
Integration with workflow systems:
the CAMEL application model is generated from a workflow description
workflow monitoring information triggers workflow autoscaling

Scientific applications on multi-clouds
Extensions to the PaaSage platform:
a new domain-specific language for describing workflows
workflow planning based on user-provided objectives and constraints
a new workflow execution engine
Benefits drawn from PaaSage services:
automatic and adaptive deployment of workflow components
autoscaling driven by rules
Architecture (diagram): the Workflow Planner plans the execution from the workflow description (a DAG) plus constraints and objectives, producing a CAMEL description of the workflow application; in the Upperware, the Reasoner prepares a deployment plan with autoscaling rules, which the Adapter turns into concrete deployment commands and rules; in the Executionware, the Deployer launches VMs, the Enforcement engine applies the rules, and the Workflow Engine executes workflow tasks, feeding monitoring events back (a planning sketch follows).
Bartosz Balis, Kamil Figiela, Maciej Malawski, Maciej Pawlik, Marian Bubak: A Lightweight Approach for Deployment of Scientific Workflows in Cloud Infrastructures. PPAM (1) 2015: 281-290
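To illustrate the planner's role, a minimal JavaScript sketch of deriving a deployment model and scaling rules from a workflow description; the model layout here is entirely hypothetical (real CAMEL models are a much richer DSL):

    // Hypothetical planner step: map a workflow DAG and user constraints
    // onto components, cardinalities, and scalability rules.
    function planDeployment(workflow, constraints) {
      const width = Math.min(constraints.maxVMs,
        Math.max(...workflow.stages.map((s) => s.tasks.length)));
      return {
        components: [
          { name: 'workflow-engine', cardinality: 1 },
          { name: 'worker', cardinality: width, vmType: constraints.vmType },
        ],
        scalabilityRules: [
          { when: 'cpuUtilization > 0.9 for 3 min', action: 'add worker' },
          { when: 'cpuUtilization < 0.1 for 3 min', action: 'remove worker' },
        ],
      };
    }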

Lightweight workflow programming and execution environment
Simple workflow description (JSON)
Advanced programming of workflow activities (JavaScript)
Running a workflow with a simple command-line client:

    hflowc run <workflow_dir>

<workflow_dir> contains:
the file workflow.json (the workflow graph)
the file workflow.cfg (the workflow configuration)
optionally, the file functions.js (advanced workflow activities)
input files

Example activity function (JavaScript, in functions.js); here "http" is a request-style HTTP client and the elided parts depend on the service being called:

    function getPathWayByGene(ins, outs, config, cb) {
      var geneId = ins.geneId.data[0],
          url = /* service URL built from geneId */ "...";
      http({ "timeout": 10000, "url": url }, function (error, response, body) {
        /* ...process the response body into outs... */
        cb(null, outs);
      });
    }
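The workflow graph itself is plain JSON; a minimal sketch of such a description, with field names based on published HyperFlow examples (consult the repository for the exact schema):

    {
      "name": "genePathways",
      "processes": [
        { "name": "getPathWayByGene",
          "function": "getPathWayByGene",
          "ins": [ "geneId" ],
          "outs": [ "pathways" ] }
      ],
      "signals": [
        { "name": "geneId" },
        { "name": "pathways" }
      ],
      "ins": [ "geneId" ],
      "outs": [ "pathways" ]
    }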

HyperFlow on the PaaSage platform
PaaSage cloud platform:
model-based development of cloud applications
multi-cloud application deployment
autoscaling according to application demands
Integration of HyperFlow with PaaSage:
the CAMEL application model is generated from the HyperFlow workflow description
the task scheduler delivers the initial deployment plan and scalability rules
workflow monitoring information triggers workflow autoscaling
Bartosz Baliś, Marian Bubak, Kamil Figiela, Maciej Malawski, Maciej Pawlik: Towards Deployment and Autoscaling of Scientific Workflows with HyperFlow and PaaSage. CGW'14

Workflow generic scenario: HLRS MD with MPI
Deployment on Flexiant [location: UK], as shown in the diagram:
the Workflow Engine (master) VM runs the HyperFlow engine, the RabbitMQ server, the Redis DB, and a web server
a storage VM runs the NFS server holding the application binaries
each worker carries the MPI runtime, the job executor, and the application binaries
Each HyperFlow worker requires a set of machines: a virtual cluster consisting of an MPI master and MPI workers (see the launch sketch below)
The VMs need to be on the same IP subdomain
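A minimal sketch, under assumed paths, of how the job executor on the MPI master might launch the parallel docking task; the hostfile location and argument layout are illustrative assumptions:

    // Spawn the MPI application across the virtual cluster.
    const { spawn } = require('child_process');

    function runMpiTask(task, cb) {
      const mpirun = spawn('mpirun', [
        '-np', String(task.cores),          // e.g. 32 cores
        '--hostfile', '/etc/mpi/hostfile',  // MPI master + workers, same subnet
        task.binary,                        // binary on the shared NFS mount
        ...task.args,
      ]);
      mpirun.on('close', (code) =>
        cb(code === 0 ? null : new Error('MPI job failed with code ' + code)));
    }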

Example application goals and scalability rules
My workflow can run in parallel and can scale up to 8 VMs.
I need to dynamically scale my virtual cluster out and in based on utilization:
if resource utilization of the VMs is > 90% for more than 3 minutes, add a new worker VM
if resource utilization of the VMs is < 10% for more than 3 minutes, terminate that worker VM
I have prepared an execution plan with constraints:
my workflow consists of 2 stages: stage 1 needs 8 VMs, stage 2 needs 8 more VMs → monitor the WorkflowStage metric published by the workflow engine and, if WorkflowStage > 1, add 8 worker VMs
all VMs need to be of the same type: m3.xxxlarge on Amazon or, alternatively, an 8-core VM from any provider
I need 16 core-hours to run my workflow; please find the cheapest deployment
additional constraint, a quota on the number of VMs: max 8 instances on OpenStack, max 20 instances on Amazon
I need to terminate all workers when the workflow is complete → monitor the WorkflowExecutionState metric published by the workflow engine and, if WorkflowExecutionState == DONE, terminate all workers
(A sketch of evaluating such rules follows.)
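A minimal sketch of how such duration-qualified rules can be evaluated over a metric stream; the thresholds mirror the slide, while the rule format and the scale() callback are illustrative assumptions:

    // Trigger an action once a rule's condition has held long enough.
    const rules = [
      { metric: 'cpuUtilization', above: 0.9, forSec: 180, action: 'addWorker' },
      { metric: 'cpuUtilization', below: 0.1, forSec: 180, action: 'removeWorker' },
      { metric: 'WorkflowStage',  above: 1,   forSec: 0,   action: 'add8Workers' },
    ];
    const since = new Map(); // rule -> time its condition became true

    function onMetric(name, value, now, scale) {
      for (const r of rules) {
        if (r.metric !== name) continue;
        const holds = (r.above !== undefined && value > r.above) ||
                      (r.below !== undefined && value < r.below);
        if (!holds) { since.delete(r); continue; }
        if (!since.has(r)) since.set(r, now);
        if (now - since.get(r) >= r.forSec * 1000) { scale(r.action); since.delete(r); }
      }
    }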

Scheduling and provisioning plan
Example workflow with 3 stages (Gantt diagram: tasks of stages 1-3 mapped onto VMs over time)
Scaling rule: launch 7 VMs for stage 2
Scaling rule: terminate 2 VMs for stage 3
A sketch of deriving such rules from a per-stage plan follows.
Maciej Malawski, Kamil Figiela, Marian Bubak, Ewa Deelman, Jarek Nabrzyski: Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization. Scientific Programming 2015: 680271:1-680271:13
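A small illustrative sketch of turning a per-stage provisioning plan into scaling rules; the VM counts assume stage 1 runs on a single VM, consistent with the rules above:

    // Emit a scaling rule at every stage boundary where the VM count changes.
    function scalingRules(plan) { // plan: [{ stage, vms }]
      const rules = [];
      for (let i = 1; i < plan.length; i++) {
        const delta = plan[i].vms - plan[i - 1].vms;
        if (delta !== 0) {
          rules.push({
            when: 'WorkflowStage == ' + plan[i].stage,
            action: delta > 0 ? 'launch ' + delta + ' VMs'
                              : 'terminate ' + (-delta) + ' VMs',
          });
        }
      }
      return rules;
    }

    // [{ when: 'WorkflowStage == 2', action: 'launch 7 VMs' },
    //  { when: 'WorkflowStage == 3', action: 'terminate 2 VMs' }]
    console.log(scalingRules([{ stage: 1, vms: 1 }, { stage: 2, vms: 8 }, { stage: 3, vms: 6 }]));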

Parameter study with Scalarm on the PaaSage platform
Scalarm orchestrates parameter studies and data farming experiments
The PaaSage framework manages Scalarm deployment in a cross-cloud environment
Scalarm is modeled (services, deployment, and scaling) with the CAMEL language
The obtained solution is published in the PaaSage social network
D. Król, M. Orzechowski, J. Liput, R. Słota, J. Kitowski: Model-based execution of scientific applications on cloud infrastructures: Scalarm case study, in: Proceedings of Cracow Grid Workshop, pp. 77-78, 2014.
D. Król, R. F. da Silva, E. Deelman, V. Lynch: Workflow Performance Profiles: Development and Analysis, accepted at the International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar 2016), Euro-Par 2016.

Main goal: execution of the same application with different input parameter values, i.e. support for the successive steps of a data farming / parameter study process (a sketch follows the list):
input parameter space specification
application execution with different input parameter values
collecting results and analysis
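A minimal data-farming sketch: specify a parameter space, run the application for each point, and collect the results; the parameter names and the runSimulation() helper are hypothetical:

    // Expand { param: [values] } into the full cartesian product of points.
    function cartesian(space) {
      return Object.entries(space).reduce(
        (points, [key, values]) =>
          points.flatMap((p) => values.map((v) => ({ ...p, [key]: v }))),
        [{}]);
    }

    const space = { temperature: [280, 300, 320], pressure: [1, 2, 4] };
    const results = cartesian(space).map((point) => ({
      point,
      output: runSimulation(point), // hypothetical application run
    }));
    // ...analyze `results`...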

Overview of results
Behavior analysis of security forces: M. Kvassay, L. Hluchý, S. Dlugolinský, B. Schneider, H. Bracker, A. Tavčar, M. Gams, M. Contat, L. Dutka, D. Król, M. Wrzeszcz, J. Kitowski: A Novel Way of Using Simulations to Support Urban Security Operations. Computing and Informatics, 34(6), 2015.
Molecular dynamics, nano-droplet simulation: D. Król, M. Orzechowski, J. Kitowski, Ch. Niethammer, A. Sulisto, A. Wafai: A Cloud-Based Data Farming Platform for Molecular Dynamics Simulations, in: Proc. 7th IEEE/ACM International Conference on Utility and Cloud Computing (UCC), London, UK, IEEE 2014, pp. 579-584.
Hot rolling mill design: D. Król, R. Slota, J. Kitowski, L. Rauch, K. Bzowski, M. Pietrzyk: Model-based approach to study hot rolling mills with data farming, in: T. Claus et al. (eds.), Proc. of 30th European Conference on Modelling and Simulation, Regensburg, OTH Regensburg 2016, pp. 495-501.
Sensitivity analysis: D. Bachniak, J. Liput, L. Rauch, R. Słota, J. Kitowski: Massively Parallel Approach to Sensitivity Analysis on HPC Architectures by Using Scalarm Platform, in: Parallel Processing and Applied Mathematics: 11th International Conference, PPAM 2015, Kraków, Poland, September 6-9, 2015: book of abstracts, p. 93, 2015.
Material science, molecular dynamics + neutron scattering intensity calculations: D. Król, R. F. da Silva, E. Deelman, V. E. Lynch: Workflow Performance Profiles: Development and Analysis, accepted at Euro-Par 2016 (HeteroPar'16).

Summary
HyperFlow + PaaSage improve reproducibility:
CAMEL: a complete description of the infrastructure
HyperFlow JSON: a complete description of the application
On-demand deployment of the workflow runtime environment as part of the workflow application
The workflow engine acts as just another application component, driving the execution of the other components
Avoidance of tight coupling to a particular cloud infrastructure and middleware

More at:
http://www.paasage.eu
https://github.com/dice-cyfronet/hyperflow
http://scalarm.com
http://dice.cyfronet.pl
Contact: bubak@agh.edu.pl

Acknowledgements
This research was supported by the EU FP7 ICT project PaaSage (grant 317715) and by Polish grant 3033/7PR/2014/2.