A scheduling component for e-Science Central
Anirudh Agarwal, Jacek Cała



Introduction
– Cloud-based workflow management system for data analytics.
– Workflows are composed of blocks, which can be written in Java, R, Octave, JavaScript, Gnuplot and, recently, bash.
– Portable system: workflows can run on a laptop, a cluster, or private or public clouds.
EUBrazil Cloud Connect (EUBCC)
– Aims to create an intercontinental, federated infrastructure for scientific use.
– A combined effort between Brazil and several EU countries.
– Three user applications demonstrate the potential of the EUBCC infrastructure: Leishmania Virtual Laboratory, Heart Simulation, and Biodiversity and Climate Change.

EUBrazil Cloud Connect
[Figure: EUBrazil Cloud Connect architecture diagram, shown on two consecutive slides. Layer labels: Users; Programming Frameworks & Services; Execution & Provisioning Services; Infrastructure Providers; Data Providers; AAI. Component labels: COMPSs, e-SC API, PMES, CSGRID, e-SC, PDAS, fogbow, mc2, Private Cloud, Opportunistic Cloud, HPC. Technology labels: IM, VMRC, LSF, OCCI, CDMI, BES, x509, oAuth2, OVF, VOMS, OGE.]

e-Science Central workflow execution model
– Workflows are constructed from a number of interacting blocks.
– Each workflow invocation is deployed onto one engine as a single job.
– Each engine can process one or more workflows at a time.
– Workflows can be composite: they can submit sub-workflow invocations, allowing for parallelism.

Advantages of the current model
Simple management:
– single pool of engines,
– the pool can grow and shrink according to needs,
– engines can be of different speeds.
Good scalability:
– very little overhead.

Limitations of the current model
Too simple for more sophisticated needs:
– heterogeneous workflows/blocks,
– heterogeneous hardware infrastructure.
No control over invocation dispatch policy:
– no priorities, e.g. admin == user,
– no fairness: a single user can block the system by submitting 1000s of invocations,
– invocation messages may be consumed in an unfavourable manner.
Invocation messages cannot be re-allocated once they have been moved to the JMS queue.
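The fairness limitation can be illustrated with a small sketch. It assumes, purely for illustration, that the single shared queue is consumed in strict FIFO order; the user names and the `first_queue_position` helper are hypothetical and not part of e-Science Central.

```python
from collections import deque


def first_queue_position(submissions):
    """Return, per user, the position of that user's first invocation
    in a single shared FIFO queue (a stand-in for the shared JMS queue)."""
    queue = deque(submissions)
    positions = {}
    for position, (user, _invocation_id) in enumerate(queue):
        # Record only the first time each user appears in the queue.
        positions.setdefault(user, position)
    return positions


# Hypothetical scenario: one user submits 1000 invocations, then another submits one.
submissions = [("alice", i) for i in range(1000)] + [("bob", 0)]
positions = first_queue_position(submissions)
```

In this scenario the second user's only invocation sits behind all 1000 earlier messages, which is the blocking behaviour described on the slide.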

Selected scheduler requirements
To run workflows based on their hard and soft requirements and on static and dynamic infrastructure capabilities:
– support for heterogeneous workflows and resources (motivated by federated resources),
– data-aware scheduling,
– user-defined scheduling policies.
To allow the system to adapt in size dynamically (cloud bursting, opportunistic resources).
To allow users to specify the priority of workflows.
To improve the use of available resources:
– offer users/administrators some optimisation strategies.
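One way to read the "hard and soft requirements" item is sketched below: hard requirements filter out ineligible engines, and soft requirements only rank the remaining ones. The `Engine` and `Invocation` classes, the capability keys and the scoring rule are all assumptions made for this sketch, not the project's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    name: str
    capabilities: dict  # e.g. {"os": "linux", "gpu": True}; keys are illustrative


@dataclass
class Invocation:
    workflow: str
    hard: dict = field(default_factory=dict)  # every entry must match
    soft: dict = field(default_factory=dict)  # matches are preferred, not required


def is_eligible(engine, invocation):
    """An engine is eligible only if it satisfies every hard requirement."""
    return all(engine.capabilities.get(k) == v for k, v in invocation.hard.items())


def soft_score(engine, invocation):
    """Count how many soft requirements the engine satisfies."""
    return sum(1 for k, v in invocation.soft.items() if engine.capabilities.get(k) == v)


def choose_engine(engines, invocation):
    """Return the eligible engine with the best soft score, or None if none is eligible."""
    candidates = [e for e in engines if is_eligible(e, invocation)]
    if not candidates:
        return None
    return max(candidates, key=lambda e: soft_score(e, invocation))
```

Returning `None` when no engine is eligible leaves the decision (wait, reject, or acquire more resources) to the caller.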

Our focus in EUBCC
To run workflows based on their hard and soft requirements and on static and dynamic infrastructure capabilities:
– support for heterogeneous workflows and resources (motivated by federated resources),
– data-aware scheduling,
– user-defined scheduling policies.
To allow the system to adapt in size dynamically (cloud bursting, opportunistic resources).
To allow users to specify the priority of workflows.
To improve the use of available resources:
– offer users/administrators some optimisation strategies.

Proposed solution
– Add a scheduling component (as a pluggable module) between the e-SC server and the engines.
– Make use of the Performance Monitor, which gathers information about the system.
– Have a one-to-one JMS queue for each engine (pool?).
– Based on a Scheduling Policy, choose the best engine to send the workflow to.
– Make sure the pending workflows can be rescheduled when all the execution threads are busy.
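The routing idea in the proposal can be sketched as follows. This is a minimal illustration, not the actual e-SC scheduler: in-memory deques stand in for the per-engine JMS queues, a plain dictionary stands in for a Performance Monitor snapshot, and `SchedulerSketch` and `lowest_cpu_load` are hypothetical names.

```python
from collections import deque
from typing import Callable, Dict, Optional

# A policy maps a monitor snapshot (engine name -> metric) to an engine name.
Policy = Callable[[Dict[str, float]], Optional[str]]


def lowest_cpu_load(snapshot: Dict[str, float]) -> Optional[str]:
    """Example policy: pick the engine reporting the lowest CPU load."""
    return min(snapshot, key=snapshot.get) if snapshot else None


class SchedulerSketch:
    def __init__(self, policy: Policy):
        self.policy = policy                   # pluggable scheduling policy
        self.queues: Dict[str, deque] = {}     # one queue per attached engine

    def attach_engine(self, engine: str) -> None:
        """Create a dedicated queue for a newly attached engine."""
        self.queues.setdefault(engine, deque())

    def dispatch(self, invocation: str, snapshot: Dict[str, float]) -> Optional[str]:
        """Route an invocation to the queue of the engine chosen by the policy."""
        # Only consider engines that actually have a queue attached.
        known = {e: load for e, load in snapshot.items() if e in self.queues}
        engine = self.policy(known)
        if engine is not None:
            self.queues[engine].append(invocation)
        return engine
```

Keeping the policy as a plain function makes it easy to swap in a different selection rule without touching the queue handling.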

Proposed solution (cont.)

Progress so far…

DEMO 1

Progress so far (cont.)
Current scheduling policy is based on CPU load:
– not effective, intended just as a proof of concept (PoC).
More advanced queue management:
– able to dynamically attach a new engine to a "scheduler" queue,
– able to grow the queue pool if needed.
Able to save workflow invocations in the scheduler when all the engine execution threads are exhausted:
– currently assuming there is 1 execution thread per engine.
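The behaviour described above (CPU-load policy, holding invocations when every engine is busy, attaching engines at run time) might look roughly like the sketch below. It assumes one execution thread per engine, as the slide does; the class name, method names and the `assigned` log are invented for illustration.

```python
from collections import deque


class PendingAwareScheduler:
    """Sketch: hold invocations while every engine's single thread is busy."""

    def __init__(self):
        self.cpu_load = {}      # engine name -> last reported CPU load
        self.busy = set()       # engines whose one execution thread is in use
        self.pending = deque()  # invocations saved in the scheduler
        self.assigned = []      # (invocation, engine) pairs, in dispatch order

    def attach_engine(self, name, cpu_load=0.0):
        """Attach a new engine at run time and try to place pending work on it."""
        self.cpu_load[name] = cpu_load
        self._drain()

    def submit(self, invocation):
        self.pending.append(invocation)
        self._drain()

    def complete(self, engine):
        """Called when an engine finishes its job and its thread becomes free."""
        self.busy.discard(engine)
        self._drain()

    def _drain(self):
        # Assign pending invocations while at least one engine is free.
        while self.pending:
            free = [e for e in self.cpu_load if e not in self.busy]
            if not free:
                return
            engine = min(free, key=lambda e: self.cpu_load[e])  # CPU-load policy
            self.busy.add(engine)
            self.assigned.append((self.pending.popleft(), engine))
```

Draining on every submit, completion and attach keeps the pending queue as short as the available engines allow.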

Current problems and issues
– Simple CPU load policy.
– Engine vs engine pool per queue.
– Impact of the delay between engine --> PM --> scheduler.
– Missing event-based communication between the engine and server.

DEMO 2

Delay problem
[Figure: sequence diagram. Participants: e-SC server, Scheduler, Performance Monitor (PM), Engine, JMS queue. Messages: Start workflow; Check for jobs; Get information from PM; Send job to correct engine; Update PM; Update server about job status. A 5-second delay is marked on the diagram.]
– The scheduler gets wrong engine information from the PM because of the 5-second delay.
– The wrong engine may be selected, or the wrong task may be assigned.
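The effect of stale monitoring data can be reproduced with a toy simulation. It is only a sketch under simplifying assumptions: time is counted in abstract ticks rather than seconds, one job arrives per tick, load is measured as the number of jobs assigned, and `dispatch_with_stale_view` is a hypothetical helper.

```python
def pick_least_loaded(loads):
    """Return the engine with the smallest load in the given view."""
    return min(loads, key=loads.get)


def dispatch_with_stale_view(num_jobs, engines, refresh_every):
    """Dispatch one job per tick using a view of engine load that is
    refreshed only every `refresh_every` ticks."""
    actual = {engine: 0 for engine in engines}  # true number of jobs per engine
    view = dict(actual)                         # what the scheduler believes
    for tick in range(num_jobs):
        if tick % refresh_every == 0:
            view = dict(actual)                 # monitor data reaches the scheduler
        target = pick_least_loaded(view)
        actual[target] += 1                     # the job really lands on the engine
    return actual


fresh = dispatch_with_stale_view(4, ["e1", "e2"], refresh_every=1)
stale = dispatch_with_stale_view(4, ["e1", "e2"], refresh_every=5)
```

With a fresh view the four jobs are split evenly, whereas with a view that is never refreshed during the run every job goes to the same engine.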

Expected issues and problems
For more sophisticated policies:
– Lack of input information about the task and its inputs and outputs:
  – hardware/software requirements and capabilities,
  – the absence of a completion-time estimate for the task rules out many scheduling policies,
  – data locality can play an important role.
Support for cloud bursting, e.g. interaction with an Infrastructure Manager.
Support for simulation, e.g. integration with WorkflowSim.

DISCUSSION