
Combining the strengths of UMIST and The Victoria University of Manchester

Utility-based Adaptive Workflow Execution on the Grid
Kevin Lee
School of Computer Science, University of Manchester
11th March 2009

Talk Overview
1) Overview
2) Technical Background
3) Generic Adaptivity Framework
4) Instantiation for Workflow Execution
5) Experimental Evaluation
6) Future Work
7) Questions

1. Overview
 Scientists increasingly do research requiring large-scale computation.
 Computation is expensive, and institutions are unlikely to have sufficient local or regional resources.
 Globally, institutions share resources to maximize usage and value.
 Computation in the form of workflows can execute across multiple resources.
 We want resources to be used as efficiently as possible, and workflows to execute as fast as possible.
 As we will see, there is potential for bad scheduling decisions that lead to inefficient execution.
 The solution presented here is to use runtime adaptation to improve execution.

2. Technical Background
Abstract workflows: workflows that can be specified by hand or by a higher-level program.
- Inputs and outputs specified as logical files (replicas)
- Transformations (programs) specified as logical transformations
- Dependencies between tasks expressed as a directed graph (the workflow)
Compiler and management software:
- Looks up the replicas and transformations in various databases and services
- Creates a concrete (executable) workflow that contains the locations of resources
- A local DAG manager executes the workflow
Resources:
- Grid-based resources that accept the individual jobs that form the workflow

2. Technical Background: a workflow
Mosaic created by Montage from a run of the M101 galaxy images.
A simple Montage workflow is shown; these can be of varying sizes depending on the area of sky in the mosaic. The numbers represent the level of each task in the overall workflow. This corresponds to the size used in our experiments (25 tasks, equivalent to a 0.2 degree area).
Montage:
- Delivers science-grade mosaics on demand
- Produces mosaics from a wide range of data sources
- User-specified parameters: projection, coordinates, size, rotation and spatial sampling

2. Technical Background: Pegasus Workflow Management System

2. Technical Background
A more detailed view of the pipeline: Compilation, Submission, Execution, Reporting.

2. Technical Background
Execution characteristics of Pegasus workflow execution:
- Very long running; small delays can have large effects due to dependencies
- Involves highly distributed resources, with limited control over them
- Uncertain execution times and uncertain queue waiting times
Pegasus schedules a workflow before it starts executing, using current information about the execution environment. What happens if the environment changes?
- Resources appear/disappear
- Loads change as resources are used
Obvious solution: adapt at runtime!

3. Generic Adaptivity Framework
We are developing infrastructure to support the systematic development of adaptive systems:
- Ease the development of adaptive systems
- Support the development of better adaptive systems
- Investigate the use of the infrastructure in a number of different domains
- Leverage the best tools for each part of the adaptation process
- Use the infrastructure to improve the general understanding of adaptive systems

3. Generic Adaptivity Framework (IBM autonomic vision)
Monitor: events from a source (log files, in-memory processes, sensors).
Analyze: when an event occurs, work out what it means.
Plan: after the event is detected and analysed, the system needs to determine what to do about it.
Execute: perform the necessary changes.
Next, each stage in more detail.

3. Generic Adaptivity Framework: Monitoring
- Transforms sensor events from the system into something more useful
- Events are expected to be in XML; an XSLT transformation converts each event to a standard style
- Output events are stored in the knowledge base
- Control events indicate to components in the pipeline that new data is available in the knowledge base

3. Generic Adaptivity Framework: Analysis
- Performs analysis on data in the knowledge base, in response to an indication that new data is available
- Analysis is based on a series of CQL stream queries
- Output events are stored in the knowledge base
- Control events indicate to components in the pipeline that new data is available in the knowledge base

3. Generic Adaptivity Framework: Planning
- Performs optimisation of utility functions, based on data in the knowledge base
- Runs in response to an indication that new data is available
- An optimisation algorithm is used to maximise utility
- Output events are stored in the knowledge base
- Control events indicate to components in the pipeline that new data is available in the knowledge base

3. Generic Adaptivity Framework: Execution
- Executes a workflow on data in the knowledge base, in response to an indication that new data is available
- Uses a BPEL workflow containing services which effect adaptations on the system
- Output events are stored in the knowledge base
So, how does this help improve workflow execution?

4. Instantiation for Workflow Execution
Combining the framework and Pegasus:
- No changes to Pegasus
- Touch points via sensors and effectors

4. Instantiation for Workflow Execution
Aim:
- At the time the workflow is compiled and scheduled, the resources selected may be correct
- As time goes on, these decisions are likely to diverge from the ideal
- For computational resources, the largest cause of delay is contention for the resource, which manifests itself as increased batch queue times for submitted jobs
- Therefore, when applying the framework to Pegasus, we focus on adapting to queue times

4. Instantiation for Workflow Execution
Sensors -> Monitoring:
To monitor the progress of an executing workflow, we parse the live log. Example:

2/17 11:53:14 Event: ULOG_GRID_SUBMIT for Condor Node mBackground_ID (4713.0)
2/17 11:53:14 Event: ULOG_EXECUTE for Condor Node mBackground_ID (4709.0)
2/17 11:53:14 Number of idle job procs: 4
2/17 11:53:20 Event: ULOG_EXECUTE for Condor Node mBackground_ID (4708.0)
2/17 11:53:20 Number of idle job procs: 3
2/17 11:53:28 Event: ULOG_JOB_TERMINATED for Condor Node mBackground_ID (4710.0)
2/17 11:53:28 Node mBackground_ID job proc (4710.0) completed successfully.

RegEx: ([\d]+)/([\d]+).([\d]+):([\d]+):([\d]+).Event:.([\S]+_[\S]+).for.Condor.Node.([a-zA-Z0-9_]+)

Result: XML events for job queued, executed and terminated, made available to analysis as a stream.
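The regex on this slide can be exercised with a short Python sketch. The sample log lines are taken from the slide; the XML packaging step is omitted and the timestamp is reduced to seconds-since-midnight, as the analysis queries expect.

```python
import re

# The slide's regex, verbatim: "." happens to match the separating spaces.
PATTERN = re.compile(
    r"([\d]+)/([\d]+).([\d]+):([\d]+):([\d]+)"
    r".Event:.([\S]+_[\S]+).for.Condor.Node.([a-zA-Z0-9_]+)"
)

def parse_event(line):
    """Return (seconds_since_midnight, event, job) for a log line, else None."""
    m = PATTERN.search(line)
    if not m:
        return None  # e.g. "Number of idle job procs" lines carry no event
    month, day, h, mnt, s, event, job = m.groups()
    return int(h) * 3600 + int(mnt) * 60 + int(s), event, job

log = [
    "2/17 11:53:14 Event: ULOG_GRID_SUBMIT for Condor Node mBackground_ID (4713.0)",
    "2/17 11:53:14 Number of idle job procs: 4",
    "2/17 11:53:28 Event: ULOG_JOB_TERMINATED for Condor Node mBackground_ID (4710.0)",
]
events = [e for e in (parse_event(l) for l in log) if e]
```

Non-event lines are simply dropped; only submit/execute/terminate events flow on to the analysis stage.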

4. Instantiation for Workflow Execution
Analysis:
Uses CQL, a continuous query language, to group and analyse the events. CQL is SQL-like but with extensions for queries over time. The queries:
1. Calculate current average job queue times over a period of time
2. Cause re-planning when queue times are more or less than expected

select h*3600+m*60+s, job, site, est from workflowlog where event="ULOG_SUBMIT";
register stream submittedjobs (time int, job char(22), site char(22), est int);

select h*3600+m*60+s, job from workflowlog where event="ULOG_EXECUTE";
register stream executedjobs (time int, job char(22));

Rstream (select executed.time-submitted.time, executed.job, submitted.site, submitted.est
from executedjobs [Range 180 Seconds] as executed, submittedjobs as submitted
where executed.job=submitted.job);
register stream jobdelay (delay int, job char(22), site char(22), est int);

select site, delay, est, (delay-est) from jobdelay where (delay-est)>20;

Output from the final query triggers planning.
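The logic of these queries can be restated in plain Python as a hedged sketch: the real system runs them as continuous stream queries with a 180-second window, whereas here two finite event lists are joined on the job name. The job names, sites and estimates are illustrative; `est` is the queue time predicted at scheduling time.

```python
# Submitted-job events: (time, job, site, estimated queue time in seconds).
submitted = [
    (100, "mProject_1", "cluster1", 30),
    (105, "mProject_2", "cluster2", 30),
]
# Executed-job events: (time, job).
executed = [
    (190, "mProject_1"),
    (130, "mProject_2"),
]

def queue_delays(submitted, executed):
    """Join the two streams on job name and compute the actual queue delay."""
    sub = {job: (t, site, est) for t, job, site, est in submitted}
    for t_exec, job in executed:
        t_sub, site, est = sub[job]
        yield job, site, t_exec - t_sub, est

# Mirror of the final CQL query: report jobs whose actual queue delay
# exceeds the estimate by more than 20 seconds -- these trigger re-planning.
overdue = [(job, site, delay - est)
           for job, site, delay, est in queue_delays(submitted, executed)
           if delay - est > 20]
```

Here `mProject_1` waited 90 s against an estimate of 30 s, so it is flagged; `mProject_2` executed faster than estimated and is not.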

4. Instantiation for Workflow Execution
Planning:
Planning has the task of calculating a better assignment for the workflow.
Data we have:
- The workflow DAG and the current assignment
- Collected data about resources: number of CPUs, execution times, average queue times
- What we have submitted since the execution started
Approach:
- Use a Matlab-based utility function optimiser
- We write a function that, given the values above and a potential new assignment, returns the value of that assignment (the higher the better)
- The optimiser (MADS) searches potential assignments, calling the function many times
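The slides use a Matlab MADS optimiser; as a hedged sketch of the same black-box pattern, a naive random search can stand in for it. The site names, speeds and the toy utility function below are illustrative, not the paper's actual formula.

```python
import random

def random_search(tasks, sites, utility, iters=500, seed=0):
    """Stand-in for MADS: repeatedly score candidate assignments
    (task -> site) with a black-box utility function; keep the best."""
    rng = random.Random(seed)
    best, best_u = None, float("-inf")
    for _ in range(iters):
        candidate = {t: rng.choice(sites) for t in tasks}
        u = utility(candidate)
        if u > best_u:
            best, best_u = candidate, u
    return best, best_u

def toy_utility(assignment):
    """Illustrative utility: prefer the faster site, but penalise piling
    every task onto it (a crude proxy for growing queue times)."""
    speed = {"cluster1": 1.0, "cluster2": 2.0}
    load = sum(1 for s in assignment.values() if s == "cluster2")
    return sum(speed[s] for s in assignment.values()) - 0.3 * load * load

best, u = random_search(["t1", "t2", "t3"], ["cluster1", "cluster2"], toy_utility)
```

The search settles on putting two of the three tasks on the faster site, which is exactly the kind of contention-aware trade-off the real utility functions encode.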

4. Instantiation for Workflow Execution
Planning:
First, for each proposed new assignment we calculate estimated queue times, based on the external demand, the new demand and the change in actual queue times:
- An estimate of external demand for a period p
- The demand already assigned for the period p
- The candidate demand: the demand we will put on the resources
Full explanation in the papers.

4. Instantiation for Workflow Execution
Planning:
Next, we calculate the Predicted Response Time (PRT) for the workflow: the completion time of the last task, plus any adaptation cost. A recursive formula estimates the completion time of the last task.
So now we have an estimate of how long the workflow will take under each new assignment. We need a way of judging how good an assignment is in relation to its PRT and the resources used.
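The exact recursive formula is in the papers; the sketch below is the standard DAG recurrence consistent with the description above: a task can start once all its parents have finished, then waits in the queue and runs, and the PRT is the latest finish time plus the adaptation cost. All numbers are illustrative.

```python
def predicted_response_time(tasks, parents, queue, runtime, adapt_cost=0.0):
    """tasks: task ids in topological order; parents: task -> parent tasks;
    queue/runtime: task -> estimated seconds on its assigned resource."""
    finish = {}
    for t in tasks:
        ready = max((finish[p] for p in parents[t]), default=0.0)
        finish[t] = ready + queue[t] + runtime[t]
    return max(finish.values()) + adapt_cost

# A tiny diamond-shaped workflow (a -> {b, c} -> d) with made-up estimates.
tasks = ["a", "b", "c", "d"]
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
queue = {"a": 5, "b": 10, "c": 2, "d": 5}
runtime = {"a": 20, "b": 30, "c": 40, "d": 10}
prt = predicted_response_time(tasks, parents, queue, runtime, adapt_cost=15)
```

Task d waits for the slower of its two parents (c, finishing at 67 s), so the last task completes at 82 s and the PRT, including the 15 s adaptation cost, is 97 s.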

4. Instantiation for Workflow Execution
Planning, option 1: utility for response time.
- Purely tries to use the fastest resources available to complete the workflow
- The estimated queue time (EQT) ensures a resource isn't overloaded
- The higher the utility value, the better
- The optimiser tries multiple assignments until a 'good' one is found

4. Instantiation for Workflow Execution
Planning, option 2: utility for profit.
- As resources are not free, we attach a value to using them: a reward for completing a workflow within a target time, and a cost for using a resource to execute a task
- Profit is a measure of utility minus cost
- The utility is a calculation of how likely the assignment is to complete before the target response time
- The larger the profit, the better for the optimiser
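A hedged sketch of the profit utility: the reward of 100 and the per-task site costs of 1 and 2 come from the experimental setup later in the talk, but where the real system estimates the probability of meeting the target, this toy version uses a hard threshold, and the PRT values are made up.

```python
SITE_COST = {"cluster1": 1, "cluster2": 2}  # cost per task, as in the experiments

def profit(assignment, prt, target_rt, reward=100):
    """Reward for finishing within the target response time, minus the
    total cost of the resources the assignment uses."""
    cost = sum(SITE_COST[site] for site in assignment.values())
    gained = reward if prt <= target_rt else 0
    return gained - cost

fast = {"t1": "cluster2", "t2": "cluster2"}   # expensive but quick
cheap = {"t1": "cluster1", "t2": "cluster1"}  # cheap but slow
p_fast = profit(fast, prt=1800, target_rt=2000)
p_cheap = profit(cheap, prt=2400, target_rt=2000)
```

This reproduces the behaviour described in the experiments: when the target is reachable the expensive assignment earns the reward (profit 96), and when it is not, no assignment earns it, so the cheapest resources lose the least (profit -2).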

4. Instantiation for Workflow Execution
Planning: some example runs of the optimizer with the profit utility.
- The first has a high target response time, which it easily meets, then improves further
- The second has a lower target response time, which it slowly gets closer to
- The graphs show the utility being minimized rather than maximized

4. Instantiation for Workflow Execution
Execution: deploying a new assignment.
1. Tell the local DAG manager to halt the workflow(s)
2. Collect the locations of all the partial results
3. Modify the local databases with this new data
4. Re-plan the workflow(s) with the new assignment
5. Deploy the workflow
6. Continue monitoring the new execution
This repeats every time a new assignment is available.

5. Experimental Evaluation
Workflow: a 27-node Montage workflow of M17.
- Takes between 20 minutes and a few hours depending on resources
- The profit reward is 100 for completing within the target
Resources: previous work used TeraGrid clusters (see papers); these experiments are new, so two workstations:
- (1) is less powerful, with longer queue times
- (2) is more powerful, with shorter queue times
- (2) costs more than (1): (1) costs 1, (2) costs 2

5. Experimental Evaluation
Experiment 1: a single workflow, with a periodic load applied to cluster 1. The adaptive version performs an adaptation and results in a faster workflow.

5. Experimental Evaluation
Experiment 1, for different target response times:
- U(RT) always performs the best
- U(Profit) meets the high and mid target response times at less cost than U(RT)
- U(Profit) fails to meet the low target response time, so it uses the cheapest resources

5. Experimental Evaluation
Experiment 2: two Montage workflows, with a periodic load applied to cluster 1. Achieved by submitting and monitoring two workflows at the same time; utility is the sum of U(RT) or U(Profit) over all workflows.
- U(RT) always performs the best
- U(Profit) meets the high and mid target response times at less cost than U(RT)
- U(Profit) fails to meet the low target response time, so it uses the cheapest resources

6. Current/Future Work
Current work:
- Scaling up the number of workflows: 10 and more
- Scaling up to more sites
- Managing workflows arriving over time, rather than all at the start
- More workflow types; we have used linear workflows and Montage in our papers
Future work:
- Further refinement of the conditions for when to adapt
- Large-scale experiments; tighter integration into Pegasus
- Lots of interesting problems

Publications
I have tried to give the general picture and some details. For more, see:
WORKS 2007: K. Lee, R. Sakellariou, N. W. Paton and A. A. A. Fernandes, Workflow Adaptation as an Autonomic Computing Problem, 2nd Workshop on Workflows in Support of Large-Scale Science (WORKS 07), in Proceedings of HPDC 2007, Monterey Bay, California, June 2007.
WaGe 2008: K. Lee, N. W. Paton, R. Sakellariou, E. Deelman, A. A. A. Fernandes and G. Mehta, Adaptive Workflow Processing and Execution in Pegasus, 3rd International Workshop on Workflow Management and Applications in Grid Environments (WaGe08), May 2008, Kunming, China.
CCGRID 2009: K. Lee, N. W. Paton, R. Sakellariou and A. A. A. Fernandes, Utility-Based Scheduling for Adaptive Workflow Execution, 9th IEEE International Symposium on Cluster Computing and the Grid (CCGRID 2009), to appear.

Acknowledgements
Rizos Sakellariou, Norman W. Paton and Alvaro A. A. Fernandes
University of Manchester, UK
Ewa Deelman and Gaurang Mehta
Information Sciences Institute, University of Southern California, US

Questions/Comments?
