Combining the strengths of UMIST and The Victoria University of Manchester

Utility-Driven Adaptive Workflow Execution
Kevin Lee
School of Computer Science, University of Manchester
Currently at the University of Mannheim, Germany
20th May 2009

1. Problem Overview
Concerning: scientific workflows executing on Grids.
Characteristics:
- Very long running
- Small delays can have large effects due to dependencies
- Involve highly distributed resources
- Limited control over resources
- Uncertain execution and batch queue times
A workflow is statically scheduled before it starts executing, using current information about the execution environment.
What happens if the environment changes?
- Resources appear/disappear
- Loads change due to resources being used
Obvious solution: adapt at runtime!

2. Background: the Montage workflow
[Figure: a simple Montage workflow, and a mosaic created by Montage from a run of the M101 galaxy images]
Montage:
- Delivers science-grade mosaics on demand
- Produces mosaics from a wide range of data sources
- User-specified parameters of projection, coordinates, size, rotation and spatial sampling
A Montage workflow:
- Can execute on Grid resources
- Can be specified in a high-level abstract form: logical files and logical transformations
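As a rough, hypothetical illustration of what such an abstract specification contains (the job and file names below are made up, and this is not the actual Pegasus input format), a fragment of a Montage-like workflow expressed only in terms of logical transformations, logical files and dependencies:

# Hypothetical abstract workflow fragment: tasks reference logical
# transformations and logical files only; no physical paths or Grid sites yet.
abstract_montage = {
    "jobs": {
        "proj1": {"transformation": "mProjectPP", "inputs": ["img1.fits"], "outputs": ["p1.fits"]},
        "proj2": {"transformation": "mProjectPP", "inputs": ["img2.fits"], "outputs": ["p2.fits"]},
        "diff1": {"transformation": "mDiffFit", "inputs": ["p1.fits", "p2.fits"], "outputs": ["d1.fits"]},
    },
    "dependencies": [("proj1", "diff1"), ("proj2", "diff1")],
}

Compilation (next slide) is then the step that binds these logical names to physical files and concrete Grid sites.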

2. Background: Pegasus
Workflow execution:
- Compilation: abstract (logical) -> concrete
- Submission: graph dependency manager
- Execution: jobs execute on Grid resources
- Reporting: task and workflow status

3. Adaptive Workflow Execution
Retrofit an adaptivity framework to Pegasus:
- Minimal changes to Pegasus
- Touch points via sensors and effectors
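A minimal sketch of what such touch points could look like, assuming a simple polling control loop; the class and function names here are illustrative assumptions, not the framework's actual API:

import time

class LogSensor:
    # Sensor touch point: reads newly appended lines from a workflow log.
    def __init__(self, log_path):
        self.log_path = log_path
        self.offset = 0

    def poll(self):
        with open(self.log_path) as f:
            f.seek(self.offset)
            lines = f.readlines()
            self.offset = f.tell()
        return lines

class ReplanEffector:
    # Effector touch point: deploys a new task-to-site assignment by
    # halting, replanning and resubmitting (see the deployment slide later).
    def apply(self, assignment):
        raise NotImplementedError

def control_loop(sensor, analyse, plan, effector, period=30):
    # Monitoring -> Analysis -> Planning -> Execution, without modifying Pegasus itself.
    while True:
        events = sensor.poll()
        if analyse(events):            # e.g. queue times diverging from estimates
            effector.apply(plan())     # deploy a better assignment
        time.sleep(period)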

3. Adaptive Workflow Execution
Aim: adapt to changing batch queue times.
- When a Grid site has contention, batch queue times are higher than estimated
- When a Grid site is under-utilised, batch queue times are lower than estimated
- The static schedule may be initially correct, but it diverges from the ideal over time

3. Adaptive Workflow Execution
Sensors -> Monitoring
To monitor the progress of an executing workflow, we parse the live log. Example:

2/17 11:53:14 Event: ULOG_GRID_SUBMIT for Condor Node mBackground_ID (4713.0)
2/17 11:53:14 Event: ULOG_EXECUTE for Condor Node mBackground_ID (4709.0)
2/17 11:53:14 Number of idle job procs: 4
2/17 11:53:20 Event: ULOG_EXECUTE for Condor Node mBackground_ID (4708.0)
2/17 11:53:20 Number of idle job procs: 3
2/17 11:53:28 Event: ULOG_JOB_TERMINATED for Condor Node mBackground_ID (4710.0)
2/17 11:53:28 Node mBackground_ID job proc (4710.0) completed successfully.

Result: XML events for job queued, executed, termination. Made available to analysis as a stream.
RegEx: ([\d]+)/([\d]+).([\d]+):([\d]+):([\d]+).Event:.([\S]+_[\S]+).for.Condor.Node.([a-zA-Z0-9_]+)
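To make the extraction concrete, a small Python sketch of how this regular expression might be applied to turn log lines into stream events (the record field names are assumptions; the time is converted to seconds as in the CQL queries on the next slide):

import re

# The regular expression from the slide: groups are month, day, hour,
# minute, second, event name and Condor node name.
LOG_RE = re.compile(
    r"([\d]+)/([\d]+).([\d]+):([\d]+):([\d]+).Event:.([\S]+_[\S]+)"
    r".for.Condor.Node.([a-zA-Z0-9_]+)"
)

def parse_log_line(line):
    # Returns an event record, or None for non-event lines
    # (e.g. "Number of idle job procs: 4").
    m = LOG_RE.search(line)
    if m is None:
        return None
    month, day, h, mi, s, event, node = m.groups()
    return {
        "time": int(h) * 3600 + int(mi) * 60 + int(s),  # seconds, as in h*3600+m*60+s
        "event": event,   # e.g. ULOG_GRID_SUBMIT, ULOG_EXECUTE, ULOG_JOB_TERMINATED
        "job": node,      # e.g. mBackground_ID
    }

parse_log_line("2/17 11:53:14 Event: ULOG_GRID_SUBMIT for Condor Node mBackground_ID (4713.0)")
# -> {'time': 42794, 'event': 'ULOG_GRID_SUBMIT', 'job': 'mBackground_ID'}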

3. Adaptive Workflow Execution
Analysis
Uses the CQL continuous query language to group and analyse the events. SQL-like, but with extensions for queries over time.
1. Calculates current average job queue times over a period of time
2. Causes re-planning when queue times are more or less than expected

select h*3600+m*60+s, job, site, est from workflowlog where event="ULOG_SUBMIT";
register stream submittedjobs (time int, job char(22), site char(22), est int);

select h*3600+m*60+s, job from workflowlog where event="ULOG_EXECUTE";
register stream executedjobs (time int, job char(22));

Rstream (select executed.time-submitted.time, executed.job, submitted.site, submitted.est
         from executedjobs [Range 360 Seconds] as executed, submittedjobs as submitted
         where executed.job=submitted.job);
register stream jobdelay (delay int, job char(22), site char(22), est int);

select site, delay, est, (delay-est) from jobdelay where (delay-est)>20;

Output from this causes re-planning.
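For readers unfamiliar with CQL, a rough Python equivalent of the same analysis (an illustrative sketch only; the 'site' and 'est' fields are assumed to be attached to events at submission time):

from collections import deque

WINDOW = 360     # seconds, matching [Range 360 Seconds] in the CQL
THRESHOLD = 20   # seconds over the estimate, matching (delay-est)>20

submitted = {}          # job -> (submit time, site, estimated queue time)
recent_delays = deque() # (execute time, site, delay) tuples inside the window

def on_event(evt):
    # Returns (site, delay, est, excess) when a job's queue delay exceeds
    # its estimate and should trigger re-planning, otherwise None.
    if evt["event"] == "ULOG_SUBMIT":
        submitted[evt["job"]] = (evt["time"], evt["site"], evt["est"])
        return None
    if evt["event"] != "ULOG_EXECUTE" or evt["job"] not in submitted:
        return None
    submit_time, site, est = submitted[evt["job"]]
    delay = evt["time"] - submit_time
    recent_delays.append((evt["time"], site, delay))
    while recent_delays and recent_delays[0][0] < evt["time"] - WINDOW:
        recent_delays.popleft()   # keep only the sliding window of recent delays
    if delay - est > THRESHOLD:
        return (site, delay, est, delay - est)
    return None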

3. Adaptive Workflow Execution
Planning
Planning has the task of recalculating a better assignment for the workflow.
Data we have:
- The workflow DAG
- The current assignment
- Collected data about resources: number of CPUs, execution times, average queue times
- What we have submitted since the execution started
Approach:
- Call out to a Matlab-based utility function optimiser (MADS)
- Each iteration proposes a new potential assignment
- We provide a function that evaluates the new potential assignment
- Proceed with the search until the best assignment is found
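The shape of this search can be sketched in a few lines of Python. This is only a stand-in for the Matlab MADS optimiser (here a brute-force enumeration over task-to-site assignments), but it shows the evaluate-the-candidate loop that the utility functions below plug into:

import itertools

def best_assignment(tasks, sites, utility):
    # Enumerate candidate task->site assignments and keep the one with
    # the highest utility. MADS explores the space far more cleverly;
    # this only illustrates the role of the evaluation function.
    best, best_u = None, float("-inf")
    for combo in itertools.product(sites, repeat=len(tasks)):
        candidate = dict(zip(tasks, combo))
        u = utility(candidate)   # e.g. U(response time) or U(profit), defined on later slides
        if u > best_u:
            best, best_u = candidate, u
    return best, best_u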

3. Adaptive Workflow Execution
Planning
Firstly, for each proposed new assignment we calculate estimated queue times.
The estimated queue time is based on the external demand, the new demand and the change in actual queue times, using:
- An estimate of the external demand for a period p
- The assigned demand for the period p
- The candidate demand: the demand we will put on the resources
Full explanation in the papers.
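The equations themselves appeared as images on the slide and are given in the papers; as a purely illustrative form (an assumption, not the published formula), the estimated queue time for a site s over a period p could scale the observed queue time by the ratio of projected to current demand:

\[
\widehat{Q}_s(p) \;=\; Q^{obs}_s(p)\cdot
\frac{D^{ext}_s(p) + D^{cand}_s(p)}{D^{ext}_s(p) + D^{asgn}_s(p)}
\]

where D^{ext}_s(p) is the estimated external demand on site s over period p, D^{asgn}_s(p) the demand of the current assignment, and D^{cand}_s(p) the demand the candidate assignment would place on it.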

3. Adaptive Workflow Execution
Planning
Next, calculate the Predicted Response Time (PRT) for the workflow:
- The completion time of the last task, plus any adaptation cost
- A recursive formula estimates the completion time of the last task
So now we have an estimate of how long the workflow will take under each new assignment.
We need a way of judging how good an assignment is in relation to its PRT and the resources used.
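The recursion was shown as an image on the slide; a plausible reconstruction from the description (a task completes once its latest predecessor has completed, it has waited in the queue of its assigned site, and it has executed there) is:

\[
C(t) \;=\; \max_{t' \in pred(t)} C(t') \;+\; \widehat{Q}_{s(t)} \;+\; E(t, s(t)),
\qquad
PRT \;=\; C(t_{last}) + A
\]

where s(t) is the site the candidate assignment gives task t, E(t, s) its estimated execution time on that site, and A the cost of performing the adaptation.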

3. Adaptive Workflow Execution
Planning
Option 1: utility for response time
- Purely tries to use the fastest resources available to complete the workflow
- The estimated queue time ensures a resource isn't overloaded
- The utility is therefore just a simple function of the PRT: the higher the utility value, the better
- The optimiser will try multiple assignments until a 'good' one is found
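The formula was an image on the slide; one illustrative choice (an assumption, not necessarily the published definition) that the optimiser can maximise is simply the reciprocal of the predicted response time:

\[
U_{RT}(a) \;=\; \frac{1}{PRT(a)}
\]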

3. Adaptive Workflow Execution
Planning
Option 2: utility for profit
- As resources are not free, we attach a value to using them:
  - A reward for completing a workflow within a target time
  - A cost for using a resource to execute a task
- A cost is calculated for each workflow assignment
- Profit is a measure of utility minus cost
- The utility is a calculation of how likely the assignment is to complete before the target response time
- The larger the 'profit', the better for the optimiser
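Again the exact formulas are in the papers; an illustrative form (an assumption) consistent with the description is:

\[
Cost(a) \;=\; \sum_{t \in W} c_{s(t)} \cdot E(t, s(t)),
\qquad
Profit(a) \;=\; R \cdot \Pr\!\big[RT(a) \le RT_{target}\big] \;-\; Cost(a)
\]

where c_s is the per-unit cost of using site s (1 and 2 for the two clusters in the experiments), R the reward for meeting the target (100 in the experiments), and the probability term captures how likely the assignment is to finish within the target response time.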

3. Adaptive Workflow Execution
Execution: deploying a new assignment
For a new assignment:
1. Tell the local DAG manager to halt the workflow(s)
2. Collect the locations of all the partial results
3. Modify local databases with this new data
4. Replan the workflow(s) with the new assignment
5. Deploy the workflow
6. Continue monitoring the new execution
This repeats every time a new assignment is available.

4. Experimental Evaluation
Workflow: a 27-node Montage workflow of M17
- Takes between 20 minutes and a few hours depending on the resources
- Profit gain is 100 for completing within the target
Resources: two clusters (Linux, Sun Grid Engine, WS-GRAM, Globus)
- Cluster (1) is less powerful, with longer queue times
- Cluster (2) is more powerful, with shorter queue times
- (2) costs more than (1): (1) costs 1, (2) costs 2

4. Experimental Evaluation
Experiment 1
- A single workflow; periodic load applied to Cluster 1
- Utility based on response time
- The adaptive version performs an adaptation and results in a faster workflow

4. Experimental Evaluation
Experiment 2
- Same as Experiment 1, but for different target response times
- U(RT) always performs the best
- U(Profit) meets the high and mid target response times at less cost than U(RT)
- U(Profit) fails to meet the low target response time, so it uses the cheapest resources

4. Experimental Evaluation
Experiment 3
- Two Montage workflows; periodic load applied to Cluster 1
- Achieved by submitting and monitoring two workflows at the same time
- Utility is the sum of U(RT) or U(Profit) over all workflows
- U(RT) always performs the best
- U(Profit) meets the loose and mid target response times at less cost than U(RT)
- U(Profit) fails to meet the tight target response time, so it uses the cheapest resources

5. Conclusions
An approach to optimising workflow execution:
- For long-running workflows
- Takes into account a workflow's structure
- Takes into account current loads
- Takes into account the loads we will apply
- Minimal intervention in the workflow infrastructure
- Good results for both the response-time and the profit focus

Questions?