Combining the strengths of UMIST and The Victoria University of Manchester
Adaptive Workflow Processing and Execution in Pegasus
Kevin Lee
School of Computer Science, University of Manchester
25th May 2008
Contributors
Rizos Sakellariou, Norman W. Paton and Alvaro A. A. Fernandes {klee, rizos, norm, University of Manchester, UK
Ewa Deelman, Gaurang Mehta, Information Sciences Institute, University of Southern California, US
Talk Overview
1) Background: Adaptivity at Manchester
2) Background: Pegasus Workflow Execution
3) Adaptive Pegasus
4) Experiments and Results
5) Conclusions and Future Work
6) Questions
1. Background: Adaptivity at Manchester
Creating an infrastructure to support the systematic development of adaptive systems, based on the ideas presented today
  Ease the development of adaptive systems
  Support the development of better adaptive systems
  Investigate the use of the infrastructure in a number of different domains
  Use the infrastructure to improve the general understanding of adaptive systems
Applying the infrastructure to related domains
  Workflow processing with the Pegasus team
  Concurrent web-service workflows
  Distributed query processing
2. Background: Pegasus Workflow Execution
3. Adaptive Pegasus Execution
Characteristics of Pegasus workflow execution
  Very long running
  Small delays can have large effects due to dependencies
  Involves highly distributed resources
  Limited control over resources
  Uncertain execution times
  Uncertain queue waiting times
Pegasus schedules a workflow before it starts executing
  Uses current information about the execution environment
What happens if the environment changes?
  Resources appear or disappear
  Loads change due to resources being used
3. Adaptive Pegasus
We combined the adaptivity work at Manchester with Pegasus
Retrofitted Pegasus with an adaptivity framework
Focused on adapting to site queue length, one of the biggest sources of delay in execution
The result is a Pegasus instantiation that can react dynamically to the environment
Main components:
3. Adaptive Pegasus
Monitoring: Monitors the progress of an executing workflow.
  Events: job queue, execute, termination. Sensed from the Pegasus log.
Analysis: Establishes whether the workflow is performing according to the expectations set when it was compiled.
  Uses the continuous query language CQL to group and analyse the events produced by monitoring.
  * SQL-like, but with extensions for queries over time.
  * Detailed in the paper.
Planning: Triggered when analysis detects a sustained change in batch queue times for a site.
  Re-schedules using a scheduler that takes historic data into account.
  * Algorithm in the paper.
Execution: Halts the current workflow and deploys the newly planned one.
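The analysis step above can be illustrated with a small Python stand-in. This is only a sketch of the idea of detecting a sustained change in queue wait per site; the actual system uses CQL queries described in the paper. The class name, the window size, and the threshold factor are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from statistics import mean


class QueueTimeAnalyser:
    """Hypothetical analyser: flags a site when its recent mean queue wait
    drifts away from the wait assumed when the workflow was planned."""

    def __init__(self, expected_wait, window=5, factor=2.0):
        self.expected_wait = expected_wait  # site -> wait in seconds assumed at planning time
        self.window = window                # number of recent jobs considered (assumed value)
        self.factor = factor                # ratio treated as a significant change (assumed value)
        self.waits = defaultdict(lambda: deque(maxlen=window))

    def observe(self, site, queued_at, started_at):
        """Record one job's queue wait; return True if a sustained change is detected."""
        recent = self.waits[site]
        recent.append(started_at - queued_at)
        if len(recent) < self.window:
            return False  # too few samples, so a single slow job cannot trigger re-planning
        avg = mean(recent)
        expected = self.expected_wait[site]
        return avg > expected * self.factor or avg < expected / self.factor


# Usage: (site, time queued, time execution started), all in seconds.
analyser = QueueTimeAnalyser({"cluster1": 60.0, "cluster2": 60.0}, window=3)
events = [
    ("cluster1", 0, 50),
    ("cluster1", 100, 400),
    ("cluster1", 500, 900),
    ("cluster1", 1000, 1400),
]
flags = [analyser.observe(site, queued, started) for site, queued, started in events]
```

Averaging over a window rather than reacting to each event is what makes the detected change "sustained"; a real deployment would need to tune both the window and the factor against the cost of halting and re-planning.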
4. Experiments and Results: Overview
The experiments investigate the effect of the adaptive approach on workflow response time.
Pegasus operates on abstract workflows in the form of Directed Acyclic Graphs (DAGs).
We used two styles of DAG in our experiments: a linear workflow and a Montage workflow.
The experiments took place using 2 clusters.
Each cluster was running the Condor scheduler.
We apply loads to the clusters by submitting additional workflows, and submit the workflow with adaptive support.
4. Experiments and Results: Workflow type 1
This is simply a DAG where each subsequent task depends on the file created by the previous task; it may contain any number of tasks.
With these dependencies present, the tasks in the workflow execute in series.
In our experiments we considered an instance with 50 tasks.
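As a rough illustration of the linear workflow described above, the following Python sketch builds a 50-task chain and derives its execution order. The task names and the helper functions are hypothetical; this is not how Pegasus represents abstract workflows. The second helper shows why small per-task queue delays matter in a chain: with no parallelism, every wait adds directly to the response time.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def linear_workflow(n_tasks):
    """Return {task: [predecessors]} for a chain in which each task
    needs the file produced by the task before it."""
    return {
        f"task{i}": ([f"task{i - 1}"] if i > 0 else [])
        for i in range(n_tasks)
    }


def serial_response_time(n_tasks, queue_wait, run_time):
    """Response time of a chain: tasks run one after another, so each
    task's queue wait and run time (seconds) are simply summed."""
    return n_tasks * (queue_wait + run_time)


dag = linear_workflow(50)
order = list(TopologicalSorter(dag).static_order())

# Illustrative numbers only, not measurements from the experiments.
unloaded = serial_response_time(50, queue_wait=10, run_time=60)
loaded = serial_response_time(50, queue_wait=40, run_time=60)
```

With these made-up figures, a 30-second increase in queue wait per task adds 1500 seconds to the whole chain, which is the kind of effect the adaptive approach targets.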
4. Experiments and Results: Workflow type 2: Montage (NASA and NVO)
Deliver science-grade custom mosaics on demand
Produce mosaics from a wide range of data sources (possibly in different spectra)
User-specified parameters of projection, coordinates, size, rotation and spatial sampling
Image: mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the TeraGrid.
Figure: a simple Montage workflow. These can be of varying sizes, depending on the size of the area of sky covered by the mosaic. The numbers represent the level of each task in the overall workflow. This corresponds to the size used in our experiments (25 tasks, equivalent to a 0.2 degree area).
4. Experiments and Results: Experiment 4
The linear workflow is scheduled in a round-robin fashion to clusters 1 and 2.
Cluster 1 is constantly loaded with an additional 50 linear workflows.
Jobs sent to Cluster 1 are queued longer, as is visible on the graph.
The adaptive workflow adapts early to the constant load.
The result is that the adaptive workflow has a better response time.
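The round-robin placement and the subsequent adaptation can be sketched in Python as follows. This is a deliberately simplified illustration: the real re-scheduling algorithm uses historic data and is given in the paper, whereas the `replan` function here just moves all unfinished tasks to the site with the shortest observed queue wait. All function names, task names, and wait values are assumptions.

```python
from itertools import cycle


def round_robin(tasks, sites):
    """Map each task to a site, cycling through the sites in order."""
    return dict(zip(tasks, cycle(sites)))


def replan(mapping, completed, observed_wait):
    """Simplified stand-in for re-planning: keep completed tasks where
    they ran and move every other task to the least-loaded site."""
    best = min(observed_wait, key=observed_wait.get)
    return {
        task: (site if task in completed else best)
        for task, site in mapping.items()
    }


tasks = [f"task{i}" for i in range(6)]
initial = round_robin(tasks, ["cluster1", "cluster2"])

# Illustrative queue waits in seconds: cluster1 is under extra load.
adapted = replan(
    initial,
    completed={"task0", "task1"},
    observed_wait={"cluster1": 600.0, "cluster2": 30.0},
)
```

In this sketch the remaining tasks all move to cluster2 once cluster1's queue wait is observed to be much longer, which mirrors the early adaptation seen under constant load.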
4. Experiments and Results: Experiment 5
The Montage workflow is scheduled in a round-robin fashion to clusters 1 and 2.
Cluster 1 is constantly loaded with an additional 50 linear workflows.
Jobs sent to Cluster 1 are queued longer, as is visible on the graph.
The adaptive workflow adapts early to the constant load.
The result is that the adaptive workflow has a better response time.
4. Experiments and Results: Experiment 6
The linear workflow is scheduled in a round-robin fashion to clusters 1 and 2.
Cluster 1 is temporarily loaded during execution: an additional 50 linear workflows are submitted 60 minutes into the execution and the load lasts for 60 minutes.
Jobs sent to Cluster 1 are queued longer, as is visible on the graph.
The adaptive workflow adapts twice: after the load is applied and after it is removed.
The result is that the adaptive workflow has a marginally better response time.
The temporary load is roughly equivalent to 2 adaptations.
5. Conclusions and Future Work
Adaptive Pegasus succeeds in improving the response time of workflows.
Retrofitting Pegasus with dynamic behaviour required minimal interference with Pegasus.
Ongoing work:
  Current work involves the use of utility functions to accomplish generic planning and make better decisions
  Continuing with framework development
  Continuing with multiple case studies
Questions/Comments?