Published by Luz Penfold. Modified over 9 years ago.
1
Pegasus on the Virtual Grid: A Case Study of Workflow Planning over Captive Resources
Yang-Suk Kee, Eun-Kyu Byun, Ewa Deelman, Karan Vahi, Jin-Soo Kim
Oracle US Inc.; Korea Advanced Institute of Science and Technology; Information Sciences Institute/University of Southern California; Sungkyunkwan University
2
Overview
Motivation
Background
– Pegasus
– Virtual Grid
Pegasus-VG Proxy
Conclusion
Discussion
3
Motivation
Challenges in scientific application development
– Data/control flow, task scheduling, data replication, fault tolerance, etc.
Challenges in resource management
– Availability, performance, cost, reliability, fault tolerance, etc.
How can we leverage existing cyberinfrastructure for easy and efficient scientific computing?
4
Separation of Concerns
Application domain
– Workflow management: applications can be managed independently of the target execution environment
– E.g., Pegasus, Askalon, Triana
Resource domain
– Resource provisioning: resource management can be encapsulated beneath abstractions or virtualization
– E.g., Virtual Grid, virtual clusters, clouds
5
Workflow planning and execution over provisioned resources
6
Pegasus
A framework for workflow planning and execution
Workflow lifecycle
– Design: describe the application's data/control flows via an abstract workflow
– Planning: map the workflow tasks onto physical resources
– Execution: schedule and run the workflow tasks on the mapped resources
7
Pegasus Workflow Management
[Figure: the Pegasus mapper turns an abstract workflow into executable workflow tasks, which Condor DAGMan and Condor run on a Condor pool (the computing environment); monitoring information and provenance are reported back to Pegasus]
8
Virtual Grid
A programmable, virtualized resource provisioning framework
Components
– vgDL (Virtual Grid Description Language): specifies resource requirements
– vgES (Virtual Grid Execution System): compiles the specification and coordinates resources
– PC (Personal Cluster): provides uniform job management
9
[Figure: Virtual Grid resource abstraction. An application (tasks A, B, C, D) submits the vgDL request vgdl = ClusterOf (node) [2] { node = [Processor == "P4"] }; vgES goes through Classification, Selection, Binding, and Environment stages over timeshare, lease, and batch (PBS) resources, returns a VG, and the program runs on it.]
10
Pegasus on Virtual Grid
Scope
– A basic integration for workflow planning and execution over provisioned resources
Issues
– Resource capacity estimation: resource specification (vgDL) synthesis for Virtual Grid
– Resource information publication: site catalog generation for Pegasus
11
Resource Capacity Estimation
What Virtual Grid expects from Pegasus
– A vgDL description
Available information
– Task execution time, data transfer time, performance metrics, minimum memory capacity, cost, deadline, etc.
Unknown information
– Number of virtual processors
Resource capacity estimate
– Find the minimum number of processors that can execute the workflow within the deadline
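Two standard scheduling bounds (general facts, not taken from these slides) frame the search for that minimum. With deadline D, total work W = Σ ET_i, and critical-path length L, p processors can complete at most p·D units of work, and no processor count can finish faster than L:

```latex
p_{\min} \;\ge\; \left\lceil \frac{W}{D} \right\rceil
        = \left\lceil \frac{\sum_i ET_i}{D} \right\rceil,
\qquad \text{and a feasible schedule requires } D \ge L .
```

For the six-task example on the next slide, W = 1 + 5 + 2 + 2 + 1 + 1 = 12 and D = 7, so at least ⌈12/7⌉ = 2 processors are needed; if the tasks form the fork-join shape the figure suggests, L = 1 + 5 + 1 = 7 ≤ D. Both conditions are necessary but not sufficient, so an actual schedule still has to be found.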
12
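BTS (Balanced Time Scheduling)
Ref: E-science'08, E.-K. Byun, Y.-S. Kee et al.
[Figure: six-task workflow (task 1 fans out to tasks 2, 3, 4, 5, which feed task 6) and a two-processor schedule over time: p1 runs tasks 1, 2, 6; p2 runs tasks 3, 4, 5]
ID | 1 | 2 | 3 | 4 | 5 | 6
ET | 1 | 5 | 2 | 2 | 1 | 1
How many processors do we need to run this workflow within 7 time units?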
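The question on this slide can be checked by simulation. The sketch below is not the BTS algorithm from the cited paper; it swaps in plain greedy list scheduling (longest remaining path first) and tries 1, 2, ... processors until the makespan fits the deadline. It assumes the fork-join DAG read off the figure and ignores data transfer times; `list_schedule` and `min_processors` are hypothetical names.

```python
import heapq

# Execution times (ET) from the slide's table, in abstract time units.
ET = {1: 1, 2: 5, 3: 2, 4: 2, 5: 1, 6: 1}
# Assumed DAG shape (read off the figure): task 1 fans out to 2..5, which join at 6.
PARENTS = {1: [], 2: [1], 3: [1], 4: [1], 5: [1], 6: [2, 3, 4, 5]}


def children_of(parents):
    """Invert a task -> parents map into a task -> children map."""
    children = {t: [] for t in parents}
    for task, ps in parents.items():
        for p in ps:
            children[p].append(task)
    return children


def bottom_level(task, et, children):
    """Longest path (sum of ETs) from `task` down to an exit task, inclusive."""
    return et[task] + max((bottom_level(c, et, children) for c in children[task]), default=0)


def list_schedule(et, parents, num_procs):
    """Greedy list scheduling on identical processors; returns the makespan.

    Whenever a processor is idle, it starts the ready task with the largest
    bottom level (ties broken by task ID). Data transfer times are ignored.
    """
    children = children_of(parents)
    prio = {t: bottom_level(t, et, children) for t in et}
    unfinished_parents = {t: len(ps) for t, ps in parents.items()}
    ready = [t for t, n in unfinished_parents.items() if n == 0]
    running = []  # min-heap of (finish_time, task)
    now, free, done = 0, num_procs, 0
    while done < len(et):
        ready.sort(key=lambda t: (-prio[t], t))
        while ready and free > 0:
            task = ready.pop(0)
            heapq.heappush(running, (now + et[task], task))
            free -= 1
        # Jump to the next completion and release its successors.
        now, finished = heapq.heappop(running)
        free += 1
        done += 1
        for c in children[finished]:
            unfinished_parents[c] -= 1
            if unfinished_parents[c] == 0:
                ready.append(c)
    return now


def min_processors(et, parents, deadline):
    """Smallest processor count whose list schedule meets the deadline, or None."""
    for p in range(1, len(et) + 1):
        if list_schedule(et, parents, p) <= deadline:
            return p
    return None


print(min_processors(ET, PARENTS, deadline=7))  # 2
```

With one processor the makespan is 12 units (the total work); with two it drops to 7, matching the critical path 1 + 5 + 1, so two processors suffice. Because greedy list scheduling is a heuristic, the count it returns is in general an upper bound on the true minimum.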
13
Example
– Execution time of each task: on a Xeon processor
– Data transfer time: network with 1 Gbps bandwidth
– Deadline: 1 hour
Diamond = ClusterOf [2] (nd) [, 0:30:00] { nd = [Processor == "Xeon"] }
[Figure: Diamond workflow with preprocess, findrange, and analyze tasks; input file f.input, output file f.output]
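The "resource specification synthesis" step amounts to turning an estimated processor count into vgDL text. A minimal sketch, assuming the `ClusterOf (node) [N] { node = [Processor == "..."] }` form that appears in the slides' figures (this slide's own example orders the fields differently); `synthesize_vgdl` is a hypothetical helper, and the real vgDL grammar, including the time field, may differ.

```python
def synthesize_vgdl(name, num_nodes, processor, node_var="nd"):
    """Render a hypothetical ClusterOf request in the vgDL style shown on these slides.

    Only the node count and processor type are emitted; the time field on the
    slide ("[, 0:30:00]") is left out because its exact syntax is not given here.
    """
    if num_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return (f"{name} = ClusterOf ({node_var}) [{num_nodes}] "
            f'{{ {node_var} = [Processor == "{processor}"] }}')


# e.g. feed in the processor count produced by a capacity estimator
print(synthesize_vgdl("Diamond", 2, "Xeon"))
```

Keeping synthesis separate from estimation means the estimator can change (BTS, list scheduling, cost-aware variants) without touching how the request is rendered.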
14
Resource Information Publication
What Pegasus expects from Virtual Grid
– A site catalog
What Virtual Grid provides
– A VG instance
Resource information publication
– Devirtualize the VG instance and generate a site catalog for Pegasus
16
Personal Cluster
A partition of resources dedicated to a user, under the control of a user-level resource manager, for a limited time period
GT4/PBS
Ref: HCW'08, Y.-S. Kee and C. Kesselman
17
Site Catalog Publication
… /home/globus/pegasus-2.1.0 gt4 PBS $HOME/workdir …
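Devirtualization can be pictured as flattening a bound VG instance into one catalog entry. The sketch below uses Python's standard `xml.etree.ElementTree`; the element and attribute names (`site`, `profile`, `gridmanager`, `workdirectory`, `capacity`) and the input node records are hypothetical placeholders, not the actual Pegasus 2.1 site catalog schema. Only the values /home/globus/pegasus-2.1.0, gt4, PBS, and $HOME/workdir come from the slide.

```python
import xml.etree.ElementTree as ET


def devirtualize(vg_nodes, site="vg-site"):
    """Flatten a list of bound VG nodes into a single site-catalog-like entry.

    `vg_nodes` is a hypothetical list of {"host": str, "cpus": int} records;
    the first node is treated as the personal cluster's head node.
    """
    head = vg_nodes[0]["host"]
    entry = ET.Element("site", handle=site)
    home = ET.SubElement(entry, "profile", namespace="env", key="PEGASUS_HOME")
    home.text = "/home/globus/pegasus-2.1.0"
    # GT4 front end submitting to a PBS instance on the head node.
    ET.SubElement(entry, "gridmanager", type="gt4", scheduler="PBS", contact=head)
    workdir = ET.SubElement(entry, "workdirectory")
    workdir.text = "$HOME/workdir"
    # Aggregate capacity, so the planner knows how many processors it can use.
    total_cpus = sum(node["cpus"] for node in vg_nodes)
    ET.SubElement(entry, "capacity", processors=str(total_cpus))
    return entry


nodes = [{"host": "node1.example.org", "cpus": 2}, {"host": "node2.example.org", "cpus": 2}]
print(ET.tostring(devirtualize(nodes), encoding="unicode"))
```

Exposing the whole VG as a single site hides the individual nodes from the planner, which matches the personal-cluster view where a user-level PBS schedules jobs across them.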
18
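Workflow Planning over Provisioned Resources
[Figure: three stages, Creation, Planning, and Scheduling/Execution, linking Pegasus, the VG-Pegasus proxy, and Virtual Grid. The proxy applies BTS to the abstract workflow (tasks A, B, C, D), sends the vgDL request vgdl = ClusterOf (nd) [2] { nd = [Proc == "Xeon"] } to Virtual Grid, and devirtualizes the returned VG (GT4+PBS) into a site catalog; Pegasus uses the site catalog to turn the abstract workflow into an executable workflow.]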
19
Conclusion
Pegasus on Virtual Grid
– Implements workflow planning and execution over on-demand captive resources
– Enables easy and efficient application development and execution
Issues
– Resource capacity estimation
– Site catalog publication
20
Discussion
Effective performance
– What cost does a user have to pay for a successful execution?
Ongoing studies
– Fine-grain planning for resource provisioning: performance, cost, reliability
– Workflow execution for virtualization: recovery of failed tasks
21
Need More Information?
Pegasus
– http://pegasus.isi.edu
VGrADS
– Tuesday, 11:30am, RENCI booth (2633)
– Wednesday, noon, GCAS booth (285)
– Wednesday, 2:00pm, SDSC booth (568)
– Wednesday, 4:00pm, RENCI booth (2633)
22
Q & A: Questions and Answers