Ewa Deelman University of Southern California

Slides:



Advertisements
Similar presentations
Pegasus on the Virtual Grid: A Case Study of Workflow Planning over Captive Resources Yang-Suk Kee, Eun-Kyu Byun, Ewa Deelman, Kran Vahi, Jin-Soo Kim Oracle.
Advertisements

Workshop on Workflows in Support of Large-Scale Science June 20, Paris, France In conjunction with HPDC 2006 HPDC Ewa Deelman,
The ADAMANT Project: Linking Scientific Workflows and Networks “Adaptive Data-Aware Multi-Domain Application Network Topologies” Ilia Baldine, Charles.
Ewa Deelman, Integrating Existing Scientific Workflow Systems: The Kepler/Pegasus Example Nandita Mangal,
Ewa Deelman, Optimizing for Time and Space in Distributed Scientific Workflows Ewa Deelman University.
Ewa Deelman, Clouds: An Opportunity for Scientific Applications? Ewa Deelman USC Information Sciences.
Pegasus: Mapping complex applications onto the Grid Ewa Deelman Center for Grid Technologies USC Information Sciences Institute.
Ewa Deelman Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.
Ewa Deelman, Clouds: An Opportunity for Scientific Applications? Ewa Deelman USC Information Sciences.
Managing Workflows with the Pegasus Workflow Management System
Ewa Deelman, Pegasus and DAGMan: From Concept to Execution Mapping Scientific Workflows onto the National.
CONDOR DAGMan and Pegasus Selim Kalayci Florida International University 07/28/2009 Note: Slides are compiled from various TeraGrid Documentations.
Pegasus A Framework for Workflow Planning on the Grid Ewa Deelman USC Information Sciences Institute Pegasus Acknowledgments: Carl Kesselman, Gaurang Mehta,
1 Yolanda Gil Information Sciences InstituteJanuary 10, 2010 Requirements for caBIG Infrastructure to Support Semantic Workflows Yolanda.
Building a Real Workflow Thursday morning, 9:00 am Lauren Michael Research Computing Facilitator University of Wisconsin - Madison.
The Grid is a complex, distributed and heterogeneous execution environment. Running applications requires the knowledge of many grid services: users need.
Large-Scale Science Through Workflow Management Ewa Deelman Center for Grid Technologies USC Information Sciences Institute.
What are the main differences and commonalities between the IS and DA systems? How information is transferred between tasks: (i) IS it may be often achieved.
Managing large-scale workflows with Pegasus Karan Vahi ( Collaborative Computing Group USC Information Sciences Institute Funded.
 The workflow description modified to output a VDS DAX.  The workflow description toolkit developed allows any concrete workflow description to be migrated.
Experiences Using Cloud Computing for A Scientific Workflow Application Jens Vöckler, Gideon Juve, Ewa Deelman, Mats Rynge, G. Bruce Berriman Funded by.
Pegasus-a framework for planning for execution in grids Ewa Deelman USC Information Sciences Institute.
Pegasus: Planning for Execution in Grids Ewa Deelman Information Sciences Institute University of Southern California.
Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 3: Operating-System Structures System Components Operating System Services.
Dr. Ahmed Abdeen Hamed, Ph.D. University of Vermont, EPSCoR Research on Adaptation to Climate Change (RACC) Burlington Vermont USA MODELING THE IMPACTS.
Pegasus: Mapping Scientific Workflows onto the Grid Ewa Deelman Center for Grid Technologies USC Information Sciences Institute.
Condor Week 2005Optimizing Workflows on the Grid1 Optimizing workflow execution on the Grid Gaurang Mehta - Based on “Optimizing.
Virtual Data Grid Architecture Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny.
1 USC INFORMATION SCIENCES INSTITUTE Yolanda Gil Interactive Composition of Computational Pathways Jihie Kim Varun Ratnakar Students: Marc Spraragen (USC)
Combining the strengths of UMIST and The Victoria University of Manchester Adaptive Workflow Processing and Execution in Pegasus Kevin Lee School of Computer.
1 USC INFORMATION SCIENCES INSTITUTE CAT: Composition Analysis Tool Interactive Composition of Computational Pathways Yolanda Gil Jihie Kim Varun Ratnakar.
Pegasus: Running Large-Scale Scientific Workflows on the TeraGrid Ewa Deelman USC Information Sciences Institute
Building a Real Workflow Thursday morning, 9:00 am Lauren Michael Research Computing Facilitator University of Wisconsin - Madison.
Pegasus: Mapping complex applications onto the Grid Ewa Deelman Center for Grid Technologies USC Information Sciences Institute.
ICCS WSES BOF Discussion. Possible Topics Scientific workflows and Grid infrastructure Utilization of computing resources in scientific workflows; Virtual.
Transparently Gathering Provenance with Provenance Aware Condor Christine Reilly and Jeffrey Naughton Department of Computer Sciences University of Wisconsin.
CEDPS Data Services Ann Chervenak USC Information Sciences Institute.
Pegasus WMS: Leveraging Condor for Workflow Management Ewa Deelman, Gaurang Mehta, Karan Vahi, Gideon Juve, Mats Rynge, Prasanth.
Experiment Management from a Pegasus Perspective Jens-S. Vöckler Ewa Deelman
Planning Ewa Deelman USC Information Sciences Institute GriPhyN NSF Project Review January 2003 Chicago.
Pegasus: Planning for Execution in Grids Ewa Deelman, Carl Kesselman, Gaurang Mehta, Gurmeet Singh, Karan Vahi Information Sciences Institute University.
Funded by the NSF OCI program grants OCI and OCI Mats Rynge, Gideon Juve, Karan Vahi, Gaurang Mehta, Ewa Deelman Information Sciences Institute,
Ewa Deelman, Virtual Metadata Catalogs: Augmenting Existing Metadata Catalogs with Semantic Representations Yolanda Gil, Varun Ratnakar,
1 Pegasus and wings WINGS/Pegasus Provenance Challenge Ewa Deelman Yolanda Gil Jihie Kim Gaurang Mehta Varun Ratnakar USC Information Sciences Institute.
George Kola Computer Sciences Department University of Wisconsin-Madison Data Pipelines: Real Life Fully.
1 Artemis: Integrating Scientific Data on the Grid Rattapoom Tuchinda Snehal Thakkar Yolanda Gil Ewa Deelman.
1 USC Information Sciences InstituteYolanda Gil AAAI-08 Tutorial July 13, 2008 Part IV Workflow Mapping and Execution in Pegasus (Thanks.
Managing LIGO Workflows on OSG with Pegasus Karan Vahi USC Information Sciences Institute
Ewa Deelman, Managing Scientific Workflows on OSG with Pegasus Ewa Deelman USC Information Sciences.
1 Performance Impact of Resource Provisioning on Workflows Gurmeet Singh, Carl Kesselman and Ewa Deelman Information Science Institute University of Southern.
VO Experiences with Open Science Grid Storage OSG Storage Forum | Wednesday September 22, 2010 (10:30am)
INTRODUCTION TO XSEDE. INTRODUCTION  Extreme Science and Engineering Discovery Environment (XSEDE)  “most advanced, powerful, and robust collection.
Pegasus WMS Extends DAGMan to the grid world
Cloudy Skies: Astronomy and Utility Computing
Seismic Hazard Analysis Using Distributed Workflows
Self Healing and Dynamic Construction Framework:
Workflows and the Pegasus Workflow Management System
Pegasus and Condor Gaurang Mehta, Ewa Deelman, Carl Kesselman, Karan Vahi Center For Grid Technologies USC/ISI.
Clouds: An Opportunity for Scientific Applications?
University of Southern California
USC Information Sciences Institute {jihie, gil,
Managing Computational Workflows in the Cloud
Pegasus Workflows on XSEDE
Overview of Workflows: Why Use Them?
Mats Rynge USC Information Sciences Institute
rvGAHP – Push-Based Job Submission Using Reverse SSH Connections
High Throughput Computing for Astronomers
A General Approach to Real-time Workflow Monitoring
Chaitali Gupta, Madhusudhan Govindaraju
Frieda meets Pegasus-WMS
Presentation transcript:

Data Management Challenges of Large-Scale Data Intensive Scientific Workflows Ewa Deelman University of Southern California Information Sciences Institute Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Types of Workflow Applications Providing a service to a community (Montage project) Data and derived data products available to a broad range of users A limited number of small computational requests can be handled locally For large numbers of requests or large requests need to rely on shared cyberinfrastructure resources On-the fly workflow generation, portable workflow definition Supporting community-based analysis (SCEC project) Codes are collaboratively developed Codes are “strung” together to model complex systems Ability to correctly connect components, scalability Processing large amounts of shared data on shared resources (LIGO project) Data captured by various instruments and cataloged in community data registries. Amounts of data necessitate reaching out beyond local clusters Automation, scalability and reliability Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Generating mosaics of the sky (Bruce Berriman, Caltech) Size of the mosaic is degrees square* Number of jobs Number of input data files Number of Intermediate files Total data footprint Approx. execution time (20 procs) 1 232 53 588 1.2GB 40 mins 2 1,444 212 3,906 5.5GB 49 mins 4 4,856 747 13,061 20GB 1hr 46 mins 6 8,586 22,850 38GB 2 hrs. 14 mins 10 20,652 3,722 54,434 97GB 6 hours Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu *The full moon is 0.5 deg. sq. when viewed form Earth, Full Sky is ~ 400,000 deg. sq.

Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Workflow Lifecycle Reuse Creation Planning Scheduling/ Execution Distributed Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Workflow Creation Design a workflow (semantics info needed) Find the right components Set the right parameters Find the right data Connect appropriate pieces together Find the right fillers Support both experts and novices Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Challenges in user experiences Users’ expectations vary greatly High-level descriptions Detailed plans that include specific resources Users interactions can be exploratory Or workflows can be iterative Modifying portions of the workflow as the computation progresses Users need progress, failure information at the right level of detail There is no ONE user but many users with different knowledge and capabilities Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Wings: Workflow Instance Generation and Selection (Y. Gil, USC/ISI) “Validate this workflow based on the component specs” WINGS Workflow templates specify complex analyses sequences - Workflow instances specify data “Show me workflows that prune MT rules” Workflow Creation Workflow Selection Workflow Libraries SEASONED NL RESEARCHER Ontologies: Domain terms, Component types, Workflow Products Workflow Template Application Components Specifies data requirements Specifies execution STUDENT (OWL) “Run this workflow with the WSJ-04 data set” Data Selection Data Repositories Component Specification - Preexisting data collections - Workflow execution results Workflow Instance ALGORITHM DEVELOPER “Here is a new Rule pruning code, takes in a set of MT rules, is compiled for MPI” Workflow Management System Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Editing and Creating Workflows Wings Editor Workflow template Workflow instance Users get feedback and suggestions Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Workflow Lifecycle Planning Ewa Deelman www.isi.edu/~deelman Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Specification: Place Y = F(x) at L Execution Environment: Distributed Find where x is--- {S1,S2, …} Find where F can be computed--- {C1,C2, …} Choose c and s subject to constraints (performance, space availability,….) Move x from s to c Move F to c Compute F(x) at c Move Y from c to L Register Y in data registry Record provenance of Y, performance of F(x) at c Error! x was not at s! Error! F(x) failed! Error! c crashed! Error! there is not enough space at L! Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Some challenges in workflow mapping Automated management of data Efficient mapping the workflow instances to resources Runtime Performance Data space optimizations Fault tolerance (involves interfacing with the workflow execution system) Recovery by replanning plan “B” Scalability Providing feedback to the user Feasibility, time estimates Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Ewa Deelman www.isi.edu/~deelman

Pegasus-Workflow Management System Leverages abstraction for workflow description to obtain ease of use, scalability, and portability Provides a compiler to map from high-level descriptions (workflow instances) to executable workflows Correct mapping Performance enhanced mapping Provides a runtime engine to carry out the instructions (Condor DAGMan) Scalable manner Reliable manner In collaboration with Miron Livny, UW Madison, funded under NSF-OCI SDCI Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Mapping Correctly Select where to run the computations Apply a scheduling algorithm HEFT, min-min, round-robin, random Schedule in a data-aware fashion (data transfers, amount of storage) The quality of the scheduling depends on the quality of information Transform task nodes into nodes with executable descriptions Execution location Environment variables initializes Appropriate command-line parameters set Select which data to access Add stage-in nodes to move data to computations Add stage-out nodes to transfer data out of remote sites to storage Add data transfer nodes between computation nodes that execute on different resources Add nodes to create an execution directory on a remote site Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Additional Mapping Elements Cluster compute nodes in small granularity applications Add data cleanup nodes to remove data from remote sites when no longer needed reduces workflow data footprint Add nodes that register the newly-created data products Provide provenance capture steps Information about source of data, executables invoked, environment variables, parameters, machines used, performance Scale matters--today we can handle: 1 million tasks in the workflow instance (SCEC) 10TB input data (LIGO) Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Pegasus Workflow Mapping 1 4 Original workflow: 15 compute nodes devoid of resource assignment 5 8 Resulting workflow mapped onto 3 Grid sites: 11 compute nodes (4 reduced based on available intermediate data) 12 data stage-in nodes 8 inter-site data transfers 14 data stage-out nodes to long- term storage 14 data registration nodes (data cataloging) 9 4 8 3 7 10 13 12 15 60 jobs to execute 9 10 12 13 15 Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Data Reuse Sometimes it is cheaper to access the data than to regenerate it Keeping track of data as it is generated supports workflow-level checkpointing Need to be careful how reuse is done Mapping Complex Workflows Onto Grid Environments, E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, K. Backburn, A. Lazzarini, A. Arbee, R. Cavanaugh, S. Koranda, Journal of Grid Computing, Vol.1, No. 1, 2003., pp25-39. Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Efficient data handling Input data is staged dynamically New data products are generated during execution For large workflows 10,000+ files Similar order of intermediate and output files Total space occupied is far greater than available space—failures occur Solution: Determine which data are no longer needed and when Add nodes to the workflow do cleanup data along the way Issues: minimize the number of nodes and dependencies added so as not to slow down workflow execution deal with portions of workflows scheduled to multiple sites Joint work with Rizos Sakellariou, Manchester University, CCGrid 2007, Scientific Programming Journal, 2007 Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

LIGO-a gravitational-wave physics application and Montage Adding cleanup nodes to the workflow 1.25GB versus 4.5 GB Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Full workflow: 185,000 nodes 466,000 edges 10 TB of input data 1 TB of output data. 56% improvement 26% improvement 166 nodes LIGO Workflows LIGO Scientists Don’t like their Workflows Either Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Workflow Lifecycle Scheduling/ Execution Ewa Deelman www.isi.edu/~deelman Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Challenges in Workflow Execution Resource provisioning Which resources to provision if many possibilities? How many resources to provision? For how long? Fault Tolerance How to recognize different types of failures How to recover from failures? Efficient collaboration between the data and computation management systems Debugging How to relate the workflow result (outcome) to workflow specification Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Data Placement Service Interaction Between Workflow Planner and Data Placement Service for Staging Data Workflow Planner Data Placement Service Compute Cluster Storage Elements Jobs Data Transfer Workflow Tasks Staging Request Setup Transfers (Pegasus) (Data Replication Service) Joint work with Ann Chervenak, USC/ISI Grid 2007 Significant potential for multiple workflows Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Execution Times with Default Input Sizes Combination of prestaging data with DRS followed by workflow execution using Pegasus Improves execution time approximately 21.4% over Pegasus performing explicit data staging Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Execution Times with Additional 20 MB Input Files With asynchronous data staging, execution time is reduced by over 46% Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Workflow Mapping and Execution Connected For each data item, we can find the executable workflow steps that produced it and other data items that contributed to those steps. For each workflow step, we can find its connection to the workflow instance jobs from which it was refined. Workflow Instance Executable Workflow Data Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Joint work with Luc Moreau, Southampton University e-Science 2007

Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Workflow Lifecycle Reuse Ewa Deelman www.isi.edu/~deelman Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Challenges in reuse and sharing How to find what is already there How to determine the quality of what’s there How to invoke an existing workflow How to share a workflow with a colleague How to share a workflow with a competitor Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Sharing: the new frontier MyExperiment in the UK (University of Manchester), a repository of workflows http://www.myexperiment.org/ How do you share workflows across different workflow systems? How to write a workflow in Wings and execute in ASKALON? NSF/Mellon Workshop on Scientific and Scholarly Workflow, 2007 https://spaces.internet2.edu/display/SciSchWorkflow/Home How do you interpret results from one workflow when you are using a different workflows system (provenance-level interoperability) Provenance challenge http://twiki.ipaw.info/ Open provenance model http://eprints.ecs.soton.ac.uk/14979/1/opm.pdf Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Conclusions Much work done to date in scientific workflows Scientists are buying into the new programming model Data handling is critical to the success of workflows Identifying the right data Managing data transfers and execution-side storage Reliability On-time data delivery Timely data offload Keeping track of provenance information Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Acknowledgments Pegasus: Gaurang Mehta, Mei-Hui Su, Karan Vahi, Arun Ramakrishnan (USC) DAGMan (in Pegasus-WMS): Miron Livny, Kent Wenger, and the Condor team (Wisconsin Madison) Wings: Yolanda Gil, Jihie Kim, Varun Ratnakar, Paul Groth (USC) LIGO: Kent Blackburn, Duncan Brown, Stephen Fairhurst, Scott Koranda (Caltech) Montage: Bruce Berriman, John Good, Dan Katz, and Joe Jacobs (Caltech, JPL) SCEC: Tom Jordan, Robert Graves, Phil Maechling, David Okaya, Li Zhao (USC, UCSD, others) Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu

Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu Relevant Links Pegasus: pegasus.isi.edu DAGMan: www.cs.wisc.edu/condor/dagman Gil, Y., E. Deelman, et al. Examining the Challenges of Scientific Workflows. IEEE Computer, 2007. Workflows for e-Science, Taylor, I.J.; Deelman, E.; Gannon, D.B.; Shields, M. (Eds.), Dec. 2006 Montage: montage.ipac.caltech.edu/ LIGO: www.ligo.caltech.edu/ Condor: www.cs.wisc.edu/condor/ Ewa Deelman, deelman@isi.edu www.isi.edu/~deelman pegasus.isi.edu