Large-Scale Science Through Workflow Management Ewa Deelman Center for Grid Technologies USC Information Sciences Institute.

Slides:



Advertisements
Similar presentations
3 September 2004NVO Coordination Meeting1 Grid-Technologies NVO and the Grid Reagan W. Moore George Kremenek Leesa Brieger Ewa Deelman Roy Williams John.
Advertisements

Pegasus on the Virtual Grid: A Case Study of Workflow Planning over Captive Resources Yang-Suk Kee, Eun-Kyu Byun, Ewa Deelman, Kran Vahi, Jin-Soo Kim Oracle.
The ADAMANT Project: Linking Scientific Workflows and Networks “Adaptive Data-Aware Multi-Domain Application Network Topologies” Ilia Baldine, Charles.
Ewa Deelman, Integrating Existing Scientific Workflow Systems: The Kepler/Pegasus Example Nandita Mangal,
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
6th Biennial Ptolemy Miniconference Berkeley, CA May 12, 2005 Distributed Computing in Kepler Ilkay Altintas Lead, Scientific Workflow Automation Technologies.
Ewa Deelman, Optimizing for Time and Space in Distributed Scientific Workflows Ewa Deelman University.
Pegasus: Mapping complex applications onto the Grid Ewa Deelman Center for Grid Technologies USC Information Sciences Institute.
An Astronomical Image Mosaic Service for the National Virtual Observatory
A Grid-Enabled Engine for Delivering Custom Science- Grade Images on Demand
An Astronomical Image Mosaic Service for the National Virtual Observatory / ESTO.
Ewa Deelman Using Grid Technologies to Support Large-Scale Astronomy Applications Ewa Deelman Center for Grid Technologies USC Information.
1 The Application-Infrastructure Gap Dynamic and/or Distributed Applications A 1 B Shared Distributed Infrastructure.
Ewa Deelman, Clouds: An Opportunity for Scientific Applications? Ewa Deelman USC Information Sciences.
Managing Workflows with the Pegasus Workflow Management System
Ewa Deelman, Pegasus and DAGMan: From Concept to Execution Mapping Scientific Workflows onto the National.
CONDOR DAGMan and Pegasus Selim Kalayci Florida International University 07/28/2009 Note: Slides are compiled from various TeraGrid Documentations.
Pegasus A Framework for Workflow Planning on the Grid Ewa Deelman USC Information Sciences Institute Pegasus Acknowledgments: Carl Kesselman, Gaurang Mehta,
The Grid is a complex, distributed and heterogeneous execution environment. Running applications requires the knowledge of many grid services: users need.
LIGO- G E LIGO Grid Applications Development Within GriPhyN Albert Lazzarini LIGO Laboratory, Caltech GriPhyN All Hands.
1 USC INFORMATION SCIENCES INSTITUTE Yolanda Gil Artificial Intelligence and Large-Scope Science: Workflow Planning and Beyond Yolanda Gil USC/Information.
Miguel Branco CERN/University of Southampton Enabling provenance on large-scale e-Science applications.
20 October 2006Workflow Optimization in Distributed Environments Dynamic Workflow Management Using Performance Data David W. Walker, Yan Huang, Omer F.
Combining the strengths of UMIST and The Victoria University of Manchester Utility Driven Adaptive Workflow Execution Kevin Lee School of Computer Science,
 The workflow description modified to output a VDS DAX.  The workflow description toolkit developed allows any concrete workflow description to be migrated.
Pegasus-a framework for planning for execution in grids Ewa Deelman USC Information Sciences Institute.
Pegasus: Planning for Execution in Grids Ewa Deelman Information Sciences Institute University of Southern California.
CSIU Submission of BLAST jobs via the Galaxy Interface Rob Quick Open Science Grid – Operations Area Coordinator Indiana University.
1 USC Information Sciences Institute Yolanda Gil AAAI-08 Tutorial July 13, 2008 AAAI-08 Tutorial on Computational Workflows for Large-Scale.
Dr. Ahmed Abdeen Hamed, Ph.D. University of Vermont, EPSCoR Research on Adaptation to Climate Change (RACC) Burlington Vermont USA MODELING THE IMPACTS.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Pegasus: Mapping Scientific Workflows onto the Grid Ewa Deelman Center for Grid Technologies USC Information Sciences Institute.
Condor Week 2005Optimizing Workflows on the Grid1 Optimizing workflow execution on the Grid Gaurang Mehta - Based on “Optimizing.
Virtual Data Grid Architecture Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny.
1 USC INFORMATION SCIENCES INSTITUTE Yolanda Gil Interactive Composition of Computational Pathways Jihie Kim Varun Ratnakar Students: Marc Spraragen (USC)
Combining the strengths of UMIST and The Victoria University of Manchester Adaptive Workflow Processing and Execution in Pegasus Kevin Lee School of Computer.
San Diego Supercomputer Center Grid Physics Network (GriPhyN) University of Florida DGL: The Assembly Language for Grid Computing Arun swaran Jagatheesan.
Pegasus: Running Large-Scale Scientific Workflows on the TeraGrid Ewa Deelman USC Information Sciences Institute
Pegasus: Mapping complex applications onto the Grid Ewa Deelman Center for Grid Technologies USC Information Sciences Institute.
What is SAM-Grid? Job Handling Data Handling Monitoring and Information.
GRIDS Center Middleware Overview Sandra Redman Information Technology and Systems Center and Information Technology Research Center National Space Science.
GriPhyN Virtual Data System Grid Execution of Virtual Data Workflows Mike Wilde Argonne National Laboratory Mathematics and Computer Science Division.
CEDPS Data Services Ann Chervenak USC Information Sciences Institute.
Pegasus-a framework for planning for execution in grids Karan Vahi USC Information Sciences Institute May 5 th, 2004.
Planning Ewa Deelman USC Information Sciences Institute GriPhyN NSF Project Review January 2003 Chicago.
Pegasus: Planning for Execution in Grids Ewa Deelman, Carl Kesselman, Gaurang Mehta, Gurmeet Singh, Karan Vahi Information Sciences Institute University.
Funded by the NSF OCI program grants OCI and OCI Mats Rynge, Gideon Juve, Karan Vahi, Gaurang Mehta, Ewa Deelman Information Sciences Institute,
Ewa Deelman, Virtual Metadata Catalogs: Augmenting Existing Metadata Catalogs with Semantic Representations Yolanda Gil, Varun Ratnakar,
Ocean Observatories Initiative OOI Cyberinfrastructure Life Cycle Objectives Review January 8-9, 2013 Scientific Workflows for OOI Ilkay Altintas Charles.
1 Pegasus and wings WINGS/Pegasus Provenance Challenge Ewa Deelman Yolanda Gil Jihie Kim Gaurang Mehta Varun Ratnakar USC Information Sciences Institute.
Workflow-Driven Science using Kepler Ilkay Altintas, PhD San Diego Supercomputer Center, UCSD words.sdsc.edu.
1 Artemis: Integrating Scientific Data on the Grid Rattapoom Tuchinda Snehal Thakkar Yolanda Gil Ewa Deelman.
1 USC Information Sciences InstituteYolanda Gil AAAI-08 Tutorial July 13, 2008 Part IV Workflow Mapping and Execution in Pegasus (Thanks.
Managing LIGO Workflows on OSG with Pegasus Karan Vahi USC Information Sciences Institute
Ewa Deelman, Managing Scientific Workflows on OSG with Pegasus Ewa Deelman USC Information Sciences.
Resource Allocation and Scheduling for Workflows Gurmeet Singh, Carl Kesselman, Ewa Deelman.
1 Performance Impact of Resource Provisioning on Workflows Gurmeet Singh, Carl Kesselman and Ewa Deelman Information Science Institute University of Southern.
INTRODUCTION TO XSEDE. INTRODUCTION  Extreme Science and Engineering Discovery Environment (XSEDE)  “most advanced, powerful, and robust collection.
SCEC CyberShake on TG & OSG: Options and Experiments Allan Espinosa*°, Daniel S. Katz*, Michael Wilde*, Ian Foster*°,
Pegasus WMS Extends DAGMan to the grid world
Cloudy Skies: Astronomy and Utility Computing
Seismic Hazard Analysis Using Distributed Workflows
Pegasus and Condor Gaurang Mehta, Ewa Deelman, Carl Kesselman, Karan Vahi Center For Grid Technologies USC/ISI.
USC Information Sciences Institute {jihie, gil,
Ewa Deelman University of Southern California
Mats Rynge USC Information Sciences Institute
rvGAHP – Push-Based Job Submission Using Reverse SSH Connections
High Throughput Computing for Astronomers
A General Approach to Real-time Workflow Monitoring
Frieda meets Pegasus-WMS
Presentation transcript:

Large-Scale Science Through Workflow Management Ewa Deelman Center for Grid Technologies USC Information Sciences Institute

Ewa Deelman, Acknowledgements Ewa Deelman, Carl Kesselman, Gaurang Mehta, Gurmeet Singh, Mei-Hui Su, Karan Vahi (Center for Grid Technologies, ISI) James Blythe, Yolanda Gil (Intelligent Systems Division, ISI) Research funded as part of the NSF GriPhyN, NVO and SCEC projects and EU- funded GridLab

Ewa Deelman, Today’s Scientific Applications Increasing in the level of complexity Use of individual application components Reuse of individual intermediate data products (files) Description of Data Products using Metadata Attributes Execution environment is complex and very dynamic Resources come and go Data is replicated Components can be found at various locations or staged in on demand Separation between the application description the actual execution description

Ewa Deelman, Workflow Definitions Workflow template: shows the main steps in the scientific analysis and their dependencies without specifying particular data products Abstract workflow: depicts the scientific analysis including the data used and generated, but does not include information about the resources needed for execution Concrete workflow: an executable workflow that includes details of the execution environment

Ewa Deelman,

Concrete Workflow Generation and Mapping

Ewa Deelman, Pegasus: Planning for Execution in Grids Maps from abstract to concrete workflow Algorithmic and AI-based techniques Automatically locates physical locations for both workflow components and data Finds appropriate resources to execute Reuses existing data products where applicable Publishes newly derived data products Provides provenance information

Ewa Deelman, Generating a Concrete Workflow Information location of files and component Instances State of the Grid resources Select specific Resources Files Add jobs required to form a concrete workflow that can be executed in the Grid environment Data movement Data registration Each component in the abstract workflow is turned into an executable job

Ewa Deelman, Information Components used by Pegasus Globus Monitoring and Discovery Service (MDS) Locates available resources Finds resource properties Dynamic: load, queue length Static: location of GridFTP server, RLS, etc Globus Replica Location Service Locates data that may be replicated Registers new data products Transformation Catalog Locates installed executables

Ewa Deelman, Example Workflow Reduction Original abstract workflow If “b” already exists (as determined by query to the RLS), the workflow can be reduced

Ewa Deelman, Mapping from abstract to concrete Query RLS, MDS, and TC, schedule computation and data movement

Ewa Deelman, Pegasus Research resource discovery and assessment resource selection resource provisioning workflow restructuring task merged together or reordered to improve overall performance adaptive computing Workflow refinement adapts to changing execution environment

Ewa Deelman, Benefits of the workflow & Pegasus approach The workflow exposes the structure of the application maximum parallelism of the application Pegasus can take advantage of the structure to Set a planning horizon (how far into the workflow to plan) Cluster a set of workflow nodes to be executed as one (for performance) Pegasus shields from the Grid details

Ewa Deelman, Benefits of the workflow & Pegasus approach Pegasus can run the workflow on a variety of resources Pegasus can run a single workflow across multiple resources Pegasus can opportunistically take advantage of available resources (through dynamic workflow mapping) Pegasus can take advantage of pre-existing intermediate data products Pegasus can improve the performance of the application.

Ewa Deelman, Mosaic of M42 created on the Teragrid resources using Pegasus Pegasus improved the runtime of this application by 90% over the baseline case Bruce Berriman, John Good (Caltech) Joe Jacob, Dan Katz (JPL)

Ewa Deelman, Future Directions Support for workflows with real-time feedback to scientists. Providing intermediate analysis results so that the experimental setup can be adjusted while the short-lived samples or human subjects are available.

Ewa Deelman, time Levels of abstraction Application -level knowledge Logical tasks Tasks bound to resources and sent for execution User’s Request Relevant components Full abstract workflow Partial execution Not yet executed Workflow refinement Onto-based Matchmaker Workflow repair Policy reasoner Cognitive Grids: Distributed Intelligent Reasoners that Incrementally Generate the Workflow

Ewa Deelman, BLAST : set of sequence comparison algorithms that are used to search sequence databases for optimal local alignments to a query Lead by Veronika Nefedova (ANL) as part of the Paci Data Quest Expedition program 2 major runs were performed using Chimera and Pegasus: 1)60 genomes (4,000 sequences each), In 24 hours processed Genomes selected from DOE-sponsored sequencing projects 67 CPU-days of processing time delivered ~ 10,000 Grid jobs >200,000 BLAST executions 50 GB of data generated 2) 450 genomes processed Speedup of 5-20 times were achieved because the compute nodes we used efficiently by keeping the submission of the jobs to the compute cluster constant.

Ewa Deelman, Tomography (NIH-funded project) Derivation of 3D structure from a series of 2D electron microscopic projection images, Reconstruction and detailed structural analysis complex structures like synapses large structures like dendritic spines. Acquisition and generation of huge amounts of data Large amount of state-of-the-art image processing required to segment structures from extraneous background. Dendrite structure to be rendered by Tomography Work performed with Mark Ellisman, Steve Peltier, Abel Lin, Thomas Molina (SDSC)

Ewa Deelman, LIGO’s pulsar search at SC 2002 The pulsar search conducted at SC 2002 Used LIGO’s data collected during the first scientific run of the instrument Targeted a set of 1000 locations of known pulsar as well as random locations in the sky Results of the analysis were be published via LDAS (LIGO Data Analysis System) to the LIGO Scientific Collaboration performed using LDAS and compute and storage resources at Caltech, University of Southern California, University of Wisconsin Milwaukee. ISI people involved: Gaurang Mehta, Sonal Patil, Srividya Rao, Gurmeet Singh, Karan Vahi Visualization by Marcus Thiebaux

Ewa Deelman, Southern California Earthquake Center Southern California Earthquake Center (SCEC), in collaboration with the USC Information Sciences Institute, San Diego Supercomputer Center, the Incorporated Research Institutions for Seismology, and the U.S. Geological Survey, is developing the Southern California Earthquake Center Community Modeling Environment (SCEC/CME). Create fully three-dimensional (3D) simulations of fault-system dynamics. Physics-based simulations can potentially provide enormous practical benefits for assessing and mitigating earthquake risks through Seismic Hazard Analysis (SHA). The SCEC/CME system is an integrated geophysical simulation modeling framework that automates the process of selecting, configuring, and executing models of earthquake systems. Acknowledgments : Philip Maechling and Vipin Gupta University Of Southern California