1
Pegasus: Mapping complex applications onto the Grid
Ewa Deelman
Center for Grid Technologies, USC Information Sciences Institute
2
Ewa Deelman, Information Sciences Institute
Pegasus Acknowledgements
- Ewa Deelman, Carl Kesselman, Saurabh Khurana, Gaurang Mehta, Sonal Patil, Gurmeet Singh, Mei-Hui Su, Karan Vahi (Center for Grid Computing, ISI)
- James Blythe, Yolanda Gil (Intelligent Systems Division, ISI)
- http://pegasus.isi.edu
- Research funded as part of the NSF GriPhyN, NVO and SCEC projects.
3
Outline
- The GriPhyN project and Grid Applications
- Workflow Management in Grids
- Pegasus, Planning for Execution in Grids
  - Framework Description
  - Generation of Executable Workflows
- Applications Using Pegasus
- Future Research Directions
4
GriPhyN Data Grid Challenge
- Provide a framework that enables Virtual Organizations around the world to perform computationally demanding analysis of large, geographically distributed datasets.
- The Virtual Organizations are large and highly distributed
- The datasets are large, currently on the order of Terabytes and expected to grow to the level of 100s of Petabytes in the next decade
- Provide seamless access to data: experimental raw data or processed data products
- Enable a user/application to ask for any domain-specific data, whether computed or not
Concept of Virtual Data
5
Grid Applications
- Increasing in the level of complexity
- Use of individual application components
- Reuse of individual intermediate data products (files)
- Description of Data Products using Metadata Attributes
- Execution environment is complex and very dynamic
  - Resources come and go
  - Data is replicated
  - Components can be found at various locations or staged in on demand
- Separation between
  - the application description
  - the actual execution description
7
Generating an Abstract Workflow
- Available Information
  - Specification of component capabilities
  - Ability to generate the desired data products
- Select and configure application components to form an abstract workflow
  - assign input files that exist or that can be generated by other application components
  - specify the order in which the components must be executed
  - components and files are referred to by their logical names
    - Logical transformation name
    - Logical file name
    - Both transformations and data can be replicated
8
Generating a Concrete Workflow
- Information
  - location of files and component instances
  - State of the Grid resources
- Select specific
  - Resources
  - Files
- Add jobs required to form a concrete workflow that can be executed in the Grid environment
  - Data movement
  - Data registration
- Each component in the abstract workflow is turned into an executable job
9
Why Automate Workflow Generation?
- Usability: limit the Grid knowledge a user needs
  - Monitoring and Discovery Service
  - Replica Location Service
- Complexity:
  - User needs to make choices
    - Alternative application components
    - Alternative files
    - Alternative locations
  - The user may reach a dead end
  - Many different interdependencies may occur among components
- Solution cost:
  - Evaluate the alternative solution costs
    - Performance
    - Reliability
    - Resource Usage
- Global cost:
  - minimizing cost within a community or a virtual organization
  - requires reasoning about an individual user’s choices in light of other users’ choices
10
GriPhyN’s Executable Workflow Construction
- Build an abstract workflow based on VDL descriptions (Chimera)
- Build an executable workflow based on the abstract workflow (Pegasus)
- Execute the workflow (Condor’s DAGMan)
11
Chimera: Creating Abstract Workflows
- Developed at ANL (Foster, Voeckler, Wilde)
- Chimera’s Virtual Data Language (VDL) allows for the description of an abstract workflow
- Transformations:
  - general description of the transformation applied to data, using a logical transformation name

    TR galMorph( in redshift, in pixScale, in zeroPoint,
                 in Ho, in om, in flat, in image, out galMorph )
    { … }
12
Chimera: Creating Abstract Workflows
- Derivations are instantiations of TRs
  - Identify particular logical input and output file names
  - Identify actual parameters

    DV d1->galMorph(
        redshift="0.027886",
        image=@{in:"NGP9_F323-0927589.fit"},
        pixScale="2.831933107035062E-4",
        zeroPoint="0",
        Ho="100",
        om="0.3",
        flat="1",
        galMorph=@{out:"NGP9_F323-0927589.txt"} );
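The relationship between a transformation and a derivation can be sketched in Python. This is only an illustration under stated assumptions: the `Transformation` and `Derivation` classes are hypothetical and are not part of Chimera; the parameter names and values are the ones from the galMorph example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transformation:
    """Hypothetical stand-in for a VDL TR: a logical name plus formal parameters."""
    name: str
    inputs: tuple   # formal "in" parameter names
    outputs: tuple  # formal "out" parameter names


@dataclass
class Derivation:
    """Hypothetical stand-in for a VDL DV: a TR plus actual arguments."""
    name: str
    transformation: Transformation
    args: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every formal parameter must be bound, and nothing else may be passed.
        formal = set(self.transformation.inputs) | set(self.transformation.outputs)
        missing = formal - set(self.args)
        unknown = set(self.args) - formal
        if missing or unknown:
            raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")

    def output_files(self):
        """Logical file names bound to the TR's "out" parameters."""
        return [self.args[p] for p in self.transformation.outputs]


gal_morph = Transformation(
    name="galMorph",
    inputs=("redshift", "pixScale", "zeroPoint", "Ho", "om", "flat", "image"),
    outputs=("galMorph",),
)

d1 = Derivation(
    name="d1",
    transformation=gal_morph,
    args={
        "redshift": "0.027886",
        "image": "NGP9_F323-0927589.fit",
        "pixScale": "2.831933107035062E-4",
        "zeroPoint": "0",
        "Ho": "100",
        "om": "0.3",
        "flat": "1",
        "galMorph": "NGP9_F323-0927589.txt",
    },
)
```

One transformation can back many derivations, each binding different logical file names and parameter values.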
13
Abstract Workflow Generation
- Definitions for transformations and derivations are stored in Chimera’s Database
- Database can be browsed
- User queries Chimera giving it a logical filename
14
VDL and Abstract Workflow
[Figure: VDL descriptions and a user request for data file “c” yield an abstract workflow]
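A request for a logical file can be turned into an abstract workflow by chaining backwards through the stored derivations. The sketch below assumes a hypothetical `producers` table (logical file name to producing job and its inputs) in place of Chimera's database; the job names `d1` to `d3` and the files other than “c” are made up for illustration.

```python
def abstract_workflow(requested, producers):
    """Return {job: set of parent jobs} needed to produce `requested`.

    `producers` maps a logical file name to (job_name, input_files).
    Files with no producer are treated as raw inputs.
    """
    workflow = {}
    stack = [requested]
    seen = set()
    while stack:
        lfn = stack.pop()
        if lfn in seen or lfn not in producers:
            continue
        seen.add(lfn)
        job, inputs = producers[lfn]
        parents = workflow.setdefault(job, set())
        for inp in inputs:
            if inp in producers:
                # The job that derives this input must run first.
                parents.add(producers[inp][0])
            stack.append(inp)
    return workflow


# Hypothetical catalog contents: a -> d1 -> b -> d2 -> c -> d3 -> e
producers = {
    "b": ("d1", ["a"]),
    "c": ("d2", ["b"]),
    "e": ("d3", ["c"]),  # not needed to produce "c"
}
wf = abstract_workflow("c", producers)
```

Only the jobs on the path to the requested file appear in the result; `d3` is left out because nothing the user asked for depends on it.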
15
Condor’s DAGMan
- Developed at UW Madison (Livny)
- Executes a concrete workflow
- Makes sure the dependencies are followed
- Executes the jobs specified in the workflow
  - Execution
  - Data movement
  - Catalog updates
- Provides a “rescue DAG” in case of failure
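The idea of dependency-ordered execution with a rescue DAG can be sketched with Python's standard `graphlib` module (Python 3.9+). This is not how DAGMan is implemented and it does not use DAGMan's file format; `run_dag`, the job names, and the simplification of stopping at the first failure are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def run_dag(dag, run_job):
    """Run jobs in dependency order.

    `dag` maps every job to the set of jobs it depends on.
    `run_job(job)` returns True on success.
    Returns (done, rescue): `rescue` is the sub-DAG of jobs that did not
    complete, with dependencies on finished jobs removed.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    done = []
    failed = False
    while sorter.is_active() and not failed:
        for job in sorter.get_ready():
            if run_job(job):
                done.append(job)
                sorter.done(job)  # unlocks the job's children
            else:
                failed = True     # simplification: stop at the first failure
                break
    finished = set(done)
    rescue = {
        job: set(parents) - finished
        for job, parents in dag.items()
        if job not in finished
    }
    return done, rescue


dag = {
    "stage_in": set(),
    "compute": {"stage_in"},
    "stage_out": {"compute"},
    "register": {"stage_out"},
}
# Simulate a failure in the stage-out job.
done, rescue = run_dag(dag, lambda job: job != "stage_out")
```

Resubmitting `rescue` resumes from the failed job instead of repeating the work that already succeeded.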
16
Pegasus: Planning for Execution in Grids
- Maps from abstract to concrete workflow
  - Algorithmic and AI-based techniques
- Automatically locates physical locations for both components (transformations) and data
- Finds appropriate resources to execute
- Reuses existing data products where applicable
- Publishes newly derived data products
  - Chimera virtual data catalog
  - Provides provenance information
18
Information Components Used by Pegasus
- Globus Monitoring and Discovery Service (MDS)
  - Locates available resources
  - Finds resource properties
    - Dynamic: load, queue length
    - Static: location of GridFTP server, RLS, etc.
- Globus Replica Location Service (RLS)
  - Locates data that may be replicated
  - Registers new data products
- Transformation Catalog (TC)
  - Locates installed executables
19
Example Workflow Reduction
- Original abstract workflow
- If “b” already exists (as determined by query to the RLS), the workflow can be reduced
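The reduction step can be sketched as a backward walk from the requested file that stops wherever a replica already exists. The code below is an illustration only, not Pegasus's algorithm: `reduce_workflow` is a hypothetical helper, the `existing` set stands in for the answer an RLS query would give, and the two-job workflow is made up around the file “b” from the example.

```python
def reduce_workflow(jobs, requested, existing):
    """Keep only the jobs still needed to produce `requested`.

    `jobs` maps job name -> (input_files, output_files).
    `existing` is the set of logical files that already have a replica.
    """
    producer = {out: name for name, (_, outs) in jobs.items() for out in outs}
    needed = set()
    visited = set()
    stack = list(requested)
    while stack:
        lfn = stack.pop()
        if lfn in visited:
            continue
        visited.add(lfn)
        if lfn in existing or lfn not in producer:
            continue  # reuse the replica (or raw input): no job required
        job = producer[lfn]
        if job not in needed:
            needed.add(job)
            stack.extend(jobs[job][0])  # now its inputs must be available too
    return {name: jobs[name] for name in jobs if name in needed}


# Hypothetical abstract workflow: a -> d1 -> b -> d2 -> c
jobs = {
    "d1": (["a"], ["b"]),
    "d2": (["b"], ["c"]),
}
full = reduce_workflow(jobs, ["c"], existing={"a"})
reduced = reduce_workflow(jobs, ["c"], existing={"a", "b"})
```

With “b” already available, `d1` drops out and only `d2` remains, along with whatever data movement is needed to bring “b” to the execution site.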
20
Mapping from abstract to concrete
- Query RLS, MDS, and TC; schedule computation and data movement
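A rough sketch of this mapping for a single abstract job, assuming the three catalogs can be modelled as plain dictionaries. None of this reflects the real RLS, MDS, or TC interfaces: the dictionary shapes, the `concretize` function, the site names, paths, and the "pick the least-loaded site" rule are all hypothetical. The sketch also assumes every input already has a registered replica.

```python
def concretize(job, inputs, outputs, transformation, rls, mds, tc, output_site):
    """Turn one abstract job into an ordered list of concrete jobs.

    Hypothetical catalog shapes (not the real service APIs):
      rls: logical file name -> list of (site, physical_url)
      mds: site -> current load (lower is better)
      tc:  (transformation, site) -> path of the installed executable
    """
    candidates = [s for (name, s) in tc if name == transformation and s in mds]
    if not candidates:
        raise LookupError(f"no site offers {transformation}")
    site = min(candidates, key=lambda s: (mds[s], s))  # least-loaded site

    concrete = []
    for lfn in inputs:
        replicas = rls.get(lfn, [])
        if not replicas:
            raise LookupError(f"no replica of {lfn}")
        if all(rep_site != site for rep_site, _ in replicas):
            src_site, _ = replicas[0]
            concrete.append(("transfer", lfn, src_site, site))  # stage in
    concrete.append(("compute", job, site, tc[(transformation, site)]))
    for lfn in outputs:
        if site != output_site:
            concrete.append(("transfer", lfn, site, output_site))  # stage out
        concrete.append(("register", lfn, output_site))
    return concrete


# Made-up catalog contents for illustration.
rls = {"b": [("isi", "gsiftp://isi.example.org/data/b")]}
mds = {"isi": 0.9, "ncsa": 0.2}
tc = {
    ("galMorph", "ncsa"): "/opt/bin/galMorph",
    ("galMorph", "isi"): "/usr/local/bin/galMorph",
}
plan = concretize("d2", ["b"], ["c"], "galMorph", rls, mds, tc, output_site="isi")
```

The single abstract job expands into a stage-in transfer, the compute job itself, a stage-out transfer, and a registration step, which matches the job types added when forming a concrete workflow.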
21
Applications Using Chimera, Pegasus and DAGMan
- GriPhyN applications:
  - High-energy physics: ATLAS, CMS (many)
  - Astronomy: SDSS (Fermilab, ANL)
  - Gravitational-wave physics: LIGO (Caltech, UWM)
- Astronomy:
  - Galaxy Morphology (NCSA, JHU, Fermilab, many others, NVO-funded)
- Biology
  - BLAST (ANL, PDQ-funded)
- Neuroscience
  - Tomography for Telescience (SDSC, NIH-funded)
22
Pegasus interfaces
- Main interface: command-line interface
- Applications can also be integrated with a portal environment
- Demonstrated the portal at SC 2003
  - LIGO: gravitational-wave physics
  - Montage: astronomy
- Much of the portal is application-independent
23
Montage
- Montage (NASA and NVO)
  - Deliver science-grade custom mosaics on demand
  - Produce mosaics from a wide range of data sources (possibly in different spectra)
  - User-specified parameters of projection, coordinates, size, rotation and spatial sampling.
[Figure: Mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the Teragrid.]
24
Small Montage Workflow
[Figure: workflow graph, ~1200 nodes]
25
Montage Acknowledgments
- Bruce Berriman, John Good, Anastasia Laity, Caltech/IPAC
- Joseph C. Jacob, Daniel S. Katz, JPL
- http://montage.ipac.caltech.edu/
- Testbed for Montage: Condor pools at USC/ISI, UW Madison, and Teragrid resources at NCSA, PSC, and SDSC.
Montage is funded by the National Aeronautics and Space Administration's Earth Science Technology Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology.
31
Conclusions
- Pegasus maps complex workflows onto the Grid
- Uses Grid information services to find resources, data and executables
- Reduces the workflow based on existing intermediate products
- Used in many applications
- Part of GriPhyN’s Virtual Data Toolkit
32
Future Directions
- Incorporate AI-planning technologies in production software (Virtual Data Toolkit)
- Investigate various scheduling techniques
- Investigate fault tolerance issues
  - Selecting resources based on their reliability
  - Responding to failures
- http://pegasus.isi.edu