
1 Virtual Data Management for Grid Computing Michael Wilde Argonne National Laboratory wilde@mcs.anl.gov Gaurang Mehta, Karan Vahi Center for Grid Technologies USC Information Sciences Institute {gmehta,vahi}@isi.edu

2

3 3 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and approach for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues

4 4 The Virtual Data Concept Enhance scientific productivity through: l Discovery and application of datasets and programs at petabyte scale l Enabling use of a worldwide data grid as a scientific workstation Virtual Data enables this approach by creating datasets from workflow “recipes” and recording their provenance.

5 5 Virtual Data Concept l Motivated by next generation data-intensive applications –Enormous quantities of data, petabyte-scale –The need to discover, access, explore, and analyze diverse distributed data sources –The need to organize, archive, schedule, and explain scientific workflows –The need to share data products and the resources needed to produce and store them

6 6 GriPhyN – The Grid Physics Network l NSF-funded IT research project l Supports the concept of Virtual Data, where data is materialized on demand l Data can exist on some data resource and be directly accessible l Data can exist only in the form of a recipe l The GriPhyN Virtual Data System can seamlessly deliver the data to the user or application regardless of the form in which the data exists l GriPhyN targets applications in high-energy physics, gravitational-wave physics and astronomy

7 7 Virtual Data System Capabilities Producing data from transformations with uniform, precise data interface descriptions enables… l Discovery: finding and understanding datasets and transformations l Workflow: structured paradigm for organizing, locating, specifying, & producing scientific datasets –Forming new workflows –Building new workflows from existing patterns –Managing change l Planning: automated to make the Grid transparent l Audit: explanation and validation via provenance

8 8 Virtual Data Example: Galaxy Cluster Search over Sloan Data (figure: galaxy cluster size distribution and the search DAG). Work by Jim Annis, Steve Kent, Vijay Sehkri (Fermilab) and Michael Milligan, Yong Zhao (University of Chicago).

9 9 Virtual Data Application: High Energy Physics Data Analysis (figure: a tree of virtual data derivations parameterized by metadata such as mass = 200, decay = WW/ZZ/bb, stability, event, plot, and LowPt/HighPt cuts). Work and slide by Rick Cavanaugh and Dimitri Bourilkov, University of Florida.

10 10 Lifecycle of Virtual Data: "I have some subject images; what analyses are available, and which can be applied to this format?" "I want to apply an image registration program to thousands of objects. If the results already exist, I'll save weeks of computation." "I've come across some interesting data, but I need to understand the nature of the analyses applied when it was constructed before I can trust it for my purposes." "I need to document the devices and methods used to measure the data, and keep track of the analysis steps applied to the data."

11 11 A Virtual Data Grid

12 12 Virtual Data Scenario (figure: a small workflow in which simulate, reformat, conv, psearch, and summarize jobs are connected through files file1 to file8). Capabilities shown: on-demand data generation; updating the workflow following changes; managing the workflow. Explain provenance, e.g. for file8: psearch -t 10 -i file3 file4 file5 -o file8; summarize -t 10 -i file6 -o file7; reformat -f fz -i file2 -o file3 file4 file5; conv -l esd -o aod -i file2 -o file6; simulate -t 10 -o file1 file2
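The "explain provenance" step can be pictured as a walk backwards over derivation records. Below is a minimal, illustrative Python sketch of that idea (not VDS code; the derivation tuples are transcribed from the commands above):

```python
# Hypothetical sketch: trace the provenance of a file by walking derivation
# records backwards from the requested output to the primary inputs.
derivations = [
    # (command line, input files, output files) -- transcribed from the scenario
    ("simulate -t 10 -o file1 file2",                [],                          ["file1", "file2"]),
    ("reformat -f fz -i file2 -o file3 file4 file5", ["file2"],                   ["file3", "file4", "file5"]),
    ("conv -l esd -o aod -i file2 -o file6",         ["file2"],                   ["file6"]),
    ("summarize -t 10 -i file6 -o file7",            ["file6"],                   ["file7"]),
    ("psearch -t 10 -i file3 file4 file5 -o file8",  ["file3", "file4", "file5"], ["file8"]),
]

def provenance(target, seen=None):
    """Return every derivation that directly or indirectly produced `target`."""
    seen = set() if seen is None else seen
    steps = []
    for cmd, inputs, outputs in derivations:
        if target in outputs and cmd not in seen:
            seen.add(cmd)
            steps.append(cmd)
            for f in inputs:                 # recurse into the files this step consumed
                steps.extend(provenance(f, seen))
    return steps

# With these records, file8 traces back through psearch, reformat and simulate.
print("\n".join(provenance("file8")))
```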

13 13 VDL and Abstract Workflow (figure: VDL descriptions plus a user request for data file "c" are combined to produce an abstract workflow).

14 14 Example Workflow Reduction l Original abstract workflow l If “b” already exists (as determined by query to the RLS), the workflow can be reduced

15 15 Mapping from abstract to concrete l Query RLS, MDS, and TC, schedule computation and data movement

16 16 Workflow management in GriPhyN l Workflow Generation: how do you describe the workflow (at various levels of abstraction)? (Virtual Data Language and catalog, "VDC") l Workflow Mapping/Refinement: how do you map an abstract workflow representation to an executable form? ("Planners": Pegasus) l Workflow Execution: how do you reliably execute the workflow? (Condor's DAGMan)

17 17 Executable Workflow Construction l VDL tools used to build an abstract workflow based on VDL descriptions l Planners (e.g. Pegasus) take the abstract workflow and produce an executable workflow for the Grid or other environments l Workflow executors (“enactment engines”) like Condor DAGMan execute the workflow

18 18 Terms l Abstract Workflow (DAX) –Expressed in terms of logical entities –Specifies all logical files required to generate the desired data product from scratch –Dependencies between the jobs –Analogous to build style dag l Concrete Workflow –Expressed in terms of physical entities –Specifies the location of the data and executables –Analogous to a make style dag

19 19 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and Approach for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues

20 20 VDL: Virtual Data Language Describes Data Transformations l Transformation - “TR” –Abstract template of program invocation –Similar to "function definition" l Derivation – “DV” –“Function call” to a Transformation –Store past and future: >A record of how data products were generated >A recipe of how data products can be generated l Invocation –Record of a Derivation execution

21 21 Example Transformation
TR t1( out a2, in a1, none pa = "500", none env = "100000" ) {
  argument = "-p "${pa};
  argument = "-f "${a1};
  argument = "-x -y";
  argument stdout = ${a2};
  profile env.MAXMEM = ${env};
}
(figure: t1 reads input file $a1 and produces output file $a2)

22 22 Example Derivations
DV d1->t1(
  env="20000", pa="600",
  a2=@{out:run1.exp15.T1932.summary},
  a1=@{in:run1.exp15.T1932.raw}
);
DV d2->t1(
  a1=@{in:run1.exp16.T1918.raw},
  a2=@{out:run1.exp16.T1918.summary}
);

23 23 Workflow from File Dependencies
TR tr1(in a1, out a2) {
  argument stdin = ${a1};
  argument stdout = ${a2};
}
TR tr2(in a1, out a2) {
  argument stdin = ${a1};
  argument stdout = ${a2};
}
DV x1->tr1(a1=@{in:file1}, a2=@{out:file2});
DV x2->tr2(a1=@{in:file2}, a2=@{out:file3});
(figure: file1 -> x1 -> file2 -> x2 -> file3)
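Note that the dependency between x1 and x2 is never written down; it is inferred because x2 reads the file that x1 writes. A toy Python sketch of that inference (illustrative only, not the VDS implementation):

```python
# Derive job-to-job edges purely from file usage: a job that consumes a file
# depends on the job that produces it.
jobs = {
    "x1": {"in": ["file1"], "out": ["file2"]},
    "x2": {"in": ["file2"], "out": ["file3"]},
}

producer = {f: name for name, job in jobs.items() for f in job["out"]}

edges = [(producer[f], name)                 # (parent, child)
         for name, job in jobs.items()
         for f in job["in"] if f in producer]

print(edges)   # [('x1', 'x2')]: x2 must wait for x1 because it reads file2
```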

24 24 Example Workflow l Complex structure –Fan-in –Fan-out –"left" and "right" can run in parallel l Uses an input file –Registered with the RLS l Complex file dependencies –Glue the workflow together (figure: a diamond-shaped workflow of preprocess, findrange, and analyze steps)

25 25 Workflow step "preprocess" l TR preprocess turns f.a into f.b1 and f.b2
TR preprocess( output b[], input a ) {
  argument = "-a top";
  argument = " -i "${input:a};
  argument = " -o " ${output:b};
}
l Makes use of the "list" feature of VDL –Generates 0..N output files –The number of files depends on the caller

26 26 Workflow step "findrange" l Turns two inputs into one output
TR findrange( output b, input a1, input a2, none name="findrange", none p="0.0" ) {
  argument = "-a "${name};
  argument = " -i " ${a1} " " ${a2};
  argument = " -o " ${b};
  argument = " -p " ${p};
}
l Uses the default argument feature

27 27 Can also use list[] parameters
TR findrange( output b, input a[], none name="findrange", none p="0.0" ) {
  argument = "-a "${name};
  argument = " -i " ${" "|a};
  argument = " -o " ${b};
  argument = " -p " ${p};
}

28 28 Workflow step "analyze" l Combines intermediary results
TR analyze( output b, input a[] ) {
  argument = "-a bottom";
  argument = " -i " ${a};
  argument = " -o " ${b};
}

29 29 Complete VDL workflow l Generate appropriate derivations
DV top->preprocess( b=[ @{out:"f.b1"}, @{out:"f.b2"} ], a=@{in:"f.a"} );
DV left->findrange( b=@{out:"f.c1"}, a2=@{in:"f.b2"}, a1=@{in:"f.b1"}, name="left", p="0.5" );
DV right->findrange( b=@{out:"f.c2"}, a2=@{in:"f.b2"}, a1=@{in:"f.b1"}, name="right" );
DV bottom->analyze( b=@{out:"f.d"}, a=[ @{in:"f.c1"}, @{in:"f.c2"} ] );

30 30 Compound Transformations l Using compound TR –Permits composition of complex TRs from basic ones –Calls are independent >unless linked through LFN –A Call is effectively an anonymous derivation >Late instantiation at workflow generation time –Permits bundling of repetitive workflows –Model: Function calls nested within a function definition

31 31 Compound Transformations (cont) l TR diamond bundles black-diamonds
TR diamond( out fd, io fc1, io fc2, io fb1, io fb2, in fa, p1, p2 ) {
  call preprocess( a=${fa}, b=[ ${out:fb1}, ${out:fb2} ] );
  call findrange( a1=${in:fb1}, a2=${in:fb2}, name="LEFT", p=${p1}, b=${out:fc1} );
  call findrange( a1=${in:fb1}, a2=${in:fb2}, name="RIGHT", p=${p2}, b=${out:fc2} );
  call analyze( a=[ ${in:fc1}, ${in:fc2} ], b=${fd} );
}

32 32 Compound Transformations (cont) l Multiple DVs allow easy generator scripts:
DV d1->diamond( fd=@{out:"f.00005"}, fc1=@{io:"f.00004"}, fc2=@{io:"f.00003"}, fb1=@{io:"f.00002"}, fb2=@{io:"f.00001"}, fa=@{io:"f.00000"}, p2="100", p1="0" );
DV d2->diamond( fd=@{out:"f.0000B"}, fc1=@{io:"f.0000A"}, fc2=@{io:"f.00009"}, fb1=@{io:"f.00008"}, fb2=@{io:"f.00007"}, fa=@{io:"f.00006"}, p2="141.42135623731", p1="0" );
...
DV d70->diamond( fd=@{out:"f.001A3"}, fc1=@{io:"f.001A2"}, fc2=@{io:"f.001A1"}, fb1=@{io:"f.001A0"}, fb2=@{io:"f.0019F"}, fa=@{io:"f.0019E"}, p2="800", p1="18" );

33 33 Functional MRI Analysis

34 34 fMRI Example: AIR Tools
TR air::align_warp( in reg_img, in reg_hdr, in sub_img, in sub_hdr, m, out warp ) {
  argument = ${reg_img};
  argument = ${sub_img};
  argument = ${warp};
  argument = "-m " ${m};
  argument = "-q";
}
TR air::reslice( in warp, sliced, out sliced_img, out sliced_hdr ) {
  argument = ${warp};
  argument = ${sliced};
}

35 35 fMRI Example: AIR Tools
TR air::warp_n_slice( in reg_img, in reg_hdr, in sub_img, in sub_hdr, m = "12", io warp, sliced, out sliced_img, out sliced_hdr ) {
  call air::align_warp( reg_img=${reg_img}, reg_hdr=${reg_hdr}, sub_img=${sub_img}, sub_hdr=${sub_hdr}, m=${m}, warp=${out:warp} );
  call air::reslice( warp=${in:warp}, sliced=${sliced}, sliced_img=${sliced_img}, sliced_hdr=${sliced_hdr} );
}
TR air::softmean( in sliced_img[], in sliced_hdr[], arg1 = "y", arg2 = "null", atlas, out atlas_img, out atlas_hdr ) {
  argument = ${atlas};
  argument = ${arg1} " " ${arg2};
  argument = ${sliced_img};
}

36 36 fMRI Example: AIR Tools
DV air::i3472_3->air::warp_n_slice( reg_hdr = @{in:"3472-3_anonymized.hdr"}, reg_img = @{in:"3472-3_anonymized.img"}, sub_hdr = @{in:"3472-3_anonymized.hdr"}, sub_img = @{in:"3472-3_anonymized.img"}, warp = @{io:"3472-3_anonymized.warp"}, sliced = "3472-3_anonymized.sliced", sliced_hdr = @{out:"3472-3_anonymized.sliced.hdr"}, sliced_img = @{out:"3472-3_anonymized.sliced.img"} );
…
DV air::i3472_6->air::warp_n_slice( reg_hdr = @{in:"3472-3_anonymized.hdr"}, reg_img = @{in:"3472-3_anonymized.img"}, sub_hdr = @{in:"3472-6_anonymized.hdr"}, sub_img = @{in:"3472-6_anonymized.img"}, warp = @{io:"3472-6_anonymized.warp"}, sliced = "3472-6_anonymized.sliced", sliced_hdr = @{out:"3472-6_anonymized.sliced.hdr"}, sliced_img = @{out:"3472-6_anonymized.sliced.img"} );

37 37 fMRI Example: AIR Tools
DV air::a3472_3->air::softmean(
  sliced_img = [ @{in:"3472-3_anonymized.sliced.img"}, @{in:"3472-4_anonymized.sliced.img"}, @{in:"3472-5_anonymized.sliced.img"}, @{in:"3472-6_anonymized.sliced.img"} ],
  sliced_hdr = [ @{in:"3472-3_anonymized.sliced.hdr"}, @{in:"3472-4_anonymized.sliced.hdr"}, @{in:"3472-5_anonymized.sliced.hdr"}, @{in:"3472-6_anonymized.sliced.hdr"} ],
  atlas = "atlas", atlas_img = @{out:"atlas.img"}, atlas_hdr = @{out:"atlas.hdr"} );

38 38 Query Examples Which TRs can process a "subject" image? l Q: xsearchvdc -q tr_meta dataType subject_image input l A: fMRIDC.AIR::align_warp Which TRs can create an "ATLAS"? l Q: xsearchvdc -q tr_meta dataType atlas_image output l A: fMRIDC.AIR::softmean Which TRs have output parameter "warp" and a parameter "options" l Q: xsearchvdc -q tr_para warp output options l A: fMRIDC.AIR::align_warp Which DVs call TR "slicer"? l Q: xsearchvdc -q tr_dv slicer l A: fMRIDC.FSL::s3472_3_x->fMRIDC.FSL::slicer fMRIDC.FSL::s3472_3_y->fMRIDC.FSL::slicer fMRIDC.FSL::s3472_3_z->fMRIDC.FSL::slicer

39 39 Query Examples List anonymized subject-images for young subjects. This query searches for files based on their metadata. l Q: xsearchvdc -q lfn_meta dataType subject_image privacy anonymized subjectType young l A: 3472-4_anonymized.img For a specific patient image, 3472-3, show all DVs and files that were derived from this image, directly or indirectly. l Q: xsearchvdc -q lfn_tree 3472-3_anonymized.img l A: 3472-3_anonymized.img 3472-3_anonymized.sliced.hdr atlas.hdr atlas.img … atlas_z.jpg 3472-3_anonymized.sliced.img

40 40 Virtual Provenance: list of derivations and files <job id="ID000001" namespace="Quarknet.HEPSRCH" name="ECalEnergySum" level="5" dv-namespace="Quarknet.HEPSRCH" dv-name="run1aesum"> … <job id="ID000014" namespace="Quarknet.HEPSRCH" name="ReconTotalEnergy" level="3"> … (excerpted for display)

41 41 Virtual Provenance in XML: control flow graph (XML excerpted for display)

42 42 Invocation Provenance: completion status and resource usage; attributes of the executable transformation; attributes of input and output files

43 43 Future: Provenance & Workflow for Web Services

44 44

45 45 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and Issues for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues

46 46 Outline l Introduction and the GriPhyN project l Chimera l Overview of Grid concepts and tools l Pegasus l Applications using Chimera and Pegasus l Research issues l Exercises

47 47 Motivation for Grids: How do we solve problems? l Communities committed to common goals –Virtual organizations l Teams with heterogeneous members & capabilities l Distributed geographically and politically –No location/organization possesses all required skills and resources l Adapt as a function of the situation –Adjust membership, reallocate responsibilities, renegotiate resources

48 48 The Grid Vision “ Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations” –On-demand, ubiquitous access to computing, data, and services –New capabilities constructed dynamically and transparently from distributed services

49 49 The Grid l Emerging computational, networking, and storage infrastructure –Pervasive, uniform, and reliable access to remote data, computational, sensor, and human resources l Enable new approaches to applications and problem solving –Remote resources the rule, not the exception l Challenges –Heterogeneous components –Component failures common –Different administrative domains –Local policies for security and resource usage

50 50 Data Grids for High Energy Physics (figure: the tiered LHC computing model, with the online system feeding the CERN Computer Centre and its ~20 TIPS offline processor farm (Tier 0) at ~100 MBytes/sec; Tier 1 regional centres such as FermiLab (~4 TIPS), France, Italy, and Germany linked at ~622 Mbits/sec or air freight (deprecated); Tier 2 centres such as Caltech (~1 TIPS); institutes (~0.25 TIPS) and physicist workstations below; 1 TIPS is approximately 25,000 SpecInt95 equivalents). There is a "bunch crossing" every 25 nsecs and about 100 "triggers" per second; each triggered event is ~1 MByte in size. Physicists work on analysis "channels"; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server. www.griphyn.org www.ppdg.net www.eu-datagrid.org Slide courtesy Harvey Newman, Caltech

51 51 Grid Applications l Increasing in the level of complexity l Use of individual application components l Reuse of individual intermediate data products l Description of Data Products using Metadata Attributes l Execution environment is complex and very dynamic –Resources come and go –Data is replicated –Components can be found at various locations or staged in on demand l Separation between –the application description –the actual execution description

52 52 Grid3 – The Laboratory Supported by the National Science Foundation and the Department of Energy.

53 53 Globus Monitoring and Discovery Service (MDS) The MDS architecture is a flexible hierarchy. There can be several levels of GIISes; any GRIS can register with any GIIS; and any GIIS can register with another, making this approach modular and extensible.

54 54 GridFTP l Data-intensive grid applications transfer and replicate large data sets (terabytes, petabytes) l GridFTP Features: –Third party (client mediated) transfer –Parallel transfers –Striped transfers –TCP buffer optimizations –Grid security l Important feature is separation of control and data channel

55 55 A Replica Location Service l A Replica Location Service (RLS) is a distributed registry service that records the locations of data copies and allows discovery of replicas l Maintains mappings between logical identifiers and target names –Physical targets: Map to exact locations of replicated data –Logical targets: Map to another layer of logical names, allowing storage systems to move data without informing the RLS l RLS was designed and implemented in a collaboration between the Globus project and the DataGrid project

56 56 (figure: Local Replica Catalogs (LRCs) feeding a hierarchy of Replica Location Indexes (RLIs)). LRCs contain consistent information about logical-to-target mappings on a site. RLI nodes aggregate information about LRCs. Soft state updates flow from LRCs to RLIs: relaxed consistency of index information, used to rebuild the index after failures. Arbitrary levels of RLI hierarchy are possible.
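As a rough mental model of the two-level design, here is a toy Python sketch (illustrative only; the real RLS is a distributed service with its own APIs, not in-memory dictionaries):

```python
# Toy sketch of the LRC/RLI split: per-site catalogs hold exact mappings,
# index nodes only know which sites hold which logical files.
from collections import defaultdict

class LocalReplicaCatalog:
    """Per-site catalog: logical file name -> physical locations at this site."""
    def __init__(self, site):
        self.site = site
        self.mappings = defaultdict(set)

    def register(self, lfn, pfn):
        self.mappings[lfn].add(pfn)

class ReplicaLocationIndex:
    """Index node: logical file name -> set of sites (LRCs) that hold it."""
    def __init__(self):
        self.index = defaultdict(set)

    def soft_state_update(self, lrc):
        # Periodic summary sent by an LRC; stale entries simply age out and are
        # rebuilt from the next update (relaxed consistency).
        for lfn in lrc.mappings:
            self.index[lfn].add(lrc.site)

lrc_isi = LocalReplicaCatalog("isi")
lrc_isi.register("f.b1", "gsiftp://host.isi.edu/data/f.b1")
rli = ReplicaLocationIndex()
rli.soft_state_update(lrc_isi)
print(rli.index["f.b1"])   # {'isi'} -> then ask that site's LRC for the exact PFN
```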

57 57 Condor’s DAGMan l Developed at UW Madison (Livny) l Executes a concrete workflow l Makes sure the dependencies are followed l Executes the jobs specified in the workflow –Execution –Data movement –Catalog updates l Provides a “rescue DAG” in case of failure
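As a concrete illustration, a DAGMan input file for the diamond-shaped example used earlier might look roughly like the sketch below; the submit-file names are invented, and this is not actual Pegasus output:

```
# Sketch of a DAGMan input file. DAGMan runs each node's Condor submit file
# and enforces the PARENT/CHILD ordering; if nodes fail, it writes a rescue
# DAG that can be resubmitted to continue from the point of failure.
JOB  preprocess  preprocess.sub
JOB  left        findrange_left.sub
JOB  right       findrange_right.sub
JOB  analyze     analyze.sub
PARENT preprocess CHILD left right
PARENT left right CHILD analyze
```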

58 58 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and Approach for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues

59 59 Pegasus Outline –Introduction –Pegasus and How it works –Pegasus Components and Internals –Deferred Planning –Pegasus Portal

60 60 Introduction l Scientific applications in various domains are compute and data intensive. l Not feasible to run on a single resource (limits scalability). l Scientific application can be represented as a workflow. l Use of individual application components. l Reuse data if possible (Virtual Data).

61 61 Abstract Workflow Generation / Concrete Workflow Generation (figure)

62 62 Generating an Abstract Workflow l Available information –Specification of component capabilities –Ability to generate the desired data products l Select and configure application components to form an abstract workflow –Assign input files that exist or that can be generated by other application components –Specify the order in which the components must be executed –Components and files are referred to by their logical names >Logical transformation name >Logical file name >Both transformations and data can be replicated
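A simplified Python sketch of the component-selection idea (illustrative only, assuming one output file per derivation; not the actual abstract planner):

```python
# Build an abstract workflow by chaining backwards from the requested logical
# file to the derivations that can produce it, stopping at existing raw data.
derivations = {                     # logical output file -> (transformation, logical inputs)
    "f.d":  ("analyze",    ["f.c1", "f.c2"]),
    "f.c1": ("findrange",  ["f.b1", "f.b2"]),
    "f.c2": ("findrange",  ["f.b1", "f.b2"]),
    "f.b1": ("preprocess", ["f.a"]),
    "f.b2": ("preprocess", ["f.a"]),
}
existing = {"f.a"}                  # primary data that already exists

def abstract_workflow(target, plan=None):
    plan = [] if plan is None else plan
    if target in existing or target not in derivations:
        return plan                 # nothing to plan: the file exists or is primary data
    transformation, inputs = derivations[target]
    for lfn in inputs:              # plan the producers of our inputs first
        abstract_workflow(lfn, plan)
    step = (transformation, tuple(inputs), target)
    if step not in plan:
        plan.append(step)
    return plan

for step in abstract_workflow("f.d"):
    print(step)                     # preprocess steps first, analyze last
```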

63 63 Generating a Concrete Workflow l Information –Location of files and component instances –State of the Grid resources l Select specific resources and files l Add jobs required to form a concrete workflow that can be executed in the Grid environment l Each component in the abstract workflow is turned into an executable job

64 64 Concrete Workflow Generator (figure: the abstract workflow, together with domain knowledge, resource information, and location information, is fed to the concrete workflow generator, which emits a plan submitted to the grid)

65 65 Pegasus l Pegasus - Planning for Execution on Grids l Planning framework l Takes as input an abstract workflow –Abstract DAG in XML (DAX) l Generates a concrete workflow l Submits the workflow to a workflow executor (e.g. Condor DAGMan)

66 66 Pegasus Outline –Introduction –Pegasus and How it works –Pegasus Components and Internals –Deferred Planning –Pegasus Portal

67 67

68 68 Abstract Workflow Reduction. The output jobs of the DAG are all the leaf nodes, i.e. f, h, i. Each job requires 2 input files and generates 2 output files. The user specifies the output location. (figure: a DAG of jobs a through i, with a key distinguishing original nodes, input transfer nodes, registration nodes, output transfer nodes, and nodes deleted by the reduction algorithm)

69 69 Optimizing from the point of view of Virtual Data. Jobs d, e, f have output files that have been found in the Replica Location Service, so they need not run; the additional jobs that only feed them are deleted as well, and all of jobs a, b, c, d, e, f are removed from the DAG. (figure: the same DAG with the deleted nodes marked)
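A minimal Python sketch of the reduction step (illustrative only, using the earlier diamond example and an invented replica set; not the Pegasus source):

```python
# Workflow reduction: drop jobs whose outputs are already registered in the
# RLS, then drop any job whose results are no longer needed downstream.
jobs = {                         # job -> (inputs, outputs), listed in dependency order
    "preprocess": (["f.a"], ["f.b1", "f.b2"]),
    "left":       (["f.b1", "f.b2"], ["f.c1"]),
    "right":      (["f.b1", "f.b2"], ["f.c2"]),
    "analyze":    (["f.c1", "f.c2"], ["f.d"]),
}
replicas = {"f.b1", "f.b2"}      # pretend the RLS already holds these files

def reduce_workflow(jobs, replicas, required=("f.d",)):
    needed = set(required)       # files that still have to be produced or fetched
    keep = {}
    for name in reversed(list(jobs)):          # children before parents
        inputs, outputs = jobs[name]
        if any(f in needed and f not in replicas for f in outputs):
            keep[name] = (inputs, outputs)     # job must run: some output is missing
            needed.update(inputs)              # ...so its inputs are now needed too
    return keep

print(sorted(reduce_workflow(jobs, replicas)))   # ['analyze', 'left', 'right']
```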

70 70 Plans for staging data in. The planner picks execution and replica locations and adds transfer nodes for the input files of the root nodes. (figure: the reduced DAG with input transfer nodes added)

71 71 Staging data out and registering new derived products. Staging and registration nodes are added for each job that materializes data (g, h, i), and the output files of the leaf job (f) are transferred to the output location. (figure: the DAG with output transfer and registration nodes added)

72 72 The final, executable DAG (figure: the input DAG of jobs a through i shown alongside the final executable DAG, which contains only jobs g, h, i plus the input transfer, output transfer, and registration nodes)

73 73 Pegasus Outline –Introduction –Pegasus and How it works –Pegasus Components and Internals –Deferred Planning –Pegasus Portal

74 74 Pegasus Components

75 75 Replica Location Service l Pegasus uses the RLS to find input data l Pegasus uses the RLS to register new data products (figure: LRCs and an RLI feeding the computation)

76 76 Pegasus Components

77 77 Transformation Catalog l The Transformation Catalog keeps information about domain transformations (executables). l It records the logical name of the transformation, the resource on which the transformation is available, and the URL of the physical transformation. l The new transformation catalog also allows one to define the type, architecture, OS, OS version, and glibc version of the transformation. l Profiles allow a user to attach environment variables, hints, scheduler-related information, and job-manager configuration.

78 78 Transformation Catalog l A set of standard APIs is available to perform various operations on the Transformation Catalog. l Several implementations of the catalog are available: –Database: >Useful for large catalogs; supports fast adds, deletes and queries. >Standard database security is used. –File format: >Useful for a small number of transformations and easy editing. >Example entry: isi vds::fft:1.0 /opt/bin/fft INSTALLED INTEL32::LINUX ENV::VDS_HOME=/opt/vds l Users can provide their own implementation.
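To make the lookup concrete, here is a toy in-memory version of a transformation catalog keyed the way the text-format entry above is laid out (illustrative only; the real catalog sits behind the standard API and may be file- or database-backed):

```python
# Toy transformation catalog: site, logical transformation, physical path,
# installation type, architecture/OS, and profiles, mirroring the entry above.
from collections import namedtuple

TCEntry = namedtuple("TCEntry", "site logical_tr pfn type sysinfo profiles")

catalog = [
    TCEntry("isi", "vds::fft:1.0", "/opt/bin/fft", "INSTALLED",
            "INTEL32::LINUX", {"ENV": {"VDS_HOME": "/opt/vds"}}),
]

def lookup(logical_tr, site=None):
    """Return the entries for a logical transformation, optionally restricted to one site."""
    return [e for e in catalog
            if e.logical_tr == logical_tr and (site is None or e.site == site)]

print(lookup("vds::fft:1.0", site="isi")[0].pfn)   # /opt/bin/fft
```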

79 79 Pegasus Components

80 80 Resource Information Catalog (Pool Config) l Pool Config is an XML file which contains information about various pools/sites on which DAGs may execute. l Some of the information contained in the Pool Config file is –Specifies the various job-managers that are available on the pool for the different types of schedulers. –Specifies the GridFtp storage servers associated with each pool. –Specifies the Local Replica Catalogs where data residing in the pool has to be cataloged. –Contains profiles like environment hints which are common site-wide. –Contains the working and storage directories to be used on the pool.

81 81 Resource Information Catalog Pool Config l Two Ways to construct the Pool Config File. –Monitoring and Discovery Service –Local Pool Config File (Text Based) l Client tool to generate Pool Config File –The tool gen-poolconfig is used to query the MDS and/or the local pool config file/s to generate the XML Pool Config file.

82 82 Monitoring and Discovery Service l MDS provides up-to-date Grid state information –Total and idle job queue lengths on a pool of resources (Condor) –Total and available memory on the pool –Disk space on the pools –Number of jobs running on a job manager l Can be used for resource discovery and selection –Developing various task-to-resource mapping heuristics l Can be used to publish information necessary for replica selection –Publish GridFTP transfer statistics –Developing replica selection components

83 83 Pegasus Components

84 84 Site Selector l Determines job-to-site mappings. l The selector API makes implementations easily pluggable. l Implementations provided: –Random –RoundRobin –Group –NonJavaCallout l Research on advanced site-selection algorithms is in progress
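The pluggable-selector idea can be sketched in a few lines of Python (class and method names are invented for illustration and are not the actual Pegasus API):

```python
# Pluggable site selection: any strategy that maps a job to one of the
# candidate sites can be dropped in behind the same interface.
import itertools
import random
from abc import ABC, abstractmethod

class SiteSelector(ABC):
    @abstractmethod
    def select(self, job, candidate_sites):
        """Return the site the given job should be mapped to."""

class RandomSelector(SiteSelector):
    def select(self, job, candidate_sites):
        return random.choice(candidate_sites)

class RoundRobinSelector(SiteSelector):
    def __init__(self):
        self._counter = itertools.count()
    def select(self, job, candidate_sites):
        return candidate_sites[next(self._counter) % len(candidate_sites)]

selector = RoundRobinSelector()
for job in ["preprocess", "left", "right", "analyze"]:
    print(job, "->", selector.select(job, ["isi", "uc", "caltech"]))
```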

85 85 Pegasus Components

86 86 Transfer Tools / Workflow Executors l Various transfer tools are used for staging data between grid sites. l They define strategies to reliably and efficiently transfer data. l Current implementations –globus-url-copy –Transfer –Transfer2 (T2) –Stork l Workflow executors take the concrete workflow and act as a metascheduler for the Grid. l Currently supported workflow executors –Condor DAGMan –GridLab GRMS

87 87 System Configuration - Properties l Properties define and modify the behavior of Pegasus. l Properties set in $VDS_HOME/etc/properties can be overridden by defining them either in $HOME/.chimerarc or on the command line of any executable –e.g. gendax -Dvds.home=<path to vds home> … l Some examples follow, but for more details please read the sample.properties file in the $VDS_HOME/etc directory. l Basic required properties –vds.home: auto-set by the clients from the environment variable $VDS_HOME –vds.properties: path to the default properties file >Default: ${vds.home}/etc/properties
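The override order described above (command-line -D definitions over $HOME/.chimerarc over $VDS_HOME/etc/properties) can be sketched as layered dictionaries; this is an illustrative Python sketch assuming simple key=value files, not the actual property loader:

```python
# Resolve properties in increasing order of precedence:
# defaults file -> per-user ~/.chimerarc -> -D command-line definitions.
import os

def load_props(path):
    """Parse a simple key=value properties file; a missing file yields {}."""
    props = {}
    if os.path.isfile(path):
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    props[key.strip()] = value.strip()
    return props

def effective_properties(cmdline_defs):
    vds_home = os.environ.get("VDS_HOME", "/opt/vds")
    props = {"vds.home": vds_home}
    props.update(load_props(os.path.join(vds_home, "etc", "properties")))  # defaults
    props.update(load_props(os.path.expanduser("~/.chimerarc")))           # per-user
    props.update(cmdline_defs)                                             # -D overrides win
    return props

print(effective_properties({"vds.home": "/home/user/vds"})["vds.home"])
```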

88 88 Pegasus Clients in the Virtual Data System l gencdag –Converts a DAX to a concrete DAG l genpoolconfig –Generates a pool config file from MDS l tc-client –Queries, adds, deletes transformations from the Transformation Catalog l rls-client, rls-query-client –Queries, adds, deletes entries from the RLS l partiondax –Partitions the abstract DAX into smaller DAXes

89 89

90 90 Concrete Planner Gencdag l The concrete planner takes the DAX produced by Chimera and converts it into a set of Condor DAG and submit files. l You can specify more than one execution pool. Execution takes place on the pools on which the executable exists; if the executable exists on more than one pool, the pool on which it runs is chosen by the site selector in use. l The output pool is the pool to which all the output products are transferred. If not specified, the materialized data stays on the execution pool. l Authentication removes sites that are not accessible.

91 91 Pegasus Outline –Introduction –Pegasus and How it works –Pegasus Components and Internals –Deferred Planning –Pegasus Portal

92 92 Abstract To Concrete Steps

93 93 Original Pegasus configuration Simple scheduling: random or round robin using well-defined scheduling interfaces.

94 94 Deferred Planning through Partitioning A variety of partitioning algorithms can be implemented

95 95 Mega DAG is created by Pegasus and then submitted to DAGMan

96 96 Mega DAG Pegasus

97 97 Re-planning capabilities

98 98 Complex Replanning for Free (almost)

99 99 Planning & Scheduling Granularity l Partitioning –Allows the granularity of ahead-of-time planning to be set l Node aggregation –Allows nodes in the workflow to be combined and scheduled as one unit (minimizes the scheduling overheads) –May reduce the overheads of making scheduling and planning decisions l Related but separate concepts –Small jobs >High level of node aggregation >Large partitions –Very dynamic system >Small partitions

100 100 Partitioned Workflow Processing l Create workflow partitions –Partition the abstract workflow into smaller workflows (see the sketch below) –Create the XML partition graph that lists the dependencies between partitions l Create the MegaDAG (creates the DAGMan submit files) –Transform the XML partition graph into its corresponding Condor representation l Submit the MegaDAG –Each job invokes Pegasus on a partition and then submits the generated plan back to Condor
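One simple partitioning strategy, level-based partitioning, groups jobs by their depth in the DAG so that each partition can be planned just before it runs. A small illustrative Python sketch (only one of the possible partitioners mentioned above, not the VDS partitioner itself):

```python
# Group jobs into partitions by DAG level (depth from the root nodes).
from collections import defaultdict
from functools import lru_cache

edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]   # parent -> child
jobs = {j for edge in edges for j in edge}
parents = defaultdict(set)
for parent, child in edges:
    parents[child].add(parent)

@lru_cache(maxsize=None)
def level(job):
    """Roots are level 0; a child sits one level below its deepest parent."""
    return 0 if not parents[job] else 1 + max(level(p) for p in parents[job])

partitions = defaultdict(list)
for job in jobs:
    partitions[level(job)].append(job)

for lvl in sorted(partitions):
    print("partition", lvl, sorted(partitions[lvl]))
# partition 0 ['a'] / partition 1 ['b', 'c'] / partition 2 ['d']
```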

101 101 Future work for Pegasus l Staging in executables on demand l Expanding the scheduling plug-ins l Investigating various partitioning approaches l Investigating reliability across partitions

102 102 Pegasus Outline –Introduction –Pegasus and How it works –Pegasus Components and Internals –Deferred Planning –Pegasus Portal

103 103 Portal l Provides proxy based login l Authentication to the sites l Configuration for files l Metadata based workflow creation l Workflow Submission and Monitoring l User Notification

104 104

105 105 Portal Demonstration

106 106 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and Issues for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues

107 107 Types of Applications l Gravitational Wave Physics l High Energy Physics l Astronomy l Earthquake Science l Computational Biology

108 108 LIGO Scientific Collaboration l Continuous gravitational waves are expected to be produced by a variety of celestial objects l Only a small fraction of potential sources are known l Need to perform blind searches, scanning the regions of the sky where we have no a priori information of the presence of a source –Wide area, wide frequency searches l Search is performed for potential sources of continuous periodic waves near the Galactic Center and the galactic core l Search for binary inspirals collapsing into black holes. l The search is very compute and data intensive

109 109 Additional resources used: Grid3 and iVDGL resources. (figure: LSC resources, TeraGrid resources, and other resources, including Caltech, ISI, UTB, UWM, WISC, NCSA TeraGrid, LSU, Cardiff, AEI, SC04, Caltech TeraGrid, SDSC TeraGrid, USC, ANL TeraGrid, and PSC TeraGrid)

110 110 LIGO Acknowledgements l Patrick Brady, Scott Koranda, Duncan Brown, Stephen Fairhurst University of Wisconsin Milwaukee, USA l Stuart Anderson, Kent Blackburn, Albert Lazzarini, Hari Pulapaka, Teviet Creighton Caltech, USA l Gabriela Gonzalez, Louisiana State University l Many Others involved in the Testbed l www.ligo.caltech.edu l www.lsc-group.phys.uwm.edu/lscdatagrid/ l LIGO Laboratory operates under NSF cooperative agreement PHY-0107417

111 111 Montage l Montage (NASA and NVO) –Delivers science-grade custom mosaics on demand –Produces mosaics from a wide range of data sources (possibly in different spectra) –User-specified parameters of projection, coordinates, size, rotation and spatial sampling (figure: mosaic created by Pegasus-based Montage from a run on the M101 galaxy images on the TeraGrid)

112 112 Small Montage Workflow ~1200 nodes

113 113 Montage Acknowledgments l Bruce Berriman, John Good, Anastasia Laity, Caltech/IPAC l Joseph C. Jacob, Daniel S. Katz, JPL l http://montage.ipac.caltech.edu/ l Testbed for Montage: Condor pools at USC/ISI, UW Madison, and TeraGrid resources at NCSA, Caltech, and SDSC. Montage is funded by the National Aeronautics and Space Administration's Earth Science Technology Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology.

114 114 Other Applications: Southern California Earthquake Center. The Southern California Earthquake Center (SCEC), in collaboration with the USC Information Sciences Institute, the San Diego Supercomputer Center, the Incorporated Research Institutions for Seismology, and the U.S. Geological Survey, is developing the Southern California Earthquake Center Community Modeling Environment (SCEC/CME), whose aim is to create fully three-dimensional (3D) simulations of fault-system dynamics. Physics-based simulations can potentially provide enormous practical benefits for assessing and mitigating earthquake risks through Seismic Hazard Analysis (SHA). The SCEC/CME system is an integrated geophysical simulation modeling framework that automates the process of selecting, configuring, and executing models of earthquake systems. Acknowledgments: Philip Maechling and Vipin Gupta, University of Southern California

115 115 Astronomy l Galaxy Morphology (National Virtual Observatory) –Investigates the dynamical state of galaxy clusters –Explores galaxy evolution inside the context of large-scale structure –Uses galaxy morphologies as a probe of the star formation and stellar distribution history of the galaxies inside the clusters –Data-intensive computations involving hundreds of galaxies in a cluster (figure: the x-ray emission is shown in blue, and the optical emission is in red; the colored dots mark the positions of the galaxies within the cluster, with dot color representing the value of the asymmetry index. Blue dots, the most asymmetric galaxies, are scattered throughout the image, while orange dots, the most symmetric and indicative of elliptical galaxies, are concentrated more toward the center.) People involved: Gurmeet Singh, Mei-Hui Su, many others

116 116 Biology Applications Tomography (NIH-funded project) l Derivation of 3D structure from a series of 2D electron microscopic projection images, l Reconstruction and detailed structural analysis –complex structures like synapses –large structures like dendritic spines. l Acquisition and generation of huge amounts of data l Large amount of state-of-the-art image processing required to segment structures from extraneous background. Dendrite structure to be rendered by Tomography Work performed by Mei-Hui Su with Mark Ellisman, Steve Peltier, Abel Lin, Thomas Molina (SDSC)

117 117 BLAST: a set of sequence comparison algorithms used to search sequence databases for optimal local alignments to a query. Led by Veronika Nefedova (ANL) as part of the PACI Data Quest Expedition program. Two major runs were performed using Chimera and Pegasus: 1) 60 genomes (4,000 sequences each), selected from DOE-sponsored sequencing projects, processed in 24 hours: 67 CPU-days of processing time delivered, ~10,000 Grid jobs, >200,000 BLAST executions, and 50 GB of data generated. 2) 450 genomes processed. Speedups of 5-20 times were achieved because the compute nodes were used efficiently, by keeping the submission of jobs to the compute cluster constant.

118 118 Virtual Data Example: Galaxy Cluster Search over Sloan Data (figure: galaxy cluster size distribution and the search DAG). Work by Jim Annis, Steve Kent, Vijay Sehkri (Fermilab) and Michael Milligan, Yong Zhao (University of Chicago).

119 119 Cluster Search Workflow (figure: workflow graph and execution trace, workflow jobs vs. time)

120 120 Capone Grid Interactions (figure: Capone interacts with Condor-G (schedd, GridMgr), the CE gatekeeper, gsiftp, worker nodes (WN) and storage elements (SE), Chimera and the VDC, Pegasus, the RLS, monitoring services (MDS, GridCat, MonALISA), Windmill, ProdDB, and DonQuijote). Slides courtesy of Marco Mambelli, U Chicago ATLAS Team

121 121 ATLAS "Capone" Production Executor l Reception –Job received from work distributor l Translation –Un-marshalling, ATLAS transformation l DAX generation –VDL tools generate abstract DAG l Input file retrieval from RLS catalog –Check RLS for input LFNs (retrieval of GUID, PFN) l Scheduling: CE and SE are chosen l Concrete DAG generation and submission –Pegasus creates Condor submit files –DAGMan invoked to manage remote steps

122 122 A job in Capone (2, execution) l Remote job running / status checking –Stage-in of input files, create POOL FileCatalog –Athena (ATLAS code) execution l Remote Execution Check –Verification of output files and exit codes –Recovery of metadata (GUID, MD5sum, exe attributes) l Stage Out: transfer from CE site to destination SE l Output registration –Registration of the output LFN/PFN and metadata in RLS l Finish –Job completed successfully; Capone communicates to Windmill that the job is ready for validation l Job status is sent to Windmill during the whole execution l Windmill/DQ validate & register output in ProdDB

123 123 Ramp-up of ATLAS DC2 (plot: CPU-days delivered over time, from mid July to Sep 10)

124 124 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and approach for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues

125 125 Research directions in Virtual Data Grid Architecture

126 126 Research issues l Focus on data-intensive science l Planning is necessary l Reaction to the environment is a must (things go wrong, resources come up) l Iterative Workflow Execution: –Workflow Planner –Workload Manager l Planning decision points –Workflow Delegation Time (eager) –Activity Scheduling Time (deferred) –Resource Availability Time (just in time) l Decision specification level l Reacting to the changing environment and recovering from failures l How does the communication take place? Callbacks, workflow annotations, etc. (figure: the planner hands a concrete workflow to the workload manager, which dispatches tasks to the grid resource manager and feeds information back into the abstract workflow)

127 127 Provenance Hyperlinks

128 128 Goals for a Dataset Model: file; set of files; relational query or spreadsheet range; XML element; set of files with relational index; object closure; new user-defined dataset types. Speculative model described in the CIDR 2003 paper by Foster, Voeckler, Wilde and Zhao.

129 129 Acknowledgements GriPhyN, iVDGL, and QuarkNet (in part) are supported by the National Science Foundation The Globus Alliance, PPDG, and QuarkNet are supported in part by the US Department of Energy, Office of Science; by the NASA Information Power Grid program; and by IBM

130 130 Acknowledgements l Argonne and The University of Chicago: Ian Foster, Jens Voeckler, Yong Zhao, Jin Soon Chang, Luiz Meyer (UFRJ) –www.griphyn.org/vds l The USC Information Sciences Institute: Ewa Deelman, Carl Kesselman, Gurmeet Singh, Mei-Hui Su –http://pegasus.isi.edu l Special thanks to Jed Dobson, fMRI Data Center, Dartmouth College

131 131 For further information l Chimera and Pegasus: –www.griphyn.org/chimera –http://pegasus.isi.edu l Workflow Management research group in GGF: –www.isi.edu/~deelman/wfm-rg l Demos on the SC floor –Argonne National Laboratory and University of Southern California booths

