1
Virtual Data Management for Grid Computing. Michael Wilde, Argonne National Laboratory, wilde@mcs.anl.gov. Gaurang Mehta, Karan Vahi, Center for Grid Technologies, USC Information Sciences Institute, {gmehta, vahi}@isi.edu
3
3 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and approach for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues
4
4 The Virtual Data Concept Enhance scientific productivity through: l Discovery and application of datasets and programs at petabyte scale l Enabling use of a worldwide data grid as a scientific workstation Virtual Data enables this approach by creating datasets from workflow “recipes” and recording their provenance.
5
5 Virtual Data Concept l Motivated by next generation data-intensive applications –Enormous quantities of data, petabyte-scale –The need to discover, access, explore, and analyze diverse distributed data sources –The need to organize, archive, schedule, and explain scientific workflows –The need to share data products and the resources needed to produce and store them
6
6 GriPhyN – The Grid Physics Network l NSF-funded IT research project l Supports the concept of Virtual Data, where data is materialized on demand l Data can exist on some data resource and be directly accessible l Data can exist only in the form of a recipe l The GriPhyN Virtual Data System can seamlessly deliver the data to the user or application regardless of the form in which the data exists l GriPhyN targets applications in high-energy physics, gravitational-wave physics and astronomy
7
7 Virtual Data System Capabilities Producing data from transformations with uniform, precise data interface descriptions enables… l Discovery: finding and understanding datasets and transformations l Workflow: structured paradigm for organizing, locating, specifying, & producing scientific datasets –Forming new workflow –Building new workflow from existing patterns –Managing change l Planning: automated to make the Grid transparent l Audit: explanation and validation via provenance
8
Virtual Data Example: Galaxy Cluster Search [Figure: Sloan Data feeding the galaxy cluster search DAG; galaxy cluster size distribution.] Work by Jim Annis, Steve Kent, Vijay Sehkri (Fermilab) and Michael Milligan, Yong Zhao (University of Chicago)
9
Virtual Data Application: High Energy Physics Data Analysis [Figure: tree of derived datasets, each labeled by its defining parameters (e.g. mass = 200, decay = WW, stability = 1, LowPt = 20, HighPt = 10000), with branches varying decay (WW, ZZ, bb), stability, event, and plot settings.] Work and slide by Rick Cavanaugh and Dimitri Bourilkov, University of Florida
10
10 Lifecycle of Virtual Data "I have some subject images, what analyses are available? Which can be applied to this format?" "I want to apply an image registration program to thousands of objects. If the results already exist, I'll save weeks of computation." "I've come across some interesting data, but I need to understand the nature of the analyses applied when it was constructed before I can trust it for my purposes." "I need to document the devices and methods used to measure the data, and keep track of the analysis steps applied to the data."
11
11 A Virtual Data Grid
12
12 Virtual Data Scenario [Figure: workflow in which simulate produces file1 and file2; reformat and conv derive file3–file6; psearch and summarize produce file7 and file8.] On-demand data generation; update workflow following changes; manage workflow. Explain provenance, e.g. for file8: psearch –t 10 –i file3 file4 file5 –o file8; summarize –t 10 –i file6 –o file7; reformat –f fz –i file2 –o file3 file4 file5; conv –l esd –o aod –i file2 –o file6; simulate –t 10 –o file1 file2
13
13 VDL and Abstract Workflow [Figure: VDL descriptions are expanded, in response to a user request for data file "c", into an abstract workflow.]
14
14 Example Workflow Reduction l Original abstract workflow l If “b” already exists (as determined by query to the RLS), the workflow can be reduced
15
15 Mapping from abstract to concrete l Query RLS, MDS, and TC, schedule computation and data movement
16
16 Workflow management in GriPhyN l Workflow Generation: how do you describe the workflow (at various levels of abstraction)? (Virtual Data Language and catalog "VDC") l Workflow Mapping/Refinement: how do you map an abstract workflow representation to an executable form? ("Planners": Pegasus) l Workflow Execution: how do you reliably execute the workflow? (Condor's DAGMan)
17
17 Executable Workflow Construction l VDL tools used to build an abstract workflow based on VDL descriptions l Planners (e.g. Pegasus) take the abstract workflow and produce an executable workflow for the Grid or other environments l Workflow executors (“enactment engines”) like Condor DAGMan execute the workflow
18
18 Terms l Abstract Workflow (DAX) –Expressed in terms of logical entities –Specifies all logical files required to generate the desired data product from scratch –Dependencies between the jobs –Analogous to build style dag l Concrete Workflow –Expressed in terms of physical entities –Specifies the location of the data and executables –Analogous to a make style dag
19
19 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and Approach for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues
20
20 VDL: Virtual Data Language Describes Data Transformations l Transformation - “TR” –Abstract template of program invocation –Similar to "function definition" l Derivation – “DV” –“Function call” to a Transformation –Store past and future: >A record of how data products were generated >A recipe of how data products can be generated l Invocation –Record of a Derivation execution
21
21 Example Transformation
TR t1( out a2, in a1, none pa = "500", none env = "100000" ) {
  argument = "-p "${pa};
  argument = "-f "${a1};
  argument = "-x -y";
  argument stdout = ${a2};
  profile env.MAXMEM = ${env};
}
[Figure: $a1 → t1 → $a2]
22
22 Example Derivations
DV d1->t1 (
  env="20000", pa="600",
  a2=@{out:run1.exp15.T1932.summary},
  a1=@{in:run1.exp15.T1932.raw}
);
DV d2->t1 (
  a1=@{in:run1.exp16.T1918.raw},
  a2=@{out:run1.exp16.T1918.summary}
);
23
23 Workflow from File Dependencies
TR tr1(in a1, out a2) {
  argument stdin = ${a1};
  argument stdout = ${a2};
}
TR tr2(in a1, out a2) {
  argument stdin = ${a1};
  argument stdout = ${a2};
}
DV x1->tr1(a1=@{in:file1}, a2=@{out:file2});
DV x2->tr2(a1=@{in:file2}, a2=@{out:file3});
[Figure: file1 → x1 → file2 → x2 → file3]
24
24 Example Workflow l Complex structure –Fan-in –Fan-out –"left" and "right" can run in parallel l Uses input file –Register with RLS l Complex file dependencies –Glue the workflow together [Figure: preprocess → findrange ("left", "right") → analyze]
25
25 Workflow step "preprocess" l TR preprocess turns f.a into f.b1 and f.b2
TR preprocess( output b[], input a ) {
  argument = "-a top";
  argument = " -i "${input:a};
  argument = " -o " ${output:b};
}
l Makes use of the "list" feature of VDL –Generates 0..N output files –The number of output files depends on the caller
26
26 Workflow step "findrange" l Turns two inputs into one output
TR findrange( output b, input a1, input a2, none name="findrange", none p="0.0" ) {
  argument = "-a "${name};
  argument = " -i " ${a1} " " ${a2};
  argument = " -o " ${b};
  argument = " -p " ${p};
}
l Uses the default argument feature
27
27 Can also use list[] parameters
TR findrange( output b, input a[], none name="findrange", none p="0.0" ) {
  argument = "-a "${name};
  argument = " -i " ${" "|a};
  argument = " -o " ${b};
  argument = " -p " ${p};
}
28
28 Workflow step "analyze" l Combines intermediary results
TR analyze( output b, input a[] ) {
  argument = "-a bottom";
  argument = " -i " ${a};
  argument = " -o " ${b};
}
29
29 Complete VDL workflow l Generate appropriate derivations
DV top->preprocess( b=[ @{out:"f.b1"}, @{out:"f.b2"} ], a=@{in:"f.a"} );
DV left->findrange( b=@{out:"f.c1"}, a2=@{in:"f.b2"}, a1=@{in:"f.b1"}, name="left", p="0.5" );
DV right->findrange( b=@{out:"f.c2"}, a2=@{in:"f.b2"}, a1=@{in:"f.b1"}, name="right" );
DV bottom->analyze( b=@{out:"f.d"}, a=[ @{in:"f.c1"}, @{in:"f.c2"} ] );
30
30 Compound Transformations l Using compound TR –Permits composition of complex TRs from basic ones –Calls are independent >unless linked through LFN –A Call is effectively an anonymous derivation >Late instantiation at workflow generation time –Permits bundling of repetitive workflows –Model: Function calls nested within a function definition
31
31 Compound Transformations (cont) l TR diamond bundles black-diamonds
TR diamond( out fd, io fc1, io fc2, io fb1, io fb2, in fa, p1, p2 ) {
  call preprocess( a=${fa}, b=[ ${out:fb1}, ${out:fb2} ] );
  call findrange( a1=${in:fb1}, a2=${in:fb2}, name="LEFT", p=${p1}, b=${out:fc1} );
  call findrange( a1=${in:fb1}, a2=${in:fb2}, name="RIGHT", p=${p2}, b=${out:fc2} );
  call analyze( a=[ ${in:fc1}, ${in:fc2} ], b=${fd} );
}
32
32 Compound Transformations (cont) l Multiple DVs allow easy generator scripts:
DV d1->diamond( fd=@{out:"f.00005"}, fc1=@{io:"f.00004"}, fc2=@{io:"f.00003"}, fb1=@{io:"f.00002"}, fb2=@{io:"f.00001"}, fa=@{io:"f.00000"}, p2="100", p1="0" );
DV d2->diamond( fd=@{out:"f.0000B"}, fc1=@{io:"f.0000A"}, fc2=@{io:"f.00009"}, fb1=@{io:"f.00008"}, fb2=@{io:"f.00007"}, fa=@{io:"f.00006"}, p2="141.42135623731", p1="0" );
...
DV d70->diamond( fd=@{out:"f.001A3"}, fc1=@{io:"f.001A2"}, fc2=@{io:"f.001A1"}, fb1=@{io:"f.001A0"}, fb2=@{io:"f.0019F"}, fa=@{io:"f.0019E"}, p2="800", p1="18" );
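The generator script itself is not shown in the deck; below is a minimal Python sketch of what one might look like. It reproduces the six-files-per-derivation hex naming visible in the DVs above; the p1/p2 values are placeholders supplied by the caller, not the formula used in the original runs.

# Sketch only: emits diamond DV text in the style of the slide above.
def diamond_dv(k, p1, p2):
    # derivation k uses files f.%05X numbered 6*(k-1) .. 6*(k-1)+5
    base = 6 * (k - 1)
    f = ["f.%05X" % (base + i) for i in range(6)]
    return ('DV d%d->diamond( fd=@{out:"%s"}, fc1=@{io:"%s"}, fc2=@{io:"%s"}, '
            'fb1=@{io:"%s"}, fb2=@{io:"%s"}, fa=@{io:"%s"}, p2="%s", p1="%s" );'
            % (k, f[5], f[4], f[3], f[2], f[1], f[0], p2, p1))

if __name__ == "__main__":
    # placeholder parameter sweep; the real scripts choose p1/p2 per experiment
    for k, (p1, p2) in enumerate([("0", "100"), ("0", "141.42135623731")], start=1):
        print(diamond_dv(k, p1, p2))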
33
33 Functional MRI Analysis
34
34 fMRI Example: AIR Tools
TR air::align_warp( in reg_img, in reg_hdr, in sub_img, in sub_hdr, m, out warp ) {
  argument = ${reg_img};
  argument = ${sub_img};
  argument = ${warp};
  argument = "-m " ${m};
  argument = "-q";
}
TR air::reslice( in warp, sliced, out sliced_img, out sliced_hdr ) {
  argument = ${warp};
  argument = ${sliced};
}
35
35 fMRI Example: AIR Tools
TR air::warp_n_slice( in reg_img, in reg_hdr, in sub_img, in sub_hdr, m = "12", io warp, sliced, out sliced_img, out sliced_hdr ) {
  call air::align_warp( reg_img=${reg_img}, reg_hdr=${reg_hdr}, sub_img=${sub_img}, sub_hdr=${sub_hdr}, m=${m}, warp=${out:warp} );
  call air::reslice( warp=${in:warp}, sliced=${sliced}, sliced_img=${sliced_img}, sliced_hdr=${sliced_hdr} );
}
TR air::softmean( in sliced_img[], in sliced_hdr[], arg1 = "y", arg2 = "null", atlas, out atlas_img, out atlas_hdr ) {
  argument = ${atlas};
  argument = ${arg1} " " ${arg2};
  argument = ${sliced_img};
}
36
36 fMRI Example: AIR Tools
DV air::i3472_3->air::warp_n_slice(
  reg_hdr = @{in:"3472-3_anonymized.hdr"}, reg_img = @{in:"3472-3_anonymized.img"},
  sub_hdr = @{in:"3472-3_anonymized.hdr"}, sub_img = @{in:"3472-3_anonymized.img"},
  warp = @{io:"3472-3_anonymized.warp"}, sliced = "3472-3_anonymized.sliced",
  sliced_hdr = @{out:"3472-3_anonymized.sliced.hdr"}, sliced_img = @{out:"3472-3_anonymized.sliced.img"} );
…
DV air::i3472_6->air::warp_n_slice(
  reg_hdr = @{in:"3472-3_anonymized.hdr"}, reg_img = @{in:"3472-3_anonymized.img"},
  sub_hdr = @{in:"3472-6_anonymized.hdr"}, sub_img = @{in:"3472-6_anonymized.img"},
  warp = @{io:"3472-6_anonymized.warp"}, sliced = "3472-6_anonymized.sliced",
  sliced_hdr = @{out:"3472-6_anonymized.sliced.hdr"}, sliced_img = @{out:"3472-6_anonymized.sliced.img"} );
37
37 fMRI Example: AIR Tools
DV air::a3472_3->air::softmean(
  sliced_img = [ @{in:"3472-3_anonymized.sliced.img"}, @{in:"3472-4_anonymized.sliced.img"}, @{in:"3472-5_anonymized.sliced.img"}, @{in:"3472-6_anonymized.sliced.img"} ],
  sliced_hdr = [ @{in:"3472-3_anonymized.sliced.hdr"}, @{in:"3472-4_anonymized.sliced.hdr"}, @{in:"3472-5_anonymized.sliced.hdr"}, @{in:"3472-6_anonymized.sliced.hdr"} ],
  atlas = "atlas", atlas_img = @{out:"atlas.img"}, atlas_hdr = @{out:"atlas.hdr"} );
38
38 Query Examples Which TRs can process a "subject" image? l Q: xsearchvdc -q tr_meta dataType subject_image input l A: fMRIDC.AIR::align_warp Which TRs can create an "ATLAS"? l Q: xsearchvdc -q tr_meta dataType atlas_image output l A: fMRIDC.AIR::softmean Which TRs have output parameter "warp" and a parameter "options" l Q: xsearchvdc -q tr_para warp output options l A: fMRIDC.AIR::align_warp Which DVs call TR "slicer"? l Q: xsearchvdc -q tr_dv slicer l A: fMRIDC.FSL::s3472_3_x->fMRIDC.FSL::slicer fMRIDC.FSL::s3472_3_y->fMRIDC.FSL::slicer fMRIDC.FSL::s3472_3_z->fMRIDC.FSL::slicer
39
39 Query Examples List anonymized subject-images for young subjects. This query searches for files based on their metadata. l Q: xsearchvdc -q lfn_meta dataType subject_image privacy anonymized subjectType young l A: 3472-4_anonymized.img For a specific patient image, 3472-3, show all DVs and files that were derived from this image, directly or indirectly. l Q: xsearchvdc -q lfn_tree 3472-3_anonymized.img l A: 3472-3_anonymized.img 3472-3_anonymized.sliced.hdr atlas.hdr atlas.img … atlas_z.jpg 3472-3_anonymized.sliced.img
40
40 Virtual Provenance: list of derivations and files
<job id="ID000001" namespace="Quarknet.HEPSRCH" name="ECalEnergySum" level="5" dv-namespace="Quarknet.HEPSRCH" dv-name="run1aesum">...
<job id="ID000014" namespace="Quarknet.HEPSRCH" name="ReconTotalEnergy" level="3"> …
(excerpted for display)
41
41 Virtual Provenance in XML: control flow graph [XML excerpt omitted from this transcript] (excerpted for display)
42
42 Invocation Provenance l Completion status and resource usage l Attributes of the executable transformation l Attributes of input and output files
43
43 Future: Provenance & Workflow for Web Services
44
44
45
45 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and Issues for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues
46
46 Outline l Introduction and the GriPhyN project l Chimera l Overview of Grid concepts and tools l Pegasus l Applications using Chimera and Pegasus l Research issues l Exercises
47
47 Motivation for Grids: How do we solve problems? l Communities committed to common goals –Virtual organizations l Teams with heterogeneous members & capabilities l Distributed geographically and politically –No location/organization possesses all required skills and resources l Adapt as a function of the situation –Adjust membership, reallocate responsibilities, renegotiate resources
48
48 The Grid Vision “ Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations” –On-demand, ubiquitous access to computing, data, and services –New capabilities constructed dynamically and transparently from distributed services
49
49 The Grid l Emerging computational, networking, and storage infrastructure –Pervasive, uniform, and reliable access to remote data, computational, sensor, and human resources l Enable new approaches to applications and problem solving –Remote resources the rule, not the exception l Challenges –Heterogeneous components –Component failures common –Different administrative domains –Local policies for security and resource usage
50
50 Data Grids for High Energy Physics [Figure: tiered LHC computing model — online system and physics data cache (~PBytes/sec), CERN computer centre with offline processor farm (~20 TIPS), Tier 1 regional centres (FermiLab ~4 TIPS; France, Italy, Germany), Tier 2 centres (~1 TIPS each, e.g. Caltech), institutes (~0.25 TIPS), and physicist workstations, linked at ~622 Mbits/sec down to ~1 MBytes/sec (or air freight, deprecated).] There is a "bunch crossing" every 25 nsecs. There are 100 "triggers" per second. Each triggered event is ~1 MByte in size. Physicists work on analysis "channels". Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server. 1 TIPS is approximately 25,000 SpecInt95 equivalents. www.griphyn.org www.ppdg.net www.eu-datagrid.org Slide courtesy Harvey Newman, CalTech
51
51 Grid Applications l Increasing in the level of complexity l Use of individual application components l Reuse of individual intermediate data products l Description of Data Products using Metadata Attributes l Execution environment is complex and very dynamic –Resources come and go –Data is replicated –Components can be found at various locations or staged in on demand l Separation between –the application description –the actual execution description
52
52 Grid3 – The Laboratory Supported by the National Science Foundation and the Department of Energy.
53
53 Globus Monitoring and Discovery Service (MDS) The MDS architecture is a flexible hierarchy. There can be several levels of GIISes; any GRIS can register with any GIIS; and any GIIS can register with another, making this approach modular and extensible.
54
54 GridFTP l Data-intensive grid applications transfer and replicate large data sets (terabytes, petabytes) l GridFTP Features: –Third party (client mediated) transfer –Parallel transfers –Striped transfers –TCP buffer optimizations –Grid security l Important feature is separation of control and data channel
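As a hedged illustration (not from the deck), a third-party transfer with parallel streams can be driven from a script by invoking globus-url-copy; the host names and paths below are hypothetical.

import subprocess

# Both endpoints are GridFTP URLs, so the client only mediates the transfer
# between the two servers; -p requests parallel TCP streams.
# Hosts and paths are hypothetical.
src = "gsiftp://hostA.example.org/data/run1/file1"
dst = "gsiftp://hostB.example.org/storage/run1/file1"
subprocess.run(["globus-url-copy", "-p", "4", src, dst], check=True)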
55
55 A Replica Location Service l A Replica Location Service (RLS) is a distributed registry service that records the locations of data copies and allows discovery of replicas l Maintains mappings between logical identifiers and target names –Physical targets: Map to exact locations of replicated data –Logical targets: Map to another layer of logical names, allowing storage systems to move data without informing the RLS l RLS was designed and implemented in a collaboration between the Globus project and the DataGrid project
56
[Figure: a hierarchy of Replica Location Indexes (RLIs) above Local Replica Catalogs (LRCs).] LRCs contain consistent information about logical-to-target mappings at a site. RLI nodes aggregate information about LRCs. Soft-state updates flow from LRCs to RLIs: index information has relaxed consistency and is used to rebuild the index after failures. Arbitrary levels of RLI hierarchy are possible.
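A conceptual sketch (not the Globus RLS API) of this two-level structure: LRCs hold the authoritative logical-to-physical mappings for a site, while the RLI only records which LRCs know a given logical name, so a lookup goes RLI first, then LRC.

# Conceptual sketch of the RLS two-level index; class and method names are mine.
class LocalReplicaCatalog:
    def __init__(self, site):
        self.site = site
        self.mappings = {}            # LFN -> set of PFNs at this site
    def add(self, lfn, pfn):
        self.mappings.setdefault(lfn, set()).add(pfn)
    def lookup(self, lfn):
        return self.mappings.get(lfn, set())

class ReplicaLocationIndex:
    def __init__(self):
        self.index = {}               # LFN -> set of LRCs that hold a mapping
    def soft_state_update(self, lrc):
        # periodically rebuilt from LRC contents, so it tolerates stale entries
        for lfn in lrc.mappings:
            self.index.setdefault(lfn, set()).add(lrc)
    def find_replicas(self, lfn):
        return {pfn for lrc in self.index.get(lfn, set()) for pfn in lrc.lookup(lfn)}

rli = ReplicaLocationIndex()
lrc = LocalReplicaCatalog("isi")
lrc.add("f.b1", "gsiftp://isi.example.org/store/f.b1")   # hypothetical PFN
rli.soft_state_update(lrc)
print(rli.find_replicas("f.b1"))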
57
57 Condor’s DAGMan l Developed at UW Madison (Livny) l Executes a concrete workflow l Makes sure the dependencies are followed l Executes the jobs specified in the workflow –Execution –Data movement –Catalog updates l Provides a “rescue DAG” in case of failure
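For concreteness, a minimal sketch of what a concrete workflow looks like as a DAGMan input file, using the diamond example from earlier. The submit-file names are hypothetical; condor_submit_dag hands the DAG to DAGMan, which enforces the PARENT/CHILD dependencies and writes a rescue DAG on failure.

import subprocess

# JOB lines name a Condor submit file per task; PARENT/CHILD lines encode dependencies.
# Submit-file names below are hypothetical.
dag = """\
JOB preprocess preprocess.sub
JOB findrange_left findrange_left.sub
JOB findrange_right findrange_right.sub
JOB analyze analyze.sub
PARENT preprocess CHILD findrange_left findrange_right
PARENT findrange_left findrange_right CHILD analyze
"""
with open("diamond.dag", "w") as f:
    f.write(dag)

subprocess.run(["condor_submit_dag", "diamond.dag"], check=True)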
58
58 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and Approach for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues
59
59 Pegasus Outline –Introduction –Pegasus and How it works –Pegasus Components and Internals –Deferred Planning –Pegasus Portal
60
60 Introduction l Scientific applications in various domains are compute and data intensive. l Not feasible to run on a single resource (limits scalability). l A scientific application can be represented as a workflow. l Use of individual application components. l Reuse data if possible (Virtual Data).
61
61 Abstract Workflow Generation Concrete Workflow Generation
62
62 Generating an Abstract Workflow l Available Information –Specification of component capabilities –Ability to generate the desired data products Select and configure application components to form an abstract workflow –assign input files that exist or that can be generated by other application components. –specify the order in which the components must be executed –components and files are referred to by their logical names >Logical transformation name >Logical file name >Both transformations and data can be replicated
63
63 Generating a Concrete Workflow l Information –location of files and component Instances –State of the Grid resources l Select specific –Resources –Files –Add jobs required to form a concrete workflow that can be executed in the Grid environment –Each component in the abstract workflow is turned into an executable job
64
64 Concrete Workflow Generator [Figure: the Concrete Workflow Generator takes the abstract workflow plus domain knowledge, resource information, and location information, and produces a plan submitted to the grid.]
65
65 Pegasus l Pegasus - Planning for Execution on Grids l Planning framework l Takes as input an abstract workflow –Abstract DAG in XML (DAX) l Generates a concrete workflow. l Submits the workflow to a workflow executor (e.g. Condor DAGMan).
66
66 Pegasus Outline –Introduction –Pegasus and How it works –Pegasus Components and Internals –Deferred Planning –Pegasus Portal
67
67
68
68 Abstract Workflow Reduction The output jobs of the DAG are all the leaf nodes, i.e. f, h, i. Each job requires 2 input files and generates 2 output files. The user specifies the output location. [Figure: example DAG with jobs a–i; the legend used in the following slides distinguishes original nodes, input transfer nodes, registration nodes, output transfer nodes, and nodes deleted by the reduction algorithm.]
69
69 Optimizing from the point of view of Virtual Data Jobs d, e, f have output files that have been found in the Replica Location Service, so they need not be re-run; jobs whose results are no longer needed by any remaining job are deleted as well. All of jobs a, b, c, d, e, f are removed from the DAG. [Figure: the example DAG with nodes a–f marked as deleted by the reduction algorithm.]
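A sketch of this reduction logic (assumed behavior, not Pegasus source): a job is dropped when all of its outputs are already registered, and ancestors whose remaining consumers have all been dropped are pruned as well.

# Sketch of workflow reduction; data model and logic are assumptions for illustration.
def reduce_workflow(jobs, exists_in_rls):
    """jobs: dict name -> {"outputs": set of LFNs, "children": set of job names}."""
    # drop jobs whose outputs are all already available
    done = {name for name, spec in jobs.items()
            if spec["outputs"] and all(exists_in_rls(f) for f in spec["outputs"])}
    # walk upward: a job all of whose children were dropped is no longer needed either
    changed = True
    while changed:
        changed = False
        for name, spec in jobs.items():
            if name not in done and spec["children"] and spec["children"] <= done:
                done.add(name)
                changed = True
    return {name: spec for name, spec in jobs.items() if name not in done}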
70
70 Plans for staging data in The planner picks execution and replica locations, adding transfer nodes for the input files of the root nodes. [Figure: the reduced DAG (jobs g, h, i) with input transfer nodes added.]
71
71 Staging data out and registering new derived products Staging and registration nodes are added for each job that materializes data (g, h, i), and the output files of the leaf job (f) are transferred to the output location. [Figure: the DAG with output transfer and registration nodes added.]
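A sketch (assumed structure, not Pegasus source) of how the remaining compute jobs are wrapped with data nodes: stage-in nodes for root-level inputs, and stage-out plus registration nodes for every newly materialized file.

# Sketch of adding data-movement nodes around the reduced workflow; names are illustrative.
def add_data_nodes(jobs, replica_for, exec_site, output_site):
    """jobs: reduced workflow, name -> {"inputs": set, "outputs": set}.
    replica_for(lfn) returns a physical location found in the RLS."""
    nodes = []
    produced = {f for spec in jobs.values() for f in spec["outputs"]}
    for name, spec in jobs.items():
        for lfn in spec["inputs"] - produced:          # root-level inputs: stage in
            nodes.append(("transfer_in", replica_for(lfn), exec_site, lfn))
        nodes.append(("compute", name, exec_site))
        for lfn in spec["outputs"]:                    # newly derived data: stage out + register
            nodes.append(("transfer_out", exec_site, output_site, lfn))
            nodes.append(("register", lfn, output_site))
    return nodes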
72
72 The final, executable DAG [Figure: the input DAG (jobs a–i) beside the final, executable DAG, which contains only jobs g, h, i plus the input transfer, output transfer, and registration nodes.]
73
73 Pegasus Outline –Introduction –Pegasus and How it works –Pegasus Components and Internals –Deferred Planning –Pegasus Portal
74
74 Pegasus Components
75
75 Replica Location Service l Pegasus uses the RLS to find input data l Pegasus uses the RLS to register new data products [Figure: computation interacting with LRCs and the RLI.]
76
76 Pegasus Components
77
77 Transformation Catalog l The Transformation Catalog keeps information about domain transformations (executables). l It records the logical name of the transformation, the resource on which the transformation is available, and the URL of the physical transformation. l The new transformation catalog also allows one to define the transformation's type, architecture, OS, OS version, and glibc version. l Profiles allow a user to attach environment variables, hints, scheduler-related information, and job-manager configurations.
78
78 Transformation Catalog l A set of standard APIs is available to perform various operations on the Transformation Catalog. l Several implementations of the catalog are available: –Database: >Useful for large catalogs; supports fast adds, deletes, and queries. >Standard database security is used. –File format: >Useful for a small number of transformations and easy editing. >Example entry: isi vds::fft:1.0 /opt/bin/fft INSTALLED INTEL32::LINUX ENV::VDS_HOME=/opt/vds l Users can provide their own implementation.
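As an illustration of the text format above, a small sketch that splits such an entry into its fields; the field labels are mine, not an official schema.

# Sketch: split a file-format catalog entry into labeled fields (labels are illustrative).
def parse_tc_line(line):
    # site, logical transformation, physical path, type, system info, then profiles
    site, lfn, path, kind, sysinfo, *profiles = line.split()
    return {"site": site, "transformation": lfn, "path": path,
            "type": kind, "system": sysinfo, "profiles": profiles}

entry = parse_tc_line("isi vds::fft:1.0 /opt/bin/fft INSTALLED INTEL32::LINUX ENV::VDS_HOME=/opt/vds")
print(entry["path"])   # /opt/bin/fft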
79
79 Pegasus Components
80
80 Resource Information Catalog (Pool Config) l Pool Config is an XML file which contains information about various pools/sites on which DAGs may execute. l Some of the information contained in the Pool Config file is –Specifies the various job-managers that are available on the pool for the different types of schedulers. –Specifies the GridFtp storage servers associated with each pool. –Specifies the Local Replica Catalogs where data residing in the pool has to be cataloged. –Contains profiles like environment hints which are common site-wide. –Contains the working and storage directories to be used on the pool.
81
81 Resource Information Catalog Pool Config l Two Ways to construct the Pool Config File. –Monitoring and Discovery Service –Local Pool Config File (Text Based) l Client tool to generate Pool Config File –The tool gen-poolconfig is used to query the MDS and/or the local pool config file/s to generate the XML Pool Config file.
82
82 Monitoring and Discovery Service l MDS provides up-to-date Grid state information –Total and idle job queue lengths on a pool of resources (Condor) –Total and available memory on the pool –Disk space on the pools –Number of jobs running on a job manager l Can be used for resource discovery and selection –Developing various task-to-resource mapping heuristics l Can be used to publish information necessary for replica selection –Publish GridFTP transfer statistics –Developing replica selection components
83
83 Pegasus Components
84
84 Site Selector l Determines job-to-site mappings. l The selector API makes pluggable implementations easy. l Implementations provided: –Random –RoundRobin –Group –NonJavaCallout l Research on advanced site-selection algorithms is in progress
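A sketch of the two simplest strategies named above; the interface is assumed for illustration, not the actual VDS selector API.

# Sketch of Random and RoundRobin site selection; the select() interface is illustrative.
import itertools, random

class RandomSelector:
    def __init__(self, sites):
        self.sites = list(sites)
    def select(self, job):
        return random.choice(self.sites)

class RoundRobinSelector:
    def __init__(self, sites):
        self._cycle = itertools.cycle(sites)
    def select(self, job):
        return next(self._cycle)

selector = RoundRobinSelector(["isi", "uc", "caltech"])   # hypothetical site names
print([selector.select(j) for j in ["preprocess", "findrange", "analyze"]])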
85
85 Pegasus Components
86
86 Transfer Tools / Workflow Executors l Various transfer tools are used for staging data between grid sites. l Define strategies to reliably and efficiently transfer data l Current implementations –Globus url copy –Transfer –Transfer2 (T2) –Stork l Workflow executors take the concrete workflow and act as a metascheduler for the Grid. l Current supported workflow executors –Condor Dagman –Gridlab GRMS
87
87 System Configuration - Properties l The properties file defines and modifies the behavior of Pegasus. l Properties set in $VDS_HOME/etc/properties can be overridden by defining them either in $HOME/.chimerarc or on the command line of any executable, e.g. gendax -Dvds.home=<path to VDS home>. l Some examples follow; for more details please read the sample.properties file in the $VDS_HOME/etc directory. l Basic required properties –vds.home: set automatically by the clients from the environment variable $VDS_HOME –vds.properties: path to the default properties file >Default: ${vds.home}/etc/properties
88
88 Pegasus Clients in the Virtual Data System l gencdag –Converts a DAX to a concrete DAG l genpoolconfig –Generates a pool config file from MDS l tc-client –Queries, adds, and deletes transformations from the Transformation Catalog l rls-client, rls-query-client –Query, add, and delete entries in the RLS l partiondax –Partitions the abstract DAX into smaller DAXes
89
89
90
90 Concrete Planner Gencdag l The concrete planner takes the DAX produced by Chimera and converts it into a set of Condor DAG and submit files. l You can specify more than one execution pool. Execution will take place on the pools on which the executable exists; if the executable exists on more than one pool, the pool on which it will run is selected by the site selector in use. l The output pool is the pool to which you want all the output products transferred. If not specified, the materialized data stays on the execution pool. l Authentication removes sites which are not accessible.
91
91 Pegasus Outline –Introduction –Pegasus and How it works –Pegasus Components and Internals –Deferred Planning –Pegasus Portal
92
92 Abstract To Concrete Steps
93
93 Original Pegasus configuration Simple scheduling: random or round robin using well-defined scheduling interfaces.
94
94 Deferred Planning through Partitioning A variety of partitioning algorithms can be implemented
95
95 Mega DAG is created by Pegasus and then submitted to DAGMan
96
96 Mega DAG Pegasus
97
97 Re-planning capabilities
98
98 Complex Replanning for Free (almost)
99
99 Planning & Scheduling Granularity l Partitioning –Allows one to set the granularity of planning ahead l Node aggregation –Allows one to combine nodes in the workflow and schedule them as one unit (minimizes the scheduling overheads) –May reduce the overheads of making scheduling and planning decisions l Related but separate concepts –For small jobs: >a high level of node aggregation >large partitions –For a very dynamic system: >small partitions
100
100 Partitioned Workflow Processing l Create workflow partitions –partition the abstract workflow into smaller workflows –create the XML partition graph that lists the dependencies between partitions l Create the MegaDAG (creates the DAGMan submit files) –transform the XML partition graph into its corresponding Condor representation l Submit the MegaDAG –each job invokes Pegasus on a partition and then submits the generated plan back to Condor
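One possible partitioner, sketched below (level-based; the deck notes that a variety of partitioning algorithms can be implemented): jobs at the same depth in the DAG are grouped into one partition, and the partition ordering gives the dependencies carried into the MegaDAG.

# Sketch of a level-based partitioner; one of many possible strategies, not the VDS partitioner.
def partition_by_level(jobs, parents):
    """jobs: iterable of job names; parents: dict job -> set of parent jobs."""
    level = {}
    def depth(j):
        if j not in level:
            level[j] = 0 if not parents.get(j) else 1 + max(depth(p) for p in parents[j])
        return level[j]
    partitions = {}
    for j in jobs:
        partitions.setdefault(depth(j), []).append(j)
    return [partitions[k] for k in sorted(partitions)]

parents = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(partition_by_level(["a", "b", "c", "d"], parents))   # [['a'], ['b', 'c'], ['d']]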
101
101 Future work for Pegasus l Staging in executables on demand l Expanding the scheduling plug-ins l Investigating various partitioning approaches l Investigating reliability across partitions
102
102 Pegasus Outline –Introduction –Pegasus and How it works –Pegasus Components and Internals –Deferred Planning –Pegasus Portal
103
103 Portal l Provides proxy based login l Authentication to the sites l Configuration for files l Metadata based workflow creation l Workflow Submission and Monitoring l User Notification
104
104
105
105 Portal Demonstration
106
106 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and Issues for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues
107
107 Types of Applications l Gravitational Wave Physics l High Energy Physics l Astronomy l Earthquake Science l Computational Biology
108
108 LIGO Scientific Collaboration l Continuous gravitational waves are expected to be produced by a variety of celestial objects l Only a small fraction of potential sources are known l Need to perform blind searches, scanning the regions of the sky where we have no a priori information of the presence of a source –Wide area, wide frequency searches l Search is performed for potential sources of continuous periodic waves near the Galactic Center and the galactic core l Search for binary inspirals collapsing into black holes. l The search is very compute and data intensive
109
Additional resources used: Grid3 iVDGL resources. [Figure: LSC, Teragrid, and other resources, including Caltech, ISI, UTB, UWM, WISC, LSU, Cardiff, AEI, USC, SC04, and the NCSA, SDSC, ANL, PSC, and Caltech Teragrid sites.]
110
110 LIGO Acknowledgements l Patrick Brady, Scott Koranda, Duncan Brown, Stephen Fairhurst University of Wisconsin Milwaukee, USA l Stuart Anderson, Kent Blackburn, Albert Lazzarini, Hari Pulapaka, Teviet Creighton Caltech, USA l Gabriela Gonzalez, Louisiana State University l Many Others involved in the Testbed l www.ligo.caltech.edu l www.lsc-group.phys.uwm.edu/lscdatagrid/ l LIGO Laboratory operates under NSF cooperative agreement PHY-0107417
111
111 Montage l Montage (NASA and NVO) –Delivers science-grade custom mosaics on demand –Produces mosaics from a wide range of data sources (possibly in different spectra) –User-specified parameters of projection, coordinates, size, rotation and spatial sampling. Mosaic created by Pegasus-based Montage from a run of the M101 galaxy images on the Teragrid.
112
112 Small Montage Workflow ~1200 nodes
113
113 Montage Acknowledgments l Bruce Berriman, John Good, Anastasia Laity, Caltech/IPAC l Joseph C. Jacob, Daniel S. Katz, JPL l http://montage.ipac.caltech.edu/ l Testbed for Montage: Condor pools at USC/ISI, UW Madison, and Teragrid resources at NCSA, Caltech, and SDSC. Montage is funded by the National Aeronautics and Space Administration's Earth Science Technology Office, Computational Technologies Project, under Cooperative Agreement Number NCC5-626 between NASA and the California Institute of Technology.
114
114 Other Applications Southern California Earthquake Center Southern California Earthquake Center (SCEC), in collaboration with the USC Information Sciences Institute, San Diego Supercomputer Center, the Incorporated Research Institutions for Seismology, and the U.S. Geological Survey, is developing the Southern California Earthquake Center Community Modeling Environment (SCEC/CME). Create fully three-dimensional (3D) simulations of fault-system dynamics. Physics-based simulations can potentially provide enormous practical benefits for assessing and mitigating earthquake risks through Seismic Hazard Analysis (SHA). The SCEC/CME system is an integrated geophysical simulation modeling framework that automates the process of selecting, configuring, and executing models of earthquake systems. Acknowledgments : Philip Maechling and Vipin Gupta University Of Southern California
115
115 Astronomy l Galaxy Morphology (National Virtual Observatory) –Investigates the dynamical state of galaxy clusters –Explores galaxy evolution inside the context of large-scale structure –Uses galaxy morphologies as a probe of the star formation and stellar distribution history of the galaxies inside the clusters –Data-intensive computations involving hundreds of galaxies in a cluster The x-ray emission is shown in blue, and the optical emission is in red. The colored dots are located at the positions of the galaxies within the cluster; the dot color represents the value of the asymmetry index. Blue dots, the most asymmetric galaxies, are scattered throughout the image, while orange dots, the most symmetric and indicative of elliptical galaxies, are concentrated more toward the center. People involved: Gurmeet Singh, Mei-Hui Su, many others
116
116 Biology Applications Tomography (NIH-funded project) l Derivation of 3D structure from a series of 2D electron microscopic projection images, l Reconstruction and detailed structural analysis –complex structures like synapses –large structures like dendritic spines. l Acquisition and generation of huge amounts of data l Large amount of state-of-the-art image processing required to segment structures from extraneous background. Dendrite structure to be rendered by Tomography Work performed by Mei-Hui Su with Mark Ellisman, Steve Peltier, Abel Lin, Thomas Molina (SDSC)
117
117 BLAST: a set of sequence comparison algorithms used to search sequence databases for optimal local alignments to a query. Led by Veronika Nefedova (ANL) as part of the PACI Data Quest Expedition program. Two major runs were performed using Chimera and Pegasus: 1) 60 genomes (4,000 sequences each) processed in 24 hours; genomes selected from DOE-sponsored sequencing projects; 67 CPU-days of processing time delivered; ~10,000 Grid jobs; >200,000 BLAST executions; 50 GB of data generated. 2) 450 genomes processed. Speedups of 5-20x were achieved because the compute nodes were used efficiently, by keeping the submission of jobs to the compute cluster constant.
118
Virtual Data Example: Galaxy Cluster Search [Figure: Sloan Data feeding the galaxy cluster search DAG; galaxy cluster size distribution.] Work by Jim Annis, Steve Kent, Vijay Sehkri (Fermilab) and Michael Milligan, Yong Zhao (University of Chicago)
119
119 Cluster Search Workflow [Figures: the workflow graph and an execution trace of workflow jobs vs. time.]
120
120 Capone Grid Interactions [Figure: Capone interacts with Condor-G (schedd, GridMgr), the compute element (gatekeeper, gsiftp, worker nodes, storage element), Chimera, the VDC, Pegasus, RLS, monitoring services (MDS, GridCat, MonALISA), Windmill, ProdDB, and DonQuijote (ATLAS).] Slides courtesy of Marco Mambelli, U Chicago ATLAS Team
121
121 ATLAS "Capone" Production Executor l Reception –Job received from work distributor l Translation –Un-marshalling, ATLAS transformation l DAX generation –VDL tools generate abstract DAG l Input file retrieval from RLS catalog –Check RLS for input LFNs (retrieval of GUID, PFN) l Scheduling: CE and SE are chosen l Concrete DAG generation and submission –Pegasus creates Condor submit files –DAGMan invoked to manage remote steps
122
122 A job in Capone (2, execution) l Remote job running / status checking –Stage-in of input files, create POOL FileCatalog –Athena (ATLAS code) execution l Remote execution check –Verification of output files and exit codes –Recovery of metadata (GUID, MD5sum, exe attributes) l Stage out: transfer from the CE site to the destination SE l Output registration –Registration of the output LFN/PFN and metadata in RLS l Finish –Job completed successfully; Capone communicates to Windmill that the job is ready for validation l Job status is sent to Windmill throughout execution l Windmill/DQ validate & register output in ProdDB
123
123 Ramp up of ATLAS DC2 [Chart: CPU-days delivered, ramping up from mid July to Sep 10.]
124
124 Outline l The concept of Virtual Data l The Virtual Data System and Language l Tools and approach for Running on the Grid l Pegasus: Grid Workflow Planning l Virtual Data Applications on the Grid l Research issues
125
125 Research directions in Virtual Data Grid Architecture
126
126 Research issues l Focus on data intensive science l Planning is necessary l Reaction to the environment is a must (things go wrong, resources come up) l Iterative workflow execution: –Workflow planner –Workload manager l Planning decision points –Workflow delegation time (eager) –Activity scheduling time (deferred) –Resource availability time (just in time) l Decision specification level l Reacting to the changing environment and recovering from failures l How does the communication take place? Callbacks, workflow annotations, etc. [Figure: the planner turns abstract workflow info into a concrete workflow; the workload manager dispatches tasks and interacts with the Grid resource manager.]
127
127 Provenance Hyperlinks
128
128 Goals for a Dataset Model l File l Set of files l Relational query or spreadsheet range l XML element l Set of files with relational index l Object closure l New user-defined dataset types Speculative model described in the CIDR 2003 paper by Foster, Voeckler, Wilde and Zhao
129
129 Acknowledgements GriPhyN, iVDGL, and QuarkNet (in part) are supported by the National Science Foundation The Globus Alliance, PPDG, and QuarkNet are supported in part by the US Department of Energy, Office of Science; by the NASA Information Power Grid program; and by IBM
130
130 Acknowledgements l Argonne and The University of Chicago: Ian Foster, Jens Voeckler, Yong Zhao, Jin Soon Chang, Luiz Meyer (UFRJ) –www.griphyn.org/vds l The USC Information Sciences Institute: Ewa Deelman, Carl Kesselman, Gurmeet Singh, Mei-Hui Su –http://pegasus.isi.edu l Special thanks to Jed Dobson, fMRI Data Center, Dartmouth College
131
131 For further information l Chimera and Pegasus: –www.griphyn.org/chimera –http://pegasus.isi.edu l Workflow Management research group in GGF: –www.isi.edu/~deelman/wfm-rg l Demos on the SC floor –Argonne National Laboratory and University of Southern California booths.