
1 Virtual Data Tools Status Update
ATLAS Grid Software Meeting, BNL, 6 May 2002
Mike Wilde, Argonne National Laboratory
An update on work by Jens Voeckler, Yong Zhao, Gaurang Mehta, and many others.

2 The Virtual Data Model
• Data suppliers publish data to the Grid
• Users request raw or derived data from the Grid, without needing to know
  – where the data is located
  – whether the data is stored or computed on demand
• Users and applications can easily determine
  – what it will cost to obtain the data
  – the quality of derived data
• The Virtual Data Grid serves requests efficiently, subject to global and local policy constraints
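
A minimal sketch of how a request under this model might be resolved (names and structures are hypothetical illustrations, not the GriPhyN implementation): serve an existing replica if the product is materialized, otherwise plan its derivation.

```python
# Hypothetical sketch of virtual-data request resolution; not the actual GriPhyN/Chimera API.
from dataclasses import dataclass, field

@dataclass
class VirtualDataGrid:
    replicas: dict = field(default_factory=dict)      # lfn -> list of physical locations
    derivations: dict = field(default_factory=dict)   # lfn -> recipe that can produce it

    def request(self, lfn):
        """Serve a logical file either from storage or by (re)computation."""
        if lfn in self.replicas:                      # data already materialized somewhere
            return ("fetch", self.replicas[lfn][0])
        if lfn in self.derivations:                   # data is virtual: compute on demand
            return ("derive", self.derivations[lfn])
        raise KeyError(f"{lfn} is neither stored nor derivable")

grid = VirtualDataGrid(
    replicas={"run1.raw": ["gsiftp://example-site/store/run1.raw"]},   # hypothetical URL
    derivations={"run1.summary": "summarize(run1.raw)"},
)
print(grid.request("run1.raw"))      # ('fetch', 'gsiftp://example-site/store/run1.raw')
print(grid.request("run1.summary"))  # ('derive', 'summarize(run1.raw)')
```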

3 CMS Pipeline in VDL
Pipeline stages: pythia_input → pythia.exe → cmsim_input → cmsim.exe → writeHits → writeDigis

begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end
begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end
begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end
begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end
begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end
begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end

4 Virtual Data for Real Science: A Prototype Virtual Data Catalog
[Diagram] Architecture of the system: a Virtual Data Catalog (PostgreSQL) with local file storage, driven through the Virtual Data Language by a VDL Interpreter (VDLI); job submission sites (ANL, SC, …) run a Condor-G agent, Globus client, and GridFTP server; job execution sites (U of Chicago, U of Wisconsin, U of Florida) each run a GridFTP client, Globus GRAM, and a Condor pool, connected over GSI on the Grid testbed.
[Diagram] Production DAG of simulated CMS data: simulate physics → simulate CMS detector response → copy flat file to OODBMS → simulate digitization of electronic signals.

5 Cluster-finding Data Pipeline
[Diagram] Numbered pipeline stages running from field and tsObj input files through brg and core steps to cluster and catalog outputs.

6 Virtual Data Tools
• Virtual Data API
  – A Java class hierarchy to represent transformations and derivations
• Virtual Data Language
  – Textual for illustrative examples
  – XML for machine-to-machine interfaces
• Virtual Data Database
  – Makes the objects of a virtual data definition persistent
• Virtual Data Service
  – Provides an OGSA interface to persistent objects

7 Languages
• VDLt – textual version
  – Mainly for documentation for now
  – May eventually implement a translator
  – Can dump data structures in this representation
• VDLx – XML version – app-to-VDC interchange
  – Useful for bulk data entry – catalog import/export
• aDAGx – XML version of an abstract DAG
• cDAG – actual DAGMan DAG

8 Components and Interfaces
• Java API
  – Manages catalog objects (tr, dv, args, …)
  – Create / Locate / Update / Delete
  – Same API at client and within server
  – Can embed the Java classes in an application for now (see the sketch below)
• Virtual Data Catalog Server
  – Web (eventually OGSA)
  – SOAP interface mirrors the Java API operations
• XML processor
• Database – managed by the VDCS
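
To make the create/locate/update/delete idea concrete, here is a toy Python stand-in for the catalog object API. The class and method names are invented for illustration; the real API is the Java class hierarchy described above and is not reproduced here.

```python
# Illustrative Python stand-ins for the catalog objects (tr, dv); not the real Java API.
from dataclasses import dataclass, field

@dataclass
class Transformation:                      # "tr": an executable plus its formal arguments
    name: str
    app: str
    formal_args: list = field(default_factory=list)

@dataclass
class Derivation:                          # "dv": a transformation bound to actual LFNs
    tr_name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

class VirtualDataCatalog:
    def __init__(self):
        self._trs, self._dvs = {}, []

    # Create / Locate / Update / Delete for transformations
    def create_tr(self, tr):
        self._trs[tr.name] = tr
    def locate_tr(self, name):
        return self._trs.get(name)
    def update_tr(self, name, **changes):
        for key, value in changes.items():
            setattr(self._trs[name], key, value)
    def delete_tr(self, name):
        self._trs.pop(name, None)

    def create_dv(self, dv):
        self._dvs.append(dv)
    def derivations_producing(self, lfn):  # which derivations can produce this logical file?
        return [d for d in self._dvs if lfn in d.outputs]

vdc = VirtualDataCatalog()
vdc.create_tr(Transformation("t1", "/usr/bin/app3", ["a1", "a2"]))
vdc.create_dv(Derivation("t1", inputs=["run1.exp15.T1932.raw"],
                         outputs=["run1.exp15.T1932.summary"]))
print(vdc.locate_tr("t1").app)                               # /usr/bin/app3
print(vdc.derivations_producing("run1.exp15.T1932.summary"))
```

The `derivations_producing` query is the same kind of lookup described later under "VDL Searches": find the derivations that can produce a given logical file name.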

9 System Architecture
[Diagram] A client application uses the client API to call the Virtual Data Catalog Service, which manages the Virtual Data Catalog objects stored in the Virtual Data Catalog database.

10 Initial Release Architecture
[Diagram] As above, but without the service layer: the client application uses the client API directly against the Virtual Data Catalog objects and database.

11 Application Interfaces
• Invoke the Java client API (to make OGSA calls)
• Invoke the Java server API (for now, embed VDC processing directly in the application)
• Make OGSA calls directly
• Formulate XML (VDLx) to load the catalog or request derivations

12 Example VDL-Text

TR t1( output a2, input a1, none env="100000", none pa="500" ) {
  app = "/usr/bin/app3";
  argument parg = "-p "${none:pa};
  argument farg = "-f "${input:a1};
  argument xarg = "-x -y ";
  argument stdout = ${output:a2};
  profile env.MAXMEM = ${none:env};
}

13 Example Derivation

DV t1(
  a2=@{output:run1.exp15.T1932.summary},
  a1=@{input:run1.exp15.T1932.raw},
  env="20000",
  pa="600"
);

14 Derivations with dependencies

TR trans1( output a2, input a1 ) {
  app = "/usr/bin/app1";
  argument stdin = ${input:a1};
  argument stdout = ${output:a2};
}
TR trans2( output a2, input a1 ) {
  app = "/usr/bin/app2";
  argument stdin = ${input:a1};
  argument stdout = ${output:a2};
}

DV trans1( a2=@{output:file2}, a1=@{input:file1} );
DV trans2( a2=@{output:file3}, a1=@{output:file2} );

15 Expressing Dependencies

16 Define the transformations

TR generate( output a ) {
  app = "generator.exe";
  argument stdout = ${output:a};
}
TR findrange( output b, input a, none p="0.0" ) {
  app = "ranger.exe";
  argument arg = "-i "${:p};
  argument stdin = ${input:a};
  argument stdout = ${output:b};
}
TR default.analyze( input a[], output c ) {
  pfnHint vanilla = "analyze.exe";
  argument files = ${:a};
  argument stdout = ${output:c};
}

17 Derivations forming a DAG

DV generate( a=@{output:f.a} );
DV findrange( b=@{output:f.b}, a=@{input:f.a}, p="0.5" );
DV findrange( b=@{output:f.c}, a=@{input:f.a}, p="1.0" );
DV analyze( a=[ @{input:f.b}, @{input:f.c} ], c=@{output:f.d} );
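
These four derivations form a diamond-shaped DAG because the output LFN of one derivation appears as an input LFN of another. A minimal sketch (Python data structures standing in for the catalog, not VDL itself) of how that dependency graph can be recovered and ordered:

```python
# Sketch: recover job ordering from derivations by matching output LFNs to input LFNs.
from graphlib import TopologicalSorter   # Python 3.9+

# (inputs, outputs) per derivation, mirroring the VDL example above
derivations = {
    "generate":   ([],             ["f.a"]),
    "findrange1": (["f.a"],        ["f.b"]),
    "findrange2": (["f.a"],        ["f.c"]),
    "analyze":    (["f.b", "f.c"], ["f.d"]),
}

producer = {lfn: name for name, (_, outs) in derivations.items() for lfn in outs}
# a derivation depends on whichever derivation produces each of its inputs
deps = {name: {producer[lfn] for lfn in ins if lfn in producer}
        for name, (ins, _) in derivations.items()}

print(list(TopologicalSorter(deps).static_order()))
# e.g. ['generate', 'findrange1', 'findrange2', 'analyze']
```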

18 Virtual Data Class Diagram (diagram by Jens Voeckler)

19 Virtual Data Catalog Structure

20 Virtual Data Language - XML

21 VDL Searches
• Locate the derivations that can produce a specific LFN
• General queries for catalog maintenance
• Locate transforms that can produce a specific file type (what does a type mean in this context?)

22 Virtual Data Issues
• Param file support
• Param structures
• Sequences
• Virtual datasets

23 Execution Environment Profile
• Condor / DAGMan / GRAM / WP1
• Concept of an EE driver
  – Allows plug-in of DAG-generating code for DAGMan, Condor, GRAM, WP1 JM/RB (see the sketch below)
• Execution profile levels: Global, User/Group, Transformation, Derivation, Invocation
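
A sketch of what the "EE driver" plug-in idea might look like; the interface and back ends are hypothetical, chosen only to show how the same abstract DAG could be rendered for different executors.

```python
# Hypothetical execution-environment driver interface; back ends are illustrative only.
from abc import ABC, abstractmethod

class EEDriver(ABC):
    """Pluggable back end that renders an abstract DAG for a concrete executor."""
    @abstractmethod
    def render(self, dag):  # dag: {job: set of parent jobs}
        ...

class DAGManDriver(EEDriver):
    def render(self, dag):
        lines = [f"JOB {job} {job}.submit" for job in dag]
        lines += [f"PARENT {p} CHILD {job}" for job, parents in dag.items() for p in parents]
        return "\n".join(lines)

class GRAMDriver(EEDriver):
    def render(self, dag):
        # naive sequential RSL-flavored rendering, purely illustrative
        return "\n".join(f"&(executable={job})" for job in dag)

dag = {"generate": set(), "findrange": {"generate"}, "analyze": {"findrange"}}
print(DAGManDriver().render(dag))
```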

24 First Release – June 2002
• Java catalog classes
• XML import / export
• Textual VDL formatting
• DAX – (abstract) DAG in XML
• Simple planner for constrained Grid
  – Will generate Condor DAGs

25 Next Releases – Features
• RLS integration
• Compound transformations
• Database persistency
• OGSA service
• Other needed clients: C, Tcl, ?
• Expanded execution profiles / planners
  – Support for the WP1 scheduler / broker
  – Support for generic RSL-based schedulers

26 Longer-term Feature Preview
• Instance tracking
• Virtual files and virtual transformations
• Multi-modal data
• Structured namespaces
• Grid-wide distributed catalog service
• Metadata database integration
• Knowledge-base integration

27 Virtual Transformations
• Is the VTR a new VDL construct, or can we just have a transformation?
• Yong suggests that a collection have a start # and an end #
• Example of how an app should update its own VDC entries (to track what files were really used…?)
• Iterators – specify the characteristics of a sequence of files (i.e., how to generate it) without enumerating the list; needed in a *dv* when the list of files can be determined before execution time, or when the list would be too long (see the sketch below)
• This is a case where VTRs might require here-documents
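
A sketch of the iterator idea in the bullet above, written in Python rather than VDL; the naming convention for the logical files is a hypothetical example.

```python
# Sketch: describe a file sequence by a rule instead of enumerating it in the derivation.
def file_sequence(prefix, start, end, ext="raw"):
    """Yield logical file names <prefix>.<start>..<prefix>.<end> without storing the list."""
    for n in range(start, end + 1):
        yield f"{prefix}.{n:06d}.{ext}"

# The derivation would record only (prefix, start #, end #); names are generated on demand.
for lfn in file_sequence("run1.exp15", 1, 5):
    print(lfn)
```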

28 Virtual Transformations (cont.)
• Note that Ed Frank suggests constraining the set of primitives that an app can use to access data, specifically so that the data production *can* be accurately tracked and reproduced.
• Filter –f x –s GB
• Example:

VTR vfilter (seq) {
  foreach $f in (seq) {
    filter –f $f –s $siteSZ   (.5 GB ANL, 2 GB BNL)
  }
}

29 SDSS Extension: Dynamic Dependencies
• Data is organized into spatial cells
• The scope of a search is not known until run time
• In this case – the nearest 9 or 25 cells to a centroid
• Need a dynamic, algorithmic specification of the range of cells to process – a nested loop that generates the actual file names to examine (sketched below)
• In complex cases, it might be a sequence of such centroid-based sequences
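
A sketch of the "nested loop that generates the actual file names"; the cell-indexing scheme and the tsObj-style file-name pattern are assumptions made for illustration.

```python
# Sketch: generate the cell files in an n x n neighbourhood around a centroid cell.
def neighbourhood_files(cx, cy, n=3, pattern="tsObj-{x:04d}-{y:04d}.fit"):
    """Nearest n*n cells (n=3 -> 9 cells, n=5 -> 25 cells) around centroid cell (cx, cy)."""
    half = n // 2
    return [pattern.format(x=cx + dx, y=cy + dy)
            for dx in range(-half, half + 1)
            for dy in range(-half, half + 1)]

print(neighbourhood_files(120, 45))          # 9 files around cell (120, 45)
print(len(neighbourhood_files(120, 45, 5)))  # 25
```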

30 LIGO Example
• Consider 3 (fictitious) channels: c, p, t
• Operations are extract and concatenate:
  ex –i a –s t0 –e tb >ta
  ex –i e –s te –e t1 >te
  cat ta b c d te | filter
  exch p <a –s t0 –e t1
  filter –v p,t
• Examine whether derived metadata handles this concept

31 Distributed Virtual Data Service
• Will parallel the service architecture of the RLS
• …but probably can't use a soft-state approach – needs consistency; can accept latency
• Need a global name space for collaboration-wide information and knowledge sharing
• May use distributed database technology below the covers
• Will leverage a distributed, structured namespace
• Preliminary – not yet designed

32 Distributed Virtual Data Service
[Diagram] A distributed virtual data service spanning the VDCs at Tier 1 centers, regional centers, and local sites, accessed by applications from above.

33 End of presentation

34 Supplementary Material

35 Knowledge Management Architecture
• Knowledge-based requests are formulated in terms of science data
  – E.g., "give me this transform of channels c, p, & t over time range t0–t1"
• The Finder finds the data files
  – Translates the range "t0–t1" into a set of files (see the sketch below)
• The Coder creates an execution plan and defines derivations from known transformations
  – Can deal with missing files (e.g., file c in the LIGO example)
• A knowledge-based request is formulated in terms of virtual datasets
• The Coder translates into logical files
• The Planner translates into physical files
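
A sketch of the Finder step, under the simplifying (and purely illustrative) assumption that each channel is stored as one file per fixed-length time segment; the file-name scheme is invented.

```python
# Sketch: the Finder maps a science-level time range onto the files that cover it,
# assuming one file per fixed-length time segment per channel (assumption for illustration).
SEGMENT = 600  # seconds covered by one file

def finder(channel, t0, t1):
    """Logical files for `channel` covering the half-open interval [t0, t1)."""
    first, last = t0 // SEGMENT, (t1 - 1) // SEGMENT
    return [f"{channel}-{seg * SEGMENT}-{SEGMENT}.gwf" for seg in range(first, last + 1)]

# The request "channels c, p, t over t0..t1" becomes a concrete file list per channel:
for ch in ("c", "p", "t"):
    print(ch, finder(ch, 1000, 2500))
```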

36 User View of the Virtual Data Grid (Scott Koranda, Miron Livny, and others)
[Diagram] Master Condor job running at a Caltech workstation:
2) Launch secondary job on the Wisconsin (WI) pool; input files sent via Globus GASS
3) 100 Monte Carlo jobs run on the Wisconsin Condor pool
4) 100 data files transferred via GridFTP, ~1 GB each
5) Secondary job reports complete to master
6) Master starts reconstruction jobs via the Globus jobmanager on the NCSA Linux cluster
7) GridFTP fetches data from NCSA UniTree (a GridFTP-enabled FTP server)
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master

37 Production Pipeline – GriPhyN-CMS Demo (SC2001 demo version)
1 run = 500 events

Stage        Output file   Data size   CPU time
pythia       truth.ntpl    0.5 MB      2 min
cmsim        hits.fz       175 MB      8 hours
writeHits    hits.DB       275 MB      5 min
writeDigis   digis.DB      105 MB      45 min

38 GriPhyN: Virtual Data – Tracking Complex Dependencies
• The dependency graph is:
  – Files: 8 < (1,3,4,5,7), 7 < 6, (3,4,5,6) < 2
  – Programs: 8 < psearch, 7 < summarize, (3,4,5) < reformat, 6 < conv, (1,2) < simulate
[Diagram] simulate –t 10 … produces file1 and file2; reformat –f fz … produces files 3, 4, 5 and conv –i esd –o aod produces file6 (both from file2); summarize –t 10 … turns file6 into file7; psearch –t 10 … combines files 1, 3, 4, 5, 7 into file8, the requested file.

39 Re-creating Virtual Data
• To re-create file8: Step 1
  – simulate > file1, file2
[Dependency graph diagram as on slide 38]

40 Re-creating Virtual Data
• To re-create file8: Step 2
  – Files 3, 4, 5, 6 are derived from file2
  – reformat > file3, file4, file5
  – conv > file6
[Dependency graph diagram as on slide 38]

41 Re-creating Virtual Data
• To re-create file8: Step 3
  – File7 depends on file6
  – summarize > file7
[Dependency graph diagram as on slide 38]

42 Re-creating Virtual Data
• To re-create file8: final step
  – File8 depends on files 1, 3, 4, 5, 7
  – psearch > file8
[Dependency graph diagram as on slide 38]
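
The four steps above amount to a depth-first walk of the dependency graph: recreate any missing inputs first, then run the program that produces the requested file. A minimal sketch, with the file and program names taken from slide 38 and the control logic invented for illustration:

```python
# Sketch: recreate a virtual file by recursively recreating its missing inputs first.
# Dependency data mirrors slide 38: 8 < (1,3,4,5,7), 7 < 6, (3,4,5,6) < 2.
produced_by = {          # file -> (program, input files)
    "file1": ("simulate",  []),
    "file2": ("simulate",  []),
    "file3": ("reformat",  ["file2"]),
    "file4": ("reformat",  ["file2"]),
    "file5": ("reformat",  ["file2"]),
    "file6": ("conv",      ["file2"]),
    "file7": ("summarize", ["file6"]),
    "file8": ("psearch",   ["file1", "file3", "file4", "file5", "file7"]),
}
materialized = set()     # nothing exists yet

def recreate(f):
    if f in materialized:
        return
    program, inputs = produced_by[f]
    for i in inputs:
        recreate(i)
    print(f"run {program} -> {f}")   # a real planner would run simulate once for file1+file2
    materialized.add(f)

recreate("file8")
# run simulate -> file1 ... run reformat -> file3/4/5, run conv -> file6,
# run summarize -> file7, run psearch -> file8
```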

43 SDSS Galaxy Cluster Finding

44 Cluster-finding Grid (work of Yong Zhao, James Annis, and others)

45 Cluster-finding pipeline execution

46 Virtual Data in CMS
Virtual Data Long-Term Vision of CMS: CMS Note 2001/047, GriPhyN 2001-16

47 CMS Data Analysis
[Diagram] The dominant use of virtual data in the future: for each event (Event 1, 2, 3), uploaded data (raw data, simulated or real, plus calibration data) feed virtual data products (reconstructed data produced by physics analysis jobs, jet-finder outputs, and tags) via algorithms such as the reconstruction algorithm, Jet finder 1/2, and Tag 1/2; object sizes in the diagram range from ~100 bytes (tags) to ~300 KB.

48 Topics – Planner
• Does the planner have a queue? What do the presence and absence of a queue imply?
• How is responsibility partitioned between the planner and the executor (cluster scheduler)?
• How does the planner estimate times if it has only partial responsibility for when/where things run?
• How does a cluster scheduler assign CPUs – dedicated or shared?
• See Miron's email on NeST for more questions
• Use of an execution profiler in the planner architecture?
  – Characterize the resource requirements of an app over time
  – Parameterize the resource requirements of an app w.r.t. its (salient) parameters

49 Planner Context
• Map of grid resources
• Status of grid resources
  – State (up/down)
  – Load
  – Dedication (commitment of a resource to a VO or group based on policy)
• Policy
• Request queue (with lookahead, or processed sequentially?)

50 CAS and SAS
• Site Authorization Service
  – How does a physical site control the policy by which its resources get used?
  – How do a SAS and a CAS interact?
  – Can a resource interpret restricted proxies from multiple CASes? (Yes, but not from arbitrary CASes)
  – Consider MPI and MPICH-G jobs – how would the latter be handled?
  – Consider: if P2 schedules a whole DAG up front, the schedule ends up using outdated information

51 Planner Architecture

52 Policy
• Focuses on security and configuration (controlled resource sharing/allocation)
• Allocation example (see the sketch below):
  – "cms should get 90% of the resources at Caltech"
  – Issues of fair-share scheduling
• How to factor in time quanta: CPU-hours, GB-days
• Relationship to accounting
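
A toy illustration of the allocation statement above; the 90% figure is the slide's example, while the accounting window, CPU-hour numbers, and data layout are invented for illustration.

```python
# Sketch: check a VO's fair-share allocation against consumed CPU-hours at a site.
policy = {("cms", "caltech"): 0.90, ("atlas", "caltech"): 0.10}     # share of site capacity
usage_cpu_hours = {("cms", "caltech"): 8200.0, ("atlas", "caltech"): 400.0}
site_capacity_cpu_hours = {"caltech": 10000.0}                      # current accounting window

def within_share(vo, site):
    allowed = policy[(vo, site)] * site_capacity_cpu_hours[site]
    return usage_cpu_hours[(vo, site)] <= allowed

print(within_share("cms", "caltech"))    # True: 8200 <= 9000
print(within_share("atlas", "caltech"))  # True: 400 <= 1000
```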

53 Policy and the Planner
• The planner considers:
  – Policy (fairly static, from CAS/SAS)
  – Grid status
  – Job (user/group) resource consumption history
  – Job profiles (resources over time) from Prophesy
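
One way those inputs might be combined is a simple site ranking; the weights, fields, and site data below are invented purely to illustrate the idea, not a description of the actual planner.

```python
# Sketch: rank candidate sites using policy share, current load, and past job history.
sites = {
    "caltech":   {"share": 0.9, "load": 0.7, "past_success": 0.95},
    "wisconsin": {"share": 0.5, "load": 0.2, "past_success": 0.90},
    "florida":   {"share": 0.3, "load": 0.1, "past_success": 0.80},
}

def score(s):
    # invented weighting: prefer a high policy share, low current load, good track record
    return 0.5 * s["share"] + 0.3 * (1.0 - s["load"]) + 0.2 * s["past_success"]

ranked = sorted(sites, key=lambda name: score(sites[name]), reverse=True)
print(ranked)   # ['caltech', 'wisconsin', 'florida']
```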

54 GriPhyN/PPDG Data Grid Architecture
[Diagram] An application hands an abstract DAG to the planner; the planner produces a concrete DAG for the executor (DAGMan, Kangaroo), which drives compute resources (GRAM) and storage resources (GridFTP, GRAM, SRM). Supporting services: catalog services (MCAT, GriPhyN catalogs), replica management (GDMP), reliable transfer service, information services (MDS), monitoring (MDS), and policy/security (GSI, CAS), largely built on Globus.

55 (Evolving) View of the Data Grid Stack
[Diagram] Stack components: Data Transport (GridFTP), Storage Element, Storage Element Manager, Local Replica Catalog (flat or hierarchical), Reliable File Transfer, Replica Location Service, Reliable Replication, and Publish-Subscribe Service (GDMP).

56 Executor Example: Condor DAGMan
• Directed Acyclic Graph Manager
• Specify the dependencies between Condor jobs using a DAG data structure
• Manages dependencies automatically
  – (e.g., "Don't run job B until job A has completed successfully.")
• Each job is a "node" in the DAG
• Any number of parent or child nodes
• No loops
[Diagram] Job A → Job B, Job C → Job D
(Slide courtesy of Miron Livny, U. Wisconsin)
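
For the four-job diamond shown in the figure, a DAGMan input file would look roughly like the following; the submit-file names are placeholders.

```
JOB A a.submit
JOB B b.submit
JOB C c.submit
JOB D d.submit
PARENT A CHILD B C
PARENT B C CHILD D
```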

57 Executor Example: Condor DAGMan (cont.)
• DAGMan acts as a "meta-scheduler"
  – holds and submits jobs to the Condor queue at the appropriate times, based on DAG dependencies
• If a job fails, DAGMan continues until it can no longer make progress and then creates a "rescue" file with the current state of the DAG
  – When the failed job is ready to be re-run, the rescue file is used to restore the prior state of the DAG
[Diagram] DAGMan feeding jobs into the Condor job queue
(Slide courtesy of Miron Livny, U. Wisconsin)

58 DAG Usage
• Abstract DAG
  – Represents user requests
  – Simplest case: a request for one or more data products
  – Complex case: request execution of a chained set of applications
  – No file or execution locations need be present
• Concrete DAG
  – Specifies any application invocations needed to derive the data
  – Specifies the locations of all invocations (to the site level)
  – Includes explicit job steps to move data (see the sketch below)
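
A sketch of the abstract-to-concrete step described above: pin each job to a site and insert an explicit data-movement step wherever a file's producer and consumer end up at different sites. Site names and the transfer mechanism are illustrative assumptions.

```python
# Sketch: concretize an abstract DAG (no locations) into site-pinned jobs plus transfers.
abstract = {                                 # job -> (input LFNs, output LFNs)
    "generate": ([], ["f.a"]),
    "analyze":  (["f.a"], ["f.d"]),
}
site_choice = {"generate": "uchicago", "analyze": "wisconsin"}   # supplied by the planner

producer = {lfn: job for job, (_, outs) in abstract.items() for lfn in outs}

concrete = [f"run {job} at {site_choice[job]}" for job in abstract]
for job, (inputs, _) in abstract.items():
    for lfn in inputs:
        src = site_choice.get(producer.get(lfn), "replica-catalog site")
        if src != site_choice[job]:
            # explicit job step to move the file, e.g. a GridFTP transfer
            concrete.append(f"move {lfn}: {src} -> {site_choice[job]}")

print("\n".join(concrete))   # run generate at uchicago / run analyze at wisconsin / move f.a ...
```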

59 Strawman Architecture

60 The GriPhyN Charter
"A virtual data grid enables the definition and delivery of a potentially unlimited virtual space of data products derived from other data. In this virtual space, requests can be satisfied via direct retrieval of materialized products and/or computation, with local and global resource management, policy, and security constraints determining the strategy used."

61 GriPhyN-LIGO SC2001 Demo

62 GriPhyN CMS SC2001 Demo
"Bandwidth Greedy Grid-enabled Object Collection Analysis for Particle Physics"
[Diagram] A request against a "tag" database of ~140,000 small objects drives access to full event databases of ~100,000 and ~40,000 large objects over parallel tuned GSI FTP.
http://pcbunn.cacr.caltech.edu/Tier2/Tier2_Overall_JJB.htm

63 Virtual Data in Action
• A data request may:
  – Access local data
  – Compute locally
  – Compute remotely
  – Access remote data
• Scheduling and execution are subject to local and global policies

