Virtual Data in CMS Analysis A.Arbree, P.Avery, D.Bourilkov, R.Cavanaugh, G.Graham, J.Rodriguez, M.Wilde, Y.Zhao CMS & GriPhyN CHEP03, La Jolla, California.


1 Virtual Data in CMS Analysis A.Arbree, P.Avery, D.Bourilkov, R.Cavanaugh, G.Graham, J.Rodriguez, M.Wilde, Y.Zhao CMS & GriPhyN CHEP03, La Jolla, California March 25, 2003

2 Virtual Data
D. Bourilkov, Virtual Data in CMS Analysis
We already do this, but manually!
Webster dictionary: vir·tu·al. Function: adjective. Etymology: Middle English, possessed of certain physical virtues, from Medieval Latin virtualis, from Latin virtus strength, virtue.
- Most scientific data are not simple “measurements”: they are produced from increasingly complex computations (e.g. reconstructions, calibrations, selections, simulations, fits).
- HEP (and other sciences) are increasingly CPU- and data-intensive.
- Programs are significant community resources (transformations).
- So are the executions of those programs (derivations).
- Management of dataset transformations is important!
- Derivation: instantiation of a potential data product.
- Provenance: exact history of any existing data product.

3 Virtual Data Motivations
Diagram labels: Transformation, Derivation, Data; relations: product-of, execution-of, consumed-by/generated-by.
- “I’ve detected a muon calibration error and want to know which derived data products need to be recomputed.”
- “I’ve found some interesting data, but I need to know exactly what corrections were applied before I can trust it.”
- “I want to search a database for 3 rare electron events. If a program that does this analysis exists, I won’t have to write one from scratch.”
- “I want to apply a forward jet analysis to 100M events. If the results already exist, I’ll save weeks of computation.”
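The first question on this slide (which derived products must be recomputed after a calibration error) can be read as a downstream traversal of the provenance graph. The following is a minimal Python sketch under stated assumptions: the derivation names, file names, and the `downstream_products` helper are all hypothetical and are not part of CHIMERA.

```python
from collections import deque

# Hypothetical provenance records: derivation name -> (input files, output files).
DERIVATIONS = {
    "calibrate": (["raw_hits", "muon_calib"], ["calibrated_hits"]),
    "reconstruct": (["calibrated_hits"], ["tracks"]),
    "select": (["tracks"], ["dimuon_sample"]),
    "simulate": (["gen_cards"], ["sim_hits"]),
}


def downstream_products(changed, derivations):
    """Return every data product that depends, directly or indirectly, on `changed`."""
    stale = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for inputs, outputs in derivations.values():
            if current in inputs:
                for out in outputs:
                    if out not in stale:
                        stale.add(out)
                        queue.append(out)
    return stale


if __name__ == "__main__":
    # Products that become stale when the (hypothetical) muon calibration changes.
    print(sorted(downstream_products("muon_calib", DERIVATIONS)))
```

Products on unrelated branches (here `sim_hits`) are left alone, which is the saving the slide is after.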

4 Virtual Data Motivations
- Data trackability and result auditability: “Virtual Logbook”
- In the nature of science
- Reproducibility of results
- Tools and data sharing and collaboration (data with “recipe”)
- Individuals discover other scientists’ work and build from it
- Different teams can work in a modular, semi-autonomous fashion: reuse previous data/code/results or entire analysis chains
- Repair and correction of data (cf. “make”)
- Workflow management, performance optimization: data staged in from a remote site OR re-created locally on demand?
- Transparency with respect to location and existence

5 Introducing CHIMERA: The GriPhyN Virtual Data System
- Virtual Data Language
  - textual (concise, for human consumption)
  - XML (uses XML schema, for component integration)
- Virtual Data Interpreter
  - implemented in Java
  - Java API and command-line toolkit
- Virtual Data Catalog
  - tracks data provenance (acts like a metadata repository)
  - different back-ends for persistency:
    - PostgreSQL and MySQL DB
    - file based (for easy testing)

6 Virtual Data in CHIMERA
A “function call” paradigm
- Virtual data: data objects with a well-defined method of (re)production
Transformation: [namespace]::identifier:[version]
- Abstract description of how a script/executable is invoked
- Similar to a "function declaration" in C/C++
Derivation: [namespace]::identifier:[version range]
- Invocation of a transformation with specific arguments
- Similar to a "function call" in C/C++
- Can be either past or future:
  - a record of how logical files were produced
  - a recipe for creating logical files at some point in the future
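The declaration/call analogy can be made concrete with two small record types. This is an illustrative Python sketch, not CHIMERA's Java API: the class names, fields, and the `inputs`/`outputs` helpers are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Transformation:
    """Abstract description of an invocation, like a function declaration."""
    name: str
    formal_args: tuple  # (direction, name) pairs; direction is "in", "out" or "none"


@dataclass
class Derivation:
    """A transformation bound to actual arguments, like a function call."""
    name: str
    transformation: Transformation
    actual_args: dict  # formal name -> logical file name or literal value

    def _bound(self, direction):
        # Collect the actual values bound to formals of the given direction.
        return [
            self.actual_args[formal]
            for kind, formal in self.transformation.formal_args
            if kind == direction
        ]

    def inputs(self):
        return self._bound("in")

    def outputs(self):
        return self._bound("out")


# Hypothetical example: one declaration, one call.
pythia = Transformation("pythia", (("out", "a2"), ("in", "a1"), ("none", "param")))
x1 = Derivation("x1", pythia, {"a2": "file2", "a1": "file1", "param": "160.0"})

if __name__ == "__main__":
    print(x1.inputs(), x1.outputs())
```

Whether `x1` describes a past run or a future one is not encoded here; in this sketch it is simply a record that can be stored and replayed.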

7 Virtual Data Language
    TR pythia( out a2, in a1, none param="160.0" ) {
      argument arg = ${param};
      argument file = ${a1};
      argument file = ${a2};
    }
    TR cmsim( out a2, in a1[] ) {
      argument files = ${a1};
      argument file = ${a2};
    }
    DV x1->pythia( a2=@{out:file2}, a1=@{in:file1} );
    DV x2->cmsim( a2=@{out:file3}, a1=[@{in:file2}, @{in:cardfile}] );
Diagram: file1 -> x1 -> file2; file2, cardfile -> x2 -> file3. Labels: "Build-style recipe", "Make-style recipe".
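The two derivations on this slide imply an execution order, because `x2` consumes the `file2` that `x1` produces. The Python sketch below (standard library `graphlib`, Python 3.9 or later) shows one way to recover that order from input/output declarations alone. The dictionary layout and the `execution_order` function are illustrative assumptions, not how CHIMERA stores or plans derivations.

```python
from graphlib import TopologicalSorter

# The two derivations from the slide, reduced to their logical inputs and outputs.
DERIVATIONS = {
    "x1": {"in": ["file1"], "out": ["file2"]},
    "x2": {"in": ["file2", "cardfile"], "out": ["file3"]},
}


def execution_order(derivations):
    """Order derivations so that every producer runs before its consumers."""
    # Map each logical file to the derivation that produces it.
    producer = {
        lfn: name for name, dv in derivations.items() for lfn in dv["out"]
    }
    # A derivation depends on the producers of its inputs; files with no
    # producer (file1, cardfile) are treated as pre-existing.
    depends_on = {
        name: {producer[lfn] for lfn in dv["in"] if lfn in producer}
        for name, dv in derivations.items()
    }
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(execution_order(DERIVATIONS))  # ['x1', 'x2']
```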

8 Abstract and Concrete DAGs
Abstract DAXs (Virtual Data DAG): abstract directed acyclic graph with logical names for files/executables (complete build-style recipe as DAX)
- Resource locations unspecified
- File names are logical
- Data destinations unspecified
Concrete DAGs (stuff for DAGMan): CONDOR-style DAG for grid execution (check RC, skip steps, make-style)
- Resource locations determined
- Physical file names specified
- Data delivered to and returned from physical locations
Diagram labels: VDL, XML, VDC, Abs. Plan, DAX, RC, C. Plan., DAG, DAGMan; Logical, Physical.
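The "check RC, skip steps, make-style" step can be sketched as a pass over the abstract jobs that drops any job whose outputs already have replicas and rewrites logical names to physical ones. The Python below is a simplified illustration only: the `concretize` function, the job tuples, and the use of a plain dict as a replica catalog are assumptions, and a real planner also has to choose sites and add transfer jobs.

```python
def concretize(abstract_jobs, replica_catalog, storage_prefix):
    """Turn abstract jobs into concrete ones, skipping jobs whose outputs exist.

    abstract_jobs: list of (job name, input LFNs, output LFNs) in execution order.
    replica_catalog: dict mapping a logical file name to a physical URL.
    storage_prefix: URL prefix under which new outputs will be written.
    """
    concrete = []
    for job, inputs, outputs in abstract_jobs:
        if all(lfn in replica_catalog for lfn in outputs):
            continue  # make-style: every output already has a replica
        concrete.append({
            "job": job,
            # Use a known replica if there is one; otherwise assume an
            # upstream job will have written the file under storage_prefix.
            "inputs": [replica_catalog.get(lfn, storage_prefix + lfn) for lfn in inputs],
            "outputs": [storage_prefix + lfn for lfn in outputs],
        })
    return concrete


# Hypothetical data: file2 already exists, so x1 does not need to run again.
ABSTRACT = [
    ("x1", ["file1"], ["file2"]),
    ("x2", ["file2", "cardfile"], ["file3"]),
]
REPLICAS = {
    "file1": "gsiftp://localhost/mydir/file1",
    "file2": "gsiftp://localhost/mydir/file2",
    "cardfile": "gsiftp://localhost/mydir/cardfile",
}

if __name__ == "__main__":
    for step in concretize(ABSTRACT, REPLICAS, "gsiftp://localhost/mydir/"):
        print(step)
```

With an empty replica catalog the same call keeps both jobs, which corresponds to the complete build-style recipe.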

9 Nitty-Gritty
Transformation catalog (expects pre-built executables)
    #poolname  ltransformation  physical transformation      environment String
    local      hw               /bin/echo                    null
    local      pythcvs          /workdir/lhc-h-6-cvs         null
    local      pythlin          /workdir/lhc-h-6-link        null
    local      pythgen          /workdir/lhc-h-6-run         null
    local      pythtree         /workdir/h2root.sh           null
    local      pythview         /workdir/root.sh             null
    local      GriphynRC        /vdshome/bin/replica-catalog JAVA_HOME=/vdt/jdk1.3;VDS_HOME=/vdshome
    local      globus-url-copy  /vdt/bin/globus-url-copy     GLOBUS_LOCATION=/vdt;LD_LIBRARY_PATH=/vdt/lib
    ufl        hw               /bin/echo                    null
    ufl        GriphynRC        /vdshome/bin/replica-catalog JAVA_HOME=/vdt/jdk1.3.1_04;VDS_HOME=/vdshome
    ufl        globus-url-copy  /vdt/bin/globus-url-copy     GLOBUS_LOCATION=/vdt;LD_LIBRARY_PATH=/vdt/lib
Pool configuration
    #pool  universe  job-manager-string              url-prefix               workdir...
    ufl    vanilla   testulix/jm-condor-INTEL-LINUX  gsiftp://testulix/mydir  /mydir
    ufl    standard  testulix/jm-condor-INTEL-LINUX  gsiftp://testulix/mydir  /mydir
    ufl    globus    testulix/jm-condor-INTEL-LINUX  gsiftp://testulix/mydir  /mydir
    ufl    transfer  testulix/jobmanager             gsiftp://testulix/mydir  /mydir
    local  vanilla   localhost/jm-condor             gsiftp://localhost/mydir /mydir
    local  globus    localhost/jm-condor             gsiftp://localhost/mydir /mydir
    local  transfer  localhost/jobmanager            gsiftp://localhost/mydir /mydir
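A transformation catalog in this layout is easy to read into a lookup table keyed by pool and logical transformation name. The parser below is a hedged Python sketch: it assumes four whitespace-separated columns, `null` for an empty environment, and `;`-separated `NAME=value` pairs, as the rows on this slide suggest. The `parse_transformation_catalog` function is hypothetical and is not a CHIMERA tool.

```python
def parse_transformation_catalog(text):
    """Parse catalog lines into {(pool, logical name): (physical path, env dict)}."""
    catalog = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        pool, logical, physical, env = line.split(None, 3)
        if env == "null":
            env_vars = {}
        else:
            # Environment entries look like NAME=value;NAME=value
            env_vars = dict(item.split("=", 1) for item in env.split(";"))
        catalog[(pool, logical)] = (physical, env_vars)
    return catalog


# A few rows copied from the slide.
SAMPLE = """\
#poolname ltransformation physical-transformation environment-String
local hw /bin/echo null
local pythgen /workdir/lhc-h-6-run null
ufl GriphynRC /vdshome/bin/replica-catalog JAVA_HOME=/vdt/jdk1.3.1_04;VDS_HOME=/vdshome
"""

if __name__ == "__main__":
    for key, value in parse_transformation_catalog(SAMPLE).items():
        print(key, value)
```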

10 Data Analysis in HEP
- Decentralized, “chaotic”
- Flexible enough system: able to accommodate a large user base and use cases that we can’t foresee
- Ability to build scripts/executables “on the fly”, including user-supplied code/parameters (possibly linking with preinstalled libraries on the execution sites)

11 Prototypes
First for SC2002, second for CHEP03
Diagram labels: CVS tag; FORTRAN code; datacards; libraries; version N; executable; root wrapper; h2root; PYTHIA wrapper; compile, link; CVS; plots; ntuples; root trees; event displays; C++ code.

12 Prototypes
CHIMERA/ROOT prototype for generating events with PYTHIA/CMKIN, histogramming and visualization

13 A virtual space of simulated data is created for future use by scientists...
Diagram nodes:
- mass = 160; decay = WW; WW → eνμν; event = 8
- mass = 160; decay = WW; WW → eνμν; plot = 1
- mass = 160; decay = WW; plot = 1
- mass = 160; decay = WW; event = 8
- mass = 160; decay = WW; WW → eνμν
- mass = 160; decay = WW; WW → leptons
- mass = 160; decay = WW
- mass = 160; decay = ZZ
- mass = 160; decay = bb
- mass = 160; plot = 1
- mass = 160; event = 8

14 Search for WW decays of the Higgs boson where the Ws decay to electron and muon: mass = 160; decay = WW; WW → eνμν
Diagram: the same virtual data space as on slide 13.

15 Scientist obtains an interesting result and wants to track how it was derived.
Diagram: the same virtual data space as on slide 13.

16 Now the scientist wants to dig deeper...
Diagram: the same virtual data space as on slide 13.

17 ...The scientist adds a new derived data branch... ...and continues to investigate!
Diagram: the same virtual data space as on slide 13, with one new node: mass = 160; decay = WW; WW → eνμν; Pt > 20.
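The walk through slides 13 to 17 (browse the space, trace a result, add a branch) can be mimicked with a small parameter-keyed tree. This Python sketch is purely illustrative: the `VirtualDataSpace` class, its methods, and parameter names such as `channel` and `pt_min` are assumptions and do not reflect the prototype's actual data model.

```python
class VirtualDataSpace:
    """Nodes are identified by their full parameter set; each remembers its parent."""

    def __init__(self):
        self._parent = {}  # node key -> parent key (None for a root node)

    @staticmethod
    def _key(params):
        # A sorted tuple of (name, value) pairs is hashable and order-independent.
        return tuple(sorted(params.items()))

    def derive(self, parent=None, **params):
        """Add (or find) the node that refines `parent` with extra parameters."""
        merged = dict(parent or ())
        merged.update(params)
        key = self._key(merged)
        self._parent.setdefault(key, parent)  # keep the first recorded parent
        return key

    def lineage(self, key):
        """Return the chain of parameter sets from the root down to `key`."""
        chain = []
        while key is not None:
            chain.append(dict(key))
            key = self._parent[key]
        return chain[::-1]


space = VirtualDataSpace()
root = space.derive(mass=160)
ww = space.derive(root, decay="WW")
emu = space.derive(ww, channel="e mu")
cut = space.derive(emu, pt_min=20)  # the new branch added on slide 17

if __name__ == "__main__":
    for step in space.lineage(cut):
        print(step)
```

Requesting an existing parameter set returns the node already in the space, so repeated requests do not create duplicates.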

18 A Collaborative Data-flow Development Environment: Complex Data Flow and Data Provenance in HEP
Diagram labels: Raw, ESD, AOD, TAG; Plots, Tables, Fits; Comparisons; Plots, Tables, Fits; Real Data; Simulated Data.
- History of a Data Analysis (like CVS)
- "Check-point" a Data Analysis
- Analysis Development Environment
- Audit a Data Analysis

19 Outlook
- Work in progress on both the CHIMERA and CMS sides: a “snapshot”
- A CHIMERA/ROOT prototype for building executables “on the fly”, generating events with PYTHIA/CMKIN, plotting and visualization is available (CHIMERA is a great integration tool)
- The full CMS Monte Carlo chain is working under CHIMERA (next talk)
- Possible future directions:
  - Workflow management; automatic generation; inheritance …
  - Store metadata about derivations (like annotations) in a searchable catalog
  - Handle Datasets, not just Logical File Names
  - Integration with CLARENS (remote access), with ROOT/PROOF (run in parallel)
- A picture is better than 1000 words: Prototype Demo
