Data Access for Analysis
Jeff Templon (PDP Group, NIKHEF), A. Tsaregorodtsev, F. Carminati, D. Liko, R. Trompert
GDB Meeting, 8 March 2006

Use Case: Reprocessing or Analysis
- A job is sent to a site whose SE has (or is "close" to?) the data
- The job shows up and needs to "do something" prior to calling "open"
- What???

Approaches seen in the wild
- "Copy to local storage": lcg-cp to TMPDIR or equivalent, then read from there (NB: good to know where local storage is!); see the sketch after this list
- (gsi)rfio: apparently supported by all DPMs, but spotty elsewhere
- (gsi)dcap: horribly insecure (this just in: it doesn't have to be this way)
- xrootd: only used by ALICE?
- GFAL: seems to be somewhat unknown in the application community
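As an illustration only, a minimal Python sketch of the "copy to local storage" approach, assuming the lcg_util command-line tools (here lcg-cp) are available on the worker node; the VO name and the example LFN are placeholders, not actual data references from the slides.

    import os
    import subprocess

    def copy_to_local(lfn, vo="lhcb"):
        """Copy a grid file to local scratch space and return its local path."""
        # TMPDIR "or equivalent": good to know where local storage actually is.
        tmpdir = os.environ.get("TMPDIR", "/tmp")
        local_path = os.path.join(tmpdir, os.path.basename(lfn))
        # lcg-cp resolves the LFN via the file catalogue and copies it locally.
        subprocess.check_call(
            ["lcg-cp", "--vo", vo, "lfn:" + lfn, "file://" + local_path])
        return local_path

    # The application then simply opens the local copy, e.g.:
    # path = copy_to_local("/grid/lhcb/production/DC04/v2/some_file.dst")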

LHCb example (next several slides): Job access to the input data
- The DIRAC job wrapper ensures access to the input sandbox and the input data before starting the application
- It downloads the input sandbox files to the local directory
- Currently the InputSandbox is a DIRAC WMS-specific service
- It can also use generic LFNs, which is to become the main mode of operation

Job access to the input data (2)
- Resolves each input data LFN into a "best replica" PFN for the execution site (sketched below):
  - Contacts the central LFC File Catalogue for replica information
  - Picks the replica on local storage, if there is one
- Attempts to stage the data files (using lcg-gt):
  - Triggers file staging
  - Gets a TURL for the staged file, accessible with the dcap or rfio protocol
  - Note: the TURL is returned immediately, regardless of the file's staging status
  - File pinning is also needed, but not yet available
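A rough Python sketch of this replica-resolution and staging step, again for illustration only: it assumes the lcg_util tools lcg-lr and lcg-gt are on PATH and that the first token printed by lcg-gt is the TURL; the local SE hostname and VO are placeholders, and this is not the actual DIRAC job wrapper code.

    import subprocess

    def resolve_turl(lfn, local_se="srm.example-site.org",
                     protocols=("dcap", "rfio"), vo="lhcb"):
        # Ask the LFC (via lcg-lr) for all replicas (SURLs) of this LFN.
        surls = subprocess.check_output(
            ["lcg-lr", "--vo", vo, "lfn:" + lfn], text=True).split()
        if not surls:
            return None

        # Prefer a replica on the local storage element, if there is one.
        local = [s for s in surls if local_se in s]
        surl = local[0] if local else surls[0]

        # Try to stage the file and obtain a TURL for an acceptable protocol.
        # lcg-gt returns the TURL immediately, whether or not the file is
        # actually staged, and there is no pinning yet (see above).
        for proto in protocols:
            try:
                out = subprocess.check_output(["lcg-gt", surl, proto], text=True)
                return out.split()[0]   # assume the first token is the TURL
            except subprocess.CalledProcessError:
                continue
        return None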

Job access to the input data (3)
- If the previous step fails, constructs the TURL from the information stored in the Configuration service
  - E.g. rfio:/castor/cern.ch/grid/lhcb/production/DC04/v2/
- If that also fails (e.g. no adequate protocol is available at the site), brings the datasets local
- Constructs the POOL XML slice with the LFN-to-PFN mapping to be used by the applications (sketched below)
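Purely as an illustration (not the real DIRAC code), a Python sketch of this fallback chain and of writing the POOL XML slice. It reuses copy_to_local and resolve_turl from the earlier sketches; the site protocol prefix is an illustrative value consistent with the slide's rfio example, and a real POOL catalogue uses file GUIDs rather than LFNs as the File ID.

    def access_pfn(lfn, site_prefix="rfio:/castor/cern.ch"):
        turl = resolve_turl(lfn)           # step 1: TURL via lcg-gt
        if turl:
            return turl
        if site_prefix:                    # step 2: prefix from the Configuration service
            return site_prefix + lfn
        return copy_to_local(lfn)          # step 3: bring the dataset local

    def pool_xml_slice(lfn_to_pfn):
        """Write an LFN -> PFN mapping as a POOL XML catalogue slice (string)."""
        entries = "\n".join(
            '  <File ID="%s">\n'
            '    <physical><pfn filetype="ROOT" name="%s"/></physical>\n'
            '    <logical><lfn name="%s"/></logical>\n'
            '  </File>' % (lfn, pfn, lfn)
            for lfn, pfn in lfn_to_pfn.items())
        return ('<?xml version="1.0" encoding="UTF-8"?>\n'
                '<POOLFILECATALOG>\n%s\n</POOLFILECATALOG>\n' % entries)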

HEPCAL (2002)
The issues regarding DS (dataset) access in support of analysis jobs are largely addressed in HEPCAL, which assumed that the Data Management System (DMS) would transparently optimize data access on the user's behalf. HEPCAL anticipated that at least the following options would be considered by the DMS:
1. Access (possibly via a remote protocol) to an existing physical copy of the DS;
2. Making a new replica on an SE – because this SE has file systems mountable from the chosen worker node, or perhaps it supports the protocol requested by the application – and arranging for the user program to access this new one;
3. Making a local copy to temporary storage at the worker node where the job is running;
4. If a virtual definition of the dataset exists, materializing the DS to either a suitable SE or to local temporary storage at the node where the job will run.
The user will in general not be aware of this; her program will just "open the DS", and subsequent reads on the returned handle will "get the bytes".

What is the answer? GFAL?
- If this is the answer, why don't large experiments such as ATLAS know much about it?
- ALICE wants xrootd; GFAL is 'not enough'
- Are there sites that want to support multiple file access pools??
- Can we get a GFAL presentation in the GDB soon, covering:
  - What it can do
  - Which underlying site protocols (gsiftp / rfio / dcap / xrootd) it wraps
  - The vision for the future