Download presentation
Presentation is loading. Please wait.
Published byShanon Lane Modified over 9 years ago
1
Data Access for Analysis Jeff Templon PDP Groep, NIKHEF A. Tsaregorodtsev, F. Carminati, D. Liko, R. Trompert GDB Meeting 8 march 2006
2
Jeff Templon – GDB Meeting, CERN, 2006.03.08 - 2 Use Case: Reprocessing or Analysis u Job sent to site whose SE has (or is “close”?) to data u Job shows up and needs to “do something” prior to “open” u What???
3
Jeff Templon – GDB Meeting, CERN, 2006.03.08 - 3 Approaches seen in the wild u “copy to local storage” : lcg-cp to TMPDIR or equivalent, read from there (nb good to know where local storage is!) u (gsi)rfio : apparently supported by all DPMs but spotty elsewhere u (gsi)dcap: horribly insecure (this just in: doesn’t have to be this way) u Xrootd: only used by ALICE? u GFAL: seems to be somewhat unknown in app community
4
Jeff Templon – GDB Meeting, CERN, 2006.03.08 - 4 LHCb example (next several slides): Job access to the input data The DIRAC job wrapper ensures access to the input sandbox and data before starting the application Downloads input sandbox files to the local directory Currently InputSandbox is a DIRAC WMS specific service Can also use generic LFNs which is to become the main mode of operation
5
Jeff Templon – GDB Meeting, CERN, 2006.03.08 - 5 Job access to the input data (2) u Resolves the input data LFN into a “best replica” PFN for the execution site n Contacts the central LFC File Catalog for replica information n Picks up the replica on a local storage if any u Attempts to stage the data files (using lcg-gt) n File staging n Getting the TURL of the staged file accessible with dcap or rfio protocols n Returns turl immediately regardless of file’s staging status n Also needs file pinning not yet available
6
Jeff Templon – GDB Meeting, CERN, 2006.03.08 - 6 Job access to the input data (3) u If the previous step fails, constructs the TURL based on the information stored in the Configuration service n E.g. rfio:/castor/cern.ch/grid/lhcb/production/DC04/v2/ u If the previous step fails (e.g. no adequate protocol available for the site), bring datasets local u Construct the POOL XML slice with the LFN to PFN mapping to be used by applications
7
Jeff Templon – GDB Meeting, CERN, 2006.03.08 - 7 HEPCAL (2002) The issues regarding DS access in support of analysis jobs are largely addressed in HEPCAL, which assumed that the Data Management System would transparently optimize data access on the user’s behalf. HEPCAL anticipated that at least the following options would be considered by the DMS: 1.Access (possibly via remote protocol) to an existing physical copy of the DS; 2.Making a new replica to an SE – because this SE has file-systems mountable from the chosen worker node, or perhaps it supports the protocol requested by the application – and arranging for the user program to access this new one; 3.Making a local copy to temporary storage at the worker node where the job is running; 4.If a virtual definition of the dataset exists, materializing the DS to either a suitable SE or local temporary storage at the node where the job will run. The user will in general not be aware of this; her program will just “open the DS”. Subsequent reads on the returned handle will “get the bytes”.
8
Jeff Templon – GDB Meeting, CERN, 2006.03.08 - 8 What is the answer? GFAL? u If this is the answer, why don’t large exp’ts such as ATLAS know much about it? u ALICE wants xrootd, GFAL ‘not enough’ u Are there sites that want to support multiple file access pools?? u Can we get a GFAL presentation soon in the GDB about n What it can do n What underlying site protocols (gsiftp / rfio /dcap / xrootd) it wraps n Vision for the future
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.