MICE Data Flow – Henry Nebrensky, Brunel University – MICE DAQ review, 4 June 2009

Henry Nebrensky – MICE DAQ review - 4 June 2009 MICE Data Flow Henry Nebrensky Brunel University 1

Henry Nebrensky – MICE DAQ review - 4 June 2009 The Awesome Power of Grid Computing 2
 - The Grid provides seamless interconnection between tens of thousands of computers.
 - It therefore generates new acronyms and jargon at superhuman speed.

Henry Nebrensky – MICE DAQ review - 4 June 2009 MICE and Grid Data Storage 3
 - The Grid provides MICE not only with computing (number-crunching) power, but also with a secure global framework allowing users access to data.
   - Good news: storing development data on the Grid keeps it available to the collaboration – not stuck on an old PC in the corner of the lab.
   - Bad news: loss of ownership – who picks up the data curation responsibilities?
 - Data can be downloaded from the Grid to a user’s “own” PC – it doesn’t need to be analysed remotely.

Henry Nebrensky – MICE DAQ review - 4 June 2009 Grid Middleware 4
 - We are currently using EGEE/WLCG middleware and resources, as they are receiving significant development effort and are a reasonable match for our needs.
 - Outside Europe other software may be expected – e.g. the OSG stack in the US. We have not yet investigated interoperability.
 - In the worst case, users would have to install a gLite UI locally.

Henry Nebrensky – MICE DAQ review - 4 June 2009 Grid File Management (1) 5
 - Each file is given a unique, machine-generated GUID when stored on the Grid.
 - The file is physically uploaded to one (or more) SEs (Storage Elements), where it is given a machine-generated SURL (Storage URL).
 - A “replica catalogue” tracks the multiple SURLs of a GUID.
 - Machine-generated names are not (meant to be) human-usable.
 - For sanity's sake we would like to associate sensible filenames with each file (LFN, Logical File Name).
 - A “file catalogue” is a database that translates between something that looks like a Unix filesystem and the GUIDs and SURLs needed to actually access the data on the Grid (a minimal sketch of these relationships follows below).
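To make the GUID/SURL/LFN relationships concrete, here is a minimal, purely illustrative Python sketch of the two catalogue roles described above. The LFN path and SE hostnames are invented examples; real GUIDs and SURLs are issued by the Grid middleware, not by user code.

```python
# Illustrative only: a toy model of the catalogue relationships described above.
import uuid

class ToyFileCatalogue:
    """Maps human-readable LFNs to a GUID, and each GUID to its replica SURLs."""
    def __init__(self):
        self.lfn_to_guid = {}    # "file catalogue" role
        self.guid_to_surls = {}  # "replica catalogue" role

    def register(self, lfn, surl):
        guid = str(uuid.uuid4())           # machine-generated identifier
        self.lfn_to_guid[lfn] = guid
        self.guid_to_surls[guid] = [surl]
        return guid

    def add_replica(self, lfn, surl):
        self.guid_to_surls[self.lfn_to_guid[lfn]].append(surl)

    def replicas(self, lfn):
        return self.guid_to_surls[self.lfn_to_guid[lfn]]

# Hypothetical LFN and SURLs, for illustration only:
cat = ToyFileCatalogue()
cat.register("/grid/mice/MICE/raw/run00123.tar",
             "srm://some-se.example.org/mice/data/run00123.tar")
cat.add_replica("/grid/mice/MICE/raw/run00123.tar",
                "srm://another-se.example.org/mice/data/run00123.tar")
print(cat.replicas("/grid/mice/MICE/raw/run00123.tar"))
```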

Henry Nebrensky – MICE DAQ review - 4 June 2009 Grid File Management (2) 6
 - MICE has an instance of LFC (LCG File Catalogue) run by the Tier 1 at RAL.
 - The LFC service can do both the replica and LFN cataloguing.
 - LFC presents the user with what looks like a normal Unix filespace – the Grid client software keeps track of the data behind the scenes (a short browsing sketch follows below).
 - [Diagram: LFC architecture, from MICE Note 247]
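As an illustration of the “Unix filespace” view, the following sketch browses the catalogue with the standard LFC client tools from a gLite UI, driven from Python. The LFC hostname shown is a placeholder assumption; the real value is the one published for the MICE VO, and exact client behaviour should be checked against the installed middleware.

```python
# A hedged sketch of browsing the MICE LFC namespace as if it were a Unix
# filesystem, using the standard LFC client tools available on a gLite UI.
import os
import subprocess

# LFC_HOST tells the client which catalogue server to talk to (hostname assumed).
env = dict(os.environ, LFC_HOST="lfc.example.gridpp.rl.ac.uk")

# Long-list the top of the MICE namespace, then a (hypothetical) data directory.
subprocess.run(["lfc-ls", "-l", "/grid/mice"], env=env, check=True)
subprocess.run(["lfc-ls", "-l", "/grid/mice/MICE"], env=env, check=True)
```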

Henry Nebrensky – MICE DAQ review - 4 June 2009 Data Integrity 7
 - (For recent SE releases) a checksum is calculated automatically when a file is uploaded.
 - This can be checked when the file is transferred between SEs, or the value retrieved to check local copies.
 - Should we also do it ourselves before uploading the file in the first place, or should we use “compression” (can check integrity with gunzip -t …)?
 - (Default algorithm is Adler32 – lightweight and effective; a local checksum sketch follows below.)
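A minimal sketch, using only the Python standard library, of computing the Adler32 checksum of a local file before upload so it can later be compared with the value reported by the SE. The file name in the usage comment is hypothetical.

```python
# Compute an Adler32 checksum locally, reading the file in chunks.
import zlib

def adler32_of_file(path, chunk_size=1 << 20):
    """Return the Adler32 checksum of a file as an 8-digit hex string."""
    value = 1  # Adler32 starting value
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")

# Example (hypothetical file name):
# print(adler32_of_file("run00123.tar"))
```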

Henry Nebrensky – MICE DAQ review - 4 June 2009 The VOMS server 8
 - File permissions will be needed, e.g. to ensure that users can’t accidentally delete RAW data. These rules will need to last for at least the life of the experiment.
 - VOMS is a Grid service that allows us to define specific roles (e.g. DAQ data archiver) which will then be allowed certain privileges (such as writing to tape at the RAL Tier 1).
 - The VOMS service then maps humans to those roles, via their Grid certificates.
 - The MICE VOMS server is provided via GridPP at Manchester, UK.
 - New Mice are added or assigned to roles by the VO Manager (and Mouse) Paul Hodgson.
 - Thus the VOMS service provides us with a single portal where we can add/remove/reassign Mice, without needing to negotiate with the operators of every Grid resource worldwide – we actually keep control “in-house”.
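For illustration, this is roughly how a VOMS role is exercised in practice: before a privileged operation the user requests a proxy certificate carrying that role. The role name "production" is an invented example; the actual MICE role names are whatever the VO Manager defines on the VOMS server.

```python
# A hedged illustration of requesting a proxy that carries a MICE VOMS role.
import subprocess

subprocess.run(
    ["voms-proxy-init", "--voms", "mice:/mice/Role=production"],  # role name is an example
    check=True)  # creates a short-lived proxy certificate carrying the requested role
```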

Henry Nebrensky – MICE DAQ review - 4 June 2009 MICE Data Flow 9
 - The basic data flow in MICE is thus something like:
   - The raw data files from the experiment are sent to tape using Grid protocols, including registering the files in LFC.
   - The offline reconstruction can then use Grid/LFC to pull down the raw data, and upload reconstructed (“RECO” or DST) files.
   - Users can use Grid/LFC to access the RECO files they want to play with.
 - Combining the above description with the Grid services and the work being done by current users gives the data flow diagram on the next slide.
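A hedged, command-level sketch of that sequence using the lcg-utils tools follows. The SE name, LFN directories and file names are illustrative assumptions, not agreed MICE conventions, and the flags should be checked against the installed lcg-utils version.

```python
# Sketch of the raw -> RECO -> user flow with lcg-utils (names are illustrative).
import subprocess

VO = ["--vo", "mice"]
TAPE_SE = "srm-mice.example.rl.ac.uk"   # placeholder name for the RAL tape-backed SE

def run(cmd):
    subprocess.run(cmd, check=True)

# 1. DAQ side: send the raw file to tape and register it in LFC.
run(["lcg-cr", *VO, "-d", TAPE_SE,
     "-l", "lfn:/grid/mice/MICE/raw/run00123.tar",
     "file:///data/daq/run00123.tar"])

# 2. Reconstruction: pull the raw data down by LFN, then upload the RECO output.
run(["lcg-cp", *VO, "lfn:/grid/mice/MICE/raw/run00123.tar",
     "file:///scratch/run00123.tar"])
run(["lcg-cr", *VO, "-d", TAPE_SE,
     "-l", "lfn:/grid/mice/MICE/reco/run00123_reco.root",
     "file:///scratch/run00123_reco.root"])

# 3. Analysis: any collaborator can fetch the RECO file by its LFN.
run(["lcg-cp", *VO, "lfn:/grid/mice/MICE/reco/run00123_reco.root",
     "file:///home/user/run00123_reco.root"])
```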

Henry Nebrensky – MICE DAQ review - 4 June 2009 MICE Data Flow Diagram 10
 - [Diagram: MICE data flow]
 - Short-dashed lines indicate entities that still need confirmation.
 - Question marks indicate even higher levels of uncertainty.
 - More details in MICE Note 252.
 - The diagram would look pretty much the same if non-Grid tools were used.

Henry Nebrensky – MICE DAQ review - 4 June 2009 MICE Data Unknowns 11
 - MICE Note 252 identifies four main flavours of data: RAW, RECO, analysis results, and Monte Carlo simulation.
 - For all four, we need to understand the:
   - volume (the total amount of data, the rate at which it will be produced, and the size of the individual files in which it will be stored)
   - lifetime (ephemeral or longer lasting? will it need archiving to tape? replication?)
   - access control (who will create the data? who is allowed to see it? can it be modified or deleted, and if so who has those privileges?)
   - “service level” (desired availability? allowable downtime?)
 - Also need to identify use cases I’ve missed, especially ones that will need more VOMS roles or CASTOR space tokens.

Henry Nebrensky – MICE DAQ review - 4 June 2009 File Catalogue Namespace (1) 12
 - Also, we need to agree on a consistent namespace for the file catalogue.
 - Proposal (MICE Note 247, Grid talk at CM23):
 - We get given /grid/mice/ by the server.
   - Five upper-level directories (the fifth, users/, is on the next slide):
     - Construction/  historical data from detector development and QA
     - Calibration/   needed during analysis (large datasets, cf. DB)
     - TestBeam/      test beam data
     - MICE/          DAQ output and corresponding MC simulation

Henry Nebrensky – MICE DAQ review - 4 June 2009 File Catalogue Namespace (2) 13
 - /grid/mice/users/name  for people to use as scratch space for their own purposes, e.g. analysis.
   - Encourage people to do this through LFC – helps avoid “dark data”.
   - LFC allows Unix-style access permissions.
 - Again, the LFC namespace is something that needs to be finalised before production data can start to be registered (a sketch of LFN paths under this proposal follows below).
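As an illustration of how LFNs might look under this proposal, here is a small helper sketch. Everything below the proposed top-level directories (sub-directory layout, file naming, the user name) is hypothetical and still to be agreed.

```python
# Build example LFNs following the namespace proposed in MICE Note 247.
BASE = "/grid/mice"

def raw_data_lfn(run_number):
    """LFN for a DAQ output file under the MICE/ area (layout is illustrative)."""
    return f"{BASE}/MICE/raw/run{run_number:05d}.tar"

def user_scratch_lfn(username, filename):
    """LFN for a user's scratch file under users/."""
    return f"{BASE}/users/{username}/{filename}"

print(raw_data_lfn(123))                                   # /grid/mice/MICE/raw/run00123.tar
print(user_scratch_lfn("hnebrensky", "emittance_study.root"))
```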

Henry Nebrensky – MICE DAQ review - 4 June 2009 Metadata Catalogue 14
 - For many applications – such as analysis – you will want to identify the list of files containing the data that matches some parameters.
 - This is done by a “metadata catalogue”. For MICE this doesn't yet exist.
 - A metadata catalogue can in principle return either the GUID or an LFN – it shouldn’t matter which as long as it’s properly integrated with the other Grid services.

Henry Nebrensky – MICE DAQ review - 4 June 2009 MICE Metadata Catalogue 15
 - We need to select a technology to use for this:
   - use the configuration database? (no)
   - gLite AMGA (who else uses it – will it remain supported?)
   - ?
 - Need to implement – i.e. register metadata to files.
 - What metadata will be needed for analysis?
 - Should the catalogue include the file format and compression scheme (gzip ≠ PKzip)?

Henry Nebrensky – MICE DAQ review - 4 June 2009 MICE Metadata Catalogue for Humans 16
or, in non-Gridspeak:
 - we have several databases (configuration DB, EPICS, e-Logbook) where we should be able to find all sorts of information about a run/timestamp.
 - but how do we know which runs to be interested in, for our analysis?
 - we need an “index” to the MICE data, and for this we need to define the set of “index terms” that will be used to search for relevant datasets.

Henry Nebrensky – MICE DAQ review - 4 June 2009 MICE Metadata 17
 - Run, date/time
 - Step
 - Beam – μ, e⁻, π, p
 - Nominal 4-d / transverse normalised emittance
 - Diffuser setting
 - Nominal momentum
 - Configuration:
   - Magnet currents (nominal)
   - Physical geometry
 - Absorber material
 - RF?
 - MC Truth?
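Purely as a strawman, the candidate index terms above could map onto a metadata record along these lines. The field names, types and units are invented for illustration; the real schema is exactly what still needs to be agreed.

```python
# A hypothetical sketch of one metadata-catalogue record built from the index
# terms listed above.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RunMetadata:
    run_number: int
    start_time: str                     # ISO 8601 timestamp
    step: str                           # e.g. "Step I"
    beam_species: str                   # "mu", "e", "pi" or "p"
    nominal_emittance_mm: float         # nominal 4-d / transverse normalised emittance
    diffuser_setting: int
    nominal_momentum_mev: float
    magnet_currents: dict = field(default_factory=dict)   # nominal settings
    geometry_tag: Optional[str] = None                     # physical geometry
    absorber_material: Optional[str] = None
    rf_on: Optional[bool] = None
    is_mc_truth: bool = False
    lfns: List[str] = field(default_factory=list)          # files carrying this run

# An analysis query would then be "give me the LFNs of all runs matching ...",
# e.g. all muon runs at 200 MeV/c nominal momentum.
```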

Henry Nebrensky – MICE DAQ review - 4 June 2009 Conclusions 18
 - The data flow is more complex than people realise…
 - … and probably won’t work by accident.
 - Some specific issues that need to be understood are the attributes of the data flows (Note 252), the LFC namespace (Note 247) and the index terms for the metadata catalogue.
 - There is to be a one-day workshop in the next month to finalise these.