Data management in ATLAS
S. Jézéquel, LAPP-CNRS, France
Real data flow (diagram): SFO1-SFO4 at the online farm -> T0 (tape + reconstruction farm) -> T1 -> T2.
Real data streams (diagram): the High Level Trigger splits RAW data into streams (e, mu, tau/jet, MinBias, Beauty); ESD and AOD are produced per stream, and D1PDs per physics group (S.M./e, top, SUSY, Higgs, Beauty). About 10 files per stream, 5x10 files in total. Production, replication and deletion are centrally managed.
Simulation data (MC) flow (diagram): tape, reconstruction farm, T1 and T2 sites; production, replication and deletion are centrally managed.
Simulation Production: number of events
Assumed event sizes: HITS = 4 MB, RDO = 2 MB, ESD = 1 MB, AOD = 0.2 MB, TAG = 0.01 MB
Assumed retention: we keep 20% of the RDOs and 20% of the ESDs

                 per period    total (3 periods)
    Geant4       15M           45M
    ATLFAST      85M           255M
    total        100M          300M

Periods: June-Sept (10 TeV), Sept-Dec (10 TeV), Dec-Mar (14 TeV).
Simulation Production: sample sizes

                          per period    total (3 periods)
    HITS from G4          60 TB         180 TB
    AOD from HITS         3 TB          9 TB
    20% RDO from HITS     6 TB          18 TB
    20% ESD from HITS     3 TB          9 TB
    100% TAG from HITS    0.15 TB       0.5 TB
    AOD from ATLFAST      17 TB         51 TB
    total                 89 TB         268 TB

Periods: June-Sept (10 TeV), Sept-Dec (10 TeV), Dec-Mar (14 TeV).
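As a cross-check of the table, a minimal sketch of the arithmetic, using only the per-event sizes, event counts and retention fractions quoted on these slides (decimal TB; the rounding to 89/268 TB is done on the slide):

    # Cross-check of the simulation sample sizes from the assumed
    # per-event sizes (MB) and retention fractions.
    MB_PER_TB = 1e6  # 1 TB = 1e6 MB (decimal units)

    n_g4 = 15e6       # Geant4 events per period
    n_atlfast = 85e6  # ATLFAST events per period

    sizes_tb = {
        "HITS from G4":       n_g4 * 4.0 / MB_PER_TB,         # 60 TB
        "AOD from HITS":      n_g4 * 0.2 / MB_PER_TB,         # 3 TB
        "20% RDO from HITS":  n_g4 * 0.20 * 2.0 / MB_PER_TB,  # 6 TB
        "20% ESD from HITS":  n_g4 * 0.20 * 1.0 / MB_PER_TB,  # 3 TB
        "100% TAG from HITS": n_g4 * 0.01 / MB_PER_TB,        # 0.15 TB
        "AOD from ATLFAST":   n_atlfast * 0.2 / MB_PER_TB,    # 17 TB
    }

    total = sum(sizes_tb.values())
    for name, tb in sizes_tb.items():
        print(f"{name:20s} {tb:7.2f} TB/period  {3 * tb:7.2f} TB total")
    print(f"{'total':20s} {total:7.2f} TB/period  {3 * total:7.2f} TB total")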
Orders of magnitude: real data
- Event sizes: RAW = 1.5 MB/evt, ESD = 0.5 MB/evt, AOD = 0.2 MB/evt, DPD = 0.02 MB/evt per type(?); 1 file = 1 GB
- Writing at 200 Hz during 50k seconds/day:
  - RAW = 15 TB/day, or 15k files/day
  - ESD = 5 TB/day, or 5k files/day
  - AOD = 2 TB/day -> over 60 days: ~100 TB
  - DPD = 200 GB/day per type
- Total: ~20 TB/day, or ~20k files/day
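A minimal arithmetic sketch of these daily volumes, assuming only the rate, livetime and per-event sizes quoted above:

    # Daily real-data volumes from the quoted trigger rate, effective DAQ
    # time per day and per-event sizes.
    rate_hz = 200            # event rate to storage
    seconds_per_day = 50e3   # effective data-taking time per day
    events_per_day = rate_hz * seconds_per_day   # 10 million events/day

    mb_per_event = {"RAW": 1.5, "ESD": 0.5, "AOD": 0.2, "DPD (per type)": 0.02}
    file_size_mb = 1000      # 1 file = 1 GB

    for fmt, mb in mb_per_event.items():
        tb_per_day = events_per_day * mb / 1e6
        files_per_day = events_per_day * mb / file_size_mb
        print(f"{fmt:15s} {tb_per_day:5.1f} TB/day  (~{files_per_day:.0f} files/day)")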
File grouping in datasets
- Too many files -> they are manipulated in blocks
- File: contains a list of independent events
- Dataset: list of files with all or a few parameters in common
- A dataset may contain many files (technical problems appear above ~10k files per dataset)
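A minimal sketch of this file/dataset relation; the field names and the example dataset name are hypothetical, only the idea (files identified by GUID, with size and checksum, grouped into named datasets) comes from the slides:

    from dataclasses import dataclass, field

    @dataclass
    class FileEntry:
        # One file, identified by its GUID; size and checksum are kept
        # for consistency checks, as in the DDM central catalog.
        guid: str
        size_bytes: int
        checksum: str

    @dataclass
    class Dataset:
        # A dataset groups files that share production parameters
        # (stream, data type, processing version, ...).
        name: str
        files: list[FileEntry] = field(default_factory=list)

    # Hypothetical example, not a real dataset name.
    d = Dataset(name="data08.physics_MinBias.AOD.example")
    d.files.append(FileEntry(guid="00000000-0000-0000-0000-000000000001",
                             size_bytes=10**9, checksum="ad:12345678"))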
Discover dataset and file locations?
- DDM central catalog: list of sites containing at least a fraction of the dataset
  - List of GUIDs per dataset + checksum + size (for consistency checks)
- List of files on a site: get the information from the LFC catalog (Beijing -> LYON LFC)
  - Input: list of GUIDs
  - Middle: for each GUID, the list of replicas managed by the T1 LFC
  - Output: list of GUIDs present on the site, plus the list of SURLs of the files (used by jobs)
- WARNING: no consistency check between the catalogs (DDM / LFC / SE)
  -> Checks are done a posteriori: a long operation which consumes resources
  - Work has started on dumping the SE catalog content
- Physicists will want, or will require, their data to be placed on reliable sites (based on their own empirical measurements)
- Our/your role:
  - Provide reliable sites (hardware and services)
  - Measure the reliability and help sites to progress
  - Provide feedback to Grid or ATLAS developers
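A minimal sketch of the per-site lookup step, assuming the lcg-utils client (lcg-lr) is installed and an LFC host is configured; the hostnames, SE prefix and GUIDs below are placeholders, not real values:

    import os
    import subprocess

    # Point lcg-utils at the T1 LFC managing the cloud (placeholder hostname).
    os.environ["LFC_HOST"] = "lfc.example-t1.fr"

    # GUIDs obtained from the DDM central catalog (placeholders).
    guids = ["guid:00000000-0000-0000-0000-000000000001",
             "guid:00000000-0000-0000-0000-000000000002"]
    site_prefix = "srm://se.example.cn"   # placeholder for the local SE endpoint

    for guid in guids:
        # lcg-lr lists the replica SURLs registered in the LFC for this GUID.
        out = subprocess.run(["lcg-lr", guid], capture_output=True, text=True)
        surls = [l for l in out.stdout.splitlines() if l.startswith(site_prefix)]
        print(guid, "on site:" if surls else "not on site", *surls)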
Catalogs consistency (central management)
- One of the central developments now:
  - Get dumps of the SE content (avoids load on the SE)
  - Compare with the LFC and DDM catalogs
- Compare, understand and reduce the differences between:
  - Disk space occupancy (space token)
  - Expected dataset occupancy
  - Example: BEIJING-LCG2_MCDISK: space token 75 GB, DDM 1 GB; most probable reason: orphan MC files
- Gain:
  - Efficient usage of disk space
  - Find lost files as soon as possible, to replicate them from other sites and to help sites understand the problem (if needed)
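A minimal sketch of such a consistency check; the input files and their one-path-per-line format are assumptions, only the idea of comparing an SE dump against the DDM/LFC view comes from the slide:

    # Compare a dump of the SE namespace with the files expected from
    # the DDM/LFC catalogs (both assumed to be plain text, one path per line).

    def load_paths(path):
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    se_dump = load_paths("se_dump.txt")        # what is physically on the SE
    catalog = load_paths("lfc_ddm_dump.txt")   # what DDM/LFC expect to be there

    orphans = se_dump - catalog   # on disk but unknown to the catalogs (wasted space)
    lost = catalog - se_dump      # registered but missing on disk (to re-replicate)

    print(f"{len(orphans)} orphan files, {len(lost)} lost files")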
Scheduled data transfers
- Go through ATLAS DDM (Distributed Data Management)
- DDM provides the list of replicas; the requester can restrict the source sites (reliable ones)
- DDM scans the LFC catalogs and chooses the source files
- DDM triggers FTS transfers
- When successful, DDM registers the new files in the LFC
- DDM publishes a new list of replicas
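The same sequence written out as an illustrative sketch; every function here is a hypothetical stand-in for a DDM/LFC/FTS step, mirroring the bullets above, not the real DDM API:

    # Outline of one DDM subscription being served (all functions are stubs).
    def ddm_list_replicas(dataset):   return [("CERN", "srm://srm.example.ch/atlas/f1")]
    def lfc_resolve_surls(replicas):  return [surl for _, surl in replicas]
    def fts_submit(surls, dest):      return "job-0001"
    def fts_wait(job_id):             return "Done"
    def lfc_register(dataset, dest):  print(f"register {dataset} at {dest} in LFC")
    def ddm_publish(dataset, dest):   print(f"publish new replica of {dataset} at {dest}")

    def serve_subscription(dataset, dest, allowed_sources=None):
        replicas = ddm_list_replicas(dataset)               # DDM provides the replica list
        if allowed_sources:                                 # requester may restrict the sources
            replicas = [r for r in replicas if r[0] in allowed_sources]
        surls = lfc_resolve_surls(replicas)                 # scan the LFC catalogs
        job = fts_submit(surls, dest)                       # trigger the FTS transfer
        if fts_wait(job) == "Done":
            lfc_register(dataset, dest)                     # register the new files in LFC
            ddm_publish(dataset, dest)                      # publish the new replica list

    serve_subscription("mc08.AOD.example", "BEIJING-LCG2_MCDISK", allowed_sources=["CERN"])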
Transfer hierarchy and path (diagram): CERN, T1 LYON and T2 IHEP, with the dataset transfer routed through the corresponding FTS servers (CERN FTS, LYON FTS, T1 FTS).
DDM follow-up
- Central operation team (meeting at 9:00 am each day):
  - Follow developments (Grid, DDM)
  - Report DDM misbehaviours to developers (restart services if possible)
  - Manage central services (CERN): machines + DDM
  - Check the availability of T1 services (LFC, FTS, ...); can contact the T1
  - Focus mainly on T0 -> T1
  - Always one person on call
- Team per cloud:
  - Report ATLAS requests and DDM requests to T1/T2 sites
  - Report results of tests
  - Follow T2 availability and contact sites in case of problems
  - An active team is strongly requested by T2 sites to keep responsiveness
  - Contacted by the central operation team by mail
- Possible overlap between the 2 groups
Transfer policy
- DDM: scheduled transfers, restricted to a happy few (Production role)
  - LFC: discovery of data locations
  - FTS for the physical transfers
  - Advantages: organise and monitor massive transfers (with retries if needed); protect the SRM from too many random accesses
- Physicists requesting data locally, either:
  - Bash commands to replicate data to their UI or local SE (based on lcg-cp)
    - Users manage the transfers themselves (not OK for big datasets)
    - The new location can be registered in DDM
  - or fill in a web page to request data replication (validated at cloud level)
    - FR cloud: Luc Poggioli
    - Will use DDM (followed up by the DDM central team)
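A minimal sketch of such a user-level copy with lcg-cp, wrapped in Python for consistency with the other examples; the SURLs are placeholders, and the "-b -D srmv2" options are commonly used lcg-utils flags assumed to apply to the local setup:

    import subprocess

    # Copy one file from a remote SE to the local SE with lcg-cp.
    # Both SURLs below are placeholders, not real endpoints.
    src = "srm://srm.example-t1.fr/pnfs/example-t1.fr/data/atlas/aod/file1.AOD.pool.root"
    dst = "srm://se.example.cn/dpm/example.cn/home/atlas/atlasuserdisk/file1.AOD.pool.root"

    cmd = [
        "lcg-cp", "--verbose",
        "-b", "-D", "srmv2",   # bypass the BDII and talk SRMv2 directly (check local setup)
        src, dst,
    ]
    subprocess.run(cmd, check=True)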
User transfer (diagram): same topology as the scheduled case: CERN, T1 LYON and T2 IHEP with their FTS servers (CERN FTS, LYON FTS, T1 FTS), showing the path of a user-requested dataset.
Key components for IHEP
- Lyon: FTS server, LFC server, SE
- Source cloud: FTS server (for transfers coming in from a T2), LFC server
- Central catalog: CERN
  - Discussions ongoing to have another LFC replica
Feb-May 08: CCRC08 (diagram): T0 -> T1s (LYON, FZK), each receiving 15% RAW, 15% ESD, 100% AOD, with the AOD share further distributed to the T2s.
- Test of the infrastructures: DDM, sites
- Goal: reach stability and organise DDM central/site operations over a few days
CCRC08: Results for T1s (throughput plot, MB/s)
CCRC08: Results for FR T2s (throughput plot, MB/s)
CCRC08: Results for BEIJING
- In May, many transfer problems in BEIJING
- More stable situation since the end of May
- Continuous effort needed from the T1 and BEIJING:
  - to reach stability
  - to keep up the knowledge of the ATLAS organisation
Now
- CCRC08 continues for ATLAS (at a lower rate)
- Result: everything works during the challenges; afterwards the pressure decreases and the quality of service drops
  - Especially difficult for the T1s
- Aim: discover site problems before the site does -> need more monitoring
- DDM errors are still understood only by a few people: a permanent expert is on call
- Please complain if you receive no data for more than 8 hours
Replication policy
- Request per cloud: 100% of the AOD dispatched over each cloud; a site can request any D1PD
- Organisation:
  - Each site has to define the type of datasets it requires (each year?)
    - To be coherent with the DDM technical possibilities
  - Negotiation between sites of the same cloud to fulfil the ATLAS requirement
- For 2008:
  - LYON: T2 components can access all AOD (T1 storage)
  - GRIF (LAL-IRFU-LPNHE, ~Paris): 100% AOD
  - TOKYO: 100% AOD
  - All other sites can define their own policy
  - BEIJING: 100% AOD electron stream, 50% AOD muon stream, D1PD (to be defined, as for the other sites)
New data organisation
- Introduced in the last months
- For each type of data, define a precise path in the SE name space, associated to a space token
- Space tokens (available in the last months):
  - Software reservation and accounting of disk space
  - More flexible than defining pools
  - Publish the available/used disk space
  - Main limitation: limited implementation of ACLs (only one owner)
- Implemented in the BEIJING dCache (within the available functionalities)
ATLAS Space Tokens

    token name            storage type   used for
    ATLASDATATAPE         T1D0           RAW data, ESD, AOD from re-processing   XX
    ATLASDATADISK         T0D1           ESD, AOD from data                      XXX
    ATLASMCTAPE           T1D0           HITS from G4, AOD from ATLFAST          X
    ATLASMCDISK           T0D1           AOD from MC                             XXX
    ATLASPRODDISK         T0D1           buffer for in- and export               X
    ATLASGROUPDISK        T0D1           DPD                                     XXX
    ATLASUSERDISK         T0D1           user data                               XX *)
    ATLASLOCALGROUPDISK   T0D1           local user data

*) Although there is officially no user analysis at the Tier-1s, many Tier-1s have a Tier-2 component for which this USERDISK is needed (to be decided by each T1 individually).
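A minimal sketch of the "one space token per data type, one path per type" idea; only the token names come from the table above, the SE endpoint and path layout are made-up placeholders, not the official ATLAS namespace convention:

    # Map data types to their space token and an illustrative SE path.
    SE_BASE = "srm://se.example.cn/dpm/example.cn/home/atlas"  # placeholder SE endpoint

    space_tokens = {
        "AOD from MC":       ("ATLASMCDISK",    f"{SE_BASE}/atlasmcdisk"),
        "ESD/AOD from data": ("ATLASDATADISK",  f"{SE_BASE}/atlasdatadisk"),
        "DPD":               ("ATLASGROUPDISK", f"{SE_BASE}/atlasgroupdisk"),
        "user data":         ("ATLASUSERDISK",  f"{SE_BASE}/atlasuserdisk"),
    }

    for data_type, (token, path) in space_tokens.items():
        print(f"{data_type:18s} -> token {token:15s} path {path}")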
MC data flow and storage at a site (diagram; volumes as labelled): the CPUs run G4 and ATLFAST simulation plus pile-up, digitization and reconstruction; MCTAPE 2 TB (HITS), PRODDISK 2 TB (buffer for import/export), MCDISK 120 TB (EVNT, HITS from G4, AOD, AOD from ATLFAST), 25 TB and 15 TB of MCDISK exchanged with all other T1s (EVNT, HITS from G4, AOD), GROUPDISK 6 TB (DPD making: D1PD, D2PD), USERDISK 5 TB (user analysis; group analysis on request).
Conclusion
- DDM tools + Grid components are becoming reliable
- T2s: setup almost frozen for 2008 (still to be deployed, and the analysis tools adapted)
- Global reliability depends mainly on site stability (SAM tests are not enough)
  - Attention mainly focused on the T1s during this summer; pressure from the T2s is needed so that they are not forgotten
  - Request: continue participation in the survey and the communication with the FR T2s
- For the BEIJING T2, close collaboration is needed between:
  - The site administrators
  - The local ATLAS computing contact
  - The local physicist users (the most difficult part)