Data management in ATLAS. S. Jézéquel, LAPP-CNRS, France.


Real data flow (diagram): SFO1-SFO4 -> T0 (tape + reconstruction farm) -> T1s -> T2s.

Real data formats (diagram): the High Level Trigger splits RAW data into streams (e, mu, tau/jet, MinBias, Beauty). Each stream is reconstructed into ESD and AOD, from which D1PDs are derived (S.M./e, top, SUSY, Higgs, Beauty). With 10 files per stream this gives 5x10 files in total. Production, replication and deletion are centrally managed.

Simulation (MC) data flow (diagram): simulated data produced at the T2s is gathered at the T1 (tape + reconstruction farm). Production, replication and deletion are centrally managed.

Simulation production: number of events

Assumed event sizes: HITS = 4 MB, RDO = 2 MB, ESD = 1 MB, AOD = 0.2 MB, TAG = 0.01 MB.
Assumed retention: 20% of the RDOs and 20% of the ESDs are kept.

            June-Sept (10 TeV)   Sept-Dec (10 TeV)   Dec-Mar (14 TeV)   Total
  Geant4           15M                  15M                 15M           45M
  ATLFAST          85M                  85M                 85M          255M
  Total           100M                 100M                100M          300M

Simulation production: sample sizes

                       June-Sept (10 TeV)   Sept-Dec (10 TeV)   Dec-Mar (14 TeV)    Total
  HITS from G4               60 TB                60 TB               60 TB         180 TB
  AOD from HITS               3 TB                 3 TB                3 TB           9 TB
  20% RDO from HITS           6 TB                 6 TB                6 TB          18 TB
  20% ESD from HITS           3 TB                 3 TB                3 TB           9 TB
  100% TAG from HITS       0.15 TB              0.15 TB             0.15 TB          0.5 TB
  AOD from ATLFAST           17 TB                17 TB               17 TB          51 TB
  Total                      89 TB                89 TB               89 TB         268 TB

Orders of magnitude (real data)
Event sizes: RAW = 1.5 MB/evt, ESD = 0.5 MB/evt, AOD = 0.2 MB/evt, DPD = 0.02 MB/evt per type(?); 1 file = 1 GB.
Writing at 200 Hz during 50k seconds/day:
  - RAW = 15 TB/day, i.e. 15k files/day
  - ESD = 5 TB/day, i.e. 5k files/day
  - AOD = 2 TB/day (over 60 days: ~100 TB)
  - DPD = 200 GB/day per type
  - Total: ~20 TB/day, i.e. ~20k files/day
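The daily volumes above follow directly from rate x livetime x event size. A minimal sketch of the arithmetic, using only the numbers already quoted on the slide:

```python
# Back-of-the-envelope daily data volumes, assuming the slide's numbers:
# 200 Hz trigger rate, 50k seconds of data taking per day, 1 GB files.
RATE_HZ = 200
SECONDS_PER_DAY = 50_000
FILE_SIZE_MB = 1_000

event_size_mb = {"RAW": 1.5, "ESD": 0.5, "AOD": 0.2, "DPD": 0.02}  # MB per event

events_per_day = RATE_HZ * SECONDS_PER_DAY  # 10 million events per day

for fmt, size in event_size_mb.items():
    tb_per_day = events_per_day * size / 1e6           # MB -> TB
    files_per_day = events_per_day * size / FILE_SIZE_MB
    print(f"{fmt}: {tb_per_day:.1f} TB/day, ~{files_per_day:.0f} files/day")
```

Running this reproduces the RAW = 15 TB/day (15k files) and ESD = 5 TB/day (5k files) figures quoted above.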

File grouping in datasets
There are too many files to handle individually, so they are manipulated in blocks.
  - File: contains a list of independent events
  - Dataset: a list of files with all or a few parameters in common
  - A dataset may contain many files (technical problems when more than 10k files)
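A minimal sketch of this file/dataset grouping, with hypothetical field names (this is not the real DQ2 catalogue schema):

```python
from dataclasses import dataclass, field

@dataclass
class FileEntry:
    """A file: a container of independent events, identified by a GUID."""
    guid: str
    lfn: str            # logical file name
    size_bytes: int
    checksum: str

@dataclass
class Dataset:
    """A dataset: a named block of files sharing common parameters
    (e.g. run period, stream, processing version)."""
    name: str
    files: list = field(default_factory=list)

    def add(self, f: FileEntry) -> None:
        # Operational limit mentioned on the slide: datasets become hard to
        # handle above ~10k files.
        if len(self.files) >= 10_000:
            raise ValueError(f"dataset {self.name} already holds 10k files")
        self.files.append(f)
```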

Discover dataset and file locations?
DDM central catalog:
  - List of sites containing at least a fraction of the dataset
  - List of GUIDs per dataset, plus checksum and size (for consistency checks)
List of files on a site: get the information from the LFC catalog (Beijing -> LYON LFC)
  - Input: list of GUIDs
  - Middle: for each GUID, the list of replicas managed by the T1 LFC
  - Output: list of GUIDs present on the site, plus the SURLs of the files (used by jobs)
WARNING: there is no consistency check between the DDM, LFC and SE catalogs
  - Checks are done a posteriori, a long operation which consumes resources
  - Work has started on dumping the SE catalog content
Physicists will want, or will require, their data on reliable sites (based on their own empirical measurements).
Our/your role:
  - Provide reliable sites (hardware and services)
  - Measure the reliability and help sites to progress
  - Provide feedback to Grid or ATLAS developers
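A sketch of this two-step lookup (DDM central catalog, then the T1 LFC). The *_stub functions are hypothetical stand-ins for the real DQ2/LFC client calls and return canned data so the flow can be run end to end; the dataset name and storage endpoints are illustrative only.

```python
def ddm_sites_stub(dataset):
    # DDM central catalog: sites holding at least a fraction of the dataset
    return ["BEIJING-LCG2_MCDISK", "IN2P3-CC_MCDISK"]

def ddm_guids_stub(dataset):
    # DDM central catalog: GUID -> (checksum, size), used for consistency checks
    return {"guid-0001": ("ad:12345678", 1_000_000_000)}

def lfc_replicas_stub(guid):
    # T1 LFC: replicas (SURLs) registered for this GUID
    return ["srm://se1.example.cn/atlas/mc/file1",
            "srm://se2.example.fr/atlas/mc/file1"]

def files_on_site(dataset, ddm_site, surl_prefix):
    """Return {guid: [SURLs]} for the files of `dataset` present at one site."""
    if ddm_site not in ddm_sites_stub(dataset):          # step 1: DDM catalog
        return {}
    found = {}
    for guid in ddm_guids_stub(dataset):                 # GUID list of the dataset
        local = [r for r in lfc_replicas_stub(guid)      # step 2: T1 LFC lookup
                 if r.startswith(surl_prefix)]
        if local:
            found[guid] = local                          # SURLs are what jobs open
    return found

print(files_on_site("mc08.example.dataset", "BEIJING-LCG2_MCDISK",
                    "srm://se1.example.cn"))
```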

Catalog consistency (central management)
One of the main central developments now:
  - Get dumps of the SE content (avoids load on the SE)
  - Compare with the LFC and DDM catalogs
  - Compare, understand and reduce the differences between the disk space occupancy (space token) and the expected dataset occupancy
Example: BEIJING-LCG2_MCDISK: the space token reports 75 GB, DDM reports 1 GB. Most probable reason: orphan MC files.
Gains:
  - Efficient usage of the disk space
  - Find lost files as soon as possible, to replicate them from other sites and to help sites understand the problem (if needed)
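A minimal sketch of such a consistency check, assuming the three sources have already been dumped to plain-text lists of SURLs (one per line); the file names are illustrative only.

```python
# Consistency check between an SE dump and the LFC/DDM catalogs, done as
# set comparisons over SURLs.

def load_surls(path):
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

se_dump = load_surls("se_dump.txt")    # what is physically on the storage element
lfc     = load_surls("lfc_dump.txt")   # what the LFC thinks is there
ddm     = load_surls("ddm_dump.txt")   # what the DDM (dataset) catalog expects

dark_data  = se_dump - lfc             # on disk but in no catalog: deletion candidates
lost_files = ddm - se_dump             # expected by DDM but missing on disk: re-replicate
orphans    = lfc - ddm                 # registered in LFC but not part of any dataset

print(f"dark data: {len(dark_data)}, lost: {len(lost_files)}, orphans: {len(orphans)}")
```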

Scheduled data transfers go through ATLAS DDM (Distributed Data Management):
  - DDM provides the list of replicas; the requester can restrict the source sites (to reliable ones)
  - DDM scans the LFC catalogs and chooses the source files
  - DDM triggers the FTS transfers
  - When successful, DDM registers the new files in the LFC
  - DDM publishes a new list of replicas
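The five steps above, written out as a linear workflow. Every *_stub function is a hypothetical stand-in for the real DQ2/LFC/FTS client calls, returning canned values so the sequence can be executed; the site, dataset and SURL names are illustrative.

```python
def ddm_list_replicas_stub(dataset):    return ["CERN-PROD_DATADISK", "IN2P3-CC_DATADISK"]
def lfc_choose_sources_stub(sites, ds): return ["srm://srm.example.ch/%s/file1" % ds]
def fts_submit_stub(files, dest):       return "job-0001"
def fts_wait_stub(job_id):              return "Done"
def lfc_register_stub(ds, dest):        print("registered", ds, "at", dest)
def ddm_publish_replica_stub(ds, dest): print("new replica of", ds, "published at", dest)

def transfer_dataset(dataset, destination, allowed_sources=None):
    sites = ddm_list_replicas_stub(dataset)             # 1. DDM provides the replica list
    if allowed_sources:                                 #    requester may restrict sources
        sites = [s for s in sites if s in allowed_sources]
    files = lfc_choose_sources_stub(sites, dataset)     # 2. DDM scans the LFCs, picks sources
    job = fts_submit_stub(files, destination)           # 3. DDM triggers the FTS transfer
    if fts_wait_stub(job) == "Done":                    # 4. on success, register in the LFC
        lfc_register_stub(dataset, destination)
        ddm_publish_replica_stub(dataset, destination)  # 5. publish the new list of replicas

transfer_dataset("data08.example.dataset", "BEIJING-LCG2_DATADISK")
```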

Transfer hierarchy and path (diagram): dataset transfers between CERN, the LYON T1 and the IHEP T2 are each driven by an FTS server (CERN FTS, LYON FTS, T1 FTS).

DDM follow-up
Central operation team (daily meeting at 9:00 am):
  - Follows development (Grid, DDM)
  - Reports DDM misbehaviour to the developers (restarts services when possible)
  - Manages the central services at CERN: machines + DDM
  - Checks the availability of the T1 services (LFC, FTS, ...) and can contact the T1s
  - Focuses mainly on T0 -> T1; always one person on call
Team per cloud:
  - Relays ATLAS and DDM requests to the T1/T2 sites
  - Reports the results of tests
  - Follows T2 availability and contacts the sites in case of problems
  - An active team is strongly requested by the T2 sites to keep responsiveness
  - Contacted by the central operation team by mail
There can be overlap between the two groups.

Transfer policy
DDM: scheduled transfers, restricted to a happy few (Production role)
  - LFC for the discovery of data locations, FTS for the physical transfers
  - Advantages: organises and monitors massive transfers (with retries if needed), and protects the SRM from too many random accesses
Physicists requesting data locally have two options (see the sketch below):
  - Bash commands to replicate data to their UI or local SE (based on lcg-cp); users manage this themselves (not OK for big datasets); the new location can be registered in DDM
  - Fill in a web page to request data replication, validated at cloud level (FR cloud: Luc Poggioli); this uses DDM, with follow-up by the DDM central team
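As an illustration of the first, do-it-yourself option, a small wrapper around lcg-cp could look like the sketch below. The LFN layout, file list and destination path are purely hypothetical, and the exact lcg-cp options should be checked against the site's lcg_utils documentation; for anything large, the DDM route above should be used instead.

```python
# Hypothetical helper: copy a few files of a small dataset to the local UI
# with lcg-cp. Retries and monitoring are left to the user, which is exactly
# why this approach does not scale to big datasets.
import os
import subprocess

def fetch_files(lfns, dest_dir, vo="atlas"):
    os.makedirs(dest_dir, exist_ok=True)
    for lfn in lfns:
        local = os.path.abspath(os.path.join(dest_dir, os.path.basename(lfn)))
        cmd = ["lcg-cp", "--vo", vo, "lfn:" + lfn, "file://" + local]
        subprocess.run(cmd, check=True)

fetch_files(["/grid/atlas/users/someuser/somefile.root"], "./local_copy")
```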

User transfer (diagram): the same CERN / LYON T1 / IHEP T2 hierarchy and FTS servers as in the scheduled-transfer diagram, seen from the user-transfer side.

Key components for IHEP
  - Lyon: FTS server, LFC server, SE
  - Source cloud: FTS server (for transfers in from a T2), LFC server
  - Central catalog: CERN
  - Discussions ongoing to have another LFC replica

Feb-May 08: CCRC08 (diagram)
  - T0 -> T1s (e.g. LYON, FZK): 15% RAW, 15% ESD, 100% AOD to each T1
  - T1 -> T2s: AOD share
  - Test of the infrastructures: DDM and sites
  - Goal: reach stability and organise DDM central/site operation over a few days

CCRC08: results for the T1s (throughput plot, in MB/s).

CCRC08: results for the FR T2s (throughput plot, in MB/s).

CCRC08: results for BEIJING
  - In May, many transfer problems at BEIJING
  - More stable situation since the end of May
  - Continuous effort needed from the T1 and BEIJING to reach stability and to keep up to date with the ATLAS organisation

Now
  - CCRC08 continues for ATLAS (at a lower rate)
  - Result: everything works during the challenges; afterwards the pressure decreases and the quality of service drops, which is especially difficult for the T1s
  - Aim to discover site problems before the site itself does: more monitoring is needed
  - DDM errors are still understood only by a few people: a permanent expert is on call
  - Please complain if you receive no data for more than 8 hours

Replication policy
Request per cloud: 100% of the AOD dispatched over one cloud; a site can request any D1PD.
Organisation:
  - Each site has to define the type of datasets it requires (each year?)
  - This has to be coherent with what DDM can technically do
  - Negotiation between the sites of the same cloud to fulfil the ATLAS requirement
For 2008 (a configuration sketch follows below):
  - LYON: T2 components can access all AOD (T1 storage)
  - GRIF ((LAL-IRFU-LPNHE) ~ Paris): 100% AOD
  - TOKYO: 100% AOD
  - All other sites can define their own policy
  - BEIJING: 100% AOD of the electron stream, 50% AOD of the muon stream, D1PD (to be defined, as for the other sites)
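Purely as an illustration, such a per-site policy can be written down as a small configuration table. The entries below only restate the 2008 shares listed above; the keys and the lookup helper are not an actual DDM configuration format.

```python
# Illustrative per-site replication shares for 2008 (fractions of the AOD,
# per stream where relevant). Not the real DDM configuration format.
replication_policy_2008 = {
    "LYON":    {"AOD": 1.0},                              # all AOD, on T1 storage
    "GRIF":    {"AOD": 1.0},                              # LAL-IRFU-LPNHE (Paris area)
    "TOKYO":   {"AOD": 1.0},
    "BEIJING": {"AOD_electron": 1.0, "AOD_muon": 0.5},    # D1PD share still to be defined
}

def requested_share(site, sample):
    """Fraction of `sample` that `site` has asked to host (0.0 if not requested)."""
    return replication_policy_2008.get(site, {}).get(sample, 0.0)

print(requested_share("BEIJING", "AOD_muon"))   # 0.5
```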

New data organisation
Introduced in the last months: for each type of data, define a precise path in the SE namespace, associated with a space token.
Space tokens (available for the last few months):
  - Software reservation and accounting of disk space; more flexible than defining pools
  - Publish the available/used disk space
  - Main limitation: limited implementations of ACLs (only one owner)
  - Implemented in the BEIJING dCache (within the available functionalities)

ATLAS space tokens

  Token name            Storage type   Used for
  ATLASDATATAPE         T1D0           RAW data; ESD, AOD from re-processing
  ATLASDATADISK         T0D1           ESD, AOD from data
  ATLASMCTAPE           T1D0           HITS from G4, AOD from ATLFAST
  ATLASMCDISK           T0D1           AOD from MC
  ATLASPRODDISK         T0D1           buffer for import and export
  ATLASGROUPDISK        T0D1           DPD
  ATLASUSERDISK         T0D1           user data *)
  ATLASLOCALGROUPDISK   T0D1           local user data

*) Although there is officially no user analysis at the Tier-1s, many Tier-1s have a Tier-2 component for which this USERDISK is needed (to be decided by each T1 individually).
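To make the "data type -> space token" mapping concrete, a lookup could look like the sketch below. The dictionary content mirrors the table above; the data-type keys and the helper function are only an illustration, not part of the real DDM configuration.

```python
# Data type -> ATLAS space token, mirroring the table above.
SPACE_TOKEN_FOR = {
    "RAW":        "ATLASDATATAPE",     # T1D0: custodial copy on tape
    "ESD_data":   "ATLASDATADISK",     # T0D1: ESD/AOD from real data
    "AOD_data":   "ATLASDATADISK",
    "HITS":       "ATLASMCTAPE",       # T1D0: simulation HITS archived on tape
    "AOD_mc":     "ATLASMCDISK",       # T0D1: AOD from MC
    "production": "ATLASPRODDISK",     # T0D1: import/export buffer
    "DPD":        "ATLASGROUPDISK",    # T0D1: group-level derived data
    "user":       "ATLASUSERDISK",     # T0D1: user data
    "local_user": "ATLASLOCALGROUPDISK",
}

def space_token(data_type: str) -> str:
    try:
        return SPACE_TOKEN_FOR[data_type]
    except KeyError:
        raise ValueError(f"no space token defined for data type {data_type!r}")

print(space_token("AOD_mc"))   # ATLASMCDISK
```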

MC data flow and space-token sizes at a T1 (diagram): G4 and ATLFAST simulation and the pile-up/digitization/reconstruction jobs run on the CPUs, exchanging EVNT, HITS from G4 and AOD (from HITS and from ATLFAST) between MCTAPE (2 TB), PRODDISK (2 TB) and MCDISK (120 TB, with 25 TB and 15 TB areas exchanged with all the other T1s); DPD making and group/user analysis produce D1PD and D2PD on GROUPDISK (2 x 6 TB) and USERDISK (5 TB, on request).

Conclusion
  - The DDM tool and the Grid components are becoming reliable
  - T2s: setup almost frozen for 2008 (still to be deployed, and the analysis tools adapted)
  - Global reliability depends mainly on site stability (SAM tests are not enough)
  - The focus will be mainly on the T1s during this summer; pressure from the T2s is needed so that they are not forgotten
  - Request: continue participating in the surveys and in the communication with the FR T2s
  - For the BEIJING T2: close collaboration is needed between the site administrators, the local ATLAS computing contact and the local physicist users (the most difficult part)