US ATLAS DDM Operations. Alexei Klimentov, BNL. US ATLAS Tier-2 Workshop, UCSD, Mar 8, 2007.


Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 2

DDM Operations (US ATLAS)
Coordination: Alexei Klimentov and Wensheng Deng
- BNL: Wensheng Deng, Hironori Ito, Xin Zhao
- GLTier2 (UM): Shawn Mckee
- NETier2 (BU): Saul Youssef
- WTier2: Wei Yang
- SWTier2: Patrick McGuigan
  - OU: Horst Severini, Karthik Arunachalam
  - UTA: Patrick McGuigan, Mark Sosebee
- MWTier2: Dan Schrager
  - IU: Kristy Kallback-Rose, Dan Schrager
  - UC: Robert Gardner, Greg Cross

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 3

DDM Operations (US ATLAS): Main Activities
- AODs, EVGEN, DB releases distribution, RDOs consolidation: W. Deng, A. Klimentov, P. Nevski
- LFC stress test: A. Klimentov
- DDM development and deployment (0.2.12/0.3): P. McGuigan, W. Deng, H. Ito
- End-user tools (development and support): T. Maeno
- Monitoring and metrics: P. McGuigan, S. Reddy, D. Adams, H. Ito, A. Klimentov
- User support: X. Zhao, H. Ito, H. Severini, A. Klimentov
- Documentation, FAQs, Wiki, procedures: H. Severini
- Troubleshooting console: R. Gardner et al.

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 4

AODs, RDOs, EVGEN and DB releases distribution (EVGEN & RDO)
- RDO replication follows the Physics Coordinator request:
  - From BNL (~8 TB) to LYON and FZK
  - From all sites (T1 and T2; for US ATLAS, BNLTAPE and BNLPANDA) to CERN
  - Status is available
- EVGEN: P. Nevski started a systematic replication of event generator input files to all Tier-1 centers for production needs. For the moment only recently produced EVGEN datasets are subscribed and no dedicated monitoring is available; this will be improved in the future.

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 5

AODs, RDOs, EVGEN and DB releases distribution (AODs)
- Follows the ATLAS and US ATLAS computing models:
  - ATLAS model: each cloud must have at least 2 complete copies of the AODs, one copy at the Tier-1 and the second copy shared between the Tier-2s.
  - US ATLAS model: each Tier-2 must have a complete AOD copy. Currently: AGLT2, BU, MWT2, UTA at 100%; SLAC at 10%.
- Two-step central subscription policy (see the sketch below):
  - AODs are subscribed centrally from the source Tier-1 to all Tier-1s and CERN (only if the datasets have files).
  - AODs are subscribed centrally from the parent Tier-1 to the Tier-2s within the cloud.
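A minimal sketch of the two-step fan-out logic described above. The subscribe() call, the cloud topology and the dataset name are hypothetical placeholders for illustration only, not the real DQ2 API or configuration.

    # Sketch of the two-step central subscription policy (illustrative only).
    CLOUDS = {
        # parent Tier-1 : Tier-2s in the cloud (US cloud only, as an example)
        "BNL": ["AGLT2", "BU", "MWT2", "UTA", "SLAC"],
        # ... other clouds would be listed here
    }
    ALL_TIER1S = ["BNL", "LYON", "FZK", "CNAF", "RAL", "PIC", "ASGC", "TRIUMF", "SARA", "NDGF"]

    def subscribe(dataset, site):
        """Placeholder for a DDM subscription request (dataset -> site)."""
        print(f"subscribe {dataset} -> {site}")

    def central_subscription(dataset, source_tier1, has_files):
        # Step 1: from the source Tier-1 to all other Tier-1s and CERN,
        # but only if the dataset actually contains files.
        if not has_files:
            return
        for t1 in ALL_TIER1S + ["CERN"]:
            if t1 != source_tier1:
                subscribe(dataset, t1)
        # Step 2: from each parent Tier-1 to the Tier-2s of its cloud.
        for parent_t1, tier2s in CLOUDS.items():
            for t2 in tier2s:
                subscribe(dataset, t2)

    central_subscription("csc11.005300.AOD.example", source_tier1="CERN", has_files=True)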

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 6

AODs, RDOs, EVGEN and DB releases distribution. AODs (cont.)
- Last subscription: 138K files, 600 datasets (total)
  - 2 DDM instances: BNLDISK (initially BNLTAPE) and BNLPANDA
    - BNLDISK: destination for Tier-1s and source for US Tier-2s
    - BNLPANDA: source for Tier-1s and for US Tier-2s
- Loading on BNL facilities:
  - BNL demonstrates steady performance: all subscriptions are processed
  - Data to BNL: ~1+ TB/week (reached after Feb 25)
  - Data from BNL is also replicated to other sites with ~100% performance
- 85% of files are replicated
  - Similar numbers for CERN, FZK, LYON and NDGF
  - The remaining 15%:
    - 2-3% related to a BNL problem
    - 7% data not available on source Tier-1s
    - ~5% instability of central services
  - PIC: out of disk space
  - ASGC, RAL, NIKHEF/SARA process ~50% of subscriptions
  - CNAF, TRIUMF: backlog is growing
- AODs data volume on Tier-2s (P. McGuigan page)

  Subscription period                  Feb 5 - Feb 22   Feb 25 - Mar 4   Mar 5 - Mar 8
  Datasets/files                       40/10K           230/45K          400/80K
  Performance/waiting subscriptions    95%/0            85%/2            71%/6

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 7

AODs, RDOs, EVGEN and DB releases distribution. AODs (known problems)
- Central services saturation
- Large (10+K files) datasets
- Interference of AOD and MC input file distribution within the US cloud
  - Data are coming to the Tier-2s, but AOD data transfer is dominating
  - Hypotheses:
    - MC data transfer has lower priority after several transfer failures
    - Data is coming proportionally
    - dCache cannot serve all requests
  - GB/files transferred to BU, MICH and UTA (from Hiro, Mar 4th): per-site table of AOD and NTUP files, GB and total GB for UMICH, UTA and BU
- Frequent disk "switching" on Tier-2s
- 200-byte files (all sites, but OSG); see the sketch after this list
- Monitoring
  - ARDA monitoring is improving, but we need more for day-by-day operations
  - It takes hours to get information from the LFC
  - It is even slower to get information using the DQ2 APIs
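A minimal sketch of a check for the truncated ~200-byte files mentioned above. The plain-text dump format ("<size_in_bytes> <pfn>" per line) and the tolerance are assumptions for illustration, not an existing DDM operations tool.

    # Scan a storage dump for files whose size is suspiciously close to 200 bytes.
    import sys

    SUSPECT_SIZE = 200          # bytes; truncated transfers showed up at ~200 B
    TOLERANCE = 50              # allow a small spread around the nominal size

    def find_suspect_files(dump_path):
        """Yield (size, pfn) pairs whose size is close to SUSPECT_SIZE."""
        with open(dump_path) as dump:
            for line in dump:
                try:
                    size_str, pfn = line.split(None, 1)
                    size = int(size_str)
                except ValueError:
                    continue            # skip malformed lines
                if abs(size - SUSPECT_SIZE) <= TOLERANCE:
                    yield size, pfn.strip()

    if __name__ == "__main__":
        for size, pfn in find_suspect_files(sys.argv[1]):
            print(f"{size:8d}  {pfn}")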

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 8

DDM Deployment and Installation (Dec Tier-2 WS)
- Installation
  - A2. DDM ops test-bed set up (coordinated by Wensheng and Patrick)
    - Central DB server prototype at BNL
    - Site services VO box at UTA
  - A3. DDM installation procedure (coordinated by Patrick)
    - From Patrick (Dec 5): "..Generic DDM installation procedure for all sites complete with pacman installation. The most basic test of subscribing to data at BNL works without a problem with test site UTA_TEST1…"
    - DQ2 client (pacman version): Hiro (done)
  - A11. DDM installation and deployment (coordinated by Alexei)
    - Will be in production at least up to Mar; validation done (Wensheng)
    - Time slot and scenario agreed with Kaushik
    - Installation on the first sites (BNL, BU, UTA) in week 50 (Dec 11-18)

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 9

DDM Deployment and Installation. DQ2 0.3
- 0.2.12 has been in production since Oct; 0.3 was expected in production in Mar 2007 and for (pre)testing on OSG on Feb 25.
- Current status:
  - Miguel: 0.3 will be in production after the Tier-0 test, presumably end of March / beginning of April
  - 0.3 isn't available yet for pre-testing on OSG (confirmed by the DQ2 developers during the ATLAS Operations meeting, Mar 6)
  - 0.3 won't be available for pre-testing until the end of the Tier-0 test (Mar 25)
  - The DDM developers plan to dedicate 1 week after the tests for 0.3 "integration"
- DDM operations: 0.3 will not become the production version until all necessary tests have been conducted and the pacman installation for OSG sites is ready.

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 10

DDM Operations priority list
- Data integrity check at BNL and Tier-2s (H. Ito, P. Nevski); see the sketch below
  - dCache content against the LRC
  - LRC content against the DDM catalogs
- 85% of data is replicated using central subscription (the number is monitored regularly)
  - What about the remaining 15%?
  - All information is available now; no need to scan DQ2 datasets
  - W. Deng is working on an automatic resubscription/dq2_cr program, to have ALL data at BNL and the US Tier-2s
- Monitoring
  - Not to repeat ARDA monitoring, but to provide all information about day-by-day operations
  - D. Adams, P. McGuigan, A. Klimentov: monitoring pages
  - S. Reddy will make one coherent source of info
- Metrics (H. Ito)
  - Number of subscriptions per site, obsolete subscriptions, time to process a subscription, etc.
  - In the action items for a while
- LRC POOL dependencies (P. Salgado, W. Deng)
- Troubleshooting console (R. Gardner et al.)
- 0.3 installation (W. Deng, H. Ito, P. McGuigan)
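A minimal sketch of the consistency check idea above: comparing a dCache storage dump against an LRC catalog dump. Both input formats (one PFN per line) are assumptions for illustration, not the actual H. Ito / P. Nevski tools.

    # Compare dCache contents with LRC contents and report the differences.
    def load_pfns(path):
        """Load a set of physical file names from a one-entry-per-line dump."""
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    def compare(dcache_dump, lrc_dump):
        on_disk = load_pfns(dcache_dump)
        in_catalog = load_pfns(lrc_dump)
        dark = on_disk - in_catalog     # in dCache but unknown to the LRC
        lost = in_catalog - on_disk     # registered in the LRC but missing on disk
        return dark, lost

    if __name__ == "__main__":
        dark, lost = compare("dcache_dump.txt", "lrc_dump.txt")
        print(f"in dCache but not in LRC: {len(dark)}")
        print(f"in LRC but not in dCache: {len(lost)}")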

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 11

DDM Operations priority list (cont.)
- DQ2/Panda
  - Policy for the case when the network or central services are not available
  - File registration in TID datasets only after the files are replicated to BNL and registered in the LRC
- Minimize human intervention in DDM operations
  - Generic integrity check scripts for all ATLAS sites
  - More scripts to check dataset content on a site (P. McGuigan pages)
  - Kaushik's proposal: get info for datasets produced by Panda using the dataset name or task ID (see the sketch below)
- Recovering procedures
  - After a failure of FTS, dCache, site services, network, etc.
- 2007 functional tests will address DDM performance issues (April 2007, after 0.3 deployment on the sites)
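A minimal sketch of the kind of lookup proposed above: relating Panda task IDs and dataset names. The "_tid<number>" suffix convention and the example names are assumptions used for illustration only.

    # Map between a task ID and the (TID) datasets that carry it in their name.
    import re

    TID_SUFFIX = re.compile(r"_tid(\d+)$")

    def task_id_from_dataset(dsn):
        """Return the task ID encoded in a TID dataset name, or None."""
        match = TID_SUFFIX.search(dsn)
        return int(match.group(1)) if match else None

    def datasets_for_task(task_id, known_datasets):
        """Select the datasets belonging to a given task from a list of names."""
        return [dsn for dsn in known_datasets if task_id_from_dataset(dsn) == task_id]

    if __name__ == "__main__":
        catalog = [
            "csc11.005300.example.recon.AOD.v12000601_tid004321",   # hypothetical
            "csc11.005300.example.recon.ESD.v12000601_tid004321",   # hypothetical
            "csc11.005301.other.recon.AOD.v12000601_tid004400",     # hypothetical
        ]
        print(task_id_from_dataset(catalog[0]))     # -> 4321
        print(datasets_for_task(4321, catalog))     # -> the two matching datasets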

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 12

DDM Operations URLs
- AODs and RDOs distribution
- DB releases distribution
- ATLAS AODs/NTUPs delivered in the BNL cloud (P. McGuigan)
- ATLAS CSC datasets at BNL: /html/bnl_datasets.html

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 13

Backup slides

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 14

Results on Performance Testing of the CERN LFC
- Machines used for running the local tests: lxmrrb53[09/10].cern.ch
  - CPUs: 2x Intel Xeon 3.0 GHz (2 MB L2 cache)
  - RAM: 4 GB
  - NIC: 1 Gbps Ethernet
- Local test conditions:
  - Background load: < 2% (CPUs), < 45% (RAM)
  - Ping to the LFC (LRC) server: ≈ 0.5 (0.1) ms
- On the remote sites similar 2-CPU ATLAS VO boxes were used.
- Results table: plateau rate (Hz) and time per GUID (ms) per tier (CERN/CNAF, RAL, ASGC), for December 2006 and January 2007 (see the sketch below for how the rate relates to the per-GUID time).
- LFC server: prod-lfc-atlas-local.cern.ch; LFC test server: lxb1540.cern.ch
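A minimal sketch of how the quoted numbers relate: the plateau rate in Hz is the inverse of the average time per GUID measured over a lookup loop. The lookup() callable is a placeholder for an LFC/LRC query, not the real LFC client API.

    # Measure a GUID processing rate (Hz) and the average time per GUID (ms).
    import time

    def measure_rate(guids, lookup):
        """Return (rate_hz, ms_per_guid) for sequential lookups of the given GUIDs."""
        start = time.time()
        for guid in guids:
            lookup(guid)
        elapsed = time.time() - start
        ms_per_guid = 1000.0 * elapsed / len(guids)
        rate_hz = len(guids) / elapsed
        return rate_hz, ms_per_guid

    if __name__ == "__main__":
        # Dummy lookup that just sleeps; with bulk operations the per-GUID cost drops
        # and the rate rises accordingly (12.4 Hz vs ~250 Hz quoted on slide 15).
        fake_guids = [f"guid-{i:04d}" for i in range(100)]
        rate, ms = measure_rate(fake_guids, lookup=lambda g: time.sleep(0.004))
        print(f"{rate:.1f} Hz, {ms:.2f} ms per GUID")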

Mar 8, 2007, US ATLAS T2 WS, A. Klimentov, Slide 15

LFC Performance Testing
- Jan 2007: test LFC host, API lib with bulk operations, the same set of GUIDs, average of 5 measurements
  - 250 +/- 35 Hz GUID processing rate
- Dec 2006: LFC production host, production API libs, no bulk operations support
  - 12.4 Hz GUID processing rate