BNL Site Report
Xin Zhao, Hironori Ito
Brookhaven National Laboratory, USA
ATLAS Sites Jamboree, CERN, 18-20 January 2017

Outline
- General Status
- Object Store and its Integration with dCache (Hiro)
- Tier1 Network Migration
- Increased Bandwidth for LHCONE/LHCOPN
- Production Jobs on Local (Opportunistic) Tier3 Queue
- Staging Test Directly from HPSS Tapes
- Running ATLAS Jobs on OSG Opportunistic Sites

General Status
- BNL CSI (Computational Science Initiative): centralizing scientific computing at BNL
  - RACF/Tier1 => SDCC (Scientific Data and Computing Center)
- ATLAS Computing Facility (Tier1): running fine overall, pledges fulfilled for 2016
  - CPU capacity: added 100 Dell R430 systems, for a total of ~18k Condor job slots
  - dCache disk storage: 14PB (pledge: 11.5PB)
  - HPSS: ~22PB of ATLAS data on tape (pledge: 27PB)
- Now on to some highlighted topics...

Object Store and Integration with dCache
- We have been running two independent TEST instances of Ceph storage, built from retired dCache storage nodes and other retired servers.
- One instance, previously used as the main S3 storage for the event service, has been retired completely to make room for newly retired storage. At the time of retirement it still held more than 30M objects, which suggests that (a) the deletion service is not working, and/or (b) the data is not in fact temporary, and/or (c) the data is not being used.
- The other instance, previously called a test instance, is now the primary and only S3 service. Due to various reorganizations of the backend storage servers, it currently runs at less than half its capacity and performance.
- A new Ceph storage instance is currently being installed.
- S3 is not the only use of Ceph storage at BNL:
  - Ceph librados: dCache storage pool, GridFTP, XRootD
  - CephFS: cloud storage, GridFTP, XRootD
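The 30M leftover objects point at a stalled deletion path. As a rough illustration (not the production deletion service), a bulk cleanup against a Ceph RGW S3 endpoint can be scripted with boto3; the endpoint URL, bucket name, and credentials below are placeholders:

```python
import boto3

# Placeholder endpoint/bucket/credentials -- not the real BNL configuration.
s3 = boto3.client(
    "s3",
    endpoint_url="http://ceph-rgw.example.org:8080",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

BUCKET = "eventservice-test"

# Page through the bucket and delete in batches of up to 1000 keys,
# the maximum accepted by a single delete_objects() call.
paginator = s3.get_paginator("list_objects_v2")
deleted = 0
for page in paginator.paginate(Bucket=BUCKET):
    keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
    if keys:
        s3.delete_objects(Bucket=BUCKET, Delete={"Objects": keys})
        deleted += len(keys)
print(f"deleted {deleted} objects")
```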

S3 Performance via Event Service
- Doug, Taylor and Wen have been the primary people looking closely at the behavior of the S3 storage, through event service jobs at HPC sites (NERSC and ANL) as well as opportunistic event service jobs on the grid.
- Doug and Taylor have reported performance issues seen in event service jobs at the HPC sites, and have also run simulated tests to evaluate the observed S3 client behavior:
  https://drive.google.com/open?id=1U00Zl11gjd_0yoYWFnkZ3Aj3ElHLSjLRgMR7I1H2SMw
  https://drive.google.com/open?id=1qta2BQst1dsLb0wzgXvqT7O2NMQQVKzgtKWfy9DM6f8
  https://drive.google.com/open?id=1ZuUK0rnrlTYgiT7Z0_15IBVeCHHzwqkd2X7vc0kPIZQ
- ADC has asked BNL to study S3 performance systematically by scanning the following parameters: number of clients, RTT, data size, number of buckets, deletion rate. This study is ongoing.
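A minimal sketch of what such a parameter scan could look like, assuming a hypothetical endpoint and bucket: N concurrent clients each PUT objects of a given size and the aggregate rate is recorded. The real study presumably drives the actual event service client; this only illustrates the client-count and object-size axes:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import boto3

ENDPOINT = "http://ceph-rgw.example.org:8080"  # placeholder, not the BNL endpoint
BUCKET = "s3-benchmark"                        # placeholder bucket

def put_objects(client_id: int, n_objects: int, size_bytes: int) -> float:
    """One client: PUT n_objects of size_bytes each; return elapsed seconds."""
    s3 = boto3.client("s3", endpoint_url=ENDPOINT)  # one client per thread
    payload = b"x" * size_bytes
    start = time.time()
    for i in range(n_objects):
        s3.put_object(Bucket=BUCKET, Key=f"client{client_id}/obj{i}", Body=payload)
    return time.time() - start

# Scan two of the requested axes: number of concurrent clients and object size.
for n_clients in (1, 4, 16):
    for size in (64 * 1024, 1024 * 1024):
        with ThreadPoolExecutor(max_workers=n_clients) as pool:
            elapsed = list(pool.map(lambda c: put_objects(c, 50, size),
                                    range(n_clients)))
        total_mb = n_clients * 50 * size / 1e6
        print(f"{n_clients:2d} clients, {size // 1024:5d} kB objects: "
              f"{total_mb / max(elapsed):.1f} MB/s aggregate PUT")
```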

Tier1 Network Migration
Move the Tier1 (ACF) network out of the BNL Campus Network.
[Diagram: Internet; BNL Campus Network (protected by the BNL perimeter firewall); Science DMZ (open to the internet); ACF Network (protected by the US ATLAS firewall)]

Tier1 Network Migration
Why the move?
- Allow IPv6 rollout to the ATLAS Tier1 facility
- Separate high-bandwidth, firewalled scientific internet traffic from BNL campus general-purpose internet traffic
- Isolate the Tier1 from the BNL campus network, so the Tier1 facility can benefit from the cybersecurity rules that govern "scientific" traffic
Schedule
- January 30th is the cut-over day; a downtime may be scheduled
Effects on users after the migration
- Transparent to users/jobs coming from outside BNL
- Minor changes in how local ATLAS (Tier3) users access the interactive nodes

Increased Bandwidth for LHCONE/LHCOPN
BNL network (perimeter/science DMZ) topology: one primary 100G circuit for ALL traffic, plus one 100G backup.

Increased Bandwidth for LHCONE/LHCOPN
- LHCONE+LHCOPN saturated the primary circuit at 100Gbps for the first time around the end of August 2016
- The old backup 100G connection is now the primary circuit for LHCONE
- The two circuits back each other up; effectively, the total bandwidth to ESnet is doubled

Production Jobs on Local (Opportunistic) Tier3 Queue
BNL local Tier3 resources
- ~2k job slots; local users' jobs preempt others
- Single-core jobs only: preemption doesn't work well with partitionable slots in Condor (in contact with the HTCondor developers)
Backfill with production jobs
- PanDA queue: BNL_LOCAL
- High failure rate (>50%) for regular single-core jobs, due to preemption
- Single-core event service jobs are ideal; implemented and tested successfully by ADC
- We need more of them: many CPU-hours still sit idle (see the sketch below for one way to size the opportunity)
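As a rough illustration of sizing the backfill opportunity (not part of the actual setup), the unclaimed CPUs in the Tier3 pool can be counted with the HTCondor Python bindings; the collector host below is a placeholder:

```python
import htcondor

# Placeholder collector host -- not the actual BNL Tier3 pool address.
collector = htcondor.Collector("condor-t3.example.org")

# Query startd ads for slots that are currently unclaimed (i.e. available
# for opportunistic backfill at this moment).
ads = collector.query(
    htcondor.AdTypes.Startd,
    constraint='State == "Unclaimed"',
    projection=["Name", "Cpus"],
)

idle_cpus = sum(int(ad.get("Cpus", 0)) for ad in ads)
print(f"{len(ads)} unclaimed slots, ~{idle_cpus} CPUs available for backfill")
```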

Production Jobs on Local (Opportunistic) Tier3 Queue
[Chart: backfill of production jobs on BNL_LOCAL is spotty over time]

Staging Test from Tapes
Staging test (Jan 10-14): replicate 150TB of AODs from DATATAPE to DATADISK
- ~1500 new requests added per hour
- Transfer rate: not constant, averaging 385MB/s
- The number of queued requests did not grow

Staging Test from Tapes
[Chart: number of files queued vs. number of files staged per hour; small files drag down the staging rate]
Possible improvements:
- Increase file size
- Bulk requests

Staging Test from Tapes
Possible improvements:
- Increase file size
- Bulk requests: the BNL tape system optimizer can then reduce tape remounts

Staging Test from Tapes
Use case study of tape system performance: the BNL STAR experiment
- In September 2016, STAR submitted ~245,000 files

ATLAS Jobs on OSG Opportunistic Sites
- Each OSG opportunistic site now has its own PanDA queue and AGIS entries
  - Initially there was one PanDA queue for all OSG opportunistic sites; this proved too coarse-grained and difficult to troubleshoot
  - Required working out an AGIS hierarchy for opportunistic sites, to separate pledged from non-pledged resources for accounting (thanks to Alexey Anisenkov)
- Most sites are CMS-owned, with some non-LHC sites as well, so we get slots during CMS lulls

ATLAS Jobs on OSG Opportunistic Sites
Usage is volatile, but the peak number of simultaneous jobs can be significant.

Questions?

Backup Slides

Jan 11 sample staging log

Date/Hour            New Req  Staged  Failed  Data Volume (bytes)  Avg MB/s
01/11/2017 00:00:00     1605    1633       0      585,396,061,953    155.08
01/11/2017 01:00:00     2007    1951       0      760,356,521,658    201.43
01/11/2017 02:00:00     1791    1855       0    3,062,424,578,459    811.27
01/11/2017 03:00:00     1982    2033       0    3,888,970,095,398   1030.23
01/11/2017 04:00:00     1699    1710       0    3,796,246,265,582   1005.66
01/11/2017 05:00:00     2517    2482       0    3,003,709,885,109    795.71
01/11/2017 06:00:00     2251    2241       0      762,490,212,934    201.99
01/11/2017 07:00:00     2081    2015       0      648,813,882,979    171.88
01/11/2017 08:00:00     1228    1215       0      440,803,029,680    116.77
01/11/2017 09:00:00     2297    2420       0      972,648,422,679    257.66
01/11/2017 10:00:00     2229    2202       0    1,672,576,231,070    443.08
01/11/2017 11:00:00     1981    1964       0    2,662,066,393,059    705.21
01/11/2017 12:00:00     1919    1947       0    2,565,759,484,005    679.69
01/11/2017 13:00:00     1747    1721       0    4,062,789,294,705   1076.27
01/11/2017 14:00:00     2176    2194       0    3,186,628,257,301    844.17
01/11/2017 15:00:00     1845    1777       0    2,370,421,391,742    627.95
01/11/2017 16:00:00     1326    1375       0    3,358,896,586,460    889.80
01/11/2017 17:00:00     1628    1599       0    2,618,689,730,919    693.72
01/11/2017 18:00:00     1557    1692       0    2,920,748,610,089    773.73
01/11/2017 19:00:00     2040    1883       0    2,192,466,095,603    580.81
01/11/2017 20:00:00     1446    1525       0    1,448,261,789,917    383.66
01/11/2017 21:00:00     1734    1705       0    3,202,032,394,601    848.25
01/11/2017 22:00:00     1539    1507       0    1,991,100,218,575    527.46
01/11/2017 23:00:00     1429    1368       2    1,238,955,364,515    328.21
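As a cross-check, the "Avg MB/s" column reproduces as data volume / 3600 s / 2^20, i.e. MiB per second (e.g. 585,396,061,953 / 3600 / 2^20 ≈ 155.08). A short stand-alone sketch to recompute daily totals, assuming the table is saved as a hypothetical staging_log.txt in the whitespace-separated shape above:

```python
# Recompute daily totals from the hourly staging log shown above.
# "staging_log.txt" is a hypothetical filename; each data row has 7
# whitespace-separated fields:
#   MM/DD/YYYY HH:MM:SS  new_req  staged  failed  bytes(comma-grouped)  avg
total_staged = 0
total_bytes = 0
hours = 0
with open("staging_log.txt") as log:
    for line in log:
        parts = line.split()
        if len(parts) != 7 or "/" not in parts[0]:
            continue  # skip the header and any blank lines
        total_staged += int(parts[3])                  # files staged
        total_bytes += int(parts[5].replace(",", ""))  # bytes moved
        hours += 1

# The slide's "MB/s" matches bytes / 3600 s / 2**20, i.e. MiB per second.
avg_rate = total_bytes / (hours * 3600) / 2**20
print(f"{hours} hours: {total_staged} files, "
      f"{total_bytes / 1e12:.1f} TB, average {avg_rate:.0f} MB/s")
```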
