February WLCG GDB: short summaries
Jeremy Coles
Introduction
March GDB co-located with ISGC
April pre-GDB: collaborating with other communities https://indico.cern.ch/event/578969/
WISE 27th-29th March; LHCOPN-LHCONE 4th-5th April; HEPiX Budapest 24th-28th April…
AFS phaseout @ CERN
Slow demise of upstream project.
Deadline ‘soft’, <2 yrs.
Easy: software -> CVMFS; websites -> EOSWEB; FUSE.
https://its.cern.ch/jira/browse/NOAFS
2017 for “harder stuff”: project migrations; remove software; moving /work and /user.
Non-grid experiment use of AFS (home dir, T0 activity, ...). Occasional GGUS tickets. “Misconfigured” sites.
External AFS disconnection test: 2017-02-15 09:00 CET for 24 hrs. ITSSB entry.
Goal: flush out unknown AFS dependencies. Example: /afs/cern.ch user home directories.
Trying to set up a CERN-only CVMFS for compiler-type software.
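One way a site could look for leftover AFS dependencies ahead of a disconnection test is to scan job scripts and configuration files for /afs/ paths. The following is only a minimal sketch: the function name, the regular expression and the sample script are hypothetical, and a plain text scan will not catch indirect dependencies such as symlinks into AFS.

```python
import re

# Assumption: any absolute path starting with /afs/ counts as an AFS dependency.
AFS_PATH = re.compile(r"/afs/[\w./-]+")


def find_afs_references(text):
    """Return (line_number, path) pairs for every /afs/ path found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AFS_PATH.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Hypothetical job script, used only for illustration.
sample = """source /afs/cern.ch/user/x/example/setup.sh
echo ok
export PATH=/cvmfs/sft.cern.ch/bin:$PATH
"""

for lineno, path in find_afs_references(sample):
    print(f"line {lineno}: {path}")
```

Running the same function over each file in a software area would give a first list of candidates to migrate to CVMFS or EOS.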
Pre-GDB on Benchmarking
Mainly HEPiX WG context.
Mandate: investigate scaling issues (HS06 vs HEP workloads); evaluate fast benchmarks; study the next generation of long-running benchmark.
Fast benchmarks: converged on DB12 and ATLAS KV (looked at job durations and approaches to running benchmarks; highlighted implications for resource accounting).
Linearity of KV and DB12 vs job duration has been demonstrated, but reco, analysis, merge and skim jobs have a non-negligible I/O component and will not scale well.
Cloud environment: ‘whole node’ performance. Scaling factors for HS06, KV and DB12 within expectations. Not everything understood vs bare metal.
Passive benchmarks: use real jobs.
Approaches to running benchmarks: benchmark in the pilot (not possible on HPC….). Two options provided by LHCb (DB12 on GitHub): run in job or at boot.
Cloud Benchmark Suite: toolkit wrapping KV and DB12. Internal to CERN. Adopted by others.
Open questions: KV not discussed so much (DB12 is easier to install). KV is based on ATLAS Athena (large code base, licence issues), but KV can highlight second-order effects (interplay of CPU speed and memory access).
Magic boost of 45% with Haswell over Sandy Bridge. Major contributor is the CPython interpreter; the boost turns out to come from branch prediction.
ALICE and LHCb happy with DB12. ATLAS still evaluating.
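To illustrate what a fast, pilot-friendly benchmark looks like in principle, here is a minimal sketch that times a fixed amount of pure-Python arithmetic and reports work per CPU-second. It is not the DB12 or KV code: the function names, iteration count and score units are assumptions made for illustration, and the duration estimate is only meaningful for CPU-bound jobs, in line with the I/O caveat above.

```python
import random
import time


def fast_cpu_score(iterations=200_000, seed=12345):
    """Time a fixed pure-Python workload; return iterations per CPU-second."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    start = time.process_time()
    for _ in range(iterations):
        x = rng.normalvariate(10.0, 1.0)
        total += x
        total_sq += x * x
    cpu_seconds = time.process_time() - start
    # Guard against a zero reading on clocks with coarse resolution.
    cpu_seconds = max(cpu_seconds, 1e-9)
    return iterations / cpu_seconds


def estimate_cpu_seconds(work_units, score):
    """Assume job CPU time scales linearly with 1/score (CPU-bound jobs only)."""
    return work_units / score


score = fast_cpu_score()
print(f"score: {score:.0f} iterations per CPU-second (arbitrary units)")
```

Because the workload runs entirely inside the interpreter, a score of this kind is sensitive to how well the CPU handles interpreter-style code, which is consistent with the branch-prediction effect noted above.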
EOS workshop
Overview of workshop
EOS releases (gemstone naming). Tags to track changes.
Currently at CERN: 150 PB raw capacity.
Storage node + storage array = block.
Filesystem access: FUSE.
Python notebook integration (SWAN).
Collaborative editing via MS Office.
Australian: distributed setup.
Earth Observation data processing.
CMS CERN Tier-2.
EOS namespace: using Redis.
Infrastructure-aware scheduling.
Putting /eos in production: gradual roll-out and checking of performance.
IHEP instance.
Russian: federated storage use case.
The EOS workflow engine.
Cloud Services for Synchronisation and Sharing (CS3)
Sessions: applications; technology; storage tech; services; industrial; projects and collaborations; new site services.
One recurrent issue under services: scalability.
CERN Tape Archive (CTA)
Evolution of CASTOR: EOS plus CTA is a “drop in” replacement for CASTOR.
EOS is the de facto disk storage for LHC physics data, so this is a natural evolution.
Ready for friendly small experiments in mid 2018; ready for LHC experiments at the end of 2018.
Could use ENSTORE, so why build from scratch? CASTOR already had the tape storage software; only the metadata handling is being redone.
Baseline for WLCG Stratum 1 Operations
Stratum 1 network: backbone for CernVM-FS HTTP content distribution.
Current WLCG Stratum 1s: loosely coupled set of web services at 5 sites.
Maintenance of S0-S1 replication; client configuration.
Typical information needs of experiments, e.g. which Stratum 1 has my repo.
Suggested baseline: storage 20 TB (50% growth/year); sync every 15 mins; latest software within 2 months; ports 80 and 8000 etc.
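The suggested storage baseline can be turned into a rough capacity projection. The sketch below assumes the 50% growth compounds once per year from the 20 TB starting point; the function name and the five-year horizon are illustrative choices, not part of the proposed baseline.

```python
def projected_storage_tb(baseline_tb=20.0, annual_growth=0.5, years=5):
    """Return projected capacity for year 0..years, assuming compound annual growth."""
    return [baseline_tb * (1 + annual_growth) ** year for year in range(years + 1)]


for year, capacity in enumerate(projected_storage_tb()):
    print(f"year {year}: {capacity:.2f} TB")
```

Under these assumptions the requirement goes from 20 TB to 30, 45 and 67.5 TB over the first three years, and passes 100 TB in year 4.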