COMPUTING FOR ALICE IN THE CZECH REPUBLIC in 2016/2017

COMPUTING FOR ALICE IN THE CZECH REPUBLIC in 2016/2017 D. Adamova, with the help and support of: J. Chudoba, J. Svec, V. Rikal, A. Mikula, M. Adam and J. Hampl. Strasbourg 2017

Outline Status of the WLCG Tier-2 site in Prague ALICE operations in 2016/2017 Report on the delivery of mandatory resources Various other issues Summary and Outlook

The geographical layout
Nuclear Physics Institute AS CR (NPI): a large ALICE group, no ATLAS involvement; hosts a (smaller) part of the ALICE storage resources.
Institute of Physics AS CR (FZU): the regional computing center and WLCG Tier-2 site praguelcg2; hosts all of the CPU resources for ALICE and a (larger) part of the ALICE storage resources; a quite small ALICE group, but a much larger ATLAS community.
The two institutes are connected by a 10 Gb/s dedicated fiber link.
Originally, the xrootd storage servers for ALICE were located only at NPI. Subsequently, xrootd servers were also put into operation at FZU, and now even the redirector is there.

HEP Computing in Prague: site praguelcg2 (a.k.a. the farm GOLIAS)
A national computing center for processing data from various HEP experiments, located at the Institute of Physics (FZU) in Prague.
Basic infrastructure already in place in 2002; officially started in 2004. Certified as a Tier-2 center of the LHC Computing Grid (praguelcg2); collaboration with several Grid projects.
In April 2008, the WLCG MoU was signed by the Czech Republic (ALICE + ATLAS).
Very good network connectivity: multiple dedicated 1 - 10 Gb/s connections to collaborating institutions and a 2x10 Gb/s connection to LHCONE (upgraded from 10 Gb/s on 13.10.2016).
Provides computing services for ATLAS, ALICE, D0, solid state physics, Auger, NOvA, CTA, ...

Some pictures: servers xrootd1 (125 TB) and xrootd2 (89 TB); server xrootd5 (422 TB).

Current numbers
1 batch system (Torque + Maui). 2 main WLCG VOs: ALICE and ATLAS; other users: Auger, NOvA, CTA, FNAL's D0 user group.
220 WNs with 4556 CPU cores published in the Grid; ~4 PB of disk storage (DPM, XRootD, NFS) and a tape library.
Regular yearly upscale of resources on the basis of various sources of funding, mainly academic grants.
The WLCG services include: APEL publisher, Argus authorization service, BDII, several UIs, the ALICE VOBox, CREAM CEs, Storage Elements, lcg-CA.
Monitoring: Nagios, Munin, Ganglia, NetFlow, Observium. Log processing and visualization: Elasticsearch, Kibana + Logstash, i.e. the ELK stack. Service management by Puppet. The use of virtualization at the site is quite extensive.
ALICE disk XRootD Storage Element ALICE::Prague::SE: ~1.6 PB of disk space in total; a distributed storage cluster with the redirector + 5 clients at FZU and 4 clients at NPI Rez; a 10 Gb/s dedicated connection from FZU to the NPI ALICE storage cluster.
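As an illustration of how such a distributed XRootD cluster can be probed from a client node, here is a minimal sketch using the XRootD Python bindings; the redirector hostname and the exported path are placeholders, not the actual ALICE::Prague::SE configuration.

```python
# Minimal availability probe for a redirected XRootD storage cluster,
# using the XRootD Python bindings (pyxrootd).
# The hostname and path below are placeholders for illustration only.
from XRootD import client
from XRootD.client.flags import OpenFlags

REDIRECTOR = "root://xrootd-redirector.example.cz:1094"  # placeholder

fs = client.FileSystem(REDIRECTOR)

# Check that the redirector answers at all.
status, _ = fs.ping()
print("redirector reachable:", status.ok)

# Ask the redirector which data servers can serve a given path;
# on a redirected cluster this lists the subscribed disk servers.
status, locations = fs.locate("/alice", OpenFlags.REFRESH)
if status.ok:
    for loc in locations:
        print("data server:", loc.address)
```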

Annual growth of the total on-site resources since 2012.

All disk servers are connected via 10 Gb/s to the central Cisco Nexus 3172PQ switch. Worker nodes are connected at 1 Gb/s to top-of-rack switches, which have 10 Gb/s uplinks to the central switch.

External connectivity is provided by CESNET. The core load on the external connections comes from the LHC experiments ATLAS and ALICE. Before the upgrade to 2x10 Gb/s, the LHCONE link was often saturated.

Prague LHCONE connectivity: 2 x 10 Gb/s FZU - LHCONE connection. Data traffic on the FZU - LHCONE connection.

IPv6 deployment: all worker and storage nodes have both IPv4 and IPv6 addresses. Local transfers via the GridFTP protocol preferentially use IPv6.
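As background to why dual-stack nodes end up preferring IPv6: the system resolver applies RFC 6724 ordering, so the IPv6 addresses of a dual-stack peer are normally returned, and therefore tried, before the IPv4 ones. A minimal sketch; the hostname is a placeholder, not a real FZU/NPI node.

```python
# Illustration of dual-stack address selection: with AF_UNSPEC,
# getaddrinfo() applies RFC 6724 ordering, so on a dual-stack host
# the IPv6 (AF_INET6) addresses of a dual-stack peer are normally
# listed first. Port 2811 is the standard GridFTP control port.
import socket

host = "storage-node.example.cz"   # placeholder dual-stack server
for family, _, _, _, sockaddr in socket.getaddrinfo(
        host, 2811, socket.AF_UNSPEC, socket.SOCK_STREAM):
    label = "IPv6" if family == socket.AF_INET6 else "IPv4"
    print(label, sockaddr[0])
```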

The ELK stack is used for log processing and visualization; an example of Kibana filtering.
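For illustration only, the kind of filtering done in Kibana can also be issued directly against the Elasticsearch backend. A minimal sketch assuming a Logstash-style index pattern, placeholder field names and endpoint (not the actual site setup), and the classic body-style elasticsearch Python client (5.x-7.x):

```python
# Sketch of querying the ELK backend directly, equivalent to a simple
# Kibana filter. Index pattern, field names and endpoint are assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://elasticsearch.example.cz:9200"])  # placeholder

resp = es.search(
    index="logstash-*",            # assumed Logstash-style index pattern
    body={
        "size": 10,
        "query": {"bool": {"must": [
            {"match": {"syslog_program": "xrootd"}},          # assumed field
            {"range": {"@timestamp": {"gte": "now-1h"}}},
        ]}},
    },
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"].get("message", ""))
```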

ALICE operations in 2016/2017
Running jobs profile at the Prague Tier-2 in Nov. 2015 - Nov. 2016 (plot legend: ATLAS, ATLAS multicore, ALICE). The local ALICE share in 2016 was lower than in 2015; opportunistic resources were not so easy to use because ATLAS and NOvA were very active.
Running jobs profile in 2016/2017 as shown in ALICE MonALISA: the average number of jobs was 1192, with peaks of up to 3070.

CPU resources share & site availability in 2016/2017: in 2016, the Czech Republic's share of the Tier-2 CPU resources delivery was ~3.1%; in 2017, the Prague share is ~2.7%. Service availability in 2016.

Partial upgrade of the Prague SE cluster: on March 2, 2017, a new 16-port Cisco Catalyst 4500-X switch/router was installed at NPI and the lids* storage machines were put on 10 Gb/s connectivity. Traffic peaks on the individual lids* servers went up from close to 100 MB/s to over 500 MB/s.

Tuning/upgrade of the xrootd cluster
January 2016 - April 2017: the outgoing traffic at the Prague/Rez ALICE storage cluster exceeded 3.8 GB/s in peaks, with almost 32.1 PB of data read out in total.
Traffic on the dedicated NPI <-> FZU line reached maximum rates of over 5 Gb/s. The new 10 Gb/s Ethernet connectivity is not yet fully exploited and some additional tuning is necessary, but the data traffic on the servers went up after the new Cisco switch was installed and the lids* servers were put on 10 Gb/s.
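A back-of-the-envelope check of these numbers, assuming roughly 16 months between January 2016 and April 2017 and decimal units:

```python
# Rough consistency check: 32.1 PB read out over ~16 months corresponds
# to the following sustained average rate, to be compared with the
# reported >3.8 GB/s peaks.
PB = 1e15                       # decimal petabytes
months = 16
seconds = months * 30 * 24 * 3600
avg_rate = 32.1 * PB / seconds  # bytes per second
print(f"average outgoing rate ~ {avg_rate / 1e9:.2f} GB/s")  # ~0.77 GB/s
```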

Czech Republic resources delivery
Mandatory according to the ALICE Constitution: 1.96% of the total required T2 resources.
CPU (kHS06), 2016: ALICE requirement 7.2; REBUS pledged 5.0; REBUS delivered 8.85.
CPU (kHS06), 2017: ALICE requirement 12.22; REBUS pledged 5.0; REBUS delivered 9.4.
Disk (PB), 2016: ALICE requirement 0.92; REBUS pledged 1.4; REBUS delivered 1.79.
Disk (PB), 2017: ALICE requirement 1.12; REBUS pledged 1.4; REBUS delivered 1.79.
For the Czech ALICE group it is easier to scale up the storage capacity than to pile up CPUs.
The ALICE requests for computing and storage resources in 2016 were satisfied, even though the CPU pledges were low:
- by stable operations with almost no outages
- by using servers that are out of warranty
- by extensive use of other projects' resources
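The same REBUS figures expressed as delivered-to-required ratios (a small worked check of the numbers quoted above):

```python
# Delivered vs. required resources from the REBUS numbers above.
# CPU in kHS06, disk in PB.
figures = {
    ("CPU",  2016): (7.20,  8.85),
    ("CPU",  2017): (12.22, 9.40),
    ("Disk", 2016): (0.92,  1.79),
    ("Disk", 2017): (1.12,  1.79),
}
for (resource, year), (required, delivered) in figures.items():
    print(f"{resource} {year}: {delivered / required:.0%} of the ALICE requirement")
# CPU  2016: 123%, CPU  2017: 77%  -> under-delivery in CPU in 2017
# Disk 2016: 195%, Disk 2017: 160% -> disk comfortably above the requirement
```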

ARC-CE test operations
For the past couple of months, an ARC-CE has been in operation in Prague for NOvA and ATLAS. In March, a second (ARC) VOBox was set up and a small part of the ALICE jobs has been submitted via the ARC-CE. The final goal is to replace CREAM + Torque with ARC + HTCondor; currently, both systems co-exist.
Prague ARC-CE submission, 1.3.2017 - 22.4.2017: on average 70, at maximum 425 running jobs.
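For illustration, a minimal test submission to an ARC CE of the kind used to commission such a setup might look as follows; the CE hostname is a placeholder, and a valid grid proxy plus the standard NorduGrid ARC client tools (arcsub) are assumed.

```python
# Sketch of a minimal test job sent to an ARC CE. The job description is
# plain xRSL written to a temporary file and submitted with arcsub.
# The CE hostname is a placeholder, not the actual Prague ARC-CE.
import subprocess
import tempfile

XRSL = '&(executable="/bin/hostname")(stdout="out.txt")(jobname="arc-test")'

with tempfile.NamedTemporaryFile("w", suffix=".xrsl", delete=False) as f:
    f.write(XRSL)
    jobfile = f.name

# Requires a valid grid proxy and the nordugrid-arc client installed.
subprocess.run(["arcsub", "-c", "arc-ce.example.cz", jobfile], check=True)
```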

Planned compute/storage capacity upgrades
In May 2017, a new grant project is starting, running until November 2019. It is complementary to the traditional big CERN - CR collaboration grant and is aimed at investments into computing for CERN (in our case ALICE and ATLAS); investments into computing resources are absent from the "big" CERN-CR grant. The total capacity is shared with the ratio ALICE : ATLAS = 1 : 3, according to the ratio of scientists participating in the projects. Unfortunately the limits are low, ca. 450 kEUR in total for hardware purchases for both experiments.
The schedule of purchases:
2017: a compute cluster with a capacity of 20 - 25 kHS06, officially shared between ALICE and ATLAS with the ratio 1:3.
2018: storage servers/cluster with a capacity of 300 - 400 TB exclusively for ALICE, operated at NPI.
2019: a compute cluster with a capacity of ~13 kHS06, again with an ALICE : ATLAS share of 1:3.
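In terms of the ALICE share implied by the 1:3 split, the planned purchases translate roughly as follows (a simple worked calculation, not an official pledge):

```python
# ALICE's rough share of the planned purchases, using the 1:3 ALICE:ATLAS
# split quoted above (the 2018 storage purchase is ALICE-only).
alice_fraction = 1 / (1 + 3)
cpu_2017 = (20, 25)      # kHS06, shared cluster
cpu_2019 = 13            # kHS06, shared cluster
print(f"2017 ALICE CPU share: {cpu_2017[0] * alice_fraction:.2f}"
      f"-{cpu_2017[1] * alice_fraction:.2f} kHS06")
print(f"2019 ALICE CPU share: ~{cpu_2019 * alice_fraction:.2f} kHS06")
# 2017: 5.00-6.25 kHS06; 2019: ~3.25 kHS06; 2018: 300-400 TB of disk for ALICE
```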

Aspects of further future upscale: the future LHC runs will require a huge upscale of computing, storage and network resources, which will be impossible for us to achieve within the expected budget. Activities towards getting access to external resources, HPC centers or computational clouds, are ongoing at all LHC experiments. On the local level, there are lengthy discussions with the HPC center IT4Innovations (IT4I) in Ostrava. The showstopper is the absence of tools for submitting ALICE jobs to that site: we would need an IT student with some experience to work on the submission interface, and it is hard to find someone like that.

Summary and Outlook
The Czech Republic performed well during 2016 in the delivery of the mandatory resources: high availability, reliable delivery of services, fast response to problems.
In 2017, we provide enough storage but are under-delivering in compute resources.
A network upgrade was done at NPI, putting the local part of ALICE::Prague::SE on 10 Gb/s connectivity.
A new grant is starting in May, with planned investments into compute and storage resources allowing for an upscale of the current capacities.
In the upcoming years, we will do our best to keep up the reliability and performance of the services and to deliver the pledges.

Backups

Czech Republic CPU delivery in 2017: the Prague share of the Tier-2 CPU resources delivery in 2017 was ~2.7%.

ALICE operations in 2016
Running jobs profile at the Prague Tier-2 in March 2016 (plot legend: ATLAS, ALICE); the ALICE share is lower than in 2015.
Running jobs profile as shown in ALICE MonALISA: the number of concurrently running ALICE jobs at FZU in the first months of 2016 was on average ~1300, with peaks of ~3000.