WLCG Tier-2 site in Prague: a little bit of history, current status and future perspectives
Dagmar Adamova, Jiri Chudoba, Marek Elias, Lukas Fiala, Tomas Kouba, Milos Lokajicek, Jan Svec
Prague

Outline
Introducing the WLCG Tier-2 site in Prague
A couple of history flashbacks – we celebrate the 10th anniversary
Current issues
Summary and Outlook

HEP Computing in Prague: site praguelcg2 (a.k.a. the GOLIAS farm)
A national computing center for processing data from various HEP experiments
–Located at the Institute of Physics (FZU) in Prague
–Basic infrastructure already in place in 2002, but OFFICIALLY STARTED IN 2004 → 10th ANNIVERSARY THIS YEAR
Certified as a Tier-2 center of the LHC Computing Grid (praguelcg2)
–Collaboration with several Grid projects; in April 2008 the WLCG MoU was signed by the Czech Republic (ALICE+ATLAS)
Excellent network connectivity: multiple dedicated 1–10 Gb/s connections to collaborating institutions; connected to LHCONE
Provides computing services for ATLAS + ALICE, D0, solid state physics, Auger, Star...
Started in 2002 with 32 dual PIII 1.2 GHz rack servers (1 GB RAM, 18 GB SCSI HDD, 100 Mb/s Ethernet); 29 of these were decommissioned in 2009
Storage: 1 TB disk array, HP TC4100 server

History: → 2014

Current numbers
1 batch system (Torque + Maui)
2 main WLCG VOs: ALICE, ATLAS
–FNAL's D0 (dzero) user group
–Other VOs: Auger, Star
~4000 cores published in the Grid
~3 PB on new disk servers (DPM, XRootD, NFS)
Regular yearly upscale of resources on the basis of various sources of financial support, mainly academic grants
The WLCG services include:
–APEL publisher, Argus authorization service, BDII, several UIs, ALICE VOBOX, CREAM CEs, Storage Elements
The use of virtualization at the site is quite extensive
ALICE disk XRootD Storage Element ALICE::Prague::SE:
–~ PB of disk space in total
–Redirector/client + 3 servers at FZU, 5 at NPI Rez → a distributed storage cluster

Site Usage
ATLAS and ALICE – continuous production
Other projects – shorter campaigns
(Plots: ALICE and ATLAS running jobs)

Some history flashbacks (celebrating the 10th anniversary)

ALICE PDC 2004 resource statistics: 14 sites
ALICE 2014 resource statistics: 74 sites

ALICE PDC resource statistics – sites in operation
Running jobs on 8 November 2005, per farm (Min / Avg / Max / Sum): CCIN2P3, CERN-L, CNAF, FZK, Houston, Münster, Prague, Sejong, Torino

2006
–ALICE vobox set-up: fixing problems with the vobox proxy (unwanted expirations), AliEn services set up, manually changing the RBs used by the JAs
–Successful participation in the ALICE PDC'06: the Prague site delivered ~5% of the total computing resources (6 Tier-1s, 30 Tier-2s)
–Problems with the fair-share of the site's local batch system (then PBSPro)
Still problems with the functioning of the ALICE vobox proxy during the PDC'07
Problems with job submission due to malfunctions of the default RBs → the failover submission configured
The Prague site delivered ~2.6% of the total computing resources (significant increase of the number of Tier-2s)
Migration to gLite 3.1, ALICE vobox on a 64-bit SLC4 machine, upgrade of the local CE serving ALICE to lcg-CE 3
Repeating problems with job submission through RBs → in October the site was re-configured for WMS submission
Migration to the Torque batch system on a part of the site: some WNs on 32-bit in PBS and some on 64-bit in Torque
Installation and tuning of the creamCE; hybrid state:
–'glite' vobox and WNs, 32-bit
–'cream' vobox submitting JAs directly to the creamCE → Torque, 64-bit
–December: ALICE jobs submitted only to the creamCE

2010
–creamCE 1.6 / gLite 3.2 / SL5 64-bit installed in Prague → we were the first ALICE Tier-2 where CREAM 1.6 was tested and put into production
–NGI_CZ set in operation
2011
–Start of IPv6 implementation
–The site router got an IPv6 address
–Routing set up in special VLANs
–ACLs implemented directly in the router
–IPv6 address configuration: DHCPv6
–Set-up of an IPv6 testbed

2012
Optimization of the ALICE XRootD storage cluster performance: an extensive tuning of the cluster, motivated by the remarkably different performance of the individual machines:
–Data was migrated from the machine to be tuned to free disk arrays on another machine of the cluster
–The migration procedure was done so that the data was accessible all the time
–The emptied machine was then re-configured:
–Number of disks in one array reduced
–Set-up of disk failure monitoring
–RAID controller cache carefully configured
–Readahead option set to a multiple of (stripe_unit * stripe_width) of the underlying RAID array
–No partition table used, to ensure proper alignment of the file systems: they were created with the right geometry options (the "-d su=Xk,sw=Y" mkfs.xfs switches)
–Mounting performed with the noatime option
(Table: parameters of one of the optimized XRootD servers before and after tuning)
Almost all machines migrated to SL6
CVMFS installed on all machines
Connected to LHCONE
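The readahead alignment step above is simple arithmetic; below is a minimal sketch of it in Python, assuming purely hypothetical values (a 256 KiB stripe unit, 10 data disks, device /dev/sdX and mount point /mnt/xrootd-data) – the actual geometry used at the site is not given in the slides.

```python
# Hypothetical RAID geometry and device names (the site's actual values are not in the slides)
STRIPE_UNIT_KIB = 256            # su: stripe unit per data disk, in KiB
STRIPE_WIDTH = 10                # sw: number of data disks in the RAID array
DEVICE = "/dev/sdX"              # hypothetical block device behind one XRootD server
MOUNTPOINT = "/mnt/xrootd-data"  # hypothetical mount point

# Readahead chosen as a multiple of the full stripe (stripe_unit * stripe_width), as on the slide
full_stripe_kib = STRIPE_UNIT_KIB * STRIPE_WIDTH   # 256 * 10 = 2560 KiB
readahead_kib = 2 * full_stripe_kib                # two full stripes = 5120 KiB
readahead_sectors = readahead_kib * 1024 // 512    # blockdev --setra counts 512-byte sectors

# Print the commands that correspond to the tuning steps above
print(f"mkfs.xfs -d su={STRIPE_UNIT_KIB}k,sw={STRIPE_WIDTH} {DEVICE}")  # geometry-aware file system
print(f"blockdev --setra {readahead_sectors} {DEVICE}")                 # stripe-aligned readahead
print(f"mount -o noatime {DEVICE} {MOUNTPOINT}")                        # noatime mount
```

With these assumed values the full stripe is 2560 KiB, so a readahead of two full stripes (5120 KiB, i.e. 10240 sectors) keeps large sequential reads aligned with the array.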

praguelcg2 contribution to the WLCG Tier-2 ATLAS+ALICE computing resources
A long-term decline due to problems with financial support

Current issues

Monitoring issues
A number of monitoring tools in use: Nagios, Munin, Ganglia, MRTG, NetFlow, Gstat, MonALISA
Nagios:
–IPv6-only or IPv4-only servers connected to the central dual-stack node via Livestatus
–Some checks can be run from IPv4-only or IPv6-only Nagios nodes
Munin 2:
–Current version in use
–IPv6 in testing
Ganglia:
–Problems if the proper gai.conf is not present
–gmetad doesn't bind to the IPv6 address on aggregators
NetFlow:
–Plan to switch from v5 to v9 in order to use nfdump + nfsen
Some new sensors are needed to fully deploy IPv6; some additional work is necessary
MonALISA repository:
–A simple test version installed, plans for future development
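As a side note on the gai.conf point above: daemons that resolve host names via getaddrinfo() follow the address ordering it returns, and on glibc systems that ordering can be tuned through the precedence rules in /etc/gai.conf. A minimal Python sketch (using example.org as a placeholder dual-stack host) that shows which address family a node would try first:

```python
import socket

# Placeholder dual-stack host; substitute one of the monitored nodes
HOST = "example.org"

# getaddrinfo() orders its results according to RFC 6724 rules,
# which glibc lets you adjust via /etc/gai.conf
for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(HOST, None, proto=socket.IPPROTO_TCP):
    label = "IPv6" if family == socket.AF_INET6 else "IPv4"
    print(label, sockaddr[0])
```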

Network monitoring – weathermap
The LHCONE link is heavily utilized (capacity 10 Gbps)
Nagios for alerts

Network architecture at FZU

IPv6 deployment
Currently on dual stack: the DPM head node, all production disk nodes, all but 2 subclusters of WNs
Over IPv6 go: DPNS between the disk nodes and the head node, SRM between the WNs and the head node, and the actual data transfers via GridFTP
IPv6 enabled on the ALICE vobox
(Plots: outgoing IPv4 and IPv6 local traffic from the DPM servers)

Site services management
Since 2008, services management has been done with CFEngine version 2
–A cfagent Nagios sensor was developed: a python script checking the CFEngine logs for fresh records (it signals an error if the log is too old)
CFEngine v2 used for production; Puppet used for the IPv6 testbed
Migration to overall Puppet management is in progress
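A minimal sketch of such a freshness check, assuming a hypothetical log path and age threshold (the actual script, path and limit are not shown in the slides); it follows the standard Nagios plugin convention of exit code 0 for OK and 2 for CRITICAL.

```python
#!/usr/bin/env python3
import os
import sys
import time

# Hypothetical path and threshold; the real sensor's values are not given in the slides
LOGFILE = "/var/cfengine/cfagent.log"
MAX_AGE_SECONDS = 2 * 3600   # treat the agent as stale after two hours without a new record

try:
    age = time.time() - os.path.getmtime(LOGFILE)
except OSError as err:
    print(f"CRITICAL - cannot stat {LOGFILE}: {err}")
    sys.exit(2)

if age > MAX_AGE_SECONDS:
    print(f"CRITICAL - {LOGFILE} last updated {int(age)} s ago")
    sys.exit(2)

print(f"OK - {LOGFILE} last updated {int(age)} s ago")
sys.exit(0)
```

Wired into Nagios as an ordinary plugin, an exit code of 2 raises a CRITICAL alert whenever cfagent has stopped updating its log.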

NGI_CZ
Since 2010, NGI_CZ has been recognized and in operation – all the events and relevant information about praguelcg2
2 sites involved: praguelcg2 and prague_cesnet_lcg2
A significant part of the services is provided by the praguelcg2 team
Services provided by NGI_CZ for the EGI infrastructure:
–Accounting (APEL, DGAS, CESGA portal)
–Resources database (GOC DB)
–Operations: ROD (Regional Operator on Duty)
–Top-level BDII
–VOMS servers
–Meta VO
–User support (GGUS/RT)
Middleware versions: UMD 3.0.0, EMI 3.0

Use of external resources
Not much really to choose from
Longer-term usage of the 'skurut' cluster in Prague: site prague_cesnet_lcg2, courtesy of the CESNET association – a long-established cooperation
NGI_CZ provided a single opportunity to use ~35 TB of disk storage in Pilsen, mostly for testing purposes:
–dCache manager used
–Evaluating the effect of switching/tuning TTreeCache and the dCap read-ahead
–Not much help as an extension of home resources

Summary and Outlook
The Prague Tier-2 site has been performing as a distinguished member of the WLCG collaboration for 10 years now
A stable upscale of resources
High accessibility, reliable delivery of services, fast response to problems
In the upcoming years we will do our best to keep up the reliability and performance level of the services
Crucial is the high-capacity, state-of-the-art network infrastructure provided by CESNET
However, the future LHC runs will require a huge upscale of resources, which will be impossible for us to achieve with the expected flat budget
Like everybody else these days, we are searching for external resources: we got some help from CESNET, but we need more
As widely recommended, we will very likely try to collaborate with non-HEP scientific projects to get access to additional resources in the future

A couple of current plots

GRID for ALICE in Prague – Monitoring jobs (MonALISA)
RUNNING ALICE JOBS IN PRAGUE in 2013/2014: average = 996, maximum = 2227
Total number of processed jobs: ~5 million

ALICE Disk Storage Elements – 62 endpoints, ~34 PB
Prague has the largest Tier-2 storage

GRID for ALICE in Prague – Monitoring storage (MonALISA)
NETWORK TRAFFIC ON THE PRAGUE ALICE STORAGE CLUSTER in 2013/2014 (total disk space capacity: PB)
Max total traffic IN/write: 195 MB/s
Max total traffic OUT/read: 1.05 GB/s
Total data OUT/read: PB