Service planning and monitoring in T2 - Prague
Milos Lokajicek
GDB meeting, BNL, 5 Sept 2006
Overview
Introduction
Service planning and current status
–Capacities
–Networking
–Personnel
Monitoring
–HW and SW
–Middleware
–Service
Remarks
Introduction
Czech Republic's LHC activities
–ATLAS, target 3% of authors -> activities
–ALICE, target 1%
–TOTEM, a much smaller experiment, relative target higher
–(non-LHC: HERA/H1, TEVATRON/D0, AUGER)
Institutions (only the big groups mentioned)
–Academy of Sciences of the Czech Republic: Institute of Physics, Nuclear Physics Institute
–Charles University in Prague: Faculty of Mathematics and Physics
–Czech Technical University in Prague: Faculty of Nuclear Sciences and Physical Engineering
HEP manpower (2005)
–145 people: 59 physicists, 22 engineers, 21 technicians, 43 undergraduate and PhD students
Service planning
ATLAS + ALICE capacities: CPU (MSI2000), Disk (TB), MSS (TB)
[capacity table not reproduced here; based on the LCG MoU for ATLAS and ALICE and our anticipated share]
Project proposals to various grant systems in the Czech Republic
Prepare a bigger project proposal for CZ GRID together with CESNET
–For the LHC needs
–In 2010 add 3x more capacity for Czech non-HEP scientists, financed from state resources and EU structural funds
All proposals include new personnel (up to 10 new persons)
Today, regular financing, sufficient for D0
–today 250 cores, 150 kSI2k, 40 TB disk space, no tapes
Networking
Local connection of institutes in Prague
–Optical 1 Gbps E2E lines
WAN
–Optical E2E lines to Fermilab, Taipei, and newly FZK (from 1 Sept 06)
–Connection Prague – Amsterdam now through GN2
–Planning further lines to other T1s
CEF Networks workshop, Prague, May 30th, 2006
Personnel
Now 4 persons to run the T2
–Jiri Kosina – middleware (leaving, looking for replacement), storage (FTS), monitoring
–Tomas Kouba – middleware, monitoring
–Jan Svec – basic HW, OS, storage, networking, monitoring
–Lukas Fiala – basic HW, networking, web services
–Jiri Chudoba – liaison to ATLAS and ALICE, running the jobs and reporting errors, service monitoring
Further information is based on their experience
Monitoring
HW and basic SW
–Installation and test of new hardware
  Normally choose proven HW
  HW installation by the delivery firm
  Install the operating system and solve problems with the delivery firm
  Install middleware
  Test it for some time outside the production service
–Nagios
  Worker node access via ping
  Disks – how full the partitions are
  Load average
  Whether the pbs_mom process is running
  Number of running processes
  Whether the ssh daemon is running
  How full the swap is
  ...
  Limits for warning and error
  Distribution of mails or SMS to admins – fixing problems remotely
  Regular check of the Nagios web page for red dots
–Regular automatic (cron) checks and restarts for some daemons
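As an illustration only, a custom check of the kind listed above could follow the usual Nagios plugin convention (exit 0 = OK, 1 = WARNING, 2 = CRITICAL). This is a minimal sketch, not our production check; the swap thresholds are assumed values, while pbs_mom is the process named above.

    #!/usr/bin/env python
    # Sketch of a Nagios-style check: verify pbs_mom is running and
    # warn/alarm when swap usage is high (thresholds are examples).
    import os
    import subprocess
    import sys

    WARN_PCT = 50.0   # assumed warning threshold
    CRIT_PCT = 80.0   # assumed critical threshold

    def swap_usage_percent():
        # Parse /proc/meminfo for SwapTotal / SwapFree (Linux).
        info = {}
        for line in open("/proc/meminfo"):
            key, value = line.split(":", 1)
            info[key] = float(value.split()[0])
        total = info.get("SwapTotal", 0.0)
        if total == 0:
            return 0.0
        return 100.0 * (total - info["SwapFree"]) / total

    def pbs_mom_running():
        # pgrep exits with 0 if at least one matching process exists.
        devnull = open(os.devnull, "w")
        return subprocess.call(["pgrep", "-x", "pbs_mom"], stdout=devnull) == 0

    if __name__ == "__main__":
        swap = swap_usage_percent()
        if not pbs_mom_running():
            print("CRITICAL: pbs_mom not running")
            sys.exit(2)
        if swap >= CRIT_PCT:
            print("CRITICAL: swap %.0f%% full" % swap)
            sys.exit(2)
        if swap >= WARN_PCT:
            print("WARNING: swap %.0f%% full" % swap)
            sys.exit(1)
        print("OK: pbs_mom running, swap %.0f%% full" % swap)
        sys.exit(0)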
Monitoring
PBS – job count (via RRD and MRTG)
–Local tools for monitoring the number of jobs per machine per chosen period
APEL – not very useful so far, might be set up to give more useful info
GridICE
ATLAS
–Checks and statistics from the ATLAS database
ALICE – MonALISA – very useful
Monitoring of pool accounts and the actual user certificates
Networking
–Network traffic to FZK, SARA, CERN in a certain IP range
–With the help of IP accounting (utility ipac-ng)
SFT – Site Functional Tests – very useful
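A minimal sketch of the kind of local job-count tool mentioned above: count running and queued PBS jobs from plain qstat output and push the values into an RRD file. The RRD file path and its two data sources are assumptions, not our actual setup.

    #!/usr/bin/env python
    # Sketch: count running (R) and queued (Q) PBS jobs via 'qstat'
    # and feed the counts to rrdtool (file name is an example).
    import subprocess

    RRD_FILE = "/var/lib/rrd/pbs_jobs.rrd"   # assumed path

    def job_counts():
        out = subprocess.check_output(["qstat"]).decode()
        running = queued = 0
        for line in out.splitlines()[2:]:     # skip the two header lines
            fields = line.split()
            if len(fields) < 5:
                continue
            state = fields[4]                 # job state column in plain qstat
            if state == "R":
                running += 1
            elif state == "Q":
                queued += 1
        return running, queued

    if __name__ == "__main__":
        r, q = job_counts()
        # Assumes the RRD was created with two data sources: running, queued.
        subprocess.call(["rrdtool", "update", RRD_FILE, "N:%d:%d" % (r, q)])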
Network traffic graph summary:
–outgoing to fzk1: Max 37M, Average 6M, Total 129G
–outgoing to internet: Max 61M, Average 8M, Total 164G
Updates and patches
YAIM + automated updates on all farm nodes using the simple BEX script toolkit (takes care of upgrading a node that was switched off at the deployment/upgrade phase; keeps all nodes in sync automatically)
ftp://atrey.karlin.mff.cuni.cz/pub/local/mj/linux/bex-2.0.tar.gz, info in the README file
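To illustrate the "keep all nodes in sync" idea only (this is not BEX and not its interface): run an update command on every farm node over ssh and remember nodes that were unreachable so they can be retried later. The node list, retry file and update command are all assumed examples.

    #!/usr/bin/env python
    # Sketch (NOT BEX): push an update to all farm nodes and record
    # the ones that were down so a later run can bring them in sync.
    import subprocess

    NODES_FILE = "/etc/farm/nodes.txt"          # assumed: one host per line
    PENDING_FILE = "/var/lib/farm/pending.txt"  # assumed retry list
    UPDATE_CMD = "yum -y update"                # example update command

    def run_update(node):
        return subprocess.call(["ssh", "-o", "ConnectTimeout=10",
                                node, UPDATE_CMD]) == 0

    if __name__ == "__main__":
        failed = []
        for node in open(NODES_FILE).read().split():
            if not run_update(node):
                failed.append(node)
        # Nodes that were switched off get queued for the next run.
        with open(PENDING_FILE, "w") as f:
            f.write("\n".join(failed) + "\n")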
SAM graphs
SAM at a glance – regular check of our SAM station accessibility
SAM At A Glance: d0_fzu_prague (D0 Production Environment), page generated on 05 Sep :31:28
Server           Host:Port                    Version     Up Since
Master:Station   sam.farm.particle.cz:45274   v4_2_1_77   31 Jul :08:18
FSS:Server       sam.farm.particle.cz:45278   v4_2_1_77   31 Jul :08:20
NGFSS:Server     sam.farm.particle.cz:45281   v4_2_1_77   31 Jul :08:22
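A crude sketch of an accessibility check in this spirit: test whether the station ports listed in the table above still accept TCP connections. The real "SAM at a glance" page reports far more; only the host:port pairs are taken from the table, the rest is assumed.

    #!/usr/bin/env python
    # Sketch: TCP reachability check for the SAM station ports above.
    import socket

    SERVERS = {                      # host:port pairs from the table above
        "Master:Station": ("sam.farm.particle.cz", 45274),
        "FSS":            ("sam.farm.particle.cz", 45278),
        "NGFSS":          ("sam.farm.particle.cz", 45281),
    }

    def reachable(host, port, timeout=5.0):
        try:
            s = socket.create_connection((host, port), timeout)
            s.close()
            return True
        except socket.error:
            return False

    if __name__ == "__main__":
        for name, (host, port) in SERVERS.items():
            status = "up" if reachable(host, port) else "DOWN"
            print("%-16s %s:%d %s" % (name, host, port, status))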
Service monitoring
Using the checks described above and their combinations
Rely on useful monitors supported centrally or by the experiments
We would appreciate an early warning if jobs on some site/worker nodes start failing quickly after submission
Service requirements for T2s in "extended" working hours
–No special plan today
–Try to provide an architecture such that the responsible people can travel and still do as much as possible remotely (e.g. network console access)
–Future computing capacities will probably require new arrangements
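A sketch of the "early warning" idea mentioned above: raise an alarm when a large fraction of recently finished jobs died shortly after submission. The job record format (CSV with exit_code and runtime_seconds columns), the window size and the thresholds are all hypothetical.

    #!/usr/bin/env python
    # Sketch: alert when too many recent jobs fail quickly after start.
    import csv
    import sys

    WINDOW = 50             # look at the last 50 finished jobs (assumed)
    MAX_FAIL_FRACTION = 0.5 # assumed alarm threshold
    SHORT_RUNTIME = 300     # "failed quickly" = died within 5 min (assumed)

    def quick_failures(records):
        recent = records[-WINDOW:]
        bad = [r for r in recent
               if int(r["exit_code"]) != 0
               and int(r["runtime_seconds"]) < SHORT_RUNTIME]
        return len(bad), len(recent)

    if __name__ == "__main__":
        with open(sys.argv[1]) as f:     # e.g. jobs.csv with a header line
            records = list(csv.DictReader(f))
        bad, total = quick_failures(records)
        if total and float(bad) / total > MAX_FAIL_FRACTION:
            print("ALERT: %d of the last %d jobs failed quickly" % (bad, total))
            sys.exit(2)
        print("OK: %d of the last %d jobs failed quickly" % (bad, total))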
Remarks
A sufficient set of monitors for HW, basic SW and middleware is indispensable
Especially for service monitoring we rely on centrally distributed tools
–Big space for additions and improvements
–Or just more useful setups
–E.g. SFT – Site Functional Tests – very useful
Service level – no special arrangement; outside working hours we rather rely on remote repairs