
13 October 2004 GDB - NIKHEF M. Lokajicek 1 Operational Issues in Prague Data Challenge Experience

13 October 2004 GDB - NIKHEF M. Lokajicek 2 Prague experience
– Experiments and people
– HW in Prague
– Local DC statistics
– Experience

13 October 2004 GDB - NIKHEF M. Lokajicek 3 Experiments and people
Three institutions in Prague
– Academy of Sciences of the Czech Republic
– Charles University in Prague
– Czech Technical University in Prague
Collaborate on experiments
– CERN – ATLAS, ALICE, TOTEM, *AUGER*
– FNAL – D0
– BNL – STAR
– DESY – H1
Collaborating community of 125 persons
– 60 researchers
– 43 students and PhD students
– 22 engineers and 21 technicians
LCG computing staff – take care of GOLIAS and Skurut
– Jiri Kosina – LCG SW installation, networking
– Jiri Chudoba – ATLAS and ALICE SW and running
– Jan Svec – HW, operating system, PBSPro, networking, D0 SW support (SAM, JIM)
– Vlastimil Hynek – runs D0 simulations
– Lukas Fiala – HW, networking, web

13 October 2004 GDB - NIKHEF M. Lokajicek 4 Available HW in Prague
Two independent farms in Prague
– GOLIAS – Institute of Physics AS CR: LCG farm serving D0, ATLAS, ALICE
– Skurut – CESNET, z.s.p.o.: EGEE preproduction farm, used for ATLAS DC
– Sharing of resources D0:ATLAS:ALICE = 50:40:10
GOLIAS:
– 80 dual-CPU nodes, 40 TB
– 32 dual-CPU nodes PIII 1.13 GHz, 1 GB RAM
– Added in July: dual-CPU Xeon 3.06 GHz nodes, 2 GB RAM (WN)
– 10 TB disk space; we use LVM to create 3 volumes of 3 TB, one per experiment, NFS-mounted on the SE
– Added in July: additional TB disk space, now in tests; PBSPro batch system
– 18 racks, more than half empty; 150 kW secured input electric power

13 October 2004 GDB - NIKHEF M. Lokajicek 5 Available HW in Prague
Skurut – located at CESNET
– 16 dual-CPU nodes PIII 700 MHz, 1 GB RAM
– OpenPBS batch system
– Older, but stable; no upgrades, no development, no changes in PBS

13 October 2004 GDB - NIKHEF M. Lokajicek 6 Network connection
General – GEANT connection
– Gb infrastructure at GOLIAS, over 10 Gbps Metropolitan Prague backbone
– CZ - GEANT 2.5 Gbps (over 10 Gbps HW)
– USA 0.8 Gbps
Dedicated connection – provided by CESNET
– Delivered by CESNET in collaboration with NetherLight and recently in the scope of GLIF projects
– 1 Gbps (10 Gbps line) optical connection GOLIAS-CERN
– Plan to provide the connection for other groups in Prague
– Under consideration: connections to FERMILAB, RAL or Taipei
– Independent optical connection between the collaborating institutes in Prague, to be finished by end of 2004

13 October 2004 GDB - NIKHEF M. Lokajicek 7 Local DC results

13 October 2004 GDB - NIKHEF M. Lokajicek 8 ATLAS - July 1 – September 21
GOLIAS table: jobs, CPU (days) and elapsed (days), broken down into all, long (cpu > 100 s) and short jobs
– number of jobs in DQ: 1349 done + 1231 failed = 2580 jobs, 52% done
SKURUT table: jobs, CPU (days) and elapsed (days), broken down into all, long (cpu > 100 s) and short jobs
– number of jobs in DQ: 362 done + 572 failed = 934 jobs, 38% done
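The percentages quoted above are simply done/(done+failed). A minimal Python sketch of that arithmetic, using the DQ counts from the slide (the helper name is ours, not part of DQ):

```python
# Success-rate arithmetic for the Don Quijote (DQ) job counts quoted above.
# Percentages are truncated to whole percent, matching the slide.

def dq_done_percent(done: int, failed: int) -> int:
    """Percentage of DQ jobs that finished successfully."""
    total = done + failed
    return 100 * done // total

for site, done, failed in (("GOLIAS", 1349, 1231), ("SKURUT", 362, 572)):
    total = done + failed
    print(f"{site}: {done} done + {failed} failed = {total} jobs, "
          f"{dq_done_percent(done, failed)}% done")
```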

13 October 2004 GDB - NIKHEF M. Lokajicek 9 Local job distribution
GOLIAS – not enough jobs
[plot: ALICE, D0 and ATLAS jobs running on GOLIAS, 2 Aug – 23 Aug]

13 October 2004 GDB - NIKHEF M. Lokajicek 10 Local job distribution
SKURUT – ATLAS jobs – usage much better

13 October 2004 GDB - NIKHEF M. Lokajicek 11 ATLAS - Memory usage
ATLAS jobs on GOLIAS, July – September (part) 2004

13 October 2004 GDB - NIKHEF M. Lokajicek 12 ATLAS - CPU Time
[histograms of CPU time in hours for PIII 1.13 GHz and Xeon 3.06 GHz nodes (GOLIAS) and PIII 700 MHz nodes (SKURUT)]
Queue limit: 48 hours, later changed to 72 hours

13 October 2004 GDB - NIKHEF M. Lokajicek 13 Statistics for ATLAS - Jobs distribution

13 October 2004 GDB - NIKHEF M. Lokajicek 14 ATLAS - Real and CPU Time
Very long tail for real time – some jobs were hanging during I/O operations

13 October 2004 GDB - NIKHEF M. Lokajicek 15 ATLAS CPU and REAL TIME difference
No imposed time limit on ATLAS jobs, but some hanging jobs had to be killed.
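The hanging jobs discussed on this and the previous slide were spotted from the long real-time tail. Below is a hedged sketch of the kind of check that could flag such jobs automatically; the 10% CPU/real-time threshold and the job-record format are assumptions for illustration, not the site's actual monitoring tooling.

```python
# Hedged sketch: flag jobs whose consumed CPU time is only a small fraction of
# their wall-clock (real) time, i.e. jobs likely stuck in an I/O operation.
# Threshold and record format are assumptions, not the actual site setup.

def likely_hanging(cpu_s: float, wall_s: float, min_ratio: float = 0.10) -> bool:
    """True if the job used less than min_ratio of its elapsed time as CPU."""
    if wall_s <= 0:
        return False
    return cpu_s / wall_s < min_ratio

jobs = [
    ("atlas.001", 86000.0, 90000.0),   # healthy: CPU time close to real time
    ("atlas.002", 1200.0, 250000.0),   # suspicious: almost no CPU in days of real time
]
for job_id, cpu_s, wall_s in jobs:
    if likely_hanging(cpu_s, wall_s):
        print(f"{job_id}: cpu/real = {cpu_s / wall_s:.2%} -> candidate for killing")
```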

13 October 2004 GDB - NIKHEF M. Lokajicek 16 ATLAS Total statistics
Total time used:
– 1593 days of CPU time
– 1829 days of real time
Mean usage in 90 days:
– 17.7 working CPUs/day
– 20.3 used CPUs/day
Only jobs with CPU time > 100 s counted.
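The mean-usage figures follow directly from the totals divided by the length of the 90-day window. A minimal sketch of that arithmetic (the function name is illustrative only):

```python
# Mean usage = total accumulated days divided by the length of the window,
# giving the average number of CPUs kept busy per day.

def mean_cpus_per_day(total_days: float, window_days: float) -> float:
    return total_days / window_days

print(round(mean_cpus_per_day(1593, 90), 1))  # 17.7 working CPUs/day (CPU time)
print(round(mean_cpus_per_day(1829, 90), 1))  # 20.3 used CPUs/day (real time)
```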

13 October 2004 GDB - NIKHEF M. Lokajicek 17 ATLAS Miscellaneous
– no job name in the local batch system – difficult to identify jobs
– no (?) documentation on where to look for log files and which logs are relevant
– lost jobs due to CPU time limit – no warning
– lost jobs due to one misconfigured node – spotted from local logs and by Simone too
– some jobs loop forever

13 October 2004 GDB - NIKHEF M. Lokajicek 18 ATLAS Memory usage
Some jobs required > 1 GB RAM (no pileup events yet!)

13 October 2004 GDB - NIKHEF M. Lokajicek 19 ALICE jobs

13 October 2004 GDB - NIKHEF M. Lokajicek 20 ALICE

13 October 2004 GDB - NIKHEF M. Lokajicek 21 ALICE

13 October 2004 GDB - NIKHEF M. Lokajicek 22 ALICE

13 October 2004 GDB - NIKHEF M. Lokajicek 23 ALICE Total statistics
Total time used:
– 2076 days of CPU time
– 2409 days of real time
Mean usage in 100 days:
– 20.7 working CPUs/day
– 24 used CPUs/day
Only jobs with CPU time > 100 s counted.

13 October 2004 GDB - NIKHEF M. Lokajicek 24 Experience, lessons learned
LCG installation
– On GOLIAS we use PBSPro; due to modifications we use manual installation
– Worker nodes – the first installation via LCFGng, then switched off
– All other configurations and upgrades done manually
– In case of problems, manual installation helps to understand which intervention should be done (LCFGng is non-transparent)
– Currently installed LCG version 2_2_0
Problems encountered
– Earlier installation manuals were in PDF only, new version also in HTML – enables useful copy/paste – OK
– LCG 2_2_0 has R-GMA inside – unfortunately the manual installation version is incomplete and not sufficient for manual configuration – parts on Tomcat and Java security are missing

13 October 2004 GDB - NIKHEF M. Lokajicek 25 Experience, lessons learned
PBS
– Skurut – OpenPBS, simply configured, effectively used for one experiment only
– GOLIAS – PBSPro, 3 experiments with defined proportions
– We have problems setting the desired conditions; regular manual intervention is needed to set the number of nodes for various queues and their priorities
– We do not want nodes to sit idle if some higher-priority experiment does not send jobs (see the sketch below)
– Already mentioned problem of pending I/O operations from which some jobs will not recover
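A minimal sketch, assuming the 50:40:10 D0:ATLAS:ALICE split from slide 4, of the node-allocation policy described above: an experiment with no queued jobs gives up its share so that nodes do not sit idle. The function and its inputs are our illustration of the intended behaviour, not the actual PBSPro configuration.

```python
# Illustrative sketch of the manual sharing policy: nominal shares
# D0:ATLAS:ALICE = 50:40:10, with the share of an experiment that has no
# queued jobs redistributed among the others. NOT the real PBSPro config.

NOMINAL_SHARES = {"D0": 50, "ATLAS": 40, "ALICE": 10}

def allocate_nodes(total_nodes: int, queued_jobs: dict[str, int]) -> dict[str, int]:
    """Split total_nodes by nominal share among experiments with queued jobs."""
    active = {exp: share for exp, share in NOMINAL_SHARES.items()
              if queued_jobs.get(exp, 0) > 0}
    if not active:
        return {exp: 0 for exp in NOMINAL_SHARES}
    scale = sum(active.values())
    alloc = {exp: total_nodes * share // scale for exp, share in active.items()}
    # Hand any rounding remainder to the experiment with the largest nominal share.
    alloc[max(active, key=active.get)] += total_nodes - sum(alloc.values())
    return {exp: alloc.get(exp, 0) for exp in NOMINAL_SHARES}

# Example: ALICE has nothing queued, so its 10% is shared between D0 and ATLAS.
print(allocate_nodes(80, {"D0": 300, "ATLAS": 120, "ALICE": 0}))
# -> {'D0': 45, 'ATLAS': 35, 'ALICE': 0}
```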