S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June 20071 ATLAS cluster in Geneva Szymon Gadomski University of Geneva.

Slides:

Advertisements

Similar presentations

UCL HEP Computing Status HEPSYSMAN, RAL,

Advertisements

Status GridKa & ALICE T2 in Germany Kilian Schwarz GSI Darmstadt.

ATLAS Tier-3 in Geneva Szymon Gadomski, Uni GE at CSCS, November 2009 S. Gadomski, ”ATLAS T3 in Geneva", CSCS meeting, Nov 091 the Geneva ATLAS Tier-3.

1 User Analysis Workgroup Update  All four experiments gave input by mid December  ALICE by document and links  Very independent.

Resources for the ATLAS Offline Computing Basis for the Estimates ATLAS Distributed Computing Model Cost Estimates Present Status Sharing of Resources.

Duke Atlas Tier 3 Site Doug Benjamin (Duke University)

T1 at LBL/NERSC/OAK RIDGE General principles. RAW data flow T0 disk buffer DAQ & HLT CERN Tape AliEn FC Raw data Condition & Calibration & data DB disk.

Batch Production and Monte Carlo + CDB work status Janusz Martyniak, Imperial College London MICE CM37 Analysis, Software and Reconstruction.

S. Gadomski, "ATLAS computing in Geneva", journee de reflexion, 14 Sept ATLAS computing in Geneva Szymon Gadomski description of the hardware the.

ATLAS computing in Geneva Szymon Gadomski, NDGF meeting, September 2009 S. Gadomski, ”ATLAS computing in Geneva", NDGF, Sept 091 the Geneva ATLAS Tier-3.

K.Harrison CERN, 23rd October 2002 HOW TO COMMISSION A NEW CENTRE FOR LHCb PRODUCTION - Overview of LHCb distributed production system - Configuration.

Israel Cluster Structure. Outline The local cluster Local analysis on the cluster –Program location –Storage –Interactive analysis & batch analysis –PBS.

1 INDIACMS-TIFR TIER-2 Grid Status Report IndiaCMS Meeting, Sep 27-28, 2007 Delhi University, India.

Large scale data flow in local and GRID environment V.Kolosov, I.Korolko, S.Makarychev ITEP Moscow.

LHCC Comprehensive Review – September WLCG Commissioning Schedule Still an ambitious programme ahead Still an ambitious programme ahead Timely testing.

Zhiling Chen (IPP-ETHZ) Doktorandenseminar June, 4 th, 2009.

US ATLAS Western Tier 2 Status and Plan Wei Yang ATLAS Physics Analysis Retreat SLAC March 5, 2007.

ISG We build general capability Introduction to Olympus Shawn T. Brown, PhD ISG MISSION 2.0 Lead Director of Public Health Applications Pittsburgh Supercomputing.

October, Scientific Linux INFN/Trieste B.Gobbo – Compass R.Gomezel - T.Macorini - L.Strizzolo INFN - Trieste.

03/27/2003CHEP20031 Remote Operation of a Monte Carlo Production Farm Using Globus Dirk Hufnagel, Teela Pulliam, Thomas Allmendinger, Klaus Honscheid (Ohio.

Introduction to U.S. ATLAS Facilities Rich Baker Brookhaven National Lab.

Wenjing Wu Andrej Filipčič David Cameron Eric Lancon Claire Adam Bourdarios & others.

INDIACMS-TIFR Tier 2 Grid Status Report I IndiaCMS Meeting, April 05-06, 2007.

F. Fassi, S. Cabrera, R. Vives, S. González de la Hoz, Á. Fernández, J. Sánchez, L. March, J. Salt, A. Lamas IFIC-CSIC-UV, Valencia, Spain Third EELA conference,

21 st October 2002BaBar Computing – Stephen J. Gowdy 1 Of 25 BaBar Computing Stephen J. Gowdy BaBar Computing Coordinator SLAC 21 st October 2002 Second.

Manchester HEP Desktop/ Laptop 30 Desktop running RH Laptop Windows XP & RH OS X Home server AFS using openafs 3 DB servers Kerberos 4 we will move.

Architecture and ATLAS Western Tier 2 Wei Yang ATLAS Western Tier 2 User Forum meeting SLAC April

Grid User Interface for ATLAS & LHCb A more recent UK mini production used input data stored on RAL’s tape server, the requirements in JDL and the IC Resource.

CERN IT Department CH-1211 Genève 23 Switzerland t Frédéric Hemmer IT Department Head - CERN 23 rd August 2010 Status of LHC Computing from.

HEP Computing Status Sheffield University Matt Robinson Paul Hodgson Andrew Beresford.

2-Dec Offline Report Matthias Schröder Topics: Scientific Linux Fatmen Monte Carlo Production.

Integration of the ATLAS Tag Database with Data Management and Analysis Components Caitriana Nicholson University of Glasgow 3 rd September 2007 CHEP,

23.March 2004Bernd Panzer-Steindel, CERN/IT1 LCG Workshop Computing Fabric.

ISG We build general capability Introduction to Olympus Shawn T. Brown, PhD ISG MISSION 2.0 Lead Director of Public Health Applications Pittsburgh Supercomputing.

Site Report: Prague Jiří Chudoba Institute of Physics, Prague WLCG GridKa+T2s Workshop.

Dario Barberis: ATLAS Activities at Tier-2s Tier-2 Workshop June ATLAS Activities at Tier-2s Dario Barberis CERN & Genoa University.

Tier3 monitoring. Initial issues. Danila Oleynik. Artem Petrosyan. JINR.

Doug Benjamin Duke University. 2 ESD/AOD, D 1 PD, D 2 PD - POOL based D 3 PD - flat ntuple Contents defined by physics group(s) - made in official production.

K. Harrison CERN, 22nd September 2004 GANGA: ADA USER INTERFACE - Ganga release status - Job-Options Editor - Python support for AJDL - Job Builder - Python.

Large scale data flow in local and GRID environment Viktor Kolosov (ITEP Moscow) Ivan Korolko (ITEP Moscow)

Maria Girone CERN - IT Tier0 plans and security and backup policy proposals Maria Girone, CERN IT-PSS.

Computing Issues for the ATLAS SWT2. What is SWT2? SWT2 is the U.S. ATLAS Southwestern Tier 2 Consortium UTA is lead institution, along with University.

INFSO-RI Enabling Grids for E-sciencE Using of GANGA interface for Athena applications A. Zalite / PNPI.

T3g software services Outline of the T3g Components R. Yoshida (ANL)

Distributed Physics Analysis Past, Present, and Future Kaushik De University of Texas at Arlington (ATLAS & D0 Collaborations) ICHEP’06, Moscow July 29,

ATLAS Distributed Analysis Dietrich Liko IT/GD. Overview  Some problems trying to analyze Rome data on the grid Basics Metadata Data  Activities AMI.

CERN IT Department CH-1211 Genève 23 Switzerland t SL(C) 5 Migration at CERN CHEP 2009, Prague Ulrich SCHWICKERATH Ricardo SILVA CERN, IT-FIO-FS.

Distributed Analysis Tutorial Dietrich Liko. Overview  Three grid flavors in ATLAS EGEE OSG Nordugrid  Distributed Analysis Activities GANGA/LCG PANDA/OSG.

BNL dCache Status and Plan CHEP07: September 2-7, 2007 Zhenping (Jane) Liu for the BNL RACF Storage Group.

Markus Frank (CERN) & Albert Puig (UB).  An opportunity (Motivation)  Adopted approach  Implementation specifics  Status  Conclusions 2.

Meeting with University of Malta| CERN, May 18, 2015 | Predrag Buncic ALICE Computing in Run 2+ P. Buncic 1.

Wouter Verkerke, NIKHEF 1 Using ‘stoomboot’ for NIKHEF-ATLAS batch computing What is ‘stoomboot’ – Hardware –16 machines, each 2x quad-core Pentium = 128.

ATLAS TIER3 in Valencia Santiago González de la Hoz IFIC – Instituto de Física Corpuscular (Valencia)

ATLAS Physics Analysis Framework James R. Catmore Lancaster University.

Acronyms GAS - Grid Acronym Soup, LCG - LHC Computing Project EGEE - Enabling Grids for E-sciencE.

ATLAS Computing Model Ghita Rahal CC-IN2P3 Tutorial Atlas CC, Lyon

A Web Based Job Submission System for a Physics Computing Cluster David Jones IOP Particle Physics 2004 Birmingham 1.

BaBar & Grid Eleonora Luppi for the BaBarGrid Group TB GRID Bologna 15 febbraio 2005.

Tier3(g,w) meeting at ANL ASC

Data Challenge with the Grid in ATLAS

LHCb Computing Model and Data Handling Angelo Carbone 5° workshop italiano sulla fisica p-p ad LHC 31st January 2008.

Readiness of ATLAS Computing - A personal view

Computing Report ATLAS Bern

Artem Trunov and EKP team EPK – Uni Karlsruhe

US CMS Testbed.

ATLAS DC2 & Continuous production

Production Manager Tools (New Architecture)

The LHCb Computing Data Challenge DC06

Presentation transcript:

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June ATLAS cluster in Geneva Szymon Gadomski University of Geneva and INP Cracow the existing system − hardware, configuration, use − failure of a file server the system for the LHC startup − draft architecture of the cluster − comments

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June The cluster at Uni Dufour First batch (2005) 12 Sun Fire v20z (wn01 to 12) –2 AMD Opteron 2.4 GHz, 4 GB RAM 1 Sun Fire v20z (grid00) –7.9 GB disk array SLC3 Second batch ( ) 21 Sun Fire X2200 (atlas-ui01 to 03 and wn16 to 33) –2 Dual Core AMD Opteron 2.6 GHz, 4 GB RAM 1 Sun Fire X4500 with 48 disks 500 GB each –17.6 TB effective storage (1 RAID 6, 1 RAID 50, 2 RAID 5) SLC4 Description at

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June Diagram of the existing system 26 TB and 108 cores in workers and login PCs atlas-ui01..03, grid00 and grid02 have double identity because of the CERN network setup by Frederik Orellana

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June The cluster at Uni Dufour (2) Frederik, Jan 07“old” worker nodesnew ones

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June disk arrays CERN network worker nodes back side

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June Current use (1) GRID batch facility in NorduGrid –in fact two independent systems “old” (grid00 + wn01..12), SLC3, older NG “new” (grid02 + wn16..33), SLC4, newer NG –runs most of the time for the ATLAS Mote Carlo production controlled centrally by the “production system” (role of the Tier 2) –higher priority for Swiss ATLAS members mapped to a different user ID, using a different batch queue –ATLAS releases installed locally and regularly updated some level of maintenance is involved

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June NorduGrid (ATLAS sites) n.b. you can submit to all ATLAS NG resources prerequisites –grid certificate –read NG user guide easier to deal with then ATLAS software

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June Current use (2) Interactive work like on lxplus.cern.ch –login with the same account (“afs account”) –see all the software installed on /afs/cern.ch, including the nightly releases –use the ATLAS software repository in CVS like on lxplus “cmt co …” works –same environment to work with Athena, just follow the “ATLAS workbook” Offer advantages of lxplus without the limitations –You should be able to do the same things the same way. –Your disk space can easily be ~TB and not ~100 MB. Roughly speaking we have a factor of –Around 18’000 people have accounts on lxplus…

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June Problems and failure of grid02 The PC Sun Fire X4500 “Thumper” 48 x 500 GB of disk space, 17 TB effective storage 2 Dual Core AMD Opteron processors, 16 GB of RAM one Gb network connection to CERN, one Gb to the University network including all worker nodes (not enough) SLC4 installed by Frederik last January working as a file server, batch server (Torque) and Nordu Grid server (manager + ftp + info) The problems batch system stopping often when many jobs running (18 OK, 36 not OK, 64 definitely not OK) high “system load” caused by data copying (scp) can take minutes to respond to a simple command The crash More jobs allowed again last Thursday, to understand scaling problems. More jobs sent. A few minutes later the system stopped responding. It could not be shut down. After a power cycle it is corrupted. It will be restored under SLC4. What to do in the long run is a good question. Any advice?

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June Plan of the system for Autumn 2007 new hardware before end of September one idea of the architecture 60 TB, 176 cores some open questions comments are welcome

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June Comments about future evolution Interactive work is vital. –Everyone needs to login somewhere. We are not supporting ATLAS software or Grid tools on laptops and desktops. –The more we can do interactively, the better for our efficiency. –A larger fraction of the cluster will be available for login. Maybe even all. Plan to remain a Grid site. –Bern and Geneva have been playing a role of a Tier 2 in ATLAS. We plan to continue that. Data transfer are too unreliable in ATLAS. –Need to find ways to make them work much better. –Data placement from FZK directly to Geneva would be welcome. No way to do that (LCG>NorduGrid) at the moment. Choice of gLite vs NorduGrid is (always) open. It depends on –feasibility of installation and maintenance by one person not full time –reliability (file transfers, job submissions) –integration with the rest of ATLAS computing Colleagues in Geneva working for Neutrino experiments would like to use the cluster. We are planning to make it available to them and use priorities. Be careful with extrapolations from present experience. Real data volume will be 200x larger then a large Monte Carlo production.

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June You can always use the (Nordu) GRID directly: –make your own.xrsl file (see next page) –submit jour jobs and get back the results (ngsub, ngstat, ngcat, ngls) –write your own scripts to mange this when you have 100 or 1000 similar jobs to submit, each one with different input data –many people work like that now –if you like to work like that, less problems for me I think we will use higher level tools which: –operate on data sets and not on files –know about ATLAS catalogues of data sets –automatize the process: make many similar jobs and look after them (split the input data, merging of output) –work with many Grids Higher level GRID tools

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June An example of the job description in.xrsl &("executable" = "SUSYWG_prewrapper.sh" )("inputFiles" = ("SUSYWG_wrapper.sh" "SUSYWG_wrapper.sh" ) ("SUSYWG_prewrapper.sh" "SUSYWG_prewrapper.sh" ) ("SUSYWG_csc_evgen_trf.sh" "SUSYWG_csc_evgen_trf.sh" ) ("SUSYWG_csc_atlfast_trf.sh" "SUSYWG_csc_atlfast_trf.sh" ) ("SUSYWG_SetCorrectMaxEvents.py" "SUSYWG_SetCorrectMaxEvents.py" ) ("z1jckkw_nunu_PT40_ unw" "gsiftp://lheppc50.unibe.ch/se1/borgeg/IN_FILES/zNj_nnPT40_in_noEF/z1jckkw_nunu_PT40_ unw" ) ("z1jckkw_nunu_PT40_01.001_unw.par" "gsiftp://lheppc50.unibe.ch/se1/borgeg/IN_FILES/zNj_nnPT40_in_noEF/z1jckkw_nunu_PT40_ _unw.par.ok" ) ("SUSYWG_DC AlpgenJimmyToplnlnNp3.py" "SUSYWG_DC AlpgenJimmyToplnlnNp3.py" ) ("SUSYWG_csc_atlfast_log.py" "SUSYWG_csc_atlfast_log.py" ) )("outputFiles" = ("z1jckkw_nunu_PT40_ Atlfast12064.NTUP.root" "gsiftp://lheppc50.unibe.ch/se1/borgeg/atlfast12064/zNj_nnPT40_noEF/z1jckkw_nunu_PT40_ Atlfast12064.NTUP.root" ) ("z1jckkw_nunu_PT40_ Atlfast12064.AOD.pool.root" "gsiftp://lheppc50.unibe.ch/se1/borgeg/atlfast12064/zNj_nnPT40_noEF/z1jckkw_nunu_PT40_ Atlfast12064.AOD.pool.root" ) ("z1jckkw_nunu_PT40_ Atlfast12064.log" "gsiftp://lheppc50.unibe.ch/se1/borgeg/atlfast12064/zNj_nnPT40_noEF/z1jckkw_nunu_PT40_ Atlfast12064.log" ) ("z1jckkw_nunu_PT40_ Atlfast12064.logdir.tar.gz" "gsiftp://lheppc50.unibe.ch/se1/borgeg/atlfast12064/zNj_nnPT40_noEF/z1jckkw_nunu_PT40_ Atlfast12064.logdir.tar.gz" ) )("memory" = "1000" )("disk" = "5000" )("runTimeEnvironment" = "APPS/HEP/ATLAS _COTRANSFER" )("stdout" = "stdout" )("stderr" = "stderr" )("gmlog" = "gmlog" )("CPUTime" = "2000" )("jobName" = "noEF_z1jckkw_nunu_PT40_ Atlfast " ) ) ) May 07 02:01:18 RELATION: jobName = SEQUENCE( LITERAL( noEF_z1jckkw_nunu_PT40_ Atlfast ) )

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June Higher level GRID tools (2) GridPilot –a Graphical User Interface (GUI) in Java –developed by Frederik Orellana and Cyril Topfel –only works with Nordu Grid so far –used so far for beam test analysis and simulation as a tool to get data to Bern and to Geneva Ganga –a command line tool (in Python) or a GUI –an “official” development, well supported –several people, including Till, have used it with success in the command line mode –can work with NorduGrid, LCG and plain batch system pathena –the easiest to use, you run pathena like athena –developed by Tadashi Maeno, BNL –needs a server that does all and a pilot process on every node –for the moment the only server is at BNL, but Tadashi will move here… The plan is to try Ganga and GridPilot on some real cases and see how we like them.

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June Summary The ATLAS cluster in Geneva is a large Tier 3 –now 108 cores and 26 TB –in a few months 178 cores and 70 TB In NorduGrid since a couple of years, runs ATLAS simulation like a Tier 2. Plan to continue that. Recently more interactive use by the Geneva group, plan to continue and to develop further. Not so good experience with Sun Fire X4500 running Scientific Linux. Need to decide if one should change to Solaris. Choice of middleware can be discussed. Room for more exchange of experiences between the Swiss WLCG experts: hardware, cluster architecture, OS, cluster management tools, middleware, data transfer tools

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June backup slides

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June NorduGrid and LCG When we started with the Grid in 2004, the LCG was not an option –Not applicable to small clusters like Bern ATLAS and Geneva ATLAS. You would need 4 machines to run services. –You need to reinstall the system and have SLC4. Not possible on the Ubelix cluster in Bern. –Support was given to large sites only. The situation now –NG in Bern and Geneva, NG + LCG in Manno –Ubelix (our biggest resource) will never move to SLC4. –Our clusters are on SLC4 and have become larger, but still a sacrifice of 4 machines would be significant. –The change would be disruptive and might not pay off in the long term. NG still has a better reputation for what concerns stability. We see it as a low(er) maintenance solution. –Higher level Grid tools promise to hide the differences to the users. Do not change now, but reevaluate the arguments regularly.

Dario Barberis: ATLAS Computing Plans for 2007 WLCG Workshop - 22 January Computing Model: central operations l Tier-0: nCopy RAW data to Castor tape for archival nCopy RAW data to Tier-1s for storage and subsequent reprocessing nRun first-pass calibration/alignment (within 24 hrs) nRun first-pass reconstruction (within 48 hrs) nDistribute reconstruction output (ESDs, AODs & TAGs) to Tier-1s l Tier-1s: nStore and take care of a fraction of RAW data (forever) nRun “slow” calibration/alignment procedures nRerun reconstruction with better calib/align and/or algorithms nDistribute reconstruction output (AODs, TAGs, part of ESDs) to Tier-2s nKeep current versions of ESDs and AODs on disk for analysis l Tier-2s: nRun simulation (and calibration/alignment when appropriate) nKeep current versions of AODs on disk for analysis

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June General comments The tier 3 is not counted for the ATLAS “central operations”. We can use it as we prefer. We have done quite some Tier 2 duty already. We are not obliged. It makes sense to keep the system working, but our work has a higher priority. How are we going to work when the data comes? Be careful when you extrapolate from your current experience –A large simulation effort in ATLAS (the Streaming Test) produces “10 runs of 30 minute duration at 200 Hz event rate (3.6 million events)”. That is ~10 -3 of the data sample produced by ATLAS in one year. Need to prepare for a much larger scale. FDR tests of the Computing this Summer. Need to connect to them, exercise the flow of data from Manno, FZK and CERN. Also exercise sending our jobs to the data and collecting the output.

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June To Do list, for discussion see with Olivier if GridPilot can work for him one way of working with Ganga on our cluster –from lxplus only grid00-cern is visible and ngsub command has a problem with double identity –on atlas-ui01 there is a library missing fix Swiss ATLAS computing documentation pages –many pages obsolete, a clearer structure is needed –need updated information about Grid tools and ATLAS software exercise data transfers –from CERN (T0), Karlsruhe (T1) and Manno (T2), –a few different ways and tools if thermal modeling for the SCT goes ahead, help to use the cluster

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June To Do list, for discussion (2) Medium term –consolidate system administration see if we can make routing more efficient make all software run on atlas-ui01, then make all the ui identical update the system on the new workers when ATLAS ready change old workers to SLC4 and setup some monitoring of the cluster –prepare next round of hardware, possibly evaluate prototypes from Sun –get Manno involved in the “FDR” (tests of ATLAS computing model) –decide about Ganga vs GridPilot, provide instructions how to use one Longer term –try pathena when available at CERN –try PROOF –setup a web server –try the short-lived credential service of Switch Help would be welcome!

S. Gadomski, "The ATLAS cluster in Geneva", Swiss WLCG experts workshop, CSCS, June Some open questions Local accounts vs afs accounts. Direct use of the batch system –with standard command-line tools or as back-end to Ganga –would require some work (queues in the batch system, user accounts on every worker) but can be done Use of the system by the Neutrino group.