1 The D0 NIKHEF Farm
Kors Bos, Ton Damen, Willem van Leeuwen
Fermilab, May 23, 2001

2 Layout of this talk
D0 Monte Carlo needs
The NIKHEF D0 farm
The data we produce
The SAM data base
A Grid intermezzo
The network
The next steps

3 D0 Monte Carlo needs
D0 trigger rate is 100 Hz, 10^7 seconds/yr → 10^9 events/yr
We want 10% of that to be simulated → 10^8 events/yr
To simulate 1 QCD event takes ~3 minutes (size ~2 MByte)
– On an 800 MHz PIII
So 1 CPU can produce ~10^5 events/yr (~200 GByte)
– Assuming a 60% overall efficiency
So our 100-CPU farm can produce ~10^7 events/yr (~20 TByte)
– And this is only 10% of the goal we set ourselves
– Not counting the Nijmegen D0 farm yet
So we need another 900 CPUs
– UTA (50), Lyon (200), Prague (10), BU (64),
– Nijmegen (50), Lancaster (200), Rio (25),
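
The arithmetic on this slide can be reproduced in a few lines. The sketch below is an illustration only, assuming a full calendar year of wall-clock time (~3.15 × 10^7 s) together with the 60% efficiency, ~3 min/event and ~2 MByte/event figures quoted above.

```python
# Back-of-envelope check of the D0 Monte Carlo numbers quoted on this slide.
WALL_SECONDS_PER_YEAR = 3.15e7   # assumption: a full calendar year of wall-clock time
EFFICIENCY = 0.60                # overall farm efficiency quoted above
SECONDS_PER_EVENT = 180          # ~3 minutes per QCD event on an 800 MHz PIII
MBYTE_PER_EVENT = 2              # ~2 MByte per simulated event
N_CPU = 100                      # the NIKHEF farm: 50 dual-CPU nodes
GOAL_EVENTS = 1e8                # 10% of the 10^9 triggered events per year

events_per_cpu = WALL_SECONDS_PER_YEAR * EFFICIENCY / SECONDS_PER_EVENT
events_on_farm = events_per_cpu * N_CPU

print(f"events per CPU per year : {events_per_cpu:.1e}")                      # ~1e5
print(f"events on the farm/year : {events_on_farm:.1e}")                      # ~1e7
print(f"farm output per year    : {events_on_farm * MBYTE_PER_EVENT / 1e6:.0f} TByte")  # ~20 TByte
print(f"extra CPUs for the goal : {GOAL_EVENTS / events_per_cpu - N_CPU:.0f}")  # roughly 900 in round numbers
```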

4 How it looks

5 The NIKHEF D0 Farm
Diagram of the farm layout. Labels: farm server, file server (1.5 TB disk cache, SAM station), farm nodes behind a switch, NIKHEF network, SURFnet, SARA network with tape store (SAM station); link speeds of 100 Mbit/s, 155 Mbit/s and 1 Gbit/s.

6 50 farm nodes (100 CPUs)
Dell Precision Workstation 220
Dual Pentium III processor, 800 MHz / 256 kB cache each
512 MB PC800 ECC RDRAM
40 GB (7200 rpm) ATA-66 disk drive
No screen, no keyboard, no mouse
Wake-on-LAN functionality

7 The File Server
Elonex EIDE server
Dual Pentium III 700 MHz
512 MB SDRAM
20 GByte EIDE disk
1.2 TByte in 75 GB EIDE disks
2 x Gigabit Netgear GA620 network cards
The Farm Server
Dell Precision 620 workstation
Dual Pentium III Xeon 1 GHz
512 MB RDRAM
72.8 GByte SCSI disk
Will also serve as D0 software server for the NIKHEF/D0 people

8 Software on the farm
Boot via the network
Standard Red Hat Linux 6.2
ups/upd on the server
D0 software on the server
FBSNG on the server, daemon on the nodes
SAM on the file server
Used to test new machines …

9 What we run on the farm
Particle generator: Pythia or Isajet
GEANT detector simulation: d0gstar
Digitization, adding min. bias: psim
Check the data: mc_analyze
Reconstruction: preco
Analysis: reco_analyze

10 Example: min. bias
Did a run with 1000 events on all CPUs
– Took ~2 min./event
– So ~1.5 days for the whole run
– Output file size ~575 MByte
We left those files on the nodes
– One reason for having enough local disk space
Intend to repeat that “sometimes”
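
For illustration, the run time and per-event size follow directly from the numbers on this slide (a quick check, nothing more):

```python
# Quick check of the min. bias example: 1000 events at ~2 min/event per CPU,
# producing one ~575 MByte output file per CPU.
events, minutes_per_event, output_mbyte = 1000, 2, 575

run_days = events * minutes_per_event / 60 / 24
print(f"run length     : ~{run_days:.1f} days")                       # ~1.4, i.e. the quoted ~1.5 days
print(f"size per event : ~{output_mbyte / events * 1000:.0f} kByte")  # ~575 kByte per min. bias event
```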

11 Output data
-rw-r--r-- 1 a03 computer  298 Nov 5 19:25 RunJob_farm_qcdJob params
-rw-r--r-- 1 a03 computer      Nov 5 10:35 d0g_mcp03_pmc _nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_ _2000
-rw-r--r-- 1 a03 computer  791 Nov 5 19:25 d0gstar_qcdJob params
-rw-r--r-- 1 a03 computer  809 Nov 5 19:25 d0sim_qcdJob params
-rw-r--r-- 1 a03 computer      Nov 3 16:15 gen_mcp03_pmc _nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_ _2000
-rw-r--r-- 1 a03 computer 1003 Nov 5 19:25 import_d0g_qcdJob py
-rw-r--r-- 1 a03 computer  912 Nov 5 19:25 import_gen_qcdJob py
-rw-r--r-- 1 a03 computer 1054 Nov 5 19:26 import_sim_qcdJob py
-rw-r--r-- 1 a03 computer  752 Nov 5 19:25 isajet_qcdJob params
-rw-r--r-- 1 a03 computer  636 Nov 5 19:25 samglobal_qcdJob params
-rw-r--r-- 1 a03 computer      Nov 5 19:24 sim_mcp03_psim _nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-poisson-2.5_p1.1_ _2000
-rw-r--r-- 1 a03 computer 2132 Nov 5 19:26 summary.conf

12 Output data translated
    GByte  gen_*
1.5 GByte  d0g_*
0.7 GByte  sim_*
import_gen_*.py, import_d0g_*.py, import_sim_*.py
isajet_*.params, RunJob_Farm_*.params, d0gstar_*.params, d0sim_*.params, samglobal_*.params
Summary.conf
12 files for generator + d0gstar + psim
But of course only 3 big ones
Total ~2 GByte

13 Data management
Diagram: SAM links the NIKHEF D0 farm with Fermilab (d0mino) and SARA (TERAS). Labels: Import_gen.py (generator data), Import_d0g.py (GEANT data, hits), Import_sim.py (sim data, digis), Import_reco.py (reconstructed data), parameters.

14 Automation
mc_runjob (modified)
– Prepares MC jobs (gen + sim + reco + anal), e.g. 300 events per job/CPU, repeated e.g. 500 times
– Submits them into the batch system (FBS); they run on the nodes
– Copy to the file server after completion, as a separate batch job on the file server
– Submits them into SAM; SAM does the file transfers to Fermilab and SARA
Runs for a week … (a sketch of this flow follows the diagram on the next slide)

15 Diagram: job and data flow on the farm. An FBS job has three sections: 1 mcc (MC production on a farm node, 40 GB local disk, driven by an mcc request with mcc input and output), 2 rcp (copy of the output from the node to the file server, 1.2 TB), 3 sam (store via the SAM DB to the datastore, with data going to FNAL and SARA). Control, data and metadata flows connect the farm server, the 50+ nodes, the file server and SAM.
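
Slides 14 and 15 describe one repeating pattern: an FBS job with an mcc, an rcp and a sam section. The sketch below summarises that flow in plain Python; the function and command names (fbs_submit, run_mc_chain, copy_to_fileserver, sam_store) are hypothetical stand-ins, not the real mc_runjob or FBS interfaces.

```python
# Illustration of the three-section FBS job from slide 15 (all names are invented).

def fbs_submit(section: str, command: str) -> None:
    """Hypothetical stand-in for submitting one FBS job section."""
    print(f"fbs submit [{section}]: {command}")

N_JOBS = 500           # e.g. 500 repetitions ...
EVENTS_PER_JOB = 300   # ... of ~300 events per job/CPU

for job in range(N_JOBS):
    # 1. mcc: generator + d0gstar + psim run on a farm node (40 GB local disk)
    fbs_submit("mcc", f"run_mc_chain --events={EVENTS_PER_JOB} --job={job}")
    # 2. rcp: copy the finished files from the node to the file server (1.2 TB)
    fbs_submit("rcp", f"copy_to_fileserver --job={job}")
    # 3. sam: declare and store the files in SAM, which forwards them to FNAL and SARA
    fbs_submit("sam", f"sam_store --job={job}")
```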

16 This is a grid!
Diagram: SAM connects the NIKHEF D0 farm, the IN2P3 D0 farm and the KUN D0 farm with Fermilab (d0mino) and SARA (TERAS).

17 The Grid
Not just D0, but for the LHC experiments
Not just SAM, but for any database
Not just farms, but any CPU resource
Not just SARA, but any mass storage
Not just FBS, but any batch system
Not just HEP, but any science: EO, …

18 European DataGrid Project
3-year project for 10 M€
Manpower to develop grid tools
CERN, IN2P3, INFN, PPARC, ESA, FOM
NIKHEF + SARA + KNMI:
– Farm management
– Mass storage management
– Network management
– Testbed
– HEP & EO applications

19 LHC Regional Centres
Diagram of the tiered model: CERN is Tier 0; Tier 1 centres include FNAL, NIKHEF/SARA, IN2P3, RAL and INFN, possibly also KEK and BNL; Tier 2 sites such as Vrije Univ. Amsterdam, Brussel, Leuven, Utrecht and Nijmegen connect via SURFnet; below that, department and desktop level. Experiments shown: ATLAS, LHCb, Alice.

20 DataGrid: testbed sites
Map of the DataGrid testbed sites, with NIKHEF among them.

21 The NL-Datagrid Project

22 NL-Datagrid goals
National testbed for middleware development
– WP4, WP5, WP6, WP7, WP8, WP9
To become an LHC Tier-1 centre
– ATLAS, LHCb, Alice
To use it for the existing programme
– D0, Antares
To use it for other sciences
– EO, astronomy, biology
For tests with other (transatlantic) grids
– D0
– PPDG, GriPhyN

23 NL-Datagrid testbed sites
Nijmegen Univ. (ATLAS)
CERN
RAL
FNAL
ESA
Univ. Utrecht (Alice)
Vrije Univ. (LHCb)
Univ. Amsterdam (ATLAS)

24 Dutch Grid topology
Diagram: NIKHEF, SARA, KNMI, Utrecht Univ., Free Univ. and Nijmegen Univ. connected via SURFnet, with external links to FNAL, CERN (Geneva), ESA and D-PAF (München). Experiments at the sites: D0, ATLAS, LHCb, Alice.

25 End of the Grid intermezzo
Back to the NIKHEF D0 farm and Fermilab: the network

26 Network bandwidth
NIKHEF → SURFnet: 1 Gbit/s
SURFnet, Amsterdam → Chicago: 622 Mbit/s
ESnet, Chicago → Fermilab: 155 Mbit/s ATM
But ftp gives us ~4 Mbit/s
bbftp gives us ~25 Mbit/s
bbftp processes in parallel: ~45 Mbit/s
For 2002:
NIKHEF → SURFnet: 2.5 Gbit/s
SURFnet, Amsterdam → Chicago: 622 Mbit/s
SURFnet, Amsterdam → Chicago: 2.5 Gbit/s optical
Chicago → Fermilab: ?, but more …

27 ftp++
ftp gives you 4 Mbit/s to Fermilab
bbftp: increased buffer, # streams
gsiftp: with security layer, increased buffer, …
grid_ftp: increased buffer, # streams, # sockets, fail-over protection, security
bbftp → ~20 Mbit/s
grid_ftp → ~25 Mbit/s
Multiple ftp in parallel → factor 2 seen
Should get to > 100 Mbit/s, or ~1 GByte/minute
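
As a unit check (an illustration only), the > 100 Mbit/s target and the ~1 GByte/minute figure are two ways of saying roughly the same thing:

```python
# Convert transfer rates between Mbit/s and GByte/minute (taking 1 GByte = 8000 Mbit).
def gbyte_per_minute(mbit_per_s: float) -> float:
    return mbit_per_s / 8 / 1000 * 60   # Mbit/s -> MByte/s -> GByte/min

for rate in (4, 20, 25, 45, 100, 133):
    print(f"{rate:4d} Mbit/s ~ {gbyte_per_minute(rate):.2f} GByte/min")
# 100 Mbit/s ~ 0.75 GByte/min; ~1 GByte/min corresponds to ~133 Mbit/s
```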

28 SURFnet5 access capacity
Chart comparing access capacities of SURFnet4 (up to 1.0 Gbit/s) and SURFnet5 (up to 10 Gbit/s, 20 Gbit/s); scale labels range from 10 Mbit/s to 100 Gbit/s, including 100 Mbit/s, 1 Gbit/s and 2.5 Gbit/s.

29 Transatlantic access capacity
Map: NL (SURFnet), UK (SuperJANET4), Fr (Renater), It (GARR-B) and GEANT in Europe, connected via Geneva and New York to Abilene, ESnet, MREN, STAR-TAP and STAR-LIGHT, with 622 Mbit/s and 2.5 Gbit/s transatlantic links.

30 Network load last week
Needed for 100 MC CPUs: ~10 Mbit/s (200 GB/day)
Available to Chicago: 622 Mbit/s
Available to FNAL: 155 Mbit/s
Needed next year (double capacity): ~25 Mbit/s
Available to Chicago: 2.5 Gbit/s, a factor 100 more !!
Available to FNAL: ??
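
For reference, a rough conversion of the quoted 200 GB/day into a sustained rate, assuming the traffic were spread evenly over 24 hours (real production traffic is burstier); it lands in the same ball park as the slide's ~10 Mbit/s average and far below the available transatlantic capacity.

```python
# Sustained rate implied by shipping ~200 GB/day off the farm, assuming an even
# spread over 24 hours (an assumption; actual traffic comes in bursts).
GB_PER_DAY = 200
mbit_per_s = GB_PER_DAY * 8 * 1000 / 86400   # GB -> Mbit, divided by seconds per day
print(f"{GB_PER_DAY} GB/day ~ {mbit_per_s:.0f} Mbit/s sustained")   # ~19 Mbit/s
```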

31 New nodes for D0
In a 2U 19” mounting
Dual 1 GHz PIII
1 GByte RAM
40 GByte disk
100 Mbit Ethernet
Cost ~k$2; Dell machines were ~k$4 (tax incl.) → a factor 2 cheaper!!
Assembly time: 1 per hour
1 switch k$2.5 (24 ports)
1 rack k$2 (46U high)
Requested for 2001: k$60
– 22 dual-CPU nodes
– 1 switch
– 1 19” rack
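
A quick tally of the itemised hardware at the listed unit prices (a rough sketch; the k$60 request evidently covers more than these items alone):

```python
# Rough cost tally for the 2001 request, using the unit prices on the slide.
n_nodes, price_per_node = 22, 2.0      # dual 1 GHz PIII nodes at ~k$2 each
price_switch, price_rack = 2.5, 2.0    # 24-port switch and 46U rack

total_kusd = n_nodes * price_per_node + price_switch + price_rack
print(f"itemised hardware: ~k${total_kusd:.1f} of the k$60 requested")   # ~k$48.5
```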

32

33 The End
Kors Bos
Fermilab, May 23, 2001