1  Data Management
- D0 Monte Carlo needs
- The NIKHEF D0 farm
- The data we produce
- The SAM database
- The network
- Conclusions
Kors Bos, NIKHEF, Amsterdam
Fermilab, May 23, 2001
2  D0 Monte Carlo needs
- The D0 trigger rate is 100 Hz over 10^7 seconds/yr, i.e. 10^9 events/yr.
- We want at least 10% of that to be simulated: 10^8 events/yr.
- Simulating one QCD event takes ~3 minutes (size ~2 MByte)
  – on an 800 MHz PIII.
- So one CPU can produce ~10^5 events/yr (~200 GByte)
  – assuming 60% overall efficiency.
- So our 100-CPU farm can produce ~10^7 events/yr (~20 TByte)
  – and this is only 10% of the goal we set ourselves
  – not counting the Nijmegen D0 farm yet.
- So we need at least an order of magnitude more:
  – UTA (50), Lyon (200), Prague (10), BU (64),
  – Nijmegen (50), Lancaster (200), Rio (25).
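The arithmetic on this slide can be checked with a short script. One assumption not stated on the slide: to get ~10^5 events/yr per CPU, the farm must run year-round (~3.15x10^7 wall-clock seconds), not just during the 10^7 trigger-live seconds.

```python
# Back-of-the-envelope check of the slide's production numbers.
# Assumption (not on the slide): the farm runs year-round,
# ~3.15e7 wall-clock seconds, at the quoted 60% overall efficiency.
WALL_SECONDS_PER_YEAR = 3.15e7
EFFICIENCY = 0.60
SECONDS_PER_EVENT = 3 * 60        # ~3 minutes per QCD event
EVENT_SIZE_GB = 2e-3              # ~2 MByte per event
N_CPUS = 100

events_per_cpu = EFFICIENCY * WALL_SECONDS_PER_YEAR / SECONDS_PER_EVENT
gb_per_cpu = events_per_cpu * EVENT_SIZE_GB
farm_events = N_CPUS * events_per_cpu
farm_tb = N_CPUS * gb_per_cpu / 1000

print(f"{events_per_cpu:.2e} events/yr per CPU, ~{gb_per_cpu:.0f} GB")
print(f"{farm_events:.2e} events/yr on the farm, ~{farm_tb:.0f} TB")
```

This reproduces the slide's rounded figures: ~10^5 events and ~200 GB per CPU per year, ~10^7 events and ~20 TB for the 100-CPU farm.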
3  Example: minimum bias
- Did a run with 1000 events on all CPUs
  – took ~2 min/event
  – so ~1.5 days for the whole run
  – output file size ~575 MByte.
- We left those files on the nodes
  – a reason to have enough local disk space!
- We intend to repeat that from time to time.
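The run time quoted above follows directly from the per-event cost:

```python
# Sanity check of the minimum-bias run: 1000 events at ~2 min/event per CPU.
events = 1000
minutes_per_event = 2
run_days = events * minutes_per_event / (60 * 24)
print(f"~{run_days:.1f} days")  # ~1.4 days, i.e. the slide's "~1.5 days"
```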
4  Output data
-rw-r--r-- 1 a03 computer        298 Nov  5 19:25 RunJob_farm_qcdJob308161443.params
-rw-r--r-- 1 a03 computer 1583995325 Nov  5 10:35 d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer        791 Nov  5 19:25 d0gstar_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        809 Nov  5 19:25 d0sim_qcdJob308161443.params
-rw-r--r-- 1 a03 computer   47505408 Nov  3 16:15 gen_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       1003 Nov  5 19:25 import_d0g_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        912 Nov  5 19:25 import_gen_qcdJob308161443.py
-rw-r--r-- 1 a03 computer       1054 Nov  5 19:26 import_sim_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        752 Nov  5 19:25 isajet_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        636 Nov  5 19:25 samglobal_qcdJob308161443.params
-rw-r--r-- 1 a03 computer  777098777 Nov  5 19:24 sim_mcp03_psim01.02.00_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-poisson-2.5_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       2132 Nov  5 19:26 summary.conf
5  Output data, translated
- 0.047 GByte  gen_*
- 1.5 GByte    d0g_*
- 0.7 GByte    sim_*
- import_gen_*.py, import_d0g_*.py, import_sim_*.py
- isajet_*.params, RunJob_farm_*.params, d0gstar_*.params, d0sim_*.params, samglobal_*.params
- summary.conf
12 files per job for generator + d0gstar + psim, but of course only 3 big ones.
Total: ~2 GByte per day per CPU; on 100 CPUs, 200 GByte/day!
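The daily total follows from the three big files, assuming roughly one job per CPU per day (300 events at ~2-3 min each is about a day's work):

```python
# Summing the three big output files per job (sizes from the slide, in GB),
# then scaling to the 100-CPU farm at one job per CPU per day.
sizes_gb = {"gen": 0.047, "d0g": 1.5, "sim": 0.7}
per_job_gb = sum(sizes_gb.values())
daily_farm_gb = 100 * per_job_gb
print(f"{per_job_gb:.2f} GB/job -> ~{daily_farm_gb:.0f} GB/day")  # 2.25 GB/job -> ~225 GB/day
```

About 225 GB/day, consistent with the slide's rounded "200 GByte/day".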
6  Automation
- Mc_runjob (modified)
  – prepares MC jobs (gen+sim+reco+anal), e.g. 300 events per job/CPU, repeated e.g. 500 times
  – submits them into the batch system (FBSNG)
    - they run on the nodes; the executable plus some files are moved to the nodes
  – copies to the fileserver after completion
    - a separate batch job on the fileserver; data moves between nodes and server
  – submits them into SAM
    - SAM does the file transfers to Fermilab and SARA
- Runs for a week...
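The driver loop above can be sketched as follows. This is a hypothetical illustration only: the job-spec fields, the `.jds` file name, and the `fbs submit` command line are stand-ins, not the real mc_runjob or FBSNG interfaces.

```python
# Hypothetical sketch of an mc_runjob-style driver: prepare N job specs,
# each with the three batch stages from the slides (mcc, rcp, sam), and
# emit one batch-submit command per job. All names are illustrative.
def prepare_jobs(n_jobs=500, events_per_job=300):
    jobs = []
    for i in range(n_jobs):
        job_id = f"qcdJob{i:05d}"
        jobs.append({
            "id": job_id,
            "events": events_per_job,
            "stages": ["mcc", "rcp", "sam"],   # run, copy to server, store in SAM
            "submit": f"fbs submit {job_id}.jds",
        })
    return jobs

jobs = prepare_jobs()
print(len(jobs), sum(j["events"] for j in jobs))  # 500 150000
```

At 500 jobs of 300 events, one pass through the loop covers 150,000 events, matching the "repeat ~500 times" scale on the slide.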
7  [Diagram: job and data flow. Components: farm server (40 GB), file server (1.2 TB), 50+ nodes, SAM DB, datastores at FNAL and SARA. An mcc request arrives at the farm server, which runs an fbs job in three steps: 1 mcc (fbs(mcc): mcc input/output on the nodes), 2 rcp (fbs(rcp): copy output to the file server), 3 sam (fbs(sam): register metadata in the SAM DB and ship data to the datastores). Control, data, and metadata paths are drawn separately.]
8  Network bandwidth
- NIKHEF – SURFnet: 1 Gbit
- SURFnet, Amsterdam – Chicago: 622 Mbit
- ESnet, Chicago – Fermilab: 155 Mbit ATM
But:
- plain ftp gives us ~4 Mbit/s
- bbftp gives us ~25 Mbit/s
- parallel bbftp processes give us ~45 Mbit/s
For 2002:
- NIKHEF – SURFnet: 2.5 Gbit
- SURFnet, Amsterdam – Chicago: 622 Mbit, then 2.5 Gbit optical
- Chicago – Fermilab: ? (more than 155 Mbit)
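The measured rates above, not the link capacities, decide whether the daily output can be shipped off-site. A quick conversion for one day's output (~200 GB):

```python
# Hours needed to ship one day's output (~200 GB) at the measured rates.
DAY_GB = 200
for tool, mbit_s in [("ftp", 4), ("bbftp", 25), ("parallel bbftp", 45)]:
    hours = DAY_GB * 8e3 / mbit_s / 3600  # GB -> Mbit, then seconds -> hours
    print(f"{tool:>15}: {hours:6.1f} h per day of data")
```

Plain ftp would need ~111 hours per day of data and can never keep up; single-stream bbftp needs ~18 hours and barely copes; parallel bbftp at ~45 Mbit/s needs ~10 hours, leaving comfortable headroom.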
9  [Chart: SURFnet internal access capacity, 1999-2002, log scale from 10 Mbit/s to 100 Gbit/s. SURFnet4 grows from 155 Mbit/s to 1.0 Gbit/s; SURFnet5 from 2.5 Gbit/s through 10 Gbit/s to 20 Gbit/s.]
10  [Map: transatlantic (TA) network capacity. NL SURFnet connects via GEANT to the European NRENs (UK SuperJANET4, Fr Renater, It GARR-B, Geneva) and across the Atlantic at 622 Mb and 2.5 Gb to New York and Chicago (STAR-TAP / STAR-LIGHT), reaching Abilene, ESnet, and MREN.]
11  Network load last week
- Needed for 100 MC CPUs: ~10 Mbit/s (200 GB/day)
- Available to Chicago: 622 Mbit/s
- Available to FNAL: 155 Mbit/s
- Needed next year (double capacity): ~25 Mbit/s
- Available to Chicago: 2.5 Gbit/s, a factor of 100 more!!
- Available to FNAL: ??
12  Conclusions
- Producing a lot of data is easy.
- Storing a lot of data is less easy, but still easy.
- Moving a lot of data is even less easy, but still easy.
- So what is the problem? Managing a lot of data is difficult: the metadata database.
- The network around Fermilab/CERN is getting tight; otherwise there is enough bandwidth!
- Conclusion: do the easiest thing. Don't store or move: recalculate!!