1  Data Management
- D0 Monte Carlo needs
- The NIKHEF D0 farm
- The data we produce
- The SAM database
- The network
- Conclusions
Kors Bos, NIKHEF, Amsterdam
Fermilab, May 23, 2001
2  D0 Monte Carlo needs
- The D0 trigger rate is 100 Hz over 10^7 seconds/yr, i.e. 10^9 events/yr.
- We want at least 10% of that to be simulated: 10^8 events/yr.
- Simulating one QCD event takes ~3 minutes (size ~2 MByte)
  – on an 800 MHz PIII.
- So one CPU can produce ~10^5 events/yr (~200 GByte)
  – assuming 60% overall efficiency.
- So our 100-CPU farm can produce ~10^7 events/yr (~20 TByte)
  – and this is only 10% of the goal we set ourselves
  – not counting the Nijmegen D0 farm yet.
- So we need at least an order of magnitude more:
  – UTA (50), Lyon (200), Prague (10), BU (64),
  – Nijmegen (50), Lancaster (200), Rio (25).
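The arithmetic on this slide can be checked with a short script. One assumption not stated on the slide: to get ~10^5 events/yr per CPU, the farm must run year-round (~3.15x10^7 wall-clock seconds), not just during the 10^7 trigger-live seconds.

```python
# Back-of-the-envelope check of the slide's production numbers.
# Assumption (not on the slide): the farm runs year-round,
# ~3.15e7 wall-clock seconds, at the quoted 60% overall efficiency.
WALL_SECONDS_PER_YEAR = 3.15e7
EFFICIENCY = 0.60
SECONDS_PER_EVENT = 3 * 60        # ~3 minutes per QCD event
EVENT_SIZE_GB = 2e-3              # ~2 MByte per event
N_CPUS = 100

events_per_cpu = EFFICIENCY * WALL_SECONDS_PER_YEAR / SECONDS_PER_EVENT
gb_per_cpu = events_per_cpu * EVENT_SIZE_GB
farm_events = N_CPUS * events_per_cpu
farm_tb = N_CPUS * gb_per_cpu / 1000

print(f"{events_per_cpu:.2e} events/yr per CPU, ~{gb_per_cpu:.0f} GB")
print(f"{farm_events:.2e} events/yr on the farm, ~{farm_tb:.0f} TB")
```

This reproduces the slide's rounded figures: ~10^5 events and ~200 GB per CPU per year, ~10^7 events and ~20 TB for the 100-CPU farm.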
3  Example: minimum bias
- Did a run with 1000 events on all CPUs
  – took ~2 min/event
  – so ~1.5 days for the whole run
  – output file size ~575 MByte.
- We left those files on the nodes
  – a reason to have enough local disk space!
- We intend to repeat that from time to time.
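The run time quoted above follows directly from the per-event cost:

```python
# Sanity check of the minimum-bias run: 1000 events at ~2 min/event per CPU.
events = 1000
minutes_per_event = 2
run_days = events * minutes_per_event / (60 * 24)
print(f"~{run_days:.1f} days")  # ~1.4 days, i.e. the slide's "~1.5 days"
```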
4  Output data
-rw-r--r-- 1 a03 computer        298 Nov  5 19:25 RunJob_farm_qcdJob308161443.params
-rw-r--r-- 1 a03 computer 1583995325 Nov  5 10:35 d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer        791 Nov  5 19:25 d0gstar_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        809 Nov  5 19:25 d0sim_qcdJob308161443.params
-rw-r--r-- 1 a03 computer   47505408 Nov  3 16:15 gen_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       1003 Nov  5 19:25 import_d0g_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        912 Nov  5 19:25 import_gen_qcdJob308161443.py
-rw-r--r-- 1 a03 computer       1054 Nov  5 19:26 import_sim_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        752 Nov  5 19:25 isajet_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        636 Nov  5 19:25 samglobal_qcdJob308161443.params
-rw-r--r-- 1 a03 computer  777098777 Nov  5 19:24 sim_mcp03_psim01.02.00_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-poisson-2.5_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       2132 Nov  5 19:26 summary.conf
5  Output data, translated
- 0.047 GByte  gen_*
- 1.5 GByte    d0g_*
- 0.7 GByte    sim_*
- import_gen_*.py, import_d0g_*.py, import_sim_*.py
- isajet_*.params, RunJob_farm_*.params, d0gstar_*.params, d0sim_*.params, samglobal_*.params
- summary.conf
12 files per job for generator + d0gstar + psim, but of course only 3 big ones.
Total: ~2 GByte per day per CPU; on 100 CPUs, 200 GByte/day!
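The daily total follows from the three big files, assuming roughly one job per CPU per day (300 events at ~2-3 min each is about a day's work):

```python
# Summing the three big output files per job (sizes from the slide, in GB),
# then scaling to the 100-CPU farm at one job per CPU per day.
sizes_gb = {"gen": 0.047, "d0g": 1.5, "sim": 0.7}
per_job_gb = sum(sizes_gb.values())
daily_farm_gb = 100 * per_job_gb
print(f"{per_job_gb:.2f} GB/job -> ~{daily_farm_gb:.0f} GB/day")  # 2.25 GB/job -> ~225 GB/day
```

About 225 GB/day, consistent with the slide's rounded "200 GByte/day".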
6  Automation
- Mc_runjob (modified)
  – prepares MC jobs (gen+sim+reco+anal), e.g. 300 events per job/CPU, repeated e.g. 500 times
  – submits them into the batch system (FBSNG)
    - they run on the nodes; the executable plus some files are moved to the nodes
  – copies to the fileserver after completion
    - a separate batch job on the fileserver; data moves between nodes and server
  – submits them into SAM
    - SAM does the file transfers to Fermilab and SARA
- Runs for a week...
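The driver loop above can be sketched as follows. This is a hypothetical illustration only: the job-spec fields, the `.jds` file name, and the `fbs submit` command line are stand-ins, not the real mc_runjob or FBSNG interfaces.

```python
# Hypothetical sketch of an mc_runjob-style driver: prepare N job specs,
# each with the three batch stages from the slides (mcc, rcp, sam), and
# emit one batch-submit command per job. All names are illustrative.
def prepare_jobs(n_jobs=500, events_per_job=300):
    jobs = []
    for i in range(n_jobs):
        job_id = f"qcdJob{i:05d}"
        jobs.append({
            "id": job_id,
            "events": events_per_job,
            "stages": ["mcc", "rcp", "sam"],   # run, copy to server, store in SAM
            "submit": f"fbs submit {job_id}.jds",
        })
    return jobs

jobs = prepare_jobs()
print(len(jobs), sum(j["events"] for j in jobs))  # 500 150000
```

At 500 jobs of 300 events, one pass through the loop covers 150,000 events, matching the "repeat ~500 times" scale on the slide.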
7  [Diagram: job and data flow. Components: farm server (40 GB), file server (1.2 TB), 50+ nodes, SAM DB, datastores at FNAL and SARA. An mcc request arrives at the farm server, which runs an fbs job in three steps: 1 mcc (fbs(mcc): mcc input/output on the nodes), 2 rcp (fbs(rcp): copy output to the file server), 3 sam (fbs(sam): register metadata in the SAM DB and ship data to the datastores). Control, data, and metadata paths are drawn separately.]
8  Network bandwidth
- NIKHEF – SURFnet: 1 Gbit
- SURFnet, Amsterdam – Chicago: 622 Mbit
- ESnet, Chicago – Fermilab: 155 Mbit ATM
But:
- plain ftp gives us ~4 Mbit/s
- bbftp gives us ~25 Mbit/s
- parallel bbftp processes give us ~45 Mbit/s
For 2002:
- NIKHEF – SURFnet: 2.5 Gbit
- SURFnet, Amsterdam – Chicago: 622 Mbit, then 2.5 Gbit optical
- Chicago – Fermilab: ? (more than 155 Mbit)
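The measured rates above, not the link capacities, decide whether the daily output can be shipped off-site. A quick conversion for one day's output (~200 GB):

```python
# Hours needed to ship one day's output (~200 GB) at the measured rates.
DAY_GB = 200
for tool, mbit_s in [("ftp", 4), ("bbftp", 25), ("parallel bbftp", 45)]:
    hours = DAY_GB * 8e3 / mbit_s / 3600  # GB -> Mbit, then seconds -> hours
    print(f"{tool:>15}: {hours:6.1f} h per day of data")
```

Plain ftp would need ~111 hours per day of data and can never keep up; single-stream bbftp needs ~18 hours and barely copes; parallel bbftp at ~45 Mbit/s needs ~10 hours, leaving comfortable headroom.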
9  [Chart: SURFnet internal access capacity, 1999-2002, log scale from 10 Mbit/s to 100 Gbit/s. SURFnet4 grows from 155 Mbit/s to 1.0 Gbit/s; SURFnet5 from 2.5 Gbit/s through 10 Gbit/s to 20 Gbit/s.]
10  [Map: transatlantic (TA) network capacity. NL SURFnet connects via GEANT to the European NRENs (UK SuperJANET4, Fr Renater, It GARR-B, Geneva) and across the Atlantic at 622 Mb and 2.5 Gb to New York and Chicago (STAR-TAP / STAR-LIGHT), reaching Abilene, ESnet, and MREN.]
11  Network load last week
- Needed for 100 MC CPUs: ~10 Mbit/s (200 GB/day)
- Available to Chicago: 622 Mbit/s
- Available to FNAL: 155 Mbit/s
- Needed next year (double capacity): ~25 Mbit/s
- Available to Chicago: 2.5 Gbit/s, a factor of 100 more!!
- Available to FNAL: ??
12  Conclusions
- Producing a lot of data is easy.
- Storing a lot of data is less easy, but still easy.
- Moving a lot of data is even less easy, but still easy.
- So what is the problem? Managing a lot of data is difficult: the metadata database.
- The network around Fermilab/CERN is getting tight; otherwise there is enough bandwidth!
- Conclusion: do the easiest thing. Don't store or move: recalculate!!