Slide 1: Converting ASGARD into a MC-Farm for Particle Physics
Beowulf-Day, 17.01.05, A. Biland, IPP/ETHZ
Slides 2-5: Beowulf Concept
Three main components: CPU Nodes, Network, Fileserver.
Slide 6: Beowulf Concept
Three main components: CPU Nodes ($$$$$$$$$ ?), Network ($$$$ ?), Fileserver ($$$ ?).
How much of the (limited) money to spend on what?
Slide 7: Beowulf Concept
Intended (main) usage: "Eierlegende Woll-Milch-Sau" (an egg-laying wool-milk-sow, i.e. one size fits all).
Put a roughly equal amount of money into each component ==> OK for (almost) any possible use, but a waste of money for most applications.
Slide 8: Beowulf Concept
Intended (main) usage: ~80% CPU Nodes, ~10% Network, ~10% Fileserver [ASGARD, HREIDAR-I].
CPU-bound jobs with limited I/O and inter-CPU communication.
Slide 9: Beowulf Concept
Intended (main) usage: ~50% CPU Nodes, ~40% Network, ~10% Fileserver [HREIDAR-II].
Jobs with high inter-CPU communication needs (parallel processing).
Slide 10: Beowulf Concept
Intended (main) usage: ~50% CPU Nodes, ~10% Network, ~40% Fileserver.
Jobs with high I/O needs or large datasets (data analysis).
Slide 11: Fileserver Problems: a) Speed (parallel access)
Inexpensive fileservers reach a disk I/O of ~50 MB/s.
500 single-CPU jobs ==> 50 MB/s / 500 jobs = 100 kB/s per job (an upper limit; the values typically reached are much smaller).
Using several fileservers in parallel:
-- difficult data management (where is which file?) [use parallel filesystems?]
-- hot spots (all jobs want to access the same dataset) [data replication ==> $$$]
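A quick back-of-the-envelope check of this per-job bandwidth limit (a minimal sketch; the 50 MB/s and 500-job figures are taken from the slide):

```python
# Per-job bandwidth when one fileserver is shared by many CPU-bound jobs.
aggregate_io_mb_s = 50.0      # MB/s an inexpensive fileserver can sustain (from the slide)
n_jobs = 500                  # concurrent single-CPU jobs (from the slide)

per_job_kb_s = aggregate_io_mb_s * 1000.0 / n_jobs
print(f"upper limit per job: {per_job_kb_s:.0f} kB/s")   # -> 100 kB/s
```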
Slide 12: Fileserver Problems: a) Speed (parallel access)
How (not) to read/write the data:
Bad: NFS (constant transfer of small chunks of data) ==> continuous disk repositioning ==> disk I/O --> 0 (somewhat improved with a large cache (>>100 MB) in memory; if the write cache is full, flushing to disk takes a long time ==> the server blocks).
~OK: rcp (transfer of large blocks from/to the local /scratch); but /scratch is rather small on ASGARD, and what if many jobs want to transfer at the same time?
Best: the fileserver initiates rcp transfers on request; requires user discipline, not very transparent, ...
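The "~OK" and "Best" options both amount to staging whole files between the fileserver and the node-local /scratch instead of doing fine-grained NFS I/O. A minimal sketch of that stage-in / process / stage-out pattern (the host name "fileserver", the remote path, and the placeholder processing step are hypothetical, not ASGARD specifics):

```python
import subprocess
from pathlib import Path

SCRATCH = Path("/scratch")            # node-local disk: fast, sequential I/O
REMOTE = "fileserver:/work/mc"        # hypothetical fileserver location

def stage_in(name: str) -> Path:
    """Copy one large input file from the fileserver to local /scratch in a single transfer."""
    local = SCRATCH / name
    subprocess.run(["rcp", f"{REMOTE}/{name}", str(local)], check=True)
    return local

def stage_out(local: Path) -> None:
    """Copy the finished output back to the fileserver in one large transfer."""
    subprocess.run(["rcp", str(local), f"{REMOTE}/{local.name}"], check=True)

def run_job(name: str) -> None:
    data = stage_in(name)
    result = data.with_suffix(".reco")
    result.touch()  # placeholder for the real processing step, which writes to /scratch
    stage_out(result)
```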
Slide 13: Fileserver Problems: b) Capacity
500 jobs producing data, each writing 100 kB/s ==> 50 MB/s to the fileserver ==> 4.2 TB/day!
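The same numbers as a one-line calculation (a sketch using the 100 kB/s per job and 500 jobs from the slides; the exact daily total depends on whether TB is taken as decimal or binary):

```python
# Aggregate write rate and daily data volume for 500 jobs at 100 kB/s each.
n_jobs = 500
rate_kb_s = 100.0                              # per-job write rate (kB/s)
seconds_per_day = 24 * 3600

total_mb_s = n_jobs * rate_kb_s / 1000.0       # -> 50 MB/s hitting the fileserver
tb_per_day = total_mb_s * seconds_per_day / 1e6
print(f"{total_mb_s:.0f} MB/s  ->  {tb_per_day:.1f} TB/day")   # roughly 4.3 decimal TB/day
```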
Slide 14: Particle Physics MC
Needs a huge number of statistically independent events, with #events >> #CPUs ==> an 'embarrassingly parallel' problem ==> 5x500 MIPS is as good as 1x2500 MIPS.
Usually two sets of programs:
a) Simulation: produces huge, very detailed MC files (adapted standard packages [GEANT, CORSIKA, ...])
b) Reconstruction: reads the MC files, writes smaller reco files with selected events and physics data (special software developed by each experiment)
Mass production: only the reco files are needed ==> combine both tasks in one job and use /scratch.
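A minimal sketch of the combined simulation-plus-reconstruction job described here: the bulky intermediate MC file stays on the node-local /scratch and only the small reco file is shipped back to the fileserver (the program names "simulate" and "reconstruct", the paths, and the rcp destination are hypothetical placeholders):

```python
import subprocess
from pathlib import Path

SCRATCH = Path("/scratch")
FILESERVER = "fileserver:/work/reco"           # hypothetical destination for reco files

def run_mc_job(run_number: int, n_events: int) -> None:
    mc_file = SCRATCH / f"run{run_number}.mc"      # huge, detailed simulation output
    reco_file = SCRATCH / f"run{run_number}.reco"  # small file with selected events

    # 1) Simulation: write the large intermediate file to local /scratch only.
    subprocess.run(["simulate", "--events", str(n_events),
                    "--output", str(mc_file)], check=True)

    # 2) Reconstruction: read the MC file locally, write the compact reco file.
    subprocess.run(["reconstruct", "--input", str(mc_file),
                    "--output", str(reco_file)], check=True)

    # 3) Only the small reco file travels over the network to the fileserver.
    subprocess.run(["rcp", str(reco_file), f"{FILESERVER}/{reco_file.name}"], check=True)

    # 4) Free the local scratch space for the next run.
    mc_file.unlink()
    reco_file.unlink()
```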
Slides 15-16: ASGARD Status
24 nodes/frame, 10 frames; fileservers: /home, /work, /arch.
Local disk per node: 1 GB /, 1 GB swap, 4 GB /scratch.
Needed: a fileserver with ++bandwidth and ++capacity; guaranteed /scratch space per job.
Slides 17-18: ASGARD Upgrade
24 nodes/frame, 10 frames; fileservers: /home, /work, /arch.
Local disk per node: 0.2 GB /, 0.3 GB swap, 2.5 GB /scr1, 2.5 GB /scr2.
Per frame: 4x 400 GB SATA in RAID-10 (800 GB usable).
Adding 10 fileservers (~65 kFr), ASGARD can serve ~2 years as a MC farm and GRID testbed ...
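A quick sketch of the usable capacity implied by this upgrade (RAID-10 halves the raw capacity; the 10-frame total is my own arithmetic from the figures on the slide, not a number stated there):

```python
# Usable capacity of the proposed per-frame RAID-10 arrays.
disks_per_frame = 4
disk_size_gb = 400
frames = 10

raw_gb = disks_per_frame * disk_size_gb      # 1600 GB raw per frame
usable_gb = raw_gb // 2                      # RAID-10 mirrors everything -> 800 GB usable
total_tb = usable_gb * frames / 1000.0
print(f"{usable_gb} GB usable per frame, ~{total_tb:.0f} TB across {frames} frames")
```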