STORM & GPFS on Tier-2 Milan


1 STORM & GPFS on Tier-2 Milan
Workshop CCR – May

2 Why StoRM ?
- Security model inherited from the ACL support of the underlying file system.
- Lightweight disk-space and authorization manager; no special hardware required.
- Scalable.
- Easily configurable using yaim (see the sketch after this list); individual parts of the final configuration can still be tuned at a later time.
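A minimal sketch of the yaim side, assuming the INFNGRID site-info layout used later on slide 6; the StoRM variable names below are illustrative and should be checked against the installed yaim version:

  # excerpt from siteinfo/ig-site-info.def (variable names are assumptions)
  STORM_BACKEND_HOST=se-b1-1.mi.infn.it
  STORM_FRONTEND_HOST_LIST=t2cmcondor.mi.infn.it
  STORM_DEFAULT_ROOT=/atlas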

3 Why StoRM ?
- SOAP web service; the roles of all its parts can be assigned to separate hosts: front-end, back-end, gridftp, mysql (see the sketch below).
- Request rate handled up to 40 Hz.
- GridFTP throughput of 120 MB/s on a 1 Gb/s LAN.
- Well matched with the GPFS architecture.
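A sketch of splitting the roles across hosts with the same tool, one yaim node type per role; only ig_SE_storm_backend appears later on slide 6, and the front-end and gridftp node-type names are assumptions:

  # on the back-end host (se-b1-1), as on slide 6
  /opt/glite/yaim/bin/ig_yaim -c -s siteinfo/ig-site-info.def -n ig_SE_storm_backend
  # on the front-end host (node-type name is an assumption)
  /opt/glite/yaim/bin/ig_yaim -c -s siteinfo/ig-site-info.def -n ig_SE_storm_frontend
  # on each gridftp pool node (node-type name is an assumption)
  /opt/glite/yaim/bin/ig_yaim -c -s siteinfo/ig-site-info.def -n ig_SE_storm_gridftp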

4 Why GPFS ?
- Visibility as a local file system; no remote protocols needed on the clients (see the sketch after this list).
- Cluster structured, with slave clustering.
- Redundancy.
- Scalable.
- Abstraction layer.
- Robustness.
- High performance: concurrent read from 16 clients at 380 MB/s (against a physical network limit of 412 MB/s); concurrent write at 320 MB/s.
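A minimal sketch of bringing such a cluster up, using standard GPFS 3.x administration commands; the host names follow slide 6, while the node roles are assumptions:

  # create the cluster with primary/secondary configuration servers (roles are assumptions)
  mmcrcluster -N ts-b1-1:manager-quorum,ts-b1-2:manager-quorum,wn-b1-1:client -p ts-b1-1 -s ts-b1-2
  # turn the disk descriptors into NSDs, then create a file system (flags follow slide 6)
  mmcrnsd -F nsd-software.txt
  mmcrfs /dev/software -F nsd-software.txt -B 64K -m 2 -M 2 -r 2 -R 2 -Q yes -n 512 -A yes
  # start the daemons everywhere and mount on all nodes
  mmstartup -a
  mmmount /dev/software -a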

5 Network and storage setup

6 Network and storage setup (diagram)
[Diagram: ui.mi.infn.it (glite-UI-nodes.mi.infn.it) and ce-b1-1 (Condor scheduler) feed 38 worker nodes wn-b1-1 … wn-b1-38 (Condor exec, glite-condor-nodes.mi.infn.it), all GPFS clients; t2cmcondor (Condor central manager, glite-condor.mi.infn.it) exposes the SRM endpoint; se-b1-1 runs the StoRM back-end, alongside the StoRM front-end, a GridFTP server for ops and a pool of four GridFTP servers gridftp-b1-1 … gridftp-b1-4, all mounting GPFS; the NSD server groups ts-b1-1 … ts-b1-4, ts-b1-5/ts-b1-6 and ts-b1-7/ts-b1-8 attach to the disk arrays over Fibre Channel multipath and serve the GPFS file systems /dev/storage_1, /dev/storage_2 and /dev/software (capacities 40 TB, 46 TB and 46 TB), built from NSDs c1 … c17.]

Access endpoints shown on the slide:
  srm://t2cmcondor.mi.infn.it:8444/srm/managerv2?SFN=/atlas/…
  gsiftp://gridftp-b1-1.mi.infn.it:2811/atlas/…

File-system creation command shown on the slide:
  mmcrfs /dev/software -F nsd-software.txt -B 64K -m 2 -M 2 -r 2 -R 2 -Q yes -n 512 -A yes -v no -N

Disk-descriptor file for /dev/storage_1:
  ## definitions for /dev/storage_1 fs
  #sdb:ts-b1-2,ts-b1-1,ts-b1-4,ts-b1-3::dataAndMetadata:1:c2
  c2:::dataAndMetadata:1:::
  #sdc:ts-b1-3,ts-b1-4,ts-b1-1,ts-b1-2::dataAndMetadata:1:c3
  c3:::dataAndMetadata:1:::
  #sdd:ts-b1-4,ts-b1-3,ts-b1-2,ts-b1-1::dataAndMetadata:1:c4
  c4:::dataAndMetadata:1:::
  #sde:ts-b1-1,ts-b1-2,ts-b1-3,ts-b1-4::dataAndMetadata:1:c5
  c5:::dataAndMetadata:1:::

StoRM back-end configuration command:
  /opt/glite/yaim/bin/ig_yaim -c -s siteinfo/ig-site-info.def -n ig_SE_storm_backend
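A gloss on the descriptor lines above (our note, following the GPFS 3.x disk-descriptor layout DiskName:ServerList::DiskUsage:FailureGroup:DesiredName): each commented line declares a disk, e.g. sdb, with its ordered NSD server list ts-b1-2,ts-b1-1,ts-b1-4,ts-b1-3, usage dataAndMetadata, failure group 1 and NSD name c2; the uncommented line after it is the rewritten form that mmcrnsd leaves in the file, keyed by the NSD name and later passed to mmcrfs.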

7 Issues with first production run
Project outline: analyse MSSM A/H → tau tau → lepton-hadron at 14 TeV, to obtain the discovery potential for this channel in ATLAS (publish the result as a PUB note). To do this we must produce ALL our own datasets.
Production target numbers of events: total 35 M events; Milano share: 3.2 M events.

8 Issues with first production run
Job specification: Atlfast II simulation using the job transform csc_simul_reco_trf.py.
- Input: evgen → lcg-cp from SE INFN-MILANO_LOCALGROUPDISK (a staging sketch follows the list)
- Output: AOD → lcg-cp to SE INFN-MILANO_LOCALGROUPDISK
- Events per job: 250; total jobs: 12800
- Requirements: 2 GB RAM / 2 GB swap
Running time:
- Intel(R) Xeon(R) CPU 2.50 GHz, 6144 KB cache → 6 hours (TYPE1)
- Intel(R) Xeon(TM) CPU 3.06 GHz, 512 KB cache → 18 hours (TYPE2)
Cluster performance (CPUs × 24 h ÷ hours per job):
- 48 CPUs (TYPE1): 48 × 24 / 6 = 192 jobs/day
- 124 CPUs (TYPE2): 124 × 24 / 18 ≈ 165 jobs/day
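A sketch of the staging steps, assuming the lcg-utils client and the SRM endpoint shown on slide 6; the local file names and the -b -D srmv2 flags are assumptions, and the elided SURL paths are left as on the slides:

  # stage the evgen input from the local-group SE to the worker node
  lcg-cp -b -D srmv2 --vo atlas srm://t2cmcondor.mi.infn.it:8444/srm/managerv2?SFN=/atlas/… file://$PWD/evgen.pool.root
  # copy the AOD output back to the same SE
  lcg-cp -b -D srmv2 --vo atlas file://$PWD/aod.pool.root srm://t2cmcondor.mi.infn.it:8444/srm/managerv2?SFN=/atlas/…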

9 Issues with first production run
Failure rate: 50%, mainly issues with setup, GPFS and storage:
- environment setup and variable tuning;
- machine requirements: huge memory needed.
When functioning correctly it works at the typical ATLAS production failure rate of ~3%.
GPFS is perfectly integrated with StoRM: no special settings are required, and new clients can easily be added (see the sketch below).
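A sketch of adding a client, using standard GPFS administration commands; the host name is an assumption:

  # add a new worker node as a GPFS client, start the daemon and mount all file systems on it
  mmaddnode -N wn-b1-39:client
  mmstartup -N wn-b1-39
  mmmount all -N wn-b1-39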

10 Issues with first production run
- GPFS caching is bounded by OS limits (2 GB of virtual memory per process on 32-bit nodes); 64-bit clients can accommodate a larger cache.
- CNFS is good for providing fault-tolerant NFS read-only mounts with better, faster caching, but special care must be taken: it is better to separate the disk servers from the NFS servers and to set GPFS redundancy for the related file systems.
- Local hosts should resolve the gridftp servers, to prevent StoRM from generating too much internal traffic (VLANs could also be used). A sketch of both knobs follows.
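Two sketches of the corresponding settings, assuming standard GPFS configuration and a plain /etc/hosts approach; the cache size, node list and addresses are assumptions:

  # raise the GPFS page pool on 64-bit clients only (size and node list are assumptions)
  mmchconfig pagepool=1G -N wn-b1-1,wn-b1-2
  # /etc/hosts on the StoRM and worker nodes: resolve the gridftp pool locally (addresses are assumptions)
  192.168.1.11  gridftp-b1-1.mi.infn.it  gridftp-b1-1
  192.168.1.12  gridftp-b1-2.mi.infn.it  gridftp-b1-2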

11 Conclusions
- Most issues were found in the GPFS and worker node setup.
- The GPFS cluster architecture allows better organization of pools of similar machines, with central steering.
- GPFS performance can easily be scaled up in a SAN: head nodes can be added as needed.
- StoRM scalability: it can manage as many front-end and gridftp servers as needed.

