StoRM + Lustre Proposal
YAN Tian, on behalf of the Distributed Computing Group
2014.12.10
INTRODUCTION TO STORM
1. Architecture
2. Security (X509)
3. User Access Management
4. Server Scalability
StoRM Architecture Overview
Simple architecture:
– the frontend (FE) handles authorization and SRM requests
– the database (DB) stores asynchronous SRM request info
– the backend (BE) executes synchronous/asynchronous requests and binds to the underlying file system
StoRM acts as a frontend to the storage at a site
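As an illustration (not from the slides), the three components map to separate daemons on a StoRM host; the service names below follow the standard StoRM packaging and may differ per release:

    service storm-frontend-server status   # FE: accepts SRM requests (default port 8444)
    service storm-backend-server status    # BE: executes requests against the file system
    service mysqld status                  # DB: queues asynchronous SRM requests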
StoRM Security
StoRM relies on user credentials for authentication and authorization.
StoRM supports VOMS extensions and can use them to define access policies (complete VOMS awareness)
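For example, a minimal sketch with the standard VOMS client tools (the VO name bes is taken from the multi-VO test later in this deck): a user obtains a VOMS proxy before contacting StoRM:

    voms-proxy-init --voms bes   # create a proxy carrying a VOMS extension for the bes VO
    voms-proxy-info --all        # inspect the attributes StoRM authorizes against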
User Access Management
There are several steps StoRM takes to manage access to a file (see the sketch below):
1. the user makes a request with his proxy
2. StoRM checks whether the user can perform the requested operation on the required resource
3. StoRM asks the LCMAPS service for the user mapping
4. StoRM enforces a real ACL on the requested files and directories
5. jobs running on behalf of the user can then access the data directly
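A hypothetical illustration of step 4: once LCMAPS has mapped the grid user to a local account (besdirac in the tests later in this deck), StoRM grants a POSIX ACL; the file path here is invented:

    setfacl -m u:besdirac:rw /cefs/pub/bes/file.dst   # grant the mapped account read/write
    getfacl /cefs/pub/bes/file.dst                    # verify the enforced ACL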
Scalability
[diagrams: single-host deployment vs. clustered deployment]
STORM + LUSTRE PERFORMANCE TEST
1. Test Bed
2. SE Transfer Out Test (besdirac's dir., read)
3. Job Write to Lustre Test (besdirac's dir., write)
4. SE Transfer In Test (besdirac's dir., write)
5. DST Data Transfer between SEs (other users' dir., read; to be taken)
6. Multi-VO Support Test
Test Bed
– single server without data disk
– 10 Gbps network
– /cefs mounted with ACL enabled
– a subdirectory of /cefs (owned by account besdirac) is bound to the StoRM pub directory

Model: Dell PowerEdge R620
CPU: Xeon E5-2609 v2 (8 cores)
Memory: 64 GB
HDD: 300 GB SAS RAID-1
Network: 10 Gbps
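A sketch of the bind mount described above (the subdirectory and StoRM pub paths are hypothetical; only /cefs itself appears in the slides):

    mount --bind /cefs/besdirac /storm/pub   # expose one /cefs subdirectory as the StoRM pub dir
    getfacl /cefs/besdirac                   # confirm ACLs are usable on the mount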
SE Transfer Out Test
Test procedure:
1. prepare 2,000 files of 1 GB size located on /cefs
2. register the metadata into the DIRAC DFC
3. transfer the dataset to the remote SE at WHU

Test results:
1. registering into the DFC takes 70 seconds, i.e., 35 seconds per 1k files
2. average transfer speed is 80.9 MB/s, peak speed is 91.9 MB/s
3. one-time success rate is 100%
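A hedged sketch of steps 2-3 using the stock DIRAC data-management commands (the LFN and local path are illustrative; the SE names IHEP-STORM and WHU-USER appear in the plots that follow):

    dirac-dms-add-file /bes/user/y/yant/test/file-0001.dat \
        /cefs/test/file-0001.dat IHEP-STORM              # register the file in the DFC
    dirac-dms-replicate-lfn /bes/user/y/yant/test/file-0001.dat WHU-USER   # transfer out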
[plot: IHEP-STORM to WHU-USER transfer; average 80.9 MB/s, peak 91.9 MB/s; 2 TB of data transferred in 7 hours]
[plot: 2,000 files of 1 GB size; 100% success]
Job Write to Lustre Test
Facts about the test jobs:
– 200M total events, Bhabha sim. + rec.
– split by run, 20k max events/job
– 10,929 total jobs submitted
– 10,282 jobs done (94.1%)
– reasons jobs failed:
  353 stalled (USTC unstable power supply)
  275 overloaded (UMN node error)
  6 application failures
  13 network failures
– 1.4 TB of data generated and uploaded to StoRM+Lustre (IHEP-STORM)

Test results:
1. no job failed because of an error uploading output data
2. 1.4 TB of output data written to the test SE with a high success rate
3. output is immediately visible on Lustre
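As a hedged illustration of how such jobs would steer their output to the test SE (standard DIRAC JDL fields; the executable and file names are invented):

    cat > job.jdl <<'EOF'
    Executable = "bhabha_simrec.sh";
    OutputData = { "bhabha_run001.dst" };
    OutputSE   = "IHEP-STORM";
    EOF
    dirac-wms-job-submit job.jdl   # output is uploaded to IHEP-STORM on completion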
[plot: >10k jobs completed over 3 days]
[plot: 94.2% success rate; no job failed uploading output data]
[plot: ~1.4 TB of output data written to StoRM+Lustre]
[plot: data uploaded with good quality]
Output data can be seen on Lustre
[screenshot: data written through StoRM is immediately visible on Lustre]
SE Transfer In Test
Facts:
– transfer from the UMN SE to /cefs/tmp_storage/yant/transfer/DsHi
– 2.3 TB MC sample (dst, rtraw, logs, scripts)
– 16,011 files
– registered into the DFC in 12m50s (48s per 1k files)
– speed: 20-30 MB/s
– quality: > 99%
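The inbound direction uses the same DIRAC machinery; a minimal sketch, assuming the files are already registered with replicas at the UMN SE (the LFN is illustrative):

    dirac-dms-replicate-lfn /bes/mc/DsHi/sample-0001.rtraw IHEP-STORM   # pull a replica to IHEP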
Multi-VO Test
Currently supported VOs: bes, cepc, juno
– each VO's users can read/write their own root directory
– users from one VO cannot access another VO's files

A test was performed (see the sketch below):
1. initialize a proxy as a cepc VO user
2. check whether the bes VO's directory is accessible
3. check whether the cepc VO's directory is accessible
4. srmcp test (read/write)

Test results:
1. the cepc user cannot visit the bes VO's directory
2. the cepc user can read/write its own VO's directory
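A hedged reconstruction of the test with the standard SRM client tools (the host name and SURL paths are illustrative; only the VO names come from the slides):

    voms-proxy-init --voms cepc                     # act as a cepc VO user
    srmls srm://storm.ihep.ac.cn:8444/bes/          # expect: authorization failure
    srmls srm://storm.ihep.ac.cn:8444/cepc/         # expect: listing succeeds
    srmcp file:///tmp/test.dat srm://storm.ihep.ac.cn:8444/cepc/test.dat   # write test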
[screenshots: registered as a cepc user; access to the BES VO's storage area fails; read/write to the CEPC VO's storage area succeeds]
SUMMARY AND DISCUSSION
Test Summary
– with ACL enabled on /cefs, read/write in besdirac's directory is OK
– reading other users' data needs more debugging and testing
– read speed (80 MB/s) is acceptable
– write speed (20-30 MB/s) needs more testing
– multi-VO support is working
Comparison of StoRM and dCache
– the StoRM solution is easier to install and maintain; no extra development is required
– the StoRM solution could be more efficient: no need to register Lustre metadata in advance and no data movement
– StoRM is a promising solution, and we will do more tests before making a final decision
Lustre Data Security
– the StoRM SE server acts like an lxslc5 login node; the Lustre file systems are mounted under /
– mount --bind is used to remount a subdirectory of Lustre onto the StoRM pub directory
– only this subdirectory is visible to grid users (via low-level SRM commands)
– currently, all grid users in StoRM are mapped to the AFS account 'besdirac', and r/w on Lustre is executed as user 'besdirac'; so only besdirac's directory can be modified, and other users' data on Lustre is safe

In the production scenario:
– input/output data of DIRAC jobs will be located in one Lustre user's directory (i.e., besdirac's)
– inside besdirac's directory, we create subdirectories for each grid user
– when we need to transfer DSTs from IHEP to a remote site, the DST directory is mounted temporarily and read-only
– when transferring DSTs from a remote site back to IHEP, data will be written into besdirac's directory
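A sketch of the bind-mount scheme above (all paths are hypothetical; mount --bind and the read-only remount are standard Linux):

    mount --bind /cefs/besdirac /storm/pub        # expose only besdirac's subtree to the grid
    mount --bind /besfs/dst /storm/pub/dst        # temporary export of a DST directory
    mount -o remount,ro,bind /storm/pub/dst       # make the temporary bind read-only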
Production Solution 1
– enable ACL and user_xattr on the production Lustre file systems (/besfs, /besfs2, /bes3fs, /junofs, etc.)
– create a directory for user besdirac in each Lustre with several or dozens of TB of quota (depending on physics users' requirements)
– disadvantage: the production Lustres are busy and cannot be shut down to enable ACLs
– a solution: this can be performed during maintenance time
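A hedged sketch of the per-file-system setup (the MGS address and quota size are invented; acl and user_xattr are standard Lustre mount options, though where they must be set depends on the Lustre version):

    mount -t lustre -o acl,user_xattr mgs@tcp:/besfs /besfs   # remount with ACL support
    mkdir -p /besfs/besdirac
    lfs setquota -u besdirac -b 0 -B 20t -i 0 -I 0 /besfs     # example: 20 TB hard limit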
Production Solution 2
– prepare a separate Lustre, e.g., /diracfs
– or convert the current IHEPD-USER's 88 TB disk pool (even the 126 TB data disk) to Lustre
– advantage: the production Lustres are unaffected
– disadvantages:
  – abandons the StoRM + production-Lustre solution
  – hard to enlarge /diracfs to the PB level