Status of BESIII Distributed Computing
BESIII Collaboration Meeting, IHEP, 4 June 2014
Outline
- DMS status
- WMS status
- Site status
- Site monitoring
- Release of BESDIRAC v0r8
- Summary
DMS Status
- Motivation
- Data transfers over SEs
- Deployment of StoRM SE
- Combination of dCache SE and Lustre
Motivation
- Distributed storage solution:
  - improves the performance of reconstruction jobs in the grid environment by changing the computing model from remote central storage to distributed local storage
  - allows easy connection to local analysis jobs
  - supports distributing raw/DST data from IHEP to collaboration members
[Diagram: central storage solution (a single IHEP SE serving all site CEs over the WAN, suffering network jams and high SE load; jobs download randomtrg data and upload output DSTs) vs. distributed storage solution (each site has its own SE; jobs read randomtrg data and write output DSTs locally; SEs replicate/transfer data among themselves)]
SE Data Transfer: Statistics
- 24.5 TB of XYZ DST data (IHEP to USTC @ 3.20 TB/day)
- 4.4 TB of randomtrg data (IHEP to USTC, JINR, WHU, UMN @ 1.95 TB/day)
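The quoted rates can be cross-checked with simple arithmetic; the sketch below (plain Python, decimal units) converts TB/day into MB/s and estimates how long the XYZ DST transfer took.

```python
# Rough throughput arithmetic for the transfer figures above (decimal units:
# 1 TB = 1e6 MB). Illustrative only; actual accounting may use binary units.

def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Convert a transfer rate from TB/day to MB/s."""
    return tb_per_day * 1e6 / 86400  # 86400 seconds per day

dst_rate_mb_s = tb_per_day_to_mb_per_s(3.20)  # XYZ DST rate: roughly 37 MB/s
transfer_days = 24.5 / 3.20                   # roughly 7.7 days for 24.5 TB
```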
SE Data Transfer: Speed & Quality
[Plots: transfer speed and quality for USTC, JINR & WHU, and UMN]
SE Data Transfer: Web UI
- Twiki: http://docbes3.ihep.ac.cn/~offlinesoftware/index.php/BESDIRAC_Data_Transfer_Guide
[Screenshots: operations button, request status, submitting a request, monitoring jobs]
Deployment of StoRM SE
- Successful case: WHU site, 39 TB StoRM SE
  - RPM packaged by Sergey Belov at JINR and tested at IHEP
  - easy to install and configure (one day of work)
  - easy to maintain: good stability since its deployment on April 2, 2014
  - WebDAV support for HTTPS access
- Hardware of the WHU SE:
  - 12-core Xeon E5-2440 @ 2.40 GHz, 8 GB memory
  - 15 x 3 TB SATA disks in RAID-6
  - two network interfaces, configured with WAN and LAN IPs
- Hardware information of current SEs: http://docbes3.ihep.ac.cn/~offlinesoftware/index.php/Site_Resource_and_Contact
- Detailed StoRM SE installation guide: http://docbes3.ihep.ac.cn/~offlinesoftware/index.php/Setting_up_a_local_SE
Combination of dCache and Lustre
- Features:
  - directories in Lustre can be mounted on the dCache SE
  - read-only (ro) and read-write (rw) modes supported
- Benefits:
  - enlarges the capacity of the SE
  - no need to transfer data between the SE and Lustre
  - unified interface for users
- To do:
  - the metadata sub-system on the pool nodes is done
  - the interface program between the pool nodes and Lustre is under development
  - permission control needs careful consideration
  - will enter the test phase in three months and be working at the end of this year
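As a minimal sketch of the ro/rw mount idea (not the actual dCache or Lustre interface code; all names here are hypothetical), each Lustre directory exported into the SE namespace carries a mode flag that gates writes:

```python
# Hypothetical model of Lustre directories mounted into a dCache SE namespace.
# Each mount exposes a Lustre path under an SE path in "ro" or "rw" mode.

class MountTable:
    def __init__(self):
        self.mounts = []  # list of (se_prefix, lustre_path, mode)

    def add(self, se_prefix: str, lustre_path: str, mode: str = "ro"):
        if mode not in ("ro", "rw"):
            raise ValueError("mode must be 'ro' or 'rw'")
        self.mounts.append((se_prefix, lustre_path, mode))

    def can_write(self, se_path: str) -> bool:
        """Writes are allowed only inside a mount exported read-write."""
        for se_prefix, _, mode in self.mounts:
            if se_path.startswith(se_prefix):
                return mode == "rw"
        return False  # not under any mount

table = MountTable()
table.add("/bes/dst", "/lustre/bes/dst", mode="ro")    # official data: read-only
table.add("/bes/user", "/lustre/bes/user", mode="rw")  # user area: read-write
```

The unified interface the slide mentions would then resolve every SE path through such a table, so users never need to know whether a file physically lives on the SE disks or in Lustre.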
dCache and Lustre: Current System
- 1 server: DELL R720, 2 x E5-2609 CPUs, 32 GB memory
- 2 disk arrays: 24 x 3 TB, RAID-6
- capacity: 126 TB
- network: 10 Gbps Ethernet
dCache and Lustre: Future System
- Head node (1 server): DELL/IBM/HP 1U, 2 CPUs, 32 GB memory, 1 Gbps Ethernet
- Disk nodes (2 servers): DELL/IBM/HP 1U, 2 CPUs, 64 GB memory, 10 Gbps Ethernet, 8 Gb HBA
- Disk arrays (3 servers): 24 x 3 TB (2 servers), 24 x 4 TB (1 server)
- capacity: 126 TB + 80 TB
dCache and Lustre: Development Status
- Realized: direct read/write, metadata info
- Under development: storage info
WMS Status
- Upgrade of GangaBOSS
- Tests of simulation + reconstruction
- Upgrade of OS and CVMFS at sites
GangaBOSS Release 1.0.6
New features:
- generates a dataset of output data and informs the user of the dataset name and LFN path
- registers metadata for output data
- supports simulation + reconstruction
- adds more minor statuses to the logger info
- adds more error codes for debugging
- automatically uploads job logs to the SE and registers them in the DFC
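For illustration, deriving the output dataset name and LFN path from job parameters could look like the sketch below; the naming scheme (user-based LFN directory, dataset name built from the job name) is hypothetical, not the actual GangaBOSS/BESDIRAC convention.

```python
# Hypothetical sketch of deriving an output dataset name and its LFN path
# from job parameters. Not the real GangaBOSS/BESDIRAC naming scheme.

def output_dataset(user: str, boss_version: str, job_name: str):
    dataset = f"{user}_{job_name}"  # dataset name reported to the user
    lfn_dir = f"/bes/user/{user[0]}/{user}/{boss_version}/{job_name}"  # LFN path in the DFC
    return dataset, lfn_dir

name, path = output_dataset("alice", "6.6.4", "jpsi_simreco_001")
```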
Tests of Simulation + Reconstruction
- sites without an SE can download randomtrg data from other SEs
- the success rate is greater than 95% when the SE works fine
  - GUCAS's network is too poor to download from IHEP
  - UMN has 700+ cores, so the load on its SE is higher than at other sites
- Other optimization schemes:
  - Fabio's cloud storage?
  - mount the SE on the WNs at each site? benefits: reading transfers less data than downloading; supports splitting by event
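The "read less than download" benefit can be illustrated with made-up numbers: a job that reads only its assigned events moves a fraction of each randomtrg file instead of fetching the whole thing.

```python
# Made-up numbers illustrating why event-level reads move less data than
# whole-file downloads (values are examples, not BESIII measurements).

file_size_mb = 2000       # size of one randomtrg file
events_per_file = 50000   # events stored in that file
events_needed = 5000      # events this job actually processes

download_mb = file_size_mb                                # whole-file download
read_mb = file_size_mb * events_needed / events_per_file  # event-level read
savings = 1 - read_mb / download_mb                       # fraction of traffic saved
```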
Upgrade of OS and CVMFS at Sites
- CVMFS has to be upgraded from 2.0 to 2.1:
  - version 2.0 is no longer supported by CERN
  - the WNs with the old version at GUCAS and USTC have been upgraded recently; PKU is upgrading now
- The OS has to be upgraded from SL 5 to SL 6:
  - BOSS will be upgraded to run on SL 6 at the end of this year
  - we hope sites will update their OS
Site Status

 #  Site Name           Type     CPU Cores    SE       Capacity  OS       Site Status
 1  BES.IHEP-PBS.cn     Cluster  96           dCache   126 TB    SL 5.5   Running
 2  BES.UCAS.cn                  152
 3  BES.USTC.cn                  228 ~ 896    dCache   24 TB     SL 5.7
 4  BES.PKU.cn                   100                             SL 5.10
 5  BES.JINR.ru         gLite    40 ~ 200     dCache   7.5 TB    SL 6.5
 6  BES.UMN.us                   768          BeStMan  50 TB     SL 5.9
 7  BES.WHU.cn                   200 ~ 400    StoRM    39 TB     SL 6.4
 8  BES.INFN-Torino.it           200
 9  BES.SDU.cn                   ~100                                     Preparing
10  BES.BUAA.cn                  ~256                            SL 5.8
    Total                        1784 ~ 3368           246.5 TB
Site Monitoring
- Availability tests:
  - CE / Host (worker nodes): WMS_send_test, BOSS_work_test, CPU_limit_test
  - Network: SE latency
- Reliability statistics:
  - Host (worker nodes): job success rate statistics
  - Network: ping to the CE login node, SE latency, data upload and replication
- More tests will be added: SE transfer speed, SE usage information, dataset status, pilot monitoring
- Author: Igor Pelevanyuk @ JINR; details in Alexey's report
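A reliability statistic such as the job success rate above reduces to a simple ratio over monitoring records; a minimal sketch (the record format here is hypothetical, not the actual monitoring schema):

```python
# Minimal sketch of a reliability statistic: job success rate per site.
# The record format is hypothetical, not the actual monitoring schema.

def success_rate(job_records):
    """job_records: list of dicts with a 'status' field ('Done' = success)."""
    if not job_records:
        return None  # no data for this site in the period
    done = sum(1 for r in job_records if r["status"] == "Done")
    return done / len(job_records)

records = [{"status": "Done"}] * 19 + [{"status": "Failed"}]
rate = success_rate(records)  # 19 of 20 jobs succeeded: 0.95
```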
Release of BESDIRAC v0r8
- DIRAC version: v6r10pre17
- besdirac-* toolkits included:
  - dataset toolkit tested and added
  - transfer toolkit refined
  - download tool works with rsync
- GangaBOSS upgraded:
  - generates a dataset of output data
  - registers metadata for output data
  - supports simulation + reconstruction
- The upgrade is transparent to users.
Usage for Private Users
- General procedure:
  - apply for a certificate (about one week)
  - submit jobs with Ganga (one command)
  - monitor jobs in the web UI
  - download data from the SE to Lustre (one command)
- Features:
  - supports simulation + reconstruction
  - resources: more than 1500 CPU cores
  - easy to monitor job status
- Hypernews: http://hnbes3.ihep.ac.cn/HyperNews/get/distributed.html
- Twiki:
  - http://docbes3.ihep.ac.cn/~offlinesoftware/index.php/Distributed_Computing
  - http://docbes3.ihep.ac.cn/~offlinesoftware/index.php/BESDIRAC_User_Tutorial
- Contact persons: Xiaomei ZHANG (zhangxm@ihep.ac.cn), Xianghu ZHAO (zhaoxh@ihep.ac.cn), Tian YAN (yant@ihep.ac.cn)
Thanks
Thanks for:
- the resource contributions of the sites
- the efforts of site administrators and contact persons
- the efforts of the developers at JINR
- the advice of the DIRAC experts
Thank you for your attention! Questions and answers.