Winnie Lacesso Bristol Storage June 2009
2 DPM LCG Storage
- lcgse01 = DPM, built in 2005 by Yves Coppens & Pete Gronbech
- SuperMicro X5DPAGG (Streamline Computing), 2 Intel Xeon 2.8GHz, 2GB RAM, 32-bit SL3.x, Adaptec 39320A Ultra320 dual-channel
- May 2006: Transtec T6100 = Infortrend/EonStor A08U-G2421, 8x400GB RAID5 = 2.2TB usable; 673GB = any-VO, 1.527TB = CMS
- May 2007: Transtec PV610S = Infortrend A16U-G x750GB as 2xRAID6 = 8.4TB usable, all CMS-only
- All ext3 filesystems; both RAID arrays nearly full
- Feb-May 2008: intermittent SCSI problems with 16-bay
- June 2008: rebuild lcgse01 as SL4 32-bit
- July-Aug: SCSI problems increase, always with 16-bay, causing errors in dpm filesystems :(
- Aug: replace Adaptec SCSI controller with LSI: no help. Add +2GB RAM.
- Sept/Oct/Nov: trying to debug; RAID array rejected 5 disks in 3 months. Vendor finally agrees to replace hardware; it arrives in Dec.
- New hardware installed in January; working excellently since then.
3 HPC-LCG Storage
- HPC has used DPM = lcgse01 so far
- HPC uses gpfs, so Jon Wakelin looked into StoRM, which can (supposedly) leverage gpfs for bulk access (instead of going through the server = bottleneck)
- lcgse02 = Viglen 1U, X7DBU mobo, 2 x Intel E5405 = 8 x 2.0GHz, 16GB RAM, 2 x 250GB RAID1 disks, dual PSU
- gridftp01 = identical but only 8GB RAM
- SL bit, gpfs (currently) - kernel versions are constrained by gpfs (currently )
- StoRM FrontEnd + Backend on one machine (common config)
- StoRM supports gsiftp, rfio & file protocol
- Passing all OPS, LHCb, CMS SAM tests since forever :)
4 GPFS & HPC storage
- Storageless Physics gpfs cluster = {lcgse02,gridftp01} plus 3 test nodes
- Storage gpfs cluster = 4 x DDN I/O servers (filers) & 44TB usable
- Jon got them multiclustered over public network so StoRM can write
- But after Jon left we found out rfio does not work: must be a config problem with ACLs within gpfs, but we can't find it yet
- HPC WN gpfs cluster needs to be multiclustered with Storage gpfs cluster, so LCG jobs on WN can ask lcgse02 for file:/ location of their data and access it over gpfs
- HPC maintenance outage in May: multiclustering failed with openssl errors; no help from IBM gpfs experts
- New Storage Admin Bob Cregan will debug it!
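The point of the multiclustering is that a job on a WN could take the file:/ location and read over gpfs directly, falling back to rfio or gsiftp only when the path is not mounted. A minimal sketch of that choice, assuming the job already holds a list of transfer URLs: the function name `pick_turl`, the preference order, and the example hostnames and paths are all hypothetical and not taken from StoRM or from the Bristol configuration.

```python
import os
from urllib.parse import urlparse

# Protocols the slides list for StoRM, in an assumed preference order:
# direct POSIX access over gpfs first, then rfio, then gsiftp.
PREFERRED = ["file", "rfio", "gsiftp"]


def pick_turl(turls, is_mounted=os.path.exists):
    """Return the most preferred usable transfer URL, or None.

    A file: URL only counts as usable if its path is visible on this
    node, i.e. the gpfs filesystem is actually mounted here.
    """
    by_scheme = {urlparse(t).scheme: t for t in turls}
    for scheme in PREFERRED:
        turl = by_scheme.get(scheme)
        if turl is None:
            continue
        if scheme == "file" and not is_mounted(urlparse(turl).path):
            continue  # gpfs not mounted on this node: try the next protocol
        return turl
    return None


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    candidates = [
        "gsiftp://gridftp.example.org/gpfs/store/f.root",
        "file:///gpfs/store/f.root",
    ]
    print(pick_turl(candidates))
```

On a node without the gpfs mount the sketch falls through to gsiftp, which corresponds to the "going through the server" path that direct file access is meant to avoid.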
5 StoRM SE, GPFS
- New hardware for HPC CE & StoRM SE, also gridftp server & new MON (syslog, Nagios, etc): X7DBU Xeon E5405 with 2GB RAM/core
- HPC CE working well except for gpfs timeouts: patchy OPS SAM failures
- Problems with StoRM: gpfs multiclustering not yet working, rfio permission problems (ACLs??); thought Jon left it in working order but guess not...
- New Storage Admin (Bob Cregan) will help get gpfs multiclustering working
- Good performance on new hardware!