SAN DIEGO SUPERCOMPUTER CENTER
Using Gordon to Accelerate LHC Science
Rick Wagner, San Diego Supercomputer Center
Brian Bockelman, University of Nebraska-Lincoln
XSEDE 13, July 22-25, 2013, San Diego, CA
Coauthors
Mahidhar Tatineni
Eva Hocks
Kenneth Yoshimoto
Scott Sakai
Michael L. Norman
Igor Sfiligoi (UCSD)
Matevz Tadel (UCSD)
James Letts (UCSD)
Frank Würthwein (UCSD)
Lothar A. Bauerdick (FNAL)
When Grids Collide
Overview
2012 LHC data collection rates were higher than first planned (1000 Hz vs. 150 Hz)
Additional data was “parked” to be reduced during the 2-year shutdown
Delays the science from data at the end
Overview
Frank Würthwein (UCSD, CMS Tier II lead) approaches Mike Norman (Director of SDSC) regarding the analysis delay
A rough plan emerges:
  Ship data at the tail of the analysis chain to SDSC
  Attach Gordon to the CMS workflow
  Ship results back to FNAL
From the CMS perspective, Gordon becomes a compute resource
From the SDSC perspective, CMS jobs run like a gateway
Gordon Overview
3D torus, dual-rail QDR
64 dual-socket (2S) Westmere I/O nodes
  12 cores, 48 GB/node
  4 LSI controllers
  16 SSDs
  Dual 10GbE
  SuperMicro mobo
  PCI Gen2
300 GB Intel 710 eMLC SSDs
  300 TB aggregate
1,024 dual-socket (2S) Xeon E5 (Sandy Bridge) nodes
  16 cores, 64 GB/node
  Intel Jefferson Pass mobo
  PCI Gen3
Large-memory vSMP supernodes
  2 TB DRAM
  10 TB flash
“Data Oasis” Lustre PFS
  100 GB/sec, 4 PB
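The aggregate figures on the Gordon slide can be cross-checked from the per-node numbers. The sketch below is only a sanity calculation, not an official specification: it assumes the "16 SSDs" figure is per I/O node and uses decimal units (1 TB = 1000 GB). The function and constant names are made up for illustration.

```python
# Per-node figures as read from the slide (grouping is an assumption).
COMPUTE_NODES = 1024
CORES_PER_COMPUTE_NODE = 16
DRAM_GB_PER_COMPUTE_NODE = 64

IO_NODES = 64
SSDS_PER_IO_NODE = 16      # assumed to be a per-I/O-node count
SSD_CAPACITY_GB = 300


def gordon_totals():
    """Return machine-wide totals derived from the per-node figures."""
    return {
        "cores": COMPUTE_NODES * CORES_PER_COMPUTE_NODE,
        "dram_tb": COMPUTE_NODES * DRAM_GB_PER_COMPUTE_NODE / 1000,
        "ssd_count": IO_NODES * SSDS_PER_IO_NODE,
        "flash_tb": IO_NODES * SSDS_PER_IO_NODE * SSD_CAPACITY_GB / 1000,
    }


totals = gordon_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

Under these assumptions the flash total comes to 307.2 TB, which is consistent with the "300 TB aggregate" quoted on the slide.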
CMS Components
CMSSW: base software components, NFS-exported from I/O node
OSG worker node client: CA certs, CRLs
Squid proxy: caches calibration data needed for each job, running on I/O node
glideinWMS: worker node manager, pulls down CMS jobs
BOSCO: GSI-SSH capable batch job submission tool
PhEDEx: data transfer management
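The "pulls down CMS jobs" behavior in the component list follows a pilot (pull) model: a process started on the worker node asks a central queue for work instead of having work pushed to it. The toy sketch below illustrates only that idea with the Python standard library. It does not use or imitate the glideinWMS or HTCondor APIs, and `run_pilot`, the node name, and the job payloads are hypothetical.

```python
import queue


def run_pilot(job_queue, worker_name):
    """Toy pilot: pull jobs from the shared queue until it is empty."""
    completed = []
    while True:
        try:
            job = job_queue.get_nowait()   # ask the queue for the next job
        except queue.Empty:
            break                          # no work left, pilot exits
        completed.append((worker_name, job()))
    return completed


# Hypothetical central queue holding three placeholder jobs.
jobs = queue.Queue()
for block in range(3):
    jobs.put(lambda n=block: f"processed block {n}")

results = run_pilot(jobs, "gordon-node-0")
for worker, outcome in results:
    print(worker, outcome)
```

The design point is that the resource side only needs to start pilots through its normal batch system; which user job runs where is decided later, when a pilot asks for work.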
Results
Work completed in February to March
million collision events
125 TB in, ~150 TB out
~2 million SUs
Good experience regarding OSG-XSEDE compatibility
Thoughts & Conclusions
OSG & XSEDE technologies are very similar
  GridFTP
  GSI authentication
  Batch systems, etc.
Staff at both ends speak the same language
Some things would make a repeat easier:
  CVMFS (FUSE-based file system for CMS tools)
  Common runtime profile for OSG & XSEDE
  Common SU and data accounting