US ATLAS Western Tier 2 Status Report
Wei Yang
Nov. 30, 2007
US ATLAS Tier 2 and Tier 3 Workshop at SLAC
CPU resources (site: PROD_SLAC), by cluster
- noma: Pentium III, 1.4 GHz
- don: Dual Core AMD Opteron
- cob: Dual Core AMD Opteron
- yili: Dual Core AMD Opteron
- tori: Intel Xeon, 2.66 GHz
- boer: Dual Core AMD Opteron
- fell: Intel Xeon
Totals: 1,012 cores; 1,746,812 SPECint2000; 648,520 SPECfp2000
ATLAS has a ~32% fair share of the pool
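The fair-share figure translates into a rough ATLAS capacity; a quick sanity check using the totals from the slide:

```python
# Rough capacity implied by the slide: total pool ~1,746,812 SPECint2000,
# of which ATLAS holds a ~32% fair share.
TOTAL_SPECINT2000 = 1_746_812
TOTAL_CORES = 1_012
ATLAS_SHARE = 0.32

atlas_specint = TOTAL_SPECINT2000 * ATLAS_SHARE
atlas_cores = TOTAL_CORES * ATLAS_SHARE
print(f"ATLAS share: ~{atlas_specint:,.0f} SPECint2000 (~{atlas_cores:.0f} cores)")
```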
CPU resources
- Current (FY07 funds): 312 cores of AMD Opteron 2218, 2 GB/core
- Procurement with FY08 program funds: 320 cores of Intel Xeon X5355 (40 units), 2 GB/core; 38% of program funds
Storage
- Current: 72 TB raw / 51 TB usable (3 Thumpers)
- FY08 procurement (58% of program funds): more Thumpers, > 200 TB usable based on old pricing; price under negotiation
- Xrootd on Solaris / ZFS
- GridFTP for Xrootd
- Composite Name Space and XrootdFS
- Run SRM on top of XrootdFS
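The raw-to-usable gap above is worth quantifying; a quick check of the overhead implied by the slide's numbers (the difference comes from RAID-Z parity, spares, and filesystem reserve on the ZFS Thumpers):

```python
# Usable fraction implied by the slide: 72 TB raw vs 51 TB usable
# across 3 Thumpers. The ~29% overhead is ZFS RAID-Z parity plus
# hot spares and filesystem reserve.
raw_tb, usable_tb = 72, 51
fraction = usable_tb / raw_tb
print(f"usable fraction: {fraction:.0%}")
```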
Procurement for other experiments
- 224 cores of AMD Opteron 2218
- 104 cores of Intel Xeon X5355
- 2 GB/core; in general queues, accessible by ATLAS
- Sun Black Box #2
- Intel Xeon X5355 CPU nodes ordered from Dell
Software infrastructure
- RHEL 3, 32-bit, on old batch nodes
- RHEL 4, 64-bit, on newer batch nodes
- LSF 6.1 with fair share
- Open question: how to implement an analysis queue in a fair-share environment?
- Monitoring: Nagios, Ganglia
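One common approach to the analysis-queue question is a dedicated LSF queue with its own fair-share policy. A minimal lsb.queues sketch, with hypothetical queue, group, and share values (not SLAC's actual configuration):

```
Begin Queue
QUEUE_NAME   = atlas-analysis
PRIORITY     = 60
FAIRSHARE    = USER_SHARES[[default, 1]]
USERS        = atlas_users
DESCRIPTION  = Short ATLAS analysis jobs, fair-shared among users
End Queue
```

Giving the analysis queue a higher PRIORITY than the production queue lets short analysis jobs start promptly while FAIRSHARE keeps any one user from monopolizing it.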
Grid middleware and DDM
- OSG 0.8 gatekeeper; needed customization of the Globus LSF jobmanager module
- GridFTP, plus GridFTP for Xrootd
- GUMS 1.2, one-to-one mapping of DNs to local accounts
- RSV and Gratia; had to patch the Gratia code to make it report correctly
- DQ2 0.4, will upgrade to 0.4.1
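The one-to-one GUMS mapping ultimately yields standard grid-mapfile entries pairing each certificate DN with a dedicated local account. An illustrative entry with a hypothetical DN and username:

```
# grid-mapfile: one DN maps to one dedicated local account
"/DC=org/DC=doegrids/OU=People/CN=Jane Physicist 12345" janep
```

One-to-one mapping (as opposed to mapping all of a VO to a shared pool account) makes per-user accounting and file ownership on the batch system straightforward.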
Networking
- Tuning significantly improved iperf performance measurements: increased TCP buffers, enabled hardware offload features for transmission
- Still cannot achieve stable measurements, probably due to competing traffic
- WAN is still 1 Gb/s / 2 Gb/s; 10 Gb/s upgrade pushed back to January
- Plan to evaluate TeraPaths and QoS
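The two tuning steps above are typically applied via sysctl (TCP buffers) and ethtool (hardware offload). A sketch with illustrative values for a high-latency WAN path, not the settings actually deployed at SLAC:

```shell
# Raise maximum TCP buffer sizes (illustrative 16 MB ceiling)
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

# Enable hardware offload for transmission (TCP segmentation / generic
# segmentation offload) on the WAN-facing NIC, assumed here to be eth0
ethtool -K eth0 tso on gso on
```

Large buffers matter because achievable TCP throughput is bounded by window size divided by round-trip time, so a cross-country path needs megabytes of window to fill even a 1 Gb/s link.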
Networking, cont'd
- Disk-to-disk: 2-3 MB/s with 1 stream, 15 MB/s with 12 streams; goal is 300 MB/s
- Candidate approaches: SRM on XrootdFS plus multiple GridFTP-for-Xrootd servers, or one GridFTP host with multiple NICs (channel bonding)
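A back-of-envelope calculation from the measured numbers shows why a single transfer host is not enough, motivating the multiple-GridFTP and channel-bonding options:

```python
# 12 streams gave ~15 MB/s, i.e. ~1.25 MB/s per stream. At that rate,
# reaching the 300 MB/s goal would take ~240 streams -- far more than
# one GridFTP host can usefully drive, hence multiple servers or
# multiple bonded NICs.
import math

per_stream = 15 / 12      # MB/s per stream observed
goal = 300                # MB/s target
streams_needed = math.ceil(goal / per_stream)
print(f"streams needed at current per-stream rate: {streams_needed}")
```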
ATLAS production
- Production at SLAC runs smoothly; the gatekeeper can handle the jobs easily
- Utilization is low
- Regional physicists are utilizing WT2 spare cycles
- Much less (useless) debugging at the Tier 2 level after moving to PandaMover
September was both a good month and a bad month for WT2