Randy Melen, April 14

Stanford Linear Accelerator Center Site Report
April 1999
Randy Melen
SLAC Computing Services/Systems HPC Team Leader

Past 12 months...
- Busy!
- Target of May 9 for BaBar detector to begin
- Challenge: get systems assembled and tested in time, get C++ code working and sufficiently optimized, and handle 100 events/second for reconstruction and event recording
- Once BaBar data begins, it will be more difficult to make system changes or take service outages
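
For a rough sense of scale, the 100 events/second target can be converted into events per day. This is a minimal sketch, not a figure from the report: the function name and the `duty_cycle` parameter are illustrative, and continuous running is assumed by default.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(rate_per_second: float, duty_cycle: float = 1.0) -> float:
    """Events accumulated in one day at a sustained rate.

    duty_cycle is an assumed fraction of the day spent taking data
    (1.0 means continuous running); it is not a number from the report.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return rate_per_second * SECONDS_PER_DAY * duty_cycle


if __name__ == "__main__":
    print(events_per_day(100))  # 8640000.0
```

At the stated rate and with no downtime, that is about 8.6 million events per day passing through reconstruction and recording.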

New Hardware Developments
- Increased Solaris batch systems in compute farm (from 5 Sun Ultra 2300 systems to 18 systems)
- Upgraded Sun UE6000 to 4GB memory
- Acquired 4 Sun UE4500 systems, increased to 6 systems, for HPSS data movers, total of 4TB of disk
- Acquired Sun UE10000 (24 CPUs, 12GB memory, 1.5TB disk, 2 domains)
- 4 Sun E250 systems as tape movers
- 3 IBM F50 systems as data movers

New Hardware Developments (cont.)
- Added 220 Sun U5 systems (256MB, 9GB IDE disk, 333MHz UltraSPARC IIi with 2MB cache, $188/SI95)
- Expect to add ~200 more U5 systems in 2Q1999, probably more disk, perhaps UE10000 upgrade to 400MHz CPUs
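
The per-node U5 figures above can be rolled up into aggregate farm capacity. The sketch below is illustrative only: `NodeSpec` and `farm_totals` are hypothetical names, "~200 more" is treated as exactly 200, and the added nodes are assumed to match the current configuration, which the slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    memory_mb: int
    disk_gb: int


# Per-node figures quoted on the slide for the Sun U5 systems.
U5 = NodeSpec(memory_mb=256, disk_gb=9)


def farm_totals(spec: NodeSpec, nodes: int) -> dict:
    """Aggregate memory (GB, 1024 MB each) and disk (TB, 1000 GB each)."""
    return {
        "nodes": nodes,
        "memory_gb": spec.memory_mb * nodes / 1024,
        "disk_tb": spec.disk_gb * nodes / 1000,
    }


current = farm_totals(U5, 220)
# Assumption: the ~200 additional nodes have the same configuration.
planned = farm_totals(U5, 220 + 200)

if __name__ == "__main__":
    print(current)
    print(planned)
```

Under these assumptions the 220 nodes hold 55 GB of memory and roughly 2 TB of local disk in total, and the expanded farm would hold about twice that.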

Farm Management
- Upgraded farm master for LSF to IBM F50
- Working with Sun Auto Client software and cacheFS to centrally manage Sun U5 systems
- Actively doing Solaris performance tuning on UE6000 and UE10000
- Adding 2 Sun E250 systems as BaBar build systems; need to be able to build 1M lines of C++ code each night (twice?)

Mass Storage Hardware
- Upgraded 5 STK silos to PowderHorn robots
- Added a 6th STK silo and 12 STK Eagle drives; more Eagle drives will be needed
- Need to add BaBar data import/export tape device; considering STK 9740 with DLT 7000 and RedWood drives

Farm Network Technology
- Currently using 3 Cisco Catalyst 5500 switches (~1.2 Gbps backplanes), everything on Fast Ethernet, single collision domains
- Migrating to 3 Cisco Catalyst 6509 switches (~16 Gbps backplanes)
- Deploying Gb Ethernet on ~16 Solaris servers
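
One way to read the backplane figures is to ask how many Fast Ethernet ports each switch could feed at full rate at once. The sketch below is a simplification: it counts each port as 100 Mbps in one direction, ignores duplex and protocol overhead, and takes the approximate backplane numbers from the slide at face value. The function name is illustrative.

```python
FAST_ETHERNET_MBPS = 100


def line_rate_ports(backplane_mbps: int, port_mbps: int = FAST_ETHERNET_MBPS) -> int:
    """Number of ports that can run at full rate without exceeding the backplane."""
    if backplane_mbps < 0 or port_mbps <= 0:
        raise ValueError("backplane_mbps must be >= 0 and port_mbps must be > 0")
    return backplane_mbps // port_mbps


if __name__ == "__main__":
    print(line_rate_ports(1_200))   # ~1.2 Gbps backplane -> 12
    print(line_rate_ports(16_000))  # ~16 Gbps backplane  -> 160
```

By this crude measure, the move from ~1.2 Gbps to ~16 Gbps backplanes raises the number of simultaneously saturated Fast Ethernet ports per switch from about 12 to about 160.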

HPSS Phase 3 (Porting) Ongoing
- With assistance from Sun, began moving and testing the Solaris port to Solaris 2.6
- Lots of issues related to getting infrastructure pieces at correct version levels
- Began HPSS 4.1 datamover port to Solaris 2.6
- Sun and IBM signed agreement for IBM to port HPSS 4.1A; we expect to deploy ~4Q1999

HPSS Stage 4 (PRV0) Plans
- While Solaris port continues, use IBM F50 systems as datamovers
- Move development (porting and testing) to Solaris U250 build servers

Currently Supported Systems
- General Servers
  - generally Solaris > Solaris 2.6
  - AFS servers will become Sun U2300 systems for AFS 3.5 multithreading
  - AIX >
  - phasing out “core” NFS file server (AIX 3.2.5!) by moving binaries and home directories to AFS
- Farm Servers
  - AIX now frozen, not a porting platform for BaBar as of 7/1998
  - Solaris > 2.6 completed
- Desktop
  - still NT, though much more Linux than before

Intel Farm Prototype
- A prototype 17-node Intel compute farm acquired 4Q1998:
  - 2-way 256MB, 9GB disk, Dell 450MHz Pentium-II
  - partnership with Accelerator Research group and NERSC
  - strong interest in MPI and developing for Cray T3E production
  - decided on Linux from RedHat
  - modest success so far for scalability
  - expect to expand to 32 nodes 3Q1999
  - Issues that remain:
    - Commercial software support (e.g., Objectivity, AFS, LSF with AFS support)
    - Manageability of large numbers of systems
    - MPI cluster vs “task farm”
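
To illustrate the "task farm" side of the last issue, here is a minimal sketch of the pattern: independent work items are handed to a pool of workers that never communicate with each other. In an MPI cluster, by contrast, the cooperating processes would exchange messages while running. Everything here is hypothetical: `reconstruct` is a stand-in for per-event work, and a local thread pool stands in for farm nodes. It is not how the SLAC farm was scheduled.

```python
from concurrent.futures import ThreadPoolExecutor


def reconstruct(event_id):
    """Hypothetical stand-in for per-event work; needs no data from other events."""
    return event_id, event_id * event_id


def run_task_farm(event_ids, workers=4):
    """Farm out independent tasks to a worker pool and collect results by event id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order; tasks share nothing with each other.
        return dict(pool.map(reconstruct, event_ids))


if __name__ == "__main__":
    print(run_task_farm(range(5)))
```

Because the tasks are independent, a farm of this kind scales by adding workers, while a tightly coupled MPI job also depends on interconnect performance between nodes.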