Software Scalability Issues in Large Clusters
CHEP2003, San Diego, March 24-28, 2003
A. Chan, R. Hogue, C. Hollowell, O. Rind, T. Throwe, T. Wlodek
RHIC Computing Facility, Brookhaven National Laboratory
Background
- Rapid development of large clusters built with affordable commodity hardware
- Software scalability issues must be addressed to deploy and operate large clusters effectively
- Scalability is critical for the efficient operation of the CPU cluster in the Linux Farm at the RCF
The rapid growth of the Linux Farm
Hardware in the Linux Farm
Brand      CPU      RAM        Disk       Quantity
VA Linux   450 MHz  0.5-1 GB   9-120 GB   154
VA Linux   700 MHz  0.5 GB     9-36 GB    48
VA Linux   800 MHz  0.5-1 GB   … GB       168
IBM        1.0 GHz  0.5-1 GB   … GB       315
IBM        1.4 GHz  1 GB       … GB       160
IBM        2.4 GHz  1 GB       240 GB     252
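As a quick check on the farm's size, the quantities in the hardware table can be totalled per vendor. A minimal Python sketch; the tuple layout and function names are our own (hypothetical), and only the brand, CPU clock, and quantity columns are carried over:

```python
from collections import Counter

# Hypothetical data layout: (brand, CPU clock, quantity) rows copied from the
# hardware table. RAM and disk columns are omitted because they are not needed.
INVENTORY = [
    ("VA Linux", "450 MHz", 154),
    ("VA Linux", "700 MHz", 48),
    ("VA Linux", "800 MHz", 168),
    ("IBM", "1.0 GHz", 315),
    ("IBM", "1.4 GHz", 160),
    ("IBM", "2.4 GHz", 252),
]

def servers_by_brand(inventory):
    """Sum the server quantity for each brand, in first-seen order."""
    totals = Counter()
    for brand, _clock, quantity in inventory:
        totals[brand] += quantity
    return dict(totals)

def total_servers(inventory):
    """Total number of servers across all rows."""
    return sum(quantity for _brand, _clock, quantity in inventory)

if __name__ == "__main__":
    print(servers_by_brand(INVENTORY))  # {'VA Linux': 370, 'IBM': 727}
    print(total_servers(INVENTORY))     # 1097
```

This gives 370 VA Linux and 727 IBM servers, 1097 in total.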
Monitoring
- Mix of open-source, staff-designed and vendor-provided monitoring software
- Software redesigned for scalability in large clusters (push vs. pull method)
- Persistence and fault-tolerance features
- Near real-time information
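In a push model, each node sends its own reports to a central collector instead of the collector polling every node in turn, so the collector only handles incoming messages rather than sweeping hundreds of servers. A minimal sketch of the collector side, assuming a hypothetical `PushCollector` class and an arbitrary 60-second staleness timeout; the network transport is left out, and this is not the RCF or Ganglia implementation:

```python
import time

class PushCollector:
    """Central collector for a push-based monitoring model (hypothetical sketch).

    Nodes call report() whenever they have new data; the collector never polls.
    A node that has not reported for longer than `timeout` seconds is flagged
    as stale, which gives a simple form of fault detection.
    """

    def __init__(self, timeout=60.0):
        self.timeout = timeout
        self.latest = {}     # node name -> most recent metrics dict
        self.last_seen = {}  # node name -> timestamp of most recent report

    def report(self, node, metrics, timestamp=None):
        """Record a report pushed by `node` (defaults to the current time)."""
        if timestamp is None:
            timestamp = time.time()
        self.latest[node] = dict(metrics)
        self.last_seen[node] = timestamp

    def stale_nodes(self, now=None):
        """Return the sorted names of nodes that have not reported within `timeout`."""
        if now is None:
            now = time.time()
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout)

if __name__ == "__main__":
    collector = PushCollector(timeout=60.0)
    collector.report("node001", {"load": 1.2}, timestamp=1000.0)
    collector.report("node002", {"load": 0.4}, timestamp=1050.0)
    print(collector.stale_nodes(now=1100.0))  # ['node001']
```

Explicit timestamps are accepted so the staleness logic can be exercised without waiting on a real clock.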
Monitoring Models
Cluster Monitoring (Staff-designed)
Cluster Monitoring (Ganglia project)
Image Distribution in the Linux Farm
- NFS-based image distribution system until 2001: not scalable
- Switched to the web-based Red Hat Kickstart installer
- Fast and scalable (20 minutes per server, with hundreds of servers installed at a time)
- Highly configurable (multiple images, build options, etc.)
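Kickstart installs are driven by a plain-text configuration file, so supporting multiple images and build options largely comes down to generating one such file per image. A minimal sketch, assuming a hypothetical install-server URL, hypothetical image names and package lists, and a deliberately reduced set of Kickstart directives (a usable file needs more, such as partitioning and boot loader settings, and exact directive syntax varies between installer versions):

```python
# Hypothetical image definitions: image name -> package list and options.
IMAGES = {
    "batch-node": {"packages": ["@base", "openssh-server"],
                   "timezone": "America/New_York"},
    "analysis-node": {"packages": ["@base", "openssh-server", "gcc"],
                      "timezone": "America/New_York"},
}

# Hypothetical HTTP install server: with a web-based install the installer
# fetches the distribution tree over HTTP instead of mounting it over NFS.
INSTALL_URL = "http://install.example.org/redhat"

def render_kickstart(image_name, images=IMAGES, url=INSTALL_URL):
    """Return the text of a reduced Kickstart file for one image."""
    spec = images[image_name]
    lines = [
        f"# Kickstart file for image: {image_name}",
        f"url --url {url}",
        "lang en_US.UTF-8",
        "keyboard us",
        "network --bootproto=dhcp",
        f"timezone {spec['timezone']}",
        "%packages",
        *spec["packages"],
        "%end",  # terminates the %packages section in current Kickstart syntax
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(render_kickstart("analysis-node"))
```

Serving each rendered file from the same web server lets one installer tree back several images.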
Database Systems
- MySQL widely used throughout the RCF
- Open source
- General monitoring and control (cluster, infrastructure, batch, storage, etc.)
- Flexible and scalable for lightweight operations
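A typical lightweight use is a single status table that monitoring agents update and that control scripts query. A minimal sketch, using Python's built-in `sqlite3` module as a stand-in for MySQL so the example is self-contained; the table, column, and function names are hypothetical:

```python
import sqlite3

def open_status_db(path=":memory:"):
    """Create (if needed) a hypothetical status table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS service_status (
               host    TEXT NOT NULL,
               service TEXT NOT NULL,   -- e.g. 'batch', 'storage'
               status  TEXT NOT NULL,   -- 'up' or 'down'
               updated REAL NOT NULL,   -- Unix timestamp of last update
               PRIMARY KEY (host, service))"""
    )
    return conn

def set_status(conn, host, service, status, updated):
    """Insert or overwrite the status row for (host, service)."""
    conn.execute(
        "INSERT OR REPLACE INTO service_status (host, service, status, updated) "
        "VALUES (?, ?, ?, ?)",
        (host, service, status, updated),
    )
    conn.commit()

def hosts_down(conn, service):
    """Return the sorted list of hosts whose `service` is currently marked down."""
    rows = conn.execute(
        "SELECT host FROM service_status WHERE service = ? AND status = 'down' "
        "ORDER BY host",
        (service,),
    )
    return [host for (host,) in rows]
```

In MySQL the same upsert would be written with `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.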
MySQL Usage in the Linux Farm
Batch job control via MySQL database
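One way to control batch jobs through a database is a job table from which worker nodes claim work. A minimal sketch, again with `sqlite3` standing in for MySQL and with hypothetical table, column, and function names; it is not the RCF's actual schema. Claiming uses a conditional `UPDATE`, so two nodes racing for the same job cannot both get it:

```python
import sqlite3

def open_job_db(path=":memory:"):
    """Create (if needed) a hypothetical job table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id      INTEGER PRIMARY KEY,
               command TEXT NOT NULL,
               state   TEXT NOT NULL DEFAULT 'queued',  -- queued/running/done
               node    TEXT)"""
    )
    return conn

def submit(conn, command):
    """Queue a job and return its id."""
    cur = conn.execute("INSERT INTO jobs (command) VALUES (?)", (command,))
    conn.commit()
    return cur.lastrowid

def claim_next(conn, node):
    """Claim the oldest queued job for `node`; return (id, command) or None.

    The UPDATE only succeeds if the job is still queued, so if another node
    claimed it first, rowcount is 0 and we try the next queued job.
    """
    while True:
        row = conn.execute(
            "SELECT id, command FROM jobs WHERE state = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        job_id, command = row
        cur = conn.execute(
            "UPDATE jobs SET state = 'running', node = ? "
            "WHERE id = ? AND state = 'queued'",
            (node, job_id),
        )
        conn.commit()
        if cur.rowcount == 1:
            return job_id, command

def finish(conn, job_id):
    """Mark a job as done."""
    conn.execute("UPDATE jobs SET state = 'done' WHERE id = ?", (job_id,))
    conn.commit()
```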
Other System Administration Tools
- Python-based scripts for fast, parallel access to multiple servers
- Python-based scripts for emergency remote power management of infrastructure
- Vendor-provided, scalable remote power management software
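Threads are sufficient for fast parallel access because each remote command spends most of its time waiting on the network. A minimal sketch of such a script, assuming password-less `ssh` access to every node; `ssh_runner` and `run_on_all` are hypothetical names, and this is not the RCF's actual tooling:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def ssh_runner(host, command, timeout=30):
    """Run `command` on `host` via the ssh client; return (exit code, stdout)."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stdout

def run_on_all(hosts, command, runner=ssh_runner, max_workers=64):
    """Run `command` on every host concurrently; return {host: (code, output)}.

    A failure on one host (e.g. a timeout) is recorded for that host only and
    does not stop the others; its exit code is reported as None.
    """
    def one(host):
        try:
            return host, runner(host, command)
        except Exception as exc:  # keep going if a single host misbehaves
            return host, (None, str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(one, hosts))
```

Passing a different `runner` (for example a stub in tests) keeps the fan-out logic independent of the transport.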
Cluster Management Tool (RCF-designed)
Cluster Management Tool (vendor-provided)
Conclusion
- Scalable system software is important for efficiently deploying and managing large clusters
- Current software provides fast image downloading
- System software from various sources (open-source, staff-designed, vendor-provided) must be mixed to meet all our needs and requirements