1
Software Scalability Issues in Large Clusters
CHEP2003, San Diego, March 24-28, 2003
A. Chan, R. Hogue, C. Hollowell, O. Rind, T. Throwe, T. Wlodek
RHIC Computing Facility, Brookhaven National Laboratory
2
Background
- Rapid development of large clusters built with affordable commodity hardware
- Need to address software scalability issues with deploying and effectively operating large clusters
- Critical for the efficient operation of the 2000+ CPU cluster in the Linux Farm at the RCF
3
The rapid growth of the Linux Farm
4
Hardware in the Linux Farm

Brand     CPU      RAM       Disk       Quantity
VA Linux  450 MHz  0.5-1 GB  9-120 GB   154
VA Linux  700 MHz  0.5 GB    9-36 GB    48
VA Linux  800 MHz  0.5-1 GB  18-480 GB  168
IBM       1.0 GHz  0.5-1 GB  18-144 GB  315
IBM       1.4 GHz  1 GB      36-144 GB  160
IBM       2.4 GHz  1 GB      240 GB     252
5
Monitoring
- Mix of open-source, staff-designed and vendor-provided monitoring software
- Software redesign for scalability (push vs. pull method) in large clusters
- Persistence and fault-tolerance features
- Near real-time information
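The push method mentioned above can be illustrated with a minimal Python sketch. This is not the RCF monitoring software: the class name `PushCollector`, the `stale_after` threshold and the metric names are all hypothetical. The idea it shows is that each node sends its own reports, so the collector never has to contact every node in turn, and a node that goes quiet is detected by the absence of reports.

```python
import time


class PushCollector:
    """Central collector in a push model: nodes report in, the collector never polls."""

    def __init__(self, stale_after=60.0):
        self.stale_after = stale_after  # seconds of silence before a node is suspect (assumed value)
        self.last_report = {}           # hostname -> (timestamp, metrics)

    def receive(self, hostname, metrics, now=None):
        """Record one report pushed by a node."""
        ts = time.time() if now is None else now
        self.last_report[hostname] = (ts, metrics)

    def stale_nodes(self, now=None):
        """Return the nodes whose most recent report is older than stale_after."""
        ts = time.time() if now is None else now
        return sorted(
            host for host, (seen, _) in self.last_report.items()
            if ts - seen > self.stale_after
        )


# Hypothetical node names and timestamps, passed explicitly so the run is repeatable.
collector = PushCollector(stale_after=60.0)
collector.receive("node001", {"load1": 0.42}, now=0.0)
collector.receive("node002", {"load1": 1.97}, now=0.0)
collector.receive("node001", {"load1": 0.40}, now=50.0)  # node002 stays silent
print(collector.stale_nodes(now=90.0))  # ['node002']
```

In a pull model the collector would instead loop over all hosts and wait on each one, so a few unresponsive nodes can delay the whole sweep; that is the scaling concern the push design avoids.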
6
Monitoring Models
7
Cluster Monitoring (Staff-designed)
8
Cluster Monitoring (Ganglia project)
9
Image Distribution in the Linux Farm
- NFS-based image distribution system until 2001 (not scalable)
- Switched to Web-based Red Hat Kickstart installer
- Fast and scalable (20 minutes/server with hundreds of servers at a time)
- Highly configurable (multiple images, build options, etc.)
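A rough worked estimate shows why concurrent installs matter. The sketch below takes the 20 minutes/server figure from this slide and the server quantities from the hardware slide; the concurrency of 200 is an assumed value standing in for "hundreds at a time", and the function `rollout_minutes` is hypothetical. It also assumes install time does not degrade as more servers install at once.

```python
import math


def rollout_minutes(servers, minutes_per_install=20, concurrent=200):
    """Wall-clock minutes to install `servers` machines in waves of `concurrent`."""
    waves = math.ceil(servers / concurrent)
    return waves * minutes_per_install


servers = 154 + 48 + 168 + 315 + 160 + 252      # quantities from the hardware slide
print(servers)                                  # 1097
print(rollout_minutes(servers))                 # 6 waves x 20 min = 120
print(rollout_minutes(servers, concurrent=1))   # one at a time: 21940
```

Under these assumptions the whole farm could be reinstalled in about two hours, against roughly 15 days if servers were installed one after another.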
10
Database Systems
- MySQL widely used throughout the RCF
- Open-source nature
- General monitoring & control (cluster, infrastructure, batch, storage, etc.)
- Flexible and scalable for lightweight operations
11
MySQL Usage in the Linux Farm
12
Batch job control via MySQL database
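The slide does not give the schema, so the following is only a sketch of how database-driven batch control might look. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the table `node_state`, its columns, and the helper functions are all hypothetical. The point is that one central table records which nodes may accept jobs, and both administrators and the batch system read and write that table instead of touching each node.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE node_state (
        hostname    TEXT PRIMARY KEY,
        accept_jobs INTEGER NOT NULL DEFAULT 1,  -- 1 = open to batch jobs
        reason      TEXT                         -- why a node was closed, if it was
    )
""")
db.executemany(
    "INSERT INTO node_state (hostname) VALUES (?)",
    [("node001",), ("node002",), ("node003",)],  # hypothetical node names
)
db.commit()


def set_accepting(conn, hostname, accept, reason=None):
    """Open or close one node for batch jobs."""
    conn.execute(
        "UPDATE node_state SET accept_jobs = ?, reason = ? WHERE hostname = ?",
        (int(accept), reason, hostname),
    )
    conn.commit()


def schedulable_nodes(conn):
    """Return the hostnames currently allowed to receive jobs."""
    rows = conn.execute(
        "SELECT hostname FROM node_state WHERE accept_jobs = 1 ORDER BY hostname"
    )
    return [row[0] for row in rows]


set_accepting(db, "node002", False, reason="disk failure")
print(schedulable_nodes(db))  # ['node001', 'node003']
```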
13
Other System Administration Tools
- Python-based scripts for fast, parallel access to multiple servers
- Python-based scripts for infrastructure emergency remote power management access
- Vendor-provided scalable, remote power management software
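Parallel access to many servers can be sketched as follows. This is not the RCF's script: it uses present-day Python (`concurrent.futures`), and the names `ssh_runner` and `run_on_hosts`, the worker count and the host names are assumptions. The command is dispatched to all hosts from a thread pool, so total time is bounded by the slowest hosts and not by the sum over all of them.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command, timeout=30):
    """Run `command` on `host` over ssh; returns (exit code, stdout)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],  # BatchMode: never prompt for a password
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip()


def run_on_hosts(hosts, command, runner=ssh_runner, max_workers=32):
    """Run `command` on every host concurrently; returns {host: result or exception}."""
    hosts = list(hosts)

    def safe(host):
        # One failing or unreachable host must not abort the whole sweep.
        try:
            return runner(host, command)
        except Exception as exc:
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(hosts, pool.map(safe, hosts)))


# Demonstration with a stand-in runner so the sketch runs without a cluster.
def fake_runner(host, command):
    return 0, f"{host}: ran {command}"


print(run_on_hosts(["node001", "node002"], "uptime", runner=fake_runner))
```

Against real machines the call would be `run_on_hosts(hosts, "uptime")`, which falls back to `ssh_runner` and requires key-based ssh access to each host.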
14
Cluster Management Tool (RCF-designed)
15
Cluster Management Tool (vendor-provided)
16
Conclusion
- Scalable system software important for efficiently deploying and managing large clusters
- Fast image downloading with current software
- Necessary to mix system software from various sources to address all our needs and requirements