Software Scalability Issues in Large Clusters CHEP2003 – San Diego March 24-28, 2003 A. Chan, R. Hogue, C. Hollowell, O. Rind, T. Throwe, T. Wlodek RHIC.

1 Software Scalability Issues in Large Clusters
CHEP2003 – San Diego, March 24-28, 2003
A. Chan, R. Hogue, C. Hollowell, O. Rind, T. Throwe, T. Wlodek
RHIC Computing Facility, Brookhaven National Laboratory

2 Background
- Rapid development of large clusters built with affordable commodity hardware
- Need to address software scalability issues with deploying and effectively operating large clusters
- Critical for the efficient operation of the 2000+ CPU cluster in the Linux Farm at the RCF

3 The rapid growth of the Linux Farm

4 Hardware in the Linux Farm

Brand      CPU       RAM        Disk        Quantity
VA Linux   450 MHz   0.5-1 GB   9-120 GB    154
VA Linux   700 MHz   0.5 GB     9-36 GB     48
VA Linux   800 MHz   0.5-1 GB   18-480 GB   168
IBM        1.0 GHz   0.5-1 GB   18-144 GB   315
IBM        1.4 GHz   1 GB       36-144 GB   160
IBM        2.4 GHz   1 GB       240 GB      252
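
As a quick cross-check of the hardware figures on this slide, the quantities can be summed in a few lines of Python. The node counts are copied from the slide; the value of two CPUs per node is an assumption made only for illustration, since the slides do not state it.

```python
# Node counts per hardware generation, copied from the hardware slide.
NODES = [
    ("VA Linux", "450 MHz", 154),
    ("VA Linux", "700 MHz", 48),
    ("VA Linux", "800 MHz", 168),
    ("IBM", "1.0 GHz", 315),
    ("IBM", "1.4 GHz", 160),
    ("IBM", "2.4 GHz", 252),
]

# ASSUMPTION: dual-CPU nodes. This figure is not given in the slides.
CPUS_PER_NODE = 2


def total_nodes(rows):
    """Sum the quantity column over all hardware generations."""
    return sum(quantity for _brand, _cpu, quantity in rows)


def total_cpus(rows, cpus_per_node=CPUS_PER_NODE):
    """Estimate the CPU count under the assumed CPUs-per-node value."""
    return total_nodes(rows) * cpus_per_node


if __name__ == "__main__":
    print("nodes:", total_nodes(NODES))
    print("CPUs (assuming dual-CPU nodes):", total_cpus(NODES))
```

Under that assumption the estimate is in line with the "2000+ CPU" figure quoted on the Background slide.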

5 Monitoring
- Mix of open-source, staff-designed and vendor-provided monitoring software
- Software redesign for scalability (push vs. pull method) in large clusters
- Persistence and fault-tolerance features
- Near real-time information
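
The push method mentioned on this slide can be sketched as follows: each node sends its own report to a central collector, which only records what arrives and flags nodes that have gone quiet, instead of polling every node in turn. This is a minimal in-memory sketch, not the RCF implementation; the `Collector` class, the `build_report` helper, the JSON report layout and the 60-second staleness threshold are all hypothetical, and network transport and persistence are left out.

```python
import json
import time


def build_report(host, metrics):
    """Node side: serialize one report to push to the collector (hypothetical format)."""
    return json.dumps({"host": host, "metrics": metrics}).encode("utf-8")


class Collector:
    """Central side: passively stores the latest report pushed by each node."""

    def __init__(self, max_age_s=60.0):
        self.max_age_s = max_age_s  # assumed staleness threshold
        self.latest = {}            # hostname -> (arrival time, metrics)

    def receive(self, payload, now=None):
        """Record a pushed report; `now` can be injected for testing."""
        report = json.loads(payload)
        arrived = time.time() if now is None else now
        self.latest[report["host"]] = (arrived, report["metrics"])

    def stale_hosts(self, now=None):
        """Hosts whose last report is older than max_age_s (possible failures)."""
        current = time.time() if now is None else now
        return sorted(
            host
            for host, (arrived, _metrics) in self.latest.items()
            if current - arrived > self.max_age_s
        )


if __name__ == "__main__":
    collector = Collector()
    collector.receive(build_report("node001", {"load1": 0.42}))
    print(collector.latest, collector.stale_hosts())
```

In a pull design the collector's work grows with every node it has to contact and wait for; here the per-report cost on the collector is a single dictionary update, which is one reason a push design tends to scale better.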

6 Monitoring Models

7 Cluster Monitoring (Staff-designed)

8 Cluster Monitoring (Ganglia project)

9 Image Distribution in the Linux Farm
- NFS-based image distribution system until 2001 – not scalable
- Switched to Web-based RedHat KickStart installer
- Fast and scalable (20 minutes/server with hundreds of servers at a time)
- Highly configurable (multiple images, build options, etc.)
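
To give a feel for what "20 minutes/server with hundreds of servers at a time" implies, here is a back-of-the-envelope estimate in Python. It assumes installs run in equal-sized waves and that the install server is not a bottleneck within a wave; the function name and the wave size used in the example are hypothetical, and only the 20-minute figure comes from the slide.

```python
import math


def rollout_minutes(servers, concurrent, minutes_per_server=20):
    """Estimate wall-clock time to reinstall `servers` machines in waves.

    ASSUMPTION: up to `concurrent` machines install in parallel, each wave
    takes `minutes_per_server`, and waves run back to back.
    """
    if servers < 0 or concurrent <= 0:
        raise ValueError("servers must be >= 0 and concurrent must be > 0")
    waves = math.ceil(servers / concurrent)
    return waves * minutes_per_server


if __name__ == "__main__":
    # Hypothetical example: 1000 servers, 200 installing at a time.
    print(rollout_minutes(1000, 200), "minutes")
```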

10 Database Systems
- MySQL widely used throughout the RCF
- Open-source nature
- General monitoring & control (cluster, infrastructure, batch, storage, etc.)
- Flexible and scalable for lightweight operations

11 MySQL Usage in the Linux Farm

12 Batch job control via MySQL database
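
The slide does not show the schema, so the following is only an illustration of how a small database table can hold batch job state. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the `jobs` table, its columns, the state names and all function names are hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) jobs table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " node TEXT NOT NULL,"
        " state TEXT NOT NULL)"
    )
    return conn


def submit(conn, node):
    """Register a new job for a node in the 'queued' state; return its id."""
    cur = conn.execute(
        "INSERT INTO jobs (node, state) VALUES (?, 'queued')", (node,)
    )
    conn.commit()
    return cur.lastrowid


def set_state(conn, job_id, state):
    """Move a job to a new state, e.g. 'running' or 'done'."""
    conn.execute("UPDATE jobs SET state = ? WHERE id = ?", (state, job_id))
    conn.commit()


def counts_by_state(conn):
    """Summarize how many jobs are in each state."""
    rows = conn.execute("SELECT state, COUNT(*) FROM jobs GROUP BY state")
    return dict(rows.fetchall())


if __name__ == "__main__":
    db = open_db()
    first = submit(db, "node001")
    submit(db, "node002")
    set_state(db, first, "running")
    print(counts_by_state(db))
```

Keeping job state in a table like this lets monitoring and control tools share one source of truth through ordinary queries.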

13 Other System Administration Tools
- Python-based scripts for fast, parallel access to multiple servers
- Python-based scripts for emergency remote power management of the infrastructure
- Vendor-provided scalable, remote power management software
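
The RCF scripts themselves are not shown in the slides. As a rough idea of what fast, parallel access to multiple servers can look like in present-day Python, the sketch below fans a command out over a thread pool. The function names, the use of `ssh` with `BatchMode`, and the worker count are assumptions for illustration only.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_run(host, command, timeout=30):
    """Run `command` on `host` over ssh; return (exit code, stdout).

    ASSUMPTION: key-based, non-interactive ssh access is already set up.
    """
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip()


def run_on_all(hosts, command, runner=ssh_run, max_workers=32):
    """Run `command` on every host concurrently; return {host: result}.

    A failure on one host is recorded as (None, error text) so that it
    does not abort the remaining hosts.
    """
    def task(host):
        try:
            return host, runner(host, command)
        except Exception as exc:  # timeouts, unreachable hosts, etc.
            return host, (None, str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(task, hosts))


if __name__ == "__main__":
    # Hypothetical host names; replace with real ones before running.
    for host, result in run_on_all(["node001", "node002"], "uptime").items():
        print(host, result)
```

Threads suit this workload because each task spends nearly all of its time waiting on the network, so total time is roughly that of the slowest host per batch instead of the sum over all hosts.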

14 Cluster Management Tool (RCF-designed)

15 Cluster Management Tool (vendor-provided)

16 Conclusion
- Scalable system software important for efficiently deploying and managing large clusters
- Fast image downloading with current software
- Necessary to mix system software from various sources to address all our needs and requirements

