Published by Kerry Jordan. Modified over 9 years ago.
1
The SLAC Cluster
Chuck Boeheim
Assistant Director, SLAC Computing Services
2
Components
  Solaris Farm    900 single CPU units
  Linux Farm      512 dual CPU units
  AFS             7 servers, 3 TB
  NFS             21 servers, 16 TB
  Objectivity     94 servers, 52 TB
  LSF             Master, backup, license
  HPSS            Master + 10 tape movers
  Interactive     25 servers, + E10000
  Build Farm      12 servers
  Network         9 Cisco 6509 switches
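The Components figures can be tallied to get a sense of scale. The sketch below only adds up numbers stated on the slide; the dictionary layout and variable names are illustrative and not part of the original presentation.

```python
# Figures copied from the Components slide; the grouping is illustrative.
farms = {
    "Solaris Farm": {"units": 900, "cpus_per_unit": 1},
    "Linux Farm": {"units": 512, "cpus_per_unit": 2},
}
storage_servers = {"AFS": 7, "NFS": 21, "Objectivity": 94}
storage_tb = {"AFS": 3, "NFS": 16, "Objectivity": 52}

# Total farm boxes and total farm CPUs (dual-CPU units count twice).
total_farm_units = sum(f["units"] for f in farms.values())
total_farm_cpus = sum(f["units"] * f["cpus_per_unit"] for f in farms.values())

# Total file/database servers and the capacity they hold.
total_storage_servers = sum(storage_servers.values())
total_storage_tb = sum(storage_tb.values())

print(f"farm units: {total_farm_units}, farm CPUs: {total_farm_cpus}")
print(f"storage servers: {total_storage_servers}, capacity: {total_storage_tb} TB")
```

The farm CPU count weights each Linux unit by two because the slide lists those units as dual CPU.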
3
Staffing
  System Admin    7
  Mass Storage    3
  Applications    3
  Batch           1
  Operations      4
  Operators       0
Same staff supports most Unix desktops on site
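As a rough illustration of the staffing load, the headcounts above can be set against the farm sizes from the Components slide. This is only a sketch: it counts farm units alone, ignores the servers and desktops the same staff also supports, and the variable names are assumptions made for the example.

```python
# Headcounts copied from the Staffing slide.
staff = {
    "System Admin": 7,
    "Mass Storage": 3,
    "Applications": 3,
    "Batch": 1,
    "Operations": 4,
    "Operators": 0,
}

# Farm unit counts copied from the Components slide (Solaris + Linux).
farm_units = 900 + 512

total_staff = sum(staff.values())
units_per_staff = farm_units / total_staff

print(f"{total_staff} staff, {farm_units} farm units, "
      f"{units_per_staff:.1f} farm units per person")
```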
4
Growth in Systems
5
Growth in Staffing
6
Ratio of Systems/Staff
7
Physical
  Racking, power, cooling, seismic, network
  Remote power management
  Remote console management
  Installation
    Burn-in, DOAs
  Maintenance
    Replacement burn-in
    Divergence from original models
    Locating a machine
8
Networking
  Gb to servers
  100Mb to farm nodes
  Speed matching (problems) at switches
  Network glitches and storms
  Network monitoring
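The slide does not say what the speed-matching problems were. One common reason a Gb-to-100Mb mismatch matters is that a switch must buffer whatever a fast sender delivers faster than the slow port can drain. The sketch below uses a simple fluid model under that assumption; the function name and the burst size are hypothetical.

```python
def peak_backlog_bytes(burst_bytes: int, in_mbps: int, out_mbps: int) -> int:
    """Bytes queued at the switch when a burst arrives at in_mbps
    and leaves at out_mbps (fluid model, no other traffic)."""
    if out_mbps >= in_mbps:
        return 0  # the output port keeps up, nothing accumulates
    # While the burst is arriving, a fraction out/in of it has already drained.
    return burst_bytes * (in_mbps - out_mbps) // in_mbps


# A server on a Gb link sends a 1 MB burst to a farm node on a 100Mb link.
backlog = peak_backlog_bytes(1_000_000, in_mbps=1000, out_mbps=100)
print(backlog)

# How many farm nodes sending at full rate fill one server link.
nodes_to_fill_server_link = 1000 // 100
print(nodes_to_fill_server_link)
```

In this model nine tenths of any burst has to sit in switch memory, so a port with less buffer than that would drop packets.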
9
System Admin
  Network install (256 machines in < 1 hr)
  Patch management
  Power Up/Down
  Nightly maintenance
  System Ranger (monitor)
  Report summarization
  “A Cluster is a large Error Amplifier”
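Report summarization matters because one fault can produce the same message from hundreds of nodes. The following is a minimal sketch of the idea, not a description of how System Ranger or any SLAC tool works; the host names, messages, and the `summarize` function are all hypothetical.

```python
from collections import defaultdict


def summarize(reports):
    """Group identical messages and list the hosts that produced each one."""
    hosts_by_message = defaultdict(set)
    for host, message in reports:
        hosts_by_message[message].add(host)
    # Most widespread problems first; ties broken alphabetically by message.
    return sorted(
        ((msg, sorted(hosts)) for msg, hosts in hosts_by_message.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Hypothetical nightly reports, one (host, message) pair per line of output.
reports = [
    ("node001", "disk /tmp full"),
    ("node002", "disk /tmp full"),
    ("node003", "ntp not synchronized"),
    ("node004", "disk /tmp full"),
]

summary = summarize(reports)
for message, hosts in summary:
    print(f"{len(hosts):4d} hosts: {message}")
```

Collapsing by message turns one line per machine into one line per distinct problem, which is what keeps the report readable as the cluster grows.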
10
User Application Issues
  Workload scheduling
    Startup effects
    Distribution vs Hot Spots
  System and Network Limits
    File descriptors
    Memory
    Cache contention
    NIS, DNS, AMD
  Job Scheduling
  Test Beds
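One plausible reading of "startup effects" is that many jobs launched at the same moment all hit the same servers at once. A common mitigation is to spread the starts over a time window. The sketch below assumes that reading; the `startup_delay` function, the job names, and the 300-second window are illustrative and do not come from the presentation.

```python
import hashlib


def startup_delay(job_id: str, window_seconds: int) -> int:
    """Map a job name to a repeatable delay in [0, window_seconds)."""
    # A cryptographic hash gives the same well-spread value on every run,
    # unlike Python's built-in hash(), which is salted per process.
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


# Spread 1000 hypothetical jobs over a five-minute window.
delays = [startup_delay(f"job-{i}", 300) for i in range(1000)]
print(min(delays), max(delays), len(set(delays)))
```

Deriving the delay from the job name, rather than from a random number, means a rerun of the same job waits the same amount of time, which makes timing problems easier to reproduce.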