Linux Cluster Tools Development
Dane Skow, Fermilab
October 8, 1999, HEPNT/HEPiX
Projects
- Linux farms (FT and Run II)
- Level 3 trigger farms
- Tape mover nodes (Enstore)
- Desktops
- Prototyping systems (DAQ tests)
FNAL System Census 1999 (chart not reproduced)
Farms
- Facility farm: nearly completely Linux now (39+50 dual PCs, 6 quad SGIs).
- Run II farms: ramping up from 8 nodes to 50 (CDF & D0 each). A decision has been made to use an I/O node for output building, with SGI Origins as the I/O nodes.
- Kernels: the production farm on the 2.0.32 kernel has been fine. The prototype on 2.0.35 has been rocky. Burn-in on 2.2.10 has had many machines hang; moving to 2.2.12.
- Level 3 trigger farms: tests have been good to date. Large-scale purchases delayed until late 2000?
Prototyping systems
- Linux boxes are popular for test clusters used to develop ideas and test software.
- Used extensively by the online data logging and D0 data handling teams.
Desktops
- Over half of all Linux boxes are still on the desktop.
- Desktop growth continues to keep pace with farm deployment (even with 100+ node purchases).
- Code developers are the prime deployment targets.
- Physics analysis users are beginning to adopt Linux but are running into trouble with tapes.
- Many people still have a VAX and Unix workstation mindset.
- Most desktops are run in "Orange" mode.
- The "self-help" mailing list linux-users@fnal.gov has been very successful.
Security
- The AutoRPM system has been popular and effective for distributing security patches.
- The distribution's default service configuration continues to be pared down.
- Bundled applications follow the Red Hat release.
- Users are supportive of a "minimal" default.
- The AFS client is in early deployment with good success; we plan to make it standard for the next release (RH 6.1).
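The core decision an AutoRPM-style patch tool makes (which installed packages have a newer version on the update server) can be illustrated with a minimal Python sketch. This is not AutoRPM's code: the version comparison below is a simplified stand-in for rpm's own ordering rules, and the package names and versions in the test are hypothetical.

```python
import re
from typing import Dict, List, Tuple, Union

Segment = Union[int, str]


def _segments(fragment: str) -> List[Segment]:
    """Split e.g. '2.2.12' into [2, 2, 12]; runs of letters form their own segments."""
    return [int(s) if s.isdigit() else s for s in re.findall(r"\d+|[A-Za-z]+", fragment)]


def _compare(a: str, b: str) -> int:
    """Return 1, 0 or -1. Simplified rule: numbers compare numerically and beat letters."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x == y:
            continue
        if type(x) is type(y):
            return 1 if x > y else -1
        return 1 if isinstance(x, int) else -1
    # All shared segments equal: the longer fragment is considered newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def newer(candidate: str, installed: str) -> bool:
    """Compare "version-release" strings: version first, release as tie-breaker."""
    cand_ver, _, cand_rel = candidate.partition("-")
    inst_ver, _, inst_rel = installed.partition("-")
    return (_compare(cand_ver, inst_ver), _compare(cand_rel, inst_rel)) > (0, 0)


def pending_updates(installed: Dict[str, str],
                    available: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """List (name, installed, offered) for every installed package with a newer offer."""
    updates = []
    for name, have in sorted(installed.items()):
        offered = available.get(name)
        if offered is not None and newer(offered, have):
            updates.append((name, have, offered))
    return updates
```

Only packages already installed are considered, which matches the "minimal default" policy: the patch stream updates what is there rather than adding new software.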
Infrastructure
Discussions of the tools that are needed seem to break down into four categories:
- System monitoring and alarms: currently use simple ping tests and PATROL. This is the area of greatest activity in the Beowulf world.
- System installation and configuration (patch) management: use a network install server and AutoRPM.
- Backup and failure recovery: Systracker and other ideas. Still early.
- Resource accounting and capacity planning: use batch systems for scheduling and pacct-processing scripts for usage tracking.
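A "simple ping test" monitor usually needs one refinement before it is useful as an alarm: a single dropped packet should not page anyone. Below is a minimal Python sketch of that idea, assuming a hypothetical `PingMonitor` that raises an alarm only after a configurable number of consecutive failures. The default probe shells out to the Linux `ping -c 1` command; any other check can be passed in instead. This illustrates the technique, not how PATROL works.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


def system_ping(host: str) -> bool:
    """Send one ICMP echo via the system ping command (Linux iputils: -c 1)."""
    result = subprocess.run(["ping", "-c", "1", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


@dataclass
class PingMonitor:
    """Hypothetical monitor: alarm on a host after `threshold` consecutive failed probes."""
    probe: Callable[[str], bool] = system_ping
    threshold: int = 3
    failures: Dict[str, int] = field(default_factory=dict)

    def poll(self, hosts: Iterable[str]) -> List[str]:
        """Probe each host once; return hosts that have just reached the threshold."""
        alarms = []
        for host in hosts:
            if self.probe(host):
                self.failures[host] = 0          # any answer resets the count
            else:
                self.failures[host] = self.failures.get(host, 0) + 1
                if self.failures[host] == self.threshold:
                    alarms.append(host)          # alarm once, not on every later poll
        return alarms
```

Comparing with `==` rather than `>=` means a dead node produces one alarm instead of one per polling cycle; the count resets as soon as the node answers again.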
Futures - Infrastructure
- Many people are interested in this area, but efforts are uncoordinated (Beowulf, MOSIX, etc.).
- A small DOE grant funded change-control work over the summer (RAP), called Systracker.
- A discussion group on "Next Generations Operations" for FNAL datacenter operations is just completing its requirements-gathering phase.
Systracker
- Based on our success with AutoRPM, we invited Kirk Bauer to come work on a configuration management tool.
- Prototype of a system change tracking system (logger and replay mechanism).
- The goal is an easy method to restore changes to the installed configuration.
- Perl modules based on concepts from Tripwire, AutoRPM and RCS.
- A local-machine alpha version is available. The next steps would be an archive server and additional package handling methods (UPS, etc.).
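The Tripwire concept Systracker borrows is to record a cryptographic digest of every file under the monitored directories as a baseline, then detect changes by re-scanning and comparing. Systracker itself is a set of Perl modules; the following is only a Python sketch of the concept, with SHA-1 chosen as the digest and the function names `snapshot` and `diff` made up for illustration.

```python
import hashlib
import os
from typing import Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map each regular file under root (by relative path) to the SHA-1 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs and dangling symlinks
            h = hashlib.sha1()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def diff(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify paths as added, removed or modified relative to the baseline."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(p for p in baseline.keys() & current.keys()
                           if baseline[p] != current[p]),
    }
```

In the Systracker design the added and modified files would then be checked into the CVS repository; that step is left out here.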
Systracker architecture diagram (not reproduced). Components: config files, system dirs, RPMs, UPS, CVS repository, Systracker difference engine, replay engine.
Systracker
- Presume that one can install a system to a base configuration.
- Take a snapshot of this as the system baseline.
- Use Tripwire mechanisms to monitor system files and directories for changes, and check updates into a CVS repository.
- A modified RPM archives RPMs to a repository.
- A module creates a "replay" script from the differences between baseline and target.
- Working on installation scripts to run the "replay" script.
Alpha code available at http://home.fnal.gov/~dane/systracker.tgz
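The package half of the "replay" step can be sketched as follows: given the baseline and target package sets, emit a shell script that erases packages the target no longer has and installs or upgrades the rest from the RPM archive. This is a hypothetical Python illustration, not Systracker's generator. The archive directory `/var/systracker/rpms` and the `<name>-<version>-<release>.<arch>.rpm` file layout are assumptions; only the standard `rpm -e` (erase) and `rpm -Uvh` (install or upgrade) commands are used.

```python
import shlex
from typing import Dict


def replay_script(baseline: Dict[str, str], target: Dict[str, str],
                  archive: str = "/var/systracker/rpms", arch: str = "i386") -> str:
    """Emit a shell script that moves a baseline package set to a target package set.

    Both dicts map package name -> "version-release".  `archive` and `arch` are
    hypothetical: archived RPMs are assumed to be stored as
    <archive>/<name>-<version>-<release>.<arch>.rpm.
    """
    lines = ["#!/bin/sh", "set -e"]  # stop at the first failing rpm command
    # Packages present in the baseline but absent from the target are erased.
    for name in sorted(baseline.keys() - target.keys()):
        lines.append(f"rpm -e {shlex.quote(name)}")
    # New or changed packages are installed/upgraded from the archive.
    # (Replaying to an older version would also need rpm's --oldpackage option.)
    for name in sorted(target):
        if baseline.get(name) != target[name]:
            rpm_file = f"{archive}/{name}-{target[name]}.{arch}.rpm"
            lines.append(f"rpm -Uvh {shlex.quote(rpm_file)}")
    return "\n".join(lines) + "\n"
```

Erasing before installing is the simplest ordering but ignores inter-package dependencies; a fuller version would let rpm resolve the whole transaction at once.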
Futures - Software
- A desktop environment decision (KDE vs. GNOME) is likely to be needed soon.
- Strong desire for a centralized backup or archive service.
- Both of these will become more pressing as use of physics analysis tools increases (PAW now, most likely ROOT).
- Discussions about whether one wants tracked
Futures - Hardware
- Looking harder at high-density systems (2U cases, racked bare boards, etc.).
- Run II purchases likely delayed until FY01.
- Arranging to purchase preconfigured hardware from specified vendors is work not yet done.
- Brave ideas from several future experiments about thousands of PCs per experiment.
Summary
- At FNAL, the Linux installation infrastructure is better than that of most OS flavors.
- Users are "violently" in favor of an "Orange" configuration but are not diligent in carrying out admin duties.
- Linux growth has not yet maxed out; Linux is likely to completely supplant the Unix desktop.
- Serious use by amateurs is not there yet, but it is coming soon.
- The list of applications desired on Linux continues to grow; expect to see videoconferencing, etc.