Published by May Bennett. Modified over 9 years ago.
1
Oak Ridge National Laboratory
Tools for Cluster Administration and Applications
(ancient technology – from 2001…)
2
Oak Ridge National Laboratory -- U.S. Department of Energy
Large Cluster Administration: what is the...

...Problem
  System administrators DO NOT scale
    - Install / update operating system
    - Install applications
    - Add / remove users
    - etc.
  Users DO NOT scale
    - Install applications
    - Move data files
    - Launch applications
    - Interact with active jobs
    - etc.

...Solution
  Tools that...
    - Treat cluster as single machine
    - Scale from 1-to-N nodes (10,000's of nodes)
    - Scale to federated clusters
    - Easy to learn, use, adapt
3
Tool Review
  - Systemimager
  - LUI - Linux Utility for cluster Install
  - VA Cluster Management (VACM)
  - Alert
  - Parallel UNIX Commands (Ptools)
  - dsh
  - prsh
  - Webmin
  - ALINKA LCM - Linux Cluster Manager
  - ALINKA RAISIN
  - SCMS - Smile Cluster Management System
  - C3 - Cluster Command & Control
  - M3C - Managing Multiple Multi-User Clusters
4
Systemimager
  Disk image / system administration
    - maintains disk coherency across the cluster
    - administrator-level tool
    - image server stores images
    - can build an image server database of site disk images
  Pros:
    - supported by VA Linux as open source
    - architecture independent
  Cons:
    - requires each node to request its image (“pull image”)
    - only operates at disk image level (not individual file)
  Dependencies: rsync, DHCP
  http://download.sourceforge.net/systemimager
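The “pull image” model noted above can be sketched as a node building the rsync command it would use to fetch its image from the image server. This is only an illustration, not Systemimager's actual client: the function name, the server and image names, and the assumption that each image is exposed as an rsync daemon module named after the image are all hypothetical. The rsync options used (-a, --delete, --numeric-ids, --dry-run) are standard rsync flags.

```python
def pull_image_argv(image_server, image_name, target_root="/", dry_run=True):
    """Build (but do not run) the rsync command a node might use to pull an image.

    Assumption: the image server runs an rsync daemon and exposes each
    image as a module named after the image (hypothetical layout).
    """
    argv = ["rsync", "-a", "--delete", "--numeric-ids"]
    if dry_run:
        # Report what would change without touching the local disk.
        argv.append("--dry-run")
    # "host::module/" is rsync's daemon-mode source syntax.
    argv.append(f"{image_server}::{image_name}/")
    argv.append(target_root)
    return argv


# Hypothetical server and image names, for illustration only.
print(" ".join(pull_image_argv("imageserver", "compute_node_v1")))
```

Because every node issues its own request, the load on the image server grows with the number of nodes pulling at once, which is the scaling concern behind the first item under Cons.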
5
Linux Utility for cluster Install (LUI)
  System install / restore
    - administrator-level tool
    - easy to duplicate
    - install by resource: Linux kernel, system map, partition table, RPMs, “user exits”, local & remote NFS file systems
    - no need to store disk images
  Pros:
    - LUI available as an RPM
    - supported by IBM as open source
    - architecture independent
    - machine & resource groups
  Cons:
    - only useful for system initialization
    - manually installed packages will have to be reinstalled
  Dependencies: NFS, tftp-hpa, bootp or DHCP, Perl
  http://oss.software.ibm.com/developer/opensource/linux/projects/lui
6
VA Cluster Management (VACM)
  - GUI based
  - Hardware-level monitor
      - device power control, hardware reset, remote BIOS control, chassis intrusion, CPU fan status
      - Intel Intelligent Platform Management Interface (IPMI) motherboards
  Pros:
    - monitoring does not impact performance, as IPMI runs in hardware microcontrollers
  Cons:
    - only available for Intel IPMI-compliant motherboards
    - does not monitor power supply fan or external fan
  Dependencies:
    - IPMI motherboard:
        NB440BX Server Platform (Nightshade)
        T440BX Server Platform (Nightlight)
        L440GX Server Platform (Lancewood)
    - GTK+ v1.02, Gnome-libs, GDK v1.2, imlib v1.0.6
  http://www.valinux.com/software/vacm/
7
Alert
  - Web-based UNIX cluster monitoring tool
      - local clients on each node report to monitor node(s)
      - clients are scripts running as cron jobs
      - monitors run a daemon to receive reports from clients
  - Monitor alerts
      - print web pages
      - email notification of events
  Pros:
    - supports cluster configuration files, allowing definitions of subclusters
    - errors can be categorized
    - notifications can be assigned for each category
    - uses a special Alert log as opposed to having to search syslog
    - clients can be written to handle new monitoring tasks
  Cons:
    - no proactive event correction ability
  http://www.cs.virginia.edu/~jdm2d/alert/
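The client/monitor split and the per-category notifications described above can be illustrated with a small in-memory model. This is a sketch of the idea only, not Alert's code or wire format: the Report and Monitor names, their fields, and the example addresses are all hypothetical, and a real monitor would receive reports over the network and send mail rather than return tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One report sent by a client script (field names are illustrative)."""
    node: str
    category: str
    message: str


class Monitor:
    """Receives reports, keeps a dedicated log, and routes notifications."""

    def __init__(self, subclusters, recipients):
        self.subclusters = subclusters  # subcluster name -> set of node names
        self.recipients = recipients    # category -> list of e-mail addresses
        self.alert_log = []             # dedicated log, separate from syslog

    def receive(self, report):
        """Log the report and return (address, text) pairs to notify."""
        self.alert_log.append(report)
        groups = sorted(
            name for name, nodes in self.subclusters.items() if report.node in nodes
        )
        where = ", ".join(groups) if groups else "no subcluster"
        text = f"[{report.category}] {report.node} ({where}): {report.message}"
        return [(address, text) for address in self.recipients.get(report.category, [])]


monitor = Monitor(
    subclusters={"rack1": {"node01", "node02"}},
    recipients={"disk": ["admin@example.org"]},
)
for address, text in monitor.receive(Report("node01", "disk", "/var 95% full")):
    print(address, text)
```

A category with no assigned recipients is still logged but produces no notification, which matches the idea of assigning notifications per category.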
8
Parallel UNIX Commands (Ptools project)
  - Parallel versions of common UNIX commands
      - cp, cat, ls, rm, mv, find, ps, kill, exec, and test
  - Other parallel tools
      - parallel process find, command execution on satisfied condition, command execution on collection of files, display command output
  - Target architecture
      - MPP with full Unix environment on each node
          - SP-1
          - Meiko CS-2
      - Unix NOWs
  - Argonne National Laboratory
      - William Gropp
      - Ewing Lusk
  - Status: vaporware -- latest reference is a ‘94 SHPCC paper
  http://www.ptools.org/
  http://www.ptools.org/projects.html#PUC
9
Distributed Shell (dsh)
  - Command line based
  - Sequential execution across a collection of hosts
      - rsh to access nodes
      - output prepended with host name
  Pros:
    - single or multiple remote commands
    - can create node groups
    - command can specify individual hosts or use node groups
  Cons:
    - no concurrent execution
    - no interactive operation
  Dependencies: rsh, Perl
  Environment variables:
    BEOWULF_ROOT - directory with Beowulf-related files
    WCOLL - location of file with default working collective
  http://www.ccr.buffalo.edu/dsh.htm
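The behaviour listed above (one host at a time, each output line prefixed with the host name, a default host list read from the file named by WCOLL) can be sketched in a few lines. This is not dsh itself, which is a Perl script: the function names are hypothetical, the one-host-per-line file format with '#' comments is an assumption, and local_argv is a stand-in transport so the sketch can be tried without rsh or a cluster.

```python
import os
import subprocess
import sys


def parse_collective(text):
    """Return host names, one per line; blank and '#' lines are skipped
    (assumed file format, not taken from the dsh documentation)."""
    return [
        ln.strip()
        for ln in text.splitlines()
        if ln.strip() and not ln.strip().startswith("#")
    ]


def default_collective():
    """Read the default working collective from the file named by WCOLL."""
    path = os.environ.get("WCOLL")
    if not path:
        return []
    with open(path) as fh:
        return parse_collective(fh.read())


def rsh_argv(host, command):
    """Remote transport; assumes an rsh client is installed."""
    return ["rsh", host, command]


def local_argv(host, command):
    """Stand-in transport: runs `command` as Python code on this machine."""
    return [sys.executable, "-c", command]


def run_sequential(hosts, command, make_argv):
    """Run the command on each host in turn; return 'host: line' strings."""
    lines = []
    for host in hosts:
        result = subprocess.run(
            make_argv(host, command),
            stdin=subprocess.DEVNULL,   # no interactive operation
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        lines.extend(f"{host}: {ln}" for ln in result.stdout.splitlines())
    return lines


for line in run_sequential(["n1", "n2"], "print('up')", local_argv):
    print(line)
```

Since each host is visited only after the previous one finishes, total run time grows linearly with the number of hosts, which is the "no concurrent execution" item under Cons.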
10
Parallel Remote Shell (prsh)
  - Command line based
  - Concurrent execution across a collection of hosts
      - run UNIX command across nodes
      - stderr & stdout returned to originating computer
  Pros:
    - ability to use rsh or ssh
    - hosts and options can be specified in environment variables
    - output can be associated with hostname using --prepend
  Cons:
    - not able to perform interactive tasks (stdin set to /dev/null)
    - using --status with rsh is unreliable
  Dependencies: rsh, ssh, Perl
  Environment variables:
    PRSH_OPTIONS - used before command line options
    PRSH_HOSTS - default host list
  http://www.cacr.caltech.edu/projects/beowulf/GrendelWeb/software/index.html
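The concurrent model described above can be sketched with a thread pool: every host is contacted at once, stdin is tied to the null device, stdout and stderr come back to the caller, and an optional prefix associates each line with its host. This is an illustration rather than prsh's implementation (prsh is written in Perl); the function names and the max_workers default are hypothetical, and local_argv is a stand-in transport for trying the sketch on one machine.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def rsh_argv(host, command):
    """Remote transport via rsh; assumes the client is installed."""
    return ["rsh", host, command]


def ssh_argv(host, command):
    """Remote transport via ssh; assumes the client is installed."""
    return ["ssh", host, command]


def local_argv(host, command):
    """Stand-in transport: runs `command` as Python code on this machine."""
    return [sys.executable, "-c", command]


def run_one(host, command, make_argv, prepend):
    """Run the command for one host; return (host, status, stdout, stderr)."""
    result = subprocess.run(
        make_argv(host, command),
        stdin=subprocess.DEVNULL,   # interactive tasks are not possible
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    prefix = f"{host}: " if prepend else ""
    out = [prefix + ln for ln in result.stdout.splitlines()]
    err = [prefix + ln for ln in result.stderr.splitlines()]
    return host, result.returncode, out, err


def run_parallel(hosts, command, make_argv, prepend=False, max_workers=16):
    """Run the command on all hosts concurrently; results keep host order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda h: run_one(h, command, make_argv, prepend), hosts))


for host, status, out, err in run_parallel(["a", "b"], "print('ok')", local_argv, prepend=True):
    print(status, out, err)
```

The exit status reported here is that of the local transport process. Whether it reflects the remote command's status depends on the transport, which is consistent with the slide's warning about --status with rsh.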
11
Webmin
  - Web interface for system administration
      - designed for use on individual systems, not clusters
      - web server and CGI programs perform administration tasks
  Pros:
    - quick, graphical interface to most common system administration tasks
    - telnet module for console access to hosts
    - ability to define custom commands
    - view and manage running processes
    - easy addition of user-written modules, and standards for writing them
  Cons:
    - not intended for clusters
    - must have web server on every host
    - modules must be written entirely in Perl
  Dependencies: Perl 5 or later, web server
  http://www.webmin.com/webmin/
12
ALINKA LCM - Linux Cluster Manager
  - Command line based
  - Management and configuration
  Pros:
    - cluster-wide command execution, except superuser commands
    - ability to define and manage subclusters
    - load monitoring of nodes
    - MPI/PVM job execution support
  Cons:
    - master node is NFS server for /home, /etc, and /var, limiting scalability
    - no support for using SSH, and cluster command doesn't work as root
    - no support for NIS or shadow passwords
    - limited to homogeneous clusters
    - difficult to install and operate
  Dependencies: rsh, tar, nfs-server, sudo, php cgi-bin with pgsql support, bootpd, tcpdump, postgresql, gawk
  http://www.alinka.com/download.htm#lcm
13
ALINKA RAISIN
  - GUI based
  - Management and configuration
      - same functionality as ALINKA LCM, with an added GUI
  Pros:
    - cluster-wide command execution, except superuser commands
    - ability to define and manage subclusters
    - load monitoring of nodes
    - MPI/PVM job execution support
  Cons:
    - all cons of ALINKA LCM
    - commercial license
  Dependencies:
    - same as ALINKA LCM
    - apache
    - php module for apache with postgresql support
    - gnuplot
  http://www.alinka.com/araisin.htm
14
Smile Cluster Management System (SCMS)
  - Command line and GUI environment
      - designed for managing Beowulf-type clusters as a single machine
      - latest version looks promising, with a Ptools-like command line interface
  Pros:
    - many system utilities (e.g. node status, node control panel, node file system, disk space, ftp, process status, reboot/shutdown, rpm package manager, telnet, parallel UNIX commands, alarm services, and motherboard monitoring)
    - performance monitoring/logging of CPU, memory, I/O, and network
    - user-definable alarm levels with e-mail or script notifications
  Cons:
    - no support for job scheduling and cluster resource allocation
    - no MPI/PVM job submission tool
    - no support for using SSH
  Dependencies: rsh, Java, Perl
  http://smile.cpe.ku.ac.th/
15
Cluster Command & Control (C3) Tools
  - Command line based
  - Single machine interface
      - cluster configuration file
      - serial & parallel versions
  Pros:
    - serial version: deterministic execution, good for debugging
    - parallel version: efficient execution
    - ability to rapidly deploy software updates and update system images
    - command line list option allows subcluster management
    - distributed file scatter and gather operations
    - execution of any non-interactive command
  Cons:
    - no support for interactive command execution
  Dependencies: DHCP, rsync 2.4.3 or later, OpenSSL, OpenSSH, DNS, SystemImager v0.23, Perl v5.6.0 or later
  http://www.csm.ornl.gov/clusterpowertools
  torc@msr.epm.ornl.gov
16
Cluster Command & Control (C3) Tools
  System administration
    cpushimage - “push” image across cluster
    cshutdown  - remote shutdown to reboot or halt cluster
  User tools
    cpush - push single file -to- directory
    crm   - delete single file -to- directory
    cget  - retrieve files from each node
    cexec - execute arbitrary command on each node
    cps   - run ps and retrieve the output from each node
    ckill - kill a process on each node
  Add “s” to end for serial version -- cshutdowns, cpushs, etc...
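The naming rule in the last line can be written down as a small lookup. The tool names and descriptions below are taken from the slide; the dictionary and the serial_variant helper are hypothetical, and since the slide names only cshutdowns and cpushs explicitly, applying the rule to the remaining tools is an assumption.

```python
# Tool names and descriptions as listed on the slide.
C3_TOOLS = {
    "cpushimage": "push image across cluster",
    "cshutdown": "remote shutdown to reboot or halt cluster",
    "cpush": "push single file to directory",
    "crm": "delete single file to directory",
    "cget": "retrieve files from each node",
    "cexec": "execute arbitrary command on each node",
    "cps": "run ps and retrieve the output from each node",
    "ckill": "kill a process on each node",
}


def serial_variant(tool):
    """Return the serial command name: the parallel name plus a trailing 's'."""
    if tool not in C3_TOOLS:
        raise KeyError(f"unknown C3 tool: {tool}")
    return tool + "s"


for name in ("cshutdown", "cpush"):
    print(name, "->", serial_variant(name))
```

Keeping both variants under closely related names lets an administrator debug with the deterministic serial form and then switch to the parallel form for routine use.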