Beowulf Software
Monitoring and Administration
  Beowulf Watch
    Tcl/Tk and rsh based
    reports memory, users, and process info
  EPCKPT (Checkpoint for Linux)
    checkpointing kernel patch for Linux
    saves a running process's snapshot for later restart
    useful for fault tolerance, process tracing/debugging, rollback of transactions, and migration
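The checkpoint/restart idea can be illustrated at the application level. The sketch below is not EPCKPT, which works inside the kernel and needs no cooperation from the program. It is a hypothetical user-level analogue in which a loop saves its own state with `pickle` and later resumes from the saved snapshot. The names `save_checkpoint`, `load_checkpoint`, `run`, and the `stop_at` parameter are assumptions made for illustration.

```python
import os
import pickle
import tempfile

def save_checkpoint(path, state):
    """Write the state to a temp file, then rename, so a crash never leaves a partial snapshot."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    """Return the saved state, or a fresh state if no snapshot exists yet."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"i": 0, "total": 0}

def run(path, n, stop_at=None):
    """Sum 0..n-1, checkpointing after every step; stop_at simulates a failure."""
    state = load_checkpoint(path)
    while state["i"] < n:
        if stop_at is not None and state["i"] == stop_at:
            return None  # simulated crash: progress so far is already on disk
        state["total"] += state["i"]
        state["i"] += 1
        save_checkpoint(path, state)
    return state["total"]

if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
    run(ckpt, 10, stop_at=4)   # first attempt "fails" partway through
    print(run(ckpt, 10))       # second attempt resumes from the snapshot
```

A kernel-level checkpointer saves the whole process image instead of a hand-picked dictionary, which is why it also serves debugging and migration.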
Monitoring and Administration
  lperfex
    performance monitoring and analysis tool for Linux/IA32 systems
    uses information from the P-Pro/PIII status registers (???)
  Compaq CMU
    Disk Image Cloning: can do network installation and disk partitioning
    Console Broadcasting
    Serial Console: connects to each computing node
Monitoring and Administration
  SCMS
    Parallel Unix commands: pls, pps, …
    Display node status: CPU, memory, device info
    Administration: shutdown, reboot, remote login, remote command execution
  FAI (Fully Automatic Installation)
    automatic installation over cluster PCs
    for Debian Linux
    no interaction needed
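A parallel Unix command such as pls or pps amounts to running the same command on every node and collecting the output. The sketch below shows that pattern only; it is not how SCMS is implemented. The function names `parallel_run` and `rsh_runner` are hypothetical, and using `rsh` as the transport is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

def rsh_runner(node, command):
    """Assumed transport: run the command on `node` via rsh and return its stdout."""
    result = subprocess.run(["rsh", node, command], capture_output=True, text=True)
    return result.stdout

def parallel_run(nodes, command, runner=rsh_runner):
    """Run `command` on every node concurrently and return {node: output}."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        outputs = pool.map(lambda node: runner(node, command), nodes)
        return dict(zip(nodes, outputs))
```

Passing the transport in as `runner` keeps the fan-out logic separate from the remote-execution mechanism, so the same function can be exercised without a real cluster.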
Global Process Space
  BPROC
    remote process start without remote login
    Ghost Process
      implemented with kernel threads
      a ghost process on the master node corresponds to a real process running on a remote node
    PID masquerading
      a daemon controls operations related to masqueraded PIDs
    Starting Processes
      rexec: similar to the execve syscall; the nodes must be homogeneous
      move or rfork: saves the process's memory regions and recreates them on the remote node; can transport the binary and anything mmap'ed (e.g., DLLs)
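The relationship between ghost processes and real remote processes can be modeled as a lookup table. The sketch below is a toy data-structure model, not BPROC code: BPROC does this inside the kernel with a helper daemon. The class name `GhostTable`, its methods, and the starting PID are assumptions for illustration.

```python
class GhostTable:
    """Master-side view: each ghost PID stands for one real process on a remote node."""

    def __init__(self, first_pid=1000):
        self._next_pid = first_pid
        self._ghosts = {}  # ghost PID -> (node, real PID on that node)

    def register(self, node, real_pid):
        """Record a remote process and return the PID the master shows for it."""
        ghost_pid = self._next_pid
        self._next_pid += 1
        self._ghosts[ghost_pid] = (node, real_pid)
        return ghost_pid

    def route_signal(self, ghost_pid, signum):
        """Translate a signal aimed at a ghost PID into (node, real_pid, signum)."""
        node, real_pid = self._ghosts[ghost_pid]
        return (node, real_pid, signum)

    def reap(self, ghost_pid):
        """Forget a ghost once its real process has exited; return what it pointed to."""
        return self._ghosts.pop(ghost_pid)
```

With such a mapping, a signal sent to a PID on the master can be forwarded to the node that actually runs the process.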
Global Process Space
  bexec (brexec)
    ftp://ftp.parl.clemson.edu/pub/beowulf/bexec tgz
    uses a daemon to start tasks and deliver signals
    user-level implementation
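The two duties named for the daemon, starting tasks and delivering signals, can be sketched in user space with the standard `subprocess` module. This is an illustration of the idea on a single POSIX machine, not bexec's code or protocol; the class name `TaskDaemon` and its methods are hypothetical.

```python
import signal
import subprocess
import sys

class TaskDaemon:
    """Keeps track of the tasks it started so signals can be delivered to them later."""

    def __init__(self):
        self._tasks = {}   # task id -> subprocess.Popen
        self._next_id = 1

    def start(self, argv):
        """Start a task and return an id the client can use to refer to it."""
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = subprocess.Popen(argv)
        return task_id

    def deliver(self, task_id, signum):
        """Deliver a signal to a task on behalf of the client."""
        self._tasks[task_id].send_signal(signum)

    def wait(self, task_id):
        """Wait for a task to finish and return its exit status."""
        return self._tasks.pop(task_id).wait()

if __name__ == "__main__":
    daemon = TaskDaemon()
    tid = daemon.start([sys.executable, "-c", "import time; time.sleep(30)"])
    daemon.deliver(tid, signal.SIGTERM)
    print(daemon.wait(tid))  # on POSIX, a negative value means "killed by that signal"
```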
Load-balancing & Allocations
  job manager
    load balancing and queue control of jobs
    solves the problems of batch-queue computing systems
  Condor
    load balancing over a large number of systems owned by different people
    process migration, node status monitoring, resource allocation
  Condor + BPROC ??
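Queue control combined with load balancing can be sketched as a greedy dispatcher that hands each queued job to the node with the least accumulated load. This is a generic least-loaded policy chosen for illustration, not Condor's scheduling algorithm. The function name `dispatch` and the `(name, cost)` job format are assumptions.

```python
import heapq
from collections import deque

def dispatch(jobs, nodes):
    """Assign each queued job (name, cost) to the currently least-loaded node."""
    heap = [(0, node) for node in nodes]   # (accumulated load, node name)
    heapq.heapify(heap)
    queue = deque(jobs)                    # jobs leave the queue in FIFO order
    placement = {}
    while queue:
        name, cost = queue.popleft()
        load, node = heapq.heappop(heap)   # node with the smallest load so far
        placement[name] = node
        heapq.heappush(heap, (load + cost, node))
    return placement
```

A real job manager would also react to node status changes and could migrate running processes, which this static sketch leaves out.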
Cluster Networking
  Channel Bonding
    allows multiple network devices to be used as one in order to improve bandwidth
    low-level approach
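One way several devices can act as one is to hand outgoing packets to them in turn. The sketch below models that round-robin policy in user space; real channel bonding is done by the kernel network driver, and other distribution policies exist. The function name `bond_round_robin` and the device names are assumptions for illustration.

```python
from itertools import cycle

def bond_round_robin(packets, devices):
    """Spread packets over the bonded devices in turn; return {device: [packets]}."""
    queues = {dev: [] for dev in devices}
    for packet, dev in zip(packets, cycle(devices)):
        queues[dev].append(packet)
    return queues
```

With two devices each one carries about half of the packets, which is where the bandwidth gain comes from.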
File Systems
  GFS (Global File System)
    multiple nodes can share storage over the network
  SFS (Secure File System)
    stores files securely on remote sites using normal network protocols (FTP, HTTP, NFS, …)
    uses smartcards for authentication and signatures
File Systems
  PVFS (Parallel Virtual File System)
    improves performance of coarse-grain parallel applications with large I/O requirements
    operates at the user level
    no kernel modifications needed
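The slide does not say how the performance gain is obtained. PVFS is generally described as striping a file's data across several I/O nodes so that large transfers proceed in parallel. The sketch below shows only the offset arithmetic of round-robin striping; the function name `locate`, the stripe size, and the node count are assumptions and do not reflect PVFS defaults.

```python
def locate(offset, stripe_size, num_io_nodes):
    """Map a file byte offset to (I/O node index, byte offset within that node's data)."""
    stripe_index = offset // stripe_size          # which stripe of the file
    node = stripe_index % num_io_nodes            # stripes are dealt out round-robin
    local_stripe = stripe_index // num_io_nodes   # how many stripes that node already holds
    return node, local_stripe * stripe_size + offset % stripe_size
```

Consecutive stripes land on different nodes, so a large sequential read touches all I/O nodes at once.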