Published by Ashlie Todd. Modified over 8 years ago.
1
Beowulf Software
2
Monitoring and Administration
Beowulf Watch (http://www.kaybee.org/~kirk/html/linux.html)
    Tcl/Tk and rsh based; shows memory, user, and process information.
EPCKPT, Checkpoint for Linux (http://www.cos.ufrj.br/~edpin/epckpt)
    Checkpointing kernel patch for Linux.
    Saves a snapshot of a running process for later restart.
    Useful for fault tolerance, process tracing/debugging, transaction rollback, and migration.
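The snapshot-and-restart idea can be illustrated with a small user-level sketch. This is not how EPCKPT works: EPCKPT is a kernel patch that snapshots a whole process, whereas the sketch below only pickles an application-level state dictionary. All names (`save_checkpoint`, `load_checkpoint`, `run`, `demo`, the file name `job.ckpt`) are hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write the state to a temp file, then rename, so a crash never leaves a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, stop_after=None):
    """Sum 0..total_steps-1, checkpointing after every step.

    `stop_after` simulates a failure after that many steps in this run.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    done = 0
    while state["step"] < total_steps:
        if stop_after is not None and done >= stop_after:
            return None  # simulated crash; the checkpoint file survives
        state["acc"] += state["step"]
        state["step"] += 1
        save_checkpoint(state, path)
        done += 1
    return state["acc"]


def demo():
    """Interrupt a job after 4 steps, then restart it from the checkpoint."""
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    first = run(path, 10, stop_after=4)
    second = run(path, 10)
    return first, second
```

The second call to `run` resumes at step 4 instead of starting over, which is the property that makes checkpointing useful for fault tolerance.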
3
Monitoring and Administration
lperfex (http://www.osc.edu/~troy/lperfex)
    Performance monitoring and analysis tool for Linux/IA32 systems.
    Uses information from the P-Pro/PIII status registers (???).
Compaq CMU (http://www.compaq.com/solutions/customsystems/hps/linux-cmu.html)
    Disk image cloning: can do network installation and disk partitioning.
    Console broadcasting.
    Serial console connecting each compute node.
4
Monitoring and Administration
SCMS (http://smile.cpe.ku.ac.th/software/scms/index.html)
    Parallel Unix commands: pls, pps, …
    Node status display: CPU, memory, and device information.
    Administration: shutdown, reboot, remote login, remote command execution.
FAI, Fully Automatic Installation (http://www.informatik.uni-koeln.de/fai)
    Automatic installation of Debian Linux across cluster PCs; no interaction needed.
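A parallel Unix command amounts to running the same command on every node and collecting the results. The sketch below shows that pattern in Python; it is not SCMS code. It assumes passwordless `ssh` to each node, and the names `ssh_runner` and `run_on_nodes` are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(node, command):
    """Run `command` on `node` over ssh (assumes passwordless ssh is configured)."""
    result = subprocess.run(
        ["ssh", node, command], capture_output=True, text=True, timeout=30
    )
    return result.returncode, result.stdout


def run_on_nodes(nodes, command, runner=ssh_runner, max_workers=8):
    """Run `command` on all nodes concurrently; return {node: (returncode, stdout)}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {node: pool.submit(runner, node, command) for node in nodes}
        return {node: future.result() for node, future in futures.items()}
```

For example, `run_on_nodes(["node1", "node2"], "ps -e")` would gather a process listing from both nodes. The `runner` parameter lets the transport be swapped (rsh, a local stub for testing) without changing the fan-out logic.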
5
Global Process Space
BPROC (http://beowulf.gsfc.nasa.gov/software/bproc.html)
    Starts remote processes without a remote login.
    Ghost processes
        Implemented with kernel threads.
        A ghost process on the master node corresponds to a real process running on a remote node.
    PID masquerading
        A daemon controls operations related to masqueraded PIDs.
    Starting processes
        rexec: similar to the execve syscall; the nodes must be homogeneous.
        move or rfork: saves the process's memory region and recreates it on the remote node.
        Can transport the binary and anything mmap'ed (e.g. DLLs).
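The ghost-process and PID-masquerading bookkeeping can be pictured as a lookup table on the master node. The toy model below is only an illustration of that mapping, not BPROC's kernel implementation; the class `GhostTable`, its methods, and the node name and PIDs are all hypothetical.

```python
import itertools
import signal


class GhostTable:
    """Toy model: each ghost PID on the master stands for a real process on a remote node."""

    def __init__(self, first_pid=1000):
        self._next_pid = itertools.count(first_pid)
        self._ghosts = {}  # ghost PID on master -> (node, real PID on that node)

    def register(self, node, real_pid):
        """Allocate a ghost PID on the master for a process started remotely."""
        ghost_pid = next(self._next_pid)
        self._ghosts[ghost_pid] = (node, real_pid)
        return ghost_pid

    def route_signal(self, ghost_pid, signum):
        """Return the (node, real_pid, signum) message a daemon would forward."""
        node, real_pid = self._ghosts[ghost_pid]
        return node, real_pid, signum

    def reap(self, ghost_pid):
        """Drop the ghost entry once the remote process has exited."""
        return self._ghosts.pop(ghost_pid)


table = GhostTable()
pid_a = table.register("node3", 412)
message = table.route_signal(pid_a, signal.SIGTERM)
```

A signal sent to the ghost PID on the master is translated into a message addressed to the real process, so users on the master can manage remote processes by PID alone.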
6
Global Process Space
bexec (brexec) (ftp://ftp.parl.clemson.edu/pub/beowulf/bexec-1.1.2.tgz)
    Uses a daemon to start tasks and deliver signals.
    User-level implementation.
7
Load-balancing & Allocations
job manager (http://bond.imm.dtu.dk/jobd)
    Load balancing and queue control for jobs.
    Addresses the problems of batch-queue computing systems.
Condor (http://www.cs.wisc.edu/project/condor)
    Load balancing over a large number of systems owned by different people.
    Process migration, node status monitoring, resource allocation.
Condor + BPROC ??
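As a rough picture of what load balancing means here, the sketch below assigns each job to whichever node currently has the least accumulated load. This greedy policy is an assumption chosen for illustration; it is not the algorithm used by jobd or Condor, and `assign_jobs` and the cost values are hypothetical.

```python
import heapq


def assign_jobs(job_costs, nodes):
    """Greedy placement: each job goes to the node with the lowest load so far.

    `job_costs` maps job name -> estimated cost; returns {job: node}.
    """
    heap = [(0.0, node) for node in nodes]
    heapq.heapify(heap)
    placement = {}
    for job, cost in job_costs.items():
        load, node = heapq.heappop(heap)  # least-loaded node
        placement[job] = node
        heapq.heappush(heap, (load + cost, node))
    return placement
```

With jobs `{"a": 5, "b": 1, "c": 1, "d": 1}` and two nodes, the expensive job occupies one node while the three cheap jobs share the other.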
8
Cluster Networking
Channel Bonding (http://pdsf.nersc.gov/linux)
    Allows multiple network devices to be used as one in order to improve bandwidth.
    Low-level approach.
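One simple way to use several devices as one is to hand out packets to them in round-robin order. The sketch below models only that distribution step; the round-robin policy is an assumption for illustration, and `stripe_packets` and the device names are hypothetical.

```python
from itertools import cycle


def stripe_packets(packets, devices):
    """Distribute packets over the devices in round-robin order.

    Returns {device: [packets sent on that device]}.
    """
    queues = {device: [] for device in devices}
    for packet, device in zip(packets, cycle(devices)):
        queues[device].append(packet)
    return queues
```

With two devices each link carries roughly half the packets, which is where the bandwidth improvement comes from.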
9
File Systems
GFS, Global File System (http://www.globalfilesystem.org)
    Multiple nodes can share storage over the network.
SFS, Secure File System (http://elbe.borg.umn.edu)
    Stores files securely on remote sites using normal network protocols (FTP, HTTP, NFS, …).
    Uses smartcards for authentication and signatures.
10
File Systems
PVFS, Parallel Virtual File System (http://ece.clemson.edu/parl/pvfs)
    Improves the performance of coarse-grain parallel applications with large I/O requirements.
    Operates at user level; no kernel modifications needed.
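Parallel file systems typically gain I/O bandwidth by striping a file across several I/O nodes. The slide does not describe PVFS's layout, so the sketch below assumes plain round-robin striping with a fixed stripe size; `locate` and its parameters are hypothetical.

```python
def locate(offset, stripe_size, num_io_nodes):
    """Map a logical file offset to (io_node, offset within that node's local data).

    Assumes stripes are laid out round-robin across the I/O nodes.
    """
    stripe_index = offset // stripe_size
    io_node = stripe_index % num_io_nodes
    local_offset = (stripe_index // num_io_nodes) * stripe_size + offset % stripe_size
    return io_node, local_offset
```

Under this layout, consecutive stripes land on different I/O nodes, so a large sequential read can be served by all nodes at once.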