Published by Janel Gordon. Modified over 9 years ago.
1
Open Source Cluster Application Resources
2
Overview What is O.S.C.A.R.? History Installation Operation Spin-offs Conclusions
3
History CCDK (Community Cluster Development Kit) OCG (Open Cluster Group) OSCAR (Open Source Cluster Application Resources) IBM, Dell, SGI and Intel working closely together ORNL – Oak Ridge National Laboratory
4
First Meeting Tim Mattson and Stephen Scott Decided on these: That the adoption of clusters for mainstream, high-performance computing is inhibited by a lack of well-accepted software stacks that are robust and easy to use by the general user. That the group embraces the open-source model of software distribution. Anything contributed to the group must be freely distributable, preferably as source code under the Berkeley open-source license. That the group can accomplish its goals by propagating best-known practices built up through many years of hard work by cluster computing pioneers.
5
Initial Thoughts Differing architectures (small, medium, large) Two paths of progress, R&D and ease of use Primarily for non-computer-savvy users. Scientists Academics Homogeneous system
6
Timeline Initial meeting in 2000 Beta development started the same year First distribution, OSCAR 1.0 in 2001 at LinuxWorld Expo in New York City Today up to OSCAR 5.1 Heterogeneous system Far more robust More user friendly
7
Supported Distributions – 5.0
Distribution and Release       Architecture   Status
Red Hat Enterprise Linux 4     x86            Fully supported
Red Hat Enterprise Linux 4     x86_64         Fully supported
Red Hat Enterprise Linux 4     ia64           Fully supported
Fedora Core 4                  x86            Fully supported
Fedora Core 4                  x86_64         Fully supported
Fedora Core 5                  x86            Fully supported
Fedora Core 5                  x86_64         Fully supported
Mandriva Linux 2006            x86            Fully supported
SUSE Linux 10.0                x86            Fully supported
8
Installation Detailed Installation notes Detailed User guide Basic idea: Configure head node (server) Configure image for client nodes Configure network Distribute node images Manage your own cluster!!
9
Head Node Install by running the ./install_cluster eth1 script GUI will auto-launch Choose the desired step in the GUI, and make sure each step is complete before proceeding to the next one All configuration can be done from this system from now on
11
Download Subversion is used Default is the OSCAR SVN Can set up custom SVN Allows for up to date installation Allows for controlled rollouts of multiple clusters OPD also has powerful command line functionality (LWP for proxy servers)
12
Select & Configure OSCAR packages Customize server up to your liking/needs Some packages can be customized This step is very crucial, choice of packages can affect performance as well as compatibility
13
Installation of Server Node Simply installs the packages which were selected Automatically configures the server node Now the Head or Server is ready to manage, administer and schedule jobs for its client nodes
14
Build Client Image Choose name Specify packages within the package file Specify distribution Be wary of automatic reboot if network boot is manually selected as default
15
Building the Client Image …
16
Define Clients This step creates the network structure of the nodes It’s advisable to assign IP addresses based on physical links GUI shortcomings regarding multiple IP spans Incorrect setup can lead to an error during node installation
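A minimal sketch of the kind of node-to-IP plan this step produces. The host name prefix `oscarnode`, the starting address 10.0.0.2, and the helper `plan_clients` are assumptions made for illustration; the OSCAR GUI collects these values through its own fields.

```python
import ipaddress


def plan_clients(first_ip, count, prefix="oscarnode"):
    """Return (hostname, ip) pairs for `count` consecutive client nodes."""
    start = ipaddress.IPv4Address(first_ip)
    # Node numbering starts at 1; addresses increase by one per node.
    return [(f"{prefix}{i + 1}", str(start + i)) for i in range(count)]


plan = plan_clients("10.0.0.2", 4)
for host, ip in plan:
    print(host, ip)
```

Keeping the numbering in step with the physical layout (rack position, switch port) makes a misbehaving node easier to locate later.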
17
Define Clients
18
Setup Networking SIS – System Installation Suite SystemImager MAC addresses are collected by scanning the network Must link a MAC to a node Must select network boot method (rsync, multicast, bt) Must make sure clients support PXE boot or create boot CDs A custom kernel can be used if the one supplied with SIS does not work
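The "link a MAC to a node" step amounts to building a one-to-one mapping and rejecting bad input before imaging starts. The sketch below illustrates that idea only; the function `assign_macs`, the node names, and the MAC addresses are hypothetical and are not part of SIS or OSCAR.

```python
import re

# Six colon-separated hex pairs, lower case after normalisation.
MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")


def assign_macs(nodes, macs):
    """Pair each node name with one collected MAC address, in order."""
    cleaned = [m.strip().lower() for m in macs]
    bad = [m for m in cleaned if not MAC_RE.match(m)]
    if bad:
        raise ValueError(f"malformed MAC address(es): {bad}")
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("duplicate MAC address collected")
    if len(cleaned) < len(nodes):
        raise ValueError("fewer MAC addresses than defined nodes")
    return dict(zip(nodes, cleaned))


mapping = assign_macs(
    ["oscarnode1", "oscarnode2"],
    ["00:16:3E:00:00:01", "00:16:3e:00:00:02"],
)
print(mapping)
```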
20
Client Installation and Test After the network is properly configured, installation can begin All nodes are installed and rebooted Once the system imaging is complete, a test can be run to ensure the cluster is working properly At this point, the cluster is ready to begin parallel job scheduling
21
Operation Admin packages are: Torque Resource Manager Maui Scheduler C3 pfilter System Imager Suite Switcher Environment Manager OPIUM Ganglia
22
Operation Library packages: LAM/MPI OpenMPI MPICH PVM
23
Torque Resource Manager Server on Head node “mom” daemon on clients Handles job submission and execution Keeps track of cluster resources Has own scheduler but uses Maui by default Commands are not intuitive, documentation must be read From OpenPBS http://svn.oscar.openclustergroup.org/wiki/oscar:5.1:administration_guide:ch4.1.1_torque_overview
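Job submission to Torque usually goes through a small batch script whose `#PBS` comment lines request resources. The sketch below only generates such a script as text; the job name, resource numbers, and the `mpirun ./hello` command are placeholder assumptions, and the helper `pbs_script` is not part of Torque.

```python
def pbs_script(name, nodes, ppn, walltime, command):
    """Build the text of a simple PBS/Torque batch script."""
    lines = [
        "#!/bin/sh",
        f"#PBS -N {name}",                   # job name
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # maximum run time (hh:mm:ss)
        "cd $PBS_O_WORKDIR",                 # directory the job was submitted from
        command,
    ]
    return "\n".join(lines) + "\n"


script = pbs_script("hello", 2, 2, "00:10:00", "mpirun ./hello")
print(script)
```

Saved to a file, a script like this would be handed to `qsub`, and `qstat` would then show its place in the queue.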
24
Maui Scheduler Handles job scheduling Sophisticated algorithms Customizable Much literature on its algorithms A commercial successor to Maui, called Moab, is available Accepted as the unofficial HPC standard for scheduling http://www.clusterresources.com/pages/resources/documentation.php
25
C3 - Cluster Command and Control Developed by ORNL Collection of tools for cluster administration Commands: cget, cpush, crm, cpushimage, cexec, cexecs, ckill, cshutdown, cnum, cname, clist Cluster Configuration Files http://svn.oscar.openclustergroup.org/wiki/oscar:5.1:administration_guide:ch4.3.1_c3_overview
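The C3 commands read a cluster configuration file that names the head node and the compute nodes. The sketch below generates a file in the block-and-range layout described in the C3 documentation as best recalled here; treat the exact syntax as an assumption to check against the installed version. The cluster and host names and the helper `c3_conf` are hypothetical.

```python
def c3_conf(cluster, head, node_prefix, first, last):
    """Return text for a one-cluster C3 configuration block."""
    return (
        f"cluster {cluster} {{\n"
        f"\t{head}\n"                          # first entry: the head node
        f"\t{node_prefix}[{first}-{last}]\n"   # range of compute nodes
        f"}}\n"
    )


conf = c3_conf("oscar_cluster", "oscar_server", "oscarnode", 1, 4)
print(conf)
```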
26
pfilter Cluster traffic filter Default is that client nodes can only send outgoing communications, outside the scope of the cluster If it is desirable to open up client nodes, pfilter config file must be modified
27
System Imager Suite Tool for network Linux installations Image based, can even chroot into image Also has database which contains cluster configuration information Tied in with C3 Can handle multiple images per cluster Completely automated once image is created http://wiki.systemimager.org/index.php/Main_Page
28
Switcher Environment Manager Handles “dot” files Does not limit advanced users Designed to help non-savvy users Has guards in place that prevent system destruction Which MPI to use – per user basis Operates on two levels: user and system Modules package is included for advanced users (and used by switcher)
29
OPIUM Login is handled by the Head node Once connection is established, client nodes do not require authentication Synchronization run by root, at intervals It stores hash values of the password in the .ssh folder along with a “salt” Password changes must be done at the Head node as all changes propagate from there
30
Ganglia Distributed Monitoring System Low overhead per node XML for data representation Robust Used in most cluster and grid solutions http://ganglia.info/papers/science.pdf
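Because Ganglia represents its data as XML, a monitoring snapshot can be read with any XML parser. The sketch below pulls one metric per host out of a hand-written, heavily simplified sample; real output carries many more attributes and metrics, and the host names, values, and the helper `metric_by_host` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample in the style of Ganglia's XML output.
SAMPLE = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="oscar_cluster">
    <HOST NAME="oscarnode1" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="0.25" TYPE="float" UNITS=""/>
    </HOST>
    <HOST NAME="oscarnode2" IP="10.0.0.3">
      <METRIC NAME="load_one" VAL="1.50" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metric_by_host(xml_text, metric):
    """Map each host name to the numeric value of one named metric."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                result[host.get("NAME")] = float(m.get("VAL"))
    return result


loads = metric_by_host(SAMPLE, "load_one")
print(loads)
```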
31
LAM/MPI LAM - Local Area Multicomputer LAM initializes the runtime environment on a select number of nodes MPI 1 and some of MPI 2 MPICH2 can be used if installed Two tiered debugging system exists: snapshot and communication log Daemon based http://www.lam-mpi.org/
32
Open MPI Replacement for LAM/MPI Same team working on it LAM/MPI relegated to upkeep only, all new development in Open MPI Much more robust (OS, schedulers) Full MPI-2 compliance Much higher performance http://www.open-mpi.org/
33
PVM – Parallel Virtual Machine Same as LAM/MPI Can be run outside of the scope of Torque and Maui Supports Windows nodes as well Much better portability Not as robust and powerful as Open MPI http://www.csm.ornl.gov/pvm/
34
Spin-offs HA-OSCAR - http://xcr.cenit.latech.edu/ha-oscar/ VMware with OSCAR - http://www.vmware.com/vmtn/appliances/directory/341 SSI-OSCAR - http://ssi-oscar.gforge.inria.fr/ SSS-OSCAR - http://www.csm.ornl.gov/oscar/sss/
35
Conclusions Future Direction Open MPI Windows, Mac OS?