GPFS: General Parallel File System
Rosanna Catania
INFN-GRID Technical Board, Bologna, 1-2 July
EGEE is a project funded by the European Union under contract IST
Introducing GPFS
The General Parallel File System (GPFS) for Linux on xSeries® is a high-performance shared-disk file system that can provide data access from all nodes in a Linux cluster environment. Parallel and serial applications can readily access shared files using standard UNIX® file system interfaces, and the same file can be accessed concurrently from multiple nodes. GPFS provides high availability through logging and replication, and can be configured for failover from both disk and server malfunctions.
What does GPFS do?
- Presents one file system to many nodes; it appears to the user as a standard Unix filesystem
- Allows nodes concurrent access to the same data
GPFS offers:
- scalability
- high availability and recoverability
- high performance
Why use GPFS?
GPFS highlights:
- Improved system performance
- Assured file consistency
- High recoverability and increased data availability
- Enhanced system flexibility
- Simplified administration
Distributions and kernel levels tested with GPFS
GPFS 2.2 for Linux on xSeries has been tested with:
- Red Hat EL *
- Red Hat Pro
- SuSE SLES (service pack 3)
“Direct attached” configuration (the one NOT tested yet because of RH7.3)
- GPFS software installed on each node
- Dedicated GPFS network, minimum 10/100 Ethernet
- SAN connection between all nodes and the storage
- Each logical disk becomes a logical volume, from which the GPFS filesystem is created
[Diagram: GPFS nodes connected to the storage through the SAN]
“Shared disk” configuration (the one actually tested!)
- Additional shared disk (SD) software layer on all nodes
- Nodes which have a connection to the storage are SD servers
- Each logical disk becomes a “shared disk”; disks here are twin-tailed between nodes
- Nodes which are not connected to the storage are SD clients and can access the disks via the SD servers
- This configuration produces a lot of traffic across the GPFS network
[Diagram: GPFS nodes acting as SD clients and SD servers]
Quorum: multi-node quorum
Quorum = ½ × number of nodes + 1
[Diagram: four GPFS nodes. While quorum exists the filesystem is accessible; once quorum is lost the filesystem becomes inaccessible]
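A short worked example for the four-node case shown in the diagram:

  quorum = (4 / 2) + 1 = 3

With three or four nodes up the quorum requirement is met and the filesystem is accessible; if two nodes fail only two remain, quorum is lost and the filesystem becomes inaccessible.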
System requirements
- Upgrade the kernel
- Apply the mmap-invalidate and NFS lock patches to the Linux kernel, recompile, and install this kernel
- Ensure the glibc level is at the required version or greater
- Ensure proper authorization is granted to all nodes in the GPFS cluster to use the alternative remote shell and remote copy commands (at Catania we use SSH everywhere; see the sketch below)
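A minimal sketch of how such SSH authorization can be set up; the host names are illustrative and the exact procedure used at Catania is not described here:

  # On each node, generate a key pair without a passphrase
  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

  # Append each node's public key to root's authorized_keys on every other node,
  # e.g. from node gpfs01 towards node gpfs02:
  cat ~/.ssh/id_rsa.pub | ssh root@gpfs02 'cat >> ~/.ssh/authorized_keys'

  # Verify that ssh and scp work without a password prompt,
  # since GPFS will invoke them non-interactively
  ssh gpfs02 date
  scp /etc/hosts gpfs02:/tmp/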
Required RPMs:
- rsct.basic i386.rpm
- rsct.core i386.rpm
- rsct.core.utils i386.rpm
- src i386.rpm
- gpfs.base i386.rpm
- gpfs.docs noarch.rpm
- gpfs.gpl noarch.rpm
- gpfs.msg.en_US noarch.rpm
RSCT: Reliable Scalable Cluster Technology
RSCT is a set of software components that together provide a comprehensive clustering environment for Linux. It is the infrastructure used by a variety of IBM products to provide clusters with improved system availability, scalability, and ease of use.
RSCT: Components
- The Resource Monitoring and Control (RMC) subsystem: provides global access to subsystems and resources throughout the cluster, i.e. a single monitoring/management infrastructure
- The RSCT core resource managers: a software layer between a resource (hardware or software) and RMC
- The RSCT cluster security services: provide the security infrastructure that enables RSCT components to authenticate the identity of other parties
- The Topology Services subsystem: provides node and network failure detection
- The Group Services subsystem: provides cross-node/process coordination
RSCT peer domain: configuration
- Ensure IP connectivity between all nodes of the peer domain
- Prepare the initial security environment on each node that will be in the peer domain using the preprpnode command:
  preprpnode -k originator_node ip_server1
- Create a new peer domain definition by issuing the mkrpdomain command:
  mkrpdomain -f allnodes.txt domain_name
- Bring the peer domain online:
  startrpdomain domain_name
- Verify your configuration:
  lsrpdomain domain_name
  lsrpnode -a
A sketch of the whole sequence with illustrative host names follows.
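A minimal sketch of the sequence, following the commands above and using the illustrative host names gpfs01 (originator) and gpfs02 and the illustrative domain name gpfsdomain; allnodes.txt simply lists one host name per line:

  # On every node that will join the domain: authorize the originator node
  # (arguments as on the slide, with illustrative names substituted)
  preprpnode -k gpfs01 gpfs02

  # On the originator node: define the peer domain from the node list
  mkrpdomain -f allnodes.txt gpfsdomain

  # Bring the peer domain online
  startrpdomain gpfsdomain

  # Verify the domain and its member nodes
  lsrpdomain
  lsrpnode -a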
GPFS: Installation
- On each node copy the self-extracting images from the CDROM, invoke them and accept the license agreement:
  ./gpfs_install_i386 --silent
  rpm -ivh gpfs.base i386.rpm gpfs.docs noarch.rpm gpfs.gpl noarch.rpm gpfs.msg.en_US noarch.rpm
- Build your GPFS portability module (an illustrative site.mcr excerpt is sketched below):
  vi /usr/lpp/mmfs/src/config/site.mcr
  export SHARKCLONEROOT=/usr/lpp/mmfs/src
  cd /usr/lpp/mmfs/src/
  make World
- To install the Linux portability interface for GPFS:
  make InstallImages
- Verification:
  less /var/adm/ras/mmfs.log.latest
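An illustrative excerpt of the kind of definitions set in site.mcr before running make World. The macro names follow the GPFS 2.2 portability-layer documentation as best recalled, and the distribution and kernel values below are purely hypothetical; the authoritative list of options is in the comments of site.mcr itself:

  /* hardware architecture of the nodes (hypothetical choice) */
  #define GPFS_ARCH_I386
  /* Linux distribution and level (hypothetical values) */
  LINUX_DISTRIBUTION = REDHAT_LINUX
  #define LINUX_DISTRIBUTION_LEVEL 73
  /* kernel version, encoded without dots (hypothetical value) */
  #define LINUX_KERNEL_VERSION 2042018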
GPFS: Configuration
CREATING the CLUSTER:
  mmcrcluster -t lc -n allnodes.txt -p primary_server -s secondary_server -r /usr/bin/ssh -R /usr/bin/scp
  mmlscluster
CREATING the NODESET ON THE ORIGINATOR NODE:
  mmconfig -n allnodes.txt -A -C cluster_name
  mmlsconfig -C cluster_name
START the GPFS SERVICES ON EACH NODE:
  mmstartup -C cluster_name   (or mmstartup -a)
An illustrative run with hypothetical host names follows.
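An illustrative run of the same commands, using the hypothetical host names gpfs01 (primary) and gpfs02 (secondary) and the hypothetical nodeset name gpfs_cluster, none of which appear in the slides:

  # allnodes.txt lists one node per line, e.g. gpfs01 and gpfs02
  # Create the cluster, telling GPFS to use ssh/scp as remote shell and copy commands
  mmcrcluster -t lc -n allnodes.txt -p gpfs01 -s gpfs02 -r /usr/bin/ssh -R /usr/bin/scp
  mmlscluster

  # Create the nodeset on the originator node (-A starts GPFS automatically at boot)
  mmconfig -n allnodes.txt -A -C gpfs_cluster
  mmlsconfig -C gpfs_cluster

  # Start the GPFS services on all nodes of the nodeset
  mmstartup -C gpfs_cluster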
GPFS: Configuration
CREATE the NSDs (Network Shared Disks):
  mmcrnsd -F Descfile -v yes
CREATING A FILE SYSTEM:
  mkdir /gpfs
  mmcrfs /gpfs gpfs0 -F Descfile -C cluster_name -A yes
MOUNT THE FILE SYSTEM:
  mount /gpfs
VERIFICATION:
  mmlscluster
  mmlsconfig -C cluster_name
An illustrative disk descriptor file is sketched below.
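A sketch of what the disk descriptor file (Descfile) passed to mmcrnsd might contain. Device and server names are hypothetical, and the colon-separated layout (DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup) is recalled from the GPFS 2.x documentation, so check the mmcrnsd man page of your release:

  # /dev/sdb served by gpfs01 (backup gpfs02), data and metadata, failure group 1
  /dev/sdb:gpfs01:gpfs02:dataAndMetadata:1
  # /dev/sdc served by gpfs02 (backup gpfs01), data and metadata, failure group 2
  /dev/sdc:gpfs02:gpfs01:dataAndMetadata:2

After it runs, mmcrnsd rewrites Descfile with the generated NSD names, and that rewritten file is the one passed to mmcrfs with -F.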
Conclusions and outlook
- The GPFS learning curve is very steep: the documentation is “monumental” but not always well organized
- Once properly installed and configured, GPFS actually allows many disk servers to be “seen” as a single entity
- The network bandwidth of the individual servers is VERY important (GPFS settles down to the speed of the “slowest” node)
- Reliability is still under testing
- Preliminary I/O performance tests in the “NFS” configuration show worse behaviour w.r.t. native NFS (about 4:1)
- The proper configuration with GPFS installed both on WNs and servers still has to be tested (very soon!):
  - short term: trying to install the “right” kernel on the WNs running Grid.It
  - long term: redoing the tests on Scientific Linux whenever it becomes available
Useful links