GPFS Tests at Catania
IV Workshop INFN Grid – Bari, October
Rosanna Catania, INFN Catania
EGEE is a project funded by the European Union under contract IST
Introducing GPFS
The General Parallel File System (GPFS) for Linux on xSeries® is a high-performance shared-disk file system that can provide data access from all nodes in a Linux cluster environment. Parallel and serial applications can readily access shared files using standard UNIX® file system interfaces, and the same file can be accessed concurrently from multiple nodes. GPFS provides high availability through logging and replication, and can be configured for failover from both disk and server malfunctions.
What does GPFS do? Why use GPFS?
- Presents one file system to many nodes: it appears to the user as a standard Unix filesystem
- Allows nodes concurrent access to the same data
- GPFS offers scalability, high availability and recoverability, and high performance
GPFS highlights:
- Improved system performance
- Assured file consistency
- High recoverability and increased data availability
- Enhanced system flexibility
- Simplified administration
System requirements
- Upgrade the kernel to a supported level
- Ensure the glibc level is at the required version or greater
- Ensure proper authorization is granted to all nodes in the GPFS cluster to use the alternative remote shell and remote copy commands (at Catania we use SSH everywhere)
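As a minimal illustration of the last two checks (the host names below are placeholders, and the required glibc level depends on the GPFS release):

rpm -q glibc                      # verify the installed glibc level
ssh-keygen -t rsa                 # on the originator node, create a key pair (no passphrase for unattended use)
# append the public key to every node's authorized_keys so ssh/scp work without a password
cat ~/.ssh/id_rsa.pub | ssh root@node01 'cat >> ~/.ssh/authorized_keys'
ssh root@node01 date              # must run without prompting for a password
scp /etc/hosts root@node01:/tmp/  # same for remote copy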
RPMs:
- src (i386)
- rsct.basic (i386)
- rsct.core (i386)
- rsct.core.utils (i386)
- gpfs.base (i386)
- gpfs.docs (noarch)
- gpfs.gpl (noarch)
- gpfs.msg.en_US (noarch)
RSCT: Reliable Scalable Cluster Technology
RSCT is a set of software components that together provide a comprehensive clustering environment for Linux. It is the infrastructure used to provide clusters with improved system availability, scalability, and ease of use.
RSCT peer domain: configuration
- Ensure IP connectivity between all nodes of the peer domain
- Prepare the initial security environment on each node that will be in the peer domain: preprpnode -k originator_node ip_server1
- Create a new peer domain definition by issuing: mkrpdomain -f allnodes.txt domain_name
- Bring the peer domain online: startrpdomain domain_name
- Verify your configuration: lsrpdomain and lsrpnode -a
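A command-level sketch of this sequence, assuming three nodes node01, node02 and node03 (hypothetical host names, with node01 as the originator node) and a peer domain called gpfs_domain:

# on every node, authorize the originator node to configure it
preprpnode node01
# on the originator node only: define the peer domain from the node list and bring it online
mkrpdomain -f allnodes.txt gpfs_domain
startrpdomain gpfs_domain
# verify: the domain should be reported as Online and all nodes should have joined it
lsrpdomain
lsrpnode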
GPFS: Installation
On each node, copy the self-extracting image from the CDROM, invoke it and accept the license agreement:
./gpfs_install*_i386 --silent
rpm -ivh gpfs.base*.i386.rpm gpfs.docs*.noarch.rpm gpfs.gpl*.noarch.rpm gpfs.msg.en_US*.noarch.rpm
Build your GPFS portability module:
vi /usr/lpp/mmfs/src/config/site.mcr
export SHARKCLONEROOT=/usr/lpp/mmfs/src
cd /usr/lpp/mmfs/src/
make World
To install the Linux portability interface for GPFS:
make InstallImages
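A sketch of the portability-layer build on one node; the macros to select in site.mcr (architecture, distribution, kernel version) depend on the installed kernel and GPFS release, so the comments below are indicative only:

# edit site.mcr to select the architecture/distribution/kernel settings
# (e.g. the i386 architecture and the running kernel version) before building
vi /usr/lpp/mmfs/src/config/site.mcr
export SHARKCLONEROOT=/usr/lpp/mmfs/src
cd /usr/lpp/mmfs/src
make World            # compile the portability layer against the installed kernel headers
make InstallImages    # install the resulting modules/binaries under /usr/lpp/mmfs/bin
# repeat the build (or copy the built images) on every node that runs the same kernel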
GPFS: Configuration
Creating the cluster:
mmcrcluster -t lc -n allnodes.txt -p primary_server -s secondary_server -r /usr/bin/ssh -R /usr/bin/scp
mmlscluster
Creating the nodeset on the originator node:
mmconfig -n allnodes.txt -A -C cluster_name
mmlsconfig -C cluster_name
Starting the GPFS services on each node:
mmstartup -C cluster_name   (or mmstartup -a)
Verification:
less /var/adm/ras/mmfs.log.latest
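For illustration, assuming allnodes.txt lists one node name per line and gpfs1/gpfs2 are the primary and secondary configuration servers (hypothetical names, as is the nodeset name catania):

# define the loose cluster (-t lc), using ssh/scp as remote shell and copy commands
mmcrcluster -t lc -n allnodes.txt -p gpfs1 -s gpfs2 -r /usr/bin/ssh -R /usr/bin/scp
mmlscluster                        # check the cluster definition
# create the nodeset; -A starts GPFS automatically when the nodes boot
mmconfig -n allnodes.txt -A -C catania
mmlsconfig -C catania              # check the nodeset configuration
mmstartup -C catania               # start the GPFS daemons on all nodes of the nodeset
tail /var/adm/ras/mmfs.log.latest  # each node records its startup in this log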
“Direct attached” configuration (tested on Grid!)
- GPFS software installed on each node
- Dedicated GPFS network, min. 10/100 Ethernet
- SWT (switch): connection between all nodes and the storage
- Each logical disk becomes a logical volume, from which the GPFS filesystem is created
[Diagram: nodes and storage connected through the GPFS switch (SWT)]
GPFS: Configuration
Creating the NSDs (Network Shared Disks):
mmcrnsd -F DescFile -v yes
Creating a file system:
mkdir /gpfs
mmcrfs /gpfs gpfs0 -F DescFile -C cluster_name -A yes
Mounting the file system:
mount /gpfs
Verification:
mmlscluster
mmlsconfig -C cluster_name
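A minimal sketch of the disk descriptor file used by mmcrnsd in the direct-attached case, assuming two example disks /dev/sdb and /dev/sdc visible from every node (device names and failure groups are illustrative; the NSD server fields are left empty because every node accesses the disks directly). mmcrnsd rewrites the file with the generated NSD names, and the rewritten file is then passed to mmcrfs:

# DescFile format: DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup
/dev/sdb:::dataAndMetadata:1
/dev/sdc:::dataAndMetadata:2

mmcrnsd -F DescFile -v yes                              # turn each disk into an NSD, rewriting DescFile
mmcrfs /gpfs gpfs0 -F DescFile -C cluster_name -A yes   # create the file system from the rewritten DescFile
mount /gpfs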
Distributions and kernel levels tested on Grid: GPFS 2.2 for Linux on xSeries
GPFS Version / Linux Distribution / Kernel Version / Role:
- GPFS 2.2 / Red Hat / (smp kernel) / Disk Server
- GPFS 2.2 / Red Hat / (smp kernel) / Disk Server
- GPFS 2.2 / Red Hat / (smp kernel) / Client
- GPFS 2.2 / Red Hat / (legacy.smp kernel) / Client
Test A – 1 WN: configuration
- 2 servers (RH), 1 WN (RH smp)
- 2 storage disks of 480 GB = 960 GB
- 2 storage disks of 720 GB = 1440 GB
- Total: 4 disks on 2 servers = 2.4 TB
- /GPFS-DATA: 2.2 TB, GPFS file system
- /local: 1 GB, ext3 file system
- On the WN: /NFS-DATA mounted via NFS from the NFS server; /gpfs-data (2.2 TB) mounted by GPFS
[Plot] Test A: reading times (average of 5 samples); seconds vs. MB
[Plot] Test A: writing times (average of 5 samples); seconds vs. MB
Test B: configuration
- 1 server (RH smp), 1 server (RH smp), WNs (RH legacy.smp)
- 2 storage disks of 720 GB = 1.44 TB
- 2 storage disks of 1000 GB = 2 TB
- Total: 4 disks on 2 servers = 3.4 TB
- /GPFS-DATA: 3.4 TB, GPFS file system
- /local: 1 GB, ext3 file system
- On the WNs: /NFS-DATA mounted via NFS from the NFS server; /gpfs-data mounted by GPFS
[Plot] Test B, 1 WN: reading times (average of 5 samples); seconds vs. MB
[Plot] Test B, 2 WN: reading times (average of 5 samples); seconds vs. MB
[Plot] Test B, 3 WN: reading times (average of 5 samples); seconds vs. MB
[Plot] Test B, 1 WN: writing times (average of 5 samples); seconds vs. MB
[Plot] Test B, 2 WN: writing times (average of 5 samples); seconds vs. MB
[Plot] Test B, 3 WN: writing times (average of 5 samples); seconds vs. MB
Analysis of results
- Reading from GPFS takes roughly the same time as reading from NFS
- Writing on GPFS is faster than on NFS, and the advantage increases with the number of WNs
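The slides do not show the benchmark scripts; purely as an illustration, read/write timings of this kind could be collected with a loop along these lines (the mount point, file sizes and sample count are placeholders):

# hypothetical timing loop: write and re-read files of increasing size, 5 samples each
for size_mb in 100 500 1000 2000; do
  for sample in 1 2 3 4 5; do
    # write test: stream size_mb MB of zeros onto the GPFS (or NFS) mount point, logging elapsed seconds
    /usr/bin/time -f "%e" -a -o write_${size_mb}MB.log \
      dd if=/dev/zero of=/gpfs-data/testfile bs=1024k count=${size_mb} 2>/dev/null
    # read test: read the file back, discarding the data
    /usr/bin/time -f "%e" -a -o read_${size_mb}MB.log \
      dd if=/gpfs-data/testfile of=/dev/null bs=1024k 2>/dev/null
  done
done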
Conclusions and outlook
- Preliminary I/O performance tests in the “NFS” configuration show worse behaviour than native NFS (about 4:1); the “Direct attached” configuration is strongly suggested to improve performance
- Network bandwidth of the individual servers is VERY important (GPFS slows down to the “slowest” node)
- The proper configuration, with GPFS installed on both WNs and servers, has been tested
- Short term (next weeks): reliability tests
- Medium term (by the end of the year): use GPFS to manage all the disk storage at Catania