An Introduction to GPFS Vladimir Sapunenko INFN-CNAF
V.Sapunenko - INFN T1+T2 cloud workshop, 21-22/11/2006
What is GPFS?
- IBM General Parallel File System is a high-performance shared-disk cluster file system.
- Designed to support high-performance computing
- POSIX compliant
- Provides concurrent high-speed file access to applications executing on multiple nodes of AIX and Linux clusters
[Figure: switching fabric, I/O nodes, shared disks]
The file system
- Supports quotas, snapshots and extended ACLs
- Built from a collection of disks
- Each disk can contain data and/or metadata
- Coherency and consistency maintained via a distributed lock manager
Performance and scalability
- Striping across multiple disks attached to multiple nodes
- Efficient client-side caching
- Support for large block size (configurable)
- Advanced algorithms for read-ahead and write-behind
- Dynamic optimization of I/O based on access pattern (sequential, reverse sequential, random)
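The striping idea can be illustrated with a minimal sketch. It assumes plain round-robin placement of fixed-size blocks, which is a simplification chosen for illustration and not a description of the allocation GPFS actually performs; the function name, disk names and sizes are hypothetical.

```python
def stripe_blocks(file_size, block_size, disks):
    """Map each block of a file to a disk in round-robin order.

    Returns a list of (block_index, disk_name) pairs.
    Round-robin is an assumption made for this sketch only.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not disks:
        raise ValueError("at least one disk is required")
    # Ceiling division: a partial final block still occupies one block.
    n_blocks = (file_size + block_size - 1) // block_size
    return [(i, disks[i % len(disks)]) for i in range(n_blocks)]


# Hypothetical example: a 1 MiB file, 256 KiB blocks, three disks.
layout = stripe_blocks(1024 * 1024, 256 * 1024, ["disk_a", "disk_b", "disk_c"])
for block, disk in layout:
    print(block, disk)
```

Because consecutive blocks land on different disks, a large sequential read can be served by several disks (and the nodes they are attached to) at once.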
Administration
- Consistent with standard Linux file system administration
  - Simple CLI; most commands can be issued from any node in the cluster
  - No dependency on Java or graphics libraries
- Extensions for clustering aspects
  - A single command can perform an action across the entire cluster
- Support for the Data Management API (IBM's implementation of the X/Open data storage management API)
Data availability
- Fault tolerance
  - Node failure: handled by clustering
  - Storage system failure: handled by data replication
- File system health monitoring
  - Extensive logging and automated recovery actions in case of failure
- Data replication available for:
  - Journal logs
  - Data
  - Metadata
Information Lifecycle Management (ILM)
- New feature of GPFS v3.1
- Storage pools allow the creation of disk groups within a file system (hardware partitioning)
- A fileset is a sub-tree of the file system namespace (namespace partitioning). For example, filesets can be used as administrative boundaries to set quotas.
User-defined policies
- File placement policies
  - Define where the data will be created (the appropriate storage pool)
  - Rules are determined by attributes like:
    - File name
    - User name
    - Fileset
- File management policies
  - Move data from one pool to another without changing the file's location in the directory structure
  - Change replication status
  - Prune the file system (delete files as defined by policy)
  - Determined by attributes like:
    - Access time
    - Path name
    - Size of the file
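As a rough illustration of how a placement policy maps file attributes to a storage pool, the sketch below evaluates a list of rules in order and returns the pool of the first match. The first-match ordering, the `PlacementRule` class, the glob-style name matching and all pool, user and fileset names are assumptions made for this example; it is not GPFS's policy engine or its rule syntax.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class PlacementRule:
    """Hypothetical rule: every condition that is set must match."""
    pool: str
    name_glob: Optional[str] = None   # matched against the file name
    user: Optional[str] = None        # matched against the owning user
    fileset: Optional[str] = None     # matched against the fileset name

    def matches(self, name, user, fileset):
        if self.name_glob is not None and not fnmatchcase(name, self.name_glob):
            return False
        if self.user is not None and user != self.user:
            return False
        if self.fileset is not None and fileset != self.fileset:
            return False
        return True


def choose_pool(rules, name, user, fileset):
    """Return the pool of the first matching rule, or None if none match."""
    for rule in rules:
        if rule.matches(name, user, fileset):
            return rule.pool
    return None


# Hypothetical policy: database files and the 'scratch' fileset go to
# pool_2; everything else falls through to the catch-all rule for pool_1.
rules = [
    PlacementRule(pool="pool_2", name_glob="*.db"),
    PlacementRule(pool="pool_2", fileset="scratch"),
    PlacementRule(pool="pool_1"),
]
print(choose_pool(rules, "index.db", "alice", "home"))
print(choose_pool(rules, "notes.txt", "bob", "home"))
```

A final rule with no conditions acts as a default, so every new file ends up in some pool.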
Policy rules examples
- If the storage pool named pool_1 has an occupancy percentage above 90%, bring the occupancy percentage of pool_1 down to 70% by migrating the largest files to storage pool pool_2:
    RULE 'mig1' MIGRATE FROM POOL 'pool_1' THRESHOLD(90,70) WEIGHT(KB_ALLOCATED) TO POOL 'pool_2'
- Delete files from the storage pool named pool_1 that have not been accessed in the last 30 days, and are named like temporary files or appear in any directory that is named tmp:
    RULE 'del1' DELETE FROM POOL 'pool_1' WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 30) AND (lower(NAME) LIKE '%.tmp' OR PATH_NAME LIKE '%/tmp/%')
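The logic of the two rules can be mimicked in a few lines of Python. This is only a sketch of the selection logic as described in the text, assuming occupancy is computed from allocated kilobytes and that the day difference is a whole-day date difference; the function names, file paths and sizes are hypothetical, and nothing here calls GPFS.

```python
from datetime import datetime


def pick_files_to_migrate(files, capacity_kb, high_pct=90, low_pct=70):
    """Mimic 'mig1': files is a list of (path, kb_allocated) in one pool.

    If occupancy exceeds high_pct, pick the largest files first until
    occupancy is at or below low_pct. Returns the chosen paths.
    """
    used = sum(kb for _, kb in files)
    if used * 100 <= high_pct * capacity_kb:
        return []  # threshold not exceeded, nothing to do
    target = low_pct * capacity_kb / 100
    chosen = []
    # WEIGHT(KB_ALLOCATED): larger files are candidates first.
    for path, kb in sorted(files, key=lambda f: f[1], reverse=True):
        if used <= target:
            break
        chosen.append(path)
        used -= kb
    return chosen


def matches_del1(path_name, name, access_time, now, max_idle_days=30):
    """Mimic the WHERE clause of 'del1' for a single file."""
    idle_days = (now.date() - access_time.date()).days
    looks_temporary = name.lower().endswith(".tmp") or "/tmp/" in path_name
    return idle_days > max_idle_days and looks_temporary


# Hypothetical pool of 1000 KB holding 950 KB (95% occupancy).
pool_1 = [
    ("/gpfs/a.dat", 210),
    ("/gpfs/b.dat", 205),
    ("/gpfs/c.dat", 200),
    ("/gpfs/d.dat", 190),
    ("/gpfs/e.dat", 145),
]
to_move = pick_files_to_migrate(pool_1, capacity_kb=1000)
print(to_move)

now = datetime(2006, 11, 22)
print(matches_del1("/gpfs/job/tmp/out.dat", "out.dat", datetime(2006, 10, 1), now))
```

In the example, moving the two largest files takes the pool from 950 KB to 535 KB, which is the first point at which occupancy is at or below the 70% target.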
Cluster configuration
- Shared disk cluster
  - SAN storage attached to all nodes in the cluster via Fibre Channel (FC)
  - All nodes interconnected via LAN
  - Data flows via FC
  - Control information is transmitted via Ethernet
Network-based block I/O
- Block-level I/O interface over the network: Network Shared Disk (NSD)
- GPFS transparently handles I/O whether NSD or direct attachment is used
- Intra-cluster communications can be separated using dedicated interfaces
GPFS@Tier1
- Production: GPFS v.2.3.0-12
  - 12 I/O servers
  - 15 file systems
  - 125.4 TB (via SAN)
  - 654 worker nodes
- Testbed: GPFS v.3.1.0-4
  - 3 I/O servers (no SAN disks)
  - 2 clients
[Figure labels: I/O servers, worker nodes, virtual SAN, storage disks]
Enhancements in GPFS v3.1
- The requirements for IP connectivity in a multi-cluster environment have been relaxed.
  - In earlier releases of GPFS, 'all-to-all' connectivity was required: any node mounting a given file system had to be able to open a TCP/IP connection to any other node mounting the same file system, irrespective of which cluster either node belonged to.
- Enhanced file system administration
  - The "mmmount" and "mmumount" commands are provided for cluster-wide file system management.
- Performance monitoring
Known GPFS limitations
- Number of file systems < 32
- Number of cluster nodes < 4096 (2441 tested)
- Single disk size < 2 TB
- File system size < 2^99 bytes (2 PB tested)
- Number of files < 2*10^9
- Does not support the Red Hat EL 4.0 uniprocessor (UP) kernel.
- Does not support the RHEL 3.0 and RHEL 4.0 hugemem kernels.
- Although GPFS is a POSIX-compliant file system, some exceptions apply:
  - Memory-mapped files are not supported in this release.
  - stat() is not fully supported: the mtime, atime and ctime values returned by the stat() system call may be updated slowly if the file has recently been updated on another node.
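To put the size limits in proportion, the short calculation below compares the tested file system size with the architectural limit and with the per-disk limit. It takes the file system size limit as 2^99 bytes and assumes binary units (1 TB = 2^40 bytes, 1 PB = 2^50 bytes); the constant names are hypothetical.

```python
# Limits taken from the list above; binary units are an assumption.
ARCH_LIMIT_BYTES = 2 ** 99        # file system size limit
TESTED_BYTES = 2 * 2 ** 50        # 2 PB tested
MAX_DISK_BYTES = 2 * 2 ** 40      # single disk size stays below 2 TB

# How many times larger the architectural limit is than the tested size.
headroom_factor = ARCH_LIMIT_BYTES // TESTED_BYTES

# Each disk is strictly smaller than 2 TB, so a 2 PB file system
# needs more than this many disks.
min_disks_for_tested = TESTED_BYTES // MAX_DISK_BYTES

print(headroom_factor == 2 ** 48)
print(min_disks_for_tested)
```

Under these assumptions, a file system of the tested size already needs more than a thousand disks, so in practice the per-disk limit is likely to be felt long before the architectural file system limit.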