The NERSC Global File System and PDSF
Tom Langley, PDSF Support Group
NERSC at Lawrence Berkeley National Laboratory
Fall HEPiX, 9-13 October 2006
Outline
–Defining the need
–NGF
–PDSF and NGF
Defining the need
The NERSC facility is home to several large-scale systems, many if not all of which may be used by any user at any time. The movement of users between systems necessitates that their data be accessible on all systems. The central storage facility should be able to manage very large amounts of data while providing acceptable data rates to and from each attached system. Procedures should be in place to adequately back up and restore user data as needed.
NGF Current Configuration
–24 I/O server nodes
  Linux SLES 9 SP2 & GPFS 2.3 PTF 12
–70 TB usable end-user storage
  DDN 8500 with SATA drives
  IBM DS4500 with SATA drives
  IBM DS4500 with FC drives
–50 million inodes
–3+ GB/s bandwidth for streaming I/O
–Storage and servers external to all NERSC systems
–Distributed over 10 Gigabit Ethernet infrastructure
–Single file system instance providing file and data sharing among multiple NERSC systems
  Both large and small files expected
  Persistent data, not scratch
  Backed up to HPSS
The /project file system
The global /project file system is used to share data among individual project participants:
–Access characteristics
  Mountable remotely with R/W access, with the nosuid and nodev mount options
  root mapped to nobody:nobody
–Current usage
  19.5 TB used (28% of capacity)
  2.2 M inodes used (5% of capacity)
–Backed up to HPSS bi-weekly
–Default project quota: 1 TB, 250,000 inodes
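As an illustration of how the default project quota translates into a simple usage check, here is a minimal Python sketch; only the 1 TB and 250,000-inode limits come from the slide, while the example usage numbers are made up.

```python
# Illustrative only: compare a project's usage against the default /project quota.
DEFAULT_QUOTA_TB = 1.0          # default space quota per project (from the slide)
DEFAULT_QUOTA_INODES = 250_000  # default inode quota per project (from the slide)

def over_quota(used_tb: float, used_inodes: int) -> bool:
    """Return True if either the space or the inode quota is exceeded."""
    return used_tb > DEFAULT_QUOTA_TB or used_inodes > DEFAULT_QUOTA_INODES

# Hypothetical example projects:
print(over_quota(used_tb=0.8, used_inodes=120_000))  # False: within quota
print(over_quota(used_tb=1.3, used_inodes=120_000))  # True: over the space quota
```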
Deployed NERSC Systems
NGF /project is currently mounted on all major NERSC systems (1240+ clients):
–PDSF, IA32/x86_64 Linux cluster running Scientific Linux
–Jacquard, LNXI Opteron system running SLES 9
–Da Vinci, SGI Altix running SLES 9 Service Pack 3, with direct storage access
–Bassi, IBM Power5 running AIX 5.3
–Seaborg, IBM SP running AIX 5.2
NGF Current Configuration
Current Architecture Limitations
NGF architectural limitations on legacy system performance:
–Majority of NGF access is via IP-based GPFS traffic
  Only Da Vinci can access NGF storage directly
–Engenio storage multipath performance deficiencies
  Accessing a LUN via multiple paths results in reduced performance
  Currently, access to a LUN is limited to a single path
  Da Vinci restricted to accessing Engenio storage via IP links
–Limited IP gateway bandwidth between NERSC systems and NGF
  Routing issue on Jacquard
  Bonded Ethernet performance on Seaborg & NGF
  Limited bandwidth into each system
Backups – today
Currently the NGF /project file system gets a full backup once every other week, which takes 5-6 days to complete. The current backup scripts back up data at an average rate of ~50 MB/s.
–A new version of the scripts, which overlaps backup file creation with transfers to HPSS and supports the concurrent backup of multiple project directories, is nearly ready and should allow us to increase that rate significantly.
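For scale, the quoted ~50 MB/s average over the ~19.5 TB currently in /project works out to about 4.5 days of pure streaming, in the same ballpark as the observed 5-6 days. A minimal Python sketch of that back-of-the-envelope estimate follows; the 150 MB/s figure is a hypothetical improved rate, not a measured one.

```python
# Illustrative estimate: wall-clock time to stream a full backup at a given rate.
def full_backup_days(data_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to move data_tb terabytes at rate_mb_per_s (1 TB = 1e6 MB)."""
    seconds = data_tb * 1e6 / rate_mb_per_s
    return seconds / 86400.0

print(f"{full_backup_days(19.5, 50):.1f} days at 50 MB/s")    # ~4.5 days of pure streaming
print(f"{full_backup_days(19.5, 150):.1f} days at 150 MB/s")  # hypothetical improved rate
```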
Backups – future enhancements
It is intended that /project will eventually receive nightly incremental backups as well as the bi-weekly full backups.
–Significant growth of NGF may necessitate a longer backup cycle in which full backups occur monthly or bi-monthly.
–To mitigate the risk of increased data loss from a lost or damaged backup file, the backup system will support multiple parallel backup hierarchies that run on alternating days, so that if a high-level dump in one hierarchy is lost, data can still be brought to within 2 days of current via the other hierarchy.
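A toy Python sketch of the alternating-hierarchy scheduling described in the last bullet; the two-hierarchy, alternate-day scheme follows the slide, but the function and dates themselves are illustrative, not part of the actual backup system.

```python
# Illustrative only: two parallel backup hierarchies alternate by day, so if
# every dump in one hierarchy is lost, the other hierarchy's newest dump is at
# most two days old and data can be restored to within ~2 days of current.
import datetime

def hierarchy_for(day: datetime.date) -> str:
    """Alternate between hierarchy 'A' and hierarchy 'B' on successive days."""
    return "A" if day.toordinal() % 2 == 0 else "B"

start = datetime.date(2006, 10, 9)
for offset in range(4):
    day = start + datetime.timedelta(days=offset)
    print(day.isoformat(), "-> hierarchy", hierarchy_for(day))
```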
User Feedback
–Ease of use
  Supports ACLs
  Supports quotas
–Performance
  Sufficient for many projects
–Availability and reliability
  Outages have been noticed
  “Contagious” GPFS problems
Problems Encountered
Outages
–The current NGF architecture has built-in redundancy to allow NGF to survive any single hardware failure
–A center-wide NGF outage may occur due to multiple failures or software bugs
–A partial outage (NGF unavailable within a NERSC system) may also occur due to network failures
Problems
–Server crashes (thermal faults)
–Environmental
–Disks and controllers
–Switches
–Software bugs
Solutions
–Proactive monitoring
–Procedural development
–Operations staff activities
–PMRs filed and fixes applied
–Replacing old servers
Proactive Monitoring
Nagios event detection and notification
–Disk faults and soft failures
–Server crashes
–Nodes/systems currently being monitored:
  UPS: 3 APC UPS units
  FC switches: 2 Brocade FC switches, 2 QLogic FC switches
  Storage: 2 DDN controllers, 4 IBM FAStTs
  Servers: 28 NGF servers
–Nagios allows event-driven procedures for the operations staff (see the check sketch after this slide)
Cacti performance tracking
–NSD servers: disk I/O, network traffic, CPU and memory usage, load average
–FC switches: FC port statistics, fan, temperature
–DDN: FC port statistics (IO/s, MB/s)
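Nagios drives these notifications through small check plugins that report status via exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The Python sketch below shows the general shape of such a check for file system fullness; it illustrates the plugin convention only and is not one of the actual NGF checks — the path and thresholds are made-up examples.

```python
# Illustrative Nagios-style check: report file system fullness via the standard
# plugin exit codes. Not an actual NGF check; path and thresholds are examples.
import os
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_fs_usage(path="/project", warn=0.80, crit=0.90):
    try:
        st = os.statvfs(path)
    except OSError as exc:
        print(f"UNKNOWN - cannot stat {path}: {exc}")
        return UNKNOWN
    used_frac = 1.0 - st.f_bavail / float(st.f_blocks)
    message = f"{path} is {used_frac:.0%} full"
    if used_frac >= crit:
        print(f"CRITICAL - {message}")
        return CRITICAL
    if used_frac >= warn:
        print(f"WARNING - {message}")
        return WARNING
    print(f"OK - {message}")
    return OK

if __name__ == "__main__":
    sys.exit(check_fs_usage())
```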
Event Monitoring with Nagios
Performance Tracking with Cacti
Home File System(s)
2 basic approaches possible:
–passwd info refers to the same path for all systems
  /home/matt/
–passwd info refers to different subdirs of the user’s directory for each system
  /home/matt/pdsf
  /home/matt/seaborg
  /home/matt/…
One directory for all
Users see exactly the same thing in their home directory every time they log in, no matter what machine they’re on. Programs sometimes change the format of their configuration files (dot files) from one release to another without changing the file’s name. To make this work you need either to make that filename look different from system to system (make it an AFS @sys-style symbolic link), or to convince the program to look for it in a different place on each system (set the $HOME environment variable). GPFS does not support a feature similar to AFS’s @sys links.
One directory for all
–Setting $HOME affects all applications, not just the one that needs different config files
–Some programs use getpwnam() to determine the user’s home directory and look there for config files rather than in $HOME, so a $HOME override does not help for them
–Setting $HOME essentially emulates the effect of having separate home directories for each system
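The distinction matters because the two lookups can disagree: a per-system $HOME override is invisible to programs that consult the passwd database directly. A small, generic Python illustration of the two lookups (not NERSC-specific code):

```python
# Illustrative: two ways a program can locate the user's home directory.
import getpass
import os
import pwd

user = getpass.getuser()

# 1) Honors the environment, so a per-system $HOME override takes effect.
home_from_env = os.environ.get("HOME")

# 2) Consults the passwd database directly, ignoring $HOME, so programs that
#    do this never see a per-system override.
home_from_passwd = pwd.getpwnam(user).pw_dir

print("From $HOME      :", home_from_env)
print("From getpwnam() :", home_from_passwd)
```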
One directory per system
–By default, users start off in a different directory on each system
–Dot files are different on each system unless the user uses symbolic links to make them the same
–All of a user’s files are accessible from all systems, but a user may need to “cd ../seaborg” to get at files he created on Seaborg if he’s logged into a different system
One directory per system
Could have a “shared” subdirectory that points to the same place on all systems:
–/home/matt/seaborg/shared -> /home/matt/shared
–/home/matt/bassi/shared -> /home/matt/shared
A user who really wants the exact same home directory on all machines can get that behavior by making the per-system directories themselves symlinks to the shared directory:
–/home/matt/seaborg -> /home/matt/shared
–/home/matt/pdsf -> /home/matt/shared
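A minimal Python sketch of laying out such a home-directory skeleton; the path, system names, and the idea of scripting it this way are illustrative, since the slides only describe the resulting directory and symlink layout.

```python
# Illustrative: build per-system subdirectories plus a shared area, with a
# "shared" symlink inside each per-system directory, as described above.
import os

home = "/home/matt"  # placeholder home directory from the slide's examples
systems = ["pdsf", "seaborg", "bassi", "jacquard", "davinci"]

os.makedirs(os.path.join(home, "shared"), exist_ok=True)
for system in systems:
    system_dir = os.path.join(home, system)
    os.makedirs(system_dir, exist_ok=True)
    link = os.path.join(system_dir, "shared")
    if not os.path.islink(link):
        os.symlink(os.path.join(home, "shared"), link)

# A user who wants identical homes everywhere could instead make each
# per-system directory itself a symlink to /home/matt/shared.
```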
Global Homes Conclusion
Both schemes have advantages, but without the ability to make individual files look different on a system-by-system basis via an @sys-like feature, the single-directory scheme winds up looking a lot like the multiple-directory scheme, with some added disadvantages. For this reason the NGF group decided to go with separate directories per system, especially since a user can achieve the single-directory behavior if he or she so chooses.
PDSF and NGF
Compatibility between the local GPFS instance and the global file system is necessary. Users have very large data requirements, so NGF must be prepared to handle potentially tens of terabytes of data at any given time. Client data movement is on the order of several TB daily, so the data bandwidth must be sufficient to handle the traffic. PDSF currently contains roughly 200 TB of local disk storage using both NFS and GPFS. This storage is kept separate from NGF since its service level is different from NGF’s.
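To put “several TB daily” in perspective, moving a few terabytes per day requires a sustained rate of tens of MB/s. A quick Python back-of-the-envelope; the 5 TB/day figure is an assumed example, not a measured PDSF number.

```python
# Illustrative: sustained bandwidth needed to move a given daily data volume.
def sustained_mb_per_s(tb_per_day: float) -> float:
    """MB/s needed to move tb_per_day terabytes in 24 hours (1 TB = 1e6 MB)."""
    return tb_per_day * 1e6 / 86400.0

print(f"{sustained_mb_per_s(5):.0f} MB/s sustained for an assumed 5 TB/day")  # ~58 MB/s
```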
GPFS Overview/Review
Version 2.3 PTF 12
New problems
–Servers do not fail, they just degrade
  The home file system now becomes very sluggish; under NFS it would fail and we would reboot
–PDSF’s reused hardware ties the NSD function to the disk
–Volume groups are larger, so when one system does go down it affects a larger group of researchers
–Existing hardware does not allow for redundant paths to disk
  But metadata/data mirroring is available
New benefits
–Groups now have one or two directory structures in which to find their data; it used to be up to 20
–Better performance: data is now striped across multiple systems
–Servers are less prone to hang under heavy load
–Administration is easier: we have ~10 GPFS volumes instead of ~70 NFS ones
–Able to get experiments to move away from the cheapest storage available to something a little more reliable
  Fibre Channel array boxes with SATA drives
  Will allow multiple paths to storage and failover NSD servers