1 The NERSC Global File System NERSC June 12th, 2006

2 Overview
NGF: What/Why/How
NGF Today
–Architecture
–Who's using it
–Problems/Solutions
NGF Tomorrow
–Performance Improvements
–Reliability Enhancements
–New File Systems (/home)

3 What is NGF?

4 NERSC Global File System - what
What do we mean by a global file system?
–Available via standard APIs for file system access on all NERSC systems: POSIX and MPI-IO (a minimal access sketch follows this slide)
–We plan on being able to extend that access to remote sites via future enhancements
–High performance: NGF is seen as a replacement for our current file systems, and is expected to meet the same high performance standards
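Because NGF is reached through the standard POSIX and MPI-IO interfaces rather than a proprietary API, existing codes should not need changes to use it. Below is a minimal MPI-IO sketch, assuming a hypothetical /project/example directory; each rank writes a non-overlapping block of one shared file.

/*
 * Minimal sketch of parallel access through the standard APIs NGF exposes.
 * The path /project/example/data.out is hypothetical; substitute a
 * directory in your own project's space.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    int rank;
    char buf[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank fills a 1 KB buffer with its own rank number. */
    for (int i = 0; i < (int)sizeof(buf); i++)
        buf[i] = (char)rank;

    /* Collective open of a shared file on the NGF /project file system. */
    MPI_File_open(MPI_COMM_WORLD, "/project/example/data.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes its buffer at a non-overlapping offset. */
    MPI_File_write_at(fh, (MPI_Offset)rank * (MPI_Offset)sizeof(buf),
                      buf, (int)sizeof(buf), MPI_CHAR, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}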

5 NERSC Global File System - why
Increase user productivity
–Reduce users' data management burden
–Enable/simplify workflows involving multiple NERSC computational systems
–Accelerate the adoption of new NERSC systems: users have access to all of their data, source code, scripts, etc. the first time they log into the new machine
Enable more flexible/responsive management of storage
–Increase capacity/bandwidth on demand

6 NERSC Global File System - how
Parallel network/SAN heterogeneous access model
Multi-platform (AIX/Linux for now)

7 NGF Today

8 NGF current architecture
NGF is a GPFS file system using GPFS multi-cluster capabilities
Mounted on all NERSC systems as /project
External to all NERSC computational clusters
Small Linux server cluster managed separately from the computational systems
70 TB user-visible storage, 50+ million inodes, 3 GB/s aggregate bandwidth
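For context on the capacity and inode figures above, any client that mounts /project can read the same numbers back through the portable statvfs() interface. The sketch below assumes only that /project is mounted as described; the reporting code itself is illustrative.

/*
 * Sketch: report the size and inode usage of the NGF /project mount
 * from any client node, using only the portable statvfs() interface.
 */
#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    struct statvfs vfs;

    if (statvfs("/project", &vfs) != 0) {
        perror("statvfs /project");
        return 1;
    }

    /* f_frsize is the fundamental block size used for the capacity fields. */
    double total_tb = (double)vfs.f_blocks * vfs.f_frsize / 1e12;
    double free_tb  = (double)vfs.f_bfree  * vfs.f_frsize / 1e12;

    printf("capacity: %.1f TB, free: %.1f TB\n", total_tb, free_tb);
    printf("inodes:   %llu total, %llu free\n",
           (unsigned long long)vfs.f_files,
           (unsigned long long)vfs.f_ffree);
    return 0;
}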

9 NGF Current Configuration

10 /project
Limited initial deployment: no homes, no /scratch
Projects can include many users potentially using multiple systems (MPP, vis, …) and seemed to be prime candidates to benefit from the NGF shared data access model
Backed up to HPSS bi-weekly
–Will eventually receive nightly incremental backups
Default project quota:
–1 TB
–250,000 inodes

11 /project – 2
Current usage
–19.5 TB used (28% of capacity)
–2.2 M inodes used (5% of capacity)
NGF /project is currently mounted on all major NERSC systems (1240+ clients):
–Jacquard, LNXI Opteron system running SLES 9
–Da Vinci, SGI Altix running SLES 9 Service Pack 3, with direct storage access
–PDSF, IA32 Linux cluster running Scientific Linux
–Bassi, IBM POWER5 running AIX 5.3
–Seaborg, IBM SP running AIX 5.2

12 /project – problems & Solutions /project has not been without it’s problems –Software bugs 2/14/06 outage due to Seaborg gateway crash – problem reported to IBM, new ptf with fix installed. GPFS on AIX5.3 ftruncate() error on compiles – problem reported to IBM. efix now installed on Bassi. –Firmware bugs FibreChannel Switch bug – firmware upgraded. DDN firmware bug(triggered on rebuild) – firmware upgraded –Hardware Failures Dual disk failure in raid array – more exhaustive monitoring of disk health including soft errors now in place

13 NGF – solutions
General actions taken to improve reliability:
–Pro-active monitoring – see the problems before they're problems
–Procedural development – decrease time to problem resolution; perform maintenance without outages
–Operations staff activities – decrease time to problem resolution
–PMRs filed and fixes applied – prevent problem recurrence
–Replacing old servers – remove hardware with demonstrated low MTBF
NGF availability since 12/1/05: ~99% (total downtime: 2,439 minutes)

14 Current Project Information
Projects using the /project file system (46 projects to date):
–narccap: North American Regional Climate Change Assessment Program – Phil Duffy, LLNL
Currently using 4.1 TB
Global model with fine resolution in 3D and time; will be used to drive regional models
Currently using only Seaborg
–mp107: CMB Data Analysis – Julian Borrill, LBNL
Currently using 2.9 TB
Concerns about quota management and performance
–16 different file groups

15 Current Project Information
Projects using the /project file system (cont.):
–incite6: Molecular Dynameomics – Valerie Daggett, UW
Currently using 2.1 TB
–snaz: Supernova Science Center – Stan Woosley, UCSC
Currently using 1.6 TB

16 Other Large Projects
Project   | PI                | Usage
snap      | Saul Perlmutter   | 922 GB
aerosol   | Catherine Chuang  | 912 GB
acceldac  | Robert Ryne       | 895 GB
vorpal    | David Bruhwiler   | 876 GB
m526      | Peter Cummings    | 759 GB
gc8       | Martin Karplus    | 629 GB
incite7   | Cameron Geddes    | 469 GB

17 NGF Performance
Many users have reported good performance for their applications (little difference from /scratch)
Some applications show variability in read performance (MADCAP/MADbench) – we are actively investigating this; one way to measure such variability is sketched after this slide
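One simple way to look at read-rate variability is to time repeated full reads of the same large file and compare the per-pass bandwidth. The sketch below is illustrative only: it is not MADbench, the file path is hypothetical, and client-side caching can mask true file-system reads unless the file is much larger than memory.

/*
 * Sketch: time several repeated reads of one large file and report the
 * bandwidth of each pass, to expose run-to-run read variability.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

#define PASSES   5
#define BUFSIZE  (8 * 1024 * 1024)   /* 8 MB read buffer */

static double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    const char *path = "/project/example/bigfile";  /* hypothetical */
    char *buf = malloc(BUFSIZE);
    if (!buf)
        return 1;

    for (int pass = 0; pass < PASSES; pass++) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        double start = now_seconds();
        long long total = 0;
        ssize_t n;
        while ((n = read(fd, buf, BUFSIZE)) > 0)
            total += n;                 /* count bytes actually read */
        double elapsed = now_seconds() - start;

        printf("pass %d: %.1f MB/s\n", pass, total / elapsed / 1e6);
        close(fd);
    }

    free(buf);
    return 0;
}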

18 MADbench Results
(table of Min, Max, Mean, and StdDev values for each operation: Bassi Home Read/Write, Bassi Scratch Read/Write, Bassi Project Read/Write, Seaborg Home Read/Write, Seaborg Scratch Read/Write, Seaborg Project Read/Write)

19 Bassi Read Performance

20 Bassi Write Performance

21 Current Architecture Limitations
NGF performance is limited by the architecture of current NERSC systems
–Most NGF I/O uses the GPFS TCP/IP storage access protocol; only Da Vinci can access NGF storage directly via FC
–Most NERSC systems have limited IP bandwidth outside of the cluster interconnect
Jacquard: 1 gig-e per I/O node; each compute node uses only one I/O node for NGF traffic; 20 I/O nodes feed into one 10 Gb Ethernet link
Seaborg: 2 gateways with 4x gig-e bonds; again, each compute node uses only one gateway
Bassi: nodes each have 1-gig interfaces, all feeding into a single 10 Gb Ethernet link

22 NGF tomorrow (and beyond …)

23 Performance Improvements
NGF client system performance upgrades
–Increase client bandwidth to NGF via hardware and routing improvements
NGF storage fabric upgrades
–Increase the bandwidth and port count of the NGF storage fabric to support future systems
Replace old NGF servers
–New servers will be more reliable
–10-gig Ethernet capable
New systems will be designed to support high performance to NGF

24 NGF /home
We will deploy a shared /home file system in 2007
–Initially the home file system for only one system; it may be mounted on others
–All new systems thereafter will have their home directories on NGF /home
–Will be a new file system with tuning parameters configured for small-file access

25 /home layout – decision slide
Two options:
1. A user's login directory is the same for all systems
–/home/matt/
2. A user's login directory is a different subdirectory of the user's directory for each system
–/home/matt/seaborg
–/home/matt/jacquard
–/home/matt/common
–/home/matt/seaborg/common -> ../common

26 One directory for all
Users see exactly the same thing in their home directory every time they log in, no matter what machine they're on.
Problems
–Programs sometimes change the format of their configuration files (dotfiles) from one release to another without changing the file's name
–Setting $HOME affects all applications, not just the one that needs different config files
–Programs have been known to use getpwnam() to determine the user's home directory and look there for config files rather than in $HOME (see the sketch after this slide)
–Setting $HOME essentially emulates the effect of having separate home directories for each system
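The getpwnam() point is easy to demonstrate: a program that resolves the home directory through the passwd database ignores $HOME entirely. The sketch below uses getpwuid(), the per-UID variant of the same lookup, and is illustrative only.

/*
 * Sketch of why overriding $HOME is not a complete fix: a program that
 * resolves the home directory through the passwd database sees pw_dir,
 * not the environment, so it will still read dotfiles from the shared
 * home directory.
 */
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* What most programs use: the environment variable. */
    const char *env_home = getenv("HOME");

    /* What some programs use instead: the passwd database. */
    struct passwd *pw = getpwuid(getuid());
    const char *pw_home = pw ? pw->pw_dir : NULL;

    printf("$HOME         : %s\n", env_home ? env_home : "(unset)");
    printf("passwd pw_dir : %s\n", pw_home ? pw_home : "(unknown)");

    /* If the two differ, dotfiles may be read from different places. */
    return 0;
}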

27 One directory per system
By default, users start off in a different directory on each system
Dotfiles are different on each system unless the user uses symbolic links to make them the same
All of a user's files are accessible from all systems, but a user may need to "cd ../seaborg" to get at files created on Seaborg when logged into a different system

28 NGF /home conclusion
We currently believe that the multiple-directories option will result in fewer problems for users, but we are actively evaluating both options. We would welcome user input on the matter.

29 NGF /scratch
We plan to deploy a shared /scratch for NERSC-5 sometime in 2008