Parallel and Grid I/O Infrastructure
W. Gropp, R. Ross, R. Thakur (Argonne National Lab)
A. Choudhary, W. Liao (Northwestern University)
G. Abdulla, T. Eliassi-Rad (Lawrence Livermore National Lab)

Outline
Introduction
PVFS and ROMIO
Parallel NetCDF
Query Pattern Analysis
Please interrupt at any point for questions!

What is this project doing?
Extending existing infrastructure work
– PVFS parallel file system
– ROMIO MPI-IO implementation
Helping match application I/O needs to underlying capabilities
– Parallel NetCDF
– Query Pattern Analysis
Linking with Grid I/O resources
– PVFS back end for the GridFTP striped server
– ROMIO on top of a Grid I/O API

What Are All These Names?
MPI - Message Passing Interface standard
– Also known as MPI-1
MPI-2 - Extensions to the MPI standard
– I/O, RDMA, dynamic processes
MPI-IO - The I/O part of the MPI-2 extensions
ROMIO - An implementation of MPI-IO
– Handles mapping MPI-IO calls into communication (MPI) and file I/O
PVFS - Parallel Virtual File System
– An implementation of a parallel file system for Linux clusters

Fitting the Pieces Together
[Layer diagram: Query Pattern Analysis and Parallel NetCDF sit on top of any MPI-IO implementation; the ROMIO MPI-IO implementation sits on top of the PVFS parallel file system or Grid I/O resources]
Query Pattern Analysis (QPA) and Parallel NetCDF are both written in terms of MPI-IO calls
– QPA tools pass information down through MPI-IO hints
– Parallel NetCDF is written using MPI-IO for data read/write
The ROMIO implementation uses PVFS as the storage medium on Linux clusters, or could hook into Grid I/O resources

PVFS and ROMIO
Provide a little background on the two
– What they are, an example to set context, and current status
Motivate the work
Discuss current research and development
– I/O interfaces
– MPI-IO hints
– PVFS2
Our work on these two is closely tied together.

Parallel Virtual File System
Parallel file system for Linux clusters
– Global name space
– Distributed file data
– Builds on TCP and local file systems
Tuned for high-performance concurrent access
Mountable like NFS file systems
User-level interface library (used by ROMIO)
200+ users on the mailing list, 100+ downloads/month
– Up from 160+ users in March
Installations at OSC, Univ. of Utah, Phillips Petroleum, ANL, Clemson Univ., etc.

PVFS Architecture
Client-server architecture
Two server types
– Metadata server (mgr) - keeps track of file metadata (permissions, owner) and directory structure
– I/O servers (iod) - orchestrate movement of data between clients and local I/O devices
Clients access PVFS in one of two ways
– MPI-IO (using the ROMIO implementation)
– Mount through the Linux kernel (loadable module)

PVFS Performance
Ohio Supercomputer Center cluster: 16 I/O servers (IA32), 70+ clients (IA64), IDE disks
Block-partitioned data, accessed through ROMIO

ROMIO
Implementation of the MPI-2 I/O specification
– Operates on a wide variety of platforms
– Abstract Device Interface for I/O (ADIO) aids in porting to new file systems
– Fortran and C bindings
Successes
– Adopted by industry (e.g. Compaq, HP, SGI)
– Used at ASCI sites (e.g. LANL Blue Mountain)
[Layer diagram: MPI-IO interface, over the ADIO interface, over FS-specific code (e.g. AD_PVFS, AD_NFS)]

Example of Software Layers
The FLASH astrophysics application stores checkpoints and visualization data using HDF5
HDF5 in turn uses MPI-IO (ROMIO) to write out its data files
The PVFS client library is used by ROMIO to write data to the PVFS file system
The PVFS client library interacts with PVFS servers over the network
[Layer diagram: FLASH astrophysics code, over the HDF5 I/O library, over the ROMIO MPI-IO library, over the PVFS client library, over the PVFS servers]
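To make the layering concrete, here is a minimal sketch (not taken from the slides; the file name, dataset name, and sizes are invented) of what the top of that stack looks like from the application side. HDF5 is pointed at its MPI-IO virtual file driver, and everything below ROMIO, including the PVFS client library, stays hidden from the application.

```c
/* Sketch of a parallel HDF5 write through the MPI-IO (ROMIO) driver.
 * Assumes a parallel HDF5 build (compile with h5pcc); uses the HDF5 1.8+
 * H5Dcreate form.  Names and sizes are illustrative only. */
#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Point HDF5 at the MPI-IO virtual file driver (ROMIO underneath). */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("checkpoint.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One 1-D dataset; each rank owns a 100-element slice of it. */
    hsize_t dims[1] = { (hsize_t)nprocs * 100 };
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate(file, "density", H5T_NATIVE_DOUBLE, filespace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    hsize_t start[1] = { (hsize_t)rank * 100 }, count[1] = { 100 };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);

    double buf[100];
    for (int i = 0; i < 100; i++) buf[i] = rank;

    /* Collective transfer: HDF5 issues collective MPI-IO calls in ROMIO. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl); H5Dclose(dset); H5Sclose(memspace); H5Sclose(filespace);
    H5Pclose(fapl); H5Fclose(file);
    MPI_Finalize();
    return 0;
}
```

The application never names the file system; whether ROMIO's PVFS driver or IBM's MPI-IO over GPFS (next slide) sits underneath is decided entirely below this layer.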

Example of Software Layers (2)
The FLASH astrophysics application stores checkpoints and visualization data using HDF5
HDF5 in turn uses MPI-IO (IBM's implementation) to write out its data files
The GPFS file system stores the data to disks
[Layer diagram: FLASH astrophysics code, over the HDF5 I/O library, over the IBM MPI-IO library, over GPFS]

Status of PVFS and ROMIO
Both are freely available, widely distributed, documented, and supported products
Current work focuses on:
– Higher performance through richer file system interfaces
– Hint mechanisms for optimizing the behavior of both ROMIO and PVFS
– Scalability
– Fault tolerance

Why Does This Work Matter?
Much of the I/O on big machines goes through MPI-IO
– Direct use of MPI-IO (visualization)
– Indirect use through HDF5 or NetCDF (fusion, climate, astrophysics)
– Hopefully soon through Parallel NetCDF!
On clusters, PVFS is currently the most widely deployed parallel file system
Optimizations in these layers are of direct benefit to those users
This work also provides guidance to vendors for possible future improvements

I/O Interfaces
Scientific applications keep structured data sets in memory and in files
For highest performance, the description of that structure must be maintained through the software layers
– Allow the scientist to describe the data layout in memory and in the file
– Avoid packing into buffers in intermediate layers
– Minimize the number of file system operations needed to perform I/O
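As a hedged illustration of these points (not from the slides; the file name and matrix dimensions are invented), the MPI-IO sketch below hands the library a description of a strided column instead of packing it into a contiguous buffer, so the structure survives down into the I/O layer.

```c
/* Sketch: each rank writes one column of a global ROWS x nprocs matrix of
 * doubles, stored row-major in the file, without packing.  A derived
 * datatype describes the strided file layout and MPI-IO handles the rest. */
#include <mpi.h>
#include <stdlib.h>

#define ROWS 1024

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Local data: one column's worth of values, contiguous in memory. */
    double *col = malloc(ROWS * sizeof(double));
    for (int i = 0; i < ROWS; i++) col[i] = rank + i * 0.001;

    /* File layout: this rank's column is ROWS elements, each separated by
     * nprocs doubles in the row-major global matrix. */
    MPI_Datatype filetype;
    MPI_Type_vector(ROWS, 1, nprocs, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "matrix.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* The view starts at this rank's column offset; no packing needed. */
    MPI_Offset disp = (MPI_Offset)rank * sizeof(double);
    MPI_File_set_view(fh, disp, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    /* Collective write: ROMIO can merge the strided accesses internally. */
    MPI_File_write_all(fh, col, ROWS, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    free(col);
    MPI_Finalize();
    return 0;
}
```

Whether such a request reaches the file system as one rich operation or as many small contiguous ones is exactly what the file system interfaces on the next slides address.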

File System Interfaces
MPI-IO is a great starting point
Most underlying file systems provide only POSIX-like contiguous access
The List I/O work was a first step in the right direction
– Proposed file system interface
– Allows movement of lists of data regions in memory and in the file with one call
[Diagram: noncontiguous regions in memory mapped to noncontiguous regions in the file]

List I/O
Implemented in PVFS
Transparent to the user through ROMIO
Distributed in the latest releases

List I/O Example
A simple datatype is repeated (tiled) over the file
Suppose we want to read the first 9 bytes of selected data
This is converted into four [offset, length] pairs
One can see how this process could result in a very large list of offsets and lengths
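The slide's figure is not reproduced in the transcript, but the next slide identifies the datatype as index {(0,1), (2,2)}, so the first 9 bytes of selected data flatten (after merging adjacent runs) into the four pairs [0,1], [2,3], [6,3], [10,2]. The sketch below, with an invented file name, shows what a client has to do with that list when the file system lacks list I/O: one POSIX call per region.

```c
/* Illustration: reading four noncontiguous file regions without list I/O.
 * The offset/length pairs are reconstructed from the index {(0,1), (2,2)}
 * pattern given on the Datatype I/O slide; the file name is invented. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    /* The four regions covering the first 9 bytes of selected data.
     * A list I/O interface would accept these arrays, plus matching
     * memory regions, in a single call. */
    off_t  file_offsets[4] = { 0, 2, 6, 10 };
    size_t file_lengths[4] = { 1, 3, 3, 2 };

    char buf[9];
    size_t pos = 0;

    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Without list I/O: one system call (and, on a parallel file system,
     * one client-server request) per noncontiguous region. */
    for (int i = 0; i < 4; i++) {
        pread(fd, buf + pos, file_lengths[i], file_offsets[i]);
        pos += file_lengths[i];
    }

    close(fd);
    return 0;
}
```

With list I/O the whole offset/length list travels to the server in one request, but the list itself still grows with the amount of data accessed, which motivates the next slides.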

Describing Regular Patterns
List I/O can't describe regular patterns (e.g. a column of a 2-D matrix) in an efficient manner
MPI datatypes can do this easily
Datatype I/O is our solution to this problem
– A concise set of datatype constructors used to describe types
– An API for passing these descriptions to the file system

Datatype I/O
Built using a generic datatype-processing component (also used in MPICH2)
– Optimizing it for performance
A prototype for PVFS is in progress
– API and server support
A prototype of support in ROMIO is in progress
– Maps MPI datatypes to PVFS datatypes
– Passes them through the new API
This same generic datatype component could be used in other projects as well

Datatype I/O Example
Same datatype as in the previous example
Describe the datatype with one construct:
– index {(0,1), (2,2)} describes a pattern of one short block and one longer one
– automatically tiled (as with MPI types for files)
The linear relationship between the number of contiguous pieces and the size of the request description is removed
[Chart: request size in bytes vs. number of datatypes accessed (1, 2, 3), for a datatype with a base unit of one byte]
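In MPI terms (an illustration of the same access pattern, not the Datatype I/O API itself, which the slides do not spell out), index {(0,1), (2,2)} corresponds to an indexed type with displacements {0, 2} and block lengths {1, 2}. Used as an MPI-IO file view it tiles automatically, and the request description stays the same size no matter how much data is read.

```c
/* Sketch: the pattern index {(0,1), (2,2)} expressed as an MPI indexed
 * datatype of bytes and used as an MPI-IO file view.  The type's extent
 * is 4 bytes, so the view tiles the pattern every 4 bytes in the file.
 * The file name is invented. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* One 1-byte block at displacement 0, one 2-byte block at displacement 2. */
    int blocklens[2] = { 1, 2 };
    int displs[2]    = { 0, 2 };
    MPI_Datatype pattern;
    MPI_Type_indexed(2, blocklens, displs, MPI_BYTE, &pattern);
    MPI_Type_commit(&pattern);

    MPI_File fh;
    MPI_File_open(MPI_COMM_SELF, "pattern.dat", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_BYTE, pattern, "native", MPI_INFO_NULL);

    /* Reading 9 bytes through this view touches exactly the regions the
     * list I/O example had to enumerate as four offset/length pairs. */
    char buf[9];
    MPI_File_read(fh, buf, 9, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&pattern);
    MPI_Finalize();
    return 0;
}
```

The two-entry description stays two entries whether the read covers 9 bytes or 9 gigabytes; a flattened offset/length list would grow with the request.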

MPI Hints for Performance
ROMIO has a number of performance optimizations built in
The optimizations are somewhat general, but there are tuning parameters that are very system-specific
– buffer sizes
– number and location of processes that perform I/O
– data sieving and two-phase techniques
Hints may be used to tune ROMIO to match the system

ROMIO Hints
Currently all of ROMIO's optimizations may be controlled with hints
– data sieving
– two-phase I/O
– list I/O
– datatype I/O
Additional hints are being considered to allow ROMIO to adapt to access patterns
– collective-only I/O
– sequential vs. random access
– inter-file dependencies
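Hints reach ROMIO through the standard MPI_Info mechanism. The sketch below uses commonly documented ROMIO hint keys (cb_buffer_size, cb_nodes, romio_cb_write, romio_ds_write); the particular values and the file name are only illustrative.

```c
/* Sketch: passing ROMIO tuning hints through an MPI_Info object at file
 * open time.  Unknown keys are ignored by the implementation, so hints
 * are safe to set even when a different MPI-IO library is underneath. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);

    /* Two-phase (collective buffering) parameters. */
    MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MB per aggregator */
    MPI_Info_set(info, "cb_nodes", "4");                /* 4 aggregator processes */
    MPI_Info_set(info, "romio_cb_write", "enable");

    /* Data sieving control for independent writes. */
    MPI_Info_set(info, "romio_ds_write", "disable");

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* ... MPI_File_set_view / MPI_File_write_all calls would go here ... */

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```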

PVFS2
PVFS (version 1.x.x) plays an important role as a fast scratch file system for use today
PVFS2 will supersede this version, adding
– More comprehensive system management
– Fault tolerance through lazy redundancy
– Distributed metadata
– A component-based approach for supporting new storage and network resources
Distributed metadata and fault tolerance will extend scalability to thousands or tens of thousands of clients and hundreds of servers
The PVFS2 implementation is underway

Summary
ROMIO and PVFS are a mature foundation on which to make additional improvements
New, rich I/O descriptions allow for higher-performance access
The addition of new hints to ROMIO allows for fine-tuning its operation
PVFS2 focuses on the next generation of clusters